Friday, December 15, 2006

Read-only support for ZFS on Linux

I know it has been a loong time since my last post (sorry!), but today I'm very excited to bring you zfs-fuse 0.3.0, which is able to mount ZFS filesystems in read-only mode :)

Current status:
  • It is possible to create and destroy ZFS pools, filesystems and snapshots.
  • It is possible to use disks (any block device, actually) and files as virtual devices (vdevs).
  • It is possible to use any vdev configuration supported by the original ZFS implementation. This includes striping (RAID-0), mirroring (RAID-1), RAID-Z and RAID-Z2.
  • It is possible to change properties of filesystems.
  • It is possible to mount ZFS filesystems, but you can only read files and directories; you cannot create, modify, or remove them yet.
  • ZIL replay is not implemented yet.
  • It is not possible to mount snapshots.
  • It is not possible to use 'zfs send/recv'.
  • ACLs and extended attributes do not work.
  • There is no support for ZVols.
  • It's buggy and probably has a few memory leaks :p
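As a concrete sketch of the working subset above (pool and filesystem names here are just examples, and this assumes the zfs-fuse daemon or Solaris zfs tools are installed and you are root):

```shell
# Sparse 64 MB backing files to use as file vdevs
dd if=/dev/zero of=/tmp/vdev0 bs=1M count=0 seek=64
dd if=/dev/zero of=/tmp/vdev1 bs=1M count=0 seek=64

# Only attempt the pool operations if zpool is actually available
if command -v zpool >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    zpool create tank mirror /tmp/vdev0 /tmp/vdev1  # RAID-1 over two file vdevs
    zfs create tank/home                            # create a filesystem
    zfs set compression=on tank/home                # change a property
    zfs snapshot tank/home@first                    # take a snapshot
    zpool destroy tank
fi
```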
If you want to test it, just download it and follow the README (don't forget to read the prerequisites).

A few notes:

  • Even though you can't write to filesystems, the pools are opened in read-write mode. There are bugs that could corrupt your pools, so don't use zfs-fuse on important files!
  • There's no point in running benchmarks, since it is still highly unoptimized.
  • You cannot write to ZFS filesystems yet, so the best you can do right now is populate a filesystem in Solaris and then mount it in Linux. I recommend you create your zpools on files (since it's easier to move them between Linux and Solaris), but you can also create them directly on block devices.
  • I recommend you use EVMS if you keep ZFS pools on block devices, since it places all of them under /dev/evms, which makes it easier to import pools.
  • If you create your zpools in Solaris directly on whole disks, Solaris will write an EFI label, so to mount them properly on Linux you'll need GPT/EFI partition support configured in the kernel (I think most x86 and amd64 kernels don't have it enabled, so you may have to compile the kernel yourself). Since my USB disk has died and I'm still waiting for a replacement, I can't properly test this yet. The last time I tried I had some difficulty getting it to work, but I think I managed it with EVMS.
  • In order to import zpools on block devices, you'll need to run 'zpool import -d /dev'. Be careful: at the moment zpool will try to open every device in /dev and look for ZFS pools on it! If you're using EVMS, use /dev/evms instead.
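A hypothetical export/import round-trip on a file-backed pool (names are examples, not from the post; assumes zfs-fuse or Solaris zfs is installed and you are root). Pointing -d at a directory that holds only your vdevs avoids scanning every node in /dev:

```shell
# Sparse 64 MB file to serve as the single vdev
dd if=/dev/zero of=/var/tmp/vdev0 bs=1M count=0 seek=64

if command -v zpool >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    zpool create tank /var/tmp/vdev0
    zpool export tank                # cleanly export the pool
    zpool import -d /var/tmp tank    # scan only /var/tmp, not all of /dev
    zpool destroy tank
fi
```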
The project is progressing at a fast pace since last week, when I did some major code restructuring and finished uncommenting most of the original ZPL code :)

And I am still highly confused about vnode lifetimes, so expect some bugs and probably some memory leaks until I eventually sort it out...



Anonymous said...

great, I'll give it a try :)... just a quick question... when do you expect full write support for ZFS on Linux?


wizeman said...

I knew someone would ask that, I should have answered it in the post :p

The answer is... I don't know; it depends on how many bugs I run into. But in the current state of the code, it shouldn't be too hard ;)

I think the hardest part of the future medium-term work will be the ability to mount ZFS snapshots.

Anonymous said...

Wizeman, you are the man. Thanks so much for the work you are putting into this. ZFS is such an incredible tool that we have switched some infrastructure servers over to Solaris from SUSE Linux just to be able to use ZFS. We would love to have ZFS on Linux.


Anonymous said...

That's good work.
Keep it up!

Do you have any suggestions for beta-testing or helping out?
(I'm not very VFS- or FUSE-aware, but I'm available to help a bit.)

Best Regards,

Miguel Filipe

wizeman said...

If you want to help beta-testing (more like alpha-testing), see the TESTING file.

Anonymous said...

"200+ MB", you say.
root 6281 36.1 2.4 1240224 25248 pts/6 Sl+ 03:52 0:05 ./ztest -T 3600 -f /home/zfstest

I become really unamused when processes attempt to use 2 GB and my system hangs because it's swapping constantly. :)

You might want to note that "200 MB" is a very very conservative estimate in the TESTING file next time.

wizeman said...

It only uses about 200 MB in my tests :p

Make sure you're not confusing VM size with memory usage; those are two different things.

Looking at the numbers you posted, I see that ztest had a VM size of 1240224 KB (the VSZ column in ps), but it was in fact only using about 25 MB of memory (the RSS column).
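The difference is easy to see for any process with standard ps output format specifiers, for instance for the current shell:

```shell
# VSZ = virtual address space reserved (KB); RSS = physical memory resident (KB).
# VSZ is typically much larger than RSS, since reserved pages need not be backed by RAM.
ps -o vsz=,rss= -p $$
```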

Anonymous said...

Thanks a lot for the great work! Stay cool during the northern hemisphere winter :)

Anonymous said...

You're correct, I'm sorry.

I was associating the VM usage with the unresponsiveness of my system, which was really caused by the load average of 85 that ztest caused. :(

Anonymous said...

Hey Man, great work !

Anonymous said...

Build failed on SUSE 10.2

gcc -o lib/libsolkerncompat/flock.o -c -DDEBUG -ggdb -pipe -std=c99 -Wall -Wno-missing-braces -Wno-parentheses -Wno-uninitialized -Werror -fno-strict-aliasing -finstrument-functions -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_REENTRANT -DTEXT_DOMAIN=\"zfs-fuse\" -D_KERNEL -Ilib/libsolkerncompat -Ilib/libsolkerncompat/include -Ilib/libsolkerncompat/include/i386 -Ilib/libumem/include lib/libsolkerncompat/flock.c
In file included from /usr/include/bits/fcntl.h:27,
from /usr/include/fcntl.h:34,
from /usr/include/sys/file.h:25,
from lib/libsolkerncompat/include/sys/file.h:30,
from lib/libsolkerncompat/include/sys/flock.h:31,
from lib/libsolkerncompat/flock.c:28:
/usr/include/bits/uio.h:45: error: redefinition of 'struct iovec'
scons: *** [lib/libsolkerncompat/flock.o] Error 1
scons: building terminated because of errors.

wizeman said...

That was already fixed in the Mercurial repository, so you can either grab the latest code from there (see the homepage) or wait until I release 0.3.1 :)

Eric said...

What can I do to help the project? I don't have solaris so I can't really test it... but is there any code that you need done? I'd really like to help this project (ok, I'm selfish - I want it to work ASAP).

wizeman said...

Hi Eric,

If you want to try to code something up, take a look at zfs-fuse/zfs_operations.c.

At the end you'll find a list of the implemented FUSE operations. You can find out which operations are missing by looking at /usr/include/fuse/fuse_lowlevel.h.

At the moment it'll be relatively easy to get the remaining operations working.

I'm currently working on some permission issues, and trying to get the open and create operations working correctly with all the flags.

If you make progress and/or if you have any questions, contact me by email (see my address at the end of ) so that we can coordinate our efforts :)