Monday, April 16, 2007

ZFS in the Linux kernel?

Due to the recent surge of interest in porting ZFS to the Linux kernel (if you are in the mood to read dozens of messages, see this thread, the follow-up, plus this one and one more), I'd like to offer my view on things.

I have a feeling most Linux kernel hackers (or at least those who talk about ZFS on linux-kernel) don't really know how ZFS works or what it can do. The best example is perhaps this message from Rik van Riel.

Well, first of all, ZFS doesn't have/need fsck (what?! are you nuts??). This is because ZFS checks and repairs the filesystem online and on-the-fly, as it is being used. And when it can't repair, it will pinpoint exactly which files and which bytes in those files were corrupted. You might think this is complex or expensive, but it's really simple and beautiful actually. These slides explain this and a lot more, so please read them carefully.
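To make the "check and repair as you read" idea concrete, here is a toy sketch in Python. It is not ZFS code and does not reflect its on-disk format: the ToyMirror class, the dict-based "disks" and the use of SHA-256 are all assumptions made for illustration. The one idea it borrows is that the checksum is kept apart from the block it describes (handed back to the caller, the way a parent block pointer would hold it), so a read can tell a good copy from a bad one and rewrite the bad one.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # Stand-in checksum for this sketch; any strong hash would do.
    return hashlib.sha256(data).digest()


class ToyMirror:
    """Hypothetical N-way mirror: each 'disk' is a dict of address -> bytes."""

    def __init__(self, n_disks: int = 2):
        self.disks = [dict() for _ in range(n_disks)]

    def write(self, addr: int, data: bytes) -> bytes:
        # Store the same block on every disk.
        for disk in self.disks:
            disk[addr] = data
        # The checksum is NOT stored next to the data: the caller keeps it,
        # playing the role of the parent block pointer.
        return checksum(data)

    def read(self, addr: int, expected: bytes) -> bytes:
        good = None
        bad = []
        # Verify every copy against the checksum held by the caller.
        for i, disk in enumerate(self.disks):
            data = disk.get(addr)
            if data is not None and checksum(data) == expected:
                if good is None:
                    good = data
            else:
                bad.append(i)
        if good is None:
            # Nothing to repair from: report exactly which block is lost.
            raise IOError(f"block {addr}: all copies are corrupt")
        # Self-heal: overwrite damaged copies with the verified one.
        for i in bad:
            self.disks[i][addr] = good
        return good
```

For example, after `cs = m.write(7, b"hello")` on a `ToyMirror()`, silently overwriting `m.disks[0][7]` with garbage and then calling `m.read(7, cs)` still returns the original bytes and puts the good copy back on the first disk. No separate offline checking pass is involved, which is the point.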

The great thing is that ZFS can also repair metadata on-the-fly even on ZFS pools that don't have any inherent redundancy (in other words, this also works for single disks). This is due to a feature called ditto blocks, which basically keeps multiple copies of metadata spread across the disk. Oh, and this now works for data too, so you can configure a filesystem holding important files to keep 2 or even 3 copies of the data on disk (this is in addition to any inherent pool redundancy).
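The ditto block idea can be sketched in the same toy style. Again, this is only an illustration under stated assumptions: ToyDisk, DISK_SLOTS, the fixed-stride placement and the CRC32 checksum are all made up for the example, and the real allocator decides placement in its own way. It only shows why several copies spread over one disk are enough to survive a localized bad spot.

```python
import zlib

DISK_SLOTS = 1024  # hypothetical size of the single disk, in blocks


class ToyDisk:
    """Hypothetical single disk that can keep several copies of a block."""

    def __init__(self):
        self.slots = {}

    def write_ditto(self, base: int, data: bytes, copies: int = 2):
        # Place the copies far apart so one damaged region is unlikely
        # to hit all of them (simplified fixed-stride placement).
        stride = DISK_SLOTS // copies
        addrs = [(base + i * stride) % DISK_SLOTS for i in range(copies)]
        for addr in addrs:
            self.slots[addr] = data
        # The caller keeps the addresses and the checksum together,
        # like a block pointer with several locations.
        return addrs, zlib.crc32(data)

    def read_ditto(self, addrs, expected: int) -> bytes:
        for addr in addrs:
            data = self.slots.get(addr)
            if data is not None and zlib.crc32(data) == expected:
                # Found a verified copy: refresh every location from it.
                for other in addrs:
                    self.slots[other] = data
                return data
        raise IOError("all ditto copies are damaged")
```

With `copies=3` and `base=10`, the copies land at slots 10, 351 and 692 in this model; damaging the first two still leaves a verified third copy to read from and to repair the others with.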

ZFS has a lot of other nice things too, like cheap and instantaneous snapshots and clones, optional compression, variable block sizes, easy management, and more. I really think interested people should read these slides and try zfs-fuse.

Now regarding a ZFS port to the Linux kernel:

1) As for technical difficulty, I don't think it is a problem at all. I don't know the Linux VFS internals, but if I was able to port ZFS to FUSE so easily, a kernel port can certainly be done.

2) As for the license, well.. that is a real problem. I'm a big believer in FSF's ideals, but in this case I think the GPLv2 is preventing progress. It would be a big plus to have Linux benefit from a fully open-source, useful piece of functionality with 6 years of development behind it.

Of course, as Adrian Bunk put it, I don't think it'll be possible to get 10,000 people (living and dead) to agree on a licensing change.

One option would be to reimplement ZFS (or a comparable filesystem) from scratch. I don't think this is feasible. First, it would take a huge effort and several years to reach the level of robustness ZFS has right now. Second, Sun has filed more than 50 patents on ZFS. Even if Sun never uses those patents against Linux, some people might see it as a risk (in the United States).

The only way I see ZFS getting into the Linux kernel is to convince Sun to dual-license it under the GPL and the CDDL. Some people might say Sun would never do this, but Sun has been very open to the open-source community recently. In fact, Sun's ZFS FAQ initially had an answer saying Sun was considering a ZFS port to Linux (not to FUSE, that was my idea ;).

Finally I'd like to debunk a couple of myths about zfs-fuse:

1) In terms of features, zfs-fuse will certainly be comparable to a ZFS kernel implementation (and in fact, most of it already works). The only thing that can't be done is to store swap on a ZFS pool, due to the way ZFS works. You can see the STATUS file for more details about implemented features.
2) As for performance, well.. zfs-fuse is slow right now, but it will certainly improve. I haven't even started to look seriously at performance. And FUSE-based filesystems can perform comparably to kernel filesystems, since the bottleneck is usually the disk(s), not the CPU.