Posted on 06/29/2011 6:16:30 AM PDT by ShadowAce
I was directed to a recent mailing list post by Linus Torvalds on linux-fsdevel in which he derided the concept of user-space filesystems. Not a particular implementation, mind you, but the very concept of it.
Jeff Darcy, of Red Hat and CloudFS fame, wrote a wonderful response, which you should read first before continuing further.
From my perspective, as the creator of GlusterFS, Linus is rather blinkered on this issue. The fact is, the advantages of user space far outweigh those of kernel space. You’ll notice that Linus pointed to no benchmarks or studies confirming his opinion; he merely presented his bias as if it were fact. It is not.
Hypervisors are the modern microkernels. The microkernel idea was never about size, but about what belongs in kernel mode. Linus’s ideas about filesystems are rather old: he thinks it is a bad idea to push filesystems into user space while the memory manager stays in kernel mode, because the bulk of the memory buffers are filesystem contents and the two need to work together. That is true for root filesystems holding relatively small amounts of data, but not for scalable storage systems. Don’t let the kernel manage that memory for you: in my opinion, kernel space does a poor job of handling large amounts of memory with 4k pages. Look at the bigger picture: disks and memory have grown much larger, and user requirements have grown 1000-fold. To handle today’s scalable, highly available storage needs, filesystems have to scale across multiple commodity systems, which is much easier to do in user space. The real bottlenecks are network and disk latencies, buffer copying, and chatty IPC/RPC communication. Kernel-user context switches are hardly visible in that broader picture, so whatever performance improvement kernel mode offers is irrelevant. Better, then, to use the simpler, easier methods offered in user space to satisfy modern storage needs. Operating systems themselves run in user space in virtualized and cloud environments; kernel developers should get over that mental barrier.
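To make the large-pages point concrete, here is a rough sketch, not GlusterFS code: the 1 GiB size and the MAP_HUGETLB flag are illustrative assumptions about a Linux host with hugepages reserved. It shows how a user-space server can ask for large pages directly, so a big file-content cache costs hundreds of page-table entries instead of hundreds of thousands.

/* Sketch: a user-space storage daemon allocating its file-content cache
 * with 2 MB huge pages instead of the default 4 KB pages.  Assumes a
 * Linux host with hugepages reserved (e.g. via /proc/sys/vm/nr_hugepages);
 * illustrative only, not GlusterFS code. */
#define _GNU_SOURCE                    /* for MAP_ANONYMOUS / MAP_HUGETLB */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define CACHE_BYTES (1UL << 30)        /* 1 GiB of file-content cache */

int main(void)
{
    void *cache = mmap(NULL, CACHE_BYTES,
                       PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (cache == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");   /* likely no hugepages reserved */
        return 1;
    }
    memset(cache, 0, CACHE_BYTES);     /* touching 1 GiB: ~512 page-table
                                          entries with 2 MB pages, versus
                                          ~262,144 with 4 KB pages */
    munmap(cache, CACHE_BYTES);
    return 0;
}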
Once upon a time, Linus eschewed microkernels in favor of a monolithic architecture for the sake of simplicity. One would hope he could grasp why simplicity wins in this case, too. Unfortunately, he seems to have learned the wrong lesson from the microkernel vs. monolithic kernel debates: the lesson was not that all the important stuff belongs in the kernel, but that simplicity outweighs insignificant improvements elsewhere. We have seen this in the growth of virtualization and cloud computing, where the tradeoff between new capabilities and performance loss has proved to be irrelevant.
There are bigger issues to address. Simplicity is *the* key to scalability. Features like online self-healing, online upgrade, online node addition/removal, HTTP-based object protocol support, compression/encryption support, HDFS APIs, and certificate-based security are complex in their own right. Requiring that they live in kernel space only adds to the complexity, hampering progress and development. Kernel-mode programming is complex, restrictive, and unsustainable in many ways: kernel hackers are hard to find, kernel code is hard to write and debug, and hardware reliability is hard to manage when you scale out across many points of failure.
GlusterFS got its inspiration from the GNU Hurd kernel. Many years ago, GNU Hurd could mount tarballs as filesystems, FTP as a filesystem, and POP3 as an mbox file; users could extend the operating system in clever ways. A FUSE-like user-space architecture was an inherent part of the Hurd design. Instead of treating filesystems as a module of the operating system, Hurd treated filesystems as the operating system: all parts of the OS were developed as stackable modules, and Hurd handled hardware abstraction. Didn’t we see the benefits of converging the volume manager, software RAID, and the filesystem in ZFS? GNU Hurd took that a step further, and GlusterFS brought it to the next level on Linux and other Unix kernels. It treats the Linux kernel as a microkernel that handles hardware abstraction, and it broaches the subject everyone is thinking about, if not stating out loud: the cloud is the operating system. In this brave new world, stuffing filesystems into kernel space is counter-productive and hinders development. GlusterFS has inverted the stack, with many traditional kernel-space jobs now handled in user space.
In fact, when you begin to see the cloud and distributed computing as the future (and present), you realize that the entire nomenclature of user space vs. kernel space is anachronistic. In a world where entire operating systems sit inside virtualized containers in user space, what does it even mean to be kernel space any more? Looking at the broader trends, arguing against user space filesystems is like arguing against rising and falling tides. To suggest that nothing significant is accomplished in user space is to ignore all major computing advances of the last decade.
To solve 21st-century distributed computing problems, we needed 21st-century tools for the job, and we wrote them into GlusterFS. GlusterFS manages most of the operating-system functionality within its own user space: memory management, IO scheduling, volume management, NFS, RDMA, and RAID-like distribution. For memory management, it allocates large blocks for large files, resulting in far fewer page-table entries, and garbage collection is easier in user space. Similarly with IO scheduling: GlusterFS uses elastic hashing across nodes and IO threads within each node. It can scale threads on demand and group blocks belonging to the same inode together, eliminating disk contention. GlusterFS does a better job of managing its own memory and scheduling than the Linux kernel, which lacks an integrated approach. It is user-space storage implementations that have scaled the GNU/Linux OS seamlessly beyond petabytes. That’s not my opinion, it’s a fact: the largest deployments in the world are all user-space. What’s wrong with FUSE simplifying filesystem development to the level of toy making? :-)
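For a flavor of what hash-based placement buys you, here is a minimal sketch. The FNV-1a hash and the hard-coded node list are illustrative assumptions, not the actual GlusterFS elastic-hashing algorithm (which hashes into per-brick ranges kept in extended attributes), but the principle is the same: the file name alone tells every client which server owns the file, with no central metadata server in the lookup path.

/* Sketch of hash-based placement: the path alone determines the node. */
#include <stdint.h>
#include <stdio.h>

static uint32_t fnv1a(const char *s)            /* simple 32-bit FNV-1a hash */
{
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (uint8_t)*s;
        h *= 16777619u;
    }
    return h;
}

int main(void)
{
    const char *nodes[] = { "server1:/brick", "server2:/brick",
                            "server3:/brick", "server4:/brick" };
    const char *files[] = { "/home/alice/thesis.pdf", "/var/log/syslog.1",
                            "/data/vm-image.qcow2" };
    for (int i = 0; i < 3; i++) {
        uint32_t h = fnv1a(files[i]);            /* every client computes this */
        printf("%-24s -> %s\n", files[i], nodes[h % 4]);
    }
    return 0;
}

A plain modulo like this would remap files whenever a node is added or removed, which is why a real elastic-hashing scheme assigns hash ranges to bricks and rebalances online instead.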
Thanks man, good article.
Two responses:
1. You can argue the theory from now until forever, but these questions are always complex enough that it all comes down to what works. When Linux started, a microkernel was the “right” theoretical answer, but the monolith worked. And it won.
2. So what? If Linux doesn’t like userland FS - so what - since it’s in user space, doesn’t that mean it can succeed with or without love from Linus? Or, alternatively, does it need certain kernel hooks to survive and indeed thrive?
Ouch, “If Linux doesn’t like userland FS” should have been “If Linus doesn’t like userland FS” - my fingers are too used to typing the “x” there LOL!
I am not a big fan of the whole concept of “the cloud”.
I simply do not see the point. I want the computing power and storage on my machine, independent of the web.
Storage is cheap and RAM is relatively cheap too. Big file transfers take time.
I am an old Linux user, since some time in 1997. Have been a Ham Op since 1976 and hold commercial radio license. Am very comfortable with all electronics, but not a programmer.
I question the motives for “the cloud” and do not think it is in the interest of the user.
OTOH, if the particular FS (XFS) is only user-space, then I don't see a problem.
Those who depend significantly on “the cloud” will find themselves sitting around wondering “WTF?” when hackers begin targeting “the cloud.”
In my opinion, “the cloud” is an idea to extend the control of the platform to a few big players in computing.
It is about compensating for their inability to significantly improve software to justify the high price. It is especially noticeable because of the Open Source apps, first for Linux and now ported to Windows. For the average user there is no longer a need to purchase an upgraded office suite each time they change computers.
Some of the Open Source apps have functionality that the best closed source does not. Example: regex find/replace in a spreadsheet. Gnumeric does that, it is ported to Windows, and it is free. (There are some things that Excel will do that Gnumeric will not.)
BINGO!
Don’t pay your monthly MS/Apple bill? Then no data for you.
ANYONE with confidential client data has to be insane to use the cloud system. It is just virtual servers with a HUGE liability attached.
Cloud is all about mobility and multiple devices - take those two factors away and it’s no longer a compelling argument. Add those factors back in and I figure it’s here to stay.
In my opinion the security issue outweighs the portability issue. At least for me.
The web compromises us all, but I try to minimize it.
If your data does not reside in your physical control (on your machine), it’s not your data.
Is “cloud” necessarily the same thing as a user space file system?
No and they’ve actually got nothing (that I can see) to do with each other - it’s just that the “cloud” is a sexier topic and more people are aware of the issues. Good question!
No, “the cloud” also includes access to apps as I understand it.
Quote:
Or, alternatively, does it need certain kernel hooks to survive and indeed thrive?
I would think it is like the vfs and vfs_ops.h family of header files and accompanying calls.
You produce alternate functions to be called when the user mounts/dismounts a file system, as well as alternate functions for when any code uses typical file manipulation library calls (open(), stat(), read(), seek(), or a dozen other typical calls).
You create kernel modules to load these into the kernel, and if any utility (anything from “ls” to “write()”, etc.) performs an action within your file system, then your function gets the callout, and you can have your way with the data.
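Very roughly, and from memory, the skeleton looks something like this. It is a hand-wavy sketch, not real driver code; the “toyfs” name and the stub bodies are made up, and a real filesystem needs mount/superblock plumbing on top of it.

/* Sketch of the VFS hooks described above: a module registers a filesystem
 * type, and the kernel calls back into these functions whenever user
 * programs open()/read() files inside that mount.  Not production code. */
#include <linux/fs.h>
#include <linux/module.h>

static ssize_t toyfs_read(struct file *file, char __user *buf,
                          size_t len, loff_t *ppos)
{
    /* the VFS calls this when any program read()s one of our files */
    return 0;                          /* stub: pretend the file is empty */
}

static int toyfs_open(struct inode *inode, struct file *file)
{
    /* called on open(); "ls", "cat", editors, etc. all end up here */
    return 0;
}

/* Table the VFS consults instead of the usual disk-filesystem code;
 * it would be wired to each inode in the (elided) superblock setup. */
static const struct file_operations toyfs_file_ops = {
    .owner = THIS_MODULE,
    .open  = toyfs_open,
    .read  = toyfs_read,
};

static struct file_system_type toyfs_type = {
    .owner = THIS_MODULE,
    .name  = "toyfs",
    /* .mount / .kill_sb would point at superblock setup code here */
};

static int __init toyfs_init(void)
{
    return register_filesystem(&toyfs_type);    /* hook into the VFS */
}

static void __exit toyfs_exit(void)
{
    unregister_filesystem(&toyfs_type);
}

module_init(toyfs_init);
module_exit(toyfs_exit);
MODULE_LICENSE("GPL");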
Well of course this makes sense - everything eventually relates back to the kernel, doesn’t it?
So is the suggestion that Linus can effectively block these hooks from being integrated in, or does he basically say I don’t like it but do it if you must and we’ll see who comes out the winner in the end?
Quote:
“So is the suggestion that Linus can effectively block these hooks from being integrated in, or does he basically say I don’t like it but do it if you must and we’ll see who comes out the winner in the end?”
Well, they aren’t ‘Linux’ hooks alone.
It goes back to good-old raw down-to-earth UNIX days,
and VFS (virtual file systems) grew out of that.
AIX has forms of these calls, as do Solaris, HPUX, Apple’s OSX (since it is a BSD Unix), and all sorts of versions of Linux, etc.
I *think* Windows has VFS calls too.
I could be mistaken, though.
I only did a bit of kernel module stuff, and it has been awhile.
It sounded like good sexy work when I first started coding, but 20 years later when I actually had to do it I found that dealing with the kernel can be nasty.
Timing issues, callouts between system and user space, etc, can be a nightmare to write - much less debug.
I imagine all kernel hooks could be blocked - but then that wouldn’t be the UNIX way.
I think what Linus is saying is that a framework like FUSE (though I’ve never used it) is slow and no substitute for a direct hook into the kernel when you need to perform a lot of file manipulation.
That’s my guess.