Parallel File Systems


Lustre was arguably the gold standard of parallel file systems around 15 years ago. It was a free parallel file system, the module was built into and distributed with the Linux kernel, and it was used at a majority of large HPC centers. Despite being a free project, it was "purchased" by Sun Microsystems (along with MySQL, at the time). When Sun finally folded, Oracle Corporation acquired Sun's assets, including Lustre (and MySQL). Shortly thereafter, Oracle announced it would stop supporting Lustre (they're a database company, after all), sending the HPC community into a tizzy. The code for Lustre was forked by multiple organizations (OpenSFS, EOFS, Whamcloud, etc.). Over the past decade or so, Lustre (in its various instantiations) has changed hands multiple times, bouncing through Xyratex, Intel, Seagate, DDN, and probably your grandmother, at some point.

Things have started to settle down a bit in the Lustre community, although it hasn't completely recovered from the splintering. A lot of large HPC centers are still running Lustre (of some flavor), but many of the smaller centers with fewer resources have started looking for something a little more stable with reasonable performance.


PVFS is one of the oldest parallel file systems still available. It has also gone through a fair number of owners and revisions. The code was re-engineered as PVFS2, mostly at ANL (Argonne National Laboratory) with help from Clemson University and the Ohio Supercomputer Center. After a few years, the code was forked into two branches: Orange and Blue. OrangeFS is still maintained by Clemson University and is still targeted toward general-purpose Linux clusters. BlueFS was maintained by ANL, though it's not clear that there's any current work going on with it.

PVFS/OrangeFS provides reasonably good performance for a parallel file system, at least for large block sizes of data. Additionally, it was reasonably easy to set up, even though very small mistakes in specifying the configuration would render your file system inoperable. It seems to be mostly falling out of favor as sites move to more modern parallel file systems.


BeeGFS is a relatively new addition to the parallel file system landscape, but it has generated a lot of interest and has been developing rapidly. I haven't personally worked with it yet (it's on my list for the Raspberry Pi cluster), so I can't attest to performance or ease of implementation and use. It has a large and loyal following, though, so it's worth looking into.


Ceph is another relatively new addition to the parallel file system landscape, with an initial public release in 2012. It's actually been around a lot longer than that, however, and those of us who worked with some of the earlier versions are still feeling some pain from the early development releases.

Nevertheless, development is still ongoing, and Ceph provides some nice features that were designed in from the start, instead of pasted on after the fact like many other options. Ceph may be worth another look, and is another project for my Raspberry Pi cluster.


GPFS (now called IBM Spectrum Scale) is a commercial clustered file system built by IBM. Prior to Lustre, GPFS was the fastest file system around. Even after Lustre became the gold standard, GPFS performed very well, and was made available on Linux clusters, and even on Windows (!).

GPFS takes some getting used to, and sometimes fails or goes sideways in strange and unusual ways. With the other parallel file systems, your system administrator can usually untangle problems. With GPFS (and possibly Lustre), you really need someone who has intimate knowledge of its internal workings before you try running it in production.


WekaIO is a commercial parallel file system, and probably the newest in this list. I've spoken with the people at WekaIO, and they have a good story. I know some centers that are officially "running" WekaIO, but so far I can't find anyone who has actually implemented it and has any real experience to report. As a commercial solution, I won't be able to do any testing with this (at least in my current position), but it's worth looking into if you're looking to deploy a parallel file system and have the time and funding to consider it.


While these are some of the more popular file systems, there are a number of others like Gluster from Red Hat, Google's GFS, Hadoop FS (does anyone really run Hadoop anymore?), etc. The current array of file systems can reach wire-speed performance, so it's unlikely that anything else will be developed or gain wide acceptance until our networks advance significantly.
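
To put "wire speed" in concrete terms, here is a rough back-of-the-envelope sketch in Python. It only converts a link's nominal bit rate into a theoretical peak data rate and compares a measured throughput against it. The function names and the numbers in the example (100 Gb/s links, four of them, 44 GB/s measured) are made up for illustration and are not benchmarks of any file system above. Protocol and encoding overhead are ignored, so real ceilings will be somewhat lower.

```python
def wire_speed_gbytes_per_s(link_gbits_per_s, links=1):
    """Theoretical peak in GB/s for `links` network links.

    Assumes 8 bits per byte and ignores protocol/encoding overhead,
    so this is an upper bound, not an achievable rate.
    """
    return link_gbits_per_s * links / 8.0


def fraction_of_wire_speed(measured_gbytes_per_s, link_gbits_per_s, links=1):
    """How close a measured aggregate throughput is to the theoretical peak."""
    return measured_gbytes_per_s / wire_speed_gbytes_per_s(link_gbits_per_s, links)


if __name__ == "__main__":
    # Hypothetical setup: four storage servers, one 100 Gb/s link each.
    peak = wire_speed_gbytes_per_s(100, links=4)
    # Hypothetical measured aggregate read throughput in GB/s.
    measured = 44.0
    ratio = fraction_of_wire_speed(measured, 100, links=4)
    print(f"peak: {peak:.1f} GB/s, measured: {measured:.1f} GB/s ({ratio:.0%} of wire speed)")
```

If a file system is already delivering most of that theoretical peak, a faster file system buys you very little; the network has to get faster first.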