Absolute File System Design
Posted on Monday, February 11, 2008.
Butler Lampson and Robert Sproull's 1979 SOSP paper “An Open Operating System for a Single-User Machine” (PDF; also HTML) describes the operating system on the Alto, a groundbreaking personal networked workstation built at Xerox PARC in the 1970s. Many Alto features became everyday aspects of modern computing environments, but one that didn't catch on was the Alto's robust file system.
The Alto file system was designed to withstand disk or programming errors. If some set of disk blocks were lost, the Alto file system would only lose the data contained in those blocks. If you lose a data block from a file, you lose just that data block. If you lose a block in a directory, you lose the name for that file but not the file itself.
Section 3 of the paper gives the details. The basic technique is that each disk block contains a small header giving the id number and version of the associated file along with the block's offset in the file and the amount of data in the block (the last block in a file might contain less than a full block's worth of data). Lampson and Sproull refer to this information as absolute: it is the single point of truth for the block in question. A disk scavenger (like Unix's fsck) can scan the entire file system reading block labels to reconstruct the contents of any file.
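To make the idea concrete, here is a minimal sketch of an absolute block label and a scavenger pass over it. The class and field names (`Label`, `file_id`, `nbytes`, and so on) are illustrative assumptions, not the Alto's actual on-disk format:

```python
# Sketch of an Alto-style block label and a scavenger pass.
# Names and field layout are illustrative, not the real Alto format.

from collections import defaultdict

class Label:
    def __init__(self, file_id, version, offset, nbytes):
        self.file_id = file_id    # id number of the owning file
        self.version = version    # version of that file
        self.offset = offset      # this block's position within the file
        self.nbytes = nbytes      # bytes of data actually used in the block

def scavenge(disk):
    """Rebuild every file from surviving (label, data) pairs alone.

    `disk` maps disk address -> (Label, data).  Because each label is
    absolute (the single point of truth for its block), a full scan
    recovers every file whose blocks survive, with no other metadata.
    """
    files = defaultdict(dict)
    for addr, (label, data) in disk.items():
        files[label.file_id][label.offset] = data[:label.nbytes]
    # Concatenate each file's surviving blocks in offset order.
    return {fid: b"".join(blocks[o] for o in sorted(blocks))
            for fid, blocks in files.items()}

# A toy disk: file 9's last block is only partly full, so its label's
# nbytes field trims the padding.
disk = {
    100: (Label(7, 1, 0, 4), b"abcd"),
    103: (Label(9, 1, 0, 2), b"xy\0\0"),
}
recovered = scavenge(disk)
```

Note that nothing outside the blocks themselves is consulted: delete any one entry from `disk` and the scavenger still recovers every other block intact.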
Scanning the entire file system to find block contents is of course very slow. The block label also contains pointers to the previous and next blocks in a file, so that reading a file can just follow the block pointers (the file system layout algorithm arranged that much of the time files would be laid out sequentially, so that following these pointers would not be too expensive in terms of disk seeks). These pointers are called hints, because they exist only for performance, not for correctness. If a hint is incorrect, that fact will become clear when it is used (the pointed-at block's label will not agree with expectations), and the scavenger can be invoked to fix the file system.
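A sketch of what "checked on use" means for a hint: the reader follows the next-block pointer but verifies the absolute label before trusting it. Again, all names here are hypothetical, not the Alto's:

```python
# Sketch of reading a file by following next-block hints while
# verifying each hint against the block's absolute label.
# Illustrative names only, not the Alto's actual structures.

class Block:
    def __init__(self, file_id, offset, data, next_hint):
        self.file_id = file_id      # absolute: truth about this block
        self.offset = offset        # absolute: position within the file
        self.data = data
        self.next_hint = next_hint  # hint: address of the next block, or None

class HintMismatch(Exception):
    """A hint disagreed with a label; time to run the scavenger."""

def read_file(disk, first_addr, file_id):
    data, addr, offset = b"", first_addr, 0
    while addr is not None:
        blk = disk[addr]
        # Check the label on every hop: a stale hint is caught here
        # instead of silently splicing another file's data in.
        if blk.file_id != file_id or blk.offset != offset:
            raise HintMismatch(f"label at address {addr} does not match")
        data += blk.data
        addr, offset = blk.next_hint, offset + 1
    return data
```

The key property is that a wrong hint can cost a failed read, but never an undetected wrong answer, because the label, not the hint, is authoritative.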
Contrast this scheme to inodes in a modern Unix file system (say, BSD FFS or Linux Ext2/3). If you lose the 512-byte disk block containing a 1GB file's inode, the entire file is inaccessible. No amount of scavenging can help, because the list of blocks making up the file was only contained in the inode, and the blocks themselves are not tagged with which file they belong to. This is much less robust than the Alto file system. Of course, the situation is even worse if you lose the 512-byte disk block containing the file system superblock: then you've lost the entire file system. Because the superblock is so crucial, file systems usually maintain backup copies of the superblock scattered across the disk. Inodes are almost as important but not so carefully guarded.
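The single point of failure is easy to see in miniature. In this hypothetical sketch (not the real FFS or ext2 layout), the block list lives only in the inode, and data blocks carry no owner tag, so losing one inode makes an entire file unreachable even though its data blocks survive:

```python
# Sketch of the inode failure mode: the block list exists only in the
# inode, and data blocks are untagged.  Illustrative layout only.

def readable_files(inode_table, lost_inodes):
    """inode_table maps file_id -> list of data-block addresses.

    A file is readable only if its inode survives.  Because the data
    blocks carry no labels saying which file they belong to, no amount
    of scanning can reconstruct a lost block list.
    """
    return {fid: addrs for fid, addrs in inode_table.items()
            if fid not in lost_inodes}

inodes = {1: [100, 101, 102], 2: [200]}
# Losing file 1's inode strands blocks 100-102, even though every
# byte of file 1's data is still physically intact on disk.
survivors = readable_files(inodes, lost_inodes={1})
```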
Why don't modern file systems use labels like the Alto did?
Disk geometry might play a role: it is hard to reconcile a (say) 32-byte label with a (say) 4096-byte data block. You could put the label in the data block, cutting the actual data size to 4064 bytes. You could put the label in the sector before the block, spending a 512-byte sector on a 32-byte label. Or you could write a label sector containing 16 labels before every 16 data blocks. The third is probably the best option, but all of them are somewhat cumbersome. (But you'd lose less data!)
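The back-of-the-envelope arithmetic for the three placements, using the hypothetical sizes above (32-byte labels, 4096-byte blocks, 512-byte sectors):

```python
# Overhead of the three label placements discussed above, using the
# text's hypothetical sizes: 32-byte labels, 4096-byte data blocks,
# 512-byte sectors.

LABEL, BLOCK, SECTOR = 32, 4096, 512

# Option 1: label inside the data block -> awkward 4064-byte payload.
payload_in_block = BLOCK - LABEL

# Option 2: a whole sector per label -> 512 bytes spent per block,
# about 12.5% overhead, to hold a 32-byte label.
overhead_opt2 = SECTOR

# Option 3: one label sector shared by a group of data blocks.
labels_per_sector = SECTOR // LABEL         # 16 labels fit in one sector
overhead_opt3 = SECTOR / labels_per_sector  # 32 bytes per block, ~0.8%
```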
Another reason is historical: the original Unix file system didn't have any such mechanism, and FFS and Ext2/3 differ little from the original Unix file system.
Another reason might be philosophical: you're supposed to handle disk failures with backups, not clever file system tricks. This is the least useful argument: who wouldn't want a file system that needed to resort to backups less often?
Another reason might be pragmatic: maybe disks fail in different ways now than they did in 1979. (Of course, that doesn't explain the original Unix file system or FFS.) Since there's so little data on disk failures, it's hard to say anything for certain.
Whatever the arguments against labels, it is undeniable that they produced a very robust system. In fact, the operating system on the Alto could be almost completely swapped out and replaced with user programs, each of which could supply its own file system implementation. Over time, there came to be three file system implementations sharing the disk, one in BCPL, one in Mesa, and one in Smalltalk. In a retrospective paper, Lampson wrote:
If a disk address is wrong, the label check will detect the error, and various recovery mechanisms can be invoked. A Scavenger program, written by Jim Morris, takes about a minute to check or restore the consistency of an Alto file system; it can be routinely run by nonprofessional users. As a result of this conservative design, the file system has proved to be very reliable; loss of any information other than bits physically damaged on the disk is essentially unheard of. This reliability is achieved in spite of the fact that many programs besides the standard operating system have access to the disk.
Who wouldn't want a robust system like that? And why aren't more of today's file systems that robust?