

The ``file'' has been used as an OS abstraction since the epoch, mainly because it is simple to understand. Most users are comfortable thinking that they are opening, reading, writing, and closing files. Some operating systems, most notably Plan 9 [7], go further and rely on the file as their central abstraction. By allowing remote access to files, they provide a simple yet powerful structure for building a distributed computing system.

However, the apparent simplicity of files hides complexity that, sooner or later, has to be exposed to the user. The abuse of ioctl in UNIX environments is one example. The establishment of network connections is another: the file abstraction needs additional machinery to express it. All in all, a network connection should not be a concern for OS users, who usually think in terms of ``copying some data from one place to another''.

The low-level, ancient nature of the traditional read and write operations is far from the semantics of operations as seen by applications. Consider, for instance, an application copying a file. It executes:

  while (read(aFile,buffer))
    write(otherFile,buffer);

When both aFile and otherFile are remote files held by the same file server, how many times should the file data cross the network? The answer should be none, but experience says two: once from the server to the client on read, and once back to the server on write. In many circumstances, data copying can be avoided if the system knows both the source and the target locations of the data. Keeping read and write as separate operations harms efficiency. Boxes provide a single operation (copy) as a replacement for read and write; it specifies both the source and the target location of the data. Thus, copying boxes can potentially be more efficient than copying files by means of file read and write operations.
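The argument can be illustrated with a small C sketch. It is only a model under stated assumptions, not the box implementation: the server is simulated in memory, and every name in it (remote_file, net_bytes, remote_read, remote_write, copy_by_read_write, server_copy) is hypothetical. The counter net_bytes records how many data bytes would have crossed the network.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MAX_DATA = 256 };

/* Hypothetical model of a file held by a remote server. */
struct remote_file {
    char   data[MAX_DATA];
    size_t len;
};

/* Data bytes that crossed the simulated network. */
size_t net_bytes;

/* Client-side read: data travels from the server to the client. */
size_t remote_read(const struct remote_file *f, size_t off,
                   char *buf, size_t n)
{
    if (off >= f->len)
        return 0;
    if (n > f->len - off)
        n = f->len - off;
    memcpy(buf, f->data + off, n);
    net_bytes += n;
    return n;
}

/* Client-side write: data travels from the client to the server. */
void remote_write(struct remote_file *f, size_t off,
                  const char *buf, size_t n)
{
    assert(off + n <= MAX_DATA);
    memcpy(f->data + off, buf, n);
    if (off + n > f->len)
        f->len = off + n;
    net_bytes += n;
}

/* Copy built from read and write: each byte crosses the network twice. */
void copy_by_read_write(struct remote_file *dst,
                        const struct remote_file *src)
{
    char   buf[16];
    size_t off = 0, n;

    dst->len = 0;
    while ((n = remote_read(src, off, buf, sizeof buf)) > 0) {
        remote_write(dst, off, buf, n);
        off += n;
    }
}

/* One operation naming both ends: the server moves the data itself.
 * Only the request would cross the network (not modelled here). */
void server_copy(struct remote_file *dst, const struct remote_file *src)
{
    memcpy(dst->data, src->data, src->len);
    dst->len = src->len;
}
```

In this model, copying a 10-byte file with copy_by_read_write leaves net_bytes at 20, while server_copy leaves it at 0, because the operation names both the source and the target and the data never has to leave the server.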

Going further, is it adequate to maintain the notion of ``opening'' a file in a distributed system? Self-contained operations (which do not require maintaining a ``connection'') can be more adequate in a distributed environment. This has been shown by systems we use every day, including stateless file systems like NFS [9]. In mobile environments, avoiding connection maintenance can be even more beneficial. Boxes are operated by means of self-contained operations, which do not depend on the server maintaining per-client ``connection'' state.
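A minimal C sketch of what ``self-contained'' means here follows. It is an illustration under assumptions, not the box protocol: the in-memory store, the fetch_req structure, and serve_fetch are all hypothetical names. The point is that each request names the object and the byte range it wants, so the server needs no table of open handles or per-client offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical server-side store: named, read-only byte strings. */
struct stored_item {
    const char *name;
    const char *data;
};

static const struct stored_item store[] = {
    { "/notes/todo", "buy milk; write paper" },
    { "/notes/done", "read Plan 9 docs" },
};

/* A self-contained request: it carries the name, the offset, and the
 * length. There is no handle and no prior open step. */
struct fetch_req {
    const char *name;
    size_t      off;
    size_t      len;
};

/* Serve one request using only the request itself and the store.
 * Returns the number of bytes copied, 0 past the end of the data,
 * or -1 if the name is unknown. */
long serve_fetch(const struct fetch_req *req, char *out, size_t outsz)
{
    assert(req != NULL && out != NULL);
    for (size_t i = 0; i < sizeof store / sizeof store[0]; i++) {
        if (strcmp(store[i].name, req->name) != 0)
            continue;
        size_t total = strlen(store[i].data);
        if (req->off >= total)
            return 0;
        size_t n = total - req->off;
        if (n > req->len)
            n = req->len;
        if (n > outsz)
            n = outsz;
        memcpy(out, store[i].data + req->off, n);
        return (long)n;
    }
    return -1;
}
```

Because the server keeps nothing between calls, the same request can be repeated after a lost reply or a server restart and yields the same answer, which is the property that makes this style attractive for unreliable or mobile links.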

Consider also some data being shared by different users or applications. How do files capture and handle sharing semantics? Usually on a share-all-or-nothing basis, and typically by imposing the same sharing semantics on every shared file [9,10]. Boxes provide a share operation to express that two different boxes should be seen as a single one. The high level of the abstraction allows different underlying implementations on a per-box basis.

Yet another inconvenience of most file systems is that they provide either:

Untyped files, which do not capture application semantics and are error-prone (e.g. they can lead to binary process images being ``nicely'' displayed on the user's terminal).
Strongly typed files, which are typically too restrictive. They frustrate users: editors refuse to edit files because their type is not ``text''; compilers refuse to compile compiler-generated source files because they are ``output'' files; etc. This model also raises the question of which types the system should provide and how to build new ones.

In both cases, opportunities to automate data conversions are lost. As an example, figures written in FIG format can be translated to encapsulated PostScript by using a converter. Although the conversion is really simple, it is often hard for users to automate, even on strongly typed file systems. (Of course, make or a similar program can be used, but files are not helping to automate the task. Besides, makefiles tend to get complex quickly, and many users would consider them part of the problem, not part of the solution.)

Boxes are typed, and need to be type-compatible to be copied or shared. However, boxes can be type-converted implicitly, on demand, to alleviate user frustration. There is a per-application type-converter set to automate translations. By using converters, boxes help in two different ways: (1) they invoke translators automatically to perform transformations; and (2) they are able to keep the ``generated'' data up to date (as will be seen in section 2.3). The converter set defines how strong the typing of boxes is.

Some applications might tailor their converter set to maintain the strongly typed nature of boxes, while others might use it to effectively nullify the type system (e.g. by inserting a universal type converter). Therefore, the type system used for boxes is not as restrictive as traditional strongly typed file systems: users can relax and adjust it at will by means of the converter set.
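How a converter set can tune the strength of the typing may be sketched in C as follows. This is an assumption-laden illustration, not the actual box interface: the converter structure, the use of "*" as a wildcard type, typed_copy, and both converter functions are hypothetical, and fig_to_eps is only a stand-in that tags the data instead of performing a real FIG to EPS translation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*convert_fn)(const char *in, char *out, size_t outsz);

/* One entry of a per-application converter set (hypothetical layout).
 * The type name "*" is assumed to match any type. */
struct converter {
    const char *from;
    const char *to;
    convert_fn  run;
};

/* Stand-in for a real FIG-to-EPS translator: it only tags the data. */
int fig_to_eps(const char *in, char *out, size_t outsz)
{
    const char *tag = "eps:";

    if (strlen(tag) + strlen(in) + 1 > outsz)
        return -1;
    strcpy(out, tag);
    strcat(out, in);
    return 0;
}

/* Universal converter: passes the bytes through unchanged. */
int pass_through(const char *in, char *out, size_t outsz)
{
    if (strlen(in) + 1 > outsz)
        return -1;
    strcpy(out, in);
    return 0;
}

/* Copy src into dst, converting on demand. Returns 0 on success, or -1
 * if the types differ and no converter in the set applies. */
int typed_copy(const struct converter *set, size_t nset,
               const char *src_type, const char *src,
               const char *dst_type, char *dst, size_t dstsz)
{
    assert(src != NULL && dst != NULL);
    if (strcmp(src_type, dst_type) == 0)
        return pass_through(src, dst, dstsz);
    for (size_t i = 0; i < nset; i++) {
        int from_ok = strcmp(set[i].from, "*") == 0 ||
                      strcmp(set[i].from, src_type) == 0;
        int to_ok   = strcmp(set[i].to, "*") == 0 ||
                      strcmp(set[i].to, dst_type) == 0;
        if (from_ok && to_ok)
            return set[i].run(src, dst, dstsz);
    }
    return -1;
}
```

With a set containing only the entry { "fig", "eps", fig_to_eps }, a copy from a ``fig'' source to an ``eps'' target succeeds and any other mismatched copy is refused, which keeps the typing strong. With the single entry { "*", "*", pass_through }, every copy succeeds and the type system is effectively switched off.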

Finally, another deficiency of the file abstraction is that it was designed for traditional computing systems with some core memory and a bunch of secondary storage. Nowadays, neither palmtops nor smart cards have disks, and remote memory can have better latency than a local disk. The distinction between disk memory and core memory is no longer so clear. The need to patch the file abstraction with a separate memory-map operation, and the need to emulate files in diskless environments (e.g. PDAs) equipped with just a bunch of non-volatile memory, may be symptoms that the file is an aging concept and that different abstractions have to be devised.

Certainly, it would be feasible to implement a higher-level abstraction on top of files. However, boxes are as simple as files and could be used as a central abstraction for structuring OS services, without yet another layer of software to adapt inadequate lower-level system services. Note that introducing another layer of software on top of files to implement a box-like abstraction is likely to degrade system performance.

In a few words, the file is a good and simple concept, but other abstractions, like the one we are proposing, could be a better choice for modern computing systems. It is precisely the success of files that has hindered the birth of new, potentially better, alternatives. Sadly, even those authors who argue that the ``everything is a file'' model has had its day ended up proposing ``integrated file systems'' as the replacement for file systems [3]. We think it is time to seek a new abstraction that can fulfill the needs of modern computing systems. In what follows we try to do so.
