The computing environment used to write this paper is made of three different network technologies (ethernet, wireless ethernet, and serial links) that interconnect a number of different devices including a laptop, a hand-held, a machine used as a cpu server, several terminals, and a file server. Moreover, all these devices serve files to the network, some of them have I/O devices to interact with a human, some other ones have CD readers, some have audio devices, some don't. The set of available devices as well as their location is inconvenient to handle and, since both the laptop and the pocket-pc can move, the actual computing environment used can vary from a bare pocket-pc to a full network of servers and workstations. Should the user be using just the laptop or the pocket-pc, the set of available devices can increase/decrease depending on the location of the user. New printers and I/O devices may be available in a location nearby the user.
To pick up an example to illustrate the motivation for a new operating system, consider that while using our laptop in this environment we would like to be able to say "copy this file to a printer" and let the system discover if there is a printer in the room willing to accept print jobs from our laptop. If there are several printers available, we would like the system to choose any one that understands the format of the file to be printed. If no printer understands the file format but there is a program to convert the data to the printer format, we would like the system to run that program and then queue the result for printing. We have enough technology to be able to do all this with just a single command like
cp /this/file /any/printer
Never wanted to borrow a friend's keyboard for a while to type some input for an application you already started on the handheld? Wouldn't you like your applications to be able to accept redirections on its set of I/O files?
More examples: we would like our editor to be able to ask "save this file to /tmp/file" to the operating system running on the laptop; and let the system discover if the file should be saved to the file system in the laptop (if any), or to the department file server (if available), or to a nearby disk in the room willing to accept requests for temporary storage from our laptop (when available). Again, the code to program this operation should probably be much more simple than it turns out to be on our computing systems.
Although the examples above illustrate the need for a new system, a different example is used through the rest of the paper: implementing a shell for a computing environment like the ones we use today. The example of building a shell for this environment is hopefully more simple to discuss yet still addresses the issues raised above.
The task of our shell is to execute program binaries on a set of available processors. Depending on the set of network connections and devices available, the most appropriate pair of binary file and processor used to execute a new program may vary. When only the ARM based pocket-pc is at hand, the only choice would be to take an ARM binary file from the pocket-pc file system and execute it on the pocket-pc processor. If a file server and a 386-based machine is available, the preferred user choice may be to use a 386 binary file from the file server and execute it on the PC. But any other combination may be also the way to go depending on the actual environment (for example, the binary file for a given program may exist only for the pocket-pc processor and it may reside on the file server.)
Even in the most favorable case(1),
To make things worse, not all network connections have a high bandwidth. Therefore, data transfers (e.g. copying the binary file to the memory of the processor used to execute it) would better be done end-to-end, with no intervention of the node that triggered the transfer. For example, the pocket-pc using a low-bandwidth infrared connection could be executing a program that transfers megabytes between nodes sharing a high-bandwidth ethernet connection. Although there are enough resources to do so, most systems would require such program to run at a different node.
Another related problem is that resources come and go during the life of applications. How can we then update the set of resources used by the application considered? For example, while the shell used as an example is executing, its owner might want to use a different I/O device to interact with it; or perhaps, the output of a process already started by that shell should better be redirected to a new display that has become available in the mean time; or may be the owner of the shell would like to borrow the keyboard from a friend for a few seconds to issue a new command line to the shell. The application shouldn't have to do anything special to admit such kind of redirections, or to discover new devices available.
Plan B is an attempt to address these problems in a simple way. Although probably none of its ideas is new on its own, their combination makes up an environment which hopefully will simplify how applications are built and programmed for our computing environments.
These are the principles Plan B is built on:
System call Purpose cast defines a type conversion change changes the current box name chinfo modifies metadata for a box copy copies one box to another delete deletes a box dot retrieves the current box name forget forgets about an imported box import defines a new name for a box info retrieves type, constraints and metadata for a box kind retrieves the type and constraints for a given box link links one box to another make makes a box selectors retrieves names for boxes contained within another uncast forgets about a type conversionTable 1: Plan B system calls
As it can be seen, Plan B owes much to Plan 9[7], although it incorporates important differences that are described through this paper. These principles lead to a simple system with 14 system calls. The complete list of system calls is shown in table . Following sections describe such calls as well as how these principles work together by discussing the different pieces that make up the system.
A box[2] is an abstraction which is meant to replace the more traditional file abstraction, trying to be expressive enough to capture high-level data semantics and relationships. System resources ranging from an entire application, to a user-settable variable can be represented by boxes. A box may also contain inner boxes, leading to a tree structure. Regarding the box interface, there is no difference between boxes that contain inner boxes and those that do not. All boxes are operated with the same set of operations.
Unlike files, there is no need to open a box before using it; there is
no
open
system call in Plan B. Box operations
use box names, which means that a name is likely to be resolved every time
a box is used. Although this adds some performance penalty, it permits
to use different copies of a resource at different times.
Figure 1: Data transfers do not need to keep the client in the middle.
The most used box operation is copy, which copies a box to another (figure , left). Unlike traditional file copying (figure , right), data can go straight from the source to the target box. This can lead to significant performance improvements[1]. Furthermore, when the network connection between the client and the rest of the system is poor it can make the difference between being able to perform the copy and not being able to do it.
To make an example, the Plan B shell uses two box operations, make and copy, as the means to execute a new program. A request to execute the program lb (list boxes) is understood by the shell as a request to perform the following box operations:
mk /b/proc/lb # create a process box for lb cp /bin/lb /b/proc/lb/prog # load the lb program on it
make("/b/proc/lb", nil, nil, nil);
copy("/bin/lb", nil, "/b/proc/lb/prog", nil);
One sensible question to ask at this point is how to supply the lack of read and write. To permit applications to copy data from their memory to the outside world and viceversa, each address space permits the creation of boxes representing segments. The equivalent of a traditional write would be a copy from a memory segment box to some other box. This mechanism provides a controlled means to export a piece of memory to the rest of the system: if desired, an application's configuration variable can be exported as a box so that others could change its value; the application would notice the change the next time it consults the variable.
The lack of open together with the use of copy may permit applications to operate better under transient network failures, because instead of using streams across the network (which could break) Plan B applications tend to follow a "copy-input, work, copy-results" model. At first sight it could be argued that with copy applications are no longer able to operate on huge ammounts of data; but when the boxes copied are so big that they cannot be handled by the system it can be also understood as a symptom that inner boxes should be used instead.
Another useful operation is link, which is used to link two boxes together. The exact meaning of that depends on the boxes linked. For example, each process is represented by a box (eg. /b/proc/lb) and uses to have an inner box named io1 (e.g. /b/proc/lb/io1). The convention is that this box is used by the process to serve the same purpose of UNIX's standard output. A link from a box to io1 would make io1 point to such box. The next time the process copies something to io1 data will be sent to the linked box instead. Again, the code needed for the application to redirect the output of our example process is quite simple:
link("/b/term/cons", nil, "/b/proc/lb/io1", nil);
We have seen most of the box interface, other operations are delete, which removes a box, info, which retrieves box metadata, chinfo, which updates box metadata, and selectors, which asks for the names of inner boxes to a given box. Since their names give a good approximation to what they do, they are not discussed on this paper.
All boxes are typed and have associated constraints. A box type identifies which other boxes can be used (copied, linked) with the box. A box constraint specifies properties that might be taken into account when operating on the box (e.g. the location of the box). To make it simpler, the type and constraint are specified together as shown below.
Constraints are made of a sequence of constraint elements, each one
specifying a particular property of the box, separated
by the character "!". As a convenience, the constraint is specified after
the type using also "!" as a delimiter. For example,
the box containing a program binary to be run on a Plan B system on a PC
may have the type and constraint
bin!386!wave.
This means that its type is
binary;
specifying also
that the architecture for the binary is
Intel based PC,
and that the box comes from a node whose primary network is the wireless
ethernet used at our department. There is no implementation of Plan B
for a native 386 environment yet. The current implementation runs hosted
on top of the Plan 9 operating system. Thus, the constraint for a binary
box would rather look like
bin!p98!wave
or
bin!p95!wave;
where
p98
means that the architecture is
Plan 9 on a PC
and
p95
means
Plan 9 on an ARM.
Figure 2: Plan B combines constraints to choose appropriate resources.
Constraints are useful to determine which ones of the available boxes are to be used. Therefore, most box operations admit not just a name for the box involved, but also a constraint specifier, to determine which constraints are to be taken into account to select the named box. Figure depicts an scenario for the shell in our example. The shell could perform the system call
make("/b/proc/lb", nil, "/bin/lb", "%", nil)
As it was seen, boxes have names that are similar to traditional path names. A name space is made of a set of boxes found in the network bound to a set of names. The name space glues together the trees in the forest of boxes that are of interest for the application. This is similar to the use of name spaces in Plan 9, although there are some important differences that hopefully will become clear in this section.
Plan B refers to names that have boxes bound to them as "prefixes"; the implementation of the name space is in fact a prefix table. Because each box bound to a prefix may contain inner boxes, box names are made of a prefix followed by a set of name components to be interpreted by the box bound to it. For example, /b/proc is the customary prefix to bind processor boxes, and it leads to names like /b/proc/lb/io0, where lb/io0 is to be interpreted by a box bound at /b/proc. Conventions are important here, because a user would expect all boxes bound to /b/proc to present a similar structure. The same happens with other conventional(2) prefixes like /bin.
Each name space has a current working box, but that is just an absolute name to be prefixed to each relative name used by the process. The concept of a "root box" does not exist in Plan B.
There can be multiple boxes bound to the same prefix and it can be specified whether a new binding is to be used before or after existing bindings. Unlike name spaces for files found on other systems, it is important to notice that boxes have constraints that may determine which one of the boxes with a given name is to be used. An application may want a binary for a program to be copied to a processor for example, and it does not need to know for how many architectures is the binary available nor does it need to know how many architectures have processors available for execution. The command to define a new prefix is
import /a/prefix networkaddress /a/box/name constraint spec
The constraint supplied to import is useful because another feature of Plan B name spaces is that they can listen for resources advertised on the network. It is customary that each server that implements the protocol to serve boxes, also implements an advertising protocol. Using this protocol, it can be requested for any advertised resource matching a given name and constraint to be bound under a given prefix in the prefix table. For example,
import /b/proc any /proc proc!p98 b
When it is found that an advertised bounded box is no longer accessible, it is removed from the name space. Currently, this may only happen after an operation is attempted on the box. Although it has not been tested yet, constraints could be used to provide hints about the expected lease time for a new resource to stay. Besides, the forget system call can be used to forget about a previously imported box.
Boxes must be of the same type to be operated (e.g. copied) together¿for example, /bin/lb and /b/proc/lb/prog are both of type bin. However, each name space carries a set of conversion definitions along with the prefix table. The converter set is made of entries that specify how to convert data of a particular type to get data of a different type. New entries can be added with the cast system call, and removed later with the uncast system call.
The means to do the conversion is usually to run a program; but it may be also a null program. So called null-converter entries mean that we can consider that the source type can be used as the target type. By using converters, conversions on data can be automated. The type system can be circumvented as dictated by the user by means of null converters. Therefore, the type system used for boxes is not as restrictive as traditional strongly typed file systems. In fact, the system knows nothing about a particular type, apart from the type name and constraints. All data is considered as arrays of bytes once the types and constraints have been considered to determine if a particular operation can proceed. This is useful to permit the user to add new types to the system without using user code within the system software.
Converters are started by copying their programs to a compatible processor, which provides a means to establish proxies for data transformation in a semiautomatic way.
When there is no direct source to target type conversion, the system tries to achieve the same effect by plugging together several converters. Again, each one can potentially run at a different place.
Heterogeneity of architectures and networks is dealt with by combining several tools: Presenting all resources as a single abstraction helps to deal with them independently of their architecture and inner structure; Using constraints to express restrictions and features of the resources helps to combine them together; Finally, name conventions are important in that they can make it easy to locate the resources and operate on them.
The implementation also pays attention to this issue and the protocol used to operate on boxes uses strings wherever feasible, because they are portable and meaningful to humans.
The converter set further simplifies the task of combining heterogeneous resources by allowing them to be converted when needed. Although this was meant to be used to automate operations on data, it can be applied to convert between different data formats.
Plan B is designed to work both in hosted and native environments, although currently there is only an implementation to run hosted on top of a Plan 9 system. The binary format used by Plan B on a hosted environment is that of the host system. Thus, a Plan B process on a hosted environment can perform system calls and use libraries from the host environment, to steal facilities from it and export them to the rest of the Plan B system.
Besides, any process on the network implementing the box protocol (called Op) can provide services to be used from a Plan B process. This means that foreign resources can be made accessible by writing server processes speaking Op. No such server exists yet, but it is expected to be easy to implement them since most of the code could be borrowed from the Plan B implementation.
The protocol is simple enough to permit implementations with tiny memory and processor usage. Messages in Op match operations in the interface supplied by Plan B boxes, namely:
Besides, it would not be hard to export a hierarchy of boxes as a hierarchy of files for foreign processes; allowing other systems to use Plan B resources.
Protection is based on authenticated access checking based on access control lists. Each box has a set of permission bits (matching the operations in the Op protocol) to determine which operations can be performed by the box owner and by others. There are no groups in Plan B, should a group be necessary, the boxes can be owned by a user and a speaks for relation may be set for the users in the group.
The authentication protocol is left out of the system and it is assumed that connections between box clients and servers will be authenticated before Op is used on them.
It has to be said that the authentication protocol is disabled in the current prototype, which means that all the parties are assumed to be already authenticated. Nevertheless, an authentication protocol from a different system may be stolen and adapted for its use in Plan B.
To support copy and link, a process can establish that the operation source may speak on behalf the client process owner for the operation affected.
As the title of this section suggests, there is no Plan B storage server yet. Plan B borrows files from the Plan 9 system it runs on. However, the design of the storage system has been done to fit to the box abstraction and it may be useful to take a look at it to better understand the system.
A Plan B storage server is supposed to provide boxes that persist on some secondary storage medium. Unlike files, boxes are made of either raw data or inner boxes depending on the user's point of view. To implement this, a new box made into the storage server will be initially stored as a traditional file with raw bytes. Should the user make an inner box on it, its implementation will keep the name for the inner box and the contents of the inner box as well.
A copy operation of a box with inner boxes is to be accepted by the storage server, and it can be implemented by bundling together the set of inner boxes and recreating them at the target¿In fact, other system boxes outside the storage server are also expected to implement copy in very much the same way.
What has been said can be understood as a traditional file server which knows how to bundle/unbundle a directory hierarchy when copying it to a different place. Besides, the storage server is also expected to advertise boxes to the network as instructed by its configuration. This is needed to export storage resources to those name spaces that care to listen.
It is also expected to admit link operations to replicate boxes among different storage servers. The implementation can be similar to that of file systems with replicated volumes and disconnected operation[8]. This is important when considered together with selection of boxes based on constraints, because it will allow applications to use different box replicas depending on their availability.
A shell is a good example of how an application can benefit from a system like Plan B. It's main task is to start new processes by using boxes with the binary of the program and boxes representing the processor where the binary can run.
As it has been shown in the above examples, the Plan B shell can learn at run time of newly arrived binaries and processors and employ them when needed. Data would flow from the storage server to the processor server without intervention of the node running the shell, which means that the shell can run on a tiny machine with a slow connection yet transfer megabytes of binary data to the node where the data is needed.
All adjustments of the processes created by the shell can be performed remotely by either copying or linking boxes. For example, this is the code used by the shell to arrange for a process to use the standard io boxes used by the shell:
r = random();
sprint(proc, "/b/proc/%s:%d", arg0, r);
sprint(pio0, "/b/proc/%s:%d/io0", arg0, r);
sprint(pio1, "/b/proc/%s:%d/io1", arg0, r);
make(proc, nil, nil, nil);
make(pio0, nil, nil, nil);
link("/b/proc/me/io0", nil, pio0, nil);
make(pio1, nil, nil, nil);
link("/b/proc/me/io1", nil, pio1, nil);
As another example, the means for the shell to wait for the completion of the child is also a using a box. Each process has a legacy box, initially a link to a null box, and is expected to copy its legacy there. If the shell has interest in receiving the legacy of its child, it can link another box to the child legacy and then copy from such box to wherever it wants to. If the box linked is a channel, (similar to a pipe), the shell blocks until something is copied into the channel. Note that this mechanism works across machine boundaries, and that the child process can be destroyed entirely before the parent asks for it; In fact, there is no parent/child relationship for processed on Plan B.
As another example, consider how to export/import data to an arbitrary process. In Plan B, each process has a vm box, which has one inner box per memory segment in the process. A make can be performed on a process' vm box to create a new segment on it. A copy can then copy data to/from the process address space. The Plan B kernel beneath the process considered maintains a table of segment boxes which can be read from user space; this way a process can know the address in memory of its, say, vm/x box and read/write it. This mechanism unifies virtual memory segments, environment variables, and argument passing in Plan B. Note that a new value copied to a memory segment box would be noticed as soon as the process would read the variable, similar to memory mapped files in other systems.
As a final example, there is a usr box in Plan B that represents the set of I/O devices preferred by a user. Conventionally, the shell links its I/O devices to those preferred by the user who started the shell. Through this simple mechanism, started processes can follow the user and use an appropriate I/O device depending on the state of the human using the system. Furthermore, whould the user change his mind, for example, to borrow a friend's keyboard for a while to type to an already started shell, all the user has to do is to link a new set of I/O devices for the involved process.
The current implementation works hosted on top of Plan 9. Plan 9 processes and files (including network connections and rio windows) are re-exported as boxes by the machine dependent part of the Plan B implementation. There are boxes for processes, files, network connections, memory, terminals and some other miscellaneous system boxes. A shell and several box manipulation programs have been implemented.
The performance is affordable for an impatient human, although
no performance tunning has been done. Since the merit of any
performance numbers is of the underlying Plan 9 system, no measures
have been made yet. A native port would be tried before.
Figure 3: Implementation on top of Plan 9
The approach used to implement Plan B (see the figure ) on top of Plan 9 is to employ a separate plan 9 process for the Plan B kernel, and then one Plan 9 process per each Plan B process. More precisely, the Plan 9 process used for the Plan B kernel has one thread (one Plan 9 process actually) per Plan B process. Each thread services system calls for the corresponding Plan B process. System calls are made using RPCs through a pipe between the user and kernel processes. Although this may be slow, it does not cause appreciable delays while using the system. On the other hand, this permits for Plan B processes to use exactly the Plan 9 binary format, which means that the whole plethora of Plan 9 libraries can be used as-is. No OS toolkit has been used to implement Plan B, yet it is benefiting from all sort of facilities provided by a host system.
The source code is 9134 lines of C, counted using wc on all C source files under the src directory, including the source for the few commands implemented. This is just a hint of the simplicity of the system but is not to be taken too seriously because it is mostly due to the fact that Plan 9 processes and other resources are being used to implement Plan B boxes. The source is available upon request.
Plan 9[7] is a distributed system that is built by exporting all resources as files and allowing those files to be accessed through the network. Plan B borrows many of the Plan 9 ideas. There are some important differences though. Unlike Plan 9, Plan B uses boxes instead of files permitting data to flow without the controlling client intervention. This is important to permit clients with bad connectivity to control the transfer of huge ammounts of information. Besides, along with the box abstraction comes the use of constraints to determine box selection according to the expected usage for the resources.
The lack of file descriptors in Plan B provides some degree of transparency under network failures. In Plan B, applications are meant to copy a resource, work on it, and then copy the result back to some other place. While working, resource availability can change and the most appropriate place to place results could change. By resolving the name on each box operation, the location of the box instance is allowed to change. Closely related is the ability to listen for resources in the network by means of box advertisements.
The Inferno operating system[5] offers the ideas of Plan 9 in a highly portable environment. What has been said for Plan 9 can be also applied to Inferno.
Going back in time, old typed file systems as well as modern typed file systems, like the one used by Nemesis[4], are obviously ancestors of the box abstraction, however, such file systems did not use constraints, nor a converter set. There are many other systems that try to permit flexible access to network resources, such as Odissey[6] and Khazana[3]. Most of them do not provide a full operating system environment and expose the application to the native operating system used. The difference in complexity is also to be considered. Nevertheless, work done on those systems to achieve replicated storage with support for disconnected operation can be applied to the design and implementation of the Plan B storage servers.
In the near future, an editor and a compiler set will be ported from Plan 9 to Plan B. By using the system for daily work its shortcommings and strenghts will be hopefully noticed. Later, a native port to run on Intel based PCs will follow. There is much to do in terms of performance tunning too.
Some issues remain open, like how to support parsing for new box types defined by the user and how to make the ad protocol scale. Nevertheless, there is work being done in these areas for other different systems and its results could be applied to Plan B.