Quilt (sucessor to Bento)

Joe Kiniry (kiniry@cs.caltech.edu)
Fri, 28 Mar 1997 12:56:15 -0800


i'll be a happier camper if quilt succeeded in solving some of the
short-term issues wrt bento (too heavy-weight, some folks belief that
there was needless complexity in the api and/or implementation, etc.).
it sounds like this is perhaps the case, based upon a reading of this
summary (diff trees, btrees, indexing). note that i do not have
first-hand experience with bento, but i look forward to digging into
quilt if/when it becomes widely available.

we'll just have to see if this technology is transitioned with the
rest of the opendoc camp and then we'll get a chance to look at it
first hand and make a decent evaluation on it applicability.

joe

Ernest N. Prabhakar writes:
> Hello,
>
> Since we've had discussions of OODBs as replacements for filesystems,
> I thought I'd post this for your perusal. Bento is the file format
> for OpenDoc, which may yet survive despite OpenDoc's demise. I
> suspect it doesn't get Rhapsody a whole lot that stylized
> TypedStream's couldn't do, but perhaps there's still something to be
> said for it.
>
> Actually, the main thing I thought FoRK would appreciate is his .sig qoute:
>
> Values have meaning only against the context of a set of relationships.
>
> Something to think about for your book, Adam and Rohit.
>
> -- Ernie P.
>
>
> From: mccusker@netcom.com (David McCusker)
> Subject: Quilt (answer Bento riddle)
>
>
> Friday morning I asked the following riddle.
>
> >What is a whole composed of many uniform but varying sized parts,
> >especially when the parts are reused after earlier incarnations, and
> >are often carefully arranged in regular patterns?
>
> The answer is "Quilt".
>
> Quilt is an alternative name for Bento 2.0. Quilt is not quite
> finished, and I have no information at this time about my prospects for
> completing the effort. Quilt source availability is expected to be the
> same as that for OpenDoc and Cyberdog, however that might evolve in the
> future. However, I plan to finish the incomplete aspects of Quilt,
> even it becomes necessary to do this on my own time. If folks have
> access to Quilt, I'll provide whatever free support I can manage, also
> on my own time if necessary.
>
> [ The following is a description of Quilt that I wrote in February.
> In many respects, the descripion is very similar to that of Bento
> 1.0d5 when the semantics are identical. So this description is a
> good synopsis of what structured storage means when one is talking
> about Bento. Quilt is just a much more efficient version of the old
> version's semantics, with a few additional features like btree's for
> efficiently indexing document content (or simply storing it in a form
> that can be efficiently searched on disk). This material does not
> disclose any confidential information because it is mostly a
> rewording of publicly defined Bento semantics; the Quilt specific
> info just situates the Bento optimizations I've often alluded to. ]
>
> Below I'll tend to talk about Quilt and Bento as if there's not much
> difference, but Quilt has additional features like B+tree's, and scales
> much more efficiently to very large numbers of streams, and to large
> streams which are modified by insertion, deletion, writing, or appending
> (Unlike Unix files, Quilt streams support fast insertion and deletion.)
>
> Quilt can also use runtime resources like memory much more frugally
> than Bento, but this does not really affect semantics or the interface
> significantly. The Quilt format is block structured to support
> efficient paging, whether or not a platform supports VM. And blocks
> can be paged not only from disk to memory, but also from remote to local
> storage; so it's possible to efficiently serve large remote documents.
>
> >This does not add any complexity nor risk to the file system (and
> >allows to stick with standard distributions) and offers applications
> >the flexibility they need.
>
> >From the file system's point of view, a Quilt (or Bento) file is just
> a single file. But from an app point of view, each Quilt file contains
> and organizes an arbitrary number of efficient random access streams
> and/or B+tree dictionaries that are interleaved in varying formats that
> are named and typed so that all content is tagged with metainformation
> that clarifies the purpose of each stream or dictionary.
>
> Quilt could provide a structured storage solution for the Rhapsody
> platform. I don't believe OpenStep has it's own structured storage,
> but it might support Microsoft's structured storage for OLE (however
> this would be much less efficient that Quilt, based on anecdotal
> evidence).
>
> The Quilt system for typing streams is as rich as MIME, if not richer,
> so it could solve metainformation problems that are not best addressed
> by file extensions or file creators and types.
>
> Small streams are represented in compact forms much smaller than a
> the size of a block, so it is practical to have small streams
> containing, say, only four bytes each. At the other extreme, large
> streams are represented by Exodus style Btrees with performance very
> similar to Unix inodes, except that insertion and deletion at any
> offset in the stream is supported with performance equal to writing
> anywhere in the stream.
>
> Quilt supports three different kinds of primitive content, called
> values. These are random access byte streams, b+tree dictionaries,
> and boxes. A box can be considered a directory or folder (but with a
> shorter name). Each box is an arbitrary sized collection of objects,
> where each object has the same meaning as an OpenDoc storage unit.
> (Objects might as well have been named "files", except this meaning
> would collide with the idea of the file containing the entire
> document.)
>
> Each object is a collection of named properties. (Names are tokenized
> on a per document basis in a top level name dictionary.) Other than
> a persistent id, objects are only distinguishable by the properties
> they contain; to mimic a file system, each object would require a
> filename property. An app might decide to use a dictionary to index
> objects by one or more properties, or not as the app deemed
> appropriate.
>
> By convention, each property represents a different kind of semantic
> content in it's object. Each property is a collection of named
> values, where the name indicates the type of the value, which by
> convention implies the format of the value. So by convention, all the
> values in one property are considered alternative representations of
> the same semantic content, but in different formats.
>
> Each value in a property can be a stream, a dictionary, or a box.
> The type of a stream implies the byte format. The type of a
> dictionary implies the sizes and byte formats of <key, value>
> associations. The type of a box might imply invariant relationships
> among the objects contained by the box. Because a value can be a box,
> this allows boxes to be embedded in boxes to create a recursive path
> hierarchy like that in a file system. Each Quilt document contains a
> distinguished top box forming the root of all the doc content besides
> the name dictionary.
>
> In addition to a value's type, a property associates with each value
> a map (linear format dictionary) of references from a value to other
> objects, and a "version" field used either as a generation number or a
> modification date (depending on an app conventions; value type might
> imply version semantics). Storing inter-object references in
> externally inspectable reference maps enables copying graphs of object
> networks from one place to another (like another document) because
> object ids can be swizzled to maintain the original network
> topography. (Btree dictionaries can also maintain direct id
> references, but these cannot be swizzled anonymously when the format
> is unknown without additional conventions for ref location.)
>
> Quilt documents support global transactions that atomically commit or
> abort all changes to a document between two points in time (typically
> coinciding with user level save operations, but just as easily
> corresponding to some low-level app operation). The Quilt file i/o
> interface gives a great deal of latitude to file class
> implementations, and this is what allows transactions to be
> transparently implemented by a standard file class into which clients
> plug-in simple files that need perform only block level reads and
> writes. The standard transaction file class does not support nested
> transactions, but other file classes could do so. The standard class
> journals changed blocks using copy-on- write semantics, using the
> file's empty free blocks as journal space. Old file content is never
> overwritten until after a transaction commits, so Quilt files are
> robust under crashes or other failures.
>
> The same standard transaction file also supports updating other
> readonly files using the same format. This allows a mutable file to
> derive from a readonly base file, where the derived file becomes an
> extention of the base file's address space, and all changes to the
> base file are mapped into the mutable file using copy-on-write
> semantics. Also changes to the derived file are atomically
> transacted. Because all changes occur only in the derived file, this
> allows an update file to "edit" another file without actually
> modifying the other file. This can be a useful approach to editing
> content in some media applications. It also allows files stored on
> readonly media to present the interface of a mutable file; among other
> things, this allows readonly media to be efficiently patched using
> smaller update files on writable media.
>
> Since I've already described so many Quilt features, I might was well
> cover one more. Like Bento, Quilt also supports the idea of "drafts"
> of doc content. A "draft" is a readonly earlier version of a set of
> objects that forms the basis of a more recent readonly or modifiable
> version of that same set of objects, possibly with further additions.
> The purpose of drafts is to provide a specific advantage over simply
> keeping multiple independent copies of content: drafts occupy less
> space because drafts share structure for content that has not been
> modified between drafts.
>
> Unlike Bento, which has a linearly increasing cost as more drafts are
> appended, Quilt has a small cost for drafts that is independent of
> the number of drafts in a given draft path (drafts can actually branch
> like trees). Quilt has no cost associated with supporting the draft
> feature when drafts are not actually used.
>
> Quilt drafts are implemented using boxes. When a box is created, it
> can be specified as a new draft that derives from some other base box
> which is considered the old draft. The base box becomes readonly
> (until all boxes derived from it are destroyed; boxes are reference
> counted). The new derived box contains a very shallow copy of the
> base box which causes the two boxes to share structure. But the
> derived box maintains a dictionary of all old entities in the base
> box, which allows copy-on- write to be implemented before an old
> entity is actually modified. Attempting to modify a readonly entity
> shared with the base box causes that entity to be shallow copied, so
> the mutable copy replaces the old version in the derived box. So a
> derived box differs from its base only to the extent that it has been
> modified.
>
> Copy-on-write is performed at as small a granularity as practical.
> For example, large byte stream values are represented as Btrees. When
> a readonly value Btree is modified, the minimum possible number of
> leaves and nodes in the tree are replaced with mutable copies, with
> other leaves and nodes remaining readonly until they too are modified.
> A heavily modified readonly value might end up replacing most of its
> blocks with mutable versions, with just a few scattered readonly leaves.
>
> David McCusker, structured storage/Bento/Quilt guy, mccusker@netcom.com
> Values have meaning only against the context of a set of relationships.
>