Re: The Metadata Saga

Jim Whitehead (ejw@ics.uci.edu)
Wed, 30 Apr 1997 12:05:07 -0700


At 10:34 PM 4/29/97, Rohit Khare wrote:
>[I prepared this in response to a nameless questioner from the Net]
>[JimW, RS: any comments?]

A few comments :-)

>> The MetaData project?
>>
>> The idea of having more content in HTML pages. So they can be read by
>> programs, rather than just people.

Let's start off by addressing this faulty assumption. Metadata on the Web
should be applicable to resources of any media type. Even on the Web
today, there are many, many resources which are not HTML, such as bitmap
images, Java executables, PDF, PostScript, etc. The idea is to be able to
have descriptive information about Web resources of any media type,
including those which do not have any built-in provision for storing
general-purpose metadata, and never will.

>Here's the scoop: several different metadata initiatives are colliding
>messily in real-time. This is the kind of godawful mess of convergent
>evolution that the W3C *just might* be the right answer to.

It is true that there is some overlap between some of the metadata
proposals. While the proposals listed in this email are in some sense
roughly related (they all use the word metadata), most of them are
complementary, and are not colliding. For example, Dublin Core, MARC, and
RFC1807 (the Dienst bibliographic format) are all bibliographic record
formats, created (as you would hope) by researchers from the digital
libraries community. These formats are *not* intended to solve the
general-purpose Web metadata problem -- for example, none of these
bibliographic record formats can effectively convey PICS-like rating
information. On the other hand, PICS is not a good bibliographic record
format. Thus I would say that PICS and Dublin Core/MARC/RFC1807 are not
colliding in any sense.

>3) WebDAV. Jim Whitehead's team took a detour (IMHO) into storing and
>manipulating 'small' metadata chunks with versioned documents. Immediately
>ran afoul of PEP, HTTP purists (what's with GETMETA?), and heavier
>schemes like...

Wow, so many inaccuracies, so little time. First, for people who aren't
subscribed to either the WebDAV mailing list (w3c-dist-auth@w3.org) or the
dsig collections list, I recently posted a proposal for adding metadata
operations to HTTP, which is available at:

http://www.ics.uci.edu/~ejw/authoring/proposals/metadata.html

In this proposal, I actually do give a framework which explains the
relationship between what I term "large chunk" metadata proposals, like
PICS, PICS-NG, Dublin Core, MARC, Web Data, etc., and the "small chunk"
HTTP extensions proposal. The proposal contains an extensive hyperlinked
reference section, which makes it easy to track down the source material
being described.

Basically, the proposal extends the HTTP object model to create a new
area for state storage within a resource, to be used for the storage of
name/value metadata pairs. While there is no effective upper bound on the
length of a metadata item (and hence you could make a name/value pair
like "PICS-label", "{an instance of a PICS label}"), typically you'd want
to create a link on the resource which points to the PICS label, which is
itself stored as a separate resource. Methods are introduced to create
name/value pairs (ADDMETA), to delete name/value pairs (DELMETA), and to
access name/value pairs (GETMETA). The GETMETA method is bundled with a
simple s-expression-like search syntax, so if you want to get a listing
of all the attributes on a resource you'd pass a search specification of
(OR (AND (name "*")(value "*"))). Hypertext links are defined as a
special type of metadata with some constraints on the format and
semantics of the value of the link name/value pair (e.g.,
name="DAV:/Link", value="Type = {token} Source = {URI} Dest = {URI}").
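
To make this concrete, here is a rough sketch of what a GETMETA request
asking for all the attributes on a resource might look like on the wire.
The host, resource name, and header details below are invented for
illustration; the actual request and response syntax is spelled out in
the proposal itself, so treat this as a sketch rather than a normative
example:

   GETMETA /pub/report.ps HTTP/1.1
   Host: www.example.com
   Content-Type: text/plain
   Content-Length: 32

   (OR (AND (name "*")(value "*")))

A successful response would then carry the matching name/value pairs
back in the entity body.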

The WebDAV proposal supports both small chunk metadata and large chunk
metadata. It doesn't address packaging issues, because there are already
many proposals for how to package metadata. Far fewer proposals actually
address how this metadata is stored and associated with the resources it
describes. Because the proposal is implemented via HTTP, it also provides
the ability to store metadata on resources of *any* content type, not
just HTML. The WebDAV proposal describes "how" metadata is stored and
associated, while efforts like Dublin Core, PICS-NG, Web Data, etc.,
describe "what" metadata is stored, and its packaging.

Thus the WebDAV proposal is complementary to packaging efforts such as
Dublin Core, PICS-NG, Web Data, XML, Digital Signature manifests, etc.

I think your characterization of WebDAV running afoul of PEP is completely
groundless. PEP only describes extensions to HTTP which involve adding new
headers to modify the semantics of existing methods. WebDAV is proposing
to add several new methods to HTTP, and hence is outside the scope of PEP.
This applies equally to methods like COPY and MOVE as well as to methods
like GETMETA. As for HTTP purity, Roy Fielding was present at the meeting
where we crafted the GETMETA method (he helped write the BNF for the search
syntax), and there are few others who can claim the mantle of "HTTP purist"
more effectively than he. So your characterization of WebDAV running afoul
of HTTP purists is also groundless.

>The whole schmear has fallen in Ralph Swick's lap at W3C. He owns the
>helm on coordinating strategy on these issues. I identify this crucible
>as an argument *for* W3C, because it may be that only because we had
>staff in all these areas,

I think Ralph has been doing a great job coordinating the various metadata
efforts, and I agree that the W3C is a natural place for this coordination
activity to take place.

>and because we have real technical people,
>not project managers, we may be able to restabilize this whole knot of
>problems with a dose of 'true gospel' as dispensed by Dan Connolly and
>Tim Berners-Lee.

As opposed to the *unreal* technical people working for Web technology
companies?

In this arena what is needed is less religion, not more.

- Jim