Storing information vs. storing bits (was: Three Places Beberg Is Wrong)

Sherry Listgarten sherry@timesten.com
Wed, 2 Jan 2002 15:41:54 -0800


> From: Russell Turpin [mailto:deafbox@hotmail.com]
> Sent: Wednesday, January 02, 2002 5:14 AM
> Jeff Bone:
> >STORAGE. ..
> >
> >INTELLIGENCE. Information isn't power, intelligence is. .. And IMO "the
> >Semantic Web" isn't the answer; it's a hammer, but not everything is nails.
> >..
> 
> Probably. But to beat one of my favorite drums, there is
> something fundamentally wrong when a database stores
> height as 5.5, or a 5 and a 6, or 1691, or 1.69, with
> implied units of feet, feet and inches, centimeters, or
> meters, and it takes a programmer to then sort people by
> height across different databases, files, etc. At one
> level higher, few people should ever see the difference
> between JPG and GIF, or between different audio formats,
> and signatures of the same soundtrack, photo, or video
> should remain invariant across different file
> representations. If you want intelligence rather than
> information, at some point, we have to get beyond the
> bits.
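A minimal sketch of what "getting beyond the bits" could mean for the height example, assuming each stored value carries an explicit unit instead of an implied one. The unit codes, conversion table, and records below are hypothetical illustrations, not any real database's schema:

```python
# Hypothetical conversion table: metres per unit (foot and inch are exact definitions).
METRES_PER_UNIT = {"ft": 0.3048, "in": 0.0254, "cm": 0.01, "m": 1.0}

def height_in_metres(parts):
    """Sum a list of (value, unit) parts, e.g. [(5, "ft"), (7, "in")], in metres."""
    return sum(value * METRES_PER_UNIT[unit] for value, unit in parts)

# Illustrative records from four hypothetical sources, each with its own convention.
people = [
    ("A", [(5.5, "ft")]),           # decimal feet
    ("B", [(5, "ft"), (7, "in")]),  # feet and inches
    ("C", [(172, "cm")]),           # centimetres
    ("D", [(1.69, "m")]),           # metres
]

# One generic sort key works for every source because the unit is explicit.
by_height = sorted(people, key=lambda p: height_in_metres(p[1]))
print([name for name, _ in by_height])  # → ['A', 'D', 'B', 'C']
```

Once the unit travels with the number, sorting across sources is a single key function rather than per-source programmer effort.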

Well, as far as managing heterogeneity in databases goes, there's a lot of
interest in this right now, partly because of XML. A month or two ago I went
to a talk at Stanford that really impressed me, though I'm not sure whether
that was partly because I couldn't understand a bunch of it... Of the part
that did get through, what impressed me was (a) how hard the problem is and
(b) the approach IBM took, which yields a rather friendly tool for an
extremely hard problem.

The talk was about a tool called Clio, being developed at IBM's research lab
in Almaden, which aims (in part) to create XML schemas from relational
schemas. You'd think at first that this is the usual object-relational
mapping problem, but it's harder than I thought. (Maybe I thought wrong...)
What was interesting is that in many places they were forced to require user
input. The tool can work without certain kinds of information, but even when
full information is present, user input is needed. They facilitate that
input by generating specific, carefully chosen examples that highlight the
choice that must be made. You can see the "intelligence" part of the problem
here; the tool had to punt on some of it, though it provides good crutches.
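To make the "carefully chosen examples" idea concrete, here is a hypothetical sketch (not Clio's actual algorithm or API) of one kind of choice such a tool might surface: when nesting employees under departments, should a department with no employees appear in the XML at all? One sample department with no employees is enough to show the two readings side by side. The tables, element names, and the include_empty_depts flag are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical relational source: (dept_id, name) and (emp_id, name, dept_id).
depts = [("d1", "Research"), ("d2", "Sales")]
emps = [("e1", "Ann", "d1")]  # chosen on purpose so that d2 has no employees

def to_xml(include_empty_depts):
    """Nest employees under their departments; the flag encodes the user's choice."""
    root = ET.Element("company")
    for dept_id, dept_name in depts:
        members = [e for e in emps if e[2] == dept_id]
        if not members and not include_empty_depts:
            continue  # inner-join reading: skip departments with no employees
        dept = ET.SubElement(root, "dept", name=dept_name)
        for _, emp_name, _ in members:
            ET.SubElement(dept, "emp").text = emp_name
    return ET.tostring(root, encoding="unicode")

inner = to_xml(include_empty_depts=False)  # inner-join reading
outer = to_xml(include_empty_depts=True)   # outer-join reading

# The sample data is useful only because it makes the two readings differ.
print(inner)
print(outer)
```

Showing a user these two small outputs lets them pick a reading without having to reason about join semantics directly.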

More information at http://www.almaden.ibm.com/cs/clio/ and also at
http://www.cs.toronto.edu/~miller/Research/hetero.html.

-- Sherry.