From: Dan Brickley (Daniel.Brickley@bristol.ac.uk)
Date: Sun Sep 17 2000 - 02:50:12 PDT
On Sun, 17 Sep 2000, Jeff Bone wrote:
> Ya know, this was considered crazy when it was just Doug Lenat doing it...
A few weeks ago, computer scientist Chris McKinstry
announced a plan to harness the brain power of Internet users to fuel an
artificially intelligent thinking machine.
Web surfers flocked to his Mindpixel Digital Mind
Modeling Project website, and McKinstry's database of mindpixels --
"one-bit" pieces of knowledge -- swelled so quickly that his system
became temporarily overloaded.
It's still crazy, to the extent that they think that collecting
declarative factoids is the route to artificial intelligence.
('Gac' is the noise I make when I'm asked to believe this sort of
thing is the way to produce thinking machinery).
That said, this kind of effort could turn into a pretty neat dataset, if
only it had a little more structure. A bit like the little factoid
databases you get with IRC bots.
"All great truths are composed of a multitude
of minor truths, and the minor truths are composed of massive
numbers of atomic truths," says Chris. The
basic facts he hopes to elicit from you and me are the atomic
truths, items of binary consensus fact or mindpixels.
That's the interesting (and IMHO bogus) claim; it's like trying to infer
the meaning of 'adult education' versus 'adult entertainment' from
knowing what 'adult', 'education' and 'entertainment' mean...
Whatever, both databases are pretty much free text right now, which
seems something of a self-imposed hurdle.
As a longstanding RDF zealot, and in the spirit of 'RDF - dry humping
the corpse of Knowledge Representation' (@@url?, Rohit/Adam, can you fix
the FoRK search engine so I can check my scholarly references?; Egroups
is down for 'routine' plumbing so can't check there), here's how I'm
doing this sort of thing in RDF.
1. this isn't KR/AI, it's a big freeform database full of lies and errors
2. big freeform databases are crap unless people use the same identifiers for
the same things
3. On the Web, it's polite to use URIs for unique identifiers
4. Although it'd be nice if people/places/objects all had URIs (URLs, URNs...)
in practice that isn't likely in our lifetimes
5. But that's OK, because most things we want to talk about can be
identified indirectly through URIs, using a property/relation/attribute
with a cardinality of 'at most one', eg. there is atMostOne person
whose personal mailbox is mailto:email@example.com.
6. A good pattern for simple factoid exchange is 'object -
relationshiptype - object'.
So here's the RDF/Webby version of the distributed factoid model: we
harvest simple claims of the form 'object1 - relationshiptype ->
object2'. Relationshiptype has to be a URI,
eg. 'http://xmlns.example.com/hasHomePage'. Object1 would ideally be a
URI (but see below); object2 would either be a URI or an atomic
value such as a string.
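To make the shape of such a claim concrete, here is a rough Python sketch, not a real RDF toolkit. The names Triple, Literal, is_uri and make_claim are made up for illustration, and for now it assumes object1 is always a URI (the harder case comes up further down). The only rule it enforces is the one stated above: the relationship type must be a URI, and object2 is either a URI or an atomic value.

```python
from typing import NamedTuple, Union
from urllib.parse import urlparse


class Literal(NamedTuple):
    """An atomic (non-URI) value, eg. a plain string. Hypothetical helper."""
    value: str


class Triple(NamedTuple):
    """One factoid: object1 - relationshiptype -> object2."""
    subject: str
    predicate: str
    obj: Union[str, Literal]


def is_uri(text: str) -> bool:
    """Crude check: something with a scheme and some content after it."""
    parsed = urlparse(text)
    return bool(parsed.scheme) and bool(parsed.netloc or parsed.path)


def make_claim(subject: str, predicate: str, obj: Union[str, Literal]) -> Triple:
    """Build a claim, rejecting relationship types that are not URIs."""
    if not is_uri(predicate):
        raise ValueError(f"relationship type must be a URI: {predicate!r}")
    if not is_uri(subject):
        raise ValueError(f"this sketch assumes object1 is a URI: {subject!r}")
    if not isinstance(obj, Literal) and not is_uri(obj):
        raise ValueError(f"object2 must be a URI or a Literal: {obj!r}")
    return Triple(subject, predicate, obj)


# Hypothetical example data; none of these URIs come from a real vocabulary.
claim = make_claim(
    "http://example.org/people/dan",
    "http://xmlns.example.com/hasHomePage",
    "http://example.org/~dan/",
)
```

The point of insisting on URIs for relationship types is that two people who both mean 'has home page' end up writing the same string, so their factoids can be compared at all.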
What do we do when we don't know URIs for these objects? The bit that's
needed to join all this stuff up is to be able to make structured
claims about objects that don't typically have URIs. If we get this
right, we can attempt (there are some big subtleties I'm pretending
don't exist for purposes of zealotry ;-) the mother of all data-joins
across diverse collections of factoids. I think we can do much of this
fairly easily. Not worth going into detail but basically, instead of picking
out a restaurant (say) by trying to invent a URI for it, or by
conflating it with its homepage, you say "the restaurant whose home
page is http://..../" or "the restaurant whose phoneNumber is
tel://43523423424" or "the restaurant startedBy the person whose
mailbox is mailto:etcetc". Same goes for people, places etc.
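A rough sketch of how that data-join might look in Python, glossing over the same subtleties. Everything here is an assumption for illustration: the property URIs, the IDENTIFYING set (properties we are willing to treat as 'at most one' identifiers) and the merge_descriptions function are invented, and the restaurant data is made up. Each description is a bag of (property, value) pairs about one unnamed thing; two descriptions are merged when they share a value for an identifying property.

```python
from collections import defaultdict

HOMEPAGE = "http://xmlns.example.com/hasHomePage"
PHONE = "http://xmlns.example.com/phoneNumber"
NAME = "http://xmlns.example.com/name"

# Assumption: for these properties, at most one thing has a given value.
IDENTIFYING = {HOMEPAGE, PHONE}


def merge_descriptions(descriptions):
    """Group descriptions that share a value for an identifying property.

    descriptions: list of lists of (property_uri, value) pairs, each list
    describing one thing that has no URI of its own.
    Returns a list of sets of pairs, one set per distinct thing found.
    """
    parent = list(range(len(descriptions)))

    def find(i):
        # Follow parent links to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # (property, value) -> index of first description using it
    for index, pairs in enumerate(descriptions):
        for prop, value in pairs:
            if prop not in IDENTIFYING:
                continue
            key = (prop, value)
            if key in first_seen:
                parent[find(index)] = find(first_seen[key])
            else:
                first_seen[key] = index

    merged = defaultdict(set)
    for index, pairs in enumerate(descriptions):
        merged[find(index)].update(pairs)
    return list(merged.values())


# Made-up factoids from three different sources.
things = merge_descriptions([
    [(HOMEPAGE, "http://restaurant.example.org/"), (NAME, "Chez Example")],
    [(HOMEPAGE, "http://restaurant.example.org/"), (PHONE, "tel:+1-555-0100")],
    [(HOMEPAGE, "http://other.example.org/")],
])
```

Note that a name is deliberately not treated as identifying: plenty of restaurants share a name, and merging on it would glue unrelated things together. Wrong or spoofed values for an identifying property would cause the same damage, which is one of the subtleties being ignored.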
I'm not claiming this will make everything easy, magically unify all
data etc etc, just that starting off a project for heterogeneous factoid
aggregation without an approach to uniquely identifying stuff seems
like a recipe for hard work and little gain...