Searching for FoRK was Re: "women and intelligent men"

Lloyd Wood (L.Wood@surrey.ac.uk)
Fri, 10 Apr 1998 03:10:46 +0100 (BST)


On Thu, 9 Apr 1998, Rohit Khare wrote:

> PS. Finding the Chitra FoRK url was exceedingly difficult. AltaVista is
> missing about 3,000 posts, so I'm getting worried about setting up a reliable
> search fallback. Hotbot hit the spot, but still had the old xent.w3.org URL.

http://www.ee.surrey.ac.uk/Personal/L.Wood/spacesearch/

I've added the current FoRK archive location as a no-brainer popup
option; in time it might be more useful. As of right now, selecting
FoRK and thumping Find with no search string gives a complete list of
contents in the database:

Searching on +url:xent.ics.uci.edu/FoRK-archive
Entries  Database                 Top article

   1505  Altavista USA and Yahoo  Iowa bans *sober* exotic dancing...
   1505  Looksmart (all)          Iowa bans *sober* exotic dancing...
    496  Infoseek/c|net           Anna "Forager Of New Bits" Graham disgorges...

     36  Altavista Europe         Iowa bans *sober* exotic dancing...
      3  Altavista Australia      Iowa bans *sober* exotic dancing...
      3  Altavista Iberia         Iowa bans *sober* exotic dancing...
      3  Altavista Malaysia       Iowa bans *sober* exotic dancing...
      0  Altavista Canada         (what a joke!)

Searching on +url:xent.w3.org/FoRK-archive
Entries  Database                 Top article

   2434  Altavista Europe         Only two consumer goods call their customers "users."
    156  Altavista Australia      Remind me which phase we're in?
    156  Altavista Iberia         Remind me which phase we're in?
     40  Altavista Malaysia       Remind me which phase we're in?
      7  Infoseek/c|net           Black HW support mentioned in Cringely's column...
      1  Looksmart (all)          Index of /
      1  Altavista USA and Yahoo  Index of /
      0  Altavista Canada         (what a joke!)

[What Are Canadians Saying About AltaVista Canada?
What a sucky piece of crap with no useful content, _that's_
what they're saying about Altavista Canada. Then they go somewhere
else for an index to the US and French servers, too.]

Some interesting hints about database replication/relative size
up there. The increasing diversity in search engine content is a
worrying trend; the web _will_ fragment. AltaCan't is probably the
most obvious example of this thus far.

Hotbot is supposed to have the largest share of the web indexed (34%
vs 29% or thereabouts for Altavista at its best, if you can believe
Wired News - but then Wired do own Hotbot). Since Inktomi can only
search on domain rather than full url strings (most powerful search
engine ever written? ha!), the results aren't directly comparable.

Hotbot supersearch:
xent.w3.org gives 2030, top article "Three or Four's the Charm?"
xent.ics.uci.edu gives 5997, top article "The One one" - but there's an
awful lot of other junk on that server.

For now, I'd use Altavista USA for the new server and Altavista Europe
for the old one.
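
That choice amounts to "take whichever database reports the most
entries for each URL prefix". A minimal Python sketch of it, using only
the entry counts from the two tables above; the dictionary layout and
the best_database helper are illustrative names, not anything the
spacesearch page provides:

```python
# Entry counts copied from the two +url: searches above.
# The structure and the names below are illustrative assumptions.
counts = {
    "xent.ics.uci.edu/FoRK-archive": {
        "Altavista USA and Yahoo": 1505,
        "Looksmart (all)": 1505,
        "Infoseek/c|net": 496,
        "Altavista Europe": 36,
        "Altavista Australia": 3,
        "Altavista Iberia": 3,
        "Altavista Malaysia": 3,
        "Altavista Canada": 0,
    },
    "xent.w3.org/FoRK-archive": {
        "Altavista Europe": 2434,
        "Altavista Australia": 156,
        "Altavista Iberia": 156,
        "Altavista Malaysia": 40,
        "Infoseek/c|net": 7,
        "Looksmart (all)": 1,
        "Altavista USA and Yahoo": 1,
        "Altavista Canada": 0,
    },
}


def best_database(per_engine):
    """Return the database with the most entries.

    Ties go to whichever database is listed first in the dict.
    """
    return max(per_engine, key=per_engine.get)


for prefix, per_engine in counts.items():
    print(prefix, "->", best_database(per_engine))
```

Raw entry counts say nothing about how fresh each index is, so this is
only a first cut at picking a fallback.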

> PPS. When I searched for "xent" on excite, the top ten related terms were, in
> order: khare, rohit, fixations, kohn, mentions, barrera, woodpeckers, crook,
> thau, and abelson. Methinks their database is a wee bit out of date,

hardly surprising.

> unless they REALLY have a thing for peckers.

searching for xent:
2131 hits on Northern Light. *Big* database. Shame about the weak
     query tool.
1746 hits on blue window, but it doesn't prioritise on url strings.
1440 hits on Excite (& AOL)
 254 hits on Lycos Prodigy, but that does a fuzzy search because
     Prodigy users can't spell.
  56 hits on Euroseek
  52 hits on Lycos US/Lost in Space/etc
  40 hits on Lycos UK
  40 hits on OpenText (RIP, but a bit of a coincidence.)

The key to searching isn't formulating the right query. It's starting
with the right search engine. Excite wouldn't be on my list.

The key to relying on the public databases is sending them URL
submissions regularly; form submissions could probably be hacked into
the web archive mechanism.
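
As a rough idea of what such a hook might look like, here is a small
Python sketch that builds one add-URL request per search engine for a
newly archived page. The endpoint addresses and the "url" field name
are placeholders: every engine has its own submission form, and none
of the values below are taken from a real one. The sketch only builds
the request URLs and sends nothing.

```python
from urllib.parse import urlencode

# Hypothetical add-URL form endpoints; replace with each engine's real form.
SUBMISSION_ENDPOINTS = [
    "http://search-one.example/cgi-bin/add_url",
    "http://search-two.example/submit",
]


def build_submission(endpoint, page_url, field="url"):
    """Build a GET-style form submission for one page.

    The field name "url" is an assumption; real forms may differ
    and may require POST instead.
    """
    return endpoint + "?" + urlencode({field: page_url})


def submissions_for(page_url, endpoints=SUBMISSION_ENDPOINTS):
    """Return one submission URL per configured endpoint."""
    return [build_submission(e, page_url) for e in endpoints]


if __name__ == "__main__":
    for request in submissions_for("http://xent.ics.uci.edu/FoRK-archive/"):
        print(request)
```

The archiver could call something like submissions_for() each time it
writes out a new index page, then fetch the resulting URLs on whatever
schedule the engines tolerate.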

L.

<L.Wood@surrey.ac.uk>PGP<http://www.sat-net.com/L.Wood/>+44-1483-300800x3641