Re: Search engine for FoRK

Dan Kohn (dan@teledesic.com)
Wed, 2 Jul 1997 15:41:26 -0700


On Wednesday, July 02, Rohit Khare <khare@mci.net> wrote:

>My ideal
>choice would be a standalone, spider-indexer that works on a separate
>server. Seems like a fine isolation, instead of a tool which indexes the
>file-space and maps them together later.

Have you seen the way people link to Infoseek at, say,
<http://www.cnn.com/SEARCH/index.html>? Is this free? Will Infoseek do
more than 40 pages? How about Hotbot? Linking to an external site
seems like it should be feasible.

>How often does scooter visit sites?
>When we introduced AltaVista, we used to create a completely new
>index every four to six weeks. Now we update our index
>continuously to reflect changes on the Web.
> But we don't use a fixed schedule. Instead Scooter visits pages
>according to the frequency at which they appear to change: a
>page which has been stable for months will be revisited less often
>than a page which is different every time we check it.

This is a bad joke, oddly analogous to AmEx's line that you have no
fixed credit limit. Obviously, there's a fixed credit limit. The King
of Borneo may be able to charge 3 Lear Jets without needing to make an
explanatory phone call (a true story), but I doubt my charge would go
through. In both cases, AltaVista and AmEx are replacing a straight
(if unsatisfactory) answer -- 4 to 6 weeks and $10K -- with a euphemism
-- "continuously" and "no credit limit".

As I told Adam, there's no reason a search engine can't keep up with the
whole Web except for lack of will. Also, note that HTTP/1.1 caching
headers will give spiders incredibly helpful info on how often
they should refresh. (For instance, all the posts in the FoRK
archive should have a refresh setting of one year, while the current
archive's index should be refreshed hourly.)
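
To make the header idea concrete, here is a rough sketch of how a spider
might turn a Cache-Control header into a revisit delay. The function name
revisit_delay, the one-week fallback, and the choice to treat no-cache as
"refetch right away" are all assumptions of mine for illustration, not
how any real spider is known to work. It also assumes the headers arrive
as a plain dict keyed by "Cache-Control".

```python
import re

ONE_HOUR = 3600
ONE_YEAR = 365 * 24 * 3600
ONE_WEEK = 7 * 24 * 3600  # assumed fallback when the server gives no hint


def revisit_delay(headers, default=ONE_WEEK):
    """Return how many seconds a spider may wait before refetching a page.

    `headers` is assumed to be a plain dict of response headers.
    """
    raw = headers.get("Cache-Control", "")
    directives = [d.strip().lower() for d in raw.split(",")]

    # Assumption: a page that forbids caching should be refetched at once.
    if "no-cache" in directives or "no-store" in directives:
        return 0

    # Otherwise honor the first max-age directive, given in seconds.
    for directive in directives:
        match = re.fullmatch(r"max-age=(\d+)", directive)
        if match:
            return int(match.group(1))

    return default


# An old FoRK post versus the current archive index:
print(revisit_delay({"Cache-Control": "max-age=31536000"}))
print(revisit_delay({"Cache-Control": "public, max-age=3600"}))
```

With headers like these, the spider would leave an old post alone for a
year and come back to the index every hour.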

AltaVista is just pathetic, and I've locked myself in by learning their
searching grammar. Has anyone tried Hotbot? Is it better?

- dan