Re: Real-time distributed Web search (Gnutella knockoff?)


From: Kragen Sitaker (kragen@pobox.com)
Date: Thu Jun 29 2000 - 04:08:38 PDT


Nicolas Popp writes:
> I am not sure I understand your point. All I am saying is the type of
> distributed search that lets the destination control the answer to a query
> based on inconsistent criteria (the one that the destination decides upon)
> is likely to produce low quality results. I am not saying that all search
> engines do.
>
> Google is the perfect example of a centralized search engine. . . .

Google summarizes other people's assessments of page quality --- as
expressed by whether or not those other people choose to link to the
page --- to determine its rank. While the search engine itself and the
PageRank computation are centralized, the decisions that determine
whether a page is valuable or not are distributed over the whole
knot of the Web bow-tie.
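
To make the "distributed decisions, centralized computation" point
concrete, here is a minimal sketch of link-based ranking in the spirit
of the published PageRank idea. It is not Google's actual
implementation; the function name `pagerank`, the damping value 0.85,
the iteration count, and the toy link graph are all assumptions made
for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by who links to them.

    `links` maps each page to the list of pages it links to.
    Each link is one author's independent "vote"; only the
    tallying below happens in one central place.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share each round.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c", and "c" links to "a".
toy_links = {"a": ["c"], "b": ["c"], "d": ["c"], "c": ["a"]}
toy_ranks = pagerank(toy_links)
```

In this toy graph the page that attracts the most links ends up on
top, and no page gets to declare its own relevance: the score depends
only on what other pages chose to link to.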

> The fact that Google had to rely on an "external" measure such
> as connectivity to improve relevance seems to confirm my argument that
> letting the destination determine relevance is not the smartest thing to
> do...

I didn't understand that that was what you meant. I apologize for
grumping at you like that.

> >It's ironic that someone from RealNames would post such an assertion.
>
> On the contrary. Because sites will not hesitate to hijack queries,
> RealNames had to create a human editorial process (we call it adjudication)
> to decide whether someone can actually get a keyword. This is expensive
> and believe me, we would rather trust our customers to pick the keywords
> that they are really entitled to (instead of categorical terms that are
> popular queries). However, they don't! Query frequencies are heavily skewed
> to a few million generic terms, and everyone would like to be listed when
> such queries occur.

I didn't know that. Thanks for setting me straight. I simply assumed
the opposite. I can see that only some generic terms (e.g. mp3) are
registered.

Even Google does bring up some "hijacked" matches: for example, the
first hits on "scientology" and "amway" are both the official sites of
the respective cults. (To be fair, most of the remaining search
results are what you'd expect.) On the other hand, the official sites
are all that RealNames returns for those names, perhaps by design.

The results of "aol sucks" and "aol-sucks" on RealNames are rather
disturbing, but it doesn't look like it's malicious --- just surprising
query semantics in the first case and a lousy search engine in the
second. The results for "sex" on RealNames suggest that the search
engine really is lousy --- you just get a passel o' porn. (Look at
Google's results for "sex" for a contrast.)

> I like the distributed approach of InfraSearch. Nevertheless, I also think
> that this architecture can be too easily abused, hence will not produce good
> results in the real-world...

Well, for all our sakes, I hope we can find a way to build such a thing
that will not be abused.

-- 
<kragen@pobox.com>       Kragen Sitaker     <http://www.pobox.com/~kragen/>
The Internet stock bubble didn't burst on 1999-11-08.  Hurrah!
<URL:http://www.pobox.com/~kragen/bubble.html>
The power didn't go out on 2000-01-01 either.  :)



This archive was generated by hypermail 2b29 : Thu Jun 29 2000 - 04:11:32 PDT