Zipf's Law.

I Find Karma (adam@cs.caltech.edu)
Wed, 14 Aug 96 16:38:07 PDT


Looks like the decent-coverage illustration might be generally
applicable. 20 web browser versions cover 85% of the users?
Yeah, I can believe that.
-- Adam

From: koen@win.tue.nl (Koen Holtman)
Subject: Re: Conventions for Sharing User Agent Profiles

Shel Kaphan:
>There's trust, and then there's trust. While I (browser user) may
>trust a browser vendor enough to give me a browser I can use safely
>without trashing my filesystem (e.g.), I (service provider) may not
>*believe* everything a browser vendor says about the capability of
>their browser. For instance, a browser vendor might want to advertise
>they are fully compatible with the latest version of Netscape, when in
>fact, there are numerous niggly details about their rendering choices
>that are not done in the same way, and that might not even be noticed by
>the vendor themselves.

I fully agree. If a niggly detail is added to a browser profile, doing
so will usually

1) contradict the browser vendor's marketing department, and
2) make the programmer who got it subtly wrong unhappy.

So you can't really depend on the browser vendor to make such
additions, whether the company is marketing-driven or
technology-driven.

It seems you need independent parties, and a system in which these
parties do not have to worry about browser vendors disliking them.
Hmmm. Sociologically, this is beginning to sound like a rating
problem. PICS anyone?

It seems that the biggest problem is not in the distribution of the
profiles, but in their creation: who would actually spend the
time/money gathering information about subtle incompatibilities? I
can think of a number of answers:

1) a consortium of content providers

2) a third party which sells the info to individual content providers

3) a company selling web content creation tools

Could information about subtle incompatibilities flow freely over the
web in any of these cases?

I think we first need to figure out a social/economic model in which
information about subtle bugs would actually be created, not just for
the 5 most popular browser versions, but for the 100 most popular
browser versions. Building a distribution mechanism for this
information seems to be a trivial matter in comparison.

Some statistics to illustrate how much work you need to do to get
decent coverage:

  the N most popular        account for
  user agent versions       X% of all requests
  --------------------      -------------------
         1                        20.5%
         2                        35.3%
         5                        62.1%
        10                        75.7%
        20                        84.7%
        50                        93.1%
        75                        95.9%
       100                        97.4%
       200                        99.4%
       300                        99.8%
       400                       100.0%
       524                       100.0%

(Statistics based on the user agent strings in ~500K requests; 524
different agent versions were found. For a header like `User-Agent:
Mozilla/2.0 (Win16; I)', only `Mozilla/2.0' was significant in
determining the user agent version. Web robots (accounting for an
estimated 8% of requests) were not filtered out when compiling these
statistics, but I do not expect them to distort the general trend
much.)
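
For concreteness, here is a rough sketch (in Python) of how such a
coverage table could be tallied from an access log. The log file name
and the assumption that the User-Agent value is the last quoted field
on each log line are illustrative only, not a description of the
actual log format used for the figures above:

  import re
  from collections import Counter

  counts = Counter()
  with open("access.log") as log:            # hypothetical log file name
      for line in log:
          quoted = re.findall(r'"([^"]*)"', line)
          if not quoted:
              continue
          agent = quoted[-1]                 # e.g. 'Mozilla/2.0 (Win16; I)'
          parts = agent.split()
          if not parts:
              continue
          # Only the leading product token (e.g. 'Mozilla/2.0') identifies
          # the user agent version; the platform comment is ignored.
          counts[parts[0]] += 1

  total = sum(counts.values())
  cumulative = 0
  for n, (version, hits) in enumerate(counts.most_common(), start=1):
      cumulative += hits
      if n in (1, 2, 5, 10, 20, 50, 75, 100, 200, 300, 400):
          print("%4d most popular cover %5.1f%% of requests"
                % (n, 100.0 * cumulative / total))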

[Maybe we should move this subthread to www-talk?]

Koen.