From: Gregory Alan Bolcer (gbolcer@endtech.com)
Date: Fri Sep 22 2000 - 13:21:19 PDT
Apologies if this has already been FoRKed, but I have
been unable to find it with our overpowered search engine, and
broader searches, even at Clip2, don't turn up the exact article.
The stunning conclusion:
> a typical dial-up Gnutella host does not have
> sufficient bandwidth to effectively participate as a peer on the network
> when the average query rate exceeds approximately 10 queries per second.
Greg
--
Gregory Alan Bolcer | gbolcer@endtech.com | work: 949.833.2800
Chief Technology Officer | http://www.endtech.com | cell: 714.928.5476
Endeavors Technology, Inc. | efax: 603.994.0516 | wap: 949.278.2805

Bandwidth Barriers to Gnutella Network Scalability
Clip2 Distributed Search Services
September 8, 2000
Summary
The scalability of a Gnutella network to accommodate more users performing more searches is limited by the lowest-bandwidth links prevalent within the network. Usage of the public Gnutella network has grown to the point that a "Dial-Up Modem Barrier" has been hit, with the result that network usability has degraded considerably.
Queries and Average Query Rates
Typically, Gnutella users either run downloaded "servent" software that contains both client and server components, or they use one of several Web sites that provide Gnutella client capabilities without requiring any software installation. In either case, the client interface provides a means for a user to conduct a search of the network of Gnutella hosts reachable within a specified number of network hops. A Gnutella host that receives a query passes it on to all the other hosts to which the host is connected, and so on, until the query has traveled its allotted number of hops. Even if a particular host responds to a query, it still passes the query along so that others may respond. In this relaying of queries, hosts treat each other equally regardless of physical characteristics such as network connection speed or CPU power.
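The relaying described above can be sketched as a hop-limited flood over a graph of hosts. This is only an illustration, not servent code: the function name `flood_query`, the adjacency-dict representation, and the sample topology are all assumptions made here. The sketch also assumes that a host does not send a query back to the neighbor it came from and that a host seeing the same query a second time does not relay it again.

```python
from collections import deque


def flood_query(graph, origin, max_hops):
    """Flood a query from `origin` for at most `max_hops` hops.

    `graph` maps each host to the list of hosts it is connected to
    (hypothetical representation). Returns a pair:
      - dict of host -> hop count at which the query first arrived
      - total number of query messages transmitted on the network
    """
    reached = {origin: 0}
    transmissions = 0
    frontier = deque([(origin, None)])  # (host, neighbor it heard the query from)

    while frontier:
        host, sender = frontier.popleft()
        hops = reached[host]
        if hops == max_hops:
            continue  # the query has used up its allotted hops
        for neighbor in graph[host]:
            if neighbor == sender:
                continue  # assumption: never echo a query back to its sender
            transmissions += 1  # the message is sent even if it proves a duplicate
            if neighbor not in reached:
                reached[neighbor] = hops + 1
                frontier.append((neighbor, host))  # first arrival: relay onward

    return reached, transmissions


# Made-up five-host topology, for illustration only.
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
reached, transmissions = flood_query(graph, "a", max_hops=2)
print(reached, transmissions)
```

Every host on the path relays the query regardless of its own link speed, which is why a single search costs bandwidth at each host it passes through.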
Clip2 DSS visited over a million hosts during a 34-day period in July and August 2000 and counted the number of queries broadcast by each over a fixed time interval. This number included both queries originating at each host and queries being passed along by each host on behalf of others. We obtained an average query rate for each host, and then we averaged over all hosts discovered per day to obtain an average query rate for the Gnutella network for the day. The average query rate is a significant metric because it is a global measure of usage of the Gnutella network. Below, we plot the daily network average query rate:
[Image]
The data show three trends:
(1) The query rate grew by approximately 100% during the "Napster Flood" of July 26-28, the period during which Napster was threatened with imminent shutdown.
(2) Over a period of days following Napster's reprieve, the query rate declined to pre-Flood levels.
(3) The query rate hit a low on August 6 from which it has rebounded to a level comparable to that seen during the Napster Flood. The rebound included a sharp increase between August 13 and August 14.
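The two-stage averaging behind the plotted metric (a rate per host, then a mean over the hosts seen each day) can be sketched as follows. The record layout, the function name `daily_network_query_rate`, and the sample numbers are assumptions for illustration; they are not Clip2's data or tooling, and the sketch assumes one observation per host per day.

```python
from collections import defaultdict
from statistics import mean


def daily_network_query_rate(observations):
    """Average per-host query rates over all hosts seen on each day.

    `observations` is an iterable of tuples (hypothetical layout):
        (day, host, query_count, interval_seconds)
    where query_count includes both queries the host originated and
    queries it relayed during the fixed observation interval.
    """
    rates_by_day = defaultdict(list)
    for day, host, query_count, interval_seconds in observations:
        # Stage 1: average query rate for this host, in queries per second.
        rates_by_day[day].append(query_count / interval_seconds)
    # Stage 2: mean over all hosts discovered that day.
    return {day: mean(rates) for day, rates in rates_by_day.items()}


# Invented observations, for illustration only.
sample = [
    ("day 1", "host-1", 100, 10.0),
    ("day 1", "host-2", 60, 10.0),
    ("day 2", "host-1", 200, 10.0),
]
print(daily_network_query_rate(sample))
```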
Correlations and Scalability Barrier
We note that these trends qualitatively correlate with reports on the usability of the Gnutella network. In particular, during the Napster Flood and in the period since approximately August 14, numerous complaints were posted on Gnutella user forums. Users reported responses to their searches were fewer in number and slower to arrive than in the past. They also found their servent software reporting smaller network sizes than they were accustomed to seeing.
Quantitatively, we have observed a correlation between the average query rate and the responsiveness of the network to Gnutella pings. As the average query rate increases, the fraction of hosts returning Gnutella pongs in response to Gnutella pings decreases.
Based on these correlations, it appears the Gnutella network has a scalability barrier at an average query rate of approximately 10 queries per second. This barrier was hit during the Napster Flood and in the period from August 14 to the present.
Explanation of the Scalability Barrier
We propose that the source of the barrier is a sufficiently prevalent and well-distributed class of hosts connected via dial-up modems at speeds below 56 kilobits per second. Below, we outline an approximate analysis supporting the plausibility of this hypothesis. Our approach is to show that even by a conservative estimation, a typical dial-up Gnutella host does not have sufficient bandwidth to effectively participate as a peer on the network when the average query rate exceeds approximately 10 queries per second.
In a separate study, we found the most common number of connections per host to be three. A typical Gnutella query message is approximately 30 bytes long including header information, and these messages are encapsulated in messages of lower-level protocols. One TCP/IP wrapper alone requires approximately 40 bytes. To make a lower-bound estimate, we will assume one TCP/IP wrapper per Gnutella message and ignore byte contributions from other protocols that may be in play, such as PPP. Altogether, then, the bandwidth required to communicate 70-byte (560-bit) messages between 3 hosts at a rate of 10 per second is 560 X 3 X 10 = 16,800 bits per second.
Gnutella pings are another sort of message broadcast on the network in the same way as queries. Gnutella hosts issue pings when they connect to other hosts in order to harvest Gnutella pongs that contain host information. Analyzing frequency of message types on the network, we have found pings to be approximately twice as common as queries, with a similar message size.
Gnutella traffic includes three other sorts of messages. Gnutella pongs are broadcast in response to pings, and query hits are routed in response to queries. We noted above both qualitative and quantitative results indicating that query hits and pongs become less common as the average query rate increases. In addition, the older yet popular Nullsoft Gnutella v0.56 servent has been observed to broadcast yet another sort of Gnutella message ("push requests") that many other servent programs route. Overall, we estimate that these other message types, taken as a class, occur at a bit rate no less than that consumed by queries, and this lower-bound estimate will suffice for our purposes here.
In sum, these sources add up to at least 4 X 16,800 bits per second, or 67,200 bits per second, exceeding the bandwidth available to a dial-up modem. As the query rate rises into the range of 10 per second, it is clear a host connected to the network via dial-up modem will begin to fall behind in its processing of network traffic. The result would be exactly what has been observed: slow responses or no responses at all to queries and pings.
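The arithmetic of the estimate can be written out step by step. The constants below are the figures quoted in the analysis above; the variable and function names are illustrative choices, and the 56,000 bits per second figure is the nominal dial-up ceiling the report refers to, not a measured throughput.

```python
# Figures taken from the analysis above.
QUERY_BYTES = 30            # typical Gnutella query, including its header
TCP_IP_BYTES = 40           # one TCP/IP wrapper per Gnutella message
CONNECTIONS_PER_HOST = 3    # most common number of connections per host
PING_FACTOR = 2             # pings are about twice as common as queries
OTHER_FACTOR = 1            # pongs, query hits, push requests: at least 1x queries
NOMINAL_MODEM_BPS = 56_000  # nominal dial-up ceiling


def query_bits_per_second(query_rate):
    """Bandwidth needed for query traffic alone at a given queries/second."""
    bits_per_message = (QUERY_BYTES + TCP_IP_BYTES) * 8  # 70 bytes = 560 bits
    return bits_per_message * CONNECTIONS_PER_HOST * query_rate


def total_bits_per_second(query_rate):
    """Lower-bound bandwidth for queries plus pings plus other messages."""
    multiplier = 1 + PING_FACTOR + OTHER_FACTOR  # queries + pings + others = 4
    return multiplier * query_bits_per_second(query_rate)


queries_only = query_bits_per_second(10)
total = total_bits_per_second(10)
print(queries_only)               # 16800
print(total)                      # 67200
print(total > NOMINAL_MODEM_BPS)  # True
```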
We reiterate that this is only an approximate analysis designed to demonstrate the plausibility of the hypothesis that dial-up modems are responsible for the observed scalability barrier. The plausibility is particularly strong given that we have made conservative, lower-bound estimates of several relevant quantities and still obtained a bit rate in excess of that typically achievable with a dial-up modem. To drive our point home, we note that the plausibility holds so long as the bit rate required at approximately 10 queries per second is merely in the regime of dial-up modems. We could ignore the 2X factor due to pings, gauge the total bit rate at 33,600 bits per second, and still have a reasonable argument indicting dial-up modems as the network bottleneck.
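The same model can be inverted to ask at what average query rate a link of a given capacity would saturate. This is a sketch under the report's own lower-bound assumptions (560-bit messages, 3 connections); the function name `saturating_query_rate` and the choice of 33,600 and 56,000 bits per second as example link speeds are assumptions made here for illustration.

```python
BITS_PER_MESSAGE = 560    # 70-byte message, from the analysis above
CONNECTIONS_PER_HOST = 3  # most common number of connections per host


def saturating_query_rate(link_bps, traffic_multiplier=4):
    """Query rate (queries/second) at which the estimated traffic fills the link.

    traffic_multiplier = 4 counts queries, pings (2x), and other messages (1x);
    traffic_multiplier = 2 ignores the 2x factor due to pings.
    """
    bits_per_unit_rate = BITS_PER_MESSAGE * CONNECTIONS_PER_HOST * traffic_multiplier
    return link_bps / bits_per_unit_rate


for link_bps in (33_600, 56_000):  # example dial-up speeds (assumed)
    full = saturating_query_rate(link_bps)
    no_pings = saturating_query_rate(link_bps, traffic_multiplier=2)
    print(f"{link_bps} bps: {full:.1f} q/s (all traffic), {no_pings:.1f} q/s (pings ignored)")
```

Under either multiplier, the saturating rate for these link speeds falls within a factor of two of 10 queries per second, which is the sense in which the required bit rate is "in the regime of dial-up modems."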
Finally, we point out that a barrier exists for each prevalent class of host connection. Were dial-up modem users to disappear from the network, the next barrier would be encountered at the next-highest-speed common class of connection, probably low-speed DSL in the low 100s of kilobits per second range. This would accommodate growth of the average query rate by a factor of a few to 10 over the current average before the same problem would be encountered again.
© Copyright Clip2.com, Inc. 2000. All rights reserved.
This archive was generated by hypermail 2b29 : Fri Sep 22 2000 - 13:25:34 PDT