Understanding the Distribution
of File Transmission Duration
in the Web
Dept. of Computer Science,
Haifa, 32000, Israel
Dept. of Computer Science,
Haifa, 32000, Israel
1. INTRODUCTION
Describing a Web server as a stochastic queuing system requires knowledge of the service time distribution. There are two major types of service in a Web server: static service involves file transfer, while dynamic service includes the additional work of answering client queries. We restrict our attention to the file transfer service as experienced by the client (that is, the Round Trip Time), where the service time distribution is characterized by the files' transmission-duration distribution (TDD).
Crovella et al. presented evidence that the file transmission-duration distribution is heavy-tailed. Heavy-tailed distributions are characterized by extremely high variability and have dramatic negative effects on performance. An example of a heavy-tailed distribution is the Pareto distribution with shape parameter α and location parameter k, which has the following cumulative distribution function:
$$F(x) = P[X \le x] = 1 - \left(\frac{k}{x}\right)^{\alpha},$$
with α, k > 0 and x ≥ k.
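To make the role of the two Pareto parameters concrete, the following is a minimal sketch, assuming the standard Pareto form F(x) = 1 - (k/x)^α. The function names and the parameter values (α = 1.2, k = 0.5) are illustrative assumptions and are not taken from the measurements in this paper.

```python
import math
import random


def pareto_cdf(x, alpha, k):
    """P[X <= x] for a Pareto distribution with shape alpha and location k."""
    if x < k:
        return 0.0
    return 1.0 - (k / x) ** alpha


def pareto_sample(alpha, k, rng):
    """Draw one value by inverting the CDF: x = k / (1 - u)^(1/alpha)."""
    u = rng.random()  # uniform in [0, 1), so 1 - u is never zero
    return k / (1.0 - u) ** (1.0 / alpha)


# Illustrative parameters only (hypothetical, not fitted to any data set).
rng = random.Random(0)
durations = sorted(pareto_sample(1.2, 0.5, rng) for _ in range(10_000))
print("median duration:", durations[len(durations) // 2])
print("largest duration:", durations[-1])
```

Running the sketch shows the typical heavy-tail signature: the largest sampled value lies far above the median, although every sample comes from the same distribution.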
This paper tries to understand why the TDD is heavy-tailed by analyzing the TDD for each file separately. Our study follows two avenues. First, we obtain new conclusions by further analysis of the BU data set. Second, we perform simulations on the Internet, providing a new data set.
A more refined examination of the BU data set yields evidence that the TDD of the same file from the same server to the same client in the Web is Pareto and therefore heavy-tailed. This empirical result means that every file request in the Web has a non-negligible probability of a very long completion time; this holds even for very small files. In addition, it means that every Web server, including servers that serve a single static page, suffers from bursty service times.
Three major factors might be the cause of the heavy tail of the TDD: the file sizes, the route conditions and the server load.
2. THE INFLUENCE OF FILE SIZES
To study the influence of the file sizes, we compare the TDDs of different files transmitted from the same server to the same client. The file sizes vary from 1 KB to 150 KB, covering most text files. In the simulation, we generate a file request every two seconds and collect the transmission duration statistics. Such a low request rate hardly influences the route or the server load. The sizes of the server files and their popularity are distributed in a realistic way (following ). To increase reliability, we performed the simulation several times for several clients around the world. Our simulation results appear in Figure 1. The TDDs of all files are Pareto. The shape parameters are similar, and the location parameters increase with the file size.
Figure 1. Comparing the transmission duration distributions of files with different sizes that are transmitted from the same server to the same client.
Moreover, when considering files that were transmitted from the same server to the same client in the familiar BU data, the small effect that file sizes do have on the TDD seems to vanish. Almost all the TDDs have a similar shape parameter and a similar location parameter.
Findings of Cohen and Kaplan might explain this empirical result. Their study reveals that DNS query times, TCP connection establishment times, and start-of-session delays at HTTP servers are the major causes of long waits, more so than the actual transmission time. Note that, regardless of the requested file size, the time these steps take for requests made by the same client to the same server is identically distributed.
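Comparing shape and location parameters across files requires fitting a Pareto distribution to each set of measured durations. The fitting procedure is not described in this section, so the following is only a sketch of one common choice, the maximum-likelihood estimator, under the assumption that all samples follow a single Pareto law. The function name and the sample values are hypothetical.

```python
import math


def fit_pareto(durations):
    """Return (alpha_hat, k_hat), the maximum-likelihood Pareto fit.

    k_hat is the smallest observation; alpha_hat is n / sum(ln(x_i / k_hat)).
    """
    if len(durations) < 2:
        raise ValueError("need at least two samples")
    k_hat = min(durations)
    if k_hat <= 0:
        raise ValueError("durations must be positive")
    log_sum = sum(math.log(x / k_hat) for x in durations)
    if log_sum == 0.0:
        raise ValueError("all samples are equal; shape is undefined")
    return len(durations) / log_sum, k_hat


# Hypothetical transmission durations in seconds (not measured data).
alpha_hat, k_hat = fit_pareto([0.21, 0.25, 0.22, 0.40, 0.23, 1.90, 0.31])
print("shape:", alpha_hat, "location:", k_hat)
```

Under this estimator, two files with similar shape estimates but different location estimates would correspond to the pattern reported for Figure 1.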
3. THE INFLUENCE OF ROUTES
To study the influence of the route on the file's TDD, we compare the TDDs of the same file transmitted from the same server to different clients. In the simulation, the clients were located in the East and West USA, Portugal, Brazil, Australia and Taiwan, and the server was located in Israel. These routes are known from previous research to have different quality parameters. The simulation results are shown in Figure 2. The TDDs for all clients are Pareto. The shape parameters are similar, and the location parameters are different.
Figure 2. Comparing the transmission duration distributions of a file that is transmitted from the same server to several clients.
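A common way to compare the tails of several TDDs is to plot the empirical complementary distribution on log-log axes, where a Pareto tail appears as a straight line with slope -α. The sketch below estimates that slope by least squares. It is an illustration, not the procedure used to produce the figures; the function names and the `tail_fraction` parameter are assumptions, and sample values are assumed to be positive and distinct.

```python
import math


def empirical_ccdf(samples):
    """Return (x, P[X > x]) pairs, dropping the largest point where the CCDF is 0."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, (n - i - 1) / n) for i, x in enumerate(xs[:-1])]


def tail_slope(samples, tail_fraction=0.5):
    """Least-squares slope of log10 CCDF versus log10 x over the upper tail."""
    points = empirical_ccdf(samples)
    start = int(len(points) * (1.0 - tail_fraction))
    tail = points[start:]
    if len(tail) < 2:
        raise ValueError("not enough tail points")
    lx = [math.log10(x) for x, _ in tail]
    ly = [math.log10(p) for _, p in tail]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    return sxy / sxx  # approximately -alpha for a Pareto tail
```

If two clients yield roughly the same slope but curves that are shifted horizontally, their shape parameters agree while their location parameters differ.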
4. THE INFLUENCE OF SERVER LOAD
To study the influence of the server load on the file's TDD, we compare the TDDs of the same file transmitted to the same client from the same server working under different loads. We examine five different load levels, from 'no-load' to 'high-load'. The server load is increased by adding more local clients. The simulation results appear in Figure 3. Both the shape parameters and the location parameters differ across load levels.
Figure 3. Comparing the transmission duration distributions of a file that is transmitted from the same server that is working under different loads to the same client.
Since the location parameter is not responsible for the heavy-tailed behavior of the Pareto distribution, we conclude that the distribution of the file sizes within the above-mentioned range and the route quality do not have a major influence on the heavy-tailed behavior of the TDD. Indeed, no correlation has been found between file sizes and the transmission-duration distribution. Nevertheless, although this lack of correlation is specifically mentioned in , it is a common mistake to assume that the heavy-tailed file-size distribution causes the heavy-tailed TDD.
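The claim that the location parameter does not drive the heavy tail can be checked directly from the standard Pareto form. The short derivation below is a sketch under that assumption; the symbols c, Y and m are introduced here for illustration only.

```latex
% Tail (complementary CDF) of a Pareto variable X with shape \alpha and location k
P[X > x] = \left(\frac{k}{x}\right)^{\alpha}, \qquad x \ge k
% On log-log axes the tail is a straight line: slope -\alpha, intercept set by k
\log P[X > x] = \alpha \log k - \alpha \log x
% Rescaling durations by c > 0 (Y = cX) moves the location but not the shape
P[Y > y] = P[X > y/c] = \left(\frac{ck}{y}\right)^{\alpha}, \qquad y \ge ck
% Moments of order m \ge \alpha diverge for every k; the variance is infinite when \alpha \le 2
E[X^m] = \frac{\alpha k^m}{\alpha - m} \ (m < \alpha), \qquad E[X^m] = \infty \ (m \ge \alpha)
```

A change in k therefore shifts the whole distribution without altering how slowly the tail decays, whereas a change in α alters the tail itself.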
The authors wish to thank Mark Crovella for the BU data trace, Roy Friedman and Roman Vitenberg for useful discussions and Idan Zach for performing the simulations.
REFERENCES
- E. Cohen and H. Kaplan. Prefetching the means for document transfer: a new approach for reducing web latency. In Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM 2000), volume 2, pages 854-863, 2000.
- M. Crovella, M. Taqqu, and A. Bestavros. Heavy-tailed probability distributions in the World Wide Web. In A Practical Guide to Heavy Tails: Statistical Techniques and Applications, R. Adler, R. Feldman, and M. S. Taqqu, editors, Birkhäuser, Boston, 1998.
- T. Hettmansperger and M. Keenan. Tailweight, statistical inference, and families of distributions: a brief survey. In Statistical Distributions in Scientific Work, G. P. Patil et al., editors, Kluwer, Boston, 1980.
- R. Nossenson and H. Attiya. The distribution of file transmission duration in the Web. Technical Report CS-2002-09, Technion, 2002. http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-info.cgi?2002/CS/CS-2002-09.