Proactive Web Caching with Cumulative Prefetching for Large Multimedia Data

Jaeyeon Jung1, Dongman Lee2, and Kilnam Chon1
1Korea Advanced Institute of Science and Technology
Department of Computer Science
2Information and Communication University
Computer and Information Systems Group


Web proxy caching provides an effective way to reduce access latency and bandwidth consumption, and thus improve performance of the World Wide Web. Due to the simplicity of incorporating various types of data into the web, non-negligible access for large multimedia data is observed. Large multimedia data transfer, requiring a long-lived connection with consecutive data transmissions, often fails over wide area networks. Repetitive attempts to fetch large objects through proxy caches would waste network resources with no success guaranteed. In this paper, we propose on-demand cumulative prefetching to enhance proxy caches for better data availability of large multimedia files. Our analysis shows that on-demand cumulative prefetching can be performed within a few trials with minimal additional bandwidth requirement. The experimental results demonstrate that the proposed scheme substantially reduces error rate and increases byte hit rate for large multimedia data.

Keywords: Web proxy caching; Prefetching; Data availability; Cumulative

1. Introduction

As the incorporation of multimedia data becomes prevalent, the web exhibits variability in its workload characteristics [1][2][14]. In particular, web object sizes vary widely, in the range of 10^2 to 10^8 bytes depending on media type. HTML documents and embedded images are in the range of 1Kbyte to 10Kbyte, but executable binaries and multimedia data such as mp3 audio and mpeg video files easily exceed 1Mbyte [9]. Previous analyses [1][2][11][14] show that non-negligible access is made to large multimedia data, though the majority of web data access is for small text and image objects.

Large multimedia data transfer often fails over a wide area network like the Internet, where it is hard to sustain a long-lived connection with consecutive data transmissions. As shown in figure 1, the transfer error rate increases exponentially as the object size grows beyond 100Kbyte, and the error rate for objects larger than 10Mbyte is over 80%.

Proxy caching provides an effective way to reduce access latency and bandwidth requirements for Web content transfer. It copies recently accessed data to a location closer to clients and improves data availability. However, conventional proxy caches stop fetching and send an error message to the client as soon as an error occurs, and the partially fetched data is discarded from the cache. We analyzed traces taken from real-world proxies and observed that, independent of object size, users are highly likely to re-access data they have accessed more than once (see section 3 for details). Subsequent requests for the same objects will meet the same fate unless network conditions improve substantially in the meantime. Consequently, repetitive attempts to fetch these large objects through proxy caches waste network resources with no success guaranteed.

Figure 1: Error rate by object size for both traces. The plots are smoothed to eliminate small spikes. Size is shown on a log scale.

In this paper, we propose an enhanced proxy cache that improves the data availability of large multimedia objects by leveraging the observed user access pattern: as soon as a request fails twice, the proposed scheme starts prefetching the failed data so that subsequent requests succeed. In addition, to avoid abusing network bandwidth under the high failure rates of large data transfers, we employ cumulative fetching, which accumulates data across successive retrievals. Since a proxy cache sits at an aggregation point for the web traffic of a large user community, a cache that prefetches data can naturally adapt to user access patterns. Whereas conventional caching is passive, the proposed scheme is proactive. Proactive caching with on-demand prefetching can improve the performance of a proxy cache as well as the availability of large multimedia data.

We have implemented an on-demand cumulative prefetching mechanism within a popular proxy cache, Squid [16], and evaluated its performance in several ways. The analysis shows that the proposed scheme makes the requested data available within a few trials and requires much less additional bandwidth than the conventional scheme. The experimental results show that the proposed scheme substantially reduces error rates (by 40%), saves network bandwidth (by 13.18%), and enhances byte hit rates (by 8.1%) for large multimedia data.

The rest of this paper is organized as follows. Section 2 discusses related work. Section 3 describes the workload characteristics that motivate the use of on-demand prefetching and cumulative fetching within a proxy cache. Section 4 presents a numerical analysis over a static network, where the probability of an error in a unit transmission does not change over time; the model is applied to estimate the bandwidth cost and the improvement in byte hit rate of the proposed scheme. We then outline the design and implementation of the on-demand prefetching mechanism and demonstrate its performance benefits using both experiments and trace-driven simulations in section 5. Conclusions and future directions follow in section 6.

2. Related work

There have been many proposals on using prefetching and caching for latency reduction [8][10][18]. By prefetching objects into proxy caches, references to the prefetched objects can be satisfied without further communication with the server, thus reducing network latency. Kroeger et al. analyzed the efficiency of an approach combining caching and prefetching based on proxy logs [10]. They found that the combined approach reduces latency by up to 60% when the entire reference stream is known in advance. In practice, prediction does not cover all future references, and there is a finite chance of false prediction. Wang and Crowcroft examined the relationship between bandwidth consumption and the probability of correct prediction [18]. Using a queuing model, they showed that bandwidth utilization increases rapidly when prefetching is used to ensure network latency reduction.

Mirroring is a common way to avoid abusive bandwidth consumption due to repetitive failures in caching large multimedia data. It predicts error-prone data and copies it before an actual request is made. Mirroring operations are usually scheduled when large bandwidth is available (e.g. in the middle of the night) so users can get fast responses from the mirrored copies during the daytime. We call this off-line prefetching. Mirrored copies avoid most failures caused by network congestion, as mirror servers are located before bottleneck points. However, the effectiveness of mirroring is limited by the accuracy of prediction because of the dynamics and high rate of change of WWW traffic patterns [5]. Responding to changing traffic patterns is extremely costly, if not impossible [3].

Feldmann et al. analyzed the amount of bandwidth wasted by aborted connections using simulations [6]. They focused on bandwidth saving rather than on improving data availability through a proxy cache. We present further analysis of the effects of resuming broken downloads in accordance with user access characteristics in the following sections.

3. On-demand Prefetching

In this section, we present analysis results of real-world proxy behavior and use them to motivate the on-demand prefetching mechanism. We obtained log files from 18 caches: a set of caches located on one of the major backbones in Korea, and a root cache [12] located at the vBNS [17]. Figure 2 shows the cumulative distribution of the number of accesses and the amount of traffic volume by object size. The traffic volume of objects larger than 1Mbyte represents more than 40% of the total traffic, a significant amount considering that these objects account for less than 1% of accesses. Combining this with figure 1, we can expect that a large amount of traffic is wasted due to errors. Table 1 lists a summary of the analysis results.

Figure 2: Number of accesses vs. amount of traffic volume

First, about 50% of failed data objects are re-accessed within a day. Table 1 shows the percentage of re-accesses for each data set. Such highly repeated attempts make on-demand prefetching comparable to the off-line one. In other words, we need not delay prefetching until nighttime in order to determine prefetchable objects by analyzing log files. Instead, the on-demand mechanism initiates prefetching at any time, improving performance through prompt reaction to user demands. Second, most failures are transient. In the analysis, we classify failures as either connection errors or transmission errors based on the length of the fetched object: objects with zero length count as connection errors. Table 1 indicates that less than 4% of failures are caused by connection errors. Connection errors include server failures and network partitioning, which require special recovery efforts. Transmission errors, on the other hand, are mainly caused by network congestion, which is automatically resolved by TCP mechanisms. Thus, we define prefetchable objects as failed objects with a non-zero fetched length.
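The classification above can be sketched as a small filter over proxy log records; the function and field names here are illustrative, not Squid's actual log schema.

```python
def classify_failure(fetched_length, expected_length):
    """Classify a transfer by comparing the fetched length with the
    expected length (the HTTP content-length), as described above."""
    if fetched_length == 0:
        return "connection-error"    # server failure or network partition
    if fetched_length < expected_length:
        return "transmission-error"  # typically congestion; transient
    return "complete"

def is_prefetchable(fetched_length, expected_length):
    # Prefetchable objects are failed objects with a non-zero length.
    return classify_failure(fetched_length, expected_length) == "transmission-error"
```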

Total statistics

                                            Data set
Objects fetched                             4,010,773 (100%)
Objects fetched with errors
Objects fetched with zero object length
Bytes fetched (Kbyte)
Bytes fetched with errors (Kbyte)           8,652,600 (13%)     14,654,932 (21%)
Percentage of objects re-accessed

Table 1: Summary of analysis results (percentages in parentheses are relative to the category above)

The performance of the proposed method depends on how quickly prefetching can be accomplished, while that of off-line prefetching depends on prediction accuracy. Figure 3 shows the distribution of inter-access times for broken data objects. If prefetching completes within 1000 seconds of the first error, 97% of subsequent requests can be protected from repeated errors. Unlike off-line prefetching, which is usually performed when relatively large bandwidth is available, on-demand prefetching competes for bandwidth with user transactions. It is therefore important to minimize network resource consumption.

Figure 3: Cumulative distribution of inter-access times

To make prefetching cost-effective and fast, we introduce a cumulative fetching strategy. When an error occurs, existing techniques stop the transmission and send an error message to the client; the portion of the file fetched up to that point is discarded from the cache. Instead, cumulative fetching invokes the range request specified in HTTP 1.1 [7] to fetch the rest of the data and assemble the parts into a complete object.
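A minimal sketch of the range-request mechanics, assuming zero-indexed byte offsets; a real proxy would send this header through its HTTP client and append the 206 (Partial Content) body to the stored prefix.

```python
def range_header(offset, expected_length):
    """HTTP/1.1 Range header asking for the missing tail of a
    partially fetched object (byte positions are inclusive)."""
    return {"Range": f"bytes={offset}-{expected_length - 1}"}

def assemble(prefix, tail):
    """Join the previously cached prefix with the newly fetched tail."""
    return prefix + tail
```

For example, an object with an expected length of 1000 bytes that broke after 400 bytes would be resumed with `Range: bytes=400-999`.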

We expect two significant benefits from cumulative fetching. First, it reduces bandwidth demands by retrieving only the missing portion, starting at an offset from the beginning of the data. Figure 4 shows the average amount of data fetched before an error across the data size spectrum. The straight line represents the expected length taken from the content-length field of the HTTP reply header. On average, 38% of failed fetches fail after more than 50% of the expected amount has been retrieved. This implies that a considerable amount of data is discarded that would be saved if cumulative fetching were used.

Second, fewer trials are required to prefetch the entire data. Since the error rate increases with object size under given network conditions, repeated attempts are also more likely to fail. Figure 5 shows the average proportion of data requests that failed more than twice. It clearly illustrates that the chance of the same request failing again increases as data objects get bigger. Cumulative fetching, on the other hand, reduces failures since the amount to fetch shrinks with each subsequent attempt. Thus data can be retrieved in fewer trials and bandwidth consumption is minimized. The following section presents a numerical analysis of data fetching over error-prone network links and compares cumulative fetching with the conventional approach in terms of bandwidth consumption and number of trials.

Figure 4: Fetched length vs. expected length

Figure 5: The average proportion of data requests having errors more than twice

4. Analysis of Bandwidth Requirement and Byte Hit Rate of On-Demand Prefetching

4.1 Model

To understand the correlation between error rate and object size, we develop a numerical model of the bandwidth requirement and the number of trials required to fetch data completely. We assume that the data is composed of n units and that each unit has the same failure probability during transmission. To simplify the model, we consider a static network, i.e. the network condition does not change over time. Transmission of the entire data is considered successful if all n consecutive unit transmissions are performed without errors. Let p be the error probability of each unit and P_n the error probability of data with n units; then

P_n = 1 - (1 - p)^n.

Therefore, the average number of units that are successfully delivered during a transmission is

E[F] = \sum_{k=1}^{n} (1 - p)^k = \frac{(1 - p)(1 - (1 - p)^n)}{p}.

Figure 6 compares P_n from the error model with real data. In this figure, we assumed that n increases linearly with the object size. Although the measured error rate increases more sharply with object size, the numerical model explains the correlation between error rate and object size well.

Figure 6: Error rate by object size. The solid line represents the error rate calculated from the model.
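The static-network model above reduces to two short formulas, sketched here for checking the plots numerically:

```python
def data_error_rate(p, n):
    """P_n = 1 - (1 - p)^n: probability that at least one of the n
    consecutive unit transmissions fails."""
    return 1 - (1 - p) ** n

def avg_units_delivered(p, n):
    """Expected number of units delivered in a single attempt:
    the sum over k >= 1 of P(the first k units all succeed)."""
    if p == 0:
        return n
    q = 1 - p
    return q * (1 - q ** n) / p
```

Even a tiny per-unit error probability compounds: with p = 0.001, an object of n = 10^4 units fails with probability above 0.9999, which mirrors the sharp rise of the measured curves.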

4.2 Cumulative fetching vs. repetitive fetching

In order to compare the bandwidth consumption of the different strategies, we define a bandwidth cost for a given value of P_n. The bandwidth cost is the number of units transmitted on the links. We assume that transmission repeats until the entire data is fetched. The total bandwidth cost of repetitive fetching is then the sum of the bandwidth costs of all trials. That is,

C_{RF} = \sum_{i=1}^{\infty} (1 - P_n) P_n^{i-1} C_i,

where C_i = (i - 1) E[F] + n is the average bandwidth cost when the fetching is accomplished in the ith trial, which happens with probability (1 - P_n) P_n^{i-1}.

Similarly, the average number of trials is given by

T_{RF} = \sum_{i=1}^{\infty} i (1 - P_n) P_n^{i-1} = \frac{1}{1 - P_n}.

As cumulative fetching resumes data fetching at the point where an error occurred, its total bandwidth cost is simply n,

C_{CF} = n,

and the average number of trials can be described as

T_{CF} = \sum_{i=1}^{\infty} i (1 - P_{n_i}) \prod_{j=1}^{i-1} P_{n_j}, with n_i = n - \sum_{j=1}^{i-1} F_j,

where F_j represents the number of units fetched at the jth trial when the transmission ends in an error. In this case, the number of units to be transmitted at the ith trial is decreased by \sum_{j=1}^{i-1} F_j, so the probability that the transmission succeeds in the ith trial becomes

1 - P_{n_i} = (1 - p)^{n - \sum_{j=1}^{i-1} F_j}.

Figures 7 and 8 depict the bandwidth consumption and the number of trials for each strategy. The results show that repetitive fetching of error-prone data consumes significantly more bandwidth, and that its number of trials increases as the error rate grows. Cumulative fetching requires a constant bandwidth cost independent of the error rate and allows prefetching to complete within a couple of trials even when the error rate is 80%.

Figure 7: Bandwidth requirement by error rate
Figure 8: Number of trials by error rate
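The two strategies can also be compared by direct simulation; below is a sketch under the same static-network assumption, where each unit fails independently with probability p.

```python
import random

def simulate_fetch(p, n, cumulative, rng):
    """Fetch an n-unit object; return (trials, units transmitted).
    Repetitive fetching restarts from the beginning after an error,
    while cumulative fetching resumes where the error occurred."""
    remaining, cost, trials = n, 0, 0
    while remaining > 0:
        trials += 1
        delivered = 0
        failed = False
        for _ in range(remaining):
            cost += 1                 # every unit put on the link is counted
            if rng.random() < p:
                failed = True         # transfer aborted mid-object
                break
            delivered += 1
        if cumulative:
            remaining -= delivered    # keep the partial data
        elif not failed:
            remaining = 0             # repetitive: only a complete pass counts
    return trials, cost
```

With cumulative fetching the total cost is exactly n plus one aborted unit per failed trial, matching the constant cost n of the analysis up to the aborted units; repetitive fetching pays the full partial transfers again on every retry.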

4.3 Prefetching vs. non-prefetching

To assess the effect of user access characteristics on cumulative prefetching, we take the probability of future references into consideration. We assume that prefetching completes before the next request arrives. Let M be the access frequency; then the byte hit rate with cumulative prefetching is simply

H = (M - 1) / M,

and the bandwidth cost is constant, n.

The byte hit rate without cumulative prefetching depends on the number of trials T_n needed to fetch the complete data. If the number of accesses to the object is smaller than the number of trials (M < T_n), the byte hit rate is zero and the average bandwidth cost is

C = M \cdot E[F].

When the number of accesses exceeds the number of trials, the proxy cache can successfully serve subsequent accesses after the transfer completes in the T_n-th trial. Thus no additional traffic is required from servers, the bandwidth cost does not increase with M, and the cost is

C = (T_n - 1) E[F] + n.

The byte hit rate can be written as

H = B_{hit} / (M \cdot n),

where B_{hit} = (M - T_n) \cdot n is the amount of traffic satisfied from the proxy cache.

Figure 9 depicts the additional bandwidth cost of cumulative prefetching compared with non-prefetching for M = 1, 2, 5, 10 and 20. The additional bandwidth cost increases with the error rate when M = 1. This is because the fraction of the file to be prefetched is large at a high error rate, and this traffic is wasted on a wrong prediction. However, the additional cost is compensated by bandwidth savings when M is greater than or equal to two. For example, when M is 2 and P_n is 0.69, traffic amounting to 0.2% of the total number of bytes in the file is wasted by prefetching, but the byte hit rate increases from 16% to 50%. The improvement in byte hit rate is shown in figure 10. The improvement is always positive except when M is one.

Figure 9: Bandwidth cost vs. access frequency

Figure 10: Byte hit rate vs. access frequency

5. Experimental Results

We implemented the on-demand cumulative prefetching scheme in Squid [16], a popular public proxy cache. A proxy cache accepts a user request, determines the location of the target object and delivers it through the Internet. The prefetching module monitors every transaction and attaches a prefetch flag to any object entity that is broken during transmission. The range request of HTTP 1.1 is used for cumulative prefetching: a server supporting HTTP 1.1 parses the range header and transmits the requested bytes from the given offset. The fractions of the object entity are then assembled at the proxy, and the complete object is swapped out to local storage.

Figure 11 shows a state transition diagram for the prefetching module. There are four states: prefetch-none, prefetch-needed, prefetch-pending and prefetch-done. Each state transition is initiated by an event, which may have actions associated with it. Prefetching is triggered when the fetched object length is less than the content-length at the end of a transmission. The object is prefetchable if it is cacheable and has a non-zero fetched length. With the creation of PrefetchState, the prefetch timer is set to an initial value. To avoid frequent data fetching over congested links, we apply exponential backoff to prefetch-pending data objects. When the timer expires, we connect to the server and request the remaining bytes of the object. If the data is retrieved successfully, the complete object is swapped out for future references. Otherwise, we repeat fetching until the number of trials reaches a threshold, after which PrefetchState is destroyed.

Figure 11: State transition diagram for Prefetching
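A condensed sketch of this state machine; the initial timer value and retry threshold are illustrative placeholders, not the values used in the actual implementation.

```python
import enum

class State(enum.Enum):
    NONE = "prefetch-none"
    NEEDED = "prefetch-needed"
    PENDING = "prefetch-pending"
    DONE = "prefetch-done"

class Prefetcher:
    def __init__(self, initial_timer=60, max_trials=5):
        self.state = State.NONE
        self.timer = initial_timer   # seconds until the next attempt
        self.trials = 0
        self.max_trials = max_trials

    def on_transfer_end(self, fetched, expected, cacheable=True):
        # Prefetching is triggered when the fetched length is less than
        # the content-length and the object is cacheable and non-empty.
        if cacheable and 0 < fetched < expected:
            self.state = State.NEEDED

    def on_timer_expired(self, success):
        self.trials += 1
        if success:
            self.state = State.DONE    # complete object swapped out
        elif self.trials >= self.max_trials:
            self.state = State.NONE    # give up; PrefetchState destroyed
        else:
            self.timer *= 2            # exponential backoff over congested links
            self.state = State.PENDING
```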

To evaluate the performance of the on-demand prefetching mechanism, we tested our implementation in an emulated network environment. Figure 12 illustrates the experimental environment. We used dummynet [15] on the FreeBSD operating system to emulate packet losses, communication delay and bandwidth limitation. In dummynet, most features of the emulated network are controlled and configured via a simple command line interface. We conducted the experiments on a Pentium 300MHz PC with 128MB of physical memory.

Figure 12: Experimental environment
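For reference, a dummynet setup of this kind is configured through FreeBSD's ipfw; the bandwidth, delay and loss values below are placeholders, as the text does not give the exact parameters used in the experiments.

```shell
# Send all traffic through dummynet pipe 1
ipfw add pipe 1 ip from any to any

# Emulate a constrained WAN link: limited bandwidth, added delay,
# and a configurable packet loss rate (plr)
ipfw pipe 1 config bw 1Mbit/s delay 50ms plr 0.01
```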

In each experiment, the packet loss rate is controlled to obtain different error rates. Wget [19] is customized to retrieve the same data repeatedly through a proxy cache at fixed time intervals. Figure 13 shows the error rate observed for different packet loss rates; the error rate is sensitive even to small changes in the packet loss rate. In the first set of experiments, we disabled the on-demand prefetching module in Squid and measured the number of trials while varying the packet loss rate. The second set of experiments was then run with the on-demand prefetching module enabled. Figure 14 shows the result. The number of trials is significantly reduced with prefetching, especially at high error rates, as expected from the numerical analysis. On average, 3.2 trials were required to transmit the entire data with cumulative prefetching, versus 9.5 trials with the non-prefetching method, when the error rate is 0.8.

Figure 13: Error rate vs. packet loss rate

Figure 14: Number of trials

5.1 Trace-driven simulation

We demonstrate the performance benefits of on-demand cumulative prefetching through trace-driven simulations. We compute the data availability, the amount of traffic saved and the byte hit rate for 6 different fetching strategies.

The simulations draw on 7 days of cache log files [12]. There are two types of log files. The access log contains the request time, URL, and the number of bytes transmitted to clients. The store log records the expected length and the fetched length of each file when an object is transferred from a server into local storage. By combining these two sets of files, we can identify the data objects that failed to be fetched, their expected lengths, and the number of subsequent requests for those objects. Table 2 lists the 6 fetching strategies used in the simulation, with descriptions. Figure 16 illustrates the error rates for each fetching strategy. Compared with the existing repetitive fetching method (RF), the other 5 methods improve data availability by 21% to 60%. On-demand cumulative prefetching (ONCP) and off-line cumulative prefetching (OFCP) provide the maximal improvement, since a hit is guaranteed from the second access to a broken object in those two cases. For example, the reduction in error rate for 10Mbyte data is 50%.

Name Description
RF Repetitive Fetching
CF Cumulative Fetching
OFCP OFf-line Cumulative Prefetching. Prefetchable objects are limited to the objects that are to be requested at least once during their lifetime. (entire access sequence is known at the beginning)
ONCP ON-demand Cumulative Prefetching. Prefetching is initiated immediately for the failed object transmission
2ONCP ON-demand Cumulative Prefetching with 2 lookaheads. Prefetching is initiated for the objects having two consecutive failed transmissions.
3ONCP ON-demand Cumulative Prefetching with 3 lookaheads. Prefetching is initiated for the objects having three consecutive failed transmissions.
Table 2: Fetching strategies

Figure 16: Data availability. Error rates are plotted for each fetching strategy.

Table 3 lists the amount of traffic saved and the improvement in byte hit rate for each strategy. 'Traffic saved' represents how much network traffic between the proxy cache and servers is saved compared with the repetitive fetching method (RF). 'Improvement' represents the difference between the byte hit rate obtained by repetitive fetching (RF) and that obtained by each method. The cumulative fetching method (CF) provides the most effective way to save network bandwidth between the proxy cache and servers because it resumes fetching from the point broken at the last attempt. However, it does not provide much benefit in terms of data availability. On-demand cumulative prefetching (ONCP) is a straightforward method to ensure high data availability. It performs equivalently to off-line cumulative prefetching (OFCP), which provides the maximal availability under the given access characteristics, but this is achieved at the cost of extra bandwidth consumption between the proxy cache and servers: in this simulation, 0.8% extra traffic is incurred by false predictions. We therefore leverage the user access pattern to reduce false predictions. The 2ONCP method looks back at the past two records, including the current one, before initiating prefetching for broken objects. Table 3 and figure 16 show that 2ONCP saves 13.18% of data traffic and improves the byte hit rate by 8.10%, while reducing the error rate by 42% in the best case.


(columns: Byte prefetched (Kbyte), Traffic saved, Byte hit (Kbyte), Byte hit rate; one row per fetching strategy)

Table 3: Simulation results

6. Conclusion

To avoid abusive bandwidth demands caused by repetitive failures of large multimedia data transfers, we have proposed on-demand prefetching for a proxy cache. We analyzed the bandwidth consumption of different strategies over error-prone network links. The analysis shows that bandwidth consumption from repetitive trials of data fetching with non-zero error probability rises rapidly as the error probability grows. On the other hand, on-demand cumulative prefetching can be performed within a few trials with minimal additional bandwidth requirement. To assess the effect of user access patterns on prefetching, we computed byte hit rate and bandwidth cost while varying the access frequency. The results indicate that the byte hit rate improves in every case, while the additional bandwidth cost depends on the access pattern.

We have measured the performance of the proposed scheme over a network emulating a wide area network environment like the Internet. The experiments show that on-demand prefetching can be effectively achieved by cumulative fetching. Finally, trace-driven simulations explore the traffic saved, the improvement in byte hit rate, and data availability. The results show substantial reductions in error rates for large multimedia data: error rates are reduced by 21% to 60% depending on the fetching strategy, network traffic between the proxy and servers is reduced by -0.8% to 17%, and the byte hit rate is increased by 7.00% to 9.66%.

We plan to extend the presented model in terms of user-perceived latency, which is one of the ultimate evaluation metrics for the performance of web proxy caches. This will allow us to elaborate our prefetching strategy to increase performance as well as to reduce client latency.

Cumulative prefetching could also be combined with rate-controlled prefetching [4] to minimize the queuing delay that prefetching induces in networks. We plan to monitor network conditions at the proxy cache to tailor the rate allocated to the traffic of each request.


Acknowledgments

We would like to thank the anonymous referees for their valuable comments. This work would not have been possible without the logs provided by Jaeho Yang at Inet and by the National Laboratory for Applied Network Research, supported by National Science Foundation grants NCR-9616602 and NCR-9521745.


References

[1] M. Arlitt and C. Williamson, Web Server Workload Characterization. in proceedings of ACM SIGMETRICS 96, Philadelphia, PA, USA, May 1996

[2] P. Barford and M. Crovella, Generating Representative Web Workloads for Network and Server Performance Evaluation. in proceedings of SIGMETRICS 98, Wisconsin, USA, June 1998

[3] J. Chung and M. Sirbu, Distributed Network Storage with Quality-of Service Guarantees. in proceedings of INET99, San Jose, CA, USA, 1999

[4] M. Crovella and P. Barford, The Network Effects of Prefetching. in proceedings of IEEE Infocom 98 San Francisco, CA, 1998

[5] F. Douglis, A. Feldmann, B. Krishnamurthy and J. Mogul, Rate of Change and Other Metrics: a Live Study of the World-Wide Web. in Usenix Symposium on Internet Technologies and Systems, Monterey, USA, December 1997

[6] A. Feldmann, R. Caceres, F. Douglis, G. Glass and M. Rabinovich, Performance of Web Proxy Caching in Heterogeneous Bandwidth Environments. in proceedings of INFOCOM 99, 1999

[7] R. Fielding, J. Gettys, J. Mogul, H. Frystyk and T. Berners-Lee, Hypertext Transfer Protocol -- HTTP/1.1. RFC 2068, January 1997

[8] J. Gwertzman and M. Seltzer, The case for geographical push caching. in proceedings of the 5th Workshop on Hot Topics in Operating Systems (HotOS-V), 1995

[9] J. Jung, Enhancing Web Caching Architecture with the Replication of Large Objects. Masters thesis, Korea Advanced Institute of Science and Technology, Korea, February, 1998

[10] T. M. Kroeger, D. D. E. Long and J. C. Mogul, Exploring the bounds of Web latency from caching and prefetching. in Usenix Symposium on Internet Technologies and Systems, Monterey, USA, December 1997

[11] D. Menasce and V. Almeida, Web and Intranet Performance: A Quantitative Analysis, tutorials in Usenix technical conference, New Orleans, USA, June, 1998

[12] NLANR, Proxy cache log traces, October 1999,

[14] J. Pitkow, Summary of WWW Characterizations. in proceedings of WWW7, April 1998

[15] L. Rizzo, Dummynet: a simple approach to the evaluation of network protocols. ACM Computer Communications Review, Vol. 27, No. 1, 1997

[16] Squid Web Proxy Cache, 1999

[17] vBNS, very High Speed Backbone Network Service

[18] Z. Wang and J. Crowcroft, Prefetching in the World Wide Web. in proceedings of IEEE Global Internet, London, UK, 1996

[19] GNU Wget


Jaeyeon Jung is a Ph.D. student in Computer Science at the Korea Advanced Institute of Science and Technology (KAIST). Her research interests include Web workload analysis and characterization, caching visualization and scalable information dissemination architecture.

Dongman Lee received a BS in Computer Engineering from Seoul National University in 1982, and an MS and Ph.D. in Computer Science from KAIST in 1984 and 1987, respectively. He worked at HP as a technical contributor from 1988 to 1997. He has been an associate professor at Information and Communications University (ICU) since 1998. His research interests include distributed object computing, distributed collaborative environments, multimedia multicast protocols, fault tolerance, group communications, and mobile computing.

Kilnam Chon received a Ph.D. in Computer Science from UCLA in 1974, in addition to a BS in Engineering Science from Osaka University. He worked at Rockwell International as a distributed computer system designer in the late 60s, and at the Jet Propulsion Laboratory as a member of technical staff in the area of advanced mission control in the late 70s. He has been a professor in the Computer Science department at KAIST since 1982. He has worked on network systems, including the Internet, since the early 1980s, and contributed to forming various regional Internet organizations such as APAN and APTLD, of which he is the founding chair. He also co-chairs the Coordination Committee of Intercontinental Research Networking (CCIRN).