Evaluation of Delivery Techniques for Dynamic Web Content

Mor Naaman
Stanford University
Department of Computer Science
mor@cs.stanford.edu
Hector Garcia-Molina
Stanford University
Department of Computer Science
hector@cs.stanford.edu
Andreas Paepcke
Stanford University
Department of Computer Science
paepcke@cs.stanford.edu

ABSTRACT

The portion of web traffic attributed to dynamic web content is substantial and continues to grow as users expect more personalization and tailored information. Unfortunately, dynamic content is costly to generate. Moreover, traditional web caching schemes are not very effective for dynamically created pages. We study two new acceleration techniques for dynamic content. The first technique is Edge-Side Includes (ESI), and the second is Class-Based Delta Encoding. To evaluate these schemes, we present a model for the construction of dynamic web pages. We use simulation to explore how system, page and algorithm parameters affect the performance of dynamic-content delivery techniques, and we present a detailed comparison of ESI and delta encoding in two representative scenarios.

Keywords

Web caching, ESI, delta encoding, delivery of dynamic web content.

1. INTRODUCTION

A large number of web pages served today are dynamically generated based on the "profile" of the particular requestor, or on the characteristics of a particular request. For example, a user's page at Yahoo can contain stock prices from the user's portfolio, weather summaries for cities of interest to the user, and scores from selected sports events. Users love to get "personalized" content, and dynamic pages will clearly be a growing fraction of web traffic. However, dynamic content is expensive to generate and deliver, as page construction is resource intensive, and the pages are too dynamic or too personalized to be cached.

Thus, with a "naive" dynamic web delivery scheme, most requests for pages propagate to the server. In addition to the page-assembly work and the resulting higher latencies, network bandwidth consumption is high, so this strategy does not scale well. A number of techniques have been proposed to accelerate the delivery of dynamic web pages. Some of them (e.g., [2]) are concerned only with reducing (or handling) the computational load on the server and have no influence on the network traffic load. Our interest, however, is in techniques that also yield network savings by enabling caching of significant parts of the dynamic content. Two classes of (sometimes orthogonal) techniques have been suggested in the literature. Some work ([3] and others) focuses on deferred assembly of the dynamic page, where the final page is assembled from network-cacheable page fragments. Delta encoding ([4,5] and others), on the other hand, focuses on serving deltas between the requested page and other (cacheable) pages.

We focus on two concrete systems, one representative of each class of techniques. One system is based on ESI (Edge Side Includes), a scheme proposed by Akamai and Oracle for describing page assembly [8]. ESI is used to enable assembly of a dynamic web page from smaller page fragments. The fragments can be independently delivered and cached closer to the client. Instead of generating a full HTML page, the server generates ESI code fragments, each containing the original HTML code for this fragment with additional ESI directives (incidentally, ESI requires a complete revision of the website's code). The fragments can be cached on specialized edge servers. The edge servers assemble the page, based on the ESI directives in the fragments, before it is delivered to the client.
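The edge-side assembly described above can be illustrated with a minimal sketch. It is not an ESI implementation: the include tag is only loosely modeled on the `<esi:include>` directive, the `EdgeServer` class, the URLs and the fragment contents are hypothetical, and time-to-live handling and nested includes are ignored.

```python
import re

# Hypothetical origin content: a page template plus two independently
# cacheable fragments. All names and URLs are illustrative assumptions.
ORIGIN = {
    "/page": '<html><esi:include src="/weather/sf"/>'
             '<esi:include src="/stocks/xyz"/></html>',
    "/weather/sf": "<div>SF: fog</div>",
    "/stocks/xyz": "<div>XYZ: 10</div>",
}

# Simplified include directive (assumption: only the src attribute is used).
INCLUDE = re.compile(r'<esi:include src="([^"]+)"\s*/>')


class EdgeServer:
    """Toy edge cache that assembles pages from cached fragments."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}
        self.origin_fetches = 0  # requests that had to go to the web server

    def fetch(self, url):
        # Contact the origin only on a cache miss (no expiry in this sketch).
        if url not in self.cache:
            self.origin_fetches += 1
            self.cache[url] = self.origin[url]
        return self.cache[url]

    def assemble(self, url):
        # Replace each include directive with the fragment it names.
        template = self.fetch(url)
        return INCLUDE.sub(lambda m: self.fetch(m.group(1)), template)


edge = EdgeServer(ORIGIN)
first = edge.assemble("/page")   # misses: template and both fragments
second = edge.assemble("/page")  # served entirely from the edge cache
```

In this toy run the second request causes no origin traffic at all, which is the effect the scheme aims for when fragments are popular and long-lived.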

The second system we evaluate is class-based delta encoding. In class-based DE [1], the server generates many different "base files". When a client request arrives, the server creates the dynamic page instance, and a specialized machine finds the best base file to use for this particular page. Then, the difference ("delta") between the page and the chosen base file is computed. The delta is sent to the client together with a reference to the appropriate base file. If the client does not have the appropriate base file, it gets the base file from the server (the bandwidth savings stem from the fact that base files are simply static web pages, and can be cached on the client side or on any network cache). Having acquired the delta and base file, the client applies the delta to the base file to construct the final page.
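The request flow above (choose a base file, compute a delta, apply it on the client) can be sketched as follows. This is only an illustration under stated assumptions: Python's `difflib.SequenceMatcher` stands in for the base-file selection and delta algorithms of [1], pages are represented as lists of lines, and the function names and sample pages are hypothetical.

```python
from difflib import SequenceMatcher


def choose_base(page, bases):
    """Return the name of the base file most similar to the page."""
    return max(
        bases,
        key=lambda name: SequenceMatcher(None, bases[name], page,
                                         autojunk=False).ratio(),
    )


def make_delta(base, page):
    """Encode the page as copy-from-base ranges plus literal new lines."""
    delta = []
    matcher = SequenceMatcher(None, base, page, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))        # reuse base[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("data", page[j1:j2]))   # ship new lines verbatim
        # "delete": base lines absent from the page need no instruction
    return delta


def apply_delta(base, delta):
    """Client side: rebuild the page from the cached base file and the delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(base[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


# Hypothetical base files (static, cacheable) and one dynamic page instance.
bases = {
    "sports": ["<html>", "<h1>Sports</h1>", "<p>scores</p>", "</html>"],
    "finance": ["<html>", "<h1>Finance</h1>", "<p>quotes</p>", "</html>"],
}
page = ["<html>", "<h1>Finance</h1>", "<p>quotes</p>", "<p>XYZ: 10</p>", "</html>"]

name = choose_base(page, bases)
delta = make_delta(bases[name], page)
rebuilt = apply_delta(bases[name], delta)
```

Only the `data` entries of the delta carry page content over the network; the `copy` entries refer to the base file, which the client or a network cache may already hold.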

In the expanded version of this paper [6] we present our evaluation model and our detailed comparisons. In the current short paper we briefly sketch our model and results. The contributions of our work are:

2. A PAGE CONTENT MODEL

Evaluation of page caching schemes typically requires a page access model. In our case, we need a much richer model, in order to represent which fragments clients access, and how they are to be assembled into full pages. Our model strikes a balance between simplicity and the richness necessary to capture the main tradeoffs. The model consists of three parts: a model for the underlying data of a resource, a model for the construction of a dynamic page, and a model for the physical system (described in [6]).

A dynamic page is constructed by selecting from the available data items. These data items are usually chosen from different groups. A page like My Yahoo! [7] is an example where clients can explicitly choose the modules (corresponding to "groups" in our model) that appear on their page: news, weather, movies, etc. Within the modules, the clients can choose specific items, for example, the cities for which they would like to display weather information. In this way we can model many types of dynamic pages, not only ones that are created according to user preferences.

To model the creation of a dynamic page (e.g., some user's My Yahoo! page) we simulate a two-phase selection. The first phase is a selection of groups to appear on the page. In the second phase we simulate a selection of data items from each group.
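A minimal sketch of the two-phase selection follows. The uniform choice of groups and items, the parameter names and the sample data are assumptions made for illustration; the distributions actually used in the simulation are described in [6].

```python
import random


def build_page(groups, n_groups, n_items, rng):
    """Two-phase selection: pick groups first, then items within each group.

    Both phases draw uniformly without replacement (an assumption of this
    sketch, not necessarily the distribution used in the full model).
    """
    # Phase 1: select which groups appear on the page.
    chosen_groups = rng.sample(sorted(groups), n_groups)
    # Phase 2: select data items from each chosen group.
    return {
        g: rng.sample(groups[g], min(n_items, len(groups[g])))
        for g in chosen_groups
    }


# Hypothetical resource with three groups of data items.
groups = {
    "weather": ["SF", "NYC", "Boston", "Austin"],
    "stocks": ["AAA", "BBB", "CCC"],
    "sports": ["NBA", "NFL", "MLB", "NHL"],
}

rng = random.Random(0)  # seeded so that repeated runs build the same page
page = build_page(groups, n_groups=2, n_items=2, rng=rng)
```

Each call yields one simulated page instance, represented as a mapping from the selected groups to the selected items.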

3. CONCLUSIONS

Table 1 summarizes some of the conclusions and additional considerations for comparing class-based DE to ESI. A major concern is the computational costs and latencies introduced by both schemes. Class-based DE requires generation of the entire page for each request. Moreover, the page then needs to be delta-encoded, which includes finding a good base file and computing the delta. On the client side, the delta must be applied to the base file, which adds further delay. ESI, on the other hand, requires only that the edge servers assemble the page fragments into a full page that is served to the client. The assembly does not introduce new computational costs, since it would have to be done by the web server even without ESI. The web server benefits twice under ESI: not only does it not have to assemble the page, but in many cases it is required to deliver only small parts of the page.

Another consideration is the transparency of both systems. While class-based DE as offered in [1] requires installation of hardware or software, usually near the web server, it does not require any change to the web page code, and it works transparently with the existing network infrastructure of proxy caches and clients. ESI, on the other hand, requires changes to the web page code, as ESI code must be added on top of the original HTML. In addition, ESI requires specialized edge servers (i.e., the services of a CDN provider), since the assembly directives are not implemented on proxy caches.

Table 1 - Summary: Excellent *, Good +, Bad -, Sometimes ~

                                                                    ESI    DE
Reduces server traffic                                              +      *
Reduces client traffic                                              -      ~
Reduces load on web server                                          *      -
Performance dependent on web page structure                         Yes    Yes
Performance dependent on characteristics of data                    Yes    Yes
Benefits greater when popularity rises                              Yes    Less
Requires main site hardware/software installation                   No     Yes
Requires web-page code changes                                      Yes    No
Requires network infrastructure (CDN services)                      Yes    No
Can exploit information available from CDN for page construction    Yes    No

The distribution of work in ESI between the main site and the CDN enables the web site to use information that is available to the CDN provider only, or not maintained by the main site, such as physical location of the clients. This can be exploited in the page construction (e.g., automatically inserting relevant weather) using the ESI language.

Most important, we show that both ESI and DE can reduce traffic and improve caching for dynamic page delivery. However, the benefits of the techniques are highly dependent on the resource. For example, ESI is beneficial when there are a small number of items on the page, the item popularity is skewed, and their time-to-live is high. DE, while always good for reduction of main site traffic, may not always reduce client traffic.

4. REFERENCES

  1. Konstantinos Psounis. Class-based Delta-encoding: A Scalable Scheme for Caching Dynamic Web Content. Int'l Workshop on Web Caching Systems, 2002.
  2. Jim Challenger, Arun Iyengar, Karen Witting, Cameron Ferstat, and Paul Reed. A Publishing System for Efficiently Creating Dynamic Web Content. IEEE INFOCOM 2000.
  3. Pei Cao, Jin Zhang, and Kevin Beach. Active Cache: Caching Dynamic Contents on the Web. IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing (Middleware '98).
  4. Jeffrey C. Mogul, Fred Douglis, Anja Feldmann, and Balachander Krishnamurthy. Potential Benefits of Delta Encoding and Data Compression for HTTP. ACM SIGCOMM, 1997.
  5. FineGround Networks. Breaking New Ground in Content Acceleration. http://www.fineground.com/pdf/FGCWhitepaper.pdf.
  6. Mor Naaman, Hector Garcia-Molina, and Andreas Paepcke. Evaluation of Delivery Techniques for Dynamic Web Content. Technical Report Number 2003-07. Stanford University, 2003. Available at http://dbpubs.stanford.edu/pub/2003-07.
  7. My Yahoo!, Yahoo Incorporated. http://my.yahoo.com.
  8. ESI 1.0. http://www.w3.org/TR/esi-lang.