Our Approach next up previous
Next: A Case Study: Index Up: Introduction and Motivation Previous: Previous Work

Our Approach

Our survey of other work in this area has led us to formulate five desiderata for an adaptive web site.

Avoid creating work for visitors (e.g. filling out questionnaires). Visitors to a site are turned off by extra work, especially if it has no clear reward, and may opt out rather than participate. Furthermore, if the site cannot improve itself without feedback, it will fail if users do not assist.

Make the web site easier to use for everyone, including first-time users, casual users, etc. Customization can be genuinely useful for repeat visitors, but does not benefit first-time users. In addition, one user's customizations do not apply to other users; there is no sharing or aggregation of information across multiple users. Transformation has the potential to overcome both limitations.

Use web sites as they are, without presupposing the availability of meta-information not currently available (e.g., XML annotations). Perhaps web sites of the future will be heavily annotated with both semantic and structural meta-information. However, we would like to make it possible to transform today's existing web sites into adaptive ones, without making assumptions regarding the future of the web.

Protect the site's original design from destructive changes. Web designers put a great deal of effort into designing web sites. While we may be able to assist them, we do not want to replace them or undo their work.

Keep the human webmaster in control. Clearly, the human webmaster needs to remain in control of the web site in the foreseeable future both to gain her trust in automatic adaptive techniques and to avoid ``disasters.''

We took the above desiderata as constraints on our approach to creating adaptive web sites. We use transformation rather than customization, both to avoid confronting visitors with questionnaires and to facilitate the sharing of site improvements for a wide range of visitors. We focus on an access-based approach, as automatic understanding of free text is difficult. We do not assume any annotations on Web pages beyond HTML. For safety, we limit ourselves to nondestructive transformations: changes to the site that leave existing structure intact. We may add links but not remove them, create pages but not destroy them, add new structures but not scramble existing ones. Finally, we restrict ourselves to generating candidate adaptations and presenting them to the human webmaster -- any non-trivial changes to the web site are under webmaster control.

24hrlab-214.sfsu.edu - - [21/Nov/1996:00:01:05 -0800] "GET /home/jones/collectors.html HTTP/1.0" 200 13119
24hrlab-214.sfsu.edu - - [21/Nov/1996:00:01:06 -0800] "GET /home/jones/madewithmac.gif HTTP/1.0" 200 855
cs106-14.u.washington.edu - - [21/Nov/1996:00:01:06 -0800] "GET /home/chinn/ HTTP/1.0" 200 1896
24hrlab-214.sfsu.edu - - [21/Nov/1996:00:01:06 -0800] "GET /home/jones/gustop2.gif HTTP/1.0" 200 25460
x67-122.ejack.umn.edu - - [21/Nov/1996:00:01:08 -0800] "GET /home/rich/aircrafts.html HTTP/1.0" 404 617
x67-122.ejack.umn.edu - - [21/Nov/1996:00:01:08 -0800] "GET /general/info.gif HTTP/1.0" 200 331 - - [21/Nov/1996:00:01:09 -0800] "GET /home/smith/kitty.html HTTP/1.0" 200 5160
24hrlab-214.sfsu.edu - - [21/Nov/1996:00:01:10 -0800] "GET /home/jones/thumbnails/awing-bo.gif HTTP/1.0" 200 5117
Figure 1: Typical user access logs, these from a computer science web site. Each entry corresponds to a single request to the server and includes originating machine, time, and URL requested. Note the series of accesses from each of two users (one from SFSU, one from UMN).

The main source of information we rely on is the site's web server log, which records the pages visited by a user at the site. Our underlying intuition is what we call the visit-coherence assumption: the pages a user visits during one interaction with the site tend to be conceptually related. We do not assume that all pages in a single visit are related. After all, the information we glean from individual visits is noisy; for example, a visitor may pursue multiple distinct tasks in a single visit. However, if a large number of visitors continue to visit and re-visit the same set of pages, that provides strong evidence that the pages in the set are related. Thus, we accumulate statistics over many visits by numerous visitors and search for overall trends.

It is not difficult to devise a number of simple, non-destructive transformations that could improve a site; we describe several in [14]. Examples include highlighting popular links, promoting popular links to the top of a page or to the site's front page, and linking together pages that seem to be related. We have implemented one such transformation: shortcutting, in which we attempt to provide links on each page to visitors' eventual goals, thus skipping the in-between pages. As reported in [13], we found a significant number of visitors used these automatic shortcuts.

However, our long-term goal is to demonstrate that more fundamental adaptations are feasible. An example of this is change in view, where a site could offer an alternative organization of its contents based on user access patterns. Consider, for example, the Music Machines web site,[*] which has been our primary testbed, as it is maintained by one of the authors, and we have full access to all documents and access logs. Music Machines is devoted to information about various kinds of electronic musical instruments. Most of the data at the site is organized by the manufacturer of the instrument and the particular model number. That is, there is a page for the manufacturer Roland and, on that page, links to pages for each instrument Roland produces. However, imagine a visitor to the site who is interested in a comprehensive overview of all the keyboards available from various manufacturers. She would have to first visit the Roland page and look at each of the Roland keyboards, then visit each of the other keyboard manufacturers for its offerings as well. Now, imagine if the site repeatedly observed this kind of behavior and automatically created a new web page containing all the links to all the keyboards. Now our visitor need only visit this new page rather than search for all the keyboards. This page represents a change in view, from the former ``manufacturer-centric'' organization to one based on type of instrument. If we can discover these user access patterns and create new web pages to facilitate them, we should in theory be able to create new views of the site.

next up previous
Next: A Case Study: Index Up: Introduction and Motivation Previous: Previous Work
Mike Perkowitz