Toward zero-input personalization:
Nicholas Kushmerick · James McKee · Fergus Toolan
Smart Media Institute · Department of Computer Science
· University College
Most web services take a "one size fits all" approach: all visitors see the same generic content, formatted in the same generic manner. But of course every visitor has their own needs and preferences. Consider visitors to a computer science department web site: a student might want course materials; a prospective student might want application information; an employer might want lists of graduating students; a journalist might want general background information about the field of computer science. Personalized information services -- delivering the right information, at the right time, in the right format -- represent the Holy Grail of web design.
Personalized content requires a representation of visitors' information needs or preferences. Most recent personalization research requires that visitors explicitly describe their interests. Content-based systems (e.g. [Lang, 1995; Pazzani et al, 1996; Joachims et al, 1997]) require that users explicitly provide a model of what they are looking for, such as a list of query keywords. Collaborative or social filtering systems (e.g. [Resnick et al 1994; Shardanand & Maes, 1995; Hill et al, 1995]) require that users rate items. While user-interface design can simplify the process of providing such information (e.g., [Sakagami & Kamba, 1997; Balbanovic, 1998; Chan 1999]), a natural question arises: Can personalization be effective with no explicit visitor input?
Web personalization can take many forms. We focus on the (modest) goal of suggesting pages within a web site that visitors will find relevant or interesting. We have developed PWW, a suite of tools for adding personalization features to an existing web site.
To see PWW's functionality, suppose a user is interested in
Kushmerick's "AdEater" research project [Kushmerick, 1999]. The user might
submit the query "kushmerick adeater" to a search engine
such as Lycos; see Figure 1(a). Note that Lycos'
search results include a hyperlink to a page describing Kushmerick's
research. When selected, this hyperlink leads to the page shown in Figure 1(b). The main frame of this page is the
original web page as it would be rendered without PWW. PWW occupies
the small frame at the bottom of the page. As we describe below, PWW
uses a novel technique called referrer-based page recommendation
to suggest a small set of pages within the UCD/CS web. These
suggestions are displayed in the menu in the lower-left corner of Figure 1(b). Note that PWW's top-ranking suggestion
describes the AdEater project. PWW's suggestions are
personalized (different visitors get different
recommendations), yet require no user input (visitors need not
explicitly tell PWW anything about their preferences).
PWW's suggestions are based on referrer-based page
recommendation (RBPR), a novel zero-input recommendation technique.
The intuition underlying RBPR is as follows. Suppose you are
searching for Alon Levy's work on data integration, so you decide to
submit the keywords "levy data integration" to a search
engine. The output from
this query will probably include (among many others) a hyperlink to Levy's home page in
the University of Washington web. If you decide to select this link,
then a personalized version of the UW web might suggest you go to a
page on Tukwila
or the general database
Referrer-based page recommendation relies on the fact that the page you were looking at when you requested Levy's home page (the referrer URL, in HTTP parlance) often contains numerous terms -- "data", "integration", names of other researchers, etc. -- that provide valuable clues about your "true" information need. RBPR uses terms taken from the referrer URL as a query against the entire UW web, and then suggests the K highest-ranking pages (we call K the retrieval window).
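The retrieval step described above can be sketched as follows. This is a minimal illustration, not PWW's actual implementation: the function names, the noise-word list, the TF-IDF weighting, the example referrer URL and the toy page texts are all assumptions made for the example. Terms are pulled from the referrer URL string, scored against every page in the site, and the K best pages are returned.

```python
import math
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qsl, unquote_plus

TOKEN = re.compile(r"[a-z0-9]+")
# Hypothetical list of URL fragments that carry no topical meaning.
URL_NOISE = {"http", "https", "www", "com", "html", "htm", "cgi", "bin"}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return TOKEN.findall(text.lower())


def terms_from_referrer_url(url):
    """Collect query terms from the path and query-string values of a URL."""
    parts = urlsplit(url)
    values = " ".join(value for _, value in parse_qsl(parts.query))
    text = unquote_plus(parts.path) + " " + values
    return [t for t in tokenize(text) if t not in URL_NOISE]


def recommend(query_terms, pages, k=10):
    """Rank site pages against the query terms; return at most k URLs.

    pages maps each page URL to its text. The score is a simple TF-IDF
    sum (an assumed weighting; any text-retrieval ranking could be used).
    """
    counts = {url: Counter(tokenize(text)) for url, text in pages.items()}
    n = len(counts)
    scores = {}
    for url, page_counts in counts.items():
        score = 0.0
        for term in set(query_terms):
            tf = page_counts[term]
            if tf == 0:
                continue
            df = sum(1 for c in counts.values() if term in c)
            score += tf * math.log(1 + n / df)
        if score > 0:
            scores[url] = score
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy site and a made-up search-engine referrer, for illustration only.
pages = {
    "/research/adeater.html": "AdEater removes advertisements from web pages Kushmerick",
    "/staff/nick/home.html": "Nicholas Kushmerick home page research teaching",
    "/courses/index.html": "course materials lectures exams",
}
referrer = "http://www.lycos.com/cgi-bin/pursuit?query=kushmerick+adeater"
terms = terms_from_referrer_url(referrer)
suggestions = recommend(terms, pages, k=2)
```

With this toy data the AdEater page ranks first because it matches both query terms, one of which is rare in the site. Pages that match no term are left out, so fewer than K suggestions may be returned.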
We have evaluated RBPR using access logs from the Music Machines web site; see [Perkowitz & Etzioni 1999] for details. We measured the fraction of the user's navigation that could have been avoided had PWW been installed on Music Machines and had the visitors followed its suggestions. We tested PWW in two configurations. In "Referrer URL only" mode, the query terms are derived directly from the referrer URL. In "Referrer contents" mode, the query terms are gathered from the page pointed to by the referrer URL. "Referrer URL only" mode is faster, but "Referrer contents" mode gives better suggestions. Since the server logs are several years old, some of the referrer URLs have disappeared; in "Valid Referrer contents" mode we use heuristics to discard invalid URLs. We then varied the retrieval window K from 1 to 20. Figure 2 shows that, as expected, "Referrer contents" mode gives substantially better predictions, and that a retrieval window greater than 10 does not improve the suggestions enough to justify the additional suggestions the user would need to look through.
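The text does not give the exact formula behind "fraction of navigation avoided", so the sketch below shows one plausible reading, stated as an assumption. A logged session is taken to be the referrer plus the pages visited in order. If a suggested page shows up later in the session, the visitor could have jumped to it in one click, and the intermediate clicks count as saved. The function names and the toy sessions are hypothetical.

```python
def clicks_saved(visited, suggestions):
    """Clicks saved in one session under the assumed reading.

    visited[0] is the entry page. If the latest suggested page was reached
    after j clicks, one click on the suggestion replaces them, saving j - 1.
    """
    hits = [j for j, page in enumerate(visited[1:], start=1) if page in suggestions]
    return max(hits) - 1 if hits else 0


def fraction_avoided(sessions, suggest, k):
    """Share of all logged clicks that the suggestions would have saved.

    sessions: list of (referrer, visited_pages) pairs.
    suggest:  caller-supplied function (referrer, k) -> suggested pages.
    """
    saved = total = 0
    for referrer, visited in sessions:
        total += len(visited) - 1
        saved += clicks_saved(visited, set(suggest(referrer, k)))
    return saved / total if total else 0.0


# Toy sessions and a stand-in recommender, for illustration only.
sessions = [("ref1", ["a", "b", "c", "d"]), ("ref2", ["a", "e"])]
ranked = {"ref1": ["b", "d"], "ref2": ["z"]}


def toy_suggest(referrer, k):
    return ranked[referrer][:k]


narrow = fraction_avoided(sessions, toy_suggest, k=1)
wide = fraction_avoided(sessions, toy_suggest, k=2)
```

In this toy case widening the retrieval window from 1 to 2 brings page "d" into the suggestions, which saves two of the four logged clicks. The same loop, run over a range of K values, yields the kind of curve the experiment reports.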
While RBPR's performance of 9-10% is hardly earth-shattering, recall that visitors do not assist PWW in any way. Visitors do not have to express their interests in terms of query keywords, and they do not need to rate web pages during browsing. Furthermore, because the contents of the referrer URLs or of Music Machines itself might have changed substantially in the two years since the dataset was gathered, this experimental scenario probably underestimates PWW's performance on live data.
Finally, we note that hyperlink recommendation is notoriously difficult. For example, [Joachims et al, 1997] report that WebWatcher makes useful recommendations about 40-45% of the time even though the system is given an explicit statement of the visitor's interests, and that human performance on this task is about 48%. We claim not that RBPR outperforms alternative page recommendation algorithms, but that it represents compelling evidence that zero-input personalization can be effective.
More information at www.cs.ucd.ie/staff/nick/home/research/download/kushmerick-www9/poster/long.