<- Previous | First | Next ->

Related" algorithm in Netscape is based on technology developed by Alexa, Inc., and computes its

answers based on connectivity information, content information, and usage information [11].

To compare the performance of our two algorithms and Netscape's algorithm, we performed a

user study on 59 URLs chosen by 18 volunteers. Our study results show that the precision at 10

computed over all 59 URLs of our two algorithms are 73% better and 51% better than Netscape's.

Not all algorithms gave answers for all URLs in our study. If we restrict the comparison to only

the 37 URLs for which all three algorithms returned answers, then the precision at 10 of our

two algorithms are 40% and 22% better than Netscape's algorithm. This is surprising since our

algorithms are based only on connectivity information.

Netscape's algorithm gives answers for about 17 million URLs [11], while our algorithms can

give answers for a much larger set of URLs (we have connectivity information on 180 million URLs).

This is important because it means that we can give related URL information for more URLs. Our

algorithms are also fast: in our environment, both average less than 200 msec of computation per

input URL.

The example shown in Table 1 is for a URL with a very high level of connectivity (www.nytimes.com

contains 47,751 inlinks in our Connectivity Server), and all three algorithms generally perform quite

well for well-connected URLs. Our algorithms can also work well when there is much less connectivity, as shown by the example in Table 2. This table shows the answers for the Companion and

Netscape algorithms for linas.org/linux/corba.html, one of the input URLs chosen by a user

as part of our user study. Alongside each answer is the user's rating for each answer, with a '1'

meaning that the user considered the page related, '0' meaning that the user considered the page

unrelated, and '{' meaning that the user was unable to access the page at all. The original page

was about CORBA implementations for Linux, and there were 123 pages pointing to this page in

our Connectivity Server. Nine of the ten answers given by the Companion algorithm were deemed

related by our user, while only one page from Netscape's set of answers was deemed related. Most

of Netscape's answers were about the much broader topic of Linux, rather than speci cally about

CORBA implementations on Linux.

Section 2 presents our algorithms in detail and describes Netscape's service, while Section 3

discusses the implementation of our algorithms. Section 4 describes the user study we performed

and presents its results, and also provides a brief performance evaluation of our algorithms. Finally,

Section 6 presents our conclusions.

2 Related Page Algorithms

In this section we describe our two algorithms (the algorithm and the algorithm), as well as Netscape's algorithm. Unlike Netscape's algorithm, both of our algorithms

exploit only the hyperlink-structure (i.e. graph connectivity) of the web and do not examine the

information about the content or usage of pages. Netscape's algorithm uses all three kinds of

information to arrive at its results.

In the sections below, we use the terms and . If there is a hyperlink from page w to

page v, we say that w is a of v and that v is a of w.

2.1 Companion Algorithm

The Companion algorithm takes as input a starting URL u and consists of four steps:
1. Build a for u.


<- Previous | First | Next ->