www8.dvi
<- Previous | First | Next ->

All measurements were performed on a Compaq AlphaServer 4100 with 8 GB of main memory and

dual 466 MHz Alpha processors. The measured running times are wall clock times from the time

the input URL is given to the server until the time the answers are returned. These times do not

include the time taken to format the results as an HTML page, since that was done by a server

process running on another machine (and the time to do this was negligible).

The average running time for the Companion algorithm on the 50 URLs for which it returned

answers was 109 msec, while the average running time for the Cocitation algorithm on the 58

URLs for which it provided answers was 195 msec. The performance of both these algorithms is

su ciently fast that either one could handle a large amount of tra c (close to 800,000 requests

per day for the Companion algorithm). Furthermore, the average performance could probably be

improved by caching answers for frequently requested URLs.

Although we did not explicitly include this factor in our user study, we have informally observed that the subjective quality of answers returned for both the Companion and the Cocitation

algorithms does not decrease when we decrease the parameter B (the number of inlinks considered)

during the building of the vicinity graph. This is important for on-line services because it means

that the graph size could be reduced during times of high load, thereby reducing the amount of

time taken to service each request. Under conditions of low load, the graph size could be increased.

The Companion algorithm generally converges on its answers within a few iterations (typically

less than 10 iterations), but the number of iterations increases with the graph size. Each iteration

takes time that is linear in the number of edges in the vicinity graph. We plot the running time

vs. the number of graph edges in Figure 2 (a).

The running time of the Cocitation algorithm is O(n log n), where n is the number of siblings

examined for cocitation, since it sorts the siblings by the degree of cocitation. This e ect is illustrated in Figure 2 (b). In our experience, the running times for the the cocitation and companion

algorithms are generally correlated, since URLs which have a large number of siblings to consider

in the cocitation algorithm also generally produce a large neighborhood graph for processing in the

companion algorithm.

5 Related Work

Many researchers have proposed schemes for using the hyperlink structure of the web [18, 3, 7, 23,

19, 25, 24, 6, 17, 5]. For the most part, this work does not discuss the nding of related pages, with

four exceptions discussed below.

We know of only one previous work that expoits the order of links: Chakrabarti et al. [9] use

the links and their order to categorize web pages and they show that the links that are near a given

link in page order frequently point to pages on the same topic.

Previous authors have suggested using cocitation and other forms of connectivity to identify

related web pages. Spertus observed that cocitation could indicate that two pages are related [23].

That is, if page A points to both pages B and C, then B and C might be related. Various researchers

in the eld of bibliometrics have also observed this [15, 13, 14, 22], and this observation forms the

basis of our Cocitation algorithm. The notion of collaborative ltering, although it is based on user's

recommendations rather than hyperlinks, also relies on this observation [21]. Pitkow and Pirolli [19]

cluster web pages based on co-citation analysis. Terveen and Hill [25] use the connectivity structure

of the web to nd groups of related web sites.

Our companion algorithm descended from the HITS algorithm developed by Kleinberg [17].

The HITS algorithm was originally proposed by Kleinberg as a way of using connectivity structure

to identify the most authoritative sources of information on a particular topic, where the topic was

12


<- Previous | First | Next ->