All measurements were performed on a Compaq AlphaServer 4100 with 8 GB of main memory and
dual 466 MHz Alpha processors. The measured running times are wall clock times from the time
the input URL is given to the server until the time the answers are returned. These times do not
include the time taken to format the results as an HTML page, since that was done by a server
process running on another machine (and the time to do this was negligible).
The average running time for the Companion algorithm on the 50 URLs for which it returned
answers was 109 msec, while the average running time for the Cocitation algorithm on the 58
URLs for which it provided answers was 195 msec. The performance of both these algorithms is
su ciently fast that either one could handle a large amount of tra c (close to 800,000 requests
per day for the Companion algorithm). Furthermore, the average performance could probably be
improved by caching answers for frequently requested URLs.
Although we did not explicitly include this factor in our user study, we have informally observed that the subjective quality of answers returned for both the Companion and the Cocitation
algorithms does not decrease when we decrease the parameter B (the number of inlinks considered)
during the building of the vicinity graph. This is important for on-line services because it means
that the graph size could be reduced during times of high load, thereby reducing the amount of
time taken to service each request. Under conditions of low load, the graph size could be increased.
The Companion algorithm generally converges on its answers within a few iterations (typically
less than 10 iterations), but the number of iterations increases with the graph size. Each iteration
takes time that is linear in the number of edges in the vicinity graph. We plot the running time
vs. the number of graph edges in Figure 2 (a).
The running time of the Cocitation algorithm is O(n log n), where n is the number of siblings
examined for cocitation, since it sorts the siblings by the degree of cocitation. This e ect is illustrated in Figure 2 (b). In our experience, the running times for the the cocitation and companion
algorithms are generally correlated, since URLs which have a large number of siblings to consider
in the cocitation algorithm also generally produce a large neighborhood graph for processing in the
5 Related Work
Many researchers have proposed schemes for using the hyperlink structure of the web [18, 3, 7, 23,
19, 25, 24, 6, 17, 5]. For the most part, this work does not discuss the nding of related pages, with
four exceptions discussed below.
We know of only one previous work that expoits the order of links: Chakrabarti et al.  use
the links and their order to categorize web pages and they show that the links that are near a given
link in page order frequently point to pages on the same topic.
Previous authors have suggested using cocitation and other forms of connectivity to identify
related web pages. Spertus observed that cocitation could indicate that two pages are related .
That is, if page A points to both pages B and C, then B and C might be related. Various researchers
in the eld of bibliometrics have also observed this [15, 13, 14, 22], and this observation forms the
basis of our Cocitation algorithm. The notion of collaborative ltering, although it is based on user's
recommendations rather than hyperlinks, also relies on this observation . Pitkow and Pirolli 
cluster web pages based on co-citation analysis. Terveen and Hill  use the connectivity structure
of the web to nd groups of related web sites.
Our companion algorithm descended from the HITS algorithm developed by Kleinberg .
The HITS algorithm was originally proposed by Kleinberg as a way of using connectivity structure
to identify the most authoritative sources of information on a particular topic, where the topic was