Improvement of HITS-based Algorithms on Web Documents

Research supported in part by the National Science Foundation under grants DUE-9980375 and EIA-0086230.

Longzhuang Li, Yi Shang, and Wei Zhang
Department of Computer Engineering and Computer Science
University of Missouri-Columbia
Columbia, MO 65211, USA

Copyright is held by the author/owner(s).
WWW2002, May 7-11, 2002, Honolulu, Hawaii, USA.
ACM 1-58113-449-5/02/0005.


In this paper, we present two ways to improve the precision of HITS-based algorithms on Web documents. First, by analyzing the limitations of current HITS-based algorithms, we propose a new weighted HITS-based method that assigns appropriate weights to the in-links of root documents. Second, we combine content analysis with HITS-based algorithms and study the effects of four representative relevance scoring methods (VSM, Okapi, TLS, and CDR) on a set of broad-topic queries. Our experimental results show that our weighted HITS-based method performs significantly better than Bharat's improved HITS algorithm. Combining either our weighted HITS-based method or Bharat's HITS algorithm with any of the four relevance scoring methods yields results only marginally better than our weighted HITS-based method alone. Among the four relevance scoring methods, there is no significant difference in quality when they are combined with a HITS-based algorithm.

Categories and Subject Descriptors: H.3 [Information Systems]: Information Storage and Retrieval; G.3 [Mathematics of Computing]: Probability and Statistics; D.2 [Software]: Software Engineering

General Terms: Algorithms, Measurement, Performance

Keywords: HITS-based algorithms, relevance scoring methods, information retrieval