Improvement of HITS-based Algorithms on Web Documents
Department of Computer Engineering and Computer Science
University of Missouri-Columbia
Columbia, MO 65211, USA
Copyright is held by the author/owner(s).
WWW2002, May 7-11, 2002, Honolulu, Hawaii, USA.
Abstract: In this paper, we present two ways to improve the precision of HITS-based algorithms on Web documents. First, by analyzing the limitations of current HITS-based algorithms, we propose a new weighted HITS-based method that assigns appropriate weights to in-links of root documents. Then, we combine content analysis with HITS-based algorithms and study the effects of four representative relevance scoring methods, VSM, Okapi, TLS, and CDR, using a set of broad topic queries. Our experimental results show that our weighted HITS-based method performs significantly better than Bharat's improved HITS algorithm. When we combine our weighted HITS-based method or Bharat's HITS algorithm with any of the four relevance scoring methods, the combined methods are only marginally better than our weighted HITS-based method. Among the four relevance scoring methods, there is no significant quality difference when they are combined with a HITS-based algorithm.
Categories and Subject Descriptors: H.3 [Information Systems]: Information Storage and Retrieval; G.3 [Mathematics of Computing]: Probability and Statistics; D.2 [Software]: Software Engineering
General Terms: Algorithms, Measurement, Performance
Keywords: HITS-based algorithms, relevance scoring methods, information retrieval
- Current HITS-based Algorithms and their Limitations
- A New Weighted HITS-based Algorithm
- Combining the HITS-based Algorithms with Relevance Scoring Methods
- Vector Space Model (VSM)
- Okapi Similarity Measurement (Okapi)
- Cover Density Ranking (CDR)
- Three-Level Scoring Method (TLS)