Next:ConclusionUp:Improvement of HITS-based AlgorithmsPrevious:Three-Level Scoring Method (TLS)
ExperimentsIn our experiments, 28 queries and five search engines are used. The queries are exactly the same ones as those used in [1,2], they are vintage car (VC), recycling cans (RE), zen buddhism (ZB), thailand tourism (TH), parallel architecture (PA), stamp collecting (SC), telecommuting (TE), sushi (SU), alcoholism (AL), classical guitar (CG), lyme disease (LD), bicycling (BI), field hockey (FH), amusement park (AP), table tennis (TT), rock climbing (RC), computer vision (CV), shakespeare (SH), cruises (CR), gulf war (GW), gardening (GA), cheese (CH), hiv (HI), affirmative action (AA), mutual funds (MF), blues (BL), graphic design (GD), and architecture (AR). For each query, to build a base set, we started five threads simultaneously to collect top 20 hits and their neighborhood from five search engines, AltaVista, Fast, Google, HotBot, and NorthernLight, respectively. The combination of these top hits and their neighborhood forms the base set. For a document in the root set, we limit it to at most 50 in-links and collect all its out-links. The default search mode of the five search engines and lower-case queries were used. The way we construct the base set is different from the previous works [16,1,3], which usually build the base set from only one search engine, e.g., AltaVista. Combining top 20 hits and their neighborhood from five search engines gives us a more relevant base set. Running on a Sun Ultra-10 workstation with 300 MHz UltraSPARC-IIi processor connected to the Internet by the 100 Mbps fast Ethernet, our Java program took about three to five minutes to gather the base set for each query.
|CB||Combination of BHITS and CDR||WCB||Combination of WBHITS and CDR|
|OB||Combination of BHITS and Okapi||WOB||Combination of WBHITS and Okapi|
|TB||Combination of BHITS and TLS||WTB||Combination of WBHITS and TLS|
|VB||Combination of BHITS and VSM||WVB||Combination of WBHITS and VSM|
Duplicates or intra-domain links are then removed. For a link in either in-link set or out-link set of a root link, (1) it is considered a duplicate if it has the same name as another link in the same set, whether they are in lower case or not; (2) it is an intra-domain link if it is on the same domain as the root link. If two links only differ in that one ends with index.html, index.htm, home.htm, or home.html, while the other does not, they are considered duplicates. In addition, if two links are the same except that one begins with www and the other does not, they are also regarded duplicates. For example, link http://www.zenki.com/ and link http://zenki.com/ are duplicates.
Table 3 shows the base sets for the various queries. The number of distinct links in these sets ranges from 912 to 4037. The average in-degree and out-degree of a base-set link are 31 and 14, respectively, and about 43% of base-set links have 50 in-links. These sets of links are used for the BHITS algorithm or WBHITS algorithm. Let's call these sets of links set 1.
Another sets of links, called set 2, were generated from set 1 by further reducing the number of links. Each link is followed to evaluate its relevance to the query. We first remove broken links. If two links have the same document length and the same relevance score using the relevance scoring methods, they are considered duplicates, and one of them is discarded if they are in the same in-link set or out-link set of a root link. Set 2 were used by the methods combining BHITS algorithm or WBHITS algorithm with one of the four scoring methods.
Table 2 lists the 10 HITS-based algorithms used in the experiments. They are five unweighted algorithms and five weighted algorithms. The five unweighted algorithms are BHITS and its combination with one of the four relevance scoring methods. The five weighted algorithms include our weighted BHITS and its combination with one of the four relevance scoring methods.
The average relevance scores of BHITS and WBHITS on 28 queries is plotted
in Fig. 1, which clearly demonstrates
that WBHITS remarkably improves BHITS. In our experiments, we notice that
links are very common in the base sets of queries, and we find small-in-large-out
links in all the 28 queries except query 5, 15, 16, and 18. WBHITS shows
good job on almost all the queries with
except that the relevance scores are slightly worse than those of BHITS
algorithm for queries 1 and 13. Especially, WBHITS gets significant improvement
over BHITS on queries 2, 8, 17, 19, and 26 by preventing the results from
converging to small-in-large-out links.
Table 4 presents the average improvement/degradation
(%) of relevance scores between two algorithms. It shows that the combinations
of BHITS/WBHITS algorithm with a relevance scoring method outperform BHITS
algorithm, with improvement ranging from 18.3% to 22.7%. For example, WB
improves the relevance scores of B by 18.3%. The improvement of
TB, and VB over WB is marginal, ranging from 0.1%
to 2.7%. The improvement of WCB, WOB, WTB, and WVB
over CB, OB, TB, and VB is 0.9%, 0.9%, 1.0%,
and 1.7% respectively, which tells us that a weighted in-link approach
has little effect on the BHITS algorithm when both of them combine content
analysis. Table 4 shows that the combination
of BHITS algorithm with any of the four scoring methods has comparable
performance, with CDR the best and VSM the worst. The best
algorithm CB improves OB, TB, and
0.7%, 0.7%, and 2.9%, WCB improves WOB,WTB, and
only 0.7%, 00.7%, and 2.4%, respectively.
|Statistical||Statistical Comparison of a Pair of HITS-Based Algorithms|
Besides the improvement of average relevance scores between two algorithms, we compare the performance of different algorithms using two statistical metrics: expected loss (EL)  and probably approximately correct (PAC) . We set the confident level 0.95 for PAC and the loss threshold 0.05 for EL. The results are shown in Table 5. Both metrics give consistent results: WB is better than B; CB is the best and VB is the worst among the four algorithms combined with relevance scoring methods; and OB better than TB, although two PAC values are a little below the confident level.
Next:ConclusionUp:Improvement of HITS-based AlgorithmsPrevious:Three-Level Scoring Method (TLS) 2002-02-18