Next:Three-Level Scoring Method (TLS)Up:Combining the HITS-based AlgorithmsPrevious:Okapi Similarity Measurement (Okapi)
Cover Density Ranking (CDR)
Whereas methods such as VSM compute relevance from the appearance of individual terms, other methods, including CDR, are based on the appearance of phrases. CDR was developed to better meet user expectations: a document containing most or all of the query terms should be ranked higher than a document containing fewer terms, regardless of the frequency of term occurrence. In CDR, the results of phrase queries are ranked in the following two steps:
- Documents containing one or more query terms are ranked by coordination level, i.e., a document with a larger number of distinct query terms ranks higher. The documents are thus sorted into groups according to the number of distinct query terms each contains, with the initial ranking given to each document based on the group in which it appears.
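- The documents at each coordination level are ranked to produce the overall ranking. The score of the cover set $\omega = \{(p_1, q_1), (p_2, q_2), \ldots, (p_n, q_n)\}$ is calculated as follows:
$$S(\omega) = \sum_{j=1}^{n} I(p_j, q_j), \qquad I(p_j, q_j) = \begin{cases} \dfrac{\lambda}{q_j - p_j + 1} & \text{if } q_j - p_j + 1 > \lambda, \\ 1 & \text{otherwise,} \end{cases}$$
where $(p_j, q_j)$ is an ordered pair over a document, called a cover, specifying the shortest interval of two distinct terms in the document. $p_j$ is the position of one term, $q_j$ is the position of another term, and $q_j$ is assumed to be larger than $p_j$. $\lambda$ is a constant and is set to 16 in our experiments because it has been shown to produce good results. Covers of length $\lambda$ or shorter are given a score of 1, and longer covers are assigned scores less than 1, in proportion to the inverse of their lengths.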
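The cover scoring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a cover is passed in as a pair of token positions `(p, q)` with `q > p`, its length is taken to be `q - p + 1`, and a cover longer than the constant (16 in the text, called `lam` here) scores `lam` divided by its length. The function names are hypothetical, and finding the covers in a document is assumed to have been done elsewhere.

```python
def cover_score(p, q, lam=16):
    """Score a single cover spanning token positions p..q (assumes q > p)."""
    length = q - p + 1  # assumed length convention: both endpoints counted
    if length <= lam:
        return 1.0  # short covers get the full score
    return lam / length  # longer covers decay with the inverse of their length


def cover_set_score(covers, lam=16):
    """Sum the scores of all covers found in one document."""
    return sum(cover_score(p, q, lam) for p, q in covers)


# Hypothetical covers: one of length 8 and one of length 32.
example_covers = [(3, 10), (0, 31)]
example_total = cover_set_score(example_covers)
```

With the default constant, the first cover contributes 1 and the second contributes 16/32, so a document is rewarded most for query terms that occur close together.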
- Documents are scored according to the regular CDR method. Each document belongs to a coordination level group and has a score within that group.
- The scores are normalized to the range (0, 1] for documents containing only one term, to the range (1, 2] for documents containing two different terms, and so on.
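One possible way to realize this normalization is sketched below in Python. The text does not say how scores are mapped into each range, so the scheme here is an assumption: each raw score is divided by the largest raw score in its coordination-level group, and the result is shifted by the level minus one. Raw scores are assumed to be strictly positive, and the function name and input layout are hypothetical.

```python
from collections import defaultdict


def normalize_by_level(docs):
    """Map raw scores into (level - 1, level] per coordination level.

    docs: iterable of (doc_id, level, raw_score), where level is the number
    of distinct query terms in the document and raw_score > 0 (assumption).
    """
    docs = list(docs)
    max_by_level = defaultdict(float)
    for _, level, raw in docs:
        max_by_level[level] = max(max_by_level[level], raw)

    # Dividing by the group maximum yields a value in (0, 1]; adding
    # (level - 1) moves it into the range reserved for that level.
    return {
        doc_id: (level - 1) + raw / max_by_level[level]
        for doc_id, level, raw in docs
    }


# Hypothetical documents: (id, distinct query terms, raw CDR score).
scores = normalize_by_level([("a", 1, 2.0), ("b", 1, 1.0), ("c", 2, 0.5), ("d", 3, 4.0)])
ranking = sorted(scores, key=scores.get, reverse=True)
```

Under this scheme any document with more distinct query terms outranks every document with fewer, while the within-group order still follows the raw scores.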
2002-02-18