WWW2008 Refereed Papers - WWW 2008: Refereed Papers
Skip to main content.

Refereed Papers

Track: WWW in China: Chinese Web Innovations

Paper Title:
Efficient Multi-keyword Search over P2P Web


  • Hanhua Chen(Huazhong University of Science and Technology)
  • Hai Jin(Huazhong University of Science and Technology)
  • Jiliang Wang(Hong Kong University of Science and Technology)
  • Lei Chen(Hong Kong University of Science and Technology)
  • Yunhao Liu(Hong Kong University of Science andTechnology)
  • Lionel M. Ni(Hong Kong University of Science and Technology)

Current search mechanisms of DHT-based P2P systems can well handle a single keyword search problem. Other than single keyword search, multi-keyword search is quite popular and useful in many real applications. Simply using the solution for single keyword search will require distributed intersection/union operations in wide area networks, leading to unacceptable traffic cost. As it is well known that Bloom Filter (BF) is effective in reducing traffic, we would like to use BF encoding to handle multi-keyword search. Applying BF is not difficult, but how to get optimal results is not trivial. In this study we show, through mathematical proof, that the optimal setting of BF in terms of traffic cost is determined by the global statistical information of keywords, not the minimized false positive rate as claimed by previous methods. Through extensive experiments, we demonstrate how to obtain optimal settings. We further argue that the intersection order between sets is important for multi-keyword search. Thus, we design optimal order strategies based on BF for both "and" and "or" queries. To better evaluate the performance of this design, we conduct extensive simulations on TREC WT10G test collection and the query log of a commercial search engine. Results show that our design significantly reduces the search traffic of existing approach by 73%.

PDF version

Inquiries can be sent to: Email contact: program-chairs at www2008.org

Valid XHTML 1.0 Transitional