WWW2007: Program
Top of Menu Home CFP Program Committees Key Dates Location Hotel Registration Students Sponsors Media Submission Tutorials Workshops Travel Info Proceedings

Refereed Papers

Track: Pervasive Web and Mobility

Paper Title:
Robust Web Page Segmentation for Mobile Terminal Using Content-Distances and Page Layout Information

Authors:

  • Gen Hattori (KDDI R&D Laboratories Inc.)
  • Keiichiro Hoashi (KDDI R&D Laboratories Inc.)
  • Kazunori Matsumoto (KDDI R&D Laboratories Inc.)
  • Fumiaki Sugaya (KDDI R&D Laboratories Inc.)

Abstract:
The demand of browsing information from general Web pages using a mobile phone is increasing. However, since the majority of Web pages on the Internet are optimized for browsing from PCs, it is difficult for mobile phone users to obtain sufficient information from the Web. Therefore, a method to reconstruct PC-optimized Web pages for mobile phone users is essential. An example approach is to segment the Web page based on its structure, and utilize the hierarchy of the content element to regenerate a page suitable for mobile phone browsing. In our previous work, we have examined a robust automatic Web page segmentation scheme which uses the distance between content elements based on the relative HTML tag hierarchy, i.e., the number and depth of HTML tags in Web pages. However, this scheme has a problem that the content-distance based on the order of HTML tags does not always correspond to the intuitional distance between content elements on the actual layout of a Web page. In this paper, we propose a hybrid segmentation method which segments Web pages based on both the content-distance calculated by the previous scheme, and a novel approach which utilizes Web page layout information. Experiments conducted to evaluate the accuracy of Web page segmentation results prove that the proposed method can segment Web pages more accurately than conventional methods. Furthermore, implementation and evaluation of our system on the mobile phone prove that our method can realize superior usability compared to commercial Web browsers.

PDF version











sponsors