WWW2006 - Verifying Genre-based Clustering Approach to Content Extraction

Verifying Genre-based Clustering Approach to Content Extraction

  • Suhit Gupta, Columbia University, USA
  • Hila Becker, Columbia University, USA
  • Gail Kaiser, Columbia University, USA
  • Salvatore Stolfo, Columbia University, USA

Track: Posters

The content of a webpage is usually contained within a small body of text and images, or perhaps several articles on the same page. However, that content may be lost in the clutter, which particularly hurts users browsing on small cell phone and PDA screens and visually impaired users relying on speech rendering of web pages. Using the genre of a web page, we have created a solution, Crunch, that automatically identifies clutter and removes it, leaving a clean, content-full page. To evaluate how much this technology improves its target applications, we identified a number of experiments. In this paper, we present those experiments, the associated results, and their evaluation.

Citation

Gupta, S., Becker, H., Kaiser, G., and Stolfo, S. 2006. Verifying genre-based clustering approach to content extraction. In Proceedings of the 15th International Conference on World Wide Web (Edinburgh, Scotland, May 23 - 26, 2006). WWW '06. ACM Press, New York, NY, 875-876.
DOI= http://doi.acm.org/10.1145/1135777.1135922

