WWW2008 Posters - WWW 2008: Posters

Posters


Track: Posters

Paper Title:
Web Page Sectioning Using Regex-based Template

Authors:

  • Rupesh R. Mehta (Yahoo! R&D)
  • Amit Madaan (Yahoo! R&D)

Abstract:
This work presents a novel, site-specific algorithm for web page segmentation and section importance detection that leverages structural, content, and visual information. Structural and content information are captured by a template: a generalized regular expression learned over a set of pages from the same site. Combining the template with visual information yields high sectioning accuracy, and the experimental results demonstrate the effectiveness of the approach.
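The abstract does not spell out how the template is learned, so the following is only a rough illustration of the general idea, not the authors' method. It assumes each page has already been reduced to a sequence of block-level tag names, and it treats a "template" as a regular expression in which runs of repeated tags generalize to `+`. The functions `runs`, `learn_template`, and `section` and the section names `s0`, `s1`, ... are hypothetical. Visual cues and section-importance scoring are not modeled.

```python
import re
from itertools import groupby


def runs(tokens):
    """Collapse consecutive duplicate tokens into (token, count) runs."""
    return [(tok, len(list(grp))) for tok, grp in groupby(tokens)]


def learn_template(pages):
    """Generalize the run structure shared by all sample pages into a regex.

    Hypothetical sketch: assumes every sample page collapses to the same
    sequence of tokens. A run becomes repeatable ('+') if it ever occurs
    more than once or its length differs between pages.
    """
    if not pages:
        raise ValueError("need at least one sample page")
    all_runs = [runs(p) for p in pages]
    shape = [tok for tok, _ in all_runs[0]]
    if any([tok for tok, _ in r] != shape for r in all_runs):
        raise ValueError("sample pages do not share a common structure")
    parts = []
    for i, tok in enumerate(shape):
        counts = {r[i][1] for r in all_runs}
        unit = re.escape(f"<{tok}>")
        quant = "+" if len(counts) > 1 or max(counts) > 1 else ""
        # Each run becomes one named group, i.e. one candidate section.
        parts.append(f"(?P<s{i}>(?:{unit}){quant})")
    return re.compile("".join(parts))


def section(template, tokens):
    """Split a page's token sequence into sections using the learned template.

    Returns a dict mapping section name to the number of tokens it covers,
    or None if the page does not fit the template.
    """
    text = "".join(f"<{t}>" for t in tokens)
    m = template.fullmatch(text)
    if m is None:
        return None
    # Every token was encoded as "<tag>", so counting "<" counts tokens.
    return {name: span.count("<") for name, span in m.groupdict().items()}


pages = [
    ["header", "nav", "div", "div", "footer"],
    ["header", "nav", "div", "div", "div", "footer"],
]
template = learn_template(pages)
print(section(template, ["header", "nav", "div", "footer"]))
```

Here the run of `div` elements varies in length across the samples, so it generalizes to a repeatable section. A new page from the same site is then split into four sections, and a page that lacks the `nav` block does not match the template at all.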

Inquiries can be sent by email to program-chairs at www2008.org.