WWW2006 - Using Graph Matching Techniques to Wrap Data from PDF Documents


Using Graph Matching Techniques to Wrap Data from PDF Documents

  • Tamir Hassan, Vienna University of Technology, Austria
  • Robert Baumgartner, Vienna University of Technology, Austria


Track: Posters

Wrapping is the process of navigating a data source, semi-automatically extracting data and transforming it into a form suitable for data processing applications. A number of established products are currently available for wrapping data from web pages. One such product is Lixto, which grew out of research performed at our institute. Our work is concerned with extending Lixto's wrapping functionality to PDF documents. Because the PDF format is relatively unstructured, this is a challenging task. We have developed a method that segments each page into blocks, which are represented as nodes in a relational graph. This paper describes our current research into using relational matching techniques on this graph to locate wrapping instances.
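The abstract does not spell out the matching algorithm, so the following is only a rough illustration of the general idea, not the authors' method or Lixto's API. It assumes a hypothetical block graph in which each node carries a `kind` (e.g. "label" or "value") and each directed edge carries a spatial relation such as "left_of" or "above". A wrapping pattern is a small graph of the same form, and its instances are found with a plain backtracking subgraph-matching search. The names `Block`, `find_matches`, the node kinds and the relation labels are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Block:
    """A segmented page block (hypothetical: just its text and a coarse kind)."""
    text: str
    kind: str


# Directed, labelled edges: (source_id, target_id) -> spatial relation.
Edges = Dict[Tuple[str, str], str]


def find_matches(pattern_nodes: Dict[str, str], pattern_edges: Edges,
                 graph_nodes: Dict[str, Block], graph_edges: Edges) -> List[Dict[str, str]]:
    """Return every injective mapping (pattern id -> block id) that preserves
    node kinds and all labelled pattern edges."""
    order = list(pattern_nodes)
    results: List[Dict[str, str]] = []

    def consistent(mapping: Dict[str, str]) -> bool:
        # Every pattern edge whose endpoints are both mapped must exist in the
        # page graph with the same relation label.
        for (a, b), rel in pattern_edges.items():
            if a in mapping and b in mapping:
                if graph_edges.get((mapping[a], mapping[b])) != rel:
                    return False
        return True

    def backtrack(i: int, mapping: Dict[str, str], used: Set[str]) -> None:
        if i == len(order):
            results.append(dict(mapping))
            return
        p = order[i]
        for g, block in graph_nodes.items():
            if g in used or block.kind != pattern_nodes[p]:
                continue
            mapping[p] = g
            used.add(g)
            if consistent(mapping):
                backtrack(i + 1, mapping, used)
            # Undo this choice before trying the next candidate block.
            del mapping[p]
            used.discard(g)

    backtrack(0, {}, set())
    return results


# Hypothetical page: two label/value rows laid out as a small grid.
graph_nodes = {
    "b1": Block("Price:", "label"), "b2": Block("42 EUR", "value"),
    "b3": Block("Date:", "label"), "b4": Block("2006-05-23", "value"),
}
graph_edges = {
    ("b1", "b2"): "left_of", ("b3", "b4"): "left_of",
    ("b1", "b3"): "above", ("b2", "b4"): "above",
}

# Pattern: a label block immediately to the left of a value block.
pattern_nodes = {"L": "label", "V": "value"}
pattern_edges = {("L", "V"): "left_of"}

matches = find_matches(pattern_nodes, pattern_edges, graph_nodes, graph_edges)
pairs = [(graph_nodes[m["L"]].text, graph_nodes[m["V"]].text) for m in matches]
print(pairs)
```

Each match is one wrapping instance, here yielding the pairs ("Price:", "42 EUR") and ("Date:", "2006-05-23"). Backtracking is exponential in the worst case; it is used here only because it keeps the matching idea easy to follow.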

Citation

Hassan, T. and Baumgartner, R. 2006. Using graph matching techniques to wrap data from PDF documents. In Proceedings of the 15th International Conference on World Wide Web (Edinburgh, Scotland, May 23-26, 2006). WWW '06. ACM Press, New York, NY, 901-902.
DOI: http://doi.acm.org/10.1145/1135777.1135935
