Tables are common in HTML documents
Table understanding important
Summarization and mobile access
<table></table> tags cannot be trusted
Is a given <table> element a genuine table?
First I will briefly explain the motivation
behind this work. Most web publications are written in HTML. Tables are commonly used in such documents
to present relational data. Since they are inherently information rich, table
understanding has many important applications on the web. Although most
tables in HTML documents are marked by table tags, the converse is not
true: not all table elements indicate genuine relational tables. In fact, most
of the time the table tag is used simply to achieve a multi-column layout
effect. So the first step in table understanding on the web is table
detection. In this study, we concentrate on tables that are coded using the
table tag. By table detection we therefore mean the process of determining
whether a given table element represents a genuine table.