WWW2008 Refereed Papers - WWW 2008: Refereed Papers
Skip to main content.

Refereed Papers


Track: WWW in China: Chinese Web Innovations

Paper Title:
Towards a Global Schema for Web Entities

Authors:

  • Conglei Yao(Peking University)
  • Yongjian Yu(Peking University)
  • Sicong Shou(Peking University)
  • Xiaoming Li(Peking University)

Abstract:
Popular entities often have thousands of instances on the Web. In this paper, we focus on the case where they are presented in table-like format, namely appearing with their attribute names. It is observed that, on one hand, for the same entity, different web pages often incorporate different attributes; on the other, for the same attribute, different web pages often use different attribute names (labels). Therefore, it is imaginably difficult to produce a global attribute schema for all the web entities of a given entity type based on their web instances, although the global attribute schema is usually highly desired in web entity instances integration and web object extraction. To this end, we propose a novel framework of automatically learning a global attribute schema for all web entities of one specific entity type. Under this framework, an iterative instances extraction procedure is first employed to extract sufficient web entity instances to discover enough attribute labels. Next, based on the labels, entity instances, and related web pages, a maximum entropy-based schema discovery approach is adopted to learn the global attribute schema for the target entity type. Experimental results on the Chinese Web achieve weighted average Fscores of 0.7122 and 0.7123 on two global attribute schemas for person-type and movie-type web entities, respectively. These results show that our framework is general, efficient and effective.

PDF version












Inquiries can be sent to: Email contact: program-chairs at www2008.org

Valid XHTML 1.0 Transitional