WWW2007 Paper Details
Track:
XML and Web Data
Paper Title:
Mapping-Driven XML Transformation
Authors:
  • Haifeng Jiang (IBM Almaden Research Center)
  • Howard Ho (IBM Almaden Research Center)
  • Lucian Popa (IBM Almaden Research Center)
  • Wook-Shin Han (Computer Engineering Dept., Kyungpook National University, Korea)
Abstract:
Clio is an existing schema-mapping tool that provides user-friendly means to manage and facilitate the complex task of transforming and integrating heterogeneous data such as XML over the Web or in XML databases. By means of mappings from source to target schemas, Clio can help users conveniently establish the precise semantics of data transformation and integration. In this paper, we study the problem of how to efficiently implement such data transformation (i.e., generating target data from the source data based on schema mappings). We present a three-phase framework for high-performance XML-to-XML transformation based on schema mappings, and discuss methodologies and algorithms for implementing these phases. In particular, we elaborate on novel techniques such as streamed extraction of mapped source values and scalable disk-based merging of overlapping data (including duplicate elimination). We compare our transformation framework with alternative methods such as using XQuery or SQL/XML provided by current commercial databases. The results demonstrate that the three-phase framework, simple as it is, is highly scalable and outperforms the alternative methods by orders of magnitude.
Slot:
New Brunswick, Wednesday, May 9, 2007, 10:30am to 12 noon.
Full-text:
PDF version