WWW2007 Paper Details
XML and Web Data
Paper Title:
A High-Performance Interpretive Approach to Schema-Directed Parsing
  • Morris Matsa (IBM)
  • Eric Perkins (IBM)
  • Abraham Heifets (IBM)
  • Margaret Gaitatzes Kostoulas (IBM)
  • Daniel Silva (IBM)
  • Noah Mendelsohn (IBM)
  • Michelle Leger (IBM)
XML delivers key advantages in interoperability due to its flexibility, expressiveness, and platform neutrality. As XML has become a performance-critical part of the next generation of business computing infrastructure, however, it has become increasingly clear that XML parsing often carries a heavy performance penalty, and that current, widely used parsing technologies cannot meet the performance demands of an XML-based computing infrastructure. Several efforts have addressed this performance gap through grammar-based parser generation. While generated parsers perform significantly better, adoption of the technology has been hindered by the complexity of compiling and deploying them. Through careful analysis of the operations required for parsing and validation, we have devised a set of specialized bytecodes designed for XML parsing and validation. These bytecodes are fine-grained enough to retain the tight composition of parsing and validation that makes existing compiled parsers fast, yet coarse-grained enough to minimize interpreter overhead. This interpretive, validating parser balances the need for performance against the requirements of simple tooling and robust, scalable infrastructure. We demonstrate the approach with a specialized schema compiler that generates bytecodes, which in turn drive an interpretive parser. With almost as little tooling and deployment complexity as a traditional interpretive parser, the bytecode-driven parser usually performs within 20% of the fastest fully compiled solutions.
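The general shape of the approach described in the abstract, a schema compiler that emits bytecodes and an interpreter that parses and validates in a single pass, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three opcodes (`START`, `TEXT`, `END`), the toy schema format, and the function names `compile_schema` and `run` are hypothetical and are not the instruction set or tooling from the paper. The sketch also handles only a tiny XML subset (no attributes, namespaces, comments, or entities).

```python
import re

# Hypothetical coarse-grained opcodes; NOT the paper's actual bytecodes.
START, TEXT, END = "START", "TEXT", "END"

# Simple-type checks applied while the text is being scanned (illustrative only).
CHECKS = {
    "int": re.compile(r"-?\d+"),
    "string": re.compile(r"[^<]*"),
}

def compile_schema(element):
    """Flatten a toy schema into a linear bytecode list.

    A schema element is (name, content), where content is either a
    simple type name or a list of child elements that must appear in order.
    """
    name, content = element
    code = [(START, name)]
    if isinstance(content, str):
        code.append((TEXT, content))
    else:
        for child in content:
            code.extend(compile_schema(child))
    code.append((END, name))
    return code

def run(code, text):
    """Interpret the bytecodes over the input, validating as it parses.

    Returns the list of simple-typed values, or raises ValueError on the
    first schema violation.
    """
    pos = 0
    values = []
    for op, arg in code:
        if op == TEXT:
            end = text.find("<", pos)
            if end == -1:
                raise ValueError(f"unterminated content at offset {pos}")
            value = text[pos:end]
            if not CHECKS[arg].fullmatch(value):
                raise ValueError(f"{value!r} is not a valid {arg}")
            values.append(value)
            pos = end
        else:
            # Skip whitespace between tags, then match the expected tag.
            while pos < len(text) and text[pos].isspace():
                pos += 1
            tag = f"<{arg}>" if op == START else f"</{arg}>"
            if not text.startswith(tag, pos):
                raise ValueError(f"expected {tag} at offset {pos}")
            pos += len(tag)
    if text[pos:].strip():
        raise ValueError(f"unexpected content at offset {pos}")
    return values

# Compile once, then reuse the bytecode for every document of this type.
schema = ("order", [("id", "int"), ("item", "string")])
bytecode = compile_schema(schema)
print(run(bytecode, "<order><id>42</id><item>pen</item></order>"))
```

In this sketch each opcode does a whole tag match or a whole text scan with its type check, so the interpreter loop runs once per schema construct rather than once per character. That is one way to picture the trade-off the abstract describes between fine-grained composition and interpreter overhead. Deploying a new schema means producing a new bytecode list, not compiling and shipping new parser code.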
New Brunswick, Thursday, May 10, 2007, 10:30am to 12 noon.