Visualizing Structural Patterns in Web Collections
  • M.S. Ali (University of Toronto)
  • Mariano Consens (University of Toronto)
  • Flavio Rizzolo (University of Toronto)
We present a tool, DescribeX, suitable for exploring and visualizing the structural patterns present in collections of XML documents. DescribeX can be employed by developers to interactively discover, for example, those XPath expressions that will actually return elements known to occur in the collection.

Many collections of XML documents on the Web are difficult to describe because they use different schemas, the schemas may be extended through namespaces, and the document instances are often complex and ad hoc in structure. Collected feeds are one example: such collections contain documents that follow multiple schemas (e.g. Atom, RSS, and RDF), in multiple versions (e.g. RSS 1.0, RSS 2.0, etc.), further extended by schemas from several namespaces (e.g. Dublin Core, iTunes Podcast, Microsoft Simple List Extensions). Another example, not involving feeds, is a collection created from traces of web service requests.
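
To make the idea of a structural pattern concrete, the following minimal sketch counts, for a small heterogeneous collection, how many documents contain each root-to-element label path. It is an illustration only and not DescribeX's implementation: the function names `label_paths` and `summarize` and the three sample documents are hypothetical, and the samples omit namespaces for brevity. Any path reported with a nonzero count corresponds to a simple XPath expression that returns at least one element in the collection.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def label_paths(xml_text):
    """Yield the root-to-element label path of every element in one document."""
    root = ET.fromstring(xml_text)
    stack = [(root, "/" + root.tag)]
    while stack:
        elem, path = stack.pop()
        yield path
        for child in elem:
            stack.append((child, path + "/" + child.tag))


def summarize(collection):
    """Count, for each label path, the number of documents that contain it."""
    counts = Counter()
    for doc in collection:
        # A set ensures each document contributes at most once per path.
        counts.update(set(label_paths(doc)))
    return counts


# Hypothetical sample collection: two RSS-like documents and one Atom-like one.
docs = [
    "<rss><channel><title>A</title><item><title>x</title></item></channel></rss>",
    "<rss><channel><title>B</title><item><title>y</title><link>u</link></item></channel></rss>",
    "<feed><title>C</title><entry><title>z</title></entry></feed>",
]

summary = summarize(docs)
for path, n in sorted(summary.items()):
    print(f"{n} of {len(docs)} documents contain {path}")
```

A path such as `/rss/item`, which looks plausible but never occurs in these samples, is absent from the summary, so a developer could tell in advance that the corresponding XPath expression would return nothing.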
