Presenting tailored resource descriptions: Will XSLT do the job?

Alison Cawsey
Department of Computing and Electrical Engineering
Heriot-Watt University


The problem of finding relevant resources from those available across the World Wide Web is well recognised. Improved search engines provide part of the answer, but we also need to support the user in assessing for themselves the relevance of documents suggested by a search engine, prior to download. One way to do this is to provide them with descriptions tailored to their profile and query. This paper presents examples of how XSLT may be used to create tailored descriptions from RDF metadata, and explores whether XSLT is an adequate tool for this task.

Keywords: Metadata, RDF, XSL, Information Retrieval, Personalisation.

1. Introduction

The problem of finding relevant resources from those available across the World Wide Web is well recognised. A search engine may be able to suggest a set of potentially relevant resources given a query, but the user is left with the job of trawling through this set to find those that meet his/her need. Often the information provided by the search engine about each resource is not sufficient to allow the user to assess its possible relevance prior to download. We are interested in how more informative, but concise, resource descriptions may be provided to the user to aid them in assessing resource relevance.

There is already some work addressing this problem. Within the information retrieval community there is work on producing query-directed summaries that summarise document content in a way that depends on the user's query (perhaps highlighting information of likely interest). In this work, summaries are extracted from the text of the document (e.g., [5]). However, the summaries produced are limited to what can be extracted from the text, and thus ignore information about the resource which may be external to it. Also, as the summarisation methods work by extracting text fragments, they are not useful for multimedia resources where the text represents but a small part of the information content of the resource.

Another approach that is currently being pursued to support better search and retrieval is the use of rich metadata - data about the resource, such as the title, topic, author, and date of last modification. While some of this information may be repeated within the resource itself, it is useful to have it represented in a structured, explicit form, allowing very focused queries to be made (e.g., find resources where topic=T, author=A and where title contains W). Standards both for the vocabulary to be used in metadata (e.g., Dublin Core [14] and IMS [15]) and for languages for the representation of metadata (RDF [8]) are now in place; we expect increased use of metadata as tools are developed based on these standards.

Metadata is normally embedded in a document, not visible to the user. Yet some holders of resource collections recognise the utility of metadata to the user, to help them assess relevance, and make that metadata accessible (see, for example, the Gateway to Educational Materials [16]). We believe that these metadata presentations may be more effective in allowing users to assess relevance if they are tailored to the user and query, so the user can influence which elements are presented. This is particularly important where the metadata available is complex, and could not all be concisely displayed. The IMS/IEEE metadata specification [15], for example, includes over 50 elements/subelements that may be included in a formal description of an educational resource; while not all would be used for a particular resource instance, it illustrates the complexity of the information that may be available.

The simplest way to present metadata is as a table (see figure 1 for a simple presentation of some Dublin Core metadata). Yet the same information may often be presented more concisely as a text description. We have explored how personalised resource descriptions may be produced, both in table and text form.

Description of:
Title        D-Lib Program - Research in Digital Libraries
Description  The D-Lib program supports the community of people with research interests in digital libraries and electronic publishing.
Publisher    Corporation For National Research Initiatives
Date         1995-01-07
Subject      Research; statistical methods
             Education, research, related topics
             Library use Studies
Type         World Wide Web Home Page
Format       text/html
Language     en
Figure 1. Example Metadata Table

We assume that the metadata will be represented using the Resource Description Framework (RDF) XML serialisation. RDF is a W3C standard for representing metadata [8], allowing clear specification of metadata vocabularies, and rich structure within the formal description. We focus on rich metadata describing individual resources, and in particular those describing educational resources, such as IMS and GEM [15][16]. Although there is currently very little rich metadata in RDF format, it is a standard which is likely to be quickly adopted, as the problems with existing methods (e.g., use of the HTML meta tag) become apparent.

The paper explores how far we can get using XSLT to present RDF metadata in a personalised and concise fashion, suggesting some current limitations with how stylesheets are associated with documents. XSLT is the transformation part of a general stylesheet language for XML [9]; it allows quite complex tree transformations by matching on an input tree and creating an output tree, through the application of templates. It is currently used primarily to transform XML to HTML, allowing presentation via current browsers.

2. Presenting RDF to the User - Simple Tables

Our first investigation focused on how XSLT could be used to present RDF metadata to the user. We used the examples from the RDF specification document as input -- Figure 2 illustrates a typical example. It is very straightforward to develop XSLT stylesheets to present this data as a nested table. Templates are written to deal with each element from the RDF namespace (e.g., Description, Bag, Resource), and for each metadata property element (e.g., Author) that we wish to be displayed to the user. A simplified fragment is given in figure 3, with output illustrated in figure 1. This example also serves to illustrate XSLT for those not familiar with it. Each template matches on some elements of the input tree (e.g., RDF:Description elements). The body of the template mixes elements to be inserted in the output tree (e.g., TABLE) with XSLT instructions (e.g., xsl:apply-templates, which results in templates being applied to child nodes in the input tree). Further demonstration output is available on . (Note: These examples were tested with version 19990721 of XT, an implementation of XSLT.)

  <rdf:Description about="">
    <dc:Title>D-Lib Program - Research in Digital Libraries</dc:Title>
    <dc:Description>The D-Lib program supports the community of people
     with research interests in digital libraries and electronic
     publishing.</dc:Description>
    <dc:Publisher>Corporation For National Research Initiatives</dc:Publisher>
    <dc:Date>1995-01-07</dc:Date>
    <dc:Subject>
      <rdf:Bag>
        <rdf:li>Research; statistical methods</rdf:li>
        <rdf:li>Education, research, related topics</rdf:li>
        <rdf:li>Library use Studies</rdf:li>
      </rdf:Bag>
    </dc:Subject>
    <dc:Type>World Wide Web Home Page</dc:Type>
    <dc:Format>text/html</dc:Format>
    <dc:Language>en</dc:Language>
  </rdf:Description>
Figure 2. Simple RDF Example

<?xml version='1.0'?>
<xsl:template match="RDF:Description">
  <TABLE border="2">
    <xsl:apply-templates />
  </TABLE>
</xsl:template>

<xsl:template match="DC:subject">
  <tr>
    <th> Subject </th>
    <td> <xsl:apply-templates /> </td>
  </tr>
</xsl:template>
Figure 3. Partial Style Sheet for Displaying RDF
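
The template-per-element idea can also be sketched outside XSLT. The following is a minimal Python illustration (not the stylesheet used in this work) in which one function plays the role of the templates and recursion plays the role of xsl:apply-templates. The Dublin Core namespace URI and the sample record are placeholders chosen for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://example.org/dc#"  # placeholder namespace, an assumption

SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description about="http://example.org/resource">
    <dc:Title>D-Lib Program</dc:Title>
    <dc:Subject>
      <rdf:Bag>
        <rdf:li>Research</rdf:li>
        <rdf:li>Education</rdf:li>
      </rdf:Bag>
    </dc:Subject>
  </rdf:Description>
</rdf:RDF>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def render(elem):
    """Return HTML for one element, recursing much like xsl:apply-templates."""
    if elem.tag == f"{{{RDF_NS}}}Description":
        rows = "".join(render(child) for child in elem)
        return f'<table border="2">{rows}</table>'
    if elem.tag == f"{{{RDF_NS}}}Bag":
        items = "".join(f"<li>{escape(li.text or '')}</li>" for li in elem)
        return f"<ul>{items}</ul>"
    # Anything else is treated as a property element: one table row.
    if len(elem):
        value = "".join(render(child) for child in elem)
    else:
        value = escape((elem.text or "").strip())
    return f"<tr><th>{local_name(elem.tag)}</th><td>{value}</td></tr>"


def describe(rdf_xml):
    """Render every rdf:Description in the document as a nested table."""
    root = ET.fromstring(rdf_xml)
    return "".join(render(description) for description in root)
```

Calling describe(SAMPLE) yields one table with a row per property, and the Subject row contains a nested list for the Bag.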

The fragment in figure 3 can be improved and developed in many ways. We can reasonably assume in RDF that property element names (including the namespace) will be URIs pointing to a definition of that element (either an RDF schema, or an English definition). Links to these definitions can be easily added, as can links to the resource being described, and also any property values specified using rdf:resource. As an example, given a property element DC:subject, the following XPath functions [10] (used by XSLT) insert a suitable link to the definition of the Dublin Core subject element:

    <A HREF="{namespace-uri()}{local-name()}">
      <xsl:value-of select="local-name()"/>
    </A>
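
For readers less familiar with XPath, the same concatenation of namespace URI and local name can be sketched in Python. This is an illustration only; the namespace URI in the usage note is a placeholder, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape


def definition_link(elem):
    """Build an HTML link whose target is namespace URI + local name."""
    tag = elem.tag
    if tag.startswith("{"):
        # ElementTree stores qualified names as '{namespace}local'.
        namespace, local = tag[1:].split("}", 1)
    else:
        namespace, local = "", tag
    return f'<a href="{escape(namespace + local)}">{escape(local)}</a>'
```

Given an element parsed from `<dc:subject xmlns:dc="http://example.org/dc#"/>`, the function returns a link to `http://example.org/dc#subject` with the text "subject".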

If we want the stylesheet to handle all of RDF, it should be able to handle all the abbreviated syntaxes. This makes RDF somewhat awkward to handle; the abbreviated syntax for typed nodes, for example, introduces difficulties. It seems to make sense, therefore, to "normalise" RDF (eliminating abbreviated syntax) prior to application of the stylesheets. This can be done using XSLT again, and separates out one subproblem, making all stylesheets easier to construct. Handling abbreviated typed nodes in XSLT is possible, just difficult; prior to applying XSLT to present the data, we should use XSLT to transform it into the syntax that is simplest to handle.
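
As a rough illustration of what normalisation involves, the Python sketch below expands just one abbreviated form: literal properties written as attributes of a Description are moved into child elements. It is not the XSLT normaliser described above, it does not attempt typed nodes or the other abbreviations, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def normalise(description):
    """Move property attributes on a Description into child elements."""
    for name in list(description.attrib):
        if not name.startswith("{"):
            continue  # unqualified attribute such as about="..."
        namespace = name[1:].split("}", 1)[0]
        if namespace == RDF_NS:
            continue  # attributes belonging to RDF itself stay in place
        # A namespaced attribute is taken to be an abbreviated property.
        child = ET.SubElement(description, name)
        child.text = description.attrib.pop(name)
    return description
```

After this step a presentation stylesheet only needs templates for property elements, not for property attributes.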

The stylesheet illustrated in figure 3 requires that templates are defined for each property element that we want displayed. There are two ways this can be managed, without requiring the stylesheet author to create all such templates "by hand". The first way is to use the RDF schema [11]. An RDF schema will define all the property elements associated with a particular metadata element set. An example fragment of a schema for Dublin Core is given in figure 4. We can use an XSLT stylesheet (applied to the schema) to generate another XSLT stylesheet suitable for the presentation of RDF instances based on that schema. Figure 5 illustrates a fragment of such a stylesheet-generating stylesheet. The example relies on XSLT namespace aliases being used for the XSLT namespace itself (so we can distinguish XSLT instructions from parts of the result tree), and for the namespace of the target element set.

<rdf:Description ID="Title">
   <rdf:type rdf:resource=""/>
   <rdfs:comment>The name given to the resource, usually by the Creator
          or Publisher.</rdfs:comment>
</rdf:Description>
Figure 4. Fragment of RDF Schema for Dublin Core

<xsl:template match="Description[RDF:type/@RDF:resource = 
  <litxsl:template match="ElNameSpace:{@ID}">
    <tr>
      <th> <xsl:value-of select="RDFS:label"/> </th>
      <td> <litxsl:apply-templates /> </td>
    </tr>
  </litxsl:template>
</xsl:template>
Figure 5. XSLT Fragment to create stylesheet for RDF instance from RDF Schema

So far we have only explored generating stylesheets that cover one RDF schema. Yet a particular metadata record may be based on several schemas. In general we would want to access all these schemas (via the namespace URL) and create a merged stylesheet that covers all the element types that might occur in the record. This should be reasonably straightforward, exploiting the possibility in XSLT of having multiple input documents.
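
The merging step has not been implemented in this work, but its core can be sketched in Python under some assumptions: each schema describes properties with rdf:Description elements carrying an ID and an rdfs:label, and the RDF Schema namespace URI below is a placeholder. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://example.org/rdfs#"  # placeholder for the RDF Schema namespace


def property_labels(schema_xml, namespace):
    """Map namespace + ID of each described property to its rdfs:label."""
    labels = {}
    root = ET.fromstring(schema_xml)
    for desc in root.iter(f"{{{RDF}}}Description"):
        ident = desc.get("ID")
        label = desc.findtext(f"{{{RDFS}}}label")
        if ident and label:
            labels[namespace + ident] = label.strip()
    return labels


def merge_schemas(schemas):
    """Combine labels from several (namespace, schema_xml) pairs."""
    merged = {}
    for namespace, schema_xml in schemas:
        merged.update(property_labels(schema_xml, namespace))
    return merged
```

The merged table is keyed by full property URI, so elements from different element sets that share a local name do not collide.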

The second way we have attempted to deal with arbitrary property elements is to create a general stylesheet based on the assumption that any element outside the RDF and XML namespaces is a property element. We therefore have a single general template for property elements. However, this approach cannot use information from the schemas (e.g., the label of property elements). Also, this approach (like the previous one) results in all metadata being displayed; yet much of an RDF record may refer to cataloguing and indexing information of little interest to the user. We need methods of easily defining which elements are of interest.

3. Simple Tailored Descriptions

We can easily "hardwire" our stylesheets so, for example, they only present a subset of a particular metadata element set (perhaps missing out Language and Format). Yet different users (making different types of information search) are likely to be interested in different subsets. One way to support this is to allow users to set up their own stylesheet. In a preliminary study (18 subjects) users appeared to value the facility of specifying which resource metadata elements to present following a search.

It is straightforward to create a form interface allowing metadata elements to be selected (e.g., author, title), and a cgi-program to create a personalised XSLT file. However, it is more interesting, and more easily adapted to other metadata element sets, if the form is generated automatically. We can do this using XSLT to present the contents of an RDF Schema (defining a metadata element set) as an HTML form (for element selection). All RDF property elements in the schema are translated into appropriate checkboxes (see figure 6). A general cgi-program can then be used to create the personalised XSLT for display of metadata.

<xsl:template match="Description[RDF:type/@RDF:resource = 
    <th align="left">
      <xsl:value-of select="RDFS:label"/>
      <input type="checkbox">
        <xsl:attribute name="name">
          <xsl:value-of select="@ID"/>
        </xsl:attribute>
      </input>
    </th>
</xsl:template>
Figure 6. XSLT Fragment to Create Property Element Selection Form from RDF Schema
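
The same schema-to-form translation can be sketched in Python. This is a simplified illustration rather than the stylesheet of figure 6: it emits a checkbox for every rdf:Description that has an ID instead of testing its rdf:type, the RDF Schema namespace URI is a placeholder, and the form action name is hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://example.org/rdfs#"  # placeholder for the RDF Schema namespace


def selection_form(schema_xml, action="makestylesheet.cgi"):
    """Return an HTML form with one checkbox per described element."""
    root = ET.fromstring(schema_xml)
    rows = []
    for desc in root.iter(f"{{{RDF}}}Description"):
        ident = desc.get("ID")
        if ident is None:
            continue
        # Fall back to the ID when the schema gives no label.
        label = desc.findtext(f"{{{RDFS}}}label", default=ident).strip()
        rows.append(
            f'<tr><th align="left">{escape(label)} '
            f'<input type="checkbox" name="{escape(ident)}"></th></tr>'
        )
    body = "".join(rows)
    return f'<form action="{escape(action)}"><table>{body}</table></form>'
```

The names of the ticked checkboxes are what the general cgi-program would receive when building the personalised XSLT.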

This allows personal stylesheets to be created given an RDF schema, which should be accessible via the property element names in a particular RDF document. It would be a useful facility for users interested primarily in a particular type of document (e.g., educational), with a specified metadata element set defined in an RDF schema. However, we are left with the question of how to associate that personalised stylesheet with an RDF document. This is straightforward when using XSLT on the server side (using XT); we can easily specify which XSLT file to apply to the RDF, and return the resulting HTML. However, it is less straightforward when using XSLT on the client side, using Internet Explorer. The existing methods and standards for associating a stylesheet with a document [12] don't make it easy to flexibly select which stylesheet to use. One answer is to use cookies, with the stylesheet instruction in the RDF referring to a CGI program which will look up the name of the user's own personalised stylesheet, via their cookie.

   <?xml-stylesheet type="text/xsl" href="findmystylesheet.cgi"?>
   <RDF ... >
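
The lookup such a CGI program performs might look like the Python sketch below. The cookie name `userid`, the stylesheet paths, and the in-memory table are all assumptions made for illustration; the program actually used is not shown in this paper.

```python
from http.cookies import SimpleCookie

# Hypothetical store mapping a user id (from the cookie) to a stylesheet.
USER_STYLESHEETS = {"u42": "styles/u42.xsl"}
DEFAULT_STYLESHEET = "styles/default.xsl"


def stylesheet_for(cookie_header):
    """Pick the stylesheet for the user named in the Cookie header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get("userid")
    if morsel is None:
        return DEFAULT_STYLESHEET
    # Unknown users fall back to the default presentation.
    return USER_STYLESHEETS.get(morsel.value, DEFAULT_STYLESHEET)
```

The CGI program would then return the contents of the chosen file as the stylesheet for the requesting browser.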

However, this is of limited flexibility; unless the user can specify a stylesheet independently of the source XML document, much of the power of XML is lost. (As a very different example of this, we might want to have a stylesheet which can create output particularly suitable for screen readers or speech synthesisers, with the appropriate markup for intonation and voice. We want the (possibly visually impaired) user reading the document to be able to specify that this speech-enhanced stylesheet should be used, without relying on the XML author to provide an alternative version of the document with that stylesheet specified.)

4. Tailored Textual Resource Descriptions

Producing tables to present metadata is in fact a rather verbose and inflexible approach. Consider the following metadata:
Title    The Moon and Stars
Author   Jane Smith
Subject  Astronomy
Type     Lesson Plan
Grade    5,6,7

This can be presented much more concisely as a single sentence:
"'The Moon and Stars', by Jane Smith, is an Astronomy lesson plan for grades 5 to 7."

Text based output also opens up possibilities to go beyond the literal facts when presenting the data, commenting on, comparing and adding to the raw facts (e.g., "Unlike 'Stars today' it is..").

However, generating coherent text from raw facts is a complex problem, addressed in the natural language processing community (see [4]). If the input data is very consistent (i.e., every set of input facts contains the same sort of data) it is possible to create a fill-in-the-blanks template to slot data in (illustrated in XSLT in figure 7).

<xsl:template match="RDF:Description">
   <xsl:value-of select="DC:Title"/>
   by <xsl:value-of select="DC:Author"/>,
   is a <xsl:value-of select="GEM:Subject"/>
   <xsl:value-of select="GEM:Type"/>
   for grades <xsl:value-of select="GEM:grades"/>
</xsl:template>
Figure 7. XSLT template for text output
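
To make the fill-in-the-blanks idea, and its fragility, concrete, here is a small Python sketch of the same pattern. The dictionary keys and helper names are hypothetical; the sentence pattern follows the "Moon and Stars" example above.

```python
def article(word):
    """Choose 'a' or 'an' from the first letter (a crude heuristic)."""
    return "an" if word[:1].lower() in tuple("aeiou") else "a"


def grade_phrase(grades):
    """Render [5, 6, 7] as '5 to 7'; otherwise fall back to a comma list."""
    grades = sorted(grades)
    if len(grades) > 2 and grades == list(range(grades[0], grades[-1] + 1)):
        return f"{grades[0]} to {grades[-1]}"
    return ", ".join(str(g) for g in grades)


def describe(facts):
    """Fill one fixed sentence pattern; every key must be present."""
    subject = facts["subject"]
    return (
        f"'{facts['title']}', by {facts['author']}, is {article(subject)} "
        f"{subject} {facts['type'].lower()} "
        f"for grades {grade_phrase(facts['grades'])}."
    )
```

With all five facts supplied this reproduces the example sentence; if any fact is missing the function raises a KeyError, which is exactly the weakness of a single fixed template when input varies.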

However, different authors will typically use different subsets of the available metadata elements when describing their resources. Where the metadata property values are arbitrary strings (rather than from a controlled vocabulary) this introduces further uncertainty, as the length, and even the syntactic constructions, used by different authors for the same type of element may vary. If we add to that the desire to allow personalised output, containing just those facts of interest to the particular user, we find that simple template-based methods are pushed to their limits. We can easily create grammatically correct text by expressing each fact as a single sentence, or by substituting contentless expressions when facts are omitted. But our initial studies with users suggest that poor-quality text provides no value over tabular output [1].

The problem is then, given an arbitrary subset of facts (from a reasonably well defined set), how to combine them coherently into a small number of concise sentences. This is a classic natural language generation problem - that of aggregation. While some good work has been done on algorithms for aggregation (e.g., [6]), we are interested in whether we can exploit some of the basic ideas of natural language generation, using XSLT, to at least improve on the simple template-based approach, allowing reasonably coherent text from varying input data. Does XSLT provide sufficient power to make headway?

A standard approach in natural language generation is to separate out the problems of content selection and "realisation" (as English sentences). Two, or often three, stages may be used [2], each creating intermediate representations which will be input to the next. This seems to make equally good sense when presenting information using XSLT (whether that output is in table, or other format). Typically stylesheets merge the two functions of content selection and presentation. If we can have two separate stylesheets, one concerned with picking out the sections of an XML document that are to be presented, one with transforming that data into a suitable realisable surface form, then we have a much improved modular structure allowing presentation and content selection to be revised independently of one another, and combined in more flexible ways. This is (like many things) straightforward using XSLT on the server side; the stylesheets may be simply piped together. Yet this rather obvious procedure is currently difficult on the client side (though it can be managed within Javascript, for example). This again points to limitations in the way stylesheets are currently associated with documents. As more and more applications require multiple transformations there will be pressure for better support for this.
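
The benefit of the separation can be shown with a short Python sketch in which the two stages are plain functions that are piped together. It stands in for the two stylesheets only by analogy; the record contents and function names are invented for the example.

```python
def select_content(record, wanted):
    """Stage 1: keep only the wanted facts, in the order the user listed them."""
    return [(name, record[name]) for name in wanted if name in record]


def realise_table(facts):
    """Stage 2, variant A: one tab-separated row per fact."""
    return "\n".join(f"{name}\t{value}" for name, value in facts)


def realise_sentences(facts):
    """Stage 2, variant B: one (rather stilted) sentence per fact."""
    return " ".join(f"The {name.lower()} is {value}." for name, value in facts)


def present(record, wanted, realise):
    """Pipe the output of content selection into the chosen realiser."""
    return realise(select_content(record, wanted))
```

Either realiser can be combined with any selection, and either stage can be revised without touching the other, which is the modularity argued for above.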

Separating out these two processes, we have one (simple) stylesheet for selecting content, and one more complex one for composing sentences from that content. The latter can assume that ALL the data in the input tree is to be expressed. We can also require that the first (content selection) stylesheet, while not imposing a rigid final order on the information, does order and group information. This simplifies certain things, so that the second stylesheet can check whether facts have already been expressed, simply by examining the input tree. (XSLT does not allow access, in its decisions, to the tree being constructed.) The templates required are a little inelegant, but do allow simple decisions to be made to enhance the coherence and conciseness of the text (e.g., using "also" for repeated values; omitting information already expressed). Figure 8 illustrates example (simplified) output, showing how previous resources are referred to, and how publisher details can be omitted in the second description using this approach.

Astronomy is a Science Lesson Plan for grades 5, 6. It is published by AskERIC, an OnlineProvider (Email: ). The following resources will be needed: Meter sticks, Styrofoam pellets, Cotton balls, Black poster board, Black umbrella, Star chart, Dictionary.

Constellations is another Science Lesson Plan but for grades 3, 4, 5, 6. It is also published by AskERIC.

Figure 8. Example Natural Language Output
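
The decisions behind figure 8 can be sketched in Python as follows. Note the difference from the XSLT version: this sketch simply remembers what it has already said, whereas the stylesheet has to inspect earlier nodes of its input tree. The dictionary keys are hypothetical and only the first two sentences of each description are produced.

```python
def describe_all(resources):
    """Describe resources in order, shortening facts that were already stated."""
    seen_kinds, seen_publishers = set(), set()
    paragraphs = []
    for res in resources:
        kind = f"{res['subject']} {res['type']}"
        grades = ", ".join(str(g) for g in res["grades"])
        if kind in seen_kinds:
            first = f"{res['title']} is another {kind} but for grades {grades}."
        else:
            first = f"{res['title']} is a {kind} for grades {grades}."
        publisher = res["publisher"]
        if publisher in seen_publishers:
            # Publisher details were given earlier, so omit them here.
            second = f"It is also published by {publisher}."
        else:
            second = f"It is published by {publisher}, {res['publisher_role']}."
        seen_kinds.add(kind)
        seen_publishers.add(publisher)
        paragraphs.append(f"{first} {second}")
    return paragraphs
```

Swapping the order of the two input resources moves the full publisher details to whichever description comes first.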

Separating out content selection from realisation certainly clears some ground. But there are many more issues to be addressed when creating quality text. For example, since an easily readable and concise paragraph should contain little redundancy, we want to be able to choose from the several ways in which an object may be referred to. Some of these choices will be informed by whether (and how) it has been mentioned before. The simplest case of this concerns the use of pronouns (e.g., "It is published.."). The second sentence above would not make sense if the first sentence was omitted (perhaps due to the non-availability of data on subject, type and grades in the input tree). We really need to determine when to use pronouns dynamically, based on what has already been expressed.

The simplest algorithm for pronoun selection just looks to see if a particular entity (e.g., "Astronomy") has been mentioned in the previous sentence; if so a pronoun is used. However, even this apparently simple task proves quite difficult in XSL, due to the unavailability of the output tree, and limited variable mechanisms. One approach is to push more of the problem into the content selection stylesheet, so the output from that contains information on objects (to be) mentioned -- we add "focus" elements to the tree, containing a specification of the expected main subject of a description (e.g., "Astronomy" or "AskERIC"). But this approach will only work if the order of information in the content tree is strictly maintained in the output.
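
In a language with ordinary mutable state the simple pronoun rule takes only a few lines, which highlights how much of the difficulty is specific to XSLT. The Python sketch below narrows the rule slightly: it uses a pronoun only when the entity was the subject of the immediately preceding sentence. The data format is an assumption made for the example.

```python
def refer(entity, previous_subject):
    """Use a pronoun only if the entity was the previous sentence's subject."""
    return "It" if entity == previous_subject else entity


def realise(sentences):
    """sentences: list of (subject, predicate) pairs, in output order."""
    out, previous = [], None
    for subject, predicate in sentences:
        out.append(f"{refer(subject, previous)} {predicate}.")
        previous = subject
    return " ".join(out)
```

Because the choice is made while the text is being produced, dropping the first sentence automatically turns "It is published by AskERIC." back into "Astronomy is published by AskERIC."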

Taking this approach to its logical conclusion would result in numerous stylesheets being used, each making some progress on transforming the input tree, adding more contextual information to the tree as required. This fits reasonably with current proposals for architectures for natural language generation [2], but is hardly an expected way of using XSL.

While certain problems appear difficult to handle, others have a more natural solution. One issue in language generation is that a particular fact may be expressed in many different ways, and the way to express it should depend on how it is being combined with other facts. XSLT conveniently provides a "mode" mechanism that at least allows one to separate cleanly different ways of expressing the same content. Figure 9 illustrates how this may be used when defining a template for realising Dublin Core subject information.

<xsl:template match="dc:Subject" mode="pp">
  on the subject of
  <xsl:value-of select="."/>
</xsl:template>

<xsl:template match="dc:Subject" mode="adjective">
  <xsl:value-of select="."/>
</xsl:template>

<xsl:template match="dc:Subject" mode="sentence">
  The subject category is:
  <xsl:value-of select="."/>
</xsl:template>
Figure 9. Using modes to define different syntactic constructions for same input
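
The effect of modes can be mimicked in Python with a table of realisation functions keyed by mode name. This is only an analogy to the XSLT mechanism; the names REALISERS and realise_subject are invented for the sketch.

```python
# One realisation per "mode", mirroring the three templates of figure 9.
REALISERS = {
    "pp": lambda subject: f"on the subject of {subject}",
    "adjective": lambda subject: subject,
    "sentence": lambda subject: f"The subject category is: {subject}",
}


def realise_subject(subject, mode):
    """Express the same subject value in the construction the caller needs."""
    return REALISERS[mode](subject)
```

The caller picks the mode that suits the surrounding sentence, for example "an " + realise_subject("Astronomy", "adjective") + " lesson plan" versus "a lesson plan " + realise_subject("Astronomy", "pp").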

While these "tricks" and others may provide some mileage in creating reasonable output text given varied input, allowing stylesheets to be written that work fairly well for a particular element set, ultimately natural language generation is a problem that requires complex algorithmic solutions; it involves optimisation - choosing the best output given varied quality criteria. XSLT does not allow this type of processing. One approach, used in related work on generating multilingual text from a common input representation, is to make use of Java/Javascript functions, which may be called from XSL. However, making extensive use of this facility results in code that might just as well have been written without the XSLT layer. Where quality texts are required we need to turn to other methods, using a general purpose programming language (or a specific natural language generation toolkit, e.g., [7]) that can take parsed XML or RDF and determine how it can be coherently expressed.

5. Looking Beneath the Surface: Should we use the RDF Data Model?

In the discussion so far we have considered how to give a fairly direct presentation of a selected subset of RDF metadata using XSLT. We work with the surface (XML) syntax of RDF (acknowledging the utility of first converting this to a non-abbreviated normal form), and have only considered fairly simple uses of RDF for representing tables of attributes and values. Yet RDF allows richer underlying structures to be represented, and allows inferencing on the knowledge encoded. RDF has a well specified data model, and parsing RDF will result in a set of triples which provide a canonical representation that allows further inference. An example of triples based on the RDF in figure 2 is given in figure 10. This is generated using the standard RDF parser, SiRPAC [13].

       'D-Lib Program - Research in Digital Libraries').
       'The D-Lib program supports the community of people
     with research interests in digital libraries and electronic
       'Corporation For National Research Initiatives').
Figure 10. Triples of the RDF Data Model
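
To show what flattening to triples involves, here is a deliberately limited Python sketch. It is not SiRPAC and does not reproduce its output format: it handles only literal-valued property elements, skips containers such as Bags, and returns triples in (subject, predicate, object) order, which is a choice made for this example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def triples(rdf_xml):
    """Return (subject, predicate, object) for literal-valued properties."""
    root = ET.fromstring(rdf_xml)
    result = []
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get("about") or desc.get(f"{{{RDF}}}about")
        for prop in desc:
            if len(prop):
                continue  # structured values (e.g. rdf:Bag) are skipped here
            tag = prop.tag
            # '{namespace}local' becomes the full property URI.
            predicate = tag[1:].replace("}", "", 1) if tag.startswith("{") else tag
            value = " ".join((prop.text or "").split())
            result.append((subject, predicate, value))
    return result
```

Even this small example shows the loss of surface structure discussed in this section: the grouping of properties under one Description survives only as a repeated subject.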

There have been various attempts to present useful visualisations of the information from this data model [13]. Furthermore, as mentioned, working with a canonical representation with defined semantics would allow inference to be done which relates the metadata to the user or query in more sophisticated ways.

Although working with RDF triples does provide more power, it also involves some loss of information (e.g., order of element attributes). We are interested in how to present the information back to the user. But the surface RDF syntax may be closer to a "human-readable" form, making hierarchical structure and decomposition explicit; and RDF authors (or authoring tools) may be influenced by readability issues when structuring their metadata (e.g., defining "important" property elements first). One of the first things that would have to be done when working with the triples of the data model would be to reconstruct (normalised) trees.

We believe that for our purposes there is currently much to be gained from doing fairly simple personalisation, and ensuring that the descriptions output are concise and coherent. In our initial work we have therefore worked entirely with the surface (XML) syntax, and leave to further work the possibility of reasoning more deeply with the parsed representation (as for example in the SiLRI project [3] for querying RDF data).

Whether or not we work with the RDF data model, there may be significant gains in making more use of the RDF schema for a given metadata element set - currently we just insert HTML links in our descriptions to the appropriate sections of the schema. In principle an RDF schema allows definition of a simple ontology, and property values may refer to concepts in this ontology rather than being literal strings. If this facility is widely used, then it will be important when presenting RDF to, at the least, look up the appropriate label of schema-defined concepts and use that in the description, and possibly to do further reasoning using the ontology to decide just how best to refer to a concept within a given context.

6. Conclusions

XSLT, despite its complexity and power, has limitations as a tool for creating tailored resource descriptions from metadata. For tabular descriptions, the limitations seem to lie partly in current standards for associating stylesheets with documents; these make it awkward to "mix-and-match" stylesheets and documents, to allow personalised output, and also make it difficult to specify that a sequence of stylesheets should be piped together (which seems a rational approach to modular development, allowing separation of content selection and presentation).

When creating coherent textual output, the limitations become more apparent. To produce quality text from varying input data is a complex problem, and though we can come up with various useful tricks to make this tractable when the input is fairly constrained (e.g., using modes; creating intermediate tree structures), XSLT is not suitable for less constrained input, when we need to turn to general purpose programming languages or natural language generation tools.

Generating descriptions from RDF metadata is complicated somewhat by the range of alternative abbreviated syntaxes allowed in the current RDF recommendation. These can be "normalised" out, but their existence appears to do no-one any favours, making both machine processing and human interpretation more difficult.

Many of the methods described in this paper, while developed for presenting resource descriptions from metadata, should apply to many related applications where structured data in XML syntax is to be presented flexibly to the user. For example, e-commerce applications may want tailored product descriptions from product data represented in XML. We would expect that XML schemas [17] would provide information on structures that could be used in setting up stylesheets and presenting the data.


This work was partially funded by EPSRC grant GR/M23106. Richard Tobin provided advice on XML and XSLT (and much more).


[1] Bental, D., Cawsey, A., Rock, S., and McAndrew, P. The Need for Natural Language Generation Techniques to Produce Resource Descriptions in MIRADOR. IEE Colloquium on Searching for Information: Artificial Intelligence and Information Retrieval Approaches, November 1999, Glasgow, UK.

[2] Cahill, L., et al. In search of a reference architecture for NLG systems. In Proceedings of the European Workshop on Natural Language Generation, Toulouse, May 13-14, 1999.

[3] Decker, S., Brickley, D., Saarela, J., and Angele, J. A Query and Inference Service for RDF. In QL'98 - The Query Languages Workshop, 1998.

[4] Reiter, E. and Dale, R. Building Applied Natural-Language Generation Systems. Journal of Natural-Language Engineering, 3:57-87, 1997.

[5] Sanderson, M. Accurate user directed summarization from existing tools. In Proceedings of the 7th International Conference on Information and Knowledge Management (CIKM 98), pp. 45-51, 1998.

[6] Shaw, J. and McKeown, K. An Architecture for Aggregation in Text Generation. In Proceedings of the 15th International Joint Conference on Artificial Intelligence, Poster Session, 1997.

[7] White, M. and Caldwell, T. EXEMPLARS: A Practical, Extensible Framework for Dynamic Text Generation. In Proceedings of the Ninth International Workshop on Natural Language Generation, Niagara-on-the-Lake, Canada, pp. 266-275, 1998.

[8] World Wide Web Consortium. Resource Description Framework (RDF) Model and Syntax, 1999.

[9] World Wide Web Consortium. XSL Transformations (XSLT), W3C Recommendation, 1999.

[10] World Wide Web Consortium. XML Path Language (XPath), 1999.

[11] World Wide Web Consortium. Resource Description Framework (RDF) Schema Specification, W3C Proposed Recommendation, 1999.

[12] World Wide Web Consortium. Associating Stylesheets with XML Documents, W3C Recommendation, 1999.

[13] World Wide Web Consortium. SiRPAC - Simple RDF Parser & Compiler, 1999.

[14] The Dublin Core Metadata Initiative.

[15] IMS Metadata Specification

[16] The Gateway to Educational Materials

[17] World Wide Web Consortium XML Schema Part 1: Structures, W3C Working Draft


Alison Cawsey was awarded a PhD in Artificial Intelligence in 1989, from the University of Edinburgh. Since then she has worked at Cambridge and Glasgow Universities, and is currently a lecturer in Computer Science at Heriot-Watt University. Her research has mainly focused on techniques to generate tailored descriptions and explanations, for education and medical applications.