An interchange format for cross-media personalized publishing

Patrick van Amstel, Pim van der Eijk,* Evert Haasdijk, David Kuilman
Cap Gemini Nederland BV


Web sites are rapidly becoming the medium of choice for one-to-one marketing, communication and commerce. Many commercial solutions in this area share two drawbacks: they force companies to implement systems within a single, highly vendor-specific framework, and they do not allow content to be reused for other media. In this paper, we introduce i*Doc, a simple XML interchange format for content-level conditionalization based on a variant of the MIL-PRF-87269 standard for class IV/V IETMs. This format can serve as an integration format in multi-vendor CRM solutions and offers consistent cross-media publishing to multiple lower-level delivery channels such as direct mail, ASP, JSP, and WML. Personalization is determined by properties that can be bound to intelligent external systems and determined dynamically.

As a showcase for i*Doc, we have developed a demonstrator of an on-line wine shop, where i*Doc serves to transport information between a database of product descriptions and generated ASP pages. The Web site is highly dynamic, as its behavior is controlled by properties that are re-computed using predictive models generated by the OMEGA Predictive Data Mining (PDM) system. The use of i*Doc allows content to be rapidly retargeted towards other delivery platforms, such as JSP, direct mail or mobile Internet.

Keywords: Customer Relationship Management (CRM), Electronic Commerce, Extensible Markup Language (XML), Interactive Electronic Technical Manuals (IETM), Personalization, Predictive Data Mining (PDM)

1. Introduction

Customer Relationship Management (CRM) is about acquiring, establishing and retaining a mutual business relationship based on knowledge the company has acquired from customer behaviour, preferences and response. Coupling knowledge of company processes to insight into customer behaviour is a key factor in establishing effective communication, transaction and processing of customer sessions. Knowing, on the basis of data mining techniques and predictive modeling, how a customer wants to be treated and what triggers interest is an important ingredient of CRM. Making connections to back-office sales strategies and content repositories completes the opportunity to build a true one-to-one experience with the customer.

The World Wide Web is rapidly becoming a medium of choice to achieve personalized marketing and commerce. Within a company, one-to-one communication affects many departments and is being addressed at many organizational levels, ranging from strategic sales and marketing analysis and decision making down to implementing their implications for back-end information systems and front-end Web engineering.

A range of commercial products are positioned as frameworks for development of one-to-one Web communication solutions. In section 3, we will provide a brief overview of the technical approach shared by some of these products, and will argue that they have several important drawbacks.

Reaching the users of tomorrow will not be restricted to the means we know today; more likely, different media will be used transparently, depending on the situation the user is in. One could argue that communication will be driven by the requirements of the customer, not by the means of communication. Ideally, information interchange will adapt to the needs of the user, even if that person is not aware of it.

As an attempt to provide for this concept, we have developed i*Doc, an XML-based format that encodes a snapshot of a company's portfolio and marketing strategy. i*Doc is heavily based on approaches developed for Interactive Electronic Technical Manuals (IETM), as discussed in section 4. In contrast to existing approaches, i*Doc is neither a marketing data management solution nor a delivery format but an interchange format. Delivery formats like Microsoft Active Server Pages (ASP) or JavaServer Pages (JSP) can be generated in fully automatic fashion, thus offering significant flexibility in deployment options at potentially lower cost. As an XML-based interchange format, i*Doc offers a simple interface to integrate information systems. Finally, i*Doc offers simple integration to intelligent external customer modeling systems. As a particularly relevant example of this, we discuss the OMEGA predictive data mining system [9], which can be used to generate intelligent models for customer behavior.

To demonstrate the capabilities of i*Doc, we have developed a sample E-Commerce site for personalized on-line wine selling. In section 5, we will discuss the motivation for this showcase, and discuss how i*Doc content is generated from a wine product database, transformed to ASP using an i*Doc compiler, and combined with OMEGA-derived models to offer intelligent personalization.

2. CRM and Content management

Control is key to providing the conditions necessary to meet the high and diverse demands we place on information. Content management (CM) is often mentioned in this context to scope the functional domain of managing information in chunks (or components), irrespective of its purpose or use later in the (IT) life-cycle. Bridging the gap between customer expectation and business response to individual needs and circumstances can only be properly addressed with the following prerequisites:

What is needed is a capability to automate the process of collecting and collating information from all the operational systems that manage interaction with the customer, such as front-office sales automation systems, call centers (including telesales and telemarketing), order processing, customer support, shipping, etc. A standard to define the relationships and enable interchange between systems is of paramount importance. XML seems the most likely contender to address this need [3]. A content management system based on XML offers an environment in which information components can be maintained at an arbitrarily fine-grained level, so that components can be addressed, queried and retrieved and then "fused" with business rules. Dynamic, user-driven communication can be extended further with the use of Predictive Data Mining (PDM). Results of interaction are constantly updated through the CRM life-cycle (knowing, targeting, selling and designing) and are stored or routed to business logic rules. One can argue that modelling of business rules and content matter at the component level constitutes the required business intelligence and technological basis for CRM.

This paper will argue that it is necessary to encode business rules, objects, logic and content in one platform-independent format: XML. Figure 1 shows how one-to-one marketing is enabled by acquiring knowledge about customers and managing content as components that can be assembled according to the profile of the individual customer.

Figure 1 Pre-requisites for one-to-one marketing

3. Current approaches

Systems that are engineered to establish a one-to-one relation with customers are usually based on application areas such as the Web, call centers and direct mail. The type of media dictates the mix of ingredients and underlying architecture used within the CRM system. Databases are used for storing customer profiles, data-mining techniques to detect patterns in customer behaviour, filtering techniques for information dissemination, and scripting languages to connect different information sources. There are also more monolithic systems that attempt to do all or most of these things. What is lacking is a consistent approach that is independent of application area and can be transposed to multiple application environments, such as the Web, WAP and direct mail, simultaneously.

3.1 Scope of personalized publishing

A number of personalization techniques can be used and related to business logic rules. These techniques have varying effectiveness depending on situation and purpose in mind. The following list gives an idea of possible personalization techniques:

Encoding the above techniques can be achieved in many ways, usually based on dedicated application software working on content fragments that supply transformed and converted content elements on the fly. Within the context of this discussion, we have decided to handle these techniques as external procedures that convey values to the core variables within an i*Doc (see 4.5).

3.2 Web-oriented systems

Today's Web-oriented systems are tools that extend the functionality of Web server applications. Site management systems offer the interface between server-based repositories and controlled client Web delivery. Depending on the level of sophistication of the tool, site developers are able to build server-side applications that respond to client behaviour and implement the calling of these "remote procedures" from client Web pages. ASP and JSP technologies are widely used to encode intelligence within Web pages to deliver a personalised experience. Tools like BroadVision One-to-One [8] and Vignette StoryServer are examples of commercial products based on this concept. The approach both applications share is the separation of business rules and content and the invocation of these rules from Web pages. Figure 2 gives an example of such code embedded in a Web page:

<p>Some CD's you might want to check out:</p>
// Invoke the matching agent for the "MusicAdvice" business rule.
var content = matchObject.matchContent("match_rule", "MusicAdvice",
    "CDS", visitor, Session.Profile, 100);

if (content != null && content.length > 0) {
  // One table row per matched item.
  for (var x = 0; x < content.length; x++) {
    var item = content.get(x);
    Response.write("<tr><td>" + item.get("TITLE") + "</td>");
    Response.write("<td>" + item.get("ARTIST") + "</td>");
    Response.write("<td>" + item.get("LABEL") + "</td></tr>");
  }
} else {
  // No match: emit a single placeholder row.
  Response.write("<tr><td>No content available!</td></tr>");
}
Figure 2 Example of BroadVision invocation of a business rule for personalised music advice

Business rules are encoded as methods that use relational table definitions as parameters. Usually a GUI generates a creation-script, as shown in figure 3. The supplied scripting language can manipulate objects that have been defined in the "Business Manager" workbench. These objects are user-defined entities that map to database records and fields. Setting business rules on business objects is handled in a proprietary fashion: constraints are defined describing boundaries that determine delivered content to the user (or community). Matching agents use the rules (or sets of rules) to fill HTML-templates for one-to-one publishing. Agents are implemented as server-side methods that are invoked from function-calls that reside in Web templates.

Figure 3 A GUI enables the specification of business rules

Within templates, ASP-scripting (or JSP) is commonly used for arithmetic functions and iteration on stored objects. Direct access to services on the server makes this approach very efficient but also very dependent on the implementation of the data and application layers.

Two seemingly distinct sets of information, business rules and content components, are still tied together by application logic that maps directly into the tables of the business rules. In this sense, controlled content delivery is a matter of delivering templates with proprietary application logic that executes matching agents. If a certain business rule is changed, all occurrences of the code throughout the site have to be edited.

Control over information is set at a level corresponding to the granularity of the database schemas. However, real-world applications require more control, for instance over 'tone-of-voice' and conditional texts. Creating support for these features in a relational database schema would prove intractable. Deeply nested recursive structures do not map well onto the relational paradigm.

The Web orientation implies the maintenance of a format that is at the end of its life-cycle. In order to adapt to future market change and communicate across media, it will be necessary to abstract from low-level access to information components, and switch to a more generic format for defining business intelligence.

Another issue is the limited distinction between content, layout and business information in the aforementioned tools. Cascading Style Sheets (CSS) provide a clear separation of layout information and content elements, but have the limitation that stylesheet information can only be bound to elements within an Internet environment.

Current personalised Web delivery tools can therefore be considered as low level tools that rely on trained staff to make changes to the system when business or market demands this.

3.3 Conclusion

We have seen that within current approaches conditionalization is accounted for at a level of HTML templates rather than at the level of content, causing high maintenance cost and no cross-medium publishing capabilities. Second, systems integration is product-specific and costly. Third, customer modeling is largely based on static, handmade profiles rather than on dynamic models derived by mining the company's warehouse of historical sales data.

The challenge we are faced with is devising a format that can integrate multiple sources and serve multiple purposes in a uniform way. The format must also support extensibility: user behaviour and tracking must be fed back into the system, enabling more knowledgeable communication with the customer.

The systems we discussed still rely on dedicated, proprietary architectures that make it difficult to leverage the effort of building a one-to-one system and to re-use that effort in multiple, future environments and applications. The imminent WAP revolution is a good example of the requirement to maintain business information and customer communication at a higher level for cross-media purposes. The i*Doc format proposes to capture and implement business rules at this level.

A higher-level vocabulary (e.g. isLoyalCustomer, likesNewWorldWine) and a higher-level abstraction over data sources (e.g. CustomerDatabase, LegalSite, PredictiveModelingTool, AccountStatements) are vital to make sure content can be re-purposed to meet future business demands. Business rules, objects, application logic and related content should be made independent of technical issues to meet today's and tomorrow's fast-growing demands on information delivery.

The XML interchange standard is designed for text encoding at any required level for generic purposes and can therefore be considered the technology of choice for encoding information for multi-purpose, cross-media delivery of content.

4. The i*Doc document format

4.1 Objectives

Our interest in working on i*Doc is to improve development of intelligent Web-based systems that offer one-to-one communication and commerce to support CRM. The use of an XML-based interchange format allows us to avoid some of the shortcomings of existing systems as identified in section 3. Specifically:

In section 5, we illustrate the use of i*Doc in a simple demonstrator system that features properties determined dynamically by an advanced predictive data modelling system. A production system might keep track of hundreds of properties, many of which are customer-specific, and a significant subset of which are dynamically (re)computed using intelligent external systems.

4.2 Interactive Electronic Technical Manuals

The i*Doc concept is heavily based on results of developments in the context of Interactive Electronic Technical Manuals (IETM). IETM is a concept developed at the U.S. Department of Defense to support operation and maintenance of complex technical systems [17]. The DoD uses a scale of five classes of IETM systems to distinguish various levels of functionality offered. Classes I to III are basic electronically viewable documents, ranging from page-based display (class I), via electronically scrolling documents (class II), to linearly structured IETMs (class III). Common delivery platforms for these are PDF or TIFF viewers for class I IETMs (often scanned legacy documents), and HTML viewers for class II and class III IETMs (the latter with extensive use of frames and limited scrolling).

Class IV-V IETMs offer high-end functionality described in the DoD MIL-PRF-87269 standard document. These classes require the use of SGML [19] as source format, use of databases for storage, and support for context-sensitive navigation and content delivery. In the IETM context, context-sensitivity is important for various reasons. One application is to support situations where different system versions or configurations require different maintenance procedures. Another application is support for different types of users. Some IETMs offer multiple versions of particular information components for novice and expert users. The description for novice users may provide more details, introductory information, and instructions when to call in expert help. The description for expert users can be more succinct or even be limited to a check list, and may describe alternative actions that require special expert skills. Via a user interface control, expert users can switch down to the detailed description, but novice users cannot switch up to the expert view.

While the novice/expert distinction is often the only way to segment the user base of an IETM, the personalization mechanisms for context filtering as offered in MIL-PRF-87269 are very generic. i*Doc applies these concepts to Web site personalization.

4.3 Building morphing Web sites using i*Doc

As a metaphor, morphing illustrates a process where content adapts dynamically to the individual who accesses the site, thus reducing the need to use hyperlink traversal or search engines to access relevant content and increasing the commercial interest of the site. The i*Doc-format is intended to act as a flexible data format to encode content for morphing Web sites. Technically, the requirements on content for class IV/V IETMs are very similar to the requirements for morphing Web sites.

We have been using the term i*Doc to refer to the use of content encoded using an XML version of MIL-PRF-87269 content in combination with intelligent external systems. Apart from adapting the SGML specification to XML, i*Doc makes explicit use of three-valued logic and uses namespaces to merge content from multiple sources. The format also supports definition of access to ODBC [25] and external COM [24] objects.

The i*Doc format is a declarative XML-based vocabulary to express conditional content. As opposed to HTML, i*Doc only standardizes a markup sublanguage, as in itself it provides only the limited set of constructs that express conditionalization. These elements need to be complemented with content-bearing elements. The distribution of i*Doc tags is governed by the i*Doc Document Type Definition (DTD). This schema is a simplified version derived from the MIL-PRF-87269 IETM standard. Distribution of content elements is governed by application-specific content DTDs. i*Doc can be used with both higher-level content DTDs like DocBook [12] or lower-level DTDs like HTML [18,32] or WML [31]. Content and conditionalization can be separated cleanly using the namespace mechanism [7].

In figure 4 an example is shown of an i*Doc frame, as it might be displayed graphically in a (hypothetical) marketeer's workbench. The i*Doc notation is shown in figure 6.

Figure 4 An i*Doc flow-diagram

4.4 The MIL-PRF-87269 language

MIL-PRF-87269 offers a standard language to express conditional context filtering [13,15]. It offers data structures corresponding to standard programming language constructs using an SGML-based syntax (in this paper converted to XML syntax for consistency). In this subsection, we will briefly summarize (a simplified variant of) this language. Expressions can reference properties that can be typed as integers, strings or booleans. Constants can be defined and referenced using the elements <integer>, <string>, and <boolean>. Operators can be used to test property values or to construct complex expressions. There are integer operators like <gt/> (greater than) and <lt/> (less than) that yield boolean values. Boolean expressions can be combined using the <and/> and <or/> operators. String-valued properties can be tested for equality.

The following figure displays an example from a hypothetical IETM that tests whether a property SerialCode is less than 16230. This test might conditionalize content that only relates to the first instances of a particular product (that may suffer from a defect that has been remedied for later releases). Note that in this example no namespace prefixes are used.

Figure 5 Example of MIL-PRF-87269 expression of the condition "SerialCode" < 16230
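
As an illustration of how a run-time system might evaluate such a condition, the following Python sketch parses and evaluates the test "SerialCode" < 16230. The element names (expression, property, integer, lt, gt) are those mentioned in the text, but the arrangement of the elements (an operator followed by its two operands) and the name attribute on property are assumptions made for this sketch, not the normative MIL-PRF-87269 or i*Doc syntax.

```python
import xml.etree.ElementTree as ET

# Assumed layout: an operator element followed by its two operands.
EXPRESSION_XML = """
<expression>
  <lt/>
  <property name="SerialCode"/>
  <integer>16230</integer>
</expression>
"""


def operand_value(element, properties):
    """Resolve a <property> reference or an <integer> constant."""
    if element.tag == "property":
        return properties[element.get("name")]
    if element.tag == "integer":
        return int(element.text)
    raise ValueError(f"unsupported operand: {element.tag}")


def evaluate(expression, properties):
    """Evaluate a binary integer comparison against the property table."""
    operator, left, right = list(expression)
    a = operand_value(left, properties)
    b = operand_value(right, properties)
    if operator.tag == "lt":
        return a < b
    if operator.tag == "gt":
        return a > b
    raise ValueError(f"unsupported operator: {operator.tag}")


expression = ET.fromstring(EXPRESSION_XML)
early_unit = evaluate(expression, {"SerialCode": 15000})  # early production unit
later_unit = evaluate(expression, {"SerialCode": 20000})  # later release
```

With these sample values, only the early unit satisfies the condition, so only that unit would receive the conditionalized content.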

These expressions can be used within IETM content to conditionalize document sections. MIL-PRF-87269 contains the following constructs that express conditions:

The element <Node> can be used inside a conditional element as a generic container for content. It contains an optional first daughter node precond. A precond element contains an expression that, when evaluated at run-time, should return boolean "true" (unless the node is contained in a NodeAlts). Nodes can be referenced via unique identification attributes. The i*Doc compiler used in section 5 can generate separate output units (HTML pages, WML decks) for various nodes and convert ID/IDREF cross-references to hypertext links.

The following example shows the use of namespaces to differentiate i*Doc elements from content elements, in this case elements from the XHTML DTD [32]. The example shows an <ifNode> that limits access to a special offer to high-value customers. This is done by only displaying a hyperlink to users that have the boolean property highValueCustomer set to the value true. This corresponds to the graphical representation shown in figure 4.

      <xhtml:a href="specialOffer">Read all about our special vintage 
         Champagne offer</xhtml:a>
      <xhtml:a href="dailySpecials">View our daily specials</xhtml:a>
Figure 6 Conditional hyperlinks to nodes "specialOffer" or "dailySpecials"

The MIL-PRF-87269 standard assumes a run-time interpreter that has knowledge of the syntax and semantics of the expression language, or compilation to a format that provides equivalent functionality. The interpreter should validate the conditions on <nodeAlts> elements, and evaluate the <expression>s contained in <ifNode>s. It should also maintain a lookup table associating properties with values. All properties have global scope and need not be initialized.

Properties can obtain values in one of several ways. First, a value can be asserted in an <assertion> statement in a <postcond> in a node. This is shown in the next figure, with an example that might be used in an on-line shopping site such as the wine site discussed in section 5. This assertion records that a customer has visited a particular page containing a Champagne offer. This illustrates a simple way to selectively record user navigation, which might subsequently be used to generate content (such as related offers) sensitive to site navigation patterns.

Figure 7 The property "champagneOfferVisited" is set to the value "true"
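
The effect of such an assertion on the interpreter's lookup table can be sketched in Python as follows. This is a minimal sketch, not the i*Doc run-time: the PropertyTable class, the visit_node function and the dictionary representation of a node are hypothetical names introduced for illustration. Only the property name champagneOfferVisited and the node name specialOffer come from the text.

```python
class PropertyTable:
    """Global lookup table; a property that was never set reads as None."""

    def __init__(self):
        self._values = {}

    def get(self, name):
        return self._values.get(name)

    def assert_value(self, name, value):
        self._values[name] = value


def visit_node(node, properties):
    """Deliver a node's content, then apply the assertions in its postcondition."""
    for name, value in node.get("postcond", []):
        properties.assert_value(name, value)
    return node["content"]


# Hypothetical in-memory form of a node carrying one assertion.
champagne_offer = {
    "id": "specialOffer",
    "content": "Special vintage Champagne offer",
    "postcond": [("champagneOfferVisited", True)],
}

properties = PropertyTable()
before = properties.get("champagneOfferVisited")   # not yet assigned
delivered = visit_node(champagne_offer, properties)
after = properties.get("champagneOfferVisited")    # recorded by the assertion
```

Because properties have global scope, any later condition can test champagneOfferVisited without further bookkeeping.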

In IETMs, a second way a property value can be set is through user-interaction. The run-time interpreter is required to detect reference to properties that are not assigned values. In MIL-PRF-87269, a <property> can have a dialogRef attribute, which references a <dialog> element. At run-time, when the property value is undefined, a class IV IETM can use this element to create a dialog box, with a request to the user to supply a value for the property. In a Web site context, <dialog>s could be rendered as forms. However, in an E-Commerce application, interaction with customers should be as specific and unintrusive as possible. The number of links to follow or forms to fill out before a user obtains relevant data has to be minimized, because each additional barrier between entry point and data means a percentage of visitors will drop out.

Information-seeking dialogs to deal with unknown property values must also be avoided for another reason. They may reveal sensitive competitive information about a company's marketing strategy or about the type of information a company maintains about its customers. As a result, we have adopted an alternative strategy for the run-time system based on explicit three-valued logic [6]. To account for this at the content level, we have modified the MIL-PRF-87269 <ifNode> to take four arguments: a condition, a "true"-clause, a "false"-clause, and an "unknown"-clause, which is taken if the <expression> evaluates to "unknown". i*Doc authors (or tools generating i*Doc content) can therefore explicitly specify what content should be delivered in these cases.
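
The four-argument conditional can be illustrated with a short Python sketch in which "unknown" is represented by None. The truth tables used for the connectives are the strong Kleene tables; this choice is an assumption, since the text does not state which three-valued tables the i*Doc run-time uses. The function names are hypothetical.

```python
from typing import Optional

Tri = Optional[bool]  # True, False, or None for "unknown"


def tri_and(a: Tri, b: Tri) -> Tri:
    """Three-valued conjunction: a single False decides the result."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def tri_or(a: Tri, b: Tri) -> Tri:
    """Three-valued disjunction: a single True decides the result."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def if_node(condition: Tri, true_clause, false_clause, unknown_clause):
    """Select one of three clauses, mirroring the modified <ifNode>."""
    if condition is None:
        return unknown_clause
    return true_clause if condition else false_clause


# A visitor about whom nothing is known yet: no dialog is raised,
# the author-supplied "unknown" clause is delivered instead.
properties = {}
link = if_node(properties.get("highValueCustomer"),
               "specialOffer", "dailySpecials", "dailySpecials")
```

Here the author has chosen to treat unknown visitors like ordinary customers; a different "unknown" clause could equally invite them to register.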

Apart from assertions and dialogs, properties can be bound to external processes. A class V IETM is an integrated system. Values for properties may be supplied by external, intelligent systems, and need not be user-supplied (as in a class IV IETM). In the IETM context, such systems might be data acquisition applications or expert systems. To reference such external systems, the IETM content may define <process> elements that define such external processes and the <parameter>s they take. In class V IETMs, the dialogRef attribute can also reference a <process> element.

The demonstrator described in section 5 was built using an implementation that supports two types of process invocations in i*Doc, viz. ODBC database access and COM binding. The following (simplified) example shows a simple XML notation that expresses that the value for the property highValueCustomer can be determined by looking up the field Value in a row with value of field ID identical to the value of property userID in the table Customer in an ODBC-accessible database named CustomerDB.

<process id="selectCustomerValue" type="odbc" dsn="CustomerDB" 
   <parameter type="in">
   <parameter type="out">
Figure 8 Process definition and parameter specification

Note that binding of properties is separate from reference to properties in i*Doc content. For example, the definition of the <process> can be changed if the value is no longer stored statically as a database field but computed dynamically by an external predictive model. This change can be made without any changes to i*Doc content containing expressions that reference the property highValueCustomer. i*Doc also allows properties that are set by assertions to be marked as persistent (stored across various sessions) or only relevant to particular sessions (and thus re-set after session time-out). Apart from customer-specific properties, properties might be used to encode product availability (bindings to inventory management systems), or even to reflect time- or weather-related information.
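
The separation between binding and reference can be sketched in Python. The sketch uses an in-memory SQLite database as a stand-in for the ODBC data source CustomerDB; the table and field names (Customer, ID, Value) and the property names come from the example above, whereas the function names, the integer encoding of Value, the averageSpend property and the threshold-based "model" are hypothetical placeholders and do not represent OMEGA.

```python
import sqlite3


def make_customer_db():
    """In-memory stand-in for the ODBC data source "CustomerDB"."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customer (ID TEXT PRIMARY KEY, Value INTEGER)")
    conn.executemany("INSERT INTO Customer VALUES (?, ?)",
                     [("u1", 1), ("u2", 0)])
    return conn


def lookup_binding(conn):
    """Binding 1: read field Value from the row whose ID equals userID."""
    def resolve(properties):
        row = conn.execute("SELECT Value FROM Customer WHERE ID = ?",
                           (properties["userID"],)).fetchone()
        return None if row is None else bool(row[0])
    return resolve


def model_binding(threshold):
    """Binding 2: hypothetical placeholder for an external predictive model."""
    def resolve(properties):
        spend = properties.get("averageSpend")
        return None if spend is None else spend >= threshold
    return resolve


def render_link(properties, bindings):
    """Content-side reference; identical whichever binding is installed."""
    value = bindings["highValueCustomer"](properties)
    return "specialOffer" if value else "dailySpecials"


conn = make_customer_db()
bindings = {"highValueCustomer": lookup_binding(conn)}
static_link = render_link({"userID": "u1"}, bindings)

# Re-bind the property to a model; render_link is left untouched.
bindings["highValueCustomer"] = model_binding(threshold=50)
dynamic_link = render_link({"userID": "u2", "averageSpend": 80}, bindings)
```

Only the entry in the bindings table changes when the source of the value changes, which corresponds to editing the <process> definition while leaving the content alone.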

4.5 Associating i*Doc to predictive models

For one-to-one marketing to be feasible, it is necessary to have an accurate picture of a customer's preferences, i.e. it is not enough to know with whom you are communicating, but more importantly, to know what that person is interested in and how that should be presented. Predictive models provide this ability and thus complement the matching techniques offered by existing CRM-solutions. Predictive models can be used to identify customers or prospects that offer cross- or up-selling opportunities, are at risk of leaving, are likely to grow or become more (or less) profitable, to name but a few possibilities. Having such predictions available greatly enhances the set of functionalities available to customize customer contact.

Predictive data mining (PDM) aims to develop such models on the basis of historical examples of customer behavior: given a set of customer data (e.g. age or level of income) and observed behavior (e.g. did or did not respond favorably to an offer for some product), models or rules are developed that map known data to predicted behavior [14,16]. Techniques employed by predictive data mining systems range from linear regression and decision trees to neural networks and genetic algorithms. The PDM system used in the demonstrator is OMEGA, a system that employs a combination of statistical methods and genetic programming [9].

Obviously, the predictions offered by these models need to be available at the time of customer contact. This can be achieved by appending the model results to the available data sources (off-line), but a much better alternative is to assess each case in real time. The latter option ensures that up-to-date information is used and allows for the use of information entered by the user during the session. OMEGA-developed models can be exported to a runtime interpreter that can be incorporated into CRM systems.

i*Docs use OMEGA models as external procedures that are triggered by means of a function call. Within i*Doc the data source and data type are insignificant. Auxiliary processing of function definitions binds the required information about the physical location of the data sources and the appropriate data handles. The models can be used either as a stand-alone application or as a module within a real-time environment. By making external function calls, i*Docs remain independent of the back-office systems and the issues related to the implementation of the PDM system.

5. The Wine-Online showcase

5.1 Wine Online

As a showcase of i*Doc technology, a prototype E-Commerce site offering personalized wine offers has been developed. The wine trade is typical of businesses that feature a very broad and diverse range of products and a diversity of parameters that determine an individual customer's preference, including cost, regional provenance, grape variety, and ability to age or to match with food. Customer communication also needs to keep track of interest levels (novice versus professional buyers) and of opportunities for cross-selling (similar or complementary offerings) and affinity networks (other luxury goods, wine travels).

Sites like the Wine Online demonstrator generally offer free access, but require users to register to have access to (parts of) the site. User registration for the demonstrator involves specification of interest level, gender, age, regional and grape varietal interests, and of average prices of products purchased. When logging in, users are presented with a customized product offering. Advertisements are generated on the basis of similar criteria as product offerings. Interest level determines tone of voice: descriptions using wine jargon are avoided with novice users. The site features special offers of limited edition products for its premium customers.

In a production environment, product sales history can be tracked in a data warehouse to perform trend analysis and to predict customer interest. Provided sufficient data, data mining software might detect subtle, seemingly unrelated micro-trends, such as "customers that previously bought Chilean Merlot and Australian Chardonnay and now buy Argentine Malbec are likely to be interested in New Zealand Cabernet Sauvignon". This allows the site to offer added value.

5.2 i*Doc in Wine Online

In Wine Online, i*Docs are used as a content transport format. Wine descriptions are stored in a commercial relational database. For each product, the following information is available: ProducerName, ProducerRating, ProductName, PriceClass, Price, ProductRating, Vintage, Color, Grape, Continent, Country, Region, InformalDescription, FormalDescription. The demonstrator contains sample content taken from a professional wine buyer's guide [27].

The i*Doc generator is an ad hoc XML generator (developed using the ODBC and XML libraries in Perl) that essentially produces a database dump, with wine product descriptions (the content) grouped according to various selection criteria, expressed as ifNodes and nodeAlts (the marketing). The XML generated follows a hypothetical markup language for wine descriptions corresponding to database field names. XML elements for i*Doc and content-encoding XML are distinguished using namespace prefixes. When registering, users indicate the average price of wines they buy as well as their interest level (novice or professional). The following XML fragment orders products according to these properties.

      <wdml:ProducerName>J.B. Adam</wdml:ProducerName>
            <wdml:Description>This fresh, dry ... </wdml:Description></idoc:node>
            <wdml:Description>... Professional wine description featuring wine 
               trade jargon ...</wdml:Description></idoc:node>
      <!-- ... other information about this product ... -->
   <!-- ... other products in price class "A" ... -->
   <!-- ... other nodes for other price classes ... -->

Figure 9 Sample i*Doc content for on-line wine shop

Products are grouped in Nodes by priceClass. Note that there is an embedded conditional section within the Product that selects either the InformalDescription or the FormalDescription database field content as source for the <wdml:Description> element. This shows that nesting of conditional sections is straightforward and enables personalization at various levels of granularity.

To publish this i*Doc fragment as a Web site, two transformations are needed. First, the wine description content elements need to be mapped to an HTML presentation. The Wine Online HTML front end was developed by a graphic designer using a commercial HTML editor and incorporated as XHTML output in an XSLT stylesheet. This stylesheet replaces all content elements with XHTML but leaves all conditional elements in place. This step of the process is specific to the XML content matter and to HTML servers as publishing systems, and would need to be rewritten for other XML languages and for non-HTML channels.

Next, personalization semantics needs to be applied to the various conditional constructs. After initial experiments with server-side interpretation of i*Doc, the demonstrator was built by compilation of the XHTML-based i*Doc content to ASP (Active Server Pages, [23]) to benefit from standard facilities for session management and for access to ODBC and COM data sources in commodity Web servers. This step is independent of XML content matter (now replaced by XHTML) but specific to the ASP presentation engine.

5.3 Publishing i*Doc as Active Server Pages

For reasons of implementation efficiency, it proved useful to re-use and build on top of the Microsoft Active Server Pages [23] architecture as a delivery platform for i*Doc content, to benefit from built-in ASP features like session management and database connectivity. A compiler has been built to deliver i*Doc content for the following reasons:

The actual compilation is done in two passes. In a first pass, properties used in the i*Doc content are collected and proper variable initialization code is generated. The ASP/HTML is generated in the second pass.
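
The first pass can be sketched as follows. This is a minimal illustration under stated assumptions, not the compiler described here: it uses the SAX 2 DefaultHandler class shipped with the JDK rather than the SAX 1 interfaces, it assumes that a property is written as an idoc:property element carrying a name attribute, and it assumes that a VBScript Dim statement is adequate initialization code. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class FirstPassSketch {

    // Pass 1: walk the document once and record every property name,
    // in order of first use and without duplicates.
    static Set<String> collectProperties(String xml) {
        final Set<String> names = new LinkedHashSet<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Assumed markup: <idoc:property name="..."/>
                String name = atts.getValue("name");
                if ("idoc:property".equals(qName) && name != null) {
                    names.add(name);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse i*Doc content", e);
        }
        return names;
    }

    // Emit one VBScript declaration per collected property (an assumption
    // about what "variable initialization code" looks like).
    static String initializationCode(Set<String> names) {
        StringBuilder out = new StringBuilder("<%\n");
        for (String name : names) {
            out.append("Dim ").append(name).append('\n');
        }
        return out.append("%>\n").toString();
    }
}
```

The output of initializationCode would be written at the top of each generated page, before the ASP/HTML produced by the second pass.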

Following an initial prototype built using Omnimark [26], the i*Doc compiler is now written in Java using the SAX API [22]. SAX offers an interface to parser events, such as "element start" and "element end". Handler classes are written to process these events for all i*Doc elements. The following code shows the interface for the <idoc:property> element. The string value of the property can be retrieved using the data interface. Content other than i*Doc elements is passed through unchanged.

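     // Called at the start tag; atts holds the element's attributes.
     public void start(String name, AttributeList atts) {}
     // Called with character data, i.e. the string value of the property.
     public void data(String pcdata) {}
     // Called at the end tag.
     public void end(String name) {}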
Figure 10 SAX interfaces for Property

In an Internet environment, the presentable unit is the Web page. The i*Doc author (or generating application) can specify that content is to be rendered as separate pages. Content of a node with attribute id="value" is written to a file "value.asp", and idref-based cross-references are converted similarly.
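
The naming convention can be sketched in a few lines of Java. The class and method names are hypothetical; only the "value.asp" convention and the shape of the generated link (compare Figure 11) are taken from the text.

```java
final class PageNameSketch {

    // Content of a node with id="value" is written to the file "value.asp".
    static String fileFor(String nodeId) {
        return nodeId + ".asp";
    }

    // An idref-based cross-reference becomes a hyperlink to that file.
    static String linkFor(String idref, String linkText) {
        return "<A HREF='" + fileFor(idref) + "'>" + linkText + "</A>";
    }
}
```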

The three-valued logic (true, false, unknown) requires additional programming because VBScript (the scripting language of ASP) does not support it. Therefore, when i*Doc's conditional constructs are compiled into if...then, while...wend and case statements, three-valued logic is translated into two-valued logic (true, false). An ifNode can be translated to an if statement that checks whether the boolean expression in it is defined. If it is defined, an embedded if statement checks whether the expression is true or false. The nodeAlts element is converted similarly. The following figure illustrates this for the i*Doc fragment shown in figure 6.

<% if (highValueCustomer <> "") then %>
<% if (((highValueCustomer)=("TRUE"))) then %>
<A HREF='specialOffer.asp'>Read all about our special vintage champagne offer</A>
<% else %>
<A HREF='dailySpecials.asp'>View our daily specials</A>
<% end if %>
<% end if %>
Figure 11 ASP code for Champagne offer
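
The translation pattern of Figure 11 can be sketched in Java as follows. This is an illustration only: the class, the enum and the method names are hypothetical, and the sketch covers a single boolean property rather than arbitrary expressions. As in Figure 11, the empty string stands for "unknown", and an unknown condition renders neither alternative.

```java
final class IfNodeSketch {

    enum Truth { TRUE, FALSE, UNKNOWN }

    // What the generated page does at run time: an unknown condition
    // produces no output at all.
    static String render(Truth condition, String whenTrue, String whenFalse) {
        switch (condition) {
            case TRUE:  return whenTrue;
            case FALSE: return whenFalse;
            default:    return "";
        }
    }

    // What the compiler emits: an outer test for "defined" wrapped around
    // an ordinary two-valued if/else.
    static String toAsp(String property, String whenTrue, String whenFalse) {
        return "<% if (" + property + " <> \"\") then %>\n"
             + "<% if (" + property + " = \"TRUE\") then %>\n"
             + whenTrue + "\n"
             + "<% else %>\n"
             + whenFalse + "\n"
             + "<% end if %>\n"
             + "<% end if %>\n";
    }
}
```

Calling toAsp("highValueCustomer", ...) with the two hyperlinks as content yields code of the same shape as Figure 11, apart from redundant parentheses.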

By default, Web pages are stateless. ASP addresses this with session variables, but these do not cover properties that have to be saved across sessions (to keep a history of the interaction between the site and a particular user). The compiler therefore distinguishes between session variables and user variables, and stores the user variables per user in a persistent i*Doc property store.

Evaluating expressions within i*Doc content is easily achieved by transforming <plus/>, <divide/>, <eq/>, <lt/>, <and/> etc. to the corresponding operators of the ASP scripting language.
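
A minimal sketch of such a mapping is given below. It assumes that the operands have already been compiled to VBScript text and passes them in as strings; how operands are encoded in i*Doc is not modelled here. The class and method names are hypothetical, while the VBScript operators themselves are standard.

```java
import java.util.HashMap;
import java.util.Map;

final class OperatorSketch {

    // i*Doc operator element name -> VBScript operator.
    private static final Map<String, String> VBSCRIPT = new HashMap<>();
    static {
        VBSCRIPT.put("plus", "+");
        VBSCRIPT.put("divide", "/");
        VBSCRIPT.put("eq", "=");
        VBSCRIPT.put("lt", "<");
        VBSCRIPT.put("and", "And");
    }

    // Combine two compiled operands; parentheses keep nesting unambiguous.
    static String binary(String element, String left, String right) {
        String operator = VBSCRIPT.get(element);
        if (operator == null) {
            throw new IllegalArgumentException("unsupported operator element: " + element);
        }
        return "(" + left + " " + operator + " " + right + ")";
    }
}
```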

Code for consulting databases is easily generated for ASP because of its support for ODBC [25]. Appropriate bindings are generated according to the i*Doc <process> element (see figure 8). Retrieving values from external sources requires an additional functional component within a Web page, such as a COM object [24]. This dedicated component serves as an API to a server application. Within i*Doc the COM object is invoked using a function call. In fact, all external processes that have to be bound to i*Doc constructs are implemented as functions.

The Java subclassing mechanism is used to separate the ASP code generation from generic i*Doc processing. This means that the compiler can be easily re-targeted towards other publishing mechanisms, such as JavaServer Pages [28].
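
The separation can be illustrated with a small sketch in which a generic base class handles the i*Doc construct and subclasses supply the target syntax. All class and method names are hypothetical, and the JSP variant is an illustration of re-targeting rather than an existing back end.

```java
abstract class ConditionalGenerator {

    // Generic i*Doc processing, shared by every target: wrap content in a
    // condition on a boolean property.
    final String conditional(String property, String content) {
        return beginIf(property) + "\n" + content + "\n" + endIf() + "\n";
    }

    // Target-specific hooks.
    abstract String beginIf(String property);
    abstract String endIf();
}

final class AspGenerator extends ConditionalGenerator {
    @Override
    String beginIf(String property) {
        return "<% if (" + property + " = \"TRUE\") then %>";
    }

    @Override
    String endIf() {
        return "<% end if %>";
    }
}

final class JspGenerator extends ConditionalGenerator {
    @Override
    String beginIf(String property) {
        return "<% if (\"TRUE\".equals(" + property + ")) { %>";
    }

    @Override
    String endIf() {
        return "<% } %>";
    }
}
```

Adding a new delivery channel then amounts to writing one more subclass; the generic traversal of the i*Doc content is left untouched.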

6. Related work

One-to-one communication and personalization are actively researched topics. They have been studied in the context of areas as diverse as information filtering [1], recommender systems [30], targeted advertising [21] and personalized newspapers [20, 29]. Systems integration and user interface generation are "engineering" issues peripheral to the core research issues in these areas and therefore much less actively researched.

The concepts behind MIL-PRF-87269 have been further developed in the MID (Metafile for Interactive Documents) project [2]. MID offers a more general language for portable IETM view packages. This approach is being developed further in ISMID, an upcoming ISO Interchange Standard for Modifiable Interactive Documents [10].

Finally, the market for Web-based CRM products and solutions is very competitive and support for XML technology is increasing in recent and upcoming product releases. While we are unaware of pure, declarative interchange formats like i*Doc that emphasize a strict separation from "output channels" like HTML sites, a product like Art Technology Group's Dynamo improves integration with back-end content management systems using "Open Content Adapters" interfaces [5], and also offers some XML support. It would seem to be relatively easy to add i*Doc support to such a product.

7. Discussion and future work

i*Doc is an interchange format for conditionalized content. An i*Doc document instance can be seen as a snapshot of a company's marketing strategy at a particular point in time, as it contains commercial product information ("what do we sell") plus market segmentation information ("which categories of customers do we sell this to"). In a complete solution, i*Docs would be intermediate data structures that are assembled from separate systems for product information and customer information.

In the Wine Online demonstrator, the marketing logic (encoded as nodeAlts and ifNode elements) is fixed in the script that maps database records to i*Doc XML content. In a more realistic setting, a graphical i*Doc authoring environment would be needed to allow marketeers to create and update this logic, possibly using intuitive layouts such as the one shown in figure 4. Our limited experimentation with XML editors so far did not confirm our initial idea that these tools might be a good platform on which to develop such an environment. Workbenches for i*Doc would also need simulation tools to answer questions like "How would this site present itself to this category of customers?" and "How many of my customers will this special offer be displayed to?". Commercial CRM systems like BroadVision and ATG offer extensive support in this area.

Another issue is that of business rules. In i*Doc, properties can currently be set by assertions, retrieved from database fields, or computed by models in external systems. Apart from this, there is a clear need to allow marketeers to specify rules (functional dependencies) that derive values for properties based on other properties. We are currently working on a Business Rule Markup Language in which such rules can be expressed and exchanged. Operationally, business rules would be evaluated dynamically using forward chaining, so that values of additional relevant properties are computed before i*Doc fragments are interpreted. Commercial CRM systems typically offer support for business rule specification, but store the rules in proprietary (and sometimes non-transparent) formats.
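
Forward chaining over properties can be sketched as follows. This is a hypothetical illustration and not the Business Rule Markup Language under development: a rule is modelled as a target property, the properties it depends on, and a function that computes the value. The property name averagePrice and the threshold in the usage note are invented for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class ForwardChainingSketch {

    // A rule derives one property from other properties.
    static final class Rule {
        final String target;
        final List<String> premises;
        final Function<Map<String, Object>, Object> derive;

        Rule(String target, List<String> premises,
             Function<Map<String, Object>, Object> derive) {
            this.target = target;
            this.premises = premises;
            this.derive = derive;
        }
    }

    // Fire rules repeatedly until no new property value can be derived.
    // Each target is derived at most once, so the loop terminates.
    static Map<String, Object> run(Map<String, Object> facts, List<Rule> rules) {
        Map<String, Object> known = new HashMap<>(facts);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Rule rule : rules) {
                boolean canFire = !known.containsKey(rule.target)
                        && known.keySet().containsAll(rule.premises);
                if (canFire) {
                    known.put(rule.target, rule.derive.apply(known));
                    changed = true;
                }
            }
        }
        return known;
    }
}
```

For example, a rule that sets highValueCustomer when averagePrice exceeds some threshold, and a second rule that depends on highValueCustomer, are both resolved regardless of the order in which they are listed, because the second rule only fires once its premise has been derived.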

Several potential extensions to i*Doc are conceivable to improve its usefulness as representation format, in particular when used to transport tabular information stored as database records. The i*Doc content for the wine demonstrator was generated by a simple database dump program and therefore combines conditional elements and static wine product descriptions. In realistic situations, i*Doc content would be combined at the authoring level with elements that represent (results of) database queries, rather than being generated by a database dump program. In i*Doc, properties can only be bound to the simple data types number, string and boolean. When used with database query elements, there is a potential need to be able to bind properties to collections, to have ways to query them (such as finding out their cardinality) and to perform operations on them (such as union, intersection, duplicate removal). Another use for collection-valued properties would be to reference and represent the contents of a user's shopping basket. For instance, there might be an <ifNode> in i*Doc that shows specific content in case the number of items in the basket is three or higher. Finally, there is a need for a generic type system for properties.
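
The operations on collection-valued properties mentioned above can be sketched with the standard Java collection classes. This is only an illustration of the intended semantics under our own assumptions; i*Doc does not currently define such properties, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class CollectionPropertySketch {

    // Union of two collection-valued properties, without duplicates.
    static <T> Set<T> union(Collection<T> a, Collection<T> b) {
        Set<T> result = new LinkedHashSet<>(a);
        result.addAll(b);
        return result;
    }

    // Elements that occur in both collections.
    static <T> Set<T> intersection(Collection<T> a, Collection<T> b) {
        Set<T> result = new LinkedHashSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Duplicate removal that keeps the order of first occurrence.
    static <T> List<T> withoutDuplicates(List<T> items) {
        return new ArrayList<>(new LinkedHashSet<>(items));
    }

    // Cardinality test, as in "three or more items in the basket".
    static boolean basketHasAtLeast(Collection<?> basket, int threshold) {
        return basket.size() >= threshold;
    }
}
```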

Although i*Doc is still very much a research topic, the authors are actively evaluating the applicability of i*Doc and looking into its use for commercial projects to deliver personalized content and commerce. This is done in collaboration with business partners and customers of Cap Gemini.


We thank Annette Nijenbanning for designing the online wine shop.


[1] Special Issue on Information Filtering. Communications of the ACM, 35 (12), 1992.

[2] M. Anderson, editor. The Metafile for Interactive Documents. Application Guide and Draft Performance Specification for the Encoding of Interactive Documents. MID-2. Naval Surface Warfare Center, Carderock Division, Maryland, 1996.

[3] R. Anderson. "Customer Relationship Management (CRM): perspective." Gartner DataPro, June 1999, pages 1-9.

[4] V. Apparao et al. Document Object Model (DOM) Level 1 Specification. W3C Recommendation.

[5] Art Technology Group.

[6] J. van Benthem, A manual of intensional logic. Stanford: Center for the Study of Language and Information (CSLI), 1988.

[7] T. Bray, D. Hollander, and A. Layman. Namespaces in XML. W3C recommendation.

[8] BroadVision One-to-One.

[9] Cap Gemini, KiQ, Oracle and Sun Microsystems. OMEGA+ Predictive Data Mining. White paper.

[10] N. Chenard and D. Cooper (editors). Interchange Standard for Modifiable Interactive Documents.

[11] J. Clark. XSL Transformations (XSLT) Version 1.0. W3C Recommendation.

[12] Davenport Group. DocBook 3 DTD.

[13] Department of Defense standard MIL-PRF-87269.

[14] A.E. Eiben, A.E. Koudijs, F. Slisser, Genetic Modelling of Customer Retention, Proc. EuroGP 98.

[15] R. Fye, N.E. Montgomery, G.S. Weiss. An object-oriented approach to developing MIL-87269 conforming ETM, ICW, and IETM Content Data models and instances. Proceedings of SGML/XML '97, pages 501-510.

[16] E.W. Haasdijk, R.F. Walker, D. Barrow and M.C. Gerrets, Genetic Algorithms in Business, in: J. Stender, E. Hillebrand, J. Kingdon (eds) Genetic Algorithms in Optimisation, Simulation and Modelling. IOS Press, Amsterdam - Ohmsha, Tokyo, 1994, pp. 157-184.

[17] B. Harvey. Interactive Electronic Technical Manuals. Paper presented at the ISO STEP Conference. Chester, England, 1997.

[18] HTML 4.01 Specification. W3C recommendation.

[19] International Standardization Organization. Standard Generalized Markup Language, ISO-standard, Information processing, Text and Office Systems, 8879:1986.

[20] T. Kamba, K. Bharat and M.C. Albers. "The Krakatoa Chronicle - An Interactive, Personalized, Newspaper on the Web". Proceedings of WWW4.

[21] M. Langheinrich et al. "Unintrusive Customization Techniques for Web Advertising". Proceedings of WWW8.

[22] David Megginson et al. Simple API for XML.

[23] Microsoft Corporation. Active Server Pages.

[24] Microsoft Corporation and Digital Equipment Corporation. The Component Object Model Specification.

[25] Microsoft Corporation. Microsoft ODBC.

[26] Omnimark Technologies Corporation. Guide to Omnimark 5.

[27] R.M. Parker jr. Parker's Wine Buyer's Guide. New York: Simon & Schuster, 1995.

[28] E. Pelegri-Llopart and L. Cable. JavaServer Pages Specification. Sun Microsystems.

[29] H. Sakagami et al. "Effective Personalization of Push-Type Systems - Visualizing Information Freshness". Proceedings of WWW7.

[30] I. Soboroff, Ch. Nicholas and M. Pazzani. "Workshop on Recommender Systems: Algorithms and Evaluation." SIGIR Forum, Fall 1999.

[31] Wireless Application Forum. Wireless Markup Language Specification.

[32] World Wide Web Consortium. The Extensible HyperText Markup Language (XHTML). W3C Proposed Recommendation.


Patrick van Amstel studied Technical Computer Science at the Den Haag polytechnic. After his studies he worked for a publishing and printing company. He is currently working for Cap Gemini Nederland at the R&D department.

Pim van der Eijk received an M.S. degree in Romance linguistics from the University of Utrecht in 1988, where he subsequently worked as a researcher in Computational Linguistics. Since then he has held various R&D and consultancy positions with Digital Equipment Corporation and Cap Gemini. His interests include language technology, information retrieval, SGML/XML technology and the Web.

Evert Haasdijk received his M.S. degree in Computer Science at the University of Amsterdam in 1993. He has been working on machine learning, (predictive) data mining and CRM and is currently working as an R&D consultant at Cap Gemini. His research interests include machine learning and business intelligence.

David Kuilman received his M.S. degree in Computational Linguistics from the University of Amsterdam in 1993. He has been actively working with SGML/XML technologies in the past 6 years. Currently, he is a Content Manager working for Wolters Kluwer in The Netherlands.