Integrating multiple applications into the ANSWER system with XML
Junbiao Zhang, Maximilian Ott
C&C Research Lab, NEC, Princeton, NJ 08540, USA
ANSWER is an active-network-based information system capable of semantic routing. Information content and user queries are injected into the ANSWER system and encapsulated into active packets, which are then processed and routed by the ANSWER backbone according to their content or interest specifications. These packets may include customized decision-making routing code tailored to individual applications. In most cases, this code is composed of calls to library functions pre-installed in the ANSWER network nodes, and is thus quite succinct. ANSWER is envisioned as an enhancement to the current information discovery model of the World Wide Web. A wide spectrum of applications, ranging from e-commerce transactions to software distribution, can be supported by the ANSWER system, and new applications can easily be added. The strength and flexibility of the system come from the programmability of the underlying network and the ontology-based semantic structuring of the application data.
Simply put, an ontology defines a common vocabulary that a group of software agents can share in order to communicate with each other consistently without sharing a common knowledge base. In the ANSWER system, each application has its own ontology structure, which is shared among all instances of the application. The ANSWER system integrates and manages a multitude of application ontologies by maintaining ontology trees at its network nodes. Ontology trees from different interfaces of a network node may be merged and forwarded to its neighbors. The system provides a generic framework to query and process the tree nodes. However, it is up to the applications themselves to define the attributes inside each node and to specify how these attributes can be operated upon. Once defined, these rules apply to all instances of the application, i.e. the data set belonging to the application is processed by the ANSWER system consistently across all instances. Further, these instances should be shielded from the details of the ANSWER system and only need to be concerned with a uniform, easy-to-use format specific to the application.
Given these constraints, an important issue in the system design is how applications specify their data sets to the ANSWER system. We choose XML as the interface language for this purpose. In this poster, we explain how XML is used to seamlessly integrate multiple applications into the ANSWER system.
ANSWER semantics and XML
We use XML as the data integration language in ANSWER based on the following considerations:
- XML is a universally accepted standard markup language and is easy for end users to use.
- XML has a rigorous syntax, which makes the ANSWER data processing module less error prone.
- Many supporting software tools are becoming available to convert legacy data into XML documents that can be understood by the ANSWER system.
- Each ANSWER application can use a DTD not only to regulate the format of the data sets provided by the application instances but also to enforce consistent ANSWER semantics on the application data.
To clarify the last point, we first explain what we mean by ANSWER semantics. As noted earlier, ontology trees are the core of the ANSWER operation model. Although there are generic rules for searching and merging ontology trees, special rules are required when handling application-specific data attributes in the ontology tree nodes. For example, when merging several ontology trees belonging to an e-commerce application, do we create a new node when the same type of node appears in one of the trees, any of the trees, or all of the trees? When creating a new node from two nodes, how do we initialize the attributes in the new node? Do we add the attribute values together, take the maximum of the two, or take their average? This is the kind of semantic information that an ANSWER application must provide in order for its data to be handled properly. Such information could be expressed through XSL. However, since the rules are simple enough, for our prototype implementation we opt to embed them in the DTD of the application through the use of hidden attributes, i.e. attributes declared with the #FIXED attribute default. When users inject application-specific XML data into the ANSWER system, the ANSWER processing module will be able to extract these hidden attributes and analyze the semantic information contained in them. To better explain this process, we use an example e-commerce application and show its DTD as well as a simple XML data instance. Due to lack of space, we only show a small portion of the DTD, which describes TV products:
    .....
    <!ELEMENT TV (SET+) >
    .....
    <!ELEMENT SET EMPTY >
    <!ATTLIST SET
        BRAND      CDATA #REQUIRED
        SIZE       CDATA #REQUIRED
        PRICE      CDATA #REQUIRED
        COUNT      CDATA #REQUIRED
        MERGESTYLE CDATA #FIXED "ANY"
        ATTRDEF    CDATA #FIXED "BRAND:KEY;SIZE:IGNORED;PRICE:MINMAX;COUNT:SUM" >
    ....
and a sample XML instance,
    .....
    <TV>
        <SET BRAND="RCA" SIZE="27" PRICE="299" COUNT="10"/>
        <SET BRAND="SONY" SIZE="20" PRICE="259" COUNT="15"/>
        <SET BRAND="ZENITH" SIZE="50" PRICE="1299" COUNT="4"/>
        <!-- etc. -->
    </TV>
    .....
Let us take a look at the attributes of the "SET" element and explain some of the semantic definitions used there. Note that all but two of the attributes are regular attributes used in the XML data. The two hidden attributes "MERGESTYLE" and "ATTRDEF" define the semantic behavior of the node and of all the attributes in the node. For example, they specify that "BRAND" is the primary key in the ontology node. When merging ontology trees, as long as any of the trees contains a "SET" node with a certain key, a new node is created in the resulting tree. Also, when merging several "SET" nodes with the same "BRAND" field, their "COUNT" fields are aggregated additively, while the "SIZE" field is omitted from the new node. The aggregation of the "PRICE" fields is quite different. Informally, it takes the lower and upper bounds of the two "PRICE" fields and uses them as the new range. A set of aggregation methods is defined by the ANSWER system, and this set can be flexibly extended by the applications through the use of active instructions of the underlying active network.
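The merging rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ANSWER implementation: the function names `parse_attrdef` and `merge_set_nodes`, the representation of a tree as a list of attribute dictionaries, and the second tree `tree_b` are all hypothetical. Only the "ANY" merge style and the KEY, IGNORED, MINMAX and SUM methods named in the DTD are handled.

```python
ATTRDEF = "BRAND:KEY;SIZE:IGNORED;PRICE:MINMAX;COUNT:SUM"


def parse_attrdef(attrdef):
    """Turn 'BRAND:KEY;SIZE:IGNORED;...' into {'BRAND': 'KEY', ...}."""
    rules = {}
    for item in attrdef.split(";"):
        if item:
            name, method = item.split(":")
            rules[name.strip()] = method.strip()
    return rules


def merge_set_nodes(trees, attrdef, mergestyle="ANY"):
    """Merge SET nodes from several trees (hypothetical helper).

    Each tree is a list of dicts holding the XML attribute strings.
    With MERGESTYLE "ANY", a key found in any tree yields a merged node.
    """
    if mergestyle != "ANY":
        raise ValueError("only MERGESTYLE 'ANY' is sketched here")
    rules = parse_attrdef(attrdef)
    key_attr = next(name for name, m in rules.items() if m == "KEY")
    merged = {}
    for tree in trees:
        for node in tree:
            key = node[key_attr]
            out = merged.setdefault(key, {key_attr: key})
            for name, method in rules.items():
                if method in ("KEY", "IGNORED"):
                    continue  # key already stored; ignored fields are dropped
                value = float(node[name])
                if method == "SUM":
                    out[name] = out.get(name, 0.0) + value
                elif method == "MINMAX":
                    low, high = out.get(name, (value, value))
                    out[name] = (min(low, value), max(high, value))
                else:
                    raise ValueError("unknown aggregation method: " + method)
    return merged


# tree_a follows the sample instance; tree_b is invented for illustration.
tree_a = [
    {"BRAND": "RCA", "SIZE": "27", "PRICE": "299", "COUNT": "10"},
    {"BRAND": "SONY", "SIZE": "20", "PRICE": "259", "COUNT": "15"},
]
tree_b = [{"BRAND": "RCA", "SIZE": "32", "PRICE": "349", "COUNT": "5"}]

merged = merge_set_nodes([tree_a, tree_b], ATTRDEF)
print(merged["RCA"])
# {'BRAND': 'RCA', 'PRICE': (299.0, 349.0), 'COUNT': 15.0}
```

In this sketch the two "RCA" nodes collapse into one node whose "COUNT" is the sum and whose "PRICE" is a (low, high) range, while "SIZE" is dropped. The "SONY" node, present in only one tree, still appears in the result because the merge style is "ANY".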
When an XML data set such as the one above is submitted to the ANSWER system, the attribute information is used to translate the data into ontology nodes, and some of the information is stored in the resulting nodes to control how these nodes are processed. We have explained how tree merging is controlled by these attributes. In fact, they can also be used to control the searching behaviour of query packets. For example, at an ANSWER network node, a query packet looking for a "ZENITH" television set priced below $400 will search the "BRAND" fields of the "SET" tree nodes and may use "COUNT" as a metric in making its routing decisions.
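As a rough illustration of such a query, the following Python sketch matches a query against the "SET" nodes seen on each interface of a network node and picks a next hop. Everything here is an assumption made for the example: the interface names, the sample node values, the representation of "PRICE" as a (low, high) range and of "COUNT" as a summed number, and the policy of preferring the interface with the largest matching "COUNT". The poster does not specify the actual routing code.

```python
def match_query(nodes, brand, max_price):
    """Return SET nodes with the given brand whose lowest price is below max_price."""
    return [
        node for node in nodes
        if node["BRAND"] == brand and node["PRICE"][0] < max_price
    ]


def choose_interface(interfaces, brand, max_price):
    """Pick the interface whose matching nodes have the largest total COUNT.

    Returns None when no interface has a matching node.
    """
    best_name, best_count = None, 0
    for name, nodes in interfaces.items():
        count = sum(n["COUNT"] for n in match_query(nodes, brand, max_price))
        if count > best_count:
            best_name, best_count = name, count
    return best_name


# Hypothetical merged ontology nodes, one list per interface.
interfaces = {
    "if0": [{"BRAND": "ZENITH", "PRICE": (349, 1299), "COUNT": 4}],
    "if1": [
        {"BRAND": "ZENITH", "PRICE": (379, 899), "COUNT": 9},
        {"BRAND": "RCA", "PRICE": (299, 349), "COUNT": 15},
    ],
    "if2": [{"BRAND": "ZENITH", "PRICE": (450, 1299), "COUNT": 20}],
}

next_hop = choose_interface(interfaces, "ZENITH", 400)
print(next_hop)  # if1
```

Here "if2" advertises the most "ZENITH" sets but none below $400, so it is skipped; among the remaining interfaces, "if1" has the larger matching "COUNT".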
Status and future work
Currently, we are finalizing the details of the semantic definitions used inside the ANSWER DTDs. The ANSWER system core has been completed, and work is underway to build the ANSWER XML processing module. We plan to test various applications under the new ANSWER/XML framework, including web content query and distribution, e-commerce applications, technical paper search, and software distribution.
References
Junbiao Zhang, Maximilian Ott, "Information Routing Based on Active Networks", The 5th Asia Pacific Conference on Communications, Beijing, China, October 1999
David L. Tennenhouse, Jonathan M. Smith, W. David Sincoskie, David J. Wetherall, Gary J. Minden, "A Survey of Active Network Research", IEEE Communications Magazine, Vol. 35, No. 1, pp. 80-86, January 1997
T.R. Gruber, "Toward principles for the design of ontologies used for knowledge sharing", Padova Workshop on Formal Ontology, March 1993
W3C, "Extensible Markup Language (XML) 1.0"