A Web Services Cache Architecture Based on XML Canonicalization
IBM Research, Tokyo Research Laboratory
1623-14, Shimotsuruma, Yamato-shi, Kanagawa, Japan
ABSTRACT
Recently, the Web services model has attracted a lot of attention, but so far there have been few discussions about the performance of Web services. For HTTP, cache technology has been widely used for performance improvement. However, there are some special cache technology issues for Web services. A major problem is that a semantically identical XML request message can be represented in several ways. We have designed a cache architecture using canonicalization and canonicalized templates in order to solve this problem and apply cache technology to Web services. The architecture can be applied to existing Web services without any changes. We implemented a prototype of this architecture and evaluated it. In our experiments, when we canonicalize requests, the performance is approximately doubled. When we use canonicalized templates, the performance improves approximately fivefold.
Keywords
Web services, cache, performance, XML, SOAP
1. Introduction
The Web services model is attracting the attention of many companies. Web services are applications that can be accessed via open standard technologies such as HTTP and XML. Web services enable data exchange independent of programming language, platform, and transport protocol by using SOAP (Simple Object Access Protocol) as a messaging format. Using this feature of Web services, applications on different platforms can be combined. Therefore, it is important that Web services be loosely coupled and dynamically bound.
Since the current focus is on the interoperability of Web services, the issues of operating in the real world, such as performance, have not yet been discussed much. In this paper, we describe a cache architecture for Web services.
2. Research Issues
Because Web services are accessed via XML messages, the key of a cache entry can be a request XML message. However, a semantically identical XML request message can be represented differently. If simple literal expressions are used for the keys of cache entries, we may store the same cache entry many times.
There are many examples of semantically identical but literally different messages. One example is white space between elements. Another example is the use of different namespace prefixes for the same namespace URI. For example, even if the same service is called using SOAP, each client implementation generates a different request message. Listings 1 and 2 show request messages that invoke the same service using Apache-SOAP and MS-SOAP, respectively.
<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:GetLastTradePrice xmlns:ns1="Some-URI"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <symbol xsi:type="xsd:double">DIS</symbol>
    </ns1:GetLastTradePrice>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
Listing 1. Request using Apache-SOAP
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<SOAP-ENV:Envelope
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <SOAPSDK1:GetLastTradePrice xmlns:SOAPSDK1="Some-URI">
      <symbol>DIS</symbol>
    </SOAPSDK1:GetLastTradePrice>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
Listing 2. Request using MS-SOAP
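To make the problem concrete, the following sketch (Python, standard library only) compares two literally different requests. The two messages are shortened stand-ins written for this illustration, not the exact listings above, and the helper names are hypothetical. It shows that a hash over the literal text differs even though the parsed element names and content agree.

```python
import hashlib
import xml.etree.ElementTree as ET

# Two hypothetical requests: same namespace URI and content,
# but a different prefix and different white space between elements.
REQUEST_A = (
    '<ns1:GetLastTradePrice xmlns:ns1="Some-URI">\n'
    '  <symbol>DIS</symbol>\n'
    '</ns1:GetLastTradePrice>'
)
REQUEST_B = (
    '<SOAPSDK1:GetLastTradePrice xmlns:SOAPSDK1="Some-URI">'
    '<symbol>DIS</symbol>'
    '</SOAPSDK1:GetLastTradePrice>'
)


def literal_key(message: str) -> str:
    """Cache key built from the literal text of the message."""
    return hashlib.sha1(message.encode("utf-8")).hexdigest()


def semantic_view(message: str):
    """Prefix-independent view: expanded element names plus trimmed text."""
    root = ET.fromstring(message)
    return [(element.tag, (element.text or "").strip()) for element in root.iter()]


# The literal keys differ, so a naive cache would store two entries ...
literal_keys_differ = literal_key(REQUEST_A) != literal_key(REQUEST_B)
# ... although both messages carry the same names and content.
same_meaning = semantic_view(REQUEST_A) == semantic_view(REQUEST_B)
```

With literal keys, each client implementation would populate its own cache entry for what is effectively the same request.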
3. Caching Web Services
3.1 System Architecture
Figure 1 illustrates a cache architecture that uses canonicalized XML messages. In a typical case, a request message is sent, and the Reverse Proxy canonicalizes it in order to find a cached entry. If an entry is found and is still valid, the Reverse Proxy returns the response message from the Cache Table without accessing the backend Application Server.
Figure 1. Architecture of Proxy Cache for Web Services
Our proxy can accept messages that are not canonicalized as well as canonicalized messages. Accordingly, the Requestor Nodes do not have to be specialized for this architecture. Rather, when request messages are intended for our architecture, the Reverse Proxy simply exhibits better performance.
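The request flow described in this section can be sketched as follows. This is a minimal illustration, not the prototype itself: the class name, the time-to-live rule used to decide whether an entry is "still valid", and the injected `canonicalize` and `backend` callables are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class ReverseProxy:
    """Sketch of the hit/miss flow: canonicalize, look up, else call the backend."""

    def __init__(
        self,
        backend: Callable[[str], str],
        canonicalize: Callable[[str], str],
        ttl_seconds: float = 60.0,                   # assumed validity rule
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.backend = backend
        self.canonicalize = canonicalize
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        # canonical request -> (response, expiry time)
        self.cache_table: Dict[str, Tuple[str, float]] = {}

    def handle(self, request: str) -> str:
        key = self.canonicalize(request)
        entry = self.cache_table.get(key)
        if entry is not None:
            response, expires_at = entry
            if self.clock() < expires_at:
                return response                      # hit: backend is not accessed
        response = self.backend(request)             # miss or expired entry
        self.cache_table[key] = (response, self.clock() + self.ttl_seconds)
        return response
```

For a quick trial, a placeholder such as `lambda m: "".join(m.split())` can stand in for `canonicalize`. It only strips white space and is not a real XML canonicalizer, but it is enough to see two differently spaced requests share one cache entry.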
3.2 Techniques for Caching
The Cache Table manages cached entries, each of which includes a request and a response message. As in Table 1, the entry also includes hash values of the request and response messages. The Proxy Cache utilizes the request hash value to retrieve entries from the Cache Table more quickly. The response hash is used to check if the response message stored by the requestor is identical to the one stored by the Reverse Proxy.
|Request URL||URL for a request|
|Response||XML document for a response|
|Request-hash||Hash value (20 bytes) for request|
|Response-hash||Hash value (20 bytes) for response|
Table 1. Columns of Cache Table
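One possible in-memory layout for such a table is sketched below. The class and method names are hypothetical. Storing the request text alongside the columns of Table 1 is an assumption, made so that entries sharing a hash value can be told apart.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def sha1_digest(text: str) -> bytes:
    """20-byte SHA-1 digest of a message."""
    return hashlib.sha1(text.encode("utf-8")).digest()


@dataclass
class CacheEntry:
    request_url: str
    request: str      # assumption: kept so hash collisions can be resolved
    response: str
    request_hash: bytes = field(init=False)
    response_hash: bytes = field(init=False)

    def __post_init__(self) -> None:
        self.request_hash = sha1_digest(self.request)
        self.response_hash = sha1_digest(self.response)


class CacheTable:
    """Entries bucketed by request hash for fast retrieval."""

    def __init__(self) -> None:
        self._buckets: Dict[bytes, List[CacheEntry]] = {}

    def put(self, entry: CacheEntry) -> None:
        self._buckets.setdefault(entry.request_hash, []).append(entry)

    def find(self, request_url: str, request: str) -> Optional[CacheEntry]:
        # Retrieve any entries with the same hash, then confirm the match.
        for entry in self._buckets.get(sha1_digest(request), []):
            if entry.request_url == request_url and entry.request == request:
                return entry
        return None
```

The hash narrows the search to a small bucket, and the final comparison guards against two different requests that happen to share a hash value.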
Using canonicalization, semantically equivalent XML messages are transformed into a single literal expression. A standard specification for XML canonicalization, C14N, has been proposed by the W3C. Although C14N defines a generic method for canonicalization, for our purpose it often fails. For example, we can define a canonicalization rule that transforms Listing 3 to Listing 4. The actual rules here are: namespace qualifiers should be a concatenation of "ns" and a number, and all spaces in each line should be eliminated.
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <m:GetLastTradePrice xmlns:m="urn:stock-quote">
      <symbol>IBM</symbol>
    </m:GetLastTradePrice>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
Listing 3. Original Message
<ns1:Envelope xmlns:ns1="http://schemas.xmlsoap.org/soap/envelope/" ns1:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<ns1:Body>
<ns2:GetLastTradePrice xmlns:ns2="urn:stock-quote">
<symbol>IBM</symbol>
</ns2:GetLastTradePrice>
</ns1:Body>
</ns1:Envelope>
Listing 4. Canonicalized Message
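The rule that turns Listing 3 into Listing 4 can be sketched with the Python standard library as follows. This is one interpretation rather than the prototype's code: prefixes are numbered in order of first use, attributes are sorted by expanded name, and white space between elements is dropped entirely. The function name and these tie-breaking choices are assumptions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def canonicalize(xml_text: str) -> str:
    """Rewrite prefixes to ns1, ns2, ... and drop white space between elements."""
    prefixes = {}  # namespace URI -> generated prefix, numbered by first use

    def quote(value):
        return escape(value, {'"': "&quot;"})

    def qualified(name, in_scope, declarations):
        # ElementTree spells namespaced names as "{uri}local".
        if not name.startswith("{"):
            return name
        uri, local = name[1:].split("}", 1)
        prefix = prefixes.setdefault(uri, "ns%d" % (len(prefixes) + 1))
        if uri not in in_scope:
            # Declare the prefix on the first element that needs it.
            in_scope.add(uri)
            declarations.append(' xmlns:%s="%s"' % (prefix, quote(uri)))
        return prefix + ":" + local

    def emit(element, inherited):
        in_scope = set(inherited)
        declarations = []
        tag = qualified(element.tag, in_scope, declarations)
        attributes = [
            ' %s="%s"' % (qualified(name, in_scope, declarations), quote(value))
            for name, value in sorted(element.attrib.items())
        ]
        parts = ["<", tag, *declarations, *attributes, ">"]
        if element.text and element.text.strip():
            parts.append(escape(element.text))
        for child in element:
            parts.append(emit(child, in_scope))
            if child.tail and child.tail.strip():
                parts.append(escape(child.tail))
        parts.append("</" + tag + ">")
        return "".join(parts)

    return emit(ET.fromstring(xml_text), set())
```

Applied to two requests that differ only in prefix names and inter-element white space, this function returns the same string, which can then serve as (or be hashed into) the cache key.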
A Canonicalized Template is useful for simplifying the generation of canonicalized messages at Requestor Nodes. As shown in Listing 5, a canonicalized template contains variables that are indicated by a string surrounded with "$" symbols. When the original document includes "$" symbols, they are replaced by "$$" symbols. A variable name cannot include the "$" symbol. Substituted strings must be composed of character data as defined in XML 1.0:
Character data ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
<ns1:Envelope xmlns:ns1="http://schemas.xmlsoap.org/soap/envelope/" ns1:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<ns1:Body>
<ns2:GetLastTradePrice xmlns:ns2="urn:stock-quote">
<symbol>$symbol$</symbol>
</ns2:GetLastTradePrice>
</ns1:Body>
</ns1:Envelope>
Listing 5. Canonicalized Template
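Filling such a template at a Requestor Node needs no XML parser. The following sketch applies the rules stated above: `$name$` is a variable, `$$` stands for a literal `$`, and substituted strings must be valid character data. The function name and the choice to raise `ValueError` are assumptions.

```python
import re
from typing import Mapping

# "$name$" is a variable; "$$" (an empty name) is an escaped literal "$".
_TOKEN = re.compile(r"\$([^$]*)\$")


def fill_template(template: str, values: Mapping[str, str]) -> str:
    """Substitute variables in a canonicalized template."""

    def replace(match):
        name = match.group(1)
        if name == "":
            return "$"
        value = values[name]
        # Character data may not contain "<", "&", or the sequence "]]>".
        if "<" in value or "&" in value or "]]>" in value:
            raise ValueError("value for %r is not valid character data" % name)
        return value

    return _TOKEN.sub(replace, template)
```

For example, `fill_template("<symbol>$symbol$</symbol>", {"symbol": "IBM"})` yields `<symbol>IBM</symbol>`. Because the template is already canonical and the value cannot introduce markup, the result stays canonical.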
3.3 Processing Patterns
Pattern-1: The requestor sends an unmodified request message. The Proxy Cache performs canonicalization, calculates a hash value, and retrieves any entries with that hash value from the Cache Table.
Pattern-2: The requestor sends a canonicalized request message. The Proxy Cache calculates the hash value of the message and retrieves any entries with that hash value from the Cache Table. Compared to Pattern-1, no canonicalization is needed, and thus XML parsing is avoided.
Pattern-3: The requestor performs canonicalization, calculates the hash value, and includes it in the HTTP header. The Proxy Cache directly retrieves from the Cache Table any entries with that hash value. Compared with Pattern-2, the Proxy Cache does not need to calculate the hash value.
Pattern-4: The requestor also manages the response messages, so it includes the hash values of both the request and the response messages. The Proxy Cache first finds any entry using the hash value of the request message and, if one is found whose response hash matches the one sent by the requestor, returns an HTTP No Content response.
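The four patterns differ mainly in how much work is left to the proxy before it can look up an entry. The sketch below illustrates that dispatch. The header names, the flag that marks a message as already canonicalized, and the dictionary-based cache are all hypothetical, since the text does not specify how the proxy distinguishes the patterns.

```python
import hashlib
from typing import Callable, Mapping, Optional, Tuple

REQUEST_HASH_HEADER = "X-Request-Hash"    # hypothetical header name
RESPONSE_HASH_HEADER = "X-Response-Hash"  # hypothetical header name
CANONICAL_HEADER = "X-Canonicalized"      # hypothetical flag


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def lookup(
    headers: Mapping[str, str],
    body: str,
    cache: Mapping[str, str],             # request hash -> cached response
    canonicalize: Callable[[str], str],
) -> Tuple[Optional[int], Optional[str]]:
    """Return (HTTP status, response body), or (None, None) on a cache miss."""
    if REQUEST_HASH_HEADER in headers:
        key = headers[REQUEST_HASH_HEADER]        # Patterns 3 and 4: no hashing
    elif headers.get(CANONICAL_HEADER) == "true":
        key = sha1_hex(body)                      # Pattern 2: no XML parsing
    else:
        key = sha1_hex(canonicalize(body))        # Pattern 1: full work
    response = cache.get(key)
    if response is None:
        return None, None
    if headers.get(RESPONSE_HASH_HEADER) == sha1_hex(response):
        return 204, ""                            # Pattern 4: requestor copy is current
    return 200, response
```

Each successive pattern removes one step from the proxy: parsing, then hashing, then sending the response body.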
4. Performance Evaluation
4.1 System Configuration
Here we describe how the Proxy Cache in Figure 1 is implemented. The Reverse Proxy is implemented on top of Jakarta-Tomcat 3.3, a servlet engine for Java. The Proxy Cache is implemented as a Java program, and the entries are kept in memory. For hashing, we use the widely used SHA-1 algorithm, which maps any XML document to a 160-bit (20-byte) binary value. In our experiments, we use an extremely simple echo application as the provider application.
The following supporting software was used for our prototype:
|SOAP Engine:||Apache SOAP 2.2, Apache Axis alpha2|
|XML Parser:||Apache Xerces 2.0.0 beta3|
Here are the specifications of the reverse proxy machine:
|CPU:||Pentium III 850 MHz|
|OS:||Windows 2000 Service Pack 2|
|Java VM:||IBM JDK 1.3.0|
4.2 Experimental Data
We ran experiments by sending messages of various sizes using Patterns 1, 2, 3, and 4. For comparison, we also sent the same messages directly to Apache-SOAP and Apache-Axis. Table 2 shows the results of these experiments, and Figure 2 shows a graph based on Table 2. The number in each cell indicates how many messages the reverse proxy could handle each second, i.e., TPS (transactions per second).
Table 2. Performance of Proxy Cache
Figure 2. Performance of Proxy Cache
Figure 3. Ratio between the direct access and Proxy Cache
From Table 2 and Figure 2, it is difficult to understand how fast our architecture is compared to direct access to the server. Figure 3 shows the ratio between direct access and each of the patterns.
We put the reverse proxy in front of the application server. When the reverse proxy deals with non-canonicalized messages (i.e., Pattern-1), its throughput is approximately twice that of direct access. When it receives canonicalized messages (i.e., Patterns 2, 3, and 4), its throughput is approximately five times that of direct access. For small messages, there is no big difference among Patterns 2, 3, and 4. For large messages, however, the differences become clear, and Patterns 3 and 4 are advantageous.
In real-world applications, we expect a lower number for direct access because the application on our application server does almost nothing. Real applications may require more processing, for example, accessing databases. Therefore, when a cache is used for a real application, the ratio of each pattern to direct access is expected to be higher than the results here.
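The reasoning behind this expectation can be written as a short calculation. It assumes that throughput is inversely proportional to the per-request service time, which is a simplification. All numbers below are hypothetical and are not measurements from our experiments.

```python
def cache_speedup(direct_ms: float, cache_hit_ms: float, extra_backend_ms: float = 0.0) -> float:
    """Ratio of cached throughput to direct-access throughput.

    Assumes throughput is 1 / service time, so the ratio is
    (direct time + extra backend work) / cache-hit time.
    """
    return (direct_ms + extra_backend_ms) / cache_hit_ms


# Hypothetical echo-like service: 10 ms direct, 2 ms for a cache hit.
echo_ratio = cache_speedup(10.0, 2.0)                # 5.0
# Same service with a hypothetical 40 ms of database work per request.
database_ratio = cache_speedup(10.0, 2.0, 40.0)      # 25.0
```

A cache hit costs the same regardless of how expensive the backend is, so any extra backend work raises the ratio.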
The cache architecture for Web services and the experiments for performance evaluation are still preliminary. Although there are many research reports on caching Web content, as far as we know, no prior research has addressed caching for Web services. Therefore, our goal is to provide an initial outlook for this research area.
In this paper, we have described a cache architecture for Web services, and given the result of initial experiments. Our architecture uses XML canonicalization and canonicalized templates in order to effectively apply existing cache technologies to Web services. Our experiments show that when we canonicalize requests, the performance is approximately doubled. When we use canonicalized templates, the performance improves approximately fivefold.
- S. Graham, S. Simeonov, T. Boubez, G. Daniels, D. Davis, Y. Nakamura, and R. Neyama, "Building Web Services with Java: Making Sense of XML, SOAP, WSDL and UDDI", SAMS Publishers, 2001 (to appear).
- R. Fielding, J. Gettys, J.C. Mogul, H. Frystyk, T. Berners-Lee, "Hypertext Transfer Protocol HTTP/1.1", RFC 2616, U.C. Irvine, DEC, W3C/MIT, January 1997. http://www.ietf.org/rfc/rfc2616.txt
- T. Bray, J. Paoli, C.M. Sperberg-McQueen, "Extensible Markup Language (XML) 1.0", W3C Recommendation, February 1998. http://www.w3.org/TR/1998/REC-xml-19980210
- D. Box, et al., "Simple Object Access Protocol (SOAP) 1.1", W3C Note, 08 May 2000. http://www.w3.org/TR/SOAP/ (The latest version is available at http://www.w3.org/TR/soap12/).
- "The Jakarta Project", http://jakarta.apache.org/
- "Apache XML Project", http://xml.apache.org/
- J. Boyer, "Canonical XML", W3C Recommendation, March 2001. http://www.w3.org/TR/xml-c14n, http://www.ietf.org/rfc/rfc3076.txt