FastSOA: Accelerate SOA with XML, XQuery, and native XML database technology

The role of a mid-tier SOA cache architecture

Level: Advanced

Frank Cohen (frank.cohen@rainingdata.com), Director, Solutions Engineering, Raining Data Corporation

07 Feb 2006

Many SOA implementations rely on message formats defined with XML. The resulting message schemas can become complex, incompatible, and difficult to maintain, and can cause serious scalability and performance problems. In this article, Frank Cohen describes a new strategy and techniques for accelerating SOA performance through the use of XML, XQuery, and native XML database technology in the SOA mid-tier.

Many software architects use XML in their service-oriented architecture (SOA) designs even though no SOA standard requires or gives guidance on using XML in SOA. Consequently, the software development community is engaging in many experiments and investigations to find the best way to define service endpoints and message definitions (schemas). In my testing, most of these approaches perform and scale poorly.

For instance, General Motors Corp. is a proponent of ebXML in SOA, and its early designs -- using the Universal Business Language (UBL) -- produced XML messages ranging from 150,000 bytes to 10 megabytes or more. In 2004, in the GM Web Services Performance Benchmark study, my performance testing company, PushToTest, found that the Java™ application server technology of the day did not deliver sufficient throughput and exhibited scalability and performance problems.

At the time, XML-based Web service technology was still fairly new and I expected the performance problems to be resolved with new generations of the application server technology. Most of these problems still exist.

Web services throughput problems and complex XML

In 2005, PushToTest completed a new SOA performance study (see Resources) showing that applications built with current Java application servers do not deliver production-worthy performance when dealing with complex XML messages. The problems I found are the same as those in earlier studies:

  • Simple Object Access Protocol (SOAP) bindings (proxies) are inefficient and slow.
  • Every request requires an entirely new set of resources (objects, CPU, and network bandwidth) to produce a response. There is no caching pattern.
  • Using relational database technology to store XML data is slow and not scalable.

To understand these three problems, consider how software developers build and deploy an XML service using J2EE application server tools.


Figure 1. WSDL definition illustration

While you can use a variety of techniques to build an XML-based Web service, I find that most developers prefer to start with a Web Services Description Language (WSDL) definition of the service. Java application servers provide a utility that takes a WSDL definition as input and generates a proxy class. The proxy receives a SOAP request and routes the request to a Java object or Enterprise JavaBean (EJB) for processing. The SOAP binding (proxy) is a Java class that is called through a servlet interface.


Figure 2. Java method call illustration

Figure 2 illustrates a Web service consumer making a SOAP request to the service. The SOAP binding deserializes the XML content from the SOAP message body. This is processing-intensive and complicated because the message body often includes complex data types. For instance, the consumer might send a hash map containing multiple values to the service. The SOAP binding needs to decode the hash map contents and instantiate Java objects for each value. A hash map can contain other hash maps, so the process of decoding SOAP message contents is not easy. Don't believe me? Take a look at the source code of the Apache Axis deserializer.

The SOAP binding instantiates a Java Request object that contains the SOAP message body contents. The SOAP binding calls the target method in the target class and passes the Request object as a parameter. The target EJB or Java object provides all of the processing necessary to create a response to the request. The SOAP binding serializes the return value from the EJB or Java object into a SOAP response message. The SOAP binding goes through the same complexity to decode the values in the response object into values it can serialize into a SOAP response message.
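The nested-map decoding described above can be sketched in a few lines. This is an illustration in Python, not Axis code and not code from the study; the `<map>` and `<entry>` element names and the `key` attribute are hypothetical. It shows why the work grows with both the number and the depth of elements: every map and every leaf value becomes a separate object.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: the element names (<map>, <entry>) and the "key"
# attribute are invented for illustration, not taken from any SOAP encoding.
PAYLOAD = """
<map>
  <entry key="partNumber">12345</entry>
  <entry key="dealer">
    <map>
      <entry key="id">D-77</entry>
      <entry key="region">West</entry>
    </map>
  </entry>
</map>
""".strip()

def decode_map(element, counter):
    """Recursively turn a <map> element into a dict, counting objects created."""
    result = {}
    counter["objects"] += 1              # the dict itself
    for entry in element.findall("entry"):
        nested = entry.find("map")
        if nested is not None:
            value = decode_map(nested, counter)   # a map inside a map
        else:
            value = (entry.text or "").strip()
            counter["objects"] += 1      # one object per leaf value
        result[entry.get("key")] = value
    return result

counter = {"objects": 0}
decoded = decode_map(ET.fromstring(PAYLOAD), counter)
print(decoded)
print("objects created:", counter["objects"])
```

Even this five-element payload yields five objects (two maps and three values); a real binding also allocates type descriptors, buffers, and intermediate strings on top of that.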

In a study of SOAP bindings created with utilities from the popular Java application servers, I found these problems:

  • The SOAP bindings generated by the application server utilities are inefficient. For instance, I observed certain SOAP bindings create multiple copies of the SOAP request -- with each copy instantiated as a String object -- for no apparent reason. Some of the SOAP bindings instantiate up to 15,000 Java objects to deserialize a SOAP request that contains 500 elements in the SOAP message body.
  • On a server equipped with dual 3.0 GHz Intel Xeon processors, I observed throughput of 15 to 20 transactions per second (TPS) when processing simple SOAP messages where the 10,000-byte payload contained 50 elements. As the complexity and size of the SOAP messages grew, I observed significant scalability and performance problems. Throughput fell to 1.5 TPS for SOAP messages with a 100,000-byte payload containing 750 elements. The larger the number of elements and the depth of each element in the SOAP message body, the worse the problem.

The performance problem multiplies in SOA designs. SOA is a technique for component software reuse. Often one service calls another service in a chain to determine the response to a request from a consumer.


Figure 3. Consumer illustration

Not only will the performance problem appear in a single service in the above stack, but each service adds the same overhead as it serializes and deserializes requests and responses. The performance problem multiplies with the number of layers of services called.

Missed opportunities for SOA acceleration

In addition to the slow SOAP binding proxy problem, SOA designs often ignore or overlook two additional issues.

First, SOA designs often overlook the potential for mid-tier service caching to accelerate SOA performance. Consider that most XML schemas in SOA designs define a time-to-live value for a response. In this case, caching a service response and replaying the cached response the next time the service receives the same request is a valid and appropriate way to accelerate SOA service performance.

Second, in my SOA performance tests I tried various approaches to XML message parsing, including the Streaming API for XML (StAX), XML binding compiler, Java Architecture for XML Binding (JAXB), and Document Object Model (DOM) techniques. Some provided better performance than others. For instance, many StAX parsers delivered 2 to 10 times faster performance than DOM parsers.
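The difference between tree-style and streaming-style parsing can be illustrated outside Java as well. The following is a minimal sketch in Python, where `ElementTree.fromstring` stands in for a DOM-like parser and `ElementTree.iterparse` for a StAX-like pull parser. The document shape (`<inventory>`, `<part>`, `<qty>`) is hypothetical, and the sketch makes no claim about the timings measured in the study.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: 1,000 <part> elements under an <inventory> root.
DOC = ("<inventory>"
       + "".join(f"<part id='{i}'><qty>{i % 7}</qty></part>" for i in range(1000))
       + "</inventory>").encode("utf-8")

def count_in_stock_tree(data):
    """Tree style (DOM-like): build the whole document in memory, then walk it."""
    root = ET.fromstring(data)
    return sum(1 for part in root.iter("part") if int(part.findtext("qty")) > 0)

def count_in_stock_stream(data):
    """Streaming style (StAX-like): handle each <part> as it ends, then discard it."""
    in_stock = 0
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "part":
            if int(elem.findtext("qty")) > 0:
                in_stock += 1
            elem.clear()   # release the subtree that was just processed
    return in_stock

print(count_in_stock_tree(DOC), count_in_stock_stream(DOC))
```

Both functions return the same count. The streaming version never holds more than one populated `<part>` subtree at a time, which is the property that tends to matter as messages grow.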

I wondered if performance improvements might be gained by using something other than Java objects to provide SOAP bindings. For instance, if the incoming SOAP request were handled in a native XML environment, the Java-based SOAP bindings would become unnecessary and the performance slow-down of deserializing into Java objects could be avoided.

Furthermore, some native XML environments use the Java Virtual Machine environment but avoid Java object construction. For instance, Raining Data's TigerLogic XDMS and Kawa/Qexo implement XML processing code by transforming from XQuery directly into Java bytecode. They do this because XML processing code that uses XQuery implemented as bytecode is more efficient in throughput and lines of code than using Java objects.

The FastSOA solution

FastSOA is an architecture and software coding practice that addresses these problems:

  • FastSOA solves the SOAP binding (proxy) performance problem by reducing the need for Java objects and increasing the use of native XML environments to provide SOAP bindings.
  • FastSOA introduces a mid-tier service cache to provide SOA service acceleration.
  • FastSOA uses native XML persistence to avoid XML-to-relational transformation performance problems.

The following figure illustrates the FastSOA architecture.


Figure 4. FastSOA architecture illustration

The FastSOA architecture runs in tandem with existing Web-based infrastructure and deploys as a mid-tier cache that receives the service consumer's request. For instance, a consumer makes a SOAP request to a service. The mid-tier cache provides a SOAP binding (a proxy). The binding calls an XQuery that handles the XML request document in the XQuery engine. The XQuery checks the cache to see whether the same request was received previously; if so, the FastSOA service returns the response from the cache without going upstream to the service. Otherwise, it passes the request through to the service. Serving repeated requests from the cache is what delivers the SOA acceleration.
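The request flow through the mid-tier can be sketched as follows. This is a minimal illustration in Python rather than XQuery, and it is not the FastSOA implementation: the class name `MidTierCache`, the in-memory dictionary standing in for the native XML database, the fake upstream service, and the rule that only byte-identical requests count as "the same request" are all assumptions made for the example.

```python
import hashlib

class MidTierCache:
    """Answer repeated requests from a local store; go upstream only on a miss."""

    def __init__(self, call_upstream):
        self._call_upstream = call_upstream   # function: request XML -> response XML
        self._store = {}                      # stands in for the native XML database
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(request_xml):
        # Assumption: byte-identical requests are "the same request".
        return hashlib.sha256(request_xml.encode("utf-8")).hexdigest()

    def handle(self, request_xml):
        key = self._key(request_xml)
        if key in self._store:
            self.hits += 1
            return self._store[key]           # replay the cached response
        self.misses += 1
        response = self._call_upstream(request_xml)
        self._store[key] = response           # remember it for the next consumer
        return response

upstream_calls = []

def fake_inventory_service(request_xml):
    """Hypothetical upstream service that records each call it receives."""
    upstream_calls.append(request_xml)
    return "<InventoryStatus><InStock>true</InStock></InventoryStatus>"

cache = MidTierCache(fake_inventory_service)
request = "<CheckInventory><Part>12345</Part></CheckInventory>"
first = cache.handle(request)
second = cache.handle(request)
print(cache.hits, cache.misses, len(upstream_calls))
```

The second call is answered without touching the upstream service. A production design would also need an expiry rule and a more forgiving notion of request equality than a byte-for-byte hash.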

The advantages of the FastSOA approach are:

  • Service end-points are standards based. To the rest of your applications, the FastSOA mid-tier cache looks like the service.
  • No need to replace your existing systems or code. The FastSOA mid-tier cache fits into your existing data center as a data aggregation and mitigation service.
  • In the event that the upstream service is temporarily unavailable, the FastSOA approach provides an easy mechanism for browsing cached data while the service is offline.
  • Requests that are served from the cache lower the amount of bandwidth normally needed to support communication between consumer and service.

To understand FastSOA from a practical standpoint, consider the following example application.

An XML example

General Motors created a service using SOA patterns to enable automotive dealerships to order parts from a manufacturing facility using ebXML-based patterns and protocols. The service understands an XML schema from the Software Technology in Automotive Retailing (STAR) organization. STAR is a combined effort of the big automotive manufacturers, including GM. STAR created and maintains the Business Object Document (BOD) schema and defines -- among many other things -- a request for an inventory check.

The CheckInventory request validates the requestor and checks inventory levels and status. The service consumer creates an inventory request document according to the STAR schema. The consumer marshals the document in a request and sends it over a network to the service. The service sends an inventory status response that shows which parts are in stock.

The parts ordering service benefits from the FastSOA patterns by reducing network bandwidth needs and mitigating the service bandwidth required to respond to redundant requests.

For instance, the inventory response returned to an automotive dealership includes a Time-To-Live (TTL) element. The TTL element defines the number of seconds for which the response is valid. For example, GM may have set this value to 60 seconds. During those 60 seconds, FastSOA answers matching inventory requests from the cache of inventory responses stored in the mid-tier. This avoids unnecessary bandwidth use and shortens the time to respond to requests.
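A freshness check driven by the TTL element might look like the sketch below, written in Python for illustration. The response document is hypothetical: the real STAR BOD schema is far larger, and its element names are not reproduced here. Treating a missing TTL as "do not cache" is likewise an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical response; <InventoryStatus>, <TTL>, and <InStock> are
# illustrative names, not the STAR schema.
RESPONSE = "<InventoryStatus><TTL>60</TTL><InStock>true</InStock></InventoryStatus>"

def ttl_seconds(response_xml):
    """Read the TTL element; treat a missing TTL as 'do not cache' (0 seconds)."""
    text = ET.fromstring(response_xml).findtext("TTL")
    return int(text) if text is not None else 0

def is_fresh(stored_at, ttl, now):
    """A cached response may be replayed while its age is below the TTL."""
    return (now - stored_at) < ttl

ttl = ttl_seconds(RESPONSE)
stored_at = 1000.0                             # time the response was cached, in seconds
print(is_fresh(stored_at, ttl, now=1030.0))    # response is 30 seconds old
print(is_fresh(stored_at, ttl, now=1061.0))    # response is 61 seconds old
```

Passing `now` in as a parameter, rather than reading the clock inside `is_fresh`, keeps the rule easy to test.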

The following table shows how to calculate service acceleration metrics in a network where the service resides on a server that is external to the local network and a FastSOA data mitigation aggregation service resides on the local network.


Table 1. Calculating service acceleration metrics
Action                                     | No caching²  | Caching enabled²
Time to process first request              | 1765¹        | 2218¹
Time to store the request in cache         | -            | 453¹
Subsequent identical or redundant requests | 1765¹        | 320¹
Internet bandwidth used                    | 30,400 Kbits | 304 Kbits
Total time used                            | 2941 minutes | 533 minutes

¹All times noted are in milliseconds; 1 second = 1,000 milliseconds.

²Assumptions:

  • 100 Mbit Ethernet connection from consumer to cache service and DSL connection at 1.5 Mbits/second up and down.
  • Time to Live (TTL) of 60 seconds.
  • Request/response is 38,000 bytes combined.
  • 100,000 requests during the TTL period.
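
One way to read the "Total time used" row of Table 1 is as the sum of the per-request times over all 100,000 requests, with the first cached request costing the upstream time plus the time to store it (1765 + 453 = 2218 ms). That reading is an interpretation, not something the table states, but the short Python calculation below reproduces both totals when the results are truncated to whole minutes.

```python
REQUESTS = 100_000        # requests in the scenario (from the assumptions)
UNCACHED_MS = 1765        # every request when caching is off
FIRST_CACHED_MS = 2218    # first request with caching on (1765 + 453 to store)
REPLAY_MS = 320           # each later request served from the cache

def minutes(ms):
    """Convert milliseconds to minutes."""
    return ms / 60_000

no_cache_total = minutes(REQUESTS * UNCACHED_MS)
cache_total = minutes(FIRST_CACHED_MS + (REQUESTS - 1) * REPLAY_MS)

print("no caching:     ", int(no_cache_total), "minutes")
print("caching enabled:", int(cache_total), "minutes")
print("ratio:          ", round(no_cache_total / cache_total, 1))
```

Under this reading the cached configuration finishes the same workload in a little under one fifth of the time.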
 

  

In a FastSOA implementation, an XQuery implements the parts ordering service. The XQuery makes a request to an inventory service, reads the content of the response, and determines at runtime if a previously stored response may be played back instead of going to the inventory service again.

This implements a FastSOA data mitigation aggregation architecture in a service environment. The combination of XQuery and a native XML database delivers a service that plays back the previously cached response data, provided the request matches the previous request and the data is still fresh. The result is service acceleration.

FastSOA technology choices

You can implement the FastSOA architecture using Java code and relational database technology. However, I found significant performance and scalability problems when testing service bindings created with Java objects and when persisting XML using relational databases. These problems are significant enough that it makes sense to consider alternatives such as XQuery, XSLT, and native XML database technology.

My interest in XQuery comes from its implementation as a native XML environment for application development. Much like the early days of Java technology, the XQuery community is filled with energy to build out and prove XQuery as a development platform. Indeed, most XQuery implementations have extended beyond the XQuery standard to enable XQuery to make SOAP requests. For instance, an XQuery might make queries to other services, to J2EE objects, and to data sources through JDBC, SOAP, and JMS protocols. Additionally, there are already 10 or more highly viable commercial and open-source XQuery implementations.

Lastly, FastSOA uses a native XML database for mid-tier caching because SOA data is normally encoded in XML format, and relational databases do a poor job of persisting and indexing hierarchical unstructured data such as XML. Relational databases that store XML data usually use Binary Large Object (BLOB) field types to store XML. This is neither efficient nor easily indexed for rapid searching. Relational approaches are also not normally optimized to work with streaming data. When sent across a Web service-based network, XML messages are ideal for processing with a stream-based approach that is foreign to relational databases.

Where FastSOA takes you

Adopting mid-tier service caching offers many benefits to your organization over and above the SOAP binding performance discussed in this article. Additional benefits include mid-tier schema transformation, service versioning, policy routing, and quality-of-service (QoS) processing. For example, FastSOA provides mid-tier XML message schema transformation to enable compatibility between services that require different and incompatible message types.

Summary

This article looked at SOA performance and scalability acceleration, and detailed the benefits of SOA designs that incorporate XML persistence in the mid-tier using XQuery support. The FastSOA design uses native XML persistence combined with XQuery so that as each service call is received, the mid-tier decides if it should respond with a cached value from a previous request or pass through the request. The service uses XQuery to determine if the cache is valid based on a query to the metadata description of the service request.

The author thanks Darin MacBeath, William Martinez Pomares, and Bob Albo for their feedback and suggestions that improved this article.



Resources

Learn

Get products and technologies

Discuss
  • XQueryNow is a free on-line community for software developers, architects, and IT managers working in XML environments.



About the author

Frank Cohen (frank.cohen@rainingdata.com) is the go-to guy when enterprises need to build, test, and solve problems in complex interoperating information systems. Frank is author of Java Testing and Design: From Unit Tests to Automated Web Tests available now at http://thebook.pushtotest.com. He is also the principal maintainer of the popular TestMaker open-source test utility and framework, and Director of Solutions Engineering at Raining Data Corporation, publisher of the TigerLogic XDMS XQuery engine and native XML database.



