Level: Advanced
Peter Heneback (peter.heneback@uk.ibm.com), Consultant, IBM
09 May 2006
XSLT stylesheets are designed to transform XML documents. Coupled with Java extensions, stylesheets can also be a powerful complement to XML Schema when grammar-based validation cannot cover all the constraints required. In this article, Peter Heneback presents the case for validating documents using XSLT with Java extensions and provides practical guidance and code samples.
Background
Grammar-based validation languages, such as XML Schema and DTD, are well equipped to ensure that XML documents conform to a well-defined message structure. This ensures that incoming XML messages can be processed correctly by the receiving applications, but it does not ensure that the data contained in the messages is valid. The limitations of grammar-based validation languages mean that, for example, you have to validate co-occurrence constraints and constraints against variable and external data sets using different methods.
In many cases, validation logic that you cannot implement in XML Schema or DTD is incorporated into application code. That solution is relatively easy to implement, but often results in an inflexible implementation. This article first investigates Schematron as an option to solve the preceding problems and then highlights some of the disadvantages of this approach. Then it explores an alternative solution using well-established W3C standard components coupled with Java extensions and open source XSLT processors.
Schematron
One commonly proposed complement to XML Schema is Schematron, a rule-based language that uses XPath to express assertions about the content of an XML instance document. To validate, you first transform the Schematron schema with a base stylesheet, which turns the schema into an XSLT stylesheet. Running the XML instance document through an XSLT processor with that stylesheet then checks the assertions. The result of the transformation is a report, in XML format, that details which assertions failed, along with the comments provided with each failed rule in the schema. Schematron is not well suited to defining structure, however; doing so quickly becomes a laborious exercise. You therefore still need to validate the document with an XML Schema first, but together XML Schema and Schematron can cover the validation requirements of most applications. In fact, because Schematron assertions are in a different namespace from the XML Schema grammar, the two can usually be included in the same file and then separated as part of the document validation process.
Figure 1 depicts the logical processing steps of a fairly common application scenario. An incoming XML instance document is first validated and then transformed with XSLT before it is either processed by the application itself or sent on to an external application. As is evident from the diagram, this becomes a complex operation when using a combined XML Schema and Schematron to first perform the validation and then transform the validated document using an XSLT stylesheet. You can shorten the process by separating the schemas and transforming the Schematron schema beforehand, but this still requires two runs through an XSLT processor as well as parsing and inspecting the validation report produced by the extracted Schematron transformation.
Figure 1. Validating using Schematron
Disadvantages of Schematron
- You need to transform the Schematron schema at least once using the base stylesheet before you can use it to validate an XML document. If the Schematron syntax is included in an XML Schema, further transformations are required.
- Schematron currently doesn't offer the possibility of reporting straight back to the application. You need to do a report processing step after validation, either manual or automated.
XSLT and Java extensions
In this section, I present XSLT with Java extensions as an alternative for complementing the grammar-based validation of XML Schema with a rule-based approach. I take you through a set of simple examples that illustrate how to do this. Instruction on how to implement an XML validator using XML Schema is not included because that has been covered extensively in other articles and tutorials offered by developerWorks (see Resources).
Figure 2 shows the same scenario as Figure 1, but uses XSLT with Java extensions to perform validation and transformation in one step. Instead of producing a report that you must process separately, a validation failure is thrown directly to the controlling application as an object of the ValidationException class, which extends the Exception class. Besides reducing the number of XSLT processors the document has to pass through, the transformation is halted at the first validation failure, thereby preventing unnecessary processing of invalid data.
Figure 2. Validating using XSLT and Java extensions
Simple transformation of an XML document
To demonstrate how to use XSLT and Java for validation, I use a simple transformation of a registry containing employee details, such as name, phone number, title, and gender as sample input. As you see later, the title and gender details are at the center of the co-occurrence constraints example. Listing 1 contains a part of the input XML document.
Listing 1. Input XML document
<input:staff xmlns:input="cross-field-validation-namespace">
  ...
  <input:employee id="1234A">
    <input:first_name>Julia</input:first_name>
    <input:last_name>Smith</input:last_name>
    <input:title>Mrs</input:title>
    <input:gender>F</input:gender>
    <input:telephones>
      <input:mobile preferred="false">0770-555 1231</input:mobile>
      <input:mobile preferred="true">0771-555 1232</input:mobile>
      <input:home preferred="false">0207-555 1233</input:home>
    </input:telephones>
  </input:employee>
  ...
</input:staff>
The simple transformation in this example is a concatenation of the first and last names from the employee registry. The transformation also extracts the telephone number with the preferred attribute set to true, as shown in Listing 2.
Listing 2. Simple transformation of employee data
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0" xmlns:xalan="http://xml.apache.org/xslt"
    xmlns:input="cross-field-validation-namespace">
  <xsl:template match="/input:staff">
    <phones>
      <xsl:apply-templates select="input:employee" />
    </phones>
  </xsl:template>
  <xsl:template match="input:employee">
    <employee>
      <name>
        <xsl:value-of select="concat(input:first_name,' ',input:last_name)" />
      </name>
      <tel>
        <xsl:value-of select="input:telephones/*[@preferred = 'true']" />
      </tel>
    </employee>
  </xsl:template>
</xsl:stylesheet>
Listing 3 shows the output document created by transforming the input document with the XSLT from Listing 2.
Listing 3. Output document
<phones xmlns:input="cross-field-validation-namespace"
    xmlns:exception="xfield.exception.ValidationExceptionThrower"
    xmlns:xalan="http://xml.apache.org/xslt">
  <employee>
    <name>Julia Smith</name>
    <tel>0771-555 1232</tel>
  </employee>
  <employee>
    <name>John Smith</name>
    <tel>0207-555 1236</tel>
  </employee>
  <employee>
    <name>Jenny Smith</name>
    <tel>0770-555 1237</tel>
  </employee>
</phones>
Implementing simple co-occurrence constraint validation
Two very simple Java classes are all that you need to communicate validation errors directly back to the controlling application through the XSLT processor: ValidationException and ValidationExceptionThrower. The ValidationException class is a simple extension of the standard Java Exception class; it enables the controlling application to distinguish validation errors from other exceptions thrown by the processor. The ValidationExceptionThrower class simply throws a ValidationException when its throwException method is called. A separate class is required for this purpose because Java extensions for XSLT can only create objects and call methods; the regular Java throw syntax is not available inside a stylesheet.
Listings 4 and 5 show the complete source code of the two classes required for the Java extensions.
Listing 4. ValidationException class
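package xfield.exception;

// Marks a validation failure so that the controlling application can
// tell it apart from other exceptions thrown by the processor.
public class ValidationException extends Exception {
    public ValidationException(String sMsg)
    {
        super(sMsg);
    } // end constructor
} // end class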
Listing 5. ValidationExceptionThrower class
package xfield.exception;

// Called from the stylesheet; its only job is to raise a ValidationException.
public class ValidationExceptionThrower {
    public ValidationExceptionThrower()
    {
        // Empty
    } // end constructor

    public void throwException(String sMessage) throws ValidationException
    {
        throw new ValidationException(sMessage);
    } // end throwException
} // end class
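On the application side, the point of a dedicated exception type is that the caller can pick it out of whatever the processor throws. The following is a minimal sketch of that step, not the code from the article's download. It assumes the processor reports the failure as a TransformerException with the ValidationException somewhere in its cause chain; how deeply it is nested can differ between processors. The names ValidationFailureFinder, find, and transformOrFail are hypothetical, and ValidationException is repeated from Listing 4 without its package line only so that the block compiles on its own.

```java
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;

// Repeated from Listing 4 (package line omitted) so this sketch compiles alone.
class ValidationException extends Exception {
    ValidationException(String sMsg) {
        super(sMsg);
    }
}

// Hypothetical helper for the controlling application.
class ValidationFailureFinder {

    // Walks the cause chain and returns the first ValidationException,
    // or null if the failure was not a validation error.
    static ValidationException find(Throwable thrown) {
        for (Throwable t = thrown; t != null; t = t.getCause()) {
            if (t instanceof ValidationException) {
                return (ValidationException) t;
            }
        }
        return null;
    }

    // Runs the transformation and rethrows a validation failure as itself,
    // so callers can catch it separately from other processor errors.
    static void transformOrFail(Transformer transformer, Source input, Result output)
            throws ValidationException, TransformerException {
        try {
            transformer.transform(input, output);
        } catch (TransformerException e) {
            ValidationException failure = find(e);
            if (failure != null) {
                throw failure;
            }
            throw e; // Assumption: anything else is a genuine processor error.
        }
    }
}
```

A caller can then wrap transformOrFail in a try block with one catch clause for ValidationException (invalid data) and another for TransformerException (a broken stylesheet or unreadable input).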
Checking co-occurrence constraints
As I stated earlier, the title and gender are used as an example of a common co-occurrence constraint. You want to check that the title and gender elements match up; for example, if title is set to Mr, then gender should be set to M for male. Validating that the title and gender elements separately contain valid values is, however, best done using enumerations in XML Schema and is not tested here.
To validate, place the transformation code inside the <otherwise/> block of a <choose/> clause and check the condition as a test statement inside the <when/> block. If the test evaluates to false, that is, the condition has validated successfully, then the transformation code is executed. If the test evaluates to true, a Java extension is used to call the throwException() method on an instance of the ValidationExceptionThrower class. Note the addition of a namespace with the prefix exception that contains the class name, including packages, in the root tag of the XSLT. The prefix is then used much like an object name to call the throwException() method with an error string as the argument. Going back to the Java code for the ValidationExceptionThrower class, it is then easy to see how a ValidationException is thrown containing the error string from the XSLT.
To check further conditions, add the required <when/> statements with appropriate error messages provided as arguments to the methods.
Listing 6. Basic transformation and co-occurrence constraint validation
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0" xmlns:xalan="http://xml.apache.org/xslt"
    xmlns:exception="xfield.exception.ValidationExceptionThrower"
    xmlns:input="cross-field-validation-namespace">
  ...
  <xsl:template match="input:employee">
    <xsl:choose>
      <xsl:when test="input:gender = 'M' and input:title != 'Mr'
                      or input:gender = 'F' and input:title = 'Mr'">
        <xsl:value-of
            select="exception:throwException('Gender and title do not match')"/>
      </xsl:when>
      <xsl:otherwise>
        <employee>
          <name>
            <xsl:value-of select="concat(input:first_name,' ',input:last_name)"/>
          </name>
          <tel>
            <xsl:value-of select="input:telephones/*[@preferred = 'true']"/>
          </tel>
        </employee>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
Adding XPath expressions to the exception message, as shown in Listing 7, results in a more detailed description of the validation failure, which greatly simplifies locating the problem.
Listing 7. Detailed validation failure information
<xsl:when test="input:gender = 'M' and input:title != 'Mr'
                or input:gender = 'F' and input:title = 'Mr'">
  <xsl:value-of select="exception:throwException(
      concat('Gender and title do not match for employee ',
      input:first_name,' ',input:last_name))" />
</xsl:when>
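The test expression used in Listings 6 and 7 relies on XPath operator precedence: and binds more tightly than or, so the expression reads as two alternative mismatches. One way to convince yourself of that is to evaluate the same expression outside the stylesheet with the XPath API that ships with the JDK. The sketch below does this for a single employee element; the class and method names (CoOccurrenceRule, violates) are hypothetical, and it assumes the title and gender values are plain text without markup characters.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper; only the namespace URI and the rule come from the listings.
class CoOccurrenceRule {
    static final String NS = "cross-field-validation-namespace";

    // The test attribute from Listings 6 and 7, unchanged.
    static final String RULE =
        "input:gender = 'M' and input:title != 'Mr' "
        + "or input:gender = 'F' and input:title = 'Mr'";

    // Maps the "input" prefix used in RULE to the document's namespace.
    static final NamespaceContext PREFIXES = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            return "input".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) {
            return NS.equals(uri) ? "input" : null;
        }
        public Iterator<String> getPrefixes(String uri) {
            return NS.equals(uri)
                ? Collections.singletonList("input").iterator()
                : Collections.<String>emptyList().iterator();
        }
    };

    // Returns true when the rule fires, that is, when title and gender clash.
    // Assumes both values are plain text with no markup characters.
    static boolean violates(String gender, String title) {
        String xml = "<input:employee xmlns:input='" + NS + "'>"
            + "<input:title>" + title + "</input:title>"
            + "<input:gender>" + gender + "</input:gender>"
            + "</input:employee>";
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(PREFIXES);
        try {
            // The predicate is evaluated with input:employee as the context
            // node, as it is inside the template; a non-empty result is true.
            return (Boolean) xpath.evaluate("/input:employee[" + RULE + "]",
                new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate rule", e);
        }
    }
}
```

With the data from Listing 1, violates("F", "Mrs") does not fire, whereas a male employee with the title Mrs or a female employee with the title Mr does.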
Separating validation from transformation
If it is a requirement that transformation and validation logic are kept separate, you can use a standard <include/> instruction to reference the validation checks and execute them together with the transformation code. Listing 8 shows the transformation code with the <include/> tag referencing the file validate.xsl. Note also the added call to the template named validate; it is important that this call matches the name of the template containing the condition checks in the included file, shown subsequently in Listing 9.
Listing 8. Extracted transformation logic
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0" xmlns:xalan="http://xml.apache.org/xslt"
    xmlns:input="cross-field-validation-namespace">
  <xsl:include href="validate.xsl" />
  <xsl:template match="/">
    <xsl:call-template name="validate" />
    <phones>
      <xsl:apply-templates />
    </phones>
  </xsl:template>
  <xsl:template match="input:staff/input:employee">
    <employee>
      <name>
        <xsl:value-of
            select="concat(input:first_name,' ',input:last_name)" />
      </name>
      <tel>
        <xsl:value-of
            select="input:telephones/*[@preferred = 'true']" />
      </tel>
    </employee>
  </xsl:template>
</xsl:stylesheet>
Listing 9. Extracted validation logic
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0" xmlns:xalan="http://xml.apache.org/xslt"
    xmlns:exception="xfield.exception.ValidationExceptionThrower"
    xmlns:input="cross-field-validation-namespace">
  <xsl:template name="validate">
    <xsl:for-each select="/input:staff/input:employee">
      <xsl:if
          test="input:gender = 'M' and input:title != 'Mr'
                or input:gender = 'F' and input:title = 'Mr'">
        <xsl:value-of
            select="exception:throwException('Gender and title do not match')" />
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
Validating against external reference data
XML Schema can use enumerations to compare incoming data against a predetermined set of acceptable values; however, if the reference data is variable and itself contains relationships, XML Schema has no means of validating against it. As this example shows, a solution using XSLT and Java extensions can handle this scenario and validate against variable and external XML data sets. Note that Schematron also provides this functionality, because the transformed schema uses XSLT to perform the validation.
Listing 10. Reference data
<roles>
  ...
  <employee id="1234A" role="A"/>
  <employee id="1234D" role="A"/>
  <employee id="1234C" role="X"/>
  <employee id="1234B" role="Z"/>
  <employee id="1234X" role="Z"/>
  ...
</roles>
The external reference data in this example is a set of employees with associated roles.
The goal is to check that every employee ID in the incoming XML document exists in the reference data set. To accomplish this, use the document() function to load the external XML document into a variable. Then, for each employee, check that at least one entry with the current ID exists in the reference variable; if none does, the employee ID is invalid.
Listing 11. Validation against external reference data
<xsl:variable name="reference" select="document('reference.xml')" />
<xsl:template name="validate">
  <xsl:for-each select="/input:staff/input:employee">
    <xsl:if test="input:gender = 'M' and input:title != 'Mr'
                  or input:gender = 'F' and input:title = 'Mr'">
      <xsl:value-of
          select="exception:throwException(
              concat('Gender and title do not match for employee ',
              input:first_name,' ',input:last_name))" />
    </xsl:if>
    <xsl:variable name="current_id" select="@id" />
    <xsl:if
        test="count($reference/roles/employee[@id = $current_id]) = 0">
      <xsl:value-of
          select="exception:throwException(concat('Invalid employee ID: ',$current_id))" />
    </xsl:if>
  </xsl:for-each>
</xsl:template>
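The lookup at the heart of Listing 11 can also be tried in isolation with the JDK's XPath API, which is a convenient way to check the expression against sample reference data before wiring it into a stylesheet. The following sketch is an illustration under stated assumptions rather than part of the article's download: the class and method names (ReferenceLookup, isUnknownId) are hypothetical, and the reference data is limited to the five rows visible in Listing 10.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper; the reference rows are the ones visible in Listing 10.
class ReferenceLookup {
    static final String REFERENCE_XML =
        "<roles>"
        + "<employee id='1234A' role='A'/>"
        + "<employee id='1234D' role='A'/>"
        + "<employee id='1234C' role='X'/>"
        + "<employee id='1234B' role='Z'/>"
        + "<employee id='1234X' role='Z'/>"
        + "</roles>";

    // Returns true when no reference entry carries the given ID, mirroring
    // count($reference/roles/employee[@id = $current_id]) = 0 in Listing 11.
    static boolean isUnknownId(String currentId) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind $current_id; any other variable name is left unresolved.
        xpath.setXPathVariableResolver(
            name -> "current_id".equals(name.getLocalPart()) ? currentId : null);
        try {
            return (Boolean) xpath.evaluate(
                "count(/roles/employee[@id = $current_id]) = 0",
                new InputSource(new StringReader(REFERENCE_XML)),
                XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate lookup", e);
        }
    }
}
```

Here the variable resolver plays the role of the xsl:variable element in the stylesheet: it supplies the value of $current_id when the expression is evaluated.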
Conclusion
In conclusion, Schematron is a good complement to XML Schema when you deal with co-occurrence constraints, for example, or if you require an XML reporting tool. When performance is of greater importance, and especially when the validation is followed by a transformation, XSLT with Java extensions can provide a more compact solution.
Download
| Description | Name | Size | Download method |
|---|---|---|---|
| Java and XSLT source code used in this article | advanced_xml_validation.zip | 18KB | HTTP |
Resources
Learn
- Schematron official home page: Learn more about Schematron.
- A hands-on introduction to Schematron: Practice using Schematron in this tutorial (Uche Ogbuji, developerWorks, September 2004).
- XML programming in Java technology, Part 1: Learn the basics of manipulating XML documents using Java technology in this tutorial (Doug Tidwell, developerWorks, January 2004). Also look at Parts 2 and 3 in this series.
- XML Schema validation in Xerces-Java 2: Work through the process of using schema validation with Xerces-Java in this tutorial (Nicholas Chase, developerWorks, July 2002).
- developerWorks XML zone: Find more XML resources here, including articles, tutorials, tips, and standards.
- IBM Certified Solution Developer -- XML and related technologies: Learn how to get certified.
Get products and technologies
- Saxon: Download the Saxon XSLT processor.
- Xalan-J: Download the Xalan-J XSLT processor.
- Xerces: Download the Xerces XML parser.
About the author
Peter Heneback recently graduated with an MSc in Computer Science and has since been working for IBM United Kingdom Limited as a consultant specializing in integration technology, Java, and XML. He has worked with both grid and Web service implementations and previously published an article on grid security.