Working XML: UML, XMI, and code generation, Part 4

Concept mapping

Level: Intermediate

Benoît Marchal (bmarchal@pineapplesoft.com), Consultant, Pineapplesoft

18 Aug 2004

In this final article in his series on UML and XML, Benoît wraps up the technique. He discusses the need to simplify the model by burying some of the logic in the XSLT stylesheet. He also points out several common pitfalls. Share your thoughts on this article with the author and other readers in the accompanying discussion forum.

This article concludes this series on modeling XML applications with the industry-standard UML. The previous installments (see Resources) left one question open: What if you have more than one possible relationship between the UML model and the XML vocabulary? The article further refines what has been an ongoing theme for the series: Modeling is about simplifying reality for practical purposes.

As you have seen, I advocate a realistic and flexible approach that is tailored to the needs of your projects. This article will work on the few remaining loose threads to help you apply this material in your context. With the stylesheets introduced so far and a modeling tool (such as IBM® Rational Rose®), it is easy to start modeling your XML project in an industry-compliant way.

Modeling

As I have stated many times in this series, a model is not drawn in a vacuum; it is created to serve a specific purpose. A model is a simplified representation of certain aspects of the reality, and this simplification makes it easier to analyze the underlying reality and ultimately understand it better.

For this series, the reality is an XML vocabulary or a Web service. Admittedly, that's already an abstract reality. Yet if you try to read the text of an XML schema, you will quickly understand why it pays to simplify it. The amount of distraction that's caused by the (somewhat convoluted) schema syntax cries out for simplification. One of the most obvious gains of UML is that, because it's a graphic language, it is more readable than markup. Another advantage is that UML offers a more synthetic view: One glance at a model gives you a rough idea of the number of classes and the complexity of the relationships. Last but not least, UML drops many low-level syntactical details such as namespace prefixes, local and global elements, and whether a concept is an element or an attribute.

Ideally, modeling will help you better understand your application, and therefore produce more suitable XML vocabularies or design stronger Web services APIs.

Model refinement

How much simplification is appropriate depends on the specifics of your application as well as how refined the model is. As I've shown you, modeling is not a one-shot deal (except with the most trivial applications). Typically, modeling starts with an informal session in which you collect the basic definitions and the simplest relationships in the model. You then refine the model during several review sessions. These iterations gradually build a more formal and more complete model (typically moving from the whiteboard to the modeling tool). Ultimately, the UML model is converted into an XML schema, which is a very formal description of the XML vocabulary. Alternatively, the model could be processed into a WSDL file -- again a formal description of a Web service. You can also use the same model to generate Java classes.

Take a look at the final stage: the processing of the UML model into the more precise XML schema. As you have seen, this is surprisingly easy to do if you follow a simple methodology. Many UML modeling tools (such as IBM Rational Rose) store the model according to the definitions of the UML metamodel.

Simply put, the UML metamodel is the set of classes that represent a UML model. From comments to packages, including classes themselves, every concept in UML has a metaclass. The Object Management Group (OMG) has also standardized an XML representation of the metamodel: XML Metadata Interchange (XMI). XMI makes the model accessible to XML developers. Actually, I should say "sort of standardized" because different modeling tools (and even different versions of the same tool) can interpret XMI differently. In practice, the differences are small and it's trivial to cope with them in the stylesheet.
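To give a feel for what "coping with the differences" can look like, here is a minimal sketch. It is written in Python with the standard library rather than XSLT, purely so that it is easy to run. It assumes two dialects: one where a class is a UML:Class element with a name attribute, and an older one where a class is a Foundation.Core.Class element with its name in a child element. Both samples and the namespace URI are illustrative assumptions, so check them against what your own tool exports.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the attribute-style dialect.
UML_NS = "org.omg.xmi.namespace.UML"

# Hypothetical, heavily trimmed exports from two different tools.
NEWER_STYLE = """\
<XMI xmlns:UML="org.omg.xmi.namespace.UML">
  <UML:Class name="Address"/>
</XMI>
"""

OLDER_STYLE = """\
<XMI>
  <Foundation.Core.Class>
    <Foundation.Core.ModelElement.name>Address</Foundation.Core.ModelElement.name>
  </Foundation.Core.Class>
</XMI>
"""

def class_names(xmi_text):
    """Collect class names, whichever of the two assumed dialects is used."""
    names = []
    for node in ET.fromstring(xmi_text).iter():
        if node.tag == f"{{{UML_NS}}}Class":
            # Attribute-style dialect: the name is an XML attribute.
            names.append(node.get("name"))
        elif node.tag == "Foundation.Core.Class":
            # Element-style dialect: the name is the text of a child element.
            names.append(node.findtext("Foundation.Core.ModelElement.name"))
    return names

if __name__ == "__main__":
    print(class_names(NEWER_STYLE))
    print(class_names(OLDER_STYLE))
```

In a stylesheet, the same idea would typically be one extra template or a union in a match pattern; the point is that the difference stays confined to the generator.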

Anyway, to generate an XML schema from a UML model, it suffices to decide which concept from the UML metamodel matches which XML schema tag. For example, it's obvious that a UML class will become an XML element. Since the UML model is stored in XML (as XMI), generating the schema is as simple as writing an XSLT template that matches all instances of UML:Class and converts them to xs:element.
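The mapping just described can be sketched in a few lines. This is not the stylesheet from the series: it is a Python stand-in (standard library only) that performs the same UML:Class to xs:element conversion on a made-up XMI fragment. The UML namespace URI and the sample model are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; check the one your tool's XMI export really uses.
UML_NS = "org.omg.xmi.namespace.UML"
XS_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical, heavily trimmed XMI fragment with two classes.
XMI_SAMPLE = """\
<XMI xmlns:UML="org.omg.xmi.namespace.UML" xmi.version="1.2">
  <XMI.content>
    <UML:Model name="AddressBook">
      <UML:Namespace.ownedElement>
        <UML:Class name="Address"/>
        <UML:Class name="Contact"/>
      </UML:Namespace.ownedElement>
    </UML:Model>
  </XMI.content>
</XMI>
"""

def classes_to_schema(xmi_text):
    """Return an xs:schema element with one xs:element per UML:Class."""
    ET.register_namespace("xs", XS_NS)
    root = ET.fromstring(xmi_text)
    schema = ET.Element(f"{{{XS_NS}}}schema")
    # The equivalent of a template matching every UML:Class in the model.
    for uml_class in root.iter(f"{{{UML_NS}}}Class"):
        ET.SubElement(schema, f"{{{XS_NS}}}element", name=uml_class.get("name"))
    return schema

if __name__ == "__main__":
    print(ET.tostring(classes_to_schema(XMI_SAMPLE), encoding="unicode"))
```

A real generator also has to handle attributes, associations, and types, which is where most of the effort goes.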

It is also possible (and often desirable) to implement the reverse stylesheet to generate UML models from XML schemas. This is particularly handy when you need to integrate standard vocabularies into your design. Such vocabularies are seldom distributed in UML form, but rather as XML schemas. With the appropriate stylesheet, it does not take long to reverse-engineer them into a model.
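The reverse direction can be sketched the same way. The following Python stand-in (again, not the author's stylesheet) turns each global element declaration of a small, made-up schema into a UML:Class. The UML namespace URI is an assumption, and a usable reverse stylesheet would also need to recover attributes and associations from the nested declarations.

```python
import xml.etree.ElementTree as ET

UML_NS = "org.omg.xmi.namespace.UML"  # assumed namespace URI
XS_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema: two global elements, one with a local child.
XSD_SAMPLE = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="street" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Contact"/>
</xs:schema>
"""

def schema_to_classes(xsd_text):
    """Return a UML:Namespace.ownedElement holding one UML:Class per global element."""
    ET.register_namespace("UML", UML_NS)
    schema = ET.fromstring(xsd_text)
    owned = ET.Element(f"{{{UML_NS}}}Namespace.ownedElement")
    # findall() with a plain tag only visits direct children of xs:schema,
    # that is, the global declarations; local elements are skipped here.
    for declaration in schema.findall(f"{{{XS_NS}}}element"):
        ET.SubElement(owned, f"{{{UML_NS}}}Class", name=declaration.get("name"))
    return owned

if __name__ == "__main__":
    print(ET.tostring(schema_to_classes(XSD_SAMPLE), encoding="unicode"))
```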

And beyond

Writing a stylesheet that matches elements from XMI and transforms them into XML schema elements is conceptually simple enough. The stylesheets I provided in my previous articles were neither particularly long nor abnormally complex.

That's the theory, at least. In practice, things can get out of hand if you aren't careful. First, the XSLT coding -- although never dramatic -- can be involved. Review the stylesheet I introduced in my last column and pay special attention to the template for UML:Class. It is far from the most complex XSLT template I have written, but it's not as straightforward as the discussion above would lead you to believe. So make sure you hone your XSLT skills before you tackle this project (or just rip off my stylesheet).

Second, and more importantly, it is not always simple to decide which UML concept matches which XML concept. In the previous article, I pointed to stereotypes and tags as tools to extend the UML model and support XML concepts that have no equivalent in UML. Stereotypes and tags are helpful, and you may be tempted to cover every single aspect of XML schema through stereotypes and tags. Resist the temptation.

Remember that by definition a model is a simplification, so it makes sense that a model, even a refined one, should not include all of the nitty-gritty implementation details. Many aspects are best left out of the model and buried in the stylesheet itself.





Decision time

The W3C XML Schema Recommendation is complex, and you probably don't need to worry about its every detail, so it pays to decide on a subset of the features that you need and will use for a given project. Don't waste your time with the rest of the recommendation.

A clearer model

What should you include and what should you leave out? Unfortunately, I do not have a specific answer to this question. The best course of action is to include those aspects that are important for your project and leave out the rest. Of course, that leaves the issue of deciding what is important to your project.

For most applications, you want to bury the differences between global and local elements, as well as between elements and attributes. In practice, it is more important to define the data fields that are needed than to decide whether those fields should be elements or attributes. After all, elements and attributes are just that: fields where you can store data.

Although there are exceptions, the distinction between elements and attributes is often an implementation detail that's largely irrelevant and distracting for the designer. Therefore, although it may be tempting to use different UML concepts (and possibly stereotypes) to model elements and attributes, I advise you not to.

Why ignore the distinction between elements and attributes? Because it confuses the model and adds very little useful information. Compare the three models in Figure 1:


Figure 1. Three UML models

In counter-clockwise order, the first model (top left) uses stereotypes to mark elements. The second model (bottom) reserves UML attributes for XML attributes, and models XML elements as associations (different UML concepts mark the differences in XML). The last model (top right) makes no such distinction. Which one is the clearest? Keep in mind, this is a simple model. Imagine if there were dozens of classes in each case -- which one would be the most readable? Which one would print out on the smallest amount of paper? It should be clear that the last model is the most readable.

It pays to treat a UML diagram as a user interface. As much as possible, you should minimize clutter and encode the information in a concise and readable way.

Obviously, if you take information away from the UML model, you need to make it available somewhere else. That's the role of the XSLT stylesheet: It must not only convert between UML and XML, but also implement rules that ensure an efficient conversion. The stylesheet introduced in the previous articles makes the following decisions:

  • It never creates XML attributes. Your mileage may vary, but elements can encode all the information required for such a project.
  • It maps UML classes to global elements and UML attributes to local elements. This is primarily to avoid name clashes: If two classes have an attribute with the same name and both attributes were mapped to global elements, the two declarations could conflict.

These two simple rules suffice to add all the information that's missing from the model. As an added bonus, if I decide to change the rule (for example, to no longer use local elements), I only need to change the stylesheet; I don't need to change the model. If those details were written in the model, I would need to update it.
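As a rough illustration of how little code the two rules take, here is a Python stand-in for the generator (standard library only, not the stylesheet from the series). The model fragment, the UML namespace URI, and the decision to type every field as xs:string are all assumptions made to keep the sketch short.

```python
import xml.etree.ElementTree as ET

UML = "{org.omg.xmi.namespace.UML}"  # assumed XMI namespace
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical model: both classes have an attribute called "name".
MODEL = """\
<UML:Namespace.ownedElement xmlns:UML="org.omg.xmi.namespace.UML">
  <UML:Class name="Person">
    <UML:Classifier.feature>
      <UML:Attribute name="name"/>
      <UML:Attribute name="email"/>
    </UML:Classifier.feature>
  </UML:Class>
  <UML:Class name="Company">
    <UML:Classifier.feature>
      <UML:Attribute name="name"/>
    </UML:Classifier.feature>
  </UML:Class>
</UML:Namespace.ownedElement>
"""

def generate_schema(model_text):
    """Apply the two mapping rules to every class in the model."""
    ET.register_namespace("xs", XS.strip("{}"))
    model = ET.fromstring(model_text)
    schema = ET.Element(XS + "schema")
    for uml_class in model.iter(UML + "Class"):
        # Rule 2, first half: a UML class becomes a global element.
        element = ET.SubElement(schema, XS + "element", name=uml_class.get("name"))
        complex_type = ET.SubElement(element, XS + "complexType")
        sequence = ET.SubElement(complex_type, XS + "sequence")
        for attribute in uml_class.iter(UML + "Attribute"):
            # Rule 1 and rule 2, second half: a UML attribute becomes a
            # local element, never an xs:attribute. xs:string is a placeholder.
            ET.SubElement(sequence, XS + "element",
                          name=attribute.get("name"), type="xs:string")
    return schema

if __name__ == "__main__":
    print(ET.tostring(generate_schema(MODEL), encoding="unicode"))
```

Because the two "name" fields are declared locally, inside Person and Company respectively, they cannot clash. Changing either rule means editing generate_schema only; the MODEL text stays untouched.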

But not simplistic

You may disagree with my position on attributes. Although attributes are not important in this particular application, your application may differ. Different projects emphasize different aspects of the XML syntax:

  • E-business projects tend to concentrate on the class structure and not worry too much about the actual XML syntax.
  • Industry groups tend to invest a lot in the class hierarchy, and try to enforce markup reuse (where an element is reused in different contexts).
  • Business people tend to focus on the business process more than the actual data.
  • Publishing projects tend to worry about the ease of hand-coding the XML markup, because authors may have to write XML documents by hand.

So if attributes are crucially important to your project, make them visible in the model. Again, think of the UML diagram as a user interface into your XML vocabulary. You don't want the UI to hide essential aspects. The model is a tool that has no firm rules on what must appear and what must be left out. You should capture all the information needed for your application and no more.

As you can see, I am not advocating that a single standard mapping can work perfectly for all of your XML applications. XML applications have such diversity that I cannot envision a single mapping that is appropriate for all of them.

Obviously, 95% of the UML representation of XML is common to every project, and you can use the stylesheets I have provided as a good starting point. Note that the situation is similar for SQL code generation: You often need to fine-tune code generation to your database.

A word of warning

While I am encouraging you to fine-tune the UML-to-XML mapping to best suit the needs of your project, I recommend that you do so within the framework of UML. You do not need to deviate from standard UML.

Here's an example. Figure 2 is a model that I have often seen used, and it follows bad practice. Specifically, the problem lies in the make-element and namespace-uri attributes.


Figure 2. Poor modeling practice

Presumably, an address does not have a make-element attribute. Instead, the attribute is a hint to the stylesheet to generate a given syntax. The attribute encodes information not about the address, but about the XML coding of the address. This is both dangerous and useless.

It is dangerous because it perverts the definition of UML attributes. An attribute should provide information about its class, not about XML syntax. The result is non-portable, confuses readers of the model, and may cause serious maintenance headaches.

Furthermore, it is totally useless because UML provides the extension mechanism (stereotypes and tags) to address this need. If you need to identify a special sort of class or add metalevel information to a class, then you must use UML extensions. Again, while I am advocating that you customize your UML-to-XML mapping to best suit your project, I strongly recommend you do so in the standard manner.
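The separation can be pictured with a small, tool-independent sketch. The Python classes below are a hypothetical in-memory view of a model, not the XMI encoding of any particular tool, and the stereotype name "element" is invented for the example. What matters is that the attribute list holds only data about the address, while the hints for the generator live in the stereotype and the tagged values.

```python
from dataclasses import dataclass, field

@dataclass
class UmlClass:
    """Hypothetical, simplified view of a class in a UML model."""
    name: str
    attributes: list = field(default_factory=list)      # domain data only
    stereotype: str = ""                                # metalevel: kind of class
    tagged_values: dict = field(default_factory=dict)   # metalevel: extra hints

# The address has a street, a city, and a zip code. It does not "have"
# a namespace URI; that belongs to the XML coding, so it goes in a tag.
address = UmlClass(
    name="Address",
    attributes=["street", "city", "zip"],
    stereotype="element",  # invented stereotype name
    tagged_values={"namespace-uri": "http://example.org/address"},
)

def mapping_hints(uml_class):
    """Collect what a generator needs, reading hints from the extensions only."""
    return {
        "fields": list(uml_class.attributes),
        "make_element": uml_class.stereotype == "element",
        "namespace_uri": uml_class.tagged_values.get("namespace-uri"),
    }

if __name__ == "__main__":
    print(mapping_hints(address))
```

A reader who only cares about the data can ignore the stereotype and tags entirely, which is exactly what the make-element and namespace-uri attributes in Figure 2 prevent.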





Conclusion

I trust this series has given you insight into UML modeling of XML applications. Interest in modeling XML applications with UML is growing, if only because UML models can be shared with Java, C++, and other languages. I have reviewed the tools (UML metamodel, XMI, and XSLT) to make modeling of XML applications a reality. With a modeling tool and the stylesheets I have provided, you are ready to go.

If you have comments or questions about this series, join the discussion in the forum (see Resources).



Resources

  • Participate in the discussion forum.

  • Review the previous installments of this series by Benoît Marchal:
    • Part 1 discusses the relationship between UML and XML schema (developerWorks, March 2004).
    • Part 2 introduces the UML metamodel and then proceeds to XMI, the XML-based specification for the exchange of models. The author then shows how to map from the metamodel to XML Schema (developerWorks, May 2004).
    • Part 3 discusses stereotypes and tags as tools to extend the UML language (developerWorks, June 2004).

  • Check out the UML specification for the complete UML metamodel, at the Object Management Group site.

  • Explore IBM Rational Rose, the leading UML modeling product. You'll also find plenty of Rational and UML resources on the Rational section of developerWorks.

  • Begin to explore the techniques described in this series with ArgoUML and Poseidon for UML (Community Edition). These free modeling tools can export to XMI, although both tools are more limited than IBM Rational Rose.

  • Read "Strike a balance: Users' expertise on interface design" (developerWorks, September 2003) by Mike Padilla. It offers advice on user interface design. I think much of that information applies to the mapping between UML and XML. Within the constraints of your modeling tool of choice, you'll want to simplify just enough for the task at hand.

  • Investigate The Inmates Are Running The Asylum by Alan Cooper, one of the best books on user interface -- make that one of the best books on computer science.

  • Find hundreds more XML resources on the developerWorks XML zone, including previous installments of Benoît Marchal's Working XML column.

  • Browse for books on these and other technical topics.

  • Find out how you can become an IBM Certified Developer in XML and related technologies.


About the author


Benoît Marchal is a Belgian consultant. He is the author of XML by Example, Second Edition and other XML books. You can contact him at bmarchal@pineapplesoft.com or through his personal site at www.marchal.com.



