Java theory and practice: Enable initialization atomicity

The self-return idiom makes for more usable API design

Level: Intermediate

Brian Goetz (brian@quiotix.com), Principal Consultant, Quiotix

27 Apr 2005

Decisions made during API design affect the API's usability. In designing an API, you need to put yourself in your user's shoes, imagine how the API might be used, and try to make the common use cases convenient. This month, columnist Brian Goetz discusses an API design technique, the self-return idiom, that can make life easier for users of your API in certain circumstances.

In 1985 I had a summer job developing mainframe-based business applications in APL. For those who don't remember APL, it was a very terse language, requiring a special keyboard with all sorts of weird symbols, but it offered some extremely powerful mechanisms for manipulating data stored in arrays. As an example, Conway's "Life" cellular automata game can be implemented in 30-40 characters of APL code, and a program to find all prime numbers below a certain number could be written in 20 characters. (To the uninitiated, such an APL program pretty much looks like line noise.) APL jocks liked to joke that any program could be written in one line in APL. (Reading such a program, on the other hand, was not as easy.)

Macho competition and obfuscated programming contests aside, in what situations is it valuable, from a software engineering perspective, to be able to perform an arbitrary sequence of operations on an object in a single expression? In the Java™ language, several cases exist where being able to instantiate and initialize a complex object in a single expression can improve the code's readability (field initializers, method parameters). There is even one situation where it is downright inconvenient if instantiation and initialization cannot be completed in a single expression (using constructor arguments to instantiate a new object and then passing the new object to a super() or this() constructor).

Mutability: Yes, no, and sometimes

Some objects are immutable (meaning that once constructed, their state does not change), whereas others are mutable. For some immutable objects, immutability is guaranteed by the classes' implementation (such as the String class); for others, immutability is simply assumed by convention, specification, or documentation. (The virtues of immutable objects -- simplicity, safety, thread-safety -- have been extolled in this column and elsewhere, so I won't belabor them here.) Whether immutability is enforced or simply a convention for a given object, the behavior is the same -- once the object is initialized, its state is not modified again.

Some entities can be sensibly modeled as either mutable or immutable objects. The String class is immutable, but that was simply one sensible way to implement a string class. (The C++ STL implements a mutable string class, which is another sensible way to implement String.) Other objects can only be mutable -- for example, it would make no sense for a counter to be immutable. Strictly defined, a mutable object is any object whose state can be changed in an observable way after construction. But by defining mutability only in terms of whether state might ever change, we can miss out on some important object lifecycle distinctions.

Mutability lifecycles

Some objects are mutated throughout their lifecycle -- such as counters or other status-holding objects. Others are initialized to a desired state, perhaps through a series of calls to setters or other mutative methods, and then not modified again until they are garbage collected. Strictly speaking, such objects are mutable, because their state was not set entirely in the constructor and can be mutated, but from the perspective of the program that uses them, they might as well be immutable. Take, for instance, a Properties object that is used by an application to hold the contents of a properties file for configuration purposes. Early in the application, the Properties object is instantiated and loaded with values from the file, but thereafter, it is not modified again. The lifecycle of this Properties object has two phases -- a phase where it is being initialized (treated as mutable) and a phase where it is being used (treated as immutable). Typically, a Properties object used in this manner is not published to the rest of the application until the first phase is complete. So, from the perspective of the rest of the application, the Properties object might as well be immutable.

This phase-change behavior is quite common. Some classes, such as SimpleDateFormat, tend to be used in such a two-phase manner almost exclusively -- once the formatting options are set, the formatter may be used many times, but the settings tend not to be changed after the formatter is "fully initialized." Other classes, such as Properties or HashMap, are sometimes used in this two-phase manner, but frequently treated as fully mutable objects as well. No established name exists to refer to objects that have this two-phase lifecycle, so I'm going to make one up: immutable-once-initialized (IOI). IOI objects generally have a lifecycle that goes something like: construct-modify-modify-modify-publish-use-use-use.
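To make the two phases concrete, here is a minimal sketch of the Properties case described above. The class name AppConfig, the keys, and the in-memory StringReader (standing in for a real properties file) are assumptions made for illustration. Note that a helper method is needed: Properties.load() returns void, so the object cannot be constructed and loaded in a single expression.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class AppConfig {
    // Phase 2 (publish-use): once this field is assigned, nothing in the
    // class mutates the Properties object again.
    private static final Properties CONFIG = loadConfig();

    // Phase 1 (construct-modify): the object is built and loaded privately,
    // before any other code can see it.
    private static Properties loadConfig() {
        Properties p = new Properties();
        try {
            // Stand-in for reading a real properties file.
            p.load(new StringReader("host=localhost\nport=8080\n"));
        } catch (IOException e) {
            throw new ExceptionInInitializerError(e);
        }
        return p;
    }

    static String get(String key) {
        return CONFIG.getProperty(key);
    }
}
```

Because the Properties reference never escapes AppConfig, callers see only the "use" phase.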



The self-return idiom

API designers can make life easier for programmers by anticipating when an object might be used in an IOI manner and, in those situations, using the "self-return idiom" to make initialization easier. The self-return idiom involves having mutator methods (such as setXyz() or appendFoo()) return the this reference after performing their action. The StringBuffer class illustrates the idiom -- all of its append() methods return a reference to the StringBuffer itself after updating the state of the internal buffer, as shown in Listing 1:


Listing 1. Self-return idiom in StringBuffer.append()

public StringBuffer append(String str) {
    // append str to the internal buffer
    return this;
}

The benefit of the self-return idiom is that it enables you to chain multiple calls together, rather than writing them each out as separate statements:

stringBuffer.append("a=").append(a)
    .append("; b=").append(b);

This code is more readable and more compact than the alternative, which involves four statements. Using the self-return idiom generally has little negative effect on the API design, as many mutative methods (setters, add() and append()) do not return a value anyway, but it can make life a lot easier for your callers. It's too bad more classes don't follow StringBuffer's lead -- it could make some classes a lot more convenient to use.
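For comparison, the sketch below puts the four-statement form and the chained form side by side. The class and method names (AppendDemo, separate, chained) are made up for this illustration; only StringBuffer.append() and toString() come from the JDK.

```java
class AppendDemo {
    // Without chaining: one statement per append() call.
    static String separate(int a, int b) {
        StringBuffer sb = new StringBuffer();
        sb.append("a=");
        sb.append(a);
        sb.append("; b=");
        sb.append(b);
        return sb.toString();
    }

    // With chaining: each append() returns the same StringBuffer,
    // so the whole thing is a single expression.
    static String chained(int a, int b) {
        return new StringBuffer()
            .append("a=").append(a)
            .append("; b=").append(b)
            .toString();
    }
}
```

Both methods produce the same string; the second can also be used directly as a field initializer or a method argument, which the first cannot.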

Static initialization

How many times have you wanted to statically initialize a Set with several known values, but not intended your program to modify the Set? Using the existing Collections classes, this approach would require a static initializer block, and for the collection to be initialized in a different place than it is constructed. Let's say you wanted to pre-initialize a Set with some regular expression patterns you are going to search for in a document. Using the existing API, you would have to do it like Listing 2:


Listing 2. Statically initializing a Set

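private static Set<Pattern> patternSet = new HashSet<Pattern>();
static {
    // "\\b" is the regex word boundary; a bare "\b" in a Java string is a backspace
    patternSet.add(Pattern.compile("\\b(roast beef)\\b"));
    patternSet.add(Pattern.compile("\\b(on rye)\\b"));
    patternSet.add(Pattern.compile("\\b(with mustard)\\b"));
}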

Granted, writing a static initializer and putting it near the object's declaration is not a terrible hardship, but it is somewhat annoying, and the more that initialization and declaration are separated, the greater the chance that future modifications will subvert an intended invariant. Further, if you want to make patternSet immutable from the perspective of your program (a good practice, because it prevents subtle coding errors), you would have to instantiate a temporary Set in the static initializer block, wrap it with Collections.unmodifiableSet(), and then stuff the wrapped set back into patternSet.
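The temporary-set workaround just described might look something like the following sketch. The class name SandwichPatterns and the matchesAny() helper are illustrative additions, not part of the original listings.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

class SandwichPatterns {
    // Declared here, but initialized in the static block below.
    static final Set<Pattern> PATTERNS;

    static {
        // A temporary mutable set is needed because add() returns boolean,
        // not the set, so the calls cannot be chained.
        Set<Pattern> temp = new HashSet<Pattern>();
        temp.add(Pattern.compile("\\b(roast beef)\\b"));
        temp.add(Pattern.compile("\\b(on rye)\\b"));
        temp.add(Pattern.compile("\\b(with mustard)\\b"));
        // Wrap the set so that later code cannot modify it.
        PATTERNS = Collections.unmodifiableSet(temp);
    }

    // Hypothetical helper: true if any pattern occurs in the text.
    static boolean matchesAny(String text) {
        for (Pattern p : PATTERNS) {
            if (p.matcher(text).find()) {
                return true;
            }
        }
        return false;
    }
}
```

It works, but the declaration, the population, and the "closing" of the set are spread across three places.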

In this case, it would have been nice if the Collections classes used the self-return idiom, because then we could have constructed and initialized the Set all in one place. But we can still build an adapter that does what we want. Listing 3 shows an adapter class that simplifies the process of initializing a Set:


Listing 3. Set adapter class that adds self-returning append() methods.

public class SetAdapter<T> implements Set<T> {
    private final Set<T> s;
    public SetAdapter(Set<T> s) { this.s = s; }

    // Return the adapter itself (not Set<T>) so that append() calls can be chained
    public SetAdapter<T> append(T t) { s.add(t); return this; }
    public Set<T> unmodifiableSet() { return Collections.unmodifiableSet(s); }

    // delegate other Set methods to s
}

Now, using SetAdapter, we can initialize the set of patterns more easily, and without separating the initialization of the set from the initialization of the variable. As an added bonus, we can easily "close" the set by having the last call wrap the set with an unmodifiable wrapper, and still make the patternSet variable final without introducing temporary variables. The only loss of transparency is that we cannot override add() to return the set itself, because Set.add() is already declared to return boolean, so we have to give our mutative methods different names, such as append(). Listing 4 shows patternSet initialized inline with SetAdapter instead of with a static initializer block:


Listing 4. patternSet initialized inline with SetAdapter

private static final Set<Pattern> patternSet
    = new SetAdapter<Pattern>(new HashSet<Pattern>())
          .append(Pattern.compile("\\b(roast beef)\\b"))
          .append(Pattern.compile("\\b(on rye)\\b"))
          .append(Pattern.compile("\\b(with mustard)\\b"))
          .unmodifiableSet();
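Listing 3 leaves the delegation of the remaining Set methods as a comment. One way to get a version that compiles as-is, offered here only as a sketch, is to extend AbstractSet, which derives most of the Set methods from add(), iterator(), and size(). The names ChainableSet and ChainableSetDemo and the sample strings are assumptions for illustration.

```java
import java.util.AbstractSet;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class ChainableSet<T> extends AbstractSet<T> {
    private final Set<T> delegate;

    ChainableSet(Set<T> delegate) { this.delegate = delegate; }

    // Self-returning mutator: adds the element, then returns this adapter.
    ChainableSet<T> append(T t) { delegate.add(t); return this; }

    // "Closes" the set by handing back a read-only view of the delegate.
    Set<T> unmodifiableSet() { return Collections.unmodifiableSet(delegate); }

    // AbstractSet implements the rest of Set in terms of these three.
    @Override public boolean add(T t) { return delegate.add(t); }
    @Override public Iterator<T> iterator() { return delegate.iterator(); }
    @Override public int size() { return delegate.size(); }
}

class ChainableSetDemo {
    // Declared, populated, and closed in a single expression.
    static final Set<String> COLORS =
        new ChainableSet<String>(new HashSet<String>())
            .append("red")
            .append("green")
            .append("blue")
            .unmodifiableSet();
}
```

Inheriting from AbstractSet trades a little performance (contains() walks the iterator unless overridden) for far less boilerplate than delegating every Set method by hand.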

Instantiating DOM documents

If the designers of the DOM API had understood this concept, building representations of XML documents would be a lot easier. (Sure, criticizing the DOM APIs is a bit like shooting fish in a barrel.) Suppose we want to build the following XML document, representing an article and its embedded links:

<article title="Flossing Penguins - A Dentist's Journey to the Pole"
       author="Jeremy Stringfellow, DMD"
       url="http://www.penguinfloss.com/travel/stringfellow.html">
   <link anchor="Glide Floss" url="http://www.crest.com/glide/index.jsp" />
   <link anchor="Antarctica Facts" url="http://www.cia.gov/cia/publications/factbook/geos/ay.html" />
</article>

Constructing this document with DOM would be an exercise in annoyance, involving many temporary variables. We must create a document, an article element, and two link elements, add the attributes to them, and attach the elements to their parents. Unfortunately, each of these operations must be a separate statement, as shown in Listing 5:


Listing 5. Instantiating the DOM Element

Document document = documentFactory.newDocument();
Element articleElement = document.createElement("article");
articleElement.setAttribute("title", article.getTitle());
articleElement.setAttribute("author", article.getAuthor());
articleElement.setAttribute("url", article.getURL());
        
Element linkElement = document.createElement("link");
linkElement.setAttribute("anchor", link.getAnchor());
linkElement.setAttribute("url", link.getURL());
articleElement.appendChild(linkElement);
        
linkElement = document.createElement("link");
linkElement.setAttribute("anchor", anotherLink.getAnchor());
linkElement.setAttribute("url", anotherLink.getURL());
articleElement.appendChild(linkElement);
        
document.appendChild(articleElement);

Now, suppose the DOM classes supported the self-return idiom for setAttribute() and appendChild(). Each element could be created complete in a single expression, and several temporaries could be eliminated. As a bonus, it is even possible to make the structure of the code look like the structure of the resulting document, as shown in Listing 6:


Listing 6. Instantiating the DOM Element with a fictitious, self-returning DOM API

document.appendChild(
    document.createElement("article")
        .setAttribute("title", article.getTitle())
        .setAttribute("author", article.getAuthor())
        .setAttribute("url", article.getURL())
        .appendChild(
            document.createElement("link")
                .setAttribute("anchor", link.getAnchor())
                .setAttribute("url", link.getURL()))
        .appendChild(
            document.createElement("link")
                .setAttribute("anchor", anotherLink.getAnchor())
                .setAttribute("url", anotherLink.getURL())));

While there isn't all that much less code here, were the API to work this way, entire DOM Elements could be instantiated and initialized in a single statement, which would make it slightly easier (and clearer) to create methods that return DOM Elements, or to initialize DOM Elements in variable initializers. And note that without the self-return idiom, it is impossible to pass a complete DOM Element to a super() or this() constructor without writing a helper function, because the DOM API makes it impossible to build an element in a single expression, and the super() or this() call must be the first statement in a constructor.
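The real DOM API does not work this way, but the adapter trick from Listing 3 applies here too. The sketch below wraps org.w3c.dom.Element in a small self-returning builder; ElementBuilder, its attr(), child(), and build() methods, and ElementBuilderDemo are hypothetical names invented for this example, while the calls on Document and Element are the standard JAXP/DOM ones.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class ElementBuilder {
    private final Element element;

    ElementBuilder(Document doc, String tagName) {
        this.element = doc.createElement(tagName);
    }

    // Self-returning wrapper around Element.setAttribute(), which returns void.
    ElementBuilder attr(String name, String value) {
        element.setAttribute(name, value);
        return this;
    }

    // Self-returning wrapper around appendChild(); returns the parent builder.
    ElementBuilder child(ElementBuilder child) {
        element.appendChild(child.element);
        return this;
    }

    Element build() { return element; }
}

class ElementBuilderDemo {
    static Document buildArticle() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        // The shape of the code mirrors the shape of the document.
        doc.appendChild(
            new ElementBuilder(doc, "article")
                .attr("title", "Flossing Penguins")
                .child(new ElementBuilder(doc, "link")
                    .attr("anchor", "Glide Floss")
                    .attr("url", "http://www.crest.com/glide/index.jsp"))
                .build());
        return doc;
    }
}
```

Because the whole element is one expression, it could equally be passed as an argument to a super() or this() call.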



Harnessing laziness

The self-return idiom can sometimes improve the readability of code, and it enables you to completely initialize a logical entity in a single expression, meaning that you can eliminate temporary variables and helper functions when initializing fields or passing arguments to super() and this() constructors. But there is another benefit of the self-return idiom, which comes from co-opting laziness. By reducing the amount of work involved in using a given API, you increase the likelihood that the API will be used properly and effectively. API designers often do not give this aspect sufficient consideration -- that in many situations a developer is faced with a choice of "doing it right" or "doing it well enough." API designers should encourage developers to do things right by making APIs so easy to use that laziness will not discourage developers from using them. (DOM API designers clearly did not understand this lesson.)

When designing an API, you should look for use cases where objects might be created in an IOI manner, and provide appropriate methods for building such objects easily -- preferably offering users the opportunity to build them in a single expression so that they can be used in initializers or superclass constructor arguments. Similarly, think about the expected role of setters. Are they truly mutators for updating the state of mutable objects, or are they likely to be used as part of an extended construction process, as in SimpleDateFormat? If the latter, it costs nothing to have them return the this reference.
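As a rough illustration of setters that take part in an extended construction process, the sketch below uses three made-up classes (ReportFormat, Report, and StrictReport; none of them comes from the JDK or from the listings in this article). Because the setters return this, a fully configured object can be passed straight to super().

```java
class ReportFormat {
    private String dateStyle = "short";
    private boolean lenient = true;

    // Self-returning setters: each updates one field, then returns this.
    ReportFormat setDateStyle(String dateStyle) {
        this.dateStyle = dateStyle;
        return this;
    }

    ReportFormat setLenient(boolean lenient) {
        this.lenient = lenient;
        return this;
    }

    String describe() {
        return dateStyle + "/" + lenient;
    }
}

class Report {
    private final ReportFormat format;

    Report(ReportFormat format) { this.format = format; }

    String formatDescription() { return format.describe(); }
}

class StrictReport extends Report {
    StrictReport() {
        // super() must come first, so its argument has to be built
        // in a single expression; chaining makes that possible.
        super(new ReportFormat().setDateStyle("long").setLenient(false));
    }
}
```

With void setters, StrictReport would need a static helper method just to assemble the ReportFormat before calling super().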

On the subject of laziness, how often do you give your classes a useful toString() implementation (before being forced to for debugging)? Writing a good toString() is certainly not hard, but laziness often prevents these methods from being written, or from being updated when fields are added to a class. Truth be told, they are annoying to write and modify, involving long string concatenations.

Listing 7 shows a simple utility class, which is a crutch for writing toString() implementations. It is a trivial class, built atop StringBuffer, which allows you to build up the toString() value, appending state variables as you go, in a single expression, using the self-return idiom.


Listing 7. ToString class

public class ToString {
    private StringBuffer sb = new StringBuffer();

    public ToString(String title) { sb.append(title).append(" "); }
    public ToString(Object o) {     this(o.getClass().getName()); }

    public ToString add(String name, String value) {
        sb.append(name).append("=\"").append(value).append("\" ");
        return this;
    }

    public ToString add(String name, Object value) {
        return add(name, value == null? "null" : value.toString());
    }

    public ToString add(String name, int value) {
        sb.append(name).append("=").append(value).append(" ");
        return this;
    }
    // name-value versions for other primitive types  

    public ToString addGroup(String name, String value) {
        sb.append(name).append("={").append(value).append("} ");
        return this;
    }

    public ToString add(String name, String[] value) {
        sb.append(name).append("=[");
        for (int i = 0; i < value.length; i++) 
            sb.append("\"").append(value[i]).append("\" ");
        sb.append("] ");
        return this;
    }

    public String toString() {
        return sb.toString();
    }
}

While the resulting toString() code is again not all that different from the by-hand implementation, it is slightly easier to read and edit (especially with IDEs such as Eclipse). The benefit is not that you save a few seconds writing toString() -- it is that laziness is less likely to inhibit the creation of a toString() method at all. Listing 8 shows a typical toString() method using both the by-hand approach and the ToString approach. (The ToString class can also be used independently of the toString() method for producing informative, structured strings to write as log messages.)


Listing 8. toString() using the by-hand and ToString approach

// by hand
public String toString() {
    return "Address " 
        + "streetAddress=" + streetAddress + " "
        + "city=" + city + " " + "state=" + state + " "
        + "zipCode=" + zipCode + " ";
}


// with ToString
public String toString() {
    return new ToString("Address")
        .add("streetAddress", streetAddress)
        .add("city", city).add("state", state)
        .add("zipCode", zipCode)
        .toString();
}



Conclusion

The self-return idiom is not a particularly deep or revolutionary technique, but it can offer an incremental improvement to the usability of an API. When objects are used in an immutable-once-initialized manner, it is extremely convenient to be able to declare and initialize them in a single statement or expression. While it is possible to work around the limitations of a class that does not permit initialization atomicity through initializer blocks and helper functions, the readability of the code can suffer, and often needlessly so. When writing APIs, think about the likely mutability lifecycle of the object, and consider whether the self-return idiom might make life easier for your callers.



About the author

Brian Goetz has been a professional software developer for over 18 years. He is a Principal Consultant at Quiotix, a software development and consulting firm located in Los Altos, California, and he serves on several JCP Expert Groups. See Brian's published and upcoming articles in popular industry publications.



