Level: Intermediate
Brian Goetz (brian@quiotix.com), Principal Consultant, Quiotix
27 Apr 2005
Decisions made during API design can have an effect on the API's usability. In designing an API, you need to put yourself in your user's shoes, imagine how the API might be used, and try to make the common use cases convenient for the user. This month, columnist Brian Goetz discusses an API design technique, the self-return idiom, that can make life easier for users of your API in certain circumstances.
In 1985 I had a summer job developing mainframe-based business
applications in APL. For those who don't remember APL, it was a very
terse language, requiring a special keyboard with all sorts of weird
symbols, but it offered some extremely powerful mechanisms for
manipulating data stored in arrays. As an example, Conway's "Life"
cellular automata game can be implemented in 30-40 characters of APL
code, and a program to find all prime numbers below a certain number
could be written in 20 characters. (To the uninitiated, such an
APL program pretty much looks like line noise.) APL jocks liked to
joke that any program could be written in one line in APL. (Reading
such a program, on the other hand, was not as easy.)
Macho competition and obfuscated programming contests aside, in what
situations is it valuable, from a software engineering perspective, to
be able to perform an arbitrary sequence of operations on an object in
a single expression? In the Java™ language, several cases exist
where being able to instantiate and initialize a complex object in a
single expression can improve the code's readability (field
initializers, method parameters). There is even one situation where it is
downright inconvenient if instantiation and initialization
cannot be completed in a single expression (using constructor
arguments to instantiate a new object and then passing the new object
to a super() or this() constructor).
Mutability: Yes, no, and sometimes
Some objects are immutable (meaning that once constructed, their state
does not change), whereas others are mutable. For some immutable
objects, immutability is guaranteed by the classes' implementation
(such as the String class); for others, immutability is
simply assumed by convention, specification, or documentation. (The
virtues of immutable objects -- simplicity, safety, thread-safety --
have been extolled in this column and elsewhere, so I won't belabor
them here.) Whether immutability is enforced or simply a convention
for a given object, the behavior is the same -- once the object is
initialized, its state is not modified again.
Some entities can be sensibly modeled as either mutable or immutable
objects. The String class is immutable, but that was simply one
sensible way to implement a string class. (The C++ STL implements a
mutable string class, which is another sensible way to implement
String.) Other objects can only be mutable -- for example, it would
make no sense for a counter to be immutable. Strictly defined, a
mutable object is any object whose state can be changed in an
observable way after construction. But by defining mutability only in
terms of whether state might ever change, we can miss out on some
important object lifecycle distinctions.
Mutability lifecycles
Some objects are mutated throughout their lifecycle -- such as
counters or other status-holding objects. Others are initialized to a
desired state, perhaps through a series of calls to setters or other
mutative methods, and then not modified again until they are garbage
collected. Strictly speaking, such objects are mutable, because their
state was not set entirely in the constructor and can be mutated, but
from the perspective of the program that uses them, they might as well
be immutable. Take, for instance, a Properties object
that is used by an application to hold the contents of a properties
file for configuration purposes. Early in the application, the
Properties object is instantiated and loaded with values
from the file, but thereafter, it is not modified again. The lifecycle
of this Properties object has two phases -- a phase where
it is being initialized (treated as mutable) and a phase where it is
being used (treated as immutable). Typically, a
Properties object used in this manner is not published to
the rest of the application until the first phase is complete. So,
from the perspective of the rest of the application, the
Properties object might as well be immutable.
This phase-change behavior is quite common. Some classes, such as
SimpleDateFormat, tend to be used in such a two-phase
manner almost exclusively -- once the formatting options are set, the
formatter may be used many times, but the settings tend not to be
changed after the formatter is "fully initialized." Other classes,
such as Properties or HashMap, are sometimes
used in this two-phase manner, but frequently treated as fully mutable
objects as well. No established name exists to refer to objects that
have this two-phase lifecycle, so I'm going to make one up:
immutable-once-initialized (IOI). IOI objects generally have a
lifecycle that goes something like:
construct-modify-modify-modify-publish-use-use-use.
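Here is a minimal sketch of that lifecycle, under some assumptions: the AppConfig class, the property keys, and the idea of passing the properties text in as a String (rather than reading a real file) are all invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical configuration holder that uses Properties in an IOI manner.
class AppConfig {
    // Assigned once, after initialization is complete; never modified afterward.
    private final Properties props;

    AppConfig(String propertiesText) {
        Properties p = new Properties();              // construct
        try {
            p.load(new StringReader(propertiesText)); // modify: load the values
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        p.setProperty("loaded", "true");              // modify
        this.props = p;                               // publish
    }

    // Use phase: the rest of the application only reads.
    String get(String key) {
        return props.getProperty(key);
    }
}
```

Because the Properties reference never escapes AppConfig, nothing outside the constructor can observe it in its mutable phase.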
The self-return idiom
API designers can make life easier for programmers by anticipating
when an object might be used in an IOI manner, and in those
situations, using the "self-return idiom" to facilitate easier
initialization. The self-return idiom involves having mutator
methods (such as setXyz() and appendFoo()) return the
this reference after performing their action. The
StringBuffer class illustrates the self-return idiom --
all the append() methods return a reference to the
StringBuffer itself after updating the state of the
internal buffer, as shown in Listing 1:
Listing 1. Self-return idiom in StringBuffer.append()
public StringBuffer append(String str) {
    // append str to the internal buffer
    return this;
}
The benefit of the self-return idiom is that it enables you to chain
multiple calls together, rather than writing them each out as separate
statements:
stringBuffer.append("a=").append(a)
    .append("; b=").append(b);
This code is more readable and more compact than the alternative,
which involves four statements. Using the self-return idiom generally
has little negative effect on the API design, as many mutative methods
(setters, add() and append()) do not return a
value anyway, but it can make life a lot easier for your callers. It's
too bad more classes don't follow StringBuffer's lead --
it could make some classes a lot more convenient to use.
Static initialization
How many times have you wanted to statically initialize a Set with
several known values, with no intention of modifying the Set afterward?
With the existing Collections classes, this requires a
static initializer block, so the collection is initialized in
a different place than it is constructed. Let's say you wanted to
pre-initialize a Set with some regular expression patterns you are
going to search for in a document. Using the existing API, you would
have to do it like Listing 2:
Listing 2. Statically initializing a Set
private static Set<Pattern> patternSet = new HashSet<Pattern>();
static {
    patternSet.add(Pattern.compile("\\b(roast beef)\\b"));
    patternSet.add(Pattern.compile("\\b(on rye)\\b"));
    patternSet.add(Pattern.compile("\\b(with mustard)\\b"));
}
Granted, writing a static initializer and putting it near the object's
declaration is not a terrible hardship, but it is somewhat annoying,
and the more that initialization and declaration are separated, the
greater the chance that future modifications will subvert an intended
invariant. Further, if you want to make patternSet
immutable from the perspective of your program (a good practice,
because it prevents subtle coding errors), you would have to
instantiate a temporary Set in the static initializer
block, wrap it with Collections.unmodifiableSet(), and
then stuff the wrapped set back into patternSet.
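The workaround just described might look something like the following sketch. The enclosing class name SandwichPatterns is an assumption made for this example; the rest uses only the standard Collections and regex classes.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical enclosing class for the pattern set.
class SandwichPatterns {
    static final Set<Pattern> patternSet;

    static {
        // Temporary mutable set, visible only inside the initializer block.
        Set<Pattern> temp = new HashSet<Pattern>();
        temp.add(Pattern.compile("\\b(roast beef)\\b"));
        temp.add(Pattern.compile("\\b(on rye)\\b"));
        temp.add(Pattern.compile("\\b(with mustard)\\b"));
        // Only the unmodifiable wrapper is stored in the field.
        patternSet = Collections.unmodifiableSet(temp);
    }
}
```

It works, but the declaration, the population, and the wrapping now live in three separate places.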
In this case, it would have been nice if the Collections classes used
the self-return idiom, because then we could have constructed and
initialized the Set all in one place. But we can still
build an adapter that does what we want. Listing 3 shows an adapter
class that simplifies the process of initializing a Set:
Listing 3. Set adapter class that adds self-returning append() methods.
public class SetAdapter<T> implements Set<T> {
    private final Set<T> s;
    public SetAdapter(Set<T> s) { this.s = s; }
    // returns the adapter type, not Set<T>, so that append() calls can be chained
    public SetAdapter<T> append(T t) { s.add(t); return this; }
    public Set<T> unmodifiableSet() { return Collections.unmodifiableSet(s); }
    // delegate other Set methods to s
}
Now, using SetAdapter, we can initialize the set of
patterns more easily, and without separating the initialization of the
set from the initialization of the variable. As an added bonus, we can
easily "close" the set by having the last call wrap the set with an
unmodifiable wrapper, and still make the patternSet
variable final without introducing temporary variables. The only loss
of transparency is that we cannot override the add()
method to return the adapter (Set.add() already returns a boolean), so we have to give our mutative methods
different names, such as append(). Listing 4 shows
patternSet initialized inline with
SetAdapter instead of with a static initializer block:
Listing 4. patternSet initialized inline with SetAdapter
private final static Set<Pattern> patternSet
    = new SetAdapter<Pattern>(new HashSet<Pattern>())
        .append(Pattern.compile("\\b(roast beef)\\b"))
        .append(Pattern.compile("\\b(on rye)\\b"))
        .append(Pattern.compile("\\b(with mustard)\\b"))
        .unmodifiableSet();
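For readers who want something they can compile as-is, here is one possible way to fill in the delegation. This is a sketch rather than the only way to do it: the names ChainableSet and ChainableSetDemo are invented here, and extending AbstractSet (so that only add(), iterator(), and size() need to be written) is my shortcut, not a requirement of the idiom.

```java
import java.util.AbstractSet;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical variant of the adapter; AbstractSet supplies most Set methods.
class ChainableSet<T> extends AbstractSet<T> {
    private final Set<T> s;

    ChainableSet(Set<T> s) { this.s = s; }

    // Self-returning mutator: returns the adapter so calls can be chained.
    ChainableSet<T> append(T t) { s.add(t); return this; }

    // "Closes" the set by handing out only an unmodifiable view.
    Set<T> unmodifiableSet() { return Collections.unmodifiableSet(s); }

    @Override public boolean add(T t) { return s.add(t); }
    @Override public Iterator<T> iterator() { return s.iterator(); }
    @Override public int size() { return s.size(); }
}

class ChainableSetDemo {
    // Constructed, populated, and closed in a single field initializer.
    static final Set<String> TOPPINGS =
        new ChainableSet<String>(new HashSet<String>())
            .append("mustard")
            .append("pickles")
            .unmodifiableSet();
}
```

Note that append() must be declared to return the adapter type; if it returned plain Set<T>, the second append() in the chain would not compile.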
Instantiating DOM documents
If the designers of the DOM API had understood this concept, building
representations of XML documents would be a lot easier. (Sure,
criticizing the DOM APIs is a bit like shooting fish in a barrel.)
Suppose we want to build the following XML document, representing an
article and its embedded links:
<article title="Flossing Penguins - A Dentist's Journey to the Pole"
         author="Jeremy Stringfellow, DMD"
         url="http://www.penguinfloss.com/travel/stringfellow.html">
    <link anchor="Glide Floss" url="http://www.crest.com/glide/index.jsp" />
    <link anchor="Antarctica Facts" url="http://www.cia.gov/cia/publications/factbook/geos/ay.html" />
</article>
Constructing this document with DOM would be an exercise in annoyance,
involving many temporary variables. We must create a document, an
article element, and two link elements, add the attributes to them,
and attach the elements to their parents. Unfortunately, each of these
operations must be a separate statement, as shown in Listing 5:
Listing 5. Instantiating the DOM Element
Document document = documentFactory.newDocument();
Element articleElement = document.createElement("article");
articleElement.setAttribute("title", article.getTitle());
articleElement.setAttribute("author", article.getAuthor());
articleElement.setAttribute("url", article.getURL());
Element linkElement = document.createElement("link");
linkElement.setAttribute("anchor", link.getAnchor());
linkElement.setAttribute("url", link.getURL());
articleElement.appendChild(linkElement);
linkElement = document.createElement("link");
linkElement.setAttribute("anchor", anotherLink.getAnchor());
linkElement.setAttribute("url", anotherLink.getURL());
articleElement.appendChild(linkElement);
document.appendChild(articleElement);
Now, suppose the DOM classes supported the self-return idiom for
setAttribute() and appendChild(). Each
element could be created complete in a single expression, and several
temporaries could be eliminated. As a bonus, it is even possible to
make the structure of the code look like the structure of the
resulting document, as shown in Listing 6:
Listing 6. Instantiating the DOM Element with a fictitious, self-returning DOM API
document.appendChild(
    document.createElement("article")
        .setAttribute("title", article.getTitle())
        .setAttribute("author", article.getAuthor())
        .setAttribute("url", article.getURL())
        .appendChild(
            document.createElement("link")
                .setAttribute("anchor", link.getAnchor())
                .setAttribute("url", link.getURL()))
        .appendChild(
            document.createElement("link")
                .setAttribute("anchor", anotherLink.getAnchor())
                .setAttribute("url", anotherLink.getURL())));
While there isn't all that much less code here, were the API to work
this way, entire DOM Elements could be instantiated and initialized in
a single statement, which would make it slightly easier (and clearer)
to create methods that return DOM Elements, or to initialize DOM
Elements in variable initializers. And note that without the
self-return idiom, it is impossible to pass a complete DOM Element to
a super() or this() constructor without
writing a helper function, because the DOM API makes it impossible to
build an element in a single statement and the super() or
this() constructor must be the first statement in a
constructor.
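Since the real DOM interfaces cannot be changed, one option is the same adapter trick used for Set. The sketch below is only an illustration: ElementBuilder, its attr(), child(), and build() methods, and the ArticleXml class are hypothetical names of my own, layered on the standard org.w3c.dom and javax.xml.parsers classes.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical wrapper that adds self-returning methods around a DOM Element.
class ElementBuilder {
    private final Element element;

    ElementBuilder(Document document, String tagName) {
        this.element = document.createElement(tagName);
    }

    ElementBuilder attr(String name, String value) {
        element.setAttribute(name, value);
        return this;
    }

    ElementBuilder child(ElementBuilder child) {
        element.appendChild(child.element);
        return this;
    }

    // Ends the chain and hands back the plain DOM Element.
    Element build() { return element; }

    // Convenience for the example: creates an empty Document.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}

class ArticleXml {
    // The whole element tree is built in one expression, with no temporaries.
    static Element articleElement(Document doc, String title, String linkUrl) {
        return new ElementBuilder(doc, "article")
            .attr("title", title)
            .child(new ElementBuilder(doc, "link").attr("url", linkUrl))
            .build();
    }
}
```

Because the chain is a single expression, its result could also be passed straight to a super() or this() constructor.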
Harnessing laziness
The self-return idiom can sometimes improve the readability of code,
and enables you to completely initialize a logical entity in a single
expression, meaning that you can eliminate temporary variables and
helper functions when initializing fields or passing arguments to
super() and this() constructors. But there
is another benefit of the self-return idiom, which comes from co-opting
laziness. By reducing the amount of work involved in using a given
API, you increase the likelihood that the API will be used properly
and effectively. API designers often do not give this aspect
sufficient consideration -- that in many situations a developer is
faced with a choice of "doing it right" or "doing it well enough." API
designers should encourage developers to do things right by making
APIs so easy to use that laziness will not discourage developers from
using them. (DOM API designers clearly did not understand this
lesson.) When designing an API, you should look for use cases where
objects might be created in an IOI manner, and provide appropriate
methods for building such objects easily -- preferably offering users
the opportunity to build them in a single expression so that they can
be used in initializers or superclass constructor
arguments. Similarly, think about the expected role of setters. Are
they truly accessors for updating the state of mutable objects, or are
they likely to be used as part of an extended construction process, as
in SimpleDateFormat? If the latter, it costs nothing to
have them return the this reference.
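As a small illustration of setters used as part of an extended construction process, consider the following sketch. Everything in it (ReportFormat, Report, MonthlyReport, and their methods) is hypothetical and invented for this example; it is not taken from any real library.

```java
// Hypothetical formatter-like class whose setters return this.
class ReportFormat {
    private String datePattern = "yyyy-MM-dd";
    private boolean showTotals = false;

    ReportFormat setDatePattern(String datePattern) {
        this.datePattern = datePattern;
        return this;
    }

    ReportFormat setShowTotals(boolean showTotals) {
        this.showTotals = showTotals;
        return this;
    }

    String getDatePattern() { return datePattern; }
    boolean isShowTotals() { return showTotals; }
}

class Report {
    private final ReportFormat format;
    Report(ReportFormat format) { this.format = format; }
    ReportFormat getFormat() { return format; }
}

class MonthlyReport extends Report {
    // Field initializer: the format is fully configured in one expression.
    static final ReportFormat DEFAULT_FORMAT =
        new ReportFormat().setDatePattern("MMM yyyy").setShowTotals(true);

    MonthlyReport() {
        // super() must be the first statement, so its argument has to be
        // built in a single expression; self-returning setters allow that.
        super(new ReportFormat().setDatePattern("MMM yyyy").setShowTotals(true));
    }
}
```

With void setters, both the field and the super() call would need a helper method or an initializer block.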
On the subject of laziness, how often do you give your classes a
useful toString() implementation (before being forced to
for debugging)? Writing a good toString() is certainly
not hard, but laziness often prevents these methods from being
written, or from being updated when fields are added to a class. Truth
be told, they are annoying to write and modify, involving long string
concatenations.
Listing 7 shows a simple utility class, which is a crutch for writing
toString() implementations. It is a trivial class, built atop
StringBuffer, which allows you to build up the toString() value,
appending state variables as you go, in a single expression, using the
self-return idiom.
Listing 7. ToString class
public class ToString {
    private StringBuffer sb = new StringBuffer();

    public ToString(String title) { sb.append(title).append(" "); }
    public ToString(Object o) { this(o.getClass().getName()); }

    public ToString add(String name, String value) {
        sb.append(name).append("=\"").append(value).append("\" ");
        return this;
    }

    public ToString add(String name, Object value) {
        return add(name, value == null ? "null" : value.toString());
    }

    public ToString add(String name, int value) {
        sb.append(name).append("=").append(value).append(" ");
        return this;
    }

    // name-value versions for other primitive types

    public ToString addGroup(String name, String value) {
        sb.append(name).append("={").append(value).append("} ");
        return this;
    }

    public ToString add(String name, String[] value) {
        sb.append(name).append("=[");
        for (int i = 0; i < value.length; i++)
            sb.append("\"").append(value[i]).append("\" ");
        sb.append("] ");
        return this;
    }

    public String toString() {
        return sb.toString();
    }
}
While the resulting toString() code is again not all that
different from the by-hand implementation, it is slightly easier to
read and edit (especially with IDEs such as Eclipse). The
benefit is not that you save a few seconds writing
toString() -- it is that laziness is less likely to
inhibit the creation of a toString() method at
all. Listing 8 shows a typical toString() method using
both the by-hand approach and the ToString approach. (The
ToString class can also be used independently of the
toString() method for producing informative, structured
strings to write as log messages.)
Listing 8. toString() using the by-hand and ToString approach
// by hand
public String toString() {
    return "Address "
        + "streetAddress=" + streetAddress + " "
        + "city=" + city + " " + "state=" + state + " "
        + "zipCode=" + zipCode + " ";
}

// with ToString
public String toString() {
    return new ToString("Address")
        .add("streetAddress", streetAddress)
        .add("city", city).add("state", state)
        .add("zipCode", zipCode)
        .toString();
}
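To illustrate the log-message use mentioned above, here is a condensed sketch. LogLine is a hypothetical, cut-down copy of the ToString helper with just two add() overloads, and LogLineDemo with its orderShipped() method is likewise invented for this example.

```java
// Condensed, hypothetical copy of the ToString helper for structured log lines.
class LogLine {
    private final StringBuffer sb = new StringBuffer();

    LogLine(String title) { sb.append(title).append(" "); }

    // Object values are quoted; a null value is rendered as "null".
    LogLine add(String name, Object value) {
        sb.append(name).append("=\"").append(value).append("\" ");
        return this;
    }

    // int values are written without quotes.
    LogLine add(String name, int value) {
        sb.append(name).append("=").append(value).append(" ");
        return this;
    }

    @Override public String toString() { return sb.toString(); }
}

class LogLineDemo {
    // Builds the whole message in one expression, ready to hand to a logger.
    static String orderShipped(String orderId, int itemCount) {
        return new LogLine("OrderShipped")
            .add("orderId", orderId)
            .add("items", itemCount)
            .toString();
    }
}
```

Calling LogLineDemo.orderShipped("A-17", 3) yields the text OrderShipped orderId="A-17" items=3 followed by a trailing space.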
Conclusion
The self-return idiom is not a particularly deep or revolutionary
technique, but it can offer an incremental improvement to the
usability of an API. When objects are used in an
immutable-once-initialized manner, it is extremely convenient to be
able to declare and initialize them in a single statement or
expression. While initializer blocks and helper functions can work
around the limitations of a class that does not permit such
single-expression initialization, the readability of the code
can suffer, and often needlessly so. When writing APIs, think about
the likely mutability lifecycle of the object, and consider whether
the self-return idiom might make life easier for your callers.
About the author
Brian Goetz has been a professional software developer for over 18 years. He is a Principal Consultant at Quiotix, a software development and consulting firm located in Los Altos, California, and he serves on several JCP Expert Groups. See Brian's published and upcoming articles in popular industry publications.