 | Level: Intermediate Brian Goetz (brian.goetz@sun.com), Senior Staff Engineer, Sun Microsystems
24 Apr 2007 Everyone has a favorite feature idea or two for adding
to the Java™ language. With the open-sourcing of the Java
platform and the rise in popularity of other languages for
server-side applications (JavaScript and Ruby, to name two), the
debate over the future of the Java language has never been more
vigorous. Should the Java language embrace major new additions,
such as closures? Or is that too much messing with a good thing? In this month's Java theory and practice, Brian Goetz reviews the concepts involved and provides details on the two competing closures proposals.
In a recent installment of the
Crossing borders
series, my friend and
colleague Bruce Tate wrote about the power of closures using Ruby to illustrate. At the recent JavaPolis
conference in Antwerp, one of the most well-attended sessions was Neal
Gafter's "Adding closures to the Java language." And on
the public white boards at JavaPolis, where attendees could write their
thoughts about anything related to (or unrelated to) Java technology,
nearly half the space was dedicated to the closures debate. It seems
like everyone in the Java community is talking about closures these
days -- even though closures were a well-developed concept more than
20 years before the development of the Java language.
In this article, my goal is to provide you with a 10,000-foot view of the
debate over closures in the Java language. This article starts with an overview of closures, details their application, and then summarizes the competing proposals currently on the table.
Closures: A primer
A closure is a block of code that may contain free (unbound)
variables; these variables are not defined in the block of code or in
any global context but instead in the environment in which the block
of code is defined. The name "closure" comes from the combination of
the block of code to execute (which, by virtue of free variables, is
not closed with respect to variable reference) with an evaluation
environment (the scope) that supplies bindings for the free
variables. Varying degrees of closure support can be found in Scheme,
Common Lisp, Smalltalk, Groovy, JavaScript, Ruby, and Python, among
others.
The value of closures is that they can serve as function objects or
anonymous functions, and this has consequences for the type system
in that the type system must be able to represent not only data but
code as well. Most languages with closures support functions as
first-class objects, meaning that functions can be stored in
variables, passed as parameters to other functions, and -- most
importantly -- created dynamically and returned from
functions. Consider this example in Scheme (from SICP 3.3.3), shown in
Listing 1:
Listing 1. Example from the Scheme programming language of a function that takes a function as argument and returns a memoized version of that function
(define (memoize f)
(let ((table (make-table)))
(lambda (x)
(let ((previously-computed-result (lookup x table)))
(if (not (null? previously-computed-result))
previously-computed-result
(let ((result (f x)))
(insert! x result table)
result))))))
|
This code defines a function called memoize
that takes as its argument a function f and returns another
function that computes the same function as f but caches
previously computed results in a table so they can be returned more
efficiently. The returned function is created using the lambda
construct, which dynamically creates a new function object. The
identifiers in italics are the ones that are free in the newly defined
function; their values are bound in the environment that creates the
function. For example, the table variable, used to store cached
values, is created when memoize is invoked, and because it is
referenced by the newly created function, is not garbage collected
until the resulting function is collected. When the resulting function
is called with argument x, it first looks to see if it has
already computed f(x). If it has, it returns the known
f(x); otherwise, it computes f(x) and caches it in
the table for future use before returning it.
Closures provide a compact, natural way to create and manipulate parameterized computations. You can think of closure support as
providing the ability to treat "blocks of code" as first class objects: pass them around, invoke them, and dynamically create new ones. To
support closures fully, a language needs to provide support for
manipulating, invoking, and creating functions at run time and for
functions to capture the environment in which they were created. Many
languages provide a subset of these features, which may enable some,
though not all, of the benefit of closures. In the debate over whether
to add closures to the Java language, the key question is whether the
benefit of the incremental expressiveness is worth the cost of the
incremental complexity.
Anonymous classes and function pointers
The C language provides function pointers, which allow you to pass
functions as arguments to other functions. However, functions in C
cannot have free variables; all variables must be known at compile
time, which reduces the expressiveness of function pointers as an
abstractive mechanism.
The Java language provides inner classes, which can contain references
to fields of their enclosing object. This feature is richer than
function pointers because it allows the inner class instance to
retain a reference to the environment in which it was created. Indeed,
at first glance, inner classes seem to provide most, if not all, of
the value of closures; one could easily construct an interface called
UnaryFunction and create a memoizing wrapper that can memoize any
unary function. But this approach is often unwieldy in practice; it
forces all the code that interacts with your function to be written
with an awareness of this function "framework."
Closures as pattern templates
Anonymous classes let you create objects that capture part of the
environment in which they were defined, but objects and blocks of code
are not the same thing. As an example, consider any repetitive coding
pattern, such as executing a block of code with a Lock held. If we
want to increment a counter with a Lock held, the code looks like
Listing 2 -- frustratingly verbose for such a simple operation:
Listing 2. Canonical idiom for executing a block of code with a lock held
lock.lock();
try {
++counter;
}
finally {
lock.unlock();
}
|
It would be nice to abstract out the lock management
code, which would make the code more compact and less error-prone. As
a first attempt, you might create a withLock() method like Listing 3:
Listing 3. Attempt at abstracting the concept of "do this while holding lock," which suffers from a lack of exception transparency
public static void withLock(Lock lock, Runnable r) {
lock.lock();
try {
r.run();
}
finally {
lock.unlock();
}
}
|
Unfortunately, this approach only takes you part of the way to where you
want to be. One of the goals of creating this abstraction was to make
the code more compact; unfortunately, the syntax of anonymous inner
classes is not very compact, and the calling code will look something
like Listing 4:
Listing 4. Client code for the withLock() method in Listing 3
withLock(lock,
new Runnable() {
public void run() {
++counter;
}
});
|
That's still a lot of code just to increment a counter with a lock
held! Further, the abstraction barrier introduced by turning the
block of code that is guarded by a lock into a method invocation
complicates matters severely -- what happens if the guarded block of
code might throw a checked exception? Now we cannot use Runnable as
our task representation; we must create a new representation that
allows exceptions to be thrown across the method
invocation. Unfortunately, generics don't help us very much here
either; while you can create a method that has a generic type
parameter E identifying a checked exception it might throw, this
approach doesn't generalize well to methods that throw more than one
checked exception type (which is why the call() method in Callable is
declared to throw Exception, rather than a type specified by a type
parameter). The approach in Listing 3 is hampered by a lack of
exception transparency. It also suffers from other forms of
nontransparency; statements like return or break would mean something
different in the context of the Runnable in Listing 4 than they would
in the try block in Listing 2.
Ideally, you'd like to be able to have your guarded increment operation
look something like Listing 5 and have the code in the block mean
the same as it would in the expanded form from Listing 2:
Listing 5. Ideal (but hypothetical) form for client code for Listing 3
withLock(lock,
{ ++counter; });
|
By adding closures to the language, it becomes possible to create
methods that can behave like control flow constructs, such as "execute
this code with a lock held," "operate on this stream and close it when
you're done," or "time how long it takes to execute this block of
code." This strategy has the potential to simplify certain types of
code that repeatedly use particular coding patterns or idioms, such as
the locking idiom in Listing 2. (Another technique that offered
similar expressiveness, to a certain extent, was the C preprocessor;
it is possible to express the withLock() operation as a preprocessor
macro, though macros are harder to compose and less safe than
closures.)
Closures for generic algorithms
Another place where closures offer an opportunity to dramatically
simplify code is in the use of generic algorithms. As multiprocessor
machines become cheaper, it becomes more important to exploit
fine-grained parallelism. Specifying computations using generic
algorithms provides a natural opportunity for library implementations
to exploit parallelism in the problem space.
For example, consider calculating the sum of the squares of a large
collection of numbers. Listing 6 shows one way to perform this
calculation, but this approach calculates the results sequentially and
might not be the most efficient way to calculate the desired result on
large-scale multiprocessor systems:
Listing 6. Calculating the sum of squares sequentially
double sum;
for (Double d : myBigCollection)
sum += d*d;
|
Each loop iteration has two operations: squaring a value and adding
that value to a running total. Each of the squaring operations is
independent of each other and could be executed in parallel, and
instead of executing N addition operations, if the computation is
organized properly, the sum can be calculated in log(N) operations.
The operation in Listing 6 is an example of the map-reduce algorithm,
where a function is applied to each of a large number of data
elements and the result from each application is combined with some
sort of combining function. If you imagine you have a map-reduce
implementation that takes a data set, a unary function that is
applied to each element, and a binary function for combining the
results, you could express the sum-of-squares calculation as in Listing 7:
Listing 7. Calculating the sum of squares with MapReduce, allowing for the possibility of parallel execution
Double sumOfSquares = mapReduce(myBigCollection,
new UnaryFunction<Double> {
public Double apply(Double x) {
return x * x;
}
},
new BinaryFunction<Double, Double> {
public Double apply(Double x, Double y) {
return x + y;
}
});
|
The mapReduce() implementation assumed in Listing 7 knows which
operations can be evaluated in parallel and could therefore
parallelize both the function application and combining steps,
resulting in improved throughput on a parallel system. But the code in
Listing 7 is really ugly; it takes many more lines of code to
express the generic algorithm equivalent of the three lines in Listing
6.
Closures give you a way to make the code in Listing 7 more
manageable. For example, the closure syntax in Listing 8 does not
correspond to any of the current closure proposals for the Java
language, but instead it is intended to convey the spirit of how closures
support generic algorithms:
Listing 8. Calculating the sum of squares using MapReduce and a hypothetical closure syntax
sumOfSquares = mapReduce(myBigCollection,
function(x) {x * x},
function(x, y) {x + y});
|
The closure-based version in Listing 8 has the best of both worlds: The code is easy to read and write, it is specified at a higher level
of abstraction than the sequential loop, and it can be efficiently
parallelized by the library.
Closure proposals
At least two proposals are on the table for adding closures to
the Java language. The first, dubbed "BGGA" (for its authors, Gilad
Bracha, Neal Gafter, James Gosling, and Peter von der Ahe), extends
the type system to incorporate function types. The second, dubbed
"CICE" (for Concise Inner Class Expressions) and supported by Joshua
Bloch, Doug Lea, and "Crazy Bob" Lee, has a more modest goal:
simplify the creation of anonymous inner class instances. It is likely
that a JSR will soon be proposed to consider the form and degree of
closure support to be suggested for a future version of the Java
language.
The BGGA proposal
The BGGA proposal creates a concept of function types, where a
function has a typed argument list, a return type, and a throws
clause. In the BGGA proposal, the sum-of-squares code would look like
the code in Listing 9:
Listing 9. Calculating the sum of squares using the BGGA closure syntax
sumOfSquares = mapReduce(myBigCollection,
{ Double x => x * x },
{ Double x, Double y => x + y });
|
The code inside the braces to the left of the => symbol identifies the
names and types of the arguments; the code to the right represents the
implementation of the anonymous function being defined. This code can
refer to local variables defined within the block, arguments to the
closure, or variables from the scope in which the closure is created.
In the BGGA proposal, you can declare variables, method arguments, and
method return values that are function types. You can supply a closure in any context where an instance of a single abstract method
class (like Runnable or Callable) is expected; for anonymously typed
closures, an invoke() method is provided so you can call them with a
specified argument list.
One of the primary goals of the BGGA proposal is to allow programmers
to create methods that act like control structures. Accordingly, BGGA
also proposes some syntactic sugar to allow you to call methods that
accept closures as if they were new keywords so that you can create
methods like withLock() or forEach() and invoke them as if they were
control primitives. Listing 10 shows how the withLock() method would
be defined under the BGGA proposal; Listing 11 and Listing 12
show how it would be invoked, using both the standard form and the "control construct" form:
Listing 10. Coding the withLock() method under the BGGA closures proposal
public static <T,throws E extends Exception>
T withLock(Lock lock, {=>T throws E} block) throws E {
lock.lock();
try {
return block.invoke();
} finally {
lock.unlock();
}
}
|
The withLock() method in Listing 10 accepts a lock and a closure. The
return type and throws clause of the closure are generic arguments;
type inference in the compiler generally allows it to be invoked
without specifying the value of T and E, as in Listing 11 and Listing 12:
Listing 11. Invoking withLock()
withLock(lock, {=>
System.out.println("hello");
});
|
Listing 12. Invoking withLock() using the control construct shorthand
withLock(lock) {
System.out.println("hello");
}
|
Like generics, much of the complexity of closures under the BGGA
proposal is borne by library creators; using library methods that
accept closures is much simpler.
The BGGA proposal also works to repair a number of transparency
failures that are present when trying to use inner class instances to
gain the benefits of closures. For example, the semantics of return,
break, and this are different in a block of code than they are in a
Runnable (or other inner class instance) that represents the same
block of code. These elements of nontransparency can cause confusion
when migrating code to take advantage of generic algorithms.
The CICE proposal
The CICE proposal is a simpler proposal that addresses the problem
that instantiating inner class instances is too cumbersome. Rather
than create a notion of function types, it simply creates a more
compact syntax for instantiating instances of inner class with a
single abstract method (such as Runnable, Callable, or Comparator).
Listing 13 shows what the sum of squares code would look like under
CICE. It makes explicit the UnaryFunction and
BinaryFunction types used by
mapReduce(). The arguments to mapReduce()
are anonymous classes derived from UnaryFunction and
BinaryFunction; the syntax simply elides much of the
redundancy associated with creating an anonymous instance.
Listing 13. Sum of squares code under the CICE closures proposal
Double sumOfSquares = mapReduce(myBigCollection,
UnaryFunction<Double>(Double x) { return x*x; },
BinaryFunction<Double, Double>(Double x, Double y) { return x+y; });
|
Because the objects representing the functions passed to
mapReduce() are ordinary anonymous class instances, their
bodies can refer to variables defined in the enclosing scope; the
only difference between the approaches in Listing 13 and Listing 7 is
the verbosity of the syntax.
Closing
The BGGA proposal adds a powerful new facility to the language. As
such, it adds measurable complexity to the syntax and semantics of the
language. The CICE proposal, on the other hand, is more modest: it
takes a facility already present in the language and makes it easier
to use but does not offer any significant new functionality. Closures
are indeed a powerful mechanism for abstraction; once they get used to
them, most people don't want to give them up. (Ask friends who are
experienced Scheme, Smalltalk, or Ruby programmers how they feel about
closures, and they'll probably respond by asking you how you feel about
breathing.) But languages are organic entities, and adding a new feature
to a language that was not anticipated in the original design is
fraught with compromise and creates complexity. The issue being
debated is not whether closures are a good idea -- because they
clearly are -- but whether the benefits of retrofitting the Java
language with closures are worth the costs.
Resources Learn
Discuss
About the author  | |  | Brian Goetz has been a professional software developer for 20 years. He is a senior staff engineer at Sun Microsystems, and he serves on several JCP Expert Groups. Brian's book,
Java Concurrency In Practice
, was published in May 2006 by Addison-Wesley. See Brian's published and upcoming articles in popular industry publications. |
Rate this page
|  |