Level: Introductory
Gary Pollice, Professor of Practice, Worcester Polytechnic Institute
15 Sep 2004 from The Rational Edge: Many projects fail either because managers do not collect metrics consistently to make needed adjustments or because they collect the wrong metrics and get a false sense of project progress. This article suggests ways to choose useful team metrics and employ them to ensure project success.
Imagine that you are in Boston, Massachusetts and need to be in London, England
in two weeks for a very important meeting that could determine the future of
your business. Also imagine that you have at your disposal a sailing yacht and
crew. Now, you really don't know much about the mechanics of sailing,
but the captain and crew come highly recommended. As you haven't taken
a vacation in a while, sailing across the Atlantic Ocean seems like a great
tonic for your weary being. The captain assures you that you will arrive in
England in plenty of time. So off you go.
Once your journey is underway, each day you check its progress with the captain.
He shows you the compass and the map so you can see that you're pointed
in the right direction. You look at the other instruments and see that your
speed is more than enough to get you to your destination on time. However, on
the thirteenth day, you go to the control room, and the captain looks worried.
He informs you that the crew just took a reading with their global positioning
system, and though you will make landfall the next day, it will be in the Canary
Islands! Images of your business going up in flames rush through your brain,
and you cannot see how this could possibly have happened. What went wrong, you wonder,
as you bury your head in your hands.
In reality, of course, whether or not we understand the intricacies of sailing
and navigation, we know that winds and current have an impact on a ship's
progress. You can't just point the ship in the direction you want to go
and expect it to arrive there. Instead, after each short leg of your voyage,
you need to correct your course, as shown in Figure 1. But this graphic doesn't
show the corrections' granularity. In truth you must make them almost
continuously, just as you do when driving a car.
Figure 1: To arrive at a desired destination, a ship's captain must
make continual course corrections.
Ship captains make corrections in response to readings the crew takes with
various instruments. The trick is to know which instruments to read and how
to make effective adjustments in response to the measurements. Is this much
different from trying to steer a software development project toward success?
Not really.
Once I was involved in a major project that had several teams working diligently
on separate subprojects in an effort to meet the delivery date. Each project
team manager gave weekly progress reports, and week after week, a certain team
manager told the rest of us that everything was on schedule. There was nothing
to worry about. However, three weeks before the planned delivery date, he informed
us that his team would need an extra three months to finish its work!
What happened? Did something occur between the previous weekly meeting and this
one that caused a three-month slip? Of course not. As in our story, the "captain"
was looking at the instruments and thinking everything was fine. The problem
was that he was either looking at the wrong instruments, misreading them,
or misunderstanding what they told him. In any case, the results were catastrophic.
How to get the right metrics
As both our fantasy and my personal anecdote show, we can think everything
is going right when, in reality, our world is crumbling. The problems lie not
in the answers we get but in the questions we ask.
In the sailing story, the question you asked the captain was, "What progress
are we making?" But the question should have been more precise: "Can
you show me on the map where we are right now?" That would have forced
the crew to invoke the GPS and get an exact location. And it wouldn't
have taken long to figure out that the ship was more than a bit off course.
In the case of my client's project, instead of asking, "How are
you doing?," we could have asked for demonstrations of current capability.
If several weeks went by without any proof of progress, we would have known
something was wrong, and we could have adjusted either expectations or the schedule.
Of course, this incident happened before most of us thought about iterative,
incremental development.
Team metrics
In last month's column, I discussed how individual
measurements can help software developers. This month I want to examine team
measurements.
As we saw last month, individual metrics and indicators[1]
help individuals estimate the effort required to deliver high-quality software
and to improve the quality of their products. The raw data, or actual measurements,
are typically private property that belongs to a certain individual.
Although team metrics and individual ones are used for similar purposes --
to determine how much work the team is able to do, the quality of their work,
and so on -- team metrics are more public. And people view them from a number
of different perspectives.
I recently participated in a workshop on empirical methods for determining
the effectiveness of agile methods.[2] We identified
two broad categories of metrics: inward and outward. Inward metrics are similar
to individual metrics. We use them to understand and improve a team's
effectiveness. Outward metrics help people outside the team, such as managers
and other stakeholders, determine a project's status or the effectiveness
of an organizational process. Let's look more closely at both categories.
Outward metrics
Outward metrics are those that managers need to perform their business functions.
They should answer questions such as:
- Is the current projected completion date acceptable?
- Will the software have the required features?
- How much will the project cost?
- Will the quality be acceptable?
By answering these questions, you get valuable information on which to base
rational business decisions. You can determine where you are on the industrial
sea, how far off you are from the planned course, and what kind of corrections
you need to make. The correction might even be to cancel the voyage or change
the destination (i.e., cancel the project or modify its scope).
What outward metrics should you collect? In last month's
column, I mentioned Watts Humphrey's belief that if you measure defects,
time, and size, you can derive most or all meaningful metrics. But how best
to measure these parameters in a way that is relevant to your project and organization
is not always obvious.
Measuring defects, for example, is a tricky business. Which of the defects
are really enhancement requests? Which are simple inquiries about expected behavior?
I prefer to make defect measurement an inward metric that the team can use to
determine readiness for delivery. Often, teams count defects to determine whether
to include a feature in an iteration release. Some also use defect density to
determine whether a product is of acceptable quality. I struggle with that term,
acceptable quality. I have always believed that an arbitrary defect measurement
is too constrained to give a good indication of quality. It certainly is one
quality indicator, but there is so much more, and the team should work with
stakeholders to determine quality measures. Also, measuring change in defect
density might not relate to the organization's business goals. If the
primary goal is to get to market quickly in order to beat a competitor, you
would be better off measuring the number of required features that have been
completed and implemented.
Time measurements might seem like the easiest to obtain: Simply determine how
much time the team spends on the project, right? Not if the project manager
wants more granular information. For example, it would be nice to find out how
much time the team spends on non-productive work and then try to reduce that
amount. Of course, trying to do this can lead you down a rather large "rat
hole." Who decides what is productive? And if an activity is non-productive,
why do team members spend time on it? Initially, I would suggest using the gross
time measurement for simplicity's sake.
Size measurements are typically best treated as inward metrics. However, there
is one metric that external stakeholders can use: the number of features
delivered. This measurement is effective and simple to compute. If you're
using an iterative, incremental software development process like IBM® Rational
Unified Process,® or RUP,® it will give you this measure almost "for
free." If you adhere to the spirit of iterative development, you include
only those features that meet the release criteria for each iteration.
In addition to defect, time, and size measurements, I recommend a few more outward
metrics, which I'll describe below, along with ideas on how to get the
raw data.
Completed features. A feature is either ready to ship or it isn't.
Check each feature at the end of each iteration. The number of features implemented
gives you an indicator of the progress to release. You should also compare the
features ready for release with the current prioritized feature list to make
sure the team is working on the most important features.[3]
Planned features versus features delivered. This is a classical planned
versus actual metric. No one can predict productivity (or velocity, as
it is called in Extreme Programming (XP)) precisely, but you want to see a trend
showing that the team's estimates are fairly accurate: the actual should fall
sometimes over and sometimes under the estimate, and never by much. Figures 2A and 2B show
measurements for teams that have about the same arithmetical average difference
between actual and planned features. However, clearly the team in Figure 2B
is better at estimating its capabilities. In both cases, the teams are over
as much as under their estimates, but the average of the absolute differences
between planned and actual features shows that the second team is more accurate.
It might be that the features which the second team is working on are better
defined or more equal in size and complexity. If you find that teams'
estimates are not getting more accurate as time goes by, you may need to look
at how you measure the features. You might need to add a size or complexity
factor to estimate for features that are larger or more complex than others.
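The difference between the two averages discussed above can be sketched in a few lines of Python. This is a minimal illustration, not the calculation behind Figures 2A and 2B: the function name and the sample numbers are hypothetical.

```python
def estimation_accuracy(planned, actual):
    """Return (mean signed difference, mean absolute difference).

    The signed mean shows whether a team is biased over or under its plan;
    the absolute mean shows how far off it typically is.
    """
    if len(planned) != len(actual) or not planned:
        raise ValueError("planned and actual must be non-empty and equal in length")
    diffs = [a - p for p, a in zip(planned, actual)]
    signed = sum(diffs) / len(diffs)
    absolute = sum(abs(d) for d in diffs) / len(diffs)
    return signed, absolute


# Hypothetical feature counts per iteration (not the data behind the figures)
team_a = estimation_accuracy([10, 10, 10, 10], [4, 16, 5, 15])
team_b = estimation_accuracy([10, 10, 10, 10], [9, 11, 9, 11])

print(team_a)  # (0.0, 5.5)
print(team_b)  # (0.0, 1.0)
```

Both hypothetical teams are over as often as under, so the signed average is zero for each. Only the absolute average reveals that the second team estimates more accurately.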

Figure 2A: Poor estimation
Figure 2B: Better estimation
Categorized defects. You need to know when software is ready for public
use. Let's assume that you use a defect tracking system. One thing you
must do is determine how to classify defects and enforce consistent application
of your classification system. I have seen many development teams refuse to
classify serious defects as "show-stoppers" as the release date
nears. Unless you enforce consistent categorization, your team is at risk of
creating a false sense of security.[4]
Defect discovery rate. During development, you will want to examine
the number of defects discovered in the software over time. As you
prepare for a release, this rate should drop. If it doesn't, you may have
quality problems. And if the defect discovery rate approaches zero, you need
to investigate. It could be that your team stopped testing!
However, just because your defect rate isn't approaching zero, don't
assume that you have a quality problem -- at least not a quality problem with
your developers. There are many reasons that the defect rate might spike. You
need to understand the root causes for the data you are trying to analyze. It
may be that your process has other flaws, which are becoming evident in your
defect rate. You may find that you need to look at other metrics, such as the
rate of requirements change, along with defect rate to truly understand the
dynamics of the project.
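As a rough sketch, the three situations described above (a falling rate, a rate that is not falling, and a rate of zero) could be flagged from weekly defect counts. The function name, the three-week window, and the wording of the results are assumptions made for illustration. The result is only a prompt to go and find the root cause, not a verdict on quality.

```python
def discovery_trend(weekly_counts, window=3):
    """Classify the recent trend in defects discovered per week.

    weekly_counts: defects found in each week, oldest first.
    window: number of most recent weeks to examine (an arbitrary choice).
    """
    if len(weekly_counts) < window:
        raise ValueError("not enough weeks of data")
    recent = weekly_counts[-window:]
    if all(count == 0 for count in recent):
        return "zero: check that testing is still happening"
    never_rises = all(later <= earlier for earlier, later in zip(recent, recent[1:]))
    if never_rises and recent[-1] < recent[0]:
        return "falling"
    return "not falling: look for root causes"


print(discovery_trend([12, 9, 7, 4]))  # falling
print(discovery_trend([5, 6, 8]))      # not falling: look for root causes
print(discovery_trend([3, 0, 0, 0]))   # zero: check that testing is still happening
```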
Resource "burn rate." How much is it costing you to create
the software product? Is the investment worthwhile? In an ever-changing business
climate, managers need to check periodically on whether producing the software
is still a viable option. If the project begins to cost too much, it may be
best to cancel it. Undoubtedly, many Massachusetts officials would have been
happy to give up on the "Big Dig" had they known it would end
up costing billions of dollars in overruns.[5] If they had been able to
monitor the burn rate early and often, this might have been possible.
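One simple way to watch burn rate is to project the total cost from spending so far. The sketch below assumes that future spending continues at the average rate observed to date, which is a deliberate simplification. The function name and all figures are hypothetical.

```python
def burn_rate_projection(spent_per_month, months_remaining, budget):
    """Project total cost, assuming spending continues at the average rate so far.

    Returns (average monthly burn rate, projected total cost, projected overrun).
    A negative overrun means the projection is under budget.
    """
    if not spent_per_month:
        raise ValueError("need at least one month of spending data")
    spent_so_far = sum(spent_per_month)
    rate = spent_so_far / len(spent_per_month)
    projected_total = spent_so_far + rate * months_remaining
    return rate, projected_total, projected_total - budget


# Hypothetical figures, in thousands of dollars
rate, total, overrun = burn_rate_projection([100, 120, 140], months_remaining=9, budget=1200)
print(rate, total, overrun)  # 120.0 1440.0 240.0
```

Recomputing this at the end of every iteration lets managers see a projected overrun while there is still time to change scope or cancel.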
There are other outward metrics you might want to use. The main thing is to
use metrics that are meaningful to you. Also, make sure that you measure
in a consistent way and use consistent units of measurement. Finally, be sure
to review your measurement plan regularly to see if it is still measuring relevant
data; if not, then you need to change the metrics to reflect the current state
of your organization. Remember, outward metrics are for people -- usually managers
-- outside the project. Their purpose is to help these people control projects,
not to improve the team's skills and efficiency.
Another important point: Make sure you measure things that the organization
-- or your managers -- actually values. I will
relate two experiences I had with the same organization by way of illustration.
First, at a managers' meeting I attended when I began the project, each
manager was reporting on his or her team's accomplishments during the
prior month. One announced that her group had resolved more than 600 defects
during that month. The other managers gave her a rousing cheer, but I, being
new to the group, raised my hand and asked whether anyone was concerned that
there had been 600 defects to begin with. Everyone stared at me as if I had
two heads. I realized that for those managers, product quality was not the top
concern. Far more important was confirmation that the team was "working
hard."
On another occasion, I recommended canceling a project and reassigning its
resources to another project. Now, I was heavily invested in that first project.
I had conceived it and worked hard to get it approved and funded. But when team
members spent time talking with the customer and then sharing what they learned,
it turned out that the customer was more interested in another project's
work than in ours. When we analyzed the potential profitability of completing
our project versus helping the other project come to market ahead of plan, the
figures clearly indicated that the latter option was best for the company.
Imagine my shock when my manager expressed great displeasure with our decision
and recommendation. In a conversation he asked me why I was doing this to
him. I explained that accelerating the other project's delivery would
be far more profitable than completing ours. His response? "What do you
care about profit?" Obviously, it was not my job to worry about such matters.
Because following my recommendation would take resources from my manager's
budget and put them under another person's control, my manager felt that
would diminish his value. In other words, he measured value not by how much
the group contributed to the company's bottom line, but by the size of
the budget he controlled. Although not everyone in the company measured value
this way, I came across quite a few who did.
The setting for both of these examples was the same organization within a very
large, established company. Some top managers had realized that the current
culture and values were creating problems and wanted to shake things up. In
fact, the cancelled project was part of an experiment: define a product and
get it to customers within a year. However, not everyone was on board for a
change. Although I gained a lot of respect from many of the engineers for having
the courage to cancel that project, my manager's attitude was definitely
attuned to the prevailing culture. I realized that I would have trouble working
in that environment and left to join a more dynamic organization. Predictably,
the company I left is no longer in business; because of that culture, it could
not change with the times.
Inward metrics
The purpose of inward metrics, like that of personal metrics, is to improve
the team. They should be visible at least to the entire team, and perhaps beyond.
A team is more than a collection of individuals. Efficient teams create synergy,
which enables them to accomplish much more than the sum of individual efforts.
But for inefficient teams, even the smallest problem can have dramatic negative
effects. Therefore, it is important to continually monitor the team's
health and take corrective action as soon as problems appear. Inward
metrics are most useful for this purpose. Like personal metrics, they should
not be used to punish, but rather to support the team's well-being.
We can again start with the three primary measurements: time, defects, and
size. Many inward metrics are analogous to personal ones, addressing questions
such as the following:
- How good is our work?
- Are we able to produce quality software consistently?
- How much time do we spend on different types of work?
- Are our process and application of software development techniques effective?
- Do our tools improve productivity and quality?
Let's look at some inward metrics that are simple to gather and easy
to understand.
Velocity. A term used extensively in the Agile community, velocity
relates to how much the team can accomplish within a given time period. The
concept is simple. Take any measure of planned effort or size, based on use
case scenarios, user stories, function points, or any other measure you can
accurately determine. Specify the amount of work the team thinks it can do in
the next iteration. Then, when the iteration is complete, measure the actual
amount of work the team did.
As you measure velocity over several iterations, planned and actual values
should begin to converge. In XP, the team uses work accomplished in the previous
iteration as the predictor for the next iteration. In colloquial terms, this
is predicting tomorrow's weather from today's weather. It's
easy to do, and the results are typically quite good.
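The "today's weather" rule can be sketched as follows. The function names, the unit of work (story points here), and the sample numbers are assumptions made for illustration.

```python
def yesterdays_weather_plans(first_plan, actuals):
    """Plan each iteration to match the work completed in the previous one.

    first_plan: the team's initial guess for the first iteration.
    actuals: work actually completed in each iteration, oldest first.
    Returns the plan that would have been made for each iteration.
    """
    return [first_plan] + list(actuals[:-1])


def planning_gaps(planned, actual):
    """Actual minus planned for each iteration; values near zero mean convergence."""
    return [a - p for p, a in zip(planned, actual)]


# Hypothetical story points completed in four iterations
actuals = [22, 24, 25, 25]
plans = yesterdays_weather_plans(30, actuals)

print(plans)                          # [30, 22, 24, 25]
print(planning_gaps(plans, actuals))  # [-8, 2, 1, 0]
```

In this made-up series the first plan is an optimistic guess. Once each plan is based on the previous iteration's result, the gap shrinks quickly.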
Defect density. Whereas defect density is not a good outward metric,
it is a very good inward one. Defect density is simply the number of defects
per unit of size. If you use lines of code (LOC), for example, you might measure
the number of defects/KLOC (thousand lines of code). You want to see defect
density decline over time, which indicates that your team is doing better quality
work.
You may want to adjust the size measurement you use for defect density as you gain
experience. You may also adopt different measurements, depending upon whether you
are writing additions to legacy code or new code. Measuring defects per change
request might make more sense for legacy code.
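Defect density is a simple ratio, as the sketch below shows. The function name and the sample figures are hypothetical. For legacy code, the same division could use the number of change requests as the size measure instead of lines of code.

```python
def defects_per_kloc(defects, lines_of_code):
    """Defect density expressed as defects per thousand lines of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects / (lines_of_code / 1000)


# Hypothetical (defects, lines of code) at the end of three iterations
history = [(60, 20000), (54, 24000), (45, 30000)]
densities = [defects_per_kloc(d, loc) for d, loc in history]

print(densities)  # [3.0, 2.25, 1.5]
```

A falling series like this one is the trend the team wants to see.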
Test coverage. This is another easy measurement to obtain, assuming that
you have automated tests and use profiling coverage tools. Each team will have
different target coverage figures, depending upon the methodology they use.
Traditionally, 80 percent coverage has been used as a good target figure. Of
course, you have to determine whether you have covered the "right"
80 percent. If you follow test-first programming (see my June
column), in theory you should be able to achieve 100 percent coverage.
Whatever you choose for a target figure, I encourage you to measure test coverage.
It doesn't necessarily give you the number of defects in your product,
but it does indicate whether your team is doing enough testing.
Change frequency. Especially as a product grows ready for release to
a user community, it is important to monitor the frequency of changes to the
code base. You can do this easily with your version control system. The simplest
measure is the number of files changed per day (or some other time period).
If the number of changes is high, you may not be ready to release the product.
Of course, a small number of changes is not necessarily an indicator that the
code is ready; one very serious defect centered in one module might be taking
a long time to fix. It is important to use this metric in combination with others
that provide more detail.
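A minimal sketch of the files-changed-per-day count follows. It assumes you have already exported (date, file path) pairs from your version control system. How you obtain them depends on the tool, and the function name and sample records are hypothetical.

```python
from collections import defaultdict


def files_changed_per_day(changes):
    """Count the distinct files changed on each day.

    changes: iterable of (day, path) pairs, where day is an ISO date string
    such as "2004-09-13". A file changed several times in one day counts once.
    """
    files_by_day = defaultdict(set)
    for day, path in changes:
        files_by_day[day].add(path)
    return {day: len(paths) for day, paths in sorted(files_by_day.items())}


# Hypothetical change records
changes = [
    ("2004-09-13", "src/order.c"),
    ("2004-09-13", "src/cart.c"),
    ("2004-09-13", "src/order.c"),
    ("2004-09-14", "src/order.c"),
]

print(files_changed_per_day(changes))  # {'2004-09-13': 2, '2004-09-14': 1}
```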
Change impact. One way to evaluate a good object-oriented design is
by the cohesiveness and coupling of the classes. You want low coupling and high
cohesion. If a defect repair requires you to modify many classes, that is an
indicator of low cohesion and high coupling. If you use your change management
system to track changes to files that are modified to effect the change, you
can get the data you need to measure design quality.
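As an illustration, change impact could be summarized as the number of distinct files touched by each defect repair. The sketch assumes your change management system can report which files were modified for each defect. The function name, defect identifiers, and file names are hypothetical.

```python
def change_impact(fixes):
    """Measure how widely each defect repair spread through the code.

    fixes: dict mapping a defect identifier to the files modified to repair it.
    Returns (files touched per defect, average files touched per repair).
    """
    if not fixes:
        raise ValueError("need at least one repair")
    sizes = {defect: len(set(files)) for defect, files in fixes.items()}
    average = sum(sizes.values()) / len(sizes)
    return sizes, average


# Hypothetical repair records
fixes = {
    "D-101": ["Order.java"],
    "D-102": ["Order.java", "Cart.java", "Invoice.java", "Customer.java", "Order.java"],
}
sizes, average = change_impact(fixes)

print(sizes)    # {'D-101': 1, 'D-102': 4}
print(average)  # 2.5
```

Repairs that touch many files, like the second one here, are the ones worth examining for coupling problems.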
As with outward metrics, there are many more inward metrics you can use. My
advice remains the same: Use metrics that are relevant to your project and organization,
and measure consistently, using standard measurement units.
Arrive at the right destination
Team metrics can help both teams and organizations improve their performance
and get where they want to go. If you want to start a metrics program, let the
team determine what metrics are important for the team. Let the stakeholders
determine what's important to them. Take time up front to make these determinations
and establish standards. Remember, the greatest dangers lie in using the wrong
metrics or using them punitively. Start with just a few metrics and review them
often to see whether they really deliver the information you need. As with your
development process, don't be afraid to change them; add new ones when
necessary and remove those that are not helpful. In addition, be sure that you
don't just look at the numbers and then do nothing to correct the problems
they indicate. As we saw at the beginning of this article, you must take action
and constantly adjust your course as you go in order to get where you want to
be.
Further reading
There are many books and articles on team metrics. Many listed below are about
controlling the process. I have more success when I look at the metrics they
recommend and apply them not so much to control a process or team behavior but
rather to help a team collaborate better. I let team members decide which ones
make sense for our project.
Lawrence H. Putnam and Ware Myers, Five Core Metrics: The Intelligence
Behind Successful Software Management. Dorset House Publishing, 2003, ISBN
0-932633-55-2.
Watts S. Humphrey, Introduction to the Team Software Process(SM). Addison-Wesley,
2000, ISBN 0-201-47719-X.
William A. Florac and Anita D. Carleton, Measuring the Software Process.
Addison-Wesley, 1999, ISBN 0-201-60444-2.
Kent Beck and Martin Fowler, Planning Extreme Programming. Addison-Wesley,
2001, ISBN 0-201-71091-9.
Conte, Dunsmore, and Shen, Software Engineering Metrics and Models.
Benjamin/Cummings, 1986, ISBN 0-8053-2162-4.
Norman E. Fenton and Shari Lawrence Pfleeger, Software Metrics: A Rigorous
& Practical Approach. PWS Publishing Company, 1997, ISBN 0-534-95425-1.
Notes
1 To understand the differences among measure, metric, and indicator, see last
month's column.
2 XP/Agile Universe 2004, Calgary, Alberta, Canada. August 15-18, 2004.
3 Remember that you are continually changing priorities and requirements as
the project progresses. What's hot in this iteration might not be important next,
but it usually is pretty warm.
4 One extreme example was a company whose policy was not to ship any software
that had a verified "priority 1" defect open against it. There were many such
defects as the delivery date approached, so the organization changed the policy.
It instituted a new "priority 0" category and decided not to ship software with
any priority 0 defects against it. Since no one ever classified a defect as
priority 0, the release shipped -- and the company eventually failed.
5 The Big Dig is a road construction project in Boston. It was originally
projected to cost about $4 billion. Currently, the cost is up to $14 billion,
and the project is not yet completed. Who says other engineering disciplines
have far more sophisticated controls than software engineering?
About the author
Gary Pollice is a professor of practice at Worcester Polytechnic Institute, in Worcester, MA. He teaches software engineering, design, testing, and other computer science courses, and also directs student projects. Before entering the academic world, he spent more than thirty-five years developing various kinds of software, from business applications to compilers and tools. His last industry job was with IBM Rational software, where he was known as "the RUP Curmudgeon" and was also a member of the original Rational Suite team. He is the primary author of Software Development for Small Teams: A RUP-Centric Approach, published by Addison-Wesley in 2004. He holds a B.A. in mathematics and an M.S. in computer science.