Level: Introductory
Scott W. Ambler, Practice Leader, Agile Development, Rational Methods Group, IBM
15 Nov 2006 from The Rational Edge: This two-part series shows how to take an IBM Rational Unified Process (RUP)-based approach to data warehouse (DW) projects that reduces both your business and technical risk while delivering a high-quality solution that meets the changing needs of its end users. This article reviews the problems associated with a traditional, serial approach to DW development, describes why the evolutionary approach of RUP is much better suited, and walks through the initial phase of such a project.
Gartner Group has pegged the success rate of data warehouse (DW) projects at roughly 50 percent, although depending on how you define success, other people will quote you success rates of between 20 percent on the low end and 65 percent on the high end. Regardless, we seem to have some room for improvement. My experience is that the traditional, serial approach to DW development, where weeks or months of modeling are done before development begins, is insufficient to meet the demands of the dynamic business environment in which we find ourselves these days. Changing stakeholder requirements demand a flexible, evolutionary, and highly collaborative approach to DW development. The Rational Unified Process®, or RUP®, defines such an approach.
The need to take an evolutionary approach to DW development is nothing new. Bill Inmon, the acknowledged father of data warehousing, has been very clear about the need to do so virtually from the very beginning. Unfortunately, many data warehousing efforts have been hobbled by the plodding, serial processes of yesteryear -- processes that undoubtedly are a major contributor to the high failure rate of DW projects. Other contributing factors include too great a focus on the data itself, instead of the potential business value that a DW provides to end users, and the rather naïve assumption that you need to identify the "one truth" for your major data entities. Luckily, these are all self-inflicted problems that we can choose to address.
Before I begin, I want to make it perfectly clear that I have intertwined several agile concepts and techniques into my approach to DW projects following RUP. Many of my suggestions, such as doing just enough modeling and documentation for the situation at hand, will likely be seen as familiar ideas that can easily be applied to a DW project. Other ideas may seem heretical within traditional data management circles, e.g., that striving to identify "the one truth" often proves to be a poor investment of your time (don't worry, I'll explain why this is true later in the article). My experience is that if you want to truly achieve a high-quality DW that is responsive to the changing needs of its stakeholders, then you need to move away from traditional techniques and adopt an agile, RUP-based approach.
Why a RUP-based approach?
RUP is often viewed as simply an object-oriented (OO) development process, and, to be fair, that is clearly where its origin lies. The fact is that RUP is also used to build applications using procedural technologies such as COBOL, to implement commercial off-the-shelf (COTS)-based systems, to build applications using a service-oriented architecture (SOA), and yes, to build business intelligence (BI) systems using data warehouse, data mart, and reporting technologies. The focus of this article is on DW projects.
Figure 1 depicts the RUP lifecycle. The RUP is an evolutionary approach to development, where you take both an iterative and incremental approach. The iterative nature is depicted by the various disciplines along the left-hand side of the diagram: you iterate back and forth between various activities pertaining to requirements, analysis and design, testing, programming, and so on. You develop incremental releases over time, running through the lifecycle once for each production release of your system. During Elaboration, you develop working architectural releases that address high-risk technical issues. Furthermore, you produce working software at the end of each Construction iteration, software that could potentially be released internally for testing and/or demo purposes.
Figure 1: The RUP lifecycle
There are several reasons why organizations choose RUP for data warehousing projects:
- RUP addresses scope risk. RUP recognizes the fact that your requirements will change throughout the project, and therefore it promotes a flexible approach to requirements management. We know that people aren't good at defining requirements upfront, that external forces motivate change as do internal political forces, and that, quite frankly, people change their minds once they see something delivered. The point is that trying to define a comprehensive requirements specification upfront proves to be a spectacularly risky decision, so we don't do that with RUP. Don't get me wrong. You need to do some initial requirements modeling during RUP's Inception phase, but it doesn't need to be as comprehensive as traditionalists think.
- RUP looks beyond data. One of the primary causes of DW project failure is not providing sufficient business value to your stakeholders. To identify and then focus on business value, you need to understand how that data will potentially be used in practice. Therefore, a usage-centered approach, as provided through RUP's application of use cases, is preferable to a data-centered approach to gain a broad view of the actual requirements.
- RUP addresses technical risk early. I believe that too many DW projects fail because of the false confidence provided by detailed models created early in the project. The data model is very detailed, it captures the "one truth," and you've spent months working on it, so how could that not be good? The problem is that every architecture works on paper, or in your modeling tool, or on your whiteboard. It's not until you prove that architecture with code that you actually know that it will work. As you will see in Part 2 of this series, RUP's Elaboration phase focuses on addressing technical risk early in the lifecycle.
- RUP addresses financial risk. An important RUP concept is that you should deliver working software, incrementally, on a priority basis, during each Elaboration and Construction phase iteration. An iteration is a time box, preferably between one and four weeks in length (although very large and/or dispersed teams will have longer iterations), during which a project team delivers a portion of the overall system. Incremental, iterative development that addresses a project's critical components early ensures that you are delivering the highest-value functionality first and that you are maximizing your stakeholders' return on investment (ROI) at all times. The method also gives you concrete feedback as to the status of your project. The only true measure of progress on a software development project is the regular delivery of working software, which enables you to determine whether your IT investment resources are being spent wisely.
- RUP enables disciplined agility. Data warehousing projects are hard. The requirements are difficult to pin down, and significant challenges are often posed by legacy data sources. We need to be flexible, yet at the same time provide a level of control for management that enables them to govern development efforts effectively. RUP allows this disciplined agility. You can tailor it to meet your exact needs to provide just the right amounts of flexibility and rigor for your unique situation.
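To make the "priority basis" idea from the financial-risk point above concrete, here is a minimal sketch of filling time-boxed iterations with the highest-value work first. It is only an illustration, not something RUP prescribes: the `WorkItem` fields, the numeric value scale, the effort estimates, and the backlog entries are all assumptions invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WorkItem:
    name: str
    business_value: int  # assumed scale: higher means more valuable to stakeholders
    effort_days: int     # assumed rough estimate in working days


def plan_iterations(items: List[WorkItem], capacity_days: int) -> List[List[WorkItem]]:
    """Assign items to fixed-length iterations, highest business value first.

    Items are taken strictly in value order; when the next item does not fit
    in the current iteration, a new iteration is started.
    """
    ordered = sorted(items, key=lambda item: item.business_value, reverse=True)
    iterations: List[List[WorkItem]] = []
    remaining = 0
    for item in ordered:
        if item.effort_days > capacity_days:
            raise ValueError(f"{item.name} is larger than one iteration; split it first")
        if not iterations or item.effort_days > remaining:
            iterations.append([])          # open a new time box
            remaining = capacity_days
        iterations[-1].append(item)
        remaining -= item.effort_days
    return iterations


# Hypothetical backlog for a two-week (10 working day) iteration length.
backlog = [
    WorkItem("Promotional item flags", business_value=3, effort_days=4),
    WorkItem("Daily sales load", business_value=9, effort_days=6),
    WorkItem("Restocking report", business_value=7, effort_days=5),
]
plan = plan_iterations(backlog, capacity_days=10)
for number, iteration in enumerate(plan, start=1):
    print(number, [item.name for item in iteration])
```

In practice the ordering would be revisited at the start of every iteration as stakeholder priorities change, so a plan like this is only ever a snapshot.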
The majority of this article is written from the perspective of a green-field DW project using a RUP-based lifecycle, although at the end of the series, I discuss how to handle future production releases. A data warehouse is an important aspect of your organization's technical infrastructure, so you need to view it as an ongoing product that evolves over time as a series of production releases. Each production release is typically managed as a separate project.
One interesting aspect of RUP is its serial nature in the form of phases. These phases are like the seasons of a project -- on any given day, you may be doing some requirements work or some testing work, but you are more likely to do requirements work early in the project, rather than towards the end of the project. In Figure 1, the "humps" represent the relative effort for each discipline throughout the project. In this article, I describe what occurs during the Inception phase of a DW project, and in Part 2, I describe the Elaboration, Construction, and Transition phases.
Addressing scope risk: The Inception phase
The primary goal of the Inception phase is to set the foundation for your project. You will need to identify the initial scope, create an initial high-level plan, define a business vision that outlines the goals and initial project justification, and gain initial stakeholder support for the project. Note the use of the word "initial" throughout that list. All of these things will evolve throughout your project, the implication being that you need to get the corresponding artifacts to the point where they are good enough for now -- they don't need to be perfect, nor do they need to be comprehensive or final. Remember, it is possible to be both disciplined and agile with RUP.
Your initial requirements are captured in the form of high-level use cases, an example of which is shown in Figure 2. Figure 2 depicts the Place Stock Order for Store use case for the Behemoth Retail Company, a fictional organization for which we're developing a data warehouse. The advantage of use cases is that they describe an activity that provides business value to a business actor, a business actor being a person or organization that interacts with your system. There are also system actors: external systems that your system interacts with, which in the case of data warehouse projects would be the legacy data sources that you extract data from.
Place Stock Order for Store
Each day determine the restocking needs based on previous day sales.
Override some stock levels because some items may:
- Not be needed any more.
- Need additional stock.
- Be new.
- Be promotional.
Consider issues such as:
- Seasonal/holiday items.
- Local marketplace.
- Local competitors.
- Forecasted weather.
Figure 2: An initial high-level use case specification
Sidebar: Common Terminology
- Business intelligence (BI). The gathering and analysis of vast amounts of data to gain insights that drive strategic and tactical business decisions. BI encompasses a broad category of technologies, including data warehousing, multidimensional analysis or online analytical processing (OLAP), data mining, data visualization, and reporting.
- Data warehouse. A centralized repository containing comprehensive detailed and summary data that provides a complete view of customers, suppliers, business processes, and transactions, from a historical perspective with little volatility.
- Data mart. A repository containing a subset of the data stored in the data warehouse that is of interest to a specific business community, department, or set of users.
- Data warehousing. The design and implementation of processes and tools to manage and deliver complete, timely, accurate, and understandable information for decision making. Data warehousing deals with managing the development, implementation, and operation of a data warehouse or data mart. It includes metadata management, data acquisition, data cleansing, data integration, storage management, data distribution, data archiving, operational reporting, analytical reporting, security management, backup and recovery planning, and more.
The use case in Figure 2 is rather slim, indicating its name and some point-form notes describing the main idea. The use case provides just enough information to give the stakeholders a good understanding of its scope. It doesn't go into excruciating detail describing what we intend to do, it doesn't indicate which screens and reports are being used, it doesn't document business rules, and it doesn't list the data elements required to support this effort. We can gather these details in later iterations when and if we need to; for now, our goal is to simply gain a good understanding of the scope.
This is consistent with agile principles: agile modelers stop and move on to something else once they've achieved their immediate goal. Many traditionalists struggle with this concept, thinking that they need to document the requirements in detail early in the project. This proves to be a significant waste of time in practice. When the requirements change, and they always do, any previous effort invested in documenting what has just changed is now wasted. Furthermore, there really isn't any rush to document the requirements because if you have the ability to model them today, you also have the ability to model them tomorrow when you actually need the information. Worse yet, the longer you go without the concrete feedback provided by working code, the longer it will take to discover whether you truly understand the requirements. In short, writing comprehensive documentation may seem like a safe and comfortable thing to do, but in reality, it increases the risk of project failure.
On the Behemoth data warehouse project, we could easily have dozens, if not hundreds, of use cases. During the Inception phase, some use cases will be described to the level of detail that we see in Figure 2; others won't be as critical to our success, so we may choose to give them only a name for now. To organize these use cases, or at least to communicate the overall scope to management, you might decide to create one or more use-case diagrams. Your use-case model is composed of the use-case diagram(s) plus any supporting documentation: i.e., use-case specifications and actor descriptions (if you choose to document them). Keep in mind that the heart of your use-case model is the use cases, not the diagrams, so that is generally where your focus will be.
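One way to picture how little an Inception-phase use-case model needs to hold is the small sketch below. It is an illustration under stated assumptions rather than a RUP artifact format: the class names, the `needing_detail` helper, the "Store Manager" actor, and the second use case ("Review Store Sales Summary") are hypothetical; only Place Stock Order for Store and its notes come from Figure 2.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UseCase:
    name: str
    notes: List[str] = field(default_factory=list)  # point-form notes; may be empty

    @property
    def is_name_only(self) -> bool:
        """True when the use case has been named but not yet described."""
        return not self.notes


@dataclass
class UseCaseModel:
    use_cases: List[UseCase] = field(default_factory=list)
    actors: List[str] = field(default_factory=list)  # optional actor descriptions

    def needing_detail(self) -> List[str]:
        """Names of use cases that so far exist only as a name."""
        return [uc.name for uc in self.use_cases if uc.is_name_only]


model = UseCaseModel(
    use_cases=[
        UseCase(
            "Place Stock Order for Store",
            notes=[
                "Each day determine the restocking needs based on previous day sales.",
                "Override some stock levels (no longer needed, need more, new, promotional).",
                "Consider seasonal/holiday items, local marketplace, competitors, weather.",
            ],
        ),
        UseCase("Review Store Sales Summary"),  # hypothetical, name only for now
    ],
    actors=["Store Manager"],  # hypothetical business actor
)
print(model.needing_detail())
```

Nothing here records screens, reports, business rules, or data elements; those details can be added later, when and if an iteration actually needs them.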
Stakeholder involvement is critical throughout your project -- I highly recommend Agile Modeling's practice of active stakeholder participation, where stakeholders are not only involved with your project on a daily basis, they are also directly involved with the actual modeling effort itself. Anyone can write use cases to the level of detail shown in Figure 2, so why not have the business experts, the people who understand the business, be directly involved in writing them?
During the Inception phase, you should start thinking about the initial architecture of your system, both from a business point of view and from a technical point of view. For a data warehouse, this generally implies the creation of a high-level conceptual model and a high-level deployment diagram, respectively. Figure 3 depicts an initial conceptual model for the Behemoth DW created using Rational Data Architect (RDA), and Figure 4 shows an initial Unified Modeling Language (UML) deployment diagram created using Rational Software Architect (RSA). Notice how the conceptual model only identifies major business entities such as Customer, Store, and Item as well as the relationships between them. At this point, we only need this level of detail to start identifying potential sources for the corresponding data. Once again, we can address the details when we actually need to. Similarly, the deployment diagram only indicates sufficient detail to enable the team to understand the potential scope of the technical effort. As we'll see in Part 2, both of these models will evolve throughout the project.
Figure 3: An initial conceptual model
Figure 4: An initial deployment model
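To show how little an initial conceptual model has to contain, the sketch below records only entities, named relationships, and candidate data sources. It is not how Rational Data Architect stores a model: the `ConceptualModel` class, the relationship labels ("shops at", "stocks"), and the source name `legacy_inventory_db` are assumptions made up for illustration; only the entity names Customer, Store, and Item come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ConceptualModel:
    entities: Set[str] = field(default_factory=set)
    # Each relationship is (from_entity, label, to_entity); no attributes or keys yet.
    relationships: List[Tuple[str, str, str]] = field(default_factory=list)
    # Entity name -> candidate legacy sources identified so far.
    candidate_sources: Dict[str, List[str]] = field(default_factory=dict)

    def relate(self, source: str, label: str, target: str) -> None:
        """Record a relationship between two entities already in the model."""
        for name in (source, target):
            if name not in self.entities:
                raise KeyError(f"unknown entity: {name}")
        self.relationships.append((source, label, target))

    def entities_without_sources(self) -> List[str]:
        """Entities for which no candidate data source has been identified."""
        return sorted(e for e in self.entities if not self.candidate_sources.get(e))


model = ConceptualModel(entities={"Customer", "Store", "Item"})
model.relate("Customer", "shops at", "Store")   # assumed relationship
model.relate("Store", "stocks", "Item")         # assumed relationship
model.candidate_sources["Item"] = ["legacy_inventory_db"]  # hypothetical source

print(model.entities_without_sources())
```

A list like the one printed here is a reasonable prompt for the next conversation with stakeholders about where the corresponding data might come from.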
If any corporate standards exist -- such as data naming conventions, user interface design guidelines, programming guidelines, or modeling guidelines -- you adopt them at this point. If they don't exist, then you'll want to develop some, or better yet, adopt existing ones from industry. It's amazing what you can find on the Internet these days just by looking for it. All of this work is an aspect of RUP's Environment discipline.
At this point in the project, it's important to start making some basic scoping and architectural decisions. Are you going to build one or more data marts for point-specific solutions? Are you building a data warehouse for the entire enterprise? Are you attempting to address all of your organization's business intelligence (BI) needs? Don't worry, you can do all of these projects with a RUP-based approach, but you still need to decide on a scope so that you have a clear goal to address. For the purpose of this discussion, let's assume that you're building a data warehouse for the entire enterprise.
You also want to develop a high-level plan for the project and a detailed plan for the next iteration or two. The high-level project plan addresses major dependencies and overall strategy, whereas the detailed iteration plan addresses current tactical issues; as with modeling, a little bit of upfront work and then addressing the details in a just-in-time (JIT) manner seems to work best. When you are planning the project, don't forget to consider long-term issues such as operations, support, and continued improvements to the warehouse. (On a project of this magnitude, if you're not willing to look at both the total benefit of ownership (TBO) and total cost of ownership (TCO) for the warehouse, you might as well stop now!)
At the end of each RUP phase, you make a go/no-go decision. At the end of the Inception phase, the primary issues are whether you have scope, schedule, and budgetary concurrence with your stakeholders and what appears to be a viable strategy for building the warehouse. Secondary issues include whether you have the project politics under control and the start of effective strategies for data stewardship and data governance.
There are several common pitfalls that you want to avoid during the Inception phase:
- Thinking that this is a traditional requirements phase. In RUP, ensuring that we understand the requirements is so important that we explore them throughout the lifecycle, not just at the beginning of the project.
- Thinking that you need to get your models and plans perfect. You only need to get them good enough for now. We'll see that it's actually quite straightforward to build a data warehouse in an evolutionary manner when we discuss the Construction phase in Part 2.
- Trying to create a comprehensive data model at the start of the project. In fact, you may not even need a data model at all. Many data warehouse projects seem to get hung up on the concept that there is a single version of the data truth out there, that there is a common, shared definition for your master reference data and perhaps even your major business entities. This is a nice vision to work toward, but don't let it prevent your team from delivering important business value in a timely manner. The fact is that various portions of your organization have different ways of working, different priorities, and different constraints. There may not be one single shared truth, and even if there were, it's going to change over time anyway. I recently walked through Heathrow Airport in London, and plastered throughout the tunnels was a series of HSBC advertisements (see http://www.yourpointofview.com/hsbcads_airport.aspx). These advertisements were very straightforward. They showed two similar pictures, and below each picture was a different word. Then they showed the same two pictures again, but switched the words around. The point was that people see the world differently, and that as a financial institution, they understand that and are flexible enough to act accordingly. My advice is directly related to this: Take a practical approach and recognize that there is a diminishing rate of return when it comes to modeling, and that you can quickly reach the inflection point where further investment in data modeling reduces the overall value to your organization. Once again, the failure rate of traditional data warehousing efforts speaks for itself.
Coming soon
In Part 2 of this article, I will describe what happens during the Elaboration, Construction, and Transition phases. You'll see how technical risk is addressed during the Elaboration phase by developing an end-to-end working skeleton of the system. You'll also see how the data warehouse evolves throughout the Construction phase based on the changing needs and priorities of its stakeholders. You'll see how the data warehouse is effectively deployed into production during the Transition phase. Finally, I'll show how future releases of the data warehouse -- that's right, your work is never done -- are developed by working through the RUP lifecycle again and again.
About the author
Scott W. Ambler is a Practice Leader for Agile Development within the IBM Methods group. He develops process materials, speaks at conferences, and works with IBM clients worldwide to help improve their software processes. Scott is author of several books, listed on his Web site at www.ambysoft.com. Scott is also a recognized Rational Thought Leader.