Level: Intermediate Larry Pay (lpay@ca.ibm.com), DB2 UDB Data Warehouse Consultant, Software Services for DB2 Information Management, IBM
10 Feb 2005 DB2 Cube Views enhances OnLine Analytical Processing (OLAP) in DB2 Universal Data Base V8.2 by extending the power of DB2 summary tables to many Business Intelligence (BI) applications. The DB2 Cube Views Optimization Advisor, used in conjunction with the DB2 Design Advisor, helps you achieve a data warehouse environment that uses an optimum number of DB2 summary tables to satisfy business requirements. This article discusses how to use DB2 Cube Views and what you need to do in order to use it effectively.
What is DB2 Cube Views?
 |
MQTs and summary tables
A Materialized Query Table (MQT) is a table whose definition is based upon the result of a query, and whose data is in the form of precomputed results that are taken from one or more tables.
A summary table is a special type of MQT whose fullselect contains a GROUP BY clause summarizing data from the tables that are referenced in the fullselect. Some examples of summary tables may be found in Listing 1 below.
|
|
DB2 Cube Views is an optional add-on product to DB2 Universal Database that enhances DB2 to improve OLAP (OnLine Analytical Processing) performance. DB2 Cube Views helps build the most cost-effective summary tables in DB2 by leveraging or rediscovering the metadata in existing OLAP tools.
When you install DB2 Cube Views, metadata tables are added to DB2, extending the database catalog and making DB2 OLAP-aware. Cube models and cubes are built using metadata imported into DB2 Cube Views from enabled third-party BI (Business Intelligence) applications, ETL (Extract, Transform, Load) tools, or modeling tools. The Cube Views Optimization Advisor is then used to generate SQL to create DB2 summary tables and indexes. Once these DB2 summary tables are created and refreshed, SQL queries generated by third-party BI applications are rerouted to these DB2 summary tables instead of the base DB2 tables. This results in a dramatic improvement in performance because the data is pre-computed and pre-aggregated.
DB2 Cube Views and third-party BI application tools
Most data warehouse environments have common metadata already stored in an existing BI application, ETL tool, or modeling tool. If you import this common metadata into DB2 Cube Views and analyze it, DB2 will have a good starting point for recommending the optimum number of summary tables and indexes for the data warehouse.
DB2 Cube Views integrates the knowledge about a third-party BI tool's multidimensional model with the knowledge of the DB2 optimizer. A typical DB2 Cube Views scenario is described as follows:
- Export the multidimensional model from a third-party BI application tool into an XML (eXtensible Markup Language) file using a metadata bridge.
- Import this XML file into DB2 Cube Views. After the metadata is imported, a semantic model is built in DB2 Cube Views showing objects that map to the data warehouse schema such as facts, measures, attributes, and dimensions.
- Add supplemental information about hierarchies and levels through the OLAP Center in DB2 Cube Views.
- Invoke the DB2 Cube Views Optimization Advisor to validate the completed cube model and cubes.
- After validation, the Optimization Advisor is used to recommend DB2 summary tables and indexes through automatically generated SQL scripts.
- The DB2 summary tables and indexes are created.
- SQL queries generated and submitted by the BI application tool are rerouted to DB2 summary tables.
The following figure illustrates the scenario:
Figure 1. Third-party BI applications sharing metadata with DB2 Cube Views
DB2 Cube Views facilitates the exchange of metadata information between multiple third-party BI applications, ETL tools, modeling tools, and DB2 through metadata bridges. There are many different scenarios of how to build a semantic model in Cube Views using these metadata bridges. To facilitate building the starting cube model, the key is to determine if information already captured in a BI application tool can be shared with DB2 Cube Views.
Metadata interchange between DB2 Cube Views and the third-party BI analytic tool is either 1-way or 2-way. A 1-way metadata interchange means unidirectional sharing (either export or import) of metadata between DB2 and the third-party analytic tools. A 2-way metadata interchange means a bidirectional sharing (both import and export are supported) of metadata between DB2 and third-party analytic tools. In some cases, a third-party BI or modeling tool needs to use an additional model bridge (like Meta Integration Model Bridge) to enable the 2-way metadata interchange.
Cube Views currently has metadata bridges to products such as:
- OLAP server tools
- OLAP reporting tools
- Extract, Transform, Load (ETL) tools
- Metadata management tools
- Custom applications
What makes a good DB2 Cube Views implementation?
There are several prerequisites to a good DB2 Cube Views implementation:
- An accurate and well-defined multidimensional model showing facts, dimensions, joins, levels, and hierarchies. This model might be in an existing BI application, an ETL tool, or a modeling tool which can export it into a Cube Views model.
- A keen knowledge (on the part of the Cube Views modeler) of what business metrics are likely to be queried, and what levels in the dimension hierarchies need to be aggregated.
- Good cooperation between the Cube View modeler, business analysts, and users, completing a well-defined Cube Views model in a reasonable amount of time.
- A good knowledge of how Cube Views models work and how they relate to existing BI applications. Knowledge of what metadata exists in current BI applications, ETL tools, or reporting tools can be reused to build the cube model. This can help accelerate the cube model building process.
- A good knowledge of Cube Views Optimization Advisor and how to generate what-if scenarios of summary tables that can be generated (depending on the query type). This should go hand in hand with the knowledge of what metadata within Cube Views can be exported to the BI application and exposed to the users, to aid deployment of query interfaces that take advantage of the summary tables.
- A good knowledge of how to create and maintain summary tables and indexes in DB2 and how to optimize DB2 queries using DB2 Design Advisor.
DB2 Cube Views is easy to set up and install. However, you need to spend some effort in the analysis of the metadata interchange between Cube Views and third-party BI application tool, and in creating and maintaining the cube model and cube objects in Cube Views. After you validate the Cube Views model, the generation of summary table SQL scripts is automatic.
Installing and setting up a database in DB2 Cube Views
- You must complete a clean installation of DB2 Universal Database Version V8.2 before installing DB2 Cube Views.
- The DB2 database and DB2 Cube Views servers and clients must be on the same DB2 V8.2 Fixpak levels. You need to install both the DB2 Cube Views server and client components in order to use DB2 Cube Views.
-
Install the DB2 Cube Views server component on the DB2 server on any certified AIX, Linux, Solaris, HP-UX, or Windows platform.
The DB2 Cube Views client component includes the OLAP Center and is available only on the Windows platform. It can be installed on any certified Windows NT, Windows 2000, Windows XP Professional, or Windows 2003 Server platform.
OLAP Center is the graphical user interface for DB2 Cube Views. Use OLAP Center for all Cube Views tasks to take full advantage of all the functions and wizards that facilitate the building of the cube model.
To access the OLAP Center, click the Windows Start button and select Programs > IBM DB2 > Business Intelligence Tools > OLAP Center
- Before you can start analysis, you have to set up the database for DB2 Cube Views. You can set up a DB2 database for use with DB2 Cube Views in two ways:
- Logging in to OLAP Center and invoking a database will automatically install the catalog tables and stored procedure necessary for Cube Views.
- Alternatively, to add the Cube Views catalog tables, execute the db2mdapi.sql SQL script found in the ~\sqllib\misc directory.
In a DB2 command window, change directory to ~\sqllib\misc and run the following command once per database:
db2 -tvf db2mdapi.sql
This script creates a tablespace named DB2INFOSPACE, and creates 35 catalog tables, constraints, and triggers with the schema DB2INFO in the DB2INFOSPACE tablespace. These catalog tables will contain all of the metadata information that will be generated with the creation of the cube model. The script also creates a stored procedure called md_message, which extracts operation and metadata information from the input parameters and then performs the requested metadata operations.
 |
Creating a cube model in DB2 Cube Views
You can create a cube model using OLAP Center in three ways:
- Importing from a BI application tool using the metadata bridge.
- Using Quickstart in OLAP Center.
- Creating a new model from scratch.
Importing from a BI application tool using the metadata bridge
Third-party BI application tools exchange metadata with DB2 Cube Views using a metadata bridge. See Figure 1.
This alternative is the recommended way of creating the Cube Views model, since it allows you to avoid the most labor-intensive portion of building the cube model: the data entry of object names and validation against established naming standards. In addition, building a cube model from the imported metadata ensures that summary tables can be built that will directly address the same multidimensional model that the BI application tool uses.
The steps to create a cube model in Cube Views using an exported XML file are similar to the steps taken to set up the CVSAMPLE database, the sample database provided with Cube Views. After creating the database and enabling it for Cube Views, export the XML file from the BI application tool, then use OLAP Center to import the XML file into Cube Views.
Using Quickstart in OLAP Center
You can also build a cube model by using the Quickstart option in OLAP Center. This option facilitates the process of modeling a basic cube model by using prompts which allow you to specify the fact object and its measures. After you have specified a fact object and its measures, the wizard creates a basic cube model with a fact object and other dimension objects, using previously defined DB2 Referential Integrity constraints information.
To use Quickstart: In OLAP Center, right-click the Cube Model option, and select Create Cube Model - Quickstart.
Creating a new model from scratch
You can create an empty cube model using the Cube Model wizard.
To do this: In OLAP Center, right-click the Cube Model option, and select Create Cube Model.
Before you can use the OLAP Center Optimization Advisor Wizard, you need to validate the cube model. A cube model created using the OLAP Center will validate successfully only after you add the following mandatory components:
- At least one facts object.
- At least one dimension.
- A hierarchy defined for at least one dimension.
- Joins between the existing facts object and dimensions.
- Attributes that reference existing table columns.
The following section lists the different Cube Views objects and how each object maps back to a multidimensional model.
The DB2 Cube Views model metadata objects
DB2 Cube Views metadata objects describe objects similar to multidimensional model structures -- in particular, star schemas and snowflake schemas. However, the objects of interest in Cube Views can be a particular grouping in or a subset of a star schema or snowflake schema. The following diagram shows the mapping of Cube Views metadata to the multidimensional model:
Figure 2. DB2 Cube Views metadata catalog
The following metadata objects are used in a DB2 Cube Views model:
- Attribute -- maps to either a single column in a table or an expression that is a combination of a set of columns or other attributes.
- Measure -- defines a measurement entity and is used in facts objects. Measures usually point to numeric columns in a fact table.
- Level -- consists of a set of attributes that are related and work together as one logical step in a hierarchy's ordering. The relationships between the attributes in a level are usually defined with a functional dependency.
- Hierarchy -- defines relationships among a set of one or more attributes that are grouped by levels in the dimension of a cube model.
- Join -- analogous to a table join in a SQL query. A join has a type and cardinality. Joins can be used in dimensions to join dimension tables together, or in a cube model to join the dimensions of the cube model to its facts object, or within a facts object to join multiple fact tables.
- Dimension -- a collection of related attributes that together describe an aspect of the data. A dimension can reference attributes from one or more dimension tables. However, if you use attributes from multiple dimension tables, the tables must have joins between them and the dimension must reference those joins. A dimension also references one or more hierarchies and can reference relationships between attributes.
- Facts object -- groups related measures that are interesting to a specific application. In a cube model, a facts object is used as a center of a star or snowflake schema. The facts object stores information about the attributes that are used in fact-to-dimension joins, and about the attributes and joins that are used to map the additional measures across multiple database tables. Therefore, in addition to a set of measures, a facts object stores a set of attributes and a set of joins.
- Cube Model -- representation of a logical star or snowflake schema. The cube model groups relevant dimension objects around a central facts object. It describes all data related to a collection of measures. Typically, the cube model relates to a star schema or snowflake schema in the database. The cube model references a single facts object and one or more dimensions.
- Cube -- a specific instance or subset of a cube model. The cube facts and cube dimensions are a subset of those that are referenced in the cube model. A cube is the closest to an OLAP conceptual cube. Most cubes can be retrieved with a single SQL statement.
- Cube facts -- reference a subset of the measures from the facts object from which they are derived.
- Cube dimension -- references a subset of the attributes of the dimension from which it is derived. It also references a single cube hierarchy.
- Cube hierarchy -- references a subset of the attributes of the hierarchy from which it is derived, where the order of the attributes must be in the same order as their order in the hierarchy.
- Cube level -- a subset of a level which is used in a cube. A cube level references the level from which it is derived, and inherits the level key attributes and default attributes that are defined for the parent level.
 |
DB2 Configuration, table design, and Cube Views considerations
In order to ensure that you can obtain the best Cube Views summary table recommendations, you should implement the following rules in DB2 and Cube Views, as recommended by Daniel DeKimpe of the Cube Views development team:
- Define indexes on fact table dimensional keys with high cardinality.
- Define indexes on dimension-level table keys with high cardinality.
- Define functional dependencies for inter-level relationships. The functional dependency indicates that a level object's default attribute and related attributes are functionally determined by the level's key attributes. Using this functional dependency can help minimize the size of the summary table.
- Provide ample temporary table space for summary table refresh.
- Perform runstats on all fact and dimension tables.
- Define non-nullable constraints for all joins if possible. This will ensure that queries can be routed to the summary tables.
- Provide ample sort heap space.
- Set the default refresh age parameter DFT_REFRESH_AGE, in the DB2 database configuration, to ANY. This time duration represents the maximum duration time since a REFRESH TABLE has been processed on a REFRESH DEFERRED MQT. If the REFRESH AGE has a value of 99999999999999 (ANY), REFRESH DEFERRED MQTs are considered in optimizing the processing of a dynamic SQL query.
- Use distributive measures (that is, SUM, MIN, MAX, COUNT) to get the best optimization results. A measure is distributive if the result derived by applying the function to different subsets of the data is the same as the result derived by applying the function to the whole set of data. For example, SUM(1st Quarter Sales) is the same as SUM(January Sales) + SUM(February Sales) + SUM(March Sales).
- Define functional dependencies. Confirm that there are functional dependencies created in the data. A functional dependency specifies that one or more columns are functionally dependent upon one or more other columns, provided all of the columns exist within the same table. If the functional dependencies are defined, DB2 Cube Views can recommend summary tables that include just the key columns from selected dimensions and levels, so that the resulting summary table is more narrow.
- Define non-nullable level keys if possible.
- The level at a dimension grain should have the primary key or a unique key as the level key.
- Validate that the model can be optimized.
In addition to these recommendations, you must also have a method to identify and resolve Referential Integrity (RI) nonconformance in the data warehouse implementation. Fortunately, there is a beta utility provided by Jian Le of the Cube Views Development team to detect and correct this RI nonconformance. This tool will detect missing primary keys, missing foreign keys, nullable foreign keys, and null values in null foreign key fields, and generate data definition language (DDL) to add the necessary DB2 informational constraints. You can download this utility from the IBM alphaWorks site (see Resources).
After you have built the Cube Views model in OLAP Center using the metadata objects as described above, the next step is to validate the model by accessing the Cube Views Optimization Advisor. To do this: Right-click the named cube model, and select Optimization Advisor.
Figure 3. Validating the cube model by accessing the Optimization Advisor
Cube Views validates the cube model against three types of rules for each metadata objects: base rules, cube model completeness rules, and optimization rules. These validation rules ensure that each metadata object is valid both in and out of the context of a cube model, and that effective SQL queries can be written and optimized.
The definitions and examples for these three types of rules are as follows:
- Base rules -- define a metadata object's validity outside the context of its use.
Examples:
- A facts object must refer to at least one measure.
- A dimension must refer to at least one attribute.
- Cube model completeness rules -- apply only to cube models and extend the base rules to ensure that a cube model links to other metadata objects appropriately and that effective SQL queries can be written.
Examples:
- A cube model must refer to one facts object.
- A cube model must refer to at least one dimension.
- Optimization rules -- ensure that SQL queries that are created for the metadata objects can be optimized successfully.
Example:
- The join used to join the facts object and the dimension must have a cardinality of 1:1 or Many: 1 and must join a fact table to a dimension's primary table.
A complete list of these rules can be found under the section "Metadata object rules" in the DB2 Information Center.
The DB2 Cube Views Optimization Advisor Wizard query type and optimization slices
After the DB2 Cube Views model has passed validation, you can use the Optimization Advisor Wizard.
The DB2 Cube Views Optimization Advisor Wizard generates two SQL scripts: one to create the summary tables, and another to refresh the summary tables.
In creating the summary table, the DB2 Cube Views Optimization Advisor Wizard allows the modeler to specify a query type and the optimization slices that represent how the cubes are to be used. An optimization slice is a method of defining critical cube regions that are prioritized for inclusion in the summary table recommended by the Cube Views advisor. No optimization slice is configured if there is no cube defined.
The DB2 Cube Views Optimization Advisor Wizard allows you to specify the following query type options for a cube:
Drill-down queries -- typically access a subset of the data at the top of a cube.
Report queries -- equally likely to access any part of the cube.
MOLAP extract queries -- load data from the lowest levels of the cube dimensions into a MOLAP (Multidimensional OLAP) data store.
Advanced settings -- indicates that optimization slices have been specified that are expected to be queried most frequently. On the query type selection screen, when you click the Specify tab, the Optimization Advisor Wizard displays a screen where you can graphically manipulate an aggregation level for a cube dimension. You can use this to define an optimization slice to handle the type of queries being issued. The following figure shows this screen:
Figure 4. Illustration of an optimization slice
The DB2 Cube Views Optimization Wizard summary tables option
After you have specified the query types and optimization slices, you must specify:
- Whether to create the summary table as immediate refresh or deferred refresh.
- Which table space to use for the summary tables.
- Which index table space to use for the summary tables.
- How much disk space the Optimization Advisor can use when determining summary table recommendations.
- How much time the Optimization Advisor can take when determining summary table recommendations.
- Whether or not the Optimization Advisor uses data sampling in determining summary table recommendations.
Click here for examples of the SQL scripts that the Cube Views Optimization Wizard generates to create and refresh summary tables in DB2.
Using DB2 Cube Views with the DB2 Design Advisor
DB2 V8.2 comes with a DB2 Design Advisor that will analyze a SQL workload to come up with index, MQT (Materialized Query Table), MDC (Multidimensional Clustered Table), and partitioning recommendations.
To achieve the best combination of MQTs and summary tables, use both the Cube Views Optimization Advisor and the DB2 Design Advisor. The Cube Views Optimization Advisor uses a model-based approach to MQTs, while the DB2 Design Advisor uses a SQL workload approach.
The following steps are recommended in using both DB2 Cube Views Optimization Advisor and DB2 Design Advisor:
- Run the runstats utility on all tables in the system.
- Monitor and capture the SQL workload being run in the system.
- Run the DB2 Design Advisor to get initial index recommendations.
- Review the index recommendations from DB2 Design Advisor, and implement them if there is an expected performance gain.
- Create the Cube Views model and the necessary cubes to address your requirements.
- Run the DB2 Cube Views Optimization Advisor to get summary table and index recommendations.
- Review and implement the Optimization Advisor summary table and index recommendations.
- Run the DB2 Design Advisor to get index and MQT recommendations.
- Review the index and MQT recommendations from the DB2 Design Advisor, and implement them if there is an expected performance gain.
- Repeat steps 6 to 9 above as necessary.
Both advisors complement each other. The index recommendations from the DB2 Design Advisor may be used by the Cube Views Optimization Advisor, and the DB2 Design Advisor may recommend MQTs that cover other queries that cannot be handled by the summary tables from the Cube Views Optimization Advisor. This process of tuning using both advisors is iterative, and you should repeat it every time there are significant changes in the Cube model, the SQL workload, or data volumes.
Cube Views is a very powerful tool and provides a much-needed performance boost to OLAP users. The best practices cited above will help you to harness it properly.
Resources Learn
Get products and technologies
About the author  | 
|  | Larry Pay is an Engagement Specialist from the DB2 Data Warehouse Consulting Group, IBM Software Services for DB2 Information Management at the IBM Toronto Laboratory. He has been helping customers implement DB2 and Business Intelligence Solutions for over 5 years. Larry is an IBM Certified Database Administrator for DB2 UDB. |
Rate this page
|