Skip to main content

skip to main content

developerWorks  >  Information Management  >

Introducing DB2 9, Part 2: Table partitioning in DB2 9

developerWorks
Document options

Document options requiring JavaScript are not displayed

Discuss


Learn and share!

Exchange know-how with your peers -- try our new Pass It Along beta app


Rate this page

Help us improve this content


Level: Intermediate

Rav Ahuja (rsahuja@ca.ibm.com), DB2 Program Manager, IBM

24 May 2006

Improve performance and achieve greater scalability and easier management by using table partitioning, a feature of DB2® 9 (formerly codenamed "DB2 Viper." As tables grow in size, it may be easier to manage your data in chunks or by limited ranges. This article takes a quick look at the basics of table partitions and gives you insight into the benefits of this capability. For more information on the new features of DB2 9, read other articles in the series.

Introduction

Management of large databases and tables has its own challenges. As tables grow in size, it sometimes becomes easier to manage their data in chunks or by limited ranges, rather than operate on all of the data in those large tables in their entirety. This methodology of managing data is also especially useful when chunks or ranges of new data are added or old data is removed from tables at frequent intervals (for example, roll-in and roll-out operations typical in data warehousing environments).

DB2 9 includes table (range) partitioning capabilities for easier management, improved performance, and greater scalability of large databases.

What is table partitioning?

Table partitioning (sometimes referred to as range partitioning) is a data organization scheme in which table data is divided across multiple storage objects, called data partitions (not to be confused with database partitions or DPF), according to values in one or more table columns. These storage objects can be in different table spaces, in the same table space, or a combination of both.

DB2 9 supports data partitions or data ranges based on a variety of attributes. A commonly used partitioning scheme is the date, where you may decide to clump together data in data partitions such as by year or month. You could also have numeric attributes for partitioning, for instance, records with IDs from one up to 1 million are be stored in one data partition, IDs from 1 million to 2 million in another data partition, and so on. Or for example, you could have records for customers with names starting with A-C in one data partition, D-M in the second data partition, N-Q in a third data partition, and R-Z in the last data partition.

Although you have the option of referring to data partitions by names or numbers (useful for data partition operations), they can be completely transparent to applications. That is, applications can continue to access data by specifying column and table names, and do not need to worry about which data partition(s) the data resides in.

The table partitioning functionality in DB2 for Linux®, UNIX®, and Windows® is similar to equivalent capabilities in DB2 for z/OS®, Informix® Dynamic Server, and Informix Extended Parallel Server. The DB2 for z/OS, DB2 for iSeries™, and DB2 for Linux, UNIX, and Windows use a common base syntax, but implement a different subset. DB2 for Linux, UNIX, and Windows uses an abbreviated syntax that is actually much more concise than the syntax of either of the other two products.

Table partitioning benefits

The benefits of table partitioning in DB2 9 include:

Improved manageability: DB2 9 allows the various data partitions to be administered independently. For example, you can choose to back up and restore individual data partitions instead of entire tables. This lets you break down time-consuming maintenance operations into a series of smaller operations.

Increased query performance: The DB2 optimizer is data partition aware. Therefore, during query execution, only the relevant data partitions are scanned. Eliminating the need to scan data partitions that are not impacted by the query can result in improved performance.


Figure 1. Scanning only relevant partitions
Scanning only relevant partitions

Fast roll-in / roll-out: DB2 9 allows data partitions to be easily added or removed from the table without having to take the database offline. This ability can be particularly useful in a data warehouse environment where you often need to load or delete data to run decision-support queries. For example, a typical insurance data warehouse may have three years of claims history. As each month is loaded and rolled-in into the warehouse, the oldest month can be archived and removed (rolled-out) from the active table. This method of rolling out data partitions is also more efficient as it does not need to log delete operations, which would be the case when deleting specific data ranges.

Better optimization of storage costs: Table partitioning in DB2 9 lets you integrate better with hierarchical storage models. By only using your fastest and most expensive storage hardware for only the most active data partitions, DB2 9 allows you to optimize your overall storage costs and improve performance. If most of your queries only run against the last three months of data, you have to option to assign slower and less expensive storage hardware to older data.

Larger table capacity: Without partitioning, there are limits on the maximum amount of data a storage object, and hence a table, can hold. However, by dividing the contents of the table into multiple storage objects or data partitions, each capable of supporting as much data as in a non-partitioned table, you can effectively create databases that are virtually unlimited in size.

Greater index placement flexibility: DB2 9 allows indexes for partitioned tables to be stored in their own storage objects (table spaces), as opposed to being in the same storage object as the non-partitioned table. This index placement flexibility is particularly useful for performing faster index operations (such as drop index, online index create, and index reorganization), managing table growth, and reduced I/O contention, providing more efficient concurrent access to the index data for the table.

"We are very pleased with some of the new features we’ve tested in DB2 9 …The enhancements to ALTER TABLE work well, improve productivity, and save time. The ability to place indexes anywhere for partitioned tables is also particularly useful when we run out of space in the designated table space."
- Ellen Reys-Klebaner, Chief Database Architect, Visa

Create and use table partitions

DB2 provides a lot of flexibility for creating partitioned tables. Say you have one year’s worth of data and you want to partition it by date, so that each quarter resides in a separate data partition. The following create table syntax illustrates how to do this easily. You could also use the graphical DB2 Control Center for creating and managing data partitions.

CREATE TABLE orders(id INT, shipdate DATE, …)
  PARTITION BY RANGE(shipdate)
    (
    STARTING '1/1/2006' ENDING '12/31/2006' 
      EVERY 3 MONTHS
    )

This results in a table being created with four data partitions, each with three months of data.


Figure 2. Table with 4 data partitions
Table with 4 data partitions

As the following example illustrates, it is also possible to specify the data partition ranges explicitly, have open ended ranges, and give names to each data partition. Naming data partitions is useful for performing data partition operations such as DETACH (more on this to follow).

CREATE TABLE orders(id INT, shipdate DATE, …)
  PARTITION BY RANGE(shipdate)
    (
  PARTITION 4q05 STARTING MINVALUE,
    PARTITION 1q06 STARTING '1/1/2006',
    PARTITION 2q06 STARTING '4/1/2006',
    PARTITION 3q06 STARTING '7/1/2006',
    PARTITION 4q06 STARTING '10/1/2006' 
                  ENDING ‘12/31/2006'
    )

With table partitioning, you can easily roll-in and roll-out data instantaneously. The ATTACH and DETACH options in the ALTER TABLE command facilitate these operations. Using DETACH, an existing data partition or range of values is split off into a standalone table. The rolled-out (detached) table can be dropped, archived, moved to slower storage, and so on.

Now say you want to keep only one year’s (2006) data in the orders table created in the last example. You can DETACH the partition containing the old (pre-2006) data.

ALTER TABLE orders 
	DETACH PARTITION qold INTO oldorders

Similarly, attaching (rolling-in) data as a new data partition in a table is just as easy. The data that is to be rolled-in is first loaded into a separate (staging) table, transformed or cleansed if needed, and then attached to an existing table as shown in the following example.

// CREATE TABLE neworders
// load / insert desired data into neworders
// transform or cleanse new data if neeeded
  ALTER TABLE orders
  	ATTACH PARTITION 1q07
  	STARTING '01/01/2007'
  	ENDING   '03/31/2007'
  	FROM TABLE neworders 
// COMMIT 
// SET INTEGRITY …
// COMMIT

Before the rolled-in data is visible to applications, the SET INTEGRITY statement needs issued (to validate the new data and perform maintenance for global indexes) and the work commited.

Figure 3 shows the rolled-in and rolled-out data partitions in the orders table used in the previous examples.


Figure 3. Rolled-in and rolled-out data partitions
Rolled-in and rolled-out data partitions

Table partitioning and other organization schemes

Table partitioning in DB2 9 can be used in isolation or in combination with other data organization schemes. Each clause of the CREATE TABLE statement includes an algorithm to indicate how the data should be organized. The following three clauses demonstrate the levels of data organization that can be used together in any combination:

  • DISTRIBUTE BY -- Spreads data evenly across database partitions. Use this clause to enable intraquery parallelism and distribute the load across each database partition. This concept is known as database partitioning and is enabled using the Database Partitioning Feature (DPF) in DB2.
  • PARTITION BY -- Groups rows with similar values of a single dimension in the same data partition. This concept is known as table partitioning.
  • ORGANIZE BY -- Groups rows with similar values on multiple dimensions in the same table extent. This concept is known as multidimensional clustering (MDC).

This syntax allows for consistency between the clauses as well as allowing for future algorithms of data organization. Combining the DISTRIBUTE BY and PARTITION BY clauses of the CREATE TABLE statement allows data to be spread across database partitions spanning multiple table spaces.

DB2 9 is the first data server to support all three methods of grouping data at the same time. This is a major innovation in improving data management and information availability.

Figure 4 illustrates all three DB2 data organization schemes being used in conjunction with each other:


Figure 4. Three DB2 data organization schemes
Three DB2 data organization schemes

Conclusion

Table partitioning in DB2 9 provides powerful capabilities for managing large volumes easily and quickly. You are encouraged to download DB2 9 and try out this feature for yourself to see how you can benefit from this and other enhanced features in this new version of DB2.



Resources

Learn

Get products and technologies
  • Download DB2 9 and try it out today.

  • Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edtion and provides a solid base to build and deploy applications.

  • Build your next development project with IBM trial software, available for download directly from developerWorks.


Discuss


About the author

Rav Ahuja is a worldwide DB2 program manager based at the IBM Toronto Lab. He has been working with DB2 for Linux, UNIX, and Windows since version 1 and has held various roles in DB2 development, technical support, marketing, and product strategy. He works with customers and partners around the globe helping them build and benefit from DB2 and services-based solutions. Rav is a frequent contributor to DB2 papers, articles and books. He holds a Computer Engineering degree from McGill University and MBA from University of Western Ontario.




Rate this page


Please take a moment to complete this form to help us better serve you.



 


 


Not
useful
Extremely
useful
 


Share this....

digg Digg this story del.icio.us del.icio.us Slashdot Slashdot it!



Back to top