Level: Introductory
Hui Liao (huiliao@us.ibm.com), IBM Red Brick Warehouse development, IBM Silicon Valley Lab
17 Jul 2003
This article uses a case study to demonstrate the details of the backup and restore functions provided by the IBM Red Brick Warehouse Table Management Utility. Then it takes you a step further, discussing problem solving and the integration of TMU backup and restore operations with a storage management system.
© 2003 International Business Machines Corporation. All rights reserved.
Important: Read the disclaimer before reading this article.
Introduction
In Version 6.20, IBM® Red Brick® Warehouse implemented a full-featured Backup and Restore system inside the Table Management Utility (TMU). The TMU supports online and checkpoint backups, full and incremental backup levels, and automatic data restores. You can back up data directly to disk files or tape devices, or you can use a storage management system that is compliant with the X/Open Backup Services API (XBSA).
After a brief look at the backup and restore functionality, this article demonstrates how to configure and use the backup and restore system through a case study, then discusses some problems you may run into and how to solve them. Finally, the article shows how to integrate TMU backup and restore operations with a storage management system, using examples of two products: Tivoli® Storage Manager and Legato Networker.
Overview
Backup levels
The TMU supports three levels of backups:
- Level 0, also called a full backup: a backup of all database objects.
- Level 1, also called an incremental backup: a backup of database objects that have changed since the last level 0 backup.
- Level 2, also called an incremental backup: a backup of database objects that have changed since the last backup of any level (0, 1, 2).
The TMU also supports external full backups, which are full backups performed outside of the Red Brick server, using external backup tools or an operating-system utility program.
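To make the three levels concrete, the following minimal Python sketch finds the earlier backup that a new backup is measured against. The function name and the list-of-levels representation are illustrative assumptions; they are not part of the TMU.

```python
from typing import List, Optional


def reference_backup(history: List[int], new_level: int) -> Optional[int]:
    """Return the index in `history` of the backup that a new backup of
    `new_level` records changes against, or None for a full backup.

    `history` holds the levels of earlier backups, oldest first.
    """
    if new_level == 0:
        return None  # a level 0 backup copies every object
    if 0 not in history:
        raise ValueError("an incremental backup needs an earlier level 0 backup")
    # Level 1 looks back to the last level 0 backup;
    # level 2 looks back to the last backup of any level.
    accepted = {0} if new_level == 1 else {0, 1, 2}
    for index in range(len(history) - 1, -1, -1):
        if history[index] in accepted:
            return index
    return None  # not reached: history contains a level 0 backup


# Levels taken so far in a hypothetical week: one full backup, then three level 2 backups.
week = [0, 2, 2, 2]
print(reference_backup(week, 1))  # position of the level 0 backup
print(reference_backup(week, 2))  # position of the most recent backup
```

A level 1 backup therefore grows as the week goes on, while each level 2 backup only covers the changes made since the previous backup.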
Backup modes
The TMU supports two backup modes: online and checkpoint. An online backup takes place while the database is available for both read and write operations. A checkpoint backup takes place while the database is available for read-only operations but not for write operations.
Online backups give greater flexibility in backup scheduling and eliminate downtime of the database system. Checkpoint backups provide consistent backup images and are required for a data restore.
Restore operations
The TMU supports various types of restores:
- A database restore restores all objects in the database.
- A partial restore restores a specific segment or physical storage unit (PSU).
- A cold restore is a full restore for a database that cannot be brought online.
- A foreign restore restores data from an external backup.
The IBM Red Brick Warehouse Table Management Utility Reference Guide provides detailed explanations for backing up and restoring databases with the TMU. This article goes beyond the scope of the product documentation by presenting a specific case study that includes some performance data, as well as some facts and tips for successfully backing up and restoring large warehouse databases.
Before you start
Before you start running TMU backups, you need to establish a backup strategy, according to the database configuration and backup/restore solution requirements, then choose your backup media.
Establish backup strategy
In order to understand the issues, let's look at an example case study.
Example database configuration:
- The total size of the database is about 530 GB, with 198 segments and 566 PSUs.
- The database is modified (with versioning on) daily (Monday through Sunday) from 5 a.m. to 11 p.m.
- No database modifications are expected between 12 a.m. and 4 a.m. daily.
- Most daily modifications affect the same group of segments. The estimated percentage of the database that is modified is up to 15% (~80 GB) every day, and up to 20% (~106 GB) over the course of a week.
Backup/restore solution requirements:
- There should be no database downtime during the expected modification time frame (5 a.m. to 11 p.m. daily).
- During non-busy hours at night (12 a.m. to 4 a.m.), the performance of queries on the database is not a concern.
- Modifications of the database should be backed up daily, and as many of them as possible should be recoverable after a disaster.
- If a database restore is required, the restore time should not exceed 24 hours.
Backup strategy:
According to the database configuration and backup/restore requirements, the following backup strategy is established:
- Schedule a level-0 backup every Monday starting at 12 a.m. Since a level 0 backup of 530 GB of data is time-consuming, online backup mode will be used to avoid any database downtime.
- Schedule incremental backups daily. An incremental backup will not only capture changes to the database, it will also take much less time and consume much less space than a full backup. Here is the daily incremental backup schedule:
  - Schedule a level-2 backup every night (except on Mondays and Fridays; see below) starting at 12 a.m. when no database modification is expected. Checkpoint backup mode will be used to guarantee a restore point. This daily backup will capture all database changes during the day.
  - Schedule a level-1 backup every Friday night starting at 12 a.m. This backup will capture all database changes since the level-0 backup performed on Monday. The purpose of weekly level-1 backups is to minimize the database restore time.
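The weekly plan above can be written down as a small lookup. This is a minimal Python sketch; the function name is hypothetical, and the checkpoint mode for the Friday level-1 backup is an assumption, since the strategy only states the mode for the level-0 and level-2 backups.

```python
from datetime import date, timedelta
from typing import Tuple


def planned_backup(day: date) -> Tuple[int, str]:
    """Return (level, mode) of the backup that starts at 12 a.m. on `day`."""
    weekday = day.weekday()  # Monday is 0, Sunday is 6
    if weekday == 0:
        return 0, "online"      # weekly full backup, no downtime
    if weekday == 4:
        return 1, "checkpoint"  # weekly level 1; mode is an assumption
    return 2, "checkpoint"      # nightly level 2 with a restore point


# Print the plan for one week, starting on Monday, April 14, 2003.
monday = date(2003, 4, 14)
for offset in range(7):
    day = monday + timedelta(days=offset)
    level, mode = planned_backup(day)
    print(day.strftime("%A"), "level", level, mode)
```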
Choose backup media
Depending on your business need, you can back up your data directly to disk files or tape devices, or use a storage management system such as Informix® Storage Manager, Legato, or Tivoli.
Direct backups to disks or tape devices are simple to operate and also cost-effective for small and medium-size databases. For very large databases, consider using a storage management system. A storage management system manages the backup media for you and offers features such as data compression and fail-over that the TMU does not yet provide.
In this example, the choice is to back up data directly to disk files to simplify operations.
Configuration
After deciding on a backup strategy and choosing the backup media, configuring the TMU backup/restore system is quite easy. Simply follow these steps:
Now the configuration is complete, and you are ready to run backup operations.
Backing up data
Using the TMU command:
Backup/restore operations are done through commands specified in TMU control files. For example, to run a level 0 online backup to directory /backup_dir0, the TMU control file backup_l0.tmu contains the following line:
backup to directory '/backup_dir0' online level 0;
You simply run the command rb_tmu backup_l0.tmu admin adminpwd to start a level 0 backup (where admin and adminpwd are the DBA's username and password).
For incremental backups, just specify the appropriate command in the TMU control file. For example, for a level 1 checkpoint backup:
backup to directory '/backup1' checkpoint level 1;
For a level 2 checkpoint backup:
backup to directory '/backup2' checkpoint level 2;
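If you generate control files from a script, a helper along the following lines keeps the statements consistent. This is a minimal Python sketch: the helper names are hypothetical, and the statement and command layouts simply mirror the examples shown above rather than the full TMU syntax.

```python
from typing import List


def backup_statement(directory: str, mode: str, level: int) -> str:
    """Build a backup statement in the form used by the examples above."""
    if mode not in ("online", "checkpoint"):
        raise ValueError("mode must be 'online' or 'checkpoint'")
    if level not in (0, 1, 2):
        raise ValueError("level must be 0, 1, or 2")
    return f"backup to directory '{directory}' {mode} level {level};"


def tmu_command(control_file: str, user: str, password: str) -> List[str]:
    """Build the argument list for running rb_tmu with a control file."""
    return ["rb_tmu", control_file, user, password]


print(backup_statement("/backup1", "checkpoint", 1))
print(" ".join(tmu_command("backup_l1.tmu", "admin", "adminpwd")))
```

The argument list returned by tmu_command can be passed to subprocess.run on a machine where rb_tmu is installed; backup_l1.tmu is a placeholder file name.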
Performance:
The performance of backup (and restore) operations is I/O bound. The operation of backup/restore is serialized; the TMU does not yet support multiple concurrent backup/restore sessions. This also means that the TMU cannot utilize concurrent backup/restore operations provided by the underlying storage management system.
In this case, data was backed up directly to disk files on the local machine. The backup operation was run on a 64-bit Solaris Sparc9 (750 MHz), and the average backup/restore rate was about one gigabyte per minute:
- About 7.5 hours to finish a level 0 backup (530 GB data, as shown in the following rb_tmu output)
- About 1.1 hours for a level 2 backup (80 GB data)
- About 1.5 hours for a level 1 backup (106 GB data).
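The rate behind these three figures can be checked with a few lines of Python. This is only a worked calculation on the sizes and durations listed above; the function name is illustrative.

```python
def gb_per_minute(size_gb: float, hours: float) -> float:
    """Average throughput for a backup of `size_gb` that took `hours`."""
    return size_gb / (hours * 60)


# (size in GB, elapsed hours) as reported for this case study
measurements = {
    "level 0": (530, 7.5),
    "level 2": (80, 1.1),
    "level 1": (106, 1.5),
}

for name, (size_gb, hours) in measurements.items():
    print(f"{name}: {gb_per_minute(size_gb, hours):.2f} GB/min")
```

All three backups work out to roughly 1.2 GB per minute, which is consistent with the I/O-bound behavior described above: the rate stays about the same regardless of the backup level.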
Here is the output of the TMU operation for the level 0 backup:
(C) Copyright IBM Corp. 1991-2002. All rights reserved.
Version 06.20.0000(0)TST
** INFORMATION **
(523) Backup of database '/perf/local/33/huiliao/toucan_db'
with backup level 0, backup type ONLINE,
and backup media DIRECTORY started. ...
** INFORMATION ** (7051)
Backup to '/backup_dir0/rb_bar_liao_toucan_db.20030413. 043719.00022549.0547'
started on Sunday, April 13, 2003 4:37:19 AM.
** INFORMATION ** (7061)
Backup to '/backup_dir0/rb_bar_liao_toucan_db.20030413. 043719.00022549.0547'
completed on Sunday, April 13, 2003 4:37:19 AM.
** INFORMATION ** (7087)
Backup of the database /perf/local/33/huiliao/toucan_db
completed successfully on Sunday, April 13, 2003 4:37:20 AM.
** STATISTICS ** (500)
Time = 04:56:14.36 cp time, 07:27:03.36 time,
Logical IO count=69645752, Blk Reads=2640848, Blk Writes=2474915
The output from the "time" utility for the above rb_tmu command is as follows:
real 7:27:05.4 user 18:49.8 sys 4:37:24.6
If you are using a storage manager, data compression provided by the storage manager can help improve performance. (Refer to the following section on backup data compression.) Depending on the degree of compression, the performance improvement can be quite significant.
For example, we tested full TMU backup performance with an 8.5-gigabyte database using Tivoli Storage Manager on an AIX 4.3 64-bit platform (CPU: ~370MHz). We ran the same backup with and without data compression. The TSM client and server were running on the same machine. Backup storage was locally attached. The time of a full backup with data compression was only about 1/3 of the time of a full backup without compression. The results of the test are shown in the following table:
| | Directly to local disk files | TSM, local disk files, no data compression | TSM, local disk files, data compression |
| Full backup time | 43 minutes | 48 minutes | 17 minutes |
Backup data compression:
The TMU does not yet support backup data compression. Therefore, to back up directly to disk or tape, you need a little bit more backup space than the size of the data being backed up. The extra space is used to store metadata information for the PSUs being backed up. This overhead is rather small: from 16 KB to 40 KB per PSU. (For the 530 GB database, the backup space overhead is around 4 MB.)
However, you can achieve data compression outside of the TMU backup/restore system by using one of the following approaches:
- External full backup: If you use external backup tools for level 0 backups, external compression tools can be a good choice. For example, if you perform an external backup using the "tar" utility, you can compress the archive before writing it to the backup location:
- Compression feature provided by storage managers: Most storage management systems provide a compression feature for backup data. For example, on Tivoli Storage Manager, you can enable compression for a client node by checking Client Compression Setting in the Tivoli Server Administration Web Interface before you start a backup operation. This allows backup data sent from the TMU to be compressed before it is sent to the storage manager. In our tests, we saw approximately an 8:1 compression ratio on Tivoli: to back up a database with 8.3 GB data, the backup space used by Tivoli was about 1.02 GB.
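The compression figures reported in this section can be reproduced with a short calculation. The Python sketch below only restates the measured numbers (8.3 GB stored in about 1.02 GB, and 17 minutes versus 48 minutes for a full backup through TSM); the variable names are illustrative.

```python
# Space: uncompressed data size versus the space used by Tivoli, in GB
data_gb = 8.3
stored_gb = 1.02
space_ratio = data_gb / stored_gb

# Time: full backup through TSM with and without compression, in minutes
compressed_minutes = 17
uncompressed_minutes = 48
time_fraction = compressed_minutes / uncompressed_minutes

print(f"compression ratio: about {space_ratio:.1f}:1")
print(f"compressed backup takes {time_fraction:.0%} of the uncompressed time")
```

Both results match the text: roughly an 8:1 space ratio, and a compressed backup that takes about one third of the uncompressed time. Results on other data will depend on how compressible that data is.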
Restoring data
A database restore restores all objects in the database. By default, the TMU restores the database up to the last checkpoint backup. For example, we scheduled a database restore for the 530 GB database on Sunday morning at 8am. Before starting the restore, we stopped all activities against the database. Then we simply executed a TMU control file with the following command:
The restore path is as follows: first restore from the level 0 backup performed on Monday at 12 a.m., then restore from the latest level 1 backup performed on Friday at 12 a.m., followed by the level 2 backup performed on Saturday at 12 a.m. The total restore time was (7 + 1 + 1.4) = 9.4 hours (as shown below). This meets our established restore time requirement (not to exceed 24 hours).
(C) Copyright IBM Corp. 1991-2002. All rights reserved.
Version 06.20.0000(0)TST
** INFORMATION ** (7054)
Starting database restore of database /perf/local/33/huiliao/toucan_db. ...
** INFORMATION ** (7044)
Completed restore from /Backup_dir2/rb_bar_liao_toucan_db.20030413. 091358.00004299.0001
on Sunday, April 13, 2003 6:22:05 PM.
** INFORMATION ** (560)
Restore process will re-start the database /perf/local/33/huiliao/toucan_db now.
** INFORMATION ** (7088)
Restore of the database /perf/local/33/huiliao/toucan_db
completed successfully on Sunday, April 13, 2003 6:22:45 PM.
** STATISTICS ** (500)
Time = 05:41:04.68 cp time, 09:24:08.91 time,
Logical IO count =94087412, Blk Reads=3330116, Blk Writes=3443960
Here is the output from the "time" utility for the above rb_tmu command:
real 9:24:08.9 user 19:49.1 sys 5:23:15.6
The restore rate is similar to the backup speed. The restore time depends on the amount of data to be restored and the disk/tape I/O speed on the machine where the restore is running.
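The restore path used in this case study (the level 0 backup, then the latest level 1 backup, then the level 2 backups that follow it) can be derived mechanically from the backup history. The Python sketch below is a simplified model with hypothetical names; it ignores the distinction between online and checkpoint backups, which the TMU takes into account when it chooses the restore point.

```python
from typing import List, Tuple

Backup = Tuple[str, int]  # (label, level), for illustration only


def restore_chain(history: List[Backup]) -> List[Backup]:
    """Return the backups to apply, in order. `history` is oldest first."""
    levels = [level for _, level in history]
    if 0 not in levels:
        raise ValueError("no level 0 backup to start from")
    # Start from the most recent level 0 backup.
    start = len(levels) - 1 - levels[::-1].index(0)
    chain = [history[start]]
    later = history[start + 1:]
    # The latest level 1 backup covers every change since the level 0 backup.
    last_l1 = max((i for i, (_, lvl) in enumerate(later) if lvl == 1), default=None)
    if last_l1 is not None:
        chain.append(later[last_l1])
        later = later[last_l1 + 1:]
    # Every level 2 backup after that point is needed, in order.
    chain.extend(backup for backup in later if backup[1] == 2)
    return chain


week = [("Mon", 0), ("Tue", 2), ("Wed", 2), ("Thu", 2), ("Fri", 1), ("Sat", 2)]
print(restore_chain(week))

# Restore times reported above, in hours, checked against the 24-hour requirement.
total_hours = sum([7, 1, 1.4])
print(round(total_hours, 1), total_hours <= 24)
```

Without the Friday level 1 backup, the chain would include every level 2 backup taken since Monday, which is why the weekly level 1 backup shortens the restore.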
Some administration tips
Backup media history file manipulation:
The text-based backup media history file ($RB_CONFIG/bar_metadata/<db_name>/rbw_media_history) contains backup information for all PSUs being backed up. The size of this file grows as you continue to run backup operations. If your operating system has a file size limitation, you should check whether this file is reaching the size limit; when the file size limit is reached, you will not be able to perform further backup operations. When this happens, you must remove records of those backups you no longer need from the file before resubmitting the backup command.
You can do this by directly editing the file with a standard tool (such as vi or emacs). You must be very careful when you directly edit this file; refer to the TMU Reference Guide for detailed instructions. Another option is to use the TMU load utility to load the file into a table, delete the unwanted records using an SQL statement, and then use an SQL EXPORT statement to export the modified records back to the media history file.
During the record removal process, get the list of backup files (or XBSA copy IDs if you use a storage manager) that correspond to the removed records. After you have removed records from the media history file, you can delete corresponding backup files (or XBSA objects) for those removed records to free your backup space.
The following example demonstrates how to use the load utility and EXPORT statement to remove media history records. In this case, we want to remove all of the backup records for operations on or before 04/25/2003:
- Save the original rbw_media_history file.
- Create a table with columns defined to match the media history record format (refer to the TMU Reference Guide for details):
- Load the media history file into the above table media_history, using the following TMU control file:
- Query the table media_history and make sure that it is safe to remove all records before 04/25/2003: after removal, the rest of the media history records can be used for further backup and restore operations. For example, in this case, we want to be able to perform database restores as well as incremental backup operations after removing the records. We need to make sure that there was a level 0 backup performed after 04/25/2003, and that there was a checkpoint backup performed after the level 0 backup. Since each sequence number in the media history record uniquely identifies a backup operation, we used the seqnum column in the query:
- Get the list of backup media IDs of those records to be removed and save the list to a file ("mediaID_list"):
- Remove records from the media_history table: all records with a sequence number less than 30 (the sequence number of the first valid level 0 backup after removal) can be removed.
- Export the media_history table to the media history file rbw_media_history:
- Check that the contents of the updated rbw_media_history file look fine: for example, there are valid level 0 backups and checkpoint backups. We also ran a level-2 backup just to make sure the updated file was valid.
- Remove the list of backup files stored in the mediaID_list file to free up backup space.
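The selection logic behind these steps (keep everything from the first valid level 0 backup onward, collect the media IDs of what is dropped, and confirm that the remainder is still usable) can be sketched in Python. The record layout below is a deliberately simplified, hypothetical stand-in; the real media history record format is described in the TMU Reference Guide, and the sample values are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MediaRecord:
    """Hypothetical, simplified view of one media history record."""
    seqnum: int    # identifies the backup operation
    level: int     # 0, 1, or 2
    mode: str      # "online" or "checkpoint"
    media_id: str  # backup file name or XBSA copy ID


def prune(records: List[MediaRecord], first_valid_seqnum: int) -> Tuple[List[MediaRecord], List[str]]:
    """Split records into those to keep and the media IDs that can be deleted."""
    kept = [r for r in records if r.seqnum >= first_valid_seqnum]
    removed_ids = sorted({r.media_id for r in records if r.seqnum < first_valid_seqnum})
    # Safety checks from the procedure above: a level 0 backup must remain,
    # followed by at least one checkpoint backup.
    full_seqnums = [r.seqnum for r in kept if r.level == 0]
    if not full_seqnums:
        raise ValueError("no level 0 backup would remain")
    first_full = min(full_seqnums)
    if not any(r.mode == "checkpoint" and r.seqnum > first_full for r in kept):
        raise ValueError("no checkpoint backup would remain after the level 0 backup")
    return kept, removed_ids


history = [
    MediaRecord(28, 2, "checkpoint", "old_a"),
    MediaRecord(29, 2, "checkpoint", "old_b"),
    MediaRecord(30, 0, "online", "full_30"),
    MediaRecord(31, 2, "checkpoint", "incr_31"),
]
kept, removed_ids = prune(history, 30)
print([r.seqnum for r in kept], removed_ids)
```

Running the checks before touching the real file mirrors the advice above: remove records only when the remaining history still supports restores and further incremental backups.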
When a backup fails:
If there is a failure during a backup operation, all the data that was backed up before the failure point is still valid. If the failed backup was a checkpoint backup, the TMU will mark the backup mode of all PSUs that have already been backed up successfully before the failure point as online. This feature provides optimum error recovery, especially in situations where the failure occurs at the middle or end of a long-running backup operation. In such cases, all you need to do is reissue the backup command, and the TMU will back up only the data that had not yet been backed up at the failure point.
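Because reissuing the command is all that is needed, a scheduling script can simply retry a failed backup. The Python sketch below is a minimal, hypothetical wrapper; it assumes that the function you pass in returns 0 on success, and whether the rb_tmu exit status behaves that way is an assumption you should verify in your environment.

```python
from typing import Callable


def backup_with_retry(run_backup: Callable[[], int], max_attempts: int = 3) -> int:
    """Call `run_backup` until it returns 0; return the number of attempts used."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        if run_backup() == 0:
            return attempt
    raise RuntimeError(f"backup still failing after {max_attempts} attempts")


# Stand-in for a real backup run: fails once, then succeeds.
outcomes = iter([1, 0])
attempts_used = backup_with_retry(lambda: next(outcomes))
print("attempts used:", attempts_used)
```

In practice, run_backup could wrap a subprocess call to rb_tmu with the same control file each time; investigate the cause of a failure before retrying blindly.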
When your database is completely corrupted:
If you have backup data for the database and the backup metadata directory ($RB_CONFIG/bar_metadata/database_name) is intact (or can be restored from a backup copy), a cold restore can be your savior. For example, if my database path was /redbrick/DB and I accidentally removed all the files under this directory, these are the steps I would use to restore the database:
- Verify that the environment variables RB_PATH and RB_CONFIG are set to the same values as were set for the lost database. Also use the same locale (default English locale in this case).
- Re-create the database:
- Since the backup metadata directory for the lost database is intact, no further action needs to be taken. Otherwise, you would need to restore this directory from a backup copy.
- Start a database restore using a TMU control file with the following command:
The backup segment is restored as part of the database restore, so you can start incremental backups immediately after the database is restored successfully:
RISQL> select BACKUP_SEGMENT from dst_databases;
BACKUP_SEGMENT
BACKUPSEG
Integration with a storage management system
Several storage management products have implemented the X/Open Backup Services API (XBSA), such as IBM Tivoli Storage Manager and Legato Networker. TMU backup/restore is fully integrated with storage managers through XBSA APIs. Users perform backup/restore operations using TMU commands, and the underlying storage management system controls and manages the location and contents of the backups.
Before you start using TMU backup/restore with an XBSA-compliant storage management system, you need to first install and configure the storage manager and the modules required for the XBSA interfaces. In our experience with Tivoli and Legato, vendors implement the XBSA API somewhat differently, so the configuration steps vary slightly from one storage management system to another.
The IBM Red Brick Warehouse development team has tested two storage management systems (Tivoli Storage Manager and Legato Networker). If you run into problems when using other XBSA-compliant storage management systems with the TMU, contact the IBM Red Brick Warehouse Technical Support team for assistance.
The following sections take you step-by-step through the configuration process we used on Tivoli and Legato. The process is based on personal experience and should be used only for reference purposes. Refer to the storage manager documentation for complete configuration guidelines.
Using Tivoli Storage Manager: (Version 5, Release 1, on an AIX machine)
Install Tivoli SM software:
The required software includes: Tivoli Storage Manager Server, Tivoli Storage Management Device Support, and Tivoli Storage Manager Backup-Archive Client.
- Install Tivoli Storage Manager Server and Device Support on machine host1.
- Start the TSM server (dsmserv).
- Install the Backup-Archive client on machine client1 where the Red Brick Warehouse server is installed. In this case, the XBSA library is installed under /usr/tivoli/tsm/client/Informix/bin64/bsahr10.o.
Configure Tivoli SM:
After installation, Tivoli SM provides a default configuration that you can use with TMU backup/restore. However, the default setting may not be sufficient for your needs. These are the steps we followed to customize our configuration.
- Log into the TSM Server Administration Web Interface page: http://host1:1580
- Create backup storage: (In this case, we used disk files as the backup media.)
  - Create a storage pool: Go to Object View -> Server Storage -> Storage Pools -> Disk storage pools. We created a disk storage pool "BAR_DISKPOOL" by clicking Disk Storage Pools, and choosing Operations "Define a new disk storage pool".
  - Create volumes for the above-defined storage pool: Go to Object View -> Server Storage -> Storage Pools -> Disk storage pools -> BAR_DISKPOOL -> Volumes. Choose Operations "Define a disk storage pool volume" and enter volume name "/BAR/storage/backup_dir/file1"; this is the disk file that will be used for TMU backups. Define as many volumes as you need.
- Create a backup policy and a new client node:
  - Define a policy domain: Object View -> Policy Domains: Options (Define Policy Domain). We defined a policy domain called "BARPolicy".
  - Define a policy set and its management class: Object View -> Policy Domains -> BARPolicy -> Policy Sets: Options (Define Policy Set). The policy set we defined is "BARPolicySet." Use a similar procedure to define "Management Classes" of this policy set. The management class we defined is "BARPolicyMagmt."
  - Assign the storage pool defined above (BAR_DISKPOOL) to the policy set just defined: Object View -> Policy Domains -> BARPolicy -> BARPolicySet -> Management Classes -> BARPolicyMagmt -> Backup Groups -> Define Backup Group. Set Choose Copy Destination to "BAR_DISKPOOL."
  - Assign the above-defined management class "BARPolicyMagmt" as the Default Management Class.
  - Validate and Activate the above-defined policy set "BARPolicySet."
  - Register a new client node for the above-defined policy domain: Object View -> Policy Domains -> BARPolicy -> Client Nodes: Operations "Register a new node." Set the Policy Domain Name to the above-defined policy domain "BARPolicy." To use compression, set the Client Compression Setting to "YES." The node name entered here will later be used to set the BAR_SM_USER configuration option for TMU backup/restore. The node name we entered is "BAR_CLIENT."
Environment variables and error log file on client machine:
- DSMI_DIR: points to the client installation directory. You can use this environment variable if the client software is installed in a nonstandard location. (For example, in my case, the client installation directory is in a standard location: /usr/Tivoli/tsm/client/api/bin64.)
- DSMI_CONFIG: points to the configuration file (dsm.opt) that specifies which TSM server the client connects to. You can get the sample file (dsm.opt.smp) from the client installation directory. The server name specified in file dsm.opt must point to a server name defined in the file dsm.sys.
- dsierror.log: TSM errors (if any) are logged into this file. By default, the file is created in the directory where the XBSA application (rb_tmu in our case) is run. Make sure this directory has appropriate write permission.
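Before running rb_tmu against TSM, it can help to check these settings from a script. The following Python sketch is a hypothetical preflight check, not a TSM or TMU tool: it only verifies that the paths named by DSMI_DIR and DSMI_CONFIG exist when those variables are set, and that the directory where rb_tmu will run is writable so that dsierror.log can be created.

```python
import os
from typing import List, Mapping


def tsm_client_problems(env: Mapping[str, str], run_dir: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    config = env.get("DSMI_CONFIG")
    if config and not os.path.isfile(config):
        problems.append(f"DSMI_CONFIG points to a missing file: {config}")
    api_dir = env.get("DSMI_DIR")
    if api_dir and not os.path.isdir(api_dir):
        problems.append(f"DSMI_DIR points to a missing directory: {api_dir}")
    # dsierror.log is created in the directory where rb_tmu is run.
    if not os.access(run_dir, os.W_OK):
        problems.append(f"cannot write dsierror.log in {run_dir}")
    return problems


if __name__ == "__main__":
    for problem in tsm_client_problems(os.environ, os.getcwd()):
        print("WARNING:", problem)
```

The check does not parse dsm.opt or dsm.sys; matching the server names in those two files still has to be verified by hand.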
Configure TMU backup/restore:
- Verify connectivity with the TSM server by using the Red Brick barxbsa utility: You can run barxbsa from any TSM client machine. Make sure the file dsm.opt on the TSM client machine points to the correct TSM server name. For example, in my case, the TSM server runs on machine host1. The TSM client is installed on machine client1 (where the Red Brick server is installed). The files dsm.opt and dsm.sys on the client machine contain the following lines:
- Update the rbw.config file to use TSM:
Now you are ready to run TMU backup and restore operations. For example, to perform an online level 0 backup:
backup to XBSA online level 0;
Using Legato Networker: (Version 6.0.1, on a Solaris machine)
Install Legato software
The required Legato software includes: Legato Networker Server, Legato Networker Client, and Informix Networker module.
- Install Legato Networker Server software on machine host2.
- Install Legato Networker Client software on the machine client2 where the Red Brick server is installed.
- Extract ("untar") the Informix Networker module on machine client2 to directory /work/Legato/client/. The XBSA library libxnmi.so.1 is the one used for TMU backup/restore.
Configure Legato Networker:
- Start the Legato Networker server:
/etc/init.d/networker start
Usually, there is a script in an rc directory to enable automatic startup on boot.
- Start the Legato NW GUI administrator tool: nwadmin
- Grant user "redbrick" Administrator privilege (user "redbrick" is the user who runs rb_tmu backup/restore operations): Click server to bring up the Server window; for the Administrator option, fill in username "redbrick" and click Add.
- Create backup media: The list of available tape devices should be shown automatically under the "Devices" window. You need to mount and label the tape device you choose for backup media. In this case, we used disk files as the backup media: Media -> Devices -> Create: Name: Path name of the backup directory (/BAR/Legato/Backup_dir in this case). Media type: file.
- Create a backup media pool: Media -> Pools -> Create: Name: Enter a pool name. For example, we defined a pool "BARpool" for TMU backups. If you don't want to create a new pool, you can just use the "Default" pool.
- Label the device: Bring up the Label window, specify a volume name, and pick a volume pool name. For example, we checked "BARpool" as created above.
- Mount the device: Highlight the device name and click the Mount button.
Set environment variables:
- NSR_DATA_VOLUME_POOL: This environment variable indicates the volume pool that the backup will use. By default, pool "Default" will be used. To target specific volumes, set this variable to a specific pool that was predefined. In our case, we used the above-defined media pool "BARpool" for backup operations:
- NSR_DEBUG_LEVEL: Turn on XBSA tracing. For example:
- NSR_DEBUG_FILE: Specify the file that the trace information is written to.
- NSR_SERVER: Points to the server that the client is to connect to.
- NSR_NO_BUSY_ERRORS: If the NSR server is not running, the connection does not usually time out with an error; instead it keeps retrying the connection to the storage manager daemon. To prevent the NSR client from retrying the connections if the server does not respond, set this variable to TRUE. This will cause the rb_tmu to fail if the server doesn't respond.
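When rb_tmu is launched from a script, these variables can be set for that one process rather than for the whole login session. The Python sketch below is a minimal illustration with a hypothetical helper name; it covers only the three variables whose values are clear from the description above (server, volume pool, and the fail-fast flag) and leaves the tracing variables out.

```python
import os
from typing import Dict, Mapping, Optional


def legato_environment(
    server: str,
    pool: str = "Default",
    fail_fast: bool = True,
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with the Networker variables set."""
    env = dict(os.environ if base is None else base)
    env["NSR_SERVER"] = server            # Networker server to connect to
    env["NSR_DATA_VOLUME_POOL"] = pool    # volume pool used for backups
    if fail_fast:
        # Fail instead of retrying forever when the server does not respond.
        env["NSR_NO_BUSY_ERRORS"] = "TRUE"
    return env


# "host2" stands for the machine running the Networker server in this article.
env = legato_environment("host2", pool="BARpool", base={})
for name in sorted(env):
    print(f"{name}={env[name]}")
```

The resulting dictionary can be passed as the env argument of subprocess.run when starting rb_tmu, so that only that process sees the settings.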
Configure TMU backup/restore:
- Verify connectivity with the NSR server by using the Red Brick barxbsa utility:
- Update the rbw.config file to use Legato Networker:
Now you are ready to run TMU backup and restore operations.
Conclusion
The IBM Red Brick Warehouse backup and restore system supports full and incremental backups and automatic data restores. It provides a checkpoint backup mode, as well as an online backup mode that eliminates database downtime. The TMU supports direct backup to disk files and tape devices, as well as full integration with XBSA-compliant storage management systems. TMU backup and restore operations are easy to configure and administer, as our case study with a large warehouse database demonstrated. We also covered backup performance, potential administration problems and solutions, and the configuration requirements for the Tivoli and Legato storage management systems.
Acknowledgements
I would like to convey my sincere thanks to Qi Jin, Sriram Srinivasan, Bob Rumsby, Christine Smith, and Kari Kelly for their insightful comments and suggestions on this article.
Disclaimer
The configuration information contained in this document has not been submitted to any formal IBM test. Anyone attempting to adapt these techniques to their own environments does so at his or her own risk. The use of this information or the implementation of any of these techniques is derived under specific operating and environmental conditions. While the information has been reviewed for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere.
About the author
Hui Liao is a software developer at the IBM Silicon Valley Lab in San Jose, California. She has worked in the backup and restore, kernel, and data-loading area for IBM Red Brick Warehouse for the past three years. You can reach Hui Liao at huiliao@us.ibm.com.