Continuous Data Protection

What is it and how is it best used?

Level: Intermediate

Chris Stakutis (chris.stakutis@us.ibm.com), CTO VitalFile & SANergy, IBM/Tivoli

17 Oct 2005

Continuous Data Protection (CDP) is a new style of data protection ("backup"). Traditional backup occurs once per day (or far less frequently for mobile and home users) and captures files only as they existed at the time of the backup. Changes made throughout the work day are lost. Many different flavors of CDP are starting to emerge in the market, and each has a different value proposition.

Introduction

The computer "backup" world is going through some exciting innovations. A few years ago we started hearing about "disk-based backup," which greatly improved the overall performance of backup (faster backups, faster restores). Customers have also often used some type of replication as a "style" of data protection. Continuous Data Protection (CDP) is something of a blend of those two approaches. Specifically:

  • CDP continually captures all changes (akin to replication)
  • But it also tags (versions) objects so that they can be rolled back to a specific point in time.

While disk-based backup offers faster backup and restore times, it does nothing to improve recovery-point objectives; that is, the backup interval is still typically once per day. The main point of CDP is to provide nearly infinite recovery points so that any change can be recovered. This is proving to be of huge interest to all computer users, from corporate IT administrators down to individual home users. The traditional style of backup is now starting to look like seat belts, which only work if you use them and only in limited cases; the new world wants air bags.
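The difference in recovery points can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from any product: the function name `changes_lost`, the save times, and the 01:00 backup time are all assumptions chosen for illustration. It counts the saves that cannot be recovered after a failure, given a list of recovery points.

```python
from datetime import datetime, timedelta

def changes_lost(save_times, recovery_points, failure_time):
    """Return the saves made after the last usable recovery point."""
    usable = [p for p in recovery_points if p <= failure_time]
    last_point = max(usable) if usable else None
    return [t for t in save_times
            if t <= failure_time and (last_point is None or t > last_point)]

# Illustrative work day: four saves, then a failure at 17:00.
day = datetime(2005, 10, 17)
saves = [day + timedelta(hours=h) for h in (9, 11, 14, 16)]
failure = day + timedelta(hours=17)

nightly = [day + timedelta(hours=1)]  # assumed: one backup at 01:00
cdp = saves                           # assumed: every save is a recovery point

lost_nightly = changes_lost(saves, nightly, failure)
lost_cdp = changes_lost(saves, cdp, failure)
print(len(lost_nightly), len(lost_cdp))  # 4 0
```

With a single nightly recovery point, every save made during the day is lost; when each save is itself a recovery point, none are.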



Taxonomy

There are three main types of CDP:

  • Block Based
  • File Based
  • Application Based

All have merits for various situations and customer needs. Let's start with a look at block-based solutions. Block-based approaches are very transparent; applications need not know that they are present. Most block-based solutions are "in fabric" (in the SAN fabric) and thus also work without regard to the type of server or storage. Quite simply, they see every block write go across the storage network and logically keep a time-ordered cache of those writes. Some solutions are quite sophisticated in their management of that cache, such that they can instantly present a "view" of a disk/LUN at any time represented in their cache (versus having to reassemble or roll through transactions in a costly manner).
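The idea of a time-ordered cache of block writes can be sketched in a few lines. This is a deliberately naive, in-memory illustration: the class `BlockJournal` and its methods are hypothetical names, and it rebuilds a view by replaying the log, which is exactly the costly roll-through that the more sophisticated products are said to avoid.

```python
import bisect

class BlockJournal:
    """Time-ordered log of block writes (hypothetical, in-memory sketch)."""

    def __init__(self):
        self._times = []   # write timestamps, kept in ascending order
        self._writes = []  # (block_number, data), parallel to _times

    def record(self, timestamp, block_number, data):
        # Writes are assumed to arrive in time order, as seen on the wire.
        if self._times and timestamp < self._times[-1]:
            raise ValueError("writes must be recorded in time order")
        self._times.append(timestamp)
        self._writes.append((block_number, data))

    def view_at(self, timestamp):
        """Return {block_number: data} as the disk looked at `timestamp`."""
        end = bisect.bisect_right(self._times, timestamp)
        view = {}
        for block_number, data in self._writes[:end]:
            view[block_number] = data  # later writes overwrite earlier ones
        return view

journal = BlockJournal()
journal.record(10, 0, b"A1")
journal.record(20, 1, b"B1")
journal.record(30, 0, b"A2")

print(journal.view_at(25))  # {0: b'A1', 1: b'B1'}
```

Asking for time 25 returns block 0 as it was before the write at time 30, without the application ever having been involved in the capture.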

Block based CDP

Block solutions are great at capturing the data transparently, and great at presenting a "view" of some past point in time, but sometimes require additional work on the part of the application or user in order to make use of that historical view. For example, imagine a database application that is constantly streaming I/O to a storage element. To roll back to a view at some arbitrary time that wasn't coincident with a database synchronize or quiesce point would likely mean that the database would have to perform its own "crash recovery" from that time view. Often block CDP solutions will support a tagging operation, which allows the CDP device to tag specific "times" that are perhaps matched with application-side quiescing to allow for discrete recovery points. In between those discrete points the solution will still be able to provide useful views, but perhaps at the expense of an application resync of some sort (though when you truly need to go to an arbitrary time, the value is exceedingly high).
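The trade-off between tagged recovery points and arbitrary times can be expressed as a small lookup. This is a sketch under stated assumptions: the function `recovery_options`, the integer timestamps, and the two status strings are invented for illustration and do not describe any particular device.

```python
import bisect

def recovery_options(tagged_times, requested_time):
    """List the views an administrator might choose between.

    tagged_times: times the application was quiesced and the device tagged.
    """
    tags = sorted(tagged_times)
    i = bisect.bisect_right(tags, requested_time)
    earlier_tag = tags[i - 1] if i > 0 else None

    if earlier_tag == requested_time:
        # The requested time coincides with a quiesce point.
        return [(requested_time, "application-consistent")]

    # An arbitrary time is still available, at the cost of a resync.
    options = [(requested_time, "needs application crash recovery")]
    if earlier_tag is not None:
        options.append((earlier_tag, "application-consistent"))
    return options

print(recovery_options([100, 200, 300], 250))
# [(250, 'needs application crash recovery'), (200, 'application-consistent')]
```

The caller can then decide whether the exact time matters enough to pay for crash recovery, or whether the nearest earlier tag is good enough.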

So, the charms of a block-based solution are: high application transparency, no performance effect on the application, typically agnostic of hardware and platforms.

Application based CDP

At the other end of the spectrum there is Application-based CDP. In this scheme, specific applications (e.g. DB2 or some other database or similar application) are completely responsible for doing all of the journaling necessary to roll back to any time. Being tightly integrated into the application means the solution can provide a far richer set of recovery capabilities. For example, a database could perhaps recover a row or even a column in a table as it appeared 3 hours earlier and do so on the live system without disturbing the running application. A block based solution, by contrast, has no visibility of tables and rows and columns and only sees raw blocks. A block based solution would have to present a view of the entire disk (or disk set) and the application (such as a database) would have to be able to "mount" that view for use or manipulation.
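To illustrate why application awareness enables row-level recovery, here is a minimal sketch using Python's built-in `sqlite3` module. It is not how DB2 or any other database implements its journaling; the table `account_history`, its columns, and the helper functions are assumptions made up for this example. Each change is appended as a new version, so a single row can be read as of an earlier time while the table stays live.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account_history (
        account_id INTEGER,
        balance    INTEGER,
        changed_at INTEGER
    )
""")

def write_balance(account_id, balance, changed_at):
    # Append a new version of the row instead of overwriting the old one.
    conn.execute("INSERT INTO account_history VALUES (?, ?, ?)",
                 (account_id, balance, changed_at))

def balance_as_of(account_id, when):
    # Newest version of this one row that existed at time `when`.
    row = conn.execute(
        """SELECT balance FROM account_history
           WHERE account_id = ? AND changed_at <= ?
           ORDER BY changed_at DESC LIMIT 1""",
        (account_id, when)).fetchone()
    return row[0] if row else None

HOUR = 3600  # timestamps are seconds from an arbitrary start (illustrative)
write_balance(1, 500, 0 * HOUR)
write_balance(1, 750, 2 * HOUR)
write_balance(1, 0, 4 * HOUR)  # suppose this last write was a mistake

now = 5 * HOUR
print(balance_as_of(1, now - 3 * HOUR))  # 750
```

A block-level view could only offer the whole disk as of that time; here the query touches one row and nothing else.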

The charm of an application-based solution is extreme application awareness for powerful recovery capabilities. The downsides are that it works only with that application and likely adds significant overhead and resource use on the application servers.

File based CDP

Next up: File-based CDP solutions. File-based CDP solutions run on the application hosts (file servers or workstations) and are somewhat similar to application-based CDP solutions (in that file serving is essentially an application) but broader in value, since many applications and users use file-based data naturally. Whereas a policy in a block-based solution can only be set per LUN/disk, a file-based solution can have different policies per file or file group. Perhaps one set of files on a given machine simply does not need CDP-style protection, or another set of files needs a longer history captured, and so forth. Furthermore, file-based CDP solutions add only a modest amount of overhead, because when a file is naturally written out to disk (saved) it is very convenient to make an instant copy, since the data is already in various caches. Restoring is smoother in a file-based solution as well. You do not need to present or mount an entire volume view of some past time point; rather, you can see individual saved instances of each file and pick the desired ones by hand (or request that a given time be restored for a set of files or directories).
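The copy-on-save and pick-a-version behavior described above can be sketched with the standard library alone. The class `FileVersionStore` and its naming scheme are hypothetical; a real product would hook the file system rather than rely on explicit `capture` calls, and this sketch ignores files that share a name across directories.

```python
import shutil
import tempfile
from pathlib import Path

class FileVersionStore:
    """Keeps one copy of a file per save (hypothetical sketch)."""

    def __init__(self, store_dir):
        self.store_dir = Path(store_dir)
        self.store_dir.mkdir(parents=True, exist_ok=True)
        self._counter = 0

    def capture(self, path):
        """Copy the just-saved file into the store as a new version."""
        path = Path(path)
        self._counter += 1
        target = self.store_dir / f"{path.name}.v{self._counter:06d}"
        shutil.copy2(path, target)
        return target

    def versions(self, path):
        """List saved instances of one file, oldest first."""
        return sorted(self.store_dir.glob(f"{Path(path).name}.v*"))

    def restore(self, version, destination):
        shutil.copy2(version, destination)

work = Path(tempfile.mkdtemp())
doc = work / "report.txt"
store = FileVersionStore(work / "versions")

doc.write_text("draft 1")
store.capture(doc)
doc.write_text("draft 2")
store.capture(doc)

first, second = store.versions(doc)
store.restore(first, doc)  # pick an individual instance by hand
print(doc.read_text())     # draft 1
```

Restoring involves only the one file; no volume has to be mounted or presented.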

Quick tip
File-type CDP is best for end users and file servers. Why? Because the asset being protected (files) matches the style of CDP, which provides better granularity and easier recovery.

The charms of file-based solutions: lightweight operation, file-based policies and granularity, more natural recovery scenarios, and broad application/user value.

Choosing the right type of CDP

So what type of CDP solution is best? Classic answer: It depends.

If your data is strictly files and those files are being used by typical office workers (creating and editing documents) or by automated business applications (perhaps XML packages), then a file-based solution is quite likely best (particularly if you are interested in protecting user end-point systems such as workstations and laptops). If your machine is mostly serving a variety of applications (such as DB2 or Oracle or mail), a block-based solution is probably best. Last, if the application you are running has its own application-based CDP capabilities built in, consider using them, provided the overhead seems acceptable.

Table 1. Various types of CDP and their uses
Platform | Use | CDP approach | Comments
Unix | Databases | HW True Block CDP with marked recovery pts | Vol consistency; performance; non-app impacting
Unix | File serving | SW True FILE CDP | Per file policies; ease of use; on-line nature; ease of deployment
Windows | File & Print | SW True FILE CDP |
Windows | Desktops | SW True FILE CDP | Lightness; flexibility; ease of deployment
Windows | Databases | SW Frequent Snaps | Synchronized with app; Very fast & efficient; Cost effective; Rapid recovery
Windows | Email | SW Frequent Snaps |


Tivoli CDP for Files: What is it?

Tivoli CDP for Files is, quite simply, a file-based CDP solution. IBM/Tivoli will likely have a variety of CDP solutions over time, and CDP for Files is the first one brought forward. Why start with files? Because files are the most prevalent business asset, are growing at the fastest pace, and are arguably the least protected (especially on smaller end-point machines such as departmental file servers and workstations/laptops). Furthermore, loss of file data (due to accidental overwrites or erasures) costs our expensive labor force a tremendous amount of productivity. While impressive tabulations exist for the cost of help-desk calls to restore files, it is far more impressive to imagine all the incidents that never even reach the help desk, such as when a user corrupts a file mid-day.

Tivoli CDP for Files is designed with two major use cases in mind:

  • Corporate or departmental file servers
  • End users (workstations and laptops)

Corporate file servers are typically backed up once per day, which is not nearly enough protection for our modern users (who are pressured more than ever to work on more things at once and under tighter deadlines). Adding CDP to those file servers (perhaps still keeping the existing backup solution intact) will dramatically improve the recovery-point objective (RPO) and end-user productivity.

Direct end-user workstation or laptop protection has rapidly become a concern among IT managers. Just a handful of years ago, corporate IT managers forced users to store their material on mapped network volumes and specifically would not back up end-user workstations. Most end users who had some files stored locally would back them up themselves using writable CDs. Today, users are walking around with 60- or 100-gigabyte disk drives that act as veritable sponges, soaking up corporate data that never makes it back to a controlled file server. While it might still be the corporate "policy" to have users push their data to a corporate file server, doing so is becoming more and more impractical. Thus, a per-user, semi-personal backup solution is an easy-to-embrace notion, particularly if it automates that back-end "pushing" of data back to the corporate file server.

Enter Tivoli CDP for Files

Tivoli CDP for Files is an extraordinarily small and easy-to-use backup application, equally valuable on file servers, workstations, and traveling laptops. It embodies a unique combination of continuous protection and a more traditional scheduled to-disk protection capability. The product breaks files down into three very sensible categories:

  • Files that you truly know are valuable and warrant the special CDP style of protection
  • Files that you truly know are not valuable and should never be backed up (various system areas, temporary areas, replicated email, etc).
  • And files that fall into the gray area in between: files whose existence or importance you might not even be aware of until you have lost one of them.

Traditional backup treats every file like the third category above; that is, it is indiscriminate in nature, which was long considered the safest approach to backup. Yet such an approach is too lackadaisical for important material and far too ambitious for less important material.
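The three categories lend themselves to a simple pattern-based classifier. This is only a sketch: the pattern lists, the category labels, and the function `classify` are invented for illustration and are not the product's configuration format. It also assumes that gray-area files fall through to the scheduled style of protection, which the text implies but does not state outright.

```python
from fnmatch import fnmatchcase

# Hypothetical policy lists; a real deployment would define its own.
PROTECT_CONTINUOUSLY = ["*.doc", "*.xls", "*/projects/*"]
NEVER_BACK_UP = ["*.tmp", "*/temp/*", "*/cache/*"]

def classify(path):
    """Place a file in one of the three categories."""
    if any(fnmatchcase(path, p) for p in NEVER_BACK_UP):
        return "excluded"      # known not to be valuable
    if any(fnmatchcase(path, p) for p in PROTECT_CONTINUOUSLY):
        return "continuous"    # known to be valuable
    return "scheduled"         # gray area: assumed to get scheduled protection

for path in ("/home/ann/budget.xls",
             "/home/ann/temp/plan.doc",
             "/home/ann/notes.txt"):
    print(path, "->", classify(path))
```

Checking exclusions first means a temporary copy of an important document is still skipped, which is one reasonable ordering among several.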

The Tivoli CDP product combines replication with versioned instances. As files are changed, the software can take several actions automatically and transparently:

  • Create a local versioned instance which allows for restore opportunities regardless of connectivity to any network (e.g. while in an airplane)
  • Optionally queue the file for transmission to a corporate server and be tolerant of network disconnect situations
  • Remember the file has changed and at a later scheduled time push a copy to some off-machine target.
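The three actions listed above can be sketched as one change handler. Everything here is hypothetical: the class `ChangeHandler`, the `send` callable, and the use of `ConnectionError` to signal a disconnected network are assumptions for illustration, not the product's design.

```python
from collections import deque

class ChangeHandler:
    """Reacts to a changed file in the three ways listed above (sketch)."""

    def __init__(self, send):
        self._send = send                # assumed to raise ConnectionError when offline
        self.local_versions = []         # versioned instances kept on this machine
        self.pending = deque()           # files queued for the corporate server
        self.changed_since_push = set()  # files awaiting the scheduled push

    def on_change(self, path, version_id):
        self.local_versions.append((path, version_id))  # restorable with no network
        self.pending.append(path)
        self.changed_since_push.add(path)
        self.drain()

    def drain(self):
        """Send queued files in order; keep them queued if the network is down."""
        while self.pending:
            try:
                self._send(self.pending[0])
            except ConnectionError:
                return False
            self.pending.popleft()
        return True

    def scheduled_push(self, push):
        """At the scheduled time, copy every changed file to the off-machine target."""
        for path in sorted(self.changed_since_push):
            push(path)
        self.changed_since_push.clear()

network = {"online": False}
sent = []

def send(path):
    if not network["online"]:
        raise ConnectionError("network unavailable")
    sent.append(path)

handler = ChangeHandler(send)
handler.on_change("report.doc", 1)  # offline (say, on an airplane): stays queued
handler.on_change("budget.xls", 1)
network["online"] = True
handler.drain()
print(sent)  # ['report.doc', 'budget.xls']
```

The local version exists the moment the file changes, while transmission simply waits for connectivity to return.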

More than 95% of restore requests are for material recently created or altered. Keeping most of the recent data locally allows for unprecedented end-user protection. That said, it is still of paramount importance to migrate data off-machine, even if infrequently. Tivoli CDP for Files is designed to support any file-class device as a target, such as a file server, a closed-architecture NAS device, another LUN, or simply a removable FireWire or USB drive. Corporate administrators will gravitate to using a corporate file server as the back-end data store, which allows them to once again take control of those wandering digital assets.

The corporate risk of leaving end users unprotected, or of not protecting file servers with modern continuous protection, is too high in today's world of data everywhere. The opportunity for an easy-to-use file-based CDP solution is vast.

CDP "configuration for continuous protection" screen

CDP "restore" interface screen


About the author

Chris Stakutis is a renowned data storage industry inventor, technologist, and author with over 20 years of industry experience. He holds more than six U.S. patents (with eight more filed) covering various data and networking inventions. Currently working for IBM managing cutting-edge data storage research and development, he was the founder and CTO of SANergy (high-speed data sharing), which was sold to IBM in 2000. Mr. Stakutis is often published in industry journals and seen speaking at industry events. Mr. Stakutis graduated from Worcester Polytechnic Institute in an accelerated three-year program and then went on to obtain an MBA from Babson College at a leisurely 10-year pace. He has held key engineering and product management roles in various high-technology companies including Mercury Computer Systems, Precision Robots, MIT Lincoln Laboratories, and many startups.



