
Cheetah 2 is Released
By now you have probably heard about the release of IDS 11.50 (Cheetah 2). The release was announced at the recent IIUG user conference, which was held in Lenexa, Kansas on April 28 - April 30.
I have been heavily involved in the development of Cheetah2, which is the main reason that I have not been active in the blogosphere recently.
There are many cool features in Cheetah2, not the least of which is the expansion of the work done in IDS 11 as part of MACH11. I will be describing many of these new features in detail over the next few weeks, but thought that I'd do a quick overview in this blog entry.
Check out some of the new functionality at http://publib.boulder.ibm.com/infocenter/idshelp/v115/index.jsp
While mentioning external sites, I need to mention that IBM is currently taking a survey which should give us insight into your business priorities and experiences with IBM software. To take this survey, check out
http://www-306.ibm.com/software/data/info/consumability-survey/
Expanded Ports
IDS 11.50 has now been ported to the Mac. This was announced at Macworld, held in January.
New Communication Protocols
We have extended support for SSL. In the past we supported encrypted communications, but we now support the complete SSL suite. We also now have support for DRDA and JCC. This will enable IDS to more easily support the same clients that are currently supported by DB2.
Single Sign On
We now have the ability to support a common authentication for multiple
IDS servers.
Updatable Secondary Nodes
With IDS 11, we expanded the secondary types from the single HDR secondary node to include multiple secondary nodes (RSS) as well as a secondary running on top of a shared disk (SDS). This created the MACH11 cluster. In IDS 11.50, we have added to the usability of the secondary node by making it possible to perform update activity on those secondary nodes - be they HDR, RSS, or SDS nodes. This means that the investment that has been made in availability solutions can be used in much the same way as the normal primary node.
Expanding the Isolation of the Secondary Node
In the past, the read isolation on the secondary node was restricted to dirty-read isolation. With IDS 11.50, we have expanded this to include committed read and last committed read isolation. With this release, this is restricted to SDS nodes only, but it will be expanded to RSS and HDR in the near future as more testing is done.
Connection Virtualization
We have added support for connection virtualization by implementing a connection manager. The connection manager monitors the various nodes within the MACH11 cluster to determine the type of node, workload, availability, etc. The customer can configure the connection manager by describing a class of service. When the client application connects to the cluster, it connects to the defined class of service rather than to a specific server. The connection manager will then route the connection to the best choice for that classification, based on current workload.
Failover Arbitrator
Part of the connection manager's job is to perform failover detection and transfer of functionality. This is done by a simple set of rules which are part of the Connection Manager configuration.
OAT Enhancements
The Open Admin Tool has had quite a few enhancements. The overall layout and presentation have had a design makeover, and it really looks a lot better than in the past. In addition to the normal monitoring interfaces, it now also has some autonomic features such as update statistics automation and alert management.
Additions to SQL
There are several new things which we have added to the SQL engine. One of the key additions is a row versioning indicator which can be used to support optimistic locking techniques. We have also added the ability to support dynamic query construction within basic SPL.
Categories
: [ IDS ]
May 06 2008, 05:00:00 PM EDT
Running MACH11 on a single machine
Well, I owe everyone an apology. It's been way too long since my last post. I've been rather busy lately trying to get all of the stuff into Cheetah2 (the upcoming release), but still I should have posted something. Sorry...
Many of you may have heard in a recent "Chat with the Labs" that we are in the beta process for Cheetah2. Also, Jerry Keesee (the director of IDS development) mentioned that we would be starting an open beta shortly. Some of you may have even already joined the open beta and are currently testing with Cheetah2.
So I thought that I'd spend a bit of time today describing how you can set up the MACH11 environment on a single server, be it HDR, RSS, or SDS. This might be a good thing to discuss at this time because there is some new functionality for the MACH11 environment in Cheetah2.
I do most of my development on a Linux workstation. It does not have a fancy shared disk subsystem. It simply has the factory installed IDE disk. I also do quite a bit of development on my laptop using VMWare running RedHat 4. (Yes - you can run a MACH11 cluster on a virtual server.) I can guarantee you that my laptop doesn't have any shared disk subsystem.
OK - what is the trick to make this work? Well, it's fairly simple - use relative path names. The same technique will work for basic HDR on IDS versions released prior to IDS 11.
Let's see how I set up my environment for a primary node which supports an SDS node and an HDR secondary.

First of all, let's look at the directory layout. On my development system, all of my chunks are located under /db/IDS. Under that directory, I set up a different directory for each of my servers. In my case, I name my servers serv1, serv2, serv3, serv4, etc. That means that I have a directory /db/IDS/serv1, /db/IDS/serv2, /db/IDS/serv3, etc. Then within each of these directories, I set up my chunk files. For instance, /db/IDS/serv1 would look something like what is displayed to the right. (Of course I'm lazy and have a script which sets all of this up.)
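The setup script itself is not shown in the post, so what follows is only a minimal sketch of what such a script might look like. The chunk names other than rootchk are made up for illustration, and the sketch assumes cooked files with mode 660; it does not change file ownership, which a real instance would also need.

```python
import os
import tempfile

# Only "rootchk" is named in the post; the other chunk names are hypothetical.
CHUNK_FILES = ["rootchk", "datachk1", "datachk2"]


def create_instance_dirs(base_dir, server_names, chunk_files=CHUNK_FILES):
    """Create base_dir/<server>/ and an empty chunk file for each chunk name."""
    created = []
    for server in server_names:
        instance_dir = os.path.join(base_dir, server)
        os.makedirs(instance_dir, exist_ok=True)
        for chunk in chunk_files:
            path = os.path.join(instance_dir, chunk)
            # Create the file if it does not exist yet (like `touch`).
            with open(path, "a"):
                pass
            # Assumption: cooked chunk files are expected to be mode 660.
            os.chmod(path, 0o660)
            created.append(path)
    return created


# Demo against a scratch directory instead of the real /db/IDS.
demo_base = tempfile.mkdtemp()
for chunk_path in create_instance_dirs(demo_base, ["serv1", "serv2", "serv3"]):
    print(chunk_path)
```

Pointing base_dir at /db/IDS would reproduce the layout described above, one directory per server with the same chunk names in each.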
So far there doesn't appear to be anything unusual about this. This is pretty much how most people will have set up a testing system. But then let's examine the onconfig and see what it looks like.

The first thing that we might notice is the ROOTPATH parameter. I'm not using the fully qualified path name /db/IDS/serv1/rootchk. Instead I'm only using the name of the file. OK - so what does this mean? Well, to make this work, the only restriction is that when starting oninit, I must first be in the instance's directory, /db/IDS/serv1. So in order to bring up the server, I first execute cd /db/IDS/serv1 and then execute oninit -iyv. By using relative path names, I'm able to run with any of the MACH11 server types, be it HDR, RSS, or SDS.
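To make the effect of the working directory concrete, here is a small sketch. The helper names are my own, and it assumes the environment (INFORMIXDIR, ONCONFIG, INFORMIXSERVER) is already set for the instance. It shows that the same ROOTPATH value resolves to a different file for each instance directory, and how a wrapper could start oninit from the right directory.

```python
import os
import subprocess


def resolve_chunk(instance_dir, rootpath_value):
    """Return the file a relative ROOTPATH refers to when oninit runs in instance_dir."""
    return os.path.normpath(os.path.join(instance_dir, rootpath_value))


def start_command(instance_dir, initialize=False):
    """Build the oninit command line and the directory it must be run from.

    Note that -i initializes the disk space, so it is only for first-time setup.
    """
    argv = ["oninit", "-iyv"] if initialize else ["oninit", "-v"]
    return argv, instance_dir


def start_server(instance_dir, initialize=False):
    """Run oninit with the instance directory as the current working directory."""
    argv, cwd = start_command(instance_dir, initialize)
    # cwd= is what makes the relative chunk names resolve to this instance.
    return subprocess.run(argv, cwd=cwd, check=True)


print(resolve_chunk("/db/IDS/serv1", "rootchk"))
print(resolve_chunk("/db/IDS/serv2", "rootchk"))
```

The subprocess call is the scripted equivalent of the cd followed by oninit described above.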
Now let's examine some of the other key parameters which might need to be modified. To support shared disk secondary nodes (SDS), we might want to modify SDS_TEMPDBS and SDS_PAGING to use relative path names. In this example, I'm using the file sdstemp as my temporary dbspace for shared disks and the files page1 and page2 for my SDS paging files. Also notice that I set my message log file to the file name log.
OK - now let's see how the shared disk is set up. I actually have the option to simply use the exact same path name on the SDS node as I use on the primary. If all I want to do is to set up a primary and SDS nodes, then there is no reason to use relative path names. I simply have to use the same entry for ROOTPATH on the SDS node as I do on the primary. On Windows, this might be the easiest way to set up a MACH11 cluster. However, in my environment, I want to be able to set up both SDS and HDR/RSS. Since running HDR and RSS on the same machine requires relative paths, I also set up SDS using relative path names. So let's see how the 'instance directory' of the SDS node is set up.
Basically, the only thing that we have to do to use relative path names for the SDS node and to have an SDS instance directory is to use links to point to where the primary chunks are located. The SDS temporary dbspace chunks, paging files, and message log file are dynamically created as the SDS node is started.
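A sketch of how the links could be created is below. It assumes symbolic links on a UNIX type of system, and the function name and chunk list are my own. Only the chunks shared with the primary are linked; the temporary dbspace, paging files and message log are left for the SDS node to create.

```python
import os
import tempfile


def link_primary_chunks(primary_dir, sds_dir, chunk_names):
    """Create symlinks in sds_dir that point at the primary's chunk files."""
    os.makedirs(sds_dir, exist_ok=True)
    links = []
    for name in chunk_names:
        target = os.path.abspath(os.path.join(primary_dir, name))
        link = os.path.join(sds_dir, name)
        # Leave an existing link or file alone so the function can be re-run.
        if not os.path.lexists(link):
            os.symlink(target, link)
        links.append(link)
    return links


# Demo in a scratch directory: serv1 plays the primary, serv2 the SDS node.
base = tempfile.mkdtemp()
primary_dir = os.path.join(base, "serv1")
os.makedirs(primary_dir)
open(os.path.join(primary_dir, "rootchk"), "w").close()
for link_path in link_primary_chunks(primary_dir, os.path.join(base, "serv2"), ["rootchk"]):
    print(link_path, "->", os.readlink(link_path))
```

With the links in place, the SDS node can use the same relative ROOTPATH as the primary while being started from its own directory.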
It is a little more tricky to use relative path names on Windows because there the database server is run as a Windows service. So what must be done is to bring up the instance by using the starts command in the correct directory rather than using the auto start functionality of the Windows service manager. You can use the Windows instance manager to get things set up, but you will not want to actually initialize the server. Instead you will want to edit the instance's onconfig file to use the relative path names, get into the correct instance directory, and then run starts <instance_name> -iy. This will have the same effect as running oninit -iy on a UNIX type of system.
Additionally, on Windows there is an issue with setting up HDR and/or RSS because the physical recovery of the server will also try to start up the engine. Physical recovery is performed by running ontape -p and is a normal step used to initialize the HDR/RSS secondary. Since ontape -p will automatically start the server, there can be a problem with oninit not being in the correct directory because it is not started in quite the same way on Windows as on UNIX. To get around this issue, I've used the following technique in the past to instantiate the HDR secondary on Windows.
| On the Primary | On the Secondary |
| onmode -d primary <secondary node> | |
| onmode -c | |
| onmode -ky | |
| Copy chunks from the primary instance directory to the secondary instance directory | |
| | Perform a physical recovery (oninit -PHY) |
| | onmode -d secondary <primary node> |
We don't document the oninit -PHY option and don't encourage its usage in a normal production environment. It performs a physical recovery of the server, which means that we only recover up to the checkpoint. We do not perform any roll forward of the logical logs. So in normal production environments, its misuse can cause problems and possible loss of data - so if you should attempt to use this technique to set up an HDR environment on Windows, be aware of this.
Categories
: [ HDR | MACH11 | RSS | SDS ]
Feb 15 2008, 10:20:00 AM EST
cdr check correction
Correction to the cdr check Document on developerWorks
In the developerWorks article on enabling the cdr check functionality, there is an error in the compile script for AIX. The article is located at http://www.ibm.com/developerworks/db2/library/techarticle/dm-0604pruet/ and is titled "Enable 'cdr check' functionality within IBM Informix Dynamic Server".
To enable cdr check in IDS 10, you must first install some UDRs to enable checksum generation. It is not necessary to do this in IDS 11 because those UDRs are built into the product itself.
The part in error is the following statement in the make file for AIX. Instead of having...
$(NM) -X64 -g checksum.o | sed "/ U /d" | cut -f1 -d" " | sed "/.o:/d" | sed -e "s/^\.//" | sort -u < checksum.exp
we should have
$(NM) -X64 -g checksum.o | sed "/ U /d" | cut -f1 -d" " | sed "/.o:/d" | sed -e "s/^\.//" | sort -u > checksum.exp
That would mean that the correct make file for AIX64 would be:
#
# Compiler/Linker flags specific to AIX
#
CC = cc
LD = ld
NM = nm
CFLAGS = -q64 -shared -qchars=signed -D_H_LOCALEDEF -DINFX_ANSI -D_LARGE_FILES
PICFLAGS = -lm
SOFLAGS = -G -b64 -bnoentry
LIBSO = checksum.so
TARGETS =${LIBSO}
.SUFFIXES: .c .o
all: $(TARGETS)
checksum.so: checksum.c
	$(CC) $(CFLAGS) $(PICFLAGS) -I${INFORMIXDIR}/incl/public -I${INFORMIXDIR}/incl/ -c checksum.c
	$(NM) -X64 -g checksum.o | sed "/ U /d" | cut -f1 -d" " \
	| sed "/.o:/d" | sed -e "s/^\.//" | sort -u > checksum.exp
	$(LD) $(SOFLAGS) -bE:checksum.exp -o checksum.so checksum.o -lm -lc
	chmod 755 checksum.so
clean::
	rm -f ${LIBSO} *.o
Categories
: [ cdr ]
Dec 11 2007, 09:00:00 AM EST
Identifying the Server Type
With the introduction of SDS and RSS, an IDS cluster can have a complex topology. DBA scripts as well as applications need to find out whether the server is stand-alone, primary, or secondary. That makes it important to understand the programmatic interfaces available for finding the type of server being accessed.
Following are the ways to check the type of the server:
Administrative Utilities
The output from 'onstat -' prints the type of the server. For all secondary types, it will say Read-Only with the type HDR, RSS or SDS. For each secondary server type, there are onstat options to get more details:
- 'onstat -g dri' prints HDR information
- 'onstat -g sds' prints SDS information
- 'onstat -g rss' prints RSS information
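As a rough illustration of how a DBA script might use the 'onstat -' banner, here is a sketch that classifies a server from that one line. The exact banner wording for each secondary type is an assumption on my part (only the On-Line banner appears in these posts), so the sample Read-Only strings are hypothetical and the pattern should be checked against real output.

```python
import re


def server_role(onstat_banner):
    """Classify a server from the banner line printed by `onstat -`.

    Returns "primary-or-standard" when the banner is not Read-Only,
    otherwise the secondary type found in the banner, or "secondary"
    when no explicit type can be recognized.
    """
    if "Read-Only" not in onstat_banner:
        return "primary-or-standard"
    match = re.search(r"\b(HDR|RSS|SDS)\b", onstat_banner)
    return match.group(1) if match else "secondary"


# The first banner is taken from onstat output in these posts; the second is hypothetical.
print(server_role("IBM Informix Dynamic Server Version 11.10.F -- On-Line -- Up 00:26:03 -- 78772 Kbytes"))
print(server_role("IBM Informix Dynamic Server Version 11.10.F -- Read-Only (SDS) -- Up 00:10:00 -- 78772 Kbytes"))
```

A script that needs more detail could then call the matching onstat -g option for the type returned.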
Sysmaster Database
On the primary, the sysha_nodes view contains all the server names with their types. On all secondary type servers, it has a single row with the primary server's entry.
Esql/c Client
A warning is set in SQLCA when the client connects to any secondary type server. The sqlwarn.sqlwarn6 field is set to 'W', and SQLSTATE is set to '01I06'. An application can look at this warning flag to determine whether the server is read-only or not.
JDBC Client
The Informix JDBC driver provides more direct APIs to check the server status. The Connection object supports three methods isReadOnly(), isHDREnabled() and getHDRtype().
- isReadOnly() : Returns true if the active server is a secondary server
- isHDREnabled() : Returns true if both servers in the HDR pair are available. Returns false if one of the servers is unavailable.
- getHDRtype() : Returns primary or standard for a primary server, secondary for a secondary server
UDRs
C UDRs can use the mi_hdr_status() API to check the type of the server where the UDR is being executed. The return value should be checked for the bits MI_HDR_PRIMARY and MI_HDR_SECONDARY. These macros are defined in $INFORMIXDIR/incl/public/milib.h
There is no direct way to do this from SPL or Java routines. One can query the sysmaster tables mentioned above.
Sysdbopen() UDR
IDS 11.10 supports two DBA controlled routines, sysdbopen() and sysdbclose(). These procedures are run by the server on behalf of users when they connect to or disconnect from a database. One can create a sysdbopen() routine that checks the server type (using the mi_ API or a sysmaster query) and restricts databases or users on secondary servers.
Nov 20 2007, 06:09:14 PM EST
DDRBLOCK
It sometimes happens that quite useful fixes and enhancements make it into a release but remain little-known. A few such fixes and enhancements made it into the 11.10xC2 server; together, these enhancements make the management of the CDR_QDATA_SBSPACE configuration and of DDRBLOCK mode much easier and more tenable than in the past.
The IDS server writes to logical log files in a circular fashion, overwriting older log files when a new log file needs to be written to and more than LOGFILES files (as specified in the $INFORMIXDIR/etc/$ONCONFIG configuration file) have been written to. DDRBLOCK occurs when new transactions writing to the log come dangerously close to wrapping the log space around and overwriting old logs that Enterprise Replication has yet to process. In older servers, if the system ever entered DDRBLOCK mode, it could be very difficult to get the system out of DDRBLOCK mode without restarting oninit.
More recent releases of Enterprise Replication -- certainly, version 10 and later -- should rarely enter DDRBLOCK mode, unless the system is severely misconfigured. An example of a dangerously misconfigured system would be one with too few log files, especially if some of the log files are quite large while others are quite small. With such a configuration, even a small hiccup when Enterprise Replication processes log entries can cause DDRBLOCK mode, or even worse, log wrap. If log wrap occurs, that is, if new transactions overwrite entries that Enterprise Replication has yet to process, Enterprise Replication shuts down and data becomes unsynchronized among servers in the replication system.
One condition in which Enterprise Replication can still enter
DDRBLOCK mode even in an otherwise well-configured system is when a
destination site remains inaccessible for an extended period of time.
If this happens, the Reliable Queue Manager (RQM) send queue will save
transactions that include that site in its destination list in stable
storage. If the spool space fills, the oninit server will likely enter
DDRBLOCK mode, because Enterprise Replication cannot stably store
transactions in its send queue and therefore can no longer advance the replay position, the oldest point in the logs that Enterprise Replication needs to access.
As an example, I have configured a small two-server replication system. I configured the IDS instance at which I will be generating transactions with too few logs and too little send queue stable storage and used the 'cdr suspend serv' command to suspend the other server. Since transactions cannot flow to the destination server, transactions quickly start to accumulate in the send queue:
[pinch-cdrtempmurre] (pinch) 110 % onstat -g rqm sendq | egrep '^ Txns'
Txns in queue: 18
Txns in memory: 7
Txns in spool only: 11
Txns spooled: 11
and as I configured very little send queue spool space, the spool space immediately fills up, as shown in the message log:
10:44:47 CDR QUEUER: Send Queue space is FULL - waiting for space in CDR_QDATA_SBSPACE
In this case, Enterprise Replication will also raise an alarm of severity 4 and class 31.
Since Enterprise Replication cannot advance the replay position, the IDS instance also enters DDRBLOCK state, as shown by the "Blocked:DDR" line in the following output:
[pinch-cdrtempmurre] (pinch) 129 % onstat -g ddr | head -10
IBM Informix Dynamic Server Version 11.10.F -- On-Line -- Up 00:26:03 -- 78772 Kbytes
Blocked:DDR
DDR -- Running --
# Event Snoopy Snoopy Replay Replay Current Current
Buffers ID Position ID Position ID Position
2064 4 1ee4454 3 74f018 12 2ad000
We can see that the replay log id is 3, whereas the current log id to which IDS is writing transactions is 12. The fact that log 12 is the current log is also displayed by the onstat -l command:
[pinch-cdrtempmurre] (pinch) 132 % onstat -l | grep C | grep -v CDR
451f2c30 2 U---C-L 12 1:31763 9000 685 7.61
I configured my example instance to have only 10 logical log files, so if we cannot reuse logical log 3 and are already at log 12, we need 12 - 3 + 1 or all 10 logical log files. Small wonder the server is in DDRBLOCK mode!
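The arithmetic above is easy to script. The sketch below parses the data line of the onstat -g ddr output shown earlier and computes how many log files are pinned between the replay position and the current log. The column order follows the header in that output; treating the positions as hexadecimal is my assumption. It only reproduces the 12 - 3 + 1 calculation and does not model the exact point at which the server decides to block.

```python
def parse_ddr_positions(data_line):
    """Parse the data line printed under the `onstat -g ddr` header."""
    fields = data_line.split()
    return {
        "event_buffers": int(fields[0]),
        "snoopy_id": int(fields[1]),
        "snoopy_pos": int(fields[2], 16),   # assumed to be hexadecimal
        "replay_id": int(fields[3]),
        "replay_pos": int(fields[4], 16),   # assumed to be hexadecimal
        "current_id": int(fields[5]),
        "current_pos": int(fields[6], 16),  # assumed to be hexadecimal
    }


def logs_pinned(replay_id, current_id):
    """Number of log files from the replay log through the current log."""
    return current_id - replay_id + 1


def log_space_exhausted(ddr, logfiles):
    """True when every configured log file is still needed by replication."""
    return logs_pinned(ddr["replay_id"], ddr["current_id"]) >= logfiles


ddr = parse_ddr_positions("2064 4 1ee4454 3 74f018 12 2ad000")
print(logs_pinned(ddr["replay_id"], ddr["current_id"]), log_space_exhausted(ddr, logfiles=10))
```

For the values in this example the script reports 10 pinned logs out of 10 configured, which matches the reasoning above.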
The send queue stable storage area is configured via the CDR_QDATA_SBSPACE configuration parameter. 11.10xC2 and later include an addition to onstat that allows the sbspaces configured to CDR_QDATA_SBSPACE to be monitored very easily. The command is onstat -g rqm sbspaces:
onstat -g rqm sbspaces
IBM Informix Dynamic Server Version 11.10.F -- On-Line -- Up 00:29:41 -- 78772 Kbytes
Blocked:DDR
RQM Space Statistics for CDR_QDATA_SBSPACE:
-------------------------------------------
name/addr number used free total %full pathname
0x46581c58 5 311 1 312 100 /tmp/amsterdam_sbsp_base
amsterdam_sbsp_base5 311 1 312 100
0x46e54528 6 295 17 312 95 /tmp/amsterdam_sbsp_2
amsterdam_sbsp_26 295 17 312 95
0x46e54cf8 7 310 2 312 99 /tmp/amsterdam_sbsp_3
amsterdam_sbsp_37 310 2 312 99
0x47bceca8 8 312 0 312 100 /tmp/amsterdam_sbsp_4
amsterdam_sbsp_48 312 0 312 100
In the past, the information returned via the onstat -g rqm sbspaces command was available, but you had to gather it by looking at the CDR_QDATA_SBSPACE values and then manually extracting the information relevant to the CDR_QDATA_SBSPACE spaces from the onstat -d output. Imagine doing this in a "real" system with dozens of dbspaces!
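For monitoring, the onstat -g rqm sbspaces output can be totalled with a few lines of script. This is only a sketch: it reads the lines that begin with an address (they carry the number, used, free, total, %full and pathname columns cleanly) and ignores the name lines so that nothing is counted twice. The sample text is copied from the output shown earlier.

```python
SAMPLE = """\
0x46581c58 5 311 1 312 100 /tmp/amsterdam_sbsp_base
amsterdam_sbsp_base5 311 1 312 100
0x46e54528 6 295 17 312 95 /tmp/amsterdam_sbsp_2
amsterdam_sbsp_26 295 17 312 95
0x46e54cf8 7 310 2 312 99 /tmp/amsterdam_sbsp_3
amsterdam_sbsp_37 310 2 312 99
0x47bceca8 8 312 0 312 100 /tmp/amsterdam_sbsp_4
amsterdam_sbsp_48 312 0 312 100
"""


def summarize_sbspaces(onstat_text):
    """Total the used/free/total columns over the address lines of the output."""
    used = free = total = 0
    for line in onstat_text.splitlines():
        fields = line.split()
        # Address lines look like: addr number used free total %full pathname
        if len(fields) >= 7 and fields[0].startswith("0x"):
            used += int(fields[2])
            free += int(fields[3])
            total += int(fields[4])
    pct_full = round(100.0 * used / total, 1) if total else 0.0
    return {"used": used, "free": free, "total": total, "pct_full": pct_full}


print(summarize_sbspaces(SAMPLE))
```

A cron job or script could compare pct_full against a threshold and warn before the send queue spool space actually fills.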
If CDR_QDATA_SBSPACE space starts to run low, you can either add more chunks to an sbspace already in the CDR_QDATA_SBSPACE list, or, starting with the 11.10xC2 release, you can add a new sbspace to the CDR_QDATA_SBSPACE list.
For example, say I have created (via onspaces) a new sbspace mynewcdrsbsp:
[pinch-cdrtempmurre] (configparam) 157 % onstat -d | grep mynewcdrsbsp
47bce508 12 0x68001 12 1 2048 N SB informix mynewcdrsbsp
47bce6a0 12 12 0 1000 702 702 POSB /tmp/mynewcdrsbsp
I can then add that space to the list of CDR_QDATA_SBSPACE spaces via the cdr add config command.
[pinch-cdrtempmurre] (configparam) 158 % userid informix cdr add config "CDR_QDATA_SBSPACE mynewcdrsbsp"
WARNING: The value specifed updated in-memory only.
I can easily verify what sbspaces are configured via onstat. As you can
see, mynewcdrsbsp is there:
[pinch-cdrtempmurre] (configparam) 159 % onstat -g cdr config CDR_QDATA_SBSPACE
IBM Informix Dynamic Server Version 11.10.F -- On-Line -- Up 00:39:38 -- 86964 Kbytes
Blocked:DDR
CDR_QDATA_SBSPACE configuration setting:
amsterdam_sbsp_base
amsterdam_sbsp_2
amsterdam_sbsp_3
amsterdam_sbsp_4
mynewcdrsbsp
and Enterprise Replication is spooling transactions to the new sbspace. In fact, it's already 99% full.
[pinch-cdrtempmurre] (configparam) 162 % onstat -g rqm sbspaces
IBM Informix Dynamic Server Version 11.10.F -- On-Line -- Up 00:51:59 -- 86964 Kbytes
Blocked:DDR
RQM Space Statistics for CDR_QDATA_SBSPACE:
-------------------------------------------
name/addr number used free total %full pathname
0x46581c58 5 311 1 312 100 /tmp/amsterdam_sbsp_base
amsterdam_sbsp_base5 311 1 312 100
0x46e54528 6 312 0 312 100 /tmp/amsterdam_sbsp_2
amsterdam_sbsp_26 312 0 312 100
0x46e54cf8 7 310 2 312 99 /tmp/amsterdam_sbsp_3
amsterdam_sbsp_37 310 2 312 99
0x47bceca8 8 312 0 312 100 /tmp/amsterdam_sbsp_4
amsterdam_sbsp_48 312 0 312 100
0x47bce6a0 12 696 6 702 99 /tmp/mynewcdrsbsp
mynewcdrsbsp 12 696 6 702 99
So what about DDRBLOCK mode? In practice, by far the likeliest cause for entering DDRBLOCK mode is that a destination server remains unavailable for an extended period of time. (In this example, I have simulated that condition by suspending the destination server.) If you expect the destination server to become available in a reasonable amount of time and you have enough disk space, you can add more space to the CDR_QDATA_SBSPACE parameter as in this example. Because Enterprise Replication raises an alarm of severity 4 and class 31 when it runs out of send queue spool space, you could even write an alarm handler to automate this task.
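As a starting point for such an alarm handler, here is a minimal sketch of the decision logic. It assumes the alarm program receives the severity and the class id as its first two arguments, and that a list of spare sbspaces has already been created with onspaces; both are assumptions of the sketch, not something spelled out in this post. It only builds the cdr add config command used earlier and leaves running it to the caller.

```python
FULL_QUEUE_SEVERITY = 4   # severity of the "send queue space is full" alarm
FULL_QUEUE_CLASS = 31     # class of the "send queue space is full" alarm


def is_send_queue_full_alarm(severity, class_id):
    """True for the severity 4, class 31 alarm described in the post."""
    return int(severity) == FULL_QUEUE_SEVERITY and int(class_id) == FULL_QUEUE_CLASS


def add_space_command(sbspace_name):
    """Build the cdr command that adds an sbspace to CDR_QDATA_SBSPACE."""
    return ["cdr", "add", "config", "CDR_QDATA_SBSPACE " + sbspace_name]


def handle_alarm(alarm_args, spare_sbspaces):
    """Return the command to run for this alarm, or None if nothing should be done.

    alarm_args is assumed to start with [severity, class_id, ...].
    spare_sbspaces is a hypothetical list of sbspaces created in advance.
    """
    if len(alarm_args) < 2 or not is_send_queue_full_alarm(alarm_args[0], alarm_args[1]):
        return None
    if not spare_sbspaces:
        return None
    return add_space_command(spare_sbspaces[0])


print(handle_alarm(["4", "31"], ["mynewcdrsbsp"]))
```

Keep in mind the warning printed by cdr add config above: the value is updated in memory only, so the handler would not make the change permanent on its own.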
What if you expect a destination server to remain unavailable for an extended period of time, a period longer than can be handled by spooling the send queue to disk? You will have little choice other than to remove the unavailable server from the replication system and to resynchronize data once it becomes available again; but that is the topic of a future blog entry.
Nov 16 2007, 12:53:56 PM EST
An Always-On HDR
IDS’s HDR technology is the cornerstone of every high-availability environment. If you need your data available at all times (and who doesn’t?) you must plan for unexpected outages (e.g. network, hardware or operating system failure). HDR addresses this by allowing you to have a copy of your primary server. With DRINTERVAL set to -1, you can guarantee that your primary and secondary servers are in complete synchronization. Problem solved.
But what happens if one node of your HDR pair goes down? You’re no longer operating with high-availability protection. How much more risk can you tolerate? Sure you’ve got logs saved, but clients need the data faster than a log restore.
With the release of version 11, IDS can be configured in such a way that you can have HDR always on. In other words, you can create an environment where you step back into HDR as soon as a failure causes you to step out of it. How do you do this? Use a cluster of an HDR pair plus RSS.
A Remote Standalone Secondary (or RSS) node operates very similarly to an HDR secondary except that it is not in sync with the primary. It offers many advantages when used at a remote location (i.e. one with high network latency), but in our context let’s use one locally. One characteristic of an RSS server is that it can become an HDR secondary while online! An HDR secondary in turn can become an RSS node. Now we’ve got all the pieces in place, so let’s explain the ring.
The simplest cluster has three nodes: an HDR primary, an HDR secondary, and an RSS node. Our goal is to always have HDR on. So when an event occurs that causes our HDR pair to break - one of the nodes fails - that must trigger a second event that reestablishes an HDR pair. The second event can occur manually or programmatically. Since the cluster has only three nodes, let’s consider the three failures that could occur and what to do.
Scenario 1: Primary fails
- Make the HDR secondary your new HDR primary
- Make the RSS node your new HDR secondary
- Fix your old primary and bring it online as an RSS node
Scenario 2: Secondary fails
- Make the RSS node your new HDR secondary
- Fix your old HDR secondary and bring it online as an RSS node
Scenario 3: RSS node fails
- Fix your RSS node
OR
- Add a new RSS node
Regardless of which node fails, we have the means of reestablishing an HDR environment. Further, since RSS technology is one-to-N, multiple RSS nodes can be added to the cluster giving you more options for each scenario.
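The three scenarios can be written down as a small piece of bookkeeping. The sketch below only tracks which role each node should end up with once the failed node has been repaired; the role names and the function are my own, and the actual role changes on the servers are not shown.

```python
def roles_after_failure(roles, failed_node):
    """Return the role map after failed_node fails and is later repaired.

    roles maps node name -> "primary", "hdr_secondary" or "rss".
    """
    new_roles = dict(roles)
    failed_role = roles[failed_node]
    if failed_role == "rss":
        # Scenario 3: the HDR pair is intact; the repaired node rejoins as RSS.
        return new_roles
    rss_nodes = sorted(node for node, role in roles.items() if role == "rss")
    if not rss_nodes:
        raise ValueError("no RSS node available to re-form the HDR pair")
    if failed_role == "primary":
        # Scenario 1: the HDR secondary takes over as primary.
        secondary = next(node for node, role in roles.items() if role == "hdr_secondary")
        new_roles[secondary] = "primary"
    # Scenarios 1 and 2: an RSS node becomes the new HDR secondary,
    # and the repaired node comes back online as an RSS node.
    new_roles[rss_nodes[0]] = "hdr_secondary"
    new_roles[failed_node] = "rss"
    return new_roles


cluster = {"nodeA": "primary", "nodeB": "hdr_secondary", "nodeC": "rss"}
print(roles_after_failure(cluster, "nodeA"))
print(roles_after_failure(cluster, "nodeB"))
```

With more than one RSS node in the cluster, the sketch simply promotes the first one by name; a real policy might prefer the node with the least lag.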
What about scenarios when more than one failure occurs at a time? These are obviously more complex and their solutions depend on what types of failures occur. Redundant machine parts and network infrastructure, interconnected network nodes, and our high-availability “ring” will most likely play significant parts.
Adding an RSS to an HDR pair can give at least a second layer of high data availability, and as explained above can at best make HDR always on.
Categories
: [ HDR | RSS ]
Nov 12 2007, 12:23:01 PM EST
Catching and Cleaning a Grouper
Getting Info about the Grouper
Now that we know about the grouper in general, let's look at how we can get specific information about what it's doing.
How do we do that? Well, if you answered "onstat" … you're right! onstat -g grp, with an optional modifier, is the gateway to the inside of the grouper. In typical onstat style, running the basic command, onstat -g grp, gives you a sampling of various information also accessible from other subcommands. Let's pick just one piece to focus on. The line "Eval thread interface ring buffer pending entries" indicates how much work is outstanding for the evaluator threads. The fanout thread puts items on the "ring buffer" and the evaluator threads take things off. This can help you decide the best number of evaluator threads for your systems.
For information about the evaluation phase, two commands are particularly good. onstat -g grp E gives information about each evaluator thread, including the number of updates it has processed. Secondly, onstat -g grp P shows for which tables the grouper is evaluating rows.
For information about the compression phase, check out onstat -g grp M. This keeps a running average of the time being spent on compression and shows what compression strategies are currently being used. onstat -g grp Mz resets these statistics.
Lastly, for info about the copy phase, try running onstat -g grp T. This command tells you details about the last transaction copied out as well as the total number of transactions processed.
Keep these onstat commands in your tackle box for the next time you wanna catch and clean a grouper!
Nov 08 2007, 06:42:27 PM EST
Fishing Around
No - this is not about some fish. Rather, this is about the process within IDS which regroups the logical log records into a transaction for replication, evaluates the rows to determine what should be replicated and where it should be replicated to, and then places the replicated transaction into the send queue for transmission to the target servers.
Grouper Threads
The grouper is composed of two parts. The first part consists of the grouper fanout thread (CDRGfan). The purpose of the grouper fanout thread is to
- Receive reconstituted log records from the log snooper (ddr_snoopy)
- Regroup the transaction (i.e. attach the log record to the appropriate transaction)
- Pass the log records to the grouper evaluator for evaluation
- Determine if the transaction is consuming too many resources and needs special treatment, such as its own memory pool and/or being paged
- Place the transaction into the grouper serial list. This is done when the commit record is processed and is used to ensure that the transaction is placed into the send queue in commit order.
The second part of the grouper is the grouper-evaluator. The grouper-evaluator consists of several threads whose names begin with "CDRGeval__". The purpose of the grouper evaluator is to
- Evaluate the log record to determine if it is a candidate for propagation
- Reconstitute the transaction from the logical logs
- Compress the transaction by the removal of any duplicate operations on the same row
- Determine the original 'before image' and the final 'after image' of any update operation
- Queue the replicated transaction for transmission to the various targets
- Record any deleted rows in the shadow delete table
It is fairly obvious that the Grouper-Evaluator is a critical component of ER. Because of that, it is important that it be as streamlined as possible. Otherwise, it would not be able to process the log records quickly, which would cause a back flow into the log snooping process. And a back flow into the log snooping would cause a significant impact on overall latency. So it is rather important that the grouper be able to process the log records quickly and avoid having to do disk IO.
Phases of the Grouper Evaluator
Evaluation Phase
The first phase of the grouper evaluator is the evaluation phase. During this process, one of the grouper evaluator threads will examine the log record to determine if it is a candidate for replication. If it is not, then the row is immediately released. Generally the grouper will evaluate rows as the transaction log buffers are being flushed to disk. That means that there is generally no physical IO involved in obtaining the rows.
This means that if the transaction performs operations on multiple rows, it is possible that the grouper may have evaluated the log records before the commit for the transaction has occurred. This would generally be the case if the commit of the transaction is in a different log buffer than the other operations of the transaction. However, the grouper does not place the transaction into the send queue until it has processed the commit record and all rows of the transaction have been evaluated.
The grouper evaluation is performed in parallel. By that I mean that one log record of the original transaction might be evaluated by one of the grouper threads while another log record of the same transaction can be evaluated by another thread. This makes it possible for the evaluator to remain fairly current with the current log position.
Compression Phase
Once the commit record has been processed, the grouper goes through a compression phase. This involves determining all of the operations for a given row within the transaction and eliminating any unnecessary operations. For instance, if a row was updated multiple times within the transaction, the duplicate operations will be eliminated and only the original before image and the final after image will be saved. If a row was inserted in a transaction and then deleted within that same transaction, then it will not even be replicated. This process reduces the overall size of the transaction which will be placed into the send queue.
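To illustrate the idea (not the actual grouper code), here is a sketch that collapses the operations on one row within a transaction into a single net operation. The tuple layout is my own. The two cases from the text are covered directly; the handling of an insert followed by updates is my extrapolation of the same rule.

```python
def compress_row_ops(ops):
    """Collapse a row's operations, given in commit order, into one net operation.

    Each operation is a tuple (kind, before_image, after_image) where kind is
    "insert", "update" or "delete". Returns one such tuple, or None when the
    row does not need to be replicated at all.
    """
    if not ops:
        return None
    first_kind, first_before, _ = ops[0]
    last_kind, _, last_after = ops[-1]
    existed_before = first_kind != "insert"
    exists_after = last_kind != "delete"
    if not existed_before and not exists_after:
        return None  # inserted and deleted within the same transaction
    if not existed_before:
        return ("insert", None, last_after)
    if not exists_after:
        return ("delete", first_before, None)
    # Keep only the original before image and the final after image.
    return ("update", first_before, last_after)


updates = [("update", {"v": 1}, {"v": 2}), ("update", {"v": 2}, {"v": 3})]
print(compress_row_ops(updates))
print(compress_row_ops([("insert", None, {"v": 1}), ("delete", {"v": 1}, None)]))
```

Two updates shrink to one, and the insert/delete pair disappears, which is why the compressed transaction placed into the send queue is smaller.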
Additionally, the compression phase is a requirement for transmitting the correct operation. There are many examples where the operation cannot be transmitted by using the same operation as was performed on the source. For instance, suppose a replicate was defined with a filter - say "select * from payroll where status_column = 4". Now suppose the following command was issued:
update payroll set status_column = 4 where emp_no = 23412;
Unless the before image of the row had a status_column of 4, the target would not have the existing before image, because the before image was not a member of the replicated set of data. Therefore, when the update operation was replicated, we would need to replicate it as an insert, not as an update.
Likewise, suppose the following statement was issued:
update payroll set status_column = 3 where emp_no = 23412;
If the before image of the row had the status_column set to 4, then the
update operation would be removing the row from the set of replicated
data because status_column no longer matches the filter used to define the replicate.
That means that the update operation would need to be
transmitted
as a delete operation.
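The two payroll examples boil down to checking the replicate's filter against both the before image and the after image of the row. Here is a minimal Python sketch of that decision; `replicated_op` and `status_is_4` are hypothetical names used only for illustration, and the real evaluator works on log records rather than dictionaries.

```python
def replicated_op(before, after, in_filter):
    """Return the operation to send to the target for an update on the source."""
    was_in = in_filter(before)
    now_in = in_filter(after)
    if was_in and now_in:
        return "update"
    if now_in:
        return "insert"  # the row enters the replicated set
    if was_in:
        return "delete"  # the row leaves the replicated set
    return None          # the row is not part of the replicated set at all


def status_is_4(row):
    """The filter from the example: where status_column = 4."""
    return row["status_column"] == 4
```

With the filter above, changing status_column from 3 to 4 yields "insert", and changing it from 4 to 3 yields "delete".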
Copy Phase
The final phase of the grouper evaluator is the copy phase.
During this time, the replicated transaction is placed into
the send queue for transmission to the target nodes. Although
the transaction may be transmitted to multiple targets, it is placed
into the send queue only once. The transaction is placed
into the queue in a 'stream' format - which basically means that it is
put into a network-independent format. That means that
objects such as user defined types are converted into a stream for
transmission to the target nodes.
Configuring the Grouper Threads
The onconfig parameter used to configure the grouper evaluator threads
is CDR_EVALTHREADS x,y where 'x' is the number of threads per
CPUVP and 'y' is a number of extra threads. The default is
1,2.
I personally think that 1,2 is a good setting for the number
of
evaluator threads. The theory is that we want to evaluate the
row
as quickly as possible. Since the majority of the work that the
grouper evaluator threads do is lightweight, simple evaluation of
the log records, there is little cost in having one per CPUVP.
Also this makes it easier to maintain a balance between the
logging work and the consumer of the logs.
However, there is still the problem of having to maintain the local
shadow delete table. Maintaining the delete table does
involve
some blocking activity because we have to perform IO to the delete
table itself. That will cause the grouper evaluator thread to
go
into a wait state, which can lead to lagging behind the consumption of
the logs. That's why it still makes sense to have a couple of
extra evaluator threads.
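To make the x,y arithmetic concrete, here is a small Python sketch that works out how many evaluator threads a given setting implies. The function name is hypothetical, and it simply applies the description above (x threads per CPUVP plus y extra threads).

```python
def evaluator_threads(setting: str, cpu_vps: int) -> int:
    """Number of grouper evaluator threads for a CDR_EVALTHREADS value 'x,y'."""
    per_vp, extra = (int(part) for part in setting.split(","))
    return per_vp * cpu_vps + extra
```

For example, with the default of 1,2 on an instance with 4 CPUVPs this gives 6 evaluator threads.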
Categories
: [ ER | Grouper ]
Nov 05 2007, 11:00:00 PM EST
Monitoring the Queue
Overview
onstat -g rqm sendq output
Current Statistics
Historical Statistics
Progress Table
RQM Handle
Overview
Well - made it safely back home from the IOD conference in Las Vegas
without losing too much money in the casino. The really cool
thing was that while waiting at the airport, I played some slot
machines and won $45.00. Guess I should have taken a later
flight
so I could have played longer... ;-)
I mentioned last week that I
would be posting some pictures from the conference.
Unfortunately,
my pictures were not very good, so I would suggest checking out
these pictures
instead. The conference was great.
There
were around 8000 folks attending and IDS had a strong presence.
Well - back to business. In a previous entry I gave some
thoughts about sizing the queue.
In this entry I'm going to describe how to monitor the
queue. There are two main ways to monitor the queues: through onstat and
through the sysmaster database. In this blog entry we will
focus
on the onstat -g rqm command.
The Reliable Queue Manager (rqm) is the subcomponent of ER which is
responsible for the physical management of the queue. It is
responsible for things such as determining when an item can be removed
from the queue, what thread is referencing a queue item, cursors on the
queue (rqm handles), when an item must be spooled to disk
(smartblob), etc.
There are several options which can be used with the onstat -g rqm
command. The following table describes these options:
(Options to onstat -g rqm)
| Option | Description |
| <nothing> (i.e. onstat -g rqm) | Display information about all queues. |
| SENDQ | Display information about the send queue. The send queue is used to transmit transactions to target servers. These transactions might originate on the local node or, in the case of hierarchical routing, on a remote server. |
| RECVQ | Display information about the receive queue. The receive queue is used to hold a replicated transaction which has been received on the target but has not yet been applied on the target table by the datasync threads. |
| CNTRLQ | Display information about the control queue. This queue is used to manage control messages such as replicate definitions, server definitions, start replicates, etc. Items placed in the control queues are always copied into stable storage. |
| ACKQ | Display information about the ACK queue. This queue is used to hold acknowledgments before they are sent to the source node. |
| SYNCQ | Display information about the sync queue. This queue is only used as part of the define server and then only to transmit the syscdr database to the newly defined node. |
| SBSPACES | Display information about the sbspaces used to contain the stable storage of the queues. |
| FULL | Display the transaction headers for each of the transactions within the queue which are currently in memory. |
| VERBOSE | In addition to the transaction headers, display information about each of the rows for transactions in the queues. |
| BRIEF | Display a short summary of what is contained in the queue. |
In this posting, we are going to limit ourselves to onstat -g rqm SENDQ.
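If you want to capture this output from a script, a thin Python wrapper is enough. The sketch below is only a convenience I am assuming here, not part of the product: `rqm_command` and `run_rqm` are hypothetical helper names, the option names are the ones listed in the table above, and `onstat` must be on the PATH of an Informix environment for `run_rqm` to work.

```python
import subprocess

# Option names taken from the table of onstat -g rqm options.
RQM_OPTIONS = {"sendq", "recvq", "cntrlq", "ackq", "syncq",
               "sbspaces", "full", "verbose", "brief"}


def rqm_command(option=None):
    """Build the argument list for onstat -g rqm with an optional option."""
    cmd = ["onstat", "-g", "rqm"]
    if option is not None:
        name = option.lower()
        if name not in RQM_OPTIONS:
            raise ValueError(f"unknown onstat -g rqm option: {option}")
        cmd.append(name)
    return cmd


def run_rqm(option=None):
    """Run the command and return its text output (requires onstat)."""
    done = subprocess.run(rqm_command(option), capture_output=True,
                          text=True, check=True)
    return done.stdout
```

Calling `run_rqm("sendq")` at regular intervals and saving the text gives you a simple history to compare against.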
onstat -g rqm sendq output
There are several sections in the output of the onstat -g rqm sendq command.
The following list describes these sections.
- The Summary Section
  This section contains a summary of the queue. It is further
  broken into two sections.
  - The current summary
    This section contains the current statistics about the queue.
  - The historical summary
    This section contains the historical totals of the queue. It contains
    information such as the total number of transactions which have been
    queued as well as the maximum size that the queue has grown to.
- The Progress Table Section
  This section contains the 'progress' of the queue. By that we
  mean that it tracks what has been sent to each remote
  server, and what has been ACKed from the remote server. This
  section is further broken into two sections.
  - The progress table summary
    This describes which table on disk is used to contain the
    progress table. It also describes how often the progress
    table is flushed to disk.
  - The target/replicate progress information
    This contains information on which transactions have been sent to the
    target nodes and what the target nodes have acknowledged.
    It also contains the number of bytes per target/replicate
    combination which are currently in the queue.
- The Transaction Section
  This section contains information about the first and last transactions
  which are in memory in the queue.
- The Handle Section
  This section contains a list of each of the handles which has been
  allocated to each of the users of the queue. The handle can
  be thought of as a cursor into the queue. It is used to track
  the position within the queue.
The Current Statistics Section
The current statistics section is the first
section in the onstat -g rqm sendq command.
It contains information about the current contents of the
queue
such as how many bytes are contained in the queue, how many
transactions are in the queue, how many transactions are currently in
memory, how many have been spooled to disk, how many exist
only on disk, etc.
When a new transaction is placed into the queue, the transaction is
given a stamp. This stamp is used to maintain the order of
the
transactions within the queue. This is a bit different from
the
commit order because the original commit order is only useful within
the context of the server on which the transaction is originally
committed. In the case of a system using
hierarchical
routing, it is possible that the send queue will have transactions
which originated on other servers. That would be the case of
a
replicated transaction which must be forwarded to another node.
The stamp is a 64-bit integer which is maintained as part of the queue. In
this
example, the next transaction to be inserted will receive stamp 638.
In this example, the send queue currently contains 611
transactions, of which 268 are in memory, 343 exist only on disk,
and 42 (611 total minus 569 spooled) are only in memory. The reason
that some
of the spooled transactions are also in memory is that we spawn a group
of spooling threads when we sense that we are getting close to running
out of memory. The spooled transaction is not immediately
removed
from memory, however. Instead the spooled transaction will be
removed from memory only when the memory limits are reached.
The
reason for this pre-spooling is to avoid having to do a lot of work
when we reach the memory limits. Once a
transaction has
been spooled and the in-memory copy of the transaction has been
removed, then the transaction is never completely reloaded back into
memory. Instead we transmit the transaction directly from the
spooled disk copy of the transaction.
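The relationship between these counters is just arithmetic, and it can help to write it down. The following Python sketch derives the breakdown from the three numbers in this example (611 in the queue, 268 in memory, 569 spooled); the function and key names are hypothetical, and the "in memory and spooled" figure is derived here rather than read from the onstat output.

```python
def queue_breakdown(total, in_memory, spooled):
    """Split the transactions in the queue by where their copies live."""
    only_in_memory = total - spooled      # never written to the spool
    only_on_disk = total - in_memory      # in-memory copy already removed
    in_both = in_memory - only_in_memory  # pre-spooled but still in memory
    return {
        "only_in_memory": only_in_memory,
        "only_on_disk": only_on_disk,
        "in_memory_and_spooled": in_both,
    }
```

For the numbers above this gives 42 only in memory, 343 only on disk, and 226 that have been pre-spooled but are still held in memory.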
The Size of Data in queue
is
the size of the queue when combining the in-memory transactions with
the spool-only transactions. The Pending Txn Buffers contains
information about transactions which are in the process of being queued
into the queue.
The Historical Statistics Section
Starting with Max Real memory data used,
we enter the historical section. This section contains a
summary of what has been placed in the queue in the past.
The Max Real memory data used contains the largest in-memory size of
the queue. In this case, it reached up to 1,544,060 bytes.
The limit of the queue is currently configured to
be
1,536,000 bytes, so when the transaction went into the queue which
caused the limit to be reached, it triggered activity to free the
in-memory transactions which had already been spooled. If no
in-memory transactions had been spooled, then the thread placing
transactions into the queue would have had to also spool the
transactions. That's why we spawn separate spooling threads
to
perform the actual spooling. We try to get the spooling done
before we actually have to remove the transaction from memory.
There have been 638 transactions which have been queued to this queue.
That should match up with the insert stamp of the queue.
Of
those 638 transactions, 569 have also been spooled. At this
point
none of the spooled transactions have been restored. The
reason
for that is that the only reason that the transactions were spooled is
that I brought down one of the targets. Since the target is
down,
then we will not be attempting to restore those transactions.
When that server is brought back up, then we would attempt to
restore those transactions and send them to the target.
Recovered transactions are the transactions which existed only in the
spool when the instance was started. They are not recovered
by re-reading from the logical log, but are simply recovered from the
disk storage when the engine is started. They would have been
snooped from the logical log at some time in the past, but now are
found in the stable queue.
Total Txns deleted
is the number of transactions that have been removed from the queue.
They may have been only in memory, only on disk in the stable
queue, or in both. The Total
Txns duplicated contains the number of times that we
attempted to queue a transaction which had already been processed.
This can occur when ER is first starting up as part of the
instance startup, or as part of a cdr
start command. The Total Txn Lookups is
simply a counter of the number of times that an ER thread attempted to
read a transaction.
The Progress Tables Section
The progress table section contains information
on what is currently queued, which server it is queued for, and what
has been ACKed from each of the participants of the replicate.

The first part of the progress table section is a summary.
The information in the receive queue progress table is
written to disk as part of each transaction that the datasync thread
applies. This is not, however the case with the send queue
progress table. Instead the send queue progress table is
copied to disk every so often. In this example we see that
the progress table is flushed to the table spttrg_send every 30
seconds. Another thing which might trigger the flushing of
the progress table is if over 1000 entries are dirtied.
Below the summary section is a list of the servers and group entries
which contain the information as far as what is currently queued for
each server, what has been sent to the remote server, and what has been
ACKed from the remote server. The term Group is a carry-over
from the 7.31 days when the replicate could be part of a replicate
group. It should really be "Replicate" in post-7.31
instances. The ACKed and Sent columns
contain the key of the last transaction which was
acknowledged from the remote server or sent to that server.
The KEY is a multi-part number consisting of
<source_node>/<unique_log_id>/<logpos>/<incremental
number>. From this we can see that the last
transaction which we sent to server 3 was transaction 0x2f/0x1934c8 and
the last transaction which has been acknowledged is 0x28/0x684c8.
By examining the progress table we can discover which server is tending
to lag behind. In this example, server 2 is completely
current, but server 3 is lagging somewhat behind.
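Comparing the Sent and ACKed keys by eye gets tedious, so here is a rough Python sketch of the comparison. It assumes the keys are written as slash-separated numbers, decimal or 0x-prefixed hex, as in the fragments quoted above; `parse_key` and `is_lagging` are hypothetical helpers, and the comparison is only meaningful for two keys with the same parts from the same source node.

```python
def parse_key(text):
    """Turn a key such as '0x2f/0x1934c8' into a tuple of integers."""
    # int(part, 0) accepts both plain decimal and 0x-prefixed hex.
    return tuple(int(part, 0) for part in text.split("/"))


def is_lagging(sent, acked):
    """True if the last ACKed transaction is older than the last one sent."""
    return parse_key(acked) < parse_key(sent)
```

For server 3 in this example, `is_lagging("0x2f/0x1934c8", "0x28/0x684c8")` is true because the acknowledged log id (0x28) is behind the sent log id (0x2f).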
At the very bottom of this example, we see the start of the transaction
section. This contains the first and last transaction in the
queue which is currently in memory.
The RQM Handle Section
The last section contains the handles.
The RQM handle can be thought of as being much like a cursor.
It contains the position within the queue that any thread is
currently processing.
Each thread that attempts to read a transaction from the queue, or to
place a transaction into the queue must first allocate a handle.
This handle is used to maintain the positioning within the
queue. By examining the RQM handle section, you can get an
idea what each of the threads are doing. For instance in this
example, we see that CDRNsA2 (Send Thread to server 2) is at the end of
the queue. We also see that CDRNsT3 (Send Thread to server 3)
is in the process of sending transaction 1/42/0xbc4c8.
It might be a bit surprising to see which threads have handles on the
send queue. The network send threads make sense.
These would be the CDRNsxxx threads.
However, it is a bit surprising to see that the receive
threads (CDRNrxxx) have handles on the send queue. The reason
for this is because of routing. When a transaction is
received which must be forwarded to another server, then the receive
thread will need to place that transaction into the send queue.
Therefore, it is not unusual to see that the receive threads
will have a handle on the send queue.
The other handles make sense. The grouper evaluator
(CDRGeval##) has to have a handle on the send queue because it is
placing transactions originating on this node into the send queue for
transmission to a remote server. The ACK threads (CDRACK##)
have a handle on the send queue because they must update the
progress table and potentially delete a transaction when an ACK is
received from a remote server.
Categories
: [ ER | Queue | RQM ]
Oct 22 2007, 09:18:00 AM EDT
Sizing the Queue
Overview
In general there are not many configuration items for Enterprise
Replication. One of the things which can be configured is the
maximum in-memory size of the queues. This is configured by the
onconfig parameter CDR_QUEUEMEM. The default value for this is 4096, which is probably too small.
There are two main queues used by ER - the send queue and the receive
queue. Transactions which have been retrieved from the logical
log file and have been evaluated for replication are placed in the send
queue for transmission to the target nodes. A given transaction
is placed in the send queue only once, even if it is to be sent to
multiple target nodes. When the transaction is received on the
target node, it is placed in the receive queue where it waits its turn
to be applied.
If the ER domain is defined to be using some form of a hierarchy, it is
possible that the received replicated transaction will also be placed
in the send queue so that it can be forwarded to other nodes. In
fact it is possible that the replicated transaction is only placed in
the send queue. That would be the case where the transaction
might need to be forwarded, but the intermediate node is not a
participant in replication. However, for the purpose of
this discussion, we will consider only a single source with a
single target node.
First of all, the value of CDR_QUEUEMEM is not a preallocated block of
memory which is used to store transactions. It is a limit on the
maximum memory size that an ER queue can expand to. If this limit
is reached then the replicated transaction may exist in the disk
overflow space within a smartblob. Also, the value of
CDR_QUEUEMEM is not the max size of all of the queues. Rather it
is the max size of any specific queue. That means that if
CDR_QUEUEMEM is set to the default 4096, then both the send queue and the
receive queue can grow up to 4 meg each.
Impact on the Send Queue
When the send queue approaches CDR_QUEUEMEM size, spooling
threads will be spawned to flush transactions to the configured
smartblob space. These spooled transactions are not immediately
freed from memory however. Instead we will not free the spooled
transactions from memory until the CDR_QUEUEMEM limit is reached.
If that limit is reached, then the spooled transactions will be
freed from memory and thus will exist only in the smartblob storage of
the queue.
When it comes time to send a transaction to the target, if the
transaction exists only in the smartblob portion of the queue, then the
transaction is transmitted directly from the spooled transaction to the
target. We do not reload the transaction totally into memory once
it has been spooled and has been removed from main memory.
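The pre-spooling behaviour is easier to follow as a toy model. The Python sketch below is a deliberate simplification under my own assumptions: spooling happens inline instead of in separate threads, `spool_threshold` is a hypothetical "getting close" mark (the text does not say where that point is), and sizes are plain byte counts.

```python
class SendQueueModel:
    """Toy model of the send queue's pre-spool and free-from-memory steps."""

    def __init__(self, limit, spool_threshold):
        self.limit = limit                      # plays the role of CDR_QUEUEMEM
        self.spool_threshold = spool_threshold  # hypothetical early-warning mark
        self.in_memory = {}                     # stamp -> size in bytes
        self.spooled = set()                    # stamps with a copy on disk
        self.next_stamp = 1

    def memory_used(self):
        return sum(self.in_memory.values())

    def enqueue(self, size):
        stamp = self.next_stamp
        self.next_stamp += 1
        self.in_memory[stamp] = size
        if self.memory_used() >= self.spool_threshold:
            # Pre-spool: write copies to disk but keep them in memory.
            self.spooled.update(self.in_memory)
        if self.memory_used() >= self.limit:
            # Limit reached: free the copies that are already on disk.
            for s in list(self.in_memory):
                if s in self.spooled:
                    del self.in_memory[s]
        return stamp
```

The point of the two thresholds is that freeing memory at the limit is cheap, because the disk copies were already written while there was still headroom.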
Impact on the Receive Queue
As the transaction is received on the target, it is placed in the
receive queue where it remains until it is applied by the datasync
threads.
We only spool a transaction in the receive queue if it exceeds 1/2 of
the total queue memory size. If the receive queue should reach
the CDR_QUEUEMEM limit, then the target will activate flow
control by causing a NIF block. The purpose of this is to prevent
the source from sending any additional transactions until the receive
queue drains a bit.
Sizing the Transaction
In order to correctly size the queues, it is important to know how much
memory is required to store the transaction as it is in transit to the
target server. Each row within the replicated transaction
contains a fixed header which contains information about the row.
Also there can be a series of options which contain specific
information about the row. Probably the most common option is a
hash value which is used to support apply parallelism. Finally
each replicated transaction will have a transaction header. The
current (IDS 11) size of the fixed row buffer header is 52 bytes on a 32-bit machine,
and the size of the transaction header is 258 bytes. For 64-bit
machines, the size of the row buffer header is 60 bytes and the transaction header is 292 bytes.
The options
form a variable-length list and each option can vary in size. However, for this discussion
we will consider only the hash used in the apply parallelism which is 4
bytes.
The last part of the formula is the rowsize as is taken from the
systables table. If we examine the customer table of the stores
database, we see that the rowsize is 134 bytes.  That means that
the queue memory needed to contain a single row insert transaction
is 448 bytes (258 + 52 + 4 + 134 on a 32-bit machine). We could therefore queue 9581 single row
insert transactions or a single transaction of 22591 inserts before we
reached the CDR_QUEUEMEM limit of 4 Meg.
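Here is the same formula as a small Python sketch, so you can plug in your own row sizes. The header and option sizes are the IDS 11 figures given above; the function names are hypothetical, and the model ignores any options other than the 4-byte hash.

```python
ROW_HEADER = {32: 52, 64: 60}    # fixed row buffer header, by platform bits
TXN_HEADER = {32: 258, 64: 292}  # transaction header, by platform bits
HASH_OPTION = 4                  # hash used for apply parallelism


def txn_bytes(rowsize, rows=1, bits=32):
    """Queue memory for one transaction containing `rows` rows."""
    return TXN_HEADER[bits] + rows * (ROW_HEADER[bits] + HASH_OPTION + rowsize)


def max_single_row_txns(limit, rowsize, bits=32):
    """How many single-row transactions fit under `limit` bytes."""
    return limit // txn_bytes(rowsize, 1, bits)


def max_rows_in_one_txn(limit, rowsize, bits=32):
    """How many rows a single transaction can hold under `limit` bytes."""
    per_row = ROW_HEADER[bits] + HASH_OPTION + rowsize
    return (limit - TXN_HEADER[bits]) // per_row
```

`txn_bytes(134)` returns the 448 bytes worked out above. If the 4096 limit is taken as exactly 4,194,304 bytes, this simple model gives 9362 single-row transactions and 22073 rows in one transaction for the customer table, a little below the counts quoted above, so treat these numbers as estimates rather than exact limits.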
Care has to be taken, however, if the replicated table contains variable
length columns such as varchars or lvarchars, because it is the expanded
size of the row that is used by Enterprise Replication. If we
examine the warehouses table (right) in the stores demo database, we see that
the table has three columns (warehouse_name, warehouse_id, and
warehouse_spec). Two of the columns are lvarchar column types of
2K size.
However if we examine the size of the warehouses table from systables (below), we
discover that the size of the row is 4106. This means
that we could only perform 971 single row insert transactions on the
warehouses table or a single transaction of 1031 inserts before
reaching the CDR_QUEUEMEM limit.
The fact that ER uses the expanded size of the row can be a surprise,
especially if the lvarchar columns in the original row only contained
short character strings. This has even more of an impact if the
environment is such that there is a lot of activity on the tables with
the lvarchars. In such a situation, spooling might occur if the
value of CDR_QUEUEMEM is set too low.
Sizing Strategy
The most common strategy for configuring Enterprise Replication is
generally to try to obtain the lowest possible latency. In order to do
that it is important to avoid spooling transactions to disk. This
means that the CDR_QUEUEMEM limits need to be fairly large.
Remember, ER is going to have to process all of the replicated
tables which all of the client transactions are updating. To do
that, ER needs to be able to hold as many replicated transactions in
memory as is possible. It might be that we want to size the queue
memory based on the total allowed memory size. As an example, for
an update anywhere system, we might want to consider 1/8-1/6 of the
total memory available since with update anywhere, we will have
activity on both the send and receive queues. That would mean
that the total active queue memory would be between 1/4 and 1/3 of all
available memory.
For instance if the total virtual memory size is configured to be 1
Gig, then we probably want to consider letting the CDR_QUEUEMEM be
sized somewhere around 150 Meg for an update anywhere configuration or
200-250 Meg for a source/target configuration. Don't forget that
ER will have to process all of the activity that the client
transactions are processing and to do that is going to require memory.
Otherwise, ER will have to spool transactions, and spooling will
affect the latency of the apply.
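As a quick aid, the 1/8 to 1/6 rule of thumb for update anywhere can be written as a short Python sketch. The function names are hypothetical, and the kilobyte conversion assumes CDR_QUEUEMEM is expressed in KB, which is what the default of 4096 corresponding to 4 Meg implies.

```python
def update_anywhere_range_mb(total_memory_mb):
    """Per-queue memory range (low, high) in MB for update anywhere."""
    return total_memory_mb / 8, total_memory_mb / 6


def to_queuemem_kb(size_mb):
    """Convert a size in MB to a CDR_QUEUEMEM value, assuming KB units."""
    return int(size_mb * 1024)
```

For 1 Gig of memory this gives a range of 128 to roughly 171 Meg, which brackets the 150 Meg suggested above; 150 Meg would be a CDR_QUEUEMEM value of 153600 under the KB assumption.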
Oct 09 2007, 12:00:00 AM EDT
MACH the Knife
The MACH11 Cluster
Overview
The High Availability Data Replication Secondary (HDR)
Setup of the HDR secondary
The Remote Standalone Secondary (RSS)
Setup of the RSS node
The Shared Disk Secondary (SDS)
Setup of the SDS node
Failover within the MACH11 Cluster
Special Case with SDS only Clusters
Promotion of the RSS node into an HDR Secondary
Demotion of the HDR Secondary into an RSS Node
Overview
The MACH11 cluster, introduced with IDS 11, is an extension of
the traditional HDR. It provides a fully integrated
solution for multiple levels of availability, and is the
foundation for Continuous Availability.
The MACH11 cluster introduces two new types of secondary server
which complement the existing HDR secondary. The first is the Remote
Standalone Secondary (RSS) and the other is the Shared Disk Secondary
(SDS). The main difference between the RSS node and the SDS node
is that while the RSS node maintains a physical copy of data on disk, the SDS node maintains only the shared memory buffer
pool. As the name implies, the SDS node is also attached to the
same physical disks as the primary node by using a shared disk
subsystem.
There can be only one primary node
within the cluster. Also
there can be only one HDR secondary. However, there can be any
number of RSS and/or SDS nodes within the cluster. Also it
is important to understand that only logged data is replicated within
the MACH11 cluster.
For additional information check out "Availability Solutions with Informix Dynamic Server 11".
The High Availability Data Replication Secondary (HDR)
High Availability Data Replication (HDR) has been part of the
Informix Dynamic Server since IDS 6. It provides support for a
hot backup system which is also available for dirty read processing.
HDR works by shipping the logs from the primary node to the
secondary where they are applied to the physical chunks on the
secondary. HDR is a member of the MACH11
cluster and much of the technology which is used to implement the
rest of the MACH11 cluster is based on HDR technology.
Setup of the HDR Secondary
There are six steps to bring up an HDR secondary. The first two
steps are often overlooked, yet are fairly important. They
involve making sure that the chunk files exist on what will become the
secondary server and making certain that any UDR/Datablade executable
is installed on the secondary node. The files must have the same path
as they do on the primary node and the UDR/Datablade executable must be
in the same locations. It may be that this involves nothing more
than issuing the Unix 'touch' command, or the establishment of links
to the appropriate directory. Also, care must be taken to ensure
that the chunk files have the proper owner, group and permissions.
Generally these will be owner informix, group informix, with
read and write permission for owner and group.
The following chart describes the steps to create an HDR secondary.
| Step | Description | Primary | Secondary |
| 1 | Create chunk files on the secondary | | This is a manual step which must be performed on the secondary node. |
| 2 | Install UDRs and datablades on the secondary | | This is a manual step which must be performed on the secondary node. |
| 3 | Update the reserved pages to mark this node as the primary and set the identity of the secondary node | onmode -d primary <secondary_node> | |
| 4 | Perform a backup of the primary | ontape -s -L 0 (onbar -b -L 0) | |
| 5 | Perform a physical restore on the secondary | | ontape -p (onbar -r -p) |
| 6 | Mark the secondary as a secondary and point to the primary instance | | onmode -d secondary <primary_node> |
Table 1
In step 3, we set a flag in the reserved pages which identifies this
node as an HDR primary and also identify the network connection to the
HDR secondary. In step 4, we perform a full system backup of the
primary and then (step 5) perform a physical restore on the HDR
secondary node. (A physical
restore does not perform the rollforward of the logical log files.
That means when the ontape/onbar command is finished, the restored instance is positioned at the backup
checkpoint. The rollforward of the logs is done by transmitting the logical logs from the primary node.)
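If you script the setup, it helps to keep the ordered steps in one place and print the commands for each host. The following Python sketch does only that: the commands are the ones from table 1 (the ontape variants), while `HDR_STEPS`, `hdr_commands`, and the server names in the usage note are hypothetical. Steps 1 and 2 are manual and therefore carry no command.

```python
# (step number, host that performs it, command or None for manual work)
HDR_STEPS = [
    (1, "secondary", None),  # create chunk files
    (2, "secondary", None),  # install UDRs and datablades
    (3, "primary", "onmode -d primary {secondary}"),
    (4, "primary", "ontape -s -L 0"),
    (5, "secondary", "ontape -p"),
    (6, "secondary", "onmode -d secondary {primary}"),
]


def hdr_commands(host, primary, secondary):
    """Return, in step order, the commands to run on the given host."""
    return [cmd.format(primary=primary, secondary=secondary)
            for _, where, cmd in HDR_STEPS
            if where == host and cmd is not None]
```

For instance, `hdr_commands("secondary", "serv1", "serv2")` returns the physical restore followed by `onmode -d secondary serv1`. Remember that the steps are interleaved between the two hosts, so the secondary's commands must wait until the primary's backup has finished.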
The HDR secondary must run the same executable binary oninit as the
primary. The host of the HDR secondary must be similar to the
primary but does not have to be identical. For instance, the
primary might be a 24 processor system and the HDR secondary might be a
4 processor system. However, it is important not to undersize the
HDR secondary system because if the HDR secondary is unable to
process the log records as fast as the primary creates them, then
backflow can occur. If this should happen then user activity on
the primary can block until the HDR secondary can catch up.
While the HDR secondary can be used for report processing, its primary
purpose is to provide failover support in the event that the primary
node is lost. To make the secondary into a primary node, simply
run 'onmode -d primary <old_primary_node>'.
The Remote Standalone Secondary (RSS)
The primary purpose of the RSS node is to act as a backup for the
HDR secondary. If the primary node is down, the HDR secondary is normally promoted into the primary. However, if the
original primary is going to be down for an extended period of time, it
is possible to promote the RSS node into the HDR secondary.
Unlike the HDR secondary, the RSS node communicates with the primary
using a full-duplex model. This means that it is not necessary
for the secondary to acknowledge every message sent from the primary
before the next message is sent. Because of this, the RSS node can
normally make better use of the available network bandwidth than the
HDR secondary, and is better able to handle long-distance communication
networks. However, this comes at a
cost. The RSS node can only work in asynchronous mode. Even
the checkpoint is asynchronous. Because of this, the RSS node is
not able to be promoted directly into a primary node. However,
it can be promoted into the HDR secondary and then subsequently be
promoted into the primary node.
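To see why waiting for an acknowledgment per message hurts on long-distance links, consider a toy timing model in Python. This is a back-of-the-envelope sketch of my own, not a description of the actual HDR or RSS protocols: it assumes fixed message send time and round-trip time, and the function names are hypothetical.

```python
def stop_and_wait_seconds(messages, send_time, round_trip):
    """Each message waits for its acknowledgment before the next is sent."""
    return messages * (send_time + round_trip)


def pipelined_seconds(messages, send_time, round_trip):
    """Messages are sent back to back; only the last ack is waited for."""
    return messages * send_time + round_trip
```

With 1000 messages, 1 ms to send each, and a 50 ms round trip, the first model takes about 51 seconds while the second takes about 1.05 seconds, so the gap grows with the round-trip time.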
Not only can the RSS node be promoted to the HDR secondary node, but
also the HDR secondary node can be demoted into an RSS node. This
might be desired during some periods of time to take advantage of the
full-duplex communications model.
Setup of the RSS node
RSS requires that the server utilize Index Page Logging. Normally
when an index is created, we only log the create index operation, not
the work that is done by the create index itself. With
traditional HDR, the index pages from the index build are directly transmitted to the
secondary as part of the index creation. With RSS, we felt that the
cost of attempting to transfer the index to multiple RSS nodes would
impact user activity too much, so we chose instead to simply place those pages
into the logical log. We do not place all of the index into the log as
a single transaction. Instead we may generate multiple
transactions to log the index creation so as to avoid any long
transaction during the index build. This feature is activated by
setting LOG_INDEX_BUILDS to 1 in the onconfig.
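Before adding an RSS node it is worth checking that the parameter is really set. Here is a small Python sketch that scans onconfig text for it; it assumes the usual onconfig layout of a parameter name followed by whitespace and a value, with '#' starting a comment, and the function names are hypothetical.

```python
def onconfig_value(text, name):
    """Return the value of parameter `name` in onconfig text, or None."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == name:
            return parts[1].strip()
    return None


def index_page_logging_enabled(text):
    """True if LOG_INDEX_BUILDS is set to 1."""
    return onconfig_value(text, "LOG_INDEX_BUILDS") == "1"
```

You would feed it the contents of the onconfig file used by the primary, for example by reading the file with `open(path).read()`.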
The setup of an RSS node is very similar to the setup of the HDR secondary node.
| Step | Description | Primary | RSS Node |
| 1 | Set up chunk files on the secondary | | Manual process to create any chunk files on the secondary node |
| 2 | Install any UDRs and DataBlades | | Manual process to install any UDRs and/or DataBlades on the secondary node |
| 3 | Register RSS node in the sysha database | onmode -d add RSS <node> <password> | |
| 4 | Perform a backup on the source | ontape -s -L 0 (onbar -b -L 0) | |
| 5 | Perform physical restore on the secondary | | ontape -p (onbar -r -p) |
| 6 | Connect to the primary | | onmode -d RSS <primary> <password> |
Table 2
We establish the potential RSS node in step three and also set an
optional password for the initial connect request. If the sysha
database does not yet exist on the primary node, it will automatically
be created in the root chunk. The optional password is only used
in the initial connection from the RSS node to the primary. After
taking a full system backup on the primary and restoring it on the
secondary (again a physical restore), the setup is completed by issuing
onmode -d RSS on the RSS node.
This will cause a network connection to be established with the
primary and replication to be established. If a password was used
as part of the onmode -d add RSS command on the primary, then the same password is required as part of the onmode -d RSS command.
The Shared Disk Secondary (SDS)
While the HDR secondary and RSS nodes maintain both the buffer cache
and a disk copy of the database chunks, the shared disk secondary node
only maintains the buffer cache. Instead of maintaining a copy of
the chunks on local disk, the SDS node uses the same physical disks as
the primary on a shared disk subsystem such as Veritas or GPFS.
The reason that we implemented the Shared Disk Secondary was to
take advantage of newer disk technology. For instance, the
customer might want to have a standby instance but use disk
mirroring or some other means of hardware availability solution to
provide for the disk redundancy.
The setup of the SDS node is a different process than the setup of
the HDR secondary or RSS nodes. Instead of performing a backup of
the primary node and physical restore on the secondary node, the SDS
node is instantiated by simply issuing a checkpoint on the primary node
and the SDS node starting the roll-forward of the logs as of that
checkpoint LSN. As the primary is flushing logs to disk, it sends
the LSN that it has flushed to the SDS node. The SDS node will then
read and process the logs up to that LSN. As the SDS node is
processing log records, it sends a notificatio | | |