Level: Intermediate
John Beaven (beavenj@uk.ibm.com), WebSphere Development, IBM
Ian Robinson (ian_robinson@uk.ibm.com), STSM, WebSphere Transactions Architect, IBM
06 Apr 2005
This article introduces the new high availability support for the IBM® WebSphere® Application Server transaction service, available as part of WebSphere Application Server V6. The article describes the two main styles of transactional high availability, discusses the infrastructure requirements associated with them, and explains the configuration steps required to enable these styles of high availability in your WebSphere Application Server deployment.
Introduction
The WebSphere Application Server transaction manager stores
information regarding the state of completing transactions in a persistent form
that is used during transaction recovery. This persistent form is referred to
as the transaction recovery log and is used to complete prepared transactions
following a server failure. This activity is referred to as transaction recovery
processing. In addition to completing outstanding transactions, this processing also ensures that any locks held in the associated resource managers are released.
Prior to WebSphere Application Server V6, it was necessary to
restart a failed server to perform recovery processing. Version
6.0 introduces the ability for a peer server (another cluster member) to process the recovery logs of a failed server while the peer continues
to manage its own transactional workload. This capability is known as peer recovery processing and supports the resolution of
in-doubt transactions without the need to wait for the
failed server to restart. This facility forms part of the overall WebSphere Application Server high availability (HA) strategy.
Styles of transactional high availability
WebSphere Application Server V6.0 supports two styles for the
initiation of transaction peer recovery, which are categorized in this document as automated and manual. The style is governed by the configuration of a high availability policy, which is referred to hereafter simply as the policy for the transaction service. More information on these policies can be found in An introduction to high availability policies.
This document describes what these two styles
offer, how to configure them and the factors to consider before
enabling their use:
- Automated peer recovery
This is the default style of peer recovery
initiation. If an application server fails, WebSphere Application Server
automatically selects a server to perform peer recovery processing on
its behalf. Apart from enabling high availability and configuring the
recovery log location for each cluster member, no additional WebSphere Application Server configuration steps are required to use this model.
- Manual peer recovery
This style of peer recovery must be explicitly configured. If an application server fails, the
operator can use the administrative console to select a server to perform recovery processing on its behalf. The
configuration steps that are required to prepare the system for manual peer recovery and
the method by which it is directed are described in the Configuration for manual peer recovery section.
Physical deployment considerations
Regardless of the style of peer recovery
initiation (automated or manual), certain common physical and logical
deployment considerations must be taken into account prior to the use of
the HA function. Some additional considerations for
automated peer recovery are documented below.
Transaction recovery logs
For application servers to perform
peer recovery for each other, it is necessary for them to access the
recovery logs. Recovery logs must be placed on a medium that is
available to all servers, such as a network-attached storage device (NAS)
mounted on each node. All nodes must have read and write access to the recovery
logs.
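The read and write requirement can be checked on each node before high availability is enabled. The following Python sketch is illustrative only and is not part of WebSphere Application Server; the function name check_log_directory and the probe-file approach are assumptions made for this example.

```python
import os
import tempfile


def check_log_directory(path):
    """Return a list of problems found with a recovery log directory.

    An empty list means the current process can read from and write to
    the directory. This is a hypothetical helper, not a WebSphere API.
    """
    if not os.path.isdir(path):
        return [f"{path} is not a directory"]
    problems = []
    if not os.access(path, os.R_OK | os.X_OK):
        problems.append("no read access")
    try:
        # Create and automatically remove a probe file to prove write access.
        with tempfile.NamedTemporaryFile(dir=path, prefix="probe-"):
            pass
    except OSError as exc:
        problems.append(f"no write access: {exc}")
    return problems
```

Run the check on every node against the mounted recovery log location, under the same user ID as the application server, because access rights on network storage can differ between nodes.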
Figure 1. Physical deployment of recovery logs
Two types of potential server
failure exist: software failure and hardware failure. Software failures
generally do not affect other application servers directly. Even servers on the same physical hardware can perform peer recovery processing.
If a hardware failure occurs,
all the servers that are deployed on the failed hardware become
unavailable. Servers on other hardware are required to handle peer
recovery processing. Clearly, any HA configuration requires that servers are deployed across multiple and discrete hardware systems.
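The difference between the two failure types can be sketched in a few lines of Python. The inventory format (a mapping from server name to host name) and the function name candidate_peers are assumptions for illustration; WebSphere Application Server makes this determination internally.

```python
def candidate_peers(members, failed_server, hardware_failure):
    """List the cluster members that could perform peer recovery.

    members maps each server name to the host it runs on (hypothetical
    inventory). After a software failure only the failed server is
    excluded; after a hardware failure every server on the same host
    is excluded as well.
    """
    failed_host = members[failed_server]
    peers = []
    for server, host in sorted(members.items()):
        if server == failed_server:
            continue
        if hardware_failure and host == failed_host:
            continue
        peers.append(server)
    return peers
```

With servers server1 and server2 on one host and server3 on another, a software failure of server1 leaves server2 and server3 as candidates, whereas a hardware failure leaves only server3. If every member shares one host, a hardware failure leaves no candidates.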
Logical deployment considerations
For the purposes of this section, it is
necessary to introduce the following term definitions. These definitions are
described in more detail in Configuration for automated peer recovery.
- System overloading - The system becomes very heavily loaded such that
response times are extremely poor and requests begin to time out.
- Network partitioning - A communications failure
occurs in a network that effectively creates two smaller networks that
are now independent and cannot contact each other.
- File locking - The process of obtaining an exclusive lock on a
file over a network. File locking is used to ensure
exclusive access to the files that make up a recovery log. Locking is enabled by default.
- NFS - Network file system. A remote file access protocol
that is used by most UNIX® clients. The more recent version of NFS (NFSv4)
provides different file locking semantics to its predecessor (NFSv3).
- CIFS - Common Internet File System.
A remote file access protocol that is used by Windows® clients.
Transaction peer recovery requires a common
configuration of the resource providers between the participating server
members and is constrained to servers within the same cluster.
Peer recovery style considerations
The capabilities of the file
system through which the recovery logs are accessed and the
required style of peer recovery initiation (automated or manual) have an
impact on the deployment, configuration, and tuning of the system.
The HA function for the transaction service uses network file locking, and different file systems have different file locking behaviors. Some file systems support
network clients obtaining an exclusive write lock on a file and remove the lock if the client crashes. Some file systems do
not fully support this behavior. The implications of these differences are discussed in the next section.
Manually initiated peer recovery does not
require the use of exclusive file locks and can be configured and
deployed without the need to worry about the locking semantics of the underlying
file system. In addition to the steps described in General configuration, detailed configuration steps for manual recovery can be found
in the Configuration for manual peer recovery section.
Automatic peer recovery does require the use and guarantee of exclusive file locks to ensure
transactional data integrity. As a result, the capabilities of the
underlying file system that hosts the recovery logs become important, as discussed
in File system considerations. Detailed configuration steps for automated recovery can be
found in the Configuration for automated peer recovery section of this document.
File system considerations
To maintain the integrity of a recovery log
file, only a single client process can access the log at
a time; this access can be assured using the exclusive network file locks discussed previously. Scenarios where
this is important include system overloading and network
partitioning. In both of these cases, it is possible that machines seem to fail
when they are actually still running. In the case of system overloading, this situation results because the machine is not responding to communications events. In the case of network
partitioning, the machine is actually not contactable. Different file systems have different
capabilities with respect to support for exclusive file locks and the ability of those locks to become invalidated when a client fails or becomes unresponsive.
The most popular protocols for accessing
remote files are CIFS, used by Windows clients, and NFS, used by most UNIX
clients. The most recent version of the NFS protocol, NFSv4, provides lease-based
exclusive locks on files as does CIFS. NFSv3 locking is not lease-based and so is
less effective in an environment where file ownership needs to be
failed over in the event of a server crash. If NFSv3 is used with automated
peer recovery, a systems administrator must consider additional configuration choices, which are detailed in Considerations for
automated peer recovery. These considerations are not required for either manually-initiated peer recovery or when the file system is either NFSv4 or CIFS. This information is summarized in Table 1.
Table 1. Summary of supported configurations
The Disabling file locking section contains a discussion of the effect of disabling locking and how
to configure this functionality when NFSv3 is used with automatically initiated peer
recovery.
An introduction to high availability policies
What is a high availability policy?
WebSphere Application Server V6.0 provides integrated
high availability support in which system subcomponents, such as the
transaction service, are made highly available. A high availability (HA) policy
provides the logic that governs the manner in which each WebSphere Application Server HA component behaves within the overall HA framework. In the case of the transaction
service, the transaction HA policy provides the logic to determine which
servers own a recovery log at any given time. Typically, policies
assign ownership of a recovery log to the server that originally created
it (the home server) and that server may then use the recovery log for both
recovery and normal transactional activity.
In the event that the home server is
unavailable or fails, ownership can pass to a peer server to perform
recovery processing.
Conceptually, a policy can be thought of as
consisting of two key components, a policy type and a policy configuration. Only HA
policies relevant to transaction peer recovery are considered in this article, although other highly available WebSphere Application Server services are available that define their own HA policies.
Policy type
The policy type determines whether
peer recovery initiation is manual or automated. The policy
essentially provides the logic for determining updated recovery log ownership
in the event of a server failure. WebSphere Application Server provides the following policy types for transaction peer recovery:
- Static
Ownership of the recovery log is defined in the WebSphere Application Server
configuration. At run time, the static policy assigns ownership accordingly.
Any changes to ownership require a change to the static configuration and therefore, this policy type is used when manually initiated peer recovery is warranted.
- One-of-N
Ownership of the recovery log
is determined dynamically by the WebSphere Application Server HA framework and assigned to exactly one of the N cluster members. This policy type is used for
automated peer recovery.
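The two policy types can be contrasted with a small Python model. This is a sketch under stated assumptions, not the HA framework's algorithm: the function names are hypothetical, the alphabetical choice of peer stands in for whatever selection the framework actually makes, and the fail_back flag models the Fail back option that is described in the configuration steps later in this article.

```python
def static_owners(configured_servers, running):
    """Static policy: ownership comes only from the configuration.

    Returns the configured servers that are currently running. Nothing
    changes until an administrator edits configured_servers.
    """
    return [s for s in configured_servers if s in running]


def one_of_n_owner(home, members, running, current_owner=None, fail_back=True):
    """One-of-N policy: exactly one running member owns the log."""
    if fail_back and home in running:
        return home                      # home server reclaims its log
    if current_owner in running:
        return current_owner             # keep the present owner
    if home in running:
        return home
    # Assumption: pick the first running member alphabetically.
    candidates = sorted(s for s in members if s in running)
    return candidates[0] if candidates else None
```

The design point to notice is that the static policy never reassigns ownership by itself, which is why it suits manually initiated peer recovery, while the One-of-N function always returns at most one owner.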
Policy configuration
Additional parameters known as the match criteria are used to tune
the behavior of the policy. The match criteria are a set of
key/value pairs that define the scope of a policy. Two
specific key values of interest are:
- type
Used to associate a policy with
the transaction service. Do not confuse this value with
the policy type defined previously. The complete key value pair is as
follows: type = WAS_TRANSACTIONS. Other values for this key
are used to associate the policy definition with other highly
available WebSphere Application Server components and are outside the scope of this
document.
- GN_PS
Used to associate a policy with
a specific server. For transaction service policy definitions, this value is
used only when defining policies that support manual recovery. The
complete key value pair is as follows: GN_PS = <cell_name>\<node_name>\<server_name>.
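As a rough illustration of how match criteria scope a policy, the following Python sketch models policies as dictionaries. The data layout and the function names gn_ps, matches, and select_policy are hypothetical. The rule that the policy matching on the most criteria wins is an assumption used here to model the behavior described later in this article, where server-specific static policies override the default One-of-N policy.

```python
def gn_ps(cell, node, server):
    """Build a GN_PS value of the form <cell_name>\\<node_name>\\<server_name>."""
    return "\\".join([cell, node, server])


def matches(policy, group):
    """A policy applies when every one of its criteria equals the group's value."""
    return all(group.get(key) == value for key, value in policy["match"].items())


def select_policy(policies, group):
    """Assumption: among applicable policies, the most specific one wins."""
    applicable = [p for p in policies if matches(p, group)]
    if not applicable:
        return None
    return max(applicable, key=lambda p: len(p["match"]))


if __name__ == "__main__":
    policies = [
        {"name": "Clustered TM Policy", "match": {"type": "WAS_TRANSACTIONS"}},
        {"name": "TM-SERVER1",
         "match": {"type": "WAS_TRANSACTIONS",
                   "GN_PS": gn_ps("dmgrCell", "appnode1", "server1")}},
    ]
    group = {"type": "WAS_TRANSACTIONS",
             "GN_PS": gn_ps("dmgrCell", "appnode1", "server1")}
    print(select_policy(policies, group)["name"])
```

In this model, the recovery log group for server1 selects TM-SERVER1 because two criteria match, while the group for any other server falls back to the policy that matches on type alone.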
How is a policy associated with the cluster?
WebSphere Application Server provides a grouping
mechanism for application servers known as a core group and enforces
the rule that all members of a given cluster must also reside in the
same core group. The screenshot in Figure 2 shows the association between
an application server, which in this case is server1, and a core group, which in this
case is DefaultCoreGroup. Given that this server is a member of a
cluster, all other cluster members, by definition, are in the same
core group.
Figure 2. Core groups
Each core group definition has a
policies attribute that can contain an arbitrary number of policy
definitions. These definitions can be thought of as policy definitions that apply to
the associated clusters and their members. Figure 3
shows the default policy definitions for the default core group.
Figure 3. Policy definitions for the default core group
The policy definition, Clustered TM
Policy, is associated with the transaction service by the type =
WAS_TRANSACTIONS match criteria. This policy is a One-of-N type policy which
provides automated peer recovery processing support such
that only one server owns the recovery log at any time.
General configuration
Recovery log location
If transactional high availability
is not in use, no requirement exists to specify the transactions
recovery log directory. In the absence of such a setting, application
servers assume a default location within the profile directory. If
transactional high availability is in use, the recovery log location
must be provided to ensure that the recovery logs for a server are visible to
all application servers in the cluster. Otherwise, servers might not
be able to access each profile directory.
The administrative console panel that is
used to configure this setting is shown in Figure 4. You can find this panel under the Container services setting in the server configuration. This setting must be provided for each application server in the cluster.
Figure 4. Administrative console: Configuring the transaction service
Enabling high availability
Before peer recovery events can be
performed within the cluster, it is necessary to enable high
availability for persistent services, namely the transaction
service. This single boolean setting is located in the cluster
configuration as shown in Figure 5:
Figure 5. Enabling high availability for the cluster
After this setting is changed
(either enabled or disabled), you must restart the servers before the
required behavior is adopted. If the setting is enabled, servers
are only available for peer recovery processing after they are
restarted. Similarly, if the setting is disabled, servers continue to
be available for peer recovery processing until they are
restarted.
Configuration for automated peer recovery
Automated peer recovery processing
is the default setting when HA is enabled. Servers automatically peer
recover for each other without any manual intervention. Apart from the
general configuration steps described previously, no additional configuration
is required to adopt this style of transactional high availability.
This section aims to describe the
default configuration so that it can be restored if manual peer recovery
is configured.
Automated peer recovery is provided
by a single policy known as the One-of-N policy.
Configuration steps
- Remove any static HA policies
- When configuring the system for manual peer recovery (see Configuration for manual peer recovery), a number of static
policies are defined. These policies must be removed for automated peer
recovery to be enabled again. Within the administrative console, navigate to the Servers => Core groups => Core group settings tab in the left-hand panel. (All cluster members reside in a single core group. By default, this group is the DefaultCoreGroup.) Select the core group that contains these servers in the Core groups window. The Core groups configuration panel is displayed, as shown in Figure 6.
Figure 6. Core groups configuration
- Select Policies
from the Additional Properties options
on the right side of this window. The current set
of policies is displayed:
Figure 7. Core groups policies
- Select any static policies
with a match set including type =
WAS_TRANSACTIONS, and click Delete.
- Create the required transaction HA policy
- In the previous example, the required
WAS_TRANSACTIONS One-of-N policy already exists as shown in Figure 8:
Figure 8. One-of-N policy
If this policy is already defined,
no user action is required. If, however, this policy is not present,
it must be defined. To do this, navigate to the Servers => Core groups => Core group settings tab in the left panel. Select the core group that is associated with the
cluster members in the Core groups
window. This action opens the Core groups configuration panel that is shown in Figure 6.
- Select Policies
from the Additional Properties options
on the right side of this window. This action displays the current set
of policies:
Figure 9. Current set of policies
- From the Policies
panel, select New to begin the process of
defining a new policy. The first step is to choose the policy type. In
this case, choose the One-of-N policy
as shown in Figure 10 and select Next.
Figure 10. Select new policy type
- The next step is to provide a name
for the policy. Although this value is a free-form text string that is used only as a label in the administrative console, enter a name
that associates the policy with the required behavior (for example, Clustered TM policy).
Figure 11. Create new policy
- Enable Fail back to ensure
that a failed server reclaims its recovery log when it restarts, and
optionally enter a free-form description. Click OK
and then select Match criteria under Additional properties. This action opens an
empty Match criteria panel as shown in Figure 12.
Figure 12. Match criteria
- The final step is to define the match criteria that identifies this policy as a
transactional policy. Click New
to open the match criteria definition window. Enter type as the name and WAS_TRANSACTIONS as the value, and select OK.
Figure 13. New match criteria
- After this step is complete, save the
configuration in the normal manner, ensuring that the Synchronize changes with nodes option is selected.
- Restart the cluster
When these changes are made, stop and restart
the cluster members to adopt these new
settings. Until this action is taken, the cluster members continue to operate according to the previous policy configuration.
Considerations
A requirement of a recovery log is
that only one server has access to it (either for recovery
processing or standard logging) at any time. Consequently, a peer
recovery process must not run and access the associated recovery logs if
the original server is still running. When using manual peer recovery,
this access is enforced by the operator who only triggers peer recovery
processing for servers that are unavailable.
When using automated recovery
processing, WebSphere Application Server makes the determination that a server has failed
using a heart-beating mechanism between servers. When a server fails, it
stops responding to the heartbeat messages and other servers detect
this change.
In addition to server failure, two other scenarios exist where a server might either stop or seem to stop responding to these heartbeat events. These two scenarios, introduced in File system considerations,
are:
- System overloading occurs when a machine becomes very heavily loaded such that response times are extremely poor and requests begin to time out. Several potential causes exist for such overloading, including:
- The server is underpowered and cannot handle the
workload.
- The server received a temporary
surge of requests.
- Insufficient physical memory is
available. As a result, the operating system is too busy paging to
give the application server the required CPU time.
- Network partitioning occurs when a communications failure in a network results
in two smaller networks that are now independent and cannot contact each
other.
Regardless of the cause, these
conditions result in "failures" being detected in the system, even
though all the servers are actually still running and processing a
transactional workload. Figure 14 illustrates these
conditions:
Figure 14. Network partitioning
Although these conditions are rare,
left unchecked they have the potential to cause recovery log collision
and consequent loss of data integrity. To prevent such problems from
occurring, WebSphere Application Server uses network file locking technology
to ensure exclusive access to recovery log files.
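The idea of guarding a log file with an exclusive lock can be demonstrated with Python's standard fcntl module. This is a sketch only: it uses BSD-style flock on a local file on a POSIX system, which is not necessarily the mechanism that WebSphere Application Server uses, and it does not reproduce the network lock semantics discussed in this article. The function names are hypothetical.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Try to take an exclusive lock on path without blocking.

    Returns the open file object on success (the lock lasts while it
    stays open and locked), or None if another holder has the lock.
    """
    handle = open(path, "a+b")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # The lock is held elsewhere; do not touch the file.
        handle.close()
        return None
    return handle


def release(handle):
    """Give up the lock so that another server could take ownership."""
    fcntl.flock(handle, fcntl.LOCK_UN)
    handle.close()
```

A caller that receives None must not read or write the log. Only after the current holder calls release can a second caller obtain the lock, which mirrors the rule that a recovery log is accessed only while the exclusive lock is held.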
File locking for recovery logs
WebSphere Application Server obtains an exclusive lock
on the physical recovery log files whenever it is instructed to perform
recovery processing and releases this lock when it is instructed to pass
ownership of the logs to another server. Access to a recovery log is
only performed when the exclusive lock is held. As discussed in File system considerations, different network file systems have different
capabilities with respect to their support for exclusive locks. The
result is that automated peer recovery is better suited to NFSv4 or CIFS
than to NFSv3. Listed below are the differences between the locking behaviors of NFSv4
and its predecessor, NFSv3:
NFSv3
This file system supports exclusive
file locks, but holds them on behalf of a failed host until that host can restart. In this context, the host is the physical machine
running the application server that requested the lock and it is the
restart of the host, not the application server, that eventually
triggers the locks to release.
By way of illustration, consider the behavior when a cluster member fails:
- Server H is running on host H and holds an exclusive file lock for its own recovery log files.
- Server P is running on host P and holds an exclusive file lock for
its own recovery log files.
- Host H fails, taking server H with it. The NFS lock manager on the
file server holds the locks that are granted to server H on its behalf.
- A peer recovery event is triggered in server P for server H by
WebSphere Application Server.
- Server P attempts to gain an exclusive file lock for this peer
recovery log, but is unable to do so as it is held on behalf of server
H. The peer recovery process is blocked. Effectively, peer recovery is
disabled because of the locking behavior of the file system.
- At some point, host H is restarted. The locks held on its behalf
are released.
- The peer recovery process in server P is unblocked and granted the
exclusive file locks that are needed to perform peer recovery.
- Peer recovery takes place in server P for server H.
- Server H is restarted.
- If peer recovery is still in progress in server P, the recovery is halted.
- Server P releases the exclusive lock on the recovery logs and returns
ownership of the recovery logs back to server H.
- Server H obtains the exclusive lock and can now perform standard
transaction logging.
When using NFSv3, you can use two
techniques to provide a more appropriate failover behavior:
- Elect to use manual failover and configure the system as described in Configuration for manual peer recovery
- Disable the use of exclusive file locking and put in place
measures to prevent overloads or network partitions. See Disabling
file locking for more information on this topic.
NFSv4
Unlike NFSv3, this file system releases locks held on behalf of a host if that host fails. Peer recovery can occur
automatically without the need to restart the failed hardware. This version of NFS is recommended for use with automated peer recovery.
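The contrast between the two locking behaviors can be modeled with a toy lock manager. The Python class below is a simplified simulation written for this discussion, not an NFS client or server; the class name, method names, and the lease length used in the note that follows are all assumptions.

```python
class LockManager:
    """Toy model of the lock manager on a file server.

    lease_seconds=None models locks that are held until the owning
    host restarts (the NFSv3 behavior described above). A number
    models lease-based locks that lapse when they are not renewed.
    """

    def __init__(self, lease_seconds=None):
        self.lease_seconds = lease_seconds
        self.holder = None
        self.renewed_at = None

    def acquire(self, host, now):
        """Grant the lock if it is free, expired, or already held by host."""
        self._expire(now)
        if self.holder is None or self.holder == host:
            self.holder = host
            self.renewed_at = now
            return True
        return False

    def host_restarted(self, host):
        """A restarted host no longer holds its old locks."""
        if self.holder == host:
            self.holder = None

    def _expire(self, now):
        if (self.holder is not None
                and self.lease_seconds is not None
                and now - self.renewed_at > self.lease_seconds):
            self.holder = None
```

In the model without a lease, a lock taken by host H at time 0 blocks host P indefinitely until host_restarted is called for host H, which matches the sequence listed for NFSv3. With an illustrative lease of 90 seconds, the same request from host P fails at time 30 but succeeds at time 100 because host H stopped renewing its lease when it failed.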
Disabling file locking
If you use NFSv3 to support
automatic peer recovery processing, it becomes necessary to disable file locking, as discussed previously in File locking for recovery logs.
This action, in turn,
requires that additional measures be put in place to prevent
system overloading or network partitioning that might lead
to a peer recovery process being directed for an active server.
To disable file locking:
- Open the transaction service configuration settings and select Custom properties.
Figure 15. Transaction service custom properties
- Select New to define a
new custom property and enter DISABLE_FILE_LOCKING
as the name and TRUE as the value
before clicking OK to add the new property.
Figure 16. Disabling file locking
This article is not intended to
provide a guide to the mechanisms by which overloading and partitioning
can be prevented. These mechanisms are the domain of workload management and wider
systems configuration and administration. The references in the Resources
section provide further information.
Having taken steps to mitigate the
risk to recovery log integrity when locking is disabled, it is possible
to tune the heartbeating parameters of the WebSphere Application Server HA framework to change the conditions under which a server is considered failed. A system
administrator needs to consider the characteristics of
applications, network, and peak workloads to determine whether
an acceptable period of time exists beyond which the likelihood of an
incorrectly diagnosed server failure is considered acceptably small.
A trade-off exists between reducing the risk of an incorrect diagnosis
of server failure and increasing the time it takes for automated failover
and peer recovery to occur. By default, a server is considered to have
failed after 20 heartbeats, sent at 10-second intervals, are missed.
These defaults are properties of the core group that
can be modified, but a discussion of how to do this modification and the consequences
of doing so are beyond the scope of this article.
See Resources for more information.
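The trade-off is simple arithmetic: the time before a failure is declared is the number of missed heartbeats multiplied by the heartbeat interval. The short Python sketch below uses the default values quoted above; the function name is hypothetical, and the calculation ignores any additional delay in the framework itself.

```python
def detection_time_seconds(missed_heartbeats=20, interval_seconds=10):
    """Approximate time before a silent server is considered failed."""
    return missed_heartbeats * interval_seconds


if __name__ == "__main__":
    # With the defaults, 20 heartbeats x 10 seconds = 200 seconds.
    print(detection_time_seconds())
```

With the defaults, a silent server is treated as failed after roughly 200 seconds. Lowering either value shortens failover time but gives an overloaded or partitioned server less time to respond before it is wrongly diagnosed as failed.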
Configuration for manual peer recovery
Manual peer recovery processing is
not the default setting and must be enabled through the configuration.
Operator intervention is then required to trigger any peer
recovery processing. Manual peer recovery can be used when the file
system does not provide the required level of file locking support and
no constraints are in place to ensure that overloading or network
partitioning does not occur.
Operator action is required only
when a server fails and cannot be restarted; in this case, the
administrator can use the administrative console to specify which peer server the
HA framework directs the transaction log to fail over to.
The configuration steps that are required to set up manual peer recovery are listed below. Manual peer recovery processing is
provided by a group of policies known as static policies, where one
policy definition is provided per application server. Individual
definitions are required to define server-specific
configuration within the policy which, in this case, is the identity of the
server that will initiate a peer recovery process.
Configuration steps
- Create the required static policy definitions
- Within the administrative console, navigate to the Servers => Core groups => Core group settings tab
in the left panel. (All cluster members reside in a single core
group, which by default is the DefaultCoreGroup.) Select the core group that contains these servers in the Core groups window. This action opens the Core
groups configuration panel, as shown previously in Figure 6.
- Select Policies
from the Additional Properties options
on the right-hand side of this window. This action displays the current set
of policies, as shown previously in Figure 7.
- Note that the current policy that
governs transactional high availability can be identified by the type = WAS_TRANSACTIONS match criteria. In the previous example, the default is the One-of-N policy. This policy can be left in place and is overridden by the
static policy definitions that are provided by the configuration steps in this
section.
- From the Policies
panel, select New to begin the process of
defining a new policy. The first step is to choose the policy type.
Select the Static policy from the drop-down menu, as shown in Figure 17, and click Next.
Figure 17. Select a static policy
- A panel is displayed where you can
configure this new policy by entering a name and, optionally,
a description. The name you choose has no direct effect on the
configuration as it is a free-form text string that is used only as a
label in the administrative console. To assist readability in the administrative
console, choose a name that clearly associates the policy
with a particular server. In this example, the name TM-SERVER1 is used to associate this policy with server1.
Figure 18. Configure a static policy
- Click OK
and select Match criteria under Additional properties. An
empty Match criteria panel is displayed, as shown
in Figure 12.
- The next step is to define the two match criteria that identify this policy as a
transactional policy that is associated with the target server.
Select New to open the match
criteria definition window and enter type
as the name and WAS_TRANSACTIONS as
the value. After this action is done, click OK to associate the policy with the transaction service.
Figure 19. Define match criteria
- Repeat this procedure, entering GN_PS as the name
and <Cell>\<Node>\<Server> as the value,
where these fields are replaced by their respective values. In this
example, the Value field is set to the value, dmgrCell\appnode1\server1.
Figure 20. Match criteria
- Navigate back to the Policy
configuration panel that is shown in Figure 18 and select the Static
group servers, which is located on the right under Additional
Properties. The Static
group servers panel is displayed, as shown in Figure 21:
Figure 21. Static group servers
The Static
group servers panel lists all the servers in the core group and
classifies them as either Core group servers or Static group servers.
Now, no Static group
servers are listed.
- From the Core
group servers list, select the server that is associated with the policy
and click Add >> to move it to
the Static group servers list. Be careful when you
select the server because an incorrect selection
here can compromise data integrity. In this example,
appNode1/server1 is selected.
The list of static group servers
can be thought of as the set of servers that attempts to own the
recovery logs at the same time. For normal operation, several
criteria must be ensured to guarantee data integrity:
- Only one server must be added to this list.
This criterion is very important. Adding two servers causes recovery log
contention as both servers attempt to own the associated recovery logs.
Exception: Where a second server is
added as part of manual peer recovery initiation. More information can
be found in the Directing manual peer recovery section.
- Only the server that is associated with the
policy must be added to this list. Adding a different server
prevents the home server from owning its recovery logs and
stops the home server from starting correctly.
Figure 22. Static group servers
- Repeat this process for
each server in the cluster. When this process is complete, save the
configuration in the normal manner, ensuring that the Synchronize
changes with Nodes option is selected. The following figure shows a complete
configuration for an example cluster with three servers.
Figure 23. Example of a complete static configuration
- Restart the cluster
Restart the cluster within which these servers reside after these configuration changes are made.
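The rules for the Static group servers list given in the configuration steps can be expressed as a small validation routine. The following Python sketch is hypothetical and does not read a real WebSphere configuration; it assumes that you supply the server that the policy is associated with and the current contents of the list.

```python
def validate_static_group(policy_server, static_group_servers,
                          peer_recovery_in_progress=False):
    """Return a list of rule violations for one static policy.

    Normal operation: the list holds only the server associated with
    the policy. During operator-directed peer recovery, one additional
    server is allowed.
    """
    problems = []
    if policy_server not in static_group_servers:
        problems.append(f"{policy_server} is missing from its own policy")
    others = [s for s in static_group_servers if s != policy_server]
    allowed_others = 1 if peer_recovery_in_progress else 0
    if len(others) > allowed_others:
        problems.append(f"unexpected servers in the list: {', '.join(others)}")
    return problems
```

For example, a policy for server1 whose list contains only server1 passes, a list containing only server2 is reported because the home server cannot own its logs, and a list containing both servers is reported unless the operator has deliberately added the second server to direct peer recovery.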
Managing peer recovery
After the appropriate static policies
are defined, peer recovery can no longer take place automatically
and must be triggered by an administrator through the administrative console.
This requirement applies to peer recovery processing
only; standard recovery processing of server recovery logs,
driven when the server starts, still occurs automatically.
Typically, a peer recovery process
is directed by the operator if an application server becomes
unavailable for some reason, for example, a machine failure. The choice
of peer server, within the cluster, is arbitrary.
Before the peer recovery process is
initiated, it is important to ensure that the "problem" server has
actually failed and cannot restart. To ensure data integrity, a manual
peer recovery process must be initiated only for servers that are
not running.
Directing manual peer recovery
- In the administrative console, navigate to Servers => Core groups => Core group settings
in the left panel.
- Select the core group that contains the failed
server in the Core groups window. The
Core groups configuration panel is displayed, as shown in Figure 6.
- Select Policies
from the Additional Properties options
on the right side of this window. The policy set that was
established in the first section of this article is displayed, as
illustrated in Figure 23.
- Locate the static policy that is associated
with the failed server and select the Name
field. The Configuration settings for this policy are displayed, as
shown in Figure 24. (In this example, assume that server2 has failed.)
Figure 24. Policy for a failed server
- Now select Static
group servers, which is located on the right under Additional
Properties. The General
Properties panel is displayed as shown in Figure 25.
Figure 25. Static group servers
During normal running, this
list must contain only the server that is associated with the policy, in this
case, server2. This list identifies the set of servers
that attempt to own the associated recovery logs at the same
time. Listing more than one server causes recovery log contention as
both servers attempt to process recovery. However, if the
server has failed, which the operator must confirm, it is
safe to add a second server to this list. The second server
performs the peer recovery processing. After peer recovery processing is
complete, you must remove this additional entry before the failed server
is restarted.
The General
properties panel lists all the servers in the core group and
classifies them as either Core group servers or
Static group servers. At this point, the
only server that is listed in Static group servers
is the failed server.
- From the Core
group servers list, select the server on which you want to
initiate the peer recovery process. Be careful to select an
application server rather than a system server, such as the node agent or
deployment manager. In this example, server3 is chosen.
- After you select the server,
click Add to add the server
to the Static group servers.
Figure 26. Add static peer recoverer server
- Click OK and save the configuration, ensuring that the Synchronize changes with nodes option is
selected. This causes a recovery process for the failed server to begin on the peer server.
Resetting the configuration
When recovery processing is
complete and before the failed server is restarted, the configuration
changes that you made must be reversed. Navigate back
to the Static group servers list,
select the peer server that was added previously, click Remove => OK, and then save the configuration with the Synchronize changes with nodes option selected.
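The manual procedure follows a fixed order: confirm that the home server is down, add one peer to the static group servers list, wait for recovery to complete, and remove the peer before the home server restarts. The following Python sketch models that order as a minimal state holder. It is not a WebSphere API; the class StaticPolicy, its method names, and the running set are hypothetical names introduced for illustration, and in practice the operator must confirm the server state through other means.

```python
from typing import List, Set


class StaticPolicy:
    """Hypothetical model of one static policy's static group servers list."""

    def __init__(self, home_server: str) -> None:
        self.home_server = home_server
        # Normal operation: the list contains only the home server.
        self.static_group: List[str] = [home_server]

    def start_peer_recovery(self, peer: str, running: Set[str]) -> None:
        """Add a peer, but only if the home server is confirmed down."""
        if self.home_server in running:
            raise RuntimeError("home server is running; peer recovery is unsafe")
        if peer not in running:
            raise RuntimeError("peer server must be running to perform recovery")
        if len(self.static_group) > 1:
            raise RuntimeError("a peer server is already listed")
        self.static_group.append(peer)

    def reset(self) -> None:
        """Remove the peer after recovery processing is complete."""
        self.static_group = [self.home_server]

    def safe_to_restart_home(self) -> bool:
        """The home server may restart only when it is the sole entry."""
        return self.static_group == [self.home_server]


policy = StaticPolicy("server2")      # server2 has failed
running = {"server1", "server3"}      # servers the operator confirmed as up
policy.start_peer_recovery("server3", running)
print(policy.static_group)            # prints ['server2', 'server3']
print(policy.safe_to_restart_home())  # prints False
policy.reset()
print(policy.safe_to_restart_home())  # prints True
```

The guard in start_peer_recovery reflects the data integrity requirement stated earlier: a second server may be listed only when the home server is not running.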
Conclusion
This article introduced the high availability support for the WebSphere Application Server transaction service. It described how this support enables recovery of the transaction log of a failed server by a running peer, and discussed scenarios under which manual and automated peer recovery are appropriate. The article described the key steps required to configure these scenarios.
About the authors
John Beaven works as a Software Engineer for IBM on WebSphere Application Server. He has been involved in the development of transaction support for CORBA OTS and J2EE JTA and is the technical lead for high availability in the WebSphere Application Server transaction manager. John holds a Bachelor's degree in Computer Science from the University of Southampton, England.
Dr. Ian Robinson is an IBM senior technical staff member and the transaction architect for IBM WebSphere Application Server.
He has more than 10 years of experience designing and implementing distributed transaction systems, having worked on the IBM CICS server and
ComponentBroker CORBA server. Ian is co-chair of the Web Services Resource Framework TC,
spec lead for the J2EE Activity Service (JSR 95) and co-author of the
WS-Transaction set of specifications.
Ian received a BSc and a PhD in Physics from the University of Exeter, England, in 1986 and 1989 respectively.