IBM WebSphere Developer Technical Journal: Transactional high availability and deployment considerations in WebSphere Application Server V6

Level: Intermediate

John Beaven (beavenj@uk.ibm.com), WebSphere Development, IBM
Ian Robinson (ian_robinson@uk.ibm.com), STSM, WebSphere Transactions Architect, IBM

06 Apr 2005

This article introduces the new high availability support for the IBM® WebSphere® Application Server transaction service, available as part of WebSphere Application Server V6. The article describes the two main styles of transactional high availability, discusses the infrastructure requirements associated with them, and explains the configuration steps required to enable these styles of high availability in your WebSphere Application Server deployment.

Introduction

The WebSphere Application Server transaction manager stores information regarding the state of completing transactions in a persistent form that is used during transaction recovery. This persistent form is referred to as the transaction recovery log and is used to complete prepared transactions following a server failure. This activity is referred to as transaction recovery processing. In addition to completing outstanding transactions, this processing also ensures that any locks held in the associated resource managers are released.

Prior to WebSphere Application Server V6, it was necessary to restart a failed server to perform recovery processing. Version 6.0 introduces the ability for a peer server (another cluster member) to process the recovery logs of a failed server while the peer continues to manage its own transactional workload. This capability is known as peer recovery processing and supports the resolution of in-doubt transactions without the need to wait for the failed server to restart. This facility forms part of the overall WebSphere Application Server high availability (HA) strategy.

Styles of transactional high availability

WebSphere Application Server V6.0 supports two styles for the initiation of transaction peer recovery, which are categorized in this document as automated and manual. The style is governed by the configuration of a high availability policy, which is referred to hereafter simply as the policy for the transaction service. More information on these policies can be found in An introduction to high availability policies.

This document describes what these two styles offer, how to configure them and the factors to consider before enabling their use:

  • Automated peer recovery
    This is the default style of peer recovery initiation. If an application server fails, WebSphere Application Server automatically selects a server to perform peer recovery processing on its behalf. Apart from enabling high availability and configuring the recovery log location for each cluster member, no additional WebSphere Application Server configuration steps are required to use this model.
  • Manual peer recovery
    This style of peer recovery must be explicitly configured. If an application server fails, the operator can use the administrative console to select a server to perform recovery processing on its behalf. The configuration steps that are required to prepare the system for manual peer recovery and the method by which it is directed are described in the Configuration for manual peer recovery section.





Physical deployment considerations

Regardless of the style of peer recovery initiation (automated or manual), certain common physical and logical deployment considerations must be taken into account prior to the use of the HA function. Some additional considerations for automated peer recovery are documented below.

Transaction recovery logs

For application servers to perform peer recovery for each other, it is necessary for them to access the recovery logs. Recovery logs must be placed on a medium that is available to all servers, such as a network-attached storage device (NAS) mounted on each node. All nodes must have read and write access to the recovery logs.


Figure 1. Physical deployment of recovery logs
Physical deployment of recovery logs

Two types of potential server failure exist: software failure and hardware failure. Software failures generally do not affect other application servers directly. Even servers on the same physical hardware can perform peer recovery processing. If a hardware failure occurs, all the servers that are deployed on the failed hardware become unavailable. Servers on other hardware are required to handle peer recovery processing. Clearly, any HA configuration requires that servers are deployed across multiple and discrete hardware systems.





Logical deployment considerations

For the purposes of this section, it is necessary to introduce the following term definitions. These definitions are described in more detail in Configuration for automated peer recovery.

  • System overloading - The system becomes very heavily loaded such that response times are extremely poor and requests begin to time out.
  • Network partitioning - A communications failure occurs in a network that effectively creates two smaller networks that are now independent and cannot contact each other.
  • File locking - The process of obtaining an exclusive lock on a file over a network. File locking is used to ensure exclusive access to the files that make up a recovery log. Locking is enabled by default.
  • NFS - Network file system. A remote file access protocol that is used by most UNIX® clients. The more recent version of NFS (NFSv4) provides different file locking semantics to its predecessor (NFSv3).
  • CIFS - Common Internet File System. A remote file access protocol that is used by Windows® clients.

Transaction peer recovery requires a common configuration of the resource providers between the participating server members and is constrained to servers within the same cluster.

Peer recovery style considerations

The capabilities of the file system through which the recovery logs are accessed and the required style of peer recovery initiation (automated or manual) have an impact on the deployment, configuration, and tuning of the system. The HA function for the transaction service uses network file locking, and different file systems have different file locking behaviors. Some file systems support network clients obtaining an exclusive write lock on a file and remove the lock if the client crashes. Some file systems do not fully support this behavior. The implications of these differences are discussed in the next section.

Manually initiated peer recovery does not require the use of exclusive file locks and can be configured and deployed without the need to worry about the locking semantics of the underlying file system. In addition to the steps described in General configuration, detailed configuration steps for manual recovery can be found in the Configuration for manual peer recovery section.

Automated peer recovery does require the use and guarantee of exclusive file locks to ensure transactional data integrity. As a result, the capabilities of the underlying file system that hosts the recovery logs become important, as discussed in File system considerations. Detailed configuration steps for automated recovery can be found in the Configuration for automated peer recovery section of this document.

File system considerations

To maintain the integrity of a recovery log file, only a single client process can access the log at a time; this access can be assured using the exclusive network file locks discussed previously. Scenarios where this is important include system overloading and network partitioning. In both of these cases, it is possible that machines seem to fail when they are actually still running. In the case of system overloading, this situation results because the machine is not responding to communications events. In the case of network partitioning, the machine is actually not contactable. Different file systems have different capabilities with respect to support for exclusive file locks and the ability of those locks to become invalidated when a client fails or becomes unresponsive.

The most popular protocols for accessing remote files are CIFS, used by Windows clients, and NFS, used by most UNIX clients. The most recent version of the NFS protocol, NFSv4, provides lease-based exclusive locks on files as does CIFS. NFSv3 locking is not lease-based and so is less effective in an environment where file ownership needs to be failed over in the event of a server crash. If NFSv3 is used with automated peer recovery, a systems administrator must consider additional configuration choices, which are detailed in Considerations for automated peer recovery. These considerations are not required for either manually-initiated peer recovery or when the file system is either NFSv4 or CIFS. This information is summarized in Table 1.


Table 1. Summary of supported configurations
Supported configurations

The Disabling file locking section contains a discussion of the effect of disabling locking and how to configure this functionality when NFSv3 is used with automatically initiated peer recovery.
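
The rule that Table 1 summarizes can be expressed as a small check. The following Python sketch is illustrative only: the function name and the string labels are assumptions made for this example and are not WebSphere Application Server settings.

```python
# Hypothetical helper; the names and labels are illustrative, not WebSphere settings.
STYLES = {"automated", "manual"}
FILE_SYSTEMS = {"NFSv3", "NFSv4", "CIFS"}


def needs_additional_configuration(style, file_system):
    """Return True when the combination calls for extra configuration choices."""
    if style not in STYLES:
        raise ValueError(f"unknown peer recovery style: {style}")
    if file_system not in FILE_SYSTEMS:
        raise ValueError(f"unknown file system: {file_system}")
    # Only automated peer recovery over NFSv3 needs the extra considerations.
    return style == "automated" and file_system == "NFSv3"
```

For example, needs_additional_configuration("manual", "NFSv3") is False, because manually initiated peer recovery does not depend on the locking semantics of the file system.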





An introduction to high availability policies

What is a high availability policy?

WebSphere Application Server V6.0 provides integrated high availability support in which system subcomponents, such as the transaction service, are made highly available. A high availability (HA) policy provides the logic that governs the manner in which each WebSphere Application Server HA component behaves within the overall HA framework. In the case of the transaction service, the transaction HA policy provides the logic to determine which servers own a recovery log at any given time. Typically, policies assign ownership of a recovery log to the server that originally created it (the home server) and that server may then use the recovery log for both recovery and normal transactional activity.

In the event that the home server is unavailable or fails, ownership can pass to a peer server to perform recovery processing.

Conceptually, a policy can be thought of as consisting of two key components, a policy type and a policy configuration. Only HA policies relevant to transaction peer recovery are considered in this article, although other highly available WebSphere Application Server services are available that define their own HA policies.

Policy type

The policy type determines whether peer recovery initiation is manual or automated. The policy essentially provides the logic for determining updated recovery log ownership in the event of a server failure. WebSphere Application Server provides the following policy types for transaction peer recovery:

  • Static
    Ownership of the recovery log is defined in the WebSphere Application Server configuration. At run time, the static policy assigns ownership accordingly. Any changes to ownership require a change to the static configuration and therefore, this policy type is used when manually initiated peer recovery is warranted.
  • One-of-N
    Ownership of the recovery log is determined dynamically by the WebSphere Application Server HA framework and assigned to exactly one of the N cluster members. This policy type is used for automated peer recovery.

Policy configuration

Additional parameters known as the match criteria are used to tune the behavior of the policy. The match criteria are a set of key/value pairs that define the scope of a policy. Two specific keys of interest are:

  • type
    Used to associate a policy with the transaction service. Do not confuse this value with the policy type defined previously. The complete key value pair is as follows: type = WAS_TRANSACTIONS. Other values for this key are used to associate the policy definition with other highly available WebSphere Application Server components and are outside the scope of this document.
  • GN_PS
    Used to associate a policy with a specific server. For transaction service policy definitions, this value is used only when defining policies that support manual recovery. The complete key value pair is as follows: GN_PS = <cell_name>\<node_name>\<server_name>.
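
As a rough illustration of these key/value pairs, the following Python sketch builds the match criteria as a plain dictionary. The function name is hypothetical and the dictionary is only a stand-in for the values entered in the administrative console; it is not a WebSphere Application Server API.

```python
def transaction_match_criteria(cell=None, node=None, server=None):
    """Build the match criteria pairs described above as a plain dictionary.

    With no arguments, only the type pair is returned (automated recovery).
    With cell, node, and server, a GN_PS pair is added (manual recovery).
    """
    criteria = {"type": "WAS_TRANSACTIONS"}
    parts = (cell, node, server)
    if any(parts):
        if not all(parts):
            raise ValueError("GN_PS needs cell, node, and server names")
        # GN_PS = <cell_name>\<node_name>\<server_name>
        criteria["GN_PS"] = "\\".join(parts)
    return criteria
```

Calling transaction_match_criteria("dmgrCell", "appnode1", "server1") yields a GN_PS value of dmgrCell\appnode1\server1, the form used later in the manual configuration steps.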

How is a policy associated with the cluster?

WebSphere Application Server provides a grouping mechanism for application servers known as a core group and enforces the rule that all members of a given cluster must also reside in the same core group. The screenshot in Figure 2 shows the association between an application server, which in this case is server1, and a core group, which in this case is DefaultCoreGroup. Given that this server is a member of a cluster, all other cluster members, by definition, are in the same core group.


Figure 2. Core groups
Core groups

Each core group definition has a policies attribute that can contain an arbitrary number of policy definitions. These definitions can be thought of as policy definitions that apply to the associated clusters and their members. Figure 3 shows the default policy definitions for the default core group.


Figure 3. Policy definitions for the default core group
Policy definitions for the default core group

The policy definition, Clustered TM Policy, is associated with the transaction service by the type = WAS_TRANSACTIONS match criteria. This policy is a One-of-N type policy which provides automated peer recovery processing support such that only one server owns the recovery log at any time.





General configuration

Recovery log location

If transactional high availability is not in use, no requirement exists to specify the transaction recovery log directory. In the absence of such a setting, application servers assume a default location within the profile directory. If transactional high availability is in use, the recovery log location must be provided to ensure that the recovery logs for a server are visible to all application servers in the cluster. Otherwise, servers might not be able to access the profile directories of their peers.

The administrative console panel that is used to configure this setting is shown in Figure 4. You can find this panel under the Container services setting in the server configuration. This setting must be provided for each application server in the cluster.


Figure 4. Administrative console: Configuring the transaction service
Configuring the transaction service

Enabling high availability

Before peer recovery events can be performed within the cluster, it is necessary to enable high availability for persistent services, namely the transaction service. This single boolean setting is located in the cluster configuration as shown in Figure 5:


Figure 5. Enabling high availability for the cluster
Enabling high availability for the cluster

After this setting is changed (either enabled or disabled), you must restart the servers before the required behavior is adopted. If the setting is enabled, servers are only available for peer recovery processing after they are restarted. Similarly, if the setting is disabled, servers continue to be available for peer recovery processing until they are restarted.





Configuration for automated peer recovery

Automated peer recovery processing is the default setting when HA is enabled. Servers automatically peer recover for each other without any manual intervention. Apart from the general configuration steps described previously, no additional configuration is required to adopt this style of transactional high availability.

This section aims to describe the default configuration so that it can be restored if manual peer recovery is configured.

Automated peer recovery is provided by a single policy known as the One-of-N policy.

Configuration steps

  1. Remove any static HA policies

    1. When configuring the system for manual peer recovery (see Configuration for manual peer recovery), a number of static policies are defined. These policies must be removed for automated peer recovery to be enabled again. Within the administrative console, navigate to the Servers => Core groups => Core group settings tab in the left-hand panel. (All cluster members reside in a single core group. By default, this group is the DefaultCoreGroup.) Select the core group that contains these servers in the Core groups window. The Core groups configuration panel is displayed, as shown in Figure 6.
      Figure 6. Core groups configuration
      Core groups configuration
    2. Select Policies from the Additional Properties options on the right side of this window. The current set of policies is displayed:
      Figure 7. Core groups policies
      Core groups policies
    3. Select any static policies with a match set including type = WAS_TRANSACTIONS, and click Delete.

  2. Create the required transaction HA policy

    1. In the previous example, the required WAS_TRANSACTIONS One-of-N policy already exists as shown in Figure 8:
      Figure 8. One-of-N policy
      One-of-N policy
      If this policy is already defined, no user action is required. If, however, this policy is not present, it must be defined. To do this, navigate to the Servers => Core groups => Core group settings tab in the left panel. Select the core group that is associated with the cluster members in the Core groups window. This action opens the Core groups configuration panel that is shown in Figure 6.

    2. Select Policies from the Additional Properties options on the right side of this window. This action displays the current set of policies:
      Figure 9. Current set of policies
      Current set of policies
    3. From the Policies panel, select New to begin the process of defining a new policy. The first step is to choose the policy type. In this case, choose the One-of-N policy as shown in Figure 10 and select Next.
      Figure 10. Select new policy type
      Select new policy type
    4. The next step is to provide a name for the policy. Although this value is a free-form text string that is used only as a label in the administrative console, enter a name that associates the policy with the required behavior (for example, Clustered TM Policy).
      Figure 11. Create new policy
      Create new policy
    5. Enable Fail back to ensure that a failed server reclaims its recovery log when it restarts, and optionally enter a free-form description. Click OK and then select Match criteria under Additional properties. This action opens an empty Match criteria panel as shown in Figure 12.
      Figure 12. Match criteria
      Match criteria
    6. The final step is to define the match criteria that identifies this policy as a transactional policy. Click New to open the match criteria definition window. Enter type as the name and WAS_TRANSACTIONS as the value, and select OK.
      Figure 13. New match criteria
      New match criteria
    7. After this step is complete, save the configuration in the normal manner, ensuring that the Synchronize changes with Nodes option is selected.

  3. Restart the cluster

    When these changes are made, stop and restart the cluster members to adopt these new settings. Until this action is taken, the cluster members continue to operate according to the previous policy configuration.
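
The ownership behavior that the One-of-N policy and the Fail back option aim for can be pictured with a toy model. The Python sketch below is an assumption-laden illustration, not the HA framework's algorithm: the function name is hypothetical, and the rule used to pick a peer (the first name in sorted order) is an arbitrary choice made only to keep the example deterministic.

```python
def choose_log_owner(home, current_owner, running, fail_back=True):
    """Toy model of One-of-N ownership for one server's recovery log.

    home: the server that created the log.
    current_owner: the server that owns the log now, or None.
    running: the set of cluster members that are currently running.
    """
    if home in running and (fail_back or current_owner not in running):
        return home                # home server keeps or reclaims its own log
    if current_owner in running:
        return current_owner       # owner is healthy; leave ownership alone
    if running:
        return sorted(running)[0]  # arbitrary deterministic pick (assumption)
    return None                    # no server available to own the log
```

With fail back enabled, a restarted home server takes its log back from the peer; with fail back disabled in this model, the peer keeps ownership until it stops.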

Considerations

A requirement of a recovery log is that only one server has access to it (either for recovery processing or standard logging) at any time. Consequently, a peer recovery process must not run and access the associated recovery logs if the original server is still running. When using manual peer recovery, this access is enforced by the operator who only triggers peer recovery processing for servers that are unavailable.

When using automated recovery processing, WebSphere Application Server makes the determination that a server has failed using a heart-beating mechanism between servers. When a server fails, it stops responding to the heartbeat messages and other servers detect this change.

In addition to server failure, two other scenarios exist where a server might either stop or seem to stop responding to these heartbeat events. These two scenarios, introduced in File system considerations, are:

  • System overloading occurs when a machine becomes very heavily loaded such that response times are extremely poor and requests begin to time out. Several potential causes exist for such overloading, including:
    • The server is underpowered and cannot handle the workload.
    • The server received a temporary surge of requests.
    • Insufficient physical memory is available. As a result, the operating system is too busy paging to give the application server the required CPU time.
  • Network partitioning occurs when a communications failure in a network results in two smaller networks that are now independent and cannot contact each other.

Regardless of the cause, these conditions result in "failures" being detected in the system, even though all the servers are actually still running and processing a transactional workload. Figure 14 illustrates these conditions:


Figure 14. Network partitioning
Network partitioning

Although these conditions are rare, left unchecked they have the potential to cause recovery log collision and consequent loss of data integrity. To prevent such problems from occurring, WebSphere Application Server uses network file locking technology to ensure exclusive access to recovery log files.
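
The idea of guarding a log file with an exclusive lock can be sketched in a few lines of Python. This is a local illustration only: it uses the standard fcntl.flock call on a UNIX system, the file name and helper names are hypothetical, and it does not show how WebSphere Application Server implements its own locking or how a particular network file system behaves.

```python
import fcntl
import os
import tempfile


def try_acquire(path):
    """Open path and take an exclusive lock; return None if it is already held."""
    handle = open(path, "a+")
    try:
        # LOCK_NB makes the call fail at once instead of blocking.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


def release(handle):
    """Drop the lock and close the file."""
    fcntl.flock(handle, fcntl.LOCK_UN)
    handle.close()


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "tranlog")  # hypothetical file name
    home = try_acquire(log_path)
    print("home server holds lock:", home is not None)
    print("peer can take lock:", try_acquire(log_path) is not None)
    release(home)
```

While the first holder keeps the lock, a second attempt is refused, which is the property that keeps two servers from writing to the same recovery log.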

File locking for recovery logs

WebSphere Application Server obtains an exclusive lock on the physical recovery log files whenever it is instructed to perform recovery processing and releases this lock when it is instructed to pass ownership of the logs to another server. Access to a recovery log is only performed when the exclusive lock is held. As discussed in File system considerations, different network file systems have different capabilities with respect to their support for exclusive locks. The result is that automated peer recovery is better suited to NFSv4 or CIFS than to NFSv3. Listed below are the differences between the locking behaviors of NFSv4 and its predecessor, NFSv3:

NFSv3

This file system supports exclusive file locks, but holds them on behalf of a failed host until that host can restart. In this context, the host is the physical machine running the application server that requested the lock and it is the restart of the host, not the application server, that eventually triggers the locks to release.

By way of illustration, consider the behavior when a cluster member fails:

  1. Server H is running on host H and holds an exclusive file lock for its own recovery log files.
  2. Server P is running on host P and holds an exclusive file lock for its own recovery log files.
  3. Host H fails, taking server H with it. The NFS lock manager on the file server holds the locks that are granted to server H on its behalf.
  4. A peer recovery event is triggered in server P for server H by WebSphere Application Server.
  5. Server P attempts to gain an exclusive file lock for this peer recovery log, but is unable to do so as it is held on behalf of server H. The peer recovery process is blocked. Effectively, peer recovery is disabled because of the locking behavior of the file system.
  6. At some point, host H is restarted. The locks held on its behalf are released.
  7. The peer recovery process in server P is unblocked and granted the exclusive file locks that are needed to perform peer recovery.
  8. Peer recovery takes place in server P for server H.
  9. Server H is restarted.
  10. If peer recovery is still in progress in server P, the recovery is halted.
  11. Server P releases the exclusive lock on the recovery logs and returns ownership of the recovery logs back to server H.
  12. Server H obtains the exclusive lock and can now perform standard transaction logging.
When using NFSv3, you can use two techniques to provide a more appropriate failover behavior:
  1. Elect to use manual failover and configure the system as described in Configuration for manual peer recovery.
  2. Disable the use of exclusive file locking and put in place measures to prevent overloads or network partitions. See Disabling file locking for more information on this topic.

NFSv4

Unlike NFSv3, this file system releases locks held on behalf of a host in case that host fails. Peer recovery can occur automatically without the need to restart the failed hardware. This version of NFS is recommended for use with automated peer recovery.
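
The difference between the two behaviors can be modeled with a toy lock table. The Python sketch below is a simplification under stated assumptions: the class and method names are hypothetical, lease expiry is treated as instantaneous when a host fails, and nothing here reflects a real NFS implementation.

```python
class LockManager:
    """Toy model of a file server's exclusive-lock table (not real NFS)."""

    def __init__(self, lease_based):
        self.lease_based = lease_based
        self.owner = {}  # log name -> host that holds the exclusive lock

    def acquire(self, log, host):
        """Grant the lock if it is free or already held by the same host."""
        holder = self.owner.get(log)
        if holder is None or holder == host:
            self.owner[log] = host
            return True
        return False

    def host_failed(self, host):
        # Lease-based locking (NFSv4-style): a failed host loses its locks.
        # Otherwise (NFSv3-style): locks stay held on the failed host's behalf.
        if self.lease_based:
            self._drop(host)

    def host_restarted(self, host):
        # In both models, a restart of the host clears its old locks.
        self._drop(host)

    def _drop(self, host):
        self.owner = {log: h for log, h in self.owner.items() if h != host}
```

In the NFSv3-style model, a peer's acquire call for the failed server's log keeps returning False until host_restarted is called; in the lease-based model, it succeeds as soon as host_failed is recorded.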

Disabling file locking

If you use NFSv3 to support automated peer recovery processing, it becomes necessary to disable file locking, as discussed previously in File locking for recovery logs. This action, in turn, requires that additional measures be put in place to prevent system overloading or network partitioning that might lead to a peer recovery process being directed for an active server.

To disable file locking:

  1. Open the transaction service configuration settings and select Custom properties.
    Figure 15. Transaction service custom properties
    Transaction service custom properties
  2. Select New to define a new custom property and enter DISABLE_FILE_LOCKING as the name and TRUE as the value before clicking OK to add the new property.
    Figure 16. Disabling file locking
    Disabling file locking

This article is not intended to provide a guide to the mechanisms by which overloading and partitioning can be prevented. These mechanisms are the domain of workload management and wider systems configuration and administration. The references in the Resources section provide further information.

Having taken steps to mitigate the risk to recovery log integrity when locking is disabled, it is possible to tune the heartbeating parameters of the WebSphere Application Server HA framework to change the conditions under which a server is considered failed. A system administrator needs to consider the characteristics of applications, network, and peak workloads to determine whether an acceptable period of time exists beyond which the likelihood of an incorrectly diagnosed server failure is considered acceptably small.

A trade-off exists between reducing the risk of an incorrect diagnosis of server failure and increasing the time it takes for automated failover and peer recovery to occur. By default, a server is considered to have failed after it misses 20 heartbeats that are sent at 10-second intervals. These defaults are properties of the core group that can be modified, but a discussion of how to modify them and the consequences of doing so is beyond the scope of this article. See Resources for more information.
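
The arithmetic behind this trade-off is simple enough to sketch. In the Python example below, the function and parameter names are hypothetical; only the default values of 20 heartbeats and 10 seconds come from the text above.

```python
def failure_detection_seconds(missed_heartbeats=20, heartbeat_period_seconds=10):
    """Approximate time before an unresponsive server is declared failed."""
    if missed_heartbeats <= 0 or heartbeat_period_seconds <= 0:
        raise ValueError("both values must be positive")
    return missed_heartbeats * heartbeat_period_seconds
```

With the defaults, the result is 200 seconds. Halving either value halves the wait before failover begins, but it also halves the time that an overloaded or partitioned server has to respond before it is wrongly treated as failed.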





Configuration for manual peer recovery

Manual peer recovery processing is not the default setting and must be enabled through the configuration. Operator intervention is then required to trigger any peer recovery processing. Manual peer recovery can be used when the file system does not provide the required level of file locking support and no constraints are in place to ensure that overloading or network partitioning does not occur.

Operator action is required only when a server fails and cannot be restarted; in this case, the administrator can use the administrative console to specify which peer server the HA framework directs the transaction log to fail over to.

The configuration steps that are required to set up manual peer recovery are listed below. Manual peer recovery processing is provided by a group of policies known as static policies, where one policy definition is provided per application server. Individual definitions are required to define server-specific configuration within the policy which, in this case, is the identity of the server that will initiate a peer recovery process.

Configuration steps

  1. Create the required static policy definitions
    1. Within the administrative console, navigate to the Servers => Core groups => Core group settings tab in the left panel. (All cluster members reside in a single core group, which by default is the DefaultCoreGroup.) Select the core group that contains these servers in the Core groups window. This action opens the Core groups configuration panel, as shown previously in Figure 6.
    2. Select Policies from the Additional Properties options on the right-hand side of this window. This action displays the current set of policies, as shown previously in Figure 7.
    3. Note that the current policy that governs transactional high availability can be identified by the type = WAS_TRANSACTIONS match criteria. In the previous example, the default is One-of-N policy. This policy can be left in place and is overridden by the static policy definitions that are provided by the configuration steps in this section.
    4. From the Policies panel, select New to begin the process of defining a new policy. The first step is to choose the policy type. Select the Static policy from the drop-down menu, as shown in Figure 17, and click Next.
      Figure 17. Select a static policy
      Select a static policy
    5. A panel is displayed where you can configure this new policy by entering a name and, optionally, a description. The name you choose has no direct effect on the configuration as it is a free-form text string that is used only as a label in the administrative console. To assist readability in the administrative console, choose a name that clearly associates the policy with a particular server. In this example, the name TM-SERVER1 is used to associate this policy with server1.
      Figure 18. Configure a static policy
      Configure a static policy
    6. Click OK and select Match criteria under Additional properties. An empty Match criteria panel is displayed, as shown in Figure 12.
    7. The next step is to define the two match criteria that identify this policy as a transactional policy that is associated with the target server. Select New to open the match criteria definition window and enter type as the name and WAS_TRANSACTIONS as the value. After this action is done, click OK to associate the policy with the transaction service.
      Figure 19. Define match criteria
      Define match criteria
    8. Repeat this procedure with the name GN_PS and the value <Cell>\<Node>\<Server>, where these fields are replaced by their respective values. In this example, the Value field is set to dmgrCell\appnode1\server1.
      Figure 20. Match criteria
      Match criteria
    9. Navigate back to the Policy configuration panel that is shown in Figure 18 and select the Static group servers, which is located on the right under Additional Properties. The Static group servers panel is displayed, as shown in Figure 21:
      Figure 21. Static group servers
      Static group servers
      The Static group servers panel lists all the servers in the core group and classifies them as either Core group servers or Static group servers. At this point, no static group servers are listed.

    10. From the Core group servers list, select the server that is associated with the policy and click Add >> to move it to the Static group servers list. Be careful when you select the server because an incorrect selection here can compromise data integrity. In this example, appNode1/server1 is selected.

      The list of static group servers can be thought of as the set of servers that attempts to own the recovery logs at the same time. For normal operation, several criteria must be ensured to guarantee data integrity:
      • Only one server must be added to this list. This criterion is very important. Adding two servers causes recovery log contention as both servers attempt to own the associated recovery logs. The one exception is when a second server is added as part of manual peer recovery initiation. More information can be found in the Directing manual peer recovery section.
      • Only the server that is associated with the policy must be added to this list. Adding a different server prevents the home server from owning its recovery logs and stops the home server from starting correctly.
        Figure 22. Static group servers
    11. Repeat this process for each server in the cluster. When this process is complete, save the configuration in the normal manner, ensuring that the Synchronize changes with Nodes option is selected. The following figure shows a complete configuration for an example cluster with three servers.
      Figure 23. Example of a complete static configuration
  2. Restart the cluster

    Restart the cluster within which these servers reside after these configuration changes are made.
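The configuration rules above can be sketched in code. The following Python is a minimal illustrative model, not a WebSphere API: the `StaticPolicy` class, the helper functions, and the policy name are all hypothetical, and the cell, node, and server names follow the example values in steps 8 and 10. It builds the two match criteria and checks that the static group holds exactly one server, the home server.

```python
from dataclasses import dataclass, field


@dataclass
class StaticPolicy:
    """Illustrative model of one static policy (hypothetical, not a WebSphere API)."""
    name: str
    match_criteria: dict
    static_group_servers: list = field(default_factory=list)


def match_criteria_for(cell, node, server):
    """Build the two match criteria described in steps 7 and 8."""
    return {
        "type": "WAS_TRANSACTIONS",
        "GN_PS": f"{cell}\\{node}\\{server}",
    }


def validate(policy, node, server):
    """Return a list of problems found in the policy for its home server."""
    problems = []
    home = f"{node}/{server}"
    if policy.match_criteria.get("type") != "WAS_TRANSACTIONS":
        problems.append("policy is not matched to the transaction service")
    if not policy.match_criteria.get("GN_PS", "").endswith(f"\\{node}\\{server}"):
        problems.append("GN_PS does not name the home server")
    if len(policy.static_group_servers) != 1:
        problems.append("exactly one static group server is required")
    elif policy.static_group_servers[0] != home:
        problems.append("static group server is not the home server")
    return problems


# Hypothetical policy name; server names follow the article's example.
policy = StaticPolicy(
    name="server1 transaction policy",
    match_criteria=match_criteria_for("dmgrCell", "appnode1", "server1"),
    static_group_servers=["appnode1/server1"],
)
print(validate(policy, "appnode1", "server1"))  # an empty list means no problems
```

A check of this kind could be run against an exported description of each policy before the cluster is restarted, since a wrong entry in the static group either causes log contention or prevents the home server from starting.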





Managing peer recovery

After the appropriate static policies are defined, peer recovery can no longer take place automatically and must be triggered by an administrator through the administrative console. This requirement applies to peer recovery processing only; standard recovery processing of server recovery logs, driven when the server starts, still occurs automatically.

Typically, a peer recovery process is directed by the operator if an application server becomes unavailable for some reason, for example, a machine failure. The choice of peer server, within the cluster, is arbitrary.

Before the peer recovery process is initiated, it is important to ensure that the "problem" server has actually failed and cannot restart. To ensure data integrity, a manual peer recovery process must be initiated only for servers that are not running.

Directing manual peer recovery

  1. Within the administrative console, navigate to the Servers => Core groups => Core group settings tab in the left panel.
  2. Select the core group that contains the failed server in the Core groups window. The Core groups configuration panel is displayed, as shown in Figure 6.
  3. Select Policies from the Additional Properties options on the right side of this window. The policy set that was established earlier in this article is displayed, as illustrated in Figure 23.
  4. Locate the static policy that is associated with the failed server and select the Name field. The Configuration settings for this policy are displayed, as shown in Figure 24. In this example, assume that server server2 has failed.
    Figure 24. Policy for a failed server
  5. Now select Static group servers, which is located on the right under Additional Properties. The General Properties panel is displayed as shown in Figure 25.
    Figure 25. Static group servers
    During normal running, this list must contain only the server that is associated with the policy, in this case, server2. This list identifies the set of servers that attempt to own the associated recovery logs at the same time. Adding more than one server causes recovery log contention as both servers attempt to process recovery. However, in the event that the server has failed, which is an assertion that must be made by the operator, it is safe to add a second server to this list. The second server performs the peer recovery processing. After peer recovery processing is complete, you must remove this additional entry before the failed server is restarted.

    The General properties panel lists all the servers in the core group and classifies them as either Core group servers or Static group servers. At this point, the only server that is listed in Static group servers is the failed server.

  6. From the Core group servers list, select the server on which you want to initiate the peer recovery process. Be careful to select an application server rather than a system server, such as the node agent or deployment manager. In this example, server3 is chosen.
  7. After you select the server, click Add to add the server to the Static group servers.
    Figure 26. Add static peer recoverer server
  8. Click OK and save the configuration, ensuring that the Synchronize changes with nodes option is selected. This causes a recovery process for the failed server to begin on the peer server.

Resetting the configuration

When recovery processing is complete and before the failed server is restarted, the configuration changes that you made must be reversed. Navigate back to the Static group servers list, select the peer server that was added previously and click Remove => OK before saving the configuration with the Synchronize changes with nodes option selected.
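
The manual procedure and its reset can be summarized as two guarded changes to the static group list. The Python below is a minimal sketch under stated assumptions: the function names, the `PeerRecoveryError` class, and the system server labels are hypothetical and do not correspond to any WebSphere API. Whether the home server is running is passed in as a flag because that assertion must be made by the operator.

```python
class PeerRecoveryError(Exception):
    """Raised when a step would put recovery log integrity at risk."""


def begin_peer_recovery(static_group, home, peer, home_is_running, system_servers=()):
    """Return the static group list with the peer added (steps 6 to 8)."""
    if home_is_running:
        raise PeerRecoveryError(f"{home} is still running")
    if static_group != [home]:
        raise PeerRecoveryError("static group must contain only the home server")
    if peer == home or peer in system_servers:
        raise PeerRecoveryError(f"{peer} is not a valid peer application server")
    return [home, peer]


def reset_after_recovery(static_group, home, peer):
    """Return the list with the peer removed, ready for the home server restart."""
    if peer not in static_group or peer == home:
        raise PeerRecoveryError(f"{peer} was not added as a peer")
    return [s for s in static_group if s != peer]


# server2 has failed; server3 is chosen as the peer, as in the example above.
# "nodeagent" and "dmgr" are hypothetical labels for system servers.
group = ["server2"]
group = begin_peer_recovery(group, "server2", "server3", home_is_running=False,
                            system_servers=("nodeagent", "dmgr"))
print(group)  # ['server2', 'server3']
group = reset_after_recovery(group, "server2", "server3")
print(group)  # ['server2']
```

The sketch refuses to add a peer while the home server is running and returns the list to a single entry before restart, which mirrors the two integrity conditions stated in this section.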





Conclusion

This article introduced the high availability support for the WebSphere Application Server transaction service. It described how this support enables recovery of the transaction log of a failed server by a running peer, and discussed scenarios under which manual and automated peer recovery are appropriate. The article described the key steps required to configure these scenarios.






About the authors

John Beaven works as a Software Engineer for IBM on WebSphere Application Server. He has been involved in the development of transaction support for CORBA OTS and J2EE JTA and is the technical lead for high availability in the WebSphere Application Server transaction manager. John holds a Bachelor's degree in Computer Science from the University of Southampton, England.


Dr. Ian Robinson is an IBM senior technical staff member and the transaction architect for IBM WebSphere Application Server. He has more than 10 years of experience designing and implementing distributed transaction systems, having worked on the IBM CICS server and ComponentBroker CORBA server. Ian is co-chair of the Web Services Resource Framework TC, spec lead for the J2EE Activity Service (JSR 95), and co-author of the WS-Transaction set of specifications. Ian received a BSc and a PhD in Physics from the University of Exeter, England, in 1986 and 1989 respectively.



