Reliable Transaction Router
System Manager's Manual



3.6.4 Controlling the Active Site

RTR lets the system operator deploy a range of shadow and standby partitions in order to provide the desired degree of application resilience to failures. By default, RTR automatically manages the assignment of active and standby roles to the available partition instances. The operator can override this by assigning a relative priority to each backend on which a partition instance exists. Enter the priority list as backend node names in decreasing order of priority, highest first, as shown in the following example:


RTR> SET PARTITION/PRIORITY_LIST=(BE1, BE2, BE3) Facility1:Partition1 

Suspend transaction presentation before entering or changing the priority list.

Chapter 7 provides more information on the SET PARTITION command.

To set the partition backend priority list, program the set_qualifier argument of the rtr_set_info() call as follows:


 rtr_qualifier_value_t   set_qualifiers[ 2 ]; 
 /* Comma-separated backend node names, highest priority first */
 char    *szNodeList = "your,list,of,node,names,here"; 
 
 set_qualifiers[ 0 ].qv_qualifier = rtr_partition_be_priority_list; 
 set_qualifiers[ 0 ].qv_value  = &szNodeList; 
 /* Terminate the qualifier list */
 set_qualifiers[ 1 ].qv_qualifier = rtr_qualifiers_end; 
 set_qualifiers[ 1 ].qv_value  = NULL; 
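
The node list in the sample above is a single comma-separated string. When the backend names are held in an array, the string can be assembled as in the following minimal sketch. The helper build_priority_list() is hypothetical and is not part of RTR. The separator format (commas, no spaces) is assumed from the placeholder string in the sample.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not an RTR call: joins node names with commas,
 * highest priority first, into buf.
 * Returns 0 on success, or -1 if buf is too small (buf then holds only
 * the names that fit and must not be used).
 */
static int build_priority_list(const char *const nodes[], size_t count,
                               char *buf, size_t buflen)
{
    size_t used = 0;
    size_t i;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

    for (i = 0; i < count; i++) {
        size_t len = strlen(nodes[i]);
        size_t need = len + (i > 0 ? 1 : 0);   /* name plus separator */

        if (used + need + 1 > buflen)          /* +1 for the terminator */
            return -1;
        if (i > 0)
            buf[used++] = ',';
        memcpy(buf + used, nodes[i], len);
        used += len;
        buf[used] = '\0';
    }
    return 0;
}
```

For example, passing the names BE1, BE2 and BE3 with a 32-byte buffer produces the string BE1,BE2,BE3, which could then take the place of the szNodeList literal.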

3.6.5 Controlling Failover

In a system configured for maximum fault tolerance employing both shadows and standbys, there is a choice to be made in case the primary site fails. The /FAILOVER_POLICY qualifier to the SET PARTITION command allows the system operator to select one of the following policies that RTR should pursue in selecting the new primary site in the event of a failure:

A factor influencing the choice of failover policy is the time required to effect a failover and the subsequent impact on client response times. The time for standby takeover of a failed node's journal depends on the size of that journal. Failover to a shadow site is effected quickly. However, if the secondary site has accumulated a backlog of transactions, they must be processed before any new transactions can be started. The choice will be determined by the characteristics of your application and configuration.

The following example shows one use of the /FAILOVER_POLICY qualifier:


RTR> SET PARTITION/FAILOVER_POLICY=SHADOW Facility1:Partition1 

For more information, see the SET PARTITION command in Chapter 7.

To set the partition failover policy, program the set_qualifier argument of the rtr_set_info() call as follows:


     rtr_qualifier_value_t   set_qualifiers[ 2 ]; 
     rtr_partition_failover_policy_t newPolicy;   /* assign a legal policy value before the call */
 
     set_qualifiers[ 0 ].qv_qualifier = rtr_partition_failover_policy; 
     set_qualifiers[ 0 ].qv_value = &newPolicy; 
     /* Terminate the qualifier list */
     set_qualifiers[ 1 ].qv_qualifier = rtr_qualifiers_end; 
     set_qualifiers[ 1 ].qv_value = NULL; 

Legal values for newPolicy are:

3.6.6 Controlling Transaction Replay

RTR can control transaction replay in cases where a killer message occurs during replay and prevents recovery from continuing normally. A killer message is a message capable of causing repeated server application failure during recovery, with the result that server availability is lost. This is typically the result of an improperly handled condition or application programming error within the server itself. Under such circumstances it may be desirable to sidestep a particular transaction, maintain server operation, and manually process the transaction at some later time.

The RTR solution is to establish, for a given partition, the maximum number of retries for any given transaction presented during recovery. Once this limit has been exceeded, the offending transaction is removed from the recovery process and is written to the journal as an exception record. Subsequent processing of this transaction requires manual intervention by someone qualified to evaluate and correct the situation in both the application and in RTR. Once the application status is understood, the SET TRANSACTION command can be used to update the journal, thus ensuring that the final state of any manually transacted exception is accurately reflected in future recovery operations.

The recovery retry count indicates the maximum number of times that a transaction should be presented for recovery before being written to the journal as an exception. Once a transaction has been recorded as an exception, it is no longer considered eligible for recovery and requires manual processing by a qualified individual.

The recovery retry count is partition-specific, and applies to both local and shadow recovery operations. The default is no limit on the number of retries, which permits a killer message to bring down all available servers servicing a given partition.

The recovery retry count should be set before starting (or restarting) the application servers so that the limit is established prior to the start of recovery operations.

The following example shows how to set the retry count:


RTR> SET PARTITION/RECOVERY_RETRY_COUNT=3 Facility1:Partition1 

See Chapter 7 for more information on the SET PARTITION command.

To set the partition transaction recovery limit, program the set_qualifier argument of rtr_set_info() as follows:


    rtr_qualifier_value_t   set_qualifiers[ 2 ]; 
    rtr_uns_32_t   newLimit = . . .;   /* maximum presentations of a transaction during recovery */
 
    set_qualifiers[ 0 ].qv_qualifier = rtr_partition_rcvy_retry_count; 
    set_qualifiers[ 0 ].qv_value = &newLimit; 
    /* Terminate the qualifier list */
    set_qualifiers[ 1 ].qv_qualifier = rtr_qualifiers_end; 
    set_qualifiers[ 1 ].qv_value = NULL; 
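
The effect of the recovery retry count described in this section can be illustrated with a small stand-alone model. This is a sketch only: it does not call RTR, and every name in it is hypothetical. It assumes the limit is the maximum number of presentations, so with a limit of 3 a transaction that fails three times is recorded as an exception. The unlimited default is deliberately not modeled, because a killer message would then loop indefinitely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Outcome of replaying one transaction in this model */
typedef enum { TX_RECOVERED, TX_EXCEPTION } tx_outcome_t;

/* Hypothetical stand-in for presenting a transaction to a server;
 * returns true if the server processed it without failing. */
typedef bool (*present_fn)(void *ctx);

/*
 * Presents the transaction at most retry_limit times.
 * *presentations receives the number of presentations actually made.
 */
static tx_outcome_t replay_with_limit(present_fn present, void *ctx,
                                      unsigned retry_limit,
                                      unsigned *presentations)
{
    unsigned n;

    assert(present != NULL && presentations != NULL && retry_limit > 0);

    for (n = 1; n <= retry_limit; n++) {
        *presentations = n;
        if (present(ctx))
            return TX_RECOVERED;
    }
    /* Limit reached: the transaction becomes an exception record */
    return TX_EXCEPTION;
}

/* Models a killer message: the server fails on every presentation */
static bool always_fails(void *ctx)
{
    (void)ctx;
    return false;
}

/* Models a transient fault: succeeds once the countdown reaches zero */
static bool succeeds_when_countdown_hits_zero(void *ctx)
{
    unsigned *countdown = ctx;
    return --(*countdown) == 0;
}
```

With a limit of 3, the always_fails server yields TX_EXCEPTION after three presentations, while a server that fails once and then succeeds yields TX_RECOVERED on the second presentation.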

3.6.7 Partition Persistence

Partitions in RTR are designed to be persistent, remaining until explicitly removed during normal RTR processing. However, under certain conditions, relics of partitions can remain in the RTR journal. RTR automatically performs some cleanup of such records, but depends on the creation of the relevant facility to initiate this process. For example, in a test environment, many facilities are created for temporary use, with no intention of retaining those facilities. Because the creation of each facility may cause the creation of associated records for a partition in the RTR journal, creating many ad hoc facilities can cause the RTR journal to become filled. In such a case, an attempt to create a new partition (or open a new server channel) may produce the error message NOMOREPRT.

To correct this problem, purge the journal of these ad hoc entries: use the DUMP JOURNAL command to verify the partition name and transaction ID of the unwanted transactions, then use the SET TRANSACTION command with the partition name and transaction ID to set the state to DONE. Finally, recreate the facility with the CREATE FACILITY command.

3.7 Displaying Partition Information

Information on the definition and state of a partition is displayed with the SHOW PARTITION command, as seen in the following example. The information of interest in the context of partition management relates to the backend instance of the partition. See Chapter 7 for more information on the SHOW PARTITION command.


RTR> show partition/backend 
 
Backend partitions on node BE1  at Wed Feb 24 15:07:50 1999 
 
Partition name                     Facility                       State 
RTR$DEFAULT_PARTITION_16777217     RTR$DEFAULT_FACILITY           active 
RTR$DEFAULT_PARTITION_16777218     RTR$DEFAULT_FACILITY           active 
 


Chapter 4
Transaction Management

4.1 Overview

This section describes the concepts of RTR's transaction management capability.

The RTR transaction is the center of an RTR application, and transaction state is the property that characterizes a transaction's current condition. Whenever a transaction progresses from one stage to another, the transaction state is updated to reflect the transition. Transaction states are maintained in memory and are also stored in the RTR journal for recovery purposes.

Three different states are used internally by RTR to keep track of transaction status: the Transaction Runtime State, the Transaction Journal State, and the Transaction Server State.

These three state types are very closely related. The Transaction Runtime State, also known as Transaction State, describes how a transaction progresses from an RTR role (FE, TR, BE) point of view. For example, a transaction can enter a stage in which its transaction state from an RTR frontend viewpoint is different from its transaction state from the viewpoint of an RTR router.

The Transaction Journal State describes how a transaction branch running on an RTR backend progresses from the RTR journal perspective. The Transaction Journal State and the Transaction Server State belong to each separate branch (participating partition) of the transaction. When a transaction branch changes state, its corresponding Transaction Journal State is updated and the new state, along with other information pertaining to this transaction, is stored in the RTR journal. The Transaction Journal State is primarily used by RTR to perform the recovery replay of a transaction after a failure, if necessary. An RTR frontend and router will not see this state. Note that because the Transaction Runtime State is not always stored immediately in the journal, the state in the journal may not always reflect the actual state of the transaction. The following table describes the Transaction Journal States.

Table 4-1 Transaction Journal States
Transaction Journal State (by Branch)   Explanation of State
SENDING                                 Initial state of the transaction branch.
VOTED                                   The server has voted and the vote has been written to disk.
COMMIT                                  RTR has asked the servers to commit the transaction.
ABORT                                   RTR has asked the servers to roll back the transaction.
DONE                                    The servers have informed RTR that the transaction has been committed to the database. It is safe to FORGET the transaction.
PRI_DONE                                The primary server has committed the transaction; the secondary may not have done so. This is the typical case of a REMEMBER transaction.
EXCEPTION                               RTR asked the server to commit the transaction, but the server failed to commit it to the database. The transaction needs manual reconciliation.
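
A monitoring or audit tool that handles journal state names could encode Table 4-1 as a lookup table. The following is a minimal sketch. The type journal_state_info_t and the function find_journal_state() are hypothetical and are not part of the RTR API. The two flags record only what the table states explicitly: DONE is safe to forget, and EXCEPTION needs manual reconciliation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for one Transaction Journal State */
typedef struct {
    const char *name;
    bool needs_manual_reconciliation;   /* stated for EXCEPTION only */
    bool safe_to_forget;                /* stated for DONE only */
} journal_state_info_t;

/* State names as listed in Table 4-1 */
static const journal_state_info_t journal_states[] = {
    { "SENDING",   false, false },
    { "VOTED",     false, false },
    { "COMMIT",    false, false },
    { "ABORT",     false, false },
    { "DONE",      false, true  },
    { "PRI_DONE",  false, false },
    { "EXCEPTION", true,  false },
};

/* Returns the descriptor for a state name, or NULL if it is unknown */
static const journal_state_info_t *find_journal_state(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof journal_states / sizeof journal_states[0]; i++) {
        if (strcmp(journal_states[i].name, name) == 0)
            return &journal_states[i];
    }
    return NULL;
}
```

The lookup is case-sensitive and returns NULL for any name not in the table, so callers should check the result before using it.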

The Transaction Server State describes transaction state as seen by a specific server, serving that branch of the transaction. RTR uses this state to determine if a server is available to process a new transaction or if a server has voted on a particular transaction. As with the Transaction Journal State, the Transaction Server State is only relevant at the backend.

RTR provides a set of comprehensive management utilities to help users closely monitor the flow of a transaction and all three types of states associated with that transaction. These utilities help users understand how a transaction migrates from one stage to another and help diagnose problems.

Use the SHOW TRANSACTION command to examine a transaction's up-to-date status on frontend, router or backend roles. With this command, users can see all three types of transaction states of a particular transaction and also understand how the RTR journal and server applications perceive this transaction. When a transaction commits or aborts, all status associated with this transaction is removed from memory and can no longer be monitored by the command.

The DUMP JOURNAL command can be used to trace and review the flow of a transaction. The RTR journal saves all of the information about a transaction. This includes its transaction journal state and the transaction messages (records) received from the RTR client and sent to the server. The information is kept until a transaction is committed or aborted and all participants have been notified.

Use the SET TRANSACTION command to modify the current state of a transaction to a new state. This command can be used to circumvent an unexpected situation. For example, in a situation where two shadowed servers are configured, the system administrator might decide not to replay (recover) all remembered transactions in an RTR journal after a failure. The SET TRANSACTION command could set specified transactions in a PRI_DONE or REMEMBER state to a DONE state, avoiding the delay of replaying remembered transactions from the journal and so speeding recovery. The SET TRANSACTION command should be used only by experienced RTR system administrators, because it introduces the risk of corrupting or losing transactions if used incorrectly. It can be used on the backend only, and the RTR log file must be turned on for this command.

Log file entries are made for all transaction state changes for debugging and auditing purposes.

4.2 Exception Transactions

When a server votes on a transaction, RTR expects the server to commit the transaction to the database when RTR makes the request. If for some reason the server cannot do so, the server has two choices:

EXCEPTION transactions can be inspected with the DUMP JOURNAL command. The final state of the transaction should say EXCEPTION.

4.2.1 Dealing with EXCEPTION Transactions

The system administrator must decide what to do with transactions that are marked EXCEPTION. There are two choices:

4.2.2 What EXCEPTION Transactions Mean to Data Integrity

EXCEPTION transactions keep the application available, although they cause some loss of data integrity. EXCEPTION transactions are considered committed by the initiator of the transaction, as well as by the other participants (such as the other shadow member). Therefore, subsequent transactions, which are dependent on the results of this transaction, could produce erroneous outcomes. In some applications, the erroneous outcomes do not matter. In applications where the outcome does matter, the best approach is to crash the application, and allow the system administrator to manually intervene.

4.3 Transaction State Changes

There are eight valid state changes allowed for the SET TRANSACTION command. Attempting to change a transaction to a state that is not allowed produces the error message %RTR-E-INVSTATCHANGE, Invalid to change from current state to the specified state. Table 4-2 identifies the valid state changes.

Table 4-2 Valid Transaction State Transitions
                 NEW STATE
Current State    COMMIT    ABORT    EXCEPTION    DONE
SENDING                    YES
VOTED            YES       YES
COMMIT                              YES          YES
EXCEPTION        YES                             YES
PRI_DONE                                         YES

All transaction states referenced in Table 4-2 are RTR journal states. Use the RTR commands DUMP JOURNAL or SHOW TRANSACTION to determine the journal state for each transaction branch.
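
A script or tool that prepares SET TRANSACTION commands could check a requested change against Table 4-2 before issuing it. The following minimal sketch encodes the eight transitions as read from that table. The enum journal_state_t and the function is_valid_state_change() are hypothetical names for illustration and are not part of the RTR API. RTR itself remains the authority and reports %RTR-E-INVSTATCHANGE for a change it does not allow.

```c
#include <assert.h>
#include <stdbool.h>

/* Journal states named in Tables 4-1 and 4-2 (hypothetical enum) */
typedef enum {
    JS_SENDING,
    JS_VOTED,
    JS_COMMIT,
    JS_ABORT,
    JS_DONE,
    JS_PRI_DONE,
    JS_EXCEPTION,
    JS_COUNT          /* number of states, not a state itself */
} journal_state_t;

/* Returns true if Table 4-2 lists the change from 'from' to 'to' */
static bool is_valid_state_change(journal_state_t from, journal_state_t to)
{
    /* Rows are current states, columns are new states.
     * Entries not listed default to false. */
    static const bool allowed[JS_COUNT][JS_COUNT] = {
        [JS_SENDING]   = { [JS_ABORT] = true },
        [JS_VOTED]     = { [JS_COMMIT] = true, [JS_ABORT] = true },
        [JS_COMMIT]    = { [JS_EXCEPTION] = true, [JS_DONE] = true },
        [JS_EXCEPTION] = { [JS_COMMIT] = true, [JS_DONE] = true },
        [JS_PRI_DONE]  = { [JS_DONE] = true },
    };

    assert((unsigned)from < JS_COUNT && (unsigned)to < JS_COUNT);
    return allowed[from][to];
}
```

A table-driven check keeps the rules in one place. For example, is_valid_state_change(JS_VOTED, JS_COMMIT) is true, while is_valid_state_change(JS_SENDING, JS_COMMIT) is false.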

Four typical situations are listed below where transaction state changes by the system administrator are allowed.

After the SET TRANSACTION command is executed, use the DUMP JOURNAL command to verify the result.

