hp Reliable Transaction Router
System Manager's Manual



4.2.2 What EXCEPTION Transactions Mean to Data Integrity

EXCEPTION transactions keep the application available, although they cause some loss of data integrity. EXCEPTION transactions are considered committed by the initiator of the transaction, as well as by the other participants (such as the other shadow member). Therefore, subsequent transactions, which depend on the results of this transaction, could produce erroneous outcomes. In some applications, the erroneous outcomes do not matter, but in applications where the outcome does matter, the best approach is to allow the system administrator to manually intervene.

4.3 Transaction State Changes

There are eight valid state changes allowed for the SET TRANSACTION command. Attempting to change a transaction to a state that is not allowed produces the error message %RTR-E-INVSTATCHANGE, Invalid to change from current state to the specified state. Table 4-2 identifies the valid state changes.

Table 4-2 Valid Transaction State Transitions

                 NEW STATE
CURRENT STATE    COMMIT    ABORT    EXCEPTION    DONE
SENDING          -         YES      -            -
VOTED            YES       YES      -            -
COMMIT           -         -        YES          YES
EXCEPTION        YES       -        -            YES
PRI_DONE         -         -        -            YES
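The table can also be expressed as a small lookup, which is handy for checking a planned SET TRANSACTION change before issuing it. The following is a minimal Python sketch, not part of RTR: the names VALID_STATE_CHANGES and is_valid_change are hypothetical, and the column placement of each YES is read from Table 4-2.

```python
# Valid journal-state changes, transcribed from Table 4-2.
# The dictionary and function names are illustrative, not RTR identifiers.
VALID_STATE_CHANGES = {
    "SENDING": {"ABORT"},
    "VOTED": {"COMMIT", "ABORT"},
    "COMMIT": {"EXCEPTION", "DONE"},
    "EXCEPTION": {"COMMIT", "DONE"},
    "PRI_DONE": {"DONE"},
}


def is_valid_change(current_state: str, new_state: str) -> bool:
    """Return True if Table 4-2 allows current_state -> new_state."""
    allowed = VALID_STATE_CHANGES.get(current_state.upper(), set())
    return new_state.upper() in allowed


def count_valid_changes() -> int:
    """Count every allowed (current, new) pair in the table."""
    return sum(len(targets) for targets in VALID_STATE_CHANGES.values())
```

For example, is_valid_change("voted", "abort") is true, whereas is_valid_change("sending", "commit") is false; the latter is the kind of request that RTR rejects with %RTR-E-INVSTATCHANGE.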

All transaction states in Table 4-2 are RTR journal states. Use the RTR commands SHOW TRANSACTION or DUMP JOURNAL (you must stop the RTRACP to use this command) to determine the journal state for each transaction branch.

Four typical situations are listed below where transaction state changes by the system administrator are allowed.

After the SET TRANSACTION command is executed, use the DUMP JOURNAL command to verify the result.

4.4 Command Line Examples

The following is an example of the SET TRANSACTION command:


RTR> start rtr 
RTR> set log/file=settran 
RTR> set transaction/state=PRI_DONE/new_state=DONE/facility=Facility1/- 
_RTR> partition=Partition1 * 

This example sets all transactions with the current state of PRI_DONE (remember) to DONE on the facility Facility1 and the partition Partition1. The log file, settran, records the transaction state changes. The changes can be viewed with the SHOW TRANSACTION command or the DUMP JOURNAL command. In a shadow recovery situation, this clears the journal of remember transactions and provides for a quick turnaround of the shadow site.

The following example shows how RTR commands monitor and manipulate three different transaction states. Consider a scenario where a distributed transaction accesses two RTR partitions. The multiple-participant distributed transaction would have two transaction branches accessing different RTR partitions, say part1 and part2, respectively.

The client commits the transaction and calls rtr_accept_tx(), which prompts RTR to start the two-phase commit protocol. RTR sends a prepare message to the two participants. Upon receiving the prepare message (mt_prepare), one of the server applications is ready to commit and casts its vote by calling rtr_accept_tx(). RTR writes a VOTE record in the RTR journal and sends the vote message back to the router. However, because of an unexpected defect in the application software, the second server has not sent its VOTE message back to the RTR router. Thus, the transaction is stalled in the second server.

To examine this situation, an RTR system administrator should first use the SHOW TRANSACTION/BACKEND command on the backend node to check the transaction's status. As shown in the following example, the transaction runtime state is RECEIVING, indicating that the distributed transaction is not yet committed. The server states for the two transaction branches are VOTED and VREQ respectively: one branch has been voted on by its server, whereas the other is still in the "Vote Request" (VREQ) state. The journal states for the branches are VOTED and SENDING: the first branch has voted and its VOTED record has been written to the RTR journal, while the second branch is still processing a message from the client and has not yet advanced to the VOTED state. The journal states recorded in the RTR journal are therefore consistent with the server states.

A transaction branch's journal state is persistent and is therefore used by the SET TRANSACTION command to change a transaction's state. The DUMP JOURNAL command is also useful to examine each transaction branch's journal state.


Backend transactions on node nodea at Mon Mar 13 16:02:42 2000 
 
Tid:            3ad01f10,0,0,0,0,3ad01f10,a08730b4 
Facility:                       test 
Frontend:                      nodea     FE-User:                     tu.7006 
State:                     RECEIVING     Start time: Mon Mar 13 16:00:08 2000 
Key-Range-Id:      16777216,16777217     Router:                        nodea 
Invocation:        ORIGINAL,ORIGINAL     Active-Key-Ranges:                 2 
Recovering-Key-Ranges:             0     Total-Tx-Enqs:                     2 
Server-Pid:                7006,7006     Server-State:             VOTED,VREQ 
Journal-Node:    nodea.com,nodea.com     Journal-State:         VOTED,SENDING 
First-Enq:                       1,2     Nr-Enqs:                         1,1 
Nr-Replies:                      0,0 
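
When many transactions must be checked, the per-branch fields in a listing like the one above can be extracted by a script. The following is a minimal Python sketch, not an RTR tool: the function name branch_states is hypothetical, and it assumes that the comma-separated values in Key-Range-Id, Server-State, and Journal-State line up by position, one entry per transaction branch.

```python
import re

# Three lines copied from the SHOW TRANSACTION/BACKEND listing above.
SAMPLE = """\
Key-Range-Id:      16777216,16777217     Router:                        nodea
Server-Pid:                7006,7006     Server-State:             VOTED,VREQ
Journal-Node:    nodea.com,nodea.com     Journal-State:         VOTED,SENDING
"""


def branch_states(text):
    """Pair up per-branch values, assuming they are aligned by position."""

    def field(name):
        # Take the first whitespace-free token that follows "name:".
        match = re.search(rf"{re.escape(name)}:\s+(\S+)", text)
        return match.group(1).split(",") if match else []

    return [
        {"key_range_id": key, "server_state": server, "journal_state": journal}
        for key, server, journal in zip(
            field("Key-Range-Id"), field("Server-State"), field("Journal-State")
        )
    ]
```

Applied to SAMPLE, the sketch returns two entries: the first branch with server and journal states both VOTED, and the second with server state VREQ and journal state SENDING.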
 

As previously described in this scenario, the transaction is stalled in one of the servers. To resolve this situation, use the RTR SET TRANSACTION command to abort this transaction. Change the journal state of either transaction branch to ABORT, as shown in the following example:


RTR>set transaction/new=abort/state=voted/facility=test/partition=part1 
%RTR-S-SETTRANDONE, 1 transaction(s) updated in partition part1 of 
   facility test 

or


RTR>set transaction/new=abort/state=sending/facility=test/partition=part2 
%RTR-S-SETTRANDONE, 1 transaction(s) updated in partition part2 of 
   facility test 

See Chapter 8 for detailed information on these commands.


Chapter 5
Server Shadowing and Recovery

With RTR shadowing, your system can recover from a site disaster without the need for special coding within your application program.

A database is said to be shadowed when two copies of the same database are deployed on separate nodes at two different locations, typically two different sites. Each location maintains a copy of the database used by the server application, and RTR keeps the database copies synchronized. Shadow site configurations can contain two nodes at separate sites, two nodes in a cluster, or two clusters at separate sites. When setting up a shadow configuration for two nodes in a cluster, the syntax must explicitly state that the nodes are not to be standby nodes.

Concurrent servers handle similar transactions (that is, transactions in the same key range, but not the same transactions). Standby servers do not handle transactions at all for the given key range, and shadow servers handle the same transactions.

5.1 Primary and Secondary Partition States

There is a concept of primary and secondary states for the shadow server pair, although in most cases this is transparent to the user when the processing is the same on both sites.

The assignment of primary and secondary states to partitions can be managed by the partition priority list or left to RTR. If left to RTR, initial role assignment is arbitrary, in that the first server of a shadow pair to start is put in the primary state and the second, in the secondary state. The adopted states may change, as servers come and go. RTR needs to determine which server is in the primary state before presenting a transaction to the server in the secondary state.

5.2 Automatic Features

A shadow site has an identical copy of the customer's database.

Transactions are sent by RTR to both sites. RTR ensures that they are processed by the servers in the same order on each site (unless transactions are defined as independent), so that both copies of the customer database remain up to date.

A transaction is sent to the secondary site only after the primary has accepted it, or if the primary fails before being asked to vote.

RTR suppresses replies and broadcasts issued by the secondary shadow server.

5.2.1 Shadow Events in Partitions

RTR provides the following shadowing events:
RTR_EVTNUM_SRPRIMARY Server is in primary mode
RTR_EVTNUM_SRSTANDBY Server is in standby mode
RTR_EVTNUM_SRSECONDARY Server is in secondary mode
RTR_EVTNUM_SRSHADOWLOST Server has lost its shadow partner
RTR_EVTNUM_SRSHADOWGAIN Server has gained its shadow partner
RTR_EVTNUM_SRRECOVERCMPL Server has completed recovery

The shadow events are delivered with no special status and no data. They are delivered only to the servers whose state has changed.

A server receives RTR_EVTNUM_SRPRIMARY under the following circumstances:

A server receives RTR_EVTNUM_SRSTANDBY when it starts up and servers already exist for the same key range on another node in the same cluster.

A server receives RTR_EVTNUM_SRSECONDARY when it starts up and a shadow primary set of servers exists elsewhere.

A server receives RTR_EVTNUM_SRSHADOWLOST if it is running as primary and the secondary goes away.

A server receives RTR_EVTNUM_SRSHADOWGAIN if it is running as primary and a secondary node starts up.

A server receives RTR_EVTNUM_SRRECOVERCMPL when it has finished doing recovery operations and is ready to start processing new transactions.

5.3 RTR Journal System

The RTR journal is used for the following purposes:

5.4 Shadow Site Failure and Journaling

If a shadow site fails, RTR allows transactions to continue to be processed on the remaining site. The transactions processed by the remaining server or servers are retained by the primary server in its journal; when the failed site restarts, these transactions are sent to this site as part of a shadow-recovery operation, thus bringing the failed site back up to date.

The overhead required when calculating journal size comes from internal journal data (block stamping) of approximately 3%. In addition, there is internal transaction data per (client to server) transactional message, and some further data per transaction (concerning voting and transaction completion).

Also, RTR prevents further transactional data from being written to the journal when it is nearly full, but continues to allow deletes from the journal (deletes also cause data to be written to the journal). Ten segments are held in reserve for storing information about deleted transactions even when RTR cannot accept further transactions because the journal is full.
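
The reserve behavior described above can be pictured with a toy model. The following Python sketch is an illustration only and does not reflect RTR internals: the class JournalModel, its methods, and the one-segment-per-write accounting are all hypothetical. Only the figure of ten reserved segments comes from the text.

```python
class JournalModel:
    """Toy model of a journal that reserves space for delete records."""

    RESERVED_SEGMENTS = 10  # from the text: ten segments are held in reserve

    def __init__(self, total_segments):
        self.total_segments = total_segments
        self.used_segments = 0

    def free_segments(self):
        return self.total_segments - self.used_segments

    def write_transaction(self, segments=1):
        # New transactional data may not eat into the reserve.
        if self.free_segments() - segments < self.RESERVED_SEGMENTS:
            return False
        self.used_segments += segments
        return True

    def write_delete(self, segments=1):
        # Deletes also write to the journal, but may use the reserve.
        if self.free_segments() < segments:
            return False
        self.used_segments += segments
        return True
```

In this model, a journal of 12 segments accepts two transactional writes, refuses the third, and still accepts a delete record afterward.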

Caution

If the journal disk becomes full, transactions are aborted until the shadow partner restarts and empties the journal of transactions to be replayed.

5.5 Performance

The performance of a shadow pair is comparable to that of a transaction that spans two nodes, with the addition of one extra protocol message that is required to ensure that the transactions are presented in the same order.

RTR does not have to wait for the secondary shadow server to complete its processing. It only needs to know that the primary has committed the transaction and that the journal file of the secondary shadow server contains the final vote status.

The two partners in a shadow pair should be connected with sufficient bandwidth to allow for the large amounts of data which may need to be transferred during a shadow catchup operation.

5.6 Shadows in Action

The first node on which a shadow backend for a particular key range starts is arbitrarily designated by RTR to be the primary site for that key range unless a priority list has been defined.

Initially each RTR backend searches its journals to find any recoverable transactions left over from a previous invocation of the backend. Once these have been processed (or RTR determines that no such transactions exist), the backend becomes active and available to handle new transactions sent by clients/frontends.

While no other backend node for this key range is available, the backend runs in REMEMBER mode. RTR saves transactions processed on this site in the RTR journal (together with the order in which they should be committed), so that when the backend on the other site starts, they can be sent to that site.

When a backend starts on a second site, it begins processing the transactions saved in the primary site's journal. These are deleted from the journal as they are processed. When the second site backends have caught up, the second backend enters SECONDARY ACTIVE state and the original site backends enter PRIMARY ACTIVE state. In this mode, new transactions are sent to both sites in parallel. They are executed first on the primary site, and shortly afterward on the secondary site in the same order. The primary site commits transactions as soon as it knows that the secondary site has hardened (i.e., written to the journal) the order in which the transaction is to be committed.

If a failure occurs at this point, the remaining site executes a short cleanup operation. After completing the cleanup operation and determining that the other site is really down, it reverts to the REMEMBER state and continues processing new transactions autonomously, saving the transaction information in its journal for when the other site restarts.
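
The sequence described in this section (REMEMBER, catch-up, PRIMARY ACTIVE and SECONDARY ACTIVE, and back to REMEMBER after a failure) can be summarized as a small state model. The following Python sketch is a simplification and not RTR code: the class and method names are hypothetical, the CATCHING_UP and DOWN labels are invented for illustration, and only a failure of the second site is modeled.

```python
from enum import Enum


class SiteState(Enum):
    REMEMBER = "REMEMBER"
    CATCHING_UP = "CATCHING UP"  # hypothetical label; the text names no state here
    PRIMARY_ACTIVE = "PRIMARY ACTIVE"
    SECONDARY_ACTIVE = "SECONDARY ACTIVE"
    DOWN = "DOWN"  # hypothetical label for a site that is not running


class ShadowPairModel:
    def __init__(self):
        # The first site to start runs alone and remembers transactions.
        self.first_site = SiteState.REMEMBER
        self.second_site = SiteState.DOWN
        self.journal = []  # transactions saved for the other site, in order

    def process(self, tid):
        # While running alone, save each transaction for later replay.
        if self.first_site is SiteState.REMEMBER:
            self.journal.append(tid)

    def second_site_starts(self):
        self.second_site = SiteState.CATCHING_UP

    def second_site_catches_up(self):
        # Replayed transactions are removed from the journal.
        replayed = list(self.journal)
        self.journal.clear()
        self.first_site = SiteState.PRIMARY_ACTIVE
        self.second_site = SiteState.SECONDARY_ACTIVE
        return replayed

    def second_site_fails(self):
        # The remaining site reverts to REMEMBER and continues alone.
        self.second_site = SiteState.DOWN
        self.first_site = SiteState.REMEMBER
```

A typical run processes a few transactions in REMEMBER mode, replays them in order when the second site catches up, and begins saving transactions again once second_site_fails() is called.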

The execution order is determined for transactions issued to concurrent servers on a particular node by recording the order in which the individual servers issue rtr_accept_tx() calls. RTR knows that at the time a correctly written server application calls rtr_accept_tx() , it has already accessed (and therefore locked) any database records it uses, and that it will release these records after RTR causes the rtr_accept_tx() call to complete. Any conflicting transaction would not be able to issue rtr_accept_tx() concurrently. Therefore a correct serialization order for issuing the transactions on the shadow site can be determined.

Transactions can also use the /INDEPENDENT or /READ_ONLY flag to tell RTR that their order is not important and transactions need not be recovered serially.
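
The ordering idea can be sketched as follows: record a sequence number each time a server accepts a transaction, then present transactions to the shadow site in that sequence, leaving transactions marked as independent unconstrained. This Python sketch is a toy model and not how RTR is implemented: the class AcceptOrderRecorder and its methods are hypothetical, and the independent argument merely stands in for the /INDEPENDENT or /READ_ONLY flag.

```python
class AcceptOrderRecorder:
    """Toy model: derive a replay order from the order of accept calls."""

    def __init__(self):
        self._next_sequence = 1
        self._ordered = []      # (sequence, tid) pairs for serial replay
        self._independent = []  # tids whose order does not matter

    def record_accept(self, tid, independent=False):
        # Independent transactions get no sequence number.
        if independent:
            self._independent.append(tid)
            return None
        sequence = self._next_sequence
        self._next_sequence += 1
        self._ordered.append((sequence, tid))
        return sequence

    def replay_plan(self):
        # Serial transactions in accept order, then the unordered set.
        serial = [tid for _, tid in sorted(self._ordered)]
        return serial, list(self._independent)
```

Because conflicting transactions cannot reach the accept call concurrently, the recorded sequence is a valid serialization order for the transactions that conflict.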

