hp Reliable Transaction Router
System Manager's Manual

4.2.2 What EXCEPTION Transactions Mean to Data Integrity

EXCEPTION transactions keep the application available, although they cause some loss of data integrity. EXCEPTION transactions are considered committed by the initiator of the transaction, as well as by the other participants (such as the other shadow member). Therefore, subsequent transactions, which depend on the results of this transaction, could produce erroneous outcomes. In some applications, the erroneous outcomes do not matter, but in applications where the outcome does matter, the best approach is to allow the system administrator to manually intervene.

4.3 Transaction State Changes

There are eight valid state changes allowed for the SET TRANSACTION command. Attempting to change a transaction to a state that is not allowed produces the error message %RTR-E-INVSTATCHANGE, Invalid to change from current state to the specified state. Table 4-2 identifies the valid state changes.

Table 4-2 Valid Transaction State Transitions

                 NEW STATE
CURRENT STATE    COMMIT    ABORT    EXCEPTION    DONE
SENDING                    YES
VOTED            YES       YES
COMMIT                              YES          YES
EXCEPTION        YES                             YES
PRI_DONE                                         YES

All transaction states in Table 4-2 are RTR journal states. Use the RTR commands SHOW TRANSACTION or DUMP JOURNAL (you must stop the RTRACP to use this command) to determine the journal state for each transaction branch.
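The table can also be read as a lookup from current state to permitted new states. The following Python fragment is only an illustrative sketch of that lookup; the function names are hypothetical and are not part of RTR, which performs this check itself and reports %RTR-E-INVSTATCHANGE for a disallowed change.

```python
# Valid SET TRANSACTION state changes, transcribed from Table 4-2.
VALID_TRANSITIONS = {
    "SENDING": {"ABORT"},
    "VOTED": {"COMMIT", "ABORT"},
    "COMMIT": {"EXCEPTION", "DONE"},
    "EXCEPTION": {"COMMIT", "DONE"},
    "PRI_DONE": {"DONE"},
}


def is_valid_change(current: str, new: str) -> bool:
    """Return True if Table 4-2 allows changing `current` to `new`.

    Hypothetical helper for illustration; state names are compared
    case-insensitively.
    """
    return new.upper() in VALID_TRANSITIONS.get(current.upper(), set())


def count_valid_changes() -> int:
    """Count the permitted changes in the table (the text states eight)."""
    return sum(len(targets) for targets in VALID_TRANSITIONS.values())
```

For example, is_valid_change("VOTED", "ABORT") is true, while is_valid_change("SENDING", "COMMIT") is false because SENDING can only be changed to ABORT.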

Four typical situations are listed below where transaction state changes by the system administrator are allowed.

State SENDING changed to state ABORT.

The server application, after receiving an rtr_mt_prepare message, needs to access and lock the database record before calling rtr_accept_tx() for a particular transaction. If the record being accessed is locked for some reason, the application will experience a "hung" situation and cannot proceed. If the situation persists, the application will become unavailable. RTR detects such a deadlock, but intervention is probably required.

Normally, this condition can be avoided by specifying a transaction timeout. However, if no timeout was specified, the system administrator can choose to abort the transaction with the SET TRANSACTION command. Internally, RTR informs the router and all the other participating servers to abort this transaction in a consistent manner. Note that this may not be enough to correct the original condition with the database. The server application would still wait for the database to free up. Subsequent transactions could be sent to this server process and would be vulnerable to the same problem. The real reason for the lockup should be investigated and corrected.

State VOTED changed to state COMMIT.

A transaction branch can be in the VOTED state when a server application running on the backend has been separated from the rest of the participating servers after casting its vote for a multi-participant transaction. This can also happen when at least one of the other participants in a distributed transaction has failed to cast its vote (by calling rtr_accept_tx()).

As long as there is a router coordinating the transaction, RTR will not allow the state of the transaction to be modified from VOTED to COMMIT, in order to prevent the possibility of data inconsistency. This restriction applies because RTR is still in the middle of determining the final state for the transaction. However, it is possible to modify the state to ABORT, as explained in the next section.

If the backend is not connected to a coordinating router, the other servers may have already committed the transaction but not "forgotten" it. As far as the application is concerned, this global transaction is committed and all changes have been committed to the underlying database on the different sites. However, the local transaction branch is still in VOTED state in the RTR journal. You can use the SET TRANSACTION command to manually commit the local transaction branch. This results in the transaction being recovered to a server during the next recovery cycle.

As mentioned earlier, this command is only applicable if there is no coordinating router running (that is, the backend is separated from the rest of the RTR network). If this is not the case, RTR rejects the command with the error message RTR-E-SETTRANROUTER, indicating that a coordinating router is still available.

Note
This operation could lead to data inconsistency, if used injudiciously, and should only be used after careful research.
State VOTED changed to state ABORT.

As previously explained, a transaction can be stalled in a VOTED state in one or more of the following situations.

- There is a distributed deadlock (only possible with multi-participant transactions and multiple simultaneously active transactions).
- One participant has voted, but another participant has not voted (for example, waiting for the database record to become accessible).
- The transaction is not currently active, and is a candidate for recovery. The transaction is probably a multi-participant transaction and the final journal state for the local branch is VOTED. The transaction outcome cannot be determined without consulting the journal from the other participants.

Whatever the cause, this transaction ties up system resources and prevents other transactions from running. Should the system administrator decide to abort the transaction using the SET TRANSACTION command, RTR sends a request-to-abort message to all the participants (transaction branches) to abort each transaction branch. After the abort, RTR presents an rtr_mt_rejected message to each server with a status indicating that "TX was aborted by Set Transaction operation". If the coordinating router is available, a race condition is possible, in that the transaction coordinator might be trying to commit the transaction at the instant that the operator is attempting to abort it. Under this scenario, RTR may not allow the abort to proceed if the coordinating router has already decided to commit the transaction. An operator log message is written on the router to warn the administrator of this situation.

State COMMIT changed to state DONE.

An example of this state change is where a server crashed while performing an SQL commit immediately after receiving an mt_accepted message. The transaction is in COMMIT state as recorded in the RTR journal, but RTR considers this an uncertain transaction and will try to recover this transaction (unless limited by the partition's RECOVERY_RETRY_COUNT parameter). If a determination can be made that the transaction is truly committed in the underlying database, there is no reason to allow RTR to recover or replay this transaction to another server. To forget such transactions, the state should be changed from COMMIT to DONE. For single-participant transactions, a journal state of VOTED really means COMMIT, because there is no reason for RTR not to commit a transaction that has one branch that is ready to commit.

After the SET TRANSACTION command is executed, use the DUMP JOURNAL command to verify the result.

4.4 Command Line Examples

The following is an example of the SET TRANSACTION command:

    RTR> start rtr
    RTR> set log/file=settran
    RTR> set transaction/state=PRI_DONE/new_state=DONE/facility=Facility1/-
    _RTR> partition=Partition1 *

This example sets all transactions with the current state of PRI_DONE (remember) to DONE on the facility Facility1 and the partition Partition1. The log file, settran, records the transaction state changes. The changes can be viewed with the SHOW TRANSACTION command or the DUMP JOURNAL command. In a shadow recovery situation, this clears the journal of remember transactions and provides for a quick turnaround of the shadow site.

The following example shows how RTR commands monitor and manipulate three different transaction states. Consider a scenario where a distributed transaction accesses two RTR partitions. The multiple-participant distributed transaction would have two transaction branches accessing different RTR partitions, say part1 and part2, respectively.

The client commits the transaction and calls rtr_accept_tx(), which prompts RTR to start the two-phase commit protocol. RTR sends a prepare message to the two participants. Upon receiving the prepare message (mt_prepare), one of the server applications is ready to commit and casts its vote by calling rtr_accept_tx(). RTR writes a VOTE record in the RTR journal and sends the vote message back to the router. However, due to an unexpected defect in the application software, the second server has not sent its VOTE message back to the RTR router. Thus, the transaction is stalled in the second server.

To examine this situation, an RTR system administrator should first use the SHOW TRANSACTION/BACKEND command on the backend node to analyze the transaction's status. As shown in the following example, the transaction runtime state is RECEIVING, indicating the distributed transaction is not yet committed. The server states for the transaction branches are VOTED and VREQ respectively, indicating that one of the transaction branches has been voted by the associated server whereas the other transaction branch is still in "Vote Request" state (VREQ). The journal states for the transaction branches are VOTED and SENDING, indicating that one transaction branch has voted and its VOTED record was written in the RTR journal. The other transaction branch's journal state is SENDING, indicating that transaction branch is still in the process of processing a message from the client and it has not yet advanced to the VOTED state. The journal states for the transaction branches that are recorded in the RTR journal are consistent with their server states.

A transaction branch's journal state is persistent and is therefore used by the SET TRANSACTION command to change a transaction's state. The DUMP JOURNAL command is also useful to examine each transaction branch's journal state.

    Backend transactions on node nodea at Mon Mar 13 16:02:42 2000
    Tid: 3ad01f10,0,0,0,0,3ad01f10,a08730b4
    Facility: test
    Frontend: nodea
    FE-User: tu.7006
    State: RECEIVING
    Start time: Mon Mar 13 16:00:08 2000
    Key-Range-Id: 16777216,16777217
    Router: nodea
    Invocation: ORIGINAL,ORIGINAL
    Active-Key-Ranges: 2
    Recovering-Key-Ranges: 0
    Total-Tx-Enqs: 2
    Server-Pid: 7006,7006
    Server-State: VOTED,VREQ
    Journal-Node: nodea.com,nodea.com
    Journal-State: VOTED,SENDING
    First-Enq: 1,2
    Nr-Enqs: 1,1
    Nr-Replies: 0,0
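When many transactions must be reviewed, the per-branch fields in this display can be paired up programmatically. The following Python fragment is a rough sketch only: it assumes the display has been captured as text with one "Name: value" field per line and that comma-separated values are listed in branch order, as in the example above. The function names are hypothetical and are not part of RTR.

```python
# A few fields copied from the SHOW TRANSACTION/BACKEND example above.
SAMPLE = """\
State: RECEIVING
Start time: Mon Mar 13 16:00:08 2000
Server-State: VOTED,VREQ
Journal-Node: nodea.com,nodea.com
Journal-State: VOTED,SENDING
"""


def parse_fields(text):
    """Split 'Name: value' lines into a dict (assumed layout, see lead-in)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so times such as 16:00:08 survive.
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def branch_states(fields):
    """Pair each branch's server state with its journal state."""
    server = fields["Server-State"].split(",")
    journal = fields["Journal-State"].split(",")
    return list(zip(server, journal))
```

Applied to the sample, branch_states(parse_fields(SAMPLE)) pairs VOTED with VOTED for the first branch and VREQ with SENDING for the second, which matches the description of the stalled transaction.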

As previously described in this scenario, the transaction is stalled in one of the servers. To resolve this situation, use the RTR SET TRANSACTION command to abort the transaction. Change the journal state of either transaction branch to ABORT, as shown in the following examples:

    RTR> set transaction/new=abort/state=voted/facility=test/partition=part1
    %RTR-S-SETTRANDONE, 1 transaction(s) updated in partition part1 of facility test

    RTR> set transaction/new=abort/state=sending/facility=test/partition=part2
    %RTR-S-SETTRANDONE, 1 transaction(s) updated in partition part2 of facility test

See Chapter 8 for detailed information on these commands.

Chapter 5
Server Shadowing and Recovery

With RTR shadowing, your system can recover from a site disaster without the need for special coding within your application program.

A database is said to be shadowed when two copies of the same database are deployed on separate nodes at two different locations, typically two different sites. Each location maintains a copy of the database used by the server application, and RTR keeps the database copies synchronized. Shadow site configurations can contain two nodes at separate sites, two nodes in a cluster, or two clusters at separate sites. When setting up a shadow configuration for two nodes in a cluster, the syntax must explicitly state that the nodes are not to be standby nodes.

Concurrent servers handle similar transactions (that is, transactions in the same key range, but not the same transactions). Standby servers do not handle transactions at all for the given key range, and shadow servers handle the same transactions.

5.1 Primary and Secondary Partition States

There is a concept of primary and secondary states for the shadow server pair, although in most cases this is transparent to the user when the processing is the same on both sites.

The assignment of primary and secondary states to partitions can be managed by the partition priority list or left to RTR. If left to RTR, initial role assignment is arbitrary, in that the first server of a shadow pair to start is put in the primary state and the second, in the secondary state. The adopted states may change, as servers come and go. RTR needs to determine which server is in the primary state before presenting a transaction to the server in the secondary state.

5.2 Automatic Features

A shadow site has an identical copy of the customer's database.

Transactions are sent by RTR to both sites. RTR ensures that they are processed by the servers in the same order on each site (unless transactions are defined as independent), so that both copies of the customer database remain up to date.

A transaction is sent to the secondary site only after the primary has accepted it, or if the primary fails before being asked to vote.

RTR suppresses replies and broadcasts issued by the secondary shadow server.

5.2.1 Shadow Events in Partitions

RTR provides the following shadowing events:

RTR_EVTNUM_SRPRIMARY Server is in primary mode

RTR_EVTNUM_SRSTANDBY Server is in standby mode

RTR_EVTNUM_SRSECONDARY Server is in secondary mode

RTR_EVTNUM_SRSHADOWLOST Server has lost its shadow partner

RTR_EVTNUM_SRSHADOWGAIN Server has gained its shadow partner

RTR_EVTNUM_SRRECOVERCMPL Server has completed recovery

The shadow events are delivered with no special status and no data. They are delivered only to the servers whose state has changed.

A server receives RTR_EVTNUM_SRPRIMARY under the following circumstances:

- On initial startup, if servers for the key range are not already running on other nodes
- If the server had previously been standby and the previous primary has failed
- If the server had previously been shadow secondary and the previous primary has failed

A server receives RTR_EVTNUM_SRSTANDBY when it starts up and servers already exist for the same key range on another node in the same cluster.

A server receives RTR_EVTNUM_SRSECONDARY when it starts up and a shadow primary set of servers exist elsewhere.

A server receives RTR_EVTNUM_SRSHADOWLOST if it is running as primary and the secondary goes away.

A server receives RTR_EVTNUM_SRSHADOWGAIN if it is running as primary and a secondary node starts up.

A server receives RTR_EVTNUM_SRRECOVERCMPL when it has finished doing recovery operations and is ready to start processing new transactions.
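The event descriptions above amount to a small amount of state that a server application can track for itself. The following Python fragment is a minimal sketch of that bookkeeping, not RTR API code: events are represented by their names as strings because the numeric values are not given here, and the ServerView class and its attributes are hypothetical.

```python
# Events that announce the server's role, keyed by the names in this section.
ROLE_EVENTS = {
    "RTR_EVTNUM_SRPRIMARY": "primary",
    "RTR_EVTNUM_SRSTANDBY": "standby",
    "RTR_EVTNUM_SRSECONDARY": "secondary",
}


class ServerView:
    """Hypothetical record of what one server has learned from shadow events."""

    def __init__(self):
        self.role = None                  # unknown until a role event arrives
        self.has_shadow_partner = False
        self.recovery_complete = False

    def on_event(self, name):
        """Update the view for one delivered event name."""
        if name in ROLE_EVENTS:
            self.role = ROLE_EVENTS[name]
        elif name == "RTR_EVTNUM_SRSHADOWGAIN":
            self.has_shadow_partner = True
        elif name == "RTR_EVTNUM_SRSHADOWLOST":
            self.has_shadow_partner = False
        elif name == "RTR_EVTNUM_SRRECOVERCMPL":
            self.recovery_complete = True
        else:
            raise ValueError("not a shadow event: " + name)
```

A server that receives RTR_EVTNUM_SRPRIMARY followed by RTR_EVTNUM_SRSHADOWGAIN would, in this model, record that it is the primary and currently has a shadow partner.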

5.3 RTR Journal System

The RTR journal is used for the following purposes:

- RTR stores data about a transaction so that if the backend processing the transaction fails for any reason, another backend can transparently continue from where the previous server failed, provided that it can access the same database and journal disk.
- RTR also stores data about transactions when a shadow site is known to be missing. In this case, RTR stores all transaction data that can result in a database update. The RTR journal stores this data in the journal file accessible from the backend until the secondary shadow site is available again. RTR then transparently replays this data to the shadow site, after which the data is deleted from the journal.
- RTR stores partition information.

5.4 Shadow Site Failure and Journaling

If a shadow site fails, RTR allows transactions to continue to be processed on the remaining site. The transactions processed by the remaining server or servers are retained by the primary server in its journal; when the failed site restarts, these transactions are sent to this site as part of a shadow-recovery operation, thus bringing the failed site back up to date.

The overhead required when calculating journal size comes from internal journal data (block stamping) of approximately 3%. In addition, there is internal transaction data per (client to server) transactional message, and some further data per transaction (concerning voting and transaction completion).

Also, RTR prevents further transactional data from being written to the journal when it is nearly full, but continues to allow deletes from the journal (deletes also cause data to be written to the journal). Ten segments are held in reserve for storing information about deleted transactions even when RTR cannot accept further transactions because the journal is full.

Caution

If the journal disk becomes full, transactions are aborted until the shadow partner restarts and empties the journal of transactions to be replayed.
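The overhead components described in this section can be combined into a rough sizing estimate. The following Python fragment is a sketch under stated assumptions: only the approximately 3% block-stamping figure comes from the text. The per-message and per-transaction overhead values are not given here, so they are left as parameters that must be supplied from measurement, and the function name is hypothetical.

```python
# Block-stamping overhead, "approximately 3%" according to the text.
BLOCK_STAMP_PERCENT = 3


def estimate_journal_bytes(tx_count, msgs_per_tx, msg_bytes,
                           per_msg_overhead, per_tx_overhead):
    """Estimate journal space needed to remember `tx_count` transactions.

    per_msg_overhead and per_tx_overhead are placeholders for RTR's
    internal data; their real sizes are not stated in this section.
    """
    per_tx = msgs_per_tx * (msg_bytes + per_msg_overhead) + per_tx_overhead
    payload = tx_count * per_tx
    # Integer arithmetic, rounded up, to add the block-stamping overhead.
    return (payload * (100 + BLOCK_STAMP_PERCENT) + 99) // 100
```

An estimate of this kind does not account for the ten segments held in reserve, so the journal should be sized with additional headroom.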

5.5 Performance

The performance of a shadow pair is comparable to that of a transaction that spans two nodes, with the addition of one extra protocol message, which is required to ensure that the transactions are presented in the same order.

RTR does not have to wait for the secondary shadow server to complete its processing. It only needs to know that the primary has committed the transaction and that the journal file of the secondary shadow server contains the final vote status.

The two partners in a shadow pair should be connected with sufficient bandwidth to allow for the large amounts of data which may need to be transferred during a shadow catchup operation.

5.6 Shadows in Action

The first node on which a shadow backend for a particular key range starts is arbitrarily designated by RTR to be the primary site for that key range unless a priority list has been defined.

Initially each RTR backend searches its journals to find any recoverable transactions left over from a previous invocation of the backend. Once these have been processed (or RTR determines that no such transactions exist), the backend becomes active and available to handle new transactions sent by clients/frontends.

While no other backend node for this key range is available, the backend runs in REMEMBER mode. RTR saves transactions processed on this site in the RTR journal (together with the order in which they should be committed), so that when the backend on the other site starts, they can be sent to that site.

When a backend starts on a second site, it begins processing the transactions saved in the primary site's journal. These are deleted from the journal as they are processed. When the second site backends have caught up, the second backend enters SECONDARY ACTIVE state and the original site backends enter PRIMARY ACTIVE state. In this mode, new transactions are sent to both sites in parallel. They are executed first on the primary site, and shortly afterward on the secondary site in the same order. The primary site commits transactions as soon as it knows that the secondary site has hardened (i.e., written to the journal) the order in which the transaction is to be committed.

If a failure occurs at this point, the remaining site executes a short cleanup operation. After completing the cleanup operation and determining that the other site is really down, it reverts to the REMEMBER state and continues processing new transactions autonomously, saving the transaction information in its journal for when the other site restarts.
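The cycle described above (remember, catch up, run in parallel, and fall back to remembering after a failure) can be modelled in a few lines. The following Python fragment is a toy model for illustration only: the ShadowPair class and its method names are hypothetical, a Python list stands in for each site's database, and voting and the cleanup operation are omitted.

```python
from collections import deque


class ShadowPair:
    """Toy model of one key range served by a primary and a secondary site."""

    def __init__(self):
        self.primary_db = []        # transactions applied on the first site
        self.secondary_db = []      # transactions applied on the second site
        self.journal = deque()      # transactions remembered for the absent site
        self.primary_state = "REMEMBER"
        self.secondary_up = False

    def commit(self, tid):
        """Process one new transaction on whichever sites are available."""
        self.primary_db.append(tid)
        if self.secondary_up:
            self.secondary_db.append(tid)   # both sites, same order
        else:
            self.journal.append(tid)        # remembered for later replay

    def secondary_started(self):
        """Replay remembered transactions in order, deleting each as it is processed."""
        while self.journal:
            self.secondary_db.append(self.journal.popleft())
        self.secondary_up = True
        self.primary_state = "PRIMARY ACTIVE"

    def secondary_failed(self):
        """Revert to remembering transactions for the failed site."""
        self.secondary_up = False
        self.primary_state = "REMEMBER"
```

In this model, transactions committed while the second site is absent accumulate in the journal, and after secondary_started() both lists hold the same transactions in the same order.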

The execution order is determined for transactions issued to concurrent servers on a particular node by recording the order in which the individual servers issue rtr_accept_tx() calls. RTR knows that at the time a correctly written server application calls rtr_accept_tx() , it has already accessed (and therefore locked) any database records it uses, and that it will release these records after RTR causes the rtr_accept_tx() call to complete. Any conflicting transaction would not be able to issue rtr_accept_tx() concurrently. Therefore a correct serialization order for issuing the transactions on the shadow site can be determined.

Transactions can also use the /INDEPENDENT or /READ_ONLY flag to tell RTR that their order is not important and transactions need not be recovered serially.
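The ordering rule can be stated as a simple check: the shadow site must execute dependent transactions in the order in which their rtr_accept_tx() calls were recorded on the primary, while transactions flagged as independent are not constrained. The following Python fragment is a simplified sketch of that check; the function name is hypothetical, and treating independent transactions as free to move anywhere in the sequence is an assumption made for illustration.

```python
def is_valid_replay(accept_order, replay_order, independent=frozenset()):
    """Check a shadow-site execution order against the primary's accept order.

    accept_order: transaction ids in the order rtr_accept_tx() was issued.
    replay_order: the order proposed for the shadow site.
    independent:  ids whose order is declared unimportant (assumed to be
                  free to move anywhere in the sequence).
    """
    # Both sites must execute exactly the same transactions.
    if sorted(accept_order) != sorted(replay_order):
        return False
    # Dependent transactions must keep their relative order.
    primary = [tid for tid in accept_order if tid not in independent]
    shadow = [tid for tid in replay_order if tid not in independent]
    return primary == shadow
```

With accept order t1, t2, t3, the replay order t1, t3, t2 is rejected unless t3 is marked independent, in which case only the relative order of t1 and t2 is checked.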
