Reliable Transaction Router
System Manager's Manual

4.2 Exception Transactions

When a server votes on a transaction, RTR expects the server to commit the transaction to the database when RTR makes the request. If for some reason the server cannot do so, the server has two choices:

- The server can exit the process, causing RTR to recover the transaction to another concurrent server channel or to a standby server. If the problem with committing to the database persists, subsequent attempts to recover the transaction may cause other server processes to exit as well. This can cause the entire pool of servers to go away, making the application unavailable. You can avoid this by limiting the number of recovery attempts that RTR makes for a specific transaction, using the SET PARTITION/RECOVERY_RETRY_COUNT=nn command. This limits the number of times the same transaction is recovered to a server that subsequently crashes. Once the limit is reached, the transaction is placed in the EXCEPTION state; because it is already committed, it can no longer be aborted.
- The server application can use the rtr_set_info() function to change the state of a committed transaction directly to EXCEPTION. This avoids the need to crash in order to signal to RTR that something unexpected has happened to the transaction.
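
The effect of a recovery retry limit on server availability can be illustrated with a small simulation. This is only a sketch of the reasoning above, not of RTR internals: the function name is hypothetical, and the exact point at which RTR counts a recovery attempt against RECOVERY_RETRY_COUNT is an assumption made for illustration.

```python
from typing import Optional, Tuple


def simulate_poison_transaction(pool_size: int,
                                retry_limit: Optional[int] = None) -> Tuple[int, str]:
    """Model a committed transaction that makes every server it reaches exit.

    Returns (servers still running, journal state of the transaction).
    Assumption: each hand-off to another server counts as one recovery.
    """
    servers_left = pool_size
    recoveries = 0
    while servers_left > 0:
        servers_left -= 1  # the server holding the transaction exits
        if retry_limit is not None and recoveries >= retry_limit:
            # Limit reached: park the transaction instead of recovering it again.
            return servers_left, "EXCEPTION"
        if servers_left == 0:
            break  # no server is left to recover the transaction to
        recoveries += 1  # the transaction is recovered to another server
    return servers_left, "COMMIT"


if __name__ == "__main__":
    print(simulate_poison_transaction(5))                 # no limit set
    print(simulate_poison_transaction(5, retry_limit=2))  # limit of 2 recoveries
```

Under these assumptions, a pool of five servers is emptied when no limit is set, whereas a limit of 2 leaves two servers running and the transaction in the EXCEPTION state.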

EXCEPTION transactions can be inspected with the DUMP JOURNAL command. The final state of the transaction should say EXCEPTION.

4.2.1 Dealing with EXCEPTION Transactions

The system administrator must decide what to do with transactions that are marked EXCEPTION. There are two choices:

- Fix the environmental issue that was causing the server to crash, and then recover the transaction from the journal. To do this, the transaction state should be changed to COMMIT using the SET TRANSACTION command. When the server is restarted, RTR will recover this transaction.
- Manually apply the transaction to the database and remove the transaction from the RTR journal. To do this, the transaction state should be modified to DONE by using the SET TRANSACTION command. The transaction will be forgotten by RTR.

4.2.2 What EXCEPTION Transactions Mean to Data Integrity

EXCEPTION transactions keep the application available, although they cause some loss of data integrity. EXCEPTION transactions are considered committed by the initiator of the transaction, as well as by the other participants (such as the other shadow member). Therefore, subsequent transactions, which are dependent on the results of this transaction, could produce erroneous outcomes. In some applications, the erroneous outcomes do not matter. In applications where the outcome does matter, the best approach is to crash the application, and allow the system administrator to manually intervene.

4.3 Transaction State Changes

There are eight valid state changes allowed for the SET TRANSACTION command. Attempting to change a transaction to a state that is not allowed produces the error message %RTR-E-INVSTATCHANGE, Invalid to change from current state to the specified state. Table 4-2 identifies the valid state changes.

Table 4-2 Valid Transaction State Transitions

                 NEW STATE
Current State    COMMIT    ABORT    EXCEPTION    DONE
SENDING          -         YES      -            -
VOTED            YES       YES      -            -
COMMIT           -         -        YES          YES
EXCEPTION        YES       -        -            YES
PRI_DONE         -         -        -            YES

All transaction states referenced in Table 4-2 are RTR journal states. Use the RTR commands DUMP JOURNAL or SHOW TRANSACTION to determine the journal state for each transaction branch.
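
A script that prepares SET TRANSACTION commands could check a requested change against Table 4-2 before issuing it. The following is a minimal sketch; the names VALID_CHANGES and is_valid_change are hypothetical and are not part of RTR, which performs its own check and reports %RTR-E-INVSTATCHANGE for an invalid change.

```python
# Table 4-2 as a lookup: current journal state -> allowed new states.
VALID_CHANGES = {
    "SENDING": {"ABORT"},
    "VOTED": {"COMMIT", "ABORT"},
    "COMMIT": {"EXCEPTION", "DONE"},
    "EXCEPTION": {"COMMIT", "DONE"},
    "PRI_DONE": {"DONE"},
}


def is_valid_change(current_state: str, new_state: str) -> bool:
    """Return True if Table 4-2 allows changing current_state to new_state."""
    allowed = VALID_CHANGES.get(current_state.upper(), set())
    return new_state.upper() in allowed


if __name__ == "__main__":
    print(is_valid_change("voted", "abort"))     # allowed by the table
    print(is_valid_change("sending", "commit"))  # not allowed by the table
```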

Four typical situations are listed below where transaction state changes by the system administrator are allowed.

State SENDING changed to state ABORT.
The server application, after receiving an rtr_mt_prepare message, needs to access and lock the database record before calling rtr_accept_tx() for a particular transaction. If the record being accessed is locked for some reason, the application hangs and cannot proceed. If the situation persists, the application becomes unavailable. Normally, this condition can be avoided by specifying a transaction timeout. If no timeout was specified, the system administrator can choose to abort the transaction with the SET TRANSACTION command. Internally, RTR informs the router and all the other participating servers to abort this transaction in a consistent manner. Note that this may not be enough to correct the original condition with the database. The server application would still wait for the database to free up, and subsequent transactions sent to this server process would be vulnerable to the same problem. The real reason for the lockup should be investigated and corrected.

State VOTED changed to state COMMIT.
A transaction branch can be in the VOTED state when a server application running on the backend has been separated from the rest of the participating servers after casting its vote for a multi-participant transaction. This can also happen when at least one of the other participants in a distributed transaction has failed to cast its vote (by calling rtr_accept_tx()).
As long as there is a router coordinating the transaction, RTR does not allow the state of the transaction to be modified from VOTED to COMMIT. RTR is still determining the final state of the transaction, and a manual commit could cause data inconsistency. However, it is possible to modify the state to ABORT, as explained in the next item.
If the backend is not connected to a coordinating router, the other servers may have already committed the transaction but not "forgotten" it. As far as the application is concerned, this global transaction is committed and all changes have been committed to the underlying database on the different sites. However, the local transaction branch is still in the VOTED state in the RTR journal. You can use the SET TRANSACTION command to manually commit the local transaction branch. This results in the transaction being recovered to a server during the next recovery cycle.
As mentioned earlier, this command is applicable only if there is no coordinating router running (that is, the backend is separated from the rest of the RTR network). Otherwise, RTR rejects the command with the error message RTR-E-SETTRANROUTER, indicating that a coordinating router is still available.

Note
This operation could lead to data inconsistency, if used injudiciously, and should only be used after careful research.

State VOTED changed to state ABORT.
As previously explained, a transaction can be stalled in the VOTED state in one or more of the following situations:
- There is a distributed deadlock (only possible with multi-participant transactions and multiple simultaneously active transactions).
- One participant has voted, but another participant has not voted (for example, it is waiting for the database record to become accessible).
- The transaction is not currently active, and is a candidate for recovery. The transaction is probably a multi-participant transaction and the final journal state for the local branch is VOTED. The transaction outcome cannot be determined without consulting the journal from the other participants.
Whatever the cause, this transaction ties up system resources and prevents other transactions from running. If the system administrator decides to abort the transaction using the SET TRANSACTION command, RTR sends a request-to-abort message to all the participants (transaction branches) to abort each transaction branch. After the abort, RTR presents an rtr_mt_rejected message to each server with a status indicating that "TX was aborted by Set Transaction operation". If the coordinating router is available, a race condition is possible: the transaction coordinator might be trying to commit the transaction at the instant that the operator attempts to abort it. In this scenario, RTR may not allow the abort to proceed if the coordinating router has already decided to commit the transaction. An operator log message is written on the router to warn the administrator of this situation.

State COMMIT changed to state DONE.
An example of this state change is where a server crashed while performing an SQL commit immediately after receiving an rtr_mt_accepted message. The transaction is in the COMMIT state as recorded in the RTR journal, but RTR considers this an uncertain transaction and will try to recover it (unless limited by the partition's RECOVERY_RETRY_COUNT parameter). If it can be determined that the transaction is truly committed in the underlying database, there is no reason to allow RTR to recover or replay this transaction to another server. To forget such transactions, change the state from COMMIT to DONE. For single-participant transactions, a journal state of VOTED really means COMMIT, because there is no reason for RTR not to commit a transaction that has one branch that is ready to commit.

After the SET TRANSACTION command is executed, use the DUMP JOURNAL command to verify the result.

4.4 Command Line Examples

The following is an example of the SET TRANSACTION command:

RTR> start rtr
RTR> set log/file=settran
RTR> set transaction/state=PRI_DONE/new_state=DONE/facility=Facility1/-
_RTR> partition=Partition1 *

This example sets all transactions with the current state of PRI_DONE (remember) to DONE on the facility Facility1 and the partition Partition1. The log file, settran, records the transaction state changes. The changes can be viewed with the SHOW TRANSACTION command or the DUMP JOURNAL command. In a shadow recovery situation, this clears the journal of remember transactions and provides for a quick turnaround of the shadow site.

The following example shows how RTR commands monitor and manipulate three different transaction states. Consider a scenario where a distributed transaction accesses two RTR partitions. The multiple-participant distributed transaction would have two transaction branches accessing different RTR partitions, say part1 and part2, respectively.

The client commits the transaction by calling rtr_accept_tx(), which prompts RTR to start the two-phase commit protocol. RTR sends a prepare message to the two participants. Upon receiving the prepare message (rtr_mt_prepare), one of the server applications is ready to commit and casts its vote by calling rtr_accept_tx(). RTR writes a VOTE record in the RTR journal and sends the vote message back to the router. However, due to an unexpected defect in the application software, the second server has not sent its vote message back to the RTR router. Thus, the transaction is stalled in the second server.

To examine this situation, an RTR system administrator should first use the SHOW TRANSACTION/BACKEND command on the backend node to analyze the transaction's status. As shown in the following example, the transaction runtime state is RECEIVING, indicating that the distributed transaction is not yet committed. The server states for the transaction branches are VOTED and VREQ respectively: the server for one branch has voted, whereas the other branch is still in the "Vote Request" (VREQ) state. The journal states for the transaction branches are VOTED and SENDING: one branch has voted and its VOTED record has been written to the RTR journal, whereas the other branch is still processing a message from the client and has not yet advanced to the VOTED state. The journal states recorded in the RTR journal are therefore consistent with the server states.

A transaction branch's journal state is persistent and is therefore used by the SET TRANSACTION command to change a transaction's state. The DUMP JOURNAL command is also useful to examine each transaction branch's journal state.

Backend transactions on node nodea at Mon Mar 13 16:02:42 2000
Tid: 3ad01f10,0,0,0,0,3ad01f10,a08730b4
Facility: test
Frontend: nodea
FE-User: tu.7006
State: RECEIVING
Start time: Mon Mar 13 16:00:08 2000
Key-Range-Id: 16777216,16777217
Router: nodea
Invocation: ORIGINAL,ORIGINAL
Active-Key-Ranges: 2
Recovering-Key-Ranges: 0
Total-Tx-Enqs: 2
Server-Pid: 7006,7006
Server-State: VOTED,VREQ
Journal-Node: nodea.com,nodea.com
Journal-State: VOTED,SENDING
First-Enq: 1,2
Nr-Enqs: 1,1
Nr-Replies: 0,0
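
When many transactions must be checked, the per-branch states in this display can be extracted with a short script. The sketch below assumes the output has been captured as text with one "Name: value" field per line; the actual screen layout may differ, and the function names are hypothetical. The sample text reuses field values from the display above.

```python
SAMPLE_OUTPUT = """\
Tid: 3ad01f10,0,0,0,0,3ad01f10,a08730b4
Facility: test
State: RECEIVING
Key-Range-Id: 16777216,16777217
Server-State: VOTED,VREQ
Journal-State: VOTED,SENDING
"""


def parse_fields(text):
    """Split 'Name: value' lines into a dictionary (assumed layout)."""
    fields = {}
    for line in text.splitlines():
        name, separator, value = line.partition(":")
        if separator:
            fields[name.strip()] = value.strip()
    return fields


def branch_states(text):
    """Pair up the comma-separated per-branch values, one dict per branch."""
    fields = parse_fields(text)
    key_ranges = fields["Key-Range-Id"].split(",")
    server_states = fields["Server-State"].split(",")
    journal_states = fields["Journal-State"].split(",")
    return [
        {"key_range_id": k, "server_state": s, "journal_state": j}
        for k, s, j in zip(key_ranges, server_states, journal_states)
    ]


if __name__ == "__main__":
    for branch in branch_states(SAMPLE_OUTPUT):
        print(branch)
```

The journal state reported for each branch is the value to supply as the current state when changing that branch with SET TRANSACTION.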

As previously described in this scenario, the transaction is stalled in one of the servers. To resolve this situation, use the RTR SET TRANSACTION command to abort this transaction. Change the journal state of either transaction branch to ABORT, as shown in the following examples:

RTR>set transaction/new=abort/state=voted/facility=test/partition=part1
%RTR-S-SETTRANDONE, 1 transaction(s) updated in partition part1 of facility test

RTR>set transaction/new=abort/state=sending/facility=test/partition=part2
%RTR-S-SETTRANDONE, 1 transaction(s) updated in partition part2 of facility test

See Chapter 7 for detailed information on these commands.

Chapter 5
Server Shadowing and Recovery

With RTR shadowing, your system can recover from a site disaster without the need for special coding within your application program.

A database is said to be shadowed when two copies of the same database are deployed on separate nodes at two different locations, typically two different sites. Each location maintains a copy of the database used by the server application, and RTR keeps the database copies synchronized. Shadow site configurations can contain two nodes at separate sites, two nodes in a cluster, or two clusters at separate sites. When setting up a shadow configuration for two nodes in a cluster, the syntax must explicitly state that the nodes are not to be standby nodes.

Concurrent servers handle different transactions in the same key range. Standby servers handle no transactions for the given key range. Shadow servers handle the same transactions.

5.1 Primary and Secondary Roles

There is a concept of primary and secondary roles for the shadow server pair, although in most cases this is transparent to the user when the processing is the same on both sites.

The assignment of primary and secondary roles to partitions can be managed by the partition priority list, or left to RTR. If left to RTR, initial role assignment is arbitrary, in that the first server of a shadow pair to start is given the primary role, and the second the secondary. The assigned roles may change, as servers come and go. Roles are required, since RTR needs to determine the voting order on the primary site before the transaction is presented to the secondary site.

5.2 Automatic Features

Shadow sites each have an identical copy of the customer's database.

Transactions are sent by RTR to both sites. RTR ensures that they are processed by the servers in the same order on each site, so that both copies of the customer database remain up to date.

A transaction is sent to the secondary site only after the primary has accepted it, or if the primary fails before being asked to vote.

RTR suppresses replies and broadcasts issued by the secondary shadow server.

5.2.1 Shadow Events

RTR provides the following shadowing events:

RTR_EVTNUM_SRPRIMARY        Server is in primary mode
RTR_EVTNUM_SRSTANDBY        Server is in standby mode
RTR_EVTNUM_SRSECONDARY      Server is in secondary mode
RTR_EVTNUM_SRSHADOWLOST     Server has lost its shadow partner
RTR_EVTNUM_SRSHADOWGAIN     Server has gained its shadow partner
RTR_EVTNUM_SRRECOVERCMPL    Server has completed recovery

The shadow events are delivered with no special status and no data. They are delivered only to the servers whose state has changed.

A server receives RTR_EVTNUM_SRPRIMARY under the following circumstances:

- On initial startup, if servers for the key range are not already running on other nodes
- If the server had previously been standby and the previous primary has failed
- If the server had previously been shadow secondary and the previous primary has failed

A server receives RTR_EVTNUM_SRSTANDBY when it starts up and servers already exist for the same key range on another node in the same cluster.

A server receives RTR_EVTNUM_SRSECONDARY when it starts up and a shadow primary set of servers exists elsewhere.

A server receives RTR_EVTNUM_SRSHADOWLOST if it is running as primary and the secondary goes away.

A server receives RTR_EVTNUM_SRSHADOWGAIN if it is running as primary and a secondary node starts up.

A server receives RTR_EVTNUM_SRRECOVERCMPL when it has finished doing recovery operations and is ready to start processing new transactions.
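
A server application that subscribes to these events might keep a small record of its current role. The following sketch models only the bookkeeping described above. It identifies events by name as strings, which is an assumption for illustration (the RTR API delivers them as numeric event constants), and the ShadowStatus class is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Events that tell the server which role it now holds.
ROLE_EVENTS = {
    "RTR_EVTNUM_SRPRIMARY": "primary",
    "RTR_EVTNUM_SRSTANDBY": "standby",
    "RTR_EVTNUM_SRSECONDARY": "secondary",
}


@dataclass
class ShadowStatus:
    role: Optional[str] = None              # unknown until a role event arrives
    partner_present: Optional[bool] = None  # unknown until a gain/lost event arrives
    recovery_complete: bool = False

    def on_event(self, event_name: str) -> None:
        """Update the record for one shadow event."""
        if event_name in ROLE_EVENTS:
            self.role = ROLE_EVENTS[event_name]
        elif event_name == "RTR_EVTNUM_SRSHADOWLOST":
            self.partner_present = False
        elif event_name == "RTR_EVTNUM_SRSHADOWGAIN":
            self.partner_present = True
        elif event_name == "RTR_EVTNUM_SRRECOVERCMPL":
            self.recovery_complete = True
        else:
            raise ValueError(f"not a shadow event: {event_name}")


if __name__ == "__main__":
    status = ShadowStatus()
    # A secondary takes over after the primary fails, then gains a new partner.
    for name in ("RTR_EVTNUM_SRSECONDARY", "RTR_EVTNUM_SRPRIMARY",
                 "RTR_EVTNUM_SRSHADOWGAIN", "RTR_EVTNUM_SRRECOVERCMPL"):
        status.on_event(name)
    print(status)
```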

5.3 RTR Journal System

The RTR journal is used for the following purposes:

- RTR stores data about a transaction so that if the backend processing a transaction fails for any reason, another backend can transparently continue from where the previous server failed (if it is able to access the same database and journal).
- RTR also stores data about transactions when a shadow site is known to be missing. In this case, RTR stores all transaction data that can result in a database update. The data is kept in the journal file accessible from the backend until the secondary shadow site is available again. RTR then transparently replays this data to the shadow site, after which the data is deleted from the journal.
- RTR stores partition information.

The amount of space required for the journal depends upon the:

- Size of the messages in a transaction
- Number of messages in the transaction
- Rate of generation of transactions
- Maximum time a shadow site can be out of commission

Thus a journal file is often quite large.

The /MAXIMUM_BLOCKS qualifier on the CREATE JOURNAL command controls how large a journal may become. The /MAXIMUM_BLOCKS qualifier defines the maximum number of blocks which the journal is allowed to occupy on any one disk. RTR does not check if this amount of space is actually available, as the disk space specified by /MAXIMUM_BLOCKS is used only on demand by RTR when insufficient space is available in the space allocated by the /BLOCKS qualifier.

The /BLOCKS qualifier specifies the maximum size of the journal that RTR attempts to use. The actual number of blocks used may vary, depending upon the load on RTR.

The command MODIFY JOURNAL also accepts the /BLOCKS and /MAXIMUM_BLOCKS qualifiers.

Journal file extension occurs on demand when RTR detects that a "write to journal" would otherwise fail due to lack of space. Journal file truncation takes place periodically when enough free blocks are detected.

Refer to the description of the MODIFY JOURNAL command for its syntax.

RTR> show journal/files/full
RTR journal:-
Disk: /dev/rz3a Blocks: 2500 Allocated: 1253 Maximum: 3500
File: //rtrjnl/anders/BRONZE.J00
RTR>

5.4 Shadow Site Failure and Journaling

If a shadow site fails, RTR allows transactions to continue to be processed on the remaining site. The intermediate transactions processed by the remaining server or servers are retained by RTR; when the failed site restarts, these transactions are sent to this site as part of a shadow-recovery operation, thus bringing the failed site back up to date.

Because these transactions are stored in the RTR journal, the journal must be created with enough disk space in reserve to store data for the longest expected outage. The required size can be estimated using:

journal size in bytes = ( transaction messages per second * ( transaction message length + 70 ) * seconds of outage ) + 5% file overhead

Divide the result in bytes by 512 to obtain the size in blocks.
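
The estimate above can be worked through with a short calculation. This is a sketch under stated assumptions: the workload figures (10 messages per second, 500-byte messages, a one-hour outage) are hypothetical, the 5% overhead is applied to the computed byte total, and partial blocks are rounded up. The function and constant names are illustrative only.

```python
BLOCK_SIZE_BYTES = 512           # block size used by the formula
PER_MESSAGE_OVERHEAD_BYTES = 70  # the "+ 70" term in the formula
FILE_OVERHEAD_PERCENT = 5        # the "+ 5% file overhead" term


def ceil_div(numerator: int, denominator: int) -> int:
    """Integer division rounded up, avoiding floating-point error."""
    return -(-numerator // denominator)


def journal_blocks(messages_per_second: int,
                   message_length_bytes: int,
                   outage_seconds: int) -> int:
    """Estimate the journal size in 512-byte blocks for one outage."""
    payload_bytes = (messages_per_second
                     * (message_length_bytes + PER_MESSAGE_OVERHEAD_BYTES)
                     * outage_seconds)
    overhead_bytes = ceil_div(payload_bytes * FILE_OVERHEAD_PERCENT, 100)
    return ceil_div(payload_bytes + overhead_bytes, BLOCK_SIZE_BYTES)


if __name__ == "__main__":
    # Hypothetical workload: 10 msg/s, 500-byte messages, 3600 s outage.
    print(journal_blocks(10, 500, 3600))  # prints 42083
```

For this hypothetical workload, the payload is 10 * 570 * 3600 = 20,520,000 bytes; adding 5% gives 21,546,000 bytes, or 42,083 blocks after rounding up.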

The overhead required when calculating journal size comes from internal journal data (block stamping) of approximately 3%. In addition, there is internal transaction data per (client to server) transactional message, and some further data per transaction (concerning voting and transaction completion).

Also, RTR prevents further transactional data from being written to the journal when it is nearly full, but continues to allow deletes from the journal (deletes also cause data to be written to the journal). Ten segments are held in reserve for storing information about deleted transactions even when RTR cannot accept further transactions because the journal is full.

Caution

If the journal disk becomes full, transactions are aborted until the shadow partner restarts and empties the journal of transactions to be replayed.
