Reliable Transaction Router
System Manager's Manual



2.15.2 Files Created by the RTR Windows NT Service

If RTR is started from the Service rather than from a Command Prompt window, several files are created in the RTR root directory.

When the Service stops RTR, it recreates srvcin.txt and creates rtrstop.rtr for the shutdown commands. Creation of these files is unconditional; that is, they are created every time RTR is started or stopped, whether or not they already exist. RTR therefore ignores (and overwrites) any changes made to these files.

2.16 Selecting Processing-states (Roles) for Nodes

This section discusses how RTR assigns roles to backend node partitions, and how routers are selected.

2.16.1 Role Assignment for Backend Node Partitions

RTR assigns a primary or secondary processing state to a partition (or key-range definition), which consists of one or more server application channels that may or may not share a common process. All server channels belonging to a given partition have the same processing state on a given node. However, the processing state for the same partition normally differs from node to node. The exception is the standby processing state: because a given partition can have multiple standby nodes, several nodes may be in this state at once.

RTR determines the processing state of a given partition through the use of a globally managed sequence number for that partition. By default, the RTR master router will automatically assign sequence numbers to partitions during startup. When a server is started up on a backend node and declares a new partition for that node, the partition initially has a sequence number of zero. When the partition on that backend makes an initial connection to the master router, the router increases its sequence number count for that partition by one and assigns the new sequence number to the new backend partition. The active node with the lowest backend partition sequence number gets the primary processing state in both shadow and standby configurations. That node is also referred to as the primary node, though the same node could have a standby processing state for a different partition.

Under certain failover conditions, backend partitions may either retain their original sequence number or be assigned a new one by the router. If a failure is caused by a network disruption, for example, a backend partition retains its sequence number when it reconnects with the router. However, if the backend node is rebooted or RTR is restarted on the backend node, the router assigns a new sequence number to any partition that starts up on that node. Routers assign new sequence numbers only to backend partitions whose current sequence number is zero, or whose sequence number conflicts with that of a backend partition on another node when joining an existing facility.
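
The assignment rule just described can be modeled in a few lines of C. This is purely an illustrative sketch of the behavior, not RTR's internal implementation, and all of the names in it are invented:

    /* Illustrative model of router-side sequence-number assignment.
     * rp_counter is the router's per-partition counter; a backend
     * partition presents its current sequence number on connection. */
    typedef struct {
        unsigned rp_counter;   /* highest sequence number handed out so far */
    } partition_counter_t;

    unsigned assign_seqno(partition_counter_t *rp,
                          unsigned presented,   /* 0 after reboot/restart */
                          int      conflicts)   /* clashes with another member */
    {
        if (presented == 0 || conflicts)
            return ++rp->rp_counter;  /* assign the next, higher number */
        return presented;             /* e.g., reconnect after network outage */
    }

    /* The active member with the LOWEST sequence number
     * receives the primary processing state. */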

Sequence number information can be obtained from the SHOW PARTITION command; in the output of this command, the sequence number is labeled "relative priority". The following example shows the router's view of a partition. In this example, the backend partition on the node called Bronze has a sequence number of 1, and the backend partition on the node called Gold has a sequence number of 2.
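
For reference, a display such as the one below is produced by entering the SHOW PARTITION command at the RTR prompt on the router node. A plain invocation is shown here; any additional qualifiers are omitted:

    RTR> SHOW PARTITION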


Router partitions on node SILVER in group test at Mon Mar 22 14:51:16 1999 
 
State:                        ACTIVE 
Low bound:                         0     High bound:               4294967295 
Failover policy:                                              fail_to_standby 
Backends:                                                         bronze,gold 
 States:                                                      pri_act,sec_act 
 Relative priorities:                                                     1,2 
Primary main:                 bronze     Shadow main:                    gold 
 

The output of the SHOW PARTITION command for each backend node is as follows:


Backend partitions on node BRONZE in group "test" at Mon Mar 22 14:52:32 1999 
 
 
Partition name:                                                            p1 
Facility:       RTR$DEFAULT_FACILITY     State:                       pri_act 
Low bound:                         0     High bound:               4294967295 
Active servers:                    0     Free servers:                      1 
Transaction presentation:     active     Last Rcvy BE:                   gold 
Active transaction count:          0     Transactions recovered:            0 
Failover policy:     fail_to_standby     Key range ID:               16777217 
Master router:                silver     Relative priority:                 1 
Features:                                         Shadow,NoStandby,Concurrent 
 
Backend partitions on node GOLD in group "test" at Mon Mar 22 14:54:12 1999 
 
Partition name:                                                            p1 
Facility:       RTR$DEFAULT_FACILITY     State:                       sec_act 
Low bound:                         0     High bound:               4294967295 
Active servers:                    0     Free servers:                      1 
Transaction presentation:     active     Last Rcvy BE:                 bronze 
Active transaction count:          0     Transactions recovered:            0 
Failover policy:     fail_to_standby     Key range ID:               16777216 
Master router:                silver     Relative priority:                 2 
Features:                                         Shadow,NoStandby,Concurrent 

The following figure shows how sequence numbers are initially assigned for a partition in a simple configuration with two backend nodes named Bronze and Gold and a router named Silver.

Figure 2-3 Assignment of Sequence Numbers


  1. A partition (with shadowing enabled) is started on node Bronze.
  2. The partition on Bronze obtains sequence number 1 from the router and becomes the primary.
  3. Another server on the same partition (with the same attributes) is started on Gold.
  4. The partition on Gold obtains sequence number 2 from the router and becomes the secondary.
  5. Node Bronze crashes and reboots (the partition sequence number on Bronze is reset to 0). The partition on Gold goes into Remember.
  6. When the server starts, the partition on Bronze obtains sequence number 3 from the router and becomes the secondary; Gold now becomes the primary.
  7. The network connection from node Silver to node Gold fails. The partition on Bronze becomes the primary. The partition on node Gold loses quorum and is in a wait-for-quorum state.
  8. The network connection to node Gold is reestablished. Because the partition on Gold retained its original sequence number of 2, which is lower than Bronze's 3, it resumes the primary role, while the partition on Bronze reassumes the secondary role.

Alternatively, the roles of backend nodes can be explicitly assigned with the /PRIORITY_LIST qualifier to the SET PARTITION command. For example, the /PRIORITY_LIST qualifier can be used to ensure that when Bronze fails and later returns to the facility, it again becomes the active primary member. To ensure this, the following command would be issued on both backend systems immediately after the creation of the partition:


SET PARTITION test/PRIORITY_LIST=(bronze,gold) 

The same priority-list order should be used on all partition members. If different lists are used, the router determines the sequence numbers of the conflicting members from the order in which those members joined the facility. For example, if the above command were issued only on Bronze, and Gold had the opposite priority list, the router would assign the lower sequence number to whichever backend joined the facility first.

The /PRIORITY_LIST feature is particularly useful in cluster configurations. For example, suppose Site A and Site B each contain a two-node cluster, as shown in the following figure. The facility is configured so that at Site A, Node-A1 has the primary active partition and Node-A2 has its standby. At Site B, Node-B1 has the secondary active partition and Node-B2 has the standby of the secondary.

Figure 2-4 Cluster Configuration


The partition could be defined such that the standby node, Node-A2, would become active if the primary node were to fail. For example, issuing the following command on all four nodes for this partition guarantees that the specified list is followed when there is a failure.


SET PARTITION test/PRIORITY_LIST=(Node-A1,Node-A2,Node-B1,Node-B2) 

On the router, the SHOW PARTITION command would display this partition as follows:


Router partitions on node SILVER in group "test" at Mon Mar 22 17:22:06 1999 
 
 
State:                        ACTIVE 
Low bound:                         0     High bound:               4294967295 
Failover policy:                                              fail_to_standby 
Backends:                                     node-a1,node-a2,node-b1,node-b2 
 States:                                      pri_act,standby,sec_act,standby 
 Relative priorities:                                                 1,2,3,4 
Primary main:                node-a1     Shadow main:                 node-b1 

However, the partition could also be configured so that the secondary active node, Node-B1, would become the primary node if the original primary system were to fail. This is controlled with the /FAILOVER_POLICY qualifier to the SET PARTITION command. The default is /FAILOVER_POLICY=STAND_BY.

If the relative priority (sequence number) of Node-A2 is changed to four, as in the following command, Node-A2 still becomes the primary active server if Node-A1 fails, because the failover policy indicates a fail_to_standby requirement for this partition.


SET PARTITION test/PRIORITY_LIST=(Node-A1,Node-B1,Node-B2,Node-A2) 

After issuing this command, the router partition appears as follows. Note the change in relative priorities for the backends.


Router partitions on node SILVER in group test at Tue Mar 23 13:29:41 1999 
 
 
State:                        ACTIVE 
Low bound:                         0     High bound:               4294967295 
Failover policy:                                              fail_to_standby 
Backends:                                     node-a1,node-a2,node-b1,node-b2 
 States:                                      pri_act,standby,sec_act,standby 
 Relative priorities:                                                 1,4,2,3 
Primary main:                node-a1     Shadow main:                 node-b1 

Use the following SET PARTITION command to change the facility so that Node-B1 will become the primary active server if Node-A1 fails.


SET PARTITION test/FAILOVER_POLICY=shadow 

Use the /FAILOVER_POLICY qualifier to select a new active primary in configurations where shadowing is enabled; this qualifier takes precedence over the /PRIORITY_LIST qualifier. Use the /PRIORITY_LIST qualifier to determine the failover order for specific nodes. It is most useful in cluster configurations, where it can specify the exact failover order for the nodes within the cluster. For example, in a standby facility on a cluster of four nodes, the /PRIORITY_LIST qualifier can specify the desired failover order for those cluster members. Because some machines within a cluster may be more powerful than others, this feature allows the most efficient use of those machines.
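
Both preferences can also be expressed together. Assuming the two qualifiers can be combined on a single command line (an assumption; this manual shows them issued separately), the cluster example above could be configured in one step:

SET PARTITION test/FAILOVER_POLICY=shadow/PRIORITY_LIST=(Node-A1,Node-B1,Node-B2,Node-A2)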

2.16.2 Router Selection

Within the scope of a given facility, routers and backends connect to one another. However, nodes with a given role do not connect to nodes with the same role; that is, routers do not connect to other routers. A frontend connects to only one router at a time. This router is called the Current Router for that frontend within the scope of a facility.

A backend connects to all routers defined within a facility. The connected router with the lowest network address is designated the master router. Internally, a node is identified by a structure called the Kernel Net ID, which is a concatenation of all the network addresses by which the node is known, across all the protocols and interfaces it supports. The master router designation is relevant only to a backend: it is where the backend goes to obtain and verify partition configuration and facility information.

Routers are made known to frontend systems through the list specified in the /ROUTER=(list) qualifier to the CREATE FACILITY command. The order of this list determines the preferred router: if the first router specified is not available, the next one on the list is chosen. When the facility is created on the frontend, the list of routers specified can be a subset of the routers contained within the entire facility; this can be used to prevent a frontend from selecting a router reserved for other frontend systems. Failback of routers is supported: if the preferred router is unavailable and later becomes available again, the frontend automatically fails back and connects to it.
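
As an illustration, the following command creates a facility in which the frontend fe1 prefers router silver and fails over to router steel. The node names are hypothetical, and the /FRONTEND and /BACKEND qualifiers, which name the nodes for the other roles, are shown here on the assumption that all roles are defined in one command:

CREATE FACILITY Facility1/FRONTEND=fe1/ROUTER=(silver,steel)/BACKEND=(bronze,gold)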


Chapter 3
Partition Management

3.1 Overview

This section describes the concepts and operations of RTR partitions.

3.1.1 What is a Partition?

Partitions are subdivisions of a routing key range of values. They are used with a partitioned data model and RTR data-content routing. Partitions exist for each distinct range of values in the routing key for which a server is available to process transactions. RTR provides for failure tolerance by allowing system operators to start redundant instances of partitions in a distributed network and by automatically managing the state and flow of transactions to the partition instances.

Partition instances can stand in primary, secondary (shadow), or standby relationships to one another, as described in Section 2.16.

Prior to RTR V3.2, the creation and behavior of a partition were linked to the declaration of server application channels. Partitions and their characteristics can now be defined by the system operator, which decouples partition definition from application code and gives the operator direct control over partition characteristics.

3.1.2 What is Partition Management?

Prior to RTR V3.2, the management of a partition state was an entirely automatic function of the distributed RTR system. Starting with RTR V3.2, the system operator can issue commands to control certain partition characteristics, and to set preferences concerning partition behavior.

3.2 Partition Naming

A prerequisite for partition management is the ability to identify a partition in the system that is to be the subject of management commands. For this purpose, partitions have been given names, which may be drawn from a number of sources described here.

3.2.1 Default Partition Names

Partitions receive automatically generated default names of the form RTR$DEFAULT_PARTITION_number, where number is a system-assigned value, unless a name is supplied by one of the methods described in the following sections. This gives system operators access to the partition command set without requiring changes to existing application programs or site configuration procedures.

3.2.2 Programmer-Supplied Names

An extension to the rtr_open_channel() call allows the application programmer to supply a name when opening a server channel. The pkeyseg argument specifies an additional item of type rtr_keyseg_t, with the ks_type field set to rtr_keyseg_partition and the ks_lo_bound field pointing to the partition name string (see the example in Section 3.4).

Using this model, the partition segments and key ranges served by the server are still specified by the server when the channel is opened.

3.2.3 System-Manager Supplied Partition Names

Partitions can be defined by the system manager using the CREATE PARTITION system management command, or by using rtr_open_channel() flag arguments. The system manager can set partition characteristics with this command, and applications can open channels to the partition by name. See Section 3.4 for an example of passing a partition name with rtr_open_channel().

3.2.4 Name Format and Scope

A valid partition name can contain no more than 63 characters. It can combine alphanumeric characters (abc123), the underscore (_), and the dollar sign ($). Partition names must be unique within a facility, and the facility name should be given with the partition name on the command line when using partition commands. Partition names exist only on the backend where the partition resides; you will not see partition names at the RTR routers.

3.3 Life Cycle of a Partition

This section describes the life cycle of partitions, including the ways they can be created and their persistence.

3.3.1 Implicit Partition Creation

Partitions are created implicitly when an application program calls rtr_open_channel() to create a server channel, specifying the key segments and value ranges for the segments with the pkeyseg argument. Other partition attributes are established with the flags argument. Prior to RTR V3.2, this was the only way partitions could be created. Partitions created in this way are automatically deleted when the last server channel to the partition is closed.
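
The following minimal C sketch shows such an implicit creation: a server channel is opened for the full unsigned key range used in the examples of Section 2.16. The facility name and the exact set of rtr_keyseg_t fields filled in are assumptions made for illustration:

    #include <rtr.h>   /* RTR C API declarations */

    /* Open a server channel, implicitly creating a partition that
     * serves one unsigned key segment covering 0..4294967295. */
    static rtr_status_t open_range_server(rtr_channel_t *pchannel)
    {
        static rtr_uns_32_t low  = 0;
        static rtr_uns_32_t high = 4294967295u;
        rtr_keyseg_t        key_seg;

        key_seg.ks_type     = rtr_keyseg_unsigned;   /* unsigned key segment  */
        key_seg.ks_length   = sizeof(rtr_uns_32_t);  /* segment size in bytes */
        key_seg.ks_offset   = 0;                     /* offset within message */
        key_seg.ks_lo_bound = &low;                  /* low bound of range    */
        key_seg.ks_hi_bound = &high;                 /* high bound of range   */

        return rtr_open_channel(pchannel, RTR_F_OPE_SERVER, "Facility1",
                                RTR_NO_RCPNAM, RTR_NO_PEVTNUM, RTR_NO_ACCESS,
                                1, &key_seg);        /* one key segment       */
    }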

3.3.2 Explicit Partition Creation

Partitions can also be created by the system operator, before the server application programs start, using system management commands. This gives the operator more control over partition characteristics. Partitions created in this way remain in the system until they are explicitly deleted by the operator or RTR is stopped.

3.3.3 Persistence of Partition Definitions

RTR stores partition definitions in the journal and records, for each transaction, the partition in which it was processed. This is convenient when viewing or editing the contents of the journal (using the SET TRANSACTION command), where the partition name can be used to select a subset of the transactions in the journal. RTR will not permit a change in the partition name or definition as long as transactions remain in the journal that were processed under the current name or definition for the partition. Such transactions must first be resolved or removed from the journal before the partition can be renamed or redefined.

3.4 Binding Server Channels to Named Partitions

For a server application to open a channel to an explicitly created partition, the application passes the name of the partition through the pkeyseg argument of the rtr_open_channel() call. It is not necessary to pass key segment descriptors, but if the application does, they must be compatible with the existing partition definition. Partition characteristics may also be passed through the flags argument, but these are superseded by those of the existing partition.


    RTR> create partition/KEY1=(type...) par_one 
     . . . 
    rtr_keyseg_t    partition_name; 
 
    partition_name.ks_type = rtr_keyseg_partition;  /* item carries a partition name */
    partition_name.ks_lo_bound = "par_one";         /* name of partition to bind to  */
 
    status = rtr_open_channel(..., RTR_F_OPE_SERVER, ..., 1, &partition_name); 

In summary, to fully decouple server applications from the definition of the partitions they process, write applications that pass only the required partition name when opening server channels, and leave the management of partition characteristics to the system managers and operators.
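
Putting this together, a complete call that binds a server channel to the operator-created partition par_one might look like the following sketch. The facility name Facility1 is an assumption, and the remaining arguments use the API's no-value constants:

    #include <rtr.h>   /* RTR C API declarations */

    /* Bind a server channel to the named partition; key segments and
     * other characteristics come from the operator's definition. */
    static rtr_status_t open_named_partition(rtr_channel_t *pchannel)
    {
        rtr_keyseg_t partition_name;

        partition_name.ks_type     = rtr_keyseg_partition;  /* item is a partition name */
        partition_name.ks_lo_bound = "par_one";             /* name to bind to          */

        return rtr_open_channel(pchannel, RTR_F_OPE_SERVER, "Facility1",
                                RTR_NO_RCPNAM, RTR_NO_PEVTNUM, RTR_NO_ACCESS,
                                1, &partition_name);
    }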

3.5 Entering Partition Commands

Partitions can be managed by issuing partition commands directed at the required partitions after they are created. Partition commands can be entered in one of two ways: interactively, from the RTR command line, or programmatically, through the RTR application programming interface.

Enter partition commands on the backend where the partition is located. Note that commands that affect a partition's state take effect only once the first server joins the partition; errors encountered at that time appear as log file entries. Using partition commands to change the state of the system also causes a log file entry.

3.5.1 Command Line Usage

Partition management in the RTR command language is implemented with the CREATE PARTITION, DELETE PARTITION, SET PARTITION, and SHOW PARTITION commands.

The name of the facility in which the partition resides can be specified with the /FACILITY command line qualifier, or as a colon-separated prefix to the partition name (for example Facility1:Partition1). Detailed descriptions of the command syntax are given in the Command Line Reference section of this manual, and are summarized in the following discussions. Examples in the following sections use a partition name of Partition1 in the facility name of Facility1.
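
For example, the following two commands are equivalent ways of naming the same partition:

SHOW PARTITION Partition1/FACILITY=Facility1 
SHOW PARTITION Facility1:Partition1 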

