Reliable Transaction Router
System Manager's Manual

The same priority list order should be used on all partition members. If a different list is used, the router will determine the sequence number for conflicting members through the order in which those members joined the facility. For example, if the above command were issued only on Bronze, and Gold had the opposite priority list, the router would assign the lower sequence number to the backend that joined the facility first.

The /PRIORITY_LIST feature is very useful in cluster configurations. For example, Site A and Site B each contain a two-node cluster, as shown in the following figure. The facility is configured such that at Site A, Node-A1 has the primary active partition and Node-A2 has the standby partition. At Site B, Node-B1 has the secondary active partition and Node-B2 has the standby of the secondary.

Figure 2-4 Cluster Configuration

The partition could be defined such that the standby node, Node-A2, would become active if the primary node were to fail. For example, issuing the following command on all four nodes for this partition guarantees that the specified list is followed when there is a failure.

SET PARTITION test/PRIORITY_LIST=(Node-A1,Node-A2,Node-B1,Node-B2)

When the SHOW PARTITION command is entered on the router, this partition is displayed as follows:

Router partitions on node SILVER in group "test" at Mon Mar 22 17:22:06 1999
State: ACTIVE
Low bound: 0 High bound: 4294967295
Failover policy: fail_to_standby
Backends: node-a1,node-a2,node-b1,node-b2
States: pri_act,standby,sec_act,standby
Relative priorities: 1,2,3,4
Primary main: node-a1 Shadow main: node-b1

However, the partition could also be configured so that the secondary active node, Node-B1, would become the primary node if the original primary system were to fail. This is controlled with the /FAILOVER_POLICY qualifier to the SET PARTITION command. The default is /FAILOVER_POLICY=STAND_BY.

If the relative priority (sequence number) for Node-A2 is changed to four, it still becomes the primary active server if Node-A1 fails because the failover policy indicates a fail_to_standby requirement for this partition.

SET PARTITION test/PRIORITY_LIST=(Node-A1,Node-B1,Node-B2,Node-A2)

After issuing this command, the router partition appears as follows. Note the change in relative priorities for the backends.

Router partitions on node SILVER in group test at Tue Mar 23 13:29:41 1999
State: ACTIVE
Low bound: 0 High bound: 4294967295
Failover policy: fail_to_standby
Backends: node-a1,node-a2,node-b1,node-b2
States: pri_act,standby,sec_act,standby
Relative priorities: 1,4,2,3
Primary main: node-a1 Shadow main: node-b1
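The relationship between the priority list and the "Relative priorities" line can be reproduced with a small model. The C sketch below is not part of RTR; the function relative_priorities() and its case-insensitive name matching are assumptions made for illustration. It assigns each backend its 1-based position in the list, so the list (Node-A1,Node-B1,Node-B2,Node-A2) gives node-a1, node-a2, node-b1 and node-b2 the priorities 1,4,2,3 shown in the display.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Compare two node names without regard to case (the display shows
   node-a1 for Node-A1). */
static int names_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;  /* equal only if both strings ended together */
}

/* Hypothetical helper: for each backend, store its 1-based position in
   the priority list, or 0 if the backend is not named in the list. */
void relative_priorities(const char *const backends[], size_t nbackends,
                         const char *const plist[], size_t nplist,
                         int out[])
{
    assert(out != NULL);
    for (size_t i = 0; i < nbackends; i++) {
        out[i] = 0;
        for (size_t j = 0; j < nplist; j++) {
            if (names_equal(backends[i], plist[j])) {
                out[i] = (int)(j + 1);
                break;
            }
        }
    }
}
```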

Use the following SET PARTITION command to change the facility so that Node-B1 will become the primary active server if Node-A1 fails.

SET PARTITION test/FAILOVER_POLICY=shadow

Use the /FAILOVER_POLICY qualifier to select a new active primary in configurations where shadowing is enabled; this qualifier takes precedence over the /PRIORITY_LIST qualifier. Use the /PRIORITY_LIST qualifier to determine the failover order for specific nodes. It is most useful in cluster configurations, where it can specify the exact failover order for the nodes within the cluster. For example, in a standby facility on a four-node cluster, the /PRIORITY_LIST qualifier can specify the desired order of failover among those cluster members. Because some machines within a cluster may be more powerful than others, listing them in order of preference allows the most efficient use of those machines.
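The interaction between the two qualifiers can be illustrated with a simplified, hypothetical model of the choice made when the primary active member fails. The names backend_model, failover_policy and choose_new_primary() are invented for this sketch, and the model assumes each backend is tagged with its site and relative priority. Under STAND_BY, the highest-priority surviving backend at the primary site takes over; under SHADOW, the secondary active member takes over. This matches the example above: Node-A2 is chosen under STAND_BY even with priority 4, and Node-B1 is chosen under SHADOW.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { POLICY_STAND_BY, POLICY_SHADOW } failover_policy;

/* Hypothetical model of one backend in a partition. */
typedef struct {
    const char *name;
    int primary_site;  /* nonzero if at the primary site (Site A) */
    int priority;      /* relative priority, 1 = most preferred */
    int up;            /* nonzero if the backend is still running */
    int sec_act;       /* nonzero for the secondary active member */
} backend_model;

/* Return the index of the backend that becomes primary active after the
   primary fails, or -1 if this simplified model finds no candidate. */
int choose_new_primary(const backend_model b[], size_t n,
                       failover_policy policy)
{
    int best = -1;

    assert(n == 0 || b != NULL);
    if (policy == POLICY_SHADOW) {
        for (size_t i = 0; i < n; i++)
            if (b[i].up && b[i].sec_act)
                return (int)i;
        /* Assumption: with no secondary active, fall back to a standby. */
    }
    /* STAND_BY: lowest priority number among survivors at the primary site. */
    for (size_t i = 0; i < n; i++) {
        if (!b[i].up || !b[i].primary_site)
            continue;
        if (best < 0 || b[i].priority < b[(size_t)best].priority)
            best = (int)i;
    }
    return best;
}
```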

2.16.2 Router Selection

Within the scope of a given facility, routers and backends connect to one another. However, nodes with a specific role do not connect to nodes with the same role, i.e., routers do not connect to other routers. Frontends choose only one router to connect to at a given time. This router is called the Current Router for that frontend within the scope of a facility.

A backend connects to all routers defined within a facility. The connected router with the lowest network address is designated the master router. Internally, a node is identified through a structure called the Kernel Net ID. The Kernel Net ID is a concatenation of all network addresses by which a node is known, for all the protocols and interfaces that it supports. The master router designation is relevant only to a backend: the backend obtains and verifies partition configuration and facility information from its master router.
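Choosing the master router amounts to taking the minimum address among the routers to which the backend is connected. In the hypothetical sketch below, router_link and master_router() are invented names, and each router is reduced to a single 32-bit address. That is a simplification: a real Kernel Net ID concatenates several addresses, and how RTR compares them is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one router as seen from a backend. */
typedef struct {
    uint32_t address;  /* simplification: one numeric address per router */
    int connected;     /* nonzero if the backend is connected to it */
} router_link;

/* Return the index of the master router (the connected router with the
   lowest address), or -1 if the backend has no router connection. */
int master_router(const router_link r[], size_t n)
{
    int best = -1;

    assert(n == 0 || r != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!r[i].connected)
            continue;
        if (best < 0 || r[i].address < r[(size_t)best].address)
            best = (int)i;
    }
    return best;
}
```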

Routers are made known to the frontend systems through the list specified in the /ROUTER=(list) qualifier to the CREATE FACILITY command. The order of this list determines the preferred router: if the first router specified is not available, the next one on the list is chosen. When the facility is created on the frontend, the list of routers specified can be a subset of the routers contained within the entire facility. This can be used to prevent a frontend from selecting a router reserved for other frontend systems. Failback of routers is supported: if the preferred router is unavailable and later becomes available, the frontend automatically fails back and reconnects to its preferred router.
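The frontend's choice of current router, including failback, can be modeled as "the first available router in /ROUTER list order", re-evaluated whenever router availability changes. The function current_router() below is a hypothetical illustration, not an RTR interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: available[i] is nonzero if the i-th router in the
   frontend's /ROUTER list is reachable. The current router is the first
   available entry. Re-evaluating after every availability change gives
   failback: when a more preferred router returns, it is chosen again. */
int current_router(const int available[], size_t n)
{
    assert(n == 0 || available != NULL);
    for (size_t i = 0; i < n; i++)
        if (available[i])
            return (int)i;
    return -1;  /* no router in the list is available */
}
```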

Chapter 3
Partition Management

3.1 Overview

This section describes the concepts and operations of RTR partitions.

3.1.1 What is a Partition?

Partitions are subdivisions of a routing key range of values. They are used with a partitioned data model and RTR data-content routing. Partitions exist for each distinct range of values in the routing key for which a server is available to process transactions. RTR provides for failure tolerance by allowing system operators to start redundant instances of partitions in a distributed network and by automatically managing the state and flow of transactions to the partition instances.
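To make the idea of key ranges concrete, the following C sketch routes a key to the partition whose inclusive range contains it. It is a hypothetical model, not RTR code: the key_partition type, the route_key() function, and the restriction to a single unsigned 32-bit key segment are assumptions. A key that falls outside every range has no partition, which corresponds to the case where no server is available for that range of values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one partition: an inclusive range of values of a
   single unsigned 32-bit routing key, like the Low bound and High bound
   shown by SHOW PARTITION. */
typedef struct {
    const char *name;
    uint32_t low;
    uint32_t high;
} key_partition;

/* Return the partition whose range contains key, or NULL if no
   partition covers that value. */
const key_partition *route_key(const key_partition parts[], size_t n,
                               uint32_t key)
{
    assert(n == 0 || parts != NULL);
    for (size_t i = 0; i < n; i++)
        if (key >= parts[i].low && key <= parts[i].high)
            return &parts[i];
    return NULL;
}
```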

Partition instances support the following relationships:

Concurrency - permits multiple server channels to be connected to an instance of a partition.
Standbys - permits multiple instances of a partition to be distributed over the nodes of a cluster. A standby set may have as many members as a cluster has nodes, or with some restrictions you may place a standby on any network node. At any one time, one member of the set is active while the others wait in standby mode to take over if the active member fails.
Shadows - provide site disaster protection by allowing replication of transaction processing at a remote site. A pair of partition instances (or standby sets) cooperate to provide this replication, with provision for automatic recovery of a shadow member restarting after a failure.

Prior to RTR V3.2, the creation and behavior of a partition were linked to the declaration of server application channels. Partitions and their characteristics can now be defined by the system operator. This has the following advantages:

Allows a further decoupling of the application from its operating environment, therefore reducing application programming requirements
Allows the system operators to make choices concerning the runtime behavior of the system

3.1.2 What is Partition Management?

Prior to RTR V3.2, the management of a partition state was an entirely automatic function of the distributed RTR system. Starting with RTR V3.2, the system operator can issue commands to control certain partition characteristics, and to set preferences concerning partition behavior.

3.2 Partition Naming

A prerequisite for partition management is the ability to identify a partition in the system that is to be the subject of management commands. For this purpose, partitions have been given names, which may be drawn from a number of sources described here.

3.2.1 Default Partition Names

Partitions receive automatically generated default names, in the form RTR$DEFAULT_PARTITION_number, where number is a number assigned by RTR, unless the name is supplied by one of the methods described in the following sections. This gives system operators access to the partition command set without having to change existing application programs or site configuration procedures.

3.2.2 Programmer-Supplied Names

An extension to the rtr_open_channel() call allows the application programmer to supply a name when opening a server channel. The pkeyseg argument specifies an additional item of type rtr_keyseg_t, assigning the following values:

ks_type = rtr_keyseg_partition indicates that a partition name is being passed.
ks_lo_bound should point to the null-terminated string to use for the partition name.
ks_hi_bound must be NULL.

Using this model, the partition segments and key ranges served by the server are still specified by the server when the channel is opened.

3.2.3 System-Manager Supplied Partition Names

Partitions can be defined by the system manager using the CREATE PARTITION system management command, or by using rtr_open_channel() flag arguments. The system manager can set partition characteristics with this command, and applications can open channels to the partition by name. See Section 3.4 for an example of passing a partition name with rtr_open_channel().

3.2.4 Name Format and Scope

A valid partition name can contain no more than 63 characters. It can combine alphanumeric characters (abc123), the underscore (_), and the dollar sign ($). Partition names must be unique within a facility and should be referenced on the command line together with the facility name when using partition commands. Partition names exist only on the backend where the partition resides; you will not see the partition names at the RTR routers.
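The naming rules above can be expressed as a simple check. The function valid_partition_name() is hypothetical and not part of the RTR API; it assumes the default C locale for isalnum(), and it treats an empty name as invalid, which the rules above do not state explicitly.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_PARTITION_NAME 63

/* Hypothetical check: 1 to 63 characters, each alphanumeric, an
   underscore, or a dollar sign. Returns 1 if valid, 0 otherwise. */
int valid_partition_name(const char *name)
{
    assert(name != NULL);
    size_t len = strlen(name);
    if (len == 0 || len > MAX_PARTITION_NAME)
        return 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        if (!isalnum(c) && c != '_' && c != '$')
            return 0;
    }
    return 1;
}
```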

3.3 Life Cycle of a Partition

This section describes the life cycle of partitions, including the ways they can be created and their persistence.

3.3.1 Implicit Partition Creation

Partitions are created implicitly when an application program calls rtr_open_channel() to create a server channel, specifying the key segments and value ranges for the segments with the pkeyseg argument. Other partition attributes are established with the flags argument. Prior to RTR V3.2, this was the only way partitions could be created. Partitions created in this way are automatically deleted when the last server channel to the partition is closed.

3.3.2 Explicit Partition Creation

Partitions can also be created by the system operator, using system management commands, before the server application programs start up. This gives the operator more control over partition characteristics. Partitions created in this way remain in the system until either explicitly deleted by the operator, or RTR is stopped.

3.3.3 Persistence of Partition Definitions

RTR stores partition definitions in the journal and records, for each transaction, the partition in which it was processed. This is convenient when viewing or editing the contents of the journal (using the SET TRANSACTION command), where the partition name can be used to select a subset of the transactions in the journal. RTR will not permit a change in the partition name or definition as long as the journal contains transactions that were processed under the current name or definition for the partition. If transactions remain in the journal and you need to change the partition name or definition, you can take one of the following actions:

Start appropriate servers to complete processing of the transactions.
Remove the transactions from the journal with the SET TRANSACTION command.
Replace the RTR journal with the CREATE JOURNAL/SUPERSEDE command. Note that this will destroy any transactions remaining in the journal and should be done with caution.

3.4 Binding Server Channels to Named Partitions

For a server application to open a channel to an explicitly created partition, the application passes the name of the partition through the pkeyseg argument of the rtr_open_channel() call. It is not necessary to pass key segment descriptors, but if the application does, they must be compatible with the existing partition definition. You may pass partition characteristics through the flags argument, but these will be superseded by those of the existing partition.

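RTR> create partition/KEY1=(type. . .) par_one
. . .
rtr_keyseg_t partition_name;

/* Name the partition created above; no key range descriptors are passed. */
partition_name.ks_type = rtr_keyseg_partition;
partition_name.ks_lo_bound = "par_one";
partition_name.ks_hi_bound = NULL;  /* must be NULL (see Section 3.2.2) */
status = rtr_open_channel(..., RTR_F_OPE_SERVER, ..., 1, &partition_name);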

In summary, to fully decouple server applications from the definition of the partitions to be processed, write applications that open server channels where only the required partition name is passed. Leave the management of the partition characteristics to the system managers and operators.

3.5 Entering Partition Commands

Partitions can be managed by issuing partition commands directed at the required partitions after they are created. Partition commands can be entered in one of two ways:

A command line processed by the RTR command line interface, for example RTR> SET PARTITION
Programmed using rtr_set_info()

Enter partition commands on the backend where the partition is located. Note that commands that affect a partition state only take effect once the first server joins a partition. Errors encountered at that time will appear as log file entries. Using partition commands to change the state of the system causes a log file entry.

3.5.1 Command Line Usage

Partition management in the RTR command language is implemented with the following command set:

RTR> CREATE PARTITION
RTR> SET PARTITION
RTR> DELETE PARTITION

The name of the facility in which the partition resides can be specified with the /FACILITY command line qualifier, or as a colon-separated prefix to the partition name (for example Facility1:Partition1). Detailed descriptions of the command syntax are given in the Command Line Reference section of this manual, and are summarized in the following discussions. Examples in the following sections use a partition name of Partition1 in the facility name of Facility1.
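The colon-separated form can be split as shown in this sketch. The function split_partition_spec() is hypothetical, not part of RTR; it splits at the first colon and, when no colon is present, returns an empty facility name on the assumption that the facility is then supplied with /FACILITY.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split "Facility:Partition" at the first colon into
   the caller's buffers. With no colon, facility is set to "".
   Returns 0 on success, -1 if either part does not fit its buffer. */
int split_partition_spec(const char *spec, char *facility, size_t fac_size,
                         char *partition, size_t part_size)
{
    assert(spec && facility && partition && fac_size > 0 && part_size > 0);
    const char *colon = strchr(spec, ':');
    const char *pname = colon ? colon + 1 : spec;
    size_t flen = colon ? (size_t)(colon - spec) : 0;
    size_t plen = strlen(pname);

    if (flen >= fac_size || plen >= part_size)
        return -1;
    memcpy(facility, spec, flen);
    facility[flen] = '\0';
    memcpy(partition, pname, plen + 1);  /* copies the terminating NUL */
    return 0;
}
```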

3.5.2 Programmed Partition Management

Partition commands are programmed using rtr_set_info(). Usage of the arguments is as follows:

pchannel supplies the address of a rtr_channel_t to receive the channel opened in the event of a successful call.
Flags must be RTR_NO_FLAGS.
Verb must be the value verb_set (from the enumeration rtr_verb_t).
Object must be rtr_partition_object.

select_qualifiers should identify the facility and partition, by name:

rtr_qualifier_value_t select_qualifiers[ 3 ];

/* Identify the target facility and partition by name. */
select_qualifiers[ 0 ].qv_qualifier = rtr_facility_name;
select_qualifiers[ 0 ].qv_value = "your_facility_name_here";
select_qualifiers[ 1 ].qv_qualifier = rtr_partition_name;
select_qualifiers[ 1 ].qv_value = "your_partition_name_here";
/* Terminate the qualifier list. */
select_qualifiers[ 2 ].qv_qualifier = rtr_qualifiers_end;
select_qualifiers[ 2 ].qv_value = NULL;

The set_qualifier list expresses the required change in partition behavior or characteristic.

The rtr_set_info() call completes asynchronously. If the function call is successful, completion is signaled by the delivery of an RTR message of type rtr_mt_closed on the channel whose identifier is returned through the pchannel argument. The programmer should retrieve this message by using rtr_receive_message(). The data accompanying the message is of type rtr_status_data_t. The completion status of the partition command can be accessed as the status field of the message data.

3.6 Managing Partitions

A set of commands or program calls is used to manage partitions. Information on managing partitions is provided in this section.

3.6.1 Controlling Shadowing

The state of shadowing for a partition can be enabled or disabled. This can be useful in the following circumstances:

Enabling site disaster protection for an application partition for the first time
Aiding recovery following a prolonged outage of a former shadow site

The following restrictions apply:

Shadowing for a partition can be turned off only in the absence of an active secondary site.
The active member must be running in remember mode.
The command will fail if entered on either an active primary or secondary with a message to this effect.
If entered on a standby of either the primary or secondary, the command is accepted but fails in the RTR router. This failure is recorded with a log file entry at the router.

Once shadowing is disabled, the secondary site servers will be unable to start up in shadow mode until shadowing is enabled again. Shadowing for the partition can be turned on by entering the command at the current active member or on any of its standbys.

RTR> SET PARTITION/FACILITY=Facility1/SHADOW Facility1:Partition1

For further information, see the SET PARTITION command in Chapter 6.

To enable shadowing, program the set_qualifier argument of rtr_set_info() as follows:

rtr_qualifier_value_t set_qualifiers[ 2 ];
rtr_partition_state_t newState = rtr_partition_state_shadow;

/* Request the new partition state. */
set_qualifiers[ 0 ].qv_qualifier = rtr_partition_state;
set_qualifiers[ 0 ].qv_value = &newState;
/* Terminate the qualifier list. */
set_qualifiers[ 1 ].qv_qualifier = rtr_qualifiers_end;
set_qualifiers[ 1 ].qv_value = NULL;

To disable shadowing, specify newState as rtr_partition_state_noshadow.

3.6.2 Controlling Transaction Presentation

Transaction presentation is the process of passing transactions to idle server channels for processing. While transaction presentation is active, new transactions are started on the first free server channel for the appropriate partition.

Use the /SUSPEND qualifier to the SET PARTITION command to halt the presentation of new transactions to servers on the backend where the command is entered. The command completes when the processing of all currently active transactions is complete. The optional /TIMEOUT qualifier specifies, in seconds, how long the command waits for completion. If the command times out, presentation of new transactions is suspended, but some transactions are still being processed by servers. The operator must then decide either to reenter the command and wait a further period, or to resume the partition. Note that this command does not affect any transaction timeout value specified by RTR clients, so such transactions may time out if the partition remains suspended.

The /RESUME qualifier restarts presentation of transactions to the server application channels.

The following examples show how to use the qualifiers:

RTR> SET PARTITION/FACILITY=Facility1/SUSPEND/TIMEOUT=5 Facility1:Partition1
RTR>
RTR> SET PARTITION/FACILITY=Facility1/RESUME Facility1:Partition1

For a more complete description, see the SET PARTITION command in Chapter 6.

To suspend transaction presentation on a partition with a timeout of 30 seconds, program the set_qualifier argument of the rtr_set_info() call as follows:

rtr_qualifier_value_t set_qualifiers[ 3 ];
rtr_partition_state_t newState = rtr_partition_state_suspend;
rtr_uns_32_t ulTimeoutSecs = 30;

/* Request suspension, waiting at most ulTimeoutSecs for completion. */
set_qualifiers[ 0 ].qv_qualifier = rtr_partition_state;
set_qualifiers[ 0 ].qv_value = &newState;
set_qualifiers[ 1 ].qv_qualifier = rtr_partition_cmd_timeout_secs;
set_qualifiers[ 1 ].qv_value = &ulTimeoutSecs;
/* Terminate the qualifier list. */
set_qualifiers[ 2 ].qv_qualifier = rtr_qualifiers_end;
set_qualifiers[ 2 ].qv_value = NULL;

Note that the timeout is an optional element. To resume transaction presentation, specify newState as rtr_partition_state_resume.

3.6.3 Controlling Recovery

The purpose of RTR automated recovery is to ensure the best possible consistency of application databases across a distributed computing environment. To achieve this, RTR relies in part on information stored in the journals of the participating systems. Should one or more of these systems be unavailable at recovery time, automated recovery may stall or fail awaiting availability of these systems and their journals. This is good from the point of view of data consistency, but bad when viewed from an application availability perspective.

If a partition enters a wait state or fails, but has neither a local nor a remote journal, an operator can instruct RTR to skip the current step in the recovery process with the /IGNORE_RECOVERY qualifier. Because this command bypasses parts of the recovery cycle, use it with caution, and only in cases where availability is valued over the consistency of application databases.

The recovery cycle can also be manually restarted with the /RESTART_RECOVERY qualifier. This may be useful if the operator previously aborted automated recovery. Since this command can result in recovery of transactions from previously inaccessible journals, do not use this if your applications are sensitive to the order in which transactions are processed by the servers.

The following example shows how to use the qualifiers:

RTR> SET PARTITION/FACILITY=Facility1/IGNORE_RECOVERY Facility1:Partition1
RTR>
RTR> SET PARTITION/FACILITY=Facility1/RESTART_RECOVERY Facility1:Partition1

A complete description of the SET PARTITION command qualifiers can be found in Chapter 6.

To terminate the current recovery state, program the set_qualifier argument of rtr_set_info() as follows:

rtr_qualifier_value_t set_qualifiers[ 2 ];
rtr_partition_state_t newState = rtr_partition_state_exitwait;

/* Request exit from the current recovery wait state. */
set_qualifiers[ 0 ].qv_qualifier = rtr_partition_state;
set_qualifiers[ 0 ].qv_value = &newState;
/* Terminate the qualifier list. */
set_qualifiers[ 1 ].qv_qualifier = rtr_qualifiers_end;
set_qualifiers[ 1 ].qv_value = NULL;

To restart recovery, specify newState as rtr_partition_state_recover.
