hp Reliable Transaction Router
System Manager's Manual



2.17.1 Recognized Clusters

OpenVMS and Tru64 UNIX clusters use mechanisms such as a lock manager to deal with file sharing and sustained availability. The cluster configuration includes dual-ported disks to provide access from multiple CPUs.

2.17.1.1 OpenVMS Clusters and Tru64 TruClusters

In Figure 2-6, N1 and N2 are nodes in an OpenVMS cluster. When the active process P1A fails, the standby process P1S takes over. When a standby takeover occurs, the standby server undergoes a recovery process in which it attempts to recover any uncertain transactions that the active server was processing when the failure occurred.

If node N1 failed, then RTR on node N2 opens the failed node's RTR journal and recovers any uncertain transactions from it, thereby ensuring transaction consistency. If only the RTR server process failed, the failed node (N1) still has its journal open, so RTR on N2 does not try to open the journal directly. Instead, it asks the remote RTR system (N1) to recover any uncertain transactions. This behavior imposes certain requirements on the accessibility of the journal.

2.17.1.2 Journal Location

Since the node that takes over needs to open the journal of the failed node, this journal must be placed on the cluster file system. If the journal is not on the cluster file system, the standby recovery process will continue to scan the file systems for the journal and the partition will never come out of recovery. As long as RTR is unable to access the required journal and the system operator does not enter an overriding system management command, the partition state remains in lcl_rec_fail.
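
One way to observe this condition is to display the partition states from the RTR command line on the recovering backend; the affected partition continues to report the lcl_rec_fail state until the journal becomes accessible or an overriding command is entered. For example:


RTR> SHOW PARTITION 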

2.17.1.3 Journal Locking

RTR uses the distributed lock manager (DLM) to coordinate access to the journal file. Normally each node locks and opens its own journal file. During recovery, some other node may receive the lock and open the journal. However, when the owning node is restored, RTR will request release of the journal. In this case, the remote node will release the lock on this journal, and the owner node can open its journal. If the node loses cluster quorum, then RTR releases locks on this journal and lets another node take over.

2.17.1.4 Cluster Communications

When setting up network and cluster communications intended for RTR standby operations in an OpenVMS or Tru64 cluster, avoid the situation where RTR loses quorum on a node while the OpenVMS or Tru64 cluster retains quorum. This can happen if there is one interface for cluster traffic and a completely separate interface for network traffic (IP, DECnet). In this case, if the network interface breaks, RTR views the node as unreachable and therefore inquorate.

However, since cluster communication is still intact, the operating system does not lose cluster quorum. Since RTR has lost quorum, another node will try to take over, but since the operating system cluster has not lost quorum, the lock on the journal will not be released and recovery will not complete. The key point is to avoid situations where a backend node can lose network communication to its RTR routers yet remain a viable member of its cluster.

2.17.2 Windows Clusters

Windows clusters, unlike OpenVMS and Tru64 clusters, are not shared-all clusters. Windows clusters use the concept of host-based clustering; that is, one node physically mounts the shared disks and makes them available as a network share to all other nodes in the cluster. If the host node fails, one of the other nodes rehosts the disks. This rehosting is handled by the Windows clustering software. Only two-node Windows cluster configurations are supported for RTR. In terms of Windows clusters, RTR is an application and the RTR journals are the database resource that fails over between the Windows cluster servers. (A good reference for Windows clustering information is Joseph M. Lamb's Windows 2000 Clustering and Load Balancing Handbook, available from Prentice-Hall.)

2.17.2.1 Journal Location

The RTR journal for both Windows NT servers must be located on the same disk on the SCSI bus that is shared between the two NT cluster servers. The RTR registry entry for the journal must be set to the same value on both server nodes. Furthermore, the registry entry should specify the journal disk using the path qualified by the cluster name. For example, if the cluster name is ALPHACLUSTER, and the journal disk has the cluster share name DISK1, then the RTR journal registry entry should be entered as


\\ALPHACLUSTER\DISK1 

which can be modified using the Registry Editor. The registry key for the journal is found under


\HKEY_LOCAL_MACHINE\SOFTWARE\Compaq Computer Corporation\Reliable Transaction Router\Journal 

There is no default and the value must be in the given format. If the journal is not located on a shared disk in a Windows cluster configuration, then RTR behaves as a standalone RTR node and no use is made of cluster functionality.

2.17.2.2 Facility Role Definition

The computers (nodes) participating in RTR facilities that use the standby features must be configured with both a backend role and a router role.
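
For example, a facility set up for standby operation between two nodes might be created with both nodes listed in the backend and router roles. The node names used below (FE1, BE1, BE2) are placeholders for this sketch:


RTR> CREATE FACILITY Facility1/FRONTEND=FE1/ROUTER=(BE1,BE2)/BACKEND=(BE1,BE2) 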

2.17.2.3 RTR Home Directory

In a Windows cluster configuration, the RTR home directory must not be located on a shared SCSI disk. RTR creates lock files in the RTR home directory and the journal directory during normal operation. These are of the form N*.LCK or N*.BLK, and C*.LCK or C*.BLK. These files may be left in the directories after RTR has been stopped; they are reused when RTR is started again, so there is no need to purge them at system boot time.

2.17.2.4 Cluster Failover

The cluster failover group containing the disk share on which the RTR journal files are located must not have failback policy enabled. That is, if the failover group fails over to the secondary cluster node due to a primary server outage, the group must not fail back to the primary node once the primary node is available again. In addition, while RTR facilities are defined in a cluster configuration, the failover group with the journal device must not be manually failed over to the other cluster server by the cluster administrator. Failover should occur only at the discretion of the cluster failover manager software.

2.17.3 Unrecognized Clusters

Unrecognized or unsupported clusters behave differently from recognized or supported clusters.

The default behavior in unrecognized cluster systems is to treat them as non-clustered. However, RTR standby failover will still work. RTR will fail over to the standby server process if the active server process fails. This standby takeover also performs recovery. If it is only the active server process that failed, then RTR can still recover any uncertain transactions through the remote RTR process. If, however, the node itself becomes unavailable (from, for example, an RTR crash, a node crash or a network crash) then the recovery process performs a journal scan to locate the journal of the failed node.

But unlike the case of a recognized cluster, RTR does not wait for the journal to become available. Instead, it changes to the active state and continues to process transactions. Any incomplete transactions in the failed node's journal will remain there; these transactions are not lost. They are eventually recovered when the failed node becomes active again, although their sequencing will be lost.

2.17.4 Enhancing Recovery on Sun Systems

RTR supports the use of external scripts to complement RTR standby failover in unclustered configurations. This behavior is enabled with the environment variable RTR_STANDBY_WITHOUT_CLUSTER. When this environment variable is set, it modifies the behavior of RTR standby failover as described in Failover for Sun. Note that this feature is currently only available on Solaris platforms.

Failover for Sun

When the active node or its RTR process goes down, the standby node begins to fail over. As part of its failover, it scans the available file systems for the RTR journal of the previously active node that failed. This scanning continues until the journal is found. External scripts can then be run to make the journal available using volume management, rehosting disks, or other methods; however, NFS mounts and network shares are not supported. Once the journal is available, the currently active node can open and lock the remote journal.

Failback

Since the current active node then holds the remote journal locked, the standby node will not have the journal available when it is restarted. When the facility is created on the standby node, the facility creation event generated on the active node closes the remote journal. In addition, an external user-written script called freeremotedisk is invoked. User-defined commands can be put in this script to cause migration of the disk back to its original owner. Once the journal is available in the standby node's file system, it is automatically opened. The user-defined script freeremotedisk should be located in /opt/rtr/RTR400/bin. Output from the execution of this script is sent to /rtr/freeremotedisk.LOG. Execution of this script is also logged to the RTR log file.

2.17.4.1 Restrictions

Whenever the RTR_STANDBY_WITHOUT_CLUSTER variable is set, it is also recommended that RTR_JAM_FAILOVER_WAIT_SECS be set to a suitable value such as 20 seconds. This is the interval at which RTR polls for the remote journal during failover. By default, it is set to zero, which can lead to high CPU usage. This feature should be restricted to use in two-node clusters with each node assigned both a backend and a router role. Before changing these environment variables, make sure that all RTR processes have been shut down.
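
The following sketch shows one way these variables might be set on a Solaris system before RTR is started (Bourne shell syntax; the wait value of 20 seconds follows the recommendation above):


$ RTR_STANDBY_WITHOUT_CLUSTER=1
$ RTR_JAM_FAILOVER_WAIT_SECS=20
$ export RTR_STANDBY_WITHOUT_CLUSTER RTR_JAM_FAILOVER_WAIT_SECS
$ rtr start rtr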


Chapter 3
Partition Management

3.1 Overview

This section describes the concepts and operations of RTR partitions.

3.1.1 What is a Partition?

Partitions are subdivisions of a routing key range of values. They are used with a partitioned data model and RTR data-content routing. Partitions exist for each distinct range of values in the routing key for which a server is available to process transactions. RTR provides for failure tolerance by allowing system operators to start separate instances of partitions in a distributed network and by automatically managing the state and flow of transactions to the partition instances.

Partition instances support the following relationships:

The system operator can issue commands to control certain partition characteristics, and to set preferences concerning partition behavior.

3.2 Partition Naming

A prerequisite for partition management is the ability to identify a partition in the system that is to be the subject of management commands. For this purpose, partitions have names, assigned either by default, by the programmer, or by the system manager.

3.2.1 Name Format and Scope

A valid partition name can contain no more than 63 characters. It can combine alphanumeric characters (abc123), the underscore (_), and the dollar sign ($). Partition names must be unique within a facility and should be referenced together with the facility name on the command line when using partition commands. Partition names exist only on the backend where the partition resides; you will not see the partition names at the RTR routers.

3.2.2 Default Partition Names

Partitions receive automatically generated default names, of the form RTR$DEFAULT_PARTITION, unless a name is supplied.

3.2.3 Programmer-Supplied Names

The application programmer can supply a name when opening a server channel with the rtr_open_channel() call. The pkeyseg argument specifies an additional item of type rtr_keyseg_t, assigning rtr_keyseg_partition to the ks_type field and the address of the partition name string to the ks_lo_bound field.

Using this model, the key segments and value ranges served are still specified by the server when the channel is opened.

3.2.4 System-Manager Supplied Partition Names

The system manager can supply partition names using the create partition system management command, or by using rtr_open_channel() flag arguments. The system manager can set partition characteristics with this command and applications can open channels to the partition by name. See Section 3.4 for an example of passing a partition name with rtr_open_channel().

3.3 Life Cycle of a Partition

This section describes the life cycle of partitions, including the ways they can be created and their persistence.

3.3.1 Implicit Partition Creation

Partitions are created implicitly when an application program calls rtr_open_channel() to create a server channel, specifying the key segments and value ranges for the segments with the pkeyseg argument. Other partition attributes are established with the flags argument. Prior to RTR V3.2, this was the only way partitions could be created. Partitions created in this way are automatically deleted when the last server channel to the partition is closed.
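
As an illustration, the following sketch opens a server channel that serves a single one-byte unsigned key segment for the values 0 through 99; the partition for that range is created implicitly. The facility name and key-segment values are assumptions made for this example, and the constants follow the usual rtr.h conventions:


    #include "rtr.h" 
 
    rtr_status_t open_server_channel(rtr_channel_t *pchannel) 
    { 
        /* One key segment: a one-byte unsigned value at offset 0 of the 
         * application message, served for the range 0 through 99. */ 
        static rtr_uns_8_t  low  = 0; 
        static rtr_uns_8_t  high = 99; 
        rtr_keyseg_t        key_segment; 
 
        key_segment.ks_type     = rtr_keyseg_unsigned; 
        key_segment.ks_length   = 1; 
        key_segment.ks_offset   = 0; 
        key_segment.ks_lo_bound = &low; 
        key_segment.ks_hi_bound = &high; 
 
        /* Opening the server channel implicitly creates (or joins) the 
         * partition covering this key range. */ 
        return rtr_open_channel(pchannel, 
                                RTR_F_OPE_SERVER, 
                                "Facility1",        /* facility name (assumed) */ 
                                RTR_NO_RCPNAM, 
                                RTR_NO_PEVTNUM, 
                                RTR_NO_ACCESS, 
                                1,                  /* one key segment */ 
                                &key_segment); 
    } 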

3.3.2 Explicit Partition Creation

Partitions can also be created by the system operator, using system management commands, before the server application programs start up. This gives the operator more control over partition characteristics. Partitions created in this way remain in the system until they are either explicitly deleted by the operator or RTR is stopped.

3.3.3 Persistence of Partition Definitions

RTR stores partition definitions in the journal and records for each transaction the partition in which it was processed. This is convenient when viewing or editing the contents of the journal (using the SET TRANSACTION command), where the partition name can be used to select a subset of the transactions in the journal. RTR will not permit a change in the partition name or definition as long as transactions remain in the journal that were processed under the current name or definition for the partition. If transactions remain in the journal and you need to change the partition name or definition, you can take one of the following actions:

3.4 Binding Server Channels to Named Partitions

For a server application to be able to open a channel to an explicitly created partition, the application passes the name of the partition through the pkeyseg argument of the rtr_open_channel() call. It is not necessary to pass key segment descriptors, but if the application does, they must be compatible with the existing partition definition. You may pass partition characteristics through the flags argument, but these will be superseded by those of the existing partition.


    RTR> create partition/KEY1=(type. . .) par_one 
     . . . 
    rtr_keyseg_t    partition_name; 
 
    partition_name.ks_type = rtr_keyseg_partition; 
    partition_name.ks_lo_bound = "par_one"; 
 
    status = rtr_open_channel(..., RTR_F_OPE_SERVER,..., 1, &partition_name); 

In summary, to fully decouple server applications from the definition of the partitions to be processed, write applications that open server channels where only the required partition name is passed. Leave the management of the partition characteristics to the system managers and operators.

3.5 Entering Partition Commands

Partitions can be managed by issuing partition commands directed at the required partitions after they are created. Partition commands can be entered in one of two ways: at the RTR command line (see Section 3.5.1), or from an application program using the rtr_set_info() call (see Section 3.5.2).

Enter partition commands on the backend where the partition is located. Note that commands that affect a partition state take effect only once the first server joins the partition. Errors encountered at that time appear as log file entries. Using partition commands to change the state of the system causes a log file entry.

3.5.1 Command Line Usage

Partition management in the RTR command language is implemented with the CREATE PARTITION, SET PARTITION, SHOW PARTITION, and DELETE PARTITION commands.

The name of the facility in which the partition resides can be specified with the /FACILITY command line qualifier, or as a colon-separated prefix to the partition name (for example Facility1:Partition1). Detailed descriptions of the command syntax are given in the Command Line Reference section of this manual, and are summarized in the following discussions. Examples in the following sections use a partition name of Partition1 in the facility name of Facility1.

3.5.2 Programmed Partition Management

Partition commands are programmed using rtr_set_info(). Usage of the arguments is as follows:

The rtr_set_info() call completes asynchronously. If the function call is successful, completion is signaled by the delivery of an RTR message of type rtr_mt_closed on the channel whose identifier is returned through the pchannel argument. The programmer should retrieve this message by using rtr_receive_message(). The data accompanying the message is of type rtr_status_data_t. The completion status of the partition command can be accessed as the status field of the message data.
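
A minimal sketch of retrieving that completion message follows; it assumes that channel holds the identifier returned through pchannel, and that the constant and field names follow the usual RTR C API conventions:


    rtr_msgsb_t        msgsb;          /* message status block */ 
    rtr_status_data_t  status_data;    /* data carried by rtr_mt_closed */ 
    rtr_status_t       sts; 
 
    sts = rtr_receive_message(&channel,          /* channel from rtr_set_info() */ 
                              RTR_NO_FLAGS, 
                              RTR_NO_TIMOUTMS,   /* wait indefinitely */ 
                              &status_data, 
                              sizeof(status_data), 
                              &msgsb); 
 
    if (sts == RTR_STS_OK && msgsb.msgtype == rtr_mt_closed) 
    { 
        /* status_data.status holds the completion status of the partition command. */ 
    } 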

3.6 Managing Partitions

A set of commands or program calls is used to manage partitions. Information on managing partitions is provided in this section.

3.6.1 Controlling Shadowing

The state of shadowing for a partition can be enabled or disabled. This can be useful in the following circumstances:

The following restrictions apply:

Once shadowing is disabled, the secondary site servers will be unable to start up in shadow mode until shadowing is enabled again. Shadowing for the partition can be turned on by entering the following command on the currently active backend member or on any of its standbys:


RTR> SET PARTITION/SHADOW Facility1:Partition1 

For further information, see the SET PARTITION command in Chapter 8.

To enable shadowing, program the set_qualifier argument of rtr_set_info() as follows:


    rtr_qualifier_value_t   set_qualifiers[ 2 ]; 
    rtr_partition_state_t  newState = rtr_partition_state_shadow; 
 
    set_qualifiers[ 0 ].qv_qualifier = rtr_partition_state; 
    set_qualifiers[ 0 ].qv_value  = &newState; 
    set_qualifiers[ 1 ].qv_qualifier = rtr_qualifiers_end; 
    set_qualifiers[ 1 ].qv_value  = NULL; 

To disable shadowing, specify newState as rtr_partition_state_noshadow.

