Updated: 11 December 1998
OpenVMS Cluster Systems
The connection manager ensures that computers in an OpenVMS Cluster system communicate with one another to enforce the rules of cluster membership.
Computers in an OpenVMS Cluster system share various data and system
resources, such as access to disks and files. To achieve the
coordination that is necessary to maintain resource integrity, the
computers must maintain a clear record of cluster membership.
2.3.1 Connection Manager
The connection manager creates an OpenVMS Cluster when the first computer is booted and reconfigures the cluster when computers join or leave it during cluster state transitions. The overall responsibilities of the connection manager are to:
A primary purpose of the connection manager is to prevent cluster partitioning, a condition in which nodes in an existing OpenVMS Cluster configuration divide into two or more independent clusters.
Cluster partitioning can result in data file corruption because the
distributed lock manager cannot coordinate access to shared resources
for multiple OpenVMS Cluster systems. The connection manager prevents
cluster partitioning using a quorum algorithm.
2.3.3 Quorum Algorithm
The quorum algorithm is a mathematical method for determining if a
majority of OpenVMS Cluster members exist so resources can be shared
across an OpenVMS Cluster system. Quorum is a dynamic
value calculated by the connection manager to prevent cluster
partitioning. The connection manager allows processing to occur only if
a majority of the OpenVMS Cluster members are functioning.
2.3.4 System Parameters
Two system parameters, VOTES and EXPECTED_VOTES, are key to the computations performed by the quorum algorithm. The following table describes these parameters.
Parameter | Description |
---|---|
VOTES |
Specifies a fixed number of votes that a computer contributes toward
quorum. The system manager can set the VOTES parameter on each
computer or allow the operating system to set it to the following
default values:
Each Alpha or VAX computer with a nonzero value for the VOTES system parameter is considered a voting member. |
EXPECTED_VOTES | Specifies the sum of all VOTES held by OpenVMS Cluster members. The initial value is used to derive an estimate of the correct quorum value for the cluster. The system manager must set this parameter on each active Alpha or VAX computer, including satellites, in the cluster. |
The quorum algorithm operates as follows:
Step | Action |
---|---|
1 | When nodes in the OpenVMS Cluster boot, the connection manager uses the largest value for EXPECTED_VOTES of all systems present to derive an estimated quorum value according to the following formula: Estimated quorum = (EXPECTED_VOTES + 2)/2, rounded down. |
2 | During a state transition, the connection manager dynamically computes the cluster quorum value to be the maximum of the following: |
3 | The connection manager compares the cluster votes value to the cluster quorum value and determines what action to take based on the following conditions: |
Note: When a node leaves the OpenVMS Cluster system,
the connection manager does not decrease the cluster quorum value. In
fact, the connection manager never decreases the cluster quorum value;
it only increases it. However, system managers can decrease the value
according to the instructions in Section 8.6.2.
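The formula in step 1 and the rule in the note can be sketched in a few lines of Python. This is an illustrative model only, not the connection manager's implementation; the function names are hypothetical, and the sketch covers only the estimated quorum and the never-decrease rule, not the full set of values compared in step 2.

```python
from typing import Sequence


def estimated_quorum(expected_votes: int) -> int:
    """Estimated quorum = (EXPECTED_VOTES + 2)/2, rounded down."""
    return (expected_votes + 2) // 2


def initial_quorum(expected_votes_per_node: Sequence[int]) -> int:
    """Step 1: use the largest EXPECTED_VOTES of all systems present."""
    return estimated_quorum(max(expected_votes_per_node))


def raise_only(current_quorum: int, candidate_quorum: int) -> int:
    """Model of the note: the quorum value is increased, never decreased."""
    return max(current_quorum, candidate_quorum)
```

For instance, `initial_quorum([1, 3, 3])` uses the largest value, 3, and `raise_only(2, 1)` keeps the quorum at 2 even though the candidate value is lower.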
2.3.6 Example
Consider a cluster consisting of three computers, each computer having
its VOTES parameter set to 1 and its EXPECTED_VOTES parameter set to 3.
The connection manager dynamically computes the cluster quorum value to
be 2 (that is, (3 + 2)/2, rounded down). In this example, any two of the three
computers constitute a quorum and can run in the absence of the third
computer. No single computer can constitute a quorum by itself.
Therefore, there is no way the three OpenVMS Cluster computers can be
partitioned and run as two independent clusters.
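The reasoning in this example can be checked by enumerating every way the three computers could split into two groups. The sketch below uses hypothetical node names and treats a group as having quorum when its votes are at least the quorum value, which matches the statement that any two of the three computers constitute a quorum.

```python
from itertools import combinations

# Hypothetical node names; each computer has VOTES = 1.
votes = {"NODE_A": 1, "NODE_B": 1, "NODE_C": 1}
expected_votes = 3
quorum = (expected_votes + 2) // 2  # rounded down


def group_has_quorum(group):
    """A group has quorum when its combined votes reach the quorum value."""
    return sum(votes[name] for name in group) >= quorum


def quorate_splits():
    """Return every two-way split in which BOTH sides would hold quorum."""
    nodes = sorted(votes)
    found = []
    for size in range(1, len(nodes)):
        for side in combinations(nodes, size):
            other = tuple(n for n in nodes if n not in side)
            if group_has_quorum(side) and group_has_quorum(other):
                found.append((side, other))
    return found
```

An empty result from `quorate_splits()` means that no split leaves two groups that can both continue processing.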
2.3.7 Quorum Disk
A cluster system manager can designate a disk as a quorum disk. The quorum disk acts as a virtual cluster member whose purpose is to add one vote to the total cluster votes. By establishing a quorum disk, you can increase the availability of a two-node cluster; such configurations can maintain quorum in the event of failure of either the quorum disk or one node, and continue operating.
Note: Setting up a quorum disk is recommended for OpenVMS Cluster configurations with only two nodes. A quorum disk is not necessary for configurations with more than two nodes.
For example, assume an OpenVMS Cluster configuration with many satellites (that have no votes) and two nonsatellite systems (each having one vote) that downline load the satellites. Quorum is calculated as follows:
(EXPECTED_VOTES + 2)/2 = (2 + 2)/2 = 2
Because there is no quorum disk, if either of the nonsatellite systems departs from the cluster, only one vote remains and cluster quorum is lost. Activity will be blocked throughout the cluster until quorum is restored.
However, if the configuration includes a quorum disk (adding one vote to the total cluster votes), and the EXPECTED_VOTES parameter is set to 3 on each node, then quorum will still be 2 even if one of the nodes leaves the cluster. Quorum is calculated as follows:
(EXPECTED_VOTES + 2)/2 = (3 + 2)/2 = 2 (rounded down)
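The two calculations above can be compared with a short sketch. It assumes that EXPECTED_VOTES equals the total votes of all voters, that quorum is not lowered after a voter is lost, and that a cluster keeps running while its remaining votes are at least the quorum value. The voter names are hypothetical.

```python
def quorum(expected_votes):
    """(EXPECTED_VOTES + 2)/2, rounded down."""
    return (expected_votes + 2) // 2


def survives_any_single_loss(voters):
    """True if quorum is kept after any one voter (node or disk) is lost."""
    total = sum(voters.values())
    needed = quorum(total)  # assumes EXPECTED_VOTES is set to the total
    return all(total - v >= needed for v in voters.values())


# Two voting nonsatellite systems, without and with a quorum disk.
without_disk = {"NODE_A": 1, "NODE_B": 1}
with_disk = {"NODE_A": 1, "NODE_B": 1, "QUORUM_DISK": 1}
```

Under these assumptions, `survives_any_single_loss(without_disk)` is false and `survives_any_single_loss(with_disk)` is true. Satellites are omitted because they hold no votes.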
Rules: Each OpenVMS Cluster system can include only one quorum disk. At least one computer must have a direct (not served) connection to the quorum disk:
Reference: For more information about enabling a
quorum disk, see Section 8.2.4. Section 8.3.2 describes removing a quorum
disk.
2.3.8 Quorum Disk Watcher
To enable a computer as a quorum disk watcher, use one of the following methods:
Method | Perform These Steps |
---|---|
Run the CLUSTER_CONFIG.COM procedure
(described in Chapter 8) |
Invoke the procedure and:
The procedure uses the information you provide to update the values of the DISK_QUORUM and QDSKVOTES system parameters. |
Respond YES when the OpenVMS installation procedure asks whether the
cluster will contain a quorum disk
(described in Chapter 4) |
During the installation procedure:
The procedure uses the information you provide to update the values of the DISK_QUORUM and QDSKVOTES system parameters. |
Edit the
MODPARAMS or AGEN$ files (described in Chapter 8) |
Edit the following parameters:
|
Hint: If only one quorum disk watcher has direct
access to the quorum disk, then remove the disk and give its votes to
the node.
2.3.9 Rules for Specifying Quorum
For the quorum disk's votes to be counted in the total cluster votes, the following conditions must be met:
Hint: By increasing the quorum disk's votes to one
less than the total votes from both systems (and by increasing the
value of the EXPECTED_VOTES system parameter by the same amount), you
can boot and run the cluster with only one node.
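The arithmetic behind this hint can be sketched as follows. The function name and the returned field names are hypothetical, and the sketch assumes that the quorum disk's votes are counted and that EXPECTED_VOTES is the sum of both nodes' votes plus the quorum disk's votes.

```python
def single_node_plan(votes_a, votes_b):
    """Apply the hint to a two-node cluster and check each node on its own."""
    qdsk_votes = votes_a + votes_b - 1        # one less than both systems' total
    expected_votes = votes_a + votes_b + qdsk_votes
    needed = (expected_votes + 2) // 2        # quorum, rounded down
    return {
        "qdsk_votes": qdsk_votes,
        "expected_votes": expected_votes,
        "quorum": needed,
        # Votes available when one node runs with only the quorum disk.
        "a_alone_ok": votes_a + qdsk_votes >= needed,
        "b_alone_ok": votes_b + qdsk_votes >= needed,
    }
```

With two votes per node, for example, the quorum disk would hold three votes, EXPECTED_VOTES would be 7, and quorum would be 4, which either node can reach together with the quorum disk.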
2.4 State Transitions
OpenVMS Cluster state transitions occur when a computer joins or leaves an OpenVMS Cluster system and when the cluster recognizes a quorum disk state change. The connection manager controls these events to ensure the preservation of data integrity throughout the cluster.
A state transition's duration and effect on users (applications) are
determined by the reason for the transition, the configuration, and the
applications in use.
2.4.1 Adding a Member
Every transition goes through one or more phases, depending on whether its cause is the addition of a new OpenVMS Cluster member or the failure of a current member.
Table 2-2 describes the phases of a transition caused by the addition of a new member.
Phase | Description |
---|---|
New member detection |
Early in its boot sequence, a computer seeking membership in an OpenVMS
Cluster system sends messages to current members asking to join the
cluster. The first cluster member that receives the membership request
acts as the new computer's advocate and proposes reconfiguring the
cluster to include the computer in the cluster. While the new computer
is booting, no applications are affected.
Note: The connection manager will not allow a computer to join the OpenVMS Cluster system if the node's value for EXPECTED_VOTES would raise quorum above the calculated cluster votes and thereby cause the OpenVMS Cluster to suspend activity. |
Reconfiguration | During a configuration change due to a computer being added to an OpenVMS Cluster, all current OpenVMS Cluster members must establish communications with the new computer. Once communications are established, the new computer is admitted to the cluster. In some cases, the lock database is rebuilt. |
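The check described in the note on new member detection might be modeled as shown below. This is one reading of the note, not the connection manager's actual logic: the function and parameter names are hypothetical, and the sketch assumes that the joining computer's own votes are included in the calculated votes.

```python
def would_block_join(current_quorum, current_votes,
                     joining_votes, joining_expected_votes):
    """True if admitting the computer would push quorum above cluster votes."""
    # Quorum implied by the joining node's EXPECTED_VOTES, rounded down.
    implied_quorum = (joining_expected_votes + 2) // 2
    # Quorum is only ever raised, never lowered.
    new_quorum = max(current_quorum, implied_quorum)
    # Assumption: the joining node's votes count toward the total.
    new_votes = current_votes + joining_votes
    return new_quorum > new_votes
```

In a cluster with 3 votes and a quorum of 2, a one-vote computer with EXPECTED_VOTES set to 9 would be refused under this model, because the implied quorum of 5 exceeds the 4 votes that would be present.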
Table 2-3 describes the phases of a transition caused by the failure of a current OpenVMS Cluster member.
Cause | Description |
---|---|
Failure detection | The duration of this phase depends on the cause of the failure and on how the failure is detected. During normal cluster operation, messages sent from one computer to another are acknowledged when received. |
Repair attempt | If the virtual circuit to an OpenVMS Cluster member is broken, attempts are made to repair the path. Repair attempts continue for an interval specified by the PAPOLLINTERVAL system parameter. (System managers can adjust the value of this parameter to suit local conditions.) Thereafter, the path is considered irrevocably broken, and steps must be taken to reconfigure the OpenVMS Cluster system so that all computers can once again communicate with each other and so that computers that cannot communicate are removed from the OpenVMS Cluster. |
Reconfiguration | If a cluster member is shut down or fails, the cluster must be reconfigured. One of the remaining computers acts as coordinator and exchanges messages with all other cluster members to determine an optimal cluster configuration with the most members and the most votes. This phase, during which all user (application) activity is blocked, usually lasts less than 3 seconds, although the actual time depends on the configuration. |
OpenVMS Cluster system recovery | Recovery includes the following stages, some of which can take place in parallel: |
Application recovery | When you assess the effect of a state transition on application users, consider that the application recovery phase includes activities such as replaying a journal file, cleaning up recovery units, and users logging in again. |
OpenVMS Cluster systems based on LAN use a cluster group number and a
cluster password to allow multiple independent OpenVMS Cluster systems
to coexist on the same extended LAN and to prevent accidental access to
a cluster by unauthorized computers.
2.5.1 Cluster Group Number
The cluster group number uniquely identifies each OpenVMS Cluster system on a LAN. This number must be from 1 to 4095 or from 61440 to 65535.
Rule: If you plan to have more than one OpenVMS Cluster system on a LAN, you must coordinate the assignment of cluster group numbers among system managers.
Note: OpenVMS Cluster systems operating on CI and DSSI
do not use cluster group numbers and passwords.
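The permitted ranges for a cluster group number are easy to check before a value is assigned. The helper below is a hypothetical convenience for system managers who coordinate group numbers, not part of any OpenVMS utility.

```python
def is_valid_group_number(number):
    """True if the number is from 1 to 4095 or from 61440 to 65535."""
    return 1 <= number <= 4095 or 61440 <= number <= 65535
```

Values such as 0, 4096, and 61439 fall outside both ranges and are rejected.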
2.5.2 Cluster Password
The cluster password prevents an unauthorized computer that is using the cluster group number from joining the cluster. The password must be from 1 to 31 alphanumeric characters in length, including dollar signs ($) and underscores (_).
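The password rule can be expressed as a simple pattern check. This sketch assumes that "alphanumeric" means the ASCII letters and digits; the function name is hypothetical, and the check covers only length and character set, not how the password is stored or entered.

```python
import re

# 1 to 31 characters: ASCII letters, digits, dollar signs, and underscores.
_PASSWORD_PATTERN = re.compile(r"[A-Za-z0-9$_]{1,31}")


def is_valid_cluster_password(password):
    """True if the whole string satisfies the length and character rules."""
    return _PASSWORD_PATTERN.fullmatch(password) is not None
```

An empty string, a 32-character string, or a string containing a hyphen would all be rejected by this check.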
Copyright © Compaq Computer Corporation 1998. All rights reserved.