[OpenVMS documentation]
[Site home] [Send comments] [Help with this site] [How to order documentation] [OpenVMS site] [Compaq site]
Updated: 11 December 1998

Guidelines for OpenVMS Cluster Configurations


Previous Contents Index

A.6.5 Step 5: Install the OpenVMS Operating System

Refer to the OpenVMS Alpha or VAX upgrade and installation manual for information about installing the OpenVMS operating system. Perform the installation once for each system disk in the OpenVMS Cluster system. In most configurations, there is a single system disk. Therefore, you need to perform this step once, using any system.

During the installation, when you are asked if the system is to be a cluster member, answer Yes. Then, complete the installation according to the guidelines provided in OpenVMS Cluster Systems.

A.6.6 Step 6: Configure Additional Systems

Use the CLUSTER_CONFIG command procedure to configure additional systems. Execute this procedure once for the second host that you have configured on the SCSI bus. (See Section A.7.1 for more information.)

A.7 Supplementary Information

The following sections provide supplementary technical detail and concepts about SCSI OpenVMS Cluster systems.

A.7.1 Running the OpenVMS Cluster Configuration Command Procedure

You execute either the CLUSTER_CONFIG.COM or the CLUSTER_CONFIG_LAN.COM command procedure to set up and configure nodes in your OpenVMS Cluster system. Your choice of command procedure depends on whether you use DECnet or the LANCP utility for booting. CLUSTER_CONFIG.COM uses DECnet; CLUSTER_CONFIG_LAN.COM uses the LANCP utility. (For information about using both procedures, see OpenVMS Cluster Systems.)

Typically, the first computer is set up as an OpenVMS Cluster system during the initial OpenVMS installation procedure (see Section A.6.5). The CLUSTER_CONFIG procedure is then used to configure additional nodes. However, if you originally installed OpenVMS without enabling clustering, the first time you run CLUSTER_CONFIG, the procedure converts the standalone system to a cluster system.

To configure additional nodes in a SCSI cluster, execute CLUSTER_CONFIG.COM for each additional node. Table A-7 describes the steps to configure additional SCSI nodes.

Table A-7 Steps for Installing Additional Nodes
Step Procedure
1 From the first node, run the CLUSTER_CONFIG.COM procedure and select the default option [1] for ADD.
2 Answer Yes when CLUSTER_CONFIG.COM asks whether you want to proceed.
3 Supply the DECnet name and address of the node that you are adding to the existing single-node cluster.
4 Confirm that this will be a node with a shared SCSI interconnect.
5 Answer No when the procedure asks whether this node will be a satellite.
6 Configure the node to be a disk server if it will serve disks to other cluster members.
7 Place the new node's system root on the default device offered.
8 Select a system root for the new node. The first node uses SYS0. Take the default (SYS10 for the first additional node), or choose your own root numbering scheme. You can choose from SYS1 to SYS n, where n is hexadecimal FFFF.
9 Select the default disk allocation class so that the new node in the cluster uses the same ALLOCLASS as the first node.
10 Confirm whether or not there is a quorum disk.
11 Answer the questions about the sizes of the page file and swap file.
12 When CLUSTER_CONFIG.COM completes, boot the new node from the new system root. For example, for SYSFF on disk DKA200, enter the following command:
 BOOT -FL FF,0 DKA200

In the BOOT command, you can use the following flags:

  • -FL indicates boot flags.
  • FF is the new system root.
  • 0 means there are no special boot requirements, such as conversational boot.

You can run the CLUSTER_CONFIG.COM procedure to set up an additional node in a SCSI cluster, as shown in Example A-2.

Example A-2 Adding a Node to a SCSI Cluster

$ @SYS$MANAGER:CLUSTER_CONFIG 
 
           Cluster Configuration Procedure 
 
 
    Use CLUSTER_CONFIG.COM to set up or change an OpenVMS Cluster configuration. 
    To ensure that you have the required privileges, invoke this procedure 
    from the system manager's account. 
 
    Enter ? for help at any prompt. 
 
            1. ADD a node to a cluster. 
            2. REMOVE a node from the cluster. 
            3. CHANGE a cluster member's characteristics. 
            4. CREATE a duplicate system disk for CLU21. 
            5. EXIT from this procedure. 
 
    Enter choice [1]: 
 
    The ADD function adds a new node to a cluster. 
 
    If the node being added is a voting member, EXPECTED_VOTES in 
    every cluster member's MODPARAMS.DAT must be adjusted, and the 
    cluster must be rebooted. 
 
 
    WARNING - If this cluster is running with multiple system disks and 
              if common system files will be used, please, do not 
              proceed unless you have defined appropriate logical 
              names for cluster common files in SYLOGICALS.COM. 
              For instructions, refer to the OpenVMS Cluster Systems 
              manual. 
 
 
              Do you want to continue [N]? y 
 
    If the new node is a satellite, the network databases on CLU21 are 
    updated. The network databases on all other cluster members must be 
    updated. 
 
    For instructions, refer to the OpenVMS Cluster Systems manual. 
 
What is the node's DECnet node name? SATURN 
What is the node's DECnet node address? 7.77 
Is SATURN to be a clustered node with a shared SCSI bus (Y/N)? y 
Will SATURN be a satellite [Y]? N 
Will SATURN be a boot server [Y]? 
 
    This procedure will now ask you for the device name of SATURN's system root. 
    The default device name (DISK$BIG_X5T5:) is the logical volume name of 
    SYS$SYSDEVICE:. 
 
What is the device name for SATURN's system root [DISK$BIG_X5T5:]? 
What is the name of SATURN's system root [SYS10]? SYS2 
    Creating directory tree SYS2 ... 
    System root SYS2 created 
 
    NOTE: 
        All nodes on the same SCSI bus must be members of the same cluster 
        and must all have the same non-zero disk allocation class or each 
        will have a different name for the same disk and data corruption 
        will result. 
 
Enter a value for SATURN's ALLOCLASS parameter [7]: 
Does this cluster contain a quorum disk [N]? 
Updating network database... 
Size of pagefile for SATURN [10000 blocks]? 
   .
   .
   .

A.7.2 Error Reports and OPCOM Messages in Multihost SCSI Environments

Certain common operations, such as booting or shutting down a host on a multihost SCSI bus, can cause other hosts on the SCSI bus to experience errors. In addition, certain errors that are unusual in a single-host SCSI configuration may occur more frequently on a multihost SCSI bus.

These errors are transient errors that OpenVMS detects, reports, and recovers from without losing data or affecting applications that are running. This section describes the conditions that generate these errors and the messages that are displayed on the operator console and entered into the error log.

A.7.2.1 SCSI Bus Resets

When a host connected to a SCSI bus first starts, either by being turned on or by rebooting, it does not know the state of the SCSI bus and the devices on it. The ANSI SCSI standard provides a method called BUS RESET to force the bus and its devices into a known state. A host typically asserts a RESET signal one or more times on each of its SCSI buses when it first starts up and when it shuts down. While this is a normal action on the part of the host asserting RESET, other hosts consider this RESET signal an error because RESET requires that the hosts abort and restart all I/O operations that are in progress.

A host may also reset the bus in the midst of normal operation if it detects a problem that it cannot correct in any other way. These kinds of resets are uncommon, but they occur most frequently when something on the bus is disturbed. For example, an attempt to hot plug a SCSI device while the device is still active (see Section A.7.6) or halting one of the hosts with Ctrl/P can cause a condition that forces one or more hosts to issue a bus reset.

A.7.2.2 SCSI Timeouts

When a host exchanges data with a device on the SCSI bus, there are several different points where the host must wait for the device or the SCSI adapter to react. In an OpenVMS system, the host is allowed to do other work while it is waiting, but a timer is started to make sure that it does not wait too long. If the timer expires without a response from the SCSI device or adapter, this is called a timeout.

There are three kinds of timeouts:

Timeout errors are not inevitable on SCSI OpenVMS Cluster systems. However, they are more frequent on SCSI buses with heavy traffic and those with two initiators. They do not necessarily indicate a hardware or software problem. If they are logged frequently, you should consider ways to reduce the load on the SCSI bus (for example, adding an additional bus).

A.7.2.3 Mount Verify

Mount verify is a condition declared by a host about a device. The host declares this condition in response to a number of possible transient errors, including bus resets and timeouts. When a device is in the mount verify state, the host suspends normal I/O to it until the host can determine that the correct device is there, and that the device is accessible. Mount verify processing then retries outstanding I/Os in a way that insures that the correct data is written or read. Application programs are unaware that a mount verify condition has occurred as long as the mount verify completes.

If the host cannot access the correct device within a certain amount of time, it declares a mount verify timeout, and application programs are notified that the device is unavailable. Manual intervention is required to restore a device to service after the host has declared a mount verify timeout. A mount verify timeout usually means that the error is not transient. The system manager can choose the timeout period for mount verify; the default is one hour.

A.7.2.4 Shadow Volume Processing

Shadow volume processing is a process similar to mount verify, but it is for shadow set members. An error on one member of a shadow set places the set into the volume processing state, which blocks I/O while OpenVMS attempts to regain access to the member. If access is regained before shadow volume processing times out, then the outstanding I/Os are reissued and the shadow set returns to normal operation. If a timeout occurs, then the failed member is removed from the set. The system manager can select one timeout value for the system disk shadow set, and one for application shadow sets. The default value for both timeouts is 20 seconds.

Note

The SCSI disconnect timeout and the default shadow volume processing timeout are the same. If the SCSI bus is heavily utilized so that disconnect timeouts may occur, it may be desirable to increase the value of the shadow volume processing timeout. (A recommended value is 60 seconds.) This may prevent shadow set members from being expelled when they experience disconnect timeout errors.

A.7.2.5 Expected OPCOM Messages in Multihost SCSI Environments

When a bus reset occurs, an OPCOM message is displayed as each mounted disk enters and exits mount verification or shadow volume processing.

When an I/O to a drive experiences a timeout error, an OPCOM message is displayed as that drive enters and exits mount verification or shadow volume processing.

If a quorum disk on the shared SCSI bus experiences either of these errors, then additional OPCOM messages may appear, indicating that the connection to the quorum disk has been lost and regained.

A.7.2.6 Error Log Basics

In the OpenVMS system, the Error Log utility allows device drivers to save information about unusual conditions that they encounter. In the past, most of these unusual conditions have happened as a result of errors such as hardware failures, software failures, or transient conditions (for example, loose cables).

If you type the DCL command SHOW ERROR, the system displays a summary of the errors that have been logged since the last time the system booted. For example:


$ SHOW ERROR
Device                           Error Count 
SALT$PKB0:                               6 
$1$DKB500:                              10 
PEA0:                                    1 
SALT$PKA0:                               9 
$1$DKA0:                                 0 
 

In this case, 6 errors have been logged against host SALT's SCSI port B (PKB0), 10 have been logged against disk $1$DKB500, and so forth.

To see the details of these errors, you can use the command ANALYZE/ERROR/SINCE=dd-mmm-yyyy:hh:mm:ss at the DCL prompt. The output from this command displays a list of error log entries with information similar to the following:


******************************* ENTRY    2337. ******************************* 
 ERROR SEQUENCE 6.                               LOGGED ON:  CPU_TYPE 00000002 
 DATE/TIME 29-MAY-1995 16:31:19.79                            SYS_TYPE 0000000D 
 
<identification information> 
 
       ERROR TYPE            03 
                                       COMMAND TRANSMISSION FAILURE 
       SCSI ID               01 
                                       SCSI ID = 1. 
       SCSI LUN              00 
                                       SCSI LUN = 0. 
       SCSI SUBLUN           00 
                                       SCSI SUBLUN = 0. 
       PORT STATUS     00000E32 
                                       %SYSTEM-E-RETRY, RETRY OPERATION 
 
<additional information> 
 

For this discussion, the key elements are the ERROR TYPE and, in some instances, the PORT STATUS fields. In this example, the error type is 03, COMMAND TRANSMISSION FAILURE, and the port status is 00000E32, SYSTEM-E-RETRY.

A.7.2.7 Error Log Entries in Multihost SCSI Environments

The error log entries listed in this section are likely to be logged in a multihost SCSI configuration, and you usually do not need to be concerned about them. You should, however, examine any error log entries for messages other than those listed in this section.

A.7.3 Restrictions and Known Problems

The OpenVMS Cluster software has the following restrictions when multiple hosts are configured on the same SCSI bus:

OpenVMS Cluster systems also place one restriction on the SCSI quorum disk, whether the disk is located on a single-host SCSI bus or a multihost SCSI bus. The SCSI quorum disk must support tagged command queuing (TCQ). This is required because of the special handling that quorum I/O receives in the OpenVMS SCSI drivers.

This restriction is not expected to be significant, because all disks on a multihost SCSI bus must support tagged command queuing (see Section A.7.7), and because quorum disks are normally not used on single-host buses.


Previous Next Contents Index

[Site home] [Send comments] [Help with this site] [How to order documentation] [OpenVMS site] [Compaq site]
[OpenVMS documentation]

Copyright © Compaq Computer Corporation 1998. All rights reserved.

Legal
6318PRO_017.HTML