
Guidelines for OpenVMS Cluster Configurations



7.7 Creating a Cluster with a Shared FC System Disk

To configure nodes in an OpenVMS Cluster system, you must execute the CLUSTER_CONFIG.COM (or CLUSTER_CONFIG_LAN.COM) command procedure. (You can run either the full version, which provides more information about most prompts, or the brief version.)

For the purposes of CLUSTER_CONFIG, a shared Fibre Channel (FC) bus is treated like a shared SCSI bus, except that the allocation class parameters do not apply to FC. The rules for setting node allocation class and port allocation class values remain in effect when parallel SCSI storage devices are present in a configuration that includes FC storage devices.

To configure a new OpenVMS Cluster system, you must first enable clustering on a single, or standalone, system. Then you can add additional nodes to the cluster.

Example 7-10 shows how to enable clustering using CLUSTER_CONFIG_LAN_BRIEF.COM on a standalone node called FCNOD1. At the end of the procedure, FCNOD1 reboots and forms a one-node cluster.

Example 7-11 shows how to run CLUSTER_CONFIG_LAN_BRIEF.COM on FCNOD1 to add a second node, called FCNOD2, to form a two-node cluster. At the end of the procedure, the cluster is configured to allow FCNOD2 to boot off the same FC system disk as FCNOD1.

The following steps are common to both examples:

  1. Select the default option [1] for ADD.
  2. Answer Yes when CLUSTER_CONFIG_LAN.COM asks whether there will be a shared SCSI bus. SCSI in this context refers to FC as well as to parallel SCSI.
    The allocation class parameters are not affected by the presence of FC.
  3. Answer No when the procedure asks whether the node will be a satellite.

Example 7-10 Enabling Clustering on a Standalone FC Node

$ @CLUSTER_CONFIG_LAN_BRIEF 
 
                   Cluster Configuration Procedure 
                    Executing on an Alpha System 
 
    DECnet Phase IV is installed on this node. 
 
    The LAN, not DECnet, will be used for MOP downline loading. 
    This Alpha node is not currently a cluster member 
    
 
MAIN MENU 
 
   1. ADD FCNOD1 to existing cluster, or form a new cluster. 
   2. MAKE a directory structure for a new root on a system disk. 
   3. DELETE a root from a system disk. 
   4. EXIT from this procedure. 
 
Enter choice [1]: 1 
Is the node to be a clustered node with a shared SCSI bus (Y/N)? Y 
 
    Note: 
        Every cluster node must have a direct connection to every other 
        node in the cluster.  Since FCNOD1 will be a clustered node with 
        a shared SCSI bus, and Memory Channel, CI, and DSSI are not present, 
        the LAN will be used for cluster communication. 
 
Enter this cluster's group number: 511 
Enter this cluster's password: 
Re-enter this cluster's password for verification: 
 
Will FCNOD1 be a boot server [Y]? Y 
    Verifying LAN adapters in LANACP database... 
    Updating LANACP LAN server process volatile and permanent databases... 
    Note: The LANACP LAN server process will be used by FCNOD1 for boot 
          serving satellites. The following LAN devices have been found: 
    Verifying LAN adapters in LANACP database... 
 
    LAN TYPE    ADAPTER NAME    SERVICE STATUS 
    ========    ============    ============== 
    Ethernet    EWA0            ENABLED 
 
 
  CAUTION: If you do not define port allocation classes later in this 
           procedure for shared SCSI buses, all nodes sharing a SCSI bus 
           must have the same non-zero ALLOCLASS value. If multiple 
           nodes connect to a shared SCSI bus without the same allocation 
           class for the bus, system booting will halt due to the error or 
           IO AUTOCONFIGURE after boot will keep the bus offline. 
 
Enter a value for FCNOD1's ALLOCLASS parameter [0]: 5 
Does this cluster contain a quorum disk [N]? N 
    Each shared SCSI bus must have a positive allocation class value. A shared 
    bus uses a PK adapter. A private bus may use: PK, DR, DV. 
 
    When adding a node with SCSI-based cluster communications, the shared 
    SCSI port allocation classes may be established in SYS$DEVICES.DAT. 
    Otherwise, the system's disk allocation class will apply. 
 
    A private SCSI bus need not have an entry in SYS$DEVICES.DAT. If it has an 
    entry, its entry may assign any legitimate port allocation class value: 
 
       n   where n = a positive integer, 1 to 32767 inclusive 
       0   no port allocation class and disk allocation class does not apply 
      -1   system's disk allocation class applies (system parameter ALLOCLASS) 
 
    When modifying port allocation classes, SYS$DEVICES.DAT must be updated 
    for all affected nodes, and then all affected nodes must be rebooted. 
    The following dialog will update SYS$DEVICES.DAT on FCNOD1. 
 
    There are currently no entries in SYS$DEVICES.DAT for FCNOD1. 
    After the next boot, any SCSI controller on FCNOD1 will use 
    FCNOD1's disk allocation class. 
 
 
Assign port allocation class to which adapter [RETURN for none]: PKA 
Port allocation class for PKA0: 10 
 
        Port Alloclass   10    Adapter FCNOD1$PKA 
 
Assign port allocation class to which adapter [RETURN for none]: PKB 
Port allocation class for PKB0: 20 
 
        Port Alloclass   10    Adapter FCNOD1$PKA 
        Port Alloclass   20    Adapter FCNOD1$PKB 
 
  WARNING: FCNOD1 will be a voting cluster member. EXPECTED_VOTES for 
           this and every other cluster member should be adjusted at 
           a convenient time before a reboot. For complete instructions, 
           check the section on configuring a cluster in the "OpenVMS 
           Cluster Systems" manual. 
 
    Execute AUTOGEN to compute the SYSGEN parameters for your configuration 
    and reboot FCNOD1 with the new parameters. This is necessary before 
    FCNOD1 can become a cluster member. 
 
Do you want to run AUTOGEN now [Y]? Y 
 
    Running AUTOGEN -- Please wait. 
 
The system is shutting down to allow the system to boot with the 
generated site-specific parameters and installed images. 
 
The system will automatically reboot after the shutdown and the 
upgrade will be complete. 

Example 7-11 Adding a Node to a Cluster with a Shared FC System Disk

$ @CLUSTER_CONFIG_LAN_BRIEF 
 
                   Cluster Configuration Procedure 
                    Executing on an Alpha System 
 
    DECnet Phase IV is installed on this node. 
 
    The LAN, not DECnet, will be used for MOP downline loading. 
    FCNOD1 is an Alpha system and currently a member of a cluster 
    so the following functions can be performed: 
 
MAIN MENU 
 
   1. ADD an Alpha node to the cluster. 
   2. REMOVE a node from the cluster. 
   3. CHANGE a cluster member's characteristics. 
   4. CREATE a duplicate system disk for FCNOD1. 
   5. MAKE a directory structure for a new root on a system disk. 
   6. DELETE a root from a system disk. 
   7. EXIT from this procedure. 
 
Enter choice [1]: 1 
 
    This ADD function will add a new Alpha node to the cluster. 
 
  WARNING: If the node being added is a voting member, EXPECTED_VOTES for 
           every cluster member must be adjusted.  For complete instructions 
           check the section on configuring a cluster in the "OpenVMS Cluster 
           Systems" manual. 
 
  CAUTION: If this cluster is running with multiple system disks and 
           common system files will be used, please, do not proceed 
           unless appropriate logical names are defined for cluster 
           common files in SYLOGICALS.COM. For instructions, refer to 
           the "OpenVMS Cluster Systems" manual. 
 
Is the node to be a clustered node with a shared SCSI bus (Y/N)? Y 
Will the node be a satellite [Y]? N 
What is the node's SCS node name? FCNOD2 
What is the node's SCSSYSTEMID number? 19.111 
    NOTE: 19.111 equates to an SCSSYSTEMID of 19567 
Will FCNOD2 be a boot server [Y]? Y 
What is the device name for FCNOD2's system root 
[default DISK$V72_SSB:]? 
What is the name of FCNOD2's system root [SYS10]? 
    Creating directory tree SYS10 ... 
    System root SYS10 created 
 
  CAUTION: If you do not define port allocation classes later in this 
           procedure for shared SCSI buses, all nodes sharing a SCSI bus 
           must have the same non-zero ALLOCLASS value. If multiple 
           nodes connect to a shared SCSI bus without the same allocation 
           class for the bus, system booting will halt due to the error or 
           IO AUTOCONFIGURE after boot will keep the bus offline. 
 
Enter a value for FCNOD2's ALLOCLASS parameter [5]: 
Does this cluster contain a quorum disk [N]? N 
Size of pagefile for FCNOD2 [RETURN for AUTOGEN sizing]? 
 
    A temporary pagefile will be created until resizing by AUTOGEN. The 
    default size below is arbitrary and may or may not be appropriate. 
 
Size of temporary pagefile [10000]? 
Size of swap file for FCNOD2 [RETURN for AUTOGEN sizing]? 
 
    A temporary swap file will be created until resizing by AUTOGEN. The 
    default size below is arbitrary and may or may not be appropriate. 
 
Size of temporary swap file [8000]? 
    Each shared SCSI bus must have a positive allocation class value. A shared 
    bus uses a PK adapter. A private bus may use: PK, DR, DV. 
 
    When adding a node with SCSI-based cluster communications, the shared 
    SCSI port allocation classes may be established in SYS$DEVICES.DAT. 
    Otherwise, the system's disk allocation class will apply. 
 
    A private SCSI bus need not have an entry in SYS$DEVICES.DAT. If it has an 
    entry, its entry may assign any legitimate port allocation class value: 
 
       n   where n = a positive integer, 1 to 32767 inclusive 
       0   no port allocation class and disk allocation class does not apply 
      -1   system's disk allocation class applies (system parameter ALLOCLASS) 
 
    When modifying port allocation classes, SYS$DEVICES.DAT must be updated 
    for all affected nodes, and then all affected nodes must be rebooted. 
    The following dialog will update SYS$DEVICES.DAT on FCNOD2. 
 
Enter [RETURN] to continue: 
 
    $20$DKA400:<VMS$COMMON.SYSEXE>SYS$DEVICES.DAT;1 contains port 
    allocation classes for FCNOD2. After the next boot, any SCSI 
    controller not assigned in SYS$DEVICES.DAT will use FCNOD2's 
    disk allocation class. 
 
 
Assign port allocation class to which adapter [RETURN for none]: PKA 
Port allocation class for PKA0: 11 
 
        Port Alloclass   11    Adapter FCNOD2$PKA 
 
Assign port allocation class to which adapter [RETURN for none]: PKB 
Port allocation class for PKB0: 20 
 
        Port Alloclass   11    Adapter FCNOD2$PKA 
        Port Alloclass   20    Adapter FCNOD2$PKB 
 
Assign port allocation class to which adapter [RETURN for none]: 
 
  WARNING: FCNOD2 must be rebooted to make port allocation class 
           specifications in SYS$DEVICES.DAT take effect. 
Will a disk local only to FCNOD2 (and not accessible at this time to FCNOD1) 
be used for paging and swapping (Y/N)? N 
 
    If you specify a device other than DISK$V72_SSB: for FCNOD2's 
    page and swap files, this procedure will create PAGEFILE_FCNOD2.SYS 
    and SWAPFILE_FCNOD2.SYS in the [SYSEXE] directory on the device you 
    specify. 
 
What is the device name for the page and swap files [DISK$V72_SSB:]? 
%SYSGEN-I-CREATED, $20$DKA400:[SYS10.SYSEXE]PAGEFILE.SYS;1 created 
%SYSGEN-I-CREATED, $20$DKA400:[SYS10.SYSEXE]SWAPFILE.SYS;1 created 
    The configuration procedure has completed successfully. 
 
    FCNOD2 has been configured to join the cluster. 
 
    The first time FCNOD2 boots, NETCONFIG.COM and 
    AUTOGEN.COM will run automatically. 

7.8 Online Reconfiguration

The FC interconnect can be reconfigured while the hosts are running OpenVMS; for example, storage devices can be added to, removed from, or replaced on the fabric.

OpenVMS does not automatically detect most FC reconfigurations. You must use the following procedure to perform an FC reconfiguration safely and to ensure that OpenVMS adjusts its internal data structures to match the new state:

  1. Dismount all disks that are involved in the reconfiguration.
  2. Perform the reconfiguration.
  3. Enter the following commands on each host that is connected to the Fibre Channel:


    SYSMAN> IO SCSI_PATH_VERIFY 
    SYSMAN> IO AUTOCONFIGURE 
    

The purpose of the SCSI_PATH_VERIFY command is to check each FC path in the system's IO database to determine whether the attached device has been changed. If a device change is detected, then the FC path is disconnected in the IO database. This allows the path to be reconfigured for a new device by using the IO AUTOCONFIGURE command.
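
For example, a reconfiguration session might look like the following sketch. The device name $1$DGA100: and the volume label DATA1 are hypothetical; substitute the devices involved in your reconfiguration, and enter the SYSMAN commands on each host that is connected to the Fibre Channel.

    $ ! Step 1: dismount the affected disk on all nodes 
    $ DISMOUNT/CLUSTER $1$DGA100: 
    $ ! Step 2: perform the physical FC reconfiguration 
    $ ! Step 3: update the IO database on each FC-connected host 
    $ RUN SYS$SYSTEM:SYSMAN 
    SYSMAN> IO SCSI_PATH_VERIFY 
    SYSMAN> IO AUTOCONFIGURE 
    SYSMAN> EXIT 
    $ MOUNT/CLUSTER $1$DGA100: DATA1 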

Note

In the current release, the SCSI_PATH_VERIFY command operates only on FC disk devices. It does not operate on generic FC devices, such as the HSG80 command console LUN (CCL). (Generic FC devices have names such as $1$GGAnnnnn.) This means that once the CCL of an HSG80 has been configured by OpenVMS with a particular device identifier, its device identifier should not be changed.


Chapter 8
Configuring OpenVMS Clusters for Availability

Availability is the percentage of time that a computing system provides application service. By taking advantage of OpenVMS Cluster features, you can configure your OpenVMS Cluster system for various levels of availability, including disaster tolerance.

This chapter provides strategies and sample optimal configurations for building a highly available OpenVMS Cluster system. You can use these strategies and examples to help you make choices and tradeoffs that enable you to meet your availability requirements.

8.1 Availability Requirements

You can configure OpenVMS Cluster systems for different levels of availability, depending on your requirements. Most organizations fall into one of the broad (and sometimes overlapping) categories shown in Table 8-1.

Table 8-1 Availability Requirements
Availability Requirements Description
Conventional For business functions that can wait with little or no effect while a system or application is unavailable.
24 x 365 For business functions that require uninterrupted computing services, either during essential time periods or during most hours of the day throughout the year. Minimal down time is acceptable.
Disaster tolerant For business functions with stringent availability requirements. These businesses need to be immune to disasters like earthquakes, floods, and power failures.

8.2 How OpenVMS Clusters Provide Availability

OpenVMS Cluster systems offer the following features that provide increased availability:

  • Shared access to storage devices and files
  • Redundancy of major hardware components
  • Failover mechanisms that enable recovery from a failure in part of the cluster
  • Related software products that increase availability

8.2.1 Shared Access to Storage

In an OpenVMS Cluster environment, users and applications on multiple systems can transparently share storage devices and files. When you shut down one system, users can continue to access shared files and devices. You can share storage devices in two ways:

  • Direct access, in which nodes connect directly to the storage through a shared interconnect, such as Fibre Channel or SCSI
  • Served access, in which a node with direct access to the storage uses the MSCP or TMSCP server to make that storage available to other cluster members
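
As a sketch of the served approach, the following hypothetical entries in SYS$SYSTEM:MODPARAMS.DAT enable MSCP disk serving on a node; the values shown are illustrative, and you must run AUTOGEN and reboot for them to take effect.

    ! Hypothetical additions to SYS$SYSTEM:MODPARAMS.DAT 
    MSCP_LOAD = 1            ! load the MSCP disk server at boot 
    MSCP_SERVE_ALL = 1       ! serve all available disks to other cluster members 
 
    ! Then regenerate parameters and reboot: 
    ! $ @SYS$UPDATE:AUTOGEN GETDATA REBOOT 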

8.2.2 Component Redundancy

OpenVMS Cluster systems allow for redundancy of many components, including:

  • Systems
  • Interconnects
  • Storage adapters
  • Storage devices and data

With redundant components, if one component fails, another is available to users and applications.

8.2.3 Failover Mechanisms

OpenVMS Cluster systems provide failover mechanisms that enable recovery from a failure in part of the OpenVMS Cluster. Table 8-2 lists these mechanisms and the levels of recovery that they provide.

Table 8-2 Failover Mechanisms
Mechanism What Happens if a Failure Occurs Type of Recovery
DECnet-Plus cluster alias If a node fails, OpenVMS Cluster software automatically distributes new incoming connections among other participating nodes. Manual. Users who were logged in to the failed node can reconnect to a remaining node.

Automatic for appropriately coded applications. Such applications can reinstate a connection to the cluster alias node name, and the connection is directed to one of the remaining nodes.

I/O paths With redundant paths to storage devices, if one path fails, OpenVMS Cluster software fails over to a working path, if one exists. Transparent, provided another working path is available.
Interconnect With redundant or mixed interconnects, OpenVMS Cluster software uses the fastest working path to connect to other OpenVMS Cluster members. If an interconnect path fails, OpenVMS Cluster software fails over to a working path, if one exists. Transparent.
Boot and disk servers If you configure at least two nodes as boot and disk servers, satellites can continue to boot and use disks if one of the servers shuts down or fails.

Failure of a boot server does not affect nodes that have already booted, provided they have an alternate path to access MSCP-served disks.

Automatic.
Terminal servers and LAT software Attach terminals and printers to terminal servers. If a node fails, the LAT software automatically connects to one of the remaining nodes. In addition, if a user process is disconnected from a LAT terminal session, when the user attempts to reconnect to a LAT session, LAT software can automatically reconnect the user to the disconnected session.

Manual. Terminal users who were logged in to the failed node must log in to a remaining node and restart the application.
Generic batch and print queues You can set up generic queues to feed jobs to execution queues (where processing occurs) on more than one node. If one node fails, the generic queue can continue to submit jobs to execution queues on remaining nodes. In addition, batch jobs submitted using the /RESTART qualifier are automatically restarted on one of the remaining nodes.

Transparent for jobs waiting to be dispatched.

Automatic or manual for jobs executing on the failed node.

Autostart batch and print queues For maximum availability, you can set up execution queues as autostart queues with a failover list. When a node fails, an autostart execution queue and its jobs automatically fail over to the next logical node in the failover list and continue processing on another node. Autostart queues are especially useful for print queues directed to printers that are attached to terminal servers. Transparent.

Reference: For more information about cluster alias, generic queues, and autostart queues, see OpenVMS Cluster Systems.
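
The generic-queue and autostart-queue entries in Table 8-2 might be set up as in the following sketch. The node names NODEA and NODEB, the queue names, and the printer port device LTA1: are hypothetical.

    $ ! Generic batch queue feeding execution queues on two nodes 
    $ INITIALIZE/QUEUE/BATCH/ON=NODEA:: NODEA_BATCH 
    $ INITIALIZE/QUEUE/BATCH/ON=NODEB:: NODEB_BATCH 
    $ INITIALIZE/QUEUE/GENERIC=(NODEA_BATCH,NODEB_BATCH) CLUSTER_BATCH 
    $ START/QUEUE NODEA_BATCH 
    $ START/QUEUE NODEB_BATCH 
    $ START/QUEUE CLUSTER_BATCH 
    $ ! Submit a job that restarts on a remaining node if its node fails 
    $ SUBMIT/RESTART/QUEUE=CLUSTER_BATCH MYJOB.COM 
 
    $ ! Autostart print queue that fails over between the two nodes 
    $ INITIALIZE/QUEUE/START/AUTOSTART_ON=(NODEA::LTA1:,NODEB::LTA1:) CLUSTER_PRINT 
    $ ENABLE AUTOSTART/QUEUES 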

8.2.4 Related Software Products

Table 8-3 shows a variety of related OpenVMS Cluster software products that Compaq offers to increase availability.

Table 8-3 Products That Increase Availability
Product Description
DECamds Collects and analyzes data from multiple nodes simultaneously and directs all output to a centralized DECwindows display. The analysis detects availability problems and suggests corrective actions.
Volume Shadowing for OpenVMS Makes any disk in an OpenVMS Cluster system a redundant twin of any other same-model disk in the OpenVMS Cluster.
DECevent Simplifies disk monitoring. DECevent notifies you when it detects that a disk may fail. If the OpenVMS Cluster system is properly configured, DECevent can add a new disk and start a shadow copy operation.
POLYCENTER Console Manager (PCM) Helps monitor OpenVMS Cluster operations. PCM provides a central location for coordinating and managing up to 24 console lines connected to OpenVMS nodes or HSJ/HSC console ports.
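
As a sketch of how Volume Shadowing for OpenVMS (listed in Table 8-3) is used, the following hypothetical commands create a two-member shadow set and mount it clusterwide. The virtual unit DSA1:, the member devices, and the volume label are illustrative, and the SHADOWING system parameter must already be set on each node.

    $ ! Create a two-member shadow set and mount it on all nodes 
    $ MOUNT/CLUSTER DSA1: /SHADOW=($1$DGA101:,$1$DGA201:) DATA1 
    $ SHOW DEVICES DSA1:        ! verify both members and any copy operation 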

8.3 Strategies for Configuring Highly Available OpenVMS Clusters

The hardware you choose and the way you configure it have a significant impact on the availability of your OpenVMS Cluster system. This section presents strategies for designing an OpenVMS Cluster configuration that promotes availability.

8.3.1 Availability Strategies

Table 8-4 lists strategies for configuring a highly available OpenVMS Cluster. These strategies are listed in order of importance, and many of them are illustrated in the sample optimal configurations shown in this chapter.

Table 8-4 Availability Strategies
Strategy Description
Eliminate single points of failure Make components redundant so that if one component fails, the other is available to take over.
Shadow system disks The system disk is vital for node operation. Use Volume Shadowing for OpenVMS to make system disks redundant.
Shadow essential data disks Use Volume Shadowing for OpenVMS to improve data availability by making data disks redundant.
Provide shared, direct access to storage Where possible, give all nodes shared direct access to storage. This reduces dependency on MSCP server nodes for access to storage.
Minimize environmental risks Take the following steps to minimize the risk of environmental problems:
  • Provide a generator or uninterruptible power system (UPS) to replace utility power for use during temporary outages.
  • Configure extra air-conditioning equipment so that failure of a single unit does not prevent use of the system equipment.
Configure at least three nodes OpenVMS Cluster nodes require a quorum to continue operating. An optimal configuration uses a minimum of three nodes so that if one node becomes unavailable, the two remaining nodes maintain quorum and continue processing.

Reference: For detailed information on quorum strategies, see Section 11.5 and OpenVMS Cluster Systems.

Configure extra capacity For each component, configure at least one unit more than is necessary to handle the expected load. Try to keep component use at 80% of capacity or less. For crucial components, keep resource use sufficiently below 80% of capacity so that if one component fails, the work load can be spread across the remaining components without overloading them.
Keep a spare component on standby For each component, keep one or two spares available and ready to use if a component fails. Be sure to test spare components regularly to make sure they work. More than one or two spare components increases complexity as well as the chance that the spare will not operate correctly when needed.
Use homogeneous nodes Configure nodes of similar size and performance to avoid capacity overloads in case of failover. If a large node fails, a smaller node may not be able to handle the transferred work load. The resulting bottleneck may decrease OpenVMS Cluster performance.
Use reliable hardware Consider the probability of a hardware device failing. Check product descriptions for MTBF (mean time between failures). In general, newer technologies are more reliable.

