OpenVMS Cluster Systems

Document revision date: 30 March 2001

6.3.1.1 Serving the System Disk

Setting bit 2 of the MSCP_SERVE_ALL system parameter to serve the system disk is important when other nodes in the cluster rely on this system being able to serve its system disk. This setting prevents obscure contention problems that can occur when a system attempts to complete I/O to a remote system disk whose system has failed.

The following sequence of events describes how a contention problem can occur if serving the system disk is disabled (that is, if bit 2 is not set):

1. The MSCP_SERVE_ALL setting is changed to disable serving when the system reboots.
2. The serving system crashes.
3. The client system that was executing I/O to the serving system's system disk is holding locks on resources of that system disk.
4. The client system starts mount verification.
5. The serving system attempts to boot but cannot because of the locks held on its system disk by the client system.
6. The client's mount verification process times out after a period of time set by the MVTIMEOUT system parameter, and the client system releases the locks. The time period could be several hours.
7. The serving system is able to reboot.
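
The bit test described in this section can be illustrated with a short Python sketch. It is not part of OpenVMS; the function names are hypothetical, and the sketch assumes that bits are numbered from 0, so that bit 2 corresponds to the value 4.

```python
# Illustrative only: checks and sets bit 2 in an MSCP_SERVE_ALL value.
# Assumption: bits are numbered from 0, so bit 2 has the value 4.
SERVE_SYSTEM_DISK_BIT = 2


def serves_system_disk(mscp_serve_all: int) -> bool:
    """Return True if bit 2 is set in the given parameter value."""
    return bool(mscp_serve_all & (1 << SERVE_SYSTEM_DISK_BIT))


def with_system_disk_serving(mscp_serve_all: int) -> int:
    """Return the parameter value with bit 2 set; other bits are unchanged."""
    return mscp_serve_all | (1 << SERVE_SYSTEM_DISK_BIT)
```

For example, under this assumption a value of 1 does not have bit 2 set, and setting the bit yields 5.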

6.3.1.2 Setting the MSCP and TMSCP System Parameters

Use either of the following methods to set these system parameters:

Specify appropriate values for these parameters in a computer's MODPARAMS.DAT file and then run AUTOGEN.
Run the CLUSTER_CONFIG.COM or the CLUSTER_CONFIG_LAN.COM procedure, as appropriate, and choose the CHANGE option to perform these operations for disks and tapes.

With either method, the served devices become accessible when the serving computer reboots. Further, the servers automatically serve any suitable device that is added to the system later. For example, if new drives are attached to an HSC subsystem, the devices are dynamically configured.

Note: The SCSI retention command modifier is not supported by the TMSCP server. Retention operations should be performed from the node serving the tape.

6.4 MSCP I/O Load Balancing

MSCP I/O load balancing offers the following advantages:

Faster I/O response
Balanced work load among the members of an OpenVMS Cluster

Two types of MSCP I/O load balancing are provided by OpenVMS Cluster software: static and dynamic. Static load balancing occurs on both VAX and Alpha systems; dynamic load balancing occurs only on VAX systems. Both types of load balancing are based on the load capacity ratings of the server systems.

6.4.1 Load Capacity

The load capacity ratings for the VAX and Alpha systems are predetermined by Compaq. These ratings are used in the calculation of the available serving capacity for MSCP static and dynamic load balancing. You can override these default settings by specifying a different load capacity with the MSCP_LOAD parameter.

Note that the MSCP server load-capacity values (either the default value or the value you specify with MSCP_LOAD) are estimates used by the load-balancing feature. They cannot change the actual MSCP serving capacity of a system.

A system's MSCP serving capacity depends on many factors including its power, the performance of its LAN adapter, and the impact of other processing loads. The available serving capacity, which is calculated by each MSCP server as described in Section 6.4.3, is used solely to bias the selection process when a client system (for example, a satellite) chooses which server system to use when accessing a served disk.

6.4.2 Increasing the Load Capacity When FDDI is Used

When FDDI is used instead of Ethernet, the throughput is far greater. To take advantage of this greater throughput, Compaq recommends that you change the server's load-capacity default setting with the MSCP_LOAD parameter. Start with a multiplier of four. For example, the load-capacity rating of any Alpha system connected by FDDI to a disk can be set to 1360 I/O per second (4x340). Depending on your configuration and the software you are running, you may want to increase or decrease this value.
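
The arithmetic behind this recommendation can be sketched in Python as follows. The function names are hypothetical, and the rating of 340 and the multiplier of four are taken from the example above; the output line assumes the usual PARAMETER = value form of a MODPARAMS.DAT entry.

```python
# Illustrative only: derives a starting MSCP_LOAD value for an FDDI-connected server.
ALPHA_DEFAULT_RATING = 340  # I/O per second, from the example in this section
FDDI_MULTIPLIER = 4         # suggested starting multiplier


def fddi_load_capacity(default_rating: int, multiplier: int = FDDI_MULTIPLIER) -> int:
    """Scale the default load-capacity rating by the FDDI multiplier."""
    return default_rating * multiplier


def modparams_line(load_capacity: int) -> str:
    """Format the value as a MODPARAMS.DAT-style assignment (assumed form)."""
    return f"MSCP_LOAD = {load_capacity}"


print(modparams_line(fddi_load_capacity(ALPHA_DEFAULT_RATING)))  # MSCP_LOAD = 1360
```

Treat the result as a starting point only; as noted above, the appropriate value depends on your configuration and workload.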

6.4.3 Available Serving Capacity

The load-capacity ratings are used by each MSCP server to calculate its available serving capacity.

The available serving capacity is calculated in the following way:

Step	Calculation
1	Each MSCP server counts the read and write requests sent to it and periodically converts this value to requests per second.
2	Each MSCP server subtracts its requests per second from its load capacity to compute its available serving capacity.
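
The two steps can be expressed as a small Python sketch. The function names and the sampling interval are assumptions made for illustration; this document says only that the conversion happens periodically.

```python
# Illustrative only: the two-step available-serving-capacity calculation.
def requests_per_second(request_count: int, interval_seconds: float) -> float:
    """Step 1: convert a count of read and write requests to a rate."""
    return request_count / interval_seconds


def available_serving_capacity(load_capacity: float,
                               request_count: int,
                               interval_seconds: float) -> float:
    """Step 2: subtract the request rate from the load capacity."""
    return load_capacity - requests_per_second(request_count, interval_seconds)
```

For example, a server with a load capacity of 340 that received 500 requests over an assumed 5-second interval has a rate of 100 requests per second and an available serving capacity of 240.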

6.4.4 Static Load Balancing

MSCP servers periodically send their available serving capacities to the MSCP class driver (DUDRIVER). When a disk is mounted or one fails over, DUDRIVER assigns the server with the highest available serving capacity to it. (TMSCP servers do not perform this monitoring function.) This initial assignment is called static load balancing.
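
The selection rule can be sketched in Python as shown below. This is a simplified model, not DUDRIVER's actual logic; the node names are hypothetical, and the handling of ties (the first server listed wins) is an assumption.

```python
# Illustrative only: choose the server with the highest available serving capacity.
from typing import Mapping


def choose_server(available_capacity: Mapping[str, float]) -> str:
    """Return the name of the server reporting the highest available capacity.

    Assumption: on a tie, the first server in iteration order is chosen.
    """
    if not available_capacity:
        raise ValueError("no MSCP servers reported a capacity")
    return max(available_capacity, key=lambda name: available_capacity[name])


# Hypothetical node names and capacities.
reported = {"ALPHA1": 240.0, "ALPHA2": 310.0, "VAX1": 50.0}
print(choose_server(reported))  # ALPHA2
```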

6.4.5 Dynamic Load Balancing (VAX Only)

Dynamic load balancing occurs only on VAX systems. MSCP server activity is checked every 5 seconds. If activity to any server is excessive, the serving load automatically shifts to other servers in the cluster.

6.4.6 Overriding MSCP I/O Load Balancing for Special Purposes

In some configurations, you may want to designate one or more systems in your cluster as the primary I/O servers and restrict I/O traffic on other systems. You can accomplish these goals by overriding the default load-capacity ratings used by the MSCP server. For example, if your cluster consists of two Alpha systems and one VAX 6000-400 system and you want to reduce the MSCP-served I/O traffic to the VAX, you can assign a low MSCP_LOAD value, such as 50, to the VAX. Because the two Alpha systems each start with a load-capacity rating of 340 and the VAX now starts with a load-capacity rating of 50, the MSCP-served satellites will direct most of the I/O traffic to the Alpha systems.
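
The effect of the override in this example can be explored with the following Python sketch. The node names and the request rates are hypothetical; only the ratings of 340 and 50 come from the example above.

```python
# Illustrative only: ranks servers by available capacity after an MSCP_LOAD override.
from typing import Dict, List


def rank_servers(load_capacity: Dict[str, float],
                 current_rate: Dict[str, float]) -> List[str]:
    """Return server names ordered from most to least available capacity."""
    available = {name: load_capacity[name] - current_rate.get(name, 0.0)
                 for name in load_capacity}
    return sorted(available, key=lambda name: available[name], reverse=True)


ratings = {"ALPHA1": 340.0, "ALPHA2": 340.0, "VAX1": 50.0}  # VAX overridden to 50

# Moderate load on the Alpha systems: both still rank ahead of the idle VAX.
print(rank_servers(ratings, {"ALPHA1": 200.0, "ALPHA2": 250.0, "VAX1": 0.0}))

# Heavy load on the Alpha systems: the idle VAX now ranks first.
print(rank_servers(ratings, {"ALPHA1": 300.0, "ALPHA2": 320.0, "VAX1": 0.0}))
```

In this model the VAX is preferred only when each Alpha system is handling more than 290 requests per second, which is consistent with most, but not all, of the traffic going to the Alpha systems.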

6.5 Managing Cluster Disks With the Mount Utility

For locally connected disks to be accessible to other nodes in the cluster, the MSCP server software must be loaded on the computer to which the disks are connected (see Section 6.3.1). Further, each disk must be mounted with the Mount utility, using the appropriate qualifier: /CLUSTER, /SYSTEM, or /GROUP. Mounting multiple disks can be automated with command procedures; a sample command procedure, MSCPMOUNT.COM, is provided in the SYS$EXAMPLES directory on your system.

The Mount utility also provides other qualifiers that determine whether a disk is automatically rebuilt during a remount operation. Different rebuilding techniques are recommended for data and system disks.

This section describes how to use the Mount utility for these purposes.

6.5.1 Mounting Cluster Disks

To mount disks that are to be shared among all computers, specify the MOUNT command as shown in the following table.

IF...	THEN...
At system startup
The disk is attached to a single system and is to be made available to all other nodes in the cluster.	Use MOUNT/CLUSTER device-name on the computer to which the disk is to be mounted. The disk is mounted on every computer that is active in the cluster at the time the command executes. First, the disk is mounted locally. Then, if the mount operation succeeds, the disk is mounted on other nodes in the cluster.
The computer has no disks directly attached to it.	Use MOUNT/SYSTEM device-name on the computer for each disk the computer needs to access. The disks can be attached to a single system or can be shared disks that are accessed by an HSx controller. Then, if the mount operation succeeds, the disk is mounted on the computer joining the cluster.
When the system is running
You want to add a disk.	Use MOUNT/CLUSTER device-name on the computer to which the disk is to be mounted. The disk is mounted on every computer that is active in the cluster at the time the command executes. First, the disk is mounted locally. Then, if the mount operation succeeds, the disk is mounted on other nodes in the cluster.

To ensure that disks are mounted whenever possible, regardless of the sequence in which systems in the cluster boot (or shut down), startup command procedures should use MOUNT/CLUSTER and MOUNT/SYSTEM as described in the preceding table.

Note: Only system or group disks can be mounted across the cluster or on a subset of the cluster members. If you specify MOUNT/CLUSTER without the /SYSTEM or /GROUP qualifier, /SYSTEM is assumed. Also note that each cluster disk mounted with the /SYSTEM or /GROUP qualifier must have a unique volume label.

6.5.2 Examples of Mounting Shared Disks

Suppose you want all the computers in a three-member cluster to share a disk named COMPANYDOCS. To share the disk, one of the three computers can mount COMPANYDOCS using the MOUNT/CLUSTER command, as follows:

$ MOUNT/CLUSTER/NOASSIST $1$DUA4: COMPANYDOCS

If you want just two of the three computers to share the disk, those two computers must both mount the disk with the same MOUNT command, as follows:

$ MOUNT/SYSTEM/NOASSIST $1$DUA4: COMPANYDOCS

To mount the disk at startup time, include the MOUNT command either in a common command procedure that is invoked at startup time or in the computer-specific startup command file.

Note: The /NOASSIST qualifier is used in command procedures that are designed to make several attempts to mount disks. The disks may be temporarily offline or otherwise not available for mounting. If, after several attempts, the disk cannot be mounted, the procedure continues. The /ASSIST qualifier, which is the default, causes a command procedure to stop and query the operator if a disk cannot be mounted immediately.

6.5.3 Mounting Cluster Disks With Command Procedures

To configure cluster disks, you can create command procedures to mount them. You may want to include commands that mount cluster disks in a separate command procedure file that is invoked by a site-specific SYSTARTUP procedure. Depending on your cluster environment, you can set up your command procedure in either of the following ways:

As a separate file specific to each computer in the cluster by making copies of the common procedure and storing them as separate files
As a common computer-independent file on a shared disk

With either method, each computer can invoke the common procedure from the site-specific SYSTARTUP procedure.

Example: The MSCPMOUNT.COM file in the SYS$EXAMPLES directory on your system is a sample command procedure that contains commands typically used to mount cluster disks. The example includes comments explaining each phase of the procedure.

6.5.4 Disk Rebuild Operation

To minimize disk I/O operations (and thus improve performance) when files are created or extended, the OpenVMS file system maintains a cache of preallocated file headers and disk blocks.

If a disk is dismounted improperly (for example, if a system fails or is removed from a cluster without running SYS$SYSTEM:SHUTDOWN.COM), this preallocated space becomes temporarily unavailable. When the disk is remounted, MOUNT scans the disk to recover the space. This is called a disk rebuild operation.

6.5.5 Rebuilding Cluster Disks

On a nonclustered computer, the MOUNT scan operation for recovering preallocated space merely prolongs the boot process. In an OpenVMS Cluster system, however, this operation can degrade response time for all user processes in the cluster. While the scan is in progress on a particular disk, most activity on that disk is blocked.

Note: User processes that attempt to read or write to files on the disk can experience delays of several minutes or longer, especially if the disk contains a large number of files or has many users.

Because the rebuild operation can delay access to disks during the startup of any OpenVMS Cluster computer, Compaq recommends that procedures for mounting cluster disks use the /NOREBUILD qualifier. When MOUNT/NOREBUILD is specified, disks are not scanned to recover lost space, and users experience minimal delays while computers are mounting disks.

Reference: Section 6.5.6 provides information about rebuilding system disks. Section 9.5.1 provides more information about disk rebuilds and system-disk throughput techniques.

6.5.6 Rebuilding System Disks

Rebuilding system disks is especially critical because most system activity requires access to a system disk. When a system disk rebuild is in progress, very little activity is possible on any computer that uses that disk.

Unlike other disks, the system disk is automatically mounted early in the boot sequence. If a rebuild is necessary, and if the value of the system parameter ACP_REBLDSYSD is 1, the system disk is rebuilt during the boot sequence. (The default setting of 1 for the ACP_REBLDSYSD system parameter specifies that the system disk should be rebuilt.) Exceptions are as follows:

Setting	Comments
ACP_REBLDSYSD parameter should be set to 0 on satellites.	This setting prevents satellites from rebuilding a system disk when it is mounted early in the boot sequence and eliminates delays caused by such a rebuild when satellites join the cluster.
ACP_REBLDSYSD should be set to the default value of 1 on boot servers, and procedures that mount disks on the boot servers should use the /REBUILD qualifier.	While these measures can make boot server rebooting more noticeable, they ensure that system disk space is available after an unexpected shutdown.
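
The boot-time behavior and the recommended settings can be summarized in a short Python sketch. It is a simplified model of the rules stated in this section, and the function and role names are hypothetical.

```python
# Illustrative only: models when a system disk is rebuilt during the boot sequence.
RECOMMENDED_ACP_REBLDSYSD = {
    "satellite": 0,    # avoid rebuild delays when satellites join the cluster
    "boot_server": 1,  # default; recover system disk space after a failure
}


def rebuilt_during_boot(rebuild_needed: bool, acp_rebldsysd: int) -> bool:
    """Return True if the system disk is rebuilt during the boot sequence.

    A rebuild happens only when one is necessary and ACP_REBLDSYSD is 1.
    """
    return rebuild_needed and acp_rebldsysd == 1
```

For example, a satellite configured as recommended does not rebuild the system disk at boot time even when a rebuild is needed, so the lost space must be recovered later by other means.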

Once the cluster is up and running, system managers can submit a batch procedure that executes SET VOLUME/REBUILD commands to recover lost disk space. Such procedures can run at a time when users would not be inconvenienced by the blocked access to disks (for example, between midnight and 6 a.m. each day). Because the SET VOLUME/REBUILD command determines whether a rebuild is needed, the procedures can execute the command for each disk that is usually mounted.

Suggestion: The procedures run more quickly and cause less delay in disk access if they are executed on:

Powerful computers
Computers that have direct access to the volume to be rebuilt

Moreover, several such procedures, each of which rebuilds a different set of disks, can be executed simultaneously.

Caution: If either or both of the following conditions are true when mounting disks, it is essential to run a procedure with SET VOLUME/REBUILD commands on a regular basis to rebuild the disks:

Disks are mounted with the MOUNT/NOREBUILD command.
The ACP_REBLDSYSD system parameter is set to 0.

Failure to rebuild disk volumes can result in a loss of free space and in subsequent failures of applications to create or extend files.

6.6 Shadowing Disks Across an OpenVMS Cluster

Volume shadowing (sometimes referred to as disk mirroring) achieves high data availability by duplicating data on multiple disks. If one disk fails, the remaining disk or disks can continue to service application and user I/O requests.

6.6.1 Purpose

Volume Shadowing for OpenVMS software provides data availability across the full range of OpenVMS configurations, from single nodes to large OpenVMS Cluster systems, so you can provide data availability where you need it most.

Volume Shadowing for OpenVMS software is an implementation of RAID 1 (redundant arrays of independent disks) technology. Volume Shadowing for OpenVMS prevents a disk device failure from interrupting system and application operations. By duplicating data on multiple disks, volume shadowing transparently prevents your storage subsystems from becoming a single point of failure because of media deterioration, communication path failure, or controller or device failure.

6.6.2 Shadow Sets

You can mount one, two, or three compatible disk volumes to form a shadow set, as shown in Figure 6-9. Each disk in the shadow set is known as a shadow set member. Volume Shadowing for OpenVMS logically binds the shadow set devices together and represents them as a single virtual device called a virtual unit. This means that the multiple members of the shadow set, represented by the virtual unit, appear to operating systems and users as a single, highly available disk.

Figure 6-9 Shadow Set With Three Members

6.6.3 I/O Capabilities

Applications and users read and write data to and from a shadow set using the same commands and program language syntax and semantics that are used for nonshadowed I/O operations. System managers manage and monitor shadow sets using the same commands and utilities they use for nonshadowed disks. The only difference is that access is through the virtual unit, not to individual devices.

Reference: Volume Shadowing for OpenVMS describes the shadowing product capabilities in detail.

6.6.4 Supported Devices

For a single workstation or a large data center, valid shadowing configurations include:

All MSCP-compliant DSA drives
All DSSI devices
All StorageWorks SCSI disks and controllers, and some third-party SCSI devices that implement READL (read long) and WRITEL (write long) commands and use the SCSI disk driver (DKDRIVER)
Restriction: SCSI disks that do not support READL and WRITEL are restricted because these disks do not support the shadowing data repair (disk bad-block errors) capability. Thus, using unsupported SCSI disks can cause members to be removed from the shadow set.

You can shadow data disks and system disks. Thus, a system disk need not be a single point of failure for any system that boots from that disk. System disk shadowing becomes especially important for OpenVMS Cluster systems that use a common system disk from which multiple computers boot.

Volume Shadowing for OpenVMS does not support the shadowing of quorum disks. This is because volume shadowing makes use of the OpenVMS distributed lock manager, and the quorum disk must be utilized before locking is enabled.

There are no restrictions on the location of shadow set members beyond the valid disk configurations defined in the Volume Shadowing for OpenVMS Software Product Description (SPD 27.29.xx).

6.6.5 Shadow Set Limits

You can mount a maximum of 500 shadow sets (each having one, two, or three members) in a standalone or OpenVMS Cluster system. The number of shadow sets supported is independent of controller and device types. The shadow sets can be mounted as public or private volumes.

For any changes to these limits, consult the Volume Shadowing for OpenVMS Software Product Description (SPD 27.29.xx).
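
A planned configuration can be checked against these limits with a sketch such as the following. The limits are the ones stated in this section and may differ in later releases; the function name and message wording are hypothetical.

```python
# Illustrative only: checks a planned configuration against the limits in this section.
from typing import List

MAX_SHADOW_SETS = 500             # per standalone or OpenVMS Cluster system
VALID_MEMBER_COUNTS = range(1, 4)  # one, two, or three members per shadow set


def check_shadow_sets(member_counts: List[int]) -> List[str]:
    """Return a list of problems; an empty list means the plan is within limits."""
    problems = []
    if len(member_counts) > MAX_SHADOW_SETS:
        problems.append(
            f"{len(member_counts)} shadow sets exceeds the limit of {MAX_SHADOW_SETS}")
    for index, count in enumerate(member_counts):
        if count not in VALID_MEMBER_COUNTS:
            problems.append(f"shadow set {index} has {count} members (allowed: 1 to 3)")
    return problems
```

For example, check_shadow_sets([1, 2, 3]) returns an empty list, while a plan that includes a four-member set returns one problem.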

6.6.6 Distributing Shadowed Disks

The controller-independent design of shadowing allows you to manage shadow sets regardless of their controller connection or location in the OpenVMS Cluster system and helps provide improved data availability and very flexible configurations.

For clusterwide shadowing, members can be located anywhere in an OpenVMS Cluster system and served by MSCP servers across any supported OpenVMS Cluster interconnect, including the CI, Ethernet, DSSI, and FDDI. For example, OpenVMS Cluster systems using FDDI can be up to 40 kilometers apart, which further increases the availability and disaster tolerance of a system.

Figure 6-10 shows how shadow set member units are on line to local controllers located on different nodes. In the figure, a disk volume is local to each of the nodes ATABOY and ATAGRL. The MSCP server provides access to the shadow set members over the Ethernet. Even though the disk volumes are local to different nodes, the disks are members of the same shadow set. A member unit that is local to one node can be accessed by the remote node over the MSCP server.

Figure 6-10 Shadow Sets Accessed Through the MSCP Server

For shadow sets that are mounted on an OpenVMS Cluster system, mounting or dismounting a shadow set on one node in the cluster does not affect applications or user functions executing on other nodes in the system. For example, you can dismount the virtual unit from one node in an OpenVMS Cluster system and leave the shadow set operational on the remaining nodes on which it is mounted.

Other shadowing notes:

If an individual disk volume is already mounted as a member of an active shadow set, the disk volume cannot be mounted as a standalone disk on another node.
System disks can be shadowed. All nodes booting from shadowed system disks must:
- Have a Volume Shadowing for OpenVMS license.
- Specify the same physical member of the system disk shadow set as the boot device.
- Set shadowing system parameters to enable shadowing and specify the system disk virtual unit number.
- Mount additional physical members into the system disk shadow set early in the SYSTARTUP_VMS.COM command procedure.
- Mount the disks to be used in the shadow set.
