OpenVMS Cluster Systems

Updated: 11 December 1998

9.5 System-Disk Throughput

Achieving enough system-disk throughput requires some combination of the following techniques:

Technique	Reference
Avoid disk rebuilds at boot time.	Section 9.5.1
Offload work from the system disk.	Section 9.5.2
Configure multiple system disks.	Section 9.5.3
Use Volume Shadowing for OpenVMS.	Section 6.6

9.5.1 Avoiding Disk Rebuilds

The OpenVMS file system maintains a cache of preallocated file headers and disk blocks. When a disk is not properly dismounted, such as when a system fails, this preallocated space becomes temporarily unavailable. When the disk is mounted again, OpenVMS scans the disk to recover that space. This is called a disk rebuild.

A large OpenVMS Cluster system must ensure sufficient capacity to boot nodes in a reasonable amount of time. To minimize the impact of disk rebuilds at boot time, consider making the following changes:

Action	Result
Use the DCL command MOUNT/NOREBUILD for all user disks, at least on the satellite nodes. Enter this command into startup procedures that mount user disks.	It is undesirable to have a satellite node rebuild the disk, yet this is likely to happen if a satellite is the first to reboot after it or another node fails.
Set the system parameter ACP_REBLDSYSD to 0, at least for the satellite nodes.	This prevents a rebuild operation on the system disk when it is mounted implicitly by OpenVMS early in the boot process.
Avoid a disk rebuild during prime working hours by using the SET VOLUME/REBUILD command during times when the system is not so heavily used. Once the computer is running, you can run a batch job or a command procedure to execute the SET VOLUME/REBUILD command for each disk drive.	User response times can be degraded during a disk rebuild operation because most I/O activity on that disk is blocked. Because the SET VOLUME/REBUILD command determines whether a rebuild is needed, the job can execute the command for every disk. This job can be run during off hours, preferably on one of the more powerful nodes.
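The off-hours rebuild job described in the table can be sketched as a small generator of DCL command lines. This is only an illustration: the helper name and the device names are hypothetical, and the generated lines would still have to be placed in a command procedure and submitted to a batch queue by whatever means your site uses.

```python
def rebuild_commands(devices):
    """Return one DCL SET VOLUME/REBUILD line per device name.

    The device names come from the caller; the ones in the demo below are
    hypothetical placeholders, not names taken from the manual.
    """
    lines = []
    for device in devices:
        name = device.strip().upper()
        if not name:
            continue  # skip blank entries
        if not name.endswith(":"):
            name += ":"  # device names are written here with a trailing colon
        lines.append(f"$ SET VOLUME/REBUILD {name}")
    return lines


if __name__ == "__main__":
    # Hypothetical device names, for illustration only.
    for line in rebuild_commands(["$1$DUA10", "$1$DUA11:"]):
        print(line)
```

Because SET VOLUME/REBUILD itself determines whether a rebuild is needed, the sketch makes no attempt to filter the device list.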

Caution: In large OpenVMS Cluster systems, large amounts of disk space can be preallocated to caches. If many nodes abruptly leave the cluster (for example, during a power failure), this space becomes temporarily unavailable. If your system usually runs with nearly full disks, do not disable rebuilds on the server nodes at boot time.

9.5.2 Offloading Work

In addition to the system disk throughput issues during an entire OpenVMS Cluster boot, access to particular system files even during steady-state operations (such as logging in, starting up applications, or issuing a PRINT command) can affect response times.

You can identify hot system files using a performance or monitoring tool (such as those listed in Section 1.5.2), and use the techniques in the following table to reduce hot file I/O activity on system disks:

Potential Hot Files	Methods to Help
Page and swap files	When you run CLUSTER_CONFIG_LAN.COM or CLUSTER_CONFIG.COM to add computers and specify the sizes and locations of page and swap files, relocate the files as follows: Move page and swap files for computers off system disks. Set up page and swap files for satellites on the satellites' local disks, if such disks are available.
Move these high-activity files off the system disk: SYSUAF.DAT, NETPROXY.DAT, RIGHTSLIST.DAT, ACCOUNTNG.DAT, VMSMAIL_PROFILE.DATA, QMAN$MASTER.DAT, +VMS$OBJECTS.DAT, and layered product and other application files.	Use any of the following methods: Specify new locations for the files according to the instructions in Chapter 5. Use caching in the HSC subsystem or in RF or RZ disks to improve the effective system-disk throughput. Add a solid-state disk to your configuration. These devices have lower latencies and can handle a higher request rate than a regular magnetic disk. A solid-state disk can be used as a system disk or to hold system files. Use DECram software to create RAMdisks on MOP servers to hold copies of selected hot read-only files to improve boot times. A RAMdisk is an area of main memory within a system that is set aside to store data, but it is accessed as if it were a disk.

+VAX specific

Moving these files from the system disk to a separate disk eliminates most of the write activity to the system disk. This raises the read/write ratio and, if you are using Volume Shadowing for OpenVMS, maximizes the performance of shadowing on the system disk.

9.5.3 Configuring Multiple System Disks

Depending on the number of computers to be included in a large cluster and the work being done, you must evaluate the tradeoffs involved in configuring a single system disk or multiple system disks.

While a single system disk is easier to manage, a large cluster often requires more system disk I/O capacity than a single system disk can provide. To achieve satisfactory performance, multiple system disks may be needed. However, you should recognize the increased system management efforts involved in maintaining multiple system disks.

Consider the following when determining the need for multiple system disks:

Concurrent user activity
In clusters with many satellites, the amount and type of user activity on those satellites influence system-disk load and, therefore, the number of satellites that can be supported by a single system disk. For example:

IF...	THEN...	Comments
Many users are active or run multiple applications simultaneously	The load on the system disk can be significant; multiple system disks may be required.	Some OpenVMS Cluster systems may need to be configured on the assumption that all users are constantly active. Such working conditions may require a larger, more expensive OpenVMS Cluster system that handles peak loads without performance degradation.
Few users are active simultaneously	A single system disk might support a large number of satellites.	For most configurations, the probability is low that most users are active simultaneously. A smaller and less expensive OpenVMS Cluster system can be configured for these typical working conditions but may suffer some performance degradation during peak load periods.
Most users run a single application for extended periods	A single system disk might support a large number of satellites if significant numbers of I/O requests can be directed to application data disks.	Because each workstation user in an OpenVMS Cluster system has a dedicated computer, a user who runs large compute-bound jobs on that dedicated computer does not significantly affect users of other computers in the OpenVMS Cluster system. For clustered workstations, the critical shared resource is a disk server. Thus, if a workstation user runs an I/O-intensive job, its effect on other workstations sharing the same disk server might be noticeable.

Concurrent booting activity
One of the few times when all OpenVMS Cluster computers are simultaneously active is during a cluster reboot. All satellites are waiting to reload the operating system, and as soon as a boot server is available, they begin to boot in parallel. This booting activity places a significant I/O load on the boot server, system disk, and interconnect.
Note: You can reduce overall cluster boot time by configuring multiple system disks and by distributing system roots for computers evenly across those disks. This technique has the advantage of increasing overall system disk I/O capacity, but it has the disadvantage of requiring additional system management effort. For example, installation of layered products or upgrades of the OpenVMS operating system must be repeated once for each system disk.
System management
Because the system management workload increases in direct proportion to the number of separate system disks that must be maintained, add only as many system disks as are needed to provide the required level of performance.
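The idea of distributing system roots evenly across system disks can be illustrated with a round-robin assignment. This is a minimal sketch, not a procedure from this manual: the function name, node names, and disk labels are hypothetical, and it treats every node as an equal load, which may not hold for your configuration.

```python
def distribute_roots(nodes, system_disks):
    """Assign nodes to system disks round-robin.

    The resulting per-disk node counts differ by at most one. Node and disk
    names are whatever the caller supplies (hypothetical in the demo below).
    """
    if not system_disks:
        raise ValueError("at least one system disk is required")
    assignment = {disk: [] for disk in system_disks}
    for index, node in enumerate(nodes):
        disk = system_disks[index % len(system_disks)]
        assignment[disk].append(node)
    return assignment


if __name__ == "__main__":
    # Hypothetical satellite names and disk labels.
    satellites = [f"SAT{n}" for n in range(1, 6)]
    for disk, members in distribute_roots(satellites, ["SYSDISK_A", "SYSDISK_B"]).items():
        print(disk, members)
```

A weighted assignment would be a better fit where some nodes generate far more system-disk I/O than others.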

Volume Shadowing for OpenVMS is an alternative to creating multiple system disks. Volume shadowing increases the read I/O capacity of a single system disk and minimizes the number of separate system disks that have to be maintained because installations or updates need only be applied once to a volume-shadowed system disk. For clusters with substantial system disk I/O requirements, you can use multiple system disks, each configured as a shadow set.

Cloning the system disk is a way to manage multiple system disks. To clone the system disk:

Create a system disk (or shadow set) with roots for all OpenVMS Cluster nodes.
Use this as a master copy, and perform all software upgrades on this system disk.
Back up the master copy to the other disks to create the cloned system disks.
Change the volume names so they are unique.
If you have not moved system files off the system disk, you must have the SYLOGICALS.COM startup file point to system files on the master system disk.
Before an upgrade, be sure to save any changes you need from the cloned disks since the last upgrade, such as MODPARAMS.DAT and AUTOGEN feedback data, accounting files for billing, and password history.

9.6 Conserving System Disk Space

The essential files for a satellite root take up very little space, so that more than 96 roots can easily fit on a single system disk. However, if you use separate dump files for each satellite node or put page and swap files for all the satellite nodes on the system disk, you quickly run out of disk space.

9.6.1 Techniques

To avoid running out of disk space, set up common dump files for all the satellites or for groups of satellite nodes. For debugging purposes, it is best to have separate dump files for each MOP and disk server. Also, you can use local disks on satellite nodes to hold page and swap files, instead of putting them on the system disk. In addition, move page and swap files for MOP and disk servers off the system disk.
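The effect of these techniques on system-disk space can be estimated with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, every size is a placeholder supplied by the caller rather than a figure from this manual, and a common dump file is modeled as a single file of the given size.

```python
def system_disk_blocks(satellites, root_blocks, dump_blocks, page_blocks,
                       swap_blocks, shared_dump=False, local_page_swap=False):
    """Estimate the system-disk blocks consumed by satellite files.

    All sizes are caller-supplied placeholders, in disk blocks.
    shared_dump=True models one common dump file for all satellites.
    local_page_swap=True models page and swap files kept on local disks.
    """
    roots = satellites * root_blocks
    dumps = dump_blocks if shared_dump else satellites * dump_blocks
    paging = 0 if local_page_swap else satellites * (page_blocks + swap_blocks)
    return roots + dumps + paging


if __name__ == "__main__":
    # Placeholder sizes chosen only to show the shape of the calculation.
    everything_on_system_disk = system_disk_blocks(96, 1000, 50000, 40000, 20000)
    trimmed = system_disk_blocks(96, 1000, 50000, 40000, 20000,
                                 shared_dump=True, local_page_swap=True)
    print(everything_on_system_disk, trimmed)
```

With these placeholder inputs, the per-satellite dump, page, and swap files account for almost all of the space, which is the pattern the paragraph above warns about.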

Reference: See Section 10.8 to plan a strategy for managing dump files.

9.7 Adjusting System Parameters

As an OpenVMS Cluster system grows, certain data structures within OpenVMS must grow to accommodate the larger number of nodes. If they cannot grow (for example, because of a shortage of nonpaged pool), the result is intermittent problems that are difficult to diagnose.

You should run AUTOGEN with FEEDBACK frequently as a cluster grows, so that settings for many parameters can be adjusted. Refer to Section 8.7 for more information about running AUTOGEN.

In addition to running AUTOGEN with FEEDBACK, you should check and manually adjust the following parameters:

SCSCONNCNT
SCSBUFFCNT
SCSRESPCNT
CLUSTER_CREDITS

9.7.1 The SCSCONNCNT Parameter

Description: The SCSCONNCNT parameter controls the number of connection descriptor table (CDT) entries allocated at boot time. The CDTs are used for the different connections a node makes to other OpenVMS Cluster nodes for tasks such as general OpenVMS Cluster system coordination and disk and tape serving.

Default: The default value is 40. An additional 200 entries are allocated in the connection descriptor list (CDL) by OpenVMS to avoid running out. Once the initial CDTs are used, OpenVMS can create up to 200 more entries.

Symptoms of entry shortages: The default value typically becomes insufficient when the OpenVMS Cluster configuration grows to between 50 and 70 nodes. A shortage is most likely to occur on OpenVMS Cluster nodes that are disk servers, tape servers, or both. Signs of shortage are when nodes are unable to join the OpenVMS Cluster and when nodes see disks or tapes served from some of the servers, but not others.

How to determine entries in use: Use the System Dump Analyzer (SDA) utility as follows to check the number of CDT entries in use on a given OpenVMS Cluster node:

    $ ANALYZE/SYSTEM
    VAX/VMS System Analyzer

    SDA> SHOW CONNECTIONS

    --- CDT Summary Page ---
    CDT Address   Local Process      Connection ID   State...
    -----------   -------------      -------------   -----...
    8044EE70      SCS$DIRECTORY      FA1F0000        listen...
    8044EFD0      MSCP$TAPE          FA1F0001        listen...
    8044F130      MSCP$DISK          FA1F0002        listen...
    8044F290      VMS$VAXcluster     FA1F0003        listen...
    80450050      VMS$TAPE_CL_DRVR   FA24000D        open...
    804501B0      VMS$DISK_CL_DRVR   FA1F000E        open...
       .
       .
       .
    Number of free CDT's: 107

To determine the number of connections in use, count the lines of output to see whether the total is close to the value of SCSCONNCNT plus 200.

Note: The Number of free CDT's: line is not useful in determining whether you are close to running out of entries. This is because the line does not include any of the extra slots that OpenVMS allocates as a cushion, unless the slots were used and freed over time.
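Counting the lines of a long SHOW CONNECTIONS display by hand is error-prone, so the count can be sketched in a few lines of code. This sketch assumes the display has been captured as text and that each connection line begins with an 8-digit hexadecimal CDT address, as in the sample above; the function names are hypothetical, and the 200-entry cushion is the figure given in this section.

```python
import re

CUSHION = 200  # extra entries OpenVMS can create beyond SCSCONNCNT (per this section)

# Assumption: a connection line starts with an 8-digit hexadecimal CDT
# address followed by the local process name, as in the sample display.
CDT_LINE = re.compile(r"^\s*[0-9A-Fa-f]{8}\s+\S+")


def cdt_entries_in_use(sda_output):
    """Count the connection lines in captured SHOW CONNECTIONS output."""
    return sum(1 for line in sda_output.splitlines() if CDT_LINE.match(line))


def cdt_headroom(sda_output, scsconncnt):
    """Return how many entries remain before SCSCONNCNT plus the cushion is reached."""
    return scsconncnt + CUSHION - cdt_entries_in_use(sda_output)
```

A small or negative headroom suggests that the node is close to the limit and that SCSCONNCNT should be reviewed.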

How to resolve shortages: If the value of SCSCONNCNT is insufficient:

Add a line such as the following to MODPARAMS.DAT, specifying a value appropriate for your configuration. The number you enter should be the current usage for that node, or slightly higher if you anticipate growth.
MIN_SCSCONNCNT = 300
Note: Remember that the 200-slot cushion is available and can be used if you are not short of nonpaged pool.
Run AUTOGEN and reboot.

9.7.2 The SCSBUFFCNT Parameter (VAX Only)

Note: On Alpha systems, the SCS buffers are allocated as needed, and the SCSBUFFCNT parameter is reserved for OpenVMS use only.

Description: On VAX systems, the SCSBUFFCNT parameter controls the number of buffer descriptor table (BDT) entries that describe data buffers used in block data transfers between nodes.

Symptoms of entry shortages: A shortage of entries affects performance, most likely affecting nodes that perform MSCP serving.

How to determine a shortage of BDT entries: Use the SDA utility (or the Show Cluster utility) to identify systems that have waited for BDT entries.

    SDA> READ SYS$SYSTEM:SCSDEF
    %SDA-I-READSYM, reading symbol table  SYS$COMMON:[SYSEXE]SCSDEF.STB;1
    SDA> EXAM @SCS$GL_BDT + CIBDT$L_QBDT_CNT
    8046BB6C:  00000000   "...."
    SDA>

How to resolve shortages: If the SDA EXAMINE command displays a nonzero value, BDT waits have occurred. If the number is nonzero and continues to increase during normal operations, increase the value of SCSBUFFCNT.

9.7.3 The SCSRESPCNT Parameter

Description: The SCSRESPCNT parameter controls the number of response descriptor table (RDT) entries available for system use. An RDT entry is required for every in-progress message exchange between two nodes.

Symptoms of entry shortages: A shortage of entries affects performance, since message transmissions must be delayed until a free entry is available.

How to determine a shortage of RDT entries: Use the SDA utility as follows to check each system for requests that waited because there were not enough free RDTs.

    SDA> READ SYS$SYSTEM:SCSDEF
    %SDA-I-READSYM, reading symbol table  SYS$COMMON:[SYSEXE]SCSDEF.STB;1
    SDA> EXAM @SCS$GL_RDT + RDT$L_QRDT_CNT
    8044DF74:  00000000   "...."
    SDA>

How to resolve shortages: If the SDA EXAMINE command displays a nonzero value, RDT waits have occurred. If you find a count that tends to increase over time under normal operations, increase SCSRESPCNT.
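The wait counters shown by SDA EXAMINE, for both BDT and RDT entries, are hexadecimal longwords, so comparing readings taken at different times is easier after converting them to integers. The sketch below assumes the one-line "address: value" format shown in the sample sessions; the function names are hypothetical, and "increasing" is interpreted here simply as the latest reading being nonzero and larger than the first.

```python
import re

# Assumption: the EXAMINE result line looks like  8044DF74:  00000000   "...."
EXAMINE_LINE = re.compile(r"^\s*([0-9A-Fa-f]{8}):\s+([0-9A-Fa-f]{8})\b")


def wait_count(examine_line):
    """Return the counter value from one SDA EXAMINE result line as an integer."""
    match = EXAMINE_LINE.match(examine_line)
    if match is None:
        raise ValueError(f"unrecognized EXAMINE line: {examine_line!r}")
    return int(match.group(2), 16)


def waits_increasing(counts):
    """Return True if the latest count is nonzero and higher than the first.

    counts is a list of readings in chronological order, taken during
    normal operations.
    """
    return len(counts) >= 2 and counts[-1] != 0 and counts[-1] > counts[0]
```

For example, readings of 3 and then 9 waits taken an hour apart would indicate that the corresponding parameter (SCSBUFFCNT or SCSRESPCNT) is a candidate for an increase.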

9.7.4 The CLUSTER_CREDITS Parameter

Description: The CLUSTER_CREDITS parameter specifies the number of per-connection buffers a node allocates to receiving VMS$VAXcluster communications. This system parameter is not dynamic; that is, if you change the value, you must reboot the node on which you changed it.

Default: The default value is 10. The default value may be insufficient for a cluster that has very high locking rates.

Symptoms of cluster credit problem: A shortage of credits affects performance, since message transmissions are delayed until free credits are available. These are visible as credit waits in the SHOW CLUSTER display.

How to determine whether credit waits exist: Use the SHOW CLUSTER utility as follows:

Run SHOW CLUSTER/CONTINUOUS.
Type REMOVE SYSTEM/TYPE=HS.
Type ADD LOC_PROC, CR_WAIT.
Type SET CR_WAIT/WIDTH=10.
Check to see whether the number of CR_WAITS (credit waits) logged against the VMS$VAXcluster connection for any remote node is incrementing regularly. Ideally, credit waits should not occur. However, occasional waits under very heavy load conditions are acceptable.

How to resolve incrementing credit waits:

If the number of CR_WAITS is incrementing more than once per minute, perform the following steps:

Increase the CLUSTER_CREDITS parameter by five on the node against which the credit waits are being logged. Modify the parameter on that remote node, not on the node that is running SHOW CLUSTER.
Reboot the node.

Note that it is not necessary for the CLUSTER_CREDITS parameter to be the same on every node.
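The once-per-minute threshold and the increment of five described above can be expressed as a small calculation. This is a sketch under stated assumptions: the two CR_WAIT readings are taken by hand from the SHOW CLUSTER display at a known interval, and the function names are hypothetical.

```python
def credit_waits_per_minute(first_count, last_count, elapsed_seconds):
    """Return the average number of credit waits per minute between two readings."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (last_count - first_count) * 60.0 / elapsed_seconds


def recommended_cluster_credits(current_value, first_count, last_count,
                                elapsed_seconds):
    """Apply the rule from this section.

    If credit waits increment more than once per minute, suggest raising
    CLUSTER_CREDITS by five on the remote node; otherwise keep the value.
    """
    rate = credit_waits_per_minute(first_count, last_count, elapsed_seconds)
    if rate > 1.0:
        return current_value + 5
    return current_value


if __name__ == "__main__":
    # 30 additional waits in 10 minutes is 3 per minute, so 10 becomes 15.
    print(recommended_cluster_credits(10, 100, 130, 600))
```

The returned value applies to the node against which the waits are logged, and a reboot of that node is still required for the change to take effect.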

9.8 Minimize Network Instability

Network instability also affects OpenVMS Cluster operations. Table 9-9 lists techniques to minimize typical network problems.

Table 9-9 Techniques to Minimize Network Problems
Technique	Recommendation
Adjust the RECNXINTERVAL parameter.	The RECNXINTERVAL system parameter specifies the number of seconds the OpenVMS Cluster system waits when it loses contact with a node, before removing the node from the configuration. Many large OpenVMS Cluster configurations operate with the RECNXINTERVAL parameter set to 40 seconds (the default value is 20 seconds). Raising the value of RECNXINTERVAL can result in longer perceived application pauses, especially when the node leaves the OpenVMS Cluster system abnormally. The pause is caused by the connection manager waiting for the number of seconds specified by RECNXINTERVAL.
Protect the network.	Treat the LAN as if it were a part of the OpenVMS Cluster system. For example, do not allow an environment in which a random user can disconnect a ThinWire segment to attach a new PC while 20 satellites hang.
Choose your hardware and configuration carefully.	Certain hardware is not suitable for use in a large OpenVMS Cluster system. Some network components can appear to work well with light loads, but are unable to operate properly under high traffic conditions. Improper operation can result in lost or corrupted packets that will require packet retransmissions. This reduces performance and can affect the stability of the OpenVMS Cluster configuration. Beware of bridges that cannot filter and forward at full line rates and repeaters that do not handle congested conditions well. Refer to Guidelines for OpenVMS Cluster Configurations to determine appropriate OpenVMS Cluster configurations and capabilities.
Use the LAVC$FAILURE_ANALYSIS facility.	See Section D.5 for assistance in the isolation of network faults.

9.9 DECnet Cluster Alias

You should define a cluster alias name for the OpenVMS Cluster to ensure that remote access will be successful when at least one OpenVMS Cluster member is available to process the client program's requests.

The cluster alias acts as a single network node identifier for an OpenVMS Cluster system. Computers in the cluster can use the alias for communications with other computers in a DECnet network. Note that it is possible for nodes running DECnet for OpenVMS to have a unique and separate cluster alias from nodes running DECnet-Plus. In addition, clusters running DECnet-Plus can have one cluster alias for VAX, one for Alpha, and another for both.

Note: A single cluster alias can include nodes running either DECnet for OpenVMS or DECnet-Plus, but not both. Also, an OpenVMS Cluster running both DECnet for OpenVMS and DECnet-Plus requires multiple system disks (one for each).

Reference: See Chapter 4 for more information about setting up and using a cluster alias in an OpenVMS Cluster system.
