Updated: 11 December 1998
OpenVMS Cluster Systems
Satellite nodes can be set up to reboot automatically when recovering from system failures or power failures.
Reboot behavior varies from system to system. Many systems provide a console variable that allows you to specify which device to boot from by default. However, some systems have predefined boot "sniffers" that automatically detect a bootable device. The following table describes the rebooting conditions.
AUTOGEN includes a mechanism called feedback. This mechanism examines data collected during normal system operations, and it adjusts system parameters on the basis of the collected data whenever you run AUTOGEN with the feedback option. For example, the system records each instance of a disk server waiting for buffer space to process a disk request. Based on this information, AUTOGEN can size the disk server's buffer pool automatically to ensure that sufficient space is allocated.
Execute SYS$UPDATE:AUTOGEN.COM manually as described in the
OpenVMS System Manager's Manual.
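As a minimal sketch, a manual run with feedback might look like the following. It assumes you want to review AUTOGEN's calculations before any parameter is changed, so the first command stops at the TESTFILES phase. The second command is the same invocation used in the sample file in Section 8.7.4. The choice of phases here is an illustration, not a required sequence.

```dcl
$! Sketch only: collect feedback data and calculate new values without
$! changing any parameter, then review the report that AUTOGEN writes.
$ @SYS$UPDATE:AUTOGEN SAVPARAMS TESTFILES FEEDBACK
$ TYPE/PAGE SYS$SYSTEM:AGEN$PARAMS.REPORT
$!
$! If the calculated values look reasonable, apply them and reboot.
$ @SYS$UPDATE:AUTOGEN SAVPARAMS REBOOT FEEDBACK
```

Running the TESTFILES pass first lets you check the report before committing to a reboot.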
8.7.1 Advantages
To ensure that computers are configured adequately when they first join the cluster, you can run AUTOGEN with feedback automatically as part of the initial boot sequence. Although this step requires an additional reboot before the computer can be used, the computer's performance can be substantially improved.
Compaq strongly recommends that you use the feedback option. Without feedback, it is difficult for AUTOGEN to anticipate patterns of resource usage, particularly in complex configurations. Factors such as the number of computers and disks in the cluster and the types of applications being run require adjustment of system parameters for optimal performance.
Compaq also recommends using AUTOGEN with feedback rather than the SYSGEN utility to modify system parameters, because AUTOGEN:
When a computer is first added to an OpenVMS Cluster, system parameters that control the computer's system resources are normally adjusted in several steps, as follows:
Because the first AUTOGEN operation (initiated by either
CLUSTER_CONFIG_LAN.COM or CLUSTER_CONFIG.COM) is performed both in the
minimum environment and without feedback, a newly added computer may be
inadequately configured to run in the OpenVMS Cluster environment. For
this reason, you might want to implement additional configuration
measures like those described in Section 8.7.3 and Section 8.7.4.
8.7.3 Obtaining Reasonable Feedback
When a computer first boots into an OpenVMS Cluster, much of the computer's resource utilization is determined by the current OpenVMS Cluster configuration. Factors such as the number of computers, the number of disk servers, and the number of disks available or mounted contribute to a fixed minimum resource requirement. Because this minimum does not change with continued use of the computer, feedback information about the required resources is immediately valid.
Other feedback information, however, such as that influenced by normal user activity, is not immediately available, because the only "user" has been the system startup process. If AUTOGEN were run with feedback at this point, some system values might be set too low.
By running a simulated user load at the end of the first production boot, you can ensure that AUTOGEN has reasonable feedback information. The User Environment Test Package (UETP) supplied with your operating system contains a test that simulates such a load. You can run this test (the UETP LOAD phase) as part of the initial production boot, and then run AUTOGEN with feedback before a user is allowed to log in.
To implement this technique, you can create a command file like that in
step 1 of the procedure in Section 8.7.4, and submit the file to the
computer's local batch queue from the cluster common SYSTARTUP
procedure. Your command file conditionally runs the UETP LOAD phase and
then reboots the computer with AUTOGEN feedback.
8.7.4 Creating a Command File to Run AUTOGEN
As shown in the following sample file, UETP lets you specify a typical user load to be run on the computer when it first joins the cluster. The UETP run generates data that AUTOGEN uses to set appropriate system parameter values for the computer when rebooting it with feedback. Note, however, that the default setting for the UETP user load assumes that the computer is used as a timesharing system. This calculation can produce system parameter values that might be excessive for a single-user workstation, especially if the workstation has large memory resources. Therefore, you might want to modify the default user load setting, as shown in the sample file.
Follow these steps:
Step 1. Create a command file like the following:

    $!
    $!    ***** SYS$COMMON:[SYSMGR]UETP_AUTOGEN.COM *****
    $!
    $! For initial boot only, run UETP LOAD phase and
    $! reboot with AUTOGEN feedback.
    $!
    $ SET NOON
    $ SET PROCESS/PRIVILEGES=ALL
    $!
    $! Run UETP to simulate a user load for a satellite
    $! with 8 simultaneously active user processes. For a
    $! CI connected computer, allow UETP to calculate the load.
    $!
    $ LOADS = "8"
    $ IF F$GETDVI("PAA0:","EXISTS") THEN LOADS = ""
    $ @UETP LOAD 1 'loads'
    $!
    $! Create a marker file to prevent resubmission of
    $! UETP_AUTOGEN.COM at subsequent reboots.
    $!
    $ CREATE SYS$SPECIFIC:[SYSMGR]UETP_AUTOGEN.DONE
    $!
    $! Reboot with AUTOGEN to set SYSGEN values.
    $!
    $ @SYS$UPDATE:AUTOGEN SAVPARAMS REBOOT FEEDBACK
    $!
    $ EXIT

Step 2. In the cluster common SYSTARTUP procedure, add commands like the following to submit UETP_AUTOGEN.COM to the computer's local batch queue:

    $!
    $ NODE = F$GETSYI("NODE")
    $ IF F$SEARCH ("SYS$SPECIFIC:[SYSMGR]UETP_AUTOGEN.DONE") .EQS. ""
    $ THEN
    $    SUBMIT /NOPRINT /NOTIFY /USERNAME=SYSTEST -
             /QUEUE='NODE'_BATCH SYS$MANAGER:UETP_AUTOGEN
    $ WAIT_FOR_UETP:
    $    WRITE SYS$OUTPUT "Waiting for UETP and AUTOGEN... ''F$TIME()'"
    $    WAIT 00:05:00.00 ! Wait 5 minutes
    $    GOTO WAIT_FOR_UETP
    $ ENDIF
    $!
When you boot the computer, it runs UETP_AUTOGEN.COM to simulate the user load you have specified, and it then reboots with AUTOGEN feedback to set appropriate system parameter values.
This chapter provides guidelines for building OpenVMS Cluster systems that include many computers (approximately 20 or more) and describes procedures that you might find helpful [1]. Typically, such OpenVMS Cluster systems include a large number of satellites.
Note that the recommendations in this chapter can also prove beneficial in some clusters with fewer than 20 computers. Areas of discussion include:
[1] Refer to the OpenVMS Cluster Software Software Product Description (SPD) for configuration limitations.
When building a new large cluster, you must be prepared to run AUTOGEN and reboot the cluster several times during the installation. The parameters that AUTOGEN sets for the first computers added to the cluster will probably be inadequate when additional computers are added. Readjustment of parameters is critical for boot and disk servers.
One solution to this problem is to run the UETP_AUTOGEN.COM command procedure (described in Section 8.7.4) to reboot computers at regular intervals as new computers or storage interconnects are added. For example, each time there is a 10% increase in the number of computers, storage, or interconnects, you should run UETP_AUTOGEN.COM. For best results, the last time you run the procedure should be as close as possible to the final OpenVMS Cluster environment.
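The 10% guideline above can be checked with a few lines of DCL. The sketch below is only an illustration. It assumes that you record the node count yourself each time UETP_AUTOGEN.COM is run and pass that number as parameter P1 (a hypothetical convention, not part of the cluster software). It looks only at the number of computers, not at storage or interconnects.

```dcl
$! Sketch only. P1 = node count recorded at the last UETP_AUTOGEN run
$! (hypothetical convention; supply a nonzero value yourself).
$ LAST = F$INTEGER(P1)
$ NOW = F$GETSYI("CLUSTER_NODES")     ! current number of cluster nodes
$!
$! Growth of 10% or more: (NOW - LAST) / LAST >= 1/10,
$! rewritten to avoid integer division.
$ IF (NOW - LAST) * 10 .GE. LAST
$ THEN
$    WRITE SYS$OUTPUT "Grew from ''LAST' to ''NOW' nodes; consider rerunning UETP_AUTOGEN.COM"
$ ELSE
$    WRITE SYS$OUTPUT "Growth since last run is under 10%"
$ ENDIF
$ EXIT
```

Multiplying by 10 rather than dividing keeps the comparison exact, because DCL integer division truncates.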
To set up a new, large OpenVMS Cluster, follow these steps:
Step | Task |
---|---|
1 | Configure boot and disk servers using the CLUSTER_CONFIG_LAN.COM or the CLUSTER_CONFIG.COM command procedure (described in Chapter 8). |
2 | Install all layered products and site-specific applications required for the OpenVMS Cluster environment, or as many as possible. |
3 | Prepare the cluster startup procedures so that they are as close as possible to those that will be used in the final OpenVMS Cluster environment. |
4 | Add a small number of satellites (perhaps two or three) using the cluster configuration command procedure. |
5 | Reboot the cluster to verify that the startup procedures work as expected. |
6 | After you have verified that startup procedures work, run UETP_AUTOGEN.COM on every computer's local batch queue to reboot the cluster again and to set initial production environment values. When the cluster has rebooted, all computers should have reasonable parameter settings. However, check the settings to be sure. |
7 | Add additional satellites to double their number. Then rerun UETP_AUTOGEN on each computer's local batch queue to reboot the cluster, and set values appropriately to accommodate the newly added satellites. |
8 | Repeat the previous step until all satellites have been added. |
9 | When all satellites have been added, run UETP_AUTOGEN a final time on each computer's local batch queue to reboot the cluster and to set new values for the production environment. |
For best performance, do not run UETP_AUTOGEN on every computer simultaneously, because the procedure simulates a user load that is probably more demanding than that for the final production environment. A better method is to run UETP_AUTOGEN on several satellites (those with the least recently adjusted parameters) while adding new computers. This technique increases efficiency because little is gained when a satellite reruns AUTOGEN shortly after joining the cluster.
For example, if the entire cluster is rebooted after 30 satellites have
been added, few adjustments are made to system parameter values for the
28th satellite added, because only two satellites have joined the
cluster since that satellite ran UETP_AUTOGEN as part of its initial
configuration.
9.2 General Booting Considerations
Two general booting considerations, concurrent booting and minimizing
boot time, are described in this section.
9.2.1 Concurrent Booting
One of the rare times when all OpenVMS Cluster computers are simultaneously active is during a cluster reboot, for example after a power failure. All satellites are waiting to reload the operating system, and as soon as a boot server is available, they begin to boot in parallel. This booting activity places a significant I/O load on the system disk or disks, interconnects, and boot servers.
For example, Table 9-1 shows a VAX system disk's I/O activity and elapsed time until login for a single satellite with minimal startup procedures when the satellite is the only one booting. Table 9-2 shows system disk I/O activity and time elapsed between boot server response and login for various numbers of satellites booting from a single system disk. The disk in these examples has a capacity of 40 I/O operations per second.
Note that the numbers in the tables are fabricated and are meant to provide only a generalized picture of booting activity. Elapsed time until login on satellites in any particular cluster depends on the complexity of the site-specific system startup procedures. Computers in clusters with many layered products or site-specific applications require more system disk I/O operations to complete booting operations.
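Table 9-1: System disk I/O activity and elapsed time until login for a single VAX satellite

Total I/O Requests to System Disk | Average System Disk I/O Operations per Second | Elapsed Time Until Login (minutes) |
---|---|---|
4200 | 6 | 12 |

Table 9-2: System disk I/O activity and elapsed time until login for multiple satellites booting from a single system disk

Number of Satellites | I/Os Requested per Second | I/Os Serviced per Second | Elapsed Time Until Login (minutes) |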
---|---|---|---|
1 | 6 | 6 | 12 |
2 | 12 | 12 | 12 |
4 | 24 | 24 | 12 |
6 | 36 | 36 | 12 |
8 | 48 | 40 | 14 |
12 | 72 | 40 | 21 |
16 | 96 | 40 | 28 |
24 | 144 | 40 | 42 |
32 | 192 | 40 | 56 |
48 | 288 | 40 | 84 |
64 | 384 | 40 | 112 |
96 | 576 | 40 | 168 |
While the elapsed times shown in Table 9-2 do not include the time
required for the boot server itself to reload, they illustrate that the
I/O capacity of a single system disk can be the limiting factor for
cluster reboot time.
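The sample figures in Table 9-2 are consistent with a simple model. Each satellite issues 4200 I/O requests at 6 per second (Table 9-1), the disk services at most 40 per second, and elapsed time is the time needed to drain all requests at that rate, with a floor of 12 minutes. The following sketch reproduces the table under that assumed model. The model is inferred from the numbers and is not stated by the tables themselves, and the symbol names are hypothetical.

```dcl
$! Sketch only: recompute the sample rows of Table 9-2.
$ IOS_PER_BOOT  = 4200     ! I/O requests per satellite boot (Table 9-1)
$ RATE_PER_SAT  = 6        ! I/Os per second requested by one satellite
$ DISK_CAPACITY = 40       ! I/Os per second the system disk can service
$ MIN_MINUTES   = 12       ! elapsed time when the disk is not saturated
$ COUNTS = "1,2,4,6,8,12,16,24,32,48,64,96"
$ I = 0
$ WRITE SYS$OUTPUT "Satellites  Requested/s  Serviced/s  Minutes"
$NEXT:
$ N = F$ELEMENT(I, ",", COUNTS)
$ IF N .EQS. "," THEN GOTO DONE     ! F$ELEMENT returns the delimiter past the end
$ N = F$INTEGER(N)
$ REQUESTED = N * RATE_PER_SAT
$ SERVICED = REQUESTED
$ IF SERVICED .GT. DISK_CAPACITY THEN SERVICED = DISK_CAPACITY
$! Minutes needed to service every request at full disk capacity
$ MINUTES = (N * IOS_PER_BOOT) / (DISK_CAPACITY * 60)
$ IF MINUTES .LT. MIN_MINUTES THEN MINUTES = MIN_MINUTES
$ WRITE SYS$OUTPUT F$FAO("!10UL  !11UL  !10UL  !7UL", N, REQUESTED, SERVICED, MINUTES)
$ I = I + 1
$ GOTO NEXT
$DONE:
$ EXIT
```

For 12 satellites, for instance, 12 x 4200 = 50400 requests at 40 per second take 1260 seconds, or 21 minutes. Substituting your own disk capacity and per-boot I/O count gives a rough estimate for a different configuration.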
9.2.2 Minimizing Boot Time
A large cluster needs to be carefully configured so that there is sufficient capacity to boot the desired number of nodes in the desired amount of time. As shown in Table 9-2, the effect of 96 satellites rebooting could induce an I/O bottleneck that can stretch the OpenVMS Cluster reboot times into hours. The following list provides a few methods to minimize boot times.
OpenVMS Cluster satellite nodes use a single LAN adapter for the initial stages of booting. If a satellite is configured with multiple LAN adapters, the system manager can specify with the console BOOT command which adapter to use for the initial stages of booting. Once the system is running, the OpenVMS Cluster uses all available LAN adapters. This flexibility allows you to work around broken adapters or network problems.
The procedures and utilities for configuring and booting satellite nodes are the same or vary only slightly between Alpha and VAX systems. These are described in Section 9.4.
In addition, VAX nodes can MOP load Alpha satellites, and Alpha nodes
can MOP load VAX satellites. Cross-architecture booting is described in
Section 10.5.
9.4 Configuring and Booting Satellite Nodes
Complete the steps in Table 9-3 before proceeding with satellite booting.
Step | Action |
---|---|
1 | Configure disk server LAN adapters. Because disk-serving activity in an OpenVMS Cluster system can generate a substantial amount of I/O traffic on the LAN, boot and disk servers should use the highest-bandwidth LAN adapters in the cluster. The servers can also use multiple LAN adapters in a single system to distribute the load across the LAN adapters. The following list suggests ways to provide sufficient network bandwidth: |
2 | If the MOP server node and system-disk server node (Alpha or VAX) are not already configured as cluster members, follow the directions in Section 8.4 for using the cluster configuration command procedure to configure each of the VAX or Alpha nodes. Include multiple boot and disk servers to enhance availability and distribute I/O traffic over several cluster nodes. |
3 | Configure additional memory for disk serving. |
4 | Run the cluster configuration procedure on the Alpha or VAX node for each satellite you want to boot into the OpenVMS Cluster. |
Copyright © Compaq Computer Corporation 1998. All rights reserved.