
Volume Shadowing for OpenVMS



1.5 Installation

Volume Shadowing for OpenVMS is a System Integrated Product (SIP) that you install at the same time as the operating system. However, you purchase the license and the product separately from the OpenVMS base operating system. To use the volume shadowing software, you must install the license. See the instructions included in your current OpenVMS upgrade and installation manual.

See Section 3.1 for more information about licensing Volume Shadowing for OpenVMS.


Chapter 2
Configuring Your System for High Data Availability

System availability is a critical requirement in most computing environments. A dependable environment enables users to interact with their system when they want and in the way they want.

A key component of overall system availability is availability or accessibility of data. Volume Shadowing for OpenVMS provides high levels of data availability by allowing shadow sets to be configured on a single-node system or on an OpenVMS Cluster system, so that continued access to data is possible despite failures in the disk media, disk drive, or disk controller. For shadow sets whose members are local to different OpenVMS Cluster nodes, if one node serving a shadow set member shuts down, the data is still accessible through an alternate node.

Although you can create a shadow set that consists of only one disk, you must mount two or more volumes in order to "shadow"---maintain multiple copies of the same data. This configuration protects against either failure of a single disk drive or deterioration of a single volume. For example, if one member fails out of a shadow set, the remaining member can be used as a source disk whose data can be accessed by applications at the same time the data is being copied to a newly mounted target disk. Once the data is copied, both disks contain identical information and the target disk becomes a complete member of the shadow set.
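As a hedged illustration of this process in DCL (the virtual unit DSA3:, the device names, and the volume label are hypothetical), a one-member shadow set can be mounted and a second member added later, which triggers the copy operation:

$ ! Create a single-member shadow set
$ MOUNT/SYSTEM DSA3: /SHADOW=$1$DUA10: DATA_VOL
$ ! Add a second member; shadowing copies the data to it
$ MOUNT/SYSTEM DSA3: /SHADOW=$1$DUA11: DATA_VOL
$ SHOW DEVICES DSA3:    ! check member and copy status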

Using two controllers provides a further guarantee of data availability in the event of a single-controller failure. When setting up a system with volume shadowing, you should connect each disk drive to a different controller I/O channel whenever possible. Separate connections help protect against either failure of a single controller or of the communication path used to access it.

Using an OpenVMS Cluster system (as opposed to a single-node environment) and multiple controllers provides the greatest data availability. Shadow set members can be connected to different controllers or served to the cluster by different MSCP servers.

Figure 2-1 provides a qualitative, high-level classification of how you can achieve low to high levels of physical data availability in different types of configurations.

Figure 2-1 Levels of Availability


Section 2.1 describes how you can configure your shadowed system to achieve high data availability despite physical failures.

2.1 Repair and Recovery from Failures

A common failure that makes data unavailable is a communication failure. A host node can detect communication failures any time data is transferred between the host computer and a controller. Table 2-1 describes the types of failures and the actions the volume shadowing software takes to repair or recover from each error.

Table 2-1 Types of Device Failures
Type Description
Controller error Results from a failure in the controller. If the failure is recoverable, processing continues and data availability is not impacted. If the failure is nonrecoverable, shadow set members connected to the controller are removed from the shadow set, and processing continues with the remaining members. In configurations where disks are dual-pathed between two controllers, and one controller fails, the shadow set members fail over to the remaining controller and processing continues.
Unit or drive error Signifies that the mechanics or electronics in the device failed. If the failure is recoverable, processing continues. If the failure is nonrecoverable, the node that detects the error removes the device or unit from the shadow set.
Data error Results when a device detects corrupt data. Data errors usually result from media defects that do not cause the device to be removed from a shadow set. Depending on the severity of the data error (or the degree of media deterioration), the controller:
  • Corrects the error and continues.
  • Corrects the data and, depending on the device and controller implementation, may revector it to a new logical block number (LBN).

In situations where data is not correctable by the controller, volume shadowing replaces the lost data by retrieving it from another shadow set member and writing the data to the new LBN of the member with the incorrect data.

When a recoverable unit or drive failure occurs, the first node to detect the failure must decide how to recover in a manner least likely to affect the availability or consistency of the data. The node that discovers the failure determines its course of action as described below.

Handling of shadow set recovery and repair differs depending on the type of failure that occurred and the hardware configuration. In general, devices that are inaccessible tend to fail over to other controllers whenever possible or are removed from the shadow set. Errors that occur as a result of media defects can often be repaired by copying good data from other source shadow members to the member with the error. This repair operation is synchronized within the cluster and with the application I/O stream.

2.2 Shadow Set Configurations

To illustrate the varying levels of data availability obtainable through Volume Shadowing for OpenVMS, this section provides a representative sample of hardware configurations. Figures 2-2 through 2-7 show possible system configurations for shadow sets.

In addition to the hardware components, these figures illustrate how the shadow set virtual unit relates to the processors in the system. The hardware in the sample systems, while intended to be representative, is hypothetical; the examples should be used only for general observations about availability, not as suggestions for specific configurations or products.

In all of the following examples, the shadow set members use the $allocation-class$ddcu: naming convention, and the virtual unit uses the DSAn: format, where n is a number between 0 and 9999. For example, $2$DUA5 names device DUA5 with allocation class 2, and DSA2 names shadow set virtual unit 2. These naming conventions are described in more detail in Section 4.2.

Figure 2-2 presents a system with one CPU and one controller. The shadow set consists of three disk members. This configuration protects against media errors and against the failure of up to two member disks.

Figure 2-2 Configuration of a Shadow Set (One CPU, One Controller)


Figure 2-3 presents a system with one CPU and two controllers. In this configuration, each shadow set member is connected to a different controller.

In addition to providing coverage against media errors or disk failures, this type of configuration provides continued access to data in spite of the failure of either one of the controllers.

Figure 2-3 Configuration of a Shadow Set (One CPU, Two Controllers)


Figure 2-4 presents two CPUs and three shadow set member units connected by dual paths to two controllers. The shadow set is accessible with either one or both systems operating. In this configuration, any given disk can be on line to only one controller at a time. For example, $2$DUA5 is on line (the primary path) to CPU A on the left. As a result, CPU B on the right accesses $2$DUA5 by means of the MSCP server on CPU A. If CPU A fails, $2$DUA5 fails over to the controller on CPU B.

Different members of the shadow set can fail over between controllers independently of each other. The satellite nodes access the shadow set members by means of the MSCP servers on each system. Satellites access all disks over primary paths, and failover is handled automatically.

Figure 2-4 Configuration of a Shadow Set (An OpenVMS Cluster, Dual Controllers)


Figure 2-5 presents an OpenVMS Cluster system with two systems connected to multiple disks on a DSSI interconnect. The DSA1 and DSA2 virtual units represent the two shadow sets and are accessible through either system. This configuration offers both an availability and a performance advantage. The shadowed disks in this configuration are highly available because the satellite nodes have access through either of the systems. Thus, if one system fails, the satellites can access the shadowed disks through the remaining system. In addition, this configuration offers a performance advantage by utilizing an interconnect separate from the Ethernet for I/O traffic. In general, you can expect better I/O throughput from this type of configuration than from an Ethernet-only OpenVMS Cluster system.

Figure 2-5 Configuration of a Shadow Set (Highly Available Local Area OpenVMS Cluster)


Figure 2-6 illustrates how shadowed disks can be located anywhere throughout an OpenVMS Cluster system. The figure presents a cluster with three nodes, multiple HSJ controllers, and multiple shadow set members that are accessible by any node. The shadow sets are accessible with three nodes, with two nodes, and, in some cases, with only one node operating. The exception is when CPU A and CPU B both fail, leaving only CPU C running; in that case, access to the secondary star coupler is lost, preventing access to the DSA1 and DSA2 shadow sets. Note that Figure 2-6 also shows shadow set members configured on different star couplers.

Figure 2-6 Configuration of a Shadow Set (With Multiple Star Couplers and Multiple Controllers)


Figure 2-7 illustrates how the FDDI (Fiber Distributed Data Interface) interconnect allows you to shadow data disks over long distances. Members of each shadow set are configured between two distinct and widely separated locations. The OpenVMS systems and shadowed disks in both locations function as a single OpenVMS Cluster system and shadow set configuration. If a failure occurs at either site, the critical data is still available at the remaining site.

Figure 2-7 Configuration of a Shadowed FDDI Cluster


Both Alpha and VAX systems can share data on shadowed data disks, as shown in Figure 2-7.

Systems other than satellite nodes are unable to boot from disks that are located remotely across an Ethernet or FDDI LAN. Therefore, these systems require local access to their system disk. Note that this restriction limits the ability to create system disk shadow sets across an FDDI or Ethernet LAN.


Chapter 3
Preparing to Use Volume Shadowing

This chapter explains the system management tasks required for using volume shadowing on your system, including licensing, setting system parameters, and booting.

Once you have determined how to configure your shadow set, perform the following steps: register and load the Volume Shadowing for OpenVMS license (see Section 3.1), set the volume shadowing system parameters on each node (see Section 3.2), and reboot so that the SHADOWING parameter takes effect.

System disks can be shadowed; however, all nodes booting from a shadowed system disk must have shadowing licensed and enabled.

3.1 Licensing Volume Shadowing for OpenVMS

Although the volume shadowing software is part of the OpenVMS operating system, you must purchase a license for it separately from the OpenVMS operating system license.

Volume shadowing licenses are available in two options: a per-disk license and a capacity (per-CPU) license.

Both options work on a single CPU or in an OpenVMS Cluster, including clusters that contain both Alpha and VAX computers.

After licensing the OpenVMS operating system by registering an OpenVMS Product Authorization Key (PAK), you must license Volume Shadowing for OpenVMS with a separate volume shadowing PAK. The PAK provides information that defines the Volume Shadowing for OpenVMS license contract you have with Compaq Computer Corporation. Obtain a PAK from your Compaq sales representative.

When you enter information from the PAK into the online LICENSE database, the OpenVMS License Management Facility (LMF) authorizes the use of volume shadowing.

If you have a per-disk license, you must register and activate a license for each shadowed disk. Starting with Volume Shadowing for OpenVMS Version 7.1, the software checks the license of each disk shadowed under a per-disk license. Per-disk volume shadowing licenses apply to full shadow set members only. When the number of shadow set members exceeds the number of per-disk licenses for five minutes, shadowing issues an OPCOM warning message. You can also have this message sent to an e-mail account by defining the system logical name SHADOW_SERVER$MAIL_NOTIFICATION to a standard OpenVMS Mail address or a UNIX (Internet) address. An invalid address does not generate a failure message.
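As a minimal sketch of enabling this notification (the address SYSTEM is only an example; any valid OpenVMS Mail or Internet address can be used):

$ ! Send license-compliance warnings to the SYSTEM account
$ DEFINE/SYSTEM SHADOW_SERVER$MAIL_NOTIFICATION SYSTEM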

Shadowing issues notification again 59 minutes after noncompliant shadow set members are mounted. One minute later, shadow set members are automatically removed from shadow sets until the number of members equals the number of licenses. Members are removed systematically from multiple-member shadow sets; single-member shadow sets will not be affected.

Disks that are the target of a copy operation do not consume a license unit until the copy is complete. Thus it is always possible to obtain a copy of a single-member shadow set.

If you are using the capacity license option, you must register and activate a license for Volume Shadowing for OpenVMS on each node that mounts a shadow set, including satellites in an OpenVMS Cluster system. If you do not register and activate licenses for the nodes or disks that will use volume shadowing, subsequent shadow set mount operations will fail, displaying error messages like those in Example 3-1.

Example 3-1 Nodes Not Registered to Use Volume Shadowing

%LICENSE-E-NOAUTH, DEC VOLSHAD use is not authorized on this node 
-LICENSE-F-NOLICENSE, no license is active for this software product 
-LICENSE-I-SYSMGR, please see your system manager 

For more information about the License Management Facility, refer to the OpenVMS Operating System Software Product Description 25.01.xx.

You can also consult the OpenVMS License Management Utility Manual.
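Registration can be done with the VMSLICENSE.COM procedure, which prompts for each value on the PAK, or directly with the LICENSE command. The following is a hedged sketch; every value shown is a placeholder for the corresponding field on your own PAK:

$ @SYS$UPDATE:VMSLICENSE.COM    ! interactive registration

or, equivalently:

$ LICENSE REGISTER VOLSHAD /ISSUER=DEC /PRODUCER=DEC -
  /AUTHORIZATION=USA000000 /UNITS=0 /CHECKSUM=4-AAAA-BBBB-CCCC-DDDD
$ LICENSE LOAD VOLSHAD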

After you register the volume shadowing PAK, you must set the shadowing parameters on each node where you want to enable shadowing.

3.2 Setting the Volume Shadowing Parameters

Table 3-1 lists the system parameters that you can use to tailor the shadowing software on your system.

Table 3-1 Volume Shadowing Parameters

SHADOWING Enables phase II of volume shadowing. See Table 3-2 for a description of parameter values. Range: 0, 2 (1). Default: 0. Dynamic: No.

SHADOW_MAX_COPY Limits the number of concurrent merge or copy operations on a given node. Range: 0--200. Default: 4. Dynamic: Yes.

SHADOW_MBR_TMO Controls the amount of time the system tries to fail over physical members of a shadow set. Range: 1--65,535 seconds. Default: 120. Dynamic: Yes.

SHADOW_SYS_DISK Allows the system disk to be a shadow set and, optionally, enables a minimerge to occur. If a minimerge is enabled, the system must also be configured for writing the dump to a nonshadowed, nonsystem disk of your choice. Range: 0, 1, 4097 (1). Default: 0. Dynamic: Yes.

SHADOW_SYS_TMO Controls the amount of time members of a system disk shadow set have to return to the set. Range: 1--65,535 seconds. Default: 120. Dynamic: Yes.

SHADOW_SYS_UNIT Contains the virtual unit number of the system disk. Range: 0--9999. Default: 0. Dynamic: No.

SHADOW_SYS_WAIT Controls the amount of time a booting system waits for all members of a mounted shadowed system disk to become available. This parameter applies only to shadow sets that are currently mounted in the cluster. Range: 1--65,535 seconds. Default: 256. Dynamic: Yes.

(1) All other values are reserved for use by Compaq.

The following subsections discuss these parameters in more detail.

SHADOWING

The SHADOWING parameter enables or disables specific phases of volume shadowing on your system.

Note that shadowing disks connected to KDM70 controllers requires the controllers to run Version 3.0 or later microcode.

Table 3-2 describes these settings in detail.

Table 3-2 SHADOWING Parameter Settings
Setting Effect
0 Shadowing is not enabled.

This is the default value.

2 Enables host-based shadowing.

This setting provides shadowing of all MSCP-compliant disks that are located on a standalone system or on an OpenVMS Cluster system. Set SHADOWING to 2 on every node that will mount a shadow set, including satellite nodes.
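Because SHADOWING is not a dynamic parameter, it is normally set at boot time. One common way, sketched here on the assumption that you use the standard AUTOGEN procedure, is to add the setting to MODPARAMS.DAT on each node that will mount shadow sets:

SHADOWING = 2

Then run AUTOGEN through its reboot phase:

$ @SYS$UPDATE:AUTOGEN GETDATA REBOOT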

SHADOW_MAX_COPY

The SHADOW_MAX_COPY parameter controls how many parallel copy and merge operations are allowed on a given node. (Copy and merge operations are described in Chapter 6.) This parameter provides a way to limit the number of copy and merge operations in progress at any one time.

The value of SHADOW_MAX_COPY can range from 0 to 200; the default value is 4. For example, when SHADOW_MAX_COPY is 4 and you mount five shadow sets that all need a copy operation, only four copy operations can proceed. The fifth copy operation must wait until one of the first four completes.

Consider the size and workload of your system when choosing a value for the SHADOW_MAX_COPY parameter.

For example, the default value of 4 may be too high for a small node. (In particular, satellite nodes should have SHADOW_MAX_COPY set to a value of 0.) Too low a value for SHADOW_MAX_COPY unnecessarily restricts the number of operations your system can effectively handle and extends the amount of time it takes to merge all of the shadow sets.

SHADOW_MAX_COPY is a dynamic parameter. Changes to the parameter affect only future copy and merge operations; current operations (pending or already in progress) are not affected. See the OpenVMS System Manager's Manual for more information about setting dynamic parameters.
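As a sketch, such a change can be made on a running system with SYSGEN; the value 2 is purely illustrative:

$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SET SHADOW_MAX_COPY 2
SYSGEN> WRITE ACTIVE    ! affects only future copy and merge operations
SYSGEN> EXIT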

SHADOW_MBR_TMO

The SHADOW_MBR_TMO parameter controls the amount of time the system tries to fail over physical members of a shadow set before removing them from the set.

With the SHADOW_MBR_TMO parameter, you specify the number of seconds, from 1 to 65,535, during which recovery of a shadow set member is attempted.

Note

The value of SHADOW_MBR_TMO should not exceed the value of the parameter MVTIMEOUT.

If you specify 0, a default delay is used. The default delay is specific to the version of OpenVMS running on your system. For shadow sets in an OpenVMS Cluster configuration, the value of SHADOW_MBR_TMO should be set to the same value on each node.
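Because the value should be the same clusterwide, one approach (a sketch with illustrative values) is to set it in MODPARAMS.DAT on every node, keeping it at or below MVTIMEOUT as the note above requires:

! In SYS$SYSTEM:MODPARAMS.DAT (values are illustrative)
SHADOW_MBR_TMO = 120
MVTIMEOUT = 3600    ! must not be less than SHADOW_MBR_TMO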

SHADOW_MBR_TMO is a dynamic parameter that you can change on a running system. See the OpenVMS System Manager's Manual for more information about setting dynamic parameters.

Determining the correct value for SHADOW_MBR_TMO is a trade-off between rapid recovery and high availability. If rapid recovery is required, set SHADOW_MBR_TMO to a low value. This ensures that failing shadow set members are removed from the shadow set quickly and that user access to the shadow set continues. However, removal of shadow set members reduces data availability and, after the failed member is repaired, a full copy operation is required when it is mounted back into the shadow set.

If high availability is paramount, set SHADOW_MBR_TMO to a high value. This allows the shadowing software additional time to regain access to failed members. However, user access to the shadow set is stalled during the recovery process. If recovery is successful, access to the shadow set continues without the need for a full copy operation, and data availability is not degraded. Setting SHADOW_MBR_TMO to a high value may be appropriate when shadow set members are configured across LANs that require lengthy bridge recovery time.

Shadowing uses a timer to enforce the number of seconds specified by the SHADOW_MBR_TMO parameter. However, for directly connected SCSI devices that have been powered down or that do not answer to polling, the elapsed time before a device is removed from a shadow set can be several minutes.

SHADOW_SYS_DISK

A SHADOW_SYS_DISK parameter value of 1 enables shadowing of the system disk. A value of 0 disables shadowing of the system disk. A value of 4097 enables a minimerge. The default value is 0.

If you enable a minimerge of the system disk, you must also configure your system to write a dump to a nonshadowed, nonsystem disk of your choice. This is known as dump off system disk (DOSD). For more information on DOSD, see the OpenVMS System Manager's Manual: Tuning, Monitoring, and Complex Systems manual.
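A minimal sketch of the corresponding parameter settings follows; the virtual unit number is hypothetical, and DOSD itself must be configured separately as described in the manual cited above:

! In SYS$SYSTEM:MODPARAMS.DAT (illustrative values)
SHADOW_SYS_DISK = 4097    ! shadowed system disk with minimerge enabled
SHADOW_SYS_UNIT = 17      ! system disk virtual unit is DSA17: (hypothetical)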

