Document revision date: 19 July 1999
Volume Shadowing for OpenVMS is a System Integrated Product (SIP) that you install at the same time that you install the operating system. However, you purchase the license and the product separately from the OpenVMS base operating system. To use the volume shadowing software, you must install the license. See the instructions included in your current OpenVMS upgrade and installation manual.
See Section 3.1 for more information about licensing Volume Shadowing for OpenVMS.
System availability is a critical requirement in most computing environments. A dependable environment enables users to interact with their system when they want and in the way they want.
A key component of overall system availability is availability or accessibility of data. Volume Shadowing for OpenVMS provides high levels of data availability by allowing shadow sets to be configured on a single-node system or on an OpenVMS Cluster system, so that continued access to data is possible despite failures in the disk media, disk drive, or disk controller. For shadow sets whose members are local to different OpenVMS Cluster nodes, if one node serving a shadow set member shuts down, the data is still accessible through an alternate node.
Although you can create a shadow set that consists of only one disk, you must mount two or more volumes in order to "shadow," that is, to maintain multiple copies of the same data. This configuration protects against either the failure of a single disk drive or the deterioration of a single volume. For example, if one member fails out of a shadow set, the remaining member can serve as a source disk whose data applications can access while that data is being copied to a newly mounted target disk. Once the copy completes, both disks contain identical information and the target disk becomes a full source member of the shadow set.
Using two controllers provides a further guarantee of data availability in the event of a single-controller failure. When setting up a system with volume shadowing, you should connect each disk drive to a different controller I/O channel whenever possible. Separate connections help protect against failure of either a single controller or the communication path used to access it.
Using an OpenVMS Cluster system (as opposed to a single-node environment) and multiple controllers provides the greatest data availability. Shadow set members can be attached to different controllers or accessed through MSCP servers.
Figure 2-1 provides a qualitative, high-level classification of how you can achieve low to high levels of physical data availability in different types of configurations.
Figure 2-1 Levels of Availability
Section 2.1 describes how you can configure your shadowed system to achieve high data availability despite physical failures.
2.1 Repair and Recovery from Failures
A common failure that makes data unavailable is a communication failure. A host node can detect communication failures any time data is transferred between the host computer and a controller. Table 2-1 describes the categories of errors and the actions the volume shadowing software takes to repair or recover from each.
Type | Description |
---|---|
Controller error | Results from a failure in the controller. If the failure is recoverable, processing continues and data availability is not impacted. If the failure is nonrecoverable, shadow set members connected to the controller are removed from the shadow set, and processing continues with the remaining members. In configurations where disks are dual-pathed between two controllers, and one controller fails, the shadow set members fail over to the remaining controller and processing continues. |
Unit or drive error | Signifies that the mechanics or electronics in the device failed. If the failure is recoverable, processing continues. If the failure is nonrecoverable, the node that detects the error removes the device or unit from the shadow set. |
Data errors | Results when a device detects corrupt data. Data errors usually result from media defects that do not cause the device to be removed from a shadow set. Depending on the severity of the data error (or the degree of media deterioration), the controller: In situations where data is not correctable by the controller, volume shadowing replaces the lost data by retrieving it from another shadow set member and writing the data to the new LBN of the member with the incorrect data. |
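Table 2-1 amounts to a small decision procedure. The Python sketch below encodes it for illustration only; the function name, its parameters, and the returned strings are hypothetical and are not an OpenVMS interface. How the controller itself corrects data errors is not modeled; the `controller_corrected` flag simply stands in for that outcome.

```python
from enum import Enum, auto


class ErrorType(Enum):
    CONTROLLER = auto()
    UNIT_OR_DRIVE = auto()
    DATA = auto()


def recovery_action(error: ErrorType, recoverable: bool,
                    dual_pathed: bool = False,
                    controller_corrected: bool = True) -> str:
    """Summarize the Table 2-1 response to an error (hypothetical helper)."""
    if error is ErrorType.CONTROLLER:
        if recoverable:
            return "continue"  # data availability is not impacted
        if dual_pathed:
            return "fail over to remaining controller"
        return "remove members on failed controller"
    if error is ErrorType.UNIT_OR_DRIVE:
        return "continue" if recoverable else "remove unit from shadow set"
    # Data errors normally leave the member in the set. If the controller
    # cannot correct the data, shadowing reads good data from another member
    # and writes it to a new LBN on the member with the bad data.
    if controller_corrected:
        return "corrected by controller"
    return "repair from another member"


print(recovery_action(ErrorType.CONTROLLER, recoverable=False, dual_pathed=True))
```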
When a recoverable unit or drive failure occurs, the first node to detect the failure must decide how to recover in the manner least likely to affect the availability or consistency of the data. The node that discovers the failure determines its course of action as follows:
Handling of shadow set recovery and repair differs depending on the type of failure that occurred and the hardware configuration. In general, devices that are inaccessible tend to fail over to other controllers whenever possible or are removed from the shadow set.

Errors that occur as a result of media defects can often be repaired by copying good data from other source shadow members to the member with the error. This repair operation is synchronized within the cluster and with the application I/O stream.
2.2 Shadow Set Configurations
To illustrate the varying levels of data availability obtainable through Volume Shadowing for OpenVMS, this section provides a representative sample of hardware configurations. Figures 2-2 through 2-7 show possible system configurations for shadow sets.
In addition to the hardware components, these figures illustrate how the shadow set virtual unit relates to processors in the system. The hardware used to describe the sample systems, while intended to be representative, is hypothetical; use these examples only for general observations about availability, not as recommendations for any specific configurations or products.
In all of the following examples, the shadow set members use the $allocation-class$ddcu: naming convention. The virtual unit uses the DSAn: format, where n represents a number between 0 and 9999. These naming conventions are described in more detail in Section 4.2.
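The two naming conventions can be checked mechanically. The Python sketch below is a hypothetical validator, not part of OpenVMS: it accepts DSAn: with n from 0 to 9999 and splits a member name such as $2$DUA5: into allocation class, two-letter device type, controller letter, and unit number. The trailing colon is treated as optional because the figures in this chapter also write names such as $2$DUA5 without it.

```python
import re

# Virtual unit: DSAn:, where n is 0 through 9999 (one to four digits).
VIRTUAL_UNIT = re.compile(r"DSA(\d{1,4}):?")
# Member: $allocation-class$ddcu:, for example $2$DUA5:
#   dd = device type, c = controller letter, u = unit number
MEMBER = re.compile(r"\$(\d+)\$([A-Z]{2})([A-Z])(\d+):?")


def is_virtual_unit(name: str) -> bool:
    return VIRTUAL_UNIT.fullmatch(name.upper()) is not None


def parse_member(name: str) -> dict:
    m = MEMBER.fullmatch(name.upper())
    if m is None:
        raise ValueError(f"not an $allocation-class$ddcu: name: {name!r}")
    alloc, dd, c, u = m.groups()
    return {"allocation_class": int(alloc), "device_type": dd,
            "controller": c, "unit": int(u)}


print(is_virtual_unit("DSA1:"), parse_member("$2$DUA5:"))
```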
Figure 2-2 presents a system with one CPU and one controller. The shadow set consists of three disk members. This configuration provides coverage against media errors and against the failure of up to two member disks.
Figure 2-2 Configuration of a Shadow Set (One CPU, One Controller)
Figure 2-3 presents a system with one CPU and two controllers. In this configuration, each shadow set member is connected to a different controller.
In addition to providing coverage against media errors or disk failures, this type of configuration provides continued access to data in spite of the failure of either one of the controllers.
Figure 2-3 Configuration of a Shadow Set (One CPU, Two Controllers)
Figure 2-4 presents two CPUs and three shadow set member units connected by dual paths to two controllers. The shadow set is accessible with either one or both systems operating. In this configuration, any given disk can be on line to only one controller at a time. For example, $2$DUA5 is on line (primary path) to CPU A on the left. As a result, CPU B on the right accesses $2$DUA5 by means of the MSCP server on CPU A. If CPU A fails, $2$DUA5 fails over to the controller on CPU B.
Different members of the shadow set can fail over between controllers independently of each other. The satellite nodes access the shadow set members by means of the MSCP servers on each system. Satellites access all disks over primary paths, and failover is handled automatically.
Figure 2-4 Configuration of a Shadow Set (An OpenVMS Cluster, Dual Controllers)
Figure 2-5 presents an OpenVMS Cluster system with two systems connected to multiple disks on a DSSI interconnect. The DSA1 and DSA2 virtual units represent the two shadow sets and are accessible through either system. This configuration offers both an availability and a performance advantage. The shadowed disks in this configuration are highly available because the satellite nodes have access through either of the systems. Thus, if one system fails, the satellites can access the shadowed disks through the remaining system. In addition, this configuration offers a performance advantage by utilizing an interconnect separate from the Ethernet for I/O traffic. In general, you can expect better I/O throughput from this type of configuration than from an Ethernet-only OpenVMS Cluster system.
Figure 2-5 Configuration of a Shadow Set (Highly Available Local Area OpenVMS Cluster)
Figure 2-6 illustrates how shadowed disks can be located anywhere throughout an OpenVMS Cluster system. The figure presents a cluster system with three nodes, multiple HSJ controllers, and multiple shadow set members that are accessible by any node. The shadow sets are accessible with three nodes, with two nodes, and, in some cases, with only one node operating. The exception is when both CPUA and CPUB fail, leaving only CPUC running. In this case, access to the secondary star coupler is lost, preventing access to the DSA1 and DSA2 shadow sets. Note that Figure 2-6 also configures shadow set members on different star couplers.
Figure 2-6 Configuration of a Shadow Set (With Multiple Star Couplers and Multiple Controllers)
Figure 2-7 illustrates how the FDDI (Fiber Distributed Data Interface) interconnect allows you to shadow data disks over long distances. Members of each shadow set are configured between two distinct and widely separated locations. The OpenVMS systems and shadowed disks in both locations function as a single OpenVMS Cluster system and shadow set configuration. If a failure occurs at either site, the critical data is still available at the remaining site.
Figure 2-7 Configuration of a Shadowed FDDI Cluster
Both Alpha and VAX systems can share data on shadowed data disks, as shown in Figure 2-7.
Systems other than satellite nodes are unable to boot from disks that are located remotely across an Ethernet or FDDI LAN. Therefore, these systems require local access to their system disk. Note that this restriction limits the ability to create system disk shadow sets across an FDDI or Ethernet LAN.
This chapter explains the system management tasks required for using volume shadowing on your system, including licensing, setting system parameters, and booting.
Once you have determined how to configure your shadow set, perform the following steps:
System disks can be shadowed. All nodes booting from that system disk must have shadowing licensed and enabled.
3.1 Licensing Volume Shadowing for OpenVMS
To use the volume shadowing product, you must purchase the license separately from the OpenVMS operating system even though the volume shadowing software is part of the OpenVMS operating system.
Volume shadowing licenses are available in two options: a capacity license and a per-disk license.
Both options work on the same CPU or in an OpenVMS Cluster that contains both Alpha and VAX computers.
After licensing the OpenVMS operating system by registering an OpenVMS Product Authorization Key (PAK), you must license Volume Shadowing for OpenVMS with a separate volume shadowing PAK. The PAK provides information that defines the Volume Shadowing for OpenVMS license contract you have with Compaq Computer Corporation. Obtain a PAK from your Compaq sales representative.
When you enter information from the PAK into the online LICENSE database, the OpenVMS License Management Facility (LMF) authorizes the use of volume shadowing.
If you have a per-disk license, you must register and activate a license for each shadowed disk. Starting with Volume Shadowing for OpenVMS Version 7.1, shadowing performs a license check for each disk that is shadowed under the per-disk volume shadowing license. Per-disk volume shadowing licenses apply to full shadow set members only. When the number of shadow set members exceeds the number of per-disk licenses for five minutes, shadowing issues an OPCOM warning message. You can also have this message sent to an e-mail account by defining the system logical name SHADOW_SERVER$MAIL_NOTIFICATION as a standard OpenVMS Mail address or a UNIX (Internet) address. An invalid address does not generate a failure message.
Shadowing issues notification again 59 minutes after noncompliant shadow set members are mounted. One minute later, shadow set members are automatically removed from shadow sets until the number of members equals the number of licenses. Members are removed systematically from multiple-member shadow sets; single-member shadow sets will not be affected.
Disks that are the target of a copy operation do not consume a license unit until the copy is complete. Thus it is always possible to obtain a copy of a single-member shadow set.
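The per-disk license rules above reduce to simple counting. The Python sketch below is a hypothetical model, not a description of how the shadow server is implemented: it counts license units (copy targets excluded), measures every step from the moment the member count first exceeds the license count (a simplification of the two timings in the text), and only lists removal candidates, because the text does not specify the order in which members are removed.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Member:
    shadow_set: str             # virtual unit, e.g. "DSA1:"
    device: str                 # physical member, e.g. "$2$DUA5:"
    copy_target: bool = False   # True while the member is still a copy target


# Minutes of noncompliance at which each step occurs (simplified timeline).
SCHEDULE = [(5, "OPCOM warning"),
            (59, "second notification"),
            (60, "remove excess members")]


def licenses_in_use(members: list[Member]) -> int:
    # Only full members consume a per-disk license unit; copy targets do not.
    return sum(1 for m in members if not m.copy_target)


def excess_members(members: list[Member], licenses: int) -> int:
    # Members are removed until the member count equals the license count.
    return max(0, licenses_in_use(members) - licenses)


def actions_due(minutes_noncompliant: int) -> list[str]:
    return [action for t, action in SCHEDULE if minutes_noncompliant >= t]


def removal_candidates(members: list[Member]) -> list[Member]:
    # Single-member shadow sets are not affected, so only full members of
    # sets that still have more than one full member are candidates.
    full = Counter(m.shadow_set for m in members if not m.copy_target)
    return [m for m in members if not m.copy_target and full[m.shadow_set] > 1]


members = [Member("DSA1:", "$1$DUA1:"), Member("DSA1:", "$1$DUA2:"),
           Member("DSA2:", "$1$DUA3:"), Member("DSA2:", "$1$DUA4:", copy_target=True)]
print(licenses_in_use(members), excess_members(members, 2))
```

With two licenses, the three full members exceed the limit by one, while the copy target on DSA2: is not counted.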
If you are using capacity licenses, you must register and activate a license for Volume Shadowing for OpenVMS on each node that mounts a shadow set, including satellites in an OpenVMS Cluster system. If you do not register and activate nodes or disks that will use volume shadowing, subsequent shadow set mount operations will not succeed and will display error messages like the one in Example 3-1.
Example 3-1 Nodes Not Registered to Use Volume Shadowing

    %LICENSE-E-NOAUTH, DEC VOLSHAD use is not authorized on this node
    -LICENSE-F-NOLICENSE, no license is active for this software product
    -LICENSE-I-SYSMGR, please see your system manager
For more information about the License Management Facility, refer to the OpenVMS Operating System Software Product Description 25.01.xx.
You can also consult the OpenVMS License Management Utility Manual.
After you register the volume shadowing PAK, you must set the shadowing parameters on each node where you want to enable shadowing.
3.2 Setting the Volume Shadowing Parameters
Table 3-1 lists the system parameters that you can use to tailor the shadowing software on your system. These parameters were introduced in OpenVMS Version 7.1.
Parameter | Function | Range | Default | Dynamic |
---|---|---|---|---|
SHADOWING | Enables phase II of volume shadowing. See Table 3-2 for a description of parameter values. | 0, 2 | 0 | No |
SHADOW_MAX_COPY | Limits the number of concurrent merge or copy operations on a given node. | 0-200 | 4 | Yes |
SHADOW_MBR_TMO | Controls the amount of time the system tries to fail over physical members of a shadow set. | 1-65,535 seconds | 120 | Yes |
SHADOW_SYS_DISK | Allows the system disk to be a shadow set and, optionally, enables a minimerge to occur. If a minimerge is enabled, the system must also be configured to write a dump to a nonshadowed, nonsystem disk of your choice. | 0, 1, 4097 | 0 | Yes |
SHADOW_SYS_TMO | Controls the amount of time members of a system disk shadow set have to return to the set. | 1-65,535 seconds | 120 | Yes |
SHADOW_SYS_UNIT | Contains the virtual unit number of the system disk. | 0-9999 | 0 | No |
SHADOW_SYS_WAIT | Controls the amount of time a booting system waits for all members of a mounted shadow system disk to become available. Applies only to shadow sets that are currently mounted in the cluster. | 1-65,535 seconds | 256 | Yes |
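Before changing these parameters, it can help to check proposed values against Table 3-1. The Python sketch below is a hypothetical pre-check, not an OpenVMS utility; the allowed values and the Dynamic column are copied from the table, and the function names are illustrative.

```python
# Allowed values from Table 3-1 (inclusive ranges or explicit value sets).
ALLOWED = {
    "SHADOWING": {0, 2},
    "SHADOW_MAX_COPY": range(0, 201),
    "SHADOW_MBR_TMO": range(1, 65536),
    "SHADOW_SYS_DISK": {0, 1, 4097},
    "SHADOW_SYS_TMO": range(1, 65536),
    "SHADOW_SYS_UNIT": range(0, 10000),
    "SHADOW_SYS_WAIT": range(1, 65536),
}
# Parameters marked "No" in the Dynamic column of Table 3-1.
NOT_DYNAMIC = {"SHADOWING", "SHADOW_SYS_UNIT"}


def check_params(params: dict[str, int]) -> list[str]:
    """Return a list of problems; an empty list means every value is allowed."""
    problems = []
    for name, value in params.items():
        allowed = ALLOWED.get(name)
        if allowed is None:
            problems.append(f"{name}: not a volume shadowing parameter")
        elif value not in allowed:
            problems.append(f"{name}={value}: value not allowed")
    return problems


def static_changes(changed: set[str]) -> set[str]:
    # Changes to these parameters cannot be applied on a running system.
    return changed & NOT_DYNAMIC


print(check_params({"SHADOWING": 2, "SHADOW_MAX_COPY": 4, "SHADOW_SYS_UNIT": 10000}))
```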
The following subsections discuss these parameters in more detail.
The SHADOWING parameter enables or disables specific phases of volume shadowing on your system.
Note that volume shadowing requires that KDM70 disk controllers run a minimum of Version 3.0 microcode.
Table 3-2 describes these settings in detail.
Setting | Effect |
---|---|
0 | Shadowing is not enabled. This is the default value. |
2 | Enables host-based shadowing. This setting provides shadowing of all MSCP compliant disks that are located on a standalone system or on an OpenVMS Cluster system. Set SHADOWING to 2 on every node that will mount a shadow set, including satellite nodes. |
The SHADOW_MAX_COPY parameter controls how many parallel copy and merge operations are allowed on a given node. (Copy and merge operations are described in Chapter 6.) This parameter provides a way to limit the number of copy and merge operations in progress at any one time.
The value of SHADOW_MAX_COPY can range from 0 to 200. The default value is specific to the OpenVMS version. You can determine the default value by looking at the parameter setting. When the value of the SHADOW_MAX_COPY parameter is 4, and you mount five multivolume shadow sets that all need a copy operation, only four copy operations can proceed. The fifth copy operation must wait until one of the first four copies completes.
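The throttling behavior in the example above can be simulated. The Python sketch below is a simplified model, not the shadowing driver's actual scheduler: it assumes all requests arrive at time 0, that waiting requests start in the order they were issued, and that a value of 0 means this node starts no copy or merge operations at all (consistent with the advice for satellites). The durations are arbitrary illustrative numbers.

```python
import heapq
from typing import Optional


def schedule_copies(durations: list[float], max_copy: int) -> list[Optional[float]]:
    """Start time of each copy or merge request (all issued at time 0).

    Returns None for every request when max_copy is 0, meaning this node
    starts none of them.
    """
    if max_copy == 0:
        return [None] * len(durations)
    finishing: list[float] = []   # min-heap of finish times of active operations
    starts: list[Optional[float]] = []
    clock = 0.0
    for duration in durations:    # requests are served in order (assumed FIFO)
        if len(finishing) >= max_copy:
            # Wait for the earliest active operation to finish.
            clock = max(clock, heapq.heappop(finishing))
        starts.append(clock)
        heapq.heappush(finishing, clock + duration)
    return starts


print(schedule_copies([10.0] * 5, max_copy=4))  # -> [0.0, 0.0, 0.0, 0.0, 10.0]
```

As in the text, the fifth copy does not start until one of the first four finishes.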
Consider the following when choosing a value for the SHADOW_MAX_COPY parameter:
For example, the default value of 4 may be too high for a small node. (In particular, satellite nodes should have SHADOW_MAX_COPY set to a value of 0.) Too low a value for SHADOW_MAX_COPY unnecessarily restricts the number of operations your system can effectively handle and extends the amount of time it takes to merge all of the shadow sets.
SHADOW_MAX_COPY is a dynamic parameter. Changes to the parameter affect only future copy and merge operations; current operations (pending or already in progress) are not affected. See the OpenVMS System Manager's Manual for more information about setting dynamic parameters.
The SHADOW_MBR_TMO parameter controls the amount of time the system tries to fail over physical members of a shadow set before removing them from the set.
With the SHADOW_MBR_TMO parameter, you specify the number of seconds, from 1 to 65,535, during which recovery of a shadow set member is attempted.
The value of SHADOW_MBR_TMO should not exceed the value of the parameter MVTIMEOUT.
If you specify 0, a default delay is used. The default delay is specific to the version of OpenVMS running on your system. For shadow sets in an OpenVMS Cluster configuration, the value of SHADOW_MBR_TMO should be set to the same value on each node.
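The two cluster-wide rules for SHADOW_MBR_TMO (the same value on every node, and no larger than MVTIMEOUT) can be checked from a collection of per-node settings. The Python sketch below is hypothetical: it assumes you have already gathered each node's values by some other means, and the node names and numbers are arbitrary examples, not recommended or default settings.

```python
def check_mbr_tmo(nodes: dict[str, dict[str, int]]) -> list[str]:
    """nodes maps a node name to its SHADOW_MBR_TMO and MVTIMEOUT values."""
    problems = []
    values = {node: p["SHADOW_MBR_TMO"] for node, p in nodes.items()}
    # Rule 1: every node in the cluster should use the same value.
    if len(set(values.values())) > 1:
        problems.append(f"SHADOW_MBR_TMO differs across nodes: {values}")
    # Rule 2: SHADOW_MBR_TMO should not exceed MVTIMEOUT on any node.
    for node, p in nodes.items():
        if p["SHADOW_MBR_TMO"] > p["MVTIMEOUT"]:
            problems.append(f"{node}: SHADOW_MBR_TMO exceeds MVTIMEOUT")
    return problems


cluster = {
    "CPUA": {"SHADOW_MBR_TMO": 120, "MVTIMEOUT": 3600},
    "CPUB": {"SHADOW_MBR_TMO": 180, "MVTIMEOUT": 3600},
}
print(check_mbr_tmo(cluster))
```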
SHADOW_MBR_TMO is a dynamic parameter that you can change on a running system. See the OpenVMS System Manager's Manual for more information about setting dynamic parameters.
Determining the correct value for SHADOW_MBR_TMO is a trade-off between rapid recovery and high availability. If rapid recovery is required, set SHADOW_MBR_TMO to a low value. This ensures that failing shadow set members are removed from the shadow set quickly and that user access to the shadow set continues. However, removal of shadow set members reduces data availability and, after the failed member is repaired, a full copy operation is required when it is mounted back into the shadow set.
If high availability is paramount, set SHADOW_MBR_TMO to a high value. This allows the shadowing software additional time to regain access to failed members. However, user access to the shadow set is stalled during the recovery process. If recovery is successful, access to the shadow set continues without the need for a full copy operation, and data availability is not degraded. Setting SHADOW_MBR_TMO to a high value may be appropriate when shadow set members are configured across LANs that require lengthy bridge recovery time.
Shadowing uses a timer to enforce the number of seconds specified by the SHADOW_MBR_TMO parameter. For directly connected SCSI devices that have been powered down or do not respond to polling, however, the elapsed time before a device is removed from a shadow set can be several minutes.
A SHADOW_SYS_DISK parameter value of 1 enables shadowing of the system disk. A value of 0 disables shadowing of the system disk. A value of 4097 enables a minimerge. The default value is 0.
If you enable a minimerge of the system disk, you must also configure your system to write a dump to a nonshadowed, nonsystem disk of your choice. This is known as dump off system disk (DOSD). For more information on DOSD, see the OpenVMS System Manager's Manual: Tuning, Monitoring, and Complex Systems.
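The value 4097 equals 1 + 4096 (hexadecimal 1001), which suggests a bit-flag encoding: bit 0 enables system disk shadowing and bit 12 enables the minimerge. That reading is an assumption made for illustration. The Python sketch below accepts only the three documented values and flags the DOSD requirement; the `dosd_configured` input is hypothetical and stands in for however you confirm that dumps are written off the system disk.

```python
SYS_DISK_SHADOWED = 0x0001  # bit 0: value 1
MINIMERGE = 0x1000          # bit 12: value 4096 (assumed reading of 4097 = 0x1001)


def describe_sys_disk(value: int, dosd_configured: bool) -> list[str]:
    """Explain a SHADOW_SYS_DISK value; dosd_configured is a hypothetical input."""
    if value not in (0, 1, 4097):
        raise ValueError(f"unsupported SHADOW_SYS_DISK value: {value}")
    if not value & SYS_DISK_SHADOWED:
        return ["system disk shadowing disabled"]
    notes = ["system disk shadowing enabled"]
    if value & MINIMERGE:
        notes.append("minimerge enabled")
        if not dosd_configured:
            notes.append("error: minimerge requires dump off system disk (DOSD)")
    return notes


print(describe_sys_disk(4097, dosd_configured=False))
```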