Document revision date: 19 July 1999
To obtain the device names of all members of a shadow set, you must
make a series of calls to $GETDVI. In your first call to $GETDVI, you
can specify either the virtual unit that represents the shadow set or
the device name of a member of the shadow set.
5.6.2.1 Virtual Unit Names
If your first call specifies the name of the virtual unit, the item
list should contain a DVI$_SHDW_NEXT_MBR_NAME item descriptor into
which $GETDVI returns the name of the lowest-numbered member of the
shadow set. The devnam argument of the next call to
$GETDVI should specify the device name returned in the previous call's
DVI$_SHDW_NEXT_MBR_NAME item descriptor. This second call's item list
should contain a DVI$_SHDW_NEXT_MBR_NAME item descriptor to receive the
name of the next-highest-numbered unit in the shadow set. You should
repeat these calls to $GETDVI until $GETDVI returns a null string,
which means that there are no more members in the shadow set.
5.6.2.2 Member Unit Names
If your first call specifies the device name of a shadow set member, you must determine the name of the virtual unit that represents the shadow set before you can obtain the device names of all members contained in the shadow set. Therefore, if your first call specifies a member, it should also specify an item list that contains a DVI$_SHDW_MASTER_NAME item descriptor. $GETDVI returns the name of the virtual unit that represents the shadow set into this descriptor. You can now make the series of calls to $GETDVI described in Section 5.6.2.1. The devnam argument of each call specifies the name of the device returned in the previous call's DVI$_SHDW_NEXT_MBR_NAME item descriptor. You repeat these calls until $GETDVI returns a null string, indicating that there are no more members in the shadow set.
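The enumeration described in Sections 5.6.2.1 and 5.6.2.2 amounts to following a chain of names until a null string comes back. The sketch below models only that control flow in Python. It does not call the system service: `getdvi` is a hypothetical stand-in for a single-item $GETDVI call, and the device names and the `FAKE_DEVICES` table are invented for illustration.

```python
# Hypothetical model of what $GETDVI would report; every device name
# below is invented for illustration and is not from the manual.
FAKE_DEVICES = {
    "DSA1": {"DVI$_SHDW_NEXT_MBR_NAME": "$1$DUA10"},
    "$1$DUA10": {"DVI$_SHDW_MASTER_NAME": "DSA1",
                 "DVI$_SHDW_NEXT_MBR_NAME": "$1$DUA11"},
    "$1$DUA11": {"DVI$_SHDW_MASTER_NAME": "DSA1",
                 "DVI$_SHDW_NEXT_MBR_NAME": "$1$DUA12"},
    "$1$DUA12": {"DVI$_SHDW_MASTER_NAME": "DSA1",
                 "DVI$_SHDW_NEXT_MBR_NAME": ""},
}


def getdvi(devnam, item):
    """Stand-in for one $GETDVI call that requests a single item code."""
    return FAKE_DEVICES[devnam].get(item, "")


def shadow_set_members(devnam, is_member=False):
    """Return the member names of a shadow set, lowest-numbered first."""
    if is_member:
        # Section 5.6.2.2: resolve the virtual unit first.
        devnam = getdvi(devnam, "DVI$_SHDW_MASTER_NAME")
    members = []
    # Section 5.6.2.1: each call names the device returned by the previous one.
    name = getdvi(devnam, "DVI$_SHDW_NEXT_MBR_NAME")
    while name != "":  # a null string means there are no more members
        members.append(name)
        name = getdvi(name, "DVI$_SHDW_NEXT_MBR_NAME")
    return members
```

Starting from the virtual unit or from any member gives the same list in this model, because the member case first resolves the virtual unit and then walks the chain from its beginning.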
Volume shadowing performs four basic functions. The two most important, as with any disk I/O subsystem, are to satisfy read and write requests. The other two functions, copy and merge, are required for shadow set maintenance.
Copy and merge operations are the cornerstone of achieving data availability. Under certain circumstances, Volume Shadowing for OpenVMS must perform a copy or a merge operation to ensure that corresponding LBNs on all shadow set members contain the same information. Although volume shadowing automatically performs these operations, this chapter provides an overview of their operation.
Copy and merge operations occur at the same time that applications and
user processes read and write to active shadow set members, thereby
having a minimal effect on current application processing.
6.1 Shadow Set Consistency
During the life of a shadow set, the state of any shadow set member relative to the rest of the members of the shadow set can vary. The shadow set is considered to be in a steady state when all of its members are known to contain identical data. Changes in the composition of the shadow set are inevitable because:
For example, suppose an operator dismounts a member of a shadow set and then remounts the member back into the shadow set. During the member's absence, the remaining members of the shadow set may have experienced write operations. Thus, the information on the member being remounted into the shadow set will differ from the information on the rest of the shadow set. Therefore a copy operation is required.
As another example, consider a situation where a shadow set is mounted by several systems in an OpenVMS Cluster configuration. If one of those systems fails, the data on the members of the shadow set may differ because of outstanding or incomplete write operations issued by the failed system. The shadowing software resolves this situation by performing a merge operation.
In any event, copy and merge operations allow volume shadowing to preserve the consistency of the data written to the shadow set. A shadow set is considered to be in a transient state when one or more of its members are undergoing a copy or a merge operation. Additionally, volume shadowing maintains shadow set consistency by:
Volume shadowing uses two internal mechanisms to coordinate shadow set consistency:
Table 6-1 lists some of the information contained in the SCB.
SCB Information | Function |
---|---|
Volume label | Identifies a unique name for the volume. Every member of a shadow set must use the same volume label. |
BACKUP revision number | A BACKUP/IMAGE restoration rearranges the location of data on a volume and sets a revision number to record this change. The Mount utility (MOUNT) checks the revision number of the proposed shadow set member against the numbers on current or other proposed shadow set members. If the revision number differs, the shadowing software determines whether a copy or merge operation is required to bring the data on the less current members up to date. |
Volume shadowing generation number | When a member joins a shadow set, it is marked with a volume shadowing generation number. You can erase the generation number by using the /OVERRIDE=SHADOW_MEMBERSHIP qualifier with the MOUNT command. |
Mount and dismount status | The SCB mount status field is used as a flag that is set when a volume is mounted and cleared when it is dismounted. There is also a count of the number of nodes that have mounted the shadow set write-enabled. The MOUNT command checks this field when a disk is mounted. If the flag is set, the disk volume was incorrectly dismounted, as happens in the event of a system failure. When mounting shadow sets that were incorrectly dismounted, or where the write count field is not correct, the shadowing software automatically initiates merge operations. |
Upon receiving a command to mount a shadow set, volume shadowing immediately determines whether a copy or a merge operation is required; if so, the volume shadowing software automatically performs the operation to reconcile data differences. If you are not sure which disks might be targets of copy operations, you can specify the /CONFIRM or /NOCOPY qualifiers when you use the MOUNT command. To disable performing any copy operations, use the /NOCOPY qualifier. If you mount a shadow set interactively, use the /CONFIRM qualifier to instruct MOUNT to display the targets of copy operations and request permission before the operations are performed.
When you dismount an individual shadow set member, you produce a situation similar to a hardware disk failure. Because files remain open on the virtual unit, the removed physical unit is marked as not being properly dismounted.
After one of the devices is removed from a shadow set, the remaining
shadow set members have their generation number incremented,
identifying them as being more current than the former shadow set
member. This generation number aids in determining the correct copy
operation if you remount the member into a shadow set.
6.2 Copy Operations
The purpose of a copy operation is to duplicate data on a source disk to a target disk. At the end of a copy operation, both disks contain identical information, and the target disk becomes a complete member of the shadow set. Read and write access to the shadow set continues while a disk or disks are undergoing a copy operation.
The DCL command MOUNT initiates a copy operation when a disk is added to an existing shadow set. A copy operation is simple in nature: A source disk is read and the data is written to the target disk. This is usually done in multiple block increments referred to as LBN ranges. In an OpenVMS Cluster environment, all systems that have the shadow set mounted know about the target disk and include it as part of the shadow set. However, only one of the OpenVMS systems actually manages the copy operation.
Two complexities characterize the copy operation:
Volume Shadowing for OpenVMS handles these situations differently depending on the operating system version number and the hardware configuration. For systems running software prior to OpenVMS Version 5.5-2, the copy operation is performed by an OpenVMS node and is known as an unassisted copy operation (see Section 6.2.1).
With Version 5.5-2 and later, the copy operation includes enhancements for shadow set members that are configured on controllers that implement new copy capabilities. These enhancements enable the controllers to perform the copy operation and are referred to as assisted copies (see Section 6.2.2).
Volume Shadowing for OpenVMS supports both assisted and unassisted
shadow sets in the same cluster. Whenever you create a shadow set, add
members to an existing shadow set, or boot a system, the shadowing
software reevaluates each device in the changed configuration to
determine whether it is capable of supporting the copy assist.
6.2.1 Unassisted Copy Operations
Unassisted copy operations are performed by an OpenVMS system. The actual transfer of data from the source member to the target is done through host node memory. Although unassisted copy operations are not CPU intensive, they are I/O intensive and consume a small amount of CPU bandwidth on the node that is managing the copy. An unassisted copy operation also consumes interconnect bandwidth.
On the system that manages the copy operation, user and copy I/Os compete evenly for the available I/O bandwidth. For other nodes in the cluster, user I/Os proceed normally and contend for resources in the controller with all the other nodes. Note that the copy operation may take longer as the user I/O load increases.
The volume shadowing software performs an unassisted copy operation when it is not possible to use the assisted copy feature (see Section 6.2.2). The most common cause of an unassisted copy operation is when the source and target disk or disks are not on line to the same controller subsystem. For unassisted copy operations, two disks can be active targets of an unassisted copy operation simultaneously, if the members are added to the shadow set on the same command line. Disks participating in an unassisted copy operation may be on line to any controller anywhere in a cluster.
During an unassisted copy operation, the shadowing software maintains a logical copy fence. The fence moves across the disk, separating the copied and uncopied LBN areas. The node that is managing the copy operation knows the precise location of the fence and periodically notifies the other nodes in the cluster of the fence location. Thus, if the node performing the copy operation shuts down, another node can continue the operation without restarting at the beginning.
Read I/O requests to either side of the copy fence are serviced only from a source shadow set member.
Write I/O requests below the fence are issued in parallel to all members of the shadow set.
Write I/O requests above the fence are completed first to source members, then to copy target members.
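The three fence rules above can be pictured as a small routing function. The following Python sketch is only a model of those rules, not shadowing driver code. It assumes the fence is expressed as the lowest LBN not yet copied, so "below the fence" means `lbn < fence`. The function names and the member names in the usage note are hypothetical.

```python
def read_candidates(sources, targets):
    """Reads on either side of the fence are serviced only from source members."""
    return list(sources)


def write_phases(lbn, fence, sources, targets):
    """Return the write phases for one request.

    Each phase is a list of members written in parallel; phases are
    performed in order. Assumption: 'fence' is the lowest uncopied LBN.
    """
    if lbn < fence:
        # Already-copied area: one phase, all members in parallel.
        return [list(sources) + list(targets)]
    # Uncopied area: source members first, then copy targets.
    return [list(sources), list(targets)]
```

For example, with sources `["A"]`, targets `["B"]`, and the fence at LBN 100, a write to LBN 10 yields the single phase `[["A", "B"]]`, while a write to LBN 500 yields `[["A"], ["B"]]`.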
The time and amount of I/O required to complete an unassisted copy
operation depends heavily on the similarities of the data on the source
and target disks. It can take at least two and a half times longer to
copy a member containing dissimilar data than it does to complete a
copy operation on a member containing similar data.
6.2.2 Assisted Copy Operations
Unlike an unassisted copy, an assisted copy does not transfer data through the host node memory. The actual transfer of data is performed within the controller, by direct disk-to-disk data transfers, without having the data pass through host node memory. Thus, the assisted copy decreases the impact on the system, the I/O bandwidth consumption, and the time required for copy operations.
Shadow set members must be accessed from the same controller in order to take advantage of the assisted copy. The shadowing software controls the copy operation by using special MSCP copy commands, called disk copy data (DCD) commands, to instruct the controller to copy specific ranges of LBNs. For an assisted copy, only one disk can be an active target for a copy at a time.
For OpenVMS Cluster configurations, the node that is managing the copy operation issues an MSCP DCD command to the controller for each LBN range. The controller then performs the disk-to-disk copy, thus avoiding consumption of interconnect bandwidth.
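The per-range structure of an assisted copy can be sketched as follows. This Python fragment only illustrates dividing a volume into LBN ranges and issuing one command per range. The range size, the function names, and the `issue_copy_command` callback are hypothetical stand-ins; they do not represent the MSCP DCD interface or the range size the shadowing software actually uses.

```python
def lbn_ranges(total_blocks, blocks_per_range):
    """Yield (first_lbn, block_count) pairs covering LBNs 0..total_blocks-1."""
    if blocks_per_range <= 0:
        raise ValueError("blocks_per_range must be positive")
    for first in range(0, total_blocks, blocks_per_range):
        # The final range may be shorter than the others.
        yield first, min(blocks_per_range, total_blocks - first)


def assisted_copy(total_blocks, blocks_per_range, issue_copy_command):
    """Issue one (hypothetical) controller copy command per LBN range.

    Returns the number of commands issued.
    """
    issued = 0
    for first, count in lbn_ranges(total_blocks, blocks_per_range):
        issue_copy_command(first, count)  # the controller copies disk to disk
        issued += 1
    return issued
```

In this model the managing node handles only the range descriptors; the block data itself never appears on the host side of the call.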
By default, the Volume Shadowing for OpenVMS software (beginning with OpenVMS Version 5.5-2) and the controller automatically enable the copy assist if the source and target disks are accessed through the same HSC or HSJ controller.
Shadowing automatically disables the copy assist if:
See Section 6.4 for information about disabling and reenabling the
assisted copy capability.
6.3 Merge Operations
The purpose of a merge operation is to compare data on shadow set members and to ensure that inconsistencies are resolved. A merge operation is initiated when a system failure results in the possibility of incomplete writes. For example, if a write request is made to a shadow set but the system fails before a completion status is returned from all the shadow set members, it is possible that:
The exact timing of the failure during the original write request defines which of these three scenarios results. When the system recovers, however, it is essential that corresponding LBNs on each shadow set member contain the same data (old or new). Thus, the issue here is not one of data availability, but rather of reconciling potential differences among shadow set members. Once the data on all disks is made identical, application data can be reconciled, if necessary, either by the user reentering the data or by database recovery and application journaling techniques.
The merge operation is managed by one of the OpenVMS systems that has the shadow set mounted. The members of a shadow set are physically compared to each other to ensure that they contain the same data. This is done by performing a block-by-block comparison of the entire volume. As the merge proceeds, any blocks that are different are made the same (either both old or both new) by means of a copy operation. Because the shadowing software does not know which member contains newer data, any full member can be the source member of the merge operation.
The shadowing software always selects one member as a logical master for any merge operation, across the OpenVMS Cluster. Any difference in data is resolved by a propagation of the information from the merge master to all the other members.
The system responsible for the merge operation on a given shadow set updates the merge fence for that shadow set after each range of LBNs is reconciled. This fence "proceeds" across the disk and separates the merged and unmerged portions of the shadow set.
Application read I/O requests to the merged side of the fence can be satisfied by any source member of the shadow set. Application read I/O requests to the unmerged side of the fence are also satisfied by any source member of the shadow set; however, any potential data differences (discovered by doing a data compare operation) are corrected on all members of the shadow set before the data is returned to the user or application that requested it.
This method of dynamic correction of data inconsistencies during read requests allows a shadow set member to fail at any point during the merge operation without impacting data availability.
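The read path during a merge can be modeled in a few lines. The Python sketch below is an illustration under stated assumptions, not the shadowing implementation: each member is represented as a list of block contents, the fence is taken to be the lowest unmerged LBN, and the read is served from the merge master. The name `merge_read` and its parameters are hypothetical.

```python
def merge_read(lbn, fence, members, master=0):
    """Return the data for one application read during a merge.

    members: list of members, each a list of block contents indexed by LBN.
    fence:   lowest LBN not yet merged (assumption made for this sketch).
    master:  index of the member acting as merge master.
    """
    data = members[master][lbn]
    if lbn >= fence:
        # Unmerged side: compare, and correct any differing member
        # before the data is handed back to the caller.
        for blocks in members:
            if blocks[lbn] != data:
                blocks[lbn] = data
    return data
```

After such a read returns, every member holds the same contents for that LBN, which is why losing a member partway through the merge does not expose inconsistent data to the application.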
Volume Shadowing for OpenVMS supports both assisted and unassisted
shadow sets in the same cluster. Whenever you create a shadow set, add
members to an existing shadow set, or boot a system, the shadowing
software reevaluates each device in the changed configuration to
determine whether it is capable of supporting the merge assist.
6.3.1 Unassisted Merge Operations
For systems running software prior to OpenVMS Version 5.5-2, the merge operation is performed by the system and is known as an unassisted merge operation.
To ensure minimal impact on user I/O requests, volume shadowing implements a mechanism that causes the merge operation to give priority to user and application I/O requests.
Performing merge operations as a background process ensures that when failures occur, they minimally impact user I/O. A side effect of this is that unassisted merge operations can often take extended periods of time to complete, depending on user I/O rates. Also, if another node fails before a merge completes, the current merge is abandoned and a new one is initiated from the beginning.
Note that data availability and integrity are fully preserved during
merge operations regardless of their duration. All shadow set members
contain equally valid data.
6.3.2 Assisted Merge Operations
Starting with OpenVMS Version 5.5-2, the merge operation includes enhancements for shadow set members that are configured on controllers that implement assisted merge capabilities. The assisted merge operation is also referred to as a minimerge. The minimerge feature significantly reduces the amount of time needed to perform merge operations. Usually, the minimerge completes in a few minutes.
By using information about write operations that were logged in controller memory, the minimerge is able to merge only those areas of the shadow set where write activity was known to have been in progress. This avoids the need for the total read and compare scans required by unassisted merge operations, thus reducing consumption of system I/O resources.
Controller-based write logs contain information about exactly which LBNs in the shadow set had write I/O requests outstanding (from a failed node). The node that performs the assisted merge operation uses the write logs to merge those LBNs that may be inconsistent across the shadow set. No controller-based write logs are maintained for a one-member shadow set, or if only one OpenVMS system has the shadow set mounted.
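The saving from a write log can be illustrated by comparing how many blocks each kind of merge has to examine. This Python sketch is a simplified model, not the controller or driver logic: members are lists of block contents, the write log is a plain list of LBNs, and differences are resolved by copying from a master member. All function names are hypothetical.

```python
def merge_blocks(members, lbns, master=0):
    """Compare the given LBNs across members and propagate the master's data.

    Returns (blocks_compared, blocks_repaired).
    """
    compared = repaired = 0
    for lbn in lbns:
        compared += 1
        data = members[master][lbn]
        for blocks in members:
            if blocks[lbn] != data:
                blocks[lbn] = data
                repaired += 1
    return compared, repaired


def full_merge(members, master=0):
    """Unassisted merge: scan every LBN on the volume."""
    return merge_blocks(members, range(len(members[master])), master)


def minimerge(members, write_log, master=0):
    """Assisted merge: examine only LBNs that had writes outstanding."""
    return merge_blocks(members, sorted(set(write_log)), master)
```

With a six-block volume and a write log naming LBNs 2 and 4, `minimerge` compares two blocks where `full_merge` would compare all six, while both leave the members identical.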
Because of the requirement to consolidate crash dump files, the shadowing software does not automatically perform a minimerge on a system disk. Dump off system disk (DOSD) is supported on both OpenVMS VAX and OpenVMS Alpha, starting with OpenVMS VAX Version 6.2 and OpenVMS Alpha Version 7.1. If DOSD is enabled, the system disk can be minimerged.
The minimerge operation is enabled on nodes running OpenVMS Version 5.5-2 or later. Volume shadowing automatically enables the minimerge if the controllers involved in accessing the physical members of the shadow set support it. See the Volume Shadowing for OpenVMS Software Product Description (SPD 27.29.xx) for a list of supported controllers. Note that minimerge operations are possible even when shadow set members are connected to different controllers. This is because write log entries are maintained on a per-controller basis for each shadow set member.
Volume Shadowing for OpenVMS automatically disables minimerges if:
Minimerge operations are not enabled on standalone systems.
The following transient conditions can also cause a minimerge operation to be disabled: