OpenVMS Cluster Systems

Document revision date: 15 July 2002

F.7.7 Transport (TR) Header

The transport (TR) header is used to pass SCS datagrams and sequenced messages between cluster nodes. The important fields for network troubleshooting are the TR datagram flags, message acknowledgment, and sequence numbers. Note that because the CC and TR headers occupy the same space, a TR/CC flag identifies the type of message being transmitted over the channel.

Figure F-10 shows the portions of the TR header that are needed for network troubleshooting, and Table F-11 describes these fields.

Figure F-10 TR Header

Note: The TR header shown in Figure F-10 is used when both nodes are running Version 1.4 or later of the NISCA protocol. If one or both nodes are running Version 1.3 or an earlier version of the protocol, then both nodes will use the message acknowledgment and sequence number fields in place of the extended message acknowledgment and extended sequence number fields, respectively.

Table F-11 Fields in the TR Header
Field	Description
Datagram flags (bits <7:0>)	Provide additional information about the transport datagram.

Bit	Abbreviated Datagram Type	Expanded Datagram Type	Function
0	DATA	Packet data	Contains data to be delivered to the upper levels of software.
1	SEQ	Sequence flag	Set to 1 if this is a sequenced message and the sequence number is valid.
2	Reserved		Set to 0.
3	ACK	Acknowledgment	Set to 1 if the acknowledgment field is valid.
4	RSVP	Reply flag	Set when an ACK datagram is needed immediately.
5	REXMT	Retransmission	Set for all retransmissions of a sequenced message.
6	Reserved		Set to 0.
7	TR/CC flag	Transport flag	Set to 0; indicates a TR datagram.

Message acknowledgment	An increasing value that specifies the last sequenced message segment received by the local node. All messages prior to this value are also acknowledged. This field is used when one or both nodes are running Version 1.3 or earlier of the NISCA protocol.
Extended message acknowledgment	An increasing value that specifies the last sequenced message segment received by the local node. All messages prior to this value are also acknowledged. This field is used when both nodes are running Version 1.4 or later of the NISCA protocol.
Sequence number	An increasing value that specifies the order of datagram transmission from the local node. This number is used to provide guaranteed delivery of this sequenced message segment to the remote node. This field is used when one or both nodes are running Version 1.3 or earlier of the NISCA protocol.
Extended sequence number	An increasing value that specifies the order of datagram transmission from the local node. This number is used to provide guaranteed delivery of this sequenced message segment to the remote node. This field is used when both nodes are running Version 1.4 or later of the NISCA protocol.
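
The bit assignments in Table F-11 can be illustrated with a short decoding sketch. The following Python fragment is not part of any OpenVMS or analyzer software; the function name decode_tr_flags and the shape of its result are assumptions made for illustration only. It takes the datagram flags byte from a captured packet and reports which flags are set.

```python
# Bit positions taken from Table F-11. The names and the result layout
# are illustrative assumptions, not an OpenVMS interface.
TR_FLAG_BITS = {"DATA": 0, "SEQ": 1, "ACK": 3, "RSVP": 4, "REXMT": 5}
RESERVED_BITS = (2, 6)   # documented as "Set to 0"
TR_CC_BIT = 7            # 0 = TR datagram, 1 = CC datagram


def decode_tr_flags(flags_byte):
    """Decode one datagram flags byte into a small summary dictionary."""
    if not 0 <= flags_byte <= 0xFF:
        raise ValueError("flags_byte must fit in one byte")
    return {
        # The TR/CC flag tells whether the TR layout applies at all.
        "is_transport": ((flags_byte >> TR_CC_BIT) & 1) == 0,
        # Names of the documented flags whose bit is 1, in bit order.
        "set": [name for name, bit in TR_FLAG_BITS.items()
                if (flags_byte >> bit) & 1],
        # True if a reserved bit is unexpectedly nonzero.
        "reserved_nonzero": any((flags_byte >> bit) & 1
                                for bit in RESERVED_BITS),
    }


if __name__ == "__main__":
    # 0x23 = binary 00100011: DATA, SEQ, and REXMT are set.
    print(decode_tr_flags(0x23))
```

A flags byte with REXMT set and the TR/CC flag clear is the pattern of interest when you look for retransmitted sequenced messages.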

F.8 Using a LAN Protocol Analysis Program

Some failures, such as packet loss resulting from congestion, intermittent network interruptions of less than 20 seconds, problems with backup bridges, and intermittent performance problems, can be difficult to diagnose. Intermittent failures may require the use of a LAN analysis tool to isolate and troubleshoot the NISCA protocol levels described in Section F.1.

As you evaluate the network analysis tools currently available, compare LAN analyzers on the capabilities described in the following sections.

F.8.1 Single or Multiple LAN Segments

Whether you need to troubleshoot problems on a single LAN segment or on multiple LAN segments, a LAN analyzer should help you isolate specific patterns of data. Choose a LAN analyzer that can isolate data matching unique patterns that you define. You should be able to define data patterns located in the data regions following the LAN header (described in Section F.7.2). In order to troubleshoot the NISCA protocol properly, a LAN analyzer should be able to match multiple data patterns simultaneously.

To troubleshoot single or multiple LAN segments, you must minimally define and isolate transmitted and retransmitted data in the TR header (see Section F.7.7). Additionally, for effective network troubleshooting across multiple LAN segments, a LAN analysis tool should include the following functions:

A distributed enable function that allows you to synchronize multiple LAN analyzers that are set up at different locations so that they can capture information about the same event as it travels through the LAN configuration
A distributed combination trigger function that automatically triggers multiple LAN analyzers at different locations so that they can capture information about the same event

The purpose of distributed enable and distributed combination trigger functions is to capture packets as they travel across multiple LAN segments. The implementation of these functions discussed in the following sections uses multicast messages to reach all LAN segments of the extended LAN in the system configuration. By providing the ability to synchronize several LAN analyzers at different locations across multiple LAN segments, the distributed enable and combination trigger functions allow you to troubleshoot LAN configurations that span multiple sites over several miles.

F.8.2 Multiple LAN Segments

To troubleshoot multiple LAN segments, LAN analyzers must be able to capture the multicast packets and dynamically enable the trigger function of the LAN analyzer, as follows:

Step	Action
1	Start capturing the data according to the rules specific to your LAN analyzer. Compaq recommends that only one LAN analyzer transmit a distributed enable multicast packet on the LAN. The packet must be transmitted according to the media access-control rules.
2	Wait for the distributed enable multicast packet. When the packet is received, enable the distributed combination trigger function. Prior to receiving the distributed enable packet, all LAN analyzers must be able to ignore the trigger condition. This feature is required in order to set up multiple LAN analyzers capable of capturing the same event. Note that the LAN analyzer transmitting the distributed enable should not wait to receive it.
3	Wait for an explicit (user-defined) trigger event or a distributed trigger packet. When the LAN analyzer receives either of these triggers, the LAN analyzer should stop the data capture. Prior to receiving either trigger, the LAN analyzer should continue to capture the requested data. This feature is required in order to allow multiple LAN analyzers to capture the same event.
4	Once triggered, the LAN analyzer completes the distributed trigger function to stop the other LAN analyzers from capturing data related to the event that has already occurred.
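
The four steps above amount to a small state machine in each analyzer. The following Python sketch models that behavior under stated assumptions: the class, state, and event names are hypothetical, and the choice to send the distributed trigger only from the analyzer that saw the explicit trigger is an interpretation of step 4, not a documented requirement of any product.

```python
from enum import Enum, auto


class State(Enum):
    WAIT_ENABLE = auto()   # capturing, trigger conditions ignored (steps 1-2)
    ARMED = auto()         # capturing, waiting for a trigger (step 3)
    STOPPED = auto()       # capture stopped


class Event(Enum):
    DISTRIBUTED_ENABLE = auto()    # multicast enable packet received
    LOCAL_TRIGGER = auto()         # explicit (user-defined) trigger event
    DISTRIBUTED_TRIGGER = auto()   # multicast trigger packet received


class Analyzer:
    """Hypothetical model of one LAN analyzer in a distributed capture."""

    def __init__(self, name, sends_enable=False):
        self.name = name
        self.sent = []  # multicast packets this analyzer would transmit
        if sends_enable:
            # The transmitting analyzer does not wait for its own enable.
            self.sent.append(Event.DISTRIBUTED_ENABLE)
            self.state = State.ARMED
        else:
            self.state = State.WAIT_ENABLE

    def handle(self, event):
        if self.state is State.WAIT_ENABLE:
            # Before the enable arrives, every trigger is ignored.
            if event is Event.DISTRIBUTED_ENABLE:
                self.state = State.ARMED
        elif self.state is State.ARMED:
            if event is Event.LOCAL_TRIGGER:
                self.state = State.STOPPED
                # Step 4: tell the other analyzers to stop as well.
                self.sent.append(Event.DISTRIBUTED_TRIGGER)
            elif event is Event.DISTRIBUTED_TRIGGER:
                self.state = State.STOPPED
        return self.state
```

In this model, an analyzer that sees a local trigger before the enable packet stays in the capturing state, which is what allows several analyzers to record the same event.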

The HP 4972A LAN Protocol Analyzer, available from the Hewlett-Packard Company, is one example of a network failure analysis tool that provides the required functions described in this section.

Reference: Section F.10 provides examples that use the HP 4972A LAN Protocol Analyzer.

F.9 Data Isolation Techniques

The following sections describe the types of data you should isolate when you use a LAN analysis tool to capture OpenVMS Cluster data between nodes and LAN adapters.

F.9.1 All OpenVMS Cluster Traffic

To isolate all OpenVMS Cluster traffic on a specific LAN segment, capture all the packets whose LAN header contains the protocol type 60--07.

Reference: See also Section F.7.2 for a description of the LAN headers.

F.9.2 Specific OpenVMS Cluster Traffic

To isolate OpenVMS Cluster traffic for a specific cluster on a specific LAN segment, capture packets in which:

The LAN header contains the protocol type 60--07.
The DX header contains the cluster group number specific to that OpenVMS Cluster.

Reference: See Sections F.7.2 and F.7.5 for descriptions of the LAN and DX headers.

F.9.3 Virtual Circuit (Node-to-Node) Traffic

To isolate virtual circuit traffic between a specific pair of nodes, capture packets in which the LAN header contains:

The protocol type 60--07
The destination SCS address
The source SCS address

You can further isolate virtual circuit traffic between a specific pair of nodes to a specific LAN segment by capturing the following additional information from the DX header:

The cluster group code specific to that OpenVMS Cluster
The destination SCS transport address
The source SCS transport address

Reference: See Sections F.7.2 and F.7.5 for LAN and DX header information.

F.9.4 Channel (LAN Adapter--to--LAN Adapter) Traffic

To isolate channel information, capture all packet information on every channel between LAN adapters. The DX header contains information useful for diagnosing heavy communication traffic between a pair of LAN adapters. Capture packets in which the LAN header contains:

The destination LAN adapter address
The source LAN adapter address

Because nodes can use multiple LAN adapters, specifying the source and destination LAN addresses may not capture all of the traffic for the node. Therefore, you must specify a channel as the source LAN address and the destination LAN address in order to isolate traffic on a specific channel.

Reference: See Section F.7.2 for information about the LAN header.

F.9.5 Channel Control Traffic

To isolate channel control traffic, capture packets in which:

The LAN header contains the protocol type 60--07.
The CC header datagram flags byte (the TR/CC flag, bit <7>) is set to 1.

Reference: See Sections F.7.2 and F.7.6 for a description of the LAN and CC headers.

F.9.6 Transport Data

To isolate transport data, capture packets in which:

The LAN header contains the protocol type 60--07.
The TR header datagram flags byte (the TR/CC flag, bit <7>) is set to 0.

Reference: See Sections F.7.2 and F.7.7 for a description of the LAN and TR headers.
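
The distinction between channel control traffic (Section F.9.5) and transport data (Section F.9.6) can be sketched as a test on two locations in a captured frame. The following Python fragment is a minimal sketch, not an analyzer feature. It assumes that the byte numbers used in Table F-13 (TYPE at byte 13, flags at byte 31) count from 1 at the first byte of the destination address and that the frame carries no additional tags; verify both assumptions against your own captures. The function name classify_frame is hypothetical.

```python
# Offsets are zero-based equivalents of the byte numbers in Table F-13
# (assumption: those byte numbers are 1-based from the start of the frame).
PROTOCOL_TYPE = bytes([0x60, 0x07])   # protocol type 60--07
TYPE_OFFSET = 12                      # byte 13 in Table F-13
FLAGS_OFFSET = 30                     # byte 31 in Table F-13
TR_CC_MASK = 0x80                     # TR/CC flag, bit <7>


def classify_frame(frame):
    """Return "CC", "TR", or "other" for one captured frame (bytes)."""
    if len(frame) <= FLAGS_OFFSET:
        return "other"  # too short to contain the flags byte
    if frame[TYPE_OFFSET:TYPE_OFFSET + 2] != PROTOCOL_TYPE:
        return "other"  # not OpenVMS Cluster traffic
    return "CC" if frame[FLAGS_OFFSET] & TR_CC_MASK else "TR"
```

Applying classify_frame to each frame in a capture and keeping only the "CC" or "TR" results corresponds to the two filters described in Sections F.9.5 and F.9.6.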

F.10 Setting Up an HP 4972A LAN Protocol Analyzer

The HP 4972A LAN Protocol Analyzer, available from the Hewlett-Packard Company, is highlighted here because it meets all of the requirements listed in Section F.8. However, the HP 4972A LAN Protocol Analyzer is merely representative of the type of product useful for LAN troubleshooting.

Note: Use of this particular product as an example here should not be construed as a specific purchase requirement or endorsement.

This section provides some examples of how to set up the HP 4972A LAN Protocol Analyzer to troubleshoot the local area OpenVMS Cluster system protocol for channel formation and retransmission problems.

F.10.1 Analyzing Channel Formation Problems

If you have a LAN protocol analyzer, you can set up filters to capture data related to the channel control header (described in Section F.7.6).

You can trigger the LAN analyzer by using the following datagram fields:

Protocol type set to 60--07 hexadecimal
Correct cluster group number
TR/CC flag set to 1

Then look for the HELLO, CCSTART, VERF, and VACK datagrams in the captured data. The CCSTART, VERF, VACK, and SOLICIT_SRV datagrams should have the AUTHORIZE bit (bit <4>) set in the CC flags byte. Additionally, these messages should contain the scrambled cluster password (nonzero authorization field). You can find the scrambled cluster password and the cluster group number in the first four longwords of the SYS$SYSTEM:CLUSTER_AUTHORIZE.DAT file.

Reference: See Sections F.9.3 through F.9.5 for additional data isolation techniques.
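
The authorization checks described in this section can be expressed as a short scan over already-decoded channel control datagrams. The following Python sketch assumes that some other tool has decoded each datagram into a type name, the CC flags byte, and the authorization field; the CCDatagram record and the function authorization_problems are hypothetical names introduced only for this illustration.

```python
from typing import NamedTuple

AUTHORIZE_BIT = 1 << 4  # bit <4> of the CC flags byte
# Datagram types that should carry the AUTHORIZE bit and a password.
NEEDS_AUTHORIZE = {"CCSTART", "VERF", "VACK", "SOLICIT_SRV"}


class CCDatagram(NamedTuple):
    """Hypothetical decoded view of one channel control datagram."""
    kind: str             # for example "HELLO" or "CCSTART"
    cc_flags: int         # CC flags byte
    authorization: bytes  # scrambled cluster password field


def authorization_problems(datagrams):
    """List the datagrams that fail either authorization check."""
    problems = []
    for index, dg in enumerate(datagrams):
        if dg.kind not in NEEDS_AUTHORIZE:
            continue
        if (dg.cc_flags & AUTHORIZE_BIT) == 0:
            problems.append(f"#{index} {dg.kind}: AUTHORIZE bit clear")
        if not any(dg.authorization):
            problems.append(f"#{index} {dg.kind}: authorization field is zero")
    return problems
```

An empty result means that every CCSTART, VERF, VACK, and SOLICIT_SRV datagram in the capture passed both checks; it does not confirm that the password matches the one in the authorization file.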

F.10.2 Analyzing Retransmission Problems

Using a LAN analyzer, you can trace datagrams as they travel across an OpenVMS Cluster system, as described in Table F-12.

Table F-12 Tracing Datagrams
Step	Action

1	Trigger the analyzer using the following datagram fields:

Protocol type set to 60--07
Correct cluster group number
TR/CC flag set to 0
REXMT flag set to 1

2	Use the distributed enable function to allow the same event to be captured by several LAN analyzers at different locations. The LAN analyzers should start the data capture, wait for the distributed enable message, and then wait for the explicit trigger event or the distributed trigger message. Once triggered, the analyzer should complete the distributed trigger function to stop the other LAN analyzers from capturing data.

3	Once all the data is captured, locate the sequence number (for nodes running the NISCA protocol Version 1.3 or earlier) or the extended sequence number (for nodes running the NISCA protocol Version 1.4 or later) for the datagram being retransmitted (the datagram with the REXMT flag set). Then, search through the previously captured data for another datagram between the same two nodes (not necessarily the same LAN adapters) with the following characteristics:

Protocol type set to 60--07
Same DX header as the datagram with the REXMT flag set
TR/CC flag set to 0
REXMT flag set to 0
Same sequence number or extended sequence number as the datagram with the REXMT flag set

4	The following techniques provide a way of searching for the problem's origin.

IF...	THEN...
The datagram appears to be corrupt	Use the LAN analyzer to search in the direction of the source node for the cause of the corruption.
The datagram appears to be correct	Search in the direction of the destination node to ensure that the datagram gets to its destination.
The datagram arrives successfully at its LAN segment destination	Look for a TR packet from the destination node containing the sequence number (for nodes running the NISCA protocol Version 1.3 or earlier) or the extended sequence number (for nodes running the NISCA protocol Version 1.4 or later) in the message acknowledgment or extended message acknowledgment field. ACK datagrams have the following fields set:

Protocol type set to 60--07
Same DX header as the datagram with the REXMT flag set
TR/CC flag set to 0
ACK flag set to 1

The acknowledgment was not sent, or a significant delay occurred between the reception of the message and the transmission of the acknowledgment	Look for a problem with the destination node and LAN adapter. Then follow the ACK packet through the network.
The ACK arrives back at the node that sent the retransmission packet	Either of the following conditions may exist:

The retransmitting node is having trouble receiving LAN data.
The round-trip delay of the original datagram exceeded the estimated timeout value.

You can verify the second possibility by using SDA and looking at the ReRcv field of the virtual circuit display of the system receiving the retransmitted datagram.
Reference: See Example F-2 for an example of this type of SDA display.

Reference: See Appendix G for more information about congestion control and PEDRIVER message retransmission.
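
Step 3 of Table F-12 is a search through earlier capture data for the first transmission of a retransmitted datagram. The following Python sketch shows one way to run that search over decoded records. The Captured record and the function find_original are hypothetical; the sketch assumes that a separate decoder has already extracted the capture time, the DX header bytes, the datagram flags byte, and the sequence number (or extended sequence number) from each packet.

```python
from dataclasses import dataclass

TR_CC_MASK = 0x80   # TR/CC flag, bit <7>; 0 indicates a TR datagram
REXMT_MASK = 0x20   # REXMT flag, bit <5>


@dataclass(frozen=True)
class Captured:
    """Hypothetical decoded view of one captured cluster packet."""
    time: float        # capture timestamp in seconds
    dx_header: bytes   # raw DX header bytes
    flags: int         # TR datagram flags byte
    seq: int           # sequence or extended sequence number


def find_original(packets, rexmt):
    """Return the latest earlier non-retransmitted copy of rexmt, or None."""
    candidates = [
        p for p in packets
        if p.time < rexmt.time                 # captured before the retransmission
        and p.dx_header == rexmt.dx_header     # same DX header
        and (p.flags & TR_CC_MASK) == 0        # TR/CC flag set to 0
        and (p.flags & REXMT_MASK) == 0        # REXMT flag set to 0
        and p.seq == rexmt.seq                 # same sequence number
    ]
    return max(candidates, key=lambda p: p.time, default=None)
```

If find_original returns None, the capture probably started after the original transmission, or the original never reached the segment being monitored; in either case, move the search toward the source node.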

F.11 Filters

This section describes:

How to use the HP 4972A LAN Protocol Analyzer filters to isolate packets that have been retransmitted or that are specific to a particular OpenVMS Cluster.
How to enable the distributed enable and trigger functions.

F.11.1 Capturing All LAN Retransmissions for a Specific OpenVMS Cluster

Use the values shown in Table F-13 to set up a filter, named LAVc_TR_ReXMT, for all of the LAN retransmissions for a specific cluster. Fill in the value for the local area OpenVMS Cluster group code (nn--nn) to isolate a specific OpenVMS Cluster on the LAN.

Table F-13 Capturing Retransmissions on the LAN
Byte Number	Field	Value
1	DESTINATION	xx--xx--xx--xx--xx--xx
7	SOURCE	xx--xx--xx--xx--xx--xx
13	TYPE	60--07
23	LAVC_GROUP_CODE	nn--nn
31	TR FLAGS	0x1xxxxx₂ ¹
33	ACKING MESSAGE	xx--xx
35	SENDING MESSAGE	xx--xx

¹Base 2
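
The TR FLAGS value in Table F-13 is a bit pattern in which x marks a bit that the filter ignores. Outside the analyzer, the same pattern can be applied as a mask and value pair. The following Python sketch is an illustration under stated assumptions: it treats the byte numbers in Table F-13 as 1-based offsets from the start of the frame, compares the group code bytes exactly as they appear in the capture, and uses the hypothetical function names pattern_to_mask_value and matches_rexmt_filter.

```python
PROTOCOL_TYPE = bytes([0x60, 0x07])  # TYPE field, byte 13 in Table F-13
TYPE_OFFSET = 12     # zero-based offset of byte 13
GROUP_OFFSET = 22    # zero-based offset of byte 23 (LAVC_GROUP_CODE)
FLAGS_OFFSET = 30    # zero-based offset of byte 31 (TR FLAGS)


def pattern_to_mask_value(pattern):
    """Convert an 8-character pattern of 0, 1, and x into (mask, value)."""
    if len(pattern) != 8 or set(pattern) - set("01x"):
        raise ValueError("pattern must be 8 characters of 0, 1, or x")
    mask = value = 0
    for ch in pattern:           # leftmost character is bit 7
        mask <<= 1
        value <<= 1
        if ch != "x":
            mask |= 1            # this bit takes part in the comparison
            value |= int(ch)     # required value of this bit
    return mask, value


def matches_rexmt_filter(frame, group_code):
    """Check one frame against the filter values listed in Table F-13."""
    if len(frame) <= FLAGS_OFFSET:
        return False
    mask, value = pattern_to_mask_value("0x1xxxxx")
    return (frame[TYPE_OFFSET:TYPE_OFFSET + 2] == PROTOCOL_TYPE
            and frame[GROUP_OFFSET:GROUP_OFFSET + 2] == group_code
            and (frame[FLAGS_OFFSET] & mask) == value)
```

The pattern 0x1xxxxx requires the TR/CC flag (bit 7) to be 0 and the REXMT flag (bit 5) to be 1, and it ignores all other bits of the flags byte.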

F.11.2 Capturing All LAN Packets for a Specific OpenVMS Cluster

Use the values shown in Table F-14 to filter all of the LAN packets for a specific cluster. Fill in the value for OpenVMS Cluster group code (nn--nn) to isolate a specific OpenVMS Cluster on the LAN. The filter is named LAVc_all.

Table F-14 Capturing All LAN Packets (LAVc_all)
Byte Number	Field	Value
1	DESTINATION	xx--xx--xx--xx--xx--xx
7	SOURCE	xx--xx--xx--xx--xx--xx
13	TYPE	60--07
23	LAVC_GROUP_CODE	nn--nn
33	ACKING MESSAGE	xx--xx
35	SENDING MESSAGE	xx--xx
