Document revision date: 30 March 2001

Availability Manager User's Guide



4.3 Displaying Additional Event Information

For more detailed information about a specific event, double-click any event data item in the Events pane. The Availability Manager first displays a data page that most closely corresponds to the cause of the event. You can choose other tabs for additional detailed information.

For a description of data pages and the information they contain, see Chapter 3.


Chapter 5
Performing Fixes on OpenVMS Nodes

You can perform fixes on OpenVMS nodes to resolve resource availability problems and improve system availability.

This chapter discusses the following topics:

Caution

Performing certain fixes can have serious repercussions, including possible system failure. Therefore, only experienced system managers should perform fixes.

5.1 Understanding Fixes

When you suspect or detect a resource availability problem, in many cases you can use the Availability Manager to analyze the problem and to perform a fix to improve the situation.

Availability Manager fixes fall into two categories:

  • Node fixes
  • Process fixes

You can access fixes, by category, from the pages listed in Table 5-1.

Table 5-1 Accessing Availability Manager Fixes
Fix Category and Name Available from This Page
Node fixes:
  • Crash node
  • Adjust cluster quorum
The node fixes are available from the following pages:
  • Node Summary
  • CPU
  • Memory
  • I/O
Process fixes:
  • General process fixes:
    Delete a process
    Exit an image
    Suspend a process
    Resume a process
    Change a process priority
  • Process memory fixes:
    Purge working set
    Adjust working set
  • Process limits fixes:
    Direct I/O
    Buffered I/O
    AST
    Open file
    Lock
    Timer
    I/O Byte
All of the process fixes are available from the following pages:
  • Memory
  • I/O
  • CPU Process
  • Single Process

Table 5-2 summarizes various problems, recommended fixes, and the expected results of fixes.

Table 5-2 Summary of Problems and Matching Fixes
Problem | Fix | Result
Node resource hanging cluster | Crash Node | Node fails with operator-requested shutdown.
Cluster hung | Adjust Quorum | Quorum for cluster is adjusted.
Process looping, intruder | Delete Process | Process no longer exists.
Endless process loop in same PC range | Exit Image | Exit from current image.
Runaway process, unwelcome intruder | Suspend Process | Process is suspended from execution.
Process previously suspended | Resume Process | Process starts from point it was suspended.
Runaway process or process that is overconsuming | Change Process Priority | Base priority changes to selected setting.
Low node memory | Purge Working Set | Frees memory on node; page faulting might occur for process affected.
Working set too high or low | Adjust Working Set | Removes unused pages from working set; page faulting might occur.
Process quota has reached its limit and has entered RWAIT state | Adjust Process Limits | Process receives greater limit, which in many cases frees the process to continue execution.

Most process fixes correspond to an OpenVMS system service call, as shown in the following table:
Process Fix | System Service Call
Delete a process | $DELPRC
Exit an image | $FORCEX
Suspend a process | $SUSPND
Resume a process | $RESUME
Change a process priority | $SETPRI
Purge working set | $PURGWS
Adjust working set | $ADJWSL
Adjust process limits of the following: Direct I/O (DIO), Buffered I/O (BIO), Asynchronous system trap (AST), Open file (FIL), Lock queue (ENQ), Timer queue entry (TQE), Subprocess (PRC), I/O byte (BYT) | None
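
The correspondence in the preceding table can be captured as a simple lookup. The following Python sketch is illustrative only and is not part of the Availability Manager; the names `FIX_TO_SERVICE` and `service_for` are hypothetical. It uses `None` for the process limits fixes, which the table lists as having no corresponding system service call.

```python
# Illustrative lookup built from the table above; not Availability Manager code.
# The dictionary and function names are hypothetical.
FIX_TO_SERVICE = {
    "Delete a process": "$DELPRC",
    "Exit an image": "$FORCEX",
    "Suspend a process": "$SUSPND",
    "Resume a process": "$RESUME",
    "Change a process priority": "$SETPRI",
    "Purge working set": "$PURGWS",
    "Adjust working set": "$ADJWSL",
    "Adjust process limits": None,  # the table lists no system service call
}


def service_for(fix_name):
    """Return the system service name for a fix, or None if the table lists none."""
    if fix_name not in FIX_TO_SERVICE:
        raise KeyError(f"unknown fix: {fix_name}")
    return FIX_TO_SERVICE[fix_name]


if __name__ == "__main__":
    for fix, service in FIX_TO_SERVICE.items():
        print(f"{fix:28} {service or 'None'}")
```

A lookup of this kind makes it easy to tell which fixes depend on the target process being able to execute a system service, which matters for the note that follows.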

Note

Each fix that uses a system service call requires that the process execute the system service. A hung process will have the fix queued to it, where the fix will remain until the process is operational again.
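
The queuing behavior described in this note can be pictured with a small model. The following Python sketch is a toy illustration under stated assumptions, not a description of how OpenVMS or the Availability Manager is implemented; the class `QueuedFixProcess` and its methods are hypothetical. It shows only the idea that a fix sent to a hung process waits until the process can run again.

```python
from collections import deque


class QueuedFixProcess:
    """Toy model of a process that must itself execute each fix sent to it."""

    def __init__(self, name, hung=False):
        self.name = name
        self.hung = hung
        self.pending = deque()  # fixes waiting for the process to run
        self.executed = []      # fixes the process has carried out

    def queue_fix(self, service):
        """Queue a fix; it runs at once only if the process is operational."""
        self.pending.append(service)
        self._run_pending()

    def become_operational(self):
        """Mark the process as no longer hung and run any queued fixes."""
        self.hung = False
        self._run_pending()

    def _run_pending(self):
        # A hung process executes nothing, so queued fixes stay queued.
        while not self.hung and self.pending:
            self.executed.append(self.pending.popleft())


if __name__ == "__main__":
    proc = QueuedFixProcess("EXAMPLE_PROC", hung=True)
    proc.queue_fix("$SUSPND")
    print("pending while hung:", list(proc.pending))
    proc.become_operational()
    print("executed after recovery:", proc.executed)
```

In this model, repeating a fix against a hung process only lengthens the queue; nothing takes effect until the process is operational again.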

Be aware of the following facts before you perform a fix:

5.2 Performing Fixes

Standard OpenVMS privileges restrict users' write access. When you run the Data Analyzer, you must have the CMKRNL privilege to send a write (fix) instruction to a node with a problem.

The following options are displayed at the bottom of all fix pages:
Option | Description
OK | Applies the fix and then exits the page. Any message associated with the fix is displayed in the Event pane.
Cancel | Cancels the fix.
Apply | Applies the fix and does not exit the page. Any message associated with the fix is displayed in the Return Status section of the page and in the Event pane.

The following sections explain how to perform node fixes and process fixes and describe specific fixes you can make.

5.2.1 Node Fixes

The Availability Manager node fixes allow you to deliberately fail (crash) a node or to adjust cluster quorum.

To perform a node fix, follow these steps:

  1. On the Node Summary, CPU, Memory, or I/O page, click the Fix menu.
  2. Click Fix Options.

5.2.1.1 Crash Node

Caution

The crash node fix is an operator-requested bugcheck from the driver. It takes place as soon as you click OK in the Crash Node page. After you perform this fix, the node cannot be restored to its previous state. After a crash, the node must be rebooted.

When you select the Crash Node option, the Availability Manager displays the Crash Node page, shown in Figure 5-1.

Figure 5-1 Crash Node Page


Note

Because the node cannot report a confirmation when a node crash fix is successful, the crash success message is displayed after the timeout period for the fix confirmation has expired.

5.2.1.2 Adjust Quorum

The Adjust Quorum fix forces the node to recalculate the quorum value. This fix is the equivalent of the Interrupt Priority C (IPC) mechanism used at system consoles for the same purpose. The fix forces the adjustment for the entire cluster so that each node in the cluster has the same new quorum value.

The Adjust Quorum fix is useful when the number of votes in a cluster falls below the quorum set for that cluster. This fix allows you to readjust the quorum so that it corresponds to the current number of votes in the cluster.
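
As a worked example of why this fix helps, the following Python sketch uses the quorum formula given in the OpenVMS Cluster documentation, (EXPECTED_VOTES + 2) / 2 with the fraction discarded. That formula does not appear in this guide, and the sketch simplifies the fix to recalculating quorum from the votes currently present. The function names are hypothetical.

```python
def quorum(expected_votes):
    """Quorum per the OpenVMS Cluster formula: (EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2


def cluster_hung(current_votes, quorum_value):
    """A cluster suspends activity when its votes fall below quorum."""
    return current_votes < quorum_value


if __name__ == "__main__":
    # Three nodes with one vote each: quorum is (3 + 2) // 2 = 2.
    q = quorum(3)
    votes = 1  # two of the three nodes have left the cluster
    print("quorum:", q, "votes:", votes, "hung:", cluster_hung(votes, q))

    # Simplified model of the fix: recalculate quorum from the current votes.
    q = quorum(votes)  # (1 + 2) // 2 = 1
    print("quorum:", q, "votes:", votes, "hung:", cluster_hung(votes, q))
```

In this example the single remaining vote is below the original quorum of 2, so the cluster hangs; after the recalculation the quorum is 1 and the remaining node can continue.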

When you select the Adjust Quorum option, the Availability Manager displays the page shown in Figure 5-2.

Figure 5-2 Adjust Quorum Page


5.2.2 Performing Process Fixes

To perform a process fix, follow these steps:

  1. On the Memory or I/O page, right-click a process name.
  2. Click Fix Options.
    The Availability Manager displays three Process tabs:
    Process General
    Process Memory
    Process Limits
  3. Click one of these tabs to bring it to the front.
  4. Click the down arrow to display the process fixes in this group.
  5. Select one process fix (for example, Change Process Priority, as shown in Figure 5-3) to display a fix page.

Figure 5-3 Change Process Priority Page


Some of the fixes, such as Change Process Priority, require you to use a slider to change the default value. When you have finished setting a new process priority, click one of the options at the bottom of the page.

5.2.3 General Process Fixes

The following sections describe Availability Manager general process fixes.

5.2.3.1 Delete Process

In most cases, a Delete Process fix deletes a process. However, if a process is waiting for disk I/O or is in a resource wait state (RWAST), this fix might not delete the process. In this situation, it is useless to repeat the fix. Instead, depending on the resource the process is waiting for, a Process Limit fix might free the process. As a last resort, reboot the node to delete the process.

Caution

Deleting a system process could cause the system to hang or become unstable.

When you select the Delete Process option, the Availability Manager displays the page shown in Figure 5-4.

Figure 5-4 Delete Process Page


After reading the explanation, select one of the options displayed at the bottom of the page. A message displayed on the page indicates that the fix has been successful.

5.2.3.2 Exit Image

Exiting an image on a node can stop an application that a user requires. Check the Single Process page first to determine which image is running on the node.

Caution

Exiting an image on a system process could cause the system to hang or become unstable.

When you select the Exit Image option, the Availability Manager displays the page shown in Figure 5-5.

Figure 5-5 Exit Image Page


After reading the explanation on the page, select one of the options displayed at the bottom of the page. A message displayed on the page indicates that the fix has been successful.

5.2.3.3 Suspend Process

Suspending a process that is consuming excess CPU time can improve perceived CPU performance on the node by freeing the CPU for other processes to use. (Conversely, resuming a process that was using excess CPU time while running might reduce perceived CPU performance on the node.)

Caution

Do not suspend system processes, especially JOB_CONTROL, because this might make your system unusable. (See the OpenVMS Programming Concepts Manual, Volume I for more information.)

When you select the Suspend Process option, the Availability Manager displays the page shown in Figure 5-6.

Figure 5-6 Suspend Process Page


After reading the explanation, select one of the options displayed at the bottom of the page. A message displayed on the page indicates that the fix has been successful.

5.2.3.4 Resume Process

Resuming a process that was using excess CPU time while running might reduce perceived CPU performance on the node. (Conversely, suspending a process that is consuming excess CPU time can improve perceived CPU performance by freeing the CPU for other processes to use.)

When you select the Resume Process option, the Availability Manager displays the page shown in Figure 5-7.

Figure 5-7 Resume Process Page


After reading the explanation, select one of the options displayed at the bottom of the page. A message displayed on the page indicates that the fix has been successful.

5.2.3.5 Change Process Priority

If the priority of a compute-bound process is too high, the process can consume all the CPU cycles on the node, affecting performance dramatically. On the other hand, if the priority of a process is too low, the process might not obtain enough CPU cycles to do its job, also affecting performance.

When you select the Change Process Priority option, the Availability Manager displays the page shown in Figure 5-8.

Figure 5-8 Change Process Priority Page


To change the base priority for a process, drag the slider on the scale to the number you want. The current priority number is displayed in a small box above the slider. You can also click the line above or below the slider to adjust the number by one.

When you are satisfied with the new base priority, select one of the options displayed at the bottom of the page. A message displayed on the page indicates that the fix has been successful.

5.2.4 Process Memory Fixes

The following sections describe the Availability Manager fixes you can use to correct process memory problems.

5.2.4.1 Purge Working Set

This fix purges the working set to a minimal size. You can use this fix to reclaim a process's pages that are not in active use. If the process is in a wait state, the working set remains at a minimal size, and the purged pages become available for other uses. If the process becomes active, pages the process needs are page-faulted back into memory, and the unneeded pages are available for other uses.

Be careful not to repeat this fix too often: a process that continually reclaims needed pages can cause excessive page faulting, which can affect system performance.

When you select the Purge Working Set option, the Availability Manager displays the page shown in Figure 5-9.

Figure 5-9 Purge Working Set Page


After reading the explanation on the page, select one of the options displayed at the bottom of the page. A message displayed on the page indicates that the fix has been successful.

5.2.4.2 Adjust Working Set

Adjusting the working set of a process might prove to be useful in situations similar to the following ones:

When you select the Adjust Working Set fix, the Availability Manager displays the page shown in Figure 5-10.

Figure 5-10 Adjust Working Set Page


To perform this fix, use the slider to adjust the working set to the limit you want. You can also click the line above or below the slider to adjust the number by one.

When you are satisfied with the new working set limit, select one of the options displayed at the bottom of the page. A message displayed on the page indicates that the fix has been successful.

5.2.5 Process Limits Fixes

The following sections describe Availability Manager process limits fixes.

If a process is waiting for a resource, you can use a Process Limits fix to increase the resource limit so that the process can continue. The increased limit is in effect only for the life of the process, however; any new process is assigned the quota that was set in the user authorization file (UAF).
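
The lifetime of an adjusted limit can be illustrated with a small model. The following Python sketch is a toy illustration, not Availability Manager or OpenVMS code; the classes `UafRecord` and `LimitedProcess`, the limit name, and the numeric values are all hypothetical. It shows only that raising a limit on a running process leaves the UAF quota unchanged, so a new process starts with the original quota.

```python
class UafRecord:
    """Toy stand-in for the quotas stored for a user in the UAF."""

    def __init__(self, quotas):
        self.quotas = dict(quotas)


class LimitedProcess:
    """Toy process that copies its limits from the UAF record at creation."""

    def __init__(self, uaf):
        self.limits = dict(uaf.quotas)

    def adjust_limit(self, name, value):
        """Model a Process Limits fix: change the limit for this process only."""
        if name not in self.limits:
            raise KeyError(f"unknown limit: {name}")
        self.limits[name] = value


if __name__ == "__main__":
    uaf = UafRecord({"DIO": 100})   # hypothetical direct I/O quota
    running = LimitedProcess(uaf)
    running.adjust_limit("DIO", 150)

    new_process = LimitedProcess(uaf)
    print("running process limit:", running.limits["DIO"])
    print("UAF quota:", uaf.quotas["DIO"])
    print("new process limit:", new_process.limits["DIO"])
```

If a process repeatedly needs a larger limit, the model suggests that the lasting remedy is a change to the quota in the UAF rather than repeated fixes.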

When you click the Process Limits tab, you can select any of the options described in the following sections.

5.2.5.1 Direct I/O Count Limit

You can use this fix to adjust the direct I/O count limit of a process. When you select the Direct I/O option, the Availability Manager displays the page shown in Figure 5-11.

Figure 5-11 Direct I/O Count Limit Page


To perform this fix, use the slider to adjust the direct I/O count to the limit you want. You can also click the line above or below the slider to adjust the number by one.

When you are satisfied with the new direct I/O count limit, select one of the options displayed at the bottom of the page. A message displayed on the page indicates that the fix has been successful.

5.2.5.2 Buffered I/O Count Limit

You can use this fix to adjust the buffered I/O count limit of a process. When you select the Buffered I/O option, the Availability Manager displays the page shown in Figure 5-12.

Figure 5-12 Buffered I/O Count Limit Page


To perform this fix, use the slider to adjust the buffered I/O count to the limit you want. You can also click the line above or below the slider to adjust the number by one.

When you are satisfied with the new buffered I/O count limit, select one of the options displayed at the bottom of the page. A message displayed on the page indicates that the fix has been successful.

5.2.5.3 AST Queue Limit

You can use this fix to adjust the AST queue limit of a process. When you select the AST option, the Availability Manager displays the page shown in Figure 5-13.

Figure 5-13 AST Queue Limit Page


To perform this fix, use the slider to adjust the AST queue limit to the number you want. You can also click the line above or below the slider to adjust the number by one.

When you are satisfied with the new AST queue limit, select one of the options displayed at the bottom of the page. A message displayed on the page indicates that the fix has been successful.

5.2.5.4 Open File Limit

You can use this fix to adjust the open file limit of a process. When you select the Open File option, the Availability Manager displays the page shown in Figure 5-14.

Figure 5-14 Open File Limit Page


To perform this fix, use the slider to adjust the open file limit to the number you want. You can also click the line above or below the slider to adjust the number by one.

When you are satisfied with the new open file limit, select one of the options displayed at the bottom of the page. A message displayed on the page indicates that the fix has been successful.

