Updated: 11 December 1998
OpenVMS Performance Management
All the system tuning solutions for excessive paging involve a reallocation of the memory resource, and nothing more. Consider the following suggestions:
In situations of excessive paging not due to image activations, you should determine what kinds of faults and faulting rates exist. Use the MONITOR PAGE command and your knowledge of your work load. If you are experiencing a high hard fault rate (represented by Page Read I/O Rate), evaluate the overall faulting rate (represented by Page Fault Rate). If the overall faulting rate is low while the hard fault rate is high, the page cache is ineffective; that is, the size of the free-page list, the modified-page list, or both, is too small. You need to increase the size of the cache. This relatively rare problem occurs when a system has been mistuned; for example, perhaps AUTOGEN was bypassed.
Before deciding to acquire more memory, try increasing the values of
MPW_LOLIMIT, MPW_THRESH, FREEGOAL, and FREELIM. (See Section 11.3.) You
might also try reducing the system parameter BALSETCNT or reducing the
working set characteristics. However, if these changes immediately
produce the problems that arise when the cache is too large and the
working sets are too small (and lowering the cache parameter values
a bit does not bring them into balance), you have no other tuning
options. You must reduce demand or acquire more memory. (See
Section 11.26.)
7.3.7 Saturated System Disk
If you have the combination of a high hard fault rate with high faulting overall, it is quite possible the load is too high on your system, which means that the system disk is saturated and you must reduce the page faulting to disk.
However, first perform the checks described in Chapter 11 for small working set sizes. This action will rule out or correct the possibility that the combination of heavy overall faulting with heavy hard faulting is due to too large a page cache while too many processes attempt to work with small working sets. The solution will require you to reduce the cache size and increase the WSQUOTA values.
If this investigation fails to produce results, you can conclude that the system disk is saturated. Therefore, you should consider:
Because of the commoditization of components, prices have fallen
significantly over the years and more than one option may be
affordable. When evaluating the costs of different components, consider
the cost of detailed analysis and the cost of the associated delay.
Adding the more expensive component tomorrow may cost less than adding
a cheaper component a week from today. Also note that the more
expensive component may deliver other benefits for the rest of the
system as a whole.
7.3.8 Page Cache Is Too Large
If you find that your faults are mostly of the soft variety, check to see if the overall faulting rate is high. If so, you might have the relatively rare problem of an unnecessarily large page cache. As a guideline, you should expect the size of your page cache to be one order of magnitude less than the total memory consumed by the balance set under load conditions.
The only way to create a page cache that is too large is by seriously
mistuning a system. (Perhaps AUTOGEN was bypassed.) Section 11.4
describes how to reduce the size of the page cache through the
MPW_LOLIMIT, MPW_THRESH, FREEGOAL, and FREELIM system parameters.
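The three cases distinguished so far (page cache too small, possibly saturated system disk, page cache too large) can be summarized as a small decision function. The following Python sketch is illustrative only. The function name and all three threshold defaults are hypothetical, not values from this manual, and the soft fault rate is approximated as Page Fault Rate minus Page Read I/O Rate.

```python
def classify_paging(page_fault_rate, page_read_io_rate,
                    high_fault_rate=100.0, high_hard_rate=10.0,
                    soft_fraction=0.9):
    """Map averaged MONITOR PAGE rates to the case the text describes.

    The threshold defaults are hypothetical placeholders; choose real
    ones from your knowledge of your own work load.
    """
    hard_high = page_read_io_rate >= high_hard_rate
    overall_high = page_fault_rate >= high_fault_rate

    if hard_high and not overall_high:
        # High hard faulting, low overall faulting: cache is ineffective.
        return "page cache too small"
    if hard_high and overall_high:
        # Rule out small working sets before blaming the system disk.
        return "check small working sets, then suspect saturated system disk"
    if overall_high:
        # Approximation (assumption): soft faults = all faults - read I/Os.
        soft_rate = page_fault_rate - page_read_io_rate
        if soft_rate >= soft_fraction * page_fault_rate:
            return "page cache may be too large"
    return "no case matched"
```

A result of "no case matched" means only that none of these three patterns applies. It does not rule out the working set problems discussed in the sections that follow.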
7.3.9 Small Total Working Set Size
If your page cache size is appropriate, you need to investigate the likelihood that excessive paging is induced when a number of processes attempt to run with working set sizes that are too small for them. If the total memory for the balance set is too small, one of the following three possibilities (or a combination thereof) is at work:
Figures A-4, A-5, and A-6 summarize the
procedures for isolating the cause of working set sizes that are too
small.
7.3.10 Inappropriate WSDEFAULT, WSQUOTA, and WSEXTENT Values
Begin to narrow down the possible causes of unusually small total working set sizes by looking first at your system's allocation of working set sizes. To gain some insight into the work load and which processes have too little memory, do the following:
Perhaps you can conclude that one large process (or several) does not
need as much memory as it is using. If you reduced its WSQUOTA or
WSEXTENT values, or both, the other processes could use the memory the
large process currently takes. (For more information, see
Section 11.5.)
7.3.10.1 Learning About the Process
To form any firm conclusions at this point, you need to learn more about the process's behavior as its working set size grows and shrinks. Use the MONITOR PROCESSES command and the lexical function F$GETJPI for this purpose.
To look at the current values as the process executes, follow these steps:
To request the items, use the system service SYS$GETJPI or the lexical function F$GETJPI. When using F$GETJPI, specify the process ID (PID) in quotation marks and a keyword (GPGCNT, PPGCNT, WSEXTENT, WSQUOTA, or WSSIZE) denoting the type of process information to be returned as shown in the following example:
$ WSQUOTA = F$GETJPI("pid","WSQUOTA")
$ SHOW SYMBOL WSQUOTA
$ WSSIZE = F$GETJPI("pid","WSSIZE")
$ SHOW SYMBOL WSSIZE
$ PPGCNT = F$GETJPI("pid","PPGCNT")
$ SHOW SYMBOL PPGCNT
$ GPGCNT = F$GETJPI("pid","GPGCNT")
$ SHOW SYMBOL GPGCNT
$ WSEXTENT = F$GETJPI("pid","WSEXTENT")
$ SHOW SYMBOL WSEXTENT
Suggestion: Write a program or command procedure that requests the PID and then formats and displays the resulting data.
The lexical function item PPGCNT represents the process page count, while GPGCNT represents the global page count. You need these values to determine how full the working set list is. The sum of PPGCNT plus GPGCNT is the actual amount of memory in use and should always be less than or equal to the value of WSSIZE. By sampling the actual amount of memory in use while processes execute, you can begin to evaluate just how appropriate the values of WSQUOTA and WSEXTENT are.
If the values of WSQUOTA and WSEXTENT are either unnecessarily
restricted or too large in a few obvious cases, they need to be
adjusted; proceed next to the discussion of adjusting working sets in
Section 11.5.
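Once the five F$GETJPI items have been sampled, the arithmetic described above is straightforward. The following Python sketch shows one way to summarize a single sample. The function name and the keys of the returned dictionary are hypothetical, and the input values are assumed to have been collected separately (for example, by a command procedure).

```python
def working_set_usage(ppgcnt, gpgcnt, wssize, wsquota, wsextent):
    """Summarize one sample of a process's working set items (in pages)."""
    if wssize <= 0:
        raise ValueError("WSSIZE must be positive")
    in_use = ppgcnt + gpgcnt  # actual amount of memory in use
    return {
        "in_use": in_use,
        # Fraction of the working set list that is occupied.
        "fill": in_use / wssize,
        # The text expects PPGCNT + GPGCNT <= WSSIZE; flag anything else.
        "consistent": in_use <= wssize,
        # True when the working set has grown past WSQUOTA.
        "above_quota": wssize > wsquota,
        # Pages the working set could still grow before reaching WSEXTENT.
        "headroom_to_extent": wsextent - wssize,
    }
```

Collecting several such samples while the process executes, and comparing the peak "in_use" value with WSQUOTA and WSEXTENT, gives a rough indication of whether those limits are too restrictive or too generous.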
7.3.11 Ineffective Borrowing
If you observe that few of the processes are able to take advantage of
loans, then borrowing is ineffective. Section 11.6 discusses how to
make the necessary adjustments so that borrowing is more effective.
7.3.12 AWSA Might Be Disabled
You need to investigate the status of automatic working set adjustment (AWSA) by checking the value of the system parameter WSINC. If you find WSINC is greater than zero, you know that automatic working set adjustment is turned on. (More precisely, the part of automatic working set adjustment that permits working set sizes to grow is turned on.) However, at the same time, you should also check whether WSDEC, PFRATL, or both, are zero. While setting WSINC=0 turns the full automatic working set adjustment mechanism off, setting PFRATL=0 when WSINC is greater than zero will disable just that part of automatic working set adjustment that provides the voluntary decrements in the working set sizes. (For example, in Figure 3-5, if PFRATL and WSDEC equaled zero, the actual working set limit line would have leveled off at Q4 and would not have changed until Q18.)
If automatic working set adjustment is disabled, processes are unable
to increase their working set sizes. You will observe that although
processes have WSQUOTA values greater than their WSDEFAULT values,
those processes that are currently active (doing some computing) do not
show a working set size count above their WSDEFAULT values. At the same
time, your system is experiencing heavy page faulting. You should
enable automatic working set adjustment, by setting WSINC greater than
zero, so that working set growth is possible.
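The parameter checks in this section reduce to three outcomes. The following Python sketch encodes them. The function name and the returned strings are hypothetical, and treating WSDEC=0 on its own as disabling voluntary decrementing is an assumption based on the instruction above to check whether WSDEC, PFRATL, or both, are zero.

```python
def awsa_status(wsinc, wsdec, pfratl):
    """Interpret WSINC, WSDEC, and PFRATL as described in the text."""
    if wsinc == 0:
        # WSINC=0 turns the full AWSA mechanism off.
        return "AWSA off"
    if pfratl == 0 or wsdec == 0:
        # Growth is possible, but voluntary decrementing is not.
        # (WSDEC=0 alone is treated as disabling it: an assumption.)
        return "growth only"
    return "growth and voluntary decrementing"
```

For example, awsa_status(150, 250, 0) reports "growth only", which corresponds to the case where PFRATL=0 while WSINC is greater than zero. The parameter values in this example are arbitrary.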
7.3.13 AWSA Is Ineffective
If AWSA is turned on, there are four ways that it could be performing less than optimally, and you must evaluate them:
If you use the SHOW PROCESS/CONTINUOUS command for those processes that
MONITOR PROCESSES/TOPFAULT shows are the heaviest page faulters, you
might find that the automatic working set adjustment is not increasing
their working set sizes quickly enough in response to their faulting.
If the default values of WSINC, PFRATH, or AWSTIME have been changed,
you should restore them to their original values and consider adjusting
the WSDEFAULT and WSQUOTA values of the offending process.
7.3.13.2 AWSA with Voluntary Decrementing Enabled Causes Oscillations
It is possible for the voluntary decrementing feature of automatic
working set adjustment to cause processes to go into a form of
oscillation where the working set sizes never stabilize, but keep
growing and shrinking while accompanied by page faulting. When you
observe this situation, through the SHOW PROCESS/CONTINUOUS display,
you should disable voluntary decrementing by setting PFRATL=0. See
Section 11.8.
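If you record a process's working set size at regular intervals, one rough way to quantify this oscillation is to count how often the size changes direction. The following Python sketch does that. The function name is hypothetical, the samples are assumed to come from your own observations of the process, and how many reversals count as "too many" is a judgment the manual does not specify.

```python
def count_reversals(samples):
    """Count changes of direction (grow <-> shrink) in a series of sizes."""
    reversals = 0
    last_direction = 0  # +1 growing, -1 shrinking, 0 not yet known
    for prev, cur in zip(samples, samples[1:]):
        direction = (cur > prev) - (cur < prev)
        if direction == 0:
            continue  # size unchanged between samples; ignore
        if last_direction and direction != last_direction:
            reversals += 1
        last_direction = direction
    return reversals
```

A working set that grows steadily and then levels off yields zero reversals. A series that keeps alternating between growth and shrinkage, while the process continues to page fault, matches the pattern described above.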
7.3.13.3 AWSA Shrinks Working Sets Too Quickly
From the SHOW PROCESS/CONTINUOUS display, you can also determine if the
voluntary decrementing feature of automatic working set adjustment is
shrinking the working sets too quickly. In that event, you should
consider decreasing WSDEC and decreasing PFRATL. See Section 11.9.
7.3.13.4 AWSA Needs Voluntary Decrementing Enabled
You might observe the case of one or more processes that rapidly achieve a very large working set count and then maintain that size over some period of time. However, you know or suspect that those processes should not require that much memory continuously. Although those processes are not page faulting, other processes are. You should check whether voluntary decrementing is turned off (PFRATL=0 and optionally WSDEC=0). See Figure A-6. It may be that, for your work load, voluntary decrementing would bring about improvement since it is time based, not load based. You could enable voluntary decrementing according to the suggestions in Section 11.10 to see if any improvement is forthcoming.
If you decide to take this step, keep in mind that it is the exception
rather than the rule. You could make conditions worse rather than
better. Be certain to monitor your system very carefully to ensure that
you do not induce working set size oscillations in your overall work
load, as described previously. If no improvement is obtained, you
should turn off voluntary decrementing. Probably your premise that the
working set size could be reduced was incorrect. Also, if oscillations
do result that do not seem to stabilize with a little time, you should
turn voluntary decrementing off again. You must explore, instead, ways
to schedule those processes so that they are least disruptive to the
work load.
7.3.13.5 Swapper Trimming Is Too Vigorous
Perhaps there are valid reasons why at your site WSINC has been set to zero to turn off automatic working set adjustment. For example, the applications might be well understood, and the memory requirements for each image might be so predictable that the value for WSDEFAULT can be accurately set. Furthermore, it is possible that if automatic working set adjustment is enabled at your site, you are satisfied that your system is using appropriate values for WSQUOTA, WSEXTENT, PFRATH, BORROWLIM, and GROWLIM. In these situations, perhaps swapper trimming is to blame for the excessive paging. In particular, perhaps trimming on the second level is too severe.
Figure A-7 illustrates the investigation for paging problems induced by swapper trimming. Again, you must determine the top faulting processes and evaluate what is happening and how much memory is consumed by these processes. Use the MONITOR PROCESSES/TOPFAULT and MONITOR PROCESSES commands. By selecting the top faulting processes and scrutinizing their behavior with the SHOW PROCESS/CONTINUOUS command, you can determine if there are many active processes that seem to display working set sizes with the following values:
Either finding indicates that swapper trimming is too severe.
If such is the case, consider increasing the system parameter
SWPOUTPGCNT while evaluating the need to increase the system parameter
LONGWAIT. The swapper uses LONGWAIT to detect those processes that are
truly idle. If LONGWAIT specifies too brief a time, the swapper can
swap temporarily idle processes that would otherwise have become
computable again soon (see Section 11.12). For computable processes,
the same condition can occur if DORMANTWAIT is set too low.
7.4 Analyzing the Swapping Symptom
Experience with systems has shown that swapping of active processes is less desirable than modest paging, because swapping always involves disk accesses, whereas paging involves them only for hard page faults. Swapping requires each process and its context to be written out to disk, an event that is normally slower than the average paging operation, since it involves more blocks. There is additional system overhead for swapping caused by stopping and starting processes. In using the disk resource heavily, the swapper might cause additional entries in the queue on its disk, thus delaying other processes that need access to that disk.
Not only is swapping costly in terms of performance, but its relative
cost is higher for slower processors. In fact, the single-disk,
slower-speed system pays the highest price of all for swapping, since
all other access to the disk is delayed while the disk is used for
swapping. If your processor speed is an issue, you could decide to
reduce swapping and make yours a system that primarily pages.
7.4.1 Detecting Harmful Swapping
Harmful swapping manifests itself in heavy consumption of the CPU resource and the disk, to the detriment of other processes. Use the following tests to check for any symptoms that indicate swapping is harmful:
If your swapping passes these three tests, you can conclude that
swapping is not so harmful on your system that you should eliminate it.
7.4.2 Investigating Harmful Swapping
Indications of harmful swapper activity, such as heavy disk or CPU consumption, warrant attention. (Figures A-8, A-9, and A-10 summarize the investigation for swapping.)
Consider converting your system to one that only pages and rarely if ever swaps, particularly if your system is a small configuration. You accomplish this by performing the following tasks:
Optionally, you could decide to reduce the process working set quotas (in the UAF). See Section 11.5.
Even if you tune your system so that it rarely swaps, you still need a
swapping file on your system. However, the space requirement for the
swapping file is reduced. If disk space is at a premium, you can adjust
your swapping file space requirement to 75 percent of its previous
value with the AUTOGEN command procedure. (See the OpenVMS System Manager's Manual: Tuning, Monitoring, and Complex Systems.)
7.4.3 Causes of Harmful Swapping
If you find that your system is showing symptoms of harmful swapping and that performance has degraded, two possible causes are a lack of free balance slots and insufficient free memory for all working sets.
To determine whether a lack of free balance slots is the cause, use the DCL command SHOW MEMORY to check the number of free balance slots. If the number available is small and you know there is still adequate free memory (which you can also check with SHOW MEMORY), then you should be able to alleviate the swapping by increasing the system parameter BALSETCNT (see Section 11.14).
On VAX, if you have no free balance slots, check the system parameter VBSS_ENABLE to determine whether virtual balance slots are enabled. See Section 3.6.6 for more information about virtual balance slots.
Insufficient Free Memory for All the Working Sets
If there are free balance slots but the total of the working set sizes exceeds available memory, you can safely conclude that there is not enough free memory to support all the working sets at once. This condition can result from one or more of the following factors:
To determine if the page cache is too large, do the following:
Only when a system has been seriously mistuned should you find that the page cache is too large. (Perhaps AUTOGEN was bypassed.) Section 11.4 describes how to reduce the size of the page cache through the MPW_LOLIMIT, MPW_THRESH, FREEGOAL, and FREELIM system parameters.
If you determine that the page cache is not too large, or having
reduced its size, you find that there is still insufficient free memory
for all the working sets, you need to investigate other potential
causes for the problem. These causes are described in the next sections.
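The two causes named in this section can be told apart from figures that SHOW MEMORY reports. The following Python sketch is one way to express that test. The function name, the few_slots threshold, and the returned strings are hypothetical, and "adequate free memory" is approximated here (an assumption) as the total of the working set sizes not exceeding available memory.

```python
def swapping_cause(free_slots, total_ws_pages, available_pages, few_slots=2):
    """Suggest which cause of harmful swapping the figures point to.

    few_slots is a hypothetical cutoff for "the number available is small".
    """
    memory_short = total_ws_pages > available_pages
    if free_slots <= few_slots and not memory_short:
        # Slots are scarce but memory is adequate.
        return "increase BALSETCNT"
    if free_slots > few_slots and memory_short:
        # Slots are free, yet the working sets do not all fit.
        return "insufficient free memory for all working sets"
    return "inconclusive"
```

An "inconclusive" result (for example, few free slots and too little memory at the same time) means the two conditions overlap and both should be examined.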
7.4.4 Why Processes Consume Unreasonable Amounts of Memory
Swapping can be induced whenever one or a small number of processes
devour memory at the expense of other processes. You can find out if a
few users are using large amounts of memory by examining the display
produced by the MONITOR PROCESSES command.
7.4.5 Large, Compute-Bound Processes
At this point, you should be particularly alert for the situation where one or more very large, compute-bound processes at low priority consume memory at the expense of a number of smaller processes. Typically, the smaller processes might be trying to perform some terminal I/O, such as editing. When memory becomes tight, the large process that is compute bound is less likely to be selected for outswapping than any process that is in the local event flag wait state. Consequently, in this situation, the operating system will select processes running the editor for outswapping as soon as they start to wait for I/O. As a result, the editing processes will experience poor response times due to frequent outswapping. The SHOW SYSTEM command provides a valuable tool for checking the priority and state of the large process.
Note the process identification number from the MONITOR PROCESSES display and ensure that you have the WORLD privilege. Then, for each large process you want to investigate, use the lexical function F$GETJPI, as described in Section 7.3.10, to request the working set quota, size, process page count, global page count, and working set extent.
If you find that any of the processes are above their working set quotas, decrease DORMANTWAIT and monitor performance for a time. If decreasing DORMANTWAIT proves ineffective, enter the DCL command SET PROCESS/SUSPEND to suspend the large, compute-bound process that is over WSQUOTA. This action offers a rapid means of restoring other process activities. (Once the process is suspended, the swapper can trim the process to its SWPOUTPGCNT value.) As soon as SHOW PROCESS/CONTINUOUS reveals that the process has been trimmed, you can safely resume it. If AWSA is set correctly, the problem should not recur, since the process will be unable to grow beyond its quota while memory is scarce.
However, you must determine the underlying cause of the problem (for example, the working set quota might be too large for the process) and take corrective action. For example, you could lower WSQUOTA and increase WSEXTENT. Borrowing will then be reclaimed by the swapper. If the large, compute-bound process is not above its working set quota, suspending the process may provide temporary relief, but as soon as you allow the process to resume, it can start to devour memory again. Thus, the most satisfactory corrective action is the permanent solution discussed in Section 11.5.
Copyright © Compaq Computer Corporation 1998. All rights reserved.