To troubleshoot server problems, you should be familiar with the following topics:
The following sections describe how to determine the cause of a server problem and solve it if possible. Problem resolution includes determining whether or not the problem is caused by the Advanced Server software. To solve client-based problems, hardware problems, and application-specific problems, see the documentation for the specific products involved.
Troubleshooting a server problem requires the following steps:
1. Collect information about the problem.
2. Analyze the problem.
3. Solve the problem.
The following sections describe each step in more detail.
6.2.1.1 Step 1: Collecting Information About the Problem
When you first detect a server problem, or when the problem is reported, collect as much information as possible immediately. Record the following information:
If you are investigating a recurring or ongoing problem, you should, if
possible, implement an immediate solution that allows the client to
continue working. Record server problems and save a dump file, if one
was generated, and save associated log files and data files before
restarting the server or changing the server configuration. You can use
the information gathering command procedure
SYS$STARTUP:PWRK$GATHER_INFO.COM to save these files.
6.2.1.2 Step 2: Analyzing the Problem
When you analyze the server problem, you should also look for the solution to the problem. Therefore, you must isolate the component that needs to be modified, replaced, removed, or enhanced.
Advanced Server software provides information in log files and tools to
help you determine the cause of a server problem. These tools keep
records of activities and errors. You can use them to isolate problem
areas and to help solve problems. You may be able to solve the problem
using the Advanced Server commands and utilities.
6.2.1.3 Step 3: Solving the Problem
The cause of a server problem may be within your ability to correct. At best, you may determine a configuration or definition change that will correct the problem. Or, you may be able to modify a server parameter or disable a service until the problem is solved more satisfactorily.
The procedure for solving a server problem depends on your ability to capture information about the problem and the state of the server at the time of the problem. If a problem is reported to be intermittent and is difficult to reproduce at will, the procedure for analysis and solution will take longer and be more difficult. Thus, it is particularly important to collect detailed information as soon as the problem is reported.
The following sections show how to use the Advanced Server tools in the problem-solving process. Using these tools, you can configure the server to report on network activity and events, which allows more detailed investigation of problems that you have already determined to be caused by the server or its network resources.
If you cannot determine the cause of a server problem, or if you cannot solve the problem, report the problem to your software specialist and keep the Advanced Server data structure PWRK$LMROOT and the log files for future analysis.
To help you report the information required for analyzing a server
problem, the Advanced Server software includes a procedure you can run to
gather server information.
6.2.1.3.1 Gathering Information About Server Status
To invoke the procedure provided by the server to gather server status information, enter the following commands:
    $ SET DEFAULT SYS$STARTUP
    $ @PWRK$GATHER_INFO.COM
The resulting file (ADVANCED_SERVER_AS_INFO.BCK) is a BACKUP saveset containing copies of the Advanced Server database, logs, and, if present, process dump files.
If the problem you are investigating causes a systemwide failure,
create a dump file for the system. The system dump file captures system
information. Be sure to verify that your system dump file size is
sufficient to capture a full system dump.
6.2.2 The Problem Analysis Process
Problem analysis is a process of elimination. If you have little information to start with, begin at the general level and use the information-gathering tools described in this chapter to determine the area from which the problem originates. If you have sufficient information at the beginning to isolate the problem area, if the problem is ongoing, or if you can reproduce the problem, you can proceed directly to the section in this chapter that addresses the type of problem you are investigating.
The problem-solving procedure differs depending on the type of problem reported. The following sections describe several types of problems, in analytical order, from the generic characteristics of server problems to the more specific.
Problem types are characterized by behavior or source as follows:
Intermittent problems are those that are not easily reproducible. Unlike ongoing problems, they may not prevent server operation, but they can be difficult to analyze and solve. For these types of problems, your analysis depends heavily on the log files and messages reported before and during the time the problem occurred. To help locate such problems, you can use network traces, both when the problem can be reproduced and when it occurs only intermittently.
Table 6-7, Procedure for Solving Intermittent Problems, describes the steps you may take to determine the cause of an intermittent problem.
Step 1: Collect Information | Step 2: Analyze the Problem | Step 3: Solve the Problem |
---|---|---|
Record the time and date when the problem occurred, the nature of the symptoms, the computer name of the client, if any. Related information can include applications that have connections to the server, server shares, and resources consumed by the client. | Check for alerts around the time the problem occurred. Attempt to reproduce the problem on the same client and on other clients in the domain. | You can enable and modify the Alerter service to provide more specific, immediate error notification, as described in Table 6-1, Alerter Configuration Parameters. If the problem circumstances can be reproduced, use the Alerter service to watch the messages during the occurrence of the problem. |
If the problem is unique to a specific group or one client, see Analyze the Problem in the next column of this table. If the problem is continuous, or if you can reproduce the problem at will, continue to the section Domain and Computer Problems. | Use the SHOW EVENTS command to see the event messages that were recorded for the time the problem occurred. Enable additional event/audit tracking to get more detailed information. See Section 6.1.3 in this guide for more information. Check Advanced Server log files for additional messages, as described in Section 6.1.4, Advanced Server Log Files. | Review events and log files to isolate the cause of the problem and address it accordingly. Intermittent problems that do not prevent use of the server may be due to faulty hardware. Check the connections to the client, the client configuration, and the network hardware. |
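
The collect-then-review flow in Table 6-7 can be sketched in a few lines of Python. This is only an illustration, not part of the Advanced Server software: the `ProblemReport` class, the `events_near` helper, and the 15-minute default window are all hypothetical, and the events are assumed to have been copied by hand from SHOW EVENTS output or the log files into `(timestamp, message)` pairs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ProblemReport:
    """Details recorded when a problem is first reported (Step 1)."""
    occurred_at: datetime                          # time and date of the problem
    symptoms: str                                  # nature of the symptoms
    client_name: str = ""                          # computer name of the client, if any
    related: list = field(default_factory=list)    # applications, shares, resources


def events_near(report, events, window=timedelta(minutes=15)):
    """Return the (timestamp, message) pairs logged within `window` of the problem.

    The window is an assumed default; widen it for problems whose
    reported time is only approximate.
    """
    start = report.occurred_at - window
    end = report.occurred_at + window
    return [(ts, msg) for ts, msg in events if start <= ts <= end]
```

Keeping the reported time in one record makes it easier to line up alerts, event messages, and log entries from the same period when the problem cannot be reproduced.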
6.2.2.2 Domain and Computer Problems
The domain-wide functions of the server depend on its role in the
domain and on the other servers in the domain. The Advanced Server
command-line interface lets you display information about the domain
and modify server activity in the domain.
Table 6-8, Procedure for Solving Domain and Computer Problems, describes how to determine the cause of server and domain problems and what to do about them.
Step 1: Collect Information | Step 2: Analyze the Problem | Step 3: Solve the Problem |
---|---|---|
Determine whether users of other computers in the domain receive error messages when attempting to connect to a server, or whether server administrators receive error messages using ADMINISTER commands. | If so, the problem may be due to a server's relationship to the other servers in the domain. Use the SHOW COMPUTERS command to determine the status of other computers in the domain. | Use the REMOVE COMPUTER command to take the computer off the domain. Use the SET COMPUTER /ACCOUNT_SYNCH command to synchronize the security accounts database across the domain. Use the SET COMPUTER/ROLE command to change the server role of a server in the domain, as described in Section 2.1.1.1, Changing a Server's Role in a Domain. |
Determine whether domain problems require changes on multiple servers in the domain. | Use the SHOW ADMINISTRATION command to display the server and domain name of the server currently being administered. | Use the SET ADMINISTRATION command to set the server and domain name of the server to be managed, as described in Section 2.1.4, Administering Another Domain. |
When setting up trusts between domains, you receive the error message "Could not find domain controller for this domain." | Check that each domain has a running domain controller. Check that both domains are running the same transport protocol (TCP/IP, DECnet, or NetBEUI). | Start at least one server in each domain. Use the Configuration Manager to enable the same transport on both domains, as described in Section 7.1, Managing File Server Parameters Affecting System Resources. |
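
The two checks for the trust error in the last row of Table 6-8 amount to a simple precondition test. The following Python sketch shows the logic only; the `trust_precheck` function and the dictionary keys are hypothetical, and the values would have to be filled in by hand from what you observe in each domain.

```python
def trust_precheck(domain_a, domain_b):
    """Return a list of findings that would explain the trust error.

    Each argument is a dict with hypothetical keys:
      "name"                 domain name
      "controllers_running"  number of domain controllers currently running
      "transports"           set of enabled transports, e.g. {"TCP/IP", "DECnet"}
    """
    findings = []
    for domain in (domain_a, domain_b):
        if domain["controllers_running"] < 1:
            findings.append(f"{domain['name']}: no running domain controller")
    # The domains need at least one transport protocol in common.
    if not (domain_a["transports"] & domain_b["transports"]):
        findings.append("no transport protocol in common")
    return findings
```

An empty result means neither of the two conditions in the table applies, so the cause lies elsewhere.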
6.2.2.3 Server Operation Problems
If the server fails to complete routine operations, the log files and
error messages from the software usually indicate the nature and source
of the problem.
Table 6-9, Procedure for Solving Server Operation Problems, describes how to determine the cause of a problem in server operation and what to do about it.
Step 1: Collect Information | Step 2: Analyze the Problem | Step 3: Solve the Problem |
---|---|---|
Check the error messages seen during failing procedures and operations. | Use Advanced Server log files to display messages about problems during software startup and operation. | Use the Configuration Manager to modify server parameters that affect the way the server runs, as described in Section 7.1, Managing File Server Parameters Affecting System Resources, or modify server configuration parameters, as described in Section 7.2, Managing Server Configuration Parameters Stored in the OpenVMS Registry. |
Check service startup failures, which are logged in the system event log files. | Use the SHOW EVENTS command to display system events. | Use the START SERVICE and STOP SERVICE commands to manage services, as described in Section 2.3.4, Managing Services. |
Advanced Server uses its data cache for caching the security databases, in addition to client file data. To ensure a balance of cache usage, the file server periodically monitors its use of the data cache, as follows:
    BlobCache Warning: Sum of Blob file control areas is 950272 bytes (45% of data cache).

    BlobCache Error: The largest single Blob file control area is 1187840 bytes (57% of data cache).
    BlobCache Error: The largest single Blob file control area is PWRK$LMROOT:[LANMAN.DOMAINS]DOMAIN1.
You can use the ADMIN/ANALYZE command to monitor these warning messages and error messages, as described in Section 6.1.4.2, The Advanced Server Common Event Log.
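
If you save these messages to a text file, a short script can pull out the reported sizes and percentages for trending. The following Python sketch assumes the message wording matches the examples above exactly; the `parse_blobcache` function is hypothetical and is not an Advanced Server utility.

```python
import re

# Matches the usage messages shown above, for example:
#   BlobCache Warning: Sum of Blob file control areas is 950272 bytes (45% of data cache).
_BLOB_USAGE = re.compile(
    r"BlobCache (Warning|Error): .*? is (\d+) bytes \((\d+)% of data cache\)"
)


def parse_blobcache(line):
    """Return severity, size, and percentage from a usage message, or None.

    Lines that carry no byte count (such as the one naming the largest
    file) do not match and yield None.
    """
    match = _BLOB_USAGE.search(line)
    if match is None:
        return None
    severity, size, percent = match.groups()
    return {"severity": severity, "bytes": int(size), "percent": int(percent)}
```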
6.2.2.4 Problems with Services
Advanced Server software includes several optional services. For example,
Auditing is a service useful for analyzing server problems. However,
the services must be enabled.
Table 6-10, Procedure for Solving Service Problems, describes how to determine whether a problem is caused by network service problems and what to do about them.
Step 1: Collect Information | Step 2: Analyze the Problem | Step 3: Solve the Problem |
---|---|---|
Check whether the services are running. | Use the SHOW SERVICES command to display the services that are running. | Use the following commands to control the operation of the services: START SERVICE, STOP SERVICE, PAUSE SERVICE, CONTINUE SERVICE. (See Section 2.3.4, Managing Services, for more information.) |
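
One way to reason about the four service commands is as transitions between service states. The Python sketch below is an assumed model for illustration: the state names ("stopped", "running", "paused") and the set of allowed transitions are not taken from this guide, and some services may not support pausing at all.

```python
# Assumed state model: this guide names the commands but does not
# define the states, so the table below is illustrative only.
TRANSITIONS = {
    ("stopped", "START SERVICE"): "running",
    ("running", "PAUSE SERVICE"): "paused",
    ("paused", "CONTINUE SERVICE"): "running",
    ("running", "STOP SERVICE"): "stopped",
    ("paused", "STOP SERVICE"): "stopped",
}


def next_state(state, command):
    """Return the state after `command`, or raise ValueError if it does not apply."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(
            f"{command} is not valid while the service is {state}"
        ) from None
```

Under this model, a service that SHOW SERVICES does not list as running would need START SERVICE rather than CONTINUE SERVICE.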
6.2.2.5 Client Connection Problems
Clients may be individually or collectively reporting a failure to
connect to the server or reporting slow response time in connecting to
the server or the share.
Table 6-11, Procedure for Solving Client Connection Problems, describes the causes behind many typical client connection problems and what to do about them. For information on problems connecting to shares or specific files, see Section 6.2.2.6, Share Access Problems.
Step 1: Collect Information | Step 2: Analyze the Problem | Step 3: Solve the Problem |
---|---|---|
If a client cannot end a session or there are too many sessions, you can control the user sessions. | Use the SHOW SESSIONS command to display current Advanced Server client sessions. | Use the CLOSE SESSION command to close unneeded sessions. |
If more than one client reports lost connections to the server or slow response time, the problem may be caused by too many connections to the same server. | Use the SHOW CONNECTIONS command to display the connections that clients have established to Advanced Server shares. | Use the CLOSE CONNECTION command to end one or more connections. |
When a client tries to log on over a WAN, the following message is received: "You were logged on, but have not been validated by a server." | Clients may use NetBIOS broadcasts to send logon requests, and these requests do not go over the router. | To locate domain controllers capable of authenticating logons, use a WINS Server or LMHOSTS entries that include the #DOM directive. |
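
For the WAN logon case in the last row of Table 6-11, an LMHOSTS entry that uses the #DOM directive has the general form of an IP address, a computer name, and the directive. The Python sketch below formats such a line under stated assumptions: it follows the Microsoft client LMHOSTS convention, in which #DOM is commonly paired with #PRE, and the `lmhosts_dom_entry` function, the address, and the names in the usage note are hypothetical. Check the LMHOSTS documentation for your client before relying on the exact layout.

```python
import ipaddress


def lmhosts_dom_entry(ip, computer_name, domain):
    """Format one LMHOSTS line that marks `computer_name` as a domain controller.

    Assumes the Microsoft client convention: address, name, #PRE, #DOM:domain.
    """
    ipaddress.IPv4Address(ip)  # raises ValueError on a malformed IPv4 address
    if len(computer_name) > 15:
        raise ValueError("NetBIOS computer names are limited to 15 characters")
    return f"{ip}\t{computer_name}\t#PRE\t#DOM:{domain}"
```

For example, `lmhosts_dom_entry("192.0.2.10", "DC01", "DOMAIN1")` produces a tab-separated line for a hypothetical domain controller DC01 in DOMAIN1.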
6.2.2.6 Share Access Problems
Clients may fail to connect to shares or lose existing connections. The
shares must be set to permit client access. Share setup includes:
Table 6-12, Procedure for Solving Share Access Problems, describes the causes behind some typical share access problems and what to do about them.
Step 1: Collect Information | Step 2: Analyze the Problem | Step 3: Solve the Problem |
---|---|---|
Determine whether the client is connected but failing to access resources in the shares. For example, the client computer displays the connection to the server but is unable to list all the files and directories to which the client requires access. | Use the SHOW USER command to display the groups to which the user belongs. Use the SHOW SHARE command to display the groups allowed to access the share. | To add the user to a group, use the MODIFY GROUP command to add the user name. To let the user's group access a share, use the MODIFY SHARE/PERMISSIONS command, as described in Section 4.3.4, Changing Share Properties. |
| | Use the SHOW FILE command to display access permissions on the resources. If the OpenVMS and Advanced Server security model is enabled, use the OpenVMS command DIRECTORY/SECURITY to display the OpenVMS owner and protection information. | Use the Advanced Server SET FILE/PERMISSIONS command, as described in Section 4.3.5.2, Setting Permissions on a File or Directory, to modify the permissions on the file to give the user or group access to the specific resource. Use the OpenVMS SET FILE/PROTECTION command to modify the RMS protections on a directory or file. |
| | Use the Advanced Server SHOW HOSTMAP command to display host mapped user accounts. | Use the ADD HOSTMAP command, as described in Section 3.1.16.2, Establishing User Account Host Mapping, to associate a network user account with an OpenVMS user account. |
If some clients report problems connecting to a share, the problem may be caused by too many connections. | Use the SHOW SHARES command to display information about the connection limit on the share. | Use the MODIFY SHARE command to change the connection limit on the share, as described in Section 4.3.4, Changing Share Properties. |
If clients report failure to access a specific file, the problem may be caused by incorrect permission settings on the file. | Use the SHOW FILE command to display files that are open, clients who have the files open, and the permissions granted to the clients. | Use the SET FILE/PERMISSIONS command, as described in Section 4.3.6, Specifying File and Directory Access Permissions, to set the file permissions correctly. |
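
Two of the checks in Table 6-12, group membership and the share connection limit, can be expressed as a small decision helper. The Python sketch below is illustrative only: the `diagnose_share_access` function is hypothetical, and its inputs are assumed to be read by hand from the SHOW USER, SHOW SHARE, and SHOW SHARES displays.

```python
def diagnose_share_access(user_groups, share_groups, connections, limit=None):
    """Return likely reasons a user cannot reach a share.

    user_groups   groups the user belongs to (from SHOW USER)
    share_groups  groups permitted on the share (from SHOW SHARE)
    connections   current number of connections to the share
    limit         connection limit on the share, or None if unlimited
    """
    findings = []
    # The user needs at least one group that the share permits.
    if not set(user_groups) & set(share_groups):
        findings.append("user belongs to no group permitted on the share")
    if limit is not None and connections >= limit:
        findings.append("share has reached its connection limit")
    return findings
```

An empty result points toward the remaining rows of the table: file-level permissions, OpenVMS protections, or host mapping.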