Document revision date: 15 July 2002
Use the following sections to help solve queue manager problems:
Topic | For More Information |
---|---|
Avoiding common problems: a troubleshooting checklist | Section 13.11.1 |
If the queue manager does not start | Section 13.11.2 |
If the queuing system stops or the queue manager does not run on specific nodes | Section 13.11.3 |
If the queue manager becomes unavailable | Section 13.11.4 |
If the queuing system does not work on a specific OpenVMS Cluster node | Section 13.11.5 |
If you see inconsistent queuing behavior on different OpenVMS Cluster nodes | Section 13.11.6 |
Reporting a queuing system problem to Compaq support representatives | Section 13.12 |
13.11.1 Avoiding Common Problems: A Troubleshooting Checklist
To avoid the most common queuing system problems, make sure you have met the following requirements:
Requirement | For More Information |
---|---|
QMAN$MASTER is identically defined on all nodes in the cluster. | Section 13.3 |
The queue database is in the specified location. | Section 13.3 |
The queue database disk is mounted and available. | Section 13.3 |
The node list specified with the /ON qualifier contains a sufficient number of nodes. If you specify a node list, Compaq recommends that you include an asterisk (*) at the end of the node list. | Section 13.11.4 |
The system address parameters SCSNODE and SCSSYSTEMID match the DECnet for OpenVMS node name and node ID. | Section 13.11.5 |
13.11.2 If the Queue Manager Does Not Start
If the queue manager does not start when you enter the START/QUEUE/MANAGER command, the system displays the following message:

    %JBC-E-QMANNOTSTARTED, queue manager could not be started

13.11.2.1 Investigating the Problem
Search the operator log file SYS$MANAGER:OPERATOR.LOG (or look on the operator console) for messages from the queue manager and job controller that provide information about the problem, as follows:

    $ SEARCH SYS$MANAGER:OPERATOR.LOG/WINDOW=5 QUEUE_MANAGE,JOB_CONTROL,BATCH_MANAGE

Use the information provided with these messages to further investigate
the problem, making sure you have met the requirements listed in
Section 13.11.1.
13.11.2.2 Cause
The cause of the problem is the system's inability to find the queue master file (QMAN$MASTER.DAT). Often the QMAN$MASTER logical name is not defined correctly, or the disk is not available. For example, the following message indicates that the queue master file does not exist in the expected location:

    %%%%%%%%%%% OPCOM 13-MAR-2000 15:53:52.84 %%%%%%%%%%%
    Message from user SYSTEM on ABDCEF
    %JBC-E-OPENERR, error opening SYS$COMMON:[SYSEXE]QMAN$MASTER.DAT
    %%%%%%%%%%% OPCOM 13-MAR-2000 15:53:53.04 %%%%%%%%%%%
    Message from user SYSTEM on ABDCEF
    -SYSTEM-W-NOSUCHFILE, no such file

On systems with multiple queue managers, search for messages displayed by additional queue managers by including their process names in the search string. To display information about queue managers running on your system, use the SHOW QUEUE/MANAGERS command as explained in Section 13.4. Correct any problem indicated in the displayed information.

    $ START/QUEUE/MANAGER DUA55:[SYSQUE] (1)
    %JBC-E-QMANNOTSTARTED, queue manager could not be started (2)
    $ SEARCH SYS$MANAGER:OPERATOR.LOG /WINDOW=5 QUEUE_MANAGE,JOB_CONTROL (3)
    %%%%%%%%%%% OPCOM 14-APR-2000 18:55:18.23 %%%%%%%%%%%
    Message from user SYSTEM on CATNIP
    %QMAN-E-OPENERR, error opening DUA55:[SYSQUE]SYS$QUEUE_MANAGER.QMAN$QUEUES;
    %%%%%%%%%%% OPCOM 14-APR-2000 18:55:18.29 %%%%%%%%%%%
    Message from user SYSTEM on CATNIP
    -RMS-F-DEV, error in device name or inappropriate device type for operation
    %%%%%%%%%%% OPCOM 14-APR-2000 18:55:18.31 %%%%%%%%%%%
    Message from user SYSTEM on CATNIP
    -SYSTEM-W-NOSUCHDEV, no such device available (4)
    $ START/QUEUE/MANAGER DUA5:[SYSQUE] (5)

For more information about multiple queue managers and their process
names, see Section 13.8.1.
13.11.3 If the Queuing System Stops or the Queue Manager Does Not Run on Specific Nodes
Use this section if the queue manager does not run on a specific node in the cluster, or if the queuing system stops, especially after one of the following actions:
13.11.3.1 Investigating the Problem
Check the operator log that was current at the time the queue manager started up or failed over. Search the log for operator messages from the queue manager.
On systems with multiple queue managers, also search for messages displayed by additional queue managers by including their process names in the search string. To display information about queue managers running on your system, use the SHOW QUEUE/MANAGERS command, as explained in Section 13.4.
For more information about multiple queue managers and their process names, see Section 13.8.1.
The following messages indicate that the queue database is not in the specified location:

    %%%%%%%%%%% OPCOM 4-FEB-2000 15:06:25.21 %%%%%%%%%%%
    Message from user SYSTEM on MANGLR
    %QMAN-E-OPENERR, error opening CLU$COMMON:[SYSEXE]SYS$QUEUE_MANAGER.QMAN$QUEUES;
    %%%%%%%%%%% OPCOM 4-FEB-2000 15:06:27.29 %%%%%%%%%%%
    Message from user SYSTEM on MANGLR
    -RMS-E-FNF, file not found
    %%%%%%%%%%% OPCOM 4-FEB-2000 15:06:27.45 %%%%%%%%%%%
    Message from user SYSTEM on MANGLR
    -SYSTEM-W-NOSUCHFILE, no such file

The following messages indicate that the queue database disk is not mounted:

    %%%%%%%%%%% OPCOM 4-FEB-2000 15:36:49.15 %%%%%%%%%%%
    Message from user SYSTEM on MANGLR
    %QMAN-E-OPENERR, error opening DISK888:[QUEUE_DATABASE]SYS$QUEUE_MANAGER.QMAN$QUEUES;
    %%%%%%%%%%% OPCOM 4-FEB-2000 15:36:51.69 %%%%%%%%%%%
    Message from user SYSTEM on MANGLR
    -RMS-F-DEV, error in device name or inappropriate device type for operation
    %%%%%%%%%%% OPCOM 4-FEB-2000 15:36:52.20 %%%%%%%%%%%
    Message from user SYSTEM on MANGLR
    -SYSTEM-W-NOSUCHDEV, no such device available

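The two message patterns above can be told apart mechanically when you review a long operator log. The following is a minimal sketch in Python, used here only for illustration. The function name `diagnose` and the `CAUSES` table are hypothetical, and the table covers only the status codes shown in the two examples above; it is not a complete list of queue manager messages.

```python
import re

# Status codes mapped to the causes named in the two examples above.
# Illustrative only: this is not an exhaustive list of OPCOM messages.
CAUSES = {
    "FNF": "queue database is not in the specified location",
    "NOSUCHFILE": "queue database is not in the specified location",
    "DEV": "queue database disk is not mounted",
    "NOSUCHDEV": "queue database disk is not mounted",
}

# Matches message lines such as "%QMAN-E-OPENERR, ..." or "-RMS-E-FNF, ...".
MESSAGE = re.compile(r"^[%-](?P<facility>[A-Z$_]+)-(?P<severity>[A-Z])-(?P<ident>[A-Z0-9_]+),")


def diagnose(log_text):
    """Return the set of likely causes found in operator log text."""
    causes = set()
    for line in log_text.splitlines():
        match = MESSAGE.match(line.strip())
        if match and match.group("ident") in CAUSES:
            causes.add(CAUSES[match.group("ident")])
    return causes


sample = """\
%%%%%%%%%%% OPCOM 4-FEB-2000 15:06:25.21 %%%%%%%%%%%
Message from user SYSTEM on MANGLR
%QMAN-E-OPENERR, error opening CLU$COMMON:[SYSEXE]SYS$QUEUE_MANAGER.QMAN$QUEUES;
%%%%%%%%%%% OPCOM 4-FEB-2000 15:06:27.29 %%%%%%%%%%%
Message from user SYSTEM on MANGLR
-RMS-E-FNF, file not found
"""
print(diagnose(sample))
```

A result that names the missing location points to the queue database location requirement in Section 13.11.1; a result that names the disk points to the mounted-disk requirement in the same checklist.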
13.11.3.2 Cause
The queuing system does not work correctly under the following circumstances:
In general, the queuing system shuts down completely if the queue
manager encounters a serious error that forces a crash or failover twice
consecutively within two minutes on the same node. As a result, the
queuing system might have stopped, or it might still be running if,
after the original failed startup, the queue manager moved to another
node from which it can access the database.
13.11.3.3 Correcting the Problem
Perform the following steps:
13.11.4 If the Queue Manager Becomes Unavailable
The queue manager becomes unavailable if it does not start or has
stopped running.
13.11.4.1 Investigating the Problem
To investigate the problem, enter SHOW CLUSTER to see whether the nodes
in the queue manager's failover node list are available.
13.11.4.2 Cause
An insufficient failover node list might have been specified for the
queue manager, so that none of the nodes in the failover list is
available to run the queue manager.
13.11.4.3 Correcting the Problem
Make sure the queue manager's node list contains a sufficient number of nodes by entering START/QUEUE/MANAGER with the /ON qualifier to specify a node list appropriate for your configuration.
If you are in doubt about what nodes to specify, Compaq recommends that
you specify an asterisk (*) wildcard character as the last node in the
list; the asterisk indicates that any remaining node in the cluster can
run the queue manager. Specifying the asterisk prevents your queue
manager from becoming unavailable because of an insufficient node list.
13.11.5 If the Queuing System Does Not Work on a Specific OpenVMS Cluster Node
Use this section if the queuing system does not work on a specific node
when it starts up.
13.11.5.1 Investigating the Problem
Perform the following steps:

    %%%%%%%%%%% OPCOM 4-FEB-2000 15:36:49.15 %%%%%%%%%%%
    Message from user SYSTEM on ZNFNDL
    %QMAN-E-COMMERROR, unexpected error #5 in communicating with node CSID 000000
    %%%%%%%%%%% OPCOM 4-FEB-2000 15:36:49.15 %%%%%%%%%%%
    Message from user SYSTEM on ZNFNDL
    -SYSTEM-F-WRONGACP, wrong ACP for device


    $ RUN SYS$SYSTEM:SYSMAN
    SYSMAN> PARAMETERS SHOW SCSSYSTEMID
    Parameter Name    Current    Default    Min.     Max.     Unit        Dynamic
    --------------    -------    -------    -------  -------  ----        -------
    SCSSYSTEMID       19941      0          -1       -1       Pure-numbe
    SYSMAN> PARAMETERS SHOW SCSNODE
    Parameter Name    Current    Default    Min.     Max.     Unit        Dynamic
    --------------    -------    -------    -------  -------  ----        -------
    SCSNODE           "RANDY "   " "        " "      "ZZZZ"   Ascii
    SYSMAN> EXIT
    $ RUN SYS$SYSTEM:NCP
    NCP> SHOW EXECUTOR SUMMARY
    Node Volatile Summary as of 5-FEB-2000 15:50:36
    Executor node = 19.45 (DREAMR)
    State = on
    Identification = DECnet for OpenVMS V7.2
    NCP> EXIT
    $ WRITE SYS$OUTPUT 19*1024+45
    19501

13.11.5.2 Cause
If the DECnet node name and node ID do not match the SCSNODE and
SCSSYSTEMID system address parameters, IPC (interprocess communication,
an operating system internal mechanism) cannot work properly and the
affected node will not be able to participate in the queuing system.
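The comparison described above can be sketched as a small check. The following Python example is illustrative only: the function names are hypothetical, and the formula (area number times 1024, plus node number) is taken from the WRITE SYS$OUTPUT 19*1024+45 line in the example output for the DECnet address 19.45. It assumes an address written in area.node form.

```python
def expected_scssystemid(decnet_address):
    """Compute area * 1024 + node for an address written as 'area.node'."""
    area, node = (int(part) for part in decnet_address.split("."))
    return area * 1024 + node


def node_identity_matches(scsnode, scssystemid, decnet_name, decnet_address):
    """Return True if SCSNODE and SCSSYSTEMID agree with the DECnet values."""
    names_match = scsnode.strip().upper() == decnet_name.strip().upper()
    ids_match = scssystemid == expected_scssystemid(decnet_address)
    return names_match and ids_match


# Values copied from the SYSMAN and NCP output in the example above.
print(expected_scssystemid("19.45"))
print(node_identity_matches("RANDY ", 19941, "DREAMR", "19.45"))
```

With the values from the example, the computed ID is 19501, which differs from the SCSSYSTEMID value of 19941, and the node names RANDY and DREAMR also differ, so the check reports a mismatch.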
13.11.5.3 Correcting the Problem
Perform the following steps:
13.11.6 If You See Inconsistent Queuing Behavior on Different OpenVMS Cluster Nodes
Use this section if you see the following symptoms:
Perform the following steps:

    %%%%%%%%%%% OPCOM 4-FEB-2000 14:41:20.88 %%%%%%%%%%%
    Message from user SYSTEM on MANGLR
    %JBC-E-OPENERR, error opening BOGUS:[QUEUE_DIR]QMAN$MASTER.DAT;
    %%%%%%%%%%% OPCOM 4-FEB-2000 14:41:21.12 %%%%%%%%%%%
    Message from user SYSTEM on MANGLR
    -RMS-E-FNF, file not found

13.11.6.2 Cause
This problem may be caused by different definitions for the logical
name QMAN$MASTER on different nodes in the cluster, which results in
multiple queuing environments. You typically find this problem in
OpenVMS Cluster environments when you have just added a system disk or
moved the queue database.
13.11.6.3 Correcting the Problem
Perform the following steps:

    STOP/QUEUE/MANAGER/CLUSTER/NAME_OF_MANAGER=name

where /NAME_OF_MANAGER specifies the name of the queue
manager to be stopped.
1 This manual has been archived but is available on the OpenVMS Documentation CD-ROM.
13.12 Reporting a Queuing System Problem to Compaq
If you encounter problems with the queuing system that you need to report to a Compaq support representative, provide the information in the following table. This information will help Compaq support representatives diagnose your problem. Please provide as much of the information as possible.
Information | Description |
---|---|
Summary of the problem |
Include the following information:
|
Steps for reproducing the problem | Specify the exact steps and include a list of any special hardware or software required to reproduce the problem. |
Configuration information |
For example:
|
Output from the SHOW QUEUE/MANAGERS/FULL command |
Use SYSMAN to enter the command on all nodes, as follows:
$ RUN SYS$SYSTEM:SYSMAN
Type the output file SYSMAN.LIS to verify that the output for all nodes matches. |
Location of the queue and journal files | If possible, find out the most recent value that was specified in the dirspec parameter of the START/QUEUE/MANAGER command (to specify the location of the queue and journal files). If none was specified, the default is SYS$COMMON:[SYSEXE]. |
Translation of QMAN$MASTER logical name |
Verify that the translation is the same on all nodes.
Enter the following commands, and include the resulting output:
If the translations returned from the SHOW LOGICAL command are not physical disk names, repeat the SHOW LOGICAL command within the environment of each node to translate the returned value until you reach a translation that includes the physical device name. |
Operator log file output |
Enter the following commands to search the operator log for any message
output by the job controller or queue manager:
$ SEARCH SYS$MANAGER:OPERATOR.LOG/WINDOW=5 -
On systems with multiple queue managers, specify the first 12 characters
of the name of each queue manager other than the default. For example,
for a queue manager named PRINT_MANAGER, specify PRINT_MANAGE as follows:
|
Information returned from relevant DCL commands | Include this information if entering a DCL command shows evidence of the problem. |
A copy of the journal file of the queue database |
Use the Backup utility (BACKUP) with the /IGNORE=INTERLOCK qualifier to
create a copy of the file SYS$QUEUE_MANAGER.QMAN$JOURNAL, and provide
this copy to Compaq.
On systems with multiple queue managers, include copies of journal files for all queue managers. Journal files for queue managers other than the default are named in the format name_of_manager.QMAN$JOURNAL. |
Copies of any process dumps that might have been created |
Enter the following commands to find any related process dumps, and
provide copies of the files to Compaq:
$ RUN SYS$SYSTEM:SYSMAN
If the problem involves an execution queue using a symbiont other than PRTSMB or LATSYM, also include process dump files from the symbiont. The file name has the format image_file_name.DMP. |
Output from the SHOW QUEUE command | If your problem affects individual queues, enter the SHOW QUEUE command to show each affected queue. |
Any other relevant information |
For example:
|
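The 12-character rule described in the operator log row of the preceding table can be sketched as follows. This Python example is illustrative only: the function names are hypothetical, and the default search strings QUEUE_MANAGE and JOB_CONTROL are taken from the SEARCH commands shown earlier in this chapter. It builds the command text but does not run it.

```python
# Search strings used in the SEARCH examples earlier in this chapter.
DEFAULT_TERMS = ["QUEUE_MANAGE", "JOB_CONTROL"]


def search_term(manager_name):
    """Return the first 12 characters of a queue manager name."""
    return manager_name.strip().upper()[:12]


def build_search_command(extra_managers=()):
    """Build the SEARCH command text for the default and any extra managers."""
    terms = DEFAULT_TERMS + [search_term(name) for name in extra_managers]
    return "$ SEARCH SYS$MANAGER:OPERATOR.LOG/WINDOW=5 " + ",".join(terms)


print(build_search_command(["PRINT_MANAGER"]))
```

For a queue manager named PRINT_MANAGER, the sketch appends PRINT_MANAGE to the default search strings, as the table describes.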