Updated: 11 December 1998

OpenVMS VAX System Dump Analyzer Utility Manual



9.4.1 Examining the Routine

The MOVB instruction is part of a routine that reads characters from a buffer and writes them to the printer. The routine contains the loop of instructions that starts at the label 20$ and ends at 25$. This loop executes once for each character in the buffer, performing these steps:

  1. The driver checks the printer's status register to see if the printer is ready.
  2. If the printer is ready, the driver gets a character from the buffer and moves it to the printer's data register, to which R0 points.
  3. It then decrements R1, which contains the count of characters left to print. If R1 contains a number greater than 0, control is passed back to the instruction at 20$, and the loop begins again.

Steps 1 through 3 are repeated until the contents of R1 are 0 or the printer signals that it is not ready.

If the printer signals that it is not ready, the driver transfers control to 30$ (line 598), the beginning of a routine that waits for an interrupt from the printer. When the printer becomes ready, it interrupts the driver and execution of the loop resumes.

Examine the code to determine which variables control the loop.

The byte count (BCNT) is the number of characters in the buffer. A function decision table (FDT) routine sets this value. In line 586, the starting address of a buffer that is BCNT bytes in size is moved into R3.

Note also that the number of characters left to be printed is represented by the byte offset (BOFF), the offset into the buffer at which the driver finds the next character to be printed. This value controls the number of times the loop is executed.

Because the exception is an access violation, either R3 or R0 must contain an incorrect value. You can determine that R0 is probably valid by the following logic:

Thus, the contents of R3 seem to be the cause of the failure.

The most likely reason that the contents of R3 are wrong is that the MOVB instruction at line 599 executes too many times. You can check this by comparing the contents of UCB$W_BOFF and UCB$W_BCNT. If UCB$W_BOFF contains a larger value than that in UCB$W_BCNT, then R3 contains a value that is too large, indicating that the MOVB instruction has incremented the contents of R3 too many times.
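The overrun described above can be illustrated with a toy model of the loop. The following Python sketch is not driver code: the function name `run_output_loop` and its parameters are hypothetical, `next_index` stands in for the buffer pointer in R3, and `remaining` stands in for the count in R1. It only shows why a count that is too large drives the pointer past the end of the buffer.

```python
from typing import Callable, List, Tuple


def run_output_loop(
    buffer: bytes,
    next_index: int,
    remaining: int,
    printer_ready: Callable[[], bool],
) -> Tuple[List[int], int, int]:
    """Toy model of the loop; returns (characters sent, next index, remaining)."""
    sent: List[int] = []
    while True:
        if not printer_ready():          # step 1: printer not ready, leave the loop
            break
        sent.append(buffer[next_index])  # step 2: fetch the next character
        next_index += 1                  # the buffer pointer advances
        remaining -= 1                   # step 3: one fewer character to print
        if remaining <= 0:
            break
    return sent, next_index, remaining


if __name__ == "__main__":
    # A correct count drains the buffer exactly.
    print(run_output_loop(b"ABC", 0, 3, lambda: True))
    # A count larger than the buffer makes the fetch run past the end.
    try:
        run_output_loop(b"ABC", 0, 5, lambda: True)
    except IndexError as error:
        print("fetch past end of buffer:", error)
```

In this model, a count of 3 drains a three-character buffer, while a count of 5 makes the fourth fetch index past the end. Python raises IndexError at that point, which plays the role of the access violation in this sketch only.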

9.4.2 Checking the Values of Key Variables

Because the start-I/O routine requires that R5 contain the address of the printer's UCB, and because several other instructions reference R5 without error before any instruction in the loop does, you can assume that R5 contains the address of the right UCB. To compare BOFF and BCNT, use the command FORMAT @R5 to display the contents of the UCB, as shown in the following session.


SDA> READ SYS$SYSTEM:SYSDEF.STB
SDA> FORMAT @R5


8005D160    UCB$L_FQFL      800039A8 
            UCB$L_RQFL 
            UCB$W_MB_SEED 
            UCB$W_UNIT_SEED 
8005D164    UCB$L_FQBL      800039A8 
            UCB$L_RQBL 
8005D168    UCB$W_SIZE          0122 
8005D16A    UCB$B_TYPE        10 
8005D16B    UCB$B_FIPL      34 
            UCB$B_FLCK 
   .
   .
   .
8005D1C8    UCB$L_SVAPTE    80062720 
8005D1CC    UCB$W_BOFF          0795 
8005D1CE    UCB$W_BCNT      006D 
8005D1D0    UCB$B_ERTCNT          00 
8005D1D1    UCB$B_ERTMAX        00 
8005D1D2    UCB$W_ERRCNT    0000 
   .
   .
   .
SDA> 

If you have only one printer in your system configuration, you do not need to use the FORMAT command. Instead, you can use the command SHOW DEVICE LP. Because only one printer is connected to the processor, only one UCB is associated with a printer for SDA to display.

The output produced by the FORMAT @R5 command shows that UCB$W_BOFF contains a value greater than that in UCB$W_BCNT; it should be smaller. Therefore, the value stored in BOFF is incorrect.

Thus, the value of BOFF is not the number of characters that remain in the buffer. This value is used in calculating an address that is referenced at an elevated IPL. When this address is within a null page (unreadable in all access modes), an attempt to reference it causes the system to fail.
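The comparison of the two fields can be reproduced outside SDA as a quick sanity check. The following Python sketch is not part of SDA; the function name `boff_exceeds_bcnt` is hypothetical, and the two values are copied from the FORMAT @R5 display shown earlier.

```python
# Values copied from the FORMAT @R5 display (hexadecimal words).
UCB_W_BOFF = 0x0795  # stored count of characters remaining
UCB_W_BCNT = 0x006D  # total number of characters in the buffer


def boff_exceeds_bcnt(boff: int, bcnt: int) -> bool:
    """Return True when the remaining count is larger than the buffer size."""
    return boff > bcnt


if __name__ == "__main__":
    print(f"BOFF = {UCB_W_BOFF} decimal, BCNT = {UCB_W_BCNT} decimal")
    if boff_exceeds_bcnt(UCB_W_BOFF, UCB_W_BCNT):
        print("inconsistent: more characters remaining than the buffer holds")
    else:
        print("consistent")
```

Here BOFF is 1941 decimal against a buffer of 109 characters, so the stored count cannot be correct.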

9.4.3 Identifying and Correcting the Defective Code

Examine the printer driver code to locate all instructions that modify UCB$W_BOFF. The value changes in two circumstances:

When the printer times out, the driver should not modify UCB$W_BOFF. It does so, however, in line 631. The driver should modify the contents of UCB$W_BOFF only when it is certain that the printer printed the character. When the printer times out, this is not the case. Furthermore, the wait-for-interrupt routine preserves only registers R3, R4, and R5, so that only those registers can be used unmodified after the execution of the wait-for-interrupt routine. Thus, the use of R1 in line 631 is an error.

To correct the problem, change the WFIKPCH argument (line 616) so that, when the printer times out, the WFIKPCH macro transfers control to 50$ rather than to 40$.


607 
608 30$: BNEQ    40$                  ;If NEQ paper problem 
609      ADDW3   #1,R1,UCB$W_BOFF(R5) ;Save number of characters remaining 
610      DEVICELOCK - 
611              LOCKADDR=UCB$L_DLCK(R5),-  ;Lock device interrupts 
612              SAVIPL=-(SP)         ;Save current IPL      
613      BITW    #^X80,LP_CSR(R4)     ;Is it ready now? 
614      BNEQ    35$                  ;If NEQ, yes, it's ready 
615      BISB    #^X40,LP_CSR(R4)     ;Set interrupt enable 
616      WFIKPCH 40$,#12              ;Wait for ready interrupt 
617      IOFORK                       ;Create a fork process 
618      BRB     10$                  ;  ...and start next output 
619 
620 35$: 
621      DEVICEUNLOCK - 
622              LOCKADDR=UCB$L_DLCK(R5),-  ;Unlock device interrupts 
623              NEWIPL=(SP)+         ;Restore IPL 
624      CLRW    LP_CSR(R4)           ;Disable device interrupts 
625      BRB     10$                  ;Go transfer more characters 
626 ; 
627 ; PRINTER HAS PAPER PROBLEM 
628 ; 
629 
630 40$: CLRL    UCB$L_LP_OFLCNT(R5)  ;Clear offline counter 
631      ADDW3   #1,R1,UCB$W_BOFF(R5) ;Save number of characters remaining 
632 50$: CLRW    LP_CSR(R4)           ;Disable printer interrupt 
633      IOFORK                       ;Lower to fork level 
634      BBS     #UCB$V_CANCEL,UCB$W_STS(R5),80$  ;If set, cancel I/O operation 
635      TSTW    LP_CSR(R4)           ;Printer still have paper problem? 
636      BLSS    55$                  ;If LSS yes 
637      MOVL    #15,UCB$L_LP_TIMEOUT(R5)  ;Set timeout value 
638      BRB     10$                  ; ...and start next output 

10 Inducing a System Failure

If the operating system is not performing well and you want to create a dump you can examine, you must induce a system failure. Occasionally, a device driver or other user-written, kernel-mode code can cause the system to execute a loop of code at a high priority, interfering with normal system operation. This can occur even though you have set a breakpoint in the code if the loop is encountered before the breakpoint. To gain control of the system in such circumstances, you must cause the system to fail and then reboot it.

If the system has suspended all noticeable activity (if it is "hung"), see the examples of causing system failures in Section 10.2.

If you are generating a system crash in response to a system hang, be sure to record the PC at the time of the system halt as well as the contents of the general registers. Submit this information to Digital, along with the Software Performance Report (SPR) and a copy of the generated system dump file.

10.1 Meeting Crash Dump Requirements

The following requirements must be met before the system can write a complete crash dump:

10.2 Examples of How to Cause System Failures

The following examples show the sequence of console commands needed to cause a system failure on each type of processor. In each instance, after halting the processor and examining its registers, you place the equivalent of -1 (for example, FFFFFFFF hexadecimal) into the PC. The value placed in the PSL sets the processor access mode to kernel and the IPL to 31. After these commands are executed, an INVEXCEPTN bugcheck is reported on the console terminal, followed by a listing of the contents of the processor registers.
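The PC and PSL values deposited in the sessions that follow can be checked by hand. The following Python sketch is not a console or SDA facility; the helper names `decode_psl` and `as_signed_32` are hypothetical, and the field positions are an assumption taken from the VAX architecture's PSL layout (interrupt-stack flag in bit 26, current mode in bits 25:24, IPL in bits 20:16), which this manual does not spell out.

```python
MODES = ("kernel", "executive", "supervisor", "user")


def decode_psl(psl: int) -> dict:
    """Extract the fields of interest from a 32-bit PSL value.

    Assumed layout: bit 26 = interrupt stack, bits 25:24 = current mode,
    bits 20:16 = interrupt priority level (IPL).
    """
    return {
        "interrupt_stack": bool((psl >> 26) & 0x1),
        "current_mode": MODES[(psl >> 24) & 0x3],
        "ipl": (psl >> 16) & 0x1F,
    }


def as_signed_32(value: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


if __name__ == "__main__":
    print(as_signed_32(0xFFFFFFFF))   # the value deposited in the PC
    print(decode_psl(0x041F0000))     # PSL used in most of the sessions
    print(decode_psl(0x001F0000))     # PSL used on the VAX 8600/8650 and VAX-11/730
```

Under these assumptions, both PSL values decode to kernel mode at IPL 31; 41F0000 additionally selects the interrupt stack, matching the comment in the VAX 85x0/8700/88x0 command procedure.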

The console volume of most processors contains a command file named either CRASH.COM or CRASH.CMD, which you can execute to perform these commands. Note that the console sessions recorded in this section omit much of the information the console displays in response to the listed commands.

VAX 85x0/8700/88x0

The following series of console commands causes a system failure on the VAX 85x0/8700/88x0 systems. (Note that the console prompt for the VAX 8810, 8820, and 8830 systems is PS-CIO-0> and not >>>.)


$ [Ctrl/P]
>>> SET CPU CURRENT_PRIMARY
>>> HALT
?00       Left CPU -- CPU halted 
          PC = 8001911C
>>> @CRASH
!
! Command procedure to force bugcheck via access violation
!
SET VERIFY
SET CPU CURRENT_PRIMARY    !Select primary
EXAMINE PSL                !Display PSL
        M 00000000 00420008
EXAMINE/I/NEXT 4 0
. 
. 
. 
 
 
DEPOSIT PC FFFFFFFF        !Set PC=-1 to force ACCVIO
DEPOSIT PSL 41F0000        !Set IPL=31, interrupt stack
CONTINUE                   !Execute from PC=-1
 

VAX 82x0/83x0, VAXstation 3520/3540, 6000 Series, and 9000 Series

The following console commands cause a system failure on a VAX 82x0/83x0 system, a VAXstation 3520/3540 system, a VAX 6000 series system, or a VAX 9000 series system.


$ [Ctrl/P]
        PC = 80008B1F
>>> E P
>>> E/I 0
>>> E/I +
>>> E/I +
>>> E/I +
>>> E/I +
>>> D/G F FFFFFFFF
>>> D P 41F0000
>>> C
 

VAX 8600/8650

The following console commands cause a system failure on the VAX 8600/8650 systems.


$ [Ctrl/P]
>>> @CRASH
    SET QUIET OFF          !Make clearer 
    SET ABORT OFF          !Don't abort on E/VIR command 
    HALT 
        CPU stopped, INVOKED BY CONSOLE (CSM code 11) 
        PC 80008B1F 
    UNJAM                  !Clear the way 
    E PSL                  !Display PSL 
        U PSL 00000000 
    E/I/N:4 0              !Display stack pointers
   .
   .
   .
 
    E SP                   !Get current stack pointers 
        G 0E 80000C40 
    E/vir/next:40 @        !Dump top of stack
   .
   .
   .
 
    D PC FFFFFFFF          !Invalidate the PC 
    D PSL 1F0000           !Kernel mode, IPL 31 
    SET ABORT ON           !Restore abort flag 
    SET QUIET ON           !Shut output off 
    CONTINUE               !Force a machine check
 

VAX-11/780 and VAX-11/785

The following console commands cause a system failure on the VAX-11/780 and VAX-11/785 processors.


$ [Ctrl/P]
>>> @CRASH
HALT                       !Halt system, examine PC,
HALTED AT 80008A89 
 
EXAMINE PSL                !PSL,
00000000 
 
EXAMINE/INTERN/NEXT:4 0    !and all stack pointers
 
DEPOSIT PC = -1            !Invalidate PC
DEPOSIT PSL = 41F0000       !Kernel mode, IPL 31 
 
 CONTINUE
 

VAX-11/750

The following code causes a system failure on a VAX-11/750. On this processor, the HALT command is a NOP; a Ctrl/P automatically halts the processor.


$ [Ctrl/P]
>>> H
>>> E P
>>> E/I 0
>>> E/I +
>>> E/I +
>>> E/I +
>>> E/I +
>>> D/G F FFFFFFFF
>>> D P 41F0000
>>> C

MicroVAX 3400/3600/3900 Series, VAXstation/MicroVAX 3100, VAXstation/MicroVAX 2000, MicroVAX II, and VAX 4000 Series

To force a crash of a MicroVAX, you must first halt the processor. (After you halt the processor, press the HALT button again so that it is popped out and is not illuminated.) Then, issue the following console commands:


>>> E PSL
>>> E/I/N:4 0
>>> D PC FFFFFFFF
>>> D PSL 41F0000
>>> C

VAX-11/730

The following console commands cause a system failure on a VAX-11/730. Ctrl/P automatically halts the processor.


$ [Ctrl/P]
>>> H
>>> E PSL
>>> E/I/N:4 0
>>> D PC FFFFFFFF
>>> D PSL 1F0000
>>> C

SDA Usage Summary

The System Dump Analyzer is a utility that you can use to help determine the causes of system failures. This utility is also useful for examining the running system.

Format

ANALYZE {/CRASH_DUMP [/RELEASE] filespec | /SYSTEM} [/SYMBOL=system-symbol-table]


Command Parameter

filespec

Name of the file that contains the dump you want to analyze. At least one field of the filespec is required, and it can be any field. The default filespec is the highest version of SYSDUMP.DMP in your default directory.

Usage Summary

The following table summarizes how to perform key SDA operations.

Operation: Invoke SDA to analyze a system dump
Command: $ ANALYZE/CRASH_DUMP filename
Explanation or requirements: If you do not specify a file name, SDA prompts you for one. Reading the dump file usually requires system privilege (SYSPRV), but your system manager can allow less privileged processes to read dump files. Your process needs change-mode-to-kernel (CMKRNL) privilege to release page file dump blocks, whether you use the /RELEASE qualifier or the SDA COPY command.

Operation: Invoke SDA to analyze a running system
Command: $ ANALYZE/SYSTEM
Explanation or requirements: Your process must have change-mode-to-kernel (CMKRNL) privilege. You cannot specify a file name with the /SYSTEM qualifier.

Operation: Send all output from SDA to a file
Command: SDA> SET OUTPUT filename
Explanation or requirements: The file produced is 132 columns wide and is formatted for output to a printer.

Operation: Redirect the output to your terminal
Command: SDA> SET OUTPUT SYS$OUTPUT
Explanation or requirements: None.

Operation: Send a copy of all the commands you enter and all the output those commands produce to a file
Command: SDA> SET LOG filename
Explanation or requirements: The file produced is 132 columns wide and is formatted for output to a printer.

Operation: Exit an SDA display or the SDA utility
Command: SDA> EXIT
Explanation or requirements: If SDA is in display mode, you must use the EXIT command twice: once to exit display mode and a second time to exit SDA.

SDA Qualifiers

The following qualifiers, described in this section, determine whether the object of an SDA session is a crash dump or a running system. They also help create the environment of an SDA session. Table SDA-10 briefly describes the SDA qualifiers.

Table SDA-10 Descriptions of SDA Qualifiers

Qualifier      Description
/CRASH_DUMP    Invokes SDA to analyze a specified dump file
/RELEASE       Invokes SDA to release those blocks that are occupied by a crash dump in a specified system paging file
/SYMBOL        Specifies a system symbol table for SDA to use in place of the system symbol table it uses by default (SYS$SYSTEM:SYS.STB)
/SYSTEM        Invokes SDA to analyze a running system

/CRASH_DUMP

Invokes SDA to analyze the specified dump file.

Format

/CRASH_DUMP filespec


Parameter

filespec

Name of the crash dump file to be analyzed. The default file specification is:

SYS$DISK:[default-dir]SYSDUMP.DMP

SYS$DISK and [default-dir] represent the disk and directory specified in your last SET DEFAULT command. If you do not specify filespec, SDA prompts you for it.

Description

See Section 2 for additional information on crash dump analysis.

Examples

#1

$ ANALYZE/CRASH_DUMP SYS$SYSTEM:SYSDUMP.DMP
$ ANALYZE/CRASH SYS$SYSTEM
      

These commands invoke SDA to analyze the crash dump stored in SYS$SYSTEM:SYSDUMP.DMP.

#2

$ ANALYZE/CRASH SYS$SYSTEM:PAGEFILE.SYS
      

This command invokes SDA to analyze a crash dump stored in the system paging file.

/RELEASE

Invokes SDA to release those blocks in the specified system paging file occupied by a crash dump.

Format

/RELEASE filespec


Parameter

filespec

Name of the system page file (SYS$SYSTEM:PAGEFILE.SYS). The default file specification is:

SYS$DISK:[default-dir]SYSDUMP.DMP

SYS$DISK and [default-dir] represent the disk and directory specified in your last SET DEFAULT command. If you do not specify filespec, SDA prompts you for it.

Description

You use the /RELEASE qualifier to release from the system paging file those blocks occupied by a crash dump. When invoked with the /RELEASE qualifier, SDA immediately deletes the dump from the paging file and allows no opportunity to analyze its contents.

When you specify the /RELEASE qualifier in the ANALYZE command, you must also do the following:

  1. Use the /CRASH_DUMP qualifier.
  2. Include the name of the system paging file (SYS$SYSTEM:PAGEFILE.SYS) as the filespec.

If you do not specify the system paging file or the specified paging file does not contain a dump, SDA generates the following messages:


%SDA-E-BLKSNRLSD, no dump blocks in page file to release, or not page file 
%SDA-E-NOTPAGFIL, specified file is not the page file 


Example


$ ANALYZE/CRASH_DUMP/RELEASE SYS$SYSTEM:PAGEFILE.SYS
      

This command invokes SDA to release to the paging file those blocks in SYS$SYSTEM:PAGEFILE.SYS occupied by a crash dump.

/SYMBOL

Specifies a system symbol table for SDA to use in place of the system symbol table it uses by default (SYS$SYSTEM:SYS.STB).

Format

/SYMBOL=system-symbol-table


Parameter

system-symbol-table

File specification of the SDA system symbol table needed to define symbols required by SDA to analyze a dump from a particular system. The specified system-symbol-table must contain those symbols required by SDA to find certain locations in the executive image.

If you do not specify the /SYMBOL qualifier, SDA uses SYS$SYSTEM:SYS.STB by default. When you do specify the /SYMBOL qualifier, SDA assumes the default disk and directory to be SYS$DISK:, that is, the disk and directory specified in your last SET DEFAULT command. If the file named in the /SYMBOL qualifier is not a system symbol table, SDA halts with a fatal error.


Description

The /SYMBOL qualifier allows you to specify a system symbol table, other than SYS$SYSTEM:SYS.STB, to load into the SDA symbol table. This might be necessary, for instance, to analyze a crash dump taken on a processor running a different version of OpenVMS.

You can use the /SYMBOL qualifier whether you are analyzing a system dump or a running system.


Example


$ ANALYZE/CRASH_DUMP/SYMBOL=SYS$CRASH:SYS.STB SYS$SYSTEM
      

This command invokes SDA to analyze the crash dump stored in SYS$SYSTEM:SYSDUMP.DMP, using the system symbol table at SYS$CRASH:SYS.STB.



Copyright © Compaq Computer Corporation 1998. All rights reserved.
