Document revision date: 30 March 2001

OpenVMS MACRO-32 Porting and User's Guide



2.10.2 Preserving Granularity

To preserve the granularity of a VAX MACRO memory write instruction on a byte, word, or unaligned longword on Alpha means to guarantee that the instruction executes successfully on the specified data and preserves the integrity of the surrounding data.

The VAX architecture includes instructions that perform independent access to byte, word, and unaligned longword locations in memory so two processes can write simultaneously to different bytes of the same aligned longword without interfering with each other.

The Alpha architecture defines instructions that can address only aligned longword and quadword operands. On Alpha, code that writes a data field to memory that is less than a longword in length or is not aligned can do so only by using an interruptible instruction sequence that involves a quadword load, an insertion of the modified data into the quadword, and a quadword store. In this case, two processes that intend to write to different bytes in the same quadword will actually load, perform operations on, and store the whole quadword. Depending on the timing of the load and store operations, one of the byte writes could be lost.
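The lost write described above can be sketched with a small model. This is an illustration only, not compiler output: the quadword is represented as a Python integer, and the names `insert_byte`, `memory`, `loaded_a`, and `loaded_b` are hypothetical. Both writers load the quadword before either one stores, so the second store overwrites the first writer's byte.

```python
# Illustrative model only: a quadword is a Python int, and each "writer"
# performs the load / insert / store steps described in the text.

def insert_byte(quadword: int, byte_index: int, value: int) -> int:
    """Return quadword with the byte at byte_index (0-7) replaced by value."""
    shift = 8 * byte_index
    cleared = quadword & ~(0xFF << shift)
    return cleared | ((value & 0xFF) << shift)

memory = {"qw": 0x0000000000000000}

# Both writers load the same original quadword before either one stores.
loaded_a = memory["qw"]
loaded_b = memory["qw"]

# Writer A updates byte 0; writer B updates byte 1.
memory["qw"] = insert_byte(loaded_a, 0, 0xAA)   # A stores first
memory["qw"] = insert_byte(loaded_b, 1, 0xBB)   # B stores its stale copy

lost_update_result = memory["qw"]
print(hex(lost_update_result))  # 0xbb00: the byte written by A is gone
```

Had the two sequences run one after the other instead of interleaving, the result would have been 0xBBAA.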

The compiler provides the /PRESERVE=GRANULARITY option to guarantee the integrity of byte, word, and unaligned longword writes. The /PRESERVE=GRANULARITY option causes the compiler to generate Alpha instructions that provide granularity preservation for any VAX instructions that write to bytes, words, or unaligned longwords. Alternatively, you can insert the .PRESERVE GRANULARITY and .NOPRESERVE GRANULARITY directives in sections of VAX MACRO source code as required to enable and disable granularity preservation.

For example, the instruction MOVB R1, (R2) generates the following Alpha code sequence:


        LDQ_U     R28,(R2) 
        MSKBL     R28,R2,R28 
        INSBL     R1,R2,R25 
        BIS       R25,R28,R25 
        STQ_U     R25,(R2) 

If any other code thread modifies part of the data pointed to by (R2) between the LDQ_U and the STQ_U instructions, that data will be overwritten and lost.
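To make the data flow of this sequence easier to follow, the following sketch models each instruction as a Python function. It is a simplified model under stated assumptions: memory is a hypothetical dictionary keyed by aligned quadword address, byte order is little-endian, and the function names (`mskbl`, `insbl`, `movb_without_granularity`) are illustrative rather than anything the compiler provides.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def mskbl(ra: int, rb: int) -> int:
    """Model of MSKBL: clear the byte of ra selected by the low 3 bits of rb."""
    return ra & ~(0xFF << (8 * (rb & 7))) & MASK64

def insbl(ra: int, rb: int) -> int:
    """Model of INSBL: shift the low byte of ra into the byte lane selected by rb."""
    return ((ra & 0xFF) << (8 * (rb & 7))) & MASK64

def movb_without_granularity(memory: dict, r1: int, r2: int) -> None:
    """Model of the five-instruction sequence generated for MOVB R1,(R2)."""
    base = r2 & ~7            # LDQ_U and STQ_U ignore the low three address bits
    r28 = memory[base]        # LDQ_U  R28,(R2)
    r28 = mskbl(r28, r2)      # MSKBL  R28,R2,R28
    r25 = insbl(r1, r2)       # INSBL  R1,R2,R25
    r25 = r25 | r28           # BIS    R25,R28,R25
    memory[base] = r25        # STQ_U  R25,(R2)

memory = {0x1000: 0x1122334455667788}
movb_without_granularity(memory, 0xAB, 0x1002)
print(hex(memory[0x1000]))  # 0x1122334455ab7788: only byte 2 changed
```

Nothing in this model prevents another writer from changing `memory[base]` between the load and the store, which is the exposure the surrounding text describes.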

If you have specified that granularity be preserved for the same instruction, by either the command qualifier or the directive, the Alpha code sequence becomes the following:


          BIC       R2,#^B0111,R24 
RETRY:    LDQ_L     R28,(R24) 
          MSKBL     R28,R2,R28 
          INSBL     R1,R2,R25 
          BIS       R25,R28,R25 
          STQ_C     R25,(R24) 
          BEQ       R25, FAIL 
           . 
           . 
           . 
FAIL:     BR        RETRY 

In this case, if the data pointed to by (R2) is modified by another code thread, the operation will be retried.
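The retry behavior can be sketched with a toy load-locked/store-conditional model. This is a minimal sketch, not a description of the hardware: the `LockedMemory` class, its single reservation flag, and the `interfere` callback (which stands in for another code thread running between the load and the store) are all assumptions made for illustration.

```python
class LockedMemory:
    """Toy model of one quadword with a load-locked reservation flag."""

    def __init__(self, value: int):
        self.value = value
        self.reserved = False

    def load_locked(self) -> int:          # stands in for LDQ_L
        self.reserved = True
        return self.value

    def store(self, value: int) -> None:   # ordinary store by another thread
        self.value = value
        self.reserved = False

    def store_conditional(self, value: int) -> bool:   # stands in for STQ_C
        if not self.reserved:
            return False
        self.value = value
        self.reserved = False
        return True


def write_byte_with_retry(mem, byte_index, new_byte, interfere=None):
    """Merge one byte into the quadword, retrying until the store succeeds."""
    shift = 8 * byte_index
    attempts = 0
    while True:
        attempts += 1
        old = mem.load_locked()
        merged = (old & ~(0xFF << shift)) | ((new_byte & 0xFF) << shift)
        if interfere is not None:
            interfere(mem)        # another thread writes the same quadword once
            interfere = None
        if mem.store_conditional(merged):
            return attempts


mem = LockedMemory(0)
attempts = write_byte_with_retry(
    mem, 0, 0xAA, interfere=lambda m: m.store(m.value | 0xBB00))
print(attempts, hex(mem.value))  # 2 0xbbaa: both byte writes survive
```

The first attempt fails because the interfering store clears the reservation. The second attempt reloads the quadword, so the other writer's byte is preserved.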

For a MOVW R1,(R2) instruction, the code generated to preserve granularity depends on whether the register R2 is currently assumed to be aligned by the compiler's register alignment tracking. If R2 is assumed to be aligned, the compiler generates essentially the same code as in the preceding MOVB example, except that it uses INSWL and MSKWL instructions instead of INSBL and MSKBL, and it uses #^B0110 in the BIC of the R2 address. If R2 is assumed to be unaligned, the compiler generates two separate LDQ_L/STQ_C pairs to ensure that the word is correctly written even if it crosses a quadword boundary.

Warning

The code generated for an aligned word write, with granularity preservation enabled, will cause a fatal reserved operand fault at run time if the address is not aligned. If the address being written to could ever be unaligned, inform the compiler that it should generate code that can write to an unaligned word by using the compiler directive .SET_REGISTERS UNALIGNED=Rn immediately before the write instruction.

To preserve the granularity of a MOVL R1,(R2) instruction, the compiler always writes whole longwords with a STL instruction, even if the address to which it is writing is assumed to be unaligned. If the address is unaligned, the STL instruction will cause an unaligned memory reference fault. The PALcode unaligned fault handler will then do the loads, masks, and stores necessary to write the unaligned longword. Because PALcode is noninterruptible, the surrounding memory locations are not corrupted.

When porting an application to an Alpha system, you should determine whether the application performs byte, word, or unaligned longword writes to memory that is shared either with processes executing on the local processor, or with processes executing on another processor in the system, or with an AST routine or condition handler. See Migrating to an OpenVMS AXP System: Recompiling and Relinking Applications for a more complete discussion of the programming issues involved in granularity operations in an Alpha system.

Note

INSV instructions do not generate code that correctly preserves granularity when granularity is turned on.

2.10.3 Precedence of Atomicity Over Granularity

If you enable the preservation of both granularity and atomicity, and the compiler encounters VAX code that requires that both be preserved, atomicity takes precedence over granularity.

For example, the instruction INCW 1(R0), when compiled with /PRESERVE=GRANULARITY, retries the write of the new word value if it is interrupted. However, when compiled with /PRESERVE=ATOMICITY, it also refetches the initial value and increments it if interrupted. If both options are specified, it does the latter.

In addition, while the compiler can successfully generate code for unaligned words and longwords that preserves granularity, it cannot generate code for unaligned words or longwords that preserves atomicity. If both options are specified, all memory references must be to aligned addresses.
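The difference between the two kinds of retry for an instruction such as INCW can be sketched as follows. This is a simplified model under assumptions, not generated code: the `Cell` class, the `interfere` callback (another code thread that runs once in the middle of the operation), and the exact point at which the interference occurs are all hypothetical.

```python
class Cell:
    """Toy memory word with a load-locked / store-conditional reservation."""

    def __init__(self, value: int):
        self.value = value
        self.reserved = False

    def load(self) -> int:
        return self.value

    def load_locked(self) -> int:
        self.reserved = True
        return self.value

    def store(self, value: int) -> None:
        self.value = value
        self.reserved = False

    def store_conditional(self, value: int) -> bool:
        if not self.reserved:
            return False
        self.value = value
        self.reserved = False
        return True


def incw_granularity_only(cell, interfere):
    """Only the write is retried; the incremented value is computed once."""
    new_value = (cell.load() + 1) & 0xFFFF
    interfere(cell)                       # another thread updates the word
    while True:
        cell.load_locked()
        if cell.store_conditional(new_value):
            return


def incw_atomic(cell, interfere):
    """The fetch, the increment, and the write are all retried together."""
    while True:
        new_value = (cell.load_locked() + 1) & 0xFFFF
        if interfere is not None:
            interfere(cell)               # another thread updates the word
            interfere = None
        if cell.store_conditional(new_value):
            return


def add_ten(cell):
    cell.store(cell.value + 10)

granular = Cell(5)
incw_granularity_only(granular, add_ten)
atomic = Cell(5)
incw_atomic(atomic, add_ten)
print(granular.value, atomic.value)  # 6 16
```

In the granularity-only model the other thread's update to the same word is overwritten by the stale incremented value. In the atomic model the word is refetched, so both updates are reflected.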

2.10.4 Examples When Atomicity Cannot Be Guaranteed

Because compiler atomicity guarantees only affect memory modification operands in VAX instructions, you should take special care in examining VAX MACRO sources for coding problems /PRESERVE=ATOMICITY cannot resolve. For instance, consider the following VAX instruction:


ADDL2 (R1),4(R1) 

For this instruction, the compiler generates an Alpha code sequence such as the following, when /PRESERVE=ATOMICITY (or .PRESERVE ATOMICITY) is specified:


        LDL     R28,(R1) 
Retry:  LDL_L   R24,4(R1) 
        ADDL    R28,R24,R24 
        STL_C   R24,4(R1) 
        BEQ     R24,fail 
        . 
        . 
        . 
fail:   BR      Retry 

Note that, in this Alpha code sequence, when the STL_C fails, only the modify operand is reread before the add. The data (R1) is not reread. This behavior differs slightly from VAX behavior. In a VAX system, the entire instruction would execute without interruption; in an Alpha system, only the modify operand is updated atomically.

As a result, code that requires the read of the data (R1) to be atomic must use another method, such as a lock, to obtain that level of synchronization.

Consider another VAX instruction:


MOVL    (R1),4(R1) 

For this instruction, the compiler generates an Alpha code sequence such as the following whether or not atomicity preservation was turned on:


LDL     R28,(R1) 
STL     R28,4(R1) 

The VAX instruction in this example is atomic on a single VAX CPU, but the Alpha instruction sequence is not atomic on a single Alpha CPU. Because the 4(R1) operand is a write operand and not a modify operand, the operation is not made atomic by the use of the LDL_L and STL_C.

Finally, consider a more complex VAX INCL instruction:


INCL    @(R1) 

For this instruction, the compiler generates an Alpha code sequence such as the following, when /PRESERVE=ATOMICITY (or .PRESERVE ATOMICITY) is specified:


        LDL     R28,(R1) 
Retry:  LDL_L   R24,(R28) 
        ADDL    R24,#1,R24 
        STL_C   R24,(R28) 
        BEQ     R24,fail 
        . 
        . 
        . 
fail:   BR      Retry 

Here, only the update of the modify data is atomic. The fetch required to obtain the address of the modify data is not part of the atomic sequence.

2.10.5 Alignment Considerations for Atomicity

When preserving atomicity, the compiler must assume the modify data is aligned. An update of a field spanning a quadword boundary cannot occur atomically since this would require two read-modify-write sequences. Since software cannot handle an unaligned LDx_L or STx_C instruction as it can a normal load or store instruction, a LDx_L or STx_C instruction to an unaligned address will generate a fatal reserved operand fault.

When /PRESERVE=ATOMICITY (or .PRESERVE ATOMICITY) is specified, an INCL (R1) instruction generates LDL_L and STL_C instructions, so R1 must be longword aligned.

For an INCW (R1) instruction, the compiler generates an Alpha code sequence such as the following:


        BIC     R1,#^B0110,R28  ; Compute Aligned Address 
Retry:  LDQ_L   R24,(R28)       ; Load the QW with the data 
        EXTWL   R24,R1,R23     ; Extract out the Word 
        ADDL    R23,#1,R23      ; Increment the Word 
        INSWL   R23,R1,R23     ; Correctly position the Word 
        MSKWL   R24,R1,R24     ; Zero the spot for the Word 
        BIS     R23,R24,R23     ; Combine Original and New word 
        STQ_C   R23,(R28)       ; Conditionally store result 
        BEQ     R23,fail        ; Branch ahead on failure 
        . 
        . 
        . 
fail:   BR      Retry 

Note that the first BIC instruction uses #^B0110, not #^B0111. This is to ensure that the word does not cross a quadword boundary, which would result in an incomplete memory update. If the address in R1 is not pointing to an aligned word, bit 0 will be set and the bit will not be cleared by the BIC instruction. The Load Quadword Locked instruction (LDQ_L) will then generate a fatal reserved operand fault.

An INCB instruction uses #^B0111 to generate the aligned address since all bytes are aligned.
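The effect of the two BIC masks can be checked with a little arithmetic. The following sketch is illustrative only: the function names are hypothetical, and the fault is modeled simply as "the address passed to LDQ_L is not a multiple of 8".

```python
def bic(value: int, mask: int) -> int:
    """Model of Alpha BIC: value AND NOT mask."""
    return value & ~mask

def locked_quadword_address(address: int, operand_size: int) -> int:
    """Address passed to LDQ_L/STQ_C for a byte (1) or word (2) modify operand."""
    mask = {1: 0b0111, 2: 0b0110}[operand_size]
    return bic(address, mask)

def would_fault(address: int, operand_size: int) -> bool:
    """LDQ_L requires a quadword-aligned address; anything else faults."""
    return locked_quadword_address(address, operand_size) % 8 != 0

print(hex(locked_quadword_address(0x1006, 2)))  # 0x1000: aligned word, no fault
print(hex(locked_quadword_address(0x1007, 2)))  # 0x1001: bit 0 survives the BIC
print(would_fault(0x1007, 2))                   # True
print(would_fault(0x1007, 1))                   # False: every byte is aligned
```

Because #^B0110 leaves bit 0 untouched, any odd word address reaches LDQ_L unaligned, including the address 0x1007 whose word would straddle two quadwords.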

2.10.6 Interlocked Instructions and Atomicity

The compiler's methods of preserving atomicity have an interesting side effect in compiled VAX MACRO code. On VAX systems, only the interlocked instructions will work correctly to synchronize access to shared data in multiprocessor systems. On Alpha multiprocessing systems, the code resulting from a compilation of modify instructions (with atomicity preserved) and interlocked instructions would both work correctly, because the LDx_L and STx_C which the compiler generates for both sets of instructions operate correctly across multiple processors.

Because this compiler side effect is specific to Alpha systems and does not port back to VAX systems, you should avoid relying on it when porting VAX MACRO code to Alpha if you intend to run the code on both systems.

However, interlocked instructions must still be used if the memory modification is being used as an interlock for other instructions for which atomicity is not preserved. This is because the Alpha architecture does not guarantee strict write ordering. For example, consider the following VAX MACRO code sequence:


.PRESERVE ATOMICITY 
INCL (R1) 
.NOPRESERVE ATOMICITY 
MOVL (R2),R3 

This code sequence will generate the following Alpha code sequence:


Retry:  LDL_L   R28,(R1) 
        ADDL    R28,#1,R28 
        STL_C   R28,(R1) 
        BEQ     R28, fail 
        LDL     R3, (R2) 
         . 
         . 
         . 
fail:   BR      Retry 

Because of the data prefetching of the Alpha architecture, the data from (R2) may be read before the store to (R1) is processed. If the INCL (R1) instruction is being used as a lock to prevent the data at (R2) from being accessed before the lock is set, the read of (R2) may occur before the increment of (R1) and thus is not protected.

The VAX interlocked instructions generate Alpha MB (memory barrier) instructions before and after the interlocked sequence. This prevents memory loads from being moved across the interlocked instruction.

For example, consider the following code sequence:


ADAWI     #1,(R1) 
MOVL      (R2),R3 

This code sequence will generate the following Alpha code sequence:


        MB 
Retry:  LDL_L   R28,(R1) 
        ADDL    R28,#1,R28 
        STL_C   R28,(R1) 
        BEQ     R28, Fail 
        MB 
        LDL     R3, (R2) 
         . 
         . 
         . 
Fail:   BR      Retry 
 

The MB instructions cause all memory operations before the MB instruction to complete before any memory operations after the MB instruction are allowed to begin.

2.11 Compiling and Linking

The compiler requires the following files, one for compiling, the other for linking:

File                      Description
SYS$LIBRARY:STARLET.MLB   Macro library that defines the compiler directives.
SYS$LIBRARY:STARLET.OLB   Object library containing emulation routines and other routines used by the compiler.

When you compile your code, the compiler automatically checks STARLET.MLB for definitions of compiler directives. Similarly, when you link your code, the linker links against STARLET.OLB to resolve undefined symbols.

The following is an example of a command procedure used to compile the MACRO-32 module [SYS]SYSSNDJBC.MAR:


$ SET DEFAULT WORK1:[PEAK.A.PORT] 
 
$ MACRO/MIGRATION/LIS=LIS$:SYSSNDJBC-ALPHA.LIS - 
        ALPHA$LIBRARY:STARLET/LIB+ - 
        ALPHA$LIBRARY:LIB/LIB+ - 
        ALPHA$LIBRARY:ARCH_DEFS.MAR+ - 
        SRC$SYSSNDJBC.MAR 
$ MACRO/NOOBJECT/LIS=LIS$:SYSSNDJBC-VAX - 
        VAX$LIBRARY:STARLET/LIB+ - 
        VAX$LIBRARY:LIB/LIB+ - 
        VAX$LIBRARY:ARCH_DEFS.MAR+ - 
        SRC$SYSSNDJBC.MAR 
$ EXIT 

Not all modules need both libraries and many modules need component-specific libraries, but this example shows the basic approach to using the compiler.

Note

Compaq recommends that you use the latest version of the compiler. Make sure to use the version of SYS$LIBRARY:STARLET.MLB that ships with the compiler and make sure that the logical name points to the correct directory. Note that SYS$LIBRARY:STARLET.MLB is equivalent to ALPHA$LIBRARY:STARLET.MLB.

2.11.1 Line Numbering in Listing File

The macro expansion line numbering scheme in the listing file is Xnn/mmm, where Xnn shows the nesting depth and mmm is the line number relative to the outermost macro, as shown in the following example.


.MAIN.                  Source Listing    9-SEP-1996 11:36:03    AMAC V3.0-20-311D                    
                                         20-JUL-1992 11:05:38    X6AJ_RESD$:[SYSLIB]ARCH_DEFS.MAR;1 
 
              00000000    1 ; 
              00000000    2 ; This is the ALPHA (previously called "EVAX") version of ARCH_DEFS.MAR, 
              00000000    3 ; which contains architectural definitions for compiling VMS sources 
              00000000    4 ; for VAX and ALPHA systems. 
              00000000    5 ; 
00000001      00000000    6 EVAX = 1 
00000001      00000000    7 ALPHA = 1 
00000001      00000000    8 BIGPAGE = 1 
00000020      00000000    9 ADDRESSBITS = 32 
              00000000   10          .macro  test1 
              00000000   11          clrl    r1 
              00000000   12          clrl    r2 
              00000000   13          tstl    48(sp)    ; generate uplevel stack error 
              00000000   14          clrl    r3 
              00000000   15          .endm 
              00000000   16          .macro  test2 
              00000000   17          clrl    r4 
              00000000   18          clrl    r5 
              00000000   19          test1 
              00000000   20          clrl    r6 
              00000000   21          .endm 
              00000000   22 
              00000000   23  foo:    .jsb_entry 
                 . 
                 . 
                 . 
              00000000   44          clrl    r0 
              00000011   45          test2 
                           1.......      
%AMAC-E-UPLEVSTK, (1) up-level stack reference in routine FOO 
 
    X01/001   00000002               clrl    r4 
    X01/002   00000004               clrl    r5 
    X01/003   00000006               test1 
    X02/004   00000006               clrl    r1 
    X02/005   00000008               clrl    r2 
    X02/006   0000000A               tstl    48(sp)     ; generate uplevel stack error 
    X02/007   0000000D               clrl    r3 
    X02/008   0000000F   
    X01/009   0000000F               clrl    r6 
    X01/010   00000011   
              00000011   46          rsb 
              00000012   47          .end 
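
When scanning a listing with a script, the Xnn/mmm prefix can be split into its two parts. The following is a minimal sketch, assuming the prefix appears at the start of the line as in the listing above; the function name and regular expression are hypothetical and not part of any OpenVMS tool.

```python
import re

# Matches a macro expansion prefix such as "X02/006" at the start of a line.
EXPANSION_PREFIX = re.compile(r"^\s*X(\d+)/(\d+)\b")

def parse_expansion_prefix(line: str):
    """Return (nesting_depth, relative_line) or None for ordinary source lines."""
    match = EXPANSION_PREFIX.match(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

print(parse_expansion_prefix("    X02/006   0000000A   tstl    48(sp)"))  # (2, 6)
print(parse_expansion_prefix("              00000011   46   rsb"))        # None
```

For the line reported in the listing above, a depth of 2 and a relative line of 6 identify the TSTL instruction inside TEST1, which was itself expanded from TEST2.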
 

2.11.2 Linking an Object Module

To link the object files produced by the compiler, use the following commands as a basis:


$ @ALPHA$TOOLS:LINK       ! Set up DCL and Logical to EXE 
$ LINK/ALPHA image_name,object1,object2,... 

For certain VAX instructions (such as the divide instructions and others described in this manual), the compiler produces object code that issues a call to the OpenVMS General-Purpose Run-Time Library (OTS$ RTL). By default, the linker links against the library that contains these routines.

2.12 Debugging

The compiler provides full debugger support. The debug session for compiled VAX MACRO code is similar to that for assembled VAX MACRO code. However, there are some important differences that are described in this section. For a complete description of debugging, see the OpenVMS Debugger Manual.

2.12.1 Code Relocation

One major difference is that the code is compiled rather than assembled. On a VAX system, each VAX MACRO instruction is a single machine instruction. On an Alpha system, each VAX MACRO instruction may be compiled into many Alpha machine instructions. A major side effect of this difference is the relocation and rescheduling of code if you do not specify /NOOPTIMIZE in your compile command.

By default, several optimizations are performed that cause the movement of generated code across source boundaries (see Section 1.2, Section 4.3, and Appendix A). For most code modules, debugging is simplified if you compile with /NOOPTIMIZE, which prevents this relocation from happening. After you have debugged your code, you can recompile without /NOOPTIMIZE to improve performance.

2.12.2 Symbolic Variables for Routine Arguments

Another major difference between debugging compiled code and debugging assembled code is a new concept to VAX MACRO, the definition of symbolic variables for examining routine arguments. On VAX systems, when you are debugging a routine and want to examine the arguments, you typically do something like the following:


        DBG> EXAMINE @AP        ; to see the argument count 
        DBG> EXAMINE @AP+4      ; to examine the first arg 

or


        DBG> EXAMINE @AP        ; to see arg count 
        DBG> EXAMINE .+4:.+20   ; to see first 5 args 

On Alpha systems, the arguments do not reside in a vector in memory as they do on VAX systems. Furthermore, there is no AP register on Alpha systems. If you type EXAMINE @AP when debugging VAX MACRO compiled code, the debugger reports that AP is an undefined symbol.

In the compiled code, the arguments can reside in some combination of:

The compiler does not require that you figure out where the arguments are by reading the generated code. Instead, it provides $ARGn symbols that point to the correct argument locations. The $ARG0 symbol is the same as @AP+0 is on VAX systems, that is, the argument count. The $ARG1 symbol is the first argument, $ARG2 is the second argument, and so forth. These symbols are defined in .CALL_ENTRY and .JSB_ENTRY directives, but not in .EXCEPTION_ENTRY directives.

2.12.3 Locating Arguments Without $ARGn Symbols

There may be additional arguments in your code for which the compiler did not generate a $ARGn symbol. The number of $ARGn symbols defined for a .CALL_ENTRY routine is the maximum number detected by the compiler (either by automatic detection or as specified by MAX_ARGS) or 16, whichever is less. For a .JSB_ENTRY routine, since the arguments are homed in the caller's stack frame and the compiler cannot detect the actual number, it always creates eight $ARGn symbols.
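
The rule in the preceding paragraph can be restated as a small function. This is only a restatement of the rule in the text, not compiler code; the function name and its parameters are hypothetical.

```python
def arg_symbol_count(entry_kind: str, detected_args: int = 0) -> int:
    """Number of $ARGn symbols defined, per the rule stated in the text.

    entry_kind    -- "CALL_ENTRY" or "JSB_ENTRY"
    detected_args -- maximum argument count detected by the compiler or
                     given by MAX_ARGS (used for CALL_ENTRY only)
    """
    if entry_kind == "CALL_ENTRY":
        return min(detected_args, 16)
    if entry_kind == "JSB_ENTRY":
        return 8
    raise ValueError("no $ARGn symbols are described for " + entry_kind)

print(arg_symbol_count("CALL_ENTRY", 5))    # 5
print(arg_symbol_count("CALL_ENTRY", 20))   # 16
print(arg_symbol_count("JSB_ENTRY"))        # 8
```

A routine declared with more than 16 arguments is therefore one case where some arguments have no symbol and must be located another way.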

In most cases, you can easily find any additional arguments, but in some cases you cannot.

