OpenVMS MACRO-32 Porting and User's Guide

Document revision date: 30 March 2001

OpenVMS MACRO-32 Porting and User's Guide

Contents

Index

2.10.2 Preserving Granularity

To preserve the granularity of a VAX MACRO memory write instruction on a byte, word, or unaligned longword on Alpha means to guarantee that the instruction executes successfully on the specified data and preserves the integrity of the surrounding data.

The VAX architecture includes instructions that perform independent access to byte, word, and unaligned longword locations in memory so two processes can write simultaneously to different bytes of the same aligned longword without interfering with each other.

The Alpha architecture defines instructions that can address only aligned longword and quadword operands. On Alpha, code that writes a data field to memory that is less than a longword in length or is not aligned can do so only by using an interruptible instruction sequence that involves a quadword load, an insertion of the modified data into the quadword, and a quadword store. In this case, two processes that intend to write to different bytes in the same quadword will actually load, perform operations on, and store the whole quadword. Depending on the timing of the load and store operations, one of the byte writes could be lost.

The compiler provides the /PRESERVE=GRANULARITY option to guarantee the integrity of byte, word, and unaligned longword writes. The /PRESERVE=GRANULARITY option causes the compiler to generate Alpha instructions that provide granularity preservation for any VAX instructions that write to bytes, words, or unaligned longwords. Alternatively, you can insert the .PRESERVE GRANULARITY and .NOPRESERVE GRANULARITY directives in sections of VAX MACRO source code as required to enable and disable granularity preservation.

For example, the instruction MOVB R1, (R2) generates the following Alpha code sequence:

LDQ_U R28,(R2) MSKBL R28,R2,R28 INSBL R1,R2,R25 BIS R25,R28,R25 STQ_U R25,(R2)

If any other code thread modifies part of the data pointed to by (R2) between the LDQ_U and the STQ_U instructions, that data will be overwritten and lost.

If you have specified that granularity be preserved for the same instruction, by either the command qualifier or the directive, the Alpha command sequence becomes the following:

BIC R2,#^B0111,R24 RETRY: LDQ_L R28,(R24) MSKBL R28,R2,R28 INSBL R1,R2,R25 BIS R25,R28,R25 STQ_C R25,(R24) BEQ R25, FAIL . . . FAIL: BR RETRY

In this case, if the data pointed to by (R2) is modified by another code thread, the operation will be retried.

For a MOVW R1,(R2) instruction, the code generated to preserve granularity depends on whether the register R2 is currently assumed to be aligned by the compiler's register alignment tracking. If R2 is assumed to be aligned, the compiler generates essentially the same code as in the preceding MOVB example, except that it uses INSWL and MSKWL instructions instead of INSBL and MSKBL, and it uses #^B0110 in the BIC of the R2 address. If R2 is assumed to be unaligned, the compiler generates two separate LDQ_L/STQ_C pairs to ensure that the word is correctly written even if it crosses a quadword boundary.

Warning

The code generated for an aligned word write, with granularity preservation enabled, will cause a fatal reserved operand fault at run time if the address is not aligned. If the address being written to could ever be unaligned, inform the compiler that it should generate code that can write to an unaligned word by using the compiler directive .SET_REGISTERS UNALIGNED=Rn immediately before the write instruction.

To preserve the granularity of a MOVL R1,(R2) instruction, the compiler always writes whole longwords with a STL instruction, even if the address to which it is writing is assumed to be unaligned. If the address is unaligned, the STL instruction will cause an unaligned memory reference fault. The PALcode unaligned fault handler will then do the loads, masks, and stores necessary to write the unaligned longword. However, since PALcode is noninterruptible, this ensures that the surrounding memory locations are not corrupted.

When porting an application to an Alpha system, you should determine whether the application performs byte, word, or unaligned longword writes to memory that is shared either with processes executing on the local processor, or with processes executing on another processor in the system, or with an AST routine or condition handler. See Migrating to an OpenVMS AXP System: Recompiling and Relinking Applications for a more complete discussion of the programming issues involved in granularity operations in an Alpha system.

Note

INSV instructions do not generate code that correctly preserves granularity when granularity is turned on.

2.10.3 Precedence of Atomicity Over Granularity

If you enable the preservation of both granularity and atomicity, and the compiler encounters VAX code that requires that both be preserved, atomicity takes precedence over granularity.

For example, the instruction INCW 1(R0), when compiled with .PRESERVE=GRANULARITY, retries the write of the new word value, if it is interrupted. However, when compiled with .PRESERVE=ATOMICITY, it will also refetch the initial value and increment it, if interrupted. If both options are specified, it will do the latter.

In addition, while the compiler can successfully generate code for unaligned words and longwords that preserves granularity, it cannot generate code for unaligned words or longwords that preserves atomicity. If both options are specified, all memory references must be to aligned addresses.

2.10.4 Examples When Atomicity Cannot Be Guaranteed

Because compiler atomicity guarantees only affect memory modification operands in VAX instructions, you should take special care in examining VAX MACRO sources for coding problems /PRESERVE=ATOMICITY cannot resolve. For instance, consider the following VAX instruction:

ADDL2 (R1),4(R1)

For this instruction, the compiler generates an Alpha code sequence such as the following, when /PRESERVE=ATOMICITY (or .PRESERVE ATOMICITY) is specified:

LDL R28,(R1) Retry: LDL_L R24,4(R1) ADDL R28,R24,R24 STL_C R24,4(R1) BEQ fail . . . fail: BR Retry

Note that, in this Alpha code sequence, when the STL_C fails, only the modify operand is reread before the add. The data (R1) is not reread. This behavior differs slightly from VAX behavior. In a VAX system, the entire instruction would execute without interruption; in an Alpha system, only the modify operand is updated atomically.

As a result, code that requires the read of the data (R1) to be atomic must use another method, such as a lock, to obtain that level of synchronization.

Consider another VAX instruction:

MOVL (R1),4(R1)

For this instruction, the compiler generates an Alpha code sequence such as the following whether or not atomicity preservation was turned on:

LDL R28,(R1) STL R28,4(R1)

The VAX instruction in this example is atomic on a single VAX CPU, but the Alpha instruction sequence is not atomic on a single Alpha CPU. Because the 4(R1) operand is a write operand and not a modify operand, the operation is not made atomic by the use of the LDL_L and STL_C.

Finally, consider a more complex VAX INCL instruction:

INCL @(R1)

For this instruction, the compiler generates an Alpha code sequence such as the following, when /PRESERVE=ATOMICITY (or .PRESERVE ATOMICITY) is specified:

LDL R28,(R1) Retry: LDL_L R24,(R28) ADDL R24,#1,R24 STL_C R24,(R28) BEQ fail . . . fail: BR Retry

Here, only the update of the modify data is atomic. The fetch required to obtain the address of the modify data is not part of the atomic sequence.

2.10.5 Alignment Considerations for Atomicity

When preserving atomicity, the compiler must assume the modify data is aligned. An update of a field spanning a quadword boundary cannot occur atomically since this would require two read-modify-write sequences. Since software cannot handle an unaligned LDx_L or STx_C instruction as it can a normal load or store instruction, a LDx_L or STx_C instruction to an unaligned address will generate a fatal reserved operand fault.

When /PRESERVE=ATOMICITY (or .PRESERVE ATOMICITY) is specified, an INCL (R1) instruction generates LDL_L and STL_C instructions so R1 must be longword aligned.

For an INCW (R1) instruction, the compiler generates an Alpha code sequence such as the following:

BIC R1,#^B0110,R28 ; Compute Aligned Address Retry: LDQ_L R24,(R28) ; Load the QW with the data EXTWL R24,R1,R23 ; Extract out the Word ADDL R23,#1,R23 ; Increment the Word INSWL R23,R1,R23 ; Correctly position the Word MSKWL R24,R1,R24 ; Zero the spot for the Word BIS R23,R24,R23 ; Combine Original and New word STQ_C R23,(R28) ; Conditionally store result BEQ fail ; Branch ahead on failure . . . fail: BR Retry

Note that the first BIC instruction uses #^B0110, not #^B0111. This is to ensure that the word does not cross a quadword boundary, which would result in an incomplete memory update. If the address in R1 is not pointing to an aligned word, bit 0 will be set and the bit will not be cleared by the BIC instruction. The Load Quadword Locked instruction (LDQ_L) will then generate a fatal reserved operand fault.

An INCB instruction uses #^B0111 to generate the aligned address since all bytes are aligned.

2.10.6 Interlocked Instructions and Atomicity

The compiler's methods of preserving atomicity have an interesting side effect in compiled VAX MACRO code. On VAX systems, only the interlocked instructions will work correctly to synchronize access to shared data in multiprocessor systems. On Alpha multiprocessing systems, the code resulting from a compilation of modify instructions (with atomicity preserved) and interlocked instructions would both work correctly, because the LDx_L and STx_C which the compiler generates for both sets of instructions operate correctly across multiple processors.

Because this compiler side effect is specific to Alpha systems and does not port back to VAX systems, you should avoid relying on it when porting VAX MACRO code to Alpha if you intend to run the code on both systems.

However, interlocked instructions must still be used if the memory modification is being used as an interlock for other instructions for which atomicity is not preserved. This is because the Alpha architecture does not guarantee strict write ordering. For example, consider the following VAX MACRO code sequence:

.PRESERVE ATOMICITY INCL (R1) .NOPRESERVE ATOMICITY MOVL (R2),R3

This code sequence will generate the following Alpha code sequence:

Retry: LDL_L R28,(R1) ADDL R28,#1,R28 STL_C R28,(R1) BEQ R28, fail LDL R3, (R2) . . . fail: BR Retry

Because of the data prefetching of the Alpha architecture, the data from (R2) may be read before the store to (R1) is processed. If the INCL (R1) instruction is being used as a lock to prevent the data at (R2) from being accessed before the lock is set, the read of (R2) may occur before the increment of (R1) and thus is not protected.

The VAX interlocked instructions generate Alpha MB (memory barrier) instructions before and after the interlocked sequence. This prevents memory loads from being moved across the interlocked instruction.

For example, consider the following code sequence:

ADAWI #1,(R1) MOVL (R2),R3

This code sequence will generate the following Alpha code sequence:

MB Retry: LDL_L R28,(R1) ADDL R28,#1,R28 STL_C R28,(R1) BEQ R28, Fail MB LDL R3, (R2) . . . Fail: BR Retry

The MB instructions cause all memory operations before the MB instruction to complete before any memory operations after the MB instruction are allowed to begin.

2.11 Compiling and Linking

The compiler requires the following files, one for compiling, the other for linking:

File Description

SYS$LIBRARY:STARLET.MLB Macro library that defines the compiler directives.

SYS$LIBRARY:STARLET.OLB Object library containing emulation routines and other routines used by the compiler.

File	Description
SYS$LIBRARY:STARLET.MLB	Macro library that defines the compiler directives.
SYS$LIBRARY:STARLET.OLB	Object library containing emulation routines and other routines used by the compiler.

When you compile your code, the compiler automatically checks STARLET.MLB for definitions of compiler directives. Similarly, when you link your code, the linker links against STARLET.OLB to resolve undefined symbols.

The following is an example of a command procedure used to compile the MACRO-32 module [SYS]SYSSNDJBC.MAR:

$ SET DEFAULT WORK1:[PEAK.A.PORT] $ MACRO/MIGRATION/LIS=LIS$SYSSNDJBC-ALPHA.LIS - ALPHA$LIBRARY:STARLET/LIB+ - ALPHA$LIBRARY:LIB/LIB+ - ALPHA$LIBRARY:ARCH_DEFS.MAR+ - SRC$SYSSNDJBC.MAR $ MACRO/NOOBJECT/LIS=LIS$:SYSSNDJBC-VAX - VAX$LIBRARY:STARLET/LIB+ - VAX$LIBRARY:LIB/LIB+ - VAX$LIBRARY:ARCH_DEFS.MAR+ - SRC$SYSSNDJBC.MAR $ EXIT

Not all modules need both libraries and many modules need component-specific libraries, but this example shows the basic approach to using the compiler.

Note

Compaq recommends that you use the latest version of the compiler. Make sure to use the version of SYS$LIBRARY:STARLET.MLB that ships with the compiler and make sure that the logical name points to the correct directory. Note that SYS$LIBRARY:STARLET.MLB is equivalent to ALPHA$LIBRARY:STARLET.MLB.

2.11.1 Line Numbering in Listing File

The macro expansion line numbering scheme in the listing file is Xnn/mmm, where Xnn shows the nesting depth and mmm is the line number relative to the outermost macro, as shown in the following example.

.MAIN. Source Listing 9-SEP-1996 11:36:03 AMAC V3.0-20-311D 20-JUL-1992 11:05:38 X6AJ_RESD$:[SYSLIB]ARCH_DEFS.MAR;1 00000000 1 ; 00000000 2 ; This is the ALPHA (previously called "EVAX") version of ARCH_DEFS.MAR, 00000000 3 ; which contains architectural definitions for compiling VMS sources 00000000 4 ; for VAX and ALPHA systems. 00000000 5 ; 00000001 00000000 6 EVAX = 1 00000001 00000000 7 ALPHA = 1 00000001 00000000 8 BIGPAGE = 1 00000020 00000000 9 ADDRESSBITS = 32 00000000 10 .macro test1 00000000 11 clrl r1 00000000 12 clrl r2 00000000 13 tstl 48(sp) ; generate uplevel stack error 00000000 14 clrl r3 00000000 15 .endm 00000000 16 .macro test2 00000000 17 clrl r4 00000000 18 clrl r5 00000000 19 test1 00000000 20 clrl r6 00000000 21 .endm 00000000 22 00000000 23 foo: .jsb_entry . . . 00000000 44 clrl r0 00000011 45 test2 1....... %AMAC-E-UPLEVSTK, (1) up-level stack reference in routine FOO X01/001 00000002 clrl r4 X01/002 00000004 clrl r5 X01/003 00000006 test1 X02/004 00000006 clrl r1 X02/005 00000008 clrl r2 X02/006 0000000A tstl 48(sp) ; generate uplevel stack error X02/007 0000000D clrl r3 X02/008 0000000F X01/009 0000000F clrl r6 X01/010 00000011 00000011 46 rsb 00000012 47 .end

2.11.2 Linking an Object Module

To link the object files produced by the compiler, use the following commands as a basis:

$ @ALPHA$TOOLS:LINK ! Set up DCL and Logical to EXE $ LINK/ALPHA image_name,object1,object2,...

For certain VAX instructions (such as the divide instructions and others described in this manual), the compiler produces object code that issues a call to the OpenVMS General-Purpose Run-Time Library (OTS$ RTL). By default, the linker links against the library that that contains these routines.

2.12 Debugging

The compiler provides full debugger support. The debug session for compiled VAX MACRO code is similar to that for assembled VAX MACRO code. However, there are some important differences that are described in this section. For a complete description of debugging, see the OpenVMS Debugger Manual.

2.12.1 Code Relocation

One major difference is that the code is compiled rather than assembled. On a VAX system, each VAX MACRO instruction is a single machine instruction. On an Alpha system, each VAX MACRO instruction may be compiled into many Alpha machine instructions. A major side effect of this difference is the relocation and rescheduling of code if you do not specify /NOOPTIMIZE in your compile command.

By default, several optimizations are performed that cause the movement of generated code across source boundaries (see Section 1.2, Section 4.3, and Appendix A). For most code modules, debugging is simplified if you compile with /NOOPTIMIZE, which prevents this relocation from happening. After you have debugged your code, you can recompile without /NOOPTIMIZE to improve performance.

2.12.2 Symbolic Variables for Routine Arguments

Another major difference between debugging compiled code and debugging assembled code is a new concept to VAX MACRO, the definition of symbolic variables for examining routine arguments. On VAX systems, when you are debugging a routine and want to examine the arguments, you typically do something like the following:

DBG> EXAMINE @AP ; to see the argument count DBG> EXAMINE @AP+4 ; to examine the first arg

DBG> EXAMINE @AP ; to see arg count DBG> EXAMINE .+4:.+20 ; to see first 5 args

On Alpha systems, the arguments do not reside in a vector in memory as they do on VAX systems. Furthermore, there is no AP register on Alpha systems. If you type EXAMINE @AP when debugging VAX MACRO compiled code, the debugger reports that AP is an undefined symbol.

In the compiled code, the arguments can reside in some combination of:

Registers
On the stack above the routine's stack frame
In the stack frame, if the argument list was homed (see Section 2.3) or if there are calls out of the routine that would require the register arguments to be saved

The compiler does not require that you figure out where the arguments are by reading the generated code. Instead, it provides $ARGn symbols that point to the correct argument locations. The $ARG0 symbol is the same as @AP+0 is on VAX systems, that is, the argument count. The $ARG1 symbol is the first argument, $ARG2 is the second argument, and so forth. These symbols are defined in CALL_ENTRY and JSB_ENTRY directives, but not in EXCEPTION_ENTRY directives.

2.12.3 Locating Arguments Without $ARGn Symbols

There may be additional arguments in your code for which the compiler did not generate a $ARGn symbol. The number of $ARGn symbols defined for a .CALL_ENTRY routine is the maximum number detected by the compiler (either by automatic detection or as specified by MAX_ARGS) or 16, whichever is less. For a .JSB_ENTRY routine, since the arguments are homed in the caller's stack frame and the compiler cannot detect the actual number, it always creates eight $ARGn symbols.

In most cases, you can easily find any additional arguments, but in some cases you cannot.

Contents

Index

privacy and legal statement

5601PRO_004.HTML