Updated: 11 December 1998 |
Porting VAX MACRO Code to OpenVMS Alpha
Previous | Contents | Index |
Locking Code Written in Other Languages
Code written in other programming languages can also be locked down by using the $LOCK_PAGE_INIT macro in a VAX MACRO module. Any code in any module written in any language will be locked by this macro if the psect $LOCK_PAGE_2 is used for the generated code and the psect $LOCK_LINKAGE_2 is used for the generated linkage section.
For on-the-fly lockdown, $LOCK_PAGE and $UNLOCK_PAGE, respectively, mark the beginning and end of a section of code to be locked. The marked code becomes a separate routine in the locked psect, where all code locked anywhere in the image is placed.
$LOCK_PAGE locks the pages and linkage section of the locked routine into the working set and JSRs to it. This macro is placed inline in executable code. All code between this macro and the matching $UNLOCK_PAGE macro is included in the locked routine and is locked down.
$UNLOCK_PAGE returns from the locked routine and then unlocks the pages and linkage section from the working set. The macro is placed inline in executable code at some point after a $LOCK_PAGE macro.
$LOCK_PAGE and $UNLOCK_PAGE both have an optional parameter, ERROR, which is an error address to which to branch if the $LKWSET or $ULWSET calls fail. $UNLOCK_PAGE has a second optional parameter, LINK_SECT. LINK_SECT is a linkage psect to which to return if the linkage psect in effect when the $LOCK_PAGE macro was executed was not the default linkage psect, $LINKAGE.
All registers are preserved by both macros unless the error address parameter is present and one of the calls fail, in which case R0 reflects the status of the failed call. R1 then contains 0 if the call to lock or unlock the code failed, and 1 if that call succeeded but the call to lock or unlock the linkage section failed.
Control must enter the code through the $LOCK_PAGE macro, and must leave through the $UNLOCK_PAGE macro. The local symbol block that is in effect when the $LOCK_PAGE macro is executed is restored when the $UNLOCK_PAGE macro is executed, but since the locked code becomes a separate routine, the locked code itself is a separate local symbol block. Even if named symbols are used, branches into or out of the locked code section are not allowed, and will be flagged by the compiler with the following error:
%AMAC-E-MULTLKSEC, Routines which share code must use the same linkage psect. |
Note that since the locked code is made into a separate routine, any references to local stack storage within the routine will have to be changed, as the stack context is no longer the same.
Because on-the-fly lockdown requires the overhead of four system service calls plus an extra subroutine call every time it is executed, it is recommended that this be changed to initialization-time lockdown if the lockdown is done for any performance-critical code. If other routines in the image use initialization-time lockdown, then you must change the on-the-fly lockdown to initialization-time lockdown. |
Table 3-2 shows the code changes required to use these macros for on-the-fly lockdown. Note that the $UNLOCK_PAGE macro precedes the RSB, so that it is executed. Any status being passed by the routine in R0 and R1 remains intact because $UNLOCK_PAGE preserves these registers.
Code Section | On VAX Systems | On Alpha Systems |
---|---|---|
Main code |
Routine_A: |
Routine_A: |
Table 3-3 shows the same original code and the changes necessary for initialization-time lockdown.
Code Section | On VAX Systems | On Alpha Systems |
---|---|---|
Initialization |
Nothing. |
$LOCK_PAGE_INIT |
Main code |
Routine_A: |
$LOCKED_PAGE_START |
The following statements and recommendations regarding synchronization are relevant to the porting of code from VAX systems to Alpha systems:
This synchronization guarantee is only true for multiprocessing systems. The uniprocessing version of spin locks does not use interlocked instructions. As a result, memory barriers are not provided in uniprocessor spin lock, mutex, and lock manager synchronization. |
ADDL3 R1, R2, 4(R3) ; Save total EVAX_TRAPB ; Insure any arithmetic traps handled by ; existing handler MOVAB HANDLER2, 0(FP) ; Enable new condition handler |
This chapter describes how you can improve the performance of your ported code. The topics described in this chapter follow:
An unaligned data reference will work but will be slow on OpenVMS Alpha, because the system must take an unaligned address fault to complete the unaligned reference. If it is known that a data reference is unaligned, the compiler can generate unaligned quadword loads and masks to manually extract the data. This is slower than an aligned load but much faster than taking an alignment fault. Global data labels that are not longword or quadword aligned are flagged with information-level messages.
In addition, unaligned memory modification references cannot
be made atomic with /PRESERVE=ATOMICITY or .PRESERVE ATOMICITY. If this
is attempted, it will cause a fatal reserved operand fault.
4.1.1 Alignment Assumptions
By default, the compiler assumes the following:
Every time a register is changed, the compiler determines whether the base address in the register is still aligned. If the register and specified offset result in an aligned address, the compiler uses an aligned load or store for a memory reference. The compiler attempts to track register usage in terms of whether the base address remains aligned. When a stored memory address is loaded, for instance, MOVL 4(R1),R0, or used indirectly for instance, MOVL@4(R1),R0, the compiler assumes the resulting address is aligned.
For quadword memory references such as MOVQ instructions, the compiler assumes the base address is quadword aligned, unless it has determined by means of its register tracking code that the address may not be longword aligned. In other words, quadword register alignment is not tracked---only longword alignment.
Quadword references in Alpha built-ins, such as those in the following example, will be in new code, where alignment should be correct. Therefore all memory references in the following example will use aligned quadword load/stores:
EVAX_LDQ R1, (R2) EVAX_ADDQ (R1), #1, (R3) |
If an Alpha built-in (other than EVAX_LDQU or EVAX_STQU) is used on an
address that is not quadword aligned, an alignment fault will occur at
run time.
4.1.2 Directives and Qualifier for Changing Alignment Assumptions
The compiler provides two directives and one qualifier for changing the compiler's alignment assumptions. Both directives enable the compiler to produce more efficient code. The .SET_REGISTERS directive allows you to specify whether a register is aligned or unaligned. This directive should be used when the result of an operation is the reverse of what the compiler expects. It also allows you to declare registers that the compiler would not otherwise detect as input or output registers.
The .SYMBOL_ALIGNMENT directive allows you to specify the alignment of any memory reference that uses a symbolic offset. This directive should be used when you know the data will be aligned for every use of the symbolic offset.
These directives are described in detail in Appendix B. The examples in each description show how to use them.
The /UNALIGN qualifier to the MACRO/MIGRATION command tells the compiler to assume unaligned all the time for all register-based memory references rather than try to track the alignment. This does not affect stack-based or static references where the compiler knows the alignment.
This qualifier is described in detail in Appendix A.
4.1.3 Precedence of Alignment Controls
The order of precedence of the compiler's alignment controls, from strongest (.SYMBOL_ALIGNMENT) to weakest (built-in assumptions and tracking mechanisms), follows:
The following recommendations are provided for aligning data:
The Alpha architecture is pipelined, which means that before completing the current instruction, it starts to execute several instructions beyond it. By tailoring the code to keep the pipeline filled, you can make the code run significantly faster.
On each conditional branch, the Alpha architecture attempts to predict whether or not the branch is taken so that it can correctly fill the instruction pipeline with the next instruction to be executed. The architecture predicts that forward conditional branches will not be taken and backward conditional branches will be taken. A mispredicted branch costs extra time because the pipeline must be flushed, and, in addition, the instruction at the branch destination may not be in the instruction cache.
The compiler tries to follow the flow of the VAX MACRO code to generate Alpha code that has the most common code path in a contiguous block, to allow the pipelined Alpha architecture to process the code with the greatest efficiency. However, in some situations, the compiler's default rules do not generate the most efficient code. In performance sensitive code sections, you can often improve the efficiency of the generated code by giving the compiler information about which code paths will most likely be taken.
4.2.1 Default Code Flow and Branch Prediction
Generally, the compiler generates Alpha code that follows unconditional
VAX MACRO branches and falls through conditional VAX MACRO branches
unless it is directed otherwise. For example, consider the following
VAX MACRO code sequence:
(Code block A) BLBS R0,10$ (Code block B) 10$: (Code block C) BRB 30$ 20$: . (Code block D) . 30$: (Code block E) |
The Alpha code generated for this sequence looks like the following:
(Code block A) BLBS R0,10$ (Code block B) 10$: (Code block C) 30$: (Code block E) |
Note that the compiler fell through the BLBS instruction, continuing with the instructions immediately following the BLBS. At the BRB instruction, it did not generate a branch instruction at all but followed the Alpha code generated from Code block C with the Alpha code generated from Code block E, at the branch destination. Code from Code block D at label 20$ will be generated at a later point in the routine. If there is no branch to the label 20$, the compiler will report the following informational message and will not generate Alpha code for Code block D:
UNRCHCODE, unreachable code |
In most cases, this algorithm produces Alpha code that matches the assumptions of the architecture:
However, because the compiler follows unconditional branches, the destination of a backward VAX MACRO branch may not have been generated yet. In this case, a conditional branch that was backward in the VAX MACRO source code may become a forward branch in the generated Alpha code. See Section 4.2.5 for a further discussion and resolution of this problem.
There are some cases where the compiler may assume that a forward branch is taken. For example, consider the following common VAX MACRO coding practice:
JSB XYZ ;Call a routine BLBS R0,10$ ;Branch to continue on success BRW ERROR ;Destination too far for byte offset 10$: |
In this case, and any case where the inline code following the branch is only a few lines and does not rejoin the code flow at the branch destination, the forward branch is considered taken. This eliminates the delay that occurs on Alpha systems for a mispredicted branch. The compiler will automatically change the sense of the branch, and will move the code between the branch and the label out of line to a point beyond the normal exit of the routine. For this example it would generate the following code:
JSR XYZ BLBC $L1 10$: . . . (routine exit) $L1: BRW ERROR |
The compiler provides two directives, .BRANCH_LIKELY and .BRANCH_UNLIKELY, to change its assumptions about branch prediction. The directive .BRANCH_LIKELY is for use with forward conditional branches when the probability of the branch is large, say 75 percent or more. The directive .BRANCH_UNLIKELY is for use with backward conditional branches when the probability of the branch is less than 25 percent.
These directives should only be used in performance-sensitive code. Furthermore, you should be more cautious when adding .BRANCH_UNLIKELY, because it introduces an additional branch indirection for the case when the branch is actually taken. That is, the branch is changed to a forward branch to a branch instruction, which in turn branches to the original branch target.
There is no directive to tell the compiler not to follow an unconditional branch. However, if you want the compiler to generate code that does not follow the branch, you can change the unconditional branch to be a conditional branch that you know will always be taken. For example, if you know that in the current code section R3 always contains the address of a data structure, you could change a BRB instruction to a TSTL R3 followed by a BNEQ instruction. This branch will always be taken, but the compiler will fall through and continue code generation with the next instruction. This will always cause a mispredicted branch when executed, but may be useful in some situations.
Previous | Next | Contents | Index |
Copyright © Compaq Computer Corporation 1998. All rights reserved. Legal |
5601PRO_007.HTML
|