4.5.21 /unroll, /ur, (/unroll=4), /unroll2, /ur2, (/unroll2=160), /unroll3, /ur3, (/unroll3=20)

The /unroll, /unroll2, and /unroll3 qualifiers control how KAP unrolls scalar inner loops. Unrolled loops often execute more efficiently: fewer iterations with more work per iteration require less loop-control overhead. KAP unrolls a loop until either the loop has been unrolled the number of times given by the /unroll qualifier, or the amount of "work" in each iteration reaches the value given by the /unroll2 qualifier.

Note: If you use kapf with the Digital Fortran compiler optimization qualifier set to /O5, you should turn off loop unrolling by setting /unroll=1.

Outer loop unrolling is a part of memory management and is not controlled by these qualifiers.

The /scalaropt=2 level is required to enable loop unrolling.

The syntax for /unroll and /unroll2 is as follows:

Long forms: /unroll=<#it> or /unroll2=<weight>
Short forms: /u=<#it>, /ur=<#it>, /ur2=<weight>

<#it> - the maximum number of iterations to unroll. Two values have a special meaning:
     =0 - use the default value.
     =1 - no unrolling.
<weight> - the maximum weight in an unrolled loop. The weight of a loop is estimated by counting its operands and operators.

There are two ways to control loop unrolling. The first is to set the maximum number of iterations that can be unrolled; the second is to set the maximum amount of work to be done in an unrolled iteration. KAP will unroll as many iterations as possible while keeping within both these limits, up to a maximum of 100 iterations. NO warning is given if you request more than 100 unrolled iterations.

The default (4,100) means that the maximum number of iterations to unroll is 4 and that the maximum amount of work is 100.

Loop overhead is reduced by performing more iterations from the original loop for each pass through the new loop, but the gain is less with each additional unrolled iteration. Eventually, the cost in extra memory exceeds the gain from unrolling. The /unroll qualifier sets a maximum number of iterations to unroll.

Note: When the total number of iterations to be executed by the loop (the iteration count) is constant, KAP searches for a number of iterations to unroll that is near the /unroll value and exactly divides the iteration count. This avoids leftover iterations, which must be handled separately and generate extra code. KAP searches for an exact divisor within the range of the /unroll value plus or minus 25%.
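The divisor search described in this note can be sketched in free-form Fortran as follows. This only illustrates the stated rule and is not KAP's actual algorithm: the function name pick_unroll, the integer rounding of the 25% window, the preference for the candidate closest to the /unroll value, and the fallback to the /unroll value when no exact divisor exists are all assumptions.

```fortran
program divisor_search
  implicit none

  ! The iteration counts are arbitrary illustrative values;
  ! 8 stands in for /unroll=8, giving a search window of 6..10.
  print '(a,i0)', 'count 100 -> ', pick_unroll(8, 100)   ! 10 (only divisor in 6..10)
  print '(a,i0)', 'count  96 -> ', pick_unroll(8, 96)    ! 8  (6 also divides, 8 is closer)
  print '(a,i0)', 'count  97 -> ', pick_unroll(8, 97)    ! 8  (no divisor, fallback)

contains

  ! Hypothetical helper: returns a divisor of trip_count that lies within
  ! 25% of unroll_max, or unroll_max itself if no such divisor exists.
  integer function pick_unroll(unroll_max, trip_count) result(factor)
    integer, intent(in) :: unroll_max, trip_count
    integer :: lo, hi, cand, best_dist

    ! Assumption: the 25% window is computed with integer division.
    lo = max(1, unroll_max - unroll_max / 4)
    hi = unroll_max + unroll_max / 4

    factor = unroll_max
    best_dist = huge(best_dist)

    do cand = lo, hi
      ! Keep the exact divisor closest to unroll_max (ties go to the smaller one).
      if (mod(trip_count, cand) == 0 .and. abs(cand - unroll_max) < best_dist) then
        factor = cand
        best_dist = abs(cand - unroll_max)
      end if
    end do
  end function pick_unroll

end program divisor_search
```

With an exact divisor, the unrolled loop covers every iteration and no separate cleanup code is needed.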

To use the "work per unrolled iteration" limit, KAP analyzes a given loop by computing an estimate of the computational work that is inside the loop for ONE iteration. This rough estimate is based on the following criteria:

# of assignments +
# of IF statements +
# of subscripts +
# of arithmetic operations

For the following example, the user has specified 8 for the maximum number of iterations to unroll (/unroll=8) and 100 for the maximum "work per unrolled iteration" (/unroll2=100):

DO 10 I = 2,N
     A(I) = B(I)/A(I-1)
10   CONTINUE

This example has:

     1 assignment
     0 ifs
     3 subscripts
     2 arithmetic operators
     ---------
     6 is the weighted sum (the work for 1 iteration)

Dividing 100 by this weighted sum gives a potential unrolling factor of 16 (100/6, truncated to an integer). However, because the user has also specified 8 for the maximum number of unrolled iterations, KAP takes the minimum of 8 and 16. Therefore, KAP will unroll only 8 iterations. As noted above, the maximum number of iterations that KAP will unroll is 100, and NO warning is given if the user requests more than that.
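The arithmetic in this example can be reproduced with a short free-form Fortran program. It is a sketch under assumptions: loop_work and unroll_factor are hypothetical names, the integer division mirrors the 100/6 = 16 step above, and the special /unroll=0 and /unroll=1 settings are not modeled.

```fortran
program unroll_factor_demo
  implicit none
  ! Ceiling of 100 unrolled iterations, as stated in the text.
  integer, parameter :: kap_hard_limit = 100
  integer :: w

  ! Counts for A(I) = B(I)/A(I-1): 1 assignment, 0 IFs, 3 subscripts, 2 operators.
  w = loop_work(1, 0, 3, 2)
  print '(a,i0)', 'work per iteration: ', w                        ! 6
  print '(a,i0)', 'unroll factor:      ', unroll_factor(8, 100, w) ! 8

contains

  ! Rough work estimate for ONE iteration: the sum of the four counts.
  integer function loop_work(n_assign, n_if, n_subscript, n_arith) result(total)
    integer, intent(in) :: n_assign, n_if, n_subscript, n_arith
    total = n_assign + n_if + n_subscript + n_arith
  end function loop_work

  ! Smallest of the /unroll limit, the /unroll2 limit divided by the work
  ! per iteration (integer division), and the 100-iteration ceiling.
  integer function unroll_factor(unroll_max, work_max, work) result(factor)
    integer, intent(in) :: unroll_max, work_max, work
    factor = min(unroll_max, work_max / work, kap_hard_limit)
    ! Assumption: a factor below 1 simply means the loop is left as written.
    factor = max(factor, 1)
  end function unroll_factor

end program unroll_factor_demo
```

Raising /unroll to 32 in this sketch would leave the /unroll2 limit as the binding one, giving a factor of 16.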

Because the number of iterations is unknown in this case, KAP generates two loops: the primary unrolled loop, which always executes a multiple of the unrolling factor, and a cleanup loop that handles any remaining iterations. The result is the following:

DO 11 I=2,N-7,8
     A(I) = B(I) / A(I-1)
     A(I+1) = B(I+1) / A(I)
     A(I+2) = B(I+2) / A(I+1)
     A(I+3) = B(I+3) / A(I+2)
     A(I+4) = B(I+4) / A(I+3)
     A(I+5) = B(I+5) / A(I+4)
     A(I+6) = B(I+6) / A(I+5)
     A(I+7) = B(I+7) / A(I+6)
11   CONTINUE
DO 2 I=I,N,1
     A(I) = B(I) / A(I-1)
2    CONTINUE
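One way to check that the unrolled loop and its cleanup loop together cover exactly the original iterations is to run both versions and compare the results. The program below is a hypothetical test harness in free-form Fortran, not KAP output: the array size n = 30, the input values, and the variable names are assumptions chosen for illustration.

```fortran
program unroll_check
  implicit none
  integer, parameter :: n = 30          ! illustrative size, not from the manual
  real :: a_ref(n), a_unr(n), b(n)
  integer :: i, first_left

  ! Arbitrary nonzero inputs so that no division by zero occurs.
  do i = 1, n
    b(i) = real(i + 1)
  end do
  a_ref(1) = 1.0
  a_unr(1) = 1.0

  ! Original loop.
  do i = 2, n
    a_ref(i) = b(i) / a_ref(i-1)
  end do

  ! Primary loop, unrolled 8 times.
  do i = 2, n - 7, 8
    a_unr(i)   = b(i)   / a_unr(i-1)
    a_unr(i+1) = b(i+1) / a_unr(i)
    a_unr(i+2) = b(i+2) / a_unr(i+1)
    a_unr(i+3) = b(i+3) / a_unr(i+2)
    a_unr(i+4) = b(i+4) / a_unr(i+3)
    a_unr(i+5) = b(i+5) / a_unr(i+4)
    a_unr(i+6) = b(i+6) / a_unr(i+5)
    a_unr(i+7) = b(i+7) / a_unr(i+6)
  end do

  ! Cleanup loop: i keeps the first index the primary loop did not process.
  first_left = i
  do i = first_left, n
    a_unr(i) = b(i) / a_unr(i-1)
  end do

  ! Both versions perform the same divisions in the same order.
  if (all(a_ref == a_unr)) then
    print '(a)', 'unrolled and original loops agree'
  else
    print '(a)', 'MISMATCH between unrolled and original loops'
  end if
end program unroll_check
```

For n = 30 the primary loop runs three times (I = 2, 10, 18) and the cleanup loop handles the remaining five iterations, I = 26 through 30.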

The /unroll3=n qualifier sets the lower limit for unrolling. If there are fewer than n units of work in the loop (the same units as /unroll2), the loop will not be unrolled. The amount of work in each loop iteration is shown in the loop table in the annotated listing. Leave this qualifier at 20, the default.
