The /unroll, /unroll2, and /unroll3 qualifiers control how KAP unrolls
scalar inner loops. Loop execution is often more efficient when the
loops are unrolled. Fewer iterations with more work per iteration will
require less loop-control overhead. KAP unrolls the loop until either
the loop has been unrolled the number of times given in the /unroll
qualifier, or the amount of "work" in each iteration reaches the value
given by the /unroll2 qualifier.
When you use kapf with the Digital Fortran compiler optimization
qualifier set to /O5, you should turn off loop unrolling by setting
/unroll=1.
Outer loop unrolling is a part of memory management and is not controlled by these qualifiers.
The /scalaropt=2 level is required to enable loop unrolling.
The syntax for /unroll and /unroll2 is as follows:
    /unroll=<#it>
or
    /unroll2=<weight>
The abbreviated forms are /u=<#it>, /ur=<#it>, and /ur2=<weight>,
where <#it> is the maximum number of iterations to unroll.
Other settings are as follows:
    =0        - use the default value.
    =1        - no unrolling.
    <weight>  - the maximum weight in an unrolled loop. The <weight>
                setting is estimated by counting operands and
                operators in a loop.
There are two ways to control loop unrolling. The first is to set the maximum number of iterations that can be unrolled; the second is to set the maximum amount of work to be done in an unrolled iteration. KAP will unroll as many iterations as possible while keeping within both these limits, up to a maximum of 100 iterations. NO warning is given if you request more than 100 unrolled iterations.
The default (4,100) means that the maximum number of iterations to unroll is 4 and that the maximum amount of work is 100.
Loop overhead is reduced by performing more iterations from the
original loop for each pass through the new loop, but the gain
is less with each additional unrolled iteration. Eventually, the
cost in extra memory exceeds the gain from unrolling. The /unroll
qualifier sets a maximum number of iterations to unroll.
When the iteration count is known, KAP looks for an unrolling factor
that is close to the /unroll value and which exactly divides the
iteration count. This avoids having extra iterations left over, which
must be handled separately and generate extra code. The range over
which KAP searches for an exact divisor is the /unroll value plus or
minus 25%.
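The exact-divisor search can be illustrated with a short Python model. This is only a sketch of the idea, not KAP's actual algorithm: the function name pick_unroll_factor, the rounding of the 25% window to whole numbers, the preference for the candidate closest to the /unroll value, the tie-break toward the larger factor, and the fallback to the /unroll value itself are all assumptions made for illustration.

```python
def pick_unroll_factor(trip_count, unroll):
    """Pick a factor within unroll +/- 25% that exactly divides trip_count.

    Hypothetical model: the window is rounded inward to whole numbers,
    the divisor closest to `unroll` wins, ties go to the larger factor,
    and `unroll` is returned unchanged when no exact divisor exists.
    """
    low = max(2, (3 * unroll + 3) // 4)   # ceil(0.75 * unroll), at least 2
    high = (5 * unroll) // 4              # floor(1.25 * unroll)
    candidates = [f for f in range(low, high + 1) if trip_count % f == 0]
    if not candidates:
        # No exact divisor in the window: leftover iterations would need
        # separate handling (assumed fallback).
        return unroll
    # Closest to the requested value first; larger factor breaks ties.
    return min(candidates, key=lambda f: (abs(f - unroll), -f))
```

With unroll=8 the search window is 6 through 10, so under these assumptions a 100-iteration loop would get a factor of 10 and a 49-iteration loop a factor of 7, both leaving no iterations over.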
To use the "work per unrolled iteration" limit, KAP analyzes a given loop by computing an estimate of the computational work that is inside the loop for ONE iteration. This rough estimate is based on the following criteria:
For the following example, the user has specified 8 for the maximum
number of iterations to unroll (/unroll=8) and 100 for the maximum
"work per unrolled iteration" (/unroll2=100):
      DO 10 I = 2,N
         A(I) = B(I)/A(I-1)
   10 CONTINUE
This example has:
    1 assignment
    0 ifs
    3 subscripts
    2 arithmetic operators
    ---------
    6 is the weighted sum (the work for 1 iteration)
This weighted sum is then divided into 100 to give a potential unrolling factor of 16 (100/6, rounded down). However, because the user has also specified 8 for the maximum number of unrolled iterations, KAP takes the minimum of 8 and 16. Therefore, KAP will unroll only 8 iterations. The maximum number of iterations that KAP will unroll is 100. If the user requests more than that, NO warning will be given.
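The arithmetic in this example can be reproduced with a small Python model. It is a sketch under stated assumptions rather than KAP's implementation: the names loop_weight and unroll_factor are hypothetical, every counted item is given a weight of 1 (the weight of an IF cannot be inferred from the example, where its count is zero), and a result below 1 is treated as "no unrolling".

```python
KAP_MAX_UNROLL = 100   # ceiling on unrolled iterations stated in the text
DEFAULT_UNROLL = 4     # default /unroll value
DEFAULT_UNROLL2 = 100  # default /unroll2 value


def loop_weight(assignments, ifs, subscripts, arithmetic_ops):
    """Work estimate for ONE iteration, assuming each item counts as 1."""
    return assignments + ifs + subscripts + arithmetic_ops


def unroll_factor(weight, unroll=DEFAULT_UNROLL, unroll2=DEFAULT_UNROLL2):
    """Model of the two limits: iteration count and work per iteration."""
    if weight <= 0:
        raise ValueError("weight must be positive")
    if unroll == 0:            # =0 means "use the default value"
        unroll = DEFAULT_UNROLL
    if unroll2 == 0:
        unroll2 = DEFAULT_UNROLL2
    by_work = unroll2 // weight            # e.g. 100 // 6 == 16
    factor = min(unroll, by_work, KAP_MAX_UNROLL)
    return max(1, factor)                  # assumed: 1 means no unrolling


weight = loop_weight(assignments=1, ifs=0, subscripts=3, arithmetic_ops=2)
factor = unroll_factor(weight, unroll=8, unroll2=100)
```

Here weight is 6 and factor is 8, matching the text. Raising the iteration limit past 16 would leave the work limit as the binding constraint, and any request above 100 is silently capped.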
In this example the number of iterations is unknown at compile time, so KAP generates two loops: the primary unrolled loop, whose iteration count is always a multiple of the unrolling factor, and a cleanup loop that executes any remaining iterations. The result is the following:
      DO 11 I=2,N-7,8
         A(I) = B(I) / A(I-1)
         A(I+1) = B(I+1) / A(I)
         A(I+2) = B(I+2) / A(I+1)
         A(I+3) = B(I+3) / A(I+2)
         A(I+4) = B(I+4) / A(I+3)
         A(I+5) = B(I+5) / A(I+4)
         A(I+6) = B(I+6) / A(I+5)
         A(I+7) = B(I+7) / A(I+6)
   11 CONTINUE
      DO 2 I=I,N,1
         A(I) = B(I) / A(I-1)
    2 CONTINUE
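To see why the main loop plus the cleanup loop computes the same values as the original loop, the two versions can be modeled in Python and compared. This is an illustrative sketch, not KAP output: the function names are hypothetical, index 0 of each list is unused padding so that indices match the 1-based Fortran arrays, and a short inner loop stands in for the eight written-out statements.

```python
def original_loop(a, b, n):
    """DO 10 I = 2,N : A(I) = B(I)/A(I-1)"""
    a = list(a)
    for i in range(2, n + 1):
        a[i] = b[i] / a[i - 1]
    return a


def unrolled_loop(a, b, n, factor=8):
    """Main loop in steps of `factor`, then a cleanup loop for the rest."""
    a = list(a)
    i = 2
    # Main loop: DO 11 I=2,N-7,8 (each pass does `factor` iterations)
    while i <= n - (factor - 1):
        for k in range(factor):  # stands in for the unrolled statements
            a[i + k] = b[i + k] / a[i + k - 1]
        i += factor
    # Cleanup loop: DO 2 I=I,N,1 (resumes where the main loop stopped)
    while i <= n:
        a[i] = b[i] / a[i - 1]
        i += 1
    return a


n = 21
a = [0.0] + [1.0] * n
b = [0.0] + [float(i + 1) for i in range(1, n + 1)]
assert original_loop(a, b, n) == unrolled_loop(a, b, n)
```

For N = 21 the main loop runs twice (I = 2 and I = 10), covering 16 iterations, and the cleanup loop handles the remaining four (I = 18 through 21). Because both versions perform the same divisions in the same order, the results are identical.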
The /unroll3=n qualifier sets the lower limit for unrolling. If there
are fewer than n units of work in the loop (same units as /unroll2),
the loop will not be unrolled. The amount of work in each loop
iteration is shown in the loop table in the annotated listing. Leave
this qualifier at 20, the default.