Loop unrolling is a standard manual optimization that creates larger
loops by replication of the original loop body. Loop unrolling
is done automatically by KAP to speed up some loops by reducing
the number of times the loop control overhead is encountered.
Inner loop unrolling is controlled by the /unroll and /unroll2
qualifiers. Outer loop unrolling is part of memory management and is
controlled by the /roundoff and /scalaropt qualifiers.
Unrolling a loop involves duplicating the loop body one or more times within the loop, adding an increment or changing the increment that was already in the loop, and possibly inserting cleanup code to execute any left-over iterations of the loop (in the first example below, a cleanup loop that follows the unrolled loop). If the loop bounds are constant and the iteration count of the loop is small, the loop may be removed entirely and replaced by copies of the loop body.
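The effect on loop control overhead can be estimated with simple arithmetic. The following is a minimal sketch, not KAP output: the names `unroll_split`, `split_iterations`, and `control_trips_after_unrolling` are hypothetical, and the code assumes an iteration count of at least 0 and an unrolling factor of at least 1.

```c
/* Hypothetical helper type: how an iteration count is divided between
   the unrolled main loop and the cleanup code. */
struct unroll_split {
    int main_trips;     /* passes through the unrolled loop body */
    int cleanup_iters;  /* left-over iterations executed one at a time */
};

/* Split `iters` original iterations for a given unrolling factor.
   Assumes iters >= 0 and factor >= 1. */
struct unroll_split split_iterations(int iters, int factor)
{
    struct unroll_split s;
    s.main_trips = iters / factor;
    s.cleanup_iters = iters % factor;
    return s;
}

/* Number of times the loop control (increment and test) is executed
   for body work after unrolling: once per main-loop pass plus once
   per cleanup iteration. */
int control_trips_after_unrolling(int iters, int factor)
{
    struct unroll_split s = split_iterations(iters, factor);
    return s.main_trips + s.cleanup_iters;
}
```

For a loop of 99 iterations, a factor of 8 gives 12 passes through the unrolled body plus 3 cleanup iterations, so the loop control is encountered 15 times instead of 99. A factor of 9 gives 11 passes and no cleanup iterations.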
If the loop bounds are constant, KAP may use an unrolling factor near, but above, the unroll value if that will exactly divide the loop iteration count.
The /scalaropt command qualifier must be set to at least 2 to enable loop unrolling.
The following examples were run with /unroll=8 and /unroll2=1000. See Chapter 4 for more information about these command qualifiers.
If the loop bounds are unknown at compilation time, a loop may be unrolled, as shown in the following example:
for (i=1; i<n ; i++) a[i] = b[i]/a[i-1] ;
Becomes:
for ( i = 1; i<=n - 8; i+=8 ) {
    a[i] = b[i] / a[i-1];
    a[i+1] = b[i+1] / a[i];
    a[i+2] = b[i+2] / a[i+1];
    a[i+3] = b[i+3] / a[i+2];
    a[i+4] = b[i+4] / a[i+3];
    a[i+5] = b[i+5] / a[i+4];
    a[i+6] = b[i+6] / a[i+5];
    a[i+7] = b[i+7] / a[i+6];
}
for ( ; i<n; i++ ) {
    a[i] = b[i] / a[i-1];
}
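One way to gain confidence that the unrolled form with its cleanup loop behaves like the original is to run both on the same data and compare the results. The sketch below does this by hand. The function names, the `MAX_N` bound, and the test data are assumptions made for illustration and are not part of KAP.

```c
#define MAX_N 64  /* hypothetical array size used only by the checker */

/* Original loop: each iteration depends on the previous one. */
void recur_rolled(double *a, const double *b, int n)
{
    int i;
    for (i = 1; i < n; i++)
        a[i] = b[i] / a[i-1];
}

/* Hand-written copy of the unrolled form (factor 8) plus cleanup loop. */
void recur_unrolled8(double *a, const double *b, int n)
{
    int i;
    for (i = 1; i <= n - 8; i += 8) {
        a[i]   = b[i]   / a[i-1];
        a[i+1] = b[i+1] / a[i];
        a[i+2] = b[i+2] / a[i+1];
        a[i+3] = b[i+3] / a[i+2];
        a[i+4] = b[i+4] / a[i+3];
        a[i+5] = b[i+5] / a[i+4];
        a[i+6] = b[i+6] / a[i+5];
        a[i+7] = b[i+7] / a[i+6];
    }
    for ( ; i < n; i++)   /* left-over iterations */
        a[i] = b[i] / a[i-1];
}

/* Returns 1 if both versions produce the same array for n elements,
   0 otherwise (including when n is outside 0..MAX_N). */
int unrolled_matches_rolled(int n)
{
    double a1[MAX_N], a2[MAX_N], b[MAX_N];
    int i;

    if (n < 0 || n > MAX_N)
        return 0;
    for (i = 0; i < MAX_N; i++) {
        b[i] = (double)(i + 2);   /* nonzero, so no division by zero */
        a1[i] = a2[i] = 1.0;
    }
    recur_rolled(a1, b, n);
    recur_unrolled8(a2, b, n);
    /* Both versions perform the same divisions in the same order,
       so the results are expected to be identical. */
    for (i = 0; i < MAX_N; i++)
        if (a1[i] != a2[i])
            return 0;
    return 1;
}
```

Calling `unrolled_matches_rolled` with values of `n` on either side of a multiple of 8 (for example 8, 9, and 10) exercises both the case where the cleanup loop does all the work and the case where it does none.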
If the loop bounds are constant, the unrolled loop may look like the following example. Notice that KAP has deviated slightly from the unroll value (using 9 rather than 8) so that the iteration count, 99, is an exact multiple of the unrolling factor, thereby eliminating the need for a cleanup loop:
for (i=1; i<100; i++) a[i] = b[i]/a[i-1] ;
Becomes:
for ( i = 1; i<=91; i+=9 ) {
    a[i] = b[i] / a[i-1];
    a[i+1] = b[i+1] / a[i];
    a[i+2] = b[i+2] / a[i+1];
    a[i+3] = b[i+3] / a[i+2];
    a[i+4] = b[i+4] / a[i+3];
    a[i+5] = b[i+5] / a[i+4];
    a[i+6] = b[i+6] / a[i+5];
    a[i+7] = b[i+7] / a[i+6];
    a[i+8] = b[i+8] / a[i+7];
}
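The choice of 9 in the constant-bound example above can be illustrated with a small search for a factor that divides the iteration count. This is only a sketch of one plausible heuristic, not KAP's documented algorithm. The function name `pick_unroll_factor` and the search window (from the unroll value up to, but not including, twice that value) are assumptions.

```c
/* Hypothetical heuristic: starting at the requested unroll value, look
   for the nearest factor at or above it that divides the iteration
   count exactly. The upper limit of the search (2 * unroll) is an
   assumption made for this sketch. Falls back to the requested value
   when no exact divisor is found, in which case cleanup code would
   still be needed. Assumes iters >= 0 and unroll >= 1. */
int pick_unroll_factor(int iters, int unroll)
{
    int f;
    for (f = unroll; f < 2 * unroll; f++)
        if (iters % f == 0)
            return f;
    return unroll;
}
```

With 99 iterations and an unroll value of 8, this search skips 8 (99 is not a multiple of 8) and returns 9, which matches the example. With 97 iterations, no candidate divides evenly and the requested value of 8 is kept.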
Or, if the loop iteration count is constant and small, the loop control may be removed altogether, as shown in the following example:
for (i=1; i<5 ; i++) a[i] = b[i]/a[i-1] ;
Becomes:
a[1] = b[1] / a[0];
a[2] = b[2] / a[1];
a[3] = b[3] / a[2];
a[4] = b[4] / a[3];