The DO directive specifies that the iterations of the immediately following DO loop must be executed in parallel. It takes the following form:
c$OMP DO [clause[[,] clause] ... ]
   do_loop
[c$OMP END DO [NOWAIT]]
FIRSTPRIVATE: See Section 15.2.2.3.
LASTPRIVATE: See Section 15.2.2.4.
ORDERED: Must be used if ordered sections are contained in the dynamic extent of the DO directive. For more information about ordered sections, see the ORDERED directive described in Section 15.2.3.8.
PRIVATE: See Section 15.2.2.5.
REDUCTION: See Section 15.2.2.6.
SCHEDULE (type[, chunk]): Specifies how iterations of the DO loop are divided among the threads of the team. If specified, chunk must be a scalar integer expression. The following four types are permitted, three of which allow the optional chunk parameter:
Type | Effect |
---|---|
STATIC | Divides iterations into contiguous pieces by dividing the number of iterations by the number of threads in the team. Each piece is then dispatched to a thread before loop execution begins. If chunk is specified, iterations are instead divided into pieces of the size specified by chunk. The pieces are statically dispatched to threads in the team in a round-robin fashion in the order of the thread number. |
DYNAMIC | Breaks the iterations into pieces of the size specified by chunk, which defaults to 1 if omitted. As each thread finishes a piece of the iteration space, it dynamically gets the next set of iterations. |
GUIDED | Reduces the piece size exponentially with each succeeding dispatch. The chunk specifies the minimum number of iterations to dispatch each time and defaults to 1 if omitted. If fewer than chunk iterations remain, the rest are dispatched. |
RUNTIME [1] | Defers the scheduling decision until run time. You can choose a schedule type and chunk size at run time by using the environment variable OMP_SCHEDULE. |

[1] No chunk is permitted for this type.
If the SCHEDULE clause is not used, the default schedule type is
STATIC.
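
The round-robin dispatch described for STATIC with a chunk can be illustrated with a little arithmetic. The following is only a sketch in plain serial Fortran (it makes no OpenMP calls), and the values of N, CHUNK, and NTHREADS are hypothetical. It assumes threads are numbered from 0 and that piece k (counting from 0) goes to thread MOD(k, NTHREADS):

```fortran
      PROGRAM STATIC_CHUNK
! Illustration only: serial Fortran, no OpenMP calls.
! N, CHUNK and NTHREADS are hypothetical values for this sketch.
      INTEGER N, CHUNK, NTHREADS
      PARAMETER (N = 10, CHUNK = 2, NTHREADS = 3)
      INTEGER I, PIECE, OWNER
      DO I = 1, N
! Zero-based index of the piece that contains iteration I
         PIECE = (I - 1) / CHUNK
! Pieces are handed out round-robin in thread-number order
         OWNER = MOD(PIECE, NTHREADS)
         WRITE (*,*) 'iteration', I, ' -> thread', OWNER
      END DO
      END
```

With these values, iterations 1-2 map to thread 0, 3-4 to thread 1, 5-6 to thread 2, 7-8 to thread 0 again, and 9-10 to thread 1.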
The iterations of the DO loop are distributed across the existing
team of threads. The values of the loop control parameters of the
DO loop associated with a DO directive must be the same for all the
threads in the team.
You cannot branch out of a DO loop associated with a DO
directive.
If used, the END DO directive must appear immediately after the end
of the loop. If you do not specify an END DO directive, an END DO
directive is assumed at the end of the DO loop.
If you specify NOWAIT in the END DO directive, threads do not
synchronize at the end of the parallel loop. Threads that finish
early proceed straight to the instruction following the loop
without waiting for the other members of the team to finish the
DO directive.
Parallel DO loop control variables are block-level entities within
the DO loop. If the loop control variable also appears in the
LASTPRIVATE list of the parallel DO, it is copied out to a variable
of the same name in the enclosing PARALLEL region. The variable in
the enclosing PARALLEL region must be SHARED if it is specified in
the LASTPRIVATE list of a DO directive.
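
The copy-out behavior described above can be sketched as a small complete program. The array size and data are hypothetical. The sketch uses the `!$OMP` spelling of the directive sentinel, on the assumption that the compiler accepts it as an alternative to `c$OMP` (the OpenMP Fortran specification allows both in fixed source form), so that the same text also compiles as free source form:

```fortran
      PROGRAM LASTPRIV
! Hypothetical sizes and data, chosen only for illustration.
      INTEGER N
      PARAMETER (N = 8)
      INTEGER I
      REAL A(N), B(N), C(N)
      DO I = 1, N
         B(I) = REAL(I)
         C(I) = 1.0
      END DO
      I = 0
! I is SHARED in the PARALLEL region and LASTPRIVATE on the DO
!$OMP PARALLEL SHARED(A, B, C, I)
!$OMP DO LASTPRIVATE(I)
      DO I = 1, N
         A(I) = B(I) + C(I)
      END DO
!$OMP END DO
!$OMP END PARALLEL
! I now holds the value a sequential loop would leave: N+1
      WRITE (*,*) 'I after the loop =', I
      IF (I .NE. N + 1) STOP 1
      END
```

If the directives are ignored (for example, when compiling without OpenMP support), the program runs sequentially and leaves the same value in I.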
Only one SCHEDULE clause and one ORDERED clause can appear in a DO
directive.
DO directives must be encountered by all threads in a team or by
none at all. They must also be encountered in the same order by all
threads in a team.
In the following example, the loop iteration variable is private by
default, and it is not necessary to explicitly declare it. The END
DO directive is optional:

```fortran
c$OMP PARALLEL
c$OMP DO
! The loop starts at 2 so that A(I-1) stays within bounds
      DO I=2,N
         B(I) = (A(I) + A(I-1)) / 2.0
      END DO
c$OMP END DO
c$OMP END PARALLEL
```

If there are multiple independent loops within a parallel region,
you can use the NOWAIT clause to avoid the implied BARRIER at the
end of the DO directive, as follows:

```fortran
c$OMP PARALLEL
c$OMP DO
      DO I=2,N
         B(I) = (A(I) + A(I-1)) / 2.0
      END DO
c$OMP END DO NOWAIT
c$OMP DO
      DO I=1,M
         Y(I) = SQRT(Z(I))
      END DO
c$OMP END DO NOWAIT
c$OMP END PARALLEL
```

Correct execution sometimes depends on the value that the last
iteration of a loop assigns to a variable. Such programs must list
all such variables as arguments to a LASTPRIVATE clause so that the
values of the variables are the same as when the loop is executed
sequentially, as follows:

```fortran
c$OMP PARALLEL
c$OMP DO LASTPRIVATE(I)
      DO I=1,N
         A(I) = B(I) + C(I)
      END DO
c$OMP END PARALLEL
      CALL REVERSE(I)
```

In this case, the value of I at the end of the parallel region
equals N+1, as in the sequential case.

Ordered sections are useful for sequentially ordering the output
from work that is done in parallel. Assuming that a reentrant I/O
library exists, the following program prints out the indexes in
sequential order:

```fortran
c$OMP DO ORDERED SCHEDULE(DYNAMIC)
      DO I=LB,UB,ST
         CALL WORK(I)
      END DO
      ...
      SUBROUTINE WORK(K)
c$OMP ORDERED
      WRITE(*,*) K
c$OMP END ORDERED
      END
```
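
For readers who want to try the ordered-section example, the following is a minimal self-contained sketch of it. The loop bounds LB, UB, and ST are hypothetical, an enclosing PARALLEL region is added so that a team of threads exists, and the `!$OMP` spelling of the sentinel is used on the assumption that the compiler accepts it in place of `c$OMP`:

```fortran
      PROGRAM ORDDEMO
! LB, UB and ST are hypothetical bounds chosen for illustration.
      INTEGER LB, UB, ST
      PARAMETER (LB = 1, UB = 9, ST = 2)
      INTEGER I
!$OMP PARALLEL
!$OMP DO ORDERED SCHEDULE(DYNAMIC)
      DO I = LB, UB, ST
         CALL WORK(I)
      END DO
!$OMP END DO
!$OMP END PARALLEL
      END

      SUBROUTINE WORK(K)
      INTEGER K
! The ORDERED section lies in the dynamic extent of the DO
! directive, so the WRITE statements execute in loop order.
!$OMP ORDERED
      WRITE (*,*) K
!$OMP END ORDERED
      END
```

The program prints 1, 3, 5, 7, and 9 in that order, one value per line, whether the loop runs on one thread or several.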