Compaq Fortran
User Manual for
OpenVMS Alpha Systems

5.4.2 Passing Array Arguments Efficiently

In Compaq Fortran, there are two general types of array arguments:

Explicit-shape arrays used with FORTRAN 77.
These arrays have a fixed rank and extent that are known at compile time. Other dummy argument (receiving) arrays that are not deferred-shape (such as assumed-size arrays) can be grouped with explicit-shape array arguments.
Deferred-shape arrays introduced with Fortran 90.
Types of deferred-shape arrays include array pointers and allocatable arrays. Assumed-shape array arguments generally follow the rules about passing deferred-shape array arguments.

When passing arrays as arguments, either the starting (base) address of the array or the address of an array descriptor is passed:

When using explicit-shape (or assumed-size) arrays to receive an array, the starting address of the array is passed.
When using deferred-shape or assumed-shape arrays to receive an array, the address of the array descriptor is passed (the compiler creates the array descriptor).

Passing an assumed-shape array or array pointer to an explicit-shape array can slow run-time performance. This is because the compiler needs to create an array temporary for the entire array. The array temporary is created because the passed array may not be contiguous and the receiving (explicit-shape) array requires a contiguous array. When an array temporary is created, the size of the passed array determines whether the impact on slowing run-time performance is slight or severe.
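The following is a minimal sketch of the combinations described above, not a definitive implementation. The module name array_args and the procedure names scale_explicit and scale_assumed are hypothetical, and placing the procedures in a module is an assumption made so that the explicit interface needed by the assumed-shape dummy argument is available without a separate interface block. Whether a temporary is actually created for the last call depends on the compiler.

```fortran
module array_args
  implicit none
contains
  ! Explicit-shape dummy: receives the base address of contiguous storage.
  subroutine scale_explicit(x, n)
    integer, intent(in) :: n
    real, intent(inout) :: x(n)
    x = 2.0 * x
  end subroutine scale_explicit

  ! Assumed-shape dummy: receives an array descriptor, so a strided
  ! section can be passed without a contiguous copy.
  subroutine scale_assumed(x)
    real, intent(inout) :: x(:)
    x = 2.0 * x
  end subroutine scale_assumed
end module array_args

program pass_arrays
  use array_args
  implicit none
  real :: a(10)
  integer :: i

  a = (/ (real(i), i = 1, 10) /)

  ! Whole explicit-shape array to explicit-shape dummy: no temporary needed.
  call scale_explicit(a, 10)

  ! Strided section to assumed-shape dummy: passed by descriptor.
  call scale_assumed(a(1:10:2))

  ! Strided section to explicit-shape dummy: the compiler may have to
  ! copy the section into a contiguous temporary and copy it back.
  call scale_explicit(a(2:10:2), 5)

  print *, a
end program pass_arrays
```

For a 10-element array the cost of a temporary is negligible; the same call pattern applied to a large array inside a loop is where the copy becomes significant.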

Table 5-3 summarizes what happens with the various combinations of array types. The amount of run-time performance inefficiency depends on the size of the array.

**Table 5-3 Output Argument Array Types**
Input Arguments Array Types	Explicit-Shape Arrays	Deferred-Shape and Assumed-Shape Arrays
Explicit-Shape Arrays	Very efficient. Does not use an array temporary. Does not pass an array descriptor. Interface block optional.	Efficient. Only allowed for assumed-shape arrays (not deferred-shape arrays). Does not use an array temporary. Passes an array descriptor. Requires an interface block.
Deferred-Shape and Assumed-Shape Arrays	When passing an allocatable array, very efficient. Does not use an array temporary. Does not pass an array descriptor. Interface block optional. When not passing an allocatable array, not efficient. Instead use allocatable arrays whenever possible. Uses an array temporary. Does not pass an array descriptor. Interface block optional.	Efficient. Requires an assumed-shape or array pointer as dummy argument. Does not use an array temporary. Passes an array descriptor. Requires an interface block.

For More Information:

On arrays and their data declaration statements, see the Compaq Fortran Language Reference Manual.

5.5 Improve Overall I/O Performance

Improving overall I/O performance can minimize both device I/O and actual CPU time. The techniques listed in this section can greatly improve performance in many applications.

A bottleneck determines the maximum speed of execution by being the slowest process in an executing program. In some programs, I/O is the bottleneck that prevents an improvement in run-time performance. The key to relieving I/O bottlenecks is to reduce the actual amount of CPU and I/O device time involved in I/O. Bottlenecks may be caused by one or more of the following:

A dramatic reduction in CPU time without a corresponding improvement in I/O time, which results in an I/O bottleneck.
Coding practices such as:
- Unnecessary formatting of data and other CPU-intensive processing
- Unnecessary transfers of intermediate results
- Inefficient transfers of small amounts of data
Application requirements

Improved coding practices can minimize actual device I/O, as well as the actual CPU time.

Compaq offers software solutions to system-wide problems like minimizing device I/O delays (see Section 5.1.1).

5.5.1 Use Unformatted Files Instead of Formatted Files

Use unformatted files whenever possible. Unformatted I/O of numeric data is more efficient and more precise than formatted I/O. Native unformatted data does not need to be modified when transferred and will take up less space on an external file.

Conversely, when writing data to formatted files, formatted data must be converted to character strings for output, less data can transfer in a single operation, and formatted data may lose precision if read back into binary form.

To write the array A(25,25) in the following statements, S₁ is more efficient than S₂:

S₁         WRITE (7) A

S₂         WRITE (7,100) A
     100   FORMAT (25(' ',25F5.2))

Although formatted data files are more easily ported to other systems, Compaq Fortran can convert unformatted data in several formats (see Chapter 9).
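The precision difference can be checked with a round trip through each kind of file. This is a sketch under stated assumptions: the program name, the unit numbers 7 and 8, the use of scratch files, and the test values are all illustrative, and the format is shortened to 25F5.2 so that each formatted record holds one column of the array.

```fortran
program unformatted_roundtrip
  implicit none
  real :: a(25,25), b(25,25), c(25,25)
  integer :: i, j

  do j = 1, 25
    do i = 1, 25
      a(i,j) = real(i) / real(j + 2)
    end do
  end do

  ! Unformatted: the binary values are transferred unchanged.
  open (unit=7, status='scratch', form='unformatted')
  write (7) a
  rewind (7)
  read (7) b
  close (7)

  ! Formatted: each value is converted to text with two decimal places.
  open (unit=8, status='scratch', form='formatted')
  write (8,100) a
  rewind (8)
  read (8,100) c
  close (8)
100 format (25F5.2)

  print *, 'unformatted max error:', maxval(abs(a - b))
  print *, 'formatted max error:  ', maxval(abs(a - c))
end program unformatted_roundtrip
```

The unformatted copy is bit-for-bit identical to the original, while the formatted copy is only as accurate as the edit descriptor allows.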

5.5.2 Write Whole Arrays or Strings

The general guidelines about array use discussed in Section 5.4 also apply to reading or writing an array with an I/O statement.

To eliminate unnecessary overhead, write whole arrays or strings at one time rather than individual elements at multiple times. Each item in an I/O list generates its own calling sequence. This processing overhead becomes most significant in implied-DO loops. When accessing whole arrays, use the array name (Fortran 90/95 array syntax) instead of using implied-DO loops.
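As a rough illustration, the two WRITE statements below transfer the same data. The program name, unit number, and array size are assumptions chosen for the example.

```fortran
program whole_array_write
  implicit none
  integer, parameter :: n = 1000
  real :: a(n), b(n)
  integer :: i

  a = (/ (real(i), i = 1, n) /)

  open (unit=9, status='scratch', form='unformatted')

  ! Element by element through an implied-DO loop: the I/O list names
  ! n separate items unless the compiler collapses the loop.
  write (9) (a(i), i = 1, n)

  ! Whole array by name: a single I/O list item.
  write (9) a

  ! Both records hold the same n values; read the second one back.
  rewind (9)
  read (9)          ! skip the first record
  read (9) b
  close (9)

  print *, 'records match: ', all(a == b)
end program whole_array_write
```

The second form states the intent directly and does not rely on the compiler recognizing the loop.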

5.5.3 Write Array Data in the Natural Storage Order

Use the natural ascending storage order whenever possible. This is column-major order, with the leftmost subscript varying fastest and striding by 1 (see Section 5.4). If a program must read or write data in any other order, efficient block moves are inhibited.

If the whole array is not being written, natural storage order is the best order possible.
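The difference between the two orders can be seen by writing a small array both ways and reading each record back as a flat list. This is only a sketch; the program name, unit number, and array dimensions are hypothetical.

```fortran
program storage_order
  implicit none
  integer, parameter :: m = 4, n = 3
  integer :: a(m,n), col_major(m*n), row_major(m*n)
  integer :: i, j

  do j = 1, n
    do i = 1, m
      a(i,j) = 10*i + j
    end do
  end do

  open (unit=10, status='scratch', form='unformatted')

  ! Natural order: the leftmost subscript (i) varies fastest, so the
  ! elements are transferred in the order they are stored in memory.
  write (10) ((a(i,j), i = 1, m), j = 1, n)

  ! Row-by-row order: consecutive items are m elements apart in memory,
  ! which prevents a single block move.
  write (10) ((a(i,j), j = 1, n), i = 1, m)

  rewind (10)
  read (10) col_major
  read (10) row_major
  close (10)

  print *, 'natural order:', col_major
  print *, 'row order:    ', row_major
end program storage_order
```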

5.5.4 Use Memory for Intermediate Results

Performance can improve by storing intermediate results in memory rather than storing them in a file on a peripheral device. One situation that may not benefit from using intermediate storage is a disproportionately large amount of data in relation to physical memory on your system. Excessive page faults can dramatically impede virtual memory performance.

5.5.5 Defaults for Blocksize and Buffer Count

Compaq Fortran provides OPEN statement defaults for BLOCKSIZE and BUFFERCOUNT that generally offer adequate I/O performance. These defaults are taken from the values established by the SET RMS_DEFAULT command.

Specifying a BUFFERCOUNT of 2 (or 3) allows Record Management Services (RMS) to overlap some I/O operations with CPU operations. For sequential and relative files, specify a BLOCKSIZE of at least 1024 bytes. For indexed files, consult the Guide to OpenVMS File Applications for information on file tuning and specifying the optimal BUFFERCOUNT and BLOCKSIZE.

Any experiments to improve I/O performance should try to increase the amount of data read by each disk I/O. For large indexed files, you can reduce disk I/O by specifying enough buffers (BUFFERCOUNT) to keep most of the index portion of the file in memory.

For More Information:

On tuning indexed files and optimal BUFFERCOUNT and BLOCKSIZE values, see the Guide to OpenVMS File Applications.
On specifying BLOCKSIZE and BUFFERCOUNT, see the Compaq Fortran Language Reference Manual.

5.5.6 Specify RECL

When creating a file, you should consider specifying a RECL value that provides for adequate I/O performance. The RECL value unit differs for unformatted files (4-byte units) and formatted files (1-byte units).

The RECL value unit for formatted files is always 1-byte units. For unformatted files, the RECL unit is 4-byte units, unless you specify the /ASSUME=BYTERECL qualifier to request 1-byte units (see Section 2.3.6).

When porting unformatted data files from non-Compaq systems, see Section 9.4.5.
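One way to avoid hard-coding the unit size for an unformatted direct-access file is to let the INQUIRE statement compute the record length from an I/O list. The standard defines the value returned by IOLENGTH as suitable for use in RECL, so it should follow whichever unit size is in effect. The sketch below assumes a record of 100 default REAL values; the program name and unit number are illustrative.

```fortran
program recl_units
  implicit none
  real :: rec(100)
  integer :: reclen, i

  ! INQUIRE by I/O list returns the record length in the same units
  ! that RECL= expects for unformatted files under the current
  ! compiler settings.
  inquire (iolength=reclen) rec

  open (unit=11, status='scratch', form='unformatted', &
        access='direct', recl=reclen)

  rec = (/ (real(i), i = 1, 100) /)
  write (11, rec=1) rec
  rec = 0.0
  read (11, rec=1) rec
  close (11)

  print *, 'RECL value used:', reclen
  print *, 'last element read back:', rec(100)
end program recl_units
```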

For More Information:

On optimal RECL (record length) values, see the Guide to OpenVMS File Applications.
On specifying RECL, see the Compaq Fortran Language Reference Manual.

5.5.7 Use the Optimal Record Type

Unless a certain record type is needed for portability reasons (see Section 6.4.3), choose the most efficient type, as follows:

For sequential files of a consistent record size, the fixed-length record type gives the best performance.
For sequential unformatted files when records are not fixed in size, use variable-length or segmented records.
For sequential formatted files when records are not fixed in size, use variable-length records, unless you need to use Stream_LF records for data porting compatibility (see Section 6.4.3).

For More Information:

On Compaq Fortran data files and I/O, see Chapter 6.
On OPEN statement specifiers and defaults, see Section 6.5 and the Compaq Fortran Language Reference Manual.

5.5.8 Enable Implied-DO Loop Collapsing

DO loop collapsing reduces a major overhead in I/O processing. Normally, each element in an I/O list generates a separate call to the Compaq Fortran RTL. The processing overhead of these calls can be most significant in implied-DO loops.

Compaq Fortran reduces the number of calls in implied-DO loops by replacing up to seven nested implied-DO loops with a single call to an optimized run-time library I/O routine. The routine can transmit many I/O elements at once.

Loop collapsing can occur in formatted and unformatted I/O, but only if certain conditions are met:

The control variable must be an integer. The control variable cannot be a dummy argument or contained in an EQUIVALENCE or VOLATILE statement. Compaq Fortran must be able to determine that the control variable does not change unexpectedly at run time.
The format must not contain a variable format expression.

For More Information:

On VOLATILE attribute and statement, see the Compaq Fortran Language Reference Manual.
On loop optimizations, see Section 5.7.

5.5.9 Use of Variable Format Expressions

Variable format expressions (a Compaq Fortran 77 extension) are almost as flexible as run-time formatting, but they are more efficient because the compiler can eliminate run-time parsing of the I/O format. Only a small amount of processing and the actual data transfer are required during run time.

On the other hand, run-time formatting can impair performance significantly. For example, in the following statements, S₁ is more efficient than S₂ because the formatting is done once at compile time, not at run time:

S₁         WRITE (6,400) (A(I), I=1,N)
     400   FORMAT (1X, <N> F5.2)
              .
              .
              .
S₂         WRITE (CHFMT,500) '(1X,',N,'F5.2)'
     500   FORMAT (A,I3,A)
           WRITE (6,FMT=CHFMT) (A(I), I=1,N)

5.6 Additional Source Code Guidelines for Run-Time Efficiency

Other source coding guidelines can be implemented to improve run-time performance.

The amount of improvement in run-time performance is related to the number of times a statement is executed. For example, improving an arithmetic expression executed within a loop many times has the potential to improve performance more than improving a similar expression executed once outside a loop.

5.6.1 Avoid Small Integer and Small Logical Data Items

Avoid using integer or logical data less than 32 bits, because the smallest unit of efficient access on Alpha systems is 32 bits.

Accessing a 16-bit (or 8-bit) data type can result in a sequence of machine instructions to access the data, rather than a single, efficient machine instruction for a 32-bit data item.

To minimize data storage and memory cache misses with arrays, use 32-bit data rather than 64-bit data, unless you require the greater numeric range of 8-byte integers or the greater range and precision of double precision floating-point numbers.

5.6.2 Avoid Mixed Data Type Arithmetic Expressions

Avoid mixing integer and floating-point (REAL) data in the same computation. Expressing all numbers in a floating-point arithmetic expression (assignment statement) as floating-point values eliminates the need to convert data between fixed and floating-point formats. Expressing all numbers in an integer arithmetic expression as integer values also achieves this. This improves run-time performance.

For example, assuming that I and J are both INTEGER variables, expressing the constant as an integer value (2) rather than as a floating-point value (2.) eliminates the need to convert the data:

Original Code:
    INTEGER I, J
    I = J / 2.

Efficient Code:
    INTEGER I, J
    I = J / 2

For applications with numerous floating-point operations, consider using the /ASSUME=NOACCURACY_SENSITIVE qualifier (see Section 5.8.8) if a small difference in the result is acceptable.

You can use different sizes of the same general data type in an expression with minimal or no effect on run-time performance. For example, using REAL, DOUBLE PRECISION, and COMPLEX floating-point numbers in the same floating-point arithmetic expression has minimal or no effect on run-time performance.

5.6.3 Use Efficient Data Types

In cases where more than one data type can be used for a variable, consider selecting the data types based on the following hierarchy, listed from most to least efficient:

Integer (See also Section 5.6.1)
Single-precision real, expressed explicitly as REAL, REAL (KIND=4), or REAL*4
Double-precision real, expressed explicitly as DOUBLE PRECISION, REAL (KIND=8), or REAL*8
Extended-precision real, expressed explicitly as REAL (KIND=16) or REAL*16

However, keep in mind that in an arithmetic expression, you should avoid mixing integer and floating-point (REAL) data (see Section 5.6.2).

5.6.4 Avoid Using Slow Arithmetic Operators

Before you modify source code to avoid slow arithmetic operators, be aware that optimizations convert many slow arithmetic operators to faster arithmetic operators. For example, the compiler optimizes the expression H=J**2 to be H=J*J.

Consider also whether replacing a slow arithmetic operator with a faster arithmetic operator will change the accuracy of the results or impact the maintainability (readability) of the source code.

Replacing slow arithmetic operators with faster ones should be reserved for critical code areas. The following hierarchy lists the Compaq Fortran arithmetic operators, from fastest to slowest:

Addition (+), subtraction (-), and floating-point multiplication (*)
Integer multiplication (*)
Division (/)
Exponentiation (**)
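As one hedged example of such a replacement, a division by a loop-invariant value can be rewritten as a multiplication by its reciprocal. The program and variable names below are hypothetical, and the divisor is deliberately a power of two so that both forms give identical results; with other divisors the results can differ in the last bit, which is the accuracy concern noted above.

```fortran
program reciprocal_multiply
  implicit none
  integer, parameter :: n = 8
  real :: a(n), by_div(n), by_mul(n), scale, rscale
  integer :: i

  a = (/ (real(i), i = 1, n) /)
  scale = 4.0

  ! Division inside the loop.
  do i = 1, n
    by_div(i) = a(i) / scale
  end do

  ! One division outside the loop, then the faster multiplication.
  rscale = 1.0 / scale
  do i = 1, n
    by_mul(i) = a(i) * rscale
  end do

  ! With scale = 4.0 the reciprocal is exact, so the results agree;
  ! for other divisors the last bit can differ.
  print *, maxval(abs(by_div - by_mul))
end program reciprocal_multiply
```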

5.6.5 Avoid EQUIVALENCE Statement Use

Avoid using EQUIVALENCE statements. EQUIVALENCE statements can:

Force unaligned data or cause data to span natural boundaries.
Prevent certain optimizations, including:
- Global data analysis under certain conditions (see Section 5.7.3)
- Implied-DO loop collapsing when the control variable is contained in an EQUIVALENCE statement

5.6.6 Use Statement Functions and Internal Subprograms

Whenever the Compaq Fortran compiler has access to the use and definition of a subprogram during compilation, it may choose to inline the subprogram. Using statement functions and internal subprograms maximizes the number of subprogram references that will be inlined, especially when multiple source files are compiled together at optimization level /OPTIMIZE=LEVEL=4 or higher.

For more information, see Section 5.1.2.
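The sketch below shows both forms in one program unit. The names sumsq and rect_area are hypothetical, and whether either reference is actually inlined depends on the compiler and the optimization level in effect.

```fortran
program inline_candidates
  implicit none
  real :: a, b, hyp, area
  real :: sumsq
  ! Statement function: a one-line definition local to this program unit.
  sumsq(a, b) = a*a + b*b

  hyp  = sqrt(sumsq(3.0, 4.0))
  area = rect_area(3.0, 4.0)
  print *, hyp, area

contains

  ! Internal subprogram: its definition is visible to the compiler
  ! wherever it is referenced in the host, which makes inlining possible.
  function rect_area(w, h) result(r)
    real, intent(in) :: w, h
    real :: r
    r = w * h
  end function rect_area

end program inline_candidates
```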

5.6.7 Code DO Loops for Efficiency

Minimize the arithmetic operations and other operations in a DO loop whenever possible. Moving unnecessary operations outside the loop will improve performance (for example, when the intermediate nonvarying values within the loop are not needed).
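For instance, a value that does not change within the loop can be computed once before the loop starts. The following is a minimal sketch with hypothetical names; at higher optimization levels the compiler may perform this code motion on its own, so the recoding matters most where the optimizer cannot prove that the value is invariant.

```fortran
program hoist_invariant
  implicit none
  integer, parameter :: n = 1000
  real :: x(n), y1(n), y2(n), a, b, factor
  integer :: i

  a = 1.5
  b = 2.5
  x = (/ (real(i), i = 1, n) /)

  ! Original: sqrt(a*a + b*b) is recomputed on every iteration even
  ! though a and b do not change inside the loop.
  do i = 1, n
    y1(i) = x(i) * sqrt(a*a + b*b)
  end do

  ! Recoded: the nonvarying value is computed once, outside the loop.
  factor = sqrt(a*a + b*b)
  do i = 1, n
    y2(i) = x(i) * factor
  end do

  print *, 'largest difference: ', maxval(abs(y1 - y2))
end program hoist_invariant
```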

For More Information:

On loop optimizations, see Section 5.8.2 and Section 5.8.4.
On Compaq Fortran statements, see the Compaq Fortran Language Reference Manual.

5.7 Optimization Levels: the /OPTIMIZE=LEVEL=n qualifier

Compaq Fortran performs many optimizations by default. You do not have to recode your program to use them. However, understanding how optimizations work helps you remove any inhibitors to their successful function.

Generally, Compaq Fortran increases compile time in favor of decreasing run time. If an operation can be performed, eliminated, or simplified at compile time, Compaq Fortran does so, rather than have it done at run time. The time required to compile the program usually increases as more optimizations occur.

The program will likely execute faster when compiled at /OPTIMIZE=LEVEL=4, but will require more compilation time than if you compile the program at a lower level of optimization.

The size of the object file varies with the optimizations requested. Factors that can increase object file size include an increase of loop unrolling or procedure inlining.

Table 5-4 lists the levels of Compaq Fortran optimization with different /OPTIMIZE=LEVEL=n levels. For example, /OPTIMIZE=LEVEL=0 specifies no selectable optimizations (certain optimizations always occur); /OPTIMIZE=LEVEL=5 specifies all levels of optimizations including loop transformation and software pipelining.

**Table 5-4 Types of Optimization Performed at Different /OPTIMIZE=LEVEL=n Levels**
	/OPTIMIZE=LEVEL=n
Optimization Type	n=0	n=1	n=2	n=3	n=4	n=5
Loop transformation and software pipelining						X
Automatic inlining					X	X
Loop unrolling				X	X	X
Additional global optimizations				X	X	X
Global optimizations			X	X	X	X
Local (minimal) optimizations		X	X	X	X	X

The default is /OPTIMIZE=LEVEL=4.

In Table 5-4, the following terms are used to describe the levels of optimization (described in detail in Section 5.7.1 to Section 5.7.6):

Local (minimal) optimizations (/OPTIMIZE=LEVEL=1 or higher) occur within the source program unit and include recognition of common subexpressions and the expansion of multiplication and division.
Global optimizations (/OPTIMIZE=LEVEL=2 or higher) include such optimizations as data-flow analysis, code motion, strength reduction, split-lifetime analysis, and instruction scheduling.
Additional global optimizations (/OPTIMIZE=LEVEL=3 or higher) improve speed at the cost of extra code size. These optimizations include loop unrolling and code replication to eliminate branches.
Automatic inlining (/OPTIMIZE=LEVEL=4 or higher) applies interprocedure analysis and inline expansion of small procedures, usually by using heuristics that limit extra code.
Loop transformation and software pipelining (/OPTIMIZE=LEVEL=5) include a group of loop transformation optimizations and the software pipelining optimization.
The loop transformation optimizations apply to array references within loops and can apply to multiple nested loops. These optimizations can improve the performance of the memory system.
Software pipelining applies instruction scheduling to certain innermost loops, allowing instructions within a loop to "wrap around" and execute in a different iteration of the loop. This can reduce the impact of long-latency operations, resulting in faster loop execution.
Software pipelining also enables the prefetching of data to reduce the impact of cache misses.

5.7.1 Optimizations Performed at All Optimization Levels

The following optimizations occur at any optimization level (0 through 5):

Space optimizations
Space optimizations decrease the size of the object or executing program by eliminating unnecessary use of memory, thereby improving speed of execution and system throughput. Compaq Fortran space optimizations are as follows:
- Constant Pooling
  Only one copy of a given constant value is ever allocated memory space. If that constant value is used in several places in the program, all references point to that value.
- Dead Code Elimination
  If operations will never execute or if data items will never be used, Compaq Fortran eliminates them. Dead code includes unreachable code and code that becomes unused as a result of other optimizations, such as value propagation.
Inlining arithmetic statement functions and intrinsic procedures
Regardless of the optimization level, Compaq Fortran inserts arithmetic statement functions directly into a program instead of calling them as functions. This permits other optimizations of the inlined code and eliminates several operations, such as calls and returns or stores and fetches of the actual arguments. For example:
    SUM(A,B) = A+B
       .
       .
       .
    Y = 3.14
    X = SUM(Y,3.0)      ! With value propagation, becomes: X = 6.14
Most intrinsic procedures are automatically inlined.
Inlining of other subprograms, such as contained subprograms, occurs at optimization level 4.
Implied-DO loop collapsing
DO loop collapsing reduces a major overhead in I/O processing. Normally, each element in an I/O list generates a separate call to the Compaq Fortran RTL. The processing overhead of these calls can be most significant in implied-DO loops.
If Compaq Fortran can determine that the format will not change during program execution, it replaces the series of calls in up to seven nested implied-DO loops with a single call to an optimized RTL routine (see Section 5.5.8). The optimized RTL routine can transfer many elements in one operation.
Compaq Fortran collapses implied-DO loops in formatted and unformatted I/O operations, but it is more important with unformatted I/O, where the cost of transmitting the elements is a higher fraction of the total cost.
Array temporary elimination and FORALL statements
Certain array store operations are optimized. For example, to minimize the creation of array temporaries, Compaq Fortran can detect when no overlap occurs between the two sides of an array expression. This type of optimization occurs for some assignment statements in FORALL constructs.
Certain array operations are also candidates for loop unrolling optimizations (see Section 5.7.4.1).
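The overlap question for array stores can be illustrated with a short program. This is a sketch with hypothetical names; whether a temporary is actually generated for the overlapping assignment is up to the compiler, but the result must be as if the right side were fully evaluated before the store.

```fortran
program array_overlap
  implicit none
  integer, parameter :: n = 6
  integer :: a(n), b(n), c(n), i

  b = (/ (i, i = 1, n) /)
  c = 10

  ! No overlap: a appears only on the left side, so the result can be
  ! stored directly without an array temporary.
  a = b + c

  ! Overlap: a(1:n-1) is read while a(2:n) is written. The right side
  ! must be evaluated as if completely before the store, which can
  ! require a temporary copy.
  a(2:n) = a(1:n-1) * 2

  ! FORALL assignment with no overlap between the two sides.
  forall (i = 1:n) c(i) = 2 * b(i)

  print *, a
  print *, c
end program array_overlap
```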

5.7.2 Local (Minimal) Optimizations

To enable local optimizations, use /OPTIMIZE=LEVEL=1 or a higher optimization level (LEVEL=2, LEVEL=3, LEVEL=4, LEVEL=5).

To prevent local optimizations, specify /NOOPTIMIZE (/OPTIMIZE=LEVEL=0).

5.7.2.1 Common Subexpression Elimination

If the same subexpressions appear in more than one computation and the values do not change between computations, Compaq Fortran computes the result once and replaces the subexpressions with the result itself:

    DIMENSION A(25,25), B(25,25)
    A(I,J) = B(I,J)

Without optimization, these statements can be compiled as follows:

    t1 = ((J-1)*25+(I-1))*4
    t2 = ((J-1)*25+(I-1))*4
    A(t1) = B(t2)

Variables t1 and t2 represent equivalent expressions. Compaq Fortran eliminates this redundancy by producing the following:

    t = ((J-1)*25+(I-1))*4
    A(t) = B(t)
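The offset expression itself can be checked in ordinary Fortran by flattening one of the arrays. The sketch below is illustrative only: the names flat_b and t and the test subscripts are assumptions, and the offset is kept in elements, with the factor of 4 applied only when printing the byte offset for 4-byte REAL data.

```fortran
program shared_offset
  implicit none
  real :: a(25,25), b(25,25), flat_b(625)
  integer :: i, j, t

  do j = 1, 25
    do i = 1, 25
      b(i,j) = real(100*i + j)
    end do
  end do
  flat_b = reshape(b, (/ 625 /))

  i = 7
  j = 3
  ! Element offset shared by A(I,J) and B(I,J); multiplying by 4 gives
  ! the byte offset for 4-byte REAL elements.
  t = (j-1)*25 + (i-1)

  a = 0.0
  a(i,j) = flat_b(t + 1)   ! flat_b is 1-based, the offset is 0-based

  print *, 'element offset:', t, ' byte offset:', t*4
  print *, a(i,j) == b(i,j)
end program shared_offset
```

Because both arrays have the same shape and element size, one computed offset serves both references, which is the redundancy the compiler removes.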
