Compaq Fortran
User Manual for
OpenVMS Alpha Systems



5.2.1.2 Using a Command Procedure

Some of the information obtained by using the LIB$xxxx_TIMER routines can be obtained using a command procedure. You should be aware of the following:

Before using a command procedure to measure performance, define a foreign symbol that runs the program to be measured in a subprocess. In the following example, the name of the command procedure is TIMER:


$ TIMER :== SPAWN /WAIT /NOLOG @SYS$LOGIN:TIMER

The command procedure shown in Example 5-2 uses the F$GETJPI and F$GETDVI lexical functions to collect performance statistics and the F$FAO lexical function to format the report. Each output line is also defined as a logical name in the job logical name table, so the parent process can retrieve it if needed.

Example 5-2 Command Procedure that Measures Program Performance

$    verify = 'f$verify(0) 
$ 
$! Get initial values for stats (this removes SPAWN overhead or the current 
$! process values). 
$ 
$ bio1 = f$getjpi (0, "BUFIO") 
$ dio1 = f$getjpi (0, "DIRIO") 
$ pgf1 = f$getjpi (0, "PAGEFLTS") 
$ vip1 = f$getjpi (0, "VIRTPEAK") 
$ wsp1 = f$getjpi (0, "WSPEAK") 
$ dsk1 = f$getdvi ("sys$disk:","OPCNT") 
$ tim1 = f$time () 
$ 
$ set noon 
$ tik1 = f$getjpi (0, "CPUTIM") 
$ set noverify 
$ 
$! User command being timed: 
$ 
$ 'p1' 'p2' 'p3' 'p4' 'p5' 'p6' 'p7' 'p8' 
$ 
$ tik2 = f$getjpi (0, "CPUTIM") 
$ 
$ bio2 = f$getjpi (0, "BUFIO") 
$ dio2 = f$getjpi (0, "DIRIO") 
$ pgf2 = f$getjpi (0, "PAGEFLTS") 
$ vip2 = f$getjpi (0, "VIRTPEAK") 
$ wsp2 = f$getjpi (0, "WSPEAK") 
$ dsk2 = f$getdvi ("sys$disk:","OPCNT") 
$ tim2 = f$time () 
$ 
$ tim  = f$cvtime("''f$cvtime(tim2,,"TIME")'-''f$cvtime(tim1,,"TIME")'",,"TIME") 
$ thun = 'f$cvtime(tim,,"HUNDREDTH") 
$ tsec = (f$cvtime(tim,,"HOUR")*3600) + (f$cvtime(tim,,"MINUTE")*60) + - 
  f$cvtime(tim,,"SECOND") 
$ 
$ bio  = bio2 - bio1 
$ dio  = dio2 - dio1 
$ pgf  = pgf2 - pgf1 
$ dsk  = dsk2 - dsk1 
$ vip  = "" 
$ if vip2 .le. vip1 then vip = "*"   ! Asterisk means didn't change (from parent) 
$ wsp  = "" 
$ if wsp2 .le. wsp1 then wsp = "*" 
$ 
$ tiks = tik2 - tik1 
$ secs = tiks / 100 
$ huns = tiks - (secs*100) 
$ write sys$output "" 
$! 
$ time$line1 ==  - 
 f$fao("Execution (CPU) sec!5UL.!2ZL   Direct I/O  !7UL   Peak working set!7UL!1AS", - 
          secs, huns, dio, wsp2, wsp) 
$ write sys$output time$line1 
$! 
$ time$line2 ==  - 
 f$fao("Elapsed (clock) sec!5UL.!2ZL   Buffered I/O!7UL   Peak virtual    !7UL!1AS", - 
                tsec, thun, bio, vip2, vip) 
$ write sys$output time$line2 
$! 
$ time$line3 == - 
 f$fao("Process ID         !AS   SYS$DISK I/O!7UL   Page faults     !7UL", - 
        f$getjpi(0,"pid"), dsk, pgf) 
$ write sys$output time$line3 
$ if wsp+vip .nes. "" then write sys$output - 
 "                                                       (* peak from parent)" 
$ write sys$output "" 
$ 
$! Place these output lines in the job logical name table, so the parent 
$! can access them (useful for batch jobs to automate the collection). 
$ 
$ define /job/nolog time$line1 "''time$line1'" 
$ define /job/nolog time$line2 "''time$line2'" 
$ define /job/nolog time$line3 "''time$line3'" 
$ 
$ verify = f$verify(verify) 

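This command procedure accepts up to eight parameters (P1 through P8), which together form the command being timed, such as the RUN command followed by the name of the executable image to be run and any other arguments that the command requires. The following example times the image PROG_TEST.EXE: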


$ TIMER RUN PROG_TEST
$ 
$! User command being timed: 
$ 
$ RUN PROG_TEST.EXE; 
 
Execution (CPU) sec   45.39   Direct I/O        3   Peak working set   2224 
Elapsed (clock) sec   45.96   Buffered I/O     18   Peak virtual      15808 
Process ID         20A00999   SYS$DISK I/O      6   Page faults          64 
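The procedure converts the CPUTIM value, which F$GETJPI reports in 10-millisecond units, into whole seconds and hundredths with integer division, and builds the elapsed seconds from the HOUR, MINUTE, and SECOND fields of the delta time. The following is a minimal Fortran sketch of the same arithmetic; the module and procedure names (TIMER_ARITH, SPLIT_TICKS, CLOCK_SECONDS) are hypothetical and are not part of any OpenVMS library.

```fortran
MODULE TIMER_ARITH
  ! Hypothetical helpers that mirror the integer arithmetic in Example 5-2
  IMPLICIT NONE
CONTAINS
  SUBROUTINE SPLIT_TICKS (TIKS, SECS, HUNS)
    ! TIKS is a CPU time in 10-millisecond ticks (the CPUTIM unit)
    INTEGER, INTENT(IN)  :: TIKS
    INTEGER, INTENT(OUT) :: SECS, HUNS
    SECS = TIKS / 100              ! Integer division truncates
    HUNS = TIKS - SECS*100         ! Remainder is hundredths of a second
  END SUBROUTINE SPLIT_TICKS

  INTEGER FUNCTION CLOCK_SECONDS (HOURS, MINUTES, SECONDS)
    ! Same formula as the TSEC calculation in the command procedure
    INTEGER, INTENT(IN) :: HOURS, MINUTES, SECONDS
    CLOCK_SECONDS = HOURS*3600 + MINUTES*60 + SECONDS
  END FUNCTION CLOCK_SECONDS
END MODULE TIMER_ARITH
```

For the sample output above, a CPUTIM difference of 4539 ticks gives SECS = 45 and HUNS = 39, which F$FAO prints as 45.39 (the !2ZL directive zero-fills the hundredths field).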

Displaying text on the terminal increases the buffered I/O count and adds to the elapsed time. If your program displays a lot of text, you can redirect its output; the reported times will then change because of the reduced screen I/O.

For More Information:

About system-wide tuning and suggestions for other performance enhancements on OpenVMS systems, see the OpenVMS System Manager's Manual: Tuning, Monitoring, and Complex Systems.

5.2.2 The Performance and Coverage Analyzer (PCA)

To generate profiling information, you can use the optional Performance and Coverage Analyzer (PCA) tool.

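Profiling helps you identify the areas of code where significant program execution time is spent; it can also identify the parts of an application that are not executed by a given set of test data. PCA has two components: the Collector, which gathers performance or coverage data while your program runs, and the Analyzer, which processes and displays the collected data.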

PCA works with the related DECset tools, the Language-Sensitive Editor (LSE) and the Test Manager. PCA provides a callable routine interface, a command-line interface, and a DECwindows Motif graphical interface. The following examples use the character-cell interface.

When compiling a program for which PCA will record and analyze data, specify the /DEBUG qualifier on the FORTRAN command line:


$ FORTRAN /DEBUG TEST_PROG.F90

On the LINK command line, specify the PCA debugging module PCA$OBJ using the Linker /DEBUG qualifier:


$ LINK /DEBUG=SYS$LIBRARY:PCA$OBJ.OBJ TEST_PROG

When you run the program, the PCA$OBJ.OBJ debugging module invokes the Collector, which displays the PCAC> prompt and waits for commands that run your program under Collector control and gather performance or coverage data:


$ RUN TEST_PROG
PCAC> 

You can enter Collector commands, such as SET DATAFILE, SET PC_SAMPLING, GO, and EXIT.

To run the Analyzer, enter the PCA command followed by the name of a performance data file, as in the following example:


$ PCA TEST_PROG
PCAA> 

You can enter the appropriate Analyzer commands to display the data in the performance data file in a graphic representation.


5.3 Data Alignment Considerations

The Compaq Fortran compiler aligns most numeric data items on natural boundaries to avoid run-time adjustment by software that can slow performance.

A natural boundary is a memory address that is a multiple of the data item's size (data type sizes are described in Table 8-1). For example, a REAL (KIND=8) data item aligned on natural boundaries has an address that is a multiple of 8. An array is aligned on natural boundaries if all of its elements are.
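The definition reduces to a remainder test: an item of N bytes at byte address (or offset) A is naturally aligned when A is a multiple of N. The following is a minimal sketch; the function name IS_NATURAL is hypothetical, and it works on numeric offsets rather than on real machine addresses.

```fortran
MODULE ALIGN_CHECK
  ! Hypothetical helper illustrating the natural-boundary rule
  IMPLICIT NONE
CONTAINS
  LOGICAL FUNCTION IS_NATURAL (ADDR, NBYTES)
    ! ADDR:   byte address or offset of the data item
    ! NBYTES: size of the data item in bytes (8 for REAL (KIND=8))
    INTEGER, INTENT(IN) :: ADDR, NBYTES
    IS_NATURAL = (MOD(ADDR, NBYTES) == 0)
  END FUNCTION IS_NATURAL
END MODULE ALIGN_CHECK
```

For a REAL (KIND=8) item, IS_NATURAL(16, 8) is true, while IS_NATURAL(12, 8) is false because 12 is not a multiple of 8.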

All data items whose starting address is on a natural boundary are naturally aligned. Data not aligned on a natural boundary is called unaligned data.

Although the Compaq Fortran compiler naturally aligns individual data items when it can, certain Compaq Fortran statements (such as EQUIVALENCE) can cause data items to become unaligned (see Section 5.3.1).

Although you can use the FORTRAN command /ALIGNMENT qualifier to ensure naturally aligned data, you should also check, and consider reordering, the data declarations within common blocks and structures. Within each common block, derived type, or record structure, carefully specify the order and sizes of data declarations to ensure naturally aligned data: start with the largest numeric items, followed by smaller numeric items, and then nonnumeric (character) data.

5.3.1 Causes of Unaligned Data and Ensuring Natural Alignment

Common blocks (COMMON statement), derived-type data, and Compaq Fortran 77 record structures (STRUCTURE and RECORD statements) usually contain multiple items within the context of the larger structure.

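The following declaration statements can force data to be unaligned: COMMON statements, derived-type definitions (particularly those that use the SEQUENCE statement), Compaq Fortran 77 STRUCTURE and RECORD statements, and EQUIVALENCE statements.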

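To avoid unaligned data in a common block, derived-type data, or record structure (extension), use one or both of the following: reorder the data declarations so that each item falls on a natural boundary (see Section 5.3.3), or specify the FORTRAN command /ALIGNMENT qualifier so that the compiler adds padding bytes where needed (see Section 5.3.4).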

Other possible causes of unaligned data include unaligned actual arguments and arrays that contain a derived-type structure or Compaq Fortran 77 record structure.

When actual arguments from outside the program unit are not naturally aligned, unaligned data access will occur. Compaq Fortran assumes all passed arguments are naturally aligned and has no information at compile time about data that will be introduced by actual arguments during program execution.

For arrays where each array element contains a derived-type structure or Compaq Fortran 77 record structure, the size of the array elements may cause some elements (but not the first) to start on an unaligned boundary.

Even if the data items are naturally aligned within a derived-type structure (without the SEQUENCE statement) or a record structure, the size of each array element might not be a multiple of its largest data item. In that case, the FORTRAN /ALIGNMENT qualifier is needed to supply the padding that keeps later array elements aligned.

If you specify /ALIGNMENT=RECORDS=PACKED (or equivalent qualifiers), no padding bytes are added between array elements. If array elements each contain a derived-type structure with the SEQUENCE statement, array elements are packed without padding bytes regardless of the FORTRAN command qualifiers specified. In this case, some elements will be unaligned.

When /ALIGNMENT=RECORDS=NATURAL is in effect (the default), the number of padding bytes that the compiler adds to each array element depends on the size of the largest data item within the structure. The compiler makes the size of each array element an exact multiple of the size of the largest data item in the derived-type structure (without the SEQUENCE statement) or record structure, and then adds the appropriate number of padding bytes.

For instance, if a structure contains an 8-byte floating-point number followed by a 3-byte character variable, each element contains five bytes of padding (16 is an exact multiple of 8). However, if the structure contains a 4-byte floating-point number and a 4-byte integer followed by a 3-byte character variable, each element contains one byte of padding (12 is an exact multiple of 4).
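The rule in the two preceding paragraphs is a round-up: the element size is the raw size of the components rounded up to the next multiple of the largest data item. The following is a minimal sketch that reproduces both calculations; the function name PADDED_SIZE is hypothetical, and the sketch models only the rule stated here, not every layout decision the compiler makes.

```fortran
MODULE ELEMENT_PADDING
  ! Hypothetical helper modeling the /ALIGNMENT=RECORDS=NATURAL element size
  IMPLICIT NONE
CONTAINS
  INTEGER FUNCTION PADDED_SIZE (RAW_BYTES, LARGEST_ITEM)
    ! RAW_BYTES:    sum of the component sizes, with no padding
    ! LARGEST_ITEM: size in bytes of the largest numeric component
    INTEGER, INTENT(IN) :: RAW_BYTES, LARGEST_ITEM
    ! Round RAW_BYTES up to the next exact multiple of LARGEST_ITEM
    PADDED_SIZE = ((RAW_BYTES + LARGEST_ITEM - 1) / LARGEST_ITEM) * LARGEST_ITEM
  END FUNCTION PADDED_SIZE
END MODULE ELEMENT_PADDING
```

PADDED_SIZE(8+3, 8) returns 16 (five padding bytes) and PADDED_SIZE(4+4+3, 4) returns 12 (one padding byte). Without the padding, as with a SEQUENCE type, an array of 11-byte elements would place the second element at offset 11, which is not a multiple of 8, so its 8-byte floating-point number would be unaligned.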

For More Information:

On the FORTRAN command /ALIGNMENT qualifier, see Section 5.3.4.

5.3.2 Checking for Inefficient Unaligned Data

During compilation, the Compaq Fortran compiler naturally aligns as much data as possible. Exceptions that can result in unaligned data are described in Section 5.3.1.

Because unaligned data can slow run-time performance, it is worthwhile to:

There are two ways unaligned data might be reported:

For More Information:

On the /WARNINGS qualifier, see Section 2.3.48.

5.3.3 Ordering Data Declarations to Avoid Unaligned Data

For new programs or when the source declarations of an existing program can be easily modified, plan the order of your data declarations carefully to ensure the data items in a common block, derived-type data, record structure, or data items made equivalent by an EQUIVALENCE statement will be naturally aligned.

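Use the following rules to prevent unaligned data: declare the largest numeric data items first, followed by progressively smaller numeric items, and place character (nonnumeric) data last.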

Using the suggested data declaration guidelines minimizes the need to use the /ALIGNMENT qualifier to add padding bytes to ensure naturally aligned data. In cases where the /ALIGNMENT qualifier is still needed, using the suggested data declaration guidelines can minimize the number of padding bytes added by the compiler.

5.3.3.1 Arranging Data Items in Common Blocks

The order of data items in a COMMON statement determines the order in which the data items are stored. Consider the following declaration of a common block named X:


LOGICAL (KIND=2) FLAG 
INTEGER          IARRY_I(3) 
CHARACTER(LEN=5) NAME_CH 
COMMON /X/ FLAG, IARRY_I, NAME_CH

As shown in Figure 5-1, if you omit the appropriate FORTRAN command qualifiers, the common block will contain unaligned data items beginning at the first array element of IARRY_I.

Figure 5-1 Common Block with Unaligned Data


As shown in Figure 5-2, if you compile the program units that use the common block with the /ALIGNMENT=COMMONS=STANDARD qualifier, data items will be naturally aligned.

Figure 5-2 Common Block with Naturally Aligned Data


Because the common block X contains only data items whose size is 32 bits or smaller, you can specify the /ALIGNMENT=COMMONS=STANDARD qualifier and still have naturally aligned data. If the common block contains data items whose size might be larger than 32 bits (such as REAL (KIND=8) data), specify /ALIGNMENT=COMMONS=NATURAL to ensure naturally aligned data.

If you can easily modify the source files that use the common block data, define the numeric variables in the COMMON statement in descending order of size and place the character variable last. This provides more portability, ensures natural alignment without padding, and does not require the FORTRAN command /ALIGNMENT=COMMONS=NATURAL (or equivalent) qualifier:


LOGICAL (KIND=2) FLAG 
INTEGER          IARRY_I(3) 
CHARACTER(LEN=5) NAME_CH 
COMMON /X/ IARRY_I, FLAG, NAME_CH

As shown in Figure 5-3, if you arrange the order of variables from largest to smallest size and place character data last, the data items will be naturally aligned.

Figure 5-3 Common Block with Naturally Aligned Reordered Data
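The difference between Figure 5-1 and Figure 5-3 can be checked by hand: with no padding, each item starts where the previous one ends, and an item is unaligned when its offset is not a multiple of its element size. The following sketch computes packed offsets for a list of items; the module name and the idea of describing each item by its total size and element size are assumptions made for illustration.

```fortran
MODULE COMMON_LAYOUT
  ! Hypothetical helper that lays out items with no padding bytes
  IMPLICIT NONE
CONTAINS
  SUBROUTINE PACKED_OFFSETS (TOTAL, ELEM, OFFSET, ALIGNED)
    ! TOTAL(K):   size in bytes of item K (whole array for array items)
    ! ELEM(K):    size in bytes of one element of item K
    ! OFFSET(K):  byte offset of item K when no padding is added
    ! ALIGNED(K): .TRUE. if item K starts on a natural boundary
    INTEGER, INTENT(IN)  :: TOTAL(:), ELEM(:)
    INTEGER, INTENT(OUT) :: OFFSET(:)
    LOGICAL, INTENT(OUT) :: ALIGNED(:)
    INTEGER :: K, NEXT
    NEXT = 0
    DO K = 1, SIZE(TOTAL)
      OFFSET(K)  = NEXT
      ALIGNED(K) = (MOD(NEXT, ELEM(K)) == 0)
      NEXT = NEXT + TOTAL(K)          ! Packed: next item follows immediately
    END DO
  END SUBROUTINE PACKED_OFFSETS
END MODULE COMMON_LAYOUT
```

For the original order (FLAG, IARRY_I, NAME_CH), TOTAL = (/2, 12, 5/) and ELEM = (/2, 4, 1/) give offsets 0, 2, and 14, with IARRY_I unaligned. For the reordered list (IARRY_I, FLAG, NAME_CH), the offsets are 0, 12, and 14, and every item is aligned.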


When modifying or creating all source files that use common block data, consider placing the common block data declarations in a module so the declarations are consistent. If the common block is not needed for compatibility (such as file storage or Compaq Fortran 77 use), you can place the data declarations in a module without using a common block.
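The following is a minimal sketch of the first suggestion, using the reordered common block X from the preceding example; the module name X_DECLS is hypothetical. Each program unit that needs the data uses the module instead of repeating the declarations, so the declarations cannot drift out of step. If the common block is not needed for compatibility, deleting the COMMON statement leaves ordinary module variables.

```fortran
MODULE X_DECLS
  ! Hypothetical module holding the declarations for common block X.
  ! Items are ordered largest numeric first, character data last.
  IMPLICIT NONE
  INTEGER          IARRY_I(3)
  LOGICAL (KIND=2) FLAG
  CHARACTER(LEN=5) NAME_CH
  COMMON /X/ IARRY_I, FLAG, NAME_CH
END MODULE X_DECLS
```

A program unit then needs only the statement USE X_DECLS to access IARRY_I, FLAG, and NAME_CH.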

5.3.3.2 Arranging Data Items in Derived-Type Data

Like common blocks, derived-type data may contain multiple data items (components).

Data item components within derived-type data will be naturally aligned on up to 64-bit boundaries, with certain exceptions related to the use of the SEQUENCE statement and FORTRAN qualifiers.

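Compaq Fortran stores a derived data type as a linear sequence of values, with the first component in the first storage location and the last component in the last storage location. Padding bytes are added where needed to align components, unless the SEQUENCE statement or /ALIGNMENT=RECORDS=PACKED prevents it.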

Consider the following declaration of array CATALOG_SPRING of derived-type PART_DT:


MODULE DATA_DEFS 
  TYPE PART_DT 
    INTEGER           IDENTIFIER 
    REAL              WEIGHT 
    CHARACTER(LEN=15) DESCRIPTION 
  END TYPE PART_DT 
  TYPE (PART_DT) CATALOG_SPRING(30) 
  . 
  . 
  . 
END MODULE DATA_DEFS 

As shown in Figure 5-4, the largest numeric data items are defined first and the character data is defined last. There are no padding bytes between data items, and all items are naturally aligned. The trailing padding byte (23 bytes of data rounded up to 24, a multiple of 4) is needed because CATALOG_SPRING is an array; the compiler inserts it when the /ALIGNMENT=RECORDS=NATURAL qualifier (the default) is in effect.

Figure 5-4 Derived-Type Naturally Aligned Data (in CATALOG_SPRING : ( ,))
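One way to observe the storage size from inside a program is to ask how many one-character units a value occupies, using the standard TRANSFER intrinsic with a one-character mold. The following is a minimal sketch that repeats the PART_DT definition from DATA_DEFS in a hypothetical module named PART_SIZE. It assumes a character storage unit of one byte, and the reported size depends on the compiler and on the alignment qualifiers in effect.

```fortran
MODULE PART_SIZE
  ! Hypothetical module; PART_DT is copied from DATA_DEFS above
  IMPLICIT NONE
  TYPE PART_DT
    INTEGER           IDENTIFIER
    REAL              WEIGHT
    CHARACTER(LEN=15) DESCRIPTION
  END TYPE PART_DT
CONTAINS
  INTEGER FUNCTION BYTES_OF_PART (P)
    ! Number of one-character units needed to hold P, padding included
    TYPE (PART_DT), INTENT(IN) :: P
    BYTES_OF_PART = SIZE(TRANSFER(P, (/ ' ' /)))
  END FUNCTION BYTES_OF_PART

  INTEGER FUNCTION BYTES_OF_PARTS (PARTS)
    ! Same measurement for a whole array of parts
    TYPE (PART_DT), INTENT(IN) :: PARTS(:)
    BYTES_OF_PARTS = SIZE(TRANSFER(PARTS, (/ ' ' /)))
  END FUNCTION BYTES_OF_PARTS
END MODULE PART_SIZE
```

With natural record alignment in effect, BYTES_OF_PART would be expected to return 24 (23 bytes of components plus the trailing padding byte), and BYTES_OF_PARTS(CATALOG_SPRING) would be expected to return 720 (30 elements of 24 bytes).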


5.3.3.3 Arranging Data Items in Compaq Fortran 77 Record Structures

Compaq Fortran supports record structures provided by Compaq Fortran 77. Compaq Fortran 77 record structures use the RECORD statement and optionally the STRUCTURE statement, which are extensions to the FORTRAN-77, Fortran 90, and Fortran 95 standards. The order of data items in a STRUCTURE statement determines the order in which the data items are stored.

Compaq Fortran stores a record in memory as a linear sequence of values, with the record's first element in the first storage location and its last element in the last storage location. Unless you specify the /ALIGNMENT=RECORDS=PACKED qualifier, padding bytes are added if needed to ensure data fields are naturally aligned.

The following example contains a structure declaration, a RECORD statement, and diagrams of the resulting records as they are stored in memory:


STRUCTURE /STRA/ 
  CHARACTER*1 CHR 
  INTEGER*4 INT 
END STRUCTURE 
   .
   .
   .
RECORD /STRA/ REC 

Figure 5-5 shows the memory diagram of record REC for naturally aligned records.

Figure 5-5 Memory Diagram of REC for Naturally Aligned Records


For More Information:

On data declaration statements, see the Compaq Fortran Language Reference Manual.

5.3.4 Qualifiers Controlling Alignment

The following qualifiers control whether the Compaq Fortran compiler adds padding (when needed) to naturally align multiple data items in common blocks, derived-type data, and Compaq Fortran 77 record structures:

By default, multiple data items in derived-type data and record structures are naturally aligned, but data items in common blocks are not padded and therefore might not be naturally aligned (/ALIGNMENT=(COMMONS=(PACKED,NOMULTILANGUAGE),RECORDS=NATURAL)).

In derived-type data, using the SEQUENCE statement prevents /ALIGNMENT=RECORDS=NATURAL from adding needed padding bytes to naturally align data items.

For More Information:

On the /ALIGNMENT qualifier, see Section 2.3.3.

5.4 Use Arrays Efficiently

The following sections discuss these topics:

5.4.1 Accessing Arrays Efficiently

Many of the array access efficiency techniques described in this section are applied automatically by the Compaq Fortran loop transformation optimizations (see Section 5.8.1) or by the Compaq KAP for Fortran 90 for OpenVMS Alpha Systems performance preprocessor (described in Section 5.1.1).

Several aspects of array use can improve run-time performance. The following sections describe these aspects.

Array Access

Array access is fastest when the whole array, or most of it, is accessed contiguously. Perform one or a few array operations that access all of the array or major parts of it, instead of numerous operations on scattered array elements.

Rather than use explicit loops for array access, use elemental array operations, such as the following line that increments all elements of array variable A:


  A = A + 1. 

When reading or writing an array, use the array name rather than a DO loop or an implied-DO loop that references each element individually. Fortran 90/95 array syntax allows you to reference a whole array by using its name in an expression. For example:


     REAL ::  A(100,100) 
     A = 0.0 
     A = A + 1.                       ! Increment all elements of A by 1 
     . 
     . 
     . 
 
     WRITE (8) A                      ! Fast whole array use 
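The following sketch writes a whole array with one unformatted WRITE statement and reads it back with one READ statement, using the array name rather than an implied-DO list. It assumes that unit 8 (as in the example above) is free, and it uses a scratch file; the module and subroutine names are hypothetical.

```fortran
MODULE WHOLE_ARRAY_IO
  ! Hypothetical module illustrating whole-array unformatted I/O
  IMPLICIT NONE
CONTAINS
  SUBROUTINE ROUND_TRIP (A, B)
    ! Write A as a whole array, then read it back into B
    REAL, INTENT(IN)  :: A(:,:)
    REAL, INTENT(OUT) :: B(:,:)
    OPEN (UNIT=8, STATUS='SCRATCH', FORM='UNFORMATTED')
    WRITE (8) A      ! Whole array, in column-major order; avoids an
                     ! implied-DO list such as ((A(I,J), I=...), J=...)
    REWIND (8)
    READ (8) B       ! Same shape, same element order
    CLOSE (8)
  END SUBROUTINE ROUND_TRIP
END MODULE WHOLE_ARRAY_IO
```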

Similarly, you can use derived-type array structure components, such as:


   TYPE X 
     INTEGER A(5) 
   END TYPE X 
   . 
   . 
   . 
   TYPE (X) Z 
   WRITE (8) Z%A                      ! Fast array structure component use 

Multidimensional Arrays

Make sure multidimensional arrays are referenced using proper array syntax and are traversed in the natural ascending storage order, which for Fortran is column-major order. With column-major order, the leftmost subscript varies most rapidly, with a stride of one. Writing a whole array uses column-major order.

Avoid row-major order (the order used by C), in which the rightmost subscript varies most rapidly.

For example, consider the following nested DO loops that access a two-dimensional array with the J loop as the innermost loop:


   INTEGER  X(3,5), Y(3,5), I, J 
   Y = 0 
   DO I=1,3                   ! I outer loop varies slowest 
     DO J=1,5                 ! J inner loop varies fastest 
       X (I,J) = Y(I,J) + 1   ! Inefficient row-major storage order 
     END DO                   ! (rightmost subscript varies fastest) 
   END DO 
   . 
   . 
   . 
   END PROGRAM 

Since J varies the fastest and is the second array subscript in the expression X (I,J), the array is accessed in row-major order.

To access the array in natural column-major order, examine the array algorithm and the data being modified.

Using arrays X and Y, the array can be accessed in natural column-major order by changing the nesting order of the DO loops so the innermost loop variable corresponds to the leftmost array dimension:


   INTEGER  X(3,5), Y(3,5), I, J 
   Y = 0 
 
   DO J=1,5                   ! J outer loop varies slowest 
     DO I=1,3                 ! I inner loop varies fastest 
       X (I,J) = Y(I,J) + 1   ! Efficient column-major storage order 
     END DO                   ! (leftmost subscript varies fastest) 
   END DO 
   . 
   . 
   . 
   END PROGRAM 
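Interchanging the loops is safe here because each X(I,J) depends only on Y(I,J), so no iteration uses a value computed by another. The following is a minimal sketch that runs both nestings so their results can be compared; the module and subroutine names are hypothetical.

```fortran
MODULE LOOP_ORDER
  ! Hypothetical module comparing the two loop nestings
  IMPLICIT NONE
CONTAINS
  SUBROUTINE ROW_ORDER (X, Y)
    ! J innermost: rightmost subscript varies fastest (inefficient)
    INTEGER, INTENT(OUT) :: X(3,5)
    INTEGER, INTENT(IN)  :: Y(3,5)
    INTEGER :: I, J
    DO I = 1, 3
      DO J = 1, 5
        X(I,J) = Y(I,J) + 1
      END DO
    END DO
  END SUBROUTINE ROW_ORDER

  SUBROUTINE COLUMN_ORDER (X, Y)
    ! I innermost: leftmost subscript varies fastest (efficient)
    INTEGER, INTENT(OUT) :: X(3,5)
    INTEGER, INTENT(IN)  :: Y(3,5)
    INTEGER :: I, J
    DO J = 1, 5
      DO I = 1, 3
        X(I,J) = Y(I,J) + 1
      END DO
    END DO
  END SUBROUTINE COLUMN_ORDER
END MODULE LOOP_ORDER
```

Both subroutines produce the same X as the whole-array assignment X = Y + 1; only the order in which memory is visited differs.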

The Fortran 90/95 whole array access (X = Y + 1) uses efficient column-major order. However, if the application requires that J vary the fastest, or if you cannot modify the loop order without changing the results, consider modifying the program to use a rearranged order of array dimensions. Program modifications include rearranging the order of the dimensions in the array declarations and the order of the subscripts in the array references.

In the following example, the original DO loop nesting (with J as the innermost loop) is kept, but the dimensions and subscripts of X and Y are reversed:


   INTEGER  X(5,3), Y(5,3), I, J 
   Y = 0 
   DO I=1,3                  ! I outer loop varies slowest 
     DO J=1,5                ! J inner loop varies fastest 
       X (J,I) = Y(J,I) + 1  ! Efficient column-major storage order 
     END DO                  ! (leftmost subscript varies fastest) 
   END DO 
   . 
   . 
   . 
   END PROGRAM 
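Reversing the dimensions stores the transpose of the original array: element X(I,J) of the 3-by-5 version corresponds to element X(J,I) of the 5-by-3 version. The following sketch, with hypothetical names, fills the rearranged array using the original loop nesting; its result can be checked against the TRANSPOSE intrinsic applied to the original layout.

```fortran
MODULE SWAPPED_DIMS
  ! Hypothetical module illustrating rearranged array dimensions
  IMPLICIT NONE
CONTAINS
  SUBROUTINE FILL_SWAPPED (XT, YT)
    ! Original nesting (J innermost), but J is now the leftmost
    ! subscript, so the inner loop walks down a column of XT
    INTEGER, INTENT(OUT) :: XT(5,3)
    INTEGER, INTENT(IN)  :: YT(5,3)
    INTEGER :: I, J
    DO I = 1, 3
      DO J = 1, 5
        XT(J,I) = YT(J,I) + 1
      END DO
    END DO
  END SUBROUTINE FILL_SWAPPED
END MODULE SWAPPED_DIMS
```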

Code written to access multidimensional arrays in row-major order (like C) or random order can often make inefficient use of the CPU memory cache. For more information on using natural storage order during record I/O operations, see Section 5.5.3.

Array Intrinsic Procedures

Use the available Fortran 90/95 array intrinsic procedures rather than create your own.

Whenever possible, use Fortran 90/95 array intrinsic procedures instead of creating your own routines to accomplish the same task. Compaq Fortran array intrinsic procedures are designed for efficient use with the various Compaq Fortran run-time components.

Using the standard-conforming array intrinsics can also make your program more portable.
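As a small illustration, the column totals of a matrix can be computed with one call to the standard SUM intrinsic instead of a hand-written loop. The loop version is included only for comparison; the module and function names are hypothetical.

```fortran
MODULE INTRINSIC_DEMO
  ! Hypothetical module comparing a hand-written loop with SUM
  IMPLICIT NONE
CONTAINS
  FUNCTION COLUMN_TOTALS_LOOP (A) RESULT (TOTALS)
    ! Hand-written version, shown only for comparison
    REAL, INTENT(IN) :: A(:,:)
    REAL :: TOTALS(SIZE(A,2))
    INTEGER :: I, J
    TOTALS = 0.0
    DO J = 1, SIZE(A,2)
      DO I = 1, SIZE(A,1)              ! Leftmost subscript innermost
        TOTALS(J) = TOTALS(J) + A(I,J)
      END DO
    END DO
  END FUNCTION COLUMN_TOTALS_LOOP

  FUNCTION COLUMN_TOTALS (A) RESULT (TOTALS)
    ! Preferred: one call to the SUM array intrinsic
    REAL, INTENT(IN) :: A(:,:)
    REAL :: TOTALS(SIZE(A,2))
    TOTALS = SUM(A, DIM=1)
  END FUNCTION COLUMN_TOTALS
END MODULE INTRINSIC_DEMO
```

For a 3-by-4 array holding the values 1 through 12 in column-major order, both functions return (/6.0, 15.0, 24.0, 33.0/).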

Noncontiguous Access

With multidimensional arrays where access to array elements will be noncontiguous, avoid left-most array dimensions that are a power of two (such as 256, 512).

Because cache sizes are powers of two, array dimensions that are also powers of two can make inefficient use of the cache when array access is noncontiguous. If the cache size is an exact multiple of the leftmost dimension, your program will probably make little use of the cache. This does not apply to contiguous sequential access or whole array access.
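To see why, consider a hypothetical direct-mapped cache with 256 lines of 32 bytes (8 KB), and assume the array starts on a cache-line boundary. Walking along a row of REAL A(LD,100) steps through memory LD*4 bytes at a time, and each access maps to cache line MOD(offset/32, 256). The following sketch counts how many distinct cache lines the elements A(1,1) through A(1,100) map to. The cache geometry and the function name are assumptions for illustration, not a description of a particular Alpha processor.

```fortran
MODULE CACHE_MAP
  ! Hypothetical model of a direct-mapped cache
  IMPLICIT NONE
CONTAINS
  INTEGER FUNCTION DISTINCT_LINES (LD, NCOLS, ELEM_BYTES, LINE_BYTES, NLINES)
    ! Count the distinct cache lines touched by A(1,1), A(1,2), ...,
    ! A(1,NCOLS) when the leftmost dimension of A is LD
    INTEGER, INTENT(IN) :: LD, NCOLS, ELEM_BYTES, LINE_BYTES, NLINES
    LOGICAL :: USED(0:NLINES-1)
    INTEGER :: J, OFFSET
    USED = .FALSE.
    DO J = 0, NCOLS - 1
      OFFSET = J * LD * ELEM_BYTES               ! Byte offset of A(1,J+1)
      USED(MOD(OFFSET / LINE_BYTES, NLINES)) = .TRUE.
    END DO
    DISTINCT_LINES = COUNT(USED)
  END FUNCTION DISTINCT_LINES
END MODULE CACHE_MAP
```

Under these assumptions, DISTINCT_LINES(512, 100, 4, 32, 256) returns 4, so the 100 elements of a row compete for only four cache lines, while DISTINCT_LINES(520, 100, 4, 32, 256) returns 100.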

One work-around is to increase the dimension to allow some unused elements, making the leftmost dimension larger than actually needed. For example, increasing the leftmost dimension of A from 512 to 520 would make better use of cache:


   REAL A (512,100) 
   DO I = 2,511 
     DO J = 2,99 
       A(I,J)=(A(I+1,J-1) + A(I-1, J+1)) * 0.5 
     END DO 
   END DO 

In this code, array A has a leftmost dimension of 512, a power of two. The innermost loop accesses the rightmost dimension (row major), causing inefficient access. Increasing the leftmost dimension of A to 520 (REAL A (520,100)) allows the loop to provide better performance, but at the expense of some unused elements.

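You cannot simply interchange the DO loops here: each iteration reads elements (A(I+1,J-1) and A(I-1,J+1)) that other iterations update, so changing the nesting order of the loops changes the results. Padding the leftmost dimension improves cache use without changing the order of the calculation.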
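Padding, by contrast, leaves the computation unchanged. The following sketch, with hypothetical names, runs the example loop on an array whose leftmost dimension is passed as LD; calling it with LD = 512 and with LD = 520 on identical starting data yields identical values in the elements actually used, because only the memory layout differs.

```fortran
MODULE PADDED_STENCIL
  ! Hypothetical module wrapping the example loop
  IMPLICIT NONE
CONTAINS
  SUBROUTINE SMOOTH (A, LD)
    ! Same loop as the example; only rows 1 through 512 of A are used,
    ! so any rows beyond 512 are unused padding
    INTEGER, INTENT(IN) :: LD
    REAL, INTENT(INOUT) :: A(LD,100)
    INTEGER :: I, J
    DO I = 2, 511
      DO J = 2, 99
        A(I,J) = (A(I+1,J-1) + A(I-1,J+1)) * 0.5
      END DO
    END DO
  END SUBROUTINE SMOOTH
END MODULE PADDED_STENCIL
```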

