OpenVMS VAX RTL Mathematics (MTH$) Manual

Document revision date: 30 March 2001

Chapter 2
Vector Routines in MTH$

This chapter discusses four sets of routines provided by the RTL MTH$ facility that support vector processing. These routines are as follows:

Basic Linear Algebra Subroutines (BLAS) Level 1
First Order Linear Recurrence (FOLR) routines
Vector versions of existing scalar routines
Fast-Vector math routines

2.1 BLAS --- Basic Linear Algebra Subroutines Level 1

BLAS Level 1 routines perform vector operations, such as copying one vector to another, swapping vectors, and so on. These routines help you take advantage of vector processing speed. BLAS Level 1 routines form an integral part of many mathematical libraries, such as LINPACK and EISPACK.¹ Because these routines usually occur in the innermost loops of user code, the Run-Time Library provides versions of BLAS Level 1 that are tuned to take best advantage of the VAX vector processors.

Two versions of BLAS Level 1 are provided. To use either of these libraries, link in the appropriate shareable image. The libraries are:

Scalar BLAS --- contained in the shareable image BLAS1RTL
Vector BLAS (routines that take advantage of vectorization) --- contained in the shareable image VBLAS1RTL

Note

To call the scalar BLAS from a program that runs on scalar hardware, specify the routine name preceded by BLAS1$ (for example, BLAS1$xCOPY). To call the vector BLAS from a program that runs on vector hardware, specify the routine name preceded by BLAS1$V (for example, BLAS1$VxCOPY).

This manual describes both the scalar and vector versions of BLAS Level 1, but for simplicity the vector prefix (BLAS1$V) is used exclusively. Remember to remove the letter V from the routine prefix when you want to call the scalar version.

If you are a Compaq Fortran programmer, do not specify BLAS vector routines explicitly. Specify the Fortran intrinsic function name only. The Compaq Fortran 77 for OpenVMS VAX Systems compiler determines whether the vector or scalar version of a BLAS routine should be used. The Fortran /BLAS=([NO]INLINE,[NO]MAPPED) qualifier controls how the compiler processes calls to BLAS Level 1. If /NOBLAS is specified, then all BLAS calls are treated as ordinary external routines. The default of INLINE means that calls to BLAS Level 1 routines will be treated as known language constructs, and VAX object code will be generated to compute the corresponding operations at the call site, rather than call a user-supplied routine. If the Fortran qualifier /VECTOR or /PARALLEL=AUTO is in effect, the generated code for the loops may use vector instructions or be decomposed to run on multiple processors. If MAPPED is specified, these calls will be treated as calls to the optimized implementations of these routines in the BLAS1$ and BLAS1$V portions of the MTH$ facility. For more information on the Fortran /BLAS qualifier, refer to the DEC Fortran Performance Guide for OpenVMS VAX Systems.

Ten families of routines form BLAS Level 1. (BLAS1$VxCOPY is one family of routines, for example.) These routines operate at the vector-vector operation level. This means that BLAS Level 1 performs operations on one or two vectors. The level of complexity of the computations (in other words, the number of operations being performed in a BLAS Level 1 routine) is of the order n (the length of the vector).

Each family of routines in BLAS Level 1 contains routines coded in single precision, double precision (D and G formats), single precision complex, and double precision complex (D and G formats). BLAS Level 1 can be broadly classified into three groups:

BLAS1$VxCOPY, BLAS1$VxSWAP, BLAS1$VxSCAL and BLAS1$VxAXPY: These routines return vector outputs for vector inputs. The results of all these routines are independent of the order in which the elements of the vector are processed. The scalar and vector versions of these routines return the same results.
BLAS1$VxDOT, BLAS1$VIxAMAX, BLAS1$VxASUM, and BLAS1$VxNRM2: These routines are all reduction operations that return a scalar value. The results of these routines (except BLAS1$VIxAMAX) are dependent upon the order in which the elements of the vector are processed. The scalar and vector versions of BLAS1$VxDOT, BLAS1$VxASUM, and BLAS1$VxNRM2 can return different results. The scalar and vector versions of BLAS1$VIxAMAX return the same results.
BLAS1$VxROTG and BLAS1$VxROT: These routines are used for a particular application (plane rotations), unlike the routines in the previous two categories. The results of BLAS1$VxROTG and BLAS1$VxROT are independent of the order in which the elements of the vector are processed. The scalar and vector versions of these routines return the same results.

Table 2-1 lists the functions and corresponding routines of BLAS Level 1.

**Table 2-1 Functions of BLAS Level 1**
Function	Routine	Data Type
Copy a vector to another vector	BLAS1$VSCOPY	Single
	BLAS1$VDCOPY	Double (D-floating or G-floating)
	BLAS1$VCCOPY	Single complex
	BLAS1$VZCOPY	Double complex (D-floating or G-floating)

Swap the elements of two vectors	BLAS1$VSSWAP	Single
	BLAS1$VDSWAP	Double (D-floating or G-floating)
	BLAS1$VCSWAP	Single complex
	BLAS1$VZSWAP	Double complex (D-floating or G-floating)

Scale the elements of a vector	BLAS1$VSSCAL	Single
	BLAS1$VDSCAL	Double (D-floating)
	BLAS1$VGSCAL	Double (G-floating)
	BLAS1$VCSCAL	Single complex with complex scale
	BLAS1$VCSSCAL	Single complex with real scale
	BLAS1$VZSCAL	Double complex with complex scale (D-floating)
	BLAS1$VWSCAL	Double complex with complex scale (G-floating)
	BLAS1$VZDSCAL	Double complex with real scale (D-floating)
	BLAS1$VWGSCAL	Double complex with real scale (G-floating)

Multiply a vector by a scalar and add a vector	BLAS1$VSAXPY	Single
	BLAS1$VDAXPY	Double (D-floating)
	BLAS1$VGAXPY	Double (G-floating)
	BLAS1$VCAXPY	Single complex
	BLAS1$VZAXPY	Double complex (D-floating)
	BLAS1$VWAXPY	Double complex (G-floating)

Obtain the index of the first element of a vector having the largest absolute value	BLAS1$VISAMAX	Single
	BLAS1$VIDAMAX	Double (D-floating)
	BLAS1$VIGAMAX	Double (G-floating)
	BLAS1$VICAMAX	Single complex
	BLAS1$VIZAMAX	Double complex (D-floating)
	BLAS1$VIWAMAX	Double complex (G-floating)

Obtain the sum of the absolute values of the elements of a vector	BLAS1$VSASUM	Single
	BLAS1$VDASUM	Double (D-floating)
	BLAS1$VGASUM	Double (G-floating)
	BLAS1$VSCASUM	Single complex
	BLAS1$VDZASUM	Double complex (D-floating)
	BLAS1$VGWASUM	Double complex (G-floating)

Obtain the inner product of two vectors	BLAS1$VSDOT	Single
	BLAS1$VDDOT	Double (D-floating)
	BLAS1$VGDOT	Double (G-floating)
	BLAS1$VCDOTU	Single complex unconjugated
	BLAS1$VCDOTC	Single complex conjugated
	BLAS1$VZDOTU	Double complex unconjugated (D-floating)
	BLAS1$VWDOTU	Double complex unconjugated (G-floating)
	BLAS1$VZDOTC	Double complex conjugated (D-floating)
	BLAS1$VWDOTC	Double complex conjugated (G-floating)

Obtain the Euclidean norm of the vector	BLAS1$VSNRM2	Single
	BLAS1$VDNRM2	Double (D-floating)
	BLAS1$VGNRM2	Double (G-floating)
	BLAS1$VSCNRM2	Single complex
	BLAS1$VDZNRM2	Double complex (D-floating)
	BLAS1$VGWNRM2	Double complex (G-floating)

Generate the elements for a Givens plane rotation	BLAS1$VSROTG	Single
	BLAS1$VDROTG	Double (D-floating)
	BLAS1$VGROTG	Double (G-floating)
	BLAS1$VCROTG	Single complex
	BLAS1$VZROTG	Double complex (D-floating)
	BLAS1$VWROTG	Double complex (G-floating)

Apply a Givens plane rotation	BLAS1$VSROT	Single
	BLAS1$VDROT	Double (D-floating)
	BLAS1$VGROT	Double (G-floating)
	BLAS1$VCSROT	Single complex
	BLAS1$VZDROT	Double complex (D-floating)
	BLAS1$VWGROT	Double complex (G-floating)

For a detailed description of these routines, refer to the Vector MTH$ Reference Section of this manual.

2.1.1 Using BLAS Level 1

The following sections provide some guidelines for using BLAS Level 1.

2.1.1.1 Memory Overlap

The vector BLAS produces unpredictable results when any element of the input argument shares a memory location with an element of the output argument. (An exception is a special case found in the BLAS1$VxCOPY routines.)

The vector BLAS and the scalar BLAS can yield different results when the input argument overlaps the output array.
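The effect of overlap can be illustrated with a small model. The following Python sketch is not the RTL implementation. It assumes a hypothetical AXPY-style update, y(i) = a*x(i) + y(i), in which x and y are two windows onto the same buffer. It compares element-by-element processing with a simplified model of vector processing that reads all operands before storing any result. The function names and the load-then-store model are assumptions made for illustration only.

```python
def axpy_elementwise(buf, a, x_start, y_start, n):
    """Update y(i) = a*x(i) + y(i) one element at a time, in place."""
    out = list(buf)
    for i in range(n):
        out[y_start + i] = a * out[x_start + i] + out[y_start + i]
    return out


def axpy_load_then_store(buf, a, x_start, y_start, n):
    """Model a vector unit: read every operand first, then write every result."""
    out = list(buf)
    x = out[x_start:x_start + n]  # snapshot of the inputs before any store
    y = out[y_start:y_start + n]
    for i in range(n):
        out[y_start + i] = a * x[i] + y[i]
    return out


buf = [1, 2, 3, 4, 5]
# x occupies positions 0..3 and y occupies positions 1..4, so they overlap.
print(axpy_elementwise(buf, 1, 0, 1, 4))      # [1, 3, 6, 10, 15]
print(axpy_load_then_store(buf, 1, 0, 1, 4))  # [1, 3, 5, 7, 9]
```

In the element-by-element version, each store changes an input that a later iteration reads. In the load-then-store model it does not, so the two orders of processing disagree as soon as the arguments overlap.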

2.1.1.2 Round-Off Effects

For some of the routines in BLAS Level 1, the final result is independent of the order in which the operations are performed. However, in other cases (for example, some of the reduction operations), efficiency dictates that the order of operations on a vector machine be different from the natural order of operations. Because round-off errors are dependent upon the order in which the operations are performed, some of the routines will not return results that are bit-for-bit identical to the results obtained by performing the operations in natural order.
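The order dependence described above can be reproduced with any floating-point format. The following Python sketch uses IEEE 754 double precision rather than the VAX F, D, or G formats, and the two-lane partial-sum scheme is only an assumed stand-in for how a vector reduction might regroup the additions. The function names are hypothetical.

```python
def sum_natural(values):
    """Add the elements left to right, as a scalar loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_by_lanes(values, lanes):
    """Accumulate interleaved partial sums, then combine them."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    return sum_natural(partial)


values = [1e16, 1.0, -1e16, 1.0]
print(sum_natural(values))      # 1.0
print(sum_by_lanes(values, 2))  # 2.0
```

In natural order the first 1.0 is lost when it is added to 1e16. With two lanes, the large terms cancel in one lane while the small terms accumulate exactly in the other. Neither result is wrong in the sense of the arithmetic rules; they differ only because the additions were grouped differently.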

Where performance can be increased by the use of a backup data type, this has been done. This is the case for BLAS1$VSNRM2, BLAS1$VSCNRM2, BLAS1$VSROTG, and BLAS1$VCROTG. The use of a backup data type can also yield a gain in accuracy over the scalar BLAS.

2.1.1.3 Underflow and Overflow

In accordance with LINPACK convention, underflow, when it occurs, is replaced by a zero. A system message informs you of overflow. Because the order of operations for some routines is different from the natural order, overflow might not occur at the same array element in both the scalar and vector versions of the routines.

2.1.1.4 Notational Definitions

The vector BLAS routines (except the BLAS1$VxROTG routines) perform operations on vectors. These vectors are defined in terms of three quantities:

A vector length, specified as n
An array or a starting element in an array, specified as x
An increment or spacing parameter to indicate the distance in number of array elements to skip between successive vector elements, specified as incx

Suppose x is a real array of dimension ndim, n is its vector length, and incx is the increment used to access the elements of a vector X. The elements of vector X, X_i, i=1,...,n, are stored in x. If incx is greater than or equal to 0, then X_i is stored in the following location:

x(1+(i-1)*incx)

However, if incx is less than 0, then X_i is stored in the following location:

x(1+(n-i)*|incx|)

It therefore follows that the following condition must be satisfied:

ndim >= 1+(n-1)*|incx|

A positive value for incx is referred to as forward indexing, and a negative value is referred to as backward indexing. A value of zero implies that all of the elements of the vector are at the same location, x₁.

Suppose ndim = 20 and n = 5. In this case, incx = 2 implies that X₁, X₂, X₃, X₄, and X₅ are located in array elements x₁, x₃, x₅, x₇, and x₉.

If, however, incx = -2, then X₁, X₂, X₃, X₄, and X₅ are located in array elements x₉, x₇, x₅, x₃, and x₁. In other words, when incx is negative, the subscript of x decreases as i increases.

For some of the routines in BLAS Level 1, incx = 0 is not permitted. In the cases where a zero value for incx is permitted, it means that x₁ is broadcast into each element of the vector X of length n.
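The indexing rules above can be checked with a short Python sketch. The function names element_index and min_dimension are hypothetical helpers, not part of the MTH$ facility; they transcribe the two location formulas and the dimension condition given in this section, using 1-based subscripts as in the text.

```python
def element_index(i, n, incx):
    """Return the 1-based subscript of x that holds vector element X_i."""
    if not 1 <= i <= n:
        raise ValueError("i must be in the range 1..n")
    if incx >= 0:
        return 1 + (i - 1) * incx       # forward indexing (or broadcast if 0)
    return 1 + (n - i) * abs(incx)      # backward indexing


def min_dimension(n, incx):
    """Smallest ndim that satisfies ndim >= 1+(n-1)*|incx|."""
    return 1 + (n - 1) * abs(incx)


n = 5
print([element_index(i, n, 2) for i in range(1, n + 1)])   # [1, 3, 5, 7, 9]
print([element_index(i, n, -2) for i in range(1, n + 1)])  # [9, 7, 5, 3, 1]
print([element_index(i, n, 0) for i in range(1, n + 1)])   # [1, 1, 1, 1, 1]
print(min_dimension(n, -2))                                # 9
```

With ndim = 20 and n = 5, both incx = 2 and incx = -2 need only 9 array elements, so the dimension condition is satisfied.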

You can operate on vectors that are embedded in other vectors or matrices by choosing a suitable starting point of the vector. For example, if A is an n1 by n2 matrix, column j is referenced with a length of n1, starting point A(1,j), and increment 1. Similarly, row i is referenced with a length of n2, starting point A(i,1), and increment n1.
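As a rough illustration of embedded vectors, the following Python sketch models an n1 by n2 matrix held in column-major order (the Fortran convention, assumed here because the row increment given above is n1). The helper names column_vector, row_vector, and gather are hypothetical. Each returns or consumes a (start, length, increment) triple, where start is the 1-based position of the first element in the underlying storage.

```python
def column_vector(n1, n2, j):
    """Triple describing column j: starts at A(1,j), length n1, increment 1."""
    return (1 + (j - 1) * n1, n1, 1)


def row_vector(n1, n2, i):
    """Triple describing row i: starts at A(i,1), length n2, increment n1."""
    return (i, n2, n1)


def gather(storage, start, length, inc):
    """Collect the elements that the triple selects from linear storage."""
    return [storage[start - 1 + k * inc] for k in range(length)]


# A 3 by 2 matrix with A(i,j) = 10*i + j, stored column by column.
storage = [11, 21, 31, 12, 22, 32]
print(gather(storage, *column_vector(3, 2, 2)))  # [12, 22, 32]
print(gather(storage, *row_vector(3, 2, 2)))     # [21, 22]
```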

Note

¹ For more information, see Basic Linear Algebra Subprograms for FORTRAN Usage in ACM Transactions on Mathematical Software, Vol. 5, No. 3, September 1979.

2.2 FOLR --- First Order Linear Recurrence Routines

The MTH$ FOLR routines provide a vectorized algorithm for the linear recurrence relation. A linear recurrence uses the result of a previous pass through a loop as an operand for subsequent passes through the loop. This dependence between iterations prevents a compiler from vectorizing the loop directly.
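To make the loop-carried dependence concrete, the following Python sketch evaluates one common form of a first-order linear recurrence, a(i) = a(i-1)*b(i) + c(i). This particular form and the function name folr_multiply_add are assumptions chosen for illustration; the exact operations each FOLR routine performs are given in the reference section.

```python
def folr_multiply_add(a0, b, c):
    """Return [a_1, ..., a_n] where a_i = a_(i-1) * b_i + c_i and a_0 = a0."""
    results = []
    prev = a0
    for b_i, c_i in zip(b, c):
        prev = prev * b_i + c_i   # each pass needs the previous pass's result
        results.append(prev)
    return results


print(folr_multiply_add(1, [2, 2, 2], [1, 1, 1]))  # [3, 7, 15]
```

Because every iteration reads the value produced by the one before it, the iterations cannot simply be executed side by side, which is why a dedicated algorithm is needed.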

The only error checking performed by the FOLR routines is for a reserved operand.

There are four families of FOLR routines in the MTH$ facility. Each family accepts each of four data types (longword integer, F-floating, D-floating, and G-floating). However, all of the arrays you specify in a single FOLR call must be of the same data type.

For a detailed description of these routines, see Part 3.

2.2.1 FOLR Routine Name Format

The four families of FOLR routines are as follows:

MTH$VxFOLRy_MA_V15
MTH$VxFOLRy_z_V8
MTH$VxFOLRLy_MA_V5
MTH$VxFOLRLy_z_V2

where:

x = J for longword integer, F for F-floating, D for D-floating, or G for G-floating

y = P for a positive recursion element, or N for a negative recursion element

z = M for multiplication, or A for addition

The FOLR entry points end with _Vn, where n is an integer between 0 and 15 that denotes the highest-numbered vector register that the FOLR routine uses. For example, MTH$VxFOLRy_z_V8 uses vector registers V0 through V8.

To determine which group of routines you should use, find the row in Table 2-2 for the operation that you need the routine to perform, and the column for the method of storage that you need the routine to employ. The entry where the row and column meet shows the FOLR routine you should call.

**Table 2-2 Determining the FOLR Routine You Need**
Tasks	Save each iteration in an array	Save only last result in a variable
Multiplication AND addition	MTH$VxFOLRy_MA_V15	MTH$VxFOLRLy_MA_V5
Multiplication OR addition	MTH$VxFOLRy_z_V8	MTH$VxFOLRLy_z_V2
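The naming scheme and Table 2-2 can be combined into a small lookup. The Python sketch below is a hypothetical helper, not part of the MTH$ facility: it only assembles a name string from the x, y, and z codes and the register suffixes listed above, and does not check that the resulting entry point is actually available on a given system.

```python
def folr_routine_name(data_type, sign, operation, last_only=False):
    """Build a FOLR entry-point name from the codes described above.

    data_type: "J", "F", "D", or "G"          (the x code)
    sign:      "P" or "N"                     (the y code)
    operation: "MA" for both, or "M" or "A"   (the z code)
    last_only: True to save only the last result in a variable
    """
    if data_type not in ("J", "F", "D", "G"):
        raise ValueError("unknown data type code")
    if sign not in ("P", "N"):
        raise ValueError("unknown recursion element code")
    if operation not in ("MA", "M", "A"):
        raise ValueError("unknown operation code")

    # Register suffix for each (last_only, multiply-and-add) combination.
    suffix = {
        (False, True): 15,
        (False, False): 8,
        (True, True): 5,
        (True, False): 2,
    }[(last_only, operation == "MA")]

    family = "FOLRL" if last_only else "FOLR"
    return f"MTH$V{data_type}{family}{sign}_{operation}_V{suffix}"


print(folr_routine_name("F", "P", "MA"))                 # MTH$VFFOLRP_MA_V15
print(folr_routine_name("G", "N", "A", last_only=True))  # MTH$VGFOLRLN_A_V2
```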

2.2.2 Calling a FOLR Routine

Save the contents of V0 through Vn before calling a FOLR routine if you need them after the call. The variable n can be 2, 5, 8, or 15, depending on the FOLR routine entry point. (The OpenVMS Calling Standard specifies that a called procedure may modify all of the vector registers. The FOLR routines modify only the vector registers V0 through Vn.)

The MTH$ FOLR routines assume that all of the arrays are of the same data type.
