OpenVMS User's Manual

Document revision date: 19 July 1999

11.8 Optimizing a Sort or Merge Operation

There are several ways in which you can improve the efficiency of a Sort or Merge operation, based on your sorting environment. Use the /STATISTICS qualifier with the SORT or MERGE command to get information about the variables in your sorting environment.

After you examine the statistics display, consider any of the optimization options presented in the following sections.

When you enter the SORT or MERGE command with the /STATISTICS qualifier, you see output similar to the following:

$ SORT/STATISTICS PAGEANT.LIS DOCUMENT.LIS

                 OpenVMS Sort/Merge Statistics

Records read:              3 (1)   Input record length:        26
Records sorted:            3       Internal length:            28
Records output:            3       Output record length:       26
Working set extent:    16384 (2)   Sort tree size:             42
Virtual memory:          392       Number of initial runs:      0
Direct I/O:               10       Maximum merge order:         0
Buffered I/O:             11       Number of merge passes:      0
Page faults:             158 (3)   Work file allocation:        0 (4)
Elapsed time:    00:00:00.54       Elapsed CPU:       00:00:00.03 (5)

As you examine the fields, note the following:

Records read
Lists the number of records that were read during a Sort operation. See Section 11.8.2 for information on selectively omitting records from a Sort operation.
Working set extent
Shows how many blocks are reserved to perform the sort operation. See Section 11.8.4 for information on making your working set larger.
Page faults
Shows how many times the operating system has transferred parts of your process from physical memory to your paging device. See Section 11.8.4 for more information on preventing paging.
Work file allocation
Shows how much disk space is reserved for your work file. See Section 11.8.3 for more information on work files.
Elapsed CPU
Shows how much CPU time the operating system took to process the sort operation. See Section 11.8.1 for information on saving time by choosing different methods of sorting.
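If you save the statistics display to a file, a short script can extract the fields described above. The following Python sketch is illustrative only and is not part of OpenVMS. It assumes the display keeps the label: value layout of the example shown earlier; the names SAMPLE, FIELD, and parse_statistics are hypothetical.

```python
import re

# Display text copied from the SORT/STATISTICS example (callout numbers removed).
SAMPLE = """\
                 OpenVMS Sort/Merge Statistics

Records read:              3       Input record length:        26
Records sorted:            3       Internal length:            28
Records output:            3       Output record length:       26
Working set extent:    16384       Sort tree size:             42
Virtual memory:          392       Number of initial runs:      0
Direct I/O:               10       Maximum merge order:         0
Buffered I/O:             11       Number of merge passes:      0
Page faults:             158       Work file allocation:        0
Elapsed time:    00:00:00.54       Elapsed CPU:       00:00:00.03
"""

# A label starts with a capital letter and runs up to the next colon;
# the value is the run of non-blank characters that follows the colon.
FIELD = re.compile(r"([A-Z][A-Za-z/ ]*?):\s*(\S+)")

def parse_statistics(text):
    """Return a dict that maps each statistic label to its value (a string)."""
    return dict(FIELD.findall(text))

stats = parse_statistics(SAMPLE)

# Direct I/O plus buffered I/O gives the total number of I/O operations.
total_io = int(stats["Direct I/O"]) + int(stats["Buffered I/O"])
print(stats["Page faults"], total_io)
```

Comparing the parsed values from two runs (for example, page faults before and after a change to the working set extent) is one way to judge whether an optimization helped.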

11.8.1 Sorting Process

Sort defines four processes for sorting data internally: record, tag, address, and index. (The high-performance Sort/Merge utility supports only the record process. Implementation of tag, address, and index processes is deferred to a future OpenVMS Alpha release.) RECORD is the default process. The type of process you choose affects the performance of the Sort operation as well as storage requirements. See Section 11.2.6 for information about the different sort processes.

Before you select a sorting process, consider the following:

How you will use the output file
- Because record and tag sorting generate files that contain entire sorted records, these reordered files are ready to be used.
- Both address- and index-sorted output files can be processed by a program written in a programming language such as Pascal, Fortran, MACRO, or C.
- Address sorting creates an output file of pointers to the records in the input file. This list consists of binary RFAs plus a file number when sorting multiple input files. A program accesses the records by using the pointers.
- Index sorting creates an output file containing both RFAs and key fields plus a file number when sorting multiple files. The format of these key fields is the same as in the input files. If the program needs the key field contents for a decision during future processing, select index sorting rather than address sorting.
If you need to reorder records from one file in several ways for different purposes, store several output files from address or index sorting. Use the output files to access the records in the main file in the sorted order that you want.
The temporary storage space available for sorting
Tag sorting uses less temporary storage space than record sorting. Because record sorting keeps the record intact during the sort, it uses much more work space when the files are large. Address and index sorting use little temporary storage space.
The type of input and output device used
Record sorting is the only process that can accept input from cards, magnetic tape, and disks. Output from tag and record sorting can go to any output device. Output from address and index sorting must go to a device that accepts binary data.
The differences in speed
If you plan to retrieve the sorted records at some point in the operation, record sorting is usually the fastest process. Otherwise, address and index sorting are the fastest processes.
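The difference between the four processes lies in what is moved during the sort and what is written to the output file. The following Python sketch models that difference under simplifying assumptions: the records live in a list, a list index stands in for an RFA, and the key is the first four characters of each record. All names are hypothetical, and the sketch does not describe how Sort is implemented.

```python
# Hypothetical input file: the key is the first four characters of each record.
records = ["0042 widget", "0007 gasket", "0019 sprocket"]

def key_of(record):
    return record[0:4]

def keyed_pointers(recs):
    """Return (key, pointer) pairs in key order; a pointer is a list index."""
    return sorted((key_of(r), i) for i, r in enumerate(recs))

def record_sort(recs):
    # Moves whole records; the output holds complete records.
    return sorted(recs, key=key_of)

def tag_sort(recs):
    # Sorts only keys and pointers, then fetches each record for output.
    return [recs[i] for _, i in keyed_pointers(recs)]

def address_sort(recs):
    # The output holds only pointers into the input.
    return [i for _, i in keyed_pointers(recs)]

def index_sort(recs):
    # The output holds a pointer plus the key field for each record.
    return [(i, k) for k, i in keyed_pointers(recs)]

print(record_sort(records) == tag_sort(records))
print(address_sort(records))
print(index_sort(records))
```

In this model, a program that reads the address-sort output must go back to the input (records[i]) to see any data, while a program that reads the index-sort output can make decisions on the key without touching the input.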

11.8.2 Omitting Records and Fields

From a specification file, you can improve Sort efficiency by using the /CONDITION, /INCLUDE, and /OMIT qualifiers to process only those records needed in the output file. (The high-performance Sort/Merge utility does not support specification files. Implementation of this feature is deferred to a future OpenVMS Alpha release.) You can also use specification file qualifiers to reformat records, omitting unnecessary fields from the output file. These qualifiers are not available as command line qualifiers.
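The saving comes from doing less work: records that fail the condition are never sorted, and fields that are not needed are never written. The following Python sketch shows the idea only; it does not use specification file syntax. The record layout, the condition, and the names field and select_and_sort are assumptions made for the example.

```python
# Hypothetical fixed-width records:
# columns 1-4 account number, columns 5-6 state, columns 7-12 balance.
records = [
    "1003NY000250",
    "1001CA001200",
    "1002NY000000",
    "1000NY000010",
    "1004CA000075",
]

def field(record, position, size):
    """Return the field that starts at 1-based position and is size characters long."""
    start = position - 1
    return record[start:start + size]

def select_and_sort(recs):
    # Keep only New York records with a nonzero balance (the condition).
    wanted = [r for r in recs
              if field(r, 5, 2) == "NY" and int(field(r, 7, 6)) > 0]
    # Sort the remaining records by account number.
    wanted.sort(key=lambda r: field(r, 1, 4))
    # Write only the account and balance fields to the output.
    return [field(r, 1, 4) + field(r, 7, 6) for r in wanted]

print(select_and_sort(records))
```

Here only two of the five input records are sorted, and each output record is 10 characters long instead of 12.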

11.8.3 Assigning Work Files

During a Sort operation, records from the input file are read into memory. If the allocated memory cannot hold all the records, Sort transfers the sorted data to one or more temporary work files. Merge does not use work files.
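The following Python sketch illustrates the general technique of sorting with work files: sort as many records as fit in memory, write each sorted run to a temporary file, and then merge the runs. It is a simplified model, not the algorithm that Sort uses. In particular, it writes one work file per run, measures memory as a record count (memory_limit), and assumes that records contain no newline characters. The function name is hypothetical.

```python
import heapq
import tempfile

def sort_with_work_files(records, memory_limit):
    """Sort records, sorting at most memory_limit of them at a time.

    Returns the sorted records and the number of work files that were used.
    """
    if len(records) <= memory_limit:
        # Everything fits in memory, so no work file is needed.
        return sorted(records), 0

    work_files = []
    for start in range(0, len(records), memory_limit):
        # Sort one memory-sized chunk and write it out as a run.
        run = sorted(records[start:start + memory_limit])
        work_file = tempfile.TemporaryFile(mode="w+")
        work_file.writelines(record + "\n" for record in run)
        work_file.seek(0)
        work_files.append(work_file)

    # Merge the sorted runs back into a single ordered sequence.
    merged = [line.rstrip("\n") for line in heapq.merge(*work_files)]
    for work_file in work_files:
        work_file.close()
    return merged, len(work_files)

result, files_used = sort_with_work_files(
    ["pear", "apple", "fig", "kiwi", "date"], memory_limit=2)
print(result, files_used)
```

The merge step reads from every work file in turn, which is why placing work files on separate, fast devices can reduce the time spent waiting for I/O.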

You can increase sort efficiency by changing the number of work files and by assigning them to specific devices:

The Sort command line qualifier /WORK_FILES=n overrides the number of work files allocated.
Normally, Sort places work files on the device SYS$SCRATCH and accesses them in an arbitrary order. You can assign work files to specific devices in two ways:
- In a specification file, the /WORK_FILES=(device,...) qualifier places the work files on the specified devices. See Section 11.9.3 for more information about using the /WORK_FILES qualifier in a specification file.
- If you are not using a specification file, you can use the DCL command ASSIGN to assign the work files to specific devices.
  Sort uses the SORTWORKn logical names to identify user-specified device names for the workfiles, where n is a value from 0 through 9. (For the high-performance Sort/Merge utility, n is a value from 0 to 254.) Define a SORTWORKn logical as follows:
  ASSIGN device: SORTWORKn
  For example,
  $ ASSIGN WORK$2: SORTWORK1
  $ ASSIGN WORK$3: SORTWORK2
  This example defines SORTWORK1 as the device WORK$2: and SORTWORK2 as the device WORK$3:. For more information on logical names, see Chapter 13.

Consider the following when you assign work files to devices:

Assign work files to the fastest devices available, such as random-access mass storage devices (for example, disks).
Choose devices with the least activity and the most space available.
Assign each work file to a different physical device to maximize overlapping input and output.

11.8.4 Modifying the Working Set Extent

If Sort requires work files (for example, if you are sorting a large file), a larger working set can increase sort efficiency. However, if your system is used heavily, it might be unable to allocate all the pages in the working set extent to your process. This can result in paging, which occurs when the operating system transfers parts of a process between physical memory and memory on a paging device; only the active part of the process remains in the physical memory. To avoid excessive paging, you can decrease the working set extent for your process. (Use the SET WORKING_SET command to decrease the working set extent.)

11.9 Summary of Sort/Merge Qualifiers

The following list describes command qualifiers used with the SORT and MERGE commands. To use a command qualifier, include the qualifier immediately after the SORT or MERGE command.

/[NO]CHECK_SEQUENCE

This qualifier applies to the MERGE command only. It verifies the sequence of the records in MERGE input files. Merge checks the sequence of records by default.
The /CHECK_SEQUENCE qualifier checks whether the records of one or more files (up to 10; the high-performance Sort/Merge utility supports up to 12) have been sorted. (The records will still be directed to an output file, which you must specify.) If you are checking whether records are sorted on a key field other than the entire record, you must specify key information, along with the requesting sequence.
Use the /NOCHECK_SEQUENCE qualifier to prevent Merge from checking the sequence of records.
Example
$ MERGE/KEY=(SIZE:4,POSITION:3)/NOCHECK_SEQUENCE -
_$ PRICE1.DAT,PRICE2.DAT PRICE.LIS
In this example, the /NOCHECK_SEQUENCE qualifier specifies that the sequence of the input files, PRICE1.DAT and PRICE2.DAT, is not to be checked.
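The following Python sketch models what sequence checking means for a merge: each input must already be in key order, and the check confirms this before the inputs are combined. It is an illustration under stated assumptions, not a description of how Merge works internally. Records are strings, the key is a character field given by a 1-based position and a size, and the names key_field and merge_files are hypothetical.

```python
import heapq

def key_field(record, position, size):
    """Return the key that starts at 1-based position and is size characters long."""
    return record[position - 1:position - 1 + size]

def merge_files(inputs, position, size, check_sequence=True):
    """Merge already-sorted inputs on one character key."""
    def key(record):
        return key_field(record, position, size)

    if check_sequence:
        # Compare each record with its successor in the same input.
        for number, recs in enumerate(inputs, start=1):
            for previous, current in zip(recs, recs[1:]):
                if key(previous) > key(current):
                    raise ValueError(f"input {number} is out of sequence")
    return list(heapq.merge(*inputs, key=key))

# Hypothetical data: the key occupies positions 3 through 6.
price1 = ["AA0100 bolt", "AB0250 nut"]
price2 = ["BA0175 washer", "BB0300 screw"]
print(merge_files([price1, price2], position=3, size=4))
```

Passing check_sequence=False skips the verification, which saves the comparisons but gives no warning if an input is not actually sorted.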

/COLLATING_SEQUENCE=sequence

Selects one of three predefined collating orders for character key fields, or specifies the name of a National Character Set (NCS) collating sequence to be used in comparing character keys. (The high-performance Sort/Merge utility does not support the NCS collating sequences. Support for NCS collating sequences is deferred to a future OpenVMS Alpha release.) Sort can arrange characters in ASCII (default), EBCDIC, or Multinational sequences.
Example
$ SORT/COLLATING_SEQUENCE=MULTINATIONAL -
_$ NAMES.DAT,NOM.DAT LIST.LIS
This SORT command arranges the input files NAMES.DAT and NOM.DAT according to the Multinational collating sequence to create the output file LIST.LIS.

/[NO]DUPLICATES

By default, Sort retains all multiple records with duplicate keys. The /NODUPLICATES qualifier eliminates all but one of multiple records with duplicate keys. The retained records may not appear in the same order as they appeared in the input file. If you want to specify which duplicate record to keep, invoke Sort at the program level and specify an equal-key routine.
The /STABLE and the /NODUPLICATES qualifiers are mutually exclusive.
Example
$ SORT/KEY=(POSITION:3,SIZE:5,DECIMAL)/NODUPLICATES -
_$ ACCT1,ACCT2 ACCT.LIS
This SORT command arranges the two input files according to the key supplied and eliminates all but one of multiple records with equal keys.
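The following Python sketch shows the effect of eliminating duplicates: after the sort, records with equal keys are adjacent, and one record from each group is kept. It is a model of the result only. The sketch happens to keep the first record of each group, whereas Sort makes no promise about which record is retained. The sample data and the name sort_no_duplicates are hypothetical, and keys are compared as zero-padded character strings.

```python
from itertools import groupby

def sort_no_duplicates(records, position, size):
    """Sort on one character key and keep one record for each distinct key."""
    def key(record):
        return record[position - 1:position - 1 + size]

    ordered = sorted(records, key=key)
    # groupby yields runs of adjacent records with equal keys; keep one per run.
    return [next(group) for _, group in groupby(ordered, key=key)]

# Hypothetical combined input: the key occupies positions 3 through 7.
accounts = ["A-00042 rent", "B-00007 fuel", "C-00042 lease", "D-00019 tools"]
unique = sort_no_duplicates(accounts, position=3, size=5)
print(len(accounts), len(unique))
```

The difference between the two counts corresponds to the difference between the Records sorted and Records output statistics.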

/KEY=(POSITION:n,SIZE:n[,field,...])

Describes key fields, including the position, size, sorting order (ASCENDING or DESCENDING), priority (NUMBER:n), and data type (such as character, binary, h_floating). By default, Sort reorders a file by sorting entire records with character data in ascending order.
See Section 11.2.1 for detailed information about the /KEY qualifier.
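The following Python sketch illustrates how position, size, sorting order, and priority combine when a file is sorted on more than one key field. It is a model of the resulting order, not of Sort's internals. It assumes character keys, 1-based positions, and fixed-width records; the sample data and the name sort_by_keys are hypothetical.

```python
def sort_by_keys(records, keys):
    """Sort fixed-width records on several character keys.

    keys lists (position, size, descending) tuples, highest priority first.
    Positions are 1-based.
    """
    ordered = list(records)
    # Apply the lowest-priority key first. Each later pass is stable, so it
    # preserves the order set by earlier passes among records that tie.
    for position, size, descending in reversed(keys):
        start = position - 1
        ordered = sorted(ordered,
                         key=lambda r, s=start, n=size: r[s:s + n],
                         reverse=descending)
    return ordered

# Hypothetical records: columns 1-2 state, columns 3-6 amount.
sales = ["NY0300", "CA0100", "NY0100", "CA0200"]

# First key: state, ascending. Second key: amount, descending.
print(sort_by_keys(sales, [(1, 2, False), (3, 4, True)]))
```

The second key matters only among records whose first key is equal, which is what key priority means.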

/PROCESS=type

(Applies to the SORT command only.) Defines the internal sorting process. The /PROCESS qualifier allows you to choose one of four processes: record, tag, address, or index. (The high-performance Sort/Merge utility supports only the record process. Implementation of tag, address, and index processes is deferred to a future OpenVMS Alpha release.)
See Section 11.2.6 for detailed information about the /PROCESS qualifier.
Example
$ SORT/KEY=(POS:40,SIZ:2,DESC)/PROCESS=TAG YRENDAVG.DAT -
_$ DESCYRAVG.LIS
This Sort operation uses a tag sorting process to create the output file DESCYRAVG.LIS.

/SPECIFICATION=filespec

(The high-performance Sort/Merge utility does not support this qualifier. Implementation of this feature is deferred to a future OpenVMS Alpha release.)

Identifies a Sort or Merge specification file to be used in a Sort or Merge operation. The default specification file type is .SRT.
See Section 11.7 and Section 11.9.3 for information about using specification files.

/[NO]STABLE

By default, records with equal keys are not guaranteed to be placed in the output file in the order they appear in the input file. The /STABLE qualifier maintains the records in that order.
The /STABLE and /NODUPLICATES qualifiers are mutually exclusive.
Example
$ SORT/KEY=(POS:1,SIZ:5,DECIMAL)/STABLE PRICESA.DAT, -
_$ PRICESB.DAT,PRICESC.DAT SUMMARY.LIS
In this Sort operation, records with equal keys from PRICESA.DAT will be listed first, followed by those from PRICESB.DAT, followed by those from PRICESC.DAT.
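The following Python sketch shows what a stable sort guarantees. Python's built-in sorted function is stable, so it can model the behavior described here, although it says nothing about how Sort achieves the same result. The file contents and the name stable_sort are hypothetical; the key is a zero-padded character field in positions 1 through 5.

```python
# Hypothetical contents of three input files, listed in command line order.
prices_a = ["00100 from A", "00300 from A"]
prices_b = ["00100 from B", "00200 from B"]
prices_c = ["00100 from C"]

def stable_sort(inputs, position, size):
    """Sort the concatenated inputs; records with equal keys keep input order."""
    combined = [record for recs in inputs for record in recs]
    return sorted(combined,
                  key=lambda r: r[position - 1:position - 1 + size])

for record in stable_sort([prices_a, prices_b, prices_c], position=1, size=5):
    print(record)
```

The three records with key 00100 appear in the order A, B, C, matching the order in which their files were named.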

/[NO]STATISTICS

Displays a statistical summary to SYS$OUTPUT that can be used for optimization. To save these statistics in a file, use the following command:

$ DEFINE/USER SYS$ERROR output-file

The statistical summary contains the following information:

Statistic	Description
Records read	The number of records read by Sort or Merge.
Records sorted	The number of records that have been processed using Sort. This number could be less than the number of records read if a specification file is used to select only certain records for the Sort or Merge operation.
Records output	The number of records written to the output file. This number could be less than the number of records sorted if /NODUPLICATES was selected or if I/O errors occurred when the output records were being written.
Working set extent	The number of pages in the process working set extent. This value is used as an upper limit on the size of the sort data structure. Adjusting this value is one way to improve the efficiency of a Sort operation.
Virtual memory	The number of pages of virtual memory added to the Sort image to hold the data.
Direct I/O + buffered I/O	This total is the number of I/O movements needed to read and write data. The lower this total value is, the more efficient the ordering operation.
Page faults	Indicates how well the data fits into memory: the higher the number of page faults, the less efficient the ordering operation.
Elapsed time	The total wall clock time used by the Sort or Merge operation in hours, minutes, seconds, and hundredths of seconds.
Input record length	This value is obtained from the Record Management Services (OpenVMS RMS) unless the user supplies it.
Internal length	The size in bytes of an internal format node. This includes any keys, data, a word to store the length, record file addresses (RFAs), and converted keys.
Output record length	The length of the output record. The length is computed from the input record length, the sort process, and the record reformatting requested.
Sort tree size	The number of records that fit in the Sort internal data structure.
Number of initial runs	One indication of how well the data fits into memory.
Maximum merge order	The maximum number of sorted strings that are merged at one time.
Number of merge passes	The number of times the Sort utility merges strings until one sorted output string is produced. The number of initial runs and the number of merge passes indicate how well the data fits into memory. The higher these numbers, the further the working set size is from containing the data and the longer the sorting takes.
Work file allocation	The number of blocks used for the work files. When more than one merge pass is needed, this size is approximately twice the size of the input file allocation.
Elapsed CPU	The CPU time used by the ordering operation; it does not include time spent waiting for I/O operations to complete or time spent waiting while another process executes.

Example

$ SORT/STATISTICS PRICE1.DAT,PRICE2.DAT PRICE.LIS

This SORT/STATISTICS command results in the following statistical display:

                 OpenVMS Sort/Merge Statistics

Records read:            793       Input record length:        80
Records sorted:          793       Internal length:            80
Records output:          793       Output record length:       80
Working set extent:      100       Sort tree size:            412
Virtual memory:          433       Number of initial runs:      2
Direct I/O:               22       Maximum merge order:         2
Buffered I/O:              9       Number of merge passes:      1
Page faults:            3418       Work file allocation:      114
Elapsed time:    00:00:05.98       Elapsed CPU:       00:00:03.63

/WORK_FILES[=n]

(Applies to the SORT command only.) Increases the number of Sort work files to any number from 1 through 10 (the high-performance Sort/Merge utility supports up to 255), which makes each work file smaller. If the available disks are too small or too full for work files, increasing the number of files can improve the efficiency of the Sort operation.
Sort does not create work files until it needs them. If Sort needs work files, it creates two by default (SORTWORK0, SORTWORK1), which are placed in the SYS$SCRATCH directory.
Example
$ ASSIGN DRA5: SORTWORK0
$ ASSIGN DB0: SORTWORK1
$ ASSIGN DB1: SORTWORK2
$ SORT/KEY=(POS:1,SIZ:80)/WORK_FILES=3 -
_$ STATS1,STATS2,STATS3,STATS4 SUMMARY.LIS
Because the input files in this Sort operation are large files, specifying three work files improves the efficiency of the sort operation.
Note that you can also assign the work files to a specific directory on a device by including the directory name. For example, to assign SORTWORK0 to the [WORKSPACE] directory on DRA5, enter the following command:
$ ASSIGN DRA5:[WORKSPACE] SORTWORK0

11.9.1 Input File Qualifier

The following input qualifier should be included immediately after the input file specification in the SORT or MERGE command line:

/FORMAT=(RECORD_SIZE:n,FILE_SIZE:n)

Defines input file characteristics; allows you to specify or override record or file size. It must be specified immediately after the input file specification in the Sort or Merge command line.
Sort uses input file size information to determine the amount of memory needed, as well as the size of the work files for the Sort operation. If the file size is unknown (for example, you are sorting files that do not reside on disk or standard ANSI magnetic tape), Sort assumes a fairly large file size.
Specify the following qualifier values:

RECORD_SIZE: n

Specifies the input file's longest record length (LRL) in bytes. The maximum longest record length that can be specified depends on the file organization:

Sequential	32,767
Relative	16,383
Indexed-sequential	16,362

These values include control bytes for variable records with fixed-length control (VFC) format.

FILE_SIZE: n

Specifies input file size in blocks. The maximum file size accepted is 4,294,967,295 blocks.

You can also use /FORMAT as an output file qualifier. See Section 11.9.2 for more information.
Example

$ SORT/KEY=(POS:40,SIZ:2,DESC) -
_$ CRA0:YRENDAVG.DAT/FORMAT=(RECORD_SIZE:41,FILE_SIZE:3) -
_$ DESCYRAVG.LIS

Because the input file YRENDAVG.DAT does not reside on a disk device or ANSI magnetic tape, file organization must be described by the /FORMAT qualifier.

11.9.2 Output File Qualifiers

The following output qualifiers can be used with the SORT and MERGE commands. To use an output file qualifier, include the qualifier immediately after the output file specification in the SORT or MERGE command line.

/ALLOCATION=n

Specifies the number of blocks, from 1 through 4,294,967,295, to be preallocated to the output file for optimization. Use this qualifier when you know that the output file allocation will differ substantially from the total input file allocation (for example, when reformatting data or omitting records).
The /ALLOCATION qualifier is required if the /CONTIGUOUS qualifier is used.
Example
$ SORT/KEY=(POS:1,SIZ:80) STATS.DAT -
_$ SUMMARY.LIS/ALLOCATION=1000/CONTIGUOUS
This SORT command allocates 1,000 contiguous blocks for the output file SUMMARY.LIS.

/BUCKET_SIZE=n

Specifies OpenVMS RMS bucket size (the number of 512-byte blocks per bucket) to be used by relative and indexed sequential output disk files for optimization. A value of 1 through 32 is allowed.
If the output file organization is the same as for the input files, the default value is the same as the bucket size of the first input file. If output file organization is different, the default value is 1.
Example
$ SORT/KEY=(POS:1,SIZ:80) STATS1.DAT,STATS2.DAT -
_$ SUMMARY.LIS/BUCKET_SIZE=16/RELATIVE
This SORT command creates the output file SUMMARY.LIS with relative organization and a bucket size of 16.

/CONTIGUOUS

Requests that the output file be stored in contiguous disk blocks to decrease access time. Must be used with the /ALLOCATION qualifier. By default, Sort/Merge does not allocate contiguous disk blocks for the output file.
Example
$ SORT/KEY=(POS:1,SIZ:80) STATS.DAT -
_$ SUMMARY.LIS/ALLOCATION=1000/CONTIGUOUS
This SORT command allocates 1,000 contiguous blocks for the output file SUMMARY.LIS.

/FORMAT=(type:n[,...])

Specifies the output file record format (FIXED:n, VARIABLE:n, or CONTROLLED:n) if it differs from the input file format. You can also specify the size (SIZE:n) or the block size (BLOCK_SIZE:n) of the file records.
If the Sort operation is a record or tag sort, the default output record format is the same as the first input file record format. If the Sort operation is an address or index sort, the default output record format is fixed record format. If the input files have different record formats, Sort provides an output record size that is large enough to contain the largest record in the input files.
You can specify the following qualifier values.

BLOCK_SIZE: n	Specifies the output file's block size, in bytes, if you have directed the file to magnetic tape. If the input file is a tape file, the block size of the output file defaults to that of the input file. Otherwise, the output file block size defaults to the size used when the tape was mounted.
	Acceptable values for n range from 20 to 65,532. To ensure correct data interchange with other Digital systems, however, specify a block size of not more than 512 bytes. For compatibility with systems that are not made by Digital, the block size should not exceed 2,048 bytes.
CONTROLLED: n	Specifies variable with fixed-length control (VFC) records in the output file.
FIXED: n	Specifies fixed-length records in the output file.
SIZE: n	Specifies the size, in bytes, of the fixed portion of VFC (CONTROLLED) records, up to a maximum of 255 bytes. If you do not specify SIZE, the default is the size of the fixed portion of the first input file. If you specify this size as 0, OpenVMS RMS defaults the value to 2 bytes.
VARIABLE: n	Specifies variable-length records in the output file.

For any qualifier value, you can optionally specify n as the maximum record size (in bytes) of the output records. The maximum record size allowed depends on the file organization:

Sequential files	32,767
Relative files	16,383
Indexed-sequential files	16,362

These maximum record size values include control bytes for variable records with fixed-length control (VFC) format.
Example
