The compiler allocates unique storage space in the Data Division for each file's current record area. Transferring records between files requires an intermediate buffer area and adds to a program's processing requirements.
To reduce address space and processing overhead, files can share current record areas. Specify the SAME RECORD AREA clause in the I-O-CONTROL paragraph of the Environment Division. Records need not be the same size, nor must the maximum size of each current record area be the same.
Figure 15-1 shows the effect of current record area sharing in a program that reads records from one file and writes them to another. However, it also shows a drawback: current record area sharing is equivalent to implicit redefinition. The records do not exist separately. Therefore, if the program changes a record defined for the output file, the input file record is no longer available.
Figure 15-1 Sharing Record Areas
15.6.4 Using COMP Unsigned Longword Integers
The compiler generates the most efficient code to process the following
clauses if a COMP unsigned longword integer (that is, PIC 9(9) COMP) is
used in those cases where a variable is needed:
RELATIVE KEY
DEPENDING ON
LINAGE IS
WITH FOOTING AT
LINES AT TOP
LINES AT BOTTOM
ADVANCING LINES
This section provides information on how to optimize the following file types:
Sequential files (Section 15.7.1)
Relative files (Section 15.7.2)
Indexed files (Section 15.7.3)
For a full discussion of file types, see Chapter 6.
15.7.1 Sequential Files
Sequential files have the simplest structure and the fewest options for definition, population, and handling. You can reduce the number of disk accesses by minimizing record length.
With a sequential disk file, you can use multiblocking to access a buffer area larger than the default. Because the system transfers disk data in 512-byte blocks, a buffer size that is a multiple of 512 bytes improves I/O access time. In the following example, the multiblock count (four) causes reads and writes to FILE-A to access a buffer area of four physical blocks:
FILE SECTION.
FD  FILE-A
    BLOCK CONTAINS 2048 CHARACTERS
    .
    .
    .
If you do not want to calculate the buffer size, but want to specify the number of records in each buffer, use the BLOCK CONTAINS n RECORDS clause. The following example specifies a buffer large enough to hold 15 records:
BLOCK CONTAINS 15 RECORDS
When using the BLOCK CONTAINS n RECORDS clause for sequential files on disk, RMS calculates the buffer size by using the maximum record unit size and rounding up to a multiple of 512 bytes. Consequently, the buffer could hold more records than you specify.
In the following example, the BLOCK CONTAINS clause specifies five records. RMS calculates the block size as eight records, or 512 bytes.
FILE SECTION.
FD  FILE-A
    BLOCK CONTAINS 5 RECORDS.
01  FILE-A-REC PIC X(64).
    .
    .
    .
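The rounding behavior described above can be sketched in Python (illustrative only; this models the arithmetic, not actual RMS code):

```python
# Sketch: how RMS derives the buffer size for BLOCK CONTAINS 5 RECORDS
# with fixed-length 64-byte records on disk.
BLOCK_SIZE = 512  # bytes per physical disk block

def buffer_size(record_size, records_requested):
    """Round the requested capacity up to the next multiple of 512 bytes."""
    requested = record_size * records_requested      # 5 * 64 = 320 bytes
    blocks = -(-requested // BLOCK_SIZE)             # ceiling division
    return blocks * BLOCK_SIZE

size = buffer_size(64, 5)      # 320 bytes requested -> one 512-byte block
records_held = size // 64      # the buffer actually holds 8 records, not 5
```

This reproduces the example above: five 64-byte records need 320 bytes, which rounds up to one 512-byte block, so the buffer in fact holds eight records.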
By contrast, for magnetic tape, the program code entirely controls
blocking. Hence, efficiency of the program and the file depends on the
programmer's care with magnetic-tape blocking.
15.7.2 Relative Files
I/O optimization of a relative file depends on four concepts:
Maximum record number (MRN)
Cell size
Bucket size
File size
15.7.2.1 Maximum Record Number (MRN)
If you create a relative file with a Compaq COBOL program, the system sets the maximum record number (MRN) to 0, allowing the file to expand to any size.
If you create a relative file with the CREATE/FDL Utility, select a
realistic MRN, since an attempt to insert a record with a number higher
than the MRN will fail.
15.7.2.2 Cell Size
The system calculates cell size. (However, you can specify a different cell size when you create the file by using the RECORD CONTAINS clause in the file description.) You cannot write records larger than the specified cell size.
Avoid selecting a cell size larger than necessary since this wastes disk space. To optimize the packing of cells into buckets, cell size should be evenly divisible into bucket size.
The system calculates cell size using these formulas:
Fixed-length records:     cell size = 1 + record size
Variable-length records:  cell size = 3 + record size
For fixed-length records, the overhead byte is a record deletion indicator. Variable-length records use two additional overhead bytes to indicate record length. The following example calculates a cell size of 101 for fixed-length records:
FD  A-FILE
    RECORD CONTAINS 100 CHARACTERS
    .
    .
    .
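The two cell-size formulas can be sketched in Python (illustrative only; the overhead bytes are those stated above):

```python
# Sketch of the cell-size formulas for relative files.
def cell_size(record_size, variable_length=False):
    """One overhead byte marks record deletion; variable-length records
    add two more bytes to record the length."""
    overhead = 3 if variable_length else 1
    return overhead + record_size

cell_size(100)        # fixed-length example above: 1 + 100 = 101
cell_size(100, True)  # variable-length: 3 + 100 = 103
```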
15.7.2.3 Bucket Size
Bucket size ranges from 1 to 63 blocks. A large bucket improves sequential access to a relative file. You can prevent wasted space between the last cell and the end of a bucket by specifying a bucket size that is a multiple of the cell size.
If you omit the BLOCK CONTAINS clause, the system calculates a bucket size large enough to hold at least one cell or 512 bytes, whichever is larger (that is, large enough to hold a record and its overhead bytes). Records cannot cross bucket boundaries, although they can cross block boundaries.
Use the BLOCK CONTAINS n CHARACTERS clause of the file description to set your own bucket size (in bytes per bucket). Consider the following example:
FILE-CONTROL.
    SELECT A-FILE
    ORGANIZATION IS RELATIVE.
    .
    .
    .
DATA DIVISION.
FILE SECTION.
FD  A-FILE
    RECORD CONTAINS 60 CHARACTERS
    BLOCK CONTAINS 1536 CHARACTERS
    .
    .
    .
In the preceding example, the bucket size is 3 blocks (1536 bytes). Each bucket contains:
25 cells of 61 bytes each (1525 bytes)
11 unused bytes
If you use the BLOCK CONTAINS CHARACTERS clause and specify a value that is not a multiple of 512, the I/O system rounds the value to the next higher multiple of 512.
In the following example, the BLOCK CONTAINS clause specifies one record per bucket. Because the cell needs only 61 bytes, there are 451 wasted bytes in each bucket.
FILE-CONTROL.
    SELECT B-FILE
    ORGANIZATION IS RELATIVE.
    .
    .
    .
DATA DIVISION.
FILE SECTION.
FD  B-FILE
    RECORD CONTAINS 60 CHARACTERS
    BLOCK CONTAINS 1 RECORD.
    .
    .
    .
To improve I/O access time: (1) specify a small bucket size, and (2) use the BLOCK CONTAINS n RECORDS clause to specify the number of records (cells) in each bucket. This example creates buckets that contain eight records.
FD  A-FILE
    RECORD CONTAINS 60 CHARACTERS
    BLOCK CONTAINS 8 RECORDS.
    .
    .
    .
In the preceding example, the bucket size is one 512-byte block. Each bucket contains:
8 cells of 61 bytes each (488 bytes)
24 unused bytes
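The packing arithmetic for this example can be sketched in Python (illustrative only, using the cell overhead stated earlier):

```python
# Sketch: packing a 512-byte bucket with 60-byte fixed-length records.
cell = 1 + 60                        # 61-byte cell (1 overhead byte)
bucket = 512                         # one physical block
cells_per_bucket = bucket // cell    # 8 cells fit
used = cells_per_bucket * cell       # 488 bytes occupied
wasted = bucket - used               # 24 unused bytes per bucket
```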
15.7.2.4 File Size
Calculating a file's size helps you determine its space requirements. A file's size is a function of its bucket size. When you create a relative file, use the following calculations to determine the number of blocks that you need, rounding up the result in each case:
1. Divide the number of records in the file by the number of records (cells) in each bucket to get the number of buckets.
2. Multiply the number of buckets by the bucket size, in blocks, to get the total number of blocks.
Assume that you want to create a relative file able to hold 3,000 records. The records are 255 bytes long (plus 1 byte per record for overhead), with 4 cells to a bucket (BLOCK CONTAINS 4 RECORDS). To determine file size (see Section 15.7.2.3):
Bucket size = 4 cells of 256 bytes = 1024 bytes = 2 blocks
Number of buckets = 3000 / 4 = 750
File size = 750 buckets * 2 blocks = 1500 blocks
To allocate the 1500 calculated blocks to populate the entire file, use the APPLY CONTIGUOUS-BEST-TRY PREALLOCATION clause; otherwise, allocate fewer blocks.
Before writing a record to a relative file, the I/O system must have
formatted all buckets up to and including the bucket to contain the
record. Each time bucket reformatting occurs, response time suffers.
Therefore, writing the highest-numbered record first forces formatting
of the entire file only once. However, this technique can waste disk
space if the file is only partially loaded and not preallocated.
15.7.3 Indexed Files
An indexed file contains data records and pointers to facilitate record access.
All data records and record pointers are stored in buckets. A bucket contains an integral number of contiguous, 512-byte blocks. The number of blocks is the bucket size.
Every indexed file must have a primary key, a field in the record description that contains a value for each record. When the I/O system writes records to the indexed file, it collates them according to increasing primary key value in a series of chained buckets. Thus, you can access the records sequentially by specifying ACCESS SEQUENTIAL.
As the I/O system writes records, it builds and maintains a tree-like structure of key-value and location pointers. The highest level of the index is a single bucket, called the root bucket. The I/O system scans one bucket at each level until it reaches the bottom, or data level. In a primary key index, this level contains actual data records. Buckets in each higher level, called index levels, contain index records. Successive levels of an index file are numbered. The data level is level zero. The number of levels above level zero is called the index depth. Figure 15-2 shows a 2-level primary index.
Figure 15-2 Two-Level Primary Index
An index is also built for each alternate key that you define for the file. Like the primary index, alternate key indexes are contained in the file. The collating and chaining done for primary keys are also done for alternate keys. However, alternate keys do not contain data records at the data level; instead, they contain pointers, or secondary index data records (SIDRs), to data records in the data level of the primary index.
Each random access request begins by comparing a key value to the root bucket's entries. It seeks the first root bucket entry whose key value equals or exceeds the value of the access request key. (This search is always successful, because the root bucket's highest key value is the highest possible value that the key field can contain.) Once that key value is located, the bucket pointer is used to bring the target bucket on the next lower level into memory. This process is repeated for each level of the index.
One bucket is searched at each level of the index until a target bucket is reached at the data level. The data record's location is then determined so that a record can be retrieved or a new record written.
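The descent described above can be sketched in Python (a hypothetical in-memory layout stands in for RMS buckets; each loop iteration corresponds to one bucket read per index level):

```python
# Illustrative sketch, not RMS internals: search each index level for the
# first entry whose key value equals or exceeds the target, then follow
# its pointer one level down until a data-level bucket is reached.
import bisect

def find_data_bucket(root, target_key):
    """An index bucket is a sorted list of (high_key, child) pairs;
    a data bucket is a plain list of record keys."""
    bucket = root
    while isinstance(bucket[0], tuple):           # still at an index level
        keys = [high_key for high_key, _ in bucket]
        i = bisect.bisect_left(keys, target_key)  # first key >= target
        bucket = bucket[i][1]                     # one access per level
    return bucket

# Two-level example: one root bucket pointing to two data buckets.
data1, data2 = ["ADAMS", "BAKER"], ["JONES", "SMITH"]
root = [("BAKER", data1), ("SMITH", data2)]
find_data_bucket(root, "JONES")    # descends to the second data bucket
```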
A data level bucket may not be large enough to contain a new record. In this case, the I/O system inserts a new bucket in the chain, moving enough records from the old bucket to preserve the key value sequence. This is known as a bucket split.
Data bucket splits can cause index bucket splits.
15.7.3.1 Optimizing Indexed File I/O
I/O optimization of an indexed file depends on five concepts:
Record format
Bucket splits
Multiple keys and duplicate keys
Bucket size
Index depth
Variable-length records can save file space: you need write only the primary record key data item (plus alternate keys, if any) for each record. In contrast, fixed-length records require that all records be equal in length.
For example, assume that you are designing an employee master file. A variable-length record file lets you write a long record for a senior employee with a large amount of historical data, and a short record for a new employee with less historical data.
In the following example of a variable-length record description, integer 10 of the RECORD VARYING clause represents the length of the primary record key, while integer 80 describes the length of the longest record in A-FILE:
FILE-CONTROL. SELECT A-FILE ASSIGN TO "AMAST" ORGANIZATION IS INDEXED. DATA DIVISION. FILE SECTION. FD A-FILE ACCESS MODE IS DYNAMIC RECORD KEY IS A-KEY RECORD VARYING FROM 10 TO 80 CHARACTERS. 01 A-REC. 03 A-KEY PIC X(10). 03 A-REST-OF-REC PIC X(70). . . . |
Buckets must contain enough room for record insertion, or bucket splitting occurs. The I/O system handles it by creating a new data bucket for the split, moving some records from the original to the new bucket, and putting the pointer to the new bucket into the lowest-level index bucket. If the lowest-level index bucket overflows, the I/O system splits it in similar fashion, on up to the top level (root level).
In an indexed file, the I/O system also maintains chains of forward pointers through the buckets.
For each record moved, a 7-byte pointer to the new record location remains in the original bucket. Thus, bucket splits can accumulate overhead and possibly reduce usable space so much that the original bucket can no longer receive records.
Record deletions can also accumulate storage overhead. However, most of the space is available for reuse.
There are several ways to minimize overhead accumulation. First, determine or estimate the frequency of certain operations. For example, if you expect to add or delete 100 records of a 100,000-record file, your database is stable enough to allow some wasted space for record additions and deletions. However, if you expect frequent additions and deletions, try to:
Each alternate key requires the creation and maintenance of a separate index structure. The more keys you define, the longer each WRITE, REWRITE, and DELETE operation takes. (The throughput of READ operations is not affected by multiple keys.)
If your application requires alternate keys, you can minimize I/O processing time if you avoid duplicate alternate keys. Duplicate keys can create long record pointer arrays, which fill bucket space and increase access time.
Bucket size selection can influence indexed file performance.
To the system, bucket size is an integral number of physical blocks, each 512 bytes long. Thus, a bucket size of 1 specifies a 512-byte bucket, while a bucket size of 2 specifies a 1024-byte bucket, and so on.
The Compaq COBOL compiler passes bucket size values to the I/O system based on what you specify in the BLOCK CONTAINS clause. In this case, you express bucket size in terms of records or characters.
If you specify block size in records, the bucket can contain more records than you specify, but never fewer. For example, assume that your file contains fixed-length, 100-byte records, and you want each bucket to contain five records, as follows:
BLOCK CONTAINS 5 RECORDS |
This appears to define a bucket as a 512-byte block, containing five records of 100 bytes each. However, the compiler adds I/O system record and bucket overhead to each bucket, as follows:
Bucket overhead  = 15 bytes per bucket
Record overhead  =  7 bytes per record (fixed-length)
                 =  9 bytes per record (variable-length)
Thus, in this example, the bucket size calculation is:
15 + (5 * (100 + 7)) = 550 bytes
Because blocks are 512 bytes long, and buckets are always an integral number of blocks, the smallest bucket size possible (the system default) in this case is two blocks. The system, however, puts in as many records as fit into each bucket. Thus, the bucket actually contains nine records, not five.
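The full calculation can be sketched in Python (illustrative only; the overhead values are those given above):

```python
# Sketch: bucket size for BLOCK CONTAINS 5 RECORDS with fixed-length
# 100-byte records in an indexed file.
BUCKET_OVERHEAD = 15   # bytes per bucket
RECORD_OVERHEAD = 7    # bytes per fixed-length record

needed = BUCKET_OVERHEAD + 5 * (100 + RECORD_OVERHEAD)   # 550 bytes
blocks = -(-needed // 512)                               # rounds up to 2 blocks
# RMS fills the 1024-byte bucket with as many records as fit:
fits = (blocks * 512 - BUCKET_OVERHEAD) // (100 + RECORD_OVERHEAD)
```

The result matches the text: the request for five records yields a two-block bucket that actually holds nine.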
The CHARACTERS option of the BLOCK CONTAINS clause lets you specify bucket size more directly. For example:
BLOCK CONTAINS 2048 CHARACTERS |
This specifies a bucket size of four 512-byte blocks. The number of characters in a bucket must be a multiple of 512; if it is not, the I/O system rounds it up to the next higher multiple of 512.
The length of data records, key fields, and buckets in the file determines the depth of the index. Index depth, in turn, determines the number of disk accesses needed to retrieve a record. The smaller the index depth, the better the performance. In general, an index depth of 3 or 4 gives satisfactory performance. If your calculated index depth is greater than 4, you should consider redesigning the file.
You can optimize your file's index depth after you have determined file, record, and key size. Calculating index depth is an iterative process, with bucket size as the variable. Keep in mind that the highest level (root level) can contain only one bucket.
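The iteration can be sketched in Python. This is a hedged illustration: the per-bucket capacities and packing efficiency are assumed inputs for the sketch, not exact RMS formulas.

```python
# Sketch: iterative index-depth estimate. Capacities are assumptions
# you would derive from bucket size, record size, and key size.
import math

def index_depth(n_records, data_per_bucket, index_per_bucket, packing=1.0):
    """Count index levels above the data level until a single (root)
    bucket can point to everything below it."""
    buckets = math.ceil(n_records / (data_per_bucket * packing))  # level 0
    depth = 0
    while buckets > 1:
        buckets = math.ceil(buckets / (index_per_bucket * packing))
        depth += 1
    return depth

# Hypothetical file: 100,000 records, 9 data records and 40 index
# records per bucket -> a 3-level index (satisfactory per the text).
index_depth(100000, 9, 40)
```

Rerunning the function with a larger bucket size (hence larger per-bucket capacities) shows how bucket size trades buffer space for index depth.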
If much data is added over time to an indexed file, you should reorganize the file periodically to restore its indexes to their optimal levels.
Following is detailed information on calculating file size, and an example of index depth calculation:
Use these calculations to determine data and index record size:
Data record size = maximum record size + 7 bytes (fixed-length) or + 9 bytes (variable-length)
Index record size = key size + 3 bytes of overhead
If a file has more than 65,536 blocks, the 3-byte index record overhead could increase to 5 bytes.
Use these calculations to determine SIDR record length:
Bucket packing efficiency determines how well bucket space is used. A packing efficiency of 1 means the buckets of an index are full. A packing efficiency of .5 means that, on the average, the buckets are half full.
Consider an indexed file with these attributes:
Primary key index level calculations:
In the following calculations, some results are to be rounded up, and some truncated.
If you allow duplicate keys in alternate indexes, the number and size of SIDRs depend on the number of duplicate key values in the file. (For duplicate key alternate index calculations, see the OpenVMS Record Management Services Reference Manual.) Because alternate index records are usually inserted in random order, the bucket packing efficiency ranges from about .5 to about .65. The following example uses an average efficiency of .55.
In each of the following calculations, the results are either rounded up or truncated.
The system requires at least two buffers to process an indexed file: one for a data bucket, the other for an index bucket. In fact, a data buffer and an index buffer are needed for every level of indexing available in the file (a fact that is not visible to the COBOL program, because the minimum amount of space is always allocated). Each buffer is large enough to contain a single bucket. If your program does not contain a RESERVE n AREAS clause, or if you do not use the DCL SET RMS_DEFAULT command, the system sets the default.
A RESERVE n AREAS clause creates additional buffers for processing an indexed file. At run time, the system retains (caches) in memory the roots of one or more indexes of the file. Random access to any record through that index requires one less I/O operation.
You can also use the DCL command SET RMS_DEFAULT/BUFFER_COUNT=count to create additional buffers.
The following rules apply for caching index roots:
The DCL SET RMS commands also apply to sequential and relative files. The DCL SET RMS commands and the RESERVE n AREAS clause provide the same functionality.
For information about DCL commands, see the OpenVMS DCL Dictionary.