Document revision date: 30 March 2001

OpenVMS RTL Library (LIB$) Manual




LIB$TPARSE/LIB$TABLE_PARSE

The Table-Driven Finite-State Parser routine is a general-purpose, table-driven parser implemented as a finite-state automaton, with extensions that make it suitable for a wide range of applications. It parses a string and returns a message indicating whether or not the input string is valid.

Note

No support for arguments passed by 64-bit address reference or the use of 64-bit descriptors is planned for LIB$TPARSE. On Alpha systems, LIB$TABLE_PARSE supports arguments passed by 64-bit address reference and the use of 64-bit descriptors.

LIB$T[ABLE_]PARSE is called with the address of an argument block, the address of a state table, and the address of a keyword table. The input string is specified as part of the argument block.

The LIB$ facility supports the following two versions of the Table-Driven Finite-State Parser:
LIB$TPARSE Available on VAX systems.
  LIB$TPARSE is available on Alpha systems in translated form. In this form, it is applicable to translated VAX images only.
LIB$TABLE_PARSE Available on VAX and Alpha systems.

LIB$TPARSE and LIB$TABLE_PARSE differ mainly in the way they pass arguments to action routines.

The term LIB$T[ABLE_]PARSE is used here to describe concepts that apply to both LIB$TPARSE and LIB$TABLE_PARSE.


Format

LIB$TPARSE/LIB$TABLE_PARSE argument-block ,state-table ,key-table


RETURNS


OpenVMS usage: cond_value
type: longword (unsigned)
access: write only
mechanism: by value


Arguments

argument-block


OpenVMS usage: unspecified
type: unspecified
access: modify
mechanism: by reference

LIB$T[ABLE_]PARSE argument block. The argument-block argument contains the address of this argument block.

The LIB$T[ABLE_]PARSE argument block contains information about the state of the parse operation. It is a means of communication between LIB$T[ABLE_]PARSE and the user's program. It is passed as an argument to all action routines.

You must declare and initialize the argument block. Section 1.4 describes the argument block in detail. Section 2.2 illustrates the coding for an argument block declaration and discusses its initialization.

LIB$T[ABLE_]PARSE supports the following argument blocks:

state-table


OpenVMS usage: unspecified
type: unspecified
access: read only
mechanism: by reference

Starting state in the state table. The state-table argument is the address of this starting state. Usually, the name appearing as the first argument of the $INIT_STATE macro is used.

You must define the state table for your parser. LIB$T[ABLE_]PARSE provides macros in the MACRO and BLISS languages for this purpose. Section 1.3 describes these macros.

key-table


OpenVMS usage: unspecified
type: unspecified
access: read only
mechanism: by reference

Keyword table. The key-table argument is the address of this keyword table. This name must be the same as that which appears as the second argument of the $INIT_STATE macro.

You need only assign a name to the keyword table; the LIB$T[ABLE_]PARSE macros allocate and define the table. See Section 4 for more information about the keyword table.


Description

The following sections explain in detail how LIB$T[ABLE_]PARSE works and how to call it from both the MACRO assembly language and high-level languages:
  1. How LIB$T[ABLE_]PARSE Works --- Describes the data structures used by LIB$T[ABLE_]PARSE and how LIB$T[ABLE_]PARSE operates on them.
  2. Coding and Using a Simple State Table --- Explains how to construct and use a simple state table.
  3. Using Advanced LIB$T[ABLE_]PARSE Features --- Explains how to use subexpressions, abbreviations, action routines, and other advanced features.
  4. Data Representation --- Includes information for the low-level-language programmer, such as the binary representation of state table data.

1 How LIB$T[ABLE_]PARSE Works
LIB$T[ABLE_]PARSE analyzes an input string according to a set of states and transitions presented in a state table you define. It determines whether the input string is valid according to the rules you define for the input language.

There are three parts to any parsing operation: the alphabet, the state table, and the argument block.

1.1 Overview
Before discussing the alphabet, the state table, and the argument block in detail, this section provides an overview of how these three parts work together.

1.1.1 Evaluating the Input String
LIB$T[ABLE_]PARSE evaluates the input string from left to right as it transitions from state to state. For a particular transition in a particular state, it evaluates the beginning of the unprocessed part of the input string against the symbol type you specify for the transition to determine whether there is a match.

LIB$T[ABLE_]PARSE compares each character of the remaining input string, from left to right, against the transition's symbol type until it encounters a character in the input string that does not match. It takes the substring that matches the symbol type and stores a pointer to it in the argument block as the current token. In this way, any character in the input string that does not belong to the symbol type's constituent character set effectively becomes a separator.

If LIB$T[ABLE_]PARSE finds a match, it executes the transition.

If the input string does not match, LIB$T[ABLE_]PARSE attempts to match the next transition. It performs the comparison using the transitions in the order in which you define them for the state.
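The matching behavior described above can be modeled in a few lines. The following is a minimal sketch in Python, not the routine's actual implementation: the function names and the SYMBOL_SETS dictionary are hypothetical, and the character sets are simplified to ASCII (the DEC multinational characters are left out).

```python
import string

# Hypothetical, ASCII-only constituent sets for two symbol types.
SYMBOL_SETS = {
    "TPA$_STRING": set(string.ascii_letters + string.digits),
    "TPA$_SYMBOL": set(string.ascii_letters + string.digits + "$_"),
}

def match_symbol_type(text, pos, symbol_type):
    """Scan left to right from pos while characters belong to the
    symbol type's set. Return the index just past the match, or
    None if not even one character matches."""
    charset = SYMBOL_SETS[symbol_type]
    end = pos
    while end < len(text) and text[end] in charset:
        end += 1
    return end if end > pos else None

def first_matching_transition(text, pos, transitions):
    """Try the transitions in the order they were defined and
    return (symbol_type, token) for the first one that matches."""
    for symbol_type in transitions:
        end = match_symbol_type(text, pos, symbol_type)
        if end is not None:
            return symbol_type, text[pos:end]
    return None
```

In this model, the first character that falls outside the constituent set ends the token, which is why such a character acts as a separator.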

1.1.2 Executing a Transition
When LIB$T[ABLE_]PARSE finds a match with a transition, it performs the following steps:

  1. Stores a pointer to the current token in the argument block. If the token matches one of the numeric symbol types, it also stores the token's binary representation in the argument block.
  2. Calls the action routine, if any, specified by the transition and passes it the argument block and any additional user-specified arguments.
    You can use an action routine to reject a transition. In this case, LIB$T[ABLE_]PARSE performs none of the following steps. See Section 3.1 for more information.
  3. Performs one of the following operations:
  4. Transfers control to the specified state, if any, or to the next state in the state table.
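As a rough illustration of steps 1, 2, and 4, the following Python sketch models a single transition. The ArgBlock and Transition classes and their field names are hypothetical stand-ins, not the real argument block layout. The operations of step 3 are omitted, and the point at which the input position advances is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ArgBlock:
    # Hypothetical stand-in for the argument block.
    text: str
    pos: int = 0
    token: str = ""
    number: Optional[int] = None

@dataclass
class Transition:
    end: int                                   # index just past the matched token
    target: Optional[str] = None               # explicit target state, if any
    action: Optional[Callable[[ArgBlock], bool]] = None
    numeric_base: Optional[int] = None         # set for numeric symbol types

def execute_transition(block, tran, next_state):
    """Return the label of the state to enter, or None if the
    action routine rejects the transition."""
    # Step 1: record the current token (and its binary value if numeric).
    block.token = block.text[block.pos:tran.end]
    block.number = int(block.token, tran.numeric_base) if tran.numeric_base else None
    # Step 2: call the action routine; a false return rejects the transition.
    if tran.action is not None and not tran.action(block):
        return None
    # Assumption: the token is consumed only once the transition is accepted.
    block.pos = tran.end
    # Step 4: go to the named state, or fall through to the next one.
    return tran.target if tran.target is not None else next_state
```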

1.1.3 Exiting LIB$T[ABLE_]PARSE
LIB$T[ABLE_]PARSE continues to match and execute transitions from state to state until one of the following occurs:

Note

LIB$T[ABLE_]PARSE generates no signals and establishes no condition handler; action routines can signal through LIB$T[ABLE_]PARSE back to the calling program.

When LIB$T[ABLE_]PARSE cannot successfully parse the entire string, it defines the current token, as follows, and stores it in the argument block before returning:

1.2 Alphabet of LIB$T[ABLE_]PARSE
The LIB$T[ABLE_]PARSE alphabet consists of a set of symbol types defined in Table lib-9. This alphabet includes strings made up of elements of the ASCII character set. It provides all the basic building blocks needed for constructing a grammar using the ASCII character set. The alphabet also includes symbol types that represent the more complex constructions found in programming and command language grammar.

Use the symbol types that make up the LIB$T[ABLE_]PARSE alphabet to define a vocabulary and grammar for your language. For each transition you define, you specify one of the alphabet symbol types. LIB$T[ABLE_]PARSE compares the characters at the beginning of the remaining input string with the symbol type of each possible transition in turn. If LIB$T[ABLE_]PARSE finds a match, it enters the state specified by that transition.

Table lib-9 The Alphabet of LIB$T[ABLE_]PARSE
Symbol Type Characters Matched
'x' The particular ASCII character. In a state table, it is expressed by enclosing the character in single quotation marks. The character can be any member of the 8-bit ASCII code set. LIB$T[ABLE_]PARSE does not consider uppercase and lowercase alphabetic characters and codes with different values in bit 7 to be equivalent.
TPA$_ANY Any single character.
TPA$_ALPHA Any alphabetic character, which includes the DEC multinational character set.
TPA$_DIGIT Any numeric character, that is, 0 through 9.
TPA$_STRING Any string of one or more alphanumeric characters, that is, uppercase or lowercase A through Z, and the numeric characters 0 through 9. The string can be any length. It is bounded on the right by the first nonalphanumeric character or by the end of the string.
TPA$_SYMBOL Any string of one or more characters of the standard OpenVMS symbol constituent set, that is, uppercase and lowercase A through Z and all DEC multinational characters, in addition to the dollar sign ($) and the underscore (_). The string is bounded on the right by some character not in the symbol constituent set (usually a blank) or by the end of the string.
'keyword' The string of characters enclosed in single quotation marks. A keyword can consist of one or more characters of the OpenVMS symbol constituent set, that is, uppercase and lowercase A through Z, the numeric characters 0 through 9, the dollar sign ($), and the underscore (_). Uppercase and lowercase alphabetics are treated as different characters.

A state table can contain up to 220 keywords. The keyword is bounded on the right by a character not in the symbol constituent set or by the end of the string.

Keywords that are one character in length are expressed in the form 'x*' to distinguish them from the single-character symbol ('x'). They must be differentiated because they are not the same in operation. For example, in the input string AB+C, the single character 'A' would match the first character of this string, whereas the keyword 'A*' would not, because B in the string is in the symbol constituent set.

TPA$_BLANK Any string of one or more blanks and/or tabs.
TPA$_OCTAL Any octal number (that is, any string of one or more numeric characters 0 through 7) whose magnitude is less than 2^32 for a 32-bit argument block or less than 2^64 for a 64-bit argument block.
TPA$_DECIMAL Any decimal number (that is, any string of one or more numeric characters 0 through 9) whose magnitude is less than 2^32 for a 32-bit argument block or less than 2^64 for a 64-bit argument block.
TPA$_HEX Any hexadecimal number (that is, any string of one or more numeric characters 0 through 9, A through F) whose magnitude is less than 2^32 for a 32-bit argument block or less than 2^64 for a 64-bit argument block.
(Alpha specific) TPA$_OCTAL_64 Any octal number (that is, any string of one or more numeric characters 0 through 7) whose magnitude is less than 2^64.
(Alpha specific) TPA$_DECIMAL_64 Any decimal number (that is, any string of one or more numeric characters 0 through 9) whose magnitude is less than 2^64.
(Alpha specific) TPA$_HEX_64 Any hexadecimal number (that is, any string of one or more numeric characters 0 through 9, A through F) whose magnitude is less than 2^64.
TPA$_FILESPEC Any string that constitutes a valid OpenVMS file specification. The string is bounded on the right by the first character that either is not a file specification constituent character or would cause the string to violate the syntax rules of a file specification.
TPA$_NODE Matches a full node specification including the double colon (::).
TPA$_NODE_ACS Matches a primary node specification including the access control string, if any, but not the double colon (::).
TPA$_NODE_PRIMARY Matches a primary node specification excluding both the access control string, if any, and the double colon (::).
TPA$_UIC Any string that constitutes a valid OpenVMS numerical UIC specification, bounded by square brackets or angle brackets. The binary value of the UIC, converted in octal radix, is placed in the argument block. The wildcard character (*) is permitted in the group and/or member fields; its presence results in that field being set to its largest possible value in the binary representation.
TPA$_IDENT Any string that constitutes a valid OpenVMS identifier. Identifiers may be given as numerical UICs according to the rules for TPA$_UIC, or as alphabetic identifier names that appear in the system's rights database. The binary value of the identifier, converted in either octal or hexadecimal radix or by lookup in the system rights database, is placed in the argument block. Identifiers can be entered in any of the following forms:
[n,m] <n,m>
[name1,name2] <name1,name2>
[name] <name>
name
%Xhex-value
You can use a wildcard (*) in place of any occurrence of number or name in an identifier form.
TPA$_LAMBDA The empty string (always matches). As it executes the transition, LIB$T[ABLE_]PARSE does not remove any characters from the input string. LAMBDA transitions are useful in getting action routines called under otherwise awkward circumstances, providing unconditional GOTOs to link portions of a state table together, and providing default actions in certain cases.
TPA$_EOS The end of the input string.
state label The label of a state that functions as a subexpression. A subexpression is analogous to a subroutine within the state table.

The subexpression facility permits complex syntactic constructs that appear in many places in grammar to appear only once in the state table. It also permits a degree of nondeterministic or pushdown parsing with a parser that is otherwise deterministic and finite-state. See Section 3.5 for detailed information about subexpressions and examples of their use.
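The difference between the single-character symbol type ('x') and a one-character keyword ('x*') in the table above can be seen in a small model. This is a hedged Python sketch: the function names are hypothetical, the constituent set is ASCII-only, and keyword abbreviation is ignored.

```python
import string

# ASCII-only approximation of the symbol constituent set.
SYMBOL_CONSTITUENTS = set(string.ascii_letters + string.digits + "$_")

def match_char(text, pos, ch):
    """Single-character symbol type: only the next character matters."""
    return pos < len(text) and text[pos] == ch

def match_keyword(text, pos, keyword):
    """Keyword symbol type: the keyword must be bounded on the right by
    a character outside the symbol constituent set or by end of string."""
    end = pos + len(keyword)
    if text[pos:end] != keyword:
        return False
    return end == len(text) or text[end] not in SYMBOL_CONSTITUENTS
```

For the input string AB+C, match_char accepts 'A' at position 0, while match_keyword rejects the keyword A because the following B is a symbol constituent.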

Note

By default, LIB$T[ABLE_]PARSE treats blanks (defined to be either spaces or tabs) as though they belong to no symbol type constituent set. Effectively, this makes the blank a separator. LIB$T[ABLE_]PARSE begins its next comparison with the first nonblank character following the blanks. To have LIB$T[ABLE_]PARSE evaluate a blank as it would any other character in the input string, set the TPA$V_BLANKS flag in the argument block. Section 3.2 provides an example of the use of this flag.
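The default blank handling can be pictured as a skip step that runs before each comparison. The following Python sketch is illustrative only; skip_blanks and its blanks_flag parameter are hypothetical names, with blanks_flag standing in for the TPA$V_BLANKS flag.

```python
def skip_blanks(text, pos, blanks_flag=False):
    """Return the index at which the next comparison starts.

    With the flag clear (the default), leading spaces and tabs are
    skipped. With the flag set, blanks are left in place so that they
    can be matched like any other character."""
    if blanks_flag:
        return pos
    while pos < len(text) and text[pos] in " \t":
        pos += 1
    return pos
```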

1.3 State Tables
This section describes state table generation and the macros used to construct state tables. Section 2 explains how to use these macros.

The state table must be set up using either MACRO or BLISS. Everything else, including any action routines, can be coded in the language of your choice. Simply compile the state table separately, then link it with your program.

The body of the state table consists of one or more states, each of which defines one or more transitions to the same or other states. The order of the states and the order of the transitions for each state are important:

1.3.1 MACRO State Table Generation Macro Calls
The OpenVMS system MACRO library contains a set of assembler macros that allow convenient and readable coding of a LIB$T[ABLE_]PARSE state table. These macros generate symbol definitions and tables. They do not produce any executable code or routine calls.

There are four MACRO state table generation macros: $INIT_STATE, $STATE, $TRAN, and $END_STATE.

A state table begins with a call to $INIT_STATE and ends with a call to $END_STATE. Within the state table, define each state by a call to $STATE immediately followed by as many calls to $TRAN as you need to define the transitions from that state.
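The ordering rule above can be mirrored in a small builder. This Python sketch is only an analogue of the macros, which generate tables at assembly time rather than at run time. The class and its method names are hypothetical, and only the first three $TRAN arguments are modeled.

```python
class StateTableBuilder:
    """Hypothetical Python analogue of the table-generation macros."""

    def __init__(self, state_table_name, key_table_name):
        # Corresponds to $INIT_STATE: names the two tables.
        self.names = (state_table_name, key_table_name)
        self.states = []          # ordered list of (label, transitions)

    def state(self, label=None):
        # Corresponds to $STATE: opens a new state with an optional label.
        self.states.append((label, []))
        return self

    def tran(self, symbol_type, label=None, action=None):
        # Corresponds to $TRAN: the transition is attached to the
        # most recently opened state.
        if not self.states:
            raise ValueError("a transition must follow a state")
        self.states[-1][1].append((symbol_type, label, action))
        return self

    def end_state(self):
        # Corresponds to $END_STATE: closes the table.
        return self.states
```

A table with two states could then be written as follows; MY_STATE, MY_KEY, START, and NEXT are made-up names:

```python
table = (StateTableBuilder("MY_STATE", "MY_KEY")
         .state("START")
         .tran("TPA$_SYMBOL", "NEXT")
         .tran("TPA$_EOS")
         .state("NEXT")
         .tran("TPA$_LAMBDA", "START")
         .end_state())
```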

1.3.1.1 $INIT_STATE---Initializes the LIB$T[ABLE_]PARSE Macros
The $INIT_STATE macro declares the beginning of a state table. It initializes the internals of the table generator macros and declares the locations of the state table and the keyword table:

Section 4 provides specific information on the allocation and binary representations of the state table and the keyword table. This information may be useful in debugging your program.


$INIT_STATE     state-table ,key-table 

state-table

The name assigned to the state table. LIB$T[ABLE_]PARSE equates this label to the start of the first state in the state table.

key-table

The name assigned to the keyword table. LIB$T[ABLE_]PARSE equates this label to the start of the keyword table.

You must supply both the address of the state table and the address of the keyword table in the call to LIB$T[ABLE_]PARSE to perform a parse. The $INIT_STATE macro can appear more than once in a program. Each occurrence defines a separate state table. No part of any state table can refer to part of any other state table.

1.3.1.2 $STATE---Defines a State
The $STATE macro declares the beginning of a state.


$STATE   [label] 

label

An optional label for the state. LIB$T[ABLE_]PARSE equates the label, if present, to the starting address of the state.

1.3.1.3 $TRAN---Defines a State Transition
The $TRAN macro defines a transition from the state in which it is defined to some other (or to the same) state. The arguments of the macro define, among other things, the symbol type that causes the transition to be executed, the state to which to transfer, and the action routine to call, if any. The transition defined by a $TRAN macro belongs to the state defined by the last preceding $STATE macro.


$TRAN   type [,label] [,action] [,mask] [,msk-adr] [,argument] 

type

The symbol type, taken from the LIB$T[ABLE_]PARSE alphabet, that is recognized by this transition. The transition is taken if the characters from the beginning of the remaining input string match the specified symbol type.

