Updated: 11 December 1998

2.2.1 The collating-element Statement

The collating-element statement specifies multicharacter collation items.

Syntax:

collating-element <character_symbol> from <string>

The character_symbol argument names a multicharacter string so that the string is treated as a single collation item. The character_symbol cannot duplicate any symbolic name in the current charmap file or any other symbolic name defined in this collation definition.

The string argument specifies a string of two or more characters that define the character_symbol argument. The following are examples of the syntax for the collating-element statement:

collating-element <ch> from "<c><h>"
collating-element <e-acute> from "<acute><e>"
collating-element <11> from "<1><1>"

A character_symbol argument defined by the collating-element statement is recognized only within the LC_COLLATE category.

2.2.2 The collating-symbol Statement

The collating-symbol statement specifies collation symbols for use in collation sequence statements.

Syntax:

collating-symbol <collating_symbol>

The collating_symbol argument cannot duplicate any symbolic name in the current charmap file or any other symbolic name defined in this collation definition. The following are examples of collating-symbol statements:

collating-symbol <UPPER_CASE>
collating-symbol <HIGH>

An argument defined by the collating-symbol statement is recognized only within the LC_COLLATE category.

2.2.3 The order_start Statement

The order_start statement is followed by one or more collation order statements, which assign collation weights to collation items, and then by the order_end keyword. The order_start statement is required.

Syntax:

order_start sort_rules;sort_rules;...;sort_rules
collation_order_statements
order_end

Sort Rules

The sort_rules directives have the following syntax:

keyword, keyword,...,keyword

where keyword is FORWARD, BACKWARD, or POSITION.

The sort_rules directives are optional. If specified, they define the rules to apply during string comparison. The number of specified sort_rules directives defines the number of weights each collation item is assigned (that is, the directives define the number of collation orders in the locale). If no sort_rules directives are specified, one forward directive is assumed and comparisons are made on a character basis rather than a string basis.

If sort_rules directives are present, the first one applies when comparing strings that use the primary weight, the second when comparing strings that use the secondary weight, and so on. Each set of sort_rules directives is separated by a semicolon (;). A sort_rules directive consists of one or more keywords separated by commas. The following keywords are supported:

FORWARD --- Specifies that collation weight comparisons proceed from the beginning of a string to the end of the string.
BACKWARD --- Specifies that collation weight comparisons proceed from the end of a string to the beginning of the string.
POSITION --- Specifies that collation weight comparisons consider the relative position of nonignored elements in the string (that is, if strings compare as equal, the element with the shortest distance from the starting point of the comparison collates first).

The forward and backward keywords are mutually exclusive.

The following is an example of a sort_rules directive:

order_start forward;backward

Collation Order Statements

The following syntax rules apply to the collation order statements:

Each collation order statement consists of a <character_symbol> specification followed by white space and a set of collation orders.
Characters in the character set can be explicitly specified in the collation order statements or implicitly specified using the ellipsis symbol (...).
A collation order statement that begins with the UNDEFINED special symbol specifies any characters that are in the character set but not explicitly or implicitly specified by other collation order statements.

The optional operands for each collation item are used to define the primary, secondary, or subsequent weights for the collation item. The special symbol IGNORE is used to indicate a collation item that is to be ignored when strings are compared.

An ellipsis keyword appearing in place of a collating_element_list indicates the weights are to be assigned, for the characters in the identified range, in numerically increasing order from the weight for the character symbol on the left side of the preceding statement.

The use of the ellipsis keyword results in a locale that may collate differently when compiled with different character set description (charmap) source files.

The UNDEFINED special symbol includes all coded character set values not specified explicitly or with an ellipsis symbol. These characters are inserted in the character collation order at the point indicated by the UNDEFINED special symbol and are all assigned the same weight. If no UNDEFINED special symbol exists and the collation order does not specify all collation items from the coded character set, a warning is issued and all undefined characters are placed at the end of the character collation order.

Example

The following is an example of a collation order statement section in the LC_COLLATE locale definition source file category:

order_start     forward;backward
UNDEFINED       IGNORE;IGNORE
<LOW>
<space>         <LOW>;<space>
...             <LOW>;...
<a>             <a>;<a>
<a-acute>       <a>;<a-acute>
<a-grave>       <a>;<a-grave>
<A>             <a>;<A>
<A-acute>       <a>;<A-acute>
<A-grave>       <a>;<A-grave>
<ch>            <ch>;<ch>
<Ch>            <ch>;<Ch>
<s>             <s>;<s>
<ss>            <s><s>;<s><s>
<eszet>         <s><s>;<eszet><eszet>
...             <HIGH>;...
<HIGH>
order_end

This example is interpreted as follows:

The UNDEFINED special symbol indicates that all characters not specified in the definition (either explicitly or by the ellipsis symbol) are ignored for collation purposes.
All collation items between <space> and <a> have the same primary equivalence class and individual secondary weights based on their coded character-set values.
All versions of the letter a (uppercase and lowercase, and with or without diacriticals) belong to the same primary collation class.
The <c><h> multicharacter collation item is represented by the <ch> collating symbol and belongs to the same primary equivalence class as the <C><h> multicharacter collation item.
The <eszet> character is collated as an <s><s> string (that is, one <eszet> character is expanded to two characters before comparing).

2.3 LC_CTYPE Category

The LC_CTYPE category defines character classification, case conversion, and other character attributes. This category begins with the LC_CTYPE header and ends with the END LC_CTYPE trailer.

All operands for LC_CTYPE category statements are defined as lists of characters. Each list consists of one or more characters or symbolic character names separated by semicolons. An ellipsis (...) can represent a series of characters; for example, <a>;...;<z> represents the characters in the range a through z.

Table 2-2 lists the statement keywords recognized in the LC_CTYPE category. In the keyword descriptions, the phrase "automatically included" means that an error does not occur if the referenced characters are included or omitted; the characters are provided if they are missing, and are accepted if they are present.

Table 2-2 LC_CTYPE Category Keywords
Keyword Description

copy Specifies the name of an existing locale to be used as the definition for this category.
If you specify a copy statement, you cannot specify any other keyword.

upper Defines uppercase letter characters.
Do not specify any character defined by the cntrl, digit, punct, or space keyword. The uppercase letters A through Z are automatically included in this set.

lower Defines lowercase letter characters.
Do not specify any character defined by the cntrl, digit, punct, or space keyword. The lowercase letters a through z are automatically included in this set.

alpha Defines all letter characters.
Do not specify any character defined by the cntrl, digit, punct, or space keyword. Characters defined by the upper and lower keywords are automatically included in this character class.

digit Defines numeric digit characters.
Only the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 can be specified. The digits 0 through 9 are automatically included in this set.

space Defines white-space characters.
Do not specify any character defined by the upper, lower, alpha, digit, graph, or xdigit keyword. The space, form-feed, new-line, carriage-return, tab, and vertical tab characters are automatically included in this set.

cntrl Defines control characters.
Do not specify any character defined by the upper, lower, alpha, digit, punct, graph, print, or xdigit keyword.

punct Defines punctuation characters.
Do not specify the space character or any character defined by the upper, lower, alpha, digit, cntrl, or xdigit keywords.

graph Defines printable characters, excluding the space character.
Do not specify any character defined by the cntrl keyword. The characters defined by the upper, lower, alpha, digit, xdigit, and punct keywords are automatically included in this character class.

print Defines printable characters, including the space character.
Do not specify any character defined by the cntrl keyword. The space character and characters defined by the upper, lower, alpha, digit, xdigit, and punct keywords are automatically included in this character class.

xdigit Defines hexadecimal digit characters.
Only the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 can be specified for the values 0 through 9. Any character, however, can be specified for the hexadecimal values 10 through 15. These alternate hexadecimal digits are not used by standard conversion routines when converting digit strings from hexadecimal to numeric quantities. The digits 0 through 9 and the letters A through F and a through f are automatically included in this set.

blank Defines blank characters.
The space and horizontal tab characters are automatically included in this character class. Any characters defined by this statement are automatically included in the space class.

toupper Defines the mapping of lowercase characters to uppercase characters.
Operands for this keyword consist of character pairs. The two characters in each pair are separated by a comma, each pair is enclosed in parentheses (), and pairs are separated from each other by semicolons (;). The first character in each pair is considered a lowercase character; the second character is considered an uppercase character. Only characters defined by the lower and upper keywords can be specified. If toupper is not specified, the letters a through z are mapped to A through Z by default.

tolower Defines the mapping of uppercase characters to lowercase characters.
Operands for this keyword consist of character pairs. The two characters in each pair are separated by a comma, each pair is enclosed in parentheses (), and pairs are separated from each other by semicolons (;). The first character in each pair is considered an uppercase character; the second character is considered a lowercase character. Only characters defined by the lower and upper keywords can be specified.
If tolower is not specified, the mapping defaults to the reverse mapping of the toupper keyword, if specified. If the toupper and tolower keywords are both omitted, the mapping for each defaults to that of the C locale.

Additional keywords can be provided to define new character classifications. For example:

charclass vowel
vowel <a>;<e>;<i>;<o>;<u>;<y>

The LC_CTYPE category does not support multicharacter elements. For example, the German Eszet character (ß) is traditionally classified as a lowercase letter, but in proper capitalization of German text it is replaced by the two characters SS; there is no corresponding single uppercase letter. This kind of conversion is outside the scope of the toupper and tolower keywords.

The following is a sample LC_CTYPE category specified in a locale definition source file:

LC_CTYPE
#"alpha" is by default "upper" and "lower"
#"alnum" is by definition "alpha" and "digit"
#"print" is by default "alnum", "punct" and the space character
#"graph" is by default "alnum" and "punct"
#"tolower" is by default the reverse mapping of "toupper"
#
upper   <A>;<B>;<C>;<D>;<E>;<F>;<G>;<H>;<I>;<J>;<K>;<L>;<M>;\
        <N>;<O>;<P>;<Q>;<R>;<S>;<T>;<U>;<V>;<W>;<X>;<Y>;<Z>
#
lower   <a>;<b>;<c>;<d>;<e>;<f>;<g>;<h>;<i>;<j>;<k>;<l>;<m>;\
        <n>;<o>;<p>;<q>;<r>;<s>;<t>;<u>;<v>;<w>;<x>;<y>;<z>
#
digit   <zero>;<one>;<two>;<three>;<four>;<five>;<six>;\
        <seven>;<eight>;<nine>
#
space   <tab>;<newline>;<vertical-tab>;<form-feed>;\
        <carriage-return>;<space>
#
cntrl   <alert>;<backspace>;<tab>;<newline>;<vertical-tab>;\
        <form-feed>;<carriage-return>;<NUL>;<SOH>;<STX>;\
        <ETX>;<EOT>;<ENQ>;<ACK>;<SO>;<SI>;<DLE>;<DC1>;<DC2>;\
        <DC3>;<DC4>;<NAK>;<SYN>;<ETB>;<CAN>;<EM>;<SUB>;\
        <ESC>;<IS4>;<IS3>;<IS2>;<IS1>;<DEL>
#
punct   <exclamation-mark>;<quotation-mark>;<number-sign>;\
        <dollar-sign>;<percent-sign>;<ampersand>;<asterisk>;\
        <apostrophe>;<left-parenthesis>;<right-parenthesis>;\
        <plus-sign>;<comma>;<hyphen>;<period>;<slash>;\
        <colon>;<semicolon>;<less-than-sign>;<equals-sign>;\
        <greater-than-sign>;<question-mark>;<commercial-at>;\
        <left-square-bracket>;<backslash>;<circumflex>;\
        <right-square-bracket>;<underline>;<grave-accent>;\
        <left-curly-bracket>;<vertical-line>;<tilde>;\
        <right-curly-bracket>
#
xdigit  <zero>;<one>;<two>;<three>;<four>;<five>;<six>;\
        <seven>;<eight>;<nine>;<A>;<B>;<C>;<D>;<E>;<F>;\
        <a>;<b>;<c>;<d>;<e>;<f>
#
blank   <space>;<tab>
#
toupper (<a>,<A>);(<b>,<B>);(<c>,<C>);(<d>,<D>);(<e>,<E>);\
        (<f>,<F>);(<g>,<G>);(<h>,<H>);(<i>,<I>);(<j>,<J>);\
        (<k>,<K>);(<l>,<L>);(<m>,<M>);(<n>,<N>);(<o>,<O>);\
        (<p>,<P>);(<q>,<Q>);(<r>,<R>);(<s>,<S>);(<t>,<T>);\
        (<u>,<U>);(<v>,<V>);(<w>,<W>);(<x>,<X>);(<y>,<Y>);\
        (<z>,<Z>)
#
END LC_CTYPE

2.4 LC_MESSAGES Category

The LC_MESSAGES category defines the format for affirmative and negative system responses. This category begins with the LC_MESSAGES header and ends with the END LC_MESSAGES trailer.

All operands for the LC_MESSAGES category are defined as strings or extended regular expressions bounded by double quotation marks ("). These operands are separated from the keyword they define by one or more blank characters (spaces or tabs). Two adjacent double quotation marks ("") indicate an undefined value.

Table 2-3 lists the statement keywords recognized in the LC_MESSAGES category.

Table 2-3 LC_MESSAGES Category Keywords
Keyword Description

copy Specifies the name of an existing locale to be used as the definition of this category.
If you specify a copy statement, you cannot specify any other keyword.

yesexpr Specifies an extended regular expression that describes the acceptable affirmative response to a question expecting an affirmative or negative response.

noexpr Specifies an extended regular expression that describes the acceptable negative response to a question expecting an affirmative or negative response.

yesstr Specifies the locale's equivalent of an acceptable affirmative response.
This string is accessible to applications through the nl_langinfo subroutine as nl_langinfo (YESSTR). Note that yesstr is likely to be withdrawn from the XPG4 standard; yesexpr is the recommended alternative.

nostr Specifies the locale's equivalent of an acceptable negative response.
This string is accessible to applications through the nl_langinfo subroutine as nl_langinfo (NOSTR). Note that nostr is likely to be withdrawn from the XPG4 standard; noexpr is the recommended alternative.

The following is a sample LC_MESSAGES category specified in a locale definition source file:

LC_MESSAGES
#
yesexpr "<circumflex><left-square-bracket><y><Y>\
<right-square-bracket>"
noexpr  "<circumflex><left-square-bracket><n><N>\
<right-square-bracket>"
yesstr  "<y><e><s>"
nostr   "<n><o>"
#
END LC_MESSAGES

2.5 LC_MONETARY Category

The LC_MONETARY category defines rules and symbols for formatting monetary numeric information. This category begins with the LC_MONETARY header and ends with the END LC_MONETARY trailer.

2.5.1 LC_MONETARY Keywords

All operands for the LC_MONETARY category keywords are defined as string or integer values. String values are bounded by double quotation marks ("). All values are separated from the keyword they define by one or more blank characters (spaces or tabs). Two adjacent double quotation marks ("") indicate an undefined string value. A negative one (-1) indicates an undefined integer value.

Table 2-4 lists the statement keywords recognized in the LC_MONETARY category.

Table 2-4 LC_MONETARY Category Keywords
Keyword Description

copy Specifies the name of an existing locale to be used as the definition of this category.
If you specify a copy statement, you cannot specify any other keyword.

int_curr_symbol Specifies the string used for the international currency symbol.
The operand for this keyword is a 4-character string+. The first three characters contain the alphabetic international currency symbol. The fourth character defines a character separator for insertion between the international currency symbol and a monetary quantity.

currency_symbol Specifies the string used for the local currency symbol.

mon_decimal_point Specifies the decimal delimiter string used for formatting monetary quantities.

mon_thousands_sep Specifies the character separator used for grouping digits to the left of the decimal delimiter in formatted monetary quantities.

mon_grouping Specifies a string that defines the size of each group of digits in formatted monetary quantities.
The operand for this keyword consists of a sequence of integers separated by semicolons. Each integer specifies the number of digits in a group. The first integer defines the size of the group immediately to the left of the decimal delimiter. Subsequent integers define succeeding groups to the left of the previous group. If the last integer is not -1, it is used to group any remaining digits. If the last integer is -1, no further grouping is performed.
A sample interpretation of the mon_grouping statement follows. Assuming a value of 123456789 to be formatted and a mon_thousands_sep operand of ' (single quotation mark), the following results occur:

mon_grouping    Formatted Value
3;-1            123456'789
3               123'456'789
3;2;-1          1234'56'789
3;2             12'34'56'789

positive_sign Specifies the string used to indicate a nonnegative-formatted monetary quantity.

negative_sign Specifies the string used to indicate a negative-formatted monetary quantity.

int_frac_digits Specifies an integer value representing the number of fractional digits (those after the decimal delimiter) to be displayed in a formatted monetary quantity using the int_curr_symbol value.

frac_digits Specifies an integer value representing the number of fractional digits (those after the decimal delimiter) to be displayed in a formatted monetary quantity using the currency_symbol value.

p_cs_precedes Specifies an integer value indicating whether the int_curr_symbol or currency_symbol string precedes or follows the value for a nonnegative-formatted monetary quantity.
The following integer values are recognized:

0 The currency symbol follows the monetary quantity.

1 The currency symbol precedes the monetary quantity.

p_sep_by_space Specifies an integer value indicating whether the int_curr_symbol or currency_symbol string is separated by a space from a nonnegative-formatted monetary quantity.
The following integer values are recognized:

0 No space separates the currency symbol from the monetary quantity.

1 A space separates the currency symbol from the monetary quantity.

2 A space separates the currency symbol and the positive_sign string, if adjacent.

n_cs_precedes Specifies an integer value indicating whether the int_curr_symbol or currency_symbol string precedes or follows the value for a negative-formatted monetary quantity.
The following integer values are recognized:

0 The currency symbol follows the monetary quantity.

1 The currency symbol precedes the monetary quantity.

n_sep_by_space Specifies an integer value indicating whether the int_curr_symbol or currency_symbol string is separated by a space from a negative-formatted monetary quantity.
The following integer values are recognized:

0 No space separates the currency symbol from the monetary quantity.

1 A space separates the currency symbol from the monetary quantity.

2 A space separates the currency symbol and the negative_sign string, if adjacent.

p_sign_posn Specifies an integer value indicating the positioning of the positive_sign string for a nonnegative-formatted monetary quantity.
The following integer values are recognized:

0 Parentheses enclose both the monetary quantity and the int_curr_symbol or currency_symbol string.

1 The positive_sign string precedes the quantity and the int_curr_symbol or currency_symbol string.

2 The positive_sign string follows the quantity and the int_curr_symbol or currency_symbol string.

3 The positive_sign string immediately precedes the int_curr_symbol or currency_symbol string.

4 The positive_sign string immediately follows the int_curr_symbol or currency_symbol string.

n_sign_posn Specifies an integer value indicating the positioning of the negative_sign string for a negative-formatted monetary quantity.
The following integer values are recognized:

0 Parentheses enclose both the monetary quantity and the int_curr_symbol or currency_symbol string.

1 The negative_sign string precedes the quantity and the int_curr_symbol or currency_symbol string.

2 The negative_sign string follows the quantity and the int_curr_symbol or currency_symbol string.

3 The negative_sign string immediately precedes the int_curr_symbol or currency_symbol string.

4 The negative_sign string immediately follows the int_curr_symbol or currency_symbol string.

+The current implementation of the DEC C Run-Time Library allows more than four characters to be specified. However, do not rely on this behavior; specify exactly four characters. The 4-character limit will be enforced in a future version of the DEC C Run-Time Library.
