Updated: 11 December 1998 |
The collating-element statement specifies multicharacter collation items.
Syntax:
```
collating-element <character_symbol> from <string>
```
The character_symbol argument defines a collation item that treats a string of two or more characters as a single unit. The character_symbol cannot duplicate any symbolic name in the current charmap file or any other symbolic name defined in this collation definition.
The string argument specifies the string of two or more characters that defines the character_symbol argument. The following examples show the syntax of the collating-element statement:
```
collating-element <ch> from "<c><h>"
collating-element <e-acute> from "<acute><e>"
collating-element <11> from "<1><1>"
```
A character_symbol argument defined by the
collating-element statement is recognized only within the
LC_COLLATE category.
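The following C sketch illustrates how a comparison routine could split a string into collation items once a multicharacter element such as ch is defined: the defined multicharacter elements are tried first, with the longest match winning, and any other position is a single character. The function names and the plain array of element strings are hypothetical, and the sketch assumes single-byte characters rather than the charmap-defined encodings that a real implementation handles.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the length in bytes of the collation item
 * that starts at s. Multicharacter collating elements (for example "ch")
 * are tried first and the longest match wins; otherwise the item is a
 * single byte. Assumes single-byte characters. Returns 0 at end of string. */
size_t collation_item_length(const char *s, const char *const elements[],
                             size_t n_elements)
{
    size_t best = 0;

    if (*s == '\0')
        return 0;
    for (size_t i = 0; i < n_elements; i++) {
        size_t len = strlen(elements[i]);

        if (len > best && strncmp(s, elements[i], len) == 0)
            best = len;
    }
    return best > 0 ? best : 1;
}

/* Hypothetical helper: counts the collation items in s. */
size_t count_collation_items(const char *s, const char *const elements[],
                             size_t n_elements)
{
    size_t count = 0;
    size_t len;

    while ((len = collation_item_length(s, elements, n_elements)) > 0) {
        s += len;
        count++;
    }
    return count;
}
```

With ch and ll defined as collating elements, "chat" contains three collation items (ch, a, t), so a sort compares ch as one unit against the other items.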
2.2.2 The collating-symbol Statement
The collating-symbol statement specifies collation symbols for use in collation sequence statements.
Syntax:
```
collating-symbol <collating_symbol>
```
The collating_symbol argument cannot duplicate any symbolic name in the current charmap file or any other symbolic name defined in this collation definition. The following are examples of collating-symbol statements:
```
collating-symbol <UPPER_CASE>
collating-symbol <HIGH>
```
An argument defined by the collating-symbol statement is
recognized only within the LC_COLLATE category.
2.2.3 The order_start Statement
The order_start statement is followed by one or more collation order statements, which assign collation weights to collation items, and the sequence is terminated by the order_end keyword. The order_start statement is required.
Syntax:
```
order_start sort_rules;sort_rules;...;sort_rules
collation_order_statements
order_end
```
Sort Rules
The sort_rules directives have the following syntax:
```
keyword,keyword,...,keyword
```
where keyword is FORWARD, BACKWARD, or POSITION.
The sort_rules directives are optional. If specified, they define the rules to apply during string comparison. The number of specified sort_rules directives defines the number of weights each collation item is assigned (that is, the directives define the number of collation orders in the locale). If no sort_rules directives are specified, one forward directive is assumed and comparisons are made on a character basis rather than a string basis.
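If sort_rules directives are present, the first one applies when comparing strings by primary weight, the second when comparing strings by secondary weight, and so on. Each sort_rules directive is separated from the next by a semicolon (;) and consists of one or more keywords separated by commas. The following keywords are supported: forward specifies that comparisons at that weight level proceed from the start of the string toward the end; backward specifies that comparisons at that weight level proceed from the end of the string toward the start; and position specifies that comparisons at that weight level take into account the relative position of collation items that are not subject to IGNORE.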
The forward and backward keywords are mutually exclusive.
The following is an example of a sort_rules directive:
```
order_start forward;backward
```
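As a rough model of what order_start forward;backward means, the C sketch below gives every character a primary and a secondary weight, compares the primary weights from the start of the strings, and falls back to the secondary weights, compared from the end of the strings, only when all primary weights tie. The weight function (case-insensitive primary weight, letter case as the secondary weight) is a hypothetical stand-in for the weights an LC_COLLATE definition assigns, and the sketch assumes ASCII input with one collation item per byte.

```c
#include <ctype.h>
#include <string.h>

enum direction { FORWARD, BACKWARD };

/* Hypothetical weights: level 0 (primary) ignores case; level 1 and above
 * (secondary) weight lowercase as 0 and uppercase as 1. */
static int weight(unsigned char c, int level)
{
    if (level == 0)
        return tolower(c);
    return isupper(c) ? 1 : 0;
}

/* Compares one weight level of a and b, walking from the start of the
 * strings (FORWARD) or from their ends (BACKWARD). Returns <0, 0, or >0. */
static int compare_level(const char *a, const char *b, int level,
                         enum direction dir)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t n = la < lb ? la : lb;

    for (size_t i = 0; i < n; i++) {
        size_t ia = (dir == FORWARD) ? i : la - 1 - i;
        size_t ib = (dir == FORWARD) ? i : lb - 1 - i;
        int wa = weight((unsigned char)a[ia], level);
        int wb = weight((unsigned char)b[ib], level);

        if (wa != wb)
            return wa < wb ? -1 : 1;
    }
    /* Every compared weight tied: the shorter string sorts first. */
    return (la > lb) - (la < lb);
}

/* Applies the sort rules level by level, e.g. {FORWARD, BACKWARD} for
 * "order_start forward;backward". Later levels are consulted only on a tie. */
int collate(const char *a, const char *b, const enum direction rules[],
            int n_levels)
{
    for (int level = 0; level < n_levels; level++) {
        int r = compare_level(a, b, level, rules[level]);

        if (r != 0)
            return r;
    }
    return 0;
}
```

Under forward;backward, "aB" sorts after "Ab" because the secondary comparison starts at the last character; under forward;forward the same pair sorts the other way.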
Collation Order Statements
The following syntax rules apply to the collation order statements:
The optional operands for each collation item are used to define the primary, secondary, or subsequent weights for the collation item. The special symbol IGNORE is used to indicate a collation item that is to be ignored when strings are compared.
An ellipsis keyword appearing in place of a collating_element_list indicates the weights are to be assigned, for the characters in the identified range, in numerically increasing order from the weight for the character symbol on the left side of the preceding statement.
The use of the ellipsis keyword results in a locale that may collate differently when compiled with different character set description (charmap) source files.
The UNDEFINED special symbol includes all coded character set values not specified explicitly or with an ellipsis symbol. These characters are inserted in the character collation order at the point indicated by the UNDEFINED special symbol and are all assigned the same weight. If no UNDEFINED special symbol exists and the collation order does not specify all collation items from the coded character set, a warning is issued and all undefined characters are placed at the end of the character collation order.
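A minimal sketch of the UNDEFINED behavior, assuming single-byte characters and primary weights only: explicitly listed characters receive increasing weights in the order they appear, and every unlisted character shares the one weight reserved at the UNDEFINED position. Using '*' to stand for UNDEFINED in the order string is a hypothetical convention of this sketch. When no UNDEFINED marker is present, unlisted characters are placed after everything listed; the warning itself is not modeled.

```c
/* Hypothetical marker: in the order string, '*' stands for the UNDEFINED
 * special symbol, so '*' itself cannot be listed explicitly. */
#define UNDEFINED_MARK '*'

/* Fills weights[0..255] from an order string such as "b*a".
 * Listed characters get weights 1, 2, 3, ... in order of appearance.
 * Every unlisted character shares the weight of the UNDEFINED position,
 * or sorts after all listed characters if no UNDEFINED mark is present. */
void build_weights(const char *order, int weights[256])
{
    int next = 1;
    int undefined_weight = 0;   /* 0 means no UNDEFINED seen yet */

    for (int c = 0; c < 256; c++)
        weights[c] = 0;
    for (const char *p = order; *p != '\0'; p++) {
        if (*p == UNDEFINED_MARK)
            undefined_weight = next++;
        else
            weights[(unsigned char)*p] = next++;
    }
    if (undefined_weight == 0)
        undefined_weight = next;    /* no UNDEFINED: place at the end */
    for (int c = 0; c < 256; c++)
        if (weights[c] == 0)
            weights[c] = undefined_weight;
}
```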
Example
The following is an example of a collation order statement section in the LC_COLLATE locale definition source file category:
```
order_start     forward;backward
UNDEFINED       IGNORE;IGNORE
<LOW>
<space>         <LOW>;<space>
...             <LOW>;...
<a>             <a>;<a>
<a-acute>       <a>;<a-acute>
<a-grave>       <a>;<a-grave>
<A>             <a>;<A>
<A-acute>       <a>;<A-acute>
<A-grave>       <a>;<A-grave>
<ch>            <ch>;<ch>
<Ch>            <ch>;<Ch>
<s>             <s>;<s>
<ss>            <s><s>;<s><s>
<eszet>         <s><s>;<eszet><eszet>
...             <HIGH>;...
<HIGH>
order_end
```
This example is interpreted as follows:
The LC_CTYPE category defines character classification, case conversion, and other character attributes. This category begins with the LC_CTYPE header and ends with the END LC_CTYPE trailer.
All operands for LC_CTYPE category statements are defined as lists of characters. Each list consists of one or more characters or symbolic character names separated by semicolons. An ellipsis (...) can represent a series of characters; for example, <a>;...;<z> represents the characters in the range a through z.
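The ellipsis expansion can be sketched in C as follows. The function expand_char_list is hypothetical, and it assumes, for simplicity, that every symbolic name is a single character written in angle brackets and that a range runs in increasing code-value order; a real localedef implementation resolves names through the charmap file.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: expands a list such as "<a>;...;<e>" into out
 * ("abcde"). Each name must be one character in angle brackets, and a
 * range is expanded by code value. Returns the number of characters
 * written (out is NUL-terminated), or -1 if the list is malformed or
 * out (cap bytes) is too small. */
int expand_char_list(const char *list, char *out, size_t cap)
{
    size_t n = 0;
    int pending_range = 0;

    if (cap == 0)
        return -1;
    while (*list != '\0') {
        if (strncmp(list, "...", 3) == 0) {
            if (n == 0)
                return -1;          /* an ellipsis needs a start character */
            pending_range = 1;
            list += 3;
        } else if (list[0] == '<' && list[1] != '\0' && list[2] == '>') {
            int c = (unsigned char)list[1];
            /* After an ellipsis, start just past the previous character. */
            int first = pending_range ? (unsigned char)out[n - 1] + 1 : c;

            if (pending_range && (unsigned char)out[n - 1] >= c)
                return -1;          /* the range must be increasing */
            for (int ch = first; ch <= c; ch++) {
                if (n + 1 >= cap)
                    return -1;      /* no room for ch plus the NUL */
                out[n++] = (char)ch;
            }
            pending_range = 0;
            list += 3;
        } else {
            return -1;
        }
        if (*list == ';')
            list++;
    }
    if (pending_range)
        return -1;                  /* the list ended with "..." */
    out[n] = '\0';
    return (int)n;
}
```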
Table 2-2 lists the statement keywords recognized in the LC_CTYPE category. In the keyword descriptions, the phrase "automatically included" means that an error does not occur if the referenced characters are included or omitted; the characters are provided if they are missing, and are accepted if they are present.
Keyword | Description |
---|---|
copy | Specifies the name of an existing locale to be used as the definition for this category. If you specify a copy statement, you cannot specify any other keyword. |
upper | Defines uppercase letter characters. Do not specify any character defined by the cntrl, digit, punct, or space keyword. The uppercase letters A through Z are automatically included in this set. |
lower | Defines lowercase letter characters. Do not specify any character defined by the cntrl, digit, punct, or space keyword. The lowercase letters a through z are automatically included in this set. |
alpha | Defines all letter characters. Do not specify any character defined by the cntrl, digit, punct, or space keyword. Characters defined by the upper and lower keywords are automatically included in this character class. |
digit | Defines numeric digit characters. Only the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 can be specified. The digits 0 through 9 are automatically included in this set. |
space | Defines white-space characters. Do not specify any character defined by the upper, lower, alpha, digit, graph, or xdigit keyword. The space, form-feed, new-line, carriage-return, tab, and vertical-tab characters are automatically included in this set. |
cntrl | Defines control characters. Do not specify any character defined by the upper, lower, alpha, digit, punct, graph, print, or xdigit keyword. |
punct | Defines punctuation characters. Do not specify the space character or any character defined by the upper, lower, alpha, digit, cntrl, or xdigit keyword. |
graph | Defines printable characters, excluding the space character. Do not specify any character defined by the cntrl keyword. The characters defined by the upper, lower, alpha, digit, xdigit, and punct keywords are automatically included in this character class. |
print | Defines printable characters, including the space character. Do not specify any character defined by the cntrl keyword. The space character and the characters defined by the upper, lower, alpha, digit, xdigit, and punct keywords are automatically included in this character class. |
xdigit | Defines hexadecimal digit characters. For the values 0 through 9, only the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 can be specified; any character, however, can be specified for the hexadecimal values 10 through 15. These alternate hexadecimal digits are not used by the standard conversion routines when converting hexadecimal digit strings to numeric quantities. The digits 0 through 9 and the letters A through F and a through f are automatically included in this set. |
blank | Defines blank characters. The space and horizontal tab characters are automatically included in this character class. Any characters defined by this statement are automatically included in the space class. |
toupper | Defines the mapping of lowercase characters to uppercase characters. Operands for this keyword consist of character pairs: each pair is enclosed in parentheses (), the two characters in a pair are separated by a comma, and each pair is separated from the next by a semicolon (;). The first character in each pair is considered a lowercase character; the second character is considered an uppercase character. Only characters defined by the lower and upper keywords can be specified. If toupper is not specified, a through z are mapped to A through Z by default. |
tolower | Defines the mapping of uppercase characters to lowercase characters. Operands for this keyword consist of character pairs: each pair is enclosed in parentheses (), the two characters in a pair are separated by a comma, and each pair is separated from the next by a semicolon (;). The first character in each pair is considered an uppercase character; the second character is considered a lowercase character. Only characters defined by the lower and upper keywords can be specified. If tolower is not specified, the mapping defaults to the reverse of the toupper mapping, if toupper is specified. If the toupper and tolower keywords are both omitted, the mapping for each defaults to that of the C locale. |
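The default for tolower (the reverse of the toupper mapping) can be sketched in C. The case_pair structure is a hypothetical in-memory form of the parenthesized toupper operand pairs, and the sketch assumes single-byte characters; characters that appear in no pair map to themselves.

```c
#include <stddef.h>

/* Hypothetical form of one toupper operand pair, e.g. (<a>,<A>). */
struct case_pair {
    unsigned char lower;
    unsigned char upper;
};

/* Builds both mapping tables from the toupper pairs. tolower is filled in
 * as the reverse of toupper, the default when the tolower keyword is
 * omitted. Characters without a pair map to themselves. */
void build_case_maps(const struct case_pair *pairs, size_t n,
                     unsigned char to_upper[256], unsigned char to_lower[256])
{
    for (int c = 0; c < 256; c++) {
        to_upper[c] = (unsigned char)c;
        to_lower[c] = (unsigned char)c;
    }
    for (size_t i = 0; i < n; i++) {
        to_upper[pairs[i].lower] = pairs[i].upper;
        to_lower[pairs[i].upper] = pairs[i].lower;  /* reverse mapping */
    }
}
```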
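You can define additional character classifications by declaring the class name with the charclass keyword and then using that name as a keyword that lists the members of the class. For example:
```
charclass vowel
vowel <a>;<e>;<i>;<o>;<u>;<y>
```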
The LC_CTYPE category does not support multicharacter elements. For example, the German Eszet character (ß) is traditionally classified as a lowercase letter, but there is no corresponding uppercase letter; in properly capitalized German text, the Eszet character is replaced by the two characters SS. This kind of conversion is outside the scope of the toupper and tolower keywords.
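Because toupper maps one character to one character, an application that needs German-style capitalization has to handle the Eszet itself. The C sketch below is a hypothetical helper, not a standard routine: it expands U+00DF into "SS" and passes every other character through towupper, so its result for other characters depends on the current LC_CTYPE setting.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: uppercases in into out, expanding U+00DF (LATIN
 * SMALL LETTER SHARP S) into "SS", which toupper cannot express. Other
 * characters go through towupper. cap is the size of out in wide
 * characters, including the terminator. Returns the number of characters
 * written, or (size_t)-1 if out is too small. */
size_t upper_german(const wchar_t *in, wchar_t *out, size_t cap)
{
    size_t n = 0;

    if (cap == 0)
        return (size_t)-1;
    for (; *in != L'\0'; in++) {
        if (*in == L'\u00DF') {
            if (n + 2 >= cap)
                return (size_t)-1;
            out[n++] = L'S';
            out[n++] = L'S';
        } else {
            if (n + 1 >= cap)
                return (size_t)-1;
            out[n++] = (wchar_t)towupper((wint_t)*in);
        }
    }
    out[n] = L'\0';
    return n;
}
```

For example, "straße" becomes "STRASSE", one character longer than the input.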
The following is a sample LC_CTYPE category specified in a locale definition source file:
```
LC_CTYPE
#"alpha" is by default "upper" and "lower"
#"alnum" is by definition "alpha" and "digit"
#"print" is by default "alnum", "punct" and the space character
#"graph" is by default "alnum" and "punct"
#"tolower" is by default the reverse mapping of "toupper"
#
upper   <A>;<B>;<C>;<D>;<E>;<F>;<G>;<H>;<I>;<J>;<K>;<L>;<M>;\
        <N>;<O>;<P>;<Q>;<R>;<S>;<T>;<U>;<V>;<W>;<X>;<Y>;<Z>
#
lower   <a>;<b>;<c>;<d>;<e>;<f>;<g>;<h>;<i>;<j>;<k>;<l>;<m>;\
        <n>;<o>;<p>;<q>;<r>;<s>;<t>;<u>;<v>;<w>;<x>;<y>;<z>
#
digit   <zero>;<one>;<two>;<three>;<four>;<five>;<six>;\
        <seven>;<eight>;<nine>
#
space   <tab>;<newline>;<vertical-tab>;<form-feed>;\
        <carriage-return>;<space>
#
cntrl   <alert>;<backspace>;<tab>;<newline>;<vertical-tab>;\
        <form-feed>;<carriage-return>;<NUL>;<SOH>;<STX>;\
        <ETX>;<EOT>;<ENQ>;<ACK>;<SO>;<SI>;<DLE>;<DC1>;<DC2>;\
        <DC3>;<DC4>;<NAK>;<SYN>;<ETB>;<CAN>;<EM>;<SUB>;\
        <ESC>;<IS4>;<IS3>;<IS2>;<IS1>;<DEL>
#
punct   <exclamation-mark>;<quotation-mark>;<number-sign>;\
        <dollar-sign>;<percent-sign>;<ampersand>;<asterisk>;\
        <apostrophe>;<left-parenthesis>;<right-parenthesis>;\
        <plus-sign>;<comma>;<hyphen>;<period>;<slash>;\
        <colon>;<semicolon>;<less-than-sign>;<equals-sign>;\
        <greater-than-sign>;<question-mark>;<commercial-at>;\
        <left-square-bracket>;<backslash>;<circumflex>;\
        <right-square-bracket>;<underline>;<grave-accent>;\
        <left-curly-bracket>;<vertical-line>;<tilde>;\
        <right-curly-bracket>
#
xdigit  <zero>;<one>;<two>;<three>;<four>;<five>;<six>;\
        <seven>;<eight>;<nine>;<A>;<B>;<C>;<D>;<E>;<F>;\
        <a>;<b>;<c>;<d>;<e>;<f>
#
blank   <space>;<tab>
#
toupper (<a>,<A>);(<b>,<B>);(<c>,<C>);(<d>,<D>);(<e>,<E>);\
        (<f>,<F>);(<g>,<G>);(<h>,<H>);(<i>,<I>);(<j>,<J>);\
        (<k>,<K>);(<l>,<L>);(<m>,<M>);(<n>,<N>);(<o>,<O>);\
        (<p>,<P>);(<q>,<Q>);(<r>,<R>);(<s>,<S>);(<t>,<T>);\
        (<u>,<U>);(<v>,<V>);(<w>,<W>);(<x>,<X>);(<y>,<Y>);\
        (<z>,<Z>)
#
END LC_CTYPE
```
The LC_MESSAGES category defines the format for affirmative and negative system responses. This category begins with the LC_MESSAGES header and ends with the END LC_MESSAGES trailer.
All operands for the LC_MESSAGES category are defined as strings or extended regular expressions bounded by double quotation marks ("). These operands are separated from the keyword they define by one or more blank characters (spaces or tabs). Two adjacent double quotation marks ("") indicate an undefined value.
Table 2-3 lists the statement keywords recognized in the LC_MESSAGES category.
Keyword | Description |
---|---|
copy | Specifies the name of an existing locale to be used as the definition of this category. If you specify a copy statement, you cannot specify any other keyword. |
yesexpr | Specifies an extended regular expression that describes the acceptable affirmative response to a question expecting an affirmative or negative response. |
noexpr | Specifies an extended regular expression that describes the acceptable negative response to a question expecting an affirmative or negative response. |
yesstr | Specifies the locale's equivalent of an acceptable affirmative response. This string is accessible to applications through the nl_langinfo subroutine as nl_langinfo(YESSTR). Note that yesstr is likely to be withdrawn from the XPG4 standard; yesexpr is the recommended alternative. |
nostr | Specifies the locale's equivalent of an acceptable negative response. This string is accessible to applications through the nl_langinfo subroutine as nl_langinfo(NOSTR). Note that nostr is likely to be withdrawn from the XPG4 standard; noexpr is the recommended alternative. |
The following is a sample LC_MESSAGES category specified in a locale definition source file:
```
LC_MESSAGES
#
yesexpr "<circumflex><left-square-bracket><y><Y>\
<right-square-bracket>"
noexpr  "<circumflex><left-square-bracket><n><N>\
<right-square-bracket>"
yesstr  "<y><e><s>"
nostr   "<n><o>"
#
END LC_MESSAGES
```
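The sample yesexpr value decodes to the extended regular expression ^[yY], and the noexpr value to ^[nN]. The C sketch below shows how an application might test a reply against such an expression with the POSIX regcomp and regexec functions; the function name matches_response is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if response matches the extended regular
 * expression ere (for example "^[yY]"), 0 if it does not, and -1 if ere
 * cannot be compiled. */
int matches_response(const char *response, const char *ere)
{
    regex_t re;
    int result;

    if (regcomp(&re, ere, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    /* REG_NOSUB: only success or failure is needed, not match offsets. */
    result = regexec(&re, response, 0, NULL, 0) == 0 ? 1 : 0;
    regfree(&re);
    return result;
}
```

In a program, the expression would normally be taken from the current locale, for example matches_response(reply, nl_langinfo(YESEXPR)) after a call to setlocale(LC_ALL, ""); YESEXPR is defined in the langinfo.h header on XSI-conformant systems.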
The LC_MONETARY category defines rules and symbols for formatting
monetary numeric information. This category begins with the LC_MONETARY
header and ends with the END LC_MONETARY trailer.
2.5.1 LC_MONETARY Keywords
All operands for the LC_MONETARY category keywords are defined as string or integer values. String values are bounded by double quotation marks ("). All values are separated from the keyword they define by one or more blank characters (spaces or tabs). Two adjacent double quotation marks ("") indicate an undefined string value. A negative one (-1) indicates an undefined integer value.
Table 2-4 lists the statement keywords recognized in the LC_MONETARY category.
Keyword | Description |
---|---|
copy | Specifies the name of an existing locale to be used as the definition of this category. If you specify a copy statement, you cannot specify any other keyword. |
int_curr_symbol | Specifies the string used for the international currency symbol. The operand for this keyword is a 4-character string. The first three characters contain the alphabetic international currency symbol. The fourth character defines a character separator for insertion between the international currency symbol and a monetary quantity. |
currency_symbol | Specifies the string used for the local currency symbol. |
mon_decimal_point | Specifies the decimal delimiter string used for formatting monetary quantities. |
mon_thousands_sep | Specifies the character separator used for grouping digits to the left of the decimal delimiter in formatted monetary quantities. |
mon_grouping | Specifies a string that defines the size of each group of digits in formatted monetary quantities. The operand for this keyword consists of a sequence of integers separated by semicolons. Each integer specifies the number of digits in a group. The first integer defines the size of the group immediately to the left of the decimal delimiter. Subsequent integers define succeeding groups to the left of the previous group. If the last integer is not -1, it is used to group any remaining digits. If the last integer is -1, no further grouping is performed. For example, formatting the value 123456789 with a mon_thousands_sep operand of ' (single quotation mark) gives these results: 3;-1 produces 123456'789; 3 produces 123'456'789; 3;2;-1 produces 1234'56'789; 3;2 produces 12'34'56'789; and -1 produces 123456789. |
positive_sign | Specifies the string used to indicate a nonnegative-formatted monetary quantity. |
negative_sign | Specifies the string used to indicate a negative-formatted monetary quantity. |
int_frac_digits | Specifies an integer value representing the number of fractional digits (those after the decimal delimiter) to be displayed in a formatted monetary quantity using the int_curr_symbol value. |
frac_digits | Specifies an integer value representing the number of fractional digits (those after the decimal delimiter) to be displayed in a formatted monetary quantity using the currency_symbol value. |
p_cs_precedes | Specifies an integer value indicating whether the int_curr_symbol or currency_symbol string precedes or follows the value for a nonnegative-formatted monetary quantity. The recognized values are 1 (the symbol precedes the value) and 0 (the symbol follows the value). |
p_sep_by_space | Specifies an integer value indicating whether the int_curr_symbol or currency_symbol string is separated by a space from a nonnegative-formatted monetary quantity. The recognized values are 0 (no space separates the symbol from the value), 1 (a space separates the symbol from the value), and 2 (a space separates the symbol and the positive_sign string, if they are adjacent). |
n_cs_precedes | Specifies an integer value indicating whether the int_curr_symbol or currency_symbol string precedes or follows the value for a negative-formatted monetary quantity. The recognized values are 1 (the symbol precedes the value) and 0 (the symbol follows the value). |
n_sep_by_space | Specifies an integer value indicating whether the int_curr_symbol or currency_symbol string is separated by a space from a negative-formatted monetary quantity. The recognized values are 0 (no space separates the symbol from the value), 1 (a space separates the symbol from the value), and 2 (a space separates the symbol and the negative_sign string, if they are adjacent). |
p_sign_posn | Specifies an integer value indicating the positioning of the positive_sign string for a nonnegative-formatted monetary quantity. The recognized values are 0 (parentheses enclose the quantity and the currency symbol), 1 (the sign string precedes the quantity and the currency symbol), 2 (the sign string follows the quantity and the currency symbol), 3 (the sign string immediately precedes the currency symbol), and 4 (the sign string immediately follows the currency symbol). |
n_sign_posn | Specifies an integer value indicating the positioning of the negative_sign string for a negative-formatted monetary quantity. The recognized values are 0 (parentheses enclose the quantity and the currency symbol), 1 (the sign string precedes the quantity and the currency symbol), 2 (the sign string follows the quantity and the currency symbol), 3 (the sign string immediately precedes the currency symbol), and 4 (the sign string immediately follows the currency symbol). |
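The mon_grouping rules can be checked with a short C sketch that inserts a separator into a string of digits, working leftward from the decimal delimiter. The function group_digits is hypothetical, and it represents the operand as an int array (for example {3, -1} for 3;-1). In a program, formatted monetary output would normally come from the locale through a routine such as strfmon rather than from hand-written code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies digits into out, inserting sep according to
 * a mon_grouping operand given as an int array. The first size applies to
 * the group next to the decimal delimiter; a final -1 stops grouping; any
 * other final size repeats for the remaining digits.
 * Returns 0 on success, or -1 if out (cap bytes) may be too small. */
int group_digits(const char *digits, const int *grouping, size_t n_groups,
                 char sep, char *out, size_t cap)
{
    size_t len = strlen(digits);
    size_t gi = 0, n = 0;
    int size = n_groups > 0 ? grouping[0] : -1;
    int count = 0;

    /* Worst case: one separator per digit, plus the terminator. */
    if (cap < 2 * len + 1)
        return -1;
    /* Build the result backwards, starting at the rightmost digit. */
    for (size_t i = len; i > 0; i--) {
        if (size > 0 && count == size) {
            out[n++] = sep;
            count = 0;
            if (gi + 1 < n_groups)
                size = grouping[++gi];  /* next size; -1 ends grouping */
        }
        out[n++] = digits[i - 1];
        count++;
    }
    out[n] = '\0';
    /* Reverse in place to restore left-to-right order. */
    for (size_t a = 0, b = n ? n - 1 : 0; a < b; a++, b--) {
        char c = out[a];
        out[a] = out[b];
        out[b] = c;
    }
    return 0;
}
```

For example, {3, 2} applied to 123456789 with ' as the separator yields 12'34'56'789, because the last size, 2, is reused for the remaining digits.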
Copyright © Compaq Computer Corporation 1998. All rights reserved.