Compaq C
Compaq C Run-Time Library Reference Manual for
OpenVMS Systems
Chapter 10 Developing International Software
This chapter describes typical features of international software and
the features provided with the Compaq C Run-Time Library (RTL) that enable you to design
and implement international software.
See the Reference Section for more detailed information on the
functions described in this chapter.
10.1 Internationalization Support
The Compaq C RTL has added capabilities to allow application
developers to create international software. The Compaq C RTL
obtains information about a language and a culture by reading this
information from locale files.
10.1.1 Installation
If you are using these Compaq C RTL capabilities, you must install
a separate kit to provide these files to your system.
The save set, VMSI18N0nn, is provided on the same media as the
OpenVMS operating system.
To install this save set, follow the standard OpenVMS
installation procedures using this save set name as the name of the
kit. There are several categories of locales that you can select to
install. You can select as many locales as you need by answering the
following prompts:
Do you want European and US support?
Do you want Chinese support?
Do you want Japanese support?
Do you want Korean support?
Do you want Thai support?
Do you want the Unicode converters?
This kit also has an Installation Verification Procedure that Compaq
recommends you run to verify the correct installation of the kit.
10.1.2 Unicode Support
In OpenVMS Version 7.2, the Compaq C Run-Time Library
added the Universal Unicode locale, which is distributed with the
OpenVMS system, not with the VMSI18N0nn kit. The name of the
Unicode locale is UTF8-20.
Like those locales shipped with the VMSI18N0nn kit, the Unicode locale
is located at the standard location referred to by the SYS$I18N_LOCALE
logical name.
The UTF8-20 Unicode locale is based on Version 2.0 of the Unicode standard. The
Unicode locale uses UCS-4 as wide-character encoding and UTF-8 as
multibyte character encoding.
The Compaq C RTL also includes converters that perform conversions
between Unicode and any other supported character sets. The expanded
set of converters includes converters for UCS-2, UCS-4, and UTF-8
Unicode encoding. The Unicode converters can be used by the ICONV
CONVERT utility and by the iconv family of functions in the Compaq C
Run-Time Library.
In OpenVMS Version 7.2, the Compaq C Run-Time Library
added Unicode character set converters for Microsoft Code Page 437.
10.2 Features of International Software
International software is software that can support multiple languages
and cultures. An international program should be able to:
- Display messages in the user's own language. This includes screen
displays, error messages and prompts.
- Handle culture-specific information such as:
- Date and time formatting
The conventions for representing
dates and times vary from one country to another. For example, in the
U.S.A., the month is given first; in the U.K., the day is specified
first. Therefore, the date 12/5/1993 is interpreted as December 5, 1993
in the U.S.A., and as May 12, 1993 in the U.K.
- Numeric formatting
The character that represents the decimal
point (the radix character) and the thousands separator character vary
from one country to another. For example, in the U.K. the period (.) is
used to represent the radix character, and the comma is used as a
separator. However, in Germany, the comma is used as the radix
character and the period is the separator character. Therefore, the
number 2,345.67 in the U.K. is the same as 2.345,67 in Germany.
- Monetary formatting
Currency values are represented by
different symbols and can be formatted using a variety of separator
characters, depending on the currency.
- Handle different coded character sets (not just ASCII).
- Handle a mixture of single and multibyte characters.
- Provide multipass string comparisons.
String comparison functions such as strcmp compare strings by comparing
the codepoint values of the characters in the strings. However, some
languages require more complex comparisons to correctly sort strings.
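As an illustration of the string-comparison requirement, the following minimal sketch sorts an array of strings with the standard strcoll function, which compares according to the LC_COLLATE category of the current locale instead of by codepoint value. The function names sort_words and compare_collated are hypothetical and chosen only for this example.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator that orders strings by the collating rules of the
   current LC_COLLATE locale rather than by codepoint value. */
static int compare_collated(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll(*sa, *sb);
}

/* Sorts an array of strings in place using locale-aware collation.
   (Hypothetical helper, not part of the RTL.) */
void sort_words(const char **words, size_t count)
{
    qsort(words, count, sizeof words[0], compare_collated);
}
```

The resulting order depends on the locale selected for LC_COLLATE before the call; in the C locale, strcoll orders strings the same way strcmp does.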
To meet the above requirements, an application should not make any
assumptions about the language, local customs or the coded character
set used. All this localization data should be defined separately from
the program, and only bound to it at run-time.
The rest of the chapter describes how you can create international
software using Compaq C.
10.3 Developing International Software Using Compaq C
The Compaq C environment provides the following facilities for
creating international software:
- A method for separating localization data from a program.
Localization data is held in a database known as a locale.
This stores all the language and culture information required by a
program. See Section 10.4 for details of the structure of locales.
A program specifies what locales to use by calling the setlocale
function. See Section 10.5 for more information.
- A method of separating message text from the program source.
This is achieved using message catalogs that store all the
messages for an application. The message catalog is linked to the
application at run-time. This means that the messages can be translated
into different languages and then the required language version is
selected at run-time. See Section 10.6.
- Compaq C RTL functions that are sensitive to localization data.
The Compaq C RTL includes functions for:
- A special wide-character data type defined in the Compaq C RTL
makes it easier to handle codesets that have a mixture of single and
multibyte characters. A set of functions is also defined to support
this wide character data type. See Section 10.9.
10.4 Locales
A locale consists of different categories, each of which determines one
aspect of the international environment. Table 10-1 lists the
categories in a locale and describes the information in each.
Table 10-1 Locale Categories
Category | Description
LC_COLLATE | Contains information about collating sequences.
LC_CTYPE | Contains information about character classification.
LC_MESSAGES | Defines the answers that are expected in response to yes/no prompts.
LC_MONETARY | Contains monetary formatting information.
LC_NUMERIC | Contains information about formatting numbers.
LC_TIME | Contains time and date information.
The locales provided by Compaq reside in the directory defined by the
SYS$I18N_LOCALE logical name. The file naming convention for locales is:
language_country_codeset.locale
Where:
- language is the mnemonic for the language. For example, EN
indicates an English locale.
- country is the mnemonic for the country. For example, GB
indicates a British locale.
- codeset is the name of the ISO standard codeset for the
locale. For example, ISO8859-1 is the ISO 8859 codeset for the Western
European languages. See Section 10.7 for more information about the
codesets supported.
10.5 Using the setlocale Function to Set Up an International Environment
An application sets up its international environment at run-time by
calling the setlocale function. The international environment is set
up in one of two ways:
- The environment is defined by one locale. In this case, each of the
locale categories is defined by the same locale.
- Categories are defined separately. This lets you define a mixed
environment that uses different locales depending on the operation
performed. For example, if an English user has some Spanish files that
are to be processed by an application, the LC_COLLATE category could be
defined by a Spanish locale while the other categories are defined by
an English locale. To do this you would call setlocale once for each
category.
The syntax for the setlocale function is:
char *setlocale(int category, const char *locale)
Where:
- category is either the name of a category, or LC_ALL.
Specifying LC_ALL means that all the categories are defined by the same
locale. Specify a category name to set up a mixed environment.
- locale is one of the following:
- The name of the locale to use.
If you want users to specify the
locale interactively, your application could prompt the user for a
locale name, and then pass the name as an argument to the setlocale
function. A locale name has the following format:
language_country.codeset[@modifier]
For example, setlocale(LC_COLLATE, "en_US.ISO8859-1") selects the
locale en_US.ISO8859-1 for the LC_COLLATE category.
- ""
This causes the function to use logical names to determine
the locale for the category specified. See Specifying the Locale Using Logical Names for details.
If an application does not call the setlocale function, the default
locale is the C locale. This allows such applications to call those
functions that use information in the current locale.
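The mixed environment described in this section can be sketched as follows. This is a minimal illustration, not a prescribed pattern: the helper name setup_mixed_environment is hypothetical, and any locale names passed to it must correspond to locales actually installed on the system.

```c
#include <locale.h>
#include <stddef.h>

/* Sets every category from base_locale, then overrides only LC_COLLATE
   with collate_locale. Returns 0 on success, -1 if either call fails.
   (Hypothetical helper, not part of the RTL.) */
int setup_mixed_environment(const char *base_locale,
                            const char *collate_locale)
{
    if (setlocale(LC_ALL, base_locale) == NULL)
        return -1;
    if (setlocale(LC_COLLATE, collate_locale) == NULL)
        return -1;
    return 0;
}
```

For the English user with Spanish files, a call might look like setup_mixed_environment("en_US.ISO8859-1", "es_ES.ISO8859-1"), where the Spanish locale name is an assumption that follows the naming format shown above. Checking the return value matters because setlocale returns a null pointer when the requested locale cannot be selected.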
Specifying the Locale Using Logical Names
If the setlocale function is called with "" as the locale argument, the
function checks for a number of logical names to determine the locale
name for the category specified.
There are a number of logical names that users can set up to define
their international environment:
- Logical name corresponding to a category
For example, the
LC_NUMERIC logical name defines the locale associated with the
LC_NUMERIC category within the user's environment.
- LC_ALL
- LANG
The LANG logical name defines the user's language.
In addition to the logical names defined by a user, there are a number
of system-wide logical names, set up during system startup, that define
the default international environment for all users on a system:
- SYS$category
Where category is the name of a
category. This specifies the system default for that category.
- SYS$LC_ALL
- SYS$LANG
The setlocale function checks for user-defined logical names first, and
if these are not defined, it checks the system logical names.
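A program that wants to honor the user's settings typically passes "" at startup. The sketch below does that and falls back to the C locale if the request fails; the function name init_locale_from_environment is hypothetical. On systems other than OpenVMS, the same call consults environment variables rather than logical names.

```c
#include <locale.h>
#include <stddef.h>

/* Asks the RTL to choose the locale from the user's environment and
   falls back to the C locale if that fails. Returns the name of the
   locale now in effect. (Hypothetical helper, not part of the RTL.) */
const char *init_locale_from_environment(void)
{
    const char *name = setlocale(LC_ALL, "");
    if (name == NULL)
        name = setlocale(LC_ALL, "C");
    return name;
}
```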
10.6 Using Message Catalogs
An important requirement for international software is that it should
be able to communicate with the user in the user's own language. The
messaging system enables program messages to be created separately from
the program source, and linked to the program at run-time.
Messages are defined in a message text source file, and compiled into a
message catalog using the GENCAT command. The message catalog is
accessed by a program using the functions provided in the Compaq C RTL.
The functions provided to access the messages in a catalog are:
- The catopen function, which opens a specified catalog ready for use.
- The catgets function, which enables the program to read a specific
message from a catalog.
- The catclose function, which closes a specified catalog. Open message
catalogs are also closed by the exit function.
For information on generating message catalogs, see the GENCAT command
description in the OpenVMS system documentation.
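The three catalog functions are usually combined as in the following sketch. The catalog name "myapp", the set and message numbers, and the helper names lookup_message and print_greeting are all hypothetical. The sketch assumes that catopen returns (nl_catd)-1 on failure, as X/Open specifies, and supplies a built-in default string so the program still produces output when no catalog is available.

```c
#include <nl_types.h>
#include <stdio.h>

/* Returns message msg_id of set set_id from an open catalog, or
   fallback when the catalog could not be opened or the message is
   missing. (Hypothetical helper, not part of the RTL.) */
const char *lookup_message(nl_catd catalog, int set_id, int msg_id,
                           const char *fallback)
{
    if (catalog == (nl_catd)-1)
        return fallback;
    return catgets(catalog, set_id, msg_id, fallback);
}

/* Opens the (hypothetical) catalog "myapp", prints message 1 of set 1,
   and closes the catalog again. */
void print_greeting(void)
{
    nl_catd catalog = catopen("myapp", NL_CAT_LOCALE);
    puts(lookup_message(catalog, 1, 1, "Hello"));
    if (catalog != (nl_catd)-1)
        catclose(catalog);
}
```

Passing NL_CAT_LOCALE asks catopen to select the catalog according to the LC_MESSAGES category, so the program should call setlocale before opening the catalog.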
10.7 Handling Different Character Sets
The Compaq C RTL supports a number of state-independent codesets and
codeset encoding schemes that contain the ASCII encoded Portable
Character Set. It does not support state-dependent codesets. The
codesets supported are:
- ISO8859-n
where n = 1, 2, 5, 7, 8, or 9. This
covers codesets for North America, Europe (West and East), Israel, and
Turkey.
- eucJP, SJIS, DECKANJI, SDECKANJI: Codesets used in Japan.
- eucTW, DECHANYU, BIG5, DECHANZI: Chinese codesets used in China
(PRC), Hong Kong, and Taiwan.
- DECKOREAN: Codeset used in Korea.
10.7.1 Charmap File
The characters in a codeset are defined in a charmap file. The charmap
files supplied by Compaq are located in the directory defined by the
SYS$I18N_LOCALE logical name. The file type for a charmap file is .CMAP.
10.7.2 Converter Functions
As well as supporting different coded character sets, the Compaq C RTL
provides the following converter functions that enable you to convert
characters from one codeset to another:
- iconv_open---specifies the type of conversion. It allocates a
conversion descriptor required by the iconv function.
- iconv---converts a sequence of characters in an input buffer to the
equivalent characters in a different codeset. The converted characters
are stored in a separate output buffer.
- iconv_close---deallocates a conversion descriptor and the resources
allocated to the descriptor.
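The three converter functions are normally used together, as in this minimal sketch that converts one buffer in a single call. The helper name convert_buffer is hypothetical, and the codeset names a caller passes (for example "ISO8859-1" and "UTF-8") are assumptions that must match converters installed on the system. Because only state-independent codesets are supported, the sketch does not flush any shift state.

```c
#include <iconv.h>
#include <stddef.h>

/* Converts in_len bytes of in from codeset from_code to codeset
   to_code, writing at most out_size bytes to out. Returns the number
   of bytes written, or (size_t)-1 on any failure.
   (Hypothetical helper, not part of the RTL.) */
size_t convert_buffer(const char *from_code, const char *to_code,
                      const char *in, size_t in_len,
                      char *out, size_t out_size)
{
    /* Note the argument order: the target codeset comes first. */
    iconv_t cd = iconv_open(to_code, from_code);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in_ptr = (char *)in;   /* iconv reads but does not modify input */
    char *out_ptr = out;
    size_t in_left = in_len;
    size_t out_left = out_size;

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return out_size - out_left;
}
```

A production converter loop would also inspect errno after a failed iconv call, for example to enlarge the output buffer and continue when it is full.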
10.7.3 Using Codeset Converter Files
The file naming convention for codeset converters is:
Where fromcode is the name of the source codeset, and
tocode is the name of the codeset to which characters are
converted.
You can add codeset converters to a given system by installing the
converter files in the directory pointed to by the logical name
SYS$I18N_ICONV.
Codeset converter files can be implemented either as table-based
conversion files or as algorithm-based converter files created as
OpenVMS shareable images.
Creating a Table-based Conversion File
The following summarizes the necessary steps to create a table-based
codeset converter file:
- Create a text file that describes the mapping between any character
from the source codeset to the target codeset. For the format of this
file, see the DCL command ICONV COMPILE in the OpenVMS New
Features Manual, which processes such a file and creates a codeset
converter table file.
- Copy the resulting file from the previous step to the directory
pointed to by the logical name SYS$I18N_ICONV, assuming you have the
privilege to do so.
Creating an Algorithm-based Conversion File
Use the following steps to create an algorithm-based codeset converter
file implemented as a shareable image:
- Create C source files that implement the codeset converter. The API
is documented in the public header file <iconv.h> as follows:
- The universal entry point _u_iconv_open is called by the Compaq C RTL
routine iconv_open to initialize a conversion.
- _u_iconv_open returns to iconv_open a pointer to the structure
__iconv_extern_obj_t.
- Within this structure, the converter exports its own conversion
entry point and conversion close routine, which are called by the
Compaq C RTL routines iconv and iconv_close, respectively.
- The major and minor identifier fields are required by iconv_open to
test for a possible mismatch between the library and the converter.
The converter usually assigns the constants __ICONV_MAJOR and
__ICONV_MINOR, defined in the <iconv.h> header file.
- The field tcs_mb_cur_max is used only by the DCL command
ICONV CONVERT to optimize its buffer usage. This field reflects the
maximum number of bytes that comprise a single character in the target
codeset, including the shift sequence (if any).
- Compile and link the modules that comprise the codeset converter
as an OpenVMS shareable image, making sure that the file name
adheres to the preceding conventions.
- Copy the resulting file from the previous step to the directory
pointed to by the logical name SYS$I18N_ICONV, assuming you have the
privilege to do so.
Some Final Notes
SYS$I18N_ICONV is by default a search list in which the first
directory, SYS$SYSROOT:[SYS$I18N.ICONV.USER], is meant for use as a
site-specific repository for iconv codeset converters.
The number of codesets and locales installed varies from system to
system. Check the SYS$I18N directory tree for the codesets, converters,
and locales installed on your system.
10.8 Handling Culture-Specific Information
Each locale contains the following cultural information:
- Date and time information
The LC_TIME category defines the
conventions for writing date and time, the names of the days of the
week, and the names of months of the year.
- Numeric information
The LC_NUMERIC category defines the
conventions for formatting non-monetary values.
- Monetary information
The LC_MONETARY category defines currency
symbols and the conventions used to format monetary values.
- Yes and no responses
The LC_MESSAGES category defines the
strings expected in response to yes/no questions.
You can extract some of this cultural information using the
nl_langinfo function and the localeconv function. See Section 10.8.1.
10.8.1 Extracting Cultural Information From a Locale
The nl_langinfo function returns a pointer to a string that contains an
item of information obtained from the program's current locale. The
information you can extract from the locale is:
- Date and time formats
- The names of the days of the week, and months of the year in the
local language
- The radix character
- The character used to separate groups of digits in non-monetary
values
- The currency symbol
- The name of the codeset for the locale
- The strings defined for responses to yes/no questions
The localeconv function returns a pointer to a data structure that
contains numeric formatting and monetary formatting data from the
LC_NUMERIC and LC_MONETARY categories.
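As a minimal sketch of both functions, the following helper builds a one-line summary of the current locale. The function name describe_locale and the output layout are hypothetical; CODESET and D_FMT are standard nl_langinfo item names, and decimal_point is a standard member of the structure returned by localeconv.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Writes a one-line summary of the current locale's codeset, radix
   character, and date format into buf. Returns the snprintf result.
   (Hypothetical helper, not part of the RTL.) */
int describe_locale(char *buf, size_t size)
{
    const struct lconv *conv = localeconv();
    return snprintf(buf, size, "codeset=%s radix=%s date=%s",
                    nl_langinfo(CODESET),
                    conv->decimal_point,
                    nl_langinfo(D_FMT));
}
```

The strings returned by nl_langinfo and localeconv may be overwritten by later calls to these functions or to setlocale, so copy them if they are needed for longer.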
10.8.2 Date and Time Formatting Functions
The functions that use the date and time information are:
- strftime---takes date and time values stored in a data structure and
formats them into an output string. The format of the output string is
controlled by a format string.
- strptime---converts a string (of type char) into date and time
values. A format string defines how the string is interpreted.
- wcsftime---does the same as strftime except that it creates a
wide-character string.
10.8.3 Monetary Formatting Function
The strfmon function uses the monetary information in a locale to
convert a number of values into a string. The format of the string is
controlled by a format string.
10.8.4 Numeric Formatting
The information in LC_NUMERIC is used by various functions. For example,
strtod, wcstod, and the print and scan functions determine the radix
character from the LC_NUMERIC category.
10.9 Functions for Handling Wide Characters
A character can be represented by single-byte or multibyte values
depending on the codeset. To make it easier to handle both single-byte
and multibyte characters in the same way, the Compaq C RTL defines a
wide-character data type, wchar_t. This data type can store
characters that are represented by 1-, 2-, 3-, or 4-byte values.
The functions provided to support wide characters are: