Compaq C
Compaq C Run-Time Library Reference Manual for 
OpenVMS Systems
 
 
  
Chapter 10 Developing International Software
This chapter describes typical features of international software and 
the features provided with the Compaq C Run-Time Library (RTL) that enable you to design 
and implement international software.
 
See the Reference Section for more detailed information on the 
functions described in this chapter.
10.1 Internationalization Support
 
The Compaq C RTL provides capabilities that let application 
developers create international software. The Compaq C RTL 
obtains information about a language and a culture by reading 
locale files.
10.1.1 Installation
 
If you are using these Compaq C RTL capabilities, you must install 
a separate kit to provide these files to your system.
 
The save set, VMSI18N0nn, is provided on the same media as the 
OpenVMS operating system.
 
To install this save set, follow the standard OpenVMS 
installation procedures using this save set name as the name of the 
kit. There are several categories of locales that you can select to 
install. You can select as many locales as you need by answering the 
following prompts:
 
 
  
    
       
      
   Do you want European and US support?
   Do you want Chinese support?
   Do you want Japanese support?
   Do you want Korean support?
   Do you want Thai support?
   Do you want the Unicode converters?
 
This kit also has an Installation Verification Procedure that Compaq 
recommends you run to verify the correct installation of the kit.
10.1.2 Unicode Support
 
In OpenVMS Version 7.2, the Compaq C Run-Time Library 
added the Universal Unicode locale, which is distributed with the 
OpenVMS system, not with the VMSI18N0nn kit. The name of the 
Unicode locale is:
 
   UTF8-20
 
Like those locales shipped with the VMSI18N0nn kit, the Unicode locale 
is located at the standard location referred to by the SYS$I18N_LOCALE 
logical name.
 
The UTF8-20 Unicode locale is based on Version 2.0 of the Unicode 
standard. The Unicode locale uses UCS-4 as its wide-character encoding 
and UTF-8 as its multibyte character encoding.
 
The Compaq C RTL also includes converters that perform conversions 
between Unicode and any other supported character set. The expanded 
set of converters includes converters for the UCS-2, UCS-4, and UTF-8 
Unicode encodings. The Unicode converters can be used by the ICONV 
CONVERT utility and by the iconv family of functions in the 
Compaq C Run-Time Library.
 
In OpenVMS Version 7.2, the Compaq C Run-Time Library 
added Unicode character set converters for Microsoft Code Page 437.
10.2 Features of International Software
 
International software is software that can support multiple languages 
and cultures. An international program should be able to:
 
  - Display messages in the user's own language. This includes screen 
    displays, error messages, and prompts.
  
  - Handle culture-specific information such as:
  
    - Date and time formatting
      The conventions for representing dates and times vary from one 
      country to another. For example, in the U.S.A. the month is given 
      first; in the U.K. the day is given first. Therefore, the date 
      12/5/1993 is interpreted as December 5, 1993 in the U.S.A., and as 
      May 12, 1993 in the U.K.
  
    - Numeric formatting
      The character that represents the decimal point (the radix 
      character) and the thousands separator character vary from one 
      country to another. For example, in the U.K. the period (.) is the 
      radix character and the comma (,) is the separator. In Germany, 
      the comma is the radix character and the period is the separator. 
      Therefore, the number 2,345.67 in the U.K. is written as 2.345,67 
      in Germany.
  
    - Monetary formatting
      Currency values are represented by different symbols and can be 
      formatted using a variety of separator characters, depending on 
      the currency.
  
  - Handle different coded character sets (not just ASCII).
  
  - Handle a mixture of single-byte and multibyte characters.
  
  - Provide multipass string comparisons.
    String comparison functions such as strcmp compare strings by 
    comparing the codepoint values of the characters in the strings. 
    However, some languages require more complex comparisons to sort 
    strings correctly.
  
To meet the above requirements, an application should not make any 
assumptions about the language, local customs or the coded character 
set used. All this localization data should be defined separately from 
the program, and only bound to it at run-time.
 
The rest of the chapter describes how you can create international 
software using Compaq C.
10.3 Developing International Software Using Compaq C
 
The Compaq C environment provides the following facilities for 
creating international software:
 
  - A method for separating localization data from a program.
    Localization data is held in a database known as a locale. 
    This stores all the language and culture information required by a 
    program. See Section 10.4 for details of the structure of locales. 
    A program specifies which locales to use by calling the setlocale 
    function. See Section 10.5 for more information.
  
  - A method of separating message text from the program source.
    This is achieved using message catalogs that store all the 
    messages for an application. The message catalog is linked to the 
    application at run-time. This means that the messages can be 
    translated into different languages, and the required language 
    version is then selected at run-time. See Section 10.6.
  
  - Compaq C RTL functions that are sensitive to localization data.
    The Compaq C RTL includes functions for converting between codesets 
    (see Section 10.7) and for handling culture-specific information 
    (see Section 10.8).
  
  - A special wide-character data type defined in the Compaq C RTL 
    makes it easier to handle codesets that have a mixture of single-byte 
    and multibyte characters. A set of functions is also defined to 
    support this wide-character data type. See Section 10.9.
  
10.4 Locales
A locale consists of different categories, each of which determines one 
aspect of the international environment. Table 10-1 lists the 
categories in a locale and describes the information in each.
 
 
  Table 10-1 Locale Categories
  
    | Category    | Description                                                          |
    | LC_COLLATE  | Contains information about collating sequences.                      |
    | LC_CTYPE    | Contains information about character classification.                 |
    | LC_MESSAGES | Defines the answers that are expected in response to yes/no prompts. |
    | LC_MONETARY | Contains monetary formatting information.                            |
    | LC_NUMERIC  | Contains information about formatting numbers.                       |
    | LC_TIME     | Contains time and date information.                                  |
 
The locales provided by Compaq reside in the directory defined by the 
SYS$I18N_LOCALE logical name. The file naming convention for locales is:
 
 
  
    
       
      
   language_country_codeset.locale
 
Where:
 
  - language is the mnemonic for the language. For example, EN 
  indicates an English locale.
  
 - country is the mnemonic for the country. For example, GB 
  indicates a British locale.
  
 - codeset is the name of the ISO standard codeset for the 
  locale. For example, ISO8859-1 is the ISO 8859 codeset for the Western 
  European languages. See Section 10.7 for more information about the 
  codesets supported.
  
10.5 Using the setlocale Function to Set Up an International Environment
An application sets up its international environment at run-time by 
calling the setlocale function. The international environment is set up 
in one of two ways:
 
  - The environment is defined by one locale. In this case, each of the 
    locale categories is defined by the same locale.
  
  - Categories are defined separately. This lets you define a mixed 
    environment that uses different locales depending on the operation 
    performed. For example, if an English user has some Spanish files that 
    are to be processed by an application, the LC_COLLATE category could be 
    defined by a Spanish locale while the other categories are defined by 
    an English locale. To do this, you would call setlocale once for each 
    category.
  
The syntax for the setlocale function is:
 
   char *setlocale(int category, const char *locale)
 
Where:
 
  - category is either the name of a category, or LC_ALL. 
  Specifying LC_ALL means that all the categories are defined by the same 
  locale. Specify a category name to set up a mixed environment.
  
  - locale is one of the following:
  
    - The name of the locale to use.
      If you want users to specify the locale interactively, your 
      application could prompt the user for a locale name, and then pass 
      the name as an argument to the setlocale function. A locale name has 
      the following format:
 
         language_country.codeset[@modifier]
 
      For example, setlocale(LC_COLLATE, "en_US.ISO8859-1") selects the 
      locale en_US.ISO8859-1 for the LC_COLLATE category.
  
    - ""
      This causes the function to use logical names to determine the 
      locale for the category specified. See "Specifying the Locale Using 
      Logical Names" for details.
  
If an application does not call the setlocale function, the default 
locale is the C locale. This allows such applications to call those 
functions that use information in the current locale.
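
As an illustration of the mixed environment described above, the following sketch sets every category from one locale and then overrides LC_COLLATE with another. The helper name set_mixed_locale is hypothetical, and whether a given locale name is accepted depends on which locales are installed on the system.

```c
#include <locale.h>
#include <stddef.h>

/*
 * Set every category from base_locale, then override LC_COLLATE with
 * collate_locale. Returns 0 on success, or -1 if setlocale rejects
 * either name (it returns NULL and leaves the locale unchanged).
 */
int set_mixed_locale(const char *base_locale, const char *collate_locale)
{
    if (setlocale(LC_ALL, base_locale) == NULL)
        return -1;
    if (setlocale(LC_COLLATE, collate_locale) == NULL)
        return -1;
    return 0;
}
```

For the English user with Spanish files, a call might look like set_mixed_locale("en_GB.ISO8859-1", "es_ES.ISO8859-1"), assuming both locales are installed; these two names simply follow the format given above and are not taken from a list of shipped locales.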
 
Specifying the Locale Using Logical Names
 
 
If the setlocale function is called with "" as the locale argument, the 
function checks for a number of logical names to determine the locale 
name for the category specified.
 
There are a number of logical names that users can set up to define 
their international environment:
 
  - Logical name corresponding to a category
    For example, the LC_NUMERIC logical name defines the locale 
    associated with the LC_NUMERIC category within the user's environment.
  
  - LC_ALL
  
  - LANG
    The LANG logical name defines the user's language.
  
In addition to the logical names defined by a user, there are a number 
of system-wide logical names, set up during system startup, that define 
the default international environment for all users on a system:
 
  - SYS$category
    Where category is the name of a category. This specifies the system 
    default for that category.
  
  - SYS$LC_ALL
  
  - SYS$LANG
  
The setlocale function checks for user-defined logical names first, and 
if these are not defined, it checks the system logical names.
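
A program that wants the user's environment to drive its behavior typically passes "" once at startup. The sketch below is a minimal example under two assumptions: the helper name adopt_user_locale is hypothetical, and falling back to the C locale on failure is only one reasonable policy. On OpenVMS the lookup uses the logical names listed above; on other systems the same call usually reads environment variables with the same names.

```c
#include <locale.h>
#include <stddef.h>

/*
 * Adopt the locale defined by the user's (or the system's) settings.
 * If the named locale cannot be loaded, fall back to the C locale.
 * Returns the name of the locale now in effect for LC_ALL.
 */
const char *adopt_user_locale(void)
{
    if (setlocale(LC_ALL, "") == NULL) {
        /* The settings named a locale that is not installed. */
        setlocale(LC_ALL, "C");
    }
    /* A NULL locale argument queries the current setting without changing it. */
    return setlocale(LC_ALL, NULL);
}
```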
10.6 Using Message Catalogs
 
An important requirement for international software is that it should 
be able to communicate with the user in the user's own language. The 
messaging system enables program messages to be created separately from 
the program source, and linked to the program at run-time.
 
Messages are defined in a message text source file, and compiled into a 
message catalog using the GENCAT command. The message catalog is 
accessed by a program using the functions provided in the Compaq C RTL.
 
The functions provided to access the messages in a catalog are:
 
  - The catopen function, which opens a specified catalog ready for use.
  
  - The catgets function, which enables the program to read a specific 
    message from a catalog.
  
  - The catclose function, which closes a specified catalog. Open message 
    catalogs are also closed by the exit function.
  
For information on generating message catalogs, see the GENCAT command 
description in the OpenVMS system documentation.
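
The three catalog functions are normally used together. The following sketch opens a catalog, reads one message into a caller-supplied buffer, and closes the catalog again. The helper name fetch_message is hypothetical, and the set and message numbers must match whatever the message text source file defines; the fallback text is used when the catalog or the message cannot be found.

```c
#include <nl_types.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Copy message msg_id of set set_id from the named catalog into out.
 * If the catalog cannot be opened, copy the fallback text instead and
 * return -1. Returns 0 when the catalog was opened.
 */
int fetch_message(const char *catalog, int set_id, int msg_id,
                  const char *fallback, char *out, size_t out_size)
{
    nl_catd catd = catopen(catalog, 0);

    if (catd == (nl_catd)-1) {
        snprintf(out, out_size, "%s", fallback);
        return -1;
    }
    /* catgets returns the fallback pointer when the message is missing. */
    snprintf(out, out_size, "%s", catgets(catd, set_id, msg_id, fallback));
    catclose(catd);
    return 0;
}
```

Copying the text before calling catclose is a deliberate choice here: the pointer returned by catgets may refer to storage owned by the catalog, so it should not be used after the catalog is closed.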
10.7 Handling Different Character Sets
 
The Compaq C RTL supports a number of state-independent codesets and 
codeset encoding schemes that contain the ASCII encoded Portable 
Character Set. It does not support state-dependent codesets. The 
codesets supported are:
 
  - ISO8859-n, where n = 1, 2, 5, 7, 8, or 9. This covers codesets for 
    North America, Europe (West and East), Israel, and Turkey.
  
  - eucJP, SJIS, DECKANJI, SDECKANJI: Codesets used in Japan.
  
  - eucTW, DECHANYU, BIG5, DECHANZI: Chinese codesets used in China 
    (PRC), Hong Kong, and Taiwan.
  
  - DECKOREAN: Codeset used in Korea.
  
10.7.1 Charmap File
The characters in a codeset are defined in a charmap file. The charmap 
files supplied by Compaq are located in the directory defined by the 
SYS$I18N_LOCALE logical name. The file type for a charmap file is .CMAP.
10.7.2 Converter Functions
 
As well as supporting different coded character sets, the Compaq C RTL 
provides the following converter functions that enable you to convert 
characters from one codeset to another:
 
  - iconv_open---specifies the type of conversion. It allocates a 
    conversion descriptor required by the iconv function.
  
  - iconv---converts a sequence of characters in an input buffer to the 
    equivalent characters in a different codeset. The converted characters 
    are stored in a separate output buffer.
  
  - iconv_close---deallocates a conversion descriptor and the resources 
    allocated to the descriptor.
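
The three converter functions are used as a sequence: open a descriptor, convert, close. The sketch below shows one possible wrapper for a single in-memory conversion. The helper name convert_buffer is hypothetical, and the codeset names passed to it must correspond to converters installed on the system.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Convert in_len bytes at in from codeset from_code to codeset to_code,
 * writing the result to out. Returns the number of bytes written, or
 * (size_t)-1 if no converter exists or the conversion fails (for
 * example, because out is too small).
 */
size_t convert_buffer(const char *to_code, const char *from_code,
                      const char *in, size_t in_len,
                      char *out, size_t out_size)
{
    iconv_t cd;
    char *in_ptr = (char *)in;   /* iconv takes char **; the input is only read */
    char *out_ptr = out;
    size_t in_left = in_len;
    size_t out_left = out_size;
    size_t rc;

    cd = iconv_open(to_code, from_code);   /* note: target codeset comes first */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return out_size - out_left;
}
```

For example, convert_buffer("UTF-8", "ISO8859-1", text, length, buffer, sizeof buffer) would convert Latin-1 text to UTF-8, assuming a converter for that pair is available. A program that converts a large file would instead call iconv repeatedly with one descriptor, refilling the input buffer between calls.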
  
10.7.3 Using Codeset Converter Files
The file naming convention for codeset converters is:
 
 
Where fromcode is the name of the source codeset, and 
tocode is the name of the codeset to which characters are 
converted.
 
You can add codeset converters to a given system by installing the 
converter files in the directory pointed to by the logical name 
SYS$I18N_ICONV.
 
Codeset converter files can be implemented either as table-based 
conversion files or as algorithm-based converter files created as 
OpenVMS shareable images.
 
Creating a Table-based Conversion File
 
 
The following summarizes the necessary steps to create a table-based 
codeset converter file:
 
  - Create a text file that describes the mapping of each character 
    in the source codeset to the target codeset. For the format of this 
    file, see the DCL command ICONV COMPILE in the OpenVMS New 
    Features Manual. ICONV COMPILE processes such a file and creates a 
    codeset converter table file.
  
  - Copy the resulting file from the previous step to the directory 
    pointed to by the logical name SYS$I18N_ICONV, assuming you have the 
    privilege to do so.
  
Creating an Algorithm-based Conversion File
 
 
Use the following steps to create an algorithm-based codeset converter 
file implemented as a shareable image:
 
  - Create C source files that implement the codeset converter. The API 
    is documented in the public header file <iconv.h> as follows:
  
    - The universal entry point _u_iconv_open is called by the Compaq C 
      RTL routine iconv_open to initialize a conversion.
    
    - _u_iconv_open returns to iconv_open a pointer to the structure 
      __iconv_extern_obj_t.
    
    - Within this structure, the converter exports its own conversion 
      entry point and conversion close routine, which are called by the 
      Compaq C RTL routines iconv and iconv_close, respectively.
    
    - The major and minor identifier fields are required by iconv_open 
      to test for a possible mismatch between the library and the 
      converter. The converter usually assigns the constants 
      __ICONV_MAJOR and __ICONV_MINOR, defined in the <iconv.h> header 
      file.
    
    - The field tcs_mb_cur_max is used only by the DCL command 
      ICONV CONVERT to optimize its buffer usage. This field reflects the 
      maximum number of bytes that make up a single character in the 
      target codeset, including the shift sequence (if any).
  
  - Compile and link the modules that make up the codeset converter 
    as an OpenVMS shareable image, making sure that the file name 
    adheres to the preceding conventions.
  
  - Copy the resulting file from the previous step to the directory 
    pointed to by the logical name SYS$I18N_ICONV, assuming you have the 
    privilege to do so.
  
Some Final Notes
 
 
SYS$I18N_ICONV is by default a search list. The first directory in 
the list, SYS$SYSROOT:[SYS$I18N.ICONV.USER], is meant for use as a 
site-specific repository for iconv codeset converters.
 
The number of codesets and locales installed varies from system to 
system. Check the SYS$I18N directory tree for the codesets, converters, 
and locales installed on your system.
10.8 Handling Culture-Specific Information
 
Each locale contains the following cultural information:
 
  - Date and time information
    The LC_TIME category defines the conventions for writing date and 
    time, the names of the days of the week, and the names of the months 
    of the year.
  
  - Numeric information
    The LC_NUMERIC category defines the conventions for formatting 
    non-monetary values.
  
  - Monetary information
    The LC_MONETARY category defines currency symbols and the 
    conventions used to format monetary values.
  
  - Yes and no responses
    The LC_MESSAGES category defines the strings expected in response to 
    yes/no questions.
  
You can extract some of this cultural information using the nl_langinfo 
function and the localeconv function. See Section 10.8.1.
10.8.1 Extracting Cultural Information From a Locale
 
The nl_langinfo function returns a pointer to a string that contains an 
item of information obtained from the program's current locale. The 
information you can extract from the locale is:
 
  - Date and time formats
  
 - The names of the days of the week, and months of the year in the 
  local language
  
 - The radix character
  
 - The character used to separate groups of digits in non-monetary 
  values
  
 - The currency symbol
  
 - The name of the codeset for the locale
  
 - The strings defined for responses to yes/no questions
  
The localeconv function returns a pointer to a data structure that 
contains numeric formatting and monetary formatting data from the 
LC_NUMERIC and LC_MONETARY categories.
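
The following sketch shows both functions reading the radix character from the current locale. The helper names copy_langinfo and radix_from_localeconv are hypothetical; the item constants such as RADIXCHAR, D_FMT, and CODESET are the standard ones declared in <langinfo.h>.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Copy one item of locale information into buf and return buf.
 * The string returned by nl_langinfo may be overwritten by a later
 * call, so it is copied immediately.
 */
char *copy_langinfo(nl_item item, char *buf, size_t size)
{
    snprintf(buf, size, "%s", nl_langinfo(item));
    return buf;
}

/* Return the first byte of the radix character reported by localeconv. */
char radix_from_localeconv(void)
{
    struct lconv *conventions = localeconv();

    return conventions->decimal_point[0];
}
```

A program might call copy_langinfo(D_FMT, buf, sizeof buf) to obtain the locale's date format, or copy_langinfo(CODESET, buf, sizeof buf) to obtain the codeset name.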
10.8.2 Date and Time Formatting Functions
 
The functions that use the date and time information are:
 
  - strftime---takes date and time values stored in a data structure and 
    formats them into an output string. The format of the output string 
    is controlled by a format string.
  
  - strptime---converts a string (of type char) into date and time 
    values. A format string defines how the string is interpreted.
  
  - wcsftime---does the same as strftime except that it creates a 
    wide-character string.
  
10.8.3 Monetary Formatting Function
The strfmon function uses the monetary information in a locale to 
convert a number of values into a string. The format of the string is 
controlled by a format string.
10.8.4 Numeric Formatting
 
The information in LC_NUMERIC is used by various functions. For example, 
strtod, wcstod, and the print and scan functions determine the radix 
character from the LC_NUMERIC category.
10.9 Functions for Handling Wide Characters
 
A character can be represented by single-byte or multibyte values 
depending on the codeset. To make it easier to handle both single-byte 
and multibyte characters in the same way, the Compaq C RTL defines a 
wide-character data type, wchar_t. This data type can store 
characters that are represented by 1-, 2-, 3-, or 4-byte values.
 
The functions provided to support wide characters are:
 
  