10.8 Functions for Handling Wide Characters

A character can be represented by single-byte or multibyte values depending on the codeset. To make it easier to handle both single-byte and multibyte characters in the same way, the DEC C RTL defines a wide-character data type, wchar_t. This data type can store characters that are represented by 1-, 2-, 3-, or 4-byte values.

The functions provided to support wide characters are:

Character classification functions. See Section 10.8.1.
Case conversion functions. See Section 10.8.2.
Input and output functions. See Section 10.8.3.
Multibyte to wide-character conversion functions. See Section 10.8.4.
Wide-character to multibyte conversion functions. See Section 10.8.4.
Wide-character string manipulation functions. See Section 10.8.5.
Wide-character string collation and comparison functions. See Section 10.9.

10.8.1 Character Classification Functions

The LC_CTYPE category in a locale classifies the characters in the locale's codeset into different types (alphabetic, numeric, lowercase, uppercase, and so on). There are two sets of functions, one for wide characters and one for single-byte characters, that test whether a character is of a specific type. The is* functions test single-byte characters, and the isw* functions test wide characters.

For example, the iswalnum function tests whether a wide character is classed as either alphabetic or numeric. It returns a nonzero value if the character is one of these types. For more information about the classification functions, see Chapter 3 and the Reference Section.
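As an illustration of the isw* functions, the following minimal sketch counts the alphanumeric characters in a wide-character string. The helper name count_alnum is hypothetical and not part of the RTL; the result depends on the LC_CTYPE category of the current locale.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: count the wide characters in s that the current
   locale classes as alphabetic or numeric. */
size_t count_alnum(const wchar_t *s)
{
    size_t n = 0;

    for (; *s != L'\0'; s++) {
        if (iswalnum((wint_t)*s))   /* nonzero means alphabetic or numeric */
            n++;
    }
    return n;
}
```

For example, count_alnum(L"ab 12!") counts the four characters a, b, 1, and 2 in the default "C" locale.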

10.8.2 Case Conversion Functions

The LC_CTYPE category defines mappings between pairs of characters of the locale. The most common character mapping is between uppercase and lowercase characters. However, a locale can define mappings other than case mappings.

Two functions are provided to map one character to another according to the information in the LC_CTYPE category of the locale:

wctrans: looks for the named mapping (predefined in the locale) between characters.
towctrans: maps one character to another according to the named mapping given to the wctrans function.
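The two-step lookup can be sketched as follows, using the ISO C names wctrans and towctrans. The helper name map_char is hypothetical. The mapping names "toupper" and "tolower" are the ones that ISO C requires every locale to provide; any other mapping name is locale-specific.

```c
#include <wctype.h>

/* Hypothetical helper: map wc through the named mapping of the current
   locale. If the locale does not define the mapping, return wc unchanged. */
wint_t map_char(wint_t wc, const char *mapping)
{
    wctrans_t desc = wctrans(mapping);   /* look up the named mapping */

    if (desc == 0)                       /* zero: mapping name not valid */
        return wc;
    return towctrans(wc, desc);          /* apply the mapping */
}
```

In a loop that maps many characters, calling wctrans once and reusing the descriptor avoids repeating the name lookup.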

Two functions are provided for character case mapping:

towlower: maps an uppercase wide character to its lowercase equivalent.
towupper: maps a lowercase wide character to its uppercase equivalent.
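A minimal sketch of case mapping applied to a whole string follows. The helper name upcase_in_place is hypothetical. Characters that have no uppercase equivalent in the current locale are returned unchanged by towupper.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: convert every lowercase wide character in s to
   uppercase, modifying the string in place. */
void upcase_in_place(wchar_t *s)
{
    for (; *s != L'\0'; s++)
        *s = (wchar_t)towupper((wint_t)*s);
}
```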

For more information about these functions, see the Reference Section.

10.8.3 Functions for Input and Output of Wide Characters

The set of input and output functions manages wide characters and wide-character strings.

Read Functions

The functions for reading wide characters and wide-character strings are fgetwc, fgetws, getwc, and getwchar.

There is also an ungetwc function that pushes a wide character back into the input stream.

Write Functions

The functions for writing wide characters and wide-character strings are fputwc, fputws, putwc, and putwchar.
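The read and write functions can be combined as in the following sketch, which writes a wide-character string to a temporary file and reads it back. The helper name roundtrip_line is hypothetical, and the use of tmpfile is only an assumption made to keep the example self-contained.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: write line to a temporary file, then read it back
   into out (capacity outlen wide characters). Returns 0 on success and
   -1 on failure. */
int roundtrip_line(const wchar_t *line, wchar_t *out, int outlen)
{
    FILE *fp = tmpfile();
    wint_t first;
    int status = -1;

    if (fp == NULL)
        return -1;

    if (fputws(line, fp) >= 0) {              /* write the string */
        rewind(fp);
        first = fgetwc(fp);                   /* read one wide character */
        if (first != WEOF &&
            ungetwc(first, fp) != WEOF &&     /* push it back */
            fgetws(out, outlen, fp) != NULL)  /* read the whole line */
            status = 0;
    }
    fclose(fp);
    return status;
}
```

The call to ungetwc shows that a character pushed back is returned again by the next read, so fgetws still sees the complete line.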

Scan Functions

All the scan functions allow for a culture-specific radix character, as defined in the LC_NUMERIC category of the current locale.

The %lc, %C, %ls, and %S conversion specifiers enable the scan functions fwscanf, wscanf, swscanf, fscanf, scanf, and sscanf to read in wide characters.
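As a sketch of the %ls specifier with a byte-oriented scan function, the following hypothetical helper scan_word reads one whitespace-delimited word from a multibyte string into a wide-character buffer. The field width of 15 is an assumption chosen to match a 16-element buffer.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: read the first whitespace-delimited word of text
   into out as wide characters. out must have room for 16 wide characters.
   Returns 0 on success and -1 if no word was converted. */
int scan_word(const char *text, wchar_t *out)
{
    /* The field width of 15 limits the input consumed, so at most 15 wide
       characters plus the terminating null are stored. */
    return sscanf(text, "%15ls", out) == 1 ? 0 : -1;
}
```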

Print Functions

All the print functions can format numeric values according to the data in the LC_NUMERIC category of the current locale.

The %lc, %C, %ls, and %S conversion specifiers, when used with the print functions, convert wide characters to multibyte characters and print the resulting characters.
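The conversion performed by the print functions can be sketched with snprintf, which formats into a char buffer. The helper name format_wide is hypothetical. Only the ISO C specifiers %ls and %lc are used here.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: format the wide string ws followed by the wide
   character wc into buf as multibyte characters. Returns the value of
   snprintf: the number of bytes needed, or a negative value on error. */
int format_wide(char *buf, size_t size, const wchar_t *ws, wchar_t wc)
{
    /* %ls converts a wide-character string, %lc a single wide character. */
    return snprintf(buf, size, "%ls%lc", ws, (wint_t)wc);
}
```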

See Chapter 2 for details of all input and output functions.

10.8.4 Functions for Converting Multibyte and Wide Characters

Wide characters are used internally by an application to manage single-byte or multibyte characters. However, text files are generally stored in multibyte character format. To process these files, the multibyte characters must be converted to wide-character format by using the following functions:

mbtowc, mbrtowc, btowc: convert one multibyte character to a wide character (btowc accepts only a character that is represented by a single byte).
mbsrtowcs, mbstowcs: convert a multibyte character string to a wide-character string.

Similarly, the following functions convert wide characters into their multibyte equivalent:

wcrtomb, wctomb, wctob: convert a single wide character to a multibyte character (wctob succeeds only if the character has a single-byte representation).
wcsrtombs, wcstombs: convert a wide-character string to a multibyte character string.

In addition to these conversion functions, the mblen and mbrlen functions determine the size, in bytes, of a multibyte character.
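The string conversions in both directions, together with mblen, can be sketched as follows. The helper names to_wide, to_multibyte, and first_char_len are hypothetical. Each conversion helper reserves one element of the output buffer for the terminating null, because mbstowcs and wcstombs do not write one when the output fills up.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert the multibyte string mbs to wide characters
   in out (capacity outlen elements). Returns the number of wide characters
   stored, not counting the terminator, or (size_t)-1 on error. */
size_t to_wide(const char *mbs, wchar_t *out, size_t outlen)
{
    size_t n;

    if (outlen == 0)
        return (size_t)-1;
    n = mbstowcs(out, mbs, outlen - 1);
    if (n != (size_t)-1)
        out[n] = L'\0';              /* guarantee termination */
    return n;
}

/* Hypothetical helper: convert the wide string ws to multibyte characters
   in out (capacity outlen bytes). Returns the number of bytes stored, not
   counting the terminator, or (size_t)-1 on error. */
size_t to_multibyte(const wchar_t *ws, char *out, size_t outlen)
{
    size_t n;

    if (outlen == 0)
        return (size_t)-1;
    n = wcstombs(out, ws, outlen - 1);
    if (n != (size_t)-1)
        out[n] = '\0';               /* guarantee termination */
    return n;
}

/* Hypothetical helper: size in bytes of the first multibyte character of s,
   0 if s is empty, or -1 if the bytes do not form a valid character. */
int first_char_len(const char *s)
{
    return mblen(s, MB_CUR_MAX);
}
```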

10.8.5 Functions for Manipulating Wide-Character Strings and Arrays

The DEC C RTL contains a set of functions (the wcs* and wmem* functions) that manipulate wide-character strings and arrays. For example, the wcscat function appends a wide-character string to the end of another string in the same way that the strcat function works on character strings of type char.
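Like strcat, wcscat does not check the size of the destination. The following sketch wraps it in a hypothetical helper, append_wide, that checks the available space first; the capacity is counted in wide characters, not bytes.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: append src to dst if the result, including the
   terminating null, fits in dstlen wide characters. Returns 0 on success
   and -1 if there is not enough room (dst is then left unchanged). */
int append_wide(wchar_t *dst, size_t dstlen, const wchar_t *src)
{
    if (wcslen(dst) + wcslen(src) + 1 > dstlen)
        return -1;
    wcscat(dst, src);
    return 0;
}
```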

See Chapter 3 for details of the string manipulation functions.
