10.9 Collating Functions

In an international environment, string comparison functions need to allow for multipass collations. The collation requirements include:

Collating information is stored in the LC_COLLATE category of a locale. The DEC C RTL includes the strcoll and wcscoll functions that use this collating information to compare two strings.
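As a minimal sketch of how strcoll might be used, the following sorts an array of strings in the collation order of the current locale. The helper names (compare_by_collation, sort_by_collation, use_environment_collation) are hypothetical and chosen only for illustration; only setlocale, strcoll, and qsort are standard library calls.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator (hypothetical name): orders two C strings using the
   LC_COLLATE rules of the current locale. */
static int compare_by_collation(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll(*sa, *sb);
}

/* Sort an array of string pointers in locale collation order. */
void sort_by_collation(const char **words, size_t count)
{
    qsort(words, count, sizeof words[0], compare_by_collation);
}

/* Select the collation rules named by the environment.
   Returns nonzero on success; on failure the locale is left unchanged. */
int use_environment_collation(void)
{
    return setlocale(LC_COLLATE, "") != NULL;
}
```

A program would typically call use_environment_collation once at startup and then call sort_by_collation as needed. If setlocale is never called, the program runs in the "C" locale, where strcoll orders strings the same way strcmp does.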

Multipass collations by strcoll or wcscoll can be slower than comparisons with the strcmp or wcscmp functions. If your program needs to do many string comparisons using strcoll or wcscoll, it may be quicker to transform each string once, using the strxfrm or wcsxfrm function, and then compare the transformed strings with the strcmp or wcscmp function.
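The transform-once approach can be sketched as follows. The function name make_sort_key is hypothetical; the sketch assumes the caller keeps the returned key alongside the original string and frees it when done.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated sort key for s under the current LC_COLLATE
   rules, or NULL if memory cannot be allocated. The caller frees the key.
   (make_sort_key is a hypothetical helper, not a library function.) */
char *make_sort_key(const char *s)
{
    assert(s != NULL);

    /* With a length of zero, strxfrm writes nothing and only reports how
       many bytes the key needs, not counting the terminating null. */
    size_t needed = strxfrm(NULL, s, 0) + 1;

    char *key = malloc(needed);
    if (key != NULL)
        strxfrm(key, s, needed);
    return key;
}
```

Comparing two keys with strcmp gives a result with the same sign as calling strcoll on the original strings, so a program that compares the same strings many times can build each key once and use strcmp from then on. The trade-off is the extra memory for the keys.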

The term collation refers to the relative order of characters. The collation order is locale-specific and might ignore some characters. For example, an American dictionary ignores the hyphen in words and lists take-out between takeoff and takeover.

Comparison, on the other hand, refers to the examination of characters for sameness or difference. For example, takeout and take-out are different words, although they may collate the same.

Suppose an application sorts a list of words so it can later perform a binary search on the list to quickly retrieve a word. Using strcmp, take-in, take-out, and take-up would be grouped in one part of the list. Using strcoll and a locale that ignores hyphens, take-out would be grouped with takeoff and takeover, and would be considered a duplicate of takeout. To keep a binary search from treating takeout as a duplicate of take-out, an application would most likely use strcmp rather than strcoll to order the list.
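A sorted list searched with strcmp might look like the sketch below. The names compare_exact, sort_exact, and find_exact are hypothetical; qsort and bsearch are the standard library routines. Because the same comparator is used for sorting and searching, take-out and takeout remain distinct entries regardless of the locale in effect.

```c
#include <stdlib.h>
#include <string.h>

/* Comparator (hypothetical name) that compares two strings character by
   character, independent of the locale's collation rules. */
static int compare_exact(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Sort the list so that it can be searched with find_exact. */
void sort_exact(const char **words, size_t count)
{
    qsort(words, count, sizeof words[0], compare_exact);
}

/* Binary search for an exact match. Returns the stored string, or NULL
   if the word is not in the list. The list must have been sorted with
   sort_exact first. */
const char *find_exact(const char *word, const char *const *words,
                       size_t count)
{
    const char *const *hit =
        bsearch(&word, words, count, sizeof words[0], compare_exact);
    return hit != NULL ? *hit : NULL;
}
```

If the list also needs to be displayed in dictionary order, one option is to keep this strcmp-ordered list for lookups and produce a separate strcoll-ordered copy for presentation.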

