10.9.1 Character Classification Functions
The LC_CTYPE category in a locale classifies the characters in the locale's codeset into different types (alphabetic, numeric, lowercase, uppercase, and so on). There are two sets of functions, one for wide characters and one for single-byte characters, that test whether a character is of a specific type. The is* functions test single-byte characters, and the isw* functions test wide characters.
For example, the iswalnum function tests if a wide character is classed as either alphabetic or numeric. It returns a nonzero value if the character is one of these types. For more information about the classification functions see Chapter 3 and the Reference Section.
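As an illustration of the classification functions, the following sketch counts the wide characters in a string that iswalnum reports as alphabetic or numeric. The helper name count_alnum is hypothetical, and the result depends on the LC_CTYPE category of whatever locale is current when it is called.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: count the wide characters in s that the current
   locale's LC_CTYPE category classes as alphabetic or numeric. */
size_t count_alnum(const wchar_t *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != L'\0'; s++) {
        if (iswalnum((wint_t)*s))   /* nonzero means alphabetic or numeric */
            n++;
    }
    return n;
}
```

Called on a string such as L"VMS 7.0", the function skips the space and the period and counts only the letters and digits.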
The LC_CTYPE category defines mappings between pairs of characters of the locale. The most common character mapping is between uppercase and lowercase characters. However, a locale can support mappings other than case mappings. Two functions are provided to map one character to another according to the information in the LC_CTYPE category of the locale:
Two functions are provided for character case mapping:
For more information about these functions, see the Reference Section.
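The character mappings described above can be sketched with the standard C functions towupper, wctrans, and towctrans. Whether these are the exact functions the lists above referred to is an assumption, and the helper names upcase_in_place and map_by_name are hypothetical.

```c
#include <assert.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: convert s to uppercase in place, using the case
   mapping of the current locale's LC_CTYPE category. */
void upcase_in_place(wchar_t *s)
{
    assert(s != NULL);
    for (; *s != L'\0'; s++)
        *s = (wchar_t)towupper((wint_t)*s);
}

/* Hypothetical helper: apply a mapping selected by name, such as
   "toupper" or "tolower". Returns wc unchanged if the locale does not
   define a mapping with that name. */
wint_t map_by_name(wint_t wc, const char *mapping)
{
    wctrans_t desc;

    assert(mapping != NULL);
    desc = wctrans(mapping);          /* zero if the name is not known */
    return desc ? towctrans(wc, desc) : wc;
}
```

The name-based form is the one that can reach mappings other than case mappings, provided the locale defines them.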
The set of input and output functions manages wide characters and wide-character strings. The functions for reading wide characters and wide-character strings are fgetwc, fgetws, getwc, and getwchar. There is also an ungetwc function that pushes a wide character back into the input stream. The functions for writing wide characters and wide-character strings are fputwc, fputws, putwc, and putwchar.
All the scan functions allow for a culture-specific radix character, as defined in the LC_NUMERIC category of the current locale. The %lc, %C, %ls, and %S conversion specifiers enable the scan functions fwscanf, wscanf, swscanf, fscanf, scanf, and sscanf to read in wide characters.
All the print functions can format numeric values according to the data in the LC_NUMERIC category of the current locale. When used with the print functions, the %lc, %C, %ls, and %S conversion specifiers convert wide characters to multibyte characters and print the resulting characters.
See Chapter 2 for details of all input and output functions.
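The %ls conversion specifier can be tried without any file I/O by formatting into a wide-character buffer and scanning it back. This is a minimal sketch that assumes the standard C swprintf function (not named in the text above) alongside swscanf. The helper name wide_round_trip and the NAME_MAX_CHARS limit are hypothetical.

```c
#include <assert.h>
#include <wchar.h>

#define NAME_MAX_CHARS 32   /* hypothetical capacity of the output name */

/* Hypothetical helper: write "name count" into a wide buffer, then
   parse it back. Returns 1 if both fields were recovered, else 0. */
int wide_round_trip(const wchar_t *name, int count,
                    wchar_t name_out[NAME_MAX_CHARS], int *count_out)
{
    wchar_t line[64];

    assert(name != NULL && name_out != NULL && count_out != NULL);

    /* %ls formats a wide-character string argument */
    if (swprintf(line, sizeof line / sizeof line[0],
                 L"%ls %d", name, count) < 0)
        return 0;

    /* %31ls reads one whitespace-delimited wide string of at most
       31 characters, leaving room for the terminator */
    return swscanf(line, L"%31ls %d", name_out, count_out) == 2;
}
```

The field width on %31ls matters: without it, a long input field could overflow the destination array.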
Wide characters are used internally by an application to manage single-byte or multibyte characters. However, text files are generally stored in multibyte character format. To process these files, the multibyte characters need converting to wide-character format. This can be achieved using the following functions:
Similarly, the following functions convert wide characters into their multibyte equivalent:
Associated with these conversion functions, the mblen and mbrlen functions are used to determine the size of a multibyte character.
Several of the wide-character functions take an argument of type "pointer to mbstate_t", where mbstate_t is an opaque data type (like FILE or fpos_t) intended to keep the conversion state for state-dependent codesets.
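A conversion loop that carries an mbstate_t object might look like the following sketch. It uses the standard C mbrtowc function, which is an assumption here because the list of conversion functions did not survive in the text above. The helper name mb_to_wide is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert the multibyte string src to wide
   characters in dst (capacity dst_len, terminator included). Returns
   the number of wide characters stored, or (size_t)-1 if src holds an
   invalid sequence or dst is too small. */
size_t mb_to_wide(const char *src, wchar_t *dst, size_t dst_len)
{
    mbstate_t state;
    size_t remaining;
    size_t n = 0;

    assert(src != NULL && dst != NULL && dst_len > 0);
    memset(&state, 0, sizeof state);   /* zeroed object = initial state */
    remaining = strlen(src);

    while (remaining > 0) {
        wchar_t wc;
        size_t used = mbrtowc(&wc, src, remaining, &state);

        if (used == (size_t)-1 || used == (size_t)-2)
            return (size_t)-1;         /* invalid or incomplete character */
        if (used == 0)
            break;                     /* converted a null character */
        if (n + 1 >= dst_len)
            return (size_t)-1;         /* no room for this character */
        dst[n++] = wc;
        src += used;                   /* step over the bytes consumed */
        remaining -= used;
    }
    dst[n] = L'\0';
    return n;
}
```

Keeping the state object outside the loop is what lets a state-dependent codeset carry shift information from one character to the next.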
The Compaq C RTL contains a set of functions (the wcs* and wmem* functions) that manipulate wide-character strings. For example, the wcscat function appends a wide-character string to the end of another string in the same way that the strcat function works on character strings of type char.
See Chapter 3 for details of the string manipulation functions.
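For instance, wcscat can be combined with wcslen and wcscpy exactly as strcat is combined with strlen and strcpy. In this sketch the helper name join_wide is hypothetical, and the length check is added because wcscat itself does not check the destination size.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: store first followed by second in out, which
   has room for out_len wide characters. Returns out, or NULL if the
   result (terminator included) would not fit. */
wchar_t *join_wide(wchar_t *out, size_t out_len,
                   const wchar_t *first, const wchar_t *second)
{
    assert(out != NULL && first != NULL && second != NULL);
    if (wcslen(first) + wcslen(second) + 1 > out_len)
        return NULL;
    wcscpy(out, first);
    wcscat(out, second);    /* append, as strcat does for char strings */
    return out;
}
```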
In an international environment, string comparison functions need to allow for multipass collations. The collation requirements include:
Collating information is stored in the LC_COLLATE category of a locale. The Compaq C RTL includes the strcoll and wcscoll functions that use this collating information to compare two strings. Multipass collations by strcoll or wcscoll can be slower than using the strcmp or wcscmp functions. If your program needs to do many string comparisons using strcoll or wcscoll , it may be quicker to transform the strings once, using the strxfrm or wcsxfrm function, and then use the strcmp or wcscmp function. The term collation refers to the relative order of characters. The collation order is locale-specific and might ignore some characters. For example, an American dictionary ignores the hyphen in words and lists take-out between takeoff and takeover. Comparison, on the other hand, refers to the examination of characters for sameness or difference. For example, takeout and take-out are different words, although they may collate the same. Suppose an application sorts a list of words so it can later perform a binary search on the list to quickly retrieve a word. Using strcmp , take-in, take-out, and take-up would be grouped in one part of the table. Using strcoll and a locale that ignores hyphens, take-out would be grouped with takeoff and takeover, and would be considered a duplicate of takeout. To avoid a binary search finding takeout as a duplicate of take-out, an application would most likely use strcmp rather than strcoll for forming a binary tree.
Chapter 11
Function | Description |
---|---|
asctime | Converts a broken-down time from localtime into a 26-character string. |
ctime | Converts a time, in seconds, since 00:00:00, January 1, 1970 to an ASCII string of the form generated by the asctime function. |
ftime | Returns the elapsed time since 00:00:00, January 1, 1970 in the structure pointed to by its argument. |
getclock | Gets the current value of the system-wide clock. |
gettimeofday | Gets the date and time. |
gmtime | Converts time units to GMT (Greenwich Mean Time). |
localtime | Converts a time (expressed as the number of seconds elapsed since 00:00:00, January 1, 1970) into hours, minutes, seconds, and so on. |
mktime | Converts a local time structure to a calendar time value. |
time | Returns the time elapsed since 00:00:00, January 1, 1970, in seconds. |
tzset | Sets and accesses time-zone conversion. |
Also, the time-related information returned by fstat and stat uses the new date/time model described in the next section.
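Two of the functions in the table can be chained to turn a time value into text. The following sketch interprets the value as UTC with gmtime and formats it with asctime. The helper name utc_string is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copy the asctime form of t, interpreted as UTC,
   into buf, which must hold at least 26 bytes. Returns buf, or NULL if
   the time cannot be converted. */
char *utc_string(time_t t, char buf[26])
{
    const struct tm *fields;
    const char *text;

    assert(buf != NULL);
    fields = gmtime(&t);         /* seconds since the Epoch -> fields */
    if (fields == NULL)
        return NULL;
    text = asctime(fields);      /* fields -> 26-character string */
    if (text == NULL)
        return NULL;
    strncpy(buf, text, 25);      /* 25 visible characters, newline last */
    buf[25] = '\0';
    return buf;
}
```

Copying the result matters because gmtime and asctime return pointers to static storage that later calls may overwrite.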
11.1 Date/Time Support Models
Beginning with OpenVMS Version 7.0, the Compaq C RTL changed its date/time support model from one based on local time to one based on Universal Coordinated Time (UTC). This allows the Compaq C RTL to implement ANSI C/POSIX functionality that previously could not be implemented. A UTC time-based model also makes the Compaq C RTL compatible with the behavior of the Tru64 UNIX time functions.
By default, newly compiled programs will generate entry points into UTC-based date/time routines.
For compatibility with OpenVMS systems prior to Version 7.0, previously compiled programs that relink on an OpenVMS Version 7.0 system will retain local-time-based date/time support. Relinking alone will not access UTC support.
Compiling programs with the _DECC_V4_SOURCE and _VMS_V6_SOURCE feature-test macros defined will also enable local-time-based entry points. That is, the new OpenVMS Version 7.0 date/time functions will not be enabled.
Functions with both UTC-based and local-time-based entry points are:
ctime, fstat, ftime, gmtime, localtime, mktime, stat, strftime, time, wcsftime
Introducing a UTC-based date/time model implies a certain loss of performance, because time-related functions supporting UTC must read and interpret time-zone files instead of doing simple computations in memory, as was done for the date/time model based on local time. To reduce this performance degradation, OpenVMS Version 7.1 and higher can maintain a process-wide cache of time-zone files. The size of the cache (that is, the number of files held in memory) is determined by the value of the logical name DECC$TZ_CACHE_SIZE. The default value is 2. Because the time-zone files are relatively small (about 3 blocks each), you might consider defining DECC$TZ_CACHE_SIZE as the maximum number of time zones used by the application. For example, the default cache size fits an application that does not switch time zones during the run and runs on a system where the TZ environment variable is defined with both a standard and a summer time zone.
In the UTC-based model, times are represented as seconds since the Epoch. The Epoch is defined as the time 0 hours, 0 minutes, 0 seconds, January 1, 1970 UTC. Seconds since the Epoch is a value interpreted as the number of seconds between a specified time and the Epoch.
The functions time and ftime return the time as seconds since the Epoch.
The functions ctime, gmtime, and localtime take as their argument a time value that represents the time in seconds from the Epoch.
The function mktime converts a broken-down time, expressed as local time, into a time value in terms of seconds since the Epoch.
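The relationship between localtime and mktime can be sketched with a small calendar calculation: break a time value down into local fields, adjust one field, and let mktime turn the result back into seconds since the Epoch. The helper name add_local_days is hypothetical.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: return the time value that falls `days` local
   calendar days after t at the same wall-clock time, or (time_t)-1 if
   the conversion fails. */
time_t add_local_days(time_t t, int days)
{
    const struct tm *p = localtime(&t);   /* Epoch seconds -> local fields */
    struct tm fields;

    if (p == NULL)
        return (time_t)-1;
    fields = *p;                /* copy out of localtime's static storage */
    fields.tm_mday += days;     /* mktime normalizes out-of-range fields */
    fields.tm_isdst = -1;       /* let mktime decide whether DST applies */
    return mktime(&fields);     /* local fields -> Epoch seconds */
}
```

Working in broken-down local time rather than adding multiples of 86400 seconds keeps the wall-clock time the same on days when daylight saving time begins or ends.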
The values st_ctime, st_atime, and st_mtime returned in the stat structure by the stat and fstat functions are also in terms of UTC.
Time support new to OpenVMS Version 7.0 includes the functions tzset, gettimeofday, and getclock, and the external variables tzname, timezone, and daylight.
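The interaction between tzset and the external variables might be exercised as in the following sketch. It makes several assumptions: a POSIX-style host where setenv is available and the TZ environment variable accepts a specification such as "EST5EDT", and the _XOPEN_SOURCE definition, which is there only so that strict compilers on such hosts expose these declarations. The helper name describe_zone is hypothetical.

```c
#define _XOPEN_SOURCE 700   /* assumption: POSIX/XSI host; exposes setenv,
                               tzset, tzname, timezone, and daylight */
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: switch the process to the time zone named by tz
   and report what tzset derived from it. Returns 0 on success, -1 if
   the environment could not be changed. */
int describe_zone(const char *tz, long *seconds_west,
                  const char **std_name, int *has_dst)
{
    assert(tz != NULL && seconds_west != NULL);
    assert(std_name != NULL && has_dst != NULL);

    if (setenv("TZ", tz, 1) != 0)
        return -1;
    tzset();                    /* re-read TZ and set the externals */

    *seconds_west = timezone;   /* standard time, seconds west of UTC */
    *std_name = tzname[0];      /* standard-time abbreviation */
    *has_dst = daylight != 0;   /* nonzero if the zone has DST rules */
    return 0;
}
```

Note that the call changes the time zone for the whole process, so later calls to localtime and mktime use the new rules.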
The UTC-based time model enables the Compaq C RTL to:
Universal Coordinated Time (UTC) is an international standard for measuring time of day. Under the UTC time standard, zero hours occurs when the Greenwich Meridian is at midnight. UTC has the advantage of always increasing, unlike local time, which can move backward or forward when daylight saving time begins or ends.
Also, UTC has two additional components:
For the Compaq C RTL time support to work correctly on OpenVMS Version 7.0 and higher, the following must be in place:
For more information, see the section on setting up your system to compensate for different time zones in your OpenVMS System Manager's Manual: Essentials.
The Compaq C RTL uses local time-zone conversion rules to compute local time from UTC, as follows:
By default, the time-zone conversion rules used for computing local time from UTC are specified in time-zone files defined by the SYS$LOCALTIME and SYS$POSIXRULES system logicals. These logicals are set during an OpenVMS installation to point to time-zone files that represent the system's best approximation to local wall-clock time:
SYS$POSIXRULES can be the same as SYS$LOCALTIME. See the tzset function for more information.
11.4 Time-Zone Conversion Rule Files
The time-zone files pointed to by the SYS$LOCALTIME and SYS$POSIXRULES logicals are part of a public-domain, time-zone support package installed on OpenVMS Version 7.0 and higher systems.
This support package includes a series of source files that describe the time-zone conversion rules for computing local time from UTC in time zones worldwide. OpenVMS Version 7.0 and higher systems provide a time-zone compiler called ZIC. The ZIC compiler compiles time-zone source files into binary files that the Compaq C RTL reads to acquire time-zone conversion specifications. For more information on the format of these source files, see the OpenVMS system documentation for ZIC.
The time-zone files are organized as follows:
Several of the time-zone files have names based on acronyms for the areas that they represent. Table 11-2 lists these acronyms.
Time-Zone Acronym | Description |
---|---|
CET | Central European Time |
EET | Eastern European Time |
Factory | Specifies No Time Zone |
GB-Eire | Great Britain/Ireland |
GMT | Greenwich Mean Time |
NZ | New Zealand |
NZ-CHAT | New Zealand, Chatham Islands |
MET | Middle European Time |
PRC | People's Republic of China |
ROC | Republic of China |
ROK | Republic of Korea |
SystemV | Specific to System V operating system |
UCT | Universal Coordinated Time |
US | United States |
UTC | Universal Coordinated Time |
Universal | Universal Coordinated Time |
W-SU | Western Soviet Union (Moscow time) |
WET | Western European Time |
A mechanism is available for you to define and implement your own time-zone rules. For more information, see the OpenVMS system documentation on the ZIC compiler and the description of tzset in the reference section of this manual.
Also, the SYS$LOCALTIME and SYS$POSIXRULES system logicals can be redefined to user-supplied time zones.