Document revision date: 19 July 1999
When you are signaling a condition variable and that signal might cause the condition variable to be deleted, signal or broadcast the condition variable with the mutex locked.
The following C code fragment is executed by a releasing thread (Thread A) to wake a blocked thread:
   pthread_mutex_lock (m);
   ...
   /* Change shared variables to allow another thread to proceed */
   predicate = TRUE;
   pthread_mutex_unlock (m);      (1)
   pthread_cond_signal (cv);      (2)
The following C code fragment is executed by a potentially blocking thread (thread B):
   pthread_mutex_lock (m);
   while (!predicate)
       pthread_cond_wait (cv, m);
   pthread_mutex_unlock (m);
   pthread_cond_destroy (cv);
These code fragments also demonstrate a race condition; that is, the routine, as coded, depends on a sequence of events among multiple threads, but does not enforce the desired sequence. Signaling the condition variable while still holding the associated mutex eliminates the race condition. Doing so prevents thread B from deleting the condition variable until after thread A has signaled it.
This problem can occur when the releasing thread is a worker thread and the waiting thread is a boss thread, and the last worker thread tells the boss thread to delete the variables that are being shared by boss and worker.
Code the signaling of a condition variable with the mutex locked as follows:
   pthread_mutex_lock (m);
   ...
   /* Change shared variables to allow some other thread to proceed */
   pthread_cond_signal (cv);
   pthread_mutex_unlock (m);
Although it is acceptable to the compiler, it is inappropriate to use the following POSIX.1c standard macros to initialize DECthreads synchronization objects that are allocated on the stack:

   PTHREAD_MUTEX_INITIALIZER
   PTHREAD_COND_INITIALIZER
   PTHREAD_RWLOCK_INITIALIZER
Each thread synchronization object is intended to be shared among the program's threads. If such an object is allocated on the stack, its address can asynchronously become invalid when the thread returns or otherwise terminates. For this reason, Compaq does not recommend allocating any thread synchronization object on the stack.
DECthreads detects some cases of misuse of static initialization of automatically allocated (stack-based) thread synchronization objects. For instance, if the thread on whose stack a statically initialized mutex is allocated attempts to access that mutex, the operation fails and returns [EINVAL]. If the application does not check status returns from DECthreads routines, this failure can remain unidentified. Further, if the operation was a call to pthread_mutex_lock(), the program can encounter a thread synchronization failure, which in turn can result in unexpected program behavior including memory corruption. (For performance reasons, DECthreads does not currently detect this error when a statically initialized mutex is accessed by a thread other than the one on whose stack the object was automatically allocated.)
If your application must allocate a thread synchronization object on
the stack, the application must initialize the object before it is used
by calling one of the routines pthread_mutex_init(),
pthread_cond_init(), or pthread_rwlock_init(), as
appropriate for the object. Your application must also destroy the
thread synchronization object before it goes out of scope (for
instance, due to the routine's returning control or raising an
exception) by calling one of the routines
pthread_mutex_destroy(), pthread_cond_destroy(), or
pthread_rwlock_destroy(), as appropriate for the object.
3.7 Granularity Considerations
Granularity refers to the smallest unit of storage (that is, bytes, words, longwords, or quadwords) that a host computer can load or store in one machine instruction. Granularity considerations can affect the correctness of a program in which concurrent or asynchronous access can occur to data objects stored in the same memory granule. This can occur in a multithreaded program, where different threads access data objects in the same memory granule, or in a single-threaded program that has any of the following characteristics:
The subsections that follow explain the granularity concept, why it can
affect the correctness of a multithreaded program, and techniques the
programmer can use to prevent the granularity-related race condition
known as word tearing.
3.7.1 Determinants of a Program's Granularity
A computer's processor typically makes available some set of granularities to programs, based on the processor's architecture, cache architecture, and instruction set. However, the computer's natural granularity also depends on the organization of the computer's memory and its bus architecture. For example, even if the processor "naturally" reads and writes 8-bit memory granules, a program's memory transfers may, in fact, occur in 32- or 64-bit memory granules.
On a computer that supports a set of granularities, the compiler determines a given program's actual granularity by the instructions it produces for the program to execute. For example, a given compiler on Alpha AXP systems might generate code that causes every memory access to load or store a quadword, regardless of the size of the data object specified in the application's source code. In this case, the application has a quadword actual granularity. For this application, 8-bit, 16-bit, and 32-bit writes are not atomic with respect to other memory operations that overlap the same 64-bit memory granule.
To provide a run-time environment for applications that is consistent and coherent, an operating system's services and libraries should be built so that they provide the same actual granularity. When this is the case, an operating system can be said to provide a system granularity to the applications that it hosts. (A system granularity is typically reflected in the default actual granularity that the system's compilers encode when producing an object file.)
When preparing to port a multithreaded application from one system to
another, you should determine whether there is a difference in the
system granularities between the source and target systems. If the
target system has a larger system granularity than the source system,
you should become informed about the programming techniques presented
in the sections that follow.
3.7.1.1 Alpha AXP Processor Granularity
Systems based on the Alpha AXP processor family have a quadword (64-bit) natural granularity.
Versions EV4 and EV5 of the Alpha AXP processor family provide instructions for only longword- and quadword-length atomic memory accesses. Newer Alpha AXP processors (EV5.6 and later) support byte- and word-length atomic memory accesses as well as longword- and quadword-length atomic memory accesses.
On systems using DIGITAL UNIX Version 4.0 and later: If you use DEC C or DEC C++ to compile your application's modules on a system that uses the EV4 or EV5 version of the Alpha AXP processor, you can use the -arch56 compiler switch to cause the compiler to produce instructions available in the Alpha AXP processor version EV5.6 or later, including instructions for byte- and word-length atomic memory access, as needed. When an application compiled with the -arch56 switch runs under DIGITAL UNIX Version 4.0 or later with a newer Alpha AXP processor (that is, EV5.6 or later), it utilizes that processor's full instruction set. When that same application runs under DIGITAL UNIX Version 4.0 or later with an older Alpha AXP processor (that is, EV4 or EV5), the operating system performs a software emulation of each instruction that is not available to the older processor. See the DEC C and DEC C++ compiler documentation for more information about the -arch56 switch.
On DIGITAL UNIX systems, use the /usr/sbin/psrinfo -v command
to determine the version(s) of your system's Alpha AXP processor(s).
3.7.1.2 VAX Processor Granularity
Systems based on the VAX processor family have longword (32-bit) natural granularity.
For more information about the granularity considerations of porting an
application from an OpenVMS VAX system to an OpenVMS Alpha system,
consult the document Migrating to an OpenVMS AXP System in the
OpenVMS documentation set.
3.7.2 Compiler Support for Determining the Program's Actual Granularity
Table 3-1 summarizes the actual granularities that are provided by the respective compilers on the respective Compaq platforms.
   Platform                                 Compiler   Default       Optional
                                                       Granularity   Granularity
                                                       Setting       Settings
   ---------------------------------------------------------------------------------
   DIGITAL UNIX Version 4.0D (Alpha only)   C/C++      quadword      None
   OpenVMS Alpha Version 7.2                C/C++      quadword      byte, word, longword
   OpenVMS VAX Version 7.2                  C/C++      longword      byte, word
   Windows NT Version 3.51 (Alpha only)     C/C++      longword      byte, word
Of course, for compilers that support an optional granularity setting,
it is possible to compile different modules in your application with
different granularity settings. You might do so to avoid the
possibility of a word-tearing race condition, as described below, or to
improve the application's performance.
3.7.3 Word Tearing
In a multithreaded application, concurrent access by different threads to data objects that occupy the same memory granule can lead to a race condition known as word tearing. This situation occurs when two or more threads independently read the same granule of memory, update different portions of that granule, then independently (that is, asynchronously) store their respective copies of that granule. Because the order of the store operations is indeterminate, only the last thread to write the granule continues with a correct "view" of the granule's contents.
In a multithreaded program the potential for a word-tearing race condition exists only when both of the following conditions are met:
For instance, given a multithreaded program that has been compiled to
have longword actual granularity, if any two of the program's threads
can concurrently update different bytes or words in the same longword,
then that program is, in theory, at risk for encountering a
word-tearing race condition. However, in practice, language-defined
restrictions on the alignments of data objects limit the actual number
of candidates for a word-tearing scenario, as described in the next
section.
3.7.4 Alignments of Members of Composite Data Objects
The only data objects that are candidates for participating in a word-tearing race condition are members of composite data objects---that is, C language structures, unions, and arrays. In other words, the application's threads might access data objects that are members of structures or unions, where those members occupy the same byte, word, longword, or quadword. Similarly, the application might access arrays whose elements occupy the same word, longword, or quadword, or whose elements are themselves composite data objects whose members can do so.
On the other hand, the C language specification allows the compiler to allocate scalar data objects so that each is aligned on a boundary for the memory granule that the compiler prefers, as follows:
For the details of the compiler's rules for aligning scalar and
composite data objects, see the DEC C and C++ compiler documentation
for your application's host platforms.
3.7.5 Avoiding Granularity-Related Errors
Compaq recommends that you inspect your multithreaded application's code to determine whether a word-tearing race condition is possible for any two or more of the application's threads. That is, determine whether any two or more threads can concurrently write contiguously defined members of the same composite data object where those members occupy the same memory granule whose size is greater than or equal to the application's actual granularity.
If you find that you must change your application to avoid a
word-tearing scenario, there are several approaches available. The
simplest techniques require only that you change the definition of the
target composite data object before recompiling the application. The
following sections offer some suggestions.
3.7.5.1 Changing the Composite Data Object's Layout
If you can change the organization or layout of the composite data object's definition, you should do both of the following:
If you cannot change the organization or layout of the composite data object's definition, you should do one of the following:
If you must maintain the composite data object's layout and
cannot change the storage qualifiers for the application's composite
objects, you can instead use the technique described in the next
section.
3.7.5.3 Using One Mutex Per Composite Data Object
If your source code inspection identified an array or a set of contiguously defined structure or union members that is subject to a word-tearing race condition, the program can use a mutex that is dedicated to protect all write accesses by all threads to those data objects, rather than change the definition of the composite data objects.
To use this technique, create a separate mutex for each composite data object where any members share a memory granule that is greater than or equal to the program's actual granularity. For example, given an application with quadword actual granularity, if structure members M1 and M2 occupy the same longword in structure S and those members can be written concurrently by more than one thread, then the application must create and reserve a mutex for use only to protect all write accesses by all threads to those two members.
In general, this is a less desirable technique due to performance
considerations. However, if the absolute number of thread accesses to
the target data objects over the application's run-time will be small,
this technique provides explicit, portable correctness for all thread
accesses to the target members.
3.7.6 Identifying Possible Word-Tearing Situations Using Visual Threads
For DIGITAL UNIX systems, the Visual Threads tool can warn the developer at application run-time that a possible word-tearing situation has been detected. Enable the UnguardedData rule before running the application. This rule causes Visual Threads to track whether any memory location in the application has been accessed using the Load Locked...Store Conditional pair of Alpha AXP instructions, then later accessed using a normal Load...Store instruction pair. See the Visual Threads product's online help for more information.
Visual Threads is available as part of the Developer's Toolkit for
DIGITAL UNIX.
3.8 One-Time Initialization
Your program might have one or more routines that must be executed before any thread executes code in your facility, but that must be executed only once, regardless of the sequence in which threads start executing. For example, your program can initialize mutexes, condition variables, or thread-specific data keys---each of which must be created only once---in a one-time initialization routine.
Use the pthread_once() routine to ensure that your program's initialization routine executes only once---that is, by the first thread that attempts to initialize your program's resources. Multiple threads can call the pthread_once() routine, and DECthreads ensures that the specified routine is called only once.
On the other hand, rather than use the pthread_once() routine, your program can statically initialize a mutex and a flag, then simply lock the mutex and test the flag. In many cases, this technique might be more straightforward to implement.
Finally, you can use implicit (and nonportable) initialization
mechanisms, such as OpenVMS LIB$INITIALIZE, DIGITAL UNIX dynamic loader
__init_ code, or Win32 DLL initialization handlers for Windows
NT and Windows 95.
3.9 Managing Dependencies Upon Other Libraries
Because multithreaded programming has become common only recently, many existing code libraries are incompatible with multithreaded routines. For example, many of the traditional C run-time library routines maintain state across multiple calls using static storage. This storage can become corrupted if routines are called from multiple threads at the same time. Even if the calls from multiple threads are serialized, code that depends upon a sequence of return values might not work.
For example, the UNIX getpwent(2) routine returns the entries in the password file in sequence. If multiple threads call getpwent(2) repeatedly, even if the calls are serialized, no thread can obtain all entries in the password file.
Library routines might be compatible with multithreaded programming to
different extents. The important distinctions are thread reentrancy and
thread safety.
3.9.1 Thread Reentrancy
A routine is thread reentrant if it performs correctly despite being called simultaneously or sequentially by different threads. For example, the standard C run-time library routine strtok() can be made thread reentrant most efficiently by adding an argument that specifies a context for the sequence of tokens. Thus, multiple threads can simultaneously parse different strings without interfering with each other.
The ideal thread-reentrant routine has no dependency on static data. Because static data must be synchronized using mutexes and condition variables, there is always a performance penalty due to the time required to lock and unlock the mutex and also in the loss of potential parallelism throughout the program. A routine that does not use any data that is shared between threads can proceed without locking.
If you are developing new interfaces, make sure that any persistent
context information (like the last-token-returned pointer in
strtok()) is passed explicitly so that multiple threads can
process independent streams of information independently. Return
information to the caller through routine values, output parameters
(where the caller passes the address and length of a buffer), or by
allocating dynamic memory and requiring the caller to free that memory
when finished. Try to avoid using errno for returning error or
diagnostic information; use routine return values instead.
3.9.2 Thread Safety
A routine is thread safe if it can be called simultaneously from multiple threads without risk of corruption. Generally this means that it does some simple level of locking (perhaps using the DECthreads global lock) to prevent simultaneously active calls in different threads. See Section 3.9.3.3 for information about the DECthreads global lock.
Thread-safe routines might be inefficient. For example, a UNIX stdio package that is thread safe might still block all threads in the process while waiting to read or write data to a file.
Routines such as localtime(3) or strtok(), which traditionally rely on static storage, can be made thread safe by using thread-specific data instead of static variables. This prevents corruption and avoids the overhead of synchronization. However, using thread-specific data is not without its own cost, and it is not always the best solution. Using an alternate, reentrant version of the routine, such as the POSIX strtok_r() interface, is preferable.