Guide to DECthreads
This chapter discusses programming disciplines that you should follow
as you use DECthreads routines in your programs. Pertinent examples
include programming for asynchronous execution, choosing a
synchronization mechanism, avoiding priority scheduling problems,
making code thread safe, and working with code that is not thread safe.
3.1 Designing Code for Asynchronous Execution
When programming with threads, always keep in mind that the execution of a thread is inherently asynchronous with respect to other threads running in the system (or in the process).
In short, there is no guarantee of when a thread will start. It can start immediately or not for a significant period of time, depending on the priority of the thread in relation to other threads that are currently running. When a thread will start can also depend on the behavior of other processes, as well as on other threaded subsystems within the current process.
You cannot depend upon any synchronization between two threads unless you explicitly code that synchronization into your program, using a DECthreads synchronization object such as a mutex or a condition variable, or by joining with the other thread (for example, with pthread_join()).
Some implementations of threads operate by context-switching threads in user mode, within a single operating system process. Context switches between such threads occur only at relatively determinate times, such as when you make a blocking call to the threads library or when a timeslice interrupt occurs. This type of threading library might be termed "slightly asynchronous," because such a library tolerates many classes of errors in your application.
Systems that support kernel threads are less "forgiving" because context switches between threads can occur more frequently and for less deterministic reasons. Systems that allow threads within a single process to run simultaneously on multiple processors are even less forgiving.
The following subsections present examples of programming errors.
3.1.1 Avoid Passing Stack Local Data
Avoid creating a thread with an argument that points to stack local data, or to global or static data that is serially reused for a sequence of threads.
Specifically, the thread started with a pointer to stack local data may
not start until the creating thread's routine has returned, and the
storage may have been changed by other calls. The thread started with a
pointer to global or static data may not start until the storage has
been reused to create another thread.
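For example, the following sketch (the routine names are illustrative) shows the unsafe pattern in a comment and a safe alternative that passes heap storage owned by the new thread:

   #include <pthread.h>
   #include <stdio.h>
   #include <stdlib.h>

   void *worker(void *arg)
   {
       int value = *(int *)arg;    /* Copy the argument */
       printf("worker received %d\n", value);
       free(arg);                  /* Heap storage; safe to reclaim here */
       return NULL;
   }

   /*
    * Unsafe: &local points into the creator's stack frame. The new
    * thread might not run until after this routine has returned, by
    * which time the storage is invalid or reused:
    *
    *     int local = 42;
    *     pthread_create(&t, NULL, worker, &local);
    *
    * Safer: pass heap storage that the new thread owns and frees.
    */
   int start_worker(int value, pthread_t *t)
   {
       int *arg = malloc(sizeof *arg);
       if (arg == NULL)
           return -1;
       *arg = value;
       return pthread_create(t, NULL, worker, arg);
   }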
3.1.2 Initialize DECthreads Objects Before Thread Creation
Initialize DECthreads objects (such as mutexes) or global data that a thread uses before creating that thread.
On slightly asynchronous systems this is often safe, because the thread will probably not run until the creator blocks. Thus, the error can go undetected initially. On another system (or in a later release of the operating system) that supports kernel threading, the created thread may run immediately, before the data has been initialized. This can lead to failures that are difficult to detect. Note that a thread may run to completion before the call that created it returns to the creator. The system load can affect the timing as well.
Before your program creates a thread, it should set up everything that the new thread needs in order to execute. For example, if your program must set the new thread's scheduling parameters, do so in the attributes object when you create the thread, rather than trying to use pthread_setschedparam() or other routines afterward. To set global data for the new thread or to create synchronization objects, do so before you create the thread, or else set them in a pthread_once() initialization routine that is called from each thread.
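A minimal sketch of the pthread_once() approach follows (the object and routine names are illustrative):

   #include <pthread.h>

   static pthread_once_t  init_block = PTHREAD_ONCE_INIT;
   static pthread_mutex_t count_lock;      /* protects shared_count */
   static int             shared_count;

   /* Runs exactly once, no matter how many threads call
    * pthread_once() with the same once-control block. */
   static void init_routine(void)
   {
       pthread_mutex_init(&count_lock, NULL);
       shared_count = 0;
   }

   void *worker(void *arg)
   {
       pthread_once(&init_block, init_routine);  /* safe from every thread */
       pthread_mutex_lock(&count_lock);
       shared_count++;
       pthread_mutex_unlock(&count_lock);
       return arg;
   }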
3.1.3 Do Not Use Scheduling As Synchronization
Avoid using scheduling policy and scheduling priority attributes of threads as a synchronization mechanism.
In a uniprocessor system, only one thread can run at a time, and when a higher-priority (real-time policy) thread becomes runnable, it immediately preempts a lower-priority running thread. Therefore, a thread running at higher priority might erroneously be presumed not to need a mutex to access shared data.
On a multiprocessor system, higher- and lower-priority threads are likely to run at the same time. Situations can even arise where higher-priority threads are waiting to run while the threads that are running have a lower priority.
Even if your code will run only on a uniprocessor implementation, never try to use scheduling as a synchronization mechanism. Even on a uniprocessor system, your SCHED_FIFO thread can become blocked on a mutex (perhaps in a called library routine), on an I/O operation, or even on a page fault. Any of these events can allow a lower priority thread to run.
3.2 Memory Synchronization Between Threads
Your multithreaded program must ensure that access to data shared
between threads is synchronized.
The POSIX.1c standard requires that, when calling the following routines, a thread synchronizes its memory access with respect to other threads:
fork()
pthread_create()
pthread_join()
pthread_mutex_lock()
pthread_mutex_trylock()
pthread_mutex_unlock()
pthread_cond_wait()
pthread_cond_timedwait()
pthread_cond_signal()
pthread_cond_broadcast()
sem_post()
sem_trywait()
sem_wait()
wait()
waitpid()
If a call to one of these routines returns an error, synchronization is not guaranteed. For example, an unsuccessful call to pthread_mutex_trylock() does not necessarily provide actual synchronization.
Synchronization is a "protocol" among cooperating threads,
not a single operation. That is, unlocking a mutex does not guarantee
memory synchronization with all other threads---only with threads that
later perform some synchronization operation themselves, such as
locking a mutex.
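For example, in the following sketch (the names are illustrative), the consumer is guaranteed to see the producer's writes only because both threads lock the same mutex:

   #include <pthread.h>

   static pthread_mutex_t data_lock = PTHREAD_MUTEX_INITIALIZER;
   static int data;
   static int data_ready;          /* both protected by data_lock */

   /* Producer: unlocking the mutex synchronizes memory with any
    * thread that subsequently locks the same mutex. */
   void produce(int value)
   {
       pthread_mutex_lock(&data_lock);
       data = value;
       data_ready = 1;
       pthread_mutex_unlock(&data_lock);
   }

   /* Consumer: it sees the producer's write to data because it
    * locks the mutex after the producer unlocked it. */
   int consume(int *out)
   {
       int ready;
       pthread_mutex_lock(&data_lock);
       if ((ready = data_ready) != 0)
           *out = data;
       pthread_mutex_unlock(&data_lock);
       return ready;
   }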
3.3 Using Shared Memory
Most threads do not operate independently. They cooperate to accomplish a task, and cooperation requires communication. There are many ways that threads can communicate, and which method is most appropriate depends on the task.
Threads that cooperate only rarely (for example, a boss thread that only sends off a request for workers to do long tasks) may be satisfied with a relatively slow form of communication. Threads that must cooperate more closely (for example, a set of threads performing a parallelized matrix operation) need fast communication---maybe even to the extent of using machine-specific hardware operations.
Most mechanisms for thread communication involve the use of shared
memory, exploiting the fact that all threads within a process share
their full address space. Although all addresses are shared, there are
three kinds of memory that are characteristically used for
communication. The following sections describe the scope (or, the range
of locations in the program where code can access the memory) and
lifetime (or, the length of time the memory exists) of each of the
three types of memory.
3.3.1 Using Static Memory
Static memory is allocated by the language compiler when it translates source code, so the scope is controlled by the rules of the compiler. For example, in the C language, a variable declared as extern can be accessed anywhere, and a static variable can be referenced within the source module or routine, depending on where it is declared.
In this discussion, static memory is not the same as the C language static storage class. Rather, static memory refers to any variable that is permanently allocated at a particular address for the life of the program.
It is appropriate to use static memory in your multithreaded program when you know that only one instance of an object exists throughout the application. For example, if you want to keep a list of active contexts or a mutex to control some shared resource, you would not want individual threads to have their own copies of that data.
The scope of static memory depends on your programming language's
scoping rules. The lifetime of static memory is the life of the program.
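The C declarations below sketch the common flavors of static memory (the file name and identifiers are illustrative):

   /* module.c */
   int active_count;              /* External linkage: any module can
                                     declare it extern and access it */
   static int module_private;     /* Visible only within this module */

   void record_call(void)
   {
       static int call_count;     /* Visible only within record_call(),
                                     but allocated for the life of the
                                     program */
       call_count++;
   }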
3.3.2 Using Stack Memory
Stack memory is allocated by code generated by the language compiler at run time, generally when a routine is initially called. When the program returns from the routine, the storage ceases to be valid (although the addresses still exist and might be accessible).
Generally, the storage is valid while the routine runs, and the actual address can be calculated and passed to other threads; however, this depends on programming language rules. If you pass the address of stack memory to another thread, you must ensure that all other threads are finished processing that data before the routine returns; otherwise, the storage becomes invalid and its contents can be altered by subsequent calls. The other threads cannot detect that this has happened, and erroneous behavior will result.
The scope of stack memory is the routine or a block within the routine.
The lifetime is no longer than the time during which the routine
executes.
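The following sketch (the names are illustrative) passes the address of stack memory to a worker thread and then joins with that thread before the routine returns, while the storage is still valid:

   #include <pthread.h>

   struct sum_args {
       int a, b;        /* inputs */
       int result;      /* output, written by the worker */
   };

   void *summer(void *arg)
   {
       struct sum_args *s = arg;    /* points into the caller's stack */
       s->result = s->a + s->b;
       return NULL;
   }

   int sum_in_thread(void)
   {
       struct sum_args args = { 3, 4, 0 };   /* stack memory */
       pthread_t t;

       if (pthread_create(&t, NULL, summer, &args) != 0)
           return -1;

       /* Join before returning: the stack storage must remain valid
        * for as long as the other thread can reference it. */
       pthread_join(t, NULL);
       return args.result;
   }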
3.3.3 Using Dynamic Memory
Dynamic memory is allocated by the program as a result of a call to some memory management routine (for example, the C language run-time routine malloc() or the OpenVMS common run-time routine LIB$GET_VM).
Dynamic memory is referenced through pointer variables. Although the pointer variables are scoped depending on their declaration, the dynamic memory itself has no intrinsic scope or lifetime. It can be accessed from any routine or thread that is given its address and will exist until explicitly made free. In a language supporting automatic garbage collection, it will exist until the run-time system detects that there are no references to it. (If your language supports garbage collection, be sure the garbage collector is thread safe.)
The scope of dynamic memory is anywhere a pointer containing the address can be referenced. The lifetime is from allocation to deallocation.
Typically dynamic memory is appropriate to manage persistent context.
For example, in a thread-reentrant routine that is called multiple
times to return a stream of information (such as to list all active
connections to a server or to return a list of users), using dynamic
memory allows the program to create multiple contexts that are
independent of all the program's threads. Thus, multiple threads could
share a given context, or a single thread could have more than one
context.
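A sketch of such a context-based interface follows (the user-list routines and data are hypothetical):

   #include <stdlib.h>
   #include <string.h>

   /* Per-iteration context, allocated dynamically so that it is
    * independent of any particular thread's stack. */
   typedef struct user_ctx {
       size_t next;                /* position in the user list */
   } user_ctx_t;

   user_ctx_t *user_list_open(void)
   {
       user_ctx_t *ctx = malloc(sizeof *ctx);
       if (ctx != NULL)
           ctx->next = 0;
       return ctx;
   }

   /* Each call advances the caller's private context. Multiple
    * contexts can be active at once, in any mix of threads.
    * Assumes len > 0; the list itself is read-only. */
   int user_list_next(user_ctx_t *ctx, char *name, size_t len)
   {
       static const char *users[] = { "alice", "bob" };

       if (ctx->next >= sizeof users / sizeof users[0])
           return 0;               /* end of list */
       strncpy(name, users[ctx->next++], len - 1);
       name[len - 1] = '\0';
       return 1;
   }

   void user_list_close(user_ctx_t *ctx)
   {
       free(ctx);
   }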
3.4 Managing a Thread's Stack
For each thread created by your program, DECthreads sets a default stack size that is acceptable to most applications. You can also set the stacksize attribute in a thread attributes object, to specify the stack size needed by the next thread created.
This section discusses the cases in which the stack size is insufficient (resulting in stack overflow) and how to determine the optimal size of the stack.
Most compilers on Compaq VAX based systems do not probe the stack. Portable code that supports threads should use as little stack memory as practical.
Most compilers on Compaq Alpha based systems generate code in the
procedure prologue that probes the stack, ensuring there is enough
space for the procedure to run.
3.4.1 Sizing the Stack
To determine the optimal size of a thread's stack, multiply the largest number of nested subroutine calls by the size of the call frames and local variables. Add to that number an extra amount of memory to accommodate interrupts. Determining this figure is difficult because stack frames vary in size and because it might not be possible to estimate the depth of library routine call frames.
You can also run your program using a profiling tool that measures
actual stack use. This is commonly done by "poisoning" the
stack before it is used by writing a distinctive pattern, and then
checking for that pattern after the thread completes.
Remember: Use of profiling or monitoring tools typically increases the amount of stack memory that your program uses.
3.4.2 Using a Stack Guard Area
By default, at the overflow end of each thread's stack DECthreads allocates a guard area, or a region of no-access memory. A guard area can help a multithreaded program detect overflow of a thread's stack. When the thread attempts to access a memory location within this region, a memory addressing violation occurs.
For a thread that allocates large data structures on the stack, create that thread using a thread attributes object in which a large guardsize attribute value has been set. A large stack guard region can help to prevent one thread from overflowing into another thread's stack region.
The low-level memory regions that form a stack guard region are also
known as guard pages.
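The following sketch requests a larger guard region when creating such a thread. (The guard size shown is arbitrary, and on some DECthreads releases the attribute routine is named pthread_attr_setguardsize_np() rather than pthread_attr_setguardsize().)

   #include <pthread.h>

   int create_guarded_thread(pthread_t *t,
                             void *(*start)(void *), void *arg)
   {
       pthread_attr_t attr;
       int status;

       status = pthread_attr_init(&attr);
       if (status != 0)
           return status;

       /* Request a 64KB guard region at the overflow end of the
        * new thread's stack. */
       status = pthread_attr_setguardsize(&attr, 64 * 1024);
       if (status == 0)
           status = pthread_create(t, &attr, start, arg);

       pthread_attr_destroy(&attr);
       return status;
   }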
3.4.3 Diagnosing Stack Overflow Errors
A process can produce a memory access violation (or bus error or segmentation fault) when it overflows its stack. As a first step in debugging this behavior, it is often necessary to run the program under the control of your system's debugger to determine which routine's stack has overflowed. However, if the debugger shares resources with the target process (as under OpenVMS), perhaps allocating its own data objects on the target process's stack, the debugger might not operate properly when the stack overflows. In this case, you might be required to analyze the target process by means other than the debugger.
For programs that you cannot run under a debugger, determining a stack overflow is more difficult. This is especially true if the program continues to run after receiving a memory access exception. For example, if a stack overflow occurs while a mutex is locked, the mutex might not be released as the thread recovers or terminates. When the program attempts to lock that mutex again, it could hang.
If a thread receives a memory access exception during a routine call or when accessing a local variable, increase the size of the thread's stack. Of course, not all memory access violations indicate a stack overflow.
To set the stacksize attribute in a thread attributes object, use the
pthread_attr_setstacksize() routine. (See Section 2.3.2.4 for
more information.)
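For example (the stack size shown is arbitrary):

   #include <pthread.h>

   int create_big_stack_thread(pthread_t *t,
                               void *(*start)(void *), void *arg)
   {
       pthread_attr_t attr;
       int status;

       status = pthread_attr_init(&attr);
       if (status != 0)
           return status;

       /* Request a 256KB stack for the thread created with attr. */
       status = pthread_attr_setstacksize(&attr, 256 * 1024);
       if (status == 0)
           status = pthread_create(t, &attr, start, arg);

       pthread_attr_destroy(&attr);
       return status;
   }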
3.5 Scheduling Issues
There are programming issues that are unique to the scheduling
attributes of threads.
3.5.1 Real-Time Scheduling
Use care when writing code that uses real-time scheduling to control the priority of threads; such code is particularly vulnerable to priority inversion.
Priority inversion occurs when the interaction among a group of three or more threads causes that group's highest-priority thread to be blocked from executing. For example, a higher-priority thread waits for a resource locked by a low-priority thread, and the low-priority thread waits while a middle-priority thread executes. The higher-priority thread is made to wait while a thread of lower priority (the middle-priority thread) executes.
You can address the phenomenon of priority inversion in several ways: keep critical sections that are shared across priority levels short, avoid relying on priority for program correctness, and, where the implementation supports it, use mutexes with a priority inheritance protocol, so that a thread holding a mutex temporarily runs at the priority of the highest-priority thread waiting for that mutex.
On DIGITAL UNIX systems, to use high (real-time) thread scheduling priorities, a thread with system contention scope must run in a process with sufficient real-time scheduling privileges. On the other hand, a thread with process contention scope has access to all levels of priority without requiring special real-time scheduling privileges.
Consequently, in a process that is not privileged, when a high-priority thread with process contention scope attempts to create another thread with system contention scope, the creation fails if the created thread's attributes object specifies that the new thread inherit the creating thread's scheduling policy and priority.
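A sketch of creating a system-contention-scope thread with explicitly specified, rather than inherited, scheduling attributes (the routine name is illustrative):

   #include <pthread.h>

   int create_system_scope_thread(pthread_t *t,
                                  void *(*start)(void *), void *arg)
   {
       pthread_attr_t attr;
       int status;

       status = pthread_attr_init(&attr);
       if (status != 0)
           return status;

       /* System contention scope, with the attributes object's own
        * policy and priority rather than the creator's. */
       pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
       pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);

       status = pthread_create(t, &attr, start, arg);
       pthread_attr_destroy(&attr);
       return status;
   }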
3.6 Using Synchronization Objects
The following sections discuss how to determine when to use a mutex versus a condition variable, and how to use mutexes to prevent two erroneous behaviors that are common in multithreaded programs: race conditions and deadlocks.
Also discussed is why you should signal a condition variable with the
associated mutex locked.
3.6.1 Distinguishing Proper Usage of Mutexes and Condition Variables
Use a mutex for tasks with fine granularity. Examples of "fine-grained" tasks include serializing access to shared memory and making simple modifications to shared memory. Such a task typically corresponds to a critical section of a few program statements or fewer.
Mutex waits are not interruptible. Threads waiting to acquire a mutex cannot be alerted or canceled.
Do not use a condition variable to protect access to data. Rather, use it to wait for data to assume a desired state. Always use a condition variable with a mutex that protects the shared data. Condition variable waits are interruptible.
See Section 2.4.1 and Section 2.4.2 for more information about mutexes
and condition variables.
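The following sketch (the names are illustrative) shows the two used together: the condition variable signals a change of state, while the mutex protects the data that represents the state:

   #include <pthread.h>

   static pthread_mutex_t lock     = PTHREAD_MUTEX_INITIALIZER;
   static pthread_cond_t  ready_cv = PTHREAD_COND_INITIALIZER;
   static int ready;               /* predicate, protected by lock */

   /* Wait for the data to assume the desired state. The predicate
    * is retested in a loop because a wait can return early. */
   void wait_until_ready(void)
   {
       pthread_mutex_lock(&lock);
       while (!ready)
           pthread_cond_wait(&ready_cv, &lock);
       pthread_mutex_unlock(&lock);
   }

   /* Change the state and wake all waiters, signaling with the
    * associated mutex locked. */
   void set_ready(void)
   {
       pthread_mutex_lock(&lock);
       ready = 1;
       pthread_cond_broadcast(&ready_cv);
       pthread_mutex_unlock(&lock);
   }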
3.6.2 Avoiding Race Conditions
A race condition occurs when two or more threads perform an operation, and the result of the operation depends on unpredictable timing factors; specifically, when each thread executes and waits and when each thread completes the operation.
For example, if two threads each increment the same shared variable (such as x = x + 1), the result depends on how the threads' load and store operations interleave: the variable might receive both increments, or one increment might be lost and a thread might proceed using the wrong value.
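A minimal sketch of this race and of the mutex that eliminates it (the variable and mutex names are illustrative):

   #include <pthread.h>

   int x = 0;                                   /* shared counter */
   pthread_mutex_t x_lock = PTHREAD_MUTEX_INITIALIZER;

   /* Unsafe: "x = x + 1" compiles to a load, an add, and a store.
    * If two threads interleave between the load and the store, both
    * read the same old value and one increment is lost:
    *
    *     Thread A: load x   (reads 0)
    *     Thread B: load x   (reads 0)
    *     Thread A: store 1
    *     Thread B: store 1  -- x is 1, not 2
    */
   void increment_unsafe(void) { x = x + 1; }

   /* Safe: the mutex makes the load-add-store sequence atomic with
    * respect to every other thread that locks the same mutex. */
   void increment_safe(void)
   {
       pthread_mutex_lock(&x_lock);
       x = x + 1;
       pthread_mutex_unlock(&x_lock);
   }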
Race conditions result from lack of (or ineffectual) synchronization. To avoid race conditions, ensure that any variable modified by more than one thread has only one mutex associated with it, and ensure that all accesses to the variable are made after acquiring that mutex.
See Section 3.6.4 for another example of a race condition.
3.6.3 Avoiding Deadlocks
A deadlock occurs when a thread holding a resource is waiting for a resource held by another thread, while that thread is also waiting for the first thread's resource. Any number of threads can be involved in a deadlock if there is at least one resource per thread. A thread can deadlock on itself. Other threads can also become blocked waiting for resources involved in the deadlock.
Following are two techniques you can use to avoid deadlocks. First, use a fixed locking order: when threads must hold several mutexes at once, associate an order (for example, a sequence number) with each mutex and have every thread acquire the mutexes in that order, so that no thread can wait for a mutex held by a thread that is waiting, directly or indirectly, for one of its own. Second, use "back-off": acquire the first mutex normally, acquire any additional mutex using pthread_mutex_trylock(), and if a try-lock fails, release all of the mutexes already held and start over.
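A minimal sketch of the back-off technique (the routine name is illustrative):

   #include <pthread.h>

   /* Lock two mutexes without risking deadlock: take the first
    * normally, try-lock the second, and if the try-lock fails,
    * release everything and start over. */
   void lock_both(pthread_mutex_t *a, pthread_mutex_t *b)
   {
       for (;;) {
           pthread_mutex_lock(a);
           if (pthread_mutex_trylock(b) == 0)
               return;                  /* both mutexes held */
           pthread_mutex_unlock(a);     /* back off and retry */
           /* A brief yield here (for example, sched_yield()) can
            * help the other thread make progress. */
       }
   }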