
Guide to DECthreads




Chapter 3
Programming with Threads

This chapter discusses programming disciplines that you should follow as you use DECthreads routines in your programs. Pertinent examples include programming for asynchronous execution, choosing a synchronization mechanism, avoiding priority scheduling problems, making code thread safe, and working with code that is not thread safe.

3.1 Designing Code for Asynchronous Execution

When programming with threads, always keep in mind that the execution of a thread is inherently asynchronous with respect to other threads running in the system (or in the process).

In short, there is no guarantee of when a thread will start. It can start immediately, or it might not run for a significant period of time, depending on the priority of the thread relative to other threads that are currently running. When a thread starts can also depend on the behavior of other processes, as well as on other threaded subsystems within the current process.

You cannot depend upon any synchronization between two threads unless you explicitly code that synchronization into your program, using a DECthreads synchronization object (such as a mutex or a condition variable) or a routine that implies synchronization (such as pthread_join()).

Some implementations of threads operate by context-switching threads in user mode, within a single operating system process. Context switches between such threads occur only at relatively determinate times, such as when you make a blocking call to the threads library or when a timeslice interrupt occurs. This type of threading library might be termed "slightly asynchronous," because such a library tolerates many classes of errors in your application.

Systems that support kernel threads are less "forgiving" because context switches between threads can occur more frequently and for less deterministic reasons. Systems that allow threads within a single process to run simultaneously on multiple processors are even less forgiving.

The following subsections present examples of programming errors.

3.1.1 Avoid Passing Stack Local Data

Avoid creating a thread with an argument that points to stack local data, or to global or static data that is serially reused for a sequence of threads.

Specifically, the thread started with a pointer to stack local data may not start until the creating thread's routine has returned, and the storage may have been changed by other calls. The thread started with a pointer to global or static data may not start until the storage has been reused to create another thread.
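
The following sketch (all names are hypothetical) shows the error: the creator passes the address of a stack local variable to the new thread and can return before that thread runs.

  #include <pthread.h>
  #include <stdio.h>

  static void *worker(void *arg)
  {
      int value = *(int *)arg;   /* May read a stale or reused stack location */
      printf("value = %d\n", value);
      return NULL;
  }

  /* WRONG: passes the address of a stack local to the new thread */
  static void start_worker(pthread_t *thread)
  {
      int request = 42;          /* Valid only until this routine returns */
      pthread_create(thread, NULL, worker, &request);
  }   /* "request" may be overwritten before worker() ever runs */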

3.1.2 Initialize DECthreads Objects Before Thread Creation

Initialize DECthreads objects (such as mutexes) or global data that a thread uses before creating that thread.

On slightly asynchronous systems this is often safe, because the thread will probably not run until the creator blocks; thus, the error can go undetected initially. On another system (or in a later release of the operating system) that supports kernel threading, the created thread may run immediately, before the data has been initialized. This can lead to failures that are difficult to detect. Note that a thread may run to completion before the call that created it returns to the creator. The system load may affect the timing as well.

Before your program creates a thread, it should set up everything that the new thread needs in order to execute. For example, if your program must set the new thread's scheduling parameters, do so with an attributes object when you create the thread, rather than trying to use pthread_setschedparam() or other routines afterward. To set global data for the new thread or to create synchronization objects, do so before you create the thread, or else set them from within a pthread_once() initialization routine that is called from each thread.
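
For example, the following sketch (all names are hypothetical) initializes a mutex and sets scheduling attributes in an attributes object before the thread is created:

  #include <pthread.h>
  #include <sched.h>

  static pthread_mutex_t work_lock;       /* Shared by creator and thread */

  static void *worker(void *arg)
  {
      pthread_mutex_lock(&work_lock);     /* Safe: initialized before creation */
      /* ... use the shared data ... */
      pthread_mutex_unlock(&work_lock);
      return NULL;
  }

  int start(void)
  {
      pthread_t      thread;
      pthread_attr_t attr;

      pthread_mutex_init(&work_lock, NULL);  /* Initialize before creating */
      pthread_attr_init(&attr);
      /* Set scheduling in the attributes object, not afterward */
      pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
      pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
      return pthread_create(&thread, &attr, worker, NULL);
  }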

3.1.3 Do Not Use Scheduling As Synchronization

Avoid using scheduling policy and scheduling priority attributes of threads as a synchronization mechanism.

In a uniprocessor system, only one thread can run at a time, and when a higher-priority (real-time policy) thread becomes runnable, it immediately preempts a lower-priority running thread. Therefore, a thread running at higher priority might erroneously be presumed not to need a mutex to access shared data.

On a multiprocessor system, higher- and lower-priority threads are likely to run at the same time. Situations can even arise where higher-priority threads are waiting to run while the threads that are running have a lower priority.

Regardless of whether your code will run only on a uniprocessor implementation, never try to use scheduling as a synchronization mechanism. Even on a uniprocessor system, your SCHED_FIFO thread can become blocked on a mutex (perhaps in a called library routine), on an I/O operation, or even on a page fault. Any of these might allow a lower-priority thread to run.

3.2 Memory Synchronization Between Threads

Your multithreaded program must ensure that access to data shared between threads is synchronized.

The POSIX.1c standard requires that, when calling the following routines, a thread synchronizes its memory access with respect to other threads:
  fork()
  pthread_create()
  pthread_join()
  pthread_mutex_lock()
  pthread_mutex_trylock()
  pthread_mutex_unlock()
  pthread_cond_wait()
  pthread_cond_timedwait()
  pthread_cond_signal()
  pthread_cond_broadcast()
  sem_post()
  sem_trywait()
  sem_wait()
  wait()
  waitpid()

If a call to one of these routines returns an error, synchronization is not guaranteed. For example, an unsuccessful call to pthread_mutex_trylock() does not necessarily provide actual synchronization.

Synchronization is a "protocol" among cooperating threads, not a single operation. That is, unlocking a mutex does not guarantee memory synchronization with all other threads---only with threads that later perform some synchronization operation themselves, such as locking a mutex.
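
The following sketch (names are hypothetical) shows such a protocol: the writer makes its stores visible by unlocking the mutex, and a reader is guaranteed to see them only after it, too, locks the same mutex.

  #include <pthread.h>

  static pthread_mutex_t data_lock = PTHREAD_MUTEX_INITIALIZER;
  static int shared_value;
  static int shared_value_valid;

  void writer(int v)
  {
      pthread_mutex_lock(&data_lock);
      shared_value = v;                  /* Store, then unlock... */
      shared_value_valid = 1;
      pthread_mutex_unlock(&data_lock);
  }

  int reader(int *v)
  {
      int valid;

      pthread_mutex_lock(&data_lock);    /* ...visible after this lock */
      valid = shared_value_valid;
      if (valid)
          *v = shared_value;
      pthread_mutex_unlock(&data_lock);
      return valid;
  }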

3.3 Using Shared Memory

Most threads do not operate independently. They cooperate to accomplish a task, and cooperation requires communication. There are many ways that threads can communicate, and which method is most appropriate depends on the task.

Threads that cooperate only rarely (for example, a boss thread that only sends off a request for workers to do long tasks) may be satisfied with a relatively slow form of communication. Threads that must cooperate more closely (for example, a set of threads performing a parallelized matrix operation) need fast communication---maybe even to the extent of using machine-specific hardware operations.

Most mechanisms for thread communication involve the use of shared memory, exploiting the fact that all threads within a process share their full address space. Although all addresses are shared, there are three kinds of memory that are characteristically used for communication. The following sections describe the scope (that is, the range of locations in the program from which code can access the memory) and the lifetime (the length of time the memory exists) of each of the three types of memory.

3.3.1 Using Static Memory

Static memory is allocated by the language compiler when it translates source code, so the scope is controlled by the rules of the compiler. For example, in the C language, a variable declared as extern can be accessed anywhere, and a static variable can be referenced within the source module or routine, depending on where it is declared.

In this discussion, static memory is not the same as the C language static storage class. Rather, static memory refers to any variable that is permanently allocated at a particular address for the life of the program.

It is appropriate to use static memory in your multithreaded program when you know that only one instance of an object exists throughout the application. For example, if you want to keep a list of active contexts or a mutex to control some shared resource, you would not want individual threads to have their own copies of that data.
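
A minimal sketch of such single-instance objects (names are hypothetical; in C, linkage controls the scope):

  #include <pthread.h>

  /* One instance for the entire application; every thread shares these. */
  pthread_mutex_t resource_lock = PTHREAD_MUTEX_INITIALIZER; /* extern linkage */
  static int active_context_count;    /* Visible only within this module */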

The scope of static memory depends on your programming language's scoping rules. The lifetime of static memory is the life of the program.

3.3.2 Using Stack Memory

Stack memory is allocated by code generated by the language compiler at run time, generally when a routine is initially called. When the program returns from the routine, the storage ceases to be valid (although the addresses still exist and might be accessible).

Generally, the storage is valid while the routine runs, and its address can be calculated and passed to other threads; however, this depends on programming language rules. If you pass the address of stack memory to another thread, you must ensure that all other threads have finished processing that data before the routine returns; otherwise the storage can be reused, and the values might be altered by subsequent calls. The other threads will not be able to determine that this has happened, and erroneous behavior will result.
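
One way to meet that requirement, sketched below with hypothetical names, is to join with each thread that was given the address before the routine returns:

  #include <pthread.h>

  extern void *consumer(void *arg);   /* Hypothetical thread routine */

  void process_buffer(void)
  {
      pthread_t thread;
      char      buffer[512];          /* Stack memory passed to the thread */

      /* ... fill buffer ... */
      pthread_create(&thread, NULL, consumer, buffer);

      /* Wait for the thread before returning, so buffer remains valid */
      pthread_join(thread, NULL);
  }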

The scope of stack memory is the routine or a block within the routine. The lifetime is no longer than the time during which the routine executes.

3.3.3 Using Dynamic Memory

Dynamic memory is allocated by the program as a result of a call to some memory management routine (for example, the C language run-time routine malloc() or the OpenVMS common run-time routine LIB$GET_VM).

Dynamic memory is referenced through pointer variables. Although the pointer variables are scoped depending on their declaration, the dynamic memory itself has no intrinsic scope or lifetime. It can be accessed from any routine or thread that is given its address and will exist until explicitly made free. In a language supporting automatic garbage collection, it will exist until the run-time system detects that there are no references to it. (If your language supports garbage collection, be sure the garbage collector is thread safe.)

The scope of dynamic memory is anywhere a pointer containing the address can be referenced. The lifetime is from allocation to deallocation.

Typically dynamic memory is appropriate to manage persistent context. For example, in a thread-reentrant routine that is called multiple times to return a stream of information (such as to list all active connections to a server or to return a list of users), using dynamic memory allows the program to create multiple contexts that are independent of all the program's threads. Thus, multiple threads could share a given context, or a single thread could have more than one context.
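
For example, the following sketch (all names are hypothetical) allocates an independent context block per stream of calls and returns it to the caller as a handle:

  #include <stdlib.h>

  typedef struct {                /* Hypothetical per-stream context */
      int   next_index;
      void *connection;
  } list_context_t;

  list_context_t *list_open(void)
  {
      list_context_t *ctx = malloc(sizeof *ctx);

      if (ctx != NULL) {
          ctx->next_index = 0;    /* Independent of any particular thread */
          ctx->connection = NULL;
      }
      return ctx;                 /* Caller passes this handle to each call */
  }

  void list_close(list_context_t *ctx)
  {
      free(ctx);                  /* Lifetime ends only when explicitly freed */
  }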

3.4 Managing a Thread's Stack

For each thread created by your program, DECthreads sets a default stack size that is acceptable to most applications. You can also set the stacksize attribute in a thread attributes object to specify the stack size needed by the next thread created.

This section discusses the cases in which the stack size is insufficient (resulting in stack overflow) and how to determine the optimal size of the stack.

Most compilers on Compaq VAX based systems do not probe the stack. Portable code that supports threads should use as little stack memory as practical.

Most compilers on Compaq Alpha based systems generate code in the procedure prologue that probes the stack, ensuring there is enough space for the procedure to run.

3.4.1 Sizing the Stack

To determine the optimal size of a thread's stack, multiply the largest number of nested subroutine calls by the size of the call frames and local variables. Add to that number an extra amount of memory to accommodate interrupts. Determining this figure is difficult because stack frames vary in size and because it might not be possible to estimate the depth of library routine call frames.

You can also run your program using a profiling tool that measures actual stack use. This is commonly done by "poisoning" the stack before it is used, writing a distinctive pattern into it, and then checking for that pattern after the thread completes. Remember: use of profiling or monitoring tools typically increases the amount of stack memory that your program uses.

3.4.2 Using a Stack Guard Area

By default, DECthreads allocates a guard area, a region of no-access memory, at the overflow end of each thread's stack. A guard area can help a multithreaded program detect overflow of a thread's stack: when the thread attempts to access a memory location within this region, a memory addressing violation occurs.

For a thread that allocates large data structures on the stack, create that thread using a thread attributes object in which a large guardsize attribute value has been set. A large stack guard region can help to prevent one thread from overflowing into another thread's stack region.

The low-level memory regions that form a stack guard region are also known as guard pages.
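
Assuming an implementation that provides the POSIX pthread_attr_setguardsize() routine, creating a thread with a larger guard region might look like this sketch (the routine names and the size are hypothetical):

  #include <pthread.h>

  extern void *big_frame_worker(void *arg);   /* Hypothetical routine */

  int create_guarded_thread(pthread_t *thread)
  {
      pthread_attr_t attr;

      pthread_attr_init(&attr);
      /* Request a larger guard region for a thread that allocates
         large structures on its stack; 64 KB is an arbitrary example */
      pthread_attr_setguardsize(&attr, 64 * 1024);
      return pthread_create(thread, &attr, big_frame_worker, NULL);
  }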

3.4.3 Diagnosing Stack Overflow Errors

A process can produce a memory access violation (or bus error or segmentation fault) when it overflows its stack. As a first step in debugging this behavior, it is often necessary to run the program under the control of your system's debugger to determine which routine's stack has overflowed. However, if the debugger shares resources with the target process (as under OpenVMS), perhaps allocating its own data objects on the target process's stack, the debugger might not operate properly when the stack overflows. In this case, you might be required to analyze the target process by means other than the debugger.

For a program that you cannot run under a debugger, diagnosing a stack overflow is more difficult. This is especially true if the program continues to run after receiving a memory access exception. For example, if a stack overflow occurs while a mutex is locked, the mutex might not be released as the thread recovers or terminates. When the program attempts to lock that mutex again, it could hang.

If a thread receives a memory access exception during a routine call or when accessing a local variable, increase the size of the thread's stack. Of course, not all memory access violations indicate a stack overflow.

To set the stacksize attribute in a thread attributes object, use the pthread_attr_setstacksize() routine. (See Section 2.3.2.4 for more information.)
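
For example, this sketch (the worker routine and the size are hypothetical) sets the stacksize attribute before creating a thread:

  #include <pthread.h>

  extern void *deep_recursion_worker(void *arg);  /* Hypothetical routine */

  int create_big_stack_thread(pthread_t *thread)
  {
      pthread_attr_t attr;

      pthread_attr_init(&attr);
      /* Request a stack larger than the default; adjust to measured need */
      pthread_attr_setstacksize(&attr, 512 * 1024);
      return pthread_create(thread, &attr, deep_recursion_worker, NULL);
  }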

3.5 Scheduling Issues

There are programming issues that are unique to the scheduling attributes of threads.

3.5.1 Real-Time Scheduling

Use care when writing code that uses real-time scheduling to control the priority of threads. Real-time scheduling policies can starve lower-priority threads and, as Section 3.5.2 describes, can expose your program to priority inversion; never rely on them in place of explicit synchronization (see Section 3.1.3).

3.5.2 Priority Inversion

Priority inversion occurs when the interaction among a group of three or more threads causes that group's highest-priority thread to be blocked from executing. For example, a higher-priority thread waits for a resource locked by a low-priority thread, and the low-priority thread waits while a middle-priority thread executes. The higher-priority thread is made to wait while a thread of lower priority (the middle-priority thread) executes.

You can address the phenomenon of priority inversion in several ways. One common technique is to protect resources shared across priorities with a mutex that uses a priority-inheritance protocol, so that a low-priority thread holding the mutex temporarily runs at the priority of the highest-priority thread waiting for it. Another is to avoid sharing resources between real-time and non-real-time threads in the first place.
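
On implementations that support the POSIX priority-inheritance protocol for mutexes, initializing such a mutex might look like this sketch (the names are hypothetical):

  #include <pthread.h>

  static pthread_mutex_t resource_lock;

  int init_resource_lock(void)
  {
      pthread_mutexattr_t attr;

      pthread_mutexattr_init(&attr);
      /* While a low-priority thread holds this mutex, it runs at the
         priority of the highest-priority thread waiting for the mutex */
      pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
      return pthread_mutex_init(&resource_lock, &attr);
  }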

3.5.3 Dependencies Among Scheduling Attributes and Contention Scope

On DIGITAL UNIX systems, to use high (real-time) thread scheduling priorities, a thread with system contention scope must run in a process with sufficient real-time scheduling privileges. On the other hand, a thread with process contention scope has access to all levels of priority without requiring special real-time scheduling privileges.

As a consequence, in a process that is not privileged, when a high-priority thread with process contention scope attempts to create another thread with system contention scope, the creation fails if the created thread's attributes object specifies that the new thread inherit the creating thread's scheduling policy and priority.
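
The following sketch (the worker routine is hypothetical) shows a creation that can fail in this way when the creator runs at a high process-scope priority in an unprivileged process:

  #include <pthread.h>

  extern void *worker(void *arg);     /* Hypothetical routine */

  int create_system_scope_thread(pthread_t *thread)
  {
      pthread_attr_t attr;

      pthread_attr_init(&attr);
      pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
      /* Inheriting a high process-scope priority into system scope
         can exceed the process's real-time scheduling privileges */
      pthread_attr_setinheritsched(&attr, PTHREAD_INHERIT_SCHED);
      return pthread_create(thread, &attr, worker, NULL);
  }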

3.6 Using Synchronization Objects

The following sections discuss how to determine when to use a mutex versus a condition variable, and how to use mutexes to prevent two erroneous behaviors that are common in multithreaded programs: race conditions and deadlocks.

Also discussed is why you should signal a condition variable with the associated mutex locked.

3.6.1 Distinguishing Proper Usage of Mutexes and Condition Variables

Use a mutex for tasks with fine granularity. Examples of "fine-grained" tasks are those that serialize access to shared memory or that make simple modifications to shared memory. This typically corresponds to a critical section of a few program statements or less.

Mutex waits are not interruptible. Threads waiting to acquire a mutex cannot be alerted or canceled.

Do not use a condition variable to protect access to data. Rather, use it to wait for data to assume a desired state. Always use a condition variable with a mutex that protects the shared data. Condition variable waits are interruptible.
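
The following sketch (names are hypothetical) shows the canonical pattern: the predicate is tested in a loop while holding the mutex, the condition variable is used only to wait for the predicate to change, and the signal is sent with the associated mutex locked.

  #include <pthread.h>

  static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t  queue_cond = PTHREAD_COND_INITIALIZER;
  static int queue_length;               /* The shared data being protected */

  void wait_for_work(void)
  {
      pthread_mutex_lock(&queue_lock);
      while (queue_length == 0)           /* Always retest the predicate */
          pthread_cond_wait(&queue_cond, &queue_lock);
      queue_length--;                     /* Consume an item under the mutex */
      pthread_mutex_unlock(&queue_lock);
  }

  void add_work(void)
  {
      pthread_mutex_lock(&queue_lock);
      queue_length++;
      pthread_cond_signal(&queue_cond);   /* Signal with the mutex locked */
      pthread_mutex_unlock(&queue_lock);
  }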

See Section 2.4.1 and Section 2.4.2 for more information about mutexes and condition variables.

3.6.2 Avoiding Race Conditions

A race condition occurs when two or more threads perform an operation and the result of the operation depends on unpredictable timing factors: specifically, when each thread executes, when each thread waits, and when each thread completes the operation.

For example, suppose that two threads each execute a routine that increments the same variable (such as x = x + 1). The variable can be incremented twice, yet one of the threads can then act on a value that has changed unexpectedly:

  1. Thread A increments variable x.
  2. Thread A is interrupted (or blocked, or scheduled off), and thread B is started.
  3. Thread B starts and increments variable x.
  4. Thread B is interrupted (or blocked, or scheduled off), and thread A is started.
  5. Thread A checks the value of x and performs an action based on that value.
    The value of x differs from when thread A incremented it, and the program's behavior is incorrect.

Race conditions result from lack of (or ineffectual) synchronization. To avoid race conditions, ensure that any variable modified by more than one thread has only one mutex associated with it, and ensure that all accesses to the variable are made after acquiring that mutex.
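
Applied to the increment example above, a minimal sketch (names are hypothetical):

  #include <pthread.h>

  static pthread_mutex_t x_lock = PTHREAD_MUTEX_INITIALIZER;
  static int x;                       /* Modified by more than one thread */

  void increment_and_act(void)
  {
      pthread_mutex_lock(&x_lock);    /* One mutex guards every access to x */
      x = x + 1;
      /* ... act on the value of x while it cannot change ... */
      pthread_mutex_unlock(&x_lock);
  }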

See Section 3.6.4 for another example of a race condition.

3.6.3 Avoiding Deadlocks

A deadlock occurs when a thread holding a resource is waiting for a resource held by another thread, while that thread is also waiting for the first thread's resource. Any number of threads can be involved in a deadlock if there is at least one resource per thread. A thread can deadlock on itself. Other threads can also become blocked waiting for resources involved in the deadlock.

Following are two techniques you can use to avoid deadlocks: lock ordering, in which every thread acquires mutexes in one fixed order (and releases them in the opposite order), so that a circular wait cannot form; and "backoff", in which a thread acquires its first mutex normally but uses pthread_mutex_trylock() to acquire any additional mutex, releasing all of its mutexes and starting over if a trylock fails. A sketch of the backoff technique follows.
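
The following sketch (the routine name lock_both() is hypothetical) applies backoff to two mutexes:

  #include <pthread.h>
  #include <sched.h>

  /* Backoff: if the second mutex cannot be acquired, release the first
     and retry, so no thread ever waits while already holding a mutex */
  void lock_both(pthread_mutex_t *a, pthread_mutex_t *b)
  {
      for (;;) {
          pthread_mutex_lock(a);
          if (pthread_mutex_trylock(b) == 0)
              return;                 /* Both mutexes are now held */
          pthread_mutex_unlock(a);    /* Back off and try again */
          sched_yield();              /* Let the holder of b make progress */
      }
  }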


