Document revision date: 19 July 1999

OpenVMS Debugger Manual




Chapter 17
Debugging Tasking Programs

This chapter describes features of the debugger that are specific to multithread programs (also called tasking programs). Tasking programs consist of multiple tasks, or threads, executing concurrently in a single process. Within the debugger, the term task denotes such a flow of control regardless of the language or implementation. The debugger's tasking support applies to all such programs. These programs include the following:

Note

Within the debugger, the terms task and thread are synonyms.

When you are debugging programs linked with PTHREAD$RTL Version 7.1 or greater, you can directly access the DECthreads debugger with the PTHREAD command.

In this chapter, any language-specific information or information specific to DECthreads is identified as such. Section 17.1 provides a cross-reference between DECthreads terminology and Ada tasking terminology.

The features described in this chapter enable you to perform functions such as:

When using these features, remember that the debugger might alter the behavior of a tasking program from run to run. For example, while you are suspending execution of the currently active task at a breakpoint, the delivery of an asynchronous system trap (AST) or a POSIX signal as some input/output (I/O) completes might make some other task eligible to run as soon as you allow execution to continue.

For more information about DECthreads or POSIX threads, see the Guide to DECthreads. For more information about Ada tasks, see the DEC Ada documentation.

The debugging of multiprocess programs (programs that run in more than one process) is described in Chapter 15.

17.1 Comparison of DECthreads and Ada Terminology

Table 17-1 compares DECthreads and Ada terminology and concepts.

Table 17-1 Comparison of DECthreads and Ada Terminology
DECthreads Terminology | Ada Terminology | Description
Thread | Task | The flow of control within a process
Thread object | Task object | The data item that represents the flow of control
Object name or expression | Task name or expression | The data item that represents the flow of control
Start routine | Task body | The code that is executed by the flow of control
Not applicable | Master task | A parent flow of control
Not applicable | Dependent task | A child flow of control that is controlled by some parent
Synchronization object (mutex, condition variable) | Rendezvous construct such as an entry call or accept statement | Method of synchronizing flows of control
Scheduling policy and scheduling priority | Task priority | Method of scheduling execution
Alert operation | Abort statement | Method of canceling a flow of control
Thread state | Task state | Execution state (waiting, ready, running, terminated)
Thread creation attribute (priority, scheduling policy, and so on) | Pragma | Attributes of the parallel entity
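
To make the thread-side terms in Table 17-1 more concrete, the following minimal sketch shows a thread object, a start routine, and a cancel request in C. It assumes the current POSIX threads interface (pthread_create, pthread_cancel, pthread_join, PTHREAD_CANCELED) rather than the older DECthreads names used in Example 17-1. The routine name cancel_demo is hypothetical and does not come from this manual.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Start routine (Ada: task body): the code executed by the flow of control. */
static void *start_routine(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_testcancel();   /* explicit cancellation point */
        sched_yield();          /* let other threads run       */
    }
    return NULL;                /* not reached */
}

/* Hypothetical helper: create a thread, cancel it, and join with it.
 * Returns 1 if the thread ended because of the cancel request.
 * assert is used only to keep the error handling of the sketch short. */
int cancel_demo(void)
{
    pthread_t thread;           /* thread object (Ada: task object) */
    void     *result = NULL;
    int       rc;

    /* NULL selects the default thread creation attributes. */
    rc = pthread_create(&thread, NULL, start_routine, NULL);
    assert(rc == 0);

    /* Cancel request (Table 17-1: alert operation; Ada: abort statement). */
    rc = pthread_cancel(thread);
    assert(rc == 0);

    rc = pthread_join(thread, &result);
    assert(rc == 0);

    return result == PTHREAD_CANCELED;
}
```

Calling cancel_demo() from a main routine should return 1 once the canceled thread has been joined.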

17.2 Sample Tasking Programs

The following sections present sample tasking programs with common errors that you might encounter when debugging tasking programs: a C multithread program (Section 17.2.1) and an Ada tasking program (Section 17.2.2).

Some other examples in this chapter are derived from these programs.

17.2.1 Sample C Multithread Program

Example 17-1 is a multithread C program that shows incorrect use of condition variables, which results in blocking.

Explanatory notes are included after the example. Following these notes are instructions showing how to use the debugger to diagnose the blocking by controlling the relative execution of the threads.

In Example 17-1, the initial thread creates two worker threads that do some computational work. After the worker threads are created, a SHOW TASK/ALL command will show three tasks, each corresponding to a thread (Section 17.4 explains how to use the SHOW TASK command).

In Example 17-1, a synchronization point (a condition wait) has been placed in the workers' path at line 3893. (The comment starting at line 3877 indicates that a straight call such as this one is incorrect programming and shows the correct code.)

When the program executes, the worker threads are busy computing when the initial thread broadcasts on the condition variable. The first thread to wait on the condition variable detects the initial thread's broadcast and clears it, which leaves any remaining threads stranded. Execution is blocked and the program cannot terminate.

Example 17-1 Sample C Multithread Program

3777  /* DEFINES  */ 
3778  #define NUM_WORKERS 2           /* Number of worker threads    */ 
3779 
3780  /* MACROS                                                      */ 
3781  #define check(status,string) \
3782      if (status == -1) perror (string); \
3783 
3784  /* GLOBALS                                                     */ 
3785  int              cv_pred1;     /* Condition Variable predicate */ 
3786  pthread_mutex_t  cv_mutex;     /* Condition Variable mutex     */ 
3787  pthread_cond_t   cv;           /* Condition Variable           */ 
3788  pthread_mutex_t  print_mutex;  /* Print mutex                  */ 
3789 
3790  /* ROUTINES                                                    */ 
3791  static pthread_startroutine_t 
3792  worker_routine (pthread_addr_t  arg); 
3793 
3794  main () 
3795     { 
3796     pthread_t  threads[NUM_WORKERS];  /* Worker threads         */ 
3797     int        status;                /* Return statuses        */ 
3798     int        exit;                  /* Join exit status       */ 
3799     int        result;                /* Join result value      */ 
3800     int        i;                     /* Loop index             */ 
3801 
3802     /* Initialize mutexes                                       */ 
3803     status = pthread_mutex_init (&cv_mutex, pthread_mutexattr_default); 
3804     check (status, "cv_mutex initialization bad status"); 
3805     status = pthread_mutex_init (&print_mutex, pthread_mutexattr_default); 
3806     check (status, "print_mutex initialization bad status"); 
3807 
3808     /* Initialize condition variable                            */ 
3809     status = pthread_cond_init (&cv, pthread_condattr_default); 
3810     check (status, "cv condition init bad status"); 
3811 
3812     /* Initialize condition variable predicate.                 */ 
3813     cv_pred1 = 1;                                            (1)
3814 
3815     /* Create worker threads                                    */ 
3816     for (i = 0; i < NUM_WORKERS; i++) {                      (2)
3817         status = pthread_create ( 
3818                         &threads[i], 
3819                         pthread_attr_default, 
3820                         worker_routine, 
3821                         0); 
3822         check (status, "threads create bad status"); 
3823         } 
3824 
3825     /* Set cv_pred1 to false; do this inside the lock to ensure visibility. */ 
3826 
3827     status = pthread_mutex_lock (&cv_mutex); 
3828     check (status, "cv_mutex lock bad status"); 
3829 
3830     cv_pred1 = 0;                                            (3)
3831 
3832     status = pthread_mutex_unlock (&cv_mutex); 
3833     check (status, "cv_mutex unlock bad status"); 
3834 
3835     /* Broadcast. */ 
3836     status = pthread_cond_broadcast (&cv);                   (4)
3837     check (status, "cv broadcast bad status"); 
3838 
3839     /* Attempt to join both of the worker threads. */ 
3840     for (i = 0; i < NUM_WORKERS; i++) {                      (5)
3841         exit = pthread_join (threads[i], (pthread_addr_t*)&result); 
3842         check (exit, "threads join bad status"); 
3843         } 
3844     } 
3845 
3846  static pthread_startroutine_t 
3847  worker_routine(arg) 
3848     pthread_addr_t   arg;                                    (6)
3849     { 
3850     int   sum; 
3851     int   iterations; 
3852     int   count; 
3853     int   status; 
3854 
3855     /* Do many calculations                        */ 
3856     for (iterations = 1; iterations < 10001; iterations++) { 
3857         sum = 1; 
3858         for (count = 1; count < 10001; count++) { 
3859             sum = sum + count; 
3860             } 
3861         } 
3862 
3863     /* Printf may not be reentrant, so allow 1 thread at a time */ 
3864 
3865     status = pthread_mutex_lock (&print_mutex); 
3866     check (status, "print_mutex lock bad status"); 
3867     printf (" The sum is %d \n", sum); 
3868     status = pthread_mutex_unlock (&print_mutex); 
3869     check (status, "print_mutex unlock bad status"); 
3870 
3871     /* Lock the mutex associated with this condition variable. pthread_cond_wait will */ 
3872     /* unlock the mutex if the thread blocks on the condition variable.               */ 
3873 
3874     status = pthread_mutex_lock (&cv_mutex); 
3875     check (status, "cv_mutex lock bad status"); 
3876 
3877     /* In the next statement, the correct condition-wait syntax would be to loop      */ 
3878     /* around the condition-wait call, checking the predicate associated with the     */ 
3879     /* condition variable.  This would guard against condition waiting on a condition */ 
3880     /* variable that may have already been broadcast upon, as well as spurious wake   */ 
3881     /* ups. Execution would resume when the thread is woken AND the predicate is      */ 
3882     /* false.  The call would look like this:                                         */ 
3883     /*                                                                                */ 
3884     /*    while (cv_pred1) {                                                          */ 
3885     /*      status = pthread_cond_wait (&cv, &cv_mutex);                              */ 
3886     /*      check (status, "cv condition wait bad status");                           */ 
3887     /*    }                                                                           */ 
3888     /*                                                                                */ 
3889     /* A straight call, as used in the following code, might cause a thread to        */ 
3890     /* wake up when it should not (spurious) or become permanently blocked, as        */ 
3891     /* should one of the worker threads here.                                         */ 
3892 
3893     status = pthread_cond_wait (&cv, &cv_mutex);             (7)
3894     check (status, "cv condition wait bad status"); 
3895 
3896     /* While blocking in the condition wait, the routine lets go of the mutex, but    */ 
3897     /* it retrieves it upon return.                                                   */ 
3898 
3899     status = pthread_mutex_unlock (&cv_mutex); 
3900     check (status, "cv_mutex unlock bad status"); 
3901 
3902     return (int)arg; 
3903     } 

Key to Example 17-1:

  1. The first few statements of main() initialize the synchronization objects used by the threads, as well as the predicate that is to be associated with the condition variable. The synchronization objects are initialized with the default attributes. The condition variable predicate is initialized such that a thread that is looping on it will continue to loop. At this point in the program, a SHOW TASK/ALL display lists %TASK 1.
  2. The worker threads %TASK 2 and %TASK 3 are created. Here the created threads execute the same start routine (worker_routine) and can also reuse the same call to pthread_create with a slight change to store the different thread IDs. The threads are created using the default attributes and are passed an argument that is not used in this example.
  3. The predicate associated with the condition variable is cleared in preparation to broadcast. This ensures that any thread awaking off the condition variable has received a valid wake-up and not a spurious one. Clearing the predicate also prevents any new arrivals from waiting on the condition variable because it has been broadcast or signaled upon. (The desired effect depends on correct coding being used for the condition wait call at line 3893, which is not the case in this example.)
  4. The initial thread issues the broadcast call almost immediately, so that none of the worker threads should yet be at the condition wait. A broadcast should wake any threads currently waiting on the condition variable.
    As the programmer, you should ensure that a broadcast is seen, either by ensuring that all threads are waiting on the condition variable at the time of broadcast or by ensuring that an associated predicate is used to flag that the broadcast has already happened. (These measures have been left out of this example on purpose.)
  5. The initial thread attempts to join with the worker threads to ensure that they exited properly.
  6. When the worker threads execute worker_routine, they spend time doing many computations. This allows the initial thread to broadcast on the condition variable before either of the worker threads is waiting on it.
  7. The worker threads then proceed to execute a pthread_cond_wait call by performing locks around the call as required. It is here that both worker threads will block, having missed the broadcast. A SHOW TASK/ALL command entered at this point will show both of the worker threads waiting on a condition variable. (After the program is deadlocked in this way, you must press Ctrl/C to return control to the debugger.)
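
The predicate loop recommended in the comment starting at line 3877 can be sketched as follows. This is a minimal sketch rather than a corrected copy of Example 17-1: it assumes the current POSIX threads interface (static initializers, NULL for default attributes, void * start routines), omits the computation and printing, and adds a hypothetical counter named released and a hypothetical routine named run_workers so that the outcome can be checked.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_WORKERS 2

static int             cv_pred1 = 1;   /* nonzero: workers must wait    */
static pthread_mutex_t cv_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv       = PTHREAD_COND_INITIALIZER;
static int             released = 0;   /* workers that passed the wait  */

static void *worker_routine(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&cv_mutex);
    /* Loop on the predicate: a worker that arrives after the broadcast
     * sees cv_pred1 == 0 and never waits; a worker that wakes up
     * spuriously sees cv_pred1 != 0 and waits again. */
    while (cv_pred1)
        pthread_cond_wait(&cv, &cv_mutex);
    released++;
    pthread_mutex_unlock(&cv_mutex);
    return NULL;
}

/* Hypothetical driver: returns the number of workers that got past
 * the condition wait. assert keeps the error handling short. */
int run_workers(void)
{
    pthread_t threads[NUM_WORKERS];
    int       i, rc;

    cv_pred1 = 1;                      /* no other threads exist yet */
    released = 0;

    for (i = 0; i < NUM_WORKERS; i++) {
        rc = pthread_create(&threads[i], NULL, worker_routine, NULL);
        assert(rc == 0);
    }

    /* Clear the predicate inside the lock, then broadcast. */
    pthread_mutex_lock(&cv_mutex);
    cv_pred1 = 0;
    pthread_mutex_unlock(&cv_mutex);
    rc = pthread_cond_broadcast(&cv);
    assert(rc == 0);

    for (i = 0; i < NUM_WORKERS; i++) {
        rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
    }
    return released;
}
```

With the loop in place, run_workers() should return NUM_WORKERS whether the broadcast happens before or after the workers reach the condition wait.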

The debugger enables you to control the relative execution of threads to diagnose problems of the kind shown in Example 17-1. In this case, you can suspend the execution of the initial thread and let the worker threads complete their computations so that they will be waiting on the condition variable at the time of broadcast. The following procedure explains how:

  1. At the start of the debugging session, set a breakpoint on line 3836 to suspend execution of the initial thread just before broadcast.
  2. Enter the GO command to execute the initial thread and create the worker threads.
  3. At this breakpoint, which causes the execution of all threads to be suspended, put the initial thread on hold with the SET TASK/HOLD %TASK 1 command.
  4. Enter the GO command to let the worker threads continue execution. The initial thread is on hold and cannot execute.
  5. When the worker threads block on the condition variable, press Ctrl/C to return control to the debugger at that point. A SHOW TASK/ALL command should indicate that both worker threads are suspended in a condition wait substate. (If not, enter GO to let the worker threads execute, press Ctrl/C, and enter SHOW TASK/ALL, repeating the sequence until both worker threads are in a condition wait substate.)
  6. Enter the SET TASK/NOHOLD %TASK 1 command and then the GO command to allow the initial thread to resume execution and broadcast. This will enable the worker threads to join and terminate properly.
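
The ordering that this procedure enforces by hand (the initial thread does not broadcast until both worker threads are waiting) can also be built into a program. The following is a minimal sketch under the same assumptions as before: it uses the current POSIX threads interface, and the names waiting, go, all_waiting, and run_held_broadcast are hypothetical additions that do not appear in Example 17-1.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_WORKERS 2

static pthread_mutex_t cv_mutex    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv          = PTHREAD_COND_INITIALIZER; /* workers wait here        */
static pthread_cond_t  all_waiting = PTHREAD_COND_INITIALIZER; /* initial thread waits here */
static int             waiting = 0;   /* workers that reached the wait    */
static int             go      = 0;   /* set when the broadcast is issued */

static void *worker_routine(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&cv_mutex);
    waiting++;
    if (waiting == NUM_WORKERS)
        pthread_cond_signal(&all_waiting);  /* tell the initial thread */
    while (!go)                             /* guards against spurious wake-ups */
        pthread_cond_wait(&cv, &cv_mutex);
    pthread_mutex_unlock(&cv_mutex);
    return NULL;
}

/* Hypothetical driver: returns the number of workers that were waiting
 * at the moment of the broadcast. assert keeps error handling short. */
int run_held_broadcast(void)
{
    pthread_t threads[NUM_WORKERS];
    int       i, rc, observed;

    waiting = 0;                            /* no other threads exist yet */
    go      = 0;

    for (i = 0; i < NUM_WORKERS; i++) {
        rc = pthread_create(&threads[i], NULL, worker_routine, NULL);
        assert(rc == 0);
    }

    /* "Hold" the initial thread until every worker is in the wait.
     * A worker releases cv_mutex only inside pthread_cond_wait, so
     * waiting == NUM_WORKERS here means all workers are blocked on cv. */
    pthread_mutex_lock(&cv_mutex);
    while (waiting < NUM_WORKERS)
        pthread_cond_wait(&all_waiting, &cv_mutex);
    observed = waiting;
    go = 1;
    rc = pthread_cond_broadcast(&cv);
    assert(rc == 0);
    pthread_mutex_unlock(&cv_mutex);

    for (i = 0; i < NUM_WORKERS; i++) {
        rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
    }
    return observed;
}
```

In this sketch the second condition variable plays the role of the SET TASK/HOLD and SET TASK/NOHOLD commands: the initial thread stays blocked until the last worker signals that it has arrived.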

17.2.2 Sample Ada Tasking Program

Example 17-2 demonstrates a number of common errors that you may encounter when debugging tasking programs. The calls to procedure BREAK in the example mark points of interest where breakpoints could be set and the state of each task observed. If you ran the example under debugger control, you could enter the following commands to set breakpoints at each call to the procedure BREAK and display the current state of each task:


DBG> SET BREAK %LINE  48 DO (SHOW TASK/ALL)
DBG> SET BREAK %LINE  73 DO (SHOW TASK/ALL)
DBG> SET BREAK %LINE  78 DO (SHOW TASK/ALL)
DBG> SET BREAK %LINE  94 DO (SHOW TASK/ALL)
DBG> SET BREAK %LINE 102 DO (SHOW TASK/ALL)
DBG> SET BREAK %LINE 106 DO (SHOW TASK/ALL)
DBG> SET BREAK %LINE 122 DO (SHOW TASK/ALL)

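The program creates four tasks: the main program TASK_EXAMPLE (%TASK 1), the task object FATHER (%TASK 2), the single task CHILD declared within FATHER (%TASK 3), and the single task MOTHER (%TASK 4).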

Example 17-2 Sample Ada Tasking Program

 1 -- Tasking program that demonstrates various tasking conditions. 
 2 
 3 package TASK_EXAMPLE_PKG is 
 4    procedure BREAK; 
 5 end; 
 6 
 7 package body TASK_EXAMPLE_PKG is 
 8    procedure BREAK is 
 9    begin 
 10      null; 
 11   end; 
 12 end; 
 13 
 14 
 15 with TEXT_IO; use TEXT_IO; 
 16 with TASK_EXAMPLE_PKG; use TASK_EXAMPLE_PKG; 
 17 procedure TASK_EXAMPLE is (1)  
 18 
 19    pragma TIME_SLICE(0.0); -- Disable time slicing. (2)
 20 
 21    task type FATHER_TYPE is 
 22       entry START; 
 23       entry RENDEZVOUS; 
 24       entry BOGUS; -- Never accepted, caller deadlocks. 
 25    end FATHER_TYPE; 
 26 
 27    FATHER : FATHER_TYPE; (3)
 28 
 29    task body FATHER_TYPE is 
 30       SOME_ERROR : exception; 
 31 
 32       task CHILD is  (4)
 33          entry E; 
 34       end CHILD; 
 35 
 36       task body CHILD is 
 37       begin 
 38          FATHER_TYPE.BOGUS;   -- Deadlocks on call to its parent 
 39       end CHILD;              -- (parent does not have an accept 
 40                               -- statement for entry BOGUS). Whenever 
 41                               -- a task-type name (here, FATHER_TYPE) 
 42                               -- is used within a task body, the 
 43                               -- name designates the task currently 
 44                               -- executing the body. 
 45    begin -- (of FATHER_TYPE body)  
 46 
 47       accept START do    
 48          BREAK;   -- Main program is waiting for this rendezvous to 
 49                   -- complete; CHILD is suspended when it calls the 
 50                   -- entry BOGUS. 
 51          null; 
 52       end START; 
 53 
 54       PUT_LINE("FATHER is now active and"); (5)  
 55       PUT_LINE("is going to rendezvous with main program."); 
 56 
 57       for I in 1..2 loop 
 58          select 
 59             accept RENDEZVOUS do 
 60                PUT_LINE("FATHER now in rendezvous with main program"); 
 61             end RENDEZVOUS; 
 62          or 
 63             terminate; 
 64          end select; 
 65 
 66          if I = 2 then 
 67             raise SOME_ERROR; 
 68          end if; 
 69       end loop; 
 70 
 71    exception   
 72       when OTHERS => 
 73          BREAK;   -- CHILD is suspended on entry call to BOGUS. 
 74                   -- Main program is going to delay while FATHER 
 75                   -- terminates. 
 76                   -- MOTHER is ready to begin executing. 
 77          abort CHILD; 
 78          BREAK;   -- CHILD is now abnormal due to the abort statement. 
 79 
 80          raise; -- SOME_ERROR exception terminates FATHER. 
 81    end FATHER_TYPE; 
 82 
 83 begin    -- (of TASK_EXAMPLE)  (6)  
 84 
 85    declare 
 86       task MOTHER is  (7)
 87          entry START; 
 88          pragma PRIORITY (6); 
 89       end MOTHER; 
 90 
 91       task body MOTHER is 
 92       begin 
 93          accept START; 
 94          BREAK;   -- At this point, the main program is waiting for 
 95                   -- its dependents (FATHER and MOTHER) to terminate. 
 96                   -- FATHER is terminated. 
 97          null; 
 98       end MOTHER; 
 99    begin  (8)
100 
101 
102       BREAK;   -- FATHER is suspended at accept start. 
103                -- CHILD is suspended in its deadlock. 
104                -- MOTHER has activated and is ready to begin executing. 
105       FATHER.START;  (9)        
106       BREAK;   -- FATHER is suspended at its 'select or terminate' 
107                -- statement. 
108 
109 
110       FATHER.RENDEZVOUS;  
111       FATHER.RENDEZVOUS;  (10)
112       loop  (11)               
113          -- This loop causes the main program to busy wait 
114          -- for the termination of FATHER, so that FATHER 
115          -- can be observed in its terminated state. 
116          if FATHER'TERMINATED then 
117             exit; 
118          end if;       
119          delay 1.0; 
120       end loop; 
121 
122       BREAK;   -- FATHER has terminated by now with the unhandled 
123                -- exception SOME_ERROR. CHILD no longer exists 
124                -- because its master (FATHER) has terminated. Task 
125                -- MOTHER is ready. 
126       MOTHER.START; (12)     
127          -- The main program enters a wait-for-dependents state, 
128          -- so that MOTHER can finish executing. 
129    end; 
130 end TASK_EXAMPLE; (13)

Key to Example 17-2:

  1. After all of the Ada library packages are elaborated (in this case, TEXT_IO), the main program is automatically called and begins to elaborate its declarative part (lines 18 through 82).
  2. To ensure repeatability from run to run, the example uses no time slicing (see Section 17.5.2). The 0.0 value for the pragma TIME_SLICE documents that the procedure TASK_EXAMPLE needs to have time slicing disabled.
    On VAX processors, time slicing is disabled if the pragma TIME_SLICE is omitted or is specified with a value of 0.0.
    On Alpha processors, pragma TIME_SLICE (0.0) must be used to disable time slicing.
  3. Task object FATHER is elaborated, and a task designated %TASK 2 is created. FATHER has no pragma PRIORITY, and thus assumes a default priority. FATHER (%TASK 2) is created in a suspended state and is not activated until the beginning of the statement part of the main program (line 83), in accordance with Ada rules. The elaboration of the task body on lines 29 through 81 defines the statements that tasks of type FATHER_TYPE will execute.
  4. Task FATHER declares a single task named CHILD (line 32). A single task represents both a task object and an anonymous task type. Task CHILD is not created or activated until FATHER is activated.
  5. The only source of asynchronous system traps (ASTs) is this series of TEXT_IO.PUT_LINE statements (I/O completion delivers ASTs).
  6. The task FATHER is activated while the main program waits. FATHER has no pragma PRIORITY and thus assumes a default priority of 7. (See the DEC Ada Run-Time Reference Manual for the rules about default priorities.) FATHER's activation consists of the elaboration of lines 29 through 44.
    When task FATHER is activated, it waits while its task CHILD is activated and a task designated %TASK 3 is created. CHILD executes one entry call on line 38, and then deadlocks because the entry is never accepted (see Section 17.7.1).
    Because time slicing is disabled and there are no higher priority tasks to be run, FATHER will continue to execute past its activation until it is blocked at the ACCEPT statement at line 47.
  7. A single task, MOTHER, is defined, and a task designated %TASK 4 is created. The pragma PRIORITY gives MOTHER a priority of 6.
  8. The task MOTHER begins its activation and executes line 91. After MOTHER is activated, the main program (%TASK 1) is eligible to resume its execution. Because %TASK 1 has the default priority 7, which is higher than MOTHER's priority, the main program resumes execution.
  9. This is the first rendezvous the main program makes with task FATHER. After the rendezvous, FATHER will suspend at the SELECT with TERMINATE statement at line 58.
  10. At the third rendezvous with FATHER, FATHER raises the exception SOME_ERROR on line 67. The handler on line 72 catches the exception, aborts the suspended CHILD task, and then reraises the exception; FATHER then terminates.
  11. A loop with a delay statement ensures that when control reaches line 122, FATHER has executed far enough to be terminated.
  12. This entry call ensures that MOTHER does not wait forever for its rendezvous on line 93. MOTHER executes the accept statement (which involves no other statements), the rendezvous is completed, and MOTHER is immediately switched off the processor at line 94 because its priority is only 6.
  13. After its rendezvous with MOTHER, the main program (%TASK 1) executes lines 127 through 129. At line 129, the main program must wait for all its dependent tasks to terminate. When the main program reaches line 129, the only nonterminated task is MOTHER (MOTHER cannot terminate until the null statement at line 97 has been executed). MOTHER finally executes to its completion at line 98. Now that all tasks are terminated, the main program completes its execution. The main program then returns and execution resumes with the command line interpreter.

