Document revision date: 19 July 1999

OpenVMS Alpha Galaxy Guide



19.2 SYS$CPU_TRANSITIONW

On Alpha systems, changes the current processing state of a CPU in the configure set or an unassigned CPU in a Galaxy configuration. This service completes synchronously; that is, it returns to the caller only after the final completion status of the operation is known.

In all other respects, $CPU_TRANSITIONW is identical to $CPU_TRANSITION. For all other information about the $CPU_TRANSITIONW service, refer to the description of $CPU_TRANSITION in Section 19.1.

This service accepts 64-bit addresses.

Format

SYS$CPU_TRANSITIONW tran_id, cpu_id, nodename, node_id, flags, efn, iosb, astadr_64, astprm_64


Chapter 20
Configuration Management Programming Interfaces

20.1 SYS$GETSYI

The following $GETSYI item codes return OpenVMS Galaxy information:

SYI$_GALAXY_PLATFORM returns 1 if you are running on a Galaxy platform, 0 if not.

SYI$_GALAXY_MEMBER returns 1 if you are a member of a Galaxy sharing community, 0 if not.

SYI$_GALAXY_ID returns the 128-bit Galaxy ID.

SYI$_PARTITION_ID returns the integer Galaxy partition ID.

SYI$_COMMUNITY_ID returns the integer Galaxy community ID.

SYI$_SCSNODE returns the ASCII string Galaxy instance name.


Chapter 21
OpenVMS Galaxy Device Drivers

This chapter describes the OpenVMS Alpha Version 7.2-1 direct-mapped DMA window changes for PCI drivers.

21.1 Direct-mapped DMA Window Changes

These changes are required to support the OpenVMS Galaxy software architecture and memory holes. The change moves the direct-mapped DMA window away from physical memory location 0. This chapter provides the background and information you need to update your driver.

Note that this chapter does not cover bus addressable pool (BAP).

21.2 How PCI Direct-mapped DMA Works Prior to OpenVMS V7.2

On all PCI-based machines, the direct-mapped DMA window usually begins at 1 GB in PCI space and covers the first 1 GB of physical memory, beginning at address 0:

Figure 21-1 PCI-based DMA


Typically drivers compare their buffer addresses against the length of the window returned by calling IOC$NODE_DATA with the IOC$K_DIRECT_DMA_SIZE function code. This assumes that the window on the memory side starts at zero. Another popular method for determining whether map registers are necessary involves looking at MMG$GL_MAXPFN. This is also not likely to work correctly in OpenVMS Version 7.2-1.

For a more complete picture and explanation, see the Writing OpenVMS Alpha Device Drivers in C book.

21.3 How PCI Direct-mapped DMA Works as of OpenVMS Version 7.2

Galaxy and memory hole considerations force OpenVMS to change the placement of the direct-mapped DMA window.

Figure 21-2 OpenVMS Version 7.2-1 DMA


From the driver's perspective, the location in memory of the base of the direct-mapped DMA window is unknown. Simply comparing a buffer address against the length of the window is no longer sufficient to determine whether a buffer is within the direct-mapped DMA window. Also, comparing against MMG$GL_MAXPFN no longer guarantees that all of pool is within the window; the correct cell to check is MMG$GL_MAX_NODE_PFN. Additionally, alignment concerns may require that a slightly different offset be incorporated into physical bus address calculations.

21.4 IOC$NODE_DATA Changes to Support Nonzero Direct-mapped DMA Windows

To alleviate this problem, new function codes have been added to IOC$NODE_DATA. Here is a list of all the codes relating to direct-mapped DMA, and a description of what the data means.
IOC$K_DIRECT_DMA_BASE This is the base address of the window on the PCI side, or bus address. IOC$K_DDMA_BASE_BA is a synonym for this function code.
IOC$K_DIRECT_DMA_SIZE On non-Galaxy machines, this returns the size of the direct-mapped DMA window (in megabytes). On a system where the direct-mapped DMA window does not start at zero, the data returned is zero, implying that no direct-mapped DMA window exists.
IOC$K_DDMA_WIN_SIZE On all systems (as of x6jh), this always returns the size of the direct-mapped DMA window (in megabytes).
IOC$K_DIRECT_DMA_BASE_PA This is the base physical address in memory of the direct-mapped DMA window. It is probably closely related to MMG$GL_MIN_NODE_PFN, but may differ slightly because of alignment concerns.

The address returned with the IOC$K_DIRECT_DMA_BASE_PA code is necessary to compute the offset, which historically was the 1 GB difference between the memory PA and the bus address. The offset is defined as the signed difference between the base bus address and the base memory address; it is no longer necessarily 1 GB.


Appendix A
OpenVMS Galaxy CPU Load Balancer Program

This appendix contains an example program of a privileged-code application that dynamically reassigns CPU resources among instances in an OpenVMS Galaxy.

A.1 CPU Load Balancer Overview

The OpenVMS Galaxy CPU Load Balancer program is a privileged application that dynamically reassigns CPU resources among instances in an OpenVMS Galaxy.

The program must be run on each participating instance. Each image will create, or map to, a small shared memory section and periodically post information regarding the depth of that instance's COM queues. Based upon running averages of this data, each instance will determine the most and the least busy instance. If these factors exist for a specified duration, the least busy instance having available secondary processors will reassign one of its processors to the most busy instance, thereby effectively balancing processor usage across the OpenVMS Galaxy. The program provides command line arguments to allow tuning of the load balancing algorithm. The program is admittedly shy on error handling.

This program uses the following OpenVMS Galaxy system services:
SYS$CPU_TRANSITION CPU reassignment
SYS$CRMPSC_GDZRO_64 Shared memory creation
SYS$SET_SYSTEM_EVENT OpenVMS Galaxy event notification
SYS$*_GALAXY_LOCK_* OpenVMS Galaxy locking

Because OpenVMS Galaxy resources are always reassigned via a "push" model, where only the owner instance can release its resources, one copy of this process must run on each instance in the OpenVMS Galaxy.

This program can be run only in an OpenVMS Version 7.2 or later multiple-instance Galaxy.

A.1.1 Required Privileges

The CMKRNL privilege is required to count CPU queues. The SHMEM privilege is required to map shared memory.

A.1.2 Build and Copy Instructions

Compile and link the example program as described below, or copy the precompiled image found in SYS$EXAMPLES:GCU$BALANCER.EXE to SYS$COMMON:[SYSEXE]GCU$BALANCER.EXE.

If your OpenVMS Galaxy instances use individual system disks, you will need to do the above for each instance.

If you change the example program, compile and link it as follows:


$ CC GCU$BALANCER.C+SYS$LIBRARY:SYS$LIB_C/LIBRARY 
$ LINK/SYSEXE GCU$BALANCER 

A.1.3 Startup Options

You must establish a DCL command for this program. We have provided a sample command table file for this purpose. To install the new command, do the following:


$ SET COMMAND/TABLE=SYS$LIBRARY:DCLTABLES - 
/OUT=SYS$COMMON:[SYSLIB]DCLTABLES GCU$BALANCER.CLD 

This command inserts the new command definition into DCLTABLES.EXE in your common system directory. The new command tables will take effect when the system is rebooted. If you would like to avoid a reboot, do the following:


$ INSTALL REPLACE SYS$COMMON:[SYSLIB]DCLTABLES.EXE 

After this command, you will need to log out, then log back in to use the command from any active processes. Alternatively, if you would like to avoid logging out, do the following from each process you would like to run the balancer from:


$ SET COMMAND GCU$BALANCER.CLD 

Once your command has been established, you may use the various command line parameters to control the balancer algorithm.


$ CONFIGURE BALANCER{/STATISTICS} x y time 

Where: "x" is the number of load samples to take.
"y" is the number of queued processes required to trigger resource reassignment.
"time" is the delta time between load sampling.

The /STATISTICS qualifier causes the program to display a continuous status line. This is useful for tuning the parameters. This output is not visible if the balancer is run detached, as is the case if it is invoked via the GCU. It is intended to be used only when the balancer is invoked directly from DCL in a DECterm window.

For example: $ CONFIG BAL 3 1 00:00:05.00

Starts the balancer which samples the system load every 5 seconds. After 3 samples, if the instance has one or more processes in the COM queue, a resource (CPU) reassignment will occur, giving this instance another CPU.

A.1.4 Starting the Load Balancer from the GCU

The GCU provides a menu item for launching SYS$SYSTEM:GCU$BALANCER.EXE and a dialog for altering the balancer algorithm. These features will only work if the balancer image is properly installed as described in the following paragraphs.

To use the GCU-resident balancer startup option, you must:

1) Compile, link, or copy the balancer image as described previously.
2) Invoke the GCU via: $ CONFIGURE GALAXY. You may need to set your DECwindows display to a suitably configured workstation or PC.
3) Select the "CPU Balancer" entry from the "Galaxy" menu.
4) Select appropriate values for your system. This may take some testing. By default, the values are set aggressively so that the balancer action can be readily observed. If your system is very heavily loaded, you will need to increase the values accordingly to avoid excessive resource reassignment. The GCU does not currently save these values, so you may want to write them down once you are satisfied.
5) Select the instances you wish to have participate, select the "Start" function, then press OK. The GCU should launch the process GCU$BALANCER on all selected instances. You may want to verify that these processes have been started.

A.1.5 Shutdown Warning

In an OpenVMS Galaxy, no process may have shared memory mapped on an instance when the instance leaves the Galaxy, for example during a shutdown. Because of this, if the GCU$BALANCER program is run from a SYSTEM UIC, you must modify SYS$MANAGER:SYSHUTDWN.COM to stop the process; processes in the SYSTEM UIC group are not otherwise terminated when shutting down or rebooting OpenVMS. If a process still has shared memory mapped when an instance leaves the Galaxy, the instance will crash with a GLXSHUTSHMEM bugcheck.

To make this work, SYS$MANAGER:SYSHUTDWN.COM must stop the process as shown in the following example. Alternatively, the process can be run under a suitably privileged, non-SYSTEM UIC.


$! 
$! If the GCU$BALANCER image is running, stop it to release shmem. 
$! 
$ procctx = f$context("process",ctx,"prcnam","GCU$BALANCER","eql") 
$ procid  = f$pid(ctx) 
$ if procid .NES. "" then $ stop/id='procid' 

Note that you could also use a "$ STOP GCU$BALANCER" statement.

A.2 Example Program


/* 
** COPYRIGHT (c) 1998 BY COMPAQ COMPUTER CORPORATION ALL RIGHTS RESERVED. 
** 
** THIS SOFTWARE IS FURNISHED UNDER A LICENSE AND MAY BE USED AND COPIED 
** ONLY  IN  ACCORDANCE  OF  THE  TERMS  OF  SUCH  LICENSE  AND WITH THE 
** INCLUSION OF THE ABOVE COPYRIGHT NOTICE. THIS SOFTWARE OR  ANY  OTHER 
** COPIES THEREOF MAY NOT BE PROVIDED OR OTHERWISE MADE AVAILABLE TO ANY 
** OTHER PERSON.  NO TITLE TO AND  OWNERSHIP OF THE  SOFTWARE IS  HEREBY 
** TRANSFERRED. 
** 
** THE INFORMATION IN THIS SOFTWARE IS  SUBJECT TO CHANGE WITHOUT NOTICE 
** AND  SHOULD  NOT  BE  CONSTRUED  AS A COMMITMENT BY COMPAQ COMPUTER 
** CORPORATION. 
** 
** COMPAQ ASSUMES NO RESPONSIBILITY FOR THE USE  OR  RELIABILITY OF ITS 
** SOFTWARE ON EQUIPMENT WHICH IS NOT SUPPLIED BY COMPAQ OR DIGITAL. 
** 
**===================================================================== 
** WARNING - This example is provided for instructional and demo 
**           purposes only.  The resulting program should not be 
**           run on systems which make use of soft-affinity 
**           features of OpenVMS, or while running applications 
**           which are tuned for precise processor configurations.  
**           We are continuing to explore enhancements such as this 
**           program which will be refined and integrated into 
**           future releases of OpenVMS. 
**===================================================================== 
** 
** GCU$BALANCER.C - OpenVMS Galaxy CPU Load Balancer. 
** 
** This is an example of a privileged application which dynamically 
** reassigns CPU resources among instances in an OpenVMS Galaxy.  The 
** program must be run on each participating instance.  Each image 
** will create, or map to, a small shared memory section and periodically 
** post information regarding the depth of that instances' COM queues. 
** Based upon running averages of this data, each instance will 
** determine the most, and least busy instance.  If these factors 
** exist for a specified duration, the least busy instance having 
** available secondary processors, will reassign one of its processors 
** to the most busy instance, thereby effectively balancing processor 
** usage across the OpenVMS Galaxy.  The program provides command line 
** arguments to allow tuning of the load balancing algorithm. 
** The program is admittedly shy on error handling. 
** 
** This program uses the following OpenVMS Galaxy system services: 
** 
**      SYS$CPU_TRANSITION   - CPU reassignment 
**      SYS$CRMPSC_GDZRO_64  - Shared memory creation 
**      SYS$SET_SYSTEM_EVENT - OpenVMS Galaxy event notification 
**      SYS$*_GALAXY_LOCK_*  - OpenVMS Galaxy locking 
** 
** Since OpenVMS Galaxy resources are always reassigned via a "push" 
** model, where only the owner instance can release its resources, 
** one copy of this process must run on each instance in the OpenVMS 
** Galaxy. 
** 
** ENVIRONMENT: OpenVMS V7.2 Multiple-instance Galaxy. 
** 
** REQUIRED PRIVILEGES:  CMKRNL required to count CPU queues 
**                       SHMEM  required to map shared memory 
** 
** BUILD/COPY INSTRUCTIONS: 
** 
** Compile and link the example program as described below, or copy the 
** precompiled image found in SYS$EXAMPLES:GCU$BALANCER.EXE to 
** SYS$COMMON:[SYSEXE]GCU$BALANCER.EXE 
** 
** If your OpenVMS Galaxy instances utilize individual system disks, you 
** will need to do the above for each instance. 
** 
** If you change the example program, compile and link it as follows: 
** 
**   $ CC GCU$BALANCER.C+SYS$LIBRARY:SYS$LIB_C/LIBRARY 
**   $ LINK/SYSEXE GCU$BALANCER 
** 
** STARTUP OPTIONS: 
** 
** You must establish a DCL command for this program.  We have provided a 
** sample command table file for this purpose.  To install the new command, 
** do the following: 
** 
**    $ SET COMMAND/TABLE=SYS$LIBRARY:DCLTABLES - 
**      /OUT=SYS$COMMON:[SYSLIB]DCLTABLES GCU$BALANCER.CLD 
** 
** This command inserts the new command definition into DCLTABLES.EXE 
** in your common system directory.  The new command tables will take 
** effect when the system is rebooted.  If you would like to avoid a 
** reboot, do the following: 
** 
**    $ INSTALL REPLACE SYS$COMMON:[SYSLIB]DCLTABLES.EXE 
** 
** After this command, you will need to log out, then log back in to 
** use the command from any active processes.  Alternatively, if you 
** would like to avoid logging out, do the following from each process 
** you would like to run the balancer from: 
** 
**    $ SET COMMAND GCU$BALANCER.CLD 
** 
** Once your command has been established, you may use the various 
** command line parameters to control the balancer algorithm. 
** 
**    $ CONFIGURE BALANCER{/STATISTICS} x y time 
** 
** Where: "x" is the number of load samples to take. 
**        "y" is the number of queued processes required to trigger 
**            resource reassignment. 
**        "time" is the delta time between load sampling. 
** 
** The /STATISTICS qualifier causes the program to display a 
** continuous status line.  This is useful for tuning the parameters. 
** This output is not visible if the balancer is run detached, as is 
** the case if it is invoked via the GCU.  It is intended to be used 
** only when the balancer is invoked directly from DCL in a DECterm 
** window. 
** 
** For example: $ CONFIG BAL 3 1 00:00:05.00 
** 
**        Starts the balancer which samples the system load every 
**        5 seconds.  After 3 samples, if the instance has one or 
**        more processes in the COM queue, a resource (CPU) 
**        reassignment will occur, giving this instance another CPU. 
** 
** GCU STARTUP: 
** 
** The GCU provides a menu item for launching SYS$SYSTEM:GCU$BALANCER.EXE 
** and a dialog for altering the balancer algorithm.  These features will 
** only work if the balancer image is properly installed as described 
** in the following paragraphs. 
** 
** To use the GCU-resident balancer startup option, you must: 
** 
** 1) Compile, link, or copy the balancer image as described previously. 
** 2) Invoke the GCU via: $ CONFIGURE GALAXY   You may need to set your 
**    DECwindows display to a suitably configured workstation or PC. 
** 3) Select the "CPU Balancer" entry from the "Galaxy" menu. 
** 4) Select appropriate values for your system.  This may take some 
**    testing.  By default, the values are set aggressively so that 
**    the balancer action can be readily observed.  If your system is 
**    very heavily loaded, you will need to increase the values 
**    accordingly to avoid excessive resource reassignment.  The GCU 
**    does not currently save these values, so you may want to write 
**    them down once you are satisfied. 
** 5) Select the instance/s you wish to have participate, then select 
**    the "Start" function, then press OK.  The GCU should launch the 
**    process GCU$BALANCER on all selected instances.  You may want to 
**    verify these processes have been started. 
** 
** SHUTDOWN WARNING: 
** 
** In an OpenVMS Galaxy, no process may have shared memory mapped on an 
** instance when it leaves the Galaxy, as during a shutdown. Because of 
** this, SYS$MANAGER:SYSHUTDWN.COM must be modified to stop the process 
** if the GCU$BALANCER program is run from a SYSTEM UIC.  Processes in the 
** SYSTEM UIC group are not terminated by SHUTDOWN.COM when shutting down 
** or rebooting OpenVMS. If a process still has shared memory mapped when 
** an instance leaves the Galaxy, the instance will crash with a 
** GLXSHUTSHMEM bugcheck.  
** 
** To make this work, SYS$MANAGER:SYSHUTDWN.COM must stop the process as 
** shown in the example below.  Alternatively, the process can be run 
** under a suitably privileged, non-SYSTEM UIC. 
** 
** SYSHUTDWN.COM EXAMPLE - Paste into SYS$MANAGER:SYSHUTDWN.COM 
** 
**    $! 
**    $! If the GCU$BALANCER image is running, stop it to release shmem. 
**    $! 
**    $ procctx = f$context("process",ctx,"prcnam","GCU$BALANCER","eql") 
**    $ procid  = f$pid(ctx) 
**    $ if procid .NES. "" then $ stop/id='procid' 
** 
** Note, you could also just do a "$ STOP GCU$BALANCER" statement. 
** 
** OUTPUTS: 
** 
**    If the logical name GCU$BALANCER_VERIFY is defined, notify the 
**    SYSTEM account when CPUs are reassigned.  If the /STATISTICS 
**    qualifier is specified, a status line is continually displayed, 
**    but only when run directly from the command line. 
** 
** REVISION HISTORY: 
** 
** 02-Dec-1998 Greatly improved instructions. 
** 03-Nov-1998 Improved instructions. 
** 24-Sep-1998 Initial code example and integration with GCU. 
*/ 
#include <BRKDEF>
#include <BUILTINS>
#include <CSTDEF>
#include <DESCRIP>
#include <GLOCKDEF>
#include <INTS>
#include <PDSCDEF>
#include <PSLDEF>
#include <SECDEF>
#include <SSDEF>
#include <STARLET>
#include <STDIO>
#include <STDLIB>
#include <STRING>
#include <SYIDEF>
#include <SYSEVTDEF>
#include <VADEF>
#include <VMS_MACROS>
#include <CPUDEF>
#include <IOSBDEF.H>
#include <EFNDEF.H>
/* For CLI */ 
#include <cli$routines.h> 
#include <chfdef.h> 
#include <climsgdef.h> 
 
#define HEARTBEAT_RESTART     0 /* Flags for synchronization            */ 
#define HEARTBEAT_ALIVE       1 
#define HEARTBEAT_TRANSPLANT  2 
 
#define GLOCK_TIMEOUT    100000 /* Sanity check, max time holding gLock */ 
#define _failed(x) (!((x) & 1)) 
 
$DESCRIPTOR(system_dsc, "SYSTEM");              /* Brkthru account name     */ 
$DESCRIPTOR(gblsec_dsc, "GCU$BALANCER");        /* Global section name      */ 
 
struct  SYI_ITEM_LIST {                         /* $GETSYI item list format */ 
  short buflen,item; 
  void *buffer,*length; 
}; 
 
/* System information and an item list to use with $GETSYI */ 
 
static unsigned long total_cpus; 
static uint64   partition_id; 
static long     max_instances = 32;             
iosb            g_iosb; 
 
struct SYI_ITEM_LIST syi_itemlist[3] = { 
     {sizeof (long), SYI$_ACTIVECPU_CNT,&total_cpus,  0}, 
     {sizeof (long), SYI$_PARTITION_ID, &partition_id,0}, 
     {0,0,0,0}}; 
 
extern uint32 *SCH$AQ_COMH;             /* Scheduler COM queue address  */ 
unsigned long PAGESIZE;                 /* Alpha page size              */ 
uint64        glock_table_handle;       /* Galaxy lock table handle     */ 
 
/* 
** Shared Memory layout (64-bit words): 
** ==================================== 
** 0  to  n-1:  Busy count, where 100 = 1 process in a CPU queue 
** n  to  2n-1: Heartbeat (status) for each instance 
** 2n to  3n-1: Current CPU count on each instance 
** 3n to  4n-1: Galaxy lock handles for modifying heartbeats 
** 
** where n = max_instances * sizeof(long). 
** 
** We assume the entire table (easily) fits in two Alpha pages. 
*/ 
 
/* Shared memory pointers must be declared volatile */ 
 
volatile uint64  gs_va = 0;             /* Shmem section address        */ 
volatile uint64  gs_length = 0;         /* Shmem section length         */ 
volatile uint64 *gLocks;                /* Pointers to gLock handles    */ 
volatile uint64 *busycnt,*heartbeat,*cpucount;  
 
/************************************************************************/ 
/* FUNCTION init_lock_tables - Map to the Galaxy locking table and      */ 
/* create locks if needed. Place the lock handles in a shared memory    */ 
/* region, so all processes can access the locks.                       */ 
/*                                                                      */ 
/* ENVIRONMENT: Requires SHMEM and CMKRNL to create tables.             */ 
/* INPUTS:      None.                                                   */ 
/* OUTPUTS:     Any errors from lock table creation.                    */ 
/************************************************************************/ 
int init_lock_tables (void) 
{ 
    int status,i; 
    unsigned long sanity; 
    uint64 handle; 
    unsigned int min_size, max_size; 
 
    /* Lock table names are 15-byte padded values, unique across a Galaxy. */ 
    char table_name[] = "GCU_BAL_GLOCK  "; 
 
    /* Lock names are 15-byte padded values, but need not be unique. */ 
    char lock_name[] = "GCU_BAL_LOCK   "; 
 
    /* Get the size of a Galaxy lock */ 
    status = sys$get_galaxy_lock_size(&min_size,&max_size); 
    if (_failed(status)) return (status); 
 
    /* 
    ** Create or map to a process space Galaxy lock table. We assume 
    ** one page is enough to hold the locks. This will work for up 
    ** to 128 instances. 
    */ 
    status = sys$create_galaxy_lock_table(table_name,PSL$C_USER, 
                PAGESIZE,GLCKTBL$C_PROCESS,0,min_size,&glock_table_handle); 
    if (_failed(status)) return (status); 
 
    /* 
    ** Success case 1: SS$_CREATED                                   
    ** We created the table, so  populate it with locks and          
    ** write the handles to shared memory so the other partitions    
    ** can access them. Only one instance can receive SS$_CREATED    
    ** for a given lock table; all other mappers will get SS$_NORMAL. 
    */ 
    if (status == SS$_CREATED) 
    { 
      printf ("%%GCU$BALANCER-I-CRELOCK, Creating G-locks\n"); 
      for (i=0; i<max_instances; i++) 
      { 
        status = sys$create_galaxy_lock(glock_table_handle,lock_name, 
                   min_size,0,0,0,&handle); 
        gLocks[i] = handle; 
        if (_failed(status)) return (status); 
      } 
    } 
    else 
    { 
    /* 
    ** Success case 2: SS$_NORMAL                              
    ** We mapped the table, but did not create it. Spin until  
    ** the creator fills the lock handles. NOTE: If the creator 
    ** fails in the loop above and does not finish creating the 
    ** locks, then we will be stuck waiting forever - so we    
    ** use a sanity check here. Process space lock tables and  
    ** memory regions are automatically deleted when all       
    ** processes mapping them are deleted, so the worst case   
    ** is this:                                                 
    **                                                          
    ** - Process 1 starts, creates gLock table                  
    ** - Processes 2-n start and are waiting on gLock creation  
    ** - Process 1 dies before completing gLock creation        
    ** - Processes 2-n time out and exit; the half-initialized  
    **   section and lock tables are deleted by VMS.            
    ** - The user (or script) receives SS$_TIMEOUT and can      
    **   now restart all processes with a "clean slate".        
    */ 
      sanity = 0; 
      printf ("%%GCU$BALANCER-I-WAITLOCK, Waiting for G-lock creation...\n"); 
      while (gLocks[max_instances-1] == 0) 
      { 
        if (sanity++ > 1000000) return (SS$_TIMEOUT); 
      } 
    } 
    return (SS$_NORMAL); 
} 
 
/************************************************************************/ 
/* FUNCTION update_cpucount - Update the number of CPUs in this instance*/ 
/*                                                                      */ 
/* ENVIRONMENT: Called directly or via a system event AST.              */ 
/* INPUTS:      None.                                                   */ 
/* OUTPUTS:     Updates this instance's CPU count in shared memory.     */ 
/************************************************************************/ 
void update_cpucount(int unused) 
{ 
   sys$getsyiw(EFN$C_ENF,0,0,&syi_itemlist,&g_iosb,0,0); 
   cpucount[partition_id] = total_cpus; 
} 
 
/************************************************************************/ 
/* FUNCTION cpu_q - Count the number of processes in CPU COM queues     */ 
/*                                                                      */ 
/* ENVIRONMENT: OpenVMS Kernel Mode.                                    */ 
/* INPUTS:      None.                                                   */ 
/* OUTPUTS:     Returns the number of processes on the COM queues.      */ 
/************************************************************************/ 
long cpu_q(void) 
{ 
   uint32 *head, *tmp; 
   long procs = 0; 
   int p; 
 
   head = SCH$AQ_COMH;          /* Head of 1st COM queue                */ 
   sys_lock(SCHED,1,0);         /* Obtain SCHED spinlock                */ 
 
   for (p=64; p>0; p--)         /* Queues to scan (32 COM + 32 COMO)    */ 
   { 
     tmp = (uint32 *) *head;    /* Look at first flink                  */ 
     while (tmp != head)        /* Compare vs. head of queue            */ 
     { 
       procs++;                 /* Different, count a job waiting       */ 
       tmp = (uint32 *) *tmp;   /* Go to next queue entry               */ 
     } 
     head = head + 2;           /* Go to next queue (increment by 2*32) */ 
   }                            /* And scan it (loop to "for p...")     */ 
 
   sys_unlock(SCHED,0,0);       /* Release SCHED spinlock               */ 
   return procs; 
} 
 
/************************************************************************/ 
/* FUNCTION lockdown - Lock the cpu_q routine into the working set      */ 
/*                    so that it can't pagefault while at elevated IPL  */ 
/*                                                                      */ 
/* ENVIRONMENT: Requires CMKRNL privilege.                              */ 
/* INPUTS:      None.                                                   */ 
/* OUTPUTS:     None.                                                   */ 
/************************************************************************/ 
void lockdown(void) 
{ 
   struct pdscdef *proc_desc = (void *)cpu_q; 
   unsigned long sub_addr[2], locked_head[2], locked_code[2]; 
   unsigned long status; 
 
   sub_addr[0] = (unsigned long) cpu_q; 
   sub_addr[1] = sub_addr[0] + PAGESIZE; 
   if (__PAL_PROBER((void *)sub_addr[0],sizeof(int),PSL$C_USER) != 0) 
     sub_addr[1] = sub_addr[0]; 
 
   status = sys$lkwset(sub_addr,locked_head,PSL$C_USER); 
   if (_failed(status)) exit(status); 
 
   sub_addr[0] = proc_desc->pdsc$q_entry[0]; 
   sub_addr[1] = sub_addr[0] + PAGESIZE; 
   if (__PAL_PROBER((void *)sub_addr[0],sizeof(int),PSL$C_USER) != 0) 
        sub_addr[1] = sub_addr[0]; 
 
   status = sys$lkwset(sub_addr,locked_code,PSL$C_USER); 
   if (_failed(status)) exit(status); 
} 
 
/************************************************************************/ 
/* FUNCTION reassign_a_cpu - Reassign a single CPU to another instance. */ 
/*                                                                      */ 
/* ENVIRONMENT: Requires CMKRNL privilege.                              */ 
/* INPUTS:      most_busy_id: partition ID of destination.              */ 
/* OUTPUTS:     None.                                                   */ 
/*                                                                      */ 
/* Donate one CPU at a time - then wait for the remote instance to      */ 
/* reset its heartbeat and recalculate its load.                        */ 
/************************************************************************/ 
void reassign_a_cpu(int most_busy_id) 
{ 
  int status,i; 
  static char op_msg[255]; 
  static char iname_msg[1]; 
  $DESCRIPTOR(op_dsc,op_msg); 
  $DESCRIPTOR(iname_dsc,""); 
  iname_dsc.dsc$w_length = 0; 
 
  /* Update CPU info */ 
 
  status = sys$getsyiw(EFN$C_ENF,0,0,&syi_itemlist,&g_iosb,0,0); 
  if (_failed(status)) exit(status); 
 
  /* Don't attempt reassignment if we are down to one CPU */ 
 
  if (total_cpus > 1) 
  { 
    status = sys$acquire_galaxy_lock(gLocks[most_busy_id],GLOCK_TIMEOUT,0); 
    if (_failed(status)) exit(status); 
    heartbeat[most_busy_id] = HEARTBEAT_TRANSPLANT; 
    status = sys$release_galaxy_lock(gLocks[most_busy_id]); 
    if (_failed(status)) exit(status); 
 
    status = sys$cpu_transitionw(CST$K_CPU_MIGRATE,CST$K_ANY_CPU,0, 
                                  most_busy_id,0,0,0,0,0,0); 
    if (status & 1) 
    { 
      if (getenv ("GCU$BALANCER_VERIFY")) 
      { 
        sprintf(op_msg, 
                "\n\n*****GCU$BALANCER: Reassigned a CPU to instance %li\n", 
                most_busy_id); 
        op_dsc.dsc$w_length = strlen(op_msg); 
        sys$brkthru(0,&op_dsc,&system_dsc,BRK$C_USERNAME,0,0,0,0,0,0,0); 
      } 
      update_cpucount(0);  /* Update the CPU count after donating one */ 
    } 
  } 
} 
 
/************************************************************************/ 
/* IMAGE ENTRY - MAIN                                                   */ 
/*                                                                      */ 
/* ENVIRONMENT: OpenVMS Galaxy                                          */ 
/* INPUTS:      None.                                                   */ 
/* OUTPUTS:     None.                                                   */ 
/************************************************************************/ 
int main(int argc, char **argv) 
{ 
   int           show_stats = 0; 
   long          busy,most_busy,nprocs; 
   int64         delta; 
   unsigned long status,i,j,k,system_cpus,instances; 
   unsigned long arglst         = 0; 
   uint64        version_id[2]  = {0,1}; 
   uint64        region_id      = VA$C_P0; 
   uint64        most_busy_id,cpu_hndl = 0; 
 
/* Static descriptors for storing parameters.  Must match CLD defs */ 
 
   $DESCRIPTOR(p1_desc,"P1");           
   $DESCRIPTOR(p2_desc,"P2"); 
   $DESCRIPTOR(p3_desc,"P3"); 
   $DESCRIPTOR(p4_desc,"P4"); 
   $DESCRIPTOR(stat_desc,"STATISTICS"); 
 
/* Dynamic descriptors for retrieving parameter values */ 
 
   struct dsc$descriptor_d samp_desc = {0,DSC$K_DTYPE_T,DSC$K_CLASS_D,0}; 
   struct dsc$descriptor_d proc_desc = {0,DSC$K_DTYPE_T,DSC$K_CLASS_D,0}; 
   struct dsc$descriptor_d time_desc = {0,DSC$K_DTYPE_T,DSC$K_CLASS_D,0}; 
 
   struct SYI_ITEM_LIST syi_pagesize_list[3] = { 
     {sizeof (long), SYI$_PAGE_SIZE      ,&PAGESIZE     ,0}, 
     {sizeof (long), SYI$_GLX_MAX_MEMBERS,&max_instances,0}, 
     {0,0,0,0}}; 
/* 
** num_samples and time_desc determine how often the balancer checks 
** whether any other instance needs more CPUs. num_samples is the 
** number of samples used to calculate the running average, and 
** time_desc is the amount of time between samples. 
** 
** For example, a time_desc of 30 seconds and a num_samples of 20 means 
** that a running average over the last 10 minutes (20 samples * 30 secs) 
** is used to balance CPUs. 
** 
** load_tolerance is the minimum load difference that triggers a CPU 
** reassignment. 100 is equal to 1 process in the computable CPU queue. 
*/ 
   int num_samples;     /* Number of samples in running average      */ 
   int load_tolerance;  /* Minimum load diff to trigger reassignment */ 
 
/* Parse the CLI */ 
                                                /* CONFIGURE VERB */ 
   status      = CLI$PRESENT(&p1_desc);         /* BALANCER       */ 
   if (status != CLI$_PRESENT) exit(status); 
   status      = CLI$PRESENT(&p2_desc);         /* SAMPLES        */ 
   if (status != CLI$_PRESENT) exit(status); 
   status      = CLI$PRESENT(&p3_desc);         /* PROCESSES      */ 
   if (status != CLI$_PRESENT) exit(status); 
   status      = CLI$PRESENT(&p4_desc);         /* TIME           */ 
   if (status != CLI$_PRESENT) exit(status); 
 
   status     = CLI$GET_VALUE(&p2_desc,&samp_desc); 
   if (_failed(status)) exit(status); 
   status     = CLI$GET_VALUE(&p3_desc,&proc_desc); 
   if (_failed(status)) exit(status); 
   status     = CLI$GET_VALUE(&p4_desc,&time_desc); 
   if (_failed(status)) exit(status); 
   status     = CLI$PRESENT(&stat_desc); 
   show_stats = (status == CLI$_PRESENT) ? 1 : 0; 
 
   num_samples = atoi(samp_desc.dsc$a_pointer); 
   if (num_samples <= 0) num_samples = 3; 
 
   load_tolerance = (100 * (atoi(proc_desc.dsc$a_pointer))); 
   if (load_tolerance <= 0) load_tolerance = 100; 
 
   if (show_stats) 
     printf("Args: Samples: %d, Processes: %d, Time: %s\n", 
        num_samples,load_tolerance/100,time_desc.dsc$a_pointer); 
 
   lockdown();                  /* Lock down the cpu_q subroutine */ 
 
   /* Get the page size and max members for this system */ 
 
   status = sys$getsyiw(EFN$C_ENF,0,0,&syi_pagesize_list,&g_iosb,0,0); 
   if (_failed(status)) return (status); 
 
   if (max_instances == 0) max_instances = 1; 
 
   /* Get our partition ID and initial CPU info */ 
 
   status = sys$getsyiw(EFN$C_ENF,0,0,&syi_itemlist,&g_iosb,0,0); 
   if (_failed(status)) return (status); 
 
   /* Map two pages of shared memory */ 
 
   status = sys$crmpsc_gdzro_64(&gblsec_dsc,version_id,0,PAGESIZE+PAGESIZE, 
              &region_id,0,PSL$C_USER,(SEC$M_EXPREG|SEC$M_SYSGBL|SEC$M_SHMGS), 
              &gs_va,&gs_length); 
   if (_failed(status)) exit(status); 
 
   /* Initialize the pointers into shared memory */ 
 
   busycnt   = (uint64 *) gs_va; 
   heartbeat = (uint64 *) gs_va     + max_instances; 
   cpucount  = (uint64 *) heartbeat + max_instances; 
   gLocks    = (uint64 *) cpucount  + max_instances; 
 
   cpucount[partition_id] = total_cpus; 
 
   /* Create or map the Galaxy lock table */ 
 
   status = init_lock_tables(); 
   if (_failed(status)) exit(status); 
 
   /* Initialize delta time for sleeping */ 
 
   status = sys$bintim(&time_desc,&delta); 
   if (_failed(status)) exit(status); 
 
   /* 
   ** Register for CPU migration events. Whenever a CPU is added to 
   ** our active set, the routine "update_cpucount" will fire. 
   */ 
   status = sys$set_system_event(SYSEVT$C_ADD_ACTIVE_CPU, 
              update_cpucount,0,0,SYSEVT$M_REPEAT_NOTIFY,&cpu_hndl); 
   if (_failed(status)) exit(status); 
 
   /* Force everyone to resync before we do anything */ 
 
   for (j=0; j<max_instances; j++) 
   { 
     status = sys$acquire_galaxy_lock(gLocks[j],GLOCK_TIMEOUT,0); 
     if (_failed(status)) exit(status); 
     heartbeat[j] = HEARTBEAT_RESTART; 
     status = sys$release_galaxy_lock (gLocks[j]); 
     if (_failed(status)) exit(status); 
   } 
 
   printf("%%GCU$BALANCER-S-INIT, CPU balancer initialized.\n\n"); 
 
   /*** Main loop ***/ 
   do 
   { 
      /* Sample the computable-queue depth (scaled by 100) */ 
 
      nprocs = sys$cmkrnl(cpu_q,&arglst) * 100; 
 
     /* Check out our state... */ 
 
     switch (heartbeat[partition_id]) 
     { 
        case HEARTBEAT_RESTART: /* Mark ourselves for reinitialization. */ 
       { 
         update_cpucount(0); 
         status = sys$acquire_galaxy_lock(gLocks[partition_id],GLOCK_TIMEOUT,0); 
         if (_failed(status)) exit(status); 
         heartbeat[partition_id] = HEARTBEAT_ALIVE; 
         status = sys$release_galaxy_lock(gLocks[partition_id]); 
         if (_failed(status)) exit(status); 
         break; 
       } 
       case HEARTBEAT_ALIVE: /* Update running average and continue. */ 
       { 
         busy = (busycnt[partition_id]*(num_samples-1)+nprocs)/num_samples; 
         busycnt[partition_id] = busy; 
         break; 
       } 
       case HEARTBEAT_TRANSPLANT:  /* Waiting for a new CPU to arrive. */ 
       { 
          /* 
          ** Someone either just reset us, or gave us a CPU and put a hold 
          ** on further donations.  Reassure the Galaxy that we're alive, 
          ** and calculate a new busy count. 
          */ 
         busycnt[partition_id] = nprocs; 
         status = sys$acquire_galaxy_lock(gLocks[partition_id],GLOCK_TIMEOUT,0); 
         if (_failed(status)) exit(status); 
         heartbeat[partition_id] = HEARTBEAT_ALIVE; 
         status = sys$release_galaxy_lock(gLocks[partition_id]); 
         if (_failed(status)) exit(status); 
         break; 
       } 
       default:         /* This should never happen. */ 
       { 
         exit(0); 
         break; 
       } 
     } 
 
     /* Determine the most_busy instance. */ 
 
     for (most_busy_id=most_busy=i=0; i<max_instances; i++) 
     { 
       if (busycnt[i] > most_busy) 
       { 
         most_busy_id = (uint64) i; 
         most_busy    = busycnt[i]; 
       } 
     } 
 
     if (show_stats) 
       printf("Current Load: %3Ld, Busiest Instance: %Ld, Queue Depth: %4d\r", 
             busycnt[partition_id],most_busy_id,(nprocs/100)); 
 
      /* If someone needs a CPU and we have an extra, donate it. */ 
 
     if ((most_busy > busy + load_tolerance) && 
         (cpucount[partition_id] > 1) && 
         (heartbeat[most_busy_id] != HEARTBEAT_TRANSPLANT) && 
         (most_busy_id != partition_id)) 
     { 
        reassign_a_cpu(most_busy_id); 
     } 
 
     /* Hibernate for a while and do it all again. */ 
 
     status = sys$schdwk(0,0,&delta,0); 
     if (_failed(status)) exit(status); 
     status = sys$hiber(); 
     if (_failed(status)) exit(status); 
 
   } while (1); 
   return (1); 
} 
 
 