Document revision date: 30 March 2001
OpenVMS Galaxy instances executing in a shared-everything cluster environment, in which all security database files are shared among all instances, automatically provide a consistent view of all Galaxy-related security profiles.
If you choose not to share all security database files throughout all Galaxy instances, a consistent security profile can only be achieved manually. Changes to an object's security profile must be followed by similar changes on all instances where this object can be accessed.
Because of the need to propagate changes manually, it is unlikely that
such a configuration would ever be covered by a US C2 evaluation or by
similar evaluations from other authorities. Organizations that require
operating systems to have security evaluations should ensure that all
instances in a single OpenVMS Galaxy belong to the same cluster.
2.13 Configuring OpenVMS Galaxy Instances in Time Zones
OpenVMS Galaxy instances do not have to be in the same time zone unless
they are in the same cluster. For example, each instance in a
three-instance Galaxy configuration could be in a different time zone.
2.14 Developing OpenVMS Galaxy Programs
The following sections describe OpenVMS programming interfaces that are useful in developing OpenVMS Galaxy application programs. Many of the concepts are extensions of the traditional single-instance OpenVMS system.
To see the C function prototypes for the services described in these chapters, enter the following command:
$ library/extract=starlet sys$library:sys$starlet_c.tlb/output=filename
Then search the output file for the service you want to see.
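For example, to extract all of the prototypes into one file and then locate the Galaxy locking services (the output file name STARLET.H here is arbitrary):

$ LIBRARY/EXTRACT=STARLET SYS$LIBRARY:SYS$STARLET_C.TLB/OUTPUT=STARLET.H
$ SEARCH STARLET.H "GALAXY_LOCK"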
2.14.1 Locking Programming Interfaces
One of the major features of the Galaxy platform is the ability to share resources across multiple instances of the operating system. As with any shared resource, the need arises to synchronize access to that resource. The services described in this chapter provide primitives upon which a cooperative scheme can be created to synchronize access to shared resources within a Galaxy.
A Galaxy lock is a combination of a spinlock and a mutex. While attempting to acquire an owned Galaxy lock, a thread spins for a short period; if the lock does not become available during the spin, the thread puts itself into a wait state. This differs from SMP spinlocks, where the system crashes if the spin times out, behavior that would not be acceptable in a Galaxy.
Because they must be visible to the sharing instances, Galaxy locks reside in shared memory. That shared memory can be allocated either by the user or by the Galaxy locking services. If the user allocates the memory, the locking services track only the location of the locks; if the locking services allocate the memory, it is managed on behalf of the user.
Unlike other monitoring code, which is included only in the MON version of the execlets, the Galaxy lock monitoring code is always loaded.
Several routines are provided to manipulate Galaxy locks. They provide only basic locking: somewhat richer than the spinlocks used to support SMP, but far less capable than the lock manager. Table 2-1 summarizes the OpenVMS system services for lock programming.
System Service | Description |
---|---|
$ACQUIRE_GALAXY_LOCK | Acquires ownership of an OpenVMS Galaxy lock. |
$CREATE_GALAXY_LOCK | Allocates an OpenVMS Galaxy lock block from a lock table created with the $CREATE_GALAXY_LOCK_TABLE service. |
$CREATE_GALAXY_LOCK_TABLE | Allocates an OpenVMS Galaxy lock table. |
$DELETE_GALAXY_LOCK | Invalidates an OpenVMS Galaxy lock and deletes it. |
$DELETE_GALAXY_LOCK_TABLE | Deletes an OpenVMS Galaxy lock table. |
$GET_GALAXY_LOCK_INFO | Returns "interesting" fields from the specified lock. |
$GET_GALAXY_LOCK_SIZE | Returns the minimum and maximum size of an OpenVMS Galaxy lock. |
$RELEASE_GALAXY_LOCK | Releases ownership of an OpenVMS Galaxy lock. |
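As an illustration only, the following C sketch serializes updates to a counter that lives in Galaxy shared memory. It assumes the lock table and lock were created elsewhere (with $CREATE_GALAXY_LOCK_TABLE and $CREATE_GALAXY_LOCK) and that the lock handle and counter address are passed in; the argument layout of the acquire call (handle, timeout, flags) is an assumption and should be verified against the prototypes extracted from SYS$STARLET_C.TLB as described above.

    #include <starlet.h>    /* system service prototypes; the Galaxy lock   */
                            /* prototypes are assumed to be declared here   */
    #include <ssdef.h>      /* SS$_ status codes                            */

    int update_shared_counter (unsigned __int64 lock_handle,
                               volatile int *shared_counter)
    {
        int status;

        /* Spin briefly, then wait, until the lock is granted.
           The two zero arguments assume default timeout and no flags. */
        status = sys$acquire_galaxy_lock (lock_handle, 0, 0);
        if (!(status & 1))              /* low bit clear: acquire failed */
            return status;

        (*shared_counter)++;            /* critical section in shared memory */

        return sys$release_galaxy_lock (lock_handle);
    }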
2.14.2 System Events Programming Interfaces
Applications can register to be notified when certain system events occur; for example, when an instance joins the Galaxy or when a CPU joins a configure set. By registering for events, an application can decide how to respond when the registered events occur.
Table 2-2 summarizes the OpenVMS system services available for events programming.
System Service | Description |
---|---|
$CLEAR_SYSTEM_EVENT | Removes one or more notification requests previously established by a call to $SET_SYSTEM_EVENT. |
$SET_SYSTEM_EVENT | Establishes a request for notification when an OpenVMS system event occurs. |
2.14.3 System Dump Analyzer (SDA)
This section describes System Dump Analyzer (SDA) information that is specific to an OpenVMS Galaxy computing environment.
For more information about using SDA, refer to the OpenVMS Alpha System Analysis Tools Manual.
2.14.3.1 Dumping Shared Memory
When a system crash occurs in a Galaxy instance, the default behavior of OpenVMS is to dump the contents of private memory of the failed instance and the contents of shared memory. In a full dump, every page of both shared and private memory is dumped; in a selective dump, only those pages in use at the time of the system crash are dumped.
Dumping of shared memory can be disabled by setting bit 4 of the dynamic SYSGEN parameter DUMPSTYLE. This bit should be set only on the advice of your Compaq support representative, because the resulting system dump may not contain the data required to determine the cause of the system crash.
Table 2-3 shows the definitions of all the bits in DUMPSTYLE and their meanings in OpenVMS Alpha. Any combination of bits can be set.
Bit | Value | Description |
---|---|---|
0 | 1 | 0 = Full dump. The entire contents of physical memory will be written to the dump file. 1 = Selective dump. The contents of memory will be written to the dump file selectively to maximize the usefulness of the dump file while conserving disk space (only pages that are in use are written). |
1 | 2 | 0 = Minimal console output. This consists of the bugcheck code; the identity of the CPU, process, and image where the crash occurred; the system date and time; plus a series of dots indicating progress writing the dump. 1 = Full console output. This includes the minimal output described above plus stack and register contents, system layout, and additional progress information such as the names of processes as they are dumped. |
2 | 4 | 0 = Dump to system disk. The dump will be written to SYS$SYSDEVICE:[SYSn.SYSEXE]SYSDUMP.DMP or, in its absence, to SYS$SYSDEVICE:[SYSn.SYSEXE]PAGEFILE.SYS. 1 = Dump to alternate disk. The dump will be written to dump_dev:[SYSn.SYSEXE]SYSDUMP.DMP, where dump_dev is the value of the console environment variable DUMP_DEV. |
3 | 8 | 0 = Uncompressed dump. Pages are written directly to the dump file. 1 = Compressed dump. Each page is compressed before it is written, saving space and time writing the dump at the expense of a slight increase in the time taken to access the dump. |
4 | 16 | 0 = Dump shared memory. 1 = Do not dump shared memory. |
The default setting for DUMPSTYLE is 0 (an uncompressed full dump,
including shared memory, written to the system disk). Unless a value
for DUMPSTYLE is specified in MODPARAMS.DAT, AUTOGEN.COM will set
DUMPSTYLE to 1 (an uncompressed selective dump, including shared
memory, written to the system disk) if there is less than 128 megabytes
of memory on the system, or to 9 (a compressed selective dump,
including shared memory, written to the system disk) otherwise.
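For example, the AUTOGEN value 9 is bits 0 and 3 (1 + 8: a compressed selective dump); adding bit 4 gives 25 (1 + 8 + 16: a compressed selective dump with shared memory excluded). Because DUMPSTYLE is dynamic, a new value can be applied to the running system with SYSGEN, as in the following sketch (the value 25 is illustrative, and bit 4 should still be set only on the advice of Compaq support):

$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE ACTIVE
SYSGEN> SET DUMPSTYLE 25
SYSGEN> WRITE ACTIVE
SYSGEN> WRITE CURRENT
SYSGEN> EXIT

WRITE ACTIVE changes the running system; WRITE CURRENT makes the change persist across reboots.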
2.14.3.2 Summary of SDA Command Interface Changes or Additions
The System Dump Analyzer (SDA) has been enhanced to view shared memory and OpenVMS Galaxy data structures. For more details, see the descriptions of the individual SDA commands in the OpenVMS Alpha System Analysis Tools Manual.
Chapter 3 NUMA Implications on OpenVMS Applications
NUMA (nonuniform memory access) is an attribute of a system in which the access time to any given physical memory location is not the same for all CPUs. Given this architecture, high performance requires consistently good location of code and data (though not necessarily 100% of the time). In the new AlphaServer GS series, CPUs access memory in their own QBB faster than they access memory in another QBB.
If OpenVMS is running on the resources of a single QBB, there is no NUMA effect and this discussion does not apply. Whenever possible and practical, you can benefit by running in a single QBB, thereby eliminating the complexities NUMA may present.
The most common question for overall system performance in a NUMA environment is, "uniform for all?" or "optimal for a few?" In other words, do you want all processes to have roughly equivalent performance, or do you want to focus on some specific processes and make them as efficient as possible? Whenever a single instance of OpenVMS runs on multiple QBBs (whether it is the entire machine, a hard partition, or a Galaxy instance), then you must answer this question, because the answer dictates a number of configuration and management decisions you need to understand.
The OpenVMS default NUMA mode of operation is "uniform for all". Resources are assigned so that over time each process on the system has, on average, roughly the same performance potential.
If "uniform for all" is not what you want, you must understand the interfaces available to you in order to achieve the more specialized "optimal for a few" or "dedicated" environment. Processes and data can be assigned to specific resources to give them the highest performance potential possible.
To further enhance your understanding of the NUMA environment, the rest of this chapter discusses OpenVMS NUMA awareness, application resource considerations, and the RAD programming and management interfaces.
3.1 OpenVMS NUMA Awareness
OpenVMS memory management and process scheduling have been enhanced to work more efficiently on the new AlphaServer GS Series hardware.
The operating system treats the hardware as a set of Resource Affinity Domains (RADs). A RAD is the software grouping of physical resources (CPUs, memory, and I/O) with common access characteristics. On the new AlphaServer GS Series systems, a RAD corresponds to a Quad Building Block (QBB). When a single instance of OpenVMS runs on multiple QBBs, a QBB is seen as a RAD by OpenVMS.
Each of the following areas of enhancement adds a new capability to the system. Individually, each brings increased performance potential for certain application needs; collectively, they provide the environment necessary for a diverse application mix. The areas being addressed are described in the sections that follow.
A CPU references memory in the same RAD three times faster than it references memory in another RAD. Therefore, it is important to keep the code being executed and the memory it references in the same RAD as much as possible. Consistently good location is the key to good performance; when assessing performance, a programmer needs to consider how well code and the memory it references are kept together.
The OpenVMS scheduler and the memory management subsystem work together to achieve the best possible location by assigning each process a home RAD, replicating system code in the memory of each RAD, and distributing global pages over the RADs, as described in the following sections.
3.1.1 Home RAD
The OpenVMS operating system assigns a home RAD to each process during process creation. This has two major implications. First, with rare exceptions, one of the CPUs in the process's home RAD runs the process. Second, all process-private pages required by the process come from memory in the home RAD. This combination helps maximize local memory references.
When assigning home RADs, the default action of OpenVMS is to
distribute the processes over the RADs.
3.1.2 System Code Replication
During system startup, the operating system code is replicated in the memory of each RAD so that each process in the system accesses local memory whenever it requires system functions. This replication covers both the executive code and the installed resident image code granularity hint regions.
3.1.3 Distributing Global Pages
The default action of OpenVMS is to distribute global pages (the pages
of a global section) over the RADs. This approach is also taken with
the assignment of global pages that have been declared as reserved
memory during system startup.
3.2 Application Resource Considerations
Each application environment is different, and an application's structure may dictate which options are best for achieving the desired goals. The deciding factors, discussed in the following sections, include how processes share data, how much memory the application needs, how access to shared data is synchronized, and how heavily the application uses base operating system features.
There are few absolute rules, but the following sections present some
basic concepts and examples that will usually lead to the best outcome.
Localizing (on-QBB) memory access is always the goal, but it is not
always achievable and that is where tradeoffs are most likely to be
made.
3.2.1 Processes and Shared Data
If you have hundreds, or maybe thousands, of processes that access a single global section, then you most likely want the default behavior of the operating system. The pages of the global section will be equally distributed in the memory of all RADs, and the processes' home RAD assignments will be equally distributed over the CPUs. This is the distributed, or "uniform", effect where over time all processes have similar performance potential given random accesses to the global section. None will be optimal but none will be at a severe disadvantage compared to the others.
On the other hand, a small number of processes accessing a global
section can be "located" in a single RAD as long as 4 CPUs can handle
the processing load and a single RAD contains sufficient memory for the
entire global section. This will localize most memory access and
therefore enhance performance of those specifically located processes.
This strategy can be employed multiple times on the same system by
locating one set of processes and their data in one RAD and a second
set of processes and their data in another RAD.
3.2.2 Memory
A single QBB can have up to 32 GB of memory; two can have up to 64 GB, and so on. Take advantage of the large memory capacity whenever possible. For example, consider duplicating code or data in multiple RADs. It will take some analysis, may seem wasteful of space, and will require coordination. However, it may be worthwhile if it ultimately makes significantly more memory references local.
Consider the use of a RAM disk product. Even if NUMA is involved,
in-memory references will outperform real device I/O.
3.2.3 Sharing and Synchronization
Sharing data usually requires synchronization. If the coordination
mechanism is a single memory location (sometimes called a latch, a
lock, or a semaphore), then it may be the cause of many remote accesses
and therefore degrade performance if the contention is high enough.
Multiple levels of such locks distributed throughout the data may
reduce the amount of remote access.
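As a generic illustration (plain C, not a specific OpenVMS interface; all names here are hypothetical), the following sketch distributes the latches along with the data: each RAD-local partition of a table has its own latch, so a process updating data in its own RAD takes a latch that also lives in local memory, and only cross-partition operations would need additional, higher-level coordination.

    #include <stdatomic.h>
    #include <stdint.h>

    #define NUM_RADS              4       /* illustrative: one partition per RAD */
    #define ENTRIES_PER_PARTITION 1024

    /* One latch and one data partition per RAD, so that a process running in
       RAD n normally touches only partition[n], which can be placed in that
       RAD's local memory. */
    typedef struct {
        atomic_flag latch;                        /* test-and-set latch; zero   */
                                                  /* initialization means free  */
        uint64_t    data[ENTRIES_PER_PARTITION];  /* data this latch protects   */
    } rad_partition;

    static rad_partition partition[NUM_RADS];

    static void latch_acquire (atomic_flag *latch)
    {
        while (atomic_flag_test_and_set (latch))  /* spin until the latch frees */
            ;
    }

    static void latch_release (atomic_flag *latch)
    {
        atomic_flag_clear (latch);
    }

    /* Update an entry under the latch of the partition (RAD) it lives in,
       rather than under a single system-wide latch. */
    void update_entry (int rad, int index, uint64_t value)
    {
        latch_acquire (&partition[rad].latch);
        partition[rad].data[index] = value;
        latch_release (&partition[rad].latch);
    }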
3.2.4 Use of OpenVMS Features
Heavy use of certain base operating system features results in many remote accesses, because the data that supports these functions resides in the memory of QBB0. Some of this data cannot be duplicated; some could be duplicated but has not been yet.
3.3 RAD Application Programming Interfaces
A number of interfaces specific to RADs are available to application
programmers and system managers for controlling the location of
processes and memory if the system defaults do not meet the needs of
the operating environment. The following subsections are brief
descriptions; the details can be found in the appropriate OpenVMS
System Services Reference Manual.
3.3.1 Creating a Process
If you want a process to have a specific home RAD, then use the new
HOME_RAD argument in the SYS$CREPRC system service. This allows the
application to control the location.
3.3.2 Moving a Process
If a process has already been created and you want to relocate it, use
the HOME_RAD argument to the SYS$SET_PROCESS_PROPERTIES system service.
The process's working set will be purged and, as it runs on the CPUs in
its new home RAD, its private pages will be reassigned from memory in
the new home RAD.
3.3.3 Getting Information About a Process
The SYS$GETJPI system service returns the home RAD of a process.
3.3.4 Creating a Global Section
The SYS$CRMPSC_GDZRO_64 and SYS$CREATE_GDZRO system services accept a RAD mask argument, which indicates the RADs in which OpenVMS should attempt to place the pages of the global section.
3.3.5 Assigning Reserved Memory
The SYSMAN interface for assigning reserved memory has a RAD qualifier,
so a system manager can declare that the memory being reserved should
come from specific RADs.
3.3.6 Getting Information About the System
The SYS$GETSYI system service defines the item codes RAD_MAX_RADS, RAD_CPUS, RAD_MEMSIZE, RAD_SHMEMSIZE, and GALAXY_SHMEMSIZE for obtaining RAD information; see the summary table in Section 3.4.
The RAD_SUPPORT system parameter has numerous bits and fields defined
for customizing individual RAD-related actions.
3.4 RAD System Services Summary Table
The following table describes RAD system service information for OpenVMS Version 7.3.
For additional information, refer to the OpenVMS System Services Reference Manual.
System Service | RAD Information |
---|---|
$CREATE_GDZRO | Argument: rad_mask; Flag: SEC$M_RAD_HINT; Error status: SS$_BADRAD |
$CREPRC | Argument: home_rad; Status flag bit: stsflg; Symbolic name: PRC$M_HOME_RAD; Error status: SS$_BADRAD |
$CRMPSC_GDZRO_64 | Argument: rad_mask; Flag: SEC$M_RAD_MASK; Error status: SS$_BADRAD |
$GETJPI | Item code: JPI$_HOME_RAD |
$GETSYI | Item codes: RAD_MAX_RADS, RAD_CPUS, RAD_MEMSIZE, RAD_SHMEMSIZE, GALAXY_SHMEMSIZE |
$SET_PROCESS_PROPERTIESW | Item code: PPROP$C_HOME_RAD |
The following table summarizes OpenVMS RAD DCL commands. For additional information, refer to the OpenVMS DCL Dictionary.
DCL Command/Lexical | RAD Information |
---|---|
SET PROCESS | Qualifier: /RAD=HOME=n |
SHOW PROCESS | Qualifier: /RAD |
F$GETJPI | Item code: HOME_RAD |
F$GETSYI | Item codes: RAD_MAX_RADS, RAD_CPUS, RAD_MEMSIZE, RAD_SHMEMSIZE |
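For example, the following DCL sequence (the process name BATCH_42 and the RAD number are illustrative) moves a process to RAD 1, displays its RAD information, and then uses the lexical functions to show the current process's home RAD and the maximum number of RADs on the platform:

$ SET PROCESS/RAD=HOME=1 BATCH_42
$ SHOW PROCESS/RAD BATCH_42
$ WRITE SYS$OUTPUT F$GETJPI("","HOME_RAD")
$ WRITE SYS$OUTPUT F$GETSYI("RAD_MAX_RADS")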