Document revision date: 19 July 1999

OpenVMS Alpha Galaxy Guide



10.2 Hardware and Firmware Issues Affecting AlphaServer 8400 and 8200 Systems

The following sections describe known OpenVMS Galaxy issues that affect AlphaServer 8400 and 8200 systems.

10.2.1 Allocating Primary CPUs to Instances on AlphaServer 8400/8200

On AlphaServer 8400/8200 systems, only an even-numbered CPU can be the primary CPU in an instance. Therefore, do not define an instance that consists only of odd-numbered CPUs.
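
The restriction can be checked before an instance is defined. The following Python sketch is illustrative only: the function name and the representation of an instance as a list of CPU IDs are assumptions, not part of any OpenVMS or console interface.

```python
def has_eligible_primary(cpu_ids):
    """Return True if at least one CPU in the proposed instance is even-numbered.

    Assumption: an instance is modeled as a plain collection of integer CPU IDs.
    """
    return any(cpu % 2 == 0 for cpu in cpu_ids)


# AlphaServer 8200 example: CPUs 8 through 11 split across two instances.
print(has_eligible_primary([8, 9]))   # True: CPU 8 is even-numbered
print(has_eligible_primary([9, 11]))  # False: only odd-numbered CPUs
```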

10.2.2 Console Terminal Connection (Secondary Instances)

The console terminal must be connected to COM1, which is the connector furthest from the DWLPB motherboard. See Chapter 6 for complete details about installing the KFE72-DA console subsystem.

If the console terminal is connected to COM2, you will be able to enter console commands, but you will not see any output from the console or from OpenVMS.

10.2.3 EISA Ethernet Port is Unsupported

Use of the twisted-pair Ethernet port on the standard I/O module of the KFE72-DA is unsupported.

10.2.4 Console MIGRATE Command on AlphaServer 8200

On an AlphaServer 8200 Galaxy system, the CPUs are numbered 8, 9, 10, and 11. CPU 11 cannot be reassigned from one instance to another under the following conditions:

The console migrate command generated by OpenVMS was:


MIGRATE -CPU 11 -PARTITION 0 

This produced the error "unable to migrate CPU 17". Changing the migrate command to specify -CPU 0B produced the desired effect.

The problem occurs because the MIGRATE command currently expects hexadecimal CPU numbers. It is likely to affect only the AlphaServer 8200, because a 12-CPU, 4GB (total), 2-instance AlphaServer 8400 Galaxy seems to be an unlikely configuration.

This problem will be fixed in a future console version.
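
The mismatch is a plain base conversion: the console reads the digits 11 as hexadecimal, which is 17 in decimal, while decimal 11 is written 0B in hexadecimal. The following Python sketch shows both directions. The helper name is hypothetical, and the two-digit, zero-padded form simply follows the 0B used in the workaround above.

```python
def console_cpu_arg(cpu_id):
    """Format a decimal CPU ID as a two-digit hexadecimal string,
    the form used for -CPU in the workaround above (11 -> "0B")."""
    return format(cpu_id, "02X")


# How the digits "11" are read when hexadecimal is expected:
print(int("11", 16))        # 17, matching "unable to migrate CPU 17"

# The value to specify for decimal CPU 11:
print(console_cpu_arg(11))  # 0B
```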

10.2.5 DWLPA Cannot Be Used

The KFE72-DA that provides the console connection for instances other than instance zero on an AlphaServer 8200 or 8400 must be installed in a DWLPB PCI bus. If a DWLPA PCI bus is used, a console machine check will occur during power-up before the initial P00>>> prompt is displayed.

10.3 Hardware and Firmware Issues Affecting AlphaServer 4100 Systems

The following sections describe known OpenVMS Galaxy issues that affect AlphaServer 4100 systems.

10.3.1 Allocating Primary CPUs to Instances on AlphaServer 4100

On AlphaServer 4100 systems, the following CPU allocation restrictions apply:

10.3.2 No LPINIT on AlphaServer 4100 Systems

On AlphaServer 8400 and 8200 systems, the LPINIT or GALAXY console commands start the consoles for second or third instances.

The LPINIT command is not valid on AlphaServer 4100 systems. On AlphaServer 4100 systems, you must use the GALAXY command to start the console for the second instance.

If you enter the LPINIT command on an AlphaServer 4100 system, the following message is displayed:


P00>>> lpinit 
lpinit: No such command 
P00>>> 

This will be corrected in a future Galaxy firmware update.

10.3.3 Do Not Use Gigabit Cards on AlphaServer 4100 Galaxy Systems

The Gigabit Ethernet adapter (DEGPA) is not supported for use in an OpenVMS Galaxy system. A fix will be included in a future version of OpenVMS.

10.3.4 Minimum Revision Power Control Module

A power control module (PCM) 54-24117-01 Rev F03 is needed to support more than two CPUs.

Three CPUs might work sometimes, but using a PCM 54-24117-01 Rev F03 is the safest practice.

10.4 OpenVMS Software Issues Affecting All Platforms

This section lists known OpenVMS software issues that affect all Galaxy platforms.

10.4.1 SSRVEXCEPT Bugcheck

If your OpenVMS Galaxy is configured in an existing OpenVMS Cluster, you must ensure that all the nodes in the cluster recognize new security classes as described in the Release Notes chapter (Chapter 1).

Failure to follow these procedures will cause OpenVMS VAX and Alpha systems running OpenVMS Version 6.2 or Version 7.1 to crash.

For complete documentation about this issue, see the Release Notes (Chapter 1).

10.4.2 GLXSHUTSHMEM Bugcheck

In an OpenVMS Galaxy, no process can have shared memory mapped on an instance when that instance leaves the Galaxy (for example, during a shutdown). If an application runs as a system process (UIC group 1), you must modify SYS$MANAGER:SYSHUTDWN.COM to stop the process, as shown in the following example from the OpenVMS Galaxy CPU Load Balancer program:


$! SYSHUTDWN.COM example: paste into SYS$MANAGER:SYSHUTDWN.COM
$!
$! If the GCU$BALANCER image is running, stop it to release shmem.
$!
$ procctx = f$context("process",ctx,"prcnam","GCU$BALANCER","eql")
$ procid  = f$pid(ctx)
$ if procid .NES. "" then $ stop/id='procid'

For more information about the shutdown warning in the OpenVMS Galaxy CPU Load balancer program, see Appendix A.

If a process still has shared memory mapped when an instance leaves the Galaxy, the instance will crash with a GLXSHUTSHMEM bugcheck.

10.4.3 GLXRMTPFN Bugcheck (Selective Dump Only)

The GLXRMTPFN bugcheck occurs if a potential corruption is detected by one instance while a selective system dump is being written and a PTE is seen with write access to private memory in another instance. Because there is no way of knowing whether corruption has occurred, the instance affected is asked to crash too. The following error message is displayed:


GLXRMTPFN, Remote node had PFN of this node mapped for write 

To find the culprit, use ANALYZE/CRASH on the "triggering" dump file:

  1. Use SHOW PAGE/INVALID_PFN=WRITABLE to search system space.
  2. Use SHOW PROCESS/PAGE/INVALID_PFN=WRITABLE to search the current process.
  3. Use SHOW PROCESS ALL/PAGE/INVALID_PFN=WRITABLE to search remaining processes.

Note that if you find PFN 0 referenced, it might be poor coding practice rather than a corruption event.

10.4.4 PTE Checker Program

The PTE checker program examines page table entries (PTEs) to check for mapped but invalid PFNs. (Invalid means that the instance is not allowed write access to this PFN.) If such an entry is found, the memory management routine called to validate the PFNs will crash the system running the PTE checker program with an INVPFN bugcheck. If a selective dump is written, the instance that owns the PFN in question will crash with a GLXRMTPFN bugcheck.

Note that the INVPFN crash dump is the one to look at, not the resulting GLXRMTPFN.

A process must have CMKRNL privilege to run the SYS$TEST:GLX$PTECHECK.EXE program.

Using this program avoids waiting for a crash to occur to find a mapped but invalid PFN.

10.4.5 SET CPU/FAILOVER Error Messages

The causes of some errors when using the SET CPU/FAILOVER command might not be obvious from the following error messages:

NOSUCHCPU is displayed when no CPU has been specified.

BADPARAM is displayed when a nonexistent instance has been given.

Remember that the command needs both a CPU and a target instance. For example:


SET CPU cpu /FAILOVER=instance 
 
SET CPU /ALL /FAILOVER=instance 

These messages will be improved in a future version of OpenVMS.
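
Until then, a small wrapper can make the two required parts explicit before a command is issued. The following Python sketch only builds command text in the two forms shown above and does not execute anything. The function name, its parameters, and the instance name INST1 are hypothetical, and the sketch cannot tell whether the named instance actually exists.

```python
def failover_command(instance, cpu=None, all_cpus=False):
    """Build SET CPU/FAILOVER command text in the two forms shown above.

    Raises ValueError when the CPU or the target instance is missing.
    """
    if not instance:
        raise ValueError("a target instance is required")
    if all_cpus:
        return f"SET CPU /ALL /FAILOVER={instance}"
    if cpu is None:
        raise ValueError("a CPU (or all_cpus=True) is required")
    return f"SET CPU {cpu} /FAILOVER={instance}"


# INST1 is a placeholder instance name used only for illustration.
print(failover_command("INST1", cpu=5))
print(failover_command("INST1", all_cpus=True))
```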

10.4.6 No Network Booting of Nonprimary Instances

Network booting of Galaxy instances is supported only for instance 0. In other words, you cannot boot any instance other than instance 0 over a network.

10.5 Turning Galaxy Mode Off

To turn off OpenVMS Galaxy software, set the lp_count environment variable to 0 and reboot with the GALAXY system parameter set to 0, as shown in the following commands:


>>> SET LP_COUNT 0            ! Return to monolithic SMP config
>>> INIT                      ! Return to single SMP console 
>>> B -fl 0,1 device          ! Stop at SYSBOOT 
SYSBOOT> SET GALAXY 0 
SYSBOOT> CONTINUE 


Part III
Managing an OpenVMS Galaxy


Chapter 11
OpenVMS Galaxy Configuration Utility

The Galaxy Configuration Utility (GCU) is a DECwindows Motif application that allows system managers to configure and manage an OpenVMS Galaxy system from a single workstation window.

Using the GCU, system managers can:

The GCU resides in the SYS$SYSTEM directory along with a small number of files containing configuration knowledge.

The GCU consists of the following files:

SYS$SYSTEM:GCU.EXE                 GCU executable image
SYS$MANAGER:GCU.DAT                Optional DECwindows resource file
SYS$MANAGER:GALAXY.GCR             Galaxy Configuration Ruleset
SYS$MANAGER:GCU$ACTIONS.COM        System management procedures
SYS$MANAGER:xxx.GCM                User-defined configuration models
SYS$HELP:GALAXY_GUIDE.DECW$BOOK    Online help in Bookreader form

The GCU can be run from any Galaxy instance. If the system does not directly support graphics output, then the DECwindows display can be set to an external workstation or suitably configured PC. However, the GCU application itself must always run on the Galaxy system.

When the GCU is started, it loads any customizations found in its resource file (GCU.DAT) and then loads the Galaxy Configuration Ruleset (GALAXY.GCR). The ruleset file contains statements that determine how the GCU displays the various system components, and rules that govern how users can interact with the configuration display. Users do not typically alter the ruleset file unless they are well versed in its structure or are directed to do so by a Compaq Services Engineer.

After the GCU display becomes visible, the GCU determines whether the system is currently configured as an OpenVMS Galaxy or as a single-instance Galaxy on a non-Galaxy platform. If the system is configured as a Galaxy, the GCU displays the active Galaxy configuration model, and the main observation window displays a hierarchical view of the Galaxy. If the system has not yet been configured as a Galaxy, the GCU asks whether to create a single-instance Galaxy. Note that the GCU can create a single-instance Galaxy on any Alpha system, but multiple-instance OpenVMS Galaxy environments are created by using console commands and console environment variables.

Once the Galaxy configuration model is displayed, users can either interact with the active model or take the model off line and define specific configurations for later use. The following sections discuss these functions in greater detail.

11.1 GCU Tour

The GCU can perform three types of operations:

Most GCU operations are organized around the main observation window and its hierarchical display of Galaxy components. The observation window provides a porthole into a very large space, and it can be panned and zoomed as needed to observe part or all of the Galaxy configuration. The main toolbar contains a set of buttons that control workspace zoom operations. Workspace panning is controlled by the horizontal and vertical scrollbars; workspace sliding is achieved by holding down the middle mouse button as you drag the workspace around, which requires a three-button mouse.

The various GCU operations are invoked from pull-down or pop-up menu functions. General operations such as opening and closing files, and invoking external tools, are accomplished using the main menu bar entries. Operations specific to individual Galaxy components are accomplished using pop-up menus that appear whenever you click the right mouse button on a component displayed in the observation window.

In response to many operations, the GCU displays additional dialog boxes containing information, forms, editors, or prompts. Error and information responses are displayed in pop-up dialog boxes or inside the status bar along the bottom of the window, depending on the severity of the error and importance of the message.

11.1.1 Creating Galaxy Configuration Models

You can use the GCU to create Galaxy configuration models and a single-instance Galaxy on any Alpha system.

When viewing the active Galaxy configuration model, direct manipulation of display objects (components) may alter the running configuration. For example, dragging a CPU from its current location and dropping it on top of a different instance component will invoke a management action procedure that reassigns the selected CPU to the new instance. At certain times this may be a desirable operation; however, in other situations you might want to reconfigure your Galaxy all at once rather than component by component. To accomplish this, you must create an offline Galaxy configuration model.

To create a Galaxy configuration model, you must start with an existing model (typically the active one), alter it in some manner, and save it in a file.

Starting from the active Galaxy Configuration Model:

  1. Press the ENGAGE button so that the model becomes DISENGAGED. The button should turn from red to white, and its appearance should be popped outward. When the model is disengaged, all CPU components in the display turn red to indicate that they are no longer engaged. Do not panic; they have not been shut down.
  2. Alter the CPU assignments by dragging and dropping individual CPUs onto the instances to which you want to assign them.
  3. When finished, you can either reengage the model or save the model in a file for later use. Whenever you reengage a model, regardless of whether it was derived from the active model or from a file-based model, the GCU compares the active system configuration with the configuration proposed by the model. It then provides a summary of the management actions needed to reassign the system to the new model. If you approve the actions, the GCU executes them, and the resulting model is displayed as the active and engaged model.

The reason for creating offline models is to allow significant configuration changes to be automated. For example, you can create models representing the desired Galaxy configuration at different times and then engage the models interactively by following this procedure.
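
The comparison performed when a model is reengaged can be pictured as a difference between two CPU-to-instance mappings. The following Python sketch is a conceptual illustration under that assumption, not the GCU's actual implementation; the function name, the dictionary representation, and the instance names INST0 and INST1 are all hypothetical.

```python
def plan_reassignments(active, proposed):
    """Compare two CPU-to-instance mappings and list the moves needed.

    Both arguments map a CPU ID to the name of the instance that owns it.
    Returns (cpu, from_instance, to_instance) tuples, sorted by CPU ID.
    """
    actions = []
    for cpu in sorted(proposed):
        current = active.get(cpu)
        target = proposed[cpu]
        if current != target:
            actions.append((cpu, current, target))
    return actions


# Hypothetical running configuration and an offline model to engage.
active = {8: "INST0", 9: "INST0", 10: "INST1", 11: "INST1"}
offline = {8: "INST0", 9: "INST1", 10: "INST1", 11: "INST1"}

# Summary of management actions that would be presented for approval.
for cpu, src, dst in plan_reassignments(active, offline):
    print(f"reassign CPU {cpu}: {src} -> {dst}")
```

An empty result corresponds to a model that already matches the running system, so no management actions would be needed.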

11.1.2 Observation

The GCU can display the single active Galaxy configuration model, or any number of offline Galaxy configuration models. Each loaded model appears as an item in the Model menu on the toolbar. You can switch between models by clicking the desired menu item.

The active model is always named GLX$ACTIVE.GCM. When the active model is first loaded, a file by this name will exist briefly as the system verifies the model with the system hardware.

When a model is visible, you can zoom, pan, or slide the display as needed to view Galaxy components. Use the buttons on the left side of the toolbar to control the zoom functions.

The zoom functions include:

Galactic zoom     Zoom to fit the entire component hierarchy into the observation window.
Zoom 1:1          Zoom to the component normal scale.
Zoom to region    Zoom to a selected region of the display.
Zoom in           Zoom in by 10 percent.
Zoom out          Zoom out by 10 percent.

Panning is accomplished by using the vertical and horizontal scrollbars. Sliding is done by pressing and holding the middle mouse button and dragging (sliding) the cursor and the image.

11.1.2.1 Layout Management

The Automatic Layout feature manages the component layout. If you ever need to refresh the layout while in Automatic Layout mode, simply select the root (topmost) component.

To alter the current layout, select Manual Layout from the Windows menu. In Manual Layout Mode, you can freely drag and drop components however you like to generate a pleasing structure. Because each component is free from automatic layout constraints, you may need to invest some time in positioning each component, possibly on each of the charts. To make things simpler, you can click the right mouse button on any component and select Layout Subtree to provide automatic layout assistance below that point in the hierarchy.

When you are satisfied with the layout, you must save the current model in a file to retain the manual layout information. The custom layout is used when the model is open. Note that if you select Auto Layout mode, your manual layout will be lost for the in-memory model. Also, in order for CPU components to reassign in a visually effective manner, they must perform subtree layout operations below the instance level. For this reason, it is best to limit any manual layout operations to the instance and community levels of the component hierarchy.

11.1.2.2 OpenVMS Galaxy Charts

The GCU provides six distinct subsets of the model, known as charts.

The six charts include:

Chart Name            Shows
Logical Structure     Dynamic resource assignments
Physical Structure    Nonvolatile hardware relationships
CPU Assignment        Simplified view of CPU assignments
Memory Assignment     Memory subsystem components
IOP Assignment        I/O module relationships
Failover Targets      Processor failover assignments

These charts result from enabling or disabling the display of various component types to provide views of sensible subsets of components.

Specific charts may offer functionality that can be provided only for that chart type. For example, reassignment of CPUs requires that the instance components be visible. Because instances are not visible in the Physical Structure or Memory Assignment charts, you can reassign CPUs only in the Logical Structure and CPU Assignment charts.

For more information about charts, refer to Section 11.4.

11.1.3 Interaction

When viewing the active Galaxy configuration model, you can interact directly with the system components. For example, to reassign a CPU from one instance to another, you can drag and drop a CPU onto the desired instance. The GCU will validate the operation and execute an external command action to make the configuration change. Interacting with a model that is not engaged is simply a drawing operation on the offline model and has no impact on the running system.

While interacting with Galaxy components, the GCU applies built-in and user-defined rules that prevent misconfiguration and improper management actions. For example, you cannot reassign primary CPUs, and you cannot reassign a CPU to any component other than a Galaxy instance. Either operation would result in an error message on the status bar, and the model would return to its proper configuration. If the attempted operation violates one of the configuration rules, the error message, displayed in red on the status bar, will describe the rule that fired.
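
The two example rules above can be expressed as a simple validation step that runs before any management action. The following Python sketch is a conceptual illustration only: the real rules live in the ruleset file, whose syntax is not shown here, and the function name, the dictionary fields, and the message wording are assumptions.

```python
def check_cpu_drop(cpu, target_kind):
    """Return an error string if the drop breaks a rule, else None.

    cpu is a dict with an "is_primary" flag; target_kind names the type
    of component the CPU was dropped on (for example, "instance").
    """
    if cpu["is_primary"]:
        return "primary CPUs cannot be reassigned"
    if target_kind != "instance":
        return "a CPU can be reassigned only to a Galaxy instance"
    return None


print(check_cpu_drop({"id": 8, "is_primary": True}, "instance"))
print(check_cpu_drop({"id": 9, "is_primary": False}, "memory"))
print(check_cpu_drop({"id": 9, "is_primary": False}, "instance"))
```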

You can view details for any selected component either by clicking the right mouse button and selecting the Parameters item from the pop-up menu, or by selecting Parameters from the Components menu on the main toolbar.

The GCU can shut down or reboot one or more Galaxy instances using the Shutdown or Reboot items on the Galaxy menu. The various shutdown or reboot parameters can be entered in the Shutdown dialog box. Be sure to specify the CLUSTER_SHUTDOWN option to fully shut down clustered Galaxy instances. The Shutdown dialog box allows you to select any combination of instances, or all instances. The GCU is "smart" enough to shut down its owner instance last.

11.2 Managing an OpenVMS Galaxy with the GCU

Your ability to manage a Galaxy system using the Galaxy Configuration Utility (GCU) depends on the capabilities of each instance involved in a management operation.

The GCU can be run from any instance in the Galaxy. However, the Galaxy Software Architecture implements a push model for resource reassignment. This means that, in order to reassign a processor, you must execute the reassign command function on the instance that currently owns the processor. The GCU is aware of this requirement and will attempt to use one or more communications paths to send the reassignment request to the owner instance. DCL is not inherently aware of this requirement; therefore, if you use DCL to reassign resources, you will need to use SYSMAN or a separately logged-in terminal to issue the commands on the owner instance.

The GCU favors using SYSMAN and its underlying SMI_Server processes to provide command paths to the other instances in the Galaxy. However, the SMI_Server requires that the instances be in a cluster so that the command environment falls within a common security domain, and Galaxy instances might not be clustered.

If the system cannot provide a suitable command path for the SMI_Server to use, the GCU will attempt to use DECnet task-to-task communications. This requires that the participating instances be running DECnet, and that each participating Galaxy instance have a proxy set up for the SYSTEM account.
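
The order of preference described above can be summarized as a small decision function. The following Python sketch is a conceptual illustration, not the GCU's code: the function name, parameters, and return strings are hypothetical, and treating a request as local when the GCU already runs on the owner instance is an assumption drawn from the push model.

```python
def choose_command_path(gcu_instance, owner, clustered, decnet_ready):
    """Pick how a reassignment request reaches the instance that owns the CPU.

    clustered:    True if the instances are in a cluster (SMI_Server usable).
    decnet_ready: True if DECnet is running and the SYSTEM proxy is set up.
    """
    if gcu_instance == owner:
        return "local"                # assumption: no remote path is needed
    if clustered:
        return "SYSMAN"               # preferred path, through SMI_Server
    if decnet_ready:
        return "DECnet task-to-task"  # fallback when not clustered
    return None                       # no usable path from this instance


print(choose_command_path("INST0", "INST1", clustered=True, decnet_ready=False))
print(choose_command_path("INST0", "INST1", clustered=False, decnet_ready=True))
print(choose_command_path("INST0", "INST1", clustered=False, decnet_ready=False))
```

When no path is available, the request has to be issued on the owner instance itself, for example from a separately logged-in terminal.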

