Document revision date: 19 July 1999

OpenVMS Alpha Guide to Upgrading Privileged-Code Applications



  1. The lock_irp parameter can be either an IRP or an IRPE, depending on the data structure that was used with EXE_STD$READLOCK.
  2. Before returning from this error callback routine, you must provide the original IRP via the real_irp_p parameter so that the I/O can be properly terminated.
  3. If this routine has been passed an IRPE, the pointer to the original IRP is obtained from the irpe$l_driver_p0 cell, where it was explicitly placed.
  4. The new EXE_STD$LOCK_ERR_CLEANUP routine does all the needed unlocking and deallocation of IRPEs.
  5. Provide the address of the original IRP to the caller. (A sketch of such a callback follows this list.)
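
The following fragment is a minimal sketch of such an error callback, reduced to just the two parameters named above (lock_irp and real_irp_p). The actual callback interface used with EXE_STD$READLOCK has additional arguments, and the IRP/IRPE type test, structure declarations, and the exact EXE_STD$LOCK_ERR_CLEANUP interface are assumed to come from the standard driver header files and Appendix B; treat this as an illustration, not a supported template.

    /* Sketch only: reduced parameter list; see Appendix B for the actual
       EXE_STD$LOCK_ERR_CLEANUP and callback interfaces.                   */
    void lock_error_callback (IRP *lock_irp, IRP **real_irp_p)
    {
        IRP  *irp;
        IRPE *irpe = (IRPE *) lock_irp;
 
        /* Step 3: if an IRPE was passed, the original IRP pointer was
           explicitly placed in its irpe$l_driver_p0 cell.                 */
        if (irpe->irpe$b_type == DYN$C_IRPE)
            irp = (IRP *) irpe->irpe$l_driver_p0;
        else
            irp = (IRP *) lock_irp;
 
        /* Step 4: unlock any locked pages and deallocate the IRPEs.       */
        exe_std$lock_err_cleanup (irp);
 
        /* Steps 2 and 5: hand the original IRP back so the I/O can be
           properly terminated.                                            */
        *real_irp_p = irp;
    }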

2.2.2 Impact of MMG_STD$IOLOCK, MMG_STD$UNLOCK Changes

The interface changes to the MMG_STD$IOLOCK and MMG_STD$UNLOCK routines are described in Appendix B. The general approach to these changes is to use the corresponding replacement routines and the new DIOBM structure.

2.2.2.1 Direct I/O Functions

OpenVMS device drivers that perform data transfers using direct I/O functions do so by locking the buffer into memory while still in process context, that is, in a driver FDT routine. The PTE address of the first page that maps the buffer is obtained, and the byte offset within that page to the start of the buffer is computed. These values are saved in the IRP (irp$l_svapte and irp$l_boff). The rest of the driver then uses the values in the irp$l_svapte and irp$l_boff cells, together with the byte count in irp$l_bcnt, to perform the transfer. Eventually, when the transfer has completed and the request returns to process context for I/O postprocessing, the buffer is unlocked using the irp$l_svapte value, not the original process buffer address.

To support 64-bit addresses on a direct I/O function, you need only ensure that the buffer address is handled properly within the FDT routine.

Almost all device drivers that perform data transfers via a direct I/O function use OpenVMS-supplied FDT support routines to lock the buffer into memory. Because these routines obtain the buffer address either indirectly from the IRP or directly from a parameter that is passed by value, the interfaces for these routines can easily be enhanced to support 64-bit wide addresses.

However, various OpenVMS Alpha memory management infrastructure changes made to support 64-bit addressing have a potentially major impact on the use of the 32-bit irp$l_svapte cell by device drivers prior to OpenVMS Alpha Version 7.0. In general, there are two problems:

  1. It takes a full 64 bits to address a process PTE in page table space.
  2. The 64-bit page table space address for a process PTE is only valid when in the context of that process. (This is also known as the "cross-process PTE problem.")

In most cases, both of these PTE access problems are solved by copying the PTEs that map the buffer into nonpaged pool and setting irp$l_svapte to point to the copies. This copy is done immediately after the buffer has been successfully locked. A copy of the PTE values is acceptable because device drivers only read the PTE values and are not allowed to modify them. These PTE copies are held in a new nonpaged pool data structure, the Direct I/O Buffer Map (DIOBM) structure. A standard DIOBM structure (also known as a fixed-size primary DIOBM) contains enough room for a vector of 9 (DIOBM$K_PTECNT_FIX) PTE values. This is sufficient for a buffer size up to 64K bytes on a system with 8 KB pages.1 It is expected that most I/O requests are handled by this mechanism and that the overhead to copy a small number of PTEs is acceptable, especially given that these PTEs have been recently accessed to lock the pages.

The standard IRP contains an embedded fixed-size DIOBM structure. When the PTEs that map a buffer fit into the embedded DIOBM, the irp$l_svapte cell is set to point to the start of the PTE copy vector within the embedded DIOBM structure in that IRP.

If the buffer requires more than 9 PTEs, a separate, variably sized "secondary" DIOBM structure is allocated to hold the PTE copies, and the irp$l_svapte cell is set to point into the PTE vector in that secondary structure. The secondary DIOBM structure is pointed to by the original, or "primary," DIOBM structure and is deallocated during I/O postprocessing when the buffer pages are unlocked. The secondary DIOBM requires only 8 bytes of nonpaged pool for each page in the buffer. The allocation of the secondary DIOBM structure is not charged against the process BYTLM quota, but it is controlled by the process direct I/O limit (DIOLM). This is the same approach used for other internal data structures that are required to support the I/O, including the kernel process block, kernel process stack, and the IRP itself.

However, as the size of the buffer increases, the run-time overhead to copy the PTEs into the DIOBM becomes noticeable. At some point it becomes less expensive to create a temporary window in S0/S1 space to the process PTEs that map the buffer. The PTE window method has a fixed cost, but the cost is relatively high because it requires PTE allocation and TB invalidates. For this reason, the PTE window method is not used for moderately sized buffers.

The transition point from the PTE copy method with a secondary DIOBM to the PTE window method is determined by a new system data cell, ioc$gl_diobm_ptecnt_max, which contains the maximum desirable PTE count for a secondary DIOBM. The PTE window method is used if the buffer is mapped by more than ioc$gl_diobm_ptecnt_max PTEs.

When a PTE window is used, irp$l_svapte is set to the S0/S1 virtual address within the allocated PTE window that points to the first PTE that maps the buffer. This S0/S1 address is computed by taking the S0/S1 address mapped by the first PTE allocated for the window and adding the byte offset within its page of the first buffer PTE's address in page table space. A PTE window created this way is removed during I/O postprocessing.
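
The following fragment illustrates only that address arithmetic; the routine name and parameter names are hypothetical, and an 8 KB page size is assumed.

    #define PAGE_SIZE 8192                      /* assuming 8 KB pages */
 
    /* window_va:  S0/S1 address mapped by the first PTE allocated for the window
       buf_pte_va: page table space address of the first PTE that maps the buffer */
    static void *svapte_via_pte_window (char *window_va, unsigned __int64 buf_pte_va)
    {
        /* Window base plus the byte offset of the buffer's first PTE
           within its page table page.                                     */
        return window_va + (buf_pte_va & (PAGE_SIZE - 1));
    }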

The PTE window method is also used if the attempt to allocate the required secondary DIOBM structure fails due to insufficient contiguous nonpaged pool. With an 8 KB page size, the PTE window requires a set of contiguous system page table entries equal to the number of 8 MB regions in the buffer plus 1. Failure to create a PTE window as a result of insufficient SPTEs is unusual. However, in the unlikely event of such a failure, if the process has not disabled resource wait mode, the $QIO request is backed out and the requesting process is placed into a resource wait state for nonpaged pool (RSN$_NPDYNMEM). When the process is resumed, the I/O request is retried. If the process has disabled resource wait mode, a failure to allocate the PTE window results in the failure of the I/O request.
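
As an illustration of that sizing rule only (the routine name is hypothetical, and an 8 KB page size with 8-byte PTEs is assumed), one SPTE maps a page of PTEs, which in turn maps 8 MB of buffer:

    #define PAGE_SIZE       8192                 /* assuming 8 KB pages    */
    #define PTE_SIZE        8                    /* bytes per PTE          */
    #define BYTES_PER_SPTE  ((PAGE_SIZE / PTE_SIZE) * PAGE_SIZE)  /* 8 MB  */
 
    /* Contiguous SPTEs needed for a PTE window over a buffer: one per
       8 MB region spanned, plus one to cover misalignment.                */
    static unsigned int sptes_for_pte_window (unsigned __int64 bufsiz)
    {
        return (unsigned int) (bufsiz / BYTES_PER_SPTE) + 1;
    }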

When the PTE window method is used, the level-3 process page table pages that contain the PTEs that map the user buffer are locked into memory as well. However, these level-3 page table pages are not locked when the PTEs are copied into nonpaged pool or when the SPT window is used.

The new IOC_STD$FILL_DIOBM routine is used to set irp$l_svapte by one of the three previously described methods. The OpenVMS-supplied FDT support routines EXE_STD$MODIFYLOCK, EXE_STD$READLOCK, and EXE_STD$WRITELOCK use the IOC_STD$FILL_DIOBM routine in the following way:

  1. The buffer is locked into memory by calling the new MMG_STD$IOLOCK_BUF routine. This routine returns a 64-bit pointer to the PTEs and replaces the obsolete MMG_STD$IOLOCK routine.


    status = mmg_std$iolock_buf (buf_ptr, bufsiz, is_read, pcb, &irp->irp$pq_vapte, 
                             &irp->irp$ps_fdt_context->fdt_context$q_qio_r1_value); 
    

    For more information about this routine, see Section B.17.

  2. A value for the 32-bit irp$l_svapte cell is derived by calling the new IOC_STD$FILL_DIOBM routine with a pointer to the embedded DIOBM in the IRP, the 64-bit pointer to the PTEs that was returned by MMG_STD$IOLOCK_BUF, and the address of the irp$l_svapte cell.2


    status = ioc_std$fill_diobm (&irp->irp$r_diobm, irp->irp$pq_vapte, pte_count, 
                                 DIOBM$M_NORESWAIT, &irp->irp$l_svapte); 
    

    The DIOBM structure is fully described in Section A.6 and this routine is described in Section B.10.

Device drivers that call MMG_STD$IOLOCK directly will need to examine their use of the returned values and might need to call the IOC_STD$FILL_DIOBM routine.

2.2.3 Impact of MMG_STD$SVAPTECHK Changes

Prior to OpenVMS Alpha Version 7.0, the MMG_STD$SVAPTECHK and MMG$SVAPTECHK routines computed a 32-bit svapte for either a process or system space address. As of OpenVMS Alpha Version 7.0, these routines are restricted to S0/S1 system space addresses and no longer accept an address in P0/P1 space; they declare a bugcheck for an input address in P0/P1 space. For an input address in S0/S1 space, these routines return a 32-bit system virtual address through the SPT window.

The MMG_STD$SVAPTECHK and MMG$SVAPTECHK routines are used by a number of OpenVMS Alpha device drivers and privileged components. In most instances, no source changes are required because the input address is in nonpaged pool.

The 64-bit process-private virtual address of the level 3 PTE that maps a P0/P1 virtual address can be obtained using the new PTE_VA macro. Unfortunately, this macro is not a general solution because it does not address the cross-process PTE access problem. Therefore, the necessary source changes depend on the manner in which the svapte output from MMG_STD$SVAPTECHK is used.

The INIT_CRAM routine uses the MMG$SVAPTECHK routine in its computation of the physical address of the hardware I/O mailbox structure within a CRAM that is in P0/P1 space. If you need to obtain a physical address, use the new IOC_STD$VA_TO_PA routine.

If you call MMG$SVAPTECHK and IOC$SVAPTE_TO_PA, use the new IOC_STD$VA_TO_PA routine instead.

The PTE address in dcb$l_svapte must be expressible using 32 bits and must be valid regardless of process context. Fortunately, the caller's address is within the buffer that was locked down earlier by the CONV_TO_DIO routine via a call to EXE_STD$WRITELOCK, and EXE_STD$WRITELOCK derived a value for the irp$l_svapte cell using the DIOBM in the IRP. Therefore, instead of calling the MMG$SVAPTECHK routine, the BUILD_DCB routine has been changed to call the new routine EXE_STD$SVAPTE_IN_BUF, which computes a value for the dcb$l_svapte cell based on the caller's address, the original buffer address in the irp$l_qio_p1 cell, and the address in the irp$l_svapte cell.
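
The following fragment sketches only the computation that EXE_STD$SVAPTE_IN_BUF is described as performing; it is not the routine's actual interface, and the 8 KB page size, 8-byte PTE-copy entries, and routine name are assumptions.

    #define PAGE_SHIFT 13                       /* assuming 8 KB pages     */
    #define PTE_SIZE   8                        /* bytes per PTE copy      */
 
    /* caller_va: address within the locked buffer whose PTE address is wanted
       buf_va:    original buffer address, as found in irp$l_qio_p1
       svapte:    PTE address for the first buffer page, from irp$l_svapte */
    static void *svapte_in_buf (unsigned __int64 caller_va,
                                unsigned __int64 buf_va,
                                char *svapte)
    {
        /* Number of whole pages between the start of the buffer and caller_va. */
        unsigned __int64 page_index =
            (caller_va >> PAGE_SHIFT) - (buf_va >> PAGE_SHIFT);
 
        /* Step from the first buffer PTE to the PTE that maps caller_va.  */
        return svapte + (page_index * PTE_SIZE);
    }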

2.2.4 Impact of PFN Database Entry Changes

There are changes to the use of the PFN database entry cells containing the page reference count and back link pointer.

For more information, see Section 2.3.6.

2.2.5 Impact of IRP Changes

All source code references to the irp$l_ast, irp$l_astprm, and irp$l_iosb cells have been changed. These IRP cells were removed and replaced by new cells.

For more information, see Appendix A.

Note

1 Eight PTEs are sufficient only if the buffer begins exactly on a page boundary; otherwise, a ninth is required.
2 For performance reasons, the common case that can be handled by the PTE vector in the embedded DIOBM may be duplicated in line to avoid the routine call.

2.3 General Memory Management Infrastructure Changes

This section describes OpenVMS Alpha Version 7.0 changes to the memory management subsystem that might affect privileged-code applications.

For complete information about OpenVMS Alpha support for 64-bit addresses, see the OpenVMS Alpha Guide to 64-Bit Addressing and VLM Features.

2.3.1 Location of Process Page Tables

The process page tables no longer reside in the balance slot. Each process references its own page tables within page table space using 64-bit pointers.

Three macros (located in VMS_MACROS.H in SYS$LIBRARY:SYS$LIB_C.TLB) are available to obtain the address of a PTE in Page Table Space:

Two macros (located in SYS$LIBRARY:LIB.MLB) are available to map and unmap a PTE in another process's page table space:

Note that use of MAP_PTE and UNMAP_PTE requires the caller to hold the MMG spinlock across the use of these macros and that UNMAP_PTE must be invoked before another MAP_PTE can be issued.

2.3.2 Interpretation of Global and Process Section Table Index

As of OpenVMS Alpha Version 7.0, the global and process section table indexes, stored primarily in the PHD and PTEs, have different meanings. The section table index is now a positive index into the array of section table entries found in the process section table or in the global section table. The first section table index in both tables is now 1.

To obtain the address of a given section table entry, do the following (a sketch in C follows this list):

  1. Add the PHD address to the value in PHD$L_PST_BASE_OFFSET.
  2. Multiply the section table index by SEC$C_LENGTH.
  3. Subtract the result of Step 2 from the result of Step 1.
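
A minimal sketch of this calculation follows; the routine name is hypothetical, and the PHD pointer type and field spelling follow the usual OpenVMS C header conventions rather than anything shown in this guide.

    /* Sketch only: SEC$C_LENGTH and PHD$L_PST_BASE_OFFSET are the symbols
       named in the steps above.                                           */
    static void *section_table_entry (PHD *phd, unsigned int stx)
    {
        /* Step 1: add the PHD address to the value in PHD$L_PST_BASE_OFFSET. */
        char *pst_base = (char *) phd + phd->phd$l_pst_base_offset;
 
        /* Steps 2 and 3: subtract (section table index * SEC$C_LENGTH);
           valid section table indexes start at 1.                         */
        return pst_base - (stx * SEC$C_LENGTH);
    }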

2.3.3 Location of Process and System Working Set Lists

The base address of the working set list can no longer be found within the process PHD or the system PHD. To obtain the address of the process working set list, use the 64-bit data cell CTL$GQ_WSL. To obtain the address of the system working set list, use the 64-bit data cell MMG$GQ_SYSWSL.

Note that pointers to working set list entries must be 64-bit addresses in order to be compatible with future versions of OpenVMS Alpha after Version 7.0.

2.3.4 Size of a Working Set List Entry

Each working set list entry (WSLE) residing in the process or the system working set list is now 64 bits in size. Thus, a working set list index must be interpreted as an index into an array of quadwords instead of an array of longwords, and working set list entries must be interpreted as 64 bits in size. Note that the layout of the low bits in the WSLE is unchanged.
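
For example, in C (illustrative declarations only; obtain the actual list address from CTL$GQ_WSL or MMG$GQ_SYSWSL as described in Section 2.3.3):

    /* A working set list index now selects a quadword entry, not a longword. */
    unsigned __int64 *wsl_base;      /* 64-bit pointer to the working set list */
 
    unsigned __int64 wsle_of (unsigned int wslx)
    {
        return wsl_base[wslx];       /* quadword indexing */
    }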

2.3.5 Location of Page Frame Number (PFN) Database

Due to the support for larger physical memory systems in OpenVMS Alpha Version 7.0, the PFN database has been moved to S2 space, which can only be accessed with 64-bit pointers. Privileged routine interfaces within OpenVMS Alpha that pass PFN database entry addresses by reference have been renamed to force compile-time or link-time errors.

Privileged code that references the PFN database must be inspected and possibly modified to ensure that 64-bit pointers are used.

2.3.6 Format of PFN Database Entry

The offset PFN$L_REFCNT in the PFN database entry has been replaced with a different-sized offset that is packed together with other fields in the PFN database.

References to the field PFN$L_REFCNT should be modified to use the following macros (located in PFN_MACROS.H within SYS$LIBRARY:SYS$LIB_C.TLB).

As of OpenVMS Alpha Version 7.0, the offset PFN$L_PTE in the PFN database entry has been replaced with a new PTE backpointer mechanism. This mechanism can support page table entries that reside in 64-bit virtual address space.

References to the field PFN$L_PTE should be modified to use one of the following macros (located in PFN_MACROS.H within SYS$LIBRARY:SYS$LIB_C.TLB).

Note that pointers to PFN database entries must be 64 bits.

2.3.7 Process Header WSLX and BAK Arrays

Prior to OpenVMS Alpha Version 7.0, the process header contained two internal arrays of information that were used to help manage the balance slot contents (specifically, page table pages) during process swapping. These two arrays, the working set list index (WSLX) array and the backing storage (BAK) array, are no longer required for page table pages.

The swapper process now uses the upper-level page table entries and the working set list itself to manage the swapping of page table pages. A smaller version of the BAK array, now used only for backing storage information for balance slot pages, is located at the end of the fixed portion of the process header at the offset PHD$Q_BAK_ARRAY.

2.3.8 Free S0/S1 System Page Table Entry List

The format of a free page table entry in S0/S1 space has been changed to use index values from the base of page table space instead of the base of S0/S1 space. The free S0/S1 PTE list also uses page table space index values. The list header has been renamed to LDR$GQ_FREE_S0S1_PT.

2.3.9 Location of the Global Page Table

In order to support larger global sections and larger numbers of global sections in OpenVMS Alpha Version 7.0, the global page table has been moved to S2 space, which can be accessed only with 64-bit pointers.

Privileged code that references entries within the GPT must be inspected and possibly modified to ensure that 64-bit pointers are used.

2.3.10 Free Global Page Table Entry List

The format of the free GPT entry has been changed to use index values from the base of the global page table instead of using the free pool list structure. The free GPT entry format is now similar to the free S0/S1 PTE format.

Note that pointers to GPT entries must be 64 bits.

2.3.11 Region Descriptor Entries (RDEs)

As of OpenVMS Alpha Version 7.0, each process virtual addressing region is described by a region descriptor entry (RDE). The program region (P0) and control region (P1) have region descriptor entries that contain attributes of the region and describe the current state of the region. The program region RDE is located at offset PHD$Q_P0_RDE within the process's PHD. The control region RDE is located at offset PHD$Q_P1_RDE, also within the process's PHD.

Many internal OpenVMS Alpha memory management routines accept a pointer to the RDE of the region associated with the virtual address that is also passed to the routine.

The following two functions (located in MMG_FUNCTIONS.H in SYS$LIBRARY:SYS$LIB_C.TLB) are available to obtain the address of an RDE:

2.4 Kernel Threads Changes

This section describes the OpenVMS Alpha kernel threads features that might require changes to privileged-code applications.

2.4.1 The CPU$L_CURKTB Field

The CPU$L_CURKTB field in each CPU database contains the address of the kernel thread currently executing on that CPU. If kernel-mode code owns the SCHED spinlock, the current KTB address can be obtained from this field. Before the kernel threads implementation, this field contained the current PCB address.

2.4.2 Mutex Locking

No changes are necessary to kernel-mode code that locks mutexes. All of the SCH$LOCK*, SCH$UNLOCK*, and SCH$IOLOCK* routines determine the correct kernel thread to place in a wait state if the mutex is already owned.

2.4.3 Scheduling Routines

Code that calls any of the scheduling routines that previously took the current PCB as a parameter must be changed to pass the current KTB. These scheduling routines are as follows:
EXE$KERNEL_WAIT
EXE$KERNEL_WAIT_PS
SCH$RESOURCE_WAIT
SCH$RESOURCE_WAIT_PS
SCH$RESOURCE_WAIT_SETUP
SCH$CHSE
SCH$CHSEP
SCH$WAIT_PROC
SCH$UNWAIT
RPTEVT macro
SCH$REPORT_EVENT
SCH$CHANGE_CUR_PRIORITY
SCH$REQUIRE_CAPABILITY
SCH$RELEASE_CAPABILITY

Code that calls any of the scheduling routines that previously took the process PID as a parameter must be changed to pass the thread's PID. These scheduling routines are as follows:
SCH$POSTEF
SCH$WAKE

2.4.4 New MWAIT State

A kernel thread that is waiting for ownership of the inner-mode semaphore may be placed into an MWAIT state. The KTB$L_EFWM field contains a process-specific MWAIT code: the low word contains RSN$_INNER_MODE, and the upper word contains the process index from the PID.
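
The following fragment illustrates that layout only; the routine and variable names are hypothetical.

    /* Decode the inner-mode MWAIT code described above: RSN$_INNER_MODE in
       the low word, the process index from the PID in the upper word.     */
    static void decode_inner_mode_efwm (unsigned int efwm,
                                        unsigned short *reason,
                                        unsigned short *process_index)
    {
        *reason        = (unsigned short) (efwm & 0xFFFF);  /* RSN$_INNER_MODE */
        *process_index = (unsigned short) (efwm >> 16);     /* process index   */
    }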

2.4.5 System Services Dispatching

The system services dispatcher has historically passed the PCB address to the inner-mode services. This is still true with kernel threads. The current KTB is not passed to the services.

2.4.6 ASTs

The ACB$L_PID field in the ACB should represent the kernel thread to which the AST is targeted. All other AST context is the same.

Inner-mode ASTs can be delivered on whichever kernel thread is currently in inner mode. ASTs that have the ACB$V_THREAD_SAFE bit set are always delivered to the targeted thread, regardless of other inner-mode activity. Use extreme care with this feature. A thread-safe AST targeted at a kernel thread that has been deleted is delivered to the initial thread.

2.4.7 TB Invalidation and Macros

With the kernel threads implementation, a process's address space can be active on multiple CPUs at the same time. Any privileged code that creates or deletes process virtual address space "by hand" must perform the proper invalidation across all CPUs. A set of macros has been created for BLISS, C, and MACRO-32 to facilitate translation buffer invalidation: TBI_DATA_64, TBI_SINGLE, and TBI_ALL.

Table 2-1 describes the arguments for the TBI_DATA_64 and TBI_SINGLE macros. Note that the difference between TBI_DATA_64 and TBI_SINGLE is that the former invalidates an entry from the data translation buffer only, while the latter invalidates an entry from both the data and the instruction translation buffers.

Table 2-1 Arguments for TBI_DATA_64 and TBI_SINGLE
Keyword Value Meaning
ADDR = The virtual address to be invalidated. The address can be either a 64-bit VA or a sign-extended 32-bit VA. For MACRO-32, the address must be specified in a register.  
ENVIRON = THIS_CPU_ONLY Indicates that this invocation of TBIS is to be executed strictly within the context of the local CPU. No attempt is made to extend the TBIS request to any other CPU or "processor" that might exist within the system.
ENVIRON = ASSUME_PRIVATE Indicates that this is a threads environment and that the address should be treated as process-private without further checking. In an SMP environment, the invalidate must therefore be extended to the other CPUs that are running a kernel thread from this process. This value is used for system space addresses that should be treated as private to the process.
ENVIRON = ASSUME_SHARED Indicates that this invocation of TBIS should be broadcast to all other CPUs in the system. ASSUME_SHARED is the opposite of THIS_CPU_ONLY.
ENVIRON = LOCAL This value is now obsolete and generates an error.
ENVIRON = (any other value) Forces the TB invalidate to be extended to all components of the system that may have cached PTEs.
PCBADDR = Address of the current process control block. The default is NO_PCB (R31 for the MACRO-32 macros), which means that a PCB address need not be specified. This argument must be specified if the address to be invalidated is process-private (that is, if ENVIRON=ASSUME_PRIVATE or no ENVIRON value was specified).

Table 2-2 describes the arguments for the TBI_ALL macro.

Table 2-2 Arguments for TBI_ALL
Keyword Value Meaning
ENVIRON = THIS_CPU_ONLY Indicates that this invocation of TBI_ALL is to be executed strictly within the context of the local CPU. No attempt is made to extend the TBIA request to any other CPU or "processor" that might exist within the system.
ENVIRON = LOCAL This value is now obsolete and generates an error.
ENVIRON = (any other value) Forces the TB invalidate to be extended to all components of the system that may have cached PTEs.

