Reliable Transaction Router
Release Notes
1.5 Documentation Changes
There are no corrections to existing documentation in this release.
1.6 Limitations
Please see the Software Product Description.
1.7 Known Problems
- If an RTR ACP process dies by any means other than the RTR STOP RTR
command, Compaq strongly recommends that you immediately issue the RTR
STOP RTR command to update RTR's shared memory tables. Similarly,
issue the RTR DISCONNECT SERVER command whenever a Command
Server dies in an unplanned manner. Failure to do so may cause RTR to
try to connect to processes that no longer exist; this may have
undesirable results.
- 14-1-520 Remote commands fail with ERRACCNOD when
DECnet/TCP preference mismatched
Remote Commands may not work if
there is a mismatch between the RTR_PREF_PROT network protocol
preference environment variable on local and remote nodes. Although the
name of the remote node can be prefixed with tcp. or dna. to select a
protocol with which the local node contacts the remote node, this does
not influence the protocol used for the return leg. If the remote node
attempts to connect back to the local node using the wrong protocol,
then the remote command attempt can fail with ERRACCNOD, without a more
detailed entry in the log. [A more common cause of ERRACCNOD is a lack
of authorization: try simple non-RTR remote commands such as rsh host
date, or TYPE host::"0=procedure".]
The default for the
environment variable RTR_PREF_PROT is RTR_DNA_FIRST for OpenVMS nodes
with DECnet, but RTR_TCP_FIRST for other platforms. Other possible
values are RTR_DNA_ONLY and RTR_TCP_ONLY.
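As an illustration of the mismatch described in item 14-1-520, the following C sketch compares the protocol that two nodes would try first, given their RTR_PREF_PROT values. It is not part of RTR: the function names are hypothetical, and treating the _ONLY values as implying the same first protocol as the matching _FIRST values is an assumption made for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Protocol a node would try first for a given RTR_PREF_PROT value.
 * Returns "dna", "tcp", or NULL for an unrecognized value.
 * (Hypothetical helper, not an RTR routine.) */
const char *first_protocol(const char *pref)
{
    if (pref == NULL)
        return NULL;
    if (strcmp(pref, "RTR_DNA_FIRST") == 0 || strcmp(pref, "RTR_DNA_ONLY") == 0)
        return "dna";
    if (strcmp(pref, "RTR_TCP_FIRST") == 0 || strcmp(pref, "RTR_TCP_ONLY") == 0)
        return "tcp";
    return NULL;
}

/* Effective preference on this node: the environment value if it is set,
 * otherwise the platform default supplied by the caller. */
const char *effective_pref(const char *platform_default)
{
    const char *value = getenv("RTR_PREF_PROT");
    return (value != NULL && *value != '\0') ? value : platform_default;
}

/* Returns 1 if both nodes would try the same protocol first,
 * 0 if they would not, and -1 if either value is unrecognized. */
int prefs_agree(const char *local_pref, const char *remote_pref)
{
    const char *local = first_protocol(local_pref);
    const char *remote = first_protocol(remote_pref);
    if (local == NULL || remote == NULL)
        return -1;
    return strcmp(local, remote) == 0;
}
```

A check such as prefs_agree(effective_pref("RTR_TCP_FIRST"), remote_value) could be run before issuing remote commands, where remote_value is whatever the remote node reports for its own setting. A result of 0 suggests that the return leg may use a different protocol from the outbound leg.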
1.8 Problem Reporting
For problem reporting:
- Send mail to your Compaq Service Representative requesting that it
be forwarded to the RTR Quality Manager.
- If you have any RTR log files or pertinent output from monitor
pictures or RTR SHOW commands, send them to us via E-mail.
- Send us as much other information as possible about the conditions
that caused the failure, pointers to application programs that
caused the problem, command sequences, etc.
2 Compaq Tru64 UNIX Specific Information
This chapter gives platform-specific information for the Compaq Tru64
UNIX implementation of Reliable Transaction Router, Version 3.2.
2.1 New Features
- RTR supports XA; however, problems have been found when testing
with Oracle 7.34 and 8.04. Contact Oracle support for details.
- 14-5-44 New script rtr_snapshot.sh for gathering RTR
diagnostic data
The new command rtr_snapshot.sh calls various SHOW
and MONITOR commands to output a snapshot of the state of RTR on a
node. This information may be of use for monitoring, tuning,
troubleshooting, and reporting problems.
2.2 Known Problems Corrected Since Version 3.1D
- 14-1-643 Assertion when restarting timed out command
server at RTR> prompt
When an idle command server started by the
same RTR> prompt process times out after RTR_COMSERV_TIMEOUT seconds
(default 300) and is restarted for a new command, the RTR> prompt
process could raise an assertion. This problem has been corrected.
- 14-3-190 Signal handling by RTR shared library in RTR
applications
The first RTR API call no longer replaces any existing
signal handlers that were installed by the application main program for
the three usual termination signals SIGINT, SIGHUP, and SIGTERM.
If no existing termination signal handlers are found (SIG_DFL), RTR
installs a simple handler which will cause RTR to call exit() at the
next convenient opportunity during an RTR API call, or in the RTR
polling thread in a threaded application.
RTR installs an exit()
handler with atexit(). This handler is not essential, but is intended
to perform a more controlled shutdown of RTR in an application than
when the process is terminated abruptly, for example with _exit(),
which does not call exit handlers.
The application may choose to
leave the RTR termination signal handler in place, or to install its
own handlers at any time. The application handlers should notify the
mainline program in an async-safe manner that it should call exit()
when convenient, and may even be constructed to also call the RTR
handler they replaced so that the application can exit in an RTR API
call too. Consult the operating system documentation for the usual
restrictions on exactly what is permitted in an async-safe signal
handler.
If the application does not install its own signal
handlers for the usual termination signals and does not continue to
make regular RTR API calls, then the application will appear to ignore
those signals.
RTR still installs an empty handler to catch the SIGPIPE
signal to avoid the default action of program termination. In
unthreaded applications RTR may still install the RTR SIGIO handler
which also executes any previous SIGIO handler installed by the main
program.
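The application-handler pattern described in item 14-3-190 can be sketched in C as follows. This is a minimal illustration, not RTR code: all names are hypothetical, only SIGTERM is handled, and it assumes the application installs its handler once, after the first RTR API call, so that the saved disposition is the handler RTR installed.

```c
#define _POSIX_C_SOURCE 200809L  /* for sigaction() under strict ISO C modes */
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler, read by the mainline. Writing a volatile
 * sig_atomic_t is one of the few operations allowed in a handler. */
static volatile sig_atomic_t exit_requested = 0;

/* Disposition that was in place before ours was installed. */
static struct sigaction previous_sigterm;

/* Handler: record the request, then pass the signal on to the handler
 * that was replaced, if there was a real one. */
static void on_sigterm(int signo)
{
    exit_requested = 1;
    if (!(previous_sigterm.sa_flags & SA_SIGINFO) &&
        previous_sigterm.sa_handler != SIG_DFL &&
        previous_sigterm.sa_handler != SIG_IGN)
        previous_sigterm.sa_handler(signo);
}

/* Install the handler for SIGTERM and save the previous disposition.
 * Call this once only; returns 0 on success, -1 on failure. */
int install_sigterm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, &previous_sigterm);
}

/* Mainline check: nonzero once SIGTERM has been seen. The mainline,
 * not the handler, then calls exit() when convenient. */
int exit_was_requested(void)
{
    return exit_requested != 0;
}
```

The mainline would test exit_was_requested() at a convenient point in its processing loop and call exit() there, which also lets the exit handlers registered with atexit() run. SIGINT and SIGHUP could be handled in the same way.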
2.3 Known Problems with Workarounds
- 14-3-217 Unthreaded UNIX applications using rtr_set_wakeup
can fail, e.g., in malloc
When an unthreaded UNIX RTR application
calls rtr_set_wakeup, the non-reentrant RTR shared library -lrtr with
which it is linked installs a signal handler. This signal handler
called functions internal to RTR which could occasionally call runtime
library functions such as malloc() that are not async-safe, according
to the relevant standards. See man (4) signal.
In practice this may
appear to work most of the time, but break for no apparent reason when
the signal happens to occur while background code is also in a runtime
library call such as malloc.
The problem in RTR has been corrected.
The small penalty for this is that RTR no longer attempts to
ensure that the available messages are not just housekeeping messages.
Applications must always be prepared for a timeout return status on
calling rtr_receive_message with a zero timeout, even after a wakeup
suggests that a message ought to be available.
Application writers
are reminded that their RTR wakeup handlers are subject to the same
restrictions: routines like printf, malloc, and the entire RTR API may
not be used directly or indirectly from within a signal handler. A
workaround for applications with unsafe wakeup handlers can be to link
with the reentrant version of the library -lrtr_r because different
rules apply for wakeups in a thread: applications should not call
anything that is not thread-safe, or anything that might block
indefinitely, such as rtr_send_to_server, rtr_reply_to_client,
rtr_broadcast_event, or rtr_receive_message with a non-zero timeout.
- 14-7-952 Treating dumb unknown terminal like a VT100
If you try to run RTR on a terminal or window with unknown (zero)
dimensions, RTR exits immediately with a BADROWCOL message.
A
workaround is to enter the following UNIX command:
RTR expects the terminal or window to be at least capable of
emulating a VT100 terminal. Otherwise, a few control characters are
displayed at the beginning of each line, and the output from the
MONITOR command contains so many control sequences that it is
unreadable.
A workaround is to redirect both standard input and
output to files:
rtr monitor calls < /dev/null > monitored_calls.lis
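For item 14-3-217 above, a wakeup handler that stays within the signal handler restrictions can be sketched in C as follows. The handler only sets a flag, and the mainline does the polling. This is an illustration under stated assumptions: poll_fn and demo_poll are hypothetical stand-ins for a call to rtr_receive_message with a zero timeout, the result codes are invented for the example, and the handler is assumed to take no arguments.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical result codes for the stand-in poll function. */
enum poll_result { POLL_MESSAGE, POLL_TIMEOUT };

/* Hypothetical stand-in for a zero-timeout rtr_receive_message call. */
typedef enum poll_result (*poll_fn)(void *ctx);

static volatile sig_atomic_t wakeup_pending = 0;

/* Wakeup handler: records the wakeup and nothing else.
 * No printf, no malloc, no RTR API calls. */
void on_wakeup(void)
{
    wakeup_pending = 1;
}

/* Called from the mainline. Polls until a timeout is returned and
 * reports how many messages were handled. A result of 0 after a
 * wakeup is legitimate: the wakeup may have been for housekeeping. */
int drain_after_wakeup(poll_fn poll, void *ctx)
{
    int handled = 0;
    if (!wakeup_pending)
        return 0;
    wakeup_pending = 0;  /* clear first so a later wakeup is not lost */
    while (poll(ctx) == POLL_MESSAGE)
        handled++;
    return handled;
}

/* Demo stand-in: reports *ctx messages, then a timeout. */
enum poll_result demo_poll(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return POLL_MESSAGE;
    }
    return POLL_TIMEOUT;
}
```

In a real application, on_wakeup would be the routine registered with rtr_set_wakeup, and the poll function would wrap rtr_receive_message with a zero timeout and map its timeout status to POLL_TIMEOUT.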
2.4 Restrictions
- 14-1-420 RTR's use of the TruCluster Distributed Lock
Manager
RTR uses the Distributed Lock Manager that comes with
TruCluster PS to manage access to certain system resources. The
primary use of locks is to coordinate access to RTR's
journal.
To support standby servers in a TruCluster, the RTR
journal for each node must be accessible by RTR on any node in the
TruCluster in case any other cluster member node fails. As
part of TruCluster support, the ownership of the NFS service may
fail over from one node to another. RTR exploits this feature when it
finds it necessary to recover transactions from another node's journal.
Before RTR opens a journal, it will verify that the local node has
assumed ownership of the shared disk service (as determined by the
Distributed Lock Manager). This can work only if each RTR journal in a
TruCluster is located on its own distinct shared disk service.
- 14-3-50 Maximum number of application processes limit
An ACP crash that occurred when starting the last of a great many
applications has been corrected.
When the process open file limit
is reached, the application will now generally report ACPNOTVIA, "RTR
ACP is no longer a viable entity, restart RTR". In actual fact the ACP
continues to operate with all previously connected processes, and only
the new rejected process thinks that the RTR ACP is not alive. This
message should be interpreted as "ACPINSRES, The RTR ACP has
insufficient resources."
Please ensure that your system is
configured with sufficient default per-process resources, or that the
ACP process is started with increased resource limits. Allow at least
one open file for each additional application process, and at least one
for each link.
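The sizing rule in item 14-3-50 can be sketched in C with the standard getrlimit() and setrlimit() calls. This is an illustration only, with hypothetical function names. The reserve parameter (descriptors for log files, journals, and so on) is an assumption of this example; the text gives only the lower bound of one descriptor per application process and one per link.

```c
#include <assert.h>
#include <sys/resource.h>

/* Lower bound from the text: one descriptor per application process
 * plus one per link. `reserve` is extra headroom assumed here. */
rlim_t descriptors_needed(rlim_t app_processes, rlim_t links, rlim_t reserve)
{
    return app_processes + links + reserve;
}

/* Raise the soft open-file limit of the calling process to at least
 * `needed`. Returns 0 if the limit is already sufficient or was raised,
 * and -1 if the hard limit is too low or a system call failed. */
int ensure_open_file_limit(rlim_t needed)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur >= needed)
        return 0;
    if (rl.rlim_max != RLIM_INFINITY && rl.rlim_max < needed)
        return -1;  /* the hard limit must be raised by the administrator */
    rl.rlim_cur = needed;
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```

Because resource limits are inherited by child processes, a small launcher could call ensure_open_file_limit() and then start RTR, so that the ACP process begins with the increased limit.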
3 OpenVMS Specific Information
This chapter gives platform-specific information for the OpenVMS
implementation of Reliable Transaction Router, Version 3.2.
3.1 New Features
There are no new features in this release that are specific to this
platform.
3.2 Known Problems Corrected Since Version 3.1D
- 14-1-170 rtr_api_wakeup_entries/exits not maintained on
OpenVMS
The process counters rtr_api_wakeup_entries/exits were not
incremented on OpenVMS. This gave an incorrect indication of the number
of wakeup calls on the "monitor calls" picture. This behavior has been
corrected.
- 14-1-260 Display key range bounds completely and in
appropriate format
Quadword signed and unsigned key ranges are
supported on all RTR platforms including OpenVMS Alpha and VAX.
- 14-1-544 Non-portable VMS journals across VAX/ALPHA
The incompatibility between the VAX and Alpha journal files has
been corrected. Customers must issue the following command when they
install V3.2: RTR> CREATE JOURNAL/SUPERSEDE
- 14-3-53 Sys$start_txw sometimes returns 0 instead of 1
upon success
ASTlm resource limitations may result in applications
receiving an erroneous indication that the ACP is not available.
Raising the process ASTlm quota corrected this problem.
- 14-3-89 V2 field ASTPRM not in RTR$_EVT
The RTR$_EVT
structure, part of the v2 compatibility layer, now contains the field
RTR$L_EVT_ASTPRM (as with RTR V2). The value of RTR$K_EVTAST_ARGNO has
been altered accordingly (from 6 to 7).
- 14-3-131 $DCL_TX_PRC crashes when the user is
underprivileged
Running a V2 application from an account that does
not have RTR info privilege no longer causes the application to crash.
- 14-3-135 RTR V3 does not select all nodes in a VMScluster
when using the SET ENVIRONMENT command
SET ENVIRONMENT/CLUSTER now
works on OpenVMS and Windows NT.
Previously, all nodes in the
cluster had to be listed in a SET ENVIRONMENT /NODE=(...) command in
order to issue subsequent commands to all of them. SET ENVIRONMENT
/CLUSTER is now available on OpenVMS and Windows NT clusters, as well
as on Compaq Tru64 UNIX TruCluster.
- 14-3-169 Application not notified if ACP dies
Upon the
death of the ACP process, RTR V3 would incorrectly terminate any
outstanding calls to the V2 API wrapper with the status ACPNOTVIA. V2
behaviour has been restored, and such calls now terminate with NOACP.
- 14-3-196 Application calling $START_TX at AST level while
the ACP died would cause the application to crash inside LIBRTR
This has been corrected. SYS$START_TX now simply returns a message to
the caller indicating that the ACP is not available.
- 14-3-197 ACPNOTVIA error returned if RTR command
$DCL_TX_PRC issued
The RTR command $DCL_TX_PRC issued for a
non-existent facility caused an ACPNOTVIA error return. This does not
happen the first time - only subsequent times if RTR is stopped in
between.
API verbs called from the RTR command line interpreter
would fail with the status ACPNOTVIA if RTR was stopped and restarted
without restarting the command server. This has been corrected. The
problem can be avoided on earlier versions of RTR by issuing the command
'disconnect server' after stopping RTR.
- 14-3-285 OpenVMS process quotas artificially constrained
Prior versions of RTR would limit the maximum values that could be
specified for the ACP process quotas to 64K. This restriction has been
removed. Warning messages are generated if the requested (or default)
memory quotas conflict with the system wide WSMAX parameter, or if the
calculated or specified page file quota is greater than the remaining
free page file space.
- 14-3-286 Synchronous call to accept DECnet connect causes
links to get isolated
Stalling of the ACP due to synchronous (sys$qiow)
calls has been fixed by changing to asynchronous calls (sys$qio), which
prevents the link from being disconnected. A completion event is called
at the end of a successful asynchronous DECnet accept connection.
Similarly, DECnet connection reject has also been fixed by changing to
asynchronous calls instead of synchronous.
- 14-7-640 "exceeded byte count quota" message received if
process quota bytlm is less than the specified value
On starting
RTR on OpenVMS, if process quota bytlm is less than the specified value
(currently 100000), RTR returns the OpenVMS error message
"exceeded byte count quota" and does not start.
Users
should change the BYTLM setting to the specified value or higher to
eliminate the error message and start RTR.
Application users whose
process quota bytlm is less than the specified value will receive RTR
error code RTR_STS_BYTLMNSUFF on starting their application.
- 14-8-130 ACCVIO and omitted parameters using
Inter-Operability Services
This version of the RTR
Inter-Operability Services now checks the number of parameters passed.
If the consumer of the API omits the trailing optional parameter(s),
RTR will detect it and supply the necessary value.
It is better
practice to supply a "0" for the optional arguments.
3.3 Known Problems with Workarounds
There are no known problems with workarounds in this release that are
specific to this platform.
3.4 Restrictions
- 14-1-279 RTR V2 compatibility interface is not yet
thread-safe
The RTR V2 compatibility interface may only be called
from one program thread.
- 14-3-139 RTR V3 only allows up to 30 bytes for the EVTNAM
parameter
The RTR V2 compatibility layer only allows up to 30 bytes
for the EVTNAM parameter to $DCL_TX_PRC(W), whereas RTR V2 allows up to
32 bytes.
- 14-7-625 RTR V3 cannot be run in system-mode on a machine
on which RTR V2 is already running
RTR V3 cannot be run in
system-mode on a machine on which RTR V2 is already running. If this is
attempted, the RTR V3 ACP process will fail. Please make sure RTR V2 has
been stopped before attempting to install and run RTR V3.
- 14-7-1026 Increased AST Process Quota Usage
It may be
necessary to increase process ASTLM quotas after upgrading from RTR V2
to V3. If your application receives a large number of messages in a
relatively small time period, and you find that RTR calls are failing
to complete, raise the ASTLM substantially. For example, if your
process receives several hundred broadcasts in a few seconds, raise
ASTLM by several hundred.
4 AIX Specific Information
This chapter gives platform-specific information for the AIX
implementation of Reliable Transaction Router, Version 3.2.
4.1 New Features
- 14-5-44 New script rtr_snapshot.sh for gathering RTR
diagnostic data
The new command rtr_snapshot.sh calls various SHOW
and MONITOR commands to output a snapshot of the state of RTR on a
node. This information may be of use for monitoring, tuning,
troubleshooting, and reporting problems.
4.2 Known Problems Corrected Since Version 3.1D
- 14-1-643 Assertion when restarting timed out command
server at RTR> prompt
When an idle command server started by the
same RTR> prompt process times out after RTR_COMSERV_TIMEOUT seconds
(default 300) and is restarted for a new command, the RTR> prompt
process could raise an assertion. This problem has been corrected.
- 14-3-190 Signal handling by RTR shared library in RTR
applications
The first RTR API call no longer replaces any existing
signal handlers that were installed by the application main program for
the three usual termination signals SIGINT, SIGHUP, and SIGTERM.
If no existing termination signal handlers are found (SIG_DFL), RTR
installs a simple handler which will cause RTR to call exit() at the
next convenient opportunity during an RTR API call, or in the RTR
polling thread in a threaded application.
RTR installs an exit()
handler with atexit(). This handler is not essential, but is intended
to perform a more controlled shutdown of RTR in an application than
when the process is terminated abruptly, for example with _exit(),
which does not call exit handlers.
The application may choose to
leave the RTR termination signal handler in place, or to install its
own handlers at any time. The application handlers should notify the
mainline program in an async-safe manner that it should call exit()
when convenient, and may even be constructed to also call the RTR
handler they replaced so that the application can exit in an RTR API
call too. Consult the operating system documentation for the usual
restrictions on exactly what is permitted in an async-safe signal
handler.
If the application does not install its own signal
handlers for the usual termination signals and does not continue to
make regular RTR API calls, then the application will appear to ignore
those signals.
RTR still installs an empty handler to catch the SIGPIPE
signal to avoid the default action of program termination. In
unthreaded applications RTR may still install the RTR SIGIO handler
which also executes any previous SIGIO handler installed by the main
program.
- 14-3-275 aio not available makes RTR fail with unresolved
errors for kaio_rdrw, etc.
RTR for AIX exploits Asynchronous I/O
for increased journal performance. By default, aio is only
defined (that is, disabled) rather than available. Aio
can be configured with the system management tool: # smit aio.
The
RTR installation procedure post_i script now makes aio available, and
ensures that aio will also be available after a restart.
4.3 Known Problems with Workarounds
- 14-3-217 Unthreaded UNIX applications using rtr_set_wakeup
can fail, e.g., in malloc
When an unthreaded UNIX RTR application
calls rtr_set_wakeup, the non-reentrant RTR shared library -lrtr with
which it is linked installs a signal handler. This signal handler
called functions internal to RTR which could occasionally call runtime
library functions such as malloc() that are not async-safe, according
to the relevant standards. See man (4) signal.
In practice this may
appear to work most of the time, but break for no apparent reason when
the signal happens to occur while background code is also in a runtime
library call such as malloc.
The problem in RTR has been corrected.
The small penalty for this is that RTR no longer attempts to
ensure that the available messages are not just housekeeping messages.
Applications must always be prepared for a timeout return status on
calling rtr_receive_message with a zero timeout, even after a wakeup
suggests that a message ought to be available.
Application writers
are reminded that their RTR wakeup handlers are subject to the same
restrictions: routines like printf, malloc, and the entire RTR API may
not be used directly or indirectly from within a signal handler. A
workaround for applications with unsafe wakeup handlers can be to link
with the reentrant version of the library -lrtr_r because different
rules apply for wakeups in a thread: applications should not call
anything that is not thread-safe, or anything that might block
indefinitely, such as rtr_send_to_server, rtr_reply_to_client,
rtr_broadcast_event, or rtr_receive_message with a non-zero timeout.
- 14-7-952 Do not treat dumb unknown terminal like a VT100
If you try to run RTR on a terminal or window with unknown (zero)
dimensions, RTR exits immediately with a BADROWCOL message.
A
workaround is to enter the following UNIX command:
RTR expects the terminal or window to be at least capable of
emulating a VT100 terminal. Otherwise, a few control characters are
displayed at the beginning of each line, and the output from the
MONITOR command contains so many control sequences that it is
unreadable.
A workaround is to redirect both standard input and
output to files:
rtr monitor calls < /dev/null > monitored_calls.lis
4.4 Restrictions
- 14-3-50 Maximum number of application processes limit
An ACP crash that occurred when starting the last of a great many
applications has been corrected.
When the process open file limit
is reached, the application will now generally report ACPNOTVIA, "RTR
ACP is no longer a viable entity, restart RTR". In actual fact the ACP
continues to operate with all previously connected processes, and only
the new rejected process thinks that the RTR ACP is not alive. This
message should be interpreted as "ACPINSRES, The RTR ACP has
insufficient resources."
Please ensure that your system is
configured with sufficient default per-process resources, or that the
ACP process is started with increased resource limits. Allow at least
one open file for each additional application process, and at least one
for each link.
5 Sun Solaris Specific Information
This chapter gives platform-specific information for the Sun Solaris
implementation of Reliable Transaction Router, Version 3.2.
5.1 New Features
- 14-5-44 New script rtr_snapshot.sh for gathering RTR
diagnostic data
The new command rtr_snapshot.sh calls various SHOW
and MONITOR commands to output a snapshot of the state of RTR on a
node. This information may be of use for monitoring, tuning,
troubleshooting, and reporting problems.
5.2 Known Problems Corrected Since Version 3.1D
- 14-1-643 Assertion when restarting timed out command
server at RTR> prompt
When an idle command server started by the
same RTR> prompt process times out after RTR_COMSERV_TIMEOUT seconds
(default 300) and is restarted for a new command, the RTR> prompt
process could raise an assertion. This problem has been corrected.
- 14-3-190 Signal handling by RTR shared library in RTR
applications
The first RTR API call no longer replaces any existing
signal handlers that were installed by the application main program for
the three usual termination signals SIGINT, SIGHUP, and SIGTERM.
If no existing termination signal handlers are found (SIG_DFL), RTR
installs a simple handler which will cause RTR to call exit() at the
next convenient opportunity during an RTR API call, or in the RTR
polling thread in a threaded application.
RTR installs an exit()
handler with atexit(). This handler is not essential, but is intended
to perform a more controlled shutdown of RTR in an application than
when the process is terminated abruptly, for example with _exit(),
which does not call exit handlers.
The application may choose to
leave the RTR termination signal handler in place, or to install its
own handlers at any time. The application handlers should notify the
mainline program in an async-safe manner that it should call exit()
when convenient, and may even be constructed to also call the RTR
handler they replaced so that the application can exit in an RTR API
call too. Consult the operating system documentation for the usual
restrictions on exactly what is permitted in an async-safe signal
handler.
If the application does not install its own signal
handlers for the usual termination signals and does not continue to
make regular RTR API calls, then the application will appear to ignore
those signals.
RTR still installs an empty handler to catch the SIGPIPE
signal to avoid the default action of program termination. In
unthreaded applications RTR may still install the RTR SIGIO handler
which also executes any previous SIGIO handler installed by the main
program.
- 14-3-193 Link loss after Sun Solaris 2.5.1 send (34:
Result too large)
Sun has confirmed that the sendmsg() system call
on Sun Solaris 2.5.1 can return with an undocumented error number
ERANGE "Result too large". RTR now works around this and no longer
closes the link.