hp Reliable Transaction Router
Application Design Guide

Journal Accessibility

The RTR journal on each node must be accessible so that it can be used to replay transactions. When setting up your system, consider both journal sizing and how to deal with replay anomalies.

Journal Sizing

To size a journal, use the guidelines described in the section Creating a Recovery Journal in the Reliable Transaction Router System Manager's Manual.

Use of large transactions generally causes poor performance, not only for initial processing and recording in the database, but also during recovery. Large transactions fill up the RTR journals more quickly than small ones.

For replay anomalies, use the RTR_STS_REPLYDIFF status message to determine if a transaction has been recorded differently during replay. For details on this and other status messages, see the Reliable Transaction Router C++ Foundation Classes manual or the Reliable Transaction Router C Application Programmer's Reference Manual.

You should also consider how the application is to handle secondary or shadow server errors and aborts, and write your application accordingly.

Design for Performance

In designing for performance, take the following into account:

Consider the amount of data being transferred.
Keep the size of transaction messages short.
Tie up the database for as short a time as possible.
When using transactional shadowing to two sites, have high-speed links between sites.
Evaluate your hardware, in particular:
- Memory (see the Reliable Transaction Router System Manager's Manual for information on virtual memory requirements for RTR links, channels, and messages)
- Disk striping
- Volume shadowing
- Disk performance/fragmentation
- Disk controllers
Consider tuning your operating system on nodes where RTR is running.
With the C++ API:
- Use the RTRServerTransactionController::SetIndependentTransaction method.
- Use multi-transaction controller applications, which are more efficient than multiple, single transaction controller applications.
- Use the RTRClientTransactionController::SetReadOnly method to reduce RTR journaling.
With the C API:
- Use the independent transaction flag.
- Use multi-channel applications, which are more efficient than multiple, single channel applications.
- Use the READ_ONLY flag to reduce RTR journaling.
- Use single accept_txn flags for client/server calls to minimize transaction activity; for example, send/accept or reply/forget.

RTR Performance Tests

An important part of your application design will concern performance considerations: how will your application perform when it is running with RTR on your systems and network? Providing a methodology for evaluating the performance of your network and systems is beyond the scope of this document. However, to assist your understanding of the impact of running RTR on your systems and network, this section provides information on two major performance parameters:

Number of client channels for the C API or transaction controllers (client side) and partitions (server side) for the C++ API.
Size of messages (in bytes)

This information is roughly scalable to other CPUs and networks. The material is based on empirical tests run on a relatively busy Ethernet network operating at 700 to 800 KB/s (kilobytes per second). This baseline for the network was based on FTP tests (doing file transfer using a File Transfer Protocol tool) because in a given configuration, network bandwidth is often a limiting factor in performance. For a typical CPU (for example, a Compaq AlphaServer 4100 5/466 4 MB) opening 80 to 100 channels with a small (100 byte) message size, a TPS (transactions per second) rate of 1400 to 1600 is usual.

Tests were performed using simple application programs (TPSREQ - client and TPSSRV - server) that use RTR Version 3 C API application programming interface calls to generate and accept transactions. (TPSREQ and TPSSRV are supplied on the RTR software kit.) The transactions consisted of a single message from client to server. The tests were conducted on OpenVMS Version 7.1 running on AlphaServer 4100 5/466 4 MB machines. Two hardware configurations were used:

A single node, both client and server running on the same machine
Two nodes, one configured as a frontend, the other as a router and backend

In each configuration, transactions per second (TPS) and the CPU load (CPU%) consumed by the application (app-cpu) and the RTR ACP process (acp-cpu) were measured as a function of the:

Number of client channels opened by the TPSREQ test program
Size of the message sent

The transactions used in these tests were regular read/write transactions; there was no use of optimizations such as READONLY or ACCEPT_FORGET. The results for a single node with an incrementing number of channels are shown in Figure 2-8.

Figure 2-8 Single-Node TPS and CPU Load by Number of Channels

This test using 100-byte messages suggests the following:

CPU saturation limited the maximum TPS to about 2500.
CPU resource cost per transaction goes down rapidly as offered load increases, probably because RTR optimizations that batch I/Os for disk and interprocess communication (IPC) become more effective as more transactions are processed concurrently.
In an SMP environment, the RTR ACP process will likely limit the maximum TPS per system to about 3000, regardless of the number of CPUs added.

The results for a single node with a changing message size are shown in Figure 2-9.

Figure 2-9 Single-Node TPS and CPU Load by Message Size

This test using 80 client and server channels suggests that:

CPU saturation appears to limit TPS for small message sizes.
Disk I/O rates appear to limit TPS for large messages.

The results for the two-node configuration are shown in Figure 2-10.

Figure 2-10 Two-Node TPS and CPU Load by Number of Channels

This two-node test using 100-byte messages reports CPU usage as a combined total for the frontend and backend (hence a maximum of 200 percent). The results suggest that the constraint in this case is network bandwidth: the TPS rate flattens out at a network traffic level consistent with that measured on the same LAN by other independent tests (for example, using FTP to transfer data across the same network links).

Summary

Determining the factors that limit performance in a particular configuration can be complex. While the previous performance data can be used as a rough guide to what can be achieved in particular configurations, apply them with caution. Performance will vary depending on the capabilities of the hardware, operating system, and RTR version in use, as well as the work performed by the user application (the tests above employ a dummy application that does no real end-user work).

In general, performance in a particular case is constrained by contention for a required resource. Typical resource constraints are:

CPU saturation
Disk storage I/O bandwidth and latency
Network bandwidth and delays
Server application I/O delays
Database tuning
Optimum database connection bandwidth
Size of messages
Number of transaction controllers or channels

Additionally, achieving a high TPS rate can be limited by:

Lack of applied client load

For suggestions on examining your RTR environment for performance, see Appendix F in this document, Evaluating Application Resource Requirements.

Concurrent Servers

Use concurrent servers in database applications to optimize performance and continue processing when a concurrent server fails.

When programming for concurrency, you must ensure that the multiple threads are properly synchronized so that the program is thread-safe and provides a useful degree of concurrency without ever deadlocking. Always check to ensure that interfaces are thread-safe. If it is not explicitly stated that a method is thread-safe, you should assume that the routine or method is not thread-safe. For example, to send RTR messages in a different thread, make sure that the methods for sending to server, replying to client and broadcasting events are safe. You can use these methods provided that the:

Sending thread owns the object being sent.
Transaction controller has been completely constructed before any other threads use it.
Transaction controller is not destructed before other threads have stopped using it.
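The three conditions above can be sketched in a few lines. This is a minimal illustration in Python, not RTR code: FakeController is a hypothetical stand-in for a transaction controller, and its send method is assumed not to be thread-safe, so callers serialize access with a lock.

```python
import threading

class FakeController:
    """Hypothetical stand-in for a transaction controller; NOT an RTR class."""
    def __init__(self):
        self.sent = []
        self.closed = False

    def send(self, message):
        # Assumed not thread-safe, so callers serialize access with a lock.
        if self.closed:
            raise RuntimeError("controller used after teardown")
        self.sent.append(message)

    def close(self):
        self.closed = True

def run_senders(thread_count, messages_per_thread):
    controller = FakeController()      # fully constructed before any thread starts
    lock = threading.Lock()

    def worker(worker_id):
        for n in range(messages_per_thread):
            message = (worker_id, n)   # each thread builds and owns its own message
            with lock:
                controller.send(message)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                       # every user has stopped ...
    controller.close()                 # ... before the controller is torn down
    return controller.sent
```

Joining every thread before calling close is what enforces the third condition; the lock is only needed because the sketch assumes the send method itself is not thread-safe.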

Partitions and Performance

Partitioning data enables the application to balance traffic to different parts of the database on different disk drives. This achieves parallelism and provides better throughput than using a single partition. Using partitions may also enable your application to survive single-drive failure in a multi-drive environment more gracefully. Transactions for the failed drive are logged by RTR while other drives continue to record data.
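As a rough illustration of how partitioning by key range spreads traffic, the following Python sketch maps a numeric key to one of four partitions, each assumed to sit on its own disk. The key ranges, the disk names, and the functions are hypothetical and are not part of RTR; in an RTR application the partition definitions are supplied to RTR, which does the routing.

```python
import bisect

# Hypothetical partitions: each covers keys up to and including its upper bound.
PARTITION_UPPER_BOUNDS = [249, 499, 749, 999]
PARTITION_DISKS = ["disk0", "disk1", "disk2", "disk3"]

def partition_for(key):
    """Return the index of the partition whose key range contains key."""
    if not 0 <= key <= PARTITION_UPPER_BOUNDS[-1]:
        raise ValueError(f"key {key} is outside all partitions")
    return bisect.bisect_left(PARTITION_UPPER_BOUNDS, key)

def spread(keys):
    """Count how many of the given keys land on each disk."""
    counts = {disk: 0 for disk in PARTITION_DISKS}
    for key in keys:
        counts[PARTITION_DISKS[partition_for(key)]] += 1
    return counts
```

With evenly distributed keys, each disk receives about a quarter of the traffic, and a failure of one disk affects only the keys in its range.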

Facilities and Performance

To achieve performance goals, spread your facilities across the nodes in your physical configuration, and use the most powerful nodes for the backends that will carry the most traffic.

In some applications with several different types of transactions, you may need to ensure that certain transactions go only to certain nodes. For example, a common type of transaction is for a client application to receive a stock sale transaction, which then proceeds through the router to the current server application. The server may then respond with a broadcast transaction to only certain client applications. This exchange of messages between frontends and backends and back again can be dictated by your facility definition of frontends, routers, and backends.

Router Placement

Placement of routers can have a significant effect on your system performance. Because connectivity can extend over a wide-area network, place your routers close to your backends where you can, and make the links between your routers and backends as fast as possible. However, recognize that site failover may send transactions across slower-speed links. For example, Figure 2-11 shows high-speed links to local backends, but lower-speed links that will come into use for failover.

Figure 2-11 Two-Site Configuration

Additionally, placing routers on separate nodes from backends provides better failover capabilities than placing them on the same node as the backend.

In some configurations, you may decide to use a dual-rail or multihome setup for a firewall or to improve network-related performance. (See the Reliable Transaction Router System Manager's Manual section on Network Transports for information on this setup.)

Broadcast Messaging

When a server or client application sends out a broadcast message, the message passes through the router and is sent to the client or server application as appropriate. A client application sending a broadcast message to a small number of server applications will probably have little impact on performance, but a server application sending a broadcast message to many clients, potentially hundreds, can have a significant impact. Therefore, consider the impact of frequent use of large messages broadcast to many destinations. If your application requires frequent broadcasts, keep the messages as small as possible. Broadcasts could be used, for example, to inform all clients of a change in the database that affects all clients.

Figure 2-12 illustrates message fan-out from client to server, and from server to client.

Figure 2-12 Message Fan-Out

You can also improve performance by creating separate facilities for sending broadcasts.

Making Broadcasts Reliable

To help ensure that broadcasts are received at every intended destination, the application might number them with an incrementing sequence number and have the receiving application check that all numbers are received. When a message is missing, have a retransmit server re-send the message.
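The receiving side of such a scheme can be sketched as follows. This is an application-level illustration in Python, assuming that sequence numbers start at 1 and increase by one per broadcast; the BroadcastReceiver class is hypothetical and is not part of any RTR API. The list it returns is what the application would hand to its retransmit server.

```python
class BroadcastReceiver:
    """Tracks sequence numbers and reports gaps; not part of any RTR API."""
    def __init__(self, first_expected=1):
        self.next_expected = first_expected
        self.missing = set()

    def on_broadcast(self, seq):
        """Record seq; return the sorted sequence numbers now known to be missing."""
        if seq >= self.next_expected:
            # Anything skipped between next_expected and seq is a gap.
            self.missing.update(range(self.next_expected, seq))
            self.next_expected = seq + 1
        else:
            # A late or retransmitted message fills an earlier gap.
            self.missing.discard(seq)
        return sorted(self.missing)
```

A duplicate of a message that was already received leaves the state unchanged, so retransmissions that arrive more than once are harmless.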

Large Configurations

Very large configurations with unstable or slow network links can reduce performance significantly. In addition to ensuring that your network links are the fastest you can afford and put in place, examine the volume of inter-node traffic created by other uses and applications. RTR need not be isolated from other network and application traffic, but it can be slowed down by that traffic.

Using Read-Only Transactions

Read-only transactions can significantly improve throughput because they do not need to be journaled. A read-only database can sometimes be updated only periodically, for example, once a week rather than continuously, which again can reduce application and network traffic.

Making Transactions Independent

When using transactional shadowing, it can enhance performance to process certain transactions as independent. When transactions are declared as independent, processing on the shadow server proceeds without enforced serialization. Your application analysis must establish what transactions can be considered independent, and you must then write your application accordingly. For example, bets placed at a racetrack for a specific race are typically independent of each other. In another example, transactions within one customer's bank account are typically independent of transactions within another customer's account. For examples of code snippets for each RTR API, see the appendices of samples in this manual.

Configuration for Operability

To help make your RTR system as manageable and operable as possible, consider several tradeoffs in establishing your RTR configuration. Review these tradeoffs before creating your RTR facilities and deploying an application. Make these considerations part of your design and validation process.

Define your facilities with an eye to the number and placement of frontends, routers, and backends.
To avoid problems with quorum resolution, design your configuration with an odd number of routers to ensure that quorum can be achieved.
Separate your routers from your backends to improve failover, so that failure of one node does not take out both the router and the backend.
If your application requires frontend failover when a router fails, frontends must be located on separate nodes from the routers, but frontends and routers must of course be in the same facility. For frontend failover, a frontend must be in a facility with multiple routers. You use frontend failover with nested transactions.
To identify a node used only for quorum resolution, define the node as a router or as a router and frontend. Define all backends in the facility, but no other frontends.
With a widely dispersed set of nodes, for example, nodes distributed across an entire country, use local routers to deal with local frontends. This can be more efficient than having many dispersed frontends connecting to a small number of distant routers.
In many configurations, it may be more effective to place routers near backends.

Firewalls and RTR

For security purposes, your application transactions may need to pass through firewalls in the path from the client to the server application. RTR provides this capability within the CREATE FACILITY syntax. See the Reliable Transaction Router System Manager's Manual, Network Transports, for specifics on how to specify a node to be used as a firewall, and how to set up your application tunnel through the firewall.

Avoiding DNS Server Failures

Nodes in your configuration are often specified with names and IP or DECnet addresses fielded by a name server. When the name server goes down or becomes unavailable, the name service is not available and certain requests may fail. To minimize such outages, declare the referenced node name entries in a local host names file that is available even when the name server is not. Using a host names file can also improve performance for name lookups. For details on this, see the Reliable Transaction Router System Manager's Manual section on Network Transports.

Batch Procedures

Operations staff often create batch or command procedures to take snapshots of system status to assist in monitoring applications. The character cell displays (ASCII output) of RTR can provide input to such procedures. Be aware that system responses from RTR can change with each release, which can cause such command procedures to fail. If possible, plan for such changes when bringing up new versions of the product.

Chapter 3
Implementing an Application

In addition to understanding the RTR run-time and system management environments, you must also understand the RTR applications environment and the implications of that environment on your implementation. This section provides information on requirements that transaction processing applications must take into account and deal with effectively. It also cites rules that help keep your transactions ACID compliant. The requirements and rules complement each other and sometimes repeat a similar concept. Your application must take both into account.

RTR Requirements on Applications

Applications written to operate in the RTR environment should adhere to the following rules:

Be transaction aware
Avoid server-specific data
Optionally, have independent transactions
Optionally, use two identical databases for transactional shadow servers
Make transactions self-contained
Lock shared resources

Be Transaction Aware

RTR expects server applications to be transaction aware; an application must be able to roll back an appropriate amount of work when asked. Furthermore, to preserve transaction integrity, rollback must be all or nothing. Each transaction incurs some overhead, and the application must be prepared to deal with failures and concomitant rollback gracefully. When designing your client and server applications, note the outcome of transactions. Transactional applications often store data in variables that pertain to the operation taking place outside the control of RTR. Depending on the outcome of the RTR transaction, the values of these variables may need to be adjusted. RTR guarantees delivery of messages (usually to a database), but RTR does not know about any data not passed through RTR.

The rule is:
Code your application to preserve transaction integrity through failures.
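All-or-nothing rollback can be illustrated with a small database example. The sketch below uses Python's sqlite3 module and an in-memory database purely for illustration; the account table, the CHECK constraint, and the transfer function are assumptions made for the example and have nothing to do with the RTR API. The point is that when the second update fails, the first update is undone as well.

```python
import sqlite3

def make_db():
    """Create an in-memory database with two accounts; balances may not go negative."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE account ("
        "id TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    db.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
    db.commit()
    return db

def transfer(db, source, target, amount):
    """Apply both updates or neither; return True on commit, False on rollback."""
    try:
        db.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, target))
        db.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, source))
        db.commit()
        return True
    except sqlite3.IntegrityError:
        db.rollback()   # undoes the credit that had already succeeded
        return False

def balances(db):
    """Return the current balances as a dictionary keyed by account id."""
    return dict(db.execute("SELECT id, balance FROM account ORDER BY id"))
```

Any application variables that were changed in step with the database updates would need the same treatment: reset them in the rollback branch so that they match the outcome of the transaction.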

Avoid Server-Specific Data

The client and server applications must not exchange any data that makes sense on only one node in the configuration. Such data can include, for example, a memory reference pointer, whose purpose is to allow the client to reference this context in a later transaction, indexes into files, node names, or database record numbers. These values only make sense on the machine on which they were generated. If your application sends such data to another machine, that machine will not be able to interpret it correctly. Furthermore, data cannot be shared across servers, transaction controllers, or channels.

The rule is:
How you track state must be meaningful on all nodes where your application runs.

Be Independent of Time of Processing

Transactions are assumed to contain all the context information required to be successfully executed. An RTR transaction is assumed to be independent of time of processing. For example, in a shadow environment, if the secondary server cannot credit an account because it is past midnight, but the transaction has already been successfully committed on the primary server, this would cause an inconsistency between the primary and secondary databases. Or, in another example, Transaction B cannot rely on the fact that Transaction A performed some operation before it.

Make no assumptions about the amount of time that will occur between transactions, and avoid using a transaction to establish a session with a server application that can time out. Such a timeout might occur in a client application that logs onto a server application that sets a timer to determine when to log the client off. If a crash occurs after a successful logon, subsequent transactions may fail because the logon session is no longer valid.

The rule is:
If you have operations that must not be shadowed, identify them and exclude them from your application. Furthermore, do not keep a state that can become stale over time.
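One way to avoid the midnight problem described above is to carry the relevant date in the transaction message, so that the primary and the secondary post the same result whenever they process it. The following Python sketch assumes a hypothetical message layout and ledger structure; it does not use the RTR API.

```python
from datetime import date, datetime

def make_credit(account, amount, business_date):
    """Client side: put the business date into the message itself."""
    return {
        "account": account,
        "amount": amount,
        "business_date": business_date.isoformat(),
    }

def apply_credit(ledger, message, server_clock):
    """Server side: post under the date in the message, not under server_clock.

    server_clock is accepted only to show that it plays no part in the result.
    """
    key = (message["account"], message["business_date"])
    ledger[key] = ledger.get(key, 0) + message["amount"]
    return ledger
```

For example, a credit created with a given business date produces the same ledger entry whether the server clock reads just before or just after midnight, so the two shadow copies stay consistent.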

In your application, you can define transactions as independent with the C++ API, using the SetIndependentTransaction method in your transaction controller AcceptTransaction or SendApplicationMessage calls. Using the C API, you use the independent transaction flag in your rtr_accept_tx or rtr_reply_to_client calls.

For more information on the independent transaction methods in the RTRServerTransactionController class, refer to the Reliable Transaction Router C++ Foundation Classes manual. For more information on the independent transaction flag and the different uses of these calls, refer to the Reliable Transaction Router C Application Programmer's Reference Manual.

Use Two Identical Databases for Shadow Servers

Shadow server use is aimed at keeping two identical copies of the database synchronized. For example, Figure 3-1 illustrates a configuration with a router serving two backends to two shadow databases. The second router is for router failover.

Figure 3-1 Transactional Shadow Servers

If an update of a copy triggers the update of a third common database, the application must determine whether it is running as a primary or a secondary, and only perform an update if it is the primary. Otherwise, there can be complex failure scenarios where duplication can occur.

For example, RTR has no way to determine if a transaction being shadowed is a one-time-only transaction, such as a bookstore debiting your credit card for the purchase of a book. If this transaction is processed on the primary node and the processed data fed to a third common database, and the transaction is later processed on the secondary node, your account would incorrectly be double charged. The application must handle this situation correctly.

The rule is:
Design your application to deal correctly with transactions, such as debiting a credit card or bank account, that must never be performed more than once.
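A minimal sketch of one possible safeguard follows, combining the two ideas in this section: only the primary touches the common database, and the common database refuses a transaction ID it has already applied. The CommonDatabase class, the message fields, and the role argument are hypothetical; how the application learns whether it is primary or secondary is outside the scope of the sketch.

```python
class CommonDatabase:
    """Hypothetical third database that remembers which transaction IDs it applied."""
    def __init__(self):
        self.applied_ids = set()
        self.charges = []

    def charge_once(self, txn_id, card, amount):
        """Apply the charge unless txn_id was already applied; return True if applied."""
        if txn_id in self.applied_ids:
            return False
        self.applied_ids.add(txn_id)
        self.charges.append((card, amount))
        return True

def process(shadow_db, common_db, role, txn):
    """Always update the local shadow copy; touch the common database only as primary."""
    shadow_db[txn["id"]] = txn["amount"]
    if role == "primary":
        return common_db.charge_once(txn["id"], txn["card"], txn["amount"])
    return False
```

The transaction ID check matters even with the role test in place, because a transaction can be presented to the primary more than once, for example during recovery.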

Figure 3-2 shows a configuration with two shadow servers and a third, independent server for a third, common database. This is not a configuration recommended for use with RTR without application software that deals with the kind of failure situation described above. Another method is to decouple the shadow message from the other branch.

Figure 3-2 Shadow Servers and Third Common Database (not recommended)

When updating a single resource through multiple paths, the recommended method is to use the RTR standby functionality.

Make Transactions Self-Contained

All information required to process a transaction from the perspective of the server application should be contained within the transaction message. For example, if the application required a user-ID established earlier to successfully execute the transaction, the user-ID should be included in the transaction message.

The rule is:
Construct complete transaction messages within your application.
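As an illustration, the Python sketch below builds a message that carries the user-ID along with the rest of the request, and a server-side handler that works only from the message contents. The field names and functions are hypothetical and are not part of the RTR API.

```python
REQUIRED_FIELDS = ("user_id", "account", "amount")

def build_message(user_id, account, amount):
    """Client side: include everything the server needs, including the user-ID."""
    return {"user_id": user_id, "account": account, "amount": amount}

def missing_fields(message):
    """Return the required fields that the message does not carry."""
    return [name for name in REQUIRED_FIELDS if name not in message]

def handle(message):
    """Server side: process using only the message contents, with no session lookup."""
    missing = missing_fields(message)
    if missing:
        return ("rejected", missing)
    summary = f"{message['user_id']} credits {message['account']} by {message['amount']}"
    return ("accepted", summary)
```

Rejecting an incomplete message, rather than falling back on state remembered from an earlier transaction, keeps the handler correct on any server that happens to process the message.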
