1. Figure 1 is TEST.FTN.
2. Figure 2 is CHKP.MAC.
3. TSTROT.MAC, TSTTBL.MAC, TSTUTL.MAC, and TST.CMD should be moved to ARIS.

How Fast (Slow) Is RSX?

With many real-time systems, milliseconds may make the difference between success and failure of an application. This article looks at how fast or slow RSX is. Simple test programs have been written to measure the execution speed of most RSX directives. Some of the results are surprising.

A few months ago I was looking at an RSX-11M-Plus task accounting report from a system I was seeing that day for the first time. The report covered a ten-minute interval when the particular application was in full production. A quick glance through the listing revealed six tasks which consumed the majority of CPU time. These tasks were issuing system directives at rates as high as 400 directives/second. The RMDEMO system page displayed directive rates for the whole system as high as 800 directives/second. The accounting report showed some of the tasks context switching more than 100 times a second.

These rates seemed excessive, but I had nothing on which to base this judgement. If a typical directive takes one millisecond to execute, 800 directives/second would be very significant (80% of the CPU)! But if average directive execution is 0.1 milliseconds, only an insignificant 8% of the CPU is involved.

A few simple tests showed 800 directives/second would be a substantial load on this particular PDP-11 model. A close examination of the source code revealed a frequently called subroutine which was always disabling and enabling checkpointing around some of its operations. The programs using this subroutine were the most important tasks in the system and thus always in memory. We installed the tasks as noncheckpointable, removed the directives from the subroutine, and eliminated over 300 directives/second.

This experience led to further questions about the speed of specific RSX features. What is the overhead of a QIO?
How many bytes of data can be moved around a system using send/receive messages? How long does it take to context switch or execute an AST? Is stopping more efficient than waiting? Are there any differences between RSX-11M and RSX-11M-Plus, or between old and new versions of the RSX operating systems?

The answers to these questions can be found with some very simple test programs and a standalone system. The standard 60-cycle system clock does not have enough resolution to measure how long it takes to execute a single directive. However, accurate results can be obtained by measuring the elapsed time for 10,000 iterations of the directive and dividing by the iteration count. A standalone system is needed; otherwise the elapsed wall-clock time will include execution time for other tasks.

The initial test program I used was a Fortran-77 main program named TIMER which timed how long it took to make some number of calls to the subroutine TEST. I then wrote a whole series of TEST subroutines, each one issuing some directive sequence. Figure 1 shows the Fortran-77 main program. Figure 2 lists a TEST subroutine named CHKP which issues the disable and enable checkpointing directives.

The first couple of versions of the TEST subroutine led to other tests for such things as mark times, send/receive, and event flags. Soon I had accumulated 23 different TEST subroutines and had built 23 different tasks. A more systematic approach seemed appropriate.

I started over again and wrote the TST utility. TST contains a database of almost all RSX-11M and RSX-11M-Plus directives. The TST command line selects a test by directive name and specifies the loop count. The resulting timings are output to the user terminal. TST uses only the basic PDP-11 instruction set and should run on any PDP-11 model.

TST is too long to reproduce in this article. The TST sources and a command file to execute all available tests are available through ARIS. The program is also available on the San Francisco Fall 1986 RSX SIG Tape.
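The timing method described above (issue the operation many times, then divide the elapsed wall-clock time by the count) can be sketched in a modern language. The Python below is an illustration only, not the TIMER or TST source. The names `time_loop` and `null_test` are hypothetical, and `time.perf_counter` stands in for the RSX system clock. Timing an empty test body gives the overhead of the loop itself.

```python
import time

def null_test():
    """Empty test body; timing it gives the loop overhead alone."""
    return None

def time_loop(test, iterations=10_000):
    """Return the average wall-clock seconds per call of `test`."""
    start = time.perf_counter()
    for _ in range(iterations):
        test()
    return (time.perf_counter() - start) / iterations

if __name__ == "__main__":
    overhead = time_loop(null_test)
    # Stand-in workload (an assumption); a real test would issue a directive.
    per_call = time_loop(lambda: sum(range(100)))
    print(f"loop overhead: {overhead * 1000:.5f} ms")
    print(f"test body:     {per_call * 1000:.5f} ms per iteration")
```

As with the original tests, the figures are only meaningful on an otherwise idle machine, because the wall-clock time includes whatever else is running.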
The TST submission also includes 15 small test tasks which act as targets for some of the directives.

The initial tests simply establish the baseline performance of the TST utility and the particular PDP-11 CPU. I ran all tests on a PDP-11/24 running RSX-11M-Plus V3.0, Autopatch A.

The Null test measures how fast the test loop runs with just a simple return instruction. My PDP-11/24 could execute 57,142 null tests/second. Thus the test loop code takes a little less than 0.02 milliseconds to execute. We will see that this is only a few percent of the time taken by even the fastest RSX directive.

The Block Move test measures how long it takes to copy a 512-byte block of data using eight loops through a series of 32 MOV instructions. This test gives us a means of comparing directive execution time to some easily visualized workload; that is, we can state that some directive took 2.1 block moves to execute. The tables in this article give this figure in the column marked "bm". The PDP-11/24 system executed 733 block moves per second, or 1.36 milliseconds to move one block of data.

The Get Sense Switch directive (GSSW) is effectively the RSX null directive. The actual directive processing takes only one instruction. As the test results below show, the common processing performed for any directive is about 1/3 of a block move.

    Block Move           733/sec     1.36 ms     1.00 bm
    GSSW                2310/sec     0.43 ms     0.31 bm

Some of the fastest RSX directives are the directives to set, clear, and read event flags. The following timings are for local flags:

    SETF                1790/sec     0.55 ms     0.41 bm
    CLEF                1810/sec     0.55 ms     0.40 bm
    RDEF                1810/sec     0.55 ms     0.40 bm

Using global event flags is only slightly slower than using local flags. However, a uniform 0.13 millisecond difference could be seen between local and group flags. This can be seen by looking at the set event flag directive:

    SETF/Local          1790/sec     0.55 ms     0.41 bm
    SETF/Global         1760/sec     0.57 ms     0.42 bm
    SETF/Group          1440/sec     0.69 ms     0.51 bm

Surprisingly, it is slightly faster to wait on the logical-or of event flags than it is to wait on a single event flag.
Also, waiting is slightly faster than stopping on an event flag. The following results are the test times when waiting on event flag 1, which has already been set.

    WTLO                1210/sec     0.83 ms     0.61 bm
    WTSE                1140/sec     0.88 ms     0.64 bm
    STSE                1120/sec     0.89 ms     0.65 bm

Some directives are almost always issued in pairs. These include enable and disable checkpointing and enable and disable AST recognition. These directive pairs are very fast. But as discussed in the introduction, these directives are often used around frequently called critical code segments. The CPU time used for these directives becomes very significant when they are issued 100 or more times per second.

    ENCP/DSCP           1050/sec     0.95 ms     0.70 bm
    ENAR/DSAR            974/sec     1.03 ms     0.75 bm

Perhaps the most common RSX directive is the QIO. The QIO directive is tested by issuing read virtual functions to the null device. The first test issues separate QIO and WTSE (wait for event flag) directives. The second test uses the more common and more efficient QIOW form.

    QIO/WTSE             328/sec     3.05 ms     2.23 bm
    QIOW                 401/sec     2.49 ms     1.83 bm

As the above numbers show, the QIO and QIOW directives take roughly five times the processing needed for simple event flag directives. However, the QIO is certainly not the slowest RSX directive. The TST utility showed the new Parse FCS and Parse RMS directives win the slowest award:

    PFCS                  45/sec    22.2 ms     16.29 bm
    PRMS                  45/sec    22.2 ms     16.29 bm

Perhaps the second most common set of directives issued by RSX applications is the PLAS directives. A new feature of RSX-11M-Plus V3.0 is the fast mapping mechanism, and the TST utility results show the new feature can be well worth the time it takes to convert MAP$ directives to the new calling sequence.

    MAP                  555/sec     1.80 ms     1.32 bm
    fast no length      5833/sec     0.17 ms     0.13 bm
    fast w/ length      3488/sec     0.29 ms     0.21 bm

A fast mapping call with no length change can reposition you inside a region 10 times faster than the equivalent MAP directive.
This is the normal case, as you almost always set the window size to the full 4KW boundary. Even if you change the length, the fast mapping feature is a big winner. The TST utility showed the execution speed of the MAP directive was the same whether changing the region offset, the length, or even the region.

The most common method used for RSX intertask communication is the send and receive directives. Both RSX-11M and RSX-11M-Plus offer a fixed-length (13-word) mechanism. RSX-11M-Plus extends the mechanism to include any length up to 256 words. The simplest case measured by TST is a send to itself followed by one receive. While the 13-word SDAT/RCVD test executes twice as fast as the 256-word variable send and receive, its effective data rate (the last column, in kilobits per second) is only 1/10 as fast.

    SDAT/RCVD            226/sec     4.42 ms     3.24 bm   ( 47 kb)
    VSDA/VRCD            112/sec     8.93 ms     6.54 bm   (458 kb)

The above tests are unrealistic, as very little useful work is accomplished by sending data to yourself. The following tests show the TST utility sending data to other tasks using different techniques to wake up the receiving task: event flags, ASTs, receive-data-or-stop, and receive-data-or-exit. Note that the receiving tasks always do two receives for each message sent. The second receive is needed to detect that no more messages are queued from the lower-priority TST utility.

    SDAT/RCVD flag       126/sec     7.94 ms     5.82 bm   ( 26 kb)
    SDAT/RCVD AST        126/sec     7.94 ms     5.82 bm   ( 26 kb)
    SDAT/RCST            123/sec     8.13 ms     5.96 bm   ( 25 kb)
    SDRC/RCVX             81/sec    12.34 ms     9.05 bm   ( 17 kb)
    VSDA/VRCD flag        79/sec    12.66 ms     9.28 bm   (324 kb)
    VSDA/VRCD AST         79/sec    12.66 ms     9.28 bm   (324 kb)
    VSDA/VRCS             77/sec    12.99 ms     9.52 bm   (315 kb)
    VSRC/VRCX             59/sec    16.95 ms    12.42 bm   (242 kb)

In all tests above, the receiving task was fixed in memory, so no disk load time is involved. The send-and-request, receive-and-exit tests show the penalty paid for starting and stopping a task.
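The data rates in the send/receive tables can be reproduced from the message rate and the message length. The short Python sketch below assumes 16-bit PDP-11 words and 1 kb = 1000 bits per second; the function name `data_rate_kb` is hypothetical. Under those assumptions the results agree with the table to within rounding.

```python
BITS_PER_WORD = 16  # PDP-11 word size

def data_rate_kb(messages_per_sec, words_per_message):
    """Effective data rate in kilobits per second (1 kb = 1000 bits/s)."""
    return messages_per_sec * words_per_message * BITS_PER_WORD / 1000

# Rates taken from the self-send tests in the article.
print(f"SDAT/RCVD: {data_rate_kb(226, 13):.1f} kb")   # prints 47.0 kb
print(f"VSDA/VRCD: {data_rate_kb(112, 256):.1f} kb")  # prints 458.8 kb
```

The comparison makes the trade-off concrete: the 256-word message costs about twice the time per call but carries almost 20 times the data.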
The TST utility results can also be used to measure how much CPU time is used to process an AST or to context switch between two tasks. AST time is measured by adding an AST to the QIOW and simple send/receive data tests. The results show the time RSX uses to declare and exit an AST is about the same as one block move.

    QIOW w/AST           257/sec     3.89 ms     2.85 bm
    QIOW                 401/sec     2.49 ms     1.83 bm
    ----------------------------------------------------
    AST processing                   1.40 ms     1.02 bm

    SDAT/RCVD w/AST      171/sec     5.85 ms     4.29 bm
    SDAT/RCVD            226/sec     4.42 ms     3.24 bm
    ----------------------------------------------------
    AST processing                   1.43 ms     1.05 bm

The same technique can be used to measure task context switching. The time needed for local send/receive processing can be compared against the time needed for the same set of directives in two tasks.

    S/R (2 task)         126/sec     7.94 ms     5.82 bm
    S/R (1 task)         150/sec     6.67 ms     4.87 bm
    ----------------------------------------------------
    Context switch                   1.27 ms     0.95 bm

The typical RSX system spends more time in the executive than executing your application code. As the results above show, RSX directives are not free. If accounting and other data indicate particular tasks are issuing large numbers of directives, you should question what function they are accomplishing with these directives. You will often find directives being issued inside a loop which could be moved outside the loop.

Directive timings are also important for establishing performance limits for your system. The TST utility or simple programs like TIMER can easily find the limits of your particular system.

I intend to publish a complete table of RSX directive timings in the future. This table will include as many different PDP-11 CPUs and versions of RSX-11M and RSX-11M-Plus as I am able to test. If you are able to run the TST utility on your system, please forward a copy of the results to me care of the DEC Professional.
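One way to apply the timings is a rough CPU budget: multiply each directive's measured time by the rate at which a task issues it. The Python sketch below uses PDP-11/24 figures from the tables in this article. The workload rates are invented for illustration and are not measurements, and the names `DIRECTIVE_MS` and `cpu_percent` are hypothetical.

```python
# Per-call times in milliseconds, copied from the PDP-11/24 results.
DIRECTIVE_MS = {
    "GSSW": 0.43,
    "SETF": 0.55,
    "ENCP/DSCP": 0.95,  # one disable/enable checkpointing pair
    "QIOW": 2.49,
    "SDAT/RCVD": 4.42,
}

def cpu_percent(calls_per_sec):
    """Estimate the percentage of the CPU spent in the listed directives."""
    busy_ms = sum(DIRECTIVE_MS[name] * rate
                  for name, rate in calls_per_sec.items())
    return busy_ms / 1000 * 100  # busy ms per 1000 ms of wall-clock time

# Hypothetical workload (an assumption, not data from the article).
workload = {"ENCP/DSCP": 150, "QIOW": 50, "SETF": 100}
print(f"{cpu_percent(workload):.1f}% of the CPU")  # prints 32.2% of the CPU
```

An estimate like this ignores context switches and AST delivery, so it is best read as a lower bound on executive overhead.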