.RIGHT MARGIN 80.LEFT MARGIN 5
.FIRST TITLE
.TITLE Sort-11 V3.0 performance
.SUBTITLE B.#Z.#Lederman
.PARAGRAPH
The following is some of the data I obtained when testing some variants of
Sort-11 V2.0 and V3.0. Statistics were obtained from Sort-11 itself, through
System Accounting, SRMLOG (a system measurement program), and with SPM-11.
.PARAGRAPH
The first test shown is sorting a file containing 20,000 records of 64 bytes
(2579 blocks), each record being a string of random numbers (the file is in
"random" order), sorted with four keys (1.8:57.8:9.8:17.8). The different
lines of data are for different numbers of work files: the first is with a
large number of work files (5 for V2.0, 7 for V3.0), the second is with the
default number of work files, and the third (when present) is for 3 work
files. The three letters at the left are the task name under which that
version was installed; the same names appear in the other tables.
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 30
                                 Data from SPM                System
                   Elapsed Time    CPU     DB1 F11ACP Over-  Accounting
Version  Type              sec.    sec.    QIO   QIO   lays  Directives
-----------------------------------------------------------------------
.BLANK
srt V2.0 as distributed     228    106    3149    31   88    9579
                            299    132    3687    89  199   11242
.BLANK
srr V2.0 non-overlayed      237    116    3061    32         3266
         resident library   306    141    3464   114         3954
.BLANK
srs V2.0 overlayed          218    112    2413    35    7    2614
         resident library   288    136    2780   122    7    3315
.BLANK
sr3 V3.0 as distributed     228    144    2241    86  110    2697
                            221    143    1832    72   88    2242
                            256    160    1971    51   64    2329
.BLANK
sr4 V3.0 resident library   211    145    1531    99   19    1947
                            211    146    1351    87   19    1740
                            239    161    1420    77   19    1736
.BLANK
sri V3.0 I/D space, non-    208    144    1398   101         1787
         overlayed,         209    146    1247   114         1626
         resident library   238    161    1298   104         1652
.BLANK
srj V3.0 as above, larger   200    142    1176    96         1568
         work space         204    143    1042   115         1432
                            231    159    1086   131         1490
.BLANK.JUSTIFY.FILL
When the test is run on an idle system (Sort was the only task of significance
running at the time), there does not appear to be much difference between
versions. Use of the resident libraries reduces overlays and reduces the
number of disk I/Os (there is more task image space for work area, so less
disk work space is required), but task elapsed time isn't much improved. In
the case of V2.0, building the task
non-overlayed is worse than the overlayed version: this is because only a few
overlays are needed (once the RMS overlays are taken out), and because less
task work area is available so more disk work area is used. Of the three, the
overlayed task linked to the RMS supervisor mode library places the least load
on the system, but this has little effect on an idle system. Comparing V3.0
with V2.0 as distributed, there is a considerable reduction in the number of
directives and QIOs, but little improvement elsewhere. Linking to
the RMS resident library produces benefits similar to those obtained with V2.0,
and the resulting task now runs with elapsed times comparable to V2.0 (note
that the actual percentage difference between all of the tasks is small). With
V3.0, however, a further improvement is obtained by building the task
non-overlayed, as it may be built as an I-and-D space task: this yields a net
increase in task work space, rather than the loss that occurred with V2.0
which cannot be built I-and-D space. (There is so much more code with V3.0,
primarily command parsing, that it cannot reasonably be built non-overlayed
without making it I-and-D space.)# The task now has both increased internal
work space and no disk overlays, which shows in the even
smaller numbers of QIOs and directives issued. Still, on an idle system, the
net elapsed time is not changed much, due to waiting times for disk I/O. It
might also be noted that V3.0 seems to be a little smarter about choosing the
best number of work files, and in using them effectively.
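.PARAGRAPH
The percentage differences discussed above can be recomputed from the table.
The following is a minimal sketch in Python (which was not used in the
original testing); it takes the first row for each task and compares it with
V2.0 as distributed. The names IDLE, percent and compare are made up for the
illustration.
.BLANK
.LITERAL
```python
# Idle-system figures copied from the first table: the first row for
# each task (the run with the large number of work files).
# Columns: elapsed seconds, DB1 QIOs, System Accounting directives.
IDLE = {
    "srt": (228, 3149, 9579),
    "srr": (237, 3061, 3266),
    "srs": (218, 2413, 2614),
    "sr3": (228, 2241, 2697),
    "sr4": (211, 1531, 1947),
    "sri": (208, 1398, 1787),
    "srj": (200, 1176, 1568),
}


def percent(new, base):
    """Percentage change of new relative to base (negative = reduction)."""
    return 100.0 * (new - base) / base


def compare(table, baseline="srt"):
    """Map each task to its (elapsed, QIO, directive) change in percent."""
    base = table[baseline]
    return {
        task: tuple(percent(value, ref) for value, ref in zip(row, base))
        for task, row in table.items()
    }


if __name__ == "__main__":
    for task, (elapsed, qio, direct) in compare(IDLE).items():
        print(f"{task}  elapsed {elapsed:+6.1f}%  "
              f"QIO {qio:+6.1f}%  directives {direct:+6.1f}%")
```
.END LITERAL
.PARAGRAPH
With these inputs, srj comes out at roughly 12 percent less elapsed time than
srt, against roughly 63 percent fewer DB1 QIOs and 84 percent fewer
directives: a large drop in the load placed on the system, but only a modest
change in elapsed time on an idle system.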
.PARAGRAPH
The above test was repeated with 10 other programs running, which were
written to load the CPU and disk so that the environment resembles what we
have on our production systems. The load from these programs is constant and
predictable, however, so that comparisons between the different versions of
Sort remain valid. The data file was reduced to 15,000 records (1875 blocks).
.BLANK.NO JUSTIFY.NO FILL.TEST PAGE 17
                                 Data from SPM
                   Elapsed Time    CPU     DB1 F11ACP Over-
Version  Type              sec.    sec.    QIO   QIO   lays
------------------------------------------------------------
.BLANK
srt V2.0 as distributed     332    108    2732   129   193
.BLANK
srr V2.0 non-overlayed      345    127    2532   137
.BLANK
srs V2.0 overlayed, reslib  304    119    1868   124
.BLANK
sr3 V3.0 as distributed     254    132    1328    81    88
.BLANK
sri V3.0 I/D space          242    133     867   115
.BLANK
srj V3.0 as above, larger   238    131     696   127
.BLANK.JUSTIFY.FILL
Now that there are other programs on the system competing for CPU time, and
more importantly, disk I/O, the improvement obtained is more obvious. Use of
the resident library has improved V2.0 elapsed time by almost 10%; V3.0 is now
somewhat faster than V2.0, and a small but significant improvement can be seen
in the non-overlayed versions of V3.0 over the distributed version.
Reducing the number of QIOs means less competition for disk access, and
reducing the number of directives issued means less CPU time is taken up by
the operating system, which allows the program to get more work done within
its time slice. Increasing the internal work space also lets the program
waste less time and disk resource, as it does not have to reference the disk
work files as often.
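.PARAGRAPH
The improvements quoted for the loaded system can be checked against the
second table in the same way. The following is a minimal sketch in Python
(not part of the original testing); the names LOADED, RECORDS, saving and
rate are made up for the illustration.
.BLANK
.LITERAL
```python
# Loaded-system figures copied from the second table.
# Columns: elapsed seconds, DB1 QIOs.
LOADED = {
    "srt": (332, 2732),
    "srr": (345, 2532),
    "srs": (304, 1868),
    "sr3": (254, 1328),
    "sri": (242, 867),
    "srj": (238, 696),
}
RECORDS = 15000  # records in the reduced data file


def saving(task, baseline, column=0):
    """Percent reduction of task against baseline (0=elapsed, 1=QIOs)."""
    base = LOADED[baseline][column]
    return 100.0 * (base - LOADED[task][column]) / base


def rate(task):
    """Records sorted per elapsed second."""
    return RECORDS / LOADED[task][0]


if __name__ == "__main__":
    pairs = [("srs", "srt"), ("sr3", "srt"),
             ("sri", "sr3"), ("srj", "sr3")]
    for task, baseline in pairs:
        print(f"{task} vs {baseline}: "
              f"elapsed -{saving(task, baseline):.1f}%, "
              f"QIO -{saving(task, baseline, 1):.1f}%, "
              f"{rate(task):.0f} records/sec")
```
.END LITERAL
.PARAGRAPH
On these figures the resident library saves V2.0 about 8 percent of elapsed
time (332 to 304 seconds), and V3.0 as distributed takes about 23 percent
less elapsed time than V2.0 as distributed.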
.PARAGRAPH
Note that all of the testing was done with RECORD sorts: I have not tested
other types of sorts, as we do not use them much here. I have tested the
different versions with different data files: with small (less than 3000
record) files, there is less difference between the different versions. With
very small files (less than 100 records), V3.0 is usually a little slower:
apparently it takes longer to parse out the commands (there are more of them,
especially with specification files) and initialize the task than V2.0 did, but
usually the difference is very small, and can only be seen by measuring it (it
does not appear to a person sitting at a terminal that the sort took longer).
We have settled on V3.0 linked to the RMS library, but overlayed: we would like
to go to the I-and-D space version, but there are other problems (like the
inability to install it in VMR) that prevent us from doing so at this time.