RSX11M System Tuning and Performance Optimization Workshop

GOALS
  o Document Which Areas Can Be Optimized
  o How Much Improvement Is Possible
  o How Much Effort Is Required

CONCLUSION
  o 10% To 50% (or more) Improvements in Overall Performance Can Be
    Realized With Moderate Effort.
  o Performance Improvements of 200% or more in Specific Application
    Tasks Are Possible.
  o RSX11M Can Be Friendly. Ease of Program Development Can Be
    Significantly Improved.

TYPES OF OPTIMIZATION
  o ADD FUNCTIONALITY TO OPERATING SYSTEM
  o OVERALL SYSTEM THROUGHPUT
  o SUBSYSTEM OR APPLICATION SYSTEM THROUGHPUT
  o EASE OF USE (SYSTEM/SUBSYSTEM)
  o MEMORY USAGE (size * core-time)
  o POOL USAGE (Discussed earlier)
  o DISK USAGE

TOPICS and Speakers

John Covert - DEC (Panel)

Jim Downward
  OVERVIEW      - What I'm saying now.
  FAST DISKS    - Effects of using a DISK EMULATOR
  CCL           - How to save pool and make the system friendly
  FCSRES        - Benefits of building EVERYTHING with FCSRES.
  LDLIB         - Transient resident libraries.
  TUNING        - What you can do to optimize performance
  PERFORMANCE MEASUREMENT - How to do it, a case study

Bob Dosen
  FAST DISK I/O - High rate I/O to disks (RL01/RK05)
                - Opening files without FCS. Direct QIO's to Disk.
                  Dynamic allocation of FCB's.
  EXTENDING RSX - Task address space expansion (Supervisor mode space
                  on 11/45's, control with pseudo directives)
                - PLAS directives in User controlled partitions.
                - Adding directives using illegal instruction traps
                - Dynamic common areas.

Brian McCarthy - DEC (To be announced)

Art Perlo
  INTERRUPT OVERHEAD - Measurement of interrupt overhead for TTDRV
                       (full/half duplex) and LPDRV.

HARDWARE OPTIMIZATION
  o ADD CACHE MEMORY TO IMPROVE THROUGHPUT
  o FIXED HEAD DISKS OR DISK EMULATORS
  o BALANCE LOADS ON EXISTING DISKS
  o DMA TERMINAL INTERFACE

PERFORMANCE MEASUREMENT TECHNIQUES
  o MEASURE WITH USERS OFF SYSTEM.
  o ONLY MODIFY ONE VARIABLE AT A TIME.
  o CREATE STANDARD COMMAND FILES TO EXERCISE SELECTED SYSTEM FEATURES.
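
The "one variable at a time" rule above can be sketched as a small test
planner. This is only an illustration, not part of the workshop material:
the function names, the RBN/SWP labels used as dictionary keys, and the
candidate values other than MAXPKT = 0 and 15 are assumptions chosen for
the example. "Percent faster" is taken here to mean the reduction in
elapsed time relative to the baseline run, which is also an assumption.

```python
def one_at_a_time(baseline, candidates):
    """Build a run list: the baseline first, then runs that each change one variable."""
    runs = [("baseline", dict(baseline))]
    for name, values in candidates.items():
        for value in values:
            if value == baseline[name]:
                continue  # this setting is already covered by the baseline run
            config = dict(baseline)   # copy, so every run differs in one place only
            config[name] = value
            runs.append((f"{name}={value}", config))
    return runs


def percent_time_saved(baseline_seconds, trial_seconds):
    """Elapsed-time reduction relative to the baseline run, in percent."""
    return 100.0 * (baseline_seconds - trial_seconds) / baseline_seconds


# Hypothetical settings for illustration (intervals in tics).
BASELINE = {"MAXPKT": 0, "RBN": 5, "SWP": 40}
CANDIDATES = {"MAXPKT": [0, 15], "SWP": [20, 40, 60]}

if __name__ == "__main__":
    for label, config in one_at_a_time(BASELINE, CANDIDATES):
        print(label, config)
```

Each generated run would be timed with the same standard command file and
with users off the system, so that any difference from the baseline can be
attributed to the single changed setting.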

CASE STUDY: A SYSTEM TERMINAL LOAD SIMULATOR
  o QUESTION - WHAT ARE THE LIMITS OF EXPANSION AND USE OF OUR EXISTING
    SYSTEM?
  o GOALS:
    - TEST WITHOUT ADDING ADDITIONAL TERMINALS.
    - PROFILE SYSTEM THROUGHPUT AS THE NUMBER OF USER TERMINAL LOADS
      INCREASES.
    - SIMULATE WORST CASE POOL USAGE AS THE NUMBER OF TERMINALS
      INCREASES.
    - MONITOR SYSTEM PERFORMANCE PARAMETERS.
    - MONITOR USAGE OF THE SYSTEM DISK LB:.
    - PINPOINT SYSTEM BOTTLENECKS.

SYSTEM TERMINAL LOAD SIMULATOR
  o DRIVEN BY SINGLE COMMAND FILE
  o SYSTEM STATISTICS DATA GATHERED BY KMS ACCOUNTING
  o SEPARATE TASKS MONITOR SYSTEM DISK (LB:) USAGE AND SNAPSHOT
    ACCOUNTING DATA (CPU TIME etc.)
  o SIMULATES 'N' SEPARATE JOB STREAMS

DESIGN OF A SYSTEM TERMINAL LOAD SIMULATOR
  o CREATE nn COPIES OF INSTALL, I00,I01,...Inn
  o PASS COMMAND LINES TO NONINSTALLED TASKS VIA INSTALL
        Inn FOO/TASK=FOOnn/RUN=REM/PRM="cmdline"
  o CREATE PROCEDURE COMMAND FILES LOADTnn.PRC CONSISTING OF A SEQUENCE
    OF INSTALL /RUN=REM COMMANDS.
    EX. Inn $PIP/TASK=PIPXnn/RUN=REM/PRM="FOOXnn.OBJ;1/DE"
        Inn $F4P/TASK=F4PXnn/RUN=REM/PRM="FOOXnn=FOO"
  o SUBMIT EACH PROCEDURE FILE TO A SEPARATE COPY OF PINXnn, ONE FOR
    EACH PSEUDO TERMINAL.
  o CONTROL AND TIME THE EXECUTION OF THE PROCEDURE FILES FROM THE
    CENTRAL INDIRECT COMMAND FILE.

PRODUCING AN OPTIMIZED 'TUNED' SYSTEM
  o USE CCL
  o USE BATCH ON VERY LOADED SYSTEM
  o REDUCE SYSTEM SWAPPING
  o INCLUDE QIO OPTIMIZATION
  o TUNE ROUND ROBIN AND SWAP INTERVAL FOR SYSTEM DISK
  o SELECT OPTIMAL F11ACP
  o TUNE TASK MEMORY USAGE TO MINIMIZE SWAPPING
  o TUNE SPECIFIC TASKS TO MINIMIZE SIZE*CORE-TIME OR MAXIMIZE SPEED
    (as required).
  o USE RESIDENT LIBRARIES

BATCH
  o PROGRAM DEVELOPMENT TASKS (F4P, TKB, MAC) ARE GENERALLY THE LARGEST
    USERS OF SYSTEM RESOURCES.
  o IF THE SYSTEM IS SUFFICIENTLY LOADED, GREATER THROUGHPUT IS OBTAINED
    BY QUEUEING ALL USERS THROUGH 'n' PROGRAM DEVELOPMENT (BATCH)
    STREAMS.

REDUCE SYSTEM SWAPPING
  o The new V3.2 RMDEMO will cause a heavily loaded system to swap to
    death.
    It will reduce system throughput by 26% or more on a heavily loaded
    system.
    SOLUTION: Use the old version of RMDEMO. The version 3.2 update is
    on the fall 1979 San Diego RSX SIG tape.
  o FIX the SHUFFLER (SHF...) in its own partition. This will increase
    throughput by 5% on a heavily loaded system.
  o Adjust the ROUND ROBIN and SWAP times for maximum throughput.
    (more on this later)
  o Make average task size smaller or memory larger.
    (more on this later)

QIO OPTIMIZATION
  o Only tested effect of MAXPKT (MAXPKT = 0, and MAXPKT = 15)
  o Use NL: as output device. (Tests QIO processing time, not driver
    time.)
  o QIO time 6 - 11% faster with MAXPKT = 15
  o Software timing tests.
    I)  Dump an RK05 to NL: with PIP.
        Transfer is 2.6% faster with MAXPKT = 15.
    II) Six identical tasks executing simultaneously. Each task writes
        10,000 to NL:
        Job runs 6 - 7% faster with MAXPKT = 15.
  o CONCLUSION: QIO Optimization speeds up QIO processing by 6 - 11%,
    depending on your system. This does not include the effect of the
    BLXIO transfer vector, which will speed up processing still
    further. Actual system throughput gains will be smaller because of
    driver processing time.

ROUND ROBIN SCHEDULER AND SWAPPING INTERVAL
  o TEST EFFECT ON THROUGHPUT OF SYSTEM WITH TWICE AS MANY TASKS TRYING
    TO EXECUTE AS THERE IS FREE CORE
  o USE THE PROGRAM 'SCH' TO MODIFY ROUND ROBIN INTERVAL AND SWAPPING
    INTERVAL 'ON LINE'. (On 1979 San Diego DECUS SIG Tape)
  o SYNTAX: SCH /RBN:nn/SWP:mm

CONCLUSIONS
  o Throughput is insensitive to ROUND ROBIN time for 'reasonable'
    values (5 tics or so).
  o The graph of execution time vs Swap Interval shows that the Swap
    Interval should be 40 tics or more for an RK07. This value will
    vary with disk type.

MEMORY USAGE
  o Examine Utility and Privileged Task Build Command Files. Replace
    PAR = GEN:XXX:YYY with PAR = GEN as appropriate. Not all tasks need
    be built to 8K in size.
    EXAMPLE: LPP.TSK gets built 4K too large.
             INS.TSK gets built 1K too large.
  o If you have the Extend Task directive, install BIGTKB such that its
    size is slightly (2-4K) under the size needed by an average
    taskbuild. For heavily overlaid tasks (BP2, SYSGEN) use a 32K size
    taskbuilder.
  o Since TKB is one of the biggest core users, speed up taskbuilds by
    building with resident libraries.
  o If there is no room for permanent resident libraries, do
    developmental work with transient resident libraries.
  o Decrease the amount of time large tasks are in memory. Speed task
    up (big buffering, FCSRES, ?).
  o Use BATCH on crowded development systems.
  o Use a Procedure Interpreter (PIN) instead of ...AT. for simple
    command files. PIN is 4 times smaller than Indirect. Use INDIRECT
    to create PIN control files on the fly.

CCL - CONSOLE COMMAND LANGUAGE
  o Originally created by Richard Kirkman.
  o Two components.
    a) A file driven catchall task (...CA.) interprets non-MCR commands.
    b) Passing command lines to uninstalled tasks.
  o Frees up POOL!!!
  o Speeds program development
  o Makes system very, very friendly. (too friendly??)
  o More versatile than VAX's DCL.
  o NO SYSTEM SHOULD BE WITHOUT IT!!

FCS RESIDENT LIBRARY
  o Build all system tasks with FCSRES, including all DEC utilities.
  o TKB command and ODL files to do this are on the Fall 1979 DECUS SIG
    TAPE.

DISADVANTAGES
  o More tasks to build at first SYSGEN.
  o On-Line SYSGEN from one base level of RSX11M to another is hard.
  o Patching an FCS BUG requires rebuilding all tasks linked to FCSRES.

FCS RESIDENT LIBRARY ADVANTAGES
  o Faster taskbuild times for application tasks. TKB in core less time.
  o Smaller on-disk task size. Standard DEC utilities on average use
    23% less space.
  o Smaller in-core task size (.5 to 2K for overlaid DEC tasks, 3-4K
    for unoverlaid tasks).
  o Initial task load faster. The average DEC utility loads 13% faster.
  o LESS LOADING on system disk.
  o Smaller tasks swap faster.
  o More tasks can fit in core.
  o COT..., and the new QUEUE manager become small enough to use on a
    system smaller than a PDP 11/70.
  o Tasks are less overlaid and run faster.

FCS RESIDENT LIBRARY TIMING BENCHMARKS

                                                  Speed Improvement
  PIP     NL:=DM:[*,*]*.*/FU                      2.4 times faster
          NL:=[1,2]*.HLP                          1.2 times faster
  BIGMAC  NL:=HELLO.MAC                           1.1 times faster
  ...AT.  CMD file loop including
          (.INC, .IF, .TESTFILE, .GOTO)           1.7 times faster

ON-DISK TASK SIZE (blocks)

  TASK        No FCSRES     With FCSRES
  BIGMAC          71             57
  BIGTKB         161            145
  CDA            159            114
  CMP             50             29
  CRF             36             24
  DMP             57             41
  EDI             60             41
  EDT            108             88
  FLX            129            106
  FMT             65             57
  IOX             99             79
  LBR             72             52
  PAT             44             25
  PIP             67             51
  SLP             48             30
  VFY             57             39
  VMR            144            124
  ZAP             38             25
  --------------------------------------
  TOTAL         1465 blocks    1126 blocks

TRANSIENT RESIDENT LIBRARIES FOR RSX11M
  o Developed by Brian McCarthy (DEC). Packaged and cleaned up by KMS.
    !!!ON FALL DECUS TAPE!!!
  o Allows linking to a non-resident Resident Library.
  o When the task is run, the library is loaded and linked to the task
    as a PLAS region.
  o Current support for F4PRES and BP2RES/RMSSEQ resident libraries.
  o OTS initialization modules are replaced with a call to LDLIB to
    load and link the required library prior to OTS initialization.
  o Requires taskbuilding with an STB file (vs using LIBR=).
  o Multiple tasks will use the same PLAS resident region.
  o The PLAS region is NOT checkpointable.
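
Returning to the ON-DISK TASK SIZE table earlier in this section, the
"23% less space" figure can be checked with a few lines of arithmetic.
The sketch below is an illustration only: the function names are made up
for the example, and computing the saving from the column totals (rather
than averaging per-task savings) is an assumption about how the figure
was derived. The With FCSRES column as listed sums to 1127 blocks, one
more than the printed total of 1126; the saving rounds to 23% either way.

```python
# (task, blocks without FCSRES, blocks with FCSRES), copied from the table.
SIZES = [
    ("BIGMAC", 71, 57), ("BIGTKB", 161, 145), ("CDA", 159, 114),
    ("CMP", 50, 29), ("CRF", 36, 24), ("DMP", 57, 41),
    ("EDI", 60, 41), ("EDT", 108, 88), ("FLX", 129, 106),
    ("FMT", 65, 57), ("IOX", 99, 79), ("LBR", 72, 52),
    ("PAT", 44, 25), ("PIP", 67, 51), ("SLP", 48, 30),
    ("VFY", 57, 39), ("VMR", 144, 124), ("ZAP", 38, 25),
]


def summarize(sizes):
    """Return (total without, total with, percent of disk blocks saved)."""
    total_without = sum(row[1] for row in sizes)
    total_with = sum(row[2] for row in sizes)
    saved_percent = 100.0 * (total_without - total_with) / total_without
    return total_without, total_with, saved_percent


def biggest_savers(sizes, count=3):
    """Tasks ranked by fractional reduction in on-disk size, largest first."""
    ranked = sorted(sizes, key=lambda row: (row[1] - row[2]) / row[1], reverse=True)
    return ranked[:count]


if __name__ == "__main__":
    without, with_res, saved = summarize(SIZES)
    print(f"{without} -> {with_res} blocks, {saved:.1f}% saved")
    for task, before, after in biggest_savers(SIZES):
        print(f"{task}: {before} -> {after} blocks")
```

The per-task ranking shows that the small utilities benefit most in
relative terms, since FCS code makes up a larger share of their images.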