Pool

It is the middle of the afternoon and there are users at all twenty terminals on your RSX system. Suddenly the system wheezes to a stop. Your system has just run out of pool.

No single problem plagues more RSX systems than running out of system pool. Low pool is a particularly bothersome problem because it happens only when the system is busiest. Even worse, the obvious cure of doing less work is not generally acceptable.

Pool problems can be fixed. This article looks at what pool is and why it is so important. In the second part, the article lists techniques you can use to maximize available pool and minimize pool usage. Finally, the procedure for doing a 'pool audit' is discussed.

Pool is the RSX executive's data storage area. Almost everything that RSX needs to keep track of is stored in pool. Pool is in a constant state of flux as tasks enter and leave memory, read and write disk files, and exchange messages with each other.

The highly changeable nature of RSX requires dynamic rather than static databases. For instance, rather than maintain a static table that allows some maximum number of programs, RSX allocates a control block for each separate task and chains the task control blocks (TCBs) together. The only static data element is the TCB list header.

The formal RSX name for pool is Dynamic Storage Region, or DSR. Pool is simply an area of physical memory from which RSX allocates and deallocates buffers of different sizes to use as data structures. The free space in the region is maintained as a linked list of free buffers. The buffers are linked by increasing memory address. The first word in each free buffer points to the next free buffer. The next word is the size of the buffer in bytes. This two-word control structure requires RSX to round up all buffer sizes to the next double-word boundary.

The executive routines $ALOCB and $DEACB manage pool. $ALOCB allocates a buffer using a first-fit algorithm.
The routine scans through the free space and stops when it finds a buffer greater than or equal to the requested size. $DEACB is called when an executive data structure is no longer used. The buffer space is returned to the appropriate point in the free list. $DEACB merges the new space with any immediately adjacent free space to form the largest possible free buffer.

If there were no limit to the size of physical memory that could be used for pool, RSX would not have pool problems. However, the design of the RSX executive limits pool to the first 20K of physical memory. The maximum size of pool is thus 20,480 words minus whatever executive code is loaded into this area.

There are very good reasons for the 20K boundary. The PDP-11 memory management unit splits the 32K address space into eight 4K units. The RSX executive must allocate one of its mapping registers to the I/O page. Two more registers are made available for dynamically mapping different objects: device drivers, privileged task code, or user buffers. The remaining five mapping registers provide a total of 20K to permanently map the RSX executive and pool.

Almost every RSX directive has an impact on pool. As stated before, pool is the executive's data storage area. Any information that RSX needs to keep track of an event is stored in pool. For events like new programs or I/O requests, new data structures need to be allocated. If a structure cannot be allocated, RSX cannot continue to process the directive.

When a pool allocation failure is returned to a program, one of three actions is possible: loop and keep issuing the request, ignore the failure and continue, or give up and abort the program. If the operation and the program are nonessential to system operation, the last alternative is perhaps the best. When the task exits, all the pool it was consuming will be released. This may be sufficient to allow some critical function to proceed.
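
To see how an allocation can fail even while free space remains, the free list managed by $ALOCB and $DEACB can be modeled in a few lines. The following is a minimal Python sketch, not the executive's actual code: the class name Pool, its methods, and the 64-byte pool size are hypothetical, and the only behavior taken from the article is the address-ordered free list, first-fit search, rounding to a double-word (4-byte) multiple, and merging of adjacent free space on release.

```python
GRANULE = 4  # bytes: two 16-bit words, the size of the free-block header


def round_up(size):
    """Round a byte count up to the next double-word (4-byte) multiple."""
    return (size + GRANULE - 1) // GRANULE * GRANULE


class Pool:
    """Toy model of a first-fit pool (hypothetical; not RSX source code)."""

    def __init__(self, base, size):
        # Free list of [address, size] pairs kept in increasing address order.
        self.free = [[base, size]]

    def alloc(self, size):
        """Return the address of the first free block that fits, else None."""
        size = round_up(size)
        for i, (addr, length) in enumerate(self.free):
            if length >= size:
                if length == size:
                    del self.free[i]  # exact fit: unlink the whole block
                else:
                    # Take the front of the block and keep the tail free.
                    self.free[i] = [addr + size, length - size]
                return addr
        return None  # failure: no single fragment is large enough

    def dealloc(self, addr, size):
        """Return a block to the free list and merge adjacent free space."""
        size = round_up(size)
        i = 0
        while i < len(self.free) and self.free[i][0] < addr:
            i += 1
        self.free.insert(i, [addr, size])
        # Merge with the following block if it starts where this one ends.
        if i + 1 < len(self.free) and addr + size == self.free[i + 1][0]:
            self.free[i][1] += self.free[i + 1][1]
            del self.free[i + 1]
        # Merge with the preceding block if it ends where this one starts.
        if i > 0 and self.free[i - 1][0] + self.free[i - 1][1] == addr:
            self.free[i - 1][1] += self.free[i][1]
            del self.free[i]


pool = Pool(base=0, size=64)
a = pool.alloc(10)      # rounded up to 12 bytes
b = pool.alloc(20)
c = pool.alloc(16)
pool.dealloc(a, 10)
pool.dealloc(c, 16)
print(pool.free)        # [[0, 12], [32, 32]]: 44 bytes free in two fragments
print(pool.alloc(40))   # None: no single fragment holds 40 bytes
pool.dealloc(b, 20)
print(pool.free)        # [[0, 64]]: everything merges back into one block
```

In this toy run, 44 bytes are free but a 40-byte request still fails because the space is split into two fragments. Releasing the block between them lets the merge step rebuild a single 64-byte buffer.
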
If the operation is so critical that you must loop and keep issuing the request, a wait for significant event (WSIG$S) directive should be issued before looping. Otherwise, no CPU time will be passed to any tasks with lower priority. Any event that frees system pool will cause a significant event.

One common type of pool failure could be named the spike effect. One example of a pool spike is using PIP to spool a large directory to a line printer (PIP *.*/SP). PIP sends a separate message to the queue manager for each file. As these messages accumulate faster than they are processed, available pool can hit bottom.

Sometimes programming errors can cause an infinite spike and deplete pool. One common error is a loop that issues reads to a terminal. If a nonblocking QIO is used, the program will keep queuing I/O requests and consume pool.

A rarer programming error is the bleeder. Something slowly bleeds pool away. One application I wrote years ago failed because a program sent a message once an hour to a task that was issuing its receive data directive incorrectly. After about ten days of up time, pool would run out.

RSX has an executive feature called pool monitoring which can help recover from low pool conditions. The executive tracks pool usage and notifies a special task when pool is low. This task, PMT, will block any new programs from running and warn users on the system to beware of low pool conditions. When pool rises above some preset high-water boundary, PMT returns the system to normal operation.

When pool becomes critically low and no fragment larger than 84 bytes is available, PMT goes into a second mode. PMT takes over the system and lists potentially abortable tasks on the console terminal. You can then select some task(s) for PMT to abort. Potentially, enough pool can be recovered for the system to continue.

Maximize Available Pool
-------- --------- ----

The most common cause of pool failure is too much work and too little pool.
Pool problems are attacked from two angles: maximizing available pool and minimizing pool usage. You accomplish the first by removing as much code as possible from the first 20K of memory and setting the top of pool to 20K.

If you are running RSX-11M on a 22-bit system, the most effective solution to pool problems is to convert to RSX-11M-Plus. RSX-11M-Plus includes many features not available in RSX-11M which increase the available pool and minimize pool usage. The sum of these features, especially on I/D space systems, can double or triple available pool. Many of the problems discussed below have already been solved for RSX-11M-Plus systems.

Pool problems have been with RSX since RSX-11M V1.0. RSX Software Engineering has taken many actions to maximize available pool. The initial step was to raise the pool limit from 16K to 20K. Other RSX releases used loadable device drivers, a loadable task loader, and directive commons to move code out of the 20K address space.

One major initial justification for RSX-11M-Plus was to use the I/D feature of PDP-11/70s to separate executive code (I) from pool (D). The I/D space feature allows RSX-11M-Plus executives to support 20K code and 16K pool segments. The first 4K of memory must be mapped by both I and D space to handle interrupt vectors and the kernel stack.

Whenever you generate an RSX system, there are several steps you should take to maximize potentially available pool. These steps should be taken even if pool is not an immediate problem and the top of pool is not set to 20K. If and when pool allocation failures occur, you will be able to raise the top of pool and avoid a new system generation.

The large (20K) executive should always be chosen. The only advantage of a 16K executive is that privileged tasks may be 4K larger. No Digital-supplied privileged tasks use the additional space. You should always select features that move executive code out of pool space.
Features such as executive commons, loadable drivers, and a loadable task loader cost little in terms of physical memory and yield substantial pool gains.

Some RSX features take substantial code to implement and therefore subtract from available pool. These features should be chosen only if there is a valid reason. The most costly feature is XDT. The executive debugging tool is 1K in size and should be selected only if you intend to write device drivers and privileged code. Other optional features that generate over 500 bytes of code include crash dump, connect-to-interrupt, parity memory, powerfail, and error logging. While normally included, these features can be dropped if pool is a critical concern. The other RSX features that you can choose do not add greatly to the size of the executive, and they are safe to select even if there is no immediate use for them.

Minimize Pool Usage
-------- ---- -----

Digital has also made many improvements to minimize pool usage by moving data structures from the executive pool to other areas in memory. Both the terminal driver and the file system (F11ACP) keep separate pools for their private data structures. Only when these spaces are exhausted will system pool be used.

One major improvement available only in RSX-11M-Plus is secondary pool. This is a pool area that sits outside the 20K boundary and is used for data structures that are not referenced often by the executive. Another RSX-11M-Plus-only feature is external task headers. Task headers vary from 60 to over 500 words in size. RSX-11M-Plus moves the headers from pool to right below the task in memory.

You can help minimize pool usage by making sure these alternative pools are fully utilized. The terminal driver can be a maximum of 8K. This allows about a 3K private pool. If you have pool problems and over eight terminals, you should consider using VMR to change the partition size to the maximum. The private F11ACP pool is used for the file control blocks.
Each open file uses a 22-word FCB. The different F11ACP tasks have private pools of different sizes. The small and minimum versions have no internal pool. The middle F11ACP version (F11MDL) can hold only 4 FCBs. The large version of F11ACP (F11LRG) holds 40 FCBs in its pool. You can adjust the size of the F11ACP pool by editing the task build file and changing the size of program section $$AFR1. Be careful not to move the start of $$BUF3 beyond 160000, or system crashes will result.

You also minimize pool usage by eliminating unnecessary uses of pool. One very common example is installed tasks. Task control blocks take from 24 to 40 words of pool depending on the type of system. This can be a significant amount of space when a system has over a hundred tasks installed. A task needs to be installed only when it is going to be run. Otherwise, its TCB is using pool unnecessarily.

The most common installed tasks are those installed with names of the form ...xxx. Such tasks are called MCR tasks and include PIP, MAC, LBR, and TKB. Often a system will have 40-60 MCR tasks installed when only a handful are actively used. This does not severely affect an RSX-11M-Plus system, since the MCR task TCBs are stored in secondary pool. However, the TCBs can have a major impact on RSX-11M system pool. Users should be trained to install, run, and remove infrequently used tasks.

Task headers are another large consumer of pool, particularly on RSX-11M systems, which lack the external header feature available on RSX-11M-Plus. A task header is allocated only when a task is in memory, so there are no unnecessary headers. However, task headers can be too large. A two-word entry is allocated in the header for each logical unit available to a task. The number of logical units is set by the task builder UNITS option. 100 words of pool would be wasted if a task used logical units 1-5 and 56: entries are allocated for all 56 units, but only 6 are used, leaving 50 two-word entries idle.

Attachment descriptors are small data structures used to link tasks to regions of memory.
If your application uses the Program Logical Address Space (PLAS) directives, an attachment descriptor is created every time a region is attached. It is easy to accumulate a large number of attachment descriptors by forgetting to detach from regions when finished. It is even possible to loop and keep attaching the same region over and over.

Open files can also consume large pool buffers. Each logical unit with an open file has an associated executive data structure called a window block. This structure is used to map the file's virtual block numbers to the actual disk blocks. RSX uses a scheme called 2,4-window retrieval pointers. A retrieval pointer is a three-word cell. The first two bytes hold the size of the disk block segment. The next four bytes hold the physical block address of the first block. If a 30,000-block file is contiguous, only one retrieval pointer is needed to map the entire file. If the same file is fragmented into ten-block segments, a window block would have to be 9,000 words long to hold all 3,000 retrieval pointers. Such window blocks would soon drain pool even on VMS systems.

Window blocks are therefore limited to some number of retrieval pointers. The default number is fixed when the disk is initialized and/or mounted and cannot exceed 127. A program may change the number of retrieval pointers when it opens a file. The normal number of retrieval pointers is seven. When a disk I/O touches a disk block not mapped by the current window, the file system is called to update the window. This operation is commonly called a 'window turn'.

Two main problems arise with window blocks. If a volume default is set to some high number to reduce window turns and get good disk performance, much of the space in windows will be wasted when small files are opened. For instance, if the window size is set to 20 retrieval pointers, each window costs 67 words (there is a seven-word fixed overhead). 57 words will be wasted for each open file that needs only one retrieval pointer.
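
The window block figures quoted above follow from simple arithmetic, which the short Python sketch below reproduces. It uses only the numbers given in the text (a seven-word fixed overhead and three words per retrieval pointer); the function names are hypothetical and chosen for illustration, and the fragmentation helper assumes equal-size segments.

```python
WINDOW_OVERHEAD_WORDS = 7  # fixed overhead per window block (from the text)
WORDS_PER_POINTER = 3      # 2-byte segment size + 4-byte starting block


def window_words(pointers):
    """Pool words used by a window block with room for `pointers` pointers."""
    return WINDOW_OVERHEAD_WORDS + WORDS_PER_POINTER * pointers


def wasted_words(window_size, pointers_needed):
    """Words left idle when a file needs fewer pointers than the window holds."""
    used = min(pointers_needed, window_size)
    return window_words(window_size) - window_words(used)


def pointers_for(file_blocks, segment_blocks):
    """Pointers needed if the file is split into equal-size contiguous segments."""
    return -(-file_blocks // segment_blocks)  # ceiling division


print(window_words(20))                               # 67
print(wasted_words(20, 1))                            # 57
print(pointers_for(30000, 10))                        # 3000
print(pointers_for(30000, 10) * WORDS_PER_POINTER)    # 9000
print(window_words(7))                                # 28
```

The last line shows the cost of the normal seven-pointer window: 28 words of pool for each open file.
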
A solution to wasted space is to set the default window size to full mapping. Now RSX creates windows that exactly map existing files. Initially, the system performs well. As time progresses and files become more fragmented, more and more pool will be used by large window blocks. The full mapping option is very useful, but requires diligence to make sure files do not become excessively fragmented.

Auditing Pool
-------- ----

A pool audit combines all of the suggestions made previously into a formal procedure. The result should be a system whose pool usage is tuned to almost the last byte. A pool audit will take 2-5 days to perform. Before beginning an audit, make sure that the obvious steps have been taken: loadable device drivers, loadable loader, executive commons, no XDT support, and no large number of unnecessary installed tasks.

A pool audit starts by considering whether any executive features could be removed from the current system. The symbols in the executive configuration file RSXMC.MAC are divided into those features required for your application and features that are useful but not needed. For instance, error logging is a very useful feature for maintaining your system, but almost all applications would work fine if error logging were removed. There will usually be 6 to 12 items on the typical list of optional features.

The next step is to determine exactly how much memory each feature takes. You are only concerned with code that is included in the 20K address space. Code in the directive commons or device drivers can be ignored. There is no magic method to total the code that is conditionally controlled by each symbol. If you have a pattern matching utility, you can search the sources in [11,10] of the distribution kit to find which executive modules have code controlled by the symbols. The listings of these modules can be examined to determine how much code is controlled by the symbols.
The resulting sizes help you judge whether to trade the executive feature for the gain in pool. For instance, alternate CLI support may have been chosen, but little use is made of DCL and there are no user-written CLIs. In that case, CLI support is not worth the code it generates in the executive.

Once the purpose of all code in the executive is justified, the next step in the pool audit process is to justify all uses of pool. You start by taking a crash dump of the system. The best time for the dump is when pool failure occurs. Otherwise you need to force a system crash during normal processing in order to get meaningful information about pool usage.

The crash dump is analyzed using the /ALL switch to get a full listing that includes a dump of pool. Using highlighting pens of different colors, you color the various pool data structures. You start with free pool segments and continue with task control blocks, task headers, device data structures, etc. You should also be compiling lists of each data structure. Next month's article will take a detailed look at the RSX data structures. For immediate information, see the RSX-11M/M-Plus Crash Dump Reference and the Guide for Writing an I/O Driver manuals.

When finished, you will be able to account for almost all of pool. Anything not colored is some minor structure and can safely be ignored. Now you know how much pool is being used for each individual task. With this knowledge, you can use the hints for minimizing pool usage discussed earlier and make the necessary optimizations. Often, you will find unexpected uses of pool such as large numbers of clock queue entries.

The last pool audit I performed on a PDP-11/70 system recovered 2.6KW of pool. In this case, the vast majority came from file windows on large fragmented disk files. More importantly, you will be able to relate pool usage to your application and find the boundary conditions placed on your application by pool limitations.
This knowledge can be used to program limits into your applications or to justify new systems.

Pool is a bounded resource. It is possible that even the most extensive optimizations will still leave your system short of pool. You will, however, be getting the maximum work from your current system while planning for the next generation of systems.