DEVELOPING AN RSX-11M ACP IN A HIGHER ORDER LANGUAGE

Carl T. Mickelson
Goodyear Aerospace Corporation
1210 Massillon Road
Akron, Ohio 44315

ABSTRACT

Developing an Ancillary Control Processor (ACP) in a higher order language for RSX-11M is not difficult. This paper describes the development of an ACP to control a complex permuting memory for a large scale parallel processor. The central portions of the ACP are written in FORTRAN. The program was originally written as an interactive program to aid in the development and debug cycle. Once the interactive implementation was completed, the ACP form of the program was implemented using the same object library that was used to create the interactive program. The paper discusses the basic design considerations that permitted this approach to be successful. A discussion of some of the implementation details, and of how existing DECUS ACP documentation was used, is also included.

INTRODUCTION

In May 1983 Goodyear Aerospace Corporation (GAC) delivered the Massively Parallel Processor (MPP) to the NASA Goddard Space Flight Center (GSFC) in Greenbelt, Maryland. The MPP is the largest and most powerful parallel processor ever built. The central computational element in the MPP is a square array of 16,384 bit-serial processing elements (PEs) with orthogonal nearest-neighbor connectivity (figure 1). Performing all of its calculations using bit-serial techniques, the MPP is capable of performing in excess of 6 billion 8-bit additions per second (1,2,3,4).

Because of its parallel architecture, the "natural" memory access mode for the MPP PEs is to retrieve the same bit of each of 16K operands in a single memory cycle. This word-parallel / bit-serial memory organization is inconsistent with the normal bit-parallel / word-serial memory organization of a conventional computer.

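The difference between the two memory organizations can be illustrated with a small software sketch. The functions below are hypothetical and are not part of the MPP software; they simply show, for a handful of 8-bit words, what it means to turn word-serial data into bit planes (one plane per bit position, each plane holding that bit of every operand) and back again.

```python
def corner_turn(words, bits=8):
    """Convert word-serial data into bit planes.

    planes[b][i] is bit b of words[i], so one plane holds the same bit
    of every operand (the word-parallel / bit-serial organization).
    """
    return [[(w >> b) & 1 for w in words] for b in range(bits)]


def corner_turn_back(planes):
    """Rebuild the word-serial data from its bit planes."""
    count = len(planes[0])
    return [
        sum(planes[b][i] << b for b in range(len(planes)))
        for i in range(count)
    ]


if __name__ == "__main__":
    host_words = [5, 3, 255, 0]
    planes = corner_turn(host_words)
    print(planes[0])                  # least significant bit of every word
    print(corner_turn_back(planes))   # original words recovered
```

In the MPP this re-organization is a hardware function; the sketch only illustrates the data layout involved, not how the hardware performs it.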
Since the data that is processed by the MPP array is generally provided by a conventional, word-serial host computer system, some way must be found to re-organize or "corner-turn" the data that will be processed in the array. It is also highly desirable that this corner-turning operation be performed "automatically", without the requirement to execute any special re-formatting subroutines.

The solution to this problem is provided in the MPP system by the inclusion of a hardware buffer or staging memory inserted in the data path between the host computer and the array memory (figure 2). This staging memory, which can be expanded to 64 megabytes, is used to permute data to be processed in the array unit as it is moved between the host computer and the array memory. The staging memory converts the bit-parallel / word-serial host data organization to the word-parallel / bit-serial organization required in the array unit (figures 3 and 4). Which of the 2**29 possible permutations the stager performs is determined by a set of 890 control bytes computed from a description of the data organization provided by the user.

The purpose of this paper is to describe how the calculation of these control bytes is accomplished using an RSX-11M Ancillary Control Processor (ACP) and to present some of the design considerations that were pertinent to the development of this ACP.

STAGER CONTROL SOFTWARE

The use of an ACP in this application to provide support for the staging memory is justified for a number of reasons:

1. The calculation of the stager control constants is complex and best placed in a single program in the system. This provides the consistency necessary to ensure proper system operation for many different MPP application programs.

2. The ACP mechanism provides the capability to service user I/O requests in a transparent manner. All MPP I/O requests are routed to the MPP device driver.
It is the driver's responsibility to involve the staging memory ACP for those requests that require its support.

3. The ACP can be used both to create and to manage an I/O process involving data transfers through the MPP staging memory. The process creation or open function is used to compute the hardware control parameters required to permute data as it is moved through the stager. These parameters are saved and subsequently used to program the stager hardware whenever a staging memory data transfer function is requested by an application program. The I/O process is terminated, and the parameters deleted, when a close function is issued by the application program.

4. The ACP mechanism allows this software to be developed without the restrictions that apply to an RSX-11M device driver. As an ACP, the program has all the attributes of an RSX-11M task, and can be overlaid to support the complex computations necessary to produce the parameters needed to program the staging memory.

SOFTWARE DESIGN OBJECTIVES

A number of design objectives were identified early in the ACP development effort. The first such objective was to develop the parameter calculation algorithm in the form of an interactive higher-order language program. This type of development vehicle would provide the tool necessary to easily debug and check out the algorithm.

A second objective was to use a single object library to support both the interactive form of the parameter calculation algorithm and the ACP. Once the algorithm was developed and checked out, the same object modules were to be used to construct the staging memory ACP for on-line use with the MPP device driver. Such an objective could be achieved if individual modules could be driven from either an interactive main program or from a routine providing an on-line interface to the user's I/O request (figure 5).

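The idea of one set of modules driven from two different top-level routines can be sketched as follows. This is an illustration only, written in Python rather than FORTRAN; every name, error code, and parameter in it is hypothetical. The shared routine performs no I/O and reports problems through a returned code, the interactive front end does all of the terminal output, and the on-line front end reports the same code in the upper byte of a 16-bit status word (the low-byte values used here are assumptions for the example).

```python
IS_SUC = 1       # success value for the low status byte (assumed here)
FAILED = 0xFF    # hypothetical failure value for the low status byte


def compute_control(rows, cols, bits):
    """Shared library routine: performs no I/O, returns (error_code, params)."""
    if rows <= 0 or cols <= 0:
        return 1, None    # hypothetical code: bad array shape
    if not 1 <= bits <= 32:
        return 2, None    # hypothetical code: bad operand length
    return 0, {"planes": bits, "elements": rows * cols}


def interactive_main(rows, cols, bits, emit=print):
    """Interactive front end: all terminal output happens at this level."""
    code, params = compute_control(rows, cols, bits)
    if code:
        emit(f"parameter error {code}")
    else:
        emit(f"{params['planes']} bit planes of {params['elements']} elements")
    return code


def online_open(rows, cols, bits):
    """On-line front end: no terminal I/O, only a 16-bit status word."""
    code, params = compute_control(rows, cols, bits)
    low = IS_SUC if code == 0 else FAILED
    return ((code & 0xFF) << 8) | low, params


if __name__ == "__main__":
    interactive_main(128, 128, 8)
    status, _ = online_open(0, 128, 8)
    print(f"status word: {status:#06x}")
```

Because `compute_control` never touches a terminal or file, either front end can be linked against it unchanged, which is the property the design objective depends on.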
If these objectives could be achieved, both forms of the program could be supported from a single set of source files, easing the program maintenance effort and eliminating the necessity of re-writing the parameter calculation algorithm in assembly language. However, these benefits are not achievable without considering the environment in which the ACP must execute.

All interactive FORTRAN I/O must be eliminated in the ACP form of the program. The I/O support required by FORTRAN would make the overall program too large for an ACP. Any error conditions detected by the ACP should be returned to the user program issuing the I/O request by way of the QIO I/O status block. Error messages issued directly to the user should be minimized.

The ACP must fit into 12 Kwords of virtual address space, including a memory allocation bitmap for data buffers in the staging memory.

I/O processes opened by the ACP must maintain all the necessary parameters and information needed to program stager permutations at run time. It should be possible to create and maintain multiple such I/O processes simultaneously. Because the number of concurrently opened processes is variable, the ACP data base cannot be pre-sized. Therefore, the parameters needed to maintain the I/O processes would be kept in Executive Pool.

Finally, to be easily used to support data transfers to and from the stager, the I/O process information must be accessible to a FORTRAN subroutine in the ACP that programs and controls the stager hardware.

SOFTWARE DESIGN APPROACH

The first step in developing the MPP staging memory control software was to identify all the parameters needed from the user to describe the required data permutation to be performed in the hardware. In the interactive form of the program, these parameters are read either from a data file or from the user's terminal.
In the ACP form of the program, the same parameters are supplied in the user's data buffer identified in an "open" function QIO. The required parameters, organized in a defined data structure for the "open" QIO, are used to initialize a FORTRAN common data base for the hardware control parameter calculations. The interactive program uses a prompt/response subroutine to acquire the needed startup data. The ACP uses a small assembly language subroutine to transfer the initial data from the user's QIO buffer into the FORTRAN common data base.

The ACP version of the program allocates a "window block" (similar in concept to that of the Files-11 ACP) in Executive Pool and uses this block to collect the data that will be needed later to move data through the staging memory. The window block contains a pointer to a second block of pool used to store the computed hardware control parameters.

Both the interactive and the ACP versions of the program calculate the hardware control parameters required for the user's data permutation using the low level subroutines that are part of the common object module library. These subroutines are implemented as FORTRAN functions that return a status flag and/or error code in the event of a detected error condition. The interactive program main module uses the error information to inform the user directly, via FORTRAN I/O statements, about the error and its cause. The ACP version of the program returns the error code to the user program that issued the QIO via the upper, device-dependent byte of the first word of the I/O status block (IOSB). This design technique allowed all FORTRAN I/O requirements to be satisfied in a top-level routine so that the low level routines were usable in both program versions.

Upon successful completion of the requested I/O process "open" function, the interactive program produces two output files. The first contains a copy of the parameter calculation results in binary form. The purpose of this file will be described later.
The second file is a listing of the calculation results that can be interpreted by the user to determine the memory packing efficiency and I/O transfer rate achievable for the specified permutation. A listing file option permits the binary data to be presented in its entirety. This feature was used during the program development effort to help debug the parameter calculation algorithm. The ACP program version simply stores the results of the successful parameter calculation step in the data block allocated in pool.

The final "open" QIO step performed in the ACP is to build a FORTRAN calling sequence in the I/O process window block. All the parameters necessary to properly move data through the staging memory are identified and located through addresses contained in this calling sequence. By passing the address of the calling sequence contained in the window blocks of different I/O processes to a single FORTRAN subroutine, the staging memory can be made to perform any previously specified data permutation in response to a data transfer QIO request issued by a user program.

An alternate "open" function is also available that permits a user program to create a stager I/O process without executing the parameter calculation algorithm at run time. This alternate "open" function makes use of the binary output data file produced by the interactive program. The user program issuing such a "fast-open" QIO provides a data buffer to the ACP that contains a copy of the binary data file specifying the desired permutation. The ACP uses the data to build the same pool data structures that are constructed by the "open" function described earlier. Once these structures are created, it is not possible to determine how a particular I/O process was created. The data structures contained in Executive Pool are identical for either case.

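The relationship between the two open functions, the window block, and the single transfer routine can be sketched in a few lines. The sketch is hypothetical throughout: a dictionary stands in for Executive Pool, a list of indices stands in for the computed hardware control parameters, and the binary layout is invented for the example. It shows only the structure described above, namely that both open paths build the same per-process block, and that one transfer routine performs whichever permutation the stored calling sequence describes.

```python
import struct

windows = {}   # stands in for window blocks in pool, keyed by LUN


def calculate(rows, cols):
    """Stand-in parameter calculation: index list for a rows x cols transpose."""
    return tuple(r * cols + c for c in range(cols) for r in range(rows))


def to_binary(perm):
    """Binary form of the results (hypothetical layout: count, then indices)."""
    return struct.pack(f"<H{len(perm)}H", len(perm), *perm)


def from_binary(blob):
    """Recover the parameters from the saved binary form."""
    (count,) = struct.unpack_from("<H", blob, 0)
    return struct.unpack_from(f"<{count}H", blob, 2)


def build_window_block(perm):
    """Store the parameters and the argument list for the transfer routine."""
    return {"params": perm, "call_args": (perm,)}


def open_process(lun, rows, cols):
    """'Open': run the calculation, then build the window block."""
    windows[lun] = build_window_block(calculate(rows, cols))


def fast_open(lun, blob):
    """'Fast-open': skip the calculation, use pre-computed binary data."""
    windows[lun] = build_window_block(from_binary(blob))


def transfer(perm, data):
    """The single transfer routine: applies whatever permutation it is given."""
    return [data[i] for i in perm]


def data_transfer(lun, data):
    """Find the window block for this LUN and call the transfer routine."""
    block = windows.get(lun)
    if block is None:
        raise LookupError("no I/O process open on this LUN")
    return transfer(*block["call_args"], data)


if __name__ == "__main__":
    open_process(1, 2, 3)
    fast_open(2, to_binary(calculate(2, 3)))
    print(windows[1] == windows[2])
    print(data_transfer(2, list("abcdef")))
```

Keeping the argument list inside the block means the transfer routine needs no knowledge of how, or for which process, the parameters were produced.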
ACP IMPLEMENTATION APPROACH

After developing and debugging the interactive staging memory parameter calculation program according to the design objectives and restrictions outlined earlier, development of the ACP version was started. The sample ACP package (5) was used as a starting point. The manual accompanying the prototype ACP is an excellent source of information on how an ACP interfaces with the RSX-11M QIO mechanism.

The specific type of ACP used for this application is characterized in the ACP documentation as a user ACP. This ACP type was selected in order to avoid making changes to either the RSX module DRQIO, the QIO directive processor, or to the MCR mount processor (MOU). In order to support the ACP, the user-written device driver that processes I/O requests for the MPP performs all parameter checks on all I/O functions, including those that will be dispatched to the user-written ACP.

These parameter checks include the re-mapping of data buffer addresses provided in QIO requests to the driver or the ACP. In order for a data transfer to be accomplished properly, the user program virtual buffer address must be mapped into a kernel APR 6 virtual address, with relocation bias, before the QIO is queued to the ACP. When the ACP processes such an I/O request, and subsequently routes a data transfer QIO to the device driver, the buffer address provided in the QIO is actually the previously relocated address of the user program data buffer. Because of this, the device driver, in its QIO parameter checking code, must not relocate the data buffer address in an ACP-issued data transfer QIO.

RSX-11M, by its design, requires that an ACP be "known" or mounted to a device in order that "virtual" I/O functions can be issued for that device. This mounting operation is the system step that establishes the link between a particular device and the ACP that supports it.
The tasks that perform this function are the enable (ENA) and disable (DIS) tasks provided in the sample ACP package. These programs were directly applicable to this ACP development effort with only minor changes in the default ACP name coded into the ENA source. Reference 6 provides some patches that should be applied to these programs before their use.

The basic interface with RSX-11M was provided by the sample ACP prototype. This module serves as the ACP's main program. The stager control parameter calculation algorithm developed in the interactive program is used for I/O process creation in the ACP. This algorithm is called whenever an "open" QIO is issued to the ACP. The window block created for each of these processes is linked to the second LUT word of the issuing task's logical unit table entry for the LUN assigned to the staging memory. This link is then used to access I/O process parameters whenever a data transfer function is issued to the ACP. If no such link exists, no I/O process has been created and the ACP returns an error.

The ACP is implemented as an overlaid program, written mostly in FORTRAN. The overlay structure is divided primarily between the I/O process creation or open functions and the data transfer functions.

The I/O process open function uses an assembly language module to transfer the user's process description parameters from the QIO data buffer. These parameters are used to initialize the same FORTRAN common region that is filled in via user dialog in the interactive development program. The ACP then executes the hardware parameter calculation algorithm to fill in the I/O process window block. Error conditions detected in the user's initial data are returned to the user program as an ACP error code. After completing the hardware parameter calculation algorithm, the ACP fills in a 14-element FORTRAN subroutine calling sequence in the process window block.
This calling sequence is used by the data transfer function module to move data into and out of the staging memory. Assembly language modules are used to allocate space in Executive Pool for the parameter buffers referenced in the process window block. These routines are also used to create an I/O process from pre-computed, user-supplied hardware parameters from a saved binary data file as described earlier.

A data transfer request causes the ACP to find the appropriate window block linked to the second LUT word of the user program's assigned LUN. The main ACP module then calls a FORTRAN subroutine to do the actual data transfer using the calling sequence in the selected window block.

Even though the ACP is overlaid, a properly written user program will not suffer excessive I/O processing delays. If all the required I/O processes are created and then used repetitively, the ACP overlay structure will not need to be re-loaded between QIO requests.

Some special steps are performed at compile and task build times to minimize the size of the ACP. First, the FORTRAN language modules are compiled with all traceback code suppressed. Second, FORTRAN run-time error messages are limited to an announcement of the run-time error number and the program counter where the error occurred. This limited error report is produced by task building the ACP with the F4POTS library module $SHORT. Finally, since the ACP is primarily a FORTRAN program that performs no FORTRAN I/O, certain of the F4POTS I/O support modules can be shortened by substituting selected module versions contained in the F4PNIO library. This library can be created in UIC [1,1] from the FORTRAN language distribution file F4PNIO.OBJ. Details of the use of these modules can be found in the FORTRAN IV-PLUS or FORTRAN 77 User's Guides.

DEBUGGING AIDS AND SURPRISES

Debugging an ACP has some interesting side-effects. Since an ACP has all the attributes of a task, it can be debugged using ODT as a "user" program.
Additionally, if the "debuggable" ACP is built with default, rather than elevated, priority, the debugging effort can generally be accomplished without a severe impact on system performance.

Two complications arose in debugging this particular ACP because the program was overlaid and because it was written in FORTRAN, requiring the use of some of the FORTRAN run-time support routines.

The first complication was eased significantly by making use of the information contained in reference 7. Knowing the techniques described in this article, it should have been easy to start up the ACP and begin a debugging session. Since an ACP is not normally started up like a user task, but rather by a request directive (RQST$) issued by the ACP enable task (ENA), the ACP normally runs with TT0: as its default terminal device. If the ACP is going to be debugged, and TT0: and another system terminal are available, it is best to start the user program that will be using the ACP on the second terminal and use TT0: to enter ODT debug commands. If only one terminal is available, it should be TT0:. Debugging efforts can still be successful if care is taken in entering commands, since the terminal is shared by both the user program and ODT.

However, attempting to debug the ACP with ODT proved to be somewhat challenging. When the enable task (ENA) was used to initiate the ACP to be debugged, it seemed that the program would not execute. If started from a terminal with a RUN command, the ACP would properly prompt for an ODT command and would run correctly when started with the "G" command. As a result of a telephone conversation with Ralph Stamerjohn at Monsanto, it was finally realized that the ACP, when started by the ENA task, was running with its terminal set to the CO: device. When the task was started, ODT would attempt to input a command from CO:. With the console logger (COT...) active, the read request received an end-of-file and the ACP exited! If COT...
was first stopped with the command SET /NOCOLOG, the ACP built with ODT executed properly when started with the ENA task. Note that during normal system operation, using an ACP without ODT, if the ACP should abort, the console log will have a message containing information identifying the cause and location of the error.

The second check-out complication encountered in developing the ACP was related to the FORTRAN OTS interface. The FORTRAN subroutine used by the ACP to move data through the staging memory has three optional variable length arrays passed to it when it is called with the parameter list built in the I/O process window block. Certain permutations that have no need to use these arrays use an address value of -1 to indicate that the parameters are not defined. When five calls were made to this subroutine for permutations that did not use the optional parameters, the ACP would abort. The problem was traced to a FORTRAN OTS run-time check done on variable length arrays passed to a subroutine. Since the address -1 is an illegal PDP-11 word address, the OTS would detect and count an array size error for each unspecified array parameter upon entering the subroutine. When the task error count was exceeded, the ACP aborted.

The fix for this particular problem was to call the ERRSET subroutine during ACP initialization to turn off error logging and limit checking for this error condition. In this way the ACP would continue to execute correctly for permutations that did not require the optional parameters.

FURTHER INFORMATION

Selected modules from the ACP, together with the manuscript for this paper, are available on the RSX SIG Tape for the Fall 1983 DECUS Symposium. The modules included on the tape are limited to those top-level routines that characterize both the interactive and ACP program forms. The missing modules are limited to those that perform the actual stager hardware parameter calculations. These modules are peculiar to the ACP developed for the MPP.
They provide no additional specific insights into the design of an ACP for implementation using a higher order language.

REFERENCES

1. Batcher, Kenneth E., "Massively Parallel Processor (MPP) System", AIAA Computers in Aerospace Conference II, October 1979

2. Tsoras, John, "The Massively Parallel Processor (MPP), Innovation in High Speed Processors", AIAA Computers in Aerospace Conference III, October 1981

3. Gilmore, Paul A., "The Computer MPP", International Society of Photogrammetry and Remote Sensing Commission II Symposium, August - September 1982

4. Burkley, John T. and Mickelson, Carl T., "MPP: A Case Study of a Highly Parallel System", AIAA Computers in Aerospace Conference IV, October 1983

5. Stamerjohn, Ralph, "Up Your ACP" (et al), DECUS Spring 1980 Symposium RSX SIG Tape, UIC=[346,100]

6. Stamerjohn, Ralph, "Bug Fixes for Sample ACP", DECUS RSX Multi-Tasker, Vol 16, #2, August 1982

7. Johnson, Kenneth, "Using ODT in Overlaid Programs", DECUS RSX Multi-Tasker, Vol 13, #2, August 1980

8. "MPP System User's Guide", Goodyear Aerospace Corporation, GER-1714, April 1983

FIGURE CAPTIONS

Figure 1 - MPP Array Unit
Figure 2 - MPP System Block Diagram
Figure 3 - MPP Staging Memory
Figure 4 - Stager Reformatting Example
Figure 5 - Stager Software Structure