USING THE DR11-W DMA DEVICE FOR INTERPROCESSOR COMMUNICATIONS IN RT-11.

Mark Pyatetsky, Peter Heinicke, David Ritchie, Vicky White
Fermi National Accelerator Laboratory
Batavia, Illinois

ABSTRACT

At Fermilab, DR11-W's have been used as high speed data links to interconnect PDP-11's (under both RT-11 and RSX-11M) in data acquisition applications. Using this hardware, several processors can be interconnected to provide distributed data collection, data monitoring and control for High Energy Physics experiments. This paper discusses the implementation of the DR11-W link under RT-11. It describes the RT-11 handler and discusses some problems (and solutions) associated with its design, among them: implementation of the internal queues in the handler, serialization of the completion routines, time-out support, multiple DR11-W devices, the interface with the application programs, and error reporting.

INTRODUCTION

The Data Acquisition (DA) systems used at Fermilab are designed to process large volumes of data in very short periods of time. These DA systems run on various PDP-11 and VAX configurations under RT-11, RSX-11M or VMS. Interconnecting the DA systems via high-speed data links increases event data rates and provides extra memory, extra processing time and other benefits. These data links must, of course, allow communication among DA systems running under different operating systems. At Fermilab, these data links are implemented by using DR11-W DMA devices.

In this paper we will discuss the implementation of the DR11-W link under RT-11. We will show how an RT-11 application program can communicate with an RT-11 or RSX-11M program running in a different PDP-11 processor, using a DR11-W interprocessor link. We will first briefly describe the levels of protocols for such a communication. Next, we will discuss why we implemented our link communication driver as an RT-11 device handler.
We will also talk about some additional features not generally provided by RT-11 handlers: implementation of internal queues in the handler, serialization of the completion routines, time-out support and multiple DR11-W devices. Finally, we will describe the interface with the application programs, error reporting and the FORTRAN interface routines. Our (so far limited) experience in using the handler will also be discussed, together with some performance data.

THE INTERPROCESSOR LINK LAYERED ARCHITECTURE.

We will now briefly describe the layers of interfaces and protocols necessary to implement an interprocessor link architecture. A more detailed discussion is presented in the associated paper (1).

On this project, we implemented a point-to-point interprocessor communication link. Figure 1 depicts some possible link configurations. An RT-11 application program running in one processor "talks" to an RT-11 or RSX-11M application program running in another processor within the framework of a three-layered architecture (Figure 2).

Layer 1 implements the physical link (hardware) protocol. It is, in essence, two DMA controllers (DR11-W) connected end-to-end over a parallel link, interfacing to each processor's UNIBUS.

Layer 2 provides data link control. It is a communication driver which is implemented as a device handler under RT-11 (it is a device driver under RSX-11M - see associated paper (2)). Two communication drivers (one per processor) interact with each other (horizontally) in accordance with the data transmission protocol developed at Fermilab (1). Hardware interrupts and DR11-W registers provide the (vertical) interface between layer 2 and layer 1.

The application programs constitute layer 3 in this hierarchy. Two application programs running in different processors interact with each other in accordance with their own logical protocol, which may be different for different types of applications. RT-MULTI is one example of an application program (3).
A file transfer program is another example. The application programs can be written in MACRO-11 or in FORTRAN. The application programs interface (vertically) with the layer 2 communication driver via a software driver interface. Different interfaces are provided under RT-11 and under RSX-11M. The RT-11 software driver interface is discussed in more detail later in this paper.

We have also developed a set of FORTRAN-callable subroutines, CDPACK, which converts the RT-11 or RSX-11M software driver interface into a standard application program interface. CDPACK allows the same application program, written in FORTRAN, to run under either RT-11 or RSX-11M. CDPACK is described in more detail in (1).

THE COMMUNICATION DRIVER (CD:)

The communication driver in the RT-11 environment is written as an RT-11 device handler. This approach has several advantages. It allows standard I/O to the link (writes, reads, special functions) from the application program. It also provides all application programs with a single standard interface to the DR11-W, relieves them from having to implement the data transmission protocol (layer 2), and will hopefully insulate them from possible future changes in the DR11-W (by DEC) and in the data transmission protocol.

In addition, the mechanism of completion routines can be used with the device handler. Completion routines are very helpful in preventing possible communication deadlocks, or in performing several I/O's concurrently (e.g. to a disk and the communication link).

Finally, the device handler approach adds a greater degree of flexibility: being written in PIC (Position Independent Code), the handler can be moved around in memory very easily; logical unit numbers can be assigned to physical links; the "set" function allows various handler features to be set or reset from the console or the application program; I/O requests are queued and can be executed concurrently with the application program; etc.
It turned out, however, that our communication driver needed some additional features not generally provided by RT-11. Before we get to the implementation of the internal queues, serialization of the completion routines, time-out support and multiple DR11-W devices, let us first describe the functionality of our interprocessor communication driver.

HOW I/O REQUESTS ARE PROCESSED BY THE DRIVER.

There are several types of I/O requests processed by the driver. As a typical example, we will consider here the two most frequently used: sending and receiving messages of several 16-bit words each ('several' means from 1 to 32000 words). Each message sent over the link has a destination address which we call the PTC (Packet Type Code).

The DR11-W device has two modes of transfer: single word and DMA. In accordance with our transmission protocol, the driver uses single word mode to tell the other driver the word count and the PTC of a message that it wants to send. Then both drivers simultaneously set up a DMA transfer of the message over the link.

It should be noted that the driver does not have buffer space for the messages it sends or receives - the driver sets up the DMA transfer directly from or into the buffer space provided by the application program in its write or read request. Therefore, the driver will not accept unsolicited messages. The application program must have issued a READ request, providing the driver with the buffer area in which to put the message.

Several read and/or write requests can be outstanding at any time. The driver processes READ requests in no fixed order ("randomly"), i.e. each one is satisfied as soon as the message with the requested PTC is received on the link. All WRITE requests are processed sequentially, i.e. in the order in which they are posted by the application program. This type of processing of the I/O requests is drastically different from that commonly used by other RT-11 handlers such as a disk handler or magtape handler.
A disk handler, for example, processes all READ/WRITE requests sequentially, i.e. in the order they are received. Therefore, our communication driver internally maintains two separate queues for the outstanding READs and WRITEs. It turns out that our driver has to maintain yet another internal queue, which we call the EXIT queue. We will show later in this paper why this queue is necessary and how a queue element travels from one queue to another.

Let us now switch our attention to another quite common communication problem - a message timeout. In the course of the exchange of messages between the processors, either processor may hang or be stopped, or the link hardware may break down. Therefore, each step (transaction, as we call it) of the message exchange is timed out. That is, most transactions sent over the link must be acknowledged by the other side within a certain period of time (the timeout window). If no acknowledgement is received during this time, a timeout mechanism triggers execution of the timeout routine. The length of the timeout and the timeout routine may be different for different transactions. The timeout problem is compounded by the fact that, with multiple DR11-W links, several timeouts may be outstanding at any given time. The implementation of the timeout mechanism is discussed later in this paper.

INTERNAL QUEUES.

Each RT-11 device handler is normally assigned one I/O queue, which is administered by the RT-11 Monitor. Whenever the application program issues an I/O request (e.g. in MACRO-11: .READ, .WRITE, .SPFUN), the RT-11 Monitor picks up a queue element from the pool of idle queue elements, fills it in with the parameters supplied in the I/O request, and places it on the device handler I/O queue. If the device handler I/O queue was previously empty, the Monitor calls the handler so that the handler can start processing the I/O request. When the handler finishes processing the I/O request, it returns to the Monitor by executing the .DRFIN macro.
The Monitor reclaims the queue element from the handler's I/O queue and either assigns it immediately to the pool of idle queue elements, or first arranges for the execution of the completion routine and assigns the queue element to the idle pool afterwards. (We say "arranges" because this step is done differently in the SJ and in the FB or XM Monitors.) If, at this time, the handler I/O queue is not empty, the Monitor calls the handler to start processing the next I/O request.

This works just fine when the I/O requests are to be processed sequentially, in the order they are received by the Monitor. As we already explained, we needed a different order of I/O processing. Therefore, we implemented internal queues in our handler.

The handler has three internal queues. They are shown in Figure 4. Here, the queue marked 'CD' is the standard handler I/O queue, normally administered only by the Monitor. This queue is always kept empty by our handler. Hence, whenever the Monitor receives a 'write' or 'read' request, it queues it up in the handler 'CD' queue and always calls the handler. The handler immediately transfers the queue element from the 'CD' queue to either the 'writes' or the 'reads' queue.

When the I/O request completes, its queue element must be removed from the 'writes' or 'reads' queue and placed back on the 'CD' queue, so that it can eventually be reclaimed by the Monitor. The removal is easy. However, immediate placement on the 'CD' queue may create a problem. The I/O request may complete at interrupt level, at a priority higher than zero, and the interrupt may have arrived just as a new I/O request was being placed on the 'CD' queue. Executing .DRFIN or its equivalent at this moment may mix up the queue elements, with unpredictable results. This is why the completed I/O requests are first transferred into the 'EXIT' queue. They are transferred to the 'CD' queue one at a time, whenever the 'CD' queue is empty.
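The queue flow just described can be pictured with a small model. The following is only an illustrative Python sketch, not the handler's MACRO-11 code: the class and method names are hypothetical, a request is reduced to a (kind, PTC) pair, and the Monitor's reclaim step (.DRFIN or its equivalent) is modelled by appending to a list.

```python
from collections import deque


class HandlerQueues:
    """Toy model of the 'CD', 'reads', 'writes' and 'EXIT' queues (names assumed)."""

    def __init__(self):
        self.cd = deque()      # standard handler queue, filled by the Monitor
        self.reads = deque()   # outstanding READs, matched by PTC
        self.writes = deque()  # outstanding WRITEs, served in posting order
        self.exit = deque()    # completed requests waiting to return via 'CD'
        self.reclaimed = []    # what the Monitor has taken back, in order

    def monitor_enqueue(self, kind, ptc):
        """The Monitor places a request on 'CD' and calls the handler."""
        self.cd.append((kind, ptc))
        self.handler_entry()

    def handler_entry(self):
        """The handler empties 'CD' at once, sorting the request by type."""
        kind, ptc = self.cd.popleft()
        target = self.reads if kind == "read" else self.writes
        target.append((kind, ptc))

    def complete_write(self):
        """The oldest WRITE has finished: park it on 'EXIT'."""
        self.exit.append(self.writes.popleft())

    def complete_read(self, ptc):
        """A message with this PTC arrived: park the matching READ on 'EXIT'."""
        match = next((e for e in self.reads if e[1] == ptc), None)
        if match is None:
            return False       # no READ was posted for this PTC
        self.reads.remove(match)
        self.exit.append(match)
        return True

    def drain_exit(self):
        """Move completed elements through 'CD' one at a time, only while it is empty."""
        while self.exit and not self.cd:
            self.cd.append(self.exit.popleft())
            self.reclaimed.append(self.cd.popleft())  # stands in for .DRFIN


q = HandlerQueues()
for kind, ptc in [("read", 5), ("write", 1), ("read", 7), ("write", 2)]:
    q.monitor_enqueue(kind, ptc)
q.complete_read(7)   # READs complete in arrival order of their PTCs
q.complete_write()   # WRITEs complete in posting order
q.drain_exit()
print(q.reclaimed)
```

In this model the guard in `drain_exit` plays the role of the 'EXIT' queue rule: a completed element is handed back only when 'CD' holds nothing else.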
When a queue element is transferred from one queue to another, say from queue A to queue B, it is always removed from the beginning of queue A and linked at the end of queue B.

The handler processes 'write' requests sequentially, always working with the request at the beginning of the 'writes' queue. The 'read' requests are processed randomly. As soon as a message with a given PTC is received on the communication link, the handler searches the 'reads' queue. When it finds a queue element with that PTC, the handler moves this element to the beginning of the 'reads' queue and works with it until this READ completes.

Implementation. The data structures for the 'writes', 'reads' and 'EXIT' queues are shown in Figure 5. Note that each internal queue is represented by: the queue count, showing the number of queue elements in the queue; the current queue element pointer, pointing to the beginning of the queue; and the last queue element pointer, pointing to the end of the queue. The movement of queue elements from one queue to another is done by the macro 'QUEXFR' shown in Fig. 6. The calls to this macro which perform moves 1, 2, 3, 4 and 5 in Fig. 4 are shown in Fig. 7. Figure 8 shows the macro 'CHKPTC', which searches a queue for a requested PTC and moves the matching element to the beginning of the queue.

SERIALIZATION OF THE COMPLETION ROUTINES.

The RT-11 SJ (Single Job) Monitor schedules completion routines differently from the FB or XM Monitors. Whereas the FB and XM Monitors execute the completion routines serially, in the order in which they are released by the device handler, the SJ Monitor does not provide this feature. It may, in fact, interrupt one (running) completion routine to start another. This may impose an unwanted restriction: that the completion routines be re-entrant.
This is a problem if the completion routines are to be written in FORTRAN (since FORTRAN-IV code is not re-entrant), or when new I/O requests are to be posted by the completion routine (.READ, .WRITE and .SPFUN are not re-entrant).

Implementation. We already mentioned ".DRFIN or its equivalent" when we talked about the 'EXIT' queue. We will now describe it in more detail. The .DRFIN is the macro a standard RT-11 handler executes to pass a queue element with the completed I/O to the Monitor. By executing .DRFIN, the handler also exits (to the Monitor). Our communication handler executes a .DRFIN substitute which we call 'JSRFIN' (see Fig. 9). This .DRFIN substitute is described in Chapter 7.4 of the RT-11 Software Support Manual. The 'JSRFIN' macro returns the queue element to the Monitor without exiting. That is, after the Monitor is called by 'JSRFIN', it reclaims the queue element and, if necessary, calls the completion routine. After the completion routine finishes, the Monitor returns to 'JSRFIN'. The serialization is done (see Fig. 10) by setting a flag 'D$FFIN' before the 'JSRFIN' and resetting it after the 'JSRFIN'.

THE TIMEOUT SUPPORT

RT-11 supports time-out in its device handlers with two macros: .TIMIO and .CTIMIO. These macros and their use are described in Chapter 7.6 of the RT-11 Software Support Manual. We will repeat some of this information here, and we will also discuss some problems with the implementation of time-out support in our handler. Finally, we will show our implementation of the time-out support.

The .TIMIO and .CTIMIO macros can only be used by device handlers. The handler requests a timeout with the .TIMIO macro and cancels its previous timeout request with the .CTIMIO macro.
For each timeout request, the handler must allocate a 7-word block in memory which cannot be re-used until either the timeout expires and the Monitor executes the handler's timeout completion routine, or the timeout is cancelled by the handler. This timer block contains, among other data, the time interval in clock ticks (one tick is 1/60 of a sec), the timer block number (from 177400 to 177477 octal) and the address of the completion routine. The .TIMIO and .CTIMIO requests can only be made at priority 0 and so must be preceded by a .FORK macro call if made at interrupt level. If the device has already timed out by the time the handler issues a .CTIMIO request, the .CTIMIO call returns with the carry bit set (the .CTIMIO fail condition). This means that the completion routine either has already executed or is about to execute (and it is too late for the handler to stop it).

Our communication handler must issue most of its timeout requests at interrupt level. The interrupts may come in as often as once every 1/1000 of a sec. On almost every interrupt, the handler must cancel the old timeout request and issue a new one. The time interval to be set varies with each interrupt from 2 to 60 ticks, but most of the time it is 2-4 ticks. On each interrupt or timeout, the handler sends a transaction over the DR11-W link, and the content of the transaction is defined by our link transmission protocol (see Fig. 2). This content depends on the handler's internal state and is different for different interrupts and timeouts.

Issuing .CTIMIO/.TIMIO on each interrupt occurrence would involve forking, and forking means delays. These delays may be comparable with our transmission delays. Besides, forking would require special care at fork level - code re-entrancy or disabling of device interrupts.

Implementation. We use the concept of a free running timer. The timer is not associated with any particular transaction's interrupt or DR11-W unit.
It is just one timer (and timer block) for all occasions. When the message exchange on the link starts, the timer is started; when the message exchange ceases, the timer eventually stops. The .TIMIO macro is used to start or re-start the timer. The .CTIMIO macro is not used at all. When the address of the completion routine in the timer block is non-zero, the timer is running; otherwise the timer is not running.

Each DR11-W unit is assigned a timeout counter in the handler (Fig. 11). Every time the Monitor fires up the timer's completion routine, all non-zero timeout counters are decremented by the routine (see Fig. 12). Next, the routine re-starts the timer (with .TIMIO) if there was at least one non-zero counter prior to decrementing. Finally, the routine executes the timeout actions for all units whose counters reached zero (after decrementing).

Whenever the handler needs to start (or re-start) a timeout, at interrupt level or elsewhere, with a particular time interval for a particular DR11-W unit, it just sets the unit's timeout counter to the required number of ticks. Fast and convenient, and no delays. Fig. 13 shows the macro 'ENTIME' which does that; Fig. 14 shows an example of how this macro is used.

MULTIPLE DR11-W DEVICES WITH SOME COMMENTS ON PIC.

PIC, of course, stands for Position Independent Code. In this section we will discuss some problems of supporting several units with the same handler, what is available in RT-11 and the additional effort necessary to do the job.

Basically, multiple units (in our case, multiple DR11-W's), as compared to a single unit, may impose the following requirements on the device handler:

(1) separate interrupt vectors, say, one per unit
(2) separate sets of registers (CSR, I/O data registers, etc.), one per unit

The first feature is fully supported by the RT-11 macro .DRVTB and is described in Chapter 7.2.2.4 of the RT-11 Software Support Manual. We will not discuss it any further.
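Because one handler serves several DR11-W units, the free running timer of the previous section keeps one timeout counter per unit. As an illustration only, the timer's completion routine can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the handler's code: the class and method names are hypothetical, one call to `tick()` stands for one firing of the completion routine, and a unit's timeout action is modelled by recording the unit number.

```python
class FreeRunningTimer:
    """Toy model of one shared timer with a timeout counter per unit (names assumed)."""

    def __init__(self, units):
        self.counters = [0] * units  # remaining periods per unit; 0 = nothing pending
        self.running = False         # models "completion routine address is non-zero"
        self.expired = []            # units whose timeout action has been run

    def start(self):
        """Models the .TIMIO request that starts the timer."""
        self.running = True

    def set_timeout(self, unit, periods):
        """What the handler does at interrupt level: just store a count."""
        self.counters[unit] = periods

    def tick(self):
        """Models one firing of the timer's completion routine."""
        had_pending = any(self.counters)
        fired = []
        for unit, left in enumerate(self.counters):
            if left:                       # decrement only non-zero counters
                self.counters[unit] = left - 1
                if left == 1:              # counter reached zero on this firing
                    fired.append(unit)
        self.running = had_pending         # re-start only if something was pending
        for unit in fired:
            self.expired.append(unit)      # stands in for the unit's timeout action


timer = FreeRunningTimer(units=2)
timer.set_timeout(0, 2)
timer.set_timeout(1, 3)
timer.start()
timer.tick()
timer.set_timeout(1, 4)  # re-arming a timeout needs no cancel request
timer.tick()
print(timer.counters, timer.expired, timer.running)
```

Note that re-arming unit 1 is a single store into its counter, which is the property that lets the handler avoid .CTIMIO and forking at interrupt level.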
The second feature is not supported by RT-11 (there are no device block data structures on a per-unit basis). In our handler we have data structures for the CSR's, output data registers, timeout counters (see Fig. 11), etc. The problem is with accessing these data structures, since the handler code is written in PIC. One way of doing it is PC-relative, which is explained in Appendix G of the Macro-11 Language Reference Manual (see also Fig. 12, "PC-relative"). This usually involves 2-4 instructions per access. For large handlers it may produce quite an overhead.

In our handler we do it base-relative. We first establish a PC-relative base at the beginning of the handler (see Fig. 3). Then, say, at the interrupt or timeout level, after the unit number of the device is established, say, in register R1, we add 'base' to it (see Fig. 12, "1"). After that, the data structures can be accessed with a single instruction (see Fig. 12, "2") using R1 as an index register.

THE HANDLER - APPLICATION PROGRAM INTERFACE.

Our communication handler (or driver, as we also called it) makes the DR11-W link appear to the application program as a non-file structured device, similar to a magtape filled with card images. As we already mentioned, the driver writes into and reads from the application program buffer area directly, i.e. the driver does not maintain its own buffers for sending and receiving the DMA messages over the DR11-W link.

The following RT-11 standard programming requests are available to the application programs for interfacing with the driver (requests marked ** below do not directly interface with the driver):

   ** .FETCH                              (fetch the driver into memory)
   ** .LOOKUP                             (open logical channel)
   *  .READC/.READ/.READW                 (reads)
   *  .WRITC/.WRITE/.WRITW                (writes)
   *  .SPFUN                              (special function requests)
   ** .WAIT                               (wait for request to complete)
   ** .CLOSE/.PURGE/.SRESET/.HRESET, etc. (close logical channel)
   ** .QSET                               (allocate additional queue elements)

Care should be taken in using .READW or .WAIT after a .READ request, since the application program could wait indefinitely if no message with the requested PTC arrives. The use of .READC and .WRITC requests is preferred, in order to speed things up (no waits!) and to avoid confusion or possible deadlocks. When using .READC or .WRITC, the application program specifies a completion routine, which can perform error handling, transmission restart, etc.

There are several types of .SPFUN requests available to the application programs. The .SPFUN 'kill' stops the driver cleanly, that is, it returns all queue elements to the Monitor (with an appropriate error message for the application program), disables the DR11-W interrupt and puts the driver into its initial state. This .SPFUN must be called before closing the logical channel to prevent a system crash. The other .SPFUN requests provide some specific communication functions (see (1)). They will not be covered in this paper.

Message Block Area. It is well known that RT-11 is not very generous in providing its device handlers with the means of reporting errors and the status of I/O requests to the application programs. Therefore, when issuing I/O requests (read, write, spfun) to our communication driver, the application programs supply an additional 4-word Message Block Area (MBA for short) with each I/O request. There must be one MBA per I/O request, and it cannot be re-used before the I/O request completes. The address of the MBA is placed in the 'block' field of the I/O request. For example, the .READC call is of the form (see also the RT-11 Programmers Manual):

   .READC area,chan,buf,wcnt,crtn,blk

where blk = address of the MBA.

If the I/O request provides an invalid MBA address, the driver returns a 'hard' error (in the channel status word); otherwise the driver uses the MBA for error and status reporting.
The MBA has the following layout:

   word 1:  PTC of the request (1-255)
   word 2:  Message block number (1-32)
   word 3:  Error status returned by the driver
   word 4:  Word count actually sent/received by the driver

The Message Block Number is used by the application program to distinguish between its various read/write requests on completion. It is returned to the application program in bits 8 to 12 of the channel status word upon completion of the read/write (this status word is available to the completion routine).

The application program can write zero into word 3 of the MBA before placing its I/O request. Since the error status (or success) returned by the driver on completion of the request is never equal to zero, the application program can periodically check this word to see whether or not the I/O has completed.

USING THE HANDLER, SOME PERFORMANCE DATA.

So far we have had very limited experience of using the handler (primarily in the test environment). We have successfully performed several tests. In particular, we have had two processors, either both running RT-11, or one running RT-11 and the other running RSX-11M (or both running RSX-11M), talking to each other. They successfully exchanged tens of thousands of messages of various lengths (from 1 word to 4000 words per message). In each case, the data communications were very stable and reliable.

According to its specs, the DR11-W has a burst rate of 500000 words/sec (16-bit words). We have observed a DMA rate of about 330000 words/sec. The RT-11 driver has a delay of about 7 msec per DMA message transfer across the link.

Our plans call for using it in a limited production environment by the end of this year.

CONCLUSIONS.

In this paper, we have presented some results of our interprocessor communications project. We have found, among other things, that RT-11 is capable of supporting a high-speed interprocessor link.
We have developed a quite elaborate communication protocol (discussed in detail in (1)), which takes full advantage of the characteristics of the DR11-W, and implemented this protocol in our RT-11 device handler. We hope that the problems and solutions we have discussed in this paper will be helpful to all those who design non-standard RT-11 device handlers, particularly in the areas of internal queues, timeout support, multiple devices and error/status reporting to the application program.

REFERENCES

1. J. Biel, D. Burch, R. Dosen, P. Heinicke, M. Pyatetsky, D. Ritchie, V. White, 1982, "High Speed Interprocessor Data Links Using The DR11-W", 1982 Fall DECUS U.S. Symposium, Anaheim, Ca.

2. D. Burch, V. White, R. Dosen, 1982, "An RSX-11M Device Driver Implementing A Network Protocol For The DR11-W", 1982 Fall DECUS U.S. Symposium, Anaheim, Ca.

3. J. Bartlett, J. Biel, D. Curtis, R. Dosen, T. Lagerlund, D. Ritchie, L. Taff, 1979, "RT/RSX MULTI: Packages For Data Acquisition And Analysis In High Energy Physics", IEEE Transactions On Nuclear Science, Vol. NS-26, No. 4, August 1979.