PDP-1 COMPUTER
ELECTRICAL ENGINEERING DEPARTMENT
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
CAMBRIDGE, MASSACHUSETTS 02139

PDP-35 INSTRUCTION MANUAL
PART 4 -- MULTIPROCESSING

9 November 1971

Introduction

As long as no IO operations take place, the "ethic" of a computer is to execute instructions, in the sequence specified by the programmer, as rapidly as possible. When IO operations are present, the computer usually must be slowed down to synchronize with the limited rate of information transfer of the IO devices. The simplest way to conceive of this is to think of the instructions that operate the IO devices as taking a long time to complete. For example, a type-out instruction keeps the computer busy for the entire length of time that the typewriter is operating. Only when the typewriter mechanism is finished with the character does the instruction complete and the computer go on to the next instruction.

This method of doing IO is elegant because it makes the above-mentioned "ethic" still correct: execute instructions as rapidly as possible, in all cases. In most computers, this is one way of actually doing IO at the hardware level; in inexpensive computers, it is sometimes the only way. To make absolutely clear what is meant, we list some IO operations:

Type-out (also paper tape punch, plotter, etc.) - When the instruction begins, the data are sent to the device. The instruction ends when the device has finished and is ready to be operated again.

Type-in - When a key is struck, the data are sent to the computer and the instruction completes.

Paper tape reader - Same as type-in, but the reader mechanism is started when the instruction begins.

Drum or tape read - The operation begins when the instruction does. The instruction completes when the data are safely in core and the checksum has been verified.

Drum or tape write - The operation begins and ends when the instruction does.
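The blocking model just described can be sketched in modern terms. The following Python fragment (the class and names are hypothetical, not part of the original system) models a type-out instruction as a call that returns only when the device mechanism has finished:

```python
import time

class Typewriter:
    """Toy model of an output device whose operation takes real time."""
    def __init__(self, seconds_per_char=0.001):
        self.seconds_per_char = seconds_per_char
        self.printed = []

    def type_out(self, ch):
        # The "instruction": it hangs up the caller for the whole
        # device cycle and completes only when the mechanism is done.
        time.sleep(self.seconds_per_char)
        self.printed.append(ch)

tty = Typewriter()
for ch in "abc":
    tty.type_out(ch)   # each call blocks until the character is typed
```

The loop above is the whole program: the processor spends essentially all of its time inside `type_out`, which is exactly the behavior the time graphs below illustrate.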
A program to type out characters stored in an array might look like this:

    p,  lac array
        ivk 0       /type
        idx p
        jmp p

and a real time graph of its execution would look like this:

    [Time graph: the "type" intervals occupy nearly the whole time
    axis; all the other instructions fit into the brief gaps between
    them.]

(In the PDP-1 timesharing system, IO instructions are interpreted by the supervisor.)

There are several ways multiplexing can recover this wasted time.

1) The supervisor can buffer output, so that a type-out instruction completes as soon as the character is accepted rather than when the typewriter finishes. With this feature, a program that had a large amount of calculation between characters (say a real time printing of "e"), instead of behaving this way:

    compute 1   type 1   compute 2   type 2   compute 3   type 3
    (during the compute intervals, the typewriter is idle)

behaves this way:

    typewriter:  type 1      type 2      type 3
    processor:   compute 1   compute 2   wait   compute 3   wait   compute 4

Similarly, if one didn't care about reliability, one could make a tape writing operation take the data into a buffer and then complete immediately, allowing computation to proceed while the data are moved from the buffer to the tape. The PDP-1 timesharing supervisor does not do this, so that the user will be assured that, when the instruction completes, the transfer has taken place and, for example, the tape did not break. This is a simple case of multiplexing the processor and an output device.

2) Something similar to 1) can be done for input. Suppose a compiler that runs about twice as fast as a tape reader is to make efficient use of the reader. It might operate like this:

    reader:     read 1        read 2        read 3
    processor:  wait   use 1  wait   use 2  wait   use 3

To do this, the read operation must be divided into two instructions: one to initiate the operation and another to wait for completion and obtain the data. The PDP-1 hardware does this. This is an extremely simple example of multiplexing the processor and an input device.

3) Another useful mode of operation is multiplexing two or more IO devices. For example, to obtain a listing of a file on a machine with two typewriters, the file could be divided in half and each half listed separately. Here a real synchronization problem appears, especially if the typewriters run at different speeds.
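The two-instruction read described in 2) can be sketched with threads. In this hypothetical Python model (the `TapeReader` class and its data are illustrative assumptions), `start` initiates the operation in the background and `wait` hangs up the caller only until the data are available:

```python
import threading

class TapeReader:
    """Toy split-transaction reader: start() initiates the operation,
    wait() hangs up the caller until the data are available."""
    def __init__(self, blocks):
        self._blocks = iter(blocks)
        self._done = None
        self._result = None

    def start(self):                 # initiate the operation
        done = threading.Event()
        self._done = done
        def run():
            self._result = next(self._blocks, None)  # the device works...
            done.set()                               # ...then signals completion
        threading.Thread(target=run).start()

    def wait(self):                  # complete it and obtain the data
        self._done.wait()
        return self._result

reader = TapeReader(["blk1", "blk2", "blk3"])
used = []
reader.start()                # read 1 proceeds while we could be computing
for _ in range(3):
    block = reader.wait()     # hang up only if the data are not yet ready
    reader.start()            # initiate the next read immediately
    used.append(block)        # "use" overlaps the next read
```

The loop realizes the graph above: each "use" overlaps the following "read", so the reader never sits idle waiting for the processor.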
4) And, one might want to multiplex the processor with several IO devices. For example, while a batch job is running, the previous job's output could be listed on a line printer (having been stored on a high speed medium such as a disk) and the next job's input could be read in and stored on the disk. Such a state of affairs, called SPOOLing (Simultaneous Peripheral Operation On Line), is used in some of the less medieval batch processing systems. Here the synchronizing problem is even more formidable, especially since all three operations handle unrelated data, and the main program neither knows nor cares when a card has been read or when the printer is ready for another line.

5) The ultimate in multiplexing of unrelated operations is the timesharing system, which not only has to allow for complete generality in the mode of multiplexing of programs and IO devices, but has to allow the mode to change under program control.

There are three common ways of implementing a computer system that can do these things (particularly 3), 4), and 5)). One way is to use an interrupt (sequence break) system, in which IO instructions complete immediately without hanging up the processor. When the IO device completes, it interrupts the processor and starts it on a special "service" program. This program does whatever is necessary, e.g. obtains another line of output and sends it to the printer, and then resumes the original program. Another, much more expensive way is with a "data channel", which is a small processor, independent of the main processor, which executes the IO program (spending most of its time waiting) while the main processor runs at full speed on some other program. The third method is multiprocessing, in which the IO program is executed by what appears to be another main processor, identical to the original one.
Because processors with the full instruction set are expensive, and the processor executing the IO program is idle most of the time, multiprocessing is usually (as in the PDP-1 timesharing system) "faked" with only one real processor which is multiplexed between programs.

Multiprocessing has two principal advantages: It preserves the simple and elegant form of IO instructions given above, viz. an IO instruction hangs up the processor executing it until the operation is complete. And the IO programs are handled by a processor identical with the main processor, which is much more powerful than the typical data channel. This compatibility also simplifies communication between processors.

Multiprocessing is applicable to situations other than IO synchronization. In a system with more than one hardware processor, multiprocessing can be used to advantage in non-IO situations. Because the PDP-1 has only one processor, and the techniques of multiprocessing are the same in the IO and non-IO cases, discussion will be limited here to IO applications. Even so, a subroutine can be considered as an IO device, and in such cases multiprocessing can sometimes bring a conceptual simplification (or a shorter program) which is worth the overhead of multiplexing a single processor.

Here is how application 4) (SPOOLing) would look when multiprocessing is used:

    program 1:  get a line of output from job n-1; print it
    program 2:  execute job n
    program 3:  read card for job n+1; store it

The hardware processor spends almost all of its time executing program 2.

Instructions

The "thing" that runs around in a program executing it is called a virtual processor or, more briefly, a process. Each process has its own set of live registers: A, I, X, F (containing the six program flags and the address mode), G (the program counter, the overflow and extend mode bits), and W (a software register used for communicating with the supervisor). These registers are private to each process; no process can directly examine or modify the live registers of another process.
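The fork-and-private-registers model can be sketched with threads. In this Python analogy (an illustrative assumption, not the supervisor's mechanism), shared core memory becomes a shared object, each thread's local variables play the role of private live registers, and starting a thread plays the role of frk:

```python
import threading

# Core memory is shared between processes; "live registers" (here,
# each thread's local variables) are private to each thread.
shared = {"log": []}
log_lock = threading.Lock()

def second_process():
    # The forked process starts here, like the target of "frk b".
    with log_lock:
        shared["log"].append("second")

# "frk": create a new process; the forking process keeps running.
t = threading.Thread(target=second_process)
t.start()
with log_lock:
    shared["log"].append("first")
t.join()
```

Note that either order of the two appends is possible: nothing about the fork itself says which process runs first, which is exactly the point made below about the scheduler.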
All processes running in the same sphere share the same contents of core memory and the same C-list. The frk instruction creates a new process, which begins execution at the address given in the instruction, while the forking process continues with the next instruction. The qit instruction deletes the process that executes it.

Example

Suppose it is desired to build an array of data and display the data while they are being added to the array. The data are being generated as a function of some input, e.g. type-in. It can be done this way:

    /start here with one process
        frk b
    a,  ivk 200     /get input
        dac i p     /store it
        idx p
        jmp a
    /second process will start here
    b,  law m       /start at beginning of array
        dac q
    c,  sad p       /reached end of array?
        jmp b       /yes
        lac i q     /no, get data
        jdp disp    /display data in desired format
        idx q
        jmp c
        dimension q(1),m(1000)
    p,  m           /pointer to end of array

Conceptually, a time graph of the execution of this program would look like:

    [Time graph: the two processes run side by side; the first
    process stores data continuously while the second displays,
    waiting whenever it catches up with the first.]

Since the PDP-1 has only one real processor, the execution would actually look like:

    [Time graph: after the fork, the single processor alternates
    between the two processes, executing a burst of "store"s, then a
    burst of "display"s, and so on.]

Now suppose it were desired to delete the most recently stored datum by decrementing p:

        law i 1     /(one's complement mode)
        adm p

This would introduce a hazard. Suppose p contained m+101, q contained m+100, and the second process were somewhere in the display subroutine. p would be changed to m+100, q would be idx'ed to m+101, and the display would run off the end of the array, displaying garbage. Note that the bug will not be manifested unless p is decremented just at the time q is pointing at the end of the array. Hence, this program might be tested and found to operate correctly 100 times, which ordinarily would be fairly convincing evidence of its correctness. But it would still have a bug. Hazards such as this are notoriously difficult to weed out in the normal process of debugging, because they are nondeterministic. Moreover, the odds of the hazard manifesting itself may be changed drastically by minor changes in parts of the program that have nothing to do with the real bug.
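The hazardous interleaving just described can be replayed deterministically. This Python sketch (names and data are illustrative assumptions) executes the three fatal steps in exactly the order the text gives, with p and q playing their roles from the listing:

```python
# Replay of the hazardous interleaving, one step at a time.
# 'array' plays the role of m(1000); 101 data are currently stored.
array = [f"datum{i}" for i in range(101)]   # stored data: array[0..100]
p = 101          # first process: end-of-array pointer (m+101)
q = 100          # second process: display pointer (m+100)

displayed = []

# Step 1: the first process deletes the last datum by decrementing p.
p = p - 1        # p is now 100

# Step 2: the second process, already past its "sad p" end test,
# displays array[q] and advances q.
displayed.append(array[q])   # displays the datum that was just deleted
q = q + 1                    # q is now 101, past the new end

# Step 3: the end test "sad p" now compares q=101 with p=100; they are
# not equal, so the display does NOT stop -- it has run off the end.
ran_off_end = (q != p and q > p)
```

Had step 1 occurred at almost any other moment, the end test would have stopped the display normally, which is why trial runs are so unlikely to expose the bug.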
Because hazards may exist without immediately making themselves evident, those parts of multiprocessing programs concerned with process synchronization and control must be debugged on theoretical grounds rather than with trial runs. This memo contains many examples of properly written multiprocessing programs.

The scheduler is that part of the timesharing supervisor responsible for multiplexing the processor among those processes which can run. The process, and not the processor, is of concern to programmers. It is tempting to make assumptions about the likelihood of, say, one process executing a large number of instructions before another process has executed any, but to assume so is to invite hazards. The only guarantee the programmer has regarding the scheduler is the following: a process not hung in a wait will eventually execute its next instruction.

Process Synchronization and Control

Synchronization and control is a very important part of multiprocessing. Just as the stored program computer offers an immense variety of ways to realize a given algorithm, there are many ways to realize a solution to a control problem. Since core memory and the C-list are the only things that are shared between processes, all methods of synchronization use one of these two means.

One sequence that is very common in program control is called join:

        isp c
        qit

It joins n processes into one; the resultant process does not emerge until all n processes have entered. If the variable c is initialized to -n, the first n-1 processes that come to the join will quit, leaving -1 in c. The last process will skip over the qit.

In the above, c is a process control variable (sometimes called a semaphore). Process control variables are often used with the idx, adm, and isp instructions, for reasons that will become clear later.

Join presents another opportunity for a hazard. The initialization of the counter must take place before the first fork. If it is done this way:
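The isp/qit join can be sketched with threads. In this Python analogy (an illustrative assumption), a lock stands in for the indivisibility of the isp instruction, and a thread returning stands in for qit; only the last arrival "emerges":

```python
import threading

n = 4
c = -n                      # the process control variable, initialized to -n
c_lock = threading.Lock()
joined = threading.Event()  # stands for the code past the join

def join_point():
    """isp c / qit: increment c; only the process that brings it to
    zero skips the quit and emerges."""
    global c
    with c_lock:            # isp is one indivisible instruction; a lock fakes that
        c += 1
        last = (c == 0)
    if last:
        joined.set()        # the last process emerges past the "qit"
    # otherwise the process simply quits (the thread returns)

threads = [threading.Thread(target=join_point) for _ in range(n)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

As in the manual's version, the counter must be initialized before any of the n processes is forked, or the hazard described next appears.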
        frk a
        law i 2     /(one's complement mode)
        dac c
        ...
        jmp j
    a,  ...
    j,  isp c
        qit
        ...

the process at "a" might do whatever it has to do and get all the way to the isp before the other process initializes c. Then, depending on what happened to be in c, no process would emerge, or one would emerge prematurely. While this may seem farfetched, it isn't. The effects of this bug may be easily demonstrated in practice.

3 Reentrant Programming

It is often desirable to have more than one process executing the same section of code, without any particular synchronization. At any instant, each process may be at any point in the section of code, independent of the other processes. Such code is called reentrant or "pure".

Suppose that three typewriters are available, and a file is to be printed by being divided into thirds and each third printed separately. For simplicity, the file will be just an array. It could be done as follows:

    /one process starts here
        nam
        frk g2
        frk g3
    g1, lac i p1
        ivk 1
        idx p1
        sas e1
        jmp g1
        jmp j
    g2, lac i p2
        ivk 2
        idx p2
        sas e2
        jmp g2
        jmp j
    g3, lac i p3
        ivk 3
        idx p3
        sas e3
        jmp g3
    j,  isp c
        qit
        dsm
    c,  -3
        dimension a(60)  /contains the file
    p1, a
    p2, a+20
    p3, a+40
    e1, a+20
    e2, a+40
    e3, a+60

The three copies of the printing loop are identical except for their addresses. A single reentrant copy can serve all three processes if each process keeps its variables in a private word (the array w has one word per process) and selects the proper ivk with an xct:

    g,  aam
        lac i p
        dac ...
        ...
        dac i w
        mul i w
        dimension w(3)

Another bug would be introduced if we tried to use a program-modified ivk instead of an xct:

        lio (ivk 1
        ...
        dio .+1
        0           /ivk 1, 2, or 3

This sequence could be interfered with between the time the ivk is stored and the time it is executed. Program-modified instructions nearly always lead to hazards in reentrant programs. The jsp instruction is useful in reentrant programs, because jdp and jda lead to hazards.

4 Example

We now turn to a more involved example, a buffer. When used for output, this will "cushion" irregularities in the rate at which characters are produced, enabling the output device to run at full speed.
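The reentrancy idea translates directly: one copy of the loop, with every variable it modifies kept private to the executing process. In this Python sketch (the function and data names are illustrative assumptions), the pointer p becomes a local variable, so any number of threads may be inside the same code at once:

```python
import threading

a = [f"char{i}" for i in range(60)]     # the file, as in the example
outputs = {1: [], 2: [], 3: []}         # stands for typewriters 1, 2, 3

def list_third(tty, start, end):
    """One reentrant copy of the printing loop: every variable it
    modifies (p) is local to the executing process, so all three
    processes can run it simultaneously without interference."""
    p = start                           # p1/p2/p3 become a local variable
    while p != end:
        outputs[tty].append(a[p])       # the "ivk" for this typewriter
        p += 1

threads = [threading.Thread(target=list_third, args=(k, (k - 1) * 20, k * 20))
           for k in (1, 2, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Contrast this with modifying the code itself (the program-modified ivk above): here nothing shared is written, so no interleaving of the three processes can cause a hazard.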
One process takes characters produced somewhere and inserts them into the buffer. Another process takes the characters from the buffer and outputs them. Since the insertion takes very little time, the process generating characters doesn't have to wait, unless the buffer becomes full.

    process 1: generate, put --> [buffer] --> process 2: get, output character

Process 1 sends data to process 2 through the buffer. Both processes must also send control information to each other. More specifically, when the buffer becomes full, process 1 quits until space becomes available, and must be restarted (with a fork) when process 2 removes the next character. Similarly, when the buffer becomes empty, process 2 must quit until process 1 inserts the next character.

Let the capacity of the buffer be n and the variable p indicate its current level. The following will work:

    /start here with one process
    a,  [generate]
        [put]
        idx p
        sad (n
        qit
        sas (1
        jmp a
    f,  frk a
    b,  [get]
        [use]
        law i 1
        adm p
        sza i
        qit
        sas (n-1
        jmp b
        jmp f
    p,  0

After checking that the generate, put, get, and use routines work properly, it is necessary to prove that the multiprocessing is correct. In this case, it must be shown that

(1) The number of puts will not exceed the number of gets + n (the buffer will not overflow).

(2) The number of gets will not exceed the number of puts (the buffer will not underflow).

This is achieved by making sure that no process will run in a section of the program that will violate the above rules, i.e. no process will be in the generate/put section if the buffer is full, and none will be in the get/use section if the buffer is empty. Furthermore, it must be shown that

(3) No process will be unnecessarily deleted, i.e., the generate/put process will be removed only if the buffer is full, and the get/use process will be removed only if the buffer is empty.
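The quit-and-fork discipline above can be sketched with threads. This Python model is an illustrative assumption, and it departs from the listing in one way: where the original producer falls through f and becomes the consumer, here a process simply forks the process that has quit. A lock stands in for the indivisibility of idx and adm:

```python
import threading

n = 8                       # buffer capacity
buf = []                    # the buffer itself, in shared "core memory"
p_lock = threading.Lock()   # idx/adm are indivisible; a lock fakes that
source = iter(range(50))    # characters to generate (hypothetical input)
sink = []                   # where [use] delivers them
threads = []

def spawn(fn):              # "frk": start a new process
    t = threading.Thread(target=fn)
    threads.append(t)
    t.start()

def producer():             # the code at "a"
    while True:
        ch = next(source, None)        # [generate]
        if ch is None:
            return                     # input exhausted
        with p_lock:
            buf.append(ch)             # [put]; idx p
            level = len(buf)
        if level == n:
            return                     # buffer full: qit until restarted
        if level == 1:
            spawn(consumer)            # buffer was empty: restart consumer

def consumer():             # the code at "b"
    while True:
        with p_lock:
            ch = buf.pop(0)            # [get]; law i 1 / adm p
            level = len(buf)
        sink.append(ch)                # [use]
        if level == n - 1:
            spawn(producer)            # buffer was full: restart producer
        if level == 0:
            return                     # buffer empty: qit until restarted

spawn(producer)
i = 0
while i < len(threads):                # join every process ever forked
    threads[i].join()
    i += 1
```

The same reasoning the manual applies to the listing applies here: a quit happens only at full or empty, and a restart happens only on the transition away from full or empty, so at most one producer and one consumer are ever active.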
The usual method of proving such a thing is to divide the flow chart into regions, such that each transition between regions is simultaneous with a modification of a semaphore, and then correlate the numbers of processes in the various regions with the semaphore states.

    [Flow chart, in outline: Region A contains [generate] and [put].
    A process in A increments p; if p becomes n it quits; if p
    becomes 1 it passes into region F; otherwise it stays in A.
    Region F contains the frk: the process there creates a new
    process in A and itself passes into region B. Region B contains
    [get] and [use]. A process in B decrements p; if p becomes 0 it
    quits; if p becomes n-1 it passes into region F; otherwise it
    stays in B. The program starts with p=0 and one process in A.]

The only possible states are:

    p       no. in A    no. in B    no. in F
    0       1           0           0           (start)
    1       0           0           1
    1       1           1           0
    2       1           1           0
    ...
    n-2     1           1           0
    n-1     1           1           0
    n-1     0           0           1
    n       0           1           0

Exercise: Show that no other states can be reached, by considering the effect of any instruction execution on each state. Conclude that, since "get" is not executed when p=0 and p is increased to 1 after "put", the buffer will not underflow. Conclude similarly that the other two requirements are met.
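The exercise can be mechanized. The following Python sketch (an illustrative assumption, not part of the original memo) encodes the region transitions above and exhaustively enumerates every reachable state, then checks the three requirements in each one:

```python
n = 8   # buffer capacity; any n >= 3 gives the same shape of result

START = (0, 1, 0, 0)   # (p, processes in A, in B, in F)

def successors(state):
    """All states reachable by one region transition of the flow chart."""
    p, a, b, f = state
    out = []
    if a:                       # a process in A does [put], incrementing p
        q = p + 1
        if q == n:   out.append((q, a - 1, b, f))       # full: qit
        elif q == 1: out.append((q, a - 1, b, f + 1))   # was empty: into F
        else:        out.append((q, a, b, f))
    if f:                       # the process in F forks into A, passes into B
        out.append((p, a + 1, b + 1, f - 1))
    if b:                       # a process in B does [get], decrementing p
        q = p - 1
        if q == 0:       out.append((q, a, b - 1, f))       # empty: qit
        elif q == n - 1: out.append((q, a, b - 1, f + 1))   # was full: into F
        else:            out.append((q, a, b, f))
    return out

# Exhaustive search for every reachable state.
reached, frontier = {START}, [START]
while frontier:
    for t in successors(frontier.pop()):
        if t not in reached:
            reached.add(t)
            frontier.append(t)

# The requirements, checked in every reachable state:
for p, a, b, f in reached:
    assert 0 <= p <= n          # the level never leaves its bounds
    assert a == 0 or p < n      # no process can put into a full buffer
    assert b == 0 or p > 0      # no process can get from an empty buffer
    assert a <= 1 and b <= 1 and f <= 1
```

Running the search reproduces exactly the state table above: the start state, the two states with a lone process in F, the run of (k, 1, 1, 0) states for k from 1 to n-1, and the full state (n, 0, 1, 0).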