Caught in the Act

All programmers encounter program errors. The majority are simple to track down and fix: there is a straight line you can trace back from the point where the error became visible to the point where it actually occurred.

There are two parts to any program error. The physical error is the point where the program faulted. In RSX systems, this usually means a task abort, an infinite loop, or some other abnormal behavior. The logical error is the actual point in the program which must be corrected. Debugging a program is simply a matter of tracing back in time from the physical error to the logical error.

Some fraction of program errors defy solution. There may not be any tracks left by the time the physical error occurs. All you know for sure is that sometime since the system was booted, a value in global common was corrupted or a vital record in the master database file was overwritten. Such errors are analogous to a lightning bolt striking a tree. It is obvious from the splinters what the damage is, but it is impossible to trace back up to the exact point in the sky where the bolt originated.

Once you have established the symptoms of a problem, there are a variety of debugging techniques which let you catch the guilty party in the act.

Fortran Traceback
------- ---------

One error which is difficult to trace is a corrupted variable in a common area. If you are programming in Fortran, you can use the Fortran module traceback mechanism to track down the guilty module. This technique lets you check the problem variable before and after each subroutine call. If a bad value is detected, you can trap the program immediately.

Unless a Fortran module is compiled with /TR:NONE, the first action any Fortran module takes is to call the NAM$ routine to add the module's name to the traceback list. When the subroutine exits, control is passed back to NAM$ and the module name is removed from the list.
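The entry and exit bookkeeping described above can be modeled in a few lines of Python. This is only a conceptual sketch, not the NAM$ code: the names `traced`, `check_watch`, `common_bar`, and `WatchError` are invented for illustration, and a Python decorator stands in for the call that the Fortran compiler plants at the top of each module.

```python
# Conceptual model only. The real mechanism is the Fortran runtime's NAM$
# routine; every name in this sketch is hypothetical.

class WatchError(Exception):
    """Raised when the watched value is found corrupted."""


traceback_list = []        # active module names, innermost last
common_bar = {"FOO": 1}    # stand-in for variable FOO in a common area


def check_watch(when):
    # Assumed rule for this sketch: FOO must never be zero.
    if common_bar["FOO"] == 0:
        raise WatchError(
            f"FOO is zero {when} {traceback_list[-1]}; "
            f"active modules: {traceback_list}"
        )


def traced(func):
    """Wrap a routine so entry and exit maintain the traceback list."""
    def wrapper(*args, **kwargs):
        traceback_list.append(func.__name__)    # entry: add module name
        try:
            check_watch("on entry to")
            result = func(*args, **kwargs)
            check_watch("on exit from")
            return result
        finally:
            traceback_list.pop()                # exit: remove module name
    return wrapper


@traced
def good():
    return 1


@traced
def bad():
    common_bar["FOO"] = 0    # the logical error we want to catch


@traced
def main():
    good()
    bad()


def run_demo():
    """Run the sample call chain and return the trap report, if any."""
    common_bar["FOO"] = 1
    try:
        main()
    except WatchError as err:
        return str(err)
    return None
```

Calling `run_demo()` returns a report naming `bad` as the routine that was active when the watched value went wrong. Because the check runs on both entry and exit, the fault is pinned to one routine rather than to everything executed since the task started.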
By adding some code to NAM$, you can use it to check for the problem symptoms on entry to and exit from each subroutine. Thus you can narrow the logical error down to a specific module.

Figure 1 shows the disassembled Macro-11 code for NAM$ (comments added by the author). Figure 2 shows how NAM$ was changed to track down a specific problem. The updated NAM.OBJ module is included explicitly in the task build of the target program. In this case, variable FOO in common area BAR should never be set to zero. The Fortran listing shows that FOO is offset 102 (octal) bytes from the start of common area BAR. When a zero value is detected, Fortran error 98 (user-declared error) is declared. The module listed first in the traceback chain has set FOO to zero.

T-Bit Trap
----- ----

The technique above works only because of the method used to implement Fortran module traceback. Some other technique must be used if you are coding in Macro-11 or in a language which does not have a traceback facility similar to Fortran's.

Although it generates a fair amount of overhead, it is possible to trace every instruction executed by a task. The trace bit (T-bit) in the Processor Status Word causes a trap after every instruction. One little-known debugging aid supplied by Digital that uses this feature is LB:[1,1]TRACE.OBJ. When linked to a task, TRACE outputs a register dump for every instruction to the console listing device (CL:). The routines traced can be controlled by task build parameters. TRACE is fully documented in Chapter 6 of the IAS/RSX-11 ODT Reference Manual.

The Digital TRACE module produces volumes of output. It may be simpler to code a trace module which watches for your specific error. Figure 3 shows a trace module (TRACE.MAC) which again checks that variable FOO remains non-zero and halts when the error is detected. In this case, the technique detects the exact instruction causing the error.

The tracing module is designed to be linked to the target task as a debugging aid.
This is usually done by including it in the task build command file as TRACE.OBJ/DA. When the task is started, the tracing module is called. The trace module first establishes a T-trap handler using the SVDB$ directive. The T-bit is then set and control is passed to the actual task entry point by faking an interrupt return. RSX traps after every instruction and passes control to the T-trap handler. If no error is detected, the handler simply returns using the RTT instruction instead of RTI, which prevents an immediate trap. While the sample trace module halts on an error, you can modify it to take whatever action is appropriate.

Fortran Execution Profile
------- --------- -------

The Fortran traceback mechanism and the T-bit trap can also be used to profile a Fortran program's execution. The NAM$ routine lets you total the number of times each subroutine is called. The T-bit trap lets you count each instruction executed. The module TRACK.MAC combines the two techniques.

You start execution profiling at the appropriate point in your task with a call to TRKBEG. This routine takes three arguments: the number of subroutine counting entries, the starting address of the entries, and an overflow entry. Each subroutine entry takes three double-precision words: the Radix-50 subroutine name, the number of times the routine is called, and the number of instructions executed. All values should initially be set to zero. The execution profile is stopped with a call to TRKEND. The program TEST shows a simple example of how TRACK can be used to profile execution.

This technique can be used to find out where a program is spending most of its execution time. It is also useful for comparing two or more different algorithms.
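The shape of such a profile table can be sketched in Python. This is a loose analogy, not a rendering of TRACK.MAC: Python's `sys.settrace` hook stands in for both NAM$ (its "call" events) and the T-bit trap (its "line" events, which count source lines rather than machine instructions). The names `Profile`, `run_profiled`, and the sample routines are invented for illustration.

```python
import sys


class Profile:
    """Fixed-size table of [calls, steps] entries plus one overflow entry."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = {}          # routine name -> [calls, steps]
        self.overflow = [0, 0]     # collects routines that do not fit

    def _entry(self, name):
        if name in self.entries:
            return self.entries[name]
        if len(self.entries) < self.max_entries:
            self.entries[name] = [0, 0]
            return self.entries[name]
        return self.overflow

    def trace(self, frame, event, arg):
        name = frame.f_code.co_name
        if event == "call":        # analogous to the NAM$ entry hook
            self._entry(name)[0] += 1
        elif event == "line":      # analogous to one T-bit trap
            self._entry(name)[1] += 1
        return self.trace          # keep tracing inside this frame


def run_profiled(max_entries, func):
    """Profile one call of func and return the filled-in table."""
    profile = Profile(max_entries)
    previous = sys.gettrace()
    sys.settrace(profile.trace)    # start profiling
    try:
        func()
    finally:
        sys.settrace(previous)     # stop profiling
    return profile


def short_loop():
    total = 0
    for i in range(10):
        total += i
    return total


def long_loop():
    total = 0
    for i in range(100):
        total += i
    return total


def workload():
    short_loop()
    short_loop()
    long_loop()
```

Running `run_profiled(3, workload)` gives one entry per routine, with `short_loop` called twice and `long_loop` once but accumulating more steps. With only two entries available, `long_loop` falls into the overflow entry instead, which is the role the overflow argument appears to play in the interface described above.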