A comparison of sorting routines. B. Z. Lederman I.T.T. World Communications. In the Spring 1981 proceedings, there was an article comparing various sorting routines. This article estimated the time various routines would take by theoretical calculations. Not long after, we had an aplication which called for sorting a large array in core, and I was interested in finding out just how long each routine would take, so I wrote all of the routines in Fortran, and ran them on a lightly loaded PDP-11/70 with RSX-11M (at that time, V3.1 or V3.2). In order to test the timing, all routines were compiled with the Fortran-IV-Plus compiler with no trace. All of the programs use the same base: PROGRAM SORT IMPLICIT INTEGER (A-Z) PARAMETER MAX=2000 REAL SECNDS, A, DELTA, RAN INTEGER*2 D(MAX) C DO 50 I=1,MAX A=100.*RAN(1,1) 50 D(I)=NINT(A) C A=SECNDS(0.0) C CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC C (one of the sorting modules goes here) C CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC C DELTA=SECNDS(A) TYPE 60, DELTA 60 FORMAT(1X/' TIME = ',F14.7/1X) C DO 80 J=25, MAX, 25 I=J-24 TYPE 10, (D(K),K=I,J) 10 FORMAT(1X, 25(1X, I2)) 80 CONTINUE C STOP END The program is very simple. The loop ending at 50 puts random numbers between 1 and 99 into an array whose size is determined by the parameter MAX, and the time is obtained from the system. The chosen routine then sorts the array, and the elapsed time is obtained from the system and printed at the terminal. The loop ending at 80 prints out the data so I can see if the routine worked properly. As the (old) random number generator always generates the same sequence of pseudo-random numbers, all of the routines work on the same data, so PAGE 2 the timings are valid. The sorting routines used are: BUBBLE SORT DO 500 K=1, MAX-1 DO 300 L=1, MAX-1 IF (D(L).LE.D(L+1)) GOTO 300 X=D(L) D(L)=D(L+1) D(L+1)=X 300 CONTINUE 500 CONTINUE INSERT SORT DO 500 K=3, MAX+1 DO 300 I=K,3,-1 X=D(I) D(1)=X J=I-1 IF (X.GE.D(J)) GOTO 300 D(I)=D(J) D(J)=X J=J-1 300 CONTINUE 500 CONTINUE SELECTION SORT DO 500 I=1, MAX-1 D=D(I) DO 300 J=(I+1), MAX IF (D(J).GE.X) GOTO 300 X=D(J) D(J)=D(I) D(I)=X 300 CONTINUE 500 CONTINUE PAGE 3 SHELL SORT M=1 20 IF ((2**M).GT.MAX) GOTO 30 M=M+1 GOTO 20 30 M=M-1 M=2**M C 100 K=MAX-M DO 500 J=1,K DO 300 I=J,1,-(M) IF (D(I+M).GE.D(I)) GOTO 500 X=D(I) D(I)=D(I+M) D(I+M)=X 300 CONTINUE 500 CONTINUE M=M/2 IF (M.GT.0) GOTO 100 PAGE 4 TREE SORT L=MAX DO 100 I=(MAX/2),2,-1 100 CALL UPTREE I=1 DO 200 L=MAX,2,-1 CALL UPTREE X=D(1) D(1)=D(L) 200 D(L)=X The following subroutine is also required. SUBROUTINE UPTREE IMPLICIT INTEGER (A-Z) PARAMETER MAX=2000 DIMENSION D(MAX) COMMON /TR/ D, I, L C A=I X=D(A) 300 F=2*A IF (F.GT.L) GOTO 500 IF (F.EQ.L) GOTO 400 IF (D(F+1).GT.D(F)) F=F+1 400 IF (D(F).LE.X) GOTO 500 D(A)=D(F) A=F GOTO 300 500 D(A)=X RETURN END As the article mentioned explains the sorts I will not do so here except for a few particulars about the implementation. In the Shell sort, the selection of the jump parameter is important if the sort is to work. The method I used is to set M to be the largest power of two which is not greater than the number of items to be sorted (MAX). The routine in fortran looks complicated, but it is not. When we implemented this sort later in assembly language, it was very easy, as one simply finds the most significant bit set in the number of items, and this is your starting value of M. For each pass, the jump M is reduced by a factor of two: in assembly language, a simple right shift one place until M is zero is all that is required. In the tree sort, a common region was set up to pass the parameters from the main routine to the subroutine: if the parameters were passed as subroutine arguments (i.e., CALL UPTREE(D, I, L)), then the sort would take longer as the argument list would have to be set up for each call. The final test of any sort is how long it takes, and how large it is. I determined the code size from the $CODE psect in the fortran compiler listings (the tree sort includes both the main routine and the subroutine). The speed of PAGE 5 the sort was determined for MAX equal to 2000 and 6000 items. The results were: SORT MAX=2000 MAX=6000 $CODE BUBBLE 39 Seconds 358 Seconds 132 INSERT 38 355 145 SELECT 17 155 143 SHELL 2.3 0.63 184 TREE 1.6 0.47 191 I believe the above figures speak for themselves. The only sorts worth considering are the Shell and tree sorts. As I find the Shell sort easier to understand, I have been using it exclusively since I first ran these tests. I must point out that there is a very subtle and elusive bug in the INSERT sort routine, which causes exactly one data item to be incorrect. Because the insert sort showed no real advantage over the bubble sort, I did not persue this fault. Every indication shows that the Shell and tree sorts are so much faster that there is nothing to be gained by fixing the insert sort.