A comparison of sorting routines.

                  B. Z. Lederman		I.T.T. World Communications.

     In the Spring 1981 proceedings, there  was  an  article  comparing  various
sorting  routines.   This article estimated the time various routines would take
by theoretical calculations.  Not long after, we had an aplication which  called
for  sorting a large array in core, and I was interested in finding out just how
long each routine would take, so I wrote all of the routines in Fortran, and ran
them on a lightly loaded PDP-11/70 with RSX-11M (at that time, V3.1 or V3.2).

     In  order  to  test  the  timing,  all  routines  were  compiled  with  the
Fortran-IV-Plus compiler with no trace.  All of the programs use the same base:

        PROGRAM SORT
        IMPLICIT INTEGER (A-Z)
        PARAMETER MAX=2000
        REAL SECNDS, A, DELTA, RAN
        INTEGER*2 D(MAX)
C
        DO 50 I=1,MAX
        A=100.*RAN(1,1)
50      D(I)=NINT(A)
C
        A=SECNDS(0.0)
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
        (one of the sorting modules goes here)
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
        DELTA=SECNDS(A)
        TYPE 60, DELTA
60      FORMAT(1X/'  TIME = ',F14.7/1X)
C
        DO 80 J=25, MAX, 25
        I=J-24
        TYPE 10, (D(K),K=I,J)
10      FORMAT(1X, 25(1X, I2))
80      CONTINUE
C
        STOP
        END


     The program is very simple.  The loop ending  at  50  puts  random  numbers
between  1  and  99 into an array whose size is determined by the parameter MAX,
and the time is obtained from the system.  The chosen  routine  then  sorts  the
array,  and  the  elapsed  time  is  obtained from the system and printed at the
terminal.  The loop ending at 80 prints out the data so I can see if the routine
worked properly.  As the (old) random number generator always generates the same
sequence of pseudo-random numbers, all of the routines work on the same data, so


                                                                          PAGE 2


the timings are valid.  The sorting routines used are:

                                  BUBBLE SORT

        DO 500 K=1, MAX-1
        DO 300 L=1, MAX-1
        IF (D(L).LE.D(L+1)) GOTO 300
        X=D(L)
        D(L)=D(L+1)
        D(L+1)=X
300     CONTINUE
500     CONTINUE

                                  INSERT SORT

        DO 500 K=3, MAX+1
        DO 300 I=K,3,-1
        X=D(I)
        D(1)=X
        J=I-1
        IF (X.GE.D(J)) GOTO 300
        D(I)=D(J)
        D(J)=X
        J=J-1
300     CONTINUE
500     CONTINUE

                                 SELECTION SORT

        DO 500 I=1, MAX-1
        D=D(I)
        DO 300 J=(I+1), MAX
        IF (D(J).GE.X) GOTO 300
        X=D(J)
        D(J)=D(I)
        D(I)=X
300     CONTINUE
500     CONTINUE


                                                                          PAGE 3


                                   SHELL SORT

        M=1
20      IF ((2**M).GT.MAX) GOTO 30
        M=M+1
        GOTO 20
30      M=M-1
        M=2**M
C
100     K=MAX-M
        DO 500 J=1,K
        DO 300 I=J,1,-(M)
        IF (D(I+M).GE.D(I)) GOTO 500
        X=D(I)
        D(I)=D(I+M)
        D(I+M)=X
300     CONTINUE
500     CONTINUE
        M=M/2
        IF (M.GT.0) GOTO 100


                                                                          PAGE 4


                                   TREE SORT

        L=MAX
        DO 100 I=(MAX/2),2,-1
100     CALL UPTREE
        I=1
        DO 200 L=MAX,2,-1
        CALL UPTREE
        X=D(1)
        D(1)=D(L)
200     D(L)=X

The following subroutine is also required.

        SUBROUTINE UPTREE
        IMPLICIT INTEGER (A-Z)
        PARAMETER MAX=2000
        DIMENSION D(MAX)
        COMMON /TR/ D, I, L
C
        A=I
        X=D(A)
300     F=2*A
        IF (F.GT.L) GOTO 500
        IF (F.EQ.L) GOTO 400
        IF (D(F+1).GT.D(F)) F=F+1
400     IF (D(F).LE.X) GOTO 500
        D(A)=D(F)
        A=F
        GOTO 300
500     D(A)=X
        RETURN
        END


     As the article mentioned explains the sorts I will not do  so  here  except
for  a  few  particulars  about  the  implementation.   In  the  Shell sort, the
selection of the jump parameter is important if the sort is to work.  The method
I  used is to set M to be the largest power of two which is not greater than the
number of items to be sorted (MAX).  The routine in fortran  looks  complicated,
but it is not.  When we implemented this sort later in assembly language, it was
very easy, as one simply finds the most significant bit set  in  the  number  of
items,  and  this  is  your  starting  value of M.  For each pass, the jump M is
reduced by a factor of two:  in assembly language,  a  simple  right  shift  one
place  until  M  is  zero  is  all that is required.  In the tree sort, a common
region was set  up  to  pass  the  parameters  from  the  main  routine  to  the
subroutine:   if  the parameters were passed as subroutine arguments (i.e., CALL
UPTREE(D, I, L)), then the sort would take longer as  the  argument  list  would
have to be set up for each call.

     The final test of any sort is how long it takes, and how large  it  is.   I
determined  the  code size from the $CODE psect in the fortran compiler listings
(the tree sort includes both the main routine and the subroutine).  The speed of


                                                                          PAGE 5


the sort was determined for MAX equal to 2000 and 6000 items.  The results were:

SORT            MAX=2000            MAX=6000           $CODE

BUBBLE           39 Seconds         358 Seconds        132

INSERT           38                 355                145

SELECT           17                 155                143

SHELL             2.3                 0.63             184

TREE              1.6                 0.47             191

I believe the  above  figures  speak  for  themselves.   The  only  sorts  worth
considering  are  the  Shell and tree sorts.  As I find the Shell sort easier to
understand, I have been using it exclusively since I first ran these tests.

     I must point out that there is a very subtle and elusive bug in the  INSERT
sort  routine,  which causes exactly one data item to be incorrect.  Because the
insert sort showed no real advantage over the bubble sort, I did not persue this
fault.   Every indication shows that the Shell and tree sorts are so much faster
that there is nothing to be gained by fixing the insert sort.