1 !
	500.22 - STAT11 - RANK0C

		RANK CORRELATION MODULE

	RELEASED FOR SUBMISSION TO THE DECUS LIBRARY BY THE DEC
	ENGINEERING SYSTEMS GROUP AND THE EDUCATION PRODUCTS GROUP
		SEPTEMBER, 1977


2 !	COPYRIGHT (C) 1973, DIGITAL EQUIPMENT CORPORATION,

			MAYNARD, MASSACHUSETTS

3 !	THIS SOFTWARE IS FURNISHED TO PURCHASER UNDER A LICENSE FOR USE
	ON A SINGLE COMPUTER SYSTEM AND CAN BE COPIED (WITH INCLUSION OF
	DEC'S COPYRIGHT NOTICE) ONLY FOR USE IN SUCH SYSTEM, EXCEPT AS
	MAY OTHERWISE BE PROVIDED IN WRITING BY DEC.

4 !	THE INFORMATION IN THIS DOCUMENT IS SUBJECT TO CHANGE WITHOUT
	NOTICE AND SHOULD NOT BE CONSTRUED AS A COMMITMENT BY DIGITAL
	EQUIPMENT CORPORATION.

5 !	DEC ASSUMES NO RESPONSIBILITY FOR USE OR RELIABILITY OF ITS
	SOFTWARE ON EQUIPMENT WHICH IS NOT SUPPLIED BY DEC.



6 !	THIS MODULE PERFORMS RANK CORRELATION, COMPUTING THE KENDALL RANK
	CORRELATION COEFFICIENT, STANDARD DEVIATION, AND SIGNIFICANCE FOR ANY
	TWO VARIABLES IN THE DATA MATRIX.

7 ! 	AUTHOR:		MICHAEL D. KNAUER

	VERSION NUMBER:	001

	DATE:		OCTOBER, 1973



8  !  	MODIFICATIONS:  MAY, 1975

	MODIFIED TO ACCEPT MISSING DATA POINTS

		BY ARDOTH HASSLER WILSON
		   CENTRAL STATE UNIVERSITY
		   EDMOND, OKLAHOMA


00009!	MODIFICATIONS:  JUNE, 1976

	CTRL/C TRAP ADDED

10 !
	CALLING ARGUMENTS

11 !		1) VARIABLE NAME:	F3$
		   RANGE OF VALUES:	S0000.RWM - S99999.RWM
		   USE:			NAME OF THE 250 ROW BY 15 COLUMN
					VIRTUAL DATA MATRIX.

13 !		2) VARIABLE NAME:	R%
		   RANGE OF VALUES:	1 - 250
		   USE:			NUMBER OF ROWS IN DATA MATRIX

15 !		3) VARIABLE NAME:	C%
		   RANGE OF VALUES:	1 - 15
		   USE:			NUMBER OF COLUMNS IN DATA MATRIX



20 !
	RETURNING ARGUMENTS

		NONE -- THIS MODULE DOES NOT CHANGE OR ADD TO ANY OF THE
			FILES OR VARIABLES PASSED TO IT.



30 !	DESCRIPTION OF FUNCTION

31 !		THIS MODULE PERFORMS RANK CORRELATION, COMPUTING THE KENDALL
		RANK CORRELATION COEFFICIENT, STANDARD DEVIATION, AND
		SIGNIFICANCE FOR ANY TWO VARIABLES IN THE DATA MATRIX.

32 !		IT BEGINS BY RANKING THE OBSERVATIONS IN EACH OF THE TWO
		VARIABLES, AND THEN COUNTING THE TIES FOR RANKS.

33 !		THEN IT WILL PRINT THE DATA AND RANKS, IF THE USER DESIRES.

34 !		NEXT IT SORTS THE RANKS FOR VARIABLE A IN ASCENDING
		ORDER, KEEPING THE VARIABLE B RANKS MATCHING.

35 !		THEN IT SCORES THE VARIABLE B RANKS, ESSENTIALLY SEEING HOW
		CLOSE TO PERFECT ASCENDING RANKING ORDER THEY ARE.

36 !		WITH THIS SCORE AND WITH THE COUNTS OF TIES FOR RANKS, IT
		FINALLY COMPUTES AND PRINTS THE KENDALL RANK CORRELATION
		COEFFICIENT, STANDARD DEVIATION, AND SIGNIFICANCE (Z).

37 !		THE USER CAN HAVE MORE RANK CORRELATIONS DONE. OTHERWISE,
		CONTROL IS RETURNED TO STATCM.

40 !














99 !


	FIRST THE CALLING ARGUMENTS ARE RETRIEVED FROM COMMON, THIS DATA
	MATRIX FILE IS OPENED, AND THE USER IS ASKED TO INPUT TWO
	VARIABLES FOR RANK CORRELATION.



100	ON ERROR GOTO 5000: GOSUB 10000: F$=SYS(CHR$(7%))
	: R%=VAL(MID(F$,46%,5%)): C%=VAL(MID(F$,51%,5%))
	: F3$=MID(F$,31%,15%)
110	A9$=MID(F$,62%,9%): A9$=LEFT(A9$,INSTR(5%,A9$,"]")):
	H9=.9E-38
140  OPEN F3$ FOR INPUT AS FILE 3:
  DIM #3%,Z(250%,15%)
160  PRINT "ENTER THE COLUMN NUMBERS OF TWO VARIABLES"
165  INPUT "FOR RANK CORRELATION. SEPARATE THEM WITH A COMMA";V1%,V2%
170  PRINT
180 IF V1%>=1% AND V1%<=C% AND V2%>=1% AND V2%<=C% GOTO 230
190  PRINT "YOUR VARIABLE NUMBERS MUST BE BETWEEN 1 AND"; C%
200  PRINT "PLEASE TRY AGAIN"
205  GOTO 160
229 !


	THIS SECTION MOVES THE DATA POINTS FOR THE TWO VARIABLES FROM THE
	DATA MATRIX (Z) TO WORKING STORAGE ARRAYS (A AND B).



230	N5=0: FOR I%=1% TO R%:
	IF Z(I%,V1%)=H9 OR Z(I%,V2%)=H9 GOTO 250
240	N5=N5+1: A(N5)=Z(I%,V1%): B(N5)=Z(I%,V2%)
250	NEXT I%
260	IF N5=0 THEN PRINT "NO DATA EXISTS TO PERFORM THIS ANALYSIS":
	GOTO 2500
1760 M9=N5*(N5-1%):
 DIM A(250%),R(250%):
 DIM B(250%),T(250%):
 GOSUB 4000
1900 !


	THE PREVIOUS GOSUB 4000 RANKS THE DATA POINTS IN ARRAY A, COMPLETE
	WITH TIED RANKS. THE NEXT GOSUB 4280 COUNTS THE NUMBER OF OBSERVATIONS
	TIED FOR RANKS.



1910 K1=1%:
 GOSUB 4280
1929 !


	AFTER RANKING VARIABLE A, IT SWITCHES THE INFORMATION OF VARIABLES
	(ARRAYS) A AND B SO IT CAN USE THE SAME ROUTINE (BEGINNING AT
	LINE 4000) TO RANK BOTH VARIABLES.



1930 FOR L=1% TO N5:
 R2=A(L):
 A(L)=B(L):
 B(L)=R2:
 T(L)=R(L)
1980 NEXT L
1988 !


	THE NEXT GOSUB 4000 RANKS THE 2ND COLUMN. C8 WAS COMPUTED IN THE
	PREVIOUS GOSUB 4280 AND IS USED TO COMPUTE THE KENDALL RANK
	CORRELATION COEFFICIENT.
1989 !	THE NEXT GOSUB 4280 COUNTS HOW MANY TIES FOR RANKS THERE ARE FOR
	THE 2ND VARIABLE.



1990 GOSUB 4000:
	C9=C8:
	GOSUB 4280
2019 !


	THIS SECTION PRINTS THE DATA AND RANK, IF THE USER DESIRES.



2020 INPUT "WOULD YOU LIKE TO SEE DATA AND RANK";Q$:PRINT
2025 IF Q$="YES" OR LEFT(Q$,1%)="Y" GOTO 2050
2030 IF Q$="NO" OR LEFT(Q$,1%)="N" OR Q$="" GOTO 2150
2035 PRINT "YOU MUST ANSWER EITHER YES OR NO"
2040 GOTO 2020
2050	PRINT:
 PRINT "OBS.    VAR. A    RANK A        VAR. B    RANK B":
 PRINT "................................................."
2080	FOR L%=1% TO N5:
	PRINT L%;TAB(7%);B(L%);TAB(17%);T(L%);TAB(31%);A(L%);TAB(41%);
	R(L%)
2100	NEXT L%:
 PRINT "..................................................":
	PRINT: PRINT
2149 !


	THIS SORTS THE VARIABLE A RANKS (I.E. THE RANKS THAT WERE PRINTED
	OUT UNDER RANK A), KEEPING THE VARIABLE B RANKS MATCHING.



2150 I1=0%:
 FOR I=2% TO N5:
 IF T(I)>=T(I-1%) THEN 2250
2180 I1=1%:
 R2=R(I):
 R(I)=R(I-1%):
 R(I-1%)=R2:
 R2=T(I):
 T(I)=T(I-1%):
 T(I-1%)=R2
2250 NEXT I:
 IF I1>0% THEN 2150
2260 !


	THIS SECTION SCORES THE VARIABLE B RANKS.S1 IS THE TOTAL SCORE FOR
	THE VARIABLE RANKS. WHAT THE SCORE MEASURES IS HOW CLOSE TO BEING
	IN PERFECT ASCENDING ORDER THE RANKS FOR VARIABLE B ARE.
2261 !	FOR EACH R(I), IT ADDS ONE TO S1 FOR EACH LARGER RANK ON DOWN THE
	COLUMN, AND SUBTRACTS ONE FROM S1 FOR EACH SMALLER RANK ON DOWN
	THE COLUMN.



2270 S1=0%:
 FOR I=1% TO N5:
 FOR J=I TO N5:
 IF R(J)>R(I) THEN 2340
2310 IF R(I) = R(J) THEN 2350
2320 S1 = S1 - 1% : GOTO 2350
2340 S1=S1+1%
2350 NEXT J:
 NEXT I:
 T9=S1/(SQR((.5*M9-C8)*(.5*M9-C9))):
 PRINT 
2389 !

	TAU IS COMPUTED ON THE PRECEDING LINE, AND IS PRINTED BY THE NEXT
	LINE. SD AND Z (S8 AND Z1) ARE ALSO COMPUTED AND PRINTED HERE.


2390 PRINT "KENDALL RANK CORRELATION COEFFICIENT (TAU)....";:
     PRINT USING "####.####",T9
2430 S8=SQR((2%*(2%*N5+5%))/(9%*N5*(N5-1%))):
 PRINT "STANDARD DEVIATION (SD).......................";:
 PRINT USING "####.####",S8:
 Z1=T9/S8
2455 PRINT "Z-VALUE TO TEST SIGNIFICANCE (TAU/SD).........";:
 PRINT USING "####.####",Z1 : PRINT:
 PRINT
2499 !


	HERE IS THE MORE RANK CORRELATION QUESTION. IF NO, THEN CONTROL IS
	RETURNED TO STATCM.



2500 INPUT "DO YOU WISH TO PERFORM MORE RANK CORRELATION"; Q$
2510 IF LEFT(Q$,1%)="Y" GOTO 160
2520 IF Q$="NO" OR LEFT(Q$,1%)="N" OR Q$="" GOTO 2550
2530 PRINT "YOU MUST ANSWER EITHER YES OR NO"
2540 GOTO 2500
2550 R$=SYS(CHR$(8%)+F$)
2560 CHAIN "STATCM"+A9$
3999 !


	THIS SECTION RANKS THE OBSERVATIONS FOR A VARIABLE.



4000 REM
4009 !
	FIRST, THE R (RANK) ARRAY IS ZEROED OUT.

4010 N=N5:
 FOR I=1% TO N:
 R(I)=0%
4040 NEXT I:
 FOR I=1% TO N:
 IF R(I)>0% THEN 4260
4041 !
	THE PREVIOUS IF STATEMENT MEANS THAT TIED RANKS NEED NOT BE RECOMPUTED

4068 !
	THE J LOOP COUNTS HOW MANY OBSERVATIONS ARE SMALLER THAN AND EQUAL TO
	A GIVEN DATA POINT.

4070 S=0%:
 E=0%:
 FOR J=1% TO N
4100 IF A(J)>A(I) THEN 4160
4110 IF A(J)=A(I) THEN 4140
4119 !
	S = THE NUMBER OF SMALLER DATA POINTS

4120 S=S+1%:
 GOTO 4160
4139 !
	E = NUMBER OF DATA POINTS EQUAL TO A GIVEN POINT

	EQUAL DATA POINTS ARE GIVEN A TEMPORARY RANK OF -1

4140 E=E+1%:
 R(J)=-1%
4160 NEXT J:
 IF E>1% THEN 4200
4179 !
	A DATA POINT'S RANK IS SET HERE UNLESS IT WAS TIED WITH OTHER POINTS

4180 R(I)=S+1%:
 GOTO 4260
4199 !
	HERE THE JOINT RANK FOR A GROUP OF TIED DATA POINTS IS COMPUTED
	THEN, IN THE N4 LOOP, THE TIED RANKS (REMEMBER THEY WERE SET EQUAL
	TO -1) ARE SET TO P1, THEIR JOINT RANK.

4200 P1=S+E/2+.5000:
 FOR N4=1% TO N:
 IF R(N4)>=0% THEN 4250
4230 R(N4) = P1
4250 NEXT N4
4260 NEXT I:
 RETURN 
4278 !


	THIS SECTION COMPUTES C8, WHICH IS A FUNCTION OF THE COUNTS OF TIED
	RANKS, AND WHICH IS USED IN COMPUTING TAU, THE RANK CORRELATION
	COEFFICIENT.



4280 C8 = 0% : Y = 0%
4289 !
	THIS I LOOP FINDS, FOR EACH STEP, THE NEXT LARGER RANK AND SETS X
	EQUAL TO THAT RANK.SO IT GOES THROUGH FINDING THE RANKS IN ORDER FROM 
	LOWEST TO HIGHEST.

4290 I1 = 0% : X = 999999 : 
     FOR I = 1% TO N :
     IF R(I) <= Y THEN 4360
4330 IF R(I)>=X THEN 4360
4340 X=R(I):
 I1=I1+1%
4360 NEXT I
4369 !
	I1 IS A SWITCH. YOU SEE, WHEN THE HIGHEST RANK HAS BEEN FOUND, THEN
	ON THE NEXT STEP NO HIGHER RANK WILL BE FOUND, THUS I1 WON'T BE
	INCREMENTED AND WILL BE 0 AND SO CONTROL WILL PASS TO 4500.

4370 IF I1<1% THEN 4500
4379 !
	Y IS USED AS THE LOWER BOUND FOR THE NEXT STEP, SO RANKS LOWER THAN
	Y WILL BE SKIPPED OVER.

4380 Y=X
4389 !
	THIS I LOOP COUNTS (USING C1) HOW MANY OTHER RANKS ARE TIED WITH
	A GIVEN RANK.

4390 C1=0:
	FOR I=1% TO N:
	IF R(I)<>X THEN 4430
4420 C1=C1+1%
4430 NEXT I:
 IF C1=0% THEN 4290
4448 !
	IF THERE WERE ANY RANKS TIED WITH THAT RANK, THEN C8 (THE CORRECTION
	FACTOR FOR TIES) IS AUGMENTED. NOTE THAT K1, WHICH IS SET IN LINE
	1910, IS ALWAYS 1. SO THE ON GOTO ALWAYS GOES TO 4460.
4449 !	IN FACT, I DON'T KNOW WHAT THE OTHER EQUATION WOULD EVER BE USED FOR.

4450 ON K1 GOTO 4460,4480%
4460 C8=C8+C1*(C1-1%)/2:
 GOTO 4290
4480 C8=C8+(C1^3-C1)/12:
 GOTO 4290
4500 RETURN 
4999 !


	THIS IS THE ERROR ROUTINE FOR THE USER WHO TRIES TO BE CUTE.IT IS
	USED BY THE VARIABLE NUMBER INPUT ROUTINE.



5000	IF ERR=28% THEN GOSUB 10000: RESUME 2500
5005	PRINT "YOU TYPED NON-NUMERIC CHARACTERS IN THE VARIABLE NUMBERS"
5010 PRINT "PLEASE TYPE ONLY NUMBERS WHEN NUMBERS ARE REQUESTED"
5020 RESUME 190
10000	V0$=SYS(CHR$(6%)+CHR$(-7%)): RETURN	! CTRL/C TRAP
32767 END
