WESTERN MICHIGAN UNIVERSITY
COMPUTER CENTER
LIBRARY PROGRAM #1.1.6

CALLING NAME:  CROSS
PREPARED BY:   SAM ANEMA
APPROVED BY:   JACK R. MEAGHER
DATE:          AUGUST, 1972 (VERSION 3)

                            PDP-10 NUCROS

1.0 PURPOSE

THIS PROGRAM IS A MODIFICATION OF A PROGRAM TAKEN FROM THE BOOK "DATA
PROCESSING" BY K. JANDA, NORTHWESTERN UNIVERSITY PRESS. IT PERFORMS
CROSS TABULATIONS OF TWO VARIABLES, WITH THE OPTION OF USING UP TO TWO
OTHER VARIABLES AS CONTROL VARIABLES. IT CAN CALCULATE ROW AND COLUMN
PERCENTAGES, CHI-SQUARES, AND CORRELATION COEFFICIENTS (KENDALL'S TAU-A
AND TAU-B, SOMERS' DXY AND DYX, AND GAMMA). FOR AN EXPLANATION OF THESE
CORRELATION COEFFICIENTS, SEE ROBERT H. SOMERS, "A NEW ASYMMETRIC
MEASURE OF ASSOCIATION FOR ORDINAL VARIABLES," AMERICAN SOCIOLOGICAL
REVIEW, VOLUME 27, DECEMBER 1962. IT CAN ALSO RECODE DATA BY GROUPING
CONTIGUOUS OR NON-CONTIGUOUS LEVELS WITHIN VARIABLES.

2.0 USER OPTIONS

DATA INPUT TO THIS PROGRAM MAY COME FROM CARDS, DISK, TAPE, DECTAPE, OR
THE TERMINAL. CONTROL INFORMATION MAY COME FROM CARDS OR THE TERMINAL.
OUTPUT MAY GO TO THE TERMINAL, DISK, TAPE, DECTAPE, OR LINE PRINTER.

3.0 LIMITATIONS

1. THE MAXIMUM NUMBER OF VARIABLES PROCESSED AT ONE TIME IS 40.
2. THE MAXIMUM NUMBER OF TABLES CREATED IN ONE RUN IS 72.
3. THE MAXIMUM NUMBER OF UPPER BOUNDARIES FOR TYPE 04 OR 08 RECODING
   IS 16.
4. THERE IS NO LIMIT ON THE NUMBER OF CASES.

4.0 METHOD OF USE

THE FOLLOWING IS A SUMMARY OF THE QUESTIONS TO BE ANSWERED IN THE CROSS
PROGRAM. EACH PROMPT IS FOLLOWED BY A SAMPLE USER RESPONSE, AND EVERY
RESPONSE IS ENDED BY PRESSING THE RETURN KEY. A LINE-BY-LINE
EXPLANATION FOLLOWS THE SUMMARY.

         .R CROSS
LINE 1   OUTPUT? TTY:
LINE 2   INPUT? TTY:
LINE 3   FORMAT: (I-TYPE ONLY)
LINE 4   (3I1)
LINE 5   ENTER IDENTIFICATION
LINE 6   TRIAL RUN
LINE 7   ENTER PROBLEM CONTROL PARAMETERS
LINE 8   15,3,2,1,1,1,1,1
LINE 9   ENTER MAXIMUM VALUES
LINE 10  020202
LINE 11  ENTER TABLE CONTROL PARAMETERS
         0101
         0203
LINE 12  0000
         0000
LINE 13  ENTER TWO RECODING LINES
LINE 14  080808
LINE 15  000000
LINE 16  ENTER GROUP BOUNDARIES
LINE 17  2,4,6,8,9
LINE 18  ENTER CATEGORIZATION
LINE 19  1,2,1,2,1
LINE 20  ENTER GROUP BOUNDARIES
LINE 21  2,4,6,8,9
LINE 22  ENTER CATEGORIZATION
LINE 23  1,2,1,2,1
LINE 24  ENTER GROUP BOUNDARIES
LINE 25  2,4,6,8,9
LINE 26  ENTER CATEGORIZATION
LINE 27  1,2,1,2,1
LINE 28  ENTER VARIABLE NAMES
         AGE
LINE 29  SEX
         LENGTH OF HAIR
LINE 30  ENTER DATA
         123
         456
         789
         987
         654
         321
         369
LINE 31  258
         147
         741
         852
         963
         126
         324
         357
         [OUTPUT]
LINE 32  INPUT? FINISH

THE FOLLOWING IS AN EXPLANATION OF THE ABOVE SUMMARY.

LINE 1  OUTPUT?
LINE 2  INPUT?

LINES 1 AND 2 DEFINE WHERE THE USER INTENDS TO WRITE HIS OUTPUT FILE
(LINE 1) AND FROM WHERE HE EXPECTS TO READ HIS INPUT DATA (LINE 2). SEE
NOTE (2) BELOW FOR OTHER INPUT OPTIONS. THE PROPER RESPONSE TO EACH OF
THESE QUESTIONS CONSISTS OF THREE BASIC PARTS: A DEVICE, A FILENAME, AND
A PROJECT-PROGRAMMER NUMBER. THE GENERAL FORMAT FOR THESE THREE PARTS
IS:

     DEV:FILE.EXT[PROJ,PROG]

1) DEV:      ANY OF THE FOLLOWING DEVICES MAY BE USED WHERE INDICATED:

             DEVICE LIST

             STATEMENT   DEFINITION         USE
             TTY:        TERMINAL           INPUT OR OUTPUT
             DSK:        DISK               INPUT OR OUTPUT
             CDR:        CARD READER        INPUT ONLY
             LPT:        LINE PRINTER       OUTPUT ONLY
             DTA0:       DECTAPE 0          INPUT OR OUTPUT
             DTA1:       DECTAPE 1          INPUT OR OUTPUT
             DTA2:       DECTAPE 2          INPUT OR OUTPUT
             DTA3:       DECTAPE 3          INPUT OR OUTPUT
             DTA4:       DECTAPE 4          INPUT OR OUTPUT
             DTA5:       DECTAPE 5          INPUT OR OUTPUT
             DTA6:       DECTAPE 6          INPUT OR OUTPUT
             DTA7:       DECTAPE 7          INPUT OR OUTPUT
             MTA0:       MAGNETIC TAPE 0    INPUT OR OUTPUT
             MTA1:       MAGNETIC TAPE 1    INPUT OR OUTPUT

             INPUT MAY NOT BE DONE FROM THE LINE PRINTER, NOR MAY OUTPUT
             GO TO THE CARD READER.

2) FILE.EXT  IS THE NAME AND EXTENSION OF THE FILE TO BE USED.
             THIS PART OF THE SPECIFICATION IS USED ONLY WITH DISK OR
             DECTAPE.

3) [PROJ,PROG]  IF DISK IS USED AND THE USER WISHES TO READ A FILE IN
             ANOTHER PERSON'S DIRECTORY, HE MAY DO SO BY SPECIFYING THE
             PROJECT-PROGRAMMER NUMBER OF THAT DIRECTORY. THE PROJECT
             NUMBER AND THE PROGRAMMER NUMBER MUST BE SEPARATED BY A
             COMMA AND ENCLOSED IN BRACKETS. OUTPUT MUST GO TO YOUR OWN
             AREA.

EXAMPLE:

     OUTPUT? LPT:/2
     INPUT? DSK:DATA.DAT[71171,71026]

IN THE EXAMPLE, TWO COPIES OF THE OUTPUT ARE TO BE PRINTED ON THE
HIGH-SPEED LINE PRINTER. THE INPUT DATA IS THE DISK FILE DATA.DAT IN
USER DIRECTORY [71171,71026].

DEFAULTS:

1) IF A FILENAME IS SPECIFIED BUT NO DEVICE, THE DEFAULT DEVICE IS DSK:.
2) IF NO FILENAME IS SPECIFIED AND A DISK OR DECTAPE IS USED, THE
   DEFAULT IS INPUT.DAT ON INPUT AND OUTPT.DAT ON OUTPUT.
3) IF THE PROGRAM IS RUN FROM THE TERMINAL AND NO SPECIFICATION IS GIVEN
   (JUST A CARRIAGE RETURN), BOTH THE INPUT AND OUTPUT DEVICES WILL BE
   THE TERMINAL.
4) IF THE PROGRAM IS RUN THROUGH BATCH AND NO SPECIFICATION IS GIVEN (A
   BLANK CARD), THE INPUT DEVICE WILL BE CDR: AND THE OUTPUT DEVICE WILL
   BE LPT:.
5) IF NO PROJECT-PROGRAMMER NUMBER IS GIVEN, THE USER'S OWN NUMBER IS
   ASSUMED.

NOTES:

(1) IF LPT: IS USED AS AN OUTPUT DEVICE, MULTIPLE COPIES MAY BE OBTAINED
    BY SPECIFYING LPT:/N, WHERE N IS THE NUMBER OF COPIES DESIRED.
(2) THE FOLLOWING TWO OPTIONS DO NOT APPLY TO THE FIRST DATA SET. THEY
    APPLY ONLY WHEN THE PROGRAM BRANCHES BACK TO LINE 2 AFTER LINES 1-31
    HAVE BEEN COMPLETED.

    (A) SAME OPTION
        UPON RETURNING FROM LINE 31, IF THE SAME DATA FILE IS TO BE USED
        AGAIN, SIMPLY ENTER "SAME". OTHERWISE, EITHER USE THE FINISH
        OPTION OR ENTER ANOTHER FILE SPECIFICATION.

    (B) FINISH OPTION
        THE USER MUST ENTER "FINISH" TO EXIT THE CROSS PROGRAM. FAILURE
        TO DO SO MAY RESULT IN LOSING THE ENTIRE OUTPUT FILE.

LINES 3-4.
THERE ARE 3 OPTIONS AVAILABLE FOR THE FORMAT:

(A) STANDARD FORMAT OPTION
    UNLESS OTHERWISE SPECIFIED, THE PROGRAM ASSUMES THE STANDARD OPTION.
    IN THIS OPTION THE DATA ARE ENTERED 10 VALUES PER LINE, WITH THE
    VALUES SEPARATED BY COMMAS. TO USE THIS OPTION, SIMPLY PRESS THE
    RETURN KEY ON TERMINAL JOBS, USE A BLANK CARD ON BATCH JOBS, OR
    TYPE "STD".

(B) OBJECT TIME FORMAT OPTION
    IF THE DATA REQUIRE A USER'S OWN FORMAT, ENTER A LEFT PARENTHESIS
    FOLLOWED BY THE FIRST FORMAT SPECIFICATION, A COMMA, THE SECOND
    SPECIFICATION, AND SO ON. WHEN FINISHED, ENTER A RIGHT PARENTHESIS
    AND THEN A CARRIAGE RETURN. THE FORMAT MAY OCCUPY AT MOST 3 LINES OF
    80 COLUMNS EACH. THE FORMAT SPECIFICATION LIST MUST USE FIXED POINT
    (I-TYPE) NOTATION AND MUST CONTAIN A SPECIFICATION FOR EACH OF THE
    VARIABLES. OTHERWISE, THE RULES ARE THE SAME AS FOR THE FORTRAN IV
    FORMAT STATEMENT. FOR EXAMPLE, (3I1) READS THREE ONE-DIGIT VARIABLES
    FROM EACH DATA LINE.

(C) SAME OPTION
    THE SAME OPTION APPLIES ONLY TO JOBS THAT USE MORE THAN ONE DATA
    FILE. IF AN OBJECT TIME FORMAT WAS USED ON ONE DATA SET AND THE NEXT
    DATA SET USES THE SAME FORMAT, SIMPLY ENTER "SAME".

LINES 5-6.  PROBLEM IDENTIFICATION

ENTER A LINE OF TEXT IDENTIFYING THE RUN, SUCH AS "TRIAL RUN" IN THE
SUMMARY ABOVE. THE PROGRAM ECHOES IT BACK (SEE SECTION 5.0).

LINES 7-8.  PROBLEM CONTROL PARAMETERS

ENTER THE FOLLOWING NUMBERS, SEPARATED BY COMMAS:

 1. NUMBER OF CASES OR OBSERVATIONS.
 2. NUMBER OF VARIABLES (MAXIMUM 40).
 3. NUMBER OF TABLES (MAXIMUM 72). A TABLE IS A CROSS TABULATION OF TWO
    VARIABLES; CONTROLLING ON ONE OR TWO VARIABLES DOES NOT CHANGE THE
    NUMBER OF TABLES.
 4. 1 IF VARIABLES ARE TO BE NAMED IN ADDITION TO BEING NUMBERED;
    0 OTHERWISE.
 5. 1 IF RECODING IS TO BE PERFORMED; 0 OTHERWISE.
 6. 1 IF THE CHI-SQUARE STATISTIC IS DESIRED; 0 OTHERWISE.
 7. 1 IF COLUMN AND ROW PERCENTAGES ARE DESIRED; 0 OTHERWISE.
 8. 1 IF CORRELATION COEFFICIENTS ARE DESIRED; 0 OTHERWISE.

LINES 9-10.  MAXIMUM VALUES OF VARIABLES

THESE ARE THE MAXIMUM VALUES OF THE VARIABLES AFTER RECODING, ENTERED
30 PER LINE, TWO COLUMNS PER VALUE, RIGHT JUSTIFIED. IN THE SUMMARY,
020202 GIVES A MAXIMUM VALUE OF 2 FOR EACH OF THE THREE VARIABLES.
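THE LAYOUT RULES FOR LINES 7-10 CAN BE CHECKED BEFORE A RUN WITH A SHORT
PYTHON SKETCH SUCH AS THE ONE BELOW. IT IS NOT PART OF CROSS: THE
FUNCTION AND FIELD NAMES ARE HYPOTHETICAL, AND IT ASSUMES THAT BLANKS IN
A TWO-COLUMN FIELD COUNT AS ZEROS (THE USUAL FORTRAN IV CONVENTION). IT
SPLITS THE EIGHT PROBLEM CONTROL PARAMETERS INTO NAMED FIELDS, ENFORCES
THE 40-VARIABLE AND 72-TABLE LIMITS OF SECTION 3.0, AND READS
TWO-DIGIT, RIGHT-JUSTIFIED FIELDS 30 TO A LINE, AS USED FOR THE MAXIMUM
VALUES AND THE TABLE CONTROL PARAMETERS.

```python
from dataclasses import dataclass

MAX_VARIABLES = 40     # limitation 1 in section 3.0
MAX_TABLES = 72        # limitation 2 in section 3.0
FIELDS_PER_LINE = 30   # two-column fields per line (60 columns)


@dataclass
class ProblemControl:
    """The eight numbers of lines 7-8 (hypothetical field names)."""
    cases: int
    variables: int
    tables: int
    name_variables: bool
    recode: bool
    chi_square: bool
    percentages: bool
    correlations: bool


def parse_problem_control(line: str) -> ProblemControl:
    values = [int(v) for v in line.strip().split(",")]
    if len(values) != 8:
        raise ValueError(f"expected 8 numbers, got {len(values)}")
    cases, variables, tables, *flags = values
    if any(f not in (0, 1) for f in flags):
        raise ValueError("items 4-8 must each be 0 or 1")
    if not 1 <= variables <= MAX_VARIABLES:
        raise ValueError(f"number of variables must be 1-{MAX_VARIABLES}")
    if not 1 <= tables <= MAX_TABLES:
        raise ValueError(f"number of tables must be 1-{MAX_TABLES}")
    return ProblemControl(cases, variables, tables, *(f == 1 for f in flags))


def parse_two_digit_fields(lines: list[str], count: int) -> list[int]:
    """Read `count` two-column integers, 30 per line.

    Assumption of this sketch: blanks (including a line that ends early)
    are read as zeros, so " 2" is 2 and an empty field is 0.
    """
    values: list[int] = []
    for line in lines:
        for col in range(0, 2 * FIELDS_PER_LINE, 2):
            if len(values) == count:
                return values
            field = line[col:col + 2].ljust(2).replace(" ", "0")
            values.append(int(field))
    if len(values) < count:
        raise ValueError(f"expected {count} fields, found {len(values)}")
    return values


# Usage with the responses from the summary in section 4.0.
control = parse_problem_control("15,3,2,1,1,1,1,1")
max_values = parse_two_digit_fields(["020202"], control.variables)
column_vars = parse_two_digit_fields(["0101"], control.tables)
row_vars = parse_two_digit_fields(["0203"], control.tables)
print(control, max_values, list(zip(column_vars, row_vars)))
```

THE TABLE CONTROL LINES HOLD ONE TWO-DIGIT NUMBER PER TABLE, SO THEY ARE
READ WITH THE NUMBER OF TABLES AS THE COUNT, WHILE THE MAXIMUM VALUES
ARE READ WITH THE NUMBER OF VARIABLES.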
NOTE: THE MAXIMUM TABLE SIZE IS 20 COLUMNS BY 100 ROWS.

LINES 11-12.  TABLE CONTROL PARAMETERS

1. ON THE FIRST LINE, ENTER THE NUMBERS OF THE VARIABLES WHICH ARE TO BE
   THE COLUMN VARIABLES.
2. ON THE SECOND LINE, ENTER THE NUMBERS OF THE CORRESPONDING VARIABLES
   WHICH ARE TO BE THE ROW VARIABLES.
3. ON THE THIRD LINE, ENTER THE FIRST CONTROL VARIABLES (IF THERE ARE NO
   CONTROL VARIABLES, ENTER 0'S).
4. ON THE FOURTH LINE, ENTER THE SECOND CONTROL VARIABLES (IF THERE IS
   NO SECOND CONTROL VARIABLE, ENTER 0'S).

IN EACH OF THE ABOVE CASES, ENTER THE VARIABLE NUMBERS AS TWO-DIGIT,
RIGHT-JUSTIFIED NUMBERS, ONE PER TABLE, 30 NUMBERS PER LINE. IN THE
SUMMARY, 0101 AND 0203 REQUEST TWO TABLES: VARIABLE 1 BY VARIABLE 2, AND
VARIABLE 1 BY VARIABLE 3.

LINES 13-15.

IF RECODING HAS BEEN SPECIFIED, TWO LINES MUST BE ENTERED AT THIS POINT.

1. ON THE FIRST LINE, ENTER A TWO-DIGIT, RIGHT-JUSTIFIED CODE FOR EACH
   VARIABLE. POSSIBLE CODES ARE:

      00 - NO RECODING
      01 - RECODE BY ADDING A CONSTANT
      02 - RECODE BY SUBTRACTING A CONSTANT
      03 - RECODE BY DIVIDING A VARIABLE BY A CONSTANT
      04 - RECODE BY REGROUPING CONTIGUOUS CATEGORIES
      05 - REGROUPING DATA AND ADDITION
      06 - REGROUPING DATA AND SUBTRACTION
      07 - REGROUPING DATA AND DIVISION
      08 - REGROUPING NON-CONTIGUOUSLY

2. ON THE SECOND LINE, ENTER THE CONSTANT FOR EACH VARIABLE. IF NO
   CONSTANT IS REQUIRED, ENTER 00. AGAIN, USE TWO-DIGIT, RIGHT-JUSTIFIED
   NUMBERS, 30 PER LINE.

LINES 16-27.

FOR EACH VARIABLE FOR WHICH TYPE 04 OR TYPE 08 RECODING HAS BEEN
REQUESTED, ENTER:

(04) ONE LINE CONTAINING THE UPPER LIMITS OF THE REGROUPING.
(08) TWO LINES. THE FIRST GIVES THE UPPER LIMITS OF THE CONTIGUOUS
     GROUPS; THE SECOND GIVES THE NUMBER TO WHICH EACH OF THOSE GROUPS
     IS TO BE RECODED.

IN BOTH CASES THE NUMBERS MUST BE SEPARATED BY COMMAS. IN THE SUMMARY,
BOUNDARIES 2,4,6,8,9 WITH CATEGORIZATION 1,2,1,2,1 RECODE VALUES UP TO 2
AS 1, 3-4 AS 2, 5-6 AS 1, 7-8 AS 2, AND 9 AS 1.

LINES 28-29.

IF ITEM 4 OF THE PROBLEM CONTROL PARAMETERS (LINE 8) CALLS FOR NAMING
THE VARIABLES, ENTER THE NAMES HERE, ONE NAME PER LINE, FEWER THAN 60
CHARACTERS PER NAME.

LINES 30-31.

IF INPUT IS FROM THE TERMINAL (SEE LINE 2), ENTER THE DATA AT THIS
POINT.

LINE 32.
THE PROGRAM IS NOW READY TO ACCEPT ANOTHER INPUT SPECIFICATION AND
CONTINUE PROCESSING (SEE LINE 2). ENTERING FINISH TERMINATES THE
PROGRAM.

5.0 SAMPLE TERMINAL RUN

     .R CROSS

     PDP-10 VERSION OF NUCROS
     OUTPUT? (TYPE HELP IF NEEDED)--TTY:
     INPUT? (TYPE HELP IF NEEDED)--TTY:
     FORMAT: (I-TYPE ONLY)
     (3I1)
     ENTER IDENTIFICATION
     TRIAL RUN
     TRIAL RUN
     ENTER PROBLEM CONTROL PARAMETERS
     15,3,2,1,1,1,1,1
     TOTAL CASES USED = 15
     ENTER MAXIMUM VALUES
     020202
     VARIABLE     1  2  3
     MAX.VALUE    2  2  2
     ENTER TABLE CONTROL PARAMETERS
     0101
     0203
     0000
     0000
     TABLE NO.   VARIABLES   MAXIMUM VALUES
         1         1   2         2   2
         2         1   3         2   2
     ENTER TWO RECODING LINES
     080808
     000000
     ENTER GROUP BOUNDARIES
     2,4,6,8,9
     ENTER CATEGORIZATION
     1,2,1,2,1
     ENTER GROUP BOUNDARIES
     2,4,6,8,9
     ENTER CATEGORIZATION
     1,2,1,2,1
     ENTER GROUP BOUNDARIES
     2,4,6,8,9
     ENTER CATEGORIZATION
     1,2,1,2,1
     RECODING OPTION CALLED FOR
     VAR.NO.  CODE  CONSTANT  RECODED CATEGORIES (UPPER LIMITS)
        1       8     NONE    2 4 6 8 9 0 0 0 0 0 0 0 0 0 0 0
                              1 2 1 2 1 0 0 0 0 0 0 0 0 0 0 0
        2       8     NONE    2 4 6 8 9 0 0 0 0 0 0 0 0 0 0 0
                              1 2 1 2 1 0 0 0 0 0 0 0 0 0 0 0
        3       8     NONE    2 4 6 8 9 0 0 0 0 0 0 0 0 0 0 0
                              1 2 1 2 1 0 0 0 0 0 0 0 0 0 0 0
     ENTER VARIABLE NAMES
     AGE
     SEX
     LENGTH OF HAIR
     VAR.NO.
        1     AGE
        2     SEX
        3     LENGTH OF HAIR
     ENTER DATA
     123
     456
     789
     987
     654
     321
     369
     258
     147
     741
     852
     963
     126
     324
     357

     TABLE NO. 1
     VARIABLE NO. 1 AGE
     VRS.
     VARIABLE NO. 2 SEX
     TABLE SIZE = 3 BY 3

               TOT     0     1     2
         0       0     0     0     0
         1      11     0     5     6
         2       4     0     2     2
     TOTAL      15     0     7     8

     CHI SQUARE = 0.024   C = .040
     TAU-B = -0.040   GAMMA = -0.091   DXY = -0.045   DYX = -0.036

     PERCENTS BY COLUMN FROM THE ABOVE MATRIX
         0       0  00.0  00.0  00.0
         1      11  00.0  71.4  75.0
         2       4  00.0  28.6  25.0
     TOTAL      15     0     7     8

     PERCENTS BY ROW FROM THE ABOVE MATRIX
         0       0  00.0  00.0  00.0
         1      11  00.0  45.5  54.5
         2       4  00.0  50.0  50.0
     TOTAL      15     0     7     8

     TABLE NO. 2
     VARIABLE NO. 1 AGE
     VRS.
     VARIABLE NO. 3 LENGTH OF HAIR
     TABLE SIZE = 3 BY 3

               TOT     0     1     2
         0       0     0     0     0
         1       7     0     1     6
         2       8     0     6     2
     TOTAL      15     0     7     8

     CHI SQUARE = 5.529   C = .519
     TAU-B = -0.607   GAMMA = -0.895   DXY = -0.607   DYX = -0.607

     PERCENTS BY COLUMN FROM THE ABOVE MATRIX
         0       0  00.0  00.0  00.0
         1       7  00.0  14.3  75.0
         2       8  00.0  85.7  25.0
     TOTAL      15     0     7     8

     PERCENTS BY ROW FROM THE ABOVE MATRIX
         0       0  00.0  00.0  00.0
         1       7  00.0  14.3  85.7
         2       8  00.0  75.0  25.0
     TOTAL      15     0     7     8

     INPUT? (TYPE HELP IF NEEDED)--FINISH

     END OF EXECUTION
     CPU TIME: 0.84  ELAPSED TIME: 8.80
     EXIT

6.0 BATCH OPERATION

THE FOLLOWING IS A BATCH JOB SETUP. EACH LINE REPRESENTS ONE CARD, AND
EACH CARD STARTS IN COLUMN 1. DO NOT INCLUDE THE COMMENTS AT THE RIGHT.

--------------------------------------------------------------------------------
CARD                            COMMENTS
$JOB [PROJ,PROG]                JOB CARD; INSERT THE USER'S PROJECT-
                                PROGRAMMER NUMBER WITHIN THE BRACKETS
$PASSWORD ######                REPLACE THE SIX #'S WITH THE PASSWORD
$DATA                           SIGNIFIES THE BEGINNING OF THE DATA DECK
(DATA CARDS)                    INSERT THE DATA CARD DECK TO BE ANALYZED
$EOD                            SIGNIFIES THE END OF THE DATA CARD DECK
.R CROSS                        STARTS THE EXECUTION
(RESPONSES TO LINES 1-32 IN     USER'S RESPONSES
SECTION 4.0, REPEATED OR NOT)
(EOF)                           AN END-OF-FILE CARD (GET THESE AT THE
                                COMPUTER CENTER)
--------------------------------------------------------------------------------

EXAMPLE:

     $JOB [410,410]
     $PASSWORD
     $DATA
     123
     456
     789
     987
     654
     321
     369
     258
     147
     741
     852
     963
     126
     324
     357
     $EOD
     .R CROSS
     LPT:
     CDR:
     (3I1)
     BATCH EXAMPLE
     15,3,2,1,1,1,1,1
     020202
     0101
     0203
     0000
     0000
     080808
     000000
     2,4,6,8,9
     1,2,1,2,1
     2,4,6,8,9
     1,2,1,2,1
     2,4,6,8,9
     1,2,1,2,1
     AGE
     SEX
     LENGTH OF HAIR
     FINISH
     (EOF)
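AS A CROSS-CHECK ON THE SAMPLE TERMINAL RUN IN SECTION 5.0, THE PYTHON
SKETCH BELOW RECODES THE 15 SAMPLE CASES, BUILDS THE TWO TABLES, AND
COMPUTES THE STATISTICS THAT CROSS PRINTS. IT IS NOT THE PROGRAM'S OWN
CODE. THE HELPER NAMES ARE HYPOTHETICAL; TYPE 08 UPPER LIMITS ARE
ASSUMED TO BE INCLUSIVE; C IS TAKEN TO BE PEARSON'S CONTINGENCY
COEFFICIENT, SQRT(CHI2 / (CHI2 + N)); AND DXY AND DYX ARE TAKEN TO ADD,
RESPECTIVELY, THE PAIRS TIED ON THE COLUMN VARIABLE ONLY AND ON THE ROW
VARIABLE ONLY TO THE DENOMINATOR. THESE READINGS WERE CHOSEN BECAUSE
THEY REPRODUCE THE PRINTED VALUES. TAU-A, TAU-B, GAMMA, AND SOMERS' D
USE THE STANDARD CONCORDANT/DISCORDANT PAIR COUNTS.

```python
from math import sqrt

# The 15 data lines of section 5.0; format (3I1) gives three one-digit variables.
CARDS = ["123", "456", "789", "987", "654", "321", "369", "258",
         "147", "741", "852", "963", "126", "324", "357"]
UPPER_LIMITS = [2, 4, 6, 8, 9]  # ENTER GROUP BOUNDARIES
CATEGORIES = [1, 2, 1, 2, 1]    # ENTER CATEGORIZATION


def recode_type08(value, upper_limits, categories):
    """Type 08 regrouping; each upper limit is assumed to be inclusive."""
    for limit, category in zip(upper_limits, categories):
        if value <= limit:
            return category
    raise ValueError(f"value {value} is above the last upper limit")


def cross_table(col_values, row_values, col_max, row_max):
    """Frequencies indexed [row][column]; categories run from 0 to each maximum."""
    table = [[0] * (col_max + 1) for _ in range(row_max + 1)]
    for c, r in zip(col_values, row_values):
        table[r][c] += 1
    return table


def ratio(num, den):
    """Division that yields NaN instead of raising when the denominator is 0."""
    return num / den if den else float("nan")


def statistics(table):
    n = sum(map(sum, table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    # Pearson chi-square over cells whose expected count is nonzero.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            if expected:
                chi2 += (observed - expected) ** 2 / expected
    # Classify every pair of cases by comparing the cells they fall in.
    conc = disc = tied_row = tied_col = 0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            tied_row += f * sum(row[j + 1:])     # same row, different column
            for lower in table[i + 1:]:
                conc += f * sum(lower[j + 1:])  # higher row and higher column
                disc += f * sum(lower[:j])      # higher row, lower column
                tied_col += f * lower[j]        # same column, different row
    s = conc - disc
    return {
        "chi2": chi2,
        "C": sqrt(ratio(chi2, chi2 + n)),
        "tau_a": ratio(s, n * (n - 1) / 2),
        "tau_b": ratio(s, sqrt((conc + disc + tied_row) * (conc + disc + tied_col))),
        "gamma": ratio(s, conc + disc),
        "dxy": ratio(s, conc + disc + tied_col),
        "dyx": ratio(s, conc + disc + tied_row),
    }


def column_percents(table):
    col_tot = [sum(col) for col in zip(*table)]
    return [[round(100 * f / col_tot[j], 1) if col_tot[j] else 0.0
             for j, f in enumerate(row)] for row in table]


recoded = [[recode_type08(int(d), UPPER_LIMITS, CATEGORIES) for d in card]
           for card in CARDS]
age, sex, hair = zip(*recoded)
table1 = cross_table(age, sex, 2, 2)   # TABLE NO. 1: AGE (columns) VRS. SEX (rows)
table2 = cross_table(age, hair, 2, 2)  # TABLE NO. 2: AGE VRS. LENGTH OF HAIR
for number, table in ((1, table1), (2, table2)):
    stats = {name: round(value, 3) for name, value in statistics(table).items()}
    print("TABLE NO.", number, table, stats, column_percents(table))
```

UNDER THESE ASSUMPTIONS THE ROUNDED RESULTS AGREE WITH SECTION 5.0: FOR
TABLE NO. 1, CHI SQUARE 0.024, C .040, TAU-B -0.040, GAMMA -0.091,
DXY -0.045, AND DYX -0.036; FOR TABLE NO. 2, CHI SQUARE 5.529, C .519,
GAMMA -0.895, AND TAU-B, DXY, AND DYX ALL -0.607. THE ROW AND COLUMN
FOR CATEGORY 0 STAY EMPTY BECAUSE EVERY RECODED VALUE IS 1 OR 2.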