WESTERN MICHIGAN UNIVERSITY
COMPUTER CENTER

LIBRARY PROGRAM #1.3.2

CALLING NAME:             STEPR
PREPARED BY:              RUSSELL R. BARR III & SAM ANEMA
STATISTICAL CONSULTANT:   MICHAEL R. STOLINE
PROGRAMMED BY:            *
APPROVED BY:              JACK R. MEAGHER
DATE:                     FEBRUARY 1977 (VERSION 4)

STEPWISE REGRESSION

1.0  PURPOSE

LET THERE BE GIVEN M OBSERVATIONS ON N VARIABLES X(1),...,X(N). THE USER MAY
OBTAIN CORRELATIONS, REGRESSIONS, AND STEPWISE REGRESSIONS ON THIS DATA.
PROVISIONS ARE AVAILABLE FOR HANDLING MISSING DATA. TRANSFORMATIONS OF
VARIABLES ARE POSSIBLE PRIOR TO THE REGRESSION ANALYSIS SO THAT THIS PROGRAM
COULD BE USED FOR CURVILINEAR OR CERTAIN NON-LINEAR STEPWISE REGRESSIONS.

MISSING DATA AND TRANSFORMATION OF VARIABLES ARE JUST 2 OF 13 OPTIONS
AVAILABLE FOR THE USER. THESE OPTIONS ARE DESCRIBED IN DETAIL IN SECTION 3.0.
FOR ANY SINGLE PROGRAM THE USER MAY ELECT NONE, SOME, OR ALL OF THESE OPTIONS.
EACH OF THE 13 OPTIONS IS IDENTIFIED WITH A UNIQUE 3-LETTER CODE. THE USER
RESPONDS TO THE COMPUTER STATEMENT   ENTER OPTIONS   BY:

     (I)  PUSHING THE RETURN KEY IF NO OPTIONS ARE WANTED (SEE SECTION 2.0
          FOR MORE DETAILS ON THIS), OR
    (II)  ENTERING A 3-LETTER CODE FOR EACH OPTION DESIRED. THE CODES ARE
          SEPARATED BY COMMAS AND NO PARTICULAR ORDER IS REQUIRED.

TABLE 1
OPTION DESCRIPTION AND CODE TABLE

     CODE    DESCRIPTION
     MIS*    MISSING DATA
     TRA*    TRANSFORMATION OF VARIABLES
     MVS*    MEANS, STANDARD DEVIATIONS, AND VARIANCES
     XPR     RAW SUMS OF SQUARES AND CROSS PRODUCTS
     COV     VARIANCE - COVARIANCE MATRIX
     COR     CORRELATION, FISHER Z, AND T MATRICES
     FVL     F-VALUES
     FOR     FORCE VARIABLES
     ELM     ELIMINATE VARIABLES
     RES     RESIDUALS
     ANA     ANALYSIS OF VARIANCE AT FINAL STAGE
     DUR     DURBIN-WATSON STATISTICS
     ZER     FORCE REGRESSION THROUGH ZERO

     *READ PERTINENT REMARKS IN SECTION 3.0 BEFORE USING.

---------------
*THIS WAS PROGRAMMED BY SAM ANEMA PATTERNED AFTER A PROGRAM GIVEN BY WAYNE
 STATE UNIVERSITY.
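
A RESPONSE TO   ENTER OPTIONS   CAN BE CHECKED MECHANICALLY. THE FOLLOWING
PYTHON SKETCH IS ILLUSTRATIVE ONLY AND IS NOT PART OF STEPR. THE NAME
PARSE_OPTIONS IS HYPOTHETICAL, AND THE CODE LIST IS THE 13 OPTIONS DESCRIBED
IN SECTION 3.0. WHETHER STEPR ITSELF ACCEPTS LOWER CASE OR EMBEDDED BLANKS IS
NOT STATED, SO THE SKETCH SIMPLY NORMALIZES BOTH (AN ASSUMPTION).

```python
# Hypothetical helper for illustration; not taken from the STEPR source.
VALID_CODES = {
    "MIS", "TRA", "MVS", "XPR", "COV", "COR", "FVL",
    "FOR", "ELM", "RES", "ANA", "DUR", "ZER",
}


def parse_options(response):
    """Return the set of option codes in a comma-separated response.

    An empty response (bare RETURN) means that no options are wanted.
    Order does not matter, so a set is returned.
    """
    text = response.strip()
    if not text:
        return set()
    # Assumption: blanks and lower case are tolerated and normalized.
    codes = {part.strip().upper() for part in text.split(",")}
    unknown = codes - VALID_CODES
    if unknown:
        raise ValueError("unknown option code(s): " + ", ".join(sorted(unknown)))
    return codes


print(sorted(parse_options("MIS,TRA,MVS")))
```

AN UNKNOWN CODE RAISES AN ERROR IN THIS SKETCH; HOW STEPR REACTS TO ONE IS
NOT DESCRIBED IN THIS DOCUMENT.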

2.0  AUTOMATIC STEPWISE REGRESSION

IF NO OPTIONS ARE SPECIFIED, THE USER OBTAINS AUTOMATICALLY A GENERAL STEPWISE
REGRESSION IN WHICH EACH INDEPENDENT VARIABLE IS FORCED INTO THE REGRESSION
EQUATION ONE AT A TIME UNTIL ALL N-1 INDEPENDENT VARIABLES ARE IN THE
REGRESSION EQUATION OF THE LAST STAGE. THE USER AUTOMATICALLY OBTAINS THE
INTERMEDIATE EQUATIONS AT EACH STAGE OF THE STEPWISE PROCEDURE:

     Y = B(0)  + B(1)  X(1)
     Y = B(0)' + B(1)' X(1) + B(2)  X(2)
     Y = B(0)" + B(1)" X(1) + B(2)' X(2) + B(3) X(3)
     ETC.

THE VARIABLE NUMBERS 1,2,...,N ARE NOT NECESSARILY INCLUDED IN THE STEPWISE
REGRESSION IN THE ORDER 1,2,...,N. RATHER, THE NEXT VARIABLE TO BE SELECTED
FOR INCLUSION WILL BE THAT VARIABLE FOUND TO CONTRIBUTE THE MOST TO THE
INCREASE IN THE COEFFICIENT OF DETERMINATION (MULTIPLE R*R), WHICH IS THE
PROPORTION OF VARIANCE OF THE DEPENDENT VARIABLE ACCOUNTED FOR BY THE
INDEPENDENT VARIABLES INCLUDED IN THE REGRESSION EQUATION.

AT EACH STAGE IN THE STEPWISE REGRESSION THE FOLLOWING ITEMS ARE PRINTED:

     (I)  VARIABLE NUMBER ENTERING
    (II)  PARTIAL REGRESSION COEFFICIENTS AND THEIR STANDARD ERRORS
   (III)  STANDARD ERROR OF ESTIMATE
    (IV)  CONSTANT TERM
     (V)  MULTIPLE R AND COEFFICIENT OF DETERMINATION = R*R
    (VI)  INCREASE IN COEFFICIENT OF MULTIPLE DETERMINATION OVER THE
          PREVIOUS STAGE
   (VII)  F-LEVEL

AT EACH STAGE THE USER CAN TEST THE HYPOTHESIS:

     H(0) : VARIABLE JUST ADDED IS NOT SIGNIFICANT
     H(1) : VARIABLE JUST ADDED IS SIGNIFICANT

THIS IS DONE BY COMPARING THE GIVEN F-VALUE TO A TABLED F POINT FOR 1 AND
M-K-1 DEGREES OF FREEDOM, WHERE

     M = NUMBER OF OBSERVATIONS
     K = NUMBER OF VARIABLES INCLUDED IN THE REGRESSION AT THIS
         PARTICULAR STAGE

H(1) IS ACCEPTED IF THE F-VALUE IS GREATER THAN THE TABLED F. TABLE 2 GIVES
THE ALPHA = .05 AND ALPHA = .01 CRITICAL VALUES FOR AN F DISTRIBUTION WITH
1 AND N=M-K-1 DEGREES OF FREEDOM.

TABLE 2

          N        ALPHA =.05    ALPHA =.01
          2           18.5          98.5
          3           10.1          34.1
          4           7.71          21.2
          5           6.61          16.3
          6           5.99          13.7
          7           5.59          12.2
          8           5.32          11.3
          9           5.12          10.6
         10           4.96          10.0
         11           4.84          9.65
         12           4.75          9.33
         15           4.54          8.68
         20           4.35          8.10
         24           4.26          7.82
         30           4.17          7.56
         40           4.08          7.31
         60           4.00          7.08
        120           3.92          6.85
     INFINITY         3.84          6.64

THE USER MAY ELECT TO HAVE THE COMPUTER ADD OR DELETE VARIABLES AT EACH STAGE
BY USE OF THE OPTION FVL. ALSO THE OPTIONS FOR AND ELM AUTOMATICALLY FORCE
INTO OR ELIMINATE FROM THE REGRESSION EQUATIONS CERTAIN SPECIFIED VARIABLES.
THESE OPTIONS ARE EXPLAINED IN SECTION 3.0.

AT THE FINAL STAGE OF THE STEPWISE PROCEDURE THE FOLLOWING ENTRIES ARE PRINTED
IN ADDITION TO (I) - (VII) GIVEN ABOVE:

     (A)  THE STANDARDIZED PARTIAL REGRESSION COEFFICIENTS. THE B(I)
          (UNSTANDARDIZED PARTIAL REGRESSION COEFFICIENTS) ARE RELATED TO
          BETA(I) (STANDARDIZED PARTIAL REGRESSION COEFFICIENTS) BY THE
          FORMULA

               B(I) = (BETA(I)*SD(Y))/SD(X(I)) ,

          WHERE SD(X(I)) AND SD(Y) ARE THE STANDARD DEVIATIONS OF X(I) AND Y
          RESPECTIVELY. Y IS THE DEPENDENT VARIABLE. IN THE OUTPUT,
          "COEFFICIENT" REFERS TO UNSTANDARDIZED COEFFICIENTS.

     (B)  T-VALUES, WHICH ARE USED FOR TESTING THE SIGNIFICANCE OF THE
          PARTIAL REGRESSION COEFFICIENTS. T-VALUES ARE OBTAINED BY DIVIDING
          THE COEFF BY THE STD ERROR OF COEFF. (FOR TESTING, REFER TO A
          T-TABLE FOR N=M-K-1 DEGREES OF FREEDOM, WHERE K=NUMBER OF VARIABLES
          INCLUDED IN THE FINAL REGRESSION EQUATION.)

3.0  DETAILED OPTION DESCRIPTIONS

OPTION 1.  MIS - MISSING DATA

THE USER ELECTS THIS OPTION WHENEVER THERE IS AT LEAST ONE DATA POINT MISSING
FROM THE DATA SUBMITTED. WHENEVER THE OPTION IS USED, THE USER MUST MAKE TWO
DECISIONS:

     (I)  WHETHER TO USE A SINGLE MISSING DATA SYMBOL FOR ALL VARIABLES OR TO
          USE A SEPARATE MISSING DATA SYMBOL FOR EACH OF THE N VARIABLES, AND
    (II)  WHETHER TO INSTRUCT THE COMPUTER TO REPLACE MISSING VALUES BY THE
          MEAN OF THE NON-MISSING DATA POINTS FOR THAT VARIABLE OR TO DELETE
          THAT PARTICULAR OBSERVATION FROM ALL REGRESSION CALCULATIONS.
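
THE TWO WAYS OF COMPENSATING FOR MISSING DATA IN DECISION (II) CAN BE SKETCHED
AS FOLLOWS. THIS IS A MINIMAL PYTHON ILLUSTRATION UNDER STATED ASSUMPTIONS,
NOT THE STEPR SOURCE: THE FUNCTION NAME COMPENSATE AND ITS ARGUMENTS ARE
HYPOTHETICAL, AND THE METHOD NUMBERS 1 (REPLACE BY MEAN) AND 2 (DELETE THE
OBSERVATION) FOLLOW THE PROGRAM PROMPT SHOWN IN SECTION 5.0.

```python
from statistics import mean


def compensate(rows, symbols, method):
    """Apply one of the two missing-data treatments to a data set.

    rows    -- list of observations, each a list with one value per variable
    symbols -- one missing data symbol per variable
    method  -- 1 replaces a missing value by the mean of the non-missing
               values of that variable; 2 deletes every observation that
               contains a missing value
    (Hypothetical helper for illustration; not taken from STEPR.)
    """
    n = len(symbols)

    def missing(row, j):
        return row[j] == symbols[j]

    if method == 2:
        return [list(r) for r in rows
                if not any(missing(r, j) for j in range(n))]
    if method != 1:
        raise ValueError("method must be 1 or 2")

    # Mean of the non-missing values of each variable.
    means = []
    for j in range(n):
        valid = [r[j] for r in rows if not missing(r, j)]
        if not valid:
            raise ValueError("variable %d has no valid data" % (j + 1))
        means.append(mean(valid))

    return [[means[j] if missing(r, j) else r[j] for j in range(n)]
            for r in rows]


# Data of the sample run in section 6.0; 0 is the missing data symbol.
data = [[1, 3, 2, 4], [0, 1, 3, 2], [1, 4, 2, 3], [7, 1, 4, 5],
        [1, 1, 2, 1], [1, 0, 5, 5], [9, 7, 8, 7]]
filled = compensate(data, [0, 0, 0, 0], 1)
kept = compensate(data, [0, 0, 0, 0], 2)
print(round(filled[1][0], 6), round(filled[5][1], 6), len(kept))
# prints: 3.333333 2.833333 5
```

THE TWO REPLACEMENT VALUES AGREE WITH THE MEANS PRINTED FOR VARIABLES 1 AND 2
IN THE SAMPLE RUN OF SECTION 6.0. WITH METHOD 2, FIVE OF THE SEVEN
OBSERVATIONS WOULD REMAIN.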

RULES FOR MISSING DATA SYMBOLS:

     (I)  A MISSING DATA SYMBOL MUST BE AN INTEGER OR A DECIMAL NUMBER.
    (II)  A LETTER CANNOT BE A MISSING DATA SYMBOL.
   (III)  THE NUMBER USED FOR A MISSING DATA SYMBOL MUST NOT BE EQUAL TO ANY
          VALID INPUT DATA POINT SUBMITTED.
    (IV)  WHEN A SEPARATE MISSING DATA SYMBOL IS USED FOR EACH VARIABLE, THEN
          (A)  THEY ARE SEPARATED BY COMMAS.
          (B)  THERE MUST BE EXACTLY AS MANY MISSING DATA SYMBOLS AS THERE
               ARE VARIABLES, EVEN THOUGH SOME VARIABLES DO NOT CONTAIN
               MISSING DATA.
          (C)  IF THERE ARE MORE THAN 10 MISSING DATA SYMBOLS, THEY MUST BE
               ENTERED AT THE RATE OF 10 PER LINE.

WE WILL ILLUSTRATE THE USE OF THIS OPTION WITH TWO EXAMPLES. FOR EXAMPLES 1
AND 2, WE ARE SHOWING ONLY THOSE INSTRUCTIONS, QUESTIONS, AND RESPONSES WE
WANT TO EMPHASIZE.

EXAMPLE 1

     HOW MANY VARIABLES: 3
     ENTER OPTIONS OR TYPE "HELP"
     MIS
     IS THERE MORE THAN ONE MISSING DATA SYMBOL?
     NO
     ENTER MISSING DATA SYMBOL
     1

     (1 IS THE MISSING DATA SYMBOL FOR VARIABLES 1, 2, AND 3.)

EXAMPLE 2

     IS THERE MORE THAN ONE MISSING DATA SYMBOL?
     YES
     ENTER MISSING DATA SYMBOLS
     9,8,9

     (9 IS THE MISSING DATA SYMBOL FOR VARIABLES 1 AND 3; 8 IS THE MISSING
     DATA SYMBOL FOR VARIABLE 2.)

OPTION 2.  TRA - TRANSFORMATION OF VARIABLES

THIS OPTION ALLOWS ONE TO MAKE TRANSFORMATIONS OF VARIABLES PRIOR TO THE
REGRESSION ANALYSIS. TABLE 3 CONTAINS 15 POSSIBLE TRANSFORMATIONS WHICH CAN BE
USED. RULES FOR USING TRANSFORMATIONS ARE:

     (I)  TRANSFORMATIONS WILL BE PERFORMED IN THE ORDER THAT THE USER
          SUBMITS THEM.
    (II)  NO MORE THAN 40 TRANSFORMATIONS ARE ALLOWED.
   (III)  A VARIABLE MAY BE TRANSFORMED MORE THAN ONCE.
    (IV)  NEW VARIABLES MAY BE GENERATED.
     (V)  AFTER ALL THE TRANSFORMATIONS HAVE BEEN GIVEN, THE USER MUST TYPE
          END AND PRESS RETURN.
    (VI)  ALL TRANSFORMATIONS MUST BE OF THE FORM IN TABLE 3.

THREE ILLUSTRATIVE EXAMPLES OF THE TRANSFORMATION OPTION ARE GIVEN. NOTE THAT
A**B MEANS A RAISED TO THE POWER B.

TABLE 3

      1.  (I) = -(J)          VARIABLE I BECOMES (-VARIABLE J)
      2.  (I) = (J)
      3.  (I) = (J)**A
      4.  (I) = (J)+(K)
      5.  (I) = (J)*(K)       VARIABLE I = (VARIABLE J)*(VARIABLE K)
      6.  (I) = LN(J)
      7.  (I) = LOG(J)
      8.  (I) = E**(J)
      9.  (I) = 10**(J)
     10.  (I) = (J)+A
     11.  (I) = (J)*A         VARIABLE I = A(VARIABLE J)
     12.  (I) = SIN (J)
     13.  (I) = COS (J)
     14.  (I) = (J)/(K)       VARIABLE I = (VARIABLE J)/(VARIABLE K)
     15.  (I) = (I) - (J)

     NOTATION: (I), (J), (K) DENOTE VARIABLES.
               A DENOTES A CONSTANT WHICH THE USER SPECIFIES.

IN EXAMPLES 1, 2, AND 3 WE ARE SHOWING ONLY THOSE INSTRUCTIONS, QUESTIONS, AND
RESPONSES WE WANT TO EMPHASIZE.

EXAMPLE 1  POLYNOMIAL REGRESSION (CUBIC)

     HOW MANY VARIABLES? 2
     ENTER OPTIONS OR TYPE "HELP"
     TRA
     ENTER TRANSFORMATIONS
     (3)=(2)**2
     (4)=(2)**3
     END
     WHICH IS THE DEPENDENT VARIABLE?
     1

     (LETTING Y=VARIABLE 1 AND X=VARIABLE 2, THE MODEL IS:

          Y = A + B(1)X + B(2)X**2 + B(3)X**3 ,

     WHERE VARIABLE 3=X*X AND VARIABLE 4=X*X*X.)

EXAMPLE 2  SECOND DEGREE RESPONSE SURFACE

     HOW MANY VARIABLES: 3
     ENTER OPTIONS OR TYPE "HELP"
     TRA
     ENTER TRANSFORMATIONS
     (4)=(2)**2
     (5)=(2)*(3)
     (6)=(3)**2
     END
     WHICH IS THE DEPENDENT VARIABLE?
     1

     (LETTING Y=VARIABLE 1, X=VARIABLE 2, AND Z=VARIABLE 3, THE MODEL IS:

          Y = A + B(1)X + B(2)Z + B(3)X**2 + B(4)XZ + B(5)Z**2 ,

     WHERE VARIABLE 4=X*X, VARIABLE 5=XZ, AND VARIABLE 6=Z*Z.)

EXAMPLE 3  NON-LINEAR FIT

     MODEL:  Y = VARIABLE 1
             X = VARIABLE 2

     FIND LEAST SQUARES ESTIMATES FOR A AND B(1), WHERE

          Y = A + B(1) SIN (1/(1+X))

     SOLUTION:

     HOW MANY VARIABLES: 2
     ENTER OPTIONS
     TRA
     ENTER TRANSFORMATIONS
     (2)=(2)+1         (X IS REPLACED BY X+1)
     (2)=(2)**-1       (X IS REPLACED BY 1/(1+X))
     (2)=SIN(2)        (X IS REPLACED BY SIN(1/(1+X)))
     END
     WHICH IS THE DEPENDENT VARIABLE?
     1

     (VARIABLE 2 IS NOW SIN(1/(1+X)).)

OPTION 3.  MVS - MEANS, STANDARD DEVIATIONS, AND VARIANCES FOR EACH VARIABLE
           ARE PRINTED.

OPTION 4.  XPR - RAW SUMS OF SQUARES AND CROSS PRODUCTS FOR EACH PAIR OF
           VARIABLES ARE PRINTED. (CROSS PRODUCT = SUM X(I,J)X(I,K) FOR EACH
           VARIABLE PAIR J AND K, WHERE I GOES FROM 1 TO M.)

OPTION 5.  COV - THE N BY N VARIANCE-COVARIANCE MATRIX OF THE N VARIABLES IS
           PRINTED.

OPTION 6.  COR - THE N BY N CORRELATION, FISHER Z, AND T-VALUE MATRICES ARE
           PRINTED.
           (LET R(I,J) = CORRELATION OF VARIABLES I AND J. THEN

                FISHER Z = Z(I,J) = 1/2*LN((1+R(I,J))/(1-R(I,J)))*SQRT(M-3)

           AND

                T-VALUE = T(I,J) = R(I,J)*SQRT((N(I,J)-2)/(1-R(I,J)*R(I,J))) .)

OPTION 7.  FVL - F-VALUES. ALLOWS ONE TO SPECIFY F-VALUES FOR ENTERING AND
           DELETING VARIABLES FROM THE REGRESSION EQUATION.

TWO RULES FOR USING FVL:

     (I)  TWO F-VALUES MUST BE SPECIFIED BY THE USER: ONE FOR ENTERING A
          VARIABLE AND ONE FOR DELETING A VARIABLE.
    (II)  THE F-VALUE FOR ENTERING A VARIABLE MUST BE EQUAL TO OR LARGER THAN
          THE F-VALUE FOR REMOVING A VARIABLE.

A VARIABLE MAY BE SIGNIFICANT AT AN EARLY STAGE OF THE STEPWISE PROCEDURE AND
THUS ENTER THE EQUATION; BUT LATER, AFTER MORE VARIABLES HAVE BEEN ADDED, THIS
VARIABLE MAY BECOME INSIGNIFICANT. IT WILL THEN BE DELETED FROM THE REGRESSION
EQUATION BEFORE AN ADDITIONAL VARIABLE IS ADDED. ONLY SIGNIFICANT (AT THE
USER'S SPECIFIED LEVEL) VARIABLES ARE INCLUDED IN THE FINAL EQUATIONS.
SIGNIFICANCE OF A VARIABLE IS INDICATED BY THE FACT THAT THE F-LEVEL
CALCULATED BY THE STEPR PROGRAM FOR THAT VARIABLE IS GREATER THAN OR EQUAL TO
THE USER SPECIFIED F-VALUE.

THE F-STATISTIC (LEVEL) CALCULATED BY THE STEPR PROGRAM FOR ENTERING A
VARIABLE HAS AN F-DISTRIBUTION WITH 1 AND M-K-1 DEGREES OF FREEDOM, WHERE

     M = NUMBER OF OBSERVATIONS, AND
     K = NUMBER OF VARIABLES INCLUDED IN THE REGRESSION AT THIS PARTICULAR
         STAGE.

HOWEVER, THE USER SPECIFIES A SINGLE F-VALUE FOR ENTERING A VARIABLE, AND THIS
VALUE IS USED AT EVERY STAGE EVEN THOUGH THE DEGREES OF FREEDOM CHANGE. HENCE,
THE TESTS FOR ENTERING AND DELETING A VARIABLE ARE SLIGHTLY ARBITRARY. TABLE 2
INCLUDES THE ALPHA =.05 AND ALPHA =.01 CRITICAL VALUES FOR AN F-DISTRIBUTION
WITH 1 AND M-K-1 DEGREES OF FREEDOM.

EXAMPLE

     ENTER OPTIONS OR TYPE "HELP"
     FVL
     ENTER F-VALUE FOR ENTERING A VARIABLE
     4.0
     ENTER F-VALUE FOR OMITTING A VARIABLE
     3.5

WHEN OPTION FVL IS NOT SPECIFIED, THE COMPUTER ASSUMES THAT BOTH F-VALUES ARE
ZERO; HENCE, ALL VARIABLES ARE INCLUDED IN THE FINAL REGRESSION EQUATION AND
NONE ARE ELIMINATED.

NOTE: INTERMEDIATE STEPS ARE PRINTED.
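
THE F-LEVEL OF A VARIABLE JUST ENTERED CAN BE RECOVERED FROM THE PRINTED
COEFFICIENTS OF DETERMINATION. THE PYTHON SKETCH BELOW ASSUMES THE USUAL
PARTIAL F STATISTIC FOR A MODEL WITH A CONSTANT TERM; THIS DOCUMENT DOES NOT
STATE THE FORMULA STEPR USES, SO THE FORMULA AND THE FUNCTION NAMES
F_TO_ENTER AND IS_SIGNIFICANT ARE ASSUMPTIONS MADE FOR ILLUSTRATION.

```python
def f_to_enter(r2_before, r2_after, m, k):
    """Partial F for the variable just entered (assumed standard formula).

    r2_before -- coefficient of determination at the previous stage
    r2_after  -- coefficient of determination with the new variable included
    m         -- number of observations
    k         -- number of variables in the regression after entry
    The statistic has 1 and m-k-1 degrees of freedom.
    """
    df = m - k - 1
    if df <= 0:
        raise ValueError("need more observations than coefficients")
    return (r2_after - r2_before) / ((1.0 - r2_after) / df)


def is_significant(f_level, f_value):
    """FVL rule: significant if the F-level is >= the user's F-value."""
    return f_level >= f_value


# Figures from the sample run in section 6.0 (7 observations).
step1 = f_to_enter(0.0, 0.61488, 7, 1)      # printed F LEVEL: 7.9830
step2 = f_to_enter(0.61488, 0.79953, 7, 2)  # printed F LEVEL: 3.6842
print(round(step1, 3), round(step2, 3))
print(is_significant(step2, 0.2))
```

UNDER THIS ASSUMPTION THE COMPUTED VALUES AGREE WITH THE F-LEVELS OF THE
SAMPLE RUN TO ABOUT THREE DECIMAL PLACES; THE SMALL DIFFERENCES COME FROM THE
ROUNDING OF THE PRINTED COEFFICIENTS OF DETERMINATION. BOTH VARIABLES PASS
THE F-VALUE OF .2 USED IN THAT RUN.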
ROUNDOFF ERRORS OR OTHER CAUSES MAY MAKE THE CORRELATION MATRIX SINGULAR OR
NEARLY SINGULAR. THIS COULD CAUSE ONE OR MORE VARIABLES NOT TO BE INCLUDED IN
THE FINAL REGRESSION EQUATION.

OPTION 8.  FOR - SPECIFIED VARIABLES ARE FORCED INTO THE FINAL REGRESSION
           EQUATION. INTERMEDIATE STEPS ARE PRINTED ONLY FOR THOSE VARIABLES
           WHICH ARE NOT FORCED INTO THE REGRESSION EQUATION. IF YOU FORCE
           ALL THE VARIABLES SUBMITTED TO THIS PROGRAM, NO INTERMEDIATE STEPS
           ARE PRINTED AND ONLY THE FINAL REGRESSION EQUATION IS SHOWN.
           FORCING ALL THE VARIABLES MAKES THE STEPR PROGRAM BEHAVE AS A
           MULTIPLE REGRESSION PROGRAM.

           NOTE: DO NOT FORCE THE DEPENDENT VARIABLE.

FOR THE TWO EXAMPLES GIVEN BELOW, ONLY THOSE INSTRUCTIONS, QUESTIONS, AND
RESPONSES WE WANT TO EMPHASIZE ARE SHOWN.

EXAMPLE

     ENTER OPTIONS OR TYPE "HELP"    (SUPPOSE THAT 6 VARIABLES HAVE BEEN
     FOR                              ENTERED)
     ENTER NUMBER OF VARIABLES TO BE FORCED INTO THE REGRESSION
     3
     WHICH ARE THEY? (MAX: 20 PER LINE)
     1,4,5

     (THE REGRESSION ANALYSIS IS THEN PERFORMED DIRECTLY ON THE MODEL WITH
     VARIABLE 2 AS THE DEPENDENT VARIABLE AND VARIABLES 1,4, AND 5 AS THE
     INDEPENDENT VARIABLES.)

OPTION 9.  ELM - THIS OPTION ALLOWS THE USER TO ELIMINATE SPECIFIED VARIABLES
           FROM THE STEPWISE REGRESSION AT ALL STAGES.

EXAMPLE

     ENTER NUMBER OF VARIABLES
     6
     ENTER OPTIONS OR TYPE "HELP"
     ELM
     WHICH IS THE DEPENDENT VARIABLE?
     2
     HOW MANY VARIABLES WOULD YOU LIKE TO ELIMINATE?
     2
     WHICH ARE THEY? (MAX: 20 PER LINE)
     1,5

     (THE STEPWISE REGRESSION IS THEN RUN ON THE MODEL WITH VARIABLE 2 AS THE
     DEPENDENT VARIABLE AND VARIABLES 3,4, AND 6 AS THE INDEPENDENT
     VARIABLES.)

     NOTE: DO NOT ELIMINATE THE DEPENDENT VARIABLE!

OPTION 10. RES - RESIDUALS ARE PRINTED IN THE FINAL STEP IN THE REGRESSION.
           (IN THE OUTPUT TABLE TITLED  PREDICTED VS ACTUAL RESULTS,  THE
           NUMBERS UNDER  ACTUAL  ARE THE OBSERVATIONS SUPPLIED BY THE USER
           OR THE VALUES RESULTING FROM TRANSFORMATION OF OBSERVATIONS
           SUPPLIED BY THE USER FOR THE DEPENDENT VARIABLE.
           THE NUMBERS UNDER  PREDICTED  ARE CALCULATED FROM THE FINAL
           REGRESSION EQUATION, I.E.,

                Y(PI) = B(0) + B(1) X(1I) + ... + B(N) X(NI) ,

           WHERE X(1I),...,X(NI) ARE THE VALUES OF THE I-TH OBSERVATION OF
           THE N INDEPENDENT VARIABLES AND Y(PI) IS THE PREDICTED VALUE FOR
           THE I-TH OBSERVATION OF THE DEPENDENT VARIABLE.)

OPTION 11. ANA - AN ANALYSIS OF VARIANCE IS PRINTED AFTER THE FINAL STAGE IN
           THE REGRESSION.

OPTION 12. DUR - DURBIN-WATSON STATISTICS ARE PRINTED. THE DURBIN-WATSON
           STATISTIC IS A TEST FOR POSITIVE OR NEGATIVE SERIAL CORRELATION.
           SPECIAL DURBIN-WATSON TABLES MUST BE CONSULTED IN ORDER TO
           INTERPRET THE SIGNIFICANCE OF THIS STATISTIC. FOR A COMPLETE
           DISCUSSION OF THIS STATISTIC AND TABLES, THE USER SHOULD CONSULT
           "TESTING FOR SERIAL CORRELATION IN LEAST SQUARES REGRESSION I",
           BIOMETRIKA, VOLUME 37 (1950), PAGES 409-428 AND "TESTING FOR
           SERIAL CORRELATION IN LEAST SQUARES REGRESSION II", BIOMETRIKA,
           VOLUME 38 (1951), PAGES 159-178. BOTH ARTICLES ARE BY J. DURBIN
           AND G.S. WATSON.

OPTION 13. ZER - FORCING REGRESSION THROUGH ZERO. THIS OPTION FORCES THE
           CONSTANT TERM OF THE UNSTANDARDIZED REGRESSION EQUATION TO BE
           ZERO.

4.0  LIMITATIONS

     1.  NO MORE THAN 70 VARIABLES.
     2.  NO MORE THAN 40 TRANSFORMATIONS.
     3.  NO MORE THAN 5 FORMAT CARDS.
     4.  NO LIMIT ON OBSERVATIONS.
     5.  IF THERE ARE MORE THAN 10 MISSING DATA SYMBOLS, THEY MUST BE ENTERED
         AT THE RATE OF 10 PER LINE.
     6.  WITH STANDARD FORMAT, IF THERE ARE MORE THAN 10 VARIABLES, THEY MUST
         BE ENTERED AT THE RATE OF 10 PER LINE.

5.0  METHOD OF USE

AN EXAMPLE OF RUNNING THE STEPR PROGRAM IS GIVEN HERE WITH A LINE BY LINE
EXPLANATION FOLLOWING IT. ^Z AND LINES ENDING WITH <CR> CONTAIN USER'S
RESPONSES. AN ^ MEANS THAT THE CHARACTER FOLLOWING IT IS TYPED WHILE
DEPRESSING THE CONTROL (CTRL) KEY. A <CR> INDICATES THAT THE RETURN KEY IS
PRESSED.

          .R STEPR<CR>
          WMU STEPWISE REGRESSION
LINE 1    OUTPUT? TTY:<CR>
LINE 2    INPUT? TTY:<CR>
LINE 3    FORMAT: (F-TYPE ONLY)
LINE 4    STD<CR>
LINE 5    ENTER NUMBER OF VARIABLES
LINE 6    4<CR>
LINE 7    ENTER IDENTIFICATION IF DESIRED OTHERWISE RETURN
LINE 8    TRIAL RUN<CR>
LINE 9    ENTER OPTIONS OR TYPE "HELP"
LINE 10   MIS,TRA,MVS,FVL,FOR,ELM,ANA<CR>
LINE 11   IS THERE MORE THAN ONE MISSING DATA SYMBOL?
LINE 12   NO<CR>
LINE 13   ENTER MISSING DATA SYMBOL
LINE 14   0<CR>
          HOW WOULD YOU LIKE TO COMPENSATE FOR MISSING DATA? TYPE:
LINE 15   1 - TO REPLACE MISSING DATA BY MEAN VALUE
          2 - TO DELETE THE OBSERVATION
LINE 16   1<CR>
LINE 17   ENTER TRANSFORMATIONS
LINE 18   (5)=E**(4)<CR>
LINE 19   END<CR>
LINE 20   ENTER DATA (AT MOST 10 PER LINE)
          1,3,2,4<CR>
          0,1,3,2<CR>
          1,4,2,3<CR>
LINE 21   7,1,4,5<CR>
          1,1,2,1<CR>
          1,0,5,5<CR>
          9,7,8,7<CR>
          ^Z
          [OUTPUT]
LINE 22   WHICH IS THE DEPENDENT VARIABLE?
LINE 23   1<CR>
LINE 24   ENTER F-VALUE FOR ENTERING A VARIABLE.
LINE 25   .2<CR>
LINE 26   ENTER F-VALUE FOR OMITTING A VARIABLE.
LINE 27   .2<CR>
LINE 28   HOW MANY VARIABLES WOULD YOU LIKE TO ELIMINATE?
LINE 29   1<CR>
LINE 30   WHICH ARE THEY? (MAX: 20 PER LINE)
LINE 31   4<CR>
LINE 32   ENTER NUMBER OF VARIABLES TO BE FORCED INTO THE REGRESSION.
LINE 33   2<CR>
LINE 34   WHICH ARE THEY? (MAX: 20 PER LINE)
LINE 35   2,3<CR>
          [OUTPUT]
LINE 36   DO YOU WISH TO REANALYZE THE SAME DATA?
LINE 37   NO<CR>
LINE 38   INPUT? FINISH<CR>
          CPU TIME: 2.03   ELAPSED TIME: 10:1.35
          NO EXECUTION ERRORS DETECTED
          EXIT
          .

THE FOLLOWING IS AN EXPLANATION OF THE EXAMPLE LISTED ABOVE.

LINE 1    OUTPUT?
LINE 2    INPUT?

LINES 1 AND 2 DEFINE WHERE THE USER INTENDS TO WRITE HIS OUTPUT FILE (LINE 1)
AND FROM WHERE THE USER EXPECTS TO READ HIS INPUT DATA (LINE 2). SEE NOTE (2)
BELOW FOR OTHER INPUT OPTIONS. THE PROPER RESPONSE TO EACH OF THESE QUESTIONS
CONSISTS OF THREE BASIC PARTS: A DEVICE, A FILENAME, AND A PROJECT-PROGRAMMER
NUMBER.
THE GENERAL FORMAT FOR THESE THREE PARTS IS AS FOLLOWS:

     DEV:FILE.EXT[PROJ,PROG]

1)  DEV:  ANY OF THE FOLLOWING DEVICES ARE APPROPRIATE WHERE INDICATED:

     DEVICE LIST    DEFINITION          STATEMENT USE
     TTY:           TERMINAL            INPUT OR OUTPUT
     DSK:           DISK                INPUT OR OUTPUT
     CDR:           CARD READER         INPUT ONLY
     LPT:           LINE PRINTER        OUTPUT ONLY
     DTA0:          DECTAPE 0           INPUT OR OUTPUT
     DTA1:          DECTAPE 1           INPUT OR OUTPUT
     DTA2:          DECTAPE 2           INPUT OR OUTPUT
     DTA3:          DECTAPE 3           INPUT OR OUTPUT
     DTA4:          DECTAPE 4           INPUT OR OUTPUT
     DTA5:          DECTAPE 5           INPUT OR OUTPUT
     DTA6:          DECTAPE 6           INPUT OR OUTPUT
     DTA7:          DECTAPE 7           INPUT OR OUTPUT
     MTA0:          MAGNETIC TAPE 0     INPUT OR OUTPUT
     MTA1:          MAGNETIC TAPE 1     INPUT OR OUTPUT

    INPUT MAY NOT BE DONE FROM THE LINE PRINTER NOR MAY OUTPUT GO TO THE CARD
    READER.

2)  FILE.EXT  IS THE NAME AND EXTENSION OF THE FILE TO BE USED. THIS PART OF
    THE SPECIFICATION IS USED ONLY IF DISK OR DECTAPE IS USED.

3)  [PROJ,PROG]  IF A DISK IS USED AND THE USER WISHES TO READ A FILE IN
    ANOTHER PERSON'S DIRECTORY, HE MAY DO SO BY SPECIFYING THE
    PROJECT-PROGRAMMER NUMBER OF THE DIRECTORY FROM WHICH HE WISHES TO READ.
    THE PROJECT NUMBER AND THE PROGRAMMER NUMBER MUST BE SEPARATED BY A COMMA
    AND ENCLOSED IN BRACKETS. OUTPUT MUST GO TO YOUR OWN AREA.

EXAMPLE:

     OUTPUT? LPT:/2
     INPUT? DSK:DATA.DAT[71171,71026]

IN THE EXAMPLE, TWO COPIES OF THE OUTPUT ARE TO BE PRINTED BY THE HIGH SPEED
LINE PRINTER. THE INPUT DATA IS A DISK FILE NAMED DATA.DAT IN USER DIRECTORY
[71171,71026].

DEFAULTS:

     1)  IF NO DEVICE IS SPECIFIED BUT A FILENAME IS SPECIFIED, THE DEFAULT
         DEVICE WILL BE DSK:
     2)  IF NO FILENAME IS SPECIFIED AND A DISK OR DECTAPE IS USED, THE
         DEFAULT ON INPUT WILL BE FROM INPUT.DAT; ON OUTPUT IT WILL BE
         OUTPT.DAT.
     3)  IF THE PROGRAM IS RUN FROM THE TERMINAL AND NO SPECIFICATION IS
         GIVEN (JUST A CARRIAGE RETURN), BOTH INPUT AND OUTPUT DEVICES WILL
         BE THE TERMINAL.
     4)  IF THE PROGRAM IS RUN THROUGH BATCH AND NO SPECIFICATION IS GIVEN
         (A BLANK CARD), THE INPUT DEVICE WILL BE CDR: AND THE OUTPUT DEVICE
         WILL BE LPT:
     5)  IF NO PROJECT-PROGRAMMER NUMBER IS GIVEN, THE USER'S OWN NUMBER WILL
         BE ASSUMED.

NOTE:

     (1)  IF LPT: IS USED AS AN OUTPUT DEVICE, MULTIPLE COPIES MAY BE OBTAINED
          BY SPECIFYING LPT:/N WHERE N REFERS TO THE NUMBER OF COPIES DESIRED.

     (2)  THE FOLLOWING TWO OPTIONS ARE NOT APPLICABLE FOR THE FIRST DATA SET,
          I.E., THEY ARE APPLICABLE ONLY WHEN THE PROGRAM BRANCHES BACK TO
          LINE 2 UPON FIRST COMPLETION OF LINES 1-37.

          (A)  SAME OPTION
               UPON RETURNING FROM LINE 37, IF THE SAME DATA FILE IS TO BE
               USED AGAIN SIMPLY ENTER "SAME"; OTHERWISE, EITHER USE THE
               FINISH OPTION OR ENTER ANOTHER FILENAME ETC.

          (B)  FINISH OPTION
               THE USER MUST ENTER "FINISH" TO BRANCH OUT OF THE STEPR
               PROGRAM. FAILURE TO DO SO MIGHT RESULT IN LOSING THE ENTIRE
               OUTPUT FILE.

LINES 3-4.  THERE ARE 3 OPTIONS AVAILABLE FOR THE FORMAT, NAMELY:

     (A)  STANDARD FORMAT OPTION
          UNLESS OTHERWISE SPECIFIED, THE PROGRAM ASSUMES THE STANDARD OPTION.
          IN THIS OPTION, THE DATA ARE ARRANGED IN GROUPS OF 10 PER LINE, WITH
          ADJACENT VALUES SEPARATED BY A COMMA. TO USE THIS OPTION, SIMPLY
          TYPE IN "<CR>" ON TERMINAL JOBS OR USE A BLANK CARD FOR BATCH JOBS
          OR ENTER "STD".

     (B)  OBJECT TIME FORMAT OPTION
          IF THE DATA IS SUCH THAT A USER'S OWN FORMAT IS REQUIRED, SIMPLY
          ENTER A LEFT PARENTHESIS FOLLOWED BY THE FIRST FORMAT SPECIFICATION,
          A COMMA AND THE SECOND SPECIFICATION, ETC. WHEN YOU FINISH, ENTER A
          RIGHT PARENTHESIS AND THEN A CARRIAGE RETURN. THERE CAN BE A MAXIMUM
          OF 5 LINES FOR THE FORMAT, EACH LINE BEING 80 COLUMNS LONG. NOTE
          THAT THE FORMAT SPECIFICATION LIST MUST USE THE FLOATING POINT
          (F-TYPE) NOTATION AND MUST CONTAIN A SPECIFICATION FOR EACH OF THE
          VARIABLES. THE SPECIFICATIONS FOR THE FORMAT ITSELF ARE THE SAME AS
          FOR THE FORTRAN IV FORMAT STATEMENT.

     (C)  SAME OPTION
          THE SAME OPTION IS APPLICABLE ONLY TO JOBS THAT USE MORE THAN ONE
          DATA FILE.
          IF AN OBJECT TIME FORMAT WAS USED ON A DATA SET AND THE SUCCEEDING
          DATA SET UTILIZES THE SAME FORMAT, SIMPLY ENTER "SAME".

LINES 5-6.  ON LINE 6 ENTER THE NUMBER OF VARIABLES TO BE ENTERED INTO THE
PROGRAM FROM THE DATA. DO NOT COUNT ANY VARIABLES GENERATED DURING
TRANSFORMATIONS. THIS NUMBER MUST BE LESS THAN OR EQUAL TO 70.

LINES 7-8.  THE USER MAY IDENTIFY HIS OUTPUT FROM THIS PROGRAM BY UP TO 80
CHARACTERS BY ENTERING THEM ON LINE 8. IF NO OUTPUT IDENTIFICATION IS WANTED,
THE ENTERING OF A CARRIAGE RETURN ON LINE 8 WILL CAUSE THE COMPUTER TO SKIP TO
LINE 9.

LINES 9-10.  AT THIS POINT THE USER MUST SUBMIT THE OPTIONS HE HAS SELECTED
(SEE SECTION 3.0). THESE THREE-LETTER CODES MAY BE SUBMITTED IN ANY ORDER.
TYPING HELP GIVES AN OUTPUT SIMILAR TO TABLE 1 IN SECTION 1.0, THEN THE
QUESTION IS REPEATED.

LINES 11-14.  THE QUESTIONS ON LINES 11, 13, AND 15 WILL ONLY BE ASKED AND
ANSWERS WILL ONLY BE REQUIRED IF THE MIS OPTION IS SELECTED. THE USER CAN
SPECIFY A SINGLE MISSING DATA SYMBOL FOR ALL VARIABLES. IN THIS CASE  NO  IS
TYPED ON LINE 12 AND THE SINGLE MISSING DATA SYMBOL IS TYPED ON LINE 14. THE
ALTERNATIVE TO SPECIFYING A SINGLE MISSING DATA SYMBOL IS TO SPECIFY A MISSING
DATA SYMBOL FOR EACH VARIABLE SUBMITTED BY THE USER. IN THIS CASE YOU MUST
TYPE  YES  ON LINE 12. THE PROGRAM WILL THEN PRINT  ENTER MISSING DATA SYMBOLS
ON LINE 13. YOU MUST THEN TYPE ON LINE 14 THE MISSING DATA SYMBOLS FOR EACH
VARIABLE SEPARATED BY COMMAS.

A MISSING DATA SYMBOL MUST BE AN INTEGER OR A DECIMAL NUMBER. A LETTER CANNOT
BE A MISSING DATA SYMBOL. THE NUMBER USED FOR A MISSING DATA SYMBOL MUST NOT
BE EQUAL TO ANY VALID INPUT NUMBER SUBMITTED BY THE USER.

EXAMPLE:

     LINE 11.  IS THERE MORE THAN ONE MISSING DATA SYMBOL?
     LINE 12.  YES
     LINE 13.  ENTER MISSING DATA SYMBOLS
     LINE 14.  9,8,9

9 IS THE MISSING DATA SYMBOL FOR VARIABLES 1 AND 3; AND 8 IS THE MISSING DATA
SYMBOL FOR VARIABLE 2. IF THERE ARE MORE THAN 10 MISSING DATA SYMBOLS, ENTER
THEM AT THE RATE OF 10 PER LINE.

CAUTION: THERE MUST BE EXACTLY AS MANY MISSING DATA SYMBOLS AS THERE ARE
VARIABLES SUBMITTED BY THE USER, EVEN THOUGH SOME VARIABLES DO NOT CONTAIN
MISSING DATA.

LINES 15-16.  SELECT THE METHOD TO BE USED TO COMPENSATE FOR MISSING DATA. IF
1 IS SELECTED, MISSING DATA CODES WILL BE REPLACED BY THE MEAN VALUE OF THE
VALID CODES IN THE VARIABLE IN WHICH THE MISSING DATA CODE APPEARS. IF 2 IS
SELECTED, ANY OBSERVATION IN WHICH A MISSING DATA CODE APPEARS WILL BE OMITTED
FROM THE ANALYSIS.

LINES 17-19.  THE QUESTION ON LINE 17 WILL APPEAR ONLY IF THE TRA OPTION HAS
BEEN SELECTED. THE USER MUST SUBMIT TRANSFORMATIONS OF THE ORIGINAL DATA AND
ENTER AN  END  WHEN ALL TRANSFORMATIONS HAVE BEEN ENTERED. SEE THE DETAILED
DESCRIPTION OF THE TRA OPTION UNDER SECTION 3.0 OF THIS DOCUMENT.

LINES 20-21.  IF THE INPUT DEVICE SELECTED IN LINE 2 WAS TTY:, THE DATA MUST
BE ENTERED AT THIS POINT. BE SURE TO SUBMIT IT IN THE FORMAT SELECTED IN LINE
4. DATA ENTRY IS TERMINATED BY A ^Z. IF SOME OTHER DEVICE WAS SELECTED FOR
INPUT, THE DATA WILL BE READ FROM THE DEVICE AT THIS POINT. (IT MAY TAKE A FEW
MINUTES TO READ THE DATA.)

LINES 22-23.  AT THIS POINT ONE OF THE VARIABLES THAT WERE EITHER SUBMITTED OR
GENERATED WITH TRANSFORMATIONS MUST BE SELECTED AS THE DEPENDENT VARIABLE.

LINES 24-27.  IF THE FVL OPTION WAS SELECTED, THE F-VALUE CRITERIA FOR
ENTERING AND OMITTING VARIABLES MUST BE SUBMITTED ON LINES 25 AND 27. SEE THE
DESCRIPTION OF THE FVL OPTION UNDER SECTION 3.0 OF THIS DOCUMENT.

LINES 28-31.  IF THE ELM OPTION WAS SELECTED, THE QUESTIONS ON LINES 28 AND 30
WILL BE ASKED. THE RESPONSES TO THESE QUESTIONS GIVE THE USER THE OPPORTUNITY
TO PERFORM A STEPWISE REGRESSION ON A SUBSET OF THE EXISTING VARIABLES. SEE
THE ELM OPTION UNDER SECTION 3.0 OF THIS DOCUMENT.

LINES 32-35.  IF THE FOR OPTION WAS SELECTED, THE QUESTIONS ON LINES 32 AND 34
WILL BE ASKED. THE RESPONSES TO THESE QUESTIONS GIVE THE USER THE ABILITY TO
FORCE VARIABLES INTO THE REGRESSION REGARDLESS OF THEIR STATISTICAL
SIGNIFICANCE.
SEE THE FOR OPTION UNDER SECTION 3.0 OF THIS DOCUMENT.

LINES 36-37.  IF THE RESPONSE TO THE QUESTION ON LINE 36 IS YES, THE PROGRAM
WILL TRANSFER CONTROL TO LINE 22, AND OTHER VARIABLES MAY BE FORCED OR OMITTED
AND OTHER F-VALUES MAY BE SELECTED. IF THE RESPONSE IS NO, THE PROGRAM WILL
REQUEST MORE INPUT.

LINE 38.  IF THE USER WISHES TO PROCESS ANOTHER SET OF DATA, HE MAY DO SO BY
SELECTING THE APPROPRIATE INPUT OPTION (SEE LINE 2), OR HE MAY EXIT FROM THE
PROGRAM BY TYPING FINISH.

6.0  SAMPLE TERMINAL RUN

.R STEPR
WMU STEPWISE REGRESSION
OUTPUT? (TYPE HELP IF NEEDED)--TTY:
INPUT? (TYPE HELP IF NEEDED)--TTY:
FORMAT: (F-TYPE ONLY)
STD
ENTER NUMBER OF VARIABLES
4
ENTER IDENTIFICATION IF DESIRED OTHERWISE RETURN
TRIAL RUN
ENTER OPTIONS OR TYPE "HELP"
MIS,TRA,MVS,FVL,ELM,ANA
IS THERE MORE THAN ONE MISSING DATA SYMBOL ?
NO
ENTER MISSING DATA SYMBOL
0
HOW WOULD YOU LIKE TO COMPENSATE FOR MISSING DATA? TYPE:
1 - TO REPLACE MISSING DATA BY MEAN VALUE
2 - TO DELETE THE OBSERVATION
1
ENTER TRANSFORMATIONS
(5)=E**(4)
END
ENTER DATA (AT MOST 10 PER LINE)
1,3,2,4
0,1,3,2
1,4,2,3
7,1,4,5
1,1,2,1
1,0,5,5
9,7,8,7
^Z

TRIAL RUN

THE NUMBER OF OBSERVATIONS IS 7

VARIABLE     MEAN         VARIANCE     STD. DEV.
    1        3.333333     11.22222     3.349959
    2        2.833333     4.805556     2.192158
    3        3.714286     4.904762     2.214670
    4        3.857143     4.142857     2.035401
    5        211.1786     156321.4     395.3751

WHICH IS THE DEPENDENT VARIABLE?
1
ENTER F-VALUE FOR ENTERING A VARIABLE.
.2
ENTER F-VALUE FOR OMITTING A VARIABLE.
.2
HOW MANY VARIABLES WOULD YOU LIKE TO ELIMINATE?
1
WHICH ARE THEY?(MAX: 20 PER LINE)
4

VARIABLE NO. 1 IS DEPENDENT
ELIMINATE 4

STANDARD ERROR OF Y = 3.349959

STEP NO. 1
VARIABLE ENTERING 5
F LEVEL 7.9830     F-PROB = 0.03687
STANDARD ERROR OF ESTIMATE = 2.2773
COEFFICIENT OF DETERMINATION = 0.61488
COEFFICIENT OF MULTIPLE REGRESSION = 0.78414
INCREASE IN COEFFICIENT OF DETERMINATION = 0.61488
DEGREES OF FREEDOM = 5

CONSTANT     STD. ERR.     T            T-PROB
1.930277     0.9937278     1.942460     0.10972

VARIABLE     COEFFICIENT     STD ERROR OF COEF
X=  5        0.00664         0.00235

STEP NO. 2
VARIABLE ENTERING 2
F LEVEL = 3.6842     F-PROB = 0.12736
STANDARD ERROR OF ESTIMATE = 1.8370
COEFFICIENT OF DETERMINATION = 0.79953
COEFFICIENT OF MULTIPLE REGRESSION = 0.89416
INCREASE IN COEFFICIENT OF DETERMINATION = 0.18465
DEGREES OF FREEDOM = 4

CONSTANT     STD. ERR.     T            T-PROB
4.117491     1.393215      2.955387     0.04175

                        STD ERR      STANDARDIZED
VARIABLE   COEFF.       OF COEFF.    COEFF.         T-VALUE    T-PROB
X( 2)      -1.17425     0.611774     -0.768411      -1.919     0.12736
X( 5)       0.01204     0.003392      1.421175       3.550     0.02380

ANALYSIS OF VARIANCE

SOURCE         SUM OF SQ.     DF     MEAN SQ.     F         F-PROB
REGRESSION     53.834723       2     26.91736     7.976     0.04019
ERROR          13.498612       4      3.37465
TOTAL          67.333334       6

DO YOU WISH TO REANALYZE THE SAME DATA?
NO
INPUT? (TYPE HELP IF NEEDED)--FINISH

END OF EXECUTION
CPU TIME: 0.80   ELAPSED TIME: 9.73
EXIT

7.0  BATCH OPERATION

THE FOLLOWING IS A BATCH JOB SET UP: (EACH LINE REPRESENTS ONE CARD, EACH CARD
STARTING IN COLUMN 1; DO NOT INCLUDE THE COMMENTS AT THE RIGHT.)

--------------------------------------------------------------------------------
                                   COMMENTS

$JOB [#,#]                         JOB CARD; INSERT USER'S PROJECT-PROGRAMMER
                                   NUMBER WITHIN THE BRACKET
$PASSWORD ######                   IN PLACE OF THE 6 #'S, PUT IN THE PASSWORD
$DATA                              SIGNIFY BEGINNING OF DATA DECK
(DATA CARDS)                       INSERT THE DATA CARD DECK TO BE ANALYZED
$EOD                               SIGNIFY THE END OF DATA CARD DECK
.R STEPR                           START THE EXECUTION
(RESPONSES TO LINES 1-38 IN        USER'S RESPONSE
 SECTION 5.0 REPEATED OR NOT)
(EOF)                              END-OF-FILE CARD
--------------------------------------------------------------------------------

REFERENCE: "MATHEMATICAL METHODS FOR DIGITAL COMPUTERS", A. RALSTON AND
           H.S. WILF, 1960, JOHN WILEY & SONS, INC., ARTICLE BY M.A. EFROYMSON