.LM0;.RM75;.PS41,75;.TS71;.LC;.AP;.NNM;.FL CAP;#
.BR;^FIRST PAGE
.PAGE;#
.SK10;.C;^^A USER PROGRAM FOR MULTIPLE LINEAR REGRESSION ANALYSIS\\
.BR;.C;======================================================
.SK7;.C;^MARTEN VAN ^GELDEREN
.SK2;.C;^VERSION 5^H(246)
.SK2;.C;16-^FEB-80
.PAGE;#
.SK23;.C;^COPYRIGHT (^C) 1975, 1979, 1980 BY
.BR;.C;^FOUNDATION ^MATHEMATICAL ^CENTRE, ^AMSTERDAM
.BR;.C;^INSTITUTE FOR ^NUCLEAR ^PHYSICS ^RESEARCH, ^AMSTERDAM
.SK;.LM+4;.RM-4;^GENERAL PERMISSION TO MAKE FAIR USE IN TEACHING OR
RESEARCH OF ALL OR PART OF THIS MATERIAL IS GRANTED TO INDIVIDUAL
READERS AND TO NONPROFIT ORGANIZATIONS, PROVIDED THAT THE COPYRIGHT
NOTICE OF EITHER THE ^FOUNDATION ^MATHEMATICAL ^CENTRE OR THE
^INSTITUTE FOR ^NUCLEAR ^PHYSICS ^RESEARCH IS GIVEN AND THAT REFERENCE
IS MADE TO THIS PUBLICATION AND TO THE FACT THAT REPRINTING PRIVILEGES
WERE GRANTED BY PERMISSION OF ONE OF THE ABOVE MENTIONED ORGANIZATIONS.
.LM-4;.RM+4;.PAGE;=FRM)
.BR;REGRESSION P-1 #B'^X'^Y#-#NU_^2 ####=FRR)
.BR;RESIDUAL N-P ##=FRL)
.BR;#PURE ERROR N-K ###=FRQ)
.BR;---------------------------------------------------------------------------
^THE COLUMN 'MEAN SQUARE' IS OBTAINED BY DIVISION OF THE SUMS OF SQUARES
BY THEIR CORRESPONDING DEGREES OF FREEDOM.
^THE COLUMN '^F-RATIO' IS OBTAINED BY DIVISION OF THE MEAN SQUARES BY
THE RESIDUAL MEAN SQUARE, EXCEPT FOR THE LACK OF FIT ^F-RATIO, WHICH IS
OBTAINED BY DIVISION OF THE LACK OF FIT MEAN SQUARE BY THE PURE ERROR
MEAN SQUARE, THUS:
.SK;.I3;=#.
^AS IN THE CASE OF SINGLE NAMES USED AS A REPETITION FACTOR EACH
(NON-STANDARD#FUNCTION AND NON-OPTION) NAME USED IN SUCH A (SPECIAL)
EXPRESSION MUST HAVE BEEN GIVEN, FOLLOWED BY A COMMA, EARLIER IN THE
INPUT FORMULA THAN THE USE OF THAT NAME IN THE EXPRESSION.
^THE LINKAGE BETWEEN THE MODEL FORMULA AND THE INPUT FORMULA IS
ESTABLISHED BY USING THE SAME NAMES IN THE MODEL TERMS AND IN THE INPUT
VARIABLE LISTS.
^NUMBERS FROM THE INPUT DATA THAT BELONG TO SUCH INPUT NAMES WILL BE
TREATED AS OBSERVATIONS FOR THE MODEL VARIABLES, WHILE NUMBERS THAT
BELONG TO INPUT NAMES BETWEEN SQUARE BRACKETS THAT DO NOT APPEAR IN THE
MODEL FORMULA ARE SKIPPED.
^OFTEN, REPEATED OBSERVATIONS FOR THE DEPENDENT VARIABLE ARE AVAILABLE.
^IN ORDER TO PROCESS THESE OBSERVATIONS AUTOMATICALLY, A VARIABLE LIST
CONSISTING ENTIRELY OF DEPENDENT VARIABLES MUST BE PRECEDED BY A
REPETITION FACTOR (FOLLOWED BY AN ASTERISK) INDICATING THE NUMBER OF
REPLICATIONS.
^IF A VARIABLE LIST CONTAINS INDEPENDENT AS WELL AS DEPENDENT VARIABLES,
THE NUMBER OF REPLICATIONS IS ASSUMED TO BE 1.
^A SERIES OF (SAY 100) OBSERVATIONS FOR A DEPENDENT VARIABLE WITH NO
REPLICATIONS IS DENOTED AS:
.SK;.C;100 * ([DEP VAR])
.SK;^THE REPETITION FACTOR IN FRONT OF THE OPENING SQUARE BRACKET IS
OMITTED (BECAUSE IT IS 1), ALTHOUGH THE PARENTHESES ARE NOT.
^WITHOUT THE PARENTHESES IT WOULD MEAN 100 REPLICATIONS OF [DEP VAR].
.SK; * (C, M, M * [Y], [X1,X2,X3,X4], C), -99;
.LM+12;.SK;.I-12;MEANS THAT:#THE FIRST NUMBER IS READ AND ITS VALUE ASSIGNED TO K,
.BR;THE NEXT NUMBER IS READ AND ITS VALUE ASSIGNED TO N,
.BR;THEN K+N TIMES THE FOLLOWING HAPPENS:
.LM+5;.BR;A NUMBER IS READ AND ITS VALUE ASSIGNED TO C,
.BR;THE NEXT NUMBER IS READ AND ITS VALUE ASSIGNED TO M,
.BR;THEN THE M REPLICATIONS FOR Y ARE READ,
.BR;NEXT THE OBSERVATIONS FOR X1, X2, X3 AND X4 ARE READ,
.BR;THEN A NUMBER IS READ AND ITS VALUE COMPARED WITH C,
.LM-5;.BR;FINALLY A NUMBER IS READ AND ITS VALUE COMPARED WITH -99.
.LM-12;.SK;^IF A COMPARISON FAILS, AN ERROR MESSAGE IS ISSUED AND
EXECUTION OF THE JOB IS TERMINATED; OTHERWISE (K+N) OBSERVATIONS FOR
X1, X2, X3, X4 AND, FOR EACH QUADRUPLE, M REPLICATIONS FOR Y HAVE BEEN
IDENTIFIED.
.SK2;2.3##^^THE OPTION SPECIFICATION\\

^IT IS POSSIBLE TO HAVE THE PROGRAM PERFORM SOME TASKS OPTIONALLY BY
PROVIDING AN OPTION SPECIFICATION IN A JOB.
^IT CONSISTS OF THE KEYWORD "^OPTIONS" FOLLOWED BY A LIST OF OPTION
IDENTIFIERS OR CORRESPONDING OPTION NUMBERS (THE OPTION STATEMENT),
SEPARATED BY COMMAS AND TERMINATED WITH A ';'#(SEMICOLON).
^THE FOLLOWING TEN OPTIONS ARE AVAILABLE:
.SK;.TS20,40;.I15;OPTION NUMBER ##OPTION NAME
.SK;# #1 ^TRANSFORMED DATA MATRIX
.BR;# #2 ^CORRELATION MATRIX
.BR;# #3 ^RESIDUAL ANALYSIS
.BR;# #4 ^NO REGRESSION ANALYSIS
.BR;# #5 ^PROCESS SUBMODELS
.BR;# #6 ^PRINT INPUT DATA
.BR;# #7 ^NO INPUT DATA REWIND
.BR;# #8 ^SAVE ORIGINAL MODEL
.BR;# #9 ^TEST REDUCED MODEL
.BR;# 10 ^MISSING VALUES

^OPTIONS 1, 2, 3 AND 6 CAUSE THE CORRESPONDING PIECE OF INFORMATION TO
BE PRINTED.
^HOWEVER, OPTION 1 LISTS ONLY THOSE (POSSIBLY TRANSFORMED) VARIABLES
THAT ARE PRESENT IN THE MODEL FORMULA IN A NEAT TABULAR FORM, WHILE
OPTION 6 LISTS ALL THE ORIGINAL INPUT DATA SERIALLY (ELEVEN NUMBERS PER
LINE) WITHOUT ANY SPECIAL LAYOUT, BECAUSE THE INPUT DATA CONSISTS (BY
DEFINITION) OF AN UNSTRUCTURED SERIES OF NUMBERS (CF.#SECTION 2.4).
.SK;.I5;^OPTION 4 SUPPRESSES THE REGRESSION ANALYSIS; IT IS MEANT TO BE
USED IN COMBINATION WITH OPTION 1 AND/OR 2.

^OPTION 5 CAUSES THE PROGRAM TO PROCESS SUBMODELS, WHICH ARE FORMED BY A
FORM OF BACKWARD ELIMINATION: EACH TIME THE LAST TERM OF THE RIGHT HAND
PART OF THE MODEL FORMULA IS OMITTED, BY DELETING THE LAST COLUMN FROM
THE DESIGN MATRIX, AND A REGRESSION ANALYSIS IS PERFORMED WITH THE
REDUCED DESIGN MATRIX.
^MESSAGES REPORT WHICH TERMS ARE OMITTED, AND PROCESSING OF THE JOB
STOPS WHEN THE RESULTING MODEL FORMULA HAS THE FORM: Y#=#C.
^MOREOVER A TEST IS MADE (UNDER THE USUAL ASSUMPTIONS) WHETHER THE
OMITTED TERMS CONTRIBUTED SIGNIFICANTLY TO THE REGRESSION SUM OF SQUARES
(CF.#SECTION 1.3.2.4).
^A SPECIFIER LIST MAY BE APPENDED TO OPTION 5 TO PREVENT THE PRODUCTION
OF SUPERFLUOUS OUTPUT FOR UNWANTED SUBMODELS.
^IN THIS LIST THE NUMBER OF TERMS TO BE OMITTED FROM THE MODEL FORMULA
(COUNTING BACKWARDS, STARTING AT THE END) MUST BE GIVEN ENCLOSED IN
PARENTHESES.
^FOR EXAMPLE THE OPTION: PROCESS SUBMODELS (6, 10) INSTRUCTS THE PROGRAM
TO PROCESS ONLY TWO SUBMODELS, ONE WITH THE LAST SIX TERMS OMITTED AND
ONE WITH THE LAST TEN TERMS OMITTED (FROM THE ORIGINAL MODEL#FORMULA).
^IF THE USER ASKS FOR MORE TERMS TO BE OMITTED THAN ARE PRESENT IN THE
MODEL FORMULA, AN ERROR MESSAGE IS ISSUED AND THE EXECUTION OF THAT JOB
IS TERMINATED.
^MOREOVER, IF NO EXPLICIT SPECIFIER LIST IS APPENDED TO OPTION 5,
OPTIONS 2 AND 3 HAVE NO EFFECT (EVEN IF SPECIFIED), AGAIN TO PREVENT THE
PRODUCTION OF SUPERFLUOUS OUTPUT FOR THE SUBMODELS.

^OPTION 7 GIVES THE USER THE OPPORTUNITY TO PROCESS CONSECUTIVE PIECES
OF INPUT DATA IN CONSECUTIVE JOBS.
^NORMALLY THE PROCESSING OF THE INPUT DATA FOR EACH JOB STARTS WITH THE
FIRST NUMBER IN THE DATA SPECIFICATION (OR WITH THE FIRST NUMBER IN THE
DATASTREAM), AND THE PROGRAM GIVES A (WARNING) MESSAGE IF THE INPUT
FORMULA DOES NOT MATCH THE INPUT DATA PRECISELY.
^THIS OPTION SUPPRESSES THE MESSAGE AND CAUSES THE PROGRAM TO CONTINUE
PROCESSING INPUT DATA WHERE THE PREVIOUS JOB HAD FINISHED.

^OPTION 8 SAVES THE RESIDUAL DEGREES OF FREEDOM AND THE RESIDUAL SUM OF
SQUARES FROM THE CURRENT JOB.
^SPECIFYING OPTION 9 IN THE NEXT JOB THEN TESTS WHETHER THE MODEL UNDER
CONSIDERATION IN THAT JOB SHOWS A SIGNIFICANT INCREASE IN RESIDUAL SUM
OF SQUARES IN COMPARISON WITH THE MODEL IN THE PREVIOUS JOB.
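.SK;^THE COMPARISON MADE WITH OPTIONS 8 AND 9 CAN BE ILLUSTRATED OUTSIDE
THE PROGRAM.
^THE FOLLOWING ^PYTHON SKETCH ASSUMES THE USUAL EXTRA SUM OF SQUARES
^F-RATIO FOR TWO NESTED MODELS; THE EXACT STATISTIC COMPUTED BY THE
PROGRAM IS NOT SHOWN HERE AND MAY DIFFER.
^THE FUNCTION NAME AND ITS ARGUMENT NAMES ARE HYPOTHETICAL AND ARE NOT
PART OF THE PROGRAM.
.SK;.NO FILL;.NO JUSTIFY

```python
def reduced_model_f_ratio(rss_saved, df_saved, rss_current, df_current):
    """Extra-sum-of-squares F-ratio for two nested least-squares fits.

    rss_saved, df_saved: residual sum of squares and residual degrees of
        freedom of the earlier job (the larger model, as saved by option 8).
    rss_current, df_current: the same quantities for the current job
        (the reduced model, as tested with option 9).
    Returns (f_ratio, numerator_df, denominator_df).

    Assumption: this is the textbook statistic, not necessarily the exact
    computation performed by the program.
    """
    extra_df = df_current - df_saved
    if df_saved <= 0 or extra_df <= 0:
        raise ValueError(
            "the reduced model must have more residual degrees of freedom"
        )
    if rss_saved <= 0:
        raise ValueError("saved residual sum of squares must be positive")
    # Increase in residual sum of squares per omitted degree of freedom.
    extra_mean_square = (rss_current - rss_saved) / extra_df
    # Residual mean square of the larger (saved) model.
    saved_mean_square = rss_saved / df_saved
    return extra_mean_square / saved_mean_square, extra_df, df_saved


# Illustrative numbers only (not taken from any real job).
f_ratio, df1, df2 = reduced_model_f_ratio(10.0, 20, 16.0, 22)
print(f_ratio, df1, df2)  # prints: 6.0 2 20
```

.FILL;.JUSTIFY;.SK
^IN THIS SKETCH THE RATIO WOULD BE COMPARED WITH THE ^F-DISTRIBUTION ON
THE RETURNED NUMERATOR AND DENOMINATOR DEGREES OF FREEDOM; A LARGE VALUE
INDICATES THAT THE INCREASE IN RESIDUAL SUM OF SQUARES IS SIGNIFICANT.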
^IN EFFECT THIS GIVES THE POSSIBILITY OF TESTING A HYPOTHESIS CONCERNING
A LINEAR COMBINATION OF THE PARAMETERS FROM A MODEL (CF.#SECTION
1.3.2.5), FOR INSTANCE (CF.# * [CORRELATION ELEMENT],
.BR;.I9;S, S * (T, T * [ESTIMATE],
.BR;.I17;Q, _ * [COVARIANCE ELEMENT],
.BR;.I17;R, R * (6 * [RESIDUAL ELEMENT]) );
^FOR THE ORIGINAL MODEL THE FOLLOWING RELATIONS HOLD:#Q#=#T, T#=#M-1,
R#=#N AND P#=#M (OR P#=#M-1 IF REPLICATIONS AND/OR WEIGHTS ARE
SPECIFIED); S IS THE NUMBER OF PROCESSED (SUB)MODELS;#FOR EACH SUBMODEL
T AND Q ARE DECREASED BY THE NUMBER OF TERMS THAT ARE OMITTED FROM THE
ORIGINAL MODEL.
^REAL NUMBERS IN THE PRINTED OUTPUT ARE GIVEN IN FIXED POINT FORMAT WITH
A SIX DECIMAL FRACTIONAL PART.
^THE ONLY EXCEPTIONS ARE THE ESTIMATES FOR THE REGRESSION PARAMETERS
WITH THEIR STANDARD DEVIATIONS, WHICH HAVE A TEN DECIMAL FRACTIONAL
PART, AND THE NUMBERS IN THE LISTINGS OF THE INPUT DATA AND THE
TRANSFORMED DATA MATRIX, WHICH HAVE A THREE DECIMAL FRACTIONAL PART.
^REAL NUMBERS IN THE DATA OUTPUT ARE GIVEN IN FLOATING POINT FORMAT WITH
A SIXTEEN DECIMAL MANTISSA AND A TWO DECIMAL EXPONENT PART.
.SK2;3.3##^^ERROR MESSAGES\\
.SK;^ERROR MESSAGES FOR VIOLATIONS OF SYNTAX OR SEMANTICS HAVE THE
FOLLOWING LAYOUT:
.SK;.C;^ERROR#:#_ OR _
.SK;^THE ERROR TEXTS CORRESPONDING TO THE ERROR NUMBERS ARE:
.SK;.NO FILL;.NO JUSTIFY;.TAB STOPS 5;
1 ^NO INPUT DATA GIVEN.
2 ^ALL INPUT DATA HAS BEEN SKIPPED.
3 ^ATTEMPT TO PROCESS MORE INPUT DATA THAN PROVIDED.
4 ^NUMBER IN THE INPUT DATA IS INCORRECT OR TOO LARGE.
5 ^IN A NUMBER '.' IS NOT FOLLOWED BY A DIGIT.
6 ^IN A NUMBER '_#' IS NOT FOLLOWED BY '+', '-' OR A DIGIT.
.SK
10 ^NO MODEL FORMULA GIVEN.
11 ^LEFT HAND PART IS NOT FOLLOWED BY '='.
12 ^EXPRESSION IS NOT FOLLOWED BY ')'.
13 ^OPTION NAME USED IN A PRIMARY IN AN EXPRESSION.
14 ^INCORRECT PRIMARY IN A FACTOR IN AN EXPRESSION.
15 ^INCORRECT (CONTROL) IDENTIFIER IN AN EXPRESSION.
16 ^PARAMETER LIST OF A STANDARD FUNCTION IS NOT FOLLOWED BY ')'.
17 ^STANDARD FUNCTION CALL WITH INCORRECT NUMBER OF PARAMETERS.
.SK
20 ^NO INPUT FORMULA GIVEN.
21 ^EXPRESSION IN A CONTROL IS NOT FOLLOWED BY '_>'.
22 ^OPTION NAME USED IN A CONTROL IN AN INPUT STATEMENT.
23 ^INPUT STATEMENT IN A DESCRIPTION IS NOT FOLLOWED BY ')'.
24 ^VARIABLE LIST IN A DESCRIPTION IS NOT FOLLOWED BY ']'.
25 ^INCORRECT DESCRIPTION IN AN INPUT STATEMENT.
26 ^INCORRECT IDENTIFIER IN A VARIABLE LIST.
27 ^ITEM IN A VARIABLE LIST IS NOT AN IDENTIFIER.
.SK
30 ^INCORRECT OPTION NUMBER IN AN OPTION STATEMENT.
31 ^INCORRECT OPTION NAME IN AN OPTION STATEMENT.
32 ^SPECIFIER LIST IS NOT FOLLOWED BY ')'.
33 ^NUMBER IN A SPECIFIER LIST IS INCORRECT OR TOO LARGE.
34 ^SPECIFIER LIST IS APPENDED TO INCORRECT OPTION.
35 ^SPECIFIER IS NOT A NUMBER.
36 ^SPECIFICATION IS NOT PROPERLY CONTINUED.
37 ^SPECIFICATION IS NOT TERMINATED WITH ';'.
.SK
40 ^NO DEFINED (INDEPENDENT) IDENTIFIER TO THE RIGHT OF '='.
41 ^INCORRECT USE OF A PARAMETER IN A REGRESSION TERM.
42 ^UNDEFINED (WEIGHT) IDENTIFIER TO THE LEFT OF '='.
43 ^UNDEFINED (DEPENDENT) IDENTIFIER TO THE LEFT OF '='.
44 ^NUMBER IN A REGRESSION TERM IS INCORRECT OR TOO LARGE.
45 ^TERM DOES NOT HAVE THE FORM: PARAM * FACTOR OR FACTOR * PARAM.
46 ^UNDEFINED (INDEPENDENT) IDENTIFIER IN A REGRESSION TERM.
47 ^NO REGRESSION PARAMETER IN A REGRESSION TERM.
.SK
50 ^DIVISION BY ZERO.
51 ^INTEGER DIVISION BY ZERO.
52 ^OBSERVATION FOR DEPENDENT VARIABLE IS IN ABSOLUTE VALUE TOO LARGE.
53 ^OBSERVATION FOR INDEPENDENT VARIABLE IS IN ABSOLUTE VALUE TOO LARGE.
54 ^EXPONENTIATION WITH ZERO BASE AND NON POSITIVE EXPONENT.
55 ^EXPONENTIATION WITH NEGATIVE BASE AND REAL EXPONENT.
56 ^WEIGHT FACTOR IS NOT POSITIVE.
.SK
60 ^ARGUMENT OF '^SQRT' IS NEGATIVE.
61 ^ARGUMENT OF '^LN' IS NOT POSITIVE.
62 ^ARGUMENT OF '^LOG' IS NOT POSITIVE.
63 ^ARGUMENT OF '^EXP' IS TOO LARGE.
64 ^ARGUMENT OF '^ARCSIN' IS IN ABSOLUTE VALUE LARGER THAN ONE.
65 ^ARGUMENT OF '^ARCCOS' IS IN ABSOLUTE VALUE LARGER THAN ONE.
66 ^ARGUMENT OF '^SINH' IS IN ABSOLUTE VALUE TOO LARGE.
67 ^ARGUMENT OF '^COSH' IS IN ABSOLUTE VALUE TOO LARGE.
.SK
70 ^NUMBER OF OBSERVATIONS FOR THE FIRST DEPENDENT VARIABLE IS ZERO.
71 ^NUMBERS OF OBSERVATIONS FOR THE DEPENDENT VARIABLES ARE NOT EQUAL.
72 ^NUMBER OF OBSERVATIONS FOR THE FIRST INDEPENDENT VARIABLE IS ZERO.
73 ^NUMBERS OF OBSERVATIONS FOR THE INDEPENDENT VARIABLES ARE NOT EQUAL.
74 ^CONTROL READS AN INCORRECT NUMBER IN THE INPUT DATA.
75 ^NUMBERS OF REPLICATIONS FOR THE DEPENDENT VARIABLES ARE NOT EQUAL.
76 ^GIVEN, READ OR COMPUTED REPLICATION FACTOR IS NOT INTEGRAL.
.FILL;.JUSTIFY;.SK
^IF THE ERROR NUMBER LIES BETWEEN:
.BR;.LM+11;.I-11;#5#AND#37, IT IS FOLLOWED BY THE MOST RECENTLY
PROCESSED IDENTIFIER, NUMBER AND SYMBOL.
^ONLY THE FIRST EIGHT CHARACTERS OF EACH NAME ARE DISPLAYED.
.BR;.I-11;41#AND#47, IT IS FOLLOWED BY THE NUMBER OF THE RIGHT HAND PART
REGRESSION TERM WHICH CAUSES THE ERROR, OR A ZERO IF THE LEFT HAND PART
IS AT FAULT.
.BR;.I-11;50#AND#67, IT IS FOLLOWED BY THE WRONG VALUE AND THE NUMBER OF
THE LINE IN THE TRANSFORMED DATA MATRIX WHICH CAUSES THE ERROR.
^INSTEAD OF THE WRONG VALUE, THE NUMBER OF THE RIGHT HAND PART
REGRESSION TERM WHICH CAUSES THE ERROR IS DISPLAYED WHEN THE ERROR
NUMBER LIES BETWEEN 50 AND 53.
.BR;.I-11;70#AND#76, IT IS FOLLOWED BY THE CHECK VALUE AND THE WRONG
VALUE.
^INSTEAD OF THE WRONG VALUE, THE VALUE OF THE CONTROLLING VARIABLE OF
THE NEXT ENCLOSING REPETITION LOOP IS DISPLAYED WHEN THE ERROR NUMBER IS
76.
.LM-11;.SK2;3.4##, {, }, [, ], ( AND ).
^THE#,#AND#.#ARE PART OF THE METALANGUAGE ^ENGLISH IN WHICH WE ARE
DESCRIBING
::= ['+'#|#'-'] _ { ('+'#|#'-') _ }
^THE METASYMBOLS _< AND _> ARE USED AS DELIMITERS TO ENCLOSE THE NAME OF
A CLASS.
^THE METASYMBOL#::=#MAY BE READ AS 'IS DEFINED AS' OR AS 'CONSISTS OF'.
^THE METASYMBOL#|#IS READ AS 'OR'.
^REPETITION IS DENOTED BY CURLY BRACKETS, I.E.#{#A#} STANDS FOR
E#|#A#|#AA#|#...
^OPTIONALITY IS EXPRESSED BY SQUARE BRACKETS, I.E. [#A#] STANDS FOR
E#|#A.
^PARENTHESES MERELY SERVE FOR GROUPING (FACTORIZATION), I.E. (A#|#B)#C
STANDS FOR AC#|#BC.
^TERMINAL SYMBOLS APPEAR ENCLOSED IN SINGLE APOSTROPHES.
^THE ABOVE PHRASE DEFINES AN EXPRESSION AS A TERM, OPTIONALLY PRECEDED
BY A '+' OR A '-' AND FOLLOWED BY AN ARBITRARY REPETITION OF TERMS, EACH
PRECEDED BY A '+' OR A '-'.
.SK;^THE SYNTAX OF A USER PROGRAM CAN THUS BE DEFINED AS FOLLOWS:
.PAGE;.NO FILL;.NO JUSTIFY;
.BR;_::= ^^'A'|'B'|'C'|'D'|'E'|'F'|'G'|'H'|'I'|'J'|'K'|'L'|'M'|
.BR;.I12;'N'|'O'|'P'|'Q'|'R'|'S'|'T'|'U'|'V'|'W'|'X'|'Y'|'Z'|\\
.BR;.I12;'A'|'B'|'C'|'D'|'E'|'F'|'G'|'H'|'I'|'J'|'K'|'L'|'M'|
.BR;.I12;'N'|'O'|'P'|'Q'|'R'|'S'|'T'|'U'|'V'|'W'|'X'|'Y'|'Z'
.SK;_::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
.SK;_::= '"^MODEL"' | '"::= '"^INPUT"' | '"::= '"^OPTIONS"' | '"::= '"^DATA"' | '"::= '"^RUN"' | '"::= '"^EXIT"' | '"::= '^ABS' | '^SIGN' | '^SQRT' | '^SIN' | '^COS' | '^TAN' |
.BR;.I19;'^LN' | '^LOG' | '^EXP' | '^ENTIER' | '^ROUND' | '^MOD' |
.BR;.I19;'^MIN' | '^MAX' | '^ARCSIN' | '^ARCCOS' | '^ARCTAN' |
.BR;.I19;'^SINH' | '^COSH' | '^TANH' | '^INDICATOR'
.SK;_