.LM0;.RM75;.TS72;.LC;.AP;.FLAG CAPITAL;.NO PAGING;.NO NUMBER;#
.BR;^MULTIPLE ^LINEAR ^REGRESSION ^ANALYSIS
.SK;^^THE REGRESSION MODEL\\
.BR;^IN A REGRESSION PROBLEM THE RESEARCHER POSTULATES A CERTAIN
RELATIONSHIP BETWEEN A RANDOM VARIABLE Y (THE REALIZATIONS OF WHICH ARE
SUBJECT TO SOME FORM OF DISTURBANCE) ON THE ONE SIDE AND A NUMBER OF
VARIABLES X1,...,XP (WHICH ARE FREE, OR NEARLY FREE, OF DISTURBANCES)
ON THE OTHER SIDE.
^THIS RELATIONSHIP IS EXPRESSED BY A MATHEMATICAL FORMULA, WHICH IS
CALLED THE (LINEAR) REGRESSION MODEL, FOR INSTANCE:
.TS72;.SK;.I18;Y = A0 + A1 * X1 +#...#+ AP * XP + E (1)
.SK;IN WHICH A0,...,AP REPRESENT UNKNOWN REGRESSION COEFFICIENTS
(PARAMETERS) WHICH ARE TO BE ESTIMATED AND E REPRESENTS THE DISTURBANCE.
^IF A CONSTANT TERM IS PRESENT IN THE MODEL FORMULA (IN (1) THE A0),
THE MODEL IS SAID TO BE AN 'INTERCEPT#MODEL'; IF NO CONSTANT TERM IS
PRESENT, THE MODEL IS CALLED A 'NO-INTERCEPT#MODEL'.
.BR;^THE VARIABLES X1,...,XP AND THE VARIABLE Y CAN ALSO REPRESENT
TRANSFORMATIONS OF OTHER VARIABLES.
^THE RESEARCHER MIGHT HAVE REASONS TO BELIEVE (FROM BACKGROUND
INFORMATION CONCERNING THE EXPERIMENT) THAT TRANSFORMATIONS ARE
NECESSARY, FOR INSTANCE:
.BR;1)#TO OBTAIN NORMALLY DISTRIBUTED DISTURBANCES,
.BR;2)#TO OBTAIN A GREATER HOMOGENEITY OF THE VARIANCES OF THE DISTURBANCES,
.BR;3)#TO LINEARIZE NONLINEAR REGRESSION MODELS (IF POSSIBLE).
.BR;^THE TRANSFORMED REGRESSION MODEL CAN BE WRITTEN AS:
.SK;.I5;^G(Y) = A0 + A1 * ^F1(X1,...,XM) +#...#+ AP * ^FP(X1,...,XM) + E (2)
.SK;IN WHICH ^G, ^F1,...,^FP REPRESENT THE TRANSFORMATIONS,
.BR;.I12;A0,...,AP REPRESENT THE PARAMETERS TO BE ESTIMATED,
.BR;.I20;Y REPRESENTS THE DEPENDENT VARIABLE,
.BR;.I12;X1,...,XM REPRESENT THE INDEPENDENT VARIABLES,
.BR;.I20;E REPRESENTS THE DISTURBANCE.
.SK;^CHOOSING A TRANSFORMATION BY 'TRIAL AND ERROR' IS RATHER
TIME-CONSUMING AND COSTLY.
^MUCH OF THE DIFFICULTY LIES IN THE LOCATION PARAMETER OF THE
TRANSFORMATION.
^IT IS NOT UNUSUAL THAT ^LOG#(X) YIELDS NO IMPROVEMENT, BUT THAT
^LOG#(C+X) GIVES BETTER RESULTS FOR A PARTICULAR CHOICE OF C.
^BECAUSE THIS HOLDS FOR ALMOST ANY TRANSFORMATION OF SOME IMPORTANCE,
WE MUST IN EFFECT SOLVE A NONLINEAR FITTING PROBLEM IN EACH CASE.
^OFTEN, HOWEVER, A SIMPLE FORM OF THE TRANSFORMATION IS SUGGESTED BY
THE RESEARCHER, WHO IS BETTER ACQUAINTED WITH THE PECULIARITIES OF THE
EXPERIMENT.
.SK2;^^LEAST SQUARES\\
.BR;^REGRESSION ANALYSIS AMOUNTS TO FITTING A HYPERPLANE OF THE
REQUIRED DIMENSION TO THE DATA.
^THE FITTING IS DONE WITH THE METHOD OF LEAST SQUARES, WHICH MEANS THAT
THE SUM OF THE SQUARES OF THE DIFFERENCES BETWEEN THE OBSERVED VALUES
OF Y AND THE ESTIMATED VALUES FOR THE EXPECTATION OF Y IS MINIMIZED.
^THIS SUM OF SQUARES IS ALSO CALLED THE RESIDUAL SUM OF SQUARES.
.BR;^IN MATRIX NOTATION THE REGRESSION MODEL CAN BE WRITTEN AS:
.SK;.I30;^Y = ^XA + E (3)
.SK;IN WHICH ^Y IS A (N*1) RANDOM VECTOR OF OBSERVATIONS,
.BR;.I9;^X IS A (N*P) MATRIX OF KNOWN (FIXED) VALUES,
.BR;.I9;A IS A (P*1) VECTOR OF (UNKNOWN) PARAMETERS,
.BR;.I5;AND E IS A (N*1) RANDOM VECTOR OF DISTURBANCES.
.SK;^IT IS ASSUMED THAT ^E(E)#=#0 AND VAR(E)#=#^ISIGMA_^2, IN WHICH ^I
IS THE UNIT MATRIX, SO THAT:
.SK;.I31;^E(^Y) = ^XA (4)
.SK;^THE SUM OF SQUARES OF THE DIFFERENCES BETWEEN THE OBSERVED VALUES
OF ^Y AND THE ESTIMATED VALUES FOR THE EXPECTATION OF ^Y THUS EQUALS:
.SK;.I17;(^Y-^XA)'(^Y-^XA) =