Multiple Linear Regression Analysis

EXAMPLES

**********************************
* Example 1 originates from:     *
* DE JONGE [4], pp. 472 & 479.   *
**********************************

"Model"   y = c * Log (x) + a + b * x;
"Input"   5 * ([x], 10 * [y]);
"Options" Transformed data matrix, Correlation matrix, Residual analysis, Process submodels (1, 2);


Transformed data matrix
=======================

obs.no.           c           a           b    dep.var.     repeats

   1          1.398       1.000      25.000       0.790      10.000
   2          1.699       1.000      50.000       0.984      10.000
   3          1.903       1.000      80.000       1.058      10.000
   4          2.114       1.000     130.000       1.163      10.000
   5          2.255       1.000     180.000       1.209      10.000


Control information
===================

transformed variable
denoted by parameter            mean    standard deviation       minimum       maximum

c                           1.873843              0.306746      1.397940      2.255273
a                           1.000000              0.000000      1.000000      1.000000
b                          93.000000             56.387870     25.000000    180.000000
dep.var.                    1.040800              0.163655      0.670000      1.330000

Number of observations : 5


Correlation matrix of the variables
===================================

                       c           a           b    dep.var.

c               1.000000
a                      *    1.000000
b               0.962417           *    1.000000
dep.var.        0.907742           *    0.849838    1.000000


Multiple correlation coefficient        0.911959    (adjusted   0.908023)
================================

Proportion of variation explained       0.831669    (adjusted   0.824506)
=================================

Standard deviation of the error term    0.068558
====================================


Regression parameters
=====================
                                                                      right tail
parameter           estimate    standard deviation       F - ratio   probability

c               0.6499168512          0.1175695440       30.558070      0.000001
a              -0.0899819314          0.1641470240        0.300500      0.586163
b              -0.0009361326          0.0006395700        2.142390      0.149935


Correlation matrix of the estimates
===================================

                       c           a           b

c               1.000000
a              -0.993392    1.000000
b              -0.962417    0.929333    1.000000


Analysis of variance
====================

source of                                                                 right tail
variation       df    sum of squares     mean square         F - ratio   probability
------------------------------------------------------------------------------------
total           50         55.475600
------------------------------------------------------------------------------------
mean             1         54.163232       54.163232      11523.444701      0.000000
regression       2          1.091456        0.545728        116.105776      0.000000
residual        47          0.220912        0.004700
------------------------------------------------------------------------------------
lack of fit      2          0.005012        0.002506          0.522336      0.596686
pure error      45          0.215900        0.004798
------------------------------------------------------------------------------------

regression null hypothesis : c = b = 0


Residual analysis
=================
                                                                        standardized   studentized
obs.no.   observation   fitted value   standard deviation    residual       residual      residual

   1         0.790000       0.795160             0.020789   -0.005160      -0.118992     -0.078976
   2         0.984000       0.967401             0.013590    0.016599       0.382824      0.247021
   3         1.058000       1.071978             0.015165   -0.013978      -0.322363     -0.209059
   4         1.163000       1.162208             0.012847    0.000792       0.018260      0.011757
   5         1.209000       1.207254             0.019954    0.001746       0.040272      0.026622

sum of residuals : 0.000000

Upper bound for the right tail probability of the
largest absolute studentized residual (no. 2) : 1.000000


Control information - submodel 1
===================

transformed variable
denoted by parameter            mean    standard deviation       minimum       maximum

b                        omitted
c                           1.873843              0.306746      1.397940      2.255273
a                           1.000000              0.000000      1.000000      1.000000
dep.var.                    1.040800              0.163655      0.670000      1.330000

Number of observations : 5


Multiple correlation coefficient        0.907742    (adjusted   0.905720)
================================

Proportion of variation explained       0.823996    (adjusted   0.820329)
=================================

Standard deviation of the error term    0.069370
====================================


Regression parameters
=====================
                                                                      right tail
parameter           estimate    standard deviation       F - ratio   probability

c               0.4842988398          0.0323066319      224.720913      0.000000
a               0.1332999205          0.0613273100        4.724458      0.034701


Correlation matrix of the estimates
===================================

                       c           a

c               1.000000
a              -0.987122    1.000000


Analysis of variance
====================

source of                                                                 right tail
variation       df    sum of squares     mean square         F - ratio   probability
------------------------------------------------------------------------------------
total           50         55.475600
------------------------------------------------------------------------------------
mean             1         54.163232       54.163232      11255.564569      0.000000
regression       1          1.081386        1.081386        224.720852      0.000000
residual        48          0.230982        0.004812
------------------------------------------------------------------------------------
lack of fit      3          0.015082        0.005027          1.047838      0.380681
pure error      45          0.215900        0.004798
------------------------------------------------------------------------------------
reduction        1          0.010070        0.010070          2.142390      0.149935
------------------------------------------------------------------------------------

regression null hypothesis : c = 0 (in the reduced model)
reduction null hypothesis  : b = 0 (in the original model)


Residual analysis
=================
                                                                        standardized   studentized
obs.no.   observation   fitted value   standard deviation    residual       residual      residual

   1         0.790000       0.810321             0.018238   -0.020321      -0.378175     -0.303615
   2         0.984000       0.956109             0.011321    0.027891       0.519060      0.407526
   3         1.058000       1.054964             0.009856    0.003036       0.056497      0.044211
   4         1.163000       1.157080             0.012506    0.005920       0.110169      0.086758
   5         1.209000       1.225526             0.015751   -0.016526      -0.307551     -0.244617

sum of residuals : -0.000000

Upper bound for the right tail probability of the
largest absolute studentized residual (no. 2) : 1.000000


Control information - submodel 2
===================

transformed variable
denoted by parameter            mean    standard deviation       minimum       maximum

a                        omitted
b                        omitted
c                           1.873843              0.306746      1.397940      2.255273
dep.var.                    1.040800              0.163655      0.670000      1.330000

Number of observations : 5

There is no constant independent variable in the transformed (sub)model (message)


Multiple correlation coefficient        0.997711    (adjusted   0.997664)
================================

Proportion of variation explained       0.995427    (adjusted   0.995333)
=================================

Standard deviation of the error term    0.071958
====================================


Regression parameters
=====================
                                                                      right tail
parameter           estimate    standard deviation       F - ratio   probability

c               0.5536156656          0.0053607978    10664.926934      0.000000


Correlation matrix of the estimates
===================================

                       c

c               1.000000


Analysis of variance
====================

source of                                                                 right tail
variation       df    sum of squares     mean square         F - ratio   probability
------------------------------------------------------------------------------------
total           50         55.475600
------------------------------------------------------------------------------------
regression       1         55.221883       55.221883      10664.926934      0.000000
residual        49          0.253717        0.005178
------------------------------------------------------------------------------------
lack of fit      4          0.037817        0.009454          1.970525      0.115263
pure error      45          0.215900        0.004798
------------------------------------------------------------------------------------
reduction        2          0.032804        0.016402          3.489644      0.038633
------------------------------------------------------------------------------------

regression null hypothesis : c = 0 (in the reduced model)
reduction null hypothesis  : a = b = 0 (in the original model)


Residual analysis
=================
                                                                        standardized   studentized
obs.no.   observation   fitted value   standard deviation    residual       residual      residual

   1         0.790000       0.773921             0.007494    0.016079       0.249818      0.224666
   2         0.984000       0.940576             0.009108    0.043424       0.674690      0.608353
   3         1.058000       1.053580             0.010202    0.004420       0.068669      0.062046
   4         1.163000       1.170312             0.011332   -0.007312      -0.113612     -0.102902
   5         1.209000       1.248554             0.012090   -0.039554      -0.614569     -0.557614

sum of residuals : 0.170553

Upper bound for the right tail probability of the
largest absolute studentized residual (no. 2) : 1.000000

End of job : 1


**********************************
* Example 2 originates from:     *
* SEARLE [11], pp. 121-123       *
**********************************

"Model 1" y = a3 * x3 + a2 * x2 + a1 * x1;
"Input"   5 * [y, x1, x2, x3];
"Options" Save original model, Process submodels (1);


Control information
===================

transformed variable
denoted by parameter            mean    standard deviation       minimum       maximum

a3                          3.400000              1.949359      1.000000      6.000000
a2                          1.000000              2.549510     -3.000000      4.000000
a1                          1.000000              1.224745     -1.000000      2.000000
dep.var.                    9.000000              2.236068      6.000000     12.000000

Number of observations : 5

There is no constant independent variable in the transformed (sub)model (message)


Multiple correlation coefficient        0.936662    (adjusted   0.832670)
================================

Proportion of variation explained       0.877336    (adjusted   0.693339)
=================================

Standard deviation of the error term    5.105507
====================================


Regression parameters
=====================
                                                                      right tail
parameter           estimate    standard deviation       F - ratio   probability

a3              2.5446171560          0.9982125895        6.498286      0.125553
a2              0.2665515256          1.0423373169        0.065395      0.822061
a1             -1.3851468048          2.3646149361        0.343140      0.617320


Analysis of variance
====================

source of                                                                 right tail
variation       df    sum of squares     mean square         F - ratio   probability
------------------------------------------------------------------------------------
total            5        425.000000
------------------------------------------------------------------------------------
regression       3        372.867588      124.289196          4.768212      0.178233
residual         2         52.132412       26.066206
------------------------------------------------------------------------------------

regression null hypothesis : a3 = a2 = a1 = 0


Control information - submodel 1
===================

transformed variable
denoted by parameter            mean    standard deviation       minimum       maximum

a1                       omitted
a3                          3.400000              1.949359      1.000000      6.000000
a2                          1.000000              2.549510     -3.000000      4.000000
dep.var.                    9.000000              2.236068      6.000000     12.000000

Number of observations : 5

There is no constant independent variable in the transformed (sub)model (message)


Multiple correlation coefficient        0.925359    (adjusted   0.872057)
================================

Proportion of variation explained       0.856290    (adjusted   0.760483)
=================================

Standard deviation of the error term    4.512086
====================================


Regression parameters
=====================
                                                                      right tail
parameter           estimate    standard deviation       F - ratio   probability

a3              2.1052066559          0.5820385900       13.082354      0.036325
a2              0.4159957059          0.8931663715        0.216927      0.673122


Analysis of variance
====================

source of                                                                 right tail
variation       df    sum of squares     mean square         F - ratio   probability
------------------------------------------------------------------------------------
total            5        425.000000
------------------------------------------------------------------------------------
regression       2        363.923242      181.961621          8.937686      0.054479
residual         3         61.076758       20.358919
------------------------------------------------------------------------------------
reduction        1          8.944346        8.944346          0.343140      0.617320
------------------------------------------------------------------------------------

regression null hypothesis : a3 = a2 = 0 (in the reduced model)
reduction null hypothesis  : a1 = 0 (in the original model)

End of job : 2


"Model 2" y - 4 * x1 = b2 * (x1 + x2) + b3 * x3;    (eqn. 118, p. 121)
"Input"   5 * [y, x1, x2, x3];
"Options" Test reduced model, Transformed data matrix;


Transformed data matrix
=======================

obs.no.          b2          b3    dep.var.

   1          3.000       4.000       0.000
   2          1.000       1.000      14.000
   3         -2.000       4.000       5.000
   4          3.000       2.000      -2.000
   5          5.000       6.000       8.000


Control information
===================

transformed variable
denoted by parameter            mean    standard deviation       minimum       maximum

b2                          2.000000              2.645751     -2.000000      5.000000
b3                          3.400000              1.949359      1.000000      6.000000
dep.var.                    5.000000              6.403124     -2.000000     14.000000

Number of observations : 5

There is no constant independent variable in the transformed (sub)model (message)


Proportion of variation explained       0.293057    (adjusted  -0.178239)
=================================

Standard deviation of the error term    8.252406
====================================


Regression parameters
=====================
                                                                      right tail
parameter           estimate    standard deviation       F - ratio   probability

b2             -0.2325836533          1.6513864104        0.019836      0.896920
b3              1.1991223258          1.3390842284        0.801883      0.436516


Analysis of variance
====================

source of                                                                 right tail
variation       df    sum of squares     mean square         F - ratio   probability
------------------------------------------------------------------------------------
total            5        289.000000
------------------------------------------------------------------------------------
regression       2         84.693363       42.346681          0.621811      0.594397
residual         3        204.306637       68.102212
------------------------------------------------------------------------------------
reduction        1        152.174225      152.174225          5.837989      0.136963
------------------------------------------------------------------------------------

regression null hypothesis : b2 = b3 = 0

End of job : 3


****************************************
* Example 3 originates from:           *
* AFIFI & AZEN [1], pp. 88 & 93-100.   *
****************************************

"Model"  y = alfa0 + alfa1 * x;
"Input"  5 * ([x], n, n * [y]);
"Option" Transformed data matrix, Print input data;
"Data"
     1.000     4.000     1.100     0.700     1.800     0.400
     3.000     5.000     3.000     1.400     4.900     4.400     4.500
     5.000     3.000     7.300     8.200     6.200
    10.000     4.000    12.000    13.100    12.600    13.200
    15.000     4.000    18.700    19.700    17.400    17.100


Transformed data matrix
=======================

obs.no.       alfa0       alfa1    dep.var.     repeats

   1          1.000       1.000       1.000       4.000
   2          1.000       3.000       3.640       5.000
   3          1.000       5.000       7.233       3.000
   4          1.000      10.000      12.725       4.000
   5          1.000      15.000      18.225       4.000


Control information
===================

transformed variable
denoted by parameter            mean    standard deviation       minimum       maximum

alfa0                       1.000000              0.000000      1.000000      1.000000
alfa1                       6.700000              5.262579      1.000000     15.000000
dep.var.                    8.385000              6.545571      0.400000     19.700000

Number of observations : 5


Multiple correlation coefficient        0.987051    (adjusted   0.986326)
================================

Proportion of variation explained       0.974269    (adjusted   0.972840)
=================================

Standard deviation of the error term    1.078736
====================================


Regression parameters
=====================
                                                                      right tail
parameter           estimate    standard deviation       F - ratio   probability

alfa0           0.1594830832          0.3968072487        0.161536      0.692478
alfa1           1.2276890919          0.0470261690      681.549798      0.000000


Analysis of variance
====================

source of                                                                 right tail
variation       df    sum of squares     mean square         F - ratio   probability
------------------------------------------------------------------------------------
total           20       2220.210000
------------------------------------------------------------------------------------
mean             1       1406.164500     1406.164500       1208.387113      0.000000
regression       1        793.099430      793.099430        681.549798      0.000000
residual        18         20.946070        1.163671
------------------------------------------------------------------------------------
lack of fit      3          4.252403        1.417468          1.273658      0.319196
pure error      15         16.693667        1.112911
------------------------------------------------------------------------------------

regression null hypothesis : alfa1 = 0

End of job : 4


*** Marten van Gelderen; Mathematisch Centrum ***
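
The figures of Example 3 can be cross-checked independently, since its "Data" block is printed in full. The following is a minimal sketch in plain Python, not the program that produced the listing: the names DATA, fit_line and lack_of_fit are illustrative assumptions, and the closed-form formulas used are the ordinary least squares ones for a straight line with replicated observations.

```python
# Cross-check of Example 3 (y = alfa0 + alfa1 * x) from its "Data" block.
# Each entry is (x, [replicate y values]); the replicate count n is len(list).
DATA = [
    (1.0, [1.1, 0.7, 1.8, 0.4]),
    (3.0, [3.0, 1.4, 4.9, 4.4, 4.5]),
    (5.0, [7.3, 8.2, 6.2]),
    (10.0, [12.0, 13.1, 12.6, 13.2]),
    (15.0, [18.7, 19.7, 17.4, 17.1]),
]


def fit_line(data):
    """Ordinary least squares estimates (alfa0, alfa1) over all replicates."""
    xs = [x for x, group in data for _ in group]
    ys = [y for _, group in data for y in group]
    n = len(ys)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alfa1 = sxy / sxx
    alfa0 = mean_y - alfa1 * mean_x
    return alfa0, alfa1


def lack_of_fit(data, alfa0, alfa1):
    """Split the residual sum of squares into lack of fit and pure error."""
    residual_ss = sum(
        (y - (alfa0 + alfa1 * x)) ** 2 for x, group in data for y in group
    )
    pure_error_ss = 0.0
    for _, group in data:
        group_mean = sum(group) / len(group)
        pure_error_ss += sum((y - group_mean) ** 2 for y in group)
    lack_of_fit_ss = residual_ss - pure_error_ss
    n_total = sum(len(group) for _, group in data)
    df_pure = n_total - len(data)   # replicates minus distinct x values
    df_lack = len(data) - 2         # distinct x values minus parameters
    f_ratio = (lack_of_fit_ss / df_lack) / (pure_error_ss / df_pure)
    return residual_ss, lack_of_fit_ss, pure_error_ss, f_ratio


alfa0, alfa1 = fit_line(DATA)
residual_ss, lack_of_fit_ss, pure_error_ss, f_ratio = lack_of_fit(DATA, alfa0, alfa1)
print("alfa0 = %.10f  alfa1 = %.10f" % (alfa0, alfa1))
print("residual = %.6f  lack of fit = %.6f  pure error = %.6f  F = %.6f"
      % (residual_ss, lack_of_fit_ss, pure_error_ss, f_ratio))
```

The printed values should agree with the "Regression parameters" and "Analysis of variance" tables of Example 3 up to rounding. The right tail probabilities are not reproduced here, because they require the F distribution, which the Python standard library does not provide.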