.PAPER SIZE 60,80
.LEFT MARGIN 4
.RIGHT MARGIN 70
.SPACING 1
.LITERAL
.END LITERAL
.CENTER ;WELCOME
.CENTER ;TO
.CENTER ;SAMSTAT
.BLANK 4
.CENTER ;IF YOU HAVE ANY QUESTIONS, CALL
.CENTER ;DR. PHELPS CRUMP OR DON COSGROVE
.CENTER ; AT 2818
.PAGE
.CENTER ;SAMSTAT CONTENTS
.BLANK 4
.LITERAL
     ANALYSIS                          CALL CODE

     EDIT/PRINT DATA                   EDIT
     TRANSFORMATION OF DATA            TRANS
     ELEMENTARY STATISTICS             ELEM
     CROSS TABULATION                  CROSS
     CORRELATION MATRIX                CORR
     RANK CORRELATION                  RANK
     CHI-SQUARED                       CHI
     T-TEST                            T
     ANALYSIS OF VARIANCE              ANOVA
       (one-way and two-way)
     LINEAR REGRESSION                 REG
     STEPWISE/MULTIPLE REGRESSION      STEP
     SCATTER PLOT                      SCAT
     HISTOGRAM                         HIST
     END OF SESSION                    EXIT
.END LITERAL
.PAGE
.CENTER ; General Steps in Using SAMSTAT
.LITERAL

1. Log on:
        ----------------------------
        |>HEL                      |
        |  Account or Name: Fuchs  |
        |  Password:  - - -        |
        ----------------------------
.END LITERAL
.BLANK 3
2. Invoke SAMSTAT
.LITERAL
        ----------
        |        |
        |  >SST  |
        |        |
        ----------
   This command gets you directly into SAMSTAT

3. Follow the Conversation Mode of SAMSTAT responding to
   the question with a Y (yes) or N (no), or select the
   desired option code, followed by the Return key.

4. LOG OFF:
        ---------
        | >BYE  |
        |       |
        ---------
.END LITERAL
.PAGE
.CENTER ;DATA INPUT TO SAMSTAT
.BLANK 3
When the user enters SAMSTAT, the program will ask for the input and
output files. These files are areas of disk storage. The input file
will be the source of the data used in the analyses. The output file
will be used to save the data. The user should enter the two file
names separated by a comma:
.LITERAL
        --------------------------
        |                        |
        | INPUT.SST , OUTPUT.SST |
        |                        |
        --------------------------
.END LITERAL
.BLANK 2
If new data is to be read from the keyboard instead of from an
existing file, an * is used for the input file:
.LITERAL
        -------------------
        |                 |
        | * , OUTPUT.SST  |
        |                 |
        -------------------
.END LITERAL
File names must start with an alphabetic (A-Z) character and may
contain numeric (0-9) characters (no special characters).
The "name" part of the file is limited to six characters and should
be followed by
.LITERAL
        -------------
        |           |
        |   .SST    |
        |           |
        -------------
.END LITERAL
.PAGE
.CENTER ;DATA ENTRY
.BLANK
.BLANK 3
Data is stored in matrix form. The user specifies the number of rows
and columns in the matrix.
.BLANK 2
The maximum size of the matrix is 250 rows and 15 columns.
.BLANK 2
The data is typed one row at a time, with the numbers separated by
commas.
.BLANK 2
If any observations are missing, an "X" should be typed for the
missing data value. SAMSTAT does not currently estimate missing data,
but will recognize and delete the missing values in the computations.
Where appropriate, the entire row with missing data will be excluded
from the analyses.
.PAGE
.CENTER ;LEGAL NUMBERS
.BLANK 4
Integers, Decimal Numbers, Negative Numbers
.blank 3
No Commas (for example 1,000)
.blank 3
.LITERAL
Scientific Notation      12300 = 1.23E4

The number following the E is the exponent of 10

1.23E4 = 1.23 x 10 ** 4 = 1.23 x 10000 = 12300
.END LITERAL
.blank 3
The number following E should never exceed 19 or be less than -19
for SAMSTAT.
.PAGE
.CENTER ;EMERGENCY EXIT
.CENTER ;FROM SAMSTAT
.BLANK 4
.CENTER ;CTRL/C (then the return key)
.BLANK 2
.CENTER ;>ABO SST
.BLANK 2
.CENTER ;Later delete "junk" files that
.CENTER ;will appear in your directory
.PAGE
.CENTER ;AFTER DATA ENTRY
.BLANK 3
Print Option (0 = No print, 1 = entire matrix, 2 = submatrix)
.BLANK 2
Elementary Statistics will now be Calculated....
.BLANK 2
Does the matrix now contain the data you plan to save at the end of
this session?
.BLANK 1
.LEFT MARGIN 8
If you plan to keep your file for later use (in your directory), you
must reply 'Y' or 'YES' to this prompt. Reply 'YES' only after you
have edited or transformed the data. This reply writes the data to
disk under your output file name. At the end of the session, SAMSTAT
will then ask you if you want to keep this file. You must reply 'Y'
to both questions if you want to keep the file you have created!
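.LEFT MARGIN 4
.BLANK 2
The data entry rules given earlier (numbers separated by commas, X
for a missing value, exponents limited to -19 through 19, at most 15
columns) can be sketched as a small row parser. This is only an
illustration written in Python, not SAMSTAT's own code: the name
parse_row, the regular expression, and the acceptance of a lower-case
"e" or a leading "+" are assumptions.
.BLANK 1
.LITERAL
```python
import re

MAX_COLUMNS = 15     # matrix limit stated in this guide
MAX_EXPONENT = 19    # exponent limit stated in this guide

# Assumed syntax: optional sign, digits with an optional decimal
# point, and an optional E exponent.
_NUMBER = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)([Ee]([+-]?\d+))?$")


def parse_row(line):
    """Parse one typed row; a missing value (X) becomes None."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) > MAX_COLUMNS:
        raise ValueError("too many columns")
    row = []
    for field in fields:
        if field.upper() == "X":
            row.append(None)
            continue
        match = _NUMBER.match(field)
        if match is None:
            raise ValueError("not a legal number: %r" % field)
        exponent = match.group(3)
        if exponent is not None and abs(int(exponent)) > MAX_EXPONENT:
            raise ValueError("exponent out of range: %r" % field)
        row.append(float(field))
    return row
```
.END LITERAL
.BLANK 1
Because the comma is the separator, a value typed as 1,000 is read by
this sketch as two numbers (1 and 0), which is why commas are not
legal inside a number.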
.LEFT MARGIN 4 .BLANK 2 Enter Analysis To Be Performed .INDENT 4 Type help for a list of analysis codes. .PAGE EXAMPLE: .LITERAL RSX-11M BL22 MULTI-USER SYSTEM GOOD MORNING >SST SAMSTAT 10:14 AM 08-MAY-79 PLEASE ENTER NAMES OF INPUT AND OUTPUT FILES FORM: IN,OUT ( *,OUT DENOTES DATA ENTRY MODE) -> ? STE.SST,WORK.SST STE.SST CONTAINS 13 ROWS AND 5 COLUMNS OF DATA ENTER PRINT OPTION (0=NO PRINT, 1=ENTIRE MATRIX, 2=SUBMATRIX)? 1 ROW COLUMN 1 2 3 4 5 1 7.0000 26.0000 6.0000 60.0000 78.5000 2 1.0000 29.0000 15.0000 52.0000 74.3000 3 11.0000 56.0000 8.0000 20.0000 104.3000 4 11.0000 31.0000 8.0000 47.0000 87.6000 5 7.0000 52.0000 6.0000 33.0000 95.9000 6 11.0000 55.0000 9.0000 22.0000 109.2000 7 3.0000 71.0000 17.0000 6.0000 102.7000 8 1.0000 31.0000 22.0000 44.0000 72.5000 9 2.0000 54.0000 18.0000 22.0000 93.1000 10 21.0000 47.0000 4.0000 26.0000 115.9000 11 1.0000 40.0000 23.0000 34.0000 83.8000 12 11.0000 66.0000 9.0000 12.0000 113.3000 13 10.0000 68.0000 8.0000 12.0000 109.4000 ELEMENTARY STATISTICS WILL NOW BE CALCULATED DOES THE MATRIX NOW CONTAIN THE DATA YOU PLAN TO SAVE AT THE END OF THIS SESSION (Y/N)? N DO YOU WANT TO EDIT OR PRINT YOUR DATA? (Y,N)? Y .END LITERAL .PAGE .CENTER ;EDIT .CENTER ;OPTIONS AND CODES .BLANK 3 .LITERAL 0 - No more editing is desired 1 - Print entire matrix 2 - Print submatrix 3 - Delete a column 4 - Add a column 5 - Delete a row 6 - Add a row 7 - Replace an entire row 8 - Replae an individual value .END LITERAL .BLANK 2 NOTE: On successive deletions, proceed from the highest to the .INDENT 7 lowest numbered columns or rows. .PAGE EXAMPLE: .LITERAL TYPE 'HELP' FOR A LIST OF EDIT CODES TYPE EDIT CODE (0 FOR NO MORE EDIT)? HELP THE FOLLOWING CODES SIGNIFY THE EDIT OPTIONS 0 - NO MORE EDIT 1 - PRINT ENTIRE MATRIX 2 - PRINT SUBMATRIX 3 - DELETE A COLUMN 4 - ADD A COLUMN 5 - DELETE A ROW 6 - ADD A ROW 7 - REPLACE AN ENTIRE ROW 8 - REPLACE AN INDIVIDUAL VALUE TYPE EDIT CODE (0 FOR NO MORE EDIT)? 8 TYPE ROW AND COLUMN OF VALUE TO BE REPLACED? 
7,4 NEW VALUE: (X FOR MISSING)? X TYPE EDIT CODE (0 FOR NO MORE EDIT)? 5 WHAT ROW NUMBER IS TO BE DELETED? 2 THE NEW MATRIX HAS 12 ROWS AND 5 COLUMNS TYPE EDIT CODE (0 FOR NO MORE EDIT)? 2 TYPE FIRST ROW, LAST ROW TO PRINT? 1,3 TYPE FIRST COLUMN, LAST COLUMN TO PRINT? 1,3 ROW COLUMN 1 2 3 1 7.0000 26.0000 6.0000 2 11.0000 56.0000 8.0000 3 11.0000 31.0000 8.0000 TYPE EDIT CODE (0 FOR NO MORE EDIT)? 0 ELEMENTARY STATISTICS WILL NOW BE CALCULATED DOES THE MATRIX NOW CONTAIN THE DATA YOU PLAN TO SAVE AT THE END OF THIS SESSION (Y/N)? N DO YOU WANT TO EDIT OR PRINT YOUR DATA? (Y,N)? N TYPE 'HELP' FOR A LIST OF ANALYSIS CODES ENTER ANALYSIS TO BE PERFORMED ? HELP ANALYSIS CODE EDIT/PRINT DATA EDIT TRANSFORMATION TRANS ELEMENTARY STATISTICS ELEM CROSS TABULATION CROSS RANK CORRELATION RANK CORRELATION CORR CHI-SQUARED CHI T-TEST T ANALYSIS OF VARIANCE ANOVA REGRESSION REG STEPWISE/MULTIPLE STEP SCATTER DIAGRAM SCAT HISTOGRAM HIST **END THIS SESSION** EXIT ENTER ANALYSIS TO BE PERFORMED ? TRANS THERE ARE 12 ROWS AND 5 COLUMNS IN YOUR DATA MATRIX TYPE 'HELP' FOR A LIST OF TRANSFORMATION CODES TYPE TRANSFORMATION CODE? 
HELP .END LITERAL .PAGE .CENTER ;TRANSFORMATION (TRANS) .BLANK 2 .LITERAL CODE FUNCTION COMMENTS 0 No more transformation To exit TRANS 1 Square Root (X) Domain: X>=0 2 Common Log (X) Domain: X>0 3 Natural Log (X) Domain: X>0 4 e**X Exponentiate 5 X + Y Addition of 2 columns 6 X - Y Subtraction 7 X * Y Multiplication 8 X/Y Division 9 X + C Addition of a constant, C 10 X * C Multiplication by a constant 11 1/X Reciprocal of X 12 Arcsin (X) In radians; -1<=x=<1 13 X**C X**C 14 If X>=C then Z=1 else Z=0 Grouping 15 Delete a column 16 Scalar Product of X,Y X1Y1+....+XnYn 17 Print Entire Matrix 18 Print submatrix 19 Generate Random Numbers From a uniform distribution .END LITERAL .PAGE EXAMPLE: .LITERAL DEFINITIONS OF TRANSFORMATION CODES (X AND Y ARE VARIABLES, C IS A CONSTANT) 1 SQUARE ROOT OF X 2 COMMON LOG OF X 3 NATURAL LOG OF X 4 EXPONENTIAL OF X 5 X + Y 6 X - Y 7 X * Y 8 X / Y 9 X + C 10 X * C 11 1 / X 12 ARCSIN OF X 13 X**C 14 IF X>=C, THEN X=1; OTHERWISE X=0 15 DELETE A COLUMN 16 SCALAR PRODUCT 17 PRINT ENTIRE MATRIX 18 PRINT A SUBMATRIX 19 GENERATE RANDOM NUMBERS 0 NO MORE TRANSFORMATIONS TYPE TRANSFORMATION CODE? 14 ENTER VARIABLE X, CONSTANT C AND THE VARIABLE NUMBER TO BE ASSIGNED TO THE TRANSFORMED DATA (FORM; XX,XX.X,XX)? 1,10.0,5 TRANSFORMATION 14 HAS BEEN COMPLETED TYPE TRANSFORMATION CODE? 18 TYPE FIRST ROW, LAST ROW TO PRINT? 1,4 TYPE FIRST COLUMN, LAST COLUMN TO PRINT? 1,6 COLUMN NUMBERS MUST BE BETWEEN 1 AND 5 PLEASE TRY AGAIN TYPE FIRST ROW, LAST ROW TO PRINT? 1,5 ROW COLUMN 1 2 3 4 5 1 7.0000 26.0000 6.0000 60.0000 0.0000 2 11.0000 56.0000 8.0000 20.0000 1.0000 3 11.0000 31.0000 8.0000 47.0000 1.0000 4 7.0000 52.0000 6.0000 33.0000 0.0000 TYPE TRANSFORMATION CODE? 0 YOU NOW HAVE 5 VARIABLES IN YOUR DATA MATRIX ELEMEMENTARY STATISTICS WILL NOW BE CALCULATED DOES THE MATRIX NOW CONTAIN THE DATA YOU PLAN TO SAVE AT THE END OF THIS SESSION (Y/N)? N DO YOU WANT TO EDIT OR PRINT YOUR DATA? (Y,N)? 
Y

TYPE 'HELP' FOR A LIST OF EDIT CODES
TYPE EDIT CODE (0 FOR NO MORE EDIT)? 1
.END LITERAL
.PAGE
.LITERAL
  ROW    COLUMN
              1         2         3         4         5
    1    7.0000   26.0000    6.0000   60.0000    0.0000
    2   11.0000   56.0000    8.0000   20.0000    1.0000
    3   11.0000   31.0000    8.0000   47.0000    1.0000
    4    7.0000   52.0000    6.0000   33.0000    0.0000
    5   11.0000   55.0000    9.0000   22.0000    1.0000
    6    3.0000   71.0000   17.0000              0.0000
    7    1.0000   31.0000   22.0000   44.0000    0.0000
    8    2.0000   54.0000   18.0000   22.0000    0.0000
    9   21.0000   47.0000    4.0000   26.0000    1.0000
   10    1.0000   40.0000   23.0000   34.0000    0.0000
   11   11.0000   66.0000    9.0000   12.0000    1.0000
   12   10.0000   68.0000    8.0000   12.0000    1.0000

TYPE EDIT CODE (0 FOR NO MORE EDIT)? 0
.END LITERAL
.PAGE
.CENTER ;TRANS
.BLANK 3
The transformed values can be assigned either to another column or to
the same column, in which case the original data will be replaced
with the transformed data.
.BLANK 2
.CENTER ;CHAIN TRANSFORMATION
.BLANK 1
More complex functions can be computed by repeated calls of the
appropriate transformations in the proper sequence. For example, the
exponential risk function
.blank 1
.CENTER ;P=1/(1+e**(A+BX1))
.BLANK 1
can be computed by the following 5 steps:
.BLANK 1
.LITERAL
    B*X1  = X2      TRANS 10  (X*C)
    A+X2  = X3      TRANS 9   (X+C)
    e**X3 = X4      TRANS 4   (e**X)
    1+X4  = X5      TRANS 9   (X+C)
    1/X5  = P       TRANS 11  (1/X)
.END LITERAL
.PAGE
.CENTER ;Elementary Statistics (ELEM)
.BLANK 3
The following statistics are given for each variable (column):
.Blank 2
.LITERAL
N (Number of data values associated with each variable)
Maximum value of each variable
Minimum value of each variable
Mean of each variable
Standard Deviation (SD) of each variable
Standard Error of each variable = SD/SQRT(N)
Range (maximum value - minimum value)
.END LITERAL
.LEFT MARGIN 4
.PAGE
EXAMPLE:
.RIGHT MARGIN 80
.LITERAL
TYPE 'HELP' FOR A LIST OF ANALYSIS CODES
ENTER ANALYSIS TO BE PERFORMED ?
ELEM

  VAR    N    MAXIMUM    MINIMUM       MEAN    STD DEV  STD ERROR      RANGE
    1   12    21.0000     1.0000     8.0000     5.7997     1.6742    20.0000
    2   12    71.0000    26.0000    49.7500    15.1004     4.3591    45.0000
    3   12    23.0000     4.0000    11.5000     6.6127     1.9089    19.0000
    4   11    60.0000    12.0000    30.1818    15.1711     4.5743    48.0000
    5   12     1.0000     0.0000     0.5000     0.5222     0.1508     1.0000
.END LITERAL
.RIGHT MARGIN 70
.PAGE
.CENTER ;Elementary Statistics (ELEM)
.BLANK 2
.CENTER ;COMMENTS
.BLANK 1
Use maximum and minimum values as a quick check for outliers and/or
data entry errors.
.BLANK 1
Standard deviation is a measure of the variability of observations.
.BLANK 1
Standard error is a measure of the variability of sample means of
size N. (The standard error is the standard deviation of the mean.)
.BLANK 1
Therefore, present standard deviations when you want to draw
attention to the variability among the members of your population,
and present standard errors when you want to focus on how good your
sample estimate of the population mean is (that is, the variability
of your sample mean).
.BLANK 1
Variance can be computed by squaring the standard deviation:
VAR = SD**2
.PAGE
.CENTER ;HISTOGRAM (HIST)
.BLANK 4
Useful for getting a "picture" of the distribution of your data.
.BLANK 2
Computed for one column (variable).
.BLANK 2
A BAR GRAPH of the frequency in each interval.
.BLANK 2
The user specifies the number of (equally spaced) intervals (max 10).
.BLANK 2
The range of the variable is divided by the number of intervals to
give the interval length, or the user has the option of specifying
the upper and lower boundaries of the histogram.
.PAGE
EXAMPLE:
.LITERAL
ENTER ANALYSIS TO BE PERFORMED ? HIST

TYPE THE COLUMN NUMBER OF THE VARIABLE TO BE USED IN THE HISTOGRAM? 1
ENTER THE NUMBER OF INTERVALS YOU WANT IN THE HISTOGRAM
(MAXIMUM OF 10)? 5
DO YOU WISH TO SPECIFY THE LOWER AND UPPER BOUNDARIES
FOR THE HISTOGRAM (Y/N)?
N

                    HISTOGRAM FOR VARIABLE 1

FREQUENCY      4         2         5         0         1

   5 |                             *
   4 |         *                   *
   3 |         *                   *
   2 |         *         *         *
   1 |         *         *         *                   *
     --------+---------+---------+---------+---------+---------+
           1.00      5.00      9.00     13.00     17.00     21.00

DO YOU WISH TO PRINT MORE HISTOGRAMS? N

ENTER ANALYSIS TO BE PERFORMED
.END LITERAL
.PAGE
.CENTER ;CORRELATION (CORR)
.BLANK 3
This routine computes the linear correlation between two variables,
X and Y, that are assumed to have a bivariate normal distribution.
.BLANK 1
The output consists of a matrix containing the linear correlation
coefficients (symbolized by the letter "r") between each variable
(column) and every other variable (column). The number in parentheses
below each correlation coefficient is the N (number of subjects)
associated with that correlation coefficient.
.BLANK 1
The minimum N required for each correlation coefficient is 4.
.BLANK 1
-110) is printed. The two-tailed probability level associated with
this test is also computed.
.PAGE
EXAMPLE:
.LITERAL
ENTER ANALYSIS TO BE PERFORMED ? RANK

ENTER THE COLUMN NUMBERS OF TWO VARIABLES FOR RANK CORRELATION.
(FORM:XX,XX)? 1,2

SPEARMAN'S RANK CORRELATION (N = 12) = 0.1968
Z VALUE TO TEST SIGNIFICANCE         = 0.6528
PROBABILITY (FOR HO: R=0)            = 0.5139

DO YOU WISH TO PRINT DATA AND RANKS? Y

  OBS       VAR A    RANK A       VAR B    RANK B
    1      7.0000       5.5     26.0000       1.0
    2     11.0000       9.5     56.0000       9.0
    3     11.0000       9.5     31.0000       2.5
    4      7.0000       5.5     52.0000       6.0
    5     11.0000       9.5     55.0000       8.0
    6      3.0000       4.0     71.0000      12.0
    7      1.0000       1.5     31.0000       2.5
    8      2.0000       3.0     54.0000       7.0
    9     21.0000      12.0     47.0000       5.0
   10      1.0000       1.5     40.0000       4.0
   11     11.0000       9.5     66.0000      10.0
   12     10.0000       7.0     68.0000      11.0

DO YOU WISH TO COMPUTE MORE RANK CORRELATION? N
.END LITERAL
.PAGE
.CENTER ;LINEAR REGRESSION (REG)
.BLANK 4
This analysis illustrates a linear relationship between two
variables, X and Y, where Y is measured with error and X has little
or no error.
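.BLANK 1
The kind of fit REG performs can be sketched as an ordinary
least-squares computation. The Python below is only an illustration
under stated assumptions, not SAMSTAT's own code: the function name
simple_regression and the dictionary keys are hypothetical. The data
are columns 3 (X) and 2 (Y) of the 12-row example matrix printed in
this guide.
.BLANK 1
.LITERAL
```python
import math

# Columns 3 (X) and 2 (Y) of the 12-row example matrix in this guide.
X = [6, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8]
Y = [26, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68]


def simple_regression(x, y):
    """Least-squares fit of y on x (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = syy - slope * sxy      # deviation from regression
    se_estimate = math.sqrt(residual_ss / (n - 2))
    se_slope = se_estimate / math.sqrt(sxx)
    return {
        "intercept": intercept,
        "slope": slope,
        "se_slope": se_slope,
        "t": slope / se_slope,           # test of slope = 0, n-2 d.f.
        "r": sxy / math.sqrt(sxx * syy),
        "se_estimate": se_estimate,
    }


fit = simple_regression(X, Y)
```
.END LITERAL
.BLANK 1
For these two columns the sketch gives an intercept of about 52.129
and a slope of about -0.2069, in line with the REG example output in
this guide.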
.blank 1
The model is Y=a+bX, where X is the independent variable, Y is the
dependent variable, and a and b are constants.
.blank 1
The Least Squares method is used to calculate "A" (the estimate of
the intercept, a) and "B" (the estimate of the slope or regression
coefficient, b).
.blank 1
From these estimates, predicted values of Y for a given X can be
computed as
.CENTER ;Y=A+BX
and saved in a designated column.
.blank 1
Computations include the standard error of the regression coefficient
[SE(B)], a t value for the test of b=0, the correlation coefficient
between X and Y, the standard error of the estimate of Y (from the
model), and a regression analysis of variance.
.PAGE
.CENTER ;REG
.BLANK 4
Interpretation of B: for a one unit change in X, Y changes B units.
.blank 1
When B=0, information on X contributes nothing to the knowledge of Y
(that is, no relationship exists).
.blank 1
To test H0: b=0, compare the t-value to the appropriate critical
t-value from t-tables. The appropriate degrees of freedom (D.F.) is
N-2. The F-value from the regression analysis of variance is an
identical test, since t(df)**2 = F(1,df).
.blank 1
In the table of residuals (Y observed minus Y estimated), the
STD RESID = RESIDUAL/STD ERROR OF ESTIMATE.
.BLANK 1
Use the standardized residuals to check for outliers, which can
drastically affect the estimates.
.blank 1
Observed and estimated values can optionally be plotted.
.PAGE
EXAMPLE:
.RIGHT MARGIN 80
.LITERAL
ENTER ANALYSIS TO BE PERFORMED ? REG

ENTER THE DEPENDENT VARIABLE? 2
ENTER THE INDEPENDENT VARIABLE? 3

Y INTERCEPT              =   52.12890
REGRESSION COEFFICIENT   =   -0.20686
STD. ERROR OF REG. COEF. =    0.719
COMPUTED T-VALUE         =   -0.288
CORRELATION COEFFICIENT  =   -0.091
STD. ERROR OF ESTIMATE   =   15.772

          ANALYSIS OF VARIANCE FOR THE REGRESSION

SOURCE OF VARIATION          D.F.   SUM OF SQ.   MEAN SQ.   F VALUE
ATTRIBUTABLE TO REGRESSION     1       20.583     20.583     0.083
DEVIATION FROM REGRESSION     10     2487.670    248.767
     TOTAL                    11     2508.250

DO YOU WISH TO PRINT TABLE OF RESIDUALS (Y/N)?
Y ROW X VALUE Y OBSERVED Y ESTIMATED RESIDUAL STD RESID 1 6.0000 26.0000 50.8877 -24.8877 -1.5779 2 8.0000 56.0000 50.4740 5.5260 0.3504 3 8.0000 31.0000 50.4740 -19.4740 -1.2347 4 6.0000 52.0000 50.8877 1.1123 0.0705 5 9.0000 55.0000 50.2672 4.7328 0.3001 6 17.0000 71.0000 48.6123 22.3877 1.4194 7 22.0000 31.0000 47.5780 -16.5780 -1.0511 8 18.0000 54.0000 48.4054 5.5946 0.3547 9 4.0000 47.0000 51.3015 -4.3015 -0.2727 10 23.0000 40.0000 47.3711 -7.3711 -0.4673 11 9.0000 66.0000 50.2672 15.7328 0.9975 12 8.0000 68.0000 50.4740 17.5260 1.1112 DO YOU WISH TO SAVE YOUR ESTIMATES (Y/N)? N DO YOU WISH TO PLOT Y OBSERVED AND Y ESTIMATED (Y/N)? N DO YOU WISH TO COMPUTE MORE REGRESSION (Y/N)? N ENTER ANALYSIS TO BE PERFORMED ? EXIT .END LITERAL .RIGHT MARGIN 70 .PAGE .CENTER ;Stepwise/Multiple Regression (STEP) .BLANK 4 Multiple regression examines the linear relationship between a dependent variable (Y) and a set of K independent variables (X1,X2,....XK). .BLANK 1 Model assumed: Yi=B(0)+B(1)X(1)+B(2)X(2)+...+B(K)X(K) .blank 1 Interpretation of B(i): For a one unit change in X(i), Y changes B(i) units, while all other X's remain fixed. .blank 1 User specifies which column is the dependent variable (Y). .blank 1 CAUTION: No independent variable (X) can be a linear combination of other variables in the model. For example, if X3=X2-X1, do not use X1,X2 and X3 together. .blank 1 Number of subjects must be greater than K+1. .blank 1 SAMSTAT asks the user if he wants to see the last step only. If 'yes' then all variables (columns remaining after naming those to be deleted) are put into the model as X's. If "no", then the program will select the independent variables one at a time in the order of decreasing magnitude of their contribution to the reduction of the variance of Y. A report on the contribution of each variable is printed as that variable is added. .PAGE .CENTER ;STEP After each report, SAMSTAT asks whether the user wishes to include that variable in the regression. 
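.BLANK 1
The first selection step can be sketched in a few lines. This Python
illustration is an assumption about how such a step may be computed,
not SAMSTAT's own code, and the names sum_sq_reduced and first_step
are hypothetical. It fits Y on each candidate X alone and picks the X
that removes the largest sum of squares. Later steps, which must
adjust for the variables already in the model, are not shown. The
data are the 13 rows of the example file STE.SST printed in this
guide, with column 5 as Y.
.BLANK 1
.LITERAL
```python
# Example file STE.SST as printed in this guide: columns 1-4 are the
# candidate X variables and column 5 is the dependent variable Y.
DATA = [
    (7, 26, 6, 60, 78.5), (1, 29, 15, 52, 74.3),
    (11, 56, 8, 20, 104.3), (11, 31, 8, 47, 87.6),
    (7, 52, 6, 33, 95.9), (11, 55, 9, 22, 109.2),
    (3, 71, 17, 6, 102.7), (1, 31, 22, 44, 72.5),
    (2, 54, 18, 22, 93.1), (21, 47, 4, 26, 115.9),
    (1, 40, 23, 34, 83.8), (11, 66, 9, 12, 113.3),
    (10, 68, 8, 12, 109.4),
]


def sum_sq_reduced(x, y):
    """Sum of squares of y removed by a straight-line fit on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy * sxy / sxx


def first_step(rows, y_col):
    """Return (selected column, SS reduced, total SS) for step 1."""
    y = [row[y_col - 1] for row in rows]
    mean_y = sum(y) / len(y)
    total_ss = sum((v - mean_y) ** 2 for v in y)
    best_col, best_ss = None, -1.0
    for col in range(1, len(rows[0]) + 1):
        if col == y_col:
            continue
        x = [row[col - 1] for row in rows]
        reduced = sum_sq_reduced(x, y)
        if reduced > best_ss:
            best_col, best_ss = col, reduced
    return best_col, best_ss, total_ss


best_col, best_ss, total_ss = first_step(DATA, 5)
proportion = best_ss / total_ss
```
.END LITERAL
.BLANK 1
For this file the sketch selects column 4, which removes about 1831.9
of the total 2715.76 (a proportion of about 0.6745), in line with
STEP 1 of the example output in this guide.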
.blank 1
Because the user can accept or decline each variable, the
stepwise/multiple regression program can be used either as a multiple
regression or as a step-up regression analysis.
.blank 1
Observed Y, estimated Y and residual (OBS-EST) values are optionally
printed at the end of the analysis.
.blank 1
Since the B(i)'s are Partial Regression Coefficients (dependent upon
the other B's), the estimates of the B's change with each addition of
another variable.
.blank 1
The partial F printed at each step can be used as a guide to measure
the significance of the contribution of that variable over and above
all other variables already in the model. Compare the F value to
critical F values found in an F table.
.PAGE
.CENTER ;STEP
.BLANK 3
The total sum of squares, adjusted for the mean, is a measure of the
unexplained variability in the data (Y values).
.blank 1
With the addition of each X, the proportion of this unexplained
variance in Y that is accounted for by the measurement of X is
computed.
.blank 1
The sum of squares reduced in each step, as well as the cumulative
sum of squares reduced, is printed at each step.
.blank 1
The multiple correlation is the correlation between the estimated and
observed values.
.blank 1
Regression coefficients and tests against zero of each coefficient
(t-TESTS) are also computed at each step.
.PAGE
EXAMPLE:
.RIGHT MARGIN 80
.LITERAL
SAMSTAT 10:29 AM 08-MAY-79

PLEASE ENTER NAMES OF INPUT AND OUTPUT FILES
FORM: IN,OUT ( *,OUT DENOTES DATA ENTRY MODE)
? STE.SST,WORK2.SST

STE.SST CONTAINS 13 ROWS AND 5 COLUMNS OF DATA

ENTER PRINT OPTION (0=NO PRINT, 1=ENTIRE MATRIX, 2=SUBMATRIX)? 0

ELEMENTARY STATISTICS WILL NOW BE CALCULATED

DOES THE MATRIX NOW CONTAIN THE DATA YOU PLAN TO SAVE
AT THE END OF THIS SESSION (Y/N)? N

DO YOU WANT TO EDIT OR PRINT YOUR DATA? (Y,N)? N

TYPE 'HELP' FOR A LIST OF ANALYSIS CODES
ENTER ANALYSIS TO BE PERFORMED ? STEP

STEP-WISE REGRESSION

ALL DATA POINTS MUST EXIST FOR THIS ANALYSIS.
ANY ROW OF DATA CONTAINING A MISSING VALUE WILL BE DROPPED FROM THE ANALYSIS DO YOU WANT TO SEE THE LAST STEP ONLY (Y/N)? N ENTER THE COLUMN NUMBER OF THE DEPENDENT VARIABLE? 5 DO YOU WISH TO OMIT ANY VARIABLES FROM THIS ANALYSIS (Y/N)? N STEP 1 VARIABLE SELECTED IS X 4 SUM OF SQUARES REDUCED IN THIS STEP = 1831.9000 PROPORTION OF VARIANCE OF Y REDUCED = 0.6745 PARTIAL F (DF =1, 11) = 22.7985 CUMULATIVE SUM OF SQUARES REDUCED = 1831.9000 CUMULATIVE PROPORTION REDUCED = 0.6745 (OF 2715.7600) MULTIPLE CORRELATION COEFFICIENT = 0.8213 F FOR ANALYSIS OF VAR. (D.F. = 1 , 11) 22.7985 STANDARD ERROR OF ESTIMATE 8.9639 INTERCEPT AFTER STEP 1 = 117.5680 VARIABLE REG COEFF STD ERR-COEFF COMPUTED T 4 -0.7382 0.1546 -4.7748 STEP 2 VARIABLE SELECTED IS X 1 SUM OF SQUARES REDUCED IN THIS STEP = 809.1050 PROPORTION OF VARIANCE OF Y REDUCED = 0.2979 PARTIAL F (DF =1, 10) = 108.2240 TYPE 'HELP' FOR AN EXPLANATION DO YOU WISH TO ENTER THIS VARIABLE IN THE REGRESSION (Y/N)? Y CUMULATIVE SUM OF SQUARES REDUCED = 2641.0000 CUMULATIVE PROPORTION REDUCED = 0.9725 (OF 2715.7600) MULTIPLE CORRELATION COEFFICIENT = 0.9861 F FOR ANALYSIS OF VAR. (D.F. = 2 , 10) 176.6270 STANDARD ERROR OF ESTIMATE 2.7343 INTERCEPT AFTER STEP 2 = 103.0970 VARIABLE REG COEFF STD ERR-COEFF COMPUTED T 4 -0.6140 0.0486 -12.6212 1 1.4400 0.1384 10.4031 STEP 3 VARIABLE SELECTED IS X 2 SUM OF SQUARES REDUCED IN THIS STEP = 26.7893 PROPORTION OF VARIANCE OF Y REDUCED = 0.0099 PARTIAL F (DF =1, 9) = 5.0258 DO YOU WISH TO ENTER THIS VARIABLE IN THE REGRESSION (Y/N)? Y CUMULATIVE SUM OF SQUARES REDUCED = 2667.7900 CUMULATIVE PROPORTION REDUCED = 0.9823 (OF 2715.7600) MULTIPLE CORRELATION COEFFICIENT = 0.9911 F FOR ANALYSIS OF VAR. (D.F. 
= 3 , 9) 166.8310 STANDARD ERROR OF ESTIMATE 2.3087 INTERCEPT AFTER STEP 3 = 71.6483 VARIABLE REG COEFF STD ERR-COEFF COMPUTED T 4 -0.2365 0.1733 -1.3650 1 1.4519 0.1170 12.4100 2 0.4161 0.1856 2.2418 STEP 4 VARIABLE SELECTED IS X 3 SUM OF SQUARES REDUCED IN THIS STEP = 0.1091 PROPORTION OF VARIANCE OF Y REDUCED = 0.0000 PARTIAL F (DF =1, 8) = 0.0182 DO YOU WISH TO ENTER THIS VARIABLE IN THE REGRESSION (Y/N)? N DO YOU WISH TO PRINT THE TABLE OF RESIDUALS (Y/N)? Y OBS Y OBSERVED Y ESTIMATED RESIDUAL STD RESID 1 78.5000 78.4383 0.0617 0.0267 2 74.3000 72.8673 1.4327 0.6205 3 104.3000 106.1910 -1.8910 -0.8190 4 87.6000 89.4016 -1.8016 -0.7804 5 95.9000 95.6438 0.2562 0.1110 6 109.2000 105.3020 3.8982 1.6885 7 102.7000 104.1290 -1.4287 -0.6188 8 72.5000 75.5919 -3.0919 -1.3392 9 93.1000 91.8182 1.2818 0.5552 10 115.9000 115.5460 0.3539 0.1533 11 83.8000 81.7023 2.0977 0.9086 12 113.3000 112.2440 1.0556 0.4572 13 109.4000 111.6250 -2.2247 -0.9636 DO YOU WISH TO PRINT THE DATA USED IN THIS ANALYSIS (Y/N)? N DO YOU WISH TO COMPUTE MORE STEP-WISE REGRESSION? N ENTER ANALYSIS TO BE PERFORMED ? EXIT .END LITERAL .RIGHT MARGIN 70 .PAGE .CENTER ;CHI - SQUARED (CHI) .BLANK 2 Used to test for independence between two classification variables. .blank 1 For example, heart cath patients can be classified as smokers and non-smokers, and as positive or negative upon catheterization. Independence between smoking and heart disease can then be tested using this analysis. .blank 1 Let's suppose the following enumeration data were collected on 61 subjects. .blank 2 .LITERAL POSITIVE NEGATIVE ---------------------------------- SMOKERS | 18 | 10 | 28 |---------------|----------------| NON-SMOKERS | 6 | 27 | 33 ---------------------------------- 24 37 61 .END LITERAL .BLANK 2 Then your SAMSTAT data array would have two rows and two columns as follows: .blank 1 .LITERAL COLUMN 1 COLUMN 2 ROW 1 18 10 ROW 2 6 27 .END LITERAL .BLANK 2 This is refered to as a 2x2 Table. 
Four rows and 3 columns would be called a 4x3 table. .blank 1 The expected frequencies for each cell in the table is optionally computed so that you can compare your observed frequency to the expected frequency under the null hypothesis of independence between the variables. .PAGE .CENTER ;CHI .BLANK 3 When the computed chi-squared value exceeds the tabled critical value of chi-squared for probability level alpha, one can infer at the alpha level of significance that the variables are not independent. .blank 1 This is not a test of cause and effect, but only of dependence. .blank 1 Degrees of freedom = (Ncol-1) (Nrows-1) .blank 1 This is a good approximate test whenever all cells have an expected cell frequency of at least 5. .blank 1 While zeroes are valid cell entries, X's are not meaningful and not allowed. If there is missing data, the analysis will not be performed. .blank 1 Elementary statistics is not appropriate for this type data. .blank 1 The chi-squared value is not adjusted for continuity. .PAGE EXAMPLE: .LITERAL SAMSTAT 10:34 AM 08-MAY-79 PLEASE ENTER NAMES OF INPUT AND OUTPUT FILES FORM: IN,OUT ( *,OUT DENOTES DATA ENTRY MODE) ? *,WORK3.SST TYPE NUMBERS OF ROWS AND COLUMNS OF DATA. FORM (ROWS,COLUMNS)? 2,4 IF YOU HAVE MISSING DATA, TYPE X WHERE THESE VALUES WOULD NORMALLY OCCUR ENTER 2 ROWS, ONE AT A TIME ROW 1? 56,36,48,29 ROW 2? 14,18,51,89 ENTER PRINT OPTION (0=NO PRINT, 1=ENTIRE MATRIX, 2=SUBMATRIX)? 0 ELEMENTARY STATISTICS WILL NOW BE CALCULATED DOES THE MATRIX NOW CONTAIN THE DATA YOU PLAN TO SAVE AT THE END OF THIS SESSION (Y/N)? N DO YOU WANT TO EDIT OR PRINT YOUR DATA? (Y,N)? N TYPE 'HELP' FOR A LIST OF ANALYSIS CODES ENTER ANALYSIS TO BE PERFORMED ? CHI CHI-SQUARE = 61.7778 DEGREES OF FREEDOM = 3 DO YOU WISH TO PRINT EXPECTED FREQUENCIES (Y/N)? 
Y

EXPECTED FREQUENCY FOR EACH CELL

  ROW    COLUMN
              1         2         3         4
    1   34.6921   26.7625   49.0645   58.4809
    2   35.3079   27.2375   49.9355   59.5191
.END LITERAL
.PAGE
.CENTER ;Cross Tabulation (CROSS)
.BLANK 2
This program computes and prints a table containing frequency counts
for two variables (columns).
.blank 1
For example:
.LITERAL

        300 -------------
            | 14  |  81 |
        250 -------------      X data ranges
            | 19  | 105 |      from 25 to 35
        200 -------------
            |  7  |  92 |      Y data ranges
        150 -------------      from 150 to 300
            25    30    35
                  X
.END LITERAL
.BLANK 2
The user is asked to specify the horizontal and vertical variables,
and the number of intervals into which the range of each variable is
to be divided.
.blank 1
The maximum number of intervals is 10 for each variable.
.blank 1
The numbers shown in the cross table indicate the number of
observations which fall between each of the limits marked along the
axes.
.blank 1
SAMSTAT will also print expected frequencies (under the hypothesis of
no relationship between the variables) and provide a chi-squared
analysis on the data in the table.
.PAGE
EXAMPLE:
.RIGHT MARGIN 80
.LITERAL
ENTER ANALYSIS TO BE PERFORMED ? CROSS

ENTER THE COLUMN NUMBERS OF TWO VARIABLES FOR CROSS TABULATION
(FORM: HORIZONTAL,VERTICAL)? 1,5
ENTER THE NUMBER OF INTERVALS FOR VARIABLE 1 (MAXIMUM OF 10)? 4
ENTER THE NUMBER OF INTERVALS FOR VARIABLE 5 (MAXIMUM OF 10)? 2
DO YOU WISH TO SPECIFY INTERVAL BOUNDARIES FOR THE CROSSTABLE (Y/N)? N

 1.000 +
       |    0         1         4         1
 0.500 +
       |    4         2         0         0
 0.000 +---------+---------+---------+---------+
      1.00      6.00     11.00     16.00     21.00

DO YOU WISH TO COMPUTE CHI-SQUARE FOR THE ABOVE TABLE (Y/N)? Y

CHI-SQUARE = 9.3333
DEGREES OF FREEDOM = 3

8 CELL(S) USED IN CALCULATING CHI-SQUARE HAVE AN EXPECTED
FREQUENCY LESS THAN 5. THEREFORE THE VALUE COMPUTED MAY BE MEANINGLESS

DO YOU WISH TO PRINT EXPECTED FREQUENCIES (Y/N)? Y

EXPECTED FREQUENCY FOR EACH CELL

  ROW    COLUMN
              1         2         3         4
    1    2.0000    1.5000    2.0000    0.5000
    2    2.0000    1.5000    2.0000    0.5000

DO YOU WISH TO COMPUTE ANOTHER CROSS TABLE (Y/N) ?
N
.END LITERAL
.RIGHT MARGIN 70
.PAGE
.CENTER ;STUDENT'S t-TEST (T)
.BLANK 3
This analysis calculates the t-STATISTIC, which may be used to test
whether two sample means (computed from two columns) differ
significantly, or whether the mean of one column differs
significantly from a specified constant.
.blank 1
There are four options from which to select the one that is
appropriate to your design and the assumptions on your data.
.blank 1
.LEFT MARGIN 10
.INDENT -3
1) Mean of group A vs a constant
.INDENT -3
2) Mean of group A vs mean of group B, assuming the two population
variances are equal (different subjects in each group - unpaired
t-TEST)
.INDENT -3
3) Same as 2 above, but assuming the two population variances are
unequal
.INDENT -3
4) Mean of A vs mean of B when all subjects were measured under both
conditions A and B (paired t-TEST).
.LEFT MARGIN 4
.PAGE
.CENTER ;T
.BLANK 3
Compare your computed t-VALUE to the significant values in the
t-TABLE for the appropriate degrees of freedom (df). When the
magnitude of your calculated t EXCEEDS the tabled value, you can
infer that the two means differ significantly at the tabled
probability level.
.blank 1
For option 1, be careful not to assume that an estimate from another
study (perhaps one presented in the literature) is a constant. It too
has variability, which should be considered.
.blank 1
For options 2 and 3, the ratio of the larger to the smaller variance
is printed. To test for differences in the variances, compare this
ratio to the critical F values in an F table with (numerator,
denominator) degrees of freedom, but double the probability level
from the table. For example, if your F = 2.63 with (10,14) df, the
table gives P less than .05 (critical F.05 with 10,14 df = 2.60).
Doubling the probability gives 2 X .05 = .10, so P < .10 is correct.
.PAGE
.CENTER ;T
For unpaired data, when the variances differ significantly, use
option 3. This is not critical when the sample sizes from the two
groups are equal or nearly equal.
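.BLANK 1
The arithmetic behind options 1 and 2 and the variance ratio can be
sketched as follows. This Python illustration uses the standard
textbook formulas and is not SAMSTAT's own code: the function names
are hypothetical. Option 3 is left out because this guide does not
state how SAMSTAT computes its degrees of freedom. The data are
columns 1 and 2 of the 12-row example matrix printed in this guide.
.BLANK 1
.LITERAL
```python
import math

# Columns 1 and 2 of the 12-row example matrix in this guide.
COL1 = [7, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10]
COL2 = [26, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68]


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Sample variance with N-1 in the denominator."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def t_vs_constant(a, constant):
    """Option 1: mean of A against a constant; returns (t, df)."""
    std_error = math.sqrt(variance(a) / len(a))
    return (mean(a) - constant) / std_error, len(a) - 1


def t_pooled(a, b):
    """Option 2: unpaired t with a pooled variance; returns (t, df)."""
    df = len(a) + len(b) - 2
    pooled = ((len(a) - 1) * variance(a)
              + (len(b) - 1) * variance(b)) / df
    std_error = math.sqrt(pooled * (1.0 / len(a) + 1.0 / len(b)))
    return (mean(a) - mean(b)) / std_error, df


def variance_ratio(a, b):
    """Larger sample variance divided by the smaller one."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)


t1, df1 = t_vs_constant(COL1, 10)
t2, df2 = t_pooled(COL1, COL2)
f_ratio = variance_ratio(COL1, COL2)
```
.END LITERAL
.BLANK 1
With these columns the sketch gives t = -1.1946 on 11 df for option 1
against the constant 10, t = -8.9409 on 22 df for option 2, and a
variance ratio of 6.78, in line with the T example output in this
guide.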
.blank 1 Remember, for paired data use option 4. A common example of paired data is pre vs post treatment data collected on subjects or experimental animals. If in doubt, check with a statistician in the consultation/training branch in BR (2818) .blank 1 Caution: do not overtest your data. Making t-TESTS on all possible pairs of experimental groups is inviting "false positive" significance tests. That is, you are likely to find false significant differences when no real differences exist. For such "fishing expeditions", check with us for an appropriate multiple range test. .PAGE EXAMPLE: .LITERAL ENTER ANALYSIS TO BE PERFORMED ? T TYPE 'HELP' FOR A LIST OF T-TEST OPTIONS ENTER YOUR OPTION NUMBER? HELP THE FOLLOWING FOUR HYPOTHESES ARE OPTIONAL FOR CALCULATION OF T-STATISTIC ON THE MEANS OF SAMPLE VARIABLES A AND B: 1 - MEAN OF GROUP A VS. A CONSTANT 2 - MEAN OF GROUP A VS. MEAN OF GROUP B, ASSUMING THE TWO POPULATION VARIANCES ARE EQUAL (DIFFERENT SUBJECTS IN EACH GROUP - UNPAIRED T-TEST) 3 - SAME AS 2 ABOVE, BUT ASSUMING THE TWO POPULATION VARIANCES ARE UNEQUAL 4 - MEAN OF A VS. MEAN OF B WHEN ALL SUBJECTS WERE MEASURED UNDER CONDITIONS A AND B. (PAIRED T-TEST) ENTER YOUR OPTION NUMBER? 1 ENTER COLUMN A? 1 ENTER THE VALUE YOU WISH TO COMPARE AGAINST THE MEAN OF VARIABLE 1 ? 10 COMPUTED T = -1.1946 DEGREES OF FREEDOM = 11 DO YOU WISH TO COMPUTE MORE T VALUES (Y/N)? Y TYPE 'HELP' FOR A LIST OF T-TEST OPTIONS ENTER YOUR OPTION NUMBER? 2 ENTER THE NUMBERS OF 2 COLUMNS FOR THE T-TEST? 1,2 F VALUE FOR RATIO OF VARIANCES (D.F.= 11 , 11 ) = 6.78 NOTE: THE PROBABILITY LEVEL FOR THE ABOVE F TEST SHOULD BE DOUBLED COMPUTED T = -8.9409 DEGREES OF FREEDOM = 22 DO YOU WISH TO COMPUTE MORE T VALUES (Y/N)? Y TYPE 'HELP' FOR A LIST OF T-TEST OPTIONS ENTER YOUR OPTION NUMBER? 3 ENTER THE NUMBERS OF 2 COLUMNS FOR THE T-TEST? 
1,2

F VALUE FOR RATIO OF VARIANCES (D.F.= 11 , 11 ) = 6.78
NOTE: THE PROBABILITY LEVEL FOR THE ABOVE F TEST SHOULD BE DOUBLED

COMPUTED T = -8.9409
DEGREES OF FREEDOM = 15

DO YOU WISH TO COMPUTE MORE T VALUES (Y/N)? Y

TYPE 'HELP' FOR A LIST OF T-TEST OPTIONS
ENTER YOUR OPTION NUMBER? 4
ENTER THE NUMBERS OF 2 COLUMNS FOR THE T-TEST? 3,4

COMPUTED T = -3.9749
DEGREES OF FREEDOM = 10

DO YOU WISH TO COMPUTE MORE T VALUES (Y/N)? N
.END LITERAL
.PAGE
.CENTER ;Analysis of Variance (ANOVA)
.BLANK 3
An extension of the t-TEST concept to many groups or treatments.
.blank 1
Tests for differences among group means.
.blank 1
(H0: u1=u2=...=uk)
.blank 1
ui = population mean of group i
.blank 1
Rejection of the null hypothesis (H0) implies only that the means are
not all equal. It says nothing about any specific comparison of
individual means in the set.
.blank 1
The analysis of variance may be computed for two different designs:
.LEFT MARGIN 10
.INDENT -3
1) Completely randomized design (one-way ANOVA), where different
subjects are randomly assigned to each treatment (an extension of the
unpaired t-TEST to multiple groups).
.blank 1
.INDENT -3
2) Randomized complete block design (two-way ANOVA), where every
subject is studied in every experimental condition (an extension of
the paired t-TEST).
.LEFT MARGIN 4
.PAGE
.CENTER ;ANOVA
.BLANK 3
For each design, an F value is computed. Compare this F value to the
tabled F-values with numerator df EQUAL to the df of the treatments
and denominator df EQUAL to the df of the error or residual.
.BLANK 1
You have the option to omit some groups (columns) from the analysis.
.blank 1
The analysis of variance partitions the total sum of squares
(variability in the data) into sums of squares attributable to
treatments (also "subjects" for the two-way ANOVA) and error or
residual sums of squares. Each mean square is a sum of squares
divided by its df.
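.BLANK 1
The partition for the one-way design can be sketched as follows. This
Python illustration uses the standard textbook formulas and is not
SAMSTAT's own code: the name one_way_anova and the use of None for a
missing value are assumptions. The groups are columns 1 to 4 of the
12-row example matrix printed in this guide, where column 4 has one
missing value.
.BLANK 1
.LITERAL
```python
# Columns 1-4 of the 12-row example matrix in this guide; None marks
# the missing value in column 4.
GROUPS = [
    [7, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10],
    [26, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68],
    [6, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8],
    [60, 20, 47, 33, 22, None, 44, 22, 26, 34, 12, 12],
]


def one_way_anova(groups):
    """Return the one-way ANOVA table entries as a dict."""
    # Drop missing values group by group.
    cleaned = [[v for v in g if v is not None] for g in groups]
    n_total = sum(len(g) for g in cleaned)
    grand_mean = sum(sum(g) for g in cleaned) / n_total
    means = [sum(g) / len(g) for g in cleaned]
    # Treatment SS: group means about the grand mean.
    ss_treat = sum(len(g) * (m - grand_mean) ** 2
                   for g, m in zip(cleaned, means))
    # Error SS: values about their own group mean.
    ss_error = sum((v - m) ** 2
                   for g, m in zip(cleaned, means) for v in g)
    df_treat = len(cleaned) - 1
    df_error = n_total - len(cleaned)
    ms_treat = ss_treat / df_treat
    ms_error = ss_error / df_error
    return {
        "df_treat": df_treat, "df_error": df_error,
        "ss_treat": ss_treat, "ss_error": ss_error,
        "ms_treat": ms_treat, "ms_error": ms_error,
        "f": ms_treat / ms_error,
    }


table = one_way_anova(GROUPS)
```
.END LITERAL
.BLANK 1
For these four columns the sketch gives treatment and error sums of
squares of about 13298.05 (3 df) and 5660.89 (43 df) and an F of
about 33.67, in line with the one-way ANOVA example output in this
guide.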
.blank 1
The F ratio then is the ratio of the treatment mean square to the error
(residual) mean square.
.blank 1
An assumption in the one-way ANOVA is that the group variances are
homogeneous (not significantly different).
.blank 1
A common two-way design at SAM is one in which subjects are measured
across time. The "treatments" then refer to the time periods.
.PAGE
.CENTER ;SOME COMMON USES OF TRANSFORMATIONS
.BLANK 3
One assumption in the analysis of variance is homogeneity of error (a
common variance in all treatment populations). Sometimes a
transformation of the data is necessary to meet this assumption, and
then more valid inferences can be drawn by analyzing the transformed
data. For example:
.blank 1
.LITERAL
TRANSFORMATION          WHEN APPROPRIATE
--------------          ----------------

Square Root             Group means and variances are
                        proportional (POISSON), common in
                        small whole numbers such as number
                        of bacterial colonies in a plate,
                        etc.

LOG (either common      Group Means and Standard
or natural logs)        Deviations are proportional.
                        Use LOG (X + 1) if zeros are present

ARCSIN of Square        Binomial Data Expressed as
Root                    Decimal Fractions (e.g., fraction
                        survival per cage)
.END LITERAL
.PAGE
EXAMPLE:
.LITERAL
ENTER ANALYSIS TO BE PERFORMED ? ANOVA

ANALYSIS OF VARIANCE DESIGN OPTIONS:

1. ONE-WAY ANOVA: APPROPRIATE WHEN EACH SUBJECT WAS MEASURED
   UNDER ONLY ONE TREATMENT
2. TWO-WAY ANOVA: APPROPRIATE WHEN EACH SUBJECT WAS MEASURED
   UNDER ALL TREATMENTS

TYPE THE DESIGN NUMBER APPROPRIATE FOR YOUR DATA? 1

DO YOU WISH TO OMIT ANY TREATMENTS (COLUMNS) FROM THIS ANALYSIS (Y/N)? Y
TYPE TREATMENT TO BE OMITTED (0 FOR NO MORE)? 5
TYPE TREATMENT TO BE OMITTED (0 FOR NO MORE)? 0

ONE-WAY ANALYSIS OF VARIANCE

SOURCE        DF   SUM OF SQUARES   MEAN SQUARE        F
TREATMENTS     3       13298.0508     4432.6836   33.671
ERROR         43        5660.8900      131.6490
TOTAL         46       18958.9375

DO YOU WISH TO COMPUTE MORE ANALYSES OF VARIANCE? Y

ANALYSIS OF VARIANCE DESIGN OPTIONS:

1. ONE-WAY ANOVA: APPROPRIATE WHEN EACH SUBJECT WAS MEASURED
   UNDER ONLY ONE TREATMENT
2.
   TWO-WAY ANOVA: APPROPRIATE WHEN EACH SUBJECT WAS MEASURED
   UNDER ALL TREATMENTS

TYPE THE DESIGN NUMBER APPROPRIATE FOR YOUR DATA? 2

DO YOU WISH TO OMIT ANY TREATMENTS (COLUMNS) FROM THIS ANALYSIS (Y/N)? Y
TYPE TREATMENT TO BE OMITTED (0 FOR NO MORE)? 5
TYPE TREATMENT TO BE OMITTED (0 FOR NO MORE)? 0

AT PRESENT THIS ANALYSIS CANNOT BE PERFORMED WITH MISSING DATA.
DO YOU WISH TO OMIT THOSE ROWS WITH MISSING DATA (Y/N)? Y

TWO-WAY ANALYSIS OF VARIANCE

SOURCE        DF   SUM OF SQUARES   MEAN SQUARE        F
SUBJECTS      10           3.1818        0.3182
TREATMENTS     3       11172.2000     3724.0600   21.886
RESIDUAL      30        5104.8200      170.1610
TOTAL         43       16280.2000

DO YOU WISH TO COMPUTE MORE ANALYSES OF VARIANCE? N
.END LITERAL
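.blank 1
The layout of the two-way table above can be illustrated with a short
Python sketch of the randomized complete block arithmetic. This is a
hedged illustration, not SAMSTAT's own code: the function name
two_way_anova, the dictionary keys, and the sample matrix are
assumptions made for the example. Rows are subjects and columns are
treatments, and rows with missing values are assumed to have been
omitted already.
.LITERAL
```python
def two_way_anova(matrix):
    """Randomized complete block ANOVA on a complete subjects x treatments matrix.

    matrix: list of rows (subjects), each a list of treatment values.
    The name and return layout are hypothetical, not SAMSTAT's own.
    """
    n_subjects = len(matrix)
    n_treatments = len(matrix[0])
    n_total = n_subjects * n_treatments
    grand_mean = sum(sum(row) for row in matrix) / n_total

    row_means = [sum(row) / n_treatments for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n_subjects
                 for j in range(n_treatments)]

    # Subjects SS from row means, treatments SS from column means.
    ss_subjects = n_treatments * sum((m - grand_mean) ** 2 for m in row_means)
    ss_treatments = n_subjects * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in matrix for x in row)
    # Residual SS is what remains after removing subjects and treatments.
    ss_residual = ss_total - ss_subjects - ss_treatments

    df_subjects = n_subjects - 1
    df_treatments = n_treatments - 1
    df_residual = df_subjects * df_treatments

    ms_treatments = ss_treatments / df_treatments
    ms_residual = ss_residual / df_residual
    return {
        "df_subjects": df_subjects,
        "df_treatments": df_treatments,
        "df_residual": df_residual,
        "ss_subjects": ss_subjects,
        "ss_treatments": ss_treatments,
        "ss_residual": ss_residual,
        "ss_total": ss_total,
        "f": ms_treatments / ms_residual,
    }


# Made-up data: three subjects (rows) measured under three treatments (columns).
block_table = two_way_anova([[1, 2, 4], [2, 4, 5], [3, 4, 8]])
print(block_table)
```
.END LITERAL
For this made-up matrix the total sum of squares is 34, the df are 2, 2,
and 4, and F is 15.5. The same df formulas applied to 11 subjects and 4
treatments give 10, 3, and 30, which is consistent with the table above.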