


ANOVA(1STAT)        UNIX Programmer's Manual         ANOVA(1STAT)



NNNNAAAAMMMMEEEE
     anova - statistics: multi-factor analysis of variance

SSSSYYYYNNNNOOOOPPPPSSSSIIIISSSS
     aaaannnnoooovvvvaaaa [column names]

DDDDEEEESSSSCCCCRRRRIIIIPPPPTTTTIIIIOOOONNNN
     _a_n_o_v_a does multi-factor analysis of variance on designs with
     within groups factors, between groups factors, or both.
     _a_n_o_v_a reads from the standard input (via redirection with <
     or piped with |) and writes to the standard output.  _a_n_o_v_a
     allows variable numbers of replications (averaged before
     analysis) on any factor.  All factors except the random fac-
     tor must be crossed; some nested designs are not allowed.
     Unequal group sizes on between groups factors are allowed
     and are solved with the weighted means solution, however
     empty cells are not permitted.  The program is based on a
     method of doing analysis discussed by Keppel (1973) in
     _D_e_s_i_g_n _a_n_d _A_n_a_l_y_s_i_s: _A _R_e_s_e_a_r_c_h_e_r'_s _H_a_n_d_b_o_o_k. The input for-
     mat was designed with the help of Jay McClelland of UCSD.

     _I_n_p_u_t _F_o_r_m_a_t. _a_n_o_v_a uses an input format unlike conventional
     programs.  The input format was designed so that if a user
     specifies the role individual data play in the overall
     design, _a_n_o_v_a would figure out program parameters that usu-
     ally need to be explicitly specified with a special control
     language.  This way, the error prone process of design
     specification is taken out of the hands of the user.  The
     input to _a_n_o_v_a consists of each datum on a separate line,
     preceded by a list of indexes, one for each factor, that
     specifies the level of each factor at which that datum was
     obtained.  By convention, data are always in the last
     column, and indexes for the one allowable random factor must
     be in the first.  Data can be real numbers or integers.
     Indexes can be any character string, allowing mnemonic
     labels to simplify reading the output.  For example:
                          fred  3  hard  10
     indicates that "fred" at level "3" of the factor indexed by
     column two and at level "hard" of the factor indexed by
     column three, scored 10.  Indexes and data on a line can be
     separated by tabs or spaces for readability.  Data from an
     experiment consists of a series of lines like the one above.
     The order of these lines does not matter, so additional data
     can simply be appended to the end of a file.  Replications
     are coded by having more than one line with the same list of
     leading indexes.
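
     The input rules above can be illustrated with a short Python
     sketch.  It is not the program's own code: the function name
     read_records, the handling of blank lines, and the second
     "fred" line (added to show a replication) are assumptions
     made for illustration only.

```python
from collections import defaultdict

def read_records(lines):
    """Map each tuple of leading indexes to the mean of its data."""
    cells = defaultdict(list)
    width = None
    for line in lines:
        fields = line.split()            # tabs or spaces separate fields
        if not fields:
            continue                     # assumption: blank lines are skipped
        if width is None:
            width = len(fields)
        elif len(fields) != width:
            raise ValueError("Ragged input")
        *indexes, datum = fields         # the datum is always the last column
        cells[tuple(indexes)].append(float(datum))  # fails on non-numeric data
    # Lines sharing the same leading indexes are replications; average them.
    return {key: sum(vals) / len(vals) for key, vals in cells.items()}

records = read_records([
    "fred  3  hard  10",
    "fred  3  hard  14",                 # hypothetical replication
    "fred  3  easy  6",                  # hypothetical extra line
])
print(records)
```

     Because the cells are keyed by their indexes, the order of
     the input lines has no effect on the result.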

     With this information, anova determines the number of
     factors, and the number and names of levels of each factor.
     anova also figures out whether a factor is between groups or
     within groups so that it can correctly choose error terms
     for F-ratios.
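
     One way such an inference could work is sketched below in
     Python.  This is an assumption about the approach, not the
     program's actual algorithm: a factor is called "between" when
     each level of the random factor appears at exactly one of its
     levels, and "within" when each appears at all of them.  The
     function name describe_design and the sample rows are
     hypothetical.

```python
def describe_design(index_rows):
    """index_rows: index tuples with the random factor first, datum removed."""
    rows = list(index_rows)
    nfactors = len(rows[0])
    # Levels of each factor are the distinct labels found in its column.
    levels = [sorted({row[i] for row in rows}) for i in range(nfactors)]
    kinds = {}
    for i in range(1, nfactors):
        seen = {}                        # random-factor level -> levels of factor i
        for row in rows:
            seen.setdefault(row[0], set()).add(row[i])
        if all(len(s) == 1 for s in seen.values()):
            kinds[i] = "between"
        elif all(len(s) == len(levels[i]) for s in seen.values()):
            kinds[i] = "within"
        else:
            kinds[i] = "unbalanced"
    return levels, kinds

rows = [
    ("s1", "a1", "b1"), ("s1", "a2", "b1"),
    ("s2", "a1", "b2"), ("s2", "a2", "b2"),
]
levels, kinds = describe_design(rows)
print(levels)
print(kinds)
```

     Here the factor in column two comes out as within groups and
     the factor in column three as between groups; the keys of
     kinds are zero-based column positions.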



Printed 11/22/82        October 16, 1980                        1

     Optionally, names of independent and dependent variables can
     be specified to anova, providing mnemonic labels for the
     output.  These names can be quite long but will be truncated
     in the output.  The names should have unique first
     characters because only the first character of each name is
     used in the F tables.  For example, in a three factor
     design, the call to anova:
                anova subjects group difficulty errors
     would give the name "subjects" to the random factor, "group"
     and "difficulty" to the next two, and "errors" to the
     dependent variable.  If names are not specified, the default
     name for the random factor is RANDOM, for the dependent
     variable, DATA, and for the independent variables, A, B, C,
     D, etc.
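
     The naming rules can be sketched in Python as follows.  The
     functions column_names and table_labels are hypothetical, and
     two behaviors are assumptions rather than documented facts:
     that names are given either for every column or for none, and
     that duplicate first characters are treated as an error.

```python
import string

def column_names(ncolumns, given=()):
    """Return one name per column: random factor, factors, dependent variable."""
    if ncolumns < 2:
        raise ValueError("need a random factor column and a data column")
    if given:
        if len(given) != ncolumns:       # assumption: all names or none
            raise ValueError("expected one name per column")
        return list(given)
    factors = list(string.ascii_uppercase[:ncolumns - 2])
    return ["RANDOM"] + factors + ["DATA"]

def table_labels(names):
    """First characters of the names, as used to label the F tables."""
    labels = [name[0] for name in names]
    if len(set(labels)) != len(labels):  # assumption: duplicates rejected
        raise ValueError("names should have unique first characters")
    return labels

print(column_names(4))
print(table_labels(["subjects", "group", "difficulty", "errors"]))
```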

     _O_u_t_p_u_t _F_o_r_m_a_t. The output from _a_n_o_v_a consists of cell means
     and standard deviations for each source not involving the
     random factor, a summary of design information, and an F
     table.  Sums of squares, degrees of freedom, mean squares, F
     ratio and significance level are all reported for each F
     test.

LLLLIIIIMMMMIIIITTTTAAAATTTTIIIIOOOONNNNSSSS AAAANNNNDDDD DDDDIIIIAAAAGGGGNNNNOOOOSSSSTTTTIIIICCCCSSSS
     The maximum number of factors anova allows is ten, and the
     maximum number of levels of factors is 100 (on some
     versions, 500), although both these limitations can be
     changed by resetting just one parameter and recompiling.

     _a_n_o_v_a checks its input to make sure that data are numerical,
     and that each datum is preceded by a consistent number of
     indexes (if not, _a_n_o_v_a will complain about "Ragged input").

     _a_n_o_v_a will not print its F tables if it cannot make sense
     out of the the input specification ("Unbalenced factor or
     Empty cell").  This can happen if there is missing data and
     is detected when the cell sizes of all the scores for a
     source do not add up to the expected grand total, the number
     of levels of the data column reported with the design infor-
     mation.  (This number is equal to the product of all the
     within subjects factors, and the number of levels of the
     random factor.) Unbalenced factors often are due to a typo-
     graphical error, but the empty cell size message can be due
     to an illegal nested design (only the random factor can be
     nested).
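
     The expected grand total described above can be computed as
     in the following Python sketch.  It is a simplified
     illustration, not the program's own check: it compares only
     the overall number of distinct cells with the expected total,
     and the names expected_total and is_balanced, like the sample
     cells, are hypothetical.

```python
from math import prod

def expected_total(n_random, within_levels):
    """Levels of the random factor times the levels of each within factor."""
    return n_random * prod(within_levels)

def is_balanced(cells, n_random, within_levels):
    """cells: set of distinct index tuples left after averaging replications."""
    return len(cells) == expected_total(n_random, within_levels)

# Two subjects, one within groups factor with two levels: four cells expected.
cells = {("s1", "easy"), ("s1", "hard"), ("s2", "easy"), ("s2", "hard")}
print(is_balanced(cells, 2, [2]))
# Dropping one line leaves an empty cell, so the totals no longer agree.
print(is_balanced(cells - {("s2", "hard")}, 2, [2]))
```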

     _a_n_o_v_a uses a temporary file to store its input, will com-
     plain if it is unable to create it ("Can't open temporary
     file").  This is usually because you are in some other
     user's directory that is "write protected." _a_n_o_v_a has to
     allocate space to compute means and for some designs it will
     be unable due to lack of available memory

EEEEXXXXAAAAMMMMPPPPLLLLEEEE




Printed 11/22/82        October 16, 1980                        2






ANOVA(1STAT)        UNIX Programmer's Manual         ANOVA(1STAT)



     Suppose you have done an experiment in which the two factors
     of interest were difficulty of material to be learned, and
     amount of knowledge a person brings with him or her into the
     experiment.  Each person is given two learning tasks, one
     easy and one hard, so task difficulty is a within groups
     factor.  Three people are experts in the task domain you
     have chosen, while four are novices, so knowledge is a
     between groups factor with unequal group sizes.  The
     dependent variable is the amount of time it takes a person
     to correctly work through a problem.  The data are formatted
     as follows: in column one is the name of the person; in
     column two is the level of the difficulty factor; in column
     three is the level of the knowledge factor; and in column
     four is the time, in seconds, to solve the problem.  A set
     of fictitious data follows.
                   lucy     easy     novice     12
                   lucy     hard     novice     22
                   fred     easy     novice     15
                   fred     hard     novice     16
                   ethel    easy     novice     10
                   ethel    hard     novice     15
                   ricky    easy     novice     25
                   ricky    hard     novice     30
                   ernie    easy     expert      7
                   ernie    hard     expert     10
                   bert     easy     expert     12
                   bert     hard     expert     18
                   bigbird  easy     expert     15
                   bigbird  hard     expert     14

     The call to anova to analyze the data would probably look
     like:

           anova subjects difficulty knowledge time < data

     "data" being the name of the file containing the above data.
     Notice that subjects is the random factor so indexes for
     that factor appear in the first column.  Data, here called
     "time", must appear in the last column.  You can see that
     "difficulty" is a within groups factor because each person
     appears at every level of that factor.  In the third column
     are indexes for "knowledge", the between groups factor, so
     identified because no person appears at more than one level
     of that factor.
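
     As a rough cross-check on the example data, the cell means
     and standard deviations for the difficulty by knowledge cells
     can be computed with a few lines of Python.  This sketch does
     not reproduce the layout of the program's output, and the use
     of the sample (n-1) standard deviation is an assumption.

```python
from collections import defaultdict
from statistics import mean, stdev

DATA = """\
lucy     easy     novice     12
lucy     hard     novice     22
fred     easy     novice     15
fred     hard     novice     16
ethel    easy     novice     10
ethel    hard     novice     15
ricky    easy     novice     25
ricky    hard     novice     30
ernie    easy     expert      7
ernie    hard     expert     10
bert     easy     expert     12
bert     hard     expert     18
bigbird  easy     expert     15
bigbird  hard     expert     14
"""

# Group the times by (difficulty, knowledge), ignoring the subject column.
cells = defaultdict(list)
for line in DATA.splitlines():
    subject, difficulty, knowledge, time = line.split()
    cells[(difficulty, knowledge)].append(float(time))

for (difficulty, knowledge), times in sorted(cells.items()):
    # stdev is the sample (n-1) standard deviation: an assumption here.
    print(f"{difficulty:5} {knowledge:7} n={len(times)} "
          f"mean={mean(times):6.2f} sd={stdev(times):5.2f}")
```

     The unequal cell sizes (four novices, three experts) show up
     directly in the n column.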

FFFFIIIILLLLEEEESSSS
     /tmp/anova.????

SSSSEEEEEEEE AAAALLLLSSSSOOOO
     dm(1),
     G. Perlman, Data analysis programs for the UNIX operating
     system, Behavior Research Methods and Instrumentation, 1980,
     12, 554-558.

AAAAUUUUTTTTHHHHOOOORRRR
     Gary Perlman

BBBBUUUUGGGGSSSS
     Huge designs, such as those with hundreds of levels of the
     random factor, cause some rounding errors to creep in.  This
     only seems to be true when the effects of a factor are
     negligible, so it appears this is not a practical problem.