Analysis of Variance (ANOVA) Designs

ANOVA - Comparing More Than Two Group Means
As we've seen, t tests
compare the means of two independent groups or two related groups of
scores. Many research designs involve more than just two groups. One
approach for comparing more than two means is to test each pair of
means with a t
test. Not only does this approach involve many tests, but it also
compounds the Type I error rate. For example, if we have three groups,
Morning, Afternoon, and Evening, whose scores we want to compare, we
could compare 1) Morning with Afternoon, 2) Morning with Evening, and
3) Afternoon with Evening. This approach would involve three separate
tests, and it would result in an overall error rate of three times that
of a single test. Instead of this approach, statisticians use an
overall test, called an ANOVA.
ANOVA stands for ANalysis Of VAriance and
is described as an omnibus test because it tests all differences
between the separate groups at once. The variances being analyzed are
the deviations from the mean scores, which we used to calculate the
standard deviation (itself a measure of variation, of course). Review
the calculation for the variance and standard deviation before
proceeding. Understanding that calculation will help you understand
the following technique.
There are many types of
ANOVAs, which depend on the specifics of the research design. The first
type we will consider is called simple or oneway ANOVA. A oneway ANOVA is used to compare the
means of three or more groups. Think of it as an extension
of an independent t
test, or think of the independent t
test as a special case of an ANOVA for two groups.
The heart of an ANOVA test is something called the sum of squares. This is where
recalling the calculation of the variance and standard deviation is
helpful. Remember that we determined deviations from the mean for each
score in a distribution. This process resulted in positive and negative
deviations, which, due to a property of the mean, always sum to
0. In order to calculate an average deviation, we squared and then
summed the deviations. This is the sum of squares (SS) - think of it as
the sum of squared deviations to remember where the squares originated.
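As a quick sketch, the sum of squares can be computed directly from the deviations. The scores here are hypothetical, chosen only to illustrate the calculation:

```python
# Hypothetical scores used only to illustrate the sum of squares (SS).
scores = [4, 6, 8, 10]
mean = sum(scores) / len(scores)             # mean = 7.0
deviations = [x - mean for x in scores]      # these always sum to 0
ss = sum(d ** 2 for d in deviations)         # sum of squared deviations = 20.0
```

Note that the raw deviations cancel out to 0, which is exactly why squaring is needed before summing.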
In this section, we will be considering three
different types of sums of squares.
Explained variance plus unexplained
variance equals total variance, or stated another
way, the between groups sum of
squares plus the within groups sum of squares equals the
total sum of squares. In a t test, the
between groups sum of squares is represented by the difference between
the two group means and the within groups sum of squares is represented
by the variances of the two groups.
- The total sum of squares
represents the total amount of variation in the combined distribution
of all raw scores.
- The between groups sum of squares
represents the variation that is related to group membership - the
variance that is explained by the characteristics that define the
separate groups.
- The within groups sum of squares
represents the variation within the separate groups, which is unexplained variance and is
also called error.
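A short sketch with made-up groups demonstrates the partition described above - the between and within groups sums of squares add up to the total sum of squares:

```python
# Three hypothetical groups, chosen only to demonstrate the partition
# SS_between + SS_within = SS_total.
groups = [[2, 4], [6, 8], [10, 12]]
all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)

# Total: squared deviations of every score from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
# Between: squared deviations of each group mean from the grand mean,
# weighted by group size.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within: squared deviations of each score from its own group mean.
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
```

For these made-up scores, ss_between + ss_within equals ss_total exactly, which mirrors the identity stated above.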
The test statistic for an ANOVA is called the F
ratio, which compares the between groups sum of squares
with the within groups sum of squares. The between groups sum
of squares is divided by the degrees of freedom associated with the
number of group means (e.g., if there are 3 groups, then there are 2
degrees of freedom) to obtain the
mean sum of squares between groups. Likewise, the within
groups sum of squares is divided by the degrees of freedom associated
with the group sample size (n). For example, when comparing three
groups of 10 scores, there are 27 degrees of freedom, which is 9
degrees of freedom from each group. Dividing the within groups sum of
squares by its associated degrees of freedom gives the mean sum of squares within groups.
The F ratio is
the mean between groups sum of squares divided by the
mean within groups sum of squares. This test statistic is
used to determine statistical significance of the result. SPSS and
Excel report the associated p-value directly; Appendix D in the text
lists some example F values for specific α levels and degrees
of freedom. Notice that there are two degrees of freedom parameters in
the table - one for the numerator of the F ratio and another for the
denominator. The effect size for an ANOVA is called eta squared, η²,
and is calculated by dividing the between
sum of squares by the total
sum of squares to obtain the percentage of variance that is explained
by group membership. Similar to the coefficient of
determination, the amount of unexplained variance is 1 - the amount of explained variance.
The assumptions for an ANOVA are
extensions of the assumptions for a t
test, namely 1)
independent observations, 2) normally distributed data, and 3)
homogeneity of variance among all groups. There is a form of Levene's
test to assess whether the variances are similar.
Let's step through an example. Suppose that you are teaching three sections
of the same course, designated Morning, Afternoon, and Evening. For the
sake of this example, each section has 10 students. You collected data
from your students using an instrument that measures their level of
activity in the class. Here are the data:
1. Set up the hypotheses.
The null hypothesis is that the population means
are equal. Symbolically, H0: μMorning = μAfternoon = μEvening. The
alternative (research) hypothesis is never directional for an ANOVA. In
this case, the alternative hypothesis is that the three means are not
all equal:
MeanMorning ≠ MeanAfternoon ≠ MeanEvening
As we will see, this form
of the alternative hypothesis will not identify the source of specific
differences if they are found to exist.
2. Set the alpha level at α =
.05. An ANOVA is always non-directional, based on the form of the
alternative hypothesis.
3. Select the
appropriate test statistic (see the decision tree inside the back cover
of the text). The appropriate test to use is a oneway ANOVA.
The test statistic will be F.
4. Check the test's assumptions and then compute
the test statistic based on the sample data (obtained value).
Independence is a result of the
data collection process.
Normality is checked by inspecting
the histograms and skewness ratios.
Homogeneity of variances
is checked with Levene's test (see the SPSS output below).
The test statistic is calculated and reported by SPSS or Excel. This is how
the F ratio is determined:
First, a grand mean is calculated.
The 30 scores shown above are added and the sum is divided by 30 to obtain
a grand mean of 6.30.
Second, the between groups sum of squares is the sum of the squared differences of
each group mean from the grand mean multiplied by the sample size.
[Note: ^2 represents squaring and * represents multiplication]
Between groups sum of squares = 10 * [(6.60-6.30)^2 + (4.90-6.30)^2 +
(7.40-6.30)^2], which equals 10 * [0.09 + 1.96 + 1.21] or 32.60
Third, the within groups sum of squares is the sum of each score's squared
deviation from its group mean. In other words, we subtract 6.60, 4.90,
or 7.40 from
each score listed above, square that result, and then add up all of the
squares. Here are the squared deviations from the group means:
Within groups sum of squares = 8.40 + 4.90 + 8.40, which equals 21.70
Next, the two mean sums of squares are calculated by dividing by the degrees
of freedom. Because there are three groups, the between groups degrees
of freedom is 3-1 or 2. Because each group has 10 observations, the
within groups degrees of freedom is 3 * (10 -1) or 3*9 which is 27.
The mean between groups sum of squares is 32.60/2 or 16.30.
The mean within groups sum of squares is 21.70/27 or 0.8037.
The F ratio is 16.30/0.8037 or 20.28.
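The arithmetic above can be verified with a few lines of code, using the sums of squares and group sizes from the example:

```python
# Reproducing the worked F ratio from the sums of squares above.
ss_between, ss_within = 32.60, 21.70
k, n = 3, 10                            # 3 groups of 10 scores each
df_between = k - 1                      # 2
df_within = k * (n - 1)                 # 27
ms_between = ss_between / df_between    # 16.30
ms_within = ss_within / df_within       # about 0.8037
f_ratio = ms_between / ms_within        # about 20.28
```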
5. Determine
the critical value for the test statistic.
6. Compare
the obtained value with the critical value.
7 to 8. Either
reject or retain the null hypothesis based on the following:
If the obtained value > critical value, then reject the null hypothesis -
evidence supports the research hypothesis.
If the obtained value <= critical value, then retain the null
hypothesis - evidence does not support the research hypothesis.
Alternative to #5-7 - for use with SPSS output:
Compare the reported
p-value (Sig.) with the preset alpha level.
If the p-value < alpha level, then reject the null hypothesis - evidence
supports the research hypothesis. There is a small chance of committing
a Type I error.
If the p-value >= alpha level, then retain the null
hypothesis - evidence does not support the research hypothesis. The
chance of committing a Type I error is too large.
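The p-value decision rule above amounts to a one-line comparison; here is a minimal sketch:

```python
# Minimal sketch of the p-value decision rule described above.
def decide(p_value, alpha=0.05):
    """Reject the null hypothesis when p < alpha; otherwise retain it."""
    return "reject" if p_value < alpha else "retain"
```

For example, a reported p-value of .001 against α = .05 leads to rejecting the null hypothesis.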
In the first table of the SPSS output below, find the group means and the
grand mean described above. The second table contains the results of
Levene's test, which establishes the equality of the group
variances. The third table contains the ANOVA results, which include
the three sums of squares, the two mean sums of squares, the F ratio,
and the observed p-value. Because the p-value is less than .05, we
reject the null hypothesis and conclude that the three group means are
different - but how are they different? For this we need to conduct
another test, called pairwise comparisons. There are numerous versions
of this test. We'll use the one that the author recommends, the
Bonferroni test. But before that, look over the Excel results,
generated using the ANOVA: Single Factor option on the Data Analysis
dialog box, which is accessed from the Data Analysis command on the
Tools menu (see the earlier note about installing these tools).
Here are the Excel ANOVA results:
Here are the results for the pairwise comparisons. The Bonferroni test identifies the
specific source of the differences found by the overall ANOVA. The
test reports that the mean score of the Afternoon section (4.90) is statistically significantly
different from both the Morning (6.6) and Evening (7.4) sections, but
that the difference between Morning and Evening sections' scores is not
statistically significant. Here is a picture of the pattern of mean scores:
The effect size is
computed from the ANOVA table as the percentage of explained variance,
denoted by η²,
which is 32.6/54.3
or about 60.0%, representing a large effect. The determination of the
relative strength of an effect depends on the field of study - general
guidelines should be weighed against findings of other researchers.
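The effect-size arithmetic can be checked directly from the ANOVA sums of squares:

```python
# Eta squared for the worked example: SS_between / SS_total.
ss_between, ss_within = 32.60, 21.70
ss_total = ss_between + ss_within       # 54.30
eta_squared = ss_between / ss_total     # about .60: 60% of variance explained
unexplained = 1 - eta_squared           # about .40: 40% unexplained
```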
To practice understanding the oneway ANOVA and explained and unexplained
variance, use the interactive demonstration and choose the One-Way model.
Factorial ANOVA - Comparing
Group Means Based on Combinations of Independent Variables
Let's take the application of ANOVA just one step further. Suppose that you
not only want to compare sections of a course but you also want to
compare the level of activity of majors and non-majors in the course.
Now, instead of just three groups, you have six groups - three sections
and two types of students within each section. Again to make the
example more straightforward, we'll assume equal numbers of majors and
non-majors within the sections. That is, there are exactly five majors
and five non-majors in each of the three sections. This uniformity is
not a requirement, but it does make the results easier to understand.
Before conducting the comparison of the six group means, let's introduce a few
new terms. First of all, this type of ANOVA is called a factorial ANOVA, and in
particular for the example just described, a two factor or two-way
ANOVA. A factor
is an independent variable that is used to categorize the observations.
When we compare the means, there are two types of effects that we can
observe. They are called main effects and interaction effects. Main effects
are due to the factors themselves. In this example, there is a main
effect due to Section and another main effect due to Type of Student.
Interaction effects are due to the combination of the two factors. For example, if we found
that majors are more actively learning in the morning section and
non-majors are more actively learning in the afternoon section, there
would be an interaction effect. Interactions are designated by a
combination of factors, such as Section * Type.
In the ANOVA table, the result of the factorial structure is a separation of the
between groups sum of squares. Because there are more categories of
students, we have more ways to determine where the differences might
arise. In the following example, which uses the same data that we used
earlier, you'll notice that the overall sum of squares is the same.
The various F ratios and effect sizes are calculated in the same
manner as they were in the oneway ANOVA. The assumptions are the
same as the oneway ANOVA's as well - there are just more subgroups to check.
Here are the data seen earlier, but
now divided by Type of Student as well.
Here is the Excel output from the ANOVA: Two-Factor with Replication
command. Note when using this command, the column and row containing
the headings are required in the range. The values in the Total column
were reported in the previous oneway ANOVA.
Here is the ANOVA
table with the sums of squares for the two main effects and for the
interaction effect. Notice that the differences due to the Section are statistically significant,
which is what we found in the oneway ANOVA, and that the differences
due to Type of Student are statistically significant, but the
interaction effect is not statistically significant. Try computing the
effect sizes for the statistically significant effects. What percentage
of variance is left unexplained?
Here is the same analysis from SPSS. First, here are the descriptive statistics for
the groups as well as Levene's test for the equality of variances.
Then here are the results of
the ANOVA. Notice that SPSS includes additional information in the output. For
the purposes of our example, we just need to focus on the rows labeled
Section, Type, Section * Type, Error, and Corrected Total. Compare
these results to the Excel output displayed above.
Here is a picture that illustrates the pattern of means. Because
the two lines do not cross each other, there is no interaction effect.
In this graph, we can see that no matter which section they were in,
the mean scores of Majors exceeded those of Non-Majors. This pattern
represents a main effect. Also, by estimating the midpoints between the
two mean scores for each section, we can see that Morning and Evening
mean scores are higher than the Afternoon mean score.
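The "parallel lines" idea can be sketched numerically. The cell means below are hypothetical (the actual values appear in the output above); the point is that an equal Major/Non-Major gap in every section means parallel lines and no interaction:

```python
# Hypothetical cell means illustrating a Type-of-Student main effect
# with no Section * Type interaction (parallel lines in the plot).
cell_means = {
    "Morning":   {"Major": 7.2, "NonMajor": 6.0},
    "Afternoon": {"Major": 5.5, "NonMajor": 4.3},
    "Evening":   {"Major": 8.0, "NonMajor": 6.8},
}
gaps = [m["Major"] - m["NonMajor"] for m in cell_means.values()]
# If the gap is the same in every section, the lines are parallel.
no_interaction = max(gaps) - min(gaps) < 1e-6
```

If the gaps differed substantially from section to section, the lines would diverge or cross, which is the visual signature of an interaction effect.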
To practice understanding two-way ANOVAs and main effects and interaction
effects, use the interactive demonstration and choose the Two-Way model.
Correlation and Regression
In the previous module, correlation was introduced as a descriptive
statistic that describes the nature of a relationship between two
variables. There are various types of correlations depending on the
specific characteristics of the two variables being compared. One
commonly used correlation is the Pearson product-moment correlation
coefficient, which measures the extent of a linear relationship between
two interval or ratio-level variables. The Pearson correlation,
represented by r, ranges from -1 to +1. The magnitude of r indicates
the degree to which the pattern of paired points represents a line. The
sign of r (- or +) indicates the slope of the line that represents the
relationship - a positive r indicates a direct relationship and a negative
r indicates an indirect relationship. When r = 0, there is no linear
relationship between the two variables. The characteristics of
relationships that are represented by a correlation must be checked by
inspecting a scatterplot of pairs of points.
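A minimal sketch of computing Pearson's r, with made-up paired scores (points lying exactly on a rising line yield r = +1):

```python
import math

# Pearson's r for hypothetical paired scores lying exactly on a line.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]                    # y = 2x + 1, a perfect direct relationship
n = len(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
r = cov / (sx * sy)                     # +1.0 for these data
```

Reversing the direction of y would flip the sign of r to -1, and scattering the points away from the line would shrink its magnitude toward 0.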
For example, here is a scatterplot of 2006 and 2007 API scores for San
Francisco elementary schools.
Here is a
correlation matrix of various characteristics of the San Francisco
elementary schools. Meals represents the percentage of students who are
eligible for free or reduced-price lunch. P_EL represents the percentage of
English language learners in the schools. Not-HSG through Grad_Sch
represent percentages of parent education levels. Each cell in
the matrix includes the Pearson r, significance level (p-value), and N,
which represents the number of schools.
As shown in the matrix above, correlation can be used in an inferential
test. The second number in each cell of the matrix is the level of
statistical significance (p-value) associated with the inferential test
of the correlation value.
The assumptions for
conducting an inferential test of correlation rely on the concept of a
conditional distribution. A conditional distribution can be thought of
as either horizontal or vertical bands of a scatterplot. The
conditional distribution of Y given X is the distribution of Y values
for any given X value. The actual assumptions for the test are the
following. The assumptions can be checked by inspecting the scatterplot,
identifying outliers, and analyzing skewness ratios.
- Normality - both variables must
be normally distributed, as must each conditional distribution.
- Homoscedasticity - the
standard deviations for each conditional distribution are equal.
- Linearity - the means of each conditional distribution lie on a straight line.
The null hypothesis for the
inferential test is:
H0: ρ = 0,
that is, the
correlation between the two variables in the population is equal to 0,
or stated another way, the two variables are not correlated.
The alternative hypothesis is one
of the following:
H1: ρ ≠ 0
(the non-directional alternative hypothesis)
H1: ρ < 0 (the correlation is indirect)
H1: ρ > 0 (the correlation is direct)
The sampling distribution associated with the inferential test is Student's t distribution,
which is based on a function of r and n, namely
t = r * sqrt(n - 2) / sqrt(1 - r^2)
The associated degrees of
freedom for the test are n - 2. If the p-value obtained is less than the
preset value of α (usually .05), then the null hypothesis of no
correlation between the variables is rejected in favor of the
alternative hypothesis.
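The t statistic is a function of r and n, t = r * sqrt(n - 2) / sqrt(1 - r^2), which can be wrapped in a small helper (the r and n values below are hypothetical):

```python
import math

# t statistic for testing a correlation: t = r * sqrt(n-2) / sqrt(1 - r^2).
def t_for_r(r, n):
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical example: r = .50 observed for n = 27 pairs (df = 25).
t = t_for_r(0.50, 27)       # about 2.89
```

Notice that for a fixed r, the t value grows with n, which is why even small correlations become statistically significant in large samples.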
In the matrix shown
above, the correlation between api06 and api07 of r = .938 is
statistically significant at the .05 level, and the correlation between
api07 and some_col (percentage of parents who have some college
education but not a degree) of r = -.220 is not statistically
significant. For correlations that are found to be statistically
significant, an effect size of r² can be
computed to report the amount of shared variance between the two
variables. This effect size measure is called the coefficient of determination.
In the example of api06 and api07, .938² =
.8798, which indicates that 88.0% of the variance is shared between the
two sets of API scores. Caution must be applied when statistically
significant correlations are found for large samples. A statistically
significant correlation does not necessarily indicate a meaningful relationship.
When a statistically and practically significant
correlation is found between two variables, it may be appropriate and
worthwhile to consider predicting one of the variables based on the
other variable. The variable used in the prediction is called the predictor
or independent variable.
The variable being predicted is called the criterion
or dependent variable. The
process used in the prediction is called simple linear regression and
uses the following linear equation:
Y' = bX + a
where Y' is the predicted variable, b is the slope of the line, X is the
independent variable, and a is the y-intercept (the point at which the
line crosses the vertical axis). The equation parameters, a and b, can
be derived as detailed in the text, or can be determined
using the correlation between X and Y and the fact that the
line passes through the point (mean of X, mean of Y).
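One standard way to derive the parameters is a sketch like the following (with made-up data): the slope is b = r * (SDy / SDx), and because the line passes through the means, a = mean(Y) - b * mean(X).

```python
import math

# Hypothetical data lying exactly on Y = 2X + 1.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / n)
sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / n)
r = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n * sx * sy)
b = r * sy / sx               # slope = 2.0
a = my - b * mx               # intercept = 1.0 (line passes through the means)
```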
The magnitude (absolute value) of the correlation coefficient
determines the accuracy of the prediction. This accuracy is based on
the difference between the observed Y value and the predicted Y' value
(Y - Y'), which is called a deviation, discrepancy, or residual. An
overall measure of the accuracy of the prediction is also given by
the standard error of
estimate, which is the square root of the sum of
these deviations squared and divided by the sample size.
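A sketch of the standard error of estimate as defined above, dividing by the sample size (some texts divide by n - 2 instead). The data and regression line here are hypothetical:

```python
import math

def predict(x):
    # Hypothetical regression line Y' = 2X + 1.
    return 2 * x + 1

x = [1, 2, 3, 4]
y = [3.1, 4.9, 7.2, 8.8]                 # observed Y values (made up)
residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
see = math.sqrt(sum(e ** 2 for e in residuals) / len(residuals))
```

The closer the observed points lie to the line, the smaller the residuals and the smaller the standard error of estimate.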
Here is an example of a regression analysis
in which 2007 API scores for San Francisco elementary schools are
predicted using the 2006 API scores.
Note that the Model Summary
table displays R, R², and the standard error of
estimate, which was mentioned earlier. R is the correlation between Y'
and Y, which, for simple linear regression with one predictor variable,
is equal to the absolute value of r. The adjusted R²
corrects for the effects of small sample sizes, if any.
The ANOVA table shown below reports the sums of squares for the predicted
variable and the error (residuals/deviations/discrepancies). It is
important to note that the decimal places in the sum of squares column
are not aligned - use caution when analyzing these values. The
interpretation of this table is the following. First of all, the large
value of F and small significance (p-value) indicates that the fit of
the regression model to the data is good - the null hypothesis of no
relationship between the data and the regression model is rejected. The
second important source of information is the sum of squares
column. The amount of
explained variance is given by the ratio of the Regression
sum of squares to the Total sum of squares, or 439291.6 divided by
499042.2, which equals .88 or 88%. This coincides with the value of R²
listed in the previous table. Both of these measures are effect sizes
for the regression model. Consequently, the remaining unexplained variance, which
is given by the ratio of the Residual sum of squares to the Total sum
of squares, is 12%.
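The ratio can be confirmed directly from the sums of squares in the table:

```python
# Explained and unexplained variance from the regression ANOVA table.
ss_regression = 439291.6
ss_total = 499042.2
r_squared = ss_regression / ss_total    # about .88, matching the Model Summary
unexplained = 1 - r_squared             # about .12
```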
The last table provides the regression equation parameters, a and b, as well as
two t tests
about those parameters. In the model, the regression equation is
Y' = .962 X + 34.988
The first t test assesses
whether the y-intercept, a, is different from 0. In this case, based on
the standard error, the value of the y-intercept is not statistically
significantly different than 0. This result may have meaning in
particular settings. In this example, it has no real importance. The
second t test
assesses whether the slope is different from 0. This test is more
important than the previous one because if the slope is likely to be 0,
then so is the correlation and, consequently, the prediction would not
be very accurate. In this case, the slope of .962 is statistically
significantly different from 0 because the p-value associated with t = 22.194 is less than .05.
Discrepancy, Leverage, and Influence of Observations
Once a regression
equation with adequate prediction accuracy has been determined, more
specific analysis of individual points that were used to develop the
regression model can be performed. Three characteristics will be
introduced: Discrepancy (also known as deviation or residual),
Leverage, and Influence.
Discrepancy (deviation or residual) is the vertical distance between an
observed point and its predicted value, which is the point
directly above or below it on the regression line. Observed points that
fall far from the regression line have high discrepancy. Observed
points that fall on the regression line have no discrepancy.
Leverage is the horizontal distance that a
point falls from the center (mean) of the predictor (independent) variable.
Observed points far from the mean for the predictor variable (i.e., the
extreme scores for X) have high leverage. Those near the mean of X have
low leverage. The combination of discrepancy and leverage determines a point's
influence. Those points with both high discrepancy and
high leverage have a greater degree of influence on the parameters for
the regression equation. Eliminating the points with high influence may
improve the accuracy of the regression model. These highly influential
points may indicate unique characteristics that merit more careful examination.
Here are some examples of plots of
the regression line, discrepancy, leverage, and influence for the 2006
and 2007 API scores for San Francisco elementary schools.
In the scatterplot of the two API scores below, the regression line has
been superimposed to represent the predicted Y' scores. Notice that
most of the observed points fall fairly close to the line. Some of the
lower API 2007 scores fail to match the pattern.
The discrepancies are the vertical
distances between the observed points and the line. Notice that most
observed points are within a distance of 50 score points from the line.
One observation is over 150 points below the line.
As mentioned earlier, leverage is
the horizontal distance from the mean of X (API06). The mean API score
for these schools in 2006 was 774 points. The points with the most
leverage are those scores well below and above 774.
The combination of discrepancy and leverage indicates the influence a
point has on the regression model. Note here that the two
points with the most influence are the circled ones. Closer
investigation of the data reveals that Cesar Chavez Elementary
went from an API score of 735 in 2006 to an API score of 596 in 2007,
when the school's predicted score was 742. John Muir
Elementary went from an API score of 615 in 2006 to an API
score of 573 in 2007, when the school's predicted score was 627. The
observation for Chavez Elementary has very high discrepancy and
relatively low leverage, because its 2006 API score was near the mean
of 774 points. The observation for Muir Elementary has moderate
discrepancy but high leverage, because its 2006 API score was among the
lowest. Inspecting the discrepancy plot will show two other schools
whose predicted scores were off by over 50 points, but because their
2006 API scores are closer to the mean than Muir's was, their influence
on the regression model is less.
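The discrepancy and leverage figures for the two schools discussed above can be reproduced from the regression equation and the 2006 mean; the rounded values are consistent with the predicted scores reported in the text:

```python
# Discrepancy and leverage for the two influential schools, using the
# regression equation Y' = .962 X + 34.988 and the 2006 mean of 774.
def predict(api06):
    return 0.962 * api06 + 34.988

schools = {"Chavez": (735, 596), "Muir": (615, 573)}   # (API06, API07)
mean_api06 = 774

results = {}
for name, (x, y) in schools.items():
    results[name] = {
        "discrepancy": abs(y - predict(x)),   # vertical distance from the line
        "leverage": abs(x - mean_api06),      # horizontal distance from the mean
    }
```

Chavez shows high discrepancy (about 146 points) with low leverage (39 points from the mean), while Muir shows moderate discrepancy (about 54 points) with high leverage (159 points), matching the pattern described above.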