A One-Way Analysis of Variance (ANOVA) is a way to test the equality of three or more population means at one time by using sample variances, under the following assumptions:

- The data involved must be interval or ratio level data.
- The populations from which the samples were obtained must be normally or approximately normally distributed.
- The samples must be independent.
- The variances of the populations must be equal (i.e., homogeneity of variance).

The null hypothesis is that all population means are equal ($H_0: \mu_1 = \mu_2 = \cdots = \mu_k$); the alternative hypothesis is that at least one mean is different.

In the case where one is dealing with $k \ge 3$ samples all of the same size $n$, the calculations involved are much simpler, so let us consider this scenario first.

The strategy behind an ANOVA test relies on estimating the common population variance in two different ways: 1) through the mean of the sample variances, called the **variance within samples** and denoted $s^2_w$, and 2) through the variance of the sample means, called the **variance between samples** and denoted $s^2_b$.

When the population means are all equal, both quantities estimate the same $\sigma^2$, so the variance of the sample means (scaled as described below) will be comparable to the mean of the sample variances. When at least one mean is different from the others, the sample means spread further apart, and the variance of the sample means becomes large relative to the mean of the sample variances.

Consequently, when at least one mean differs from the others, the ratio of these estimates $$F = \frac{s^2_b}{s^2_w}$$ which follows an $F$-distribution when the null hypothesis is true, will tend to be large (i.e., somewhere in the right tail of the distribution).

To turn the variance of the sample means into an estimate of $\sigma^2$, recall that the Central Limit Theorem tells us that $$\sigma_{\overline{x}} = \frac{\sigma}{\sqrt{n}}$$ Squaring both sides and solving for the variance, $\sigma^2$, we find $$\sigma^2 = n\sigma^2_{\overline{x}}$$ Thus, we can estimate $\sigma^2$ with $$s^2_b = n s^2_{\overline{x}}$$ where $s^2_{\overline{x}}$ is the sample variance of the $k$ sample means.
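As a quick numerical illustration of $s^2_b = n s^2_{\overline{x}}$, the following minimal Python sketch uses the standard-library `statistics` module. The three samples are made up for illustration, and `variance_between_equal_n` is a hypothetical helper name; it only applies when every sample has the same size $n$.

```python
from statistics import mean, variance

# Hypothetical data: k = 3 samples, each of the same size n = 4.
samples = [
    [4, 5, 6, 5],
    [6, 7, 8, 7],
    [8, 9, 10, 9],
]


def variance_between_equal_n(samples):
    """Estimate sigma^2 as n times the variance of the sample means (equal n only)."""
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same size n")
    sample_means = [mean(s) for s in samples]
    # statistics.variance divides by (k - 1): the sample variance of the k means.
    return n * variance(sample_means)


# The sample means are 5, 7, 9 with variance 4, so s_b^2 = 4 * 4 = 16.
print(variance_between_equal_n(samples))
```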

Calculating the mean of the sample variances is straightforward: we simply average $s^2_1, s^2_2, \ldots, s^2_k$. Thus, $$s^2_w = \frac{\sum s^2_i}{k}$$

Given the construction of these two estimates for the common population variance, their quotient $$F = \frac{s^2_b}{s^2_w}$$ gives us a test statistic that, when the null hypothesis is true, follows an $F$-distribution with $k-1$ degrees of freedom associated with the numerator and $(n-1) + (n-1) + \cdots + (n-1) = k(n-1) = kn - k = N - k$ degrees of freedom associated with the denominator, where $N = kn$ is the total number of observations.
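The equal-size test can be sketched in a few lines of Python with the standard-library `statistics` module. The data and the helper name `equal_n_f_test` are hypothetical; the function averages the sample variances for $s^2_w$, scales the variance of the sample means by $n$ for $s^2_b$, and reports both degrees of freedom.

```python
from statistics import mean, variance

# Hypothetical data: k = 3 samples, each of size n = 4.
samples = [
    [4, 5, 6, 5],
    [6, 7, 8, 7],
    [8, 9, 10, 9],
]


def equal_n_f_test(samples):
    """Return (F, numerator df, denominator df) for k samples of a common size n."""
    k = len(samples)
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same size n")
    s2_b = n * variance([mean(s) for s in samples])  # variance between samples
    s2_w = mean([variance(s) for s in samples])      # variance within samples
    N = k * n                                        # total number of observations
    return s2_b / s2_w, k - 1, N - k


F, df_num, df_den = equal_n_f_test(samples)
print(F, df_num, df_den)
```

For these samples, $s^2_b = 16$ and $s^2_w = 2/3$, so $F \approx 24$ with $2$ and $9$ degrees of freedom.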

Things become more complicated when the sample sizes are not all the same, but the principle is the same. The process is outlined below. In the following, lower case letters apply to the individual samples and capital letters apply to the entire set collectively. That is, $n$ is one of many sample sizes, but $N$ is the total sample size.

The **grand mean** of a set of samples is the total of all the data values divided by the total sample size (equivalently, the average of the sample means weighted by their sample sizes).
$$\overline{X}_{GM} = \frac{\sum x}{N} = \frac{\sum n\overline{x}}{\sum n}$$
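To see that the two expressions for the grand mean agree, here is a small Python sketch with hypothetical samples of unequal sizes ($n = 3, 4, 5$); the function names are illustrative only.

```python
from statistics import mean

# Hypothetical samples of unequal sizes (n = 3, 4, 5).
samples = [
    [3, 5, 4],
    [6, 8, 7, 7],
    [9, 10, 11, 10, 10],
]


def grand_mean_pooled(samples):
    """Total of all data values divided by the total sample size N."""
    all_values = [x for s in samples for x in s]
    return sum(all_values) / len(all_values)


def grand_mean_weighted(samples):
    """Sample means weighted by their sizes: sum(n * xbar) / sum(n)."""
    return sum(len(s) * mean(s) for s in samples) / sum(len(s) for s in samples)


print(grand_mean_pooled(samples), grand_mean_weighted(samples))  # 7.5 7.5
```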

The **total variation** (not variance) is the sum of the squares of the differences between each data value and the grand mean. It has $N-1$ degrees of freedom.
$$SS(T) = \sum (x - \overline{X}_{GM})^2$$
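A minimal Python sketch of SS(T), using hypothetical samples of unequal sizes; `ss_total` is an illustrative name. Note that every data value, not just each sample mean, is compared with the grand mean.

```python
# Hypothetical samples of unequal sizes (n = 3, 4, 5).
samples = [
    [3, 5, 4],
    [6, 8, 7, 7],
    [9, 10, 11, 10, 10],
]


def ss_total(samples):
    """SS(T): sum of squared deviations of every data value from the grand mean."""
    values = [x for s in samples for x in s]
    grand_mean = sum(values) / len(values)
    return sum((x - grand_mean) ** 2 for x in values)


N = sum(len(s) for s in samples)
print(ss_total(samples), "with", N - 1, "degrees of freedom")  # 75.0 with 11 degrees of freedom
```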

The **between group variation**, due to differences among the sample means, is denoted SS(B) for **sum of squares between groups**. If the sample means are close to each other (and therefore to the grand mean), this will be small. There are $k$ samples involved, with one data value for each sample (the sample mean), so there are $k-1$ degrees of freedom.
$$SS(B) = \sum n(\overline{x} - \overline{X}_{GM})^2$$

The variance between the samples, $s^2_b$, is also denoted by MS(B) for **mean square between groups**. This is the between group variation divided by its degrees of freedom.
$$s^2_b = MS(B) = \frac{SS(B)}{k-1}$$
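A sketch of the between-groups computation in Python, with hypothetical data and an illustrative function name `between_groups`. Each squared deviation of a sample mean from the grand mean is weighted by that sample's size $n$.

```python
from statistics import mean

# Hypothetical samples of unequal sizes (n = 3, 4, 5).
samples = [
    [3, 5, 4],
    [6, 8, 7, 7],
    [9, 10, 11, 10, 10],
]


def between_groups(samples):
    """Return (SS(B), df, MS(B)) for a list of samples."""
    k = len(samples)
    N = sum(len(s) for s in samples)
    grand_mean = sum(x for s in samples for x in s) / N
    # Weight each squared deviation of a sample mean by that sample's size.
    ss_b = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    df_b = k - 1
    return ss_b, df_b, ss_b / df_b


print(between_groups(samples))  # (69.0, 2, 34.5)
```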

The **within group variation**, due to differences within individual samples, is denoted SS(W) for **sum of squares within groups**. Each sample is considered independently, so no comparison between samples is involved. The degrees of freedom is the sum of the individual degrees of freedom of the samples. Since each sample has degrees of freedom equal to one less than its sample size, and there are $k$ samples, the total degrees of freedom is $k$ less than the total sample size: $df = N - k$. $$SS(W) = \sum df \cdot s^2 = \sum (n-1)s^2$$

The variance within samples, $s^2_w$, is also denoted by MS(W) for **mean square within groups**. This is the within group variation divided by its degrees of freedom. It is the weighted average of the sample variances, weighted by their degrees of freedom.
$$s^2_w = MS(W) = \frac{SS(W)}{N-k}$$
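Similarly, here is a sketch of SS(W) and MS(W) in Python (hypothetical data; `within_groups` is an illustrative name). Because `statistics.variance` returns the sample variance $s^2$ with an $n-1$ denominator, multiplying by $n-1$ recovers each sample's sum of squared deviations.

```python
from statistics import variance

# Hypothetical samples of unequal sizes (n = 3, 4, 5).
samples = [
    [3, 5, 4],
    [6, 8, 7, 7],
    [9, 10, 11, 10, 10],
]


def within_groups(samples):
    """Return (SS(W), df, MS(W)), where SS(W) = sum of (n - 1) * s^2 over the samples."""
    k = len(samples)
    N = sum(len(s) for s in samples)
    ss_w = sum((len(s) - 1) * variance(s) for s in samples)
    df_w = N - k
    return ss_w, df_w, ss_w / df_w


ss_w, df_w, ms_w = within_groups(samples)
print(ss_w, df_w, ms_w)
```

For these samples each group contributes 2 to SS(W), so SS(W) = 6 on 9 degrees of freedom and MS(W) = 6/9 ≈ 0.667, which is the df-weighted average of the sample variances 1, 2/3, and 1/2.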

Here again we find an $F$ test statistic by dividing the between group variance by the within group variance -- and as before, the degrees of freedom for the numerator are $(k-1)$ and the degrees of freedom for the denominator are $(N-k)$. $$F = \frac{s^2_b}{s^2_w}$$

All of this sounds like a lot to remember, and it is. However, the following table might prove helpful in organizing your thoughts: $$\begin{array}{l|c|c|c|c|} & \textrm{SS} & \textrm{df} & \textrm{MS} & \textrm{F}\\\hline \textrm{Between} & SS(B) & k-1 & \displaystyle{s^2_b = \frac{SS(B)}{k-1}} & \displaystyle{\frac{s^2_b}{s^2_w} = \frac{MS(B)}{MS(W)}}\\\hline \textrm{Within} & SS(W) & N-k & \displaystyle{s^2_w = \frac{SS(W)}{N-k}} & \\\hline \textrm{Total} & SS(T) = SS(B) + SS(W) & N-1 & & \\\hline \end{array}$$
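The whole table can be assembled in one pass. The sketch below is a hypothetical helper (`anova_table` is not a library function) that also checks the identity $SS(T) = SS(B) + SS(W)$ numerically. If SciPy is available, `scipy.stats.f_oneway` can be used to cross-check the resulting $F$ statistic.

```python
import math
from statistics import mean, variance

# Hypothetical samples of unequal sizes (n = 3, 4, 5).
samples = [
    [3, 5, 4],
    [6, 8, 7, 7],
    [9, 10, 11, 10, 10],
]


def anova_table(samples):
    """Build a one-way ANOVA table as a dict of rows keyed by source of variation."""
    k = len(samples)
    values = [x for s in samples for x in s]
    N = len(values)
    grand_mean = sum(values) / N

    ss_b = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    ss_w = sum((len(s) - 1) * variance(s) for s in samples)
    ss_t = sum((x - grand_mean) ** 2 for x in values)
    # Sanity check: the total variation splits into between + within.
    if not math.isclose(ss_t, ss_b + ss_w, abs_tol=1e-9):
        raise ArithmeticError("SS(T) != SS(B) + SS(W)")

    ms_b = ss_b / (k - 1)
    ms_w = ss_w / (N - k)
    return {
        "Between": {"SS": ss_b, "df": k - 1, "MS": ms_b, "F": ms_b / ms_w},
        "Within": {"SS": ss_w, "df": N - k, "MS": ms_w},
        "Total": {"SS": ss_t, "df": N - 1},
    }


for source, row in anova_table(samples).items():
    print(source, row)
```

For these hypothetical samples, the table has SS(B) = 69, SS(W) = 6, SS(T) = 75, and F = 34.5 / (6/9) = 51.75 on 2 and 9 degrees of freedom.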

Notice that each Mean Square is just the Sum of Squares divided by its degrees of freedom, and the F value is the ratio of the mean squares.

Importantly, unlike the $F$ test for comparing two variances, do not put the larger variance in the numerator; always divide the between variance by the within variance. If the between variance is smaller than the within variance (so $F < 1$), the sample means are close to each other relative to the spread within the samples, and you will fail to reject the null hypothesis that all the means are equal.

The null hypothesis is rejected if the test statistic in the table is greater than the right-tail $F$ critical value, at the chosen significance level, with $k-1$ numerator and $N-k$ denominator degrees of freedom.

If the decision is to reject the null hypothesis, then the conclusion is that at least one of the means is different. However, the ANOVA test does not tell you where the difference lies. For this, you need another test, like the Scheffé test described below, applied to every possible pair of samples in the original ANOVA test.

To test the null hypothesis $\mu_i = \mu_j$ (two means associated with a previously conducted ANOVA test), the test statistic $$F_S = \frac{(\overline{x}_i - \overline{x}_j)^2}{\displaystyle{s^2_w \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}}$$ where $s^2_w = MS(W)$ comes from the ANOVA, is compared with the right-tail critical value $$F' = (k-1)\cdot(\textrm{CV from the previous ANOVA test})$$ and $\mu_i = \mu_j$ is rejected if $F_S > F'$.
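Here is a minimal Python sketch of the pairwise Scheffé comparisons. The data, the labels, and the helper name `scheffe_tests` are hypothetical, and the ANOVA critical value of 4.26 is an assumption: it is approximately the $F$ critical value at $\alpha = 0.05$ with $(2, 9)$ degrees of freedom taken from a standard $F$ table (with SciPy, `scipy.stats.f.ppf(0.95, 2, 9)` should give a value near 4.26). For this data the overall ANOVA statistic is about 51.75, well above 4.26, so the null hypothesis is rejected and the follow-up comparisons are warranted.

```python
from itertools import combinations
from statistics import mean, variance

# Hypothetical samples, labeled so the pairwise results are readable.
samples = {
    "A": [3, 5, 4],
    "B": [6, 8, 7, 7],
    "C": [9, 10, 11, 10, 10],
}

# Assumed critical value from the previous ANOVA test:
# F at alpha = 0.05 with (k - 1, N - k) = (2, 9) degrees of freedom, about 4.26.
ANOVA_CV = 4.26


def scheffe_tests(samples, anova_cv):
    """Return F' and, for every pair (i, j), the statistic F_S and whether F_S > F'."""
    groups = list(samples.values())
    k = len(groups)
    N = sum(len(g) for g in groups)
    # s_w^2 = MS(W), the within-groups variance from the ANOVA.
    ms_w = sum((len(g) - 1) * variance(g) for g in groups) / (N - k)
    f_prime = (k - 1) * anova_cv

    results = {}
    for (name_i, x_i), (name_j, x_j) in combinations(samples.items(), 2):
        f_s = (mean(x_i) - mean(x_j)) ** 2 / (ms_w * (1 / len(x_i) + 1 / len(x_j)))
        results[(name_i, name_j)] = (f_s, f_s > f_prime)
    return f_prime, results


f_prime, results = scheffe_tests(samples, ANOVA_CV)
print(f"F' = {f_prime:.2f}")
for (name_i, name_j), (f_s, different) in results.items():
    verdict = "reject mu_i = mu_j" if different else "fail to reject mu_i = mu_j"
    print(f"{name_i} vs {name_j}: F_S = {f_s:.2f}, {verdict}")
```

With these samples, $F' = 2 \times 4.26 = 8.52$, and all three pairwise statistics (about 23.14, 101.25, and 30.00) exceed it, so each pair of means would be declared different.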