7.4.3.1. One-way ANOVA overview

Overview and principles

This section gives an overview of the one-way ANOVA, beginning with the principles behind it.

Partition response into components

In an analysis of variance, the variation in the response measurements is partitioned into components that correspond to different sources of variation.

The goal in this procedure is to split the total variation in the data into a portion due to random error and portions due to changes in the values of the independent variable(s).

Variance of $n$ measurements

The variance of $n$ measurements is given by $$ s^2 = \frac{\sum_{i=1}^n (y_i - \bar{y})^2}{n-1} \, , $$ where $\bar{y}$ is the mean of the $n$ measurements.

Sums of squares and degrees of freedom

The numerator is called the sum of squares of deviations from the mean, and the denominator is called the degrees of freedom.

The variance, after some algebra, can be rewritten as: $$ s^2 = \frac{\sum_{i=1}^n y_i^2 - \frac{1}{n} \, \left( \sum_{i=1}^n y_i \right)^2}{n-1} \, . $$ The first term in the numerator is called the "raw sum of squares" and the second term is called the "correction term for the mean". Another name for the numerator is the "corrected sum of squares", and this is usually abbreviated by Total $SS$ or $SS(Total)$.
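
As a quick check on the algebra, the following minimal Python sketch computes the variance both ways: from the deviations about the mean, and from the raw sum of squares minus the correction term for the mean. The function name `variance_two_ways` and the data values are illustrative assumptions, not part of the handbook.

```python
from statistics import variance  # standard-library sample variance (n - 1 divisor)


def variance_two_ways(y):
    """Return the sample variance from the definition and from the shortcut form."""
    n = len(y)
    mean = sum(y) / n

    # Definition: sum of squared deviations from the mean.
    ss_deviations = sum((v - mean) ** 2 for v in y)

    # Shortcut: raw sum of squares minus the correction term for the mean.
    raw_ss = sum(v * v for v in y)
    correction = sum(y) ** 2 / n
    corrected_ss = raw_ss - correction

    df = n - 1  # degrees of freedom
    return ss_deviations / df, corrected_ss / df


# Hypothetical measurements, chosen only for illustration.
y = [2.0, 4.0, 4.0, 5.0, 7.0]
s2_definition, s2_shortcut = variance_two_ways(y)
print(s2_definition, s2_shortcut, variance(y))
```

The two forms agree up to floating-point rounding. The shortcut form can lose precision when the mean is large relative to the spread of the data, so the deviation form is usually the safer choice in software.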

The $SS(Total)$ in a one-way ANOVA can be split into two components, called the "sum of squares of treatments" and "sum of squares of error", abbreviated as $SST$ and $SSE$, respectively.

The guiding principle behind ANOVA is the decomposition of the total sum of squares, $SS(Total)$.

Algebraically, this is expressed by $$ \begin{array}{ccccc} SS(Total) & = & SST & + & SSE \\ & & & & \\ \sum_{i=1}^k \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{\huge{\cdot \cdot}})^2 & = & \sum_{i=1}^k n_i (\bar{y}_{i \huge{\cdot}} - \bar{y}_{\huge{\cdot \cdot}})^2 & + & \sum_{i=1}^k \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i \huge{\cdot}})^2 \, , \end{array}$$

where $k$ is the number of treatments, $\bar{y}_{i \huge{\cdot}}$ is the mean of the observations for treatment $i$, and $\bar{y}_{\huge{\cdot \cdot}}$ is the "grand" or "overall" mean of all observations. Each $n_i$ is the number of observations for treatment $i$. The total number of observations is $N$ (the sum of the $n_i$).
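
The decomposition can be verified numerically. The sketch below, in Python, computes $SS(Total)$, $SST$, and $SSE$ directly from the three sums in the identity. The function name `ss_decomposition` and the data values are hypothetical, chosen only to illustrate unequal $n_i$.

```python
def ss_decomposition(groups):
    """Return (SS(Total), SST, SSE) for a list of treatment groups."""
    all_obs = [y for g in groups for y in g]
    grand_mean = sum(all_obs) / len(all_obs)        # overall mean of all N observations
    group_means = [sum(g) / len(g) for g in groups]  # one mean per treatment

    # SS(Total): squared deviations of every observation from the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for y in all_obs)

    # SST: squared deviations of treatment means from the grand mean, weighted by n_i.
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))

    # SSE: squared deviations of each observation from its own treatment mean.
    sse = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)

    return ss_total, sst, sse


# Hypothetical data: k = 3 treatments with n_i = 3, 4, 2 observations.
groups = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0], [5.0, 7.0]]
ss_total, sst, sse = ss_decomposition(groups)
print(ss_total, sst + sse)  # the two values agree up to rounding
```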

Note on subscripting

Don't be alarmed by the double subscripting. The $SS(Total)$ can be written with single or double subscripts. The double subscript stems from the way the data are arranged in the data table. The table is usually a rectangular array with $k$ columns, where column $i$ consists of $n_i$ rows (however, the lengths of the columns, or the $n_i$, may be unequal).
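
To make the two notations concrete, the following Python sketch stores a data table as $k$ columns of possibly unequal length and computes $SS(Total)$ once with a single index over all $N$ observations and once with the double index $(i, j)$. The column labels and values are assumptions made up for this illustration.

```python
# Hypothetical data table: one column per treatment, columns of unequal length.
table = {
    "treatment 1": [1.0, 2.0, 3.0],
    "treatment 2": [2.0, 4.0, 6.0, 8.0],
    "treatment 3": [5.0, 7.0],
}

columns = list(table.values())
k = len(columns)                      # number of treatments
n_i = [len(col) for col in columns]   # observations per treatment
N = sum(n_i)                          # total number of observations

# Single subscript: treat the data as one list y_1, ..., y_N.
flat = [y for col in columns for y in col]
grand_mean = sum(flat) / N
ss_single = sum((flat[m] - grand_mean) ** 2 for m in range(N))

# Double subscript: y_ij is observation j of treatment i.
ss_double = sum(
    (columns[i][j] - grand_mean) ** 2
    for i in range(k)
    for j in range(n_i[i])
)

print(k, n_i, N, ss_single, ss_double)
```

Both sums run over the same observations, so they give the same $SS(Total)$; the double subscript only records which treatment each observation belongs to.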

Definition of "Treatment"

The discussion above introduced the concept of a treatment. The definition is: a treatment is a specific combination of factor levels whose effect is to be compared with other treatments.