7. Product and Process Comparisons
7.4. Comparisons based on data from more than two processes
Fixed and Random Factors and Components of Variance
A fixed level of a factor or variable means that the factor levels in the experiment are the only ones we are interested in
In the previous example, the levels of the temperature factor were considered fixed; that is, the three temperatures were the only ones that we were interested in. The model employed for fixed levels is called a fixed effects model.

The adequacy of the fixed effects model may be questionable in the previous example, but for a study involving fixed points of the International Temperature Scale of 1990 (ITS-90), treating the effects of temperature as fixed effects can be meaningful and very appropriate.

When the levels of a factor have been chosen by random sampling (for example operators, days, lots, or batches), or more generally when they are of no specific interest in themselves and are regarded as representative of all possible levels that the factor may take, the appropriate model is a random effects model. In these circumstances, the inferences drawn from the data are meant to apply to the whole collection of levels that the factor may conceivably take.
Variance Components
Fitting a random effects model is often the means to obtain estimates of the contributions that different experimental factors make to the overall variability of the data, as expressed by their variance. These contributions are called variance components.
Example: Variance Components
Data for the example
A company supplies a customer with a large number of batches of raw materials. The customer makes three sample determinations from each of five randomly selected batches to control the quality of the incoming material. The model is
$$ Y_{ij} = \mu + \tau_i + \epsilon_{ij} \, , $$
where \(Y_{ij}\) is the \(j\)th determination made on batch \(i\), \(\mu\) is the overall mean, \(\tau_i\) is the effect of batch \(i\), and \(\epsilon_{ij}\) is the random error. The \(k\) levels (here, the \(k = 5\) batches) are chosen at random from a population with variance \(\sigma_\tau^2\).
The data are shown below.
ANOVA table for example
A one-way ANOVA performed on the data yielded the following results, where SS denotes the sums of squares, df the numbers of degrees of freedom, MS the mean squares, and EMS the expected mean squares. F is the value of the statistic used to test whether \(\sigma_\tau^2\) = 0, and it is followed by the p-value of the test.
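As a minimal sketch of how the SS, df, MS, and F entries of a one-way ANOVA table are obtained from grouped data, the Python function below computes them directly. The function name `one_way_anova` and the three small groups of measurements are hypothetical illustrations; they are not the batch data of this example.

```python
def one_way_anova(groups):
    """Return the SS, df, MS, and F entries of a one-way ANOVA table.

    `groups` is a list of lists, one inner list of measurements per level
    of the factor (for instance, one list per batch).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-level SS: spread of the level means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-level (error) SS: spread of each value around its own level mean.
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss": (ss_between, ss_within),
        "df": (df_between, df_within),
        "ms": (ms_between, ms_within),
        "f": ms_between / ms_within,
    }


# Hypothetical measurements, for illustration only (not the handbook data).
toy_groups = [[10.0, 12.0], [14.0, 16.0], [9.0, 11.0]]
table = one_way_anova(toy_groups)
print(table)
```

The same sums of squares are computed whether the factor is treated as fixed or random; only the interpretation of the mean squares changes.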
Interpretation of the ANOVA table
The computations that produce the SS are the same for both the fixed and the random effects models. For the random effects model, however, the batch mean square, 36.933 (the batch sum of squares, 147.73, divided by its 4 degrees of freedom), is an estimate of \(\sigma_\epsilon^2 + 3 \sigma_\tau^2\). This is shown in the EMS column of the ANOVA table.

The statistic used to test the hypothesis that there are no batch effects, that is, \(\sigma_\tau^2\) = 0, is F = 36.933 / 1.800 = 20.52. The p-value is the probability of observing an F statistic this large or larger owing to the vagaries of sampling alone, when \(\sigma_\tau^2\) = 0. Since the p-value is very small, we conclude that \(\sigma_\tau^2\) > 0, hence that there are significant batch effects.

The validity of this conclusion rests on several assumptions: (i) the \(\{\tau_{i}\}\) are a sample from a Gaussian distribution with mean 0 and standard deviation \(\sigma_\tau\); (ii) the \(\{\epsilon_{ij}\}\) are a sample from a Gaussian distribution with mean 0 and standard deviation \(\sigma_{\epsilon}\); and (iii) the \(\{\tau_{i}\}\) and the \(\{\epsilon_{ij}\}\) are mutually independent.

To find out how much of the variance in the results of the experiment may be attributed to batch differences and how much to random error, we need to estimate the variance components, \(\sigma_{\tau}^{2}\) and \(\sigma_{\epsilon}^{2}\).
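The F test described above can be reproduced from the two mean squares alone. The sketch below is an illustration, not the handbook's own computation: the degrees of freedom 4 and 10 are inferred from the design (five batches, three determinations each), and the helper `f_survival_even_df` is a hypothetical name for a closed-form upper tail probability of the F distribution that is valid only when both degrees of freedom are even, which happens to be the case here.

```python
from math import comb


def f_survival_even_df(f_value, df1, df2):
    """P(F >= f_value) for F(df1, df2), valid only for even df1 and df2.

    Uses the identity P(F >= f) = I_y(df2/2, df1/2), y = df2 / (df2 + df1*f),
    and the binomial-sum form of the regularized incomplete beta function
    for integer arguments.
    """
    if df1 % 2 or df2 % 2:
        raise ValueError("this closed form requires even degrees of freedom")
    a, b = df2 // 2, df1 // 2
    n = a + b - 1
    y = df2 / (df2 + df1 * f_value)
    return sum(comb(n, j) * y**j * (1 - y) ** (n - j) for j in range(a, n + 1))


ms_batch, ms_error = 36.933, 1.800  # mean squares quoted in the text
df_batch, df_error = 4, 10          # inferred: 5 - 1 and 5 * (3 - 1)

f_stat = ms_batch / ms_error
p_value = f_survival_even_df(f_stat, df_batch, df_error)
print(f"F = {f_stat:.2f}, p-value = {p_value:.2e}")
```

For general degrees of freedom, a statistics library's F distribution would be the more natural choice; the closed form is used here only to keep the sketch free of dependencies.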
Method of moments estimates of variance components
The classical estimates of \(\sigma_{\tau}^{2}\) and \(\sigma_{\epsilon}^{2}\) are computed using the method of moments, by equating expected and observed mean squares, and solving the two simultaneous equations for \(\sigma_{\tau}^{2}\) and \(\sigma_{\epsilon}^{2}\):
$$ \begin{aligned} \sigma_{\epsilon}^{2} + 3 \sigma_{\tau}^{2} &= 36.933 \\ \sigma_{\epsilon}^{2} &= 1.800 . \end{aligned} $$
The solution is \(\widehat{\sigma}_{\epsilon}^{2}\) = 1.800 and \(\widehat{\sigma}_{\tau}^{2}\) = (36.933 - 1.800)/3 = 11.71. The latter amounts to 11.71 / (1.800 + 11.71) = 86.7 % of the total variance, while the random errors within batches contribute only 13.3 %.

Note that, especially when \(\sigma_{\tau}^{2}\) is small relative to \(\sigma_{\epsilon}^{2}\), the batch MS can be smaller than the error MS. In these circumstances, the method of moments estimate of \(\sigma_{\tau}^{2}\) is negative, which is nonsensical for a variance. For this reason, and for reasons related to the performance of the estimates, alternative estimates may be preferable, which we illustrate next.
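The arithmetic of the method of moments can be checked with a few lines of Python. This is a sketch under the assumptions of the example (a balanced design with three determinations per batch); the function name `method_of_moments` is hypothetical, and the second call uses made-up mean squares solely to show how a negative estimate can arise.

```python
def method_of_moments(ms_batch, ms_error, n_per_batch):
    """Solve E[MS_error] = var_error and E[MS_batch] = var_error + n * var_batch."""
    var_error = ms_error
    var_batch = (ms_batch - ms_error) / n_per_batch
    return var_batch, var_error


# Mean squares quoted in the text, three determinations per batch.
var_batch, var_error = method_of_moments(36.933, 1.800, 3)
batch_share = var_batch / (var_batch + var_error)
print(f"var_batch = {var_batch:.2f}, var_error = {var_error:.3f}")
print(f"batch share of total variance = {100 * batch_share:.1f} %")

# Made-up mean squares with MS_batch < MS_error: the estimate turns negative.
negative_var_batch, _ = method_of_moments(1.0, 2.0, 3)
print(f"var_batch when MS_batch < MS_error: {negative_var_batch:.3f}")
```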
Restricted maximum likelihood estimates of variance components
The restricted maximum likelihood (REML) estimates of the variance components are generally preferable to the method of moments estimates. Searle, Casella, and McCulloch (2006) discuss the matter in detail.

The companion R code shows how these estimates can be computed. In this case, the REML estimates of \(\sigma_{\tau}^{2}\) and of \(\sigma_{\epsilon}^{2}\) are identical to those obtained by application of the method of moments. However, the REML approach also yields confidence intervals for the variance components. A 95 % confidence interval for \(\sigma_{\tau}^{2}\) ranges from about 1 to 34; while confirming the significance of the batch effects, it also reveals that there is great uncertainty about the true value of this variance component.
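To illustrate why REML and the method of moments agree here, the sketch below maximizes the restricted log-likelihood over a coarse grid. It is not the companion R code. It assumes the standard form of the restricted log-likelihood for a balanced one-way random effects model, written in terms of the two sums of squares, and it recovers those sums of squares from the mean squares quoted in the text. The function name, the grid limits, and the step size are arbitrary choices made for this illustration.

```python
from math import log


def restricted_loglik(var_error, var_batch, ss_batch, ss_error, k, n):
    """Restricted log-likelihood, up to an additive constant.

    Balanced one-way random effects model with k batches and n
    determinations per batch; lam is the expected batch mean square.
    """
    lam = var_error + n * var_batch
    return -0.5 * (
        k * (n - 1) * log(var_error)
        + (k - 1) * log(lam)
        + ss_error / var_error
        + ss_batch / lam
    )


k, n = 5, 3                          # five batches, three determinations each
ms_batch, ms_error = 36.933, 1.800   # mean squares quoted in the text
ss_batch = ms_batch * (k - 1)
ss_error = ms_error * k * (n - 1)

# Coarse grid: var_error in (0, 10], var_batch in [0, 40], step 0.05.
step = 0.05
candidates = (
    (restricted_loglik(i * step, j * step, ss_batch, ss_error, k, n), i * step, j * step)
    for i in range(1, 201)
    for j in range(0, 801)
)
best_loglik, best_var_error, best_var_batch = max(candidates)
print(f"grid maximum at var_error = {best_var_error:.2f}, var_batch = {best_var_batch:.2f}")
```

Restricting the search to nonnegative values of the batch variance is what distinguishes this approach from the method of moments when the batch MS falls below the error MS; in the present example the constraint is inactive, so the grid maximum should sit next to the method of moments values. A real analysis would use a mixed-model routine rather than a grid.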