7.
Product and Process Comparisons
7.2. Comparisons based on data from one process 7.2.6. What intervals contain a fixed percentage of the population values?


Definition of a tolerance interval  A confidence interval covers a population parameter with a stated confidence, that is, a certain proportion of the time. There is also a way to cover a fixed proportion of the population with a stated confidence. Such an interval is called a tolerance interval. The endpoints of a tolerance interval are called tolerance limits. An application of tolerance intervals to manufacturing involves comparing specification limits prescribed by the client with tolerance limits that cover a specified proportion of the population.  
Difference between confidence and tolerance intervals  Confidence limits are limits within which we expect a given population parameter, such as the mean, to lie. Statistical tolerance limits are limits within which we expect a stated proportion of the population to lie.  
Not related to engineering tolerances  Statistical tolerance intervals have a probabilistic interpretation. Engineering tolerances are specified outer limits of acceptability which are usually prescribed by a design engineer and do not necessarily reflect a characteristic of the actual measurements.  
Three types of tolerance intervals 
Three types of questions can be addressed by tolerance intervals:

1. What interval will contain \(p\) percent of the population measurements?
2. What interval guarantees that \(p\) percent of the population measurements will not fall below a lower limit?
3. What interval guarantees that \(p\) percent of the population measurements will not exceed an upper limit?

Question (1) leads to a two-sided interval; questions (2) and (3)
lead to one-sided intervals.


Tolerance intervals for measurements from a normal distribution 
For the questions above, the corresponding tolerance intervals are
defined by lower (L) and upper (U) tolerance limits which are computed
from a series of measurements \(Y_1, \, \ldots, \, Y_N\):
$$ \begin{eqnarray}
\mbox{case (1):} & & L = \bar{Y} - k_2 s \, , \,\,\, U = \bar{Y} + k_2 s \\
& & \\
\mbox{case (2):} & & L = \bar{Y} - k_1 s \\
& & \\
\mbox{case (3):} & & U = \bar{Y} + k_1 s \, ,
\end{eqnarray} $$
where \(\bar{Y}\) is the sample mean, \(s\) is the sample standard deviation, and the factors \(k_1\) and \(k_2\) depend on \(p\), \(\alpha\), and \(N\).


Calculation of \(k\) factor for a two-sided tolerance limit for a normal distribution 
If the data are from a normally distributed population, an
approximate value for the \(k_2\)
factor as a function of \(p\)
and \(\alpha\)
for a two-sided tolerance interval
(Howe, 1969) is
$$ k_2 = z_{(1+p)/2}
\sqrt{\frac{\nu \left(1 + \frac{1}{N}\right) \, }{\chi^2_{1-\alpha, \, \nu}}}
\, , $$
where \(\chi_{1-\alpha, \, \nu}^2\)
is the
critical value of the chi-square
distribution with degrees of freedom \(\nu\)
that is exceeded with probability \(\alpha\),
and \(z_{(1+p)/2}\)
is the critical
value of the normal distribution associated with cumulative
probability \((1+p)/2\).
The quantity \(\nu\) represents the degrees of freedom used to estimate the standard deviation. Most of the time the same sample will be used to estimate both the mean and standard deviation so that \(\nu = N-1\), but the formula allows for other possible values of \(\nu\). 
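As a sketch of this calculation using only the Python standard library (the function names `chi2_quantile` and `k2_howe` are illustrative, not from the handbook), the chi-square quantile can be approximated with the Wilson-Hilferty formula:

```python
from math import sqrt
from statistics import NormalDist

def chi2_quantile(cum_p, nu):
    """Wilson-Hilferty approximation to the chi-square quantile
    at cumulative probability cum_p with nu degrees of freedom."""
    z = NormalDist().inv_cdf(cum_p)
    return nu * (1.0 - 2.0 / (9.0 * nu) + z * sqrt(2.0 / (9.0 * nu))) ** 3

def k2_howe(p, alpha, N):
    """Howe's approximate two-sided tolerance factor k2.
    The chi-square value in the denominator is the quantile with
    cumulative probability 1 - alpha (exceeded with probability alpha)."""
    nu = N - 1
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)
    chi2 = chi2_quantile(1.0 - alpha, nu)
    return z * sqrt(nu * (1.0 + 1.0 / N) / chi2)

# Wafer example from this section: N = 43, p = 0.90, alpha = 0.99
print(round(k2_howe(0.90, 0.99, 43), 3))
```

For the wafer example discussed below (N = 43, p = 0.90, \(\alpha\) = 0.99) this gives a \(k_2\) of about 2.22; using an exact chi-square quantile instead of the Wilson-Hilferty approximation changes the result only in the third decimal place.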

Guenther's correction to \(k_2\)  Guenther (1977) recommends the following correction to Howe's approximation, $$ k_2^{*} = w k_2 \, , $$ where $$ w = \sqrt{ 1 + \frac{N - 3 - \chi^2_{1-\alpha, \, N-1} } {2(N+1)^2}} \, . $$ For reasonably large values of \(N\), this correction factor should be close to 1. For example, for \(N\) = 40 and \(\alpha\) = 0.95, the correction factor is 0.9972.  
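A minimal standard-library sketch of the correction (the names `chi2_quantile` and `guenther_w` are illustrative). Note that reading the chi-square critical value as the quantile with cumulative probability \(\alpha\), i.e. exceeded with probability \(1-\alpha\), is the interpretation that reproduces the 0.9972 figure quoted above:

```python
from math import sqrt
from statistics import NormalDist

def chi2_quantile(cum_p, nu):
    """Wilson-Hilferty approximation to the chi-square quantile
    at cumulative probability cum_p with nu degrees of freedom."""
    z = NormalDist().inv_cdf(cum_p)
    return nu * (1.0 - 2.0 / (9.0 * nu) + z * sqrt(2.0 / (9.0 * nu))) ** 3

def guenther_w(alpha, N):
    """Guenther's correction factor w applied to Howe's k2.
    The chi-square critical value is taken at cumulative
    probability alpha (exceeded with probability 1 - alpha)."""
    chi2 = chi2_quantile(alpha, N - 1)
    return sqrt(1.0 + (N - 3.0 - chi2) / (2.0 * (N + 1.0) ** 2))

# Example from the text: N = 40, alpha = 0.95
print(round(guenther_w(0.95, 40), 4))
```

The printed value agrees with the quoted 0.9972 to about three decimal places; the small residual difference comes from the Wilson-Hilferty approximation.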
Example of calculation 
For example, suppose that we take a sample of \(N\)
= 43 silicon
wafers from a lot and measure their thicknesses in order to find
tolerance limits within which a proportion \(p\)
= 0.90 of the wafers in the lot fall with confidence \(\alpha\)
= 0.99. Since the standard deviation, \(s\),
is computed from the sample of 43 wafers, the degrees of freedom
are \(\nu = N-1\).
The reader can download the data as a text file. 

Use of tables in calculating two-sided tolerance intervals 
Values of the \(k_2\)
factor as a function of \(p\) and \(\alpha\)
are tabulated in some textbooks, such as
Dixon and
Massey (1969). To use the normal and chi-square tables in this
handbook to approximate the \(k_2\)
factor, follow the steps outlined below.

1. Calculate the degrees of freedom, \(\nu = N-1\).
2. From the chi-square table, read the critical value \(\chi^2_{1-\alpha, \, \nu}\) that is exceeded with probability \(\alpha\).
3. From the normal table, read the critical value \(z_{(1+p)/2}\) associated with cumulative probability \((1+p)/2\).
4. Substitute these values into the formula for \(k_2\).

The tolerance limits are then computed from the sample mean, \(\bar{Y}\), and standard deviation, \(s\), according to case (1). 

Important notes 
The notation for the critical value of the chi-square
distribution can be confusing. Values as tabulated are, in a sense, already
squared; whereas the critical value for the normal distribution must be
squared in the formula above.
Some software is capable of computing a tolerance interval for a given set of data so that the user does not need to perform all the calculations. All the tolerance intervals shown in this section can be computed using both Dataplot code and R code. In addition, R software is capable of computing an exact value of the \(k_2\) factor, thus replacing the approximation given above. R and Dataplot examples include the case where a tolerance interval is computed automatically from a data set. 

Calculation of a one-sided tolerance interval for a normal distribution  The calculation of an approximate \(k\) factor for one-sided tolerance intervals comes directly from the following set of formulas (Natrella, 1963): $$ \begin{eqnarray} k_{1} & = & \frac{z_{p} + \sqrt{z_{p}^2 - ab}} {a} \\ & & \\ a & = & 1 - \frac{z_{\alpha}^2}{2(N-1)} \\ & & \\ b & = & z_{p}^2 - \frac{ z_{\alpha}^2}{N} \, . \end{eqnarray} $$  
A one-sided tolerance interval example  For the example above, it may also be of interest to guarantee with 0.99 probability (or 99 % confidence) that 90 % of the wafers have thicknesses less than an upper tolerance limit. This problem falls under case (3). The calculations for the \(k_1\) factor for a one-sided tolerance interval are: $$ \begin{eqnarray} a & = & 1 - \frac{1}{2(43-1)} \, (2.3263)^2 = 0.9356 \\ & & \\ b & = & (1.2816)^2 - \frac{1}{43} \, (2.3263)^2 = 1.5165 \\ & & \\ k_{1} & = & \frac{1.2816 + \sqrt{(1.2816)^2 - (0.9356)(1.5165)}} {0.9356} = 1.8752 \, . \end{eqnarray} $$  
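The Natrella formulas translate directly into code. The following standard-library Python sketch (the name `k1_natrella` is illustrative) reproduces the hand calculation:

```python
from math import sqrt
from statistics import NormalDist

def k1_natrella(p, alpha, N):
    """Approximate one-sided tolerance factor k1 (Natrella, 1963):
    k1 = (z_p + sqrt(z_p**2 - a*b)) / a, with
    a = 1 - z_alpha**2 / (2*(N-1)) and b = z_p**2 - z_alpha**2 / N."""
    zp = NormalDist().inv_cdf(p)       # normal critical value for p
    za = NormalDist().inv_cdf(alpha)   # normal critical value for alpha
    a = 1.0 - za ** 2 / (2.0 * (N - 1))
    b = zp ** 2 - za ** 2 / N
    return (zp + sqrt(zp ** 2 - a * b)) / a

# Wafer example: p = 0.90, alpha = 0.99, N = 43
print(round(k1_natrella(0.90, 0.99, 43), 4))  # 1.8752
```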
Tolerance factor based on the noncentral \(t\) distribution 
The value of \(k_1\)
can also be computed using the
inverse cumulative distribution function for the noncentral \(t\)
distribution. This method may give more accurate
results for small values of \(N\).
The value of \(k_1\)
using the noncentral \(t\)
distribution (using the same example as above) is:
$$ \begin{eqnarray}
\delta & = & z_{p} \sqrt{N} = 1.2816 \sqrt{43} = 8.4037 \\
& & \\
k_1 & = & \frac{ t_{\alpha, \, N-1, \, \delta} }{ \sqrt{N} }
= \frac{12.28834}{\sqrt{43}} = 1.8740 \, ,
\end{eqnarray} $$
where \(\delta\)
is the noncentrality parameter.
In this case, the difference between the two computations is negligible (1.8752 versus 1.8740). However, the difference becomes more pronounced as the value of \(N\) gets smaller (in particular, for \(N \le\) 10). For example, if \(N\) = 43 is replaced with \(N\) = 6, the noncentral \(t\) method returns a value of 4.4111 for \(k_1\) while the method based on the Natrella formulas returns a value of 5.2808. The disadvantage of the noncentral \(t\) method is that it depends on the inverse cumulative distribution function for the noncentral \(t\) distribution. This function is not available in many statistical and spreadsheet software programs, but it is available in Dataplot and R (see Dataplot code and R code). In addition, the inverse of the noncentral t function may lose accuracy for large sample sizes. The Natrella formulas only depend on the inverse cumulative distribution function for the normal distribution (which is available in just about all statistical and spreadsheet software programs). Unless you have small samples (say \(N \le\) 10), the difference in the methods should not have much practical effect. 
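Assuming SciPy is available, the noncentral \(t\) computation above can be sketched as follows (this is an illustrative sketch, not the Dataplot or R code referred to in this section):

```python
from math import sqrt
from statistics import NormalDist
from scipy.stats import nct  # noncentral t distribution

p, alpha, N = 0.90, 0.99, 43

# Noncentrality parameter: delta = z_p * sqrt(N)
delta = NormalDist().inv_cdf(p) * sqrt(N)

# k1 = t_{alpha, N-1, delta} / sqrt(N), using the inverse CDF (ppf)
k1 = nct.ppf(alpha, df=N - 1, nc=delta) / sqrt(N)
print(round(k1, 4))  # 1.8740
```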