GRUBBS TEST

Name:
    GRUBBS TEST
The Grubbs test detects one outlier at a time. To check for multiple outliers, delete the single outlier detected and rerun the Grubbs test on the remaining data. Repeat this process until no outliers are detected. More formally, the Grubbs test can be defined as follows.
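For reference, the usual textbook statement of the two-sided test is sketched below. The notation is an assumption taken from the standard presentation of the test rather than text recovered from this page: \( N \) is the sample size, \( \bar{Y} \) and \( s \) are the sample mean and standard deviation, and \( t_{\alpha/(2N),\,N-2} \) is the upper critical value of the t distribution with \( N-2 \) degrees of freedom at significance level \( \alpha/(2N) \).

```latex
\begin{aligned}
H_0 &: \text{there are no outliers in the data} \\
H_a &: \text{the most extreme point is an outlier} \\
G &= \frac{\max_{i} \lvert Y_i - \bar{Y} \rvert}{s} \\
\text{reject } H_0 \text{ at level } \alpha \text{ if}\quad
G &> \frac{N-1}{\sqrt{N}}
     \sqrt{\frac{t^{2}_{\alpha/(2N),\,N-2}}{N-2+t^{2}_{\alpha/(2N),\,N-2}}}
\end{aligned}
```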
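Note that the above is actually a combination of the following two tests:
    1. a test of whether the minimum value is an outlier
    2. a test of whether the maximum value is an outlier
To generate these one-sided tests, the test statistic is
    \( G = (\bar{Y} - Y_{min})/s \)
or
    \( G = (Y_{max} - \bar{Y})/s \)
where \( \bar{Y} \) is the sample mean, \( s \) is the sample standard deviation, and \( Y_{min} \) and \( Y_{max} \) are the sample minimum and maximum.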
The significance level in the TPPF function needs to be doubled for the one-sided tests. You can request that one of the one-sided tests be performed (see the Syntax section).

Generally, graphical methods such as the box plot or histogram are used to detect outliers. However, the Grubbs test can be used if you prefer a more formal test.
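The two-sided and one-sided statistics described above can be sketched in Python as follows. This is an illustration only, not Dataplot's implementation; the function name `grubbs_statistics` and the sample data are hypothetical.

```python
import statistics


def grubbs_statistics(y):
    """Return the two-sided, minimum-side and maximum-side Grubbs statistics."""
    ybar = statistics.mean(y)
    s = statistics.stdev(y)          # sample standard deviation (n - 1 denominator)
    g_min = (ybar - min(y)) / s      # one-sided statistic for the minimum
    g_max = (max(y) - ybar) / s      # one-sided statistic for the maximum
    g_two_sided = max(g_min, g_max)  # same as max |y_i - ybar| / s
    return g_two_sided, g_min, g_max


# Hypothetical data: nine values near 10 and one suspicious high value.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.0, 15.0]
g, g_min, g_max = grubbs_statistics(sample)
print(f"two-sided G = {g:.4f}, min-side G = {g_min:.4f}, max-side G = {g_max:.4f}")
```

The two-sided statistic is simply the larger of the two one-sided statistics, which is why the two-sided test can be viewed as a combination of the minimum and maximum tests.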
    GRUBBS TEST <y>   <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable being tested; and where the <SUBSET/EXCEPT/FOR qualification> is optional. This syntax performs the two-sided test.

    GRUBBS MINIMUM TEST <y>   <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable being tested; and where the <SUBSET/EXCEPT/FOR qualification> is optional. This syntax performs the one-sided test for the minimum value.

    GRUBBS MAXIMUM TEST <y>   <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable being tested; and where the <SUBSET/EXCEPT/FOR qualification> is optional. This syntax performs the one-sided test for the maximum value.

    GRUBBS TEST <y> <labid>   <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable being tested; <labid> is a variable containing the lab-id corresponding to each value of the response variable; and where the <SUBSET/EXCEPT/FOR qualification> is optional. This syntax can also be used with the MINIMUM and MAXIMUM versions of the test. The <labid> variable is used only to identify the lab-id of the minimum and maximum points. It is not used in the computation of the statistic.
    GRUBBS MULTIPLE TEST <y1> ... <yk>   <SUBSET/EXCEPT/FOR qualification>
where <y1> ... <yk> is a list of up to k response variables; and where the <SUBSET/EXCEPT/FOR qualification> is optional. This syntax can also be used with the MINIMUM and MAXIMUM versions of the test. This syntax performs a Grubbs test on <y1>, then on <y2>, and so on. Up to 30 response variables can be specified. Note that the syntax
is supported. This is equivalent to
    GRUBBS REPLICATED TEST <y> <x1> ... <xk>   <SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable; <x1> ... <xk> is a list of up to k group-id variables; and where the <SUBSET/EXCEPT/FOR qualification> is optional. This syntax can also be used with the MINIMUM and MAXIMUM versions of the test. This syntax performs a cross-tabulation of <x1> ... <xk> and performs a Grubbs test for each unique combination of cross-tabulated values. For example, if X1 has 3 levels and X2 has 2 levels, a total of 6 Grubbs tests will be performed. Up to six group-id variables can be specified. Note that the syntax
is supported. This is equivalent to
    GRUBBS TEST Y1 LABID
    GRUBBS MULTIPLE TEST Y1 Y2 Y3
    GRUBBS REPLICATED TEST Y X1 X2
    GRUBBS TEST Y1 SUBSET TAG > 2
    GRUBBS MINIMUM TEST Y1
    GRUBBS MAXIMUM TEST Y1
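The cross-tabulation performed by the REPLICATED form can be illustrated with a short Python sketch. It is not Dataplot's implementation: the function `grubbs_by_group` and the data are hypothetical, and only the test statistic (not the critical value) is computed for each cell.

```python
from collections import defaultdict
import statistics


def grubbs_by_group(y, *group_ids):
    """Two-sided Grubbs statistic for each unique combination of group-id values."""
    cells = defaultdict(list)
    for row in zip(y, *group_ids):
        cells[row[1:]].append(row[0])   # key = tuple of group-id values
    results = {}
    for key, values in sorted(cells.items()):
        ybar = statistics.mean(values)
        s = statistics.stdev(values)
        results[key] = max(abs(v - ybar) for v in values) / s
    return results


# Hypothetical data: X1 has 3 levels, X2 has 2 levels, 4 replicates per cell.
x1 = [1] * 8 + [2] * 8 + [3] * 8
x2 = ([1] * 4 + [2] * 4) * 3
y = [10.1, 9.9, 10.0, 10.4, 12.0, 11.8, 12.1, 12.3,
     9.5, 9.7, 9.6, 13.0, 11.0, 11.2, 10.9, 11.1,
     10.0, 10.2, 9.8, 10.1, 12.5, 12.4, 12.6, 12.2]

results = grubbs_by_group(y, x1, x2)
for cell, g in results.items():
    print(cell, round(g, 4))
```

With 3 levels of X1 and 2 levels of X2, the loop prints six statistics, one per cell of the cross-tabulation.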
Masking can occur when we specify too few outliers in the test. For example, if we are testing for a single outlier when there are in fact two (or more) outliers, these additional outliers may influence the value of the test statistic enough so that no points are declared as outliers.

On the other hand, swamping can occur when we specify too many outliers in the test. For example, if we are testing for two outliers when there is in fact only a single outlier, both points may be declared outliers.

The possibility of masking and swamping is an important reason why it is useful to complement formal outlier tests with graphical methods. Graphics can often help identify cases where masking or swamping may be an issue.

Masking is also one reason that applying a single-outlier test sequentially can fail. If there are multiple outliers, masking may cause the test for the first outlier to conclude that there are no outliers, so the testing for any additional outliers is never done.

The Grubbs test is used to check for a single outlier. If there are in fact multiple outliers, the results of the Grubbs test can be distorted. If multiple outliers are suspected, then the Tietjen-Moore or the generalized extreme studentized deviate tests may be preferred. The Tietjen-Moore test is a generalization of the Grubbs test for the case where multiple outliers may be present. The Tietjen-Moore test requires that the number of suspected outliers be specified exactly, while the generalized extreme studentized deviate test only requires that an upper bound on the suspected number of outliers be specified.
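A small numerical illustration of masking, as a Python sketch with hypothetical data: adding a second high value inflates the sample standard deviation, so the single-outlier statistic for the maximum becomes smaller even though the data now contain two suspicious points.

```python
import statistics


def grubbs_max_statistic(y):
    """One-sided Grubbs statistic for the maximum value."""
    return (max(y) - statistics.mean(y)) / statistics.stdev(y)


# Hypothetical baseline: ten well-behaved values near 10.
base = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.0, 10.0]

g_one = grubbs_max_statistic(base + [15.0])        # a single high value
g_two = grubbs_max_statistic(base + [15.0, 15.2])  # two high values

print(f"one high value : G = {g_one:.4f}")
print(f"two high values: G = {g_two:.4f}")
```

The statistic drops noticeably when the second high value is added, which is the mechanism by which a single-outlier test can fail to flag either point.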
If the MULTIPLE or REPLICATED option is used, these values will be written to the file "dpst1f.dat" instead.
    LET A = GRUBBS DIRECTION Y
    LET A = GRUBBS INDEX Y
    LET A = GRUBBS Y

GRUBBS INDEX returns the row index of the most extreme point, and GRUBBS DIRECTION specifies whether the most extreme point is in the minimum direction (a -1 is returned) or the maximum direction (a +1 is returned).

In addition to the above LET commands, built-in statistics are supported for more than 20 different commands (enter HELP STATISTICS for details).
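The three quantities described above (statistic, index and direction) can be sketched in Python as follows. The function `grubbs_summary` and the data are hypothetical, and the 1-based row numbering is an assumption made to mirror Dataplot's row indexing.

```python
import statistics


def grubbs_summary(y):
    """Return (statistic, 1-based index, direction) for the most extreme point."""
    ybar = statistics.mean(y)
    s = statistics.stdev(y)
    # Position of the point farthest from the sample mean.
    pos = max(range(len(y)), key=lambda i: abs(y[i] - ybar))
    statistic = abs(y[pos] - ybar) / s
    index = pos + 1                         # assumed 1-based row number
    direction = 1 if y[pos] > ybar else -1  # +1 = maximum side, -1 = minimum side
    return statistic, index, direction


# Hypothetical data with a low value in row 4.
stat, index, direction = grubbs_summary([10.2, 9.9, 10.0, 6.5, 10.1, 9.8])
print(f"G = {stat:.4f}, index = {index}, direction = {direction}")
```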
Alternatively, the population standard deviation may be considered to be known accurately (usually based on extensive historical data). In either of these cases, the critical values for the Grubbs test are modified. To support these options, enter the commands
    SET GRUBB STANDARD DEVIATION <value>

If the specified standard deviation is positive, Dataplot uses the formulas based on the independent estimate of the standard deviation. If the degrees of freedom are not specified, a value of 10,000 will be used. Any degrees of freedom value greater than 120 is effectively treated as a "known" population standard deviation.

To compute the critical values using simulation, enter the command

    SET GRUBB TEST CRITICAL VALUES SIMULATION

To reset the default of basing the critical values on a formula, enter

    SET GRUBB TEST CRITICAL VALUES FORMULA

The formula from the E178 standard is
where t is the percent point function of the t distribution and \( \nu \) is the degrees of freedom. For the "known" standard deviation case, the t distribution is replaced with a normal distribution.
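The idea of simulated critical values can be sketched with a short Monte Carlo in Python. This is a generic illustration for the ordinary case where the standard deviation is estimated from the sample itself; it is not Dataplot's algorithm, and the function names, replication count and seed are assumptions chosen for the example.

```python
import math
import random


def grubbs_statistic(y):
    """Two-sided Grubbs statistic: max |y_i - ybar| / s."""
    n = len(y)
    ybar = sum(y) / n
    s = math.sqrt(sum((v - ybar) ** 2 for v in y) / (n - 1))
    return max(abs(v - ybar) for v in y) / s


def simulated_critical_value(n, alpha=0.05, reps=5000, seed=12345):
    """Empirical (1 - alpha) quantile of G under normality with no outliers."""
    rng = random.Random(seed)
    stats = sorted(
        grubbs_statistic([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    )
    k = min(reps - 1, math.ceil((1.0 - alpha) * reps) - 1)
    return stats[k]


crit = simulated_critical_value(38)
print(f"simulated 95% point for N = 38: {crit:.3f}")
```

For N = 38 the simulated 95% point should land close to the formula-based value shown in the sample output for this command, with some Monte Carlo noise that shrinks as the number of replications grows.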
REPLICATED GRUBBS TEST is a synonym for GRUBBS REPLICATED TEST
Stefansky, W., "Rejecting Outliers in Factorial Designs", Technometrics, Vol. 14, 1972, pp. 469-479.

E178 - 16A (2016), "Standard Practice for Dealing with Outlying Observations", ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959, USA.
    SKIP 25
    READ VANGEL31.DAT Y
    SET WRITE DECIMALS 4
    GRUBBS TEST Y

The following output is generated:

            Grubb Test for Outliers: Test for Minimum and Maximum
                         (Assumption: Normality)

    Response Variable: Y

    H0: There are no outliers
    Ha: The extreme point is an outlier

    Summary Statistics:
    Number of Observations:                  38
    Sample Minimum:                    147.0000
    ID for Sample Minimum:                    1
    Sample Maximum:                    231.0000
    ID for Sample Maximum:                   38
    Sample Mean:                       185.7894
    Sample SD:                          18.5954

    Grubbs Test Statistic Value:         2.4312

    Percent Points of the Reference Distribution
    -----------------------------------
      Percent Point               Value
    -----------------------------------
                0.0    =          0.000
               50.0    =          2.392
               75.0    =          2.601
               90.0    =          2.846
               95.0    =          3.013
               97.5    =          3.169
               99.0    =          3.355
              100.0    =          6.001

    Conclusions (Upper 1-Tailed Test)
    ----------------------------------------------
      Alpha    CDF   Critical Value     Conclusion
    ----------------------------------------------
        10%    90%            2.846      Accept H0
         5%    95%            3.013      Accept H0
       2.5%  97.5%            3.169      Accept H0
         1%    99%            3.355      Accept H0
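As a quick arithmetic check, the reported test statistic can be recomputed from the summary statistics in the output above. The Python sketch below uses only the printed (rounded) values, so the result can differ from the reported 2.4312 in the last digit.

```python
# Summary statistics copied from the output above.
y_min, y_max = 147.0, 231.0
y_bar, s = 185.7894, 18.5954

g_min = (y_bar - y_min) / s   # distance of the minimum from the mean, in SD units
g_max = (y_max - y_bar) / s   # distance of the maximum from the mean, in SD units
g = max(g_min, g_max)         # two-sided Grubbs statistic

critical_95 = 3.013           # 95% point from the output table
reject = g > critical_95

print(f"G(min) = {g_min:.4f}, G(max) = {g_max:.4f}, G = {g:.4f}")
print("reject H0 at 5%" if reject else "accept H0 at 5%")
```

The maximum (231) is the more extreme point, and its statistic is well below the 5% critical value, which matches the "Accept H0" conclusions in the table.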
Date created: 06/05/2001
Last updated: 12/11/2023
Please email comments on this WWW page to alan.heckert@nist.gov.