
EXTREME STUDENTIZED DEVIATE TEST

Name:
    EXTREME STUDENTIZED DEVIATE TEST
Type:
    Analysis Command
Purpose:
    Perform a generalized extreme studentized deviate (ESD) test for outliers.
Description:
    The generalized extreme Studentized deviate (ESD) test is used to detect one or more outliers in a univariate data set that follows an approximately normal distribution.

    The primary limitation of the Grubbs test and the Tietjen-Moore test is that the suspected number of outliers, k, must be specified exactly. If k is not specified correctly, this can distort the conclusions of these tests. On the other hand, the generalized ESD test only requires that an upper bound for the suspected number of outliers be specified.

    Given the upper bound, r, the generalized ESD test essentially performs r separate tests: a test for one outlier, a test for two outliers, and so on up to r outliers.

    The generalized ESD test is defined for the hypothesis:

    H0: There are no outliers in the data set
    Ha: There are up to r outliers in the data set
    Test Statistic: Compute

      \( R_{1} = \mbox{max}_{i}|x_{i} - \bar{x}|/s \)

    with \( \bar{x} \) and s denoting the sample mean and sample standard deviation, respectively.

    Remove the observation that maximizes \( |x_{i} - \bar{x}| \) and then recompute the above statistic with n - 1 observations. Repeat this process until r observations have been removed. This results in the r test statistics R1, R2, ..., Rr.
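
    The removal procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Dataplot's implementation: the function name esd_statistics is hypothetical, the sample standard deviation uses the n - 1 denominator, and the data are the Rosner values read in by the sample program on this page.

```python
from statistics import mean, stdev

def esd_statistics(data, r):
    """Return [(R_i, removed_value), ...] for i = 1..r (hypothetical helper)."""
    values = list(data)
    results = []
    for _ in range(r):
        xbar = mean(values)
        s = stdev(values)  # sample standard deviation (n - 1 denominator)
        # Observation with the largest absolute deviation from the mean.
        extreme = max(values, key=lambda x: abs(x - xbar))
        results.append((abs(extreme - xbar) / s, extreme))
        values.remove(extreme)  # drop it, then repeat on the reduced sample
    return results

# Rosner data, as read by the sample program on this page (54 observations).
y = [-0.25, 0.68, 0.94, 1.15, 1.20, 1.26, 1.26, 1.34, 1.38, 1.43, 1.49,
     1.49, 1.55, 1.56, 1.58, 1.65, 1.69, 1.70, 1.76, 1.77, 1.81, 1.91,
     1.94, 1.96, 1.99, 2.06, 2.09, 2.10, 2.14, 2.15, 2.23, 2.24, 2.26,
     2.35, 2.37, 2.40, 2.47, 2.54, 2.62, 2.64, 2.90, 2.92, 2.92, 2.93,
     3.21, 3.26, 3.30, 3.59, 3.68, 4.30, 4.64, 5.34, 5.42, 6.01]

for i, (stat, removed) in enumerate(esd_statistics(y, 10), start=1):
    print(f"R{i} = {stat:.5f}  (removed {removed})")
```

    The printed statistics should agree, up to rounding, with the "Test Statistic Value" column of the summary table in the sample output.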

    Significance Level: \( \alpha \)
    Critical Region: Corresponding to the r test statistics, compute the following r critical values

      \( \lambda_{i} = \frac{(n-i) \, t_{n-i-1,p}} {\sqrt{(n-i-1+t_{n-i-1,p}^{2}) (n-i+1)}} \)

    where i = 1, 2, ..., r, \( t_{\nu,p} \) is the 100p percentage point from the t distribution with \( \nu \) degrees of freedom and \( p = 1 - \frac{\alpha}{2(n-i+1)} \).
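
    As a rough check on the critical value formula, the following Python sketch evaluates \( \lambda_{i} \) with the standard library only. The helper names (t_pdf, t_cdf, t_ppf, esd_critical_value) are hypothetical, and the t percent point is obtained here by Simpson integration of the density plus bisection. That numerical approach is an assumption made to keep the sketch self-contained; it is not a description of how Dataplot computes percent points.

```python
import math

def t_pdf(x, nu):
    """Density of the t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def t_cdf(x, nu, steps=400):
    """CDF for x >= 0: 0.5 plus a composite Simpson integral over [0, x]."""
    if x == 0:
        return 0.5
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    return 0.5 + total * h / 3

def t_ppf(p, nu):
    """Percent point for p >= 0.5, found by bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, nu) < p:  # expand the bracket until it contains p
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def esd_critical_value(n, i, alpha):
    """lambda_i for sample size n, step i, and significance level alpha."""
    p = 1 - alpha / (2 * (n - i + 1))
    nu = n - i - 1
    t = t_ppf(p, nu)
    return (n - i) * t / math.sqrt((nu + t * t) * (n - i + 1))

# Critical values for the n = 54 example at the 5% level.
for i in range(1, 11):
    print(i, round(esd_critical_value(54, i, 0.05), 5))
```

    For n = 54 the values should be close to the 5% column of the summary table in the sample output.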

    The number of outliers is determined by finding the largest i such that \( R_{i} > \lambda_{i} \).
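
    The decision rule can be illustrated with the test statistics and 5% critical values copied from the summary table in the sample output on this page. The function name count_outliers is hypothetical; the sketch only applies the "largest i" rule to those tabulated numbers.

```python
def count_outliers(stats, crits):
    """Largest i (1-based) with stats[i-1] > crits[i-1]; 0 if there is none."""
    count = 0
    for i, (r_i, lam_i) in enumerate(zip(stats, crits), start=1):
        if r_i > lam_i:
            count = i  # keep scanning: a later step may still be significant
    return count

# Values copied from the summary table of the sample output (n = 54, r = 10).
R = [3.11890, 2.94297, 3.17942, 2.81018, 2.81557,
     2.84817, 2.27932, 2.31036, 2.10158, 2.06717]
lam_05 = [3.15879, 3.15142, 3.14388, 3.13616, 3.12824,
          3.12012, 3.11179, 3.10324, 3.09445, 3.08542]

print(count_outliers(R, lam_05))  # 3
```

    Steps 1 and 2 are not significant at the 5% level, but step 3 is, so three outliers are declared.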

    Simulation studies by Rosner indicate that this critical value approximation is very accurate for n ≥ 25 and reasonably accurate for n ≥ 15.

    Note that although the generalized ESD test is essentially Grubbs' test applied sequentially, there are a few important distinctions:

    • The generalized ESD test adjusts the critical values for the number of outliers being tested; sequential application of Grubbs' test does not.

    • If there is significant masking, applying Grubbs' test sequentially may stop too soon. In the example below, the generalized ESD test identifies 3 outliers at the 5% level, whereas applying Grubbs' test sequentially would stop at the first iteration and declare no outliers.

    • Grubbs' test allows one-sided tests (i.e., you can test only the minimum or only the maximum value) in addition to two-sided tests (both the minimum and the maximum value are tested). The generalized ESD test is restricted to two-sided tests.
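
    The effect of masking on the stopping rule can be sketched by comparing two rules on the numbers from the summary table in the sample output. Both function names are hypothetical. The sketch reuses the tabulated 5% ESD critical values for the sequential rule. That is an assumption, but only the first step matters here, and at the first step the ESD critical value has the same form as the two-sided critical value of a single-outlier test on all n observations.

```python
def sequential_stop_count(stats, crits):
    """Naive sequential testing: stop at the first non-significant step."""
    count = 0
    for r_i, lam_i in zip(stats, crits):
        if r_i <= lam_i:
            break
        count += 1
    return count

def generalized_esd_count(stats, crits):
    """Generalized ESD rule: largest step whose statistic exceeds its critical value."""
    flagged = [i for i, (r_i, lam_i) in enumerate(zip(stats, crits), start=1)
               if r_i > lam_i]
    return max(flagged, default=0)

# Values copied from the summary table of the sample output (5% level).
R = [3.11890, 2.94297, 3.17942, 2.81018, 2.81557,
     2.84817, 2.27932, 2.31036, 2.10158, 2.06717]
lam_05 = [3.15879, 3.15142, 3.14388, 3.13616, 3.12824,
          3.12012, 3.11179, 3.10324, 3.09445, 3.08542]

print(sequential_stop_count(R, lam_05))   # 0
print(generalized_esd_count(R, lam_05))   # 3
```

    The sequential rule stops immediately because the first statistic is masked by the other large values, while the generalized ESD rule continues and finds the significant third step.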
Syntax 1:
    EXTREME STUDENTIZED DEVIATE TEST <y>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable being tested;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 2:
    EXTREME STUDENTIZED DEVIATE MULTIPLE TEST <y1> ... <yk>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> ... <yk> is a list of up to k response variables;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax performs an extreme studentized deviate test on <y1> then on <y2> and so on. Up to 30 response variables can be specified.

    Note that the syntax

      EXTREME STUDENTIZED DEVIATE MULTIPLE TEST Y1 TO Y4

    is supported. This is equivalent to

      EXTREME STUDENTIZED DEVIATE MULTIPLE TEST Y1 Y2 Y3 Y4
Syntax 3:
    EXTREME STUDENTIZED DEVIATE REPLICATED TEST <y> <x1> ... <xk>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable;
                <x1> ... <xk> is a list of up to k group-id variables;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax performs a cross-tabulation of <x1> ... <xk> and performs an extreme studentized deviate test for each unique combination of cross-tabulated values. For example, if X1 has 3 levels and X2 has 2 levels, there will be a total of 6 extreme studentized deviate tests performed.

    Up to six group-id variables can be specified.
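
    The cross-tabulation idea can be pictured with a short Python sketch. The helper split_by_groups and the data are hypothetical and serve only to show how 3 levels of X1 and 2 levels of X2 yield 6 cells, each of which would receive its own test; the sketch does not reproduce Dataplot's internal processing.

```python
from collections import defaultdict

def split_by_groups(y, *group_vars):
    """Collect the responses for each unique combination of group-id values."""
    cells = defaultdict(list)
    for row in zip(y, *group_vars):
        cells[row[1:]].append(row[0])  # key = tuple of group-id values
    return dict(cells)

# Hypothetical data: X1 has 3 levels, X2 has 2 levels.
x1 = [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3]
x2 = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
y = [float(k) for k in range(12)]

cells = split_by_groups(y, x1, x2)
print(len(cells))  # 6
for key in sorted(cells):
    print(key, cells[key])
```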

    Note that the syntax

      EXTREME STUDENTIZED DEVIATE REPLICATED TEST Y X1 TO X4

    is supported. This is equivalent to

      EXTREME STUDENTIZED DEVIATE REPLICATED TEST Y X1 X2 X3 X4
Examples:
    EXTREME STUDENTIZED DEVIATE TEST Y1
    EXTREME STUDENTIZED DEVIATE TEST Y1 LABID
    EXTREME STUDENTIZED DEVIATE MULTIPLE TEST Y1 Y2 Y3
    EXTREME STUDENTIZED DEVIATE REPLICATED TEST Y X1 X2
    EXTREME STUDENTIZED DEVIATE TEST Y1 SUBSET TAG > 2
    EXTREME STUDENTIZED DEVIATE MINIMUM TEST Y1
    EXTREME STUDENTIZED DEVIATE MAXIMUM TEST Y1
Note:
    The upper bound on the number of outliers to test for is specified with the command

      LET NOUTLIER = <value>
Note:
    Masking and swamping are two issues that can affect outlier tests.

    Masking can occur when we specify too few outliers in the test. For example, if we are testing for a single outlier when there are in fact two (or more) outliers, these additional outliers may influence the value of the test statistic enough so that no points are declared as outliers.

    On the other hand, swamping can occur when we specify too many outliers in the test. For example, if we are testing for two outliers when there is in fact only a single outlier, both points may be declared outliers.

    The possibility of masking and swamping is an important reason why it is useful to complement formal outlier tests with graphical methods. Graphics can often help identify cases where masking or swamping may be an issue.

    Also, masking is one reason that trying to apply a single outlier test sequentially can fail. If there are multiple outliers, masking may cause the outlier test for the first outlier to return a conclusion of no outliers (and so the testing for any additional outliers is not done). Also, applying a single outlier test sequentially does not properly adjust the critical value for the overall test.

    The masking/swamping issue explains the primary advantage of the generalized ESD test. When there is masking or swamping, it is not uncommon for the conclusion about the presence of outliers to change as the specified number of outliers changes. By weakening the assumption that the exact number of potential outliers is known to the assumption that only an upper bound is known (and we can always pick this upper bound a little high if we do not have a good handle on it), we are more likely to avoid distortions caused by masking or swamping.
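
    To see why a generous upper bound is harmless while a tight one can hide outliers, the following Python sketch applies the generalized ESD decision rule for several upper bounds r, using the statistics and 5% critical values from the summary table in the sample output. The function name detected is hypothetical. The sketch relies on the fact that each critical value depends only on n, i, and the significance level, so the same tabulated values apply for any r up to 10.

```python
def detected(stats, crits, r):
    """Number of outliers declared when the upper bound is r."""
    flagged = [i for i in range(1, r + 1) if stats[i - 1] > crits[i - 1]]
    return max(flagged, default=0)

# Values copied from the summary table of the sample output (5% level).
R = [3.11890, 2.94297, 3.17942, 2.81018, 2.81557,
     2.84817, 2.27932, 2.31036, 2.10158, 2.06717]
lam_05 = [3.15879, 3.15142, 3.14388, 3.13616, 3.12824,
          3.12012, 3.11179, 3.10324, 3.09445, 3.08542]

results = {r: detected(R, lam_05, r) for r in (1, 2, 3, 10)}
print(results)  # {1: 0, 2: 0, 3: 3, 10: 3}
```

    With an upper bound of 1 or 2 the masked outliers go undetected, while any upper bound of 3 or more (up to the 10 tested here) gives the same answer of 3.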

Note:
    Tests for outliers are dependent on knowing the distribution of the data. The generalized ESD test assumes that the data come from an approximately normal distribution. For this reason, it is strongly recommended that the extreme studentized deviate test be complemented with a normal probability plot. If the data are not approximately normally distributed, then the generalized ESD test may be detecting the non-normality of the data rather than the presence of outliers.
Note:
    You can specify the number of digits in the generalized ESD output with the command

      SET WRITE DECIMALS <value>
Note:
    The EXTREME STUDENTIZED DEVIATE TEST command automatically saves the following parameters:

      STATVAL = the value of the test statistic
      PVAL = the p-value of the test statistic
      CUTOFF0 = the 0 percent point of the reference distribution
      CUTOFF50 = the 50 percent point of the reference distribution
      CUTOFF75 = the 75 percent point of the reference distribution
      CUTOFF90 = the 90 percent point of the reference distribution
      CUTOFF95 = the 95 percent point of the reference distribution
      CUTOFF975 = the 97.5 percent point of the reference distribution
      CUTOFF99 = the 99 percent point of the reference distribution

    If the MULTIPLE or REPLICATED option is used, these values will be written to the file "dpst1f.dat" instead.

Note:
    In addition to the EXTREME STUDENTIZED DEVIATE TEST command, the following command can also be used:

      LET A = EXTREME STUDENTIZED DEVIATE Y

    In addition to the above LET command, built-in statistics are supported for 20+ different commands (enter HELP STATISTICS for details).

Default:
    None
Synonyms:
    ESD is a synonym for EXTREME STUDENTIZED DEVIATE
    MULTIPLE ESD is a synonym for ESD MULTIPLE
    REPLICATION ESD is a synonym for ESD REPLICATION
Related Commands:
References:
    Rosner, Bernard (May 1983), "Percentage Points for a Generalized ESD Many-Outlier Procedure," Technometrics, Vol. 25, No. 2, pp. 165-172.

    Iglewicz and Hoaglin (1993), "Volume 16: How to Detect and Handle Outliers," The ASQC Basic Reference in Quality Control: Statistical Techniques, Edward F. Mykytka, Ph.D., Editor.

Applications:
    Outlier Detection
Implementation Date:
    2009/11
    2011/08: Fixed bug where the table for "Conclusions (2-Tailed Test)" was printing the critical values in an inverted order
Program:
     
    .  Step 1: Data from Rosner paper
    .
    serial read y
    -0.25 0.68 0.94 1.15 1.20 1.26 1.26 1.34 1.38 1.43 1.49 1.49 1.55 1.56
     1.58 1.65 1.69 1.70 1.76 1.77 1.81 1.91 1.94 1.96 1.99 2.06 2.09 2.10
     2.14 2.15 2.23 2.24 2.26 2.35 2.37 2.40 2.47 2.54 2.62 2.64 2.90 2.92
     2.92 2.93 3.21 3.26 3.30 3.59 3.68 4.30 4.64 5.34 5.42 6.01
    end of data
    .
    .  Step 2: Generate a normal probability plot
    .
    title case asis
    title offset 2
    label case asis
    title Normal Probability Plot
    y1label Sorted Data
    x1label Theoretical Percent Points
    char circle
    char fill on
    char hw 1.2 0.8
    line blank
    normal prob plot y
    .
    .  Step 3: Perform the generalized ESD outlier test
    .
    set write decimals 5
    let noutlier = 10
    extreme studentized deviate test y
        
    The following output is generated.

    [Figure: normal probability plot generated by the sample program]

                Generalized Extreme Studentized Deviate Test for
                   Multiple Outliers (Assumption: Normality)
     
    Response Variable: Y
     
    Summary Statistics:
    Number of Observations:                              54
    Sample Minimum:                                -0.25000
    Sample Maximum:                                 6.00999
    Sample Mean:                                    2.32074
    Sample SD:                                      1.18286
     
     
     
     
    H0: There are no outliers
    Ha: There is exactly     1 outlier
    Potential Outlier Value Tested at This Step:              6.00999
     
    Extreme Studentized Deviate Test Statistic Value:         3.11890
     
    Percent Points of the Reference Distribution
    -----------------------------------
      Percent Point               Value
    -----------------------------------
                0.0    =          0.000
               50.0    =          2.532
               75.0    =          2.738
               90.0    =          2.987
               95.0    =          3.158
               97.5    =          3.318
               99.0    =          3.516
     
    Conclusions (2-Tailed Test)
    ----------------------------------------------
      Alpha    CDF   Critical Value     Conclusion
    ----------------------------------------------
        10%    90%            2.987      Reject H0
         5%    95%            3.158      Accept H0
       2.5%  97.5%            3.318      Accept H0
         1%    99%            3.516      Accept H0
     
     
     
    H0: There are no outliers
    Ha: There are exactly     2 outliers
    Potential Outlier Value Tested at This Step:              5.41999
     
    Extreme Studentized Deviate Test Statistic Value:         2.94297
     
    Percent Points of the Reference Distribution
    -----------------------------------
      Percent Point               Value
    -----------------------------------
                0.0    =          0.000
               50.0    =          2.524
               75.0    =          2.730
               90.0    =          2.980
               95.0    =          3.150
               97.5    =          3.311
               99.0    =          3.508
     
    Conclusions (2-Tailed Test)
    ----------------------------------------------
      Alpha    CDF   Critical Value     Conclusion
    ----------------------------------------------
        10%    90%            2.980      Accept H0
         5%    95%            3.150      Accept H0
       2.5%  97.5%            3.311      Accept H0
         1%    99%            3.508      Accept H0
     
     
     
    H0: There are no outliers
    Ha: There are exactly     3 outliers
    Potential Outlier Value Tested at This Step:              5.33999
     
    Extreme Studentized Deviate Test Statistic Value:         3.17942
     
    Percent Points of the Reference Distribution
    -----------------------------------
      Percent Point               Value
    -----------------------------------
                0.0    =          0.000
               50.0    =          2.516
               75.0    =          2.724
               90.0    =          2.972
               95.0    =          3.144
               97.5    =          3.303
               99.0    =          3.500
     
    Conclusions (2-Tailed Test)
    ----------------------------------------------
      Alpha    CDF   Critical Value     Conclusion
    ----------------------------------------------
        10%    90%            2.972      Reject H0
         5%    95%            3.144      Reject H0
       2.5%  97.5%            3.303      Accept H0
         1%    99%            3.500      Accept H0
     
     
     
    H0: There are no outliers
    Ha: There are exactly     4 outliers
    Potential Outlier Value Tested at This Step:              4.63999
     
    Extreme Studentized Deviate Test Statistic Value:         2.81018
     
    Percent Points of the Reference Distribution
    -----------------------------------
      Percent Point               Value
    -----------------------------------
                0.0    =          0.000
               50.0    =          2.509
               75.0    =          2.717
               90.0    =          2.964
               95.0    =          3.136
               97.5    =          3.295
               99.0    =          3.491
     
    Conclusions (2-Tailed Test)
    ----------------------------------------------
      Alpha    CDF   Critical Value     Conclusion
    ----------------------------------------------
        10%    90%            2.964      Accept H0
         5%    95%            3.136      Accept H0
       2.5%  97.5%            3.295      Accept H0
         1%    99%            3.491      Accept H0
     
     
     
    H0: There are no outliers
    Ha: There are exactly     5 outliers
    Potential Outlier Value Tested at This Step:             -0.25000
     
    Extreme Studentized Deviate Test Statistic Value:         2.81557
     
    Percent Points of the Reference Distribution
    -----------------------------------
      Percent Point               Value
    -----------------------------------
                0.0    =          0.000
               50.0    =          2.501
               75.0    =          2.709
               90.0    =          2.956
               95.0    =          3.128
               97.5    =          3.287
               99.0    =          3.482
     
    Conclusions (2-Tailed Test)
    ----------------------------------------------
      Alpha    CDF   Critical Value     Conclusion
    ----------------------------------------------
        10%    90%            2.956      Accept H0
         5%    95%            3.128      Accept H0
       2.5%  97.5%            3.287      Accept H0
         1%    99%            3.482      Accept H0
     
     
     
    H0: There are no outliers
    Ha: There are exactly     6 outliers
    Potential Outlier Value Tested at This Step:              4.29999
     
    Extreme Studentized Deviate Test Statistic Value:         2.84817
     
    Percent Points of the Reference Distribution
    -----------------------------------
      Percent Point               Value
    -----------------------------------
                0.0    =          0.000
               50.0    =          2.494
               75.0    =          2.701
               90.0    =          2.948
               95.0    =          3.120
               97.5    =          3.278
               99.0    =          3.474
     
    Conclusions (2-Tailed Test)
    ----------------------------------------------
      Alpha    CDF   Critical Value     Conclusion
    ----------------------------------------------
        10%    90%            2.948      Accept H0
         5%    95%            3.120      Accept H0
       2.5%  97.5%            3.278      Accept H0
         1%    99%            3.474      Accept H0
     
     
     
    H0: There are no outliers
    Ha: There are exactly     7 outliers
    Potential Outlier Value Tested at This Step:              3.67999
     
    Extreme Studentized Deviate Test Statistic Value:         2.27932
     
    Percent Points of the Reference Distribution
    -----------------------------------
      Percent Point               Value
    -----------------------------------
                0.0    =          0.000
               50.0    =          2.486
               75.0    =          2.693
               90.0    =          2.940
               95.0    =          3.112
               97.5    =          3.270
               99.0    =          3.463
     
    Conclusions (2-Tailed Test)
    ----------------------------------------------
      Alpha    CDF   Critical Value     Conclusion
    ----------------------------------------------
        10%    90%            2.940      Accept H0
         5%    95%            3.112      Accept H0
       2.5%  97.5%            3.270      Accept H0
         1%    99%            3.463      Accept H0
     
     
     
    H0: There are no outliers
    Ha: There are exactly     8 outliers
    Potential Outlier Value Tested at This Step:              3.58999
     
    Extreme Studentized Deviate Test Statistic Value:         2.31036
     
    Percent Points of the Reference Distribution
    -----------------------------------
      Percent Point               Value
    -----------------------------------
                0.0    =          0.000
               50.0    =          2.478
               75.0    =          2.685
               90.0    =          2.932
               95.0    =          3.103
               97.5    =          3.262
               99.0    =          3.455
     
    Conclusions (2-Tailed Test)
    ----------------------------------------------
      Alpha    CDF   Critical Value     Conclusion
    ----------------------------------------------
        10%    90%            2.932      Accept H0
         5%    95%            3.103      Accept H0
       2.5%  97.5%            3.262      Accept H0
         1%    99%            3.455      Accept H0
     
     
     
    H0: There are no outliers
    Ha: There are exactly     9 outliers
    Potential Outlier Value Tested at This Step:              0.68000
     
    Extreme Studentized Deviate Test Statistic Value:         2.10158
     
    Percent Points of the Reference Distribution
    -----------------------------------
      Percent Point               Value
    -----------------------------------
                0.0    =          0.000
               50.0    =          2.468
               75.0    =          2.677
               90.0    =          2.923
               95.0    =          3.093
               97.5    =          3.253
               99.0    =          3.444
     
    Conclusions (2-Tailed Test)
    ----------------------------------------------
      Alpha    CDF   Critical Value     Conclusion
    ----------------------------------------------
        10%    90%            2.923      Accept H0
         5%    95%            3.093      Accept H0
       2.5%  97.5%            3.253      Accept H0
         1%    99%            3.444      Accept H0
     
     
     
    H0: There are no outliers
    Ha: There are exactly    10 outliers
    Potential Outlier Value Tested at This Step:              3.29999
     
    Extreme Studentized Deviate Test Statistic Value:         2.06717
     
    Percent Points of the Reference Distribution
    -----------------------------------
      Percent Point               Value
    -----------------------------------
                0.0    =          0.000
               50.0    =          2.460
               75.0    =          2.668
               90.0    =          2.915
               95.0    =          3.084
               97.5    =          3.242
               99.0    =          3.435
     
    Conclusions (2-Tailed Test)
    ----------------------------------------------
      Alpha    CDF   Critical Value     Conclusion
    ----------------------------------------------
        10%    90%            2.915      Accept H0
         5%    95%            3.084      Accept H0
       2.5%  97.5%            3.242      Accept H0
         1%    99%            3.435      Accept H0
     
     
    Summary Table
    ----------------------------------------------------------------------
         Exact           Test       Critical       Critical       Critical
     Number of      Statistic          Value          Value          Value
      Outliers          Value            10%             5%             1%
    ----------------------------------------------------------------------
             1        3.11890        2.98680        3.15879        3.51571
             2        2.94297        2.97960        3.15142        3.50772
             3        3.17942        2.97224        3.14388        3.49952
             4        2.81018        2.96469        3.13616        3.49110
             5        2.81557        2.95697        3.12824        3.48246
             6        2.84817        2.94906        3.12012        3.47358
             7        2.27932        2.94094        3.11179        3.46445
             8        2.31036        2.93262        3.10324        3.45506
             9        2.10158        2.92408        3.09445        3.44539
            10        2.06717        2.91530        3.08542        3.43543
        
Date created: 09/09/2010
Last updated: 12/11/2023
