
DAVID TEST

Name:
    DAVID TEST
Type:
    Analysis Command
Purpose:
    Perform the David, Hartley and Pearson test for univariate outliers from a normal distribution.
Description:
    The David, Hartley and Pearson statistic tests whether the minimum and maximum values from a univariate dataset are simultaneously outliers. This test assumes that the data come from an approximately normal distribution.

    The test statistic is

      \[ D = \frac{r} {s} \]

    with \( r \) and \( s \) denoting the sample range and sample standard deviation, respectively.
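
    As a quick arithmetic check of the statistic above, the following minimal Python sketch computes D for the 15 observations used in the Program section of this page. The function name david_statistic is a hypothetical helper, not part of Dataplot, and the sketch assumes the sample standard deviation uses the n - 1 divisor, which is consistent with the Sample SD of 0.551 reported in the output.

```python
import statistics

def david_statistic(y):
    """Return D = r / s: sample range divided by sample standard deviation."""
    r = max(y) - min(y)
    s = statistics.stdev(y)  # sample SD, n - 1 in the denominator
    return r / s

# Data from the Program section of this page (ASTM E178 example).
y = [-1.40, -0.44, -0.30, -0.24, -0.22, -0.13, -0.05, 0.06,
     0.10, 0.18, 0.20, 0.39, 0.48, 0.63, 1.01]

d = david_statistic(y)
print(round(d, 3))  # the Dataplot output on this page reports 4.374
```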

    Dataplot supports several methods for determining the critical values for this test.

    1. The ASTM E178-16a standard provides tables for n = 3 to 50 and alpha levels of 0.10, 0.05 and 0.01. Linear interpolation is used for values of n not given in the table. For n > 50, the simulation method (see below) is used.

    2. The original paper by David contains tables for n = 3 to 1,000 and alpha levels of 0.10, 0.05, 0.025, 0.01 and 0.005. Linear interpolation is used for values of n not given in the table.

    3. The David paper suggests the following formula (equation 6 on page 485)

      \[ cv = \sqrt{ \frac{2(n-1)\,t_{(1-\alpha/(n(n-1)),\,n-2)}^{2}} {(n-2) + t_{(1-\alpha/(n(n-1)),\,n-2)}^{2}} } \]

      with \( t_{(p,\nu)} \) denoting the percent point function of the t distribution with \( \nu \) degrees of freedom evaluated at probability \( p \).

    4. Critical values can be obtained via simulation.
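
    The formula in method 3 can be evaluated with a short Python sketch. It assumes the t percent point is taken at probability 1 - alpha/(n(n-1)) with n - 2 degrees of freedom, a reading that reproduces the FORMULA critical values shown in the Program output on this page. The helpers t_pdf, t_cdf, t_ppf and david_formula_cv are hypothetical names. The t percent point is obtained by plain numerical integration and bisection to keep the sketch dependency-free, which is not necessarily how Dataplot computes it.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df):
    """CDF for x >= 0 via composite Simpson integration of the density on [0, x]."""
    steps = 2 * max(100, int(100 * x))  # even number of panels
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_ppf(p, df):
    """Percent point function for 0.5 <= p < 1, found by bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # expand the bracket until it contains the answer
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def david_formula_cv(n, alpha):
    """Approximate critical value from the formula, under the assumption above."""
    t = t_ppf(1 - alpha / (n * (n - 1)), n - 2)
    return math.sqrt(2 * (n - 1) * t * t / ((n - 2) + t * t))

# For n = 15 the FORMULA output on this page lists 4.173 (alpha = 0.05)
# and 4.435 (alpha = 0.01); the values printed here can be compared with those.
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(david_formula_cv(15, alpha), 3))
```

    The integration-plus-bisection approach is slow for very small n, where the t percent point becomes large; a statistics library's t percent point function would be the usual choice in practice.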

    To specify the method used to compute the critical value, enter one of the following commands (the default is ASTM)

      SET DAVID TEST CRITICAL VALUES ASTM
      SET DAVID TEST CRITICAL VALUES DAVID
      SET DAVID TEST CRITICAL VALUES FORMULA
      SET DAVID TEST CRITICAL VALUES SIMULATION

    This test is included in the ASTM E178 standard for outliers.

Syntax 1:
    DAVID TEST <y> <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable being tested;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 2:
    MULTIPLE DAVID TEST <y1> ... <yk>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> ... <yk> is a list of up to k response variables;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax performs the David test on <y1>, then on <y2>, and so on. Up to 30 response variables may be specified.

    Note that the syntax

      MULTIPLE DAVID TEST Y1 TO Y4

    is supported. This is equivalent to

      MULTIPLE DAVID TEST Y1 Y2 Y3 Y4
Syntax 3:
    REPLICATED DAVID TEST <y> <x1> ... <xk>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable;
                <x1> ... <xk> is a list of up to k group-id variables;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax performs a cross-tabulation of <x1> ... <xk> and performs a David test for each unique combination of cross-tabulated values. For example, if X1 has 3 levels and X2 has 2 levels, there will be a total of 6 David tests performed.
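
    The cross-tabulation can be illustrated with a minimal Python sketch that groups the response by each observed combination of group-id values and computes the statistic D (range divided by standard deviation) per cell. The function replicated_david and the data below are hypothetical and serve only to show the grouping; Dataplot additionally reports critical values and conclusions for each cell.

```python
from collections import defaultdict
import statistics

def replicated_david(y, *group_ids):
    """Return {group-key tuple: D} for each observed combination of group-id values."""
    cells = defaultdict(list)
    for row in zip(y, *group_ids):
        cells[tuple(row[1:])].append(row[0])
    results = {}
    for key, vals in sorted(cells.items()):
        s = statistics.stdev(vals) if len(vals) > 1 else 0.0
        # D is undefined when a cell has one value or no spread.
        results[key] = (max(vals) - min(vals)) / s if s > 0 else float("nan")
    return results

# Hypothetical data: X1 has 3 levels, X2 has 2 levels, 3 observations per cell.
x1 = [1] * 6 + [2] * 6 + [3] * 6
x2 = [1, 1, 1, 2, 2, 2] * 3
y = [1.0, 2.0, 3.0, 2.5, 2.7, 4.1, 0.3, 0.9, 1.1,
     5.0, 5.2, 7.9, 3.3, 3.4, 3.9, 1.2, 2.8, 2.9]

results = replicated_david(y, x1, x2)
print(len(results))  # 6 cells, one per combination of X1 and X2 levels
```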

    Up to six group-id variables can be specified.

    Note that the syntax

      REPLICATED DAVID TEST Y X1 TO X4

    is supported. This is equivalent to

      REPLICATED DAVID TEST Y X1 X2 X3 X4
Examples:
    DAVID TEST Y1
    MULTIPLE DAVID TEST Y1 Y2 Y3
    REPLICATED DAVID TEST Y X1 X2
    DAVID TEST Y1 SUBSET TAG > 2
Note:
    Tests for outliers depend on knowing the distribution of the data. The David test assumes that the data come from an approximately normal distribution. For this reason, it is strongly recommended that the David test be complemented with a normal probability plot. If the data are not approximately normally distributed, then the David test may be detecting the non-normality of the data rather than the presence of an outlier.
Note:
    You can specify the number of digits in the David output with the command

      SET WRITE DECIMALS <value>
Note:
    The DAVID TEST command automatically saves the following parameters:

      STATVAL = the value of the test statistic
      STATCDF = the CDF value of the test statistic
      PVALUE = the p-value of the test statistic
      CUTOFF80 = the 80 percent point of the reference distribution
      CUTOFF90 = the 90 percent point of the reference distribution
      CUTOFF95 = the 95 percent point of the reference distribution
      CUTOF975 = the 97.5 percent point of the reference distribution
      CUTOFF99 = the 99 percent point of the reference distribution

    The STATCDF and PVALUE are only saved when the simulation method is used to obtain critical values. If the ASTM method is used to obtain critical values, the CUTOFF80 and CUTOF975 values are not saved. When the DAVID method is used to obtain critical values, the CUTOFF80 value is not saved.

    If the MULTIPLE or REPLICATED option is used, these values will be written to the file "dpst1f.dat" instead.

Note:
    In addition to the DAVID TEST command, the following commands can also be used:

      LET A = DAVID TEST Y
      LET A = DAVID TEST CDF Y
      LET A = DAVID TEST PVALUE Y
      LET A = DAVID TEST MINIMUM INDEX Y
      LET A = DAVID TEST MAXIMUM INDEX Y

      LET ALPHA = <value>
      LET A = DAVID TEST CRITICAL VALUE Y

    The DAVID TEST, DAVID TEST CDF, and DAVID TEST PVALUE commands return the value of the test statistic, the CDF of the test statistic and the p-value of the test statistic, respectively. For the DAVID TEST CDF and DAVID TEST PVALUE commands, the simulation method will be used. Otherwise, the method specified by the SET DAVID TEST CRITICAL VALUES command will be used.

    The DAVID TEST MINIMUM INDEX and DAVID TEST MAXIMUM INDEX return the row index of the minimum and maximum values of the response variable, respectively.

    The DAVID TEST CRITICAL VALUE returns the critical value for the specified value of ALPHA. If ALPHA is not specified, it will be set to 0.05. Note that if the ASTM or DAVID methods are specified for the critical values, only a few select values for alpha are supported (0.01, 0.05 and 0.10 for ASTM and 0.005, 0.01, 0.025, 0.05 and 0.10 for DAVID).

    In addition to the above LET commands, built-in statistics are supported for about 25 different commands (enter HELP STATISTICS for details).

Default:
    The ASTM method is used to obtain critical values
Synonyms:
    None
Related Commands:
Reference:
    David, Hartley, and Pearson (1954), "The Distribution of the Ratio, in a Single Normal Sample, of Range to Standard Deviation", Biometrika, Vol. 41, pp. 482-493.

    ASTM E178-16a (2016), "Standard Practice for Dealing with Outlying Observations", ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959, USA.

Applications:
    Outlier Detection
Implementation Date:
    2019/10
Program:
     
    . Step 1:   Read the data - example from ASTM E178 standard
    .
    read y
    -1.40
    -0.44
    -0.30
    -0.24
    -0.22
    -0.13
    -0.05
    0.06
    0.10
    0.18
    0.20
    0.39
    0.48
    0.63
    1.01
    end of data
    .
    . Step 2:   Perform the DAVID TEST
    .
    set write decimals 3
    set david test critical values astm
    david test y
    set david test critical values david
    david test y
    set david test critical values formula
    david test y
    set david test critical values simulation
    david test y
        
    The following output is generated:
    THE FORTRAN COMMON CHARACTER VARIABLE DAVITEST HAS JUST BEEN SET TO ASTM
     
                David, Hartley, Pearson Test for Outliers: Simultaneous Test
                      for Minimum and Maximum (Assumption: Normality)
     
    Response Variable: Y
     
    H0: The minimum and maximum are not
        both outliers
    Ha: Both the minimum and maximum are
        outliers
    Potential minimum outlier value tested:          -1.400
    Potential maximum outlier value tested:           1.010
     
    Summary Statistics:
    Number of Observations:                              15
    Sample Minimum:                                  -1.400
    ID for Sample Minimum:                                1
    Sample Maximum:                                   1.010
    ID for Sample Maximum:                               15
    Sample Mean:                                      0.018
    Sample SD:                                        0.551
    Sample Range:                                     2.410
     
    David Test Statistic Value:                       4.374
     
     
    Conclusions (Upper 1-Tailed Test)
    -------------------------------------------------------------
      Alpha    CDF      Statistic   Critical Value     Conclusion
    -------------------------------------------------------------
        10%    90%          4.374            4.025      Reject H0
         5%    95%          4.374            4.171      Reject H0
         1%    99%          4.374            4.435      Accept H0
     
     
     
    Critical Values Based on ASTM E-178 Tables
     
     
    THE FORTRAN COMMON CHARACTER VARIABLE DAVITEST HAS JUST BEEN SET TO DAVI
     
                David, Hartley, Pearson Test for Outliers: Simultaneous Test
                      for Minimum and Maximum (Assumption: Normality)
     
    Response Variable: Y
     
    H0: The minimum and maximum are not
        both outliers
    Ha: Both the minimum and maximum are
        outliers
    Potential minimum outlier value tested:          -1.400
    Potential maximum outlier value tested:           1.010
     
    Summary Statistics:
    Number of Observations:                              15
    Sample Minimum:                                  -1.400
    ID for Sample Minimum:                                1
    Sample Maximum:                                   1.010
    ID for Sample Maximum:                               15
    Sample Mean:                                      0.018
    Sample SD:                                        0.551
    Sample Range:                                     2.410
     
    David Test Statistic Value:                       4.374
     
     
    Conclusions (Upper 1-Tailed Test)
    -------------------------------------------------------------
      Alpha    CDF      Statistic   Critical Value     Conclusion
    -------------------------------------------------------------
        10%    90%          4.374            4.020      Reject H0
         5%    95%          4.374            4.170      Reject H0
       2.5%  97.5%          4.374            4.290      Reject H0
         1%    99%          4.374            4.430      Accept H0
       0.5%  99.5%          4.374            4.530      Accept H0
     
     
     
    Critical Values Based on David Tables
     
     
    THE FORTRAN COMMON CHARACTER VARIABLE DAVITEST HAS JUST BEEN SET TO FORM
     
                David, Hartley, Pearson Test for Outliers: Simultaneous Test
                      for Minimum and Maximum (Assumption: Normality)
     
    Response Variable: Y
     
    H0: The minimum and maximum are not
        both outliers
    Ha: Both the minimum and maximum are
        outliers
    Potential minimum outlier value tested:          -1.400
    Potential maximum outlier value tested:           1.010
     
    Summary Statistics:
    Number of Observations:                              15
    Sample Minimum:                                  -1.400
    ID for Sample Minimum:                                1
    Sample Maximum:                                   1.010
    ID for Sample Maximum:                               15
    Sample Mean:                                      0.018
    Sample SD:                                        0.551
    Sample Range:                                     2.410
     
    David Test Statistic Value:                       4.374
     
     
    Conclusions (Upper 1-Tailed Test)
    -------------------------------------------------------------
      Alpha    CDF      Statistic   Critical Value     Conclusion
    -------------------------------------------------------------
        20%    80%          4.374            3.875      Reject H0
        10%    90%          4.374            4.034      Reject H0
         5%    95%          4.374            4.173      Reject H0
       2.5%  97.5%          4.374            4.295      Reject H0
         1%    99%          4.374            4.435      Accept H0
       0.5%  99.5%          4.374            4.527      Accept H0
     
     
     
    Critical Values Based on Formula
     
     
    THE FORTRAN COMMON CHARACTER VARIABLE DAVITEST HAS JUST BEEN SET TO SIMU
     
                David, Hartley, Pearson Test for Outliers: Simultaneous Test
                      for Minimum and Maximum (Assumption: Normality)
     
    Response Variable: Y
     
    H0: The minimum and maximum are not
        both outliers
    Ha: Both the minimum and maximum are
        outliers
    Potential minimum outlier value tested:          -1.400
    Potential maximum outlier value tested:           1.010
     
    Summary Statistics:
    Number of Observations:                              15
    Sample Minimum:                                  -1.400
    ID for Sample Minimum:                                1
    Sample Maximum:                                   1.010
    ID for Sample Maximum:                               15
    Sample Mean:                                      0.018
    Sample SD:                                        0.551
    Sample Range:                                     2.410
     
    David Test Statistic Value:                       4.374
    CDF Value:                                        0.986
    P-Value                                           0.014
     
     
     
    Conclusions (Upper 1-Tailed Test)
    -------------------------------------------------------------
      Alpha    CDF      Statistic   Critical Value     Conclusion
    -------------------------------------------------------------
        20%    80%          4.374            3.842      Reject H0
        10%    90%          4.374            4.021      Reject H0
         5%    95%          4.374            4.166      Reject H0
       2.5%  97.5%          4.374            4.296      Reject H0
         1%    99%          4.374            4.428      Accept H0
       0.5%  99.5%          4.374            4.523      Accept H0
     
     
     
    Critical Values Based on 50,000 Simulations
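
    The idea behind the simulation-based critical values can be sketched in Python. This is a minimal Monte Carlo illustration, not Dataplot's implementation: it assumes standard normal samples (the statistic does not depend on location or scale), uses 20,000 replications rather than 50,000 to keep the run short, and uses a fixed, arbitrary seed. The names simulate_david, percent_point and empirical_cdf are hypothetical.

```python
import bisect
import math
import random

def simulate_david(n, reps=20000, seed=12345):
    """Return the sorted simulated values of D = range / SD for normal samples of size n."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(x) / n
        s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
        stats.append((max(x) - min(x)) / s)
    stats.sort()
    return stats

def percent_point(sorted_stats, p):
    """Empirical percent point: the order statistic at position floor(p * N)."""
    idx = min(len(sorted_stats) - 1, int(p * len(sorted_stats)))
    return sorted_stats[idx]

def empirical_cdf(sorted_stats, value):
    """Fraction of simulated statistics less than or equal to value."""
    return bisect.bisect_right(sorted_stats, value) / len(sorted_stats)

sim = simulate_david(15)
cv95 = percent_point(sim, 0.95)   # compare with 4.166 in the table above
cdf = empirical_cdf(sim, 4.374)   # compare with the CDF Value of 0.986 above
print(round(cv95, 2), round(cdf, 3), round(1 - cdf, 3))
```

    Because the values come from random sampling, they agree with the Dataplot output only to within simulation error, and they change slightly with the seed and the number of replications.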
        
Date created: 01/22/2020
Last updated: 10/16/2024

Please email comments on this WWW page to alan.heckert@nist.gov.