Purpose: Test for Distributional Adequacy
The Anderson-Darling test (Stephens, 1974) is used to test whether a
sample of data came from a population with a specific distribution.
It is a modification of the Kolmogorov-Smirnov (K-S) test and gives
more weight to the tails than the K-S test does. The K-S test is
distribution free in the sense that the critical values do not depend
on the specific distribution being tested (note that this is true only
for a fully specified distribution, i.e., one whose parameters are
known). The Anderson-Darling test makes use of the specific
distribution in calculating critical values. This has the advantage of
allowing a more sensitive test and the disadvantage that critical
values must be calculated for each distribution.
Currently, tables of critical values are available for the normal,
uniform, lognormal, exponential, Weibull, extreme value type I,
generalized Pareto, and logistic distributions. We do not provide the
tables of critical values in this Handbook (see Stephens 1974, 1976,
1977, and 1979) since this test is usually applied with a statistical
software program that will print the relevant critical values.
The Anderson-Darling test is an alternative to the chi-square and
Kolmogorov-Smirnov goodness-of-fit tests.
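To make "more weight to the tails" concrete, it may help to compare the two statistics in their standard integral forms. These forms are not given in this section and are added here as an illustration. Both measure the distance between the empirical CDF F_N and the hypothesized CDF F. K-S takes the largest unweighted gap. Anderson-Darling integrates the squared gap, weighted by 1/[F(1 - F)]:

```latex
% Kolmogorov-Smirnov: largest unweighted gap between F_N and F
D = \sup_x \left\lvert F_N(x) - F(x) \right\rvert

% Anderson-Darling: squared gap weighted by 1 / ( F(x) [1 - F(x)] )
A^2 = N \int_{-\infty}^{\infty}
      \frac{\left[ F_N(x) - F(x) \right]^2}{F(x)\left[ 1 - F(x) \right]} \, dF(x)

% The weight is smallest at the median and grows in the tails:
%   F = 0.5   ->  1 / (0.5   * 0.5)   = 4
%   F = 0.01  ->  1 / (0.01  * 0.99)  = 101.01 (approx.)
%   F = 0.001 ->  1 / (0.001 * 0.999) = 1001.0 (approx.)
```

Under this weighting, a gap of a given size at the 1st percentile contributes about 25 times as much to A^2 as the same gap at the median.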
Definition
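The Anderson-Darling test is defined as:
H0: The data follow a specified distribution.
Ha: The data do not follow the specified distribution.
Test statistic: The Anderson-Darling test statistic is defined as
$$A^2 = -N - S$$
where
$$S = \sum_{i=1}^{N} \frac{2i - 1}{N} \left[ \ln F(Y_i) + \ln\left(1 - F(Y_{N+1-i})\right) \right]$$
F is the cumulative distribution function of the specified distribution, and the Y_i are the ordered data (Y_1 ≤ Y_2 ≤ ... ≤ Y_N).
Significance level: α
Critical region: The test is one-sided. The hypothesis that the data follow the specified distribution is rejected if A^2 is greater than the critical value, which depends on the distribution being tested.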
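The computational formula can be transcribed directly into Python. The statistic is A^2 = -N - S, where S = Σ_{i=1}^{N} (2i - 1)/N [ln F(Y_i) + ln(1 - F(Y_{N+1-i}))], computed over the ordered data Y_1 ≤ ... ≤ Y_N. This is a minimal sketch under stated assumptions. The function name `anderson_darling_statistic` is hypothetical and not from the Handbook. The hypothesized distribution is assumed to be fully specified and is passed in as a CDF callable. No adjustment constant or critical value is applied.

```python
import math
from statistics import NormalDist
from typing import Callable, Sequence


def anderson_darling_statistic(data: Sequence[float],
                               cdf: Callable[[float], float]) -> float:
    """Unadjusted Anderson-Darling statistic A^2 = -N - S (hypothetical helper).

    `cdf` is the CDF of a fully specified hypothesized distribution.
    math.log raises ValueError if `cdf` returns exactly 0 or 1 for a data point.
    """
    y = sorted(data)                  # the formula uses the ordered data Y_1 <= ... <= Y_N
    n = len(y)
    if n == 0:
        raise ValueError("need at least one observation")
    s = 0.0
    for i in range(1, n + 1):         # i is the 1-based index used in the formula
        f_lower = cdf(y[i - 1])       # F(Y_i)
        f_upper = cdf(y[n - i])       # F(Y_{N+1-i}), counting down from the largest value
        s += (2 * i - 1) / n * (math.log(f_lower) + math.log(1.0 - f_upper))
    return -n - s


if __name__ == "__main__":
    # Hypothetical usage: compare a small sample with a standard normal N(0, 1).
    sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.6]
    print(anderson_darling_statistic(sample, NormalDist(0.0, 1.0).cdf))
```

In this sketch, `statistics.NormalDist(mu, sigma).cdf` supplies F for a normal hypothesis. Any other fully specified CDF can be passed in the same way. If the parameters are estimated from the data, the critical values change, as discussed in the text.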
Sample Output
We generated 1,000 random numbers from each of four distributions:
normal, double exponential, Cauchy, and lognormal. In all four cases,
the Anderson-Darling test was applied to test for a normal distribution.
The normal, double exponential, Cauchy, and lognormal random numbers
were stored in the variables Y1, Y2, Y3, and Y4, respectively.
Distribution                Mean        Standard Deviation
-----------------------   --------    ------------------
Normal (Y1)               0.004360     1.001816
Double Exponential (Y2)   0.020349     1.321627
Cauchy (Y3)               1.503854    35.130590
Lognormal (Y4)            1.518372     1.719969
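The data-generation step can be sketched with Python's standard `random` module. The Handbook does not state the distribution parameters. The standard forms used below are therefore assumptions: location 0 and scale 1 for the normal, double exponential, and Cauchy, and a lognormal whose logarithm is N(0, 1). The seed and the function name `generate_samples` are hypothetical. Because the draws differ from the Handbook's, the printed means and standard deviations will not match the table above exactly.

```python
import math
import random
import statistics


def generate_samples(n: int = 1000, seed: int = 12345) -> dict[str, list[float]]:
    """Draw n values from each of four distributions (assumed standard parameters)."""
    rng = random.Random(seed)  # hypothetical seed, used only for reproducibility
    return {
        # Y1: standard normal N(0, 1)
        "Y1": [rng.gauss(0.0, 1.0) for _ in range(n)],
        # Y2: double exponential (Laplace), location 0, scale 1;
        # the difference of two independent Exp(1) draws has this distribution
        "Y2": [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)],
        # Y3: standard Cauchy via its inverse CDF, tan(pi * (U - 1/2))
        "Y3": [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)],
        # Y4: lognormal with log(Y4) ~ N(0, 1)
        "Y4": [rng.lognormvariate(0.0, 1.0) for _ in range(n)],
    }


if __name__ == "__main__":
    for name, values in generate_samples().items():
        print(f"{name}: mean = {statistics.mean(values):10.6f}, "
              f"sd = {statistics.stdev(values):10.6f}")
```

The Cauchy column is the one to watch. Its population mean and variance do not exist, so its sample mean and standard deviation vary wildly between seeds, as the large standard deviation in the table suggests.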
H0: the data are normally distributed
Ha: the data are not normally distributed
Y1 adjusted test statistic: A² = 0.2576
Y2 adjusted test statistic: A² = 5.8492
Y3 adjusted test statistic: A² = 288.7863
Y4 adjusted test statistic: A² = 83.3935
Significance level: α = 0.05
Critical value: 0.752
Critical region: Reject H0 if A² > 0.752
When the data were generated using a normal distribution, the
test statistic (0.2576) was below the critical value and the hypothesis
of normality was not rejected. When the data were generated using the
double exponential, Cauchy, and lognormal distributions, the test
statistics (5.8492, 288.7863, and 83.3935) all exceeded the critical
value of 0.752, and the hypothesis of an underlying normal
distribution was rejected at the 0.05 significance level.
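The normality test used in the sample output can be sketched as follows, under stated assumptions. The Handbook does not say how the statistic was "adjusted". This sketch assumes that both the mean and the standard deviation are estimated from the data. It then applies the small-sample correction A*² = A²(1 + 0.75/N + 2.25/N²), whose 0.05 critical value of 0.752 (tabulated in D'Agostino and Stephens, 1986) matches the critical value in the output above. Other published constants give different adjusted values, so results from other software may differ. The names `adjusted_ad_normal` and `reject_normality` are hypothetical. Clamping CDF values away from 0 and 1 is an implementation choice, not part of the definition. It keeps heavy-tailed data such as Cauchy samples from hitting log(0).

```python
import math
import statistics
from statistics import NormalDist
from typing import Sequence

# alpha = 0.05 critical value for the adjusted statistic, as in the output above
CRITICAL_VALUE_05 = 0.752


def adjusted_ad_normal(data: Sequence[float], eps: float = 1e-12) -> float:
    """Adjusted A^2 for H0: data are normal, with mean and sd estimated from the data.

    Assumes the correction A*^2 = A^2 * (1 + 0.75/N + 2.25/N^2).
    """
    y = sorted(data)                     # ordered data Y_1 <= ... <= Y_N
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations to estimate the sd")
    mean = statistics.fmean(y)
    sd = statistics.stdev(y)             # sample standard deviation (divisor N - 1)
    if sd == 0:
        raise ValueError("data have zero variance")
    std_normal = NormalDist()
    # F(Y_i) under the fitted normal, clamped so math.log never receives 0
    f = [min(max(std_normal.cdf((v - mean) / sd), eps), 1.0 - eps) for v in y]
    s = sum((2 * i - 1) / n * (math.log(f[i - 1]) + math.log(1.0 - f[n - i]))
            for i in range(1, n + 1))
    a2 = -n - s
    return a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)


def reject_normality(data: Sequence[float]) -> bool:
    """Reject H0 at alpha = 0.05 if the adjusted statistic exceeds 0.752."""
    return adjusted_ad_normal(data) > CRITICAL_VALUE_05


if __name__ == "__main__":
    import random
    rng = random.Random(2024)  # hypothetical seed
    normal_sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    cauchy_sample = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(1000)]
    for name, sample in [("normal", normal_sample), ("Cauchy", cauchy_sample)]:
        print(name, round(adjusted_ad_normal(sample), 4), reject_normality(sample))
```

The 0.752 cutoff belongs to the estimated-parameter form used here. Critical values for a fully specified normal distribution are different, so the same threshold should not be reused with the unadjusted statistic.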
Questions
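The Anderson-Darling test can be used to answer the following
questions:
Are the data from a normal distribution?
Are the data from a lognormal distribution?
Are the data from a Weibull distribution?
Are the data from an exponential distribution?
Are the data from a logistic distribution?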
Importance
Many statistical tests and procedures are based on specific
distributional assumptions. The assumption of normality
is particularly common in classical statistical tests.
Much reliability modeling is based on the assumption
that the data follow a Weibull distribution.
There are many non-parametric and robust techniques that do not make strong distributional assumptions. However, techniques based on specific distributional assumptions are in general more powerful than non-parametric and robust techniques. Therefore, if the distributional assumptions can be validated, they are generally preferred.
Related Techniques
Chi-Square Goodness-of-Fit Test
Kolmogorov-Smirnov Test
Shapiro-Wilk Normality Test
Probability Plot
Probability Plot Correlation Coefficient Plot
Case Study
Josephson junction cryothermometry case study.
Software
The Anderson-Darling goodness-of-fit test is available in some general-purpose statistical software programs. Both Dataplot code and R code can be used to generate the analyses in this section.