

Purpose: Test for distributional adequacy 
The chi-square test (Snedecor and Cochran, 1989) is used to test whether a sample
of data came from a population with a specific distribution.
An attractive feature of the chi-square goodness-of-fit test is that it can be applied to any univariate distribution for which you can calculate the cumulative distribution function. The chi-square goodness-of-fit test is applied to binned data (i.e., data put into classes). This is not actually a restriction, since for non-binned data you can simply calculate a histogram or frequency table before generating the chi-square test. However, the value of the chi-square test statistic depends on how the data are binned. Another disadvantage of the chi-square test is that it requires a sufficient sample size in order for the chi-square approximation to be valid.

The chi-square test is an alternative to the Anderson-Darling and Kolmogorov-Smirnov goodness-of-fit tests. The chi-square goodness-of-fit test can be applied to discrete distributions such as the binomial and the Poisson. The Kolmogorov-Smirnov and Anderson-Darling tests are restricted to continuous distributions.

Additional discussion of the chi-square goodness-of-fit test is contained in the product and process comparisons chapter (chapter 7).
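
The dependence of the statistic on the binning can be seen in a small numerical sketch. The Python below is illustrative only: the helper names `chi_square_statistic` and `interior_edges`, the sample size of 500, the seed, and the two bin layouts are assumptions made for this example and are not part of the handbook's analysis. It bins one standard normal sample in two ways and computes the usual sum of (observed - expected)^2 / expected for each layout.

```python
import bisect
import random
from statistics import NormalDist


def chi_square_statistic(data, edges, cdf):
    """Sum of (O - E)^2 / E over the bins defined by sorted interior edges.

    The first and last bins are open-ended, so the bin probabilities
    computed from the hypothesized CDF add up to 1.
    """
    n = len(data)
    k = len(edges) + 1
    observed = [0] * k
    for y in data:
        observed[bisect.bisect_right(edges, y)] += 1
    probs = (
        [cdf(edges[0])]
        + [cdf(b) - cdf(a) for a, b in zip(edges, edges[1:])]
        + [1.0 - cdf(edges[-1])]
    )
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))


def interior_edges(k):
    """k - 1 equally spaced edges from -2 to 2, giving k bins (hypothetical layout)."""
    return [-2.0 + 4.0 * i / (k - 2) for i in range(k - 1)]


rng = random.Random(0)                       # fixed seed, chosen arbitrarily
data = [rng.gauss(0.0, 1.0) for _ in range(500)]
std_normal_cdf = NormalDist(0.0, 1.0).cdf    # fully specified null distribution

stat_8 = chi_square_statistic(data, interior_edges(8), std_normal_cdf)
stat_16 = chi_square_statistic(data, interior_edges(16), std_normal_cdf)
print(f"k =  8 bins: statistic = {stat_8:.3f}")
print(f"k = 16 bins: statistic = {stat_16:.3f}")
```

The same data yield two different statistics, each of which would be compared against a critical value with its own degrees of freedom, so the bin layout should be chosen before looking at the result.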

Definition 
The chi-square test is defined for the hypothesis:
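
In its standard textbook form, which is assumed here and written with the same symbols k, c, and α that appear in the example in this section, the test can be sketched as follows. The symbols O_i, E_i, F, Y_u, Y_l, and N are introduced for this sketch: O_i and E_i are the observed and expected frequencies for bin i, F is the cumulative distribution function of the distribution being tested, Y_u and Y_l are the upper and lower limits of bin i, N is the sample size, and c is the number of estimated parameters plus one.

```latex
\begin{align*}
H_0 &: \text{the data follow the specified distribution} \\
H_a &: \text{the data do not follow the specified distribution} \\[4pt]
\chi^2 &= \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = N \bigl( F(Y_u) - F(Y_l) \bigr) \\[4pt]
&\text{Reject } H_0 \text{ at significance level } \alpha
\text{ if } \chi^2 > \chi^2_{1-\alpha,\, k-c}
\end{align*}
```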


Chi-Square Test Example 
We generated 1,000 random numbers for normal,
double exponential, t with 3 degrees
of freedom, and lognormal distributions. In all cases,
a chi-square test with k = 32 bins was applied to test
for normally distributed data. Because the normal distribution
has two parameters, c = 2 + 1 = 3.

The normal random numbers were stored in the variable Y1, the double exponential random numbers were stored in the variable Y2, the t random numbers were stored in the variable Y3, and the lognormal random numbers were stored in the variable Y4.

H_{0}: the data are normally distributed
H_{a}: the data are not normally distributed

Y1 Test statistic: χ^{2} = 32.256
Y2 Test statistic: χ^{2} = 91.776
Y3 Test statistic: χ^{2} = 101.488
Y4 Test statistic: χ^{2} = 1085.104

Significance level: α = 0.05
Degrees of freedom: k - c = 32 - 3 = 29
Critical value: χ^{2}_{1-α, k-c} = 42.557
Critical region: Reject H_{0} if χ^{2} > 42.557

As we would hope, the chi-square test fails to reject the null hypothesis for the normally distributed data set and rejects the null hypothesis for the three non-normal data sets.
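
A comparable experiment can be sketched in Python using only the standard library. This is not the Dataplot or R code behind the numbers above. The seed, the random number generators, the helper name `chi_square_normal_fit`, and in particular the choice of 32 equal-probability bins under the fitted normal are assumptions made for illustration; the source does not say how its bins were formed. The critical value 42.557 is taken from the example rather than computed.

```python
import bisect
import math
import random
from statistics import NormalDist, fmean, stdev

K_BINS = 32               # number of bins, as in the example
C = 3                     # two estimated normal parameters plus one
CRITICAL_VALUE = 42.557   # 0.95 quantile of chi-square with 29 df, from the text


def chi_square_normal_fit(data, k=K_BINS):
    """Chi-square statistic for a normal fit with k equal-probability bins.

    Assumption: bins are cut at the 1/k, 2/k, ... quantiles of the normal
    distribution fitted by the sample mean and standard deviation, so every
    bin has the same expected count N / k.
    """
    n = len(data)
    fitted = NormalDist(fmean(data), stdev(data))
    edges = [fitted.inv_cdf(i / k) for i in range(1, k)]
    observed = [0] * k
    for y in data:
        observed[bisect.bisect_right(edges, y)] += 1
    expected = n / k
    return sum((o - expected) ** 2 / expected for o in observed)


rng = random.Random(12345)   # arbitrary fixed seed
n = 1000
samples = {
    "Y1 normal": [rng.gauss(0.0, 1.0) for _ in range(n)],
    # Double exponential: an exponential magnitude with a random sign.
    "Y2 double exponential": [
        rng.choice((-1, 1)) * rng.expovariate(1.0) for _ in range(n)
    ],
    # t with 3 df: standard normal over sqrt(chi-square(3) / 3);
    # chi-square(3) is a gamma variate with shape 1.5 and scale 2.
    "Y3 t, 3 df": [
        rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(1.5, 2.0) / 3.0)
        for _ in range(n)
    ],
    "Y4 lognormal": [rng.lognormvariate(0.0, 1.0) for _ in range(n)],
}

results = {name: chi_square_normal_fit(y) for name, y in samples.items()}
for name, stat in results.items():
    decision = "reject H0" if stat > CRITICAL_VALUE else "fail to reject H0"
    print(f"{name:<22} statistic = {stat:9.3f}  df = {K_BINS - C}  {decision}")
```

Because the random numbers and the bin layout differ from those used for the example, the statistics will not match the values quoted above; only the qualitative pattern is expected to be similar.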

Questions 
The chi-square test can be used to answer the following
types of questions:


Importance 
Many statistical tests and procedures are based on specific
distributional assumptions.
The assumption of normality
is particularly common in classical statistical tests.
Much reliability modeling is based on the assumption that
the distribution of the data follows a Weibull distribution.

There are many nonparametric and robust techniques that are not based on strong distributional assumptions. By nonparametric, we mean a technique, such as the sign test, that is not based on a specific distributional assumption. By robust, we mean a statistical technique that performs well under a wide range of distributional assumptions. However, techniques based on specific distributional assumptions are in general more powerful than these nonparametric and robust techniques. By power, we mean the ability to detect a difference when that difference actually exists. Therefore, if the distributional assumption can be confirmed, the parametric techniques are generally preferred.

If you are using a technique that makes a normality (or some other type of distributional) assumption, it is important to confirm that this assumption is in fact justified. If it is, the more powerful parametric techniques can be used. If the distributional assumption is not justified, a nonparametric or robust technique may be required.
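
The power comparison can be illustrated with a small simulation. The sketch below is an assumption-laden example rather than anything taken from the handbook: it draws normal samples of size 30 whose true mean is 0.5, then counts how often a one-sample t test and an exact sign test (the nonparametric technique named above) reject the hypothesis that the center is 0 at the 5% level. The sample size, shift, seed, number of repetitions, and helper names are all hypothetical choices, and 2.045 is the two-sided 5% critical value of Student's t with 29 degrees of freedom.

```python
import math
import random
from statistics import fmean, stdev

N = 30          # observations per simulated data set
SHIFT = 0.5     # true mean and median; the null hypothesis says 0
T_CRIT = 2.045  # two-sided 5% critical value, Student's t with 29 df
REPS = 2000     # number of simulated data sets


def t_statistic(data):
    """One-sample t statistic for H0: mean = 0."""
    return fmean(data) / (stdev(data) / math.sqrt(len(data)))


def sign_test_p_value(data):
    """Exact two-sided sign test p-value for H0: median = 0."""
    n = len(data)
    positives = sum(1 for y in data if y > 0)
    tail = min(positives, n - positives)
    one_sided = sum(math.comb(n, j) for j in range(tail + 1)) / 2 ** n
    return min(1.0, 2.0 * one_sided)


rng = random.Random(2024)   # arbitrary fixed seed
t_rejections = 0
sign_rejections = 0
for _ in range(REPS):
    sample = [rng.gauss(SHIFT, 1.0) for _ in range(N)]
    t_rejections += abs(t_statistic(sample)) > T_CRIT
    sign_rejections += sign_test_p_value(sample) < 0.05

power_t = t_rejections / REPS
power_sign = sign_rejections / REPS
print(f"estimated power, t test:    {power_t:.3f}")
print(f"estimated power, sign test: {power_sign:.3f}")
```

When the normality assumption holds, as it does by construction here, the t test should detect the shift more often than the sign test, which is the sense in which the parametric technique is more powerful.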

Related Techniques 
Anderson-Darling Goodness-of-Fit Test
Kolmogorov-Smirnov Test
Shapiro-Wilk Normality Test
Probability Plots
Probability Plot Correlation Coefficient Plot

Software 
Some general-purpose statistical software programs provide a chi-square goodness-of-fit test for at least some of the common distributions. Both Dataplot code and R code can be used to generate the analyses in this section.