KOLMOGOROV SMIRNOV GOODNESS OF FIT TEST

Name:

NOTE: This command has been replaced with the unified GOODNESS OF FIT command.

Type:

Analysis Command

Purpose:

Perform a Kolmogorov-Smirnov goodness of fit test that a set of data come from a hypothesized continuous distribution. Dataplot currently supports the Kolmogorov-Smirnov goodness of fit test for 60+ distributions.

Description:

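The Kolmogorov-Smirnov (K-S) test is based on the empirical distribution function (ECDF). Given N ordered data points Y_1, Y_2, ..., Y_N, the ECDF is defined as

E_N = n(i)/N

where n(i) is the number of points less than Y_i. This is a step function that increases by 1/N at the value of each data point.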
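The step function described above can be sketched in a few lines of Python. This is an illustration only, not Dataplot code: the helper name ecdf and the sample values are hypothetical, and the sketch assumes the common convention of counting points less than or equal to x, so the step up occurs exactly at each data value.

```python
from bisect import bisect_right


def ecdf(data):
    """Return a function x -> fraction of data points less than or equal to x."""
    ordered = sorted(data)
    n = len(ordered)

    def evaluate(x):
        # bisect_right gives the number of ordered values that are <= x.
        return bisect_right(ordered, x) / n

    return evaluate


# Hypothetical sample of N = 4 points, so each step has height 1/4.
sample = [2.1, 0.4, 1.7, 3.3]
e_n = ecdf(sample)
values = [e_n(x) for x in (0.0, 0.4, 1.0, 2.1, 5.0)]
print(values)
```

The function is 0 below the smallest data value, 1 at and above the largest, and rises by 1/N at each data point in between.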

The empirical distribution function can be plotted together with the cumulative distribution function of a given distribution. The K-S test is based on the maximum distance between these two curves. An example of this plot for a sample of 100 normal random numbers is given here.

An attractive feature of this test is that the distribution of the K-S test statistic itself does not depend on the underlying cumulative distribution function being tested. Another advantage is that it is an exact test (the chi-square goodness of fit test depends on an adequate sample size for the approximations to be valid). Despite these advantages, the K-S test has several important limitations:

1. It only applies to continuous distributions.
2. It tends to be more sensitive near the center of the distribution than it is at the tails.
3. Perhaps the most serious limitation is that the distribution must be fully specified. That is, if location, scale, and shape parameters are estimated from the data, the critical region of the K-S test is no longer valid. It typically must be determined by simulation.

Due to limitations 2 and 3 above, many analysts prefer to use the Anderson-Darling goodness of fit test. The Anderson-Darling test is also more powerful than the K-S test since it makes specific use of the underlying cumulative distribution. However, the Anderson-Darling test is only available for a few specific distributions.
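The limitation about estimated parameters can be illustrated with a small simulation. The following Python sketch is an assumption-laden illustration, not a Dataplot procedure: it draws normal samples, estimates the mean and standard deviation from each sample, computes the K-S statistic against the fitted normal, and takes an upper percentile of the simulated statistics as the critical value. The function names, the sample size of 20, the number of replications, and the seed are all hypothetical choices.

```python
import math
import random
import statistics


def normal_cdf(x, loc, scale):
    """Normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf((x - loc) / (scale * math.sqrt(2.0))))


def ks_statistic(data, cdf):
    """Largest gap between the ECDF steps and the hypothesized CDF."""
    ordered = sorted(data)
    n = len(ordered)
    d = 0.0
    for i, y in enumerate(ordered, start=1):
        f = cdf(y)
        d = max(d, f - (i - 1) / n, i / n - f)
    return d


def simulated_critical_value(n, alpha=0.05, reps=2000, seed=12345):
    """Simulate the K-S statistic when location and scale are estimated."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = statistics.mean(sample)    # estimated location
        s = statistics.stdev(sample)   # estimated scale
        stats.append(ks_statistic(sample, lambda x: normal_cdf(x, m, s)))
    stats.sort()
    # Index of the empirical (1 - alpha) quantile of the simulated statistics.
    index = min(reps - 1, math.ceil((1.0 - alpha) * reps) - 1)
    return stats[index]


crit = simulated_critical_value(20)
print(f"simulated 5% critical value, n = 20: {crit:.3f}")
```

The simulated critical value is noticeably smaller than the value for a fully specified distribution, which is why using the standard table with estimated parameters makes the test too conservative.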

More formally, the Kolmogorov-Smirnov goodness of fit test can be defined as follows.

H₀: The data follow the specified distribution.

H_a: The data do not follow the specified distribution.

Test Statistic: The Kolmogorov-Smirnov goodness of fit test statistic is defined as

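D = max over 1 <= i <= N of max( F(Y(i)) - (i-1)/N, i/N - F(Y(i)) )

where F is the theoretical cumulative distribution function of the distribution being tested and Y(i) is the i-th smallest data value. Both differences are needed because the empirical distribution function jumps from (i-1)/N to i/N at Y(i).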
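A minimal Python sketch of the test statistic follows. It is an illustration under stated assumptions rather than Dataplot's implementation: it checks the gap between the hypothesized cumulative distribution function and the empirical distribution function on both sides of each step, and the names ks_statistic and normal_cdf as well as the sample values are hypothetical.

```python
import math


def normal_cdf(x, loc=0.0, scale=1.0):
    """Normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf((x - loc) / (scale * math.sqrt(2.0))))


def ks_statistic(data, cdf):
    """Return D, the largest distance between the ECDF and cdf."""
    ordered = sorted(data)
    n = len(ordered)
    d = 0.0
    for i, y in enumerate(ordered, start=1):
        f = cdf(y)
        # The ECDF equals (i - 1)/n just below y and i/n at y.
        d = max(d, f - (i - 1) / n, i / n - f)
    return d


# Hypothetical data tested against a standard normal distribution.
sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
d = ks_statistic(sample, normal_cdf)
print(f"D = {d:.4f}")
```

For the two points 0.25 and 0.75 tested against a uniform distribution on (0, 1), every gap equals 0.25, so the function returns 0.25.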

Significance Level: alpha

Critical Region: The hypothesis regarding the distributional form is rejected if the test statistic, D, is greater than the critical value obtained from a table. There are several variations of these tables in the literature that use somewhat different scalings for the K-S test statistic and critical regions. These alternative formulations should be equivalent, but it is necessary to ensure that the test statistic is calculated in a way that is consistent with how the critical values were tabulated.
Dataplot uses the critical values from Chakravarti, Laha, and Roy (see Reference: below).

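In order to apply the K-S goodness of fit test, any shape parameters must be specified before the command is entered. For example, the Weibull case under Examples: below sets the shape parameter with LET GAMMA = 2.0 before the test command.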

The names of the shape parameters for each distribution family are given in the list under Syntax: below.

Location and scale parameters can be specified generically with the following commands:

The location and scale parameters default to 0 and 1 if not specified.
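The role of the location and scale parameters can be sketched as follows. This Python illustration assumes the usual convention that a location-scale family evaluates the standard cumulative distribution function at (x - location)/scale; the helper names are hypothetical and are not Dataplot commands.

```python
import math


def standard_normal_cdf(z):
    """CDF of the normal distribution with location 0 and scale 1."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def with_location_scale(standard_cdf, loc=0.0, scale=1.0):
    """Wrap a standard CDF so it applies to data with the given location and scale."""
    if scale <= 0:
        raise ValueError("scale must be positive")

    def cdf(x):
        # Standardize the data value before evaluating the standard CDF.
        return standard_cdf((x - loc) / scale)

    return cdf


# Defaults of 0 and 1 leave the standard distribution unchanged.
cdf_default = with_location_scale(standard_normal_cdf)
# Hypothetical non-default values: location 10, scale 2.
cdf_shifted = with_location_scale(standard_normal_cdf, loc=10.0, scale=2.0)

print(cdf_default(0.0), cdf_shifted(10.0))
```

With the defaults, data that are not already on the standard scale will usually be rejected, so the location and scale should be set whenever the hypothesized distribution is not the standard form.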

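Syntax:

<dist> KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST <y>

where <y> is the response variable and <dist> is one of the following distributions (any required parameters are shown in parentheses):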

UNIFORM
SEMI-CIRCULAR
TRIANGULAR
NORMAL
LOGISTIC
DOUBLE EXPONENTIAL
CAUCHY
TUKEY LAMBDA (LAMBDA)
LOGNORMAL (SD, optional, defaults to 1)
HALFNORMAL
T (NU)
CHI-SQUARED (NU)
F (NU1, NU2)
EXPONENTIAL
GAMMA (GAMMA)
BETA (ALPHA, BETA)
WEIBULL (GAMMA)
EXTREME VALUE TYPE 1
EXTREME VALUE TYPE 2 (GAMMA)
PARETO (GAMMA)
WALD (GAMMA)
INVERSE GAUSSIAN (GAMMA)
RIG (GAMMA)
FL (GAMMA)
NONCENTRAL BETA (ALPHA, BETA, LAMBDA)
NONCENTRAL CHISQUARE (NU, LAMBDA)
NONCENTRAL F (NU1, NU2, LAMBDA)
DOUBLY NONCENTRAL F (NU1, NU2, LAMBDA1, LAMBDA2)
NONCENTRAL T (NU, LAMBDA)
DOUBLY NONCENTRAL T (NU, LAMBDA1, LAMBDA2)
HYPERGEOMETRIC (K, N, M)
VON-MISES (B)
POWER-NORMAL (P, SD)
POWER-LOGNORMAL (P, SD)
COSINE
ALPHA (ALPHA, BETA)
POWER FUNCTION (C)
CHI (NU)
LOG LOGISTIC (DELTA)
GENERALIZED GAMMA (GAMMA, C)
ANGLIT
ARCSIN
HYPERBOLIC SECANT
HALF CAUCHY
FOLDED NORMAL (M, SD)
TRUNCATED NORMAL (A, B, M, SD)
TRUNCATED EXPONENTIAL (X0, M, SD)
DOUBLE WEIBULL (GAMMA)
LOG GAMMA (GAMMA)
GENERALIZED EXTREME VALUE (GAMMA)
PARETO SECOND KIND (GAMMA)
HALF LOGISTIC (GAMMA, optional)
EXPONENTIATED WEIBULL (GAMMA, THETA)
GOMPERTZ (C, B)
WRAPPED CAUCHY (C)
BRADFORD (ALPHA, BETA)
DOUBLE GAMMA (GAMMA)
FOLDED CAUCHY (M, SD)
GENERALIZED EXPONENTIAL (LAMBDA1, LAMBDA2, S)
GENERALIZED LOGISTIC (ALPHA)
MIELKE BETA-KAPPA (BETA, THETA, K)
EXPONENTIAL POWER (ALPHA, BETA)

Examples:

LET LAMBDA = 0.2
TUKEY LAMBDA KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST X

SET MINMAX = 1
LET GAMMA = 2.0
WEIBULL KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST X

LET LAMBDA = 3
POISSON KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST X

NORMAL KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y X
NORMAL KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y X1 X2

Note:

There are several approaches for estimating the parameters of a distribution before applying the goodness of fit test. PPCC plots combined with probability plots are an effective graphical approach if there are zero or one shape parameters. Maximum likelihood estimation is available for several distributions. Least squares estimation can be applied for distributions for which maximum likelihood estimation is not available.

Note:

The following internal parameters are saved:

STATVAL  - value of the K-S goodness of fit statistic
CUTUPP90 - 90% critical value (alpha = 0.10) for the K-S goodness of fit test statistic
CUTUPP95 - 95% critical value (alpha = 0.05) for the K-S goodness of fit test statistic
CUTUPP99 - 99% critical value (alpha = 0.01) for the K-S goodness of fit test statistic

These parameters can be used in subsequent analysis.

Default:

Location and scale parameters default to zero and one. Shape parameters must be explicitly specified. There is no default distribution.

Synonyms:

Related Commands:

ANDERSON-DARLING TEST	= Perform Anderson-Darling test for goodness of fit.
CHI-SQUARE TEST	= Perform chi-square test for goodness of fit.
WILK-SHAPIRO TEST	= Perform Wilk-Shapiro test for normality.
MAXIMUM LIKELIHOOD	= Perform maximum likelihood estimation for several distributions.
FIT	= Perform least squares fitting.
PROBABILITY PLOT	= Generates a probability plot.
HISTOGRAM	= Generates a histogram.
PPCC PLOT	= Generates a probability plot correlation coefficient plot.

Reference:

"Handbook of Methods of Applied Statistics, Volume I", Chakravarti, Laha, and Roy, John Wiley, 1967, pp. 392-394.

Applications:

Distributional Analysis

Implementation Date:

1998/12

Program:

The following output is generated.

 
      ********************************************************
      **  normal kolmogorov-smirnov goodness of fit test y  **
      ********************************************************
 
 
                  KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST
 
NULL HYPOTHESIS H0:      DISTRIBUTION FITS THE DATA
ALTERNATE HYPOTHESIS HA: DISTRIBUTION DOES NOT FIT THE DATA
DISTRIBUTION:            NORMAL
   NUMBER OF OBSERVATIONS      =      195
 
TEST:
KOLMOGOROV-SMIRNOV TEST STATISTIC     =   0.3249392E-01
 
   ALPHA LEVEL         CUTOFF              CONCLUSION
           10%        0.08737               ACCEPT H0
            5%        0.09739               ACCEPT H0
            1%        0.11673               ACCEPT H0
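
The cutoffs in this output can be compared with the widely tabulated large-sample approximation, in which the critical value is a constant divided by the square root of N. The Python sketch below is an illustration only: the coefficients 1.22, 1.36, and 1.63 are the commonly quoted large-sample values, and whether Dataplot computes its cutoffs this way for N = 195 is an assumption inferred from the numbers, not something stated in this document.

```python
import math

# Commonly tabulated large-sample coefficients (assumed, see the note above).
COEFFICIENTS = {0.10: 1.22, 0.05: 1.36, 0.01: 1.63}


def approximate_cutoffs(n):
    """Return approximate K-S critical values c / sqrt(n) for each alpha."""
    return {alpha: c / math.sqrt(n) for alpha, c in COEFFICIENTS.items()}


# Values taken from the output above.
n = 195
statistic = 0.03249392

cutoffs = approximate_cutoffs(n)
decisions = {}
for alpha, cutoff in cutoffs.items():
    # The hypothesis is rejected only if the statistic exceeds the cutoff.
    decisions[alpha] = "REJECT H0" if statistic > cutoff else "ACCEPT H0"
    print(f"{alpha:.0%}  {cutoff:.5f}  {decisions[alpha]}")
```

Under these assumptions the computed cutoffs agree with the listed values to the printed precision, and the statistic falls below all three, matching the ACCEPT H0 conclusions.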