KOLMOGOROV SMIRNOV TWO SAMPLE

Name:

... KOLMOGOROV SMIRNOV TWO SAMPLE TEST

Type:

Analysis Command

Purpose:

Perform a Kolmogorov-Smirnov two sample test that two data samples come from the same distribution. Note that we are not specifying what that common distribution is.

Description:

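The Kolmogorov-Smirnov (K-S) test is based on the empirical distribution function (ECDF). Given N ordered data points Y₁, Y₂, ..., Y_N, the ECDF is defined as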

\( E_{N} = \frac{n_{i}}{N} \)

where n_i is the number of points less than Y_i. This is a step function that increases by 1/N at the value of each data point. We can plot the empirical distribution function together with the cumulative distribution function of a given distribution. The one sample K-S test is based on the maximum distance between these two curves. That is,

\( D = \max_{1 \le i \le N}|F(Y_{i}) - \frac{i} {N}| \)

where F is the theoretical cumulative distribution function.
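
The definitions above can be sketched in a few lines of Python (Python is not part of Dataplot; it is used here only for illustration). The names `ecdf`, `ks_one_sample`, and `standard_normal_cdf` are hypothetical. The sketch assumes no tied values, and `ecdf` counts points less than or equal to the evaluation point so that the step function reaches 1 at the largest observation. The statistic follows the formula exactly as written above.

```python
import math
from bisect import bisect_right


def ecdf(sample):
    """Return a step function x -> fraction of points in `sample` that are <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def ks_one_sample(sample, cdf):
    """D = max over i of |F(Y_i) - i/N| for the ordered points Y_1 <= ... <= Y_N."""
    ordered = sorted(sample)
    n = len(ordered)
    return max(abs(cdf(y) - i / n) for i, y in enumerate(ordered, start=1))


def standard_normal_cdf(x):
    """Theoretical CDF used for the illustration (standard normal)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


step = ecdf([3.0, 1.0, 2.0, 4.0])
print(step(0.5), step(2.5), step(4.0))  # 0.0 0.5 1.0
print(ks_one_sample([-1.0, 0.0, 1.0], standard_normal_cdf))
```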

The two sample K-S test is a variation of this. However, instead of comparing an empirical distribution function to a theoretical distribution function, we compare the two empirical distribution functions. That is,

\( D = \max_{i} |E_1(i) - E_2(i)| \)

where E₁ and E₂ are the empirical distribution functions for the two samples. Note that both E₁ and E₂ are evaluated at every point of both samples, so the index i runs over the points of the combined data.
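
As a rough illustration of the two sample statistic, the following Python sketch evaluates both empirical distribution functions at every point of the combined samples and keeps the largest absolute difference. The function name `ks_two_sample` is hypothetical, and this is not a description of how Dataplot implements the test.

```python
from bisect import bisect_right


def ks_two_sample(sample1, sample2):
    """Largest absolute difference between the two empirical distribution functions."""
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    d = 0.0
    # Evaluate both step functions at each point of the combined data.
    for x in s1 + s2:
        e1 = bisect_right(s1, x) / n1
        e2 = bisect_right(s2, x) / n2
        d = max(d, abs(e1 - e2))
    return d


print(ks_two_sample([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 1.0 (no overlap)
print(ks_two_sample([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0 (identical samples)
```

Because both functions are step functions that change only at data points, checking the combined data points is enough to find the maximum.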

More formally, the Kolmogorov-Smirnov two sample test statistic can be defined as follows.

H₀: The two samples come from a common distribution.

H_a: The two samples do not come from a common distribution.

Test Statistic: The Kolmogorov-Smirnov two sample test statistic is defined as

\( D = \max_{i} |E_1(i) - E_2(i)| \)

where E₁ and E₂ are the empirical distribution functions for the two samples.

Significance Level: \( \alpha \)

Critical Region: The hypothesis that the two samples come from a common distribution is rejected if the test statistic, D, is greater than the critical value obtained from a table. There are several variations of these tables in the literature that use somewhat different scalings for the K-S test statistic and critical regions. These alternative formulations should be equivalent, but it is necessary to ensure that the test statistic is calculated in a way that is consistent with how the critical values were tabulated.
Dataplot uses the critical values from Chakravarti, Laha, and Roy (see Reference: below).
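
For a rough check on tabulated values, one commonly quoted large-sample approximation is c(alpha)*sqrt((n1 + n2)/(n1*n2)), with c = 1.22, 1.36, and 1.63 for alpha = 0.10, 0.05, and 0.01. The Python sketch below assumes this approximation; the documentation does not say that Dataplot computes its critical values this way, although these coefficients do reproduce the critical values printed in the sample output for Program 1 (n1 = 249, n2 = 79). The names `COEFFICIENTS` and `ks_two_sample_critical_value` are hypothetical.

```python
import math

# Commonly tabulated large-sample coefficients c(alpha). Assumption: these are
# not taken from the Dataplot documentation itself.
COEFFICIENTS = {0.10: 1.22, 0.05: 1.36, 0.01: 1.63}


def ks_two_sample_critical_value(n1, n2, alpha):
    """Approximate critical value for the two sample K-S statistic D."""
    return COEFFICIENTS[alpha] * math.sqrt((n1 + n2) / (n1 * n2))


# Sample sizes from Program 1.
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(ks_two_sample_critical_value(249, 79, alpha), 4))
```

The null hypothesis is rejected at level alpha when the computed D exceeds the corresponding value.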

The quantile-quantile plot, bihistogram, and Tukey mean-difference plot are graphical alternatives to the two sample K-S test.

Syntax 1:

Syntax 2:

This syntax performs all pairwise two sample Kolmogorov-Smirnov tests among the listed response variables.

Examples:

Note:

STATVAL	-	value of the K-S two sample statistic
CUTUPP90	-	90% critical value (alpha = 0.10) for the K-S two sample test statistic
CUTUPP95	-	95% critical value (alpha = 0.05) for the K-S two sample test statistic
CUTUPP99	-	99% critical value (alpha = 0.01) for the K-S two sample test statistic

These parameters can be used in subsequent analysis.

Note:

SET TWO SAMPLE TEST NUMBER OF PERCENTILES <value>

By default, the Kolmogorov-Smirnov test is computed using all the points. When the number of points is large, the command can take a very long time to run. Computing the test on a specified number of percentiles of the data instead lets the command run quickly without sacrificing too much information.
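
The idea can be sketched in Python as follows. The documentation does not describe how Dataplot selects the percentiles, so the evenly spaced quantiles used here are an assumption, and the helper names `reduce_to_percentiles` and `ks_two_sample_on_percentiles` are hypothetical.

```python
import random
from bisect import bisect_right
from statistics import quantiles


def ks_two_sample(sample1, sample2):
    """Largest absolute difference between the two empirical distribution functions."""
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    return max(
        abs(bisect_right(s1, x) / n1 - bisect_right(s2, x) / n2) for x in s1 + s2
    )


def reduce_to_percentiles(sample, count):
    """Replace a sample by `count` evenly spaced quantiles (assumed scheme)."""
    return quantiles(sample, n=count + 1, method="inclusive")


def ks_two_sample_on_percentiles(sample1, sample2, count=100):
    """Approximate D using `count` quantiles of each sample instead of all points."""
    return ks_two_sample(
        reduce_to_percentiles(sample1, count),
        reduce_to_percentiles(sample2, count),
    )


rng = random.Random(0)
y1 = [rng.gauss(0.0, 1.0) for _ in range(20000)]
y2 = [rng.gauss(0.5, 1.0) for _ in range(20000)]
print(ks_two_sample(y1, y2), ks_two_sample_on_percentiles(y1, y2))
```

The two printed values should be close to each other, while the second is computed from only 100 points per sample.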

Default:

None

Synonyms:

Some examples,

Related Commands:

KOLMOGOROV SMIRNOV GOODNESS OF FIT TEST	= Perform Kolmogorov-Smirnov goodness of fit test.
CHI-SQUARE TWO SAMPLE TEST	= Perform chi-square two sample test.
BIHISTOGRAM	= Generates a bihistogram.
QUANTILE-QUANTILE PLOT	= Generates a quantile-quantile plot.
TUKEY MEAN DIFFERENCE PLOT	= Generates a Tukey mean difference plot.

Reference:

Chakravarti, Laha, and Roy, "Handbook of Methods of Applied Statistics, Volume I,"

Press, Teukolsky, Vetterling, and Flannery (1992), "Numerical Recipes in Fortran: The Art of Scientific Computing," Second Edition, Cambridge University Press, pp. 614-622.

Applications:

Distributional Analysis

Implementation Date:

Program 1:

 
SKIP 25
READ AUTO83B.DAT Y1 Y2
.
DELETE Y2 SUBSET Y2 < 0
SET WRITE DECIMALS 4
KOLMOGOROV-SMIRNOV TWO SAMPLE TEST Y1 Y2

             Kolmogorov-Smirnov Two Sample Test
  
 First Response Variable:  Y1
 Second Response Variable: Y2
  
 H0: The Two Samples Come From the
     Same (Unspecified) Distribution
 Ha: The Two Samples Come From
     Different Distributions
  
 Sample One Summary Statistics:
 Number of Observations:                  249
 Sample Mean:                             20.1446
 Sample Standard Deviation:               6.4147
 Sample Minimum:                          9.0000
 Sample Maximum:                          39.0000
  
 Sample Two Summary Statistics:
 Number of Observations:                  79
 Sample Mean:                             30.4810
 Sample Standard Deviation:               6.1077
 Sample Minimum:                          18.0000
 Sample Maximum:                          47.0000
  
 Test Statistic Value:                    0.6003
  
  
             Conclusions (Upper 1-Tailed Test)
  
 ------------------------------------------------------------------------
                                                                     Null
         Null   Significance           Test       Critical     Hypothesis
   Hypothesis          Level      Statistic    Region (>=)     Conclusion
 ------------------------------------------------------------------------
        Same           90.0%         0.6003         0.1575         REJECT
        Same           95.0%         0.6003         0.1756         REJECT
        Same           99.0%         0.6003         0.2105         REJECT

Program 2:

 
let y1 = norm rand numb for i = 1 1 50
let y2 = norm rand numb for i = 1 1 62
let y3 = norm rand numb for i = 1 1 45
.
let y2 = 1.7*y2
let y3 = 0.7*y3
.
set write decimals 5
.
two sample kolmogorov smirnov test  y1 y2 y3

             Kolmogorov-Smirnov Two Sample Test
  
 First Response Variable:  Y1
 Second Response Variable: Y2
  
 H0: The Two Samples Come From the
     Same (Unspecified) Distribution
 Ha: The Two Samples Come From
     Different Distributions
  
 Sample One Summary Statistics:
 Number of Observations:                  50
 Sample Mean:                             -0.00822
 Sample Standard Deviation:               0.71196
 Sample Minimum:                          -2.01524
 Sample Maximum:                          1.58788
  
 Sample Two Summary Statistics:
 Number of Observations:                  62
 Sample Mean:                             -0.29060
 Sample Standard Deviation:               1.94815
 Sample Minimum:                          -5.87855
 Sample Maximum:                          3.41010
  
 Test Statistic Value:                    0.28645
  
  
             Conclusions (Upper 1-Tailed Test)
  
 ------------------------------------------------------------------------
                                                                     Null
         Null   Significance           Test       Critical     Hypothesis
   Hypothesis          Level      Statistic    Region (>=)     Conclusion
 ------------------------------------------------------------------------
        Same           90.0%        0.28645        0.23189         REJECT
        Same           95.0%        0.28645        0.25850         REJECT
        Same           99.0%        0.28645        0.30982         ACCEPT
  
  
             Kolmogorov-Smirnov Two Sample Test
  
 First Response Variable:  Y1
 Second Response Variable: Y3
  
 H0: The Two Samples Come From the
     Same (Unspecified) Distribution
 Ha: The Two Samples Come From
     Different Distributions
  
 Sample One Summary Statistics:
 Number of Observations:                  50
 Sample Mean:                             -0.00822
 Sample Standard Deviation:               0.71196
 Sample Minimum:                          -2.01524
 Sample Maximum:                          1.58788
  
 Sample Two Summary Statistics:
 Number of Observations:                  45
 Sample Mean:                             -0.11118
 Sample Standard Deviation:               0.70195
 Sample Minimum:                          -2.21551
 Sample Maximum:                          1.29633
  
 Test Statistic Value:                    0.12222
  
  
             Conclusions (Upper 1-Tailed Test)
  
 ------------------------------------------------------------------------
                                                                     Null
         Null   Significance           Test       Critical     Hypothesis
   Hypothesis          Level      Statistic    Region (>=)     Conclusion
 ------------------------------------------------------------------------
        Same           90.0%        0.12222        0.25069         ACCEPT
        Same           95.0%        0.12222        0.27945         ACCEPT
        Same           99.0%        0.12222        0.33493         ACCEPT
  
  
             Kolmogorov-Smirnov Two Sample Test
  
 First Response Variable:  Y2
 Second Response Variable: Y3
  
 H0: The Two Samples Come From the
     Same (Unspecified) Distribution
 Ha: The Two Samples Come From
     Different Distributions
  
 Sample One Summary Statistics:
 Number of Observations:                  62
 Sample Mean:                             -0.29060
 Sample Standard Deviation:               1.94815
 Sample Minimum:                          -5.87855
 Sample Maximum:                          3.41010
  
 Sample Two Summary Statistics:
 Number of Observations:                  45
 Sample Mean:                             -0.11118
 Sample Standard Deviation:               0.70195
 Sample Minimum:                          -2.21551
 Sample Maximum:                          1.29633
  
 Test Statistic Value:                    0.24373
  
  
             Conclusions (Upper 1-Tailed Test)
  
 ------------------------------------------------------------------------
                                                                     Null
         Null   Significance           Test       Critical     Hypothesis
   Hypothesis          Level      Statistic    Region (>=)     Conclusion
 ------------------------------------------------------------------------
        Same           90.0%        0.24373        0.23892         REJECT
        Same           95.0%        0.24373        0.26634         ACCEPT
        Same           99.0%        0.24373        0.31921         ACCEPT

.
let stat  = two sample kolm smir test y1 y2
let cv95  = two sample kolm smir test critical value y1 y2
let alpha = 0.9
let cv90  = two sample kolm smir test critical value y1 y2
let alpha = 0.99
let cv99  = two sample kolm smir test critical value y1 y2

 PARAMETERS AND CONSTANTS--

    STAT    --        0.28645
    CV95    --        0.25850
    CV90    --        0.23189
    CV99    --        0.30982