
# MANN WHITNEY U STATISTIC

Name:
MANN WHITNEY U STATISTIC (LET)
Type:
Analysis Command
Purpose:
Compute the test statistic or alternatively the frequencies and CDF values for the U version of the Mann Whitney rank sum test.
Description:
The t-test is the standard test of whether the population means of two non-paired samples are equal. The Mann Whitney rank sum test is a non-parametric alternative to the t-test.

The Mann Whitney rank sum test statistic is computed by:

1. Rank the combined samples.

2. Compute the sum of the ranks for each sample (call these T1 and T2).

3. If the sample sizes are equal, the test statistic is

T = min(T1,T2)

4. If the sample sizes are unequal, let T1 be the sum of the ranks for the smaller sample and N1 its size (N2 is the size of the larger sample); the test statistic is

T = MIN(T1,N1*(N1 + N2 + 1) - T1)
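The ranking steps above can be cross-checked outside Dataplot. The following is a minimal Python sketch, not Dataplot code. The function names `average_ranks` and `rank_sum_statistic` are hypothetical, and the use of average ranks for tied values is an assumption, since the description above does not say how ties are handled.

```python
def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their ranks (an assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_statistic(y1, y2):
    """Return T = min(T1, N1*(N1 + N2 + 1) - T1), with sample 1 the smaller sample."""
    if len(y1) > len(y2):
        y1, y2 = y2, y1
    n1, n2 = len(y1), len(y2)
    ranks = average_ranks(list(y1) + list(y2))  # step 1: rank the combined samples
    t1 = sum(ranks[:n1])                        # step 2: rank sum of sample 1
    # Steps 3 and 4: when n1 == n2, N1*(N1 + N2 + 1) - T1 equals T2,
    # so one expression covers both cases.
    return min(t1, n1 * (n1 + n2 + 1) - t1)


print(rank_sum_statistic([1, 2, 3, 5], [4, 6, 7, 8, 9]))  # 11.0
```

For the samples 1 2 3 5 and 4 6 7 8 9, the smaller sample holds ranks 1, 2, 3 and 5, so T1 = 11 and T = min(11, 29) = 11.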

Sufficiently small values of T cause rejection of the null hypothesis that the sample locations are equal. Significance levels have been tabulated for small values of N1 and N2. For sufficiently large N1 and N2, the following normal approximation is used:

$$Z = \frac{|\mu - T| - 0.5}{\sigma}$$

where

$$\mu = \frac{N_1 (N_1 + N_2 + 1)}{2}$$
$$\sigma = \sqrt{\frac{N_2 \mu}{6}}$$
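As an arithmetic illustration of the normal approximation above, here is a short Python sketch (not Dataplot code; the function name `normal_approximation_z` is hypothetical). The sample sizes in the usage line are deliberately small to keep the numbers easy to follow; they are not meant to suggest that the approximation is appropriate for samples of that size.

```python
import math


def normal_approximation_z(t, n1, n2):
    """Continuity-corrected Z for the rank sum statistic T, per the formulas above."""
    mu = n1 * (n1 + n2 + 1) / 2        # mean of the rank sum
    sigma = math.sqrt(n2 * mu / 6)     # standard deviation of the rank sum
    return (abs(mu - t) - 0.5) / sigma


# T = 11 with N1 = 4 and N2 = 5 gives mu = 20 and sigma = sqrt(100/6)
z = normal_approximation_z(11, 4, 5)
print(round(z, 3))  # 2.082
```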

Some analysts prefer a slightly different formulation of the test statistic:

$$U = N_1 N_2 + 0.5 N_1(N_1 + 1) - T$$

This form of the statistic can be computed with the command (Syntax 1)

LET U = MANN WHITNEY U STATISTIC Y1 Y2
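The conversion from T to U is a single line of arithmetic. The Python sketch below (not Dataplot code; `u_from_rank_sum` is a hypothetical name) simply evaluates the formula above for a given T, N1 and N2.

```python
def u_from_rank_sum(t, n1, n2):
    """U = N1*N2 + 0.5*N1*(N1 + 1) - T, as in the formula above."""
    return n1 * n2 + 0.5 * n1 * (n1 + 1) - t


# T = 11, N1 = 4, N2 = 5: 20 + 10 - 11
print(u_from_rank_sum(11, 4, 5))  # 19.0
```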

Dataplot uses Applied Statistics algorithm 62 (as updated by Alan Miller) to obtain the cumulative frequencies and the corresponding CDF values of the U test statistic.

In summary, Syntax 1 computes the value of the test statistic and Syntax 2 returns the cumulative frequencies and CDF values of the test statistic.

Syntax 1:
LET <U> = MANN WHITNEY U STATISTIC <y1> <y2>
<SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
<U> is a parameter where the U version of the Mann Whitney rank sum statistic is saved;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax returns the value of the U version of the Mann Whitney statistic.

Syntax 2:
LET <x> <freq> <cdf> = MANN WHITNEY U STATISTIC FREQUENCY <n1> <n2>
<SUBSET/EXCEPT/FOR qualification>
where <n1> is a parameter that specifies the sample size for the first response variable;
<n2> is a parameter that specifies the sample size for the second response variable;
<x> is a variable that returns the potential values of the test statistic;
<freq> is a variable containing the cumulative frequencies corresponding to <x>;
<cdf> is a variable containing the CDF values corresponding to <x>;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

This syntax returns the cumulative frequency table (and the corresponding CDF value) for the U version of the Mann Whitney statistic. Note that it only depends on the sample sizes for the two variables, not the data.
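Because the table depends only on the sample sizes, it can be reproduced for small samples by brute force. The Python sketch below is not the AS 62 algorithm that Dataplot uses. It enumerates every way of assigning N1 of the N1 + N2 ranks to the first sample, and it assumes no ties and that all assignments are equally likely under the null hypothesis. The function name `u_frequency_table` is hypothetical.

```python
from itertools import combinations


def u_frequency_table(n1, n2):
    """Return rows (u, cumulative frequency, cdf) for U = 0 .. n1*n2."""
    n = n1 + n2
    counts = [0] * (n1 * n2 + 1)
    # Each choice of n1 ranks for the first sample gives one value of U
    for ranks in combinations(range(1, n + 1), n1):
        u = n1 * n2 + n1 * (n1 + 1) // 2 - sum(ranks)
        counts[u] += 1
    total = sum(counts)  # equals the binomial coefficient C(n1 + n2, n1)
    table, running = [], 0
    for u, count in enumerate(counts):
        running += count
        table.append((u, running, running / total))
    return table


for u, freq, cdf in u_frequency_table(4, 5):
    print(u, freq, cdf)
```

The enumeration grows as C(N1 + N2, N1), so this approach is only practical for small samples. For N1 = 4 and N2 = 5 there are 126 assignments, and the cumulative frequencies run from 1 at U = 0 to 126 at U = 20.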

Examples:
LET U = MANN WHITNEY U STATISTIC Y1 Y2

LET N1 = SIZE Y1
LET N2 = SIZE Y2
LET X FREQ CDF = MANN WHITNEY U STATISTIC FREQUENCY N1 N2

Default:
None
Synonyms:
None
Related Commands:
RANK SUM TEST = Compute a Mann Whitney rank sum test.
T-TEST = Compute a t-test.
SIGN TEST = Compute a sign test.
SIGNED RANK TEST = Compute a signed rank test.
CHI-SQUARED TWO SAMPLE TEST = Compute a two sample chi-square test.
BIHISTOGRAM = Generate a bihistogram.
QUANTILE-QUANTILE PLOT = Generate a quantile-quantile plot.
Reference:
Applied Statistics, AS 62.

Conover (1999), "Practical Non-Parametric Statistics," Third Edition, Wiley, pp. 272-281.

Snedecor and Cochran (1989), "Statistical Methods," Eighth Edition, Iowa State University Press, pp. 142-144.

Applications:
Non-Parametric Analysis, Two Sample Tests
Implementation Date:
2011/5
Program:

. Step 1: Read Data (example 2 from pp. 278-279 of Conover)
.
let y1 = data 1 2 3 5
let y2 = data 4 6 7 8 9
.
set write decimals 3
let u = mann whitney u statistic y1 y2
let n1 = size y1
let n2 = size y2
let x freq cdf = mann whitney u statistic frequency  n1 n2
print "Test Statistic = ^u"
print x freq cdf

The following output is generated
Test Statistic = 19

---------------------------------------------
     X           FREQ            CDF
---------------------------------------------
 0.000          1.000          0.007
 1.000          2.000          0.015
 2.000          4.000          0.031
 3.000          7.000          0.055
 4.000         12.000          0.095
 5.000         18.000          0.142
 6.000         26.000          0.206
 7.000         35.000          0.277
 8.000         46.000          0.365
 9.000         57.000          0.452
10.000         69.000          0.547
11.000         80.000          0.634
12.000         91.000          0.722
13.000        100.000          0.793
14.000        108.000          0.857
15.000        114.000          0.904
16.000        119.000          0.944
17.000        122.000          0.968
18.000        124.000          0.984
19.000        125.000          0.992
20.000        126.000          1.000


NIST is an agency of the U.S. Commerce Department.

Date created: 12/11/2013
Last updated: 12/11/2013