Dataplot

QUANTILE

Name:

QUANTILE (LET)

Type:

Let Subcommand

Purpose:

Compute a user-specified quantile for a variable.

Description:

Dataplot supports two methods for computing the quantile.

The first method is based on the order statistic. The formula is:

\( \hat{X}_q = (1 - r)X_{NI1} + rX_{NI2} \)

where

X are the observations sorted in ascending order
NI1 = INT(q*(n+1))
NI2 = NI1 + 1
r = q*(n+1) - INT(q*(n+1))
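
As an illustration only, the order statistic formula above can be sketched in Python (this is not Dataplot code). The function name is hypothetical, and returning the minimum or maximum when NI1 falls outside 1..n-1 is an assumption of this sketch, not something stated on this page. The sample is the data set read in Program 2.

```python
def quantile_order_stat(data, q):
    """Order statistic quantile: (1 - r)*X[NI1] + r*X[NI2], with 1-based NI1, NI2."""
    x = sorted(data)              # X sorted in ascending order
    n = len(x)
    h = q * (n + 1)
    ni1 = int(h)                  # NI1 = INT(q*(n+1))
    r = h - ni1                   # r = q*(n+1) - INT(q*(n+1))
    # Assumption of this sketch: clamp to the extremes when NI1 or NI2
    # would fall outside the sample.
    if ni1 < 1:
        return x[0]
    if ni1 >= n:
        return x[-1]
    # Python lists are 0-based, so X[NI1] is x[ni1 - 1] and X[NI2] is x[ni1].
    return (1 - r) * x[ni1 - 1] + r * x[ni1]


y = [95.1772, 95.1567, 95.1937, 95.1959, 95.1442, 95.0610,
     95.1591, 95.1195, 95.1065, 95.0925, 95.1990, 95.1682]
print(round(quantile_order_stat(y, 0.90), 4))  # 95.1981
```

For q = 0.90 and n = 12, q*(n+1) = 11.7, so NI1 = 11 and r = 0.7. The result agrees with the R6 value printed by Program 2.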

An alternative method is the Harrell-Davis estimate (the Dataplot keyword for it is spelled HERRELL-DAVIS). This method attempts to provide a lower standard error for X_q by using all of the order statistics rather than a single order statistic (or a weighted average of two). There are cases where the Harrell-Davis estimate has a substantially smaller standard error than the order statistic method, but there are also cases where the reverse is true.

To compute the Harrell-Davis estimate, do the following:

Sort the X in ascending order.
A = (n+1)*q - 1
B = (n+1)*(1 - q) - 1
W_i = BETCDF(i/n,A,B) - BETCDF((i-1)/n,A,B) where BETCDF is the beta cumulative distribution function with shape parameters A and B.
\( \hat{X}_q = \sum_{i=1}^{n}{W_{i}X_{i}} \)

Note: The computations for A and B were modified 2/2003 to:

A = (n+1)*q
B = (n+1)*(1 - q)

The original form (with the "- 1" terms shown above) was taken from the text of the Wilcox book. However, checking his S+ macros and verifying against the original Harrell and Davis article indicated that the new formulas are the correct ones.
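
The weighting scheme can be sketched in Python as follows (again, not Dataplot code). The sketch assumes the shape parameters of the Harrell and Davis article, A = (n+1)*q and B = (n+1)*(1 - q), that is, without the "- 1" terms. Because the Python standard library has no beta CDF, each weight W_i is approximated by integrating the beta density over ((i-1)/n, i/n) with Simpson's rule instead of differencing BETCDF. The function name, the `steps` parameter, and the restriction to A >= 1 and B >= 1 are all assumptions of this sketch.

```python
import math


def harrell_davis(data, q, steps=200):
    """Approximate Harrell-Davis quantile estimate (illustrative sketch)."""
    x = sorted(data)
    n = len(x)
    a = (n + 1) * q               # shape parameter A (assumed post-2/2003 form)
    b = (n + 1) * (1 - q)         # shape parameter B
    if a < 1 or b < 1:
        # The density is unbounded at an endpoint in this case, which the
        # simple integration rule below does not handle.
        raise ValueError("sketch requires 1/(n+1) <= q <= n/(n+1)")
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    norm = math.exp(-log_beta)

    def pdf(t):
        # Beta(a, b) density at t in [0, 1]
        return norm * t ** (a - 1) * (1 - t) ** (b - 1)

    def weight(lo, hi):
        # Composite Simpson's rule on [lo, hi]; steps must be even.
        h = (hi - lo) / steps
        total = pdf(lo) + pdf(hi)
        for k in range(1, steps):
            total += (4 if k % 2 else 2) * pdf(lo + k * h)
        return total * h / 3

    # W_i approximates BETCDF(i/n, A, B) - BETCDF((i-1)/n, A, B)
    weights = [weight((i - 1) / n, i / n) for i in range(1, n + 1)]
    return sum(w * xi for w, xi in zip(weights, x))


# For a symmetric sample and q = 0.5 the weights are symmetric,
# so the estimate should be very close to the center value 3.
print(harrell_davis([1, 2, 3, 4, 5], 0.5))
```

A production implementation would normally use a library beta CDF rather than numerical integration; the integration here only keeps the sketch self-contained.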

Syntax:

Examples:

LET XQ = 0.50
LET A = XQ QUANTILE Y SUBSET TAG > 2

Note:

The PERCENTILE command is equivalent to the QUANTILE command using the order statistic method. The only difference is that the requested percentile is given as a percentage between 0 and 100% rather than as a fraction.

Note:

The default method used by Dataplot described above is equivalent to method R6 of Hyndman and Fan.

The method advocated by Hyndman and Fan is R8. For the R8 method,

\( X_{q} = X_{NI1} + r(X_{NI2} - X_{NI1}) \)

where

X are the observations sorted in ascending order
NI1 = INT(q*(n+(1/3)) + (1/3))
NI2 = NI1 + 1
r = q*(n+(1/3)) + (1/3) - INT(q*(n+(1/3)) + (1/3))

If q ≤ (2/3)/(n+(1/3)) the minimum value will be returned and if q ≥ (n-(1/3))/(n+(1/3)) the maximum value will be returned.
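
A minimal Python sketch of the R8 rule, for illustration only (not Dataplot code). Here r is taken as the fractional part of q*(n + 1/3) + 1/3, the same quantity whose integer part gives NI1; this is the choice that reproduces the R8 value 95.1972 printed by Program 2. The function name and the round-off guard on NI1 are assumptions of this sketch.

```python
def quantile_r8(data, q):
    """Hyndman and Fan R8 quantile (illustrative sketch)."""
    x = sorted(data)
    n = len(x)
    # Boundary cases: return the minimum or the maximum.
    if q <= (2 / 3) / (n + 1 / 3):
        return x[0]
    if q >= (n - 1 / 3) / (n + 1 / 3):
        return x[-1]
    h = q * (n + 1 / 3) + 1 / 3
    # Guard against floating-point round-off right at the boundaries.
    ni1 = min(max(int(h), 1), n - 1)
    r = h - ni1
    # 0-based lists: X[NI1] is x[ni1 - 1], X[NI2] is x[ni1].
    return x[ni1 - 1] + r * (x[ni1] - x[ni1 - 1])


y = [95.1772, 95.1567, 95.1937, 95.1959, 95.1442, 95.0610,
     95.1591, 95.1195, 95.1065, 95.0925, 95.1990, 95.1682]
print(round(quantile_r8(y, 0.90), 4))  # 95.1972
```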

Method R7 (this is the default method in R and Excel) is calculated by

\( X_{q} = X_{NI1} + r(X_{NI2} - X_{NI1}) \)

where

X are the observations sorted in ascending order
NI1 = INT(q*(n-1) + 1)
NI2 = NI1 + 1
r = q*(n-1) + 1 - INT(q*(n-1) + 1)

If q = 1, then X_n is returned.
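
A minimal Python sketch of the R7 rule, for illustration only (not Dataplot code). Here r is taken as the fractional part of q*(n-1) + 1, which is the choice that reproduces the R7 value 95.1957 printed by Program 2. The function name, the handling of a one-element sample, and the assumption that 0 <= q <= 1 are specific to this sketch.

```python
def quantile_r7(data, q):
    """Hyndman and Fan R7 quantile (illustrative sketch), for 0 <= q <= 1."""
    x = sorted(data)
    n = len(x)
    # q = 1 returns X_n; a single observation is returned as-is (assumption).
    if q >= 1 or n == 1:
        return x[-1]
    h = q * (n - 1) + 1
    # min() guards against round-off pushing NI1 up to n.
    ni1 = min(int(h), n - 1)
    r = h - ni1
    # 0-based lists: X[NI1] is x[ni1 - 1], X[NI2] is x[ni1].
    return x[ni1 - 1] + r * (x[ni1] - x[ni1 - 1])


y = [95.1772, 95.1567, 95.1937, 95.1959, 95.1442, 95.0610,
     95.1591, 95.1195, 95.1065, 95.0925, 95.1990, 95.1682]
print(round(quantile_r7(y, 0.90), 4))  # 95.1957
```

For q = 0.90 and n = 12, q*(n-1) + 1 = 10.9, so NI1 = 10 and r = 0.9.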

The R6, R7, and R8 methods give similar, but not identical, results; the differences are most noticeable for small samples. For most purposes, any of these three methods should be acceptable.
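
To compare the three methods side by side, the Python sketch below (not Dataplot code) uses the fact that R6, R7, and R8 belong to one family in Hyndman and Fan's classification: the interpolation position is h = q*(n + 1 - 2*alpha) + alpha, with alpha = 0 for R6, 1 for R7, and 1/3 for R8. The parameter name `alpha`, the function name, and the clamping to the extremes are choices of this sketch rather than anything Dataplot exposes.

```python
def quantile_hf(data, q, alpha):
    """Quantile with position h = q*(n + 1 - 2*alpha) + alpha (illustrative)."""
    x = sorted(data)
    n = len(x)
    h = q * (n + 1 - 2 * alpha) + alpha
    ni1 = int(h)
    # Clamp to the minimum / maximum outside the interpolation range.
    if ni1 < 1:
        return x[0]
    if ni1 >= n:
        return x[-1]
    r = h - ni1
    return x[ni1 - 1] + r * (x[ni1] - x[ni1 - 1])


y = [95.1772, 95.1567, 95.1937, 95.1959, 95.1442, 95.0610,
     95.1591, 95.1195, 95.1065, 95.0925, 95.1990, 95.1682]
for name, alpha in (("R6", 0.0), ("R7", 1.0), ("R8", 1 / 3)):
    print(name, round(quantile_hf(y, 0.90, alpha), 4))
# R6 95.1981
# R7 95.1957
# R8 95.1972
```

With only 12 observations the three estimates of the 0.90 quantile differ in the third decimal place, which matches the output of Program 2.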

Note:

To specify which method is used to compute the quantile, enter the command

SET QUANTILE METHOD <ORDER/HERRELL-DAVIS/R6/R7/R8>

R6 is equivalent to ORDER. ORDER is the default.

Note:

For the commands in which Dataplot statistics such as this one can be used, enter

HELP STATISTICS

The specific quantile to compute is specified by entering the following command (before the plot command):

LET XQ = <value>

where <value> is a number in the interval (0,1) that specifies the desired quantile.

Default:

The default is to use the order statistic method to compute the quantile.

Synonyms:

None

Related Commands:

PERCENTILE	= Compute a percentile of a variable.
MEDIAN	= Compute the median of a variable.
LOWER QUARTILE	= Compute the lower quartile of a variable.
UPPER QUARTILE	= Compute the upper quartile of a variable.
FIRST DECILE	= Compute the first decile (the 0.10 quantile) of a variable.
STATISTIC PLOT	= Generate a statistic versus subset plot for a given statistic.
BOOTSTRAP PLOT	= Generate a bootstrap plot for a given statistic.

References:

Rob Hyndman and Yanan Fan (1996), "Sample Quantiles in Statistical Packages", The American Statistician, 50(4), 361-365.

Rand Wilcox (1997), "Introduction to Robust Estimation and Hypothesis Testing", Academic Press.

Frank Harrell and C. E. Davis (1982), "A New Distribution-Free Quantile Estimator", Biometrika, 69(3), 635-640.

Applications:

Data Analysis

Implementation Date:

Program 1:

. Step 1:   Generate the data
.
LET Y1 = LOGISTIC RANDOM NUMBERS FOR I = 1 1 100
.
. Step 2:   Compute the quantiles with the default (order statistic) method
.
LET XQ = 0.05
LET Q05A = XQ QUANTILE Y1
LET XQ = 0.95
LET Q95A = XQ QUANTILE Y1
.
. Step 3:   Repeat with the HERRELL DAVIS method
.
SET QUANTILE METHOD HERRELL DAVIS
LET XQ = 0.05
LET Q05B = XQ QUANTILE Y1
LET XQ = 0.95
LET Q95B = XQ QUANTILE Y1
.
. Step 4:   Round and print the results
.
LET Q05A = ROUND(Q05A,4)
LET Q95A = ROUND(Q95A,4)
LET Q05B = ROUND(Q05B,4)
LET Q95B = ROUND(Q95B,4)
PRINT "R6 METHOD: 0.05 Quantile = ^Q05A"
PRINT "R6 METHOD: 0.95 Quantile = ^Q95A"
PRINT "HD METHOD: 0.05 Quantile = ^Q05B"
PRINT "HD METHOD: 0.95 Quantile = ^Q95B"

R6 METHOD: 0.05 Quantile = -2.5442
R6 METHOD: 0.95 Quantile = 3.4716
HD METHOD: 0.05 Quantile = -2.7308
HD METHOD: 0.95 Quantile = 3.3067

Program 2:

. Step 1:   Read the data (from e-Handbook example)
.
read y
95.1772
95.1567
95.1937
95.1959
95.1442
95.0610
95.1591
95.1195
95.1065
95.0925
95.1990
95.1682 
end of data
.
. Step 2:   Compute the quantiles using different methods
.
let xq = 0.90
.
let xqr6 = quantile y
let xqr6 = round(xqr6,4)
.
set quantile method r7
let xqr7 = quantile y
let xqr7 = round(xqr7,4)
.
set quantile method r8
let xqr8 = quantile y
let xqr8 = round(xqr8,4)
.
. Step 3:   Print the results
.
print "Quantile with R6 method:  ^xqr6"
print "Quantile with R7 method:  ^xqr7"
print "Quantile with R8 method:  ^xqr8"

Quantile with R6 method:  95.1981
Quantile with R7 method:  95.1957
Quantile with R8 method:  95.1972