
# QUANTILE

Name:
QUANTILE (LET)
Type:
Let Subcommand
Purpose:
Compute a user specified quantile for a variable.
Description:
The qth quantile of a data set is the value such that a fraction q of the data lies below it and a fraction (1-q) lies above it. For example, the 0.5 quantile is the median.

Dataplot supports two methods for computing the quantile.

The first method is based on the order statistic. The formula is:

$$\hat{X}_q = (1 - r)X_{NI1} + rX_{NI2}$$

where

• X are the observations sorted in ascending order
• NI1 = INT(q*(n+1))
• NI2 = NI1 + 1
• r = q*(n+1) - INT(q*(n+1))
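The order statistic formula above can be sketched in Python as follows. This is an illustrative re-implementation, not Dataplot's own code. The function name `quantile_r6` is hypothetical. Clamping to the sample minimum or maximum when q*(n+1) falls below 1 or above n is an assumption, because the text does not say how that case is handled.

```python
import math


def quantile_r6(data, q):
    """Order statistic quantile estimate (Hyndman-Fan R6) for 0 < q < 1.

    Assumption: positions q*(n+1) outside [1, n] are clamped to the
    sample minimum or maximum.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie in the open interval (0, 1)")
    x = sorted(data)                 # X sorted in ascending order
    n = len(x)
    h = q * (n + 1)                  # fractional position among order statistics
    ni1 = math.floor(h)              # NI1 = INT(q*(n+1)), 1-based
    r = h - ni1                      # r = q*(n+1) - INT(q*(n+1))
    if ni1 < 1:
        return x[0]
    if ni1 >= n:
        return x[-1]
    # (1 - r)*X[NI1] + r*X[NI2], converted from 1-based to 0-based indexing
    return (1.0 - r) * x[ni1 - 1] + r * x[ni1]


print(quantile_r6([3, 1, 2, 5, 4], 0.5))   # 3.0
```

For example, `quantile_r6(y, 0.20)` plays the role of `LET A = 0.20 QUANTILE Y` under the default ORDER method.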

An alternative method is the Harrell-Davis estimate. This method attempts to provide a lower standard error for Xq by using all of the order statistics rather than a single order statistic (or a weighted average of two). There are cases where the Harrell-Davis estimate has a substantially smaller standard error than the order statistic method. However, there are also cases where the reverse is true.

To compute the Harrell-Davis estimate, do the following:

1. Sort the X in ascending order.

2. A = (n+1)*q - 1

3. B = (n+1)*(1 - q) - 1

4. Wi = BETCDF(i/n,A,B) - BETCDF((i-1)/n,A,B) for i = 1, ..., n, where BETCDF is the beta cumulative distribution function with shape parameters A and B.

5. $$\hat{X}_q = \sum_{i=1}^{n}{W_{i}X_{i}}$$

Note: In February 2003, the computations for A and B were corrected to:

A = (n+1)*q
B = (n+1)*(1 - q)

The original form was taken from the text of the Wilcox book. However, Wilcox's S-PLUS macros and the original Harrell and Davis article both indicate that the new formulas are the correct ones.
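A minimal Python sketch of the Harrell-Davis estimate, using the corrected shape parameters A = (n+1)*q and B = (n+1)*(1-q), is shown below. Python's standard library has no beta CDF, so BETCDF is replaced by a hypothetical helper `beta_cdf`. It evaluates the regularized incomplete beta function with the standard continued-fraction method (as in Numerical Recipes). `harrell_davis` is likewise a hypothetical name, not a Dataplot or library routine.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered term
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Beta CDF with shape parameters a, b (regularized incomplete beta I_x(a, b))."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Evaluate the continued fraction on the side where it converges quickly
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def harrell_davis(data, q):
    """Harrell-Davis quantile estimate: a beta-weighted sum of all order statistics."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie in the open interval (0, 1)")
    x = sorted(data)                 # step 1: sort ascending
    n = len(x)
    a = (n + 1) * q                  # corrected A
    b = (n + 1) * (1.0 - q)          # corrected B
    total = 0.0
    prev = 0.0                       # BETCDF(0/n, A, B)
    for i in range(1, n + 1):
        cur = beta_cdf(i / n, a, b)
        total += (cur - prev) * x[i - 1]   # W_i * X_i
        prev = cur
    return total


print(round(harrell_davis([1, 2, 3, 4, 5], 0.5), 6))   # 3.0
```

The weights W_i telescope to BETCDF(1) - BETCDF(0) = 1, so the estimate is a weighted average of the order statistics. For q = 0.5 the weights are symmetric, so evenly spaced data such as 1, 2, ..., n give (n+1)/2.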

Syntax:
LET <par> = <quant> QUANTILE <y>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<quant> is a number or parameter in the range (0,1) that specifies the desired quantile;
<par> is a parameter where the computed quantile is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
LET A = 0.20 QUANTILE Y

LET XQ = 0.50
LET A = XQ QUANTILE Y SUBSET TAG > 2

Note:
The PERCENTILE command is equivalent to the QUANTILE command using the order statistic method. The only difference is that the requested percentile is given as a percentage between 0 and 100% rather than as a fraction.
Note:
There are a number of other ways of calculating percentiles in common use. In an American Statistician article, Hyndman and Fan (1996) evaluated nine different methods (referred to here as R1 through R9) for computing percentiles against six desirable properties. Their goal was to advocate a "standard" definition of percentiles that would be implemented in statistical software. Although this has not happened, the article provides a useful summary and evaluation of the various methods. Most statistical and spreadsheet software use one of the methods described by Hyndman and Fan.

The default Dataplot method (the order statistic method described above) is equivalent to method R6 of Hyndman and Fan.

The method advocated by Hyndman and Fan is R8. For the R8 method,

$$X_{q} = X_{NI1} + r(X_{NI2} - X_{NI1})$$

where

• X are the observations sorted in ascending order
• NI1 = INT(q*(n+(1/3)) + (1/3))
• NI2 = NI1 + 1
• r = q*(n+(1/3)) + (1/3) - NI1

If q ≤ (2/3)/(n+(1/3)) the minimum value will be returned and if q ≥ (n-(1/3))/(n+(1/3)) the maximum value will be returned.
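The R8 definition can be sketched in Python as below. The function `quantile_r8` is a hypothetical name, not Dataplot's implementation. The sketch takes r as the fractional part of q*(n+(1/3)) + (1/3), the same quantity whose integer part gives NI1, which is how Hyndman and Fan define the interpolation weight. With the Program 2 data and q = 0.90, this gives 95.1972 after rounding to four decimals, which matches the R8 output shown for that program.

```python
import math


def quantile_r8(data, q):
    """Hyndman-Fan R8 sample quantile for 0 <= q <= 1."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    x = sorted(data)                 # X sorted in ascending order
    n = len(x)
    third = 1.0 / 3.0
    # End cases from the text: return the minimum or maximum
    if q <= (2.0 * third) / (n + third):
        return x[0]
    if q >= (n - third) / (n + third):
        return x[-1]
    h = q * (n + third) + third      # fractional position, 1 < h < n here
    ni1 = math.floor(h)              # NI1
    r = h - ni1                      # fractional part of the same position
    # X[NI1] + r*(X[NI2] - X[NI1]), converted to 0-based indexing
    return x[ni1 - 1] + r * (x[ni1] - x[ni1 - 1])
```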

Method R7 (this is the default method in R and Excel) is calculated by

$$X_{q} = X_{NI1} + r(X_{NI2} - X_{NI1})$$

where

• X are the observations sorted in ascending order
• NI1 = INT(q*(n-1) + 1)
• NI2 = NI1 + 1
• r = q*(n-1) + 1 - NI1

If q = 1, then Xn is returned.
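A matching sketch of R7 follows. `quantile_r7` is a hypothetical name. Here r is the fractional part of q*(n-1) + 1, the quantity whose integer part is NI1. Applied to the Program 2 data with q = 0.90, it gives 95.1957 after rounding, which matches the R7 output shown for that program.

```python
import math


def quantile_r7(data, q):
    """Hyndman-Fan R7 sample quantile for 0 <= q <= 1."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    x = sorted(data)                 # X sorted in ascending order
    n = len(x)
    if q == 1.0 or n == 1:
        return x[-1]                 # q = 1 returns X_n
    h = q * (n - 1) + 1              # fractional position, 1 <= h < n
    ni1 = math.floor(h)              # NI1
    r = h - ni1                      # fractional part of the same position
    # X[NI1] + r*(X[NI2] - X[NI1]), converted to 0-based indexing
    return x[ni1 - 1] + r * (x[ni1] - x[ni1 - 1])


print(quantile_r7([1, 2, 3, 4], 0.5))   # 2.5
```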

The R6, R7, and R8 methods give similar, but not identical, results. The differences are most noticeable for small samples. For most purposes, any of these three methods should be acceptable.

Note:
The following command is used to determine which method is used to compute the quantile:

SET QUANTILE METHOD <ORDER/HERRELL-DAVIS/R6/R7/R8>

R6 is equivalent to ORDER. ORDER is the default.

Note:
Dataplot statistics can be used in 20+ commands. For details, enter

The specific quantile to compute is specified by entering the following command (before the plot command):

LET XQ = <value>

where <value> is a number in the interval (0,1) that specifies the desired quantile.

Default:
The default is to use the order statistic method to compute the quantile.
Synonyms:
None
Related Commands:
• PERCENTILE = Compute a percentile of a variable.
• MEDIAN = Compute the median of a variable.
• LOWER QUARTILE = Compute the lower quartile of a variable.
• UPPER QUARTILE = Compute the upper quartile of a variable.
• FIRST DECILE = Compute the first decile (the 0.10 quantile) of a variable.
• STATISTIC PLOT = Generate a statistic versus subset plot for a given statistic.
• BOOTSTRAP PLOT = Generate a bootstrap plot for a given statistic.
References:
Rob J. Hyndman and Yanan Fan (November 1996), "Sample Quantiles in Statistical Packages", The American Statistician, Vol. 50, No. 4, pp. 361-365.

Rand Wilcox (1997), "Introduction to Robust Estimation and Hypothesis Testing", Academic Press.

Frank Harrell and C. E. Davis (1982), "A New Distribution-Free Quantile Estimator", Biometrika, Vol. 69, No. 3, pp. 635-640.

Applications:
Data Analysis
Implementation Date:
2002/7
2003/2: Correction to Harrell-Davis estimate.
2015/2: Support for R7 and R8 methods
Program 1:
LET Y1 = LOGISTIC RANDOM NUMBERS FOR I = 1 1 100
LET XQ = 0.05
LET Q05A = XQ QUANTILE Y1
LET XQ = 0.95
LET Q95A = XQ QUANTILE Y1
SET QUANTILE METHOD HERRELL DAVIS
LET XQ = 0.05
LET Q05B = XQ QUANTILE Y1
LET XQ = 0.95
LET Q95B = XQ QUANTILE Y1
LET Q05A = ROUND(Q05A,4)
LET Q95A = ROUND(Q95A,4)
LET Q05B = ROUND(Q05B,4)
LET Q95B = ROUND(Q95B,4)
PRINT "R6 METHOD: 0.05 Quantile = ^Q05A"
PRINT "R6 METHOD: 0.95 Quantile = ^Q95A"
PRINT "HD METHOD: 0.05 Quantile = ^Q05B"
PRINT "HD METHOD: 0.95 Quantile = ^Q95B"

The following output is generated
R6 METHOD: 0.05 Quantile = -2.5442
R6 METHOD: 0.95 Quantile = 3.4716
HD METHOD: 0.05 Quantile = -2.7308
HD METHOD: 0.95 Quantile = 3.3067

Program 2:
. Step 1:   Read the data (from e-Handbook example)
.
read y
95.1772
95.1567
95.1937
95.1959
95.1442
95.0610
95.1591
95.1195
95.1065
95.0925
95.1990
95.1682
end of data
.
. Step 2:   Compute the quantiles using different methods
.
let xq = 0.90
.
let xqr6 = quantile y
let xqr6 = round(xqr6,4)
.
set quantile method r7
let xqr7 = quantile y
let xqr7 = round(xqr7,4)
.
set quantile method r8
let xqr8 = quantile y
let xqr8 = round(xqr8,4)
.
. Step 3:   Print the results
.
print "Quantile with R6 method:  ^xqr6"
print "Quantile with R7 method:  ^xqr7"
print "Quantile with R8 method:  ^xqr8"

The following output is generated
Quantile with R6 method:  95.1981
Quantile with R7 method:  95.1957
Quantile with R8 method:  95.1972
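As an independent cross-check outside Dataplot, Python's standard-library `statistics.quantiles` implements two of these definitions. `method="exclusive"` (the default) interpolates at position q*(n+1), as in R6. `method="inclusive"` interpolates at position q*(n-1) + 1, as in R7. The standard library has no R8 option. For an interior quantile such as 0.90 with these 12 values, the sketch below should reproduce the first two lines of the Program 2 output. The two methods can differ from R6/R7 only at extreme quantiles.

```python
import statistics

# Program 2 data (e-Handbook example)
y = [95.1772, 95.1567, 95.1937, 95.1959, 95.1442, 95.0610,
     95.1591, 95.1195, 95.1065, 95.0925, 95.1990, 95.1682]

# n=10 returns the nine decile cut points; index 8 is the 0.90 quantile.
r6 = statistics.quantiles(y, n=10, method="exclusive")[8]
r7 = statistics.quantiles(y, n=10, method="inclusive")[8]

print(f"Quantile with R6 method:  {r6:.4f}")   # 95.1981
print(f"Quantile with R7 method:  {r7:.4f}")   # 95.1957
```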


NIST is an agency of the U.S. Commerce Department.

Date created: 07/22/2002
Last updated: 02/27/2015