# PERCENTILE

Name:
PERCENTILE (LET)
Type:
Let Subcommand
Purpose:
Compute a user specified percentile for a variable.
Description:
The p-th percentile of a data set is the value such that p percent of the data lie below it and (100-p) percent of the data lie above it. For example, the 50th percentile is the median.

The default method for computing percentiles in Dataplot is based on the order statistic. The formula is:

$$\hat{X}_p = (1 - r)X_{NI1} + rX_{NI2}$$

where

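• X are the observations sorted in ascending order
• n is the number of observations
• p is the requested percentile expressed as a fraction (the percentage divided by 100)
• NI1 = INT(p*(n+1))
• NI2 = NI1 + 1
• r = p*(n+1) - INT(p*(n+1))

If p < 1/(n+1), then X1 (the minimum) is returned. If p > n/(n+1), then Xn (the maximum) is returned.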
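The order-statistic formula above can be sketched outside Dataplot as follows. This is a minimal Python illustration, not Dataplot code; the function name `percentile_order` is hypothetical, and the sketch assumes the percentile is supplied as a percentage and divided by 100 before it enters the formula.

```python
import math


def percentile_order(data, pct):
    """Order-statistic percentile; pct is a percentage between 0 and 100."""
    x = sorted(data)              # observations in ascending order
    n = len(x)
    p = pct / 100.0               # fraction used in the formula
    h = p * (n + 1)               # 1-based fractional position
    if h < 1:                     # p < 1/(n+1): return the minimum
        return x[0]
    if h >= n:                    # p >= n/(n+1): return the maximum
        return x[-1]
    ni1 = math.floor(h)           # NI1 = INT(p*(n+1))
    r = h - ni1                   # fractional part
    # x is 0-based, so X_NI1 is x[ni1 - 1] and X_NI2 is x[ni1]
    return (1 - r) * x[ni1 - 1] + r * x[ni1]


print(percentile_order([1, 2, 3, 4, 5], 50))  # 3.0
```

For five observations and the 50th percentile, the position is 0.5*(5+1) = 3 with r = 0, so the third sorted value (the median) is returned.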

Syntax 1:
LET <par> = <value> PERCENTILE <y>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<par> is a parameter where the computed percentile is stored;
<value> is a parameter that specifies which percentile to compute (it is a percentage value between 0 and 100);
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 2:
LET <par> = PERCENTILE <y>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<par> is a parameter where the computed percentile is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.

With this syntax, the desired percentile is specified by entering the command

LET P100 = <value>

before entering the PERCENTILE command, where <value> is a percentage between 0 and 100.

Examples:
LET A = 20 PERCENTILE Y
LET A = 50 PERCENTILE Y SUBSET TAG > 2

LET P100 = 90
LET A = PERCENTILE Y

Note:
The QUANTILE command is equivalent to the PERCENTILE command. The only difference is that the requested percentile is given as a percentage between 0 and 100, while the requested quantile is given as a fraction between 0 and 1.
Note:
There are a number of other methods for computing percentiles in common use. Hyndman and Fan (1996), in an American Statistician article, evaluated nine different methods (referred to here as R1 through R9) against six desirable properties. Their goal was to advocate a "standard" definition for percentiles that would be implemented in statistical software. Although this has not in fact happened, the article does provide a useful summary and evaluation of the various methods. Most statistical and spreadsheet software uses one of the methods described in Hyndman and Fan.

The default method used by Dataplot described above is equivalent to method R6 of Hyndman and Fan. The description of the methods here will be in terms of the quantile q = p/100 where p is the desired percentile.

The method advocated by Hyndman and Fan is R8. For the R8 method,

$$X_{q} = X_{NI1} + r(X_{NI2} - X_{NI1})$$

where

• X are the observations sorted in ascending order
• NI1 = INT(q*(n+(1/3)) + (1/3))
• NI2 = NI1 + 1
• r = q*(n+(1/3)) + (1/3) - INT(q*(n+(1/3)) + (1/3))

If q ≤ (2/3)/(n+(1/3)) the minimum value will be returned and if q ≥ (n-(1/3))/(n+(1/3)) the maximum value will be returned.
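A minimal Python sketch of the R8 computation follows, for illustration only (it is not Dataplot code, and the function name `quantile_r8` is hypothetical). It takes r as the fractional part of the same position q*(n+(1/3)) + (1/3) that defines NI1, which is the Hyndman and Fan definition.

```python
import math


def quantile_r8(data, q):
    """Hyndman and Fan method R8; q is a fraction between 0 and 1."""
    x = sorted(data)                    # observations in ascending order
    n = len(x)
    h = q * (n + 1 / 3) + 1 / 3         # 1-based fractional position
    if h <= 1:                          # q <= (2/3)/(n+1/3): minimum
        return x[0]
    if h >= n:                          # q >= (n-1/3)/(n+1/3): maximum
        return x[-1]
    ni1 = math.floor(h)                 # NI1
    r = h - ni1                         # fractional part of the position
    # x is 0-based, so X_NI1 is x[ni1 - 1] and X_NI2 is x[ni1]
    return x[ni1 - 1] + r * (x[ni1] - x[ni1 - 1])
```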

Method R7 (this is the default method in R and Excel) is calculated by

$$X_{q} = X_{NI1} + r(X_{NI2} - X_{NI1})$$

where

• X are the observations sorted in ascending order
• NI1 = INT(q*(n-1) + 1)
• NI2 = NI1 + 1
• r = q*(n-1) + 1 - INT(q*(n-1) + 1)

If q = 1, then Xn is returned.
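The R7 computation can be sketched in Python in the same way. This is an illustration only (not Dataplot code; the function name `quantile_r7` is hypothetical), with r taken as the fractional part of the position q*(n-1) + 1 that defines NI1, following Hyndman and Fan.

```python
import math


def quantile_r7(data, q):
    """Hyndman and Fan method R7; q is a fraction between 0 and 1."""
    x = sorted(data)                    # observations in ascending order
    n = len(x)
    h = q * (n - 1) + 1                 # 1-based position, from 1 to n
    ni1 = math.floor(h)                 # NI1
    if ni1 >= n:                        # q = 1: return the maximum
        return x[-1]
    r = h - ni1                         # fractional part of the position
    # x is 0-based, so X_NI1 is x[ni1 - 1] and X_NI2 is x[ni1]
    return x[ni1 - 1] + r * (x[ni1] - x[ni1 - 1])
```

Because the position runs from 1 (q = 0) to n (q = 1), R7 interpolates within the range of the data for every q in between, so only the q = 1 case needs special handling.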

The R6, R7, and R8 methods give similar, but not identical, results; the differences are most noticeable for small samples. For most purposes, any of these three methods should be acceptable.
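To see how the three methods diverge on a small sample, the following Python sketch (an illustration, not Dataplot code) computes all three from one function. The function name `quantile`, its `method` argument, and the five-point `sample` are hypothetical; each method differs only in the 1-based position h, and r is taken as the fractional part of h.

```python
import math


def quantile(data, q, method="R6"):
    """Quantile by Hyndman and Fan method R6, R7, or R8 (q between 0 and 1)."""
    x = sorted(data)
    n = len(x)
    if method == "R6":
        h = q * (n + 1)
    elif method == "R7":
        h = q * (n - 1) + 1
    elif method == "R8":
        h = q * (n + 1 / 3) + 1 / 3
    else:
        raise ValueError("method must be 'R6', 'R7', or 'R8'")
    h = min(max(h, 1.0), float(n))      # clamp to the minimum/maximum
    ni1 = math.floor(h)
    if ni1 >= n:
        return x[-1]
    r = h - ni1
    return x[ni1 - 1] + r * (x[ni1] - x[ni1 - 1])


sample = [2.0, 4.0, 7.0, 11.0, 16.0]    # made-up data, n = 5
for q in (0.1, 0.5, 0.9):
    values = [quantile(sample, q, m) for m in ("R6", "R7", "R8")]
    print(q, values)
```

With only five observations, R6 and R8 already return the extreme values at q = 0.1 and q = 0.9, while R7 still interpolates; all three agree at the median.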

Note:
The following command is used to determine which method is used to compute the quantile:

SET QUANTILE METHOD <ORDER/R6/R7/R8>

R6 is equivalent to ORDER. ORDER is the default.

Note:
Dataplot statistics can be used in 20+ commands. For details, enter

The specific percentile to compute is specified by entering the following command (before the plot command):

LET P100 = <value>

where <value> is a number in the interval (0,100) that specifies the desired percentile.

Default:
None
Synonyms:
None
Related Commands:
QUANTILE = Compute a specified quantile of the data.
MEDIAN = Compute the median of a variable.
LOWER QUARTILE = Compute the lower quartile of a variable.
UPPER QUARTILE = Compute the upper quartile of a variable.
FIRST DECILE = Compute the first decile (the 10th percentile) of a variable.
STATISTIC PLOT = Generate a statistic versus subset plot for a given statistic.
BOOTSTRAP PLOT = Generate a bootstrap plot for a given statistic.
Reference:
Hyndman and Fan (November 1996), "Sample Quantiles in Statistical Packages", The American Statistician, Vol. 50, No. 4, pp. 361-365.
Applications:
Data Analysis
Implementation Date:
1998/12
2015/02: Support for R7 and R8 methods
Program 1:

LET Y1 = NORMAL RANDOM NUMBERS FOR I = 1 1 100
LET A1 = 30 PERCENTILE Y1

The value -0.4613 is returned.
Program 2:

. Step 1:   Read the data (from e-Handbook example)
.
read y
95.1772
95.1567
95.1937
95.1959
95.1442
95.0610
95.1591
95.1195
95.1065
95.0925
95.1990
95.1682
end of data
.
. Step 2:   Compute the quantiles using different methods
.
let p100 = 90
.
let xpr6 = percentile y
let xpr6 = round(xpr6,4)
.
set quantile method r7
let xpr7 = percentile y
let xpr7 = round(xpr7,4)
.
set quantile method r8
let xpr8 = percentile y
let xpr8 = round(xpr8,4)
.
. Step 3:   Print the results
.
print "Percentile with R6 method:  ^xpr6"
print "Percentile with R7 method:  ^xpr7"
print "Percentile with R8 method:  ^xpr8"

The following output is generated.

Percentile with R6 method:  95.1981
Percentile with R7 method:  95.1957
Percentile with R8 method:  95.1972
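
As a check, these three values can be reproduced by hand. Sorting the n = 12 observations gives X10 = 95.1937, X11 = 95.1959, and X12 = 95.1990, and the requested quantile is q = 0.9. The arithmetic below takes r as the fractional part of the position h that defines NI1 for each method (the Hyndman and Fan definition).

$$\text{R6: } h = 0.9(12 + 1) = 11.7, \quad X_q = X_{11} + 0.7(X_{12} - X_{11}) = 95.1959 + 0.7(0.0031) \approx 95.1981$$

$$\text{R7: } h = 0.9(12 - 1) + 1 = 10.9, \quad X_q = X_{10} + 0.9(X_{11} - X_{10}) = 95.1937 + 0.9(0.0022) \approx 95.1957$$

$$\text{R8: } h = 0.9(12 + \tfrac{1}{3}) + \tfrac{1}{3} \approx 11.4333, \quad X_q = X_{11} + 0.4333(X_{12} - X_{11}) = 95.1959 + 0.4333(0.0031) \approx 95.1972$$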


NIST is an agency of the U.S. Commerce Department.

Date created: 06/05/2001
Last updated: 02/27/2015