DISTRIBUTIONAL BOOTSTRAP

Name:

BOOTSTRAP PLOT

Type:

Graphics Command

Purpose:

Generates a bootstrap plot for a given probability distribution.

Description:

One limitation of estimating distribution parameters from PPCC plots and probability plots is that the method does not provide uncertainty intervals for the estimates. To address this, we have extended the BOOTSTRAP PLOT command to support a number of probability distributions.

The bootstrap is a non-parametric method for calculating a sampling distribution for a statistic. The bootstrap calculates the statistic with N different subsamples. The subsampling is performed with replacement.
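
As an illustration only (this is Python, not Dataplot, and is not how Dataplot implements the command), the resampling idea can be sketched as follows. The function name bootstrap_statistic and the sample data are hypothetical, and the sketch assumes each resample has the same size as the original data.

```python
import random
import statistics


def bootstrap_statistic(data, statistic, n_boot=100, seed=0):
    """Compute `statistic` on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Assumption: each resample has the same size as the data.
        resample = rng.choices(data, k=n)
        estimates.append(statistic(resample))
    return estimates


# Hypothetical data, for illustration only.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]
medians = bootstrap_statistic(data, statistics.median, n_boot=200)
print(len(medians), min(medians), max(medians))
```

The list of estimates plays the role of the sampling distribution: plotting it against the sample index gives the analogue of a bootstrap plot.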

To apply the bootstrap to the univariate distributional modeling problem, we do the following:

1. We have a univariate dataset containing n points.
2. We draw a bootstrap sample from the original data.
3. We perform the estimation for the bootstrap sample.
- For location/scale distributions, we estimate the parameters from a probability plot.
  The PPCC value is also computed for the bootstrap sample.
- For distributions with either one or two shape parameters, we generate a PPCC plot (alternatively, we can generate a KS plot) to estimate the shape parameters. We then generate a probability plot to estimate the location and scale parameters.
  The PPCC value is also computed for the bootstrap sample. If a KS plot is being used, the value of Kolmogorov-Smirnov statistic is computed instead.
- Bootstrapping is also supported for maximum likelihood estimation for a number of distributions. In this case, we perform the maximum likelihood estimation on the bootstrap sample.
  In this case, no PPCC or Kolmogorov-Smirnov statistic is computed.
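
The location/scale case above can be sketched in Python for the normal distribution. This is a simplified illustration, not Dataplot's algorithm: the plotting positions (i - 0.5)/n are an assumption, and the names probability_plot_fit and distributional_bootstrap are hypothetical. The intercept and slope of the line fitted to the probability plot serve as the location and scale estimates, and the correlation of the plotted points is the PPCC value.

```python
import math
import random
from statistics import NormalDist, mean


def probability_plot_fit(sample):
    """Return (location, scale, ppcc) from a normal probability plot."""
    y = sorted(sample)
    n = len(y)
    std_normal = NormalDist()
    # Assumed plotting positions; Dataplot may use a different formula.
    x = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    scale = sxy / sxx                 # slope of the fitted line
    location = ybar - scale * xbar    # intercept of the fitted line
    ppcc = sxy / math.sqrt(sxx * syy)  # correlation of the plotted points
    return location, scale, ppcc


def distributional_bootstrap(data, n_boot=100, seed=0):
    """Repeat the probability plot fit on n_boot resamples of the data."""
    rng = random.Random(seed)
    return [probability_plot_fit(rng.choices(data, k=len(data)))
            for _ in range(n_boot)]


# Hypothetical normal data with location 10 and scale 2.
gen = random.Random(1)
data = [gen.gauss(10.0, 2.0) for _ in range(100)]
rows = distributional_bootstrap(data, n_boot=50)
print(rows[0])
```

Each row holds one bootstrap sample's location, scale, and PPCC value, which corresponds to one point on each curve of the bootstrap plot.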

For the bootstrap plot, there will be a separate curve drawn for each parameter estimated. In addition, there will be a curve drawn for the PPCC value (or the Kolmogorov-Smirnov statistic). The vertical axis contains the computed value of these estimated parameters and the horizontal axis contains the sample number (for k = 1, 2, ..., N).

The bootstrap plot is typically followed by some type of distributional plot, such as a histogram, for each estimated parameter. This is demonstrated in the sample program below.

Dataplot also supports bootstrap computations when there is one group variable. In this case, the horizontal axis is the group id and the vertical axis contains the computed values of the estimated parameters for that group (the parameters are offset horizontally). The requested number of bootstrap samples is applied to each group. For example, if the requested number of bootstrap samples is 100, then 100 bootstrap samples are generated for each group.
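
The grouped case can be sketched in Python as follows. This is an illustration under assumptions, not Dataplot's implementation: the name grouped_bootstrap and the data are hypothetical, and the mean stands in for the distributional estimates.

```python
import random
import statistics
from collections import defaultdict


def grouped_bootstrap(values, groups, statistic, n_boot=100, seed=0):
    """Apply n_boot bootstrap resamples separately within each group."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    results = {}
    for group in sorted(by_group):
        members = by_group[group]
        # Every group receives the full requested number of resamples.
        results[group] = [statistic(rng.choices(members, k=len(members)))
                          for _ in range(n_boot)]
    return results


# Hypothetical data with two groups.
values = [1.2, 1.9, 1.5, 1.1, 4.8, 5.3, 5.1, 4.6]
groups = [1, 1, 1, 1, 2, 2, 2, 2]
results = grouped_bootstrap(values, groups, statistics.mean, n_boot=100)
print({g: len(est) for g, est in results.items()})
```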

Syntax 1:

This syntax estimates the location and scale parameters for each bootstrap sample using a probability plot.

Syntax 2:

This syntax estimates the location and scale parameters for each bootstrap sample using a censored probability plot. The censoring variable should contain a value of 1 to indicate a failure time and a value of 0 to indicate a censoring time.

Syntax 3:

This syntax estimates the shape parameter using a PPCC plot and then estimates the location and scale parameters using a probability plot for each bootstrap sample.

Syntax 4:

This syntax estimates the shape parameter using a censored PPCC plot and then estimates the location and scale parameters using a censored probability plot for each bootstrap sample. The censoring variable should contain a value of 1 to indicate a failure time and a value of 0 to indicate a censoring time.

Syntax 5:

This syntax estimates the shape parameter using a KS plot and then estimates the location and scale parameters using a probability plot for each bootstrap sample.

Syntax 6:

This syntax estimates the two shape parameters using a PPCC plot and then estimates the location and scale parameters using a probability plot for each bootstrap sample.

Syntax 7:

This syntax estimates the two shape parameters using a censored PPCC plot and then estimates the location and scale parameters using a censored probability plot for each bootstrap sample. The censoring variable should contain a value of 1 to indicate a failure time and a value of 0 to indicate a censoring time.

Syntax 8:

This syntax estimates the location and scale parameters using maximum likelihood for each bootstrap sample.

Syntax 9:

This syntax estimates the location and scale parameters using maximum likelihood for each bootstrap sample. The censoring variable should contain a value of 1 to indicate a failure time and a value of 0 to indicate a censoring time.

Syntax 10:

This syntax estimates the shape and scale parameters using maximum likelihood for each bootstrap sample. For the beta, Pareto, and inverse Gaussian distributions, two shape parameters are estimated and no scale parameter is estimated.

Syntax 11:

Syntax 12:

This syntax estimates the shape, location, and scale parameters using maximum likelihood for each bootstrap sample.

Examples:

Note:

dpst1f.dat

The estimates for each bootstrap sample are written to file dpst1f.dat. You can read the variables written to dpst1f.dat to generate histograms and to compute selected percentiles.

The order is:

If a particular syntax does not generate one or more of these values, then they are omitted (e.g., the normal distribution does not generate estimates for any shape parameters).

The following example generates a Weibull bootstrap plot and then reads the bootstrap estimates from dpst1f.dat.

dpst2f.dat

Selected percentiles are written to dpst2f.dat. The order is:

Note:

If you enter the command

SET MAXIMUM LIKELIHOOD PERCENTILES DEFAULT

you will get bootstrap estimates for the following percentiles:

0.5, 1.0, 2.5, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 95.0, 97.5, 99.0, 99.5

If you would like to specify the specific percentiles to estimate, enter the command

SET MAXIMUM LIKELIHOOD PERCENTILES YPERC

with YPERC denoting a variable that contains the desired percentiles.

This is demonstrated in the sample program below.
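
For reference, a percentile estimate for one bootstrap sample is the percent point function of the fitted distribution evaluated at the requested percentile. The Python sketch below does this for a Weibull (minimum) distribution using the standard formula loc + scale*(-ln(1 - p))**(1/shape). It is an illustration, not Dataplot code; the function name weibull_percentile and the parameter values are hypothetical.

```python
import math

# The default percentile list given above.
DEFAULT_PERCENTILES = [0.5, 1.0, 2.5, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0,
                       60.0, 70.0, 80.0, 90.0, 95.0, 97.5, 99.0, 99.5]


def weibull_percentile(p, shape, loc=0.0, scale=1.0):
    """Return the p-th percentile (0 < p < 100) of a fitted Weibull."""
    q = p / 100.0
    return loc + scale * (-math.log(1.0 - q)) ** (1.0 / shape)


# Hypothetical parameter estimates from one bootstrap sample.
shape, loc, scale = 2.3, 0.0, 1.0
estimates = [weibull_percentile(p, shape, loc, scale)
             for p in DEFAULT_PERCENTILES]
for p, value in zip(DEFAULT_PERCENTILES, estimates):
    print(p, round(value, 4))
```

Repeating this for every bootstrap sample gives, for each percentile, a set of values from which an interval can be read.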

By default, two-sided confidence intervals are generated for the percentiles. The following commands can be used to generate either lower one-sided or upper one-sided intervals:

To turn off the computation of the percentile confidence intervals, enter

SET BOOTSTRAP DISTRIBUTIONAL PERCENTILES NONE

To reset the default of two sided intervals, enter

SET BOOTSTRAP DISTRIBUTIONAL PERCENTILES TWOSIDED
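
The difference between two-sided and one-sided intervals can be sketched in Python using the percentile method on a set of bootstrap estimates. This is an illustration only: the labels "twosided", "lower", and "upper" are names chosen for this sketch and are not Dataplot options, and the linear-interpolation quantile is one of several common definitions and may differ from the one Dataplot uses.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile for 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def percentile_interval(estimates, alpha=0.05, kind="twosided"):
    """Return (lower, upper) limits from bootstrap estimates."""
    s = sorted(estimates)
    if kind == "twosided":
        # alpha is split evenly between the two tails.
        return quantile(s, alpha / 2.0), quantile(s, 1.0 - alpha / 2.0)
    if kind == "lower":
        # All of alpha is placed in the lower tail; no upper limit.
        return quantile(s, alpha), float("inf")
    if kind == "upper":
        # All of alpha is placed in the upper tail; no lower limit.
        return float("-inf"), quantile(s, 1.0 - alpha)
    raise ValueError("kind must be 'twosided', 'lower', or 'upper'")


# Hypothetical bootstrap estimates 0, 1, ..., 100.
estimates = [float(i) for i in range(101)]
two_sided = percentile_interval(estimates, 0.05, "twosided")
lower_only = percentile_interval(estimates, 0.05, "lower")
upper_only = percentile_interval(estimates, 0.05, "upper")
print(two_sided, lower_only, upper_only)
```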

Note:

One recommendation is to generate the bootstrap plot for a relatively small number of samples (e.g., 50) and use that to determine a reasonable range for the shape parameter.

Enter HELP PPCC PLOT to see the relevant parameter for the desired distribution.

Note:

The BOOTSTRAP PLOT supports ungrouped data, one group variable, or two group variables. The distributional bootstrap plots do not currently support two group variables.

Note:

The BCA option is not currently supported for the bootstrap distributional plots.

Note:

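By default, the bootstrap samples are drawn from the data values. Dataplot also supports an alternate, parametric method of bootstrapping. In this method, the full sample is used to estimate the parameters of the distribution. The bootstrap samples are then generated as random numbers from the specified distribution, using the parameters estimated from the full sample.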

The following command will specify this alternate method of bootstrapping be used:

SET DISTRIBUTIONAL BOOTSTRAP PARAMETRIC

To restore the default method of bootstrapping the data values, enter the command

SET DISTRIBUTIONAL BOOTSTRAP NONPARAMETRIC

The parametric option is still being tested and may not work correctly for some of the distributions.
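
The contrast with resampling the data values can be sketched in Python. The sketch is an illustration under assumptions, not Dataplot's implementation: it uses the normal distribution with the sample mean and standard deviation as the estimators, and the name parametric_bootstrap_normal is hypothetical.

```python
import math
import random
import statistics


def parametric_bootstrap_normal(data, n_boot=100, seed=0):
    """Parametric bootstrap: simulate from the distribution fitted to the data."""
    rng = random.Random(seed)
    # Step 1: estimate the parameters once, from the full sample.
    mu_hat = statistics.mean(data)
    sigma_hat = statistics.stdev(data)
    n = len(data)
    rows = []
    for _ in range(n_boot):
        # Step 2: generate random numbers from the fitted distribution
        # instead of resampling the observed values.
        simulated = [rng.gauss(mu_hat, sigma_hat) for _ in range(n)]
        # Step 3: re-estimate the parameters from the simulated sample.
        rows.append((statistics.mean(simulated), statistics.stdev(simulated)))
    return (mu_hat, sigma_hat), rows


# Hypothetical data, for illustration only.
gen = random.Random(2)
data = [gen.gauss(5.0, 1.5) for _ in range(60)]
(mu_hat, sigma_hat), rows = parametric_bootstrap_normal(data, n_boot=100)
print(mu_hat, sigma_hat, len(rows))
```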

Note:

The BOOTSTRAP SAMPLE command can be used to specify the desired number of bootstrap samples. Recommended values are between 100 and 1,000. The default is 100.

Default:

None

Synonyms:

Related Commands:

BOOTSTRAP PLOT	= Generate a bootstrap plot.
PPCC PLOT	= Generate a PPCC plot.
KS PLOT	= Generate a KS plot.
PROBABILITY PLOT	= Generate a probability plot.
JACKNIFE PLOT	= Generate a jackknife plot.
BOOTSTRAP SAMPLE	= Set the number of bootstrap samples.
BOOTSTRAP FIT	= Compute a bootstrap linear/multilinear fit.
HISTOGRAM	= Generate a histogram.
PLOT	= Generate a data/function plot.

Reference:

The American Statistician

Efron and Tibshirani (1993), "An Introduction to the Bootstrap", Springer-Verlag.

Applications:

Distributional Modeling

Implementation Date:

2005/4

Program:

 
.  Following Sample Macro demonstrates the use of the
.  bootstrap with a Weibull distribution.
.
.  Step 0: Create some sample Weibull data
.
dimension 50 columns
.
let gamma = 2.3
let y = weibull random numbers for i = 1 1 100
.
. Step 1: Perform PPCC/Probability Plot Analysis,
.         Perform K-S goodness of fit test
.
set ipl1na distboo1.ps
device 2 postscript
device 2 color on
.
multiplot 2 2
multiplot corner coordinates 0 0 100 100
multiplot scale factor 1.5
y1label displacement 12
.
title displacement 2
x1label displacement 12
title Weibull PPCC Plot
x1label Shape Parameter (gamma)
y1label Correlation
weibull ppcc plot y
justification left
height 3.5
move 25 28
text Max PPCC: ^maxppcc
move 25 21
text Shape: ^shape
let gamma1 = shape - 2
if gamma1 <= 0
   let gamma1 = 0.1
end of if
let gamma2 = shape + 2
title Weibull PPCC Plot
weibull ppcc plot y
move 25 28
text Max PPCC: ^maxppcc
move 25 23
text Shape: ^shape
let gamma = shape
if n <= 200
   character x
   line blank
else
   line solid
   character blank
end of if
.
title Weibull Probability Plot
x1label displacement
x1label Theoretical
y1label Data
weibull probability plot y
justification center
move 50 2
text Location:  ^ppa0, Scale:  ^ppa1
let iplot = 3
multiplot 2 2 iplot
line solid
character blank
limits freeze
pre-erase off
let function f = ppa0 + ppa1*x
let zmin = minimum xplot
let zmax = maximum xplot
let ainc2 = (zmax - zmin)/10
plot f for x = zmin ainc2 zmax
limits
pre-erase on
let iplot = iplot + 1
multiplot 2 2 iplot
title Histogram with Overlaid Weibull
x1label Data Units
y1label Density
relative histogram y
multiplot 2 2 iplot
limits freeze
pre-erase off
let loc2=ppa0
let scale2=ppa1
line solid
char blank
line color blue
let amin = minimum y
let amax = maximum y
let ainc = 0.1
plot weipdf(x,shape,loc2,scale2) for x = amin ainc amax
limits
pre-erase on
line color black
delete gamma gamma1 gamma2
end of multiplot
device 2 close
.
. Step 2: Now perform a bootstrap analysis
.
feedback off
capture distboot.out
write " "
write "BOOTSTRAP-BASED INTERVALS"
write " "
set maximum likelihood percentiles default
bootstrap samples 200
let gamma1 = 0.2
let gamma2 = 5
set ipl1na distboo2.ps
device 2 postscript
device 2 color on
multiplot 2 3
multiplot corner coordinates 0 0 100 100
multiplot scale factor 1.7
y1label Parameter Estimate
x1label
x2label Bootstrap Sample
title Bootstrap Plot
line color blue red green
limits
bootstrap weibull plot y
line color black all
.
delete aloc ascale ashape appcc
skip 0
read dpst1f.dat aloc ascale ashape appcc
y1label
x2label
x3label displacement 16
title Location Parameter
let amed = median aloc
let amean = mean aloc
let asd = sd aloc
x2label Med = ^amed, Mean = ^amean
x3label SD = ^asd
histogram aloc
title Scale Parameter
let amed = median ascale
let amean = mean ascale
let asd = sd ascale
x2label Med = ^amed, Mean = ^amean
x3label SD = ^asd
histogram ascale
title Shape Parameter
let amed = median ashape
let amean = mean ashape
let asd = sd ashape
x2label Med = ^amed, Mean = ^amean
x3label SD = ^asd
histogram ashape
title PPCC Value
let amed = median appcc
let amean = mean appcc
let asd = sd appcc
x2label Med = ^amed, Mean = ^amean
x3label SD = ^asd
histogram appcc
x3label displacement
.
device 2 close
.
let alpha = 0.05
let xqlow = alpha/2
let xqupp = 1 - alpha/2
.
write "Bootstrap-based Confidence Intervals"
write "alpha = ^alpha"
write " "
.
let xq = xqlow
let loc95low = xq quantile aloc
let xq = xqupp
let loc95upp = xq quantile aloc
let xq = xqlow
let sca95low = xq quantile ascale
let xq = xqupp
let sca95upp = xq quantile ascale
let xq = xqlow
let sha95low = xq quantile ashape
let xq = xqupp
let sha95upp = xq quantile ashape
write "Confidence Interval for Location: (^loc95low,^loc95upp)"
write "Confidence Interval for Scale:    (^sca95low,^sca95upp)"
write "Confidence Interval for Gamma:    (^sha95low,^sha95upp)"
.
.  Now generate confidence intervals for percentiles
.
serial read p
0.5 1 2.5 5 10 20 30 40 50 60 70 80 90 95 97.5 99 99.5
end of data
let nperc = size p
skip 1
read matrix dpst4f.dat  xqp
write " "
loop for k = 1 1 nperc
    let xqptemp = p(k)
    let amed = median xqp^k
    let xqpmed(k) = amed
    let xq = xqlow
    let atemp = xq quantile xqp^k
    let xq95low(k) = atemp
    let xq = xqupp
    let atemp = xq quantile xqp^k
    let xq95upp(k) = atemp
end of loop
set table title "Bootstrap Based Confidence Intervals for Percentiles"
set table spacing 15
set write decimals 7
write " "
write "Confidence Intervals for Percentiles"
write p xqpmed xq95low xq95upp
end of capture
delete xqp xqpmed xq95low xq95upp
delete gamma1 gamma2

 BOOTSTRAP-BASED INTERVALS
  
 Bootstrap-based Confidence Intervals
 alpha = 0.05
  
 Confidence Interval for Location: (-0.4112,0.272742)
 Confidence Interval for Scale:    (0.744203,1.572763)
 Confidence Interval for Gamma:    (1.738776,3.748062)
  
  
 Confidence Intervals for Percentiles

 VARIABLES--P              XQPMED         XQ95LOW        XQ95UPP 

      0.5000000      0.1427522     -0.0264724      0.3189068
      1.0000000      0.1783569      0.0333498      0.3433715
      2.5000000      0.2441821      0.1315464      0.3925191
      5.0000000      0.3210546      0.2229533      0.4490436
     10.0000000      0.4301926      0.3367559      0.5266151
     20.0000000      0.5761956      0.4959525      0.6575613
     30.0000000      0.6924722      0.6142470      0.7801363
     40.0000000      0.8017390      0.7214258      0.8862920
     50.0000000      0.9073263      0.8198442      0.9952904
     60.0000000      1.0129241      0.9191304      1.1054595
     70.0000000      1.1306920      1.0393872      1.2280047
     80.0000000      1.2808006      1.1713773      1.3832619
     90.0000000      1.4895294      1.3569634      1.5912455
     95.0000000      1.6622919      1.5139780      1.7756084
     97.5000000      1.8108935      1.6356139      1.9443940
     99.0000000      1.9902619      1.7803770      2.1564791
     99.5000000      2.1085887      1.8710965      2.3045762