
DISTRIBUTIONAL BOOTSTRAP

Name:
    BOOTSTRAP PLOT
Type:
    Graphics Command
Purpose:
    Generates a bootstrap plot for a given probability distribution.
Description:
    The PPCC PLOT and KS PLOT provide a graphical method for estimating the shape parameter of a probability distribution. The PROBABILITY PLOT can then be used to estimate the location and scale parameters.

    One limitation of this approach is that it does not provide uncertainty intervals for these estimates. To address this, we have extended the BOOTSTRAP PLOT command to support a number of probability distributions.

    The bootstrap is a non-parametric method for estimating the sampling distribution of a statistic. The bootstrap computes the statistic for each of N bootstrap samples, where each bootstrap sample is drawn with replacement from the original data.
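    The resampling loop itself is short. The Python sketch below is illustrative only (it is not Dataplot code): the helper name `bootstrap` is hypothetical, the median stands in for an arbitrary statistic, and each resample is assumed to have the same size as the original data, which is the usual convention.

```python
import random
import statistics


def bootstrap(data, statistic, n_boot=100, seed=None):
    """Evaluate statistic on n_boot resamples drawn with replacement from data."""
    rng = random.Random(seed)
    n = len(data)
    # rng.choices samples with replacement; each resample has n points.
    return [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]


data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 3.9]
medians = bootstrap(data, statistics.median, n_boot=200, seed=1)
```

    The spread of `medians` approximates the sampling distribution of the median.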

    To apply the bootstrap to the univariate distributional modeling problem, we do the following:

    1. We have a univariate dataset containing n points.

    2. We draw a bootstrap sample from the original data.

    3. We perform the estimation for the bootstrap sample.

      • For location/scale distributions, we estimate the parameters from a probability plot.

        The PPCC value is also computed for the bootstrap sample.

      • For distributions with either one or two shape parameters, we generate a PPCC plot (alternatively, we can generate a KS plot) to estimate the shape parameters. We then generate a probability plot to estimate the location and scale parameters.

        The PPCC value is also computed for the bootstrap sample. If a KS plot is used, the value of the Kolmogorov-Smirnov statistic is computed instead.

      • Bootstrapping is also supported for maximum likelihood estimation for a number of distributions. In this case, we perform the maximum likelihood estimation on the bootstrap sample.

        No PPCC or Kolmogorov-Smirnov statistic is computed for maximum likelihood estimates.

    4. Steps 2 and 3 are repeated for each of the N bootstrap samples.
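    A minimal Python sketch of steps 2 and 3 for a location/scale distribution (here the normal) follows. It is not Dataplot's implementation: the helper names are hypothetical, Filliben's approximation to the uniform order statistic medians is assumed for the plotting positions, the location and scale are taken as the intercept and slope of a least-squares line through the probability plot, and the PPCC is the correlation coefficient of that plot.

```python
import random
from statistics import NormalDist


def order_stat_medians(n):
    """Filliben's approximation to the uniform order statistic medians (assumed)."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m


def probplot_fit(y, ppf):
    """Fit sorted data against theoretical quantiles; return (loc, scale, ppcc)."""
    ys = sorted(y)
    xs = [ppf(p) for p in order_stat_medians(len(ys))]
    n = len(ys)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((v - ybar) ** 2 for v in ys)
    sxy = sum((x - xbar) * (v - ybar) for x, v in zip(xs, ys))
    scale = sxy / sxx                 # slope of the probability plot
    loc = ybar - scale * xbar         # intercept of the probability plot
    ppcc = sxy / (sxx * syy) ** 0.5   # correlation of the probability plot
    return loc, scale, ppcc


def bootstrap_probplot(y, ppf, n_boot=100, seed=None):
    """Repeat the probability-plot fit on n_boot resamples of y."""
    rng = random.Random(seed)
    return [probplot_fit(rng.choices(y, k=len(y)), ppf) for _ in range(n_boot)]


rng = random.Random(7)
y = [rng.gauss(10.0, 2.0) for _ in range(100)]
est = bootstrap_probplot(y, NormalDist().inv_cdf, n_boot=50, seed=11)
```

    Each tuple in `est` corresponds to one point on each curve of the bootstrap plot.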

    For the bootstrap plot, there will be a separate curve drawn for each parameter estimated. In addition, there will be a curve drawn for the PPCC value (or the Kolmogorov-Smirnov statistic). The vertical axis contains the computed value of these estimated parameters and the horizontal axis contains the sample number (for k = 1, 2, ..., N).

    The bootstrap plot is typically followed by some type of distributional plot, such as a histogram, for each estimated parameter. This is demonstrated in the sample program below.

    Dataplot also supports bootstrap computations for the case when there is one group variable. In this case, the horizontal axis is the group id and the vertical axis contains the computed values of the estimated parameters for that group (the parameters are offset horizontally). The requested number of bootstrap samples is applied to each group. For example, if the requested number of bootstrap samples is 100, then 100 bootstrap samples are generated for each group.
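    The per-group behaviour can be sketched in Python as follows. This is an illustration under assumptions, not Dataplot's code: `grouped_bootstrap` is a hypothetical helper, and resampling is assumed to happen within each group, so every group receives the full requested number of bootstrap samples.

```python
import random
import statistics
from collections import defaultdict


def grouped_bootstrap(y, groups, statistic, n_boot=100, seed=None):
    """Return {group id: [statistic for each of n_boot resamples within the group]}."""
    by_group = defaultdict(list)
    for value, g in zip(y, groups):
        by_group[g].append(value)
    rng = random.Random(seed)
    results = {}
    for g in sorted(by_group):
        vals = by_group[g]
        # Every group gets the full n_boot resamples, each drawn from that group only.
        results[g] = [statistic(rng.choices(vals, k=len(vals))) for _ in range(n_boot)]
    return results


y = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
groups = [1, 1, 1, 2, 2, 2]
res = grouped_bootstrap(y, groups, statistics.mean, n_boot=100, seed=2)
```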

Syntax 1:
    BOOTSTRAP <dist> PLOT <y>       <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable;
                <dist> is one of the following distributions:
        ANGLIT
        ARCSINE
        CAUCHY
        COSINE
        EXPONENTIAL
        GUMBEL (EXTREME VALUE TYPE 1)
        HALF CAUCHY
        HALF LOGISTIC
        HALF NORMAL
        HYPERBOLIC SECANT
        LAPLACE (DOUBLE EXPONENTIAL)
        LOGISTIC
        NORMAL
        RAYLEIGH
        SEMI-CIRCULAR
        SLASH
        UNIFORM
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax estimates the location and scale parameters for each bootstrap sample using a probability plot.

Syntax 2:
    BOOTSTRAP <dist> CENSORED PLOT <y> <x>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable;
                <x> is the censoring variable;
                <dist> is one of the distributions given for Syntax 1;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax estimates the location and scale parameters for each bootstrap sample using a censored probability plot. The censoring variable should contain a value of 1 to indicate a failure time and a value of 0 to indicate a censoring time.
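    The page does not say how the censoring variable is treated during resampling. A natural assumption, sketched below in Python with a hypothetical helper, is that each response value keeps its own censoring flag, so the (y, x) pairs are resampled together.

```python
import random


def bootstrap_censored_pairs(y, x, seed=None):
    """Draw one bootstrap sample of (response, censoring flag) pairs.

    x[i] == 1 marks a failure time and x[i] == 0 a censoring time; each
    flag is resampled together with its own response value.
    """
    if len(y) != len(x):
        raise ValueError("y and x must have the same length")
    rng = random.Random(seed)
    idx = rng.choices(range(len(y)), k=len(y))
    return [y[i] for i in idx], [x[i] for i in idx]


y = [5.0, 8.0, 12.0, 3.0]
x = [1, 0, 1, 1]
yb, xb = bootstrap_censored_pairs(y, x, seed=4)
```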

Syntax 3:
    BOOTSTRAP <dist> PLOT <y>       <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable;
                <dist> is one of the following distributions:
        ASYMMETRIC LAPLACE (ASYMMETRIC DOUBLE EXPONENTIAL)
        BRADFORD
        CHI
        CHI-SQUARE
        DOUBLE GAMMA
        DOUBLE WEIBULL
        ERROR (SUBBOTIN)
        FATIGUE LIFE
        FOLDED T
        FRECHET
        GAMMA
        GENERALIZED EXTREME VALUE
        GENERALIZED HALF LOGISTIC
        GENERALIZED LOGISTIC
        GENERALIZED PARETO
        GEOMETRIC EXTREME EXPONENTIAL
        INVERTED GAMMA
        INVERTED WEIBULL
        LOG DOUBLE EXPONENTIAL (LOG LAPLACE)
        LOG GAMMA
        LOG LOGISTIC
        LOGNORMAL
        PARETO
        PARETO SECOND KIND
        POWER
        POWER NORMAL
        RECIPROCAL
        SKEW LAPLACE (SKEW DOUBLE EXPONENTIAL)
        T
        TUKEY-LAMBDA
        VON MISES
        WALD
        WRAPPED CAUCHY
        WEIBULL
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax estimates the shape parameter using a PPCC plot and then estimates the location and scale parameters using a probability plot for each bootstrap sample.
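    For a one-shape-parameter distribution such as the Weibull, the per-sample estimation can be sketched in Python as below. Assumptions: the shape is found by a simple grid search over [gamma1, gamma2] (named after the GAMMA1 and GAMMA2 parameters described in a later note; the 96-point grid is arbitrary), the plotting positions use Filliben's approximation, and the standardized Weibull quantile is (-ln(1-p))^(1/gamma). Dataplot's own search strategy may differ, and the helper names are hypothetical.

```python
import math
import random


def order_stat_medians(n):
    """Filliben's approximation to the uniform order statistic medians (assumed)."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m


def line_fit(xs, ys):
    """Least-squares intercept, slope, and correlation of ys on xs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((a - xbar) ** 2 for a in xs)
    syy = sum((b - ybar) ** 2 for b in ys)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope, sxy / math.sqrt(sxx * syy)


def weibull_ppcc_fit(y, gamma1=0.5, gamma2=10.0, steps=95):
    """Grid search over shape in [gamma1, gamma2]; return (loc, scale, shape, ppcc)."""
    ys = sorted(y)
    m = order_stat_medians(len(ys))
    best = None
    for k in range(steps + 1):
        gamma = gamma1 + (gamma2 - gamma1) * k / steps
        # Standardized Weibull quantiles for this trial shape value.
        xs = [(-math.log(1.0 - p)) ** (1.0 / gamma) for p in m]
        loc, scale, ppcc = line_fit(xs, ys)
        if best is None or ppcc > best[3]:
            best = (loc, scale, gamma, ppcc)
    return best


rng = random.Random(42)
y = [rng.weibullvariate(1.0, 2.3) for _ in range(100)]  # scale 1, shape 2.3
boot = [weibull_ppcc_fit(rng.choices(y, k=len(y))) for _ in range(20)]
```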

Syntax 4:
    BOOTSTRAP <dist> CENSORED PLOT <y> <x>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable;
                <x> is the censoring variable;
                <dist> is one of the distributions given for Syntax 3;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax estimates the shape parameter using a censored PPCC plot and then estimates the location and scale parameters using a censored probability plot for each bootstrap sample. The censoring variable should contain a value of 1 to indicate a failure time and a value of 0 to indicate a censoring time.

Syntax 5:
    BOOTSTRAP <dist> KS PLOT <y>    <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable;
                <dist> is one of the distributions given for Syntax 3;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax estimates the shape parameter using a KS plot and then estimates the location and scale parameters using a probability plot for each bootstrap sample.
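    A comparable Python sketch for the KS variant follows. It assumes, without confirmation from this page, that for each trial shape value the location and scale come from the probability plot and that the shape with the smallest Kolmogorov-Smirnov statistic is selected; the helper names and the grid are hypothetical.

```python
import math
import random


def order_stat_medians(n):
    """Filliben's approximation to the uniform order statistic medians (assumed)."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m


def line_fit(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((a - xbar) ** 2 for a in xs)
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys)) / sxx
    return ybar - slope * xbar, slope


def ks_statistic(ys_sorted, cdf):
    """Kolmogorov-Smirnov D for sorted data against a fully specified CDF."""
    n = len(ys_sorted)
    d = 0.0
    for i, v in enumerate(ys_sorted, start=1):
        f = cdf(v)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def weibull_cdf(v, gamma, loc, scale):
    z = (v - loc) / scale
    return 1.0 - math.exp(-(z ** gamma)) if z > 0 else 0.0


def weibull_ks_fit(y, gamma1=0.5, gamma2=10.0, steps=95):
    """Pick the shape with the smallest KS statistic; return (loc, scale, shape, ks)."""
    ys = sorted(y)
    m = order_stat_medians(len(ys))
    best = None
    for k in range(steps + 1):
        gamma = gamma1 + (gamma2 - gamma1) * k / steps
        xs = [(-math.log(1.0 - p)) ** (1.0 / gamma) for p in m]
        loc, scale = line_fit(xs, ys)  # location/scale from the probability plot
        d = ks_statistic(ys, lambda v: weibull_cdf(v, gamma, loc, scale))
        if best is None or d < best[3]:
            best = (loc, scale, gamma, d)
    return best


rng = random.Random(3)
y = [rng.weibullvariate(1.0, 2.3) for _ in range(100)]
boot = [weibull_ks_fit(rng.choices(y, k=len(y))) for _ in range(20)]
```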

Syntax 6:
    BOOTSTRAP <dist> PLOT <y>       <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable;
    <dist> is one of the following distributions:
        BETA
        F
        FOLDED NORMAL
        G-AND-H
        INVERSE GAUSSIAN
        GENERALIZED GAMMA
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax estimates the two shape parameters using a PPCC plot and then estimates the location and scale parameters using a probability plot for each bootstrap sample.

Syntax 7:
    BOOTSTRAP <dist> CENSORED PLOT <y> <x>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable;
                <x> is the censoring variable;
                <dist> is one of the distributions given for Syntax 6;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax estimates the two shape parameters using a censored PPCC plot and then estimates the location and scale parameters using a censored probability plot for each bootstrap sample. The censoring variable should contain a value of 1 to indicate a failure time and a value of 0 to indicate a censoring time.

Syntax 8:
    BOOTSTRAP <dist> MLE PLOT <y>   <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable;
                <dist> is one of the following distributions:
        CAUCHY
        EXPONENTIAL
        FOLDED NORMAL
        GUMBEL (EXTREME VALUE TYPE 1)
        LAPLACE (DOUBLE EXPONENTIAL)
        LOGISTIC
        NORMAL
        RAYLEIGH
        UNIFORM
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax estimates the location and scale parameters using maximum likelihood for each bootstrap sample.
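    For the maximum likelihood syntaxes, each bootstrap sample is simply refit. A Python sketch for the normal distribution, whose MLEs have a closed form (the sample mean and the standard deviation with divisor n), is shown below; the function names are hypothetical.

```python
import math
import random


def normal_mle(y):
    """Closed-form normal MLEs: mean and standard deviation with divisor n."""
    n = len(y)
    mu = sum(y) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in y) / n)  # divisor n, not n - 1
    return mu, sigma


def bootstrap_normal_mle(y, n_boot=100, seed=None):
    """Refit the normal MLEs on n_boot resamples of y."""
    rng = random.Random(seed)
    return [normal_mle(rng.choices(y, k=len(y))) for _ in range(n_boot)]


rng = random.Random(8)
y = [rng.gauss(5.0, 1.5) for _ in range(60)]
est = bootstrap_normal_mle(y, n_boot=100, seed=9)
```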

Syntax 9:
    BOOTSTRAP <dist> CENSORED MLE PLOT <y> <x>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable;
                <x> is the censoring variable;
                <dist> is one of the following distributions:
        NORMAL
        EXPONENTIAL
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax estimates the location and scale parameters using maximum likelihood for each bootstrap sample. The censoring variable should contain a value of 1 to indicate a failure time and a value of 0 to indicate a censoring time.
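    For the censored exponential case the MLE has a closed form if the location is fixed at 0 (an assumption here; Dataplot may parameterize differently): the estimated mean is the total time on test divided by the number of failures. The Python sketch below resamples (y, x) pairs and, as a further assumption, skips resamples that contain no failures, since the MLE does not exist for them. The function names are hypothetical.

```python
import random


def exponential_censored_mle(y, x):
    """MLE of the exponential mean (location fixed at 0) with right censoring.

    x[i] == 1 marks a failure time, x[i] == 0 a censoring time.
    """
    failures = sum(1 for flag in x if flag == 1)
    if failures == 0:
        raise ValueError("no failures: the MLE does not exist")
    return sum(y) / failures  # total time on test / number of failures


def bootstrap_exponential_censored(y, x, n_boot=100, seed=None):
    if 1 not in x:
        raise ValueError("data contain no failures")
    rng = random.Random(seed)
    n = len(y)
    estimates = []
    while len(estimates) < n_boot:
        idx = rng.choices(range(n), k=n)  # resample (y, x) pairs together
        xb = [x[i] for i in idx]
        if 1 not in xb:
            continue  # skip resamples with no failures (an assumption)
        estimates.append(exponential_censored_mle([y[i] for i in idx], xb))
    return estimates


y = [2.0, 4.0, 6.0, 3.5, 9.0, 1.2]
x = [1, 0, 1, 1, 0, 1]
est = bootstrap_exponential_censored(y, x, n_boot=50, seed=6)
```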

Syntax 10:
    BOOTSTRAP <dist> MLE PLOT <y>    <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable;
                <dist> is one of the following distributions:
        BETA
        FATIGUE LIFE
        GAMMA
        GENERALIZED PARETO
        GEOMETRIC EXTREME EXPONENTIAL
        INVERSE GAUSSIAN
        LOGNORMAL
        PARETO
        WEIBULL
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax estimates the shape and scale parameters using maximum likelihood for each bootstrap sample. For the beta, Pareto, and inverse Gaussian distributions, two shape parameters are estimated instead and no scale parameter is estimated.
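    As one example of Syntax 10, a two-parameter Weibull MLE can be computed by solving the profile likelihood equation for the shape k, sum(x^k ln x)/sum(x^k) - 1/k - mean(ln x) = 0, and then setting the scale to (mean(x^k))^(1/k). The Python sketch below uses bisection for this standard textbook approach; Dataplot's own algorithm is not described here, and the function name is hypothetical.

```python
import math
import random


def weibull_mle(y, tol=1e-10):
    """Two-parameter Weibull MLE (shape, scale) for positive data."""
    logs = [math.log(v) for v in y]
    lmax, lmin = max(logs), min(logs)
    if lmax == lmin:
        raise ValueError("all observations are equal")
    n = len(logs)
    mean_log = sum(logs) / n

    def g(k):
        # Weights x^k / max(x)^k avoid overflow for large k.
        w = [math.exp(k * (l - lmax)) for l in logs]
        return sum(wi * l for wi, l in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    lo, hi = 1e-3, 1.0
    while g(hi) < 0.0:  # g is increasing in k; expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    w_mean = sum(math.exp(k * (l - lmax)) for l in logs) / n
    scale = math.exp(lmax + math.log(w_mean) / k)  # (mean of x^k)^(1/k)
    return k, scale


rng = random.Random(12)
y = [rng.weibullvariate(2.0, 3.0) for _ in range(2000)]  # scale 2, shape 3
shape, scale = weibull_mle(y)
boot = [weibull_mle(rng.choices(y, k=len(y))) for _ in range(10)]
```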

Syntax 11:
    BOOTSTRAP <dist> CENSORED MLE PLOT <y> <x>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable;
                <x> is the censoring variable;
                <dist> is one of the following distributions:
        GAMMA
        LOGNORMAL
        WEIBULL
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax estimates the location and scale parameters using maximum likelihood for each bootstrap sample. The censoring variable should contain a value of 1 to indicate a failure time and a value of 0 to indicate a censoring time.

Syntax 12:
    BOOTSTRAP <dist> MLE PLOT <y>   <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable;
                <dist> is one of the following distributions:
        ASYMMETRIC LAPLACE
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax estimates the shape, location, and scale parameters using maximum likelihood for each bootstrap sample.

Examples:
    BOOTSTRAP NORMAL PLOT Y
    BOOTSTRAP NORMAL MLE PLOT Y
    BOOTSTRAP WEIBULL PLOT Y
    BOOTSTRAP WEIBULL KS PLOT Y
    BOOTSTRAP WEIBULL CENSORED PLOT Y X
Note:
    The BOOTSTRAP PLOT command generates the estimates for the bootstrap samples. Typically, these values are processed further for a complete bootstrap analysis. To simplify this, Dataplot writes information to files.

    dpst1f.dat

    The estimates for each bootstrap sample are written to file dpst1f.dat. You can read the variables written to dpst1f.dat to generate histograms and to compute selected percentiles.

    The order is:

      group id
      location parameter
      scale parameter
      first shape parameter
      second shape parameter
      ppcc (or ks) value

    If a particular syntax does not generate one or more of these values, then they are omitted (e.g., the normal distribution does not generate estimates for any shape parameters).

    The following example generates a Weibull bootstrap plot and then reads the bootstrap estimates from dpst1f.dat.

      BOOTSTRAP WEIBULL PLOT Y
      SKIP 0
      READ DPST1F.DAT ALOC ASCALE AGAMMA APPCC
      MULTIPLOT 2 2
      RELATIVE HISTOGRAM ALOC
      RELATIVE HISTOGRAM ASCALE
      RELATIVE HISTOGRAM AGAMMA
      RELATIVE HISTOGRAM APPCC
      END OF MULTIPLOT
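    To post-process the estimates outside Dataplot, dpst1f.dat can be read as plain text. The Python sketch below assumes whitespace-separated numeric columns, one row per bootstrap sample, in the order listed above (with the group id omitted for ungrouped data, as in the sample program); Fortran-style D exponents would not parse with float(). The helper name is hypothetical, and the file contents written here are made-up illustrative values, not real Dataplot output.

```python
import os
import tempfile


def read_columns(path, names):
    """Read whitespace-separated numeric rows into a dict of named columns."""
    cols = {name: [] for name in names}
    with open(path) as fh:
        for lineno, line in enumerate(fh, start=1):
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            if len(fields) != len(names):
                raise ValueError(f"line {lineno}: expected {len(names)} values")
            for name, field in zip(names, fields):
                cols[name].append(float(field))
    return cols


# Illustrative file contents, one row per bootstrap sample (values made up).
with tempfile.NamedTemporaryFile("w", suffix=".dat", delete=False) as tmp:
    tmp.write("0.10 1.20 2.30 0.990\n0.20 1.10 2.40 0.985\n")
cols = read_columns(tmp.name, ["ALOC", "ASCALE", "AGAMMA", "APPCC"])
os.remove(tmp.name)
```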

    dpst2f.dat

    Selected percentiles are written to dpst2f.dat. The order is:

      Group id - omitted for ungrouped data
      Parameter number - parameters are ordered as they are in dpst1f.dat
      Mean
      Standard Deviation
      Median
      2.5 percentile
      97.5 percentile
      5.0 percentile
      95.0 percentile
      0.5 percentile
      99.5 percentile
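    The dpst2f.dat summaries can be reproduced from the bootstrap estimates of one parameter along the lines of the Python sketch below. Dataplot's exact percentile definition and standard deviation divisor are not stated on this page, so linear interpolation between order statistics and the n - 1 divisor are assumptions; the helper names are hypothetical.

```python
import statistics


def percentile(sorted_vals, p):
    """p-th percentile (0-100) by linear interpolation between order statistics."""
    h = (len(sorted_vals) - 1) * p / 100.0
    lo = int(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def summary_row(estimates):
    """Mean, SD, median, then the 2.5/97.5, 5/95, 0.5/99.5 percentiles."""
    v = sorted(estimates)
    row = [statistics.mean(v), statistics.stdev(v), statistics.median(v)]
    for p in (2.5, 97.5, 5.0, 95.0, 0.5, 99.5):
        row.append(percentile(v, p))
    return row


row = summary_row([float(i) for i in range(1, 102)])
```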
Note:
    You can optionally specify that bootstrap estimates of selected percentiles be generated. This option is off by default.

    If you enter the command

      SET MAXIMUM LIKELIHOOD PERCENTILES DEFAULT

    you will get bootstrap estimates for the following percentiles:

      0.5, 1.0, 2.5, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 95.0, 97.5, 99.0, 99.5

    To specify the percentiles to estimate, enter the command

      SET MAXIMUM LIKELIHOOD PERCENTILES YPERC

    with YPERC denoting a variable that contains the desired percentiles.

    This is demonstrated in the sample program below.
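    A bootstrap estimate of a distribution percentile is obtained by evaluating that percentile from each bootstrap sample's parameter estimates. The Python sketch below does this for the Weibull, using the percentile loc + scale*(-ln(1 - p/100))^(1/gamma) and the 17 default percentiles listed above; the helper names are hypothetical, and the (loc, scale, gamma) tuples are placeholders for values read from the bootstrap output.

```python
import math
import statistics

DEFAULT_PERCENTILES = [0.5, 1.0, 2.5, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0,
                       60.0, 70.0, 80.0, 90.0, 95.0, 97.5, 99.0, 99.5]


def weibull_percentile(p, loc, scale, gamma):
    """p-th percentile (0 < p < 100) of a 3-parameter Weibull."""
    return loc + scale * (-math.log(1.0 - p / 100.0)) ** (1.0 / gamma)


def bootstrap_percentile_table(estimates, percentiles=DEFAULT_PERCENTILES):
    """Map each percentile to its values across the bootstrap samples."""
    return {p: [weibull_percentile(p, *est) for est in estimates] for p in percentiles}


# Placeholder (loc, scale, gamma) estimates from three bootstrap samples.
estimates = [(0.0, 1.0, 2.0), (0.1, 1.2, 2.5), (-0.05, 0.9, 2.2)]
table = bootstrap_percentile_table(estimates)
medians = {p: statistics.median(vals) for p, vals in table.items()}
```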

    By default, two-sided confidence intervals are generated for the percentiles. The following commands can be used to generate either lower one-sided or upper one-sided intervals:

      SET BOOTSTRAP DISTRIBUTIONAL PERCENTILES LOWER
      SET BOOTSTRAP DISTRIBUTIONAL PERCENTILES UPPER

    To turn off the computation of the percentile confidence intervals, enter

      SET BOOTSTRAP DISTRIBUTIONAL PERCENTILES NONE

    To restore the default of two-sided intervals, enter

      SET BOOTSTRAP DISTRIBUTIONAL PERCENTILES TWOSIDED
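    The three interval types can be sketched as follows. The page does not define LOWER and UPPER precisely; the Python sketch assumes LOWER gives a one-sided lower confidence bound at the alpha quantile and UPPER a one-sided upper bound at the 1 - alpha quantile of the bootstrap values. The helper names are hypothetical.

```python
def quantile(sorted_vals, q):
    """q-quantile (0 <= q <= 1) by linear interpolation between order statistics."""
    h = (len(sorted_vals) - 1) * q
    lo = int(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def percentile_interval(boot, alpha=0.05, kind="twosided"):
    """Percentile-method interval from bootstrap values."""
    v = sorted(boot)
    if kind == "twosided":
        return quantile(v, alpha / 2), quantile(v, 1 - alpha / 2)
    if kind == "lower":  # lower confidence bound only
        return quantile(v, alpha), float("inf")
    if kind == "upper":  # upper confidence bound only
        return float("-inf"), quantile(v, 1 - alpha)
    raise ValueError(f"unknown interval kind: {kind}")


boot = [float(i) for i in range(101)]
```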
Note:
    As with the PPCC and KS plots, you can specify the range for the shape parameter. For example, to restrict the estimate of the shape parameter of a Weibull distribution to values between 0.5 and 10, enter the commands

      LET GAMMA1 = 0.5
      LET GAMMA2 = 10
      BOOTSTRAP WEIBULL PLOT Y

    One recommendation is to generate the bootstrap plot for a relatively small number of samples (e.g., 50) and use that to determine a reasonable range for the shape parameter.

    Enter HELP PPCC PLOT to see the relevant parameter for the desired distribution.

Note:
    The BOOTSTRAP PLOT supports ungrouped data, one group variable, or two group variables. The distributional bootstrap plots do not currently support two group variables.
Note:
    Dataplot supports the BCA BOOTSTRAP PLOT option to generate more accurate confidence intervals. BCa is an abbreviation for "bias-corrected and accelerated". It provides second-order accuracy (as opposed to the first-order accuracy of the confidence intervals based on the percentiles of the bootstrap samples).

    The BCA option is not currently supported for the bootstrap distributional plots.
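    For reference, the BCa interval for a general statistic can be sketched in Python following the formulas in Efron and Tibshirani (1993): a bias correction z0 from the fraction of bootstrap values below the full-sample estimate, an acceleration a from jackknife (leave-one-out) estimates, and adjusted percentile levels. This is not Dataplot's code, the helper names are hypothetical, and as noted above BCa does not apply to the distributional bootstrap plots.

```python
import random
from statistics import NormalDist, mean


def quantile(sorted_vals, q):
    """q-quantile (0 <= q <= 1) by linear interpolation between order statistics."""
    h = (len(sorted_vals) - 1) * q
    lo = int(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def bca_interval(data, statistic, alpha=0.05, n_boot=2000, seed=None):
    nd = NormalDist()
    rng = random.Random(seed)
    theta_hat = statistic(data)
    boot = sorted(statistic(rng.choices(data, k=len(data))) for _ in range(n_boot))
    # Bias correction z0 from the fraction of bootstrap values below the estimate.
    frac = sum(1 for t in boot if t < theta_hat) / n_boot
    if not 0.0 < frac < 1.0:
        raise ValueError("bias correction is undefined for these bootstrap values")
    z0 = nd.inv_cdf(frac)
    # Acceleration from the jackknife (leave-one-out) estimates.
    jack = [statistic(data[:i] + data[i + 1:]) for i in range(len(data))]
    jbar = mean(jack)
    num = sum((jbar - t) ** 3 for t in jack)
    den = 6.0 * sum((jbar - t) ** 2 for t in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def adjusted(q):
        z = nd.inv_cdf(q)
        return nd.cdf(z0 + (z0 + z) / (1.0 - a * (z0 + z)))

    return quantile(boot, adjusted(alpha / 2)), quantile(boot, adjusted(1 - alpha / 2))


rng = random.Random(21)
data = [rng.expovariate(1.0) for _ in range(40)]
lo, hi = bca_interval(data, mean, seed=5)
```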

Note:
    Some analysts prefer a different method for generating the bootstrap samples.

    The full sample is used to generate the parameter estimates of the distribution. Then the bootstrap samples are generated by generating random numbers from the specified distribution using the parameters estimated from the full sample.

    The following command will specify this alternate method of bootstrapping be used:

      SET DISTRIBUTIONAL BOOTSTRAP PARAMETRIC

    To restore the default method of bootstrapping the data values, enter the command

      SET DISTRIBUTIONAL BOOTSTRAP NONPARAMETRIC

    The parametric option is still being tested and may not work correctly for some of the distributions.
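    The parametric scheme can be sketched in Python for the normal distribution. For simplicity the sketch estimates the parameters by maximum likelihood (an assumption; Dataplot may use the probability plot), then draws each bootstrap sample from the fitted distribution instead of resampling the data. The function names are hypothetical.

```python
import math
import random


def normal_fit(y):
    """Normal MLEs (mean, SD with divisor n)."""
    n = len(y)
    mu = sum(y) / n
    return mu, math.sqrt(sum((v - mu) ** 2 for v in y) / n)


def parametric_bootstrap_normal(y, n_boot=100, seed=None):
    rng = random.Random(seed)
    mu, sigma = normal_fit(y)  # parameters estimated from the full sample
    n = len(y)
    out = []
    for _ in range(n_boot):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]  # simulated, not resampled
        out.append(normal_fit(sample))
    return out


rng = random.Random(14)
y = [rng.gauss(10.0, 2.0) for _ in range(50)]
est = parametric_bootstrap_normal(y, n_boot=200, seed=3)
```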

Note:
    The BOOTSTRAP SAMPLE command can be used to specify the desired number of bootstrap samples. Recommended values are between 100 and 1,000. The default is 100.
Default:
    None
Synonyms:
    KOLMOGOROV SMIRNOV is a synonym for KS
    MAXIMUM LIKELIHOOD is a synonym for MLE
Reference:
    Efron and Gong (1983), "A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation", The American Statistician, February, 1983.

    Efron and Tibshirani (1993), "An Introduction to the Bootstrap", Chapman & Hall.

Applications:
    Distributional Modeling
Implementation Date:
    2005/04
Program:
     
    .  The following sample macro demonstrates the use of the
    .  bootstrap with a Weibull distribution.
    .
    .  Step 0: Create some sample Weibull data
    .
    dimension 50 columns
    .
    let gamma = 2.3
    let y = weibull random numbers for i = 1 1 100
    .
    . Step 1: Perform PPCC/probability plot analysis and
    .         overlay the fitted Weibull on a histogram
    .
    set ipl1na distboo1.ps
    device 2 postscript
    device 2 color on
    .
    multiplot 2 2
    multiplot corner coordinates 0 0 100 100
    multiplot scale factor 1.5
    y1label displacement 12
    .
    title displacement 2
    x1label displacement 12
    title Weibull PPCC Plot
    x1label Shape Parameter (gamma)
    y1label Correlation
    weibull ppcc plot y
    justification left
    height 3.5
    move 25 28
    text Max PPCC: ^maxppcc
    move 25 21
    text Shape: ^shape
    let gamma1 = shape - 2
    if gamma1 <= 0
       let gamma1 = 0.1
    end of if
    let gamma2 = shape + 2
    title Weibull PPCC Plot
    weibull ppcc plot y
    move 25 28
    text Max PPCC: ^maxppcc
    move 25 23
    text Shape: ^shape
    let gamma = shape
    let n = size y
    if n <= 200
       character x
       line blank
    else
       line solid
       character blank
    end of if
    .
    title Weibull Probability Plot
    x1label displacement
    x1label Theoretical
    y1label Data
    weibull probability plot y
    justification center
    move 50 2
    text Location:  ^ppa0, Scale:  ^ppa1
    let iplot = 3
    multiplot 2 2 iplot
    line solid
    character blank
    limits freeze
    pre-erase off
    let function f = ppa0 + ppa1*x
    let zmin = minimum xplot
    let zmax = maximum xplot
    let ainc2 = (zmax - zmin)/10
    plot f for x = zmin ainc2 zmax
    limits
    pre-erase on
    let iplot = iplot + 1
    multiplot 2 2 iplot
    title Histogram with Overlaid Weibull
    x1label Data Units
    y1label Density
    relative histogram y
    multiplot 2 2 iplot
    limits freeze
    pre-erase off
    let loc2=ppa0
    let scale2=ppa1
    line solid
    char blank
    line color blue
    let amin = minimum y
    let amax = maximum y
    let ainc = 0.1
    plot weipdf(x,shape,loc2,scale2) for x = amin ainc amax
    limits
    pre-erase on
    line color black
    delete gamma gamma1 gamma2
    end of multiplot
    device 2 close
    .
    . Step 2: Now perform a bootstrap analysis
    .
    feedback off
    capture distboot.out
    write " "
    write "BOOTSTRAP-BASED INTERVALS"
    write " "
    set maximum likelihood percentiles default
    bootstrap samples 200
    let gamma1 = 0.2
    let gamma2 = 5
    set ipl1na distboo2.ps
    device 2 postscript
    device 2 color on
    multiplot 2 3
    multiplot corner coordinates 0 0 100 100
    multiplot scale factor 1.7
    y1label Parameter Estimate
    x1label
    x2label Bootstrap Sample
    title Bootstrap Plot
    line color blue red green
    limits
    bootstrap weibull plot y
    line color black all
    .
    delete aloc ascale ashape appcc
    skip 0
    read dpst1f.dat aloc ascale ashape appcc
    y1label
    x2label
    x3label displacement 16
    title Location Parameter
    let amed = median aloc
    let amean = mean aloc
    let asd = sd aloc
    x2label Med = ^amed, Mean = ^amean
    x3label SD = ^asd
    histogram aloc
    title Scale Parameter
    let amed = median ascale
    let amean = mean ascale
    let asd = sd ascale
    x2label Med = ^amed, Mean = ^amean
    x3label SD = ^asd
    histogram ascale
    title Shape Parameter
    let amed = median ashape
    let amean = mean ashape
    let asd = sd ashape
    x2label Med = ^amed, Mean = ^amean
    x3label SD = ^asd
    histogram ashape
    title PPCC Value
    let amed = median appcc
    let amean = mean appcc
    let asd = sd appcc
    x2label Med = ^amed, Mean = ^amean
    x3label SD = ^asd
    histogram appcc
    x3label displacement
    .
    device 2 close
    .
    let alpha = 0.05
    let xqlow = alpha/2
    let xqupp = 1 - alpha/2
    .
    write "Bootstrap-based Confidence Intervals"
    write "alpha = ^alpha"
    write " "
    .
    let xq = xqlow
    let loc95low = xq quantile aloc
    let xq = xqupp
    let loc95upp = xq quantile aloc
    let xq = xqlow
    let sca95low = xq quantile ascale
    let xq = xqupp
    let sca95upp = xq quantile ascale
    let xq = xqlow
    let sha95low = xq quantile ashape
    let xq = xqupp
    let sha95upp = xq quantile ashape
    write "Confidence Interval for Location: (^loc95low,^loc95upp)"
    write "Confidence Interval for Scale:    (^sca95low,^sca95upp)"
    write "Confidence Interval for Gamma:    (^sha95low,^sha95upp)"
    .
    .  Now generate confidence intervals for percentiles
    .
    serial read p
    0.5 1 2.5 5 10 20 30 40 50 60 70 80 90 95 97.5 99 99.5
    end of data
    let nperc = size p
    skip 1
    read matrix dpst4f.dat  xqp
    write " "
    loop for k = 1 1 nperc
        let xqptemp = p(k)
        let amed = median xqp^k
        let xqpmed(k) = amed
        let xq = xqlow
        let atemp = xq quantile xqp^k
        let xq95low(k) = atemp
        let xq = xqupp
        let atemp = xq quantile xqp^k
        let xq95upp(k) = atemp
    end of loop
    set table title "Bootstrap Based Confidence Intervals for Percentiles"
    set table spacing 15
    set write decimals 7
    write " "
    write "Confidence Intervals for Percentiles"
    write p xqpmed xq95low xq95upp
    end of capture
    delete xqp xqpmed xq95low xq95upp
    delete gamma1 gamma2
        

    plot generated by sample program

    plot generated by sample program

     BOOTSTRAP-BASED INTERVALS
      
     Bootstrap-based Confidence Intervals
     alpha = 0.05
      
     Confidence Interval for Location: (-0.4112,0.272742)
     Confidence Interval for Scale:    (0.744203,1.572763)
     Confidence Interval for Gamma:    (1.738776,3.748062)
      
      
     Confidence Intervals for Percentiles
    
     VARIABLES--P              XQPMED         XQ95LOW        XQ95UPP 
    
          0.5000000      0.1427522     -0.0264724      0.3189068
          1.0000000      0.1783569      0.0333498      0.3433715
          2.5000000      0.2441821      0.1315464      0.3925191
          5.0000000      0.3210546      0.2229533      0.4490436
         10.0000000      0.4301926      0.3367559      0.5266151
         20.0000000      0.5761956      0.4959525      0.6575613
         30.0000000      0.6924722      0.6142470      0.7801363
         40.0000000      0.8017390      0.7214258      0.8862920
         50.0000000      0.9073263      0.8198442      0.9952904
         60.0000000      1.0129241      0.9191304      1.1054595
         70.0000000      1.1306920      1.0393872      1.2280047
         80.0000000      1.2808006      1.1713773      1.3832619
         90.0000000      1.4895294      1.3569634      1.5912455
         95.0000000      1.6622919      1.5139780      1.7756084
         97.5000000      1.8108935      1.6356139      1.9443940
         99.0000000      1.9902619      1.7803770      2.1564791
         99.5000000      2.1085887      1.8710965      2.3045762
        
Date created: 04/20/2005
Last updated: 12/04/2023

Please email comments on this WWW page to alan.heckert@nist.gov.