DISTRIBUTIONAL BOOTSTRAP
One limitation of this method is that it does not provide uncertainty intervals for these estimates. To address this, we have extended the BOOTSTRAP PLOT command to support a number of probability distributions. The bootstrap is a non-parametric method for calculating the sampling distribution of a statistic. It calculates the statistic for N different subsamples of the data, each drawn with replacement. To apply the bootstrap to the univariate distributional modeling problem, we do the following:
For the bootstrap plot, a separate curve is drawn for each estimated parameter. In addition, a curve is drawn for the PPCC value (or the Kolmogorov-Smirnov statistic). The vertical axis contains the computed values of the estimated parameters and the horizontal axis contains the sample number (k = 1, 2, ..., N). The bootstrap plot is typically followed by some type of distributional plot, such as a histogram, for each estimated parameter. This is demonstrated in the sample program below. Dataplot also supports bootstrap computations for the case where there is one group variable. In this case, the horizontal axis is the group id and the vertical axis contains the computed values of the estimated parameters for that group (the parameters are offset horizontally). The requested number of bootstrap samples is applied to each group. For example, if the requested number of bootstrap samples is 100, then each group will have 100 bootstrap samples applied.
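The resampling idea behind the bootstrap plot can be illustrated outside Dataplot. The following is a minimal Python sketch, not Dataplot's implementation: the function name `bootstrap_estimates` is hypothetical, and the location and scale are estimated by the sample mean and standard deviation purely for illustration. Each entry of the result corresponds to one point on the horizontal axis of a bootstrap plot.

```python
import random
import statistics


def bootstrap_estimates(y, n_boot=100, seed=12345):
    """Return one (location, scale) pair per bootstrap sample.

    Each bootstrap sample has the same size as y and is drawn with
    replacement. Using the mean and standard deviation as the location
    and scale estimates is an assumption of this sketch.
    """
    rng = random.Random(seed)
    n = len(y)
    estimates = []
    for _ in range(n_boot):
        # Draw n indices with replacement and build the resampled data.
        sample = [y[rng.randrange(n)] for _ in range(n)]
        estimates.append((statistics.fmean(sample), statistics.stdev(sample)))
    return estimates


if __name__ == "__main__":
    data_rng = random.Random(1)
    y = [data_rng.gauss(10.0, 2.0) for _ in range(50)]
    # Print the first few (sample number, location, scale) triples.
    for k, (loc, scale) in enumerate(bootstrap_estimates(y)[:5], start=1):
        print(k, round(loc, 3), round(scale, 3))
```

Plotting the location and scale columns against the sample number gives the same kind of picture as the bootstrap plot described above.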
where <y> is the response variable; <dist> is one of the following distributions:
    ARCSINE
    CAUCHY
    COSINE
    EXPONENTIAL
    GUMBEL (EXTREME VALUE TYPE 1)
    HALF CAUCHY
    HALF LOGISTIC
    HALF NORMAL
    HYPERBOLIC SECANT
    LAPLACE (DOUBLE EXPONENTIAL)
    LOGISTIC
    NORMAL
    RAYLEIGH
    SEMI-CIRCULAR
    SLASH
    UNIFORM

This syntax estimates the location and scale parameters for each bootstrap sample using a probability plot.
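As an illustration of how a probability plot yields location and scale estimates, the following Python sketch fits a least-squares line through the sorted data plotted against theoretical quantiles: the intercept serves as the location estimate, the slope as the scale estimate, and the correlation as the PPCC value. The function names are hypothetical, and the use of Filliben's approximation for the plotting positions is an assumption of this sketch rather than a statement about Dataplot's internals.

```python
import math
import statistics


def filliben_medians(n):
    """Approximate uniform order statistic medians (Filliben's formula)."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m


def probability_plot_fit(y, quantile):
    """Return (location, scale, ppcc) from a probability plot.

    quantile is the percent point function of the standard (location 0,
    scale 1) form of the distribution. Requires at least two distinct
    data values.
    """
    ys = sorted(y)
    xs = [quantile(p) for p in filliben_medians(len(ys))]
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((v - ybar) ** 2 for v in ys)
    sxy = sum((x - xbar) * (v - ybar) for x, v in zip(xs, ys))
    scale = sxy / sxx                  # slope of the fitted line
    location = ybar - scale * xbar     # intercept of the fitted line
    ppcc = sxy / math.sqrt(sxx * syy)  # correlation of the plot points
    return location, scale, ppcc


# Percent point function of the standard normal distribution.
normal_quantile = statistics.NormalDist().inv_cdf
```

Applying `probability_plot_fit` to each bootstrap sample in turn gives one set of location, scale, and PPCC values per sample.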
<SUBSET/EXCEPT/FOR qualification> where <y> is the response variable; <x> is the censoring variable; <dist> is one of the distributions given for syntax 1; and where the <SUBSET/EXCEPT/FOR qualification> is optional. This syntax estimates the location and scale parameters for each bootstrap sample using a censored probability plot. The censoring variable should contain a value of 1 to indicate a failure time and a value of 0 to indicate a censoring time.
where <y> is the response variable; <dist> is one of the following distributions:
    BRADFORD
    CHI
    CHI-SQUARE
    DOUBLE GAMMA
    DOUBLE WEIBULL
    ERROR (SUBBOTIN)
    FATIGUE LIFE
    FOLDED T
    FRECHET
    GAMMA
    GENERALIZED EXTREME VALUE
    GENERALIZED HALF LOGISTIC
    GENERALIZED LOGISTIC
    GENERALIZED PARETO
    GEOMETRIC EXTREME EXPONENTIAL
    INVERTED GAMMA
    INVERTED WEIBULL
    LOG DOUBLE EXPONENTIAL (LOG LAPLACE)
    LOG GAMMA
    LOG LOGISTIC
    LOGNORMAL
    PARETO
    PARETO SECOND KIND
    POWER
    POWER NORMAL
    RECIPROCAL
    SKEW LAPLACE (SKEW DOUBLE EXPONENTIAL)
    T
    TUKEY-LAMBDA
    VON MISES
    WALD
    WRAPPED CAUCHY
    WEIBULL

This syntax estimates the shape parameter using a PPCC plot and then estimates the location and scale parameters using a probability plot for each bootstrap sample.
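The PPCC step can be sketched as a grid search over the shape parameter: for each candidate shape, compute the correlation between the sorted data and the theoretical quantiles, and keep the shape with the largest correlation. The Python sketch below does this for the Weibull distribution. The function names, the grid resolution, and the plotting positions (Filliben's approximation) are assumptions made for illustration and may differ from what Dataplot does.

```python
import math
import statistics


def plotting_positions(n):
    """Filliben's approximation to the uniform order statistic medians."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def weibull_quantile(p, gamma):
    """Percent point function of the standard (location 0, scale 1) Weibull."""
    return (-math.log(1.0 - p)) ** (1.0 / gamma)


def ppcc_shape(y, gamma_min, gamma_max, steps=50):
    """Return (shape, max_ppcc) from a scan of steps + 1 shape values."""
    ys = sorted(y)
    ps = plotting_positions(len(ys))
    best_r, best_gamma = -1.0, None
    for k in range(steps + 1):
        gamma = gamma_min + k * (gamma_max - gamma_min) / steps
        xs = [weibull_quantile(p, gamma) for p in ps]
        r = correlation(xs, ys)
        if r > best_r:
            best_r, best_gamma = r, gamma
    return best_gamma, best_r
```

Once the shape is fixed at the selected value, the location and scale follow from the intercept and slope of the corresponding probability plot.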
<SUBSET/EXCEPT/FOR qualification> where <y> is the response variable; <x> is the censoring variable; <dist> is one of the distributions given for syntax 3; and where the <SUBSET/EXCEPT/FOR qualification> is optional. This syntax estimates the shape parameter using a censored PPCC plot and then estimates the location and scale parameters using a censored probability plot for each bootstrap sample. The censoring variable should contain a value of 1 to indicate a failure time and a value of 0 to indicate a censoring time.
where <y> is the response variable; <dist> is one of the distributions given for syntax 3; and where the <SUBSET/EXCEPT/FOR qualification> is optional. This syntax estimates the shape parameter using a KS plot and then estimates the location and scale parameters using a probability plot for each bootstrap sample.
where <y> is the response variable; <dist> is one of the following distributions:
    F
    FOLDED NORMAL
    G-AND-H
    INVERSE GAUSSIAN
    GENERALIZED GAMMA

This syntax estimates the two shape parameters using a PPCC plot and then estimates the location and scale parameters using a probability plot for each bootstrap sample.
<SUBSET/EXCEPT/FOR qualification> where <y> is the response variable; <x> is the censoring variable; <dist> is one of the distributions given in syntax 6; and where the <SUBSET/EXCEPT/FOR qualification> is optional. This syntax estimates the two shape parameters using a censored PPCC plot and then estimates the location and scale parameters using a censored probability plot for each bootstrap sample. The censoring variable should contain a value of 1 to indicate a failure time and a value of 0 to indicate a censoring time.
where <y> is the response variable; <dist> is one of the following distributions:
    EXPONENTIAL
    FOLDED NORMAL
    GUMBEL (EXTREME VALUE TYPE 1)
    LAPLACE (DOUBLE EXPONENTIAL)
    LOGISTIC
    NORMAL
    RAYLEIGH
    UNIFORM

This syntax estimates the location and scale parameters using maximum likelihood for each bootstrap sample.
<SUBSET/EXCEPT/FOR qualification> where <y> is the response variable; <x> is the censoring variable; <dist> is one of the following distributions:
    EXPONENTIAL

This syntax estimates the location and scale parameters using maximum likelihood for each bootstrap sample. The censoring variable should contain a value of 1 to indicate a failure time and a value of 0 to indicate a censoring time.
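To make the censoring convention concrete, the following Python sketch bootstraps a censored exponential fit. It is only an illustration under stated assumptions: the location is fixed at 0, the scale is estimated by the standard right-censored maximum likelihood formula (total observed time divided by the number of failures), and a bootstrap sample that happens to contain no failures is redrawn. The function names are hypothetical and none of this describes Dataplot's internal handling.

```python
import random


def censored_exponential_scale(times, status):
    """MLE of the exponential scale with right censoring (location 0).

    status uses the convention 1 = failure time, 0 = censoring time.
    """
    failures = sum(status)
    if failures == 0:
        raise ValueError("at least one failure time is needed")
    return sum(times) / failures


def bootstrap_censored_scale(times, status, n_boot=100, seed=2024):
    """Resample (time, status) pairs together and re-estimate the scale."""
    if sum(status) == 0:
        raise ValueError("at least one failure time is needed")
    rng = random.Random(seed)
    pairs = list(zip(times, status))
    estimates = []
    while len(estimates) < n_boot:
        # Keep each time paired with its own censoring indicator.
        sample = [pairs[rng.randrange(len(pairs))] for _ in pairs]
        t, s = zip(*sample)
        if sum(s) == 0:
            continue  # no failures drawn; redraw (assumption of this sketch)
        estimates.append(censored_exponential_scale(t, s))
    return estimates
```

The key point is that the response and the censoring variable are resampled as pairs, so that each resampled time keeps its failure or censoring status.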
where <y> is the response variable; <dist> is one of the following distributions:
    FATIGUE LIFE
    GAMMA
    GENERALIZED PARETO
    GEOMETRIC EXTREME EXPONENTIAL
    INVERSE GAUSSIAN
    LOGNORMAL
    PARETO
    WEIBULL

This syntax estimates the shape and scale parameters (the Beta, Pareto, and inverse Gaussian estimate the two shape parameters but no scale parameter) using maximum likelihood for each bootstrap sample.
<SUBSET/EXCEPT/FOR qualification> where <y> is the response variable; <x> is the censoring variable; <dist> is one of the following distributions:
    CENSORED LOGNORMAL
    CENSORED WEIBULL

This syntax estimates the location and scale parameters using maximum likelihood for each bootstrap sample. The censoring variable should contain a value of 1 to indicate a failure time and a value of 0 to indicate a censoring time.
where <y> is the response variable; <dist> is one of the following distributions:
This syntax estimates the shape, location, and scale parameters using maximum likelihood for each bootstrap sample.
    BOOTSTRAP NORMAL MLE PLOT Y
    BOOTSTRAP WEIBULL PLOT Y
    BOOTSTRAP WEIBULL KS PLOT Y
    BOOTSTRAP WEIBULL CENSORED PLOT Y X
dpst1f.dat

The estimates for each bootstrap sample are written to the file dpst1f.dat. You can read the variables written to dpst1f.dat to generate histograms and to compute selected percentiles. The order is:
    location parameter
    scale parameter
    first shape parameter
    second shape parameter
    ppcc (or ks) value

If a particular syntax does not generate one or more of these values, then they are omitted (e.g., the normal distribution does not generate estimates for any shape parameters). The following example generates a Weibull bootstrap plot and then reads the bootstrap estimates from dpst1f.dat.
    SKIP 0
    READ DPST1F.DAT ALOC ASCALE AGAMMA APPCC
    MULTIPLOT 2 2
    RELATIVE HISTOGRAM ALOC
    RELATIVE HISTOGRAM ASCALE
    RELATIVE HISTOGRAM AGAMMA
    RELATIVE HISTOGRAM APPCC
    END OF MULTIPLOT

dpst2f.dat

Selected percentiles are written to dpst2f.dat. The order is:
    Parameter number - parameters are ordered as they are in dpst1f.dat
    Mean
    Standard Deviation
    Median
    2.5 percentile
    97.5 percentile
    5.0 percentile
    95.0 percentile
    0.5 percentile
    99.5 percentile
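The summary statistics listed above are simple functions of the bootstrap estimates for a given parameter. The Python sketch below computes them for one column of estimates, in the same order. The function names are hypothetical, and the percentile rule used here (linear interpolation between order statistics) is an assumption, so the values may differ slightly from those Dataplot writes to dpst2f.dat.

```python
import math
import statistics

# Percentiles in the order listed for dpst2f.dat.
PERCENTS = (2.5, 97.5, 5.0, 95.0, 0.5, 99.5)


def percentile(sorted_vals, q):
    """Percentile q (0 to 100) by linear interpolation between order statistics."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def summarize(estimates):
    """Return mean, sd, median, and selected percentiles of the estimates."""
    vals = sorted(estimates)
    row = {
        "mean": statistics.fmean(vals),
        "sd": statistics.stdev(vals),
        "median": statistics.median(vals),
    }
    for q in PERCENTS:
        row[f"p{q}"] = percentile(vals, q)
    return row
```

For example, the pair `p2.5` and `p97.5` gives a two-sided 95% percentile interval for the parameter, while `p5.0` and `p95.0` give a 90% interval.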
If you enter the command
you will get bootstrap estimates for the following percentiles:
If you would like to specify the specific percentiles to estimate, enter the command
with YPERC denoting a variable that contains the desired percentiles. This is demonstrated in the sample program below. By default, two-sided confidence intervals are generated for the percentiles. The following commands can be used to generate either lower one-sided or upper one-sided intervals:
    SET BOOTSTRAP DISTRIBUTIONAL PERCENTILES UPPER

To turn off the computation of the percentile confidence intervals, enter
To reset the default of two-sided intervals, enter
    LET GAMMA2 = 10
    BOOTSTRAP WEIBULL PLOT Y

One recommendation is to generate the bootstrap plot for a relatively small number of samples (e.g., 50) and use that to determine a reasonable range for the shape parameter. Enter HELP PPCC PLOT to see the relevant parameter for the desired distribution.
The BCA option is not currently supported for the bootstrap distributional plots.
The full sample is used to generate the parameter estimates of the distribution. The bootstrap samples are then generated by drawing random numbers from the specified distribution using the parameters estimated from the full sample. The following command specifies that this alternate method of bootstrapping be used:
To restore the default method of bootstrapping the data values, enter the command
The parametric option is still being tested and may not work correctly for some of the distributions.
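The difference between the two bootstrap methods can be sketched in Python. In the parametric variant below, the parameters are estimated once from the full sample, and each bootstrap sample is then a set of random numbers generated from the fitted distribution rather than a resample of the data. The normal distribution, the mean and standard deviation estimators, and the function name `parametric_bootstrap_normal` are assumptions chosen to keep the sketch short. This is not Dataplot's implementation.

```python
import random
import statistics


def parametric_bootstrap_normal(y, n_boot=100, seed=7):
    """Return ((loc_hat, scale_hat), estimates) for a normal model.

    The full-sample fit supplies the parameters used to generate every
    bootstrap sample; each sample is then refit in the same way.
    """
    rng = random.Random(seed)
    loc_hat = statistics.fmean(y)      # fit on the full sample
    scale_hat = statistics.stdev(y)
    n = len(y)
    estimates = []
    for _ in range(n_boot):
        # Generate n random numbers from the fitted distribution.
        sample = [rng.gauss(loc_hat, scale_hat) for _ in range(n)]
        estimates.append((statistics.fmean(sample), statistics.stdev(sample)))
    return (loc_hat, scale_hat), estimates
```

Because the samples come from the fitted model, the resulting intervals depend on the assumed distribution being appropriate for the data.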
MAXIMUM LIKELIHOOD is a synonym for MLE.
Efron and Tibshirani (1993), "An Introduction to the Bootstrap", Springer-Verlag.
. The following sample macro demonstrates the use of the
. bootstrap with a Weibull distribution.
.
. Step 0: Create some sample Weibull data
.
dimension 50 columns
.
let gamma = 2.3
let y = weibull random numbers for i = 1 1 100
.
. Step 1: Perform PPCC/Probability Plot Analysis,
. Perform K-S goodness of fit test
.
set ipl1na distboo1.ps
device 2 postscript
device 2 color on
.
multiplot 2 2
multiplot corner coordinates 0 0 100 100
multiplot scale factor 1.5
y1label displacement 12
.
title displacement 2
x1label displacement 12
title Weibull PPCC Plot
x1label Shape Parameter (gamma)
y1label Correlation
weibull ppcc plot y
justification left
height 3.5
move 25 28
text Max PPCC: ^maxppcc
move 25 21
text Shape: ^shape
let gamma1 = shape - 2
if gamma1 <= 0
let gamma1 = 0.1
end of if
let gamma2 = shape + 2
title Weibull PPCC Plot
weibull ppcc plot y
move 25 28
text Max PPCC: ^maxppcc
move 25 23
text Shape: ^shape
let gamma = shape
if n <= 200
character x
line blank
else
line solid
character blank
end of if
.
title Weibull Probability Plot
x1label displacement
x1label Theoretical
y1label Data
weibull probability plot y
justification center
move 50 2
text Location: ^ppa0, Scale: ^ppa1
let iplot = 3
multiplot 2 2 iplot
line solid
character blank
limits freeze
pre-erase off
let function f = ppa0 + ppa1*x
let zmin = minimum xplot
let zmax = maximum xplot
let ainc2 = (zmax - zmin)/10
plot f for x = zmin ainc2 zmax
limits
pre-erase on
let iplot = iplot + 1
multiplot 2 2 iplot
title Histogram with Overlaid Weibull
x1label Data Units
y1label Density
relative histogram y
multiplot 2 2 iplot
limits freeze
pre-erase off
let loc2=ppa0
let scale2=ppa1
line solid
char blank
line color blue
let amin = minimum y
let amax = maximum y
let ainc = 0.1
plot weipdf(x,shape,loc2,scale2) for x = amin ainc amax
limits
pre-erase on
line color black
delete gamma gamma1 gamma2
end of multiplot
device 2 close
.
. Step 2: Now perform a bootstrap analysis
.
feedback off
capture distboot.out
write " "
write "BOOTSTRAP-BASED INTERVALS"
write " "
set maximum likelihood percentiles default
bootstrap samples 200
let gamma1 = 0.2
let gamma2 = 5
set ipl1na distboo2.ps
device 2 postscript
device 2 color on
multiplot 2 3
multiplot corner coordinates 0 0 100 100
multiplot scale factor 1.7
y1label Parameter Estimate
x1label
x2label Bootstrap Sample
title Bootstrap Plot
line color blue red green
limits
bootstrap weibull plot y
line color black all
.
delete aloc ascale ashape appcc
skip 0
read dpst1f.dat aloc ascale ashape appcc
y1label
x2label
x3label displacement 16
title Location Parameter
let amed = median aloc
let amean = mean aloc
let asd = sd aloc
x2label Med = ^amed, Mean = ^amean
x3label SD = ^asd
histogram aloc
title Scale Parameter
let amed = median ascale
let amean = mean ascale
let asd = sd ascale
x2label Med = ^amed, Mean = ^amean
x3label SD = ^asd
histogram ascale
title Shape Parameter
let amed = median ashape
let amean = mean ashape
let asd = sd ashape
x2label Med = ^amed, Mean = ^amean
x3label SD = ^asd
histogram ashape
title PPCC Value
let amed = median appcc
let amean = mean appcc
let asd = sd appcc
x2label Med = ^amed, Mean = ^amean
x3label SD = ^asd
histogram appcc
x3label displacement
.
device 2 close
.
let alpha = 0.05
let xqlow = alpha/2
let xqupp = 1 - alpha/2
.
write "Bootstrap-based Confidence Intervals"
write "alpha = ^alpha"
write " "
.
let xq = xqlow
let loc95low = xq quantile aloc
let xq = xqupp
let loc95upp = xq quantile aloc
let xq = xqlow
let sca95low = xq quantile ascale
let xq = xqupp
let sca95upp = xq quantile ascale
let xq = xqlow
let sha95low = xq quantile ashape
let xq = xqupp
let sha95upp = xq quantile ashape
write "Confidence Interval for Location: (^loc95low,^loc95upp)"
write "Confidence Interval for Scale: (^sca95low,^sca95upp)"
write "Confidence Interval for Gamma: (^sha95low,^sha95upp)"
.
. Now generate confidence intervals for percentiles
.
serial read p
0.5 1 2.5 5 10 20 30 40 50 60 70 80 90 95 97.5 99 99.5
end of data
let nperc = size p
skip 1
read matrix dpst4f.dat xqp
write " "
loop for k = 1 1 nperc
let xqptemp = p(k)
let amed = median xqp^k
let xqpmed(k) = amed
let xq = xqlow
let atemp = xq quantile xqp^k
let xq95low(k) = atemp
let xq = xqupp
let atemp = xq quantile xqp^k
let xq95upp(k) = atemp
end of loop
set table title "Bootstrap Based Confidence Intervals for Percentiles"
set table spacing 15
set write decimals 7
write " "
write "Confidence Intervals for Percentiles"
write p xqpmed xq95low xq95upp
end of capture
delete xqp xqpmed xq95low xq95upp
delete gamma1 gamma2
BOOTSTRAP-BASED INTERVALS
Bootstrap-based Confidence Intervals
alpha = 0.05
Confidence Interval for Location: (-0.4112,0.272742)
Confidence Interval for Scale: (0.744203,1.572763)
Confidence Interval for Gamma: (1.738776,3.748062)
Confidence Intervals for Percentiles
VARIABLES--P XQPMED XQ95LOW XQ95UPP
0.5000000 0.1427522 -0.0264724 0.3189068
1.0000000 0.1783569 0.0333498 0.3433715
2.5000000 0.2441821 0.1315464 0.3925191
5.0000000 0.3210546 0.2229533 0.4490436
10.0000000 0.4301926 0.3367559 0.5266151
20.0000000 0.5761956 0.4959525 0.6575613
30.0000000 0.6924722 0.6142470 0.7801363
40.0000000 0.8017390 0.7214258 0.8862920
50.0000000 0.9073263 0.8198442 0.9952904
60.0000000 1.0129241 0.9191304 1.1054595
70.0000000 1.1306920 1.0393872 1.2280047
80.0000000 1.2808006 1.1713773 1.3832619
90.0000000 1.4895294 1.3569634 1.5912455
95.0000000 1.6622919 1.5139780 1.7756084
97.5000000 1.8108935 1.6356139 1.9443940
99.0000000 1.9902619 1.7803770 2.1564791
99.5000000 2.1085887 1.8710965 2.3045762
Date created: 04/20/2005
Last updated: 12/04/2023