|
GATHERName:
With the GATHER command, we define a separate variable that contains the specific rows we want to extract. If you modify the extracted data and want to save these modified values back to the original variable, you can use the SCATTER command. Enter HELP SCATTER for further information. The first program example shows a simple artificial example to demonstrate the basic syntax. The second and third examples show non-trivial examples of the GATHER command.
where <x> is a response variable; <index> is a variable containing row numbers; and <y> is a variable (of length equal to <index>) that contains the rows of <x> corresponding to <index>.
let n = 30
let xseq = sequence 1 1 n
let x = normal rand numb for i = 1 1 n
let iindex = data 10 14 8 23 19
.
let y = gather x iindex
set write decimals 3
print xseq x iindex y
The following output is generated.
1.000 -1.073 10.000 0.270
2.000 0.573 14.000 -0.841
3.000 -0.873 8.000 0.032
4.000 0.234 23.000 -1.063
5.000 -0.455 19.000 0.034
6.000 -0.525 0.000 0.000
7.000 -0.706 0.000 0.000
8.000 0.032 0.000 0.000
9.000 1.191 0.000 0.000
10.000 0.270 0.000 0.000
11.000 -0.149 0.000 0.000
12.000 -0.197 0.000 0.000
13.000 -0.243 0.000 0.000
14.000 -0.841 0.000 0.000
15.000 -0.104 0.000 0.000
16.000 0.419 0.000 0.000
17.000 0.264 0.000 0.000
18.000 0.898 0.000 0.000
19.000 0.034 0.000 0.000
20.000 1.588 0.000 0.000
21.000 0.389 0.000 0.000
22.000 -0.470 0.000 0.000
23.000 -1.063 0.000 0.000
24.000 -0.027 0.000 0.000
25.000 -0.464 0.000 0.000
26.000 0.592 0.000 0.000
27.000 -0.506 0.000 0.000
28.000 -0.360 0.000 0.000
29.000 0.499 0.000 0.000
30.000 0.243 0.000 0.000
Program 2:
. Purpose: Need to randomly sample bottles from a box of 81 bottles.
.
. 3 temperatures: -20, 5, 25
. 3 times: 1 day, 3 days, 7 days
. (7 days also has a temperature of -80)
. 4 levels
.
. This gives us 2*3*4 + 1*4*4 = 40 combinations. Generate a
. sampling plan for 2 replications for each combination.
.
. Step 0: Set the random number generator
.
dimension 100 columns
set random number generator fibonacci congruential
seed 55609
set write decimals 0
.
. Step 1: Assign the combinations in arbitrary order
.
let temp = data -80 -20 -20 -20 5 5 5 25 25 25
let time = data 7 1 3 7 1 3 7 1 3 7
let ncomb1 = size temp
let nlevel = 4
let temp = combine temp temp temp temp
let time = combine time time time time
let level = sequence 1 ncomb1 1 nlevel
let ncomb2 = size temp
.
. Step 2: Randomly permute the combinations
.
let ntemp = size temp
let zperm = random permutation for i = 1 1 ntemp
let x1 = gather temp zperm
let x2 = gather time zperm
let x3 = gather level zperm
delete zperm
.
let nrepl = 2
loop for k = 2 1 nrepl
let zperm = random permutation for i = 1 1 ntemp
let x1t = gather temp zperm
let x2t = gather time zperm
let x3t = gather level zperm
let x1 = combine x1 x1t
let x2 = combine x2 x2t
let x3 = combine x3 x3t
end of loop
let n = size x1
.
. Step 2: Now randomize the vial selection
.
let nbottle = 81
let bottleid = sequence 1 1 nbottle
let xperm = random permutation for i = 1 1 nbottle
retain bottleid xperm subset xperm <= n
.
. Step 3: Now match time/temp to bottle id
.
let temp2 = gather x1 xperm
let time2 = gather x2 xperm
let level2 = gather x3 xperm
.
write2 perm.out "Permutation Temperature Time Level Bottle-ID"
write2 perm.out "----------------------------------------------------"
set write format F14.0,F11.0,F7.0,F8.0,F12.0
write2 perm.out xperm temp2 time2 level2 bottleid
The perm.out file contains
Permutation Temperature Time Level Bottle-ID
----------------------------------------------------
77. 25. 7. 4. 1.
76. 25. 1. 1. 2.
31. -20. 7. 1. 3.
69. 5. 1. 1. 4.
51. -20. 1. 1. 5.
44. -20. 3. 4. 6.
79. 25. 3. 1. 7.
41. 25. 3. 3. 8.
54. 5. 3. 2. 9.
1. -80. 7. 4. 10.
18. -20. 3. 3. 11.
25. 25. 3. 4. 12.
10. 5. 3. 1. 13.
9. 25. 1. 3. 14.
58. 5. 7. 3. 15.
4. 5. 3. 3. 16.
34. 25. 3. 1. 17.
73. -80. 7. 1. 18.
39. 5. 1. 4. 19.
24. 25. 1. 2. 20.
52. -20. 3. 1. 21.
3. 5. 1. 2. 22.
49. 5. 3. 1. 23.
57. 25. 7. 2. 24.
8. 25. 7. 2. 25.
38. -20. 1. 2. 26.
59. 5. 7. 1. 27.
74. -20. 7. 1. 28.
26. 25. 1. 1. 29.
72. 5. 1. 4. 30.
19. -20. 7. 2. 31.
63. -20. 1. 2. 32.
23. 5. 1. 3. 33.
75. -80. 7. 4. 34.
45. -20. 7. 4. 35.
47. 5. 3. 3. 36.
71. -20. 1. 4. 37.
30. 5. 7. 4. 38.
27. -20. 1. 3. 39.
6. 5. 3. 4. 40.
70. 5. 7. 4. 41.
29. 25. 3. 3. 42.
46. -20. 3. 2. 43.
12. -20. 3. 4. 44.
32. 5. 1. 1. 45.
66. -80. 7. 3. 46.
21. 5. 7. 2. 47.
33. -80. 7. 3. 48.
5. 25. 7. 3. 49.
65. 5. 1. 3. 50.
80. -20. 3. 3. 51.
61. -20. 7. 2. 53.
11. 5. 7. 1. 54.
60. 25. 3. 4. 55.
14. 5. 3. 2. 56.
55. 25. 7. 1. 57.
16. -20. 7. 4. 58.
67. -20. 7. 3. 59.
62. -20. 1. 3. 60.
13. 5. 7. 3. 61.
7. 25. 7. 4. 62.
36. -20. 1. 4. 63.
28. -80. 7. 2. 64.
22. -20. 3. 2. 65.
48. 5. 3. 4. 66.
20. 25. 1. 4. 67.
64. 25. 7. 3. 68.
15. -20. 3. 1. 69.
53. 25. 1. 2. 70.
17. -20. 7. 3. 71.
50. -80. 7. 2. 72.
43. 25. 1. 3. 73.
37. -20. 1. 1. 74.
2. 25. 7. 1. 75.
68. 25. 3. 2. 76.
40. 25. 3. 2. 77.
42. 25. 1. 4. 78.
78. 5. 1. 2. 79.
56. 5. 7. 2. 80.
35. -80. 7. 1. 81.
Program 3:
. Purpose: Reproduce the Reliability Analyses given at:
.
. https://www.itl.nist.gov/div898/handbook/apr/section4/apr46.htm
.
. This script shows a non-trivial use of the GATHER command
. that greatly speeded up the analysis.
. Source: Thanks to Jonathan H. Morgan for submitting this example.
. Description: Perform a weighted sampling with replacement. This example
. builds on the reliability analyses discussed in sections
. 8.3.1.5 and 8.4.6 of the NIST/SEMATECH e-Handbook of
. Statistical Methods
. (http://www.itl.nist.gov/div898/handbook). In these
. analyses, the authors constructed a posterior distribution
. of Mean Time Between Failures (MTBF). This is a common
. approach because samples of equipment failure rates tend
. to be small, making frequentist approaches impractical in
. many cases. Based on these analyses, the authors confirmed
. that a 600 hour MTBF objective fell within a 80% credible
. interval. In addition to summarizing the posterior, it is
. useful to average over our uncertainty to provide an
. intuition about model trends. What might we expect in the
. future based on what we know of the prior and posterior?
. We do this by performing a weighted re-sampling from the
. prior and posterior distributions based on the prior and
. posterior likelihoods using GATHER.
.
. Step 0: Define the output devices and dimension the workspace
.
dimension 1000000 rows
.
device 2 close
let string fplot = gather_script.ps
set ipl1na ^fplot
device 2 postscript
.
. Step 1: Initialize some plot control settings and the random number
. generator
.
y2frame off; x2frame off
grid thickness 0.01; grid on
set random number generator fibonacci congruential
seed 33497
.
. Step 2: Generate the Data for Mean Time Between Failures Analysis
.
let prob = sequence 0 0.001 0.999
let ngroup = size prob
let alpha = 4
let beta = 3309
.
. Step 3: Calculate the MTBF Posterior
.
let post = igappf(prob,alpha,0, beta) for i = 1 1 ngroup
retain post prob for i = 1 1 1000
.
. Flipping Probabilities for the Purpose of Plotting
.
LET prob = REVERSE prob
.
. Calculating Intervals for Plotting Purposes
.
let p09 = igappf(0.9,alpha,0, beta)
let p08 = igappf(0.8,alpha,0,beta)
let p05 = igappf(0.5,alpha,0,beta)
let p01 = igappf(0.1,alpha,0,beta)
.
. Step 4: Calculate the MTBF Posterior
.
. Plotting to Confirm Results
.
x1label Mean Time Between Failures Objective
y1label Probability of Exceeding Objective
label size 2
y1label displacement 10
title Estimated Product Reliability
line color brown
.
plot prob post
line color blue
drawdata p09 0 p09 0.1
drawdata p05 0 p05 0.5
drawdata p08 0 p08 0.2
drawdata p01 0 p01 0.9
case lower
font complex
move 5 4
text alph()' = 2 + 2 fails, beta()' = 1400 + 1909 hours
move 63 4
text lamb() =
case asis
move 68 4
text 1/GSUP()-1
move 75 4
text (1/a, 4, 1/3309)
.
. Step 5: Sampling from the Posterior Distribution with Replacement Using
. GATHER
.
. Notes: The routine performs weighted samples by comparing the
. weights to values generated from a uniform random
. distribution scaled to encapsulate the range of the
. weights, in this case the posterior values. The
. routine's scale parameters reduce the number of maxloop
. draws necessary to generate the sample. Random draws
. are included in the random sample if the sample's weight
. is greater than the random draw from the comparison
. distribution. A macro of the routine such as
. weighted_macro facilitates performing weighted sampling
. with replacement.
.
. Returning Probabilities Back to their Original Order
.
let prob = reverse prob
.
. Specifying the Iteration and Location Parameters for Sampling
. the Posterior
.
let n_samp = 10000
let maxloop = 10*n_samp
let loc = minimum post
let scale = maximum post
let i_loc = 1
let i_scale = size prob
.
. Generate Two Sets of Uniform Random Numbers for Selecting Based
. on Likelihood
.
let i_value = uniform random numbers for I = 1 1 maxloop
let i_value = loc + scale*i_value
.
let ind = uniform random numbers for i = 1 1 maxloop
let ind = i_loc + i_scale*ind
let index = ceil(ind) - 1
.
. Now draw weighted samples from the posterior: post_sam
.
let index2 = sequence 1 1 maxloop
let val_i = gather i_value index2
let val = gather post index
.
let tag = 0 for i = 1 1 maxloop
let diff = val - val_i
let tag = 1 subset diff > 0
retain index subset tag = 1
retain index for i = 1 1 n_samp
let post_sam = gather prob index
.
delete loc scale i_loc i_scale i_value index ind index2 val val_i tag diff
.
. Estimating Prior MTBF
.
let prior = igappf(prob, 2, 0, 1400) for i = 1 1 ngroup
.
. Specifying the iteration and location parameters for sampling
. the prior
.
let maxloop = 1000000
let loc = minimum prior
let scale = maximum prior
let i_loc = 1
let i_scale = size prob
.
. Generate two sets of uniform random numbers for selecting
. based on likelihood
.
let i_value = uniform random numbers for i = 1 1 maxloop
let i_value = loc + scale*i_value
.
let ind = uniform random numbers for i = 1 1 maxloop
let ind = i_loc + i_scale*ind
let index = ceil(ind) - 1
.
. Now draw weighted samples from the prior: pri_samp
.
let index2 = sequence 1 1 maxloop
let val_i = gather i_value index2
let val = gather post index
.
let tag = 0 for i = 1 1 maxloop
let diff = val - val_i
let tag = 1 subset diff > 0
retain index subset tag = 1
retain index for i = 1 1 n_samp
let pri_samp = gather prob index
delete loc scale i_loc i_scale i_value index ind index2 val val_i tag diff
.
. Assessing the model by generating prior and posterior
. predicative distributions
.
. Generating predictions based on weighted samples of the
. prior likelihoods
.
let n_sim = size pri_samp
let pri_val = igappf(pri_samp, 2, 0, 1400) for i = 1 1 n_sim
.
. Generating predictions based on weighted samples of the
. posterior likelihoods
.
let n_sim = size post_sam
let pred_val = igappf(post_sam, alpha, 0, beta) for i = 1 1 n_sim
.
. Plotting Predicative Distributions
.
x1label Mean Time Between Failures Objective
y1label Density
x1limit 0 6000
y1limit 0 0.001
y1label displacement 12
major xtic mark number 5
major ytic mark number 3
title Weighted Predicative Distributions
line color brown blue
multiple kernel density plot pred_val pri_val
.
line thickness 0.2
line color blue
drawdata 3000 0.001 3500 0.001
movedata 3550 0.000985
hw 2 1
text Prior
.
line color brown
drawdata 3000 0.00085 3500 0.00085
movedata 3550 0.00084
text Posterior
line thickness 0.1
.
As expected, the weighted posterior predicative distribution is
centered on higher MTFB values than the prior weighted
predicative distribution. Comparing the two distributions, we
can see the additional information included in the posterior
has reduced the predictions’ variance. The weighted posterior
predicative distribution is skewed to the right compared to the
posterior prediction (median MTBF of 1213 compared to 901), as
is the weighted prior predicative distribution. The rightward
shift is because most of the mass of the posterior is centered
on MTFB values between 495 and 1897, and thus more of the
samples are of values in this range. The corresponding shift
in the weighted prior results in a peak occurring at a MTBF of
967 very close to the posterior’s median. Substantively, these
findings suggest that the posterior predictions are likely to
be conservative, as MTBF values under 400 are unlikely.
Nevertheless, we should take this result with a grain of salt
given that the posterior itself is based on a very small sample.
|
Privacy
Policy/Security Notice
NIST is an agency of the U.S.
Commerce Department.
Date created: 12/04/2008 | ||||||||||||||||||