
PROPORTION CONFIDENCE LIMITS

Name:
    PROPORTION CONFIDENCE LIMITS
Type:
    Analysis Command
Purpose:
    Generates a confidence interval for proportions.
Description:
    Given a set of N observations in a variable X, we can compute the proportion of successes. The PROPORTION CONFIDENCE LIMITS command computes a confidence interval for the proportion of successes.

    In Dataplot, you define a success by entering the command

      ANOP LIMITS <lower limit> <upper limit>

    before entering the PROPORTION CONFIDENCE LIMITS command. That is, you specify the lower and upper values that define a success. Then the estimate for the proportion of successes is simply the number of points in the success region divided by the total number of points. In most applications, successes are defined by 1's and failures by 0's. The default limits are 0.5 and 1.5, so if your data is defined by 0 and 1 values the ANOP LIMITS command can be omitted.
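
    As an illustration outside Dataplot, the counting rule above can be sketched in Python. The function name is hypothetical, and treating both limits as inclusive is an assumption that the text does not confirm.

```python
def proportion_of_successes(values, lower=0.5, upper=1.5):
    """Return (successes, n, proportion) for the success region [lower, upper].

    Assumption: both limits are inclusive. The defaults mirror the
    default ANOP LIMITS of 0.5 and 1.5 described above.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    successes = sum(1 for v in values if lower <= v <= upper)
    return successes, n, successes / n


# 30 observations, 8 of them coded 1 (success) and 22 coded 0 (failure).
y = [1] * 8 + [0] * 22
x, n, p_hat = proportion_of_successes(y)
print(x, n, round(p_hat, 6))
```

    With 0/1 data the default limits pick out the 1's, so no explicit limits are needed in the call.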

    Several methods have been proposed for the confidence limits for a binomial proportion. The following methods are currently supported in Dataplot:

    1. NORMAL APPROXIMATION

      The normal approximation interval is

        \( \hat{p} \pm \Phi^{-1}_{(1 - \alpha/2)} \sqrt{\hat{p}(1 - \hat{p})/n} \)

      where

        X = the number of successes
        \(\hat{p} = \frac{X} {n} \)
        \(\Phi^{-1}\) is the percent point function of the normal distribution

      Due to its simplicity, the method is commonly used. However, its coverage properties are not as good as those of the other methods. Its use should be restricted to cases with relatively large sample sizes where \( \hat{p} \) is not near 0 or 1.
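
      A minimal Python sketch of the normal approximation formula follows. It is not Dataplot code: the function name is hypothetical, and clipping the limits to [0, 1] is an assumption, made because the sample output on this page prints 0.000000 for the lower limit at the highest confidence levels.

```python
from math import sqrt
from statistics import NormalDist


def normal_approx_limits(x, n, conf=0.95):
    """Two-sided normal approximation limits for x successes in n trials."""
    alpha = 1.0 - conf
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # normal percent point function
    p_hat = x / n
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    # Assumption: limits outside [0, 1] are clipped to the boundary.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


lower, upper = normal_approx_limits(8, 30, conf=0.95)
print(f"{lower:.6f} {upper:.6f}")
```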

    2. ADJUSTED WALD

      The adjusted Wald interval is

        \( \tilde{p} \pm \Phi^{-1}_{(1 - \alpha/2)} \sqrt{\tilde{p}(1 - \tilde{p})/\tilde{n}} \)

      where

        X = the number of successes
        \(\tilde{X} = X + (\Phi^{-1}(1 - \alpha/2))^{2}/2\)
        \(\tilde{n} = n + (\Phi^{-1}(1 - \alpha/2))^{2}\)
        \(\tilde{p} = \frac{\tilde{X}} {\tilde{n}}\)
        \(\Phi^{-1}\) is the percent point function of the normal distribution

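      This method improves upon the normal approximation. Writing \(z = \Phi^{-1}(1 - \alpha/2)\), it in effect adds \(z^{2}/2\) pseudo-successes and \(z^{2}/2\) pseudo-failures to the data (about two of each at the 95% level) before applying the normal approximation formula.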
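
      The adjusted Wald formula, exactly as written above, can be sketched in Python as follows. The function name is hypothetical, and the sketch follows the printed formula only; it is not a transcription of Dataplot's internal code, so its results need not agree digit for digit with Dataplot's printed tables.

```python
from math import sqrt
from statistics import NormalDist


def adjusted_wald_limits(x, n, conf=0.95):
    """Adjusted Wald limits using the X-tilde, n-tilde, p-tilde defined above."""
    alpha = 1.0 - conf
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    x_tilde = x + z * z / 2.0
    n_tilde = n + z * z
    p_tilde = x_tilde / n_tilde
    half_width = z * sqrt(p_tilde * (1.0 - p_tilde) / n_tilde)
    return p_tilde - half_width, p_tilde + half_width


lower, upper = adjusted_wald_limits(8, 30, conf=0.95)
print(f"{lower:.6f} {upper:.6f}")
```

      Note that the interval is centered on the adjusted proportion rather than on the sample proportion X/n.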

    3. WILSON

      This method was originally proposed by Wilson in 1927. Papers by Agresti and Coull and also by Brown, Cai and DasGupta recommended this interval and provided comparisons of this method to the adjusted Wald and other methods.

      This method solves for the two values of \(p_0\) (say, \(p_{upper}\) and \(p_{lower}\)) at which the large-sample z statistic for testing \(p = p_0\) equals \(z_{\alpha/2}\) in absolute value. Setting \(z = z_{\alpha/2}\) gives one root and setting \(z = -z_{\alpha/2}\) gives the other; the larger root is \(p_{upper}\) and the smaller is \(p_{lower}\). Here \(z_{\alpha/2}\) denotes the variate value from the standard normal distribution such that the area to the right of the value is \(\alpha/2\). The two solutions give the following confidence limits:

        \( U. L. = \frac{\hat{p} + \frac{z_{\alpha/2}^{2}}{2n} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z_{\alpha/2}^{2}}{4n^2}}} {1 + z_{\alpha/2}^{2}/n} \)

        \( L. L. = \frac{\hat{p} + \frac{z_{\alpha/2}^{2}}{2n} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z_{\alpha/2}^{2}}{4n^2}}} {1 + z_{\alpha/2}^{2}/n} \)

      This approach can be justified on the grounds that it is the exact algebraic counterpart to the (large-sample) hypothesis test, and it is also supported by the research of Agresti and Coull. One advantage of this procedure is that its performance does not depend strongly on the value of n or p; indeed, it was recommended by Agresti and Coull for virtually all combinations of n and p. Simulations by Agresti and Coull and by Brown, Cai and DasGupta show that this method does a better job of maintaining the nominal coverage than do the adjusted Wald and normal approximation methods. Another advantage is that the limits always lie in the (0,1) interval.
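
      The U.L. and L.L. expressions above translate directly into a short Python sketch. The function name is hypothetical; the arithmetic follows the two formulas term by term.

```python
from math import sqrt
from statistics import NormalDist


def wilson_limits(x, n, conf=0.95):
    """Wilson (score) limits for x successes in n trials."""
    alpha = 1.0 - conf
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    p_hat = x / n
    center = p_hat + z * z / (2.0 * n)
    spread = z * sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n))
    denom = 1.0 + z * z / n
    return (center - spread) / denom, (center + spread) / denom


lower, upper = wilson_limits(8, 30, conf=0.95)
print(f"{lower:.6f} {upper:.6f}")
```

      For 8 successes in 30 observations at the 95% level this agrees with the Wilson table in the sample output at the end of this page.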

    4. JEFFREYS

      The Jeffreys interval is a Bayesian method based on the Jeffreys prior (the derivation for this interval is given in the Brown, Cai, and DasGupta paper). The interval is

        LCL = BETPPF(α/2, X + 0.5, n - X + 0.5)
        UCL = BETPPF(1 - α/2, X + 0.5, n - X + 0.5)

      where BETPPF is the percent point function of the beta distribution and X is the number of successes.
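
      A standard-library Python sketch of the Jeffreys limits is given below. It assumes the usual form of the interval, namely the α/2 and 1 - α/2 quantiles of a beta distribution with shape parameters X + 0.5 and n - X + 0.5. The helpers `beta_cdf` and `beta_ppf` are hypothetical stand-ins for BETPPF: the CDF uses a textbook continued-fraction expansion of the regularized incomplete beta function, and the percent point function is found by bisection.

```python
from math import exp, lgamma, log, log1p


def _beta_cf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd terms of the continued fraction, applied in turn.
        terms = (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                 -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2)))
        for aa in terms:
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def beta_ppf(q, a, b, iters=100):
    """Percent point function of the beta distribution, by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def jeffreys_limits(x, n, conf=0.95):
    """Equal-tailed limits from the Beta(x + 0.5, n - x + 0.5) posterior."""
    alpha = 1.0 - conf
    a, b = x + 0.5, n - x + 0.5
    return beta_ppf(alpha / 2.0, a, b), beta_ppf(1.0 - alpha / 2.0, a, b)


lower, upper = jeffreys_limits(8, 30, conf=0.95)
print(f"{lower:.6f} {upper:.6f}")
```

      Bisection is slower than a Newton iteration but needs only the CDF and cannot leave the (0,1) interval.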

    5. EXACT BINOMIAL (or CLOPPER-PEARSON)

      Solve the equation

        BINCDF(x; p(u), n) = α/2

      for p(u) to obtain the upper 100(1 - α)% limit for p, where BINCDF is the cumulative distribution function of the binomial distribution, x is the number of successes, and n is the number of trials.

      Next solve the equation

        BINCDF(x-1; p(l), n) = 1 - α/2

      for p(l) to obtain the lower 100(1 - α)% limit for p.

      Although this method is called "exact", it is not more accurate than the adjusted Wald or Wilson method. The "exact" terminology refers to the use of the binomial CDF rather than a normal approximation. However, because the binomial is a discrete distribution, the actual coverage cannot equal the nominal level (for example, 95%) for every value of p; the interval is conservative, with coverage at least the nominal level and often well above it. The Agresti and Coull paper gives arguments to justify why the "approximate" Wilson and adjusted Wald methods can often be more accurate than the "exact" method.
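
      The two equations can be solved numerically. The Python sketch below uses bisection, relying on the fact that the binomial CDF is a decreasing function of p. The function names are hypothetical, and returning 0 for the lower limit when x = 0 and 1 for the upper limit when x = n is the usual convention, assumed here rather than taken from the text.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1.0 - p)**(n - i)
               for i in range(min(k, n) + 1))


def _solve_for_p(k, n, target, iters=100):
    """Find p in (0, 1) with binom_cdf(k, n, p) = target by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if binom_cdf(k, n, mid) > target:
            lo = mid  # CDF still too large, so p must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exact_binomial_limits(x, n, conf=0.95):
    """Clopper-Pearson limits for x successes in n trials."""
    alpha = 1.0 - conf
    # Assumed edge conventions: lower = 0 when x = 0, upper = 1 when x = n.
    lower = 0.0 if x == 0 else _solve_for_p(x - 1, n, 1.0 - alpha / 2.0)
    upper = 1.0 if x == n else _solve_for_p(x, n, alpha / 2.0)
    return lower, upper


lower, upper = exact_binomial_limits(8, 30, conf=0.95)
print(f"{lower:.6f} {upper:.6f}")
```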

    To specify the method to use, enter the command

      SET BINOMIAL METHOD <WILSON/ADJUSTED WALD/JEFFREYS/
                  NORMAL/EXACT>

    The default is the Wilson method. The Brown, Cai, and DasGupta paper studied the coverage properties of various methods. They recommend the Wilson, adjusted Wald, and Jeffreys methods as having the best coverage properties: the Wilson and Jeffreys methods for n ≤ 40, and any of the three for n > 40, where their performance is comparable. Although the normal approximation and exact binomial methods are not typically recommended, Dataplot provides them since they are still used in practice.

Syntax:
    PROPORTION CONFIDENCE LIMITS <y>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
    ANOP LIMITS 0.80 1.0
    PROPORTION CONFIDENCE LIMITS Y

    ANOP LIMITS 0.80 1.0
    PROPORTION CONFIDENCE LIMITS Y SUBSET TAG = 1 TO 3

Note:
    A table of confidence intervals is printed for confidence levels of 50.0, 75.0, 90.0, 95.0, 99.0, 99.9, 99.99, and 99.999 percent. The sample size, sample number of successes, and sample proportion of successes are also printed.
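
    As a rough illustration of how such a table can be built, the Python sketch below converts each confidence level to α and to the corresponding normal percent point, then evaluates the Wilson limits (the default method). The function names and the print layout are hypothetical and are not Dataplot's.

```python
from math import sqrt
from statistics import NormalDist

LEVELS_PERCENT = (50.0, 75.0, 90.0, 95.0, 99.0, 99.9, 99.99, 99.999)


def wilson_from_z(x, n, z):
    """Wilson limits for a given normal percent point z."""
    p_hat = x / n
    center = p_hat + z * z / (2.0 * n)
    spread = z * sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n))
    denom = 1.0 + z * z / n
    return (center - spread) / denom, (center + spread) / denom


def limits_table(x, n, levels=LEVELS_PERCENT):
    """Return (level, lower, upper) rows, one per confidence level."""
    rows = []
    for level in levels:
        alpha = 1.0 - level / 100.0                  # e.g. 95% -> alpha = 0.05
        z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided percent point
        lower, upper = wilson_from_z(x, n, z)
        rows.append((level, lower, upper))
    return rows


rows = limits_table(8, 30)
for level, lower, upper in rows:
    print(f"{level:10.3f} {lower:14.6f} {upper:14.6f}")
```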
Note:
    Prior versions of Dataplot used the following method for the confidence interval

      (BINPPF(ALPHA/2,P,N)/N, BINPPF(1-ALPHA/2,P,N)/N)

    with BINPPF denoting the percent point function of the binomial distribution.

Default:
    None
Synonyms:
    None
Related Commands:
    AGRESTI COULL = Compute either the lower or upper confidence limit for either a one-sided or a two-sided binomial proportion of a variable (Wilson, adjusted Wald, or Jeffreys method).
    EXACT BINOMIAL = Compute either the lower or upper exact binomial confidence limit for either a one-sided or a two-sided binomial proportion of a variable.
    DIFFERENCE OF PROPORTIONS CONFIDENCE LIMIT = Generate a confidence interval for the difference of proportions.
    ANOP LIMITS = Specify success region for proportions.
    ANOP PLOT = Generate an analysis of proportions plot.
    CONFIDENCE LIMITS = Generate the confidence limits for the mean.
Reference:
    Agresti, A. and Coull, B. A. (1998), "Approximate is better than 'exact' for interval estimation of binomial proportions," The American Statistician, 52(2), 119-126.

    Brown, L. D., Cai, T. T. and DasGupta, A. (2001), "Interval estimation for a binomial proportion," Statistical Science, 16(2), 101-133.

    Wilson, E. B. (1927), "Probable inference, the law of succession, and statistical inference," Journal of the American Statistical Association, Vol. 22, pp. 209-212.

    Snedecor and Cochran (1989), "Statistical Methods," Eighth Edition, Iowa State University Press, pp. 121-124.

    NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/prc/section2/prc241.htm.

Applications:
    Confirmatory Data Analysis
Implementation Date:
    1999/05
    2017/11: Change method for determining the confidence interval
Program:
    .           Create a binary variable with 30 rows
    .           with 8 successes.
    .
    let n = 30
    let nsuc = 8
    let y = 0 for i = 1 1 n
    let y = 1 for i = 1 1 nsuc
    .
    .          Now do proportions confidence interval
    .
    set write decimals 6
    set binomial method wilson
    proportion confidence limits y
    set binomial method adjusted wald
    proportion confidence limits y
    set binomial method jeffreys
    proportion confidence limits y
    set binomial method exact
    proportion confidence limits y
    set binomial method normal
    proportion confidence limits y
        
    This program generated the following output:
                Two-Sided Confidence Limits for a Proportion
                              (Wilson Method)
     
    Response Variable: Y
     
    Sample:
    Number of Observations:                  30
    Number of Successes:                     8
    Proportion of Successes:                 0.266667
    Standard Error:                          0.080737
     
     
     
    ------------------------------------------
      Confidence          Lower          Upper
       Value (%)          Limit          Limit
    ------------------------------------------
          50.000       0.215992       0.324313
          75.000       0.185098       0.367950
          90.000       0.157323       0.414615
          95.000       0.141827       0.444480
          99.000       0.116046       0.501805
          99.900       0.092558       0.564537
          99.990       0.077142       0.612690
          99.999       0.066181       0.651056
     
     
                Two-Sided Confidence Limits for a Proportion
                           (Adjusted Wald Method)
     
    Response Variable: Y
     
    Sample:
    Number of Observations:                  30
    Number of Successes:                     8
    Proportion of Successes:                 0.266667
    Standard Error:                          0.080737
     
     
     
    ------------------------------------------
      Confidence          Lower          Upper
       Value (%)          Limit          Limit
    ------------------------------------------
          50.000       0.217582       0.326788
          75.000       0.188962       0.376021
          90.000       0.163909       0.432706
          95.000       0.150238       0.471347
          99.000       0.128339       0.551032
          99.900       0.110551       0.646995
          99.990       0.101710       0.727092
          99.999       0.098343       0.794898
     
     
                Two-Sided Confidence Limits for a Proportion
                   (Bayesian with Jeffreys Prior Method)
     
    Response Variable: Y
     
    Sample:
    Number of Observations:                  30
    Number of Successes:                     8
    Proportion of Successes:                 0.266667
    Standard Error:                          0.080737
     
     
     
    ------------------------------------------
      Confidence          Lower          Upper
       Value (%)          Limit          Limit
    ------------------------------------------
          50.000       0.217637       0.325518
          75.000       0.184464       0.367317
          90.000       0.153145       0.412052
          95.000       0.134941       0.440996
          99.000       0.103386       0.497902
          99.900       0.073387       0.563271
          99.990       0.053383       0.616478
          99.999       0.039406       0.661171
     
     
                Two-Sided Confidence Limits for a Proportion
                          (Exact Binomial Method)
     
    Response Variable: Y
     
    Sample:
    Number of Observations:                  30
    Number of Successes:                     8
    Proportion of Successes:                 0.266667
    Standard Error:                          0.080737
     
     
     
    ------------------------------------------
      Confidence          Lower          Upper
       Value (%)          Limit          Limit
    ------------------------------------------
          50.000       0.202418       0.342833
          75.000       0.170298       0.385007
          90.000       0.140185       0.429934
          95.000       0.122795       0.458894
          99.000       0.092892       0.515598
          99.900       0.064818       0.580375
          99.990       0.046392       0.632814
          99.999       0.033699       0.676670
     
     
                Two-Sided Confidence Limits for a Proportion
                       (Normal Approximation Method)
     
    Response Variable: Y
     
    Sample:
    Number of Observations:                  30
    Number of Successes:                     8
    Proportion of Successes:                 0.266667
    Standard Error:                          0.080737
     
     
     
    ------------------------------------------
      Confidence          Lower          Upper
       Value (%)          Limit          Limit
    ------------------------------------------
          50.000       0.212210       0.321123
          75.000       0.173791       0.359543
          90.000       0.133866       0.399468
          95.000       0.108424       0.424909
          99.000       0.058701       0.474632
          99.900       0.000998       0.532335
          99.990       0.000000       0.580783
          99.999       0.000000       0.623298
        
Date created: 06/05/2001
Last updated: 12/11/2023

Please email comments on this WWW page to alan.heckert@nist.gov.