BEST DISTRIBUTIONAL FIT

Name:
    BEST DISTRIBUTIONAL FIT
Type:
    Analysis Command
Purpose:
    Generate a ranked list of the best distributional fits for univariate data.
Description:
    A common task is to find a good distributional fit to a set of univariate data. This command can be used as a screening tool to identify good candidate models.

    There are two steps in this process:

    1. Fitting
    2. Ranking by a goodness of fit criterion

    You can specify the fit method with the command

      SET BEST FIT METHOD <value>

    where <value> is one of the following

      MAXIMUM LIKELIHOOD: maximum likelihood
      PPCC: PPCC goodness of fit
      ANDERSON DARLING: Anderson-Darling goodness of fit
      KOLMOGOROV SMIRNOV: Kolmogorov-Smirnov goodness of fit

    The default method is maximum likelihood.

    You can specify the goodness of fit criterion with the command

      SET BEST FIT CRITERION <value>

    where <value> is one of the following

      ANDERSON DARLING: Anderson-Darling
      KOLMOGOROV SMIRNOV: Kolmogorov-Smirnov
      PPCC: PPCC
      AIC: Akaike Information Criterion
      AICc: Akaike Information Criterion corrected for sample size
      BIC: Bayesian Information Criterion

    The default goodness of fit criterion is Anderson-Darling.

    Note that this command is intended strictly as a screening tool to identify good candidate distributions. You should perform a more complete analysis once you identify appropriate candidate distributions. Also, you may be able to improve the fit for certain distributions by fine tuning the starting values.

    We do not recommend simply selecting the "best" distribution from the list. Rather, this command is meant to identify good candidate models that should be examined more carefully. For example, a simpler distribution that fits nearly as well as a more complicated distribution may be preferred. In some cases, a distribution that has a more meaningful physical interpretation or has established usage in a given area of work may be preferred.

    For performance reasons, not all possible distributions are included.
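
    The two-step process described above (fit each candidate, then rank by a goodness of fit criterion) can be illustrated outside Dataplot with a minimal Python sketch. This is not Dataplot's implementation. It assumes only three candidates (normal, uniform, 2-parameter exponential), simple closed-form estimates (the normal scale uses the n-denominator standard deviation), and the Kolmogorov-Smirnov statistic as the ranking criterion. The function names are hypothetical. The data are the strength values from the Program section of this page.

```python
import math
from statistics import NormalDist, fmean, pstdev

# Strength data (ksi) from the Program section of this page.
Y = [18.830, 20.800, 21.657, 23.030, 23.230, 24.050, 24.321, 25.500,
     25.520, 25.800, 26.690, 26.770, 26.780, 27.050, 27.670, 29.900,
     31.110, 33.200, 33.730, 33.760, 33.890, 34.760, 35.750, 35.910,
     36.980, 37.080, 37.090, 39.580, 44.045, 45.290, 45.381]


def ks_statistic(data, cdf):
    """Largest distance between the empirical CDF of data and a fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Compare the fitted CDF with the empirical CDF just after and
        # just before the jump at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def fit_candidates(data):
    """Step 1 (fitting): return {name: fitted CDF} for a few candidates."""
    lo, hi, mean = min(data), max(data), fmean(data)
    normal = NormalDist(mean, pstdev(data))   # assumption: n-denominator SD
    expo_scale = mean - lo                    # 2-parameter exponential scale

    def uniform_cdf(x):
        return min(1.0, max(0.0, (x - lo) / (hi - lo)))

    def expo_cdf(x):
        return 1.0 - math.exp(-(x - lo) / expo_scale) if x >= lo else 0.0

    return {"normal": normal.cdf, "uniform": uniform_cdf, "exponential": expo_cdf}


def rank_fits(data):
    """Step 2 (ranking): sort candidates by KS statistic, smallest first."""
    fitted = fit_candidates(data)
    return sorted((ks_statistic(data, cdf), name) for name, cdf in fitted.items())


for stat, name in rank_fits(Y):
    print(f"{name:12s} {stat:.4f}")
```

    As with the command itself, the ordering printed by this sketch is only a screening aid: a smaller statistic indicates a closer fit under that one criterion, not a final model choice.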

Syntax 1:
    BEST DISTRIBUTIONAL FIT <y>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    For this syntax, the response variable can be a matrix.

Syntax 2:
    MULTIPLE BEST DISTRIBUTIONAL FIT <y1> ... <yk>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> ... <yk> is a list of 1 to 30 response variables;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax generates a best distributional fit analysis for each listed response variable. These response variables can be matrices.

Syntax 3:
    REPLICATED BEST DISTRIBUTIONAL FIT <y> <x1> ... <xk>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y> is a response variable;
    <x1> ... <xk> is a list of 1 to 6 group-id variables;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    This syntax cross-tabulates <x1> ... <xk> and performs the best distributional fit analysis for each unique combination of cross-tabulated values. For example, if X1 has 3 levels and X2 has 2 levels, there will be a total of 6 best distributional fit analyses performed.
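
    The cross-tabulation described above can be pictured with a short Python sketch. It is an illustration only, not Dataplot code: the function name and the data are hypothetical, and it simply groups the response values by each observed combination of group-id values.

```python
from collections import defaultdict


def split_by_groups(y, *group_vars):
    """Map each observed combination of group-id values to its responses."""
    cells = defaultdict(list)
    for row in zip(y, *group_vars):
        value, key = row[0], row[1:]
        cells[key].append(value)
    return dict(cells)


# Hypothetical data: X1 has 3 levels, X2 has 2 levels, 2 observations per cell.
x1 = [1, 1, 2, 2, 3, 3] * 2
x2 = [1, 2, 1, 2, 1, 2] * 2
y = [float(i) for i in range(12)]

cells = split_by_groups(y, x1, x2)
for key in sorted(cells):
    # One best distributional fit analysis would be run per cell.
    print(key, cells[key])
```

    With 3 levels of X1 and 2 levels of X2 this yields 6 cells, matching the count given above. Only combinations that actually occur in the data produce a cell in this sketch.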

Examples:
    BEST DISTRIBUTIONAL FIT Y
    BEST DISTRIBUTIONAL FIT Y SUBSET TAG = 1
    MULTIPLE BEST DISTRIBUTIONAL FIT Y1 TO Y5
    REPLICATED BEST DISTRIBUTIONAL FIT Y X
Note:
    This command is currently limited to raw data (i.e., not binned) and continuous distributions.
Note:
    If fitting is performed using maximum likelihood, the generalized Pareto and generalized extreme value distributions will be fit using the elemental percentiles method. This is done because the maximum likelihood, moment, and L-moment estimates may not be valid for certain ranges of the distribution parameters.

    Also, distributions that expect all positive (or negative) numbers will be shifted appropriately before performing the maximum likelihood estimation.

    Since this command is intended as a quick screening method, not all distributions for which Dataplot supports maximum likelihood estimation are included.

Note:
    If fitting is performed using one of the goodness of fit statistics (i.e., PPCC, Anderson-Darling, Kolmogorov-Smirnov), then the distributions are limited to location-scale distributions or distributions with a single shape parameter. The one exception is the G and H distribution (which has two shape parameters).

    This restriction is primarily for performance reasons.

Note:
    The AIC is computed as

      AIC = 2*k - 2*LN(L)

    where k is the number of parameters being fit and L is the maximized value of the likelihood function.

    The AICc is computed as

      AICc = AIC + 2*k*(k+1)/(n-k-1)

    where n is the sample size. The AICc is recommended over the AIC when the sample size is small or k is large. Since AICc converges to AIC for large n, some analysts prefer to use AICc rather than AIC for all cases.

    The BIC is computed as

      BIC = -2*LN(L) + k*LN(n)

    The penalty term for extra parameters is k*LN(n) in the BIC and 2*k in the AIC, so the BIC penalty is the larger of the two whenever LN(n) > 2 (that is, for n >= 8).
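
    The three formulas above translate directly into code. The following Python sketch is illustrative only: the function names are hypothetical, and the log-likelihood value used in the example is made up rather than taken from any fit on this page.

```python
import math


def aic(loglik, k):
    """AIC = 2*k - 2*LN(L), with loglik = LN(L)."""
    return 2 * k - 2 * loglik


def aicc(loglik, k, n):
    """AICc = AIC + 2*k*(k+1)/(n-k-1); requires n > k + 1."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(loglik, k) + 2 * k * (k + 1) / (n - k - 1)


def bic(loglik, k, n):
    """BIC = -2*LN(L) + k*LN(n)."""
    return -2 * loglik + k * math.log(n)


# Hypothetical example: a 2-parameter fit to n = 31 observations with a
# made-up maximized log-likelihood of -100.0.
loglik, k, n = -100.0, 2, 31
print("AIC :", aic(loglik, k))
print("AICc:", aicc(loglik, k, n))
print("BIC :", bic(loglik, k, n))
```

    For all three criteria, smaller values indicate a better trade-off between fit and number of parameters. The values are only comparable across distributions fit to the same data.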

Note:
    The PPCC ranking method is based on the "most linear" probability plot, where linearity is measured by the correlation coefficient of the points on the probability plot. The linearity of the probability plot is invariant to location and scale. In practical terms, this means that the correlation coefficient depends only on the shape parameters, not on the location and scale parameters.

    So if we use a non-PPCC method to estimate the parameters and use the PPCC as a ranking method, there is an additional implicit estimate for the location and scale parameters. For this reason, the PPCC ranking method is only supported when a PPCC fitting method is used.
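
    The location and scale invariance described above can be checked numerically. The Python sketch below is an illustration under stated assumptions, not Dataplot's code: it computes a normal probability plot correlation coefficient using Filliben's approximation to the uniform order statistic medians as plotting positions, and the function names are hypothetical. The data are the strength values from the Program section of this page.

```python
from statistics import NormalDist, fmean

# Strength data (ksi) from the Program section of this page.
Y = [18.830, 20.800, 21.657, 23.030, 23.230, 24.050, 24.321, 25.500,
     25.520, 25.800, 26.690, 26.770, 26.780, 27.050, 27.670, 29.900,
     31.110, 33.200, 33.730, 33.760, 33.890, 34.760, 35.750, 35.910,
     36.980, 37.080, 37.090, 39.580, 44.045, 45.290, 45.381]


def filliben_medians(n):
    """Approximate uniform order statistic medians (assumed plotting positions)."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


def normal_ppcc(data):
    """Correlation between sorted data and standard normal quantiles."""
    xs = sorted(data)
    q = [NormalDist().inv_cdf(p) for p in filliben_medians(len(xs))]
    return pearson(q, xs)


r_original = normal_ppcc(Y)
r_shifted = normal_ppcc([5.0 + 2.0 * v for v in Y])  # new location and scale
print(r_original, r_shifted)  # the two values agree up to rounding error
```

    Because the correlation is unchanged by the shift and rescaling, the PPCC value carries no information about location and scale. Those two parameters come from the intercept and slope of the line fitted to the probability plot, which is the implicit estimate mentioned above.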

Default:
    None
Synonyms:
    ML is a synonym for MAXIMUM LIKELIHOOD
    AD is a synonym for ANDERSON DARLING
    KS is a synonym for KOLMOGOROV SMIRNOV
Related Commands:
Applications:
    Distributional Modeling
Implementation Date:
    2011/03
    2012/10: Added AIC, AICC, and BIC ranking methods
Program:
     
    .  Step 1: Read the data
    .
    .          Following data from Jeffery Fong of the NIST
    .          Applied and Computational Mathematics Division.
    .          This is strength data in ksi units.
    .
    read y
    18.830
    20.800
    21.657
    23.030
    23.230
    24.050
    24.321
    25.500
    25.520
    25.800
    26.690
    26.770
    26.780
    27.050
    27.670
    29.900
    31.110
    33.200
    33.730
    33.760
    33.890
    34.760
    35.750
    35.910
    36.980
    37.080
    37.090
    39.580
    44.045
    45.290
    45.381
    end of data
    .
    set write decimals 5
    .
    .  Step 2: Generate ranked lists of best distributional fits
    .          for several fit method and ranking criterion settings
    .
    .  Maximum likelihood method
    .
    set best fit method     ml
    set best fit criterion  anderson darling
    best distributional fit  y
    .
    set best fit method     ml
    set best fit criterion  kolm smir
    best distributional fit  y
    .
    .  PPCC method
    .
    set best fit method     ppcc
    set best fit criterion  ppcc
    best distributional fit  y
        
    The following output is generated.
                Best Distributional Fit
     
    Response Variable: Y
     
    Fit Method: Maximum Likelihood
    Ranking Criterion: Anderson Darling
     
    Summary Statistics:
    Number of Observations:                              31
    Sample Minimum:                            18.83000
    Sample Maximum:                            45.38100
    Sample Mean:                               30.81142
    Sample SD:                                 7.253381
     
     
    Ranked List of Best Fit
    ----------------------------------------------------------------------------------------------------
                                    Goodness       Estimate       Estimate       Estimate       Estimate
                                      of Fit             of             of       of Shape       of Shape
    Distribution                   Statistic       Location          Scale    Parameter 1    Parameter 2
    ----------------------------------------------------------------------------------------------------
    TRIANGULAR                 0.3332130       17.33848       49.26861       25.50000                 **
    3-PAR WEIBULL (MINIMUM)    0.3380554       17.64420       14.83507       1.913580                 **
    2-PAR LOGNORMAL            0.3888329                 **   30.00134      0.2349026                 **
    GUMBEL (MAXIMUM)           0.3980371       27.39966       5.986812                 **             **
    2-PAR INVERTED GAMMA       0.4062290                 **   555.0562       18.99880                 **
    2-PAR GAMMA                0.4386866                 **   1.627518       18.93154                 **
    2-PAR BURR TYPE 10         0.4464637                 **   19.47685       7.276202                 **
    2-PAR FRECHET (MAX)        0.4681383                 **   26.74576       4.659741                 **
    LOGISTIC EXPONENTIAL       0.4897890                 **   43.44812       5.187883                 **
    NORMAL                     0.5321921       30.81142       7.253381                 **             **
    FOLDED NORMAL              0.5559204       30.81142       7.135432                 **             **
    LOGISTIC                   0.5728510       30.44662       4.224463                 **             **
    TWO-SIDED POWER            0.5794983       18.55532       45.96386       25.50000       1.269723
    2-PAR WEIBULL (MINIMUM)    0.5973435                 **   33.67424       4.635390                 **
    REFL GENE TOPP AND LEONE   0.8370549       18.80345       45.40755      0.5000000      0.7780750
    BIRNBAUM SAUNDERS          0.8477387                 **   30.00279      0.3283453                 **
    SLASH                      0.8526063       30.46421       3.523827                 **             **
    DOUBLE EXPONENTIAL         0.8691080       29.90000       6.124452                 **             **
    RAYLEIGH                   0.9356298       18.79377       9.882772                 **             **
    GUMBEL (MININUM)           0.9867376       34.50269       7.278262                 **             **
    CAUCHY                      1.200882       29.25895       5.093631                 **             **
    1-PAR MAXWELL               2.618183                 **   18.25977                 **             **
    PARETO                      3.437671       0.000000       1.000000       2.077089       18.53756
    2-PAR BETA                  3.448169       18.83000       26.55100      0.3175827      0.3288633
    2-PAR WEIBULL (MAXIMUM)     3.657144                 **   14.76277       1.088025                 **
    EXPONENTIAL (2-PARAMETER)   4.013110       18.83000       11.98142                 **             **
    UNIFORM                     5.244683       18.83000       26.55100                 **             **
    TOPP AND LEONE              5.711260       18.83000       26.55100       1.920741                 **
    POWER                       7.789723       18.83000       26.55100      0.5217915                 **
    2-PAR FRECHET (MIN)         8.330182                 **  0.9244886      0.1772339                 **
    REFLECTED POWER             10.70867       18.83000       26.55100      0.5568036                 **
    2-COMP NORMAL MIXTURE       23.99764                 **             **   24.57355       35.74121
    2-PAR INVERTED WEIBULL      916.0192                 **  0.3738909E-01   4.659730                 **
     
    
    
    
    
    
                Best Distributional Fit
     
    Response Variable: Y
     
    Fit Method: Maximum Likelihood
    Ranking Criterion: Kolmogorov Smirn
     
    Summary Statistics:
    Number of Observations:                              31
    Sample Minimum:                            18.83000
    Sample Maximum:                            45.38100
    Sample Mean:                               30.81142
    Sample SD:                                 7.253381
     
     
    Ranked List of Best Fit
    ----------------------------------------------------------------------------------------------------
                                    Goodness       Estimate       Estimate       Estimate       Estimate
                                      of Fit             of             of       of Shape       of Shape
    Distribution                   Statistic       Location          Scale    Parameter 1    Parameter 2
    ----------------------------------------------------------------------------------------------------
    TRIANGULAR                 0.1113989       17.33848       49.26861       25.50000                 **
    3-PAR WEIBULL (MINIMUM)    0.1170822       17.64420       14.83507       1.913580                 **
    2-PAR LOGNORMAL            0.1219492                 **   30.00134      0.2349026                 **
    BIRNBAUM SAUNDERS          0.1297689                 **   30.00279      0.3283453                 **
    TWO-SIDED POWER            0.1314438       18.55532       45.96386       25.50000       1.269723
    2-PAR INVERTED GAMMA       0.1318310                 **   555.0562       18.99880                 **
    2-PAR BURR TYPE 10         0.1326033                 **   19.47685       7.276202                 **
    LOGISTIC EXPONENTIAL       0.1329623                 **   43.44812       5.187883                 **
    SLASH                      0.1342469       30.46421       3.523827                 **             **
    2-PAR GAMMA                0.1349165                 **   1.627518       18.93154                 **
    GUMBEL (MAXIMUM)           0.1358038       27.39966       5.986812                 **             **
    LOGISTIC                   0.1425181       30.44662       4.224463                 **             **
    2-PAR FRECHET (MAX)        0.1456718                 **   26.74576       4.659741                 **
    NORMAL                     0.1513989       30.81142       7.253381                 **             **
    2-PAR WEIBULL (MINIMUM)    0.1525868                 **   33.67424       4.635390                 **
    FOLDED NORMAL              0.1539952       30.81142       7.135432                 **             **
    RAYLEIGH                   0.1570339       18.79377       9.882772                 **             **
    DOUBLE EXPONENTIAL         0.1598958       29.90000       6.124452                 **             **
    GUMBEL (MININUM)           0.1601804       34.50269       7.278262                 **             **
    CAUCHY                     0.1612234       29.25895       5.093631                 **             **
    REFL GENE TOPP AND LEONE   0.1625837       18.80345       45.40755      0.5000000      0.7780750
    TOPP AND LEONE             0.1633070       18.83000       26.55100       1.920741                 **
    UNIFORM                    0.1832347       18.83000       26.55100                 **             **
    EXPONENTIAL (2-PARAMETER)  0.2010937       18.83000       11.98142                 **             **
    1-PAR MAXWELL              0.2417328                 **   18.25977                 **             **
    PARETO                     0.2660602       0.000000       1.000000       2.077089       18.53756
    2-PAR BETA                 0.2714764       18.83000       26.55100      0.3175827      0.3288633
    2-PAR WEIBULL (MAXIMUM)    0.2845987                 **   14.76277       1.088025                 **
    POWER                      0.2852870       18.83000       26.55100      0.5217915                 **
    REFLECTED POWER            0.3940263       18.83000       26.55100      0.5568036                 **
    2-PAR FRECHET (MIN)        0.4239265                 **  0.9244886      0.1772339                 **
    2-COMP NORMAL MIXTURE      0.7554719                 **             **   24.57355       35.74121
    2-PAR INVERTED WEIBULL      1.000000                 **  0.3738909E-01   4.659730                 **
     
    
    
    
    
                Best Distributional Fit
     
    Response Variable: Y
     
    Fit Method: PPCC
    Ranking Criterion: PPCC
     
    Summary Statistics:
    Number of Observations:                              31
    Sample Minimum:                            18.83000
    Sample Maximum:                            45.38100
    Sample Mean:                               30.81142
    Sample SD:                                 7.253381
     
     
    Ranked List of Best Fit
    ----------------------------------------------------------------------------------------------------
                                    Goodness       Estimate       Estimate       Estimate       Estimate
                                      of Fit             of             of       of Shape       of Shape
    Distribution                   Statistic       Location          Scale    Parameter 1    Parameter 2
    ----------------------------------------------------------------------------------------------------
    GENERALIZED PARETO (MAX)   0.9892061       20.18056       17.07700     -0.6000000                 **
    REFLECTED POWER            0.9892025       20.24849       28.72833       1.708333                 **
    3-PAR WEIBULL (MINIMUM)    0.9882787       16.08011       16.72055       2.104016                 **
    GENERALIZED EXT VAL (MIN)  0.9882166       32.88338       7.894681      0.4509018                 **
    RAYLEIGH                   0.9881071       16.73097       11.30290                 **             **
    3-PAR BURR TYPE 10         0.9881069       16.71816       15.98673       1.001807                 **
    2-PAR MAXWELL              0.9877088       13.32804       10.99731                 **             **
    BRADFORD                   0.9871834       20.53644       24.96087       1.907258                 **
    3-PAR WEIBULL (MAXIMUM)    0.9869599       74.55574       46.73858       6.913655                 **
    GENERALIZED EXT VAL (MAX)  0.9869491       27.83139       6.788558      0.1503006                 **
    3-PAR GAMMA                0.9867522       5.208567       2.171618       11.82349                 **
    WALD                       0.9865175      -8.640849       39.52321       27.95582                 **
    BIRNBAUM SAUNDERS          0.9865080      -6.302986       36.45857      0.2002008                 **
    G AND H                    0.9863477       30.17988       7.281429      0.1919192       0.000000
    DOUBLE GAMMA               0.9863375       30.81142       2.345620       2.705221                 **
    3-PAR LOGNORMAL            0.9863089      -6.151920       36.30533      0.2002008                 **
    3-PAR INVERTED GAMMA       0.9862202      -19.79904       2367.047       47.70000                 **
    DOUBLE WEIBULL             0.9859972       30.81142       7.044318       1.703213                 **
    3-PAR GEOM EXTREME EXPO    0.9856844       17.88092       4.973479       10.82149                 **
    GENERALIZED PARETO (MIN)   0.9851244       44.69703       33.26714      -1.400000                 **
    ERROR                      0.9830244       30.81142       12.18327       3.400000                 **
    TUKEY-LAMBDA               0.9827152       30.81142       6.945565      0.4000000                 **
    ANGLIT                     0.9826880       30.81142       21.03343                 **             **
    COSINE                     0.9823948       30.81142       6.379694                 **             **
    HALF-NORMAL                0.9822081       21.15793       12.25805                 **             **
    LOGISTIC EXPONENTIAL       0.9813604       2.311842       40.26116       4.823990                 **
    GUMBEL (MAXIMUM)           0.9806594       27.52850       5.921301                 **             **
    LOG LOGISTIC               0.9804096      -18.27834       48.59806       11.74677                 **
    NORMAL                     0.9801995       30.81142       7.382224                 **             **
    UNIFORM                    0.9788557       18.56336       24.49611                 **             **
    3-PAR FRECHET (MAX)        0.9787403      -261.9942       289.4954       50.00000                 **
    3-PAR INVERTED WEIBULL     0.9787403      -261.9942       289.4954       50.00000                 **
    LOGISTIC                   0.9750343       30.81142       4.173573                 **             **
    LOG GAMMA                  0.9748592      -85.60675       36.38708       25.00000                 **
    HYPERBOLIC SECANT          0.9700100       30.81142       4.870712                 **             **
    ASYMMETRIC DOUBLE EXPO     0.9686984       27.98396       7.188828      0.7532995                 **
    ARCSINE                    0.9638854       21.02687       19.56909                 **             **
    LOG DOUBLE EXPONENTIAL     0.9635815      -23.48322       53.86625       10.00000                 **
    DOUBLE EXPONENTIAL         0.9590572       30.81142       5.438864                 **             **
    EXPONENTIAL (2-PARAMETER)  0.9479883       23.50395       7.529710                 **             **
    GUMBEL (MININUM)           0.9350657       33.94171       5.646002                 **             **
    3-PAR FRECHET (MIN)        0.9300918       309.0630       275.1060       50.00000                 **
    SLASH                      0.8225079       30.81142       1.106939                 **             **
    CAUCHY                     0.8092121       30.81142       1.378849                 **             **
        

Date created: 09/22/2011
Last updated: 10/30/2013

Please email comments on this WWW page to alan.heckert@nist.gov.