1.2.5.4. Consequences Related to Distributional Assumptions

Distributional Analysis

Scientists and engineers routinely use the mean (average) to estimate the "middle" of a distribution. It is less well known that the variability (noisiness) of the mean as a location estimator is intrinsically linked to the underlying distribution of the data. For certain distributions, the mean is a poor choice. For any given distribution, there exists an optimal choice, that is, the estimator with minimum variability. This optimal choice may be, for example, the median, the midrange, the midmean, the mean, or something else. The implication is that one should "estimate" the distribution first and then, based on that distribution, choose the optimal estimator. The resulting engineering parameter estimates will have less variability than if this approach is not followed.
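The link between the distribution and the variability of a location estimator can be illustrated with a small simulation. The sketch below is only an illustration, not a procedure taken from this section: the three distributions (normal, uniform, Cauchy), the sample size, the number of trials, and the helper names `midrange`, `midmean`, and `estimator_spread` are all assumptions chosen for the example. The midmean is taken here to be the mean of the middle half of the sorted data.

```python
import math
import random
import statistics


def midrange(xs):
    """Average of the smallest and largest observations."""
    return (min(xs) + max(xs)) / 2


def midmean(xs):
    """Mean of the central half of the data (lowest and highest quarters dropped)."""
    s = sorted(xs)
    k = len(s) // 4
    return statistics.fmean(s[k:len(s) - k])


# Candidate location estimators named in the text.
ESTIMATORS = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "midrange": midrange,
    "midmean": midmean,
}

# Illustrative distributions (an assumption of this sketch), all centered at 0.
SAMPLERS = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "uniform": lambda rng: rng.uniform(-1.0, 1.0),
    # Standard Cauchy via the inverse-CDF transform of a uniform draw.
    "cauchy": lambda rng: math.tan(math.pi * (rng.random() - 0.5)),
}


def estimator_spread(sampler, n=100, trials=1000, seed=0):
    """Standard deviation of each estimator over repeated samples of size n."""
    rng = random.Random(seed)
    estimates = {name: [] for name in ESTIMATORS}
    for _ in range(trials):
        sample = [sampler(rng) for _ in range(n)]
        for name, estimator in ESTIMATORS.items():
            estimates[name].append(estimator(sample))
    return {name: statistics.pstdev(vals) for name, vals in estimates.items()}


if __name__ == "__main__":
    for dist, sampler in SAMPLERS.items():
        spread = estimator_spread(sampler)
        best = min(spread, key=spread.get)
        print(dist, {k: round(v, 4) for k, v in spread.items()}, "least variable:", best)
```

Under these assumptions, the mean should show the smallest spread for normal data and the midrange for uniform data, while for the heavy-tailed Cauchy case the mean should be far noisier than the median. The exact figures depend on the seed, sample size, and number of trials.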

Case Studies

The airplane glass failure case study gives an example of determining an appropriate distribution and estimating the parameters of that distribution. The uniform random numbers case study gives an example of determining a more appropriate centrality parameter for a non-normal distribution.

Other consequences that flow from problems with distributional assumptions are:

Distribution

The distribution may be changing.
The single distribution estimate may be meaningless (if the process distribution is changing).
The distribution may be markedly non-normal.
The distribution may be unknown.
The true probability distribution for the error may remain unknown.

Model

The model may be changing.
The single model estimate may be meaningless.
The default model may be invalid.
If the default model is insufficient, information about a better model may remain undetected.
A poor deterministic model may be fit.
Information about an improved model may go undetected.

Process

The process may be out of control.
The process may be unpredictable.
The process may be un-modelable.