1.2.1. Underlying Assumptions

Assumptions Underlying a Measurement Process

There are four assumptions that typically underlie all measurement
processes; namely, that the data from the process at hand "behave
like":
- random drawings;
- from a fixed distribution;
- with the distribution having fixed location; and
- with the distribution having fixed variation.
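To make the four assumptions concrete, the following Python sketch simulates a hypothetical process that satisfies them, alongside three variants that each break one assumption. The function name `simulate`, the normal distribution, and the parameter values (location 10, standard deviation 1, the drift and growth rates) are illustrative assumptions, not part of the handbook. The assumptions themselves do not require normality.

```python
import random
import statistics


def simulate(n, kind="ideal", seed=0):
    """Return n observations from a hypothetical process (illustration only).

    kind="ideal":  independent draws from one normal distribution with
                   fixed location (10) and fixed variation (sd 1)
    kind="drift":  location increases over time (breaks fixed location)
    kind="spread": standard deviation grows over time (breaks fixed variation)
    kind="walk":   each value builds on the previous one (breaks random drawings)
    """
    rng = random.Random(seed)
    data = []
    level = 0.0
    for i in range(n):
        e = rng.gauss(0.0, 1.0)
        if kind == "ideal":
            y = 10.0 + e
        elif kind == "drift":
            y = 10.0 + 0.05 * i + e
        elif kind == "spread":
            y = 10.0 + (1.0 + 0.05 * i) * e
        elif kind == "walk":
            level += e  # random walk: successive values are strongly correlated
            y = 10.0 + level
        else:
            raise ValueError(f"unknown kind: {kind!r}")
        data.append(y)
    return data


# Compare mean and standard deviation of the first and second halves of each run.
for kind in ("ideal", "drift", "spread"):
    d = simulate(200, kind)
    print(kind,
          round(statistics.fmean(d[:100]), 2), round(statistics.fmean(d[100:]), 2),
          round(statistics.stdev(d[:100]), 2), round(statistics.stdev(d[100:]), 2))
```

For the ideal process the two halves should agree up to sampling noise in both mean and standard deviation. The drift run shows a shifted mean in the second half, and the spread run shows a larger standard deviation.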
Univariate or Single Response Variable

The "fixed location" referred to in the third item above differs for
different problem types. The simplest problem type is univariate;
that is, a single variable. For the univariate problem, the
general model
response = deterministic component + random component
becomes
response = constant + error
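For the univariate model, the least-squares estimate of the unknown constant is the sample mean, and the residuals are the observations minus that estimate. A minimal Python sketch follows. The true constant (5.0), the error standard deviation (0.5), the normal errors, and the helper name `fit_constant` are hypothetical choices used only to generate and fit example data.

```python
import random
import statistics


def fit_constant(data):
    """Fit response = constant + error by least squares.

    The least-squares estimate of the constant is the sample mean, and the
    residuals are the observations minus that estimate.
    """
    c_hat = statistics.fmean(data)
    residuals = [y - c_hat for y in data]
    return c_hat, residuals


# Hypothetical process: constant 5.0 plus normal error with sd 0.5.
rng = random.Random(42)
data = [5.0 + rng.gauss(0.0, 0.5) for _ in range(500)]

c_hat, residuals = fit_constant(data)
print(round(c_hat, 3), abs(statistics.fmean(residuals)) < 1e-9)
```

Because the constant is estimated by the mean, the residuals average to zero up to rounding error, which matches the fixed location of 0 expected of residuals from a good fit.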
Assumptions for Univariate Model

For this case, the "fixed location" is simply the unknown constant.
We can thus imagine the process at hand to be operating under
constant conditions that produce a single column of data with
the properties that
- the data are uncorrelated with one another;
- the random component has a fixed distribution;
- the deterministic component consists of only a constant; and
- the random component has fixed variation.
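The properties listed above can be screened numerically. The sketch below is a crude stand-in for graphical checks such as a lag plot and a run-sequence plot: the lag-1 autocorrelation addresses whether the data are uncorrelated, and comparing the first and second halves of the run addresses fixed location and fixed variation. It does not check the shape of the distribution; a histogram or normal probability plot is the usual tool for that. The function names and the half-split approach are illustrative assumptions, not a prescribed test.

```python
import statistics


def lag1_autocorrelation(data):
    """Lag-1 autocorrelation; values near 0 are consistent with uncorrelated data."""
    m = statistics.fmean(data)
    num = sum((data[i] - m) * (data[i + 1] - m) for i in range(len(data) - 1))
    den = sum((y - m) ** 2 for y in data)
    return num / den


def half_comparison(data):
    """Mean and standard deviation of the first and second halves of the run.

    Similar means suggest a fixed location (a constant deterministic component);
    similar standard deviations suggest fixed variation.
    """
    half = len(data) // 2
    first, second = data[:half], data[half:]
    return {
        "mean_first": statistics.fmean(first),
        "mean_second": statistics.fmean(second),
        "sd_first": statistics.stdev(first),
        "sd_second": statistics.stdev(second),
    }


# A perfectly alternating series is strongly (negatively) correlated.
print(round(lag1_autocorrelation([1.0, -1.0] * 50), 2))
# A steadily increasing series has different half means but equal half spreads.
print(half_comparison(list(range(10))))
```

How close is "close enough" is a judgment call. For n independent observations, a common rough rule of thumb is that the lag-1 autocorrelation falls within about ±2/√n of zero.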
Extrapolation to a Function of Many Variables

The universal power and importance of the univariate model is that
it can easily be extended to the more general case where the
deterministic component is not just a constant, but is in fact a
function of many variables, and the engineering objective is to
characterize and model the function.
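As an illustration of the more general case, the sketch below fits a deterministic component that is a function of two variables, assuming for simplicity that the function is linear: response = b0 + b1*x1 + b2*x2 + error. The coefficients, noise level, and helper names (`solve`, `least_squares`) are hypothetical; a real application would choose the functional form from engineering knowledge and the data. It uses only the Python standard library and solves the least-squares normal equations directly.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares(rows, y):
    """Least-squares coefficients from the normal equations (X^T X) beta = X^T y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical data: response = 2 + 3*x1 - 1.5*x2 + normal error (sd 0.1).
rng = random.Random(1)
rows, y = [], []
for _ in range(200):
    x1, x2 = rng.uniform(0, 10), rng.uniform(0, 10)
    rows.append([1.0, x1, x2])  # leading 1.0 carries the intercept b0
    y.append(2.0 + 3.0 * x1 - 1.5 * x2 + rng.gauss(0.0, 0.1))

beta = least_squares(rows, y)
print([round(b, 2) for b in beta])
```

Solving the normal equations is the simplest least-squares approach to write out by hand; for ill-conditioned problems, an SVD-based solver such as `numpy.linalg.lstsq` is numerically safer.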
Residuals Will Behave According to Univariate Assumptions

The key point is that regardless of how many factors there are,
and regardless of how complicated the function is, if the engineer
succeeds in choosing a good model, then the differences (residuals)
between the raw response data and the predicted values from the
fitted model should themselves behave like a univariate process.
In particular, these residuals should behave like:
- random drawings;
- from a fixed distribution;
- with fixed location (namely, 0 in this case); and
- with fixed variation.
Validation of Model

Thus, if the residuals from the fitted model do in fact behave like
the ideal, then testing the underlying assumptions becomes a tool
for validating the chosen model and assessing its quality of fit.
On the other hand, if the
residuals from the chosen fitted model violate one or more of the
above univariate assumptions, then the chosen fitted model is
inadequate and an opportunity exists for arriving at an improved
model.
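The validation idea can be illustrated by fitting two candidate models to the same data and examining the residuals. In this hypothetical sketch the true response is quadratic in a single factor x. A straight-line fit leaves systematic curvature in the residuals, which shows up as a large lag-1 autocorrelation when the residuals are taken in order of x; a quadratic fit leaves residuals that look like random noise. The data-generating coefficients, noise level, and helper names are assumptions for illustration, and in practice one would also plot the residuals.

```python
import random
import statistics


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit via the normal equations; returns b0..b_degree."""
    p = degree + 1
    rows = [[xi ** k for k in range(p)] for xi in x]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting on the augmented matrix [a | b].
    m = [a[i] + [b[i]] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, p):
            f = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * p
    for r in range(p - 1, -1, -1):
        coef[r] = (m[r][p] - sum(m[r][c] * coef[c] for c in range(r + 1, p))) / m[r][r]
    return coef


def residuals(x, y, coef):
    """Observed response minus the fitted polynomial value."""
    return [yi - sum(c * xi ** k for k, c in enumerate(coef)) for xi, yi in zip(x, y)]


def lag1_autocorrelation(data):
    m = statistics.fmean(data)
    num = sum((data[i] - m) * (data[i + 1] - m) for i in range(len(data) - 1))
    den = sum((v - m) ** 2 for v in data)
    return num / den


# Hypothetical settings in increasing order, with a quadratic true response.
rng = random.Random(7)
x = [i / 20 for i in range(200)]
y = [1.0 + 0.5 * xi + 0.3 * xi ** 2 + rng.gauss(0.0, 0.2) for xi in x]

for degree in (1, 2):
    r = residuals(x, y, fit_polynomial(x, y, degree))
    print(degree, round(lag1_autocorrelation(r), 2))
```

The straight line violates the "random drawings" assumption for its residuals, signaling an inadequate model; moving to the quadratic removes the structure.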