1. Exploratory Data Analysis
1.1. EDA Introduction
Data Analysis Approaches
EDA is a data analysis approach. What other data analysis
approaches exist and how does EDA differ from these other approaches?
Three popular data analysis approaches are:
1. Classical
2. Exploratory (EDA)
3. Bayesian
Paradigms for Analysis Techniques
These three approaches are similar in that they all start with a
general science/engineering problem and all yield
science/engineering conclusions. The difference is the sequence
and focus of the intermediate steps.
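For classical analysis, the sequence is
Problem => Data => Model => Analysis => Conclusions
For EDA, the sequence is
Problem => Data => Analysis => Model => Conclusions
For Bayesian analysis, the sequence is
Problem => Data => Model => Prior Distribution => Analysis => Conclusions
The method of dealing with the underlying model for the data distinguishes the three approaches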
Thus for classical analysis, the data collection is followed by
the imposition of a model (normality, linearity, etc.) and the
analysis, estimation, and testing that follow are focused on the
parameters of that model. For EDA, the data collection is not
followed by a model imposition; rather it is followed immediately
by analysis with a goal of inferring what model would be
appropriate. Finally, for a Bayesian analysis, the analyst
attempts to incorporate scientific/engineering knowledge/expertise
into the analysis by imposing a data-independent distribution on
the parameters of the selected model; the analysis thus consists
of formally combining both the prior distribution on the parameters
and the collected data to jointly make inferences and/or test
assumptions about the model parameters.
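The contrast among the three approaches can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the sample values are invented, the function names are hypothetical, the classical interval uses a normal approximation with z = 1.96, and the Bayesian step assumes a normal prior on the mean with a known noise variance (a conjugate normal-normal update). It is one possible instance of each approach, not a prescribed method.

```python
import math
import statistics

# Hypothetical measurements, invented purely for illustration.
data = [9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7, 10.2]


def classical_mean_interval(sample, z=1.96):
    """Classical: impose a normal model first, then estimate its mean.

    Returns an approximate 95% interval for the mean (normal approximation).
    """
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    return mean - z * std_err, mean + z * std_err


def eda_summary(sample):
    """EDA: summarize the data first and let the summary suggest a model."""
    q1, median, q3 = statistics.quantiles(sample, n=4)
    return {
        "min": min(sample),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(sample),
    }


def bayesian_posterior_mean(sample, prior_mean, prior_var, noise_var):
    """Bayesian: combine a data-independent prior on the mean with the data.

    Assumes a normal prior and a known noise variance, so the posterior
    for the mean is also normal. Returns (posterior mean, posterior variance).
    """
    n = len(sample)
    post_precision = 1.0 / prior_var + n / noise_var
    post_mean = (prior_mean / prior_var + sum(sample) / noise_var) / post_precision
    return post_mean, 1.0 / post_precision


if __name__ == "__main__":
    print("classical interval:", classical_mean_interval(data))
    print("EDA summary:", eda_summary(data))
    print("Bayesian posterior:", bayesian_posterior_mean(data, 10.5, 0.25, 0.0625))
```

In the classical function the model is fixed before any analysis. The EDA function makes no model assumption and only describes the data. The Bayesian function requires the prior parameters as inputs that do not depend on the data.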
In the real world, data analysts freely mix elements of all three approaches (and of other approaches). The distinctions above were drawn to emphasize the major differences among the three.
Further discussion of the distinction between the classical and EDA approaches
Focusing on EDA versus classical, these two approaches differ as follows: