Check for Relationship
A scatter plot (Chambers 1983) reveals relationships or associations between two variables. Such relationships appear as non-random structure in the plot. Various common types of patterns are demonstrated in the examples.
Linear Relationship Between Variables Y and X
Y Versus X
A scatter plot is a plot of the values of Y versus the corresponding values of X, with Y on the vertical axis and X on the horizontal axis.
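The definition can be illustrated with a minimal sketch that uses only the Python standard library. The function name `text_scatter`, its parameters, and the data values are hypothetical and chosen for illustration; a statistical package would normally draw the plot graphically.

```python
def text_scatter(xs, ys, width=40, height=12, mark="x"):
    """Render Y versus X as a character grid: X horizontal, Y vertical."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Guard against a zero range when all values are identical.
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        # Row 0 is the top of the plot, so larger Y maps to a smaller row index.
        row = (height - 1) - round((y - y_min) / y_span * (height - 1))
        grid[row][col] = mark
    # Only the data points are drawn; no line connects them.
    return "\n".join("".join(r).rstrip() for r in grid)


# Hypothetical data with a roughly linear relationship.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
plot = text_scatter(x, y)
print(plot)
```

With these values the marks rise from the lower left to the upper right, which is the pattern a linear relationship produces.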
Scatter plots can provide answers to the following questions:
Combining Scatter Plots
Scatter plots can also be combined in multiple plots per page to
help understand higher-level structure in data sets with more than
two variables.
The scatterplot matrix generates all pairwise scatter plots on a single page. The conditioning plot, also called a co-plot or subset plot, generates scatter plots of Y versus X dependent on the value of a third variable.
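The bookkeeping behind both displays can be sketched as follows. This is an illustration only: the variable names, the data values, and the helper functions `matrix_panels` and `conditioning_subsets` are hypothetical, and the conditioning variable here is discrete (a continuous one would first have to be divided into intervals).

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical data set: three numeric variables and one grouping variable.
data = {
    "temp": [10, 12, 15, 18, 21, 24],
    "pressure": [1.0, 1.1, 1.3, 1.6, 1.8, 2.1],
    "yield": [71, 74, 78, 83, 88, 94],
    "batch": ["A", "B", "A", "B", "A", "B"],
}


def matrix_panels(variables):
    """Return the (vertical, horizontal) pair for each off-diagonal panel."""
    return list(permutations(variables, 2))


def conditioning_subsets(xs, ys, zs):
    """Split (x, y) points into one subset per distinct value of zs."""
    subsets = defaultdict(list)
    for x, y, z in zip(xs, ys, zs):
        subsets[z].append((x, y))
    return dict(subsets)


# Scatterplot matrix: k variables give k * (k - 1) off-diagonal panels.
panels = matrix_panels(["temp", "pressure", "yield"])

# Conditioning plot: yield versus temp, one scatter plot per batch.
subsets = conditioning_subsets(data["temp"], data["yield"], data["batch"])

print(len(panels), "panels")
for level, points in subsets.items():
    print(level, points)
```

Each entry of `panels` and each value of `subsets` would then be drawn as its own scatter plot on the shared page.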
Causality Is Not Proved By Association
The scatter plot uncovers relationships in data. "Relationships" means that there is some structured association (linear, quadratic, etc.) between X and Y. Note, however, that even though causality implies association, association does NOT imply causality.
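A small numeric sketch can make the distinction concrete. It assumes a hypothetical third variable `z` that drives both `x` and `y`, and it substitutes the Pearson correlation coefficient for the visual impression a scatter plot of `y` versus `x` would give. All names and values are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical lurking variable z; x and y each depend on z plus a small
# deterministic perturbation, and neither depends on the other.
z = list(range(20))
x = [2 * zi + (i % 3 - 1) for i, zi in enumerate(z)]
y = [5 * zi + ((7 * i) % 5 - 2) for i, zi in enumerate(z)]

r = pearson(x, y)
print(f"correlation between x and y: {r:.3f}")
```

A plot of `y` versus `x` built from these values would show a strong linear pattern even though, by construction, changing `x` has no effect on `y`.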
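The most popular rendition of a scatter plot shows a plot character at each data point, with no line connecting the points.
Other scatter plot format variants add a line connecting the data points.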
In both cases, the resulting plot is referred to as a scatter plot, although the former (discrete and disconnected) is the author's personal preference: nothing appears on the screen except the data, so there are no interpolative artifacts to bias the interpretation.
Run Sequence Plot
The scatter plot is demonstrated in the load cell calibration data case study.
Scatter plots are a fundamental technique that should be available in any general purpose statistical software program. They are also available in most graphics and spreadsheet programs.