1. Exploratory Data Analysis
1.3.3.26. Scatter Plot
Scatter Plot Showing Heteroscedastic Variability
Discussion
This scatter plot of
the Alaska pipeline data
reveals an approximately linear relationship
between X and Y, but more importantly, it reveals
a statistical condition referred to as heteroscedasticity (that
is, nonconstant variation in Y over the values of
X). For a heteroscedastic data set, the variation in
Y differs depending on the value of X. In
this example, small values of X yield small scatter in
Y while large values of X result in large scatter
in Y.
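The pattern described above can be checked numerically as well as visually. The following is a minimal sketch on synthetic data (not the Alaska pipeline measurements): the helper name `residual_spread_by_bin`, the noise model, and the bin count are all assumptions made for illustration. It fits a straight line and compares the residual standard deviation across equal-count bins of X.

```python
import numpy as np

def residual_spread_by_bin(x, y, n_bins=4):
    """Fit a straight line, then return the residual standard deviation
    within n_bins equal-count bins of x (ordered from small to large x)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (intercept + slope * x)
    order = np.argsort(x)
    bins = np.array_split(residuals[order], n_bins)
    return np.array([b.std(ddof=1) for b in bins])

# Synthetic data (assumption): the noise standard deviation grows with x.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 100.0, size=400)
y = 2.0 + 0.5 * x + rng.normal(scale=0.1 * x)

spread = residual_spread_by_bin(x, y)
print(spread)  # spread in the largest-x bin should exceed that in the smallest-x bin
```

For a homoscedastic data set the binned spreads would be roughly equal; a steady increase from the first bin to the last is the numerical counterpart of the widening scatter seen in the plot.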
Heteroscedasticity complicates the analysis somewhat, but its effects can be overcome by:
Impact of Ignoring Unequal Variability in the Data
Fortunately, unweighted regression analyses on heteroscedastic data
produce estimates of the coefficients that are unbiased. However,
the coefficient estimates will not be as precise as they would be with proper
weighting.
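A small simulation can illustrate both statements. This is a sketch under stated assumptions, not an analysis of the handbook's data: the data are synthetic, the noise standard deviation `sigma` is taken as known and proportional to X, and `fit_line` is a hypothetical helper. It repeatedly fits the slope with and without weights of 1/variance and compares the mean and spread of the two sets of estimates.

```python
import numpy as np

def fit_line(x, y, weights=None):
    """Least-squares fit of y = b0 + b1 * x.
    If given, weights are taken to be 1 / variance of each observation."""
    X = np.column_stack([np.ones_like(x), x])
    if weights is None:
        weights = np.ones_like(x)
    sw = np.sqrt(weights)
    # Scaling rows by sqrt(weight) turns weighted least squares into ordinary.
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

rng = np.random.default_rng(1)
x = np.linspace(1.0, 100.0, 50)
sigma = 0.1 * x          # assumed known noise standard deviation
true_slope = 0.5

ols_slopes, wls_slopes = [], []
for _ in range(2000):
    y = 2.0 + true_slope * x + rng.normal(scale=sigma)
    ols_slopes.append(fit_line(x, y)[1])
    wls_slopes.append(fit_line(x, y, weights=1.0 / sigma**2)[1])
ols_slopes = np.array(ols_slopes)
wls_slopes = np.array(wls_slopes)

print("mean slope  (unweighted, weighted):", ols_slopes.mean(), wls_slopes.mean())
print("slope stdev (unweighted, weighted):", ols_slopes.std(), wls_slopes.std())
```

Under these assumptions both sets of slope estimates should center on the true slope, while the weighted estimates should show the smaller standard deviation.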
Note further that if heteroscedasticity does exist, it is frequently useful to plot and model the local variation \( \mbox{var}(Y_i | X_i) \) as a function of X, as in \( \mbox{var}(Y_i | X_i) = g(X_i) \). This modeling has two advantages:
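As a rough illustration of modeling the local variation as \( g(X_i) \), the sketch below assumes a power-law form \( g(x) = a x^b \) with positive X. That functional form, the synthetic data, the helper name `fit_power_law_variance`, and the bin count are illustrative assumptions; the source does not prescribe a particular g.

```python
import numpy as np

def fit_power_law_variance(x, y, n_bins=10):
    """Estimate g(x) = a * x**b for var(Y | X = x), assuming x > 0.

    Steps: fit a straight line for the mean, compute the residual variance
    in equal-count bins of x, then regress log(variance) on log(bin mean x).
    """
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (intercept + slope * x)
    order = np.argsort(x)
    x_bins = np.array_split(x[order], n_bins)
    r_bins = np.array_split(residuals[order], n_bins)
    x_mid = np.array([b.mean() for b in x_bins])
    local_var = np.array([b.var(ddof=1) for b in r_bins])
    b, log_a = np.polyfit(np.log(x_mid), np.log(local_var), deg=1)
    return np.exp(log_a), b

# Synthetic data (assumption): noise sd = 0.1 * x, so var(Y | X) = 0.01 * x**2.
rng = np.random.default_rng(2)
x = rng.uniform(1.0, 100.0, size=2000)
y = 2.0 + 0.5 * x + rng.normal(scale=0.1 * x)

a_hat, b_hat = fit_power_law_variance(x, y)
print(a_hat, b_hat)  # b_hat should land near 2 for this synthetic setup
```

A fitted g of this kind could then supply weights of the form 1 / g(X_i) for a weighted regression, which connects to the weighting discussed above.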