1. Exploratory Data Analysis
1.1. EDA Introduction

1.1.7. General Problem Categories

Problem Classification: The following table is a convenient way to classify EDA problems.
Univariate and Control
UNIVARIATE

Data:

    A single column of numbers, Y.
Model:
    y = constant + error
Output:
  1. A number (the estimated constant in the model).
  2. An estimate of uncertainty for the constant.
  3. An estimate of the distribution for the error.
Techniques:
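As an illustration of the three univariate outputs, here is a minimal Python sketch. It assumes the constant is estimated by the sample mean, that the errors are independent with a common spread (so the uncertainty of the constant is s / sqrt(n)), and that the residuals are kept for a later distributional check. The function name `summarize_univariate` and the data values are made up for the example.

```python
import math
import statistics


def summarize_univariate(y):
    """Fit y = constant + error, using the sample mean as the constant."""
    n = len(y)
    constant = statistics.fmean(y)       # output 1: estimated constant
    s = statistics.stdev(y)              # sample standard deviation of the errors
    std_error = s / math.sqrt(n)         # output 2: uncertainty of the constant
    residuals = [v - constant for v in y]  # output 3: raw material for the error distribution
    return {
        "constant": constant,
        "std_error": std_error,
        "residual_sd": s,
        "residuals": residuals,
    }


# Hypothetical measurements, for illustration only.
y = [9.8, 10.1, 10.0, 9.9, 10.2]
result = summarize_univariate(y)
print(result["constant"], result["std_error"])
```

The residuals would normally be examined graphically (for example with a histogram or a probability plot) before any distributional claim is made about the error.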
CONTROL

Data:

    A single column of numbers, Y.
Model:
    y = constant + error
Output:
    A "yes" or "no" to the question "Is the system out of control?".
Techniques:
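The yes/no output of the control problem can be sketched with a simple limit check. The rule below is an assumption made for illustration, not a rule stated in this section: limits are placed three sample standard deviations around the mean of a baseline period, and the system is flagged if any new point falls outside them. The function name `out_of_control` and the data are hypothetical.

```python
import statistics


def out_of_control(baseline, new_points, n_sigma=3.0):
    """Return True if any new point falls outside mean +/- n_sigma * sd of the baseline."""
    center = statistics.fmean(baseline)
    sigma = statistics.stdev(baseline)   # assumption: sample sd of the baseline data
    lower = center - n_sigma * sigma
    upper = center + n_sigma * sigma
    return any(v < lower or v > upper for v in new_points)


# Hypothetical baseline collected while the system was believed to be stable.
baseline = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.1, 9.9]

stable = out_of_control(baseline, [10.1, 9.9])    # both points inside the limits
shifted = out_of_control(baseline, [10.1, 12.5])  # 12.5 lies far above the upper limit
print(stable, shifted)
```

Practical control charts often estimate sigma differently (for example from moving ranges) and add run rules; this sketch only shows the shape of the decision.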
Comparative and Screening
COMPARATIVE

Data:

    A single response variable and k independent variables (Y, X1, X2, ... , Xk); the primary focus is on one of these independent variables (the primary factor).
Model:
    y = f(x1, x2, ..., xk) + error
Output:
    A "yes" or "no" to the question "Is the primary factor significant?".
Techniques:
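One way to arrive at a yes/no answer for a two-level primary factor is an exact permutation test on the difference in group means. This is a sketch under simplifying assumptions: the other independent variables are ignored, the primary factor has exactly two levels, and a 0.05 significance level is used. None of these choices come from this section, and the function name and data are hypothetical.

```python
from itertools import combinations
from statistics import fmean


def primary_factor_significant(group_a, group_b, alpha=0.05):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    observed = abs(fmean(group_a) - fmean(group_b))
    total = sum(pooled)

    extreme = 0
    n_splits = 0
    # Enumerate every way of assigning n_a of the pooled values to level A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-9:   # small tolerance for floating-point ties
            extreme += 1
        n_splits += 1

    p_value = extreme / n_splits
    return p_value < alpha, p_value


# Hypothetical responses at the two levels of the primary factor.
level_low = [10.1, 10.3, 9.9, 10.2, 10.0]
level_high = [11.0, 11.2, 10.9, 11.1, 11.3]
significant, p_value = primary_factor_significant(level_low, level_high)
print(significant, p_value)
```

Full enumeration is only practical for small samples, since the number of splits grows combinatorially with the sample size.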
SCREENING

Data:

    A single response variable and k independent variables (Y, X1, X2, ... , Xk).
Model:
    y = f(x1, x2, ..., xk) + error
Output:
  1. A ranked list (from most important to least important) of factors.
  2. Best settings for the factors.
  3. A good model/prediction equation relating Y to the factors.
Techniques:
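The ranked list of factors can be illustrated with main effects from a two-level design. The sketch assumes factor levels coded as -1 and +1, defines the main effect of a factor as the mean response at +1 minus the mean response at -1, and ranks by absolute effect. The run data and the function name `rank_main_effects` are invented for the example.

```python
from statistics import fmean


def rank_main_effects(runs, factor_names):
    """Rank factors by the absolute size of their main effect (largest first)."""
    effects = {}
    for j, name in enumerate(factor_names):
        high = [y for x, y in runs if x[j] == +1]
        low = [y for x, y in runs if x[j] == -1]
        effects[name] = fmean(high) - fmean(low)
    return sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical 2^3 full factorial: ((X1, X2, X3) coded levels, response Y).
runs = [
    ((-1, -1, -1), 20.0),
    ((+1, -1, -1), 30.0),
    ((-1, +1, -1), 21.0),
    ((+1, +1, -1), 31.0),
    ((-1, -1, +1), 16.0),
    ((+1, -1, +1), 26.0),
    ((-1, +1, +1), 17.0),
    ((+1, +1, +1), 27.0),
]

ranked = rank_main_effects(runs, ["X1", "X2", "X3"])
for name, effect in ranked:
    print(name, effect)
```

If a larger response is better, the sign of each effect also suggests the preferred setting for that factor: +1 for a positive effect and -1 for a negative one.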
Optimization and Regression
OPTIMIZATION

Data:

    A single response variable and k independent variables (Y, X1, X2, ... , Xk).
Model:
    y = f(x1, x2, ..., xk) + error
Output:
    Best settings for the factor variables.
Techniques:
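In the simplest case, the best settings can be read directly from replicated runs. The sketch below assumes that only settings actually run are candidates, that replicates at the same settings are averaged, and that a larger response is better unless stated otherwise. The factor values and the function name `best_settings` are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def best_settings(runs, maximize=True):
    """Return the tested factor settings with the best mean response."""
    by_setting = defaultdict(list)
    for settings, y in runs:
        by_setting[settings].append(y)
    means = {s: fmean(ys) for s, ys in by_setting.items()}
    chooser = max if maximize else min
    best = chooser(means, key=means.get)
    return best, means[best]


# Hypothetical runs: ((X1, X2) settings, response Y), two replicates each.
runs = [
    ((150, 1.0), 71.2), ((150, 1.0), 70.8),
    ((150, 2.0), 74.1), ((150, 2.0), 73.9),
    ((175, 1.0), 78.0), ((175, 1.0), 77.6),
    ((175, 2.0), 75.5), ((175, 2.0), 76.1),
]

best, best_mean = best_settings(runs)
print(best, best_mean)
```

Searching between or beyond the tested settings would require a fitted model f(x1, x2, ..., xk), which this sketch does not attempt.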
REGRESSION

Data:

    A single response variable and k independent variables (Y, X1, X2, ... , Xk). The independent variables can be continuous.
Model:
    y = f(x1, x2, ..., xk) + error
Output:
    A good model/prediction equation relating Y to the factors.
Techniques:
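For a single continuous independent variable, the prediction equation can be sketched as an ordinary least squares straight line. The choice f(x) = intercept + slope * x is an assumption made for the example (the section leaves f unspecified), and the data and the function name `fit_line` are hypothetical.

```python
from statistics import fmean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data, roughly linear in x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

intercept, slope = fit_line(x, y)
predicted_at_6 = intercept + slope * 6.0   # use of the prediction equation
print(intercept, slope, predicted_at_6)
```

The residuals y - (intercept + slope * x) should be checked before the fitted equation is used for prediction.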
Time Series and Multivariate
TIME SERIES

Data:

    A column of time-dependent numbers, Y. In addition, time is an independent variable. The time variable can be either explicit or implied. If the data are not equi-spaced, the time variable should be explicitly provided.
Model:
    y_t = f(t) + error
    The model can be either time-domain based or frequency-domain based.
Output:
    A good model/prediction equation relating Y to previous values of Y.
Techniques:
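A time-domain model relating Y to its previous value can be sketched as a first-order autoregression, y_t = c + phi * y_(t-1) + error, fitted by least squares on lagged pairs. The AR(1) form is an assumption chosen for brevity, the data are assumed equi-spaced, and the series below is an artificial noise-free example constructed so that the fit is easy to check by hand. The function name `fit_ar1` is hypothetical.

```python
from statistics import fmean


def fit_ar1(series):
    """Least squares fit of y_t = const + phi * y_(t-1) on equi-spaced data."""
    prev = series[:-1]   # y_(t-1)
    curr = series[1:]    # y_t
    p_bar, c_bar = fmean(prev), fmean(curr)
    spp = sum((p - p_bar) ** 2 for p in prev)
    spc = sum((p - p_bar) * (c - c_bar) for p, c in zip(prev, curr))
    phi = spc / spp
    const = c_bar - phi * p_bar
    return const, phi


# Artificial series generated exactly by y_t = 1 + 0.5 * y_(t-1), starting at 10.
series = [10.0, 6.0, 4.0, 3.0, 2.5, 2.25]

const, phi = fit_ar1(series)
next_value = const + phi * series[-1]   # one-step-ahead prediction
print(const, phi, next_value)
```

Real series contain noise, so the fitted coefficients would only approximate the underlying ones, and higher-order or frequency-domain models may be more appropriate.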
MULTIVARIATE

Data:

    k factor variables (X1, X2, ... , Xk).
Model:
    The model is not explicit.
Output:
    Identify underlying correlation structure in the data.
Techniques: Note that multivariate analysis is only covered lightly in this Handbook.
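A first look at the correlation structure among the factor variables can be sketched as a matrix of pairwise Pearson correlation coefficients. Pearson correlation is an assumption made for the example, since this section does not name a specific measure. The column values and the function names are hypothetical.

```python
import math
from statistics import fmean


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length columns."""
    a_bar, b_bar = fmean(a), fmean(b)
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def correlation_matrix(columns):
    """k x k matrix of pairwise correlations for a list of k columns."""
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


# Hypothetical factor variables X1, X2, X3 observed on the same five units.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.0, 6.0, 8.0, 10.0]   # exact multiple of X1
x3 = [2.0, 1.0, 4.0, 3.0, 5.0]

corr = correlation_matrix([x1, x2, x3])
for row in corr:
    print([round(r, 3) for r in row])
```

A pairwise correlation matrix captures only linear, two-variable relationships; it is a starting point rather than a full multivariate analysis.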