1.1.7. General Problem Categories

Problem Classification

The following table is a convenient way to classify EDA problems.

Univariate and Control

UNIVARIATE

Data:

Model:

y = constant + error

Output:

A number (the estimated constant in the model).
An estimate of uncertainty for the constant.
An estimate of the distribution for the error.
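
The three outputs listed above can be illustrated with a minimal sketch. It assumes the constant is estimated by the sample mean and that the errors are independent, so the uncertainty of the constant is the standard error of the mean. The function name `fit_constant` and the measurements are hypothetical, chosen only for illustration.

```python
import math
import statistics


def fit_constant(y):
    """Estimate the constant in 'y = constant + error' by the sample mean.

    Assumption (not from the source): errors are independent, so the
    standard error of the mean is a reasonable uncertainty estimate.
    """
    n = len(y)
    constant = statistics.fmean(y)          # the estimated constant
    sd = statistics.stdev(y)                # sample standard deviation of the errors
    std_err = sd / math.sqrt(n)             # uncertainty of the estimated constant
    residuals = [v - constant for v in y]   # estimated errors, for studying their distribution
    return constant, std_err, residuals


# Hypothetical measurements of a single response Y
y = [9.8, 10.1, 10.0, 9.9, 10.2]
constant, std_err, residuals = fit_constant(y)
print(constant, std_err)
```

The residuals returned here are the raw material for any graphical check of the error distribution, such as a histogram or probability plot.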

Techniques:

CONTROL

Data:

Model:

Output:

A "yes" or "no" to the question "Is the system out of control?".

Techniques:

Control Charts
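
As a rough illustration of how a control chart produces a "yes" or "no" answer, the sketch below uses an individuals chart with 3-sigma limits, where sigma is estimated from the average moving range divided by 1.128 (the standard d2 constant for ranges of two points). The choice of chart, the function name `out_of_control`, and the data are assumptions made for this example, not something this section prescribes.

```python
import statistics


def out_of_control(y):
    """Return True if any point lies outside 3-sigma individuals-chart limits."""
    center = statistics.fmean(y)
    # Moving ranges between consecutive observations
    moving_ranges = [abs(b - a) for a, b in zip(y, y[1:])]
    # 1.128 is the d2 constant for subgroups of size 2
    sigma = statistics.fmean(moving_ranges) / 1.128
    lower, upper = center - 3 * sigma, center + 3 * sigma
    return any(v < lower or v > upper for v in y)


# Hypothetical process data
stable = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 10.0]
shifted = stable + [12.0]

print(out_of_control(stable))
print(out_of_control(shifted))
```

In practice the limits are often computed from a baseline period known to be stable and then applied to new observations; this sketch computes them from the same data for brevity.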

Comparative and Screening

COMPARATIVE

Data:

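A single response variable and k independent variables (Y, X_1, X_2, ..., X_k); the primary focus is on one of these independent variables (the primary factor).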

Model:

y = f(x_1, x_2, ..., x_k) + error

Output:

A "yes" or "no" to the question "Is the primary factor significant?".

Techniques:

SCREENING

Data:

A single response variable and k independent variables (Y, X_1, X_2, ..., X_k).

Model:

y = f(x_1, x_2, ..., x_k) + error

Output:

A ranked list (from most important to least important) of factors.
Best settings for the factors.
A good model/prediction equation relating Y to the factors.
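
One simple way to obtain a ranked list of factors is to compare main effects in a two-level design. The sketch below assumes each factor is coded as -1 (low) or +1 (high) and defines a factor's main effect as the mean response at the high level minus the mean response at the low level. The function name `rank_main_effects`, the factor names, and the data are hypothetical.

```python
def rank_main_effects(design, y):
    """Rank factors by the absolute size of their main effects.

    design: dict mapping factor name -> list of coded levels (-1 or +1)
    y:      list of responses, one per run
    """
    effects = {}
    for name, levels in design.items():
        high = [v for lvl, v in zip(levels, y) if lvl > 0]
        low = [v for lvl, v in zip(levels, y) if lvl < 0]
        # Main effect: mean response at high level minus mean at low level
        effects[name] = sum(high) / len(high) - sum(low) / len(low)
    # Most important (largest absolute effect) first
    return sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical 2x2 full factorial design with one response per run
design = {"X1": [-1, 1, -1, 1], "X2": [-1, -1, 1, 1]}
y = [10.0, 14.0, 11.0, 15.0]

ranked = rank_main_effects(design, y)
print(ranked)
```

If a larger response is better, the sign of each effect also suggests a setting: the high level for a positive effect and the low level for a negative one. Interactions between factors are ignored in this sketch.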

Techniques:

Optimization and Regression

OPTIMIZATION

Data:

A single response variable and k independent variables (Y, X_1, X_2, ..., X_k).

Model:

y = f(x_1, x_2, ..., x_k) + error

Output:

Best settings for the factor variables.

Techniques:

REGRESSION

Data:

A single response variable and k independent variables (Y, X_1, X_2, ..., X_k).

Model:

y = f(x_1, x_2, ..., x_k) + error

Output:

Techniques:
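
A regression problem relates the response Y to the factors through a fitted equation. As a minimal sketch, assuming a single factor and a straight-line relationship (both assumptions made purely for illustration), ordinary least squares gives the intercept and slope in closed form. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squares of x and sum of cross products about the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_line(x, y)
print(intercept, slope)
```

With real data the residuals from such a fit should be examined before the equation is trusted for prediction.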

Time Series and Multivariate

TIME SERIES

Data:

Model:

y_t = f(t) + error

Output:

Techniques:
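
A common first diagnostic for time-dependent data is the sample autocorrelation, which measures how strongly each value is related to the value a fixed number of steps earlier. The sketch below assumes equally spaced observations and uses the usual estimator that divides the lagged sum of cross products by the total sum of squares. The function name `autocorrelation` and the series are hypothetical.

```python
def autocorrelation(y, lag):
    """Sample autocorrelation of an equally spaced series at the given lag."""
    n = len(y)
    mean = sum(y) / n
    # Total sum of squares about the mean (the lag-0 term)
    denom = sum((v - mean) ** 2 for v in y)
    # Sum of cross products between the series and itself shifted by 'lag'
    numer = sum((y[t] - mean) * (y[t + lag] - mean) for t in range(n - lag))
    return numer / denom


# Hypothetical short series with a steady upward trend
series = [1.0, 2.0, 3.0, 4.0, 5.0]

r0 = autocorrelation(series, 0)
r1 = autocorrelation(series, 1)
print(r0, r1)
```

Values near zero at all nonzero lags are consistent with random data; large values suggest that earlier observations carry information about later ones.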

MULTIVARIATE

Data:

k factor variables (X_1, X_2, ..., X_k).

Model:

The model is not explicit.

Output:

Identify underlying correlation structure in the data.

Techniques:

Star Plot
Scatter Plot Matrix
Conditioning Plot
Profile Plot
Principal Components
Clustering
Discrimination/Classification
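
Before turning to any of the techniques above, a numeric starting point for the correlation structure is the matrix of pairwise Pearson correlations. The sketch below is an illustration under that assumption, not one of the listed techniques; the function names `pearson` and `correlation_matrix` and the data are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(columns):
    """Return a nested dict of pairwise correlations between named columns."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}


# Hypothetical data: X2 rises with X1, X3 falls as X1 rises
data = {
    "X1": [1.0, 2.0, 3.0, 4.0],
    "X2": [2.0, 4.0, 6.0, 8.0],
    "X3": [4.0, 3.0, 2.0, 1.0],
}

corr = correlation_matrix(data)
print(corr["X1"]["X2"], corr["X1"]["X3"])
```

Pearson correlation only captures linear association, so a scatter plot matrix remains a useful companion to the numbers.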

Note that multivariate analysis is only covered lightly in this Handbook.