4.
Process Modeling
4.1.
Introduction to Process Modeling
4.1.2.
|
What terminology do statisticians use to describe process models?
|
Model Components
|
There are three main parts to every process model. These are
- the response variable, usually denoted by \(y\),
- the mathematical function, usually denoted as \(f(\vec{x};\vec{\beta})\), and
- the random errors, usually denoted by \(\varepsilon\).
|
Form of Model
|
The general form of the model is
$$ y = f(\vec{x};\vec{\beta}) + \varepsilon $$
All process models discussed in this chapter have this general form.
As alluded to earlier, the random
errors that are included in the model make the relationship between the
response variable and the predictor variables a "statistical" one, rather
than a perfect deterministic one. This is because the functional
relationship between the response and predictors holds only on average,
not for each data point.
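
The "holds only on average" idea can be illustrated with a small simulation. This is a minimal sketch, not part of the handbook's own material: the straight-line function, the parameter values, the normal error distribution, and the seed are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed (assumption)

def f(x, beta):
    """Hypothetical straight-line function: beta[0] + beta[1] * x."""
    return beta[0] + beta[1] * x

beta = np.array([2.0, 0.5])   # illustrative parameter values (assumed)
sigma = 1.0                   # illustrative error standard deviation (assumed)

x = np.linspace(0.0, 10.0, 2000)
eps = rng.normal(0.0, sigma, size=x.size)  # random errors with mean zero
y = f(x, beta) + eps                       # y = f(x; beta) + epsilon

# Individual points miss the function, but the misses average out.
deviations = y - f(x, beta)
mean_deviation = deviations.mean()
max_abs_deviation = np.abs(deviations).max()

print(f"mean deviation:        {mean_deviation:.4f}")
print(f"largest abs deviation: {max_abs_deviation:.4f}")
```

The mean deviation should be close to zero while the largest single deviation is not, which is what distinguishes a statistical relationship from a deterministic one.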
|
Some of the details about the different parts of the model are discussed
below, along with alternate terminology for the different components of the
model.
|
Response Variable
|
The response variable, \(y\),
is a quantity
that varies in a way that we hope to be able to summarize and exploit
via the modeling process. Generally, it is known before the modeling process
begins that the variation of the response variable is systematically related
to the values of one or more other variables, although testing the
existence and nature of this dependence is part of the modeling process
itself.
|
Mathematical Function
|
The mathematical function consists of two parts. These parts are
the predictor variables, \(x_1, \, x_2, \, \ldots \, \),
and the parameters, \(\beta_0, \, \beta_1, \, \ldots \, \).
The predictor variables are observed along with the response variable.
They are the quantities described on the previous page as inputs to the
mathematical function, \(f(\vec{x};\vec{\beta})\).
The collection of all of the predictor variables is denoted by
\(\vec{x}\)
for short.
$$ \vec{x} \equiv (x_1, \, x_2, \, \ldots) $$
The parameters are the quantities that will be estimated during the
modeling process. Their true values are unknown and unknowable, except
in simulation experiments. As with the predictor variables, the
collection of all of the parameters is denoted by
\(\vec{\beta}\)
for short.
$$ \vec{\beta} \equiv (\beta_0, \, \beta_1, \, \ldots) $$
|
The parameters and predictor variables are combined in different forms
to give the function used to describe the deterministic variation in the
response variable. For a straight line with an unknown intercept and
slope, for example, there are two parameters and one predictor variable:
$$ f(x;\vec{\beta}) = \beta_0 + \beta_1x \, .$$
For a straight line with a known slope of one, but an unknown intercept,
there would be only one parameter:
$$ f(x;\vec{\beta}) = \beta_0 + x \, .$$
For a quadratic surface with two predictor variables, there are six
parameters for the full model:
$$ f(\vec{x};\vec{\beta}) =
\beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_{12}x_1x_2 +
\beta_{11}x_1^2 + \beta_{22}x_2^2 $$
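
The three example functions can be written out directly to make the parameter counts concrete. This is a sketch only: the function names and the numeric inputs are hypothetical, chosen so the arithmetic is easy to check by hand.

```python
def line(x, beta):
    """Straight line, unknown intercept and slope: two parameters."""
    b0, b1 = beta
    return b0 + b1 * x

def line_known_slope(x, beta):
    """Straight line with known slope of one: one parameter."""
    (b0,) = beta
    return b0 + x

def quadratic_surface(x, beta):
    """Full quadratic surface in two predictors: six parameters."""
    x1, x2 = x
    b0, b1, b2, b12, b11, b22 = beta
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2 + b11 * x1**2 + b22 * x2**2

# Illustrative evaluations (all values are assumptions for the example).
value_line = line(2.0, (1.0, 3.0))               # 1 + 3*2
value_known = line_known_slope(2.0, (1.0,))      # 1 + 2
value_quad = quadratic_surface((1.0, 2.0), (1.0,) * 6)  # 1 + 1 + 2 + 2 + 1 + 4

print(value_line, value_known, value_quad)
```

Unpacking `beta` into a fixed number of names means that passing the wrong number of parameters raises an error immediately, which mirrors the fact that each functional form fixes its own parameter count.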
|
Random Error
|
Like the parameters in the mathematical function, the random errors
are unknown. They are simply the difference between the data and the
mathematical function. They are assumed to follow a particular probability
distribution, however, which is used to describe their aggregate behavior.
The probability distribution that describes the errors has a mean of zero
and an unknown standard deviation, denoted by \(\sigma\),
that is another parameter in the model, like the \(\beta \,\)'s.
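
Because the true parameters are unknown, the random errors cannot be observed directly; in practice the differences between the data and the fitted function (the residuals) stand in for them. The sketch below simulates straight-line data and estimates \(\sigma\) from the residuals. The true values, the normal error distribution, and the use of `numpy.polyfit` with the divisor \(n - p\) are assumptions made for illustration, not a method prescribed by this section.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary fixed seed (assumption)

# Simulated data, so the "true" values are known here (illustrative only).
beta0_true, beta1_true, sigma_true = 1.0, 2.0, 0.5
x = np.linspace(0.0, 5.0, 500)
y = beta0_true + beta1_true * x + rng.normal(0.0, sigma_true, size=x.size)

# Least-squares straight-line fit; polyfit returns the highest degree first.
slope_hat, intercept_hat = np.polyfit(x, y, 1)

# Residuals: data minus fitted function, used as stand-ins for the errors.
residuals = y - (intercept_hat + slope_hat * x)

# Residual standard deviation with n - p degrees of freedom (p = 2 here).
n, p = x.size, 2
sigma_hat = np.sqrt(np.sum(residuals**2) / (n - p))

print(f"estimated sigma: {sigma_hat:.3f} (simulated with {sigma_true})")
```

With an intercept in the fitted function, the residuals average to zero by construction, consistent with the mean-zero assumption on the errors.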
|
Alternate Terminology
|
Unfortunately, there are no completely standardized names for the
parts of the model discussed above. Other publications
or software may
use different terminology. For example, another common name for the
response variable is "dependent variable". The response variable is also
simply called "the response" for short. Other names for the predictor
variables include "explanatory variables", "independent variables",
"predictors" and "regressors". The mathematical function used to describe
the deterministic variation in the response variable is sometimes called
the "regression function", the "regression equation", the "smoothing
function", or the "smooth".
|
Scope of "Model"
|
In its proper usage, the term "model" refers to the general equation
\(y = f(\vec{x};\vec{\beta}) + \varepsilon\) and also includes the underlying assumptions
made about the probability distribution used to describe the variation
of the random errors. Often, however, people will also use the term
"model" when referring specifically to the mathematical function
describing the deterministic variation in the data. Since the function is
part of the model, the more limited usage is not wrong, but it is
important to remember that the term "model" might refer to more than
just the mathematical function.
|