Given paired response variables x and y of length
n and a weights variable w, the weighted covariance is
computed with the formula
\( cov(x,y;w) = \frac
{\sum_{i=1}^{n}{w_{i} (x_{i} - m(x;w))(y_{i} - m(y;w))}}
{\sum_{i=1}^{n}{w_{i}}} \)
where \( m \) denotes the weighted mean
\( m(x;w) = \frac{\sum_{i=1}^{n}{w_{i} x_{i}}}
{\sum_{i=1}^{n}{w_{i}}} \)
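As an illustration only (this is not Dataplot code), the two formulas above can be sketched in plain Python. The function names weighted_mean and weighted_cov are hypothetical, and the sketch assumes the weights are non-negative with a positive sum.

```python
def weighted_mean(x, w):
    """Weighted mean m(x;w) = sum(w_i * x_i) / sum(w_i)."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def weighted_cov(x, y, w):
    """Weighted covariance cov(x,y;w), normalized by the sum of the weights."""
    mx = weighted_mean(x, w)
    my = weighted_mean(y, w)
    num = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    return num / sum(w)


# Small made-up data set, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
w = [1.0, 2.0, 3.0, 4.0]
print(weighted_mean(x, w), weighted_cov(x, y, w))
```

Note that the denominator is the sum of the weights, as in the formula above, so with unit weights the result is the population (divide by n) covariance rather than the sample (divide by n - 1) covariance.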
The weighted correlation coefficient is computed with the formula
\( \begin{array}{lcl}
r & = & \frac{S_{xy}} {\sqrt{S_{xx} S_{yy}}} \\
& = & \frac{cov(x,y;w)} {\sqrt{cov(x,x;w) cov(y,y;w)}}
\end{array}
\)
where
\( S_{xx} = \sum_{i=1}^{n}{w_{i} (x_{i} - m(x;w))^{2}} \)
\( S_{yy} = \sum_{i=1}^{n}{w_{i} (y_{i} - m(y;w))^{2}} \)
\( S_{xy} = \sum_{i=1}^{n}{w_{i} (x_{i} - m(x;w)) (y_{i} - m(y;w))} \)
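A minimal Python sketch of the weighted correlation formula follows (not Dataplot code; the function name weighted_corr is hypothetical). It assumes the weights have a positive sum and that neither variable is constant, so that the denominator is non-zero. Because the sum of the weights cancels in the ratio, the sketch works directly with the weighted sums of squares and cross products.

```python
import math


def weighted_corr(x, y, w):
    """Weighted correlation r = S_xy / sqrt(S_xx * S_yy)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    return sxy / math.sqrt(sxx * syy)


# Made-up data: y is an exact increasing linear function of x,
# so the weighted correlation should be 1 up to rounding error.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
w = [0.5, 1.0, 2.0, 0.5]
print(weighted_corr(x, y, w))
```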
The cosine similarity, which is equivalent to the reflective
correlation coefficient, is defined as
\( \mbox{Cosine Similarity} = \frac{\sum_{i=1}^{n}{x_{i} y_{i}}}
{\sqrt{\sum_{i=1}^{n}{x_{i}^{2}}}
\sqrt{\sum_{i=1}^{n}{y_{i}^{2}}}} \)
The cosine distance is then defined as
\( \mbox{Cosine Distance} = 1 - \mbox{Cosine Similarity} \)
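The unweighted cosine similarity and cosine distance can be sketched in Python as follows. This is an illustration rather than Dataplot code; the function names are hypothetical, and the sketch assumes neither vector is all zeros. Unlike the correlation coefficient, the values are not centered about their means.

```python
import math


def cosine_similarity(x, y):
    """Cosine similarity: sum(x_i * y_i) / (sqrt(sum(x_i^2)) * sqrt(sum(y_i^2)))."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    return dot / (norm_x * norm_y)


def cosine_distance(x, y):
    """Cosine distance = 1 - cosine similarity."""
    return 1.0 - cosine_similarity(x, y)


# Orthogonal vectors have similarity 0 and therefore distance 1.
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```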
The weighted cosine similarity is defined as
\( \mbox{Weighted Cosine Similarity} =
\frac{\sum_{i=1}^{n}{w_{i} x_{i} y_{i}}}
{\sqrt{\sum_{i=1}^{n}{w_{i} x_{i}^{2}}}
\sqrt{\sum_{i=1}^{n}{w_{i} y_{i}^{2}}}} \)
The weighted cosine distance is then defined as
\( \mbox{Weighted Cosine Distance} = 1 - \mbox{Weighted Cosine Similarity} \)
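The weighted versions differ from the unweighted ones only in that each term of the three sums is multiplied by its weight. A hedged Python sketch (hypothetical function names, not Dataplot code) is given below. It assumes non-negative weights and that the weighted sums of squares are non-zero.

```python
import math


def weighted_cosine_similarity(x, y, w):
    """Weighted cosine similarity with each term of every sum scaled by w_i."""
    dot = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    norm_x = math.sqrt(sum(wi * xi * xi for wi, xi in zip(w, x)))
    norm_y = math.sqrt(sum(wi * yi * yi for wi, yi in zip(w, y)))
    return dot / (norm_x * norm_y)


def weighted_cosine_distance(x, y, w):
    """Weighted cosine distance = 1 - weighted cosine similarity."""
    return 1.0 - weighted_cosine_similarity(x, y, w)


# Made-up data: a zero weight removes the second pair from the computation,
# so only the first pair (1, 1) contributes.
x = [1.0, 5.0]
y = [1.0, -5.0]
w = [1.0, 0.0]
print(weighted_cosine_similarity(x, y, w))
print(weighted_cosine_distance(x, y, w))
```

With all weights equal to the same positive constant, the weight cancels between numerator and denominator and the result reduces to the unweighted cosine similarity.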
A weighted linear regression is sometimes used when the error
variances are not homogeneous (e.g., variances are often higher in
one or both tails). In these cases, you may also want to obtain a
weighted correlation coefficient using the same weights as the linear
fit.
The Alaska pipeline case study in the NIST/SEMATECH e-Handbook of
Statistical Methods gives an example of how weights can be
determined. Although this is done in the context of a regression
analysis, the same approach applies to weighted correlation and
weighted covariance. See
https://www.itl.nist.gov/div898/handbook/pmd/section6/pmd625.htm
If you have grouped data (i.e., a bivariate frequency table), use the
GROUPED CORRELATION command. Grouped correlation is similar to
weighted correlation, but a different computational formula is used.
Syntax 1: