
WEIGHTED CORRELATION
WEIGHTED COVARIANCE
WEIGHTED COSINE DISTANCE
WEIGHTED COSINE SIMILARITY

Name:
    WEIGHTED CORRELATION (LET)
    WEIGHTED COVARIANCE (LET)
    WEIGHTED COSINE DISTANCE (LET)
    WEIGHTED COSINE SIMILARITY (LET)
Type:
    Let Subcommand
Purpose:
    Compute the weighted correlation coefficient, weighted covariance, weighted cosine distance, or weighted cosine similarity between two variables.
Description:
    Given paired response variables x and y of length n and a weights variable w, the weighted covariance is computed with the formula

      \( cov(x,y;w) = \frac {\sum_{i=1}^{n}{w_{i} (x_{i} - m(x;w))(y_{i} - m(y;w))}} {\sum_{i=1}^{n}{w_{i}}} \)

    where \( m \) denotes the weighted mean

      \( m(x;w) = \frac{\sum_{i=1}^{n}{w_{i} x_{i}}} {\sum_{i=1}^{n}{w_{i}}} \)
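
    As a cross-check outside Dataplot, the two formulas above can be sketched in plain Python. This is an illustrative sketch, not Dataplot's implementation: the function names weighted_mean and weighted_cov and the sample values are assumptions made for the example, and the weights are assumed to be non-negative with a positive sum.

```python
def weighted_mean(x, w):
    """Weighted mean m(x;w) = sum(w_i * x_i) / sum(w_i)."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def weighted_cov(x, y, w):
    """Weighted covariance cov(x,y;w), normalized by the sum of the weights."""
    mx = weighted_mean(x, w)
    my = weighted_mean(y, w)
    num = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    return num / sum(w)


# Illustrative values only (not taken from any Dataplot data file).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
w = [4.0, 3.0, 2.0, 1.0]

print(weighted_mean(x, w))    # prints 2.0
print(weighted_cov(x, y, w))  # prints 2.0
```

    With all weights equal to 1, the formula divides by n, so it reduces to the covariance with divisor n rather than n - 1.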

    The weighted correlation coefficient is computed with the formula

      \( \begin{array}{lcl} r & = & \frac{S_{xy}} {\sqrt{S_{xx} S_{yy}}} \\ & = & \frac{cov(x,y;w)} {\sqrt{cov(x,x;w) cov(y,y;w)}} \end{array} \)

    where

      \( S_{xx} = \sum_{i=1}^{n}{w_{i} (x_{i} - m(x;w))^{2}} \)
      \( S_{yy} = \sum_{i=1}^{n}{w_{i} (y_{i} - m(y;w))^{2}} \)
      \( S_{xy} = \sum_{i=1}^{n}{w_{i} (x_{i} - m(x;w)) (y_{i} - m(y;w))} \)
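
    The two forms of r agree because each weighted covariance is the corresponding S term divided by the same sum of weights, which cancels in the ratio. A minimal Python sketch of the S-term form follows. The name weighted_corr and the sample values are hypothetical, the weights are assumed to be non-negative with a positive sum, and neither variable is assumed to be constant.

```python
import math


def weighted_corr(x, y, w):
    """Weighted correlation r = S_xy / sqrt(S_xx * S_yy)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    return sxy / math.sqrt(sxx * syy)


# Illustrative values only.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0]
w = [4.0, 3.0, 2.0, 1.0]

r = weighted_corr(x, y, w)
r_scaled = weighted_corr(x, y, [10.0 * wi for wi in w])
print(r, r_scaled)
```

    Multiplying every weight by the same positive constant leaves r unchanged (up to rounding), so only the relative sizes of the weights matter.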

    The cosine similarity, which is equivalent to the reflective correlation coefficient, is defined as

      \( \mbox{Cosine Similarity} = \frac{\sum_{i=1}^{n}{x_{i} y_{i}}} {\sqrt{\sum_{i=1}^{n}{x_{i}^{2}}} \sqrt{\sum_{i=1}^{n}{y_{i}^{2}}}} \)

    The cosine distance is then defined as

      \( \mbox{Cosine Distance} = 1 - \mbox{Cosine Similarity} \)

    The weighted cosine similarity is defined as

      \( \mbox{Weighted Cosine Similarity} = \frac{\sum_{i=1}^{n}{w_{i} x_{i} y_{i}}} {\sqrt{\sum_{i=1}^{n}{w_{i} x_{i}^{2}}} \sqrt{\sum_{i=1}^{n}{w_{i} y_{i}^{2}}}} \)

    The weighted cosine distance is then defined as

      \( \mbox{Weighted Cosine Distance} = 1 - \mbox{Weighted Cosine Similarity} \)
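
    The weighted cosine formulas differ from the weighted correlation only in that the data are not centered about their weighted means. The sketch below, in plain Python, is meant as an illustration rather than as Dataplot's implementation. The function names and sample values are assumptions, and the weights are assumed to be non-negative and such that neither weighted sum of squares is zero.

```python
import math


def weighted_cosine_similarity(x, y, w):
    """sum(w*x*y) / (sqrt(sum(w*x^2)) * sqrt(sum(w*y^2)))."""
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    syy = sum(wi * yi * yi for wi, yi in zip(w, y))
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))


def weighted_cosine_distance(x, y, w):
    """One minus the weighted cosine similarity."""
    return 1.0 - weighted_cosine_similarity(x, y, w)


# Illustrative values only.
x = [1.0, 2.0, 3.0]
y = [2.0, 1.0, 4.0]
w = [0.5, 1.0, 2.0]

sim = weighted_cosine_similarity(x, y, w)
dist = weighted_cosine_distance(x, y, w)
print(sim, dist)
```

    Setting every weight to 1 recovers the unweighted cosine similarity and cosine distance defined earlier.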

    A weighted linear regression is sometimes used when the error variances are not homogeneous (e.g., variances are often higher in one or both tails). In these cases, you may also want to obtain a weighted correlation coefficient using the same weights as the linear fit.

    The Alaska pipeline case study in the NIST/SEMATECH e-Handbook of Statistical Methods gives an example of how weights can be determined. Although this is done in the context of a regression analysis, the same approach applies to weighted correlation and weighted covariance. See

    https://www.itl.nist.gov/div898/handbook/pmd/section6/pmd625.htm

    If you have grouped data (i.e., a bivariate frequency table), use the GROUPED CORRELATION command. Grouped correlation is similar to weighted correlation, but a different computational formula is used.

Syntax 1:
    LET <par> = WEIGHTED CORRELATION <y1> <y2> <weights>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> is the first response variable;
                <y2> is the second response variable;
                <weights> is the weights variable;
                <par> is a parameter where the computed weighted correlation is stored;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 2:
    LET <par> = WEIGHTED COVARIANCE <y1> <y2> <weights>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> is the first response variable;
                <y2> is the second response variable;
                <weights> is the weights variable;
                <par> is a parameter where the computed weighted covariance is stored;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 3:
    LET <par> = WEIGHTED COSINE DISTANCE <y1> <y2> <weights>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> is the first response variable;
                <y2> is the second response variable;
                <weights> is the weights variable;
                <par> is a parameter where the computed weighted cosine distance is stored;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 4:
    LET <par> = WEIGHTED COSINE SIMILARITY <y1> <y2> <weights>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> is the first response variable;
                <y2> is the second response variable;
                <weights> is the weights variable;
                <par> is a parameter where the computed weighted cosine similarity is stored;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
    LET A = WEIGHTED CORRELATION Y1 Y2 WEIGHTS
    LET A = WEIGHTED COVARIANCE Y1 Y2 WEIGHTS
    LET A = WEIGHTED COSINE DISTANCE Y1 Y2 WEIGHTS
    LET A = WEIGHTED COSINE SIMILARITY Y1 Y2 WEIGHTS
Note:
    Dataplot statistics can be used in a number of commands. For details, enter

Default:
    None
Synonyms:
    None
Related Commands:
    GROUPED CORRELATION = Compute the correlation coefficient based on bivariate frequency data.
    WEIGHTED COSINE DISTANCE = Compute the weighted cosine distance.
    CORRELATION = Compute the correlation of two variables.
    COVARIANCE = Compute the covariance of two variables.
Applications:
    Linear Regression
Implementation Date:
    2018/10
Program:
     
    . Step 1:   Read the data
    .
    skip 25
    read berger1.dat y x batch
    .
    . Step 2:   Compute both the unweighted and weighted correlations
    .
    .           Weights from e-Handbook case study of Alaska pipeline data
    .
    let wt = 1/(x**(1.5))
    let corr = correlation y x
    let wtcorr = weighted correlation y x wt
    let cov = covariance y x
    let wtcov = weighted covariance y x wt
    .
    set write decimals 3
    print "Unweighted correlation:  ^corr"
    print "Weighted correlation:    ^wtcorr"
    print "Unweighted covariance:   ^cov"
    print "Weighted covariance:     ^wtcov"
        
    The following output is returned
     
    Unweighted correlation:  0.945581893216
    Weighted correlation:    0.983560277814
    Unweighted covariance:   423.101490037
    Weighted covariance:     500.3662749985
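
    For readers who want to mirror Step 2 outside Dataplot, the weighting wt = 1/x**1.5 can be sketched in Python as shown below. The data values are made up for illustration and are NOT the contents of berger1.dat, so the printed numbers will not match the output above. The helper weighted_corr is a hypothetical name, not a Dataplot or library routine.

```python
import math


def weighted_corr(a, b, w):
    """Weighted correlation of a and b with non-negative weights w."""
    sw = sum(w)
    ma = sum(wi * ai for wi, ai in zip(w, a)) / sw
    mb = sum(wi * bi for wi, bi in zip(w, b)) / sw
    saa = sum(wi * (ai - ma) ** 2 for wi, ai in zip(w, a))
    sbb = sum(wi * (bi - mb) ** 2 for wi, bi in zip(w, b))
    sab = sum(wi * (ai - ma) * (bi - mb) for wi, ai, bi in zip(w, a, b))
    return sab / math.sqrt(saa * sbb)


# Hypothetical data, chosen only so that the scatter grows with x.
x = [10.0, 20.0, 40.0, 80.0]
y = [11.0, 19.0, 43.0, 75.0]

# Same weighting as the Dataplot program: wt = 1/(x**(1.5)).
wt = [1.0 / xi ** 1.5 for xi in x]

# Unit weights reproduce the ordinary (unweighted) correlation.
corr = weighted_corr(y, x, [1.0] * len(x))
wtcorr = weighted_corr(y, x, wt)

print(f"Unweighted correlation:  {corr:.3f}")
print(f"Weighted correlation:    {wtcorr:.3f}")
```

    Because the weights decrease as x increases, observations with large x contribute less to the weighted statistics than they do to the unweighted ones.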
        


Date created: 11/08/2018
Last updated: 08/10/2020

Please email comments on this WWW page to alan.heckert@nist.gov.