PEARSON DISSIMILARITY

Name:

Type:

Let Subcommand

Purpose:

Compute the Pearson correlation coefficient transformed to a dissimilarity measure between two variables.

Description:

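The Pearson correlation coefficient, r, of two variables X and Y with N elements each is computed from the following sums of squares and cross-products, where \( \bar{X} \) and \( \bar{Y} \) denote the sample means:

\( S_{xx} = \sum_{i=1}^{N}{(X_{i} - \bar{X})^{2}} \)

\( S_{yy} = \sum_{i=1}^{N}{(Y_{i} - \bar{Y})^{2}} \)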

\( S_{xy} = \sum_{i=1}^{N}{(Y_{i} - \bar{Y})(X_{i} - \bar{X})} \)

\( r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} \)

A perfect positive linear relationship yields a correlation coefficient of +1, a perfect negative linear relationship yields -1, and the absence of any linear relationship yields 0.
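As an illustration only (this is a Python sketch, not Dataplot code and not Dataplot's internal implementation), the correlation coefficient can be computed directly from the three sums. The function name `pearson_correlation` and the toy data are hypothetical.

```python
import math

def pearson_correlation(x, y):
    """Pearson r from the sums of squares S_xx, S_yy and cross-products S_xy."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of elements")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    # r is undefined when either variable is constant (S_xx or S_yy is 0);
    # in that case this sketch raises ZeroDivisionError.
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical toy data: an exact increasing line and an exact decreasing line.
print(pearson_correlation([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_correlation([1, 2, 3, 4], [8, 6, 4, 2]))
```

The two calls illustrate the limiting cases: the increasing line gives a coefficient of +1 and the decreasing line gives -1.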

In some applications, such as clustering, it can be useful to transform the correlation coefficient to a dissimilarity measure. The transformation used here is

\( d = \frac{1 - r}{2} \)

This converts the correlation coefficient with values between -1 and 1 to a score between 0 and 1. High positive correlation (i.e., very similar) results in a dissimilarity near 0 and high negative correlation (i.e., very dissimilar) results in a dissimilarity near 1.
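The mapping from r to d can be sketched in a few lines of Python. This is an illustrative sketch rather than Dataplot code; the function name `pearson_dissimilarity_from_r` is hypothetical, and the range check is an added assumption that the source does not describe.

```python
def pearson_dissimilarity_from_r(r):
    """Map a correlation r in [-1, 1] to a dissimilarity d = (1 - r) / 2 in [0, 1]."""
    # Assumption of this sketch: reject values outside the valid range of r.
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return (1.0 - r) / 2.0

# End points and midpoint of the correlation scale.
for r in (1.0, 0.0, -1.0):
    print(r, pearson_dissimilarity_from_r(r))
```

A correlation of +1 maps to 0, a correlation of 0 maps to 0.5, and a correlation of -1 maps to 1. The correlation of 0.946 printed by Program 1 maps to (1 - 0.946)/2 = 0.027, which agrees with the value of D shown there.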

If a similarity score is preferred, you can use

\( s = 1 - d \)

where d is defined as above.
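
Substituting the definition of d shows that this similarity score is simply a linear rescaling of the correlation coefficient:

\( s = 1 - \frac{1 - r}{2} = \frac{1 + r}{2} \)

so that r = +1 gives s = 1, r = 0 gives s = 0.5, and r = -1 gives s = 0.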

Syntax 1:

Syntax 2:

Examples:

Note:

The two variables must have the same number of elements.

Default:

None

Synonyms:

PEARSON DISTANCE is a synonym for PEARSON DISSIMILARITY

Related Commands:

CORRELATION	=	Compute the Pearson correlation of two variables.
SPEARMAN DISSIMILARITY	=	Compute the dissimilarity of two variables based on Spearman's rank correlation.
KENDALL TAU DISSIMILARITY	=	Compute the dissimilarity of two variables based on Kendall's tau correlation.
COSINE DISTANCE	=	Compute the cosine distance.
MANHATTAN DISTANCE	=	Compute the Manhattan distance.
EUCLIDEAN DISTANCE	=	Compute the Euclidean distance.
MATRIX DISTANCE	=	Compute various distance metrics for a matrix.
GENERATE MATRIX <stat>	=	Compute a matrix of pairwise statistic values.

Reference:

Wiley

Applications:

Clustering

Implementation Date:

Program 1:

 
SKIP 25
READ BERGER1.DAT Y X
LET CORR = CORRELATION Y X
LET D    = PEARSON DISSIMILARITY Y X
PRINT CORR D

 PARAMETERS AND CONSTANTS--

    CORR    --          0.946
    D       --          0.027

Program 2:

 
SKIP 25
READ IRIS.DAT Y1 Y2 Y3 Y4
SET WRITE DECIMALS 3
.
LET M = GENERATE MATRIX PEARSON DISSIMILARITY Y1 Y2 Y3 Y4
PRINT M

 
        MATRIX M       --            4 ROWS
                       --            4 COLUMNS

 VARIABLES--M1             M2             M3             M4      

         -0.000          0.559          0.075          0.155
          0.559          0.000          0.736          0.534
          0.075          0.736          0.000          0.144
          0.155          0.534          0.144          0.000
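
For readers who want to see what a pairwise dissimilarity matrix of this kind contains, the following Python sketch builds one from a list of columns. It is not Dataplot code and does not reproduce the IRIS.DAT values above; the function names and the toy data are hypothetical, and setting the diagonal to exactly 0 is a choice made in this sketch.

```python
import math

def pearson_dissimilarity(x, y):
    """Dissimilarity d = (1 - r) / 2, with r the Pearson correlation of x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of elements")
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    r = s_xy / math.sqrt(s_xx * s_yy)
    return (1.0 - r) / 2.0

def dissimilarity_matrix(columns):
    """Symmetric matrix of pairwise dissimilarities; the diagonal is set to 0."""
    k = len(columns)
    m = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            m[i][j] = m[j][i] = pearson_dissimilarity(columns[i], columns[j])
    return m

# Hypothetical toy data, not the IRIS.DAT measurements.
cols = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
for row in dissimilarity_matrix(cols):
    print(" ".join(f"{v:8.3f}" for v in row))
```

Because the dissimilarity is symmetric in its two arguments, only the upper triangle is computed and then mirrored into the lower triangle.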

Program 3:

SKIP 25
READ IRIS.DAT Y1 Y2 Y3 Y4 TAG
.
TITLE CASE ASIS
TITLE OFFSET 2
LABEL CASE ASIS
TIC MARK OFFSET UNITS DATA
Y1LABEL Pearson Dissimilarity Coefficient
YLIMITS 0 1
MAJOR YTIC MARK NUMBER 6
MINOR YTIC MARK NUMBER 1
Y1TIC MARK LABEL DECIMAL 1
Y1LABEL DISPLACEMENT 20
X1LABEL Species
XLIMITS 1 3
MAJOR XTIC MARK NUMBER 3
MINOR XTIC MARK NUMBER 0
XTIC MARK OFFSET 0.3 0.3
X1LABEL DISPLACEMENT 14
CHARACTER X BLANK
LINES BLANK SOLID
.
MULTIPLOT CORNER COORDINATES 5 5 95 95
MULTIPLOT SCALE FACTOR 2
MULTIPLOT 2 3
.
TITLE Sepal Length vs Sepal Width
CORRELATION PLOT Y1 Y2 TAG
.
TITLE Sepal Length vs Petal Length
CORRELATION PLOT Y1 Y3 TAG
.
TITLE Sepal Length vs Petal Width
CORRELATION PLOT Y1 Y4 TAG
.
TITLE Sepal Width vs Petal Length
CORRELATION PLOT Y2 Y3 TAG
.
TITLE Sepal Width vs Petal Width
CORRELATION PLOT Y2 Y4 TAG
.
TITLE Petal Length vs Petal Width
CORRELATION PLOT Y3 Y4 TAG
.
END OF MULTIPLOT

plot generated by sample program