MATRIX DISTANCE

Name:

MATRIX DISTANCE (LET)

Type:

Let Subcommand

Purpose:

Compute the distance matrix of a matrix.

Description:

Given an n x p data matrix X, we compute a distance matrix D. For row distances, the D_ij element of the distance matrix is the distance between row i and row j, which results in an n x n D matrix. For column distances, the D_ij element is the distance between column i and column j, which results in a p x p D matrix.
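As a small worked illustration, consider a hypothetical 3 x 2 matrix X (values chosen for easy arithmetic, not taken from the Dataplot documentation) and the Euclidean metric. The row distance matrix is 3 x 3 and the column distance matrix is 2 x 2. Both are symmetric with zeros on the diagonal.

```latex
X = \begin{pmatrix} 0 & 0 \\ 3 & 4 \\ 6 & 8 \end{pmatrix}, \qquad
D^{\text{row}} = \begin{pmatrix} 0 & 5 & 10 \\ 5 & 0 & 5 \\ 10 & 5 & 0 \end{pmatrix}, \qquad
D^{\text{col}} = \begin{pmatrix} 0 & \sqrt{5} \\ \sqrt{5} & 0 \end{pmatrix}
```

For example, rows 1 and 3 differ by (6, 8), giving a distance of 10, while the two columns (0, 3, 6) and (0, 4, 8) differ by (0, 1, 2), giving \( \sqrt{5} \approx 2.2361 \).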

Originally five distance metrics were available; the 2018/10 version of Dataplot added several more. The available metrics are described below.

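The Euclidean row distance between rows i and j is defined as
\[ D_{ij} = \sqrt{\sum_{k=1}^{p} (X_{ik} - X_{jk})^{2}} \]
The Euclidean column distance between columns i and j is defined as
\[ D_{ij} = \sqrt{\sum_{k=1}^{n} (X_{ki} - X_{kj})^{2}} \]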
The Euclidean distance is simply the square root of the sum of the squared differences between corresponding elements of the rows (or columns). This is probably the most commonly used distance metric.
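A quick check of the Euclidean formula, using two hypothetical rows x = (1, 2, 3) and y = (4, 0, 3) chosen only for illustration:

```latex
d_E(x, y) = \sqrt{(1-4)^2 + (2-0)^2 + (3-3)^2} = \sqrt{9 + 4 + 0} = \sqrt{13} \approx 3.6056
```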
The Mahalanobis distance is defined as
\[ D_{ij} = \sqrt{(X_i - X_j)^{T} S^{-1} (X_i - X_j)} \]
where \( S^{-1} \) is the inverse of the variance-covariance matrix of X. The row distances are obtained by letting X_i and X_j represent the i-th and j-th rows, while the column distances are obtained by letting X_i and X_j represent the i-th and j-th columns.
The Mahalanobis distance is effectively a weighted Euclidean distance where the weighting is determined by the sample variance-covariance matrix.
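To see the weighting, assume (for illustration only) that S is diagonal, so the variables are uncorrelated, and that the distance is the square root of the quadratic form. The Mahalanobis distance then reduces to a Euclidean distance in which each squared difference is divided by that variable's variance. The example uses hypothetical rows x = (1, 2, 3), y = (4, 0, 3) and assumed variances (1, 4, 9):

```latex
S = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_p^2)
\;\Longrightarrow\;
(x - y)^{T} S^{-1} (x - y) = \sum_{k=1}^{p} \frac{(x_k - y_k)^2}{\sigma_k^2}

d_M(x, y) = \sqrt{\frac{(1-4)^2}{1} + \frac{(2-0)^2}{4} + \frac{(3-3)^2}{9}}
          = \sqrt{9 + 1 + 0} = \sqrt{10} \approx 3.1623
```

The result is smaller than the unweighted Euclidean value \( \sqrt{13} \approx 3.6056 \) because the larger variance of the second variable down-weights its difference.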
The Minkowsky row distance is defined as
\[ D_{ij} = \left( \sum_{k=1}^{p} |X_{ik} - X_{jk}|^{P} \right)^{1/P} \]
The column distance is similar, but the summation is over the number of rows rather than the number of columns.
The Minkowsky distance is the P-th root of the sum of the absolute differences to the P-th power between corresponding elements of the rows (or columns). The Euclidean distance is the special case of P = 2.
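A worked example with P = 1.5 (the value used in the Examples section) and the hypothetical rows x = (1, 2, 3), y = (4, 0, 3):

```latex
d_{1.5}(x, y) = \left(|1-4|^{1.5} + |2-0|^{1.5} + |3-3|^{1.5}\right)^{1/1.5}
              = (5.1962 + 2.8284 + 0)^{2/3} = 8.0246^{2/3} \approx 4.008
```

For the same rows, P = 1 gives 5 and P = 2 gives \( \sqrt{13} \approx 3.606 \), so the P = 1.5 value falls between the block and Euclidean distances.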
The block row distance is defined as
\[ D_{ij} = \sum_{k=1}^{p} |X_{ik} - X_{jk}| \]
The column distance is similar, but the summation is over the number of rows rather than the number of columns.
The block distance is the sum of the absolute differences between corresponding elements of the rows (or columns). Note that this is a special case of the Minkowsky distance with P = 1.
The block distance is also known as the city block or Manhattan distance.
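For the hypothetical rows x = (1, 2, 3) and y = (4, 0, 3), the block distance is the sum of the absolute element-wise differences:

```latex
d_1(x, y) = |1-4| + |2-0| + |3-3| = 3 + 2 + 0 = 5
```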
The Chebychev row distance is defined as
\[ D_{ij} = \max_{k=1,\ldots,p} |X_{ik} - X_{jk}| \]
The column distance is similar, but the maximum is over the rows rather than the columns.
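The Chebychev distance is the largest absolute difference between corresponding elements. It is also the limit of the Minkowsky distance as P grows without bound. For the hypothetical rows x = (1, 2, 3) and y = (4, 0, 3):

```latex
d_\infty(x, y) = \max\left(|1-4|, |2-0|, |3-3|\right) = \max(3, 2, 0) = 3
```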
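The cosine row similarity is defined as
\[ S_{ij} = \frac{\sum_{k=1}^{p} X_{ik} X_{jk}}{\sqrt{\sum_{k=1}^{p} X_{ik}^{2}} \sqrt{\sum_{k=1}^{p} X_{jk}^{2}}} \]
The cosine distance is then defined as
\[ D_{ij} = 1 - S_{ij} \]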
The cosine distance above is defined for non-negative values only. It is also not a proper distance metric, since the triangle inequality does not hold. However, angular versions based on the inverse cosine of the cosine similarity are proper distances.
If negative values are encountered in the input, the cosine distances will not be computed. However, the cosine similarities will be computed.
The column distance and similarity are defined similarly, but the summations are over the rows rather than the columns.
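A worked example for the hypothetical non-negative rows x = (1, 2, 3) and y = (4, 0, 3). It assumes the usual definitions: cosine similarity is the dot product divided by the product of the vector lengths, and cosine distance is one minus the similarity.

```latex
S(x, y) = \frac{1 \cdot 4 + 2 \cdot 0 + 3 \cdot 3}{\sqrt{1^2 + 2^2 + 3^2}\,\sqrt{4^2 + 0^2 + 3^2}}
        = \frac{13}{\sqrt{14} \cdot 5} \approx 0.6949

D(x, y) = 1 - S(x, y) \approx 0.3051
```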
The Canberra row distance is defined as
\[ D_{ij} = \sum_{k=1}^{p} \frac{|X_{ik} - X_{jk}|}{|X_{ik}| + |X_{jk}|} \]
The column distance is similar, but the summation is over the rows rather than the columns.
The Canberra distance is a weighted version of the block (Manhattan) distance.
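A worked example for the hypothetical rows x = (1, 2, 3) and y = (4, 0, 3). It assumes the common form of the Canberra distance, in which each absolute difference is divided by the sum of the absolute values of the two elements.

```latex
d_C(x, y) = \frac{|1-4|}{|1| + |4|} + \frac{|2-0|}{|2| + |0|} + \frac{|3-3|}{|3| + |3|}
          = 0.6 + 1 + 0 = 1.6
```

Compare this with the block distance of 5 for the same rows. Each Canberra term lies between 0 and 1, so differences between elements of small magnitude carry relatively more weight.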
The Jaccard row similarity is defined as
Then the Jaccard row distance is defined as
The Jaccard column distance and similarity are defined similarly, but the summation is over the rows rather than the columns.
The Pearson row distance is defined as
where R_ij is the correlation coefficient between rows i and j.
The Pearson row similarity is then defined as
The Pearson column distance and similarity are defined similarly, but the correlation is over the rows rather than the columns.
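The Hamming row distance is defined as
\[ D_{ij} = \sum_{k=1}^{p} I(X_{ik} \neq X_{jk}) \]
where I(.) is 1 if its argument is true and 0 otherwise; that is, the Hamming distance is the number of elements that differ between rows i and j.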
The column distance is similar, but the number of elements that differ is compared between two columns rather than two rows.
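For the hypothetical rows x = (1, 2, 3) and y = (4, 0, 3), the first two elements differ and the third agrees, so the Hamming distance counts two mismatches:

```latex
d_H(x, y) = \sum_{k=1}^{3} I(x_k \neq y_k) = 1 + 1 + 0 = 2
```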

Many multivariate techniques, such as cluster analysis and multidimensional scaling, are based on distance matrices.

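Syntax 1:

LET <d> = <metric> ROW DISTANCE <m>

This syntax computes row distances. Here <m> is the input matrix, <metric> is one of the distance types described above (EUCLIDEAN, MAHALANOBIS, MINKOWSKY, BLOCK, CHEBYCHEV, COSINE, CANBERRA, JACCARD, PEARSON, or HAMMING), and <d> is the resulting distance matrix.

Syntax 2:

LET <d> = <metric> COLUMN DISTANCE <m>

This syntax computes column distances.

Syntax 3:

LET <d> = <metric> ROW SIMILARITY <m>

This syntax computes row similarities. Similarities are defined above for the cosine, Jaccard, and Pearson metrics.

Syntax 4:

LET <d> = <metric> COLUMN SIMILARITY <m>

This syntax computes column similarities.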

Examples:

LET D = BLOCK ROW DISTANCE M
LET D = BLOCK COLUMN DISTANCE M

LET D = MAHALANOBIS ROW DISTANCE M
LET D = MAHALANOBIS COLUMN DISTANCE M

LET P = 1.5
LET D = MINKOWSKY ROW DISTANCE M
LET D = MINKOWSKY COLUMN DISTANCE M

LET D = COSINE ROW DISTANCE M
LET D = COSINE COLUMN DISTANCE M

LET D = COSINE ROW SIMILARITY M
LET D = COSINE COLUMN SIMILARITY M

LET D = JACCARD ROW DISTANCE M
LET D = JACCARD COLUMN DISTANCE M

LET D = JACCARD ROW SIMILARITY M
LET D = JACCARD COLUMN SIMILARITY M

LET D = PEARSON ROW DISTANCE M
LET D = PEARSON COLUMN DISTANCE M

LET D = PEARSON ROW SIMILARITY M
LET D = PEARSON COLUMN SIMILARITY M

Note:

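Matrices are created with either the READ MATRIX command or the MATRIX DEFINITION command. Enter HELP MATRIX DEFINITION and HELP READ MATRIX for details.

Note:

For the Minkowsky distance, the power P is specified by entering the following command before computing the distance:

LET P = <value>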

Note:

It is often desirable to scale the matrix before computing the distances. Dataplot provides several scaling options. Enter HELP MATRIX SCALE for details.

Note:

The correlation matrix and covariance matrix can be considered distance matrices as well.

Default:

None

Synonyms:

None

Related Commands:

READ MATRIX	=	Read a matrix.
MATRIX COLUMN DIMENSION	=	Dimension maximum number of columns for Dataplot matrices.
CORRELATION MATRIX	=	Compute the correlation matrix.
VARIANCE-COVARIANCE MATRIX	=	Compute the variance-covariance matrix.
DISTANCE FROM MEAN	=	Compute the distance from the mean for a matrix.

Reference:

"Applied Multivariate Statistical Analysis", Third Edition, Johnson and Wichern, Prentice-Hall, 1992.

Applications:

Multivariate Analysis

Implementation Date:

Program:

dimension 100 columns
set write decimals 4
set read missing value -999
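.
.  iflag1 selects the metric:
.     1 = Euclidean          2 = block              3 = Minkowsky (p = 1.5)
.     4 = Chebychev          5 = Jaccard distance   6 = Jaccard similarity
.     7 = cosine distance    8 = cosine similarity  9 = Hamming
.    10 = Canberra          11 = Pearson distance
.  To use another metric, comment out the active line and uncomment one below.
let iflag1 = 1
. let iflag1 = 2
. let iflag1 = 3
. let iflag1 = 4
. let iflag1 = 5
. let iflag1 = 6
. let iflag1 = 7
. let iflag1 = 8
. let iflag1 = 9
. let iflag1 = 10
. let iflag1 = 11
.
.  iflag2 selects column distances (1) or row distances (2)
let iflag2 = 1
. let iflag2 = 2
.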
skip 25
read matrix iris.dat x
.
if iflag1 = 1
   if iflag2 = 1
      let v = euclidean column distance x
   else if iflag2 = 2
      let v = euclidean row distance x
   end of if
else if iflag1 = 2
   if iflag2 = 1
      let v = block column distance x
   else if iflag2 = 2
      let v = block row distance x
   end of if
else if iflag1 = 3
   let p = 1.5
   if iflag2 = 1
      let v = minkowski column distance x
   else if iflag2 = 2
      let v = minkowski row distance x
   end of if
else if iflag1 = 4
   if iflag2 = 1
      let v = chebychev column distance x
   else if iflag2 = 2
      let v = chebychev row distance x
   end of if
else if iflag1 = 5
   if iflag2 = 1
      let v = jaccard column distance x
   else if iflag2 = 2
      let v = jaccard row distance x
   end of if
else if iflag1 = 6
   if iflag2 = 1
      let v = jaccard column similarity x
   else if iflag2 = 2
      let v = jaccard row similarity x
   end of if
else if iflag1 = 7
   if iflag2 = 1
      let v = cosine column distance x
   else if iflag2 = 2
      let v = cosine row distance x
   end of if
else if iflag1 = 8
   if iflag2 = 1
      let v = cosine column similarity x
   else if iflag2 = 2
      let v = cosine row similarity x
   end of if
else if iflag1 = 9
   if iflag2 = 1
      let v = hamming column distance x
   else if iflag2 = 2
      let v = hamming row distance x
   end of if
else if iflag1 = 10
   if iflag2 = 1
      let v = canberra column distance x
   else if iflag2 = 2
      let v = canberra row distance x
   end of if
else if iflag1 = 11
   if iflag2 = 1
      let v = pearson column distance x
   else if iflag2 = 2
      let v = pearson row distance x
   end of if
end of if
.
print v

 
        MATRIX V       --            5 ROWS
                       --            5 COLUMNS

 VARIABLES--V1             V2             V3             V4             V5      

         0.0000        36.1578        35.2624        61.4890        47.5358
        36.1578         0.0000        30.1846        29.3828        18.4770
        35.2624        30.1846         0.0000        37.6646        24.9542
        61.4890        29.3828        37.6646         0.0000        14.9040
        47.5358        18.4770        24.9542        14.9040         0.0000