
# COSINE DISTANCE, COSINE SIMILARITY, ANGULAR COSINE DISTANCE, ANGULAR COSINE SIMILARITY

Name:
COSINE DISTANCE (LET)
COSINE SIMILARITY (LET)
ANGULAR COSINE DISTANCE (LET)
ANGULAR COSINE SIMILARITY (LET)
Type:
Let Subcommand
Purpose:
Compute the cosine distance (or cosine similarity, angular cosine distance, angular cosine similarity) between two variables.
Description:
The cosine similarity is defined as

$$\mbox{Cosine Similarity} = \frac{\sum_{i=1}^{n}{x_{i} y_{i}}} {\sqrt{\sum_{i=1}^{n}{x_{i}^{2}}} \sqrt{\sum_{i=1}^{n}{y_{i}^{2}}}}$$

The cosine distance is then defined as

$$\mbox{Cosine Distance} = 1 - \mbox{Cosine Similarity}$$
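As a cross-check of the two formulas above, the following is a minimal Python sketch, not Dataplot's implementation. The function names `cosine_similarity` and `cosine_distance` and the sample vectors are illustrative assumptions.

```python
import math

def cosine_similarity(x, y):
    """Sum of products divided by the product of the two Euclidean norms."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        # The ratio is undefined when either vector is all zeros.
        raise ValueError("cosine similarity is undefined for an all-zero vector")
    return dot / (norm_x * norm_y)

def cosine_distance(x, y):
    """Cosine distance = 1 - cosine similarity."""
    return 1.0 - cosine_similarity(x, y)

# Hypothetical sample data (not taken from IRIS.DAT).
x = [3.0, 4.0]
y = [4.0, 3.0]
print("similarity:", cosine_similarity(x, y))
print("distance:  ", cosine_distance(x, y))
```

For these sample vectors the similarity is 24/25, since the dot product is 24 and both norms equal 5.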

The cosine distance above is defined for positive values only. It is also not a proper distance metric because it does not satisfy the triangle inequality. However, the following angular definitions are proper distances:

$$\mbox{angular cosine distance} = \frac{\arccos(\mbox{cosine similarity})} {\pi}$$

$$\mbox{angular cosine similarity} = 1 - \mbox{angular cosine distance}$$

If negative values are encountered in the input, the cosine distances will not be computed. However, the cosine similarities will be computed.
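One way to see why the plain cosine distance is not a proper metric is to test the triangle inequality on three vectors. The Python sketch below is illustrative only. The vectors `a`, `b`, `c` and the helper names are assumptions, and the angular form used is the unscaled arccos(similarity)/pi.

```python
import math

def cosine_similarity(x, y):
    dot = sum(p * q for p, q in zip(x, y))
    norm_x = math.sqrt(sum(p * p for p in x))
    norm_y = math.sqrt(sum(q * q for q in y))
    return dot / (norm_x * norm_y)

def cosine_distance(x, y):
    return 1.0 - cosine_similarity(x, y)

def angular_distance(x, y):
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    sim = max(-1.0, min(1.0, cosine_similarity(x, y)))
    return math.acos(sim) / math.pi

# Hypothetical non-negative vectors chosen so that b lies "between" a and c.
a = [1.0, 0.0, 0.0]
b = [1.0, 1.0, 1.0]
c = [0.0, 1.0, 0.0]

# Triangle inequality: d(a, c) <= d(a, b) + d(b, c)
cos_ok = cosine_distance(a, c) <= cosine_distance(a, b) + cosine_distance(b, c)
ang_ok = angular_distance(a, c) <= angular_distance(a, b) + angular_distance(b, c)

print("cosine distance satisfies triangle inequality: ", cos_ok)
print("angular distance satisfies triangle inequality:", ang_ok)
```

Here the cosine distance from `a` to `c` is 1, while the two legs through `b` sum to about 0.85, so the inequality fails for the cosine distance and holds for the angular form.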

NOTE: The 2018/08 version of Dataplot updated the definition for the angular cosine distance to

$$\mbox{angular cosine distance} = \frac{\mbox{c} \arccos(\mbox{cosine similarity})} {\pi}$$

with $$\arccos$$ designating the arccosine function and where c = 2 if there are no negative values and c = 1 if there are negative values.
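The 2018/08 definition can be sketched in Python as follows. This is an illustration of the formula as stated, not Dataplot's source code. The function names are assumptions, and "negative values" is interpreted here as a negative entry in either input variable.

```python
import math

def cosine_similarity(x, y):
    dot = sum(p * q for p, q in zip(x, y))
    norm_x = math.sqrt(sum(p * p for p in x))
    norm_y = math.sqrt(sum(q * q for q in y))
    return dot / (norm_x * norm_y)

def angular_cosine_distance(x, y):
    """c * arccos(cosine similarity) / pi, with c = 2 if there are no
    negative values and c = 1 otherwise."""
    # Clamp to [-1, 1] to guard acos against floating-point overshoot.
    sim = max(-1.0, min(1.0, cosine_similarity(x, y)))
    # Assumption: a negative value in either variable selects c = 1.
    has_negative = any(v < 0 for v in x) or any(v < 0 for v in y)
    c = 1.0 if has_negative else 2.0
    return c * math.acos(sim) / math.pi

def angular_cosine_similarity(x, y):
    """Angular cosine similarity = 1 - angular cosine distance."""
    return 1.0 - angular_cosine_distance(x, y)

# Hypothetical inputs: orthogonal non-negative vectors, then opposite vectors.
print(angular_cosine_distance([1.0, 0.0], [0.0, 1.0]))
print(angular_cosine_distance([1.0, 0.0], [-1.0, 0.0]))
```

The factor c keeps the result in the interval [0, 1] in both cases. Non-negative data can subtend at most a right angle, so arccos is at most pi/2 and is scaled by 2. Data with negative values can subtend up to pi, so no scaling is needed.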

Syntax 1:
LET <par> = COSINE DISTANCE <y1> <y2>
<SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
<par> is a parameter where the computed cosine distance is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 2:
LET <par> = COSINE SIMILARITY <y1> <y2>
<SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
<par> is a parameter where the computed cosine similarity is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 3:
LET <par> = ANGULAR COSINE DISTANCE <y1> <y2>
<SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
<par> is a parameter where the computed angular cosine distance is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 4:
LET <par> = ANGULAR COSINE SIMILARITY <y1> <y2>
<SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
<par> is a parameter where the computed angular cosine similarity is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
LET A = COSINE DISTANCE Y1 Y2
LET A = COSINE SIMILARITY Y1 Y2
LET A = ANGULAR COSINE DISTANCE Y1 Y2
LET A = ANGULAR COSINE SIMILARITY Y1 Y2
LET A = COSINE DISTANCE Y1 Y2 SUBSET TAG > 2

LET A = COSINE DISTANCE Y1 Y2 SUBSET Y1 > 0 SUBSET Y2 > 0

Note:
Dataplot statistics can be used in a number of commands. For details, enter

HELP STATISTICS

Default:
None
Synonyms:
None
Related Commands:
EUCLIDEAN DISTANCE = Compute the Euclidean distance.
MANHATTAN DISTANCE = Compute the Manhattan distance.
MATRIX DISTANCE = Compute various distance metrics for a matrix.
CORRELATION = Compute the correlation between two variables.
Reference:
John Foreman (2014), "Data Smart", Wiley.
Applications:
Robust Clustering
Implementation Date:
2017/06
2018/08: Modified formula for angular cosine distance
Program:

SKIP 25
READ IRIS.DAT Y1 TO Y4 X
.
LET COSDIST  = COSINE DISTANCE Y1 Y2
LET COSADIST = ANGULAR COSINE DISTANCE Y1 Y2
LET COSSIMI  = COSINE SIMILARITY Y1 Y2
LET COSASIMI = ANGULAR COSINE SIMILARITY Y1 Y2
SET WRITE DECIMALS 4
TABULATE COSINE DISTANCE Y1 Y2 X

Cross Tabulate COSINE DISTANCE

(Response Variables: Y1       Y2      )
---------------------------------------------
X          |   COSINE DISTANCE
---------------------------------------------
1.0000   |            0.0027
2.0000   |            0.0049
3.0000   |            0.0056

.
XTIC OFFSET 0.2 0.2
X1LABEL GROUP ID
LET NDIST = UNIQUE X
XLIMITS 1 NDIST
MAJOR X1TIC MARK NUMBER NDIST
MINOR X1TIC MARK NUMBER 0
CHAR X
LINE BLANK
LABEL CASE ASIS
CASE ASIS
TITLE CASE ASIS
TITLE OFFSET 2
.
MULTIPLOT CORNER COORDINATES 5 5 95 95
MULTIPLOT SCALE FACTOR 2
MULTIPLOT 2 2
.
Y1LABEL Cosine Distance
TITLE Cosine Distance (Sepal Length and Sepal Width)
COSINE DISTANCE PLOT Y1 Y2 X
.
Y1LABEL Cosine Similarity
TITLE Cosine Similarity (Sepal Length and Sepal Width)
COSINE SIMILARITY PLOT Y1 Y2 X
.
Y1LABEL Angular Cosine Distance
TITLE Angular Cosine Distance (Sepal Length and Sepal Width)
ANGULAR COSINE DISTANCE PLOT Y1 Y2 X
.
Y1LABEL Angular Cosine Similarity
TITLE Angular Cosine Similarity (Sepal Length and Sepal Width)
ANGULAR COSINE SIMILARITY PLOT Y1 Y2 X
.
END OF MULTIPLOT
JUSTIFICATION CENTER
MOVE 50 98
TEXT Distance/Similarity Measures (IRIS.DAT)
.
BOOTSTRAP SAMPLES 1000
CHAR X ALL
LINE BLANK ALL
BOOTSTRAP COSINE DISTANCE PLOT Y1 Y2 X

NIST is an agency of the U.S. Commerce Department.

Date created: 07/03/2017
Last updated: 07/03/2017