COSINE DISTANCE
COSINE SIMILARITY
ANGULAR COSINE DISTANCE
ANGULAR COSINE SIMILARITY
Name:
COSINE DISTANCE (LET)
COSINE SIMILARITY (LET)
ANGULAR COSINE DISTANCE (LET)
ANGULAR COSINE SIMILARITY (LET)
Type:
Let Subcommand
Purpose:
Compute the cosine distance (or cosine similarity, angular
cosine distance, angular cosine similarity) between two variables.
Description:
The cosine similarity is defined as
\( \mbox{Cosine Similarity} = \frac{\sum_{i=1}^{n}{x_{i} y_{i}}}
{\sqrt{\sum_{i=1}^{n}{x_{i}^{2}}}
\sqrt{\sum_{i=1}^{n}{y_{i}^{2}}}} \)
The cosine distance is then defined as
\( \mbox{Cosine Distance} = 1 - \mbox{Cosine Similarity} \)
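As a cross-check outside Dataplot, the two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not Dataplot's implementation. The function names `cosine_similarity` and `cosine_distance` are hypothetical, and raising an error for an all-zero input is an assumption, since the text does not say how that case is handled.

```python
import math

def cosine_similarity(x, y):
    """Sum of x_i * y_i divided by the product of the two Euclidean norms."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        # Assumption: the similarity is treated as undefined for a zero vector.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)

def cosine_distance(x, y):
    """One minus the cosine similarity."""
    return 1.0 - cosine_similarity(x, y)
```

For example, x = (1, 2, 3) and y = (2, 4, 6) point in the same direction, so the similarity is 1 and the distance is 0 up to floating-point rounding. The orthogonal pair (1, 0) and (0, 1) gives a similarity of 0 and a distance of 1.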
The cosine distance above is defined for non-negative values
only. It is also not a proper distance in that the triangle
inequality does not hold. However, the angular cosine distance,
defined in the NOTE below, is a proper distance.
If negative values are encountered in the input, the
cosine distances will not be computed. However, the
cosine similarities will be computed.
NOTE: The 2018/08 version of Dataplot updated the definition
for the angular cosine distance to
\( \mbox{angular cosine distance} =
\frac{\mbox{c} \arccos(\mbox{cosine similarity})} {\pi} \)
with \( \arccos \) designating the arccosine function and where
c = 2 if there are no negative values and c = 1 if
there are negative values.
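The angular definition in this note can be sketched in Python as follows. This is an illustration under stated assumptions, not Dataplot's code. The function names are hypothetical, and the clamp of the cosine similarity to [-1, 1] is an added safeguard against rounding. The angular cosine similarity is taken to be one minus the angular cosine distance, which is the usual convention but is not stated in the text above. The last few lines check, on one small example with non-negative vectors, that the plain cosine distance can violate the triangle inequality while the angular cosine distance does not.

```python
import math

def _cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def _cosine_distance(x, y):
    return 1.0 - _cosine_similarity(x, y)

def angular_cosine_distance(x, y):
    """c * arccos(cosine similarity) / pi, with c = 2 when neither input
    contains a negative value and c = 1 otherwise."""
    # Clamp so that rounding error cannot push the value outside acos's domain.
    sim = max(-1.0, min(1.0, _cosine_similarity(x, y)))
    has_negative = any(v < 0 for v in x) or any(v < 0 for v in y)
    c = 1.0 if has_negative else 2.0
    return c * math.acos(sim) / math.pi

def angular_cosine_similarity(x, y):
    # Assumption: similarity is defined as 1 - distance (not stated in the text).
    return 1.0 - angular_cosine_distance(x, y)

# Triangle inequality check on three non-negative vectors.
p, q, r = (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)

# Cosine distance: d(p, r) = 1, but d(p, q) + d(q, r) is about 0.586.
cosine_violates = _cosine_distance(p, r) > _cosine_distance(p, q) + _cosine_distance(q, r)

# Angular cosine distance: d(p, r) = 1 and d(p, q) + d(q, r) = 0.5 + 0.5.
angular_holds = angular_cosine_distance(p, r) <= (
    angular_cosine_distance(p, q) + angular_cosine_distance(q, r) + 1e-12
)
```

The small tolerance in the last comparison only absorbs floating-point rounding. In exact arithmetic the two sides are equal for this example.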
Syntax 1:
LET <par> = COSINE DISTANCE <y1> <y2>
<SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
<par> is a parameter where the computed cosine distance
is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 2:
LET <par> = COSINE SIMILARITY <y1> <y2>
<SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
<par> is a parameter where the computed cosine similarity
is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 3:
LET <par> = ANGULAR COSINE DISTANCE <y1> <y2>
<SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
<par> is a parameter where the computed angular cosine
distance is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Syntax 4:
LET <par> = ANGULAR COSINE SIMILARITY <y1> <y2>
<SUBSET/EXCEPT/FOR qualification>
where <y1> is the first response variable;
<y2> is the second response variable;
<par> is a parameter where the computed angular cosine
similarity is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
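Examples:
LET A = COSINE DISTANCE Y1 Y2
LET A = COSINE SIMILARITY Y1 Y2
LET A = ANGULAR COSINE DISTANCE Y1 Y2
LET A = ANGULAR COSINE SIMILARITY Y1 Y2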
Note:
Dataplot statistics can be used in a number of commands. For
details, enter
HELP STATISTICS
Default:
Synonyms:
Related Commands:
Reference:
John Foreman (2014), "Data Smart", Wiley.
Applications:
Implementation Date:
2017/06
2018/08: Modified formula for angular cosine distance
Program:
SKIP 25
READ IRIS.DAT Y1 TO Y4 X
.
LET COSDIST = COSINE DISTANCE Y1 Y2
LET COSADIST = ANGULAR COSINE DISTANCE Y1 Y2
LET COSSIMI = COSINE SIMILARITY Y1 Y2
LET COSASIMI = ANGULAR COSINE SIMILARITY Y1 Y2
SET WRITE DECIMALS 4
TABULATE COSINE DISTANCE Y1 Y2 X
          Cross Tabulate COSINE DISTANCE
          (Response Variables: Y1 Y2)
---------------------------------------------
       X        |   COSINE DISTANCE
---------------------------------------------
     1.0000     |        0.0027
     2.0000     |        0.0049
     3.0000     |        0.0056
.
XTIC OFFSET 0.2 0.2
X1LABEL GROUP ID
LET NDIST = UNIQUE X
XLIMITS 1 NDIST
MAJOR X1TIC MARK NUMBER NDIST
MINOR X1TIC MARK NUMBER 0
CHAR X
LINE BLANK
LABEL CASE ASIS
CASE ASIS
TITLE CASE ASIS
TITLE OFFSET 2
.
MULTIPLOT CORNER COORDINATES 5 5 95 95
MULTIPLOT SCALE FACTOR 2
MULTIPLOT 2 2
.
Y1LABEL Cosine Distance
TITLE Cosine Distance (Sepal Length and Sepal Width)
COSINE DISTANCE PLOT Y1 Y2 X
.
Y1LABEL Cosine Similarity
TITLE Cosine Similarity (Sepal Length and Sepal Width)
COSINE SIMILARITY PLOT Y1 Y2 X
.
Y1LABEL Angular Cosine Distance
TITLE Angular Cosine Distance (Sepal Length and Sepal Width)
ANGULAR COSINE DISTANCE PLOT Y1 Y2 X
.
Y1LABEL Angular Cosine Similarity
TITLE Angular Cosine Similarity (Sepal Length and Sepal Width)
ANGULAR COSINE SIMILARITY PLOT Y1 Y2 X
.
END OF MULTIPLOT
JUSTIFICATION CENTER
MOVE 50 98
TEXT Distance/Similarity Measures (IRIS.DAT)
.
BOOTSTRAP SAMPLES 1000
CHAR X ALL
LINE BLANK ALL
BOOTSTRAP COSINE DISTANCE PLOT Y1 Y2 X
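Judging from the output table, the TABULATE COSINE DISTANCE Y1 Y2 X step in the program above reports one cosine distance between Y1 and Y2 for each distinct value of X. A minimal Python sketch of that group-wise computation is given here as a cross-check. It is not Dataplot's implementation: the function names are hypothetical, and the data are made-up toy values, not the contents of IRIS.DAT.

```python
import math
from collections import defaultdict

def cosine_distance(x, y):
    """One minus the cosine similarity of two equal-length sequences."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norms

def cosine_distance_by_group(y1, y2, group):
    """Return {group id: cosine distance between y1 and y2 within that group}."""
    rows = defaultdict(lambda: ([], []))
    for a, b, g in zip(y1, y2, group):
        rows[g][0].append(a)
        rows[g][1].append(b)
    # Sort by group id so the result reads like the tabulated output.
    return {g: cosine_distance(xs, ys) for g, (xs, ys) in sorted(rows.items())}

# Hypothetical toy data (not values from IRIS.DAT).
y1 = [1.0, 2.0, 3.0, 1.0, 0.0]
y2 = [2.0, 4.0, 6.0, 0.0, 1.0]
x = [1, 1, 1, 2, 2]

result = cosine_distance_by_group(y1, y2, x)
for group_id, dist in result.items():
    print(f"{group_id:10.4f} | {dist:10.4f}")
```

In this toy data the two variables are proportional within group 1, so the distance there is 0 up to rounding. Within group 2 they are orthogonal, so the distance is 1.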
NIST is an agency of the U.S.
Commerce Department.
Date created: 07/03/2017
Last updated: 07/03/2017
Please email comments on this WWW page to
alan.heckert@nist.gov.