K NEAREST NEIGHBORS CLASSIFICATION PLOT

Name:
    K NEAREST NEIGHBORS CLASSIFICATION PLOT
Description:
    The data consist of both training data and observations to be
    classified. The training data are observations where the group-id
    is known. The k nearest neighbors method classifies an observation
    based on the most common class among its k nearest training
    observations. If there is a tie, the observation is assigned to the
    tied class with the smallest combined distance to the observation.
    By default, the criterion for "nearest" is the Euclidean distance
    between the observation and a training observation. See the Note
    section below for how to use a different distance metric.

    If there are two variables, the y-axis is the first variable and
    the x-axis is the second variable. If there are more than two
    variables, the y-axis is the first principal component of all of
    the variables and the x-axis is the second principal component of
    all of the variables. In the case with more than two variables,
    you can alternatively use the first two variables rather than the
    principal components by entering the command

        COMPONENTS NO

    To reset the default, enter

        COMPONENTS YES

    If there are L categories, the first L traces in the plot are the
    L categories for the training data. Similarly, traces L+1 to 2*L
    are the L categories for the observations to be classified. The
    ordering is from the low value of the category to the high value
    of the category. For example, if there are two categories, you
    might do something like

        CHARACTER CIRCLE SQUARE CIRCLE SQUARE
        CHARACTER FILL OFF OFF ON ON
        CHARACTER COLOR BLACK BLACK RED BLUE

    This will draw the training observations as unfilled black circles
    and squares and the observations to be classified as filled red
    circles or filled blue squares. This is demonstrated in the
    Program example below.
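The classification rule described above (majority vote among the k nearest training observations, with ties broken by the smallest combined distance to the observation) can be sketched in Python. The function `knn_classify` and the sample data are illustrative only, not part of Dataplot:

```python
import math
from collections import defaultdict

def knn_classify(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training
    points (Euclidean distance), breaking ties by the smallest
    combined distance to the query."""
    # Distance from the query to every training observation.
    dists = [(math.dist(p, query), label)
             for p, label in zip(train_points, train_labels)]
    dists.sort(key=lambda t: t[0])
    nearest = dists[:k]

    # Tally votes and the combined distance per class.
    votes = defaultdict(int)
    total_dist = defaultdict(float)
    for d, label in nearest:
        votes[label] += 1
        total_dist[label] += d

    # Most votes wins; a tie goes to the class with the smallest
    # combined distance.
    return min(votes, key=lambda c: (-votes[c], total_dist[c]))

# Two training groups (hypothetical data) and one observation to classify.
train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
labels = [1, 1, 1, 2, 2]
print(knn_classify(train, labels, (0.5, 0.5), k=3))  # -> 1
```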
Syntax:
    K NEAREST NEIGHBORS CLASSIFICATION PLOT <y1> ... <yk> <tag>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y1> ... <yk> is a list of response variables;
          <tag> is the group-id variable;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    All of the variables must have the same length. Rows where the
    <tag> variable has a value of zero are the observations to be
    classified. If no values of the <tag> variable are zero, an error
    is reported and no plot is generated.
Examples:
    K NEAREST NEIGHBORS CLASSIFICATION PLOT X1 X2 X3 TAG
    K NEAREST NEIGHBORS CLASSIFICATION PLOT X1 X2 X3 TAG SUBSET TAG >= 0
Note:
    To specify the value of k, enter the command

        SET NEAREST NEIGHBOR CLASSIFICATION K <value>

    where <value> is a positive integer. There is a trade-off in
    setting the value of k: larger values of k can reduce the effect
    of noise on the classification, at the cost of making the
    boundaries between classes less distinct.
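This trade-off can be seen in a small, hypothetical Python sketch (the data and the `knn_vote` helper are illustrative only): with k = 1, a single noisy training point determines the classification; with k = 5, the surrounding cluster outvotes the noise.

```python
import math
from collections import Counter

def knn_vote(train, labels, query, k):
    # Majority vote among the k nearest neighbors (Euclidean distance).
    nearest = sorted(zip(train, labels),
                     key=lambda t: math.dist(t[0], query))[:k]
    return Counter(lbl for _, lbl in nearest).most_common(1)[0][0]

# Class "A" cluster near the origin plus one noisy "B" point sitting
# inside it; class "B" cluster far away.
train = [(0, 0), (0, 1), (1, 0), (1, 1),   # A cluster
         (0.4, 0.4),                        # noisy B observation
         (5, 5), (5, 6), (6, 5)]            # B cluster
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]

query = (0.5, 0.5)
print(knn_vote(train, labels, query, k=1))  # -> B (the noisy point wins)
print(knn_vote(train, labels, query, k=5))  # -> A (the noise is outvoted)
```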
Note:
    A different distance metric can be selected with the corresponding
    SET command, where <distance> is one of the following:

        MINKOWSKY
        BLOCK
        CANBERRA
        CHEBYCHEV
        COSINE
        ANGULAR COSINE
        JACCARD
        PEARSON
        HAMMING

    Enter HELP MATRIX DISTANCE to see the definitions of these
    distances.
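For reference, a few of these metrics can be sketched in Python using their common textbook definitions; Dataplot's exact definitions are the ones given by HELP MATRIX DISTANCE and may differ in detail:

```python
import math

def block(u, v):                  # city-block / Manhattan distance
    return sum(abs(a - b) for a, b in zip(u, v))

def chebychev(u, v):              # maximum coordinate difference
    return max(abs(a - b) for a, b in zip(u, v))

def canberra(u, v):               # weighted variant of the block distance
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(u, v) if abs(a) + abs(b) > 0)

def minkowski(u, v, p=3):         # generalizes block (p=1), Euclidean (p=2)
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)

def cosine(u, v):                 # 1 minus the cosine of the angle u, v
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

u, v = (1.0, 2.0, 3.0), (4.0, 6.0, 8.0)
print(block(u, v))        # -> 12.0
print(chebychev(u, v))    # -> 5.0
```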
Note:
    The K NEAREST NEIGHBORS CLASSIFICATION PLOT command does not
    standardize the data. If you want to standardize the data, do so
    before entering this command; this is demonstrated in the Program
    example below. Performing the standardization as a separate step
    allows more flexibility in the choice of standardization method.
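As a sketch of that separate standardization step, a z-score transform (the same idea as the LET X = ZSCORE Y lines in the Program example) looks like this in Python; the function name and sample column are illustrative:

```python
import math

def zscore(xs):
    """Standardize a column to mean 0 and sample standard deviation 1
    (n - 1 in the denominator)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return [(x - mean) / sd for x in xs]

col = [2.0, 4.0, 6.0, 8.0]
z = zscore(col)
print([round(v, 3) for v in z])  # -> [-1.162, -0.387, 0.387, 1.162]
```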
Note:
    If there are more than 50 variables, it is recommended that some
    type of dimension reduction, such as principal components, be
    applied first.
Default:
    When there are more than two variables, the first two principal
    components are used for the plot. Euclidean distances are used.
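The default projection of more than two variables onto the first two principal components can be sketched in Python with NumPy. This assumes an SVD-based principal components computation on centered, otherwise unscaled data; Dataplot's internal computation may differ in detail:

```python
import numpy as np

def first_two_pcs(X):
    """Project the rows of X (n observations x p variables) onto the
    first two principal components."""
    Xc = X - X.mean(axis=0)            # center each column
    # SVD of the centered data: the rows of Vt are the PC directions.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T               # scores on PC1 and PC2

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))           # 20 observations, 4 variables
scores = first_two_pcs(X)
print(scores.shape)                    # -> (20, 2)
```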
Synonyms:
    K NEAREST NEIGHBORS DISCRIMINATION PLOT
    KNN CLASSIFICATION PLOT
    KNN DISCRIMINATION PLOT
    KNN PLOT
Program:
    . Step 1: Read the data and standardize based on z-scores
    .
    skip 25
    read iris.dat seplen sepwidth petlen petwidth tag
    skip 0
    let x1 = zscore seplen
    let x2 = zscore sepwidth
    let x3 = zscore petlen
    let x4 = zscore petwidth
    .
    . Step 2: Generate the plot
    .
    line blank all
    character hw 1 0.75 all
    character circle triangle revtri circle triangle revtri
    character fill off off off on on on
    character color black black black blue red green
    .
    y1label First Principal Component
    x1label Second Principal Component
    title K Nearest Neighbors Classification Plot for Iris Data
    .
    . Step 3: Specify rows to be classified
    .
    let tag2 = tag
    let tag2 = 0 for i = 10 10 150
    let ktemp = 3
    set nearest neighbor classification k ^ktemp
    x2label K = ^ktemp
    .
    k nearest neighbors classification plot x1 x2 x3 x4 tag2
Date created: 07/31/2024
Last updated: 07/31/2024

Please email comments on this WWW page to alan.heckert@nist.gov.