
CLASSIFICATION SCATTER PLOT

Name:
    CLASSIFICATION SCATTER PLOT
Type:
    Graphics Command
Purpose:
    Generates a classification scatter plot.
Description:
    This plot is a variant of the dex scatter plot. For the dex scatter plot, the first variable is a response variable and the remaining variables are factor variables. The factor variables are typically qualitative variables (this plot is most typically used in the context of 2-level designed experiments, but it can be used for the case when there are more than two levels for some factors). For the dex scatter plot, a separate subplot is drawn for each factor with the subplot for factor k centered horizontally at x=k. Each subplot has a given horizontal width (defined by the DEX WIDTH command, defaults to 0.5). For example, the subplot for factor 2 ranges from 1.8 to 2.2 on the horizontal axis. The levels of the factor are assigned an x coordinate within this range (from lowest to highest). Then within each subplot:

      Vertical axis = value of the response variable;
      Horizontal axis = value of the level of a given factor.

    The classification scatter plot reverses the roles of the response variable and the factor variables. For the classification scatter plot, the Y axis variable is assumed to be qualitative (i.e., it takes a specific number of levels) and the factor variables are assumed to be continuous (the plot will still work if some of the factor variables are also qualitative). The context is the common classification problem, where the values of the factor variables are used to determine which group an observation belongs to.

    For this plot, the subplots are based on the distinct levels of the response variable. For example, suppose the Y axis variable (Y) has two possible values. Then for the first factor variable (X1), we plot the values of X1 corresponding to Y = 1 at x-coordinate 0.8 and the values of X1 corresponding to Y = 2 at x-coordinate 1.2. A similar subplot is created for each factor variable.
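
    The placement of the response levels within each subplot can be illustrated with a short Python sketch (Python rather than Dataplot, purely for illustration). It assumes the levels are spaced evenly across the subplot and that the subplot extends 0.2 on either side of the factor index; both assumptions are inferred from the 0.8 and 1.2 coordinates in the example above rather than taken from the Dataplot source. The function name and the `half_width` parameter are hypothetical.

```python
def level_x_coordinates(factor_index, n_levels, half_width=0.2):
    """Return one x-coordinate per response level for one factor's subplot.

    Assumption: levels are spread evenly over
    [factor_index - half_width, factor_index + half_width].
    """
    if n_levels == 1:
        # A single level sits at the center of the subplot.
        return [float(factor_index)]
    step = 2 * half_width / (n_levels - 1)
    # Round to suppress floating-point noise such as 1.2000000000000002.
    return [round(factor_index - half_width + j * step, 6)
            for j in range(n_levels)]


print(level_x_coordinates(1, 2))  # [0.8, 1.2]
print(level_x_coordinates(4, 3))  # [3.8, 4.0, 4.2]
```

    With two response levels this reproduces the 0.8 and 1.2 coordinates for the first factor variable; with three levels the middle level falls at the factor index itself.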

    This plot can be useful for determining which factors are most important for a classification.

Syntax:
    CLASSIFICATION SCATTER PLOT <y> <x1> ... <xk>
                            <SUBSET/EXCEPT/FOR qualification>
    where <y> is the response variable (qualitative); <x1> ... <xk> is a list of one or more factor variables; and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
    CLASSIFICATION SCATTER PLOT Y X1 X2
    CLASSIFICATION SCATTER PLOT Y X1 X2 X3
    CLASSIFICATION SCATTER PLOT Y X1 X2 X3 X4
    CLASSIFICATION SCATTER PLOT Y X1 TO X4
Note:
    The TO syntax is allowed for the list of factor variables (see the last example above).
Note:
    The CHARACTER and LINE settings can be used to control the appearance of the plot. If there are m levels for the response variable and k factor variables, the first m*k traces define the values corresponding to each response/factor variable combination (i.e., each column of the plot is assigned a different trace, numbered from left to right on the plot). This can be useful if you would like to color code the levels of the response variable or give them some other identifying attribute. Trace m*k + 1 draws a reference line. In this case, the reference line is the mean y value for the points on the plot. This is demonstrated in the Program example below.
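
    The trace numbering described above can be sketched in Python (again as an illustration only, not as Dataplot code). The helper names are hypothetical. The sketch assumes that, within each factor's subplot, the response levels appear from left to right in increasing order, so that the column for factor l and level j is trace (l-1)*m + j.

```python
def trace_number(factor, level, n_levels):
    """1-based trace number of the column for (factor, level), left to right."""
    return (factor - 1) * n_levels + level


def reference_trace(n_levels, n_factors):
    """Trace number of the reference line: one past the last data trace."""
    return n_levels * n_factors + 1


def per_trace_settings(level_settings, n_factors):
    """Repeat one setting per response level across all factors, in trace order."""
    return list(level_settings) * n_factors


# Three response levels and four factors give 12 data traces plus
# a reference line.
colors = per_trace_settings(["BLUE", "RED", "GREEN"], 4)
print("CHARACTER COLOR " + " ".join(colors))
print(trace_number(2, 1, 3))   # 4
print(reference_trace(3, 4))   # 13
```

    For three levels and four factors, the generated list repeats BLUE RED GREEN four times, so each response level keeps the same color in every subplot, and the reference line is trace 13.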

    For each trace, the mean, standard deviation, minimum, and maximum values for that trace are written to the file dpst4f.dat. This can be useful for annotating the plot.
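
    As a rough Python illustration of the four per-trace statistics, the sketch below groups the values of one factor variable by response level and summarizes each group. The function name and the toy data are made up for illustration, the sample (n-1) standard deviation is an assumption, and the sketch does not reproduce the layout of dpst4f.dat, which is not described here.

```python
from statistics import mean, stdev


def trace_statistics(y, x):
    """Return {level: (mean, sd, min, max)} for the values of x at each level of y.

    Assumption: the standard deviation is the sample (n-1) form, so every
    level needs at least two observations.
    """
    groups = {}
    for level, value in zip(y, x):
        groups.setdefault(level, []).append(value)
    return {
        level: (mean(values), stdev(values), min(values), max(values))
        for level, values in sorted(groups.items())
    }


# Toy data (not taken from any real data set): two response levels.
y = [1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]
for level, stats in trace_statistics(y, x).items():
    print(level, stats)
```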

Default:
    None
Synonyms:
    None
Related Commands:
Applications:
    Classification
Implementation Date:
    2019/03
Program:
     
    . Step 1:   Read the data
    .
    SET WRITE DECIMALS 3
    DIMENSION 40 COLUMNS
    SKIP 25
    READ IRIS.DAT X1 TO X4 Y
    SKIP 0
    .
    LET NFACT = 4
    LET STRING T1 = Sepal Length
    LET STRING T2 = Sepal Width
    LET STRING T3 = Petal Length
    LET STRING T4 = Petal Width
    .
    LOOP FOR K = 1 1 NFACT
        LET MEAN^K = MEAN X^K; LET MEAN^K = ROUND(MEAN^K,3)
        LET SD^K = SD X^K; LET SD^K = ROUND(SD^K,3)
    END OF LOOP
    .
    . Step 2:   Set plot control features
    .
    CASE ASIS
    TITLE CASE ASIS
    LABEL CASE ASIS
    TIC MARK LABEL CASE ASIS
    TITLE OFFSET 2
    .
    CHARACTERS 1 2 3 1 2 3 1 2 3 1 2 3 BLANK
    CHARACTER COLOR BLUE RED GREEN BLUE RED GREEN BLUE RED GREEN BLUE RED GREEN
    LINES COLOR BLUE RED GREEN BLUE RED GREEN BLUE RED GREEN BLUE RED GREEN
    LET PLOT LINE 13 = BLANK
    XLIMITS 1 NFACT
    MAJOR XTIC MARK NUMBER NFACT
    MINOR XTIC MARK NUMBER 0
    TIC MARK OFFSET UNITS DATA
    XTIC OFFSET 1 1
    XTIC LABEL FORMAT ALPHA
    XTIC LABEL CONTENT F1:sp()Sepalcr()Length F2:sp()Sepalcr()Width ...
                       F3:sp()Petalcr()Length F4:sp()Petalcr()Width
    Y1LABEL Standardized Feature
    X1LABEL Features
    X1LABEL DISPLACEMENT 12
    YLIMITS -4 4
    .
    LET X1 = STANDARDIZE X1
    LET X2 = STANDARDIZE X2
    LET X3 = STANDARDIZE X3
    LET X4 = STANDARDIZE X4
    .
    . Step 3:   Generate plots
    .
    TITLE Classification Scatter Plot: Standardized Units
    CLASSIFICATION SCATTER PLOT Y X1 X2 X3 X4
    .
        
    TITLE IRIS Classification Analysis Based on Standardized Data
    .
    CLASSIFICATION SCATTER PLOT Y X1 X2 X3 X4
    .
    LET XCOOR1 = 86
    LET XCOOR2 = 88
    LET YCOOR  = 89
    LET YINC   = 2.5
    JUSTIFICATION LEFT
    COLOR BLACK
    HEIGHT 2
    .
    LOOP FOR K = 1 1 NFACT
        MOVE XCOOR1 YCOOR
        TEXT F^K: ^T^K
        LET YCOOR = YCOOR - YINC
        MOVE XCOOR2 YCOOR
        TEXT Mean = ^MEAN^K
        LET YCOOR = YCOOR - YINC
        MOVE XCOOR2 YCOOR
        TEXT SD   = ^SD^K
        LET YCOOR = YCOOR - YINC
    END OF LOOP
    .
    COLOR BLUE
    MOVE XCOOR1 45
    TEXT Cat1: Setosa
    .
    COLOR RED
    MOVE XCOOR1 42.5
    TEXT Cat2: Versicolor
    .
    COLOR GREEN
    MOVE XCOOR1 40
    TEXT Cat3: Virginica
    .
    . Step 4:   Annotate the plot with the per-trace statistics from dpst4f.dat
    .
    skip 1
    read dpst4f.dat ymean ysd ymin ymax
    skip 0
    let ymean = round(ymean,1)
    let ysd   = round(ysd,1)
    let ymin  = round(ymin,1)
    let ymax  = round(ymax,1)
    .
    character blank all
    character size 1.5 all
    character just right all
    character color blue blue blue blue red red red red green green green green
    let nlen = 1
    let sblank = blank string nlen
    .
    set substitute format f4.1
    loop for l = 1 1 nfact
        loop for k = 1 1 3
            let k2 = (l-1)*3 + k
            let aval = ymax(k2)
            let bval = ymean(k2)
            let cval = ymin(k2)
            let dval = ysd(k2)
            let string s2 = ^aval ^bval ^cval ^dval
            if k = 1
               let string s = ^s2
            else
               let s = string concatenate s sblank s2
            end of if
        end of loop
        character ^s
        let xpos = sequence 0.8 4 0.2 1.21
        let xpos = (l - 1) + xpos
        let ypos = sequence 26 -1.5 21 for i = 1 1 12
        let tag = sequence 1 1 12
        drawds symbol xpos ypos tag
        delete s s2
    end of loop
    .
    color black
    height 1.5
    justification left
    move xcoor1 25.5
    text Max
    move xcoor1 24
    text Mean
    move xcoor1 22.5
    text Min
    move xcoor1 21
    text SD
    .
    . Step 5:   Add the classification rule and its cutoff lines
    .
    height 2
    just left
    move 2 7.5
    text if F3 <= -0.7, then cat = 1
    move 2 5
    text if F4 >=  0.4, then cat = 3
    move 2 2.5
    text else                cat = 2
    .
    line color black
    line dotted
    drawsdsd 15 0.4 85 0.4
    drawsdsd 15 -0.7 85 -0.7
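
    The classification rule that the program above writes onto the plot can be expressed as a small Python function (an illustration, not part of the Dataplot program). F3 and F4 are the standardized petal length and petal width, and categories 1, 2, and 3 are Setosa, Versicolor, and Virginica. The `standardize` helper assumes the usual (x - mean)/sd transformation with the sample standard deviation; the function names and the input values shown are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Assumed z-score transformation: subtract the mean, divide by the sample SD."""
    center = mean(values)
    spread = stdev(values)
    return [(v - center) / spread for v in values]


def classify(f3, f4):
    """Apply the rule annotated on the plot to standardized F3 and F4."""
    if f3 <= -0.7:
        return 1  # Setosa
    if f4 >= 0.4:
        return 3  # Virginica
    return 2      # Versicolor


# Hypothetical standardized (F3, F4) pairs, chosen only to exercise each branch.
print(classify(-1.3, -1.2))  # 1
print(classify(0.3, 0.1))    # 2
print(classify(1.0, 1.4))    # 3
```

    The two cutoffs, -0.7 and 0.4, are the values at which the program draws the dotted horizontal lines.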
        
Date created: 03/14/2019
Last updated: 12/04/2023

Please email comments on this WWW page to alan.heckert@nist.gov.