Name: SCATTER PLOT MATRIX
The pairwise plots need not be limited to scatter plots. Dataplot allows you to generate the pairwise plots for approximately 10 different plot types. There are a number of alternatives for the appearance of this plot. Dataplot tries to balance simplicity with flexibility by using default settings, but providing numerous SET commands to control the appearance of the plot. These are described in detail in the NOTES section below.
<SUBSET/EXCEPT/FOR qualification> where <y1> through <yk> are the response variables; and where the <SUBSET/EXCEPT/FOR qualification> is optional. Up to 25 response variables can be specified.
<SUBSET/EXCEPT/FOR qualification> where <y1> through <yk> are the response variables; <tag> is a group id variable (and is always given last); and where the <SUBSET/EXCEPT/FOR qualification> is optional. This is a special form of the command that plots
<SUBSET/EXCEPT/FOR qualification> where <y1> through <yk> are the response variables; <stat> defines a statistic, such as MEAN or MEDIAN, for the plot; and where the <SUBSET/EXCEPT/FOR qualification> is optional. This is a special form of the command that plots
for the individual plots. <stat> is optional. If no statistic is specified, an INTERACTION PLOT of the raw data is generated.
SCATTER PLOT MATRIX Y1 Y2 Y3 Y4 Y5 SUBSET TAG > 2
where <value> is one of the following. The following plot two variables (e.g., BIHISTOGRAM Y1 Y2).
The following plot Y X1 X2 (e.g., DEX CONTOUR PLOT Y X1 X2). That is, the response variable is the first variable in the list, and it remains constant for all the pairwise plots.
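As a minimal sketch, the following selects a quantile-quantile plot for each pair of variables in place of the default scatter plot. The TYPE setting is the one used in the sample programs on this page. The variable names Y1 through Y4 are placeholders for data you have already read.

```dataplot
. Use a quantile-quantile plot for each pairwise plot
. (Y1 ... Y4 are placeholder variable names)
SET SCATTER PLOT MATRIX TYPE QUANTILE QUANTILE
SCATTER PLOT MATRIX Y1 Y2 Y3 Y4
```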
A few of the above plots support a <statistic> option. This can be one of 30+ supported statistics (the supported statistics are identical to those for the STATISTIC PLOT and the BOOTSTRAP PLOT). It is typically a location statistic (e.g., MEAN, MEDIAN) or a scale statistic (e.g., STANDARD DEVIATION, VARIANCE, MAD). Dataplot automatically defines X1LABEL, X2LABEL, and YLABEL commands for these plots. You can control the attributes of these labels with the standard label setting commands. If you have defined variable labels (with the VARIABLE LABEL command), these will automatically be substituted for variable names in the labels. If you have defined variable labels with the VARIABLE LABEL command and you want to suppress the automatic expansion of the variable name to the variable label, enter
To restore the default that variable names will be expanded to the corresponding variable label, enter
OFF means that all axis labels are suppressed (this can be useful if a large number of variables are being plotted). ON means that both X and Y axis labels are printed. XON only plots the x axis labels and YON only plots the y axis labels. BOX is a special option that creates an extra column on the left and an extra row on the bottom. The axis label is printed in this box. The default is ON (both x and y axis labels are printed).
BOTTOM specifies that the x axis labels are printed on the bottom axis (on the last row only). TOP specifies that the x axis labels are printed on the top axis (first row only). ALTERNATE specifies that the x axis labels alternate between the top (first row) and bottom axis (last row). We recommend using the TIC OFFSET command to avoid overlap of axis labels and tic marks. The default is ALTERNATE.
LEFT specifies that the y axis labels are printed on the left axis (on the first column only). RIGHT specifies that the y axis labels are printed on the right axis (last column only). ALTERNATE specifies that the y axis labels alternate between the left (first column) and right axis (last column). We recommend using the TIC OFFSET command to avoid overlap of axis labels and tic marks. The default is ALTERNATE.
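The label options above might be combined as in the following sketch, which prints both sets of axis labels, x labels on the bottom row only and y labels on the first column only. It uses only settings that appear in the sample programs on this page. The TIC OFFSET values are the typical ones quoted there, not required values, and Y1 through Y4 are placeholder variable names.

```dataplot
. Print both x and y axis labels
SET SCATTER PLOT MATRIX LABELS ON
. x labels on the bottom row only, y labels on the first column only
SET SCATTER PLOT MATRIX X AXIS BOTTOM
SET SCATTER PLOT MATRIX Y AXIS LEFT
. Offset the tic marks to reduce overlap with the axis labels
TIC OFFSET UNITS SCREEN
TIC OFFSET 5 5
SCATTER PLOT MATRIX Y1 Y2 Y3 Y4
```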
DEFAULT connects neighboring frames (i.e., the FRAME CORNER
COORDINATES are set to 0 0 100 100). USER uses whatever
frame coordinates are currently set (15 20 85 90 by default)
and makes no special provisions for axis labels and tic marks
(i.e., you set them as you normally would, each plot uses
whatever you have set). CONNECTED uses whatever frame
coordinates have been set by the user, but it draws the axis
labels and tic marks as if DEFAULT were being used (that is, as
determined by the SET SCATTER PLOT MATRIX
The default is DEFAULT.
NORMAL means that all tic labels are plotted at a distance determined by the TIC LABEL DISPLACEMENT command. STAGGERED means that alternating plots will be staggered. That is, one will use the standard displacement while the next uses a staggered value. Entering this command with a numeric value specifies the amount of the displacement for the staggered tic labels. For example,
SET SCATTER PLOT MATRIX LABEL DISPLACEMENT STAGGERED
SET SCATTER PLOT MATRIX LABEL DISPLACEMENT 25
These commands specify that the default tic label displacement is 10 and the staggered tic mark label displacement is 25.
NONE means that no fitted line is plotted. LOWESS means that a locally weighted least squares line will be overlaid. LINE means that a linear fit (Y = A0 + A1*X) will be overlaid. QUAD means that a quadratic fit (Y = A0 + A1*X + A2*X**2) will be overlaid. SMOOTH means that a least squares smoothing will be overlaid. For LOWESS, it is recommended that the lowess fraction be set fairly high (e.g., LOWESS FRACTION 0.6). The fitted line is currently only generated if the scatter plot matrix plot type is PLOT. The default is for no fitted line to be overlaid on the plot. If an overlaid fit is desired, LOWESS is the most common choice.
In this form of the plot command, TAG is a group identifier variable. Points belonging to the same group are plotted with the same attributes (controlled by the CHARACTER and LINE commands and their various attribute setting commands). Using a tag variable has two common purposes:
You can specify that the scatter plot matrix use the form of the PLOT command by using the command
OFF specifies that the standard plot command (PLOT Y1 Y2) will be used. ON specifies that the last variable on the SCATTER PLOT MATRIX command is a tag variable. That is, it is not plotted directly, but is instead the third variable on all the plot commands generated by the scatter plot matrix. Currently, this command only applies if the scatter plot matrix plot type is set to PLOT. This form is common enough that the command (see Syntax 2)
implements this automatically. That is, YOUDEN MATRIX PLOT is equivalent to
SCATTER PLOT MATRIX Y1 Y2 ... YK TAG
In some cases, you may want to use a tag variable for both purposes. That is, you may have natural groups in your data, but you also want to flag certain outlying points. You can do this by using a SUBSET clause. For example,
SET SCATTER PLOT MATRIX TAG ON
CHARACTER CIRCLE SQUARE TRIANGLE
CHARACTER FILL OFF OFF OFF
SCATTER PLOT MATRIX Y1 Y2 Y3 Y4 TAG SUBSET Y2 <= 100
PRE-ERASE OFF
CHARACTER FILL ON ON ON
SCATTER PLOT MATRIX Y1 Y2 Y3 Y4 TAG SUBSET Y2 > 100
The SET SCATTER PLOT MATRIX LIMITS command, discussed below, can be used to control the axis limits for the individual plots. The default is OFF.
Note that the pairs of limits correspond to the variable list in the SCATTER PLOT MATRIX command. That is, if Y3 is the third variable in the command, Dataplot will set the YLIMITS when Y3 is plotted on the y axis and the XLIMITS when Y3 is plotted on the x axis. This command is particularly useful if you want to overlay scatter plot matrices (the example discussed for the SET SCATTER PLOT MATRIX TAG command gives an example of where you might want to do this). The default is to allow the axis limits to float with the data.
This command is similar to the SET SCATTER PLOT MATRIX LIMITS command in that the list corresponds to the variables entered on the SCATTER PLOT MATRIX command. Only one set of subregion limits can be set for each variable. The default is that no subregion limits are set.
If BLANK, an empty plot is generated and the variable label is plotted in the center of the empty plot. If LINE, a PLOT Y1 Y1 is generated (this will simply be a 45-degree line, but it does give some indication of the univariate distribution of the variable). If HISTOGRAM, a relative histogram of the variable is generated. For the HISTOGRAM, the axis labels do not apply to the histogram plot. A relative histogram is drawn to make comparisons more meaningful. If BOXPLOT, a box plot of the variable is generated. The BOXPLOT only applies if the SET SCATTER PLOT MATRIX TAG ON command is entered. That is, the box plot is only used if there are groups in the data. For the box plot, the y axis limits are valid, but the x axis limits are not. This command only applies if the scatter plot matrix plot type is PLOT, CROSS TABULATE, or DEX CONTOUR. The default is BLANK.
If OFF, the plots below the diagonal are omitted. If ON, the plots below the diagonal are drawn. The default is ON.
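A short sketch of the diagonal and lower diagonal settings used together: relative histograms on the diagonal, with the plots below the diagonal omitted. Both settings appear in the sample programs on this page. Y1 through Y4 are placeholder variable names, and the plot type is assumed to be the default PLOT.

```dataplot
. Relative histogram of each variable on the diagonal
SET SCATTER PLOT MATRIX DIAGONAL HISTOGRAM
. Omit the plots below the diagonal
SET SCATTER PLOT MATRIX LOWER DIAGONAL OFF
SCATTER PLOT MATRIX Y1 Y2 Y3 Y4
```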
where
The appearance and location of the X2LABEL are controlled with the standard X2LABEL attribute setting commands. There are occasions where you may want to use the values computed in the X2LABEL for additional numeric computations. These values are automatically written to the file "dpst5f.dat". The values are printed in the order the plots are generated.
For example,
MULTIPLOT SCALE FACTOR 3
TIC OFFSET UNITS SCREEN
TIC OFFSET 5 5
is a fairly typical set of commands used with scatter plot matrices.
SET MATRIX PLOT is a synonym for SET SCATTER PLOT MATRIX.
du Toit, Steyn, and Stumpf (1986), "Graphical Exploratory Data Analysis," Springer-Verlag.
Program 1:

. A basic example of a scatter plot matrix
skip 25
read iris.dat y1 y2 y3 y4 tag
multiplot corner coordinates 10 10 90 90
multiplot scale factor 4
tic offset units screen
tic offset 5 5
line blank blank blank
character 1 2 3
set matrix plot tag on
matrix plot y1 y2 y3 y4 tag
move 50 95
justification center
text Fisher Iris Data

Program 2:

skip 25
read iris.dat y1 y2 y3 y4 tag
.
multiplot corner coordinates 10 10 90 90
multiplot scale factor 4
tic offset units screen
tic offset 5 5
line blank blank blank
character circle circle circle
char hw 0.5 0.375 all
char fill on on on
char color blue red green
set matrix plot tag on
case asis
justification center
.
. Basic usage
.
matrix plot y1 y2 y3 y4 tag
move 50 97; text Basic Usage
.
. "BOX" labels
.
set scatter plot matrix labels box
matrix plot y1 y2 y3 y4 tag
move 50 97; text Box Option
.
. Options for axis labels
.
ylabel displacement 16
xlabel displacement 15
set scatter plot matrix diagonal boxplot
set scatter plot matrix labels on
set scatter plot matrix x axis bottom
set scatter plot matrix y axis left
matrix plot y1 y2 y3 y4 tag
move 50 97; text Left/Bottom Axis Labels
.
set scatter plot matrix diagonal histogram
set scatter plot matrix x axis alternate
set scatter plot matrix y axis alternate
matrix plot y1 y2 y3 y4 tag
move 50 97; text Alternating Axis Labels

Program 3:

. Step 1: Read the data
.
skip 25
read exp.dat y1
read weibbury.dat y2
read gamma.dat y3
read frechet.dat y4
skip 0
.
variable label y1 exp.dat
variable label y2 weibbury.dat
variable label y3 gamma.dat
variable label y4 frechet.dat
.
. Step 2: Set some plot control features
.
case asis
label case asis
title case asis
title offset 2
tic mark offset units screen
tic mark offset 5 5
multiplot scale factor 4
multiplot corner coordinates 10 10 90 90
.
line blank solid blank blank
character circle blank blank blank
character fill on
character hw 0.5 0.375 all
.
. Step 3: Scatter plot matrix of quantile-quantile plot
.
label displacement 18
set scatter plot matrix type quantile quantile
scatter plot matrix y1 y2 y3 y4
.
justification center
move 50 97
text Quantile-Quantile Plots of Four Data Sets

Program 4:

. Step 1: Read the data
.
skip 25
read boxreact.dat y x1 to x5
let n = size y
let k = 5
let string stat = Mean
variable label X1 X1 (Feed Rate)
variable label X2 X2 (Catalyst)
variable label X3 X3 (Agitation Rate)
variable label X4 X4 (Temperature)
variable label X5 X5 (Concentration)
.
. Step 2: Define plot control settings
.
multiplot corner coordinates 5 5 95 90
multiplot scale factor 4
tic offset units screen
tic offset 5 5
.
xlimits -2 2
major x1tic mark number 5
minor x1tic mark number 0
x1tic mark label format alpha
x1tic mark label content sp() -1 sp() 1 sp()
x1tic mark offset 0.5 0.5
x1label displacement 115
x2label displacement 8
x2label justification left
x2label offset -30
.
probe cpumax
let cpumax = probeval
probe cpumin
let cpumin = probeval
let ymin = cpumax
let ymax = cpumin
loop for k = 1 1 k
    let ytemp = cross tabulate mean y x^k
    let ymint = minimum ytemp
    let ymaxt = maximum ytemp
    let ymin = min(ymint,ymin)
    let ymax = max(ymaxt,ymax)
end of loop
delete ytemp
.
set frame limits offset off
let ytemp = data ymin ymax
let ymin1 ymax1 = yframe ytemp
set frame limits offset on
let ymin2 ymax2 = yframe ytemp
let xtemp = data -2 2
let xmin2 xmax2 = xframe xtemp
delete xtempl ytemp1
ylimits ymin1 ymax1
.
char circle blank
char hw 1 0.75
char fill on
line solid dashed
.
. Step 3: Define options for scatter plot matrix
.
y1label displacement 20
set scatter plot matrix type dex ^stat interaction
set scatter plot matrix labels on
set scatter plot matrix y axis off
.
set scatter plot matrix x axis top
set scatter plot matrix xtic off
set scatter plot matrix xtic labels off
x1label displacement 15
set scatter plot matrix lower diagonal off
set scatter plot matrix x2label filliben
set scatter plot matrix shaded diagonal on
subregion on
subregion xlimits xmin2 xmax2
subregion ylimits ymin2 ymax2
region fill on
region color g80
.
. Step 4: Generate the plot
.
dex mean interaction effects plot y x1 to x5
.
. Step 5: Add overall title, vertical axis title
.
just center
move 50 97
text Chemical Reactor Efficiency (BHH) 2**5 (K=5, N=32)
.
direction vertical
move 3 50
text Average Y
direction horizontal
Date created: 06/05/2001
Last updated: 12/04/2023