Name: BOX PLOT
The bottom x is the data minimum; the bottom of the box is the estimated 25% point; the middle x in the box is the data median; the top of the box is the estimated 75% point; the top x is the data maximum. The box plot has 24 components (characters and lines), each of which may be individually controlled. For the box plot to appear as it should, the BOX PLOT command is usually preceded by two commands:
CHARACTER BOX PLOT
LINES BOX PLOT
These commands automatically define proper values for the 24 components of the box plot. After the box plot is formed, the analyst should redefine plot characters and lines via the usual CHARACTERS and LINES commands.
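The five values described above can be illustrated outside Dataplot with a short Python sketch. This is not Dataplot code: the function name `box_plot_summary` is hypothetical, and the "inclusive" interpolation rule used for the 25% and 75% points is an assumption, since the percentile estimator Dataplot uses is not stated here.

```python
import statistics


def box_plot_summary(values):
    """Return the five values a basic box plot displays."""
    data = sorted(values)
    # Quartile estimates. The "inclusive" rule is an assumption of this
    # sketch; Dataplot's own estimator may give slightly different values.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "minimum": data[0],        # bottom x
        "lower_quartile": q1,      # bottom of the box
        "median": median,          # middle x in the box
        "upper_quartile": q3,      # top of the box
        "maximum": data[-1],       # top x
    }


if __name__ == "__main__":
    print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```

With other quartile conventions the box edges can shift slightly for small samples, while the minimum, median, and maximum are unaffected.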
BOX PLOT <y> <SUBSET/EXCEPT/FOR qualification>
where <y> is the response (= dependent) variable, and where the <SUBSET/EXCEPT/FOR qualification> is optional. This syntax generates a single box. Note that <y> can also be a matrix argument. If <y> is a matrix, a single box is drawn for all the values in the matrix.
BOX PLOT <y> <x> <SUBSET/EXCEPT/FOR qualification>
where <y> is the response (= dependent) variable; <x> is an independent variable; and where the <SUBSET/EXCEPT/FOR qualification> is optional.
MULTIPLE BOX PLOT <y1> ... <yk> <SUBSET/EXCEPT/FOR qualification>
where <y1> ... <yk> is a list of response (= dependent) variables; and where the <SUBSET/EXCEPT/FOR qualification> is optional. Note that response variables can also be matrices. If a matrix name is encountered, a single box is drawn for all the values in that matrix.
REPLICATED BOX PLOT <y> <x1> ... <xk> <SUBSET/EXCEPT/FOR qualification>
where <y> is the response (= dependent) variable; <x1> ... <xk> is a list of 1 to 6 group-id variables; and where the <SUBSET/EXCEPT/FOR qualification> is optional. The group-id variables are cross-tabulated and a box is drawn for each distinct combination of values of the group-id variables. These are sometimes referred to as nested box plots.
For the REPLICATED case, you can control the spacing between groups. Internally, Dataplot uses the CODE CROSS TABULATE command to generate a single combined group-id variable. Enter HELP CODE CROSS TABULATE for details on the ordering of the cross-tabulation and on how to control the spacing (the SET commands used by CODE CROSS TABULATE are supported for the BOX PLOT command).
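The idea of collapsing several group-id variables into one combined group-id can be sketched in Python. This is only an illustration of cross-tabulation in general: the function name `combined_group_ids` is hypothetical, and the lexicographic ordering used here is an assumption. The ordering and spacing Dataplot actually applies are documented under HELP CODE CROSS TABULATE.

```python
def combined_group_ids(*group_vars):
    """Map each row's combination of group-id values to one integer code.

    Codes are assigned 1, 2, 3, ... in sorted (lexicographic) order of the
    distinct combinations. That ordering is an assumption of this sketch,
    not necessarily the one used by CODE CROSS TABULATE.
    """
    rows = list(zip(*group_vars))          # one tuple of group ids per row
    distinct = sorted(set(rows))           # distinct combinations present
    code = {combo: i + 1 for i, combo in enumerate(distinct)}
    return [code[row] for row in rows], distinct


if __name__ == "__main__":
    x1 = [1, 1, 2, 2, 1]
    x2 = [1, 2, 1, 2, 1]
    ids, combos = combined_group_ids(x1, x2)
    print(ids)     # one combined id per observation
    print(combos)  # the combination each id stands for
```

One box would then be drawn per combined id, which is why only combinations that actually occur in the data produce a box.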
BOX PLOT Y X1
MULTIPLE BOX PLOT Y1 TO Y10
REPLICATED BOX PLOT Y X1 TO X4
If you want to generate fixed width box plots, enter the command
To restore variable width box plots, enter the command
This option may be useful for large data sets. If the FENCES switch is OFF, then the CHARACTER and LINE settings for traces 21 through 26 will be used to draw these percentiles. If the FENCES switch is ON, then the CHARACTER and LINE settings for traces 25 through 30 will be used to draw these percentiles. Currently, the LINES BOX PLOT and CHARACTER BOX PLOT commands do not set these. You can use something like the following to set these switches.
LET PLOT CHARACTER INDX = BLANK
LET PLOT LINE INDX = SOLID
For the MULTIPLE option, Dataplot internally creates a stacked Y X set of data. This means that Dataplot's limit on the maximum number of rows applies to the combined number of rows in the response variables. Dataplot was modified so that if there are four or fewer response variables, it does not stack the data to generate the box plot. Although this has no effect on the appearance of the plot, it can be useful when generating box plots for large data sets because it may avoid exceeding Dataplot's limit on the maximum number of rows.
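A small Python sketch may make the row-count consequence of stacking concrete. It is an illustration only: the function name `stack_responses` is hypothetical and the code does not reproduce Dataplot's internal implementation.

```python
def stack_responses(*responses):
    """Stack several response variables into one Y column plus a tag X.

    Each value keeps its place in Y, and X records which response
    variable (1, 2, 3, ...) the value came from.
    """
    y, x = [], []
    for index, response in enumerate(responses, start=1):
        y.extend(response)
        x.extend([index] * len(response))
    return y, x


if __name__ == "__main__":
    y, x = stack_responses([1.0, 2.0], [3.0], [4.0, 5.0, 6.0])
    print(y)       # all response values in one column
    print(x)       # the source variable of each value
    print(len(y))  # total rows = sum of the individual lengths
```

The stacked column has as many rows as all response variables combined, which is the quantity that is checked against the row limit.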
Walker proposed the following alternative for the fences
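\[ f_{L} = q_1 - 1.5 \mbox{ IQR } \frac{\mbox{SIQR}_{L}} {\mbox{SIQR}_{U}} \]
\[ f_{U} = q_3 + 1.5 \mbox{ IQR } \frac{\mbox{SIQR}_{U}} {\mbox{SIQR}_{L}} \]
where \( q_1 \), \( q_2 \), and \( q_3 \) are the lower quartile, median, and upper quartile; \( \mbox{IQR} = q_3 - q_1 \) is the interquartile range; and \( \mbox{SIQR}_{L} = q_2 - q_1 \) and \( \mbox{SIQR}_{U} = q_3 - q_2 \) are the lower and upper semi-interquartile ranges.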
This formulation is based on the Galton (or Bowley) formula for skewness
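For reference, the Galton (Bowley) skewness is commonly written in terms of the quartiles as follows. The symbol BS is a label chosen for this note, with \( q_1 \), \( q_2 \), \( q_3 \) the lower quartile, median, and upper quartile, \( \mbox{SIQR}_{L} = q_2 - q_1 \), and \( \mbox{SIQR}_{U} = q_3 - q_2 \):

\[ \mbox{BS} = \frac{(q_3 - q_2) - (q_2 - q_1)}{q_3 - q_1} = \frac{\mbox{SIQR}_{U} - \mbox{SIQR}_{L}} {\mbox{SIQR}_{U} + \mbox{SIQR}_{L}} \]

\[ \frac{1 + \mbox{BS}}{1 - \mbox{BS}} = \frac{\mbox{SIQR}_{U}}{\mbox{SIQR}_{L}} \]

The second identity follows by substituting the first. It shows that the ratio of semi-interquartile ranges equals 1 for symmetric quartiles (BS = 0) and exceeds 1 for right-skewed data (BS > 0), so the upper fence is stretched in the direction of the longer tail.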
For a more complete explanation of this method, see the Walker paper.
Kimber had earlier proposed
\[ f_{L} = q_1 - 1.5 (2(q_2 - q_1)) \]
\[ f_{U} = q_3 + 1.5 (2(q_3 - q_2)) \]
For symmetric data, the Kimber and Walker methods are essentially equivalent to the default method. For skewed data, however, the Kimber and Walker methods will identify fewer potential outliers than the default method, and the Kimber method tends to be intermediate between the default method and the Walker method in the number of potential outliers it identifies.
The above formulas are for the "inner fences" boundary. For the "outer fences" boundary, replace 1.5 with 3.0.
To use the Walker method, enter the command
SET BOXPLOT FENCE SKEWNESS WALKER
To use the Kimber method, enter the command
SET BOXPLOT FENCE SKEWNESS KIMBER
To reset the default method, enter
Note that the Walker or Kimber methods are recommended when you are specifically interested in identifying outliers. For exploratory purposes, it may be preferable to use the default method (i.e., showing the skewness may be desirable).
SET BOXPLOT FENCE SKEWNESS GALTON and SET BOXPLOT FENCE SKEWNESS BOWLEY are synonyms for SET BOXPLOT FENCE SKEWNESS WALKER.
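The three fence rules can be compared numerically with a short Python sketch. It is not Dataplot code, and the function names are hypothetical. It assumes the default rule is the usual q1 - k IQR and q3 + k IQR, that the Walker lower fence mirrors the upper one with the semi-interquartile ratio inverted, and that the Kimber lower fence mirrors the upper one using q2 - q1. Raising an error when a semi-interquartile range is zero is a choice made for this sketch only.

```python
def default_fences(q1, q3, k=1.5):
    """Fences at k interquartile ranges beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def walker_fences(q1, q2, q3, k=1.5):
    """Fences scaled by the ratio of the semi-interquartile ranges."""
    iqr = q3 - q1
    siqr_l = q2 - q1   # lower semi-interquartile range
    siqr_u = q3 - q2   # upper semi-interquartile range
    if siqr_l == 0 or siqr_u == 0:
        # The ratio is undefined here; how Dataplot handles this case
        # is not covered by this sketch.
        raise ValueError("semi-interquartile range is zero")
    lower = q1 - k * iqr * siqr_l / siqr_u
    upper = q3 + k * iqr * siqr_u / siqr_l
    return lower, upper


def kimber_fences(q1, q2, q3, k=1.5):
    """Fences at k times twice the semi-interquartile range on each side."""
    return q1 - k * 2 * (q2 - q1), q3 + k * 2 * (q3 - q2)


if __name__ == "__main__":
    # Right-skewed quartiles: the upper half of the box is three times
    # as long as the lower half. Use k=3.0 for the outer fences.
    q1, q2, q3 = 1.0, 2.0, 5.0
    print("default:", default_fences(q1, q3))
    print("kimber: ", kimber_fences(q1, q2, q3))
    print("walker: ", walker_fences(q1, q2, q3))
```

For these right-skewed quartiles the Kimber fences fall between the default and Walker fences on both sides, and for symmetric quartiles (for example 1, 2, 3) all three rules give the same fences.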
Walker, Dovoedo, Chakraborti and Hilton (2018), "An Improved Boxplot for Univariate Data", The American Statistician, Vol. 72, No. 4, pp. 348-353.
Kimber (1990), "Exploratory Data Analysis for Possibly Censored Data from Skewed Distributions", Applied Statistics, Vol. 39, pp. 21-30.
2002/3: Support for fixed width box plot
2010/6: Support for TO syntax and matrix arguments
2010/6: Support for MULTIPLE and REPLICATED options
2016/06: Sample size of one or all response values having the same value no longer treated as an error
2016/06: Support for the SET BOX PLOT EXTREME PERCENTILES
2016/06: For MULTIPLE option, four or fewer response variables not stacked internally
2019/08: Support for the SET BOXPLOT FENCE SKEWNESS command

Program 1:
SKIP 25
READ GEAR.DAT Y X
.
TITLE CASE ASIS
TITLE OFFSET 2
LABEL CASE ASIS
TITLE Box Plot for GEAR.DAT
Y1LABEL Gear Diameter
X1LABEL Batch
.
TIC MARK OFFSET UNITS DATA
XLIMITS 1 10
MAJOR XTIC MARK NUMBER 10
MINOR XTIC MARK NUMBER 0
XTIC MARK OFFSET 1 1
YTIC MARK OFFSET 0.002 0.002
.
LINES BOX PLOT
CHARACTER BOX PLOT
CHARACTER FONT SIMPLEX ALL
FENCES ON
BOX PLOT Y X

Program 2:
dimension 40 columns
skip 25
read sheesley.dat y x1 to x5
let x1d = distinct x1
let x2d = distinct x2
.
SET CODE CROSS TABULATE GROUP SIZE ONE 5
xlimits 0 8
xtic mark offset 0 1
major xtic mark number 9
x1tic mark label format alpha
x1tic mark label content Shift 1 2cr()Weldingsp()Process=1 3 sp() sp() ...
                         1 2cr()Weldingsp()Process=2 3
.
character box plot
character font simplex all
lines box plot
fences on
.
box plot y x1 x2
.
SET CODE CROSS TABULATE GROUP SIZE ONE 5
SET CODE CROSS TABULATE GROUP SIZE TWO 3
xlimits 0 26
xtic mark offset 1 0
major xtic mark number 27
set string space ignore
let string s1 = 1cr()1
let string s2 = 2
let string s3 = sp()
let string s4 = 1cr()2
let string s5 = 2cr()sp()cr()Weldingsp()Process=1
let string s6 = sp()
let string s7 = 1cr()3
let string s8 = 2
let string s9 = sp()
let string s10 = sp()
let string s11 = sp()
let string s12 = sp()
let string s13 = sp()
let string s14 = sp()
let string s15 = sp()
let string s16 = sp()
let string s17 = 1cr()1
let string s18 = 2
let string s19 = sp()
let string s20 = 1cr()2
let string s21 = 2cr()sp()cr()Weldingsp()Process=2
let string s22 = sp()
let string s23 = 1cr()3
let string s24 = 2
let string s25 = sp()
let string s26 = sp()
let string s27 = Machinecr()Shift
let igx = group label s1 to s27
.
x1tic mark label format group label
x1tic mark label content igx
box plot y x1 x2 x3
.
reset data
skip 25
read iris.dat y1 y2 y3 y4 species
let m = create matrix y1 y2 y3 y4
.
xlimits 1 4
xtic mark offset 1 1
major xtic mark number 4
x1tic mark label format alpha
x1tic mark label content Sepalcr()Length Sepalcr()Width ...
                         Petalcr()Length Petalcr()Width
multiple box plot m1 m2 m3 m4
.
reset data
let y1 = norm rand numb for i = 1 1 1000
let y2 = logistic rand numb for i = 1 1 1000
let y3 = double exponential rand numb for i = 1 1 1000
let y4 = slash rand numb for i = 1 1 1000
.
xlimits 1 4
xtic mark offset 1 1
major xtic mark number 4
x1tic mark label format alpha
x1tic mark label content Normal Logistic Laplace Slash Petalcr()Length Petalcr()Width
set box plot extreme percentiles on
.
. Reset character/line settings above 20
.
fences off
loop for k = 21 1 26
    let plot character ^k = blank
    let plot line ^k = solid
end of loop
.
multiple box plot y1 y2 y3

Program 3:
. Step 1: Create data (skewed)
.
let nu = 1
let y = chisquare random numbers for i = 1 1 100
.
. Step 2: Define plot control
.
character box plot
line box plot
fences on
title case asis
x1tic marks off
x1tic mark labels off
tic mark offset units screen
y1tic mark offset 3 3
.
. Step 3: Generate the box plots
.
multiplot 1 3
multiplot scale factor 1 3
title Default Box Plot
box plot y
set box plot fence skewness galton
title Fences Based oncr()Semi-Interquartile Ranges
box plot y
set box plot fence skewness kimber
title Fences Based oncr()Kimber Method
box plot y
.
end of multiplot
Date created: 11/30/2010
Last updated: 12/04/2023
Please email comments on this WWW page to alan.heckert@nist.gov.