
# SORT BY STATISTIC (LET)

Name:
SORT BY STATISTIC (LET)
Type:
Let Subcommand
Purpose:
Sort the values of a group variable based on the values of a user-specified statistic for each group.
Description:
For plots based on a response variable and a corresponding group-id variable, it is often desirable to generate the plot sorted by the value of some statistic for each group. For example, you may want to generate a box plot ordered from the smallest median to the largest median.

In most cases, the sorting statistic will be a location statistic such as the mean, median, minimum, or maximum. However, Dataplot supports more than 40 statistics as the sorting statistic; the full list is given under Syntax.

The SORT BY command is a utility command that simplifies the generation of these sorted plots. Specifically, it can be used in conjunction with the following types of plots:

1. BOX PLOT
2. <STATISTIC> PLOT
3. PLOT Y X TAG (i.e., scatter plot with replication)

Given a response variable, Y, and a group-id variable, X, the SORT BY command computes the value of a specified statistic for each group and returns the following two variables:

1. An index variable that contains the ranking of the computed statistic for each group (with five groups in the data, the index variable has five elements).

2. A sorted group-id variable, X2, that is used in place of the original X variable in subsequent plots.
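To make the two returned variables concrete, the following is a minimal Python sketch of the computation for the median case. It is an illustration only, not Dataplot's implementation. The function name `sort_by_statistic` is hypothetical, groups are assumed to be ranked from the smallest to the largest statistic, and ties are broken by group-id. Because the text describes the index variable only as a "ranking", the sketch returns both natural readings: the rank of each group, and the list of original group-ids in sorted order.

```python
from statistics import median

def sort_by_statistic(y, x, stat=median):
    """Return (x2, rank_of_group, groups_in_sorted_order) for response y and group-ids x."""
    # Collect the response values belonging to each distinct group-id.
    groups = {}
    for yi, xi in zip(y, x):
        groups.setdefault(xi, []).append(yi)
    # Compute the sorting statistic once per group.
    stat_by_group = {g: stat(vals) for g, vals in groups.items()}
    # Order the group-ids from smallest to largest statistic (assumed direction);
    # ties are broken by group-id (an assumption of this sketch).
    groups_in_sorted_order = sorted(stat_by_group, key=lambda g: (stat_by_group[g], g))
    # Rank of each group: 1 for the smallest statistic, 2 for the next, and so on.
    rank_of_group = {g: k for k, g in enumerate(groups_in_sorted_order, start=1)}
    # Sorted group-id variable: each observation carries its group's rank.
    x2 = [rank_of_group[xi] for xi in x]
    return x2, rank_of_group, groups_in_sorted_order

y = [5.0, 7.0, 1.0, 2.0, 9.0, 8.0]
x = [1, 1, 2, 2, 3, 3]
x2, rank_of_group, order = sort_by_statistic(y, x)
print(x2)     # [2, 2, 1, 1, 3, 3]
print(order)  # [2, 1, 3]
```

Here the group medians are 6.0, 1.5, and 8.5, so group 2 moves to position 1, group 1 to position 2, and group 3 stays at position 3. Plotting against `x2` instead of `x` then places the groups in order of increasing median.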
Syntax:
LET <x2> <index> = SORT BY <stat> <y> <x>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x> is the group-id variable;
<stat> is one of the following statistics:
MEAN, MIDMEAN, MEDIAN, TRIMMED MEAN, WINSORIZED MEAN,
GEOMETRIC MEAN, HARMONIC MEAN, HODGES LEHMAN,
BIWEIGHT LOCATION,
SUM, PRODUCT, SIZE (or NUMBER),
STANDARD DEVIATION, STANDARD DEVIATION OF MEAN,
VARIANCE, VARIANCE OF THE MEAN,
TRIMMED MEAN STANDARD ERROR,
IQ RANGE, BIWEIGHT MIDVARIANCE, BIWEIGHT SCALE,
PERCENTAGE BEND MIDVARIANCE,
WINSORIZED VARIANCE, WINSORIZED STANDARD DEVIATION,
RELATIVE STANDARD DEVIATION, RELATIVE VARIANCE,
COEFFICIENT OF VARIATION,
RANGE, MIDRANGE, MAXIMUM, MINIMUM, EXTREME,
LOWER HINGE, UPPER HINGE, LOWER QUARTILE, UPPER QUARTILE,
<FIRST/SECOND/THIRD/FOURTH/FIFTH/SIXTH/SEVENTH/EIGHTH/
NINTH/TENTH> DECILE,
PERCENTILE, QUANTILE, QUANTILE STANDARD ERROR,
SKEWNESS, KURTOSIS, NORMAL PPCC,
AUTOCORRELATION, AUTOCOVARIANCE,
SIN FREQUENCY, SIN AMPLITUDE,
CP, CPK, CNPK, CPM, CC,
EXPECTED LOSS, PERCENT DEFECTIVE,
TAGUCHI SN0 (or SN), TAGUCHI SN+ (or SNL),
TAGUCHI SN- (or SNS), TAGUCHI SN00 (or SN2);
<x2> is a variable where the sorted group-id values are stored;
<index> is a variable where the ranking of the statistic for each group is stored;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
Examples:
LET X2 INDX = SORT BY MEAN Y X
LET X2 INDX = SORT BY MEDIAN Y X
LET X2 INDX = SORT BY SD Y X
LET X2 INDX = SORT BY MINIMUM Y X
LET X2 INDX = SORT BY IQ RANGE Y X
Note:
These plots often have alphabetic tick mark labels. The following enhancements were made to simplify the use of alphabetic tick mark labels with sorted plots.

• The TIC MARK LABEL FORMAT and TIC MARK LABEL CONTENT commands were previously augmented to allow numeric variables, group label variables, or the row label variable as the contents for the tick mark labels. Specifically,

LET LAB = DATA 50 40 30 20 10 0
X1TIC MARK LABEL FORMAT VARIABLE
X1TIC MARK LABEL CONTENT LAB

LET IG = GROUP LABELS A B C D E
X1TIC MARK LABEL FORMAT GROUP LABEL
X1TIC MARK LABEL CONTENT IG

X1TIC MARK LABEL FORMAT ROW LABELS

This has been enhanced to allow an index variable to be specified on the above TIC MARK LABEL CONTENT commands (the index variable is typically generated by a SORT BY <stat> command). The index variable specifies the order in which the tick mark labels will be generated.

The above examples can then be augmented as follows:

LET X2 INDX = SORT BY MEAN Y X
LET LAB = DATA 50 40 30 20 10 0
X1TIC MARK LABEL FORMAT VARIABLE
X1TIC MARK LABEL CONTENT LAB INDX

LET X2 INDX = SORT BY MEAN Y X
LET IG = GROUP LABELS A B C D E
X1TIC MARK LABEL FORMAT GROUP LABEL
X1TIC MARK LABEL CONTENT IG INDX

LET X2 INDX = SORT BY MEAN Y X
X1TIC MARK LABEL FORMAT ROW LABELS
X1TIC MARK LABEL CONTENT INDX

Note that it is the values of INDX when the plot is generated, not when the TIC MARK LABEL CONTENT is entered, that will be used to sort the tick mark labels.
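As a rough illustration of how an index variable can reorder tick mark labels, here is a small Python sketch. It is not Dataplot code. It assumes that element k of the index variable gives the 1-based position of the label to draw at tick k; the function name `ordered_tick_labels` is hypothetical.

```python
def ordered_tick_labels(labels, index):
    """Return the labels in the order given by a 1-based index variable."""
    # Element k of index selects which label is drawn at tick position k
    # (an assumed reading of "the order in which the tick mark labels
    # will be generated").
    return [labels[int(i) - 1] for i in index]

labels = ["A", "B", "C", "D", "E"]
index = [3, 1, 5, 2, 4]
print(ordered_tick_labels(labels, index))  # ['C', 'A', 'E', 'B', 'D']
```

Under this reading, the label list itself never changes; only the index variable does, which is why re-running SORT BY with a different statistic is enough to reorder the labels on the next plot.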

• The LET ... = GROUP LABEL .... command was augmented in the following two ways.

1. You can specify literal strings for group labels. For example,

LET IG = GROUP LABEL BATCHSP()1 BATCHSP()2 ...
BATCHSP()3 BATCHSP()4

The strings are separated by spaces. If you need to include a space in a particular string, use SP() as in the above example.

2. Pre-defined strings can be used to define a group label variable. For example,

LET IG = GROUP LABEL ST1 TO ST10

where ST1, ST2, ..., ST10 are previously defined strings. The TO syntax is useful in this context when the number of strings is large.

Dataplot's algorithm for parsing the GROUP LABEL command is:

1. Dataplot first checks the character variables file (enter HELP SET CONVERT CHARACTER for details). If the first name listed is found, Dataplot uses this character variable to define the group labels.

2. If a character variable is not found, Dataplot checks all the listed names to see if they are previously defined strings. If they are, then Dataplot substitutes the values of these strings.

3. If one or more of the names is not a previously defined string, then Dataplot treats all of the names as literal text strings.
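The three parsing steps above can be sketched in Python as follows. This is an illustration of the decision order only, not Dataplot's source. The function name `resolve_group_labels` and the two dictionaries, which stand in for the character variables file and the table of defined strings, are hypothetical. The sketch also assumes exact, case-sensitive name matching and that SP() is expanded only for literal strings.

```python
def resolve_group_labels(names, character_variables, strings):
    """Resolve the names on a GROUP LABEL command to a list of label texts."""
    # Step 1: if the first name is a character variable, use its values.
    if names and names[0] in character_variables:
        return list(character_variables[names[0]])
    # Step 2: if every name is a previously defined string, substitute the values.
    if names and all(name in strings for name in names):
        return [strings[name] for name in names]
    # Step 3: otherwise treat all names as literal text; SP() stands for a space.
    return [name.replace("SP()", " ") for name in names]

character_variables = {"MONTH": ["Jan", "Feb", "Mar"]}  # hypothetical contents
strings = {"ST1": "Low", "ST2": "High"}                 # hypothetical contents

print(resolve_group_labels(["MONTH"], character_variables, strings))
# ['Jan', 'Feb', 'Mar']
print(resolve_group_labels(["ST1", "ST2"], character_variables, strings))
# ['Low', 'High']
print(resolve_group_labels(["ST1", "BATCHSP()2"], character_variables, strings))
# ['ST1', 'BATCH 2']
```

The last call shows the consequence of step 3: because one name is not a defined string, every name is kept as literal text, including ST1, even though a string with that name exists.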
Default:
None
Synonyms:
None
Related Commands:
PLOT = Generate a plot.
BOX PLOT = Generate a box plot.
STATISTIC PLOT = Generate a statistic versus subset plot.
TIC MARK LABEL FORMAT = Specify the format for tic mark labels.
TIC MARK LABEL CONTENT = Specify the content for alphabetic tic mark labels.
Applications:
Data Analysis
Implementation Date:
2006/1: Original Implementation
Program 1:
```
skip 25
.
let x2 indx = sort by median y x
let ig = group label Tinius1 Tinius2 Satec Tokyo
x1tic mark label case asis
x1tic mark label format group labels
x1tic mark label content ig indx
.
char box plot
line box plot
fences on
.
xlimits 1 4
major xtic mark number 4
minor xtic mark number 0
xtic offset 0.5 0.5
.
title case asis
title offset 2
title Charpy V-NIST Notch Testing
label case asis
x1label Machine Manufacturer
y1label Absorbed Energy
.
box plot y x2
```
Program 2:
```
set convert character on
skip 25
.
let ig = group label month
x1tic mark label format group label
let xcode = character code month
.
major xtic mark number 12
minor xtic mark number 0
xlimits 1 12
xtic offset 0.5 0.5
.
let xcode2 indx = sort by mean rank xcode
x1tic mark label content ig  indx
.
x1tic mark label size 1.5
tic mark label case asis
label case asis
title case asis
title displacement 2
x1label Month
y1label Draft Ranking
title Mean Plot Ordered by Mean
.
mean plot rank xcode2
```
Program 3:
```
skip 25
.
let ig = group label Tinius1 Tinius2 Satec Tokyo
.
char x blank
line blank dash
.
xlimits 1 4
major xtic mark number 4
minor xtic mark number 0
tic offset units screen
tic offset 5 5
.
title offset 2
multiplot corner coordinates 0 0 100 100
multiplot 2 2
multiplot scale factor 2
.
x1tic mark label format group label
x1tic mark label content ig
title Mean Plot (Unsorted)
mean plot y x
.
title SD Plot (Unsorted)
sd plot y x
.
title Mean Plot (Sorted by Mean)
let x2 indx = sort by mean y x
x1tic mark label content ig indx
mean plot y x2
.
title SD Plot (Sorted by SD)
let x2 indx = sort by sd y x
x1tic mark label content ig indx
sd plot y x2
.
end of multiplot
```
Program 4:
```
skip 25
.
char x all
line solid all
.
xlimits 1 10
major xtic mark number 10
minor xtic mark number 0
xtic offset 0.5 0.5
.
title offset 2
multiplot corner coordinates 0 0 100 100
multiplot 2 2
multiplot scale factor 2
.
title Original Data
plot y x x
.
title Sort by Mean
let x2 indx = sort by mean y x
x1tic mark label format variable
x1tic mark label content indx
plot y x2 x2
.
title Sort by Minimum
let x2 indx = sort by minimum y x
x1tic mark label content indx
plot y x2 x2
.
title Sort by SD
let x2 indx = sort by sd y x
x1tic mark label content indx
plot y x2 x2
.
end of multiplot
```
Date created: 1/26/2006
Last updated: 1/26/2006