June 2021
This is the DATAPLOT News file DPNEWF.TEX. This NEWS file contains a
list of DATAPLOT enhancements over the last few years. This is
typically the only place that the most recent enhancements are
documented.
To get a hardcopy off-line listing of this file, exit DATAPLOT and
enter:
IBM PC: PRINT C:\DATAPLOT\DPNEWF.TEX
UNIX: lpr /usr/local/lib/dataplot/dpnewf.tex
other: Check with your local DATAPLOT installer;
at NIST: Alan Heckert (301-975-2899)
Jim Filliben (301-975-2855)
Your installation may define the directory where the DATAPLOT
auxiliary files are stored differently than the list above.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
August 2023 - August 2023.
-----------------------------------------------------------------------
1. Made the following updates to Analysis commands.
a. Added the command
SET LINEAR RANK SUM TEST SCORE <WILCOXON/SAVAGE/CONOVER/MOOD/
VAN DER WAERDEN/ANSARI BRADLEY/
MEDIAN/KLOTZ>
TWO SAMPLE LINEAR RANK SUM TEST Y1 Y2
b. Added the command
FLIGNER POLICELLO TEST Y1 Y2
c. Added the command
SET PERMUTATION TEST SAMPLE SIZE <VALUE>
SET PERMUTATION TEST <DIFFERENCE/RATIO>
TWO SAMPLE <STAT> PERMUTATION TEST Y1 Y2
2. Added the following LET statistic commands
LET STATVAL = TWO SAMPLE LINEAR RANK SUM TEST Y1 Y2
LET STATCDF = TWO SAMPLE LINEAR RANK SUM TEST CDF Y1 Y2
LET PVALUE = TWO SAMPLE LINEAR RANK SUM TEST PVALUE Y1 Y2
LET PVALUE = TWO SAMPLE LINEAR RANK SUM LOWER TAIL TEST PVALUE Y1 Y2
LET PVALUE = TWO SAMPLE LINEAR RANK SUM UPPER TAIL TEST PVALUE Y1 Y2
LET STATVAL = FLIGNER POLICELLO TEST Y1 Y2
LET STATCDF = FLIGNER POLICELLO TEST CDF Y1 Y2
LET PVALUE = FLIGNER POLICELLO TEST PVALUE Y1 Y2
LET PVALUE = FLIGNER POLICELLO TEST LOWER TAILED PVALUE Y1 Y2
LET A = RATIO OF <STAT> Y1 Y2
where <STAT> is any supported statistic for a single response
variable.
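As an illustration of items 1 and 2, a short session might look like the
following. This is only a sketch: the simulated data, the sample size, the
value 5000, the SAVAGE score and the use of MEAN for <STAT> are all
assumptions chosen for the example, and the names on the left of each LET
are arbitrary.

    . Two simulated samples (illustrative data only)
    LET N = 20
    LET Y1 = NORMAL RANDOM NUMBERS FOR I = 1 1 N
    LET Y2 = LOGISTIC RANDOM NUMBERS FOR I = 1 1 N
    . Linear rank sum test using Savage scores
    SET LINEAR RANK SUM TEST SCORE SAVAGE
    TWO SAMPLE LINEAR RANK SUM TEST Y1 Y2
    LET PVALUE = TWO SAMPLE LINEAR RANK SUM TEST PVALUE Y1 Y2
    . Fligner-Policello test on the same samples
    FLIGNER POLICELLO TEST Y1 Y2
    . Permutation test based on the difference of the sample means
    SET PERMUTATION TEST SAMPLE SIZE 5000
    SET PERMUTATION TEST DIFFERENCE
    TWO SAMPLE MEAN PERMUTATION TEST Y1 Y2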
3. Support for the clipboard functions (READ CLIPBOARD, WRITE CLIPBOARD,
LIST CLIPBOARD, CLEAR CLIPBOARD, RUN CLIPBOARD) was added for
Cygwin and MacOS. Under Cygwin, the clipboard contents are stored
in /dev/clipboard. For MacOS, clipboard functions are implemented
using the pbcopy and pbpaste commands.
4. Fixed a bug in the READ command when the file name is enclosed in
quotes. It was extracting the default variable names (COL1, COL2,
etc.) rather than the listed variable names.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
September 2021 - July 2023.
-----------------------------------------------------------------------
1. Added the following LET commands
LET Y = EXPONENTIAL ORDER STATISTIC MEDIANS FOR I = 1 1 N
LET Y = LOGISTIC ORDER STATISTIC MEDIANS FOR I = 1 1 N
LET Y = UNIFORM ORDER STATISTIC MEANS FOR I = 1 1 N
LET Y = NORMAL ORDER STATISTIC MEANS FOR I = 1 1 N
LET Y = EXPONENTIAL ORDER STATISTIC MEANS FOR I = 1 1 N
LET Y = UNIFORM ORDER STATISTIC STANDARD DEVIATIONS FOR I = 1 1 N
LET Y = NORMAL ORDER STATISTIC STANDARD DEVIATIONS FOR I = 1 1 N
LET Y = EXPONENTIAL ORDER STATISTIC STANDARD DEVIATIONS FOR I = 1 1 N
LET S = SAVAGE SCORES FOR I = 1 1 N
LET S = VAN DER WAERDEN SCORE Y
LET S = ANSARI BRADLEY SCORE Y
LET S = MOOD SCORE Y
LET S = MEDIAN SCORE Y
LET S = KLOTZ SCORE Y
LET S = CONOVER SCORE Y X
LET Y1P Y2P = PLACEMENT SCORE Y1 Y2
LET GMD = GINI MEAN DIFFERENCE Y
LET GMDRATIO = GINI MEAN DIFFERENCE LOG RATIO Y1 Y2
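For example, the order statistic commands can be paired with sorted data
to build a simple probability plot by hand. The sketch below assumes
simulated normal data; the names YS and M are arbitrary.

    . Simulated sample (illustrative data only)
    LET N = 25
    LET Y = NORMAL RANDOM NUMBERS FOR I = 1 1 N
    LET YS = SORT Y
    . Expected values of the N normal order statistics
    LET M = NORMAL ORDER STATISTIC MEANS FOR I = 1 1 N
    . Sorted data against the order statistic means
    PLOT YS M
    . Gini mean difference of the same sample
    LET GMD = GINI MEAN DIFFERENCE Y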
2. Made the following updates to Analysis commands.
a. Added the linear pool method to the CONSENSUS MEANS and CONSENSUS
MEANS PLOT commands.
b. The F TEST command for equal standard deviations is known to be
quite sensitive to departures from normality. Shoemaker and Bonett
have proposed modifications to make the test more robust. Options
have been added to use these modifications.
In addition, added the commands
RATIO OF STANDARD DEVIATIONS CONFIDENCE LIMITS Y1 Y2
RATIO OF VARIANCES CONFIDENCE LIMITS Y1 Y2
These commands also support the Shoemaker and Bonett modifications.
c. Added the command
SIEGEL TUKEY TEST Y1 Y2
This is a two sample nonparametric test for equal standard deviations.
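A minimal sketch of the three approaches to comparing the spread of two
samples is given below. The data are simulated for illustration. The
options that select the Shoemaker and Bonett modifications are not named
in this file, so they are not shown here.

    . Two simulated samples (illustrative data only)
    LET Y1 = NORMAL RANDOM NUMBERS FOR I = 1 1 30
    LET Y2 = LOGISTIC RANDOM NUMBERS FOR I = 1 1 30
    . Classical F test for equal standard deviations
    F TEST Y1 Y2
    . Confidence limits for the ratio of the standard deviations
    RATIO OF STANDARD DEVIATIONS CONFIDENCE LIMITS Y1 Y2
    . Nonparametric alternative
    SIEGEL TUKEY TEST Y1 Y2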
3. Made the following updates to graphics commands.
a. Added the command
SET EMPIRICAL CDF PLOT CONFIDENCE LIMITS <ON/OFF>
If set to ON, then the empirical cdf plot command will generate
approximate confidence intervals. If set to OFF (the default),
the confidence intervals will not be generated.
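For example (simulated data, for illustration only):

    LET Y = NORMAL RANDOM NUMBERS FOR I = 1 1 100
    . Turn the approximate confidence limits on for this plot
    SET EMPIRICAL CDF PLOT CONFIDENCE LIMITS ON
    EMPIRICAL CDF PLOT Y
    . Restore the default
    SET EMPIRICAL CDF PLOT CONFIDENCE LIMITS OFF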
4. Added the command
DEVICE <1/2/3> HARDWARE CHARACTER OFFSET <horizontal> <vertical>
5. The LET ... = CREATE MATRIX command was updated to support the
" TO " syntax. So you can do something like the following
LET M = CREATE MATRIX X1 TO X50
LET M = CREATE MATRIX X4 X10 TO X19 X35
6. Fixed several bugs. In particular, made some tweaks to the
QUICKWIN driver and the CLIPBOARD functions to work with the
latest version of the Intel compiler under Windows.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
July 2021 - August 2021.
-----------------------------------------------------------------------
1. Modified the command
SET EXCEL SHEET <name>
so that <name> preserves the case. The <name>
argument is still restricted to 8 characters.
2. Added PROBE GD FONT as a synonym for PROBE GD FONT NAME and
PROBE SVG FONT as a synonym for PROBE SVG FONT NAME.
3. For the various FONT commands, added HARDWARE as a synonym for
TEKTRONIX to denote the use of hardware fonts (i.e., use the
fonts available on the specific graphics device). The use of
HARDWARE rather than TEKTRONIX makes the meaning clearer.
4. Previously for SUBSET clauses the value to the right of the
operator was restricted to being a number or a parameter. That
is, for
PLOT Y X SUBSET X > A
the name A was restricted to a parameter name (i.e., a single
value). This has now been extended to support variables. So
you can now do something like
LET N = 10
LET Y1 = NORMAL RANDOM NUMBERS FOR I = 1 1 N
LET Y1 = SORT Y1
LET Y2 = LOGISTIC RANDOM NUMBERS FOR I = 1 1 N
LET Y2 = SORT Y2
LET TAG = 0 FOR I = 1 1 N
LET TAG = 1 SUBSET Y1 < Y2
5. If you enter a SIZE command, e.g.,
LET N = SIZE Y
and Y does not exist, N will be set to zero rather than
returning an error message.
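One possible use is to test whether a variable has been created before
relying on it. The sketch below assumes that Y may or may not exist and
creates simulated data only when it does not; the fallback data are
purely illustrative.

    LET N = SIZE Y
    . N is 0 if Y has not been defined
    IF N = 0
       LET Y = NORMAL RANDOM NUMBERS FOR I = 1 1 50
    END OF IF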
6. Made the following updates to the STRING commands.
a. Updated the STRING COMPARE command to have options to
ignore the case and to specify how many characters in
the string to compare. Enter HELP STRING COMPARE for
details.
b. Added the command
LET SOUT = STRING REPEAT SIN COUNT
This creates the string SOUT by repeating COUNT times
the string SIN.
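For example, with illustrative values for SIN and COUNT:

    LET STRING SIN = AB
    LET COUNT = 3
    LET SOUT = STRING REPEAT SIN COUNT

Going by the description above, SOUT should then contain ABABAB.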
7. For linear fits, save the following as internal parameters
RESSS - the residual sum of squares
SSREG - the regression sum of squares
SSTOTAL - the total sum of squares
MSE - the mean square error
MSR - the mean square of the regression
FSTAT - the value of the F statistic
FCV95 - the 95% critical value for the F statistic
FCV99 - the 99% critical value for the F statistic
These values were previously printed in the ANOVA table written to
dpst5f.dat, but they were not saved to internal parameters.
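As a sketch of how these parameters might be used, the following fits a
straight line to simulated data and computes the coefficient of
determination from the saved sums of squares. The data and the name RSQ
are assumptions for the example; the relation RSQ = SSREG/SSTOTAL is the
usual one for a least squares fit that includes a constant term.

    . Simulated straight-line data (illustrative only)
    LET X = SEQUENCE 1 1 20
    LET Y = NORMAL RANDOM NUMBERS FOR I = 1 1 20
    LET Y = Y + 2*X
    LINEAR FIT Y X
    . Use the parameters saved by the fit
    LET RSQ = SSREG/SSTOTAL
    PRINT RSQ FSTAT FCV95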
8. Fixed a bug where multiple SUBSET/FOR/EXCEPT clauses were not
working on the FIT command.
Also fixed a bug for the case where a command line argument
(including the argument name) exceeded 80 characters.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
March 2021 - June 2021.
-----------------------------------------------------------------------
1. Added the command
SET GRID ORDER <BEFORE/AFTER>
By default, Dataplot draws the lines, characters, bars and
spikes on a plot before drawing the frame, tic marks, tic mark
labels and grid lines. In some cases, you may want the lines,
characters, bars and spikes to be drawn on top of the grid lines.
To do this, specify BEFORE on this command. To reset the default,
specify AFTER for this command.
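For example, assuming Y and X already exist:

    GRID ON
    . Draw the grid first so the plot elements appear on top of it
    SET GRID ORDER BEFORE
    PLOT Y X
    . Restore the default
    SET GRID ORDER AFTER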
2. Corrected several issues with the CALL command and arguments.
a. Spaces around non-quoted equal signs are now ignored. That is,
CALL X=X2 Y=Y1
and
CALL X = X2 Y = Y1
are now equivalent. Previously, the spaces around the equal
sign resulted in an error.
b. Fixed a bug where
CALL X=X2 Y=Y1 STR="Z(13)"
stripped off the ending parenthesis for STR="Z(13)". This was
only an issue when the ending parenthesis occurs in the last
argument on the CALL command.
c. Fixed a bug with
CALL STTAG=Day STTAG2=Season
This was not performing the correct substitution for STTAG2.
d. Allowing spaces around the "=" for named arguments introduced
some issues with empty arguments. Empty arguments are most
likely to occur when using parameter/string substitution
(e.g., call sub.dp frame="^frame"). If the name indicated by
the "^" character is not a previously defined parameter or
string, then this results in an empty argument.
If named arguments are quoted, this should prevent any
ambiguity. For example,
call sub.dp x=x1 y=vial frame="^frame" tag = "day"
If no string or parameter called frame exists, this results in
call sub.dp x=x1 y=vial frame=" " tag = "day"
This does not cause an issue. However, if the argument is not
quoted
call sub.dp x=x1 y=vial frame=^frame tag = "day"
then we get
call sub.dp x=x1 y=vial frame= tag = "day"
Since spaces around the "=" are allowed, this results in an
ambiguous interpretation. That is, tag could be interpreted
as either the value for the frame argument or as a separate
named argument.
When Dataplot encounters something like this, it will look
ahead to the next unquoted equal sign. If there is a single
argument, then Dataplot assumes that this is a separate named
argument. If there is more than one argument, Dataplot will
process this from left to right in the standard way.
3. Fixed two issues with CONSENSUS MEANS command when there are a large
number of labs.
4. Corrected issues with setting the background color and the margin
color. Also corrected an issue with colors for the QuickWin device
(the green component was not being set correctly). Fixed issue with
RGB colors for SVG device.
5. Added the command
ALIAS <name> <string>
This can be used to create user defined shortcuts for commands.
For example,
alias v psview dppl1f.dat
So if you enter "v", the command "psview dppl1f.dat" will be
executed.
6. For the command
LET PLOT <command> <index> = <setting>
added support for RGB colors. Specifically, added
LET PLOT CHARACTER RGB COLOR <index> = <red> <green> <blue>
LET PLOT LINE RGB COLOR <index> = <red> <green> <blue>
LET PLOT SPIKE RGB COLOR <index> = <red> <green> <blue>
LET PLOT BAR RGB COLOR <index> = <red> <green> <blue>
LET PLOT BAR BORDER RGB COLOR <index> = <red> <green> <blue>
LET PLOT BAR FILL RGB COLOR <index> = <red> <green> <blue>
LET PLOT BAR PATTERN RGB COLOR <index> = <red> <green> <blue>
LET PLOT REGION FILL RGB COLOR <index> = <red> <green> <blue>
LET PLOT REGION PATTERN RGB COLOR <index> = <red> <green> <blue>
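A minimal sketch of setting per-trace RGB line colors follows. It assumes
that Y1, Y2 and X already exist and that <index> refers to the trace
number on the plot; the RGB triples are arbitrary values chosen for
illustration.

    . Trace 1 in orange, trace 2 in light blue (arbitrary RGB values)
    LET PLOT LINE RGB COLOR 1 = 230 159 0
    LET PLOT LINE RGB COLOR 2 = 86 180 233
    PLOT Y1 X AND
    PLOT Y2 X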
In order to make the RGB color commands more useful, added the file
"rgb_color_schemes.txt" to the "help" sub-directory. This file
contains 271 color palettes. The first is due to Okabe and Ito and
defines a color palette that is useful for addressing color blindness.
The remaining palettes are the "ColorBrewer" palettes developed by
Dr. Cynthia Brewer of Penn State University. There are many color
palettes that have been proposed, but these palettes should provide
a good starting point for Dataplot applications. The files
"rgb_color_palettes.dp" and "rgb_color_palettes_labels.dp" were added
to the "programs" sub-directory. These files are used to generate
a Postscript file displaying the color palettes in
"rgb_color_schemes.txt".
7. Added the following library commands
LET YOUT = GT(Y1,Y2)
LET YOUT = GE(Y1,Y2)
LET YOUT = LT(Y1,Y2)
LET YOUT = LE(Y1,Y2)
These library functions perform "greater than", "greater than
or equal to", "less than" and "less than or equal to" comparisons.
Y1 is compared to Y2 and a 1 is returned for "true" and a 0 is
returned for "false".
EQ and EQUAL were added as synonyms for AGREE and NE and NOTEQUAL
were added as synonyms for DISAGREE.
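For example, with two small illustrative variables:

    LET Y1 = DATA 1 5 3 7
    LET Y2 = DATA 2 5 1 9
    LET YGE = GE(Y1,Y2)
    LET YLT = LT(Y1,Y2)

Going by the description above, YGE should contain 0, 1, 1, 0 and YLT
should contain 1, 0, 0, 1.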
8. Added confidence intervals for the lower and upper limit
parameters for the UNIFORM MLE command. Added the command
SET UNIFORM CONFIDENCE LIMIT <BOTH/UPPER/LOWER/NONE>
where UPPER specifies that only confidence intervals for the
upper limit will be generated, LOWER specifies that only confidence
intervals for the lower limit will be generated, BOTH specifies
that confidence intervals for both the lower and upper limits will
be generated and NONE specifies that no confidence intervals will
be generated. This command is useful to address cases where one
or both of the limits may be bounded by a physical limit. For example,
in many cases, the uniform distribution will be bounded at zero.
The default is BOTH.
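As a sketch, for data known to be bounded below at zero, only the upper
limit interval might be requested. The data here are simulated, and the
command is written as UNIFORM MLE following the name used in this item;
the exact spelling accepted on your installation is an assumption.

    LET Y = UNIFORM RANDOM NUMBERS FOR I = 1 1 100
    . Lower limit is treated as physically fixed, so request upper only
    SET UNIFORM CONFIDENCE LIMIT UPPER
    UNIFORM MLE Y
    . Restore the default
    SET UNIFORM CONFIDENCE LIMIT BOTH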
9. Added the command line switches "-gui" and "-nogui". Entering
this as a command line option allows the PROBE GUI command to
be used in the "dplogf.tex" file. The SET GUI command is not
entered by the Tcl/Tk scripts until after the "dplogf.tex" file
is run. This switch (specifically the "-gui" switch) is most
typically used by the Tcl/Tk scripts when initiating Dataplot.
The "-nogui" switch is the default and does not need to be
entered when running the command line version.
Several commands (most specifically, the LIST and HELP commands)
prompt if you want to continue after every n-th line (where n
is specified by the SET LIST LINES and SET HELP LINES). This
prompt causes the GUI to hang. Dataplot has been modified so
that if the GUI switch has been specified (either via the
"-gui" command line switch or the SET GUI ON command) this
prompt will not be issued. If you do not have the latest
version of the Dataplot executable, then it is recommended
that you enter SET HELP LINES 1000000 and SET LIST LINES 1000000
before using the HELP or LIST commands in the GUI.
10. For the READ EXCEL command, Pandas by default assumes that the
first row of the Excel file is a header row containing the variable
names. If the first row in fact contains data, you can now enter
the command
SET EXCEL HEADER NONE
To reset the default that the first row is a header row, enter
SET EXCEL HEADER ON
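For example, to read an Excel file whose first row is data (the file
name SAMPLE.XLSX and the variable names are hypothetical):

    SET EXCEL HEADER NONE
    READ EXCEL SAMPLE.XLSX Y X
    . Restore the default for later reads
    SET EXCEL HEADER ON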
11. Fixed several miscellaneous bugs.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
February 2021 - February 2021.
-----------------------------------------------------------------------
1. Made a few tweaks for MacOS.
a. Added MACOS switch to Makefile.mac.
b. Set the default for SET LIST LAUNCHER to
open -t
and the default for SET LIST VIEWER to blank. So, when
SET LIST NEW WINDOW ON is set, LIST will use
open -t file-name
This is equivalent to entering
SET LIST LAUNCHER "open -t"
SET LIST VIEWER
This opens the specified file using the default editor
(typically TextEdit) on your system.
c. Allow empty argument for SET LIST VIEWER. This is needed
for systems where the SET LIST LAUNCHER does not require
specification of the editor.
d. Updated the SET BROWSER command to extract to the end of the
command line without quotes. This allows the following to be
entered
SET BROWSER open -a safari
SET BROWSER open -a firefox
SET BROWSER open -a "google chrome"
The following aliases will be recognized
SET BROWSER safari
SET BROWSER firefox
SET BROWSER chrome
These will automatically add the "open -a" (and the
"google chrome" for chrome).
Note that using the default, SET BROWSER open, will use the
default browser on your platform. So using a SET BROWSER is
only needed if you want to override the use of the default
browser.
e. Fixed a bug in the AQUA driver when generating histograms.
2. Made a few minor bug fixes.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
December 2020 - January 2021.
-----------------------------------------------------------------------
1. The SAVE MEMORY and RESTORE MEMORY commands were updated to
include many new settings.
2. You can save a specified list of variables/parameters to a
file with the command
SAVE VARIABLES <file-name> <var-list>
If no file name is given, "dpsavf.tex" will be used. If
no variable list is given, all currently defined variables
and parameters will be saved.
This variable list can be restored with the command
RESTORE VARIABLES <file-name>
If no file name is given, "dpsavf.tex" is used. If you
have currently defined variables/parameters with the same
name, they will be overwritten. Currently defined variables
and parameters that are not in the save file will not be
changed.
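For example, to save three items and restore them later (the file name
MYVARS.TEX and the names Y, X and A are hypothetical):

    SAVE VARIABLES MYVARS.TEX Y X A
    . ... later in the session, or in another session ...
    RESTORE VARIABLES MYVARS.TEX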
3. Added the command
SET ERROR MESSAGE <ON/OFF>
If this switch is set to OFF, certain error messages may not
be printed. Currently this only applies to a few commands
(specifically FCDF, FPDF, FPPF), but the list of supported commands
should increase in subsequent releases.
4. The "dplogf.tex" file was updated for Linux/MacOS systems to be more
consistent with the Windows version. The Makefile for MacOS was
updated to support Catalina.
5. Fixed several bugs. Specifically
a. Fixed a bug with named arguments on the CALL command.
b. Fixed a bug with color Postscript output (this was introduced
with full RGB color support for Postscript) where the second
and later graphs generated a black background by default.
c. Fixed a bug in the STRING COMPARE AND REPLACE command (used
by the DEX 10-step macros) when no match is found and the
feedback switch is off. A few other minor tweaks were made
in support of the 10-step macros (both in the macros and in
the Dataplot source code).
d. A few minor changes were made to remove warning messages for
version 10 of the gcc/gfortran compiler. These changes should
not have any effect on Dataplot usage.
e. A few other minor bugs were also fixed.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
July 2020 - November 2020.
-----------------------------------------------------------------------
1) Dataplot's color model was based on the X11 Release 3 color
model. A few additional color names were added based on
Release 4.
This update makes a major update to Dataplot's color model.
Specifically,
a) The list of named colors was extended to be compatible
with current releases of X11. The number of named colors
was increased from 89 to 163. The updated color list can
be found at
https://www.itl.nist.gov/div898/software/dataplot/refman1/ch11/homepage.htm
b) For devices that support it, you can now specify RGB colors.
Dataplot still supports devices that do not have full RGB
support. To handle this, Dataplot first checks if the
device supports RGB color. If so, and if an RGB color has
been specified, then Dataplot uses the RGB color setting.
If the device does not support RGB color or if no RGB color
has been specified, then Dataplot uses the standard color
setting. For example,
LINE COLOR RED
LINE RGB COLOR 211 11 88
If a device does not support RGB color, it will use RED and
if it does support RGB color it will use the (211,11,88)
RGB settings for the color.
For each COLOR command in Dataplot, there is now a
RGB COLOR variant. An RGB color is specified as a set
of three numbers (or parameters). RGB values are given
in the range 0 to 255. To turn off the RGB color, use
"-1 -1 -1" (any negative values will have the same effect).
2) Made the following updates to string functions.
a) The LET ... = STRING SPLIT command now saves the number of new
strings generated in the parameter NUMBWORD.
b) The LET ... = NUMBER OF WORDS command now supports the
SET WORD DELIMITER option. Previously, words were delineated by
spaces (or any non-printing character). The SET WORD DELIMITER
option lets you specify the character that will be treated as
the delimiter. For example, SET WORD DELIMITER , will treat
the "," as the word delimiter.
c) Added the command
LET IFLAG = IS NUMBER STR
where STR is a previously defined string. This command
returns a value of 1 to IFLAG if STR can be interpreted as a
number and a value of 0 if it cannot.
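A short sketch of items b) and c), using illustrative strings:

    . Test whether a string can be read as a number
    LET STRING STR = 3.14
    LET IFLAG = IS NUMBER STR
    . Count comma-delimited words
    LET STRING S = RED,GREEN,BLUE
    SET WORD DELIMITER ,
    LET NW = NUMBER OF WORDS S

Going by the descriptions above, IFLAG should be 1 and NW should be 3.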
3) Many data files contain the variable names as the first line of the
file. Dataplot has the ability to extract these variable names by
doing something like
SET READ VARIABLE LABEL ON
READ FILE.DAT
A few tweaks were added to this capability.
a) Previously, only the first 255 characters of the variable names
line were read. This has been updated to support the number
of characters specified by the MAXIMUM RECORD LENGTH command.
b) Dataplot will now automatically strip spaces and other special
characters out of the variable names. Specifically, only
alphabetic characters (A-Z), numbers (0-9), and underscores are
retained.
c) Dataplot only supports eight characters for variable names. This
can lead to duplicate variable names. To reduce the possibility of
duplicate names, Dataplot does the following if a duplicate name
is found.
i) If the name has less than eight characters, a "Z" is appended
to the end of one of the names. The right most name will
be modified.
ii) If the name has eight characters exactly, the right most
name will change the last character to a Z (or if that
character is already a Z, then to an X).
If blank names are encountered, these will be changed to
Zxxx where "xxx" is a sequence number (i.e., if there are three
blank names encountered, they will be set to Z1, Z2, and Z3).
4) The Dataplot GUI is somewhat out of date. We will be updating the
GUI in several stages.
a) The first stage is to update the contents of the menu. This
update has started, but not completed, that process.
b) After the menu contents are updated, we will begin revising
the Tcl/Tk scripts.
5) Added the command
SET CHARACTER TABULATION PLOT DIGITS <value>
6) Added the commands
HALF NORMAL MAXIMUM LIKELIHOOD Y
HALF LOGISTIC MAXIMUM LIKELIHOOD Y
Note that the half-logistic case actually computes a method
of moments estimate.
7) A number of bug fixes were made.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
May 2020 - June 2020.
-----------------------------------------------------------------------
1) The Dataplot web site now supports installation of ".rpm" files
for CentOS 7, CentOS 8, Fedora 30, Fedora 31 and Fedora 32.
2) Added the following graphics commands.
a) Added the following plot
TOTAL TIME ON TEST PLOT Y CENSOR
3) Added the following Math LET sub-commands.
LET Y = TOTAL TIME ON TEST X CENSOR
LET Y = SCALED TOTAL TIME ON TEST X CENSOR
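As a sketch, the plot and the scaled transform might be generated for
simulated exponential lifetimes as follows. Setting CENSOR to 1 for every
observation is meant to indicate that no observation is censored; that
coding is an assumption, so check the HELP entry for the convention
Dataplot expects.

    LET N = 50
    LET X = EXPONENTIAL RANDOM NUMBERS FOR I = 1 1 N
    . Assumed coding: 1 = observed failure time (no censoring)
    LET CENSOR = 1 FOR I = 1 1 N
    TOTAL TIME ON TEST PLOT X CENSOR
    LET Y = SCALED TOTAL TIME ON TEST X CENSOR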
4) Corrected an issue where commas inside a character field enclosed
in quotes were still being interpreted as field delimiters. Commas
inside a quoted field are now treated as part of the character
field and not a delimiter.
5) Made a number of corrections to address warning messages generated
by version 10 of the gfortran/gcc compilers (Fedora 32).
6) Support for accessing the clipboard has been extended to Linux
(previously this was only supported on Windows platforms). On
Linux, access to the clipboard is through the "xclip" utility.
Note that xclip is not installed by default on most Linux systems,
so you may need to install it for CLIPBOARD commands to work under
Linux. The following CLIPBOARD commands are currently supported
LIST CLIPBOARD
COPY CLIPBOARD <file>
COPY <file> CLIPBOARD
CLEAR CLIPBOARD
READ CLIPBOARD <var-list>
WRITE CLIPBOARD <var-list>
CALL CLIPBOARD <file>
7) Fixed a bug in the PYTHON/R command. Fixed a bug when entering the
name of a PDF file as a command.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
March 2020 - April 2020.
-----------------------------------------------------------------------
1) The SET SEARCH DIRECTORY command has been enhanced to allow five
additional directories that can be added to Dataplot's list of directories
to search for file names.
SET SEARCH2 DIRECTORY <directory-name>
SET SEARCH3 DIRECTORY <directory-name>
SET SEARCH4 DIRECTORY <directory-name>
SET SEARCH5 DIRECTORY <directory-name>
SET SEARCH6 DIRECTORY <directory-name>
2) Added the command
OUTPUT <name>
This command executes the following commands
DEVICE 2 CLOSE
SET IPL1NA <name>ps
DEVICE 2 POSTSCRIPT
3) Added the command
SET WRITE CSV <ON/OFF>
If this is set to ON, the WRITE command will use commas rather
than spaces as the delimiter between fields. This is primarily
intended for importing into other programs that may require
comma separated variables (CSV) for ASCII files.
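For example (the file name OUT.CSV and the variables Y and X are
hypothetical):

    SET WRITE CSV ON
    WRITE OUT.CSV Y X
    . Restore space-delimited output
    SET WRITE CSV OFF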
4) For the commands BEST DISTRIBUTIONAL FIT and DISTRIBUTIONAL FIT PLOT,
you can now specify which distributions to include. Enter
HELP BEST DISTRIBUTIONAL FIT for details.
5) The READ EXCEL command was updated so that you can specify the
first and last rows of the Excel file to read.
SET EXCEL START ROW <value>
SET EXCEL STOP ROW <value>
One use of this is to skip over header lines in the Excel file.
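For example, to skip three header lines and read the next 100 rows (the
row numbers, the file name SAMPLE.XLSX and the variable names are
illustrative):

    SET EXCEL START ROW 4
    SET EXCEL STOP ROW 103
    READ EXCEL SAMPLE.XLSX Y X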
6) The maximum number of characters for the command line was
increased from 255 to 1024. The maximum number of characters
for a file name was increased from 80 to 256.
7) A large number of changes were made to remove warning messages
when using a more stringent level of warning messages for the
gfortran and Intel compilers. These changes do not change
any Dataplot commands, but they did correct several potential
bugs.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
September 2019 - February 2020.
-----------------------------------------------------------------------
1) The following operating system dependent commands were added
RM <file-list> - remove one or more files
RMDIR <file-list> - remove one or more directories
MKDIR <file-list> - make a new directory
CAT <file> - list the contents of a file
DIR <file-list> - list files
GREP <string> <file-list> - perform an operating system based
matching of a string to one or more
files
RSCRIPT <file-name> - run an R script
PYTHON <file-name> - run a Python script
Enter HELP RM, HELP MKDIR, HELP CAT, HELP DIR, HELP GREP,
HELP RSCRIPT or HELP PYTHON for details.
2) The following updates were made to the READ command.
a) Added the commands
WRITE EXCEL <excel-file> <var-list>
READ EXCEL <excel-file> <var-list>
Note that these commands invoke Python scripts to read/write the
Excel files. So a requirement for using these commands is that
version 3.x of Python is already installed on your system and that
the Pandas and xlsxwriter packages are also installed.
For additional information on using these commands, enter
HELP WRITE EXCEL or HELP READ EXCEL.
b) Reading character data from the terminal is now permitted.
For example,
SET CONVERT CHARACTER ON
READ IMONTH VALUE
January 21205
February 19867
March 24991
April 16523
May 17341
June 27912
July 29105
August 28766
September 23332
October 20211
November 18298
December 13112
END OF DATA
c) When reading a data file that contains character variables,
Dataplot saves the character data to the file "dpzchf.dat".
If a subsequent READ command also contains character variables,
the contents of dpzchf.dat are overwritten.
To specify that new character variables should be appended to
the current dpzchf.dat file, enter the command
SET CHARACTER VARIABLE APPEND
To reset the default, enter
SET CHARACTER VARIABLE OVERWRITE
If you specify APPEND, it is recommended that you delete the
current dpzchf.dat file at the beginning of the Dataplot session
with the command
RM dpzchf.dat
d) Some ASCII files may include a percent sign at the end of a
numeric field to indicate the value is a percentage. To ignore
these percent signs, enter the command
SET READ PERCENT SIGN IGNORE ON
e) For the READ CLIPBOARD command on Windows platforms, made two
corrections.
i. If the last column contained an empty field, this missing
value was not added to the data read. This caused a
misalignment of the data. This was corrected.
ii. If the number of variables on the READ command did not
match the number of columns in the clipboard, this caused a
misalignment of the data. This has been corrected.
If the number of variables on the READ CLIPBOARD command is
greater than the number of columns in the clipboard, the
extra variables will be set to the missing value. This value
can be specified with the command
SET READ MISSING VALUE
3) Added the following graphics commands.
a) Added the following option to the I PLOT command
RATIO OF MEANS CONFIDENCE LIMIT PLOT Y1 Y2 X
4) Added the following analysis commands.
a) Generate a confidence interval for the ratio of two means
(i.e., E(Y)/E(X)) for the case of paired data where both X and Y
are approximately normal
RATIO OF MEANS CONFIDENCE LIMITS Y X
Note that numerous methods have been proposed for this problem.
Dataplot currently supports the Fieller method, the large sample
approximation method, and the log ratio method. To specify which
method is used, enter
SET RATIO OF MEANS METHOD FIELLER
SET RATIO OF MEANS METHOD LARGE SAMPLE
SET RATIO OF MEANS METHOD LOG RATIO
The default is the Fieller method.
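For example, to use the log ratio method on simulated paired data (the
data and the shifts of 20 and 10 are illustrative assumptions):

    . Simulated paired data, shifted away from zero
    LET X = NORMAL RANDOM NUMBERS FOR I = 1 1 30
    LET X = X + 10
    LET Y = NORMAL RANDOM NUMBERS FOR I = 1 1 30
    LET Y = Y + 20
    SET RATIO OF MEANS METHOD LOG RATIO
    RATIO OF MEANS CONFIDENCE LIMITS Y X
    . Restore the default method
    SET RATIO OF MEANS METHOD FIELLER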
b) Added the following outlier commands
DAVID TEST Y
SKEWNESS OUTLIER TEST Y
KURTOSIS OUTLIER TEST Y
In addition, the GRUBBS TEST now supports the case where the
standard deviation is available from previous data. For
this case, enter the commands
SET GRUBB STANDARD DEVIATION
SET GRUBB DEGREES OF FREEDOM
If the specified standard deviation is positive, Dataplot
uses the formulas based on an independent estimate of the
standard deviation. The independent standard deviation also
has an associated degrees of freedom (typically the sample
size used to compute that standard deviation). If the
degrees of freedom is not specified, a value of 10,000 will
be used. Essentially, any value greater than 120 is
effectively treated as a "known" standard deviation.
These new outlier commands were added to support the
2016 edition of the ASTM E178 standard for outliers.
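A sketch of these outlier commands follows, assuming Y already exists.
The two SET GRUBB commands are listed above without an argument, so
placing the value at the end of the command is an assumption, and the
values 2.5 and 15 are purely illustrative.

    DAVID TEST Y
    SKEWNESS OUTLIER TEST Y
    KURTOSIS OUTLIER TEST Y
    . Grubbs test with a standard deviation taken from previous data
    SET GRUBB STANDARD DEVIATION 2.5
    SET GRUBB DEGREES OF FREEDOM 15
    GRUBBS TEST Y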
5) Added the following Statistics LET sub-commands.
LET A = RATIO OF MEANS Y X
LET A = RATIO OF MEANS LOWER CONFIDENCE LIMIT Y X
LET A = RATIO OF MEANS UPPER CONFIDENCE LIMIT Y X
LET A = DAVID TEST Y
LET A = DAVID TEST CDF Y
LET A = DAVID TEST PVALUE Y
LET A = DAVID TEST MINIMUM INDEX Y
LET A = DAVID TEST MAXIMUM INDEX Y
LET A = SKEWNESS OUTLIER TEST Y
LET A = SKEWNESS OUTLIER TEST CDF Y
LET A = SKEWNESS OUTLIER TEST PVALUE Y
LET A = SKEWNESS OUTLIER TEST CRITICAL VALUE Y
LET A = SKEWNESS OUTLIER TEST INDEX Y
LET A = KURTOSIS OUTLIER TEST Y
LET A = KURTOSIS OUTLIER TEST CDF Y
LET A = KURTOSIS OUTLIER TEST PVALUE Y
LET A = KURTOSIS OUTLIER TEST CRITICAL VALUE Y
LET A = KURTOSIS OUTLIER TEST INDEX Y
6) Added the following Math LET sub-commands.
LET Y3 = INSERT Y1 Y2 NLOC
7) Added the following commands for strings.
a) The LET ... = STRING COMBINE ... command is used to concatenate
two or more strings. By default, this command puts a space
between the concatenated strings. To specify a different
separator character, enter the command
SET STRING COMBINE SEPARATOR <string>
b) Added the command
let ix = string variable s1 s2 s3
where s1, s2 and s3 are previously defined strings. This adds
"ix" to the list of character variables in the character variable
file dpzchf.dat.
8) Added the following miscellaneous commands.
a) Added the following options to the LIST command
LIST HEAD <file>
LIST TAIL <file>
LIST HEAD will list the first 10 lines of the file and LIST TAIL
will list the last 10 lines of the file.
To modify the number of lines the HEAD and TAIL options print,
enter
SET HEAD LINES <value>
SET TAIL LINES <value>
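For example, to view the first 25 lines and the last 5 lines of a file
(dpst5f.dat is used here only as a sample file name):

    SET HEAD LINES 25
    LIST HEAD dpst5f.dat
    SET TAIL LINES 5
    LIST TAIL dpst5f.dat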
Also added the option
LIST NEW WINDOW <file>
This will open up the file in a new window. For Windows, the
default application is Wordpad. For Linux, a gnome terminal window
will be opened and the file will be displayed in the vi editor.
To change the default application, enter
SET LIST VIEWER <name>
Note that Dataplot does no error checking on this name. If
an invalid application is given, the file will not be displayed.
To make opening a new window the default for the LIST command,
enter
SET LIST NEW WINDOW ON
To reset the default of listing the contents in the Dataplot
window, enter
SET LIST NEW WINDOW OFF
Similarly, you can open the output from the HELP command in a
new window by entering
SET HELP NEW WINDOW ON
To reset the default enter
SET HELP NEW WINDOW OFF
For Linux systems, you can specify which command is used
to launch the new window (for either the LIST or HELP
command) with the command
SET LIST LAUNCHER "gnome-terminal -e"
This is the default. Other choices include
SET LIST LAUNCHER "xterm -e"
SET LIST LAUNCHER "konsole -e"
There are many desktops available on various types of Linux
systems and each of these desktops may have its own
command for initiating a new terminal window. We have
explicitly tested xterm, konsole, and gnome-terminal on
CentOS and Fedora. However, depending on what desktop you
use, these options may not be available on your local system.
In addition to LIST NEW WINDOW, the following are also available
LIST EXCEL <file-name>
LIST WORD <file-name>
LIST POWER POINT <file-name>
On Windows systems, the default is equivalent to entering
the following command
SYSTEM "<file-name>"
On Linux systems, the default is equivalent to entering
SYSTEM xdg-open "<file-name>"
On MacOS systems, the default is equivalent to entering
SYSTEM open "<file-name>"
The "xdg-open" and "open" commands under Linux and MacOS
will select the application based on the file name extension.
Typically there will be a file name association defined for
many file name extensions. However, there may be file name
extensions for which no association has been defined. In this
case, or if you simply want to be explicit about what application
to use, you can specify which application will be used with the
following commands
SET EXCEL VIEWER "<application-name>"
SET WORD VIEWER "<application-name>"
SET POWER POINT VIEWER "<application-name>"
Dataplot does no error checking to see if <application-name>
is in fact installed on your system.
For example, to explicitly use libreoffice applications
under Linux, you could enter
SET EXCEL VIEWER "libreoffice --calc"
SET WORD VIEWER "libreoffice --writer"
SET POWER POINT VIEWER "libreoffice --impress"
These commands are not restricted to Microsoft Office
applications.
A few comments on this.
i. Dataplot does not check the file name extension.
You need to explicitly use LIST EXCEL, LIST WORD,
or LIST POWER POINT to invoke the application.
ii. Once the application is invoked, control returns
to the Dataplot window. So you can view the
spreadsheet or document while still entering Dataplot
commands.
iii. There are a large number of spreadsheet and word
processing programs, each of which tends to have its
own file extensions. Although the Microsoft extensions
(.xls, .xlsx, .doc, .docx, .ppt, .pptx) are likely to
have file associations defined on most systems, this is
less likely to be true for other spreadsheet or word
processing programs. In this case, you can either
create the file association or use the SET EXCEL VIEWER,
SET WORD VIEWER, or SET POWER POINT VIEWER commands to
specify the desired application.
b) The PSVIEW command was updated to view PDF and image files
in addition to Postscript files. Enter HELP PSVIEW for
details.
c) Added the following commands
HEAD <var-list>
TAIL <var-list>
HEAD and TAIL are synonyms for the WRITE command. However, only
the first (or last) 10 lines will be printed.
To modify the number of lines the HEAD and TAIL options print,
enter
SET HEAD LINES <value>
SET TAIL LINES <value>
d) Added the following command
SET OUTPUT LINE NUMBERS <ON/OFF>
If this switch is ON, the alphanumeric output will
contain line numbers.
e) Fixed a bug where the ONE SAMPLE T TEST was not
interpreted correctly.
f) Corrected several of the random number generators to
properly reset the sequence when a new SEED value is
entered.
g) If you enter a command without arguments and the command
is not matched, then Dataplot will try adding PRINT to the
beginning of the command. That is, the command
Y
will be interpreted as
PRINT Y
However, be aware that a check is made to see if it is
a legitimate command first. For example, R, REPEAT, X,
S, SAVE, L, and LIST are valid commands without arguments.
So if you have a variable called R, you need to enter
PRINT R rather than just R.
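The fallback described in item g can be pictured with the following
Python sketch. It is only a model of the rule, not Dataplot's parser:
the function name and the KNOWN_COMMANDS set are hypothetical, and the
set holds only the commands named above (the real command table is
much larger).

```python
# Hypothetical model of the "try PRINT" fallback; not Dataplot source code.
# Only the argument-free commands named in the text are listed here.
KNOWN_COMMANDS = {"R", "REPEAT", "X", "S", "SAVE", "L", "LIST"}


def resolve_bare_command(line):
    """Return the command that would be run for the given input line."""
    words = line.split()
    if len(words) != 1:
        return line.strip()        # has arguments (or is empty): unchanged
    word = words[0]
    if word.upper() in KNOWN_COMMANDS:
        return word                # a legitimate command takes precedence
    return "PRINT " + word         # otherwise treat the word as a variable
```

Under these assumptions, resolve_bare_command("Y") gives "PRINT Y",
while resolve_bare_command("R") is left as the R command.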
h) The TIC MARK LABEL FORMAT VARIABLE option now supports
character variables as well as numeric variables.
i) The following command returns the current value of the
seed for various random number generators
PROBE SEED
j) The following commands return the values for certain
operating system defined environment variables.
PROBE HOME (or PROBE USER PROFILE)
PROBE USER (or PROBE USER NAME)
PROBE HOST (or PROBE HOST NAME or PROBE COMPUTER NAME)
PROBE DEFAULT PRINTER (Linux only)
PROBE PROGRAM FILES X86 (64-bit Windows only)
PROBE PROGRAM FILES (Windows only)
PROBE WINDOW BITS (Windows only)
Specifically, PROBE HOME returns the user's home directory,
PROBE USER returns the user name, PROBE HOST returns the computer
name, PROBE DEFAULT PRINTER returns the name of the default
printer, PROBE PROGRAM FILES X86 returns the location of
32-bit applications under 64-bit Windows, PROBE PROGRAM FILES
returns the default location of applications under Windows
(64-bit applications on 64-bit Windows), and
PROBE WINDOW BITS returns "32" if you are running on a
32-bit machine and "64" if you are running on a 64-bit
machine.
Remember that you can define a string after the PROBE. For
example,
PROBE USER
LET STRING USERNAME = PROBESTR
k) To include your home directory in the list of directories that
are searched when looking for a file enter
SET HOME PATH ON
To reset the default of not including your home directory, enter
SET HOME PATH OFF
To see what your home directory is, enter
PROBE HOME
l) Previously, if a command was recognized as a file name, Dataplot
would interpret this as a CALL command. For example
test.dp
would be equivalent to
call test.dp
This has been expanded to recognize certain file extensions.
Specifically, if <file-name> is the command
i) If the file has a ".ps", ".PS", ".eps" or ".EPS" extension,
the following will be done
PSVIEW <file-name>
This command will view the file using the Postscript
viewer. The SET POSTSCRIPT VIEWER command can be used to
specify what application will be used to view the
Postscript file.
ii) If the file has a ".pdf" or ".PDF" extension, the following
will be done
PSVIEW <file-name>
This command will view the file using the PDF viewer. The
SET PDF VIEWER command can be used to specify what
application will be used to view the PDF file.
iii) If the file has a ".jpg", ".JPG", ".jpeg", ".JPEG", ".png",
".PNG", ".gif", ".GIF", ".tif", ".TIF", ".tiff", or ".TIFF"
extension, the following will be done
PSVIEW <file-name>
This command will view the file using the image viewer. The
SET IMAGE VIEWER command can be used to specify what
application will be used to view the image file.
iv) If the file has a ".dat", ".DAT", ".csv", ".CSV", ".out",
or ".OUT" extension, the following will be done
LIST NEW WINDOW <file-name>
The command SET LIST VIEWER can be used to specify what
application is used to view ASCII files.
v) If the file has a ".xls", ".XLS", ".xlsx", or ".XLSX"
extension, the following will be done
LIST EXCEL <file-name>
If the file has a ".doc", ".DOC", ".docx", or ".DOCX"
extension, the following will be done
LIST WORD <file-name>
If the file has a ".ppt", ".PPT", ".pptx", or ".PPTX"
extension, the following will be done
LIST POWER POINT <file-name>
The following commands can be used to set the viewers
for these types of files
SET EXCEL VIEWER
SET WORD VIEWER
SET POWER POINT VIEWER
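The extension rules in item l amount to a lookup table. The Python
sketch below restates them; the names DISPATCH and command_for_file
are hypothetical, and lower-casing the extension is an assumption
(the text lists only all-lower-case and all-upper-case forms).

```python
import os

# Extension -> command prefix, restating the rules listed in item l.
DISPATCH = {
    ".ps": "PSVIEW", ".eps": "PSVIEW",
    ".pdf": "PSVIEW",
    ".jpg": "PSVIEW", ".jpeg": "PSVIEW", ".png": "PSVIEW",
    ".gif": "PSVIEW", ".tif": "PSVIEW", ".tiff": "PSVIEW",
    ".dat": "LIST NEW WINDOW", ".csv": "LIST NEW WINDOW",
    ".out": "LIST NEW WINDOW",
    ".xls": "LIST EXCEL", ".xlsx": "LIST EXCEL",
    ".doc": "LIST WORD", ".docx": "LIST WORD",
    ".ppt": "LIST POWER POINT", ".pptx": "LIST POWER POINT",
}


def command_for_file(file_name):
    """Return the command a bare file name would be expanded to."""
    # Assumption: mixed-case extensions are matched like the listed forms.
    ext = os.path.splitext(file_name)[1].lower()
    # Any other recognized file keeps the earlier behavior (CALL).
    prefix = DISPATCH.get(ext, "CALL")
    return prefix + " " + file_name
```

For example, command_for_file("test.dp") yields "call"-style handling
("CALL test.dp"), while "plot.PDF" is routed to PSVIEW.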
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
May 2019 - August 2019.
-----------------------------------------------------------------------
1) The Dataplot source code is now available on the following
github site
https://github.com/usnistgov/dataplot
The build from source code has been simplified for Linux and
MacOS systems.
2) Numerous changes were made to make more efficient use of scratch
space. This allowed us to increase the default maximum number of
rows from 1,500,000 to 2,000,000.
3) Made the following changes to the Graphics commands.
a. For the CONSENSUS MEANS PLOT, the default is to put the method
results on the left side of the plot and the lab data results
on the right side of the plot. To reverse this (i.e., data
results first, then method results), enter the command
SET CONSENSUS MEAN PLOT DATA LEFT
You can also specify RIGHT, but this is equivalent to the
default of ON.
b. Added the following option for the BOX PLOT command
SET BOXPLOT FENCE SKEWNESS <OFF/GALTON/KIMBER>
For skewed datasets, the default box plot algorithm for
identifying outliers may identify an excessive number of
outliers when the FENCES switch is turned on. Several
authors have suggested alternative algorithms to address
this. Enter HELP BOX PLOT for details.
4) Made the following changes to the Analysis commands.
a. Added the following option for the ONE SAMPLE PROFICIENCY TEST
command
SET ONE SAMPLE PROFICIENCY TEST IDENTIFY LAB
<DEFAULT/UNUSUAL/EXTREMELY UNUSUAL>
For Table 2, if there are multiple labs with the same values,
only one lab-id is given (there is a column for the number of
occurrences for that value). For outliers, it can be useful
to identify all the labs. If UNUSUAL is specified, all lab-id's
are given for the Unusual and Extremely Unusual categories. If
EXTREMELY UNUSUAL is specified, all lab-id's are given for the
Extremely Unusual category (but not the Unusual category).
5) Added the following Math LET sub-commands.
LET Y = BREAK LOCATIONS X
LET Y = FRAGMENT LOCATIONS X
LET Y = FRAGMENT LENGTHS X
LET Y1 Y2 = 2D GRIDDED X1 X2
LET Y1 Y2 Y3 = 3D GRIDDED X1 X2 X3
LET Y1 Y2 Y3 Y4 = 4D GRIDDED X1 X2 X3 X4
6) Added the following Statistics LET sub-commands.
LET A = LOWER SEMI INTERQUARTILE RANGE Y
LET A = UPPER SEMI INTERQUARTILE RANGE Y
LET A = KENDALLS TAU A Y1 Y2
LET A = KENDALLS TAU B Y1 Y2
LET A = KENDALLS TAU C Y1 Y2
LET A = YULES Y Y1 Y2
LET A = CORRELATION RATIO Y X
LET A = INTRACLASS CORRELATION RATIO Y X
7) Miscellaneous Updates
a. Added the following option to the SET POSTSCRIPT CONVERT
command
SET POSTSCRIPT CONVERT PS2PDF
The SET POSTSCRIPT CONVERT command is used to automatically
convert Dataplot's Postscript output to another format (most
commonly PDF, but not limited to this). Previously, this
command supported "GHOSTSCRIPT" and "CONVERT". GHOSTSCRIPT
uses Ghostscript to perform the conversion and CONVERT uses
the "convert" command from ImageMagick.
The PS2PDF option uses the "ps2pdf" script that is installed
with Ghostscript.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
April 2019 - April (18) 2019.
-----------------------------------------------------------------------
1) Made the following changes to the Analysis commands.
i. For linear fits (i.e., FIT Y X, QUADRATIC FIT Y X, and so on), a
header line is now written to the dpst1f.dat, dpst2f.dat, and
dpst4f.dat files (dpst3f.dat already had a header line).
For non-linear fits, a header line is now written to the dpst1f.dat
and dpst2f.dat files.
2) I/O updates
i. If you are reading a space delimited file in free format mode
and a row of the file has fewer values than expected, the
default behavior was modified. Previously, the number of
variables read would be equal to the minimum number of values
for a row. For example, if you entered
READ FILE.DAT Y X1 X2 X3
and a row was encountered that had only two values, then only
Y and X1 would be read. This was changed so that when the number
of values for a row is less than the expected number of values,
the missing value number (as specified by the SET READ MISSING
VALUE command) will be inserted. So in the above example, the
two values will be inserted into Y and X1 while X2 and X3 will be
assigned the missing value code.
Note that this can still be problematic if the missing fields are
not the end columns (values will be assigned to variables in the
order specified). For this reason, a warning message is printed
when a row with too few values is padded.
This new behavior is equivalent to
SET READ PAD MISSING COLUMNS ON
The one difference is that SET READ PAD MISSING COLUMNS ON will
not print the warning message.
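The padding rule can be sketched in Python as follows. This is an
illustration only: the function name pad_row is hypothetical, and the
missing value shown in the usage note is a placeholder for whatever
was set with SET READ MISSING VALUE.

```python
def pad_row(line, n_expected, missing_value):
    """Split a space delimited row and pad it to n_expected values."""
    values = [float(token) for token in line.split()]
    if len(values) < n_expected:
        # Values fill the variables left to right; the trailing
        # variables receive the missing value code.
        values += [missing_value] * (n_expected - len(values))
    return values
```

With four variables (Y X1 X2 X3) and a placeholder missing value of
-999.0, pad_row("1.5 2.5", 4, -999.0) assigns 1.5 to Y, 2.5 to X1,
and the missing value to X2 and X3. As noted above, the sketch cannot
tell which fields were actually absent from the row.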
ii. Added the following SET command
SET READ ASTERISK IGNORE <ON/OFF>
You may on occasion need to read a data file where an asterisk
("*") is used to identify certain points. For example, this may
indicate a statistic that is above a critical value or it may
be used to identify points that are of particular interest. In
this case, you can set the value to ON to ignore leading or
trailing asterisks.
iii. Fixed a bug with the vector form of the COLUMN LIMITS command.
3) Miscellaneous
i. Added the following command
SET AUXILIARY FILES DECIMAL POINTS <value>
Certain commands write information to the files dpst1f.dat,
dpst2f.dat, dpst3f.dat, dpst4f.dat and dpst5f.dat. Typically,
numbers are written in E15.7 format. This command allows you to
specify the number of significant digits to use (currently,
between 1 and 15 are allowed). For example, if you enter
SET AUXILIARY FILES DECIMAL POINTS 9
then an E17.9 format will be used.
This option will be implemented incrementally. The initial
implementation was for the fit commands, but support for
additional commands will be added in an arbitrary order.
Currently the following commands support this option
FIT, ORTHOGONAL DISTANCE FIT, ARMA (for dpst1f.dat, but not
dpst2f.dat), ODDS RATIO CHI-SQUARE TEST, ODDS RATIO
INDEPENDENCE TEST, OPTIMIZE, PAGE TEST, CAPABILITY ANALYSIS,
and CHI-SQUARE INDEPENDENCE TEST
Output to these files that does not use E15.7 format will not
use this option.
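The relationship between the requested number of digits and the
format can be sketched in Python. The width rule (digits plus 8) is
inferred from the two examples above (7 gives E15.7, 9 gives E17.9);
the function names are hypothetical, and Python's exponent layout
differs from Fortran's (one digit before the decimal point rather
than a leading zero).

```python
def e_descriptor(digits):
    """Return the Fortran-style edit descriptor for the given digits."""
    if not 1 <= digits <= 15:
        raise ValueError("digits must be between 1 and 15")
    # Assumption: width = digits + 8, inferred from E15.7 and E17.9.
    return "E{}.{}".format(digits + 8, digits)


def format_value(value, digits):
    """Approximate the output field: same width and significant digits."""
    width = digits + 8
    # Python's E format keeps one digit before the decimal point, so
    # digits - 1 decimals give the same number of significant digits.
    return "{:{w}.{d}E}".format(value, w=width, d=digits - 1)
```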
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
August 2018 - March 2019.
-----------------------------------------------------------------------
1) Added the following graphics commands
CLASSIFICATION SCATTER PLOT Y X1 ... XK
CLASSIFICATION <stat> PLOT Y X1 ... XK
For the DEX <stat> PLOT, added the command
SET DEX STATISTIC PLOT INTERACTION <NONE/2/3>
By default, the DEX <stat> PLOT command shows the main effects.
If you specify "2" for this command, all 2-term interactions will
be added to the plot and if you specify "3", all 2-term and 3-term
interactions will be added to the plot.
This option is only intended for 2-level full and fractional
factorial designs.
2) Added the following analysis commands
HEDGES G CONFIDENCE LIMIT Y1 Y2
3) Added the following LET statistic subcommands
LET A = HEDGES G STANDARD ERROR Y1 Y2
LET A = HEDGES G LOWER CONFIDENCE LIMIT Y1 Y2
LET A = HEDGES G UPPER CONFIDENCE LIMIT Y1 Y2
LET A = HAMMING DISTANCE Y1 Y2
LET A = CANBERRA DISTANCE Y1 Y2
LET A = GROUPED CORRELATION Y1 Y2 W
LET A = WEIGHTED COSINE DISTANCE Y1 Y2 W
LET A = WEIGHTED COSINE SIMILARITY Y1 Y2 W
LET LOWLIMIT = value
LET UPPLIMIT = value
LET A = INTERVAL COUNT Y
LET A = PYTHON MEAN Y
LET A = YOUDEN INDEX Y1 Y2
4) Added the following LET matrix subcommands
LET MOUT = COSINE <COLUMN/ROW> DISTANCE M
LET MOUT = COSINE <COLUMN/ROW> SIMILARITY M
LET MOUT = ANGULAR COSINE <COLUMN/ROW> DISTANCE M
LET MOUT = ANGULAR COSINE <COLUMN/ROW> SIMILARITY M
LET MOUT = JACCARD <COLUMN/ROW> DISTANCE M
LET MOUT = JACCARD <COLUMN/ROW> SIMILARITY M
LET MOUT = PEARSON <COLUMN/ROW> DISTANCE M
LET MOUT = PEARSON <COLUMN/ROW> SIMILARITY M
LET MOUT = HAMMING <COLUMN/ROW> DISTANCE M
LET MOUT = CANBERRA <COLUMN/ROW> DISTANCE M
5) Added the following STRING subcommands
LET SOUT = STRING SPLIT SORG
LET SOUT = STRING REMOVE PUNCTUATION SORG
LET SOUT = STRING REMOVE WHITESPACE SORG
LET SOUT = STRING EXPAND WHITESPACE SORG
LET SOUT = STRING DELETE SORG SDEL
LET SOUT = STRING RIGHT INDEX SORG SCHAR
LET SOUT = STRING <LEFT/CENTER/RIGHT> JUSTIFY SORG NLEN
LET SOUT = SWAP CASE SORG
LET IFLAG = STRING STARTS WITH SORG SMATCH
LET IFLAG = STRING ENDS WITH SORG SMATCH
LET IFLAG = STRING CONTAIN SORG SMATCH
LET NC = STRING SUBSET COUNT SORG SMATCH
LET D = STRING HAMMING DISTANCE S1 S2
6) Added the following LET math subcommands
LET Y2 = CELL MATCH X VALUE
LET Y2 = LARGEST Y NVAL
LET Y2 = SMALLEST Y NVAL
LET Y = <YTIC/Y1TIC/Y2TIC/XTIC/X1TIC/X2TIC> <DATA/SCREEN> COORDINATES
LET TAG = DEX CHECK CENTER POINTS X1 ... XK
LET XNEW = CODE DEX 2-LEVEL X
7) Made the following updates to the I/O commands
a) Made the following updates to the STREAM READ command.
i) Added the additional distance options
STREAM READ CANBERRA DISTANCE <file> <var-list>
STREAM READ HAMMING DISTANCE <file> <var-list>
ii) Added the distance cross-tabulation options
STREAM READ CROSS TABULATE EUCLIDEAN DISTANCE <file> <var-list>
STREAM READ CROSS TABULATE MANHATTAN DISTANCE <file> <var-list>
STREAM READ CROSS TABULATE CHEBYCHEV DISTANCE <file> <var-list>
STREAM READ CROSS TABULATE COSINE DISTANCE <file> <var-list>
STREAM READ CROSS TABULATE COSINE SIMILARITY <file> <var-list>
STREAM READ CROSS TABULATE ANGULAR COSINE DISTANCE <file> <var-list>
STREAM READ CROSS TABULATE ANGULAR COSINE SIMILARITY <file> <var-list>
STREAM READ CROSS TABULATE CANBERRA DISTANCE <file> <var-list>
STREAM READ CROSS TABULATE HAMMING DISTANCE <file> <var-list>
STREAM READ CROSS TABULATE CORRELATION <file> <var-list>
STREAM READ CROSS TABULATE COVARIANCE <file> <var-list>
iii) Added the percentiles option
STREAM READ PERCENTILES <file> <var-list>
iv) Added the percentiles cross-tabulation options
STREAM READ CROSS TABULATE PERCENTILES <file> <var-list>
b) Dataplot is a column oriented program. That is, columns
denote variables while rows denote observations.
Sometimes you may encounter data files that are row
oriented, that is rows denote variables while columns
denote observations. This is often the case when the
number of variables is significantly greater than the
number of observations.
To better accommodate these types of data files, the
following command was added
READ ROW <file.dat> Y
In this case, the file is read one row at a time
and each row is added as a Dataplot variable. Only
a single variable name is listed. Note that this
variable name serves as a "base name". So if Y is
the variable name and there are 25 rows of data,
variables Y1, Y2, ..., Y25 will be created by the
READ ROW command.
Currently, READ ROW is only supported for numeric
data. The rows do not need to contain the same
number of elements.
If the maximum number of available columns is reached,
the READ ROW command will be terminated. However,
any rows that have already been successfully read
will still be retained. If there is an error in reading
a specific row, that row will be skipped and Dataplot
will go to the next row.
Similarly, the following command has been added for
writing variables in a row-wise fashion
WRITE ROW <file.dat> <var-list>
where <var-list> is the list of variables to write.
The primary reason for adding the WRITE ROW command
is to make it easy to create a version of the
row oriented file that can be read using the
SET READ FORMAT command. This can significantly
speed up the READ ROW command at the expense of
creating larger data files.
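The READ ROW behavior described above can be modeled in Python as
follows. This is a sketch under stated assumptions, not Dataplot's
implementation: the function name read_rows and the max_columns
argument are hypothetical, and numbering the variables by successfully
read rows (rather than by file row) is an assumption.

```python
def read_rows(text, base_name, max_columns):
    """Turn each numeric row of text into a variable named base_name<k>."""
    variables = {}
    for line in text.splitlines():
        if len(variables) >= max_columns:
            break                  # column limit reached: keep what was read
        try:
            values = [float(token) for token in line.split()]
        except ValueError:
            continue               # error reading this row: skip to the next
        if not values:
            continue               # blank line: nothing to store
        # Rows may have different lengths; each becomes its own variable.
        variables[base_name + str(len(variables) + 1)] = values
    return variables
```

For example, a file with the rows "1 2 3", "4 x 6", and "7 8" read
with base name Y gives Y1 = (1, 2, 3) and Y2 = (7, 8) in this sketch,
since the second row cannot be read as numeric data.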
c) Added the following commands
WRITE1 <file-name> <var-list>
WRITE2 <file-name> <var-list>
WRITE3 <file-name> <var-list>
These commands allow writing to three distinct files in
append mode while still making normal use of the standard
WRITE command. Enter HELP WRITE for details on the usage
of these commands.
8) The IF command now supports the following for strings
IF S1 < S2
IF S1 <= S2
IF S1 > S2
IF S1 >= S2
where S1 and S2 are pre-defined strings. The comparison is based
on the ASCII collating sequence (so "A" is less than "a" and "C" is
less than "b"). The comparison is performed left to right. If
one string is shorter than the other, the shorter string will
return 0 for the ASCII code when its length has been exceeded.
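The comparison rule can be sketched in Python as follows. The function
name compare_strings is hypothetical; the logic follows the
description above (left to right, ASCII codes, with code 0 standing in
once the shorter string is exhausted).

```python
def compare_strings(s1, s2):
    """Return -1, 0, or 1 as s1 is less than, equal to, or greater than s2."""
    for i in range(max(len(s1), len(s2))):
        # Past the end of a string, use 0 as its character code.
        c1 = ord(s1[i]) if i < len(s1) else 0
        c2 = ord(s2[i]) if i < len(s2) else 0
        if c1 != c2:
            return -1 if c1 < c2 else 1
    return 0
```

Here compare_strings("A", "a") and compare_strings("C", "b") both
return -1, matching the examples above, and a string that is a proper
prefix of another compares as less than it.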
9) Added the following plot control command
...LABEL COORDINATES XCOOR YCOOR
10) Added the following SET commands
SET WRITE FEEDBACK <ON/OFF>
SET WORD DELIMITER <VALUE>
SET COMMAND SUBSTITUTION <ON/OFF>
SET SUBSTITUTION FORMAT <STRING>
SET CARRIAGE RETURN GAP <VALUE>
SET CLIPBOARD RUN CLEAR <ON/OFF>
11) Made the following updates to the SYSTEM command.
For the Windows 7/8/10 platforms, the SYSTEM command works
as follows:
i) A new terminal window is opened.
ii) The requested command is executed.
iii) When the requested command is completed, the new window is
closed and control returns to the Dataplot session.
Several SET commands have been added to control this behavior.
Specifically,
i) In some cases, it is desirable to leave the new window open.
For example, you may need to view the results from the
SYSTEM command. To specify that the new command window
should remain, enter the command
SET SYSTEM PERSIST ON
To reset the default, enter
SET SYSTEM PERSIST OFF
Note that control does not return to the Dataplot session
until the new terminal window is closed.
This command has no effect on Unix/Linux and MacOS platforms.
ii) In some cases, you may want the SYSTEM command to operate in
the background and not open a new terminal window. To specify
this, enter the command
SET SYSTEM HIDDEN ON
For Windows, Dataplot normally executes the SYSTEM command with
the SYSTEMQQ system call in the Intel Fortran library. If
HIDDEN is set to ON, Dataplot will use the EXECUTE_COMMAND_LINE
routine that was added in the Fortran 2008 standard. If you
compile Dataplot for Windows using an older version of the Intel
compiler or a non-Intel compiler, the EXECUTE_COMMAND_LINE
subroutine may or may not be available. If not, Dataplot will
revert to using SYSTEMQQ (and HIDDEN ON will have no effect).
To reset the default, enter
SET SYSTEM HIDDEN OFF
This command has no effect on Unix/Linux and MacOS platforms.
iii) By default, Dataplot uses "CALL SYSTEM" for Linux/Unix and
MacOS platforms and "CALL SYSTEMQQ" for Windows platforms.
You can request that Dataplot use the EXECUTE_COMMAND_LINE
subroutine instead by entering the command
SET QWIN SYSTEM EXECUTE COMMAND LINE
Be aware that this is a relatively new addition to the Fortran
standard and may not be available in all Fortran compilers (or
in older versions of compilers). Specifically, it is not
supported before version 5.x of gfortran (Linux, MacOS) or
version 16 of the Intel compiler. So this option may not be
available on all platforms. The Windows executable for 2019/03
that can be downloaded from the Dataplot web site uses version 17
of the Intel compiler, so this feature is available. If you are
running Dataplot on a platform where EXECUTE_COMMAND_LINE is not
available, Dataplot will revert to "CALL SYSTEM" or
"CALL SYSTEMQQ".
One advantage of using EXECUTE_COMMAND_LINE is that it supports
either synchronous or asynchronous execution. By synchronous, we
mean that control does not return to the Dataplot session until
the SYSTEM command completes execution. By asynchronous, we
mean that control returns to the Dataplot session after the
SYSTEM command is initiated (but not necessarily completed).
To specify asynchronous, enter
SET COMMAND LINE EXECUTE WAIT ON
To restore the default of synchronous, enter
SET COMMAND LINE EXECUTE WAIT OFF
Note that this command only applies if EXECUTE_COMMAND_LINE is
used to implement the SYSTEM command.
12) Added the command
PRINTFILE <filename>
This command can be used to print an ASCII file from within a
Dataplot session.
Also, added the command
COPY SYSTEM <file1> <file2>
Without the SYSTEM option, the COPY command works by reading lines
from <file1> and writing them to <file2>. With the
SYSTEM option, the COPY is implemented by using an appropriate
operating system command ("COPY" for Windows platforms and "cp"
for Linux/Unix and MacOS platforms).
13) Miscellaneous
i) If you enter a command like
Y = NORMAL RANDOM NUMBERS FOR I = 1 1 100
Dataplot would previously return an error. Dataplot was
updated so that if a command is not matched, the command
does not start with LET, and the command contains an
"=" character, Dataplot will insert a "LET " at the
beginning of the command string and try to match the command
again.
Note that using "LET" to start the command is still the
preferred syntax since this new syntax is only attempted if
the first word of the command line does not match a
Dataplot command. For example, "X", "R", "S", and "W"
are short-cuts to existing commands, so the following
will not work
X = X + 1
W = W + 1
R = R + 1
S = S + 1
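The rule in item i can be modeled with a short Python sketch. The
function name and the FIRST_WORD_COMMANDS set are hypothetical, and
the set contains only the short-cuts named above; Dataplot checks the
first word against its full command table.

```python
# Hypothetical model of the implicit LET rule; not Dataplot source code.
# Only the short-cut commands named in the text are listed here.
FIRST_WORD_COMMANDS = {"X", "R", "S", "W"}


def resolve_assignment(line):
    """Return the command line after the implicit LET rule is applied."""
    stripped = line.strip()
    words = stripped.split()
    if not words:
        return stripped
    first = words[0].upper()
    if first == "LET" or "=" not in stripped:
        return stripped            # already a LET, or not an assignment
    if first in FIRST_WORD_COMMANDS:
        return stripped            # first word matches a command: no retry
    return "LET " + stripped
```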
ii) Added the command
SET TAB EXPAND <VALUE>
By default, when Dataplot parses a command line it converts
all non-printing characters to a single space. This command
allows you to specify how tab characters should be handled.
If <VALUE> is 0, then the tab character is left as is.
If <VALUE> is 1, then the tab character is replaced with
a single space (i.e., the default Dataplot behavior). If
<VALUE> is a positive integer greater than 1, then the tab
character will be replaced with <VALUE> spaces. If
<VALUE> is negative, it will be set to 1. If
<VALUE> is greater than 20, it will be set to 20.
In most cases, the default behavior should be preferred. This
command is most likely to be useful when processing tabs
contained within strings.
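The handling of <VALUE> described above can be sketched in Python.
The function name expand_tabs is hypothetical; the clamping to the
range 1 to 20 and the special case for 0 follow the text.

```python
def expand_tabs(line, value):
    """Replace tab characters according to the SET TAB EXPAND rules."""
    if value < 0:
        value = 1                  # negative values are set to 1
    elif value > 20:
        value = 20                 # values above 20 are set to 20
    if value == 0:
        return line                # 0 leaves the tab character as is
    return line.replace("\t", " " * value)
```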
iii) When writing text on a plot (e.g., the TEXT, LEGEND, TITLE,
LABEL commands), support was added for "tabs". Note that
tabs are denoted by a "TAB()" sequence, not a hard tab
character. This is supported with the following SET commands
SET TAB HORIZONTAL POSITION <INDEX> <VALUE>
SET TAB VERTICAL POSITION <INDEX> <VALUE>
SET TAB COLOR <INDEX> <VALUE>
SET TAB JUSTIFICATION <INDEX> <VALUE>
SET TAB FONT <INDEX> <VALUE>
SET TAB UNITS <INDEX> <VALUE>
SET TAB VERTICAL UNITS <INDEX> <VALUE>
SET TAB SIZE <INDEX> <VALUE>
SET TAB WIDTH <INDEX> <VALUE>
This feature is most useful in the context of creating
table type text on a plot.
iv) The SET POSTSCRIPT CONVERT option was updated so that a
DEVICE 2 CLOSE (or DEVICE 3 CLOSE) command is no longer required
before exiting Dataplot.
v) Dataplot supports a built-in editor with the FED (or EDIT)
command. The built-in editor is a line-mode editor. You can
now specify the name of an editor of your choice by entering
the command
SET EDITOR <name>
For example, on Windows platforms you can use
SET EDITOR NOTEPAD
SET EDITOR WORDPAD
On Linux, you can use
SET EDITOR vi
SET EDITOR emacs
If your desired editor is not in the default search path, then
you need to include the complete path. For example,
SET EDITOR "C:\Program Files (x86)\notepad++\notepad++.exe"
vi) Changed the default seed for random numbers from 305 to 3005.
vii) Added CALL CLIPBOARD and CB as synonyms for CLIPBOARD RUN.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
June 2018 - July 2018 (contains updates for the 2018/08/03 version)
-----------------------------------------------------------------------
1) Made the following changes to the graphics commands.
a) Added the following options to the KERNEL DENSITY PLOT command
SET KERNEL DENSITY PROBABILITY FUNCTION <pdf/cdf/ppf>
SET KERNEL DENSITY RANDOM NUMBERS <value>
The KERNEL DENSITY PLOT estimates the underlying probability
density function. However, it can also be used to estimate
the cumulative distribution function (cdf) or the percent point
function (ppf). To estimate the cdf, the cumulative integral of
the kernel density plot is computed. The ppf is the inverse of the
cdf, so the roles of the x and y values from the estimated cdf are
switched to obtain an estimate of the ppf.
To plot the estimated cdf, enter
SET KERNEL DENSITY PROBABILITY FUNCTION CDF
To plot the estimated ppf, enter
SET KERNEL DENSITY PROBABILITY FUNCTION PPF
Given that we can estimate the ppf function, we can use this to
generate random numbers based on the kernel density plot. If you
would like to generate random numbers, enter a value between 1 and
the maximum number of rows for the SET KERNEL DENSITY RANDOM NUMBERS
command (if this value is set to 0 or a negative value, no random
numbers will be generated).
Specifically, the following procedure is used:
i) Generate uniform random numbers (the uniform random numbers
correspond to x-axis values on the ppf version of the kernel
density plot).
ii) From the ppf version of the kernel density plot, determine
the y-axis value on the kernel density curve that corresponds
to the x-axis value. Cubic spline interpolation is used to
estimate the y-axis value. That is, at the points defined
by the uniform random numbers, we find interpolated values
based on the (x,y) coordinates of the kernel density curve.
iii) The random numbers are written to the file dpst1f.dat.
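The procedure above (density, cumulative integral, swap of x and y,
interpolation at uniform random numbers) can be sketched in Python.
This is a simplified illustration, not Dataplot's algorithm: it uses
a Gaussian kernel with a caller-supplied bandwidth, a trapezoid rule
for the cumulative integral, and linear interpolation in place of the
cubic spline interpolation described above. All names are
hypothetical, and the values are returned rather than written to
dpst1f.dat.

```python
import bisect
import math
import random


def kde_random_numbers(data, n_random, bandwidth, n_grid=256, seed=0):
    """Draw n_random values from a Gaussian kernel density estimate."""
    if n_random <= 0:
        return []                  # 0 or negative: no random numbers
    lo = min(data) - 3.0 * bandwidth
    hi = max(data) + 3.0 * bandwidth
    step = (hi - lo) / (n_grid - 1)
    xs = [lo + i * step for i in range(n_grid)]
    norm = len(data) * bandwidth * math.sqrt(2.0 * math.pi)
    pdf = [sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data) / norm
           for x in xs]
    # Cumulative trapezoid integral of the density gives the cdf.
    cdf = [0.0]
    for i in range(1, n_grid):
        cdf.append(cdf[-1] + 0.5 * (pdf[i - 1] + pdf[i]) * step)
    cdf = [c / cdf[-1] for c in cdf]   # rescale so the last value is 1
    rng = random.Random(seed)
    out = []
    for _ in range(n_random):
        u = rng.random()           # x-axis value on the ppf curve
        j = min(max(bisect.bisect_right(cdf, u), 1), n_grid - 1)
        c0, c1 = cdf[j - 1], cdf[j]
        t = 0.0 if c1 == c0 else (u - c0) / (c1 - c0)
        # Swapping the roles of x and cdf yields the ppf value.
        out.append(xs[j - 1] + t * (xs[j] - xs[j - 1]))
    return out
```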
2) Made the following changes to the analysis commands.
a) Updated the LIMITS OF DETECTION command to accept negative values.
Negative data is handled by adding a constant to make the data all
positive. This constant is then subtracted off for the computed
critical value.
3) Added the following LET math subcommands
LET YRANK = MEAN RANK Y XSEQ X1 ... XK
Y is a response variable, X1 ... XK are group-id variables (up to
six supported), and XSEQ is a sequence number variable.
This command performs cross-tabulations based on the group-id
variables and then ranks the Y values for each cell of the
cross-tabulation. For a given sequence number, the average rank over
all cross-tabulation cells is returned.
The sequence variable is used so that not all cross-tabulation cells
need have the same number of values and also so that it is not
required that the Y values be "ordered" (the sequence number defines the
ordering) within a cell.
4) Added the following LET statistic subcommands
LET YCOV = WEIGHTED COVARIANCE Y1 Y2 W
LET YCORR = WEIGHTED CORRELATION Y1 Y2 W
Also, modified the definition of ANGULAR COSINE DISTANCE and
ANGULAR COSINE SIMILARITY. Specifically, the corrected formula is
Angular Cosine Distance = AFACT*ARCCOS(COSINE SIMILARITY)/PI
Angular Cosine Similarity = 1 - Angular Cosine Distance
where AFACT is 2 if there are no negative values and 1 if there
are negative values. ARCCOS is the arccosine function.
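The corrected formulas can be written out in Python as a check. The
function names are hypothetical, and treating "negative values" as a
negative element in either input vector is an assumption about how
AFACT is chosen.

```python
import math


def cosine_similarity(y1, y2):
    """Cosine of the angle between the vectors y1 and y2."""
    dot = sum(a * b for a, b in zip(y1, y2))
    norm1 = math.sqrt(sum(a * a for a in y1))
    norm2 = math.sqrt(sum(b * b for b in y2))
    return dot / (norm1 * norm2)


def angular_cosine_distance(y1, y2):
    """AFACT*ARCCOS(cosine similarity)/PI, as given above."""
    # Clamp to [-1, 1] so rounding error cannot break acos.
    sim = max(-1.0, min(1.0, cosine_similarity(y1, y2)))
    # Assumption: AFACT is 1 if either vector has a negative element.
    has_negative = any(v < 0 for v in list(y1) + list(y2))
    afact = 1.0 if has_negative else 2.0
    return afact * math.acos(sim) / math.pi


def angular_cosine_similarity(y1, y2):
    """One minus the angular cosine distance."""
    return 1.0 - angular_cosine_distance(y1, y2)
```

With this scaling the distance ranges from 0 to 1 in both cases:
non-negative vectors are at most a right angle apart (ARCCOS at most
PI/2, scaled by 2), while vectors with negative values can be up to
PI apart (scaled by 1).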
5) Added the following LET string subcommands
LET IX = REFERENCE CHARACTER CODE IX IG
This syntax allows you to specify the character string to numeric
value mapping when converting character fields to numeric values.
6) The following updates were made to the IO commands
a) The PRINT command now supports group labels, row labels, and
character variables.
b) SET PRINT FORMAT is now an alias to SET WRITE FORMAT.
c) Added the option
SET CONVERT CHARACTER CATEGORICAL
This is similar to SET CONVERT CHARACTER ON. However, in
addition to creating the character variable, it will create
a numeric variable that converts the unique values of the
character field to an integer code. Currently, the code values
are assigned in the order that the unique values are detected
in the file. Up to 1,000 levels are supported (if more than
1,000 levels are required, the remaining levels are all set
to "-1").
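The coding scheme can be sketched in Python. The function name is
hypothetical, and starting the codes at 1 is an assumption (the text
states only that codes follow the order in which unique values are
detected and that levels beyond the limit are set to -1).

```python
def categorical_codes(values, max_levels=1000):
    """Map each character value to an integer code in order of appearance."""
    codes = {}
    result = []
    for value in values:
        if value not in codes:
            if len(codes) < max_levels:
                codes[value] = len(codes) + 1   # assumption: codes start at 1
            else:
                codes[value] = -1               # beyond the supported levels
        result.append(codes[value])
    return result
```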
d) Added the command
SET ROW LABEL COLUMN <ival>
If the SET CONVERT CHARACTER option is set either ON or
CATEGORICAL, this command allows you to specify a column to be
treated as a row label. Row labels are typically the first
column, but you are not restricted to that. If the specified
column is a numeric column, this command has no effect.
e) Made the following updates to the STREAM READ command.
i) Added the cross-tabulation option
STREAM READ CROSS TABULATE <file> <var-list>
You can specify from one to four cross-tabulation variables
with the commands
SET STREAM READ CROSS TABULATE VARIABLE ONE <name>
SET STREAM READ CROSS TABULATE VARIABLE TWO <name>
SET STREAM READ CROSS TABULATE VARIABLE THREE <name>
SET STREAM READ CROSS TABULATE VARIABLE FOUR <name>
With this syntax, a set of nine statistics will be computed
for each cross-tabulation cell (count, minimum, maximum,
range, mean, standard deviation, skewness, kurtosis, number
of missing values).
ii) For the WRITE and CROSS TABULATE cases, character variables
will be automatically converted to categorical variables. For
the GROUP STATISTIC, DEFAULT STATISTIC, and FULL STATISTIC
cases, character variables will still be ignored.
If you would like to save the character strings from the
character variables as group labels, enter the command
SET STREAM READ GROUP LABEL ON
If you have specified a column to be the row label variable
using the SET ROW LABEL COLUMN command, this variable will
not be converted to a numeric variable and group labels will
not be created.
iii) When computing statistics, missing values (as specified by the
SET READ MISSING VALUE command) will now be omitted.
iv) Added the options
STREAM READ EUCLIDEAN DISTANCE <file> <var-list>
STREAM READ MANHATTAN DISTANCE <file> <var-list>
STREAM READ CHEBYCHEV DISTANCE <file> <var-list>
STREAM READ COSINE DISTANCE <file> <var-list>
STREAM READ COSINE SIMILARITY <file> <var-list>
STREAM READ ANGULAR COSINE DISTANCE <file> <var-list>
STREAM READ ANGULAR COSINE SIMILARITY <file> <var-list>
STREAM READ COVARIANCE <file> <var-list>
STREAM READ CORRELATION <file> <var-list>
With this option, the STREAM READ will return a distance,
covariance, or correlation matrix. The raw data is not
saved. For example,
STREAM READ CORRELATION FILE.DAT Y1 Y2 Y3
will return the 3 variables Y1, Y2, and Y3 where each of
these variables will contain 3 rows. For example, Y2(3)
contains the correlation between the second response column
and the third response column.
This syntax will ignore character fields. If you do not
want some fields in the file to be included, you can do
something like
LET ITYPE = DATA 1 1 1 0 1
SET STREAM READ VARIABLE TYPE ITYPE
These commands specify that fields 1, 2, 3, and 5 will be
included while field 4 will be excluded. This can be useful
if some of the fields are categorical variables where
distance and covariance/correlation do not make sense.
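The shape of the result for the CORRELATION case, together with the effect of the 0/1 variable type flags, can be sketched in Python as follows. This is an illustration only, not Dataplot's streaming algorithm. The function names are hypothetical and the data are held in memory as a list of columns.

```python
import math


def correlation(x, y):
    """Pearson correlation of two equal-length numeric columns."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns, include_flags=None):
    """Return a K x K correlation matrix for the included columns.

    include_flags plays the role of the ITYPE variable above:
    1 keeps a field and 0 drops it.
    """
    if include_flags is None:
        include_flags = [1] * len(columns)
    kept = [col for col, flag in zip(columns, include_flags) if flag == 1]
    return [[correlation(a, b) for b in kept] for a in kept]
```

With five fields and flags 1 1 1 0 1, the result is a 4 x 4 matrix in which entry (i, j) is the correlation between the i-th and j-th included fields.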
f) For comma delimited files, the READ command will no longer treat
spaces as delimiters (i.e., spaces in character fields should be
read correctly even if the field is not contained in quotes).
In order to accommodate this change, the comma is no longer the
default read delimiter. If you have a comma delimited file, you
should enter the command
SET READ DELIMITER ,
before reading the data file. If you subsequently need to read
a space delimited data file, it is recommended that you enter the
command
SET READ DELIMITER
This resets the read delimiter to the space character.
7) The following miscellaneous changes were made
a) Made the following tweaks to the STATUS command
i) STATUS V now prints the number of variables currently
assigned and the maximum number of variables allowed.
ii) STATUS F <string> prints the function/string names starting with
<string>. For example, STATUS F ST will print all function/string
names starting with ST.
iii) For Linux/Unix platforms, the DATE (or TIME) command now
uses the Fortran 90 standard DATE_AND_TIME subroutine
(previously it used the Unix specific "fdate" function).
This change makes the DATE command consistent across
Windows and Linux/Unix platforms.
8) Fixed some bugs.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
October 2017 - May 2018 (includes updates for the 2018/06/12 version).
-----------------------------------------------------------------------
1) The following device driver was added
device <1/2/3> CAIRO <device>
Cairo is a general purpose 2D vector graphics library. Currently,
Dataplot supports the X11, Postscript, encapsulated Postscript,
PDF, Scalable Vector Graphics (SVG), and PNG devices through
the Cairo device driver. Enter HELP CAIRO for details.
This driver is still somewhat experimental.
2) The following changes were made to the Graphics commands
a) Added the following options to the I PLOT command
COEFFICIENT OF VARIATION CONFIDENCE LIMIT PLOT Y X
COEFFICIENT OF DISPERSION CONFIDENCE LIMIT PLOT Y X
COEFFICIENT OF QUARTILE DISPERSION CONFIDENCE LIMIT PLOT Y X
DIFFERENCE OF MEANS CONFIDENCE LIMIT PLOT Y1 Y2 X
DIFFERENCE OF PROPORTIONS CONFIDENCE LIMIT PLOT Y1 Y2 X
CORRELATION CONFIDENCE INTERVAL PLOT Y1 Y2 X
b) For the fluctuation plot, uncertainty intervals are now
supported for the difference of means and the difference
of binomial proportions statistics.
c) Added the following option to the STATISTIC PLOT command
<stat> TAG PLOT Y X TAG
The X variable is used to identify the groups for the
purpose of computing the statistic. The TAG variable can
be used to give different plot attributes to different
groups. For example, you may want to identify "outlying"
groups.
d) Added the command
DEX ORDER PLOT Y X1 ... XK
This command is primarily used by the DEXODP.DP macro that is
part of the 10-step analysis for 2-level full and fractional
factorial designs.
3) The following changes were made to the Analysis commands
a) Modified the methods for the PROPORTION CONFIDENCE LIMITS
command. Enter HELP PROPORTION CONFIDENCE LIMITS for details.
b) Modified the methods for the DIFFERENCE OF PROPORTION CONFIDENCE
LIMITS command. Enter HELP DIFFERENCE OF PROPORTION CONFIDENCE
LIMITS for details.
c) For the CALIBRATION command, added two columns to the output to
show the expanded error and the coverage factor in addition to
the standard error.
d) Added the commands
COEFFICIENT OF DISPERSION CONFIDENCE LIMITS Y
COEFFICIENT OF QUARTILE DISPERSION CONFIDENCE LIMITS Y
e) Added an option to the SD CONFIDENCE LIMITS command to
support Bonett's intervals for non-normal data.
f) A few enhancements were made to the normal tolerance limits.
i) For two-sided intervals, added the command
SET TOLERANCE LIMITS METHOD <HOWE/WALD WOLFOWITZ>
The HOWE option uses the Howe approximation. The
WALD WOLFOWITZ option uses an approximation given
by Gardiner to the Beatty method (which implements
the method suggested by Wald and Wolfowitz).
The default is HOWE. Note that prior versions of
Dataplot are based on the Gardiner approximation.
ii) Guenther suggested a correction to the Howe method.
To apply the Guenther correction, use the command
SET GUENTHER CORRECTION ON
iii) For one-sided intervals, added the command
SET TOLERANCE LIMITS ONE SIDED METHOD
<NONCENTRAL T/NORMAL/DEFAULT>
Using NONCENTRAL T uses an approximation based on the
non-central t percent point function while NORMAL uses an
approximation that only requires the normal percent point
function. Although the non-central t approximation is
considered more accurate, the non-central t percent point
function can lose accuracy for large sample sizes. The
DEFAULT option will use the non-central t approximation
for sample sizes of 100 or less and the normal approximation
for sample sizes greater than 100. Previous versions of
Dataplot used the non-central t approximation.
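The DEFAULT selection rule for one-sided intervals, and one well-known normal-based approximation for the one-sided tolerance factor, can be sketched in Python as follows. The approximation shown is the one given in the NIST/SEMATECH e-Handbook of Statistical Methods. Whether the NORMAL option uses exactly this form is an assumption, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def default_one_sided_method(n):
    """DEFAULT rule: non-central t for n <= 100, normal approximation above."""
    return "NONCENTRAL T" if n <= 100 else "NORMAL"


def one_sided_k_normal_approx(n, coverage, confidence):
    """Approximate one-sided normal tolerance factor k.

    Only the normal percent point function is required. The formula is
    intended for moderate to large n (it breaks down for very small n).
    """
    z_p = NormalDist().inv_cdf(coverage)
    z_g = NormalDist().inv_cdf(confidence)
    a = 1.0 - z_g ** 2 / (2.0 * (n - 1))
    b = z_p ** 2 - z_g ** 2 / n
    return (z_p + math.sqrt(z_p ** 2 - a * b)) / a
```

The one-sided limit is then the sample mean plus or minus k times the sample standard deviation. As n grows, k shrinks toward the normal percent point for the requested coverage.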
4) The following enhancement was made to the DRAW command.
a) The DRAW command now supports variable arguments. That is,
DRAW X1 Y1 X2 Y2
where X1, Y1, X2, and Y2 are variables rather than parameters.
Note that the DRAW command can accept a mix of parameter and
variable names. However, all variables must have the same
number of rows.
If there are N rows in the variables, then N separate lines are
drawn (i.e., each row of the variable is drawn as a separate line).
When parameter names are mixed with variable names, the parameter
values will be used for all N lines.
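The way variables and parameters combine for DRAW can be sketched in Python as follows. This is an illustration of the row-by-row expansion only. The function name is hypothetical, with lists standing in for variables and plain numbers for parameters.

```python
def expand_draw_arguments(x1, y1, x2, y2):
    """Return one (x1, y1, x2, y2) tuple per line to be drawn.

    Lists/tuples play the role of variables, scalars the role of
    parameters. A scalar is repeated for every row.
    """
    args = [x1, y1, x2, y2]
    lengths = {len(a) for a in args if isinstance(a, (list, tuple))}
    if len(lengths) > 1:
        raise ValueError("all variables must have the same number of rows")
    n = lengths.pop() if lengths else 1
    columns = [list(a) if isinstance(a, (list, tuple)) else [a] * n
               for a in args]
    return list(zip(*columns))
```

For example, two variables with three rows each combined with two parameters yield three line segments that share the parameter coordinates.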
b) Added the following command
DRAW SYMBOL XPOS YPOS TAG
where XPOS, YPOS, and TAG are variables. XPOS and YPOS define
the x and y coordinates. The TAG variable contains index values
(from 1 to 100) for the CHARACTER and associated character
attribute commands. For example,
LET XPOS = DATA 35 55
LET YPOS = DATA 40 55
LET TAG = DATA 1 2
CHARACTER - +
CHARACTER COLOR RED BLUE
DRAW SYMBOL XPOS YPOS TAG
will draw a "-" symbol in red at position (35,40) and a "+" symbol
at position (55,55) in blue.
If one of the arguments is a parameter rather than a variable, the
parameter value will be used for all rows.
You can specify screen or data coordinates in the standard way
(e.g., DRAWDATA, DRAWSDSD).
These commands were motivated to provide performance improvements to
the 10-step macros for 2-level full and fractional factorial designs.
However, they may have use outside of that context. Specifically,
if you have a number of symbols to add to a plot, using the
DRAW SYMBOL command may be significantly faster than a series of
MOVE and TEXT commands.
5) Added the following LET Statistics commands
LET A = DIFF OF BINOMIAL PROPORTIONS LOWER CONFIDENCE LIMIT Y1 Y2
LET A = DIFF OF BINOMIAL PROPORTIONS UPPER CONFIDENCE LIMIT Y1 Y2
LET A = LOWER COEFFICIENT OF DISPERSION CONFIDENCE LIMIT Y
LET A = UPPER COEFFICIENT OF DISPERSION CONFIDENCE LIMIT Y
LET A = LOWER ONESIDED COEFFICIENT DISPERSION CONFIDENCE LIMIT Y
LET A = UPPER ONESIDED COEFFICIENT DISPERSION CONFIDENCE LIMIT Y
LET A = LOWER COEFFICIENT OF QUARTILE DISPERSION CONFIDENCE LIMIT Y
LET A = UPPER COEFFICIENT OF QUARTILE DISPERSION CONFIDENCE LIMIT Y
6) The following new LET subcommands were added:
LET Y2 = HERMITE DERIVATIVE Y X X2
LET XNEW = CODE DEX X
LET COREFAC = DEX CORE X1 X2 ... XK
LET CONFTAG1 CONFTAG2 = DEX CONFOUND X1 X2 ... XK
LET IFLAG = DEX CHECK CLASSIC X1 X2 ... XK
The four DEX commands above are primarily used by the macros for the
10-step analysis of full and fractional 2-level factorial designs.
7) The following new string commands were added
LET SOUT = STRING COMBINE S1 ... SK
LET SOUT = STRING COMPARE AND REPLACE SOLD SREPLACE SC1 TO SCK
LET SOUT = STRING INTERACTION J1 ... JK
The STRING COMBINE command is similar to the STRING CONCATENATE, but
there are a few differences. Enter HELP STRING COMBINE for details.
The STRING COMPARE AND REPLACE command is used by the EST.DP macro
(part of the 10-step macros for 2-level full and fractional factorial
designs). This command was added for performance reasons. It is not
anticipated that this command will be used outside of the EST.DP macro.
The STRING INTERACTION command was also added with the EST.DP macro in
mind. However, it did not in fact improve performance so it was
ultimately not used there.
8) The following new plot control subcommands were added:
a) Added the command
CHARACTER UNITS <val>
where <val> can be one of DD, DS, SD, or SS. The D means data
units of the current plot and S means 0 to 100 screen units.
The first character refers to the x-axis coordinate and the
second character refers to the y-axis coordinate.
9) The following enhancements were made to the SEARCH, WEB HELP,
and WEB HANDBOOK commands.
a) Added the options
SEARCH REFERENCE <string>
SEARCH HANDBOOK <string>
SEARCH REFERENCE will search the file refman.tex (this is the
file used by the WEB HELP command). Likewise, SEARCH HANDBOOK
searches the file handbk.tex (this is the file used by the
WEB HANDBOOK command).
b) The SEARCH command was updated to support multi-word
matches. For example, previously
SEARCH <file> MEAN PLOT would print all lines containing
the word MEAN. It will now only list lines that contain the
words MEAN PLOT. Note that the words MEAN PLOT must appear
contiguously (i.e., as a single phrase) on the line.
It does not do a separate search for MEAN and then for
PLOT. For example, "MEAN OF THE PLOT" would not be a
match for "MEAN PLOT".
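The phrase matching rule can be sketched in Python as follows. This is an illustration only. The function name is hypothetical, and both the case-insensitive comparison and the collapsing of repeated blanks are assumptions.

```python
def matching_lines(lines, phrase):
    """Return the lines that contain the search words as one contiguous phrase.

    Assumptions: matching ignores case and runs of blanks are treated
    as a single space. This is a plain substring test.
    """
    target = " ".join(phrase.upper().split())
    return [line for line in lines
            if target in " ".join(line.upper().split())]
```

A line such as "MEAN OF THE PLOT" is rejected because the two words are not adjacent, while "Bootstrap mean plot" is accepted.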
c) The following synonyms were added.
? is a synonym for SEARCH REFERENCE
SEARCH RM is a synonym for SEARCH REFERENCE
SEARCH is a synonym for SEARCH REFERENCE
?? is a synonym for WEB HELP
??? is a synonym for SEARCH HANDBOOK
SEARCH HB is a synonym for SEARCH HANDBOOK
SEARCH HANDBK is a synonym for SEARCH HANDBOOK
???? is a synonym for WEB HANDBOOK
HANDBOOK is a synonym for WEB HANDBOOK
HB is a synonym for WEB HANDBOOK
WHB is a synonym for WEB HANDBOOK
W is a synonym for WEB
????? is a synonym for WEB SEARCH
WS is a synonym for WEB SEARCH
SEARCH DIR is a synonym for SEARCH DIRECTORY
SEARCH DIRE is a synonym for SEARCH DIRECTORY
SEARCH DIC is a synonym for SEARCH DICTIONARY
SEARCH DICT is a synonym for SEARCH DICTIONARY
d) The following SET command was added
SET WEB SEARCH DATAPLOT ON
When this switch is set to ON, the keyword DATAPLOT will be added
to the search. This is useful if you are primarily using the
WEB SEARCH command to locate Dataplot documentation.
The default is OFF.
10) The macros for performing the 10-step analysis for 2-level full and
fractional factorial designs were extensively rewritten to improve
the performance.
Note that the new macros require Dataplot executables built with the
May, 2018 source code as they incorporate several of the new
commands described above.
11) The following updates were made to the READ command.
a) Corrected the TO syntax when character variables are being
read.
b) If an error is encountered when reading a line, terminate the
READ immediately rather than continuing to additional lines.
12) The STATUS command was updated in the following ways.
a) For variables, print the number of assigned variables and the
maximum number of variables allowed.
b) Add row labels, group labels, and character variables.
13) The following miscellaneous changes were made.
a. SHOW was added as a synonym for PSVIEW.
b. LIST now uses SET LIST LINES rather than SET HELP LINES to specify the
number of lines to list before prompting to continue.
The number of columns for the list was increased from a maximum of
80 to 240.
c. For Windows platforms, when using SET POSTSCRIPT CONVERT PDF
within a CAPTURE HTML, the PDF file will now use the "embed"
tag rather than providing a link to the PDF file.
d. The method for passing arguments with the CALL command has been
updated. Previously, both positional and named arguments were
supported. The following enhancements were added:
i. The argument list can be enclosed in parentheses. Note that
the use of parentheses is optional. You can optionally
include one or more spaces between the arguments and the
parentheses.
ii. For named arguments with quoted values, you can include
only the value of the named argument in quotes. That is,
file="c:\my data\test.txt"
Previously, this had to be entered as
"file=c:\my data\test.txt"
If the argument name is not inside the quotes, then you
cannot have spaces around the equal sign. If the argument
name is inside the quotes, then spaces around the equal sign
are optional.
iii. Commas can optionally be used as an argument delimiter. You
can mix the use of spaces and commas as the delimiter. For
example
call test.dp zx=x,zy=y zz=z
Although this is allowed, it is recommended that if you use
commas that you do so consistently. That is,
call test.dp zx=x,zy=y,zz=z
iv. Previously, calling a macro without arguments would clear
the current command arguments. This was changed so that
a CALL command without arguments will not modify the current
command list arguments.
Specifically, the following are all acceptable ways to enter the
same argument list.
Previously supported:
call test.dp y "for i = 1 1 50" x
call test.dp zy=y "target=for i = 1 1 50" x
New formats:
call test.dp y,"for i = 1 1 50",x
call test.dp zy=y,"target=for i = 1 1 50",x
call test.dp (y,"for i = 1 1 50",x)
call test.dp ( y, "for i = 1 1 50",x )
call test.dp (zy=y, "target=for i = 1 1 50",zx=x)
call test.dp ( zy=y, "target=for i = 1 1 50",zx=x )
call test.dp (zy=y, target="for i = 1 1 50",zx=x)
call test.dp ( zy=y, target="for i = 1 1 50",zx=x )
The choice of which syntax to use is primarily a matter of personal
preference.
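The equivalence of the argument formats listed above can be sketched with a small Python parser. This is an illustration only and not Dataplot's parser. The function name is hypothetical, and the rule used to tell a named argument from a quoted positional argument (the text before the first equal sign must be a single word) is an assumption.

```python
def parse_call_arguments(text):
    """Split a CALL argument list into positional and named arguments."""
    text = text.strip()
    # The enclosing parentheses are optional.
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]

    # Spaces and commas delimit arguments except inside double quotes.
    tokens, current, in_quotes = [], [], False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        elif ch in ", " and not in_quotes:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))

    positional, named = [], {}
    for token in tokens:
        name, sep, value = token.partition("=")
        name = name.strip()
        # Assumption: a single word before "=" marks a named argument.
        if sep and name and " " not in name:
            named[name] = value.strip()
        else:
            positional.append(token)
    return positional, named
```

Under these assumptions, target="for i = 1 1 50" and "target=for i = 1 1 50" both yield the named argument target with the value for i = 1 1 50, while the quoted string "for i = 1 1 50" on its own stays positional.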
e. The following commands were added
SET HYPHEN COMMAND LINE
SET COMMA COMMAND LINE
SET EQUAL COMMAND LINE
These commands specify whether hyphens, commas, and equal signs
are treated as word delimiters when parsing Dataplot commands.
These settings are typically used internally by Dataplot for
certain commands (specifically, they are used to support the
various syntax for arguments to the CALL command), but they may
occasionally be useful for your own use.
For example, you can do something like
SET COMMA COMMAND LINE ON
PLOT Y,X
f. Several tweaks were made to the IF command.
i. You can now have up to 10 AND, OR, or XOR clauses on a
single IF command.
ii. When A does not exist for the IF A = ... command, set the IF
status to false and do not signal an error condition.
iii. The following syntax is now supported
IF 3 > 2
IF 3 > A
That is, the left hand side of the logical operator can
now be a number as well as a parameter.
iv. When testing strings, if the left hand side of the logical
operator is enclosed in quotes, treat this as a literal
string.
v. Added the IF COMMAND LINE ARGUMENT ... NOT EXIST command
(the EXIST version was added 2016/10).
vi. Added better feedback for some of the special cases.
g. Changed the default random number generator from FIBONACCI to
FIBONACCI CONGRUENTIAL.
h. The PRINT command was updated to accommodate group labels. You
can use PRINT GROUP LABELS to print all group labels or you
can include group labels in the variable list for the PRINT
command. Group labels are created with the LET ... = GROUP LABEL
command.
i. The maximum number of group label variables was increased from 5
to 20.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
October 2016 - September 2017.
-----------------------------------------------------------------------
1) Updates to the graphics commands
a) Added the commands
EMPIRICAL QUANTILE PLOT Y
QUANTILE BOX PLOT Y
TRUNCATED INFORMATIVE QUANTILE PLOT Y
These commands were added to support the MIL-Handbook 17 standard.
b) Added the command
BLAND ALTMAN PLOT
c) Added the command
POINCARE PLOT
d) Added the command
NORMAL KERNEL DENSITY MIXTURE PLOT
e) Added the command
SET BLOCK PLOT JITTER value
This command can be used to add some random jitter to the
x-coordinate of the plot character of the block plot. This can
be useful when the plot characters overlap vertically.
f) The PSVIEW command allows you to view the current plot (i.e.,
the contents of the DEVICE 3 file) without exiting Dataplot.
This command was updated so that you can view the contents of
the DEVICE 2 file. You can also view an arbitrary Postscript
file. Enter HELP PSVIEW for details.
g) Added the following commands
SET CONSENSUS MEAN PLOT OMIT LABS <list of lab-id's>
SET CONSENSUS MEAN PLOT OMIT METHOD ONE <method>
SET CONSENSUS MEAN PLOT OMIT METHOD TWO <method>
SET CONSENSUS MEAN PLOT OMIT METHOD THREE <method>
The SET CONSENSUS MEAN PLOT OMIT LABS command allows you to
specify from one to ten labs that will be omitted from the
consensus mean plot. Note that the omitted labs will still be
included in the computation of the consensus means and
uncertainties, they just will not be included in the plot of the
lab data. This can be useful when there are extreme outliers in
the lab data as the outlying lab can make it difficult to see
differences in the non-outlying labs.
Similarly, the SET CONSENSUS MEAN PLOT OMIT METHOD command
allows you to specify a method (e.g., VANGEL RUKHIN) that
will be omitted from the consensus mean plot. The method will
be included in the computation of the consensus means and
uncertainties, but it will not be displayed on the plot. As
with omitting labs, this can be helpful in the case of extreme
outliers where the results from a particular method might
result in poor resolution for the plot.
2) Updates to the analysis commands
a) Added the following commands for cluster analysis
K MEANS Y1 ... YK
NORMAL MIXTURE CLUSTER Y1 ... YK
K MEDOID Y1 ... YK
FANNY Y1 ... YK
AGNES Y1 ... YK
DIANA Y1 ... YK
The K MEANS command performs a k-means cluster analysis and the
NORMAL MIXTURE CLUSTER command performs a clustering based on
Hartigan's mixture of normal distributions.
The K MEDOID command performs a k-medoids cluster analysis based
on the CLARA and PAM programs of Rousseeuw and Kaufman.
The AGNES command performs hierarchical clustering using
agglomerative nesting methods. The DIANA command performs
hierarchical clustering using divisive analysis.
b) The CALIBRATION command was updated to include propagation of error
methods as defined in the NIST/SEMATECH e-Handbook of Statistical
Methods for both the linear and quadratic calibration cases.
c) Corrected the tolerance limit factor for the Weibull ABASIS
command.
d) Added the commands
COEFFICIENT OF VARIATION CONFIDENCE LIMITS Y
LOGNORMAL COEFFICIENT OF VARIATION CONFIDENCE LIMITS Y
COMMON COEFFICIENT OF VARIATION CONFIDENCE LIMITS Y X
ONE SAMPLE COEFFICIENT OF VARIATION TEST Y X GAMMA0
TWO SAMPLE COEFFICIENT OF VARIATION TEST Y1 X1 Y2 X2
Enter HELP COEFFICIENT OF VARIATION CONFIDENCE LIMITS for
details.
e) Added the command
LOGNORMAL CONFIDENCE LIMITS Y
3) The following new statistic LET subcommands were added:
LET A = UNBIASED COEFFICIENT OF VARIATION Y
LET A = LOGNORMAL COEFFICIENT OF VARIATION Y
LET A = LOWER COEFFICIENT OF VARIATION CONFIDENCE LIMIT Y
LET A = UPPER COEFFICIENT OF VARIATION CONFIDENCE LIMIT Y
LET A = LOWER ONESIDED COEFFICIENT OF VARIATION CONFIDENCE LIMIT Y
LET A = UPPER ONESIDED COEFFICIENT OF VARIATION CONFIDENCE LIMIT Y
LET A = LOWER LOGNORMAL COEFFICIENT OF VARIATION CONFIDENCE LIMIT Y
LET A = UPPER LOGNORMAL COEFFICIENT OF VARIATION CONFIDENCE LIMIT Y
LET A = SUMMARY COEFFICIENT OF VARIATION YMEAN YSD N
LET A = SUMMARY LOWER COEFFICIENT OF VARIATION CONFIDENCE LIMIT
YMEAN YSD N
LET A = SUMMARY UPPER COEFFICIENT OF VARIATION CONFIDENCE LIMIT
YMEAN YSD N
LET A = COMMON COEFFICIENT OF VARIATION Y X
LET A = COMMON BIAS CORRECTED COEFFICIENT OF VARIATION Y X
LET A = LOWER COMMON COEFFICIENT OF VARIATION CONFIDENCE LIMIT Y X
LET A = UPPER COMMON COEFFICIENT OF VARIATION CONFIDENCE LIMIT Y X
LET A = SIGNAL TO NOISE RATIO Y
LET A = PRECISION Y
LET A = COEFFICIENT OF DISPERSION Y
LET A = INDEX OF DISPERSION Y
LET A = QUARTILE COEFFICIENT OF DISPERSION Y
LET A = AAD TO MEDIAN Y
LET A = DIFFERENCE OF PRECISION Y1 Y2
LET A = DIFFERENCE OF SIGNAL TO NOISE RATIO Y1 Y2
LET A = DIFFERENCE OF COEFFICIENT OF DISPERSION Y1 Y2
LET A = DIFFERENCE OF INDEX OF DISPERSION Y1 Y2
LET A = DIFFERENCE OF QUARTILE COEFFICIENT OF DISPERSION Y1 Y2
LET A = DIFFERENCE OF AAD TO MEDIAN Y1 Y2
LET A = SHORTEST HALF MIDMEAN Y
LET A = SHORTEST HALF MIDRANGE Y
LET A = MIDHINGE Y
LET A = TRIMEAN Y
LET A = DIFFERENCE OF SHORTEST HALF MIDMEAN Y1 Y2
LET A = DIFFERENCE OF SHORTEST HALF MIDRANGE Y1 Y2
LET A = DIFFERENCE OF MIDHINGE Y1 Y2
LET A = DIFFERENCE OF TRIMEAN Y1 Y2
LET A = COSINE DISTANCE Y1 Y2
LET A = COSINE SIMILARITY Y1 Y2
LET A = ANGULAR COSINE DISTANCE Y1 Y2
LET A = ANGULAR COSINE SIMILARITY Y1 Y2
LET A = EUCLIDEAN DISTANCE Y1 Y2
LET A = EUCLIDEAN LENGTH Y1
LET A = DOT PRODUCT Y1 Y2
LET A = MANHATTAN DISTANCE Y1 Y2
LET A = CHEBYSHEV DISTANCE Y1 Y2
LET P = <value>
LET A = MINKOWSKI DISTANCE Y1 Y2
LET A = BINARY MATCH DISSIMILARITY Y1 Y2
LET A = BINARY MATCH SIMILARITY Y1 Y2
LET A = BINARY ROGERS MATCH DISSIMILARITY Y1 Y2
LET A = BINARY ROGERS MATCH SIMILARITY Y1 Y2
LET A = BINARY SOKAL MATCH DISSIMILARITY Y1 Y2
LET A = BINARY SOKAL MATCH SIMILARITY Y1 Y2
LET A = BINARY JACCARD DISSIMILARITY Y1 Y2
LET A = BINARY JACCARD SIMILARITY Y1 Y2
LET A = BINARY ASYMMETRIC SOKAL MATCH DISSIMILARITY Y1 Y2
LET A = BINARY ASYMMETRIC SOKAL MATCH SIMILARITY Y1 Y2
LET A = BINARY ASYMMETRIC DICE MATCH DISSIMILARITY Y1 Y2
LET A = BINARY ASYMMETRIC DICE MATCH SIMILARITY Y1 Y2
LET A = YULES Q Y1 Y2
LET A = GENERALIZED JACCARD COEFFICIENT Y1 Y2
LET A = GENERALIZED JACCARD DISTANCE Y1 Y2
LET A = PEARSON DISSIMILARITY Y1 Y2
LET A = SPEARMAN DISSIMILARITY Y1 Y2
LET A = KENDALL TAU DISSIMILARITY Y1 Y2
LET A = HEDGES G Y1 Y2
LET A = BIAS CORRECTED HEDGES G Y1 Y2
LET A = GLASS G Y1 Y2
LET A = COHEN D Y1 Y2
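For reference, the standard textbook definitions behind four of the distance statistics in the list above (EUCLIDEAN, MANHATTAN, CHEBYSHEV, and MINKOWSKI DISTANCE) can be sketched in Python as follows. The function names are hypothetical. The argument p plays the role of the parameter P that is set with LET P = <value> before the MINKOWSKI DISTANCE command.

```python
def minkowski_distance(y1, y2, p):
    """(sum |y1 - y2|**p) ** (1/p) for equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(y1, y2)) ** (1.0 / p)


def manhattan_distance(y1, y2):
    """Sum of absolute differences (Minkowski with p = 1)."""
    return sum(abs(a - b) for a, b in zip(y1, y2))


def euclidean_distance(y1, y2):
    """Square root of the sum of squared differences (Minkowski with p = 2)."""
    return sum((a - b) ** 2 for a, b in zip(y1, y2)) ** 0.5


def chebyshev_distance(y1, y2):
    """Largest absolute difference (limit of Minkowski as p grows)."""
    return max(abs(a - b) for a, b in zip(y1, y2))
```

The Minkowski distance reduces to the Manhattan distance for p = 1 and to the Euclidean distance for p = 2.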
4) The following new LET subcommands were added:
LET M = GENERATE MATRIX <stat> Y1 ... YK
LET Y2 = HERMITE INTERPOLATION Y X X2
LET A = HERMITE INTEGRAL Y X
LET Y TAG = SAMPLE RANDOM PERMUTATION NPOP NKEEP P NITER
LET WSDF POOLSD = VARIANCES WELCH SATTERTHWAITE YVAR YDF
LET WSDF = GUM WELCH SATTERTHWAITE YSD YDF AUNC
LET Y X = NORMAL KERNEL DENSITY MIXTURE YMEAN YSD
LET Y = EMPIRICAL QUANTILE FUNCTION X
LET Y = INFORMATIVE QUANTILE FUNCTION X
LET Y = TRUNCATED INFORMATIVE QUANTILE FUNCTION X
LET Y = CODEX X
5) The following commands were added to aid in reading certain
types of ASCII files.
SET CHARACTER FIELD COMMA DELIMITER <ON/OFF>
SET READ CHARACTER MISSING VALUE <STRING>
SET READ TRAILING PLUS MINUS IGNORE <ON/OFF>
SET READ DOLLAR SIGN IGNORE <ON/OFF>
SET READ COMMA IGNORE <ON/OFF>
6) Miscellaneous updates
a) Command line arguments for macros have been updated to support
named arguments (previously, only ordered arguments were
supported).
Enter HELP MACRO SUBSTITUTION CHARACTER for details.
b) Updated the IF command to support the syntax
IF NOT <expression>
If <expression> is true, the IF command returns FALSE and
if <expression> is false, the IF command returns TRUE.
In addition, the following are now supported
IF <expr1> AND <expr2>
IF <expr1> OR <expr2>
IF <expr1> XOR <expr2>
Currently, only one AND, OR, or XOR clause may be included on
an IF command.
c) Previously, the following did not work with the SET command
LET DP = 3
SET WRITE DECIMALS DP
You needed to do the following
LET DP = 3
SET WRITE DECIMALS ^DP
The SET command was updated so that the last argument will
be checked to see if it is a parameter or a string. If so,
the last argument will be replaced with the value of the
parameter or string. So the "^" is no longer required,
although that syntax will still work.
Currently for strings, only the first 8 characters will be
used. This means there are a few limitations that should be noted.
i) SET commands that expect a file name or a path name
are not yet supported.
ii) Only the last argument is checked. So if the SET
command needs multiple word arguments, this update
will not help.
d) Corrected the CAPTURE SCRIPT command so that it will work within
loops. Also added the command
SET CAPTURE SCRIPT LOOP SUBSTITUTION <ON/OFF>
If ON, then substitutions denoted by the "^" character will be
performed before writing the line to the CAPTURE file. If OFF,
no substitution will be performed. ON is the default. The OFF
is typically only needed if the "^" character is used and has
a specific meaning you want to preserve (e.g., "^" is the
exponentiation symbol in many scripting languages).
e) Added the command
SET HYPHEN WORD SEPARATOR <ON/OFF>
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
December 2015 - September 2016.
-----------------------------------------------------------------------
1) Made the following updates to graphics commands.
a) Added the command
LET NSIZE = <value>
<stat> WINDOW STATISTIC PLOT Y
This is similar to the
<stat> PLOT Y X
command. However, there is a distinction in how groups are
formed.
The <stat> PLOT command forms groups based on the unique values of
the group-id variable. The <stat> WINDOW STATISTIC PLOT creates
groups of contiguous rows of the response variable. The size of
these groups is determined by the LET NSIZE command that is
entered before the <stat> WINDOW STATISTIC PLOT command.
This command can be useful for very large data sets where it may
be impractical to plot all the individual points. As an alternative,
you can plot various summary statistics of blocks of data.
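The grouping rule for the window statistic plot can be sketched in Python as follows. This is an illustration only. The function name is hypothetical, and keeping a shorter final group when the number of rows is not a multiple of NSIZE is an assumption.

```python
from statistics import mean


def window_statistic(y, nsize, stat=mean):
    """Apply stat to consecutive blocks of nsize rows of y.

    Assumption: a final, shorter block is kept rather than dropped.
    """
    return [stat(y[i:i + nsize]) for i in range(0, len(y), nsize)]
```

For a response with one million rows and a window size of 1,000, this produces 1,000 plot points instead of one million.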
b) The QUANTILE QUANTILE PLOT was updated in two ways. First, the
parameters PPA0, PPA1, and PPCC are automatically saved after this
plot. These parameters define the intercept, the slope and the
correlation coefficient of the line fit to the points on the plot,
respectively. For an exact fit (e.g., QUANTILE QUANTILE PLOT Y Y),
these values would be 0, 1, and 1. Second, the following command
was added
SET QUANTILE QUANTILE PLOT NUMBER OF PERCENTILES <value>
By default, the quantile quantile plot generates the plot points
corresponding to percentiles of the smaller of N1 and N2 where N1
and N2 are the number of observations for the two samples.
This SET command allows you to specify an arbitrary number of
percentiles. This is intended for the case where there are a
large number of data points. For example, suppose the two columns
being compared each have a million or more points. This results
in a time consuming plot and a very large Postscript file (which
you may have a hard time viewing or printing). By setting the number of
percentiles to something like 1,000 or 10,000, you can generate a
quantile quantile plot quickly with a reasonably sized Postscript
file without sacrificing too much information. That is, the basic
message of the quantile quantile plot should still be clear even
with the reduced number of points plotted.
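The idea of plotting a fixed number of percentiles and fitting a line to them can be sketched in Python as follows. This is an illustration only. The function names are hypothetical, the percentile definition (the "inclusive" method of Python's statistics.quantiles) is an assumption, and so is the choice of which sample goes on which axis.

```python
import math
from statistics import quantiles


def qq_points(y1, y2, npercentiles):
    """Return npercentiles matching quantiles from each sample."""
    q1 = quantiles(y1, n=npercentiles + 1, method="inclusive")
    q2 = quantiles(y2, n=npercentiles + 1, method="inclusive")
    return q1, q2


def fit_line(x, y):
    """Least squares intercept, slope, and correlation of y on x.

    These correspond to the saved parameters PPA0, PPA1, and PPCC.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    corr = sxy / math.sqrt(sxx * syy)
    return intercept, slope, corr
```

The cost of the fit depends only on the number of percentiles requested, not on the number of observations in the two samples.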
c) Made several updates to the HISTOGRAM command.
When dealing with pathological data sets (e.g., Cauchy distributed
data), there is an issue with generating a class size that is
appropriate for the central bulk of the data while still being able
to generate the histogram in an efficient fashion. The following
commands provide some methods for addressing this.
When you have data where there are a small percentage of points
that are quite far from the bulk of the data, you might want to
use the command (this already existed, enter HELP HISTOGRAM CLASS
WIDTH for details)
SET HISTOGRAM CLASS WIDTH IQ RANGE
This bases the bin width for the histogram on the interquartile
range rather than the standard deviation as the other class width
algorithms do. This can result in more reasonable class widths
for the center of the data when there are extreme outliers in the
data. Also, these commands are typically used when the
SET HISTOGRAM OUTLIERS ON
command is also given (this command extends the bins to cover all
outliers).
i) The following command can be used to specify the maximum
number of classes for the histogram.
SET HISTOGRAM MAXIMUM CLASSES <value>
If this command is entered, then the class width is initially
computed in the standard way. If the number of bins needed to
cover the outliers is greater than the value given here, then
the class width is recomputed so that the number of bins is
equal to the value given here.
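The recomputation of the class width can be sketched in Python as follows. This is an illustration only. The function name is hypothetical, and counting bins from the data minimum to the data maximum is an assumption.

```python
import math


def histogram_class_width(data_min, data_max, initial_width, max_classes):
    """Return (class width, number of classes) after applying the cap.

    The initial width is whatever the standard algorithm produced.
    """
    span = data_max - data_min
    nbins = math.ceil(span / initial_width)
    if nbins > max_classes:
        # Too many bins are needed to reach the outliers: widen the
        # classes so that exactly max_classes bins cover the data.
        return span / max_classes, max_classes
    return initial_width, nbins
```

For data spanning 0 to 1000 with an initial width of 0.5 and a cap of 200 classes, the width becomes 5.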
ii) The following command can be used to specify that outliers
be drawn as individual points rather than extending the bins
to cover them.
SET HISTOGRAM OUTLIER POINTS <ON/OFF>
d) Made several updates to the BOX PLOT command.
i) It is no longer considered an error to only have a single
point for the box plot. Although box plots are typically
not drawn for a small number of points, when automating an
analysis for a large data set it can be useful to have the
box plot drawn for degenerate cases.
ii) To have horizontal bars drawn at the 1%, 5%, 10%, 90%,
95%, and 99% points of the distribution, enter
SET BOX PLOT EXTREME PERCENTILES ON
This option may be useful for large data sets.
If the FENCES switch is OFF, then the CHARACTER and LINE
settings for traces 21 through 26 will be used to draw these
percentiles. If the FENCES switch is ON, then the CHARACTER
and LINE settings for traces 25 through 30 will be used to
draw these percentiles. Currently, the LINES BOX PLOT and
CHARACTER BOX PLOT commands do not set these. You can use
something like the following to set these switches.
LET INDX = DATA 21 22 23 24 25 26
LET PLOT CHARACTER INDX = BLANK
LET PLOT LINE INDX = SOLID
iii) If you use the following syntax
MULTIPLE BOX PLOT Y1 Y2 ... Y5
Dataplot will internally create a stacked Y X set of data.
Dataplot was modified so that if there are four or fewer
response variables, then Dataplot will not stack the data
to generate the box plot. Although this has no effect on
the appearance of the plot, it can be useful when generating
box plots for large data sets in that it may avoid exceeding
Dataplot's maximum number of rows.
2) Made the following updates to analysis commands.
a) The KOLMOGOROV SMIRNOV TWO SAMPLE TEST was updated to use the
following command
SET TWO SAMPLE TEST NUMBER OF PERCENTILES <value>
By default, the Kolmogorov-Smirnov test is generated using all
the points. When the number of points gets large, this can result
in this command taking a very long time. Computing this test for
a specified number of percentiles of the data allows this command
to be executed quickly without sacrificing too much information.
3) The following new statistic LET subcommands were added:
LET A = NORMALIZED IQR Y
LET A = SCALED MAD Y
LET A = DIFFERENCE OF NORMALIZED IQR Y1 Y2
LET A = DIFFERENCE OF SCALED MAD Y1 Y2
LET A = 2PARAMETER WEIBULL PPCC
LET A = 2PARAMETER WEIBULL PPCC SHAPE
LET A = 2PARAMETER WEIBULL PPCC SCALE
4) Although Dataplot has a large number of built-in statistics,
there may be cases where you need a statistic not directly
supported by Dataplot.
The STATISTIC BLOCK command was added to allow you to define
your own statistic. The power in this command is not the
generation of the statistic itself (this could be accomplished
using a Dataplot macro), but in the ability to use the
statistic with 20+ Dataplot commands. For details, enter
HELP STATISTIC BLOCK.
5) The following new LET subcommands were added:
LET YMIN1 YMAX1 = YFRAME Y
LET XMIN1 XMAX1 = XFRAME X
LET Y2 = SEQUENTIAL SUM Y
LET Y2 = SEQUENTIAL MEAN Y
LET Y2 = SEQUENTIAL MINIMUM Y
LET Y2 = SEQUENTIAL MAXIMUM Y
LET Y2 = SEQUENTIAL PRODUCT Y
LET Y2 = SEQUENTIAL LOWER Y
LET Y2 = SEQUENTIAL UPPER Y
LET Y2 GROUP2 = SEQUENTIAL SUM Y GROUPID
LET Y2 GROUP2 = SEQUENTIAL MEAN Y GROUPID
LET Y2 GROUP2 = SEQUENTIAL MINIMUM Y GROUPID
LET Y2 GROUP2 = SEQUENTIAL MAXIMUM Y GROUPID
LET Y2 GROUP2 = SEQUENTIAL PRODUCT Y GROUPID
LET Y2 GROUP2 = SEQUENTIAL LOWER Y GROUPID
LET Y2 GROUP2 = SEQUENTIAL UPPER Y GROUPID
LET DIPERC = ISO 13528 DIPERC Y XREF
LET PA = ISO 13528 PA Y XREF DELTAE
LET Y = EXECUTE <file-name> X
LET IFLAG = INQUIRE <file-name>
LET Y = WINDOW <stat> X
LET Y = VECTOR PERCENTILE X NPERC
LET Y = CODEZ X
LET Z = UNSTACK Y X
6) The following updates were made to the READ command.
a) Date and time fields will typically have syntax like
2016/06/22
12:43:08
Typically Dataplot will treat the "/" and ":" as indicating
character fields (based on the SET CHARACTER CONVERT command,
this will either cause an error, result in this field being
ignored, or the field being read as a character variable).
The following commands were added to help deal with date and
time fields.
SET DATE DELIMITER <character>
SET TIME DELIMITER <character>
Although Dataplot does not have explicit date or time variables,
these commands allow the components of date and time fields to
be read as separate numeric variables. For example,
SET DATE DELIMITER /
SET TIME DELIMITER :
READ YEAR MONTH DAY HOUR MIN SEC
2016/06/22 23:19:03
END OF DATA
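The effect of the two delimiter settings on the line above can be
sketched in Python as follows. This is only an illustration of the
resulting numeric fields; the function name split_date_time_fields is
hypothetical, and replacing the delimiters with spaces is an assumption
about the mechanism, not a description of Dataplot's internal code.

```python
import re


def split_date_time_fields(line, date_delim="/", time_delim=":"):
    """Treat the date and time delimiters as field separators and
    return every component as a number."""
    # Build a character class holding both delimiters (assumed to be
    # single characters, as in the SET commands above).
    pattern = "[" + re.escape(date_delim) + re.escape(time_delim) + "]"
    cleaned = re.sub(pattern, " ", line)
    return [float(token) for token in cleaned.split()]


# YEAR MONTH DAY HOUR MIN SEC
fields = split_date_time_fields("2016/06/22 23:19:03")
```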
b) IP addresses typically have a syntax like
129.6.37.209
By default, Dataplot will generate an error when trying to read a
field of this type. To address this, you can enter the command
SET READ IP ADDRESSES ON
If this switch is ON, Dataplot will scan the line and if a field is
encountered that contains more than one period ".", Dataplot will
convert these periods to spaces before parsing the line.
The default is OFF since this adds additional processing time to
the READ and most data sets do not contain IP addresses.
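The scan described above can be sketched in Python as follows. This is
an illustration of the stated rule only (a field with more than one
period has its periods converted to spaces); the function name
convert_ip_fields is hypothetical and the sketch assumes fields are
separated by white space.

```python
def convert_ip_fields(line):
    """Replace periods with spaces in any field that contains more
    than one period; leave ordinary decimal numbers untouched."""
    converted = []
    for field in line.split():
        if field.count(".") > 1:
            field = field.replace(".", " ")
        converted.append(field)
    return " ".join(converted)


# The IP address becomes four numeric fields; 3.14 is left alone.
converted = convert_ip_fields("129.6.37.209 3.14 42")
```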
c) Added the command STREAM READ. This command can be useful
in processing large data sets (particularly for data sets
exceeding Dataplot's maximum number of rows). There are a
number of variants of this command. Specifically,
i) It can be used to create a file that can be read
using a Fortran like format (i.e., the SET READ
FORMAT command).
ii) It can be used to compute 23 statistics for groups
of the data (either for groups of a specified number
of rows or for when a specific group-id variable
changes value).
iii) It can be used to compute eight summary statistics
for the full data set.
7) Miscellaneous updates
a) Added the command
SET STANDARD INPUT <FNAME>
This command is used to set standard input to an external
file. Enter HELP STANDARD INPUT for details.
b) Added the following SET commands
SET DEVICE 3 <AUTOMATIC/USER>
SET DEVICE 2 SPLIT <ON/OFF>
SET DEVICE 3 NAME COUNTER <ON/OFF>
SET CAPTURE SPLIT <ON/OFF>
SET CAPTURE CUMULATIVE <ON/OFF>
c) Added the command
WEB SEARCH <string>
This command will perform a web search. The desired search
engine can be specified with the command
SET SEARCH ENGINE <GOOGLE/BING/DUCK/WOW/YAHOO>
d) Updated the QWIN device driver for Windows to better support
hardware fonts. Enter HELP QWIN for details.
e) Added the command
SET MACRO QUOTES STRIP <ON/OFF>
Enter HELP MACRO SUBSTITUTION CHARACTER for details.
f) Updated the RESET command to allow you to specify names
that will not be reset by the RESET DATA, RESET PARAMETERS,
RESET VARIABLES, RESET FUNCTIONS and RESET MATRICES commands.
The syntax is
RESET NO RESET <name>
Enter the above command for each name that you want the
RESET command to ignore. Up to 30 names can be specified.
To remove a name from the list, enter
RESET NO RESET <name> OFF
g) Added the command
RESET COMMAND LINE ARGUMENTS
This command clears any command line arguments (i.e., $0, $1, ...).
Alternatively, using
CALL file.dat NULL
will also clear the command line arguments.
h) Added the following SET commands that apply to the CORRELATION MATRIX
command
SET CORRELATION ABSOLUTE VALUE <ON/OFF>
SET CORRELATION PERCENTAGE VALUE <ON/OFF>
SET CORRELATION DIGITS <VALUE>
These commands are typically used when plotting the correlation values.
Specifically, the first command allows you to specify the absolute
value of the correlation (useful when you are trying to identify
significant correlation regardless of whether it is a positive or a
negative correlation). The second command specifies the correlation
as a percentage value (e.g., a correlation of 0.91 would be given as
91.0). The third command specifies how many digits to store for the
correlation.
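The combined effect of the three switches on a single correlation value
can be sketched in Python as follows. This is an illustration only: the
function name format_correlation is hypothetical, the order in which the
adjustments are applied is an assumption, and "digits" is assumed here
to mean the number of decimal places kept.

```python
def format_correlation(r, absolute=False, percentage=False, digits=None):
    """Apply the absolute value, percentage and digits adjustments
    to a correlation value r."""
    value = abs(r) if absolute else r
    if percentage:
        value = 100.0 * value
    if digits is not None:
        # Assumption: digits counts decimal places
        value = round(value, digits)
    return value


# A correlation of -0.91 shown as an absolute percentage
shown = format_correlation(-0.91, absolute=True, percentage=True, digits=1)
```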
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
January 2014 - November 2015.
-----------------------------------------------------------------------
1) The following updates were made to the Graphics commands.
a. Added the command
DISTRIBUTIONAL FIT PLOT Y X
This command is a graphical representation of the output from the
BEST DISTRIBUTIONAL FIT command for multiple groups as a
tabulation plot. It is intended as a screening tool for
identifying good candidate distributional models.
b. Added the commands
H CONSISTENCY PLOT Y LABID MATID
K CONSISTENCY PLOT Y LABID MATID
COCHRAN VARIANCE PLOT Y LABID MATID
c. Added the command
TWO FACTOR PLOT Y LABID MATID
d. Added the commands
TWO WAY ROW PLOT Y LABID MATID
TWO WAY COLUMN PLOT Y LABID MATID
e. Added the commands
<stat> CUMULATIVE STATISTIC PLOT Y
<stat> CUMULATIVE STATISTIC PLOT Y X
<stat> MOVING STATISTIC PLOT Y
<stat> MOVING STATISTIC PLOT Y X
where <stat> is one of Dataplot's built-in statistics.
f. Added the command
LORENZ CURVE Y
g. Added the commands
EMBED <ON/OFF>
EMBED CORNER COORDINATES <xlow> <ylow> <xhigh> <yhigh>
The EMBED command is an alternative approach to generating
multiple plots per page. Enter HELP EMBED for details.
2) The following updates were made to the Analysis commands.
a. The following commands
LET Y = ROOTS ...
LET Y = OPTIMIZE ...
PLOT ...
3-D PLOT ...
LET A = INTEGRAL ...
LET A = NUMERICAL DERIVATIVE ...
can work on functions. One limitation in Dataplot's function
definitions is that the library functions are computed "row by row".
For many statistical applications, we need to define vector
functions (e.g., various sums and sums of squares). The above
commands now support a "FUNCTION BLOCK" capability that allows
much greater flexibility in defining functions.
Enter HELP FUNCTION BLOCK for details on how to define and use
function blocks.
b. The BEST CP command was updated to print the BIC statistic
in addition to the Cp statistic. Note that the selection
of regressions is still based on the Cp statistic.
c. Added the command
COMPLETE SPATIAL RANDOMNESS Y X
This command tests for complete spatial randomness in the 2D case.
It implements the following three tests:
i) A bivariate Cramer Von Mises test for uniformity
ii) The mean nearest neighbors distance test.
iii) Pollard's statistic for distance indices 1, 2, 3, 4, 5.
d. Added the command
COMMON WEIBULL SHAPE TEST Y X
e. Added the command
<dist1> AND <dist2> DISTRIBUTIONAL LIKELIHOOD RATIO TEST Y
f. Added the command
EQUAL SLOPES TEST Y X TAG
for testing the equality of slopes for two linear regression lines.
f. Made the following updates to the distributional maximum
likelihood commands
i) added 3-parameter lognormal
ii) added 3-parameter gamma
iii) made some updates to the 3-parameter Weibull
iv) added 3-parameter inverse gaussian (also added confidence
limits for parameter estimates)
v) added parameter confidence intervals and percentile
confidence intervals for several additional cases
g. The CROSS TABULATE command was updated to support up to eight
group-id variables (the previous limit was six).
h. The TOLERANCE LIMITS command was updated to support the following
options
LOGNORMAL TOLERANCE LIMITS Y
BOX COX TOLERANCE LIMITS Y
The LOGNORMAL case takes the log of the response data, generates
normal-based tolerance limits, and then takes the exponent of the
normal-based limits. Similarly, the BOX COX case performs a
Box-Cox transformation to normalize the data and then generates
normal-based tolerance limits. These normal-based limits are
then transformed back. These options can be useful for data that
does not follow a normal distribution. They can give more
efficient limits than non-parametric tolerance limits for many
non-normal datasets.
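The LOGNORMAL steps described above (take logs, compute normal-based
limits, exponentiate) can be sketched in Python as follows. This is an
illustration, not Dataplot's implementation: the function name
lognormal_tolerance_limits is hypothetical, and the tolerance factor k
is taken as an input because its value depends on the sample size,
coverage and confidence level.

```python
import math
import statistics


def lognormal_tolerance_limits(y, k):
    """Two-sided lognormal tolerance limits for positive data y,
    given a normal tolerance factor k supplied by the caller."""
    logs = [math.log(value) for value in y]       # 1. take logs
    mean_log = statistics.mean(logs)
    sd_log = statistics.stdev(logs)               # sample standard deviation
    lower_log = mean_log - k * sd_log             # 2. normal-based limits
    upper_log = mean_log + k * sd_log
    return math.exp(lower_log), math.exp(upper_log)  # 3. transform back


data = [1.2, 0.8, 2.5, 1.9, 1.1]
# k = 3.0 is an arbitrary illustrative value, not a tabulated factor
lower, upper = lognormal_tolerance_limits(data, k=3.0)
```

Because the limits are symmetric on the log scale, they are always
positive and their geometric mean equals the geometric mean of the data.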
i. The PREDICTION LIMITS command was updated to support the
following options
LOGNORMAL PREDICTION LIMITS Y
BOX COX PREDICTION LIMITS Y
The LOGNORMAL case takes the log of the response data, generates
normal-based prediction limits, and then takes the exponent of the
normal-based limits. Similarly, the BOX COX case performs a
Box-Cox transformation to normalize the data and then generates
normal-based prediction limits. These normal-based limits are
then transformed back. These options can be useful for data that
does not follow a normal distribution.
j. The CAPABILITY ANALYSIS command was updated to include several
additional capability statistics. The output was also reformatted.
k. Several updates were made to the E691 INTERLAB command.
l. Added the command
COCHRAN VARIANCE OUTLIER TEST Y X
This command performs the Cochran variance outlier test for the
maximum variance. It includes the extensions of 't Lam to handle
unequal group sizes and to handle the minimum variance case.
3) The following new statistic LET subcommands were added:
LET A = VARIATIONAL DISTANCE Y
LET A = RELATIVE DISPERSION INDEX Y
LET A = UNIFORM CHI-SQUARE Y
LET A = DECILE RATIO Y
LET XVALUE = <value>
LET A = VALUE COUNT Y
LET A = GALTON SKEWNESS Y
LET A = PEARSON TWO SKEWNESS Y
LET A = DIFFERENCE OF GALTON SKEWNESS Y
LET A = DIFFERENCE OF PEARSON TWO SKEWNESS Y
LET A = WEIGHTED SKEWNESS Y W
LET A = BOX COX NORMALITY PPCC Y
LET A = BOX COX NORMALITY LAMBDA Y
LET A = AVERAGE ABSOLUTE DEVIATION FROM THE MEDIAN Y
LET A = DIFFERENCE OF AVERAGE ABSOLUTE DEVIATION FROM MEDIAN Y1 Y2
Note: The definition of average absolute deviation
was corrected to compute differences from the
mean rather than differences from the median.
LET A = COMMON WEIBULL SHAPE TEST Y X
LET A = COMMON WEIBULL SHAPE TEST CDF Y X
LET A = COMMON WEIBULL SHAPE TEST PVALUE Y X
LET A = COMMON WEIBULL SHAPE TEST CV90 Y X
LET A = COMMON WEIBULL SHAPE TEST CV95 Y X
LET A = COMMON WEIBULL SHAPE TEST CV99 Y X
LET A = KAPPENMAN R Y
LET A = KAPPENMAN R CUTOFF Y
LET A = BIVARIATE CRAMER VON MISES TEST X Y
LET A = BIVARIATE CRAMER VON MISES TEST CV95 X Y
LET A = BIVARIATE CRAMER VON MISES TEST CV05 X Y
LET A = MEAN NEAREST NEIGHBOR DISTANCE TEST X Y
LET A = MEAN NEAREST NEIGHBOR DISTANCE CDF X Y
LET A = MEAN NEAREST NEIGHBOR DISTANCE PVALUE X Y
LET A = POLLARD <ONE/TWO/THREE/FOUR/FIVE> TEST X Y
LET A = POLLARD <ONE/TWO/THREE/FOUR/FIVE> CDF X Y
LET A = POLLARD <ONE/TWO/THREE/FOUR/FIVE> PVALUE X Y
LET A = <dist> ANDERSON DARLING Y
(HELP STATISTIC ANDERSON DARLING for details)
LET A = CPMK Y
LET A = CNP Y
LET A = CNPM Y
LET A = CNPMK Y
4) The following new math LET subcommands were added:
LET Y = LOGNORMAL MOMENT ESTIMATES X
LET Y = GAMMA MOMENT ESTIMATES X
LET Y = INVERSE GAUSSIAN MOMENT ESTIMATES X
LET YNEW = SHUFFLE GROUPS Y X INDEX
LET YOUT = RANDOM ERROR QUANTITY X Y
LET Y = DIGITS A
5) The following new string LET subcommands were added:
LET STYPE = TYPE <name>
LET CON COP = CONFOUND K N
LET S = DIGITS TO STRING
LET S = STRING REMOVE SPACES
LET S = NUMBER TO STRING
6) The following new LET subcommands were added:
LET IFLAG = CHECK NAMES
LET IFLAG = CHECK EQUAL LENGTH
LET IFLAG = CHECK TYPE
7) The following new library functions were added
i) For a number of distributions commonly used in
reliability/lifetime applications, survival and
inverse survival functions were added. The
survival function is: SURV(X) = 1 - CDF(X). The
inverse survival function is: ISURV(P) = PPF(1-P).
LET A = EXPSURV(X,LOC,SCALE)
LET A = EXPISURV(P,LOC,SCALE)
LET A = EWESURV(X,SHAPE1,SHAPE2,LOC,SCALE)
LET A = EWEISURV(P,SHAPE1,SHAPE2,LOC,SCALE)
LET A = FLSURV(X,SHAPE1,LOC,SCALE)
LET A = FLISURV(P,SHAPE1,LOC,SCALE)
LET A = GAMSURV(X,SHAPE1,LOC,SCALE)
LET A = GAMISURV(P,SHAPE1,LOC,SCALE)
LET A = GEESURV(X,SHAPE1,LOC,SCALE)
LET A = GEEISURV(P,SHAPE1,LOC,SCALE)
LET A = GEVSURV(X,SHAPE1,LOC,SCALE)
LET A = GEVISURV(P,SHAPE1,LOC,SCALE)
LET A = GOMSURV(X,SHAPE1,SHAPE2,LOC,SCALE)
LET A = GOMISURV(P,SHAPE1,SHAPE2,LOC,SCALE)
LET A = IGSURV(X,SHAPE1,LOC,SCALE)
LET A = IGISURV(P,SHAPE1,LOC,SCALE)
LET A = IGASURV(X,SHAPE1,LOC,SCALE)
LET A = IGAISURV(P,SHAPE1,LOC,SCALE)
LET A = IWESURV(X,SHAPE1,LOC,SCALE)
LET A = IWEISURV(P,SHAPE1,LOC,SCALE)
LET A = LGNSURV(X,SHAPE1,LOC,SCALE)
LET A = LGNISURV(P,SHAPE1,LOC,SCALE)
LET A = NORSURV(X,LOC,SCALE)
LET A = NORISURV(P,LOC,SCALE)
LET A = PLNSURV(X,SHAPE1,SHAPE2,LOC,SCALE)
LET A = PLNISURV(P,SHAPE1,SHAPE2,LOC,SCALE)
LET A = PNRSURV(X,SHAPE1,LOC,SCALE)
LET A = PNRISURV(P,SHAPE1,LOC,SCALE)
LET A = RIGSURV(X,SHAPE1,LOC,SCALE)
LET A = RIGISURV(P,SHAPE1,LOC,SCALE)
LET A = WALSURV(X,SHAPE1,LOC,SCALE)
LET A = WALISURV(P,SHAPE1,LOC,SCALE)
LET A = WEISURV(X,SHAPE1,LOC,SCALE)
LET A = WEIISURV(P,SHAPE1,LOC,SCALE)
LET A = UNISURV(X,LOC,SCALE)
LET A = UNIISURV(P,LOC,SCALE)
ii) The following miscellaneous functions were added.
LET A = TRIGAMMA(X)
LET A = NORPPCV(N,ALPHA)
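The two definitions given in i) above, SURV(X) = 1 - CDF(X) and
ISURV(P) = PPF(1-P), can be checked with a short Python sketch for the
normal case. The Python functions below are stand-ins written for this
illustration using the standard library; they are not the Dataplot
NORSURV and NORISURV routines.

```python
from statistics import NormalDist


def norsurv(x, loc=0.0, scale=1.0):
    """Normal survival function: 1 - CDF(x)."""
    return 1.0 - NormalDist(loc, scale).cdf(x)


def norisurv(p, loc=0.0, scale=1.0):
    """Normal inverse survival function: PPF(1 - p)."""
    return NormalDist(loc, scale).inv_cdf(1.0 - p)


# The inverse survival function undoes the survival function
p = norsurv(1.5)
x_back = norisurv(p)
```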
8) Plot control updates
The Dataplot CHARACTER, LINE, SPIKE, BAR, and REGION commands
and the associated attribute setting commands (e.g., CHARACTER
SIZE) support up to 100 settings.
In most applications, only the first few settings need to be made.
However, there are occasions where we would like to be able
to change the setting for a high trace number without entering
the values for all the lower trace numbers. This can now be
done with commands of the form
LET PLOT CHARACTER 24 = CIRCLE
LET PLOT CHARACTER FILL 24 = ON
In this syntax, the number 24 is the index of the trace being
set. The attribute being set ("CHARACTER" and "CHARACTER FILL"
in the above examples) is given on the left hand side of the
equal sign. The assigned value is given on the right hand side
of the equal sign.
You can set several values at once as follows
LET IINDEX = DATA 21 22 23 24
LET PLOT CHARACTER FILL IINDEX = ON
LET IINDEX = DATA 21 22 23 24
LET PLOT CHARACTER FILL IINDEX = ON OFF ON OFF
In the first example, traces 21, 22, 23, and 24 will all be
set to ON. In the second example, trace 21 will be set to ON,
trace 22 will be set to OFF, trace 23 will be set to ON and
trace 24 will be set to OFF.
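The two forms of assignment can be sketched in Python as follows. This
is an illustration of the behavior described above, not Dataplot code:
the function name set_plot_attribute is hypothetical, and raising an
error when the number of values does not match the number of indices is
an assumption (the text above does not say what Dataplot does then).

```python
def set_plot_attribute(settings, indices, values):
    """Assign attribute values to the given trace indices.

    A single value is applied to every index; otherwise values are
    matched to indices one for one.
    """
    if len(values) == 1:
        values = values * len(indices)
    if len(values) != len(indices):
        # Assumption: mismatched lengths are treated as an error
        raise ValueError("number of values must be 1 or match the indices")
    for index, value in zip(indices, values):
        settings[index] = value
    return settings


index = [21, 22, 23, 24]
first = set_plot_attribute({}, index, ["ON"])
second = set_plot_attribute({}, index, ["ON", "OFF", "ON", "OFF"])
```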
9) Miscellaneous updates
a) When the command
SET FATAL ERROR PROMPT
is given, Dataplot will now print a trace of all macros called
at the time of the error.
b) The LET ... = CROSS TABULATE .... command will now accept six
cross tabulation factors (up from four).
c) Dataplot uses the GD graphics library to generate plots
in JPEG, PNG, and GIF format. In addition, images in these
formats can be read into Dataplot (as numerical variables).
We updated Dataplot to use the newest version of GD (2.1).
In doing so, we added support for several additional
image formats:
i) BMP - a common format in the Windows environment.
ii) WBMP - a black and white format intended for mobile/
wireless applications. Not commonly used
now.
iii) TGA - Targa format.
iv) TIFF - the widely used TIFF format. Note that this
requires the LIBTIFF library to be installed,
so it may not be available on some installations.
v) WEBP - this is a new format being championed by Google
for displaying images on the web. This requires
the VPX library to be installed. This will
probably not be available on most installations
by default.
d) Added the command
SET FIT AUXILLARY FILES <ON/OFF>
By default, the FIT command writes information to the temporary
files dpst1f.dat, dpst2f.dat, dpst3f.dat, dpst4f.dat, and
dpst5f.dat. If you set this switch to OFF, you can suppress this
writing to files. Note that although this switch was added for
internal Dataplot usage, it can also be entered explicitly.
e) Added the command
SET NORMAL PLOT AXES <DEFAULT/REVERSE>
This reverses the role of the x- and y-axis on the NORMAL PLOT.
f) Added the command
SET LATEX RESIZE <ON/OFF>
This command will add the LaTeX line
\resizebox{\linewidth}{!} {
to the beginning of LaTeX tables. This is useful for tables with a
large number of columns (it automatically resizes the text size if
needed).
g) Added the command
SET CIRCLE CORRECTION <ON/OFF>
By default, Dataplot applies a correction factor to circles based
on the ratio of vertical pixels to horizontal pixels. When the
number of pixels in each direction is not equal, this has the
advantage of maintaining the circular appearance of the circle.
However, the coordinates of the circle may not be as expected.
If this switch is set to OFF, the coordinates will be as
expected. However, it is possible that the circle will have an
elliptic rather than a circular appearance.
h) Added the command
SET SEARCH DIRECTORY <directory name>
This adds a directory that will be searched for file names.
For example, you might want to create a directory to store
commonly called macros.
When Dataplot encounters a file name, it will first try to
find it in the current directory. If it is not found there,
it will then search the directory, if any, specified by the
SET SEARCH DIRECTORY command. If the file is still not
found, Dataplot will then search the Dataplot auxiliary
directories.
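The search order described above can be sketched in Python as follows.
This is an illustration only: the function name find_file and its
arguments are hypothetical, and the exists argument is there simply so
the lookup can be tried without touching the file system.

```python
import os


def find_file(name, current_dir, search_dir=None, auxiliary_dirs=(),
              exists=os.path.isfile):
    """Return the first matching path, trying the current directory,
    then the user search directory (if any), then the auxiliary
    directories. Return None if the file is not found."""
    candidates = [current_dir]
    if search_dir:
        candidates.append(search_dir)
    candidates.extend(auxiliary_dirs)
    for directory in candidates:
        path = os.path.join(directory, name)
        if exists(path):
            return path
    return None


# Pretend the macro exists only in the user search directory
target = os.path.join("macros", "setup.dp")
found = find_file("setup.dp", "work", search_dir="macros",
                  auxiliary_dirs=["aux"], exists=lambda p: p == target)
```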
i) Made the following updates to the READ command.
i) After a READ, the following parameters are saved:
ISKIP - the number of header lines skipped
NUMLRD - the number of data lines read
NUMVRD - the number of variables read
In addition, the variable names read are saved in the
strings ZZZV1, ZZZV2, ZZZV3, and so on. Note that these
parameters and strings are saved each time a READ is
performed.
ii) If you read from a file without specifying a list of
variables, Dataplot would previously do the following.
If a SKIP AUTOMATIC was in effect, Dataplot would search
for a line starting with four dashes ("----"). It would
then assume that the line preceding this contained the
list of variable names.
On the other hand, if a SKIP N was in effect, Dataplot
would read the first line after the header to determine
the number of variables. It would then create variable
names of the form X1, X2, X3, and so on.
This has been modified. Now, for either the
SKIP AUTOMATIC or the SKIP N cases, you can specify
whether to create the variable names automatically
or to read them from the data file with the command
SET VARIABLE NAME <AUTOMATIC/FILE>
The AUTOMATIC option specifies that automatic variable
names will be created. If the FILE option is specified,
Dataplot will try to read the variable names from the
file. If SKIP N is in effect, Dataplot will check
the last line of the header to see if it starts with
4 dashes. If so, it will assume the variable names
are in the preceding line. If not, it will assume the
variable names are on the last line of the header.
Also, the default for automatic variable names has been
changed from X1, X2, and so on to COL1, COL2, and so on.
You can specify the base (e.g., X or COL) for the variable
names with the command
SET AUTOMATIC VARIABLE BASE NAME <value>
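The naming rules for the SKIP N case can be sketched in Python as
follows. This is an illustration of the logic described above, not
Dataplot code: the function name variable_names is hypothetical, and
falling back to automatic names when no usable header line exists is an
assumption.

```python
def variable_names(header_lines, first_data_line, mode="FILE", base="COL"):
    """Choose variable names for a READ with no variable list.

    mode="FILE": take names from the header (the line before a line
    of dashes, or else the last header line).
    mode="AUTOMATIC": build names such as COL1, COL2, ...
    """
    if mode == "FILE" and header_lines:
        last = header_lines[-1]
        if last.startswith("----"):
            if len(header_lines) >= 2:
                return header_lines[-2].split()
        else:
            return last.split()
    # Automatic names: count the fields on the first data line
    count = len(first_data_line.split())
    return [base + str(i) for i in range(1, count + 1)]


from_dashes = variable_names(["Title", "Y X", "----"], "1 2")
from_last = variable_names(["Title", "Y X"], "1 2")
automatic = variable_names([], "1 2 3", mode="AUTOMATIC")
```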
j) Dataplot has added a number of commands for accessing the system
clipboard. This capability is operating system and compiler
dependent. It is currently supported under Windows for the
Intel Fortran compiler (support for Linux and Mac OS X is
still under development).
Enter HELP CLIPBOARD for more information.
k) Added the commands
CALL EXIT
CALL EXIT ALL
The CALL EXIT command will exit the currently executing macro
and the CALL EXIT ALL command will exit all currently running
macros and return control to the keyboard. These commands can be
useful for general purpose macros where an error is detected.
l) Added the command
WEB SEARCH <key1> ... <keyn>
10) Fixed a number of bugs.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
December 2010 - December 2013.
-----------------------------------------------------------------------
1) The following library functions were added:
LET YOUT = MERGE(Y1,Y2,TAG)
LET YOUT = MERGE3(Y1,Y2,Y3,TAG)
LET YOUT = RELDIF(Y1,Y2)
LET YOUT = RELERR(Y1,Y2)
LET YOUT = PERCDIF(Y1,Y2)
LET YOUT = PERCERR(Y1,Y2)
LET YOUT = ANGRAD(X1,Y1,X2,Y2,X3,Y3)
LET YOUT = DPNTLINE(X1,Y1,X2,Y2,SLOPE)
LET YOUT = SLOPE(X1,Y1,X2,Y2)
LET YOUT = LININTER(X1,Y1,X2,Y2,X3)
In addition, the MIN and MAX functions were updated to handle up to
eight input arguments (previously limited to two arguments).
2) The following string commands were added:
a) LET IVAL = STRING COMPARE A B
IVAL will be set to 1 if strings A and B are identical and
set to 0 if they are not identical.
b) LET SBASE = GROUP LABEL TO STRINGS IG
This command will convert the group labels in IG to
the strings SBASE1, SBASE2, ..., SBASEN (where N is the
number of group labels in IG).
c) Several enhancements were made to the use of row labels.
The command
LET ROWLABEL = <ix>
previously would convert a character variable, <ix>
found in the character data file (dpzchf.dat) to row
labels. This command was expanded to include the
numeric variables as well. One example where this can
be useful is when lab-id's are used to label plot points.
Note that the dpzchf.dat file is searched first. If no
match is found there, then Dataplot will check the list
of currently defined numeric variables.
The command
LET ROWLABEL = STRING TO ROW LABEL <irow> <s>
was added. This will set the <irow>th row label to <s>.
If <s> is a previously defined string, then the contents
of that string will be used. If no previously defined string
is found, then <s> is treated as literal text.
The command
LET ROWLABEL <ivalue> = <string>
will define the <ivalue>-th row of the row labels to
<string>.
The commands
LET ROWLABEL = SHIFT LEFT <ivalue>
LET ROWLABEL = SHIFT RIGHT <ivalue>
will shift the row label by <ivalue> rows either left
(= down) or right (= up). The vacated rows will be set to
blank.
The command
LET ROWLABEL = DELETE
will re-initialize all row labels to blank.
3) The following enhancements were made to the LET subcommands.
a) The following new statistic LET subcommands were added:
LET A = SHANNON DIVERSITY INDEX Y
LET A = SIMPSON DIVERSITY INDEX Y
LET A = ROBUST POOLED STANDARD DEVIATION Y
LET A = ROBUST POOLED RANGE Y
LET A = UNIQUE X
LET A = EXCESS KURTOSIS Y
LET A = SUM OF SQUARES Y
LET A = DIFFERENCE OF SUM OF SQUARES Y1 Y2
LET A = SUM OF SQUARES FROM MEAN Y
LET A = DIFFERENCE OF SUM OF SQUARES FROM MEAN Y1 Y2
LET A = RESCALED SUM Y
LET A = RLP Y
LET QUANT =
LET A = Q QUANTILE RANGE Y
LET A = PERCENT AGREE Y1 Y2
LET A = PERCENT DISAGREE Y1 Y2
LET A = GROUPED POISSON DISPERSION TEST Y X
LET A = GROUPED POISSON DISPERSION TEST CDF Y X
LET A = GROUPED POISSON DISPERSION TEST PVALUE Y X
LET A = CORRELATION PVALUE Y1 Y2
LET A = CORRELATION CDF Y1 Y2
LET A = CORRELATION ABSOLUTE VALUE Y1 Y2
LET A = RANK CORRELATION ABSOLUTE VALUE Y1 Y2
LET A = RANK CORRELATION CDF Y1 Y2
LET A = RANK CORRELATION PVALUE Y1 Y2
LET A = RANK CORRELATION LOWER TAILED PVALUE Y1 Y2
LET A = RANK CORRELATION UPPER TAILED PVALUE Y1 Y2
LET A = KENDALL TAU ABSOLUTE VALUE Y1 Y2
LET A = KENDALL TAU CDF Y1 Y2
LET A = KENDALL TAU PVALUE Y1 Y2
LET A = KENDALL TAU LOWER TAILED PVALUE Y1 Y2
LET A = KENDALL TAU UPPER TAILED PVALUE Y1 Y2
LET A = PARTIAL CORRELATION Y1 Y2 Y3
LET A = PARTIAL CORRELATION PVALUE Y1 Y2 Y3
LET A = PARTIAL CORRELATION CDF Y1 Y2 Y3
LET A = PARTIAL CORRELATION ABSOLUTE VALUE Y1 Y2 Y3
LET A = PARTIAL RANK CORRELATION Y1 Y2 Y3
LET A = PARTIAL RANK CORRELATION ABSOLUTE VALUE Y1 Y2 Y3
LET A = PARTIAL KENDALL TAU CORRELATION Y1 Y2 Y3
LET A = PARTIAL KENDALL TAU ABSOLUTE VALUE Y1 Y2 Y3
LET A = INDEX FIRST MATCH Y1 Y2
LET A = INDEX LAST MATCH Y1 Y2
LET A = INDEX FIRST NOT MATCH Y1 Y2
LET A = INDEX LAST NOT MATCH Y1 Y2
LET A = WEIGHTED ORDER STATISTIC MEAN Y W
LET A = WEIGHTED SUM Y W
LET A = WEIGHTED SUM OF SQUARES Y W
LET A = WEIGHTED SUM OF ABSOLUTE VALUES Y W
LET A = WEIGHTED AVERAGE OF ABSOLUTE VALUES Y W
LET A = WEIGHTED SUM OF DEVIATIONS FROM THE MEAN Y W
LET A = WEIGHTED SUM OF SQUARED DEVIATIONS FROM THE MEAN Y W
LET A = A BASIS NORMAL Y
LET A = A BASIS LOGNORMAL Y
LET A = A BASIS WEIBULL Y
LET A = A BASIS NONPARAMETRIC Y
LET A = B BASIS NORMAL Y
LET A = B BASIS LOGNORMAL Y
LET A = B BASIS WEIBULL Y
LET A = B BASIS NONPARAMETRIC Y
LET A = LOWER CONFIDENCE LIMIT Y
LET A = UPPER CONFIDENCE LIMIT Y
LET A = ONE SIDED LOWER CONFIDENCE LIMIT Y
LET A = ONE SIDED UPPER CONFIDENCE LIMIT Y
LET A = LOWER PREDICTION LIMIT Y
LET A = UPPER PREDICTION LIMIT Y
LET A = ONE SIDED LOWER PREDICTION LIMIT Y
LET A = ONE SIDED UPPER PREDICTION LIMIT Y
LET A = LOWER PREDICTION BOUND Y
LET A = UPPER PREDICTION BOUND Y
LET A = ONE SIDED LOWER PREDICTION BOUND Y
LET A = ONE SIDED UPPER PREDICTION BOUND Y
LET A = LOWER SD CONFIDENCE LIMIT Y
LET A = UPPER SD CONFIDENCE LIMIT Y
LET A = ONE SIDED LOWER SD CONFIDENCE LIMIT Y
LET A = ONE SIDED UPPER SD CONFIDENCE LIMIT Y
LET A = LOWER SD PREDICTION LIMIT Y
LET A = UPPER SD PREDICTION LIMIT Y
LET A = ONE SIDED LOWER SD PREDICTION LIMIT Y
LET A = ONE SIDED UPPER SD PREDICTION LIMIT Y
LET A = CUMULATIVE SUM FORWARD TEST Y
LET A = CUMULATIVE SUM FORWARD TEST PVALUE Y
LET A = CUMULATIVE SUM BACKWARD TEST Y
LET A = CUMULATIVE SUM BACKWARD TEST PVALUE Y
LET A = DIXON TEST Y
LET A = DIXON MINIMUM TEST Y
LET A = DIXON MAXIMUM TEST Y
LET A = EXTREME STUDENTIZED DEVIATE TEST Y
LET A = FREQUENCY TEST CDF Y
LET A = FREQUENCY TEST Y
LET A = FREQUENCY WITHIN A BLOCK TEST CDF Y
LET A = FREQUENCY WITHIN A BLOCK TEST Y
LET A = GRUBB TEST Y
LET A = GRUBB TEST CDF Y
LET A = GRUBB TEST DIRECTION Y
LET A = GRUBB TEST INDEX Y
LET A = JARQUE BERA TEST Y
LET A = JARQUE BERA TEST CDF Y
LET A = JARQUE BERA TEST PVALUE Y
LET A = MEAN SUCCESSIVE DIFFERENCE TEST Y
LET A = MEAN SUCCESSIVE DIFFERENCE TEST NORMALIZED Y
LET A = MEAN SUCCESSIVE DIFFERENCE TEST CDF Y
LET A = MEAN SUCCESSIVE DIFFERENCE TEST PVALUE Y
LET A = NORMAL TOLERANCE K FACTOR Y
LET A = NORMAL TOLERANCE LOWER LIMIT Y
LET A = NORMAL TOLERANCE UPPER LIMIT Y
LET A = NORMAL TOLERANCE ONE SIDED K FACTOR Y
LET A = NORMAL TOLERANCE ONE SIDED LOWER LIMIT Y
LET A = NORMAL TOLERANCE ONE SIDED UPPER LIMIT Y
LET A = ONE SAMPLE SIGN TEST Y
LET A = ONE SAMPLE SIGN TEST CDF Y
LET A = ONE SAMPLE SIGN TEST PVALUE Y
LET A = ONE SAMPLE SIGN TEST LOWER TAIL PVALUE Y
LET A = ONE SAMPLE SIGN TEST UPPER TAIL PVALUE Y
LET A = ONE SAMPLE T TEST Y
LET A = ONE SAMPLE T TEST CDF Y
LET A = ONE SAMPLE T TEST PVALUE Y
LET A = ONE SAMPLE T TEST LOWER TAIL PVALUE Y
LET A = ONE SAMPLE T TEST UPPER TAIL PVALUE Y
LET A = ONE SAMPLE WILCOXON SIGNED RANK TEST Y
LET A = ONE SAMPLE WILCOXON SIGNED RANK TEST CDF Y
LET A = ONE SAMPLE WILCOXON SIGNED RANK TEST PVALUE Y
LET A = ONE SAMPLE WILCOXON TEST LOWER TAILED PVALUE Y
LET A = ONE SAMPLE WILCOXON TEST UPPER TAILED PVALUE Y
LET A = POISSON DISPERSION TEST Y
LET A = POISSON DISPERSION TEST CDF Y
LET A = POISSON DISPERSION TEST PVALUE Y
LET A = SUMMARY NORMAL TOLERANCE K FACTOR MEAN SD N
LET A = SUMMARY NORMAL TOLERANCE LOWER LIMIT MEAN SD N
LET A = SUMMARY NORMAL TOLERANCE UPPER LIMIT MEAN SD N
LET A = SUMMARY NORMAL TOLERANCE ONE SIDED K FACTOR MEAN SD N
LET A = SUMMARY NORMAL TOLERANCE ONE SIDED LOWER LIMI MEAN SD N
LET A = SUMMARY NORMAL TOLERANCE ONE SIDED UPPER LIMI MEAN SD N
LET A = SUMMARY LOWER PREDICTION LIMIT MEAN SD N
LET A = SUMMARY UPPER PREDICTION LIMIT MEAN SD N
LET A = SUMMARY ONE SIDED LOWER PREDICTION LIMIT MEAN SD N
LET A = SUMMARY ONE SIDED UPPER PREDICTION LIMIT MEAN SD N
LET A = SUMMARY LOWER PREDICTION BOUND MEAN SD N
LET A = SUMMARY UPPER PREDICTION BOUND MEAN SD N
LET A = SUMMARY ONE SIDED LOWER PREDICTION BOUND MEAN SD N
LET A = SUMMARY ONE SIDED UPPER PREDICTION BOUND MEAN SD N
LET A = SUMMARY LOWER SD CONFIDENCE LIMIT SD N
LET A = SUMMARY UPPER SD CONFIDENCE LIMIT SD N
LET A = SUMMARY ONE SIDED LOWER SD CONFIDENCE LIMIT SD N
LET A = SUMMARY ONE SIDED UPPER SD CONFIDENCE LIMIT SD N
LET A = SUMMARY LOWER SD PREDICTION LIMIT SD N
LET A = SUMMARY UPPER SD PREDICTION LIMIT SD N
LET A = SUMMARY ONE SIDED LOWER SD PREDICTION LIMIT SD N
LET A = SUMMARY ONE SIDED UPPER SD PREDICTION LIMIT SD N
LET A = TIETJEN MOORE TEST Y
LET A = TIETJEN MOORE MINIMUM TEST Y
LET A = TIETJEN MOORE MAXIMUM TEST Y
LET A = WILK SHAPIRO TEST Y
LET A = WILK SHAPIRO TEST PVALUE Y
LET A = ANGLIT PPCC Y
LET A = ANGLIT PPCC LOCATION Y
LET A = ANGLIT PPCC SCALE Y
LET A = ARCSINE PPCC Y
LET A = ARCSINE PPCC LOCATION Y
LET A = ARCSINE PPCC SCALE Y
LET A = CAUCHY PPCC Y
LET A = CAUCHY PPCC LOCATION Y
LET A = CAUCHY PPCC SCALE Y
LET A = COSINE PPCC Y
LET A = COSINE PPCC LOCATION Y
LET A = COSINE PPCC SCALE Y
LET A = DOUBLE EXPONENTIAL PPCC Y
LET A = DOUBLE EXPONENTIAL PPCC LOCATION Y
LET A = DOUBLE EXPONENTIAL PPCC SCALE Y
LET A = FATIGUE LIFE PPCC STATISTIC Y
LET A = FATIGUE LIFE PPCC LOCATION Y
LET A = FATIGUE LIFE PPCC SCALE Y
LET A = FATIGUE LIFE PPCC SHAPE Y
LET A = GAMMA PPCC STATISTIC Y
LET A = GAMMA PPCC LOCATION Y
LET A = GAMMA PPCC SCALE Y
LET A = GAMMA PPCC SHAPE Y
LET A = GH PPCC STATISTIC Y
LET A = GH PPCC LOCATION Y
LET A = GH PPCC SCALE Y
LET A = GH PPCC SHAPE ONE Y
LET A = GH PPCC SHAPE TWO Y
LET A = GENERALIZED PARETO PPCC STATISTIC Y
LET A = GENERALIZED PARETO PPCC LOCATION Y
LET A = GENERALIZED PARETO PPCC SCALE Y
LET A = GENERALIZED PARETO PPCC SHAPE Y
LET A = EXPONENTIAL PPCC Y
LET A = EXPONENTIAL PPCC LOCATION Y
LET A = EXPONENTIAL PPCC SCALE Y
LET A = HALF CAUCHY PPCC Y
LET A = HALF CAUCHY PPCC LOCATION Y
LET A = HALF CAUCHY PPCC SCALE Y
LET A = HALF NORMAL PPCC Y
LET A = HALF NORMAL PPCC LOCATION Y
LET A = HALF NORMAL PPCC SCALE Y
LET A = HYPERBOLIC SECANT PPCC Y
LET A = HYPERBOLIC SECANT PPCC LOCATION Y
LET A = HYPERBOLIC SECANT PPCC SCALE Y
LET A = INVERTED WEIBULL PPCC STATISTIC Y
LET A = INVERTED WEIBULL PPCC LOCATION Y
LET A = INVERTED WEIBULL PPCC SCALE Y
LET A = INVERTED WEIBULL PPCC SHAPE Y
LET A = LOGISTIC PPCC Y
LET A = LOGISTIC PPCC LOCATION Y
LET A = LOGISTIC PPCC SCALE Y
LET A = LOGNORMAL PPCC STATISTIC Y
LET A = LOGNORMAL PPCC LOCATION Y
LET A = LOGNORMAL PPCC SCALE Y
LET A = LOGNORMAL PPCC SHAPE Y
LET A = MAXWELL PPCC Y
LET A = MAXWELL PPCC LOCATION Y
LET A = MAXWELL PPCC SCALE Y
LET A = MAXIMUM GUMBEL PPCC Y
LET A = MAXIMUM GUMBEL PPCC LOCATION Y
LET A = MAXIMUM GUMBEL PPCC SCALE Y
LET A = MINIMUM GUMBEL PPCC Y
LET A = MINIMUM GUMBEL PPCC LOCATION Y
LET A = MINIMUM GUMBEL PPCC SCALE Y
LET A = NORMAL PPCC LOCATION Y
LET A = NORMAL PPCC SCALE Y
LET A = RAYLEIGH PPCC Y
LET A = RAYLEIGH PPCC LOCATION Y
LET A = RAYLEIGH PPCC SCALE Y
LET A = SEMICIRCULAR PPCC Y
LET A = SEMICIRCULAR PPCC LOCATION Y
LET A = SEMICIRCULAR PPCC SCALE Y
LET A = SLASH PPCC Y
LET A = SLASH PPCC LOCATION Y
LET A = SLASH PPCC SCALE Y
LET A = TUKEY LAMBDA PPCC STATISTIC Y
LET A = TUKEY LAMBDA PPCC LOCATION Y
LET A = TUKEY LAMBDA PPCC SCALE Y
LET A = TUKEY LAMBDA PPCC SHAPE Y
LET A = UNIFORM PPCC LOCATION Y
LET A = UNIFORM PPCC SCALE Y
LET A = WALD PPCC STATISTIC Y
LET A = WALD PPCC LOCATION Y
LET A = WALD PPCC SCALE Y
LET A = WALD PPCC SHAPE Y
LET A = WEIBULL PPCC STATISTIC Y
LET A = WEIBULL PPCC LOCATION Y
LET A = WEIBULL PPCC SCALE Y
LET A = WEIBULL PPCC SHAPE Y
LET SIGMA =
LET A = CHI-SQUARE SD TEST Y
LET A = CHI-SQUARE SD TEST CDF Y
LET A = CHI-SQUARE SD TEST PVALUE Y
LET A = CHI-SQUARE SD TEST LOWER TAIL PVALUE Y
LET A = CHI-SQUARE SD TEST UPPER TAIL PVALUE Y
LET A = F TEST Y1 Y2
LET A = F TEST CDF Y1 Y2
LET A = F TEST PVALUE Y1 Y2
LET A = KLOTZ TEST Y1 Y2
LET A = KLOTZ TEST CDF Y1 Y2
LET A = KLOTZ TEST PVALUE Y1 Y2
LET A = KLOTZ TEST LOWER TAILED PVALUE Y1 Y2
LET A = KLOTZ TEST UPPER TAILED PVALUE Y1 Y2
LET A = KRUSKAL WALLIS TEST Y X
LET A = KRUSKAL WALLIS TEST CDF Y X
LET A = KRUSKAL WALLIS TEST PVALUE Y X
LET A = MANN WHITNEY RANK SUM TEST Y1 Y2
LET A = MANN WHITNEY RANK SUM TEST CDF Y1 Y2
LET A = MANN WHITNEY RANK SUM TEST PVALUE Y1 Y2
LET A = MANN WHITNEY RANK SUM LOWER TAIL PVALUE Y1 Y2
LET A = MANN WHITNEY RANK SUM UPPER TAIL PVALUE Y1 Y2
LET A = MANN WHITNEY U STATISTIC Y1 Y2
LET A = TWO SAMPLE CHI SQUARE TEST Y1 Y2
LET A = TWO SAMPLE CHI SQUARE TEST CDF Y1 Y2
LET A = TWO SAMPLE CHI SQUARE TEST PVALUE Y1 Y2
LET A = TWO SAMPLE KOLMOGOROV SMIRNOV TEST Y1 Y2
LET A = TWO SAMPLE KOLMOGOROV SMIRNOV TEST CRITICAL VALUE Y1 Y2
LET A = TWO SAMPLE SIGN TEST Y1 Y2
LET A = TWO SAMPLE SIGN TEST CDF Y1 Y2
LET A = TWO SAMPLE SIGN TEST PVALUE Y1 Y2
LET A = TWO SAMPLE SIGN TEST LOWER TAIL PVALUE Y1 Y2
LET A = TWO SAMPLE SIGN TEST UPPER TAIL PVALUE Y1 Y2
LET A = TWO SAMPLE T TEST Y1 Y2
LET A = TWO SAMPLE T TEST CDF Y1 Y2
LET A = TWO SAMPLE T TEST PVALUE Y1 Y2
LET A = TWO SAMPLE T TEST LOWER TAILED PVALUE Y1 Y2
LET A = TWO SAMPLE T TEST UPPER TAILED PVALUE Y1 Y2
LET A = TWO SAMPLE WILCOXON SIGNED RANK TEST Y1 Y2
LET A = TWO SAMPLE WILCOXON SIGNED RANK TEST CDF Y1 Y2
LET A = TWO SAMPLE WILCOXON SIGNED RANK TEST PVALUE Y1 Y2
LET A = TWO SAMPLE WILCOXON TEST LOWER TAILED PVALUE Y1 Y2
LET A = TWO SAMPLE WILCOXON TEST UPPER TAILED PVALUE Y1 Y2
LET A = ANDERSON DARLING K SAMPLE TEST Y X
LET A = ANDERSON DARLING K SAMPLE TEST CRITICAL VALUE Y X
LET A = MEDIAN TEST Y X
LET A = MEDIAN TEST CDF Y X
LET A = MEDIAN TEST PVALUE Y X
LET A = SQUARED RANK TEST Y X
LET A = SQUARED RANK TEST CDF Y X
LET A = SQUARED RANK TEST PVALUE Y X
LET A = SQUARED RANK TEST LOWER TAILED PVALUE Y X
LET A = SQUARED RANK TEST UPPER TAILED PVALUE Y X
b) The statistic LET subcommands now support matrix arguments.
For example,
LET A = MEAN M
LET A = DIFFERENCE OF MEANS M N
where M and N are matrices. Note that the matrix will be converted
to a variable (in a columnwise order) before applying the command.
This means that the number of rows times the number of columns must
be less than or equal to the maximum number of rows per variable
(this is set to 1,000,000 on most current implementations).
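The column-wise conversion described above can be pictured with a short Python sketch. This is an illustration only, not Dataplot code; the names matrix_to_variable, mean, and MAX_ROWS are hypothetical, and the limit of 1,000,000 is the value quoted above for most implementations.

```python
# Illustrative sketch only: mimics the column-wise conversion of a matrix
# to a single variable. MAX_ROWS is an assumed limit, not read from Dataplot.
MAX_ROWS = 1_000_000


def matrix_to_variable(matrix, max_rows=MAX_ROWS):
    """Flatten a list-of-rows matrix column by column."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    if n_rows * n_cols > max_rows:
        raise ValueError("rows x columns exceeds the maximum rows per variable")
    return [matrix[i][j] for j in range(n_cols) for i in range(n_rows)]


def mean(values):
    """Plain arithmetic mean of the flattened values."""
    return sum(values) / len(values)


m = [[1.0, 4.0],
     [2.0, 5.0],
     [3.0, 6.0]]
y = matrix_to_variable(m)   # column 1 first, then column 2
print(y)
print(mean(y))
```

The statistic is then computed on the flattened values exactly as it would be for an ordinary variable.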
Be aware that Dataplot distinguishes between "statistic" and "math"
LET subcommands. The statistic LET subcommands work with variables
on the right hand side and always return a parameter (i.e., scalar)
value. The math LET subcommands may have a mix of parameters and
variables on both the left and right hand sides. At the current
time, only those math LET subcommands that explicitly work with
matrices support matrix arguments. Volume II of the online Reference
Manual provides separate chapters for the "statistic", "math", and
"matrix" LET subcommands. It is only those commands in the
"statistic" chapter that are affected by this update.
c) The following new math LET subcommands were added:
LET X FREQ CDF = MANN WHITNEY U STATISTIC FREQUENCY N1 N2
LET TAG = KEEP X XKEEP
LET TAG = OMIT X XOMIT
LET Y2 TAG = THRESHOLD MINIMUM Y TVAL
LET Y2 TAG = THRESHOLD MAXIMUM Y TVAL
LET Y = CUMULATIVE <STAT> Y
LET Y = CROSS TABULATE CUMULATIVE <STAT> Y X
LET Y = WEIBULL MOMENT ESTIMATORS X
LET Y = PERCENTAGE RANK X
LET Y = EXPAND XLAB XVAL
LET Y2 = JSCORE Z ROUND
LET Y2 = ISO 13528 ZSCORE Y XREF SIGMA
LET Y2 = ISO 13528 ZPRIME Y XREF SIGMA
LET Y2 = ISO 13528 EN SCORE Y ULAB XREF UREF
LET Y2 = ISO 13528 ZETA SCORE Y ULAB XREF UREF
LET Y2 = ISO 13528 EZMINUS SCORE Y ULAB XREF UREF
LET Y2 = ISO 13528 EZPLUS SCORE Y ULAB XREF UREF
LET MOUT = MATRIX COMBINE COLUMNS M N
LET MOUT = MATRIX COMBINE ROWS M N
LET MOUT = PARTIAL CORRELATION MATRIX M
LET MOUT = PARTIAL CORRELATION CDF MATRIX M
LET MOUT = PARTIAL CORRELATION PVALUE MATRIX M
LET MOUT = CORRELATION CDF MATRIX M
LET MOUT = CORRELATION PVALUE MATRIX M
LET YOUT = LOW PASS FILTER Y
LET YOUT = HIGH PASS FILTER Y
LET TAG = POINTS IN POLYGON XVAL YVAL XPOLY YPOLY
LET Y2 X2 = TRANSFORM POINTS Y X TX TY SX SY THETA
LET Y2 X2 = EXTREME POINTS Y X
LET Y2 X2 = LINE INTERSECTIONS X1 Y1 X2 Y2 X3 Y3 X4 Y4
LET Y2 X2 = PARALLEL LINES X1 Y1 X2 Y2 X3 Y3
LET Y2 X2 = PERPINDICULAR LINES X1 Y1 X2 Y2 X3 Y3
LET YINDEX = NEAREST NEIGHBOR INDEX Y X
LET YDIST = NEAREST NEIGHBOR DISTANCE Y X
LET YINDEX YDIST = NEAREST NEIGHBOR Y X
LET Y2 X2 TAG2 = NEAREST NEIGHBOR JOIN Y1 X1 YINDEX
LET Y3 X3 DIST = FIRST NEAREST NEIGHBOR Y1 X1 Y2 X2
LET Y3 X3 DIST TAG1 TAG2 = ALL NEAREST NEIGHBORS Y1 X1 Y2 X2
LET Y2 X2 YCODED = BINNED CODED Y
The INTEGRAL command was updated to allow indefinite integrals
(i.e., either the lower limit or the upper limit is infinity).
If you specify the lower limit as CPUMIN or -INFINITY or you
specify the upper limit as CPUMAX or INFINITY, then the
indefinite integration code will automatically be invoked.
You do not have to define CPUMIN/CPUMAX/INFINITY (Dataplot
checks for the literal text, not the value of any parameter
that may be defined with these names).
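As a rough illustration of the two ideas above (recognizing the limit by its literal text and integrating over an infinite range), here is a minimal Python sketch. It is not the algorithm Dataplot uses; the function names are hypothetical, and the tangent substitution with a midpoint rule is simply one common way to handle an infinite interval.

```python
import math

# Literal tokens that signal an infinite limit, as listed in the text above.
INFINITE_TOKENS = {"CPUMIN", "-INFINITY", "CPUMAX", "INFINITY"}


def is_infinite_limit(token):
    """Check the literal text of a limit, not the value of any parameter."""
    return token.strip().upper() in INFINITE_TOKENS


def integrate_whole_line(f, n=2000):
    """Integrate f over (-inf, inf) using x = tan(t) and the midpoint rule.

    Assumed technique for illustration only.
    """
    h = math.pi / n
    total = 0.0
    for k in range(n):
        t = -math.pi / 2 + (k + 0.5) * h
        x = math.tan(t)
        total += f(x) / math.cos(t) ** 2   # dx = sec^2(t) dt
    return total * h


approx = integrate_whole_line(lambda x: math.exp(-x * x))
print(approx, math.sqrt(math.pi))   # the exact value is sqrt(pi)
```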
4) The following enhancements were made to the graphics commands.
a) The following graphics commands were added
ISO 13528 PLOT Y Z ROUND LABID LAB
ISO 13528 ZSCORE PLOT Z MATID ROUNDID
ISO 13528 JSCORE PLOT Z MATID ROUNDID
ISO 13528 RLP PLOT Z LABID MATID
b) The following updates were made to the HOMOSCEDASTICITY PLOT:
i) Added support for the MULTIPLE option.
ii) Added support for the SUBSET (or HIGHLIGHT) option.
iii) Allow more than one group-id variable.
iv) Allow alternate measures for location and scale.
v) Added support for summary data.
vi) Added support for the "circle technique" to identify
non-homogeneous labs.
vii) Added support for the TO syntax.
Enter HELP HOMOSCEDASTICITY PLOT for details.
c) Added several options to the BLOCK PLOT. Enter HELP BLOCK PLOT
for details.
d) Added the command
FRECHET PLOT Y
This is similar to a Weibull plot, but it fits a 2-parameter
Frechet (maximum case) rather than the 2-parameter Weibull.
e) Several enhancements were made to the I PLOT command.
i) Added the options
MEAN I PLOT Y X
MIDMEAN I PLOT Y X
MIDRANGE I PLOT Y X
TRIMMED MEAN I PLOT Y X
BIWEIGHT I PLOT Y X
MEAN CONFIDENCE LIMIT PLOT Y X
MEDIAN CONFIDENCE LIMIT PLOT Y X
QUANTILE CONFIDENCE LIMIT PLOT Y X
BIWEIGHT CONFIDENCE LIMIT PLOT Y X
TRIMMED MEAN CONFIDENCE LIMIT PLOT Y X
ONE STANDARD ERROR PLOT Y X
TWO STANDARD ERROR PLOT Y X
ONE STANDARD DEVIATION PLOT Y X
TWO STANDARD DEVIATION PLOT Y X
NORMAL TOLERANCE LIMIT PLOT Y X
NORMAL PREDICTION LIMIT PLOT Y X
AGRESTI COULL LIMIT PLOT Y X
ii) Added the REPLICATION option
REPLICATED I PLOT Y X1 .... XK
where there can be from one to six replication variables.
In addition, there is a special replication syntax when
there are exactly two replication variables
I PLOT Y X1 X2
(that is, omit the REPLICATED keyword).
Enter HELP I PLOT for a description of using replication
variables.
f) Continued to add support for the MULTIPLE, REPLICATION, and
HIGHLIGHT options and support for MATRIX arguments. Specifically,
i) Support for MULTIPLE option:
ANOP PLOT, I PLOT, INFLUENCE CURVE, PERCENT POINT PLOT,
RUN SEQUENCE PLOT, VIOLIN PLOT
ii) Support for REPLICATION option:
I PLOT, PERCENT POINT PLOT, RUN SEQUENCE PLOT, VIOLIN PLOT
iii) Support for HIGHLIGHT option:
BIHISTOGRAM, LAG PLOT, NORMAL PLOT, PERCENT POINT PLOT,
QUANTILE-QUANTILE PLOT, RUN SEQUENCE PLOT, SHIFT PLOT,
TUKEY MEAN DIFFERENCE PLOT, WEIBULL PLOT, 4-PLOT
iv) Support for matrix arguments (not supported when the
REPLICATION option is used):
ANOP PLOT, BIHISTOGRAM, COMPLEX DEMODULATION, DUANE PLOT,
I PLOT, INFLUENCE CURVE, LAG PLOT, NORMAL PLOT,
PARALLEL COORDINATES PLOT, PERCENT POINT PLOT,
QUANTILE-QUANTILE PLOT, RUN SEQUENCE PLOT, SHIFT PLOT,
STEM AND LEAF PLOT, TUKEY MEAN DIFFERENCE PLOT,
VIOLIN PLOT, WEIBULL PLOT, 4-PLOT
Note that matrix arguments will be converted to a variable
in a column-wise fashion. So the number of rows times the
number of columns must be less than the maximum number of
rows for a variable (this is set to 1,000,000 on most
systems, but it may vary from this).
5) The following updates were made to the Analysis commands.
a) The following non-parametric tests were added:
COX STUART TEST Y - sign test for trend
KLOTZ TEST Y1 Y2 - two-sample test for equal variances
MEDIAN TEST Y X - test for equal medians for k groups
SQUARED RANKS TEST Y X - test for equal variances for k groups
FISHER TWO SAMPLE RAND - two sample Fisher randomization test
TEST Y1 Y2 for equal location
PAGE TEST Y X1 X2 - Page test for two factor ANOVA
QUADE TEST Y X1 X2 - Quade test for two factor ANOVA
KENDALL TAU INDEPENDENCE TEST Y1 Y2 - two sample
independence test
RANK CORRELATION INDEPENDENCE TEST Y1 Y2 - two sample
independence test
b) Added the commands
PREDICTION LIMITS - prediction limits for the
mean of new observations
PREDICTION BOUNDS - prediction limits to cover
all new observations
SD CONFIDENCE LIMITS - confidence limits for the
standard deviation
SD PREDICTION LIMITS - prediction limits for the
standard deviation
CORRELATION CONFIDENCE LIMITS - confidence limits for the
correlation coefficient based on
Fisher's normal approximation
c) Added the commands
JARQUE BERA NORMALITY TEST - perform a Jarque-Bera test for
normality
POISSON DISPERSION TEST - perform the Poisson dispersion
test for Poissonality
MEAN SUCCESSIVE DIFF TEST - perform a mean successive
differences test for randomness
BEST DISTRIBUTIONAL FIT Y - search for best fitting distribution
(univariate data, continuous
distributions, no censoring)
MCCOOL WEIBULL LOCATION TEST - test for samples of size 10 to 100
to distinguish between a
3-parameter and a 2-parameter
Weibull distribution
e) The REPLICATED and MULTIPLE options, support for matrix arguments
and support for the TO syntax were added to additional analysis
commands. In addition, the output was reformatted for many of these
commands. Specifically,
i) Support for MULTIPLE option:
ABASIS, ANDERSON-DARLING K-SAMPLE, BARTLETT TEST, BBASIS,
CAPABILITY ANALYSIS, CHI-SQUARE SD TEST, CUMULATIVE SUM,
F LOCATION TEST, F TEST, FREQUENCY TEST, GOODNESS OF FIT,
KOLM SMIR TWO SAMPLE TEST, KRUSKAL WALLIS, LEVENE TEST,
LJUNG BOX TEST, MANN WHITNEY RANK SUM TEST, RUNS, SIGN TEST,
SUMMARY, T TEST, TOLERANCE LIMITS, TWO SAMPLE CHI-SQUARE,
VAN DER WAERDEN, WILCOXON SIGNED RANK TEST, WILK-SHAPIRO
The interpretation of the MULTIPLE option depends on the
data expected.
a) When a single response variable is expected (e.g., the
SUMMARY command), the MULTIPLE option means the test will
be applied to each response variable independently.
b) When two variables are expected where the first variable
is the response variable and the second variable is a
group-id variable (e.g., the KRUSKAL WALLIS TEST), the
MULTIPLE option means that each variable is treated as
a distinct group, no group-id variable is entered, and a
single test is performed.
c) When two response variables are expected (e.g., the
F TEST), the MULTIPLE option will perform the test on all
the pairwise combinations of response variables. That is
F TEST Y1 TO Y4
is equivalent to entering
F TEST Y1 Y2
F TEST Y1 Y3
F TEST Y1 Y4
F TEST Y2 Y3
F TEST Y2 Y4
F TEST Y3 Y4
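The pairwise expansion for case c) can be sketched in Python as follows. This only illustrates the enumeration order shown above; pairwise_tests is a hypothetical helper, not a Dataplot function.

```python
from itertools import combinations


def pairwise_tests(command, variables):
    """List every two-variable command implied by the MULTIPLE option."""
    return [f"{command} {a} {b}" for a, b in combinations(variables, 2)]


for line in pairwise_tests("F TEST", ["Y1", "Y2", "Y3", "Y4"]):
    print(line)
```

For k response variables this produces k(k-1)/2 tests, so the amount of output grows quickly as variables are added.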
ii) Support for REPLICATION option:
ABASIS, BBASIS, CAPABILITY ANALYSIS, CUMULATIVE SUM,
FREQUENCY TEST, GOODNESS OF FIT, LJUNG-BOX, RUNS, SUMMARY,
TOLERANCE LIMITS, WILK-SHAPIRO
iii) Support for matrix arguments:
All of the commands listed above for the MULTIPLE option
now support matrix arguments. The following additional
commands also support matrix arguments
BINOMIAL PROPORTION TEST,
DIFFERENCE OF PROPORTION CONFIDENCE LIMITS,
PROPORTION CONFIDENCE LIMITS
Matrix arguments are not supported when the REPLICATION
option is used.
iv) Support for RTF formatted output has been extended to
all cases where previously only HTML/LATEX formatted
output was supported. The HTML/LATEX/RTF support was
extended to a number of additional commands.
v) Many of the commands have reformatted the output for better
clarity and readability. These are not listed individually.
6) The following miscellaneous commands were added.
a) Added the command PWD to retrieve the current working
directory.
b) Added the command PSVIEW. This command will preview the
current plot (i.e., the dppl2f.dat file) with a Postscript
viewer. You can specify the program to use as the Postscript
viewer with the command
SET POSTSCRIPT VIEWER /usr/bin/evince
The default is Ghostview for both Windows and Linux/Unix.
c) Added the following option to the CAPTURE command:
CAPTURE SCRIPT <filename>
This option saves the subsequent commands to a file
without executing them. The intended purpose of this
is to allow scripts for external programs (e.g., Python,
Perl, and so on) to be created within a Dataplot
macro. You can subsequently use the SYSTEM command to
execute the script.
d) Dataplot runs Ghostscript behind the scenes for several
commands. For Windows platforms, version 9.x of Ghostscript
provides both 32-bit executables and 64-bit executables. The
following command was added to allow you to specify which
version of Ghostscript is installed on your system
SET GHOSTSCRIPT VERSION <32/64>
This command is only applicable on Windows platforms.
The default is "64". This command has been added to the
DPLOGF.TEX file that is installed by the Dataplot Windows
installation. If you have the 32-bit version of Ghostscript
installed, it is recommended that you modify the DPLOGF.TEX
file.
This command is ignored for Unix/Linux and Mac OS X platforms.
7) The following colors were added:
R0 - R255 - turns on RED with an intensity level from 0
to 255
Z0 - Z255 - turns on GREEN with an intensity level from 0
to 255
B0 - B255 - turns on BLUE with an intensity level from 0
to 255
Note that "Z" is used for GREEN because Gxxx is already used
for gray scale colors.
For devices that don't support full RGB colors, these will
be mapped to RED, GREEN, and BLUE.
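The naming scheme above can be illustrated with a small Python sketch that turns a color name into an RGB triple and into the fallback color name. The helper names are hypothetical, and the sketch assumes the unused channels are zero, which the text above does not state explicitly.

```python
import re

# Z stands for green because G is reserved for gray scale colors.
_CHANNEL = {"R": 0, "Z": 1, "B": 2}
_FALLBACK = {"R": "RED", "Z": "GREEN", "B": "BLUE"}
_PATTERN = re.compile(r"^([RZB])(\d{1,3})$")


def _split(name):
    match = _PATTERN.match(name.strip().upper())
    if not match or int(match.group(2)) > 255:
        raise ValueError(f"not an R/Z/B intensity color: {name!r}")
    return match.group(1), int(match.group(2))


def parse_color(name):
    """Return an (r, g, b) tuple; other channels are assumed to be zero."""
    letter, level = _split(name)
    rgb = [0, 0, 0]
    rgb[_CHANNEL[letter]] = level
    return tuple(rgb)


def fallback_color(name):
    """Color name used on devices without full RGB support."""
    letter, _ = _split(name)
    return _FALLBACK[letter]


print(parse_color("Z128"), fallback_color("Z128"))
```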
8) A number of bugs have been fixed.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
August 2009 - November 2010.
-----------------------------------------------------------------------
1) The following enhancements were made to the graphics commands.
a) Several features are being developed for general implementation
for the graphics commands. These will be phased in over
the next several releases. Implementation of each of these
features will be considered for each of the graphics commands
on a case by case basis.
i) REPLICATION - for this option, one or more group-id
variables can be specified. These group-id variables
are cross tabulated and the plot is generated for
each combination of the cross tabulated values.
The LINE and CHARACTERS commands (and associated attribute
setting commands) can be used to distinguish the
various curves.
ii) MULTIPLE - many Dataplot commands expect syntax like
BOX PLOT Y X
where Y is the response variable and X is a group-id
variable.
In many cases, the groups may be in separate
columns. The MULTIPLE option will support the
following syntax
MULTIPLE BOX PLOT Y1 Y2 Y3 Y4
Although you can use the LET ... = STACK command to
put the data in the Y X form, the MULTIPLE option
makes that step unnecessary.
For commands that accept a single response (e.g.,
BOX COX NORMALITY PLOT), the MULTIPLE option can be
used to overlay several curves on the same plot.
The MULTIPLE option cannot be used with the
REPLICATION option.
iii) SUBSET (highlighting) - for many plots, it may be
useful to highlight certain points. The SUBSET option
typically specifies a group variable. Based on this
group variable, you can use the LINE and CHARACTER
(and related) commands to highlight certain points.
For example, the highlighted points might be drawn
in a different color.
Although this option is similar to REPLICATION, the
two behave differently. For example, if you use REPLICATION to
define two groups for a normal probability plot,
two distinct probability plots are generated. On the
other hand, if you use SUBSET to define two groups for
the normal probability plot, there is only one probability
plot generated. However, the two groups can be plotted
with different attributes.
iv) TO syntax - several Dataplot commands support a TO syntax.
For example, READ X1 TO X4 is equivalent to
READ X1 X2 X3 X4. This syntax will be extended to more
commands.
v) Dataplot supports a matrix type. Previously, matrix
arguments were restricted to commands that specifically
operated on matrices. Commands that expect a
univariate argument or that support the MULTIPLE
option are good candidates for adding support for
matrix arguments. Matrix arguments will not be
supported for the REPLICATION option or for the case
where multiple response variables are expected.
Note that a matrix argument will be treated as a
variable argument. For example, NORMAL PROBABILITY PLOT M
(where M is the name of a matrix) will generate a single
probability plot for all values in the matrix.
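Two of the conventions above, the TO syntax and the Y X form that the MULTIPLE option makes unnecessary, can be sketched in Python. The helpers expand_to and stack are hypothetical, and numbering the groups 1, 2, ... is an assumption made for illustration.

```python
import re


def expand_to(first, last):
    """Expand a range such as X1 TO X4 into ['X1', 'X2', 'X3', 'X4']."""
    m1 = re.fullmatch(r"([A-Za-z_]+)(\d+)", first)
    m2 = re.fullmatch(r"([A-Za-z_]+)(\d+)", last)
    if not m1 or not m2 or m1.group(1) != m2.group(1):
        raise ValueError("both names need the same prefix and a numeric suffix")
    lo, hi = int(m1.group(2)), int(m2.group(2))
    if hi < lo:
        raise ValueError("the second suffix must not be smaller than the first")
    return [f"{m1.group(1)}{i}" for i in range(lo, hi + 1)]


def stack(columns):
    """Stack separate columns into a response y and a group-id x."""
    y, x = [], []
    for group_id, column in enumerate(columns, start=1):
        y.extend(column)
        x.extend([group_id] * len(column))
    return y, x


print(expand_to("X1", "X4"))
print(stack([[5.1, 4.9], [6.0], [5.5, 5.7]]))
```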
b) Added the command
TABULATION PLOT Y X1 ... X4 YLEVEL
This plot is a bit of a mix between a fluctuation plot
and a contour plot. Enter HELP TABULATION PLOT for
details.
Some enhancements were made to the FLUCTUATION PLOT
as well. Enter HELP FLUCTUATION PLOT for details.
c) Added a "BATCH MULTIPLE" option for the STRIP PLOT
command. Enter HELP STRIP PLOT for details.
d) Made several changes to the HISTOGRAM command.
i) By default, Dataplot sets the lower and upper class limits
to xbar -/+ 6*s (with xbar and s denoting the sample mean
and standard deviation), respectively. This can
occasionally result in a few outlying points being excluded
from the histogram. To adjust the lower and upper class
limits so that these outlying points are included, enter the
command
SET HISTOGRAM OUTLIERS ON
To revert to the default, enter
SET HISTOGRAM OUTLIERS OFF
ii) By default, the histogram draws all cells, even those with
zero frequency. To suppress these zero frequency cells,
enter
SET HISTOGRAM EMPTY BINS OFF
To restore the default, enter
SET HISTOGRAM EMPTY BINS ON
iii) Previously, Dataplot only generated histograms for the case
where the bin widths were equal. This has been extended
to the case with unequal bin widths. The syntax is
HISTOGRAM Y XLOW XHIGH
with XLOW containing the values for the lower bin limit
and XHIGH containing the values for the upper bin limit.
iv) Added the following option
SUBSET HISTOGRAM Y X
In this case, X is a group-id variable. This syntax
can be used to highlight the contribution to the
histogram for particular subsets of the data.
v) Fixed a bug in the CUMULATIVE RELATIVE HISTOGRAM
for the AREA case. If SET RELATIVE AREA HISTOGRAM
is set to AREA (the default), relative histograms
are normalized so that the area is equal to 1; if
it is set to PERCENT, the sum of the bar heights is
equal to 1. The PERCENT case did not have a bug.
For details, enter HELP HISTOGRAM.
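The default class limits from item i) and the two normalizations from item v) can be illustrated with a short Python sketch. It is not Dataplot code: the function names are hypothetical, and it assumes s is the usual sample standard deviation (divisor n-1).

```python
import statistics


def default_class_limits(y):
    """Lower and upper class limits xbar -/+ 6*s."""
    xbar = statistics.mean(y)
    s = statistics.stdev(y)          # assumed: divisor n - 1
    return xbar - 6 * s, xbar + 6 * s


def excluded_points(y):
    """Points that fall outside the default class limits."""
    lo, hi = default_class_limits(y)
    return [v for v in y if v < lo or v > hi]


def relative_heights(counts, width, mode="AREA"):
    """Bar heights for equal-width bins under the two normalizations."""
    n = sum(counts)
    if mode == "AREA":
        return [c / (n * width) for c in counts]   # total area is 1
    if mode == "PERCENT":
        return [c / n for c in counts]             # heights sum to 1
    raise ValueError("mode must be AREA or PERCENT")


y = [0.0] * 99 + [100.0]             # one extreme point among 100 values
print(default_class_limits(y))
print(excluded_points(y))
print(relative_heights([2, 5, 3], width=0.5, mode="AREA"))
```

In this constructed sample the mean is 1 and the standard deviation is 10, so the single value of 100 lies beyond the upper limit of 61. This is the situation that SET HISTOGRAM OUTLIERS ON is meant to handle.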
e) Made several enhancements to the FREQUENCY PLOT and
KERNEL DENSITY commands.
i) The SET HISTOGRAM OUTLIERS option applies to the
FREQUENCY PLOT.
ii) As with the HISTOGRAM, non-equispaced bins are
supported for the FREQUENCY PLOT:
FREQUENCY PLOT Y XLOW XHIGH
iii) The REPLICATED and MULTIPLE options were added
to these commands. For the REPLICATED case, either
one or two replication variables can be specified.
Support was also added for matrix arguments and
for the TO syntax.
f) Made several enhancements to the BOX PLOT command.
Support was added for the MULTIPLE and REPLICATED
options. Up to six replication variables can be
specified. The word REPLICATION is optional.
For the REPLICATED case, you can control the spacing between
groups. Internally, Dataplot uses the CODE CROSS TABULATE
command to generate a single combined group-id variable. Enter
HELP CODE CROSS TABULATE for details on how to control the
spacing (the SET commands used by CODE CROSS TABULATE are
supported for the BOX PLOT command).
Support was added for matrix arguments for the MULTIPLE case or
for the case where only a single argument is given.
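The idea of collapsing several replication variables into a single combined group-id can be sketched in Python. This is only an assumption about the general approach: the actual code values and spacing produced by CODE CROSS TABULATE are controlled by its SET commands and may differ from the consecutive integers used here.

```python
def code_cross_tabulate(*group_vars):
    """Assign one integer code to each distinct combination of group values.

    Codes follow the sorted order of the combinations (an assumption made
    for this illustration).
    """
    combos = list(zip(*group_vars))
    levels = {combo: i for i, combo in enumerate(sorted(set(combos)), start=1)}
    return [levels[combo] for combo in combos]


x1 = [1, 1, 2, 2, 1, 2]
x2 = [1, 2, 1, 2, 1, 2]
print(code_cross_tabulate(x1, x2))
```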
g) Made several enhancements to the
BOX COX NORMALITY PLOT Y
BOX COX HOMGENEITY PLOT Y
BOX COX LINEARITY PLOT Y X
commands.
The REPLICATED option is supported for all 3 plots. Either one or
two replication variables can be specified. The MULTIPLE option is
supported for the BOX COX NORMALITY PLOT.
The BOX COX NORMALITY plot supports matrix arguments for the MULTIPLE
case or for the case where only a single argument is given. The
TO syntax is supported for all of these commands.
h) For the PROBABILITY PLOT, added support for the MULTIPLE and
REPLICATED (for up to 2 replication variables) options,
support for matrix arguments, and support for the
HIGHLIGHT option.
In addition, you can enter the commands
LET PPLOC =
LET PPSCALE =
before entering the PROBABILITY PLOT command. This adds location
and scale parameters to the theoretical distribution. This is
intended for the case where the distribution parameters are
estimated by a non-PPCC method (e.g., maximum likelihood) and
you want to generate the probability plot using the estimated
parameters.
i) For the PPCC PLOT, added support for the MULTIPLE and
REPLICATED (for up to 2 replication variables) options
and support for matrix arguments.
j) The BOOTSTRAP PLOT and JACKNIFE PLOT commands were updated
to include reports in addition to the plots.
i) If the BOOTSTRAP/JACKNIFE plot is applied to a statistic
(e.g., BOOTSTRAP MEAN PLOT Y), the following tables are
generated:
1) An initial summary table.
2) A table containing percent points for the computed
statistic.
3) A table containing percentile confidence limits for
the statistic for various values of alpha.
ii) If the BOOTSTRAP/JACKNIFE plot is applied to
a distributional fit (e.g., BOOTSTRAP WEIBULL PPCC PLOT Y
or BOOTSTRAP WEIBULL MLE PLOT Y), the following tables are
generated:
1) An initial summary table.
2) A table containing percentile confidence limits for
each of the parameters of the distribution for various
levels of alpha.
3) If the SET MAXIMUM LIKELIHOOD PERCENTILES command was
given, a table containing confidence limits for the
specified percentiles will be generated.
For both cases, the SET WRITE DECIMALS command can be
used to specify the number of decimals to use in the
tables and the CAPTURE HTML, CAPTURE LATEX, and
CAPTURE RTF options are supported.
2) Added or enhanced the following analysis commands:
a) Similar to the graphics commands, the REPLICATED and MULTIPLE
options and support for matrix arguments will be added to the
analysis commands over the next several releases on a case by
case basis.
b) The output for the GRUBBS command was modified.
Support was added for the MULTIPLE and REPLICATED options.
Up to six replication variables can be specified.
In addition, the following capability was added:
GRUBB TEST Y LABID
The LABID variable is used for identification purposes
only in the output.
c) In addition, the following new outlier tests were added:
DIXON Y
TIETJEN MOORE Y
EXTREME STUDENTIZED DEVIATE Y
The Dixon test is a small sample test for a single outlier.
The Tietjen-Moore test is a generalization of the Grubbs
test to the case of more than one outlier where the number
of outliers must be specified exactly. The extreme studentized
deviate test is a generalization of the Grubbs test to the
case of more than one outlier where only the upper bound on
the number of outliers must be specified.
These commands support the MULTIPLE and REPLICATED options
in a similar manner as the GRUBBS command.
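To make the relationship between the Grubbs test and the extreme studentized deviate test concrete, here is a minimal Python sketch of the test statistics only. It follows the usual textbook definition (largest absolute deviation from the mean divided by the sample standard deviation, recomputed after removing the most extreme point); it omits critical values and is not taken from the Dataplot source. The function names and the sample data are hypothetical.

```python
import statistics


def grubbs_statistic(y):
    """Largest absolute deviation from the mean, in units of s."""
    xbar = statistics.mean(y)
    s = statistics.stdev(y)
    return max(abs(v - xbar) for v in y) / s


def esd_statistics(y, max_outliers):
    """Statistics for up to max_outliers suspected outliers.

    Each step records the most extreme point and its studentized deviate,
    then removes that point before the next step.
    """
    data = list(y)
    results = []
    for _ in range(max_outliers):
        if len(data) < 3:
            break
        xbar = statistics.mean(data)
        s = statistics.stdev(data)
        if s == 0:
            break
        idx = max(range(len(data)), key=lambda i: abs(data[i] - xbar))
        results.append((data[idx], abs(data[idx] - xbar) / s))
        del data[idx]
    return results


sample = [2.1, 2.3, 1.9, 2.0, 2.2, 9.5, 2.1, 2.0]
print(grubbs_statistic(sample))
print(esd_statistics(sample, max_outliers=2))
```

The first statistic in the sequence is the Grubbs statistic itself, which is why only an upper bound on the number of outliers is needed.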
d) For the following commands
CONFIDENCE LIMITS
BIWEIGHT CONFIDENCE LIMITS
TRIMMED MEAN CONFIDENCE LIMITS
MEDIAN CONFIDENCE LIMITS
QUANTILE CONFIDENCE LIMITS
DIFFERENCE OF MEANS CONFIDENCE LIMITS
added support for the MULTIPLE and REPLICATION options.
In addition, matrix arguments are now supported (except for
the REPLICATION case). The MULTIPLE option is not supported
for the DIFFERENCE OF MEANS CONFIDENCE LIMITS case.
3) Added the following miscellaneous commands:
a) CPU TIME - this command prints the current CPU time
used by the current Dataplot session.
b) The CHARACTER command now accepts up to 16 characters for
the plot symbol. The previous limit was 4 characters.
This capability is most useful for the case where the
CHARACTER command is used to label specific points. In
particular, it can be useful for the CHARACTER AUTOMATIC
command.
c) The command
CALL filename.dp
can now also be run by entering
filename.dp
That is, the CALL is optional.
4) Significant restructuring is being performed for the
goodness of fit and goodness of fit plots for probability
distributions. Much of this change is to reduce duplicate code,
to make various goodness of fit commands more consistent,
and to make some planned future updates easier to implement.
Although much of this change is primarily internal and should
be transparent to users, the following updates were made.
a) The Anderson-Darling option was added as an alternative
to the PPCC PLOT and KS PLOT:
ANDERSON DARLING PLOT Y
The Anderson-Darling plot is currently supported for ungrouped
and uncensored data (i.e., the same as the KS test).
b) The Anderson-Darling syntax was changed to
ANDERSON DARLING GOODNESS OF FIT Y
c) The output format for the Anderson-Darling, Kolmogorov-Smirnov,
and chi-square goodness of fit was modified.
d) The following command was added
PPCC GOODNESS OF FIT Y
This is currently supported for the raw data case (i.e.,
ungrouped data) without censoring. Currently, distributions
with more than one shape parameter are not supported.
e) The Anderson-Darling, Kolmogorov-Smirnov, and PPCC
goodness of fit commands were updated to generate appropriate
critical values dynamically via Monte Carlo simulation. A
few comments on this.
i) There are 2 distinct cases. In the first case, we
assume the distribution parameters are known. This
is referred to as the "fully specified" case. In this
case, the simulations are performed using the specified
parameters.
In the second case, the distribution parameters are
estimated from the data. For this case, the
distribution parameters are estimated for each
Monte Carlo sample using maximum likelihood (for the
PPCC case, the PPCC plot is used instead of maximum
likelihood).
The following command is used to specify which method
is used for the Monte Carlo simulations
SET GOODNESS OF FIT
ii) Although the second case (i.e., estimate the parameters
from the data) is the more realistic case, Dataplot
does not support this for all distributions. For the
K-S and Anderson-Darling cases, a maximum likelihood
estimation needs to be performed for each set of
simulated values. So if Dataplot supports maximum
likelihood estimation for the specified distribution,
then the "ESTIMATE" option is likely to be supported.
The PPCC option is limited to distributions where
there is at most one shape parameter.
iii) Published tables are available for a number of
distributions for the Anderson-Darling test and for
the fully specified case of the Kolmogorov-Smirnov
test. The advantage of using the published tables
is speed since the simulation step does not need to
be performed. Simulation allows the Anderson-Darling
and Kolmogorov-Smirnov critical values to be generated
for cases where published tables are not available and
also permits p-values to be returned. Note that the
critical values returned by Dataplot simulations may
differ slightly from the published values due to a
different random number generator being used.
To specify whether "tabled" critical values or
simulated critical values will be used, enter
SET KOLMOGOROV SMIRNOV CRITICAL VALUE
SET ANDERSON DARLING CRITICAL VALUE
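The Monte Carlo approach for the "fully specified" case in item e) can be sketched in Python for the Kolmogorov-Smirnov statistic with a standard normal distribution. This is a simplified illustration, not the Dataplot implementation; the function names, the number of simulations, and the seed are arbitrary assumptions. For the estimated-parameter case, each simulated sample would additionally be refit before its statistic is computed.

```python
import math
import random


def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF and the specified CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def simulated_critical_value(n, alpha=0.05, n_sim=2000, seed=12345):
    """Upper-alpha quantile of the KS statistic under the null hypothesis."""
    rng = random.Random(seed)
    stats = sorted(
        ks_statistic([rng.gauss(0.0, 1.0) for _ in range(n)], normal_cdf)
        for _ in range(n_sim)
    )
    index = min(n_sim - 1, math.ceil((1.0 - alpha) * n_sim) - 1)
    return stats[index]


print(simulated_critical_value(20))
```

Because the result depends on the random numbers drawn, repeated runs with different seeds give slightly different critical values, which matches the caveat above about small differences from published tables.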
f) The routines for maximum likelihood estimation of
distribution parameters are being reviewed and updated.
Some of the change is cosmetic (i.e., more consistent
appearance), but in some cases the fitting algorithms are
being improved.
We are also reviewing the computational algorithms for some
of the probability distributions.
g) Made the following enhancements to the CONSENSUS MEANS command.
i) Added Horn-Horn-Duncan and MINMAX estimates for the standard
error to the DerSimonian Laird estimate.
ii) Many of the methods require that the standard deviations
be positive for all labs. Zero standard deviations can
result when a lab has a single observation or when all
observations are equal. Previously, Dataplot omitted
all labs with zero standard deviations from the analysis
(they were included in the initial summary table).
However, some of the methods (specifically, the grand mean,
mean of means, BOB, Bayesian Consensus Mean, and generalized
confidence interval methods) can handle these cases.
iii) The following consensus mean statistics can be computed
LET A = SUMMARY DERSIMONIAN LAIRD MEAN SD N
LET A = SUMMARY DERSIMONIAN LAIRD STANDARD ERROR MEAN SD N
LET A = SUMMARY DERSIMONIAN LAIRD HHD MEAN SD N
LET A = SUMMARY DERSIMONIAN LAIRD MINMAX MEAN SD N
LET A = SUMMARY MANDEL PAULE MEAN SD N
LET A = SUMMARY MANDEL PAULE STANDARD ERROR MEAN SD N
LET A = SUMMARY MODIFIED MANDEL PAULE MEAN SD N
LET A = SUMMARY MODIFIED MANDEL PAULE STANDARD ERROR MEAN SD N
LET A = SUMMARY VANGEL RUKHIN MEAN SD N
LET A = SUMMARY VANGEL RUKHIN STANDARD ERROR MEAN SD N
LET A = SUMMARY GENERALIZED CONFIDENCE INTERVAL MEAN SD N
LET A = SUMMARY GENERALIZED CONFIDENCE INTERVAL STANDARD ERROR
MEAN SD N
LET A = SUMMARY BOB MEAN SD N
LET A = SUMMARY BOB STANDARD ERROR MEAN SD N
LET A = SUMMARY BCP MEAN SD N
LET A = SUMMARY BCP STANDARD ERROR MEAN SD N
LET A = SUMMARY MEAN OF MEANS MEAN SD N
LET A = SUMMARY MEAN OF MEANS STANDARD ERROR MEAN SD N
LET A = SUMMARY FAIRWEATHER MEAN SD N
LET A = SUMMARY FAIRWEATHER STANDARD ERROR MEAN SD N
LET A = SUMMARY SCHILLER-EBERHARDT MEAN SD N
LET A = SUMMARY SCHILLER-EBERHARDT STANDARD ERROR MEAN SD N
LET A = SUMMARY GRAYBILL DEAL MEAN SD N
LET A = SUMMARY GRAYBILL DEAL SINHA STANDARD ERROR MEAN SD N
LET A = SUMMARY GRAYBILL DEAL NAIVE STANDARD ERROR MEAN SD N
LET A = SUMMARY GRAYBILL DEAL ZHANG ONE STANDARD ERROR MEAN SD N
LET A = SUMMARY GRAYBILL DEAL ZHANG TWO STANDARD ERROR MEAN SD N
LET A = DERSIMONIAN LAIRD Y X
LET A = DERSIMONIAN LAIRD STANDARD ERROR Y X
LET A = DERSIMONIAN LAIRD HHD Y X
LET A = DERSIMONIAN LAIRD MINMAX Y X
LET A = MANDEL PAULE Y X
LET A = MANDEL PAULE STANDARD ERROR Y X
LET A = MODIFIED MANDEL PAULE Y X
LET A = MODIFIED MANDEL PAULE STANDARD ERROR Y X
LET A = VANGEL RUKHIN Y X
LET A = VANGEL RUKHIN STANDARD ERROR Y X
LET A = GENERALIZED CONFIDENCE INTERVAL Y X
LET A = GENERALIZED CONFIDENCE INTERVAL STANDARD ERROR Y X
LET A = BOB Y X
LET A = BOB STANDARD ERROR Y X
LET A = BCP Y X
LET A = BCP STANDARD ERROR Y X
LET A = MEAN OF MEANS Y X
LET A = MEAN OF MEANS STANDARD ERROR Y X
LET A = FAIRWEATHER Y X
LET A = FAIRWEATHER STANDARD ERROR Y X
LET A = SCHILLER-EBERHARDT Y X
LET A = SCHILLER-EBERHARDT STANDARD ERROR Y X
LET A = GRAYBILL DEAL Y X
LET A = GRAYBILL DEAL SINHA STANDARD ERROR Y X
LET A = GRAYBILL DEAL NAIVE STANDARD ERROR Y X
LET A = GRAYBILL DEAL ZHANG ONE STANDARD ERROR Y X
LET A = GRAYBILL DEAL ZHANG TWO STANDARD ERROR Y X
The SUMMARY version of these commands uses the summary
statistics (means, standard deviations, sample size) while
the other cases expect a response variable and a lab-id variable.
These statistics can also be used with the various commands that
support more than one response variable. Enter HELP STATISTICS
for details.
5) The following statistics are now supported
LET A = LOCATION Y
LET A = SCALE Y
LET A = DIFFERENCE OF LOCATION Y
LET A = DIFFERENCE OF SCALE Y
LET A = TIETJEN MOORE TEST Y
LET A = TIETJEN MOORE MINIMUM TEST Y
LET A = TIETJEN MOORE MAXIMUM TEST Y
LET A = DIXON TEST Y
LET A = DIXON MINIMUM TEST Y
LET A = DIXON MAXIMUM TEST Y
LET A = EXTREME STUDENTIZED DEVIATE TEST Y
LET A = BINOMIAL RATIO NSUCC NTRIAL
LET A = ROOT MEAN SQUARE ERROR Y
LET A = DIFFERENCE OF ROOT MEAN SQUARE ERROR Y
6) The following LET sub-commands were added:
LET LOWLIM UPPLIM = BINOMIAL RATIO CONFIDENCE LIMITS
P1 N1 P2 N2 ALPHA
LET XCODE = CODE CROSS TABULATE X1 X2
LET XCODE = CODE CROSS TABULATE X1 X2 X3
LET XCODE = CODE CROSS TABULATE X1 X2 X3 X4
LET XCODE = CODE CROSS TABULATE X1 X2 X3 X4 X5
LET XCODE = CODE CROSS TABULATE X1 X2 X3 X4 X5 X6
LET A0 A0SD A1 A1SD = MATRIX FIT M X
LET Y = RANK INDEX X
LET Y = COMBINE X1 ... XK
LET M = VARIABLE TO MATRIX M NROW
LET S = STRING WORD IWORD
LET NWORD = NUMBER OF WORDS S
LET YOUT = MOVING Y1 ... YK
7) The SEQUENCE command was updated to allow variables for
the arguments on the right hand side of the equal sign
(previously these were restricted to parameters or constants).
For example,
LET REP = DATA 3 3 2 3 3
LET Y = SEQUENCE 1 REP 1 5
would generate
1 1 1 2 2 2 3 3 4 4 4 5 5 5
You can combine any mixture of constants, parameters, or
variables for these arguments. However, if more than one
is a variable, all of these variables must be of the same
length.
In addition, the following syntax is now supported
LET YVAL = DATA 1 2 3
LET REP = DATA 3 2 5
LET Y = SEQUENCE YVAL REP
would generate
1 1 1 2 2 3 3 3 3 3
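The second syntax above pairs each value with its own repeat
count. The following is a minimal Python sketch of that expansion,
not Dataplot's implementation; the function name
sequence_with_reps is hypothetical and is used only for
illustration.

```python
def sequence_with_reps(values, reps):
    """Repeat values[i] reps[i] times, preserving order.

    Mirrors the idea behind LET Y = SEQUENCE YVAL REP; this is an
    illustrative sketch, not Dataplot's own code.
    """
    if len(values) != len(reps):
        raise ValueError("values and reps must have the same length")
    out = []
    for value, count in zip(values, reps):
        out.extend([value] * count)
    return out


# YVAL = 1 2 3 and REP = 3 2 5, as in the example above
print(sequence_with_reps([1, 2, 3], [3, 2, 5]))
# prints [1, 1, 1, 2, 2, 3, 3, 3, 3, 3]
```

The length check corresponds to the requirement that variable
arguments be of the same length.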
8) The Windows version was upgraded to use version 11 of the
Intel compiler (previously version 9 was used). A number of
Fortran 77 constructs are no longer supported by this version
of the compiler. A large number of coding changes were made
to make the source code compatible with version 11. These
should be transparent (i.e., no change in how commands are
used, although a number of potential bugs were corrected) to
users.
9) The following bugs were fixed:
a) Fixed MOVE RELATIVE command.
b) The standard deviation for the location parameter from
linear fits (FIT Y X, QUADRATIC FIT Y X, etc.) was
corrected.
c) Corrected a bug when using the SET CONVERT CHARACTER command.
d) Corrected the MEDIAN CONFIDENCE LIMITS for the Maritz-Jarrett
method.
e) A number of other miscellaneous bug fixes were made.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
March 2008 - July 2009.
-----------------------------------------------------------------------
1) For Unix/Linux platforms and the gfortran compiler, added the
command
SET PROMPT ADVANCE
This controls whether the Dataplot prompt appears as
(for the OFF case)
>
enter new command here
or (for the ON case)
>enter new command here
Although the ON case is preferred when running the command
line version, it causes problems when running the GUI version
with Tcl/Tk only (i.e., not using Expect). For this reason,
OFF is the default. If you typically only run the command
line version, you may want to add a SET PROMPT ADVANCE ON
command to your dplogf.tex file (although you will need to
comment this line out if you want to run the GUI). Alternatively,
you can enter the command manually when you initiate Dataplot.
2) The following enhancements were made to the READ/WRITE commands.
a) When reading numeric fields, Dataplot will check for
the string
NaN
The NaN string is used to denote "not a number" on some
systems. If Dataplot encounters this string, it will
insert the missing value in its place. The missing value
can be set with the command (the default is 0)
SET READ MISSING VALUE <value>
The search for NaN is not case sensitive.
b) Many programs will have a specific alpha string to denote
a missing value. You can set a character string that
denotes a missing value with the command
SET DATA MISSING VALUE <value>
Currently, a maximum of 4 characters is allowed for the
missing value string.
When reading numeric fields, Dataplot will check for the
missing value string (if specified). If Dataplot encounters
this string, it will insert the missing value in its place.
The missing value can be set with the command (the default is 0)
SET READ MISSING VALUE <value>
The search for the missing value string is not case sensitive.
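The substitution described in a) and b) can be pictured with the
following Python sketch. It is an illustration under stated
assumptions, not Dataplot's reader: the function name
parse_numeric_fields and its arguments are hypothetical, with
missing_string playing the role of SET DATA MISSING VALUE and
missing_value the role of SET READ MISSING VALUE.

```python
def parse_numeric_fields(fields, missing_string=None, missing_value=0.0):
    """Convert text fields to floats, substituting missing_value for
    'NaN' or for a user-defined missing string (case-insensitive)."""
    markers = {"nan"}
    if missing_string is not None:
        markers.add(missing_string.lower())
    out = []
    for field in fields:
        token = field.strip()
        if token.lower() in markers:
            out.append(missing_value)   # flagged as missing
        else:
            out.append(float(token))    # ordinary numeric field
    return out


row = ["1.5", "NaN", "nan", "NA", "3"]
print(parse_numeric_fields(row, missing_string="na", missing_value=-999.0))
# prints [1.5, -999.0, -999.0, -999.0, 3.0]
```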
c) The maximum number of characters allowed for a single command
line was increased to 255. This applies to both reading
commands from the terminal and reading commands from a file.
In addition, you can now enter multiple continuation lines
(previously, only a single continuation line was allowed).
However, the combined command still has a maximum limit of
255 characters.
d) On Linux/Unix platforms, support has been added for the
GNU readline library. The readline library allows command
line editing and history recall. To use the readline
capability, enter the command
SET READ LINE ON
Note that this capability only applies to commands that
are entered from the terminal.
The editing/history capabilities supported by readline are
documented at the readline web site:
http://tiswww.case.edu/php/chet/readline/rltop.html
Dataplot requires version 6.x of the readline library.
This is the current production version, although many
systems may still be using version 5.2.
e) Added the command
READ MATRIX TO VARIABLES FILE.DAT Z ROWID COLID
In many cases
Whether you read a matrix as a matrix or as
f) Added the command
READ STACKED VARIABLES FILE.DAT Z GROUPID
g) Added the commands
READ IMAGE TO VARIABLES FILE.DAT RED GREEN BLUE ROWID COLID
READ IMAGE TO VARIABLES FILE.DAT GREY ROWID COLID
READ IMAGE FILE.DAT GREY
READ IMAGE RED FILE.DAT RED
READ IMAGE GREEN FILE.DAT GREEN
READ IMAGE BLUE FILE.DAT BLUE
These commands allow you to read image data into Dataplot.
The GD library is used to read images in the following
formats:
1) jpeg
2) png
3) gif
Note: This update is not currently available for the
Windows implementation.
h) The WRITE RTF command will print variables in an RTF
table (support was previously added for WRITE LATEX and
WRITE HTML). Note that HTML format is restricted to
15 variables or fewer and LATEX and RTF formats are
restricted to 7 variables or fewer.
i) Added the command
TABLE WIDTH <totwidth> <nright>
where <totwidth> and <nright> are variables that specify
the total width of the field and the number of digits to
the right of the decimal point. Row one applies to
variable one, row two applies to variable two, and so on.
This is an alternative to the SET WRITE DECIMALS and
SET WRITE FORMAT commands. The SET WRITE DECIMALS command
requires that all variables be printed with the same
format. Although the SET WRITE FORMAT command allows more
flexibility, it cannot be used for WRITE <RTF/LATEX/HTML>.
Up to 200 rows can be specified (if the number of variables
being printed is greater than 200, it is recommended that
you use the SET WRITE FORMAT command).
A few comments on what can be specified for <totwidth> and
<nright>. If NTOT and NRIGHT are the values for a given
row, then the following apply:
1) A value of -99 indicates that the default value
should be used (this is 15 for NTOT and 7 for NRIGHT).
2) If NRIGHT is a positive integer, then Fortran F format
will be used (e.g., "3.26").
3) If NRIGHT is 0, then an integer format will be used.
4) If NRIGHT is -2, then a G15.7 format will be used.
In this case, the Fortran compiler will decide between
an F-based format or an E-based format depending on
the particular number. If NRIGHT is between -3 and
-20, then a Fortran E-based format (Eyy.xx where the
absolute value of NRIGHT specifies the "xx") will be
used.
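The TABLE WIDTH rules listed in item i) amount to a small lookup
from (NTOT, NRIGHT) to a Fortran edit descriptor. The following
Python sketch shows one reading of those rules. The function name
fortran_descriptor is hypothetical, and using NTOT as the field
width "yy" in the E format is an assumption; values of NRIGHT not
covered by the rules (for example -1) are rejected here because
the text does not define them.

```python
def fortran_descriptor(ntot, nright):
    """Map one TABLE WIDTH row to a Fortran-style edit descriptor."""
    if ntot == -99:
        ntot = 15                       # documented default total width
    if nright == -99:
        nright = 7                      # documented default digits to the right
    if nright > 0:
        return f"F{ntot}.{nright}"      # fixed-point, e.g. "3.26"
    if nright == 0:
        return f"I{ntot}"               # integer format
    if nright == -2:
        return "G15.7"                  # compiler picks F- or E-style
    if -20 <= nright <= -3:
        return f"E{ntot}.{-nright}"     # exponential, |NRIGHT| digits
    raise ValueError(f"NRIGHT={nright} is not covered by the stated rules")


for row in [(10, 2), (-99, -99), (8, 0), (12, -2), (15, -5)]:
    print(row, fortran_descriptor(*row))
```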
3) Added the following graphics commands.
a) Added the command
DISCRETE CONTOUR PLOT Z ROW COL Z0
b) Added the commands
IMAGE PLOT GREY ROWID COLID
IMAGE PLOT RED GREEN BLUE ROWID COLID
This command allows you to render images. The ability to
support image plots is dependent on the capabilities of
the specific graphics device and is currently supported
on the following devices:
1) Quickwin - i.e., the command line version of
Dataplot under Microsoft Windows
2) X11 - currently, only grey scale is supported.
3) GD - the GD device is used to generate jpeg,
PNG, and gif format files
4) Postscript
c) Added the commands
FLUCTUATION <stat> PLOT Y X1 X2
FLUCTUATION <stat> PLOT Y X1 X2 X3
FLUCTUATION <stat> PLOT Y X1 X2 X3 X4
FLUCTUATION <stat> PLOT Y X1 X2 X3 X4 X5
FLUCTUATION <stat> PLOT Y X1 X2 X3 X4 X5 X6
to generate a fluctuation plot (this is a variant
of the mosaic plot). Enter HELP FLUCTUATION PLOT for
details.
d) Added the commands
STRIP PLOT Y
STRIP PLOT Y2 X2
BATCH STRIP PLOT Y TAG
BATCH STRIP PLOT Y2 X2 TAG
The strip plot is also known as a dot plot. There
are a number of variations of this plot. Enter
HELP STRIP PLOT for details.
e) The <stat> PLOT command was updated to support multiple
response variables. For example,
MEAN PLOT Y1 TO Y4 X
That is, for each distinct value of X, there are now
4 means plotted instead of just one.
The following commands can be used to control the
appearance of the plot:
SET STATISTIC PLOT FORMAT <DEX/OVERLAY>
SET STATISTIC PLOT SUMMARY <VARIABLE/GROUP>
If the FORMAT option is set to OVERLAY and the SUMMARY option
is set to VARIABLE, this is equivalent to the following:
YLIMITS ...
PRE-ERASE OFF
ERASE
MEAN PLOT Y1 X
MEAN PLOT Y2 X
MEAN PLOT Y3 X
MEAN PLOT Y4 X
PRE-ERASE ON
That is, there will be a curve corresponding to each
response variable and there will be a reference line
corresponding to each variable.
If the FORMAT option is set to DEX, then this plot uses a
format similar to the DEX <stat> PLOT command. That is, for
each distinct value of X, there will be a curve connecting the
mean values for the 4 response variables.
If the SUMMARY option is set to GROUP, there will be a single
reference curve. At each distinct value of X, a single overall
mean is computed for all 4 of the response variables.
In addition, the following option is added to this command:
<stat> <zscore/uscore> PLOT
If ZSCORE is given, then a z-score transformation (subtract the
mean and then divide by the standard deviation) is computed
on each response variable first. If USCORE is given, then a
u-score transformation (subtract the minimum and divide by the
range) is computed on each column. Note these z-score and
u-score transformations apply to the entire response variable, not
to each distinct group within the response variable.
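The two transformations described for the ZSCORE and USCORE
options can be sketched in Python as follows. This is an
illustration only. The function names are hypothetical, and the
use of the sample (n-1) standard deviation is an assumption, since
the text does not say which divisor is used.

```python
from statistics import mean, stdev


def zscore(y):
    """Subtract the mean, then divide by the standard deviation."""
    m = mean(y)
    s = stdev(y)            # assumption: sample (n-1) standard deviation
    return [(v - m) / s for v in y]


def uscore(y):
    """Subtract the minimum, then divide by the range."""
    lo, hi = min(y), max(y)
    return [(v - lo) / (hi - lo) for v in y]


y = [2.0, 4.0, 6.0]
print(zscore(y))   # centered at 0 with unit standard deviation
print(uscore(y))   # rescaled to the interval [0, 1]
```

Both functions operate on the whole response variable at once,
matching the note that the transformation is not applied group
by group.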
4) The following updates were made to the graphics output
devices.
a) For the SVG (Scalable Vector Graphics) device, graphics
elements are now assigned "layers". This may be useful
if you import the SVG graphic into a graphics editing
program (i.e., it may allow individual elements of the
plot to be edited).
b) The GD driver was enhanced to support hardware text (the
previous implementation drew all characters using one
of Dataplot's software fonts).
There are two types of hardware characters supported:
1) The GD library supports 5 built-in fonts: small,
large, mediumbold, tiny, and huge.
These are fixed size fonts.
2) In addition, the GD library supports True Type Fonts
(TTF). This is the font type supported on the
Microsoft Windows operating system. These fonts
are scalable. Although these fonts were originally
developed for Microsoft Windows, they can be used
on Linux/Unix systems as well.
Note that neither Dataplot nor the GD web site
provides any of these fonts. However, there are
a number of web sites that provide these types of
fonts (some are freely downloadable while others
are proprietary).
c) The Postscript driver was updated in the following ways.
1) For presentation and publication graphs, it is desirable
to use the Postscript typeset quality fonts. However,
the use of special characters (with the limited exception
of the SP(), CR(), UC(), and LC() options) has required
the use of one of the software Hershey fonts (e.g.,
SIMPLEX or DUPLEX). The Postscript device was upgraded
to handle most of Dataplot's supported special characters.
Specifically, the following are supported:
i) subscripts and superscripts
ii) Greek characters
iii) A subset of the mathematical symbols and other
special characters. This is based on what is
available in the Postscript symbol font. Note
that there is not a 1-to-1 correspondence between
the characters available in the Postscript symbol
font and the special characters supported by
Dataplot. The following is the list of Dataplot
special characters that will be translated to
equivalent characters in the Postscript symbol font:
INTE(), SUMM(), PROD(), INFI(), DOTP(),
DEL(), DIVI(), LT(), GT(), LTEQ(), GTEQ(),
NOT(), +-(), APPR(), TILD(), EQUI(), VARI(),
CARA(), TIME(), PART(), RADI(), SUBS(),
SUPE(), UNIO(), INTR(), ELEM(), THEX(),
THFO(), RAPO(), LBRA(), RBRA(), LCBR(),
RCBR(), LELB(), RELB(), RARR(), UARR(),
DARR(), VBAR(), HBAR(), DEGR()
The full set of special symbols supported by Dataplot
is documented in chapter 13 of Volume I of the
Reference Manual
http://www.itl.nist.gov/div898/software/dataplot/refman1/
ch13/homepage.pdf
d) Added support for the Unix/Linux libplot library. The libplot
library is part of the "plotutils" package which includes the
plot, tek2plot, pic2plot, plotfont, spline, and ode programs.
This driver may not be available on some platforms.
For further information on using this device driver, enter
HELP LIBPLOT
5) The following updates were made to the analysis commands.
a) The following updates were made to the CROSS TABULATE command:
1) The number of cross-tabulation variables was increased to
six (from two). That is, you can cross-tabulate on a
minimum of 2 variables and a maximum of six variables.
2) The output can be generated in RTF format (support was
previously added for Latex and HTML format). RTF can
be used to import the output into Microsoft Word.
This enhancement was also made to the TABULATE command.
3) You can use the SET WRITE DECIMALS command to specify
the number of digits to the right of the decimal point
in the output.
This enhancement was also made to the TABULATE command.
4) Since there is now a separate CHI-SQUARE INDEPENDENCE TEST
command, the chi-square test option in the CROSS TABULATE
command has been removed.
5) For the LET <resp> = CROSS TABULATE <stat> ...
command, added the following option:
SET LET CROSS TABULATE <EXPAND/COLLAPSE>
If EXPAND is specified, the number of rows in the
output variable is equal to the number of rows in
the input variables. If COLLAPSE is specified, the
number of rows in the output variable is equal to the
number of distinct cross tabulation cells.
If the COLLAPSE option is used, the following commands
may be helpful
LET X1D = CROSS TABULATE GROUP ONE <var-list>
LET X2D = CROSS TABULATE GROUP TWO <var-list>
LET X3D = CROSS TABULATE GROUP THREE <var-list>
LET X4D = CROSS TABULATE GROUP FOUR <var-list>
For example, if you want to cross-tabulate the means
for three classification variables, you could do
something like
LET YMEAN = CROSS TABULATE MEAN Y X1 X2 X3
LET X1D = CROSS TABULATE GROUP ONE X1 X2 X3
LET X2D = CROSS TABULATE GROUP TWO X1 X2 X3
LET X3D = CROSS TABULATE GROUP THREE X1 X2 X3
PRINT YMEAN X1D X2D X3D
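The difference between EXPAND and COLLAPSE can be illustrated with
a short Python sketch of a cross-tabulated mean. It is not
Dataplot's implementation. The function name cross_tab_mean is
hypothetical, and returning the collapsed cells in sorted order is
an assumption about ordering that the text does not specify.

```python
from collections import defaultdict


def cross_tab_mean(y, *groups, collapse=False):
    """Mean of y within each cell defined by the group variables.

    collapse=False: one output row per input row (EXPAND).
    collapse=True:  one output row per distinct cell (COLLAPSE).
    Returns (means, cell_keys).
    """
    keys = list(zip(*groups))
    cells = defaultdict(list)
    for value, key in zip(y, keys):
        cells[key].append(value)
    means = {key: sum(vals) / len(vals) for key, vals in cells.items()}
    if collapse:
        ordered = sorted(means)          # assumption: sorted cell order
        return [means[k] for k in ordered], ordered
    return [means[k] for k in keys], keys


y = [1.0, 3.0, 5.0, 7.0]
x1 = [1, 1, 2, 2]
x2 = [1, 1, 1, 2]
print(cross_tab_mean(y, x1, x2))                 # 4 rows, like EXPAND
print(cross_tab_mean(y, x1, x2, collapse=True))  # 3 rows, like COLLAPSE
```

In the collapsed form, the returned cell keys play the role of the
X1D, X2D, ... variables produced by the CROSS TABULATE GROUP
commands.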
b) Added the command
BINOMIAL PROPORTION TEST P1 N1 P2 N2
BINOMIAL PROPORTION TEST Y1 Y2
c) Added the following commands to perform one-sample and
two-sample proficiency analyses based on the ASTM
E2489 - 06 standard:
ONE SAMPLE PROFICIENCY TEST Y LABID
TWO SAMPLE PROFICIENCY TEST Y LABID
6) The following updates were made to the probability
distributions.
a) Added support for the following distributions
1) 3-Parameter Logistic-Exponential
LE3CDF(X,BETA) - cdf function
LE3CHAZ(X,BETA) - cumulative hazard function
LE3HAZ(X,BETA) - hazard function
LE3PDF(X,BETA) - pdf function
LE3PPF(P,BETA) - ppf function
2) Truncated Pareto
TNPCDF(X,GAMMA,A,NU) - cdf function
TNPPDF(X,GAMMA,A,NU) - pdf function
TNPPPF(P,GAMMA,A,NU) - ppf function
3) Brittle Fracture
BFRCDF(X,ALPHA,BETA,R) - cdf function
BFRPDF(X,ALPHA,BETA,R) - pdf function
BFRPPF(P,ALPHA,BETA,R) - ppf function
4) Pearson Type III
PE3CDF(X,GAMMA) - cdf function
PE3PDF(X,GAMMA) - pdf function
PE3PPF(P,GAMMA) - ppf function
5) Mielke's Beta-Kappa distribution was
renamed to
MIECDF(X,K,THETA) - cdf function
MIEPDF(X,K,THETA) - pdf function
MIEPPF(P,K,THETA) - ppf function
Note also that the BETA parameter was in fact a
scale parameter and is now explicitly treated
as such.
6) Kappa
KAPCDF(X,K,H) - cdf function
KAPPDF(X,K,H) - pdf function
KAPPPF(P,K,H) - ppf function
Note that the Mielke's Beta-Kappa is a
re-parameterized special case of the Kappa
distribution.
b) Added support for maximum likelihood estimation for
the following distributions:
reflected power
Weibull (maximum case)
Frechet (minimum case)
generalized Pareto (minimum case)
generalized extreme value (minimum case)
Kappa
Pearson type III
Note that the Kappa and Pearson type III actually
implement L-moment estimates rather than maximum
likelihood estimates.
7) The following LET commands were added.
a) Two-dimensional convex hulls can be computed
with the command
LET Y2 X2 = 2D CONVEX HULL Y X
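For readers unfamiliar with the operation, the following Python
sketch computes a two-dimensional convex hull with Andrew's
monotone chain method. This is a standard textbook algorithm chosen
for illustration; the text does not say which algorithm Dataplot
uses, and the function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order
    (Andrew's monotone chain; collinear points are dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:                         # build the lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):               # build the upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]        # drop duplicated end points


# (x, y) pairs: four corners of a square plus one interior point
pts = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(convex_hull(pts))
# prints [(0, 0), (2, 0), (2, 2), (0, 2)]
```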
b) Two-dimensional minimum spanning trees can be
computed with the commands
LET Y2 X2 TAG = MINIMUM SPANNING TREE Y X
LET Y2 X2 = MINIMUM SPANNING TREE D
The first syntax is used when the input is a set of
vertices. The second syntax is used when the input
is a distance matrix.
c) Two-dimensional spanning forests can be
computed with the commands
LET Y2 X2 TAG = SPANNING FOREST EDGE1 EDGE2 Y X
LET EDGE1 EDGE2 TAG NV = SPANNING FOREST EDGE1 EDGE2 NVERT
The first syntax is most useful when you want to plot
the spanning forest.
d) The command
LET Y2 X2 TAG = EDGES TO VERTICES EDGE1 EDGE2 Y X
can be used to convert a list of edges in a graph
to a list of vertices. This is a convenience command
to make plotting the graph easier.
e) The commands
LET X2 Y2 = SORT2 X Y
LET Z1 Z2 Y2 = SORT3 X1 X2 Y
LET Z1 Z2 Z3 Y2 = SORT4 X1 X2 X3 Y
can be used to sort based on multiple fields.
f) The commands
LET Y = GATHER X INDEX
LET Y = SCATTER X INDEX
can be used to extract (or insert) data from one
array into another based on a variable that contains
row id's.
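In general array programming, "gather" pulls elements out of an
array at given positions and "scatter" places elements into an
array at given positions. The Python sketch below shows that
conventional meaning with 1-based row ids. Whether Dataplot's
GATHER and SCATTER follow exactly these conventions (for example
the fill value for unassigned rows) is an assumption; consult the
Dataplot help for the definitive behavior.

```python
def gather(x, index):
    """y[i] = x[index[i]], with 1-based row ids in index."""
    return [x[i - 1] for i in index]


def scatter(x, index, size=None, fill=0.0):
    """y[index[i]] = x[i], with 1-based row ids in index.

    Rows that receive no value keep the fill value (an assumption).
    """
    n = size if size is not None else max(index)
    out = [fill] * n
    for value, i in zip(x, index):
        out[i - 1] = value
    return out


x = [10.0, 20.0, 30.0, 40.0]
picked = gather(x, [4, 1])
print(picked)                     # prints [40.0, 10.0]
print(scatter(picked, [4, 1]))    # prints [10.0, 0.0, 0.0, 40.0]
```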
g) Added the following matrix commands:
The following is used to compute the permanent of a
matrix.
LET MOUT = MATRIX PERMANENT M
The following is used to generate an adjacency matrix
from a list of edges.
LET ADJ = EDGES TO ADJACENCY MATRIX EDGE1 EDGE2 NVERT
The following is used to permute the rows and columns
of a matrix.
LET MOUT = MATRIX RENUMBER M VROW VCOL
The following is used to compute the pseudo inverse
of a matrix.
LET MINV = PSEUDO INVERSE M
The following is used to compute the coordinates for a
biplot.
LET Y X TAG = BIPLOT M
h) Added the following commands:
LET N = <value>
LET Y = RANDOM SUBSET FOR I = 1 1 K
LET N = <value>
LET Y = RANDOM K-SET OF N-SET FOR I = 1 1 K
LET N = <value>
LET Y = RANDOM COMPOSITION FOR I = 1 1 K
LET N = <value>
LET K = <value>
LET Y = RANDOM PARTITION
LET N = <value>
LET Y = RANDOM EQUIVALENCE RELATION
LET N = <value>
LET LAMBDA = DATA <values>
LET Y = RANDOM YOUNG TABLEAUX LAMBDA
LET Y = NEXT SUBSET 0
LET Y = NEXT SUBSET N YPREV
LET Y = NEXT K-SET OF N-SET 0
LET Y = NEXT K-SET OF N-SET N K YPREV
LET Y = NEXT PERMUTATION 0
LET Y = NEXT PERMUTATION N YPREV
LET Y = NEXT COMPOSITION 0
LET Y = NEXT COMPOSITION N K YPREV
LET Y1 Y2 = NEXT EQUIVALENCE RELATION 0
LET Y1 Y2 = NEXT EQUIVALENCE RELATION N YPREV YREPPREV
LET Y = NEXT YOUNG TABLEAUX N LAMBDA
LET Y = NEXT YOUNG TABLEAUX N LAMBDA Y
LET VAL ROWID = CONVERT YOUNG TABLEAUX Y
LET HOOK = YOUNG TABLEAUX HOOK LENGTH VAL ROWID
LET Y2 X2 = PEAKS OF FREQUENCY TABLE Y
LET AL AU = DIFFERENCE OF PROPORTION CONFIDENCE
LIMITS P1 N1 P2 N2 ALPHA
LET PVAL = DIFFERENCE OF PROPORTION HYPOTHESIS
TEST P1 N1 P2 N2 ALPHA
LET PVAL = DIFFERENCE OF PROPORTION LOWER TAIL
HYPOTHESIS TEST P1 N1 P2 N2 ALPHA
LET PVAL = DIFFERENCE OF PROPORTION UPPER TAIL
HYPOTHESIS TEST P1 N1 P2 N2 ALPHA
i) Added the following commands for working with strings:
LET SOUT = STRING MERGE SORG SADD NSTART
LET SOUT = STRING REPLACE SORG SADD NSTART
LET SOUT = STRING EDIT SORG SOLD SNEW
LET SOUT = STRING CONCATENATE S1 S2 ... SK
LET SOUT = SUBSTRING SORG NSTART NSTOP
LET SOUT = UPPER CASE SORG
LET SOUT = LOWER CASE SORG
LET IVAL = ICHAR SORG
LET NLEN = STRING LENGTH SORG
LET NSTART NSTOP = STRING INDEX SORG SMATCH
j) Added the following command for merging two sets of data
LET ... = MERGE ...
The merge is performed by matching columns in two sets of
data and then carrying other variables of interest. For
details, enter HELP MERGE.
k) Added the following commands for shifting the elements of
a vector up (right) or down (left)
LET Y2 = SHIFT Y NSHIFT
LET Y2 = CIRCULAR SHIFT Y NSHIFT
l) It is sometimes convenient to extract the index of the
minimum, maximum, or extreme value (i.e., the largest
absolute value) of a vector. This can be done with the
following commands
LET A = INDEX MINIMUM Y
LET A = INDEX MAXIMUM Y
LET A = INDEX EXTREME Y
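The three index statistics can be sketched in Python as follows.
The function names are hypothetical. Returning 1-based row numbers
and breaking ties in favor of the first occurrence are assumptions
not stated in the text.

```python
def index_minimum(y):
    """1-based position of the smallest value (first one on ties)."""
    return min(range(len(y)), key=lambda i: y[i]) + 1


def index_maximum(y):
    """1-based position of the largest value (first one on ties)."""
    return max(range(len(y)), key=lambda i: y[i]) + 1


def index_extreme(y):
    """1-based position of the value with the largest absolute value."""
    return max(range(len(y)), key=lambda i: abs(y[i])) + 1


y = [3.0, -7.0, 5.0, 1.0]
print(index_minimum(y), index_maximum(y), index_extreme(y))
# prints 2 3 2
```

In this example the extreme value is -7, so INDEX EXTREME and
INDEX MINIMUM agree even though the maximum is at a different row.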
8) The following miscellaneous updates were made.
a) Added SAVE/RESTORE options to the FEEDBACK command.
This is primarily useful for general purpose macros
where you want to use the FEEDBACK OFF command and
you want to restore the setting that was in place
when the macro was called.
9) Added the following command
DIRECTION <HORIZONTAL/VERTICAL>
For the TEXT command when a hardware font is being used,
this specifies whether the text is drawn horizontally
(the default) or vertically.
10) Fixed a number of bugs.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
March 2007 - July 2007.
-----------------------------------------------------------------------
1) We have made the following updates for categorical data
analysis.
There are two basic types of data that the following
commands address.
a) We have two variables, each with n observations, where
the first can have one of r mutually exclusive values
and the second can have one of c mutually exclusive values.
So each observation will fit into exactly one of the
r levels of variable one and exactly one of the c levels
of variable two.
Your data can be either in raw form (two columns of data
each with n rows) or summary form (an rxc table which
will typically be read into Dataplot as a matrix).
Each entry in the summary table is a count of how many
times that particular combination occurred.
b) If each variable can have exactly two outcomes (typically
coded as 1/0), then we have the 2x2 special case. There
are a number of specialized methods for dealing with
this type of data.
For this type of data, the number of observations for
the two variables need not be equal.
Some examples of this type of data are:
i) We have a diagnostic test to detect a disease.
Variable one specifies whether the patient in
fact has the disease (coded as 1) or not (coded
as 0). Variable two specifies whether the test
detected the disease (coded as 1) or not (coded
as 0).
ii) We are testing instruments to determine whether or
not they can detect a particular substance. Variable
one is the ground truth (coded as 1 when the substance
is present and coded as 0 when it is not). Variable
two denotes whether the instrument detected the
substance (1 for detected, 0 for not detected).
The following capabilities have been added to Dataplot
for analyzing these types of data.
a) The following statistical tests were added:
ODDS RATIO INDEPENDENCE TEST N11 N21 N12 N22
ODDS RATIO INDEPENDENCE TEST Y1 Y2
ODDS RATIO INDEPENDENCE TEST M
CHI-SQUARE INDEPENDENCE TEST N11 N21 N12 N22
CHI-SQUARE INDEPENDENCE TEST Y1 Y2
CHI-SQUARE INDEPENDENCE TEST M
FISHER EXACT TEST N11 N21 N12 N22
FISHER EXACT TEST Y1 Y2
FISHER EXACT TEST M
MCNEMAR TEST N11 N21 N12 N22
MCNEMAR TEST Y1 Y2
MCNEMAR TEST M
ODDS RATIO CHI-SQUARE TEST Y1 Y2
ODDS RATIO CHI-SQUARE TEST Y1 Y2 X
ODDS RATIO CHI-SQUARE TEST Y1 X1 Y2 X2
MANTEL-HAENSZEL TEST Y1 Y2
MANTEL-HAENSZEL TEST Y1 Y2 X
MANTEL-HAENSZEL TEST Y1 X1 Y2 X2
b) Added the following statistics:
LET A = ODDS RATIO X1 X2
LET A = ODDS RATIO STANDARD ERROR X1 X2
LET A = LOG ODDS RATIO X1 X2
LET A = LOG ODDS RATIO STANDARD ERROR X1 X2
LET A = RELATIVE RISK X1 X2
LET A = CRAMER CONTINGENCY COEFFICIENT X1 X2
LET A = MATRIX GRAND CRAMER CONTINGENCY COEFFICIENT M
LET A = PEARSON CONTINGENCY COEFFICIENT X1 X2
LET A = MATRIX GRAND PEARSON CONTINGENCY COEFFICIENT M
LET A = FALSE POSITIVE Y1 Y2
LET A = FALSE NEGATIVE Y1 Y2
LET A = TRUE POSITIVE Y1 Y2
LET A = TRUE NEGATIVE Y1 Y2
LET A = TEST SENSITIVITY Y1 Y2
LET A = TEST SPECIFICITY Y1 Y2
LET A = POSITIVE PREDICTIVE VALUE Y1 Y2
LET A = NEGATIVE PREDICTIVE VALUE Y1 Y2
These statistics are supported by the following commands:
PLOT
TABULATE
CROSS TABULATE
CROSS TABULATE PLOT
BOOTSTRAP PLOT
JACKNIFE PLOT
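Several of the statistics listed in b) derive from the four cells
of a 2x2 table. The following Python sketch uses the standard
textbook definitions for paired 1/0 data, with the first variable
as the ground truth and the second as the test result, as in the
examples above. The function names are hypothetical, and whether
Dataplot reports counts or proportions for statistics such as
FALSE POSITIVE is not stated here, so the sketch returns the raw
counts separately.

```python
def two_by_two_counts(truth, test):
    """Return (tp, fp, fn, tn) for paired 1/0 observations."""
    pairs = list(zip(truth, test))
    tp = sum(1 for t, d in pairs if t == 1 and d == 1)
    fp = sum(1 for t, d in pairs if t == 0 and d == 1)
    fn = sum(1 for t, d in pairs if t == 1 and d == 0)
    tn = sum(1 for t, d in pairs if t == 0 and d == 0)
    return tp, fp, fn, tn


def diagnostic_rates(truth, test):
    """Standard definitions of the four diagnostic rates."""
    tp, fp, fn, tn = two_by_two_counts(truth, test)
    return {
        "sensitivity": tp / (tp + fn),   # detected among true positives
        "specificity": tn / (tn + fp),   # cleared among true negatives
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
test = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(two_by_two_counts(truth, test))   # prints (3, 1, 1, 5)
print(diagnostic_rates(truth, test))
```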
c) Added the following graphics:
ROC CURVE Y1 Y2 X - generate a ROC curve
ROSE PLOT Y - generate a rose plot (also known as a four-fold plot)
ROSE PLOT Y1 Y2
BINARY TABULATION PLOT Y1 Y2 X1 X2
BINARY PLOT Y1 Y2 X1
where <stat> is one of:
CORRECT MATCH
FALSE POSITIVE
FALSE NEGATIVE
TRUE POSITIVE
TRUE NEGATIVE
These "binary" plots are used to generate summary
plots of "1/0" type data across groups.
ASSOCIATION PLOT M - generate an association plot
ASSOCIATION PLOT Y1 Y2
ASSOCIATION PLOT N11 N21 N12 N22
SIEVE PLOT M - generate a sieve plot
SIEVE PLOT Y1 Y2
SIEVE PLOT N11 N21 N12 N22
2) We have made the following updates for probability
distributions.
a) Maximum likelihood estimates were added for the
following distributions:
Katz (generates moment estimates)
slash
triangular
four parameter beta (generates moment estimates)
log beta
beta normal
The maximum likelihood for the two-sided power distribution
was generalized to include the lower and upper limit
parameters.
The slash and triangular distributions have also been
added to the BOOTSTRAP/JACKNIFE MLE PLOT command:
BOOTSTRAP TRIANGULAR MLE PLOT Y
JACKNIFE TRIANGULAR MLE PLOT Y
BOOTSTRAP SLASH MLE PLOT Y
JACKNIFE SLASH MLE PLOT Y
The maximum likelihood estimation for the
two-sided power distribution was updated from
the standard case (lower and upper limits = 0 and 1)
to the general case (lower and upper limits will be
estimated from the data). Also, the ML procedure for
this distribution only applies if the N shape parameter
is > 1.
b) Added the following commands for binomial confidence
intervals:
LET A = EXACT BINOMIAL LOWER BOUND P N ALPHA
LET A = EXACT BINOMIAL UPPER BOUND P N ALPHA
LET ALOW AUPP = AGRESTI COULL LIMITS P N ALPHA
The BINOMIAL MAXIMUM LIKELIHOOD command can generate
these values for raw data. The above LET commands are
useful when you only have summary data (i.e., the p and n).
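As an illustration of working from summary data, the following
Python sketch computes Agresti-Coull limits from p, n, and alpha
using the usual adjusted-Wald formula. It is not Dataplot's code.
The function name is hypothetical, and clipping the limits to the
interval [0, 1] is a choice made here rather than something the
text specifies.

```python
from math import sqrt
from statistics import NormalDist


def agresti_coull_limits(p, n, alpha):
    """Two-sided 100*(1 - alpha)% Agresti-Coull limits.

    p is the observed proportion of successes in n trials.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # normal critical value
    n_adj = n + z * z                             # adjusted sample size
    p_adj = (p * n + z * z / 2.0) / n_adj         # adjusted proportion
    half = z * sqrt(p_adj * (1.0 - p_adj) / n_adj)
    return max(0.0, p_adj - half), min(1.0, p_adj + half)


low, upp = agresti_coull_limits(0.5, 100, 0.05)
print(round(low, 3), round(upp, 3))
# prints 0.404 0.596
```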
c) Added the following plots:
POISSON PLOT Y X
GEOMETRIC PLOT Y X
BINOMIAL PLOT Y X
NEGATIVE BINOMIAL PLOT Y X
LOGARITHMIC SERIES PLOT Y X
These plots are alternatives to the PROBABILITY PLOT
command.
ORD PLOT Y
This plot can help distinguish whether a Poisson,
a negative binomial, or a logarithmic series
distribution provides a more appropriate distributional
model for a set of discrete data.
3) Made the following updates to graphics commands.
a) The HISTOGRAM command now accepts a matrix argument.
b) Added the command
BIVARIATE NORMAL TOLERANCE REGION PLOT Y1 Y2 X
4) Added the following statistics:
LET P1 =
LET P2 =
LET A = TRIMMED STANDARD DEVIATION Y
5) Added the following command
SET FATAL ERROR <IGNORE/TERMINATE/PROMPT>
If an analysis or graphics command returns an error code,
this command tells Dataplot how to respond:
IGNORE - Dataplot will simply continue processing the
next command. This was the behavior before
this command was added and is the default.
TERMINATE - Dataplot will print a message and terminate
immediately.
PROMPT - Dataplot will prompt whether you want to
continue or terminate.
This command was added primarily as a debugging option.
If you are trying to debug a complex macro, it can be helpful
to have Dataplot terminate (or prompt for termination)
in order to locate where the initial error is occurring.
Note that this command is not active if you are running
the Graphical User Interface (GUI) version.
6) A Windows Vista installation is now available.
7) Fixed a number of miscellaneous bugs.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
May 2006 - February 2007.
-----------------------------------------------------------------------
1) The following updates were made for maximum likelihood estimates
for distributions:
a) The negative binomial was updated to distinguish between
two cases: 1) the case where k is assumed known (p is
estimated) and 2) the case where k is assumed unknown.
For case 1), confidence limits for p were added.
b) Maximum likelihood estimates were added for the
following discrete distributions:
zeta
Borel-Tanner
Lagrange-Poisson
lost games
beta-geometric
Polya-Aeppli
generalized logarithmic series
geeta
Consul
quasi binomial type I
generalized lost games
generalized negative binomial
topp and leone
c) The binomial mle was updated in the following ways:
1) For exact intervals, fixed a bug for extreme values
of p and small samples.
2) By default, Dataplot switches from the exact method
to the normal approximation for sample sizes greater
than 30 (Agresti-Coull intervals are always generated).
You can specify the threshold with the command
SET BINOMIAL NORMAL APPROXIMATION THRESHOLD <value>
3) Some analysts prefer to use a continuity correction
(p + 0.5)/(n + 1)
You can specify whether to use the continuity
correction by entering the command
SET BINOMIAL CONTINUITY CORRECTION <ON/OFF>
The default is OFF.
2) The following distributional updates were made.
a) The YULCDF was updated to use an explicit formula (as
opposed to direct summation).
b) For the KS PLOT, the location and scale parameters are
estimated via the probability plot. For long-tailed
distributions, more accurate estimates may be obtained
by applying a biweight fit of the probability plot.
To specify this option, enter the command
SET PPCC PLOT LOCATION SCALE BIWEIGHT
To restore the use of the regular least squares
estimates of location and scale, enter
SET PPCC PLOT LOCATION SCALE DEFAULT
c) Added the following new continuous distributions.
1) Asymmetric Log-Laplace
ALDCDF(X,ALPHA,BETA) - cdf function
ALDPDF(X,ALPHA,BETA) - pdf function
ALDPPF(P,ALPHA,BETA) - ppf function
2) Log-Beta
LBECDF(X,ALPHA,BETA,C,D) - cdf function
LBEPDF(X,ALPHA,BETA,C,D) - pdf function
LBEPPF(P,ALPHA,BETA,C,D) - ppf function
3) Topp and Leone
TOPCDF(X,BETA) - cdf function
TOPPDF(X,BETA) - pdf function
TOPPPF(P,BETA) - ppf function
4) Generalized Topp and Leone
GTLCDF(X,ALPHA,BETA) - cdf function
GTLPDF(X,ALPHA,BETA) - pdf function
GTLPPF(P,ALPHA,BETA) - ppf function
5) Reflected Generalized Topp and Leone
RGTCDF(X,ALPHA,BETA) - cdf function
RGTPDF(X,ALPHA,BETA) - pdf function
RGTPPF(P,ALPHA,BETA) - ppf function
6) Wakeby:
WAKCDF(X,BETA,GAMMA,DELTA) - cdf function
WAKPPF(P,BETA,GAMMA,DELTA) - ppf function
d) Added the following new discrete distributions.
1) Beta-Geometric (Waring)
BGECDF(X,ALPHA,BETA) - cdf function
BGEPDF(X,ALPHA,BETA) - pdf function
BGEPPF(X,ALPHA,BETA) - ppf function
2) Beta-Negative Binomial (generalized Waring)
BNBCDF(X,ALPHA,BETA,k) - cdf function
BNBPDF(X,ALPHA,BETA,k) - pdf function
BNBPPF(X,ALPHA,BETA,k) - ppf function
3) Zeta
ZETCDF(X,ALPHA) - cdf function
ZETPDF(X,ALPHA) - pdf function
ZETPPF(X,ALPHA) - ppf function
4) Zipf
ZIPCDF(X,ALPHA,N) - cdf function
ZIPPDF(X,ALPHA,N) - pdf function
ZIPPPF(X,ALPHA,N) - ppf function
5) Borel-Tanner
BTACDF(X,LAMBDA,N) - cdf function
BTAPDF(X,LAMBDA,N) - pdf function
BTAPPF(X,LAMBDA,N) - ppf function
6) Lagrange-Poisson
LPOCDF(X,LAMBDA,THETA) - cdf function
LPOPDF(X,LAMBDA,THETA) - pdf function
LPOPPF(X,LAMBDA,THETA) - ppf function
7) Leads in Coin Tossing (Discrete Arcsine)
LCTCDF(X,N) - cdf function
LCTPDF(X,N) - pdf function
LCTPPF(X,N) - ppf function
8) Classical Matching
MATCDF(X,K) - cdf function
MATPDF(X,K) - pdf function
MATPPF(X,K) - ppf function
9) Polya-Aeppli
PAPCDF(X,THETA,P) - cdf function
PAPPDF(X,THETA,P) - pdf function
PAPPPF(X,THETA,P) - ppf function
10) Generalized Logarithmic Series
GLSCDF(X,THETA,BETA) - cdf function
GLSPDF(X,THETA,BETA) - pdf function
GLSPPF(X,THETA,BETA) - ppf function
11) Geeta
GETCDF(X,THETA,BETA) - cdf function
GETPDF(X,THETA,BETA) - pdf function
GETPPF(X,THETA,BETA) - ppf function
This distribution can also be parameterized with
MU and BETA.
12) Quasi Binomial Type 1
QBICDF(X,P,PHI) - cdf function
QBIPDF(X,P,PHI) - pdf function
QBIPPF(X,P,PHI) - ppf function
13) Generalized Negative Binomial
GNBCDF(X,THETA,BETA,M) - cdf function
GNBPDF(X,THETA,BETA,M) - pdf function
GNBPPF(X,THETA,BETA,M) - ppf function
14) Truncated Generalized Negative Binomial
GNTCDF(X,THETA,BETA,M,N) - cdf function
GNTPDF(X,THETA,BETA,M,N) - pdf function
GNTPPF(X,THETA,BETA,M,N) - ppf function
15) Discrete Weibull
DIWCDF(X,Q,BETA) - cdf function
DIWPDF(X,Q,BETA) - pdf function
DIWPPF(X,Q,BETA) - ppf function
DIWHAZ(X,Q,BETA) - hazard function
16) Consul (a generalized geometric)
CONCDF(X,THETA,M) - cdf function
CONPDF(X,THETA,M) - pdf function
CONPPF(X,THETA,M) - ppf function
17) Lost Games
LOSCDF(X,P,R) - cdf function
LOSPDF(X,P,R) - pdf function
LOSPPF(X,P,R) - ppf function
18) Generalized Lost Games
GLGCDF(X,P,J,A) - cdf function
GLGPDF(X,P,J,A) - pdf function
GLGPPF(X,P,J,A) - ppf function
19) Katz
KATCDF(X,ALPHA,BETA) - cdf function
KATPDF(X,ALPHA,BETA) - pdf function
KATPPF(X,ALPHA,BETA) - ppf function
e) The Waring routines (WARCDF, WARPDF, WARPPF)
were re-written to take advantage of their relationship
to the beta-geometric (the Waring is simply a different
parameterization of the beta-geometric). This makes
the Waring routines more computationally efficient and
more accurate.
3) Added the following LET sub-commands.
a) Added the harmonic number and generalized harmonic
number functions:
LET A = HARMNUMB(N)
LET A = HARMNUMB(N,M)
b) For certain types of plots, it can be useful to add a
small bit of random noise to a variable to avoid
overplotting. This is commonly referred to as jittering.
To simplify this, the following command was added:
LET DELTA = <value>
LET Y = JITTER X DELTA
The value of DELTA is used to control the magnitude of
the jittering. That is, the value of x(i) will be
changed to a value x(i) + noise where noise is in the
range (-DELTA/2,DELTA/2).
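As an illustration only (this is Python, not Dataplot code), the two
LET sub-commands above can be sketched as follows. The function names
are hypothetical, and two points are assumptions rather than facts
taken from this file: that HARMNUMB(N,M) is the sum of 1/k**M for
k = 1, ..., N, and that the jitter noise is drawn uniformly from the
stated range.

```python
import random

def harmonic_number(n, m=1):
    """Generalized harmonic number: sum of 1/k**m for k = 1..n (assumed definition)."""
    return sum(1.0 / k**m for k in range(1, n + 1))

def jitter(x, delta, rng=None):
    """Return a copy of x with noise in the range (-delta/2, delta/2) added.

    Uniform noise is an assumption; the text above only gives the range.
    """
    rng = rng or random.Random()
    return [xi + (rng.random() - 0.5) * delta for xi in x]

x = [1, 1, 2, 2, 3]
y = jitter(x, 0.2, random.Random(42))   # seeded only to make the sketch repeatable
```

With DELTA = 0.2, every jittered value stays within 0.1 of the
original value, so tied points separate without moving far.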
4) Made the following updates to the CONSENSUS MEANS command.
a) If a within-lab standard deviation is zero (i.e., the lab
has only a single unique measurement value), that lab
will be omitted from the analysis (it will be included
in the initial summary table). Previously, Dataplot
treated this as an error and would not run the
consensus means analysis.
b) Added the Fairweather method. There are 3 separate
methods for generating 95% confidence intervals for this
method (the original method proposed by Fairweather,
an improvement suggested by Cox, and a method developed
by Rukhin). The output for this method is only printed
if the minimum number of observations for a lab is
greater than 5.
c) Added the Bayesian Consensus Procedure (BCP) method of
Hagwood and Guthrie. This is a refinement of the BOB
method. For this method, the consensus mean and the
standard deviation of the consensus mean are asymptotically
equivalent to the posterior mean and standard deviation of
a fully Bayesian method.
d) Dataplot currently supports 12 methods. Most users will
only be interested in a subset of these methods. You
can now selectively turn individual methods on or off
(all methods are on by default) with the commands:
SET MANDEL PAULE <ON/OFF>
SET MODIFIED MANDEL PAULE <ON/OFF>
SET VANGEL RUHKIN <ON/OFF>
SET BOB <ON/OFF>
SET SCHILLER EBERHARDT <ON/OFF>
SET MEAN OF MEANS <ON/OFF>
SET GRAND MEAN <ON/OFF>
SET GRAYBILL DEAL <ON/OFF>
SET GENERALIZED CONFIDENCE INTERVAL <ON/OFF>
SET DERSIMONIAN LAIRD <ON/OFF>
SET FAIRWEATHER <ON/OFF>
SET BAYESIAN CONSENSUS PROCEDURE <ON/OFF>
5) The following updates and enhancements were made to
the graphics commands.
a) Added the command:
SET 4-PLOT DISTRIBUTION
The 4-plot by default consists of a run sequence plot,
a lag plot, a histogram, and a normal probability plot.
The above command allows us to replace the normal
probability plot with an exponential probability plot.
This is useful when checking the assumptions for a
Homogeneous Poisson Process (HPP) where we assume the
interarrival times follow an exponential distribution.
b) Added the command:
REPAIR PLOT Y X CENSOR
This is used to plot repair data where we may have
multiple systems and each system may have a single
censoring time (i.e., the time between the last repair
and the end of the test). Enter HELP REPAIR PLOT
for details.
c) Added the command:
MEAN REPAIR FUNCTION PLOT Y X CENSOR
d) Added the command
TRILINEAR PLOT Y1 Y2 Y3
This is used for plots where the rows of Y1, Y2, and
Y3 are mixtures (i.e., each row sums to 1, or to 100
if you are using percentage units).
6) Updated the RELIABILITY TREND TEST in the following
ways.
a) Fixed a bug in the reverse arrangements test.
b) Modified the output format for better clarity.
c) Added support for multiple systems. For multiple systems,
the tests will be applied to each individual system and
then composite tests will be performed.
d) Added support for HTML, Latex, and RTF format.
7) The following bug fixes were made:
a) The 2 variable case for the chi-square goodness of fit
test for discrete distributions had a bug. This has
been fixed. For older versions, a workaround is
SET MINSIZE = 1
LET Y3 XLOW XHIGH = COMBINE FREQUENCY TABLE Y2 X2
POISSON CHI-SQUARE GOODNESS OF FIT Y3 XLOW XHIGH
b) Some bugs with LET subcommands and SUBSETTING were
corrected.
c) A bug involving IF statements within nested loops was
corrected.
d) A few other miscellaneous bug fixes were made.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
September 2005 - April 2006.
-----------------------------------------------------------------------
1) For many one-factor plots, it is useful to sort the horizontal
axis based on the value of some statistic (most commonly a
location statistic such as the mean, median, minimum, or
maximum). The following command was added to help generate
these sorted plots:
LET XSORT INDX = SORT BY <STAT> X GROUPID
For example, to generate a sorted mean plot for variables
Y and X, you would do something like
LET X2 INDX = SORT BY MEAN Y X
X1TIC MARK LABEL FORMAT VARIABLE
X1TIC MARK LABEL CONTENT INDX
MEAN PLOT Y X2
This can be used with the following types of plots
i) <STAT> PLOT Y X
where <STAT> is a desired statistic (e.g., MEAN or
SD).
ii) BOX PLOT Y X
iii) PLOT Y X GROUP
For details, enter HELP SORT BY STATISTIC.
These plots often have alphabetic tick mark labels. The
following enhancements were made to simplify the use
of alphabetic tick mark labels with sorted plots.
a) The TIC MARK LABEL FORMAT and TIC MARK LABEL CONTENT
commands were previously augmented to allow numeric
variables, group label variables, or the row label
variable as the contents for the tick mark labels.
Specifically,
LET LAB = DATA 50 40 30 20 10 0
X1TIC MARK LABEL FORMAT VARIABLE
X1TIC MARK LABEL CONTENT LAB
LET IG = GROUP LABELS A B C D E
X1TIC MARK LABEL FORMAT GROUP LABEL
X1TIC MARK LABEL CONTENT IG
X1TIC MARK LABEL FORMAT ROW LABELS
This has been enhanced to allow an index variable to
be specified on the above TIC MARK LABEL CONTENT
commands (the index variable is typically generated by
a SORT BY command). The index variable specifies
the order in which the tic mark labels will be generated.
So the above examples can be augmented by
LET X2 INDX = SORT BY MEAN Y X
LET LAB = DATA 50 40 30 20 10 0
X1TIC MARK LABEL FORMAT VARIABLE
X1TIC MARK LABEL CONTENT LAB INDX
LET X2 INDX = SORT BY MEAN Y X
LET IG = GROUP LABELS A B C D E
X1TIC MARK LABEL FORMAT GROUP LABEL
X1TIC MARK LABEL CONTENT IG INDX
LET X2 INDX = SORT BY MEAN Y X
X1TIC MARK LABEL FORMAT ROW LABELS
X1TIC MARK LABEL CONTENT INDX
b) The LET ... = GROUP LABEL .... command was augmented in
the following two ways.
i) You can specify literal strings for group labels.
For example,
LET IG = GROUP LABEL BATCHSP()1 BATCHSP()2 ...
BATCHSP()3 BATCHSP()4
The strings are separated by spaces. If you need to
include a space in a particular string, use the
SP() as in the above example.
ii) Pre-defined strings can be used to define a group
label variable. For example,
LET IG = GROUP LABEL ST1 TO ST10
where ST1, ST2, ...., ST10 are previously defined
strings. The TO syntax is useful in this context
when the number of strings is large.
Dataplot's algorithm for parsing the GROUP LABEL command
is:
i) Dataplot first checks the character variables file
(HELP SET CONVERT CHARACTER for details). If the
first name listed is found, Dataplot uses this
character variable to define the group labels.
ii) If a character variable is not found, Dataplot
checks all the listed names to see if they are
previously defined strings. If they are, then
Dataplot substitutes the values of these strings.
iii) If one or more of the names is not a previously
defined string, then Dataplot treats all of the
names as literal text strings.
2) You can now pass arguments to macros.
To pass arguments to a macro, do something like
CALL SAMPLE.DP arg1 arg2 arg3
Up to 10 arguments may be passed (although limits on command
line lengths still apply). Arguments containing spaces or
hyphens should be enclosed in quotes. The character limit for
a single argument is 40 characters.
In the SAMPLE.DP macro, if a $1 is encountered, it will be
replaced with "arg1", if a $2 is encountered, it will be
replaced with "arg2" and so on. A $0 will substitute the
number of arguments given on the CALL command.
This substitution will only occur if a command line is contained
within a macro (i.e., if no macro is active, the "$" will not
signal any substitution and it will remain in the command line
as given).
Dataplot currently only supports one level of argument
substitution for macros. That is, the values of the macro
arguments (i.e., the $1, $2, etc.) will contain the values
given by the most recent CALL command that specified at least
one argument. If you need to nest CALL commands with macro
arguments, the recommended workaround is to have the
higher level macro extract any macro arguments passed to it
into temporary variables or strings before calling any other
macros. For example, suppose SAMPLE.DP needs to call
SAMPLE2.DP with arguments. You could do something like
the following in SAMPLE.DP:
. Start of SAMPLE.DP macro
let string zzzzs1 = $1
let string zzzzs2 = $2
let string zzzzs3 = $3
...
call sample2.dp newarg1 newarg2
The default character for argument substitution is the
"$". To use a different character, enter the command
MACRO SUBSTITUTION CHARACTER <char>
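The substitution rule described above can be sketched in Python (an
illustration only, not Dataplot's implementation). The function name
substitute_args is hypothetical, and leaving a reference such as $5
untouched when fewer than five arguments were given is an assumption;
this file does not say what Dataplot does in that case.

```python
import re

def substitute_args(line, args, char="$"):
    """Replace $1, $2, ... with the CALL arguments and $0 with their count."""
    pattern = re.compile(re.escape(char) + r"(\d+)")

    def repl(match):
        k = int(match.group(1))
        if k == 0:
            return str(len(args))          # $0 -> number of arguments
        if 1 <= k <= len(args):
            return args[k - 1]             # $k -> k-th argument
        return match.group(0)              # assumption: leave unmatched references as typed

    return pattern.sub(repl, line)

# Mirrors the first line of the SAMPLE.DP example above.
print(substitute_args("let string zzzzs1 = $1", ["arg1", "arg2", "arg3"]))
```

Copying $1, $2, ... into strings at the top of a macro, as in the
SAMPLE.DP example, preserves the values before a nested CALL with
arguments replaces them.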
3) The following enhancements were made to the CAPTURE
command (the CAPTURE command re-directs alphanumeric output
to a file rather than displaying it on the screen).
a) Sometimes it may be useful to have the output sent to
both the screen and to a file. You can do this by
entering the command
CAPTURE SCREEN ON
To restore CAPTURE output only being sent to the
CAPTURE file, enter the command
CAPTURE SCREEN OFF
b) Sometimes it may be useful to selectively send output to
the CAPTURE file. You can do this with the following
commands:
CAPTURE SUSPEND
CAPTURE RESUME
where SUSPEND specifies that output will be sent to the
screen rather than the CAPTURE file (note that the CAPTURE
file remains open) and RESUME will send the output to
the currently open CAPTURE file. You can enter as many
CAPTURE SUSPEND/CAPTURE RESUME sequences as you like
between a CAPTURE/END OF CAPTURE session.
Note that OFF is a synonym for SUSPEND and ON is a
synonym for RESUME.
4) Made the following probability distribution updates:
a) Added confidence intervals for the maximum likelihood
estimates for the geometric distribution.
b) Added confidence intervals for the maximum likelihood
estimates for the Poisson distribution.
c) Added support for the following new probability
distributions:
1) Added the type 2 generalized logistic distribution.
Enter HELP GL2PDF for details.
2) Added the type 3 generalized logistic distribution.
Enter HELP GL3PDF for details.
3) Added the type 4 generalized logistic distribution.
Enter HELP GL4PDF for details.
4) Added the Hosking parameterization of the generalized
logistic distribution. Enter HELP GL5PDF for details.
5) Added the generalized Tukey-Lambda distribution. Enter
HELP GLDPDF for details.
6) Added the beta-normal distribution. Enter HELP BNOPDF
for details.
7) Added the asymmetric log double exponential (Laplace)
distribution. Enter HELP ALDPDF for details.
5) Added or modified the following analysis commands.
a) The Durbin test for identical effects in a two-way
table for balanced incomplete block designs is supported
with the command
DURBIN TEST Y BLOCK TREATMENT
Enter
HELP DURBIN TEST
for details.
b) The TOLERANCE LIMITS command generates both normal tolerance
limits and non-parametric tolerance limits. You can now
specify only one of these with the commands
NORMAL TOLERANCE LIMITS
NONPARAMETRIC TOLERANCE LIMITS
c) The GRUBBS TEST for outlier detection was previously augmented
to generate three distinct tests:
i) a test for both the minimum and maximum points as
outliers.
ii) a test for the minimum point as an outlier.
iii) a test for the maximum point as an outlier.
This has now been modified into three distinct commands:
GRUBBS TEST Y
GRUBBS MINIMUM TEST Y
GRUBBS MAXIMUM TEST Y
This was done so that the internally saved parameters
(e.g., STATVAL, STATCDF, etc.) will now be correct for
the appropriate test.
d) The CONSENSUS MEANS command was modified in a number of
ways. Specifically,
1) The output format was modified to make it more
consistent and to provide better clarity. In
particular, a clearer distinction is made between
standard uncertainty (the standard error of the
consensus mean), expanded uncertainty (2*standard
error) and expanded uncertainty based on a
normal or t percent point value.
2) Modified the summary tables. There are now 4 summary
tables generated:
i) A summary table of the original data.
ii) A summary table of the 95% confidence limits
generated by each method
iii) A summary table of the standard uncertainties
generated by each method (i.e., the standard
error of the consensus mean estimate)
iv) A summary table of the expanded uncertainties
generated by each method (i.e., 2 times
the standard error of the consensus mean estimate)
3) Added the following new methods:
i) The Graybill-Deal method now generates confidence
limits using a method proposed by Andrew Rukhin.
It also generates 4 distinct estimates of the
variance of the consensus mean (the Sinha method,
the naive method, and 2 methods proposed by
Nien-Fan Zhang). The commonly used naive method
is known to seriously underestimate the variance
for small sample sizes.
ii) Added the generalized confidence interval method
proposed by Hari Iyer and Jack Wang.
iii) Added the DerSimonian-Laird method.
4) Previous versions of Dataplot allowed you to create
the CONSENSUS MEANS output in HTML format
(CAPTURE HTML FILE.HTM) or Latex format
(CAPTURE LATEX file.tex). This was extended to
include Rich Text Format (RTF). The RTF option
is used for creating output that can be read into
Microsoft Word (RTF is a protocol Microsoft created
for transporting word processing files between
different word processing programs). For example
CAPTURE RTF FILE.RTF
CONSENSUS MEAN Y X
END OF CAPTURE
You can then import FILE.RTF into Word. Note that
although RTF is supposed to be a portable format,
our experience is that non-Word word processors do a
poor job of importing the Dataplot RTF files (tables
tend to be problematic for non-Word software and
Dataplot is creating most of its RTF output as tables).
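To make the uncertainty terminology used for the CONSENSUS MEANS
output concrete, here is a minimal Python sketch of the Graybill-Deal
weighted mean with the naive variance estimate. This is the textbook
form (weights n(i)/s(i)**2); it is assumed, not confirmed by this
file, that Dataplot's computation matches it. The function name is
hypothetical, and the other variance estimates (Sinha, Zhang) are not
shown.

```python
import math

def graybill_deal(means, sds, ns):
    """Graybill-Deal weighted mean with the naive variance estimate."""
    weights = [n / s**2 for s, n in zip(sds, ns)]    # w_i = n_i / s_i^2
    total = sum(weights)
    consensus = sum(w * m for w, m in zip(weights, means)) / total
    std_unc = math.sqrt(1.0 / total)                 # naive standard uncertainty
    return consensus, std_unc, 2.0 * std_unc         # expanded uncertainty = 2 * SE

# Two hypothetical labs with equal precision and 4 measurements each.
mean, u, U = graybill_deal([10.0, 12.0], [1.0, 1.0], [4, 4])
```

A lab with a zero within-lab standard deviation would give an
infinite weight in this sketch, so such labs have to be dropped
before the weights are formed.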
6) The following updates were made to graphics output devices.
a) The GD library, used to generate JPEG and PNG format
graphs, was updated from version 1.84 to 2.033. The
primary consequence of this is that we can now generate
GIF format files as well. To generate GIF files, enter
SET IPL1NA PLOT.GIF
DEVICE 2 GD GIF
b) Dataplot can now generate graphs in Latex format.
The primary motivation for using this format is
to generate publication quality graphs. There are
some unique features to this device driver that are
described in detail in the HELP LATEX command.
7) The following statistic command was added.
LET A = RATIO Y1 Y2
This statistic is the sum of Y1 divided by the sum of Y2.
The following additional commands are supported:
TABULATE RATIO Y1 Y2 X
CROSS TABULATE RATIO Y1 Y2 X1 X2
RATIO PLOT Y1 Y2 X
RATIO CROSS TABULATE PLOT Y1 Y2 X1 X2
BOOTSTRAP RATIO PLOT Y1 Y2
JACKNIFE RATIO PLOT Y1 Y2
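A short Python sketch of the RATIO statistic and of its use by group
(illustration only). The function names are hypothetical, and the
per-group version assumes that TABULATE applies the statistic
separately to each distinct value of X.

```python
def ratio(y1, y2):
    """Sum of y1 divided by the sum of y2."""
    return sum(y1) / sum(y2)

def tabulate_ratio(y1, y2, x):
    """Ratio statistic computed separately for each distinct value of x."""
    groups = {}
    for a, b, g in zip(y1, y2, x):
        sums = groups.setdefault(g, [0.0, 0.0])
        sums[0] += a
        sums[1] += b
    return {g: s[0] / s[1] for g, s in sorted(groups.items())}

overall = ratio([1, 2, 3], [2, 4, 6])
by_group = tabulate_ratio([1, 2, 3, 4], [2, 2, 1, 1], [1, 1, 2, 2])
```

Note that this is the ratio of the sums, which in general differs
from the mean of the row-by-row ratios Y1(i)/Y2(i).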
8) The following special function library functions were added:
I0INT - integral of the modified Bessel function of the
first kind and order 0
J0INT - integral of the Bessel function of the first kind
and order 0
K0INT - integral of the modified Bessel function of the
third kind and order 0
Y0INT - integral of the Bessel function of the second kind
and order 0
I0ML0 - difference of the modified Bessel function of the
first kind of order 0 and the modified Struve function
of order 0
I1ML1 - difference of the modified Bessel function of the first
kind of order 1 and the modified Struve function of
order 1
AIRINT - integral of the Airy function Ai
BIRINT - integral of the Airy function Bi
AIRYGI - modified Airy function Gi
AIRYHI - modified Airy function Hi
ATNINT - integral of the inverse-tangent function
9) Added the following LET subcommands:
a) LET Y2 = REPLACE GROUPID GROUP2 Y1
This command does the following:
1) It matches the values in GROUP2 against GROUPID and
returns the indices of the matching rows for the GROUPID
array.
2) The indices are used to access the corresponding value
in the Y1 array.
3) The corresponding row of Y2 is replaced with the Y1
value.
The abbreviated syntax
LET Y2 = REPLACE GROUPID GROUP
simply assigns a value of 1 in the corresponding row of Y2.
Enter HELP REPLACE for details.
b) LET Y2 X2 = MATRIX BIN M
This command is used to generate a frequency table for
the elements in a matrix. This can be used to generate
a histogram of the elements in a matrix. For example,
LET Y2 X2 = MATRIX BIN M
HISTOGRAM Y2 X2
Enter HELP MATRIX BIN for details.
c) LET M = MATRIX TRUNCATION M IVALUE
LET M = MATRIX LOWER TRUNCATION M IVALUE
Set all values in the matrix M that are less than
IVALUE to IVALUE. This command can be used in conjunction
with the MATRIX SUBTRACT command to remove background
values from a matrix. For example, if the background
value is 5, do something like
LET IBACK = 5
LET IZERO = 0
LET M = MATRIX SUBTRACT M IBACK
LET M = MATRIX TRUNCATION M IZERO
Likewise, you can use the following command to perform
an upper truncation:
LET M = MATRIX UPPER TRUNCATION M IVALUE
That is, any values in M greater than IVALUE are set to
IVALUE.
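The background-removal example and the two truncation directions can
be sketched in Python as follows (illustration only; matrices are
plain lists of rows and the function names are hypothetical).

```python
def matrix_subtract(m, value):
    """Subtract a scalar from every element."""
    return [[v - value for v in row] for row in m]

def lower_truncation(m, ivalue):
    """Set every element less than ivalue to ivalue."""
    return [[max(v, ivalue) for v in row] for row in m]

def upper_truncation(m, ivalue):
    """Set every element greater than ivalue to ivalue."""
    return [[min(v, ivalue) for v in row] for row in m]

m = [[3, 7], [5, 12]]
m = lower_truncation(matrix_subtract(m, 5), 0)   # remove a background of 5
print(m)  # [[0, 2], [0, 7]]
```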
10) The SET HISTOGRAM CLASS WIDTH was previously implemented to
specify different default class width algorithms for
histograms. This command was extended to apply to the
following additional commands:
LET Y2 X2 = BINNED Y
LET Y2 X2 = MATRIX BIN Y
NORMAL MIXTURE MAXIMUM LIKELIHOOD Y
CHI-SQUARE GOODNESS OF FIT Y
2 SAMPLE CHI-SQUARE GOODNESS OF FIT Y
11) Added the following command
PROCESS ID
This command will print the process id and save this
process id in the internal parameter PID.
12) Made the following bug fixes.
a) Previously, if all elements of a response variable were
equal, the HISTOGRAM command would print an error message
and not generate the histogram. Dataplot will now
print a warning message, but will generate a histogram
with one non-zero class (it will generate one class above
and one class below with zero count as well).
b) In the TABULATE command, if all elements in the response
variable are identical, Dataplot now prints a warning message
rather than an error message and performs the tabulation anyway.
c) Corrected a bug in Friedman's test. The previous version
is correct if the original data is the rank within a block.
The corrected version does not require that the data
already be ranked.
d) The WILK SHAPIRO command was not returning the p-value in
the saved parameter PVALUE correctly. This was corrected.
e) For the command
LET Z2 = BIVARIATE INTERPOLATION Z Y X Y2 X2
the Y and X arguments were in the wrong order (i.e., the
command was interpreting Y X as X Y). This was corrected.
f) Fixed bugs in the
LET X = CHARACTER CODE IX1
LET X = ALPHABETIC CHARACTER CODE IX1
commands.
g) The command
LET Y2 XLOW XUPP = COMBINE FREQUENCY TABLE Y X
is used to combine low frequency bins. The original
implementation simply worked from left to right to
combine the bins. Since low frequency bins typically
occur in the left and right tails, the algorithm was
modified to move from the left tail to the center and
then from the right tail to the center.
h) Fixed a bug where the ORIENTATION command could cause
Dataplot to hang on subsequent plots if no DEVICE 2
command was defined and a software font was used to
draw text.
i) Dataplot creates and uses a number of temporary files
in the current directory.
If you have multiple sessions running from the current
directory, this can create a problem for these temporary
files. In most cases, a conflict does not occur because
Dataplot will open the file, read or write to the file,
and then close the file immediately. However, a few
files, such as the plot files dppl1f.dat and dppl2f.dat,
typically remain open. The effect of different Dataplot
sessions trying to access these files is system dependent.
1. On Unix and Windows 98/NT4 platforms, the file will
contain whatever was most recently written to it.
2. On Windows 2000/XP platforms, the Dataplot session
that opens the file first has a "lock" on the file.
This causes any subsequent Dataplot session that tries
to access the file to hang.
This is particularly a problem with the GUI version
on Windows 2000/XP. Specifically, if the Dataplot GUI
does not shut down cleanly, the underlying Dataplot
executable does not get killed. This then causes any
future attempt to open the GUI to hang since the "dead"
Dataplot executable has a lock on the file. You have to
use "Ctrl-Alt-Del" to bring up the Task Manager, select
"Processes", and then manually kill any "DPLAHEY.EXE"
processes in order to clear the dead process.
In particular, if you close the GUI by clicking the
"x" in the upper right hand corner (rather than clicking
the EXIT menu), this does not kill the underlying
DPLAHEY.EXE process.
As a partial solution to this problem, Dataplot should
now trap this condition. It will print a message
indicating how to clear the "dead" DPLAHEY.EXE process.
In addition, it will do one of two things in the current
Dataplot process:
a. It will attach the process id to the temporary file
name and then re-open the file.
b. It will simply ignore the file (so if dppl2f.dat is locked,
Dataplot will not write the current plot to dppl2f.dat
in the current Dataplot session).
You can specify which option Dataplot will use by entering
one of the following commands in your startup file
(c:\Program Files\NIST\DATAPLOT\DPLOGF.TEX):
SET TEMPORARY FILE PID
SET TEMPORARY FILE IGNORE
The default is PID.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT June - August 2005.
-----------------------------------------------------------------------
1) The following matrix commands were added.
a. The sum of all elements in a matrix can be computed with
the following command
LET A = MATRIX SUM M
b. Previous versions of Dataplot allowed you to compute
various column or row statistics
(HELP MATRIX COLUMN STATISTIC or HELP MATRIX ROW STATISTIC
for details). This capability has been extended to the
case of computing the statistics for the entire matrix
with the command
LET A = MATRIX GRAND <STAT> M
where <STAT> denotes the desired statistic (the list
of supported statistics is the same as for the
MATRIX COLUMN STATISTIC and MATRIX ROW STATISTIC commands).
c. Previous versions of Dataplot allowed you to compute
various column or row statistics
(HELP MATRIX COLUMN STATISTIC or HELP MATRIX ROW STATISTIC
for details). This capability has been extended to the
case where the matrix is divided into equal partitions
with the command
LET MOUT = MATRIX PARTITION <STAT> M NROW NCOL
with M, NROW, and NCOL denoting the input matrix, the number
of rows in each sub-matrix, and the number of columns in
each sub-matrix, respectively. Note that this command
returns a matrix (MOUT) of values.
That is, the original matrix is divided into sub-matrices
containing NROW rows and NCOL columns each. The partition
starts at row 1 and column 1. The number of rows in MOUT
is determined by dividing the number of rows in M by NROW.
Likewise, the number of columns is determined by dividing
the number of columns in M by NCOL. If this division
does not result in an integer value (e.g., 23 columns
in M and NCOL = 5 results in 3 columns left over), then the
last column, or row, of MOUT will be based on whatever
columns are left over.
In addition, the MATRIX PARTITION command has been extended
to accommodate unequal partitions where the partitions need
not be contiguous.
The syntax in this case is
LET MOUT = MATRIX PARTITION <STAT> M TAGROW TAGCOL
with M denoting the input matrix. In this case, TAGROW and
TAGCOL are vectors with TAGROW having the same number of rows
as M and TAGCOL having the same number of columns as M.
The elements of TAGROW and TAGCOL identify which partition
each element of M belongs to. The output matrix will be
dimensioned based on the number of distinct values in
TAGROW and TAGCOL.
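The equal-partition case can be sketched in Python as follows
(illustration only). The function name is hypothetical, and the mean
is used as the statistic purely as an example. Leftover rows or
columns form a final, smaller block, as described above.

```python
def matrix_partition_mean(m, nrow, ncol):
    """Mean of each nrow-by-ncol block, starting at row 1 and column 1."""
    n_rows, n_cols = len(m), len(m[0])
    out = []
    for r0 in range(0, n_rows, nrow):
        out_row = []
        for c0 in range(0, n_cols, ncol):
            # Slicing past the edge simply yields the leftover rows/columns.
            block = [v for row in m[r0:r0 + nrow] for v in row[c0:c0 + ncol]]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out

m = [[1, 2, 3],
     [4, 5, 6]]
print(matrix_partition_mean(m, 2, 2))  # [[3.0, 4.5]]
```

Here the 2 x 3 input with NROW = 2 and NCOL = 2 gives a 1 x 2 result:
one full 2 x 2 block and one block built from the single leftover
column.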
2) The following commands were added to compute probability
weighted moments and L-moments.
LET P = PROBABILITY WEIGHTED MOMENTS Y
LET L = L MOMENTS Y
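For readers unfamiliar with these quantities, the following Python
sketch computes the first four sample probability weighted moments
and the L-moments derived from them, using the standard unbiased
estimators. That Dataplot uses exactly these estimators is an
assumption, and the function names are hypothetical. The sketch
needs at least 4 observations.

```python
def probability_weighted_moments(y, nmom=4):
    """Unbiased sample PWMs b_0 .. b_(nmom-1) computed from the ordered data."""
    x = sorted(y)
    n = len(x)
    b = []
    for r in range(nmom):
        total = 0.0
        for j, xj in enumerate(x, start=1):
            w = 1.0
            for k in range(1, r + 1):
                w *= (j - k) / (n - k)     # (j-1)...(j-r) / ((n-1)...(n-r))
            total += w * xj
        b.append(total / n)
    return b

def l_moments(y):
    """First four sample L-moments as linear combinations of the PWMs."""
    b0, b1, b2, b3 = probability_weighted_moments(y, 4)
    return [b0,
            2 * b1 - b0,
            6 * b2 - 6 * b1 + b0,
            20 * b3 - 30 * b2 + 12 * b1 - b0]

lmom = l_moments([1, 2, 3, 4, 5])
```

The first L-moment is the sample mean and the second is a measure of
scale; for the symmetric data above the third L-moment is zero.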
3) The following distributional updates were made.
a. Made the following enhancements to the generalized Pareto
maximum likelihood command.
1. L-moment and elemental percentile estimates are now
included. The L-moment estimators are a refinement of
probability weighted moments. The elemental percentile
method is described in Castillo, Hadi, Balakrishnan, and
Sarabia, "Extreme Value and Related Models with
Applications in Engineering and Science", Wiley, 2005.
One advantage of the elemental percentile approach is that
it does not have the restricted domain for the shape
parameter that the moment and maximum likelihood estimators
have.
2. The elemental percentile estimate is now used as the
starting value for the maximum likelihood. This seems
to improve the convergence of the ML method.
3. The methods used (moments, L-moments, elemental percentiles,
and maximum likelihood) do not estimate a location
parameter.
By default, these methods will now use the minimum data
value (minus an epsilon fudge factor) as the estimate of
location. This value is subtracted from the data before
applying the estimation procedures.
If you would like to provide your own location estimate,
enter the command
LET THRESHOL = <value>
Any data values less than the value specified for
THRESHOL will be omitted from the estimation. Note that
the generalized Pareto is often used in the context of
modeling the distribution of "points above a threshold",
so specifying a threshold greater than some of the data
points is fairly common.
4. The maximum likelihood estimates now include the normal
approximation confidence intervals for the scale and
shape parameters and, optionally, for select percentiles
of the data.
To specify percentile estimates, enter the command
SET MAXIMUM LIKELIHOOD PERCENTILES <varname>
where <varname> specifies the name of a variable containing
the desired percentiles. You can specify DEFAULT to
use a default set of values.
Be aware that for the generalized Pareto maximum
likelihood estimation, a relatively large sample size
may be required for the asymptotic normal approximations
to become reasonably accurate. Some studies have
indicated sample sizes of at least 500 may be required.
b. Added support for the maximum likelihood estimation for
the inverted Weibull distribution:
INVERTED WEIBULL MLE Y
INVERTED WEIBULL MLE Y X
The first syntax supports the full sample case. It will
return confidence intervals for the shape and scale
parameters for various values of alpha (based on the
normal approximations) and will return confidence intervals
for selected percentiles if you have entered a
SET MAXIMUM LIKELIHOOD PERCENTILES DEFAULT command.
The second syntax supports the censored case. This case
currently only returns point estimates.
c. The BINOMIAL MLE now returns improved confidence intervals.
d. We have modified the output from a number of the maximum
likelihood commands to make the output more consistent.
4) Made a number of bug fixes. In particular
a. Fixed a bug where the following form of the DERIVATIVE command
wasn't being recognized:
LET FUNCTION D = DERIVATIVE F WRT X
This syntax should now work.
b. Fixed the DIFFERENCE OF MEANS CONFIDENCE INTERVALS command
(in adding support for the HTML/LATEX output, we had shut
off the standard ASCII output). Fixed the HTML output
for this command.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT January - May 2005.
-----------------------------------------------------------------------
1) Distributional Modeling Updates
a. Dataplot provides extensive distributional modeling
capabilities via probability plots and PPCC/KS plots. One
limitation of these methods is that they do not provide
estimates for the uncertainty of the parameter estimates
and for the distribution quantiles.
The BOOTSTRAP ... PLOT command was enhanced to support
distributional modeling for a number of distributions.
This can be used to obtain confidence intervals for the
distribution parameters, for selected percentiles of the
distribution, and for the value of the PPCC (or K-S
statistic).
For details, enter
HELP DISTRIBUTIONAL BOOTSTRAP
b. For the case of one shape parameter, the PPCC plot was
enhanced to support a group option (where group means
multiple batches of data as opposed to binned data).
In this case, a separate curve is drawn for each batch
of the data. This can be used to check for a common
shape parameter across multiple batches of data. For
details, enter
HELP PPCC PLOT
c. The PPCC PLOT and PROBABILITY PLOT commands support binned
data. Previously, the binning consisted of two variables:
the first contained the bin frequencies and the second
contained the mid-point of the bins. This form assumes
the bins are of equal width.
Some binned data may contain bins of unequal width. The
most common reason for this is to combine bins in the
tails which have low frequencies.
The PPCC PLOT and PROBABILITY PLOT commands were updated
to handle this case. In this case, the syntax is
PPCC PLOT Y XLOW XHIGH
PROBABILITY PLOT Y XLOW XHIGH
with Y, XLOW, and XHIGH denoting the frequency variable,
the lower class boundary, and the upper class boundary,
respectively. For details, enter
HELP PPCC PLOT
HELP PROBABILITY PLOT
d. The following enhancements were made to the maximum
likelihood estimation.
1. Added confidence intervals for the location and scale
parameters for the double exponential case
(DOUBLE EXPONENTIAL MAXIMUM LIKELIHOOD Y).
2. Added a weighted order statistics method to the Cauchy
maximum likelihood estimation (CAUCHY MLE Y). This method
was added because it is the method recommended for the
Cauchy Anderson-Darling test (see D'Agostino and Stephens,
"Goodness-Of-Fit Techniques", Marcel Dekker, 1986, p. 164).
3. Added support for the maximum case of the 2-parameter
extreme value type 2 (Frechet) distribution. This includes
confidence intervals for the estimated parameters and
for select percentiles (see
SET MAXIMUM LIKELIHOOD PERCENTILES).
e. The Anderson-Darling test now supports the extreme value
type 2 (Frechet) for the maximum case and the Cauchy
distribution.
f. Added support for the minimum case for the generalized
extreme value distribution. Added the GEVHAZ and GEVCHAZ
functions to compute the hazard and cumulative hazard
functions for the generalized extreme value distribution.
g. A number of distributions (Weibull, Gumbel, Frechet,
and generalized extreme value) support both a minimum and
a maximum case. The command
SET MINMAX <1/2>
is used to specify which case (1 = minimum, 2 = maximum).
If no MINMAX command is entered, previous versions used
the value 1 as the default (this was chosen since the
minimum case is what is typically used for the Weibull
distribution).
However, for the other distributions, the maximum case
is generally the one most used. For this reason, we
added the value 0 to indicate the default where the default
is now specific to each distribution. For the Weibull, the
default is the minimum and for the Gumbel, Frechet, and
generalized extreme value the default is the maximum.
2) Interlaboratory Analysis Updates
Dataplot added the following commands to perform an
interlaboratory analysis as documented in
"Standard Practice for Conducting an Interlaboratory Study
to Determine the Precision of a Test Method", ASTM
International, 100 Barr Harbor Drive, PO BOX C700,
West Conshohocken, PA 19428-2959, USA. This document is
in support of ASTM Standard E 691 - 99.
The specific commands added are:
LET A = REPEATABILITY STANDARD DEVIATION Y LABID
LET A = REPRODUCABILITY STANDARD DEVIATION Y LABID
LET H = H CONSISTENCY STATISTIC Y LABID
LET K = K CONSISTENCY STATISTIC Y LABID
LET H TAG = H CONSISTENCY STATISTIC Y LABID MATID
LET K TAG = K CONSISTENCY STATISTIC Y LABID MATID
E691 INTERLAB Y LABID MATID
The E691 INTERLAB command generates four tables documented
in the above document. The other commands are useful in
generating the plots described in this standard.
In addition, a number of built-in macros were added to
generate the various graphs demonstrated in the standard.
For more information, enter
HELP E691 INTERLAB
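As a rough illustration of the quantities these commands report, the
following Python sketch computes the repeatability and reproducibility
standard deviations and the h and k consistency statistics for a single
material, using the formulas commonly given for ASTM E 691. It is not
Dataplot's code. The function name e691_one_material and the data layout
(a dict mapping lab id to an equal number of replicate results) are
assumptions made for this example.

```python
import math
import statistics


def e691_one_material(cells):
    """cells: dict mapping lab id -> list of replicates (same count per lab)."""
    labs = sorted(cells)
    n = len(cells[labs[0]])                       # replicates per lab
    means = {lab: statistics.mean(cells[lab]) for lab in labs}
    sds = {lab: statistics.stdev(cells[lab]) for lab in labs}

    grand_mean = statistics.mean(list(means.values()))
    s_xbar = statistics.stdev(list(means.values()))   # sd of the cell averages
    # repeatability sd: root mean square of the within-lab sds
    s_r = math.sqrt(statistics.mean([s ** 2 for s in sds.values()]))
    # reproducibility sd, taken to be no smaller than the repeatability sd
    s_R = max(s_r, math.sqrt(s_xbar ** 2 + s_r ** 2 * (n - 1) / n))

    # h: between-lab consistency, k: within-lab consistency
    h = {lab: (means[lab] - grand_mean) / s_xbar for lab in labs}
    k = {lab: sds[lab] / s_r for lab in labs}
    return s_r, s_R, h, k
```

Large absolute h values flag labs whose averages differ from the rest,
while large k values flag labs with unusually high within-lab variability.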
3) The following command can be useful in converting data in a
two-way table to a format required by certain Dataplot
commands
LET Y MATID LABID = REPLICATED STACK X1 ... XK LAB
The resulting output has the form
X1(1) 1 LAB(1)
. . .
X1(n) 1 LAB(n)
X2(1) 2 LAB(1)
. . .
X2(n) 2 LAB(n)
...
Xk(1) k LAB(1)
. . .
Xk(n) k LAB(n)
This is a variation of the STACK command. The distinction is
that the last variable entered is interpreted as a labid
variable that is replicated for each of the response variables.
For details, enter
HELP REPLICATED STACK
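The layout shown above can be sketched in Python as follows. This is only
an illustration of the output arrangement, not Dataplot's implementation;
the function name replicated_stack and its argument names are hypothetical.

```python
def replicated_stack(columns, lab):
    """Stack k equal-length columns; return (y, matid, labid) lists."""
    y, matid, labid = [], [], []
    for j, col in enumerate(columns, start=1):
        if len(col) != len(lab):
            raise ValueError("each response column must match the lab column length")
        y.extend(col)                    # responses for column j
        matid.extend([j] * len(col))     # column index becomes the material id
        labid.extend(lab)                # lab ids are repeated for every column
    return y, matid, labid


y, matid, labid = replicated_stack([[1.1, 1.2], [2.1, 2.2]], lab=[7, 9])
print(list(zip(y, matid, labid)))
```

Each printed triple corresponds to one row of the stacked output: the
response, the index of the column it came from, and the replicated lab id.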
4) Extreme Value Analysis
a. Enhancements were made to the CME and DEHAAN commands (these
estimate the parameters for a generalized Pareto distribution).
b. Added the following command
PEAKS OVER THRESHOLD PLOT Y
For details, enter
HELP PEAKS OVER THRESHOLD PLOT
5) Platform Specific Issues
a) We have separated the Windows installation files into two
distinct cases:
a) Windows 2000/XP platforms
b) Windows 95/98/NT4/ME platforms
This was required for compiler compatibility reasons. The
Lahey LF90 and Compaq Visual Fortran compilers were starting
to show some problems under Windows XP (specifically with
Service Pack 2).
For Windows 2000/XP, we have upgraded to the Intel 8.1
Fortran compiler. However, this compiler does not support
Windows 98 and earlier platforms. So the
Windows 95/98/NT4/ME version is still built using the
Lahey (for the GUI) and Compaq compilers.
b) We have updated the Mac OSX installation. There is now a
single file that you download that includes the executable,
the auxiliary files, the source, the needed Tcl/Tk files,
and the g77 compiler. This simplifies the installation
(e.g., you do not have to install Tcl/Tk yourself).
6) We have started overhauling some of the menus for the graphical
interface (GUI). This will not be radically different, just an
effort to provide better organization and clarity to the menus.
This updating will occur over several releases. The initial
update has re-arranged the top level menus. We have added
a "Getting Started" menu to help new users. The Reliability
and Extreme Values menus have been reorganized.
7) Dataplot uses the "." for the decimal point when reading data.
Some countries use the "," for this purpose.
We have added the command
SET DECIMAL POINT <value>
with <value> denoting the character to be used as the decimal
point.
Note that the use of this is currently fairly limited. It is
used in free-format reads only. It is provided to allow
international users the ability to read their data files
without editing them. Note that it does not apply if you
use the SET READ FORMAT command to define a format for the
data. It is also not used for writing data nor for the
output from Dataplot commands.
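The effect on a free-format read can be pictured with the Python sketch
below. It is an illustration only: the function parse_free_format is
hypothetical, and it assumes fields are separated by white space so that
a comma can safely serve as the decimal point.

```python
def parse_free_format(line, decimal_point="."):
    """Split a whitespace-delimited line and convert each field to float."""
    fields = line.split()
    if decimal_point != ".":
        # map the user's decimal character to the one float() expects
        fields = [f.replace(decimal_point, ".") for f in fields]
    return [float(f) for f in fields]


print(parse_free_format("1,5 -0,25 3", decimal_point=","))
```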
8) Fixed a number of bugs.
a. Fixed the COLUMN LIMITS where the specified limits are
arrays (as opposed to single scalar values) to work in
the case where columns are of unequal length.
b. Internally, Dataplot treats strings and functions
interchangeably. The one distinction is that strings
preserve case. However, when strings are operating as
functions, we want them to be converted to upper case.
Dataplot was updated so that when a string is used as a
function, it is converted to upper case. This also
required some updates in the "^" and "&" string operators
to handle case conversions appropriately.
c. Fixed a bug in the Wilcoxon signed rank test when it was
used for a 1-sample test.
d. For generalized Pareto percent point function, the scale
parameter was ignored. This was corrected.
e. Fixed a bug in the HFLPPF library function.
f. The GRUBBS TEST checks for both the maximum and minimum
values as outliers (relative to the normal distribution).
This is actually two tests: one for the minimum value and
one for the maximum value. When testing for both, the
value of alpha needs to be divided by 2.
The fix was to have the Grubbs test generate output for
3 tests:
1) Test both the minimum and the maximum value (with the
value of alpha adjusted appropriately).
2) Test the minimum value only.
3) Test the maximum value only.
To suppress the one-sided tests, enter the command
SET GRUBBS ONE SIDED OFF
g. Fixed a bug in the discrete uniform random number generator.
The algorithm was generating random numbers on the interval
[1,N]. This was corrected to generate random numbers on the
interval [0,N].
h. If the PRINTING switch was set to OFF, the YATES command
was not writing information to files "dpst1f.dat" and
"dpst2f.dat". This was corrected so that these files are
printed regardless of the setting of the PRINTING switch.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT June - December 2004.
-----------------------------------------------------------------------
1) The following updates were made for probability distributions.
A. The following enhancements were made to maximum likelihood
estimation.
1. The maximum likelihood output was rewritten for the
normal, lognormal, exponential, Weibull, gamma, beta,
Gumbel, and Pareto distributions.
Support was added for the following:
a. Improved confidence intervals for the distributional
parameters.
b. Support for censored data was added for the normal,
lognormal, exponential, Weibull, and gamma distributions.
c. Confidence intervals for selected percentiles were added
for the normal, lognormal, exponential, Weibull, gamma,
beta, and Gumbel distributions.
2. Added support for the Rayleigh, Maxwell, asymmetric
Laplace, generalized Pareto, and normal mixture
distributions:
RAYLEIGH MAXIMUM LIKELIHOOD Y
MAXWELL MAXIMUM LIKELIHOOD Y
ASYMMETRIC LAPLACE MAXIMUM LIKELIHOOD Y
GENERALIZED PARETO MAXIMUM LIKELIHOOD Y
LET NCOMP = <value>
NORMAL MIXTURE MAXIMUM LIKELIHOOD Y
The NCOMP parameter is used to specify how many normal
distributions to mix (it defaults to 2 if a value is not
specified for NCOMP).
The online help for the maximum likelihood was also rewritten.
Enter
HELP MAXIMUM LIKELIHOOD
for details.
B. Support was added for the following new distributions.
Skew-Laplace (Skew Double Exponential) distribution:
LET A = SDECDF(X,LAMBDA) - cdf of skew-Laplace distribution
LET A = SDEPDF(X,LAMBDA) - pdf of skew-Laplace distribution
LET A = SDEPPF(X,LAMBDA) - ppf of skew-Laplace distribution
Asymmetric Laplace (Asymmetric Double Exponential) distribution:
LET A = ADECDF(X,LAMBDA) - cdf of asymmetric Laplace
distribution
LET A = ADEPDF(X,LAMBDA) - pdf of asymmetric Laplace
distribution
LET A = ADEPPF(X,LAMBDA) - ppf of asymmetric Laplace
distribution
Maxwell-Boltzmann distribution:
LET A = MAXCDF(X,SIGMA) - cdf of Maxwell-Boltzmann
LET A = MAXPDF(X,SIGMA) - pdf of Maxwell-Boltzmann
LET A = MAXPPF(X,SIGMA) - ppf of Maxwell-Boltzmann
Rayleigh distribution:
LET A = RAYCDF(X) - cdf of Rayleigh
LET A = RAYPDF(X) - pdf of Rayleigh
LET A = RAYPPF(X) - ppf of Rayleigh
Generalized Inverse Gaussian distribution:
LET A = GIGCDF(X,CHI,LAMBDA,THETA) - cdf of generalized inverse
gaussian distribution
LET A = GIGPDF(X,CHI,LAMBDA,THETA) - pdf of generalized inverse
gaussian distribution
LET A = GIGPPF(X,CHI,LAMBDA,THETA) - ppf of generalized inverse
gaussian distribution
Generalized Asymmetric Laplace distribution:
LET A = GALCDF(X,KAPPA,TAU) - cdf of generalized asymmetric
Laplace distribution
LET A = GALPDF(X,KAPPA,TAU) - pdf of generalized asymmetric
Laplace distribution
LET A = GALPPF(X,KAPPA,TAU) - ppf of generalized asymmetric
Laplace distribution
Bessel I Function distribution:
LET A = BEICDF(X,S1SQ,S2SQ,NU) - cdf of Bessel I function
distribution
LET A = BEIPDF(X,S1SQ,S2SQ,NU) - pdf of Bessel I function
distribution
LET A = BEIPPF(X,S1SQ,S2SQ,NU) - ppf of Bessel I function
distribution
McLeish (related to Bessel K function) distribution:
LET A = MCLCDF(X,ALPHA) - cdf of McLeish distribution
LET A = MCLPDF(X,ALPHA) - pdf of McLeish distribution
LET A = MCLPPF(X,ALPHA) - ppf of McLeish distribution
Generalized McLeish (related to Bessel K function) distribution:
LET A = GMCCDF(X,ALPHA,A) - cdf of generalized McLeish distribution
LET A = GMCPDF(X,ALPHA,A) - pdf of generalized McLeish distribution
LET A = GMCPPF(X,ALPHA,A) - ppf of generalized McLeish distribution
C. The following random number generators, plots, and commands
were added:
LET LAMBDA = <value>
LET Y = SKEW LAPLACE RANDOM NUMBERS FOR I = 1 1 N
SKEW LAPLACE PROBABILITY PLOT Y
SKEW LAPLACE KOLMOGOROV SMIRNOV GOODNESS OF FIT Y
SKEW LAPLACE CHI-SQUARE GOODNESS OF FIT Y
SKEW LAPLACE PPCC PLOT Y
SKEW LAPLACE KS PLOT Y
LET LAMBDA = <value>
LET Y = ASYMMETRIC LAPLACE RANDOM NUMBERS FOR I = 1 1 N
ASYMMETRIC LAPLACE PROBABILITY PLOT Y
ASYMMETRIC LAPLACE KOLMOGOROV SMIRNOV GOODNESS OF FIT Y
ASYMMETRIC LAPLACE CHI-SQUARE GOODNESS OF FIT Y
ASYMMETRIC LAPLACE PPCC PLOT Y
ASYMMETRIC LAPLACE KS PLOT Y
LET Y = MAXWELL RANDOM NUMBERS FOR I = 1 1 N
MAXWELL PROBABILITY PLOT Y
MAXWELL KOLMOGOROV SMIRNOV GOODNESS OF FIT Y
MAXWELL CHI-SQUARE GOODNESS OF FIT Y
LET Y = RAYLEIGH RANDOM NUMBERS FOR I = 1 1 N
RAYLEIGH PROBABILITY PLOT Y
RAYLEIGH KOLMOGOROV SMIRNOV GOODNESS OF FIT Y
RAYLEIGH CHI-SQUARE GOODNESS OF FIT Y
LET CHI = <value>
LET LAMBDA = <value>
LET THETA = <value>
LET Y = GENERALIZED INVERSE GAUSSIAN RANDOM NUMBERS ...
FOR I = 1 1 N
GENERALIZED INVERSE GAUSSIAN PROBABILITY PLOT Y
GENERALIZED INVERSE GAUSSIAN KOLMOGOROV SMIRNOV ...
GOODNESS OF FIT Y
GENERALIZED INVERSE GAUSSIAN CHI-SQUARE ...
GOODNESS OF FIT Y
LET KAPPA = <value>
LET TAU = <value>
LET Y = GENERALIZED ASYMMETRIC LAPLACE RANDOM NUMBERS ...
FOR I = 1 1 N
GENERALIZED ASYMMETRIC LAPLACE PROBABILITY PLOT Y
GENERALIZED ASYMMETRIC LAPLACE KOLMOGOROV SMIRNOV ...
GOODNESS OF FIT Y
GENERALIZED ASYMMETRIC LAPLACE CHI-SQUARE ...
GOODNESS OF FIT Y
LET S1SQ = <value>
LET S2SQ = <value>
LET NU = <value>
LET Y = BESSEL I FUNCTION RANDOM NUMBERS FOR I = 1 1 N
BESSEL I FUNCTION PROBABILITY PLOT Y
BESSEL I FUNCTION KOLMOGOROV SMIRNOV GOODNESS OF FIT Y
BESSEL I FUNCTION CHI-SQUARE GOODNESS OF FIT Y
LET ALPHA = <value>
LET Y = MCLEISH RANDOM NUMBERS FOR I = 1 1 N
MCLEISH PROBABILITY PLOT Y
MCLEISH KOLMOGOROV SMIRNOV GOODNESS OF FIT Y
MCLEISH CHI-SQUARE GOODNESS OF FIT Y
MCLEISH PPCC PLOT Y
MCLEISH KS PLOT Y
LET ALPHA = <value>
LET A = <value>
LET Y = GENERALIZED MCLEISH RANDOM NUMBERS FOR I = 1 1 N
GENERALIZED MCLEISH PROBABILITY PLOT Y
GENERALIZED MCLEISH KOLMOGOROV SMIRNOV GOODNESS OF FIT Y
GENERALIZED MCLEISH CHI-SQUARE GOODNESS OF FIT Y
GENERALIZED MCLEISH PPCC PLOT Y
GENERALIZED MCLEISH KS PLOT Y
D. Dataplot uses the following definition for the generalized
Pareto probability density function:
f(x,gamma) = (1+gamma*x)**(-(1/gamma)-1)
However, many sources (e.g., Johnson, Kotz, and Balakrishnan)
define the generalized Pareto as:
f(x,gamma) = (1-gamma*x)**((1/gamma)-1)
That is, the sign of gamma is reversed. The following
command was added:
SET GENERALIZED PARETO DEFINITION <value>
A value of JOHNSON or KOTZ for this command
will use the second definition given. Any other value
will use the first (default) definition.
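The relationship between the two definitions can be checked numerically.
The Python sketch below transcribes the two standard-form densities given
above (the function names are hypothetical) and assumes gamma is non-zero
and x lies inside the support of both forms.

```python
def gp_pdf_default(x, gamma):
    """Density with Dataplot's default sign convention."""
    return (1.0 + gamma * x) ** (-(1.0 / gamma) - 1.0)


def gp_pdf_johnson_kotz(x, gamma):
    """Density with the Johnson, Kotz, and Balakrishnan sign convention."""
    return (1.0 - gamma * x) ** ((1.0 / gamma) - 1.0)


# Reversing the sign of gamma maps one definition onto the other.
x, gamma = 0.5, 0.3
print(gp_pdf_johnson_kotz(x, gamma), gp_pdf_default(x, -gamma))
```

The two printed values agree, so a shape estimate reported under one
convention can be converted to the other by changing its sign.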
E. For the Pareto and Pareto type 2 distributions, what is
typically referred to as the location parameter (the A
parameter) is not a location parameter in the technical
sense, because the relation
f(x;gamma,loc) = f((x-loc);gamma,0)
does not hold (it is a location parameter in the sense
that it defines a lower bound for the Pareto, but not the
Pareto type 2, distribution).
For this reason, we modified the Dataplot definition to
treat A as a second shape parameter. For example, the
Pareto PDF function is
PARPDF(x,gamma,a,loc,scale)
The A, LOC, and SCALE parameters are optional (A will
default to 1 if not given).
F. The following enhancements were made to the probability
plot and ppcc/ks plots.
Note that both the probability plot and the ppcc plot
ultimately depend on computing the percent point function
for the specified distribution. If the percent point function
is fast to compute (e.g., if it exists as a simple, closed
formula), then these plots can be generated rapidly even if the
number of data points is large. On the other hand, some percent
point functions can require a good deal of computation. For
example, some distributions compute the cumulative distribution
function via numerical integration and then compute the percent
point function by inverting the cumulative distribution
function. In these cases, the ppcc/ks plots can take too long
to generate to be practical (this tends to be less of an issue
with probability plots).
1. The following commands can be used to control how many
points are used to generate probability and ppcc/ks
plots, respectively:
SET PROBABILITY PLOT DATA POINTS <value>
SET PPCC PLOT DATA POINTS <value>
The algorithm is to compute equally spaced
percentiles of the full data set and then use these
percentiles in generating the probability and
ppcc/ks plot.
Using this command involves a trade-off between speed
and accuracy. For distributions with simple, closed
formulas or fast approximations for the percent point
function, there is little reason not to use the full data
set. However, for many distributions, the ppcc plot or
ks plot can become impractical as the number of data points
increases.
The minimum number of points is 20. The number of
points is typically set between 50 and 100. You may
want to use fewer than 50 points for a few distributions
with particularly expensive percent point functions.
For distributions with only moderately expensive percent
point functions, you may want to go as high as 100 or
200.
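The idea of replacing the full data set with equally spaced percentiles
can be sketched in Python as below. This is not Dataplot's algorithm: the
probability grid (i - 0.5)/npoints and the linear interpolation between
order statistics are assumptions chosen for the example, and the function
name is hypothetical.

```python
def equally_spaced_percentiles(data, npoints):
    """Return npoints percentiles of data at equally spaced probabilities."""
    if npoints < 20:
        raise ValueError("the minimum number of points is 20")
    xs = sorted(data)
    n = len(xs)
    if npoints >= n:
        return xs                      # nothing to gain from thinning
    out = []
    for i in range(1, npoints + 1):
        p = (i - 0.5) / npoints        # assumed probability grid
        pos = p * (n - 1)              # fractional index into the sorted data
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(xs[lo] + frac * (xs[hi] - xs[lo]))
    return out


reduced = equally_spaced_percentiles(list(range(1000)), 50)
print(len(reduced))
```

The reduced set is then passed to the plot in place of the raw data, so
the percent point function is evaluated 50 times rather than 1,000.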
2. For the ppcc (or ks) plot, each point on the plot
represents one underlying probability plot (which in
turn requires n computations of the percent point function,
where n is the sample size). For distributions with
one shape parameter, Dataplot typically uses 50 points
(i.e., there are 50 underlying probability plots
computed). For two shape parameters, Dataplot typically
uses between 20 and 50 values for each shape parameter.
It decreases the number of values used when the percent
point function is expensive to compute.
The following command allows you to explicitly specify
how many probability plots are generated by the ppcc plot:
SET PPCC PLOT AXIS POINTS <val1> <val2>
with <val1> and <val2> denoting the number of values
to use for the first and second shape parameters,
respectively. Specifying <val2> is optional.
Set these values to 0 in order to revert to the Dataplot
default.
There are actually two reasons for using this command.
If the percent point function is fast to compute (e.g.,
the Weibull distribution), you may want to increase the
number of points in order to generate a finer grid. On
the other hand, if the percent point function is
expensive to compute, you may want to decrease the
number of points to speed up the generation of the plot.
3. If the ppcc (or ks) plot has two shape parameters, then
the default graphical format is to plot the ppcc (or
ks) value on the y-axis. Each curve on the plot
represents one value of one shape parameter while the
value of the x-axis coordinate represents the value of
the other shape parameter. To reverse the roles of the
shape parameters, enter the command
SET PPCC PLOT AXIS ORDER REVERSE
To restore the default, enter
SET PPCC PLOT AXIS ORDER DEFAULT
4. The PPCC PLOT will write the following to the file
dpst2f.dat (in the current directory):
PPCC LOCATION SCALE SHAPE1 SHAPE2
VALUE PARAMETER PARAMETER PARAMETER PARAMETER
This can be useful for plotting how the estimates of location
and scale change as the shape parameter changes. In some
cases, a less optimal value of the shape parameters may
be preferred if it generates more realistic estimates for
location and scale.
5. The PROBABILITY PLOT and PPCC PLOT were updated to support
multiply censored data.
The syntax is
CENSORED PROBABILITY PLOT Y X
CENSORED PPCC PLOT Y X
The X variable identifies which points represent failure
and which represent censoring times. Specifically,
X = 1 implies a failure time and X = 0 represents a
censoring time. The word CENSORED is required to
distinguish this syntax from the syntax for binned
data. Censored probability plots and censored ppcc
plots do not apply to binned data.
Dataplot supports two algorithms for determining plot
coordinates for a censored probability plot.
i. The uniform order statistic medians are generated
based on the full sample size. However, only
values that represent a failure time are actually
plotted.
ii. Instead of uniform order statistic medians, the
plotting positions for the failure times are
computed using the Kaplan-Meier product limit
estimate:
U(i) = ((n+0.7)/(n+0.4))*
PRODUCT[q=1 to i][(n-q+0.7)/(n-q+1.7)]
with n denoting the full sample size and q denoting
failure times only. The theoretical quantile is then
the percent point function of U(i).
The censored ppcc plot is then based on the correlation
coefficient of the censored probability plot.
To specify which censoring algorithm to use, enter the
commands
SET CENSORED PROBABILITY PLOT
SET CENSORED PPCC PLOT
The default is to use the uniform order statistic medians
algorithm.
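The Kaplan-Meier plotting positions in (ii) can be sketched in Python as
follows. The sketch follows the formula exactly as written above and does
not guess at any further transformation. It assumes that q is the rank of
an observation in the full sorted sample and that only ranks belonging to
failure times enter the product; the function name is hypothetical.

```python
def km_plotting_positions(times, failed):
    """Return (time, U) pairs for the failure observations, in time order.

    failed[j] is 1 for a failure time and 0 for a censoring time.
    """
    order = sorted(range(len(times)), key=lambda j: times[j])
    n = len(times)                               # full sample size
    product = 1.0
    positions = []
    for rank, j in enumerate(order, start=1):
        if failed[j] == 1:                       # censored ranks are skipped
            product *= (n - rank + 0.7) / (n - rank + 1.7)
            positions.append((times[j], (n + 0.7) / (n + 0.4) * product))
    return positions


print(km_plotting_positions([5, 1, 3], [1, 1, 0]))
```

Only the two failure times receive a plotting position; the censored
observation at time 3 affects the result solely through its rank.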
G. The following enhancements were made to the
Kolmogorov-Smirnov goodness of fit command and the KS PLOT.
1. The KS PLOT for the binned case (KS PLOT Y X) now
automatically plots the chi-square goodness of fit
statistic rather than the Kolmogorov-Smirnov goodness of
fit statistic. This is done since the chi-square goodness
of fit is explicitly based on binned data. Note that
bins with a count less than 5 are automatically combined
so that the minimum bin count is at least 5.
2. The KS PLOT will write the following to the file
dpst2f.dat (in the current directory):
PPCC LOCATION SCALE SHAPE1 SHAPE2
VALUE PARAMETER PARAMETER PARAMETER PARAMETER
This can be useful for plotting how the estimates of location
and scale change as the shape parameter changes. In some
cases, a less optimal value of the shape parameters may
be preferred if it generates more realistic estimates for
location and scale.
2) The following graphics commands were added.
a. Univariate average shifted histograms can be generated with
the command:
ASH HISTOGRAM Y
3) The following analysis commands were added.
a. Cochran's test can be performed with the command
COCHRAN TEST Y X
where Y is a response variable and X is a group identifier
variable. Cochran's test is an alternative to the
Kruskal-Wallis test when the response variable is dichotomous
(i.e., only 2 possible values).
b. The Kruskal-Wallis test was enhanced to write the pairwise
multiple comparisons to the file dpst1f.dat.
c. Van Der Waerden's test can be performed with the command
VAN DER WAERDEN TEST Y X
where Y is a response variable and X is a group identifier
variable. Van Der Waerden's test is an alternative to
KRUSKAL WALLIS that is based on normal scores of the ranks.
4) The following statistics and LET subcommands were added.
a. Kendall's tau can be computed with the command
LET A = KENDELL TAU Y1 Y2
b. For the chi-square goodness of fit, it is generally advisable
to combine bins with small counts (typically, 5 is recommended
as a minimum bin size). To convert equal width bins to
variable width bins with a minimum bin count, enter the
commands
LET MINSIZE = <value>
LET Y2 XLOW XUPPER = COMBINE FREQUENCY TABLE Y X
c. The commands
LET Y2 X2 = ASH BINNED Y
LET Y2 X2 = COUNTS ASH BINNED Y
generate frequency tables based on the average shifted
histogram (see ASH HISTOGRAM above). The first syntax returns
the relative frequency while the second syntax returns a
count.
5) The following enhancements were made to the READ command.
a. In previous versions of Dataplot, if your data set contained
rows with an unequal number of columns, Dataplot would only
read the number of variables corresponding to the row
with the minimum number of columns.
If you would like Dataplot to pad missing columns with a
missing value, enter the command
SET READ PAD MISSING COLUMNS ON
For example, if you enter the command
READ FILE.DAT X1 X2 X3 X4 X5
then the missing columns in rows with fewer than five columns
are set to a missing value. To set the numeric value that
represents a missing value, enter
SET READ MISSING VALUE <value>
where <value> denotes the desired numeric value.
To reset the default behavior, enter the command
SET READ PAD MISSING COLUMNS OFF
In some cases, missing columns would be indicative of an
error in the data file.
b. The SUBSET/EXCEPT/FOR clause on a READ command was ambiguous.
The ambiguity arises because it is not clear whether
the SUBSET/EXCEPT/FOR clause refers to the lines in the
data file being read or to the output variables that are
created by the READ command. We address this with the
following command:
SET READ SUBSET <PACK/DISPERSE> <PACK/DISPERSE>
In this command, PACK means the SUBSET/EXCEPT/FOR clause
does not apply while DISPERSE means that it does. The
first setting applies to the input file while the second
setting applies to the created data variables.
This is demonstrated with the following example (note that
P-D means the data file is set to PACK and the output
variable is set to DISPERSE). The first column is the
data in the file while the remaining columns show what
the resulting data variable should look like.
READ FILE.DAT X FOR I = 1 2 10
  X      P-D      P-P      D-P      D-D
===========================================
  1       1        1        1        1
  2       0        2        3        0
  3       2        3        5        3
  4       0        4        7        0
  5       3        5        9        5
  6       0        6        -        0
  7       4        7        -        7
  8       0        8        -        0
  9       5        9        -        9
 10       -       10        -        -
The default setting is PACK-DISPERSE (this is the default
because this is the behavior of previous versions of Dataplot).
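The four columns of the table can be reproduced with the Python sketch
below. It is an emulation written from the table, not Dataplot's code;
the function name and arguments are hypothetical, and unfilled rows are
shown as 0 as in the table.

```python
def read_with_subset(file_values, start, step, stop, file_mode, var_mode):
    """Emulate READ ... FOR I = start step stop under SET READ SUBSET.

    file_mode and var_mode are each "PACK" or "DISPERSE".
    """
    selected = list(range(start, stop + 1, step))   # 1-based indices named by the clause
    if file_mode == "DISPERSE":
        # the clause picks which lines of the file are read
        values = [file_values[i - 1] for i in selected if i <= len(file_values)]
    else:
        values = list(file_values)                  # every line is read
    if var_mode == "PACK":
        return values                               # stored in consecutive rows
    # the clause picks which rows of the variable receive the values
    rows = selected[:len(values)]
    out = [0] * (rows[-1] if rows else 0)
    for row, value in zip(rows, values):
        out[row - 1] = value
    return out


data = list(range(1, 11))
for modes in [("PACK", "DISPERSE"), ("PACK", "PACK"),
              ("DISPERSE", "PACK"), ("DISPERSE", "DISPERSE")]:
    print(modes, read_with_subset(data, 1, 2, 10, *modes))
```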
6) Miscellaneous Updates
a. Added the command
SET POSTSCRIPT DEFAULT COLOR <ON/OFF>
Postscript devices can be either black and white or color.
Dataplot assumes black and white by default. After the
DEVICE <2/3> POSTSCRIPT command, you can enter
DEVICE <2/3> COLOR ON
Although this works fine for DEVICE 2, it presents
complications for DEVICE 3 (this is the device used by the
PP command to print the current graph to a Postscript
printer). Dataplot opens/closes this device as needed
without the user entering any commands. It can be
difficult to determine when to insert a DEVICE 3 COLOR ON
command.
If you enter
SET POSTSCRIPT DEFAULT COLOR ON
then Dataplot will assume Postscript devices are color
(this applies to both DEVICE 2 and DEVICE 3, although it
is primarily motivated for DEVICE 3 output).
b. The default algorithm for class width in Dataplot is to
use 0.3*s where s is the sample standard deviation.
A number of different algorithms have been proposed to
obtain "optimal" class widths. The command
SET HISTOGRAM CLASS WIDTH <DEFAULT/SD/NORMAL/NORMAL CORRECTED/IQ>
can be used to specify the default class width that Dataplot
will use for the HISTOGRAM and ASH HISTOGRAM commands.
Additional choices may be added in future releases.
The current choices are:
DEFAULT - use 0.3*s
SD - use 0.3*s
NORMAL - use 2.5*s/n**(1/3)
NORMAL CORRECTED - start with 2.5*s/n**(1/3). If the
skewness is between 0 and 3, multiply
this by the correction factor:
1/(1 - 0.006*skew + 0.27*skew**2 -
0.0069*skew**3).
If the kurtosis - 3 is between 0 and 6,
multiply by the correction factor:
1 - 0.2*(1 - EXP(-0.7*(kurt - 3)))
IQ - use 2.603*IQ/N**(1/3) where IQ is the
interquartile range
The NORMAL width is an optimal choice (in the sense of
minimizing the integrated mean square error of the histogram)
if the data is in fact normal. The NORMAL CORRECTED provides
correction factors for moderate skewness and kurtosis. The
IQ replaces s with a robust estimate of scale (the
interquartile range) and should provide a reasonable bin width
for a wide range of underlying distributions.
Since the "optimal" choice of bin width is dependent on
the underlying distribution of the data, it is difficult
to provide a default bin width that will work well in all
cases (we are typically using the histogram to help determine
what that underlying distribution actually is).
An explicit CLASS WIDTH command will override the default
class width algorithm.
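The class width rules listed above can be sketched in Python as follows,
using the constants exactly as they appear in this list. Several details
are assumptions and may differ from Dataplot: the moment-based skewness
and kurtosis, the quartile method (the default of statistics.quantiles),
and the use of strict inequalities for the correction ranges. The
function name class_width is hypothetical.

```python
import math
import statistics


def class_width(data, rule="DEFAULT"):
    """Return the histogram class width for the named rule."""
    n = len(data)
    s = statistics.stdev(data)                  # sample standard deviation
    if rule in ("DEFAULT", "SD"):
        return 0.3 * s
    if rule == "IQ":
        q1, _, q3 = statistics.quantiles(data, n=4)
        return 2.603 * (q3 - q1) / n ** (1 / 3)
    width = 2.5 * s / n ** (1 / 3)
    if rule == "NORMAL":
        return width
    if rule == "NORMAL CORRECTED":
        mean = sum(data) / n
        m2 = sum((x - mean) ** 2 for x in data) / n
        skew = (sum((x - mean) ** 3 for x in data) / n) / m2 ** 1.5
        kurt = (sum((x - mean) ** 4 for x in data) / n) / m2 ** 2
        if 0 < skew < 3:
            width *= 1 / (1 - 0.006 * skew + 0.27 * skew ** 2
                          - 0.0069 * skew ** 3)
        if 0 < kurt - 3 < 6:
            width *= 1 - 0.2 * (1 - math.exp(-0.7 * (kurt - 3)))
        return width
    raise ValueError(f"unknown rule: {rule}")
```

For symmetric, light-tailed data neither correction factor applies, so
NORMAL CORRECTED returns the same width as NORMAL.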
c. For the chi-square goodness of fit test, it is usually
recommended that classes with fewer than 5 observations be
combined in order to obtain a reasonably accurate
approximation. Given data that is binned into equal size
bins, you can automatically combine bins with small
frequencies with the commands
LET MINSIZE = <value>
LET Y3 XLOW XHIGH = COMBINE FREQUENCY TABLE Y2 X2
The variables XLOW and XHIGH will contain the lower and upper
boundary values for the classes (since bins will no longer be
of equal length), respectively. The value for MINSIZE defines
the minimum frequency for a class (it defaults to 5).
You can then generate a chi-square goodness of fit test
with the command
CHISQUARE GOODNESS OF FIT Y3 XLOW XHIGH
A typical sequence of commands for generating a chi-square
goodness of fit test for a discrete distribution, starting
from raw data, is
LET AMIN = MINIMUM Y
LET AMAX = MAXIMUM Y
CLASS LOWER AMIN
CLASS UPPER AMAX
CLASS WIDTH 1
LET Y2 X2 = BINNED Y
LET MINSIZE = 5
LET Y3 XLOW XHIGH = COMBINE FREQUENCY TABLE Y2 X2
CHISQUARE GOODNESS OF FIT Y3 XLOW XHIGH
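The bin-combining step can be pictured with the Python sketch below. The
merging strategy shown (sweep from the lowest bin upward, and fold any
leftover low-count bins in the upper tail into the last combined class)
is an assumption for illustration; Dataplot's COMBINE FREQUENCY TABLE may
merge bins differently. The function name is hypothetical.

```python
def combine_frequency_table(counts, midpoints, minsize=5):
    """Merge adjacent equal-width bins until each has at least minsize counts.

    Returns (new_counts, lower_bounds, upper_bounds).
    """
    if len(midpoints) < 2:
        raise ValueError("need at least two bins to infer the bin width")
    half = (midpoints[1] - midpoints[0]) / 2.0
    new_counts, lows, highs = [], [], []
    acc, low = 0, None
    for c, mid in zip(counts, midpoints):
        if low is None:
            low = mid - half                 # start a new combined class
        acc += c
        if acc >= minsize:
            new_counts.append(acc)
            lows.append(low)
            highs.append(mid + half)
            acc, low = 0, None
    if low is not None:
        # leftover bins at the upper tail
        if new_counts:
            new_counts[-1] += acc
            highs[-1] = midpoints[-1] + half
        else:
            new_counts, lows, highs = [acc], [low], [midpoints[-1] + half]
    return new_counts, lows, highs


print(combine_frequency_table([1, 2, 6, 7, 3, 1], [0, 1, 2, 3, 4, 5]))
```

The total count is preserved, and the returned boundaries play the role
of XLOW and XHIGH in the commands above.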
d. The CORRELATION MATRIX and COVARIANCE MATRIX compute the
correlation and covariance matrices, respectively, of the
columns of a matrix. If you would like these to be
generated from the rows of the matrix, you can enter the
commands
SET CORRELATION MATRIX DIRECTION ROW
SET COVARIANCE MATRIX DIRECTION ROW
To reset to the columns, enter
SET CORRELATION MATRIX DIRECTION COLUMN
SET COVARIANCE MATRIX DIRECTION COLUMN
7) Bug Fixes:
a. There was a bug reading numbers of the form
-.23
In this case, the minus sign was being lost. You can
work around this by entering the number as
-0.23
This bug is fixed in the current version.
NOTE: This bug was introduced in the 1/2004 version.
b. There was a bug reading rows containing a single character.
This has been fixed. If you encounter this bug, you can
work around it by inserting a leading space in the data
file.
NOTE: This bug was introduced in the 1/2004 version.
c. The SET commands that accepted file names as arguments did
not support quoting. Enclosing the file name in quotes is
required when the file name contains spaces or hyphens.
This has been corrected.
d. There was a bug in the SUMMARY command where in some cases
it did not extract the correct data. This has been fixed.
e. There was a bug in the KAPLAN MEIER PLOT command that caused
the censoring variable to not be recognized. This has been
corrected.
-------------------------------------------------------------------------
The following enhancements were made to DATAPLOT February - May 2004.
-------------------------------------------------------------------------
1) The following updates were made for probability distributions.
a. Support was added for the following new distributions.
Log-skew-normal distribution:
LET A = LSNCDF(X,LAMBDA,SD) - cdf of log-skew-normal
distribution
LET A = LSNPDF(X,LAMBDA,SD) - pdf of log-skew-normal
distribution
LET A = LSNPPF(P,LAMBDA,SD) - ppf of log-skew-normal
distribution
Log-skew-t distribution:
LET A = LSTCDF(X,NU,LAMBDA,SD) - cdf of log-skew-t
distribution
LET A = LSTPDF(X,NU,LAMBDA,SD) - pdf of log-skew-t
distribution
LET A = LSTPPF(P,NU,LAMBDA,SD) - ppf of log-skew-t
distribution
G-and-H distribution:
LET A = GHCDF(X,G,H) - cdf of g-and-h distribution
LET A = GHPDF(X,G,H) - pdf of g-and-h distribution
Note that the ppf function was added in a previous update.
Hermite distribution:
LET A = HERCDF(X,A,B) - cdf of Hermite distribution
LET A = HERPDF(X,A,B) - pdf of Hermite distribution
LET A = HERPPF(P,A,B) - ppf of Hermite distribution
Yule distribution:
LET A = YULCDF(X,P) - cdf of Yule distribution
LET A = YULPDF(X,P) - pdf of Yule distribution
LET A = YULPPF(P,P) - ppf of Yule distribution
b. The following pdf functions were added (these distributions
previously supported the cdf and ppf functions).
LET A = NCTPDF(X,NU,LAMBDA) - pdf of non-central t
LET A = DNTPDF(X,NU,L1,L2) - pdf of doubly non-central t
LET A = NCCPDF(X,NU,LAMBDA) - pdf of non-central chi-square
LET A = NCFPDF(X,NU1,NU2,L1) - pdf of non-central F
LET A = DNFPDF(X,NU1,NU2,L1,L2) - pdf of doubly non-central F
LET A = NCBPDF(X,A,B,LAMBDA) - pdf of non-central Beta
These pdf functions are computed by taking the numerical
derivative of the corresponding cdf function. You may
at times get warning messages that the derivative has not
converged with sufficient accuracy (this occurs most frequently
with the non-central Beta distribution).
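The general idea of obtaining a pdf as the numerical derivative of a cdf
can be sketched in Python as below. A central difference with a fixed
step is used here purely for illustration; the method and convergence
checks that Dataplot actually uses are not described in this file. The
normal cdf stands in for one of the non-central cdf functions, and both
function names are hypothetical.

```python
import math


def pdf_from_cdf(cdf, x, h=1e-4):
    """Approximate the density at x by a central difference of the cdf."""
    return (cdf(x + h) - cdf(x - h)) / (2.0 * h)


def normal_cdf(x):
    """Standard normal cdf, used only as a stand-in test function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


approx = pdf_from_cdf(normal_cdf, 0.0)
exact = 1.0 / math.sqrt(2.0 * math.pi)
print(approx, exact)
```

The accuracy depends on the step size and on how accurately the cdf
itself is computed, which is consistent with the convergence warnings
mentioned above.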
c. The following enhancements were made to maximum likelihood
estimation.
1. The binomial case now generates lower and upper confidence
limits based on the Agresti and Coull approximation.
2. The lognormal case now generates confidence limits for
the shape and scale parameters.
3. Support was added for the following distributions:
LOGARITHMIC SERIES MAXIMUM LIKELIHOOD Y
GEOMETRIC MAXIMUM LIKELIHOOD Y
BETA BINOMIAL MAXIMUM LIKELIHOOD Y
NEGATIVE BINOMIAL MAXIMUM LIKELIHOOD Y
HYPERGEOMETRIC MAXIMUM LIKELIHOOD Y
HERMITE MAXIMUM LIKELIHOOD Y
YULE MAXIMUM LIKELIHOOD Y
FATIGUE LIFE MAXIMUM LIKELIHOOD Y
GEOMETRIC EXTREME EXPONENTIAL MAXIMUM LIKELIHOOD Y
FOLDED NORMAL MAXIMUM LIKELIHOOD Y
CAUCHY MAXIMUM LIKELIHOOD Y
4. For the Johnson SU/SB distribution, a percentile
estimator is now available (a method of moments
estimator was previously available):
JOHNSON PERCENTILE Y
Note that this estimator will automatically determine
whether a SB or SU estimator is appropriate. Also, you
can define a constant Z used by this estimator by
entering the command (before the JOHNSON PERCENTILE
command):
LET Z = <value>
This value is typically set between 0.5 and 1 with a
default value of 0.54. As the sample size gets larger,
then values of Z closer to 1 are appropriate (e.g.,
for a sample of size 1,000, a value of 0.8 works well).
5. Support for Latex and HTML output was added to most
supported distributions.
d. The following random number generators were added:
LET NU = <value>
LET LAMBDA = <value>
LET Y = NONCENTRAL T RANDOM NUMBERS FOR I = 1 1 N
LET NU = <value>
LET LAMBDA1 = <value>
LET LAMBDA2 = <value>
LET Y = DOUBLY NONCENTRAL T RANDOM NUMBERS FOR I = 1 1 N
LET NU = <value>
LET LAMBDA = <value>
LET Y = NONCENTRAL BETA RANDOM NUMBERS FOR I = 1 1 N
LET GAMMA = <value>
LET Y = GENERALIZED LOGISTIC RANDOM NUMBERS FOR I = 1 1 N
LET GAMMA = <value>
LET Y = GENERALIZED HALF-LOGISTIC RANDOM NUMBERS FOR I = 1 1 N
LET ALPHA =
LET BETA =
LET Y = HERMITE RANDOM NUMBERS FOR I = 1 1 N
LET P =
LET Y = YULE RANDOM NUMBERS FOR I = 1 1 N
LET A =
LET C =
LET Y = WARING RANDOM NUMBERS FOR I = 1 1 N
LET A =
LET B =
LET C =
LET Y = GENERALIZED WARING RANDOM NUMBERS FOR I = 1 1 N
The t, F, and chi-square random number generators were
updated to accept non-integer values for the degrees of
freedom parameters.
e. The following additions were made to the probability plot,
Kolmogorov-Smirnov goodness of fit, chi-square goodness of
fit, and ppcc plot commands:
LET LAMBDA = <VALUE>
LET SD = <VALUE>
LOG SKEW NORMAL PROBABILITY PLOT Y
LOG SKEW NORMAL KOLMOGOROV-SMIRNOV GOODNESS OF FIT Y
LOG SKEW NORMAL CHI-SQUARE GOODNESS OF FIT Y
LOG SKEW NORMAL PPCC PLOT Y
LET LAMBDA = <VALUE>
LET SD = <VALUE>
LET NU = <VALUE>
LOG SKEW T PROBABILITY PLOT Y
LOG SKEW T KOLMOGOROV-SMIRNOV GOODNESS OF FIT Y
LOG SKEW T CHI-SQUARE GOODNESS OF FIT Y
LET G = <VALUE>
LET H = <VALUE>
G AND H PROBABILITY PLOT Y
G AND H KOLMOGOROV-SMIRNOV GOODNESS OF FIT Y
G AND H CHI-SQUARE GOODNESS OF FIT Y
G AND H PPCC PLOT Y
LET ALPHA = <VALUE>
LET BETA = <VALUE>
HERMITE PROBABILITY PLOT Y
HERMITE CHI-SQUARE GOODNESS OF FIT Y
HERMITE PPCC PLOT Y
LET P = <VALUE>
YULE PROBABILITY PLOT Y
YULE CHI-SQUARE GOODNESS OF FIT Y
YULE PPCC PLOT Y
f. The Anderson Darling test was updated to support the
generalized Pareto distribution:
ANDERSON-DARLING GENERALIZED PARETO TEST Y
The maximum likelihood estimation for the generalized
Pareto is still undergoing algorithmic development, so
you should specify the shape and scale parameter for
the generalized Pareto (before invoking the Anderson-Darling
test) as follows:
LET GAMMA = <VALUE>
LET A = <VALUE>
g. An optional definition was added for the geometric
distribution.
The default definition for the geometric distribution is the
number of failures before the first success is obtained in
a sequence of Bernoulli trials. The alternate definition
is the number of trials up to and including the first
success in a series of Bernoulli trials. This definition
simply shifts the geometric distribution to start at X = 1
rather than X = 0.
To specify the alternate definition, enter the command
SET GEOMETRIC DEFINITION DLMF
To restore the default definition, enter the command
SET GEOMETRIC DEFINITION JOHNSON AND KOTZ
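The relationship between the two definitions can be checked with a short
Python sketch (not Dataplot code; the function names are hypothetical).
It shows that the alternate definition is the default one shifted by one.

```python
def geometric_pmf_failures(x, p):
    """P(X = x) where X counts failures before the first success (x = 0, 1, ...)."""
    if x < 0:
        return 0.0
    return p * (1.0 - p) ** x


def geometric_pmf_trials(x, p):
    """P(X = x) where X counts trials up to and including the first success (x = 1, 2, ...)."""
    if x < 1:
        return 0.0
    return p * (1.0 - p) ** (x - 1)


p = 0.3
for x in range(4):
    # The two columns agree: the "trials" pmf at x + 1 equals the "failures" pmf at x.
    print(x, geometric_pmf_failures(x, p), geometric_pmf_trials(x + 1, p))
```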
h. The negative binomial was updated to support non-integer
arguments for the number of failures shape parameter
(i.e., k).
i. A number of bug fixes and algorithmic improvements were made
for the ppcc plots with two shape parameters and the random
number generation for a few distributions.
2. The following enhancements were made to the PPCC PLOT and
PROBABILITY PLOT commands.
a. For some long tailed distributions, there can be large
variability in the tails. This can distort the estimates
of location, PPA0, and scale, PPA1, of the line fitted
to the probability plot. To address this, Dataplot now
also returns PPA0BW and PPA1BW. These are the estimates
obtained by performing two iterations of biweight
weighting of the residuals.
In most cases, the use of PPA0 and PPA1 is preferred.
However, if the probability plot indicates the presence
of extreme outliers in the tails, PPA0BW and PPA1BW may
provide better estimates for the location and scale
parameters.
b. The following command was added as a variant of the
ppcc plot:
<DIST> KS PLOT Y
where <DIST> is any of the distributions supported by
the PPCC PLOT command.
This plot uses a similar concept to the ppcc plot.
However, it uses the value of the Kolmogorov-Smirnov
goodness of fit statistic rather than the correlation
coefficient of the probability plot as the measure
of distributional fit. In this case, the goal is to minimize
the Kolmogorov-Smirnov goodness of fit statistic.
Although we are still developing experience with this
plot, a few preliminary recommendations are:
1. For most continuous distributions with one shape
parameter, the PPCC PLOT and KS PLOT generate similar
estimates for the shape parameter.
2. The KS PLOT seems to perform better for at least some
distributions with two shape parameters.
3. The KS PLOT generates a smoother plot for discrete
distributions.
For additional information, enter
HELP KS PLOT
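The idea behind the KS PLOT can be sketched in Python as follows. This is
not Dataplot's implementation: it assumes a standard Weibull (location 0,
scale 1) so that only the shape parameter varies, and the function names
and the grid of shape values are chosen for illustration only.

```python
from math import exp, log


def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov statistic of the data against a fully specified cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Compare the cdf with the empirical cdf just after and just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def weibull_cdf(x, gamma):
    """Standard Weibull cdf (location 0, scale 1) with shape gamma."""
    return 1.0 - exp(-(x ** gamma)) if x > 0 else 0.0


def ks_curve(data, shapes):
    """Return (shape, KS statistic) pairs, i.e. the points a KS plot would show."""
    return [(g, ks_statistic(data, lambda x, g=g: weibull_cdf(x, g))) for g in shapes]


# Illustrative data: exact quantiles of a standard Weibull with shape 2.
n = 50
data = [(-log(1.0 - (i - 0.5) / n)) ** 0.5 for i in range(1, n + 1)]
shapes = [0.5 + 0.25 * k for k in range(15)]  # 0.5, 0.75, ..., 4.0
curve = ks_curve(data, shapes)

# The shape estimate is the grid value that minimizes the KS statistic.
best_shape, best_ks = min(curve, key=lambda pair: pair[1])
print(best_shape, best_ks)
```

For the PPCC PLOT the same loop would record the probability plot
correlation coefficient and take the maximum instead of the minimum.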
c. For the PPCC PLOT and KS PLOT, the following command
allows you to specify the desired format for the
plot when there are two shape parameters:
SET PPCC FORMAT <TRACE/3D>
For the default setting, TRACE, these plots are generated
as a multi-trace 2D plot. That is, the Y axis will
represent the correlation (or value of the
Kolmogorov-Smirnov statistic), the X axis will represent
the value of the second shape parameter, and each trace
will represent one of the values for the first shape
parameter.
If this value is set to 3D, the plot is represented as
a 3D surface plot.
3. Sometimes data may only be available in the form of a frequency
table. However, some Dataplot commands may expect the data
in a "raw" format. The following command was added to convert
frequency data to raw data:
LET Y = FREQUENCY TO RAW X FREQ
For example,
X FREQ
--------
0 3
1 2
2 4
would be converted to
0
0
0
1
1
2
2
2
2
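The conversion shown above amounts to repeating each value FREQ times.
A minimal Python sketch (not Dataplot code; the function name is
hypothetical and frequencies are assumed to be non-negative integers):

```python
def frequency_to_raw(values, freqs):
    """Expand a frequency table into raw data by repeating each value freq times."""
    raw = []
    for value, freq in zip(values, freqs):
        raw.extend([value] * int(freq))
    return raw


# Same table as the example above: X = 0, 1, 2 with FREQ = 3, 2, 4.
print(frequency_to_raw([0, 1, 2], [3, 2, 4]))
```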
-------------------------------------------------------------------------
The following enhancements were made to DATAPLOT June 2003-January 2004.
-------------------------------------------------------------------------
1) The following enhancements were made to the Dataplot I/O
capabilities.
a) Previously, the Dataplot READ command was updated to
handle the syntax
READ FILE.DAT
In this case, Dataplot simply assigns the names X1, X2,
and so on to the variables. Many packages accept data
files where the first line contains the variable names.
To support this in Dataplot, do the following:
SET READ VARIABLE LABEL ON
READ FILE.DAT
In this case, Dataplot will interpret the first line
read as the variable names in the file.
b) Dataplot has previously not supported reading character
variables in data files (with the one exception of READ ROW
LABELS). If encountered, Dataplot would generate an error
message and not read the data file correctly. To address
this, we have added the command
SET CONVERT CHARACTER <ON/IGNORE/ERROR>
Setting this to ERROR will continue the current Dataplot
action of reporting an error. This is recommended for the
case when a file is supposed to contain only numeric data
and the presence of character data is in fact indicative
of an error in the data file. Setting this to IGNORE will
instruct Dataplot to simply ignore any fields containing
character data. Setting this to ON will read character fields
and write them to the file "dpzchf.dat".
There are some restrictions on when Dataplot will try to
read character data:
1) This only applies to the variable read case. That
is, READ PARAMETER and READ MATRIX will ignore
character fields or treat them as an error.
2) Dataplot will only try to read character data from
a file. When reading from the keyboard (i.e., when
READ is specified with no file name), character data
will be ignored even when SET CONVERT CHARACTER ON is
specified.
3) This capability is not supported for the SERIAL READ
case.
4) The SET READ FORMAT command does not accept the
"A" format specification for reading character
fields.
Some of these restrictions may be addressed in subsequent
releases of Dataplot.
Enter HELP CONVERT CHARACTER for details.
c) The COLUMN LIMITS command has been updated to accept
variable arguments. For example,
COLUMN LIMITS LOWER UPPER
with LOWER and UPPER denoting variables (as opposed to
parameters) each with N elements. Dataplot will parse
the data file assuming that field one of the data is in
columns LOWER(1) to UPPER(1), field two of the data is
in LOWER(2) to UPPER(2) and so on. Note that only one
numeric or character variable will be read in each field.
Many programs, Excel for example, will write data to ASCII
files with the data values either left or right justified
to a given column. If the ASCII file is written so that
the decimal point is in a fixed column, then using the
SET READ FORMAT is typically recommended rather than
the COLUMN LIMITS with variable arguments.
If the data file contains columns of equal length, then
using this form of the COLUMN LIMITS command is not
necessary. However, there are two cases where it is useful:
1) If you only want to read selected fields in the data
file, then this form of the COLUMN LIMITS command
easily allows you to do this.
2) If the data columns are of unequal length, as ASCII
files created from Excel often are, then this form
of the COLUMN LIMITS allows these data files to be
read correctly. If a given field is empty, Dataplot
interprets it as a missing value.
By default, Dataplot will set the missing value to 0.
If you would like to specify a value other than zero,
then enter the command
SET READ MISSING VALUE <VALUE>
where <VALUE> is the desired value.
Enter HELP COLUMN LIMITS for details.
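The field extraction described above can be sketched in Python. This is
an illustration only, not Dataplot's parser: the function name is
hypothetical, only numeric fields are handled, and columns are assumed
to be 1-based and inclusive as in the LOWER(i) to UPPER(i) description.

```python
def read_fixed_columns(line, lower, upper, missing=0.0):
    """Extract one numeric field per (lower, upper) column pair from a text line."""
    fields = []
    for lo, hi in zip(lower, upper):
        # Convert 1-based inclusive column limits to a Python slice.
        text = line[lo - 1:hi].strip()
        # An empty field is treated as a missing value.
        fields.append(float(text) if text else missing)
    return fields


# Three 4-column fields; the middle one is empty.
line = " 1.5" + "    " + "3.25"
print(read_fixed_columns(line, [1, 5, 9], [4, 8, 12]))
print(read_fixed_columns(line, [1, 5, 9], [4, 8, 12], missing=-999.0))
```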
d) If Excel writes a comma delimited ASCII file (.CSV), then
missing values are denoted with ",,". In order to interpret
these files correctly, you can enter the command
SET READ DELIMITER <CHARACTER>
where <CHARACTER> specifies the desired delimiter. The default
delimiter is a comma.
If Dataplot encounters the delimiter before any valid data
has been found, it interprets this as a missing value.
Missing values are set to 0 unless a SET READ MISSING VALUE
command has been entered (see above).
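The handling of empty delimited fields can be illustrated with a small
Python sketch. It is not Dataplot code; the function name and the
missing-value argument are assumptions that mirror the SET READ DELIMITER
and SET READ MISSING VALUE settings described above.

```python
def parse_delimited(line, delimiter=",", missing=0.0):
    """Split a line on the delimiter; empty fields become the missing value."""
    fields = []
    for text in line.split(delimiter):
        text = text.strip()
        fields.append(float(text) if text else missing)
    return fields


print(parse_delimited("1,,3"))                         # middle field is missing
print(parse_delimited("1;;3", delimiter=";", missing=-99.0))
```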
We have added a section in the online help files that provides
general guidance on reading ASCII data files in Dataplot.
This consolidates information documented under a number of
different commands. For details, enter
HELP ASCII FILES
2) The SET CONVERT CHARACTER ON command allows you to read
character variables. We have added the following commands
that operate on these character variables.
a) Many character variables are in fact group-id variables.
In order to allow you to use these group-id variables
in a numeric context, the following two commands were added:
LET Y = CHARACTER CODE IX
LET Y = ALPHABETIC CHARACTER CODE IX
with IX denoting the name of a character variable that
has been read into Dataplot and Y denoting the name of a
numeric variable that will be created by this command.
Both of these commands identify the unique rows in the
character variable (Dataplot checks for exact matches; it
does not try to guess whether a typo has occurred). If
there are K unique rows, Dataplot will generate coded values
as the integer values from 1 to K. The distinction is that
CHARACTER CODE will perform the coding in the order that the
unique rows are encountered in the file while ALPHABETIC
CHARACTER CODE will sort the unique character rows and
code based on the alphabetic order.
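The difference between the two codings can be seen in the following
Python sketch (not Dataplot code; the function names are hypothetical
and matching is exact, as described above).

```python
def character_code(labels):
    """Code unique labels 1..K in the order they are first encountered."""
    codes = {}
    coded = []
    for label in labels:
        if label not in codes:
            codes[label] = len(codes) + 1
        coded.append(codes[label])
    return coded


def alphabetic_character_code(labels):
    """Code unique labels 1..K according to their sorted (alphabetic) order."""
    codes = {label: k for k, label in enumerate(sorted(set(labels)), start=1)}
    return [codes[label] for label in labels]


months = ["MAR", "JAN", "MAR", "FEB"]
print(character_code(months))             # order of appearance
print(alphabetic_character_code(months))  # alphabetic order
```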
b) Character variables are frequently used as group-id
variables (e.g., Male and Female to identify sex). The
following command creates a group-id variable from a
character variable:
LET IG = GROUP LABELS MONTH
with MONTH denoting the name of a character variable.
The name IG will be used to denote a group-id variable.
The number of rows in IG will be equal to the number of
unique rows in MONTH. Up to 5 group-id variables can be
created and the maximum number of rows for a group-id
variable is the maximum number of rows for a numeric
variable divided by 100.
c) You can create a row label variable with the READ ROW LABEL
command. Alternatively, you can now enter the command
LET ROWLABEL = MONTH
with MONTH denoting the name of a character variable.
Note that the variable name on the left hand side of the
"=" must be ROWLABEL for this command to work.
d) The TIC MARK LABEL FORMAT and TIC MARK LABEL CONTENT commands
have been updated to support the following:
TIC MARK LABEL FORMAT GROUP LABEL
TIC MARK LABEL CONTENT IG
TIC MARK LABEL FORMAT ROW LABEL
TIC MARK LABEL FORMAT VARIABLE
TIC MARK LABEL CONTENT YVAR
Setting the tic mark label format to GROUP LABEL instructs
Dataplot to use a group label variable for the contents
of the tic mark label. The TIC MARK LABEL CONTENT command
is then used to specify the name of the group label variable
to use.
Setting the tic mark label format to VARIABLE is similar to
the GROUP LABEL case. However, in this case a numeric
variable is specified rather than a group label variable.
This allows you to place your own numeric tic mark labels.
For example, you can use this to generate a "reverse" axis.
Setting the tic mark label format to ROW LABEL allows you
to use the row labels as the content for the tic mark labels.
For example, this can be useful for labeling a bar chart.
3) Support for the following univariate distributions was added:
LET A = TRACDF(X,A,B,C,D) - cdf of trapezoid distribution
LET A = TRAPDF(X,A,B,C,D) - pdf of trapezoid distribution
LET A = TRAPPF(P,A,B,C,D) - ppf of trapezoid distribution
LET A = GTRCDF(X,A,B,C,D,NU1,NU3,ALPHA) - cdf of generalized
trapezoid distribution
LET A = GTRPDF(X,A,B,C,D,NU1,NU3,ALPHA) - pdf of generalized
trapezoid distribution
LET A = GTRPPF(P,A,B,C,D,NU1,NU3,ALPHA) - ppf of generalized
trapezoid distribution
LET A = FTCDF(X,NU) - cdf of folded t distribution
LET A = FTPDF(X,NU) - pdf of folded t distribution
LET A = FTPPF(P,NU) - ppf of folded t distribution
LET A = SNCDF(X,ALPHA) - cdf of skew normal distribution
LET A = SNPDF(X,ALPHA) - pdf of skew normal distribution
LET A = SNPPF(P,ALPHA) - ppf of skew normal distribution
LET A = STCDF(X,NU,ALPHA) - cdf of skew t distribution
LET A = STPDF(X,NU,ALPHA) - pdf of skew t distribution
LET A = STPPF(P,NU,ALPHA) - ppf of skew t distribution
LET A = SLACDF(X) - cdf of slash distribution
LET A = SLAPPF(P) - ppf of slash distribution
LET A = IBCDF(X,ALPHA,BETA) - cdf of inverted beta distribution
LET A = IBPPF(P,ALPHA,BETA) - ppf of inverted beta distribution
LET A = GHCDF(X,G,H) - cdf of g-and-h distribution
LET A = GHPPF(P,G,H) - ppf of g-and-h distribution
LET A = MAKCDF(X,XI,L,T) - cdf of Gompertz-Makeham distribution
LET A = MAKPDF(X,XI,L,T) - pdf of Gompertz-Makeham distribution
LET A = MAKPPF(P,XI,L,T) - ppf of Gompertz-Makeham distribution
LET A = ZIPPDF(X,ALPHA) - pdf of Zipf distribution
Note that the IBPDF and SLAPDF functions were implemented
previously. The GHPDF function is still under development.
You can generate random numbers for these distributions
with the commands
LET A = <VALUE>
LET B = <VALUE>
LET C = <VALUE>
LET D = <VALUE>
LET Y = TRAPEZOID RANDOM NUMBERS FOR I = 1 1 N
LET A = <VALUE>
LET B = <VALUE>
LET C = <VALUE>
LET D = <VALUE>
LET NU1 = <VALUE>
LET NU3 = <VALUE>
LET ALPHA = <VALUE>
LET Y = GENERALIZED TRAPEZOID RANDOM NUMBERS FOR I = 1 1 N
LET NU = <VALUE>
LET Y = FOLDED T RANDOM NUMBERS FOR I = 1 1 N
LET ALPHA = <VALUE>
LET Y = SKEWED NORMAL RANDOM NUMBERS FOR I = 1 1 N
LET NU = <VALUE>
LET ALPHA = <VALUE>
LET Y = SKEWED T RANDOM NUMBERS FOR I = 1 1 N
LET G = <VALUE>
LET H = <VALUE>
LET Y = G AND H RANDOM NUMBERS FOR I = 1 1 N
LET XI = <VALUE>
LET LAMBDA = <VALUE>
LET THETA = <VALUE>
LET Y = GOMPERTZ-MAKEHAM RANDOM NUMBERS FOR I = 1 1 N
LET ALPHA = <VALUE>
LET Y = ZIPF RANDOM NUMBERS FOR I = 1 1 N
Random numbers for the slash and inverted beta distributions
were added previously.
You can generate the following probability plots and goodness
of fit tests
LET A = <VALUE>
LET B = <VALUE>
LET C = <VALUE>
LET D = <VALUE>
TRAPEZOID PROBABILITY PLOT Y
TRAPEZOID KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
TRAPEZOID CHI-SQUARE GOODNESS OF FIT TEST Y
LET A = <VALUE>
LET B = <VALUE>
LET C = <VALUE>
LET D = <VALUE>
LET NU1 = <VALUE>
LET NU3 = <VALUE>
LET ALPHA = <VALUE>
GENERALIZED TRAPEZOID PROBABILITY PLOT Y
GENERALIZED TRAPEZOID KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
GENERALIZED TRAPEZOID CHI-SQUARE GOODNESS OF FIT TEST Y
LET NU = <VALUE>
FOLDED T PROBABILITY PLOT Y
FOLDED T KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
FOLDED T CHI-SQUARE GOODNESS OF FIT TEST Y
FOLDED T PPCC PLOT Y
LET NU = <VALUE>
LET LAMBDA = <VALUE>
SKEW T PROBABILITY PLOT Y
SKEW T KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
SKEW T CHI-SQUARE GOODNESS OF FIT TEST Y
SKEW T PPCC PLOT Y
LET LAMBDA = <VALUE>
SKEW NORMAL PROBABILITY PLOT Y
SKEW NORMAL KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
SKEW NORMAL CHI-SQUARE GOODNESS OF FIT TEST Y
SKEW NORMAL PPCC PLOT Y
LET G = <VALUE>
LET H = <VALUE>
G AND H PROBABILITY PLOT Y
G AND H KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
G AND H CHI-SQUARE GOODNESS OF FIT TEST Y
G AND H PPCC PLOT Y
LET XI = <VALUE>
LET LAMBDA = <VALUE>
LET THETA = <VALUE>
GOMPERTZ-MAKEHAM PROBABILITY PLOT Y
GOMPERTZ-MAKEHAM KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
GOMPERTZ-MAKEHAM CHI-SQUARE GOODNESS OF FIT TEST Y
c) Added the following commands
JOHNSON SU MOMENTS Y
JOHNSON SB MOMENTS Y
to compute method of moment estimates for the Johnson SU
and Johnson SB distributions.
d) The GUMBEL MAXIMUM LIKELIHOOD command was extended to support
both the minimum and maximum cases (the previous version was
restricted to the maximum case). Before the GUMBEL MAXIMUM
LIKELIHOOD command, enter the command
SET MINMAX 1
to specify the minimum case and
SET MINMAX 2
to specify the maximum case.
e) Enter the following command to generate Dirichlet random numbers:
LET M = DIRICHLET RANDOM NUMBERS ALPHA N
with ALPHA denoting a vector containing the shape parameters of
the Dirichlet distribution and N denoting a scalar that specifies
the number of rows to generate. M will be a matrix with N rows
and k columns (where k is the number of elements in the ALPHA
vector).
You can also compute the Dirichlet probability density or the
log of the Dirichlet probability density with the commands
LET M = DIRICHLET PDF X ALPHA
LET M = DIRICHLET LOG PDF X ALPHA
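For readers unfamiliar with the shape of the output, the following Python
sketch generates an N by k matrix of Dirichlet variates by normalizing
independent gamma variates. This is a standard construction; it is an
assumption that it resembles what Dataplot does internally, and the
function name is hypothetical.

```python
import random


def dirichlet_random_numbers(alpha, n, rng=random):
    """Return n rows, each a Dirichlet(alpha) variate with len(alpha) columns."""
    rows = []
    for _ in range(n):
        # Independent gamma variates with shapes alpha_j and unit scale ...
        gammas = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(gammas)
        # ... divided by their sum follow a Dirichlet(alpha) distribution.
        rows.append([g / total for g in gammas])
    return rows


random.seed(12345)
m = dirichlet_random_numbers([2.0, 3.0, 5.0], 4)
for row in m:
    print(row, sum(row))  # each row sums to 1 (up to rounding)
```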
f) Enter the following command to generate correlated uniform
random numbers:
LET U = MULTIVARIATE UNIFORM RANDOM NUMBERS SIGMA N
with SIGMA denoting the variance-covariance matrix of
a multivariate normal distribution and N denoting the number
of rows to generate.
g) The Anderson-Darling goodness of fit test was enhanced to
include the following distributions:
ANDERSON-DARLING LOGISTIC TEST Y
ANDERSON-DARLING DOUBLE EXPONENTIAL TEST Y
ANDERSON-DARLING UNIFORM TEST Y
The uniform case is for the uniform distribution on the
(0,1) interval. This can also be used for fully specified
distributions (i.e., the shape, location, and scale
parameters are not estimated from the data). Simply
calculate the appropriate CDF function with the specified
shape, location, and scale parameters (this converts the
data to the (0,1) interval) and apply the test for a
uniform distribution.
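The approach for fully specified distributions can be sketched in Python:
transform the data with the specified cdf, then compute the
Anderson-Darling statistic against the uniform (0,1) distribution. This
is not Dataplot code; the exponential cdf, its scale, and the data values
are illustrative assumptions, and critical values are not computed here.

```python
from math import exp, log


def anderson_darling_uniform(u):
    """Anderson-Darling statistic A**2 for values assumed uniform on (0,1)."""
    us = sorted(u)
    n = len(us)
    total = 0.0
    for i, ui in enumerate(us, start=1):
        # us[n - i] is the (n + 1 - i)-th order statistic in 1-based terms.
        total += (2 * i - 1) * (log(ui) + log(1.0 - us[n - i]))
    return -n - total / n


def exponential_cdf(x, scale):
    """Cdf of an exponential distribution with location 0 and the given scale."""
    return 1.0 - exp(-x / scale)


data = [0.2, 0.5, 0.9, 1.4, 2.3]                    # illustrative values
u = [exponential_cdf(x, scale=1.0) for x in data]   # fully specified cdf
a2 = anderson_darling_uniform(u)
print(a2)
```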
h) The following maximum likelihood estimation commands were
added:
LOGISTIC MAXIMUM LIKELIHOOD Y
UNIFORM MAXIMUM LIKELIHOOD Y
BETA MAXIMUM LIKELIHOOD Y
The BETA and UNIFORM cases generate both method of moments and
maximum likelihood estimates.
The beta case estimates the lower and upper limits of the
data from the minimum and maximum data values, respectively,
and then computes the maximum likelihood estimates for the
alpha and beta shape parameters.
i) Support was added for the following random number
generators:
1) FIBONACCI CONGRUENTIAL - a mixture of the Fibonacci generator
with a congruential generator
2) MERSENNE TWISTER - Fortran 90 implementation of the
Mersenne twister generator (may not be
valid on platforms that are compiled
with Fortran 77 compilers)
Enter HELP RANDOM NUMBER GENERATOR for details.
j) Fixed the inverse gaussian and reciprocal inverse gaussian
probability functions. The MU parameter was treated as a
location parameter in the original implementation. However, it
is really a shape parameter. So IGPDF and RIGPDF can now be
called via
IGPDF(X,GAMMA,MU,LOC,SCALE)
RIGPDF(X,GAMMA,MU,LOC,SCALE)
The MU parameter is treated as an optional parameter (LOC and
SCALE are also optional). MU is set to 1 if it is omitted.
The MU parameter can also be specified for random numbers
and probability plots. If the MU parameter is not set, it
will automatically be set to 1 (no error message is printed).
The PPCC plot for these two distributions is now generated for
both the gamma and mu parameters (i.e., a 3D plot is generated).
If you want the PPCC plot assuming MU = 1 for the inverse
gaussian case, you can use the WALD PPCC PLOT command (the
Wald distribution is a special case of the inverse gaussian
where MU is set to 1).
4) Added the following analysis commands:
a) Support for linear and quadratic calibration is available via
the following commands:
LINEAR CALIBRATION Y X Y0
QUADRATIC CALIBRATION Y X Y0
The LINEAR CALIBRATION command performs a linear calibration
analysis using eight different methods. The QUADRATIC
CALIBRATION command performs a quadratic calibration analysis
using three different methods.
Enter HELP CALIBRATION for details.
b) The Friedman test for two-way analysis of variance on ranks
is supported with the command
FRIEDMAN TEST Y BLOCK TREATMENT
Enter
HELP FRIEDMAN TEST
for details.
c) The frequency and cumulative sum tests for randomness are
supported with the commands
FREQUENCY TEST Y
LET M = <VALUE>
FREQUENCY WITHIN A BLOCK TEST Y
CUMULATIVE SUM TEST Y
These tests are used for sequences of 0's and 1's (Dataplot
just checks for two distinct values, the higher value is
set to 1 and the lower value is set to 0).
To test a uniform random number generator, do something like
the following:
LET N = 1
LET P = 0.5
LET Y = BINOMIAL RANDOM NUMBERS FOR I = 1 1 10000
FREQUENCY TEST Y
For details, enter
HELP FREQUENCY TEST
HELP CUMULATIVE SUM TEST
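As a rough illustration of the frequency test, the Python sketch below
follows the monobit formulation published in NIST SP 800-22. It is an
assumption that Dataplot's FREQUENCY TEST uses this formulation; the
function name is hypothetical, and the data are assumed to contain two
distinct values as described above.

```python
from math import erfc, sqrt


def frequency_test_pvalue(y):
    """P-value of the monobit frequency test for a two-valued sequence."""
    high = max(y)
    # The higher value counts as +1 and the lower value as -1.
    s = sum(1 if v == high else -1 for v in y)
    s_obs = abs(s) / sqrt(len(y))
    # Small p-values indicate too many of one value for a random sequence.
    return erfc(s_obs / sqrt(2.0))


balanced = [0, 1] * 50          # equal numbers of 0's and 1's
lopsided = [1] * 99 + [0]       # almost all 1's
print(frequency_test_pvalue(balanced))
print(frequency_test_pvalue(lopsided))
```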
5) The following enhancements were made to the BOOTSTRAP PLOT command
a) Extended the grouped case to handle two groups (previously
one group was supported).
b) For the grouped case (either one or two groups), the following
information is written to file:
DPST1F.DAT - the full set of bootstrap estimates for the
statistic (group-id in column 1, bootstrap
statistic in column 2)
DPST2F.DAT - writes the group-id and the corresponding mean,
standard deviation, and the 0.025, 0.975, 0.05,
0.95, 0.005, and 0.995 quantiles
c) Added the following form of the command
BCA BOOTSTRAP PLOT Y
This generates BCa bootstrap confidence intervals as defined
by Efron. At the expense of additional computation, it
generates bootstrap confidence intervals that are second order
accurate (the percentile bootstrap confidence intervals are
first order accurate).
Enter HELP BOOTSTRAP PLOT for further information.
6) The CAPTURE HTML (for generating Dataplot output in HTML format)
capability has been extended to additional analysis commands.
In addition, Dataplot output can now be generated in Latex format
with the command
CAPTURE LATEX file.tex
with "file.tex" denoting the name where the Latex output is
generated. An END OF CAPTURE terminates the generation of
Latex output.
The CAPTURE HTML and CAPTURE LATEX commands now generate formatted
output for the following commands:
SUMMARY
TABULATE
CROSS TABULATE
CONSENSUS MEAN
CONSENSUS MEAN PLOT
LINEAR CALIBRATION
QUADRATIC CALIBRATION
YATES ANALYSIS
FIT
ANOVA
FRIEDMAN TEST
WILK SHAPIRO
ANDERSON DARLING
KOLMOGOROV-SMIRNOV GOODNESS OF FIT
CHI-SQUARE GOODNESS OF FIT
EXPONENTIAL MAXIMUM LIKELIHOOD
GUMBEL MAXIMUM LIKELIHOOD
WEIBULL MAXIMUM LIKELIHOOD
LOGISTIC MAXIMUM LIKELIHOOD
PARETO MAXIMUM LIKELIHOOD
UNIFORM MAXIMUM LIKELIHOOD
BETA MAXIMUM LIKELIHOOD
CONFIDENCE LIMITS
DIFFERENCE OF MEANS CONFIDENCE LIMITS
BIWEIGHT LOCATION CONFIDENCE LIMITS
TRIMMED MEAN CONFIDENCE LIMITS
MEDIAN/QUANTILE CONFIDENCE LIMITS
T TEST
F TEST
CHI-SQUARE TEST
GRUBB TEST
LEVENE TEST
FREQUENCY TEST
FREQUENCY WITHIN A BLOCK TEST
CUSUM TEST
In addition, WRITE HTML and WRITE LATEX commands have been added
to allow the generation of one-way tables.
We plan to implement this capability for most of the analysis
commands over the course of the next year or so. In addition,
we are investigating a similar capability for Rich Text
Format (RTF), which would allow importation into Word and
other word processing programs.
Output from unsupported commands is enclosed in "<PRE>" and
"</PRE>" tags for HTML and within the "\begin{verbatim}"
environment for Latex. Enter
HELP HTML
HELP LATEX
for details.
7) Dataplot has previously supported a LET ... = DERIVATIVE ...
command that generates analytic derivatives. However, this was
supported for a rather limited set of functions (enter
HELP DERIVATIVE for details). We have added the commands
LET A = NUMERICAL DERIVATIVE F WRT X FOR X = X0
LET Y = NUMERICAL DERIVATIVE F WRT X
to compute derivatives numerically. The distinction in the
above syntax is that the first command computes a single
derivative while the second syntax computes the derivative
for a vector of values (define X to contain the points at
which you want the derivative computed). For details, enter
HELP NUMERICAL DERIVATIVE
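The distinction between the two forms (a single point versus a vector of
points) can be illustrated with a central-difference sketch in Python.
This is not Dataplot's algorithm, which is not described here; the step
size h and the function names are assumptions made for the example.

```python
def numerical_derivative(f, x0, h=1e-5):
    """Central-difference approximation to the derivative of f at x0."""
    return (f(x0 + h) - f(x0 - h)) / (2.0 * h)


def f(x):
    return x ** 3 - 2.0 * x


# Single point, analogous to "... WRT X FOR X = X0".
single = numerical_derivative(f, 2.0)   # analytic value is 3*2**2 - 2 = 10

# Vector of points, analogous to the form without "FOR X = X0".
points = [0.0, 1.0, 2.0]
vector = [numerical_derivative(f, x) for x in points]
print(single, vector)
```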
8) Fixed the following bugs:
a) Fixed the READ and WRITE commands to handle hyphens inside
of quoted file names correctly (only applies if
SET FILE NAME QUOTE ON entered).
b) The substitution character, "^", was modified to treat
anything other than a letter, a number, or an underscore
as a terminator for the Dataplot name. Note that although you
can use some special characters in Dataplot names, this
is strongly discouraged.
c) Fixed a bug where the file name restriction of 80 characters
was actually a restriction on the entire command line. This
has been fixed so that the file name may be up to 80 characters
and the full command line may be more than 80 characters.
d) Fixed a bug with the CAPTURE FLUSH command.
e) If an improper format is given on the SET WRITE FORMAT command,
Dataplot will now return an error message rather than
crashing.
f) Fixed a bug in the generation of non-central chi-square,
non-central F, and doubly non-central F random numbers.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT April-May 2003.
-----------------------------------------------------------------------
1) Added the following plot commands
PARALLEL COORDINATES PLOT Y1 ... YK
The parallel coordinates plot is a technique for plotting
multivariate data. Enter HELP PARALLEL COORDINATES PLOT
for details.
2) Added support for the following statistics:
LET A = SN SCALE Y1
LET A = QN SCALE Y1
LET A = DIFFERENCE OF SN Y1 Y2
LET A = DIFFERENCE OF QN Y1 Y2
LET P1 = 10
LET P2 = 10
Enter HELP for the given statistic for details (e.g.,
HELP DIFFERENCE OF SN).
In addition, these statistics are supported for the following
plots and commands
STATISTIC PLOT Y1 Y2 X
CROSS TABULATE STATISTIC PLOT Y1 Y2 X1 X2
BOOTSTRAP PLOT Y1 Y2 X1 X2
JACKNIFE PLOT Y1 Y2 X1 X2
TABULATE Y1 Y2 X
CROSS TABULATE Y1 Y2 X1 X2
LET Z = CROSS TABULATE Y1 Y2 X1 X2
The DIFFERENCE OF COUNTS statistic is not supported for these
plots and commands (since it will simply be zero for all
cases).
The SN SCALE and QN SCALE statistics are also supported for
the following additional commands
DEX PLOT Y X1 ... XK
BLOCK PLOT Y X1 ... XK
INFLUENCE CURVE Y
INTERACTION PLOT Y X1 X2
LET Y = MATRIX COLUMN M
LET Y = MATRIX ROW M
3) The following probability distribution commands were added:
a) The following commands for multivariate random numbers
were added:
LET W = WISHART RANDOM NUMBERS MU SIGMA N
LET U = INDEPENDENT UNIFORM RANDOM NUMBERS LOWL UPPL NP
LET M = MULTIVARIATE T RANDOM NUMBERS MU SIGMA NU N
LET M = MULTINOMIAL RANDOM NUMBERS P N NEVENTS
For details, enter
HELP WISHART RANDOM NUMBERS
HELP INDEPENDENT UNIFORM RANDOM NUMBERS
HELP MULTIVARIATE T RANDOM NUMBERS
HELP MULTINOMIAL RANDOM NUMBERS
b) The following multivariate cumulative distribution and
probability density/mass function commands were added:
LET M = MULTIVARIATE NORMAL CDF SIGMA UPPL
LET M = MULTIVARIATE NORMAL CDF SIGMA LOWL UPPL
LET M = MULTIVARIATE T CDF SIGMA UPPL
LET M = MULTIVARIATE T CDF SIGMA LOWL UPPL
LET M = MULTINOMIAL PDF X P
These compute the cdf for multivariate normal and
multivariate t distributions and the pdf for the multinomial
distribution. For details, enter
HELP MULTIVARIATE NORMAL CDF
HELP MULTIVARIATE T CDF
HELP MULTINOMIAL PDF
c) Support for the following univariate distributions was
added:
LET A = LANCDF(X) - cdf of Landau distribution
LET A = LANPDF(X) - pdf of Landau distribution
LET A = LANPPF(P) - ppf of Landau distribution
LET A = LANDIF(X) - derivative of Landau pdf
LET A = LANXM1(X) - first moment function of
Landau distribution
LET A = LANXM2(X) - second moment function of
Landau distribution
LET A = ERRCDF(X,ALPHA) - cdf of error distribution
LET A = ERRPDF(X,ALPHA) - pdf of error distribution
LET A = ERRPPF(P,ALPHA) - ppf of error distribution
LET A = SLAPDF(X) - pdf of slash distribution
LET A = IBPDF(X,ALPHA) - pdf of inverted beta distribution
The cdf and ppf functions for the slash and inverted
beta distributions are still being developed.
You can generate random numbers for these distributions
with the commands
LET Y = LANDAU RANDOM NUMBERS FOR I = 1 1 N
LET Y = SLASH RANDOM NUMBERS FOR I = 1 1 N
LET ALPHA = <VALUE>
LET Y = ERROR RANDOM NUMBERS FOR I = 1 1 N
LET ALPHA = <VALUE>
LET Y = INVERTED BETA RANDOM NUMBERS FOR I = 1 1 N
The error distribution is also referred to as the
Subbotin, exponential power, or general error distribution.
There are several different parameterizations of this
distribution. Dataplot uses the parameterization of
Tadikamalla in "Random Sampling From the Exponential
Power Distribution", Journal of the American Statistical
Association, September, 1980. Enter HELP ERRPDF for
details.
d) Support was added for the following random number
generators:
1) GENZ - Alan Genz generator
2) LUXURY - based on the Marsaglia and Zaman
borrow-and-carry generator. Uses a code written
by F. James and incorporating improvements by
M. Luscher.
Enter HELP RANDOM NUMBER GENERATOR for details.
4) Added the following command:
LET Y2 X2 = STACK Y1 Y2 ... YK
This command appends the variables Y1, Y2, ..., YK into
the single variable Y2. In addition, X2 contains a
group identifier variable (values corresponding to Y1 are
set to 1, values corresponding to Y2 are set to 2, and so on).
Many Dataplot commands (e.g., BOX PLOT, MEAN PLOT, ANOVA)
require data be in the two-variable format (i.e., a response
variable and a group identifier variable). However, many
data files will simply have each response variable in a
separate column. The STACK command provides a convenient
way to generate the data in the form needed by many Dataplot
commands.
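The reshaping performed by STACK can be sketched in Python (not Dataplot
code; the function name is hypothetical). Each input column is appended
to a single response list, and a parallel list records which column each
value came from.

```python
def stack(*columns):
    """Append columns into one response list plus a 1-based group-id list."""
    values, groups = [], []
    for group_id, column in enumerate(columns, start=1):
        values.extend(column)
        groups.extend([group_id] * len(column))
    return values, groups


values, groups = stack([1.2, 3.4], [5.6], [7.8, 9.0])
print(values)   # stacked response variable
print(groups)   # group identifier for each stacked value
```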
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT January-March 2003.
-----------------------------------------------------------------------
1) The Windows 95/98/ME/NT/2000/XP installation now uses
InstallShield. This should simplify the installation of
Dataplot on Windows platforms.
2) A few tweaks were made to the Postscript device.
a) Previously, Dataplot started a new page when the device
was initialized. It also started a new page when the first
plot was generated. This was to ensure that a fresh
page was started if you were generating diagrammatic
graphics before the first plot. However, it caused
a blank page to be printed for most applications.
Dataplot now automatically keeps track so that the first
plot will not generate the unneeded page erase.
b) Previously, the LANDSCAPE WORDPERFECT orientation (this
results in a landscape orientation on a portrait page)
was supported for encapsulated Postscript, but not for
regular Postscript. This orientation is now supported
for regular Postscript.
c) Dataplot allows you to switch between the various
orientations (LANDSCAPE, PORTRAIT, LANDSCAPE WORDPERFECT,
SQUARE) when using Postscript. For this reason, it sets
the bounding box for an 11x11 inch page.
The following command
SET POSTSCRIPT BOUNDING BOX <FLOAT/FIXED>
can be used to modify this behavior. If the value is
FLOAT (the default), the bounding box is set for an
11x11 inch page. If the value is set to FIXED, the
bounding box will be set according to whatever the current
orientation is when the device is initialized. However,
you should not change the orientation if FIXED is used.
If you are simply using the Postscript output for printing,
then you do not need to worry about this command. However,
it may occasionally be useful if you are importing the Postscript
output into an external program.
3) Postscript was added to the list of devices supported by
the CAPTURE HTML command (see 3) for the August-December 2002
updates).
If a DEVICE 2 CLOSE command is encountered when CAPTURE HTML
is on and the device is set to postscript, Dataplot will first
use Ghostscript to convert the Postscript output to JPEG.
The JPEG file will have the same file name as the original
postscript file, but its extension will be changed to "jpg"
(e.g., the default name "dppl1f.dat" results in a JPEG file
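called "dppl1f.jpg"). Dataplot will add an IMG tag for the
JPEG file to the generated HTML page.
a. This option assumes that Ghostscript is installed on your
local system. Use the SET GHOSTSCRIPT PATH command to
specify where Ghostscript is installed.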
For example, on my Windows system, I use
SET GHOSTSCRIPT PATH F:\GS\GS704\GS\BIN\
We suggest that you add this command to your Dataplot
startup file "dplogf.tex".
b. We suggest using either the ORIENTATION PORTRAIT or the
ORIENTATION LANDSCAPE WORDPERFECT command to set the
orientation. Plots with a landscape orientation are
rotated in the Dataplot Postscript output (in order to
make full use of the page). Currently, Ghostscript does
not support a command line switch to rotate the graph.
This means that landscape plots will be rotated vertically
on the web page (you can use an external program, GIMP for
example, to rotate the JPEG files if you like).
4) Dataplot uses a vector graphics model. However, when you want
to incorporate Dataplot graphics into other applications, it
is often preferable to work with bitmapped graphics.
Dataplot now supports the command:
SET POSTSCRIPT CONVERT <format>
where <format> is one of the following:
JPEG - for jpeg
PDF - for Portable Document Format (PDF)
TIFF - for Tiff
PBM - PBM Portable Bit Map Format (supports black and white)
PGM - PBM Portable Grey Map Format (supports grey scale)
PPM - Portable Pixmap Format (supports color)
PNM - PBM Portable Anymap Format (operates on PBM, PGM, or
PPM formats)
If <format> is set to one of the choices above, a DEVICE 2 CLOSE
command is encountered, and the device is set to postscript, Dataplot
first uses Ghostscript to convert the Postscript output to the
requested format. The converted file will have the same file name
as the original postscript file, but its extension will be changed to
"jpg", "pdf", "tif", "pbm", "pgm", "ppm", or "pnm" depending on
the value of <format>. For example, if <format> is "PDF", the default
name "dppl1f.dat" results in a PDF file called "dppl1f.pdf".
As noted above in 3), this option assumes Ghostscript is installed
on your local system. You can use the SET GHOSTSCRIPT PATH
command described above to set the path for Ghostscript.
Also, as noted in 3), we suggest using either the ORIENTATION PORTRAIT
or the ORIENTATION LANDSCAPE WORDPERFECT command to set the
orientation.
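The file naming rule for converted output can be sketched as follows. This is a minimal Python illustration of the rule stated above, not Dataplot code; the names EXTENSIONS and converted_name are hypothetical.

```python
from pathlib import PurePath

# Extension used for each conversion format, as listed in the text above.
EXTENSIONS = {
    "JPEG": "jpg",
    "PDF": "pdf",
    "TIFF": "tif",
    "PBM": "pbm",
    "PGM": "pgm",
    "PPM": "ppm",
    "PNM": "pnm",
}


def converted_name(postscript_name, fmt):
    """Return the converted file name: same base name, new extension."""
    extension = EXTENSIONS[fmt.upper()]
    return str(PurePath(postscript_name).with_suffix("." + extension))


print(converted_name("dppl1f.dat", "PDF"))
print(converted_name("plot1.ps", "JPEG"))
```

The original Postscript file name is only read, never changed, which matches the note that the original file is kept alongside the converted one.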
A few additional points:
a. The original postscript file is not deleted. An additional
plot file, with a different extension, is created.
b. The bit map formats are generally most useful when there is
one image per file. You can do something like the following:
SET POSTSCRIPT CONVERT JPEG
SET IPL1NA plot1.ps
DEVICE 2 POSTSCRIPT
... generate plot 1 ...
DEVICE 2 CLOSE
SET IPL1NA plot2.ps
DEVICE 2 POSTSCRIPT
... generate plot 2 ...
DEVICE 2 CLOSE
This will result in the files plot1.ps, plot1.jpg, plot2.ps, and
plot2.jpg.
The PDF files may be an exception to this. Depending on how
you want to use the generated plots, you can either
create all the plots in a single PDF file or put each plot
in a separate PDF file (using the above logic).
c. If the CAPTURE HTML switch is on, PDF files are incorporated
into the generated HTML file. For PDF files, no file
conversion is performed. Instead, a link to the PDF file is
added to the HTML page.
The advantage of the PDF format over JPEG is that it is typically
of higher quality than the JPEG file. The disadvantage is that
you have to link to another page to view it.
5) The CAPTURE HTML command can be used to save Dataplot numeric
and graphics output in an HTML page. By default, Dataplot
generates fairly minimal "header" and "footer" HTML code
(basically, it sets a white background and not much else).
If your basic purpose is to simply create a web viewable page,
then this is sufficient. However, many sites have specific style
guidelines for web pages. These can typically be incorporated into
the "header" and "footer" of the HTML page.
In order to provide additional flexibility to the appearance
of the web pages created using CAPTURE HTML, Dataplot now
supports the following two commands:
SET HTML HEADER FILE <file-name>
SET HTML FOOTER FILE <file-name>
If these commands are given, Dataplot will add the contents of
the header file to the beginning and the contents of the footer
file to the end of the generated HTML file.
The Dataplot HELP directory contains the files
"sed_header.htm" and "sed_footer.htm". These can be used as
examples for developing your own templates (these implement
some NIST specific information, so they are not intended to be
used directly by non-NIST users).
Note that Dataplot does no error checking on these files. We
recommend that you view a page containing the intended header
and footer to detect problems with your HTML code.
Dataplot will only read 240 characters per line in these files.
6) One current limitation in Dataplot has been that reading data
from ASCII files was limited to a maximum of 132 columns. The
only way around this was to use the SET READ FORMAT command. However,
this did not work if the data did not have a consistent format.
The default limit was raised to 255 columns. To read even
longer data lines, use the command MAXIMUM RECORD LENGTH.
Enter HELP MAXIMUM RECORD LENGTH for details.
7) The following commands were added:
TRIMMED MEAN CONFIDENCE LIMITS Y
MEDIAN CONFIDENCE LIMITS Y
These provide confidence intervals for robust estimates of
location. Enter
HELP TRIMMED MEAN CONFIDENCE LIMITS
HELP MEDIAN CONFIDENCE LIMITS
for details.
8) The following plot commands were added:
VIOLIN PLOT Y X
SHIFT PLOT Y X
The VIOLIN PLOT is a mix of a box plot and a kernel density
plot. The shift plot is a variation of quantile-quantile or
Tukey mean-difference plots.
Enter HELP VIOLIN PLOT and HELP SHIFT PLOT for details.
9) The Hotelling control chart capability was upgraded in the following
way:
a) A distinction is now made between phase I and phase II plots.
The previous implementation was effectively a phase I plot.
b) Support was added for the individual observations case.
Enter
HELP HOTELLING CONTROL CHART
for details.
10) The Ljung-Box test for randomness was added. This test is based
on the autocorrelation plot and is commonly used in the context
of ARIMA modeling. Enter
HELP LJUNG BOX TEST
for details.
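As an illustration of what the Ljung-Box test computes, the sketch below evaluates the standard textbook statistic Q = n(n+2) * SUM[k=1 to h] r(k)**2/(n-k), where r(k) is the lag k sample autocorrelation. This is written in Python under that textbook definition; the number of lags Dataplot uses and the form of its output are not stated here, and the function names are hypothetical.

```python
def autocorrelation(y, lag):
    """Sample autocorrelation of y at the given lag."""
    n = len(y)
    mean = sum(y) / n
    denominator = sum((value - mean) ** 2 for value in y)
    numerator = sum((y[t] - mean) * (y[t + lag] - mean) for t in range(n - lag))
    return numerator / denominator


def ljung_box_q(y, max_lag):
    """Ljung-Box Q statistic using lags 1 through max_lag."""
    n = len(y)
    total = sum(autocorrelation(y, k) ** 2 / (n - k) for k in range(1, max_lag + 1))
    return n * (n + 2) * total


print(ljung_box_q([1, 2, 3, 4, 5], 1))
```

Under the null hypothesis of randomness, Q is usually compared against a chi-square critical value; that step is omitted from this sketch.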
11) The following miscellaneous changes were made:
a) A correction was made in the computation of the Harrell-Davis
quantile estimate. Enter HELP QUANTILE for details.
b) The SEARCH command now returns the line number that the
first match is found on in the internal parameter
LINENUMB. This can occasionally be useful when writing
macros.
c) If no variable name is given on the READ command, Dataplot
will now try to automatically determine the variables.
There are two cases:
i) If the command SKIP AUTOMATIC was previously entered,
Dataplot will skip all lines until a line starting
with "----" is encountered. It will then back up one
line and read the variable list from that line.
This case is primarily used when reading data files
that come with the Dataplot distribution (i.e., the
files in the Dataplot "DATA" sub-directory). Most,
though not all, of these files follow that convention.
ii) If a SKIP AUTOMATIC command has not been entered,
Dataplot will read the first line of the file and
determine the number of columns of data. It will then
automatically name the variables X1 X2 ... XK (where
K is the number of variables).
Note that any SKIP, COLUMN LIMITS, or ROW LIMITS
commands will be honored when reading the first
line to determine the number of variables.
This capability only applies when reading variables (i.e.,
it is not supported for the READ PARAMETER, READ STRING,
or READ MATRIX cases). Also, it only applies when reading
from a file, not when reading from the terminal.
d) Some bugs were fixed.
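The two cases for determining variable names on a READ with no variable list (item c above) can be sketched as follows. This is a Python illustration of the described rules, not Dataplot's implementation. It assumes whitespace-delimited fields, and the function name is hypothetical.

```python
def automatic_variable_names(lines, skip_automatic):
    """Return the variable names a READ with no variable list would use.

    lines is the file content as a list of text lines (assumed to be
    whitespace delimited, which is an assumption of this sketch).
    """
    if skip_automatic:
        # Case i: skip to the first line starting with "----", then back
        # up one line and take the variable list from that line.
        for index, line in enumerate(lines):
            if line.startswith("----"):
                if index == 0:
                    raise ValueError("no line precedes the ---- line")
                return lines[index - 1].split()
        raise ValueError("no line starting with ---- was found")
    # Case ii: count the columns on the first line and name them X1 ... XK.
    column_count = len(lines[0].split())
    return ["X" + str(i) for i in range(1, column_count + 1)]


print(automatic_variable_names(["Y X1 X2", "-------", "1 2 3"], True))
print(automatic_variable_names(["1.0 2.0 3.0", "4.0 5.0 6.0"], False))
```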
12) Added support for the following statistics:
LET A = DIFFERENCE OF MEANS Y1 Y2
LET A = DIFFERENCE OF MIDMEANS Y1 Y2
LET A = DIFFERENCE OF MEDIANS Y1 Y2
LET A = DIFFERENCE OF MIDRANGE Y1 Y2
LET A = DIFFERENCE OF TRIMMED MEANS Y1 Y2
LET A = DIFFERENCE OF WINSORIZED MEANS Y1 Y2
LET A = DIFFERENCE OF GEOMETRIC MEANS Y1 Y2
LET A = DIFFERENCE OF HARMONIC MEANS Y1 Y2
LET A = DIFFERENCE OF HODGES-LEHMAN Y1 Y2
LET A = DIFFERENCE OF BIWEIGHT LOCATIONS Y1 Y2
LET A = DIFFERENCE OF STANDARD DEVIATIONS Y1 Y2
LET A = DIFFERENCE OF VARIANCES Y1 Y2
LET A = DIFFERENCE OF AAD Y1 Y2
LET A = DIFFERENCE OF MAD Y1 Y2
LET A = DIFFERENCE OF INTERQUARTILE RANGE Y1 Y2
LET A = DIFFERENCE OF WINSORIZED SD Y1 Y2
LET A = DIFFERENCE OF WINSORIZED VARIANCE Y1 Y2
LET A = DIFFERENCE OF BIWEIGHT MIDVARIANCE Y1 Y2
LET A = DIFFERENCE OF BIWEIGHT SCALE Y1 Y2
LET A = DIFFERENCE OF PERCENTAGE BEND MIDVARIANCE Y1 Y2
LET A = DIFFERENCE OF GEOMETRIC SD Y1 Y2
LET A = DIFFERENCE OF RANGE Y1 Y2
LET A = DIFFERENCE OF SKEWNESS Y1 Y2
LET A = DIFFERENCE OF KURTOSIS Y1 Y2
LET A = DIFFERENCE OF RELATIVE SD Y1 Y2
LET A = DIFFERENCE OF COEFFICIENT OF VARIATION Y1 Y2
LET A = DIFFERENCE OF SD OF MEAN Y1 Y2
LET A = DIFFERENCE OF RELATIVE VARIANCE Y1 Y2
LET A = DIFFERENCE OF VARIANCE OF MEAN Y1 Y2
LET A = DIFFERENCE OF QUANTILE Y1 Y2
LET A = DIFFERENCE OF MINIMUM Y1 Y2
LET A = DIFFERENCE OF MAXIMUM Y1 Y2
LET A = DIFFERENCE OF EXTREME Y1 Y2
LET A = DIFFERENCE OF SUM Y1 Y2
LET A = DIFFERENCE OF COUNTS Y1 Y2
Enter HELP for the given statistic for details (e.g.,
HELP DIFFERENCE OF MEANS).
In addition, these statistics are supported for the following
plots and commands
STATISTIC PLOT Y1 Y2 X
CROSS TABULATE STATISTIC PLOT Y1 Y2 X1 X2
BOOTSTRAP PLOT Y1 Y2 X1 X2
JACKNIFE PLOT Y1 Y2 X1 X2
TABULATE Y1 Y2 X
CROSS TABULATE Y1 Y2 X1 X2
LET Z = CROSS TABULATE Y1 Y2 X1 X2
The DIFFERENCE OF COUNTS statistic is not supported for these
plots and commands (since it will simply be zero for all
cases).
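The following Python sketch illustrates the idea behind the DIFFERENCE OF statistics and why DIFFERENCE OF COUNTS is always zero when tabulated. It assumes the difference is taken as the statistic of Y1 minus the statistic of Y2 (the order is an assumption), and the function names are hypothetical.

```python
from statistics import mean


def difference_of(stat, y1, y2):
    """Statistic of y1 minus the same statistic of y2 (order assumed)."""
    return stat(y1) - stat(y2)


def tabulate_difference(stat, y1, y2, x):
    """Difference of the statistic within each group defined by x."""
    result = {}
    for group in sorted(set(x)):
        group1 = [value for value, tag in zip(y1, x) if tag == group]
        group2 = [value for value, tag in zip(y2, x) if tag == group]
        result[group] = stat(group1) - stat(group2)
    return result


y1 = [1, 2, 3, 10]
y2 = [0, 1, 1, 4]
x = [1, 1, 2, 2]
print(difference_of(mean, y1, y2))
print(tabulate_difference(mean, y1, y2, x))
# Both responses share the group variable, so each group has the same
# count in y1 and y2 and the difference of counts is zero everywhere.
print(tabulate_difference(len, y1, y2, x))
```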
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT August-December 2002.
----------------------------------------------------------------------
1) Added the following command:
AUTO TEXT <ON/OFF>
Entering AUTO TEXT ON will prepend a TEXT to all subsequent
lines until an AUTO TEXT OFF command is encountered. This
command is used in generating word slides. Enter
HELP AUTO TEXT
for details.
2) The list of supported statistics has been expanded for the
following commands:
BLOCK PLOT
DEX PLOT
TABULATE
CROSS TABULATE
MATRIX ROW STATISTIC
MATRIX COLUMN STATISTIC
CROSS TABULATE (LET)
Enter the corresponding HELP command for a complete list
of supported statistics.
3) The CAPTURE command added the following option:
CAPTURE HTML <file-name>
This writes the output from the CAPTURE command in HTML
format. Note that most commands simply use a
<PRE> ... </PRE> syntax. Currently, the exceptions are the
TABULATE and CROSS TABULATE commands, which write the output using
HTML table syntax.
This can be used in conjunction with the WEB command. For
example,
SKIP 25
READ RIPKEN.DAT Y X1 X2
ECHO ON
CAPTURE HTML C:\TABLE.HTM
TABULATE MEAN Y X1
CROSS TABULATE MEAN Y X1 X2
END OF CAPTURE
WEB file://C:\TABLE.HTM
In addition, if DEVICE 2 is set to PNG, JPEG, or SVG, Dataplot
will incorporate the graphics into the web page using the
IMG tag. For example,
device 1 x11
.
skip 25
read berger1.dat y x
.
line blank solid
character x blank
echo on
capture html fit.htm
set ipl1na data.png
device 2 gd png
title original data
plot y x
device 2 close
fit y x
set ipl1na pred.png
device 2 gd png
title predicted line
plot y pred vs x
device 2 close
end of capture
.
web file:///home/heckert/dataplot/solaris/fit.htm
4) The maximum number of lines in a loop was raised from 500 to
1,000.
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT April-July 2002.
----------------------------------------------------------------------
1) Added support for the following probability distribution
functions.
a) Two-Sided Power
TSPCDF(X,THETA,N)
TSPPDF(X,THETA,N)
TSPPPF(X,THETA,N)
LET THETA = <value>
LET N = <value>
LET Y = TWO-SIDED POWER RANDOM NUMBERS FOR I = 1 1 100
LET THETA = <value>
LET N = <value>
TWO-SIDED POWER PROBABILITY PLOT Y
TWO-SIDED POWER PPCC PLOT Y
LET THETA = <value>
LET N = <value>
CHI-SQUARE TWO-SIDED POWER GOODNESS OF FIT TEST Y
LET THETA = <value>
LET N = <value>
KOLMOGOROV-SMIRNOV TWO-SIDED POWER GOODNESS OF FIT TEST Y
LET A = <lower limit>
LET B = <upper limit>
TWO-SIDED POWER MAXIMUM LIKELIHOOD Y
Note: The MLE estimator assumes that the value of the lower
and upper limits (default to 0 and 1) are known and fixed.
It returns estimates for THETA and N.
b) Bi-Weibull
BWECDF(X,SCALE1,GAMMA1,LOC2,SCALE2,GAMMA2)
BWEPDF(X,SCALE1,GAMMA1,LOC2,SCALE2,GAMMA2)
BWEPPF(P,SCALE1,GAMMA1,LOC2,SCALE2,GAMMA2)
BWEHAZ(X,SCALE1,GAMMA1,LOC2,SCALE2,GAMMA2)
BWECHAZ(X,SCALE1,GAMMA1,LOC2,SCALE2,GAMMA2)
LET SCALE1 = <value>
LET GAMMA1 = <value>
LET LOC2 = <value>
LET SCALE2 = <value>
LET GAMMA2 = <value>
LET Y = BIWEIBULL RANDOM NUMBERS FOR I = 1 1 100
LET SCALE1 = <value>
LET GAMMA1 = <value>
LET LOC2 = <value>
LET SCALE2 = <value>
LET GAMMA2 = <value>
BIWEIBULL PROBABILITY PLOT Y
LET SCALE1 = <value>
LET GAMMA1 = <value>
LET LOC2 = <value>
LET SCALE2 = <value>
LET GAMMA2 = <value>
CHI-SQUARE BIWEIBULL GOODNESS OF FIT TEST Y
LET SCALE1 = <value>
LET GAMMA1 = <value>
LET LOC2 = <value>
LET SCALE2 = <value>
LET GAMMA2 = <value>
KOLMOGOROV-SMIRNOV BIWEIBULL GOODNESS OF FIT TEST Y
c) Multivariate normal distribution
LET MU = DATA <list of p means>
READ MATRIX SIGMA
<pxp set of values>
END OF DATA
LET N = <value>
LET M = MULTIVARIATE NORMAL RANDOM NUMBERS MU SIGMA N
Note that M will be an NxP matrix. N is the number of rows
generated for each component and there are P components to
the multivariate normal. SIGMA is the pxp variance-covariance
matrix of the multivariate normal. SIGMA will be checked to
ensure that it is a positive definite matrix. MU is a vector
specifying the means of the p components.
This command uses code written by Charlie Reeves when
he was a member of the NIST Statistical Engineering Division.
d) Multinomial distribution
LET P = DATA <list of probabilities that sum to 1>
LET NEVENTS = <value>
LET NCAT = SIZE P
LET N = <value>
LET M = MULTINOMIAL RANDOM NUMBERS P NEVENTS NCAT N
Note that M will be an N x NCAT matrix. N is the number of rows
generated, and each row contains the counts for the NCAT
categories of the multinomial. NEVENTS is the number of events
distributed among the categories for each row. P is a vector
specifying the probabilities of the NCAT categories.
e) Logarithmic series distribution
Added random number generation for this distribution. For
example,
LET THETA = 0.7
LET Y = LOGARITHMIC SERIES RANDOM NUMBERS FOR I = 1 1 500
The cdf, pdf, and ppf functions are already available for
this distribution.
2) Made the following updates to the FIT command:
a) Added the command:
SET FIT ADDITIVE CONSTANT <ON/OFF>
If OFF, then Dataplot does not include a constant term
in a multi-linear fit (i.e., FIT Y X1 X2 ...). The
default is to include the additive constant.
b) If Dataplot detects a singularity in a multi-linear fit,
it now prints an error message. Previously, it simply
set all the parameter estimates to 0 and terminated the
fit. In addition, Dataplot explicitly checks for two
types of singularities: a column that contains all the same
values (this essentially adds an additional constant term) and
two columns that are equal.
c) Added the command:
LET M = CREATE MATRIX X1 ... XK
where X1 ... XK designates a list of previously defined
variables.
This command has a function similar to that of the MATRIX DEFINITION
command. However, the MATRIX DEFINITION command
creates matrices from variables that are contiguous
(the order of variables is determined by the order
in which they were created in Dataplot). The
CREATE MATRIX command does not have this restriction.
The variables need not be contiguous.
This command is useful for creating a design matrix
in regression problems that can be used as input for
some of the new commands that follow.
d) Added the command:
LET C = CATCHER MATRIX X
This computes the catcher matrix, X*(X'X)**(-1). This
matrix is used in the computation of certain regression
diagnostics (e.g., Variance Inflation Factors, Partial
Regression Plots). This command greatly simplifies the
writing of macros to generate these regression diagnostics
(and allows larger design matrices to be used). Enter
HELP CATCHER MATRIX for details.
e) Added the command:
LET XTXINV = XTXINV MATRIX X
This computes the matrix (X'X)**(-1). This
matrix is used in the computation of certain regression
diagnostics (e.g., DFBETA statistic) and in computing
certain confidence and prediction intervals for multi-linear
fits. This command simplifies the writing of macros to
generate these regression diagnostics and intervals
(and allows larger design matrices to be used). Enter
HELP XTXINV MATRIX for details.
f) Added the command:
LET C = CONDITION INDICES X
where X is the design matrix for a multi-linear fit
(note that you need to create the independent variables,
including a column containing all 1's, as a matrix).
The condition indices provide a measure of collinearity
in the design matrix. Enter HELP CONDITION INDICES for
details.
g) Added the command:
LET VIF = VARIANCE INFLATION FACTORS X
where X is the design matrix for a multi-linear fit
(note that you need to create the independent variables,
including a column containing all 1's, as a matrix).
The variance inflation factors provide a measure of
collinearity in the design matrix. Enter
HELP VARIANCE INFLATION FACTORS for details.
h) Added the following plot commands:
PARTIAL REGRESSION PLOT Y X1 ... XK XI
PARTIAL RESIDUAL PLOT Y X1 ... XK XI
PARTIAL LEVERAGE PLOT Y X1 ... XK XI
CCPR PLOT Y X1 ... XK XI
MATRIX PARTIAL REGRESSION PLOT Y X1 ... XK
MATRIX PARTIAL RESIDUAL PLOT Y X1 ... XK
MATRIX PARTIAL LEVERAGE PLOT Y X1 ... XK
MATRIX CCPR PLOT Y X1 ... XK
These generate partial regression plots, partial residual
plots, partial leverage plots, and component and
component-plus-residual (CCPR) plots for a multi-linear fit.
These plots are typically used to assess the effect of
a variable on the fit given the effect of other variables
already included in the fit.
There are 2 forms for the command.
In the first form, a single plot is generated. In this case,
the last variable listed is the "primary" variable. That is,
this is the variable we are considering adding/deleting from
the fit. Note that this variable should already be listed.
That is, a fit of Y versus X1 to XK is performed (including XI),
then the plot assesses the effect of XI on the fit.
In the second form, a multiplot is generated where each
of the independent variables is used as the primary variable.
Enter
HELP PARTIAL REGRESSION PLOT
HELP PARTIAL RESIDUAL PLOT
HELP PARTIAL LEVERAGE PLOT
HELP CCPR PLOT
for details.
i) For multi-linear fits, the output for DPST2F.DAT was
enhanced to include Bonferroni and Hotelling joint
confidence limits, respectively, for the predicted values.
By default, a 95% interval is generated. To use a different
alpha value, enter the following command before the fit:
LET ALPHA = 0.90
In addition, the output for DPST1F.DAT now includes
the t critical value and lower and upper joint Bonferroni
confidence limits for the parameters. The format 5E15.7
is used in writing these values.
In addition, for multi-linear fits, the regression ANOVA
table is written to the file DPST5F.DAT. In addition, the
values for R**2, adjusted R**2, and the Press P statistic are
also printed to this file. These three statistics are
saved as the internal parameters RSQUARE, ADJRSQUA, and PRESSP,
respectively.
j) One weakness in the Dataplot multi-linear fit routine
has been the lack of any "forward selection/backward
selection/best subsets" capabilities.
The command
BEST CP Y X1 ... XK
was added to identify the best candidate models using
the Mallows' CP criterion. Enter HELP BEST CP for details.
k) Added the command:
BOOTSTRAP FIT Y X1 ... XK
This performs a bootstrap linear/multilinear fit. Bootstrap
linear fits are an alternative to weighting and transformation
when the assumptions for multilinear fitting are not
satisfied (that is, the errors from the fit are independent and
have a common distribution, typically assumed to be normal, with
common location and scale). Enter HELP BOOTSTRAP FIT for
details.
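The matrices computed by the XTXINV MATRIX and CATCHER MATRIX commands (items d and e above) can be illustrated with a small Python sketch. To keep the inverse explicit, it assumes a design matrix with exactly two columns (a column of 1's and one independent variable); the function names are hypothetical, and this is not Dataplot's algorithm.

```python
def xtx_inverse_2col(X):
    """Return (X'X)**(-1) for a design matrix X with two columns."""
    a = sum(row[0] * row[0] for row in X)
    b = sum(row[0] * row[1] for row in X)
    d = sum(row[1] * row[1] for row in X)
    determinant = a * d - b * b
    if determinant == 0:
        raise ValueError("X'X is singular")
    return [[d / determinant, -b / determinant],
            [-b / determinant, a / determinant]]


def catcher_matrix_2col(X):
    """Return the catcher matrix X*(X'X)**(-1), one row per observation."""
    inverse = xtx_inverse_2col(X)
    return [[row[0] * inverse[0][j] + row[1] * inverse[1][j] for j in range(2)]
            for row in X]


# Design matrix with a column of 1's; the response is exactly 1 + 2*x.
X = [[1, 0], [1, 1], [1, 2]]
y = [1, 3, 5]
C = catcher_matrix_2col(X)
# The transpose of the catcher matrix times y gives the least squares
# coefficients, which is one reason the matrix is useful in diagnostics.
coefficients = [sum(C[i][j] * y[i] for i in range(len(y))) for j in range(2)]
print(coefficients)
```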
3) Added support for alternative random number generators. Note
that the default generator (i.e., the one that has been in
Dataplot for many years) is based on a Fibonacci sequence as
defined by Marsaglia. Note that this is equivalent to the
generator UNI of Jim Blue, David Kahaner, and George Marsaglia
that is in the CMLIB library.
Support is now provided for a linear congruential generator
written by Fullerton (CMLIB routine RUNIF) and a multiplicative
congruential generator (ACM algorithm 599). In addition,
2 generators based on the generalized feedback shift
register (GFSR) methods are supported. The first is based on the
original algorithm of Lewis and Payne (Journal of the ACM,
Volume 20, pp. 456-468). The second is an alternative
implementation given by Fushimi and Tezuka (Journal of the
ACM, Volume 26, pp. 516-523). Both are based on codes
given by Monahan (2000) in "Numerical Methods of Statistics".
Support is also provided for the Applied Statistics algorithm
183. AS183 is based on the fractional part of the sum of 3
multiplicative congruential generators. It requires 3 integers
be specified initially. Dataplot uses the multiplicative
congruential generator (which does depend on the SEED command)
to randomly generate these 3 integers.
These 6 generators are used to generate uniform random numbers.
Random numbers for other distributions are then derived from
these uniform random numbers.
To specify the uniform random number generator, use the command
SET RANDOM NUMBER GENERATOR FIBONACCI
SET RANDOM NUMBER GENERATOR LINEAR CONGRUENTIAL
SET RANDOM NUMBER GENERATOR MULTIPLICATIVE CONGRUENTIAL
SET RANDOM NUMBER GENERATOR GFSR
SET RANDOM NUMBER GENERATOR FUSHIMI
SET RANDOM NUMBER GENERATOR AS183
Note that you can use the SEED command to change the random numbers
generated as well. The SEED command does not apply to the 2 GFSR
generators (these each have their own initialization routines).
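To make the description of AS183 concrete, the following Python sketch implements the published Wichmann-Hill recurrence: three multiplicative congruential generators whose scaled values are summed and reduced to the fractional part. The three starting integers used here are arbitrary; how Dataplot derives them is only described above, and the function name is hypothetical.

```python
from itertools import islice


def as183(ix, iy, iz):
    """Generate uniform random numbers with the AS183 recurrence.

    ix, iy, and iz are the three starting integers. The values used
    in the example below are arbitrary, not Dataplot's.
    """
    while True:
        # Three multiplicative congruential generators.
        ix = (171 * ix) % 30269
        iy = (172 * iy) % 30307
        iz = (170 * iz) % 30323
        # Fractional part of the sum of the three scaled values.
        yield (ix / 30269.0 + iy / 30307.0 + iz / 30323.0) % 1.0


sample = list(islice(as183(1, 2, 3), 5))
print(sample)
```

The same three starting integers always reproduce the same sequence, which is why the choice of those integers plays the role of a seed.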
4) Added support for the following special functions.
a) Fermi-Dirac function
FERMDIRA(X,ORDER)
where ORDER is the order of the function. ORDER can be
-0.5, 0.5, 1.5, or 2.5 (Dataplot uses an epsilon of 0.1;
any order not within epsilon of one of the above values
results in an error). Enter HELP FERMDIRA for details.
5) Added support for the following statistics:
LET A = WINSORIZED VARIANCE Y
LET A = WINSORIZED SD Y
LET A = WINSORIZED COVARIANCE Y X
LET A = WINSORIZED CORRELATION Y X
LET A = BIWEIGHT MIDVARIANCE Y
LET A = BIWEIGHT MIDCOVARIANCE Y X
LET A = BIWEIGHT MIDCORRELATION Y X
LET A = PERCENTAGE BEND MIDVARIANCE Y
LET A = PERCENTAGE BEND CORRELATION Y1 Y2
LET A = HODGES LEHMAN Y
LET A = TRIMMED MEAN STANDARD ERROR Y
LET A = <XQ> QUANTILE Y
LET A = <XQ> QUANTILE STANDARD ERROR Y
Enter
HELP WINSORIZED VARIANCE
HELP WINSORIZED SD
HELP WINSORIZED COVARIANCE
HELP WINSORIZED CORRELATION
HELP BIWEIGHT MIDVARIANCE
HELP BIWEIGHT MIDCOVARIANCE
HELP BIWEIGHT MIDCORRELATION
HELP PERCENTAGE BEND MIDVARIANCE
HELP PERCENTAGE BEND CORRELATION
HELP HODGES LEHMAN
HELP TRIMMED MEAN STANDARD ERROR
HELP QUANTILE
HELP QUANTILE STANDARD ERROR
for details.
6) Added the following plot:
<stat> INFLUENCE CURVE Y XSEQ
where <stat> is one of the built-in supported statistics,
Y is a response variable, and XSEQ is a sequence of x values.
The plot is generated by looping through the values in XSEQ.
For a given value of XSEQ, the value of <stat> is computed for
that value of XSEQ along with the values in Y. The vertical
axis of the plot contains the computed statistic while the
horizontal axis contains the value of XSEQ.
This plot is of interest in the field of robust statistics.
For details, enter HELP INFLUENCE CURVE.
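The looping described above can be sketched in Python as follows. Each value of XSEQ is added to the values in Y, the statistic is recomputed, and the pair (x value, statistic) becomes one point on the curve. The function name is hypothetical, and the statistics are taken from the Python standard library rather than from Dataplot.

```python
from statistics import mean, median


def influence_curve(stat, y, xseq):
    """Return (x, statistic of y plus x) for each x in xseq."""
    return [(x, stat(list(y) + [x])) for x in xseq]


y = [1, 2, 3, 4, 5]
xseq = [0, 100]
# The mean moves a long way when the extreme value 100 is added,
# while the median moves very little.
print(influence_curve(mean, y, xseq))
print(influence_curve(median, y, xseq))
```

Comparing the two curves shows why the plot is of interest in robust statistics: a robust statistic stays bounded as the added value becomes extreme.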
7) For the ANOVA command, the residual standard deviations for
various models are written to the file DPST3F.DAT (these are
the same values that appear in the fitted output). This
allows these values to be read back in as a variable, which
is occasionally useful in writing macros that involve an
ANOVA step.
8) The PROBE command now recognizes the following:
PROBE IDMAN(1)
PROBE IDMAN(2)
PROBE IDMAN(3)
This identifies the current manufacturer for devices 1, 2, and
3 respectively. In addition, the value of PROBEVAL is set
if the returned manufacturer is one of the following:
X11 = 1
QWIN = 2
REGI = 3
TEKT = 4
OPGL = 5
QUAR or MACI = 6
POST or PS = 7
HP or HPGL = 8
GENE = 9
GD = 10
QUIC = 11
CALC = 12
ZETA = 13
GKS = 14
LAHE = 15
PRIN = 16
LATE = 17
SVG = 18
DISC = 19
In addition, the device model can be extracted via the commands
PROBE IDMOD(1)
PROBE IDMOD(2)
PROBE IDMOD(3)
PROBE IDMO2(1)
PROBE IDMO2(2)
PROBE IDMO2(3)
PROBE IDMO3(1)
PROBE IDMO3(2)
PROBE IDMO3(3)
The following PROBE commands were added to return the
operating system and compiler, respectively.
PROBE IOPSY1
PROBE ICOMPI
For IOPSY1, the value of PROBEVAL is also set:
UNIX = 1 (Unix)
PC-D = 2 (Windows)
VMS = 3 (VAX/VMS)
other = 0
For ICOMPI, the value of PROBEVAL is also set:
f77 = 1 (the Unix Fortran compiler)
MS-F = 2 (the Microsoft, now Compaq, Fortran compiler)
LAHE = 3 (the Lahey Fortran compiler)
other = 0
In general, if the PROBE command returns a string value of ON,
OPENED, or YES, it sets the value of the PROBEVAL parameter to 1.
Similarly, if the PROBE command returns a string value of OFF,
CLOSED, or NO, it sets the value of the PROBEVAL parameter to 0.
The above uses of PROBE are primarily of value in writing
general purpose macros. In particular, macros that are intended
to be used by others.
9) The following command was added:
CAPTURE FLUSH
The purpose of this command is to allow Dataplot text output
to be written to the graphics output file. This can be useful
when you are writing a macro and you want the analytic output
(for example, the output from a fit) to be included with the
graphics output. The following shows a sample of how this
command is used:
device 1 x11
device 2 postscript
.
title automatic
skip 25
read gear.dat y x
.
mean plot y x
.
move 5 95
margin 5
capture junk.dat
tabulate mean y x
capture flush
end of capture
.
device 2 close
system lpr dppl1f.dat
The initial CAPTURE command directs text output to the
file "junk.dat". When the CAPTURE FLUSH command is
encountered, the capture file is closed, an ERASE command
is generated for the graphics devices, the contents of
the capture file are printed on the graphics devices using
the TEXT command (i.e., each line of the file generates a
distinct TEXT command), and then the capture file is re-opened
(it will start at the beginning).
Since the lines are generated with the TEXT command, the
appearance of the text can be controlled with the various
TEXT attribute commands. Also, it is recommended that
CRLF be set to ON (the default), a MOVE command be given to
set the position for the first line of the text, and a MARGIN
command be entered to set the beginning x-coordinate for the
line.
Some output may be too long to display on one page. You
can control the number of lines printed per page with the
following command:
SET CAPTURE LINES <value1> ... <value5>
Up to 5 values may be entered. The first value is for the
first page of output, the second value is for the second
page of output, and so on. If more than 5 pages are
generated, then the page limits start over (i.e., page 6 uses
the value for page 1, page 7 uses the value for page 2, and
so on). The default is 25 lines for all pages.
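The page limit rule for SET CAPTURE LINES can be sketched in Python as follows. The sketch assumes the limits are reused in order once the listed values are exhausted, as described above; the function name is hypothetical.

```python
from itertools import cycle


def paginate(lines, page_limits=(25,)):
    """Split captured lines into pages, cycling through the page limits."""
    if not page_limits or any(limit <= 0 for limit in page_limits):
        raise ValueError("page limits must be positive")
    pages = []
    start = 0
    for limit in cycle(page_limits):
        if start >= len(lines):
            break
        pages.append(lines[start:start + limit])
        start += limit
    return pages


captured = ["line " + str(i) for i in range(1, 61)]
# Default of 25 lines on every page.
print([len(page) for page in paginate(captured)])
# Two limits that are reused in turn.
print([len(page) for page in paginate(captured, (20, 30))])
```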
If the MULTIPLOT switch is ON, the initial page erase is
suppressed. The following example shows how this feature
can be used:
.
device 1 x11
device 2 ps
device 1 font simplex
.
title automatic
skip 25
read gear.dat y x
.
multiplot 2 2
multiplot corner coordinates 0 0 100 100
multiplot scale factor 2
.
mean plot y x
sd plot y x
.
move 5 98
margin 5
plot
capture junk.dat
tabulate mean y x
capture flush
end of capture
move 5 98
plot
capture junk.dat
tabulate sd y x
capture flush
end of capture
.
end of multiplot
.
Note that the null PLOT command is used to move to the
next plot area without actually generating a plot.
This example draws a mean and standard deviation plot
on the first row and then supplements that with the numeric
values generated using the TABULATE command on the second
row.
The following two commands are also available.
SET CAPTURE NUMBER <ON/OFF>
SET CAPTURE BOX <ON/OFF>
If SET CAPTURE NUMBER ON is entered, the output lines are
numbered. This is primarily a convenience function to help
determine what values to enter for the SET CAPTURE LINES command
in order to generate breaks at the appropriate spots.
If SET CAPTURE BOX ON is entered, a box will be drawn for each
page of the output. Use the BOX 1 CORNER COORDINATES command,
before the CAPTURE FLUSH, to specify the coordinates of the
box. Use the various BOX attribute commands to set the
properties of the box.
10) The following enhancements were made to the IF command:
a) You can now test for strings with the IF command. That is,
LET STRING S = TEST
IF S = TEST
PRINT S
END OF IF
LET STRING S = TEST
IF S <> "NOT TEST"
PRINT S
END OF IF
Note that "=" and "<>" are the only comparisons allowed (i.e.,
no "<" or ">").
The argument on the left of the "=" must be the name of a
previously defined string. The argument to the right of the
"=" is a literal string. The string can be enclosed in
double quotes, ", if it contains spaces. If there are no
double quotes, the string is assumed to end once the first
space is encountered.
b) Support was added for ELSE and ELSE IF clauses. For
example,
IF A = 2
PRINT "A = 2"
ELSE
PRINT "A NOT EQUAL 2"
END OF IF
or
IF A = 2
PRINT "A = 2"
ELSE IF A = 1
PRINT "A = 1"
ELSE
PRINT "A NOT EQUAL 2 AND A NOT EQUAL 1"
END OF IF
c) A bug was fixed for the IF ... NOT EXIST and IF ... EXIST
cases. Also, these now test whether the name exists as a
parameter, string, variable, or matrix (previously, it only
checked if it was a parameter).
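The rule in item a) for the literal string on the right of the comparison can be sketched in Python as follows. The handling of a missing closing quote is an assumption of this sketch, and the function name is hypothetical.

```python
def parse_if_literal(text):
    """Extract the literal string that follows "=" or "<>" in an IF test.

    A string in double quotes may contain spaces. Without quotes, the
    string ends at the first space.
    """
    text = text.lstrip()
    if text.startswith('"'):
        end = text.find('"', 1)
        # Assumption: with no closing quote, take the rest of the line.
        return text[1:end] if end != -1 else text[1:]
    return text.split(" ", 1)[0]


print(parse_if_literal('"NOT TEST"'))
print(parse_if_literal("NOT TEST"))
```

The second call returns only the first word, which shows why a string containing spaces needs the double quotes.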
11) One problem with reading files in Dataplot has been the
inability to handle directory and file names with embedded
spaces. The command
SET FILE NAME QUOTE <ON/OFF>
was added to address this problem. If ON is specified,
then the file name may be enclosed in double quotes (").
All text, including spaces, until the matching ending double
quote is found is considered part of the file name (no
provision is made for file names containing a double quote
character). If OFF is specified, this feature is disabled.
The default is OFF to accommodate quoted strings on the WRITE
command that might contain a "." (which is what Dataplot uses to
identify a file name). For example,
WRITE "Example of writing a string."
The following will work as intended:
SET FILE NAME QUOTE ON
WRITE "C:\ My Data\STRING.OUT" "String to STRING.OUT"
12) Modified the output for the SIGN TEST, SIGNED RANK TEST, and
the RANK SUM test to have better clarity.
13) Added the following to the BOOTSTRAP PLOT command:
BOOTSTRAP CORRELATION PLOT Y X
BOOTSTRAP RANK COVARIANCE PLOT Y X
BOOTSTRAP RANK CORRELATION PLOT Y X
BOOTSTRAP COVARIANCE PLOT Y X
BOOTSTRAP LINEAR CALIBRATION PLOT Y X
BOOTSTRAP QUADRATIC CALIBRATION PLOT Y X
14) Fixed several bugs.
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT November-March 2002.
----------------------------------------------------------------------
1) Added the following probability distributions.
a) Geometric Extreme Exponential
GEECDF(X,GAMMA)
GEEPDF(X,GAMMA)
GEEPPF(X,GAMMA)
GEEHAZ(X,GAMMA)
GEECHAZ(X,GAMMA)
LET GAMMA = <value>
LET Y = GEOMETRIC EXTREME EXPONENTIAL RANDOM NUMBERS FOR I = 1 1 100
LET GAMMA = <value>
GEOMETRIC EXTREME EXPONENTIAL PROBABILITY PLOT Y
GEOMETRIC EXTREME EXPONENTIAL PPCC PLOT Y
LET GAMMA = <value>
CHI-SQUARE GEOMETRIC EXTREME EXPONENTIAL GOODNESS OF FIT TEST Y
LET GAMMA = <value>
KOLMOGOROV-SMIRNOV GEOMETRIC EXTREME EXPONENTIAL GOODNESS OF FIT TEST Y
b) Johnson SB
JSBCDF(X,ALPHA1,ALPHA2)
JSBPDF(X,ALPHA1,ALPHA2)
JSBPPF(X,ALPHA1,ALPHA2)
LET ALPHA1 = <value>
LET ALPHA2 = <value>
LET Y = JOHNSON SB RANDOM NUMBERS FOR I = 1 1 100
LET ALPHA1 = <value>
LET ALPHA2 = <value>
JOHNSON SB PROBABILITY PLOT Y
JOHNSON SB PPCC PLOT Y
LET ALPHA1 = <value>
LET ALPHA2 = <value>
CHI-SQUARE JOHNSON SB GOODNESS OF FIT TEST Y
LET ALPHA1 = <value>
LET ALPHA2 = <value>
KOLMOGOROV-SMIRNOV JOHNSON SB GOODNESS OF FIT TEST Y
c) Johnson SU
JSUCDF(X,ALPHA1,ALPHA2)
JSUPDF(X,ALPHA1,ALPHA2)
JSUPPF(X,ALPHA1,ALPHA2)
LET ALPHA1 = <value>
LET ALPHA2 = <value>
LET Y = JOHNSON SU RANDOM NUMBERS FOR I = 1 1 100
LET ALPHA1 = <value>
LET ALPHA2 = <value>
JOHNSON SU PROBABILITY PLOT Y
JOHNSON SU PPCC PLOT Y
LET ALPHA1 = <value>
LET ALPHA2 = <value>
CHI-SQUARE JOHNSON SU GOODNESS OF FIT TEST Y
LET ALPHA1 = <value>
LET ALPHA2 = <value>
KOLMOGOROV-SMIRNOV JOHNSON SU GOODNESS OF FIT TEST Y
d) Generalized Tukey-Lambda
Note: still being tested/developed. In particular,
negative values of shape parameter are not working.
GLDCDF(X,LAMBDA3,LAMBDA4)
GLDPDF(X,LAMBDA3,LAMBDA4)
GLDPPF(X,LAMBDA3,LAMBDA4)
LET LAMBDA3 = <value>
LET LAMBDA4 = <value>
LET Y = GENERALIZED TUKEY LAMBDA RANDOM NUMBERS FOR I = 1 1 100
LET LAMBDA3 = <value>
LET LAMBDA4 = <value>
GENERALIZED TUKEY LAMBDA PROBABILITY PLOT Y
GENERALIZED TUKEY LAMBDA PPCC PLOT Y
LET LAMBDA3 = <value>
LET LAMBDA4 = <value>
CHI-SQUARE GENERALIZED TUKEY LAMBDA GOODNESS OF FIT TEST Y
LET LAMBDA3 = <value>
LET LAMBDA4 = <value>
KOLMOGOROV-SMIRNOV GENERALIZED TUKEY LAMBDA GOODNESS OF FIT TEST Y
2) Added support for the following new statistics.
a) LET A = BIWEIGHT LOCATION Y
b) LET A = BIWEIGHT SCALE Y
For more information, enter the following commands:
HELP BIWEIGHT LOCATION
HELP BIWEIGHT SCALE
3) Added support for a biweight based confidence interval:
BIWEIGHT CONFIDENCE INTERVAL Y
For more information, enter the following command:
HELP BIWEIGHT CONFIDENCE INTERVAL
4) Added the following command:
SET BOX PLOT WIDTH <VARIABLE/FIXED>
This specifies whether box plots are drawn with fixed width
or variable width boxes. In variable width box plots, the
width of each box is proportional to its group sample size.
That is, the largest width is used for the box plot with the
largest sample size. The width of each remaining box plot is
multiplied by a scale factor equal to the sample size of that
box plot divided by the maximum sample size.
The default is variable width. This is recommended in most cases
as it conveys additional information regarding the relative
sample sizes. However, there are cases where it is desirable
to turn this feature off (e.g., when multiple BOX PLOT commands
are used to overlay box plots on the same page).
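As a rough illustration of the variable width rule, the scale factors can be computed as in the following Python sketch. The function name box_width_scale_factors is hypothetical, and the sketch only reproduces the ratio described above (group sample size relative to the maximum sample size), not Dataplot's drawing code.

```python
def box_width_scale_factors(sample_sizes):
    """Return one width scale factor per box plot.

    The group with the largest sample size gets a factor of 1.0;
    every other group gets its sample size divided by that maximum.
    """
    n_max = max(sample_sizes)
    return [n / n_max for n in sample_sizes]


# Three groups with 10, 20, and 40 observations.
print(box_width_scale_factors([10, 20, 40]))
```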
5) Added the following commands:
SET 4PLOT MULTIPLOT <ON/OFF>
SET 6PLOT MULTIPLOT <ON/OFF>
Setting these switches ON specifies that the multiplot corner
coordinates will be used to size the 4-PLOT and 6-PLOT,
respectively. The default is OFF (i.e., the plot sizes are
hard-coded to a default value). If set to ON, then you
can use the MULTIPLOT CORNER COORDINATES to size the
graphs.
6) ROBUSTNESS PLOT was added as a synonym for BLOCK PLOT.
7) Support was added for the Scalable Vector Graphics (SVG)
graphics output. SVG is an XML based vector graphics format
that is expected to become increasingly popular for web based
applications. SVG format files can also be imported into
several popular graphics editing programs. For more information,
enter
HELP SVG
8) The VERSION command was re-activated.
9) Fixed several bugs.
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT May-October 2001.
----------------------------------------------------------------------
1) Added support for kernel density plots. Enter
HELP KERNEL DENSITY PLOT
for details.
2) Added the following command:
CONSENSUS MEAN PLOT
This plot summarizes the results of a consensus means analysis.
Enter
HELP CONSENSUS MEANS PLOT
for details.
3) Added the following probability distributions.
a) Inverted Weibull
IWECDF(X,GAMMA)
IWEPDF(X,GAMMA)
IWEPPF(X,GAMMA)
IWEHAZ(X,GAMMA)
IWECHAZ(X,GAMMA)
LET GAMMA = <value>
LET Y = INVERTED WEIBULL RANDOM NUMBERS FOR I = 1 1 100
LET GAMMA = <value>
INVERTED WEIBULL PROBABILITY PLOT Y
INVERTED WEIBULL PPCC PLOT Y
LET GAMMA = <value>
CHI-SQUARE INVERTED WEIBULL GOODNESS OF FIT TEST Y
LET GAMMA = <value>
KOLMOGOROV-SMIRNOV INVERTED WEIBULL GOODNESS OF FIT TEST Y
b) Log Double Exponential
LDECDF(X,ALPHA)
LDEPDF(X,ALPHA)
LDEPPF(X,ALPHA)
LET ALPHA = <value>
LET Y = LOG DOUBLE EXPONENTIAL RANDOM NUMBERS FOR I = 1 1 100
LET ALPHA = <value>
LOG DOUBLE EXPONENTIAL PROBABILITY PLOT Y
LOG DOUBLE EXPONENTIAL PPCC PLOT Y
LET ALPHA = <value>
CHI-SQUARE LOG DOUBLE EXPONENTIAL GOODNESS OF FIT TEST Y
LET ALPHA = <value>
KOLMOGOROV-SMIRNOV LOG DOUBLE EXPONENTIAL GOODNESS OF FIT TEST Y
4) Added support for random numbers for the following distributions:
LET Y = COSINE RANDOM NUMBERS FOR I = 1 1 100
LET Y = ANGLIT RANDOM NUMBERS FOR I = 1 1 100
LET Y = HYPERBOLIC SECANT RANDOM NUMBERS FOR I = 1 1 100
LET Y = ARCSIN RANDOM NUMBERS FOR I = 1 1 100
LET Y = HALF-LOGISTIC RANDOM NUMBERS FOR I = 1 1 100
LET GAMMA = <value>
LET Y = DOUBLE WEIBULL RANDOM NUMBERS FOR I = 1 1 100
LET GAMMA = <value>
LET Y = DOUBLE GAMMA RANDOM NUMBERS FOR I = 1 1 100
LET GAMMA = <value>
LET Y = INVERTED GAMMA RANDOM NUMBERS FOR I = 1 1 100
LET GAMMA = <value>
LET Y = LOG GAMMA RANDOM NUMBERS FOR I = 1 1 100
LET GAMMA = <value>
LET Y = GENERALIZED EXTREME VALUE RANDOM NUMBERS FOR I = 1 1 100
LET DELTA = <value>
LET Y = LOG LOGISTIC RANDOM NUMBERS FOR I = 1 1 100
LET BETA = <value>
LET Y = BRADFORD RANDOM NUMBERS FOR I = 1 1 100
LET B = <value>
LET Y = RECIPROCAL RANDOM NUMBERS FOR I = 1 1 100
LET C = <value>
LET B = <value>
LET Y = GOMPERTZ RANDOM NUMBERS FOR I = 1 1 100
LET P = <value>
LET Y = POWER NORMAL RANDOM NUMBERS FOR I = 1 1 100
LET P = <value>
LET SD = <value>
LET Y = POWER LOGNORMAL RANDOM NUMBERS FOR I = 1 1 100
LET ALPHA = <value>
LET BETA = <value>
LET Y = POWER EXPONENTIAL RANDOM NUMBERS FOR I = 1 1 100
LET ALPHA = <value>
LET BETA = <value>
LET Y = ALPHA RANDOM NUMBERS FOR I = 1 1 100
LET GAMMA = <value>
LET THETA = <value>
LET Y = EXPONENTIATED WEIBULL RANDOM NUMBERS FOR I = 1 1 100
5) Extended the ppcc plot to handle distributions with 2
shape parameters. Specifically,
BETA PPCC PLOT
GOMPERTZ PPCC PLOT
ALPHA PPCC PLOT
EXPONENTIAL POWER PPCC PLOT
EXPONENTIATED WEIBULL PPCC PLOT
This generates a 3-d plot of ppcc value over the range of
values taken by the 2 shape parameters.
Support for several additional 2-shape parameter distributions
is still being tested.
Enter HELP PPCC PLOT for details.
6) Made some updates to the STANDARDIZE command.
a) LET Y2 = USCORE Y X1 X2
This syntax generates a u-score (i.e., subtract the minimum
and divide by the range). This effectively translates
the variable to a uniform (0,1) scale (much as the z-score
translates to a standard normal scale).
b) LET Y2 = SCALE STANDARDIZE Y X1 X2
This divides by the scale statistic, but does not subtract
the location statistic first.
c) Support was added for additional location and scale
statistics.
Enter HELP STANDARDIZE for details.
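The u-score described in a) can be sketched in a few lines of Python. This is only an illustration of the arithmetic (subtract the minimum, divide by the range) for the ungrouped case; the function name uscore is hypothetical, and the handling of the X1 and X2 group variables is not shown.

```python
def uscore(y):
    """Map the values of y onto the interval [0, 1].

    Each value has the minimum subtracted and is then divided
    by the range (maximum minus minimum).
    """
    lowest, highest = min(y), max(y)
    if highest == lowest:
        raise ValueError("the range is zero, so the u-score is undefined")
    return [(value - lowest) / (highest - lowest) for value in y]


print(uscore([2, 4, 6, 10]))
```

The smallest value always maps to 0 and the largest to 1.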
7) Added the command
LET Y2 = CROSS TABULATE <stat> Y X1 X2
where <stat> is one of approximately 25 statistics.
This command is related to, but different than, the
analysis command CROSS TABULATE. This command stores
the value of the cross tabulated statistic in
each row of Y2 (where Y2 is the same length as the original
array Y). The purpose of this form of the cross tabulation
is to allow the cross tabulated values to be used in
subsequent computations (e.g., to compute statistics not
supported directly by Dataplot).
For more information, enter the following command:
HELP CROSS TABULATE (LET)
In this case, you need to specify the "(LET)" in order to
avoid ambiguity with other CROSS TABULATE commands.
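The idea of storing the cell statistic in every row can be sketched as follows. This is a Python illustration for the mean only; the function name cross_tabulate_mean is hypothetical and is not part of Dataplot.

```python
from collections import defaultdict


def cross_tabulate_mean(y, x1, x2):
    """Return a list the same length as y.

    Row i holds the mean of all y values that share the same
    (x1, x2) cell as row i.
    """
    cells = defaultdict(list)
    for value, a, b in zip(y, x1, x2):
        cells[(a, b)].append(value)
    cell_mean = {key: sum(vals) / len(vals) for key, vals in cells.items()}
    return [cell_mean[(a, b)] for a, b in zip(x1, x2)]


y = [1, 3, 5, 7]
x1 = [1, 1, 2, 2]
x2 = [1, 1, 1, 2]
print(cross_tabulate_mean(y, x1, x2))
```

Because the result lines up row by row with Y, it can be combined with Y in later computations (for example, to form deviations from the cell mean).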
8) Added support for the following new statistics.
a) LET A = INTERQUARTILE RANGE Y
For more information, enter the following command:
HELP INTERQUARTILE RANGE
9) Added the following commands:
LET A = COMMON DIGITS Y
LET A = NUMBER OF COMMON DIGITS Y
These commands return the common digits, and the number of
common digits, of a vector of numbers. For example, given
the numbers 3.214, 3.216, 3.217, and 3.219, the common digits
are 3.21 and the number of common digits is 2. The common digits
are tested to the RIGHT of the decimal point only (although
Dataplot does include the portion to the left of the decimal
point when returning the value of the common digits). If the
numbers do not match in their integer portion, Dataplot does
not return any common digits. This is a convenience command
that was added to simplify some macros we were writing.
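The common digits rule can be sketched in Python as shown below. This is an illustration under stated assumptions, not Dataplot's algorithm: the function name common_digits and the decimals argument (the number of decimal places compared) are hypothetical, and returning None when the integer portions differ is simply one way to represent "no common digits".

```python
import os


def common_digits(values, decimals):
    """Return (common value, number of common digits).

    Only digits to the right of the decimal point are compared.
    If the integer portions differ, (None, 0) is returned.
    decimals must be at least 1.
    """
    parts = [f"{v:.{decimals}f}".split(".") for v in values]
    if len({integer for integer, _ in parts}) != 1:
        return None, 0
    # Longest run of leading fractional digits shared by all values.
    prefix = os.path.commonprefix([fraction for _, fraction in parts])
    integer = parts[0][0]
    common = float(f"{integer}.{prefix}") if prefix else float(integer)
    return common, len(prefix)


print(common_digits([3.214, 3.216, 3.217, 3.219], decimals=3))
```

For the four numbers in the example above, the sketch reports 3.21 as the common digits and 2 as the number of common digits.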
10) Added the following command:
LET Y = MATCH X VAL
LET Z2 = MATCH X VAL Z
This command matches each value in VAL against X. For the
first syntax, it returns the index of the X array where the
match was found. A match is that value that is closest in
absolute value (i.e., an exact match is not required, so
a match will always be returned). For the second syntax,
the index is used to extract the value in Z corresponding to
the matched index. This second syntax in fact implements the
most common use of this command (i.e., the index is usually
not of interest in itself, rather it is used to extract
appropriate values from another variable).
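Both forms of the MATCH command can be sketched in Python. This is an illustration only. It assumes that "closest" means the smallest absolute difference, that ties go to the first such element, and that indices are counted from 1; the function names match_index and match_value are hypothetical.

```python
def match_index(x, val):
    """For each value in val, return the 1-based index of the nearest x."""
    return [
        min(range(len(x)), key=lambda i: abs(x[i] - v)) + 1
        for v in val
    ]


def match_value(x, val, z):
    """For each value in val, return the element of z at the matched index."""
    return [z[index - 1] for index in match_index(x, val)]


x = [1.0, 2.0, 3.0]
z = [10, 20, 30]
print(match_index(x, [2.2, 0.4]))
print(match_value(x, [2.2, 0.4], z))
```

Since an exact match is not required, every value in VAL receives an index, even when it lies outside the range of X.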
11) A few bug fixes were made. In particular,
a) The ANDERSON DARLING WEIBULL TEST was modified slightly.
You no longer get an error message if the GAMMA parameter
is not specified. This GAMMA was not actually being used.
The command now does the following:
i) If no GAMMA (shape parameter) or BETA (scale parameter)
has been predefined, maximum likelihood estimates are
computed automatically.
ii) If GAMMA and BETA are pre-defined, then the test is
based on these values. This allows you to test the goodness
of fit for parameter values obtained by methods other than
maximum likelihood.
b) Made a few fixes in the SINGLE SAMPLE ACCEPTANCE PLAN
command. Specifically, it now requires P1 < P2. In addition,
a maximum number of iterations has been added to detect
convergence problems (although this is usually caused by P1 > P2).
Also modified the documentation for this command to provide
more realistic examples.
c) Fixed some bugs in the GD device driver (JPEG and PNG
support).
d) The COLUMN LIMITS command now works with READ STRING
(when the string is read from a file).
e) The output for a number of confirmatory tests was modified
for clarity. Note that the underlying computations were
not modified, just the presentation of the output.
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT February-April 2001.
----------------------------------------------------------------------
1) The online help files have been substantially updated.
Specifically, the additions over the last three years are
now (mostly) incorporated into the help files and the
web documentation.
2) Added support for generating JPEG and PNG image formats.
Enter HELP GD for details. These device drivers are dependent
on several external libraries, so support may not be
available on all platforms.
3) Added the following command:
CHARACTER AUTOMATIC SIGN <varname>
This is similar to the CHARACTER AUTOMATIC command. However,
it makes the character "+", "-", or "0" depending on the
sign of the value in <varname>. This is sometimes useful
when writing macros for design of experiment applications.
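The mapping from values to plot characters can be pictured with the following Python sketch. The function name sign_characters is hypothetical, and the sketch shows only the sign rule, not how Dataplot assigns characters to traces.

```python
def sign_characters(values):
    """Return "+", "-", or "0" for each value according to its sign."""
    return ["+" if v > 0 else "-" if v < 0 else "0" for v in values]


# Coded factor levels as they might appear in a 2-level design
# with a center point.
print(sign_characters([-1, 0, 1]))
```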
4) PROBE is used to determine the current value of internal
Dataplot variables. Added the following values that can
now be accessed with PROBE.
FX1MIN
FX1MAX
FY1MIN
FY1MAX
GX1MIN
GX1MAX
GY1MIN
GY1MAX
DX1MIN
DX1MAX
DY1MIN
DY1MAX
The FX1MIN, FX1MAX, FY1MIN, FY1MAX define the current
axis limits, DX1MIN, DX1MAX, DY1MIN, DY1MAX define the
current data limits, and GX1MIN, GX1MAX, GY1MIN, GY1MAX
are the current "fixed" limits (i.e., limits set by the
LIMITS command).
The most common use is to PROBE the values for FX1MIN,
FX1MAX, FY1MIN, and FY1MAX to determine the current
axis limits. This can sometimes be useful when writing
complex macros. For example,
PLOT SIN(X) FOR X = 0 0.1 6
PROBE FX1MIN
LET XAXISMIN = PROBEVAL
PROBE FX1MAX
LET XAXISMAX = PROBEVAL
PROBE FY1MIN
LET YAXISMIN = PROBEVAL
PROBE FY1MAX
LET YAXISMAX = PROBEVAL
5) Added the following command:
LET Y2 = STANDARDIZE Y
LET Y2 = STANDARDIZE Y X1
LET Y2 = STANDARDIZE Y X1 X2
This command standardizes a variable, Y, based on either
no groups, one group, or two groups. You can standardize
for both mean and standard deviation or just by the mean.
By standardize, we mean subtract the mean and divide by the
standard deviation. Alternative measures for location and
scale are allowed. For details, enter
HELP STANDARDIZE
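The default form of standardization (subtract the mean, divide by the standard deviation) can be sketched in Python for the no group and one group cases. This is an illustration, not Dataplot's code: the function name standardize is hypothetical, and the use of the sample standard deviation (n - 1 divisor) is an assumption.

```python
from statistics import mean, stdev


def standardize(y, group=None):
    """Standardize y, optionally within the levels of a group variable.

    Each value has its group mean subtracted and is divided by its
    group sample standard deviation. Every group needs at least two
    values and a non-zero standard deviation.
    """
    if group is None:
        group = [0] * len(y)  # treat all rows as one group
    by_group = {}
    for value, g in zip(y, group):
        by_group.setdefault(g, []).append(value)
    stats = {g: (mean(vals), stdev(vals)) for g, vals in by_group.items()}
    return [(value - stats[g][0]) / stats[g][1] for value, g in zip(y, group)]


print(standardize([1, 2, 3]))
print(standardize([1, 2, 3, 10, 20, 30], group=[1, 1, 1, 2, 2, 2]))
```

In the grouped call the two groups have very different scales, yet both standardize to the same values because each is centered and scaled separately.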
6) By default, the size of characters in subscripts or superscripts
is set to 1/2 the current character size. You can set the
scale factor using the following commands:
SET SUPERSCRIPT VERTICAL SCALE <value>
SET SUPERSCRIPT HORIZONTAL SCALE <value>
These set the height and width of the character respectively.
7) The CAPABILITY command was significantly enhanced. Enter
HELP CAPABILITY
for details.
8) Support was added for orthogonal distance regression. Enter
HELP ORTHOGONAL DISTANCE FIT
for details.
9) Support was added for consensus means using Mandel-Paule,
modified Mandel-Paule, Vangel-Rukhin (maximum likelihood),
Schiller-Eberhardt, and bounds on bias (BOB) methods. Enter
HELP CONSENSUS MEANS
for details.
10) Some bugs were fixed.
In particular, diagrammatic graphics drawn in data units rather
than screen units (e.g., DRAWDATA, MOVEDATA) were not drawn
correctly for log scales. This has been fixed. An error
message is printed if a WEIBULL or NORMAL axis scale is detected.
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT January 2000.
----------------------------------------------------------------------
1) Added the following commands.
a) LEGEND <numb> UNITS <DATA/SCREEN>
This command allows legend coordinates to be interpreted
in either the screen 0 to 100 units (SCREEN, the default) or in
units of the plot (DATA).
b) ...LABEL OFFSET <value>
...LABEL JUSTIFICATION <value>
These commands allow you to set the horizontal offset
(in Dataplot 0 to 100 screen units, the LABEL DISPLACEMENT
allows you to set the vertical offset) and justification of
the axis labels. These commands were motivated by some
of the new multiplots discussed below. However, they
can be used at any time (although usage should be rare).
c) You can use CR() in text strings to start a new line.
Up to 10 lines may be entered, although more than 3 lines
is rare. Each of the lines uses the same plot attributes
(e.g., all left justified or all center justified).
This applies to both hardware and software fonts and
is used for all types of text. The most common usages
are to create multiline titles and legends and to use
multiple lines with alphabetic tic mark labels.
d) By default, the Dataplot HISTOGRAM and FREQUENCY POLYGONS
range from -6 to +6 standard deviations from the mean.
Although in most cases, this is more than adequate,
Dataplot did not warn you if points were found outside
this range. Dataplot now flags the number of points
outside this range (separate messages for points below
and points above). No message is printed if all points
are within the range. The CLASS LOWER and CLASS UPPER
commands can be used if you need to widen the range.
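The check for points outside the default class range can be sketched in Python. This is an illustration only. The function name count_outside_default_range is hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def count_outside_default_range(y, width=6.0):
    """Count points below and above mean -/+ width standard deviations."""
    center, spread = mean(y), stdev(y)
    lower = center - width * spread
    upper = center + width * spread
    n_below = sum(1 for value in y if value < lower)
    n_above = sum(1 for value in y if value > upper)
    return n_below, n_above


# 99 zeros and one extreme value: mean 10, standard deviation 100,
# so the default range is -590 to 610 and the value 1000 falls above it.
data = [0] * 99 + [1000]
print(count_outside_default_range(data))
```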
e) Dataplot now supports row labels and variable labels.
Row labels are strings of up to 32 characters that
are used to identify a row of the data. To define
the row label, do something like the following:
SKIP 25
COLUMN LIMITS 1 19
READ ROW LABELS AUTO79.DAT
COLUMN LIMITS 20 132
READ AUTO79.DAT Y1 TO Y12
The COLUMN LIMITS are almost always used when reading the
row labels. Typically, you read a file once for the
numeric data and then a second time for the row labels.
Currently, the use of row labels is only supported
with the CHARACTER command (see below). However, we
anticipate additional usage of this feature in future
updates.
A long label (up to 52 characters) can be associated with a
variable name (which is currently limited to 8 characters).
Variable labels are specified with (note that the variable
name must already be defined).
VARIABLE LABEL <var name> <var label>
The label may contain spaces. Variable labels are currently
supported in three ways:
i) Some of the new multi-plotting commands (discussed
below) automatically make use of variable labels.
ii) You can use the "^" to substitute a variable label
for a variable name in text strings. For example,
LET Y = NORMAL RAND NUMBERS FOR I = 1 1 100
VARIABLE LABEL Y NORMAL RANDOM NUMBERS
Y1LABEL ^Y
PLOT Y
Previously, Dataplot only supported substitutions
for parameters and strings. Now, if a variable name
is found, it checks to see if a label has been defined.
If yes, the label is substituted for the variable name.
If not, the variable name is left as is (with the
"^" removed).
iii) The X1LABEL AUTOMATIC and Y1LABEL AUTOMATIC commands
will now substitute the variable label for the variable
name on the x and y axes respectively.
f) The following special options were added for the
CHARACTER command:
ROWID - uses the row number as the plot character
ROWLABEL - uses the row label as the plot character
XVALUE - uses the x-coordinate of the point as the
plot character
YVALUE - uses the y-coordinate of the point as the
plot character
XYVALUE - uses (x-coor,y-coor) as the plot character
TVALUE - uses the tag value as the plot character
(Dataplot assigns a curve-id, the tag,
to each point)
ZVALUE - this is a special form that is specific to
certain commands. For a few commands (currently
the DEX CONTOUR PLOT and the CROSS TABULATE
PLOT, but we expect a few
additional plots to support this form in future
releases), Dataplot writes a numeric value into
an internal array. The value in this array is
used as the plot symbol. Using this with
unsupported plot types may have unpredictable
results (it will depend on what is stored in
the internal array). This option is typically
set automatically by Dataplot in the
background, so currently users should not
set this directly.
The ROWID and ROWLABEL are typically only used for the
PLOT command (i.e., not for HISTOGRAM, etc.). This option
keeps track of any subsetting (i.e., SUBSET/FOR/EXCEPT
clauses on the plot command) when identifying the point.
However, the results may be unpredictable for graphics other
than the PLOT command.
The most common use of this command is to identify specific
points on the plot (typically with the ROWLABEL option).
A typical sequence would be
CHARACTER X
PLOT Y X
PRE-ERASE OFF
LIMITS FREEZE
CHARACTER ROWLABEL
PLOT Y X SUBSET Y > 90
g) The STATISTIC PLOT command now supports the
CORRELATION, RANK CORRELATION, COVARIANCE, and RANK
COVARIANCE cases.
h) The command
SET PARAMETER EXPANSION <NUMERIC/EXPONENTIAL>
was added. This command applies when substituting the
value of a parameter using "^". Normally, this was
intended for putting numeric values in text labels. In this
case, it is desirable to limit the number of digits. However,
when used with the FIT command (parameters you want to remain
constant rather than be fitted are often entered this way),
you may need to specify high precision. If NUMERIC (the
default) is specified, the current algorithm for parameter
substitution is used. If EXPONENTIAL is specified, the
parameter is entered using scientific notation. For example,
(0.123456789012*10**(2))
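The EXPONENTIAL form shown above can be mimicked with the following Python sketch. It is an illustration only: the function name expand_exponential is hypothetical, and the choice of 12 digits after the decimal point is an assumption taken from the example. Values whose mantissa rounds up to 1.0 are not handled specially.

```python
import math


def expand_exponential(value, digits=12):
    """Format value as (0.ddd...*10**(e)) with the mantissa in [0.1, 1)."""
    if value == 0:
        return f"({0:.{digits}f}*10**(0))"
    # Exponent chosen so that 0.1 <= |mantissa| < 1.
    exponent = math.floor(math.log10(abs(value))) + 1
    mantissa = value / 10 ** exponent
    return f"({mantissa:.{digits}f}*10**({exponent}))"


print(expand_exponential(12.3456789012))
```

Because the result is a parenthesized expression, it can be substituted directly into a function expression such as the one given to the FIT command.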
i) The command
SET SORT DIRECTION <ASCENDING/DESCENDING>
was added. This command specifies whether the sorts
performed by SORT and SORTC are ascending or descending
sorts (the default is ascending).
2) The following new plots were added.
a) INTERACTION PLOT Y X1 ... XK
<stat> INTERACTION PLOT Y X1 ... XK
These plot Y versus X1*X2* ... *XK and are primarily intended
for DEX applications. Specifically, it serves as the
building block for the DEX INTERACTION PLOT discussed below.
It is actually the DEX INTERACTION PLOT that is typically
generated by the user. This command supports the same
set of statistics as the STATISTIC PLOT command.
The case of most interest for the DEX plots is 2 X variables,
but these plots will in fact handle an arbitrary number
up to 25.
b) CROSS TABULATE <stat> PLOT Y X1 X2
CROSS TABULATE <stat> PLOT Y1 Y2 X1 X2
CROSS TABULATE PLOT X1 X2
CROSS TABULATE PLOT <stat> X1 X2
This command performs a cross-tabulation on X1 and X2.
It computes the statistic given by <stat> for the response
values (Y) in each cell of the cross tabulation. The list
of supported statistics is the same as for the
STATISTIC PLOT command. Most of the supported statistics
expect a single response variable. A few expect two
(e.g., LINEAR CORRELATION). The COUNT (or NUMBER) statistic
expects no response variable.
The output of this command plots the computed statistic
on the Y axis. The X axis coordinate is determined from
the two group variables in the following way:
i) The levels of the first group variable (X1 in the above
examples) are plotted at 1, 2, 3, etc.
ii) For each level of the group 1 variable, the levels of
the group 2 variable are scaled +/- 0.2 around the
level of the group 1 variable.
For example, if X1 has 2 levels (at 1 and 2) and X2 has
3 levels (1, 2, and 3), then the following x-coordinates
are used:
X1 X2 X-COOR
============================
1 1 0.8
1 2 1.0
1 3 1.2
2 1 1.8
2 2 2.0
2 3 2.2
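The x-coordinates in the table above can be reproduced with a short Python sketch. It is an illustration only: the function name cross_tab_x is hypothetical, and the even spacing of the second group variable across the +/- 0.2 interval is an assumption that agrees with the three-level case shown in the table.

```python
def cross_tab_x(level1, level2, n_level2, half_width=0.2):
    """Return the x-coordinate for a cell of the cross tabulation.

    level1 is the position (1, 2, 3, ...) of the first group variable.
    level2 is the position (1 .. n_level2) of the second group variable,
    spread evenly from level1 - half_width to level1 + half_width.
    """
    if n_level2 == 1:
        return float(level1)
    step = 2 * half_width / (n_level2 - 1)
    return level1 - half_width + (level2 - 1) * step


for level1 in (1, 2):
    for level2 in (1, 2, 3):
        print(level1, level2, round(cross_tab_x(level1, level2, 3), 4))
```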
The syntax CROSS TABULATE PLOT X1 X2 is a special case. It plots
the value of X1 on the X axis and the value of X2 on the
Y axis. The plot character is then set to the count
for that cell (this is done automatically and you do not need
to set the plot character). This form of the plot has
application in the design of experiments.
Note that this command is an extension of the STATISTIC PLOT
command. However, instead of one group variable, there
are two group variables.
The command
SET CROSS TABULATE PLOT DIMENSION <1/2>
can be used to specify an alternative format for this
plot. If "1", then the format of the plot is described
as above. If "2", then the format is similar to the
CROSS TABULATE PLOT X1 X2 format. That is,
SET CROSS TABULATE PLOT DIMENSION 2
CROSS TABULATE MEAN PLOT Y X1 X2
will print the value of the mean of Y at the value of X1 on
the X axis and the value of X2 on the Y axis. Essentially,
this is the tabled values in graphic format. You can
use this format to generate plots where you want to print
a numeric value at (X,Y), that is some value other than
X or Y. You can define a response variable Z with the
desired values to print and then use the CROSS TABULATE
MEAN PLOT (if there is only one value, the mean is equal
to that value).
c) DEX CONTOUR PLOT Y X1 X2 YCONT
This plots a dex contour plot for the case when X1 and X2
have 2 levels (represented by the values -1 and 1). In
addition, one or more center points (X1 and X2 both 0)
may be present. Any points where X1 and X2 are not equal
to -1, 1, or 0 are ignored. The array YCONT contains the
contour levels.
The appearance of the plot is controlled by the settings
of the LINE and CHARACTER command. Specifically,
trace 1 = label for center point and the points
at (-1,-1), (-1,1), (1,1), (1,-1). The
character setting should be ZVAL and line
should be blank.
trace 2 = center point. If no center point was specified,
this point is not generated (and the CHAR and LINE
settings need to be adjusted accordingly).
trace 3 = line connecting (-1,-1), (1,-1), (1,1), (-1,1)
trace 4+= the contour lines start with trace 4. There is
one trace for each value of YCONT.
This command implements the algorithm previously available
in the built-in DEXCONT.DP macro as a Dataplot command.
As an example of this command, you can enter
SKIP 25
READ BOXYIELD.DAT Y X1 X2
LET YCONT = SEQUENCE 50 2 70
CHARACTER ZVAL CIRCLE CIRCLE
CHARACTER FILL OFF ON ON
LINE BLANK BLANK BLANK
DEX CONTOUR PLOT Y X1 X2 YCONT
d) YATES CUBE PLOT Y X1 X2 X3
This plots a Yates cube plot for the case when X1, X2, and
X3 are factor variables with exactly two levels. It plots
the value of the response variable, Y, at each vertex.
This plot is used in 2**(3) factorial and fractional
factorial designs.
3) Dataplot now supports sub-regions on plots. Sub-regions are
motivated by the desire to denote "engineering limits"
on a plot. That is, a rectangle, denoting an acceptance
region in both the X and Y directions, is drawn on the
plot and then the plots are overlaid on top of this.
Although the subregion capability was motivated by the
purpose of denoting engineering limits, subregions can in fact
be used for whatever purpose you want.
The SUBREGION commands are:
SUBREGION <ON/OFF> <ON/OFF> <ON/OFF> ....
SUBREGION XLIMITS <lower value> <upper value>
SUBREGION <id> XLIMITS <lower value> <upper value>
SUBREGION YLIMITS <lower value> <upper value>
SUBREGION <id> YLIMITS <lower value> <upper value>
Up to 10 subregions may be defined. In most applications,
only a single subregion is plotted. The SUBREGION <ON/OFF>
switch determines whether or not the given subregion is
plotted. The SUBREGION XLIMITS/YLIMITS commands specify
the lower and upper bounds of the rectangle. If no
<id> is specified, the limits are set for the first subregion.
If <id> is specified, it should be between 1 and 10.
You do not need to adjust the settings for the CHARACTER, LINE,
BAR, and SPIKE when using subregions. Dataplot automatically
shifts these in the background. The attributes of the SUBREGION
are defined by:
REGION FILL <ON/OFF>
REGION COLOR <COLOR>
REGION BORDER LINE <linetype>
REGION BORDER COLOR <color>
The REGION FILL and REGION COLOR determine the attributes of
the interior of the rectangle. The two most common choices
are to leave it blank or to fill it with some type of light gray
scale color. The attributes of the box border are set with
the REGION BORDER LINE and REGION BORDER COLOR commands. The
standard line types (BLANK, SOLID, DASH, DOTTED, etc.) are
supported. Although only one setting was given above, if you
have defined multiple subregions, then you should define
multiple settings in the above command.
A typical sequence of commands would be
SUBREGION ON
SUBREGION XLIMITS 0.35 0.42
SUBREGION YLIMITS 2000 3000
REGION FILL ON
REGION BORDER LINE DASH
REGION COLOR G90
PLOT ....
SUBREGION OFF
Some points to note about subregions are:
a) The subregions are plotted before any of the plot
curves. The significance of this is that a solid filled
subregion will be drawn and then the regular plot points
are drawn on top. The effect of this can be hardware
dependent. On X11 and Postscript devices, a solid character
can be seen on top of a light gray scale box (if the gray
scale gets too dark, the plot points are no longer
distinguishable). However, on some hardware devices, you may
not be able to see points plotted on top of a solid fill
region. In this case, plot the border of the subregion and
leave the interior blank.
It is this order of plotting that distinguishes the
subregion from simply using a BOX <id> command to plot
rectangular regions on the screen.
b) Although most commonly used with the PLOT command, subregions
can in fact be used with any Dataplot graphics command.
c) Currently, only rectangular subregions are supported.
We expect that to be generalized to polygonal regions
in the future.
4) Dataplot now saves the following internal parameters after
all plots (not just those generated with PLOT):
PLOTCORR - correlation of the X and Y coordinates on the plot
PLOTCOR1 - correlation of the X and Y coordinates on the plot
with a tag value of 1. This can be useful for
plots that generate reference lines (which you
do not want included in the correlation computation)
PLOTYMAX - maximum Y coordinate
YMAXINDE - index of the maximum Y coordinate
PLOTYMIN - minimum Y coordinate
YMININDE - index of the minimum Y coordinate
PLOTXMAX - maximum X coordinate
XMAXINDE - index of the maximum X coordinate
PLOTXMIN - minimum X coordinate
XMININDE - index of the minimum X coordinate
NACCEPT - number of plot points inside the first subregion
(0 if no subregions defined)
NREJECT - number of plot points outside the first subregion
(0 if no subregions defined)
NTOTAL - number of plot points (NACCEPT + NREJECT)
(0 if no subregions defined)
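The NACCEPT, NREJECT, and NTOTAL counts can be pictured with the following Python sketch. It is an illustration only: the function name count_accept_reject is hypothetical, and treating points that lie exactly on the subregion border as inside is an assumption.

```python
def count_accept_reject(x, y, xlimits, ylimits):
    """Return (naccept, nreject, ntotal) for a rectangular subregion.

    xlimits and ylimits are (lower, upper) pairs. A point is accepted
    when both of its coordinates fall within the limits (borders included).
    """
    naccept = sum(
        1
        for xi, yi in zip(x, y)
        if xlimits[0] <= xi <= xlimits[1] and ylimits[0] <= yi <= ylimits[1]
    )
    ntotal = len(x)
    return naccept, ntotal - naccept, ntotal


# Limits borrowed from the SUBREGION example: 0.35 to 0.42 and 2000 to 3000.
x = [0.36, 0.40, 0.30]
y = [2500, 3500, 2500]
print(count_accept_reject(x, y, (0.35, 0.42), (2000, 3000)))
```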
5) The following multiplots were added:
SCATTER PLOT MATRIX Y1 Y2 ... YK
FACTOR PLOT Y X1 ... XK
CONDITIONAL PLOT Y X TAG
a) SCATTER PLOT MATRIX Y1 ... YK
This generates all the pairwise scatter plots of Y1 ... YK
on a single page.
b) FACTOR PLOT Y X1 ... XK
This generates the plots Y VS X1, Y VS X2, .... , Y VS XK
on a single page.
c) CONDITIONAL PLOT Y X TAG
This generates PLOT Y VERSUS X for each unique value in
TAG on a single page.
There are a lot of variations possible with these types of
plots. The basic concept is not limited to
scatter plots. For example, you can generate all the pairwise
bihistograms instead of the pairwise scatter plots. There are
many options in terms of labeling, what plot goes on the
diagonal, and so on.
There are various SET commands that control the appearance
and nature of these plots. Enter
HELP SCATTER PLOT MATRIX
HELP CONDITIONAL PLOT
HELP FACTOR PLOT
for a complete description of what is available.
Two variations of the SCATTER PLOT MATRIX are important enough
to be given special names:
DEX INTERACTION PLOT
YOUDEN MATRIX PLOT
These are described under HELP SCATTER PLOT MATRIX.
6) Fixed the following bugs.
a) The MULTIPLOT SCALE FACTOR did not work correctly with
the software fonts.
b) Entering "character blank", i.e., the blank is in lower case,
plotted BLAN as the plot character when DEVICE 1 FONT SIMPLEX
was used.
c) Using SP() with a software font did not work.
d) The BOX SHADOW OFF command was fixed to set the shadow
height and width to zero rather than to the default.
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT January - July 1999.
----------------------------------------------------------------------
1) Modified the IF command so that if there is an error (e.g., one
of the parameters is not defined), the IF status is set to
FALSE rather than being undefined.
2) Added the following time series commands.
a) Added the command
LET PERIOD = <value>
LET START = <value>
SEASONAL SUBSERIES PLOT Y
A seasonal subseries plot is used to determine if there
is significant seasonality in a time series. Instead of
a straight time order plot, it splits the plot into
the corresponding seasons (or periods). For example, for
monthly data, all the January values are plotted, then all
the February values, and so on. Reference lines are drawn
at the seasonal means.
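The grouping behind a seasonal subseries plot can be sketched in Python. It is an illustration only: the function name seasonal_subseries is hypothetical, and reading START as the season of the first observation is an assumption.

```python
def seasonal_subseries(y, period, start=1):
    """Split a series into one subseries per season.

    Returns (groups, means), where groups maps each season number
    (1 .. period) to its values in time order and means maps each
    non-empty season to the mean of its values.
    """
    groups = {season: [] for season in range(1, period + 1)}
    for i, value in enumerate(y):
        season = (start - 1 + i) % period + 1
        groups[season].append(value)
    means = {s: sum(vals) / len(vals) for s, vals in groups.items() if vals}
    return groups, means


# Two years of quarterly data.
groups, means = seasonal_subseries([1, 2, 3, 4, 5, 6, 7, 8], period=4)
print(groups)
print(means)
```

Plotting each group in turn, with a horizontal line at its mean, gives the layout described above.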
b) Added the command
LET PERIOD = <value>
LET STLWIDTH = <value>
LET STLSDEG = <0/1>
LET STLTDEG = <0/1>
LET STLROBST = <0/1>
SEASONAL LOWESS Y (or SEASONAL LOESS Y)
READ DPST1F.DAT SEAS TREND
The SEASONAL LOWESS command decomposes a time series into
trend, seasonal, and residual components using techniques
based on locally weighted least squares. That is,
X(t) = TREND(t) + SEAS(t) + RES(t)
The seasonal and trend components are written to the file
DPST1F.DAT (dpst1f.dat on Unix systems) and can be read
back into Dataplot for further plotting and analysis. The
internal variable RES contains the residual component and
the internal variable PRED contains the trend plus the
seasonality component.
The SEASONAL LOWESS command accepts a number of options
which can be defined by the LET commands above. The most
important is the PERIOD parameter which identifies the number
of seasons (e.g., 12 for monthly data). The STLWIDTH
parameter identifies the number of data points to use
in the LOWESS steps and defaults to N/10. It is similar
to specifying the LOWESS FRACTION for standard LOWESS
smoothing. The more points used, the more smoothing that
occurs. The STLSDEG and STLTDEG parameters identify the
polynomial degree used in the lowess for the seasonal and
trend components respectively. By default, the seasonal
lowess performs some robustness iterations. Enter
LET STLROBST = 1 to suppress this.
This technique is described in
Cleveland, Cleveland, McRae, and Terpenning, "STL: A
Seasonal-Trend Decomposition Procedure Based on Loess",
Statistics Research Report, AT&T Bell Laboratories.
c) Added an ARIMA modeling capability. The command is:
ARMA Y AR DIFF MA SAR SDIFF SMA SPERIOD
where
Y = the response variable
AR = the order of auto-regressive terms
DIFF = number of differences to apply. DIFF is typically
0, 1, or 2. Differencing is one technique for
removing trend.
MA = order of the moving average terms
SAR = order of seasonal auto-regressive terms.
SDIFF = number of seasonal differences to apply. It is
typically 0, 1, or 2.
SMA = order of seasonal moving average terms.
SPERIOD = period for seasonal terms. It defaults to 12
(if a seasonal component is included).
If there is no seasonal component, the last 4 terms may be
omitted.
To minimize the amount of screen output, but also to
keep the maximum amount of information, Dataplot writes
most of the output to files. Specifically,
dpst1f.dat - the parameters and the standard deviations
of the parameters from the ARMA fit. The
order is:
1) Autoregressive terms
2) Seasonal autoregressive terms
3) Mean term
4) Moving average terms
5) Seasonal moving average terms
dpst2f.dat - this file contains:
1) Row number
2) Original series (i.e., Y)
3) Predicted values
4) Standard deviation of predicted values
5) Residuals
6) Standardized residuals
dpst3f.dat - Intermediate output from iterations before
convergence. This is generally useful if
the ARMA fit does not converge.
dpst4f.dat - The parameter variance-covariance matrix.
dpst5f.dat - The forecast values for (N/10)+1 observations
ahead. Specifically,
1) The forecasted values
2) The standard deviation of the forecasted
values.
3) The lower 95% confidence band for the
forecast.
4) The upper 95% confidence band for the
forecast.
Dataplot allows you to define the starting values by
defining the variable ARPAR. The order of the parameters
is as given for the file dpst1f.dat above. By default,
all parameters are set to 1 except for the mean term which
is set to 0.
In addition, you can define the variable ARFIXED to fix
certain parameters to their start values. That is, you
define ARPAR to specify the start values. If the
corresponding element of ARFIXED is zero, the parameter is
estimated as usual. If ARFIXED is one, then the parameter
is fixed at the start value. The most common use of this
is to set certain parameters to zero. For example, if
you fit an AR(2) model and you want the AR(1) term to be
zero, you could enter the following:
LET ARPAR = DATA 0 1
LET ARFIXED = DATA 1 0
Dataplot uses the STARPAC library (developed by
Janet Rogers and Peter Tryon of NIST) to compute the
ARIMA estimates.
ARIMA modeling is covered in many time series texts. It is
beyond the scope of this news file to discuss ARIMA modeling.
However, to use ARIMA models, it is generally recommended
that the series be at least 50 observations long. In addition,
if the series is dominated by the trend and seasonal factors,
an explicit trend, seasonal, and random component decomposition
method, such as the seasonal lowess described above, is
generally preferred to an explicit ARIMA model.
3) Added support for location and scale parameters for an additional
15 distributions. Entering the command
LIST DISTRIBU.
will list the distributions table. This table shows which
distributions support location and scale parameters.
4) Added the following statistics:
Added the CNPK capability index statistic:
LET LSL = <value>
LET USL = <value>
LET A = CNPK Y
This statistic is now also supported for the following plots:
LET LSL = <value>
LET USL = <value>
CNPK PLOT Y X
DEX CNPK PLOT Y
The LSL and USL specify the lower specification and upper
specification engineering limits. The CNPK is a variant of the
CPK capability indices used for non-normal data and is defined as:
CNPK = MIN(A,B)
where
A = (USL-MEDIAN)/(P(0.995)-MEDIAN)
B = (MEDIAN-LSL)/(MEDIAN-P(0.005))
P(0.995) and P(0.005) are the 99.5 and 0.5 percentiles of the
data respectively.
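As an illustration of the CNPK definition, the following Python sketch computes MIN(A,B) from raw data. It is not Dataplot's code. The helper names are hypothetical, and the percentile rule (linear interpolation between order statistics) is an assumption; Dataplot may use a different percentile convention, which would change the result slightly.

```python
from statistics import median

def percentile(data, p):
    """Percentile for 0 <= p <= 1 by linear interpolation between
    order statistics (one of several common conventions)."""
    xs = sorted(data)
    pos = p * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

def cnpk(data, lsl, usl):
    """CNPK = MIN(A, B) as defined in the text above.

    Fails with a division by zero if either extreme percentile
    coincides with the median.
    """
    med = median(data)
    a = (usl - med) / (percentile(data, 0.995) - med)
    b = (med - lsl) / (med - percentile(data, 0.005))
    return min(a, b)
```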
Added the geometric mean and standard deviation and the
harmonic mean statistics.
LET A = GEOMETRIC MEAN Y
LET A = GEOMETRIC STANDARD DEVIATION Y
LET A = HARMONIC MEAN Y
These statistics are now also supported for the following plots:
GEOMETRIC MEAN PLOT Y X
GEOMETRIC STANDARD DEVIATION PLOT Y X
HARMONIC MEAN PLOT Y X
BOOTSTRAP GEOMETRIC MEAN PLOT Y X
BOOTSTRAP GEOMETRIC STANDARD DEVIATION PLOT Y X
BOOTSTRAP HARMONIC MEAN PLOT Y X
JACKNIFE GEOMETRIC MEAN PLOT Y X
JACKNIFE GEOMETRIC STANDARD DEVIATION PLOT Y X
JACKNIFE HARMONIC MEAN PLOT Y X
The geometric mean is defined as:
XGM = (PRODUCT(Xi))**(1/N)
The geometric standard deviation (SD means standard deviation of)
is defined as:
XSD = EXP(SD(LOG(Xi)))
The harmonic mean is defined as:
XHM = N/SUM(1/Xi)
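The three definitions above can be checked with a short Python sketch. This is an illustration only, not Dataplot's implementation. It assumes all values are positive and that SD is the sample standard deviation (N-1 divisor). The geometric mean is computed through logarithms, which is algebraically the same as the product formula but avoids overflow.

```python
import math
from statistics import stdev

def geometric_mean(x):
    # (PRODUCT(Xi))**(1/N), computed as EXP(MEAN(LOG(Xi)))
    return math.exp(sum(math.log(v) for v in x) / len(x))

def geometric_sd(x):
    # EXP(SD(LOG(Xi))), with SD assumed to use the N-1 divisor
    return math.exp(stdev([math.log(v) for v in x]))

def harmonic_mean(x):
    # N/SUM(1/Xi)
    return len(x) / sum(1.0 / v for v in x)
```

For the values 1, 2, 4 the geometric mean is 2, the geometric standard deviation is 2, and the harmonic mean is 3/1.75.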
5) Added the Wilks-Shapiro test for normality. The following
commands are equivalent.
WILKS SHAPIRO NORMALITY TEST Y
WILKS SHAPIRO TEST Y
WILKS SHAPIRO Y
There must be at least 3 values in Y. The computed significance
level is not necessarily valid for N >= 5,000. This command
uses algorithm R94 from the Applied Statistics Journal.
6) Added the studentized range CDF and PPF functions.
LET A = SRACDF(X,V,R)
LET A = SRAPPF(P,V,R)
where V is the degrees of freedom and R is the number of
samples. X must be positive, V must be >= 1, and
R must be >= 2. For most applications, R = V + 1. The PPF
function is only supported for values in the range 0.90 to 0.99.
The studentized range is defined as:
Q = Range/(Standard deviation)
The studentized range is used in constructing confidence intervals
and significance levels for tests for multiple comparison in
analysis of variance problems.
7) Updated the Weibull maximum likelihood estimates to support
censored data (both type 1 and type 2 and multiple). It also now
generates confidence intervals for the estimate (for various
significance levels). The command
SET CENSORING TYPE <NONE/1/2/MULTIPLE>
defines the censoring type. The EXPONENTIAL MLE output was
modified to be more readable and consistent with the Weibull
output.
8) Added the following quality control commands.
a) Added the following command to generate binomial based single
sample acceptance plans:
SINGLE SAMPLE ACCEPTANCE PLOT P1 P2 ALPHA BETA
where
P1 = Acceptable Quality Level
P2 = Lot Tolerance Percent Defective
ALPHA = Probability of a Type I error
BETA = Probability of a Type II error
b) Added a command to generate the average run length for the
cumulative sum (cusum) control chart. The average run length
is the average number of observations that are entered
before the system is declared out of control.
LET S0 = <value>
LET K = <value>
LET H = <value>
These commands set parameters required by the cusum ARL
calculation. Specifically,
S0 = start-up value for the cumulative sum. This is
usually zero. However, it can be set to a
positive initial value for a fast initial
response (FIR) cusum chart.
H = defines the value which signals that the cusum
is "out of control". A value of 5 is a common
choice.
K = the value of k is set to one half of the smallest
shift in location (in standard deviation units)
that you want to detect. A common choice is a
1-sigma shift, that is k = 0.5.
LET Y = ONE-SIDED CUSUM ARL DELTA
LET Y = CUSUM ARL DELTA
where DELTA defines the difference between the target value
of the process and the true value of the process. This is
a variable that is usually defined to be a sequence of values.
For example,
LET DELTA = SEQUENCE 0 .01 0.5
That is, this command returns the average run length for
a series of values that define the difference between the
target value and the true value of the process.
A typical sequence of commands would be
LET K = 0.5
LET H = 5
LET S0 = 0
LET DELTA = SEQUENCE 0 .01 1.0
LET Y = CUSUM ARL DELTA
PLOT Y DELTA
This command was implemented using Applied Statistics
algorithm 258. If unreasonable values are specified for the
parameters, this algorithm can generate unreasonable results.
9) Added the following commands:
ANOP LIMITS <low> <high>
PROPORTION CONFIDENCE LIMITS Y
DIFFERENCE OF PROPORTION CONFIDENCE LIMITS Y1 Y2
to generate a confidence interval for proportions and the
difference of two proportions respectively. The ANOP
LIMITS command is used to define the lower and upper bounds
that define a success. The confidence intervals are based
on the direct binomial computations, not the normal
approximation, so they are not limited by small N.
10) Added the command
WEB HANDBOOK <keyword>
This command accesses the NIST/SEMATECH Engineering Statistics
Handbook. A beta version of the Handbook will be released
May, 1999 (http://www.itl.nist.gov/div898/handbook/).
The <keyword> is matched against a file of keywords to
go to the appropriate location in the handbook. This
command is used primarily by the Dataplot GUI, but it can
also be entered by the end-user. If you want to see a list
of the supported keywords, enter
LIST HANDBK.TEX
The handbook provides tutorial information on many common
engineering statistical capabilities. This complements the
WEB HELP command, which accesses the on-line Dataplot Reference
Manual. The on-line Reference Manual is primarily concerned
with how you implement a statistical technique while the
Handbook provides more of a statistical tutorial.
If your site has downloaded the Handbook, enter a command
like the following:
SET HANDBOOK URL http://ketone.cam.nist.gov/cf/handbook/
to define the home directory for the handbook.
The web commands SET BROWSER and SET NETSCAPE OLD apply to
the WEB HANDBOOK as well. SET BROWSER defines the browser
and SET NETSCAPE OLD allows you to use a currently open
browser for the WEB HANDBOOK command. These commands are
discussed in more detail later in this news file.
11) Added the following non-parametric tests.
a) The following are non-parametric alternatives to the
2-sample t test (i.e., test the hypothesis U1 = U2 where U1
and U2 are the population means for 2 samples).
SIGN TEST Y1 Y2
SIGN TEST Y1 Y2 D0
SIGN TEST Y1 MU
SIGNED RANK TEST Y1 Y2
SIGNED RANK TEST Y1 Y2 D0
SIGNED RANK TEST Y1 MU
RANK SUM TEST Y1 Y2
RANK SUM TEST Y1 Y2 D0
where Y1 and Y2 are the response variables and D0 and MU
are parameters. Specify D0 to test U1 - U2 = D0. The
SIGN TEST and SIGNED RANK TEST commands can also be used
for the 1-sample test U1 = MU.
The SIGN TEST and SIGNED RANK TEST commands only apply to
paired samples. The RANK SUM TEST command does not require
equal sample sizes.
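To make the paired sign test concrete, here is a minimal Python sketch. It is not Dataplot's implementation. The function name, the dropping of zero differences, and the use of an exact two-sided binomial p-value are assumptions made for this illustration.

```python
from math import comb

def sign_test(y1, y2, d0=0.0):
    """Paired sign test of U1 - U2 = D0.

    Returns (n_pos, n_neg, p) where p is an exact two-sided binomial
    p-value. Differences equal to zero are dropped.
    """
    diffs = [a - b - d0 for a, b in zip(y1, y2)]
    n_pos = sum(d > 0 for d in diffs)
    n_neg = sum(d < 0 for d in diffs)
    n = n_pos + n_neg
    k = min(n_pos, n_neg)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_pos, n_neg, min(1.0, 2 * tail)
```

With six pairs that all differ in the same direction, the two-sided p-value is 2/64 = 0.03125.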
b) The following performs the Kruskal-Wallis non-parametric
1-way ANOVA.
KRUSKAL WALLIS Y X
where Y is the response variable and X is the factor
variable.
12) Added the following plot commands:
a) TUKEY MEAN-DIFFERENCE PLOT Y1 Y2
A Tukey mean-difference plot is an enhancement of the
quantile-quantile (q-q) plot. It converts the interpretation
of the q-q plot from the differences around a diagonal line
to the differences around a horizontal line. If T(i) and
D(i) are the vertical and horizontal coordinates for the q-q
plot, the Tukey mean-difference plot is (T(i) - D(i)) versus
(T(i) + D(i))/2. A horizontal reference line is drawn at
zero.
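The transformation from q-q coordinates to mean-difference coordinates can be sketched as follows. This Python fragment is an illustration under two assumptions that are not stated in the text: the two samples have equal size (so the q-q plot is simply the sorted values paired off), and Y1 supplies the vertical coordinate T(i).

```python
def tukey_mean_difference(y1, y2):
    """Return (means, differences) for a Tukey mean-difference plot.

    Assumes equal sample sizes; T(i) and D(i) are the sorted values of
    y1 and y2 (the q-q plot coordinates).
    """
    if len(y1) != len(y2):
        raise ValueError("this sketch requires equal sample sizes")
    t = sorted(y1)
    d = sorted(y2)
    differences = [ti - di for ti, di in zip(t, d)]    # vertical axis
    means = [(ti + di) / 2 for ti, di in zip(t, d)]    # horizontal axis
    return means, differences
```

Points that fall on the diagonal of the q-q plot land on the zero reference line here.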
b) SPREAD LOCATION PLOT Y TAG
The spread-location (s-l) plot is a robust alternative to
the homoscedasticity plot.
Given a response variable Y and a group-id variable X,
the homoscedasticity plot is the group standard deviations
versus the group means. This is a graphical measure of
constant spread across groups.
The s-l plot has the square roots of the absolute value of
the Y(i) minus their group medians on the vertical axis and
the group medians on the horizontal axis. A reference line
connects the group medians.
When setting the LINE and CHARACTER commands, the reference
line is the first trace and the data starts with trace 2
(each group is identified as a unique trace). That is, to
draw the data points as circles and the reference line as a
solid line, do something like the following
CHARACTER CIRCLE ALL
CHARACTER BLANK
LINE BLANK ALL
LINE SOLID
SPREAD LOCATION PLOT Y X
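The plot coordinates for the s-l plot can be sketched in Python as shown below. This is an illustration only, not Dataplot's code, and the function name is hypothetical.

```python
from math import sqrt
from statistics import median

def spread_location(y, x):
    """Return a list of (group median, sqrt(|Y(i) - group median|)) pairs,
    one per observation, in the original order."""
    groups = {}
    for yi, xi in zip(y, x):
        groups.setdefault(xi, []).append(yi)
    medians = {g: median(vals) for g, vals in groups.items()}
    return [(medians[xi], sqrt(abs(yi - medians[xi]))) for yi, xi in zip(y, x)]
```

The first element of each pair is the horizontal coordinate and the second is the vertical coordinate.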
c) RF SPREAD PLOT
The residuals-fitted (r-f) spread plot is a graphical measure
of the goodness of fit. That is, this command is preceded
by some type of fit. It plots percent point (or quantile)
plots of the fitted values minus their mean and the residuals
arranged side by side with a common vertical scale.
The vertical spread of the residuals compared to the vertical
spread of the fitted values gives an indication of how much
of the variation is explained by the fit.
13) Added the following special functions:
a) LET A = ABRAM0(X,ORD)
This computes the Abramowitz function for order ORD.
Currently, ORD can be an integer from 0 to 100.
b) LET A = CLAUSN(X)
This computes the Clausen integral.
c) LET A = DEBEYE(X,ORD)
This computes the Debye function of order ORD. ORD
can be 1, 2, 3, or 4.
d) LET A = EXP3(X)
This computes the cubic exponential integral.
e) LET A = GOODST(X)
This computes the Goodwin and Stanton integral.
f) LET A = LOBACH(X)
This computes Lobachevski's integral.
g) LET A = SYNCH1(X)
LET A = SYNCH2(X)
This computes the synchrotron radiation functions.
h) LET A = STROM(X)
This computes Stromgren's integral.
i) LET A = TRAN(X,ORD)
This computes the transport integrals of order ORD.
ORD can be 2, 3, 4, 5, 6, 7, 8, or 9.
These special functions are computed using ACM algorithm 757.
Formulas for these functions are given in:
Allan MacLeod, "ACM Transactions on Mathematical Software",
Vol. 22, No. 3, September 1996, pp. 288-301.
14) Fixed a bug in the CD command for Unix platforms. The CD command
allows you to set the default directory.
A few other miscellaneous bugs have also been fixed.
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT September - December 1998.
----------------------------------------------------------------------
1) Added the following MATRIX commands:
LET MEAN = MATRIX GROUP MEANS M TAG
LET SD = MATRIX GROUP SD M TAG
LET SPOOL = POOLED VARIANCE-COVARIANCE MATRIX M TAG
The MATRIX GROUP MEANS and MATRIX GROUP SD commands compute
the group means and standard deviations of a matrix.
The POOLED VARIANCE-COVARIANCE MATRIX computes a pooled
variance-covariance matrix.
These commands all operate on a matrix (M) and a group
id variable (TAG). The TAG variable has the same number of
rows as the matrix M. The values of TAG are typically integers
and they identify the group to which the corresponding row
of the matrix belongs.
The MATRIX GROUP MEANS/SD commands return a matrix with the
same number of columns as the original matrix M and with
the number of rows equal to the number of groups identified
by the TAG variable. That is, MEAN(2,3) is the mean
of the third variable of the second group.
The pooled variance-covariance matrix:
SPOOL = (1/SUM(N(i)-1)) * SUM((N(i)-1)*C(i))
where N(i) is the number of elements in group i and C(i)
is the variance-covariance matrix of the rows belonging to
group i. An earlier implementation of this command
works with 2 matrices (and no group id variable). This
version of the command still works. That is, if the second
argument to POOLED VARIANCE-COVARIANCE MATRIX command is
a matrix, it is assumed that there are 2 groups and the
data for each group is stored in a separate matrix. If the
second argument is a variable, it is assumed that it is a
group id variable and the data for all groups are stored
in a single matrix. For the 2 group case, either syntax
will work. For more than 2 groups, only the new syntax
will work.
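The group-id form of the pooled variance-covariance computation can be sketched in Python. This is an illustration, not Dataplot's implementation. It assumes the standard pooled estimate in which each group covariance C(i) is weighted by N(i)-1 and the sum is divided by SUM(N(i)-1), and it requires at least two rows per group. The function names are hypothetical.

```python
def covariance(rows):
    """Sample variance-covariance matrix (N-1 divisor) of a list of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows) / (n - 1)
             for k in range(p)] for j in range(p)]

def pooled_covariance(m, tag):
    """Pooled variance-covariance matrix of matrix m grouped by tag."""
    groups = {}
    for row, g in zip(m, tag):
        groups.setdefault(g, []).append(row)
    p = len(m[0])
    dof = sum(len(rows) - 1 for rows in groups.values())
    pooled = [[0.0] * p for _ in range(p)]
    for rows in groups.values():
        c = covariance(rows)
        weight = len(rows) - 1
        for j in range(p):
            for k in range(p):
                pooled[j][k] += weight * c[j][k]
    return [[value / dof for value in row] for row in pooled]
```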
2) The following control chart enhancements were added:
a) HOTELLING CONTROL CHART Y1 Y2 ... YK GROUP
This command implements a Hotelling multivariate
control chart. Given p response variables, the Hotelling
control chart computes and plots the following for each group:
T-square = n*(xbar - u0)'SINV(xbar - u0)
n is the size of the group, xbar is a vector of the p
sample means for the subgroup, and u0 is a vector of the
p sample means for the entire data set. That is, a 1-sample
Hotelling test is computed to test whether the means for
a given group are equal to the overall sample means.
An upper control limit (there is no lower control limit)
is drawn at the appropriate F statistic for the Hotelling
test. The value of alpha for the F test is chosen so
that alpha/(2*p) = 0.00135. This corresponds to the
3-sigma value for a univariate chart. You can specify
your own control limit, set by whatever criterion that
you deem appropriate, by entering the command:
LET USL = <value>
You can control the appearance of this chart by setting
the lines and character switches. The traces are:
Trace 1 = T-square values
Trace 2 = Zero reference line
Trace 3 = Dataplot calculated control limit
Trace 4 = User specified upper control limit
For example, to draw the T-square values as a solid line
and an X, no zero reference line, the Dataplot control
limit as a dotted line, and no user specified control
limit, enter the commands:
LINE SOLID BLANK DOTTED BLANK
CHARACTER X BLANK BLANK BLANK
b) CUSUM CONTROL CHART Y X
This command implements a mean cumulative sum control
chart.
There are numerous variations on how cusum control
charts are implemented. Dataplot follows the methods
discussed by Thomas Ryan in "Statistical Methods for
Quality Improvement". Dataplot does the following:
i) Positive and negative sums are computed as follows:
SUMH(i) = MAX[0,(z(i) - k) + SUMH(i-1)]
SUML(i) = MAX[0,(-z(i) - k) + SUML(i-1)]
SUMH and SUML have initial values of 0. z(i) is
the z-score of the ith group (that is, the sub-group
mean minus the overall mean divided by the
standard deviation of xbar).
Dataplot plots the negative of SUML. This is to
avoid overlap for the plotting of SUMH and SUML.
SUMH is plotted on the positive scale vertically and
SUML is plotted on the negative scale vertically.
The value of k is set to one half of the smallest
shift in location (in standard deviation units)
that you want to detect. Dataplot by default selects
a 1-sigma shift, that is k = 0.5. To override this,
enter the command
LET K = <value>
ii) By default, Dataplot sets the control limit at
a value of 5. That is, if one of the sums exceeds
5, the process is deemed out of control. To override
the default value, enter the command
LET H = <value>
The value for H is typically between 4 and 5.
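The cusum recursion described in steps i) and ii) can be sketched in Python. This is an illustration of the formulas above, not Dataplot's code. It takes the group z-scores as input (computing them from the raw data is left out), and the function name and return layout are assumptions of this sketch.

```python
def cusum(z, k=0.5, h=5.0):
    """Return (high, low, signals) for a sequence of group z-scores.

    high    = SUMH values
    low     = negated SUML values (as plotted, on the negative scale)
    signals = 0-based indices where either sum exceeds h
    """
    sum_h = 0.0
    sum_l = 0.0
    high, low, signals = [], [], []
    for i, zi in enumerate(z):
        sum_h = max(0.0, (zi - k) + sum_h)
        sum_l = max(0.0, (-zi - k) + sum_l)
        high.append(sum_h)
        low.append(-sum_l)
        if sum_h > h or sum_l > h:
            signals.append(i)
    return high, low, signals
```

With a constant z-score of 1.5 and the default K = 0.5 and H = 5, SUMH grows by 1 per group and first exceeds the limit at the sixth group.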
3) The following command was added:
TOLERANCE LIMITS Y
This computes univariate two-sided tolerance limits for the normal
case and for the distribution free case.
Tolerance limits are a generalization of confidence limits
for the mean. However, instead of a confidence limit for a
single value, it provides confidence limits for the interval
that contains a given percentage of the data (this is called
the coverage). That is, for 90% coverage, we are finding
a confidence interval that contains 90% of the data.
4) Bug fixes:
a) The PP command was fixed for the LAHEY and Microsoft PC
versions of Dataplot.
b) Fixed the RESET VARIABLES command so that it would not
delete parameters, functions, and strings. Note that
RESET DATA still deletes them.
5) Added the percentile statistic:
LET A = <value> PERCENTILE Y
where <value> is a number between 0 and 100.
This statistic is now also supported for the following plots:
LET P100 = <value>
PERCENTILE PLOT Y X
BOOTSTRAP PERCENTILE PLOT Y
JACKNIFE PERCENTILE PLOT Y
PERCENTILE BLOCK PLOT Y
DEX PERCENTILE PLOT Y
The LET P100 = <value> command defines the percentile you
want to compute for all of these plots.
Fixed a small bug in the ...DECILE command.
6) Added the CPM and CC capability index statistics:
LET LSL = <value>
LET USL = <value>
LET TARGET = <value>
LET A = CPM Y
LET A = CC Y
These statistics are now also supported for the following plots:
LET LSL = <value>
LET USL = <value>
LET TARGET = <value>
CPM PLOT Y X
DEX CPM PLOT Y
CC PLOT Y X
DEX CC PLOT Y
The LSL, USL, and TARGET specify the lower specification,
upper specification, and target engineering limits. The
CPM is a variant of the CP and CPK capability indices and
is defined as:
CPM = (USL-LSL)/(6*SQRT(S**2+(XBAR-TARGET)**2))
where XBAR and S are the sample mean and standard deviation.
For this index, the larger the better.
The CC statistic is defined as:
CC = MAX((TARGET-XBAR)/(TARGET-LSL),(XBAR-TARGET)/(USL-TARGET))
For this index, the smaller the better.
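The two indices can be illustrated with a short Python sketch. It is not Dataplot's implementation. It assumes that S is the sample standard deviation (N-1 divisor) and that the second term of CC is scaled by the distance from the target to the upper limit, USL-TARGET, so that both terms are symmetric about the target.

```python
from math import sqrt
from statistics import mean, stdev

def cpm(y, lsl, usl, target):
    """CPM = (USL-LSL)/(6*SQRT(S**2+(XBAR-TARGET)**2)); larger is better."""
    xbar, s = mean(y), stdev(y)
    return (usl - lsl) / (6 * sqrt(s ** 2 + (xbar - target) ** 2))

def cc(y, lsl, usl, target):
    """CC with each side scaled by its own distance from the target
    (an assumption of this sketch); smaller is better."""
    xbar = mean(y)
    return max((target - xbar) / (target - lsl),
               (xbar - target) / (usl - target))
```

For a process centered exactly on the target, CC is 0 and CPM reduces to (USL-LSL)/(6*S).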
7) Added the following commands:
<dist> CHI-SQUARE GOODNESS OF FIT TEST Y
<dist> CHI-SQUARE GOODNESS OF FIT TEST Y X
<dist> CHI-SQUARE GOODNESS OF FIT TEST Y X1 X2
<dist> KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
These commands test whether or not a data set
comes from a specified distribution. All distributions for
which Dataplot can generate a cdf function are supported (there
are 70+ such distributions in Dataplot). The names are identical
to the names used for the PROBABILITY PLOT command.
A couple of notes on these commands:
a) The KOLMOGOROV-SMIRNOV test is not supported for discrete
distributions.
b) The CHI-SQUARE test works with either binned or unbinned
data.
Dataplot supports 2 types of pre-binned data. If your data
has equal sized bins, then the X variable contains the
mid-point of each bin. If your bins may be of different
sizes, then the X1 variable is the lower limit of each
class and X2 is the upper limit of each class. Unequal
bins usually result from combining classes with low expected
frequency.
For unbinned data, Dataplot uses the same rules for binning as it does for the
HISTOGRAM command. That is, the class width is 0.3*S where S
is the standard deviation of Y. The upper and lower limits are
the mean plus or minus 6 times the standard deviation.
The BINNED command generates counts while the RELATIVE BINNED
generates relative frequency.
As with the histogram, you can override these defaults with the
following commands:
CLASS WIDTH <value>
CLASS LOWER <value>
CLASS UPPER <value>
c) You need to specify shape parameters for distributions that
require it. For example,
LET GAMMA = 2
GAMMA CHI-SQUARE GOODNESS OF FIT Y
The parameter names are equivalent to the names used for
the PROBABILITY PLOT command.
Location and scale parameters can be specified generically
for the CHI-SQUARE and KOLMOGOROV-SMIRNOV tests respectively
by entering:
LET CHSLOC = <value>
LET CHSSCALE = <value>
LET KSLOC = <value>
LET KSSCALE = <value>
These are optional.
8) Added the following commands:
2-SAMPLE CHI-SQUARE TEST Y1 Y2
2-SAMPLE KOLMOGOROV-SMIRNOV TEST Y1 Y2
These 2 commands test whether 2 data samples come from a
common (unspecified) distribution. Y1 and Y2 do not need
to be the same size.
9) Updated the TABULATE and CROSS-TABULATE commands. The computed
group id's and the value of the statistic are written to
the file DPST1F.DAT (or dpst1f.dat on Unix). This simplifies
using the results in further analysis. For example, to
compute the group means and store them in a variable, do
something like the following:
TABULATE MEANS Y X
SKIP 1
READ DPST1F.DAT GROUPID YMEANS
SKIP 0
The CROSS-TABULATE is similar, except there are 2 group-id
variables instead of 1.
10) Added the following command:
LET Y2 X = BINNED Y (or LET Y2 X = FREQUENCY TABLE Y)
LET Y2 X = RELATIVE BINNED Y
(or LET Y2 X = RELATIVE FREQUENCY TABLE Y)
Here, Y2 will contain the counts (or frequencies) and X will
contain the bin mid-points.
This command bins your data. It uses the same rules as the
histogram. That is, the class width is 0.3*S where S is the
standard deviation of Y. The upper and lower limits are
the mean plus or minus 6 times the standard deviation.
The BINNED command generates counts while the RELATIVE BINNED
generates relative frequency.
As with the histogram, you can override these defaults with the
following commands:
CLASS WIDTH <value>
CLASS LOWER <value>
CLASS UPPER <value>
The command SET RELATIVE HISTOGRAM <AREA/PERCENT> specifies
whether or not relative binning is computed so that the area
sums to 1 or so that the frequencies sum to 1. The first option,
which is the default, is useful when using the
relative binning as an estimate of a probability distribution.
The second option is useful when you want to see what percentage
of the data falls in a given class.
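The default binning rule (class width 0.3*S, limits at the mean plus or minus 6*S, which gives 40 classes) can be sketched in Python. This is an illustration only. The function name, the treatment of values that fall exactly on a class boundary, and the clamping of out-of-range values into the end classes are assumptions of this sketch, not documented Dataplot behavior.

```python
from statistics import mean, stdev

def bin_data(y, relative=None):
    """Return (frequencies, midpoints) using the default class rules.

    relative=None      -> raw counts
    relative="PERCENT" -> frequencies sum to 1
    relative="AREA"    -> frequency * class width sums to 1
    Requires a non-zero standard deviation.
    """
    center, s = mean(y), stdev(y)
    width = 0.3 * s
    lower = center - 6.0 * s
    nbins = 40  # (12*S) / (0.3*S)
    counts = [0] * nbins
    for v in y:
        idx = int((v - lower) / width)
        idx = max(0, min(idx, nbins - 1))  # clamp into the end classes
        counts[idx] += 1
    mids = [lower + (i + 0.5) * width for i in range(nbins)]
    n = len(y)
    if relative == "PERCENT":
        return [c / n for c in counts], mids
    if relative == "AREA":
        return [c / (n * width) for c in counts], mids
    return counts, mids
```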
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT June - August 1998.
----------------------------------------------------------------------
1) Added the following command:
EMPIRICAL CDF PLOT Y
This generates an empirical CDF plot.
2) Made the following enhancements to the QWIN (the Microsoft
Windows 95/NT version) device driver:
Added support for "true color". Previously, if the user
had true color set for the display, the screen colors were
all black (i.e., you couldn't see the output).
Note that true color is something you set from the
Windows 95/NT control panel, not something that Dataplot
can set automatically. That is, you set true color or
standard VGA mode from the control panel and then you
enter the appropriate Dataplot commands to support that
mode.
a) If you have your display set to true color, enter the
following commands in the C:\DPLOGF.TEX file:
SET QWIN COLOR DIRECT
DEVICE 1 QWIN
Note that the order is significant here. The color model
is set when the QWIN device is initialized, so the
SET QWIN COLOR command must come before the DEVICE 1 QWIN
command. Also, it is recommended that you put these commands
in the DPLOGF.TEX file so that you do not get the initial
blank screen where you cannot see the text that you type.
The command SET QWIN COLOR VGA resets the default.
b) For true color, the QWIN device driver supports the full
complement of colors recognized by Dataplot (enter HELP COLORS
for a description of the Dataplot color model). The default
VGA mode only supports 16 colors.
c) The foreground and background colors for the text window
can now be set for both standard VGA and true color modes.
The following 2 commands, if used, should be entered
after the SET QWIN COLOR <DIRECT/VGA> command and before
the DEVICE 1 QWIN command:
SET QWIN TEXT BACKGROUND COLOR <index>
SET QWIN TEXT FOREGROUND COLOR <index>
where <index> is an integer identifying the desired
color (HELP COLOR gives the index to color mapping in
Dataplot). For VGA mode, <index> is restricted to 0 to
15. For DIRECT mode, <index> is restricted to 0 to 88.
The default for both VGA and DIRECT mode is a white
foreground on a black background. The colors for the
graphics window are set by the normal Dataplot COLOR
commands (e.g., BACKGROUND COLOR BLUE, LINE COLOR RED).
3) Added the following new matrix commands:
The following 2 commands are used to obtain row or column
statistics for a matrix.
LET Y = MATRIX ROW <STAT> M
LET Y = MATRIX COLUMN <STAT> M
where <STAT> is one of: MEAN, MIDMEAN, TRIMMED MEAN,
WINSORIZED MEAN, MEDIAN, SUM, PRODUCT, SD (or STANDARD DEVIATION),
SD OF MEAN, VARIANCE, VARIANCE OF MEAN, RELATIVE VARIANCE,
RELATIVE STANDARD DEVIATION, COEFFICIENT OF VARIATION,
AVERAGE ABSOLUTE DEVIATION, MEDIAN ABSOLUTE DEVIATION, RANGE,
MIDRANGE, MAXIMUM, MINIMUM, EXTREME, LOWER HINGE, UPPER HINGE,
LOWER QUARTILE, UPPER QUARTILE, SKEWNESS, KURTOSIS,
AUTOCOVARIANCE, AUTOCORRELATION.
The following command computes an overall mean for the matrix:
LET A = MATRIX MEAN M
The following command calculates the quadratic form of a
vector and a matrix. The quadratic form is: x'Mx where x
is a vector and M is a matrix. Quadratic forms are used
frequently in multivariate statistical calculations.
LET A = QUADRATIC FORM M X
The following command is a commonly used quadratic form:
LET Y = DISTANCE FROM MEAN M
This command generates:
Di = (Xi - XMEAN)'SINV(Xi-XMEAN)
where Xi is the ith row, XMEAN is a vector of the column
means, and SINV is the inverse of the variance-covariance
matrix. That is, Di is the distance of the ith row of the
matrix from the mean. Note that in the Dataplot command, you
specify the original matrix, not the variance-covariance matrix.
The following command calculates X*X' for the vector X. The
result is a pxp matrix where p is the number of rows of X.
This computation is used in some multivariate analyses.
LET M = VECTOR TIMES TRANSPOSE X
The following command is used to create linear combinations:
LET Y2 = LINEAR COMBINATION M C
If the matrix M has p columns and n rows, C should be a vector
with p rows. This command calculates:
y2 = c(1)*M1 + c(2)*M2 + c(3)*M3 + ... + c(p)*Mp
where M1, M2, ... are the columns of the matrix. The result
is a vector with n rows.
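As a small illustration of the linear combination formula, the following Python sketch treats the matrix as a list of n rows with p values each. It is not Dataplot's code and the function name is hypothetical.

```python
def linear_combination(m, c):
    """Return y2 with y2[i] = c[0]*m[i][0] + ... + c[p-1]*m[i][p-1]."""
    if any(len(row) != len(c) for row in m):
        raise ValueError("c must have one coefficient per column of m")
    return [sum(cj * mij for cj, mij in zip(c, row)) for row in m]
```

For example, a 3 by 2 matrix with rows (1,2), (3,4), (5,6) and coefficients (10,1) gives the vector 12, 34, 56.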
The following commands are used to calculate various distance
matrices:
LET D = EUCLIDEAN ROW DISTANCE M
LET D = EUCLIDEAN COLUMN DISTANCE M
LET D = MAHALANOBIS ROW DISTANCE M
LET D = MAHALANOBIS COLUMN DISTANCE M
LET D = MINKOWSKY ROW DISTANCE M
LET D = MINKOWSKY COLUMN DISTANCE M
LET D = CHEBYCHEV ROW DISTANCE M
LET D = CHEBYCHEV COLUMN DISTANCE M
LET D = BLOCK ROW DISTANCE M
LET D = BLOCK COLUMN DISTANCE M
It is often desirable to scale the original matrix before
calculating a distance matrix. The following commands can
be used to scale the original matrix:
SET MATRIX SCALE <NONE/MEAN/SD/RANGE/ZSCORE>
LET MSCAL = MATRIX ROW SCALE M
LET MSCAL = MATRIX COLUMN SCALE M
The SET MATRIX SCALE command is used to define the type of
scaling to perform. You can scale either across rows or down
columns.
The following command computes the pooled sample
variance-covariance matrix for two matrices:
LET MOUT = POOLED VARIANCE-COVARIANCE MATRIX MA MB
Note that MA and MB should have the same number of columns.
However, the number of rows can vary.
The following computes a 1-sample Hotelling T-square test:
LET A = 1-SAMPLE HOTELLING T-SQUARE M Y
The 1-sample Hotelling t-square tests the following hypothesis:
H0: U=U0
Here, U0 is a vector of population means. That is, the
hypothesized means for each column of the matrix. In the
above syntax, M is a matrix containing the original data
and Y is a vector containing the hypothesized means. The
returned parameter A contains the value of the Hotelling
T-square test statistic. The critical values corresponding
to alpha = .90, .95, .99, and .995 are saved in the internal
parameters B90, B95, B99, and B995.
The following computes a 2-sample Hotelling T-square test:
LET A = 2-SAMPLE HOTELLING T-SQUARE MA MB
The 2-sample Hotelling t-square tests the following hypothesis:
H0: U1=U2
Here, U1 is a vector of population means for sample 1 and
U2 is a vector of population means for sample 2. In the
above syntax, MA is a matrix containing the original data
for sample 1 and MB is a matrix containing the original data
for sample 2. MA and MB must have the same number of columns.
However, they can have a different number of rows. The
returned parameter A contains the value of the Hotelling
T-square test statistic. The critical values corresponding
to alpha = .90, .95, .99, and .995 are saved in the internal
parameters B90, B95, B99, and B995.
The following 2 commands add or delete rows of a matrix:
LET M = MATRIX ADD ROW M Y
LET M = MATRIX DELETE ROW M ROWID
Here, M is a matrix, Y is a variable with the number of rows
equal to the number of columns in M, and ROWID is a scalar
identifying the row to delete.
4) Fixed a bug in the character fill for the QWIN device
driver (DEVICE 1 QWIN for Windows 95/NT). Removed the line
CHARACTER FILL COLOR from the sample DPLOGF.TEX file (this
caused problems for Postscript output).
5) Added support for SP() in the LET STRING command. SP() will
be converted to a single space. Previously, LET STRING packed
out any spaces in the string.
6) Added the command:
LET Y2 = EXPONENTIAL SMOOTHING Y ALPHA
This performs an exponential smoothing of Y. The formula is:
Y2(1) = Y(1)
Y2(I) = ALPHA*Y(I) + (1-ALPHA)*Y2(I-1), I > 1
ALPHA is the smoothing parameter and should be greater than
0 and less than 1.
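The smoothing recursion can be sketched in Python as follows. This is an illustration, not Dataplot's implementation. It assumes the standard single exponential smoothing form, in which each new value is blended with the previous smoothed value Y2(I-1) rather than with the previous raw value.

```python
def exponential_smoothing(y, alpha):
    """Single exponential smoothing with Y2(1) = Y(1)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha should be greater than 0 and less than 1")
    if not y:
        return []
    y2 = [y[0]]
    for value in y[1:]:
        # Blend the new observation with the previous smoothed value.
        y2.append(alpha * value + (1 - alpha) * y2[-1])
    return y2
```

For Y = 10, 20, 30 and ALPHA = 0.5 this gives 10, 15, 22.5.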
7) The PROBE command is used to return the values of certain
internal parameters and strings. This command was updated
so that the returned value is automatically saved. If the
returned value is an integer or real number, then the value
is stored in the internal parameter PROBEVAL. If the
returned value is a string, then the value is stored in the
internal string PROBESTR. PROBESTR and PROBEVAL can then be
used in the same way as other parameters and strings.
This feature is typically used in macros. For example, you
might want to use the machine maximum value as a "missing
value" indicator. A host independent way of using this value
would now be:
PROBE CPUMAX
LET MACHMAX = PROBEVAL
You could then use the parameter MACHMAX wherever you wanted
to define a missing value.
8) Multiplots create new 0 to 100 coordinate units for each
subplot and character sizes are scaled according to this
new subplot area. Although this is generally desirable,
sometimes the resulting character sizes are too small or
distorted if the rows to columns ratio is too far from 1.
As a convenience, the following command was added to allow
all character sizes to be scaled when multiplotting is
in effect:
MULTIPLOT SCALE FACTOR 3
MULTIPLOT SCALE FACTOR 1 2
In the first syntax, both the height and width sizes are
scaled (by 3 in this example) by the same factor. In the
second syntax, the height and width are scaled separately
(the height by 1 and the width by 2 in this example).
The word FACTOR is optional in the command.
The scale factor is multiplied by the requested size. For
example, if the title size is 2 and the scale factor is 3,
then the effective size will be 6. The scale factor is
ignored if multi-plotting is not in effect.
This command allows character sizes to be easily adjusted
for multiplots without having to enter a number of separate
size commands before the multiplot (and then after the
multiplot to return to normal values).
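The scaling rule is a simple multiplication, as the following Python
sketch shows. It is an illustration only; the function name
effective_size is hypothetical and is not part of Dataplot.

```python
def effective_size(requested, factor, multiplot_on=True):
    """Requested character size times the scale factor.

    The factor is ignored when multiplotting is not in effect.
    """
    return requested * factor if multiplot_on else requested


# MULTIPLOT SCALE FACTOR 3 with a title size of 2
print(effective_size(2, 3))                        # 6
# The same request when no multiplot is active leaves the size alone
print(effective_size(2, 3, multiplot_on=False))    # 2
```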
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT January - May 1998.
----------------------------------------------------------------------
1) Reliability/Extreme Value Updates
a) Added the following commands for finding maximum likelihood
estimates for distribution parameters.
WEIBULL MAXIMUM LIKELIHOOD Y
EXPONENTIAL MAXIMUM LIKELIHOOD Y
DOUBLE EXPONENTIAL MAXIMUM LIKELIHOOD Y
NORMAL MAXIMUM LIKELIHOOD Y
LOGNORMAL MAXIMUM LIKELIHOOD Y
PARETO MAXIMUM LIKELIHOOD Y
GAMMA MAXIMUM LIKELIHOOD Y
INVERSE GAUSSIAN MAXIMUM LIKELIHOOD Y
GUMBEL MAXIMUM LIKELIHOOD Y (or EV1)
POWER MAXIMUM LIKELIHOOD Y
BINOMIAL MAXIMUM LIKELIHOOD Y
POISSON MAXIMUM LIKELIHOOD Y
At this time, only the parameter estimates are computed,
that is no standard errors or confidence intervals for the
estimates are computed.
There are various synonyms for these commands. For example,
WEIBULL MAXIMUM LIKELIHOOD ESTIMATE Y
WEIBULL MAXIMUM LIKELIHOOD Y
WEIBULL MLE ESTIMATE Y
WEIBULL MLE Y
are all equivalent. Similar synonyms apply to the other
commands.
The exponential case is an exception in that it does
print confidence intervals. It also supports type 1 and
type 2 censored data. For example, the full sample case
is:
SET CENSORING TYPE NONE (this is the default)
EXPONENTIAL MLE Y
Type 1 censoring is censoring at a fixed time t0. This
is handled via:
SET CENSORING TYPE 1
LET TEND = <censor time>
EXPONENTIAL MLE Y
If you have data values that are censored before time t0, then
create a TAG variable with 1 for failure times and 0 for
censoring times. You would then enter:
EXPONENTIAL MLE Y TAG
Type 2 censoring is censoring after R failures have been
observed. This case is handled via:
SET CENSORING TYPE 2
EXPONENTIAL MLE Y TAG
where TAG is a variable with 1 for failure times and 0 for
censoring times.
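For reference, the textbook maximum likelihood estimate of the
exponential mean under censoring is the total observed time divided by
the number of failures. The Python sketch below illustrates that
formula only. It is not taken from the Dataplot source, the function
name is hypothetical, and whether Dataplot reports the mean or its
reciprocal (the rate) is not stated here.

```python
def exponential_mle_mean(y, tag=None):
    """MLE of the exponential mean: total observed time / number of failures.

    y   : observed times (failure times or censoring times)
    tag : 1 for a failure time, 0 for a censoring time;
          None means the full sample case (no censoring)
    """
    if tag is None:
        tag = [1] * len(y)
    failures = sum(tag)
    if failures == 0:
        raise ValueError("at least one failure time is required")
    return sum(y) / failures


# Full sample: the estimate is the ordinary sample mean.
print(exponential_mle_mean([2.0, 4.0, 9.0]))
# Censored sample: two failures, one unit still running at time 10.
print(exponential_mle_mean([2.0, 4.0, 10.0], [1, 1, 0]))
```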
Related to this are the commands
DEHAAN Y
CME Y
These generate parameter estimates for the generalized
Pareto distribution for extreme value applications.
b) Added the following commands:
1) LET Y = CUMULATIVE HAZARD X TAG
LET Y = HAZARD X TAG
where X is a list of failure times and TAG is an array
that identifies the value as a failure time (TAG = 1) or
a censoring time (TAG = 0).
2) LET Y = INTERARRIVAL TIMES X
where X is a list of failure times. This is similar to
the SEQUENTIAL DIFFERENCE command in that it calculates
X(I)-X(I-1). However, it sorts the data first and the
first interarrival time is set equal to X(1).
3) LET Y = CUMULATIVE AVERAGE X
LET Y = CUMULATIVE MEAN X
As the name implies, this computes the cumulative mean of
a variable. One use of this is to compute cumulative mean
time between failures for reliability data.
4) LET Y = REVERSE X
LET Y = FLIP X
This reverses the order of a variable (i.e., Y(1)=X(N),
Y(2)=X(N-1), and so on). For example, if you want to
sort from high to low instead of low to high, you can enter
LET Y = SORT X
LET Y = REVERSE Y
5) LET ALPHA = <value>
LET BETA = <value>
LET Y = POWER LAW RANDOM NUMBERS FOR I = 1 1 N
This generates N failure times from a non-homogeneous
Poisson process following the power law. That is,
M(t) = alpha*t**beta alpha, beta > 0
where M(t) is the expected number of failures at time
t. The random failure times are generated from the
formula for the interarrival times (i.e., the CDF for
the waiting time for the next failure given a failure at
time T):
F_T(t) = 1 - EXP(-ALPHA*[(T+t)**BETA - T**BETA])
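The generation scheme for power law failure times, together with the
interarrival and cumulative mean calculations from items 2 and 3, can
be sketched in Python as follows. This is an illustration under stated
assumptions, not the Dataplot implementation: the waiting-time CDF is
inverted directly with a uniform random number, and all function names
are hypothetical.

```python
import math
import random


def power_law_failure_times(n, alpha, beta, seed=0):
    """Simulate n failure times from a power law process, M(t) = alpha*t**beta."""
    rng = random.Random(seed)
    times = []
    t_last = 0.0                      # time of the most recent failure (T)
    for _ in range(n):
        u = rng.random()              # uniform on [0, 1)
        # Solve u = 1 - exp(-alpha*((T+t)**beta - T**beta)) for T + t.
        t_next = (t_last ** beta - math.log(1.0 - u) / alpha) ** (1.0 / beta)
        times.append(t_next)
        t_last = t_next
    return times


def interarrival_times(x):
    """Sort x, then return X(1) followed by the successive differences."""
    s = sorted(x)
    return [s[0]] + [b - a for a, b in zip(s, s[1:])]


def cumulative_mean(x):
    """Running mean of x: element i is the mean of the first i values."""
    out, total = [], 0.0
    for i, value in enumerate(x, start=1):
        total += value
        out.append(total / i)
    return out


times = power_law_failure_times(5, alpha=0.5, beta=1.5)
gaps = interarrival_times(times)
print(times)
print(gaps)
print(cumulative_mean(gaps))
```

Because the first interarrival time is set to X(1), the interarrival
times sum to the largest failure time.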
c) The following 2 plots were added:
KAPLAN MEIER PLOT Y TAG
MODIFIED KAPLAN MEIER PLOT Y TAG
Here, Y is a list of failure times and TAG identifies censored
data. A value of 1 for TAG means that the corresponding Y
value is a failure time and a value of 0 means that the
corresponding Y value was censored. The TAG variable is
optional (if omitted, no censoring is performed).
Kaplan-Meier estimates are discussed in most texts in survival
or reliability analysis. The modified Kaplan-Meier is a
slightly adjusted form of the estimate.
The X axis of the plot is failure time and the Y axis is
an estimate of survival (or reliability). Some analysts
prefer that the Y axis be CDF estimate (i.e., 1 - Survival).
Enter the command
SET KAPLAN MEIER CDF
to specify this (and SET KAPLAN MEIER RELIABILITY to reset it).
If you want the numeric Kaplan Meier estimates, do
KAPLAN MEIER PLOT Y TAG
LET RELI = YPLOT
LET FAILTIME = XPLOT
The variables RELI and FAILTIME can be used in subsequent
commands to do further analysis.
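For readers who want to check the numeric estimates, the standard
product-limit (Kaplan-Meier) calculation can be sketched in Python as
shown below. This is the textbook estimator, not the Dataplot source:
the function name is hypothetical, the modified Kaplan-Meier variant is
not covered, and the plot coordinates Dataplot stores in XPLOT and
YPLOT may be arranged differently.

```python
def kaplan_meier(y, tag=None):
    """Product-limit estimate as a list of (failure time, survival estimate)."""
    if tag is None:
        tag = [1] * len(y)            # no TAG variable: nothing is censored
    data = sorted(zip(y, tag))
    at_risk = len(data)               # units still under observation
    survival = 1.0
    estimate = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [g for (time, g) in data[i:] if time == t]
        failures = sum(tied)          # TAG = 1 marks a failure
        if failures > 0:
            survival *= 1.0 - failures / at_risk
            estimate.append((t, survival))
        at_risk -= len(tied)          # failed and censored units both leave
        i += len(tied)
    return estimate


# Four units; the second one was censored at time 2.
print(kaplan_meier([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1]))
```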
d) The following plots were added:
EXPONENTIAL HAZARD PLOT Y TAG
NORMAL HAZARD PLOT Y TAG
LOGNORMAL HAZARD PLOT Y TAG
WEIBULL HAZARD PLOT Y TAG
Hazard plots are similar to probability plots. However,
they can be used with censored data and are commonly used
in reliability studies.
e) Added the following command:
DUANE PLOT Y
Given a set of failure times T, the Duane plot is
Ti/i (where i is the index from 1 to N) versus Ti on
a log-log scale. You do not need to specify XLOG ON or YLOG ON
as Dataplot does this automatically. Dataplot also resets
the original values for these switches after the Duane plot
is completed.
A line is fit to the plotted data. Various parameters from
the fit are saved as internal parameters (enter
STATUS PARAMETERS after the DUANE PLOT to see what they are).
A typical use would be:
READ FAILURE.DAT Y
Y1LABEL CUMULATIVE MEAN TIME BETWEEN FAILURE
X1LABEL FAILURE TIME
CHARACTER X BLANK
LINE BLANK SOLID
DUANE PLOT Y
JUSTIFICATION CENTER
MOVE 50 7
TEXT SLOPE OF FITTED LINE = ^BETA
MOVE 50 4
TEXT INTERCEPT OF FITTED LINE = ^ALPHA
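The quantities behind the Duane plot can be reproduced with a short
Python sketch: compute Ti/i against Ti, take logarithms, and fit a
least-squares line. This is an illustration under assumptions (base 10
logarithms, ordinary least squares, a hypothetical function name) and
may not match the fit that Dataplot performs internally.

```python
import math


def duane_fit(failure_times):
    """Least-squares line through log10(T_i / i) versus log10(T_i)."""
    t = sorted(failure_times)
    x = [math.log10(ti) for ti in t]
    y = [math.log10(ti / i) for i, ti in enumerate(t, start=1)]
    n = len(t)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return slope, intercept


# Failure times T_i = i**2 give T_i/i = T_i**0.5, a line of slope 0.5.
slope, intercept = duane_fit([i ** 2 for i in range(1, 11)])
print(slope, intercept)
```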
f) The following command was added:
RELIABILITY TRENDS TEST Y
This command is used in reliability applications to determine
if repair times show a significant trend. It computes the
following 3 tests:
a) Reverse Arrangement Test
b) Military Handbook Test
c) Laplace Test
The last 2 tests require the censoring time. This is entered
(before the RELIABILITY TRENDS TEST) as:
LET TEND = <value>
The value of TEND should be greater than the maximum value
of the response variable.
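As an illustration of why TEND is needed, the commonly cited
time-truncated form of the Laplace statistic compares the mean event
time with the midpoint of the observation period. The Python sketch
below uses that textbook form. It is an assumption that this matches
the form Dataplot computes, and the function name is hypothetical.

```python
import math


def laplace_statistic(y, tend):
    """Textbook Laplace trend statistic for event times observed up to tend.

    Values near 0 suggest no trend; the statistic is approximately
    standard normal when there is no trend.
    """
    n = len(y)
    if tend <= max(y):
        raise ValueError("TEND should be greater than the largest time")
    return math.sqrt(12.0 * n) * (sum(y) / (n * tend) - 0.5)


print(laplace_statistic([25.0, 50.0, 75.0], 100.0))   # times centered in the window
print(laplace_statistic([80.0, 90.0, 95.0], 100.0))   # times bunched near the end
```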
Some of the Probability and Recipe updates discussed below are
also relevant to reliability applications.
2) Probability Updates
a) Added optional location and scale parameters for many of the
probability functions.
Specifically, the following functions now support both location
and scale parameters:
CAUCDF, CAUPDF, CAUPPF, CAUSF
DEXCDF, DEXPDF, DEXPPF, DEXSF
DGACDF, DGAPDF, DGAPPF
DWECDF, DWEPDF, DWEPPF
EV1CDF, EV1PDF, EV1PPF
EV2CDF, EV2PDF, EV2PPF
EWECDF, EWEPDF, EWEPPF
EXPCDF, EXPPDF, EXPPPF
FLCDF, FLPDF, FLPPF
GAMCDF, GAMPDF, GAMPPF
GEVCDF, GEVPDF, GEVPPF
GGDCDF, GGDPDF, GGDPPF
GLOCDF, GLOPDF, GLOPPF
HFCCDF, HFCPDF, HFCPPF
HFNCDF, HFNPDF, HFNPPF
IGCDF, IGPDF, IGPPF
LGACDF, LGAPDF, LGAPPF
LGNCDF, LGNPDF, LGNPPF
LLGCDF, LLGPDF, LLGPPF
LOGCDF, LOGPDF, LOGPPF
NORCDF, NORPDF, NORPPF
RIGCDF, RIGPDF, RIGPPF
WEICDF, WEIPDF, WEIPPF
NOTE: The help files and Reference Manual refer to the
location parameter for the 2-parameter inverse gaussian
(IG), reciprocal inverse gaussian (RIG), Wald (WAL), and
fatigue life (FL) distributions. This is actually the
scale parameter for these distributions.
The following added a location parameter only:
HFLCDF, HFLPDF, HFLPPF
PA2CDF, PA2PDF, PA2PPF
PARCDF, PARPDF, PARPPF
PEXCDF, PEXPDF, PEXPPF
PLNCDF, PLNPDF, PLNPPF
PNRCDF, PNRPDF, PNRPPF
VONCDF, VONPDF, VONPPF
WALCDF, WALPDF, WALPPF
WCACDF, WCAPDF, WCAPPF
The following added a scale parameter only:
GEPCDF, GEPPDF, GEPPPF
POWCDF, POWPDF, POWPPF
The following added a lower and upper limit (which is then
converted by Dataplot into location and scale parameters).
UNICDF, UNIPDF, UNIPPF, UNISF
BETCDF, BETPDF, BETPPF, BETSF
b) Added the following hazard and cumulative hazard functions:
NOTE: In the following, LOC and SCALE specify location and
scale parameters respectively and are optional. For the
uniform, the lower and upper limits are specified (and
are converted by Dataplot to location and scale
parameters) and are also optional. All other parameters
are the standard shape parameters for the distribution.
UNIHAZ(X,LOWER,UPPER) - uniform hazard function
UNICHAZ(X,LOWER,UPPER) - uniform cumulative hazard function
NORHAZ(X,LOC,SCALE) - normal hazard function
NORCHAZ(X,LOC,SCALE) - normal cumulative hazard function
LGNHAZ(X,SD,LOC,SCALE) - lognormal hazard function
LGNCHAZ(X,SD,LOC,SCALE) - lognormal cumulative hazard function
PNRHAZ(X,SD,P,LOC) - power normal hazard function
PNRCHAZ(X,SD,P,LOC) - power normal cumulative hazard
function
PLNHAZ(X,SD,P,LOC) - power log-normal hazard function
PLNCHAZ(X,SD,P,LOC) - power log-normal cumulative
hazard function
EXPHAZ(X,LOC,SCALE) - exponential hazard function
EXPCHAZ(X,LOC,SCALE) - exponential cumulative hazard
function
WEIHAZ(X,GAMMA,LOC,SCALE) - Weibull hazard function
WEICHAZ(X,GAMMA,LOC,SCALE) - Weibull cumulative hazard
function
EWEHAZ(X,GAMMA,THETA,LOC,SCALE) - exponentiated Weibull
hazard function
EWECHAZ(X,GAMMA,THETA,LOC,SCALE) - exponentiated Weibull
cumulative hazard function
GAMHAZ(X,GAMMA,LOC,SCALE) - gamma hazard function
GAMCHAZ(X,GAMMA,LOC,SCALE) - gamma cumulative hazard function
IGAHAZ(X,GAMMA,LOC,SCALE) - inverted gamma hazard function
IGACHAZ(X,GAMMA,LOC,SCALE) - inverted gamma cumulative hazard
function
GGDHAZ(X,GAMMA,K,LOC,SCALE) - generalized gamma hazard
function
GGDCHAZ(X,GAMMA,K,LOC,SCALE) - generalized gamma cumulative
hazard function
EV1HAZ(X,GAMMA,LOC,SCALE) - Gumbel hazard function
EV1CHAZ(X,GAMMA,LOC,SCALE) - Gumbel cumulative hazard
function
EV2HAZ(X,GAMMA,LOC,SCALE) - Frechet hazard function
EV2CHAZ(X,GAMMA,LOC,SCALE) - Frechet cumulative hazard
function
GEPHAZ(X,GAMMA,SCALE) - generalized Pareto hazard
function
GEPCHAZ(X,GAMMA,SCALE) - generalized Pareto cumulative
hazard function
IGHAZ(X,GAMMA,LOC,SCALE) - inverse gaussian hazard function
IGCHAZ(X,GAMMA,LOC,SCALE) - inverse gaussian cumulative
hazard function
WALHAZ(X,GAMMA,LOC) - Wald hazard function
WALCHAZ(X,GAMMA,LOC) - Wald cumulative hazard function
RIGHAZ(X,GAMMA,LOC,SCALE) - reciprocal inverse gaussian
hazard function
RIGCHAZ(X,GAMMA,LOC,SCALE) - reciprocal inverse gaussian
cumulative hazard function
FLHAZ(X,GAMMA,LOC,SCALE) - fatigue life hazard function
FLCHAZ(X,GAMMA,LOC,SCALE) - fatigue life cumulative hazard
function
PARHAZ(X,GAMMA,LOC) - Pareto hazard function
PARCHAZ(X,GAMMA,LOC) - Pareto cumulative hazard
function
ALPHAZ(X,ALPHA,BETA) - alpha hazard function
ALPCHAZ(X,ALPHA,BETA) - alpha cumulative hazard function
PEXHAZ(X,ALPHA,BETA) - exponential power hazard function
PEXCHAZ(X,ALPHA,BETA) - exponential power cumulative
hazard function
NOTE: The hazard function is defined as:
h(x) = pdf(x)/(1-cdf(x))
and the cumulative hazard function is defined as:
H(x) = -log(1-cdf(x))
where pdf and cdf are the probability density and
cumulative distribution functions respectively. These
functions can be used to generate hazard and cumulative
hazard functions for distributions that Dataplot does
not support directly.
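The two definitions translate directly into code. The Python sketch
below applies them to a 2-parameter Weibull distribution with location
0; the function names and the parameter values are chosen only for
illustration and are not Dataplot identifiers.

```python
import math


def hazard(pdf, cdf, x):
    """h(x) = pdf(x) / (1 - cdf(x))."""
    return pdf(x) / (1.0 - cdf(x))


def cumulative_hazard(cdf, x):
    """H(x) = -log(1 - cdf(x))."""
    return -math.log(1.0 - cdf(x))


# Example distribution: Weibull with shape GAMMA and scale SCALE.
GAMMA, SCALE = 2.0, 3.0


def weibull_cdf(x):
    return 1.0 - math.exp(-((x / SCALE) ** GAMMA))


def weibull_pdf(x):
    z = x / SCALE
    return (GAMMA / SCALE) * z ** (GAMMA - 1.0) * math.exp(-(z ** GAMMA))


print(hazard(weibull_pdf, weibull_cdf, 1.5))
print(cumulative_hazard(weibull_cdf, 1.5))
```

For the Weibull case the results can be checked against the closed
forms h(x) = (GAMMA/SCALE)*(x/SCALE)**(GAMMA-1) and
H(x) = (x/SCALE)**GAMMA.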
c) Added the mixture of 2 normal probability functions.
Specifically,
NORMXCDF(X,U1,SD1,U2,SD2,PMIX)
NORMXPDF(X,U1,SD1,U2,SD2,PMIX)
NORMXPPF(P,U1,SD1,U2,SD2,PMIX)
where U1 and SD1 are the mean and standard deviation of the
first normal distribution, U2 and SD2 are the mean and standard
deviation of the second normal distribution, and PMIX is
the mixing proportion (between 0 and 1).
You can generate a probability plot as follows:
LET U1 = <value>
LET SD1 = <value>
LET U2 = <value>
LET SD2 = <value>
LET P = <value>
NORMAL MIXTURE PROBABILITY PLOT Y
You can generate random numbers as follows:
LET U1 = <value>
LET SD1 = <value>
LET U2 = <value>
LET SD2 = <value>
LET P = <value>
LET Y = NORMAL MIXTURE RANDOM NUMBERS FOR I = 1 1 1000
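The mixture functions are weighted sums of the two normal components.
The Python sketch below shows the idea. It assumes that the mixing
proportion is the weight of the first component (the text above does
not say which component it applies to), and the lower-case function
names are stand-ins, not the Dataplot routines.

```python
import math


def norcdf(x, u, sd):
    """Normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf((x - u) / (sd * math.sqrt(2.0))))


def norpdf(x, u, sd):
    """Normal probability density function."""
    z = (x - u) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normxcdf(x, u1, sd1, u2, sd2, pmix):
    """Mixture CDF; pmix is assumed to weight the first component."""
    return pmix * norcdf(x, u1, sd1) + (1.0 - pmix) * norcdf(x, u2, sd2)


def normxpdf(x, u1, sd1, u2, sd2, pmix):
    """Mixture PDF; pmix is assumed to weight the first component."""
    return pmix * norpdf(x, u1, sd1) + (1.0 - pmix) * norpdf(x, u2, sd2)


print(normxcdf(1.0, 0.0, 1.0, 3.0, 2.0, 0.4))
print(normxpdf(1.0, 0.0, 1.0, 3.0, 2.0, 0.4))
```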
d) Added the inverted gamma probability functions:
IGACDF(X,GAMMA,LOC,SCALE)
IGAPDF(X,GAMMA,LOC,SCALE)
IGAPPF(P,GAMMA,LOC,SCALE)
This is not really a new function. It is simply the
generalized gamma function with the second shape parameter
set to -1. We added it as a separate set of functions since
it is a common distribution in certain applications.
Also added:
LET GAMMA = <value>
INVERSE GAMMA PROBABILITY PLOT
INVERSE GAMMA PPCC PLOT
e) Added the following discrete PPCC PLOT commands:
BINOMIAL PPCC PLOT
NEGATIVE BINOMIAL PPCC PLOT
LOGARITHMIC SERIES PPCC PLOT
For the binomial and negative binomial, N must be specified
(and then P is computed).
f) Fixed the PROBABILITY PLOT X Y and PPCC PLOT X Y commands
to handle zero count bins correctly.
3) Recipe Updates
a) Added support for multi-factor recipe fits. For example,
a common model is:
Y = A0 + A1*X1 + A2*X1**2 + A3*X2 + A4*X2**2 + A5*X1*X2
In Dataplot, the recipe analysis could be done as follows:
READ FILE.DAT Y X1 X2 BATCH
READ FILE2.DAT XP1 XP2
LET X1S = X1*X1
LET X2S = X2*X2
LET X1X2 = X1*X2
LET XP1S = XP1*XP1
LET XP2S = XP2*XP2
LET XP1P2 = XP1*XP2
.
RECIPE FIT FACTORS 5
RECIPE FIT Y X1 X1S X2 X2S X1X2 BATCH XP1 XP1S XP2 XP2S XP1P2
PRINT TOL
XP1 and XP2 are the points at which you want the tolerance
values computed. If they are omitted, then the tolerance
values are computed at the unique points in the design
matrix (i.e., all the unique combinations of X1 and X2).
The BATCH variable is a batch identifier and is optional.
X1 and X2 must have the same number of points and XP1 and
XP2 should have the same number of points. However, X1 and
XP1 do not need to have the same number of points (and they
usually will not). The primary output from the RECIPE command
is the tolerance values (by default, saved in TOL). Commands
for setting the probability confidence and content are
the same as for the 1-factor recipe fit.
b) Recipe is generally used in the context of setting tolerance
limits as defined in MIL-17 Handbook. A number of other
statistical techniques are defined in this handbook.
Dataplot had previously added support for the Grubbs test,
Levene's test for shifts in scale, and the F test for shifts
in location. The following additional tests defined in the
handbook are now supported as well:
ANDERSON-DARLING <DIST> TEST Y
where DIST is: NORMAL, LOGNORMAL, WEIBULL, EXTREME VALUE
ANDERSON-DARLING K-SAMPLE TEST Y X
WEIBULL MAXIMUM LIKELIHOOD Y
B BASIS <DIST> TOLERANCE LIMIT Y
A BASIS <DIST> TOLERANCE LIMIT Y
where DIST is: NORMAL, LOGNORMAL, WEIBULL, NON-PARAMETRIC
The Anderson-Darling 1-sample test is used to determine if a
data set can be assumed to come from a certain distribution.
The EXTREME VALUE distribution is the type 1 extreme value
distribution. The k-sample Anderson-Darling test is used
to test if groups of data are the same (in the sense of
coming from the same distribution with common location and
scale). It is typically used to determine if data coming
from different batches can be treated as if they came from the
same batch. The WEIBULL MAXIMUM LIKELIHOOD command is used
to generate maximum likelihood estimates of the 2-parameter
Weibull distribution (the shape and scale parameters).
The B BASIS and A BASIS commands are used to generate
b basis and a basis tolerance limits for a variable
for a few common distributions.
See the MIL-17 Handbook for more information on these
techniques.
4) Matrix Updates
Modified matrix commands to make more efficient use of
storage. Upped default maximum number of rows from 1,500 to
3,000.
Added a DIMENSION MATRIX COLUMNS <val> and DIMENSION MATRIX ROWS
<val> command. This is used to dimension temporary matrices
in the matrix routines. Note that unlike the DIMENSION command
for variables, this command does not erase any previously
created data. It is only used to dimension temporary matrices
in the matrix code, not to store the original data.
Each temporary matrix has a maximum of 920,000/3 elements.
However, you cannot dimension the number of rows in a matrix
to be greater than the number of rows in a variable.
5) Miscellaneous Updates
a) Added the commands:
LINE <SAVE/RESTORE>
CHARACTER <SAVE/RESTORE>
These were motivated by the graphical user interface, but they
can be used directly by the user as well.
b) Added the commands:
SET PRINTER <id>
PROBE PRINTER <id>
These allow the user to specify the printer name for the
PP command. It is currently supported for the Unix and
Windows 95/NT versions. It would be straightforward to support
on other systems as well.
c) The ANOVA code was significantly rewritten.
1) The maximum number of factors was increased from 5 to 10.
2) The output was modified. Specifically, an ANOVA table was
added and other output was re-arranged.
3) Some information is now written out to files DPST1F.DAT
and DPST2F.DAT. This is useful if you need to use some
of the ANOVA quantities in further analysis.
4) A check is now made to see if you have a balanced design
(i.e., all cells have an equal number of observations).
A warning message will be printed if an unbalanced case is
detected. Note that the Dataplot calculations are based on
the assumption of balanced data. However, it will still
run the ANOVA for the unbalanced case (the output will
not be accurate in this case).
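The balance check amounts to counting the observations in each cell.
A minimal Python sketch of the idea follows; it is not the Dataplot
code, and the function name is hypothetical. It also treats a
combination of factor levels that never occurs as an unbalanced case,
which is an assumption.

```python
from collections import Counter


def is_balanced(*factors):
    """True if every combination of factor levels has the same number of rows."""
    counts = Counter(zip(*factors))   # one cell per combination of levels
    expected_cells = 1
    for f in factors:
        expected_cells *= len(set(f))
    all_cells_present = len(counts) == expected_cells
    equal_counts = len(set(counts.values())) == 1
    return all_cells_present and equal_counts


x1 = [1, 1, 2, 2, 1, 1, 2, 2]
x2 = [1, 2, 1, 2, 1, 2, 1, 2]
print(is_balanced(x1, x2))            # every cell has 2 observations
print(is_balanced(x1[:-1], x2[:-1]))  # cell (2,2) now has only 1
```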
d) Added CODED as synonym for CODE (LET Y = CODE X or
LET Y = CODED X).
e) Modified data reads so that non-printing characters are
converted to spaces.
f) The BOOTSTRAP PLOT command was augmented so that the following
parameters are now automatically saved:
BMEAN - mean of the plotted bootstrap values
BSD - standard deviation of the plotted bootstrap values
B001 - the 0.1% percentile of the plotted bootstrap values
B005 - the 0.5% percentile of the plotted bootstrap values
B01 - the 1.0% percentile of the plotted bootstrap values
B025 - the 2.5% percentile of the plotted bootstrap values
B05 - the 5.0% percentile of the plotted bootstrap values
B10 - the 10% percentile of the plotted bootstrap values
B20 - the 20% percentile of the plotted bootstrap values
B80 - the 80% percentile of the plotted bootstrap values
B90 - the 90% percentile of the plotted bootstrap values
B95 - the 95% percentile of the plotted bootstrap values
B975 - the 97.5% percentile of the plotted bootstrap values
B99 - the 99% percentile of the plotted bootstrap values
B995 - the 99.5% percentile of the plotted bootstrap values
B999 - the 99.9% percentile of the plotted bootstrap values
These values are typically used in setting confidence levels.
Also, the BOOTSTRAP COEFFICIENT OF VARIATION PLOT and
BOOTSTRAP RELATIVE VARIANCE PLOT commands were added.
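To show how the saved percentiles are typically turned into a
confidence interval, the Python sketch below bootstraps the mean of a
small made-up sample and takes the 2.5 and 97.5 percentiles, which play
the role of B025 and B975. The data values, the function names, and the
percentile interpolation rule are all assumptions for illustration;
Dataplot may compute percentiles with a different convention.

```python
import math
import random


def bootstrap_means(y, nboot=1000, seed=0):
    """Means of nboot resamples of y drawn with replacement."""
    rng = random.Random(seed)
    n = len(y)
    return [sum(rng.choices(y, k=n)) / n for _ in range(nboot)]


def percentile(values, pct):
    """Percentile by linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * pct / 100.0
    lower = int(math.floor(pos))
    upper = min(lower + 1, len(s) - 1)
    return s[lower] + (s[upper] - s[lower]) * (pos - lower)


y = [9.2, 10.1, 9.8, 10.4, 9.9, 10.0, 10.3, 9.7]   # made-up sample
boot = bootstrap_means(y)
b025 = percentile(boot, 2.5)      # plays the role of B025
b975 = percentile(boot, 97.5)     # plays the role of B975
print("95% percentile interval for the mean:", b025, b975)
```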
g) Some code not used by the user was added for the graphical
front-end.
h) Raised the maximum number of lines in a loop from 200 to 500.
i) Fixed some minor bugs.
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT October - December 1997.
----------------------------------------------------------------------
1) The WRITE command was updated to allow
WRITE VARIABLES ALL (or WRITE ALL VARIABLES)
This was added to support some updates to the frontend, but it
can be used in the command line as well. Currently, a maximum
of 25 variables will be printed.
2) An update was made to allow exponential notation in commands
where a number or parameter is expected. For example,
LET Y = DATA 1.2E-7 2.0E3 4.26E+4
The above example shows the 3 forms of the E notation that
are currently recognized. Note that using "D" instead of
"E" is not currently supported.
Parsing of expressions (e.g., transformations under LET,
definition of functions, FIT expressions) is not yet supported.
That is,
LET Y(1) = 1.2E-3
does NOT work as of yet. The parsing of expressions under
LET is handled in a different part of the code. Support
may be added at a later time.
3) The command SKIP AUTOMATIC or SKIP ---- can be used to
skip all lines in a data file until the first line
containing a "----" string is found. It does not have to
start in column 1. This was added primarily to
support the data files provided with Dataplot. However,
you can use this with your own data files as well.
If no line with "----" is found, Dataplot rewinds the file
and tries to read data starting with the first line of the
file.
This option only applies if the read is performed on a file.
If the read is from the terminal, SKIP AUTOMATIC is
equivalent to a SKIP 0.
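The skipping rule can be summarized by the following Python sketch,
which works on a list of lines rather than a real file. It is an
illustration of the behavior described above, not the Dataplot code,
and the function name is hypothetical.

```python
def skip_automatic(lines):
    """Return the data lines that follow the first line containing '----'.

    If no such line exists, return all lines, which mirrors rewinding
    the file and reading from the first line.
    """
    for i, line in enumerate(lines):
        if "----" in line:            # the marker may start in any column
            return lines[i + 1:]
    return lines[:]


with_header = ["Sample data", "  Y    X", "  --------", "1.0 2.0", "3.0 4.0"]
no_header = ["1.0 2.0", "3.0 4.0"]
print(skip_automatic(with_header))
print(skip_automatic(no_header))
```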
4) The following 2 commands were added:
AUTOCOMOVEMENT PLOT Y
CROSS COMOVEMENT PLOT Y1 Y2
These are similar to the AUTOCORRELATION PLOT and the
CROSS CORRELATION PLOT commands. However, they are based
on the COMOVEMENT statistic rather than the correlation
statistic. At this time, no reference lines indicating
statistical significance are drawn.
5) The following special function was added:
LET A = PSIFN(X,K) - scaled k-th derivative of the PSI (or
DIGAMMA) function
Note that this computes a SCALED version of the function,
specifically
((-1)**(K+1)/GAMMA(K+1))*PSI(X,K)
where GAMMA is the gamma function and PSI(X,K) is the unscaled
function. Also, it is the k-th derivative of PSI, not of
the log gamma function. That is, K=1 computes the
trigamma function, not the digamma function.
6) The DELETE command was modified so that blanked out values
are reset to zero instead of machine negative infinity.
7) Added IF EXIST command. An IF NOT EXIST command was added
several years ago. This command works as follows:
IF A EXIST
PRINT A
END OF IF
where A is a parameter. A will be printed if it already
exists.
8) Added the command REPLOT to regenerate the most recently
created plot. Although this was motivated by enhancements
to the graphical user interface, it can be useful in command
line mode as well.
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT September 1997.
----------------------------------------------------------------------
1) Added a SLEEP <n> command to pause for <n> seconds. This is useful for
macros so plots can be displayed for a given period of time without
requiring user intervention to continue (as needed by the PAUSE command).
This command is platform dependent and is currently implemented for Unix
and Windows 95/NT versions.
Added a CD command to change the current directory. This command is
platform dependent and has currently been implemented for the
Windows 95/NT version. This command is particularly useful for the
Windows 95/NT version since when Dataplot is executed from a screen
icon, the default directory is the directory where the Dataplot
executable resides. The SYSTEM command cannot be used to change the
current directory since a "SYSTEM CD <directory>" does not persist
after the SYSTEM command completes execution.
2) Added Mark Vangel's RECIPE code. RECIPE stands for "REgression Confidence
Intervals on PErcentiles". It is used to calculate basis values for
regression models with or without a random "batch effect".
A full discussion of RECIPE is beyond the scope of this brief news item.
Complete technical documentation for RECIPE is available at the following
Web site:
http://www.itl.nist.gov/div898/software/recipe/
This discusses RECIPE in general, not the Dataplot implementation.
The basic RECIPE commands are:
RECIPE FIT Y X BATCH XPRED - linear regression, polynomial models
RECIPE ANOVA Y X1 ... XK BATCH - ANOVA, multilinear models
The primary output from the RECIPE command is a set of tolerance values.
These are saved in the internal Dataplot variable TOL by default. This
variable can be plotted and manipulated like any other Dataplot variable.
The RECIPE documentation (on the above web site) also discusses a program
called SIMCOV. SIMCOV is used to determine whether or not the Satterthwaite
approximation is adequate in determining the tolerance values. SIMCOV
uses simulation to determine this. The following commands implement
the SIMCOV program in Dataplot.
RECIPE SIMCOV FIT Y X BATCH XPRED - linear regression, polynomial models
RECIPE SIMCOV ANOVA Y X1 ... XK BATCH - ANOVA, multilinear models
The following commands set switches for the RECIPE and SIMCOV analyses.
RECIPE FIT DEGREE <N> - polynomial degree for RECIPE FIT
RECIPE FACTORS <N> - number of factors for RECIPE ANOVA
RECIPE OUTPUT <VAR> - name of variable to contain computed
tolerance values
RECIPE SATTERTHWAITE <YES/NO> - specifies whether or not Satterthwaite
approximation is used
RECIPE PROBABILITY CONTENT <VAL> - value for probability content
RECIPE CONFIDENCE <VAL> - value for confidence level
RECIPE CORRELATION <N> - the number of correlation values at
which to compute SIMCOV probabilities
RECIPE SIMCOV REPLICATES <N> - the number of replications for SIMCOV
RECIPE SIMPVT REPLICATES <N> - the number of replications for SIMPVT
(applies when Satterthwaite
approximation not used)
In addition, the following commands were added to support RECIPE
analyses (these techniques are recommended by MIL-HDBK-17E):
GRUBB TEST Y - performs the Grubb test for outliers
LEVENE TEST Y X - performs the Levene test for homogeneous variances
(similar, but more robust for non-normal distributions,
to Bartlett's test)
F LOCATION TEST Y X - performs an F test for homogeneous locations
These capabilities were originally implemented as the macros GRUBB.DP, LEVENE.DP,
and FTESTLOC.DP which have been added to the Dataplot macro directory.
In addition, four data sets (VANGEL31.DAT, VANGEL32.DAT, VANGEL33.DAT, and
VANGEL34.DAT) that can be analyzed with RECIPE were added to the Dataplot
data sets directory. Corresponding macros (VANGEL31.DP, VANGEL32.DP, VANGEL33.DP,
and VANGEL34.DP) were added to the Dataplot programs directory.
3) The following control charts were added:
EWMA CONTROL CHART Y - exponentially weighted moving average control chart
EWMA CONTROL CHART Y X - exponentially weighted moving average control chart
MOVING AVERAGE CONTROL CHART Y - moving average control chart
MOVING AVERAGE CONTROL CHART Y X - moving average control chart
MOVING RANGE CONTROL CHART Y - moving range control chart
MOVING RANGE CONTROL CHART Y X - moving range control chart
MOVING SD CONTROL CHART Y - moving standard deviation control chart
MOVING SD CONTROL CHART Y X - moving standard deviation control chart
These work in a similar fashion to previously available control charts.
An important feature of all control charts was omitted from previous
documentation (this feature has actually been available for quite some time).
Dataplot allows you to specify the target and lower and upper
control limits by entering the commands:
LET TARGET = <value> - the target value
LET USL = <value> - the upper control limit
LET LSL = <value> - the lower control limit
The data is drawn as trace 1, the target value and limits derived from the
data are drawn as traces 2, 3, and 4, and the user specified target and
control limits (if given) are drawn as traces 5, 6, and 7. You can control
which of these values are actually plotted by setting the LINE and CHARACTER
commands appropriately.
4) The REPEAT GRAPH, SAVE GRAPH, and LIST GRAPH commands that were previously
added for X11 installations have been extended to support the Microsoft
Windows 95/NT implementation. The commands work on Windows 95/NT as they
do for Unix. The primary difference is that the plots are saved in
Windows bitmap format. The Windows 95/NT implementation still needs a little tidying up
(the default positioning isn't ideal yet), but it is functional.
5) The following special functions were added:
LET A = CGAMMA(XR,XC) - real component of complex gamma
LET A = CGAMMAI(XR,XC) - imaginary component of complex gamma
LET A = CLNGAM(XR,XC) - real component of complex log gamma
LET A = CLNGAMI(XR,XC) - imaginary component of complex log gamma
LET A = CBETA(AR,AC,BR,BC) - real component of complex beta
LET A = CBETAI(AR,AC,BR,BC) - imaginary component of complex beta
LET A = CLNBETA(AR,AC,BR,BC) - real component of complex log beta
LET A = CLNBETAI(AR,AC,BR,BC) - imaginary component of complex log beta
LET A = CPSI(XR,XC) - real component of complex psi
LET A = CPSII(XR,XC) - imaginary component of complex psi
LET A = CHM(X,A,B) - confluent hypergeometric M function
LET A = HYPERGEO(X,A,B,C) - hypergeometric function (for restricted values of X,
convergent case x < 1)
LET A = PBDV(X,A) - parabolic cylinder function (Dv)
LET A = PBDV1(X,A) - derivative of parabolic cylinder
function (Dv)
LET A = PBVV(X,A) - parabolic cylinder function (Vv)
LET A = PBVV1(X,A) - derivative of parabolic cylinder
function (Vv)
LET A = PBWA(X,A) - parabolic cylinder function (Wa) (only for X < 5)
LET A = PBWA1(X,A) - derivative of parabolic cylinder
function (Wa) (only for X < 5)
LET A = BER(XR) - Real component of Kelvin Ber function
LET A = BERI(XR) - Imaginary component of Kelvin Ber function
LET A = BER1(XR) - Real component of derivative of Kelvin Ber
function
LET A = BERI1(XR) - Imaginary component of derivative of Kelvin Ber
function
LET A = KER(XR) - Real component of Kelvin Ker function
LET A = KERI(XR) - Imaginary component of Kelvin Ker function
LET A = KER1(XR) - Real component of derivative of Kelvin Ker
function
LET A = KERI1(XR) - Imaginary component of derivative of Kelvin Ker
function
LET A = ZETA(S) - Riemann zeta function - 1 (s > 1)
LET A = ETA(S) - eta function - 1 (s >= 1)
LET A = CATLAN(S) - Catalan Beta function - 1 (s >= 1)
LET A = BINOMIAL(N,M) - Binomial coefficient of N and M
LET A = BINOM(N,M) - Binomial coefficient of N and M
LET A = EN(N) - Euler number of order N
LET A = EN(X,N) - Euler polynomial of order N
LET A = BN(N) - Bernoulli number of order N
LET A = BN(X,N) - Bernoulli polynomial of order N
LET A = BERNOULLI NUMBERS FOR I = 1 1 N - Bernoulli numbers
LET A = EULER NUMBERS FOR I = 1 1 N - Euler numbers
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT July 1997.
----------------------------------------------------------------------
1) Added support for printing tic mark labels in exponential
format for linear scales. Enter the command
...TIC MARK LABEL FORMAT EXPONENTIAL
The default is to write the number with an E15.7 format.
To control the number of decimal points, enter the command
...TIC MARK LABEL DECIMAL <n>
where <n> is a positive integer. For example, if
<n> is 4, the number is printed with an E12.4 format.
2) The diagrammatic graphics commands that draw a figure
(AND, AMPLIFIER, ARC, ARROW, BOX, CAPACITOR, CIRCLE, DIAMOND,
CUBE, ELLIPSE, GROUND, HEXAGON, INDUCTOR, LATTICE, NOR, OR,
OVAL, PYRAMID, POINT, RESISTOR, SEMI-CIRCLE, TRIANGLE)
were updated to include a "DATA" option (similar to the
DRAWDATA and MOVEDATA commands). This "DATA" option draws the
plot in units of the most recent plot rather than 0 to 100
screen units. For example, ELLIPSE DATA <list of points>
draws the ellipse in units of the most recent plot.
Similar to the DATA option, there is a RELATIVE option in the
above commands. Although this capability has actually been
available in Dataplot for quite some time, it was left out
of the documentation for the diagrammatic graphics commands.
Relative drawing means that the first point is drawn in
absolute units and all subsequent points are relative to the
prior point. For example DRAW RELATIVE 10 10 2 3
would draw a line from (10,10) to (12,13).
The word "DATA" should come before the word "RELATIVE"
in these commands. There are actually 4 forms to these
commands. For example,
ELLIPSE X1 Y1 X2 Y2 X3 Y3
ELLIPSE DATA X1 Y1 X2 Y2 X3 Y3
ELLIPSE RELATIVE X1 Y1 X2 Y2 X3 Y3
ELLIPSE DATA RELATIVE X1 Y1 X2 Y2 X3 Y3
The first form draws in absolute screen 0 to 100 units,
the second form draws in absolute units of the most recent plot,
the third form draws in relative screen 0 to 100 units, and
the fourth form draws in relative units of the most recent plot.
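The RELATIVE forms can be read as a running sum over the coordinate pairs. The following Python sketch (an illustration only, not Dataplot code; the function name is made up for this example) converts a RELATIVE point list into absolute coordinates:

```python
def relative_to_absolute(coords):
    """Convert a flat list x1 y1 x2 y2 ... given in RELATIVE form
    (first pair absolute, later pairs offsets from the prior point)
    into a list of absolute (x, y) pairs."""
    if len(coords) % 2 != 0:
        raise ValueError("coordinates must come in x y pairs")
    points = []
    x = y = 0
    for i in range(0, len(coords), 2):
        if i == 0:
            # The first point is taken as given (absolute units).
            x, y = coords[0], coords[1]
        else:
            # Every later pair is an offset from the previous point.
            x, y = x + coords[i], y + coords[i + 1]
        points.append((x, y))
    return points

# The DRAW RELATIVE 10 10 2 3 example from the text:
print(relative_to_absolute([10, 10, 2, 3]))  # [(10, 10), (12, 13)]
```

Whether the resulting absolute values are screen units or plot units depends only on whether DATA is also given.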
3. POLYGON was added to the list of diagrammatic commands. This
command takes the following form:
POLYGON X Y <SUBSET/EXCEPT/FOR qualification>
POLYGON DATA X Y <SUBSET/EXCEPT/FOR qualification>
POLYGON RELATIVE X Y <SUBSET/EXCEPT/FOR qualification>
POLYGON DATA RELATIVE X Y <SUBSET/EXCEPT/FOR qualification>
The first form plots the polygon in 0 to 100 screen units while
the second form plots the data in units of the most recent plot.
The third and fourth forms are similar, but they use relative
coordinates (the first coordinate pair is in absolute units,
the remaining are coordinates relative to the previous point).
Note that X and Y are arrays, not lists of points as used by
the other diagrammatic graphics commands. Since these are
arrays, the SUBSET, EXCEPT, and FOR qualifications can be
applied to the list of points, although this is not common
in the context of this command.
Setting the last point to the first point (i.e., closing the
polygon) is not required since Dataplot does this automatically.
As with the other diagrammatic graphics commands, the attributes
of the border of the polygon are set via the first setting
of the LINE commands (e.g., LINE DASH, LINE COLOR BLUE, LINE
THICKNESS 0.3). The attributes of the interior of the polygon
are set with the various REGION attribute commands (e.g.,
REGION FILL ON, REGION FILL COLOR BLUE).
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT January-April 1997.
----------------------------------------------------------------------
1. A check is now performed to determine if DPPL2F.DAT is opened
successfully upon starting Dataplot. If not, an error message is
printed and Dataplot is terminated. The typical cause for this
is trying to run Dataplot in a read only directory. This change
provides a more graceful exit.
2. The Dataplot Reference Manual is now available on-line. The
Dataplot home page can be accessed from a Web browser using
the URL:
http://www.itl.nist.gov/div898/software/dataplot/homepage.html
The Reference Manual is under the "documentation" table entry.
The following should be noted:
a) To view the Dataplot web pages, you need to have
a web browser available on your system.
The Dataplot web pages display correctly with the Netscape,
Internet Explorer, and HotJava 1.1 browsers. They do not
display correctly with the HotJava 1.0, Mosaic, or character
oriented browsers. We do not have access to other browsers,
so we can make no specific comment on them.
b) The Reference Manual is in PDF format (Portable Document
Format), so it requires a PDF viewer. Typically, this is the
Adobe Acrobat Reader. This reader is supported on most common
platforms and can be downloaded for free. The PC installation
typically takes about 10-15 minutes to download and install.
For best performance, it is strongly recommended that the
Adobe Acrobat reader be installed as a plug-in (this is
done automatically for Netscape on the PC) rather than
as a helper application. The documentation web page contains
a link to the Adobe Acrobat web site for downloading the
reader.
In addition, several commands are now available for accessing
the Web, and the Dataplot Web pages and Reference Manual in
particular, from within Dataplot.
The first command is:
WEB
WEB NIST/SIMA/HPPC/SED/DATAPLOT
WEB <url address>
By default, this command activates Netscape with the specified
URL. If no URL is given, the NIST home page is used. Several
keywords are recognized. For example, SED activates the
NIST Statistical Engineering Division home page.
The second command is:
WEB HELP <string>
This command is similar to the standard Dataplot HELP command.
However, it accesses the on-line Reference Manual rather than
the ASCII text help files. <string> will usually be a Dataplot
command (e.g., WEB HELP FIT, WEB HELP PLOT). However, many
special keywords are also recognized. For example, WEB HELP or
WEB HELP DATAPLOT access the Dataplot home page. Enter the
command:
LIST REFMAN.TEX
to see a list of recognized keywords (the upper case entries in
columns 1-40 identify the keywords while columns 40+ identify the
associated URL).
The WEB and WEB HELP commands are supported for Unix platforms
and for the Windows 95/NT version.
A few SET commands were added to support the WEB and WEB HELP
commands.
a) By default, Dataplot tries to use the Netscape browser. On
Unix, it tries to do this by entering the command "netscape".
On Windows 95/NT, it enters
"C:\Program Files\NETSCAPE\NAVIGATOR\PROGRAM\netscape.exe"
If you wish to use a different browser, or if Netscape is
installed in a different location, you can enter the
following command:
SET BROWSER <file name>
where <file name> is the string that activates your preferred
browser. In particular, if you prefer to use the Internet
Explorer under Windows 95/NT, you can enter:
SET BROWSER "C:\Program Files\Plus!\Microsoft Internet\iexplore.exe"
The enclosing quotes are required because the file name contains
spaces. Again, check to see if this is the proper path on
your system.
Alternatively, you can enter the Unix command
setenv BROWSER <file name>
or the Windows 95/NT command
SET BROWSER=<file name>
to set the browser. These are typically placed in your
start-up files (.login or .cshrc for Unix, AUTOEXEC.BAT for
Windows 95/NT). You can shorten the browser name if you add
the correct directory to your path.
b) For the WEB command, the default URL is the NIST home page.
You can change the default with the following Dataplot command:
SET URL <default URL>
For the WEB HELP command, the default URL is the Dataplot
home page on the public NIST web server. This can be
changed (for example, if you have installed the Dataplot
web pages and Reference Manual on a local site) by entering
the command:
SET DATAPLOT URL <location of Dataplot web pages>
Alternatively, you can enter the Unix commands
setenv URL <location of default URL>
setenv DPURL <location of Dataplot web pages>
or the Windows 95/NT commands
SET URL=<location of default URL>
SET DPURL=<location of Dataplot web pages>
For Unix platforms, the following command was added to tell
Dataplot to use a currently open NETSCAPE window (this command
is not needed for the PC):
SET NETSCAPE <OLD/NEW>
These commands have been tested with NETSCAPE on Unix and
with Netscape and the Internet Explorer on the PC.
One important difference between the Unix and PC versions of
these commands should be noted. Under Unix, once the WEB command
is initiated, control returns to Dataplot after the browser is
started. You can independently navigate in the browser and
enter additional Dataplot commands. However, on the PC, control
does not return to Dataplot until you exit the browser.
3. The following commands were added to allow previously viewed
graphs to be saved for later recall. The primary purpose is
to allow comparisons of a previous graph to a current graph.
These commands are currently only supported for the X11 graphics
device (available on most Unix implementations).
SAVE PLOT <file> (or SAVE GRAPH, SP, SG)
SAVE PLOT <file> AUTOMATIC
SAVE PLOT AUTOMATIC
REPEAT PLOT <file> (or REPEAT GRAPH, RP, RG, VIEW PLOT,
VIEW GRAPH, VG, VP)
REPEAT PLOT <+n>
REPEAT PLOT <-n>
LIST PLOT (or LIST GRAPH, LP, LG)
CYCLE PLOT (or CYCLE GRAPH, CG, CP)
PIXMAP TITLE <title>
As a technical note, the plots are saved in X11 "bitmap" format.
This is distinct from the X11 image format that is used by
xwd to save a screen image. This choice was made for performance
reasons (xlib provides direct routines for reading and writing
bitmaps, but not for reading and writing images). The primary
limitations are:
i) Color is not supported for X11 bitmaps. Elements drawn
in color will not be saved in the bitmap.
ii) You cannot use the X11 tools xwd and xwud to view the
saved plots independently of Dataplot. However, they
can be viewed by any software that reads X11 bitmaps.
The saved plots are essentially screen dumps. There is
currently no "linking" in the sense that if a given variable
is changed the saved plots are automatically updated.
The SAVE GRAPH command saves the current plot in the user
specified file. If no file name is specified, then the file
name "pixmap.<n>", where <n> is a counter, is used.
The keyword AUTOMATIC tells Dataplot to automatically save all
subsequent plots. With the AUTOMATIC option, Dataplot does not
save the current graph until the next plot is generated. This is
done in order to correctly handle multi-plots and diagrammatic
graphics. That is, the current graph is saved whenever a screen
erase is performed. If a filename is provided, this will be used
as the base (the ".<n>" is added). For example,
SAVE PLOT HISTOGRAMS AUTOMATIC saves subsequent plots in
the files HISTOGRAMS.1, HISTOGRAMS.2, and so on. Enter SAVE GRAPH
AUTOMATIC OFF to terminate the automatic saving of the plots.
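The naming rule for automatically saved plots can be pictured with the short Python sketch below. It is only an illustration: the function name is hypothetical, and starting the counter at 1 is inferred from the HISTOGRAMS.1 example rather than stated as a rule.

```python
import itertools

def automatic_plot_names(base="pixmap"):
    """Yield base.1, base.2, ... in the style described for
    SAVE PLOT <file> AUTOMATIC (counter start of 1 is assumed)."""
    for n in itertools.count(1):
        yield f"{base}.{n}"

names = automatic_plot_names("HISTOGRAMS")
print(next(names), next(names))  # HISTOGRAMS.1 HISTOGRAMS.2
```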
The REPEAT PLOT command reads a saved plot and draws it in a
window that is distinct from the normal Dataplot X11 graphics
window. If no file is specified, or if <n> is 0 for REPEAT
PLOT, the most current saved plot is drawn. A <+n> takes the
Nth plot from the current list. A <-n> takes the "current - n"th
plot from the current plot list. The DEVICE 1 X11 command
must be entered before the REPEAT PLOT command can be used.
The REPEAT PLOT command can redraw plots that were created in
a previous Dataplot session. In fact, it will successfully
redraw any file that is in the X11 bitmap format (but not in
xwd format).
The LIST PLOT command lists the currently saved plots (by
sequence number, file name, and title). It only lists plots
saved in the current session. However, this includes graphs
created in a previous Dataplot session that have been redrawn
with the REPEAT GRAPH command. Dataplot does not maintain a
database of previously saved plots.
The CYCLE PLOT command allows you to cycle through the pixmaps
in the current list by clicking mouse buttons. Clicking the
left mouse button moves down in the current list, clicking the
right mouse button moves up in the current list, and clicking
the middle mouse button returns control to Dataplot. At least
one REPEAT PLOT command should be entered before using this
command.
The PIXMAP TITLE command allows you to specify the title for
a saved plot. This title is simply for convenience in listing
the saved plots. It is not saved as part of the file and the
title only applies to the current Dataplot session. The default
title is the file name.
The pixmap title applies to the current plot when the SAVE GRAPH
command is entered. It does not matter whether the PLOT or
PIXMAP TITLE command is entered first.
Be aware that for SAVE GRAPH AUTOMATIC the saving for a given
plot is not executed until the next screen erase (typically the
next plot) is encountered to allow for multi-plotting and the
addition of diagrammatic graphics to a plot. The order of
the commands would typically be something like:
SAVE GRAPH AUTOMATIC
4-PLOT Y
PIXMAP TITLE 4-PLOT
PLOT Y
PIXMAP TITLE PLOT Y
HISTOGRAM Y
PIXMAP TITLE HISTOGRAM
The main point here is that the PIXMAP TITLE comes AFTER the
plot command.
Unlike the regular TITLE command, the PIXMAP TITLE command does
not persist. That is, it applies only to the next saved plot and
then reverts to the default of using the file name.
4. Added the following special functions:
a) LAMBDA(X,V) - Lambda function (V can be integer or real)
b) LAMBDAP(X,V) - derivative of Lambda function (V can be integer
or real)
c) H0(X) - Struve function order 0
d) H1(X) - Struve function order 1
e) HV(X,V) - Struve function order V
f) L0(X) - modified Struve function order 0
g) L1(X) - modified Struve function order 1
h) LV(X,V) - modified Struve function order V
i) Added LOGBETA as synonym for LNBETA and LNGAMMA as synonym for
LOGGAMMA.
5. The following bug fixes were made:
a) Fixed bug where TEXT command automatically generated a software
font (introduced by the DEVICE FONT command).
b) Fixed bug in the ANOVA command.
c) Fixed bug with ERASE command on Windows NT version.
d) Fixed bug in HELP with conflict between STATUS and
STATISTIC PLOT.
e) Fixed bug if software font used and CHARACTER BLANK was
entered in lower case.
f) Fixed bug where CREATE <file> went into an infinite loop if
a CALL command was encountered. The CALL command will now
be saved correctly in the CREATE file. Note that the commands
in the CALL file are not saved in the CREATE file (they are
already saved as part of the CALL macro file).
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT October-November 1996.
----------------------------------------------------------------------
1. A native mode Windows 95/NT version is now available. This
version was created using the Microsoft Windows 95 compiler.
The initial release supports the command line version only.
We will attempt over the next several months to port the
Tcl/Tk based graphical user interface to the Windows 95/NT
environment.
To generate graphics to the screen for this version, enter
the following command:
DEVICE 1 QWIN
Enter the command HELP QWIN for details of using this device.
2. For encapsulated Postscript files, DATAPLOT previously computed the
bounding box parameters assuming an 11 x 11 inch page. This was done to
accommodate both landscape and portrait orientation plots.
Unfortunately, this did not generate satisfactory results when
importing DATAPLOT graphics into WordPerfect and other text
processing software. The user had to do a fair amount of manual
rotation and scaling of plots.
DATAPLOT now adjusts the bounding box depending on the orientation.
It uses 11 x 8.5 inch for landscape orientation and 8.5 x 11 inch
for portrait. However, most text processors ignore the rotation
and translation that the landscape plots request. To compensate
for this, the following command was added:
ORIENTATION LANDSCAPE WORDPERFECT
This essentially generates a landscape orientation on a portrait
page. That is, the bounding box specifies an 8.5 x 6.5 inch page.
This generates excellent results with WordPerfect (users should
normally never need to adjust the bounding box parameters or
perform manual rotation and translation in WordPerfect).
This option is only recognized for encapsulated Postscript.
Regular Postscript should still use ORIENTATION LANDSCAPE.
3. Fixed a few bugs:
a. Macros now accept more than 1,000 lines.
b. Unix executables were not finding certain auxiliary files
if the file names were entered in lower case.
c. NORMAL PLOT fixed.
4. The output for the YATES command was modified to be more readable
and informative.
----------------------------------------------------------------------
The following enhancement was made to DATAPLOT July 1996.
----------------------------------------------------------------------
1. The previous fix (checking the HOME environment variable for the
user's root directory) was refined a bit. If HOME is defined,
it looks for dplogf.tex in that directory. If dplogf.tex is
not found, instead of printing an error message, it then strips
off the path name and looks for it in the current directory and
then in the DATAPLOT directory (typically /usr/local/lib/dataplot).
Note that if an error message is printed saying that this file is
not found, DATAPLOT will still run. This file simply lets you
enter some DATAPLOT commands when starting DATAPLOT (i.e., for
setting your preferred defaults). There should not be any
negative side effects if this file is not executed.
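The search order described above can be summarized with the following Python sketch. It is an illustration of the documented order only, not DATAPLOT source code; the function name and arguments are hypothetical, and the handling of an undefined HOME (skipping straight to the current directory) is an assumption.

```python
import os

def find_startup_file(home, current_dir, dataplot_dir, name="dplogf.tex"):
    """Return the first existing path for the start-up file, or None."""
    candidates = []
    if home:
        # HOME is checked first when it is defined.
        candidates.append(os.path.join(home, name))
    # Then the current directory, then the DATAPLOT directory
    # (typically /usr/local/lib/dataplot).
    candidates.append(os.path.join(current_dir, name))
    candidates.append(os.path.join(dataplot_dir, name))
    for path in candidates:
        if os.path.isfile(path):
            return path
    # Not finding the file is not fatal; DATAPLOT still runs.
    return None

print(find_startup_file(os.environ.get("HOME"), os.getcwd(),
                        "/usr/local/lib/dataplot"))
```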
2. Unix versions will check for the environment variable
DATAPLOT_WEB. If this variable is defined, DATAPLOT assumes it
is being run from the web (e.g., from Mosaic or Netscape).
Currently, the only effect is that certain files that DATAPLOT
typically creates in the current directory, such as dppl1f.dat
and dpconf.tex, are opened in the /tmp directory. This may or
may not be expanded upon as we gain more experience running
DATAPLOT from web servers.
3. We built a "double precision" version for the Sun. That is,
the -p8 option was used so that single precision numbers are
64-bit rather than 32-bit. The only complication was in how the
X11 routines were called (these are compiled with 32-bit real
numbers). Changes were made to the X11 driver to allow a
"compile flag" to be set based on which case (i.e., 32 or 64-bit)
is desired. This means that DATAPLOT can be easily built on any
Unix system that supports the "-p8" option (or a compiler switch
that provides a similar capability).
4. A version of DATAPLOT was built using the LAHEY compiler
(previously, the OTG compiler was used). This version allows
DATAPLOT to be run on PC's without special AUTOEXEC.BAT and
CONFIG.SYS files (and therefore no rebooting to run DATAPLOT).
A device driver that uses the LAHEY graphics library is also
available. Enter
DEVICE 1 LAHEY
DEVICE 1 FONT SIMPLEX (this is described below)
5. The following command was added:
DEVICE <1/2/3> FONT <font name>
This allows the screen device to use a different font than the
printed output. This was specifically motivated for the LAHEY
device driver. This driver does a very poor job with hardware
characters. Using a software font avoids this problem, but
often hardware characters are desired for the printed Postscript
output (to take advantage of the typeset quality fonts available
with Postscript). Using the DEVICE 1 FONT SIMPLEX allows us
to get decent characters on the screen and still retain the
ability to use the Postscript fonts. Although this command
was motivated by the LAHEY device, it is also useful for other
screen devices (e.g., X11 hardware fonts are a fixed size, so
only 1 character size is available at a time, Tektronix devices
are limited to 4 discrete sizes, etc.).
6. Previously, log scales required at least 1 full cycle (e.g.,
10 to 100). It is now possible to get around this limitation.
For example, to have a log scale go from 85 to 125, do the
following:
YLOG ON
YLIMITS 100 100
YTIC OFFSET 15 25
PLOT Y
The key is that the lower and upper bound on the LIMITS command
must be the same and at least one of the TIC OFFSETS must be
greater than zero. Major TICS will be generated at this bound
and also at the frame limits. Minor tics will be plotted
where appropriate. Also, the TIC OFFSET is always interpreted
in data units for this case (i.e., you cannot specify the offset
in DATAPLOT 0 to 100 screen coordinates as you normally can).
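The arithmetic behind the 85 to 125 example can be sketched in Python as follows. This is only an illustration of how the common limit and the two offsets combine; the function name is hypothetical, and it assumes the first TIC OFFSET value applies to the lower end and the second to the upper end, as the example suggests.

```python
def log_frame_limits(limit, lower_offset, upper_offset):
    """Frame limits for a sub-cycle log scale: the LIMITS bounds are
    equal and the TIC OFFSET values (in data units) widen the frame."""
    if lower_offset < 0 or upper_offset < 0:
        raise ValueError("offsets must not be negative")
    if lower_offset == 0 and upper_offset == 0:
        raise ValueError("at least one offset must be greater than zero")
    frame_min = limit - lower_offset
    frame_max = limit + upper_offset
    if frame_min <= 0:
        raise ValueError("a log scale needs a positive lower frame limit")
    return frame_min, frame_max

# YLIMITS 100 100 with YTIC OFFSET 15 25:
print(log_frame_limits(100, 15, 25))  # (85, 125)
```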
7. Several bugs were fixed.
----------------------------------------------------------------------
The following enhancement was made to DATAPLOT June 1996.
----------------------------------------------------------------------
For Unix systems, DATAPLOT now checks for the HOME environment variable. This
normally specifies the user's home directory. If present, DATAPLOT
looks for the user's start-up file (dplogf.tex) in the user's home
directory rather than the current directory. This means you no
longer have to include the start-up file in each directory from
which you run DATAPLOT. If HOME is not found, DATAPLOT looks for dplogf.tex
in the current directory. Note that if HOME is found and dplogf.tex
is not found in the home directory, DATAPLOT will NOT look for
it in the current directory.
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT MAY, 1996.
----------------------------------------------------------------------
1) Fixed a bug where the X11 driver bombed if being run remotely
and the SET X11 PIXMAP ON command was used.
2) Fixed a bug where the 3D-PLOT was bombing when a large number of
points were plotted.
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT FEBRUARY-APRIL, 1996.
----------------------------------------------------------------------
1) The following probability functions were added:
LET A = BBNCDF(X,ALPHA,BETA,N) - beta-binomial cumulative
distribution function
LET A = BBNPDF(X,ALPHA,BETA,N) - beta-binomial probability
density function
LET A = BBNPPF(P,ALPHA,BETA,N) - beta-binomial percent point
function
LET A = BRACDF(X,BETA) - Bradford cumulative distribution
function
LET A = BRAPDF(X,BETA) - Bradford probability density function
LET A = BRAPPF(P,BETA) - Bradford percent point function
LET A = DGACDF(X,GAMMA) - double gamma cumulative distribution
function
LET A = DGAPDF(X,GAMMA) - double gamma probability density
function
LET A = DGAPPF(P,GAMMA) - double gamma percent point function
LET A = FCACDF(X,U,SD) - folded Cauchy cumulative distribution
function
LET A = FCAPDF(X,U,SD) - folded Cauchy probability density
function
LET A = FCAPPF(P,U,SD) - folded Cauchy percent point function
LET A = GEXCDF(X,LAM1,LAM2,S) - generalized exponential
cumulative distribution function
LET A = GEXPDF(X,LAM1,LAM2,S) - generalized exponential
probability density function
LET A = GEXPPF(P,LAM1,LAM2,S) - generalized exponential
percent point function
LET A = GLOCDF(X,ALPHA) - generalized logistic cumulative
distribution function
LET A = GLOPDF(X,ALPHA) - generalized logistic probability
density function
LET A = GLOPPF(P,ALPHA) - generalized logistic percent point
function
LET A = KAPCDF(X,AK,B,T) - Mielke's beta-kappa cumulative
distribution function
LET A = KAPPDF(X,AK,B,T) - Mielke's beta-kappa probability
density function
LET A = KAPPPF(P,AK,B,T) - Mielke's beta-kappa percent point
function
LET A = NCCPDF(X,V,DELTA) - non-central chi-square probability
density function
LET A = PEXCDF(X,ALPHA,BETA) - exponential power cumulative
distribution function
LET A = PEXPDF(X,ALPHA,BETA) - exponential power probability
density function
LET A = PEXPPF(P,ALPHA,BETA) - exponential power percent point
function
The following probability plots were added:
LET ALPHA = <value>
LET BETA = <value>
LET N = <value>
BETA BINOMIAL PROBABILITY PLOT Y
LET BETA = <value>
BRADFORD PROBABILITY PLOT Y
LET GAMMA = <value>
DOUBLE GAMMA PROBABILITY PLOT Y
LET M = <value>
LET SD = <value>
FOLDED CAUCHY PROBABILITY PLOT Y
LET LAMBDA1 = <value>
LET LAMBDA2 = <value>
LET S = <value>
GENERALIZED EXPONENTIAL PROBABILITY PLOT Y
LET ALPHA = <value>
GENERALIZED LOGISTIC PROBABILITY PLOT Y
LET BETA = <value>
LET THETA = <value>
LET K = <value>
MIELKE BETA-KAPPA PROBABILITY PLOT Y
LET ALPHA = <value>
LET BETA = <value>
EXPONENTIAL POWER PROBABILITY PLOT Y
The following probability plot correlation coefficient plots
were added:
BRADFORD PPCC PLOT Y
DOUBLE GAMMA PPCC PLOT Y
GENERALIZED LOGISTIC PPCC PLOT Y
2) The WRITE command was updated to handle a maximum of 25 variables
(up from 10).
Support was added for writing Fortran unformatted data files.
This was done primarily for sites that have created "mega" size
versions of DATAPLOT where the time entailed in reading and writing
large data files becomes important. For standard size DATAPLOT
(typically a maximum of 10,000 rows with 10 columns for 100,000
data points total), the use of the SET READ FORMAT and SET WRITE
FORMAT commands provides adequate performance. However, the
unformatted read and write capability is available regardless of
the workspace size. The advantage of unformatted reads and writes
is that the data files are much smaller (typically by a factor of
10 or more) and reading and writing the data is significantly faster.
The disadvantage is that unformatted files are binary, and thus
cannot be modified or viewed with a standard text editor. In addition,
Fortran unformatted files are NOT transportable across different
computer systems, and they are NOT
equivalent to C language byte stream files (these types of files
are not currently supported in DATAPLOT).
An unformatted write is accomplished by entering the command:
SET WRITE FORMAT UNFORMATTED
and then entering a standard WRITE command. For example,
WRITE LARGE.DAT X1 X2 X3
There are 2 ways to create the unformatted file in Fortran. For
example, suppose X and Y are to be written to an unformatted
file. The WRITE can be generated by:
a) WRITE(IUNIT) (X(I),Y(I),I=1,N)
b) WRITE(IUNIT) X,Y
The distinction is that (a) stores the data as X(1), Y(1),
X(2), Y(2), ..., X(N), Y(N) while (b) stores all of X then
all of Y. There is no inherent advantage in either method in
terms of performance or file size. The SET WRITE FORMAT
UNFORMATTED command only supports (a).
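The difference between layouts (a) and (b) can be made concrete with the Python sketch below. It is an illustration under stated assumptions, not a description of any particular compiler: it assumes 4-byte little-endian reals and a 4-byte record-length marker before and after the data, which is a common but vendor-specific convention. All function names are hypothetical.

```python
import struct

def pack_record(values, marker_fmt="<i", real_fmt="<f"):
    """Pack one unformatted sequential record: length marker, data,
    length marker (the marker layout is an assumed convention)."""
    payload = b"".join(struct.pack(real_fmt, v) for v in values)
    marker = struct.pack(marker_fmt, len(payload))
    return marker + payload + marker

def interleaved(x, y):
    """Layout (a): X(1), Y(1), X(2), Y(2), ..., X(N), Y(N)."""
    return [v for pair in zip(x, y) for v in pair]

def blocked(x, y):
    """Layout (b): all of X, then all of Y."""
    return list(x) + list(y)

x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
rec_a = pack_record(interleaved(x, y))
rec_b = pack_record(blocked(x, y))

# Same size on disk; only the order of the values differs.
print(len(rec_a), len(rec_b))
print(struct.unpack("<6f", rec_a[4:-4]))
print(struct.unpack("<6f", rec_b[4:-4]))
```

A reader of such a file has to know in advance which layout was used, since nothing in the record itself distinguishes (a) from (b).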
Unformatted writing is supported only for variables or matrices
(i.e., not for parameters or strings).
Be aware that Fortran unformatted files are NOT transportable
across systems. This is due to the fact that the file contains
various header bytes (the Fortran standard leaves implementation
of this up to the vendor) that are not standard. Also, the storage
of real numbers can vary between platforms. This means that
the SET WRITE FORMAT UNFORMATTED command can NOT be used to write
raw binary files (as might be produced by a C program) and it
cannot, in general, be used to write unformatted Fortran files
that can be read on systems other than the one you are running
DATAPLOT on.
3) The command SET RELATIVE HISTOGRAM <AREA/PERCENT> was added to
specify whether or not relative histograms (and relative
bi-histograms) are drawn so that the area under the histogram
sums to 1 or so that the heights of the histograms sum to 1.
The first option, which is the default, is useful when using the
relative histogram as an estimate of a probability distribution.
The second option is useful when you want to see what percentage
of the data falls in a given class.
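The two normalizations can be compared with the following Python sketch. It illustrates the definitions given above and is not DATAPLOT's implementation; the function name is hypothetical and equal class widths are assumed.

```python
def relative_histogram(counts, class_width, mode="AREA"):
    """Return bar heights for a relative histogram.

    AREA:    heights times class width sum to 1 (density estimate).
    PERCENT: heights themselves sum to 1 (fraction per class).
    """
    total = sum(counts)
    if total <= 0 or class_width <= 0:
        raise ValueError("need a positive total count and class width")
    if mode == "AREA":
        return [c / (total * class_width) for c in counts]
    if mode == "PERCENT":
        return [c / total for c in counts]
    raise ValueError("mode must be AREA or PERCENT")

counts = [2, 5, 3]   # observations per class
width = 0.5          # common class width
area_heights = relative_histogram(counts, width, "AREA")
pct_heights = relative_histogram(counts, width, "PERCENT")
print(area_heights, sum(h * width for h in area_heights))
print(pct_heights, sum(pct_heights))
```

The two forms coincide only when the class width is 1.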
4) For Unix versions, the location of the DATAPLOT auxiliary files
can be specified with the following Unix command:
setenv DATAPLOT_FILES <directory name>
This can be useful if you do not have super user permission to
copy the files into the /usr/local/lib/dataplot directory and
you do not have a cooperative system administrator.
5) The LET STRING command was modified so that the case of the
text in the string is preserved as entered. Note that the
LET FUNCTION command still converts text to upper case.
The READ STRING command was modified so that it ignores the
SET READ FORMAT command.
6) Numerous minor bugs were fixed.
-----------------------------------------------------------------
The following enhancements were made to DATAPLOT AUGUST-OCTOBER, 1995.
-----------------------------------------------------------------
1) The Numerical Recipes routine for calculating complex roots
was replaced with a CMLIB routine. There is no change in the
command syntax.
2) The Numerical Recipes routine for calculating the fast Fourier
transform was replaced with CMLIB routines. A couple of changes
were made as follows:
a) the CMLIB routine does not require zero padding so that
the length of the variable is a power of two. Previously,
DATAPLOT did this automatically. It no longer does. However,
the CMLIB algorithm loses efficiency if the length is not a
product of small primes. In this case, you may wish to zero
pad the variable yourself before calling the FFT command.
b) The SET FOURIER EXPONENT <+/-> command was corrected to work
as intended (the default implemented the + case, which was really
the only option that worked). In addition, this command was
extended to apply to the FOURIER and INVERSE FOURIER command
as well as the FFT and INVERSE FFT commands. Enter
HELP FOURIER EXPONENT for more information on this command.
c) Most FFT routines return the data in the following order:
F(1) = zero frequency
F(2) ... F(N/2) = smallest positive frequency to largest
positive frequency
F(N/2+1) = aliased point that contains the largest
positive and the largest negative frequency
F(N/2+2) ... F(N) = negative frequencies from largest
magnitude to smallest magnitude
By default, DATAPLOT returns the data in the following order:
F(1) = aliased point that contains the largest
positive and the largest negative frequency
F(2) ... F(N/2) = Largest positive frequency to smallest
positive frequency
F(N/2+1) = zero frequency
F(N/2+2) ... F(N) = negative frequencies from smallest
magnitude to largest magnitude
The command SET FOURIER ORDER <STANDARD/DATAPLOT> was
implemented to allow you to specify which order to use.
The option STANDARD returns the first order while the option
DATAPLOT returns the second order.
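The two orderings listed in (c) are related by a simple index reversal. The Python sketch below is derived only from those two lists, for an even length N; it is not DATAPLOT's implementation and the function name is hypothetical. Applying the same mapping twice returns the original order, so one function converts in either direction.

```python
def swap_fourier_order(values):
    """Convert between the STANDARD and DATAPLOT orderings described
    above (even-length input assumed). The mapping is its own inverse."""
    n = len(values)
    if n == 0 or n % 2 != 0:
        raise ValueError("this sketch assumes a non-empty, even-length sequence")
    half = n // 2
    # Output position j holds frequency (N/2 - j); in the standard
    # order that frequency is stored at index (N/2 - j) mod N.
    return [values[(half - j) % n] for j in range(n)]

# Label each slot of an N = 8 transform by its frequency index.
standard = ["0", "+1", "+2", "+3", "+/-4", "-3", "-2", "-1"]
dataplot = swap_fourier_order(standard)
print(dataplot)
# ['+/-4', '+3', '+2', '+1', '0', '-1', '-2', '-3']
```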
3) Support was added for hypergeometric, non-central chi-square,
singly and doubly non-central F, half-Cauchy and folded normal
random numbers.
The following probability functions were added:
LET A = ANGCDF(X) - anglit cumulative distribution function
LET A = ANGPDF(X) - anglit density function
LET A = ANGPPF(X) - anglit percent point function
LET A = ARSCDF(X) - arcsin cumulative distribution function
LET A = ARSPDF(X) - arcsin density function
LET A = ARSPPF(X) - arcsin percent point function
LET A = DWECDF(X,G) - double Weibull cumulative distribution
function
LET A = DWEPDF(X,G) - double Weibull density function
LET A = DWEPPF(X,G) - double Weibull percent point function
LET A = EWECDF(X,G) - exponentiated Weibull cumulative
distribution function
LET A = EWEPDF(X,G) - exponentiated Weibull density function
LET A = EWEPPF(X,G) - exponentiated Weibull percent point function
LET A = FNRCDF(X,U,SD) - folded normal cumulative distribution
function
LET A = FNRPDF(X,U,SD) - folded normal probability density
function
LET A = FNRPPF(X,U,SD) - folded normal percent point function
LET A = GEVCDF(X,G) - generalized extreme value cumulative
distribution function
LET A = GEVPDF(X,G) - generalized extreme value density function
LET A = GEVPPF(X,G) - generalized extreme value percent point
function
LET A = GOMCDF(X,C,B) - Gompertz cumulative distribution function
LET A = GOMPDF(X,C,B) - Gompertz probability density function
LET A = GOMPPF(X,C,B) - Gompertz percent point function
LET A = HFCCDF(X) - half-Cauchy cumulative distribution function
LET A = HFCPDF(X) - half-Cauchy density function
LET A = HFCPPF(X) - half-Cauchy percent point function
LET A = HFLCDF(X,G) - generalized half-logistic cumulative
distribution function
LET A = HFLPDF(X,G) - generalized half-logistic density function
LET A = HFLPPF(X,G) - generalized half-logistic percent point
function
LET A = HSECDF(X) - hyperbolic secant cumulative distribution
function
LET A = HSEPDF(X) - hyperbolic secant density function
LET A = HSEPPF(X) - hyperbolic secant percent point function
LET A = LGACDF(X,G) - log-gamma cumulative distribution function
LET A = LGAPDF(X,G) - log-gamma density function
LET A = LGAPPF(X,G) - log-gamma percent point function
LET A = PA2CDF(X,G) - Pareto type 2 cumulative distribution
function
LET A = PA2PDF(X,G) - Pareto type 2 density function
LET A = PA2PPF(X,G) - Pareto type 2 percent point function
LET A = TNRCDF(X,A,B,U,SD) - truncated normal cumulative
distribution function
LET A = TNRPDF(X,A,B,U,SD) - truncated normal probability density
function
LET A = TNRPPF(X,A,B,U,SD) - truncated normal percent point
function
LET A = TNECDF(X,X0,U,SD) - truncated exponential cumulative
distribution function
LET A = TNEPDF(X,X0,U,SD) - truncated exponential probability
density function
LET A = TNEPPF(X,X0,U,SD) - truncated exponential percent point
function
LET A = WCACDF(X,G) - wrapped-up Cauchy cumulative distribution
function
LET A = WCAPDF(X,G) - wrapped-up Cauchy density function
LET A = WCAPPF(X,G) - wrapped-up Cauchy percent point function
The following probability plots were added:
ANGLIT PROBABILITY PLOT Y
ARCSIN PROBABILITY PLOT Y
HYPERBOLIC SECANT PROBABILITY PLOT Y
HALF CAUCHY PROBABILITY PLOT Y
LET M = <value>
LET SD = <value>
FOLDED NORMAL PROBABILITY PLOT Y
LET A = <value>
LET B = <value>
LET M = <value> (optional, defaults to 0)
LET SD = <value> (optional, defaults to 1)
TRUNCATED NORMAL PROBABILITY PLOT Y
LET X0 = <value>
LET M = <value> (optional, defaults to 0)
LET SD = <value> (optional, defaults to 1)
TRUNCATED EXPONENTIAL PROBABILITY PLOT Y
LET GAMMA = <value>
DOUBLE WEIBULL PROBABILITY PLOT Y
LOG GAMMA PROBABILITY PLOT Y
GENERALIZED EXTREME VALUE PROBABILITY PLOT Y (or GEV PROB PLOT)
PARETO SECOND KIND PROBABILITY PLOT Y (or PARETO TYPE 2)
HALF LOGISTIC PROBABILITY PLOT Y (GAMMA optional for this case)
LET GAMMA = <value>
LET THETA = <value>
EXPONENTIATED WEIBULL PROBABILITY PLOT Y
LET C = <value>
LET B = <value>
GOMPERTZ PROBABILITY PLOT Y
LET C = <value>
WRAPPED CAUCHY PROBABILITY PLOT Y
The following probability plot correlation coefficient plots were
added:
LOG GAMMA PPCC PLOT Y
DOUBLE WEIBULL PPCC PLOT Y
GENERALIZED EXTREME VALUE PPCC PLOT Y (or GEV PPCC PLOT)
PARETO SECOND KIND PPCC PLOT Y (or PARETO TYPE 2 PPCC PLOT)
WRAPPED CAUCHY PPCC PLOT Y
HALF LOGISTIC PPCC PLOT Y
4) The following character option was added:
CHARACTER PIXEL
This option plots a single "pixel" on a given device. In addition,
when this option is given, the CHARACTER SIZE is interpreted as
an integer expansion factor. For example, CHARACTER SIZE 10 will
plot a 10x10 pixel block.
This option has been implemented for the Tektronix, X11,
Postscript, HP-GL, Regis, HP-2622, and Sun devices. Other devices
will print a message saying this option is unavailable (although
additional devices will be added later).
Although this capability was added with some possible future
enhancements in mind, it can be useful in some plots such as
fractal plots.
-----------------------------------------------------------------
The following enhancements were made to DATAPLOT JULY, 1995.
-----------------------------------------------------------------
Support was added for various types of orthogonal polynomials.
The following commands were added.
LET A = LEGENDRE(X,N) Compute the Legendre polynomial of
order n
LET A = LEGENDRE(X,N,M) Compute the associated Legendre
polynomial of order n and degree m
LET A = NRMLEG(X,N) Compute the normalized Legendre
polynomial of order n
LET A = NRMLEG(X,N,M) Compute the associated normalized
Legendre polynomial of order n and
degree m
LET A = LEGP(X,N) Compute the Legendre function of the
first kind of order n
LET A = LEGP(X,N,M) Compute the associated Legendre function
of the first kind of order n and degree m
LET A = LEGQ(X,N) Compute the Legendre function of the
second kind of order n
LET A = LEGQ(X,N,M) Compute the associated Legendre function
of the second kind of order n and
degree m
LET A = SPHRHRMR(X,P,N,M) Compute the real component of the
spherical harmonic function
LET A = SPHRHRMC(X,P,N,M) Compute the imaginary component of the
spherical harmonic function
LET A = LAGUERRE(X,N) Compute the Laguerre polynomial of
order n
LET A = LAGUERRL(X,N,A) Compute the generalized Laguerre
polynomial of order n
LET A = NRMLAG(X,N) Compute the normalized Laguerre
polynomial of order n
LET A = CHEBT(X,N) Compute the Chebyshev T (first kind)
polynomial of order n
LET A = CHEBU(X,N) Compute the Chebyshev U (second kind)
polynomial of order n
LET A = JACOBIP(X,N,A,B) Compute the Jacobi polynomial of order n
LET A = ULTRASPH(X,N,A) Compute the Ultraspherical (or
Gegenbauer) polynomial of order n
LET A = HERMITE(X,N) Compute the Hermite polynomial of order n
LET A = LNHERMIT(X,N) Compute the log of the absolute value of
the Hermite polynomial of order n
LET A = HERMSGN(X,N) Compute the sign of the Hermite
polynomial (1 for positive, -1 for
negative, 0 for zero)
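As an illustration of what two of these functions compute, the
Chebyshev T and Legendre polynomials can be evaluated with their
standard three-term recurrences. The Python sketch below is not
DATAPLOT's implementation; the names cheb_t and legendre are
hypothetical and are used only for this example.

```python
import math

def cheb_t(x, n):
    """Chebyshev polynomial of the first kind, T_n(x).

    Uses T_0 = 1, T_1 = x, T_{k+1} = 2*x*T_k - T_{k-1}.
    """
    if n == 0:
        return 1.0
    prev, curr = 1.0, float(x)
    for _ in range(1, n):
        prev, curr = curr, 2.0 * x * curr - prev
    return curr

def legendre(x, n):
    """Legendre polynomial P_n(x).

    Uses Bonnet's recurrence:
    (k+1)*P_{k+1} = (2k+1)*x*P_k - k*P_{k-1}.
    """
    if n == 0:
        return 1.0
    prev, curr = 1.0, float(x)
    for k in range(1, n):
        prev, curr = curr, ((2 * k + 1) * x * curr - k * prev) / (k + 1)
    return curr
```

A quick sanity check is the identity T_n(cos(t)) = cos(n*t), which
cheb_t should reproduce to within rounding error.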
In addition, an alpha version of a graphical user interface is
available on some Unix systems. You can check with your local site
installer to see if it is available on your system. If it is
available, it is typically executed by entering the command:
xdp
At NIST, the frontend has been installed on the CAML Suns and
SGIs as well as the Convex. There are no plans to install it
on the Cray. For non-NIST sites, the following non-DATAPLOT programs
must be installed:
1) Tcl/Tk - Tool Command Language
2) Expect - a program for controlling the dialog among
interactive programs.
These are both popular public domain Unix utilities that can be
installed on most common Unix platforms.
-----------------------------------------------------------------
The following enhancements were made to DATAPLOT APRIL, 1995.
-----------------------------------------------------------------
1) Support was added for reading Fortran unformatted data files.
This was done primarily for sites that have created "mega" size
versions of DATAPLOT where the time entailed in reading large
data files becomes important. For standard size DATAPLOT
(typically a maximum of 10,000 rows with 10 columns for 100,000
data points total), the use of the SET READ FORMAT command
provides adequate performance. However, the unformatted read
capability is available regardless of the workspace size. The
advantage of unformatted reads is that the data files are much
smaller (typically by a factor of 10 or more) and reading the
data is significantly faster. The disadvantage is that unformatted
files are binary, and thus cannot be modified or viewed with a
standard text editor. Also, Fortran unformatted files are NOT
transportable across different computer systems.
An unformatted read is accomplished by entering the command:
SET READ FORMAT UNFORMATTED
and then entering a standard READ command. For example,
READ LARGE.DAT X1 X2 X3
There are 2 ways to create the unformatted file in Fortran. For
example, suppose X and Y are to be written to an unformatted
file. The WRITE can be generated by:
a) WRITE(IUNIT) (X(I),Y(I),I=1,N)
b) WRITE(IUNIT) X,Y
The distinction is that (a) stores the data as X(1), Y(1),
X(2), Y(2), ..., X(N), Y(N) while (b) stores all of X then
all of Y. There is no inherent advantage in either method in
terms of performance or file size. The SET READ FORMAT
UNFORMATTED command assumes (a). To specify (b), enter the
command:
SET READ FORMAT COLUMNWISE (or UNFORMATTEDCOLUMNWISE)
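The difference between storage orders (a) and (b) can be
illustrated without Fortran. The Python sketch below works only on
the sequence of data values and ignores any record headers that a
real unformatted file contains. The function names are hypothetical
and this is not DATAPLOT's reader.

```python
def interleave(x, y):
    """Order written by (a): X(1), Y(1), X(2), Y(2), ..., X(N), Y(N)."""
    out = []
    for xi, yi in zip(x, y):
        out.extend([xi, yi])
    return out

def columnwise(x, y):
    """Order written by (b): all of X, then all of Y."""
    return list(x) + list(y)

def split_interleaved(values, ncols):
    """Recover the columns from storage order (a)."""
    return [values[j::ncols] for j in range(ncols)]

def split_columnwise(values, ncols):
    """Recover the columns from storage order (b)."""
    n = len(values) // ncols
    return [values[j * n:(j + 1) * n] for j in range(ncols)]
```

Both files hold the same values, so a reader must be told which
order was used. Splitting order (b) data as if it were order (a)
returns columns of the right length but with the wrong contents.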
Unformatted reading is supported only for variables or matrices
(i.e., not for parameters or strings). Also, it only applies
when reading from a file. The limits for the maximum number of
rows and columns for a matrix still apply (500 rows and 100
columns on most systems). When reading a matrix, the number of
columns must be specified via the SET UNFORMATTED COLUMNS
command. For example,
SET READ FORMAT UNFORMATTED
SET UNFORMATTED COLUMNS 25
READ MATRIX.DAT M
The maximum size of the file that DATAPLOT can read is equal to
the workspace size on your implementation (100,000 or 200,000
points on most installations). For larger files, it will read
up to this number of data values.
The data is assumed to be a rectangular grid of data written in
a single chunk. Only single precision real numbers are
supported. By default, the entire file (up to the maximum number
of points) is read. DATAPLOT does provide 2 commands to allow
some control of what portion of the file is read:
SET UNFORMATTED OFFSET <value>
SET UNFORMATTED RECORDS <value>
The OFFSET specifies the number of data values at the beginning of
the file to skip. This is useful for skipping header lines
(similar to a SKIP command for reading ASCII files) and other
miscellaneous values. The RECORDS value is useful for reading
part of a larger file.
Be aware that Fortran unformatted files are NOT transportable
across systems. This is due to the fact that the file contains
various header bytes (the Fortran standard leaves implementation
of this up to the vendor) that are not standard. Also, the storage
of real numbers can vary between platforms. This means that
the SET READ FORMAT UNFORMATTED command can NOT be used to read
raw binary files (as might be produced by a C program) and it
cannot, in general, be used to read unformatted Fortran files
created on systems other than the one you are running DATAPLOT on.
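To make the header bytes concrete, the Python sketch below frames
a list of single precision values the way many Unix Fortran
compilers frame a sequential unformatted record: a 4-byte length
before and after the data. This layout is an assumption about one
common convention (it is, for example, the gfortran default), not
something the Fortran standard or DATAPLOT guarantees. The function
names and the byteorder argument are hypothetical.

```python
import struct

def write_record(values, byteorder="<"):
    """Frame single precision values as one unformatted record.

    ASSUMPTION: 4-byte leading and trailing length markers.
    """
    payload = struct.pack(f"{byteorder}{len(values)}f", *values)
    marker = struct.pack(f"{byteorder}i", len(payload))
    return marker + payload + marker

def read_record(blob, byteorder="<"):
    """Read back one record written with the assumed layout."""
    (nbytes,) = struct.unpack_from(f"{byteorder}i", blob, 0)
    if nbytes < 0 or 8 + nbytes > len(blob):
        raise ValueError("implausible record length; wrong layout or byte order")
    values = struct.unpack_from(f"{byteorder}{nbytes // 4}f", blob, 4)
    (trailer,) = struct.unpack_from(f"{byteorder}i", blob, 4 + nbytes)
    if trailer != nbytes:
        raise ValueError("record markers disagree")
    return list(values)
```

Under this assumption a record of three values occupies 20 bytes
rather than the 12 bytes of raw data, which is why a raw binary
file or a file with a different byte order does not read correctly.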
2) The following mathematical library functions were added:
LET A = HEAVE(X,C) - Heaviside function (=1 if X>=C, 0
otherwise, C is 0 if no second argument)
LET A = CEIL(X) - ceiling function (integer value of X
rounded toward positive infinity)
LET A = FLOOR(X) - floor function (integer value of X rounded
toward negative infinity)
LET A = STEP(X) - step function (synonym for FLOOR(X))
LET A = GCD(X1,X2) - greatest common divisor of X1 and X2
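The definitions above can be sketched in Python as follows. This
is an illustration only, not DATAPLOT's implementation. The
function names mirror the DATAPLOT names in lower case, and the
truncation of the GCD arguments to integers is an assumption.

```python
import math

def heave(x, c=0.0):
    """Heaviside function: 1 if x >= c, 0 otherwise (c defaults to 0)."""
    return 1.0 if x >= c else 0.0

def ceil(x):
    """Smallest integer value greater than or equal to x."""
    return float(math.ceil(x))

def floor(x):
    """Largest integer value less than or equal to x."""
    return float(math.floor(x))

def gcd(x1, x2):
    """Greatest common divisor (ASSUMPTION: arguments truncated to integers)."""
    return float(math.gcd(int(x1), int(x2)))
```

Note that for negative arguments CEIL and FLOOR differ from simple
truncation: the ceiling of -1.5 is -1 and the floor of -1.5 is -2.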
3) The following command was added:
LET A = MAD Y - median absolute deviation
MEDIAN ABSOLUTE DEVIATION is a synonym for MAD. Given a variable
X with median value MED, the MAD is defined as the median of
the absolute value of (X-MED).
The BOOTSTRAP PLOT, JACKNIFE PLOT, STATISTIC PLOT, BLOCK PLOT, and
DEX PLOT commands were modified to support the MAD and AAD
statistics.
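The MAD definition given in this item can be sketched in Python
as follows. The function name mad is hypothetical, and the sketch
applies no scaling constant since the definition above does not
mention one.

```python
import statistics

def mad(values):
    """Median absolute deviation: median of |x - median(x)|."""
    values = list(values)
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)
```

For the data 1, 2, 3, 4, 100 the median is 3, the absolute
deviations are 2, 1, 0, 1, 97, and the MAD is 1. The single
outlier at 100 has no effect on the result, which is the usual
reason for preferring the MAD to the standard deviation.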
4) The PHD command was renamed DEX PHD. In addition, some I/O was
fixed in these routines.
5) Some bugs were fixed in the EDIT command. A few other
miscellaneous bugs were fixed.
7) The following functions were added to the probability library.
LET A = ALPCDF(X,ALPHA,BETA) - alpha cumulative distribution
function
LET A = ALPPDF(X,ALPHA,BETA) - alpha density function
LET A = ALPPPF(X,ALPHA,BETA) - alpha percent point function
LET A = CHCDF(X,NU) - chi cumulative distribution
function
LET A = CHPDF(X,NU) - chi density function
LET A = CHPPF(X,NU) - chi percent point function
LET A = COSCDF(X) - cosine cumulative distribution
function
LET A = COSPDF(X) - cosine density function
LET A = COSPPF(X) - cosine percent point function
LET A = DLGCDF(X,THETA) - logarithmic series cumulative
distribution function
LET A = DLGPDF(X,THETA) - logarithmic series density
function
LET A = DLGPPF(X,THETA) - logarithmic series percent point
function
LET A = GGDCDF(X,ALPHA,C) - generalized gamma cumulative
distribution function
LET A = GGDPDF(X,ALPHA,C) - generalized gamma density function
LET A = GGDPPF(X,ALPHA,C) - generalized gamma percent point
function
LET A = LLGCDF(X,DELTA) - log-logistic cumulative
distribution function
LET A = LLGPDF(X,DELTA) - log-logistic density function
LET A = LLGPPF(X,DELTA) - log-logistic percent point
function
LET A = PLNCDF(X,P,SD) - power lognormal cumulative
distribution function
LET A = PLNPDF(X,P,SD) - power lognormal density function
LET A = PLNPPF(X,P,SD) - power lognormal percent point
function
LET A = PNRCDF(X,P,SD) - power normal cumulative
distribution function
LET A = PNRPDF(X,P,SD) - power normal density function
LET A = PNRPPF(X,P,SD) - power normal percent point function
LET A = POWCDF(X,C) - power function cumulative
distribution function
LET A = POWPDF(X,C) - power function density function
LET A = POWPPF(X,C) - power function percent point
function
LET A = WARCDF(X,C,A) - Waring cumulative distribution
function
LET A = WARPDF(X,C,A) - Waring density function
LET A = WARPPF(P,C,A) - Waring percent point function
LET A = NCTPDF(X,NU,DELTA) - non-central t density function
(cumulative distribution and percent
point functions were added previously)
LET A = TNRPDF(X,A,B) - truncated normal density function
LET A = FNRPDF(X,U,SD) - folded normal density function
The Yule distribution is a special case of the Waring
distribution. Set A to 1 or simply omit the A parameter.
The generalized gamma distribution can handle negative values
for the C parameter (although not zero). Specifically, a value
of C = -1 is the inverted gamma distribution.
In addition, the log-normal cdf, pdf, and ppf functions were
upgraded to handle the standard deviation shape parameter (LGNCDF,
LGNPDF, LGNPPF). This parameter defaults to 1 if not specified.
In addition the following probability plots were added.
COSINE PROBABILITY PLOT Y
LET ALPHA = <value>
LET BETA = <value>
ALPHA PROBABILITY PLOT Y
LET P = <value>
LET SD = <value> (this parameter optional, defaults to 1)
POWER NORMAL PROBABILITY PLOT Y
LET P = <value>
LET SD = <value> (this parameter optional, defaults to 1)
POWER LOGNORMAL PROBABILITY PLOT Y
LET SD = <value>
LOGNORMAL PROBABILITY PLOT Y
LET C = <value>
POWER FUNCTION PROBABILITY PLOT Y
LET NU = <value>
CHI PROBABILITY PLOT Y
LET THETA = <value>
LOGARITHMIC SERIES PROBABILITY PLOT Y
LET DELTA = <value>
LOG LOGISTIC PROBABILITY PLOT Y
LET GAMMA = <value>
LET C = <value>
GENERALIZED GAMMA PROBABILITY PLOT Y
LET A = <value> (can omit for the Yule distribution)
LET C = <value>
WARING PROBABILITY PLOT Y
In addition the following PPCC plots were added.
LET SD = <value> (this parameter optional, defaults to 1)
POWER NORMAL PPCC PLOT Y
LET SD = <value> (this parameter optional, defaults to 1)
POWER LOGNORMAL PPCC PLOT Y
LET SD = <value>
LOGNORMAL PPCC PLOT Y
CHI PPCC PLOT Y
VON MISES PPCC PLOT Y
POWER FUNCTION PPCC PLOT Y
LOG LOGISTIC PPCC PLOT Y
In addition the following random number generator was added.
LET C = <value>
LET Y = POWER FUNCTION RANDOM NUMBERS FOR I = 1 1 N
-----------------------------------------------------------------
The following enhancements were made to DATAPLOT NOVEMBER, 1994.
-----------------------------------------------------------------
1) The following mathematical library functions were added:
LET A = FRESNS(X) - Fresnel sine integral
LET A = FRESNC(X) - Fresnel cosine integral
LET A = FRESNF(X) - Fresnel auxiliary function f integral
LET A = FRESNG(X) - Fresnel auxiliary function g integral
LET A = SN(X,M) - Jacobian elliptic sn function
LET A = CN(X,M) - Jacobian elliptic cn function
LET A = DN(X,M) - Jacobian elliptic dn function
LET A = PEQ(XR,XI) - the real component of the Weierstrass
elliptic function (equianharmonic case)
LET A = PEQI(XR,XI) - the imaginary component of the Weierstrass
elliptic function (equianharmonic case)
LET A = PEQ1(XR,XI) - the real component of the first
derivative of the Weierstrass elliptic
function (equianharmonic case)
LET A = PEQ1I(XR,XI) - the imaginary component of the first
derivative of the Weierstrass elliptic
function (equianharmonic case)
LET A = PLEM(XR,XI) - the real component of the Weierstrass
elliptic function (lemniscatic case)
LET A = PLEMI(XR,XI) - the imaginary component of the Weierstrass
elliptic function (lemniscatic case)
LET A = PLEM1(XR,XI) - the real component of the first
derivative of the Weierstrass elliptic
function (lemniscatic case)
LET A = PLEM1I(XR,XI) - the imaginary component of the first
derivative of the Weierstrass elliptic
function (lemniscatic case)
------------------------------------------------------------
Changes prior to this are no longer in the news file
because they are documented in the Reference Manual and
the on-line help.
------------------------------------------------------------
YOU HAVE JUST ACCESSED THE FILE DPNEWF.
Date created: 06/05/2001
Last updated: 08/30/2023
Please email comments on this WWW page to
alan.heckert@nist.gov.