ASCII FILES

This section provides guidance on reading ASCII data files in Dataplot. It includes discussion of some commands added to the 1/2004 version of Dataplot, in particular commands relevant to ASCII files created by the Excel program.

Dataplot has limited support for binary data files. Currently, only binary files created with a Fortran unformatted WRITE are supported. Enter HELP SET READ FORMAT for details. Also, Dataplot does not currently support reading files from other statistical/spreadsheet programs or database files directly. Some support may be provided in future releases, but for now you need to save the data from these programs to an ASCII file in order to read them into Dataplot.

XML-based data files are becoming increasingly popular as well. At this time, Dataplot does not support XML-based data files, although we anticipate looking at this issue for subsequent releases.

IDEAL CASE

By default, Dataplot assumes rectangular data files containing numeric data where the data columns are separated by one or more spaces, commas, or tabs. In this case, you can read the file with a command like the following:
The first argument after READ is the name of the ASCII file. The remaining arguments are the variable names. Variable names can be up to eight characters long and should be limited to alphabetic (A-Z) and numeric (0-9) characters. Other characters can in fact be used, but this is discouraged since they can cause problems in some contexts. Variable names are not case sensitive (Dataplot converts all alphabetic characters to upper case). Variable names are separated by one or more spaces (commas are not allowed as delimiters in this context).

Dataplot recognizes the first argument as a file name if it finds a "." in the name. If no "." is found, Dataplot assumes the first argument is a variable name and tries to read from the keyboard rather than from a file.

The remainder of this section discusses various issues that may cause problems when reading ASCII files and provides suggestions on how to deal with them. The following topics are discussed:
If you create the ASCII file yourself, we recommend that you create it with variables of equal length (pick some numeric value to signify missing data) and with data items separated by one or more spaces. Including a header that describes the data file is optional, but we find it helpful (Dataplot can skip over the header lines). When the ASCII file is created by another program (e.g., Excel), you may have less control over its format. Hopefully, most ASCII files you encounter can be handled with the commands discussed below.

VIEWING THE ASCII FILE WITHIN DATAPLOT

To identify some of the issues discussed below, it is often helpful to view the ASCII file before trying to read it into Dataplot. You can do this with the command
This lists the file 20 lines at a time (you can change the number of lines with the SET LIST LINES command). Enter a carriage return to view the next 20 lines or "no" to stop viewing the file.

Some of the commands given below require you to know the appropriate line numbers or column numbers. To view the file with line numbers, enter the command
To identify appropriate columns, enter the command
This will identify the first 80 columns.

HEADER LINES/RESTRICTED ROWS OR COLUMNS

Many data files contain header lines at the beginning of the file that provide a description of the file. In order to skip over these lines, enter the command
where N identifies how many lines to skip. Most of the sample data files that are distributed with Dataplot contain a line starting with hyphens ("---"). You can use the command
for these files. Dataplot will then skip all lines until it encounters a line starting with three or more hyphens. Similarly, if you want to restrict the read to certain rows in the file, you can enter the command
with N1 and N2 denoting the first and last rows to read, respectively. You can also restrict the read to certain columns of the file using the command
with C1 denoting the first column to read and C2 the last column to read. When reading from the keyboard, Dataplot restricts a single record to a maximum of 80 columns. When reading from a file, Dataplot previously restricted a single record to a maximum of 132 columns. The March, 2003 version raised the default limit to 255 characters. In addition, the following command was added:
with N denoting the size of the largest record to be read. Dataplot accepts values of N up to 9999. However, be aware that some Fortran compilers may impose their own limit. These limits tend not to be well documented, but with modern compilers they should be large enough that this is not a problem in practice. If you specify a SET READ FORMAT command (discussed below), you do not need to specify the maximum record length.

Normally, Dataplot reads the variable names from the READ command. However, many ASCII files give the names of the variables directly in the file, or Dataplot can assign the variable names automatically. Specific methods include the following.
Note that Dataplot's usual rules for variable names still apply. That is, a maximum of eight characters will be used and spaces will delimit variable names. The use of special characters (i.e., characters that are neither numeric nor alphabetic) is discouraged. You may need to edit the file if the variable names do not follow these rules. Characters beyond the eighth are simply ignored, so the main concern is names that are identical in their first eight characters, since these become duplicate variable names.

By default, Dataplot performs free-format reads. That is, you do not need to line up the columns neatly. You do need to provide one or more spaces (tabs, commas, colons, semicolons, parentheses, or brackets can be used as well) between data fields. However, many data files contain fixed fields, and there are several reasons you may want or need to take advantage of these fixed fields rather than use a free-format read.
There are two basic cases for fixed fields.
READING VARIABLES OF UNEQUAL LENGTH

Dataplot normally expects all variables to be of equal length. If some variables have missing rows, this can have undesired results. Dataplot assigns the first value read on a row to the first variable name, the second value to the second variable, and so on. If a row contains fewer values than there are variables, the variables left over receive no value from that row (even if they have values in other rows). If you have a data file where the columns have unequal lengths, you can do one of the following things.
READING DATA WITH CHARACTER FIELDS

Dataplot has not previously supported character data. The one exception is that you could read row labels with the READ ROW LABEL command (enter HELP READ ROW LABEL for details). If character data were encountered, Dataplot would generate an error message and not read the data file correctly. With the January 2004 version, we have introduced some limited support for character data. Specifically, we have added the command
Setting this to ERROR keeps Dataplot's existing behavior of reporting character data as an error. This is recommended when a file is supposed to contain only numeric data, so that the presence of character data does in fact indicate an error in the data file.

Setting this to IGNORE instructs Dataplot to simply ignore any fields containing character data. This can be useful if you want to extract the numeric data fields in the file without entering COLUMN LIMITS or SET READ FORMAT commands.

Setting this to ON reads the character fields and writes them to the file "dpzchf.dat". Note that Dataplot saves numeric data "in memory" for fast access. Since character data has limited use in Dataplot, we have decided to save character data externally to minimize memory requirements. Dataplot keeps a separate name table for the character data fields (the names for character variables are stored in the file "dpzchf.dat").

There are some restrictions on when Dataplot will try to read character data:
Some of these restrictions may be addressed in subsequent releases of Dataplot. Currently, Dataplot has limited support for character variables. Specifically,
We anticipate additional use of character variables in subsequent releases of Dataplot.

If your character fields contain characters that are neither numeric nor alphabetic, we recommend enclosing the character fields in quotes. When Dataplot encounters a quote (either single or double), it treats everything up to the matching quote as part of that character field. If quotes are not used, then spaces, tabs, parentheses, brackets, colons, and semicolons are interpreted as delimiters that signify the end of the data item.

Dataplot assumes a column-oriented format. That is, a row of data represents a single record (or case) and a column of data represents a variable. In a file with a row orientation, this is reversed: a row of data represents a variable and a column of data represents a record (or case). The following example shows one way of reading such data correctly into Dataplot. Suppose that your data file contains five rows, with each row corresponding to a single variable. You can do the following:
SERIAL READ FILE.DAT X

It is sometimes convenient to include comments in data files. If these comments are at the beginning of the file, then the SKIP command can be used. To have Dataplot check for comment lines in the data file, enter the command
The default comment character is ".". That is, any line starting with ". " is treated as a comment line and ignored. To specify a different comment character, enter the command
with
At the current time (1/2004), Dataplot does not support the
direct reading of Excel data files. We are planning to add
this capability in a future release of Dataplot. Until that
time, you need to save the data in Excel to an ASCII file and
read that ASCII file into Dataplot.
Excel provides the following options for writing ASCII data
files:
This format will use consistent columns for the data fields.
The variable form of the COLUMN LIMITS command can be used
when the data columns have unequal length.
Character fields will often not have the separating space. The
variable form of the COLUMN LIMITS command can be used in this
case as well.
This format will separate data fields with a single comma.
Missing data is represented with successive commas. Dataplot
can now (as of the January 2004 version) handle this correctly.
These files will separate data fields with a tab character.
Note that Dataplot converts all non-printing characters
(including tabs) to a single space character.
This format is not appropriate for data containing variables
with unequal lengths since it will not generate consistent
columns for the data fields. Use either the space delimited
or comma delimited file for that case.
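The role of successive commas in a comma delimited file can be illustrated outside Dataplot. The following Python sketch is not Dataplot code and is not how Dataplot implements its reader. It parses comma delimited text with Python's standard csv module and substitutes a placeholder missing-value code for each empty field, so that all columns keep the same length. The code -999 is an arbitrary choice for illustration.

```python
import csv
import io

# Hypothetical missing-value code chosen for illustration only.
MISSING = -999.0

def parse_comma_delimited(text, missing=MISSING):
    """Parse comma delimited text into equal-length numeric columns.

    An empty field (two successive commas, or a trailing comma)
    is treated as missing and replaced by `missing`.
    """
    rows = list(csv.reader(io.StringIO(text)))
    ncols = max(len(r) for r in rows)
    columns = [[] for _ in range(ncols)]
    for row in rows:
        # Pad short rows so every column keeps the same length.
        row = row + [""] * (ncols - len(row))
        for j, field in enumerate(row):
            field = field.strip()
            columns[j].append(float(field) if field else missing)
    return columns

sample = "1,2,3\n4,,6\n7,8,\n"
x, y, z = parse_comma_delimited(sample)
print(x)  # [1.0, 4.0, 7.0]
print(y)  # [2.0, -999.0, 8.0]
print(z)  # [3.0, 6.0, -999.0]
```

The space and tab delimited formats have no equivalent of the empty field, which is why a missing entry in those formats cannot be located by position alone.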
The 2014/12 version of Dataplot added the capability of reading from
and writing to the system clipboard under Windows. Using the
"copy" function in Excel and then using the READ CLIPBOARD command
in Dataplot will in many cases be the easiest way to retrieve
data from Excel files. Enter HELP CLIPBOARD for details.
A few comments on file names.
If your file name contains a mixture of upper and lower case
characters, then you need to enter the case for the file name
correctly on the READ command.
Dataplot follows the United States convention where the decimal
point is the period ".". Some locales may use a different
character to denote the decimal point. In particular, some
countries use the comma ",".
To allow Dataplot to read files that use a character other than
the "." for the decimal point, enter the command
where <value> denotes the character that specifies the decimal
point.
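An alternative is to convert the file before reading it, so that it uses the period as the decimal point. The Python sketch below is a hypothetical preprocessing step and is not part of Dataplot. It assumes that fields are separated by spaces or semicolons, so that a comma between two digits can only be a decimal point. If commas also serve as field delimiters in your file, this simple rule does not apply.

```python
import re

# A comma with a digit on each side, e.g. the comma in "3,14".
DECIMAL_COMMA = re.compile(r"(?<=\d),(?=\d)")

def comma_decimal_to_period(line):
    """Replace decimal commas with periods in one line of text.

    Assumes fields are separated by spaces or semicolons, so a comma
    between two digits is a decimal point rather than a delimiter.
    """
    return DECIMAL_COMMA.sub(".", line)

def convert_file(src, dst):
    """Write a copy of `src` to `dst` with decimal commas converted."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(comma_decimal_to_period(line))

print(comma_decimal_to_period("3,14; 2,5; 10"))  # 3.14; 2.5; 10
```

For example, convert_file("comma.dat", "period.dat") (hypothetical file names) would write a copy that uses the period as the decimal point. Note that this rule would also change a comma used as a thousands separator (e.g., 1,000 becomes 1.000).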
Note this support is fairly limited. Specifically, it applies
to free-format reads (i.e., no SET READ FORMAT command has been
entered). In addition,
MISSING VALUES AND UNDEFINED NUMBERS
Some software programs will have special characters to denote
missing values or undefined values (e.g., the result of trying
to divide by 0).
In particular, Unix/Linux software often uses "nan" to denote an
undefined number. If Dataplot encounters an "nan" in a numeric
field, it will convert it to the Dataplot "missing value". The "nan"
search is not case sensitive (i.e., it will check for "NAN", "NaN",
etc.). You can specify what Dataplot will use for the missing value
by entering the command
where <value> is a numeric value.
Missing value flags are specific to individual programs. You can
specify a character string that denotes a missing value with the
command
where <value> is a string with 1 to 4 characters. If Dataplot
encounters <value> in a numeric field, it will convert it to the
Dataplot "missing value". The missing value string is not case
sensitive. You can specify what Dataplot will use for the missing
value by entering the command
where <value> is a numeric value.
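The combined effect of these settings can be sketched in Python. This is an illustration only and not Dataplot's implementation. The missing-value code (-999) and the flag string ("NA") are hypothetical choices. Both checks are case insensitive, as described above.

```python
# Hypothetical settings mirroring the options described above.
MISSING_VALUE = -999.0   # numeric value substituted for missing data
MISSING_STRING = "NA"    # user-specified flag of 1 to 4 characters

def parse_field(token, missing_value=MISSING_VALUE, missing_string=MISSING_STRING):
    """Convert one numeric field, mapping "nan" and the flag to missing_value.

    Both comparisons ignore case, so "NaN", "nan", "na", etc. all match.
    """
    t = token.strip().upper()
    if t == "NAN" or (missing_string and t == missing_string.upper()):
        return missing_value
    return float(token)

tokens = "1.5 NaN 2.0 na nan 3".split()
print([parse_field(t) for t in tokens])
# [1.5, -999.0, 2.0, -999.0, -999.0, 3.0]
```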
NIST is an agency of the U.S.
Commerce Department.
Date created: 07/07/2004