READ EXCEL

Name:

READ EXCEL Type:

Support Command Purpose:

Read variables (vectors) from an Excel file. Description:

Dataplot does not natively support reading or writing to Excel format files. The READ EXCEL command works as follows

Dataplot writes the name of the Excel file to line 1 of the file "dpst5f.dat". It writes the name of the Excel sheet to line 2 of "dpst5f.dat". The default sheet name is "Sheet1". To change the sheet name, enter the command
Dataplot then invokes a Python script to read the variables from the specified Excel file and then writes the variables to the file "dpst1f.dat".
The Python script, "read_excel.py", is located in the "scripts" subdirectory of the Dataplot auxiliary directory. This script uses the Pandas function "dataframe.read_excel" to read the data from the Excel file. It then uses the Pandas function "pandas.to_csv" to write the variables as a comma separated file (CSV) to the file "dpst1f.dat".
Dataplot reads the contents of the variables from the file "dpst1f.dat".

This command assumes that Python and the Python package Pandas are already installed on your local platform. Dataplot does not check if Python is installed and it does not initiate the Python installation if it is not already installed.

As Python is used by many popular applications on Linux platforms, most Linux platforms will already have Python installed. However, this is not the case for Windows and MacOS platforms.

If you need to install Python, there are a number of Python distributions (see https://wiki.python.org/moin/PythonDistributions). However, the most common are ActivePython from ActiveState and Anaconda from Continuum Analytics. Dataplot does not depend on a specific Python distribution and we make no recommendation for the preferred distribution.

This command will typically work well with Excel files that are clean rectangular data sets. If you have added graphs, formulas, cross-tabulations, and so on to the Excel sheet, then the READ EXCEL command is likely to be unsatisfactory. You can either copy the data portion of the spreadsheet to a new sheet or use one of the other methods for reading data from spreadsheets listed below.

Broman and Woo (see References below) wrote a useful article in The American Statistician that provides useful guidelines for creating Excel files that can be more easily read by non-Excel programs.

If you infrequently need to read Excel files and do not already have Python installed on your local platform, there are several additional methods for reading the data from Excel files. However, if you anticipate the need for frequently reading Excel files, then going through the Python installation is probably worth the effort.

In addition to READ EXCEL, the following methods can also be used to read data from Excel files.

Spreadsheets typically support writing data to ASCII files. For example, Excel can write formatted text (space delimited) (.PRN extension), comma delimited files (.CSV extension), or tab delimited files (.TXT extension). Enter HELP READ for details of this.
As with READ EXCEL, this option works best with clean rectangular data sets. If you do not have a clean Excel sheet, you will probably need to manually edit the ASCII file before trying to read it with Dataplot. However, it is a viable alternative to READ EXCEL when you do not have Python installed on your platform and do not wish to install it.
You can also use Copy and Paste. On Windows platforms, you can Copy the desired data to the clipboard and then use the READ CLIPBOARD command. Alternatively, you can paste the data into an ASCII editor (e.g., Notepad or Wordpad on Windows, vi or Emacs on Linux), save the file and then use the Dataplot READ command.
This option is particularly useful when your spreadsheet is not a clean rectangular data set.

Syntax:

Note that Pandas adds a "row index" variable as the first column in the CSV file. Be sure to add this variable to <variable list>.

Examples:

Note:

The Pandas pandas.to_csv function will typically add a row id variable to the ASCII file. Note:

Also, Dataplot will not search for the Excel file in the Dataplot auxiliary files directories as it does for the READ for ASCII files.

Note: The 2020/05 version of Dataplot now supports specifying the first and last rows of the Excel file to read with the following commands

Python/Pandas start row numbers with 0. Also, the first line read is assumed to contain the variable names. So if the first good line is row 3 in the Excel file, use

SET EXCEL START ROW 2

Currently you need to specify both the START ROW and the STOP row. Otherwise, Pandas will include all rows in the Excel sheet.

One use of this is to skip over header lines in the Excel file.

Note:

SET EXCEL HEADER NONE

To reset the default of a header line, enter the command

SET EXCEL HEADER ON

This command was added 2021/07. The "read_python.py" script was also updated at this time to support this option.

Note:

Excel (or some other spreadsheet program) does not need to be installed on your local platform. For example, you can create the Excel file on a Windows platform and then move the Excel file to a Linux platform. Note:

set python path c:\Users\heckert\AppData\Local\Continum\anaconda3\

There are several different Python distributions. The appropriate Python path will depend on the specific distribution you used to install Python and whether you choose to install it for a single user or for all users.

Note:

On Windows platforms, the "read_excel.py" script will be copied to the current directory. Default:

None Synonyms:

None Related Commands:

WRITE EXCEL	=	Write variables to an Excel file.
PYTHON	=	Run a user specified Python script from within Dataplot.
READ	=	Read variables, strings, parameters and matrices.

References:

O'Reilly

Chen (2018), "Pandas for Everyone: Python Data Analysis," O'Reilly.

Broman and Woo (2018), "Data Organization in Spreadsheets," The American Statistician, Vol. 72, No. 1, pp. 2-10.

Applications:

Data Input Implementation Date:

Program 1:

 
dimension 40 columns
. Path name for Windows
. read excel "C:\Program Files (x86)\NIST\DATAPLOT\LIB\DATA\FIBERS.XLSX" ...
.     rowid breakid xcent ycent xc1 yc1 xc2 yc2 xc3 yc3 xc4 yc4 length
. Path name for Linux
read excel /usr/local/lib/data/FIBERS.XLSX ...
     rowid breakid xcent ycent xc1 yc1 xc2 yc2 xc3 yc3 xc4 yc4 length
set write decimals 3
print breakid xcent length

---------------------------------------------
        BREAKID          XCENT         LENGTH
---------------------------------------------
          1.000       3924.447          0.000
          2.000       7230.652       3260.000
          3.000       7825.889        556.000
          4.000       9600.147       1728.000
          5.000      11662.307       2013.000
          6.000      12588.740        880.000
          7.000      13559.739        927.000
          8.000      14799.128       1194.000
          9.000      15984.171       1141.000
         10.000      16951.411        922.000
         11.000      19179.848       2177.000
         12.000      20234.844       1009.000
         13.000      21569.521       1287.000
         14.000      22745.419       1133.000
         15.000      23840.270       1052.000
         16.000      24850.845        962.000
         17.000      25998.789       1103.000
         18.000      26910.859        867.000
         19.000      28470.836       1517.000
         20.000      29140.799        623.000
         21.000      30514.016       1327.000
         22.000      31554.430        995.000
         23.000      32368.668        766.000
         24.000      33224.684        808.000
         25.000      34101.653        829.000
         26.000      35314.130       1165.000
         27.000      36619.810       1260.000
         28.000      37353.368        684.000
         29.000      38151.635        747.000
         30.000      39244.924       1048.000
         31.000      39919.396        629.000

Program 2:

 
. Step 1: READ EXCEL does not search Dataplot sub-directories, so need
.         to pre-pend the appropriate sub-directory to Excel file name.
.
probe path
let string ipath = ^probestr
probe iopsy1
if probeval = 2
   let string islash = \
else
   let string islash = /
end of if
let string subdir = data
let ipath = string concatenate ipath subdir islash
.
let string fname = TEST.XLSX
let fname = string concatenate ipath fname
.
. Step 2: Read the Excel file
.
set excel start row 1
set excel stop  row 77
read excel ^fname rowid labid y1 y2
.
. Step 3: Print the first few lines of the Excel file
.
set write decimals 2
print rowid labid y1 y2 for i = 1 1 10

 ------------------------------------------------------------
           ROWID          LABID             Y1             Y2
 ------------------------------------------------------------
            0.00      101014.00          10.25           5.92
            1.00      101031.00          16.62           7.00
            2.00      101032.00          13.75           5.75
            3.00      101048.00          10.25           3.57
            4.00      101125.00           7.10           4.37
            5.00      101130.00          10.00           4.60
            6.00      101143.00          15.13           5.80
            7.00      101147.00           8.35           3.80
            8.00      101151.00           7.37           5.00
            9.00      101165.00           8.15           5.10