SED navigation bar go to SED home page go to Dataplot home page go to NIST home page SED Home Page SED Staff SED Projects SED Products and Publications Search SED Pages
Dataplot Vol 1 Vol 2

READ EXCEL

Name:
    READ EXCEL
Type:
    Support Command
Purpose:
    Read variables (vectors) from an Excel file.
Description:
    Spreadsheet programs are a popular method for saving data. Most spreadsheet programs support the Microsoft Excel file format.

    Dataplot does not natively support reading or writing to Excel format files. The READ EXCEL command works as follows

    1. Dataplot writes the name of the Excel file to line 1 of the file "dpst5f.dat". It writes the name of the Excel sheet to line 2 of "dpst5f.dat". The default sheet name is "Sheet1". To change the sheet name, enter the command

        SET EXCEL SHEET <sheet-name>

    2. Dataplot then invokes a Python script to read the variables from the specified Excel file and then writes the variables to the file "dpst1f.dat".

      The Python script, "read_excel.py", is located in the "scripts" subdirectory of the Dataplot auxiliary directory. This script uses the Pandas function "dataframe.read_excel" to read the data from the Excel file. It then uses the Pandas function "pandas.to_csv" to write the variables as a comma separated file (CSV) to the file "dpst1f.dat".

    3. Dataplot reads the contents of the variables from the file "dpst1f.dat".

    This command assumes that Python and the Python package Pandas are already installed on your local platform. Dataplot does not check if Python is installed and it does not initiate the Python installation if it is not already installed.

    As Python is used by many popular applications on Linux platforms, most Linux platforms will already have Python installed. However, this is not the case for Windows and MacOS platforms.

    If you need to install Python, there are a number of Python distributions (see https://wiki.python.org/moin/PythonDistributions). However, the most common are ActivePython from ActiveState and Anaconda from Continuum Analytics. Dataplot does not depend on a specific Python distribution and we make no recommendation for the preferred distribution.

    This command will typically work well with Excel files that are clean rectangular data sets. If you have added graphs, formulas, cross-tabulations, and so on to the Excel sheet, then the READ EXCEL command is likely to be unsatisfactory. You can either copy the data portion of the spreadsheet to a new sheet or use one of the other methods for reading data from spreadsheets listed below.

    Broman and Woo (see References below) wrote a useful article in The American Statistician that provides useful guidelines for creating Excel files that can be more easily read by non-Excel programs.

    If you infrequently need to read Excel files and do not already have Python installed on your local platform, there are several additional methods for reading the data from Excel files. However, if you anticipate the need for frequently reading Excel files, then going through the Python installation is probably worth the effort.

    In addition to READ EXCEL, the following methods can also be used to read data from Excel files.

    1. Spreadsheets typically support writing data to ASCII files. For example, Excel can write formatted text (space delimited) (.PRN extension), comma delimited files (.CSV extension), or tab delimited files (.TXT extension). Enter HELP READ for details of this.

      As with READ EXCEL, this option works best with clean rectangular data sets. If you do not have a clean Excel sheet, you will probably need to manually edit the ASCII file before trying to read it with Dataplot. However, it is a viable alternative to READ EXCEL when you do not have Python installed on your platform and do not wish to install it.

    2. You can also use Copy and Paste. On Windows platforms, you can Copy the desired data to the clipboard and then use the READ CLIPBOARD command. Alternatively, you can paste the data into an ASCII editor (e.g., Notepad or Wordpad on Windows, vi or Emacs on Linux), save the file and then use the Dataplot READ command.

      This option is particularly useful when your spreadsheet is not a clean rectangular data set.

Syntax:
    READ EXCEL <fname> <variable list>
                            <SUBSET/EXCEPT/FOR qualification>
    where <fname> is the name of the Excel file;
                <variable list> is the list of variables to read;
    and where the <SUBSET/EXCEPT/FOR qualification> is optional.

    Note that Pandas adds a "row index" variable as the first column in the CSV file. Be sure to add this variable to <variable list>.

Examples:
    READ EXCEL FILE.XLSX Y1 Y2 Y3 X
Note:
    The Pandas pandas.to_csv function will typically add a row id variable to the ASCII file.
Note:
    The python script will currently read the entire spreadsheet. Additional commands to specify columns and rows in the spreadsheet to read are under development but not yet available.

    Also, Dataplot will not search for the Excel file in the Dataplot auxiliary files directories as it does for the READ for ASCII files.

    Note: The 2020/05 version of Dataplot now supports specifying the first and last rows of the Excel file to read with the following commands

      SET EXCEL START ROW <value>
      SET EXCEL STOP ROW <value>

    Python/Pandas start row numbers with 0. Also, the first line read is assumed to contain the variable names. So if the first good line is row 3 in the Excel file, use

      SET EXCEL START ROW 2

    Currently you need to specify both the START ROW and the STOP row. Otherwise, Pandas will include all rows in the Excel sheet.

    One use of this is to skip over header lines in the Excel file.

Note:
    Pandas by default assumes that the first row of the Excel file is a header line containing the variable names. If your Excel file does not contain a header line, enter the command

      SET EXCEL HEADER NONE

    To reset the default of a header line, enter the command

      SET EXCEL HEADER ON

    This command was added 2021/07. The "read_python.py" script was also updated at this time to support this option.

Note:
    Excel (or some other spreadsheet program) does not need to be installed on your local platform. For example, you can create the Excel file on a Windows platform and then move the Excel file to a Linux platform.
Note:
    If Python is not installed on your default path, you can specify it using the SET PYTHON PATH command. For example, the following is for the Anaconda installation of Python 3 under Windows (where Anaconda is installed for the single user heckert)

      set python path c:\Users\heckert\AppData\Local\Continum\anaconda3\

    There are several different Python distributions. The appropriate Python path will depend on the specific distribution you used to install Python and whether you choose to install it for a single user or for all users.

Note:
    On Windows platforms, the "read_excel.py" script will be copied to the current directory.
Default:
    None
Synonyms:
    None
Related Commands:
    WRITE EXCEL = Write variables to an Excel file.
    PYTHON = Run a user specified Python script from within Dataplot.
    READ = Read variables, strings, parameters and matrices.
References:
    McKinney (2018), "Python for Data Analysis," Second Edition, O'Reilly.

    Chen (2018), "Pandas for Everyone: Python Data Analysis," O'Reilly.

    Broman and Woo (2018), "Data Organization in Spreadsheets," The American Statistician, Vol. 72, No. 1, pp. 2-10.

Applications:
    Data Input
Implementation Date:
    2020/02
    2020/05: Added SET EXCEL START ROW and SET EXCEL STOP ROW
    2021/07: Added SET EXCEL HEADER
Program 1:
     
    dimension 40 columns
    . Path name for Windows
    . read excel "C:\Program Files (x86)\NIST\DATAPLOT\LIB\DATA\FIBERS.XLSX" ...
    .     rowid breakid xcent ycent xc1 yc1 xc2 yc2 xc3 yc3 xc4 yc4 length
    . Path name for Linux
    read excel /usr/local/lib/data/FIBERS.XLSX ...
         rowid breakid xcent ycent xc1 yc1 xc2 yc2 xc3 yc3 xc4 yc4 length
    set write decimals 3
    print breakid xcent length
        
    The following output is generated
    ---------------------------------------------
            BREAKID          XCENT         LENGTH
    ---------------------------------------------
              1.000       3924.447          0.000
              2.000       7230.652       3260.000
              3.000       7825.889        556.000
              4.000       9600.147       1728.000
              5.000      11662.307       2013.000
              6.000      12588.740        880.000
              7.000      13559.739        927.000
              8.000      14799.128       1194.000
              9.000      15984.171       1141.000
             10.000      16951.411        922.000
             11.000      19179.848       2177.000
             12.000      20234.844       1009.000
             13.000      21569.521       1287.000
             14.000      22745.419       1133.000
             15.000      23840.270       1052.000
             16.000      24850.845        962.000
             17.000      25998.789       1103.000
             18.000      26910.859        867.000
             19.000      28470.836       1517.000
             20.000      29140.799        623.000
             21.000      30514.016       1327.000
             22.000      31554.430        995.000
             23.000      32368.668        766.000
             24.000      33224.684        808.000
             25.000      34101.653        829.000
             26.000      35314.130       1165.000
             27.000      36619.810       1260.000
             28.000      37353.368        684.000
             29.000      38151.635        747.000
             30.000      39244.924       1048.000
             31.000      39919.396        629.000
        
Program 2:
     
    . Step 1: READ EXCEL does not search Dataplot sub-directories, so need
    .         to pre-pend the appropriate sub-directory to Excel file name.
    .
    probe path
    let string ipath = ^probestr
    probe iopsy1
    if probeval = 2
       let string islash = \
    else
       let string islash = /
    end of if
    let string subdir = data
    let ipath = string concatenate ipath subdir islash
    .
    let string fname = TEST.XLSX
    let fname = string concatenate ipath fname
    .
    . Step 2: Read the Excel file
    .
    set excel start row 1
    set excel stop  row 77
    read excel ^fname rowid labid y1 y2
    .
    . Step 3: Print the first few lines of the Excel file
    .
    set write decimals 2
    print rowid labid y1 y2 for i = 1 1 10
        
     ------------------------------------------------------------
               ROWID          LABID             Y1             Y2
     ------------------------------------------------------------
                0.00      101014.00          10.25           5.92
                1.00      101031.00          16.62           7.00
                2.00      101032.00          13.75           5.75
                3.00      101048.00          10.25           3.57
                4.00      101125.00           7.10           4.37
                5.00      101130.00          10.00           4.60
                6.00      101143.00          15.13           5.80
                7.00      101147.00           8.35           3.80
                8.00      101151.00           7.37           5.00
                9.00      101165.00           8.15           5.10
      
        

Privacy Policy/Security Notice
Disclaimer | FOIA

NIST is an agency of the U.S. Commerce Department.

Date created: 02/20/2020
Last updated: 07/12/2021

Please email comments on this WWW page to alan.heckert@nist.gov.