Data reader

This page documents an experimental feature. Please tell us if you experience any problems.

The Data Reader is an external library function that lets the user read data from tabular files (see #File formats) and incorporate it as variables in ASCEND models. The syntax lets the user specify the file location and format, and select an arbitrary number of variables to import from the data as functions of a single independent variable, such as time.

The Data Reader was initially developed to read weather data files for use in solar energy simulations, in a similar way to that provided by TRNSYS (see Other modelling tools). More recently, broader applications for the code have been found in reading and interpolating values from tabulated data files, such as those found in the appendices of thermodynamics, fluid mechanics, and heat transfer textbooks.

File formats

At this point, the following file formats are supported:

  • TMY3 weather data files[1] from NREL
  • TMY2 weather data files[2] from NREL
  • ACDB Australian Climate Data Bank files (see models/johnpye/datareader/acdb.c for more information).
  • CSV (comma-separated values, as exported from Excel and OpenOffice and many other programs). There are some important limitations on what can be done using CSV files, see #CSV format below for details.
  • E/E (EnergyPlus/ESP-r format, aka 'EPW' format for weather data)

Interpolation methods

For any given calculation, it is possible that the independent variable may be situated between data points. For example, if the data in the file has been sampled at hourly intervals, and the time step is one minute, then we will need to be able to evaluate the data at many more points than originally given in the input data.

The Data reader supports the following interpolation methods on the loaded data:

  • Linear interpolation
  • Constrained cubic-spline interpolation[3]

By default, the cubic spline method is used because of its overall smoothness, especially in the vicinity of a data sample. See the Parameters section below for how to select a method.
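To illustrate the two methods, the following Python sketch interpolates a small table at an arbitrary point. It is not the Data Reader's C implementation: all function names are hypothetical, and the slope rule is the one described by Kruger[3] as we understand it (harmonic mean of the two adjacent slopes, zero where the slope changes sign), evaluated here in cubic Hermite form.

```python
from bisect import bisect_right


def _segment(xs, x):
    """Index i with xs[i] <= x <= xs[i + 1]; xs needs at least two points."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the tabulated range")
    return min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)


def interp_linear(xs, ys, x):
    """Straight line between the two samples that bracket x."""
    i = _segment(xs, x)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def _constrained_slopes(xs, ys):
    """First derivative at every sample, following the constrained rule."""
    n = len(xs)
    secant = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(n - 1)]
    if n == 2:
        return [secant[0], secant[0]]
    m = [0.0] * n
    for i in range(1, n - 1):
        left, right = secant[i - 1], secant[i]
        if left * right > 0:  # same sign: harmonic mean, otherwise stay at 0
            m[i] = 2.0 * left * right / (left + right)
    # End-point rule from the same reference.
    m[0] = 1.5 * secant[0] - 0.5 * m[1]
    m[-1] = 1.5 * secant[-1] - 0.5 * m[-2]
    return m


def interp_constrained_cubic(xs, ys, x):
    """Cubic through the bracketing samples with the constrained slopes."""
    i = _segment(xs, x)
    m = _constrained_slopes(xs, ys)
    h = xs[i + 1] - xs[i]
    t = (x - xs[i]) / h
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * ys[i] + h10 * h * m[i] + h01 * ys[i + 1] + h11 * h * m[i + 1]


# Made-up hourly samples, evaluated halfway between the second and third.
hours = [0.0, 1.0, 2.0, 3.0]
values = [0.0, 0.0, 1.0, 1.0]
print(interp_linear(hours, values, 1.5))
print(interp_constrained_cubic(hours, values, 1.5))
```

Because the slope is forced to zero wherever the data changes direction or goes flat, the cubic in this sketch stays between neighbouring samples instead of overshooting them, which is the property that makes the constrained variant attractive for physical data.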

Syntax and Usage

A short description of the Data Reader usage will be presented using an existing example in the file jose:models/johnpye/datareader/testtmy.a4c.

Importing the Data Reader Module

IMPORT "johnpye/datareader/datareader";

In the case of the example this is done in line 4.

Configuring the Input parameters of the file

MODEL drconf;
   filename IS_A symbol_constant;
   filename :== 'johnpye/datareader/23161.tm2';

   format IS_A symbol_constant;
   format :== 'TMY2';
   parameters IS_A symbol_constant;

   parameters :== '2:linear,2:cubic,2:default';
END drconf;

The variables in this model have to be named exactly 'filename', 'format', and 'parameters' for the Data Reader to be able to pick them up. In this example it is shown that assigning a value to each one of the symbol constants at this stage is optional.

Filename

filename :== 'johnpye/datareader/23161.tm2';

In this example, the path is declared as a relative path to the ASCEND model library. If the data file does not reside within this directory, it is also possible to declare an absolute file path.

Format

The format of the data file is declared by assigning the format name (e.g. 'TMY2', 'ACDB') to the format variable of the drconf model. The permitted formats are listed in the #define FMTS line of models/johnpye/datareader/dr.c.

Parameters

parameters :== '1,7,9';

By listing these column numbers, separated by commas, the user is specifying that the first model variable is the 1st column of the data file, the second model variable is the 7th and the third model variable is the 9th column. The model variables are as per the section Declaring Variables.

If the columns in the parameters string are in a different order, or even repeated, that is the way that they will be assigned to the variables. As per the TMY2 example, the same column 2 is assigned to three different declared variables.

If the user declares fewer column assignments than there are variables, the remaining column assignments will be filled with default numbering starting from the first column.

parameters :== '2:linear,2:cubic,2:default';

In this example, the user requires that the first variable is the second column of the data file, interpolated linearly; the second variable is the same column, interpolated with the constrained cubic spline algorithm; and the third variable is again the second column, using the default interpolation algorithm. By default, the cubic algorithm is used, and this is indicated either by specifying 'default' or by not specifying an interpolation algorithm, as in the previous example.
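The structure of the parameters string can be made concrete with a short Python sketch. This is a hypothetical parser written for illustration, not the one in the Data Reader source; it assumes that column numbers start at 1 and that 'linear', 'cubic' and 'default' are the only accepted method names. It does not address which column holds the independent variable.

```python
DEFAULT_METHOD = "cubic"  # this page states that cubic is the default
KNOWN_METHODS = {"linear", "cubic", "default"}  # assumed set of method names


def parse_parameters(spec):
    """Split a parameters string into (column, method) pairs, in order."""
    assignments = []
    for field in spec.split(","):
        column_text, _, method = field.strip().partition(":")
        column = int(column_text)
        if column < 1:
            raise ValueError(f"column numbers start at 1, got {column}")
        method = method.strip() or "default"  # no method given means default
        if method not in KNOWN_METHODS:
            raise ValueError(f"unknown interpolation method {method!r}")
        if method == "default":
            method = DEFAULT_METHOD
        assignments.append((column, method))
    return assignments


print(parse_parameters("2:linear,2:cubic,2:default"))
print(parse_parameters("1,7,9"))
```

The position of each pair in the returned list corresponds to the position of the OUTPUT variable it feeds, so repeated or reordered columns fall out naturally.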

TODO document how the column containing the independent variable is specified, particularly for the CSV format. Can columns other than the first one be used?

Declaring Inputs and Output Variables

tmydata IS_A drconf;

This allows the main model to pass all the necessary parameters to the Data Reader. Every time ASCEND is required to solve the model, it can retrieve the values of these variables first, aiding the solving process of the other model variables.

my_solar_data:datareader(
   t : INPUT;
   Gbnl,Gbnc,Gbns :OUTPUT;
   tmydata : DATA
);

For this example to work, 't', 'Gbnl', 'Gbnc' and 'Gbns' must have been previously declared as variables of the main model, with statements such as 't IS_A time;'. In the example, 't' has been declared as an input, 'Gbnl', 'Gbnc' and 'Gbns' have been declared as outputs, and 'tmydata', the instance of drconf in the main model, has been declared as containing additional data.

The total number of OUTPUT variables in this declaration must not exceed the maximum number of columns available in the data file. In most cases, there will be an injective (i.e. one-to-one) relationship between data columns and model variables, and this prevents declaring more variables than columns.

Examples

See:

Implementation

The Data Reader works in ASCEND using the external relation API. This means that the user is essentially telling ASCEND that there is a relation which can evaluate a set of outputs from its inputs (currently just one input is supported). In practice, the input value is the 'independent' variable from the data file, and the output values are selected from the other data columns in the data file.

Data files are read using a file-format-specific API that includes (a) reading the headers, (b) reading the data rows and (c) testing for end-of-file. This API is kept as small as possible to allow new file formats to be added easily. Currently we support CSV data as well as two weather data file formats.
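The shape of such a three-operation reader can be sketched in Python as follows. The class, its method names and the generic loading loop are all hypothetical and only illustrate the idea; the real API is written in C and its function names are not reproduced here. The sketch assumes a comma-separated file with one header line.

```python
import io


class CsvTableReader:
    """Hypothetical format-specific reader with the three operations."""

    def __init__(self, stream):
        self._stream = stream
        self._next_line = None  # one-line look-ahead used by at_eof()

    def read_header(self):
        """(a) Consume the header line and return the column names."""
        return [name.strip() for name in self._stream.readline().split(",")]

    def read_row(self):
        """(b) Return the next data row as a list of floats."""
        if self.at_eof():
            raise EOFError("no more data rows")
        line, self._next_line = self._next_line, None
        return [float(value) for value in line.split(",")]

    def at_eof(self):
        """(c) True once the stream has no further line to offer."""
        if self._next_line is None:
            self._next_line = self._stream.readline()
        return self._next_line == ""


def load_table(reader):
    """Format-independent loop: it only relies on the three operations."""
    header = reader.read_header()
    rows = []
    while not reader.at_eof():
        rows.append(reader.read_row())
    return header, rows


sample = io.StringIO("t,T\n0,280.0\n3600,281.5\n")
print(load_table(CsvTableReader(sample)))
```

Supporting another file format in this arrangement would mean writing a second class with the same three methods, leaving load_table and the interpolation code untouched.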

Interpolation is performed using either linear or constrained cubic spline interpolation, which is performed in code independent from the input file format.

Further Work

A significant concern with the current data reader as applied to solar radiation data is that data files such as those in TMY3 format normally give total solar radiation over the preceding hour, together with instantaneous temperature, pressure, etc. If we interpolate from that data, for example using constrained cubic interpolation, there is no guarantee that the integral of the solar energy will equal the value reported in the weather data. One solution to this is offered by Rymes and Myers[4], which may be suitable for implementation in our data reader.
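A small numerical example shows the mismatch. The hourly totals below are made up, and the sketch takes the naive reading in which each total is treated as an instantaneous flux at the end of its hour and the points are joined linearly; it does not implement the Rymes and Myers method.

```python
def trapezoid(ts, ys):
    """Integral of the piecewise-linear curve through (ts, ys)."""
    return sum(
        0.5 * (ys[i] + ys[i + 1]) * (ts[i + 1] - ts[i])
        for i in range(len(ts) - 1)
    )


# Made-up hourly totals in Wh/m2; entry k covers the hour ending at hour_end[k].
hour_end = [1.0, 2.0, 3.0, 4.0, 5.0]
hourly_totals = [0.0, 100.0, 300.0, 100.0, 0.0]

# Naive reading: use the totals as flux values in W/m2 at the end of each hour.
# Energy delivered by the interpolated curve during the hour from t = 2 to t = 3:
interpolated_energy = trapezoid(hour_end[1:3], hourly_totals[1:3])
reported_energy = hourly_totals[2]

print(interpolated_energy, reported_energy)  # prints "200.0 300.0"
```

In this example the interpolated curve delivers only two thirds of the energy that the data file reports for that hour, which is the kind of discrepancy a mean-preserving interpolation is meant to remove.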

We plan to take this further by connecting the data reader with a sun position algorithm, because it is important when interpolating values of solar flux to take into account the times at which the sun rises and sets (see calculation of sun position). This algorithm should become available as an interpolation option.

This code is still under development. In particular, use of this code exposed a limitation in IDA when 'integrating' models that don't have any derivatives in them (see User:Leon about this).

Also under current development and possibly of interest: http://mesor.org/

CSV format

In the 0.9.7 release of ASCEND, the following limits applied to imported CSV files:

  • The first column must contain your independent variable, in base SI units. For example, this might be time in seconds, or temperature in Kelvin.
  • All columns must be in base SI units, unless you're manually adding scaling factors as relations in your model.
  • Each line in the file is currently limited to 9999 characters, including commas.
  • Values in the first column must be sorted and monotonically increasing.
  • Cubic interpolation requires data values in the first column to be uniformly spaced.
  • Data rows may contain only numerical data, no other 'words'.
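Several of these limits can be checked before a file is handed to ASCEND. The Python sketch below is a hypothetical pre-flight check, not part of the Data Reader; it assumes that 'monotonically increasing' means strictly increasing, that the 9999-character limit excludes the line terminator, and it cannot verify units.

```python
import math

MAX_LINE_LENGTH = 9999  # limit quoted above; terminator assumed not counted


def check_csv_lines(lines, require_uniform=True, rel_tol=1e-9):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    xs = []  # first-column values of the rows that parsed cleanly
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if len(text) > MAX_LINE_LENGTH:
            problems.append(f"line {number}: longer than {MAX_LINE_LENGTH} characters")
        try:
            values = [float(field) for field in text.split(",")]
        except ValueError:
            problems.append(f"line {number}: non-numeric field")
            continue
        if xs and values[0] <= xs[-1]:
            problems.append(f"line {number}: first column is not increasing")
        xs.append(values[0])
    # Uniform spacing matters only when cubic interpolation will be used.
    if require_uniform and len(xs) > 2:
        step = xs[1] - xs[0]
        for i in range(2, len(xs)):
            if not math.isclose(xs[i] - xs[i - 1], step, rel_tol=rel_tol):
                problems.append("first column is not uniformly spaced")
                break
    return problems


print(check_csv_lines(["0,280.0\n", "3600,281.5\n", "7200,283.0\n"]))
print(check_csv_lines(["0,1\n", "1,2\n", "3,4\n"]))
```

Passing require_uniform=False skips the spacing check for files that will only be interpolated linearly.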

Work is currently active (Jun 2010) on improving support for CSV files. Active branch is csv2:models/johnpye/datareader. So far, we have added:

  • Support for data files with comment lines and header lines
  • Recognition of, and scaling according to, units of measurement found in header lines of format "COLUMN NAME / [units of measurement]".
  • Allow comment lines to be inserted in the data
  • Tolerate blank lines in the data
  • Tolerate extra data columns added to the right of the main data
  • Data values may be surrounded by quotes, but will still be interpreted as numerical (useful for scanned data tables)

Further improvements for the CSV format that we'd like to achieve include:

  • Support for delimiters such as tab, semicolon, etc. (propose to allow an optional 'delimiter' field in the DATA instance, which somehow the datareader would allow the format-specific CSV code to query)
  • Support for manually-specified scaling factors using DATA instance (maybe via a 'fields' element in the DATA instance?)
  • Permit comments added at end of line (parsing of lines would just stop if a # found outside a quoted string)
  • Permit strings to be present within numerical data (parsing a string would result in a 0 in the data)
  • Permit manually-specified data range within a file (##)
  • Apply scaling on independent variable column? (check this... it might already be happening?)
  • Allow independent column to be other than the first column (relates to ## above)
  • Permit reading of Celsius degree data (could we use °C for true temperatures and °Cd for temperature differences?)
  • Permit column names from header fields to be used instead of header numbers
  • Refactor datareader code for stand-alone use, not depending on libascend (requires locally-maintained error/output hooks, memory management functions, and move ospath stuff from dr.c into datareader.c)
  • Permit missing data values to be interpolated?
  • Make sure we're dealing with unicode/ISO88591 etc correctly?
  • Make calls to datareader_set_parameters before datareader_init, so that when reading the datafile, we can (optionally) avoid storing columns that we don't actually need. Then, if parameters are not set, we just use the number of output columns to form a default column-to-output mapping.
  • Test suite to assure correct detection of errors from various mal-formed CSV files (partially completed)

E/E format

The US DOE developers of the building simulation package EnergyPlus have released a wide range of weather data files for locations around the world in a format called "E/E" (short for EnergyPlus/ESP-r format) or "EPW" (short for EnergyPlus Weather). Details are here as well as in PDF files embedded within the EnergyPlus installer package (a gratis download, ~70 MB).

The EnergyPlus/ESP-r text-based data format, also known as 'EPW' format (=EnergyPlus Weather?), is a weather data format that also allows some metadata associated with building heating/cooling practices to be embedded within it. It also includes information about data sources, uncertainty, etc, and some markup identifying 'typical' and 'extreme' periods within the data set. It can handle data with varying time-steps, but becomes a little ambiguous when TMY data is provided, because data for one year can follow immediately after data for another year, with the result that it is unclear what to do at the time/temperature discontinuities.

Initial support for EnergyPlus weather files is now provided in the ASCEND trunk. It is not thoroughly tested yet, so please give us feedback. Some already-known issues include:

  • can't read data files with other than 8760 rows (need to count rows first)
  • there are likely to be problems with data files that represent multiple years of 'real' data because year-data needs to be ignored for RMY/TMY-derived data.
  • data for solar irradiation is provided in the data file as Wh/m² for the time period of interest, but converting that to an instantaneous reading requires knowing the timestep length, which in turn requires us to look back to the time of the previous timestep (which could be in another year for TMY data!), so we haven't yet worked out a solution for this.

See models/johnpye/datareader/ee.c and models/johnpye/datareader/energyplus.a4c, bug 512.

References

  1. NREL, 2008, National Solar Radiation Data Base 1991-2005 Update: Typical Meteorological Year 3, http://rredc.nrel.gov/solar/old_data/nsrdb/1991-2005/, accessed 18 Apr 2012.
  2. NREL, 1995, User's Manual for TMY2s: Typical Meteorological Years, http://rredc.nrel.gov/solar/old_data/nsrdb/1961-1990/tmy2/, accessed 18 Apr 2012.
  3. C J C Kruger, 2002, Constrained Cubic Spline Interpolation for Chemical Engineering Applications, http://www.korf.co.uk/spline.pdf, accessed 17 Apr 2012.
  4. M D Rymes and D R Myers, 2001, Mean preserving algorithm for smoothly interpolating averaged data, Solar Energy 71 pp225–231, doi:10.1016/S0038-092X(01)00052-4.

See also