Data reader: Difference between revisions

Revision as of 06:34, 8 June 2010

This page documents an experimental feature. Please tell us if you experience any problems.

The Data Reader is an external library function that allows the user to read data from tabular files (see File formats) and incorporate them as variables of ASCEND models. The syntax lets the user specify the file location and format, and select an arbitrary number of variables to be imported from the data as functions of an independent variable, such as time. The Data Reader was originally intended for reading weather data, so that simulations of solar energy systems could be performed in a similar way to that provided by TRNSYS (see Other modelling tools).

File formats

At this point, the following file formats are supported:

TMY2 weather data files (as described by NREL at http://rredc.nrel.gov/solar/old_data/nsrdb/tmy2/)
ACDB Australian Climate Data Bank files (see models/johnpye/datareader/acdb.c for more information).
CSV (comma-separated values, as exported from Excel and OpenOffice and many other programs)

There are some limitations on what can be done using CSV files; see CSV format below for details.

Interpolation Methods

During a simulation, the independent variable will often fall between data points, for example when the data in the file were sampled at hourly intervals and the simulation time step is one minute.

The Data Reader external function supports the following interpolation methods to fill these gaps:

Linear interpolation
Constrained cubic spline interpolation

By default, the cubic spline method is used, due to its overall smoothness, especially in the vicinity of a data sample. See the Parameters section for details of how to select a method.
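The hourly-data, one-minute-step case above can be illustrated with a minimal Python sketch of linear interpolation. This is not the Data Reader's own code: the function name `interpolate_linear`, the sample values and the choice to raise an error outside the sampled range are all assumptions made for illustration.

```python
from bisect import bisect_right


def interpolate_linear(xs, ys, x):
    """Linearly interpolate y at x from samples sorted by increasing xs."""
    if not xs or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    if x < xs[0] or x > xs[-1]:
        # Assumption: no extrapolation beyond the sampled range.
        raise ValueError("x lies outside the sampled range")
    # Index of the last sample whose x-value is <= x
    i = bisect_right(xs, x) - 1
    if i == len(xs) - 1:
        return ys[-1]  # x coincides with the final sample
    # Fraction of the way from sample i to sample i + 1
    frac = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + frac * (ys[i + 1] - ys[i])


# Hourly samples (time in seconds), queried one minute into the first hour
hours = [0.0, 3600.0, 7200.0]
values = [100.0, 160.0, 130.0]
value_at_one_minute = interpolate_linear(hours, values, 60.0)
```

The linear result has a kink at every sample, which is the lack of smoothness that the cubic spline option is meant to avoid.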

Syntax and Usage

The usage of the Data Reader is described below using an existing example, the file jose:models/johnpye/datareader/testtmy.a4c.

Importing the Data Reader Module

IMPORT "johnpye/datareader/datareader";

In the example file, this is done on line 4.

Configuring the Input parameters of the file

MODEL drconf;
   filename IS_A symbol_constant;
   filename :== 'johnpye/datareader/23161.tm2';

   format IS_A symbol_constant;
   format :== 'TMY2';

   parameters IS_A symbol_constant;
   parameters :== '2:linear,2:cubic,2:default';
END drconf;

The variables in this model must be named exactly 'filename', 'format' and 'parameters' for the Data Reader to be able to pick them up. The example assigns a value to each of the symbol constants at this stage; doing so here is optional.

Filename

filename :== 'johnpye/datareader/23161.tm2';

In this example, the path is given relative to the ASCEND model library. If the data file does not reside within that directory, an absolute file path can be given instead.

Format

The format in which the file has been written is declared by assigning the format name (e.g. 'TMY2', 'ACDB') to the format variable of the drconf model.

Parameters

parameters :== '1,7,9';

By listing these column numbers, separated by commas, the user is specifying that the first model variable is the 1st column of the data file, the second model variable is the 7th and the third model variable is the 9th column. The model variables are as per the section Declaring Variables.

If the columns in the parameters string are in a different order, or even repeated, that is the way that they will be assigned to the variables. As per the TMY2 example, the same column 2 is assigned to three different declared variables.

If the user declares fewer column assignments than there are variables, the remaining column assignments will be filled with default numbering, starting from the first column.

parameters :== '2:linear,2:cubic,2:default';

In this example, the user requires that the first variable is the second column of the data file, interpolated linearly; the second variable is the second data column, using the constrained cubic spline algorithm; and the third variable is again the second data column, using the default interpolation algorithm. By default, the cubic algorithm is used, and this is indicated either by specifying 'default' or by not specifying an interpolation algorithm, as in the previous example.
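How such a parameters string breaks down can be sketched in a few lines of Python. This only illustrates the syntax described above and is not the Data Reader's parser: the function name `parse_parameters` is hypothetical, and the sketch handles explicitly listed columns only (it does not attempt the default column numbering used when fewer columns than variables are given).

```python
def parse_parameters(spec):
    """Split a string such as '2:linear,2:cubic,2:default' into
    (column, method) pairs, one pair per output variable, in order."""
    known_methods = {"linear", "cubic", "default"}
    pairs = []
    for item in spec.split(","):
        column_text, _, method = item.strip().partition(":")
        method = method.strip() or "default"  # no method given
        if method not in known_methods:
            raise ValueError(f"unknown interpolation method: {method!r}")
        if method == "default":
            method = "cubic"  # the documented default algorithm
        pairs.append((int(column_text), method))
    return pairs


tmy_assignments = parse_parameters("2:linear,2:cubic,2:default")
plain_assignments = parse_parameters("1,7,9")
```

Under these assumptions, the first string maps three output variables to column 2 with linear, cubic and cubic interpolation respectively, and the second maps three variables to columns 1, 7 and 9, all with the default cubic method.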

Declaring Inputs and Output Variables

tmydata IS_A drconf;

This allows the main model to pass all the necessary parameters to the Data Reader. Every time ASCEND is required to solve the model, it can retrieve the values of these variables first, aiding the solving process of the other model variables.

my_solar_data:datareader(
   t : INPUT;
   Gbnl, Gbnc, Gbns : OUTPUT;
   tmydata : DATA
);

For this example to work, 't', 'Gbnl', 'Gbnc' and 'Gbns' must already have been declared as variables of the main model, with statements such as 't IS_A time;'. In the example, 't' is declared as the input and 'Gbnl', 'Gbnc' and 'Gbns' as outputs, while 'tmydata', the drconf instance in the main model, is declared as containing additional data.

The total number of OUTPUT variables in this declaration must not exceed the number of columns available in the data file. In most cases, there will be an injective (i.e. one-to-one) relationship between data columns and model variables, which prevents declaring more variables than columns.

Examples

See:

models/johnpye/datareader/testcsv.a4c for a trivial example
models/johnpye/datareader/testairprops.a4c for a model that interpolates air transport properties read from a file

Implementation

The Data Reader is implemented in ASCEND using the external relation API. This means that the user is essentially telling ASCEND that there is a relation which can evaluate a set of outputs from a set of inputs (currently just one). In practice, the input value is the 'independent' variable from the data file, and the output values are selected from the other data columns in the data file.

Data files are read using a file-format-specific API that includes (a) reading the headers, (b) reading the data rows and (c) testing for end-of-file. This API is kept as small as possible to allow new file formats to be added easily. Currently we support CSV data as well as two weather data file formats.

Interpolation is performed using either linear or constrained cubic spline interpolation, which is performed in code independent from the input file format.
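The separation between format-specific reading and format-independent processing can be sketched as follows. This is a Python illustration of the idea only, not the library's actual API: the class `CsvLikeReader`, its method names and the function `load_table` are hypothetical.

```python
import io


class CsvLikeReader:
    """Hypothetical format-specific reader exposing the three operations
    named above: read the header, read a data row, test for end-of-file."""

    def __init__(self, stream):
        self._lines = stream.read().splitlines()
        self._pos = 0

    def read_header(self):
        header = self._lines[self._pos].split(",")
        self._pos += 1
        return [name.strip() for name in header]

    def read_row(self):
        row = [float(field) for field in self._lines[self._pos].split(",")]
        self._pos += 1
        return row

    def at_eof(self):
        return self._pos >= len(self._lines)


def load_table(reader):
    """Format-independent loading: works with any reader offering the same
    three methods, so later interpolation never sees the file format."""
    names = reader.read_header()
    rows = []
    while not reader.at_eof():
        rows.append(reader.read_row())
    return names, rows


names, rows = load_table(CsvLikeReader(io.StringIO("t,G\n0,100\n3600,160\n")))
```

Supporting another file format in this sketch would mean writing one more class with the same three methods; `load_table` and any interpolation code built on its result would stay unchanged.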

Further Work

We plan to take this further by connecting the Data Reader with a sun position algorithm, because it is important when interpolating values of solar flux to take into account the times at which the sun rises and sets (see the models/johnpye/sunpos.a4c model file in the ModelLibrary). This algorithm will become available as an interpolation option.

This code is still under development. In particular, use of this code exposed a limitation in IDA when 'integrating' models that don't have any derivatives in them.

CSV format

Work is currently active (Jun 2010) on improving support for CSV files, so stay tuned. In the 0.9.7 release of ASCEND, the following limits applied to imported CSV files:

The first column must contain your independent variable, in base SI units. For example, this might be time in seconds, or temperature in Kelvin.
All columns must be in base SI units, unless you're manually adding scaling factors as relations in your model.
Each line in the file is currently limited to 9999 characters, including commas.
Values in the first column must be sorted and monotonically increasing.
Cubic interpolation requires data values in the first column to be uniformly spaced.
Data rows may contain only numerical data, no other 'words'.
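A file can be checked against these limits before it is handed to ASCEND. The following Python sketch is an assumed pre-check, not part of the Data Reader: the function name `check_csv_rows` is hypothetical, 'monotonically increasing' is interpreted as strictly increasing, and the 9999-character limit is assumed to exclude the line terminator.

```python
import math

MAX_LINE_LENGTH = 9999  # per-line limit stated above


def check_csv_rows(lines, require_uniform=False):
    """Return a list of problems found in CSV data lines (no header line)."""
    problems = []
    xs = []
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_LINE_LENGTH:
            problems.append(f"line {number}: longer than {MAX_LINE_LENGTH} characters")
        try:
            fields = [float(field) for field in line.split(",")]
        except ValueError:
            problems.append(f"line {number}: non-numeric field")
            continue
        xs.append(fields[0])  # first column holds the independent variable
    steps = [b - a for a, b in zip(xs, xs[1:])]
    if any(step <= 0 for step in steps):
        problems.append("first column is not strictly increasing")
    elif require_uniform and steps:
        # Uniform spacing matters only when cubic interpolation is used.
        if not all(math.isclose(step, steps[0], rel_tol=1e-9) for step in steps):
            problems.append("first column is not uniformly spaced")
    return problems


uniform_report = check_csv_rows(["0,1", "10,2", "20,3"], require_uniform=True)
uneven_report = check_csv_rows(["0,1", "10,2", "15,3"], require_uniform=True)
```

Passing `require_uniform=True` corresponds to the case where cubic interpolation is requested; with linear interpolation only, the uneven file in the second call would pass.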

New work in csv2:models/johnpye/datareader/csv.c is enhancing support for CSV files. So far, we have added:

Support for data files with comment lines and header lines
Recognition of, and scaling according to, units of measurement found in header lines of format "COLUMN NAME / [units of measurement]".
Allow comment lines to be inserted in the data
Tolerate blank lines in the data
Tolerate extra data columns added to the right of the main data
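The header convention mentioned above, "COLUMN NAME / [units of measurement]", can be split into its two parts as in the Python sketch below. It is an illustration only: the function name `split_header_cell` is hypothetical, and because the text does not say whether the square brackets appear literally in real files, the sketch accepts the units with or without them. Converting the units into scaling factors is left out.

```python
def split_header_cell(cell):
    """Return (name, units) for one header cell.

    units is None when the cell contains no '/' separator.
    """
    name, separator, units = cell.partition("/")
    if not separator:
        return cell.strip(), None
    units = units.strip()
    # Assumption: brackets around the units are optional decoration.
    if units.startswith("[") and units.endswith("]"):
        units = units[1:-1].strip()
    return name.strip(), units


header = "Time / [s],Temperature / K,Irradiance / [W/m^2],Notes"
parsed = [split_header_cell(cell) for cell in header.split(",")]
```

Only the first '/' in a cell is treated as the separator, so units that themselves contain a slash, such as W/m^2, are kept intact.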

Further improvements for the CSV format that we'd like to achieve include:

Support for delimiters such as tab, semicolon, etc.
Support for manually-specified scaling factors using DATA instance
Permit comments added at end of line
Permit manually-specified data range within a file (##)
Apply scaling on independent variable column?
Allow independent column to be other than the first column (relates to ## above)
Permit string values within data rows
Permit reading of Celsius degree data

