TABLES

Latest revision as of 05:49, 24 February 2026

This page documents the TABLE/VALUES/DATASET syntax and behavior that is actually implemented in ASCEND at present.

Status Summary

  • TABLE: implemented.
  • VALUES: parsed only (no statement object/lowering yet).
  • DATASET: implemented.

TABLE statement

TABLE <target_name> [IS_A <type_name>[(<type_args>)] [OF <set_type>]] [POSITIONAL] [DEFAULT <expr>];
    ...
END TABLE;

Where <target_name> is an indexed array reference such as:

cost[r,c]

or

cost[r][c]

Notes:

  • Inline declaration is supported, e.g. TABLE cost[r,c] IS_A factor_constant;.
  • When inline declaration is used, ASCEND injects an equivalent IS_A declaration statement before the TABLE statement during parsing.
  • POSITIONAL and DEFAULT may both be written.
  • Body rows are newline-based; a semicolon also ends a row in the POSITIONAL and dense modes.
  • Tokens accepted in TABLE body include identifiers, quoted symbols, integers, reals, {...}, and punctuation : = , + - ;.

POSITIONAL table

TABLE cost[r,c] IS_A integer_constant POSITIONAL;
    11 12 13
    21 22 23
END TABLE;

Implemented behavior:

  • Supports 1-D and 2-D targets.
  • Values must be numeric.
  • Signs +/- are supported.
  • Comma can be used as a value separator.
  • A sequence containing at least one semicolon and/or newline ends the current row.
  • Enforces row/column/value counts against index cardinality.
  • Sparse punctuation (:, =) is rejected in POSITIONAL mode.
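
The row-splitting and count rules above can be sketched in Python. This is an illustrative model only, not ASCEND's actual parser: the function names (`parse_number`, `split_positional_rows`, `check_shape`) are hypothetical, signs are assumed to be attached directly to the number, and only the 2-D case is checked.

```python
import re

def parse_number(token):
    """Parse a signed integer or real; raises ValueError for non-numeric tokens."""
    if re.fullmatch(r"[+-]?\d+", token):
        return int(token)
    return float(token)

def split_positional_rows(body):
    """Split a POSITIONAL TABLE body into rows of numeric values."""
    rows = []
    # Any run of semicolons and/or newlines ends the current row.
    for chunk in re.split(r"[;\n]+", body):
        # Sparse punctuation is rejected in POSITIONAL mode.
        if ":" in chunk or "=" in chunk:
            raise ValueError("':' and '=' are not allowed in POSITIONAL mode")
        # Commas and whitespace both separate values within a row.
        tokens = [t for t in re.split(r"[,\s]+", chunk) if t]
        if tokens:
            rows.append([parse_number(t) for t in tokens])
    return rows

def check_shape(rows, n_rows, n_cols):
    """Check row and column counts against the index set cardinalities (2-D case)."""
    if len(rows) != n_rows:
        raise ValueError(f"expected {n_rows} rows, got {len(rows)}")
    for row in rows:
        if len(row) != n_cols:
            raise ValueError(f"expected {n_cols} values per row, got {len(row)}")

rows = split_positional_rows("    11 12 13\n    21 22 23\n")
check_shape(rows, n_rows=2, n_cols=3)
```

In this sketch an empty row (for example from a trailing semicolon before the newline) is simply skipped; whether ASCEND treats it the same way is an assumption.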

Dense non-POSITIONAL table

TABLE cost[r,c] IS_A integer_constant;
       3  1  2
    2 23 21 22
    1 13 11 12
END TABLE;

Implemented behavior:

  • Dense non-POSITIONAL mode currently requires exactly 2 indices.
  • First non-empty row is the column-label header.
  • Header may optionally start with :.
  • Each data row begins with a row label; : after row label is optional.
  • Cell values are numeric only.
  • Row/column labels can be integer or symbol-valued.
  • Duplicate row labels and duplicate column labels are rejected.
  • Label membership is checked against index sets when those sets are already defined.
  • Row/column counts must match first/second index set cardinality.
  • Comma-separated rows (CSV-like) are accepted.
  • A sequence containing at least one semicolon and/or newline ends the current row.
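
As a rough illustration of the dense layout rules above, the following Python sketch reads a header row of column labels followed by labelled data rows. It is not ASCEND's implementation: `parse_dense_table` is a hypothetical name, labels are kept as plain strings, quoted labels containing spaces are not handled, and membership and cardinality checks against the index sets are omitted.

```python
import re

def parse_dense_table(body):
    """Return (column_labels, row_labels, values) for a dense 2-D TABLE body.

    values maps (row_label, column_label) to a number.
    """
    rows = []
    # Any run of semicolons and/or newlines ends the current row.
    for chunk in re.split(r"[;\n]+", body):
        # The ':' after a row label (or at the start of the header) is optional,
        # so this sketch simply treats it as whitespace.
        tokens = [t for t in re.split(r"[,\s]+", chunk.replace(":", " ")) if t]
        if tokens:
            rows.append(tokens)
    if not rows:
        raise ValueError("empty TABLE body")

    col_labels = rows[0]  # first non-empty row is the column-label header
    if len(set(col_labels)) != len(col_labels):
        raise ValueError("duplicate column label")

    row_labels, values = [], {}
    for tokens in rows[1:]:
        label, cells = tokens[0], tokens[1:]
        if label in row_labels:
            raise ValueError(f"duplicate row label {label!r}")
        if len(cells) != len(col_labels):
            raise ValueError(f"row {label!r}: expected {len(col_labels)} values")
        row_labels.append(label)
        for col, cell in zip(col_labels, cells):
            # Cell values are numeric only; float() raises ValueError otherwise.
            values[(label, col)] = int(cell) if re.fullmatch(r"[+-]?\d+", cell) else float(cell)
    return col_labels, row_labels, values

cols, row_names, values = parse_dense_table("3 1 2\n2 23 21 22\n1 13 11 12\n")
```

Note that the header order need not be sorted: each cell is keyed by its row and column labels, not by its position.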

Label Rules

Examples:

TABLE cost[r,c];
    x y
    north 11 12
    south 21 22
END TABLE;

TABLE cost[r,c] IS_A integer_constant;
    c3,c1,c2;
    alan,23,21,22;
    bernhard,13,11,12;
END TABLE;

Current label handling:

  • Integer labels: parsed as integers and matched to integer sets.
  • Symbol labels: unquoted identifiers or quoted symbols (e.g. 'New York') are matched to symbol sets.

Implicit Set Inference (Implemented for Dense TABLE)

If an index set expression cannot yet be evaluated and it is a simple named set, dense TABLE can infer and assign that set from labels.

Example:

r IS_A set OF integer_constant;
c IS_A set OF symbol_constant;

TABLE cost[r,c] IS_A integer_constant;
        a  b
    2: 21 22
    1: 11 12
END TABLE;

Behavior:

  • If all labels are unquoted integers, infer integer set.
  • Otherwise infer symbol set.
  • Duplicate inferred labels are rejected.
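
The inference rule can be expressed compactly. The sketch below is a Python model of the three bullets above, with a hypothetical function name `infer_set`; it assumes labels arrive as raw token strings, that quoted symbols use single quotes, and that integer labels carry no sign.

```python
import re

def infer_set(labels):
    """Infer the set kind and its elements from raw TABLE label tokens."""
    # All labels unquoted integers -> integer set; otherwise -> symbol set.
    if all(re.fullmatch(r"\d+", label) for label in labels):
        kind, elements = "integer", [int(label) for label in labels]
    else:
        kind, elements = "symbol", [label.strip("'") for label in labels]
    # Duplicate inferred labels are rejected.
    if len(set(elements)) != len(elements):
        raise ValueError("duplicate label in inferred set")
    return kind, elements

row_kind, row_set = infer_set(["2", "1"])   # row labels from the example above
col_kind, col_set = infer_set(["a", "b"])   # column labels from the example above
```

A single quoted or non-integer label is enough to make the whole set symbol-valued in this model.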

Current Limitations

  • Sparse non-POSITIONAL TABLE assignment (e.g. row: col=value ...) is not implemented.
  • DEFAULT <expr> is parsed and stored but not yet used in TABLE lowering.
  • Non-numeric TABLE cell values are not yet supported.
  • TABLE index ranges written directly in index expressions are not currently supported by dense lowering.
  • VALUES is parse-only at this stage.

VALUES statement (not yet implemented)

Intended for vectors and sparse ND assignments.

VALUES <array_name>[<index_set>, ...] [DEFAULT <expr>];
   <k1>[,<k2>...] = <value>;
   ...
END VALUES;

Example:

VALUES demand[customer] DEFAULT 0;
    1 = 45;
    2 = 30;
    3 = 40;
    4 = 35;
END VALUES;

DATASET statement

DATASET met FROM "weather.csv";
    ghi[t] FROM 'GHI';
END DATASET;

With no INDEX items, t is interpreted as the row index (1..N rows). If t is undeclared, parser lowering injects:

t IS_A set OF integer_constant;

Extended syntax form

Includes explicit INDEX and typed maps.

DATASET ops FROM "ops.csv";
    INDEX customer FROM COLUMN customer_id IS_A integer_constant;
    load[customer] IS_A factor_constant FROM 'load_MW' {MW};
END DATASET;

Supported map statement forms:

  • <target> FROM [COLUMN] <column_ref> [units] [IS_A type]
  • <target> IS_A type FROM [COLUMN] <column_ref> [units]

where:

  • <column_ref> is an identifier or quoted symbol.
  • COLUMN is optional for map items.
  • COLUMN is required for INDEX items.

Example: Melbourne DNI/GHI

MODEL melbourne_dni_ghi_dataset4;
    DATASET melbourne FROM "johnpye/datareader/086282TMY_60min_dni_ghi.csv";
        dni[timestep] IS_A real_constant FROM 'DNI';
        ghi[timestep] IS_A real_constant FROM 'GHI';
    END DATASET;

    annual_dni_sum = SUM[dni[t] | t IN timestep];
    annual_ghi_sum = SUM[ghi[t] | t IN timestep];
END melbourne_dni_ghi_dataset4;

Notes:

  • No explicit INDEX statement is required in this form.
  • timestep is populated as 1..N (N = number of data rows).
  • Quoted column names are used for direct header matching.

DATASET Compilation Behavior

File loading

  • Reads CSV-like input with a required header row.
  • Delimiters accepted: comma and semicolon.
  • Quoted fields (single/double quote) are accepted; outer quotes are stripped.
  • Blank lines and comment lines beginning with # are ignored.
  • Supported file types: plain, .gz, .xz.
  • Compression support is conditional at build time:
    • .gz requires zlib.
    • .xz requires liblzma.
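
For readers who want to pre-check a data file, the loading rules above can be approximated in Python. This is a sketch under assumptions, not the ASCEND reader: the function names are hypothetical, UTF-8 encoding is assumed, compression is selected purely by file extension, and both comma and semicolon are treated as delimiters anywhere outside quotes.

```python
import gzip
import lzma

def open_dataset_text(path):
    """Open a plain, .gz or .xz file as text, chosen by file extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    if path.endswith(".xz"):
        return lzma.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")

def split_fields(line):
    """Split one line on ',' or ';' outside quotes and strip the outer quotes."""
    fields, current, quote = [], [], None
    for ch in line:
        if quote:
            if ch == quote:
                quote = None          # closing quote: dropped from the field
            else:
                current.append(ch)
        elif ch in "'\"":
            quote = ch                # opening quote: dropped from the field
        elif ch in ",;":
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    fields.append("".join(current).strip())
    return fields

def read_rows(lines):
    """Return (header, data_rows), skipping blank lines and '#' comment lines."""
    rows = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        rows.append(split_fields(stripped))
    if not rows:
        raise ValueError("a header row is required")
    return rows[0], rows[1:]

header, data = read_rows(["# weather data\n", "\n", "time;'GHI';\"DNI\"\n", "1,2.5,3\n"])
```

To read from disk, pass the file object directly, for example `read_rows(open_dataset_text("weather.csv.gz"))`; the file name here is only a placeholder.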

Column names and units

Column units are parsed from header suffixes:

  • Name{unit}
  • Name / {unit}
  • Name[unit]
  • Name / [unit]

(whitespace tolerant)

Optional units-row behavior:

  • If the first data row parses as units-only tokens, it is consumed as column units.
  • Units-row tokens accepted include bracket/brace forms with optional leading /.
  • Dimensionless tokens (-, 1, dimensionless) mean "no units".
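
The header-suffix and units-row forms listed above can be matched with two regular expressions. The Python sketch below is an approximation for illustration: `split_header` and `parse_units_row` are hypothetical names, and the sketch does not resolve the ambiguity of a genuine data row that consists only of `1` or `-` tokens.

```python
import re

# Name followed by an optional '/', then {unit} or [unit]; whitespace tolerant.
_HEADER = re.compile(
    r"^\s*(?P<name>.*?)\s*(?:/\s*)?"
    r"(?:\{(?P<brace>[^{}]*)\}|\[(?P<bracket>[^\[\]]*)\])\s*$"
)
# A units-only token: optional leading '/', then {unit} or [unit].
_UNITS_ONLY = re.compile(r"^\s*(?:/\s*)?(?:\{([^{}]*)\}|\[([^\[\]]*)\])\s*$")
DIMENSIONLESS = {"-", "1", "dimensionless"}

def split_header(cell):
    """Split a header cell into (column_name, units); units is None if absent."""
    m = _HEADER.match(cell)
    if not m:
        return cell.strip(), None
    units = m.group("brace") if m.group("brace") is not None else m.group("bracket")
    return m.group("name"), units.strip()

def parse_units_row(fields):
    """Return per-column units if every field is a units-only token, else None."""
    units = []
    for field in fields:
        token = field.strip()
        if token in DIMENSIONLESS:
            units.append(None)        # dimensionless means "no units"
            continue
        m = _UNITS_ONLY.match(token)
        if not m:
            return None               # an ordinary data row, not a units row
        units.append((m.group(1) if m.group(1) is not None else m.group(2)).strip())
    return units

name, units = split_header("GHI / {W/m^2}")
```

`parse_units_row` would be applied only to the first data row; a `None` result means the row is kept as data.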

Index handling

  • Explicit INDEX items:
    • Build and assign named sets from column values.
    • Supported element types: integer, symbol.
  • Implicit row-index mode (no explicit INDEX items):
    • Unresolved simple set names in map indices are bound to row positions 1..N.
    • The corresponding set is assigned as integer row labels.
    • Current restriction: one undeclared set name per DATASET.

Map assignment

  • Map targets assign to constant instances (real/int/symbol/bool constants).
  • Units rules:
    • Map units and column units must match if both are present.
    • Cell units (value{unit}) must match map/column units if those are present.
    • If map/column units are absent, cell units may provide units for real constants.
    • Units are rejected for integer/symbol/boolean constants.
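
One way to read the units rules above is as a small resolution function. The Python sketch below is a simplified model with a hypothetical name, `resolve_units`; it assumes that "match" means exact string equality of the unit text, whereas the real check may compare dimensions or converted values.

```python
def resolve_units(kind, map_units=None, column_units=None, cell_units=None):
    """Return the units to apply to one cell, or raise ValueError on a conflict.

    kind is one of "real", "integer", "symbol", "boolean" (labels assumed here).
    """
    # Units are rejected for integer/symbol/boolean constants.
    if kind != "real":
        if any(u is not None for u in (map_units, column_units, cell_units)):
            raise ValueError(f"units are not allowed on {kind} constants")
        return None
    # Map units and column units must match if both are present.
    if map_units is not None and column_units is not None and map_units != column_units:
        raise ValueError("map units and column units differ")
    declared = map_units if map_units is not None else column_units
    # With no map/column units, cell units may supply the units.
    if declared is None:
        return cell_units
    # Cell units must match map/column units if those are present.
    if cell_units is not None and cell_units != declared:
        raise ValueError("cell units differ from map/column units")
    return declared

units = resolve_units("real", map_units="MW", column_units="MW")
```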

Runtime notes

  • Loaded DATASET content is materialized in memory and cached per file during pass 1.
  • For simple 1-D array targets (e.g. dni[t]), assignment uses a direct array-child fast path instead of per-cell FindInstances calls.
  • With this fast path, DATASET loading and assignment are much cheaper; for large datasets, runtime is currently dominated by relation instantiation (e.g. large SUM[...] relations).