TABLES

From ASCEND

Latest revision as of 12:55, 17 February 2026

This page documents the TABLE/VALUES/DATASET syntax and behavior that is actually implemented in ASCEND at present, along with the intended syntax for the parse-only VALUES and DATASET statements.

Status Summary

  • TABLE: parsed and compiled.
  • VALUES: parsed only (no statement object/lowering yet).
  • DATASET: parsed only (no statement object/lowering yet).

TABLE Syntax (Implemented)

TABLE <target_name> [IS_A <type_name>[(<type_args>)] [OF <set_type>]] [POSITIONAL] [DEFAULT <expr>];
    ...
END TABLE;

Where <target_name> is an indexed array reference such as:

cost[r,c]

or

cost[r][c]
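The header grammar above can be illustrated with a short Python sketch. It assumes the header fits on one line and that an IS_A clause, when present, comes before POSITIONAL and DEFAULT, as in the syntax line. TableHeader and parse_table_header are hypothetical names and are not part of ASCEND's parser. The sketch only shows that cost[r,c] and cost[r][c] reduce to the same index list.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TableHeader:
    """Hypothetical container for the parts of a TABLE header line."""
    name: str
    indices: List[str]
    type_spec: Optional[str] = None   # raw text after IS_A, e.g. "integer_constant"
    positional: bool = False
    default: Optional[str] = None     # DEFAULT expression kept as raw text


_HEADER = re.compile(r"\s*TABLE\s+([A-Za-z_]\w*)\s*((?:\[[^\]]*\]\s*)+)(.*?);\s*$")


def parse_table_header(line: str) -> TableHeader:
    m = _HEADER.match(line)
    if m is None:
        raise ValueError(f"not a TABLE header: {line!r}")
    name, brackets, rest = m.groups()
    # cost[r,c] and cost[r][c] both flatten to ["r", "c"].
    indices = [part.strip()
               for group in re.findall(r"\[([^\]]*)\]", brackets)
               for part in group.split(",")]
    header = TableHeader(name, indices)
    rest = rest.strip()
    if re.search(r"\bDEFAULT\b", rest):
        # Everything after DEFAULT is taken as the default expression.
        rest, default = re.split(r"\bDEFAULT\b", rest, maxsplit=1)
        header.default = default.strip()
    header.positional = re.search(r"\bPOSITIONAL\b", rest) is not None
    rest = re.sub(r"\bPOSITIONAL\b", "", rest).strip()
    if rest.startswith("IS_A"):
        header.type_spec = rest[len("IS_A"):].strip()
    elif rest:
        raise ValueError(f"unexpected header text: {rest!r}")
    return header
```

For example, parse_table_header("TABLE cost[r][c] DEFAULT 0;") yields indices ["r", "c"] and default "0", with no type and POSITIONAL off.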

Notes:

  • Inline declaration is supported, e.g. TABLE cost[r,c] IS_A factor_constant;
  • When inline declaration is used, ASCEND injects an equivalent IS_A declaration statement before the TABLE statement during parsing.
  • POSITIONAL and DEFAULT may both appear in the same header.
  • Body rows are newline-delimited; a semicolon also ends a row.
  • Tokens accepted in a TABLE body include identifiers, quoted symbols, integers, reals, {...}, and the punctuation : = , + - ;.
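Here is a minimal Python sketch of a body tokenizer and row splitter. It assumes the token classes listed above and the row rule used by both TABLE modes below: any run of semicolons and/or newlines ends a row, and commas only separate values. The tokenize and split_rows functions are hypothetical. ASCEND's actual lexer is not reproduced here and may differ in detail, for example in which spellings of reals it accepts.

```python
import re
from typing import List

# Token classes follow the list above; ASCEND's real lexer may differ in detail.
_TOKEN = re.compile(r"""
    (?P<ws>[ \t\r]+)
  | (?P<sep>[;\n])
  | (?P<real>\d+\.\d*(?:[eE][+-]?\d+)?|\d+[eE][+-]?\d+|\.\d+(?:[eE][+-]?\d+)?)
  | (?P<int>\d+)
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<quoted>'[^'\n]*')
  | (?P<braced>\{[^}]*\})
  | (?P<punct>[:=,+\-])
""", re.VERBOSE)


def tokenize(body: str) -> List[str]:
    """Split a TABLE body into tokens, keeping ';' and newline as separator tokens."""
    tokens: List[str] = []
    pos = 0
    while pos < len(body):
        m = _TOKEN.match(body, pos)
        if m is None:
            raise ValueError(f"unexpected character {body[pos]!r} at offset {pos}")
        if m.lastgroup != "ws":          # drop spaces and tabs
            tokens.append(m.group())
        pos = m.end()
    return tokens


def split_rows(tokens: List[str]) -> List[List[str]]:
    """Any run of ';'/newline tokens ends the current row; commas are dropped."""
    rows: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        if tok in (";", "\n"):
            if current:                  # a run of separators yields no empty rows
                rows.append(current)
                current = []
        elif tok != ",":
            current.append(tok)
    if current:
        rows.append(current)
    return rows
```

Signs stay separate tokens ("-2" becomes "-", "2"). The mode-specific rules fold them into the following value.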

TABLE Compilation Behavior

1) POSITIONAL TABLE

TABLE cost[r,c] IS_A integer_constant POSITIONAL;
    11 12 13
    21 22 23
END TABLE;

Implemented behavior:

  • Supports 1-D and 2-D targets.
  • Values must be numeric.
  • Signs +/- are supported.
  • Comma can be used as a value separator.
  • Any run of one or more semicolons and/or newlines ends the current row.
  • Row, column, and value counts are checked against the cardinality of the index sets.
  • Sparse punctuation (:, =) is rejected in POSITIONAL mode.
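The POSITIONAL rules can be sketched in Python as follows. The sketch makes three assumptions. First, the body has already been split into rows of string tokens with commas removed. Second, each index set is given as an ordered list, standing in for ASCEND's set ordering. Third, for a 1-D target, values may be spread over any number of rows. _numbers and lower_positional are hypothetical helpers.

```python
from typing import Dict, List, Optional, Sequence, Tuple, Union

Number = Union[int, float]


def _numbers(tokens: Sequence[str]) -> List[Number]:
    """Turn one row's tokens into numbers, folding '+'/'-' into the next value."""
    values: List[Number] = []
    sign: Optional[str] = None
    for tok in tokens:
        if tok in (":", "="):
            raise ValueError(f"sparse punctuation {tok!r} is rejected in POSITIONAL mode")
        if tok in ("+", "-"):
            if sign is not None:
                raise ValueError("two consecutive signs")
            sign = tok
            continue
        if not (tok[:1].isdigit() or tok[:1] == "."):
            raise ValueError(f"value {tok!r} is not numeric")
        num: Number = float(tok) if any(ch in tok for ch in ".eE") else int(tok)
        values.append(-num if sign == "-" else num)
        sign = None
    if sign is not None:
        raise ValueError("sign without a value")
    return values


def lower_positional(rows: Sequence[Sequence[str]],
                     index_sets: Sequence[Sequence]) -> Dict[Tuple, Number]:
    """Map row-major values onto ordered index sets (1-D or 2-D targets only)."""
    if len(index_sets) == 1:
        (members,) = index_sets
        # Assumption: a 1-D body may spread its values over any number of rows.
        values = [v for row in rows for v in _numbers(row)]
        if len(values) != len(members):
            raise ValueError(f"expected {len(members)} values, got {len(values)}")
        return {(k,): v for k, v in zip(members, values)}
    if len(index_sets) == 2:
        row_keys, col_keys = index_sets
        if len(rows) != len(row_keys):
            raise ValueError(f"expected {len(row_keys)} rows, got {len(rows)}")
        result: Dict[Tuple, Number] = {}
        for rk, row in zip(row_keys, rows):
            values = _numbers(row)
            if len(values) != len(col_keys):
                raise ValueError(f"row {rk!r}: expected {len(col_keys)} values, got {len(values)}")
            for ck, v in zip(col_keys, values):
                result[(rk, ck)] = v
        return result
    raise ValueError("POSITIONAL TABLE supports only 1-D and 2-D targets")
```

For the example above, with r = [1, 2] and c = [1, 2, 3], the sketch assigns cost[1,1] = 11 through cost[2,3] = 23.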

2) Dense non-POSITIONAL TABLE

TABLE cost[r,c] IS_A integer_constant;
        3  1  2
    2  23 21 22
    1  13 11 12
END TABLE;

Implemented behavior:

  • Dense non-POSITIONAL mode currently requires exactly 2 indices.
  • First non-empty row is the column-label header.
  • Header may optionally start with :.
  • Each data row begins with a row label; : after row label is optional.
  • Cell values are numeric only.
  • Row/column labels can be integer or symbol-valued.
  • Duplicate row labels and duplicate column labels are rejected.
  • Label membership is checked against index sets when those sets are already defined.
  • The number of rows and columns must match the cardinality of the first and second index sets, respectively.
  • Comma-separated rows (CSV-like) are accepted.
  • Any run of one or more semicolons and/or newlines ends the current row.
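The dense, labelled lowering described above can be sketched in Python. The sketch assumes that rows arrive as lists of string tokens with commas removed. An index set passed as None stands for a set that is not yet defined, so the membership and cardinality checks are skipped for it. lower_dense and its helpers are hypothetical; they illustrate the listed checks, not ASCEND's implementation.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple, Union

Label = Union[int, str]
Number = Union[int, float]


def _label(tok: str) -> Label:
    """Unquoted integers become ints; quoted symbols lose their quotes."""
    if tok.isascii() and tok.isdigit():
        return int(tok)
    if len(tok) >= 2 and tok[0] == tok[-1] == "'":
        return tok[1:-1]
    return tok


def _values(tokens: Sequence[str]) -> List[Number]:
    """Parse numeric cells, folding a leading '+'/'-' into the next value."""
    values: List[Number] = []
    sign: Optional[str] = None
    for tok in tokens:
        if tok in ("+", "-") and sign is None:
            sign = tok
            continue
        if not (tok[:1].isdigit() or tok[:1] == "."):
            raise ValueError(f"cell value {tok!r} is not numeric")
        num: Number = float(tok) if any(ch in tok for ch in ".eE") else int(tok)
        values.append(-num if sign == "-" else num)
        sign = None
    if sign is not None:
        raise ValueError("sign without a value")
    return values


def lower_dense(rows: Sequence[Sequence[str]],
                row_set: Optional[Iterable[Label]] = None,
                col_set: Optional[Iterable[Label]] = None) -> Dict[Tuple[Label, Label], Number]:
    """Lower a dense 2-D TABLE body whose first row holds the column labels."""
    if not rows:
        raise ValueError("TABLE body is empty")
    header = list(rows[0])
    if header[:1] == [":"]:                     # optional leading ':' on the header
        header = header[1:]
    cols = [_label(t) for t in header]
    if len(set(cols)) != len(cols):
        raise ValueError("duplicate column label")
    result: Dict[Tuple[Label, Label], Number] = {}
    row_labels: List[Label] = []
    for row in map(list, rows[1:]):
        if not row:
            continue
        label = _label(row[0])
        if label in row_labels:
            raise ValueError(f"duplicate row label {label!r}")
        row_labels.append(label)
        cells = row[2:] if row[1:2] == [":"] else row[1:]   # ':' after label is optional
        values = _values(cells)
        if len(values) != len(cols):
            raise ValueError(f"row {label!r} has {len(values)} values, expected {len(cols)}")
        for col, value in zip(cols, values):
            result[(label, col)] = value
    for axis, labels, known in (("row", row_labels, row_set), ("column", cols, col_set)):
        if known is None:                       # set not defined yet: nothing to check
            continue
        known = set(known)
        if len(labels) != len(known):
            raise ValueError(f"{axis} count {len(labels)} != set cardinality {len(known)}")
        unknown = [x for x in labels if x not in known]
        if unknown:
            raise ValueError(f"{axis} labels not in index set: {unknown}")
    return result
```

With the example above and r = {1, 2}, c = {1, 2, 3}, the sketch gives cost[2,3] = 23 and cost[1,2] = 12. Cells are keyed by (row label, column label), not by position.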

Label Rules

Examples:

TABLE cost[r,c];
           x  y
    north 11 12
    south 21 22
END TABLE;

TABLE cost[r,c] IS_A integer_constant;
    c3,c1,c2;
    alan,23,21,22;
    bernhard,13,11,12;
END TABLE;

Current label handling:

  • Integer labels: parsed as integers and matched to integer sets.
  • Symbol labels: unquoted identifiers or quoted symbols (e.g. 'New York') are matched to symbol sets.
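The label rules amount to a small classification step. In this hypothetical Python sketch, classify_label decides whether a raw label token is an integer or a symbol. label_matches then applies the kind-and-membership check against an index set. Under these rules a quoted '3' is a symbol, not the integer 3.

```python
import re
from typing import Collection, Tuple, Union

_INT = re.compile(r"[0-9]+")
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def classify_label(tok: str) -> Tuple[str, Union[int, str]]:
    """Return ('integer', n) or ('symbol', s) for a single label token."""
    if _INT.fullmatch(tok):
        return "integer", int(tok)
    if len(tok) >= 2 and tok[0] == tok[-1] == "'":
        return "symbol", tok[1:-1]          # quoted symbol, e.g. 'New York'
    if _IDENT.fullmatch(tok):
        return "symbol", tok                # unquoted identifier, e.g. north
    raise ValueError(f"{tok!r} is not a valid TABLE label")


def label_matches(tok: str, set_kind: str, members: Collection) -> bool:
    """True if the label has the set's kind ('integer' or 'symbol') and is a member."""
    kind, value = classify_label(tok)
    return kind == set_kind and value in members
```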

Implicit Set Inference (Implemented for Dense TABLE)

If an index set expression cannot yet be evaluated (for example, a set that has been declared but not yet assigned) and it is a simple named set, a dense TABLE can infer the set's members from its labels and assign them.

Example:

r IS_A set OF integer_constant;
c IS_A set OF symbol_constant;

TABLE cost[r,c] IS_A integer_constant;
        a  b
    2: 21 22
    1: 11 12
END TABLE;

Behavior:

  • If all labels on an axis are unquoted integers, an integer set is inferred for that axis.
  • Otherwise, a symbol set is inferred.
  • Duplicate labels are rejected during inference.
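Here is a Python sketch of the inference rule, applied separately to the column header labels and to the row labels. infer_index_set is a hypothetical helper. It returns the inferred set kind and its members, and raises on duplicates. For the example above, the sketch infers the integer set {1, 2} from the row labels and the symbol set {a, b} from the header.

```python
import re
from typing import Sequence, Set, Tuple, Union


def infer_index_set(label_tokens: Sequence[str]) -> Tuple[str, Set[Union[int, str]]]:
    """Infer (set kind, members) for one TABLE axis from its raw label tokens."""
    if all(re.fullmatch(r"[0-9]+", t) for t in label_tokens):
        kind = "integer"
        labels = [int(t) for t in label_tokens]
    else:
        kind = "symbol"
        # Quoted symbols lose their quotes; any other token is taken as written.
        labels = [t[1:-1] if len(t) >= 2 and t[0] == t[-1] == "'" else t
                  for t in label_tokens]
    if len(set(labels)) != len(labels):
        raise ValueError("duplicate label while inferring index set")
    return kind, set(labels)
```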

Current Limitations

  • Sparse non-POSITIONAL TABLE assignment (e.g. row: col=value ...) is not implemented.
  • DEFAULT <expr> is parsed and stored but not yet used in TABLE lowering.
  • Non-numeric TABLE cell values are not yet supported.
  • TABLE index ranges written directly in index expressions are not currently supported by dense lowering.
  • VALUES and DATASET are parse-only at this stage.

VALUES (Parse-Only)

Intended for vectors and sparse N-dimensional assignments.

VALUES <array_name>[<index_set>, ...] [DEFAULT <expr>];
   <k1>[,<k2>...] = <value>;
   ...
END VALUES;

Example:

VALUES demand[customer] DEFAULT 0;
    1 = 45;
    2 = 30;
    3 = 40;
    4 = 35;
END VALUES;
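Because VALUES is parse-only, its lowering semantics are not yet fixed. The following Python sketch shows one plausible reading, and every rule in it is an assumption. Explicit entries are checked against the index domain, and duplicate assignments are errors. DEFAULT fills every unassigned index tuple; without DEFAULT, the result stays sparse. expand_values is a hypothetical helper.

```python
from itertools import product
from typing import Any, Dict, Optional, Sequence, Tuple


def expand_values(index_sets: Sequence[Sequence[Any]],
                  entries: Sequence[Tuple[Tuple[Any, ...], Any]],
                  default: Optional[Any] = None) -> Dict[Tuple[Any, ...], Any]:
    """Hypothetical VALUES semantics: explicit entries first, then DEFAULT for the rest."""
    domain = list(product(*index_sets))          # every valid index tuple
    valid = set(domain)
    result: Dict[Tuple[Any, ...], Any] = {}
    for key, value in entries:
        key = tuple(key)
        if key not in valid:
            raise ValueError(f"{key!r} is not in the index domain")
        if key in result:                        # assumption: duplicates are errors
            raise ValueError(f"{key!r} assigned twice")
        result[key] = value
    if default is not None:                      # without DEFAULT the result stays sparse
        for key in domain:
            result.setdefault(key, default)
    return result
```

Applied to the demand example with customer = [1, 2, 3, 4], every member receives an explicit value, so DEFAULT 0 is never used.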

DATASET (Parse-Only)

Intended for external files, especially large time-series and operational data.

DATASET <dataset_name> FROM "file.csv";
    INDEX <set_name> FROM COLUMN <col_name> IS_A <type_name>;
    <array_name>[<index_set>] FROM COLUMN <col_name> [{<units>}] [IS_A <type_name>];
    ...
END DATASET;

Example:

DATASET ops FROM "ops.csv";
    INDEX t FROM COLUMN timestamp IS_A time;
    load[t] FROM COLUMN load_MW {MW};
END DATASET;

Notes:

  • DATASET map targets must be indexed (name[...]).
  • If IS_A is omitted on a map line, a default type (planned: real_constant, dimensionless) can be assumed during compile/lowering.
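DATASET is also parse-only. To illustrate the intended mapping, this Python sketch uses the standard csv module to build an index list from one column and one array per mapped column. Index values stay as strings (no time parsing), and duplicate index values are rejected. Inline units are ignored, and values are read as floats in line with the planned real_constant default. All of these behaviors are assumptions. load_dataset is a hypothetical helper, not an ASCEND data handler.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def load_dataset(stream: TextIO, index_column: str,
                 mappings: Dict[str, str]) -> Tuple[List[str], Dict[str, Dict[str, float]]]:
    """Hypothetical DATASET loader.

    index_column: CSV column supplying the index set (INDEX ... FROM COLUMN).
    mappings: array name mapped to its CSV column (array[t] FROM COLUMN col).
    """
    index: List[str] = []
    arrays: Dict[str, Dict[str, float]] = {name: {} for name in mappings}
    seen = set()
    for row in csv.DictReader(stream):
        key = row[index_column]
        if key in seen:                      # assumption: index values must be unique
            raise ValueError(f"duplicate index value {key!r}")
        seen.add(key)
        index.append(key)
        for name, column in mappings.items():
            arrays[name][key] = float(row[column])   # planned default type: real
    return index, arrays


if __name__ == "__main__":
    sample = io.StringIO("timestamp,load_MW\n2026-01-01T00:00,120.5\n2026-01-01T01:00,118\n")
    t, data = load_dataset(sample, "timestamp", {"load": "load_MW"})
    print(t, data["load"])
```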

Labels and Quoting

  • Identifier-like labels can be unquoted.
  • Labels containing spaces or punctuation must be quoted as symbol constants, e.g. 'New York'.
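When ASCEND input is generated from other data, the quoting rule can be applied mechanically. format_label below is a hypothetical Python helper. It leaves integers and identifier-like symbols unquoted and quotes everything else. It refuses labels containing a single quote, because this page does not define an escaping rule. Digit-only strings are quoted so that they stay symbols rather than being read as integer labels.

```python
import re
from typing import Union

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def format_label(label: Union[int, str]) -> str:
    """Render a label for a TABLE/VALUES body, quoting it only when necessary."""
    if isinstance(label, int):
        return str(label)                    # integer labels stay unquoted
    if _IDENT.fullmatch(label):
        return label                         # identifier-like: no quotes needed
    if "'" in label:
        # Escaping rules for embedded quotes are not covered by this page.
        raise ValueError(f"cannot quote label containing a single quote: {label!r}")
    return f"'{label}'"                      # spaces, punctuation, digit-only strings
```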

Units Concept

  • Unit metadata may come from file metadata/rows (future data-handler behavior).
  • Inline units in DATASET mappings are allowed and should be checked for compatibility against declared type dimensions.
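A dimensional compatibility check can be sketched by comparing exponent vectors over base dimensions. In this Python sketch, the unit table UNIT_DIMS and the function units_compatible are hypothetical. A real check would draw on ASCEND's own unit definitions. Only dimensions are compared, so MW and W count as compatible; scale conversion would be a separate step.

```python
from typing import Dict

Dims = Dict[str, int]

# Hypothetical unit table: exponents of mass (M), length (L) and time (T).
# A real implementation would take these from ASCEND's unit definitions.
UNIT_DIMS: Dict[str, Dims] = {
    "W": {"M": 1, "L": 2, "T": -3},
    "MW": {"M": 1, "L": 2, "T": -3},
    "J": {"M": 1, "L": 2, "T": -2},
    "s": {"T": 1},
}


def _nonzero(dims: Dims) -> Dims:
    """Drop zero exponents so {'T': 1, 'M': 0} compares equal to {'T': 1}."""
    return {base: exp for base, exp in dims.items() if exp != 0}


def units_compatible(unit: str, declared: Dims) -> bool:
    """True if an inline DATASET unit has the dimensions of the declared type."""
    if unit not in UNIT_DIMS:
        raise KeyError(f"unknown unit {unit!r}")
    return _nonzero(UNIT_DIMS[unit]) == _nonzero(declared)
```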