Next-generation ASCEND syntax

This article is about planned development or proposed functionality. Comments welcome.

See New Compiler for motivation; this page covers the nitty-gritty details.

A number of problems with the current system (and its ancient mishmash of Pascal and Smalltalk ideas) suggest that a saner way forward than incremental cleanup is to reinvent the language front end and stick with a small language. The ultimate "big language" approach ends with Modelica/XML, which we do not care for.


Objectives

Aiming toward 'small-project' and 'portability' and 'easy to learn' and 'integrability' leads to the following design criteria for a new front-end (input language):


  • respell the pedantic keyword style of ascend to something which is ergonomic and still clear.
  • redesign the use of delimiters for concepts not part of normal procedural languages (equations, set notation, annotations, tables).
  • not expose the user to explicit memory management of the model, unless they are very ambitious.
  • provide for methods which may be written in the full syntax and semantics of a standard ISO language without tortuous usage to refer to the ascend model parts.
  • eliminate the bulk of the hand-coded compiler/interpreter routines-- leave this work to real compilers like C and interpreters like python. Get rid of everything after compiler Pass1.

One approach: sitting on top of a 'base' language

Derive from some language which is:


  • easy to learn.
  • already well known, with a large, lightweight (or else highly ubiquitous and stable) open-source tool base for developers.
  • capable of being processed with an open-source, easily modified AST-based engine so that a class in the language can be verified for conformance with ascend modeling restrictions/practices.
  • capable of being compiled to binary, to avoid our maintenance of an interpreter.
  • capable of supporting multithreaded use, for performance on all modern systems.
  • capable of preserving ascend's dynamic typing and glassbox modeling approaches.

Ben has given up on this approach, because C can't support it and everything better than C is worse at meeting the big picture requirements. Also, what we really want is easy interoperability with (or embedding in) many languages, not just one.


Possible base languages

  • C++ is the most obvious choice to meet the above criteria, but critically fails the AST-based engine criterion.
  • (Abbot went this route) Java is a possibility, but is slow and the common editing environments for it are anything but lightweight.
  • Ruby/Python are persistently tempting but so persistently inconsistent that portability is a big issue.
  • Fortran 20xx is a possibility, since it has an open-source parser, but then it adds a lot of baggage if we want to apply autodifferentiation.
  • C and Objective-C, unlike C++, are very parseable and otherwise have the large array of tools needed. The base language compiler error messages are nearly useless, but with an appropriate ASCEND frontend, this could be much improved. In particular, if our dialect of C explicitly bans end-user use of the C preprocessor, much becomes simpler.

Prototyping with several of these, however, shows that grafting introspection on to any of the binary-oriented languages ultimately leads to madness. So the alternative is to consider designing a very small language with no imperative statements, and to eliminate the bulk of the instance structure management by targeting a base language the routine user does not see. The methods section becomes a plain text block that, with perhaps minor automated edits, is then valid code passed to the base language compiler.


  • An alternative approach would be to use one of ML, Haskell (or a strict version thereof), or Scheme as a base language. These have the advantage that parsing, analysis, automatic type inference (the usual kind for a static type system, so as to minimize user type annotations), and generally everything related to programming and reasonable compilation become much easier. Also, if, say, a miniature strict version of Haskell were used, the monad abstraction would be useful for tracking dependencies, such as what needs to be reevaluated when parameters of a model are reset. Additionally, with any of these functional languages as substrate, it would be relatively simple to add some notion of units for values that can be checked statically or dynamically (e.g. m^2, ft). They are also good languages for metaprogramming, though very different approaches are taken in each of the three.

Eliminate the instantiator bulkiness and thinking about instance memory management issues

The maintenance nightmare which is the compiler could be substantially reduced by having a grammar which is simpler and turning the compiler into a family of code generators. Instantiating a model is then just passing processed models through an output phase and running python or gcc or ... on them. In this approach, memory management of the instantiation becomes the responsibility of the target language-- ascend is no longer creating a canonical instance tree (beyond Pass1 of the current compiler which is essentially an AST expansion) on which external languages act through limited APIs.
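A minimal sketch of the "family of code generators" idea, assuming a processed model has been reduced to a plain dictionary. The `MODEL` layout and the `emit_python` and `emit_c` helpers are hypothetical illustrations, not part of any existing ASCEND code; real variables would carry far more than a single value.

```python
# Hypothetical processed model: a type name plus its real-valued variables.
MODEL = {"name": "c", "variables": ["x1", "x2", "q"]}


def emit_python(model):
    """Emit a Python class; Python then owns instance memory management."""
    lines = [f"class {model['name']}:", "    def __init__(self):"]
    if model["variables"]:
        lines += [f"        self.{v} = 0.0" for v in model["variables"]]
    else:
        lines.append("        pass")
    return "\n".join(lines) + "\n"


def emit_c(model):
    """Emit a C struct; the C compiler and runtime own the layout."""
    fields = "".join(f"    double {v};\n" for v in model["variables"])
    return f"struct {model['name']} {{\n{fields}}};\n"


if __name__ == "__main__":
    print(emit_python(MODEL))
    print(emit_c(MODEL))
```

Each output phase is a small function over the same processed model, so adding a target language means adding one more emitter rather than touching an instantiator.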


'Designing out' maintenance errors

Simple naming errors introduced in maintenance are typically caught by the parser or instantiation. Block nesting errors, on the other hand, are very easy to introduce and often hard to find (or correct if not introduced by the same coder). Syntax can eliminate most such errors by eliminating ambiguity in block end matching. ASCEND4 has the feature. Some people would try to improve it. BetterBlockSyntax alternatives have been discussed (but the page has been lost).
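As a rough illustration of unambiguous block end matching, the sketch below assumes the convention used in the example later on this page, where a block is closed by its opening keyword spelled backwards (model/ledom, atom/mota, method/dohtem). The `check_blocks` function and its error wording are hypothetical, and only the first word of each line is examined.

```python
# Block openers; each closer is the opener reversed (an assumption taken
# from the example syntax on this page).
OPENERS = {"model", "atom", "method"}


def check_blocks(source):
    """Return a list of nesting errors found in the given source text."""
    stack = []   # (opening keyword, line number) for each open block
    errors = []
    for lineno, raw in enumerate(source.splitlines(), 1):
        words = raw.split("//", 1)[0].split()   # drop trailing comments
        if not words:
            continue
        first = words[0]
        if first in OPENERS:
            stack.append((first, lineno))
        elif first[::-1] in OPENERS:
            if not stack:
                errors.append(f"line {lineno}: '{first}' closes nothing")
                continue
            keyword, opened = stack.pop()
            if keyword != first[::-1]:
                errors.append(
                    f"line {lineno}: '{first}' does not close "
                    f"'{keyword}' opened on line {opened}"
                )
    for keyword, opened in stack:
        errors.append(f"line {opened}: '{keyword}' is never closed")
    return errors


if __name__ == "__main__":
    print(check_blocks("model a\nmethod x\nledom\n"))
```

Because every closer names the kind of block it ends, a misplaced or missing closer is reported at the line where the mismatch occurs instead of at the end of the file.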

Eliminate redundant typing

Type declaration in ascend is currently clunky, due to its pascal-ish nature and its long, underscored keywords. Going with any of the mainstream languages is equally clunky. Consider the alternate example:

// assume any ascend4 feature you don't see revised here will pretty much carry over unmodified (except perhaps lowercase).
// note that if we can get the sections defined properly, we may be able to use '=' a lot of places unambiguously with one
// meaning of '=' defined in each section.
model b
 ...
ledom
atom pressure extends real
  ...
mota
model c
  x1, x2, q: real;
ledom
model a (somename : symbol_constant) extends c
  // sections are with keywords changes, submodels, variables, equations boundaries, methods, etc.
  // everything is declared with a (label: type;) where label may be one or more names. type may be parameterized.
 interface-
   // stuff so we can have more than one parameterization of a type.
   // semantics to be determined. Does an 'a' with multiple interfaces yield multiple (implicit) type definitions?
   // Is the member list of a type the union of all defined interfaces?
   // Can a name be willbe in one interface and isa in another?
   // When is an instance created by one interface going to be mergeable with one created by another interface?
 changes-
   q: pressure; // IRT
   x1, x2: merged; // ats
   q, x1: alike; // AA (with refinement side effects).
   x2: findtype(somename); // dynamic kind irt. currently requires passing a prototype instance.
 submodels-
   bpart: b;    
 constants-
   s: string; // symbol
   order: int; // integer
 variables-
   b1: real;
   b: pressure;
   y; // implicit double var named y, or error. design choice 
 equations-
   e1: 2*y = z;
 boundaries-
   bnd1: y > b;  
 invariant:
   // statements about structure that downstream refinements must not violate at any time
 analyzers-
   // Definition of OODB-style queries and manipulation of results, eg for building dynamic solver-oriented structures.
   // Any item defined in analysis may be referred to in another analysis, but not in model-declarative sections.
   // An analysis is not part of the implicit, dynamic typing of ascend instances.
   // numeric operators on lists for gradients, blocking, hessians, residuals, etc.
   // multi-d arrays of dimensionless numbers with C indexing or ranging ala fortran. slicing ala fortran?
   // border-issue: where do we leave off and have matlab/octave begin?
   // call() for standard numeric libraries or gate it out to procedural code receiving operator objects? 
   //// Arguably, all this is doable from any of the tri-semicolon languages, but why not canonicalize the
   //// query operations in a form reusable and composable from all target languages?
   analysis newton(self)
     rlist = collect(basetype=relation);
     eqlist = filter(rlist, .included==true, .token[0] == EQ);
   done
 methods-
 ;;;c;;;
   //  c-ish code here.  basically, pasted into output of conversion to C of the declarative code.
 ;;;;;;
 ;;;python;;;
   // pythonic code here.
 ;;;;;;
 method x // old style methods deprecated maybe? They got overloaded with a lot of poorly C-ish crap once extended past simple assignments.
 dohtem // impossible to get right, on purpose. stop using ascend4-style methods.
 assignments-
 procedure y (arglist named, typed numeric values and willbe objects only )
   list = value-expression; // strictly assignments only in this section.
 end;
ledom

Here we see that the labeling (colon usage) for equations from ascend4 is consistently applied to all parts.


  • This format is pretty simple to parse: except for equations, bounds, and optional assignments in the constants and changes sections, it is all lists of names, of which the last must be a type. Loops and switches are minor extensions on this.
  • no shift key required anywhere, until you get in {} for table and annotation structures.
  • The use of ':', ',' and (on section names) '-' is entirely syntactic sugar and could be omitted.
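To make the "lists of names, of which the last must be a type" claim concrete, here is a small sketch of a parser for the declarative sections. The function names are hypothetical, and the sketch makes one design choice the example leaves open: a bare `y;` with no type is treated as an error. Equations, loops, and switches are not handled.

```python
def parse_declaration(line):
    """Parse 'n1, n2: type;' into (names, type_name)."""
    text = line.split("//", 1)[0].strip()    # ignore trailing comments
    if not text.endswith(";"):
        raise ValueError(f"missing ';' in declaration: {line!r}")
    text = text[:-1]
    if ":" not in text:
        # Design choice for this sketch: an untyped name is an error.
        raise ValueError(f"declaration has no type: {line!r}")
    labels, type_name = text.rsplit(":", 1)
    names = [name.strip() for name in labels.split(",")]
    if not all(name.isidentifier() for name in names):
        raise ValueError(f"bad name list in declaration: {line!r}")
    return names, type_name.strip()


def parse_sections(source):
    """Group declarations under their 'sectionname-' headers."""
    sections = {}
    current = None
    for raw in source.splitlines():
        line = raw.split("//", 1)[0].strip()
        if not line:
            continue
        if line.endswith("-") and line[:-1].isidentifier():
            current = line[:-1]
            sections[current] = []
        elif current is None:
            raise ValueError(f"declaration outside any section: {raw!r}")
        else:
            sections[current].append(parse_declaration(line))
    return sections


if __name__ == "__main__":
    sample = "variables-\n  b1: real;\n  x1, x2, q: real; // merged later\n"
    print(parse_sections(sample))
```

The whole declarative layer fits in two short functions without a parser generator, which is the sense in which the format is simple to parse.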

Scaling to DAEs, PDEs, and all that

The notion of analyzers is powerful, but it cannot capture semantics which are not there in the ascend model. We can add semantics via deriving from specific types, but past attempts in this direction with ascend have not proved highly usable or reusable or extensible-- all the semantics gets buried in C analysis implementations.

The lexing and AST analysis of antlr makes defining and maintaining fancy line-oriented syntaxes for operators, domains, domain surfaces and time limits much easier to prototype and play with. It does nothing, on the other hand, to deal with the underlying discretization data for problems on irregular grids.

Chapel is an HPC oriented language with a very interesting history and currently under development by DOE/Cray. It has a lot of constructs intended directly for PDE-related algorithms.


METHODS

Using C

So C-ish code here is interesting. In particular, there's the tremendous problem of the end user addressing (correctly every time) some piece of an ascend model from C. To that end, I propose the following simple filtering rule (it might work for python very similarly):


  • Any complete dot-qualified name appearing in the c-ish code will be checked for existence.
    • If it does not exist, it will then be checked in the model hierarchy and, if found, converted to the proper C reference form.
    • If not found, a warning will come from the ascend C code generator and (invariably) an error will come from the C compiler if the user proceeds.
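The filtering rule can be sketched roughly as follows. This assumes the first existence check is against names already known to the C code, which the rule does not spell out. The `filter_c_text` function, its arguments, and the `to_c_ref` callback are hypothetical. The regular expression is deliberately naive and does not skip string literals, comments, or names reached through `->`.

```python
import re

# A dot-qualified name: two or more identifiers joined by dots, not
# preceded by a word character or another dot.
DOTTED = re.compile(r"(?<![\w.])[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+")


def filter_c_text(text, c_names, model_names, to_c_ref):
    """Rewrite model names in c-ish text; return (new_text, warnings)."""
    warnings = []

    def replace(match):
        name = match.group(0)
        if name in c_names:          # already valid C: leave untouched
            return name
        if name in model_names:      # found in the model hierarchy
            return to_c_ref(name)
        warnings.append(f"unresolved name: {name}")
        return name                  # the C compiler will reject it later

    return DOTTED.sub(replace, text), warnings


if __name__ == "__main__":
    new_text, warns = filter_c_text(
        "a.b.c = opts.tol * missing.thing;",
        c_names={"opts.tol"},
        model_names={"a.b.c"},
        to_c_ref=lambda n: "REF_" + n.replace(".", "_"),  # placeholder form
    )
    print(new_text)
    print(warns)
```

Unresolved names are left in place so that the warning from the generator and the later compiler error both point at the same text the user wrote.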

Why? ASCEND model structures are dynamic (though this need not be so for the output C, it probably will be to accommodate using the C-instantiated instance under the current GUIs which may request refinement), so they cannot be converted to nested C structs; they must be converted to structs containing pointers to the child instances. Thus in ascend notation a real named a.b.c would probably end up in C as:

((struct Var *)(self->childlist[2]->childlist[3]->childlist[1]))->value

In a nondynamic C generation, it would be:

a.b.c.value
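A sketch of how a generator might produce both reference forms from a dotted name, assuming each instance keeps its children in an ordered list. The `Node` class and the helper functions are hypothetical, and the child positions in the usage example are chosen only to reproduce the indices shown above.

```python
class Node:
    """Hypothetical instance-tree node with an ordered child list."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def child_index_path(root, dotted):
    """Return the child index taken at each step of a dotted name."""
    node, indices = root, []
    for part in dotted.split("."):
        for index, child in enumerate(node.children):
            if child.name == part:
                indices.append(index)
                node = child
                break
        else:
            raise KeyError(f"{part!r} not found while resolving {dotted!r}")
    return indices


def dynamic_c_ref(root, dotted):
    """Pointer-chasing form, for instances that may still be refined."""
    steps = "".join(f"->childlist[{i}]" for i in child_index_path(root, dotted))
    return f"((struct Var *)(self{steps}))->value"


def static_c_ref(dotted):
    """Nested-struct form, for a nondynamic generation."""
    return f"{dotted}.value"


if __name__ == "__main__":
    b = Node("b", [Node("z"), Node("c")])
    a = Node("a", [Node("w"), Node("x"), Node("y"), b])
    root = Node("self", [Node("p"), Node("q"), a])
    print(dynamic_c_ref(root, "a.b.c"))
    print(static_c_ref("a.b.c"))
```

The dynamic form depends on child positions, so it would have to be regenerated if refinement reorders or extends a child list.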

Using Python

Not enough thought here yet. Imagine the power, though, if combined with analyzers.


Using a native ASCEND language for METHODS

Today's methods could keep their syntax and simply be converted (on a fully instantiable model) to equivalent native methods in any requested language for which there is a filter. Otherwise, we have to keep hauling around an interpreter and debugger for our methods. I would prefer to deprecate them, along with much of the accompanying cruft that can be better done directly with new 'native' tri-semicolon methods.


'procedures'

Get back to the ASCEND3 simple assignments only scheme. Anything else leads to madness best resolved in tri-semicolon code.


Proofs of concept

Yacc/Lex are completely out of gas when it comes to parser maintainability, extensibility, and AST management. Extensibility is a key feature for ASCEND. The only other alternatives going are GLR, ANTLR, and boost-spirit. Boost-spirit means C++, and C++ errors are still hopelessly cryptic. ANTLR has an astounding tool stack used heavily in the commercial and scientific computing worlds, including a grammar IDE and grammars for C, Fortran, Java, and the C preprocessor. GLR we don't know enough about.

Thus, two grammars and a set of tree-walkers are required for a satisfying language prototype.


  • antlr parser of ascend4.
  • antlr parser of ascend5.
  • tree walker that takes ascend4 parse tree in and puts out equivalent ascend5 tree for all legal ascend4.
  • tree walker that takes ascend5 parse tree in and puts out equivalent libascend (C or python?) object for any instantiable type.

The key to understanding the approach advocated here is that antlr parsers generate (for free) an abstract syntax tree (AST) which is essentially our current 'TypeDescription' machinery. Lots (if not all) of the manual goop going on in our yacc 'actions' disappears. The AST can then be passed to a series of ANTLR-described filters for type/symbol resolution and semantic validation. The end result of a validation can then be passed to a tree walker (another antlr filter) which follows the type tree to generate an instance tree equivalent to current ascend4 compiler Pass1 where all MODEL, set, and scalar/array instance structures are known. This 'partial' instance tree is sufficient to discover the implicit types of all the models and generate non-redundant code in C, python, etc. The model thus becomes exactly equivalent to any other traditionally compiled language in performance, including access to GPUs, GUIs, etc.
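The pipeline described above can be sketched without ANTLR by letting plain dictionaries stand in for the AST. Everything here is a hypothetical illustration: the `TYPES` layout, the `validate` pass (symbol resolution only), and the `instantiate` walker that expands a type into a partial instance tree. The walker assumes type definitions are not recursive.

```python
# Stand-in for the AST of two model types: 'a' contains a submodel of type 'b'.
TYPES = {
    "b": {"vars": {"p": "real"}, "parts": {}},
    "a": {"vars": {"y": "real"}, "parts": {"bpart": "b"}},
}


def validate(types):
    """Symbol resolution: every submodel must name a defined type."""
    errors = []
    for type_name, definition in types.items():
        for part, part_type in definition["parts"].items():
            if part_type not in types:
                errors.append(f"{type_name}.{part}: unknown type {part_type}")
    return errors


def instantiate(types, type_name):
    """Walk the type tree and build a partial instance tree."""
    definition = types[type_name]
    instance = {"type": type_name, "children": {}}
    for var, var_type in definition["vars"].items():
        instance["children"][var] = {"type": var_type, "children": {}}
    for part, part_type in definition["parts"].items():
        instance["children"][part] = instantiate(types, part_type)
    return instance


def dotted_names(instance, prefix=""):
    """List every qualified name in the instance tree."""
    names = []
    for name, child in instance["children"].items():
        full = prefix + name
        names.append(full)
        names.extend(dotted_names(child, full + "."))
    return names


if __name__ == "__main__":
    assert validate(TYPES) == []
    print(sorted(dotted_names(instantiate(TYPES, "a"))))
```

A code generator for C or Python would take the resulting tree as input, so each stage stays a separate pass over a shared structure.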


Sample parser

John Pye (JP) has done a little bit of a first cut on a possible new ASCEND grammar, sandboxes/trunk/antlr/mygrammar.g (open it with ANTLRWorks). The idea was to parse as much as possible, including statements that are given in the wrong place, and to then use suitable error messages to advise the user if the context of their statement is disallowed. Lower-case keywords are used, and operators were simplified a bit. This idea still implements a METHODS language, and has opted for a nice IF..ELSE syntax for conditionals (eliminating WHEN and CONDITIONAL) but otherwise, doesn't bring very much that's new.

An example of a file that's supposed to parse using the above grammar is sandboxes/trunk/antlr/testmodel.a5c

JP worked out a framework for relations, expressions, function evaluations (my take on external relations), METHODs (yes, still thinking of keeping those around), declarations, model, and importing of other models and code.

Some ideas that seemed to make sense:


  • namespace implementation for imported stuff, as per Python. "import as"...?
  • 'import' could replace both REQUIRE and IMPORT.
  • external relations should be replaced by external functions that can be poked into any expression or conditional.
  • relations, assignments, and comparisons should all use the same '=' operator.
  • boolean comparison could be forced using 'bool(...)' casting.
  • IF..ELSE statements

Things that will need more thought:


  • when external functions return multiple variables, how to do the syntax... thinking that [var, var,...] lists would be appropriate.
  • haven't given any thought to dealing with arrays yet
  • was trying to think about METHOD declarations that could require parameters as inputs, for use from other methods. Am too naive to know how much work would be involved in permitting declaration of local variables within METHODs
  • haven't yet attempted FOR-loops.
  • parametric MODELs need lots more thought as well
  • what to do with constants?
  • what to do about meta-model data like NOTES, icon info, solve directives, etc?

We would hope to replace the current compiler with a sequence of AST tree-walkers: checking the context of things, checking and assigning identifiers, etc.


First cut at ASCEND4 parser using ANTLR

Here's my first cut at this. No actions, just a grammar. No syntax errors, but not tested/debugged on real model files yet -- User:Jpye