Syntactics -- The Surface of Programming Languages

The surface structure of a (programming) language determines its look & feel.

Analysis Requirements & Principles architecture:
Paradigms
substance:
Abstractions
structure:
Domains
building blocks: Features surface:
Syntactics
Defining PLs Language
List
foundational safety flexibilized

  1. Syntactic Variation
  2. Morphologic Variation --> see also special page: spelling of names
  3. Syntactic Sugars
  4. Code Layout Features

 
If you know further references, please don't hesitate to tell me (Ulf Schünemann), likewise if you have any suggestions.

Quick links to the syntactic definition of some languages (some PL grammars): Ada95, Java, Haskell, SML. Modula2.

TODO: Main directions of programm representation:


Syntactic Variation

1. Conditional

Abstract syntax: a conditional ("if p holds then a, otherwise b") consists of a condition p and two alternatives a and b.

Concrete syntax:
statements expressions (+/- side-effects)
As a flow chart (1946/47)
Sorry for this crude character graphic
      true
      .---> [a]---v
---> <p>          +--->
      `---> [b]---^
      false 
In Lisp (1958) (if (p) (a) (b))
In Algol-60 (1960) if p then a else b
In BCPL (end of 1960s) [ref] test p then a else b
if p then a
(in certain cases then may be omitted)
p-> a, b
In C (1971/2/3) if(p) a else b p? a : b
In ML (1978) if p then a else b
In Smalltalk80 (1980) (p) ifTrue: a; ifFalse: b

2. Application

Abstract syntax: an application consists of an operator-expression F (in particular, an operator name f) and 1 to n operand-expressions or arguments Ai.

Concrete syntax:
notation schema examples occurance
Function style: F(A1, ..., An) pow(2,3) mathematics, Algol-family
Cambridge Polish [Scott] F A1 ... An (pow 2 3), (+ x y), (set! x y) lambda-calculus, Lisp (always surrounded by parantheses), ML, Haskell
Prefix (unary Cambridge Polish):
f A1 -x, ¬true, ++x mathematics, logic, Algol-family
Postfix (unary post-A1)A1 f x!, x^ (pointer dereferencing), x++ mathematics, Pascal, C-family
Infix (binary post-A1)A1 f A2
in Ada syntactic sugar for function-style "f"(A1,A2)
in C++ syntactic sugar for operatorf(A1,A2) or for A1.operatorf(A2)
x + y, x < y, x & y, x mod y, x->y, (x := y - "application" of "update-operator"?) mathematics, logic, Algol-family
Selection style A1.f(A2, ..., An) x.add(y) Simula-family (Eiffel, C++, Java, ...)
Mixfix styles: preposition style myBox displayOn: myScreen at: 100@50
HouseholdFinances spend: 12+20 on: 'food'
ages at: 'Brett' put: 3
Smalltalk's keyword messages
conjunction style if x then y else z Algol, ML, Haskell
circumfix (unary) |x| mathematics
A2-circumfix (binary)x[y] Algol-family
A2-circumfix (ternary)x ? y : zC-family (C, C++, Java)

Precedence levels (and associativity) regulate how to parse nested expressions cotaining infix, and/or pre- and postfix operators. Infix operators have associated predicende levels and associativity (left/right) to implicitly structure complex expressions. The design question is whether at all and how many different (classes of) operators there should be in the language.

  1. no operators (Lisp/Scheme)
  2. built-in operators (Pascal, Java)
  3. overloadable operators (C++)
  4. user-defined operators with user-defined precedence and associativity (ML, Haskell)
Here an integrated precedence table from Fortran, C, and Ada:
Fortran C Ada
arithmetic** post++, --, ** abs not
pre++, --, unary+, -, &, * (ref/deref), !, ~ (logical/bitwise not)
*, / *, /, % *, /, mod, rem
unary+, -
+, - +, - +, -, & (concatentation)
<<, >> (shift)
relational.eq., .ne., .lt., .le., .gt., .ge. >, >=, <, <= =, /=, <, <=, >, >=
==, !=
bitwise   &  
^
|
logical .not.&& and, or, xor [or has 'and' higher prec?]
.and.
.or. ||
.eqv., .neqv.
conditional  ?:  
asssignment(?)   =, +=, /=, <<=, &=, ... (?)  
Argument on the design of Pascal's levels
«In the interest of simplicity and efficient translatability, Pascal aimed at a reasonably small number of operator precedence levels. Algol 60's hierarchy of 9 levels seemed clearly too baroque. An additional incentive for change was the replacement of the equivalence operator for Boolean expressions ["=" with three line] by the equality operator [normal "="]. Since these two operators reside on different priority levels in Algol 60, some departure from the old rules were mandatory. In retrospect, however, the decision seems ill-advised, particularly with the growing significance of complicated Boolean expressions in connection with program verification.» [Niklaus Wirth: An Assessment of the P.L Pascal; SE 1(2) 1975]

3. Bracketing syntax

Variation Argument
"{...}" vs. keywords "{...}" is easier to learn, whereas "if...endif", "do...od", "function f...end f;" etc allows better error detection [Anatomy, 59ff]. Also "{...}" is easier to spot on skimming than names (visual); bracketing check support by text-editors more likely than with keywords [IMHO].

TODO

Assignments: Meaning Form
  • =, :=, :-
  • assignment as operator/expression with value,
  • multi-assignment ...

    Definitions: Meaning Form
    ...


    Morphologic Variation

    An example of morphologic variation: Comments.
    line terminated bracketed nesting
    In Lisp (1958) and assembly ; comment
    In Algol-60 (1960) comment comment ; ?
    In Basic (1965) rem comment
    In Prolog (1972) % comment /* comment */ ?
    In BCPL (end of 1960s) [ref] // comment
    || comment
    \\ comment
    /* comment */
    |* comment *|
    \* comment *\
    no nesting of equals (as in Pascal)
    In C (1971/2/3),
    line-comment since ISO-C, taken from C++
    // comment /* comment */ first closes all:
    /* ... /* ... /* ... ... */
    In Pascal (1972) (* comment *)
    { comment }
    no nesting of equals:
    (* ... { ... } ... *)
    { ... (* ... *) ... }
    In ML/SML (1978/83?) -- comment (* comment *) ?
    In Smalltalk80 (1980) (and earlier versions?) "comment" ?
    In Haskell -- comment {- comment -} ?
    Variation Argument
    line terminated vs. bracketed line terminated comments are less error-prone than bracketed comments [Anatomy, 62ff]

    Syntactic Sugars

    Much more could be said here ...

    [ST80] Adele Goldberg, David Robson: Smalltalk-80: The Language and its Implementation; Addison-Wesley; 1983.

    Syntactic sugars are surface structure constructs which can be macro-expressed by other constructs of the language (they are not orthogonal to the rest of the language).
    Sugar Description Argument
    a<b<=c  and similar combinations equivalent to a<b & b<=c
    e.g. in BCPL or Icon
    conciseness, common usage
    point free notation: function f = g in the context of, e.g., function g(x) = 2*x stands for function f(x) = g(x)
    E.g. in function-oriented languages Lisp, ML, Haskell
    conciseness
    different loop constructs in the same language ... the trend seem in PL design to reduce the variation in loop constructs
    message cascading
    as in Smalltalk-80
    Sends a cascade of messages to the same object:
    OrderedCollection new add:1; add:2; add:3

    Instead of
    temp <- OrderedCollection new.
    temp add:1.
    temp add:2.
    temp add:3

    cleaner [ST80]


    Code Layout Features

    C.f. code decomposition features
    Feature Property Argument
    Tabular notation
    vs
    linear notation

    Parnas tables represent relationships and functions in tabular form,

    and
    AND/OR tables represent boolean formulae (in disjunctive normal form).
    E.g. (p1 /\ not p3) \/ (p4) \/ (p2 /\ p3) =
    p1: T . .
    p2: . . T
    p3: F . T
    p4: . T .
    readability for documentation (in initial development, design reviews, maintenance ...) Linear: ``Standard mathematical notation works well for short formulas, but not for long ones. ... Hierachical structuring and modularity along are not sufficient. The problem is that the standard mathematical notation is, in principle, linear. This makes it poorly readable when many cases have to be considered.''
    Tabular: ``... the most readable definition [of a function with many cases] is that ... where the concept of a table is used.'' [ParnasTables]

    ``It is interesting that both DNGS and TCAS found tabular representations of information most useful for review and reasoning, as opposed to expressions used in MGS, SACEM, and some of the commercial cases.'' [FormalInInd]

    readability for non-programmers (as guarding conditions on state transitions) Linear (predicate calculus) ``Our TCAS [aircraft collision avoidance system] external reviewers (including avionics engineers, component engineers, airline representatives, and pilots), ... did not find this notation natural or reviewable.''
    Tabular: ``we decided to use a tabular representation [instead] ...'' [RSML]
    lexical nesting
    vs
    abstract nesting
    error-prone ``when a nested expression has too many [lexical] nesting levels, humans tend to become confused about the meaning of the code.'' ``Declarations are used ... to reduce the apparent nesting depth of expressions.'' ``Subroutines are used to minimize the depth of [lexical] nesting in a single program unit.'' [Anatomy, 200]
    let
    vs
    anonymous function with arguments
    error-prone The let-construct has better lexical coherence than applying an anonymous function to immedeatly following arguments. ``This lack of lexical coherence makes it awkward and error prone for a human to match up the names with the values ...'' [Anatomy, 30]

    Analysis Requirements & Principles Paradigms Abstractions Domains Features foundational safety flexible typing Syntactics Defining PLs Language List
    Ulf Schünemann 101101