blob: c0890ba98f3f4bda00c81ebf8a28786b54a65445 [file] [log] [blame]
.. index:: ! abstract syntax
Conventions
-----------
WebAssembly is a programming language that has multiple concrete representations
(its :ref:`binary format <binary>` and the :ref:`text format <text>`).
Both map to a common structure.
For conciseness, this structure is described in the form of an *abstract syntax*.
All parts of this specification are defined in terms of this abstract syntax.
.. index:: ! grammar notation, notation
single: abstract syntax; grammar
pair: abstract syntax; notation
.. _grammar:
Grammar Notation
~~~~~~~~~~~~~~~~
The following conventions are adopted in defining grammar rules for abstract syntax.
* Terminal symbols (atoms) are written in sans-serif font: :math:`\K{i32}, \K{end}`.
* Nonterminal symbols are written in italic font: :math:`\X{valtype}, \X{instr}`.
* :math:`A^n` is a sequence of :math:`n\geq 0` iterations of :math:`A`.
* :math:`A^\ast` is a possibly empty sequence of iterations of :math:`A`.
(This is a shorthand for :math:`A^n` used where :math:`n` is not relevant.)
* :math:`A^+` is a non-empty sequence of iterations of :math:`A`.
(This is a shorthand for :math:`A^n` where :math:`n \geq 1`.)
* :math:`A^?` is an optional occurrence of :math:`A`.
(This is a shorthand for :math:`A^n` where :math:`n \leq 1`.)
* Productions are written :math:`\X{sym} ::= A_1 ~|~ \dots ~|~ A_n`.
* Large productions may be split into multiple definitions, indicated by ending the first one with explicit ellipses, :math:`\X{sym} ::= A_1 ~|~ \dots`, and starting continuations with ellipses, :math:`\X{sym} ::= \dots ~|~ A_2`.
* Some productions are augmented with side conditions in parentheses, ":math:`(\iff \X{condition})`", that provide a shorthand for a combinatorial expansion of the production into many separate cases.
.. _notation-epsilon:
.. _notation-length:
.. _notation-index:
.. _notation-slice:
.. _notation-replace:
.. _notation-record:
.. _notation-project:
.. _notation-concat:
.. _notation-compose:
Auxiliary Notation
~~~~~~~~~~~~~~~~~~
When dealing with syntactic constructs the following notation is also used:
* :math:`\epsilon` denotes the empty sequence.
* :math:`|s|` denotes the length of a sequence :math:`s`.
* :math:`s[i]` denotes the :math:`i`-th element of a sequence :math:`s`, starting from :math:`0`.
* :math:`s[i \slice n]` denotes the sub-sequence :math:`s[i]~\dots~s[i+n-1]` of a sequence :math:`s`.
* :math:`s \with [i] = A` denotes the same sequence as :math:`s`,
except that the :math:`i`-th element is replaced with :math:`A`.
* :math:`s \with [i \slice n] = A^n` denotes the same sequence as :math:`s`,
except that the sub-sequence :math:`s[i \slice n]` is replaced with :math:`A^n`.
* :math:`\concat(s^\ast)` denotes the flat sequence formed by concatenating all sequences :math:`s_i` in :math:`s^\ast`.
Moreover, the following conventions are employed:
* The notation :math:`x^n`, where :math:`x` is a non-terminal symbol, is treated as a meta variable ranging over respective sequences of :math:`x` (similarly for :math:`x^\ast`, :math:`x^+`, :math:`x^?`).
* When given a sequence :math:`x^n`,
then the occurrences of :math:`x` in a sequence written :math:`(A_1~x~A_2)^n` are assumed to be in point-wise correspondence with :math:`x^n`
(similarly for :math:`x^\ast`, :math:`x^+`, :math:`x^?`).
This implicitly expresses a form of mapping syntactic constructions over a sequence.
Productions of the following form are interpreted as *records* that map a fixed set of fields :math:`\K{field}_i` to "values" :math:`A_i`, respectively:
.. math::
\X{r} ~::=~ \{ \K{field}_1~A_1, \K{field}_2~A_2, \dots \}
The following notation is adopted for manipulating such records:
* :math:`r.\K{field}` denotes the contents of the :math:`\K{field}` component of :math:`r`.
* :math:`r \with \K{field} = A` denotes the same record as :math:`r`,
except that the contents of the :math:`\K{field}` component is replaced with :math:`A`.
* :math:`r_1 \compose r_2` denotes the composition of two records with the same fields of sequences by appending each sequence point-wise:
.. math::
\{ \K{field}_1\,A_1^\ast, \K{field}_2\,A_2^\ast, \dots \} \compose \{ \K{field}_1\,B_1^\ast, \K{field}_2\,B_2^\ast, \dots \} = \{ \K{field}_1\,A_1^\ast~B_1^\ast, \K{field}_2\,A_2^\ast~B_2^\ast, \dots \}
* :math:`\bigcompose r^\ast` denotes the composition of a sequence of records, respectively; if the sequence is empty, then all fields of the resulting record are empty.
The update notation for sequences and records generalizes recursively to nested components accessed by "paths" :math:`\X{pth} ::= ([\dots] \;| \;.\K{field})^+`:
* :math:`s \with [i]\,\X{pth} = A` is short for :math:`s \with [i] = (s[i] \with \X{pth} = A)`.
* :math:`r \with \K{field}\,\X{pth} = A` is short for :math:`r \with \K{field} = (r.\K{field} \with \X{pth} = A)`.
where :math:`r \with~.\K{field} = A` is shortened to :math:`r \with \K{field} = A`.
.. index:: ! vector
pair: abstract syntax; vector
.. _syntax-vec:
Vectors
~~~~~~~
*Vectors* are bounded sequences of the form :math:`A^n` (or :math:`A^\ast`),
where the :math:`A` can either be values or complex constructions.
A vector can have at most :math:`2^{32}-1` elements.
.. math::
\begin{array}{lllll}
\production{vector} & \vec(A) &::=&
A^n
& (\iff n < 2^{32})\\
\end{array}