.. index:: ! text format, Unicode, UTF-8, S-expression, identifier, file extension, abstract syntax
Conventions
-----------
The textual format for WebAssembly :ref:`modules <module>` is a rendering of their :ref:`abstract syntax <syntax-module>` into |SExpressions|_.
Like the :ref:`binary format <binary>`, the text format is defined by an *attribute grammar*.
A text string is a well-formed description of a module if and only if it is generated by the grammar.
Each production of this grammar has at most one synthesized attribute: the abstract syntax that the respective character sequence expresses.
Thus, the attribute grammar implicitly defines a *parsing* function.
Some productions also take a :ref:`context <text-context>` as an inherited attribute
that records bound :ref:`identifiers <text-id>`.
With a few exceptions, the core of the text grammar closely mirrors the grammar of the abstract syntax.
However, it also defines a number of *abbreviations* that are "syntactic sugar" over the core syntax.
The recommended extension for files containing WebAssembly modules in text format is ":math:`\T{.wat}`".
Files with this extension are assumed to be encoded in UTF-8, as per |Unicode|_ (Section 2.5).
.. index:: grammar notation, notation, Unicode
single: text format; grammar
pair: text format; notation
.. _text-grammar:
Grammar
~~~~~~~
The following conventions are adopted in defining grammar rules of the text format.
They mirror the conventions used for :ref:`abstract syntax <grammar>` and for the :ref:`binary format <binary>`.
In order to distinguish symbols of the textual syntax from symbols of the abstract syntax, :math:`\mathtt{typewriter}` font is adopted for the former.
* Terminal symbols are either literal strings of characters enclosed in quotes
or expressed as |Unicode|_ code points: :math:`\text{module}`, :math:`\unicode{0A}`.
(All characters written literally are unambiguously drawn from the 7-bit |ASCII|_ subset of Unicode.)
* Nonterminal symbols are written in typewriter font: :math:`\T{valtype}, \T{instr}`.
* :math:`T^n` is a sequence of :math:`n\geq 0` iterations of :math:`T`.
* :math:`T^\ast` is a possibly empty sequence of iterations of :math:`T`.
(This is a shorthand for :math:`T^n` used where :math:`n` is not relevant.)
* :math:`T^+` is a sequence of one or more iterations of :math:`T`.
(This is a shorthand for :math:`T^n` where :math:`n \geq 1`.)
* :math:`T^?` is an optional occurrence of :math:`T`.
(This is a shorthand for :math:`T^n` where :math:`n \leq 1`.)
* :math:`x{:}T` denotes the same language as the nonterminal :math:`T`, but also binds the variable :math:`x` to the attribute synthesized for :math:`T`.
* Productions are written :math:`\T{sym} ::= T_1 \Rightarrow A_1 ~|~ \dots ~|~ T_n \Rightarrow A_n`, where each :math:`A_i` is the attribute that is synthesized for :math:`\T{sym}` in the given case, usually from attribute variables bound in :math:`T_i`.
* Some productions are augmented by side conditions in parentheses, which restrict the applicability of the production. They provide a shorthand for a combinatorial expansion of the production into many separate cases.
.. _text-syntactic:
* A distinction is made between *lexical* and *syntactic* productions. For the latter, arbitrary :ref:`white space <text-space>` is allowed in any place where the grammar contains spaces. The productions defining :ref:`lexical syntax <text-lexical>` and the syntax of :ref:`values <text-value>` are considered lexical; all others are syntactic.
.. note::
For example, the :ref:`textual grammar <text-valtype>` for :ref:`value types <syntax-valtype>` is given as follows:
.. math::
\begin{array}{llcll@{\qquad\qquad}l}
\production{value types} & \Tvaltype &::=&
\text{i32} &\Rightarrow& \I32 \\ &&|&
\text{i64} &\Rightarrow& \I64 \\ &&|&
\text{f32} &\Rightarrow& \F32 \\ &&|&
\text{f64} &\Rightarrow& \F64 \\
\end{array}
The :ref:`textual grammar <text-limits>` for :ref:`limits <syntax-limits>` is defined as follows:
.. math::
\begin{array}{llclll}
\production{limits} & \Tlimits &::=&
n{:}\Tu32 &\Rightarrow& \{ \LMIN~n, \LMAX~\epsilon \} \\ &&|&
n{:}\Tu32~~m{:}\Tu32 &\Rightarrow& \{ \LMIN~n, \LMAX~m \} \\
\end{array}
The variables :math:`n` and :math:`m` name the attributes of the respective |Tu32| nonterminals, which in this case are the actual :ref:`unsigned integers <syntax-uint>` that they parse into.
The attribute of the complete production then is the abstract syntax for the limit, expressed in terms of the former values.
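The reading of the limits grammar as a parsing function can be sketched in Python. This is only an illustration under simplifying assumptions: the input is taken to be already split into tokens, only plain decimal ``u32`` literals are handled, and the names ``parse_u32`` and ``parse_limits`` are hypothetical, not part of the specification.

.. code-block:: python

   def parse_u32(token):
       """Attribute of a u32 token: the unsigned integer it denotes.

       Simplification: only plain ASCII decimal literals are accepted.
       """
       if not (token.isascii() and token.isdigit()):
           raise ValueError(f"not a decimal u32: {token!r}")
       n = int(token)
       if n >= 2**32:
           raise ValueError(f"u32 out of range: {token!r}")
       return n


   def parse_limits(tokens):
       """limits ::= n:u32        => {min n, max eps}
                  |  n:u32 m:u32  => {min n, max m}
       """
       if len(tokens) == 1:
           n = parse_u32(tokens[0])
           return {"min": n, "max": None}  # None stands for epsilon
       if len(tokens) == 2:
           n = parse_u32(tokens[0])
           m = parse_u32(tokens[1])
           return {"min": n, "max": m}
       raise ValueError("limits expects one or two u32 tokens")

Each branch corresponds to one alternative of the production, and the returned record plays the role of the synthesized attribute.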
.. index:: ! abbreviations, rewrite rule
.. _text-abbreviations:
Abbreviations
~~~~~~~~~~~~~
In addition to the core grammar, which corresponds directly to the :ref:`abstract syntax <syntax>`, the textual syntax also defines a number of *abbreviations* that can be used for convenience and readability.
Abbreviations are defined by *rewrite rules* specifying their expansion into the core syntax:
.. math::
\X{abbreviation~syntax} \quad\equiv\quad \X{expanded~syntax}
These expansions are assumed to be applied, recursively and in order of appearance, before applying the core grammar rules to construct the abstract syntax.
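As a rough illustration of such a rewrite step, the following Python sketch expands one abbreviation before any further processing. It assumes S-expressions are represented as nested Python lists of strings, and it uses the multi-type ``param`` shorthand as the example rule; that rule is defined elsewhere in the text format, not in this section, and the helper names are hypothetical.

.. code-block:: python

   def _has_id(node):
       """True if the node binds an identifier, e.g. (param $x i32)."""
       return len(node) > 1 and isinstance(node[1], str) and node[1].startswith("$")


   def expand(sexpr):
       """Recursively rewrite (param t1 ... tn) into (param t1) ... (param tn).

       The rewrite is applied to the children of a list, in order of
       appearance, because one abbreviated node may expand into several.
       """
       if not isinstance(sexpr, list):
           return sexpr  # atoms are left unchanged
       out = []
       for child in sexpr:
           child = expand(child)
           if isinstance(child, list) and child[:1] == ["param"] and not _has_id(child):
               # One anonymous parameter declaration per value type.
               out.extend(["param", t] for t in child[1:])
           else:
               out.append(child)
       return out

For instance, ``expand(["func", "$f", ["param", "i32", "i64"]])`` yields a ``func`` node with two separate single-type ``param`` children, which is the form the core grammar is then applied to.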
.. index:: ! identifier context, identifier, index, index space
.. _text-context-wf:
.. _text-context:
Contexts
~~~~~~~~
The text format allows the use of symbolic :ref:`identifiers <text-id>` in place of :ref:`indices <syntax-index>`.
To resolve these identifiers into concrete indices,
some grammar productions are indexed by an *identifier context* :math:`I` as a synthesized attribute that records the declared identifiers in each :ref:`index space <syntax-index>`.
In addition, the context records the types defined in the module, so that :ref:`parameter <text-param>` indices can be computed for :ref:`functions <text-func>`.
It is convenient to define identifier contexts as :ref:`records <notation-record>` :math:`I` with abstract syntax as follows:
.. math::
\begin{array}{llll}
\production{(identifier context)} & I &::=&
\begin{array}[t]{l@{~}ll}
\{ & \ITYPES & (\Tid^?)^\ast, \\
& \IFUNCS & (\Tid^?)^\ast, \\
& \ITABLES & (\Tid^?)^\ast, \\
& \IMEMS & (\Tid^?)^\ast, \\
& \IGLOBALS & (\Tid^?)^\ast, \\
& \ILOCALS & (\Tid^?)^\ast, \\
& \ILABELS & (\Tid^?)^\ast, \\
& \ITYPEDEFS & \functype^\ast ~\} \\
\end{array}
\end{array}
For each index space, such a context contains the list of :ref:`identifiers <text-id>` assigned to the defined indices.
Unnamed indices are associated with empty (:math:`\epsilon`) entries in these lists.
An identifier context is *well-formed* if no index space contains duplicate identifiers.
Conventions
...........
To avoid unnecessary clutter, empty components are omitted when writing out identifier contexts.
For example, the record :math:`\{\}` is shorthand for an :ref:`identifier context <text-context>` whose components are all empty.
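A minimal Python sketch of such a record may help make the definitions concrete. It assumes identifiers are plain strings, represents unnamed indices as ``None``, and uses lower-case field names as stand-ins for the record components; the function names are hypothetical and not part of the specification.

.. code-block:: python

   # One list of optional identifiers per index space.
   SPACES = ("types", "funcs", "tables", "mems", "globals", "locals", "labels")


   def empty_context():
       """The context written {} above: all components are empty."""
       ctx = {space: [] for space in SPACES}
       ctx["typedefs"] = []  # function types defined in the module
       return ctx


   def is_well_formed(ctx):
       """No index space may contain the same identifier twice."""
       for space in SPACES:
           names = [name for name in ctx[space] if name is not None]
           if len(names) != len(set(names)):
               return False
       return True


   def resolve(ctx, space, ident):
       """Map an identifier to its index; raises ValueError if unbound."""
       return ctx[space].index(ident)

With ``ctx["funcs"] = ["$f", None, "$g"]``, the second function is unnamed and ``resolve(ctx, "funcs", "$g")`` gives the index of ``$g`` within the function index space.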
.. index:: vector
pair: text format; vector
.. _text-vec:
Vectors
~~~~~~~
:ref:`Vectors <syntax-vec>` are written as plain sequences, but with a restriction on the length of these sequences.
.. math::
\begin{array}{llclll@{\qquad\qquad}l}
\production{vector} & \Tvec(\T{A}) &::=&
(x{:}\T{A})^n &\Rightarrow& x^n & (\iff n < 2^{32}) \\
\end{array}
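The side condition can be read as a length check performed alongside element parsing. The following Python sketch assumes the elements arrive as a list of tokens and takes the element parser as an argument; ``parse_vec`` and its ``limit`` parameter are hypothetical names introduced only for illustration.

.. code-block:: python

   def parse_vec(parse_item, tokens, limit=2**32):
       """vec(A) ::= (x:A)^n => x^n, provided n < limit (2^32 by default)."""
       if not len(tokens) < limit:
           raise ValueError("vector too long")
       # The attribute of the vector is the sequence of element attributes.
       return [parse_item(token) for token in tokens]

The ``limit`` parameter exists only so that the rejection case can be exercised with small inputs; the grammar itself fixes the bound.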