| .. index:: ! text format, Unicode, UTF-8, S-expression, identifier, file extension, abstract syntax |
| |
| Conventions |
| ----------- |
| |
| The textual format for WebAssembly :ref:`modules <module>` is a rendering of their :ref:`abstract syntax <syntax-module>` into |SExpressions|_. |
| |
| Like the :ref:`binary format <binary>`, the text format is defined by an *attribute grammar*. |
| A text string is a well-formed description of a module if and only if it is generated by the grammar. |
| Each production of this grammar has at most one synthesized attribute: the abstract syntax that the respective character sequence expresses. |
| Thus, the attribute grammar implicitly defines a *parsing* function. |
| Some productions also take a :ref:`context <text-context>` as an inherited attribute |
| that records bound :ref:`identifiers <text-id>`. |
| |
| With a few exceptions, the core of the text grammar closely mirrors the grammar of the abstract syntax. |
| However, it also defines a number of *abbreviations* that are "syntactic sugar" over the core syntax. |
| |
| The recommended extension for files containing WebAssembly modules in text format is ":math:`\T{.wat}`". |
| Files with this extension are assumed to be encoded in UTF-8, as per |Unicode|_ (Section 2.5). |
| |
| |
| .. index:: grammar notation, notation, Unicode |
| single: text format; grammar |
| pair: text format; notation |
| .. _text-grammar: |
| |
| Grammar |
| ~~~~~~~ |
| |
| The following conventions are adopted in defining grammar rules of the text format. |
| They mirror the conventions used for :ref:`abstract syntax <grammar>` and for the :ref:`binary format <binary>`. |
| In order to distinguish symbols of the textual syntax from symbols of the abstract syntax, :math:`\mathtt{typewriter}` font is adopted for the former. |
| |
| * Terminal symbols are either literal strings of characters enclosed in quotes |
| or expressed as |Unicode|_ code points: :math:`\text{module}`, :math:`\unicode{0A}`. |
| (All characters written literally are unambiguously drawn from the 7-bit |ASCII|_ subset of Unicode.) |
| |
| * Nonterminal symbols are written in typewriter font: :math:`\T{valtype}, \T{instr}`. |
| |
| * :math:`T^n` is a sequence of :math:`n\geq 0` iterations of :math:`T`. |
| |
| * :math:`T^\ast` is a possibly empty sequence of iterations of :math:`T`. |
| (This is a shorthand for :math:`T^n` used where :math:`n` is not relevant.) |
| |
| * :math:`T^+` is a sequence of one or more iterations of :math:`T`. |
| (This is a shorthand for :math:`T^n` where :math:`n \geq 1`.) |
| |
| * :math:`T^?` is an optional occurrence of :math:`T`. |
| (This is a shorthand for :math:`T^n` where :math:`n \leq 1`.) |
| |
| * :math:`x{:}T` denotes the same language as the nonterminal :math:`T`, but also binds the variable :math:`x` to the attribute synthesized for :math:`T`. |
| |
| * Productions are written :math:`\T{sym} ::= T_1 \Rightarrow A_1 ~|~ \dots ~|~ T_n \Rightarrow A_n`, where each :math:`A_i` is the attribute that is synthesized for :math:`\T{sym}` in the given case, usually from attribute variables bound in :math:`T_i`. |
| |
| * Some productions are augmented by side conditions in parentheses, which restrict the applicability of the production. They provide a shorthand for a combinatorial expansion of the production into many separate cases. |
| |
| .. _text-syntactic: |
| |
| * A distinction is made between *lexical* and *syntactic* productions. For the latter, arbitrary :ref:`white space <text-space>` is allowed in any place where the grammar contains spaces. The productions defining :ref:`lexical syntax <text-lexical>` and the syntax of :ref:`values <text-value>` are considered lexical; all others are syntactic. |
| |
| .. note:: |
| For example, the :ref:`textual grammar <text-valtype>` for :ref:`value types <syntax-valtype>` is given as follows: |
| |
| .. math:: |
| \begin{array}{llcll@{\qquad\qquad}l} |
| \production{value types} & \Tvaltype &::=& |
| \text{i32} &\Rightarrow& \I32 \\ &&|& |
| \text{i64} &\Rightarrow& \I64 \\ &&|& |
| \text{f32} &\Rightarrow& \F32 \\ &&|& |
| \text{f64} &\Rightarrow& \F64 \\ |
| \end{array} |
| |
| The :ref:`textual grammar <text-limits>` for :ref:`limits <syntax-limits>` is defined as follows: |
| |
| .. math:: |
| \begin{array}{llclll} |
| \production{limits} & \Tlimits &::=& |
| n{:}\Tu32 &\Rightarrow& \{ \LMIN~n, \LMAX~\epsilon \} \\ &&|& |
| n{:}\Tu32~~m{:}\Tu32 &\Rightarrow& \{ \LMIN~n, \LMAX~m \} \\ |
| \end{array} |
| |
| The variables :math:`n` and :math:`m` name the attributes of the respective |Tu32| nonterminals, which in this case are the actual :ref:`unsigned integers <syntax-uint>` that these nonterminals parse into. |
| The attribute of the complete production then is the abstract syntax for the limit, expressed in terms of the former values. |
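
As an informal illustration of how such a production implicitly defines a parsing function, the limits grammar can be sketched in Python. The names ``parse_u32`` and ``parse_limits``, the token-list input, and the dictionary keys are assumptions made for this sketch and are not part of the specification; only plain decimal literals are handled.

```python
U32_MAX = 2**32 - 1


def parse_u32(token):
    """Sketch of the lexical production u32 (plain decimal digits only).

    The synthesized attribute is the integer value of the literal.
    """
    if not (token.isascii() and token.isdigit()):
        raise ValueError(f"not a decimal u32 literal: {token!r}")
    n = int(token)
    if n > U32_MAX:
        raise ValueError(f"u32 literal out of range: {token!r}")
    return n


def parse_limits(tokens):
    """Sketch of: limits ::= n:u32 => {min n, max empty}
                           | n:u32 m:u32 => {min n, max m}

    `tokens` is a list of already separated token strings (an assumption).
    An absent maximum (epsilon) is modelled as None.
    """
    if len(tokens) == 1:
        n = parse_u32(tokens[0])  # binds n to the attribute of u32
        return {"min": n, "max": None}
    if len(tokens) == 2:
        n = parse_u32(tokens[0])
        m = parse_u32(tokens[1])
        return {"min": n, "max": m}
    raise ValueError("limits expects one or two u32 literals")
```

Each branch corresponds to one alternative of the production, and the returned dictionary plays the role of the synthesized attribute.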
| |
| |
| .. index:: ! abbreviations, rewrite rule |
| .. _text-abbreviations: |
| |
| Abbreviations |
| ~~~~~~~~~~~~~ |
| |
| In addition to the core grammar, which corresponds directly to the :ref:`abstract syntax <syntax>`, the textual syntax also defines a number of *abbreviations* that can be used for convenience and readability. |
| |
| Abbreviations are defined by *rewrite rules* specifying their expansion into the core syntax: |
| |
| .. math:: |
| \X{abbreviation~syntax} \quad\equiv\quad \X{expanded~syntax} |
| |
| These expansions are assumed to be applied, recursively and in order of appearance, before applying the core grammar rules to construct the abstract syntax. |
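
For instance, the text format allows several anonymous parameters to share one declaration, so that ``(param i32 i64)`` abbreviates ``(param i32) (param i64)``. The following Python sketch applies this single rewrite rule recursively to S-expressions modelled as nested lists. The function name ``expand_params`` and the list representation are assumptions of this sketch, and no other abbreviation is handled.

```python
def expand_params(sexpr):
    """Rewrite every child of the form (param t1 t2 ...) into
    (param t1) (param t2) ... throughout a nested-list S-expression.

    Declarations that bind an identifier, such as (param $x i32),
    are already in core form and are left unchanged.
    """
    if not isinstance(sexpr, list):
        return sexpr  # atoms are returned as they are
    out = []
    for child in sexpr:
        child = expand_params(child)  # expand nested expressions first
        if isinstance(child, list) and child and child[0] == "param":
            operands = child[1:]
            only_atoms = all(isinstance(a, str) for a in operands)
            named = bool(operands) and only_atoms and operands[0].startswith("$")
            if only_atoms and not named:
                # One core declaration per value type, in order of appearance.
                out.extend(["param", t] for t in operands)
                continue
        out.append(child)
    return out
```

The expansion happens in the enclosing list, since one abbreviated declaration may turn into several sibling declarations (or none, for an empty ``(param)``).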
| |
| |
| .. index:: ! identifier context, identifier, index, index space |
| .. _text-context-wf: |
| .. _text-context: |
| |
| Contexts |
| ~~~~~~~~ |
| |
| The text format allows the use of symbolic :ref:`identifiers <text-id>` in place of :ref:`indices <syntax-index>`. |
| To resolve these identifiers into concrete indices, |
| some grammar productions are indexed by an *identifier context* :math:`I` as a synthesized attribute that records the declared identifiers in each :ref:`index space <syntax-index>`. |
| In addition, the context records the types defined in the module, so that :ref:`parameter <text-param>` indices can be computed for :ref:`functions <text-func>`. |
| |
| It is convenient to define identifier contexts as :ref:`records <notation-record>` :math:`I` with abstract syntax as follows: |
| |
| .. math:: |
| \begin{array}{llll} |
| \production{(identifier context)} & I &::=& |
| \begin{array}[t]{l@{~}ll} |
| \{ & \ITYPES & (\Tid^?)^\ast, \\ |
| & \IFUNCS & (\Tid^?)^\ast, \\ |
| & \ITABLES & (\Tid^?)^\ast, \\ |
| & \IMEMS & (\Tid^?)^\ast, \\ |
| & \IGLOBALS & (\Tid^?)^\ast, \\ |
| & \ILOCALS & (\Tid^?)^\ast, \\ |
| & \ILABELS & (\Tid^?)^\ast, \\ |
| & \ITYPEDEFS & \functype^\ast ~\} \\ |
| \end{array} |
| \end{array} |
| |
| For each index space, such a context contains the list of :ref:`identifiers <text-id>` assigned to the defined indices. |
| Unnamed indices are associated with empty (:math:`\epsilon`) entries in these lists. |
| |
| An identifier context is *well-formed* if no index space contains duplicate identifiers. |
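
The record and its well-formedness condition can be sketched in Python as follows. The dictionary layout, the use of ``None`` for :math:`\epsilon`, and the helper names ``empty_context``, ``is_well_formed`` and ``resolve`` are assumptions of this sketch rather than notation from the specification.

```python
# One identifier list per index space, mirroring the record fields.
SPACES = ("types", "funcs", "tables", "mems", "globals", "locals", "labels")


def empty_context():
    """Return a context whose components are all empty."""
    ctx = {space: [] for space in SPACES}
    ctx["typedefs"] = []  # function types defined in the module
    return ctx


def is_well_formed(ctx):
    """True if no index space contains the same identifier twice.

    Unnamed entries (None) may occur any number of times.
    """
    for space in SPACES:
        named = [ident for ident in ctx[space] if ident is not None]
        if len(named) != len(set(named)):
            return False
    return True


def resolve(ctx, space, ident):
    """Return the position of `ident` in the given identifier list.

    Raises ValueError if the identifier is not bound in that space.
    """
    return ctx[space].index(ident)
```

With ``ctx["funcs"] = ["$f", None, "$g"]``, the call ``resolve(ctx, "funcs", "$g")`` yields index 2, and the unnamed second function can only be referenced by its numeric index.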
| |
| |
| Conventions |
| ........... |
| |
| To avoid unnecessary clutter, empty components are omitted when writing out identifier contexts. |
| For example, the record :math:`\{\}` is shorthand for an :ref:`identifier context <text-context>` whose components are all empty. |
| |
| |
| .. index:: vector |
| pair: text format; vector |
| .. _text-vec: |
| |
| Vectors |
| ~~~~~~~ |
| |
| :ref:`Vectors <syntax-vec>` are written as plain sequences, but with a restriction on the length of these sequences. |
| |
| .. math:: |
| \begin{array}{llclll@{\qquad\qquad}l} |
| \production{vector} & \Tvec(\T{A}) &::=& |
| (x{:}\T{A})^n &\Rightarrow& x^n & (\iff n < 2^{32}) \\ |
| \end{array} |
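
The side condition on the length can be sketched in Python as follows, assuming a hypothetical helper ``parse_vec`` that applies an element parser to each item of an already separated sequence.

```python
MAX_VEC_LEN = 2**32  # a vector must have fewer than 2^32 elements


def parse_vec(parse_elem, items):
    """Sketch of vec(A) ::= (x:A)^n => x^n, provided n < 2^32.

    `parse_elem` synthesizes the attribute for one element; the result is
    the sequence of those attributes in order.
    """
    if not len(items) < MAX_VEC_LEN:
        raise ValueError("vector too long")
    return [parse_elem(item) for item in items]
```

For example, ``parse_vec(int, ["1", "2"])`` produces the list of the two parsed integers, and the empty sequence parses to the empty vector.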