
 .. index:: ! text format, Unicode, UTF-8, S-expression, identifier, file extension, abstract syntax

 Conventions
 -----------

 The textual format for WebAssembly :ref:`modules <module>` is a rendering of their :ref:`abstract syntax <syntax-module>` into |SExpressions|_.
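
 To make the S-expression shape concrete, the following is a minimal sketch of a reader that turns parenthesized text into nested lists. The function name ``read_sexpr`` is hypothetical, and the sketch deliberately ignores comments and string literals, which the real lexical grammar handles.

```python
def read_sexpr(text):
    """Parse one parenthesized S-expression into nested Python lists.

    Simplifying assumption: comments and string literals are not handled.
    """
    # Surround parentheses with spaces so that split() isolates them as tokens.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(parse())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # consume ")"
            return items
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        return tok  # an atom such as a keyword, identifier, or number

    result = parse()
    if pos != len(tokens):
        raise ValueError("trailing input after expression")
    return result


tree = read_sexpr("(module (func $f (param i32) (result i32) local.get 0))")
print(tree)
```

 Each parenthesized phrase becomes one list whose first element is its keyword, which is the structure the grammar productions then map to abstract syntax.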

 Like the :ref:`binary format <binary>`, the text format is defined by an *attribute grammar*.
 A text string is a well-formed description of a module if and only if it is generated by the grammar.
 Each production of this grammar has at most one synthesized attribute: the abstract syntax that the respective character sequence expresses.
 Thus, the attribute grammar implicitly defines a *parsing* function.
 Some productions also take a :ref:`context <text-context>` as an inherited attribute
 that records bound :ref:`identifiers <text-id>`.

 With a few exceptions, the core of the text grammar closely mirrors the grammar of the abstract syntax.
 However, it also defines a number of *abbreviations* that are "syntactic sugar" over the core syntax.

 The recommended extension for files containing WebAssembly modules in text format is ":math:`\T{.wat}`".
 Files with this extension are assumed to be encoded in UTF-8, as per |Unicode|_ (Section 2.5).


 .. index:: grammar notation, notation, Unicode
    single: text format; grammar
    pair: text format; notation
 .. _text-grammar:

 Grammar
 ~~~~~~~

 The following conventions are adopted in defining grammar rules of the text format.
 They mirror the conventions used for :ref:`abstract syntax <grammar>` and for the :ref:`binary format <binary>`.
 In order to distinguish symbols of the textual syntax from symbols of the abstract syntax, :math:`\mathtt{typewriter}` font is adopted for the former.

 * Terminal symbols are either literal strings of characters enclosed in quotes
   or expressed as |Unicode|_ code points: :math:`\text{module}`, :math:`\unicode{0A}`.
   (All characters written literally are unambiguously drawn from the 7-bit |ASCII|_ subset of Unicode.)

 * Nonterminal symbols are written in typewriter font: :math:`\T{valtype}, \T{instr}`.

 * :math:`T^n` is a sequence of :math:`n\geq 0` iterations of :math:`T`.

 * :math:`T^\ast` is a possibly empty sequence of iterations of :math:`T`.
   (This is a shorthand for :math:`T^n` used where :math:`n` is not relevant.)

 * :math:`T^+` is a sequence of one or more iterations of :math:`T`.
   (This is a shorthand for :math:`T^n` where :math:`n \geq 1`.)

 * :math:`T^?` is an optional occurrence of :math:`T`.
   (This is a shorthand for :math:`T^n` where :math:`n \leq 1`.)

 * :math:`x{:}T` denotes the same language as the nonterminal :math:`T`, but also binds the variable :math:`x` to the attribute synthesized for :math:`T`.

 * Productions are written :math:`\T{sym} ::= T_1 \Rightarrow A_1 ~|~ \dots ~|~ T_n \Rightarrow A_n`, where each :math:`A_i` is the attribute that is synthesized for :math:`\T{sym}` in the given case, usually from attribute variables bound in :math:`T_i`.

 * Some productions are augmented by side conditions in parentheses, which restrict the applicability of the production. They provide a shorthand for a combinatorial expansion of the production into many separate cases.

 .. _text-syntactic:

 * A distinction is made between *lexical* and *syntactic* productions. For the latter, arbitrary :ref:`white space <text-space>` is allowed in any place where the grammar contains spaces. The productions defining :ref:`lexical syntax <text-lexical>` and the syntax of :ref:`values <text-value>` are considered lexical; all others are syntactic.

 .. note::
    For example, the :ref:`textual grammar <text-valtype>` for :ref:`value types <syntax-valtype>` is given as follows:

    .. math::
      \begin{array}{llcll@{\qquad\qquad}l}
      \production{value types} & \Tvaltype &::=&
        \text{i32} &\Rightarrow& \I32 \\ &&|&
        \text{i64} &\Rightarrow& \I64 \\ &&|&
        \text{f32} &\Rightarrow& \F32 \\ &&|&
        \text{f64} &\Rightarrow& \F64 \\
      \end{array}

    The :ref:`textual grammar <text-limits>` for :ref:`limits <syntax-limits>` is defined as follows:

    .. math::
       \begin{array}{llclll}
       \production{limits} & \Tlimits &::=&
         n{:}\Tu32 &\Rightarrow& \{ \LMIN~n, \LMAX~\epsilon \} \\ &&|&
         n{:}\Tu32~~m{:}\Tu32 &\Rightarrow& \{ \LMIN~n, \LMAX~m \} \\
       \end{array}

    The variables :math:`n` and :math:`m` name the attributes of the respective |Tu32| nonterminals, which in this case are the actual :ref:`unsigned integers <syntax-uint>` that they parse into.
    The attribute of the complete production then is the abstract syntax for the limit, expressed in terms of the former values.
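
 The two example productions can be read as parsing functions that return the synthesized attribute. The following is a minimal sketch under stated assumptions: the function names and the dictionary representation of limits are hypothetical, and ``parse_u32`` accepts only plain decimal digits, whereas the full grammar for integers is richer.

```python
# Each terminal of the value type production maps to its abstract syntax.
VALTYPES = {"i32": "I32", "i64": "I64", "f32": "F32", "f64": "F64"}


def parse_valtype(token):
    """valtype ::= 'i32' => I32 | 'i64' => I64 | 'f32' => F32 | 'f64' => F64"""
    if token not in VALTYPES:
        raise ValueError(f"not a value type: {token!r}")
    return VALTYPES[token]


def parse_u32(token):
    """Parse an unsigned 32-bit integer (assumption: decimal digits only)."""
    if not (token.isascii() and token.isdigit()):
        raise ValueError(f"not a decimal integer: {token!r}")
    n = int(token)
    if n >= 2**32:
        raise ValueError(f"out of range for u32: {token!r}")
    return n


def parse_limits(tokens):
    """limits ::= n:u32 => {min n, max eps} | n:u32 m:u32 => {min n, max m}"""
    if len(tokens) == 1:
        n = parse_u32(tokens[0])
        return {"min": n, "max": None}  # None stands for the empty maximum
    if len(tokens) == 2:
        n = parse_u32(tokens[0])
        m = parse_u32(tokens[1])
        return {"min": n, "max": m}
    raise ValueError("limits take one or two integers")


print(parse_valtype("f64"))
print(parse_limits(["1"]))
print(parse_limits(["1", "16"]))
```

 The local variables ``n`` and ``m`` play the role of the bound attribute variables, and the returned value is the attribute of the whole production.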


 .. index:: ! abbreviations, rewrite rule
 .. _text-abbreviations:

 Abbreviations
 ~~~~~~~~~~~~~

 In addition to the core grammar, which corresponds directly to the :ref:`abstract syntax <syntax>`, the textual syntax also defines a number of *abbreviations* that can be used for convenience and readability.

 Abbreviations are defined by *rewrite rules* specifying their expansion into the core syntax:

 .. math::
    \X{abbreviation~syntax} \quad\equiv\quad \X{expanded~syntax}

 These expansions are assumed to be applied, recursively and in order of appearance, before applying the core grammar rules to construct the abstract syntax.
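
 As an illustration of a rewrite rule, the following sketch expands one abbreviation, the combined anonymous parameter declaration, where ``(param t1 t2)`` stands for ``(param t1) (param t2)``. It assumes the input has already been read into nested lists; the function name ``expand_params`` and that representation are hypothetical, and the sketch covers only this single rule.

```python
def expand_params(expr):
    """Recursively expand (param t1 t2 ...) into (param t1) (param t2) ...

    Named parameters such as (param $x i32) are not abbreviations and
    are left unchanged.
    """
    if not isinstance(expr, list):
        return expr  # atoms are never rewritten
    out = []
    for child in expr:
        child = expand_params(child)  # rewrite nested phrases first
        is_param = isinstance(child, list) and bool(child) and child[0] == "param"
        has_name = is_param and any(
            isinstance(t, str) and t.startswith("$") for t in child[1:]
        )
        if is_param and not has_name:
            # One core-syntax declaration per value type.
            out.extend(["param", t] for t in child[1:])
        else:
            out.append(child)
    return out


func = ["func", "$f", ["param", "i32", "i64"], ["param", "$x", "f32"]]
print(expand_params(func))
```

 After expansion only core syntax remains, so the core grammar rules can be applied without knowing about the abbreviation.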


 .. index:: ! identifier context, identifier, index, index space
 .. _text-context-wf:
 .. _text-context:

 Contexts
 ~~~~~~~~

 The text format allows the use of symbolic :ref:`identifiers <text-id>` in place of :ref:`indices <syntax-index>`.
 To resolve these identifiers into concrete indices,
 some grammar productions are indexed by an *identifier context* :math:`I` as a synthesized attribute that records the declared identifiers in each :ref:`index space <syntax-index>`.
 In addition, the context records the types defined in the module, so that :ref:`parameter <text-param>` indices can be computed for :ref:`functions <text-func>`.

 It is convenient to define identifier contexts as :ref:`records <notation-record>` :math:`I` with abstract syntax as follows:

 .. math::
    \begin{array}{llll}
    \production{(identifier context)} & I &::=&
      \begin{array}[t]{l@{~}ll}
      \{ & \ITYPES & (\Tid^?)^\ast, \\
         & \IFUNCS & (\Tid^?)^\ast, \\
         & \ITABLES & (\Tid^?)^\ast, \\
         & \IMEMS & (\Tid^?)^\ast, \\
         & \IGLOBALS & (\Tid^?)^\ast, \\
         & \ILOCALS & (\Tid^?)^\ast, \\
         & \ILABELS & (\Tid^?)^\ast, \\
         & \ITYPEDEFS & \functype^\ast ~\} \\
      \end{array}
    \end{array}

 For each index space, such a context contains the list of :ref:`identifiers <text-id>` assigned to the defined indices.
 Unnamed indices are associated with empty (:math:`\epsilon`) entries in these lists.

 An identifier context is *well-formed* if no index space contains duplicate identifiers.
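
 The record structure and the well-formedness condition can be sketched as follows. This is an illustrative model rather than a reference implementation: the dictionary layout and the helper names ``empty_context``, ``is_well_formed``, and ``resolve`` are assumptions, with ``None`` standing for an unnamed (:math:`\epsilon`) entry.

```python
INDEX_SPACES = ("types", "funcs", "tables", "mems", "globals", "locals", "labels")


def empty_context():
    """Return a context whose components are all empty."""
    ctx = {space: [] for space in INDEX_SPACES}
    ctx["typedefs"] = []  # function types defined in the module
    return ctx


def is_well_formed(ctx):
    """A context is well-formed if no index space repeats an identifier."""
    for space in INDEX_SPACES:
        names = [name for name in ctx[space] if name is not None]
        if len(names) != len(set(names)):
            return False
    return True


def resolve(ctx, space, ref):
    """Map an identifier to its index; numeric indices pass through."""
    if isinstance(ref, int):
        return ref
    if ref not in ctx[space]:
        raise ValueError(f"unknown identifier {ref!r} in {space}")
    return ctx[space].index(ref)


ctx = empty_context()
ctx["funcs"] = ["$init", None, "$main"]  # the second function is unnamed
print(resolve(ctx, "funcs", "$main"))
print(is_well_formed(ctx))
```

 The position of an identifier in its list is exactly the index it denotes, which is why unnamed entries must still occupy a slot.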


 Conventions
 ...........

 To avoid unnecessary clutter, empty components are omitted when writing out identifier contexts.
 For example, the record :math:`\{\}` is shorthand for an :ref:`identifier context <text-context>` whose components are all empty.


 .. index:: vector
    pair: text format; vector
 .. _text-vec:

 Vectors
 ~~~~~~~

 :ref:`Vectors <syntax-vec>` are written as plain sequences, but with a restriction on the length of these sequences.

 .. math::
    \begin{array}{llclll@{\qquad\qquad}l}
    \production{vector} & \Tvec(\T{A}) &::=&
      (x{:}\T{A})^n &\Rightarrow& x^n & (\iff n < 2^{32}) \\
    \end{array}
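
 The side condition on the vector production amounts to a length check applied before the elements are parsed. A minimal sketch, assuming the elements are already available as a list of tokens; ``parse_vec`` and its ``limit`` parameter are hypothetical, the latter exposed only so that the check can be exercised with small inputs.

```python
def parse_vec(parse_item, tokens, limit=2**32):
    """vec(A) ::= (x:A)^n => x^n, provided n < limit (2^32 in the grammar)."""
    if len(tokens) >= limit:
        raise ValueError("vector is too long")
    # The attribute of the vector is the sequence of element attributes.
    return [parse_item(tok) for tok in tokens]


print(parse_vec(int, ["1", "2", "3"]))
```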