blob: 8d383997738421827bdd1edc27873c15f19cc9c6 [file] [log] [blame]
.. index:: ! binary format, module, byte, file extension, abstract syntax
Conventions
-----------
The binary format for WebAssembly :ref:`modules <module>` is a dense linear *encoding* of their :ref:`abstract syntax <syntax-module>`.
[#compression]_
The format is defined by an *attribute grammar* whose only terminal symbols are :ref:`bytes <syntax-byte>`.
A byte sequence is a well-formed encoding of a module if and only if it is generated by the grammar.
Each production of this grammar has exactly one synthesized attribute: the abstract syntax that the respective byte sequence encodes.
Thus, the attribute grammar implicitly defines a *decoding* function
(i.e., a parsing function for the binary format).
Except for a few exceptions, the binary grammar closely mirrors the grammar of the abstract syntax.
.. note::
Some phrases of abstract syntax have multiple possible encodings in the binary format.
For example, numbers may be encoded as if they had optional leading zeros.
Implementations of decoders must support all possible alternatives;
implementations of encoders can pick any allowed encoding.
The recommended extension for files containing WebAssembly modules in binary format is ":math:`\T{.wasm}`"
and the recommended |MediaType|_ is ":math:`\T{application/wasm}`".
.. [#compression]
Additional encoding layers -- for example, introducing compression -- may be defined on top of the basic representation defined here.
However, such layers are outside the scope of the current specification.
.. index:: grammar notation, notation, byte
single: binary format; grammar
pair: binary format; notation
.. _binary-grammar:
Grammar
~~~~~~~
The following conventions are adopted in defining grammar rules for the binary format.
They mirror the conventions used for :ref:`abstract syntax <grammar>`.
In order to distinguish symbols of the binary syntax from symbols of the abstract syntax, :math:`\mathtt{typewriter}` font is adopted for the former.
* Terminal symbols are :ref:`bytes <syntax-byte>` expressed in hexadecimal notation: :math:`\hex{0F}`.
* Nonterminal symbols are written in typewriter font: :math:`\B{valtype}, \B{instr}`.
* :math:`B^n` is a sequence of :math:`n\geq 0` iterations of :math:`B`.
* :math:`B^\ast` is a possibly empty sequence of iterations of :math:`B`.
(This is a shorthand for :math:`B^n` used where :math:`n` is not relevant.)
* :math:`B^?` is an optional occurrence of :math:`B`.
(This is a shorthand for :math:`B^n` where :math:`n \leq 1`.)
* :math:`x{:}B` denotes the same language as the nonterminal :math:`B`, but also binds the variable :math:`x` to the attribute synthesized for :math:`B`.
* Productions are written :math:`\B{sym} ::= B_1 \Rightarrow A_1 ~|~ \dots ~|~ B_n \Rightarrow A_n`, where each :math:`A_i` is the attribute that is synthesized for :math:`\B{sym}` in the given case, usually from attribute variables bound in :math:`B_i`.
* Some productions are augmented by side conditions in parentheses, which restrict the applicability of the production. They provide a shorthand for a combinatorial expansion of the production into many separate cases.
.. note::
For example, the :ref:`binary grammar <binary-valtype>` for :ref:`value types <syntax-valtype>` is given as follows:
.. math::
\begin{array}{llcll@{\qquad\qquad}l}
\production{value types} & \Bvaltype &::=&
\hex{7F} &\Rightarrow& \I32 \\ &&|&
\hex{7E} &\Rightarrow& \I64 \\ &&|&
\hex{7D} &\Rightarrow& \F32 \\ &&|&
\hex{7C} &\Rightarrow& \F64 \\
\end{array}
Consequently, the byte :math:`\hex{7F}` encodes the type |I32|,
:math:`\hex{7E}` encodes the type |I64|, and so forth.
No other byte value is allowed as the encoding of a value type.
The :ref:`binary grammar <binary-limits>` for :ref:`limits <syntax-limits>` is defined as follows:
.. math::
\begin{array}{llclll}
\production{limits} & \Blimits &::=&
\hex{00}~~n{:}\Bu32 &\Rightarrow& \{ \LMIN~n, \LMAX~\epsilon \} \\ &&|&
\hex{01}~~n{:}\Bu32~~m{:}\Bu32 &\Rightarrow& \{ \LMIN~n, \LMAX~m \} \\
\end{array}
That is, a limits pair is encoded as either the byte :math:`\hex{00}` followed by the encoding of a |U32| value,
or the byte :math:`\hex{01}` followed by two such encodings.
The variables :math:`n` and :math:`m` name the attributes of the respective |Bu32| nonterminals, which in this case are the actual :ref:`unsigned integers <syntax-uint>` those decode into.
The attribute of the complete production then is the abstract syntax for the limit, expressed in terms of the former values.
.. _binary-notation:
Auxiliary Notation
~~~~~~~~~~~~~~~~~~
When dealing with binary encodings the following notation is also used:
* :math:`\epsilon` denotes the empty byte sequence.
* :math:`||B||` is the length of the byte sequence generated from the production :math:`B` in a derivation.
.. index:: vector
pair: binary format; vector
.. _binary-vec:
Vectors
~~~~~~~
:ref:`Vectors <syntax-vec>` are encoded with their |Bu32| length followed by the encoding of their element sequence.
.. math::
\begin{array}{llclll@{\qquad\qquad}l}
\production{vector} & \Bvec(\B{B}) &::=&
n{:}\Bu32~~(x{:}\B{B})^n &\Rightarrow& x^n \\
\end{array}