| .. index:: ! binary format, module, byte, file extension, abstract syntax |
| |
| Conventions |
| ----------- |
| |
| The binary format for WebAssembly :ref:`modules <module>` is a dense linear *encoding* of their :ref:`abstract syntax <syntax-module>`. |
| [#compression]_ |
| |
| The format is defined by an *attribute grammar* whose only terminal symbols are :ref:`bytes <syntax-byte>`. |
| A byte sequence is a well-formed encoding of a module if and only if it is generated by the grammar. |
| |
| Each production of this grammar has exactly one synthesized attribute: the abstract syntax that the respective byte sequence encodes. |
| Thus, the attribute grammar implicitly defines a *decoding* function |
| (i.e., a parsing function for the binary format). |
| |
| Except for a few exceptions, the binary grammar closely mirrors the grammar of the abstract syntax. |
| |
| .. note:: |
| Some phrases of abstract syntax have multiple possible encodings in the binary format. |
| For example, numbers may be encoded as if they had optional leading zeros. |
| Implementations of decoders must support all possible alternatives; |
| implementations of encoders can pick any allowed encoding. |
| |
| The recommended extension for files containing WebAssembly modules in binary format is ":math:`\T{.wasm}`" |
| and the recommended |MediaType|_ is ":math:`\T{application/wasm}`". |
| |
| .. [#compression] |
| Additional encoding layers -- for example, introducing compression -- may be defined on top of the basic representation defined here. |
| However, such layers are outside the scope of the current specification. |
| |
| |
| .. index:: grammar notation, notation, byte |
| single: binary format; grammar |
| pair: binary format; notation |
| .. _binary-grammar: |
| |
| Grammar |
| ~~~~~~~ |
| |
| The following conventions are adopted in defining grammar rules for the binary format. |
| They mirror the conventions used for :ref:`abstract syntax <grammar>`. |
| In order to distinguish symbols of the binary syntax from symbols of the abstract syntax, :math:`\mathtt{typewriter}` font is adopted for the former. |
| |
| * Terminal symbols are :ref:`bytes <syntax-byte>` expressed in hexadecimal notation: :math:`\hex{0F}`. |
| |
| * Nonterminal symbols are written in typewriter font: :math:`\B{valtype}, \B{instr}`. |
| |
| * :math:`B^n` is a sequence of :math:`n\geq 0` iterations of :math:`B`. |
| |
| * :math:`B^\ast` is a possibly empty sequence of iterations of :math:`B`. |
| (This is a shorthand for :math:`B^n` used where :math:`n` is not relevant.) |
| |
| * :math:`B^?` is an optional occurrence of :math:`B`. |
| (This is a shorthand for :math:`B^n` where :math:`n \leq 1`.) |
| |
| * :math:`x{:}B` denotes the same language as the nonterminal :math:`B`, but also binds the variable :math:`x` to the attribute synthesized for :math:`B`. |
| |
| * Productions are written :math:`\B{sym} ::= B_1 \Rightarrow A_1 ~|~ \dots ~|~ B_n \Rightarrow A_n`, where each :math:`A_i` is the attribute that is synthesized for :math:`\B{sym}` in the given case, usually from attribute variables bound in :math:`B_i`. |
| |
| * Some productions are augmented by side conditions in parentheses, which restrict the applicability of the production. They provide a shorthand for a combinatorial expansion of the production into many separate cases. |
| |
| .. note:: |
| For example, the :ref:`binary grammar <binary-valtype>` for :ref:`value types <syntax-valtype>` is given as follows: |
| |
| .. math:: |
| \begin{array}{llcll@{\qquad\qquad}l} |
| \production{value types} & \Bvaltype &::=& |
| \hex{7F} &\Rightarrow& \I32 \\ &&|& |
| \hex{7E} &\Rightarrow& \I64 \\ &&|& |
| \hex{7D} &\Rightarrow& \F32 \\ &&|& |
| \hex{7C} &\Rightarrow& \F64 \\ |
| \end{array} |
| |
| Consequently, the byte :math:`\hex{7F}` encodes the type |I32|, |
| :math:`\hex{7E}` encodes the type |I64|, and so forth. |
| No other byte value is allowed as the encoding of a value type. |
| |
| The :ref:`binary grammar <binary-limits>` for :ref:`limits <syntax-limits>` is defined as follows: |
| |
| .. math:: |
| \begin{array}{llclll} |
| \production{limits} & \Blimits &::=& |
| \hex{00}~~n{:}\Bu32 &\Rightarrow& \{ \LMIN~n, \LMAX~\epsilon \} \\ &&|& |
| \hex{01}~~n{:}\Bu32~~m{:}\Bu32 &\Rightarrow& \{ \LMIN~n, \LMAX~m \} \\ |
| \end{array} |
| |
| That is, a limits pair is encoded as either the byte :math:`\hex{00}` followed by the encoding of a |U32| value, |
| or the byte :math:`\hex{01}` followed by two such encodings. |
| The variables :math:`n` and :math:`m` name the attributes of the respective |Bu32| nonterminals, which in this case are the actual :ref:`unsigned integers <syntax-uint>` those decode into. |
| The attribute of the complete production then is the abstract syntax for the limit, expressed in terms of the former values. |
| |
| |
| .. _binary-notation: |
| |
| Auxiliary Notation |
| ~~~~~~~~~~~~~~~~~~ |
| |
| When dealing with binary encodings the following notation is also used: |
| |
| * :math:`\epsilon` denotes the empty byte sequence. |
| |
| * :math:`||B||` is the length of the byte sequence generated from the production :math:`B` in a derivation. |
| |
| |
| .. index:: vector |
| pair: binary format; vector |
| .. _binary-vec: |
| |
| Vectors |
| ~~~~~~~ |
| |
| :ref:`Vectors <syntax-vec>` are encoded with their |Bu32| length followed by the encoding of their element sequence. |
| |
| .. math:: |
| \begin{array}{llclll@{\qquad\qquad}l} |
| \production{vector} & \Bvec(\B{B}) &::=& |
| n{:}\Bu32~~(x{:}\B{B})^n &\Rightarrow& x^n \\ |
| \end{array} |