[spec] More precise Unicode terminology (#1002)
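
The spec's "code point" was always restricted to Unicode scalar values, that is, code points outside the surrogate range U+D800..U+DFFF, so this change renames the production to "character" and describes it as a scalar value. The sketch below is illustrative only: it mirrors the `char` and `name` productions from syntax/values.rst in Python, and the function names are hypothetical and appear nowhere in the spec.

```python
from typing import Sequence

# Largest Unicode code point; every char is a natural number n < 1114112.
MAX_CODE_POINT = 0x10FFFF


def is_scalar_value(n: int) -> bool:
    """Return True if n is a Unicode scalar value.

    Scalar values are U+0000..U+D7FF and U+E000..U+10FFFF, i.e. all
    code points except the surrogates U+D800..U+DFFF.
    """
    return 0 <= n <= 0xD7FF or 0xE000 <= n <= MAX_CODE_POINT


def is_valid_name(chars: Sequence[int]) -> bool:
    """Return True if chars satisfies the side condition on names:
    every element is a scalar value and the UTF-8 encoding of the
    whole sequence is shorter than 2^32 bytes.
    """
    if not all(is_scalar_value(n) for n in chars):
        return False
    encoded = "".join(chr(n) for n in chars).encode("utf-8")
    return len(encoded) < 2**32
```

For example, `is_scalar_value(0xD800)` is false even though U+D800 is a code point, which is the distinction the new wording makes explicit.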
diff --git a/document/core/appendix/embedding.rst b/document/core/appendix/embedding.rst
index 48b601b..9558bf1 100644
--- a/document/core/appendix/embedding.rst
+++ b/document/core/appendix/embedding.rst
@@ -82,10 +82,10 @@
.. index:: text format
.. _embed-parse-module:
-:math:`\F{parse\_module}(\codepoint^\ast) : \module ~|~ \error`
+:math:`\F{parse\_module}(\char^\ast) : \module ~|~ \error`
...............................................................
-1. If there exists a derivation for the :ref:`source <text-source>` :math:`\codepoint^\ast` as a :math:`\Tmodule` according to the :ref:`text grammar for modules <text-module>`, yielding a :ref:`module <syntax-module>` :math:`m`, then return :math:`m`.
+1. If there exists a derivation for the :ref:`source <text-source>` :math:`\char^\ast` as a :math:`\Tmodule` according to the :ref:`text grammar for modules <text-module>`, yielding a :ref:`module <syntax-module>` :math:`m`, then return :math:`m`.
2. Else, return :math:`\ERROR`.
diff --git a/document/core/appendix/implementation.rst b/document/core/appendix/implementation.rst
index 83d7d87..addd57c 100644
--- a/document/core/appendix/implementation.rst
+++ b/document/core/appendix/implementation.rst
@@ -24,7 +24,7 @@
Syntactic Limits
~~~~~~~~~~~~~~~~
-.. index:: abstract syntax, module, type, function, table, memory, global, element, data, import, export, parameter, result, local, structured control instruction, instruction, name, Unicode, code point
+.. index:: abstract syntax, module, type, function, table, memory, global, element, data, import, export, parameter, result, local, structured control instruction, instruction, name, Unicode, character
.. _impl-syntax:
Structure
@@ -52,7 +52,7 @@
* the length of an :ref:`element segment <syntax-elem>`
* the length of a :ref:`data segment <syntax-data>`
* the length of a :ref:`name <syntax-name>`
-* the range of :ref:`code points <syntax-codepoint>` in a :ref:`name <syntax-name>`
+* the range of :ref:`characters <syntax-char>` in a :ref:`name <syntax-name>`
If the limits of an implementation are exceeded for a given module,
then the implementation may reject the :ref:`validation <valid>`, compilation, or :ref:`instantiation <exec-instantiation>` of that module with an embedder-specific error.
@@ -91,7 +91,7 @@
* the size of an individual :ref:`token <text-token>`
* the nesting depth of :ref:`folded instructions <text-foldedinstr>`
* the length of symbolic :ref:`identifiers <text-id>`
-* the range of literal :ref:`characters <text-char>` (code points) allowed in the :ref:`source text <source>`
+* the range of literal :ref:`characters <text-char>` allowed in the :ref:`source text <source>`
.. index:: validation, function
diff --git a/document/core/binary/values.rst b/document/core/binary/values.rst
index 772309f..c626862 100644
--- a/document/core/binary/values.rst
+++ b/document/core/binary/values.rst
@@ -105,7 +105,7 @@
Names
~~~~~
-:ref:`Names <syntax-name>` are encoded as a :ref:`vector <binary-vec>` of bytes containing the |Unicode|_ (Section 3.9) UTF-8 encoding of the name's code point sequence.
+:ref:`Names <syntax-name>` are encoded as a :ref:`vector <binary-vec>` of bytes containing the |Unicode|_ (Section 3.9) UTF-8 encoding of the name's character sequence.
.. math::
\begin{array}{llclllll}
diff --git a/document/core/intro/introduction.rst b/document/core/intro/introduction.rst
index f952302..c190d3c 100644
--- a/document/core/intro/introduction.rst
+++ b/document/core/intro/introduction.rst
@@ -75,7 +75,7 @@
These will each define a WebAssembly *application programming interface (API)* suitable for a given environment.
-.. index:: IEEE 754, floating point, Unicode, name, text format, UTF-8, code point
+.. index:: IEEE 754, floating point, Unicode, name, text format, UTF-8, character
.. _dependencies:
Dependencies
@@ -88,7 +88,7 @@
* |Unicode|_, for the representation of import/export :ref:`names <syntax-name>` and the :ref:`text format <text>`.
However, to make this specification self-contained, relevant aspects of the aforementioned standards are defined and formalized as part of this specification,
-such as the :ref:`binary representation <aux-fbits>` and :ref:`rounding <aux-ieee>` of floating-point values, and the :ref:`value range <syntax-codepoint>` and :ref:`UTF-8 encoding <binary-utf8>` of Unicode characters.
+such as the :ref:`binary representation <aux-fbits>` and :ref:`rounding <aux-ieee>` of floating-point values, and the :ref:`value range <syntax-char>` and :ref:`UTF-8 encoding <binary-utf8>` of Unicode characters.
.. note::
The aforementioned standards are the authoritative source of all respective definitions.
diff --git a/document/core/syntax/values.rst b/document/core/syntax/values.rst
index 69385e6..7443842 100644
--- a/document/core/syntax/values.rst
+++ b/document/core/syntax/values.rst
@@ -146,21 +146,21 @@
* The meta variable :math:`z` ranges over floating-point values where clear from context.
-.. index:: ! name, byte, Unicode, UTF-8, code point, binary format
+.. index:: ! name, byte, Unicode, UTF-8, character, binary format
pair: abstract syntax; name
-.. _syntax-codepoint:
+.. _syntax-char:
.. _syntax-name:
Names
~~~~~
-*Names* are sequences of scalar *code points* as defined by |Unicode|_ (Section 2.4).
+*Names* are sequences of *characters*, which are *scalar values* as defined by |Unicode|_ (Section 2.4).
.. math::
\begin{array}{llclll}
\production{name} & \name &::=&
- \codepoint^\ast \qquad\qquad (\iff |\utf8(\codepoint^\ast)| < 2^{32}) \\
- \production{code point} & \codepoint &::=&
+ \char^\ast \qquad\qquad (\iff |\utf8(\char^\ast)| < 2^{32}) \\
+ \production{character} & \char &::=&
\unicode{00} ~|~ \dots ~|~ \unicode{D7FF} ~|~
\unicode{E000} ~|~ \dots ~|~ \unicode{10FFFF} \\
\end{array}
@@ -172,4 +172,4 @@
Convention
..........
-* Code points are sometimes used interchangeably with natural numbers :math:`n < 1114112`.
+* Characters (Unicode scalar values) are sometimes used interchangeably with natural numbers :math:`n < 1114112`.
diff --git a/document/core/text/conventions.rst b/document/core/text/conventions.rst
index 1d87c8c..c73e9c0 100644
--- a/document/core/text/conventions.rst
+++ b/document/core/text/conventions.rst
@@ -32,7 +32,7 @@
In order to distinguish symbols of the textual syntax from symbols of the abstract syntax, :math:`\mathtt{typewriter}` font is adopted for the former.
* Terminal symbols are either literal strings of characters enclosed in quotes
- or expressed as |Unicode|_ code points: :math:`\text{module}`, :math:`\unicode{0A}`.
+ or expressed as |Unicode|_ scalar values: :math:`\text{module}`, :math:`\unicode{0A}`.
(All characters written literally are unambiguously drawn from the 7-bit |ASCII|_ subset of Unicode.)
* Nonterminal symbols are written in typewriter font: :math:`\T{valtype}, \T{instr}`.
diff --git a/document/core/text/lexical.rst b/document/core/text/lexical.rst
index b4df167..a3b4529 100644
--- a/document/core/text/lexical.rst
+++ b/document/core/text/lexical.rst
@@ -5,7 +5,7 @@
--------------
-.. index:: ! character, Unicode, ASCII, code point, ! source text
+.. index:: ! character, Unicode, ASCII, ! source text
pair: text format; character
.. _source:
.. _text-source:
@@ -15,7 +15,7 @@
~~~~~~~~~~
The text format assigns meaning to *source text*, which consists of a sequence of *characters*.
-Characters are assumed to be represented as valid |Unicode|_ (Section 2.4) *code points*.
+Characters are assumed to be represented as valid |Unicode|_ (Section 2.4) *scalar values*.
.. math::
\begin{array}{llll}
diff --git a/document/core/text/values.rst b/document/core/text/values.rst
index 51be72e..d93dfba 100644
--- a/document/core/text/values.rst
+++ b/document/core/text/values.rst
@@ -179,7 +179,7 @@
\end{array}
-.. index:: name, byte, character, code point
+.. index:: name, byte, character
pair: text format; name
.. _text-name:
@@ -187,7 +187,7 @@
~~~~~
:ref:`Names <syntax-name>` are strings denoting a literal character sequence.
-A name string must form a valid UTF-8 encoding as defined by |Unicode|_ (Section 2.5) and is interpreted as a string of Unicode code points.
+A name string must form a valid UTF-8 encoding as defined by |Unicode|_ (Section 2.5) and is interpreted as a string of Unicode scalar values.
.. math::
\begin{array}{llclll@{\qquad}l}
diff --git a/document/core/util/macros.def b/document/core/util/macros.def
index 67d9b85..34d821c 100644
--- a/document/core/util/macros.def
+++ b/document/core/util/macros.def
@@ -150,7 +150,7 @@
.. |f64| mathdef:: \xref{syntax/values}{syntax-float}{\fX{\X{64}}}
.. |name| mathdef:: \xref{syntax/values}{syntax-name}{\X{name}}
-.. |codepoint| mathdef:: \xref{syntax/values}{syntax-name}{\X{codepoint}}
+.. |char| mathdef:: \xref{syntax/values}{syntax-name}{\X{char}}
.. Values, meta functions
@@ -434,7 +434,7 @@
.. |Bf64| mathdef:: \xref{binary/values}{binary-float}{\BfX{\B{64}}}
.. |Bname| mathdef:: \xref{binary/values}{binary-name}{\B{name}}
-.. |Bcodepoint| mathdef:: \xref{binary/values}{binary-name}{\B{codepoint}}
+.. |Bchar| mathdef:: \xref{binary/values}{binary-name}{\B{char}}
.. Values, meta functions
@@ -593,7 +593,7 @@
.. |Tstringelem| mathdef:: \xref{text/values}{text-string}{\T{stringelem}}
.. |Tstringchar| mathdef:: \xref{text/values}{text-string}{\T{stringchar}}
.. |Tname| mathdef:: \xref{text/values}{text-name}{\T{name}}
-.. |Tcodepoint| mathdef:: \xref{text/values}{text-name}{\T{codepoint}}
+.. |Tchar| mathdef:: \xref{text/values}{text-name}{\T{char}}
.. |Tcodeval| mathdef:: \xref{text/values}{text-name}{\T{codeval}}
.. |Tcodecont| mathdef:: \xref{text/values}{text-name}{\T{cont}}