.. index:: value
pair: text format; value
.. _text-value:
Values
------
The grammar productions in this section define *lexical syntax*,
hence no :ref:`white space <text-space>` is allowed.
.. index:: integer, unsigned integer, signed integer, uninterpreted integer
pair: text format; integer
pair: text format; unsigned integer
pair: text format; signed integer
pair: text format; uninterpreted integer
.. _text-sign:
.. _text-digit:
.. _text-hexdigit:
.. _text-num:
.. _text-hexnum:
.. _text-sint:
.. _text-uint:
.. _text-int:
Integers
~~~~~~~~
All :ref:`integers <syntax-int>` can be written in either decimal or hexadecimal notation.
In both cases, digits can optionally be separated by underscores.
.. math::
\begin{array}{llclll@{\qquad}l}
\production{sign} & \Tsign &::=&
\epsilon \Rightarrow {+} ~~|~~
\text{+} \Rightarrow {+} ~~|~~
\text{-} \Rightarrow {-} \\
\production{decimal digit} & \Tdigit &::=&
\text{0} \Rightarrow 0 ~~|~~ \dots ~~|~~ \text{9} \Rightarrow 9 \\
\production{hexadecimal digit} & \Thexdigit &::=&
d{:}\Tdigit \Rightarrow d \\ &&|&
\text{A} \Rightarrow 10 ~~|~~ \dots ~~|~~ \text{F} \Rightarrow 15 \\ &&|&
\text{a} \Rightarrow 10 ~~|~~ \dots ~~|~~ \text{f} \Rightarrow 15
\\[1ex]
\production{decimal number} & \Tnum &::=&
d{:}\Tdigit &\Rightarrow& d \\ &&|&
n{:}\Tnum~~\text{\_}^?~~d{:}\Tdigit &\Rightarrow& 10\cdot n + d \\
\production{hexadecimal number} & \Thexnum &::=&
h{:}\Thexdigit &\Rightarrow& h \\ &&|&
n{:}\Thexnum~~\text{\_}^?~~h{:}\Thexdigit &\Rightarrow& 16\cdot n + h \\
\end{array}
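The two number productions above can be read as a left-to-right fold over the digits. The following is a minimal Python sketch of that reading, not a reference implementation; the function name ``parse_num`` and its ``base`` parameter are hypothetical and chosen only for illustration.

```python
HEX_DIGITS = "0123456789abcdef"


def parse_num(text: str, base: int = 10) -> int:
    """Evaluate a decimal (base 10) or hexadecimal (base 16) number token.

    An underscore is accepted only between two digits, mirroring the
    optional `_` separator in the grammar.
    """
    if base not in (10, 16):
        raise ValueError("base must be 10 or 16")
    value = 0
    prev_was_digit = False
    for ch in text:
        if ch == "_":
            # A separator must directly follow a digit.
            if not prev_was_digit:
                raise ValueError(f"misplaced underscore in {text!r}")
            prev_was_digit = False
            continue
        d = HEX_DIGITS.find(ch.lower())
        if d < 0 or d >= base:
            raise ValueError(f"invalid digit {ch!r} in {text!r}")
        # Same accumulation as the grammar: base * n + d.
        value = base * value + d
        prev_was_digit = True
    if not prev_was_digit:
        # Rejects the empty string and a trailing underscore.
        raise ValueError(f"number must end in a digit: {text!r}")
    return value
```

Under this sketch, ``parse_num("1_000")`` evaluates to 1000, while ``"1__2"``, ``"_1"`` and ``"1_"`` are rejected.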
The allowed syntax for integer literals depends on size and signedness.
Moreover, their value must lie within the range of the respective type.
.. math::
\begin{array}{llclll@{\qquad}l}
\production{unsigned integer} & \TuN &::=&
n{:}\Tnum &\Rightarrow& n & (\iff n < 2^N) \\ &&|&
\text{0x}~~n{:}\Thexnum &\Rightarrow& n & (\iff n < 2^N) \\
\production{signed integer} & \TsN &::=&
{\pm}{:}\Tsign~~n{:}\Tnum &\Rightarrow& \pm n & (\iff -2^{N-1} \leq \pm n < 2^{N-1}) \\ &&|&
{\pm}{:}\Tsign~~\text{0x}~~n{:}\Thexnum &\Rightarrow& \pm n & (\iff -2^{N-1} \leq \pm n < 2^{N-1}) \\
\end{array}
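The side conditions above amount to a range check after the magnitude has been read. A hedged Python sketch follows; the helper names ``parse_uint``, ``parse_sint`` and ``_magnitude`` are assumptions made for this example, and regular expressions are used here only as a convenient way to validate the token shape.

```python
import re

# Digits with at most one underscore between neighbouring digits.
_DEC = re.compile(r"[0-9](?:_?[0-9])*")
_HEX = re.compile(r"0x([0-9A-Fa-f](?:_?[0-9A-Fa-f])*)")


def _magnitude(text: str) -> int:
    """Value of an unsigned decimal or 0x-prefixed hexadecimal token."""
    m = _HEX.fullmatch(text)
    if m:
        return int(m.group(1).replace("_", ""), 16)
    if _DEC.fullmatch(text):
        return int(text.replace("_", ""), 10)
    raise ValueError(f"not an integer literal: {text!r}")


def parse_uint(text: str, bits: int) -> int:
    """Unsigned integer of the given width: no sign, value below 2**bits."""
    n = _magnitude(text)
    if n >= 1 << bits:
        raise ValueError(f"{text!r} does not fit in u{bits}")
    return n


def parse_sint(text: str, bits: int) -> int:
    """Signed integer of the given width: optional sign, then range check."""
    sign = 1
    if text[:1] in ("+", "-"):
        sign = -1 if text[0] == "-" else 1
        text = text[1:]
    value = sign * _magnitude(text)
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError(f"value {value} does not fit in s{bits}")
    return value
```

For an 8-bit width, this sketch accepts ``-128`` and ``+0x7f`` as signed literals but rejects ``128``, and it accepts ``255`` but not ``256`` as an unsigned literal.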
:ref:`Uninterpreted integers <syntax-int>` can be written as either signed or unsigned, and are normalized to unsigned in the abstract syntax.
.. math::
\begin{array}{llclll@{\qquad\qquad}l}
\production{uninterpreted integers} & \TiN &::=&
n{:}\TuN &\Rightarrow& n \\ &&|&
i{:}\TsN &\Rightarrow& n & (\iff i = \signed(n)) \\
\end{array}
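The normalization to unsigned form can be illustrated as a reduction modulo :math:`2^N`. The sketch below assumes the literal has already been parsed to a Python integer; ``normalize_int`` is a hypothetical name used only for this example.

```python
def normalize_int(value: int, bits: int) -> int:
    """Map a signed or unsigned literal value to its unsigned representative.

    Accepts the union of the signed range [-2**(bits-1), 2**(bits-1)) and
    the unsigned range [0, 2**bits), and returns n in [0, 2**bits) such
    that interpreting n as two's complement yields the original value.
    """
    if not -(1 << (bits - 1)) <= value < (1 << bits):
        raise ValueError(f"{value} is out of range for i{bits}")
    # Python's % always returns a non-negative result for a positive modulus.
    return value % (1 << bits)
```

With this reading, the 32-bit literals ``-1`` and ``0xFFFFFFFF`` denote the same abstract value.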
.. index:: floating-point number
pair: text format; floating-point number
.. _text-frac:
.. _text-hexfrac:
.. _text-hexfloat:
.. _text-float:
Floating-Point
~~~~~~~~~~~~~~
:ref:`Floating-point <syntax-float>` values can be represented in either decimal or hexadecimal notation.
.. math::
\begin{array}{llclll@{\qquad\qquad}l}
\production{decimal floating-point fraction} & \Tfrac &::=&
\epsilon &\Rightarrow& 0 \\ &&|&
d{:}\Tdigit~~q{:}\Tfrac &\Rightarrow& (d+q)/10 \\ &&|&
d{:}\Tdigit~~\text{\_}~~p{:}\Tdigit~~q{:}\Tfrac &\Rightarrow& (d+(p+q)/10)/10 \\
\production{hexadecimal floating-point fraction} & \Thexfrac &::=&
\epsilon &\Rightarrow& 0 \\ &&|&
h{:}\Thexdigit~~q{:}\Thexfrac &\Rightarrow& (h+q)/16 \\ &&|&
h{:}\Thexdigit~~\text{\_}~~p{:}\Thexdigit~~q{:}\Thexfrac &\Rightarrow& (h+(p+q)/16)/16 \\
\production{decimal floating-point number} & \Tfloat &::=&
p{:}\Tnum~\text{.}~q{:}\Tfrac
&\Rightarrow& p+q \\ &&|&
p{:}\Tnum~(\text{E}~|~\text{e})~{\pm}{:}\Tsign~e{:}\Tnum
&\Rightarrow& p\cdot 10^{\pm e} \\ &&|&
p{:}\Tnum~\text{.}~q{:}\Tfrac~(\text{E}~|~\text{e})~{\pm}{:}\Tsign~e{:}\Tnum
&\Rightarrow& (p+q)\cdot 10^{\pm e} \\
\production{hexadecimal floating-point number} & \Thexfloat &::=&
\text{0x}~p{:}\Thexnum~\text{.}~q{:}\Thexfrac
&\Rightarrow& p+q \\ &&|&
\text{0x}~p{:}\Thexnum~(\text{P}~|~\text{p})~{\pm}{:}\Tsign~e{:}\Tnum
&\Rightarrow& p\cdot 2^{\pm e} \\ &&|&
\text{0x}~p{:}\Thexnum~\text{.}~q{:}\Thexfrac~(\text{P}~|~\text{p})~{\pm}{:}\Tsign~e{:}\Tnum
&\Rightarrow& (p+q)\cdot 2^{\pm e}
\end{array}
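As a worked instance of these productions, consider the literal ``1.2_5e2`` (chosen here purely for illustration). Its fraction ``2_5`` matches the third alternative of the decimal fraction production with :math:`d = 2`, :math:`p = 5`, and an empty remainder :math:`q = 0`:

.. math::
   \begin{array}{lcl}
   \text{2\_5} &\Rightarrow& (2 + (5 + 0)/10)/10 ~=~ 0.25 \\
   \text{1.2\_5e2} &\Rightarrow& (1 + 0.25) \cdot 10^{+2} ~=~ 125 \\
   \end{array}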
The value of a literal must not lie outside the representable range of the corresponding |IEEE754|_ type
(that is, a numeric value must not overflow to :math:`\pm\mbox{infinity}`),
but it may be :ref:`rounded <aux-ieee>` to the nearest representable value.
.. note::
Rounding can be prevented by using hexadecimal notation with no more significant bits than supported by the required type.
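Whether a given literal value would be rounded can be checked mechanically. The sketch below does so for the 32-bit format, assuming the exact value of the literal is already available as a ``Fraction``; the name ``is_exact_f32`` is hypothetical, and the round trip through ``struct`` is just one convenient way to obtain the nearest 32-bit value in Python.

```python
import struct
from fractions import Fraction


def is_exact_f32(value: Fraction) -> bool:
    """Return True if `value` is representable as a 32-bit float unchanged."""
    try:
        # Packing with "f" rounds to a 32-bit float; unpacking reads it back.
        (rounded,) = struct.unpack("<f", struct.pack("<f", float(value)))
    except OverflowError:
        # The value is outside the finite range of the format.
        return False
    # Fraction(float) is exact, so this compares the true values.
    return Fraction(rounded) == value
```

For example, the value of ``0x1.000002p0`` (:math:`1 + 2^{-23}`) passes this check, whereas the values of ``0x1.000001p0`` (:math:`1 + 2^{-24}`) and of the decimal literal ``0.1`` do not.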
Floating-point values may also be written as constants for *infinity* or *canonical NaN* (*not a number*).
Furthermore, arbitrary NaN values may be expressed by providing an explicit payload value.
.. math::
\begin{array}{llclll@{\qquad\qquad}l}
\production{floating-point value} & \TfN &::=&
{\pm}{:}\Tsign~z{:}\TfNmag &\Rightarrow& \pm z \\
\production{floating-point magnitude} & \TfNmag &::=&
z{:}\Tfloat &\Rightarrow& \ieee_N(z) & (\iff \ieee_N(z) \neq \pm \infty) \\ &&|&
z{:}\Thexfloat &\Rightarrow& \ieee_N(z) & (\iff \ieee_N(z) \neq \pm \infty) \\ &&|&
\text{inf} &\Rightarrow& \infty \\ &&|&
\text{nan} &\Rightarrow& \NAN(2^{\significand(N)-1}) \\ &&|&
\text{nan{:}0x}~n{:}\Thexnum &\Rightarrow& \NAN(n) & (\iff 1 \leq n < 2^{\significand(N)}) \\
\end{array}
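To make the NaN productions concrete, the following sketch builds the bit pattern of a 32-bit NaN from a payload. It assumes the usual IEEE 754 binary32 layout (1 sign bit, 8 exponent bits, 23 significand bits) and hard-codes 23 as the significand width for :math:`N = 32`; the function name ``f32_nan_bits`` is hypothetical.

```python
F32_SIGNIFICAND_BITS = 23  # assumed significand width for N = 32


def f32_nan_bits(payload=None, negative=False):
    """Bit pattern of a 32-bit NaN with the given payload.

    With no payload, the canonical payload 2**(23 - 1) is used, which
    corresponds to the plain `nan` literal.
    """
    m = F32_SIGNIFICAND_BITS
    if payload is None:
        payload = 1 << (m - 1)
    # Same side condition as the grammar: 1 <= n < 2**significand(N).
    if not 1 <= payload < (1 << m):
        raise ValueError(f"NaN payload out of range: {payload:#x}")
    # Sign bit, all-ones exponent, then the payload in the significand.
    return (int(negative) << 31) | (0xFF << m) | payload
```

Under these assumptions, ``nan`` corresponds to ``0x7FC00000`` and ``nan:0x1`` to ``0x7F800001``. A payload of 0 is rejected because that bit pattern would denote infinity rather than a NaN.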
.. index:: ! string, byte, character, ASCII, Unicode, UTF-8
pair: text format; byte
pair: text format; string
.. _text-byte:
.. _text-string:
Strings
~~~~~~~
*Strings* denote sequences of bytes that can represent both textual and binary data.
They are enclosed in quotation marks
and may contain any character other than |ASCII|_ control characters, quotation marks (:math:`\text{"}`), or backslash (:math:`\text{\backslash}`),
except when expressed with an *escape sequence*.
.. math::
\begin{array}{llclll@{\qquad\qquad}l}
\production{string} & \Tstring &::=&
\text{"}~(b^\ast{:}\Tstringelem)^\ast~\text{"}
&\Rightarrow& \concat((b^\ast)^\ast)
& (\iff |\concat((b^\ast)^\ast)| < 2^{32}) \\
\production{string element} & \Tstringelem &::=&
c{:}\Tstringchar &\Rightarrow& \utf8(c) \\ &&|&
\text{\backslash}~n{:}\Thexdigit~m{:}\Thexdigit
&\Rightarrow& 16\cdot n+m \\
\end{array}
Each character in a string literal represents the byte sequence corresponding to its UTF-8 |Unicode|_ (Section 2.5) encoding,
except for hexadecimal escape sequences :math:`\textl\backslash hh\textr`, which represent raw bytes of the respective value.
.. math::
\begin{array}{llclll@{\qquad\qquad}l}
\production{string character} & \Tstringchar &::=&
c{:}\Tchar &\Rightarrow& c \qquad
& (\iff c \geq \unicode{20} \wedge c \neq \unicode{7F} \wedge c \neq \text{"} \wedge c \neq \text{\backslash}) \\ &&|&
\text{\backslash t} &\Rightarrow& \unicode{09} \\ &&|&
\text{\backslash n} &\Rightarrow& \unicode{0A} \\ &&|&
\text{\backslash r} &\Rightarrow& \unicode{0D} \\ &&|&
\text{\backslash{"}} &\Rightarrow& \unicode{22} \\ &&|&
\text{\backslash{'}} &\Rightarrow& \unicode{27} \\ &&|&
\text{\backslash\backslash} &\Rightarrow& \unicode{5C} \\ &&|&
\text{\backslash u\{}~n{:}\Thexnum~\text{\}}
&\Rightarrow& \unicode{(n)} & (\iff n < \hex{D800} \vee \hex{E000} \leq n < \hex{110000}) \\
\end{array}
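The string productions can be sketched as a small decoder that turns a quoted literal into the byte sequence it denotes. This is an illustrative Python sketch rather than a reference implementation; ``decode_string`` and the helper tables are hypothetical names.

```python
import re

SIMPLE_ESCAPES = {"t": 0x09, "n": 0x0A, "r": 0x0D, '"': 0x22, "'": 0x27, "\\": 0x5C}
HEX_CHARS = "0123456789abcdefABCDEF"
HEXNUM = re.compile(r"[0-9A-Fa-f](?:_?[0-9A-Fa-f])*")


def decode_string(literal: str) -> bytes:
    """Decode a quoted string literal into the bytes it represents."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("string must be enclosed in quotation marks")
    body = literal[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            # Plain character: no control characters, no bare quote.
            if ord(ch) < 0x20 or ord(ch) == 0x7F or ch == '"':
                raise ValueError(f"character {ch!r} must be escaped")
            out += ch.encode("utf-8")
            i += 1
            continue
        nxt = body[i + 1:i + 2]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
        elif nxt == "u":
            # Unicode escape \u{hexnum}: must denote a scalar value.
            end = body.find("}", i + 3)
            if body[i + 2:i + 3] != "{" or end < 0:
                raise ValueError("malformed \\u{...} escape")
            digits = body[i + 3:end]
            if not HEXNUM.fullmatch(digits):
                raise ValueError("malformed \\u{...} escape")
            n = int(digits.replace("_", ""), 16)
            if not (n < 0xD800 or 0xE000 <= n < 0x110000):
                raise ValueError(f"not a Unicode scalar value: {n:#x}")
            out += chr(n).encode("utf-8")
            i = end + 1
        elif len(body) >= i + 3 and body[i + 1] in HEX_CHARS and body[i + 2] in HEX_CHARS:
            # Two hex digits denote one raw byte.
            out.append(int(body[i + 1:i + 3], 16))
            i += 3
        else:
            raise ValueError("unknown escape sequence")
    if len(out) >= 2 ** 32:
        raise ValueError("string too long")
    return bytes(out)
```

Note that only the two-digit hexadecimal escape can produce bytes that are not valid UTF-8; every other element contributes the UTF-8 encoding of a single character.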
.. index:: name, byte, character
pair: text format; name
.. _text-name:
Names
~~~~~
:ref:`Names <syntax-name>` are strings denoting a literal character sequence.
A name string must form a valid UTF-8 encoding as defined by |Unicode|_ (Section 2.5) and is interpreted as a string of Unicode scalar values.
.. math::
\begin{array}{llclll@{\qquad}l}
\production{name} & \Tname &::=&
b^\ast{:}\Tstring &\Rightarrow& c^\ast & (\iff b^\ast = \utf8(c^\ast)) \\
\end{array}
.. note::
Presuming the source text is itself encoded correctly,
strings that do not contain any uses of hexadecimal byte escapes are always valid names.
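The side condition on names can be illustrated with a strict UTF-8 decode of the bytes a string denotes. The sketch below relies on Python's built-in strict ``utf-8`` codec; ``bytes_to_name`` is a hypothetical helper name.

```python
def bytes_to_name(data: bytes) -> list:
    """Interpret string bytes as a name: a list of Unicode scalar values.

    Raises UnicodeDecodeError (a ValueError) if the bytes are not valid
    UTF-8, in which case the string is not a valid name.
    """
    return [ord(c) for c in data.decode("utf-8")]
```

For instance, the bytes of the string ``"\ff"`` are rejected, which shows how a hexadecimal byte escape can make an otherwise well-formed string unusable as a name.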
.. index:: ! identifiers
pair: text format; identifiers
.. _text-idchar:
.. _text-id:
Identifiers
~~~~~~~~~~~
:ref:`Indices <syntax-index>` can be given in both numeric and symbolic form.
Symbolic *identifiers* that stand in lieu of indices start with :math:`\text{\$}`, followed by one or more printable |ASCII|_ characters, none of which may be a space, quotation mark, comma, semicolon, or bracket.
.. math::
\begin{array}{llclll@{\qquad}l}
\production{identifier} & \Tid &::=&
\text{\$}~\Tidchar^+ \\
\production{identifier character} & \Tidchar &::=&
\text{0} ~~|~~ \dots ~~|~~ \text{9} \\ &&|&
\text{A} ~~|~~ \dots ~~|~~ \text{Z} \\ &&|&
\text{a} ~~|~~ \dots ~~|~~ \text{z} \\ &&|&
\text{!} ~~|~~
\text{\#} ~~|~~
\text{\$} ~~|~~
\text{\%} ~~|~~
\text{\&} ~~|~~
\text{'} ~~|~~
\text{*} ~~|~~
\text{+} ~~|~~
\text{-} ~~|~~
\text{.} ~~|~~
\text{/} \\ &&|&
\text{:} ~~|~~
\text{<} ~~|~~
\text{=} ~~|~~
\text{>} ~~|~~
\text{?} ~~|~~
\text{@} ~~|~~
\text{\backslash} ~~|~~
\text{\hat{~~}} ~~|~~
\text{\_} ~~|~~
\text{\grave{~~}} ~~|~~
\text{|} ~~|~~
\text{\tilde{~~}} \\
\end{array}
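The identifier grammar above is regular, so it can be checked with a single regular expression. The following Python sketch transcribes the character set from the production; the names ``IDENTIFIER`` and ``is_identifier`` are hypothetical.

```python
import re

# "$" followed by one or more identifier characters, as listed in the grammar.
IDENTIFIER = re.compile(r"\$[0-9A-Za-z!#$%&'*+\-./:<=>?@\\^_`|~]+")


def is_identifier(text: str) -> bool:
    """Return True if `text` is a syntactically valid symbolic identifier."""
    return IDENTIFIER.fullmatch(text) is not None
```

Under this sketch, ``$foo.bar!`` is accepted, while a lone ``$``, a name without the leading ``$``, and anything containing a space, comma, semicolon, quotation mark, or bracket are rejected.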
.. _text-id-fresh:
Conventions
...........
The expansion rules of some abbreviations require insertion of a *fresh* identifier.
A fresh identifier may be any syntactically valid identifier that does not already occur in the given source text.
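One simple way to obtain such an identifier is to try numbered candidates until one is found that does not appear in the source. The sketch below is only one possible strategy; the function name ``fresh_identifier`` and the ``$fresh`` prefix are assumptions made for illustration and are not prescribed anywhere.

```python
import itertools


def fresh_identifier(source_text: str, prefix: str = "$fresh") -> str:
    """Return an identifier that does not occur anywhere in `source_text`.

    The substring test is deliberately conservative: a candidate is also
    skipped when it is merely a prefix of a longer identifier in the text.
    """
    for i in itertools.count():
        candidate = f"{prefix}{i}"
        if candidate not in source_text:
            return candidate
```

Because the source text is finite, the loop always terminates.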