| |
| .. _lexical: |
| |
| **************** |
| Lexical analysis |
| **************** |
| |
| .. index:: lexical analysis, parser, token |
| |
| A Python program is read by a *parser*. Input to the parser is a stream of |
| :term:`tokens <token>`, generated by the *lexical analyzer* (also known as |
| the *tokenizer*). |
| This chapter describes how the lexical analyzer produces these tokens. |
| |
| The lexical analyzer determines the program text's :ref:`encoding <encodings>` |
| (UTF-8 by default), and decodes the text into |
| :ref:`source characters <lexical-source-character>`. |
| If the text cannot be decoded, a :exc:`SyntaxError` is raised. |
| |
| Next, the lexical analyzer uses the source characters to generate a stream of tokens. |
| The type of a generated token generally depends on the next source character to |
| be processed. Similarly, other special behavior of the analyzer depends on |
| the first source character that hasn't yet been processed. |
| The following table gives a quick summary of these source characters, |
| with links to sections that contain more information. |
| |
| .. list-table:: |
| :header-rows: 1 |
| |
| * - Character |
| - Next token (or other relevant documentation) |
| |
| * - * space |
| * tab |
| * formfeed |
| - * :ref:`Whitespace <whitespace>` |
| |
| * - * CR, LF |
| - * :ref:`New line <line-structure>` |
| * :ref:`Indentation <indentation>` |
| |
| * - * backslash (``\``) |
| - * :ref:`Explicit line joining <explicit-joining>` |
| * (Also significant in :ref:`string escape sequences <escape-sequences>`) |
| |
| * - * hash (``#``) |
| - * :ref:`Comment <comments>` |
| |
| * - * quote (``'``, ``"``) |
| - * :ref:`String literal <strings>` |
| |
| * - * ASCII letter (``a``-``z``, ``A``-``Z``) |
| * non-ASCII character |
| - * :ref:`Name <identifiers>` |
| * Prefixed :ref:`string or bytes literal <strings>` |
| |
| * - * underscore (``_``) |
| - * :ref:`Name <identifiers>` |
| * (Can also be part of :ref:`numeric literals <numbers>`) |
| |
| * - * number (``0``-``9``) |
| - * :ref:`Numeric literal <numbers>` |
| |
| * - * dot (``.``) |
| - * :ref:`Numeric literal <numbers>` |
| * :ref:`Operator <operators>` |
| |
| * - * question mark (``?``) |
| * dollar (``$``) |
| * |
| .. (the following uses zero-width space characters to render |
| .. a literal backquote) |
| |
| backquote (`````) |
| * control character |
| - * Error (outside string literals and comments) |
| |
| * - * other printing character |
| - * :ref:`Operator or delimiter <operators>` |
| |
| * - * end of file |
| - * :ref:`End marker <endmarker-token>` |
| |
| |
| .. _line-structure: |
| |
| Line structure |
| ============== |
| |
| .. index:: line structure |
| |
| A Python program is divided into a number of *logical lines*. |
| |
| |
| .. _logical-lines: |
| |
| Logical lines |
| ------------- |
| |
| .. index:: logical line, physical line, line joining, NEWLINE token |
| |
| The end of a logical line is represented by the token :data:`~token.NEWLINE`. |
| Statements cannot cross logical line boundaries except where :data:`!NEWLINE` |
| is allowed by the syntax (e.g., between statements in compound statements). |
| A logical line is constructed from one or more *physical lines* by following |
| the :ref:`explicit <explicit-joining>` or :ref:`implicit <implicit-joining>` |
| *line joining* rules. |
| |
| |
| .. _physical-lines: |
| |
| Physical lines |
| -------------- |
| |
A physical line is a sequence of characters terminated by one of the following
| end-of-line sequences: |
| |
| * the Unix form using ASCII LF (linefeed), |
| * the Windows form using the ASCII sequence CR LF (return followed by linefeed), |
| * the '`Classic Mac OS`__' form using the ASCII CR (return) character. |
| |
| __ https://en.wikipedia.org/wiki/Classic_Mac_OS |
| |
| Regardless of platform, each of these sequences is replaced by a single |
| ASCII LF (linefeed) character. |
| (This is done even inside :ref:`string literals <strings>`.) |
| Each line can use any of the sequences; they do not need to be consistent |
| within a file. |
| |
| The end of input also serves as an implicit terminator for the final |
| physical line. |
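
The normalization can be observed with :func:`compile`, which accepts all
three end-of-line sequences. The following is an illustrative sketch, not
part of the specification:

```python
# Each end-of-line sequence is normalized to a single LF while reading
# source -- even inside a string literal.
source = "s = '''a\r\nb\rc\nd'''"   # CRLF, CR, and LF line endings
namespace = {}
exec(compile(source, "<demo>", "exec"), namespace)
assert namespace["s"] == "a\nb\nc\nd"
```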
| |
| Formally: |
| |
| .. grammar-snippet:: |
| :group: python-grammar |
| |
| newline: <ASCII LF> | <ASCII CR> <ASCII LF> | <ASCII CR> |
| |
| |
| .. _comments: |
| |
| Comments |
| -------- |
| |
| .. index:: comment, hash character |
| single: # (hash); comment |
| |
| A comment starts with a hash character (``#``) that is not part of a string |
| literal, and ends at the end of the physical line. A comment signifies the end |
| of the logical line unless the implicit line joining rules are invoked. Comments |
| are ignored by the syntax. |
| |
| |
| .. _encodings: |
| |
| Encoding declarations |
| --------------------- |
| |
| .. index:: source character set, encoding declarations (source file) |
| single: # (hash); source encoding declaration |
| |
| If a comment in the first or second line of the Python script matches the |
| regular expression ``coding[=:]\s*([-\w.]+)``, this comment is processed as an |
| encoding declaration; the first group of this expression names the encoding of |
| the source code file. The encoding declaration must appear on a line of its |
| own. If it is the second line, the first line must also be a comment-only line. |
| The recommended forms of an encoding expression are :: |
| |
| # -*- coding: <encoding-name> -*- |
| |
| which is recognized also by GNU Emacs, and :: |
| |
| # vim:fileencoding=<encoding-name> |
| |
| which is recognized by Bram Moolenaar's VIM. |
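
For illustration only, the check described above can be sketched with the
:mod:`re` module. This simplification ignores the comment-only-line
requirement, and ``declared_encoding`` is a hypothetical helper, not a
Python API:

```python
import re

CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")

def declared_encoding(lines):
    # Only a comment in the first or second line can declare the encoding.
    for line in lines[:2]:
        if line.lstrip().startswith("#"):
            match = CODING_RE.search(line)
            if match:
                return match.group(1)
    return None  # callers fall back to the UTF-8 default

assert declared_encoding(["# -*- coding: latin-1 -*-"]) == "latin-1"
assert declared_encoding(["#!/usr/bin/env python",
                          "# vim:fileencoding=koi8-r"]) == "koi8-r"
```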
| |
| If no encoding declaration is found, the default encoding is UTF-8. If the |
| implicit or explicit encoding of a file is UTF-8, an initial UTF-8 byte-order |
| mark (``b'\xef\xbb\xbf'``) is ignored rather than being a syntax error. |
| |
| If an encoding is declared, the encoding name must be recognized by Python |
| (see :ref:`standard-encodings`). The |
| encoding is used for all lexical analysis, including string literals, comments |
| and identifiers. |
| |
| .. _lexical-source-character: |
| |
| All lexical analysis, including string literals, comments |
| and identifiers, works on Unicode text decoded using the source encoding. |
| Any Unicode code point, except the NUL control character, can appear in |
| Python source. |
| |
| .. grammar-snippet:: |
| :group: python-grammar |
| |
| source_character: <any Unicode code point, except NUL> |
| |
| |
| .. _explicit-joining: |
| |
| Explicit line joining |
| --------------------- |
| |
| .. index:: physical line, line joining, line continuation, backslash character |
| |
| Two or more physical lines may be joined into logical lines using backslash |
| characters (``\``), as follows: when a physical line ends in a backslash that is |
not part of a string literal or comment, it is joined with the following physical line, forming
| a single logical line, deleting the backslash and the following end-of-line |
| character. For example:: |
| |
| if 1900 < year < 2100 and 1 <= month <= 12 \ |
| and 1 <= day <= 31 and 0 <= hour < 24 \ |
| and 0 <= minute < 60 and 0 <= second < 60: # Looks like a valid date |
| return 1 |
| |
| A line ending in a backslash cannot carry a comment. A backslash does not |
| continue a comment. A backslash does not continue a token except for string |
| literals (i.e., tokens other than string literals cannot be split across |
| physical lines using a backslash). A backslash is illegal elsewhere on a line |
| outside a string literal. |
| |
| |
| .. _implicit-joining: |
| |
| Implicit line joining |
| --------------------- |
| |
| Expressions in parentheses, square brackets or curly braces can be split over |
| more than one physical line without using backslashes. For example:: |
| |
| month_names = ['Januari', 'Februari', 'Maart', # These are the |
| 'April', 'Mei', 'Juni', # Dutch names |
| 'Juli', 'Augustus', 'September', # for the months |
| 'Oktober', 'November', 'December'] # of the year |
| |
| Implicitly continued lines can carry comments. The indentation of the |
| continuation lines is not important. Blank continuation lines are allowed. |
| There is no NEWLINE token between implicit continuation lines. Implicitly |
| continued lines can also occur within triple-quoted strings (see below); in that |
| case they cannot carry comments. |
| |
| |
| .. _blank-lines: |
| |
| Blank lines |
| ----------- |
| |
| .. index:: single: blank line |
| |
A logical line that contains only spaces, tabs, formfeeds and possibly a
comment is ignored (i.e., no :data:`~token.NEWLINE` token is generated).
| During interactive input of statements, handling of a blank line may differ |
| depending on the implementation of the read-eval-print loop. |
| In the standard interactive interpreter, an entirely blank logical line (that |
| is, one containing not even whitespace or a comment) terminates a multi-line |
| statement. |
| |
| |
| .. _indentation: |
| |
| Indentation |
| ----------- |
| |
| .. index:: indentation, leading whitespace, space, tab, grouping, statement grouping |
| |
| Leading whitespace (spaces and tabs) at the beginning of a logical line is used |
| to compute the indentation level of the line, which in turn is used to determine |
| the grouping of statements. |
| |
| Tabs are replaced (from left to right) by one to eight spaces such that the |
| total number of characters up to and including the replacement is a multiple of |
| eight (this is intended to be the same rule as used by Unix). The total number |
| of spaces preceding the first non-blank character then determines the line's |
| indentation. Indentation cannot be split over multiple physical lines using |
| backslashes; the whitespace up to the first backslash determines the |
| indentation. |
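
For illustration, the tab rule above can be sketched as follows
(``indent_width`` is a hypothetical helper, not part of Python):

```python
def indent_width(line):
    # Tabs advance to the next multiple of eight columns; spaces
    # advance by one. Any other character ends the indentation.
    width = 0
    for char in line:
        if char == " ":
            width += 1
        elif char == "\t":
            width += 8 - (width % 8)
        else:
            break
    return width

assert indent_width("\tx") == 8
assert indent_width("   \tx") == 8    # three spaces, then a tab: column 8
assert indent_width("        \tx") == 16
```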
| |
Indentation is rejected as inconsistent if a source file mixes tabs and spaces
in a way that makes the meaning dependent on how many spaces a tab is worth; a
:exc:`TabError` is raised in that case.
| |
| **Cross-platform compatibility note:** because of the nature of text editors on |
| non-UNIX platforms, it is unwise to use a mixture of spaces and tabs for the |
| indentation in a single source file. It should also be noted that different |
| platforms may explicitly limit the maximum indentation level. |
| |
| A formfeed character may be present at the start of the line; it will be ignored |
| for the indentation calculations above. Formfeed characters occurring elsewhere |
| in the leading whitespace have an undefined effect (for instance, they may reset |
| the space count to zero). |
| |
| .. index:: INDENT token, DEDENT token |
| |
| The indentation levels of consecutive lines are used to generate |
| :data:`~token.INDENT` and :data:`~token.DEDENT` tokens, using a stack, |
| as follows. |
| |
| Before the first line of the file is read, a single zero is pushed on the stack; |
| this will never be popped off again. The numbers pushed on the stack will |
| always be strictly increasing from bottom to top. At the beginning of each |
| logical line, the line's indentation level is compared to the top of the stack. |
| If it is equal, nothing happens. If it is larger, it is pushed on the stack, and |
| one :data:`!INDENT` token is generated. If it is smaller, it *must* be one of the |
| numbers occurring on the stack; all numbers on the stack that are larger are |
| popped off, and for each number popped off a :data:`!DEDENT` token is generated. |
| At the end of the file, a :data:`!DEDENT` token is generated for each number |
| remaining on the stack that is larger than zero. |
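
The following sketch implements this algorithm for a list of per-line
indentation levels (``indent_tokens`` is illustrative only; the real
tokenizer works on source text):

```python
def indent_tokens(levels):
    # One entry per logical line: the line's indentation level.
    stack = [0]
    tokens = []
    for level in levels:
        if level > stack[-1]:
            stack.append(level)
            tokens.append("INDENT")
        else:
            while stack[-1] > level:
                stack.pop()
                tokens.append("DEDENT")
            if stack[-1] != level:
                raise IndentationError("unindent does not match any "
                                       "outer indentation level")
    # At end of input, dedent back to level zero.
    tokens.extend("DEDENT" for lvl in stack if lvl > 0)
    return tokens

assert indent_tokens([0, 4, 8, 4, 0]) == ["INDENT", "INDENT",
                                          "DEDENT", "DEDENT"]
```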
| |
| Here is an example of a correctly (though confusingly) indented piece of Python |
| code:: |
| |
| def perm(l): |
| # Compute the list of all permutations of l |
| if len(l) <= 1: |
| return [l] |
| r = [] |
| for i in range(len(l)): |
| s = l[:i] + l[i+1:] |
| p = perm(s) |
| for x in p: |
| r.append(l[i:i+1] + x) |
| return r |
| |
| The following example shows various indentation errors:: |
| |
| def perm(l): # error: first line indented |
| for i in range(len(l)): # error: not indented |
| s = l[:i] + l[i+1:] |
| p = perm(l[:i] + l[i+1:]) # error: unexpected indent |
| for x in p: |
| r.append(l[i:i+1] + x) |
| return r # error: inconsistent dedent |
| |
| (Actually, the first three errors are detected by the parser; only the last |
| error is found by the lexical analyzer --- the indentation of ``return r`` does |
| not match a level popped off the stack.) |
| |
| |
| .. _whitespace: |
| |
| Whitespace between tokens |
| ------------------------- |
| |
| Except at the beginning of a logical line or in string literals, the whitespace |
| characters space, tab and formfeed can be used interchangeably to separate |
| tokens. Whitespace is needed between two tokens only if their concatenation |
| could otherwise be interpreted as a different token. For example, ``ab`` is one |
| token, but ``a b`` is two tokens. However, ``+a`` and ``+ a`` both produce |
| two tokens, ``+`` and ``a``, as ``+a`` is not a valid token. |
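
These rules can be observed with the :mod:`tokenize` module from the
standard library, which tokenizes source the same way:

```python
import io
import tokenize

def token_strings(source):
    # Collect token texts, ignoring the implicit trailing NEWLINE
    # and the ENDMARKER. (Illustrative helper, not a Python API.)
    skip = (tokenize.NEWLINE, tokenize.ENDMARKER)
    return [tok.string
            for tok in tokenize.generate_tokens(io.StringIO(source).readline)
            if tok.type not in skip]

assert token_strings("ab") == ["ab"]        # one NAME token
assert token_strings("a b") == ["a", "b"]   # two NAME tokens
assert token_strings("+a") == ["+", "a"]    # '+a' is not a single token
```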
| |
| |
| .. _endmarker-token: |
| |
| End marker |
| ---------- |
| |
| At the end of non-interactive input, the lexical analyzer generates an |
| :data:`~token.ENDMARKER` token. |
| |
| |
| .. _other-tokens: |
| |
| Other tokens |
| ============ |
| |
| Besides :data:`~token.NEWLINE`, :data:`~token.INDENT` and :data:`~token.DEDENT`, |
| the following categories of tokens exist: |
| *identifiers* and *keywords* (:data:`~token.NAME`), *literals* (such as |
| :data:`~token.NUMBER` and :data:`~token.STRING`), and other symbols |
| (*operators* and *delimiters*, :data:`~token.OP`). |
| Whitespace characters (other than logical line terminators, discussed earlier) |
| are not tokens, but serve to delimit tokens. |
| Where ambiguity exists, a token comprises the longest possible string that |
| forms a legal token, when read from left to right. |
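
For example, ``a**b`` contains a single ``**`` operator token rather than
two ``*`` tokens, as the :mod:`tokenize` module shows:

```python
import io
import tokenize

ops = [tok.string
       for tok in tokenize.generate_tokens(io.StringIO("a**b").readline)
       if tok.type == tokenize.OP]
assert ops == ["**"]   # the longest possible token wins
```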
| |
| |
| .. _identifiers: |
| |
| Names (identifiers and keywords) |
| ================================ |
| |
| .. index:: identifier, name |
| |
| :data:`~token.NAME` tokens represent *identifiers*, *keywords*, and |
| *soft keywords*. |
| |
| Within the ASCII range (U+0001..U+007F), the valid characters for names |
| include the uppercase and lowercase letters (``A-Z`` and ``a-z``), |
| the underscore ``_`` and, except for the first character, the digits |
| ``0`` through ``9``. |
| |
| Names must contain at least one character, but have no upper length limit. |
| Case is significant. |
| |
| Besides ``A-Z``, ``a-z``, ``_`` and ``0-9``, names can also use "letter-like" |
| and "number-like" characters from outside the ASCII range, as detailed below. |
| |
| All identifiers are converted into the `normalization form`_ NFKC while |
| parsing; comparison of identifiers is based on NFKC. |
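
For example, the ligature ``ﬁ`` (U+FB01) normalizes to the two characters
``fi``, so both spellings denote the same name:

```python
import unicodedata

assert unicodedata.normalize("NFKC", "\ufb01le") == "file"

# The parser applies the normalization, so the two spellings
# refer to one identifier:
namespace = {}
exec("\ufb01le = 1", namespace)   # spelled with the ligature
assert namespace["file"] == 1     # stored under the normalized name
```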
| |
| Formally, the first character of a normalized identifier must belong to the |
| set ``id_start``, which is the union of: |
| |
| * Unicode category ``<Lu>`` - uppercase letters (includes ``A`` to ``Z``) |
| * Unicode category ``<Ll>`` - lowercase letters (includes ``a`` to ``z``) |
| * Unicode category ``<Lt>`` - titlecase letters |
| * Unicode category ``<Lm>`` - modifier letters |
| * Unicode category ``<Lo>`` - other letters |
| * Unicode category ``<Nl>`` - letter numbers |
| * {``"_"``} - the underscore |
| * ``<Other_ID_Start>`` - an explicit set of characters in `PropList.txt`_ |
| to support backwards compatibility |
| |
| The remaining characters must belong to the set ``id_continue``, which is the |
| union of: |
| |
| * all characters in ``id_start`` |
| * Unicode category ``<Nd>`` - decimal numbers (includes ``0`` to ``9``) |
| * Unicode category ``<Pc>`` - connector punctuations |
| * Unicode category ``<Mn>`` - nonspacing marks |
| * Unicode category ``<Mc>`` - spacing combining marks |
| * ``<Other_ID_Continue>`` - another explicit set of characters in |
| `PropList.txt`_ to support backwards compatibility |
| |
| Unicode categories use the version of the Unicode Character Database as |
| included in the :mod:`unicodedata` module. |
| |
| These sets are based on the Unicode standard annex `UAX-31`_. |
| See also :pep:`3131` for further details. |
| |
| Even more formally, names are described by the following lexical definitions: |
| |
| .. grammar-snippet:: |
| :group: python-grammar |
| |
| NAME: `xid_start` `xid_continue`* |
| id_start: <Lu> | <Ll> | <Lt> | <Lm> | <Lo> | <Nl> | "_" | <Other_ID_Start> |
| id_continue: `id_start` | <Nd> | <Pc> | <Mn> | <Mc> | <Other_ID_Continue> |
xid_start: <all characters in `id_start` whose NFKC normalization is
   in (`id_start` `xid_continue`*)>
xid_continue: <all characters in `id_continue` whose NFKC normalization is
   in (`id_continue`*)>
| identifier: <`NAME`, except keywords> |
| |
| A non-normative listing of all valid identifier characters as defined by |
| Unicode is available in the `DerivedCoreProperties.txt`_ file in the Unicode |
| Character Database. |
| |
| |
| .. _UAX-31: https://www.unicode.org/reports/tr31/ |
| .. _PropList.txt: https://www.unicode.org/Public/17.0.0/ucd/PropList.txt |
| .. _DerivedCoreProperties.txt: https://www.unicode.org/Public/17.0.0/ucd/DerivedCoreProperties.txt |
| .. _normalization form: https://www.unicode.org/reports/tr15/#Norm_Forms |
| |
| |
| .. _keywords: |
| |
| Keywords |
| -------- |
| |
| .. index:: |
| single: keyword |
| single: reserved word |
| |
| The following names are used as reserved words, or *keywords* of the |
| language, and cannot be used as ordinary identifiers. They must be spelled |
| exactly as written here: |
| |
| .. sourcecode:: text |
| |
| False await else import pass |
| None break except in raise |
| True class finally is return |
| and continue for lambda try |
| as def from nonlocal while |
| assert del global not with |
| async elif if or yield |
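
The :mod:`keyword` module exposes the same list programmatically:

```python
import keyword

assert keyword.iskeyword("lambda")     # a reserved word
assert not keyword.iskeyword("spam")   # an ordinary identifier
assert "False" in keyword.kwlist       # the full list from the table above
```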
| |
| |
| .. _soft-keywords: |
| |
| Soft Keywords |
| ------------- |
| |
| .. index:: soft keyword, keyword |
| |
| .. versionadded:: 3.10 |
| |
| Some names are only reserved under specific contexts. These are known as |
| *soft keywords*: |
| |
| - ``match``, ``case``, and ``_``, when used in the :keyword:`match` statement. |
| - ``type``, when used in the :keyword:`type` statement. |
| |
| These syntactically act as keywords in their specific contexts, |
| but this distinction is done at the parser level, not when tokenizing. |
| |
| As soft keywords, their use in the grammar is possible while still |
| preserving compatibility with existing code that uses these names as |
| identifier names. |
| |
| .. versionchanged:: 3.12 |
| ``type`` is now a soft keyword. |
| |
| .. index:: |
| single: _, identifiers |
| single: __, identifiers |
| .. _id-classes: |
| |
| Reserved classes of identifiers |
| ------------------------------- |
| |
| Certain classes of identifiers (besides keywords) have special meanings. These |
| classes are identified by the patterns of leading and trailing underscore |
| characters: |
| |
| ``_*`` |
| Not imported by ``from module import *``. |
| |
| ``_`` |
| In a ``case`` pattern within a :keyword:`match` statement, ``_`` is a |
| :ref:`soft keyword <soft-keywords>` that denotes a |
| :ref:`wildcard <wildcard-patterns>`. |
| |
| Separately, the interactive interpreter makes the result of the last evaluation |
| available in the variable ``_``. |
| (It is stored in the :mod:`builtins` module, alongside built-in |
| functions like ``print``.) |
| |
| Elsewhere, ``_`` is a regular identifier. It is often used to name |
| "special" items, but it is not special to Python itself. |
| |
| .. note:: |
| |
| The name ``_`` is often used in conjunction with internationalization; |
| refer to the documentation for the :mod:`gettext` module for more |
| information on this convention. |
| |
| It is also commonly used for unused variables. |
| |
| ``__*__`` |
| System-defined names, informally known as "dunder" names. These names are |
| defined by the interpreter and its implementation (including the standard library). |
| Current system names are discussed in the :ref:`specialnames` section and elsewhere. |
| More will likely be defined in future versions of Python. *Any* use of ``__*__`` names, |
| in any context, that does not follow explicitly documented use, is subject to |
| breakage without warning. |
| |
| ``__*`` |
| Class-private names. Names in this category, when used within the context of a |
| class definition, are re-written to use a mangled form to help avoid name |
| clashes between "private" attributes of base and derived classes. See section |
| :ref:`atom-identifiers`. |
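
A brief illustration of the mangling (``Widget`` and ``__secret`` are
example names):

```python
class Widget:
    def __init__(self):
        self.__secret = 42            # stored as _Widget__secret

w = Widget()
assert w._Widget__secret == 42        # the mangled name is accessible
assert not hasattr(w, "__secret")     # the unmangled name is not
```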
| |
| |
| .. _literals: |
| |
| Literals |
| ======== |
| |
| .. index:: literal, constant |
| |
| Literals are notations for constant values of some built-in types. |
| |
| In terms of lexical analysis, Python has :ref:`string, bytes <strings>` |
| and :ref:`numeric <numbers>` literals. |
| |
| Other "literals" are lexically denoted using :ref:`keywords <keywords>` |
| (``None``, ``True``, ``False``) and the special |
| :ref:`ellipsis token <lexical-ellipsis>` (``...``). |
| |
| |
| .. index:: string literal, bytes literal, ASCII |
| single: ' (single quote); string literal |
| single: " (double quote); string literal |
| .. _strings: |
| |
| String and Bytes literals |
| ========================= |
| |
| String literals are text enclosed in single quotes (``'``) or double |
| quotes (``"``). For example: |
| |
| .. code-block:: python |
| |
| "spam" |
| 'eggs' |
| |
| The quote used to start the literal also terminates it, so a string literal |
| can only contain the other quote (except with escape sequences, see below). |
| For example: |
| |
| .. code-block:: python |
| |
| 'Say "Hello", please.' |
| "Don't do that!" |
| |
| Except for this limitation, the choice of quote character (``'`` or ``"``) |
| does not affect how the literal is parsed. |
| |
| Inside a string literal, the backslash (``\``) character introduces an |
| :dfn:`escape sequence`, which has special meaning depending on the character |
| after the backslash. |
| For example, ``\"`` denotes the double quote character, and does *not* end |
| the string: |
| |
| .. code-block:: pycon |
| |
| >>> print("Say \"Hello\" to everyone!") |
| Say "Hello" to everyone! |
| |
| See :ref:`escape sequences <escape-sequences>` below for a full list of such |
| sequences, and more details. |
| |
| |
| .. index:: triple-quoted string |
| single: """; string literal |
| single: '''; string literal |
| |
| Triple-quoted strings |
| --------------------- |
| |
| Strings can also be enclosed in matching groups of three single or double |
| quotes. |
| These are generally referred to as :dfn:`triple-quoted strings`:: |
| |
| """This is a triple-quoted string.""" |
| |
| In triple-quoted literals, unescaped quotes are allowed (and are |
| retained), except that three unescaped quotes in a row terminate the literal, |
| if they are of the same kind (``'`` or ``"``) used at the start:: |
| |
| """This string has "quotes" inside.""" |
| |
| Unescaped newlines are also allowed and retained:: |
| |
| '''This triple-quoted string |
| continues on the next line.''' |
| |
| |
| .. index:: |
| single: u'; string literal |
| single: u"; string literal |
| |
| String prefixes |
| --------------- |
| |
| String literals can have an optional :dfn:`prefix` that influences how the |
| content of the literal is parsed, for example: |
| |
| .. code-block:: python |
| |
| b"data" |
| f'{result=}' |
| |
| The allowed prefixes are: |
| |
| * ``b``: :ref:`Bytes literal <bytes-literal>` |
| * ``r``: :ref:`Raw string <raw-strings>` |
| * ``f``: :ref:`Formatted string literal <f-strings>` ("f-string") |
| * ``t``: :ref:`Template string literal <t-strings>` ("t-string") |
| * ``u``: No effect (allowed for backwards compatibility) |
| |
| See the linked sections for details on each type. |
| |
| Prefixes are case-insensitive (for example, '``B``' works the same as '``b``'). |
| The '``r``' prefix can be combined with '``f``', '``t``' or '``b``', so '``fr``', |
| '``rf``', '``tr``', '``rt``', '``br``', and '``rb``' are also valid prefixes. |
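
For example, the following literals are all equivalent:

```python
# Prefix letters may be combined in either order and in any case:
assert rb"\d" == bR"\d" == Rb"\d" == b"\\d"
# The 'u' prefix has no effect:
assert u"text" == "text"
```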
| |
| .. versionadded:: 3.3 |
| The ``'rb'`` prefix of raw bytes literals has been added as a synonym |
| of ``'br'``. |
| |
| Support for the unicode legacy literal (``u'value'``) was reintroduced |
| to simplify the maintenance of dual Python 2.x and 3.x codebases. |
| See :pep:`414` for more information. |
| |
| |
| Formal grammar |
| -------------- |
| |
| String literals, except :ref:`"f-strings" <f-strings>` and |
| :ref:`"t-strings" <t-strings>`, are described by the |
| following lexical definitions. |
| |
| These definitions use :ref:`negative lookaheads <lexical-lookaheads>` (``!``) |
| to indicate that an ending quote ends the literal. |
| |
| .. grammar-snippet:: |
| :group: python-grammar |
| |
| STRING: [`stringprefix`] (`stringcontent`) |
| stringprefix: <("r" | "u" | "b" | "br" | "rb"), case-insensitive> |
| stringcontent: |
| | "'''" ( !"'''" `longstringitem`)* "'''" |
| | '"""' ( !'"""' `longstringitem`)* '"""' |
| | "'" ( !"'" `stringitem`)* "'" |
| | '"' ( !'"' `stringitem`)* '"' |
| stringitem: `stringchar` | `stringescapeseq` |
| stringchar: <any `source_character`, except backslash and newline> |
| longstringitem: `stringitem` | newline |
| stringescapeseq: "\" <any `source_character`> |
| |
| Note that as in all lexical definitions, whitespace is significant. |
| In particular, the prefix (if any) must be immediately followed by the starting |
| quote. |
| |
| .. index:: physical line, escape sequence, Standard C, C |
| single: \ (backslash); escape sequence |
| single: \\; escape sequence |
| single: \a; escape sequence |
| single: \b; escape sequence |
| single: \f; escape sequence |
| single: \n; escape sequence |
| single: \r; escape sequence |
| single: \t; escape sequence |
| single: \v; escape sequence |
| single: \x; escape sequence |
| single: \N; escape sequence |
| single: \u; escape sequence |
| single: \U; escape sequence |
| |
| .. _escape-sequences: |
| |
| Escape sequences |
| ---------------- |
| |
| Unless an '``r``' or '``R``' prefix is present, escape sequences in string and |
| bytes literals are interpreted according to rules similar to those used by |
| Standard C. The recognized escape sequences are: |
| |
| .. list-table:: |
| :widths: auto |
| :header-rows: 1 |
| |
| * * Escape Sequence |
| * Meaning |
| * * ``\``\ <newline> |
| * :ref:`string-escape-ignore` |
| * * ``\\`` |
| * :ref:`Backslash <string-escape-escaped-char>` |
| * * ``\'`` |
| * :ref:`Single quote <string-escape-escaped-char>` |
| * * ``\"`` |
| * :ref:`Double quote <string-escape-escaped-char>` |
| * * ``\a`` |
| * ASCII Bell (BEL) |
| * * ``\b`` |
| * ASCII Backspace (BS) |
| * * ``\f`` |
| * ASCII Formfeed (FF) |
| * * ``\n`` |
| * ASCII Linefeed (LF) |
| * * ``\r`` |
| * ASCII Carriage Return (CR) |
| * * ``\t`` |
| * ASCII Horizontal Tab (TAB) |
| * * ``\v`` |
| * ASCII Vertical Tab (VT) |
| * * :samp:`\\\\{ooo}` |
| * :ref:`string-escape-oct` |
| * * :samp:`\\x{hh}` |
| * :ref:`string-escape-hex` |
| * * :samp:`\\N\\{{name}\\}` |
| * :ref:`string-escape-named` |
| * * :samp:`\\u{xxxx}` |
| * :ref:`Hexadecimal Unicode character <string-escape-long-hex>` |
| * * :samp:`\\U{xxxxxxxx}` |
| * :ref:`Hexadecimal Unicode character <string-escape-long-hex>` |
| |
| .. _string-escape-ignore: |
| |
| Ignored end of line |
| ^^^^^^^^^^^^^^^^^^^ |
| |
| A backslash can be added at the end of a line to ignore the newline:: |
| |
| >>> 'This string will not include \ |
| ... backslashes or newline characters.' |
| 'This string will not include backslashes or newline characters.' |
| |
| The same result can be achieved using :ref:`triple-quoted strings <strings>`, |
| or parentheses and :ref:`string literal concatenation <string-concatenation>`. |
| |
| .. _string-escape-escaped-char: |
| |
| Escaped characters |
| ^^^^^^^^^^^^^^^^^^ |
| |
| To include a backslash in a non-:ref:`raw <raw-strings>` Python string |
| literal, it must be doubled. The ``\\`` escape sequence denotes a single |
| backslash character:: |
| |
| >>> print('C:\\Program Files') |
| C:\Program Files |
| |
| Similarly, the ``\'`` and ``\"`` sequences denote the single and double |
| quote character, respectively:: |
| |
| >>> print('\' and \"') |
| ' and " |
| |
| .. _string-escape-oct: |
| |
| Octal character |
| ^^^^^^^^^^^^^^^ |
| |
| The sequence :samp:`\\\\{ooo}` denotes a *character* with the octal (base 8) |
| value *ooo*:: |
| |
| >>> '\120' |
| 'P' |
| |
| Up to three octal digits (0 through 7) are accepted. |
| |
| In a bytes literal, *character* means a *byte* with the given value. |
| In a string literal, it means a Unicode character with the given value. |
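
For example:

```python
assert "\120" == "P"      # a string: the character U+0050
assert b"\120" == b"P"    # a bytes literal: the single byte 0x50
```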
| |
| .. versionchanged:: 3.11 |
| Octal escapes with value larger than ``0o377`` (255) produce a |
| :exc:`DeprecationWarning`. |
| |
| .. versionchanged:: 3.12 |
| Octal escapes with value larger than ``0o377`` (255) produce a |
| :exc:`SyntaxWarning`. |
| In a future Python version they will raise a :exc:`SyntaxError`. |
| |
| .. _string-escape-hex: |
| |
| Hexadecimal character |
| ^^^^^^^^^^^^^^^^^^^^^ |
| |
| The sequence :samp:`\\x{hh}` denotes a *character* with the hex (base 16) |
| value *hh*:: |
| |
| >>> '\x50' |
| 'P' |
| |
| Unlike in Standard C, exactly two hex digits are required. |
| |
| In a bytes literal, *character* means a *byte* with the given value. |
| In a string literal, it means a Unicode character with the given value. |
| |
| .. _string-escape-named: |
| |
| Named Unicode character |
| ^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| The sequence :samp:`\\N\\{{name}\\}` denotes a Unicode character |
| with the given *name*:: |
| |
| >>> '\N{LATIN CAPITAL LETTER P}' |
| 'P' |
| >>> '\N{SNAKE}' |
| '🐍' |
| |
| This sequence cannot appear in :ref:`bytes literals <bytes-literal>`. |
| |
| .. versionchanged:: 3.3 |
| Support for `name aliases <https://www.unicode.org/Public/17.0.0/ucd/NameAliases.txt>`__ |
| has been added. |
| |
| .. _string-escape-long-hex: |
| |
| Hexadecimal Unicode characters |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| These sequences :samp:`\\u{xxxx}` and :samp:`\\U{xxxxxxxx}` denote the |
| Unicode character with the given hex (base 16) value. |
| Exactly four digits are required for ``\u``; exactly eight digits are |
| required for ``\U``. |
| The latter can encode any Unicode character. |
| |
| .. code-block:: pycon |
| |
| >>> '\u1234' |
| 'ሴ' |
| >>> '\U0001f40d' |
| '🐍' |
| |
| These sequences cannot appear in :ref:`bytes literals <bytes-literal>`. |
| |
| |
| .. index:: unrecognized escape sequence |
| |
| Unrecognized escape sequences |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| Unlike in Standard C, all unrecognized escape sequences are left in the string |
| unchanged, that is, *the backslash is left in the result*:: |
| |
| >>> print('\q') |
| \q |
| >>> list('\q') |
| ['\\', 'q'] |
| |
| Note that for bytes literals, the escape sequences only recognized in string |
| literals (``\N...``, ``\u...``, ``\U...``) fall into the category of |
| unrecognized escapes. |
| |
| .. versionchanged:: 3.6 |
| Unrecognized escape sequences produce a :exc:`DeprecationWarning`. |
| |
| .. versionchanged:: 3.12 |
| Unrecognized escape sequences produce a :exc:`SyntaxWarning`. |
| In a future Python version they will raise a :exc:`SyntaxError`. |
| |
| |
| .. index:: |
| single: b'; bytes literal |
| single: b"; bytes literal |
| |
| |
| .. _bytes-literal: |
| |
| Bytes literals |
| -------------- |
| |
| :dfn:`Bytes literals` are always prefixed with '``b``' or '``B``'; they produce an |
| instance of the :class:`bytes` type instead of the :class:`str` type. |
| They may only contain ASCII characters; bytes with a numeric value of 128 |
| or greater must be expressed with escape sequences (typically |
| :ref:`string-escape-hex` or :ref:`string-escape-oct`): |
| |
| .. code-block:: pycon |
| |
| >>> b'\x89PNG\r\n\x1a\n' |
| b'\x89PNG\r\n\x1a\n' |
| >>> list(b'\x89PNG\r\n\x1a\n') |
| [137, 80, 78, 71, 13, 10, 26, 10] |
| |
| Similarly, a zero byte must be expressed using an escape sequence (typically |
| ``\0`` or ``\x00``). |
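
As a sketch of the ASCII-only rule, escape sequences can express any byte
value, while a non-ASCII character written directly is rejected at compile
time:

```python
# Escape sequences can express any byte value...
assert b"\x89PNG"[0] == 0x89
assert b"\x00" == b"\0" and b"\0"[0] == 0

# ...but a non-ASCII character in a bytes literal is a SyntaxError.
rejected = False
try:
    compile("b'é'", "<example>", "eval")
except SyntaxError:
    rejected = True
assert rejected
```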
| |
| |
| .. index:: |
| single: r'; raw string literal |
| single: r"; raw string literal |
| |
| .. _raw-strings: |
| |
| Raw string literals |
| ------------------- |
| |
| Both string and bytes literals may optionally be prefixed with a letter '``r``' |
| or '``R``'; such constructs are called :dfn:`raw string literals` |
| and :dfn:`raw bytes literals` respectively and treat backslashes as |
| literal characters. |
| As a result, in raw string literals, :ref:`escape sequences <escape-sequences>` |
| are not treated specially: |
| |
| .. code-block:: pycon |
| |
| >>> r'\d{4}-\d{2}-\d{2}' |
| '\\d{4}-\\d{2}-\\d{2}' |
| |
| Even in a raw literal, quotes can be escaped with a backslash, but the |
| backslash remains in the result; for example, ``r"\""`` is a valid string |
| literal consisting of two characters: a backslash and a double quote; ``r"\"`` |
| is not a valid string literal (even a raw string cannot end in an odd number of |
| backslashes). Specifically, *a raw literal cannot end in a single backslash* |
| (since the backslash would escape the following quote character). Note also |
| that a single backslash followed by a newline is interpreted as those two |
| characters as part of the literal, *not* as a line continuation. |
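
A short sketch of these rules, using regular expressions as the typical use
case for raw literals:

```python
import re

# In a raw literal, the backslash is an ordinary character:
assert len(r"\n") == 2 and r"\n" == "\\n"
assert len("\n") == 1

# This is why raw strings are conventional for regular expressions:
assert re.fullmatch(r"\d{4}-\d{2}-\d{2}", "2017-01-27") is not None

# r"\"" is two characters, a backslash and a double quote:
assert list(r"\"") == ["\\", '"']
```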
| |
| |
| .. index:: |
| single: formatted string literal |
| single: interpolated string literal |
| single: string; formatted literal |
| single: string; interpolated literal |
| single: f-string |
| single: fstring |
| single: f'; formatted string literal |
| single: f"; formatted string literal |
| single: {} (curly brackets); in formatted string literal |
| single: ! (exclamation); in formatted string literal |
| single: : (colon); in formatted string literal |
| single: = (equals); for help in debugging using string literals |
| |
| .. _f-strings: |
| .. _formatted-string-literals: |
| |
| f-strings |
| --------- |
| |
| .. versionadded:: 3.6 |
| |
| A :dfn:`formatted string literal` or :dfn:`f-string` is a string literal |
| that is prefixed with '``f``' or '``F``'. These strings may contain |
| replacement fields, which are expressions delimited by curly braces ``{}``. |
| While other string literals always have a constant value, formatted strings |
| are really expressions evaluated at run time. |
| |
| Escape sequences are decoded like in ordinary string literals (except when |
| a literal is also marked as a raw string). After decoding, the grammar |
| for the contents of the string is: |
| |
| .. productionlist:: python-grammar |
| f_string: (`literal_char` | "{{" | "}}" | `replacement_field`)* |
| replacement_field: "{" `f_expression` ["="] ["!" `conversion`] [":" `format_spec`] "}" |
| f_expression: (`conditional_expression` | "*" `or_expr`) |
| : ("," `conditional_expression` | "," "*" `or_expr`)* [","] |
| : | `yield_expression` |
| conversion: "s" | "r" | "a" |
| format_spec: (`literal_char` | `replacement_field`)* |
| literal_char: <any code point except "{", "}" or NULL> |
| |
| The parts of the string outside curly braces are treated literally, |
| except that any doubled curly braces ``'{{'`` or ``'}}'`` are replaced |
| with the corresponding single curly brace. A single opening curly |
| bracket ``'{'`` marks a replacement field, which starts with a |
| Python expression. To display both the expression text and its value after |
evaluation (useful in debugging), an equal sign ``'='`` may be added after the
expression. A conversion field, introduced by an exclamation point ``'!'``, may
| follow. A format specifier may also be appended, introduced by a colon ``':'``. |
| A replacement field ends with a closing curly bracket ``'}'``. |
| |
| Expressions in formatted string literals are treated like regular |
| Python expressions surrounded by parentheses, with a few exceptions. |
| An empty expression is not allowed, and both :keyword:`lambda` and |
| assignment expressions ``:=`` must be surrounded by explicit parentheses. |
| Each expression is evaluated in the context where the formatted string literal |
| appears, in order from left to right. Replacement expressions can contain |
| newlines in both single-quoted and triple-quoted f-strings and they can contain |
| comments. Everything that comes after a ``#`` inside a replacement field |
is a comment (even closing braces and quotes). In that case, the replacement
field must be closed on a different line.
| |
| .. code-block:: text |
| |
   >>> a = 2
   >>> f"abc{a # This is a comment }"
| ... + 3}" |
| 'abc5' |
| |
| .. versionchanged:: 3.7 |
| Prior to Python 3.7, an :keyword:`await` expression and comprehensions |
| containing an :keyword:`async for` clause were illegal in the expressions |
| in formatted string literals due to a problem with the implementation. |
| |
| .. versionchanged:: 3.12 |
| Prior to Python 3.12, comments were not allowed inside f-string replacement |
| fields. |
| |
| When the equal sign ``'='`` is provided, the output will have the expression |
| text, the ``'='`` and the evaluated value. Spaces after the opening brace |
| ``'{'``, within the expression and after the ``'='`` are all retained in the |
| output. By default, the ``'='`` causes the :func:`repr` of the expression to be |
provided, unless a format is specified. When a format is specified, it
| defaults to the :func:`str` of the expression unless a conversion ``'!r'`` is |
| declared. |
| |
| .. versionadded:: 3.8 |
| The equal sign ``'='``. |
| |
| If a conversion is specified, the result of evaluating the expression |
| is converted before formatting. Conversion ``'!s'`` calls :func:`str` on |
| the result, ``'!r'`` calls :func:`repr`, and ``'!a'`` calls :func:`ascii`. |
| |
| The result is then formatted using the :func:`format` protocol. The |
| format specifier is passed to the :meth:`~object.__format__` method of the |
| expression or conversion result. An empty string is passed when the |
| format specifier is omitted. The formatted result is then included in |
| the final value of the whole string. |
| |
| Top-level format specifiers may include nested replacement fields. These nested |
| fields may include their own conversion fields and :ref:`format specifiers |
| <formatspec>`, but may not include more deeply nested replacement fields. The |
| :ref:`format specifier mini-language <formatspec>` is the same as that used by |
| the :meth:`str.format` method. |
| |
| Formatted string literals may be concatenated, but replacement fields |
| cannot be split across literals. |
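
For instance (a small sketch), adjacent literals are concatenated into one
string, with each f-prefixed piece evaluating its own replacement fields:

```python
x = 6
y = 7
# Plain and formatted literals may be mixed in one concatenation.
assert f"{x} * " f"{y}" " = " f"{x * y}" == "6 * 7 = 42"
```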
| |
| Some examples of formatted string literals:: |
| |
| >>> name = "Fred" |
| >>> f"He said his name is {name!r}." |
| "He said his name is 'Fred'." |
| >>> f"He said his name is {repr(name)}." # repr() is equivalent to !r |
| "He said his name is 'Fred'." |
| >>> width = 10 |
| >>> precision = 4 |
   >>> import decimal
   >>> value = decimal.Decimal("12.34567")
| >>> f"result: {value:{width}.{precision}}" # nested fields |
| 'result: 12.35' |
   >>> from datetime import datetime
   >>> today = datetime(year=2017, month=1, day=27)
| >>> f"{today:%B %d, %Y}" # using date format specifier |
| 'January 27, 2017' |
| >>> f"{today=:%B %d, %Y}" # using date format specifier and debugging |
| 'today=January 27, 2017' |
| >>> number = 1024 |
| >>> f"{number:#0x}" # using integer format specifier |
| '0x400' |
| >>> foo = "bar" |
| >>> f"{ foo = }" # preserves whitespace |
| " foo = 'bar'" |
| >>> line = "The mill's closed" |
| >>> f"{line = }" |
| 'line = "The mill\'s closed"' |
| >>> f"{line = :20}" |
| "line = The mill's closed " |
| >>> f"{line = !r:20}" |
| 'line = "The mill\'s closed" ' |
| |
| |
| Reusing the outer f-string quoting type inside a replacement field is |
| permitted:: |
| |
| >>> a = dict(x=2) |
| >>> f"abc {a["x"]} def" |
| 'abc 2 def' |
| |
| .. versionchanged:: 3.12 |
| Prior to Python 3.12, reuse of the same quoting type of the outer f-string |
| inside a replacement field was not possible. |
| |
| Backslashes are also allowed in replacement fields and are evaluated the same |
| way as in any other context:: |
| |
| >>> a = ["a", "b", "c"] |
| >>> print(f"List a contains:\n{"\n".join(a)}") |
| List a contains: |
| a |
| b |
| c |
| |
| .. versionchanged:: 3.12 |
| Prior to Python 3.12, backslashes were not permitted inside an f-string |
| replacement field. |
| |
| Formatted string literals cannot be used as docstrings, even if they do not |
| include expressions. |
| |
| :: |
| |
| >>> def foo(): |
| ... f"Not a docstring" |
| ... |
| >>> foo.__doc__ is None |
| True |
| |
| See also :pep:`498` for the proposal that added formatted string literals, |
| and :meth:`str.format`, which uses a related format string mechanism. |
| |
| |
| .. _t-strings: |
| .. _template-string-literals: |
| |
| t-strings |
| --------- |
| |
| .. versionadded:: 3.14 |
| |
| A :dfn:`template string literal` or :dfn:`t-string` is a string literal |
| that is prefixed with '``t``' or '``T``'. |
| These strings follow the same syntax and evaluation rules as |
| :ref:`formatted string literals <f-strings>`, with the following differences: |
| |
| * Rather than evaluating to a ``str`` object, template string literals evaluate |
| to a :class:`string.templatelib.Template` object. |
| |
| * The :func:`format` protocol is not used. |
| Instead, the format specifier and conversions (if any) are passed to |
| a new :class:`~string.templatelib.Interpolation` object that is created |
| for each evaluated expression. |
| It is up to code that processes the resulting :class:`~string.templatelib.Template` |
| object to decide how to handle format specifiers and conversions. |
| |
| * Format specifiers containing nested replacement fields are evaluated eagerly, |
| prior to being passed to the :class:`~string.templatelib.Interpolation` object. |
| For instance, an interpolation of the form ``{amount:.{precision}f}`` will |
| evaluate the inner expression ``{precision}`` to determine the value of the |
| ``format_spec`` attribute. |
| If ``precision`` were to be ``2``, the resulting format specifier |
| would be ``'.2f'``. |
| |
| * When the equals sign ``'='`` is provided in an interpolation expression, |
| the text of the expression is appended to the literal string that precedes |
| the relevant interpolation. |
| This includes the equals sign and any surrounding whitespace. |
| The :class:`!Interpolation` instance for the expression will be created as |
| normal, except that :attr:`~string.templatelib.Interpolation.conversion` will |
| be set to '``r``' (:func:`repr`) by default. |
  If an explicit conversion or format specifier is provided, it overrides
  this default behaviour.
| |
| |
| .. _numbers: |
| |
| Numeric literals |
| ================ |
| |
| .. index:: number, numeric literal, integer literal |
| floating-point literal, hexadecimal literal |
| octal literal, binary literal, decimal literal, imaginary literal, complex literal |
| |
| :data:`~token.NUMBER` tokens represent numeric literals, of which there are |
| three types: integers, floating-point numbers, and imaginary numbers. |
| |
| .. grammar-snippet:: |
| :group: python-grammar |
| |
| NUMBER: `integer` | `floatnumber` | `imagnumber` |
| |
| The numeric value of a numeric literal is the same as if it were passed as a |
| string to the :class:`int`, :class:`float` or :class:`complex` class |
| constructor, respectively. |
| Note that not all valid inputs for those constructors are also valid literals. |
| |
| Numeric literals do not include a sign; a phrase like ``-1`` is |
| actually an expression composed of the unary operator '``-``' and the literal |
| ``1``. |
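
This can be observed in the parse tree; the sketch below uses the standard
:mod:`ast` module:

```python
import ast

# ``-1`` parses as the unary operator '-' applied to the literal ``1``.
node = ast.parse("-1", mode="eval").body
assert isinstance(node, ast.UnaryOp)
assert isinstance(node.op, ast.USub)
assert isinstance(node.operand, ast.Constant)
assert node.operand.value == 1
```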
| |
| |
| .. index:: |
| single: 0b; integer literal |
| single: 0o; integer literal |
| single: 0x; integer literal |
| single: _ (underscore); in numeric literal |
| |
| .. _integers: |
| |
| Integer literals |
| ---------------- |
| |
| Integer literals denote whole numbers. For example:: |
| |
| 7 |
| 3 |
| 2147483647 |
| |
| There is no limit for the length of integer literals apart from what can be |
| stored in available memory:: |
| |
| 7922816251426433759354395033679228162514264337593543950336 |
| |
| Underscores can be used to group digits for enhanced readability, |
| and are ignored for determining the numeric value of the literal. |
| For example, the following literals are equivalent:: |
| |
| 100_000_000_000 |
| 100000000000 |
| 1_00_00_00_00_000 |
| |
| Underscores can only occur between digits. |
| For example, ``_123``, ``321_``, and ``123__321`` are *not* valid literals. |
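
A small sketch of these rules, using :func:`compile` to show the rejections:

```python
import ast

# Underscores are ignored when determining the value:
assert 100_000_000_000 == 100000000000 == 1_00_00_00_00_000

# Misplaced underscores fail to compile:
for bad in ("321_", "123__321"):
    try:
        compile(bad, "<example>", "eval")
        raise AssertionError("expected a SyntaxError")
    except SyntaxError:
        pass

# ``_123`` is not a literal at all -- it parses as a name (identifier).
assert isinstance(ast.parse("_123", mode="eval").body, ast.Name)
```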
| |
| Integers can be specified in binary (base 2), octal (base 8), or hexadecimal |
| (base 16) using the prefixes ``0b``, ``0o`` and ``0x``, respectively. |
Hexadecimal digits 10 through 15 are represented by the letters ``A``-``F``,
in either case. For example::
| |
| 0b100110111 |
| 0b_1110_0101 |
| 0o177 |
| 0o377 |
| 0xdeadbeef |
| 0xDead_Beef |
| |
| An underscore can follow the base specifier. |
| For example, ``0x_1f`` is a valid literal, but ``0_x1f`` and ``0x__1f`` are |
| not. |
| |
| Leading zeros in a non-zero decimal number are not allowed. |
| For example, ``0123`` is not a valid literal. |
| This is for disambiguation with C-style octal literals, which Python used |
| before version 3.0. |
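
A sketch of the distinction:

```python
# Leading zeros in a non-zero decimal literal are rejected...
rejected = False
try:
    compile("0123", "<example>", "eval")
except SyntaxError:
    rejected = True
assert rejected

# ...while zero itself may repeat its digit, and octal numbers need an
# explicit prefix:
assert 000 == 0
assert 0o123 == 83  # not 123
```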
| |
| Formally, integer literals are described by the following lexical definitions: |
| |
| .. grammar-snippet:: |
| :group: python-grammar |
| |
| integer: `decinteger` | `bininteger` | `octinteger` | `hexinteger` | `zerointeger` |
| decinteger: `nonzerodigit` (["_"] `digit`)* |
| bininteger: "0" ("b" | "B") (["_"] `bindigit`)+ |
| octinteger: "0" ("o" | "O") (["_"] `octdigit`)+ |
| hexinteger: "0" ("x" | "X") (["_"] `hexdigit`)+ |
| zerointeger: "0"+ (["_"] "0")* |
| nonzerodigit: "1"..."9" |
| digit: "0"..."9" |
| bindigit: "0" | "1" |
| octdigit: "0"..."7" |
| hexdigit: `digit` | "a"..."f" | "A"..."F" |
| |
| .. versionchanged:: 3.6 |
| Underscores are now allowed for grouping purposes in literals. |
| |
| |
| .. index:: |
| single: . (dot); in numeric literal |
| single: e; in numeric literal |
| single: _ (underscore); in numeric literal |
| .. _floating: |
| |
| Floating-point literals |
| ----------------------- |
| |
| Floating-point (float) literals, such as ``3.14`` or ``1.5``, denote |
| :ref:`approximations of real numbers <datamodel-float>`. |
| |
| They consist of *integer* and *fraction* parts, each composed of decimal digits. |
| The parts are separated by a decimal point, ``.``:: |
| |
| 2.71828 |
| 4.0 |
| |
| Unlike in integer literals, leading zeros are allowed. |
| For example, ``077.010`` is legal, and denotes the same number as ``77.01``. |
| |
| As in integer literals, single underscores may occur between digits to help |
| readability:: |
| |
| 96_485.332_123 |
| 3.14_15_93 |
| |
| Either of these parts, but not both, can be empty. For example:: |
| |
| 10. # (equivalent to 10.0) |
| .001 # (equivalent to 0.001) |
| |
| Optionally, the integer and fraction may be followed by an *exponent*: |
| the letter ``e`` or ``E``, followed by an optional sign, ``+`` or ``-``, |
| and a number in the same format as the integer and fraction parts. |
| The ``e`` or ``E`` represents "times ten raised to the power of":: |
| |
| 1.0e3 # (represents 1.0×10³, or 1000.0) |
| 1.166e-5 # (represents 1.166×10⁻⁵, or 0.00001166) |
| 6.02214076e+23 # (represents 6.02214076×10²³, or 602214076000000000000000.) |
| |
| In floats with only integer and exponent parts, the decimal point may be |
| omitted:: |
| |
| 1e3 # (equivalent to 1.e3 and 1.0e3) |
| 0e0 # (equivalent to 0.) |
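
The rules above can be summarized in a few checks (a sketch for
illustration):

```python
# Leading zeros are allowed in float literals:
assert 077.010 == 77.01

# The integer part or the fraction part (but not both) may be empty:
assert 10. == 10.0 and .001 == 0.001

# 'e' means "times ten to the power of", and makes the point optional:
assert 1e3 == 1.0e3 == 1000.0
assert 6.02214076e+23 == 602214076000000000000000.
```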
| |
| Formally, floating-point literals are described by the following |
| lexical definitions: |
| |
| .. grammar-snippet:: |
| :group: python-grammar |
| |
| floatnumber: |
| | `digitpart` "." [`digitpart`] [`exponent`] |
| | "." `digitpart` [`exponent`] |
| | `digitpart` `exponent` |
| digitpart: `digit` (["_"] `digit`)* |
| exponent: ("e" | "E") ["+" | "-"] `digitpart` |
| |
| .. versionchanged:: 3.6 |
| Underscores are now allowed for grouping purposes in literals. |
| |
| |
| .. index:: |
| single: j; in numeric literal |
| .. _imaginary: |
| |
| Imaginary literals |
| ------------------ |
| |
| Python has :ref:`complex number <typesnumeric>` objects, but no complex |
| literals. |
| Instead, *imaginary literals* denote complex numbers with a zero |
| real part. |
| |
| For example, in math, the complex number 3+4.2\ *i* is written |
| as the real number 3 added to the imaginary number 4.2\ *i*. |
| Python uses a similar syntax, except the imaginary unit is written as ``j`` |
| rather than *i*:: |
| |
| 3+4.2j |
| |
| This is an expression composed |
| of the :ref:`integer literal <integers>` ``3``, |
| the :ref:`operator <operators>` '``+``', |
| and the :ref:`imaginary literal <imaginary>` ``4.2j``. |
| Since these are three separate tokens, whitespace is allowed between them:: |
| |
| 3 + 4.2j |
| |
| No whitespace is allowed *within* each token. |
In particular, the ``j`` suffix may not be separated from the number
| before it. |
| |
| The number before the ``j`` has the same syntax as a floating-point literal. |
| Thus, the following are valid imaginary literals:: |
| |
| 4.2j |
| 3.14j |
| 10.j |
| .001j |
| 1e100j |
| 3.14e-10j |
| 3.14_15_93j |
| |
Unlike in a floating-point literal, the decimal point can be omitted if the
| imaginary number only has an integer part. |
| The number is still evaluated as a floating-point number, not an integer:: |
| |
| 10j |
| 0j |
| 1000000000000000000000000j # equivalent to 1e+24j |
| |
| The ``j`` suffix is case-insensitive. |
| That means you can use ``J`` instead:: |
| |
| 3.14J # equivalent to 3.14j |
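
A short sketch of these properties:

```python
# An imaginary literal is a complex number with a zero real part:
z = 4.2j
assert type(z) is complex
assert z.real == 0.0 and z.imag == 4.2

# 3+4.2j is three tokens: an int literal, '+', and an imaginary literal.
assert 3 + 4.2j == complex(3, 4.2)

# Integer-looking imaginary literals still hold floats:
assert 10j == complex(0, 10) and (10j).imag == 10.0
assert 3.14J == 3.14j
```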
| |
| Formally, imaginary literals are described by the following lexical definition: |
| |
| .. grammar-snippet:: |
| :group: python-grammar |
| |
| imagnumber: (`floatnumber` | `digitpart`) ("j" | "J") |
| |
| |
| .. _delimiters: |
| .. _operators: |
| .. _lexical-ellipsis: |
| |
| Operators and delimiters |
| ======================== |
| |
| .. index:: |
| single: operators |
| single: delimiters |
| |
| The following grammar defines :dfn:`operator` and :dfn:`delimiter` tokens, |
| that is, the generic :data:`~token.OP` token type. |
| A :ref:`list of these tokens and their names <token_operators_delimiters>` |
| is also available in the :mod:`!token` module documentation. |
| |
| .. grammar-snippet:: |
| :group: python-grammar |
| |
| OP: |
| | assignment_operator |
| | bitwise_operator |
| | comparison_operator |
| | enclosing_delimiter |
| | other_delimiter |
| | arithmetic_operator |
| | "..." |
| | other_op |
| |
| assignment_operator: "+=" | "-=" | "*=" | "**=" | "/=" | "//=" | "%=" | |
| "&=" | "|=" | "^=" | "<<=" | ">>=" | "@=" | ":=" |
| bitwise_operator: "&" | "|" | "^" | "~" | "<<" | ">>" |
| comparison_operator: "<=" | ">=" | "<" | ">" | "==" | "!=" |
| enclosing_delimiter: "(" | ")" | "[" | "]" | "{" | "}" |
| other_delimiter: "," | ":" | "!" | ";" | "=" | "->" |
| arithmetic_operator: "+" | "-" | "**" | "*" | "//" | "/" | "%" |
| other_op: "." | "@" |
| |
| .. note:: |
| |
| Generally, *operators* are used to combine :ref:`expressions <expressions>`, |
| while *delimiters* serve other purposes. |
| However, there is no clear, formal distinction between the two categories. |
| |
| Some tokens can serve as either operators or delimiters, depending on usage. |
| For example, ``*`` is both the multiplication operator and a delimiter used |
   for sequence unpacking, and ``@`` is both the matrix multiplication
   operator and a delimiter that introduces decorators.
| |
| For some tokens, the distinction is unclear. |
   For example, some people consider ``.``, ``(``, and ``)`` to be delimiters,
   while others see them as the attribute access (:py:func:`getattr`) operator
   and the function call operator(s).
| |
| Some of Python's operators, like ``and``, ``or``, and ``not in``, use |
| :ref:`keyword <keywords>` tokens rather than "symbols" (operator tokens). |
| |
| A sequence of three consecutive periods (``...``) has a special |
| meaning as an :py:data:`Ellipsis` literal. |
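
These tokens can be observed with the standard :mod:`tokenize` module (a
small sketch):

```python
import io
import tokenize

# Operators and delimiters share the generic OP token type;
# the ``exact_type`` attribute distinguishes the individual tokens.
source = "x **= 2\ny = ...\n"
tokens = tokenize.generate_tokens(io.StringIO(source).readline)
ops = [tokenize.tok_name[tok.exact_type]
       for tok in tokens if tok.type == tokenize.OP]
assert ops == ["DOUBLESTAREQUAL", "EQUAL", "ELLIPSIS"]

# Three consecutive periods form a single token, the Ellipsis literal:
assert ... is Ellipsis
```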
| |