 .. _introduction:

 ************
 Introduction
 ************

 This reference manual describes the Python programming language. It is not
 intended as a tutorial.

 While I am trying to be as precise as possible, I chose to use English rather
 than formal specifications for everything except syntax and lexical analysis.
 This should make the document more understandable to the average reader, but
 will leave room for ambiguities. Consequently, if you were coming from Mars and
 tried to re-implement Python from this document alone, you might have to guess
 things and in fact you would probably end up implementing quite a different
 language. On the other hand, if you are using Python and wonder what the precise
 rules about a particular area of the language are, you should definitely be able
 to find them here. If you would like to see a more formal definition of the
 language, maybe you could volunteer your time --- or invent a cloning machine
 :-).

 It is dangerous to add too many implementation details to a language reference
 document --- the implementation may change, and other implementations of the
 same language may work differently.  On the other hand, CPython is the one
 Python implementation in widespread use (although alternate implementations
 continue to gain support), and its particular quirks are sometimes worth
 mentioning, especially where the implementation imposes additional limitations.
 Therefore, you'll find short "implementation notes" sprinkled throughout the
 text.

 Every Python implementation comes with a number of built-in and standard
 modules.  These are documented in :ref:`library-index`.  A few built-in modules
 are mentioned when they interact in a significant way with the language
 definition.


 .. _implementations:

 Alternate Implementations
 =========================

 Though there is one Python implementation which is by far the most popular,
 there are some alternate implementations which are of particular interest to
 different audiences.

 Known implementations include:

 CPython
    This is the original and most-maintained implementation of Python, written in C.
    New language features generally appear here first.

 Jython
    Python implemented in Java.  This implementation can be used as a scripting
    language for Java applications, or can be used to create applications using the
    Java class libraries.  It is also often used to create tests for Java libraries.
    More information can be found at `the Jython website <https://www.jython.org/>`_.

 Python for .NET
    This implementation actually uses the CPython implementation, but is a managed
    .NET application and makes .NET libraries available.  It was created by Brian
    Lloyd.  For more information, see the `Python for .NET home page
    <https://pythonnet.github.io/>`_.

 IronPython
    An alternate Python for .NET.  Unlike Python.NET, this is a complete Python
    implementation that generates IL, and compiles Python code directly to .NET
    assemblies.  It was created by Jim Hugunin, the original creator of Jython.  For
    more information, see `the IronPython website <https://ironpython.net/>`_.

 PyPy
    An implementation of Python written completely in Python. It supports several
    advanced features not found in other implementations, such as stackless
    support and a just-in-time compiler. One of the goals of the project is to encourage
    experimentation with the language itself by making it easier to modify the
    interpreter (since it is written in Python).  Additional information is
    available on `the PyPy project's home page <https://pypy.org/>`_.

 Each of these implementations varies in some way from the language as documented
 in this manual, or introduces specific information beyond what's covered in the
 standard Python documentation.  Please refer to the implementation-specific
 documentation to determine what else you need to know about the specific
 implementation you're using.
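
As a small illustration, a program can ask which implementation it is running
on. The sketch below relies on the standard library's
``platform.python_implementation()`` and ``sys.implementation``; the helper
name ``describe_implementation`` is hypothetical and exists only for this
example.

```python
import platform
import sys


def describe_implementation():
    """Return (name, version) for the interpreter running this code."""
    # For example 'CPython', 'PyPy', 'Jython' or 'IronPython'.
    name = platform.python_implementation()
    # Version of the implementation itself, as 'major.minor.micro'.
    version = ".".join(str(part) for part in sys.implementation.version[:3])
    return name, version


name, version = describe_implementation()
print(f"Running on {name} {version}")
```

Code that depends on an implementation-specific quirk can branch on the
returned name, although relying only on documented language behavior is
usually the more portable choice.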


 .. _notation:

 Notation
 ========

 .. index:: BNF, grammar, syntax, notation

 The descriptions of lexical analysis and syntax use a grammar notation that
 is a mixture of
 `EBNF <https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form>`_
 and `PEG <https://en.wikipedia.org/wiki/Parsing_expression_grammar>`_.
 For example:

 .. grammar-snippet::
    :group: notation

    name:   `letter` (`letter` | `digit` | "_")*
    letter: "a"..."z" | "A"..."Z"
    digit:  "0"..."9"

 In this example, the first line says that a ``name`` is a ``letter`` followed
 by a sequence of zero or more ``letter``\ s, ``digit``\ s, and underscores.
 A ``letter`` in turn is any of the single characters ``a`` through
 ``z`` or ``A`` through ``Z``; a ``digit`` is a single character from ``0``
 to ``9``.
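
To make the example rules concrete, here is a minimal Python sketch of a
matcher for them. It implements only the three example rules above (``name``,
``letter``, ``digit``), not Python's actual identifier rules, and the function
name ``is_name`` is hypothetical.

```python
import string

# letter: "a"..."z" | "A"..."Z"
LETTERS = set(string.ascii_lowercase + string.ascii_uppercase)
# digit: "0"..."9"
DIGITS = set(string.digits)


def is_name(text):
    """Return True if all of text matches the example rule ``name``."""
    # name: letter (letter | digit | "_")*
    if not text or text[0] not in LETTERS:
        return False
    return all(ch in LETTERS or ch in DIGITS or ch == "_" for ch in text[1:])


print(is_name("x1_y"))  # True: a letter followed by letters, digits, underscores
print(is_name("_x"))    # False: the example rule requires a leading letter
print(is_name("1x"))    # False: a digit cannot come first
```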

 Each rule begins with a name (which identifies the rule that's being defined)
 followed by a colon, ``:``.
 The definition to the right of the colon uses the following syntax elements:

 * ``name``: A name refers to another rule.
   Where possible, it is a link to the rule's definition.

   * ``TOKEN``: An uppercase name refers to a :term:`token`.
     For the purposes of grammar definitions, tokens are the same as rules.

 * ``"text"``, ``'text'``: Text in single or double quotes must match literally
   (without the quotes). The type of quote is chosen according to the meaning
   of ``text``:

   * ``'if'``: A name in single quotes denotes a :ref:`keyword <keywords>`.
   * ``"case"``: A name in double quotes denotes a
     :ref:`soft-keyword <soft-keywords>`.
   * ``'@'``: A non-letter symbol in single quotes denotes an
     :py:data:`~token.OP` token, that is, a :ref:`delimiter <delimiters>` or
     :ref:`operator <operators>`.

 * ``e1 e2``: Items separated only by whitespace denote a sequence.
   Here, ``e1`` must be followed by ``e2``.
 * ``e1 | e2``: A vertical bar is used to separate alternatives.
   It denotes PEG's "ordered choice": if ``e1`` matches, ``e2`` is
   not considered.
   In traditional PEG grammars, this is written as a slash, ``/``, rather than
   a vertical bar.
   See :pep:`617` for more background and details.
 * ``e*``: A star means zero or more repetitions of the preceding item.
 * ``e+``: Likewise, a plus means one or more repetitions.
 * ``[e]``: A phrase enclosed in square brackets means zero or
   one occurrence. In other words, the enclosed phrase is optional.
 * ``e?``: A question mark has exactly the same meaning as square brackets:
   the preceding item is optional.
 * ``(e)``: Parentheses are used for grouping.
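
The elements listed above can be sketched as small Python functions. This is
an illustrative toy, not how CPython's parser is implemented: each
hypothetical helper (``literal``, ``sequence``, ``choice``, ``star``,
``optional``) returns a function that takes the input and a position and
returns the position after the match, or ``None`` on failure.

```python
def literal(text):
    """"text": match the given characters exactly."""
    def parse(source, pos):
        return pos + len(text) if source.startswith(text, pos) else None
    return parse


def sequence(*items):
    """e1 e2: every item must match, one after another."""
    def parse(source, pos):
        for item in items:
            pos = item(source, pos)
            if pos is None:
                return None
        return pos
    return parse


def choice(*alternatives):
    """e1 | e2: ordered choice; the first alternative that matches wins."""
    def parse(source, pos):
        for alternative in alternatives:
            end = alternative(source, pos)
            if end is not None:
                return end
        return None
    return parse


def star(item):
    """e*: zero or more repetitions of item."""
    def parse(source, pos):
        while True:
            end = item(source, pos)
            if end is None or end == pos:  # stop on failure or empty match
                return pos
            pos = end
    return parse


def optional(item):
    """[e] or e?: zero or one occurrence of item."""
    def parse(source, pos):
        end = item(source, pos)
        return pos if end is None else end
    return parse


# Ordered choice: once "a" matches, the alternative "ab" is not considered.
print(choice(literal("a"), literal("ab"))("ab", 0))  # 1
print(choice(literal("ab"), literal("a"))("ab", 0))  # 2

# "x" "y"* ["z"] applied to "xyyz" consumes all four characters.
rule = sequence(literal("x"), star(literal("y")), optional(literal("z")))
print(rule("xyyz", 0))  # 4
```

The first two calls show why the order of alternatives matters in this
notation: the same input is consumed differently depending on which
alternative is listed first.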

 The following notation is only used in
 :ref:`lexical definitions <notation-lexical-vs-syntactic>`.

 * ``"a"..."z"``: Two literal characters separated by three dots mean a choice
   of any single character in the given (inclusive) range of ASCII characters.
 * ``<...>``: A phrase between angle brackets gives an informal description
   of the matched symbol (for example, ``<any ASCII character except "\">``),
   or an abbreviation that is defined in nearby text (for example, ``<Lu>``).

 .. _lexical-lookaheads:

 Some definitions also use *lookaheads*, which indicate that an element
 must (or must not) match at a given position, but without consuming any input:

 * ``&e``: a positive lookahead (that is, ``e`` is required to match)
 * ``!e``: a negative lookahead (that is, ``e`` is required *not* to match)
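
The two lookaheads can be sketched in Python as follows. As before, this is a
toy model under stated assumptions rather than CPython's implementation: the
hypothetical helpers return the position after a match, or ``None`` on
failure, and a lookahead always returns the position it was given.

```python
def literal(text):
    """Match the given characters exactly."""
    def parse(source, pos):
        return pos + len(text) if source.startswith(text, pos) else None
    return parse


def sequence(*items):
    """Match every item, one after another."""
    def parse(source, pos):
        for item in items:
            pos = item(source, pos)
            if pos is None:
                return None
        return pos
    return parse


def positive_lookahead(item):
    """&e: succeed only if item matches here, without consuming input."""
    def parse(source, pos):
        return pos if item(source, pos) is not None else None
    return parse


def negative_lookahead(item):
    """!e: succeed only if item does not match here, without consuming input."""
    def parse(source, pos):
        return pos if item(source, pos) is None else None
    return parse


# "a" &"b": requires "b" after "a", but only "a" is consumed.
a_before_b = sequence(literal("a"), positive_lookahead(literal("b")))
print(a_before_b("ab", 0))  # 1
print(a_before_b("ac", 0))  # None

# "a" !"b": requires that "b" does not follow "a".
a_not_before_b = sequence(literal("a"), negative_lookahead(literal("b")))
print(a_not_before_b("ac", 0))  # 1
print(a_not_before_b("ab", 0))  # None
```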

 The unary operators (``*``, ``+``, ``?``) bind as tightly as possible;
 the vertical bar (``|``) binds most loosely.

 Whitespace is meaningful only as a separator between tokens.

 Rules are normally contained on a single line, but rules that are too long
 may be wrapped:

 .. grammar-snippet::
    :group: notation

    literal: stringliteral | bytesliteral
             | integer | floatnumber | imagnumber

 Alternatively, rules may be formatted with the first line ending at the colon,
 and each alternative beginning with a vertical bar on a new line.
 For example:


 .. grammar-snippet::
    :group: notation-alt

    literal:
       | stringliteral
       | bytesliteral
       | integer
       | floatnumber
       | imagnumber

 This does *not* mean that there is an empty first alternative.

 .. index:: lexical definitions

 .. _notation-lexical-vs-syntactic:

 Lexical and Syntactic definitions
 ---------------------------------

 There is some difference between *lexical* and *syntactic* analysis:
 the :term:`lexical analyzer` operates on the individual characters of the
 input source, while the *parser* (syntactic analyzer) operates on the stream
 of :term:`tokens <token>` generated by the lexical analysis.
 However, in some cases the exact boundary between the two phases is a
 CPython implementation detail.

 The practical difference between the two is that in *lexical* definitions,
 all whitespace is significant.
 The lexical analyzer :ref:`discards <whitespace>` all whitespace that is not
 converted to tokens like :data:`token.INDENT` or :data:`~token.NEWLINE`.
 *Syntactic* definitions then use these tokens, rather than source characters.
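
One way to observe this is the standard library's ``tokenize`` module, which
exposes a token stream for Python source. The sketch below is only an
approximation of what the parser sees: ``tokenize`` also reports tokens such
as ``COMMENT`` and ``NL`` that the parser ignores, and the sample source
string is made up for this example.

```python
import io
import tokenize

# Hypothetical sample input: one compound statement with an indented body.
source = "if x:\n    y = 1\n"

# generate_tokens() takes a readline callable that returns lines of text.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    # tok_name maps numeric token types to names such as NAME or INDENT.
    print(tokenize.tok_name[tok.type], repr(tok.string))

token_names = [tokenize.tok_name[tok.type] for tok in tokens]
```

The space between ``y``, ``=`` and ``1`` does not appear as a token of its
own, while the line ends and the change in indentation show up as
``NEWLINE``, ``INDENT`` and ``DEDENT`` tokens.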

 This documentation uses the same BNF grammar for both styles of definitions.
 All uses of BNF in the next chapter (:ref:`lexical`) are lexical definitions;
 uses in subsequent chapters are syntactic definitions.
