
 Internationalized Domain Names in Applications (IDNA)
 =====================================================

 Support for the Internationalized Domain Names in
 Applications (IDNA) protocol as specified in `RFC 5891
 <https://tools.ietf.org/html/rfc5891>`_. This is the latest version of
 the protocol and is sometimes referred to as “IDNA 2008”.

 This library also provides support for Unicode Technical
 Standard 46, `Unicode IDNA Compatibility Processing
 <https://unicode.org/reports/tr46/>`_.

 This acts as a suitable replacement for the “encodings.idna”
 module that comes with the Python standard library, but which
 only supports the older superseded IDNA specification (`RFC 3490
 <https://tools.ietf.org/html/rfc3490>`_).

 Basic usage is straightforward:

 .. code-block:: pycon

     >>> import idna
     >>> idna.encode('ドメイン.テスト')
     b'xn--eckwd4c7c.xn--zckzah'
     >>> print(idna.decode('xn--eckwd4c7c.xn--zckzah'))
     ドメイン.テスト


 Installation
 ------------

 This package is available for installation from PyPI:

 .. code-block:: bash

     $ python3 -m pip install idna


 Usage
 -----

 For typical usage, the ``encode`` and ``decode`` functions will take a
 domain name argument and perform a conversion to A-labels or U-labels
 respectively.

 .. code-block:: pycon

     >>> import idna
     >>> idna.encode('ドメイン.テスト')
     b'xn--eckwd4c7c.xn--zckzah'
     >>> print(idna.decode('xn--eckwd4c7c.xn--zckzah'))
     ドメイン.テスト

 You may also use the standard codec interface by importing the
 ``idna.codec`` module, which registers the ``idna2008`` codec:

 .. code-block:: pycon

     >>> import idna.codec
     >>> print('домен.испытание'.encode('idna2008'))
     b'xn--d1acufc.xn--80akhbyknj4f'
     >>> print(b'xn--d1acufc.xn--80akhbyknj4f'.decode('idna2008'))
     домен.испытание

 Conversions can be applied on a per-label basis using the ``ulabel`` or
 ``alabel`` functions if necessary:

 .. code-block:: pycon

     >>> idna.alabel('测试')
     b'xn--0zwm56d'
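
 The reverse direction works the same way. As a brief sketch, assuming
 ``ulabel`` is handed the ``bytes`` A-label produced by ``alabel``, a single
 label can be round-tripped like this:

 .. code-block:: python

     import idna

     # Convert one label (not a full domain name) to its A-label form.
     a_label = idna.alabel('测试')

     # Convert the A-label back to its Unicode (U-label) form.
     u_label = idna.ulabel(a_label)

     print(a_label)
     print(u_label)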

 Compatibility Mapping (UTS #46)
 +++++++++++++++++++++++++++++++

 As described in `RFC 5895 <https://tools.ietf.org/html/rfc5895>`_, the
 IDNA specification does not normalize input from different potential
 ways a user may input a domain name. This functionality, known as
 a “mapping”, is considered by the specification to be a local
 user-interface issue distinct from IDNA conversion functionality.

 This library provides one such mapping that was developed by the
 Unicode Consortium. Known as `Unicode IDNA Compatibility Processing
 <https://unicode.org/reports/tr46/>`_, it provides for both a regular
 mapping for typical applications, as well as a transitional mapping to
 help migrate from older IDNA 2003 applications.

 For example, “Königsgäßchen” is not a permissible label as *LATIN
 CAPITAL LETTER K* is not allowed (nor are capital letters in general).
 UTS 46 will convert this into lower case prior to applying the IDNA
 conversion.

 .. code-block:: pycon

     >>> import idna
     >>> idna.encode('Königsgäßchen')
     ...
     idna.core.InvalidCodepoint: Codepoint U+004B at position 1 of 'Königsgäßchen' not allowed
     >>> idna.encode('Königsgäßchen', uts46=True)
     b'xn--knigsgchen-b4a3dun'
     >>> print(idna.decode('xn--knigsgchen-b4a3dun'))
     königsgäßchen

 Transitional processing provides conversions to help transition from
 the older 2003 standard to the current standard. For example, in the
 original IDNA specification, the *LATIN SMALL LETTER SHARP S* (ß) was
 converted into two *LATIN SMALL LETTER S* (ss), whereas in the current
 IDNA specification this conversion is not performed.

 .. code-block:: pycon

     >>> idna.encode('Königsgäßchen', uts46=True, transitional=True)
     b'xn--knigsgsschen-lcb0w'

 Implementers should use transitional processing with caution, and only in
 the rare cases where labels produced by legacy implementations (i.e. IDNA
 implementations that pre-date 2008) must be converted to current labels.
 For typical applications that just need to convert labels, transitional
 processing is unlikely to be beneficial and could produce unexpected,
 incompatible results.
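
 For comparison, the legacy IDNA 2003 behaviour can be observed with the
 standard library's built-in ``idna`` codec (``encodings.idna``). The
 following sketch uses only the standard library and is meant as an
 illustration of the older mapping, not as a recommended code path:

 .. code-block:: python

     # The built-in "idna" codec follows the older RFC 3490 rules,
     # which lower-case the input and fold the sharp s into "ss".
     legacy = 'Königsgäßchen'.encode('idna')

     print(legacy)
     print(legacy.decode('idna'))

 Because the sharp s is folded before encoding, decoding the legacy result
 yields ``königsgässchen`` rather than the original spelling.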

 ``encodings.idna`` Compatibility
 ++++++++++++++++++++++++++++++++

 Function calls from the Python built-in ``encodings.idna`` module are
 mapped to their IDNA 2008 equivalents using the ``idna.compat`` module.
 Simply substitute the ``import`` clause in your code to refer to the new
 module name.
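
 A minimal sketch of that substitution, assuming the code being ported only
 calls ``ToASCII`` and ``ToUnicode`` on individual labels:

 .. code-block:: python

     # Before: from encodings.idna import ToASCII, ToUnicode
     from idna.compat import ToASCII, ToUnicode

     # The call sites stay the same; only the import line changes.
     encoded = ToASCII('测试')
     decoded = ToUnicode(encoded)

     print(encoded)
     print(decoded)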

 Exceptions
 ----------

 Any error encountered during conversion, as defined by the specification,
 should raise an exception derived from the ``idna.IDNAError`` base
 class.

 More specific exceptions may also be raised: ``idna.IDNABidiError``
 when the error reflects an illegal combination of left-to-right and
 right-to-left characters in a label; ``idna.InvalidCodepoint`` when
 a specific codepoint is an illegal character in an IDN label (i.e.
 INVALID); and ``idna.InvalidCodepointContext`` when the codepoint is
 illegal based on its positional context (i.e. it is CONTEXTO or CONTEXTJ
 but the contextual requirements are not satisfied).
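
 One way to handle these exceptions is sketched below. The helper name
 ``to_a_labels`` is hypothetical and chosen only for illustration; the
 specific exception is caught before the base class so that the more
 informative branch runs first:

 .. code-block:: python

     import idna

     def to_a_labels(name):
         """Return the encoded domain name, or None if conversion fails."""
         try:
             return idna.encode(name)
         except idna.InvalidCodepoint as exc:
             # A codepoint that is never allowed in an IDN label.
             print(f'disallowed codepoint: {exc}')
         except idna.IDNAError as exc:
             # Any other conversion failure (Bidi rules, context rules, ...).
             print(f'invalid domain name: {exc}')
         return None

     print(to_a_labels('ドメイン.テスト'))
     print(to_a_labels('Königsgäßchen'))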

 Building and Diagnostics
 ------------------------

 The IDNA and UTS 46 functionality relies upon pre-calculated lookup
 tables for performance. These tables are derived by evaluating the
 eligibility criteria in the respective standards, and are computed
 using the command-line script ``tools/idna-data``.

 This tool will fetch relevant codepoint data from the Unicode repository
 and perform the required calculations to identify eligibility. There are
 three main modes:

 * ``idna-data make-libdata``. Generates ``idnadata.py`` and
   ``uts46data.py``, the pre-calculated lookup tables used for IDNA and
   UTS 46 conversions. Implementers who wish to track this library against
   a different Unicode version may use this tool to manually generate a
   different version of the ``idnadata.py`` and ``uts46data.py`` files.

 * ``idna-data make-table``. Generates a table of the IDNA disposition
   (e.g. PVALID, CONTEXTJ, CONTEXTO) in the format found in Appendix
   B.1 of RFC 5892 and the pre-computed tables published by `IANA
   <https://www.iana.org/>`_.

 * ``idna-data U+0061``. Prints debugging output on the various
   properties associated with an individual Unicode codepoint (in this
   case, U+0061), that are used to assess the IDNA and UTS 46 status of a
   codepoint. This is helpful in debugging or analysis.

 The tool accepts a number of arguments, described using ``idna-data
 -h``. Most notably, the ``--version`` argument allows the specification
 of the version of Unicode to be used in computing the table data. For
 example, ``idna-data --version 9.0.0 make-libdata`` will generate
 library data against Unicode 9.0.0.
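
 As a rough illustration of the kind of per-codepoint properties that
 eligibility rules are defined over, the standard library's ``unicodedata``
 module exposes a few of them. This is not the output of ``idna-data``, and
 it reflects whichever Unicode version is bundled with the running
 interpreter rather than one selected with ``--version``:

 .. code-block:: python

     import unicodedata

     ch = '\u0061'  # U+0061, the codepoint used in the example above

     # A small, illustrative subset of Unicode properties for one codepoint.
     props = {
         'name': unicodedata.name(ch),
         'general_category': unicodedata.category(ch),
         'bidi_class': unicodedata.bidirectional(ch),
         'combining_class': unicodedata.combining(ch),
     }

     print('Unicode version:', unicodedata.unidata_version)
     print(props)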


 Additional Notes
 ----------------

 * **Packages**. The latest tagged release version is published in the
   `Python Package Index <https://pypi.org/project/idna/>`_.

 * **Version support**. This library supports Python 3.5 and higher.
   As this library serves as a low-level toolkit for a variety of
   applications, many of which strive for broad compatibility with older
   Python versions, there is no rush to remove older interpreter support.
   Removing support for older versions should be well justified in that the
   maintenance burden has become too high.

 * **Python 2**. Python 2 is supported by version 2.x of this library.
   While active development of the version 2.x series has ended, notable
   issues being corrected may be backported to 2.x. Use "idna<3" in your
   requirements file if you need this library for a Python 2 application.

 * **Testing**. The library has a test suite based on each rule of the
   IDNA specification, as well as tests that are provided as part of the
   Unicode Technical Standard 46, `Unicode IDNA Compatibility Processing
   <https://unicode.org/reports/tr46/>`_.

 * **Emoji**. It is an occasional request to support emoji domains in
   this library. Encoding of symbols like emoji is expressly prohibited by
   the technical standard IDNA 2008 and emoji domains are broadly phased
   out across the domain industry due to associated security risks. For
   now, applications that need to support these non-compliant labels
   may wish to consider trying the encode/decode operation in this library
   first, and then falling back to using ``encodings.idna``. See `the GitHub
   project <https://github.com/kjd/idna/issues/18>`_ for more discussion.
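
 A minimal sketch of the emoji fallback described in the last note, assuming
 the standard library's ``idna`` codec accepts the label in question. The
 helper name ``encode_with_fallback`` is hypothetical, and anything encoded
 by the fallback branch follows the older IDNA 2003 rules:

 .. code-block:: python

     import idna

     def encode_with_fallback(name):
         """Try IDNA 2008 first, then fall back to the stdlib codec."""
         try:
             return idna.encode(name)
         except idna.IDNAError:
             # Non-compliant label (for example an emoji domain):
             # use the standard library's legacy "idna" codec instead.
             return name.encode('idna')

     print(encode_with_fallback('ドメイン.テスト'))
     print(encode_with_fallback('💩.la'))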