| Internationalized Domain Names in Applications (IDNA) |
| ===================================================== |
| |
| This library provides support for the Internationalized Domain Names in |
| Applications (IDNA) protocol as specified in `RFC 5891 |
| <https://tools.ietf.org/html/rfc5891>`_. This is the latest version of |
| the protocol and is sometimes referred to as “IDNA 2008”. |
| |
| This library also provides support for Unicode Technical |
| Standard 46, `Unicode IDNA Compatibility Processing |
| <https://unicode.org/reports/tr46/>`_. |
| |
| It is a suitable replacement for the ``encodings.idna`` module that |
| comes with the Python standard library, which only supports the |
| older, superseded IDNA specification (`RFC 3490 |
| <https://tools.ietf.org/html/rfc3490>`_). |
| |
| Basic usage is straightforward: |
| |
| .. code-block:: pycon |
| |
| >>> import idna |
| >>> idna.encode('ドメイン.テスト') |
| b'xn--eckwd4c7c.xn--zckzah' |
| >>> print(idna.decode('xn--eckwd4c7c.xn--zckzah')) |
| ドメイン.テスト |
| |
| |
| Installation |
| ------------ |
| |
| This package is available for installation from PyPI: |
| |
| .. code-block:: bash |
| |
| $ python3 -m pip install idna |
| |
| |
| Usage |
| ----- |
| |
| For typical usage, the ``encode`` and ``decode`` functions will take a |
| domain name argument and perform a conversion to A-labels or U-labels |
| respectively. |
| |
| .. code-block:: pycon |
| |
| >>> import idna |
| >>> idna.encode('ドメイン.テスト') |
| b'xn--eckwd4c7c.xn--zckzah' |
| >>> print(idna.decode('xn--eckwd4c7c.xn--zckzah')) |
| ドメイン.テスト |
| |
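| To illustrate what the conversion between U-labels and A-labels |
| involves, the following is a minimal sketch that uses only the |
| standard library ``punycode`` codec. The helper names |
| ``alabel_sketch`` and ``ulabel_sketch`` are hypothetical and are not |
| part of this library. Unlike ``idna.alabel`` and ``idna.ulabel``, they |
| perform no IDNA 2008 validity checks, so they are not a substitute |
| for the real functions. |
| |
```python
ACE_PREFIX = b"xn--"  # prefix that marks a Punycode-encoded label


def alabel_sketch(ulabel: str) -> bytes:
    """Illustrative only: convert one label to its ASCII form, with no validation."""
    if all(ord(ch) < 128 for ch in ulabel):
        # Pure ASCII labels are passed through unchanged.
        return ulabel.encode("ascii")
    return ACE_PREFIX + ulabel.encode("punycode")


def ulabel_sketch(alabel: bytes) -> str:
    """Illustrative only: reverse of alabel_sketch, with no validation."""
    if alabel.startswith(ACE_PREFIX):
        return alabel[len(ACE_PREFIX):].decode("punycode")
    return alabel.decode("ascii")


# A domain name is converted label by label, split on the dots.
name = "ドメイン.テスト"
encoded = b".".join(alabel_sketch(label) for label in name.split("."))
print(encoded)  # b'xn--eckwd4c7c.xn--zckzah'
```
| |
| In practice, ``idna.encode`` and ``idna.decode`` should be preferred, |
| since they also apply the eligibility rules of the specification. |
| |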
| Alternatively, import the ``idna.codec`` module to make the |
| ``idna2008`` codec available to the standard ``str.encode`` and |
| ``bytes.decode`` methods: |
| |
| .. code-block:: pycon |
| |
| >>> import idna.codec |
| >>> print('домен.испытание'.encode('idna2008')) |
| b'xn--d1acufc.xn--80akhbyknj4f' |
| >>> print(b'xn--d1acufc.xn--80akhbyknj4f'.decode('idna2008')) |
| домен.испытание |
| |
| Conversions can be applied on a per-label basis using the ``ulabel`` or |
| ``alabel`` functions if necessary: |
| |
| .. code-block:: pycon |
| |
| >>> idna.alabel('测试') |
| b'xn--0zwm56d' |
| >>> idna.ulabel(b'xn--0zwm56d') |
| '测试' |
| |
| Compatibility Mapping (UTS #46) |
| +++++++++++++++++++++++++++++++ |
| |
| As described in `RFC 5895 <https://tools.ietf.org/html/rfc5895>`_, the |
| IDNA specification does not normalize the different ways in which a |
| user may type a domain name. This functionality, known as a “mapping”, |
| is considered by the specification to be a local user-interface issue |
| distinct from IDNA conversion functionality. |
| |
| This library provides one such mapping — `Unicode IDNA Compatibility |
| Processing <https://unicode.org/reports/tr46/>`_ developed by the Unicode |
| Consortium. Strings are preprocessed according to Section 4.4 |
| “Preprocessing for IDNA2008” prior to the IDNA operations. |
| |
| For example, “Königsgäßchen” is not a permissible label as *LATIN |
| CAPITAL LETTER K* is not allowed (nor are capital letters in general). |
| UTS 46 will convert this into lower case prior to applying the IDNA |
| conversion. |
| |
| .. code-block:: pycon |
| |
| >>> import idna |
| >>> idna.encode('Königsgäßchen') |
| ... |
| idna.core.InvalidCodepoint: Codepoint U+004B at position 1 of 'Königsgäßchen' not allowed |
| >>> idna.encode('Königsgäßchen', uts46=True) |
| b'xn--knigsgchen-b4a3dun' |
| >>> print(idna.decode('xn--knigsgchen-b4a3dun')) |
| königsgäßchen |
| |
| ``encodings.idna`` Compatibility |
| ++++++++++++++++++++++++++++++++ |
| |
| Function calls from the Python built-in ``encodings.idna`` module are |
| mapped to their IDNA 2008 equivalents by the ``idna.compat`` module. |
| Simply change the ``import`` statement in your code to refer to the new |
| module name. |
| |
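| As a brief sketch of such a substitution, the example below assumes |
| that the existing code calls the ``ToASCII`` and ``ToUnicode`` |
| functions of ``encodings.idna``, and that ``idna.compat`` exposes |
| functions with the same names. Only the import line changes. |
| |
```python
# Before: from encodings.idna import ToASCII, ToUnicode
from idna.compat import ToASCII, ToUnicode

# Both functions operate on a single label, as in encodings.idna.
ascii_label = ToASCII("测试")
print(ascii_label)             # b'xn--0zwm56d'
print(ToUnicode(ascii_label))  # 测试
```
| |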
| Exceptions |
| ---------- |
| |
| All errors that arise during conversion because the input does not |
| conform to the specification should raise an exception derived from |
| the ``idna.IDNAError`` base class. |
| |
| More specific exceptions that may be raised are ``idna.IDNABidiError`` |
| when the error reflects an illegal combination of left-to-right and |
| right-to-left characters in a label; ``idna.InvalidCodepoint`` when |
| a specific codepoint is an illegal character in an IDN label (i.e. |
| INVALID); and ``idna.InvalidCodepointContext`` when the codepoint is |
| illegal based on its positional context (i.e. it is CONTEXTO or CONTEXTJ |
| but the contextual requirements are not satisfied). |
| |
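| The exception classes above can be handled from most specific to |
| least specific, as in the following sketch. The helper name |
| ``encode_or_report`` is hypothetical and only illustrates one way an |
| application might react to each kind of failure. |
| |
```python
import idna


def encode_or_report(name):
    """Return the encoded form of name, or None after printing the reason."""
    try:
        return idna.encode(name)
    except idna.IDNABidiError as exc:
        print("Bidi rule violation:", exc)
    except idna.InvalidCodepointContext as exc:
        print("Codepoint not allowed in this position:", exc)
    except idna.InvalidCodepoint as exc:
        print("Codepoint not allowed:", exc)
    except idna.IDNAError as exc:
        # Base class: catches any other error raised by the conversion.
        print("Other IDNA error:", exc)
    return None


print(encode_or_report("ドメイン.テスト"))  # b'xn--eckwd4c7c.xn--zckzah'
print(encode_or_report("Königsgäßchen"))    # reports the capital K, then None
```
| |
| Catching only ``idna.IDNAError`` is sufficient when the application |
| does not need to distinguish between the failure types. |
| |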
| Building and Diagnostics |
| ------------------------ |
| |
| The IDNA and UTS 46 functionality relies upon pre-calculated lookup |
| tables for performance. These tables are derived by evaluating each |
| codepoint against the eligibility criteria in the respective standards, |
| and are computed using the command-line script ``tools/idna-data``. |
| |
| This tool will fetch relevant codepoint data from the Unicode repository |
| and perform the required calculations to identify eligibility. There are |
| three main modes: |
| |
| * ``idna-data make-libdata``. Generates ``idnadata.py`` and |
| ``uts46data.py``, the pre-calculated lookup tables used for IDNA and |
| UTS 46 conversions. Implementers who wish to track this library against |
| a different Unicode version may use this tool to manually generate a |
| different version of the ``idnadata.py`` and ``uts46data.py`` files. |
| |
| * ``idna-data make-table``. Generates a table of the IDNA disposition |
| (e.g. PVALID, CONTEXTJ, CONTEXTO) in the format found in Appendix |
| B.1 of RFC 5892 and the pre-computed tables published by `IANA |
| <https://www.iana.org/>`_. |
| |
| * ``idna-data U+0061``. Prints debugging output on the various |
| properties associated with an individual Unicode codepoint (in this |
| case, U+0061), that are used to assess the IDNA and UTS 46 status of a |
| codepoint. This is helpful in debugging or analysis. |
| |
| The tool accepts a number of arguments, described using ``idna-data |
| -h``. Most notably, the ``--version`` argument allows the specification |
| of the version of Unicode to be used in computing the table data. For |
| example, ``idna-data --version 9.0.0 make-libdata`` will generate |
| library data against Unicode 9.0.0. |
| |
| |
| Additional Notes |
| ---------------- |
| |
| * **Packages**. The latest tagged release version is published in the |
| `Python Package Index <https://pypi.org/project/idna/>`_. |
| |
| * **Version support**. This library supports Python 3.6 and higher. |
| As this library serves as a low-level toolkit for a variety of |
| applications, many of which strive for broad compatibility with older |
| Python versions, there is no rush to remove older interpreter support. |
| Removing support for older versions should be well justified in that the |
| maintenance burden has become too high. |
| |
| * **Python 2**. Python 2 is supported by version 2.x of this library. |
| Use ``idna<3`` in your requirements file if you need this library for |
| a Python 2 application. Be advised that these versions are no longer |
| actively developed. |
| |
| * **Testing**. The library has a test suite based on each rule of the |
| IDNA specification, as well as tests that are provided as part of the |
| Unicode Technical Standard 46, `Unicode IDNA Compatibility Processing |
| <https://unicode.org/reports/tr46/>`_. |
| |
| * **Emoji**. Support for emoji domains in this library is occasionally |
| requested. Encoding of symbols like emoji is expressly prohibited by |
| the technical standard IDNA 2008 and emoji domains are broadly phased |
| out across the domain industry due to associated security risks. For |
| now, applications that need to support these non-compliant labels |
| may wish to consider trying the encode/decode operation in this library |
| first, and then falling back to using ``encodings.idna``. See `the GitHub |
| project <https://github.com/kjd/idna/issues/18>`_ for more discussion. |
| |
| * **Transitional processing**. Unicode 16.0.0 removed transitional |
| processing, so the ``transitional`` argument of the ``encode()`` function |
| no longer has any effect and will be removed at a later date. |
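| |
| The fallback approach mentioned in the emoji note above might look |
| like the following sketch. The helper name ``encode_with_fallback`` is |
| hypothetical, and the standard library ``idna`` codec used as the |
| fallback implements the older IDNA 2003 rules, so the result for |
| such labels is not IDNA 2008 compliant. |
| |
```python
import idna


def encode_with_fallback(name):
    """Try IDNA 2008 first; fall back to the standard library IDNA 2003 codec."""
    try:
        return idna.encode(name)
    except idna.IDNAError:
        # The built-in 'idna' codec (encodings.idna) accepts symbols that
        # IDNA 2008 disallows. It may still raise UnicodeError for other input.
        return name.encode("idna")


print(encode_with_fallback("ドメイン.テスト"))  # b'xn--eckwd4c7c.xn--zckzah'
print(encode_with_fallback("☃.example"))        # b'xn--n3h.example'
```
| |
| Applications that use such a fallback should weigh the security risks |
| noted above before accepting the resulting labels. |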