| Name: DFA UTF-8 Decoder |
| Short Name: utf8-decoder |
| URL: http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ |
| Version: N/A |
| Date: 2010-06-25 |
| License: MIT |
| License File: LICENSE |
| Security Critical: yes |
| Shipped: yes |
| |
| Description: |
| Decodes UTF-8 bytes using a fast and simple definite finite automata. |
| |
| Local modifications: |
| - Rejection state has been mapped to row 0 (instead of row 1) of the DFA, |
| saving some 50 bytes and making the table easier to reason about. |
| - The transitions have been remapped to represent both a state transition and a |
| bit mask for the incoming byte. |
| - The caller must now zero out the code point buffer after successful or |
| unsuccessful state transitions. |
| - Specifically for generalized-utf8-decoder.h: we adapt the original |
| decoder to decode and validate "generalized UTF-8", a variant of UTF-8 |
| used in WTF-8 that can encode surrogates. See |
| https://simonsapin.github.io/wtf-8/#generalized-utf8. There is one |
| fewer state and so the transition table is smaller by one in both |
| dimensions. |