commit | fe3c71592ccc6fd6f3909215e326ffc8fe0c35ce | |
---|---|---|
author | Jungshik Shin <jshin@chromium.org> | Sat Jan 13 01:11:09 2018 |
committer | Commit Bot <commit-bot@chromium.org> | Sat Jan 13 01:11:09 2018 |
tree | 1562b41301a6658acde4567d6d8ea8dd9020f7ed | |
parent | 1f19e3ea6cf6acd1a06adf6552285032360accd2 | |
Add more confusable character map entries

When comparing domain names with the top 10k domain names for confusability, characters with diacritics are decomposed into base + diacritic marks (Unicode Normalization Form D) and the diacritics are dropped before calculating the confusability skeleton, because a character with a diacritic and the same character without it would NOT otherwise be regarded as confusable.

However, there are about a dozen characters (most of them Cyrillic) that have a diacritic-like mark attached but are not decomposed into base + diacritic by NFD (e.g. U+049B, қ, Cyrillic Small Letter Ka with Descender). This CL treats them the same way as their "base" characters. For instance, қ (U+049B) is treated as confusable with Latin k because к (U+043A, Cyrillic Small Letter Ka) is.

They were curated from the following sets:

  [:IdentifierStatus=Allowed:] & [:Ll:] & [[:sc=Cyrillic:] - [[\u01cd-\u01dc][\u1c80-\u1c8f][\u1e00-\u1e9b][\u1f00-\u1fff] [\ua640-\ua69f][\ua720-\ua7ff]]] & [:NFD_Inert=Yes:]
  [:IdentifierStatus=Allowed:] & [:Ll:] & [[:sc=Latin:] - [[\u01cd-\u01dc][\u1e00-\u1e9b][\ua720-\ua7ff]]] & [:NFD_Inert=Yes:]
  [:IdentifierStatus=Allowed:] & [:Ll:] & [[:sc=Greek:]] & [:NFD_Inert=Yes:]

Bug: 793628,798892
Test: components_unittests --gtest_filter=*IDN*
Change-Id: I20c6af13defa295f6952f33d75987e87ce1853d0
Reviewed-on: https://chromium-review.googlesource.com/860567
Commit-Queue: Jungshik Shin <jshin@chromium.org>
Reviewed-by: Eric Lawrence <elawrence@chromium.org>
Reviewed-by: Peter Kasting <pkasting@chromium.org>
Cr-Commit-Position: refs/heads/master@{#529129}
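The pipeline described in the commit message (NFD, drop the diacritics, fold NFD-inert "diacritic-like" letters to their base, then compute a skeleton) can be sketched as below. This is a minimal illustration, not Chromium's IDN spoof checker code. It assumes a tiny hand-written Cyrillic-to-Latin table instead of the real UTS #39 confusable data. The names `skeleton`, `EXTRA_BASE_MAP`, and `TOY_CONFUSABLES` are hypothetical, and the fold entries are only examples of the kind of mapping the CL adds.

```python
import unicodedata

# HYPOTHETICAL fold table in the spirit of this CL: NFD-inert letters that carry a
# diacritic-like mark are folded to their "base" letter. Illustrative entries only.
EXTRA_BASE_MAP = {
    "\u049b": "\u043a",  # қ KA WITH DESCENDER -> к KA
    "\u04ab": "\u0441",  # ҫ ES WITH DESCENDER -> с ES
    "\u04b3": "\u0445",  # ҳ HA WITH DESCENDER -> х HA
}

# HYPOTHETICAL toy confusable table (Cyrillic -> Latin); NOT the UTS #39 data.
TOY_CONFUSABLES = {
    "\u0430": "a", "\u0435": "e", "\u043a": "k", "\u043e": "o",
    "\u0440": "p", "\u0441": "c", "\u0443": "y", "\u0445": "x",
}


def skeleton(label: str, fold_inert: bool = True) -> str:
    """Toy confusability skeleton for a lowercase domain label."""
    # NFD splits precomposed letters (e.g. é) into base + combining marks.
    decomposed = unicodedata.normalize("NFD", label)
    # Drop nonspacing marks, so "é" and "e" end up with the same skeleton.
    no_marks = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    # NFD leaves letters such as қ intact; fold them to their base letter.
    if fold_inert:
        no_marks = "".join(EXTRA_BASE_MAP.get(c, c) for c in no_marks)
    # Map every remaining character through the confusable table.
    return "".join(TOY_CONFUSABLES.get(c, c) for c in no_marks)


print(skeleton("\u049b\u0435\u0443"))                    # prints "key"
print(skeleton("\u049b\u0435\u0443", fold_inert=False))  # prints "қey"
```

Without the fold step, қ survives NFD unchanged and has no entry in the confusable table, so "қеу" does not collide with "key". With the fold, it takes the same path as к and produces the same skeleton as the Latin label.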
Chromium is an open-source browser project that aims to build a safer, faster, and more stable way for all users to experience the web.
The project's web site is https://www.chromium.org.
Documentation in the source is rooted in docs/README.md.
Learn how to Get Around the Chromium Source Code Directory Structure.