Add more confusable character map entries

When domain names are compared with the top 10k domain names for
confusability, characters with diacritics are decomposed into base +
diacritic marks (Unicode Normalization Form D) and the diacritics are
dropped before calculating the confusability skeleton, because a
character with a diacritic and the same character without one are
otherwise NOT regarded as confusable.
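
As a rough illustration of the decompose-and-drop step described above,
here is a minimal Python sketch. It uses the standard unicodedata module
rather than the ICU code this CL touches, and the helper name
strip_diacritics is hypothetical. It assumes that "diacritics" means
nonspacing marks (general category Mn).

```python
import unicodedata


def strip_diacritics(label: str) -> str:
    """Decompose to NFD, then drop nonspacing marks (category Mn).

    Hypothetical helper for illustration; not the function used in the CL.
    """
    decomposed = unicodedata.normalize("NFD", label)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# U+0439 (й) decomposes into U+0438 (и) + U+0306, so the breve is dropped.
print(strip_diacritics("\u0439") == "\u0438")
# U+049B (қ) has no canonical decomposition, so it passes through unchanged.
print(strip_diacritics("\u049b") == "\u049b")
```

The second call shows the gap this CL addresses: NFD alone leaves a
character such as қ untouched.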

However, there are about a dozen characters (most of them Cyrillic)
that have a diacritic-like mark attached but are not decomposed into
base + diacritic by NFD (e.g. U+049B, қ; Cyrillic Small Letter Ka
with Descender). This CL treats them the same way as their "base"
characters. For instance, қ (U+049B) is treated as confusable with
Latin k because к (U+043A; Cyrillic Small Letter Ka) is.
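
The idea can be sketched in Python as an extra mapping applied after
diacritic removal. Everything here is an assumption for illustration:
EXTRA_BASE_MAP and SKELETON_MAP are tiny hypothetical tables holding only
the example from this message, not the entries actually added by the CL
or the real confusables data.

```python
import unicodedata

# Hypothetical table: NFD-inert characters mapped to their "base" characters.
EXTRA_BASE_MAP = {"\u049b": "\u043a"}  # қ -> к

# Hypothetical stand-in for a confusability skeleton table.
SKELETON_MAP = {"\u043a": "k"}  # к -> Latin k


def toy_skeleton(label: str) -> str:
    """Drop diacritics, map NFD-inert look-alikes to their base, then skeletonize."""
    nfd = unicodedata.normalize("NFD", label)
    stripped = "".join(c for c in nfd if unicodedata.category(c) != "Mn")
    based = "".join(EXTRA_BASE_MAP.get(c, c) for c in stripped)
    return "".join(SKELETON_MAP.get(c, c) for c in based)


# қ, к and Latin k all end up with the same toy skeleton.
print(toy_skeleton("\u049b") == toy_skeleton("\u043a") == toy_skeleton("k"))
```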

The new entries are curated from the following sets:

[:IdentifierStatus=Allowed:] &  [:Ll:] &
  [[:sc=Cyrillic:] -
  [[\u01cd-\u01dc][\u1c80-\u1c8f][\u1e00-\u1e9b][\u1f00-\u1fff]
  [\ua640-\ua69f][\ua720-\ua7ff]]] &
[:NFD_Inert=Yes:]

[:IdentifierStatus=Allowed:] &  [:Ll:] &
  [[:sc=Latin:] - [[\u01cd-\u01dc][\u1e00-\u1e9b][\ua720-\ua7ff]]] &
[:NFD_Inert=Yes:]

[:IdentifierStatus=Allowed:] &  [:Ll:] & [[:sc=Greek:]] &
[:NFD_Inert=Yes:]
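
For readers without a UnicodeSet tool at hand, the Cyrillic set above
can be approximated in Python. This is a sketch under several
assumptions: the IdentifierStatus=Allowed filter is omitted because the
standard library does not expose it, the script is guessed from the
character name prefix, only the BMP is scanned, and NFD_Inert is
approximated as "unchanged by NFD and combining class 0". The function
names are hypothetical.

```python
import unicodedata

# Ranges subtracted from the Cyrillic set in this message.
EXCLUDED = [(0x01CD, 0x01DC), (0x1C80, 0x1C8F), (0x1E00, 0x1E9B),
            (0x1F00, 0x1FFF), (0xA640, 0xA69F), (0xA720, 0xA7FF)]


def is_nfd_inert(ch: str) -> bool:
    """Approximation of NFD_Inert: NFD leaves ch alone and ccc is 0."""
    return unicodedata.normalize("NFD", ch) == ch and unicodedata.combining(ch) == 0


def cyrillic_candidates() -> list:
    """Lowercase, NFD-inert Cyrillic letters in the BMP outside EXCLUDED."""
    found = []
    for cp in range(0x10000):
        if any(lo <= cp <= hi for lo, hi in EXCLUDED):
            continue
        ch = chr(cp)
        if unicodedata.category(ch) != "Ll":
            continue
        # Assumption: the name prefix stands in for [:sc=Cyrillic:].
        if not unicodedata.name(ch, "").startswith("CYRILLIC"):
            continue
        if is_nfd_inert(ch):
            found.append(ch)
    return found


candidates = cyrillic_candidates()
print("\u049b" in candidates)  # қ has no canonical decomposition
print("\u0439" in candidates)  # й decomposes under NFD
```

The result is a superset to curate by hand, not the final list of map
entries.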

Bug: 793628,798892
Test: components_unittests --gtest_filter=*IDN*
Change-Id: I20c6af13defa295f6952f33d75987e87ce1853d0
Reviewed-on: https://chromium-review.googlesource.com/860567
Commit-Queue: Jungshik Shin <jshin@chromium.org>
Reviewed-by: Eric Lawrence <elawrence@chromium.org>
Reviewed-by: Peter Kasting <pkasting@chromium.org>
Cr-Commit-Position: refs/heads/master@{#529129}
4 files changed