| Name: icu |
| URL: http://site.icu-project.org/ |
| Version: 62.1 |
| License: MIT |
| Security Critical: yes |
| |
| Description: |
| This directory contains the source code of ICU 62.1 for C/C++. |
| |
| A. How to update ICU |
| |
| 1. Run "scripts/update.sh <version>" (e.g. 60-1). |
| This will download ICU from the upstream svn repository. |
| It does preserve Chrome-specific build files (*local.mk) and |
| converter files. (see section C) |
| |
| BUILD.gn and icu.gyp* files are automatically updated, too. |
| |
| 2. Review and apply patches/changes in "D. Local Modifications" if |
| necessary/applicable. Update patch files in patches/. |
| |
| 3. Follow the instructions in section B on building ICU data files |
| |
| |
| B. How to build ICU data files |
| |
| |
| Pre-built data files are generated and checked in with the following steps |
| |
| 1. icu data files for Chrome OS, Linux, Mac and Windows |
| |
| a. Make a icu data build directory outside the Chromium source tree |
| and cd to that directory (say, $ICUBUILDIR). |
| |
| b. Run |
| |
| ${CHROME_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout --disable-tests |
| |
| |
| c. Run make |
| 'make' will fail when pkgdata looks for root_subset.res. This |
| is expected. See https://unicode-org.atlassian.net/browse/ICU-10570 |
| |
| d. Run |
| ${CHROME_ICU_TREE_TOP}/scripts/make_data_all.sh |
| |
| This script takes the following steps: |
| |
| i) scripts/trim_data.sh |
| The full locale data for Chrome's UI languages and their select variants |
| and the bare minimum locale data for other locales will be kept. |
| |
| ii) scripts/make_data.sh |
| This makes icudt${version}l.dat. |
| |
| iii) scripts/copy_data.sh common |
| This copies the ICU data files for non-Android platforms |
| (both Little and Big Endian) to the following locations: |
| |
| common/icudtl.dat |
| common/icudtb.dat |
| |
| iv) cast/patch_locale.sh |
| On top of trim_data.sh (step d), further cuts the data entries for |
| Cast. |
| |
| v) Repeat ii) and iii) for cast to get cast/icudtl.dat |
| |
| vi) android/patch_locale.sh |
| On top of trim_data.sh (step d), further cuts the data entries for |
| Android. |
| |
| vii) Repeat ii) and iii) for Android to get android/icudtl.dat |
| |
| viii) ios/patch_locale.sh |
| Further cuts the data size for iOS. |
| |
| ix) Repeat ii) and iii) for iOS to get ios/icudtl.dat |
| |
| x) Repeat ii) and iii) for Flutter to get flutter/icudtl.dat |
| |
| xi) scripts/clean_up_data_source.sh |
| |
| This reverts the result of trim_data.sh and patch_locale.sh and |
| make the tree ready for committing updated ICU data files for |
| non-Android and Android platforms. |
| |
| e. Whenever data is updated (e.g timezone update), take step d as long |
| as the ICU build directory used in a ~ c is kept. |
| |
| 2. Note on the locale data customization |
| |
| - scripts/trim_data.sh |
| a. Trim the locale data for Chrome's UI langauges : |
| locales, lang, region, currency, zone |
| b. Trim the locale data for non-UI languages to the bare minimum : |
| ExemplarCharacters, LocaleScript, layout, and the name of the |
| language for a locale in its native language. |
| c. Remove the legacy Chinese character set-based collation |
| (big5han/gb2312han) that don't make any sense and nobdoy uses. |
| |
| - android/patch_locale.sh |
| a. Make changes to source/data/{region,lang} to exclude these data |
| except the language and script names of zh_Hans and zh_Hant. |
| b. Remove exemplar cities in timezone data (data/zone). |
| c. Keep only the minimal calendar data in data/locales. |
| d. Include currency display names for a smaller subset of currencies. |
| e. Minimize the locale data for 9 locales to which Chrome on Android |
| is not localized. |
| f. Also apply android/brkitr.patch |
| |
| - android/brkitr.patch |
| Do not use the C+J dictionary for Chinese/Japanese segmentation |
| to reduce the data size. Adjust word.txt and a few other files. |
| |
| C. Chromium-specific data build files and converters |
| |
| They're preserved in step A.1 above. In general, there's no need to touch |
| them when updating ICU. |
| |
| 1. source/data/mappings |
| - convrtrs.txt : Lists encodings and aliases required by the WHATWG |
| Encoding spec plus a few extra (see the file as to why). |
| |
| - ucmlocal.txt : to list only converters we need. |
| |
| - *html.ucm: Mapping files per WHATWG encoding standards for EUC-JP, |
| Shift_JIS, Big5 (Big5+Big5HKSCS), EUC-KR and all the single byte encodings. |
| They're generated with scripts/{eucjp,sjis,big5,euckr,single_byte}_gen.sh. |
| |
| - gb18030.ucm and windows-936.ucm |
| gb_table.patch was applied for the following changes. No need |
| to apply it again. The patch is kept for the record. |
| a. Map \xA3\xA0 to U+3000 instead of U+E5E5 in gb18030 and windows-936 per |
| the encoding spec (one-way mapping in toUnicode direction). |
| b. Map \xA8\xBF to U+01F9 instead of U+E7C8. Add one-way map |
| from U+1E3F to \xA8\xBC (windows-936/GBK). |
| See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c3 |
| |
| 2. source/data/*/*local.mk |
| - List locales of interest to Chromium |
| a. Chrome's UI languages |
| b. Variants of UI languages |
| c. Other locales in Accept-Language list : will only have bare minimum |
| locale data |
| |
| - brklocal.mk drops some line*brk files to save space for now. |
| |
| 3. source/data/brkitr |
| - dictionaries/khmerdict.txt: Abridged Khmer dictionary. See |
| https://unicode-org.atlassian.net/browse/ICU-9451 |
| - rules/word_ja.txt (used only on Android) |
| Added for Japanese-specific word-breaking without the C+J dictionary. |
| - rules/{fi,root,zh,zh_Hant}.txt |
| a. Drop *_loose.txt for fi/root and use the corresponding line_normal.txt |
| b. Use line_normal by default. |
| c. Drop local patches we used to have for the following issues. They'll |
| be dealt with in the upstream (Unicode/CLDR). |
| http://unicode.org/cldr/trac/ticket/6557 |
| http://unicode.org/cldr/trac/ticket/4200 (http://crbug.com/39779) |
| |
| 4. source/data/trnslit/root_subset.txt |
| Subset of transliteration data to keep for: |
| |
| 5. Add {an,ku,tg,wa}.txt to source/data/{locale,lang} |
| with the minimal locale data necessary for spellchecker and |
| and language menus. |
| |
| D. Local Modifications |
| |
| 1. Applied locale data patches from Google obtained by diff'ing |
| the upstream copy and Google's internal copy for source/data |
| |
| - patches/locale_google.patch: |
| * Google's internal ICU locale changes |
| * Simpler region names for Hong Kong and Macau in all locales |
| * Currency signs in ru and uk locales (do not include 'tr' locale changes) |
| * AM/PM, midnight, noon formatting for a few Indian locales |
| * Timezone name changes in Korean and Chinese locales |
| * Default digit for Arabic locale is European digits. |
| |
| - patches/locale1.patch: Minor fixes for Korean |
| |
| |
| 2. Breakiterator patches |
| - patches/wordbrk.patch for word.txt |
| a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that |
| FQDN labels can be split at '.' |
| b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric. |
| See http://unicode.org/cldr/trac/ticket/6555 |
| |
| - patches/khmer-dictbe.patch |
| Adjust parameters to use a smaller Khmer dictionary (khmerdict.txt). |
| https://unicode-org.atlassian.net/browse/ICU-9451 |
| |
| - Add several common Chinese words that were dropped previously to |
| source/data/cjdict/brkitr/cjdict.txt |
| patch: patches/cjdict.patch |
| upstream bug: https://unicode-org.atlassian.net/browse/ICU-10888 |
| |
| 3. Timezone data update |
| Run scripts/update_tz.sh to grab the latest version of the |
| following timezone data files and put them in source/data/misc |
| |
| metaZones.txt |
| timezoneTypes.txt |
| windowsZones.txt |
| zoneinfo64.txt |
| |
| As of May 4, 2018, the latest version is 2018e and the above files |
| are available at |
| http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2018e/44/ |
| |
| 4. Build-related changes |
| |
| - patches/wpo.patch (only needed when icudata dll is used). |
| upstream bugs : https://unicode-org.atlassian.net/browse/ICU-8043 |
| https://unicode-org.atlassian.net/browse/ICU-5701 |
| - patches/vscomp.patch for building with Visual Studio on Windows: |
| do not use WINDOWS_LOCALE_API in locmap.c |
| |
| - patches/data.build.patch : |
| Remove unnecessary resources : unames, collator rule source |
| - patches/data.build.win.patch : |
| Windows-only data build patch. |
| - patches/data_symb.patch : |
| Put ICU_DATA_ENTRY_POINT(icudtXX_dat) in common when we use |
| the icu data file or icudt.dll |
| |
| 5. Fix -Wsign-compare warning in EnumSet::isValidEnum() |
| |
| - patches/isvalidenum.patch |
| upstream bug: https://unicode-org.atlassian.net/browse/ICU-13509 |
| |
| 7. Update IANA language tag/subtag mapping and add missing canonicalization for |
| deprecated regions |
| |
| - patches/locid_map.patch |
| - upstream bugs: |
| https://unicode-org.atlassian.net/browse/ICU-13726 |
| https://unicode-org.atlassian.net/browse/ICU-13723 |
| https://unicode-org.atlassian.net/browse/ICU-13721 |
| https://unicode-org.atlassian.net/browse/ICU-13720 |
| https://unicode-org.atlassian.net/browse/ICU-13719 |
| |
| 8. Double conversion library build failure |
| |
| - patches/double_conversion.patch |
| - upstream bugs: |
| https://unicode-org.atlassian.net/browse/ICU-13750 |
| https://github.com/google/double-conversion/issues/66 |
| |
| 9. Cherry-pick Greek lowercase fix from the upstream |
| |
| - patches/greek_lowercase.patch |
| - upstream bug (fixed in 62.2-to-be) |
| https://unicode-org.atlassian.net/browse/ICU-13851 |
| |
| 10. Max significant digit is always 6 |
| |
| - patches/nf_maxsig.patch |
| - upstream bug: |
| https://unicode-org.atlassian.net/browse/ICU-13852 |
| |
| 11. Align memory buffer used in Decimal Format |
| |
| - patches/decimalformat_align.patch |
| - upstream bug: |
| https://unicode-org.atlassian.net/browse/ICU-20039 |