
 Name: icu
 URL: http://site.icu-project.org/
 Version: 62.1
 License: MIT
 Security Critical: yes

 Description:
 This directory contains the source code of ICU 62.1 for C/C++.

 A. How to update ICU

 1. Run "scripts/update.sh <version>" (e.g. 60-1).
    This downloads ICU from the upstream svn repository while preserving
    the Chrome-specific build files (*local.mk) and converter files
    (see section C).

    BUILD.gn and icu.gyp* files are automatically updated, too.

 2. Review and apply patches/changes in "D. Local Modifications" if
    necessary/applicable. Update patch files in patches/.

 3. Follow the instructions in section B on building ICU data files.
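
 The <version> argument in step 1 is written with a hyphen (60-1), while
 the Version field at the top uses a dot (62.1). A minimal sketch of a
 wrapper that converts one form to the other and prints the update command
 instead of running it. The helper names are hypothetical, and the idea
 that update.sh expects the hyphenated form for every release rests only
 on the example in step 1.

```shell
#!/bin/sh
# Hypothetical helper: turn a dotted ICU version ("62.1") into the
# hyphenated form shown in step 1 ("62-1").
icu_version_arg() {
  printf '%s\n' "$1" | tr '.' '-'
}

# Print (do not run) the update command for a dotted version.
print_update_command() {
  printf 'scripts/update.sh %s\n' "$(icu_version_arg "$1")"
}

print_update_command 62.1
```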


 B. How to build ICU data files


 Pre-built data files are generated and checked in with the following steps:

 1. ICU data files for Chrome OS, Linux, Mac and Windows

   a. Make an ICU data build directory outside the Chromium source tree
      and cd to that directory (say, $ICUBUILDIR).

   b. Run

     ${CHROME_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout --disable-tests


   c. Run make
      'make' will fail when pkgdata looks for root_subset.res. This
      is expected. See https://unicode-org.atlassian.net/browse/ICU-10570

   d. Run
        ${CHROME_ICU_TREE_TOP}/scripts/make_data_all.sh

      This script takes the following steps:

      i) scripts/trim_data.sh
        The full locale data for Chrome's UI languages and their select variants
        and the bare minimum locale data for other locales will be kept.

      ii) scripts/make_data.sh
        This makes icudt${version}l.dat.

      iii)  scripts/copy_data.sh common
        This copies the ICU data files for non-Android platforms
        (both Little and Big Endian) to the following locations:

        common/icudtl.dat
        common/icudtb.dat

      iv) cast/patch_locale.sh
        On top of trim_data.sh (step i), further cuts the data entries for
        Cast.

      v) Repeat ii) and iii) for cast to get cast/icudtl.dat

      vi) android/patch_locale.sh
        On top of trim_data.sh (step i), further cuts the data entries for
        Android.

      vii) Repeat ii) and iii) for Android to get android/icudtl.dat

      viii) ios/patch_locale.sh
        Further cuts the data size for iOS.

      ix) Repeat ii) and iii) for iOS to get ios/icudtl.dat

      x) Repeat ii) and iii) for Flutter to get flutter/icudtl.dat

      xi) scripts/clean_up_data_source.sh

      This reverts the result of trim_data.sh and patch_locale.sh and
      makes the tree ready for committing updated ICU data files for
      non-Android and Android platforms.

   e. Whenever the data is updated (e.g. a timezone update), repeat step d.
   Steps a through c need not be repeated as long as the ICU build
   directory they produced is kept.
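
 Steps a through d can be sketched as one script. This is a sketch under
 stated assumptions, not the project's own tooling: the run and
 build_icu_data helpers and the DRY_RUN switch are hypothetical, and both
 default paths are placeholders. With DRY_RUN=1 (the default here) each
 command is only printed, so the plan can be reviewed before anything runs.

```shell
#!/bin/sh
# DRY_RUN=1 prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"
# Placeholder paths; point them at a real ICU checkout and build directory.
CHROME_ICU_TREE_TOP="${CHROME_ICU_TREE_TOP:-/path/to/icu}"
ICUBUILDIR="${ICUBUILDIR:-/tmp/icu-data-build}"

# Hypothetical helper: print or execute a command depending on DRY_RUN.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

build_icu_data() {
  # Step a: build directory outside the Chromium source tree.
  run mkdir -p "$ICUBUILDIR" || return 1
  run cd "$ICUBUILDIR" || return 1
  # Step b: configure.
  run "$CHROME_ICU_TREE_TOP/source/runConfigureICU" Linux \
    --disable-layout --disable-tests || return 1
  # Step c: make is expected to fail at root_subset.res (ICU-10570),
  # so a failure here does not stop the script.
  run make || echo "make failed; expected, see ICU-10570"
  # Step d: trim, build and copy the data files for every platform.
  run "$CHROME_ICU_TREE_TOP/scripts/make_data_all.sh"
}

build_icu_data
```

 Because the failure of make in step c is tolerated, an unrelated build
 error would also be ignored; reading the make output before relying on
 the result is still advisable.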

 2. Note on the locale data customization

   - scripts/trim_data.sh
       a. Trim the locale data for Chrome's UI languages :
          locales, lang, region, currency, zone
       b. Trim the locale data for non-UI languages to the bare minimum :
         ExemplarCharacters, LocaleScript, layout, and the name of the
         language for a locale in its native language.
       c. Remove the legacy Chinese character set-based collation
          (big5han/gb2312han) that doesn't make any sense and nobody uses.

   - android/patch_locale.sh
       a. Make changes to source/data/{region,lang} to exclude these data
          except the language and script names of zh_Hans and zh_Hant.
       b. Remove exemplar cities in timezone data (data/zone).
       c. Keep only the minimal calendar data in data/locales.
       d. Include currency display names for a smaller subset of currencies.
       e. Minimize the locale data for 9 locales to which Chrome on Android
          is not localized.
       f. Also apply android/brkitr.patch

   - android/brkitr.patch
       Do not use the C+J dictionary for Chinese/Japanese segmentation
       to reduce the data size. Adjust word.txt and a few other files.
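
 After make_data_all.sh finishes (step B.1.d), a quick check that every
 data file named in that step exists can catch a partially failed run.
 A minimal sketch: the function name is hypothetical, and the file list
 is taken only from the locations mentioned in step B.1.d.

```shell
#!/bin/sh
# Data files named in step B.1.d, relative to the ICU tree top.
ICU_DATA_FILES="common/icudtl.dat common/icudtb.dat cast/icudtl.dat android/icudtl.dat ios/icudtl.dat flutter/icudtl.dat"

# Print every expected data file that is missing or empty under the tree
# given as $1; return non-zero if any is missing.
check_icu_data_files() {
  tree="$1"
  missing=0
  for f in $ICU_DATA_FILES; do
    if [ ! -s "$tree/$f" ]; then
      printf 'missing: %s\n' "$f"
      missing=1
    fi
  done
  return "$missing"
}
```

 Usage: check_icu_data_files "${CHROME_ICU_TREE_TOP}" && echo "all present"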

 C. Chromium-specific data build files and converters

 They're preserved in step A.1 above. In general, there's no need to touch
 them when updating ICU.

 1. source/data/mappings
   - convrtrs.txt : Lists encodings and aliases required by the WHATWG
     Encoding spec plus a few extra (see the file as to why).

   - ucmlocal.txt : Lists only the converters we need.

   - *html.ucm: Mapping files per WHATWG encoding standards for EUC-JP,
     Shift_JIS, Big5 (Big5+Big5HKSCS), EUC-KR and all the single byte encodings.
     They're generated with scripts/{eucjp,sjis,big5,euckr,single_byte}_gen.sh.

   - gb18030.ucm and windows-936.ucm
     gb_table.patch was applied for the following changes. No need
     to apply it again. The patch is kept for the record.
     a. Map \xA3\xA0 to U+3000 instead of U+E5E5 in gb18030 and windows-936 per
     the encoding spec (one-way mapping in toUnicode direction).
     b. Map \xA8\xBF to U+01F9 instead of U+E7C8. Add one-way map
     from U+1E3F to \xA8\xBC (windows-936/GBK).
        See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c3

 2. source/data/*/*local.mk
   - List locales of interest to Chromium
    a. Chrome's UI languages
    b. Variants of UI languages
    c. Other locales in Accept-Language list : will only have bare minimum
    locale data

   - brklocal.mk drops some line*brk files to save space for now.

 3. source/data/brkitr
   - dictionaries/khmerdict.txt: Abridged Khmer dictionary. See
     https://unicode-org.atlassian.net/browse/ICU-9451
   - rules/word_ja.txt (used only on Android)
     Added for Japanese-specific word-breaking without the C+J dictionary.
   - rules/{fi,root,zh,zh_Hant}.txt
     a. Drop *_loose.txt for fi/root and use the corresponding line_normal.txt
     b. Use line_normal by default.
     c. Drop local patches we used to have for the following issues. They'll
        be dealt with in the upstream (Unicode/CLDR).
        http://unicode.org/cldr/trac/ticket/6557
        http://unicode.org/cldr/trac/ticket/4200 (http://crbug.com/39779)

 4. source/data/trnslit/root_subset.txt
    Subset of transliteration data to keep for:

 5. Add {an,ku,tg,wa}.txt to source/data/{locale,lang}
    with the minimal locale data necessary for spellchecker and
    language menus.

 D. Local Modifications

 1. Applied locale data patches from Google obtained by diff'ing
    the upstream copy and Google's internal copy for source/data

   - patches/locale_google.patch:
     * Google's internal ICU locale changes
     * Simpler region names for Hong Kong and Macau in all locales
     * Currency signs in ru and uk locales (do not include 'tr' locale changes)
     * AM/PM, midnight, noon formatting for a few Indian locales
     * Timezone name changes in Korean and Chinese locales
     * Default digits for the Arabic locale are European digits.

   - patches/locale1.patch: Minor fixes for Korean


 2. Breakiterator patches
   - patches/wordbrk.patch for word.txt
     a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that
        FQDN labels can be split at '.'
     b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric.
        See http://unicode.org/cldr/trac/ticket/6555

   - patches/khmer-dictbe.patch
     Adjust parameters to use a smaller Khmer dictionary (khmerdict.txt).
     https://unicode-org.atlassian.net/browse/ICU-9451

   - Add several common Chinese words that were dropped previously to
     source/data/cjdict/brkitr/cjdict.txt
     patch: patches/cjdict.patch
     upstream bug: https://unicode-org.atlassian.net/browse/ICU-10888

 3. Timezone data update
   Run scripts/update_tz.sh to grab the latest version of the
   following timezone data files and put them in source/data/misc

      metaZones.txt
      timezoneTypes.txt
      windowsZones.txt
      zoneinfo64.txt

   As of May 4, 2018, the latest version is 2018e and the above files
   are available at
   http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2018e/44/
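
 The location above follows a pattern: a base URL, the tzdata version,
 a trailing directory ("44" in the example), and the file name. A minimal
 sketch that prints the four URLs for a given version. It is not a copy
 of scripts/update_tz.sh, whose actual logic may differ, and the function
 and variable names are hypothetical.

```shell
#!/bin/sh
TZ_BASE_URL="http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew"
TZ_FILES="metaZones.txt timezoneTypes.txt windowsZones.txt zoneinfo64.txt"

# Print one URL per timezone data file.
# $1: tzdata version (e.g. 2018e)
# $2: trailing directory of the URL quoted above (e.g. 44)
tz_file_urls() {
  for f in $TZ_FILES; do
    printf '%s/%s/%s/%s\n' "$TZ_BASE_URL" "$1" "$2" "$f"
  done
}

tz_file_urls 2018e 44
```

 The printed list can be fed to any downloader run from source/data/misc.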

 4. Build-related changes

   - patches/wpo.patch (only needed when icudata dll is used).
     upstream bugs : https://unicode-org.atlassian.net/browse/ICU-8043
                     https://unicode-org.atlassian.net/browse/ICU-5701
   - patches/vscomp.patch for building with Visual Studio on Windows:
     do not use WINDOWS_LOCALE_API in locmap.c

   - patches/data.build.patch :
       Remove unnecessary resources : unames, collator rule source
   - patches/data.build.win.patch :
       Windows-only data build patch.
   - patches/data_symb.patch :
       Put ICU_DATA_ENTRY_POINT(icudtXX_dat) in common when we use
       the icu data file or icudt.dll

 5. Fix -Wsign-compare warning in EnumSet::isValidEnum()

   - patches/isvalidenum.patch
     upstream bug: https://unicode-org.atlassian.net/browse/ICU-13509

 7. Update IANA language tag/subtag mapping and add missing canonicalization for
     deprecated regions

   - patches/locid_map.patch
   - upstream bugs:
     https://unicode-org.atlassian.net/browse/ICU-13726
     https://unicode-org.atlassian.net/browse/ICU-13723
     https://unicode-org.atlassian.net/browse/ICU-13721
     https://unicode-org.atlassian.net/browse/ICU-13720
     https://unicode-org.atlassian.net/browse/ICU-13719

 8. Double conversion library build failure

   - patches/double_conversion.patch
   - upstream bugs:
     https://unicode-org.atlassian.net/browse/ICU-13750
     https://github.com/google/double-conversion/issues/66

 9. Cherry-pick Greek lowercase fix from the upstream

   - patches/greek_lowercase.patch
   - upstream bug (fixed in 62.2-to-be)
     https://unicode-org.atlassian.net/browse/ICU-13851

 10. Max significant digit is always 6

   - patches/nf_maxsig.patch
   - upstream bug:
     https://unicode-org.atlassian.net/browse/ICU-13852

 11. Align memory buffer used in Decimal Format

   - patches/decimalformat_align.patch
   - upstream bug:
     https://unicode-org.atlassian.net/browse/ICU-20039
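
 When reviewing the local modifications after an update (step A.2), it
 can help to see which files in patches/ still apply cleanly before
 touching the tree. A minimal sketch, assuming a patch program that
 supports --dry-run (GNU patch does), that patches are applied from the
 ICU tree top, and a strip level of -p1. None of these is stated in this
 file, and the function name is hypothetical.

```shell
#!/bin/sh
# STRIP is an assumption; the right -p level depends on how each patch
# was generated.
STRIP="${STRIP:-1}"

# Report, for each patches/*.patch under the tree given as $1, whether it
# would apply cleanly. Nothing is modified: --dry-run only simulates.
check_patches() {
  tree="$(cd "$1" && pwd)" || return 1
  for p in "$tree"/patches/*.patch; do
    [ -e "$p" ] || continue   # no patch files found
    if (cd "$tree" &&
        patch -p"$STRIP" --dry-run --force --silent < "$p" >/dev/null 2>&1); then
      printf 'ok     %s\n' "${p##*/}"
    else
      printf 'FAILS  %s\n' "${p##*/}"
    fi
  done
}
```

 Usage: check_patches "${CHROME_ICU_TREE_TOP}". A patch reported as FAILS
 may simply need a different strip level, or may already be included in
 the new upstream release.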