Name: icu
Version: 52.1
License: MIT
Security Critical: yes
This directory contains the source code of ICU 52.1 for C/C++
1. It was obtained with the following:
$ svn export --native-eol LF icu52
The following directories we don't use are removed:
- as_is
- packaging
- source/layout
- source/layoutex
- source/data/xml
patches/configure.patch is applied to get runConfigureICU to work in the
icudata generation step without the layout and layoutex directories by removing
the corresponding Makefiles from ac_config variable.
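The directory removal described above can be scripted. The following is a minimal sketch, assuming it is pointed at the top of the exported icu52 tree; the function name remove_unused_dirs is hypothetical and is not one of the scripts checked in here.

```shell
#!/bin/sh
# Hypothetical helper (not part of the ICU tree): delete the directories
# listed above from an exported ICU source tree.
# Usage: remove_unused_dirs /path/to/icu52
remove_unused_dirs() {
  icu_root="$1"
  # Refuse to run on a missing or empty argument so that no path outside
  # the ICU tree is ever passed to rm.
  if [ ! -d "$icu_root" ]; then
    echo "not a directory: $icu_root" >&2
    return 1
  fi
  for dir in as_is packaging source/layout source/layoutex source/data/xml; do
    # rm -rf succeeds silently when the directory is already gone.
    rm -rf "$icu_root/$dir"
  done
}
```

The list of directories is copied from this README; patches/configure.patch still has to be applied separately afterwards.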
2. Apply the following patch for platform related headers (putilimpl.h and
- patches/putil.patch for Android, QNX and newlib(NaCl-newlib).
Upstream bug for Android :
Upstream bug for QNX :
Upstream bug for newlib :
- patches/platform_nacl.patch to add U_PF_NATIVE_CLIENT
Upstream bug :
3. Breakiterator patches
- Apply patches/brkitr.patch
* word.txt
a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that
FQDN labels can be split at '.'
b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric.
* line.txt
a. Use Japanese rules for all locales because Japanese tailoring only
affects Japanese specific characters.
b. Minor changes in CL, OP and IS definitions to handle 'comma-variants'
more consistently.
c. Fix line breaking for Chinese characters and quotation marks
See and
- Add a new file (copied from with line_ja.txt
and word_POSIX.txt dropped from the build list.
- Apply patches/khmer-dictbe.patch and put in a smaller Khmer dictionary
(source/data/brkitr/khmerdict.txt) obtained from
- Add several common Chinese words that were dropped previously to
patch: patches/cjdict.patch
upstream bug:
- android/brkitr.patch (to be applied for Android build only) :
Do not use the C+J dictionary for Chinese/Japanese segmentation
to reduce the data size. Adjust word.txt and a few other files.
- source/data/brkitr/word_ja.txt (used only on Android)
Added for Japanese-specific word-breaking without the C+J dictionary.
4. Converter changes :
- convrtrs.txt : Replaced the original by our own that only lists encodings
and aliases required by the WHATWG Encoding spec plus a few extra (see
the file as to why).
- Add source/data/mappings/ucmlocal.txt : to list only converters we need.
- Add new tables per the WHATWG encoding standards for EUC-JP,
Shift_JIS and all the single byte encodings.
They're generated with scripts : scripts/{eucjp, sjis, single_byte}
- uconv.patch
a. ISO-2022-JP-[1-4] is dropped.
b. SCSU, BOCU, ISCII, UTF-7, LMB, ibm42*, ISO-2022-{KR,CN*} and HZ-GB :
converters and detectors are dropped, leading to a ~100kB reduction
in the code size.
5. Locale changes
- patches/locale1.patch :
a. Exemplar character set changes for zh*, ja + 9 Indian locales
b. Minor fixes for Korean, a few Indic (AmPmMarkers) and
others (datetime format)
- Locale build configuration files: To include the full locale data
for Chrome's UI languages and the minimum locale data for other locales,
add or {trns,sprep,rbnf,coll} files to
This along with #8 (, #3 (brkiter) and #4 (converter)
cuts down the data size by ~ 11MB.
- Run scripts/ : About 2.1MB data size reduction.
a. Trim the locale data for Chrome's UI languages :
locales, lang, region, currency
b. Trim the locale data for non-UI languages to the bare minimum :
ExemplarCharacters, LocaleScript, layout, and the name of the
language for a locale in its native language.
c. Remove the legacy Chinese character set-based collations
(big5han/gb2312han), which don't make any sense and which nobody uses.
- android/ (to be run for Android build only):
a. Makes changes to source/data/{curr,region,lang} to exclude these data
except the language and script names of zh_Hans and zh_Hant.
b. Remove exemplar cities in timezone data (data/zone)
c. Keep only the minimal calendar data in data/locales
- Add tg.txt to source/data/locale and source/data/lang to add the minimal locale
data necessary for the spellchecker. In both directories, add tg.txt to
6. Timezone data update
- Grab the latest version of the following timezone data files and
put them in source/data/misc.
As of August 2014, the latest version is 2014f and the above files
are available at
7. Transliterator customization
- Also add css3transform.txt to source/data/trnslit.
- Put the following line in
8. Build-related changes
- patches/wpo.patch
Upstream bugs :
- patches/vscomp.patch for building with Visual Studio on Windows.
a. do not use WINDOWS_LOCALE_API in locmap.c
b. do not redefine stringpiece::npos
c. fix a Windows build failure with U_USING_ICU_NAMESPACE=0
upstream bug:
fixed in ICU 53)
d. Explicitly use Windows 'A' API when argument is an LPSTR in wintz.c
upstream bug :
- patches/ :
Remove unnecessary resources : invuca, unames, collator source, stringprep
- patches/ :
Windows-only data build patch.
- patches/clang_win.patch :
Take care of 3 warnings from clang and MSVC 2013.
upstream bug :
9. Pre-built data files are checked in with the following steps on Linux:
a. Make an ICU data build directory outside the Chromium source tree
and cd to that directory.
b. Run
${CHROME_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout
c. Run 'make'
d. 'make' will fail in the 1st pass. Copy
to {BUILD_DIR_ROOT}/data/out/build/icudt52l/coll and re-run 'make'
in {BUILD_DIR_ROOT}/data.
e. 'make' will fail again when pkgdata looks for css3transform.res. Edit
data/out/tmp/icudata.lst to replace 'css3transform.res' with 'root.res'.
(see ) and run 'make' again.
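The list-file edit in step e can be done with sed instead of by hand. This is a minimal sketch, assuming icudata.lst is a plain text file with one resource name per line; the function name patch_icudata_lst is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the css3transform.res entry in a pkgdata
# list file to root.res, as described in step e.
# Usage: patch_icudata_lst data/out/tmp/icudata.lst
patch_icudata_lst() {
  lst="$1"
  tmp="$lst.tmp"
  # Write to a temporary file and move it into place; this avoids relying
  # on 'sed -i', whose syntax differs between GNU and BSD sed.
  sed 's/css3transform\.res/root.res/' "$lst" > "$tmp" && mv "$tmp" "$lst"
}
```

Only the file name is substituted, so any directory prefix on the entry is kept. Run 'make' again afterwards, as in step e.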
- source/data/in/icudtl.dat : Built on Linux with all the patches
above applied. icudt52l.dat is generated in
{BUILD_DIR_ROOT}/data/out/tmp and copied to the above location with a
version number (52) dropped.
- {mac,linux}/icudtl_dat.S : Built on Linux with all the
patches above (except android/brkitr.patch) applied and checked in.
This file will be generated in {BUILD_DIR_ROOT}/data/out/tmp as
icudt52l_dat.S, but '52' is dropped while copying.
mac/icudtl_dat.S is identical to linux/icudtl_dat.S except for
the header portion. With "linux/icudtl_dat.S" in its place,
run scripts/ to generate it.
- android/icudtl_dat.S : Built on Linux with all the patches above and
android/brkitr.patch applied and android/ executed.
'52' is dropped from the name generated in the build tree.
- android/icudtl.dat : Generated as icudt52l.dat in
{BUILD_DIR_ROOT}/data/out/tmp along with icudt52l_dat.S and
copied to the above location with '52' dropped in its name.
- windows/icudt.dll (by default, we set icu_use_icu_data_flag to 1
and don't use this file.)
a. check out a clean copy of icu52 from the upstream on Windows
outside the Chrome tree.
$ svn export --native-eol LF ${SEPARATE_ICU_ROOT}/icu52
b. copy ${CHROME_ICU_ROOT}/source/data/in/icudtl.dat to
c. copy ${CHROME_ICU_ROOT}/source/data/makedata.mak to
d. In Visual Studio, open source/allinone/allinone.sln solution
e. Build 'makedata' target
f. icudt52.dll will be generated in ${SEPARATE_ICU_ROOT}/bin
g. Copy that icudt52.dll to ${CHROME_ICU_ROOT}/windows/icudt.dll
and check that in.
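Each data file in this step is checked in under its generated name with the version number dropped (icudt52l.dat becomes icudtl.dat, icudt52.dll becomes icudt.dll). A minimal sketch of that renaming rule follows; the function name versionless_name is hypothetical, and the default version "52" is taken from this README.

```shell
#!/bin/sh
# Hypothetical helper: print the checked-in name for a generated ICU data
# file by dropping the version number, e.g. icudt52l.dat -> icudtl.dat.
# Usage: versionless_name <generated-name> [version]
versionless_name() {
  name="$1"
  version="${2:-52}"
  # Remove the version digits that immediately follow the "icudt" prefix;
  # names that do not start with "icudt<version>" are printed unchanged.
  printf '%s\n' "$name" | sed "s/^icudt$version/icudt/"
}
```

It could be used when copying out of the build tree, for example: cp icudt52l.dat "source/data/in/$(versionless_name icudt52l.dat)".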
10. Change export of U_ICUDATA_ENTRY_POINT from U_IMPORT to U_EXPORT.
- patches/declspec.patch
11. Cherry-pick an upstream patch to fix a bug in bidi.
- patches/bidi.patch
- upstream bug :
12. Apply the following patch for regex
- patches/regex.patch
- upstream bugs :