blob: 24b51182fbc2a299f6a4ef9cc4ff40b156655836 [file] [log] [blame]
Name: icu
URL: http://site.icu-project.org/
Version: 54.1
License: MIT
Security Critical: yes
Description:
This directory contains the source code of ICU 54.1 for C/C++.
1. It was obtained with the following:
$ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-54-1 icu
The following directories we don't use are removed:
- as_is
- packaging
- source/layout
- source/layoutex
- source/data/xml
patches/configure.patch is applied to get runConfigureICU work in the
icudata generation step without layout and layoutex directory by removing the
corresponding Makefile's from ac_config variable.
2. Apply the following patch for platform.h for NaCl.
- patches/platform_nacl.patch to add U_PF_NATIVE_CLIENT
- upstream bug (fixed in the upstream 55 RC)
http://bugs.icu-project.org/trac/ticket/11033
3. Breakiterator patches
- Apply patches/brkitr.patch
* word.txt
a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that
FQDN labels can be split at '.'
b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric.
See http://unicode.org/cldr/trac/ticket/6555
* line.txt
a. Use Japanese rules for all locales because Japanese tailoring only
affects Japanese specific characters.
See http://unicode.org/cldr/trac/ticket/3974
b. Minor changes in CL, OP and IS definitions to handle 'comma-variants'
more consistenly.
See http://unicode.org/cldr/trac/ticket/6557
c. Fix line breaking for Chinese characters and quotation marks
See http://unicode.org/cldr/trac/ticket/4200 and
http://crbug.com/39779
- Add a new file brklocal.mk (copied from brkfiles.mk) with line_ja.txt
and word_POSIX.txt dropped from the build list.
- Apply patches/khmer-dictbe.patch and put in a smaller Khmer dictionary
(source/data/brkitr/khmerdict.txt) obtained from
http://bugs.icu-project.org/trac/ticket/9451
- Add several common Chinese words that were dropped previously to
source/data/cjdict/brkitr/cjdict.txt
patch: patches/cjdict.patch
upstream bug: http://bugs.icu-project.org/trac/ticket/10888
- android/brkitr.patch (to be applied for Android build only) :
Do not use the C+J dictionary for Chinese/Japanese segmentation
to reduce the data size. Adjust word.txt and a few other files.
- source/data/brkitr/word_ja.txt (used only on Android)
Added for Japanese-specific word-breaking without the C+J dictionary.
4. Converter changes :
- convrtrs.txt : Replaced the original by our own that only lists encodings
and aliases required by the WHATWG Encoding spec plus a few extra (see
the file as to why).
- Add source/data/mappings/ucmlocal.txt : to list only converters we need.
- Add new tables per the WHATWG encoding standards for EUC-JP,
Shift_JIS, Big5 (Big5+Big5HKSCS), EUC-KR and all the single byte encodings.
They're generated with scripts :
scripts/{eucjp,sjis,big5,single_byte}_gen.sh
- gb_table.patch
1. Map \xA3\xA0 to U+3000 instead of U+E5E5 in gb18030 and windows-936 per
the encoding spec (one-way mapping in toUnicode direction).
2. Map \xA8\xBF to U+01F9 instead of U+E7C8. Add one-way map
from U+1E3F to \xA8\xBC (windows-936/GBK).
See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c3
- uconv.patch
a. ISO-2022-JP-[1-4] is dropped.
b. SCSU, BOCU, ISCII, UTF-7, LMB, ibm42*, ISO-2022-{KR,CN*} and HZ-GB :
converters and detectors are dropped leading to the ~100kB reduction
in the code size.
- Upstream bugs
http://www.icu-project.org/trac/ticket/11296 (uconv.patch)
http://www.icu-project.org/trac/ticket/10303 (html5 encoding tables)
uconv.patch was landed in the upstream and is in 55 RC with the build
config changed to UCONFIG_ONLY_HTML_CONVERSION.
5. Locale changes
- patches/locale_google.patch : Google's internal ICU locale changes
- patches/locale1.patch :
a. Exemplar character set changes for zh*, ja + 9 Indian locales
b. Minor fixes for Korean and Turkish
- Locale build configuration files: To include the full locale data
for Chrome's UI languages and the minimum locale data for other locales,
add reslocal.mk or {trns,sprep,rbnf,coll}local.mk files to
source/data/{coll,curr,lang.locale,curr,region,translit,zone,rbnf,sprep}.
This along with #8 (data.build.patch), #3 (brkiter) and #4 (converter)
cuts down the data size by ~ 11MB.
- Run scripts/trim_data.sh : About 2.1MB data size reduction.
a. Trim the locale data for Chrome's UI langauges :
locales, lang, region, currency
b. Trim the locale data for non-UI languages to the bare minimum :
ExemplarCharacters, LocaleScript, layout, and the name of the
language for a locale in its native language.
c. Remove the legacy Chinese character set-based collation
(big5han/gb2312han) that don't make any sense and nobdoy uses.
- Add tg.txt, ckb.txt, and ku.txt to source/data/{locale,lang}
with the minimal locale data necessary for spellchecker and
and language menus. Also change the English display name
for ckb to 'Kurdish (Arabic)'.
- android/patch_locale.sh (to be run for Android build only):
a. Make changes to source/data/{region,lang} to exclude these data
except the language and script names of zh_Hans and zh_Hant.
b. Remove exemplar cities in timezone data (data/zone).
c. Keep only the minimal calendar data in data/locales.
d. Include currency display names for a smaller subset of currencies.
e. Minimize the locale data for 9 locales to which Chrome on Android
is not localized.
f. Also apply android/brkitr.patch
6. Timezone data update
- Grab the latest version of the following timezone data files and
put them in source/data/misc using scripts/update_tz.sh
metaZones.txt
timezoneTypes.txt
windowsZones.txt
zoneinfo64.txt
As of Dec 14 2015, the latest version is 2015g and the above files
are available at
http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2015g/44/
7. Transliterator customization
- Also add css3transform.txt to source/data/trnslit.
- Put the following line in trnslocal.mk
TRANSLIT_SOURCE=css3transform.txt
8. Build-related changes
- patches/wpo.patch
upstream bugs : http://bugs.icu-project.org/trac/ticket/8043
http://bugs.icu-project.org/trac/ticket/5701
- patches/vscomp.patch for building with Visual Studio on Windows.
a. do not use WINDOWS_LOCALE_API in locmap.c
b. do not redefine stringpiece::npos
c. Fix 'signed vs unsigned comparison' warning in
collationfastlatin.cpp. The upstream ToT does not have these lines
any more.
d. Add static_cast to avoid a possible data truncatiion warning
upstream bug (fixed in the upstream 55 RC)
http://bugs.icu-project.org/trac/ticket/11104
- patches/data.build.patch :
Remove unnecessary resources : unames, collator rule source
- patches/pkg_gen.patch :
upstream bug (fixed in the upstream RC 55)
http://bugs.icu-project.org/trac/ticket/10572
- patches/data.build.win.patch :
Windows-only data build patch.
- patches/data_symb.patch :
Put ICU_DATA_ENTRY_POINT(icudtXX_dat) in common when we use
the icu data file or icudt.dll
9. Pre-built data files are checked in with the following steps on Linux:
a. Make a icu data build directory outside the Chromium source tree
and cd to that directory, $ICUBUILDIR.
b. Run
${CHROME_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout
c. Run 'make'
d. 'make' will fail when pkgdata looks for css3transform.res.
See http://bugs.icu-project.org/trac/ticket/10570
e. run
${CHROME_ICU_TREE_TOP}/scripts/make_n_copy_data.sh
This will make and copy icudtl.dat and icudtl_dat.S for Linux and
Mac as listed below. Renaming the data/assembly files to drop
the ICU major version number as well as running make_mac_assembly.sh
is done by this script.
This script can be run again whenever you update the data.
- source/data/in/icudtl.dat : Built on Linux with all the patches
above applied. icudt54l.dat is generated in
{BUILD_DIR_ROOT}/data/out/tmp and copied to the above location with a
version number (54) dropped.
- {mac,linux}/icudtl_dat.S : Built on Linux with all the
patches above (except android/brkitr.patch) applied and checked in.
This file will be generated in {BUILD_DIR_ROOT}/data/out/tmp as
icudt54l_dat.S, but '54' is dropped while copying.
mac/icudtl_dat.S is identical to linux/icudtl_dat.S except for
the header portion. With "linux/icudtl_dat.S" in its place,
- android/icudtl_dat.S : Built on Linux with all the patches above and
android/patch_locale.sh executed.
'54' is dropped from the name generated in the build tree.
- android/icudtl.dat : Generated as icudt54l.dat in
{BUILD_DIR_ROOT}/data/out/tmp along with icudt54l_dat.S and
copied to the above location with '54' dropped in its name.
- windows/icudt.dll (by default, we set icu_use_icu_data_flag to 1
and don't use this file.)
a. check out a clean copy of icu54 from the upstream on Windows
outside the Chrome tree.
$ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-54-1 ${SEPARATE_ICU_ROOT}/icu54
b. copy ${CHROME_ICU_ROOT}/source/data/in/icudtl.dat to
${SEPARATE_ICU_ROOT}/source/data/in/icudt54l.dat
c. copy ${CHROME_ICU_ROOT}/source/data/makedata.mak to
${SEPARATE_ICU_ROOT}/source/data/makedata.mak
c. In Visual Studio, open source/allinone/allinone.sln solution
in ${SEPARATE_ICU_ROOT}
d. Build 'makedata' target
e. icudt54.dll will be generated in ${SEPARATE_ICU_ROOT}/bin
f. Copy that icudt54.dll to ${CHROME_ICU_ROOT}/windows/icudt.dll
and check that in.
10. Apply the following patches for regex
- patches/regex.patch (a combined patch of 3 revisions below)
- upstream bugs (fixed in the upstream 55 RC)
http://bugs.icu-project.org/trac/ticket/11370 (r36723:36724)
http://bugs.icu-project.org/trac/ticket/11369 (r36726:36727)
http://bugs.icu-project.org/trac/ticket/11371 (r36800:36801)
11. Fix bugs in locid (getBaseName / thread safety).
- patches/locid.patch
- upstream bugs (fixed in the upstream 55 RC)
http://bugs.icu-project.org/trac/ticket/11421
http://bugs.icu-project.org/trac/ticket/11547
12. Fix bugs in BiDi
- patches/bidi.patch
- upstream bugs (fixed in the upstream 55 RC)
http://bugs.icu-project.org/trac/ticket/11177
http://bugs.icu-project.org/trac/ticket/11451
13. Fix a data race in cmemory
- patches/cmemory.patch
- upstream bug (fixed in the upstream 55 RC)
http://www.icu-project.org/trac/ticket/11538
14. Fix a bug found by 'stack' (static analysis tool)
- patches/uloc.patch
- upstream bug
http://www.icu-project.org/trac/ticket/11602
15. Add a timezone detection API
- patches/tzdetect.patch (applied in the upstream 55)
- patches/tzdetect2.patch
- upstream bugs
http://bugs.icu-project.org/trac/ticket/11358
http://bugs.icu-project.org/trac/ticket/11623
16. Properly handle a converter name starting with 'x-'.
- patches/ucnv_name.patch
- upstream bug
http://bugs.icu-project.org/trac/ticket/11696
17. Cherry-pick an upstream patch to add a separate field for ref-counting in
the converter data.
- patches/ucnv_refcount.patch
- upstream change:
http://bugs.icu-project.org/trac/ticket/11601
18. Apply patches/infinite-recursion.patch , corresponds to upstream r36672
19. Fix a data race in umutex
- patches/mutex.patch
- upstream bug (fixed in ToT and will be included in 56)
http://bugs.icu-project.org/trac/ticket/11599
20. Cherry-pick an upstream patch for a data loading bug.
- patches/dataload.patch
- upstream change (fixed in ToT and will be included in 56)
http://bugs.icu-project.org/trac/changeset/37670
21. Fix -Woverloaded-virtual warnings
- patches/woverloaded-virtual.patch
22. Fix -Wmicrosoft-unqulified-friend warning
- patches/stringthreadtest.patch
23. Fix 'bad cast' found in Transliterator with a cfi build
- patches/xlit_badcast.patch ; speculative
- upstream bug (yet to be resolved)
http://bugs.icu-project.org/trac/ticket/11937
24. Fix a UText bug found in uregex_open fuzzer.
- patches/utext.patch
- upstream bug (fixed in trunk in Jan, 2016. Will be in 57 release)
http://bugs.icu-project.org/trac/ticket/12130
25. Fix a bug in regex compiler.
- patches/regexcmp.patch
- upstream bug
http://bugs.icu-project.org/trac/ticket/12138