Name: icu
Version: 4.6
License: MIT
Security Critical: yes
This directory contains the source code of ICU 4.6 for C/C++
1. It was obtained with the following:
$ svn export --native-eol LF icu46
2. Platform header files for Linux, FreeBSD, OpenBSD, Android, Mac OS X, and QNX:
- Apply platform.patch in patches directory. : It applies the upstream
patch to (see
and change source/common/unicode/ptypes.h to refer to plinux.h and
pmac.h generated below.
- 'runConfigureICU Linux', 'runConfigureICU FreeBSD', and
'runConfigureICU MacOSX' are run to generate
- On OpenBSD, source/common/unicode/platform.h is generated
by the icu4c port in the ports directory, not by runConfigureICU.
If the file has to be updated, run:
cd /home/ports/textproc/icu4c && make configure
- Rename it to 'plinux.h', 'pfreebsd.h', 'popenbsd.h' and 'pmac.h'
- Apply patches/pmach.h.patch on Mac to pmac.h
- On Android, the pandroid.h was generated by copying plinux.h to
pandroid.h and applying the patches/pandroid.h.patch.
- For QNX, the pqnx.h was generated by copying plinux.h to
pqnx.h and applying the patches/platform.qnx.patch.
- For NaCl (icu_nacl.gypi), the pnacl.h was generated by copying plinux.h to
pnacl.h and applying the patches/pnacl.h.patch.
- Apply the CL at to plinux.h
3. The following directories were removed because they're not used by Chromium
at the moment:
4. The word breaking for Chinese and Japanese was modified to use a word
frequency list with the following patch and cjdict.txt.
- patches/segmentation.patch :
Adds dictionary-based (word-frequency) word breaking for CJK
(Korean is supported in the code, but it does not do anything
because we don't have a Korean word-list.)
- source/data/brkitr/cjdict.txt :
Chinese and Japanese word frequency list.
See the file for license/copyright notice
- source/data/brkitr/cc_edict.txt :
The list of words derived from CC-Edict.
- patches/brkitr.patch
* word.txt : Chinese/Japanese segmentation rules, Hebrew-script-specific
handling of U+0022, and splitting of FQDN into labels at '.'.
For Hebrew, see
* line.txt : Incorporated line_he and minor changes in CL, OP and ID
For Hebrew, see
For others, see
* : build file changes to drop unnecessary brkitr rule
files (e.g. word_ja.txt, line_he.txt)
- android/brkitr.patch (to be applied for Android build only) :
Reverts some changes about Chinese/Japanese segmentation rules in
patches/brkitr.patch to reduce binary size for Android.
To run the ICU tests, copy source/data/brkitr/cjdict.txt
to source/test/testdata/cjdict-truncated.txt so that the TestTrieWithValue test passes.
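The copy step for the ICU tests can be sketched as a small shell function. This is only an illustration: the function name and the ICU_ROOT variable are assumptions, not part of the ICU tree or of this README, and the function expects the destination directory to exist already.

```shell
# Copy the CJK dictionary to the file name that the ICU tests look for.
# ICU_ROOT is a hypothetical variable; it should point at the directory
# that contains source/. The first argument overrides it.
copy_cjdict_for_tests() {
  icu_root="${1:-${ICU_ROOT:-.}}"
  src="$icu_root/source/data/brkitr/cjdict.txt"
  dst="$icu_root/source/test/testdata/cjdict-truncated.txt"
  if [ ! -f "$src" ]; then
    echo "missing $src" >&2
    return 1
  fi
  cp "$src" "$dst"
}
```

It would be called as 'copy_cjdict_for_tests /path/to/icu', where the path is a placeholder for the local checkout.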
5. Converter changes : converters.patch
- Include what we really need. See source/data/mappings/ucmlocal.txt
- Alias and mapping changes : source/data/mappings/convrtrs.txt
- Change several tables and add six new tables, three of which
are 'fake' tables for ISO-2022-CN(-Ext).
- ucnv2022.c is modified to use 3 'fake' tables added above for
6. Locale changes
- patches/locale1.patch :
Filipino, Amharic, and Swahili locales
exemplar character set changes for CJK + 9 Indian locales
Minor fixes for Danish, , Turkish, and Korean.
- patches/locale2.patch :
The minimum locale data Chrome needs for 47 languages Chrome is
not localized to. Each locale data file has ExemplarCharacters,
LocaleScript, layout, and the name of the language for a locale
in its native language.
- patches/locale3.patch : Locale build configuration files. They
add or {trns,sprep,rbnf,coll} files to
- In source/data/region, run the following command to get rid of numeric region
display names we don't use (everything other than 419).
$ sed -i '/[0-35-9][0-9][0-9]{/ d' *.txt
- android/ (to be run for Android build only):
Makes changes to source/data/{curr,region,lang} to exclude these data
except the language and script names of zh_Hans and zh_Hant.
- Add tg.txt to source/data/locale and source/data/lang to add the minimal locale
data necessary for the spellchecker. In both directories, add tg.txt to
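To see what the sed command in the region-data step above deletes, the same address pattern can be tried on a few sample lines. The sample entries below are made up for illustration and are not copied from the real data files; the command is run without -i so no file is modified.

```shell
# Hypothetical sample in the style of a region data file.
sample='        001{"World"}
        019{"Americas"}
        419{"Latin America"}
        US{"United States"}'

# Same pattern as in the README: delete any line that contains three digits
# followed by '{' where the first digit is not 4.
filtered=$(printf '%s\n' "$sample" | sed '/[0-35-9][0-9][0-9]{/ d')
printf '%s\n' "$filtered"
```

With this input, only the 419 and US lines remain. Note that, as written, the pattern spares every three-digit code that starts with 4, not only 419.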
7. Removal of unihan collation tables from data/coll/{zh,ja,ko}.txt
- patches/unihan.patch:
Unihan collation tables are never used in Chrome/WebKit, but they take
about 1MB in the uncompressed ICU data file in ICU 4.2.1.
8. Timezone data update
- Grab the latest version of the following timezone data files and
put them in source/data/misc.
As of Mar 2014, the latest version is 2014a and the above files
are available at
9. Transliterator customization
- Add el_Upper.txt taken from ICU 52 to source/data/trnslit
- Also add css3transform.txt to the same directory
- Put the following line in
10. Build-related changes
- patches/wpo.patch
- patches/vscomp.patch
(see and )
- patches/rtti.patch : Make RTTI work without exception handling on Windows
- patches/ :
To remove some data files we don't use and cut down the data size.
- patches/ :
Windows-only data build patch. Add a new target DATALIB to makedata.mak
- patches/clang.patch: To build with Clang.
(see Two other chunks in
the patch have already been fixed in the ICU trunk.)
- add an empty file (stubdatabuilt.txt) to source/stubdata
11. Pre-built data libraries are checked in.
Before building the data file on Linux, re-run 'runConfigureICU Linux'
if it's run without in #10 above.
Because we removed layout and layoutex directories in step 3,
'runConfigureICU Linux' will fail even with '--disable-layout'. A
work-around is to have a copy of our icu tree in a separate build directory
and add back directories we removed in step 3 before
running 'runConfigure'.
'make' will fail in the 1st pass. Copy source/data/in/coll/
to {BUILD_DIR_ROOT}/data/out/build/icudt46l/coll and re-run 'make'
in {BUILD_DIR_ROOT}/data.
'make' will fail again when pkgdata looks for css3transform.res. Edit
data/out/tmp/icudata.lst to replace 'css3transform.res' with 'root.res'.
(see ) and run 'make' again.
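The icudata.lst edit described above can be done by hand or with a one-line substitution. The helper below is a sketch under assumptions: the function name is invented here, and it replaces the text 'css3transform.res' wherever it appears in a line, since the exact layout of the list file is not shown in this README.

```shell
# Replace the css3transform.res entry in a pkgdata list file with root.res.
# The result goes through a temporary file instead of relying on 'sed -i'.
patch_icudata_lst() {
  lst="$1"
  tmp="$lst.tmp"
  sed 's/css3transform\.res/root.res/' "$lst" > "$tmp" && mv "$tmp" "$lst"
}
```

A call would look like 'patch_icudata_lst "$BUILD_DIR_ROOT/data/out/tmp/icudata.lst"', followed by re-running 'make' in the data directory.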
- source/data/in/icudtl.dat : Built on Linux with all the patches
above applied. icudt46l.dat is generated in
{BUILD_DIR_ROOT}/data/out/tmp and copied to the above location with a
version number (46) dropped.
- windows/icudt.dll : With icudt46l.dat in place, all the patches applied
and header files moved (#11 below), generated by building icudt_build
project of build/icudt_build.sln on Windows. icudt46.dll is
generated in bin/{Release,Debug} and copied to windows/icudt.dll
and checked in. Note that we drop the version number ('46') from the
dll name to avoid having to update our build scripts/configuration
files every time ICU is upgraded to a new version.
- {mac,linux}/icudt46l_dat.S : Built on Linux with all the
patches above (except android/brkitr.patch) applied and checked in.
This file will be generated in {BUILD_DIR_ROOT}/data/out/tmp.
mac/icudt46l_dat.S is identical to linux/icudt46l_dat.S. It's made
by changing the header portion of the Linux version to read as follows
(no leading whitespace) :
.globl _icudt46_dat
.private_extern _icudt46_dat
.align 4
- android/icudt46l_dat.S : Built on Linux with all the patches above and
android/brkitr.patch applied and android/ executed, and
checked in.
- android/icudtl.dat : Generated as icudt46l.dat in
{BUILD_DIR_ROOT}/data/out/tmp along with icudt46l_dat.S and
copied to the above location with '46' dropped in its name.
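The "copy with the version number dropped" convention used for the .dat and .dll files above can be sketched in shell. Both function names are assumptions made for this illustration; only the file names and directories come from the steps above.

```shell
# Strip the ICU version digits that follow the 'icudt' prefix,
# e.g. icudt46l.dat becomes icudtl.dat and icudt46.dll becomes icudt.dll.
unversioned_name() {
  printf '%s\n' "$1" | sed 's/^\(icudt\)[0-9][0-9]*/\1/'
}

# Copy the freshly built data file into the checked-in location.
# $1 is the build directory ({BUILD_DIR_ROOT} above), $2 is the ICU checkout.
install_data_file() {
  built="$1/data/out/tmp/icudt46l.dat"
  cp "$built" "$2/source/data/in/$(unversioned_name icudt46l.dat)"
}
```

Keeping the checked-in names free of the version means build files that refer to icudtl.dat or icudt.dll do not change when ICU is upgraded.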
12. Apply the fixes found with static analysis tools such as PSV and Coverity
- patches/static.analysis.patch
- Upstream trunk/4.8 no longer has this code.
13. Fix for msvs2010 applied:
--- D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp
(revision 78292)
+++ D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp
(working copy)
@@ -75,7 +75,7 @@
* Visual Studios 9.0.
* Cygwin with MSVC 9.0 also complains here about redefinition.
-#if (!defined(_MSC_VER) || (_MSC_VER > 1500)) && !defined(CYGWINMSVC)
+#if (!defined(_MSC_VER) || (_MSC_VER > 1600)) && !defined(CYGWINMSVC)
const int32_t StringPiece::npos;
14. Fix for locales that don't use '.' as decimal separator: patches/nan.patch
- upstream bug:
- Handle other chars besides the dot. This is required because decNumber's
parser expects the dot as a decimal separator.
- Locales that don't use dot were producing "NaN" values.
15. Fix a bug in the regex engine.
- patches/regex.patch
- upstream bug: (fixed in the upstream)
16. Apply the upstream patch for Korean search collator support (ICU 4.6.1).
- patches/search_collation.patch
- upstream bug:
17. Fix a use of uninitialized memory bug in regular expression matching
- patches/rematch.patch
- upstream bug:
18. Make it compile with -Werror on gcc 4.6
- patches/gcc46.patch (ToT upstream does not have this code any more).
19. Fix four out-of-bounds memory access errors in common/uloc.c
and common/uresbund.c
- patches/uloc.patch
- upstream bug:
1. (_canonicalize)
2. (_getKeywords)
3. (uresbund) (uresbund)
4. (_getKeywords)
20. Fix a null pointer error in ubrk_setText in ubrk.cpp.
- patches/ubrk.patch
- upstream bug :
21. Fix a clang warning in rbbi.cpp by merging in an upstream change.
- patches/changeset_30255.patch
- upstream change :
22. Fix time zone handling and compilation on iOS.
- patches/ios_timezone.patch
- upstream bugs :
23. Fix a buffer overflow in utext
- patches/utext.patch
- upstream change :
24. Fix compilation errors on VS2012 and above.
- patches/vs2012.patch
25. Fix a buffer overflow in UTF-16/32 detection.
- patches/csetdet.patch
- upstream bug:
26. Add BreakIterator::getRuleStatus
- patches/breakiterator.patch
- Copy and paste BreakIterator::getRuleStatus API from ICU 52
27. Change export of U_ICUDATA_ENTRY_POINT from U_IMPORT to U_EXPORT.
- patches/declspec.patch
28. Add support for QNX Neutrino.
- patches/platform.qnx.patch:
See #2 about the platform header generation.
- patches/si_value.undef.patch:
Work around an all-lowercase macro defined in <signal.h>.
Upstream took a different approach:
- patches/xopen_source.patch:
Set _XOPEN_SOURCE to 600 as in the upstream changeset: