Name: icu
URL: http://site.icu-project.org/
Version: 4.6
Description:
This directory contains the source code of ICU 4.6 for C/C++.
1. It was obtained with the following:
$ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-4-6 icu46
2. The following directories were removed because they're not used by Chromium
at the moment:
as_is
packaging
source/extra
source/sample
source/layout
source/layoutex
3. Platform header files for Linux and Mac OS X:
- Apply platform.patch from the patches directory. It applies the upstream
patch to platform.h.in (see http://bugs.icu-project.org/trac/ticket/8248)
and changes source/common/unicode/ptypes.h to refer to the plinux.h and
pmac.h headers generated below.
- On Linux and Mac OS X, run 'runConfigureICU Linux' and 'runConfigureICU MacOSX',
respectively, to generate source/common/unicode/platform.h.
- Rename it to 'plinux.h' on Linux and to 'pmac.h' on Mac OS X.
- Apply patches/pmach.h.patch to pmac.h on Mac OS X.
4. The word breaking for Chinese and Japanese was modified to use a word
frequency list, with the following patches and data files:
- patches/segmentation.patch :
Adds dictionary-based (word-frequency) word breaking for CJK
(Korean is supported in the code, but it does not do anything
because we don't have a Korean word-list.)
- source/data/brkitr/cjdict.txt :
Chinese and Japanese word frequency list.
See the file for license/copyright notice
- source/data/brkitr/cc_edict.txt :
The list of words derived from CC-Edict.
- patches/brkitr.patch
* word.txt : Chinese/Japanese segmentation rules, Hebrew-script-specific
handling of U+0022, and splitting of FQDN into labels at '.'.
* line.txt : Incorporates line_he and makes minor changes to the CL, OP and ID
definitions.
* brklocal.mk : build file changes to drop unnecessary brkitr rule
files (e.g. word_ja.txt, line_he.txt)
If you want to run the ICU tests, copy source/data/brkitr/cjdict.txt
to source/test/testdata/cjdict-truncated.txt so that the TestTrieWithValue test passes.
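One way to picture the dictionary (word-frequency)-based breaking that segmentation.patch adds is as a shortest-path search. Each candidate word gets a cost that falls as its frequency rises (for example, a negative log frequency), and the segmentation with the lowest total cost wins. The C++ sketch below shows that general dynamic-programming technique under stated assumptions. It is not the patch's code. The names CostDict, Segment, kUnknownCharCost and kMaxWordLength, and all cost values, are hypothetical.

```cpp
#include <cstddef>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Hypothetical: word -> cost, lower means more frequent (e.g. -log(frequency)).
using CostDict = std::map<std::u32string, int>;

// Assumption: a character that is not in the dictionary costs this much alone.
constexpr int kUnknownCharCost = 255;
// Assumption: dictionary words are at most this many code points long.
constexpr std::size_t kMaxWordLength = 4;

// Returns the segmentation of `text` whose summed word cost is minimal.
std::vector<std::u32string> Segment(const std::u32string& text, const CostDict& dict) {
    const std::size_t n = text.size();
    const int kInf = std::numeric_limits<int>::max();
    // best[i] = minimal cost to segment text[0, i); prev[i] = start of the last word.
    std::vector<int> best(n + 1, kInf);
    std::vector<std::size_t> prev(n + 1, 0);
    best[0] = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (best[i] == kInf) continue;  // position not reachable
        for (std::size_t len = 1; len <= kMaxWordLength && i + len <= n; ++len) {
            auto it = dict.find(text.substr(i, len));
            int cost;
            if (it != dict.end()) {
                cost = it->second;
            } else if (len == 1) {
                cost = kUnknownCharCost;  // a single character is always allowed
            } else {
                continue;  // unknown multi-character string: not a word
            }
            if (best[i] + cost < best[i + len]) {
                best[i + len] = best[i] + cost;
                prev[i + len] = i;
            }
        }
    }
    // Walk back from the end of the text to recover the chosen words.
    std::vector<std::u32string> words;
    for (std::size_t end = n; end > 0; end = prev[end]) {
        words.insert(words.begin(), text.substr(prev[end], end - prev[end]));
    }
    return words;
}
```

For example, with costs {日本: 10, 日本語: 12, 語: 20}, Segment(U"日本語", dict) keeps 日本語 as one word (cost 12) rather than splitting it into 日本 + 語 (cost 30). The single-character fallback guarantees that every input has at least one segmentation.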
5. Converter changes : converters.patch
- Include what we really need. See source/data/mappings/ucmlocal.txt
- Alias and mapping changes : source/data/mappings/convrtrs.txt
- Change several tables and add six new tables, three of which
are 'fake' tables for ISO-2022-CN(-Ext).
- ucnv2022.c is modified to use the three 'fake' tables added above for
ISO-2022-CN(-Ext).
6. Locale changes
- patches/locale1.patch :
Filipino, Amharic, and Swahili locales
exemplar character set changes for CJK + 9 Indian locales
Minor fixes for Danish, , Turkish, and Korean.
- patches/locale2.patch :
The minimum locale data Chrome needs for 47 languages Chrome is
not localized to. Each locale data file has ExemplarCharacters,
LocaleScript, layout, and the name of the language for a locale
in its native language.
- patches/locale3.patch : Locale build configuration files. They
add reslocal.mk or {trns,sprep,rbnf,coll}local.mk files to
source/data/{coll,curr,lang,locales,region,translit,zone,rbnf,sprep}.
7. Removal of unihan collation tables from data/coll/{zh,ja,ko}.txt
- patches/unihan.patch:
Unihan collation tables are never used in Chrome/WebKit, but they take up
about 1MB of the uncompressed ICU data file in ICU 4.2.1.
8. Build-related changes
- patches/wpo.patch
- patches/vscomp.patch
(see http://bugs.icu-project.org/trac/ticket/8355 and
http://bugs.icu-project.org/trac/ticket/8356 )
- patches/rtti.patch : Make RTTI work without exception handling on Windows
(see http://bugs.icu-project.org/trac/ticket/8343)
- patches/data.build.patch :
To remove some data files we don't use and cut down the data size.
- patches/data.build.win.patch :
Windows-only data build patch. Add a new target DATALIB to makedata.mak
- patches/clang.patch: To build with Clang.
- Add an empty file (stubdatabuilt.txt) to source/stubdata.
9. Pre-built data libraries are checked in.
Before building the data file on Linux, re-run 'runConfigureICU Linux'
if it was previously run without data.build.patch (#8 above).
'make' will fail in the first pass. Copy source/data/in/coll/invuca.icu
to {BUILD_DIR_ROOT}/data/out/build/icudt46l/coll and re-run 'make'
in {BUILD_DIR_ROOT}/data.
- source/data/in/icudt46l.dat : Built on Linux with all the patches
above applied.
- windows/icudt.dll : Generated, with icudt46l.dat in place, all the
patches applied and the header files moved (#11 below), by building the
icudt_build project of build/icudt_build.sln on Windows. icudt46.dll is
generated in bin/{Release,Debug}, copied to windows/icudt.dll
and checked in. Note that we drop the version number ('46') from the
DLL name to avoid having to update our build scripts and configuration
files every time ICU is upgraded to a new version.
- {mac,linux}/icudt46l_dat.S : Built on Mac and Linux with all the
patches above applied and checked in.
- cros/icudt46l_dat.S : Built on Linux with the
abridged locale source files in cros/data placed
in source/data. Those abridged locale files are
for locales ChromeOS is not localized to.
11. The header files were moved as shown below:
source/common/unicode ==> public/common/unicode
source/i18n/unicode ==> public/i18n/unicode
12. Fix for MSVS 2010 (Visual Studio 2010) applied:
--- D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp	(revision 78292)
+++ D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp	(working copy)
@@ -75,7 +75,7 @@
* Visual Studios 9.0.
* Cygwin with MSVC 9.0 also complains here about redefinition.
*/
-#if (!defined(_MSC_VER) || (_MSC_VER > 1500)) && !defined(CYGWINMSVC)
+#if (!defined(_MSC_VER) || (_MSC_VER > 1600)) && !defined(CYGWINMSVC)
const int32_t StringPiece::npos;
#endif
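The guarded line is an out-of-class definition of a static const integral member that already has an in-class initializer. Conforming compilers need such a definition when the member is odr-used (for example, bound to a const reference). MSVC 9.0 (Visual Studio 2008, _MSC_VER 1500) and, per this fix, Visual Studio 2010 (_MSC_VER 1600) instead report it as a redefinition. The patched condition therefore skips the definition for _MSC_VER up to and including 1600.

The self-contained sketch below reproduces the pattern. The Piece struct stands in for StringPiece, and Piece, its npos value and NposRef() are hypothetical, for illustration only.

```cpp
#include <cstdint>

// Hypothetical stand-in for StringPiece: a static const integral member
// with its initializer inside the class body (a declaration, not a definition).
struct Piece {
    static const std::int32_t npos = 0x7fffffff;  // illustrative value
};

// Out-of-line definition, required when npos is odr-used on conforming
// compilers. MSVC up to _MSC_VER 1600 reports it as a redefinition, so it is
// compiled under the same condition as the patched line in stringpiece.cpp.
#if (!defined(_MSC_VER) || (_MSC_VER > 1600)) && !defined(CYGWINMSVC)
const std::int32_t Piece::npos;
#endif

// Binding npos to a const reference odr-uses it, so without the definition
// above this would fail to link on compilers that follow the standard.
const std::int32_t& NposRef() { return Piece::npos; }
```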