| # Unihan for CLDR |
| |
| ## Run GenerateUnihanCollators |
| |
| This should be done several times during the Unicode beta process, as part of |
| going from Unicode/UCA to CLDR to ICU. See the section "Unihan collators" in |
| [icu4c/source/data/unidata/changes.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/unidata/changes.txt). |
| |
| Unicode Unihan tools code location: |
| [org/unicode/draft/GenerateUnihanCollators.java](https://github.com/unicode-org/unicodetools/blob/main/unicodetools/src/main/java/org/unicode/draft/GenerateUnihanCollators.java) |
| |
| There are text files in the same folder, for example patchPinyin.txt, that |
| provide overrides for bug fixes. |
| |
| :construction: **TODO**: Review the patch\*.txt overrides and remove (comment out) ones that do not |
| change the data any more because the Unihan data was updated. Probably do this |
| in the tool: Detect that an override does not change the data. |
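
The check described in this TODO could look roughly like the following. This is a minimal sketch, not code from the tool: the class name `RedundantOverrideSketch`, the method `findRedundant`, and the modelling of both the Unihan data and a patch file as simple key-to-value maps are all assumptions made for illustration. The real patch\*.txt format and the tool's internal data structures may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: none of these names exist in unicodetools.
final class RedundantOverrideSketch {

    /**
     * Returns the overrides whose value is already present in the base data,
     * that is, the entries that could be commented out of a patch file.
     */
    static Map<String, String> findRedundant(
            Map<String, String> base, Map<String, String> overrides) {
        Map<String, String> redundant = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : overrides.entrySet()) {
            String key = e.getKey();
            // An override is redundant only if the base data has the same
            // key with an equal value.
            if (base.containsKey(key) && Objects.equals(base.get(key), e.getValue())) {
                redundant.put(key, e.getValue());
            }
        }
        return redundant;
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        Map<String, String> base = new LinkedHashMap<>();
        base.put("U+4E00", "valueA");
        base.put("U+4E8C", "valueB");

        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("U+4E00", "valueA"); // same as base: redundant
        overrides.put("U+4E8C", "valueC"); // differs from base: still needed

        System.out.println("Redundant overrides: " + findRedundant(base, overrides).keySet());
    }
}
```

Reporting the redundant entries, rather than deleting them automatically, would leave the decision to comment them out with whoever maintains the patch files.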
| |
| Run `org.unicode.draft.GenerateUnihanCollators`. |
| This creates various files in `$CLDR_DIR/../Generated/cldr/han`. |
| |
| Many of these are log files or show fixes to properties. The important |
| results are: |
| |
| 1. Han-Latin.txt |
| 2. strokeT.txt |
| 3. strokeT_short.txt |
| 4. pinyin.txt |
| 5. pinyin_short.txt |
| |
| ## Run GenerateUnihanCollatorFiles |
| |
| Code location: |
| [org/unicode/draft/GenerateUnihanCollatorFiles.java](https://github.com/unicode-org/unicodetools/blob/main/unicodetools/src/main/java/org/unicode/draft/GenerateUnihanCollatorFiles.java) |
| |
| Run `org.unicode.draft.GenerateUnihanCollatorFiles`. |
| |
| This merges the stroke and pinyin results (#2-#5 in the list above) into |
| `common/collation/zh.xml`. It reads from `$CLDR_DIR/common/collation/zh.xml` |
| and writes to `$CLDR_DIR/../Generated/cldr/han/replace/zh.xml`. |
| |
| It also merges #1 into `common/transforms/Han-Latin.xml`. It reads from |
| `$CLDR_DIR/common/transforms/Han-Latin.xml` and writes to |
| `$CLDR_DIR/../Generated/cldr/han/replace/Han-Latin.xml`. |
| |
| After running the tool, compare the original with the output. |
| ``` |
| cd $CLDR_SRC |
| meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml |
| meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml |
| ``` |
| |
| Copy the output back into the CLDR source tree. |
| ``` |
| cd $CLDR_SRC |
| cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml |
| cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml |
| ``` |
| |
| Run CLDR unit tests. |
| If the tests pass and the changes look good, then commit. |
| |
| Details: |
| |
| This tool searches for lines of the following form, and replaces all lines between |
| them. |
| ``` |
| # START AUTOGENERATED <type> (<comment>) |
| ... |
| # END AUTOGENERATED <type> (<comment>) |
| ``` |
| |
| An error is generated if the file contains none of these AUTOGENERATED blocks, or |
| if there are mismatches in the type. The type is mapped to a filename by code |
| in the tool's source file; follow the existing pattern there when adding a type. |
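
The marker-based replacement described above can be illustrated with a short sketch. This is not the code of `GenerateUnihanCollatorFiles`. The class name `AutogeneratedBlockSketch`, the method `replaceBlocks`, the regular expressions, and the choice to pass the generated content as a map keyed by type are assumptions for illustration; only the marker format and the two error conditions come from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: none of these names exist in unicodetools.
final class AutogeneratedBlockSketch {
    // Assumed patterns for "# START AUTOGENERATED <type> (<comment>)" and its END twin.
    private static final Pattern START =
            Pattern.compile("\\s*# START AUTOGENERATED (.+?) \\((.*)\\)\\s*");
    private static final Pattern END =
            Pattern.compile("\\s*# END AUTOGENERATED (.+?) \\((.*)\\)\\s*");

    /** Replaces the lines between each START/END pair with the content for its type. */
    static List<String> replaceBlocks(
            List<String> lines, Map<String, List<String>> contentByType) {
        List<String> out = new ArrayList<>();
        String openType = null; // type of the block currently being replaced, if any
        int blocks = 0;
        for (String line : lines) {
            if (openType == null) {
                out.add(line); // outside a block: copy the line, including a START marker
                Matcher start = START.matcher(line);
                if (start.matches()) {
                    openType = start.group(1);
                    List<String> replacement = contentByType.get(openType);
                    if (replacement == null) {
                        throw new IllegalArgumentException("Unknown type: " + openType);
                    }
                    out.addAll(replacement);
                    blocks++;
                }
            } else {
                Matcher end = END.matcher(line);
                if (end.matches()) {
                    if (!end.group(1).equals(openType)) {
                        throw new IllegalArgumentException(
                                "Mismatched type: " + openType + " vs " + end.group(1));
                    }
                    out.add(line); // keep the END marker
                    openType = null;
                }
                // Any other line inside a block is old generated content and is dropped.
            }
        }
        if (openType != null) {
            throw new IllegalArgumentException("Missing END marker for: " + openType);
        }
        if (blocks == 0) {
            throw new IllegalArgumentException("No AUTOGENERATED blocks found");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "before",
                "# START AUTOGENERATED demo (made-up type)",
                "old generated line",
                "# END AUTOGENERATED demo (made-up type)",
                "after");
        List<String> result = replaceBlocks(input, Map.of("demo", List.of("new generated line")));
        result.forEach(System.out::println);
    }
}
```

In this sketch the markers themselves are kept, so the output file can be fed through the same process again on the next Unicode update.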