The following checklist for preparing a pull request with the UCD changes for an encoding proposal was (mostly) followed for https://github.com/unicode-org/unicodetools/pulls?q=label%3Apipeline-16.0. The plan is for this process to be part of the PAG’s review of encoding proposals going forward.
Prerequisites: proposal posted to L2, SAH agreed to recommend for provisional assignment (or the proposal is already in the pipeline).
If the proposal supplies LineBreak.txt:
If the proposal does not supply LineBreak.txt:
New scripts only:
New blocks only:
Joining scripts only:
Indic scripts only:
PR preparation:
<version>
depending on their status in the Pipeline.data-for-new
, so reporting those failures could distract from real issues in the UCD invariants. UCA and security data issues are addressed later in the process, before the start of β review.

There are a variety of setups for unicodetools, depending on OS, in-source vs. out-of-source, git practices, etc. If you take part in UCD development, feel free to add your own.
Ken's files come from here (select appropriate ucd version, e.g. ucd160 for Unicode 16.0). NOTE: this check is probably not applicable for pipeline-provisionally-assigned data where Ken does not yet have a draft.
eggrobin (Windows, in-source; the remote corresponding to unicode-org is called la-vache, Ken’s files are downloaded next to the unicodetools repository).
    $latestKenFile = (ls ..\UnicodeData-*.txt | sort LastWriteTime)[-1]
    $kenUnicodeData = (Get-Content $latestKenFile)
    git diff la-vache/main */UnicodeData.txt | sls ^\+[0-9A-F] | % {
      $headLine = $_.line.Substring(1)
      if (-not $kenUnicodeData.Contains($headLine)) {
        $codepoint = $headLine.Split(";")[0];
        echo "Mismatch for U+$codepoint";
        echo "HEAD : $headLine";
        echo "Ken : $($kenUnicodeData.Where({$_.Split(";")[0] -eq $codepoint}))";
      }
    }
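For Linux or macOS setups, the same comparison can be sketched in POSIX shell. This is only an illustration of the check above, not a script that ships with unicodetools: the function name `report_mismatches` is hypothetical, and the usage comment assumes that `main` tracks unicode-org/main and that Ken's files sit next to the repository.

```shell
# Hypothetical helper (not part of unicodetools): reads candidate
# UnicodeData.txt lines on stdin and reports every line that does not
# appear verbatim in the reference file given as the first argument.
report_mismatches() {
  ken_file=$1
  while IFS= read -r head_line; do
    # -x: whole-line match, -F: fixed string, so ';' and '<' are literal.
    if ! grep -qxF -- "$head_line" "$ken_file"; then
      codepoint=${head_line%%;*}
      echo "Mismatch for U+$codepoint"
      echo "HEAD : $head_line"
      echo "Ken : $(grep -- "^$codepoint;" "$ken_file")"
    fi
  done
}

# Usage sketch (assumed remote/branch names and file locations):
#   ken_file=$(ls -t ../UnicodeData-*.txt | head -n 1)
#   git diff main -- '*/UnicodeData.txt' \
#     | grep '^+[0-9A-F]' | cut -c2- \
#     | report_mismatches "$ken_file"
```

Reading the added lines from stdin keeps the comparison independent of the remote name, so only the `git diff` invocation needs adapting to each setup.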
eggrobin (Windows, in-source; the remote corresponding to unicode-org is called la-vache).
    git fetch la-vache
    git merge la-vache/main
    git checkout la-vache/main unicodetools/data/ucd/dev/Derived*;
    git checkout la-vache/main unicodetools/data/ucd/dev/extracted/*;
    git checkout la-vache/main unicodetools/data/ucd/dev/auxiliary/*;
    rm .\Generated\* -recurse -force;
    mvn compile exec:java '-Dexec.mainClass="org.unicode.text.UCD.Main"' '-Dexec.args="build MakeUnicodeFiles"' -am -pl unicodetools "-DCLDR_DIR=..\cldr\" "-DUNICODETOOLS_GEN_DIR=Generated" "-DUNICODETOOLS_REPO_DIR=.";
    cp .\Generated\UCD\17.0.0\* .\unicodetools\data\ucd\dev -recurse -force;
    rm unicodetools\data\ucd\dev\zzz-unchanged-*;
    rm unicodetools\data\ucd\dev\*\zzz-unchanged-*;
    rm .\unicodetools\data\ucd\dev\extra\*;
    rm .\unicodetools\data\ucd\dev\cldr\*;
    git add ./unicodetools/data
    git merge --continue
markusicu (Linux, out-of-source; main tracks unicode-org/main)
    git merge main
    # complains about merge conflicts as expected
    git checkout main unicodetools/data/ucd/dev/Derived*
    git checkout main unicodetools/data/ucd/dev/extracted/*
    git checkout main unicodetools/data/ucd/dev/auxiliary/*
    rm -r ../Generated/BIN/17.0.0.0/
    rm -r ../Generated/BIN/UCD_Data17.0.0.bin
    mvn -s ~/.m2/settings.xml compile exec:java -Dexec.mainClass="org.unicode.text.UCD.Main" -Dexec.args="version 17.0.0 build MakeUnicodeFiles" -am -pl unicodetools -DCLDR_DIR=$(cd ../../../cldr/mine/src ; pwd) -DUNICODETOOLS_GEN_DIR=$(cd ../Generated ; pwd) -DUNICODETOOLS_REPO_DIR=$(pwd) -DUVERSION=17.0.0
    # fix merge conflicts in unicodetools/src/main/java/org/unicode/text/UCD/UCD_Types.java
    # and in UCD_Names.java
    # rerun mvn
    cp -r ../Generated/UCD/17.0.0/* unicodetools/data/ucd/dev
    rm unicodetools/data/ucd/dev/ZZZ-UNCHANGED-*
    rm unicodetools/data/ucd/dev/*/ZZZ-UNCHANGED-*
    rm unicodetools/data/ucd/dev/extra/*
    rm unicodetools/data/ucd/dev/cldr/*
    git add unicodetools/src/main/java/org/unicode/text/UCD/UCD_Names.java
    git add unicodetools/src/main/java/org/unicode/text/UCD/UCD_Types.java
    git add unicodetools/data
    git merge --continue
macchiati (IDE)
    sync github
    run MakeUnicodeFiles.java -c
Cf. https://github.com/unicode-org/unicodetools/pull/636
eggrobin (Windows, in-source).
    rm .\Generated\* -recurse -force
    mvn compile exec:java '-Dexec.mainClass="org.unicode.text.UCD.Main"' '-Dexec.args="build MakeUnicodeFiles"' -am -pl unicodetools "-DCLDR_DIR=..\cldr\" "-DUNICODETOOLS_GEN_DIR=Generated" "-DUNICODETOOLS_REPO_DIR=."
    cp .\Generated\UCD\17.0.0\* .\unicodetools\data\ucd\dev -recurse -force
    rm unicodetools\data\ucd\dev\zzz-unchanged-*
    rm unicodetools\data\ucd\dev\*\zzz-unchanged-*
    rm .\unicodetools\data\ucd\dev\extra\*
    rm .\unicodetools\data\ucd\dev\cldr\*
    git add unicodetools/data/ucd/dev/*
    git commit -m "Regenerate UCD"
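The copy-and-prune steps that follow the mvn run recur in each of the recipes above, so on a Unix-like setup they could be wrapped in a small function. The sketch below is an assumption-laden illustration rather than an existing unicodetools script: `install_generated_ucd` is a hypothetical name, both directories are passed in by the caller, and `find -maxdepth`/`-iname` assume GNU or BSD find.

```shell
# Hypothetical helper (not part of unicodetools): copy a generated UCD tree
# into the dev data directory, then delete what the recipes above delete.
#   $1: generated UCD directory, e.g. ../Generated/UCD/17.0.0
#   $2: dev data directory, e.g. unicodetools/data/ucd/dev
install_generated_ucd() {
  gen_ucd=$1
  dev=$2
  cp -r "$gen_ucd"/* "$dev" || return 1
  # Marker files for unchanged outputs, at the top level and one level down.
  # -iname covers both the zzz-unchanged-* and ZZZ-UNCHANGED-* spellings.
  find "$dev" -maxdepth 2 -type f -iname 'zzz-unchanged-*' -exec rm -f {} +
  # The recipes above remove the contents of extra/ and cldr/ before staging.
  rm -f "$dev"/extra/* "$dev"/cldr/*
}

# Usage sketch (paths as in the Linux out-of-source recipe):
#   install_generated_ucd ../Generated/UCD/17.0.0 unicodetools/data/ucd/dev
#   git add unicodetools/data
```

Staging and committing are deliberately left to the caller, since the recipes differ on whether the result goes into a merge or a separate "Regenerate UCD" commit.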
eggrobin (Windows, in-source).
    rm .\Generated\* -recurse -force
    mvn compile exec:java '-Dexec.mainClass="org.unicode.text.UCD.Main"' '-Dexec.args="build MakeUnicodeFiles"' -am -pl unicodetools "-DCLDR_DIR=..\cldr\" "-DUNICODETOOLS_GEN_DIR=Generated" "-DUNICODETOOLS_REPO_DIR=."
    cp .\Generated\UCD\17.0.0\LineBreak.txt .\unicodetools\data\ucd\dev
eggrobin (Windows, in-source).
    mvn compile exec:java '-Dexec.mainClass="org.unicode.props.GenerateEnums"' -am -pl unicodetools "-DCLDR_DIR=..\cldr\" "-DUNICODETOOLS_GEN_DIR=Generated" "-DUNICODETOOLS_REPO_DIR=." -U
    mvn spotless:apply
    git add *.java
    git commit -m GenerateEnums