There are three separate processes for generating and validating UCDXML files and their corresponding UAX42 report.
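1. Generate the UCDXML files.
2. (Optional) Compare the generated UCDXML files against each other (e.g., flat vs. grouped) or against previous versions.
3. Generate UAX42. This involves three steps: generating the property values, building UAX42 from its fragments, and validating with a RELAX NG schema validator.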
To generate the UCDXML files on Windows (PowerShell), run these from the unicodetools repo root, adjusting CLDR_DIR as needed:
mvn compile exec:java '-Dexec.mainClass="org.unicode.xml.UCDXML"' '-Dexec.args="--range ALL --output FLAT"' -DCLDR_DIR="$((Get-Item ..\cldr).FullName)" -DUNICODETOOLS_GEN_DIR="$((Get-Item .\Generated).FullName)" -DUNICODETOOLS_REPO_DIR="$((Get-Item .).FullName)"
mvn compile exec:java '-Dexec.mainClass="org.unicode.xml.UCDXML"' '-Dexec.args="--range UNIHAN --output FLAT"' -DCLDR_DIR="$((Get-Item ..\cldr).FullName)" -DUNICODETOOLS_GEN_DIR="$((Get-Item .\Generated).FullName)" -DUNICODETOOLS_REPO_DIR="$((Get-Item .).FullName)"
mvn compile exec:java '-Dexec.mainClass="org.unicode.xml.UCDXML"' '-Dexec.args="--range NOUNIHAN --output FLAT"' -DCLDR_DIR="$((Get-Item ..\cldr).FullName)" -DUNICODETOOLS_GEN_DIR="$((Get-Item .\Generated).FullName)" -DUNICODETOOLS_REPO_DIR="$((Get-Item .).FullName)"
mvn compile exec:java '-Dexec.mainClass="org.unicode.xml.UCDXML"' '-Dexec.args="--range ALL --output GROUPED"' -DCLDR_DIR="$((Get-Item ..\cldr).FullName)" -DUNICODETOOLS_GEN_DIR="$((Get-Item .\Generated).FullName)" -DUNICODETOOLS_REPO_DIR="$((Get-Item .).FullName)"
mvn compile exec:java '-Dexec.mainClass="org.unicode.xml.UCDXML"' '-Dexec.args="--range UNIHAN --output GROUPED"' -DCLDR_DIR="$((Get-Item ..\cldr).FullName)" -DUNICODETOOLS_GEN_DIR="$((Get-Item .\Generated).FullName)" -DUNICODETOOLS_REPO_DIR="$((Get-Item .).FullName)"
mvn compile exec:java '-Dexec.mainClass="org.unicode.xml.UCDXML"' '-Dexec.args="--range NOUNIHAN --output GROUPED"' -DCLDR_DIR="$((Get-Item ..\cldr).FullName)" -DUNICODETOOLS_GEN_DIR="$((Get-Item .\Generated).FullName)" -DUNICODETOOLS_REPO_DIR="$((Get-Item .).FullName)"
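On Linux or macOS, run these from the unicodetools repo root, adjusting CLDR_DIR as needed. This sequence also includes commands for comparing the output with the copy in unicodetools/data/ucdxml/dev (using meld) and for zipping the XML files for publication: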
mvn compile exec:java -Dexec.mainClass="org.unicode.xml.UCDXML" -Dexec.args="--range ALL --output FLAT" -DCLDR_DIR=$(cd ~/cldr/uni/src; pwd) -DUNICODETOOLS_GEN_DIR=$(cd ../Generated; pwd) -DUNICODETOOLS_REPO_DIR=$(pwd) -am -pl unicodetools
mvn compile exec:java -Dexec.mainClass="org.unicode.xml.UCDXML" -Dexec.args="--range UNIHAN --output FLAT" -DCLDR_DIR=$(cd ~/cldr/uni/src; pwd) -DUNICODETOOLS_GEN_DIR=$(cd ../Generated; pwd) -DUNICODETOOLS_REPO_DIR=$(pwd) -am -pl unicodetools
mvn compile exec:java -Dexec.mainClass="org.unicode.xml.UCDXML" -Dexec.args="--range NOUNIHAN --output FLAT" -DCLDR_DIR=$(cd ~/cldr/uni/src; pwd) -DUNICODETOOLS_GEN_DIR=$(cd ../Generated; pwd) -DUNICODETOOLS_REPO_DIR=$(pwd) -am -pl unicodetools
mvn compile exec:java -Dexec.mainClass="org.unicode.xml.UCDXML" -Dexec.args="--range ALL --output GROUPED" -DCLDR_DIR=$(cd ~/cldr/uni/src; pwd) -DUNICODETOOLS_GEN_DIR=$(cd ../Generated; pwd) -DUNICODETOOLS_REPO_DIR=$(pwd) -am -pl unicodetools
mvn compile exec:java -Dexec.mainClass="org.unicode.xml.UCDXML" -Dexec.args="--range UNIHAN --output GROUPED" -DCLDR_DIR=$(cd ~/cldr/uni/src; pwd) -DUNICODETOOLS_GEN_DIR=$(cd ../Generated; pwd) -DUNICODETOOLS_REPO_DIR=$(pwd) -am -pl unicodetools
mvn compile exec:java -Dexec.mainClass="org.unicode.xml.UCDXML" -Dexec.args="--range NOUNIHAN --output GROUPED" -DCLDR_DIR=$(cd ~/cldr/uni/src; pwd) -DUNICODETOOLS_GEN_DIR=$(cd ../Generated; pwd) -DUNICODETOOLS_REPO_DIR=$(pwd) -am -pl unicodetools
ls -l ../Generated/ucdxml/18.0.0
rm ../Generated/ucdxml/18.0.0/*.zip
meld unicodetools/data/ucdxml/dev ../Generated/ucdxml/18.0.0
cd ../Generated/ucdxml/18.0.0
zip -9 ucd.all.flat.zip ucd.all.flat.xml
zip -9 ucd.all.grouped.zip ucd.all.grouped.xml
zip -9 ucd.nounihan.flat.zip ucd.nounihan.flat.xml
zip -9 ucd.nounihan.grouped.zip ucd.nounihan.grouped.xml
zip -9 ucd.unihan.flat.zip ucd.unihan.flat.xml
zip -9 ucd.unihan.grouped.zip ucd.unihan.grouped.xml
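The six generation commands differ only in --range and --output, and the zip commands differ only in the file name, so they can also be written as loops. What follows is a minimal bash sketch, assuming the same directory layout as the commands above; the function names ucdxml_generate and ucdxml_zip and the DRY_RUN switch are hypothetical conveniences, not part of unicodetools.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
# The printed form is for inspection only; it does not preserve quoting.
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Generate all 3 ranges x 2 output formats, in the same order as the commands above.
# CLDR_DIR and UNICODETOOLS_GEN_DIR may be set beforehand; otherwise the defaults
# mirror the paths used above (an assumed layout; adjust as needed).
ucdxml_generate() {
  local cldr_dir="${CLDR_DIR:-$(cd ~/cldr/uni/src && pwd)}"
  local gen_dir="${UNICODETOOLS_GEN_DIR:-$(cd ../Generated && pwd)}"
  local repo_dir output range
  repo_dir=$(pwd)
  [ -n "$cldr_dir" ] && [ -n "$gen_dir" ] || return 1
  for output in FLAT GROUPED; do
    for range in ALL UNIHAN NOUNIHAN; do
      run mvn compile exec:java \
        -Dexec.mainClass=org.unicode.xml.UCDXML \
        "-Dexec.args=--range $range --output $output" \
        "-DCLDR_DIR=$cldr_dir" \
        "-DUNICODETOOLS_GEN_DIR=$gen_dir" \
        "-DUNICODETOOLS_REPO_DIR=$repo_dir" \
        -am -pl unicodetools || return 1
    done
  done
}

# Zip each ucd.*.xml in directory $1 into a matching .zip, replacing any old archive.
# The body runs in a subshell so the cd does not affect the caller.
ucdxml_zip() (
  cd "$1" || exit 1
  for f in ucd.*.xml; do
    [ -e "$f" ] || continue            # the glob matched nothing
    run rm -f "${f%.xml}.zip"
    run zip -9 "${f%.xml}.zip" "$f" || exit 1
  done
)
```

For example, from the repo root, set DRY_RUN=1 and call ucdxml_generate to preview the commands, then unset it, run ucdxml_generate, and finish with ucdxml_zip ../Generated/ucdxml/18.0.0.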
After generating the UCDXML files, you can compare any two of them by replacing each {path to file} placeholder (PowerShell syntax):
mvn compile exec:java '-Dexec.mainClass="org.unicode.xml.CompareUCDXML"' '-Dexec.args="-a {path to file} -b {path to file}"'
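To check flat against grouped output for every range, a bash sketch along these lines could loop over the three ranges. It assumes the file names used in the zip commands above (ucd.<range>.flat.xml and ucd.<range>.grouped.xml), paths without spaces, and that CompareUCDXML needs only the -a and -b arguments shown; ucdxml_compare_formats and DRY_RUN are hypothetical names.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Compare ucd.<range>.flat.xml with ucd.<range>.grouped.xml for each range in directory $1.
# Paths are passed inside -Dexec.args, so they are assumed not to contain spaces.
ucdxml_compare_formats() {
  local dir="$1" range a b
  for range in all unihan nounihan; do
    a="$dir/ucd.$range.flat.xml"
    b="$dir/ucd.$range.grouped.xml"
    if [ ! -e "$a" ] || [ ! -e "$b" ]; then
      echo "skipping $range: $a or $b not found" >&2
      continue
    fi
    run mvn compile exec:java \
      -Dexec.mainClass=org.unicode.xml.CompareUCDXML \
      "-Dexec.args=-a $a -b $b" || return 1
  done
}
```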
Step 1. Generate the property values (PowerShell syntax):
mvn compile exec:java '-Dexec.mainClass="org.unicode.xml.GeneratePropertyValues"' -DCLDR_DIR="$((Get-Item ..\cldr).FullName)" -DUNICODETOOLS_GEN_DIR="$((Get-Item .\Generated).FullName)" -DUNICODETOOLS_REPO_DIR="$((Get-Item .).FullName)"
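The GeneratePropertyValues command above uses PowerShell syntax. A bash counterpart, written by analogy with the bash UCDXML commands earlier in this section, might look like the following sketch. It therefore assumes the same ~/cldr/uni/src and ../Generated layout and the same -am -pl unicodetools flags, none of which the original states for this tool; generate_property_values and DRY_RUN are hypothetical names.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Bash counterpart of the PowerShell GeneratePropertyValues command. The paths and the
# -am -pl unicodetools flags are copied from the bash UCDXML commands (an assumption).
generate_property_values() {
  local cldr_dir="${CLDR_DIR:-$(cd ~/cldr/uni/src && pwd)}"
  local gen_dir="${UNICODETOOLS_GEN_DIR:-$(cd ../Generated && pwd)}"
  [ -n "$cldr_dir" ] && [ -n "$gen_dir" ] || return 1
  run mvn compile exec:java \
    -Dexec.mainClass=org.unicode.xml.GeneratePropertyValues \
    "-DCLDR_DIR=$cldr_dir" \
    "-DUNICODETOOLS_GEN_DIR=$gen_dir" \
    "-DUNICODETOOLS_REPO_DIR=$(pwd)" \
    -am -pl unicodetools
}
```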
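Step 2. Build UAX42 from its fragments, which live in unicodetools/src/main/resources/org/unicode/uax42/fragments. Run the following from the repo root (bash syntax); it writes the output to ../Generated/uax42, which must already exist: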
mvn xml:transform -f $(cd ./unicodetools/src/main/resources/org/unicode/uax42; pwd) -Doutputdir=$(cd ../Generated/uax42; pwd)
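One pitfall in the transform command: $(cd dir; pwd) joins the two commands with ; rather than &&, so if a directory is missing, cd fails but pwd still prints the current directory, and Maven silently receives the wrong path. A hedged bash sketch that creates the output directory first and stops on a bad path, using the same directories as the command (uax42_transform and DRY_RUN are hypothetical names):

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Run the UAX42 transform from the repo root, creating the output directory first
# and failing (instead of falling back to the current directory) if a path is wrong.
uax42_transform() {
  local src=./unicodetools/src/main/resources/org/unicode/uax42
  local out=../Generated/uax42
  local src_abs out_abs
  mkdir -p "$out" || return 1
  src_abs=$(cd "$src" && pwd) || return 1
  out_abs=$(cd "$out" && pwd) || return 1
  run mvn xml:transform -f "$src_abs" "-Doutputdir=$out_abs"
}
```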
Step 3. Validate against the schema. You’ll need a RELAX NG schema validator; this example uses jing-trang.
Clone and build jing-trang, then run the following (-c tells jing that the schema uses the RELAX NG compact syntax):
java -jar C:\_git\jing-trang\build\jing.jar -c UNICODETOOLS_REPO_DIR\uax\uax42\output\tr42.rnc <path to UAX xml file>
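When validating all six generated files, a loop saves retyping the jing command. This bash sketch takes the jing.jar and schema locations as arguments, since the paths in the command are specific to one machine, and returns non-zero if any file fails; it assumes jing exits with a non-zero status on validation errors. validate_ucdxml and DRY_RUN are hypothetical names, and each file still has to meet the NFD requirement described next.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# validate_ucdxml JING_JAR SCHEMA FILE...
# Runs jing once per file against the compact-syntax (-c) schema, reports each result
# on stderr, and returns non-zero if any file failed.
validate_ucdxml() {
  local jar="$1" schema="$2" f failed=0
  shift 2
  for f in "$@"; do
    if run java -jar "$jar" -c "$schema" "$f"; then
      echo "ok:   $f" >&2
    else
      echo "FAIL: $f" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```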
Note that the XML file being validated must be saved in NFD, because the Unihan syntax regular expressions expect NFD input.
To convert a file to NFD, use ICU's uconv tool (uconv.exe on Windows):
uconv.exe -f utf8 -t utf8 -x nfd -o {outputfile} {originalfile}
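To normalize several files at once, this bash sketch applies the same uconv invocation to each argument, writes the result as <name>.nfd.xml (a hypothetical naming choice), and uses cmp to report files that were already NFD. to_nfd and DRY_RUN are hypothetical names; on Windows the binary is uconv.exe.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# to_nfd FILE...
# Writes an NFD copy of each FILE as <name>.nfd.xml using the uconv flags shown above,
# and notes on stderr when the copy is byte-identical (the input was already NFD).
to_nfd() {
  local f out
  for f in "$@"; do
    out="${f%.xml}.nfd.xml"
    run uconv -f utf8 -t utf8 -x nfd -o "$out" "$f" || return 1
    if [ -z "${DRY_RUN:-}" ] && cmp -s "$f" "$out"; then
      echo "$f was already NFD" >&2
    fi
  done
}
```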