[text] Rewrite the text plist parser to be like text/template/parse
This commit overhauls the text property list parser, reducing the cost
in time and memory and overall sanity required to parse text property
list documents.
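For readers unfamiliar with text/template/parse, the sketch below shows
the kind of cursor primitives its lexer is built on (next, backup, peek,
emit). It is a minimal illustration only: the scanner type, its fields,
and scanWhile are hypothetical names chosen for this example and are not
taken from the code in this commit.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

const eof rune = -1

// scanner is a hypothetical cursor over the input (illustrative only).
type scanner struct {
	input string
	start int // byte offset where the pending token begins
	pos   int // current read offset
	width int // byte width of the last rune returned by next
}

// next decodes and consumes one UTF-8 codepoint, or returns eof.
func (s *scanner) next() rune {
	if s.pos >= len(s.input) {
		s.width = 0
		return eof
	}
	r, w := utf8.DecodeRuneInString(s.input[s.pos:])
	s.width = w
	s.pos += w
	return r
}

// backup un-reads the last rune; it is valid once per call to next.
func (s *scanner) backup() { s.pos -= s.width }

// peek returns the next rune without consuming it.
func (s *scanner) peek() rune {
	r := s.next()
	s.backup()
	return r
}

// emit returns the pending token and starts a new one at the cursor.
func (s *scanner) emit() string {
	tok := s.input[s.start:s.pos]
	s.start = s.pos
	return tok
}

// scanWhile consumes runes for as long as ok reports true.
func (s *scanner) scanWhile(ok func(rune) bool) {
	for {
		r := s.next()
		if r == eof {
			return
		}
		if !ok(r) {
			s.backup()
			return
		}
	}
}

func main() {
	s := &scanner{input: "héllo = 1;"}
	s.scanWhile(func(r rune) bool { return r != ' ' })
	fmt.Printf("token: %q\n", s.emit())
}
```

Because tokens are slices of the original input, a scanner in this style
does not need to copy or buffer bytes while it reads.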
Herein, support is also added for:
* UTF-16 text property lists (#26)
* Proper scanning of UTF-8 codepoints
* Encoding conversion (UTF-16{BE,LE} +- BOM -> UTF-8)
* Empty data values, <>
* Error messages that include line and column info (#25)
* Legacy strings file format (dictionary without { }) (#27)
* Shortcut strings file format (dictionaries without values) (#27)
* Short hex/unicode/octal escapes (\x2, \u33, \0)
* Empty documents parsing as empty dictionaries
* Detection of garbage after the end of a document
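As an illustration of the encoding conversion item above, the sketch
below turns UTF-16 input, with or without a BOM, into UTF-8 using only
the standard library. The helper name toUTF8 is hypothetical, and the
BOM-less guess (inspecting which of the first two bytes is zero) is an
assumption made for this example, not necessarily what this commit does.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// toUTF8 is a hypothetical helper (not this commit's API). It returns the
// input as a UTF-8 string, converting from UTF-16 when that is detected.
func toUTF8(b []byte) string {
	var order binary.ByteOrder
	switch {
	case len(b) >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF:
		return string(b[3:]) // UTF-8 BOM: strip it
	case len(b) >= 2 && b[0] == 0xFE && b[1] == 0xFF:
		order, b = binary.BigEndian, b[2:] // UTF-16BE BOM
	case len(b) >= 2 && b[0] == 0xFF && b[1] == 0xFE:
		order, b = binary.LittleEndian, b[2:] // UTF-16LE BOM
	case len(b) >= 2 && b[0] == 0x00 && b[1] != 0x00:
		// Assumption: no BOM and the first character is ASCII.
		order = binary.BigEndian
	case len(b) >= 2 && b[0] != 0x00 && b[1] == 0x00:
		// Assumption: no BOM and the first character is ASCII.
		order = binary.LittleEndian
	default:
		return string(b) // treat as UTF-8 already
	}

	// Read 16-bit code units; a trailing odd byte is ignored.
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, order.Uint16(b[i:]))
	}
	// utf16.Decode also combines surrogate pairs into single runes.
	return string(utf16.Decode(units))
}

func main() {
	be := []byte{0xFE, 0xFF, 0x00, '{', 0x00, '}'}
	le := []byte{'{', 0x00, '}', 0x00}
	fmt.Println(toUTF8(be), toUTF8(le))
}
```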
The character tables have been augmented with their own characterSet
type, which allows each table to report whether a given character is a
member. All characters outside the 0-255 range will be considered "not
in set" for now.
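One way such a set can be represented is a 256-bit bitmap, sketched
below. This is an assumption for illustration: the method names add and
contains are hypothetical, and the layout may differ from the type this
commit introduces.

```go
package main

import "fmt"

// characterSet is a 256-bit bitmap with one bit per byte value.
// The layout and method names are illustrative assumptions.
type characterSet [4]uint64

// add marks the byte value b as a member of the set.
func (s *characterSet) add(b byte) {
	s[b/64] |= uint64(1) << (b % 64)
}

// contains reports membership. Anything outside 0-255 is "not in set".
func (s *characterSet) contains(r rune) bool {
	if r < 0 || r > 255 {
		return false
	}
	return s[r/64]&(uint64(1)<<(uint(r)%64)) != 0
}

func main() {
	var whitespace characterSet
	for _, b := range []byte(" \t\r\n") {
		whitespace.add(b)
	}
	fmt.Println(whitespace.contains(' '), whitespace.contains('a'))
	fmt.Println(whitespace.contains('\u2028')) // outside 0-255
}
```

A lookup is a bounds check, an index, and a mask, so it stays cheap
inside the scanning loop.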
In the benchmarks below, *Step(Parse|Decode) operate on a relatively
small synthetic property list that contains every property list type.
BigParse operates on a ~700kb binary property list created by converting
the iTunes software update catalog from XML to GNUStep or OpenStep.
Pretty benchmarks include whitespace.
benchmark                             old ns/op     new ns/op     delta
BenchmarkBigGNUStepParse-4            125008990     33544860      -73.17%
BenchmarkBigPrettyGNUStepParse-4      54869160      38049063      -30.65%
BenchmarkBigOpenStepParse-4           124436480     31491614      -74.69%
BenchmarkBigPrettyOpenStepParse-4     54080760      34542446      -36.13%
BenchmarkOpenStepParse-4              20177         13894         -31.14%
BenchmarkGNUStepParse-4               18742         15087         -19.50%

benchmark                             old allocs    new allocs    delta
BenchmarkBigGNUStepParse-4            2248154       120655        -94.63%
BenchmarkBigPrettyGNUStepParse-4      969515        120655        -87.56%
BenchmarkBigOpenStepParse-4           2251448       120655        -94.64%
BenchmarkBigPrettyOpenStepParse-4     969541        120655        -87.56%
BenchmarkOpenStepParse-4              234           44            -81.20%
BenchmarkGNUStepParse-4               186           47            -74.73%

benchmark                             old bytes     new bytes     delta
BenchmarkBigGNUStepParse-4            67633657      24006777      -64.50%
BenchmarkBigPrettyGNUStepParse-4      30100843      24006784      -20.25%
BenchmarkBigOpenStepParse-4           67657126      24023625      -64.49%
BenchmarkBigPrettyOpenStepParse-4     30101001      24023619      -20.19%
BenchmarkOpenStepParse-4              15376         10192         -33.71%
BenchmarkGNUStepParse-4               14992         10320         -31.16%
Fixes #25
Fixes #26
Fixes #27
6 files changed