[text] Rewrite the text plist parser to be like text/template/parser

This commit overhauls the text property list parser, reducing the cost
in time and memory and overall sanity required to parse text property
list documents.

Herein, support is also added for:
* UTF-16 text property lists (#26)
* Proper scanning of UTF-8 codepoints
* Encoding conversion (UTF-16{BE,LE} +- BOM -> UTF-8)
* Empty data values, <>
* Error messages that include line and column info (#25)
* Legacy strings file format (dictionary without { }) (#27)
* Shortcut strings file format (dictionaries without values) (#27)
* Short hex/unicode/octal escapes (\x2, \u33, \0)
* Empty documents parsing as empty dictionaries
* Detection of garbage after the end of a document
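
The encoding conversion item above can be illustrated with a minimal sketch. It is not the code from this commit: utf16ToUTF8 is a hypothetical helper, and the big-endian default for input without a BOM is an assumption made only for the example.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// utf16ToUTF8 is a hypothetical helper, not the package's own code.
// It converts UTF-16 input to a UTF-8 string, honoring a leading BOM.
// Assumption: input without a BOM is treated as big-endian.
// A trailing odd byte, if any, is dropped.
func utf16ToUTF8(b []byte) string {
	var order binary.ByteOrder = binary.BigEndian
	if len(b) >= 2 {
		switch {
		case b[0] == 0xFE && b[1] == 0xFF:
			b = b[2:] // big-endian BOM
		case b[0] == 0xFF && b[1] == 0xFE:
			order, b = binary.LittleEndian, b[2:] // little-endian BOM
		}
	}
	// Collect 16-bit code units in the detected byte order.
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, order.Uint16(b[i:]))
	}
	// utf16.Decode handles surrogate pairs; string() yields UTF-8.
	return string(utf16.Decode(units))
}

func main() {
	// "{}" encoded as little-endian UTF-16 with a BOM.
	fmt.Println(utf16ToUTF8([]byte{0xFF, 0xFE, '{', 0x00, '}', 0x00}))
}
```

Converting once up front lets the rest of a scanner work on UTF-8 only, which is one plausible reason to do the conversion before parsing.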

The character tables have been augmented with their own characterSet
type, which allows them to report set membership themselves. All
characters outside the 0-255 range will be considered "not in set" for now.
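
As a rough sketch of what such a type could look like, assuming a 256-bit bitmap representation (the layout in text_tables.go may differ, and newCharacterSet is a hypothetical constructor used only here):

```go
package main

import "fmt"

// characterSet is a 256-bit bitmap, one bit per byte value.
// Sketch only; the package's actual representation may differ.
type characterSet [4]uint64

// newCharacterSet builds a set from the bytes of s (hypothetical helper).
func newCharacterSet(s string) (cs characterSet) {
	for i := 0; i < len(s); i++ {
		cs[s[i]/64] |= 1 << (s[i] % 64)
	}
	return cs
}

// Contains reports whether r is in the set. Runes outside the
// 0-255 range are always reported as "not in set".
func (cs *characterSet) Contains(r rune) bool {
	if r < 0 || r > 255 {
		return false
	}
	return cs[r/64]&(1<<(uint(r)%64)) != 0
}

func main() {
	whitespace := newCharacterSet(" \t\r\n")
	fmt.Println(whitespace.Contains(' '), whitespace.Contains('x'), whitespace.Contains('\u2028'))
}
```

Keeping the range check inside Contains means callers can pass any decoded rune without first checking that it fits in a byte.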

In the benchmarks below, *Step(Parse|Decode) operate on a relatively
small synthetic property list that contains every property list type.
BigParse operates on a ~700 KB text property list created by converting
the iTunes software update catalog from XML to GNUStep or OpenStep.
Pretty benchmarks include whitespace.

benchmark                             old ns/op     new ns/op     delta
BenchmarkBigGNUStepParse-4            125008990     33544860      -73.17%
BenchmarkBigPrettyGNUStepParse-4      54869160      38049063      -30.65%
BenchmarkBigOpenStepParse-4           124436480     31491614      -74.69%
BenchmarkBigPrettyOpenStepParse-4     54080760      34542446      -36.13%
BenchmarkOpenStepParse-4              20177         13894         -31.14%
BenchmarkGNUStepParse-4               18742         15087         -19.50%

benchmark                             old allocs     new allocs     delta
BenchmarkBigGNUStepParse-4            2248154        120655         -94.63%
BenchmarkBigPrettyGNUStepParse-4      969515         120655         -87.56%
BenchmarkBigOpenStepParse-4           2251448        120655         -94.64%
BenchmarkBigPrettyOpenStepParse-4     969541         120655         -87.56%
BenchmarkOpenStepParse-4              234            44             -81.20%
BenchmarkGNUStepParse-4               186            47             -74.73%

benchmark                             old bytes     new bytes     delta
BenchmarkBigGNUStepParse-4            67633657      24006777      -64.50%
BenchmarkBigPrettyGNUStepParse-4      30100843      24006784      -20.25%
BenchmarkBigOpenStepParse-4           67657126      24023625      -64.49%
BenchmarkBigPrettyOpenStepParse-4     30101001      24023619      -20.19%
BenchmarkOpenStepParse-4              15376         10192         -33.71%
BenchmarkGNUStepParse-4               14992         10320         -31.16%
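
For reference, the delta column is the relative change from old to new. A small sketch that reproduces one row of the ns/op table (the function name delta is only illustrative):

```go
package main

import "fmt"

// delta returns the percentage change from before to after.
// Negative values mean the new parser costs less.
func delta(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// BenchmarkBigGNUStepParse-4, ns/op: old 125008990, new 33544860.
	fmt.Printf("%+.2f%%\n", delta(125008990, 33544860))
}
```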

Fixes #25
Fixes #26
Fixes #27
6 files changed
tree: 8c1b607c825359970de3402fded85b80cd48e892
  1. ply/
  2. .travis.yml
  3. bplist.go
  4. bplist_generator.go
  5. bplist_parser.go
  6. bplist_test.go
  7. common_data_for_test.go
  8. decode.go
  9. decode_test.go
  10. doc.go
  11. encode.go
  12. encode_test.go
  13. example_custom_marshaler_test.go
  14. fuzz.go
  15. invalid_bplist_test.go
  16. invalid_text_test.go
  17. LICENSE
  18. marshal.go
  19. marshal_test.go
  20. must.go
  21. plist.go
  22. plist_types.go
  23. README.md
  24. text_generator.go
  25. text_parser.go
  26. text_tables.go
  27. text_test.go
  28. typeinfo.go
  29. unmarshal.go
  30. unmarshal_test.go
  31. util.go
  32. xml.go
  33. xml_test.go
  34. zerocopy.go
  35. zerocopy_appengine.go
README.md

plist - A pure Go property list transcoder

INSTALL

$ go get howett.net/plist

FEATURES

  • Supports encoding/decoding property lists (Apple XML, Apple Binary, OpenStep and GNUStep) from/to arbitrary Go types

USE

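package main

import (
	"log"
	"os"

	"howett.net/plist"
)

func main() {
	// Encode a map to standard output as a property list.
	encoder := plist.NewEncoder(os.Stdout)
	if err := encoder.Encode(map[string]string{"hello": "world"}); err != nil {
		log.Fatal(err)
	}
}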