commit 8bd3916ec655c728bb368f27772429d0704d7785
author: Klaus Post <klauspost@gmail.com>  Fri Apr 12 10:02:33 2024
committer: GitHub <noreply@github.com>  Fri Apr 12 10:02:33 2024
tree: 1d48745bd1ba32d57fb516349295da8cb5b1c599
parent: c0ff47e262d13b2d48101344c6eff7204d8e6696
s2: Reduce ReadFrom temporary allocations (#949)

Only functional change is to add minimum 1MB between entries (which was enforced when saving anyway). Probably close to a worst case (and probably noisy), but overall looks good:

```
benchmark                                                old ns/op     new ns/op     delta
BenchmarkReadFromRandom/default-c1-4k-win-32             4157177       2604339       -37.35%
BenchmarkReadFromRandom/default-c2-32                    788147        679660        -13.76%
BenchmarkReadFromRandom/best-c2-4k-win-32                215513740     207505150     -3.72%
BenchmarkReadFromRandom/default-c2-4k-win-32             8712437       8354241       -4.11%
BenchmarkReadFromRandom/better-c1-32                     2861139       1862028       -34.92%
BenchmarkReadFromRandom/none-c1-32                       2331903       1430642       -38.65%
BenchmarkReadFromRandom/best-c1-4k-win-32                278422200     267546000     -3.91%
BenchmarkReadFromRandom/none-c2-pad-min-32               772803        672716        -12.95%
BenchmarkReadFromRandom/best-c2-pad-min-32               5460325       5095500       -6.68%
BenchmarkReadFromRandom/none-c2-4M-win-pad-min-32        1121548       651889        -41.88%
BenchmarkReadFromRandom/better-c1-4M-win-32              3135030       1813419       -42.16%
BenchmarkReadFromRandom/none-c2-4k-win-32                4854943       4876356       +0.44%
BenchmarkReadFromRandom/default-c1-4M-win-pad-min-32     2526561       1457227       -42.32%
BenchmarkReadFromRandom/better-c2-4k-win-32              11868536      11714897      -1.29%
BenchmarkReadFromRandom/best-c1-4M-win-pad-min-32        15489805      11971380      -22.71%
BenchmarkReadFromRandom/none-c1-pad-min-32               2416111       1421856       -41.15%
BenchmarkReadFromRandom/none-c2-4M-win-32                1101574       668233        -39.34%
BenchmarkReadFromRandom/default-c2-4M-win-32             1125451       669567        -40.51%
BenchmarkReadFromRandom/better-c1-pad-min-32             3601453       1885376       -47.65%
BenchmarkReadFromRandom/best-c1-4M-win-32                15391320      13574288      -11.81%
BenchmarkReadFromRandom/default-c1-32                    2910080       1468122       -49.55%
BenchmarkReadFromRandom/better-c1-4k-win-pad-min-32      8912611       6817072       -23.51%
BenchmarkReadFromRandom/none-c2-32                       772356        670370        -13.20%
BenchmarkReadFromRandom/default-c1-4M-win-32             2588579       1450875       -43.95%
BenchmarkReadFromRandom/default-c2-4k-win-pad-min-32     8659309       8312865       -4.00%
BenchmarkReadFromRandom/better-c2-4k-win-pad-min-32      11869082      11664866      -1.72%
BenchmarkReadFromRandom/better-c1-4k-win-32              8838607       6716940       -24.00%
BenchmarkReadFromRandom/better-c2-4M-win-32              1191929       835844        -29.87%
BenchmarkReadFromRandom/best-c1-32                       14526139      12391446      -14.70%
BenchmarkReadFromRandom/best-c2-4k-win-pad-min-32        195756200     202187120     +3.29%
BenchmarkReadFromRandom/none-c1-4k-win-pad-min-32        2738898       833709        -69.56%
BenchmarkReadFromRandom/better-c2-pad-min-32             1004892       884365        -11.99%
BenchmarkReadFromRandom/none-c2-4k-win-pad-min-32        4823234       4856289       +0.69%
BenchmarkReadFromRandom/best-c2-4M-win-32                5898056       5408690       -8.30%
BenchmarkReadFromRandom/best-c2-4M-win-pad-min-32        5669392       5523631       -2.57%
BenchmarkReadFromRandom/default-c1-4k-win-pad-min-32     4570917       2567954       -43.82%
BenchmarkReadFromRandom/better-c2-4M-win-pad-min-32      1231687       847167        -31.22%
BenchmarkReadFromRandom/default-c2-4M-win-pad-min-32     1096798       678716        -38.12%
BenchmarkReadFromRandom/none-c1-4k-win-32                3316746       843895        -74.56%
BenchmarkReadFromRandom/none-c1-4M-win-pad-min-32        2640620       1429627       -45.86%
BenchmarkReadFromRandom/best-c1-4k-win-pad-min-32        278722100     273288575     -1.95%
BenchmarkReadFromRandom/best-c1-pad-min-32               14499535      12347208      -14.84%
BenchmarkReadFromRandom/none-c1-4M-win-32                2652251       1453083       -45.21%
BenchmarkReadFromRandom/default-c1-pad-min-32            2791364       1474998       -47.16%
BenchmarkReadFromRandom/default-c2-pad-min-32            824566        707436        -14.21%
BenchmarkReadFromRandom/best-c2-32                       5362292       5104630       -4.81%
BenchmarkReadFromRandom/better-c1-4M-win-pad-min-32      3919478       1822777       -53.49%
BenchmarkReadFromRandom/better-c2-32                     969154        880528        -9.14%

benchmark                                                old MB/s     new MB/s     speedup
BenchmarkReadFromRandom/default-c1-4k-win-32             2017.86      3221.01      1.60x
BenchmarkReadFromRandom/default-c2-32                    10643.46     12342.35     1.16x
BenchmarkReadFromRandom/best-c2-4k-win-32                38.92        40.43        1.04x
BenchmarkReadFromRandom/default-c2-4k-win-32             962.83       1004.11      1.04x
BenchmarkReadFromRandom/better-c1-32                     2931.91      4505.09      1.54x
BenchmarkReadFromRandom/none-c1-32                       3597.32      5863.53      1.63x
BenchmarkReadFromRandom/best-c1-4k-win-32                30.13        31.35        1.04x
BenchmarkReadFromRandom/none-c2-pad-min-32               10854.78     12469.75     1.15x
BenchmarkReadFromRandom/best-c2-pad-min-32               1536.28      1646.28      1.07x
BenchmarkReadFromRandom/none-c2-4M-win-pad-min-32        7479.49      12868.15     1.72x
BenchmarkReadFromRandom/better-c1-4M-win-32              2675.77      4625.85      1.73x
BenchmarkReadFromRandom/none-c2-4k-win-32                1727.85      1720.26      1.00x
BenchmarkReadFromRandom/default-c1-4M-win-pad-min-32     3320.17      5756.55      1.73x
BenchmarkReadFromRandom/better-c2-4k-win-32              706.79       716.06       1.01x
BenchmarkReadFromRandom/best-c1-4M-win-pad-min-32        541.56       700.72       1.29x
BenchmarkReadFromRandom/none-c1-pad-min-32               3471.95      5899.76      1.70x
BenchmarkReadFromRandom/none-c2-4M-win-32                7615.11      12553.42     1.65x
BenchmarkReadFromRandom/default-c2-4M-win-32             7453.55      12528.41     1.68x
BenchmarkReadFromRandom/better-c1-pad-min-32             2329.23      4449.30      1.91x
BenchmarkReadFromRandom/best-c1-4M-win-32                545.02       617.98       1.13x
BenchmarkReadFromRandom/default-c1-32                    2882.60      5713.84      1.98x
BenchmarkReadFromRandom/better-c1-4k-win-pad-min-32      941.21       1230.53      1.31x
BenchmarkReadFromRandom/none-c2-32                       10861.07     12513.39     1.15x
BenchmarkReadFromRandom/default-c1-4M-win-32             3240.62      5781.76      1.78x
BenchmarkReadFromRandom/default-c2-4k-win-pad-min-32     968.74       1009.11      1.04x
BenchmarkReadFromRandom/better-c2-4k-win-pad-min-32      706.76       719.13       1.02x
BenchmarkReadFromRandom/better-c1-4k-win-32              949.09       1248.87      1.32x
BenchmarkReadFromRandom/better-c2-4M-win-32              7037.84      10036.09     1.43x
BenchmarkReadFromRandom/best-c1-32                       577.48       676.97       1.17x
BenchmarkReadFromRandom/best-c2-4k-win-pad-min-32        42.85        41.49        0.97x
BenchmarkReadFromRandom/none-c1-4k-win-pad-min-32        3062.77      10061.79     3.29x
BenchmarkReadFromRandom/better-c2-pad-min-32             8347.77      9485.46      1.14x
BenchmarkReadFromRandom/none-c2-4k-win-pad-min-32        1739.21      1727.37      0.99x
BenchmarkReadFromRandom/best-c2-4M-win-32                1422.27      1550.95      1.09x
BenchmarkReadFromRandom/best-c2-4M-win-pad-min-32        1479.63      1518.68      1.03x
BenchmarkReadFromRandom/default-c1-4k-win-pad-min-32     1835.21      3266.65      1.78x
BenchmarkReadFromRandom/better-c2-4M-win-pad-min-32      6810.66      9901.96      1.45x
BenchmarkReadFromRandom/default-c2-4M-win-pad-min-32     7648.27      12359.53     1.62x
BenchmarkReadFromRandom/none-c1-4k-win-32                2529.17      9940.34      3.93x
BenchmarkReadFromRandom/none-c1-4M-win-pad-min-32        3176.76      5867.69      1.85x
BenchmarkReadFromRandom/best-c1-4k-win-pad-min-32        30.10        30.70        1.02x
BenchmarkReadFromRandom/best-c1-pad-min-32               578.54       679.39       1.17x
BenchmarkReadFromRandom/none-c1-4M-win-32                3162.83      5772.97      1.83x
BenchmarkReadFromRandom/default-c1-pad-min-32            3005.20      5687.20      1.89x
BenchmarkReadFromRandom/default-c2-pad-min-32            10173.36     11857.76     1.17x
BenchmarkReadFromRandom/best-c2-32                       1564.37      1643.33      1.05x
BenchmarkReadFromRandom/better-c1-4M-win-pad-min-32      2140.24      4602.10      2.15x
BenchmarkReadFromRandom/better-c2-32                     8655.60      9526.79      1.10x

benchmark                                                old allocs     new allocs     delta
BenchmarkReadFromRandom/default-c1-4k-win-32             8196           6145           -25.02%
BenchmarkReadFromRandom/default-c2-32                    59             57             -3.39%
BenchmarkReadFromRandom/best-c2-4k-win-32                14356          14347          -0.06%
BenchmarkReadFromRandom/default-c2-4k-win-32             14344          14341          -0.02%
BenchmarkReadFromRandom/better-c1-32                     35             25             -28.57%
BenchmarkReadFromRandom/none-c1-32                       35             25             -28.57%
BenchmarkReadFromRandom/best-c1-4k-win-32                8196           6147           -25.00%
BenchmarkReadFromRandom/none-c2-pad-min-32               59             57             -3.39%
BenchmarkReadFromRandom/best-c2-pad-min-32               58             57             -1.72%
BenchmarkReadFromRandom/none-c2-4M-win-pad-min-32        18             15             -16.67%
BenchmarkReadFromRandom/better-c1-4M-win-32              11             7              -36.36%
BenchmarkReadFromRandom/none-c2-4k-win-32                14345          14343          -0.01%
BenchmarkReadFromRandom/default-c1-4M-win-pad-min-32     11             7              -36.36%
BenchmarkReadFromRandom/better-c2-4k-win-32              14343          14343          +0.00%
BenchmarkReadFromRandom/best-c1-4M-win-pad-min-32        11             7              -36.36%
BenchmarkReadFromRandom/none-c1-pad-min-32               35             25             -28.57%
BenchmarkReadFromRandom/none-c2-4M-win-32                18             15             -16.67%
BenchmarkReadFromRandom/default-c2-4M-win-32             18             15             -16.67%
BenchmarkReadFromRandom/better-c1-pad-min-32             35             25             -28.57%
BenchmarkReadFromRandom/best-c1-4M-win-32                11             7              -36.36%
BenchmarkReadFromRandom/default-c1-32                    35             25             -28.57%
BenchmarkReadFromRandom/better-c1-4k-win-pad-min-32      8196           6145           -25.02%
BenchmarkReadFromRandom/none-c2-32                       59             57             -3.39%
BenchmarkReadFromRandom/default-c1-4M-win-32             11             7              -36.36%
BenchmarkReadFromRandom/default-c2-4k-win-pad-min-32     14345          14343          -0.01%
BenchmarkReadFromRandom/better-c2-4k-win-pad-min-32      14344          14346          +0.01%
BenchmarkReadFromRandom/better-c1-4k-win-32              8196           6145           -25.02%
BenchmarkReadFromRandom/better-c2-4M-win-32              18             15             -16.67%
BenchmarkReadFromRandom/best-c1-32                       35             25             -28.57%
BenchmarkReadFromRandom/best-c2-4k-win-pad-min-32        14347          14343          -0.03%
BenchmarkReadFromRandom/none-c1-4k-win-pad-min-32        8196           6145           -25.02%
BenchmarkReadFromRandom/better-c2-pad-min-32             59             57             -3.39%
BenchmarkReadFromRandom/none-c2-4k-win-pad-min-32        14345          14343          -0.01%
BenchmarkReadFromRandom/best-c2-4M-win-32                17             15             -11.76%
BenchmarkReadFromRandom/best-c2-4M-win-pad-min-32        17             15             -11.76%
BenchmarkReadFromRandom/default-c1-4k-win-pad-min-32     8196           6145           -25.02%
BenchmarkReadFromRandom/better-c2-4M-win-pad-min-32      18             15             -16.67%
BenchmarkReadFromRandom/default-c2-4M-win-pad-min-32     18             15             -16.67%
BenchmarkReadFromRandom/none-c1-4k-win-32                8196           6145           -25.02%
BenchmarkReadFromRandom/none-c1-4M-win-pad-min-32        11             7              -36.36%
BenchmarkReadFromRandom/best-c1-4k-win-pad-min-32        8196           6148           -24.99%
BenchmarkReadFromRandom/best-c1-pad-min-32               35             25             -28.57%
BenchmarkReadFromRandom/none-c1-4M-win-32                11             7              -36.36%
BenchmarkReadFromRandom/default-c1-pad-min-32            35             25             -28.57%
BenchmarkReadFromRandom/default-c2-pad-min-32            59             57             -3.39%
BenchmarkReadFromRandom/best-c2-32                       58             57             -1.72%
BenchmarkReadFromRandom/better-c1-4M-win-pad-min-32      11             7              -36.36%
BenchmarkReadFromRandom/better-c2-32                     59             57             -3.39%

benchmark                                                old bytes     new bytes     delta
BenchmarkReadFromRandom/default-c1-4k-win-32             10119115      148076        -98.54%
BenchmarkReadFromRandom/default-c2-32                    1454676       4899          -99.66%
BenchmarkReadFromRandom/best-c2-4k-win-32                631225        625546        -0.90%
BenchmarkReadFromRandom/default-c2-4k-win-32             630732        625959        -0.76%
BenchmarkReadFromRandom/better-c1-32                     9514904       2797          -99.97%
BenchmarkReadFromRandom/none-c1-32                       9516943       2440          -99.97%
BenchmarkReadFromRandom/best-c1-4k-win-32                10119906      150564        -98.51%
BenchmarkReadFromRandom/none-c2-pad-min-32               1500410       6171          -99.59%
BenchmarkReadFromRandom/best-c2-pad-min-32               1173566       25894         -97.79%
BenchmarkReadFromRandom/none-c2-4M-win-pad-min-32        6423795       5415          -99.92%
BenchmarkReadFromRandom/better-c1-4M-win-32              12610237      6582          -99.95%
BenchmarkReadFromRandom/none-c2-4k-win-32                632307        626486        -0.92%
BenchmarkReadFromRandom/default-c1-4M-win-pad-min-32     12610296      5410          -99.96%
BenchmarkReadFromRandom/better-c2-4k-win-32              630839        626113        -0.75%
BenchmarkReadFromRandom/best-c1-4M-win-pad-min-32        12610476      49191         -99.61%
BenchmarkReadFromRandom/none-c1-pad-min-32               9514740       2479          -99.97%
BenchmarkReadFromRandom/none-c2-4M-win-32                6545265       13213         -99.80%
BenchmarkReadFromRandom/default-c2-4M-win-32             6481737       11355         -99.82%
BenchmarkReadFromRandom/better-c1-pad-min-32             9514861       2763          -99.97%
BenchmarkReadFromRandom/best-c1-4M-win-32                12610426      57138         -99.55%
BenchmarkReadFromRandom/default-c1-32                    9517523       2540          -99.97%
BenchmarkReadFromRandom/better-c1-4k-win-pad-min-32      10118890      148124        -98.54%
BenchmarkReadFromRandom/none-c2-32                       1530722       5478          -99.64%
BenchmarkReadFromRandom/default-c1-4M-win-32             12610210      5454          -99.96%
BenchmarkReadFromRandom/default-c2-4k-win-pad-min-32     630895        626457        -0.70%
BenchmarkReadFromRandom/better-c2-4k-win-pad-min-32      630794        626720        -0.65%
BenchmarkReadFromRandom/better-c1-4k-win-32              10118922      148161        -98.54%
BenchmarkReadFromRandom/better-c2-4M-win-32              6598822       13798         -99.79%
BenchmarkReadFromRandom/best-c1-32                       9516160       12594         -99.87%
BenchmarkReadFromRandom/best-c2-4k-win-pad-min-32        631572        628459        -0.49%
BenchmarkReadFromRandom/none-c1-4k-win-pad-min-32        10118789      148120        -98.54%
BenchmarkReadFromRandom/better-c2-pad-min-32             1489742       6937          -99.53%
BenchmarkReadFromRandom/none-c2-4k-win-pad-min-32        631988        626774        -0.83%
BenchmarkReadFromRandom/best-c2-4M-win-32                4734174       79313         -98.32%
BenchmarkReadFromRandom/best-c2-4M-win-pad-min-32        4859587       78584         -98.38%
BenchmarkReadFromRandom/default-c1-4k-win-pad-min-32     10118817      148094        -98.54%
BenchmarkReadFromRandom/better-c2-4M-win-pad-min-32      6497809       13671         -99.79%
BenchmarkReadFromRandom/default-c2-4M-win-pad-min-32     6482743       13380         -99.79%
BenchmarkReadFromRandom/none-c1-4k-win-32                10118669      148214        -98.54%
BenchmarkReadFromRandom/none-c1-4M-win-pad-min-32        12610006      5378          -99.96%
BenchmarkReadFromRandom/best-c1-4k-win-pad-min-32        10119906      150652        -98.51%
BenchmarkReadFromRandom/best-c1-pad-min-32               9515185       12174         -99.87%
BenchmarkReadFromRandom/none-c1-4M-win-32                12610010      5304          -99.96%
BenchmarkReadFromRandom/default-c1-pad-min-32            9514765       2516          -99.97%
BenchmarkReadFromRandom/default-c2-pad-min-32            1524283       4986          -99.67%
BenchmarkReadFromRandom/best-c2-32                       1238837       3118          -99.75%
BenchmarkReadFromRandom/better-c1-4M-win-pad-min-32      12636434      7013          -99.94%
BenchmarkReadFromRandom/better-c2-32                     1486984       6114          -99.59%
```

`magicChunkSnappyBytes` and `magicChunkBytes` not included.
This package provides various compression algorithms.
snappy is a drop-in replacement for github.com/golang/snappy offering better compression and concurrent streams.

Feb 5th, 2024 - v1.17.6
Jan 26th, 2024 - v1.17.5
Dec 1st, 2023 - v1.17.4
Nov 15th, 2023 - v1.17.3
Oct 22nd, 2023 - v1.17.2
Oct 14th, 2023 - v1.17.1
Sept 19th, 2023 - v1.17.0
July 1st, 2023 - v1.16.7
June 13, 2023 - v1.16.6
Apr 16, 2023 - v1.16.5
Apr 5, 2023 - v1.16.4
Mar 13, 2023 - v1.16.1
Feb 26, 2023 - v1.16.0
Jan 21st, 2023 (v1.15.15)
Jan 3rd, 2023 (v1.15.14)
Dec 11, 2022 (v1.15.13)
Oct 26, 2022 (v1.15.12)
HeaderNoCompression (https://github.com/klauspost/compress/pull/683)
Sept 26, 2022 (v1.15.11)
Sept 16, 2022 (v1.15.10)
July 21, 2022 (v1.15.9)
July 13, 2022 (v1.15.8)
June 29, 2022 (v1.15.7)
June 3, 2022 (v1.15.6)
May 25, 2022 (v1.15.5)
May 11, 2022 (v1.15.4)
May 5, 2022 (v1.15.3)
Apr 26, 2022 (v1.15.2)
Mar 11, 2022 (v1.15.1)
Mar 3, 2022 (v1.15.0)
Both compression and decompression now support “synchronous” stream operations. This means that whenever “concurrency” is set to 1, they will operate without spawning goroutines.

Stream decompression is now faster when used asynchronously, since the goroutine allocation much more effectively splits the workload. On typical streams this will use 2 cores fully for decompression. When a stream has finished decoding, no goroutines will be left over, so decoders can now safely be pooled and still be garbage collected.

While the release has been extensively tested, it is recommended to do thorough testing when upgrading.
Feb 22, 2022 (v1.14.4)
Feb 17, 2022 (v1.14.3)
Jan 25, 2022 (v1.14.2)
Jan 11, 2022 (v1.14.1)
Aug 30, 2021 (v1.13.5)
Aug 12, 2021 (v1.13.4)
Aug 3, 2021 (v1.13.3)
Jun 14, 2021 (v1.13.1)
Jun 3, 2021 (v1.13.0)
May 25, 2021 (v1.12.3)
Apr 27, 2021 (v1.12.2)
Apr 14, 2021 (v1.12.1)
Mar 26, 2021 (v1.11.13)
Mar 5, 2021 (v1.11.12)
s2: Add s2sx binary that creates self extracting archives.
Mar 1, 2021 (v1.11.9)
Feb 25, 2021 (v1.11.8)
Jan 14, 2021 (v1.11.7)
Jan 7, 2021 (v1.11.6)
Dec 20, 2020 (v1.11.4)
Nov 15, 2020 (v1.11.3)
Oct 11, 2020 (v1.11.2)
Oct 1, 2020 (v1.11.1)
Sept 8, 2020 (v1.11.0)
July 8, 2020 (v1.10.11)
June 23, 2020 (v1.10.10)
June 16, 2020 (v1.10.9):
June 5, 2020 (v1.10.8):
June 1, 2020 (v1.10.7):
May 21, 2020: (v1.10.6)
April 12, 2020: (v1.10.5)
Apr 8, 2020: (v1.10.4)
Mar 11, 2020: (v1.10.3)
Feb 27, 2020: (v1.10.2)
Feb 18, 2020: (v1.10.1)
Feb 4, 2020: (v1.10.0)
Use nil for previous behaviour. #216

Added -rm (remove source files) and -q (no output except errors) to the s2c and s2d commandline tools.

The packages are drop-in replacements for standard libraries. Simply replace the import path to use them:
| old import | new import | Documentation |
|---|---|---|
| compress/gzip | github.com/klauspost/compress/gzip | gzip |
| compress/zlib | github.com/klauspost/compress/zlib | zlib |
| archive/zip | github.com/klauspost/compress/zip | zip |
| compress/flate | github.com/klauspost/compress/flate | flate |
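Because the APIs match the standard library, existing code needs no changes beyond the import path. A minimal round-trip sketch, written here against the standard-library import so it runs anywhere; swapping in `github.com/klauspost/compress/gzip` is the only change:

```go
package main

import (
	"bytes"
	"fmt"
	"io"

	// Drop-in swap: replace this import with
	//   "github.com/klauspost/compress/gzip"
	// and the code below compiles unchanged.
	"compress/gzip"
)

// roundTrip compresses data to an in-memory buffer and decompresses
// it again, returning the recovered bytes.
func roundTrip(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	zr, err := gzip.NewReader(&buf)
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	out, err := roundTrip([]byte("hello gzip"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // prints "hello gzip"
}
```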
You may also be interested in pgzip, which is a drop-in replacement for gzip, supporting multithreaded compression on big files, and the optimized crc32 package used by these packages.
The packages contain the same functionality as the standard library, so you can use the godoc for that: gzip, zip, zlib, flate.
Currently there is only minor speedup on decompression (mostly CRC32 calculation).
Memory usage is typically 1MB for a Writer. stdlib is in the same range. If you expect to have a lot of concurrently allocated Writers consider using the stateless compress described below.
For compression performance, see: this spreadsheet.
To disable all assembly add `-tags=noasm`. This works across all packages.
This package offers stateless compression as a special option for gzip/deflate. It will do compression but without maintaining any state between Write calls.
This means there will be no memory kept between Write calls, but compression and speed will be suboptimal.
This is only relevant in cases where you expect to run many thousands of compressors concurrently, but with very little activity. This is not intended for regular web servers serving individual requests.
Because of this, the size of actual Write calls will affect output size.
In gzip, specify level `-3` / `gzip.StatelessCompression` to enable.
For direct deflate use, NewStatelessWriter and StatelessDeflate are available. See the documentation.
A `bufio.Writer` can of course be used to control write sizes. For example, to use a 4KB buffer:
```go
// replace 'ioutil.Discard' with your output.
gzw, err := gzip.NewWriterLevel(ioutil.Discard, gzip.StatelessCompression)
if err != nil {
	return err
}
defer gzw.Close()

w := bufio.NewWriterSize(gzw, 4096)
defer w.Flush()

// Write to 'w'
```
This will only use up to 4KB in memory when the writer is idle.
Compression is almost always worse than the fastest compression level and each write will allocate (a little) memory.
It has been a while since we have been looking at the speed of this package compared to the standard library, so I thought I would re-do my tests and give some overall recommendations based on the current state. All benchmarks have been performed with Go 1.10 on my Desktop Intel(R) Core(TM) i7-2600 CPU @3.40GHz. Since I last ran the tests, I have gotten more RAM, which means tests with big files are no longer limited by my SSD.
The raw results are in my updated spreadsheet. Due to cgo changes and upstream updates I could not get the cgo version of gzip to compile. Instead I included the zstd cgo implementation. If I get cgo gzip to work again, I might replace the results in the sheet.
The columns to take note of are:
- MB/s - the throughput.
- Reduction - the data size reduction in percent of the original.
- Rel Speed - relative speed compared to the standard library at the same level.
- Smaller - how many percent smaller is the compressed output compared to stdlib. Negative means the output was bigger.
- Loss - the loss (or gain) in compression as a percentage difference of the input.
The `gzstd` (standard library gzip) and `gzkp` (this package gzip) only use one CPU core. `pgzip` and `bgzf` use all 4 cores. `zstd` uses one core, and is a beast (but not Go, yet).
There appears to be a roughly 5-10% speed advantage over the standard library when comparing at similar compression levels.
The biggest difference you will see is the result of re-balancing the compression levels. I wanted my library to give a smoother transition between the compression levels than the standard library.
This package attempts to provide a smoother transition, where “1” takes a lot of shortcuts, “5” is the reasonable trade-off and “9” is “give me the best compression”, and the values in between give something reasonable in between. The standard library has big differences in levels 1-4, while levels 5-9 show no significant gains - often spending a lot more time than can be justified by the achieved compression.
There are links to all the test data in the spreadsheet in the top left field on each tab.
This test set aims to emulate typical use in a web server. The test-set is 4GB data in 53k files, and is a mixture of (mostly) HTML, JS, CSS.
Since level 1 and 9 are close to being the same code, they are quite close. But looking at the levels in-between the differences are quite big.
Looking at level 6, this package is 88% faster, but will output about 6% more data. For a web server, this means you can serve 88% more data, but have to pay for 6% more bandwidth. You can draw your own conclusions on what would be the most expensive for your case.
This test is for typical data files stored on a server. In this case it is a collection of Go precompiled objects. They are very compressible.
The picture is similar to the web content, but with small differences since this is very compressible. Levels 2-3 offer good speed, but sacrifice quite a bit of compression.
The standard library seems suboptimal on level 3 and 4 - offering both worse compression and speed than level 6 & 7 of this package respectively.
This is a JSON file with very high redundancy. The reduction starts at 95% on level 1, so in real life terms we are dealing with something like a highly redundant stream of data, etc.
It is definitely visible that we are dealing with specialized content here, so the results are very scattered. This package does not do very well at levels 1-4, but picks up significantly at level 5, with levels 7 and 8 offering great speed for the achieved compression.
So if you know your content is extremely compressible you might want to go slightly higher than the defaults. The standard library has a huge gap between levels 3 and 4 in terms of speed (2.75x slowdown), so it offers little “middle ground”.
This is a pretty common test corpus: enwik9. It contains the first 10^9 bytes of the English Wikipedia dump on Mar. 3, 2006. This is a very good test of typical text based compression and more data heavy streams.
We see a similar picture here as in “Web Content”. On equal levels some compression is sacrificed for more speed. Level 5 seems to be the best trade-off between speed and size, beating stdlib level 3 in both.
I will combine two test sets, one 10GB file set and a VM disk image (~8GB). Both contain different data types and represent a typical backup scenario.
The most notable thing is how quickly the standard library drops to very low compression speeds around level 5-6 without any big gains in compression. Since this type of data is fairly common, this does not seem like good behavior.
This is mainly a test of how good the algorithms are at detecting un-compressible input. The standard library only offers this feature with very conservative settings at level 1. Obviously there is no reason for the algorithms to try to compress input that cannot be compressed. The only downside is that it might skip some compressible data on false detections.
This compression library adds a special compression level, named `HuffmanOnly`, which allows near linear time compression. This is done by completely disabling matching of previous data, and only reducing the number of bits used to represent each character.
This means that often used characters, like ‘e’ and ‘ ’ (space) in text, use the fewest bits to represent, and rare characters like ‘¤’ take more bits to represent. For more information see Wikipedia or this nice video.
Since this type of compression has much less variance, the compression speed is mostly unaffected by the input data, and is usually more than 180MB/s for a single core.
The downside is that the compression ratio is usually considerably worse than even the fastest conventional compression. The compression ratio can never be better than 8:1 (12.5%).
The linear time compression can be used as a “better than nothing” mode, where you cannot risk the encoder slowing down on some content. For comparison, the size of the “Twain” text is 233460 bytes (+29% vs. level 1) and encode speed is 144MB/s (4.5x level 1). So in this case you trade a 30% size increase for a 4 times speedup.
For more information see my blog post on Fast Linear Time Compression.
This is implemented in Go 1.7 as the “Huffman Only” mode, though it is not exposed for gzip.
Here are other packages of good quality and pure Go (no cgo wrappers or autoconverted code):
This code is licensed under the same conditions as the original Go code. See LICENSE file.