commit 8bd3916ec655c728bb368f27772429d0704d7785
author: Klaus Post <klauspost@gmail.com>  Fri Apr 12 10:02:33 2024
committer: GitHub <noreply@github.com>  Fri Apr 12 10:02:33 2024
tree: 1d48745bd1ba32d57fb516349295da8cb5b1c599
parent: c0ff47e262d13b2d48101344c6eff7204d8e6696
s2: Reduce ReadFrom temporary allocations (#949)

Only functional change is to add minimum 1MB between entries (which was enforced when saving anyway). Probably close to a worst case (and probably noisy), but overall looks good:

```
benchmark                                                old ns/op     new ns/op     delta
BenchmarkReadFromRandom/default-c1-4k-win-32             4157177       2604339       -37.35%
BenchmarkReadFromRandom/default-c2-32                    788147        679660        -13.76%
BenchmarkReadFromRandom/best-c2-4k-win-32                215513740     207505150     -3.72%
BenchmarkReadFromRandom/default-c2-4k-win-32             8712437       8354241       -4.11%
BenchmarkReadFromRandom/better-c1-32                     2861139       1862028       -34.92%
BenchmarkReadFromRandom/none-c1-32                       2331903       1430642       -38.65%
BenchmarkReadFromRandom/best-c1-4k-win-32                278422200     267546000     -3.91%
BenchmarkReadFromRandom/none-c2-pad-min-32               772803        672716        -12.95%
BenchmarkReadFromRandom/best-c2-pad-min-32               5460325       5095500       -6.68%
BenchmarkReadFromRandom/none-c2-4M-win-pad-min-32        1121548       651889        -41.88%
BenchmarkReadFromRandom/better-c1-4M-win-32              3135030       1813419       -42.16%
BenchmarkReadFromRandom/none-c2-4k-win-32                4854943       4876356       +0.44%
BenchmarkReadFromRandom/default-c1-4M-win-pad-min-32     2526561       1457227       -42.32%
BenchmarkReadFromRandom/better-c2-4k-win-32              11868536      11714897      -1.29%
BenchmarkReadFromRandom/best-c1-4M-win-pad-min-32        15489805      11971380      -22.71%
BenchmarkReadFromRandom/none-c1-pad-min-32               2416111       1421856       -41.15%
BenchmarkReadFromRandom/none-c2-4M-win-32                1101574       668233        -39.34%
BenchmarkReadFromRandom/default-c2-4M-win-32             1125451       669567        -40.51%
BenchmarkReadFromRandom/better-c1-pad-min-32             3601453       1885376       -47.65%
BenchmarkReadFromRandom/best-c1-4M-win-32                15391320      13574288      -11.81%
BenchmarkReadFromRandom/default-c1-32                    2910080       1468122       -49.55%
BenchmarkReadFromRandom/better-c1-4k-win-pad-min-32      8912611       6817072       -23.51%
BenchmarkReadFromRandom/none-c2-32                       772356        670370        -13.20%
BenchmarkReadFromRandom/default-c1-4M-win-32             2588579       1450875       -43.95%
BenchmarkReadFromRandom/default-c2-4k-win-pad-min-32     8659309       8312865       -4.00%
BenchmarkReadFromRandom/better-c2-4k-win-pad-min-32      11869082      11664866      -1.72%
BenchmarkReadFromRandom/better-c1-4k-win-32              8838607       6716940       -24.00%
BenchmarkReadFromRandom/better-c2-4M-win-32              1191929       835844        -29.87%
BenchmarkReadFromRandom/best-c1-32                       14526139      12391446      -14.70%
BenchmarkReadFromRandom/best-c2-4k-win-pad-min-32        195756200     202187120     +3.29%
BenchmarkReadFromRandom/none-c1-4k-win-pad-min-32        2738898       833709        -69.56%
BenchmarkReadFromRandom/better-c2-pad-min-32             1004892       884365        -11.99%
BenchmarkReadFromRandom/none-c2-4k-win-pad-min-32        4823234       4856289       +0.69%
BenchmarkReadFromRandom/best-c2-4M-win-32                5898056       5408690       -8.30%
BenchmarkReadFromRandom/best-c2-4M-win-pad-min-32        5669392       5523631       -2.57%
BenchmarkReadFromRandom/default-c1-4k-win-pad-min-32     4570917       2567954       -43.82%
BenchmarkReadFromRandom/better-c2-4M-win-pad-min-32      1231687       847167        -31.22%
BenchmarkReadFromRandom/default-c2-4M-win-pad-min-32     1096798       678716        -38.12%
BenchmarkReadFromRandom/none-c1-4k-win-32                3316746       843895        -74.56%
BenchmarkReadFromRandom/none-c1-4M-win-pad-min-32        2640620       1429627       -45.86%
BenchmarkReadFromRandom/best-c1-4k-win-pad-min-32        278722100     273288575     -1.95%
BenchmarkReadFromRandom/best-c1-pad-min-32               14499535      12347208      -14.84%
BenchmarkReadFromRandom/none-c1-4M-win-32                2652251       1453083       -45.21%
BenchmarkReadFromRandom/default-c1-pad-min-32            2791364       1474998       -47.16%
BenchmarkReadFromRandom/default-c2-pad-min-32            824566        707436        -14.21%
BenchmarkReadFromRandom/best-c2-32                       5362292       5104630       -4.81%
BenchmarkReadFromRandom/better-c1-4M-win-pad-min-32      3919478       1822777       -53.49%
BenchmarkReadFromRandom/better-c2-32                     969154        880528        -9.14%

benchmark                                                old MB/s     new MB/s     speedup
BenchmarkReadFromRandom/default-c1-4k-win-32             2017.86      3221.01      1.60x
BenchmarkReadFromRandom/default-c2-32                    10643.46     12342.35     1.16x
BenchmarkReadFromRandom/best-c2-4k-win-32                38.92        40.43        1.04x
BenchmarkReadFromRandom/default-c2-4k-win-32             962.83       1004.11      1.04x
BenchmarkReadFromRandom/better-c1-32                     2931.91      4505.09      1.54x
BenchmarkReadFromRandom/none-c1-32                       3597.32      5863.53      1.63x
BenchmarkReadFromRandom/best-c1-4k-win-32                30.13        31.35        1.04x
BenchmarkReadFromRandom/none-c2-pad-min-32               10854.78     12469.75     1.15x
BenchmarkReadFromRandom/best-c2-pad-min-32               1536.28      1646.28      1.07x
BenchmarkReadFromRandom/none-c2-4M-win-pad-min-32        7479.49      12868.15     1.72x
BenchmarkReadFromRandom/better-c1-4M-win-32              2675.77      4625.85      1.73x
BenchmarkReadFromRandom/none-c2-4k-win-32                1727.85      1720.26      1.00x
BenchmarkReadFromRandom/default-c1-4M-win-pad-min-32     3320.17      5756.55      1.73x
BenchmarkReadFromRandom/better-c2-4k-win-32              706.79       716.06       1.01x
BenchmarkReadFromRandom/best-c1-4M-win-pad-min-32        541.56       700.72       1.29x
BenchmarkReadFromRandom/none-c1-pad-min-32               3471.95      5899.76      1.70x
BenchmarkReadFromRandom/none-c2-4M-win-32                7615.11      12553.42     1.65x
BenchmarkReadFromRandom/default-c2-4M-win-32             7453.55      12528.41     1.68x
BenchmarkReadFromRandom/better-c1-pad-min-32             2329.23      4449.30      1.91x
BenchmarkReadFromRandom/best-c1-4M-win-32                545.02       617.98       1.13x
BenchmarkReadFromRandom/default-c1-32                    2882.60      5713.84      1.98x
BenchmarkReadFromRandom/better-c1-4k-win-pad-min-32      941.21       1230.53      1.31x
BenchmarkReadFromRandom/none-c2-32                       10861.07     12513.39     1.15x
BenchmarkReadFromRandom/default-c1-4M-win-32             3240.62      5781.76      1.78x
BenchmarkReadFromRandom/default-c2-4k-win-pad-min-32     968.74       1009.11      1.04x
BenchmarkReadFromRandom/better-c2-4k-win-pad-min-32      706.76       719.13       1.02x
BenchmarkReadFromRandom/better-c1-4k-win-32              949.09       1248.87      1.32x
BenchmarkReadFromRandom/better-c2-4M-win-32              7037.84      10036.09     1.43x
BenchmarkReadFromRandom/best-c1-32                       577.48       676.97       1.17x
BenchmarkReadFromRandom/best-c2-4k-win-pad-min-32        42.85        41.49        0.97x
BenchmarkReadFromRandom/none-c1-4k-win-pad-min-32        3062.77      10061.79     3.29x
BenchmarkReadFromRandom/better-c2-pad-min-32             8347.77      9485.46      1.14x
BenchmarkReadFromRandom/none-c2-4k-win-pad-min-32        1739.21      1727.37      0.99x
BenchmarkReadFromRandom/best-c2-4M-win-32                1422.27      1550.95      1.09x
BenchmarkReadFromRandom/best-c2-4M-win-pad-min-32        1479.63      1518.68      1.03x
BenchmarkReadFromRandom/default-c1-4k-win-pad-min-32     1835.21      3266.65      1.78x
BenchmarkReadFromRandom/better-c2-4M-win-pad-min-32      6810.66      9901.96      1.45x
BenchmarkReadFromRandom/default-c2-4M-win-pad-min-32     7648.27      12359.53     1.62x
BenchmarkReadFromRandom/none-c1-4k-win-32                2529.17      9940.34      3.93x
BenchmarkReadFromRandom/none-c1-4M-win-pad-min-32        3176.76      5867.69      1.85x
BenchmarkReadFromRandom/best-c1-4k-win-pad-min-32        30.10        30.70        1.02x
BenchmarkReadFromRandom/best-c1-pad-min-32               578.54       679.39       1.17x
BenchmarkReadFromRandom/none-c1-4M-win-32                3162.83      5772.97      1.83x
BenchmarkReadFromRandom/default-c1-pad-min-32            3005.20      5687.20      1.89x
BenchmarkReadFromRandom/default-c2-pad-min-32            10173.36     11857.76     1.17x
BenchmarkReadFromRandom/best-c2-32                       1564.37      1643.33      1.05x
BenchmarkReadFromRandom/better-c1-4M-win-pad-min-32      2140.24      4602.10      2.15x
BenchmarkReadFromRandom/better-c2-32                     8655.60      9526.79      1.10x

benchmark                                                old allocs     new allocs     delta
BenchmarkReadFromRandom/default-c1-4k-win-32             8196           6145           -25.02%
BenchmarkReadFromRandom/default-c2-32                    59             57             -3.39%
BenchmarkReadFromRandom/best-c2-4k-win-32                14356          14347          -0.06%
BenchmarkReadFromRandom/default-c2-4k-win-32             14344          14341          -0.02%
BenchmarkReadFromRandom/better-c1-32                     35             25             -28.57%
BenchmarkReadFromRandom/none-c1-32                       35             25             -28.57%
BenchmarkReadFromRandom/best-c1-4k-win-32                8196           6147           -25.00%
BenchmarkReadFromRandom/none-c2-pad-min-32               59             57             -3.39%
BenchmarkReadFromRandom/best-c2-pad-min-32               58             57             -1.72%
BenchmarkReadFromRandom/none-c2-4M-win-pad-min-32        18             15             -16.67%
BenchmarkReadFromRandom/better-c1-4M-win-32              11             7              -36.36%
BenchmarkReadFromRandom/none-c2-4k-win-32                14345          14343          -0.01%
BenchmarkReadFromRandom/default-c1-4M-win-pad-min-32     11             7              -36.36%
BenchmarkReadFromRandom/better-c2-4k-win-32              14343          14343          +0.00%
BenchmarkReadFromRandom/best-c1-4M-win-pad-min-32        11             7              -36.36%
BenchmarkReadFromRandom/none-c1-pad-min-32               35             25             -28.57%
BenchmarkReadFromRandom/none-c2-4M-win-32                18             15             -16.67%
BenchmarkReadFromRandom/default-c2-4M-win-32             18             15             -16.67%
BenchmarkReadFromRandom/better-c1-pad-min-32             35             25             -28.57%
BenchmarkReadFromRandom/best-c1-4M-win-32                11             7              -36.36%
BenchmarkReadFromRandom/default-c1-32                    35             25             -28.57%
BenchmarkReadFromRandom/better-c1-4k-win-pad-min-32      8196           6145           -25.02%
BenchmarkReadFromRandom/none-c2-32                       59             57             -3.39%
BenchmarkReadFromRandom/default-c1-4M-win-32             11             7              -36.36%
BenchmarkReadFromRandom/default-c2-4k-win-pad-min-32     14345          14343          -0.01%
BenchmarkReadFromRandom/better-c2-4k-win-pad-min-32      14344          14346          +0.01%
BenchmarkReadFromRandom/better-c1-4k-win-32              8196           6145           -25.02%
BenchmarkReadFromRandom/better-c2-4M-win-32              18             15             -16.67%
BenchmarkReadFromRandom/best-c1-32                       35             25             -28.57%
BenchmarkReadFromRandom/best-c2-4k-win-pad-min-32        14347          14343          -0.03%
BenchmarkReadFromRandom/none-c1-4k-win-pad-min-32        8196           6145           -25.02%
BenchmarkReadFromRandom/better-c2-pad-min-32             59             57             -3.39%
BenchmarkReadFromRandom/none-c2-4k-win-pad-min-32        14345          14343          -0.01%
BenchmarkReadFromRandom/best-c2-4M-win-32                17             15             -11.76%
BenchmarkReadFromRandom/best-c2-4M-win-pad-min-32        17             15             -11.76%
BenchmarkReadFromRandom/default-c1-4k-win-pad-min-32     8196           6145           -25.02%
BenchmarkReadFromRandom/better-c2-4M-win-pad-min-32      18             15             -16.67%
BenchmarkReadFromRandom/default-c2-4M-win-pad-min-32     18             15             -16.67%
BenchmarkReadFromRandom/none-c1-4k-win-32                8196           6145           -25.02%
BenchmarkReadFromRandom/none-c1-4M-win-pad-min-32        11             7              -36.36%
BenchmarkReadFromRandom/best-c1-4k-win-pad-min-32        8196           6148           -24.99%
BenchmarkReadFromRandom/best-c1-pad-min-32               35             25             -28.57%
BenchmarkReadFromRandom/none-c1-4M-win-32                11             7              -36.36%
BenchmarkReadFromRandom/default-c1-pad-min-32            35             25             -28.57%
BenchmarkReadFromRandom/default-c2-pad-min-32            59             57             -3.39%
BenchmarkReadFromRandom/best-c2-32                       58             57             -1.72%
BenchmarkReadFromRandom/better-c1-4M-win-pad-min-32      11             7              -36.36%
BenchmarkReadFromRandom/better-c2-32                     59             57             -3.39%

benchmark                                                old bytes     new bytes     delta
BenchmarkReadFromRandom/default-c1-4k-win-32             10119115      148076        -98.54%
BenchmarkReadFromRandom/default-c2-32                    1454676       4899          -99.66%
BenchmarkReadFromRandom/best-c2-4k-win-32                631225        625546        -0.90%
BenchmarkReadFromRandom/default-c2-4k-win-32             630732        625959        -0.76%
BenchmarkReadFromRandom/better-c1-32                     9514904       2797          -99.97%
BenchmarkReadFromRandom/none-c1-32                       9516943       2440          -99.97%
BenchmarkReadFromRandom/best-c1-4k-win-32                10119906      150564        -98.51%
BenchmarkReadFromRandom/none-c2-pad-min-32               1500410       6171          -99.59%
BenchmarkReadFromRandom/best-c2-pad-min-32               1173566       25894         -97.79%
BenchmarkReadFromRandom/none-c2-4M-win-pad-min-32        6423795       5415          -99.92%
BenchmarkReadFromRandom/better-c1-4M-win-32              12610237      6582          -99.95%
BenchmarkReadFromRandom/none-c2-4k-win-32                632307        626486        -0.92%
BenchmarkReadFromRandom/default-c1-4M-win-pad-min-32     12610296      5410          -99.96%
BenchmarkReadFromRandom/better-c2-4k-win-32              630839        626113        -0.75%
BenchmarkReadFromRandom/best-c1-4M-win-pad-min-32        12610476      49191         -99.61%
BenchmarkReadFromRandom/none-c1-pad-min-32               9514740       2479          -99.97%
BenchmarkReadFromRandom/none-c2-4M-win-32                6545265       13213         -99.80%
BenchmarkReadFromRandom/default-c2-4M-win-32             6481737       11355         -99.82%
BenchmarkReadFromRandom/better-c1-pad-min-32             9514861       2763          -99.97%
BenchmarkReadFromRandom/best-c1-4M-win-32                12610426      57138         -99.55%
BenchmarkReadFromRandom/default-c1-32                    9517523       2540          -99.97%
BenchmarkReadFromRandom/better-c1-4k-win-pad-min-32      10118890      148124        -98.54%
BenchmarkReadFromRandom/none-c2-32                       1530722       5478          -99.64%
BenchmarkReadFromRandom/default-c1-4M-win-32             12610210      5454          -99.96%
BenchmarkReadFromRandom/default-c2-4k-win-pad-min-32     630895        626457        -0.70%
BenchmarkReadFromRandom/better-c2-4k-win-pad-min-32      630794        626720        -0.65%
BenchmarkReadFromRandom/better-c1-4k-win-32              10118922      148161        -98.54%
BenchmarkReadFromRandom/better-c2-4M-win-32              6598822       13798         -99.79%
BenchmarkReadFromRandom/best-c1-32                       9516160       12594         -99.87%
BenchmarkReadFromRandom/best-c2-4k-win-pad-min-32        631572        628459        -0.49%
BenchmarkReadFromRandom/none-c1-4k-win-pad-min-32        10118789      148120        -98.54%
BenchmarkReadFromRandom/better-c2-pad-min-32             1489742       6937          -99.53%
BenchmarkReadFromRandom/none-c2-4k-win-pad-min-32        631988        626774        -0.83%
BenchmarkReadFromRandom/best-c2-4M-win-32                4734174       79313         -98.32%
BenchmarkReadFromRandom/best-c2-4M-win-pad-min-32        4859587       78584         -98.38%
BenchmarkReadFromRandom/default-c1-4k-win-pad-min-32     10118817      148094        -98.54%
BenchmarkReadFromRandom/better-c2-4M-win-pad-min-32      6497809       13671         -99.79%
BenchmarkReadFromRandom/default-c2-4M-win-pad-min-32     6482743       13380         -99.79%
BenchmarkReadFromRandom/none-c1-4k-win-32                10118669      148214        -98.54%
BenchmarkReadFromRandom/none-c1-4M-win-pad-min-32        12610006      5378          -99.96%
BenchmarkReadFromRandom/best-c1-4k-win-pad-min-32        10119906      150652        -98.51%
BenchmarkReadFromRandom/best-c1-pad-min-32               9515185       12174         -99.87%
BenchmarkReadFromRandom/none-c1-4M-win-32                12610010      5304          -99.96%
BenchmarkReadFromRandom/default-c1-pad-min-32            9514765       2516          -99.97%
BenchmarkReadFromRandom/default-c2-pad-min-32            1524283       4986          -99.67%
BenchmarkReadFromRandom/best-c2-32                       1238837       3118          -99.75%
BenchmarkReadFromRandom/better-c1-4M-win-pad-min-32      12636434      7013          -99.94%
BenchmarkReadFromRandom/better-c2-32                     1486984       6114          -99.59%
```

`magicChunkSnappyBytes` and `magicChunkBytes` not included.
This package provides various compression algorithms.
snappy is a drop-in replacement for github.com/golang/snappy offering better compression and concurrent streams.

Feb 5th, 2024 - v1.17.6
Jan 26th, 2024 - v1.17.5
Dec 1st, 2023 - v1.17.4
Nov 15th, 2023 - v1.17.3
Oct 22nd, 2023 - v1.17.2
Oct 14th, 2023 - v1.17.1
Sept 19th, 2023 - v1.17.0
July 1st, 2023 - v1.16.7
June 13, 2023 - v1.16.6
Apr 16, 2023 - v1.16.5
Apr 5, 2023 - v1.16.4
Mar 13, 2023 - v1.16.1
Feb 26, 2023 - v1.16.0
Jan 21st, 2023 (v1.15.15)
Jan 3rd, 2023 (v1.15.14)
Dec 11, 2022 (v1.15.13)
Oct 26, 2022 (v1.15.12)
HeaderNoCompression (https://github.com/klauspost/compress/pull/683)
Sept 26, 2022 (v1.15.11)
Sept 16, 2022 (v1.15.10)
July 21, 2022 (v1.15.9)
July 13, 2022 (v1.15.8)
June 29, 2022 (v1.15.7)
June 3, 2022 (v1.15.6)
May 25, 2022 (v1.15.5)
May 11, 2022 (v1.15.4)
May 5, 2022 (v1.15.3)
Apr 26, 2022 (v1.15.2)
Mar 11, 2022 (v1.15.1)
Mar 3, 2022 (v1.15.0)
Both compression and decompression now support “synchronous” stream operations. This means that whenever “concurrency” is set to 1, they will operate without spawning goroutines.

Stream decompression is now faster when used asynchronously, since the goroutine allocation much more effectively splits the workload. On typical streams this will use 2 cores fully for decompression. When a stream has finished decoding, no goroutines will be left over, so decoders can now safely be pooled and still be garbage collected.

While the release has been extensively tested, it is recommended to do thorough testing when upgrading.
Feb 22, 2022 (v1.14.4)
Feb 17, 2022 (v1.14.3)
Jan 25, 2022 (v1.14.2)
Jan 11, 2022 (v1.14.1)
Aug 30, 2021 (v1.13.5)
Aug 12, 2021 (v1.13.4)
Aug 3, 2021 (v1.13.3)
Jun 14, 2021 (v1.13.1)
Jun 3, 2021 (v1.13.0)
May 25, 2021 (v1.12.3)
Apr 27, 2021 (v1.12.2)
Apr 14, 2021 (v1.12.1)
Mar 26, 2021 (v1.11.13)
Mar 5, 2021 (v1.11.12)
s2: Add s2sx binary that creates self extracting archives.
Mar 1, 2021 (v1.11.9)
Feb 25, 2021 (v1.11.8)
Jan 14, 2021 (v1.11.7)
Jan 7, 2021 (v1.11.6)
Dec 20, 2020 (v1.11.4)
Nov 15, 2020 (v1.11.3)
Oct 11, 2020 (v1.11.2)
Oct 1, 2020 (v1.11.1)
Sept 8, 2020 (v1.11.0)
July 8, 2020 (v1.10.11)
June 23, 2020 (v1.10.10)
June 16, 2020 (v1.10.9):
June 5, 2020 (v1.10.8):
June 1, 2020 (v1.10.7):
May 21, 2020: (v1.10.6)
April 12, 2020: (v1.10.5)
Apr 8, 2020: (v1.10.4)
Mar 11, 2020: (v1.10.3)
Feb 27, 2020: (v1.10.2)
Feb 18, 2020: (v1.10.1)
Feb 4, 2020: (v1.10.0)
Use nil for previous behaviour. #216

Added -rm (remove source files) and -q (no output except errors) to the s2c and s2d commandline tools.

The packages are drop-in replacements for standard libraries. Simply replace the import path to use them:
| old import | new import | Documentation |
|---|---|---|
| compress/gzip | github.com/klauspost/compress/gzip | gzip |
| compress/zlib | github.com/klauspost/compress/zlib | zlib |
| archive/zip | github.com/klauspost/compress/zip | zip |
| compress/flate | github.com/klauspost/compress/flate | flate |
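Because the APIs match the standard library, existing code needs no changes beyond the import path. A minimal round-trip sketch, written here against the standard-library import so it runs anywhere; swapping in `github.com/klauspost/compress/gzip` is the only change:

```go
package main

import (
	"bytes"
	"fmt"
	"io"

	// Drop-in swap: replace this import with
	//   "github.com/klauspost/compress/gzip"
	// and the code below compiles unchanged.
	"compress/gzip"
)

// roundTrip compresses data to an in-memory buffer and decompresses
// it again, returning the recovered bytes.
func roundTrip(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	zr, err := gzip.NewReader(&buf)
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	out, err := roundTrip([]byte("hello gzip"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // prints "hello gzip"
}
```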
You may also be interested in pgzip, which is a drop-in replacement for gzip, supporting multithreaded compression on big files, and the optimized crc32 package used by these packages.
The packages contain the same functionality as the standard library, so you can use the godoc for that: gzip, zip, zlib, flate.
Currently there is only minor speedup on decompression (mostly CRC32 calculation).
Memory usage is typically 1MB for a Writer. stdlib is in the same range. If you expect to have a lot of concurrently allocated Writers consider using the stateless compress described below.
For compression performance, see: this spreadsheet.
To disable all assembly add `-tags=noasm`. This works across all packages.
This package offers stateless compression as a special option for gzip/deflate. It will do compression but without maintaining any state between Write calls.
This means there will be no memory kept between Write calls, but compression and speed will be suboptimal.
This is only relevant in cases where you expect to run many thousands of compressors concurrently, but with very little activity. This is not intended for regular web servers serving individual requests.
Because of this, the size of actual Write calls will affect output size.
In gzip, specify level `-3` / `gzip.StatelessCompression` to enable.
For direct deflate use, NewStatelessWriter and StatelessDeflate are available. See the documentation.
A `bufio.Writer` can of course be used to control write sizes. For example, to use a 4KB buffer:
```go
// replace 'ioutil.Discard' with your output.
gzw, err := gzip.NewWriterLevel(ioutil.Discard, gzip.StatelessCompression)
if err != nil {
	return err
}
defer gzw.Close()

w := bufio.NewWriterSize(gzw, 4096)
defer w.Flush()

// Write to 'w'
```
This will only use up to 4KB in memory when the writer is idle.
Compression is almost always worse than the fastest compression level and each write will allocate (a little) memory.
It has been a while since we have been looking at the speed of this package compared to the standard library, so I thought I would re-do my tests and give some overall recommendations based on the current state. All benchmarks have been performed with Go 1.10 on my Desktop Intel(R) Core(TM) i7-2600 CPU @3.40GHz. Since I last ran the tests, I have gotten more RAM, which means tests with big files are no longer limited by my SSD.
The raw results are in my updated spreadsheet. Due to cgo changes and upstream updates I could not get the cgo version of gzip to compile. Instead I included the zstd cgo implementation. If I get cgo gzip to work again, I might replace the results in the sheet.
The columns to take note of are:
- MB/s - the throughput.
- Reduction - the data size reduction in percent of the original.
- Rel Speed - relative speed compared to the standard library at the same level.
- Smaller - how many percent smaller is the compressed output compared to stdlib. Negative means the output was bigger.
- Loss - the loss (or gain) in compression as a percentage difference of the input.
The `gzstd` (standard library gzip) and `gzkp` (this package gzip) only use one CPU core. `pgzip` and `bgzf` use all 4 cores. `zstd` uses one core, and is a beast (but not Go, yet).
There appears to be a roughly 5-10% speed advantage over the standard library when comparing at similar compression levels.
The biggest difference you will see is the result of re-balancing the compression levels. I wanted my library to give a smoother transition between the compression levels than the standard library.
This package attempts to provide a smoother transition, where “1” takes a lot of shortcuts, “5” is the reasonable trade-off and “9” is “give me the best compression”, and the values in between give something reasonable in between. The standard library has big differences in levels 1-4, while levels 5-9 show no significant gains - often spending a lot more time than can be justified by the achieved compression.
There are links to all the test data in the spreadsheet in the top left field on each tab.
This test set aims to emulate typical use in a web server. The test-set is 4GB data in 53k files, and is a mixture of (mostly) HTML, JS, CSS.
Since level 1 and 9 are close to being the same code, they are quite close. But looking at the levels in-between the differences are quite big.
Looking at level 6, this package is 88% faster, but will output about 6% more data. For a web server, this means you can serve 88% more data, but have to pay for 6% more bandwidth. You can draw your own conclusions on what would be the most expensive for your case.
This test is for typical data files stored on a server. In this case it is a collection of Go precompiled objects. They are very compressible.
The picture is similar to the web content, but with small differences since this is very compressible. Levels 2-3 offer good speed, but sacrifice quite a bit of compression.
The standard library seems suboptimal on level 3 and 4 - offering both worse compression and speed than level 6 & 7 of this package respectively.
This is a JSON file with very high redundancy. The reduction starts at 95% on level 1, so in real life terms we are dealing with something like a highly redundant stream of data, etc.
It is definitely visible that we are dealing with specialized content here, so the results are very scattered. This package does not do very well at levels 1-4, but picks up significantly at level 5, with levels 7 and 8 offering great speed for the achieved compression.
So if you know your content is extremely compressible you might want to go slightly higher than the defaults. The standard library has a huge gap between levels 3 and 4 in terms of speed (2.75x slowdown), so it offers little “middle ground”.
This is a pretty common test corpus: enwik9. It contains the first 10^9 bytes of the English Wikipedia dump on Mar. 3, 2006. This is a very good test of typical text based compression and more data heavy streams.
We see a similar picture here as in “Web Content”. On equal levels some compression is sacrificed for more speed. Level 5 seems to be the best trade-off between speed and size, beating stdlib level 3 in both.
I will combine two test sets, one 10GB file set and a VM disk image (~8GB). Both contain different data types and represent a typical backup scenario.
The most notable thing is how quickly the standard library drops to very low compression speeds around level 5-6 without any big gains in compression. Since this type of data is fairly common, this does not seem like good behavior.
This is mainly a test of how good the algorithms are at detecting un-compressible input. The standard library only offers this feature with very conservative settings at level 1. Obviously there is no reason for the algorithms to try to compress input that cannot be compressed. The only downside is that it might skip some compressible data on false detections.
This compression library adds a special compression level, named `HuffmanOnly`, which allows near linear time compression. This is done by completely disabling matching of previous data, and only reducing the number of bits used to represent each character.
This means that often used characters, like ‘e’ and ‘ ’ (space) in text, use the fewest bits to represent, and rare characters like ‘¤’ take more bits to represent. For more information see Wikipedia or this nice video.
Since this type of compression has much less variance, the compression speed is mostly unaffected by the input data, and is usually more than 180MB/s for a single core.
The downside is that the compression ratio is usually considerably worse than even the fastest conventional compression. The compression ratio can never be better than 8:1 (12.5%).
The linear time compression can be used as a “better than nothing” mode, where you cannot risk the encoder slowing down on some content. For comparison, the size of the “Twain” text is 233460 bytes (+29% vs. level 1) and encode speed is 144MB/s (4.5x level 1). So in this case you trade a 30% size increase for a 4 times speedup.
For more information see my blog post on Fast Linear Time Compression.
This is implemented in Go 1.7 as the “Huffman Only” mode, though it is not exposed for gzip.
Here are other packages of good quality and pure Go (no cgo wrappers or autoconverted code):
This code is licensed under the same conditions as the original Go code. See LICENSE file.