[zlib][x86] Allow build & execution of both optimized CRC-32 functions
In Chromium zlib we have quite a few implementations of CRC-32, as follows:
- x86: vectorized SSE4.2 and AVX-512 functions.
- Arm: scalar crc32 using the crypto extensions (32-bit and 64-bit) and a
PMULL-based function (aarch64 only).
- Portable: using the Kadatch-Jenkins algorithm, implemented by Mark Adler
in the canonical zlib.
The current behavior on x86-64 is exclusive: either the AVX-512 or the SSE4.2
function is used, decided at compile time.
Instead, the better approach is to build both when AVX-512 optimizations are
enabled at compile time and pick the better-performing version at run time
depending on the *length* of the input data.
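As a rough illustration of length-based selection, here is a minimal sketch in
C. It is not the code in this change: the names `crc32_dispatch`,
`cpu_has_avx512`, `CRC32_WIDE_MIN_LEN`, and both `*_stub` kernels are
hypothetical, the 256-byte cutoff is an arbitrary placeholder, and a bitwise
reference CRC-32 stands in for the real SIMD functions so that the example
compiles on any machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cutoff: inputs at least this long go to the wide kernel.
 * The real threshold (if any) is not stated in this description. */
#define CRC32_WIDE_MIN_LEN 256u

enum { KERNEL_SSE42 = 0, KERNEL_AVX512 = 1 };

/* Hypothetical run-time CPU feature flag; a real build would set it once
 * from CPUID. It defaults to "not available". */
static int cpu_has_avx512 = 0;

/* Records which kernel ran last, for demonstration only. */
static int last_kernel = KERNEL_SSE42;

/* Bitwise reference CRC-32 (reflected polynomial 0xEDB88320), following the
 * same convention as zlib's crc32(): pass 0 as the initial value. */
static uint32_t crc32_reference(uint32_t crc, const unsigned char *buf,
                                size_t len) {
    crc = ~crc;
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Stand-in for the SSE4.2 function (assumption: same result, less setup). */
static uint32_t crc32_sse42_stub(uint32_t crc, const unsigned char *buf,
                                 size_t len) {
    last_kernel = KERNEL_SSE42;
    return crc32_reference(crc, buf, len);
}

/* Stand-in for the AVX-512 function (assumption: pays off on long inputs). */
static uint32_t crc32_avx512_stub(uint32_t crc, const unsigned char *buf,
                                  size_t len) {
    last_kernel = KERNEL_AVX512;
    return crc32_reference(crc, buf, len);
}

/* Both kernels are compiled in; the choice is made per call from the
 * CPU flag and the input length. */
uint32_t crc32_dispatch(uint32_t crc, const unsigned char *buf, size_t len) {
    if (cpu_has_avx512 && len >= CRC32_WIDE_MIN_LEN)
        return crc32_avx512_stub(crc, buf, len);
    return crc32_sse42_stub(crc, buf, len);
}
```

Both kernels must return identical checksums for the same input, so the
selection only affects speed, never the result.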
Initial data points to nearly 2% faster decompression with this strategy,
tested on a 4th Gen Xeon (SPR).
Bug: 340921315
Change-Id: I6bb0bea763be1bb26b63d4f966b767b00310bd6c
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5549255
Commit-Queue: Adenilson Cavalcanti <cavalcantii@chromium.org>
Reviewed-by: Hans Wennborg <hans@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1304191}
NOKEYCHECK=True
GitOrigin-RevId: 82602b4e2b1b2a476e465f1531545d3efa61fbf3
2 files changed