Enable decoder input reads in 64-bit chunks
This enables reading bigger chunks of data in the DEFLATE decoder.
Instead of performing 2x 32-bit loads (e.g. ldrb w22, [x9]),
followed by a second write into the higher lane of the register
(e.g. w23), memcpy will do a single 64-bit load into the same register
(e.g. ldr x22, [x9]).
This also halves the number of subsequent operations (e.g. adds and
shifts), improving decompression performance by 9% on little cores
(A53).
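
A minimal sketch of the idea, not the code from this change: the helper
names below are hypothetical, and a little-endian host is assumed. The
first helper assembles the input one byte at a time; the second lets
memcpy express the same read as one fixed-size copy, which compilers
typically lower to a single 64-bit load.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: assemble 8 input bytes one at a time.
 * Each step costs a narrow load plus a shift and an OR. */
static uint64_t load_64_bits_bytewise(const unsigned char *in) {
    uint64_t chunk = 0;
    for (int i = 0; i < 8; i++) {
        chunk |= (uint64_t)in[i] << (8 * i);  /* little-endian byte order */
    }
    return chunk;
}

/* Hypothetical helper: read the same 8 bytes with one fixed-size memcpy.
 * memcpy keeps the read safe for unaligned pointers; on a little-endian
 * host the result matches the byte-wise version above. */
static uint64_t load_64_bits(const unsigned char *in) {
    uint64_t chunk;
    memcpy(&chunk, in, sizeof(chunk));
    return chunk;
}
```

Both helpers read exactly 8 bytes, so the caller must guarantee that
much input is available before using either one.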
Reviewed-by: Mike Klein <email@example.com>
Reviewed-by: Adenilson Cavalcanti <firstname.lastname@example.org>
Commit-Queue: Adenilson Cavalcanti <email@example.com>
1 file changed