commit | 432d186116eaef53671152b32131afb2c799c1dc | |
---|---|---|
author | George Steed <george.steed@arm.com> | Thu Apr 18 13:01:21 2024 |
committer | Frank Barchard <fbarchard@chromium.org> | Mon Sep 16 04:31:35 2024 |
tree | 0055f392eb6d07e36a0f91705f0b71935f309a35 | |
parent | 1c31461771ed6d21101ea7236496a620ba926863 | |
[AArch64] Add Neon dot-product implementation for ARGBSepiaRow

We can use the dot product instructions to apply the coefficients
directly, without the need for LD4 de-interleaving load instructions,
since these are known to be slow on some micro-architectures.

ST4 is also known to be slow on more modern micro-architectures;
however, avoiding it is left for a future SVE implementation, where we
can make use of interleaving-narrowing instructions.

Reduction in cycle counts observed compared to the existing Neon code:

Cortex-A55: -5.8%
Cortex-A510: -18.9%
Cortex-A76: -21.8%
Cortex-A720: -30.2%
Cortex-X1: -28.6%
Cortex-X2: -23.4%

Bug: b/42280946
Change-Id: I5887559649cc805a810d867b652c85d48285657d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5790970
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
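The commit message states the idea but not the arithmetic. The C sketch below is a scalar model, not libyuv's Neon kernel. `Dot4U8` mimics what one 32-bit lane of an unsigned 8-bit dot-product instruction (such as AArch64 UDOT) computes. With it, each output channel becomes a single 4-byte dot product over a pixel's interleaved B, G, R, A bytes, with a zero weight for alpha, so no de-interleaving is needed. `SepiaRow_Planar` models the LD4/ST4 route for comparison. The coefficient table `kSepia`, the 7-bit right shift, the clamping to 255, and all function names are assumptions chosen for illustration and may differ from the real implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed, illustrative sepia weights, scaled by 2^7 = 128. Each row is
   applied to one pixel's bytes in memory order B, G, R, A; the alpha
   weight is 0, so alpha never contributes to the result. */
static const uint8_t kSepia[3][4] = {
    {17, 68, 35, 0}, /* new B */
    {22, 88, 45, 0}, /* new G */
    {24, 98, 50, 0}, /* new R */
};

/* Scalar model of one 32-bit lane of an unsigned 8-bit dot-product
   instruction: four u8 * u8 products summed into a u32 accumulator. */
static uint32_t Dot4U8(const uint8_t a[4], const uint8_t b[4]) {
  uint32_t acc = 0;
  for (int i = 0; i < 4; ++i) acc += (uint32_t)a[i] * b[i];
  return acc;
}

static uint8_t Clamp255(uint32_t v) { return v > 255 ? 255 : (uint8_t)v; }

/* Dot-product formulation: each pixel's interleaved BGRA bytes are used
   as-is; there is no split into separate channel planes. */
static void SepiaRow_DotProduct(uint8_t* argb, int width) {
  assert(width >= 0);
  for (int x = 0; x < width; ++x) {
    uint8_t* px = argb + 4 * x;
    uint8_t out[3];
    /* All three outputs are computed before any byte is overwritten. */
    for (int c = 0; c < 3; ++c) out[c] = Clamp255(Dot4U8(px, kSepia[c]) >> 7);
    memcpy(px, out, 3); /* px[3] (alpha) is left untouched */
  }
}

/* LD4/ST4-style formulation for comparison: de-interleave up to 8 pixels
   into B, G, R, A planes, compute per plane, then store interleaved. */
static void SepiaRow_Planar(uint8_t* argb, int width) {
  assert(width >= 0);
  for (int x0 = 0; x0 < width; x0 += 8) {
    int n = width - x0 < 8 ? width - x0 : 8;
    uint8_t p[4][8]; /* p[ch][i]: channel ch of pixel i ("LD4") */
    for (int i = 0; i < n; ++i)
      for (int ch = 0; ch < 4; ++ch) p[ch][i] = argb[4 * (x0 + i) + ch];
    for (int i = 0; i < n; ++i) {
      for (int c = 0; c < 3; ++c) {
        uint32_t s = kSepia[c][0] * (uint32_t)p[0][i] +
                     kSepia[c][1] * (uint32_t)p[1][i] +
                     kSepia[c][2] * (uint32_t)p[2][i];
        argb[4 * (x0 + i) + c] = Clamp255(s >> 7); /* interleaved store ("ST4") */
      }
    }
  }
}
```

Running both functions on copies of the same BGRA row should give byte-identical output; the change the commit describes is in how pixels are loaded, not in what is computed. In real Neon code, the indexed form of UDOT (`UDOT Vd.4S, Vn.16B, Vm.4B[index]`) multiplies four interleaved pixels against one broadcast group of four coefficient bytes, producing one output channel for all four pixels without first de-interleaving them.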
libyuv is an open source project that includes YUV scaling and conversion functionality.
See Getting started for instructions on how to get started developing.
You can also browse the docs directory for more documentation.