commit | 356232b687b98328aa28c64889b429a7649c0db1 | |
---|---|---|
author | George Steed <george.steed@arm.com> | Thu Mar 14 05:12:21 2024 |
committer | Frank Barchard <fbarchard@chromium.org> | Thu Apr 25 21:23:55 2024 |
tree | 7a799cb2669aa1300f88b65ba2bfd54df8812dd5 | |
parent | 4f52235a6719eba097ccac60d84bd2c23bad89ed | |
[AArch64] Replace UQXTN{,2} with UZP2 in Convert16To8Row_NEON

The existing code makes use of a pair of shifts to put the bits we want in the low part of each vector lane and then a pair of UQXTN and UQXTN2 instructions to perform a saturating cast down from 16-bit elements to 8-bit elements.

We can instead achieve the same thing by adding eight to the first shift amount so that the bits we want appear in the high half of the lane, doing the saturation at the same time, and then simply use UZP2 to pull out the high halves of each lane in a single instruction.

Reduction in runtime for Convert16To8Row_NEON:

    Cortex-A55:  -19.7%
    Cortex-A510: -23.5%
    Cortex-A76:  -35.4%
    Cortex-X2:   -34.1%

Bug: libyuv:976
Change-Id: I9a80c0f4f2c6b5203f23e422c0970d3167052f91
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5463950
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
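The equivalence the message relies on can be checked with a scalar model of a single 16-bit lane. This is only an illustrative sketch in portable C, not the libyuv source: the function names are hypothetical, `shr` stands for the number of low bits to discard, and the model assumes the saturating shift behaves as an unsigned 16-bit shift that clamps to 0xFFFF on overflow and becomes a truncating right shift when the amount is negative.

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane. Hypothetical helper names, not libyuv code. */

/* Old approach: plain right shift, then a saturating narrow to 8 bits
 * (the role played by UQXTN/UQXTN2). */
static uint8_t narrow_shift_then_saturate(uint16_t x, int shr) {
  uint16_t shifted = (uint16_t)(x >> shr);
  return shifted > 255 ? 255 : (uint8_t)shifted;
}

/* New approach: add eight to the shift so the wanted bits land in the high
 * byte, saturating to 0xFFFF on overflow, then keep only the high byte
 * (the role played by UZP2). */
static uint8_t narrow_saturating_shift_high_byte(uint16_t x, int shr) {
  int shl = 8 - shr; /* left shift amount; negative means shift right */
  uint32_t wide;
  if (shl >= 0) {
    wide = (uint32_t)x << shl;
    if (wide > 0xFFFF) wide = 0xFFFF; /* unsigned saturation of the lane */
  } else {
    wide = (uint32_t)x >> -shl;
  }
  return (uint8_t)(wide >> 8); /* high half of the 16-bit lane */
}

/* Exhaustively compare both models for every lane value and shift 0..15. */
static int count_mismatches(void) {
  int mismatches = 0;
  for (int shr = 0; shr <= 15; ++shr) {
    for (uint32_t x = 0; x <= 0xFFFF; ++x) {
      uint8_t a = narrow_shift_then_saturate((uint16_t)x, shr);
      uint8_t b = narrow_saturating_shift_high_byte((uint16_t)x, shr);
      if (a != b) ++mismatches;
    }
  }
  return mismatches;
}
```

In this model, any value that would exceed 255 after the right shift also overflows 16 bits after the larger left shift, so the clamp to 0xFFFF leaves 0xFF in the high byte. That is why the narrowing step no longer needs to saturate.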
libyuv is an open source project that includes YUV scaling and conversion functionality.