5645938c36b9cd1fa4f7c97da0e8c0ef0330d45d - webm/libvpx

commit	5645938c36b9cd1fa4f7c97da0e8c0ef0330d45d
author	Jonathan Wright <jonathan.wright@arm.com>	Wed May 18 15:58:50 2022
committer	Jonathan Wright <jonathan.wright@arm.com>	Thu Jan 12 10:43:13 2023
tree	2f5fe8b4cb966fd77f420a4f9573126c9855f1e6
parent	f952068691bcc397a17721d004ac84e63e46bb3c

Implement vertical convolutions using Neon USDOT instruction

Add additional AArch64 paths for vpx_convolve8_vert_neon and
vpx_convolve8_avg_vert_neon that use the Armv8.6-A USDOT (mixed-sign
dot-product) instruction. The USDOT instruction takes an unsigned
8-bit operand vector and a signed 8-bit operand vector and accumulates
the sums of their products into signed 32-bit lanes. This is helpful
because convolution filters often have both positive and negative
values, while the 8-bit pixel channel data being filtered is all
unsigned. As a result, the USDOT convolution paths added here do not
have to do the "transform the pixel channel data to [-128, 128) and
correct for it later" dance that we have to do with the SDOT paths.
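
The arithmetic difference between the two paths can be sketched with a
portable scalar model. This is not the Neon code from this change: the
function names and the example filter taps below are hypothetical, and
each "lane" helper only models the four-element dot product that one
32-bit lane of USDOT or SDOT accumulates.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative 8-tap filter (hypothetical values, taps sum to 128). */
const int8_t kExampleFilter[8] = { -1, 3, -10, 72, 72, -10, 3, -1 };

/* Scalar model of one USDOT lane: unsigned samples times signed taps. */
static int32_t usdot_lane(int32_t acc, const uint8_t *s, const int8_t *f) {
  for (int i = 0; i < 4; ++i) acc += (int32_t)s[i] * (int32_t)f[i];
  return acc;
}

/* Scalar model of one SDOT lane: signed samples times signed taps. */
static int32_t sdot_lane(int32_t acc, const int8_t *s, const int8_t *f) {
  for (int i = 0; i < 4; ++i) acc += (int32_t)s[i] * (int32_t)f[i];
  return acc;
}

/* USDOT-style path: the unsigned pixels are used directly. */
int32_t convolve8_usdot(const uint8_t src[8], const int8_t filter[8]) {
  int32_t sum = usdot_lane(0, src, filter);
  return usdot_lane(sum, src + 4, filter + 4);
}

/* SDOT-style path: shift pixels to [-128, 128), then correct by adding
 * 128 * sum(filter), here folded into the initial accumulator value. */
int32_t convolve8_sdot(const uint8_t src[8], const int8_t filter[8]) {
  int8_t shifted[8];
  int32_t correction = 0;
  for (int i = 0; i < 8; ++i) {
    shifted[i] = (int8_t)((int)src[i] - 128);
    correction += 128 * (int32_t)filter[i];
  }
  int32_t sum = sdot_lane(correction, shifted, filter);
  return sdot_lane(sum, shifted + 4, filter + 4);
}

/* Both models should agree for any 8-bit input. */
void check_paths_agree(void) {
  const uint8_t mixed[8] = { 0, 255, 10, 200, 128, 64, 32, 16 };
  assert(convolve8_usdot(mixed, kExampleFilter) ==
         convolve8_sdot(mixed, kExampleFilter));
}
```

Both functions return the same unrounded filter sum; the USDOT model
simply skips the range shift and the correction term.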

The USDOT instruction is optional from Armv8.2 to Armv8.5 but
mandatory from Armv8.6 onwards. The availability of the USDOT
instruction is indicated by the feature macro
__ARM_FEATURE_MATMUL_INT8. The SDOT paths are retained for use on
target CPUs that do not implement the USDOT instruction.
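
A minimal sketch of how such compile-time selection might look is given
below. Only __ARM_FEATURE_MATMUL_INT8 is named in this change; the use
of the ACLE macro __ARM_FEATURE_DOTPROD to gate the SDOT fallback, and
the function names, are assumptions made for illustration and may not
match the guards in the libvpx sources.

```c
#include <assert.h>
#include <string.h>

/* Reports which dot-product path this translation unit would select.
 * The function name and returned strings are hypothetical. */
const char *convolve8_vert_path_name(void) {
#if defined(__aarch64__) && defined(__ARM_FEATURE_MATMUL_INT8)
  return "usdot"; /* mixed-sign dot product available */
#elif defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
  return "sdot"; /* assumed guard for the retained SDOT paths */
#else
  return "baseline"; /* no dot-product instructions assumed */
#endif
}

/* Sanity check: exactly one of the three known paths is reported. */
void check_path_is_known(void) {
  const char *p = convolve8_vert_path_name();
  assert(strcmp(p, "usdot") == 0 || strcmp(p, "sdot") == 0 ||
         strcmp(p, "baseline") == 0);
}
```

With GCC or Clang, the first branch is expected to be taken when
targeting AArch64 with the i8mm extension enabled, for example with
-march=armv8.6-a.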

Change-Id: Ifbf467681dd53bb1d26e22359885e6edde3c5c72

2 files changed