10823f54681747b9f64deb3002531c95cc67d17f - webm/libvpx

commit	10823f54681747b9f64deb3002531c95cc67d17f
author	Jonathan Wright <jonathan.wright@arm.com>	Sat May 22 21:07:25 2021
committer	James Zern <jzern@google.com>	Tue May 25 00:08:32 2021
tree	94b82a45a96d29498e9cf7fdd06250f576af68b4
parent	66c1ff6850fd53bcf5c17247569bea1d700d6247

Merge transpose and permute in Neon SDOT vertical convolution

The original dot-product implementation of vpx_convolve8_vert_neon
used a separate transpose before and after the convolution operation.
This patch merges the first transpose with the TBL permute (necessary
before using SDOT to compute the convolution) to significantly reduce
the amount of data re-arrangement. This new approach also allows for
more effective data re-use between loop iterations.
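
For readers unfamiliar with SDOT, the data layout it needs can be sketched
in portable scalar C. This is an illustrative model only, not the code from
this patch: sdot_lane, vert8_reference and vert8_column_model are
hypothetical names, samples are assumed to already be signed 8-bit, and the
sliding 8-row window is one assumed way to picture re-use between loop
iterations. One 32-bit lane of SDOT accumulates a dot product of four
adjacent bytes, so a vertical filter needs the four bytes of a lane to come
from four consecutive rows of the same column. Producing that column-major
order is the job of the transpose/permute step.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 32-bit SDOT lane: acc += a[0..3] . b[0..3]. */
static int32_t sdot_lane(int32_t acc, const int8_t a[4], const int8_t b[4]) {
  for (int i = 0; i < 4; ++i) acc += (int32_t)a[i] * (int32_t)b[i];
  return acc;
}

/* Reference: direct 8-tap vertical filter for a single output sample. */
static int32_t vert8_reference(const int8_t *src, int stride,
                               const int8_t filter[8]) {
  int32_t sum = 0;
  for (int k = 0; k < 8; ++k) sum += (int32_t)src[k * stride] * filter[k];
  return sum;
}

/* Hypothetical model: filter h output rows of one column.
 * win[] holds 8 consecutive rows of the column in lane order, i.e. the
 * layout a transpose/permute would have to produce for SDOT. Between
 * iterations the window slides down by one row, so 7 of the 8 samples are
 * re-used and only one new row is loaded.
 * Reads rows 0 .. h + 6 of the column. */
static void vert8_column_model(const int8_t *src, int stride,
                               const int8_t filter[8], int h, int32_t *out) {
  int8_t win[8];
  assert(h > 0);
  for (int r = 0; r < 8; ++r) win[r] = src[r * stride];
  for (int y = 0; y < h; ++y) {
    /* Two 4-element dot products cover the 8 taps. */
    int32_t acc = sdot_lane(0, win, filter);
    out[y] = sdot_lane(acc, win + 4, filter + 4);
    if (y + 1 < h) {
      for (int r = 0; r < 7; ++r) win[r] = win[r + 1];
      win[7] = src[(y + 8) * stride];
    }
  }
}
```

In this model each output costs two lane-wide dot products, and the only
re-arrangement is gathering a column into win[]; the scalar loop stands in
for what the Neon code would do on many columns at once.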

Co-authored by: James Greenhalgh <james.greenhalgh@arm.com>

Bug: b/181236880
Change-Id: I87fe4dadd312c3ad6216943b71a5410ddf4a1b5b

vpx_dsp/arm/vpx_convolve8_neon.c
vpx_dsp/arm/vpx_convolve8_neon.h

2 files changed