Optimize remaining mse and sse functions in variance_neon.c
Implement sum of squared difference calculations in vpx_mse16x16_neon
and vpx_get4x4sse_cs_neon using the ABD and UDOT instructions -
instead of widening subtracts followed by a sequence of MLAs.
The existing implementation is retained for use on CPUs that do not
implement the Armv8.4-A UDOT instruction. This commit also updates
the variable names used in the existing implementations to be more
1 file changed