Manually unroll the inner loop of Neon sad16x_4d()

Manually unrolling the inner loop is sufficient to stop the compiler
getting confused and emitting inefficient code.

Co-authored-by: James Greenhalgh <james.greenhalgh@arm.com>

Bug: b/181236880
Change-Id: I860768ce0e6c0e0b6286d3fc1b94f0eae95d0a1a