Merge "[NEON] Optimize FHT functions, add highbd FHT 4x4" into main