| <p>Android NDK & ARM NEON instruction set extension support</p> |
| <h1>Introduction:</h1> |
| <p>Android NDK r3 added support for the new 'armeabi-v7a' ARM-based ABI |
| that allows native code to use two useful instruction set extensions:</p> |
| <ul> |
| <li> |
| <p>Thumb-2, which provides performance comparable to 32-bit ARM |
| instructions with similar compactness to Thumb-1</p> |
| </li> |
| <li> |
| <p>VFPv3, which provides hardware FPU registers and computations, |
| to boost floating point performance significantly.</p> |
| </li> |
| </ul> |
| <p>More specifically, 'armeabi-v7a' by default only assumes VFPv3-D16, |
| which requires just 16 hardware 64-bit FPU registers.</p> |
| <p>For more information, see <a href="CPU-ARCH-ABIS.html">CPU-ARCH-ABIS</a>.</p> |
| <p>The ARMv7 Architecture Reference Manual also defines another optional |
| instruction set extension known as "ARM Advanced SIMD", nicknamed |
| "NEON". It provides:</p> |
| <ul> |
| <li> |
| <p>A set of scalar/vector instructions and registers |
| (the latter share the same register bank as the FPU ones), |
| comparable to MMX/SSE/3DNow! in the x86 world.</p> |
| </li> |
| <li> |
| <p>VFPv3-D32 as a requirement (i.e. 32 hardware FPU 64-bit registers, |
| instead of the minimum of 16).</p> |
| </li> |
| </ul> |
| <p>Not all ARMv7-based Android devices will support NEON, but those that |
| do may benefit in significant ways from the scalar/vector instructions.</p> |
| <p>The NDK can compile whole modules, or even individual source files, |
| with NEON support. In that case, it passes a specific compiler flag that |
| enables both the GCC ARM NEON intrinsics and VFPv3-D32. The intrinsics |
| are described here:</p> |
| <blockquote> |
| <p><a href="http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html">http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html</a></p> |
| </blockquote> |
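| <p>As a rough illustration of what intrinsics-based code looks like, here is a |
| minimal sketch. The helper name <code>add_floats</code> is hypothetical, and the |
| preprocessor test assumes the compiler predefines <code>__ARM_NEON__</code> or |
| <code>__ARM_NEON</code> when NEON code generation is enabled; without it, only |
| the scalar loop is compiled.</p> |

```c
#include <stddef.h>

#if defined(__ARM_NEON__) || defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n floats. */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

#if defined(__ARM_NEON__) || defined(__ARM_NEON)
    /* Process four floats per iteration in a 128-bit NEON register. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#endif

    /* Scalar loop: handles the tail, or everything when NEON is disabled. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```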
| <h2>Using <code>LOCAL_ARM_NEON</code>:</h2> |
| <p>Set <code>LOCAL_ARM_NEON</code> to 'true' in your module definition, and the NDK |
| will build all of the module's source files with NEON support. This is useful |
| when you want to build a static or shared library that specifically contains |
| NEON code paths.</p> |
| <h2>Using the .neon suffix:</h2> |
| <p>When listing source files in your <code>LOCAL_SRC_FILES</code> variable, you can |
| add the .neon suffix to indicate that the corresponding source(s) should |
| be built with NEON support. For example:</p> |
| <pre><code> LOCAL_SRC_FILES := foo.c.neon bar.c |
| </code></pre> |
| <p>This builds only 'foo.c' with NEON support; 'bar.c' is built without it.</p> |
| <p>Note that the .neon suffix can be used with the .arm suffix too (used to |
| specify the 32-bit ARM instruction set for non-NEON instructions), but must |
| appear after it.</p> |
| <p>In other words, 'foo.c.arm.neon' works, but 'foo.c.neon.arm' does NOT.</p> |
| <h2>Build Requirements:</h2> |
| <p>NEON support only works when targeting the 'armeabi-v7a' ABI; otherwise the |
| NDK build scripts will complain and abort. It is important to use checks like |
| the following in your Android.mk:</p> |
| <pre><code> # define a static library containing our NEON code |
| ifeq ($(TARGET_ARCH_ABI),armeabi-v7a) |
|     include $(CLEAR_VARS) |
|     LOCAL_MODULE := mylib-neon |
|     LOCAL_SRC_FILES := mylib-neon.c |
|     LOCAL_ARM_NEON := true |
|     include $(BUILD_STATIC_LIBRARY) |
| endif # TARGET_ARCH_ABI == armeabi-v7a |
| </code></pre> |
| <h2>Runtime Detection:</h2> |
| <p>As noted above, NOT ALL ARMv7-BASED ANDROID DEVICES WILL SUPPORT NEON! |
| It is thus crucial to perform runtime detection to know if the NEON-capable |
| machine code can be run on the target device.</p> |
| <p>To do that, use the 'cpufeatures' library that comes with this NDK. To learn |
| more about it, see <a href="CPU-FEATURES.html">CPU-FEATURES</a>.</p> |
| <p>You should explicitly check that android_getCpuFamily() returns |
| ANDROID_CPU_FAMILY_ARM, and that android_getCpuFeatures() returns a value |
| that has the ANDROID_CPU_ARM_FEATURE_NEON flag set, as in:</p> |
| <pre><code> #include <cpu-features.h> |
| |
| ... |
| ... |
| |
| if (android_getCpuFamily() == ANDROID_CPU_FAMILY_ARM && |
|     (android_getCpuFeatures() & ANDROID_CPU_ARM_FEATURE_NEON) != 0) |
| { |
|     // use NEON-optimized routines |
|     ... |
| } |
| else |
| { |
|     // use non-NEON fallback routines instead |
|     ... |
| } |
| |
| ... |
| </code></pre> |
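| <p>One possible way to organize such a check is to select the routine once and |
| call it through a function pointer afterwards. The sketch below is only an |
| illustration: <code>scale</code>, <code>scale_c</code>, <code>scale_neon</code> and the |
| <code>HAVE_NEON</code> macro are hypothetical names. It assumes <code>HAVE_NEON</code> |
| is defined by your Android.mk only for 'armeabi-v7a' builds, and that |
| <code>scale_neon</code> lives in a separate source file built with NEON support.</p> |

```c
#include <stddef.h>

#if defined(HAVE_NEON)   /* hypothetical macro, set by the build for armeabi-v7a only */
#include <cpu-features.h>
/* Assumed to be implemented in another file that is built with NEON support. */
void scale_neon(float *data, size_t n, float k);
#endif

typedef void (*scale_fn)(float *data, size_t n, float k);

/* Portable fallback: multiply every element by k. */
static void scale_c(float *data, size_t n, float k)
{
    size_t i;
    for (i = 0; i < n; i++)
        data[i] *= k;
}

/* Decide which implementation can run on this device. */
static scale_fn pick_scale(void)
{
#if defined(HAVE_NEON)
    if (android_getCpuFamily() == ANDROID_CPU_FAMILY_ARM &&
        (android_getCpuFeatures() & ANDROID_CPU_ARM_FEATURE_NEON) != 0)
        return scale_neon;
#endif
    return scale_c;
}

/* Public entry point: resolves the implementation on first use. */
void scale(float *data, size_t n, float k)
{
    static scale_fn impl;   /* NULL until the first call */

    if (impl == NULL)
        impl = pick_scale();
    impl(data, n, k);
}
```

| <p>Callers simply use <code>scale()</code> and never touch NEON code directly. The lazy |
| initialization above is not synchronized, so a multi-threaded program may |
| prefer to resolve the pointer once at startup.</p> |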
| <h2>Sample code:</h2> |
| <p>Look at the source code for the "hello-neon" sample in this NDK for an example |
| of how to use the 'cpufeatures' library and NEON intrinsics at the same time.</p> |
| <p>The sample implements a tiny benchmark of a FIR filter loop, using a C version |
| and a NEON-optimized version for devices that support it.</p> |
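| <p>To give an idea of what a C version and a NEON version of a FIR loop can look |
| like side by side, here is a minimal sketch. It is not the sample's code: the |
| function names, the use of <code>float</code> data, and the <code>__ARM_NEON__</code> / |
| <code>__ARM_NEON</code> preprocessor test are assumptions made for illustration.</p> |

```c
#include <stddef.h>

#if defined(__ARM_NEON__) || defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* out[i] = sum over k of in[i + k] * taps[k], for i in [0, n_out).
 * The caller must provide n_out + n_taps - 1 input samples. */
void fir_c(float *out, const float *in, size_t n_out,
           const float *taps, size_t n_taps)
{
    size_t i, k;

    for (i = 0; i < n_out; i++) {
        float acc = 0.0f;
        for (k = 0; k < n_taps; k++)
            acc += in[i + k] * taps[k];
        out[i] = acc;
    }
}

#if defined(__ARM_NEON__) || defined(__ARM_NEON)
/* Same contract as fir_c, but computes four outputs per iteration. */
void fir_neon(float *out, const float *in, size_t n_out,
              const float *taps, size_t n_taps)
{
    size_t i = 0, k;

    for (; i + 4 <= n_out; i += 4) {
        float32x4_t acc = vdupq_n_f32(0.0f);
        /* acc += in[i + k .. i + k + 3] * taps[k] */
        for (k = 0; k < n_taps; k++)
            acc = vmlaq_n_f32(acc, vld1q_f32(in + i + k), taps[k]);
        vst1q_f32(out + i, acc);
    }

    /* Remaining outputs (fewer than four) use the C version. */
    fir_c(out + i, in + i, n_out - i, taps, n_taps);
}
#endif
```

| <p>As described under Runtime Detection, a function like <code>fir_neon</code> should only |
| be called after checking that the device actually supports NEON.</p> |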