<p>Android NDK &amp; ARM NEON instruction set extension support</p>
<h1>Introduction:</h1>
<p>Android NDK r3 added support for the new 'armeabi-v7a' ARM-based ABI
that allows native code to use two useful instruction set extensions:</p>
<ul>
<li>
<p>Thumb-2, which provides performance comparable to 32-bit ARM
instructions with similar compactness to Thumb-1</p>
</li>
<li>
<p>VFPv3, which provides hardware FPU registers and computations,
to boost floating point performance significantly.</p>
</li>
</ul>
<p>More specifically, 'armeabi-v7a' by default only supports VFPv3-D16,
which requires only 16 hardware FPU 64-bit registers.</p>
<p>For more information, see <a href="CPU-ARCH-ABIS.html">CPU-ARCH-ABIS</a>.</p>
<p>The ARMv7 Architecture Reference Manual also defines another optional
instruction set extension known as "ARM Advanced SIMD", nicknamed
"NEON". It provides:</p>
<ul>
<li>
<p>A set of interesting scalar/vector instructions and registers
(the latter are mapped to the same chip area as the FPU ones),
comparable to MMX/SSE/3DNow! in the x86 world.</p>
</li>
<li>
<p>VFPv3-D32 as a requirement (i.e. 32 hardware FPU 64-bit registers,
instead of the minimum of 16).</p>
</li>
</ul>
<p>Not all ARMv7-based Android devices will support NEON, but those that
do may benefit in significant ways from the scalar/vector instructions.</p>
<p>The NDK supports compiling whole modules, or even specific source
files, with NEON support. In practice, a specific compiler flag
enables the GCC ARM NEON intrinsics and VFPv3-D32 at the same time.
The intrinsics are described here:</p>
<blockquote>
<p><a href="http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html">http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html</a></p>
</blockquote>
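<p>As a rough illustration of what code written with these intrinsics looks
like, the sketch below adds two float arrays four elements at a time. The
function name <code>add_floats</code> is hypothetical and not part of the NDK.
The NEON path is guarded by the compiler's predefined NEON macro, so the same
file still builds (using only the scalar loop) when NEON support is not
enabled.</p>

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not an NDK API): dst[i] = a[i] + b[i] for i in [0, n). */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Process four floats per iteration using 128-bit NEON registers. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar loop: handles the tail, or every element when NEON is off. */
    for (; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```

<p>Compile-time guards like this only decide which code gets built. They do
not tell you whether the device running the code actually has NEON.</p>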
<h2>Using <code>LOCAL_ARM_NEON</code>:</h2>
<p>Define <code>LOCAL_ARM_NEON</code> to 'true' in your module definition, and the NDK
will build all its source files with NEON support. This can be useful if
you want to build a static or shared library that specifically contains
NEON code paths.</p>
<h2>Using the .neon suffix:</h2>
<p>When listing source files in your <code>LOCAL_SRC_FILES</code> variable, you now have
the option of using the .neon suffix to indicate that you want the
corresponding source(s) to be built with NEON support. For example:</p>
<pre><code> LOCAL_SRC_FILES := foo.c.neon bar.c
</code></pre>
<p>This builds only 'foo.c' with NEON support.</p>
<p>Note that the .neon suffix can be used with the .arm suffix too (used to
specify the 32-bit ARM instruction set for non-NEON instructions), but must
appear after it.</p>
<p>In other words, 'foo.c.arm.neon' works, but 'foo.c.neon.arm' does NOT.</p>
<h2>Build Requirements:</h2>
<p>NEON support only works when targeting the 'armeabi-v7a' ABI. For any other
ABI, the NDK build scripts will complain and abort. It is important to use
checks like the following in your Android.mk:</p>
<pre><code># define a static library containing our NEON code
ifeq ($(TARGET_ARCH_ABI),armeabi-v7a)
    include $(CLEAR_VARS)
    LOCAL_MODULE    := mylib-neon
    LOCAL_SRC_FILES := mylib-neon.c
    LOCAL_ARM_NEON  := true
    include $(BUILD_STATIC_LIBRARY)
endif # TARGET_ARCH_ABI == armeabi-v7a
</code></pre>
<h2>Runtime Detection:</h2>
<p>As noted above, NOT ALL ARMv7-BASED ANDROID DEVICES WILL SUPPORT NEON!
It is thus crucial to perform runtime detection to know whether the
NEON-capable machine code can be run on the target device.</p>
<p>To do that, use the 'cpufeatures' library that comes with this NDK. To learn
more about it, see <a href="CPU-FEATURES.html">CPU-FEATURES</a>.</p>
<p>You should explicitly check that android_getCpuFamily() returns
ANDROID_CPU_FAMILY_ARM, and that android_getCpuFeatures() returns a value
that has the ANDROID_CPU_ARM_FEATURE_NEON flag set, as in:</p>
<pre><code>#include &lt;cpu-features.h&gt;
...
...
if (android_getCpuFamily() == ANDROID_CPU_FAMILY_ARM &amp;&amp;
    (android_getCpuFeatures() &amp; ANDROID_CPU_ARM_FEATURE_NEON) != 0)
{
    // use NEON-optimized routines
    ...
}
else
{
    // use non-NEON fallback routines instead
    ...
}
...
</code></pre>
<h2>Sample code:</h2>
<p>Look at the source code for the "hello-neon" sample in this NDK for an example
of how to use the 'cpufeatures' library and NEON intrinsics at the same time.</p>
<p>This implements a tiny benchmark for a FIR filter loop using a C version, and
a NEON-optimized one for devices that support it.</p>
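<p>To give an idea of the shape of such a pair of routines, here is a minimal
sketch of a FIR filter with a plain C version and a NEON intrinsics version.
It is not the code from the "hello-neon" sample: the function names, the use
of float samples and the argument layout are all assumptions made for
illustration. The input buffer is assumed to hold at least
<code>length + taps - 1</code> samples.</p>

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Plain C version: out[n] = sum over k of in[n + k] * kernel[k]. */
void fir_filter_c(float *out, const float *in, size_t length,
                  const float *kernel, size_t taps)
{
    size_t n, k;
    for (n = 0; n < length; n++) {
        float acc = 0.0f;
        for (k = 0; k < taps; k++) {
            acc += in[n + k] * kernel[k];
        }
        out[n] = acc;
    }
}

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
/* NEON version: accumulates four taps per iteration, then reduces. */
void fir_filter_neon(float *out, const float *in, size_t length,
                     const float *kernel, size_t taps)
{
    size_t n, k;
    for (n = 0; n < length; n++) {
        float32x4_t acc4 = vdupq_n_f32(0.0f);
        float32x2_t acc2;
        float acc;
        for (k = 0; k + 4 <= taps; k += 4) {
            /* acc4 += in[n+k .. n+k+3] * kernel[k .. k+3], lane by lane */
            acc4 = vmlaq_f32(acc4, vld1q_f32(in + n + k),
                             vld1q_f32(kernel + k));
        }
        /* Horizontal add of the four lanes into a single float. */
        acc2 = vadd_f32(vget_low_f32(acc4), vget_high_f32(acc4));
        acc = vget_lane_f32(vpadd_f32(acc2, acc2), 0);
        /* Remaining taps when 'taps' is not a multiple of four. */
        for (; k < taps; k++) {
            acc += in[n + k] * kernel[k];
        }
        out[n] = acc;
    }
}
#endif
```

<p>Because the NEON version sums the products in a different order, its
floating point results can differ slightly from the C version. The NEON
function should only be called after the runtime check described in the
Runtime Detection section has passed.</p>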