
 .. include:: /migration/deprecation.inc

 ============================
 PNaCl C/C++ Language Support
 ============================

 .. contents::
    :local:
    :backlinks: none
    :depth: 3

 Source language support
 =======================

 The currently supported languages are C and C++. The PNaCl toolchain is
 based on recent Clang, which fully supports C++11 and most of C11. A
 detailed status of the language support is available `here
 <http://clang.llvm.org/cxx_status.html>`_.

 For information on using languages other than C/C++, see the :ref:`FAQ
 section on other languages <other_languages>`.

 As for the standard libraries, the PNaCl toolchain is currently based on
 ``libc++``, and the ``newlib`` standard C library. ``libstdc++`` is also
 supported but its use is discouraged; see :ref:`building_cpp_libraries`
 for more details.

 Versions
 --------

 Version information can be obtained:

 * Clang/LLVM: run ``pnacl-clang -v``.
 * ``newlib``: use the ``_NEWLIB_VERSION`` macro.
 * ``libc++``: use the ``_LIBCPP_VERSION`` macro.
 * ``libstdc++``: use the ``_GLIBCXX_VERSION`` macro.
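
 As a minimal sketch, the library macros above can be reported from code. The
 helper name ``describe_libraries`` is hypothetical, and the sketch assumes that
 ``_NEWLIB_VERSION`` expands to a string literal and ``_LIBCPP_VERSION`` to an
 integer. Each macro is tested with ``#ifdef`` because only the macros of the
 libraries actually in use are defined, and only after one of that library's
 headers has been included.

 .. naclcode::

   #include <string>  // including a library header makes its version macro visible

   // Hypothetical helper: lists the standard libraries detected in this build.
   std::string describe_libraries() {
     std::string out;
   #ifdef _NEWLIB_VERSION
     out += "newlib ";
     out += _NEWLIB_VERSION;  // assumed to be a string literal
     out += "; ";
   #endif
   #ifdef _LIBCPP_VERSION
     out += "libc++ " + std::to_string(_LIBCPP_VERSION) + "; ";
   #endif
   #ifdef _GLIBCXX_VERSION
     out += "libstdc++; ";
   #endif
     if (out.empty())
       out = "unknown";  // none of the macros above is defined
     return out;
   }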

 Preprocessor definitions
 ------------------------

 When compiling C/C++ code, the PNaCl toolchain defines the ``__pnacl__``
 macro. In addition, ``__native_client__`` is defined for compatibility
 with other NaCl toolchains.
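
 These macros are typically used to select toolchain-specific code paths. The
 following sketch shows one way to do so; the helper name ``toolchain_name`` is
 hypothetical. Because ``__native_client__`` is also defined by PNaCl,
 ``__pnacl__`` is tested first.

 .. naclcode::

   // Hypothetical helper: names the toolchain family used for this build.
   const char* toolchain_name() {
   #if defined(__pnacl__)
     return "PNaCl";
   #elif defined(__native_client__)
     return "NaCl";
   #else
     return "non-NaCl";
   #endif
   }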

 .. _memory_model_and_atomics:

 Memory Model and Atomics
 ========================

 Memory Model for Concurrent Operations
 --------------------------------------

 The memory model offered by PNaCl relies on the same coding guidelines
 as the C11/C++11 one: concurrent accesses must always occur through
 atomic primitives (offered by :ref:`atomic intrinsics
 <bitcode_atomicintrinsics>`), and these accesses must always
 occur with the same size for the same memory location. Visibility of
 stores is provided on a happens-before basis that relates memory
 locations to each other as the C11/C++11 standards do.

 Non-atomic memory accesses may be reordered, separated, elided or fused
 according to C and C++'s memory model before the pexe is created as well
 as after its creation. Accessing an atomic memory location through
 non-atomic primitives is :ref:`Undefined Behavior <undefined_behavior>`.

 As in C11/C++11 some atomic accesses may be implemented with locks on
 certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be
 ``1``, signifying that all types are sometimes lock-free. The
 ``is_lock_free`` methods and ``atomic_is_lock_free`` will return the
 current platform's implementation at translation time. These macros,
 methods and functions are in the C11 header ``<stdatomic.h>`` and the
 C++11 header ``<atomic>``.
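
 The difference between the compile-time macro and the run-time query can be
 sketched as follows. The helper names are hypothetical. The macro can only
 take the standard values ``0`` (never), ``1`` (sometimes) and ``2`` (always
 lock-free), whereas ``is_lock_free`` answers for the platform the code
 actually runs on.

 .. naclcode::

   #include <atomic>

   // Hypothetical helper: the compile-time answer for int.
   // Per the text above, a PNaCl build is expected to report 1 here.
   int int_lock_free_macro() {
     return ATOMIC_INT_LOCK_FREE;
   }

   // Hypothetical helper: the answer for the platform the code runs on.
   bool int_is_lock_free() {
     std::atomic<int> probe(0);
     return probe.is_lock_free();
   }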

 The PNaCl toolchain supports concurrent memory accesses through legacy
 GCC-style ``__sync_*`` builtins, as well as through C11/C++11 atomic
 primitives and the underlying `GCCMM
 <http://gcc.gnu.org/wiki/Atomic/GCCMM>`_ ``__atomic_*``
 primitives. ``volatile`` memory accesses can also be used, though these
 are discouraged. See `Volatile Memory Accesses`_.
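
 For illustration, the same atomic increment can be written with each of the
 three families of primitives. This is only a sketch; the variable and function
 names are hypothetical. Note that each memory location is only ever accessed
 through one kind of atomic primitive.

 .. naclcode::

   #include <atomic>

   int legacy_counter = 0;             // only accessed through __sync_* builtins
   int gccmm_counter = 0;              // only accessed through __atomic_* builtins
   std::atomic<int> cxx11_counter(0);  // C++11 atomic object

   // Each function atomically adds 1 and returns the new value.
   int bump_legacy() { return __sync_add_and_fetch(&legacy_counter, 1); }
   int bump_gccmm() { return __atomic_add_fetch(&gccmm_counter, 1, __ATOMIC_SEQ_CST); }
   int bump_cxx11() { return cxx11_counter.fetch_add(1) + 1; }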

 PNaCl supports concurrency and parallelism with some restrictions:

 * Threading is explicitly supported and has no restrictions over what
   prevalent implementations offer. See `Threading`_.

 * ``volatile`` and atomic operations are address-free (operations on the
   same memory location via two different addresses work atomically), as
   intended by the C11/C++11 standards. This is critical in supporting
   synchronous "external modifications" such as mapping underlying memory
   at multiple locations.

 * Inter-process communication through shared memory is currently not
   supported. See `Future Directions`_.

 * Signal handling isn't supported, so PNaCl promotes all
   primitives to cross-thread (instead of single-thread). This may change
   at a later date. Note that using atomic operations which aren't
   lock-free may lead to deadlocks when handling asynchronous
   signals. See `Future Directions`_.

 * Direct interaction with device memory isn't supported, and there is no
   intent to support it. The embedding sandbox's runtime can offer APIs
   to indirectly access devices.

 Setting up the above mechanisms requires assistance from the embedding
 sandbox's runtime (e.g. NaCl's Pepper APIs), but using them once set up
 can be done through regular C/C++ code.

 Atomic Memory Ordering Constraints
 ----------------------------------

 Atomics follow the same ordering constraints as in regular C11/C++11,
 but all accesses are promoted to sequential consistency (the strongest
 memory ordering) at pexe creation time. We plan to support more of the
 C11/C++11 memory orderings in the future.

 Some additional restrictions, following the C11/C++11 standards:

 - Atomic accesses must at least be naturally aligned.
 - Some accesses may not actually be atomic on certain platforms,
   requiring an implementation that uses global locks.
 - An atomic memory location must always be accessed with atomic
   primitives, and these primitives must always be of the same bit size
   for that location.
 - Not all memory orderings are valid for all atomic operations.
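
 As a sketch of what this means for source code, the following publish/consume
 pair requests relaxed, release and acquire orderings, all of which are valid
 C++11. The names are hypothetical. Under the promotion described above, every
 one of these accesses would become sequentially consistent in the pexe, so the
 code remains correct but the weaker orderings bring no performance benefit.

 .. naclcode::

   #include <atomic>

   std::atomic<int> payload(0);
   std::atomic<bool> ready(false);

   // Writer: store the value, then set the flag with release ordering.
   void publish(int value) {
     payload.store(value, std::memory_order_relaxed);
     ready.store(true, std::memory_order_release);
   }

   // Reader: returns true and fills *out once the flag has been observed.
   bool try_consume(int* out) {
     if (!ready.load(std::memory_order_acquire))
       return false;
     *out = payload.load(std::memory_order_relaxed);
     return true;
   }

 An example of an invalid combination is a store with acquire ordering, which
 C11/C++11 do not allow.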

 Volatile Memory Accesses
 ------------------------

 The C11/C++11 standards mandate that ``volatile`` accesses execute in
 program order (but are not fences, so other memory operations can
 reorder around them), are not necessarily atomic, and can’t be
 elided. They can be separated into smaller width accesses.

 Before any optimizations occur, the PNaCl toolchain transforms
 ``volatile`` loads and stores into sequentially consistent ``volatile``
 atomic loads and stores, and applies regular compiler optimizations
 along the above guidelines. This orders ``volatile`` accesses according to the
 atomic rules, and means that fences (including ``__sync_synchronize``)
 act in a better-defined manner. Regular memory accesses still do not
 have ordering guarantees with ``volatile`` and atomic accesses, though
 the internal representation of ``__sync_synchronize`` attempts to
 prevent reordering of memory accesses to objects which may escape.

 Relaxed ordering could be used instead, but for the first release it is
 more conservative to apply sequential consistency. Future releases may
 change what happens at compile-time, but already-released pexes will
 continue using sequential consistency.

 The PNaCl toolchain also requires that ``volatile`` accesses be at least
 naturally aligned, and tries to guarantee this alignment.

 The above guarantees ease the support of legacy (i.e. non-C11/C++11)
 code, and combined with builtin fences these programs can do meaningful
 cross-thread communication without changing code. They also better
 reflect the original code's intent and guarantee better portability.
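
 The kind of legacy code this is meant to support might look like the sketch
 below, which combines ``volatile`` variables with the ``__sync_synchronize``
 builtin fence. The names are hypothetical. Standard C/C++ gives this pattern
 no cross-thread guarantees, so it relies on the PNaCl behavior described
 above; new code should use atomic primitives instead.

 .. naclcode::

   // Legacy-style data and flag, declared volatile instead of atomic.
   volatile int legacy_data = 0;
   volatile int legacy_ready = 0;

   void legacy_publish(int value) {
     legacy_data = value;
     __sync_synchronize();  // builtin fence between the two volatile stores
     legacy_ready = 1;
   }

   // Returns 1 and fills *out if the flag is set, 0 otherwise.
   int legacy_try_read(int* out) {
     if (!legacy_ready)
       return 0;
     __sync_synchronize();  // builtin fence between the two volatile loads
     *out = legacy_data;
     return 1;
   }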

 .. _language_support_threading:

 Threading
 =========

 Threading is explicitly supported through C11/C++11's threading
 libraries as well as POSIX threads.

 Communication between threads should use atomic primitives as described
 in `Memory Model and Atomics`_.
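
 A minimal sketch of this, using the C++11 threading library with an atomic
 counter as the only shared state, is shown below. The helper name
 ``count_with_threads`` is hypothetical.

 .. naclcode::

   #include <atomic>
   #include <thread>
   #include <vector>

   // Hypothetical helper: starts num_threads threads that each add 1 to a
   // shared atomic counter `increments` times, then returns the final count.
   int count_with_threads(int num_threads, int increments) {
     std::atomic<int> counter(0);
     std::vector<std::thread> workers;
     for (int t = 0; t < num_threads; ++t) {
       workers.emplace_back([&counter, increments] {
         for (int i = 0; i < increments; ++i)
           counter.fetch_add(1);
       });
     }
     for (std::thread& worker : workers)
       worker.join();
     return counter.load();
   }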

 ``setjmp`` and ``longjmp``
 ==========================

 PNaCl and NaCl support ``setjmp`` and ``longjmp`` without any
 restrictions beyond C's.
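
 The following sketch shows a typical error-recovery use. The names are
 hypothetical, and the usual language restrictions apply: ``setjmp`` is only
 used as the controlling expression of an ``if``, and no objects with
 non-trivial destructors are skipped by the jump.

 .. naclcode::

   #include <csetjmp>

   static std::jmp_buf recovery_point;

   // Hypothetical worker: jumps back to the recovery point on negative input.
   static int checked_double(int value) {
     if (value < 0)
       std::longjmp(recovery_point, 1);
     return value * 2;
   }

   // Returns the doubled value, or -1 if checked_double() bailed out.
   int double_or_fail(int value) {
     if (setjmp(recovery_point) != 0)
       return -1;  // reached a second time, after longjmp
     return checked_double(value);
   }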

 .. _exception_handling:

 C++ Exception Handling
 ======================

 PNaCl currently supports C++ exception handling through ``setjmp()`` and
 ``longjmp()``, which can be enabled with the ``--pnacl-exceptions=sjlj`` linker
 flag (set with ``LDFLAGS`` when using Make). Exceptions are disabled by default
 so that faster and smaller code is generated, and ``throw`` statements are
 replaced with calls to ``abort()``. The usual ``-fno-exceptions`` flag is also
 supported, though the default is ``-fexceptions``. PNaCl will support full
 zero-cost exception handling in the future.

 .. note:: When using webports_ or other prebuilt static libraries, you don't
           need to recompile because the exception handling support is
           implemented at link time (when all the static libraries are put
           together with your application).

 .. _webports: https://chromium.googlesource.com/webports

 NaCl supports full zero-cost C++ exception handling.
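
 As an illustration, consider the sketch below (the function names are
 hypothetical). Built with exception handling enabled, ``divide_or`` returns
 the fallback value when the divisor is zero. Per the description above, a
 PNaCl build linked without ``--pnacl-exceptions=sjlj`` would instead call
 ``abort()`` at the ``throw`` statement, and the ``catch`` block would never
 run.

 .. naclcode::

   #include <stdexcept>

   // Hypothetical helper: throws on a zero divisor.
   int checked_divide(int numerator, int denominator) {
     if (denominator == 0)
       throw std::invalid_argument("division by zero");
     return numerator / denominator;
   }

   // Returns the quotient, or `fallback` if checked_divide() threw.
   int divide_or(int numerator, int denominator, int fallback) {
     try {
       return checked_divide(numerator, denominator);
     } catch (const std::invalid_argument&) {
       return fallback;
     }
   }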

 Inline Assembly
 ===============

 Inline assembly isn't supported by PNaCl because it isn't portable. The
 one current exception is the common compiler barrier idiom
 ``asm("":::"memory")``, which gets transformed to a sequentially
 consistent memory barrier (equivalent to ``__sync_synchronize()``). In
 PNaCl this barrier is only guaranteed to order ``volatile`` and atomic
 memory accesses, though in practice the implementation attempts to also
 prevent reordering of memory accesses to objects which may escape.

 PNaCl supports :ref:`Portable SIMD Vectors <portable_simd_vectors>`,
 which are traditionally expressed through target-specific intrinsics or
 inline assembly.

 NaCl supports a fairly wide subset of inline assembly through GCC's
 inline assembly syntax, with the restriction that the sandboxing model
 for the target architecture has to be respected.
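
 The barrier idiom can be wrapped in a small function, as in the sketch below.
 The names are hypothetical. Both variables are declared ``volatile`` because,
 as noted above, those are the accesses the barrier is guaranteed to order
 under PNaCl.

 .. naclcode::

   volatile int shared_value = 0;
   volatile int value_published = 0;

   // Wraps the one inline assembly statement that PNaCl accepts.
   inline void compiler_barrier() {
     asm("" ::: "memory");
   }

   // Stores the value, then sets the flag, with the barrier in between.
   void publish_value(int value) {
     shared_value = value;
     compiler_barrier();
     value_published = 1;
   }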

 .. _portable_simd_vectors:

 Portable SIMD Vectors
 =====================

 SIMD vectors aren't part of the C/C++ standards and are traditionally
 very hardware-specific. Portable Native Client offers a portable version
 of SIMD vector datatypes and operations which map well to modern
 architectures and offer performance which matches or approaches
 hardware-specific uses.

 SIMD vector support was added to Portable Native Client for version 37 of Chrome
 and more features, including performance enhancements, have been added in
 subsequent releases; see the :ref:`Release Notes <sdk-release-notes>` for more
 details.

 Hand-Coding Vector Extensions
 -----------------------------

 The initial vector support in Portable Native Client adds `LLVM vectors
 <http://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vectors>`_
 and `GCC vectors
 <http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html>`_ since these
 are well supported by different hardware platforms and don't require any
 new compiler intrinsics.

 Vector types can be used through the ``vector_size`` attribute:

 .. naclcode::

   #define VECTOR_BYTES 16
   typedef int v4s __attribute__((vector_size(VECTOR_BYTES)));
   v4s a = {1,2,3,4};
   v4s b = {5,6,7,8};
   v4s c, d, e;
   c = a + b;  /* c = {6,8,10,12} */
   d = b >> a; /* d = {2,1,0,0} */

 The result of a vector comparison is a vector of bitmasks, one per compared
 element and as wide as that element, with all bits ``0`` (false) or all bits
 ``1`` (true):

 .. naclcode::

   typedef int v4s __attribute__((vector_size(16)));
   v4s snip(v4s in) {
     v4s limit = {32,64,128,256};
     v4s mask = in > limit;
     v4s ret = in & mask;
     return ret;
   }

 Vector datatypes are currently expected to be 128-bit wide with one of the
 following element types, and they're expected to be aligned to the underlying
 element's bit width (loads and stores will otherwise be broken up into scalar
 accesses to prevent faults):

 ============  ============  ================ ======================
 Type          Num Elements  Vector Bit Width Expected Bit Alignment
 ============  ============  ================ ======================
 ``uint8_t``   16            128              8
 ``int8_t``    16            128              8
 ``uint16_t``  8             128              16
 ``int16_t``   8             128              16
 ``uint32_t``  4             128              32
 ``int32_t``   4             128              32
 ``float``     4             128              32
 ============  ============  ================ ======================

 64-bit integers and double-precision floating point will be supported in
 a future release, as will 256-bit and 512-bit vectors.

 Vector element bit width alignment can be stated explicitly (this is assumed by
 PNaCl, but not necessarily by other compilers), and smaller alignments can also
 be specified:

 .. naclcode::

   typedef int v4s_element   __attribute__((vector_size(16), aligned(4)));
   typedef int v4s_unaligned __attribute__((vector_size(16), aligned(1)));


 The following operators are supported on vectors:

 +----------------------------------------------+
 | unary ``+``, ``-``                           |
 +----------------------------------------------+
 | ``++``, ``--``                               |
 +----------------------------------------------+
 | ``+``, ``-``, ``*``, ``/``, ``%``            |
 +----------------------------------------------+
 | ``&``, ``|``, ``^``, ``~``                   |
 +----------------------------------------------+
 | ``>>``, ``<<``                               |
 +----------------------------------------------+
 | ``!``, ``&&``, ``||``                        |
 +----------------------------------------------+
 | ``==``, ``!=``, ``>``, ``<``, ``>=``, ``<=`` |
 +----------------------------------------------+
 | ``=``                                        |
 +----------------------------------------------+
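
 As a sketch of how these operators combine, the following hypothetical
 helpers clamp each element of a vector to a per-element maximum using a
 comparison mask and bitwise operators, and sum the elements of a vector using
 array-style indexing.

 .. naclcode::

   typedef int v4s __attribute__((vector_size(16)));

   // Clamp each element of `in` to at most the matching element of `limit`.
   v4s clamp_max(v4s in, v4s limit) {
     v4s over = in > limit;                 // all ones where in > limit, else zero
     return (limit & over) | (in & ~over);  // limit where over, in elsewhere
   }

   // Sum the four elements of a vector.
   int horizontal_sum(v4s v) {
     return v[0] + v[1] + v[2] + v[3];
   }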

 C-style casts can be used to convert one vector type to another without
 modifying the underlying bits. ``__builtin_convertvector`` can be used
 to convert from one type to another provided both types have the same
 number of elements, truncating when converting from floating-point to
 integer.

 .. naclcode::

   typedef unsigned v4u __attribute__((vector_size(16)));
   typedef float v4f __attribute__((vector_size(16)));
   v4u a = {0x3f19999a,0x40000000,0x40490fdb,0x66ff0c30};
   v4f b = (v4f) a; /* b = {0.6,2,3.14159,6.02214e+23}  */
   v4u c = __builtin_convertvector(b, v4u); /* c = {0,2,3,0} */

 It is also possible to use array-style indexing into vectors to extract
 individual elements using ``[]``.

 .. naclcode::

   #include <cstddef>
   #include <iostream>

   typedef unsigned v4u __attribute__((vector_size(16)));
   // Print every element of a vector, separated by spaces.
   template<typename T>
   void print(const T v) {
     for (std::size_t i = 0; i != sizeof(v) / sizeof(v[0]); ++i)
       std::cout << v[i] << ' ';
     std::cout << std::endl;
   }

 Vector shuffle (often called permutation or swizzle) operations are
 supported through ``__builtin_shufflevector``. The builtin has two
 vector arguments of the same element type, followed by a list of
 constant integers that specify the element indices of the first two
 vectors that should be extracted and returned in a new vector. These
 element indices are numbered sequentially starting with the first
 vector, continuing into the second vector. Thus, if ``vec1`` is a
 4-element vector, index ``5`` would refer to the second element of
 ``vec2``. An index of ``-1`` can be used to indicate that the
 corresponding element in the returned vector is a don’t care and can be
 optimized by the backend.

 The result of ``__builtin_shufflevector`` is a vector with the same
 element type as ``vec1`` / ``vec2`` but that has an element count equal
 to the number of indices specified.

 .. naclcode::

   // identity operation - return 4-element vector v1.
   __builtin_shufflevector(v1, v1, 0, 1, 2, 3)

   // "Splat" element 0 of v1 into a 4-element result.
   __builtin_shufflevector(v1, v1, 0, 0, 0, 0)

   // Reverse 4-element vector v1.
   __builtin_shufflevector(v1, v1, 3, 2, 1, 0)

   // Concatenate every other element of 4-element vectors v1 and v2.
   __builtin_shufflevector(v1, v2, 0, 2, 4, 6)

   // Concatenate every other element of 8-element vectors v1 and v2.
   __builtin_shufflevector(v1, v2, 0, 2, 4, 6, 8, 10, 12, 14)

   // Shuffle v1 with some elements being undefined
   __builtin_shufflevector(v1, v1, 3, -1, 1, -1)

 One common use of ``__builtin_shufflevector`` is to perform
 vector-scalar operations:

 .. naclcode::

   typedef int v4s __attribute__((vector_size(16)));
   v4s shift_right_by(v4s shift_me, int shift_amount) {
     v4s tmp = {shift_amount};
     return shift_me >> __builtin_shufflevector(tmp, tmp, 0, 0, 0, 0);
   }

 Auto-Vectorization
 ------------------

 Auto-vectorization is currently not enabled for Portable Native Client,
 but will be in a future release.

 Undefined Behavior
 ==================

 The C and C++ languages expose some undefined behavior which is
 discussed in :ref:`PNaCl Undefined Behavior <undefined_behavior>`.

 .. _c_cpp_floating_point:

 Floating-Point
 ==============

 PNaCl exposes 32-bit and 64-bit floating point operations which are
 mostly IEEE-754 compliant. There are a few caveats:

 * Some :ref:`floating-point behavior is currently left as undefined
   <undefined_behavior_fp>`.
 * The default rounding mode is round-to-nearest and other rounding modes
   are currently not usable, which isn't IEEE-754 compliant. PNaCl could
   support switching modes (the 4 modes exposed by C99 ``FLT_ROUNDS``
   macros).
 * Signaling ``NaN`` values never fault.
 * Fast-math optimizations are currently supported before *pexe* creation
   time. A *pexe* loses all fast-math information when it is
   created. Fast-math translation could be enabled at a later date,
   potentially at a per-function granularity. This wouldn't affect
   already-existing *pexe* files; it would be an opt-in feature.

   * Fused multiply-add operations have higher precision and often execute faster;
     PNaCl currently disallows them in the *pexe* because they aren't
     supported on all platforms and can't realistically be
     emulated. PNaCl could (but currently doesn't) only generate them in
     the backend if fast-math were specified and the hardware supports
     the operation.
   * Transcendentals aren't exposed by PNaCl's ABI; they are part of the
     math library that is included in the *pexe*. PNaCl could, but
     currently doesn't, use hardware support if fast-math were provided
     in the *pexe*.
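
 Code that depends on the rounding mode can check it rather than assume it.
 The sketch below uses the standard ``FLT_ROUNDS`` macro from ``<cfloat>``,
 whose value ``1`` means round-to-nearest; the helper name is hypothetical.

 .. naclcode::

   #include <cfloat>

   // Hypothetical helper: true if the rounding mode is round-to-nearest,
   // the only mode described above as usable under PNaCl.
   bool rounds_to_nearest() {
     return FLT_ROUNDS == 1;
   }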

 Computed ``goto``
 =================

 PNaCl supports computed ``goto``, a non-standard GCC extension to C used
 by some interpreters, by lowering them to ``switch`` statements. The
 resulting use of ``switch`` might not be as fast as the original
 indirect branches. If you are compiling a program that has a
 compile-time option for using computed ``goto``, it's possible that the
 program will run faster with the option turned off (e.g., if the program
 does extra work to take advantage of computed ``goto``).

 NaCl supports computed ``goto`` without any transformation.
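
 A program with such a compile-time option might be structured like the
 following sketch of a tiny bytecode interpreter. The opcode set, the function
 name ``run`` and the ``USE_COMPUTED_GOTO`` macro are all hypothetical. With
 the macro undefined, the interpreter uses a plain ``switch``, which is the
 form PNaCl lowers computed ``goto`` to anyway.

 .. naclcode::

   #include <cstddef>

   enum Op { OP_INC, OP_DOUBLE, OP_HALT };

   // Runs `program` on an accumulator that starts at `start`.
   // Every program must end with OP_HALT.
   int run(const Op* program, int start) {
     int acc = start;
     std::size_t pc = 0;
   #ifdef USE_COMPUTED_GOTO
     // GCC extension: a table of label addresses indexed by opcode.
     static void* const targets[] = { &&do_inc, &&do_double, &&do_halt };
     #define DISPATCH() goto *targets[program[pc++]]
     DISPATCH();
     do_inc:    acc += 1; DISPATCH();
     do_double: acc *= 2; DISPATCH();
     do_halt:   return acc;
     #undef DISPATCH
   #else
     for (;;) {
       switch (program[pc++]) {
         case OP_INC:    acc += 1; break;
         case OP_DOUBLE: acc *= 2; break;
         case OP_HALT:   return acc;
       }
     }
   #endif
   }

 Measuring both variants is the only reliable way to tell which one is faster
 for a given program.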

 Future Directions
 =================

 Inter-Process Communication
 ---------------------------

 Inter-process communication through shared memory is currently not
 supported by PNaCl/NaCl. When implemented, it may be limited to
 operations which are lock-free on the current platform (``is_lock_free``
 methods). It will rely on the address-free property discussed in `Memory
 Model for Concurrent Operations`_.

 POSIX-style Signal Handling
 ---------------------------

 POSIX-style signal handling really consists of two different features:

 * **Hardware exception handling** (synchronous signals): The ability
   to catch hardware exceptions (such as memory access faults and
   division by zero) using a signal handler.

   PNaCl currently doesn't support hardware exception handling.

   NaCl supports hardware exception handling via the
   ``<nacl/nacl_exception.h>`` interface.

 * **Asynchronous interruption of threads** (asynchronous signals): The
   ability to asynchronously interrupt the execution of a thread,
   forcing the thread to run a signal handler.

   A similar feature is **thread suspension**: The ability to
   asynchronously suspend and resume a thread and inspect or modify its
   execution state (such as register state).

   Neither PNaCl nor NaCl currently supports asynchronous interruption
   or suspension of threads.

 If PNaCl were to support either of these, the interaction of
 ``volatile`` and atomics with same-thread signal handling would need
 to be carefully detailed.
	.. include:: /migration/deprecation.inc

	============================
	PNaCl C/C++ Language Support
	============================

	.. contents::
	:local:
	:backlinks: none
	:depth: 3

	Source language support
	=======================

	The currently supported languages are C and C++. The PNaCl toolchain is
	based on recent Clang, which fully supports C++11 and most of C11. A
	detailed status of the language support is available `here
	<http://clang.llvm.org/cxx_status.html>`_.

	For information on using languages other than C/C++, see the :ref:`FAQ
	section on other languages <other_languages>`.

	As for the standard libraries, the PNaCl toolchain is currently based on
	``libc++``, and the ``newlib`` standard C library. ``libstdc++`` is also
	supported but its use is discouraged; see :ref:`building_cpp_libraries`
	for more details.

	Versions
	--------

	Version information can be obtained:

	* Clang/LLVM: run ``pnacl-clang -v``.
	* ``newlib``: use the ``_NEWLIB_VERSION`` macro.
	* ``libc++``: use the ``_LIBCPP_VERSION`` macro.
	* ``libstdc++``: use the ``_GLIBCXX_VERSION`` macro.

	Preprocessor definitions
	------------------------

	When compiling C/C++ code, the PNaCl toolchain defines the ``__pnacl__``
	macro. In addition, ``__native_client__`` is defined for compatibility
	with other NaCl toolchains.

	.. _memory_model_and_atomics:

	Memory Model and Atomics
	========================

	Memory Model for Concurrent Operations
	--------------------------------------

	The memory model offered by PNaCl relies on the same coding guidelines
	as the C11/C++11 one: concurrent accesses must always occur through
	atomic primitives (offered by :ref:`atomic intrinsics
	<bitcode_atomicintrinsics>`), and these accesses must always
	occur with the same size for the same memory location. Visibility of
	stores is provided on a happens-before basis that relates memory
	locations to each other as the C11/C++11 standards do.

	Non-atomic memory accesses may be reordered, separated, elided or fused
	according to C and C++'s memory model before the pexe is created as well
	as after its creation. Accessing atomic memory location through
	non-atomic primitives is :ref:`Undefined Behavior <undefined_behavior>`.

	As in C11/C++11 some atomic accesses may be implemented with locks on
	certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be
	``1``, signifying that all types are sometimes lock-free. The
	``is_lock_free`` methods and ``atomic_is_lock_free`` will return the
	current platform's implementation at translation time. These macros,
	methods and functions are in the C11 header ``<stdatomic.h>`` and the
	C++11 header ``<atomic>``.

	The PNaCl toolchain supports concurrent memory accesses through legacy
	GCC-style ``__sync_*`` builtins, as well as through C11/C++11 atomic
	primitives and the underlying `GCCMM
	<http://gcc.gnu.org/wiki/Atomic/GCCMM>`_ ``__atomic_*``
	primitives. ``volatile`` memory accesses can also be used, though these
	are discouraged. See `Volatile Memory Accesses`_.

	PNaCl supports concurrency and parallelism with some restrictions:

	* Threading is explicitly supported and has no restrictions over what
	prevalent implementations offer. See `Threading`_.

	* ``volatile`` and atomic operations are address-free (operations on the
	same memory location via two different addresses work atomically), as
	intended by the C11/C++11 standards. This is critical in supporting
	synchronous "external modifications" such as mapping underlying memory
	at multiple locations.

	* Inter-process communication through shared memory is currently not
	supported. See `Future Directions`_.

	* Signal handling isn't supported, PNaCl therefore promotes all
	primitives to cross-thread (instead of single-thread). This may change
	at a later date. Note that using atomic operations which aren't
	lock-free may lead to deadlocks when handling asynchronous
	signals. See `Future Directions`_.

	* Direct interaction with device memory isn't supported, and there is no
	intent to support it. The embedding sandbox's runtime can offer APIs
	to indirectly access devices.

	Setting up the above mechanisms requires assistance from the embedding
	sandbox's runtime (e.g. NaCl's Pepper APIs), but using them once setup
	can be done through regular C/C++ code.

	Atomic Memory Ordering Constraints
	----------------------------------

	Atomics follow the same ordering constraints as in regular C11/C++11,
	but all accesses are promoted to sequential consistency (the strongest
	memory ordering) at pexe creation time. We plan to support more of the
	C11/C++11 memory orderings in the future.

	Some additional restrictions, following the C11/C++11 standards:

	- Atomic accesses must at least be naturally aligned.
	- Some accesses may not actually be atomic on certain platforms,
	requiring an implementation that uses global locks.
	- An atomic memory location must always be accessed with atomic
	primitives, and these primitives must always be of the same bit size
	for that location.
	- Not all memory orderings are valid for all atomic operations.

	Volatile Memory Accesses
	------------------------

	The C11/C++11 standards mandate that ``volatile`` accesses execute in
	program order (but are not fences, so other memory operations can
	reorder around them), are not necessarily atomic, and can’t be
	elided. They can be separated into smaller width accesses.

	Before any optimizations occur, the PNaCl toolchain transforms
	``volatile`` loads and stores into sequentially consistent ``volatile``
	atomic loads and stores, and applies regular compiler optimizations
	along the above guidelines. This orders ``volatiles`` according to the
	atomic rules, and means that fences (including ``__sync_synchronize``)
	act in a better-defined manner. Regular memory accesses still do not
	have ordering guarantees with ``volatile`` and atomic accesses, though
	the internal representation of ``__sync_synchronize`` attempts to
	prevent reordering of memory accesses to objects which may escape.

	Relaxed ordering could be used instead, but for the first release it is
	more conservative to apply sequential consistency. Future releases may
	change what happens at compile-time, but already-released pexes will
	continue using sequential consistency.

	The PNaCl toolchain also requires that ``volatile`` accesses be at least
	naturally aligned, and tries to guarantee this alignment.

	The above guarantees ease the support of legacy (i.e. non-C11/C++11)
	code, and combined with builtin fences these programs can do meaningful
	cross-thread communication without changing code. They also better
	reflect the original code's intent and guarantee better portability.

	.. _language_support_threading:

	Threading
	=========

	Threading is explicitly supported through C11/C++11's threading
	libraries as well as POSIX threads.

	Communication between threads should use atomic primitives as described
	in `Memory Model and Atomics`_.

	``setjmp`` and ``longjmp``
	==========================

	PNaCl and NaCl support ``setjmp`` and ``longjmp`` without any
	restrictions beyond C's.

	.. _exception_handling:

	C++ Exception Handling
	======================

	PNaCl currently supports C++ exception handling through ``setjmp()`` and
	``longjmp()``, which can be enabled with the ``--pnacl-exceptions=sjlj`` linker
	flag (set with ``LDFLAGS`` when using Make). Exceptions are disabled by default
	so that faster and smaller code is generated, and ``throw`` statements are
	replaced with calls to ``abort()``. The usual ``-fno-exceptions`` flag is also
	supported, though the default is ``-fexceptions``. PNaCl will support full
	zero-cost exception handling in the future.

	.. note:: When using webports_ or other prebuilt static libraries, you don't
	need to recompile because the exception handling support is
	implemented at link time (when all the static libraries are put
	together with your application).

	.. _webports: https://chromium.googlesource.com/webports

	NaCl supports full zero-cost C++ exception handling.

	Inline Assembly
	===============

	Inline assembly isn't supported by PNaCl because it isn't portable. The
	one current exception is the common compiler barrier idiom
	``asm("":::"memory")``, which gets transformed to a sequentially
	consistent memory barrier (equivalent to ``__sync_synchronize()``). In
	PNaCl this barrier is only guaranteed to order ``volatile`` and atomic
	memory accesses, though in practice the implementation attempts to also
	prevent reordering of memory accesses to objects which may escape.

	PNaCl supports :ref:`Portable SIMD Vectors <portable_simd_vectors>`,
	which are traditionally expressed through target-specific intrinsics or
	inline assembly.

	NaCl supports a fairly wide subset of inline assembly through GCC's
	inline assembly syntax, with the restriction that the sandboxing model
	for the target architecture has to be respected.

	.. _portable_simd_vectors:

	Portable SIMD Vectors
	=====================

	SIMD vectors aren't part of the C/C++ standards and are traditionally
	very hardware-specific. Portable Native Client offers a portable version
	of SIMD vector datatypes and operations which map well to modern
	architectures and offer performance which matches or approaches
	hardware-specific uses.

	SIMD vector support was added to Portable Native Client for version 37 of Chrome
	and more features, including performance enhancements, have been added in
	subsequent releases, see the :ref:`Release Notes <sdk-release-notes>` for more
	details.

	Hand-Coding Vector Extensions
	-----------------------------

	The initial vector support in Portable Native Client adds `LLVM vectors
	<http://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vectors>`_
	and `GCC vectors
	<http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html>`_ since these
	are well supported by different hardware platforms and don't require any
	new compiler intrinsics.

	Vector types can be used through the ``vector_size`` attribute:

	.. naclcode::

	#define VECTOR_BYTES 16
	typedef int v4s __attribute__((vector_size(VECTOR_BYTES)));
	v4s a = {1,2,3,4};
	v4s b = {5,6,7,8};
	v4s c, d, e;
	c = a + b; /* c = {6,8,10,12} */
	d = b >> a; /* d = {2,1,0,0} */

	Vector comparisons are represented as a bitmask as wide as the compared
	elements of all ``0`` or all ``1``:

	.. naclcode::

	typedef int v4s __attribute__((vector_size(16)));
	v4s snip(v4s in) {
	v4s limit = {32,64,128,256};
	v4s mask = in > limit;
	v4s ret = in & mask;
	return ret;
	}

	Vector datatypes are currently expected to be 128-bit wide with one of the
	following element types, and they're expected to be aligned to the underlying
	element's bit width (loads and store will otherwise be broken up into scalar
	accesses to prevent faults):

	============ ============ ================ ======================
	Type Num Elements Vector Bit Width Expected Bit Alignment
	============ ============ ================ ======================
	``uint8_t`` 16 128 8
	``int8_t`` 16 128 8
	``uint16_t`` 8 128 16
	``int16_t`` 8 128 16
	``uint32_t`` 4 128 32
	``int32_t`` 4 128 32
	``float`` 4 128 32
	============ ============ ================ ======================

	64-bit integers and double-precision floating point will be supported in
	a future release, as will 256-bit and 512-bit vectors.

	Vector element bit width alignment can be stated explicitly (this is assumed by
	PNaCl, but not necessarily by other compilers), and smaller alignments can also
	be specified:

	.. naclcode::

	typedef int v4s_element __attribute__((vector_size(16), aligned(4)));
	typedef int v4s_unaligned __attribute__((vector_size(16), aligned(1)));


	The following operators are supported on vectors:

	+----------------------------------------------+
	| unary ``+``, ``-``                           |
	+----------------------------------------------+
	| ``++``, ``--``                               |
	+----------------------------------------------+
	| ``+``, ``-``, ``*``, ``/``, ``%``            |
	+----------------------------------------------+
	| ``&``, ``|``, ``^``, ``~``                   |
	+----------------------------------------------+
	| ``>>``, ``<<``                               |
	+----------------------------------------------+
	| ``!``, ``&&``, ``||``                        |
	+----------------------------------------------+
	| ``==``, ``!=``, ``>``, ``<``, ``>=``, ``<=`` |
	+----------------------------------------------+
	| ``=``                                        |
	+----------------------------------------------+

	C-style casts can be used to convert one vector type to another without
	modifying the underlying bits. ``__builtin_convertvector`` can be used
	to convert from one type to another provided both types have the same
	number of elements, truncating when converting from floating-point to
	integer.

	.. naclcode::

	  typedef unsigned v4u __attribute__((vector_size(16)));
	  typedef float v4f __attribute__((vector_size(16)));
	  v4u a = {0x3f19999a,0x40000000,0x40490fdb,0x66ff0c30};
	  v4f b = (v4f) a; /* b = {0.6,2,3.14159,6.02214e+23} */
	  v4u c = __builtin_convertvector(b, v4u); /* c = {0,2,3,0} */

	It is also possible to use array-style indexing into vectors to extract
	individual elements using ``[]``.

	.. naclcode::

	  #include <cstddef>
	  #include <iostream>

	  typedef unsigned v4u __attribute__((vector_size(16)));
	  template<typename T>
	  void print(const T v) {
	    for (std::size_t i = 0; i != sizeof(v) / sizeof(v[0]); ++i)
	      std::cout << v[i] << ' ';
	    std::cout << std::endl;
	  }

	Vector shuffle operations (often called permutations or swizzles) are
	supported through ``__builtin_shufflevector``. The builtin takes two
	vector arguments of the same element type, followed by a list of
	constant integers that specify which elements of the two vectors are
	extracted and returned in a new vector. These element indices are
	numbered sequentially starting with the first vector and continuing
	into the second vector. Thus, if ``vec1`` is a 4-element vector, index
	``5`` refers to the second element of ``vec2``. An index of ``-1``
	indicates that the corresponding element of the returned vector is a
	don't-care value that the backend is free to optimize.

	The result of ``__builtin_shufflevector`` is a vector with the same
	element type as ``vec1`` / ``vec2`` but that has an element count equal
	to the number of indices specified.

	.. naclcode::

	  // Identity operation - return 4-element vector v1.
	  __builtin_shufflevector(v1, v1, 0, 1, 2, 3)

	  // "Splat" element 0 of v1 into a 4-element result.
	  __builtin_shufflevector(v1, v1, 0, 0, 0, 0)

	  // Reverse 4-element vector v1.
	  __builtin_shufflevector(v1, v1, 3, 2, 1, 0)

	  // Concatenate every other element of 4-element vectors v1 and v2.
	  __builtin_shufflevector(v1, v2, 0, 2, 4, 6)

	  // Concatenate every other element of 8-element vectors v1 and v2.
	  __builtin_shufflevector(v1, v2, 0, 2, 4, 6, 8, 10, 12, 14)

	  // Shuffle v1 with some elements being undefined.
	  __builtin_shufflevector(v1, v1, 3, -1, 1, -1)

	One common use of ``__builtin_shufflevector`` is to perform
	vector-scalar operations:

	.. naclcode::

	  typedef int v4s __attribute__((vector_size(16)));
	  v4s shift_right_by(v4s shift_me, int shift_amount) {
	    /* Only element 0 is set; the shuffle below splats it to all lanes. */
	    v4s tmp = {shift_amount};
	    return shift_me >> __builtin_shufflevector(tmp, tmp, 0, 0, 0, 0);
	  }

	Auto-Vectorization
	------------------

	Auto-vectorization is currently not enabled for Portable Native Client,
	but will be in a future release.

	Undefined Behavior
	==================

	The C and C++ languages expose some undefined behavior which is
	discussed in :ref:`PNaCl Undefined Behavior <undefined_behavior>`.

	.. _c_cpp_floating_point:

	Floating-Point
	==============

	PNaCl exposes 32-bit and 64-bit floating-point operations which are
	mostly IEEE-754 compliant. There are a few caveats:

	* Some :ref:`floating-point behavior is currently left as undefined
	  <undefined_behavior_fp>`.
	* The default rounding mode is round-to-nearest and other rounding modes
	  are currently not usable, which isn't IEEE-754 compliant. PNaCl could
	  support switching modes (the four modes exposed by the C99
	  ``FLT_ROUNDS`` macro).
	* Signaling ``NaN`` values never fault.
	* Fast-math optimizations are currently supported before pexe creation
	  time. A pexe loses all fast-math information when it is
	  created. Fast-math translation could be enabled at a later date,
	  potentially at a per-function granularity. This wouldn't affect
	  existing pexes; it would be an opt-in feature.

	  * Fused multiply-add operations have higher precision and often
	    execute faster; PNaCl currently disallows them in the pexe because
	    they aren't supported on all platforms and can't realistically be
	    emulated. PNaCl could (but currently doesn't) only generate them in
	    the backend if fast-math were specified and the hardware supports
	    the operation.
	  * Transcendentals aren't exposed by PNaCl's ABI; they are part of the
	    math library that is included in the pexe. PNaCl could, but
	    currently doesn't, use hardware support if fast-math were provided
	    in the pexe.
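
	Code that depends on the rounding mode can query it instead of assuming
	it. The following minimal sketch uses the standard C99 ``FLT_ROUNDS``
	macro from ``<float.h>``; the function name ``rounds_to_nearest`` is
	hypothetical. Given the caveat above, the expected answer under PNaCl is
	round-to-nearest:

	.. naclcode::

	  #include <float.h>

	  /* Hypothetical helper: returns 1 if the rounding mode reported by
	     FLT_ROUNDS is round-to-nearest. C99 assigns the values
	     -1 (indeterminable), 0 (toward zero), 1 (to nearest),
	     2 (toward +infinity) and 3 (toward -infinity). */
	  int rounds_to_nearest(void) {
	    return FLT_ROUNDS == 1;
	  }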

	Computed ``goto``
	=================

	PNaCl supports computed ``goto``, a non-standard GCC extension to C used
	by some interpreters, by lowering it to ``switch`` statements. The
	resulting use of ``switch`` might not be as fast as the original
	indirect branches. If you are compiling a program that has a
	compile-time option for using computed ``goto``, it's possible that the
	program will run faster with the option turned off (e.g., if the program
	does extra work to take advantage of computed ``goto``).

	NaCl supports computed ``goto`` without any transformation.
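
	A program with such a compile-time option often looks like the following
	sketch of a tiny accumulator interpreter. The opcode set, the ``run``
	function and the ``USE_COMPUTED_GOTO`` macro are all hypothetical and
	only illustrate the pattern; leaving the macro undefined selects the
	plain ``switch`` loop, which may be the faster choice under PNaCl:

	.. naclcode::

	  /* Hypothetical opcodes for a tiny accumulator machine. */
	  enum { OP_INC, OP_DOUBLE, OP_HALT };

	  int run(const unsigned char *code) {
	    int acc = 0;
	  #ifdef USE_COMPUTED_GOTO  /* hypothetical build option */
	    /* GCC extension: a table of label addresses, indexed by opcode.
	       An opcode outside the table is undefined behavior here. */
	    static void *const dispatch[] = { &&do_inc, &&do_double, &&do_halt };
	  #define NEXT() goto *dispatch[*code++]
	    NEXT();
	  do_inc:    acc += 1; NEXT();
	  do_double: acc *= 2; NEXT();
	  do_halt:   return acc;
	  #undef NEXT
	  #else
	    /* Portable variant: an ordinary switch inside a loop. */
	    for (;;) {
	      switch (*code++) {
	        case OP_INC:    acc += 1; break;
	        case OP_DOUBLE: acc *= 2; break;
	        default:        return acc;  /* OP_HALT or unknown opcode */
	      }
	    }
	  #endif
	  }

	Both variants compute the same result, so the option can be toggled
	purely on the basis of measured performance.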

	Future Directions
	=================

	Inter-Process Communication
	---------------------------

	Inter-process communication through shared memory is currently not
	supported by PNaCl/NaCl. When implemented, it may be limited to
	operations which are lock-free on the current platform (``is_lock_free``
	methods). It will rely on the address-free property discussed in `Memory
	Model for Concurrent Operations`_.
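
	Whether a given atomic object is lock-free can already be queried today.
	In C11 the counterpart of the C++ ``is_lock_free`` methods is
	``atomic_is_lock_free`` from ``<stdatomic.h>``. The sketch below only
	shows the query; the names ``shared_counter`` and
	``counter_is_lock_free`` are hypothetical, and nothing here implies
	that shared-memory communication is available:

	.. naclcode::

	  #include <stdatomic.h>

	  /* Hypothetical counter that two processes might one day share. */
	  static atomic_int shared_counter;

	  /* Returns 1 if operations on shared_counter are lock-free on this
	     platform, 0 if they may be implemented with a lock. The answer is
	     platform-dependent. */
	  int counter_is_lock_free(void) {
	    return atomic_is_lock_free(&shared_counter) ? 1 : 0;
	  }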

	POSIX-style Signal Handling
	---------------------------

	POSIX-style signal handling really consists of two different features:

	* Hardware exception handling (synchronous signals): The ability
	  to catch hardware exceptions (such as memory access faults and
	  division by zero) using a signal handler.

	  PNaCl currently doesn't support hardware exception handling.

	  NaCl supports hardware exception handling via the
	  ``<nacl/nacl_exception.h>`` interface.

	* Asynchronous interruption of threads (asynchronous signals): The
	  ability to asynchronously interrupt the execution of a thread,
	  forcing the thread to run a signal handler.

	  A similar feature is thread suspension: The ability to
	  asynchronously suspend and resume a thread and inspect or modify its
	  execution state (such as register state).

	  Neither PNaCl nor NaCl currently supports asynchronous interruption
	  or suspension of threads.

	If PNaCl were to support either of these, the interaction of
	``volatile`` and atomics with same-thread signal handling would need
	to be carefully detailed.