|  | {{+bindTo:partials.standard_nacl_article}} | 
|  |  | 
|  | <section id="pnacl-c-c-language-support"> | 
|  | <h1 id="pnacl-c-c-language-support">PNaCl C/C++ Language Support</h1> | 
|  | <div class="contents local" id="contents" style="display: none"> | 
|  | <ul class="small-gap"> | 
|  | <li><p class="first"><a class="reference internal" href="#source-language-support" id="id3">Source language support</a></p> | 
|  | <ul class="small-gap"> | 
|  | <li><a class="reference internal" href="#versions" id="id4">Versions</a></li> | 
|  | <li><a class="reference internal" href="#preprocessor-definitions" id="id5">Preprocessor definitions</a></li> | 
|  | </ul> | 
|  | </li> | 
|  | <li><p class="first"><a class="reference internal" href="#memory-model-and-atomics" id="id6">Memory Model and Atomics</a></p> | 
|  | <ul class="small-gap"> | 
|  | <li><a class="reference internal" href="#memory-model-for-concurrent-operations" id="id7">Memory Model for Concurrent Operations</a></li> | 
|  | <li><a class="reference internal" href="#atomic-memory-ordering-constraints" id="id8">Atomic Memory Ordering Constraints</a></li> | 
|  | <li><a class="reference internal" href="#volatile-memory-accesses" id="id9">Volatile Memory Accesses</a></li> | 
|  | </ul> | 
|  | </li> | 
|  | <li><a class="reference internal" href="#threading" id="id10">Threading</a></li> | 
|  | <li><a class="reference internal" href="#setjmp-and-longjmp" id="id11"><code>setjmp</code> and <code>longjmp</code></a></li> | 
|  | <li><a class="reference internal" href="#c-exception-handling" id="id12">C++ Exception Handling</a></li> | 
|  | <li><a class="reference internal" href="#inline-assembly" id="id13">Inline Assembly</a></li> | 
|  | <li><p class="first"><a class="reference internal" href="#portable-simd-vectors" id="id14">Portable SIMD Vectors</a></p> | 
|  | <ul class="small-gap"> | 
|  | <li><a class="reference internal" href="#hand-coding-vector-extensions" id="id15">Hand-Coding Vector Extensions</a></li> | 
|  | <li><a class="reference internal" href="#auto-vectorization" id="id16">Auto-Vectorization</a></li> | 
|  | </ul> | 
|  | </li> | 
|  | <li><a class="reference internal" href="#undefined-behavior" id="id17">Undefined Behavior</a></li> | 
|  | <li><a class="reference internal" href="#floating-point" id="id18">Floating-Point</a></li> | 
|  | <li><a class="reference internal" href="#computed-goto" id="id19">Computed <code>goto</code></a></li> | 
|  | <li><p class="first"><a class="reference internal" href="#future-directions" id="id20">Future Directions</a></p> | 
|  | <ul class="small-gap"> | 
|  | <li><a class="reference internal" href="#inter-process-communication" id="id21">Inter-Process Communication</a></li> | 
|  | <li><a class="reference internal" href="#posix-style-signal-handling" id="id22">POSIX-style Signal Handling</a></li> | 
|  | </ul> | 
|  | </li> | 
|  | </ul> | 
|  |  | 
|  | </div><h2 id="source-language-support">Source language support</h2> | 
|  | <p>The currently supported languages are C and C++. The PNaCl toolchain is | 
|  | based on recent Clang, which fully supports C++11 and most of C11. A | 
|  | detailed status of the language support is available <a class="reference external" href="http://clang.llvm.org/cxx_status.html">here</a>.</p> | 
|  | <p>For information on using languages other than C/C++, see the <a class="reference internal" href="/native-client/faq.html#other-languages"><em>FAQ | 
|  | section on other languages</em></a>.</p> | 
|  | <p>As for the standard libraries, the PNaCl toolchain is currently based on | 
|  | <code>libc++</code>, and the <code>newlib</code> standard C library. <code>libstdc++</code> is also | 
|  | supported but its use is discouraged; see <a class="reference internal" href="/native-client/devguide/devcycle/building.html#building-cpp-libraries"><em>C++ standard libraries</em></a> | 
|  | for more details.</p> | 
|  | <h3 id="versions">Versions</h3> | 
|  | <p>Version information can be obtained:</p> | 
|  | <ul class="small-gap"> | 
|  | <li>Clang/LLVM: run <code>pnacl-clang -v</code>.</li> | 
|  | <li><code>newlib</code>: use the <code>_NEWLIB_VERSION</code> macro.</li> | 
|  | <li><code>libc++</code>: use the <code>_LIBCPP_VERSION</code> macro.</li> | 
|  | <li><code>libstdc++</code>: use the <code>_GLIBCXX_VERSION</code> macro.</li> | 
|  | </ul> | 
|  | <h3 id="preprocessor-definitions">Preprocessor definitions</h3> | 
|  | <p>When compiling C/C++ code, the PNaCl toolchain defines the <code>__pnacl__</code> | 
|  | macro. In addition, <code>__native_client__</code> is defined for compatibility | 
|  | with other NaCl toolchains.</p> | 
|  | <h2 id="memory-model-and-atomics"><span id="id1"></span>Memory Model and Atomics</h2> | 
|  | <h3 id="memory-model-for-concurrent-operations">Memory Model for Concurrent Operations</h3> | 
|  | <p>The memory model offered by PNaCl relies on the same coding guidelines | 
|  | as the C11/C++11 one: concurrent accesses must always occur through | 
|  | atomic primitives (offered by <a class="reference internal" href="/native-client/reference/pnacl-bitcode-abi.html#bitcode-atomicintrinsics"><em>atomic intrinsics</em></a>), and these accesses must always | 
|  | occur with the same size for the same memory location. Visibility of | 
|  | stores is provided on a happens-before basis that relates memory | 
|  | locations to each other as the C11/C++11 standards do.</p> | 
|  | <p>Non-atomic memory accesses may be reordered, separated, elided or fused | 
|  | according to C and C++’s memory model before the pexe is created as well | 
|  | as after its creation. Accessing atomic memory location through | 
|  | non-atomic primitives is <a class="reference internal" href="/native-client/reference/pnacl-undefined-behavior.html#undefined-behavior"><em>Undefined Behavior</em></a>.</p> | 
|  | <p>As in C11/C++11 some atomic accesses may be implemented with locks on | 
|  | certain platforms. The <code>ATOMIC_*_LOCK_FREE</code> macros will always be | 
|  | <code>1</code>, signifying that all types are sometimes lock-free. The | 
|  | <code>is_lock_free</code> methods and <code>atomic_is_lock_free</code> will return the | 
|  | current platform’s implementation at translation time. These macros, | 
|  | methods and functions are in the C11 header <code><stdatomic.h></code> and the | 
|  | C++11 header <code><atomic></code>.</p> | 
|  | <p>The PNaCl toolchain supports concurrent memory accesses through legacy | 
|  | GCC-style <code>__sync_*</code> builtins, as well as through C11/C++11 atomic | 
|  | primitives and the underlying <a class="reference external" href="http://gcc.gnu.org/wiki/Atomic/GCCMM">GCCMM</a> <code>__atomic_*</code> | 
|  | primitives. <code>volatile</code> memory accesses can also be used, though these | 
|  | are discouraged. See <a class="reference internal" href="#volatile-memory-accesses">Volatile Memory Accesses</a>.</p> | 
|  | <p>PNaCl supports concurrency and parallelism with some restrictions:</p> | 
|  | <ul class="small-gap"> | 
|  | <li>Threading is explicitly supported and has no restrictions over what | 
|  | prevalent implementations offer. See <a class="reference internal" href="#threading">Threading</a>.</li> | 
|  | <li><code>volatile</code> and atomic operations are address-free (operations on the | 
|  | same memory location via two different addresses work atomically), as | 
|  | intended by the C11/C++11 standards. This is critical in supporting | 
|  | synchronous “external modifications” such as mapping underlying memory | 
|  | at multiple locations.</li> | 
|  | <li>Inter-process communication through shared memory is currently not | 
|  | supported. See <a class="reference internal" href="#future-directions">Future Directions</a>.</li> | 
|  | <li>Signal handling isn’t supported, PNaCl therefore promotes all | 
|  | primitives to cross-thread (instead of single-thread). This may change | 
|  | at a later date. Note that using atomic operations which aren’t | 
|  | lock-free may lead to deadlocks when handling asynchronous | 
|  | signals. See <a class="reference internal" href="#future-directions">Future Directions</a>.</li> | 
|  | <li>Direct interaction with device memory isn’t supported, and there is no | 
|  | intent to support it. The embedding sandbox’s runtime can offer APIs | 
|  | to indirectly access devices.</li> | 
|  | </ul> | 
|  | <p>Setting up the above mechanisms requires assistance from the embedding | 
|  | sandbox’s runtime (e.g. NaCl’s Pepper APIs), but using them once setup | 
|  | can be done through regular C/C++ code.</p> | 
|  | <h3 id="atomic-memory-ordering-constraints">Atomic Memory Ordering Constraints</h3> | 
|  | <p>Atomics follow the same ordering constraints as in regular C11/C++11, | 
|  | but all accesses are promoted to sequential consistency (the strongest | 
|  | memory ordering) at pexe creation time. We plan to support more of the | 
|  | C11/C++11 memory orderings in the future.</p> | 
|  | <p>Some additional restrictions, following the C11/C++11 standards:</p> | 
|  | <ul class="small-gap"> | 
|  | <li>Atomic accesses must at least be naturally aligned.</li> | 
|  | <li>Some accesses may not actually be atomic on certain platforms, | 
|  | requiring an implementation that uses global locks.</li> | 
|  | <li>An atomic memory location must always be accessed with atomic | 
|  | primitives, and these primitives must always be of the same bit size | 
|  | for that location.</li> | 
|  | <li>Not all memory orderings are valid for all atomic operations.</li> | 
|  | </ul> | 
|  | <h3 id="volatile-memory-accesses">Volatile Memory Accesses</h3> | 
|  | <p>The C11/C++11 standards mandate that <code>volatile</code> accesses execute in | 
|  | program order (but are not fences, so other memory operations can | 
|  | reorder around them), are not necessarily atomic, and can’t be | 
|  | elided. They can be separated into smaller width accesses.</p> | 
|  | <p>Before any optimizations occur, the PNaCl toolchain transforms | 
|  | <code>volatile</code> loads and stores into sequentially consistent <code>volatile</code> | 
|  | atomic loads and stores, and applies regular compiler optimizations | 
|  | along the above guidelines. This orders <code>volatiles</code> according to the | 
|  | atomic rules, and means that fences (including <code>__sync_synchronize</code>) | 
|  | act in a better-defined manner. Regular memory accesses still do not | 
|  | have ordering guarantees with <code>volatile</code> and atomic accesses, though | 
|  | the internal representation of <code>__sync_synchronize</code> attempts to | 
|  | prevent reordering of memory accesses to objects which may escape.</p> | 
|  | <p>Relaxed ordering could be used instead, but for the first release it is | 
|  | more conservative to apply sequential consistency. Future releases may | 
|  | change what happens at compile-time, but already-released pexes will | 
|  | continue using sequential consistency.</p> | 
|  | <p>The PNaCl toolchain also requires that <code>volatile</code> accesses be at least | 
|  | naturally aligned, and tries to guarantee this alignment.</p> | 
|  | <p>The above guarantees ease the support of legacy (i.e. non-C11/C++11) | 
|  | code, and combined with builtin fences these programs can do meaningful | 
|  | cross-thread communication without changing code. They also better | 
|  | reflect the original code’s intent and guarantee better portability.</p> | 
|  | <h2 id="threading"><span id="language-support-threading"></span>Threading</h2> | 
|  | <p>Threading is explicitly supported through C11/C++11’s threading | 
|  | libraries as well as POSIX threads.</p> | 
|  | <p>Communication between threads should use atomic primitives as described | 
|  | in <a class="reference internal" href="#id1">Memory Model and Atomics</a>.</p> | 
|  | <h2 id="setjmp-and-longjmp"><code>setjmp</code> and <code>longjmp</code></h2> | 
|  | <p>PNaCl and NaCl support <code>setjmp</code> and <code>longjmp</code> without any | 
|  | restrictions beyond C’s.</p> | 
|  | <h2 id="c-exception-handling"><span id="exception-handling"></span>C++ Exception Handling</h2> | 
|  | <p>PNaCl currently supports C++ exception handling through <code>setjmp()</code> and | 
|  | <code>longjmp()</code>, which can be enabled with the <code>--pnacl-exceptions=sjlj</code> linker | 
|  | flag (set with <code>LDFLAGS</code> when using Make). Exceptions are disabled by default | 
|  | so that faster and smaller code is generated, and <code>throw</code> statements are | 
|  | replaced with calls to <code>abort()</code>. The usual <code>-fno-exceptions</code> flag is also | 
|  | supported, though the default is <code>-fexceptions</code>. PNaCl will support full | 
|  | zero-cost exception handling in the future.</p> | 
|  | <aside> | 
|  | When using <a class="reference external" href="https://chromium.googlesource.com/webports">webports</a> or other prebuilt static libraries, you don’t | 
|  | need to recompile because the exception handling support is | 
|  | implemented at link time (when all the static libraries are put | 
|  | together with your application). | 
|  | </aside> | 
|  | <p>NaCl supports full zero-cost C++ exception handling.</p> | 
|  | <h2 id="inline-assembly">Inline Assembly</h2> | 
|  | <p>Inline assembly isn’t supported by PNaCl because it isn’t portable. The | 
|  | one current exception is the common compiler barrier idiom | 
|  | <code>asm("":::"memory")</code>, which gets transformed to a sequentially | 
|  | consistent memory barrier (equivalent to <code>__sync_synchronize()</code>). In | 
|  | PNaCl this barrier is only guaranteed to order <code>volatile</code> and atomic | 
|  | memory accesses, though in practice the implementation attempts to also | 
|  | prevent reordering of memory accesses to objects which may escape.</p> | 
|  | <p>PNaCl supports <a class="reference internal" href="#portable-simd-vectors"><em>Portable SIMD Vectors</em></a>, | 
|  | which are traditionally expressed through target-specific intrinsics or | 
|  | inline assembly.</p> | 
|  | <p>NaCl supports a fairly wide subset of inline assembly through GCC’s | 
|  | inline assembly syntax, with the restriction that the sandboxing model | 
|  | for the target architecture has to be respected.</p> | 
|  | <h2 id="portable-simd-vectors"><span id="id2"></span>Portable SIMD Vectors</h2> | 
|  | <p>SIMD vectors aren’t part of the C/C++ standards and are traditionally | 
|  | very hardware-specific. Portable Native Client offers a portable version | 
|  | of SIMD vector datatypes and operations which map well to modern | 
|  | architectures and offer performance which matches or approaches | 
|  | hardware-specific uses.</p> | 
|  | <p>SIMD vector support was added to Portable Native Client for version 37 of Chrome | 
|  | and more features, including performance enhancements, have been added in | 
|  | subsequent releases, see the <a class="reference internal" href="/native-client/sdk/release-notes.html#sdk-release-notes"><em>Release Notes</em></a> for more | 
|  | details.</p> | 
|  | <h3 id="hand-coding-vector-extensions">Hand-Coding Vector Extensions</h3> | 
|  | <p>The initial vector support in Portable Native Client adds <a class="reference external" href="http://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vectors">LLVM vectors</a> | 
|  | and <a class="reference external" href="http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html">GCC vectors</a> since these | 
|  | are well supported by different hardware platforms and don’t require any | 
|  | new compiler intrinsics.</p> | 
|  | <p>Vector types can be used through the <code>vector_size</code> attribute:</p> | 
|  | <pre class="prettyprint"> | 
|  | #define VECTOR_BYTES 16 | 
|  | typedef int v4s __attribute__((vector_size(VECTOR_BYTES))); | 
|  | v4s a = {1,2,3,4}; | 
|  | v4s b = {5,6,7,8}; | 
|  | v4s c, d, e; | 
|  | c = a + b;  /* c = {6,8,10,12} */ | 
|  | d = b >> a; /* d = {2,1,0,0} */ | 
|  | </pre> | 
|  | <p>Vector comparisons are represented as a bitmask as wide as the compared | 
|  | elements of all <code>0</code> or all <code>1</code>:</p> | 
|  | <pre class="prettyprint"> | 
|  | typedef int v4s __attribute__((vector_size(16))); | 
|  | v4s snip(v4s in) { | 
|  | v4s limit = {32,64,128,256}; | 
|  | v4s mask = in > limit; | 
|  | v4s ret = in & mask; | 
|  | return ret; | 
|  | } | 
|  | </pre> | 
|  | <p>Vector datatypes are currently expected to be 128-bit wide with one of the | 
|  | following element types, and they’re expected to be aligned to the underlying | 
|  | element’s bit width (loads and store will otherwise be broken up into scalar | 
|  | accesses to prevent faults):</p> | 
|  | <table border="1" class="docutils"> | 
|  | <colgroup> | 
|  | </colgroup> | 
|  | <thead valign="bottom"> | 
|  | <tr class="row-odd"><th class="head">Type</th> | 
|  | <th class="head">Num Elements</th> | 
|  | <th class="head">Vector Bit Width</th> | 
|  | <th class="head">Expected Bit Alignment</th> | 
|  | </tr> | 
|  | </thead> | 
|  | <tbody valign="top"> | 
|  | <tr class="row-even"><td><code>uint8_t</code></td> | 
|  | <td>16</td> | 
|  | <td>128</td> | 
|  | <td>8</td> | 
|  | </tr> | 
|  | <tr class="row-odd"><td><code>int8_t</code></td> | 
|  | <td>16</td> | 
|  | <td>128</td> | 
|  | <td>8</td> | 
|  | </tr> | 
|  | <tr class="row-even"><td><code>uint16_t</code></td> | 
|  | <td>8</td> | 
|  | <td>128</td> | 
|  | <td>16</td> | 
|  | </tr> | 
|  | <tr class="row-odd"><td><code>int16_t</code></td> | 
|  | <td>8</td> | 
|  | <td>128</td> | 
|  | <td>16</td> | 
|  | </tr> | 
|  | <tr class="row-even"><td><code>uint32_t</code></td> | 
|  | <td>4</td> | 
|  | <td>128</td> | 
|  | <td>32</td> | 
|  | </tr> | 
|  | <tr class="row-odd"><td><code>int32_t</code></td> | 
|  | <td>4</td> | 
|  | <td>128</td> | 
|  | <td>32</td> | 
|  | </tr> | 
|  | <tr class="row-even"><td><code>float</code></td> | 
|  | <td>4</td> | 
|  | <td>128</td> | 
|  | <td>32</td> | 
|  | </tr> | 
|  | </tbody> | 
|  | </table> | 
|  | <p>64-bit integers and double-precision floating point will be supported in | 
|  | a future release, as will 256-bit and 512-bit vectors.</p> | 
|  | <p>Vector element bit width alignment can be stated explicitly (this is assumed by | 
|  | PNaCl, but not necessarily by other compilers), and smaller alignments can also | 
|  | be specified:</p> | 
|  | <pre class="prettyprint"> | 
|  | typedef int v4s_element   __attribute__((vector_size(16), aligned(4))); | 
|  | typedef int v4s_unaligned __attribute__((vector_size(16), aligned(1))); | 
|  | </pre> | 
|  | <p>The following operators are supported on vectors:</p> | 
|  | <table border="1" class="docutils"> | 
|  | <colgroup> | 
|  | </colgroup> | 
|  | <tbody valign="top"> | 
|  | <tr class="row-odd"><td>unary <code>+</code>, <code>-</code></td> | 
|  | </tr> | 
|  | <tr class="row-even"><td><code>++</code>, <code>--</code></td> | 
|  | </tr> | 
|  | <tr class="row-odd"><td><code>+</code>, <code>-</code>, <code>*</code>, <code>/</code>, <code>%</code></td> | 
|  | </tr> | 
|  | <tr class="row-even"><td><code>&</code>, <code>|</code>, <code>^</code>, <code>~</code></td> | 
|  | </tr> | 
|  | <tr class="row-odd"><td><code>>></code>, <code><<</code></td> | 
|  | </tr> | 
|  | <tr class="row-even"><td><code>!</code>, <code>&&</code>, <code>||</code></td> | 
|  | </tr> | 
|  | <tr class="row-odd"><td><code>==</code>, <code>!=</code>, <code>></code>, <code><</code>, <code>>=</code>, <code><=</code></td> | 
|  | </tr> | 
|  | <tr class="row-even"><td><code>=</code></td> | 
|  | </tr> | 
|  | </tbody> | 
|  | </table> | 
|  | <p>C-style casts can be used to convert one vector type to another without | 
|  | modifying the underlying bits. <code>__builtin_convertvector</code> can be used | 
|  | to convert from one type to another provided both types have the same | 
|  | number of elements, truncating when converting from floating-point to | 
|  | integer.</p> | 
|  | <pre class="prettyprint"> | 
|  | typedef unsigned v4u __attribute__((vector_size(16))); | 
|  | typedef float v4f __attribute__((vector_size(16))); | 
|  | v4u a = {0x3f19999a,0x40000000,0x40490fdb,0x66ff0c30}; | 
|  | v4f b = (v4f) a; /* b = {0.6,2,3.14159,6.02214e+23}  */ | 
|  | v4u c = __builtin_convertvector(b, v4u); /* c = {0,2,3,0} */ | 
|  | </pre> | 
|  | <p>It is also possible to use array-style indexing into vectors to extract | 
|  | individual elements using <code>[]</code>.</p> | 
|  | <pre class="prettyprint"> | 
|  | typedef unsigned v4u __attribute__((vector_size(16))); | 
|  | template<typename T> | 
|  | void print(const T v) { | 
|  | for (size_t i = 0; i != sizeof(v) / sizeof(v[0]); ++i) | 
|  | std::cout << v[i] << ' '; | 
|  | std::cout << std::endl; | 
|  | } | 
|  | </pre> | 
|  | <p>Vector shuffles (often called permutation or swizzle) operations are | 
|  | supported through <code>__builtin_shufflevector</code>. The builtin has two | 
|  | vector arguments of the same element type, followed by a list of | 
|  | constant integers that specify the element indices of the first two | 
|  | vectors that should be extracted and returned in a new vector. These | 
|  | element indices are numbered sequentially starting with the first | 
|  | vector, continuing into the second vector. Thus, if <code>vec1</code> is a | 
|  | 4-element vector, index <code>5</code> would refer to the second element of | 
|  | <code>vec2</code>. An index of <code>-1</code> can be used to indicate that the | 
|  | corresponding element in the returned vector is a don’t care and can be | 
|  | optimized by the backend.</p> | 
|  | <p>The result of <code>__builtin_shufflevector</code> is a vector with the same | 
|  | element type as <code>vec1</code> / <code>vec2</code> but that has an element count equal | 
|  | to the number of indices specified.</p> | 
|  | <pre class="prettyprint"> | 
|  | // identity operation - return 4-element vector v1. | 
|  | __builtin_shufflevector(v1, v1, 0, 1, 2, 3) | 
|  |  | 
|  | // "Splat" element 0 of v1 into a 4-element result. | 
|  | __builtin_shufflevector(v1, v1, 0, 0, 0, 0) | 
|  |  | 
|  | // Reverse 4-element vector v1. | 
|  | __builtin_shufflevector(v1, v1, 3, 2, 1, 0) | 
|  |  | 
|  | // Concatenate every other element of 4-element vectors v1 and v2. | 
|  | __builtin_shufflevector(v1, v2, 0, 2, 4, 6) | 
|  |  | 
|  | // Concatenate every other element of 8-element vectors v1 and v2. | 
|  | __builtin_shufflevector(v1, v2, 0, 2, 4, 6, 8, 10, 12, 14) | 
|  |  | 
|  | // Shuffle v1 with some elements being undefined | 
|  | __builtin_shufflevector(v1, v1, 3, -1, 1, -1) | 
|  | </pre> | 
|  | <p>One common use of <code>__builtin_shufflevector</code> is to perform | 
|  | vector-scalar operations:</p> | 
|  | <pre class="prettyprint"> | 
|  | typedef int v4s __attribute__((vector_size(16))); | 
|  | v4s shift_right_by(v4s shift_me, int shift_amount) { | 
|  | v4s tmp = {shift_amount}; | 
|  | return shift_me >> __builtin_shuffle_vector(tmp, tmp, 0, 0, 0, 0); | 
|  | } | 
|  | </pre> | 
|  | <h3 id="auto-vectorization">Auto-Vectorization</h3> | 
|  | <p>Auto-vectorization is currently not enabled for Portable Native Client, | 
|  | but will be in a future release.</p> | 
|  | <h2 id="undefined-behavior">Undefined Behavior</h2> | 
|  | <p>The C and C++ languages expose some undefined behavior which is | 
|  | discussed in <a class="reference internal" href="/native-client/reference/pnacl-undefined-behavior.html#undefined-behavior"><em>PNaCl Undefined Behavior</em></a>.</p> | 
|  | <h2 id="floating-point"><span id="c-cpp-floating-point"></span>Floating-Point</h2> | 
|  | <p>PNaCl exposes 32-bit and 64-bit floating point operations which are | 
|  | mostly IEEE-754 compliant. There are a few caveats:</p> | 
|  | <ul class="small-gap"> | 
|  | <li>Some <a class="reference internal" href="/native-client/reference/pnacl-undefined-behavior.html#undefined-behavior-fp"><em>floating-point behavior is currently left as undefined</em></a>.</li> | 
|  | <li>The default rounding mode is round-to-nearest and other rounding modes | 
|  | are currently not usable, which isn’t IEEE-754 compliant. PNaCl could | 
|  | support switching modes (the 4 modes exposed by C99 <code>FLT_ROUNDS</code> | 
|  | macros).</li> | 
|  | <li>Signaling <code>NaN</code> never fault.</li> | 
|  | <li><p class="first">Fast-math optimizations are currently supported before <em>pexe</em> creation | 
|  | time. A <em>pexe</em> loses all fast-math information when it is | 
|  | created. Fast-math translation could be enabled at a later date, | 
|  | potentially at a perf-function granularity. This wouldn’t affect | 
|  | already-existing <em>pexe</em>; it would be an opt-in feature.</p> | 
|  | <ul class="small-gap"> | 
|  | <li>Fused-multiply-add have higher precision and often execute faster; | 
|  | PNaCl currently disallows them in the <em>pexe</em> because they aren’t | 
|  | supported on all platforms and can’t realistically be | 
|  | emulated. PNaCl could (but currently doesn’t) only generate them in | 
|  | the backend if fast-math were specified and the hardware supports | 
|  | the operation.</li> | 
|  | <li>Transcendentals aren’t exposed by PNaCl’s ABI; they are part of the | 
|  | math library that is included in the <em>pexe</em>. PNaCl could, but | 
|  | currently doesn’t, use hardware support if fast-math were provided | 
|  | in the <em>pexe</em>.</li> | 
|  | </ul> | 
|  | </li> | 
|  | </ul> | 
|  | <h2 id="computed-goto">Computed <code>goto</code></h2> | 
|  | <p>PNaCl supports computed <code>goto</code>, a non-standard GCC extension to C used | 
|  | by some interpreters, by lowering them to <code>switch</code> statements. The | 
|  | resulting use of <code>switch</code> might not be as fast as the original | 
|  | indirect branches. If you are compiling a program that has a | 
|  | compile-time option for using computed <code>goto</code>, it’s possible that the | 
|  | program will run faster with the option turned off (e.g., if the program | 
|  | does extra work to take advantage of computed <code>goto</code>).</p> | 
|  | <p>NaCl supports computed <code>goto</code> without any transformation.</p> | 
|  | <h2 id="future-directions">Future Directions</h2> | 
|  | <h3 id="inter-process-communication">Inter-Process Communication</h3> | 
|  | <p>Inter-process communication through shared memory is currently not | 
|  | supported by PNaCl/NaCl. When implemented, it may be limited to | 
|  | operations which are lock-free on the current platform (<code>is_lock_free</code> | 
|  | methods). It will rely on the address-free properly discussed in <a class="reference internal" href="#memory-model-for-concurrent-operations">Memory | 
|  | Model for Concurrent Operations</a>.</p> | 
|  | <h3 id="posix-style-signal-handling">POSIX-style Signal Handling</h3> | 
|  | <p>POSIX-style signal handling really consists of two different features:</p> | 
|  | <ul class="small-gap"> | 
|  | <li><p class="first"><strong>Hardware exception handling</strong> (synchronous signals): The ability | 
|  | to catch hardware exceptions (such as memory access faults and | 
|  | division by zero) using a signal handler.</p> | 
|  | <p>PNaCl currently doesn’t support hardware exception handling.</p> | 
|  | <p>NaCl supports hardware exception handling via the | 
|  | <code><nacl/nacl_exception.h></code> interface.</p> | 
|  | </li> | 
|  | <li><p class="first"><strong>Asynchronous interruption of threads</strong> (asynchronous signals): The | 
|  | ability to asynchronously interrupt the execution of a thread, | 
|  | forcing the thread to run a signal handler.</p> | 
|  | <p>A similar feature is <strong>thread suspension</strong>: The ability to | 
|  | asynchronously suspend and resume a thread and inspect or modify its | 
|  | execution state (such as register state).</p> | 
|  | <p>Neither PNaCl nor NaCl currently support asynchronous interruption | 
|  | or suspension of threads.</p> | 
|  | </li> | 
|  | </ul> | 
|  | <p>If PNaCl were to support either of these, the interaction of | 
|  | <code>volatile</code> and atomics with same-thread signal handling would need | 
|  | to be carefully detailed.</p> | 
|  | </section> | 
|  |  | 
|  | {{/partials.standard_nacl_article}} |