| // Copyright 2017-2024 The Khronos Group. |
| // SPDX-License-Identifier: CC-BY-4.0 |
| |
| // Extensions to enable |
| // Must be included before the header and attribs.txt |
| include::{generated}/specattribs.adoc[] |
| |
| = The OpenCL^(TM)^ C Specification |
| :R: pass:q,r[^(R)^] |
| Khronos{R} OpenCL Working Group |
| :data-uri: |
| :icons: font |
| :toc2: |
| :toclevels: 3 |
| :max-width: 100% |
| :numbered: |
| :imagewidth: 800 |
| :fullimagewidth: width="800" |
| :source-highlighter: rouge |
| :source-language: opencl_c |
| :rouge-style: opencl.spec |
| :sectnumoffset: 5 |
| :docinfo: shared-header |
| :docinfodir: config |
| :title-logo-image: image:images/OpenCL.png[top="25%",width="55%"] |
| :description: OpenCL(TM) is an open, royalty-free standard for cross-platform \ |
| parallel programming of diverse accelerators. \ |
| This document describes the OpenCL C language. |
| |
| // Various special / math symbols. This is easier to edit with than Unicode. |
| include::config/attribs.txt[] |
| |
| // Attributes that are shared by OpenCL specifications. |
| include::config/opencl.asciidoc[] |
| |
| // Feature Dictionary |
| include::c/feature-dictionary.asciidoc[] |
| |
| // External Footnotes |
| include::c/footnotes.asciidoc[] |
| |
| <<<< |
| |
| include::copyrights.txt[] |
| |
| <<< |
| |
| // :numbered: |
| |
| :leveloffset: 1 |
| |
| |
| [[the-opencl-c-programming-language]] |
| = The OpenCL C Programming Language |
| |
| [NOTE] |
| ==== |
| This document starts at chapter 6 to keep the section numbers historically |
| consistent with previous versions of the OpenCL and OpenCL C Programming |
| Language specifications. |
| ==== |
| |
| This section describes the OpenCL C programming language. |
| The OpenCL C programming language may be used to write kernels that execute |
| on an OpenCL device. |
| |
| The OpenCL C programming language (also referred to as OpenCL C) is based |
| on the <<C99-spec,ISO/IEC 9899:1999 Programming languages - C>> specification |
| (also referred to as the C99 specification, or just C99), with extensions |
| and restrictions to support parallel kernels. |
| In addition, some features of OpenCL C are based on the <<C11-spec,ISO/IEC |
| 9899:2011 Information technology - Programming languages - C>> specification |
| (also referred to as the C11 specification, or just C11). |
| |
| This document describes the modifications and restrictions to C99 and C11 |
| in OpenCL C. |
| Please refer to the C99 specification for a detailed description of the |
| language grammar. |
| |
| [[unified-spec]] |
| == Unified Specification |
| |
| This document specifies all versions of OpenCL C. |
| |
| There are several ways that an OpenCL C feature may be described in terms of |
| what versions of OpenCL C specify that feature. |
| |
| * Requires support for OpenCL C _major.minor_ or newer: Features that were |
| introduced in version _major.minor_. |
| Compilers for an earlier version of OpenCL C will not provide these |
| features. |
** In some instances the variation "For OpenCL C _major.minor_ or newer" is
used; it has the identical meaning.
| * Requires support for OpenCL C 2.0, or OpenCL C 3.0 or newer and the |
| {opencl_c_feature_name} feature: |
| Features that were introduced in OpenCL C 2.0 as mandatory, but made |
| <<optional-functionality, optional>> in OpenCL C 3.0. |
Compilers for versions of OpenCL C 1.2 or below will not provide these
features; compilers for OpenCL C 2.0 will provide these features; and
compilers for OpenCL C 3.0 or newer may provide these features.
| * Requires support for OpenCL C 3.0 or newer and the |
| {opencl_c_feature_name} feature: <<optional-functionality, |
| Optional>> features that were introduced in OpenCL C 3.0. |
Compilers for an earlier version of OpenCL C will not provide these
features; compilers for OpenCL C 3.0 or newer may provide these features.
* Deprecated by OpenCL C _major.minor_: Features that were deprecated
in version _major.minor_; see the definition of deprecation in the
glossary of the main OpenCL specification.
* Universal: Features that make no mention of a version in which they were
introduced or deprecated are specified for all versions of OpenCL C.
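As an illustration of the second category above, the following preprocessor
sketch decides whether a program may use the generic address space.
It assumes that `+__OPENCL_C_VERSION__+` expands to 200 for OpenCL C 2.0 and
to 300 for OpenCL C 3.0; `APP_HAS_GENERIC_AS` is a hypothetical
application-defined macro.
Compiled as ordinary C, where none of the OpenCL macros are defined, the
sketch yields 0.

```c
/* Sketch: is the generic address space available?
 * APP_HAS_GENERIC_AS is a hypothetical application macro.
 * Assumes __OPENCL_C_VERSION__ is 200 for OpenCL C 2.0 and 300 for 3.0. */
#if defined(__OPENCL_C_VERSION__) && __OPENCL_C_VERSION__ >= 300
/* OpenCL C 3.0 or newer: optional, signalled by the feature macro. */
#  if defined(__opencl_c_generic_address_space)
#    define APP_HAS_GENERIC_AS 1
#  else
#    define APP_HAS_GENERIC_AS 0
#  endif
#elif defined(__OPENCL_C_VERSION__) && __OPENCL_C_VERSION__ >= 200
/* OpenCL C 2.0: the feature is mandatory. */
#  define APP_HAS_GENERIC_AS 1
#else
/* OpenCL C 1.2 or below (or not an OpenCL C compiler): not available. */
#  define APP_HAS_GENERIC_AS 0
#endif
```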
| |
| [[optional-functionality]] |
| == Optional functionality |
| |
Some language functionality is optional and will not be supported by all
devices. Such functionality is represented by optional language features or
language extensions. Support for optional functionality in OpenCL C is
indicated by the presence of special predefined macros.
| |
| [[features]] |
| === Features |
| |
| IMPORTANT: Feature test macros <<unified-spec, require>> support for OpenCL C |
| 3.0 or newer. |
| |
| Optional core language features are described in this document. They are |
| optional from OpenCL C 3.0 onwards and therefore are not supported by all |
| implementations. When an OpenCL C 3.0 optional feature is supported, an |
| associated __feature test macro__ will be predefined. |
| |
| The following table describes OpenCL C 3.0 or newer features and their |
| meaning. The naming convention for the feature macros is |
| {opencl_c_feature_name}. |
| |
| Feature macro identifiers are used as names of features in this document. |
| |
| [[table-optional-lang-features]] |
| .Optional features in OpenCL C 3.0 or newer and their predefined macros. |
| [cols="1,1",options="header",] |
| |==== |
| | Feature Macro/Name | Brief Description |
| |
| | {opencl_c_3d_image_writes} |
| | The OpenCL C compiler supports built-in functions for writing to 3D image |
| objects. |
| |
| OpenCL C compilers that define the feature macro {opencl_c_3d_image_writes} |
| must also define the feature macro {opencl_c_images}. |
| |
| | {opencl_c_atomic_order_acq_rel} |
| | The OpenCL C compiler supports enumerations and built-in functions for atomic |
| operations with acquire and release memory consistency orders. |
| |
| | {opencl_c_atomic_order_seq_cst} |
| | The OpenCL C compiler supports enumerations and built-in functions for atomic |
| operations and fences with sequentially consistent memory consistency order. |
| |
| | {opencl_c_atomic_scope_device} |
| | The OpenCL C compiler supports enumerations and built-in functions for atomic |
| operations and fences with device memory scope. |
| |
| {opencl_c_atomic_scope_all_devices}
| The OpenCL C compiler supports enumerations and built-in functions for atomic
operations and fences with a memory scope spanning all devices that can share
SVM memory with each other and the host process.
| |
| | {opencl_c_device_enqueue} |
| | The OpenCL C compiler supports built-in functions to enqueue additional work |
| from the device. |
| |
| OpenCL C compilers that define the feature macro {opencl_c_device_enqueue} must also |
| define {opencl_c_generic_address_space} and {opencl_c_program_scope_global_variables} |
| feature macros. |
| |
| | {opencl_c_generic_address_space} |
| | The OpenCL C compiler supports the unnamed generic address space. |
| |
| | {opencl_c_fp64} |
| | The OpenCL C compiler supports types and built-in functions with 64-bit |
| floating-point types. |
| |
| | {opencl_c_images} |
| | The OpenCL C compiler supports types and built-in functions for images. |
| |
| | {opencl_c_int64} |
| | The OpenCL C compiler supports types and built-in functions with 64-bit |
| integers. |
| |
| OpenCL C compilers for FULL profile devices or devices with 64-bit pointers |
| must always define the {opencl_c_int64} feature macro. |
| |
| {opencl_c_pipes}
| The OpenCL C compiler supports the pipe specifier and built-in functions
to read from and write to a pipe.
| |
| OpenCL C compilers that define the feature macro {opencl_c_pipes} must |
| also define the feature macro {opencl_c_generic_address_space}. |
| |
| | {opencl_c_program_scope_global_variables} |
| | The OpenCL C compiler supports program scope variables in the global address |
| space. |
| |
| | {opencl_c_read_write_images} |
| | The OpenCL C compiler supports reading from and writing to the same image |
| object in a kernel. |
| |
| OpenCL C compilers that define the feature macro |
| {opencl_c_read_write_images} must also define the feature macro |
| {opencl_c_images}. |
| |
| | {opencl_c_subgroups} |
| | The OpenCL C compiler supports built-in functions operating on sub-groupings |
| of work-items. |
| |
| | {opencl_c_work_group_collective_functions} |
| | The OpenCL C compiler supports built-in functions that perform collective |
| operations across a work-group. |
| |
| ifdef::cl_khr_integer_dot_product[] |
| | {opencl_c_integer_dot_product_input_4x8bit_packed} + |
| (when the {cl_khr_integer_dot_product} extension macro is defined) |
| |
| The OpenCL C compiler supports built-in functions that perform dot
products on packed 4x8-bit integer vectors.
| |
| | {opencl_c_integer_dot_product_input_4x8bit} + |
| (when the {cl_khr_integer_dot_product} extension macro is defined) |
| The OpenCL C compiler supports built-in functions that perform dot
products on 4x8-bit integer vectors.
| endif::cl_khr_integer_dot_product[] |
| |
| |==== |
| |
| In OpenCL C 3.0 or newer, feature macros must expand to the value `1` if the |
| feature macro is defined by the OpenCL C compiler. A feature macro must not be |
| defined if the feature is not supported by the OpenCL C compiler. A feature |
| macro may expand to a different value in the future, but if this occurs the |
| value of the feature macro must compare greater than the prior value of the |
| feature macro. |
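Because a feature macro may later expand to a value greater than `1`, a
portable test checks whether the macro is defined, or compares its value with
`>=`, rather than testing for equality with `1`.
A minimal sketch using {opencl_c_subgroups}; both `APP_*` macros are
hypothetical application macros.

```c
/* Preferred: test only whether the feature macro is defined. */
#if defined(__opencl_c_subgroups)
#  define APP_USE_SUBGROUPS 1
#else
#  define APP_USE_SUBGROUPS 0
#endif

/* A value test stays correct only if it uses >=, because a future value
 * must compare greater than the prior value. An undefined macro evaluates
 * to 0 in #if, so this is also false when the feature is absent. */
#if __opencl_c_subgroups >= 1
#  define APP_SUBGROUPS_BY_VALUE 1
#else
#  define APP_SUBGROUPS_BY_VALUE 0
#endif
```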
| |
As specified in <<C99-spec,section 7.1.3 of the C99 Specification>>, double
underscore identifiers are reserved; therefore implementations
for earlier OpenCL C versions are allowed, but not required, to define
feature test macros. Applications that target earlier OpenCL C versions
should not rely on feature test macros, because there is no guarantee that
they will be defined or, if defined, that they will indicate the presence of
the corresponding optional functionality.
| |
| |
| [[extensions]] |
| === Extensions |
| |
| Other optional functionality may be described by language extensions to OpenCL |
| C. Extensions are described in the <<opencl-extension-spec,OpenCL Extension |
Specification>>. When an OpenCL C extension is supported, an associated
| __extension macro__ will be predefined. Please refer to the OpenCL Extension |
| Specification for more information about predefined extension macros. |
| |
| Prior to OpenCL C 3.0, support for some optional core language features was |
| indicated using predefined extension macros. |
| |
When an optional core language feature began as an extension, it may have both
an associated feature macro and an associated extension macro. If an optional
core language feature was an optional extension to an earlier version of
OpenCL C, it can still be used as an extension, i.e. the same predefined
extension macros are still valid in OpenCL C 3.0 or newer; however, the use of
feature macros is preferred whenever possible.
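A kernel that must also build with compilers for OpenCL C 2.0 or earlier can
therefore consult a feature macro only for OpenCL C 3.0 or newer and fall
back to the extension macro otherwise.
The sketch below does this for double precision.
`APP_HAS_FP64` and `app_real` are hypothetical names, and compilers for older
versions may additionally require the extension to be enabled with
`#pragma OPENCL EXTENSION cl_khr_fp64 : enable` before `double` is used,
which the sketch omits.

```c
/* Version-aware double-precision detection (hypothetical APP_HAS_FP64).
 * Feature macros are trusted only for OpenCL C 3.0 or newer; earlier
 * versions use the cl_khr_fp64 extension macro instead. */
#if defined(__OPENCL_C_VERSION__) && __OPENCL_C_VERSION__ >= 300
#  if defined(__opencl_c_fp64)
#    define APP_HAS_FP64 1
#  endif
#elif defined(cl_khr_fp64)
#  define APP_HAS_FP64 1
#endif

#ifndef APP_HAS_FP64
#  define APP_HAS_FP64 0
#endif

/* Pick the widest floating-point type the compiler supports. */
#if APP_HAS_FP64
typedef double app_real;
#else
typedef float app_real;
#endif
```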
| |
| |
| ifdef::cl_khr_3d_image_writes[] |
| [[cl_khr_3d_image_writes,cl_khr_3d_image_writes]] |
| ==== 3D Image Writes |
| |
| The `cl_khr_3d_image_writes` extension was promoted to OpenCL 2.0, and to |
| OpenCL 3.0 as the {opencl_c_3d_image_writes} feature. |
| The extension adds <<built-in-image-write-functions, Built-in Image Write |
| Functions>> that allow a kernel to write to 3D image objects in addition to |
| 2D image objects. |
| endif::cl_khr_3d_image_writes[] |
| |
| |
| ifdef::cl_khr_async_work_group_copy_fence[] |
| [[cl_khr_async_work_group_copy_fence,cl_khr_async_work_group_copy_fence]] |
| ==== Async Work-group Copy Fence |
| |
| The `cl_khr_async_work_group_copy_fence` extension supports establishing a |
| memory synchronization ordering of asynchronous copies. |
| The extension provides the `async_work_group_copy_fence` function, as |
| described in the <<table-builtin-async-copy, Built-in Async Copy and |
Prefetch Functions>> table.
| endif::cl_khr_async_work_group_copy_fence[] |
| |
| |
| ifdef::cl_khr_byte_addressable_store[] |
| [[cl_khr_byte_addressable_store,cl_khr_byte_addressable_store]] |
| ==== Byte-Addressable Storage |
| |
| The `cl_khr_byte_addressable_store` extension was promoted to OpenCL C 1.1. |
| The extension relaxes <<restrictions>> on pointers to `char`, `uchar`, |
| `char2`, `uchar2`, `short`, `ushort` and `half`, allowing applications to |
| read from and write to pointers to these types. |
| endif::cl_khr_byte_addressable_store[] |
| |
| |
| ifdef::cl_khr_depth_images[] |
| [[cl_khr_depth_images,cl_khr_depth_images]] |
| ==== Depth Images |
| |
| The `cl_khr_depth_images` extension was promoted to OpenCL 2.0. |
| The extension provides new <<table-other-builtin-types, built-in depth image |
| types>>, as well as <<table-image-read, read functions>>, |
| <<table-image-samplerless-read, sampler-less read functions>>, |
| <<table-image-write, write functions>>, and <<table-image-query, image |
| queries>> operating on those types. |
| endif::cl_khr_depth_images[] |
| |
| |
| ifdef::cl_khr_device_enqueue_local_arg_types[] |
| [[cl_khr_device_enqueue_local_arg_types,cl_khr_device_enqueue_local_arg_types]] |
| ==== Device Enqueue Local Argument Types |
| |
| The `cl_khr_device_enqueue_local_arg_types` extension allows arguments to |
| blocks that are passed to the <<table-builtin-kernel-enqueue, Built-in |
| Kernel Enqueue Functions>> and to the <<table-builtin-kernel-query, Built-in |
| Kernel Query Functions>> to be pointers to any type (built-in or |
| user-defined) in local memory, instead of requiring arguments to blocks to |
| be pointers to `void` in local memory. |
| endif::cl_khr_device_enqueue_local_arg_types[] |
| |
| |
| ifdef::cl_khr_extended_async_copies[] |
| [[cl_khr_extended_async_copies,cl_khr_extended_async_copies]] |
| ==== Extended Async Copy Functions |
| |
| The `cl_khr_extended_async_copies` extension provides additional |
| <<extended-async-copies, Extended Async Copy Functions>> which interpret the |
| source and destination as 2D or 3D images. |
| endif::cl_khr_extended_async_copies[] |
| |
| |
| ifdef::cl_khr_extended_bit_ops[] |
| [[cl_khr_extended_bit_ops,cl_khr_extended_bit_ops]] |
| ==== Extended Bit Operations |
| |
| The `cl_khr_extended_bit_ops` extension provides additional |
| <<extended-bit-operations, Extended Bit Operations>> including bitfield |
| insert, bitfield extract, and bit reverse. |
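The semantics of these operations can be sketched in plain C.
The helpers below are hypothetical reference functions, not the built-in
function names defined by the extension; they cover only the unsigned 32-bit
case and assume that `offset` is less than 32 and that `offset` plus `count`
does not exceed 32.

```c
#include <stdint.h>

/* Hypothetical reference helpers (NOT the OpenCL C built-in names).
 * Assumes offset < 32 and offset + count <= 32. */

/* Mask with the low `count` bits set; count may be 32. */
static uint32_t low_mask32(unsigned count)
{
    return (count >= 32u) ? 0xFFFFFFFFu : ((1u << count) - 1u);
}

/* Bitfield extract: `count` bits of `base` starting at `offset`,
 * zero-extended into the result. */
static uint32_t ref_bitfield_extract_u32(uint32_t base, unsigned offset,
                                         unsigned count)
{
    return (base >> offset) & low_mask32(count);
}

/* Bitfield insert: replace `count` bits of `base` starting at `offset`
 * with the low `count` bits of `insert`. */
static uint32_t ref_bitfield_insert_u32(uint32_t base, uint32_t insert,
                                        unsigned offset, unsigned count)
{
    uint32_t mask = low_mask32(count) << offset;
    return (base & ~mask) | ((insert << offset) & mask);
}

/* Bit reverse: bit i of the input becomes bit 31 - i of the result. */
static uint32_t ref_bit_reverse_u32(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; ++i) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}
```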
| endif::cl_khr_extended_bit_ops[] |
| |
| |
| ifdef::cl_khr_fp16[] |
| [[cl_khr_fp16,cl_khr_fp16]] |
| ==== Half-Precision Floating-Point |
| |
The `cl_khr_fp16` extension provides 16-bit precision scalar and vector
floating-point data types and extends many functions to accept these types.
| endif::cl_khr_fp16[] |
| |
| |
| ifdef::cl_khr_fp64[] |
| [[cl_khr_fp64,cl_khr_fp64]] |
| ==== Double-Precision Floating-Point |
| |
The `cl_khr_fp64` extension was promoted to OpenCL C 1.2 as an optional
feature, and to OpenCL 3.0 as the optional {opencl_c_fp64} feature.
| The extension provides double-precision scalar and vector floating-point |
| data types and extends many functions to accept these types. |
| endif::cl_khr_fp64[] |
| |
| |
| ifdef::cl_khr_gl_msaa_sharing[] |
| [[cl_khr_gl_msaa_sharing,cl_khr_gl_msaa_sharing]] |
| ==== Multi-Sample Shared OpenCL/OpenGL Images |
| |
| The `cl_khr_gl_msaa_sharing` extension adds support for multi-sample images |
| shared with OpenGL multi-sample textures. |
| The extension provides new <<table-other-builtin-types, built-in multisample |
| image types>>, as well as <<table-image-samplerless-read, sampler-less read |
| functions>> and <<table-image-query, image queries>> operating on those |
| types. |
| endif::cl_khr_gl_msaa_sharing[] |
| |
| |
| ifdef::cl_khr_global_int32_base_atomics[] |
| [[cl_khr_global_int32_base_atomics,cl_khr_global_int32_base_atomics]] |
| ==== Global 32-Bit Base Atomics |
| |
| The `cl_khr_global_int32_base_atomics` extension was promoted to OpenCL C |
| 1.1, with the supported functions renamed to use the **atomic_** prefix |
| rather than the **atom_** prefix. |
| The extension provides base atomic functions for {global} variables, as |
| described in the <<table-atomic-function-extensions, Atomic Function |
| Extensions>> table. |
| |
| endif::cl_khr_global_int32_base_atomics[] |
| |
| |
| ifdef::cl_khr_global_int32_extended_atomics[] |
| [[cl_khr_global_int32_extended_atomics,cl_khr_global_int32_extended_atomics]] |
| ==== Global 32-Bit Extended Atomics |
| |
| The `cl_khr_global_int32_extended_atomics` extension was promoted to OpenCL |
| C 1.1, with the supported functions renamed to use the **atomic_** prefix |
| rather than the **atom_** prefix. |
| The extension provides extended atomic functions for {global} variables, as |
| described in the <<table-atomic-function-extensions, Atomic Function |
| Extensions>> table. |
| |
| endif::cl_khr_global_int32_extended_atomics[] |
| |
| |
| ifdef::cl_khr_initialize_memory[] |
| [[cl_khr_initialize_memory,cl_khr_initialize_memory]] |
| ==== Initializing Memory |
| |
| The `cl_khr_initialize_memory` extension allows creating a context which |
| initializes specified types (local or private) of memory prior to the start |
| of kernel execution. |
| |
Most of the extension is defined by the OpenCL 3.0 API Specification; this
document describes one <<restrictions-initialize-memory, restriction>> on the
timing of this initialization.
| endif::cl_khr_initialize_memory[] |
| |
| |
| ifdef::cl_khr_int64_base_atomics[] |
| [[cl_khr_int64_base_atomics,cl_khr_int64_base_atomics]] |
| ==== 64-Bit Base Atomics |
| |
| The `cl_khr_int64_base_atomics` extension provides base atomic functions for |
| {global} and {local} 64-bit signed and unsigned integer variables, as |
| described in the <<table-atomic-int64-base, Built-in 64-Bit Base Atomic |
| Functions>> table. |
| endif::cl_khr_int64_base_atomics[] |
| |
| |
| ifdef::cl_khr_int64_extended_atomics[] |
| [[cl_khr_int64_extended_atomics,cl_khr_int64_extended_atomics]] |
| ==== 64-Bit Extended Atomics |
| |
| The `cl_khr_int64_extended_atomics` extension provides extended atomic functions for |
| {global} and {local} 64-bit signed and unsigned integer variables, as |
| described in the <<table-atomic-int64-extended, Built-in 64-Bit Extended Atomic |
| Functions>> table. |
| endif::cl_khr_int64_extended_atomics[] |
| |
| |
| ifdef::cl_khr_integer_dot_product[] |
| [[cl_khr_integer_dot_product,cl_khr_integer_dot_product]] |
| ==== Integer Dot Product |
| |
| The `cl_khr_integer_dot_product` extension adds support for SPIR-V |
| instructions and OpenCL C built-in functions to compute the dot product of |
| vectors of integers. |
The extension provides new <<table-builtin-functions, built-in functions>>
that compute these integer dot products.
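The arithmetic of a packed 4x8-bit unsigned dot product can be written in
plain C for checking results.
`dot_packed_u8x4` is a hypothetical reference helper, not an extension
built-in; it assumes each 32-bit operand holds four unsigned 8-bit
components.
Because it sums per-component products, the result does not depend on which
byte holds which component.

```c
#include <stdint.h>

/* Hypothetical reference: dot product of two 32-bit words, each holding
 * four unsigned 8-bit components. The maximum result, 4 * 255 * 255,
 * fits comfortably in 32 bits. */
static uint32_t dot_packed_u8x4(uint32_t a, uint32_t b)
{
    uint32_t sum = 0;
    for (int i = 0; i < 4; ++i) {
        uint32_t ai = (a >> (8 * i)) & 0xFFu;  /* i-th byte of a */
        uint32_t bi = (b >> (8 * i)) & 0xFFu;  /* i-th byte of b */
        sum += ai * bi;
    }
    return sum;
}
```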
| endif::cl_khr_integer_dot_product[] |
| |
| |
| ifdef::cl_khr_local_int32_base_atomics[] |
| [[cl_khr_local_int32_base_atomics,cl_khr_local_int32_base_atomics]] |
| ==== Local 32-Bit Base Atomics |
| |
| The `cl_khr_local_int32_base_atomics` extension was promoted to OpenCL C |
| 1.1, with the supported functions renamed to use the **atomic_** prefix |
| rather than the **atom_** prefix. |
| The extension provides base atomic functions for {local} variables, as |
| described in the <<table-atomic-function-extensions, Atomic Function |
| Extensions>> table. |
| |
| endif::cl_khr_local_int32_base_atomics[] |
| |
| |
| ifdef::cl_khr_local_int32_extended_atomics[] |
| [[cl_khr_local_int32_extended_atomics,cl_khr_local_int32_extended_atomics]] |
| ==== Local 32-Bit Extended Atomics |
| |
| The `cl_khr_local_int32_extended_atomics` extension was promoted to OpenCL |
| C 1.1, with the supported functions renamed to use the **atomic_** prefix |
| rather than the **atom_** prefix. |
| The extension provides extended atomic functions for {local} variables, as |
| described in the <<table-atomic-function-extensions, Atomic Function |
| Extensions>> table. |
| |
| endif::cl_khr_local_int32_extended_atomics[] |
| |
| |
| ifdef::cl_khr_mipmap_image[] |
| [[cl_khr_mipmap_image,cl_khr_mipmap_image]] |
| ==== Mipmapped Image Reads and Queries |
| |
| The `cl_khr_mipmap_image` extension adds support for mipmap images. |
| The extension provides built-in <<built-in-image-read-functions, image |
| read>> and <<built-in-image-query-functions, image query>> functions |
| operating on these images. |
| endif::cl_khr_mipmap_image[] |
| |
| |
| ifdef::cl_khr_mipmap_image_writes[] |
| [[cl_khr_mipmap_image_writes,cl_khr_mipmap_image_writes]] |
| ==== Mipmapped Image Writes |
| |
| The `cl_khr_mipmap_image_writes` extension adds support for writing to |
| mipmap images, and requires support for the `<<cl_khr_mipmap_image>>` |
| extension macro. |
| The extension provides built-in <<built-in-image-write-functions, image |
| write>> functions operating on these images. |
| endif::cl_khr_mipmap_image_writes[] |
| |
| |
| ifdef::cl_khr_select_fprounding_mode[] |
| [[cl_khr_select_fprounding_mode,cl_khr_select_fprounding_mode]] |
| ==== Select Floating-Point Rounding Mode |
| |
| The `cl_khr_select_fprounding_mode` extension allows <<select-rounding-mode, |
| specifying the floating-point rounding mode>> for an instruction or group of |
| instructions in the program source by use of a *#pragma*. |
| |
| The extension was deprecated in OpenCL 1.1 and its use is not recommended. |
| endif::cl_khr_select_fprounding_mode[] |
| |
| |
| ifdef::cl_khr_srgb_image_writes[] |
| [[cl_khr_srgb_image_writes,cl_khr_srgb_image_writes]] |
| ==== sRGB Image Write Functions |
| |
| The `cl_khr_srgb_image_writes` extension adds support for writing to sRGB |
| images using the <<built-in-image-write-functions, *write_imagef*>> |
| functions. Color space conversion is performed by the function. |
| endif::cl_khr_srgb_image_writes[] |
| |
| |
| ifdef::cl_khr_subgroups[] |
| [[cl_khr_subgroups,cl_khr_subgroups]] |
| ==== Sub-Groups |
| |
| The `cl_khr_subgroups` extension was promoted to OpenCL C 2.1 as the |
| {opencl_c_subgroups} feature. |
| The extension provides the following functions: |
| |
| * <<table-subgroup-work-item-functions, Built-in Work-Item Functions for |
| Sub-Groups>> |
| * <<table-synchronization-functions, Built-in Synchronization Functions |
| for Sub-Groups>> |
| * <<table-collective-functions, Built-in Collective Functions for |
| Sub-Groups>> |
| * <<table-pipe-functions, Built-in Pipe Functions for Sub-Groups>> |
| * <<table-kernel-query-functions, Built-in Kernel Query Functions for |
| Sub-Groups>> |
* The <<table-memory-scopes, `memory_scope_sub_group`>> memory scope and
| <<atomic-restrictions, associated restrictions>> |
| endif::cl_khr_subgroups[] |
| |
| |
| ifdef::cl_khr_subgroup_ballot[] |
| [[cl_khr_subgroup_ballot,cl_khr_subgroup_ballot]] |
| ==== Sub-Group Ballots |
| |
| The `cl_khr_subgroup_ballot` extension adds the ability to collect and |
| operate on ballots from work items in a sub-group. |
| The extension provides the following functions: |
| |
| * <<table-ballot-functions, Built-in Ballot Functions for Sub-Groups>> |
| |
| endif::cl_khr_subgroup_ballot[] |
| |
| |
| ifdef::cl_khr_subgroup_clustered_reduce[] |
| [[cl_khr_subgroup_clustered_reduce,cl_khr_subgroup_clustered_reduce]] |
| ==== Clustered Reductions |
| |
| The `cl_khr_subgroup_clustered_reduce` extension adds support for clustered |
| reductions that operate on a subset of work items in the sub-group. |
| The extension provides the following functions: |
| |
| * <<table-clustered-reduce-math-functions, Built-in Arithmetic Functions |
| for Sub-Groups>> |
| * <<table-clustered-reduce-bitwise-functions, Built-in Bitwise Functions |
| for Sub-Groups>> |
| * <<table-clustered-reduce-logical-functions, Built-in Logical Functions |
| for Sub-Groups>> |
| |
| endif::cl_khr_subgroup_clustered_reduce[] |
| |
| |
| ifdef::cl_khr_subgroup_extended_types[] |
| [[cl_khr_subgroup_extended_types,cl_khr_subgroup_extended_types]] |
| ==== Sub-Group Extended Types |
| |
| The `cl_khr_subgroup_extended_types` extension adds <<sub-group-functions, |
| additional supported data types>> to the existing |
| <<table-collective-functions, sub-group broadcast, scan, and reduction |
| functions>>. |
| |
| endif::cl_khr_subgroup_extended_types[] |
| |
| |
| ifdef::cl_khr_subgroup_non_uniform_arithmetic[] |
| [[cl_khr_subgroup_non_uniform_arithmetic,cl_khr_subgroup_non_uniform_arithmetic]] |
| ==== Built-in Non-Uniform Arithmetic Functions for Sub-Groups |
| |
| The `cl_khr_subgroup_non_uniform_arithmetic` extension adds the ability to |
| use some sub-group functions within non-uniform flow control, including |
| additional scan and reduction operators. |
| |
| The extension provides the following functions: |
| |
| * <<table-non-uniform-math-functions, Built-in Non-Uniform Arithmetic |
| Functions for Sub-Groups>> |
| * <<table-non-uniform-bitwise-functions, Built-in Non-Uniform Bitwise |
| Functions for Sub-Groups>> |
| * <<table-non-uniform-logical-functions, Built-in Non-Uniform Logical |
| Functions for Sub-Groups>> |
| |
| endif::cl_khr_subgroup_non_uniform_arithmetic[] |
| |
| |
| ifdef::cl_khr_subgroup_non_uniform_vote[] |
| [[cl_khr_subgroup_non_uniform_vote,cl_khr_subgroup_non_uniform_vote]] |
| ==== Built-in Non-Uniform Vote and Election Functions for Sub-Groups |
| |
| The `cl_khr_subgroup_non_uniform_vote` extension adds the ability to elect a |
| single work item from a sub-group to perform a task and to hold votes among |
| work items in a sub-group. |
| |
| The extension provides the following functions: |
| |
| * <<table-non-uniform-vote-functions, Built-in Non-Uniform Vote Functions |
| for Sub-Groups>> |
| |
| endif::cl_khr_subgroup_non_uniform_vote[] |
| |
| |
| ifdef::cl_khr_subgroup_rotate[] |
| [[cl_khr_subgroup_rotate,cl_khr_subgroup_rotate]] |
| ==== Sub-Group Rotation |
| |
| The `cl_khr_subgroup_rotate` extension adds support for a new sub-group data |
| exchange operation that makes it possible to rotate values through the work |
| items in a sub-group. |
| |
| The extension provides the following functions: |
| |
| * <<table-rotate-functions, Built-in Rotation Functions for Sub-Groups>> |
| |
| endif::cl_khr_subgroup_rotate[] |
| |
| |
| ifdef::cl_khr_subgroup_shuffle[] |
| [[cl_khr_subgroup_shuffle,cl_khr_subgroup_shuffle]] |
| ==== General Purpose Shuffles |
| |
| The `cl_khr_subgroup_shuffle` extension adds additional ways to exchange |
| data among work items in a sub-group. |
| |
| The extension provides the following functions: |
| |
| * <<table-shuffle-functions, Built-in Shuffle Functions for Sub-Groups>> |
| |
| endif::cl_khr_subgroup_shuffle[] |
| |
| |
| ifdef::cl_khr_subgroup_shuffle_relative[] |
| [[cl_khr_subgroup_shuffle_relative,cl_khr_subgroup_shuffle_relative]] |
| ==== Relative Shuffles |
| |
| The `cl_khr_subgroup_shuffle_relative` extension adds specialized ways to |
| exchange data among work items in a sub-group that may perform better on |
| some implementations. |
| |
| The extension provides the following functions: |
| |
| * <<table-shuffle-relative-functions, Built-in Relative Shuffle Functions |
| for Sub-Groups>> |
| |
| endif::cl_khr_subgroup_shuffle_relative[] |
| |
| |
| ifdef::cl_khr_work_group_uniform_arithmetic[] |
| [[cl_khr_work_group_uniform_arithmetic,cl_khr_work_group_uniform_arithmetic]] |
| ==== Work-group Collective Uniform Arithmetic Functions |
| |
| The `cl_khr_work_group_uniform_arithmetic` extension adds additional |
| work-group collective functions, including work-group scans and reductions |
| for the following operators: |
| |
| * Logical operations (`and`, `or`, and `xor`). |
| * Bitwise operations (`and`, `or`, and `xor`). |
| * Integer multiplication (`mul`). |
| * Floating-point multiplication (`mul`). |
| |
| The extension provides the following functions: |
| |
| * <<table-builtin-work-group-logical, Built-in Work-group Logical |
| Arithmetic Functions>> |
| * <<table-builtin-work-group-bitwise-integer, Built-in Work-group Bitwise |
| Integer Functions>> |
| * <<table-builtin-work-group-multiplicative, Built-in Work-group |
| Multiplicative Functions>> |
| endif::cl_khr_work_group_uniform_arithmetic[] |
| |
| |
| [[supported-data-types]] |
| == Supported Data Types |
| |
| The following data types are supported. |
| |
| |
| [[built-in-scalar-data-types]] |
| === Built-in Scalar Data Types |
| |
| [open,refpage='scalarDataTypes',desc='Built-in Scalar Data Types',type='freeform',spec='clang',anchor='built-in-scalar-data-types',xrefs='alignmentOfDataTypes halfDataType otherDataTypes reservedDataTypes vectorDataTypes'] |
| -- |
| |
| The following table describes the list of built-in scalar data types. |
| |
| [[table-builtin-scalar-types]] |
| .Built-in Scalar Data Types |
| [cols=",",options="header",] |
| |==== |
| | Type | Description |
| | `bool` footnote:[{fn-bool}] |
| | A conditional data type which is either _true_ or _false_. |
| The value _true_ expands to the integer constant 1 and the value |
| _false_ expands to the integer constant 0. |
| | `char` |
| | A signed two's complement 8-bit integer. |
| | `unsigned char`, `uchar` |
| | An unsigned 8-bit integer. |
| | `short` |
| | A signed two's complement 16-bit integer. |
| | `unsigned short`, `ushort` |
| | An unsigned 16-bit integer. |
| | `int` |
| | A signed two's complement 32-bit integer. |
| | `unsigned int`, `uint` |
| | An unsigned 32-bit integer. |
| | `long` footnote:long[{fn-long}] |
| | A signed two's complement 64-bit integer. |
| | `unsigned long`, `ulong` footnote:long[] |
| | An unsigned 64-bit integer. |
| | `float` |
| | A 32-bit floating-point number. |
The `float` data type must conform to the IEEE 754 single-precision
| storage format. |
| | `double` footnote:[{fn-double}] |
| | A 64-bit floating-point number. |
| The `double` data type must conform to the IEEE 754 double-precision |
| storage format. |
| |
| <<unified-spec, Requires>> support for <<double-precision-support, |
| double-precision>>. |
| | `half` |
| | A 16-bit floating-point number. |
| The `half` data type must conform to the IEEE 754-2008 half-precision |
| storage format. |
| | `size_t` footnote:size_t[{fn-size_t}] |
| | The unsigned integer type of the result of the `sizeof` operator. |
| | `ptrdiff_t` footnote:size_t[] |
| | A signed integer type that is the result of subtracting two |
| pointers. |
| | `intptr_t` footnote:size_t[] |
| | A signed integer type with the property that any valid pointer to |
| `void` can be converted to this type, then converted back to pointer |
| to `void`, and the result will compare equal to the original pointer. |
| | `uintptr_t` footnote:size_t[] |
| | An unsigned integer type with the property that any valid pointer |
| to `void` can be converted to this type, then converted back to |
| pointer to `void`, and the result will compare equal to the original |
| pointer. |
| | `void` |
| | The `void` type comprises an empty set of values; it is an incomplete |
| type that cannot be completed. |
| |==== |
| |
| Most built-in scalar data types are also declared as appropriate types in |
| the OpenCL API (and header files) that can be used by an application. |
The following table describes the built-in scalar data types in the OpenCL C
programming language and the corresponding data types available to the
application:
| |
| [cols=",",options="header",] |
| |==== |
| | Type in OpenCL Language | API type for application |
| | `bool` | n/a |
| | `char` | `cl_char` |
| | `unsigned char`, `uchar` | `cl_uchar` |
| | `short` | `cl_short` |
| | `unsigned short`, `ushort` | `cl_ushort` |
| | `int` | `cl_int` |
| | `unsigned int`, `uint` | `cl_uint` |
| | `long` | `cl_long` |
| | `unsigned long`, `ulong` | `cl_ulong` |
| | `float` | `cl_float` |
| | `double` | `cl_double` footnote:[{fn-cl_double}] |
| | `half` | `cl_half` |
| | `size_t` | n/a |
| | `ptrdiff_t` | n/a |
| | `intptr_t` | n/a |
| | `uintptr_t` | n/a |
| | `void` | `void` |
| |==== |
| -- |
| |
| |
| [[double-precision-support]] |
| ==== Double-Precision Floating-Point Support |
| |
| Double-precision floating-point is supported if |
| ifdef::cl_khr_fp64[the `<<cl_khr_fp64>>` extension macro is supported, or if] |
| OpenCL 1.2 or newer is supported. |
| In OpenCL C 3.0, it also requires support for the {opencl_c_fp64} feature. |
| |
| If double-precision is not supported, implementations may |
| implicitly cast double-precision floating-point literals to |
| single-precision literals. The use of double-precision literals without |
| double-precision support should result in a diagnostic. |
| |
| |
| [[the-half-data-type]] |
| ==== The `half` Data Type |
| |
| [open,refpage='halfDataType',desc='The half Data Type',type='freeform',spec='clang',anchor='the-half-data-type',xrefs='alignmentOfDataTypes otherDataTypes reservedDataTypes scalarDataTypes vectorDataTypes'] |
| -- |
| |
| The `half` data type must be IEEE 754-2008 compliant. |
| `half` numbers have 1 sign bit, 5 exponent bits, and 10 mantissa bits. |
| The interpretation of the sign, exponent and mantissa is analogous to IEEE |
| 754 floating-point numbers. |
| The exponent bias is 15. |
| The `half` data type must represent finite and normal numbers, denormalized |
| numbers, infinities and NaN. |
| Denormalized `half` numbers, which may be produced when converting a |
| `float` to a `half` using *vstore_half* or encountered when converting a |
| `half` to a `float` using *vload_half*, cannot be flushed to zero. |
| Conversions from `float` to `half` correctly round the mantissa to 11 bits |
| of precision. |
| Conversions from `half` to `float` are lossless; all `half` numbers are |
| exactly representable as `float` values. |
| Conversions from `double` to `half` are correctly rounded. |
| Conversions from `half` to `double` are lossless. |
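To make the bit layout above concrete, here is a host-side C99 sketch (not OpenCL C) that decodes a binary16 bit pattern into a `float`, which is the value conversion that *vload_half* performs. The function name `half_bits_to_float` is hypothetical; the sketch only shows how the sign bit, the exponent bias of 15, denormals, infinities and NaN fit together.

```c
#include <math.h>    /* INFINITY, NAN */
#include <stdint.h>
#include <string.h>

/* Decode an IEEE 754-2008 binary16 bit pattern into a float (hypothetical helper). */
float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (h >> 15) & 0x1u;      /* 1 sign bit */
    uint32_t exponent = (h >> 10) & 0x1Fu; /* 5 exponent bits, bias 15 */
    uint32_t mantissa = h & 0x3FFu;        /* 10 mantissa bits */
    float value;

    if (exponent == 0) {
        /* Zero or denormal: no implicit leading 1, value = mantissa * 2^-24.
         * Multiplying by a power of two is exact, so nothing is flushed. */
        value = (float)mantissa * (1.0f / 16777216.0f);
    } else if (exponent == 0x1Fu) {
        /* All-ones exponent: infinity if the mantissa is 0, otherwise NaN. */
        value = (mantissa == 0) ? INFINITY : NAN;
    } else {
        /* Normal: rebias the exponent from 15 to 127 and widen the mantissa
         * from 10 to 23 bits; every normal half fits exactly in a float. */
        uint32_t bits = ((exponent - 15u + 127u) << 23) | (mantissa << 13);
        memcpy(&value, &bits, sizeof value);
    }
    return sign ? -value : value;
}
```

For example, `0x3C00` decodes to `1.0f`, and `0x7BFF` decodes to `65504.0f`, the largest finite `half`.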
| |
| The `half` data type can only be used to declare a pointer to a buffer that |
| contains `half` values. |
| A few valid examples are given below: |
| |
| [source,opencl_c] |
| ---------- |
| void |
| bar (__global half *p) |
| { |
| // ... |
| } |
| |
| __kernel void |
| foo (__global half *pg, __local half *pl) |
| { |
| __global half *ptr; |
| int offset = (int)get_global_id(0); |
| |
| ptr = pg + offset; |
| bar(ptr); |
| } |
| ---------- |
| |
| Below are some examples that are not valid usage of the `half` type: |
| |
| [source,opencl_c] |
| ---------- |
| half a; |
| half b[100]; |
| half *p; |
| a = *p; // not allowed; use the vload_half function instead |
| ---------- |
| |
| Loads from a pointer to a `half` and stores to a pointer to a `half` can be |
| performed using the <<vector-data-load-and-store-functions,vector data load |
| and store functions>> *vload_half*, *vload_half__n__*, *vloada_half__n__* and |
| *vstore_half*, *vstore_half__n__*, and *vstorea_half__n__*. |
| The load functions read scalar or vector `half` values from memory and |
| convert them to a scalar or vector `float` value. |
| The store functions take a scalar or vector `float` value as input, convert |
| it to a `half` scalar or vector value (with appropriate rounding mode) and |
| write the `half` scalar or vector value to memory. |
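The rounding a store function applies can be sketched on the host. The C99 function below (a hypothetical name, not an OpenCL API) converts a `float` to a binary16 bit pattern with round-to-nearest-even, which we assume is the default mode used by *vstore_half*; the `_rte`, `_rtz`, `_rtp` and `_rtn` store variants select other modes. Overflow produces infinity, and small values become denormals rather than being flushed to zero.

```c
#include <stdint.h>
#include <string.h>

/* Convert a float to binary16 bits, round-to-nearest-even (hypothetical helper). */
uint16_t float_to_half_rte(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);                   /* raw float bits */
    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t absx = x & 0x7FFFFFFFu;

    if (absx >= 0x7F800000u)                    /* infinity, or NaN (payload dropped) */
        return (uint16_t)(sign | (absx > 0x7F800000u ? 0x7E00u : 0x7C00u));
    if (absx >= 0x477FF000u)                    /* >= 65520.0f rounds past 65504 */
        return (uint16_t)(sign | 0x7C00u);      /* overflow to infinity */

    if (absx >= 0x38800000u) {                  /* normal half range: >= 2^-14 */
        int e16 = (int)(absx >> 23) - 127 + 15; /* rebias the exponent */
        uint32_t mant = absx & 0x7FFFFFu;
        uint32_t h = ((uint32_t)e16 << 10) | (mant >> 13); /* keep 10 bits */
        uint32_t rem = mant & 0x1FFFu;          /* the 13 discarded bits */
        if (rem > 0x1000u || (rem == 0x1000u && (h & 1u)))
            h++;                                /* a carry may bump the exponent */
        return (uint16_t)(sign | h);
    }

    if (absx <= 0x33000000u)                    /* <= 2^-25 rounds to zero */
        return sign;

    /* Denormal half: value = m * 2^-24, kept instead of flushed. */
    int e = (int)(absx >> 23);                  /* biased exponent, 102..112 */
    uint32_t mant = (absx & 0x7FFFFFu) | 0x800000u; /* implicit leading 1 */
    int shift = 126 - e;                        /* 14..24 */
    uint32_t h = mant >> shift;
    uint32_t rem = mant & ((1u << shift) - 1u);
    uint32_t halfway = 1u << (shift - 1);
    if (rem > halfway || (rem == halfway && (h & 1u)))
        h++;
    return (uint16_t)(sign | h);
}
```

Note the tie case: `1.0f + 2^-11` lies exactly halfway between two `half` values and rounds to the even one, `0x3C00`.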
| -- |
| |
| |
| [[built-in-vector-data-types]] |
| === Built-in Vector Data Types |
| |
| [open,refpage='vectorDataTypes',desc='Built-in Vector Data Types',type='freeform',spec='clang',anchor='built-in-vector-data-types',xrefs='alignmentOfDataTypes otherDataTypes reservedDataTypes scalarDataTypes'] |
| -- |
| |
| The `char`, `unsigned char`, `short`, `unsigned short`, `int`, `unsigned int`, |
| `long`, `unsigned long`, `float` and `double` vector data types are supported. |
| footnote:[{fn-vector-types}] |
| The vector data type is defined with the type name, i.e. `char`, `uchar`, |
| `short`, `ushort`, `int`, `uint`, `long`, `ulong`, `float`, or `double` |
| followed by a literal value _n_ that defines the number of elements in the |
| vector. |
| Supported values of _n_ are 2, 3, 4, 8, and 16 for all vector data types. |
| |
| NOTE: Vector types with three elements, i.e. where _n_ is 3, <<unified-spec, |
| require>> support for OpenCL C 1.1 or newer. |
| |
| The following table describes the list of built-in vector data types. |
| |
| [[table-builtin-vector-types]] |
| .Built-in Vector Data Types |
| [cols=",",options="header",] |
| |==== |
| | Type | Description |
| | `char__n__` |
| | A vector of _n_ 8-bit signed two's complement integer values. |
| | `uchar__n__` |
| | A vector of _n_ 8-bit unsigned integer values. |
| | `short__n__` |
| | A vector of _n_ 16-bit signed two's complement integer values. |
| | `ushort__n__` |
| | A vector of _n_ 16-bit unsigned integer values. |
| | `int__n__` |
| | A vector of _n_ 32-bit signed two's complement integer values. |
| | `uint__n__` |
| | A vector of _n_ 32-bit unsigned integer values. |
| | `long__n__` footnote:long-vec[{fn-long-vec}] |
| | A vector of _n_ 64-bit signed two's complement integer values. |
| | `ulong__n__` footnote:long-vec[] |
| | A vector of _n_ 64-bit unsigned integer values. |
| ifdef::cl_khr_fp16[] |
| | `half__n__` footnote:[{fn-half-supported}] |
| | A vector of _n_ 16-bit floating-point values. |
| endif::cl_khr_fp16[] |
| | `float__n__` |
| | A vector of _n_ 32-bit floating-point values. |
| | `double__n__` footnote:[{fn-double-vec}] |
| | A vector of _n_ 64-bit floating-point values. |
| |
| <<unified-spec, Requires>> support for <<double-precision-support, |
| double-precision>>. |
| |==== |
| |
| The built-in vector data types are also declared as appropriate types in the |
| OpenCL API (and header files) that can be used by an application. |
| The following table lists the built-in vector data types in the OpenCL C |
| programming language and the corresponding data types available to the |
| application: |
| |
| [cols=",",options="header",] |
| |==== |
| | Type in OpenCL Language | API type for application |
| | `char__n__` | `cl_char__n__` |
| | `uchar__n__` | `cl_uchar__n__` |
| | `short__n__` | `cl_short__n__` |
| | `ushort__n__` | `cl_ushort__n__` |
| | `int__n__` | `cl_int__n__` |
| | `uint__n__` | `cl_uint__n__` |
| | `long__n__` | `cl_long__n__` |
| | `ulong__n__` | `cl_ulong__n__` |
| ifdef::cl_khr_fp16[] |
| | `half__n__` | `cl_half__n__` |
| endif::cl_khr_fp16[] |
| | `float__n__` | `cl_float__n__` |
| | `double__n__` | `cl_double__n__` |
| |==== |
| -- |
| |
| |
| [[other-built-in-data-types]] |
| === Other Built-in Data Types |
| |
| [open,refpage='otherDataTypes',desc='Other Built-in Data Types',type='freeform',spec='clang',anchor='other-built-in-data-types',xrefs='alignmentOfDataTypes reservedDataTypes scalarDataTypes vectorDataTypes'] |
| -- |
| |
| The following table describes the list of additional data types supported by |
| OpenCL. |
| |
| [[table-other-builtin-types]] |
| .Other Built-in Data Types |
| [cols=",",options="header",] |
| |==== |
| | Type | Description |
| | `image2d_t` footnote:image-functions[{fn-image-functions}] |
| | A 2D image. |
| | `image3d_t` footnote:image-functions[] |
| | A 3D image. |
| | `image2d_array_t` footnote:image-functions[] |
| | A 2D image array. |
| |
| <<unified-spec, Requires>> support for OpenCL C 1.2 or newer. |
| | `image1d_t` footnote:image-functions[] |
| | A 1D image. |
| |
| <<unified-spec, Requires>> support for OpenCL C 1.2 or newer. |
| | `image1d_buffer_t` footnote:image-functions[] |
| | A 1D image created from a buffer object. |
| |
| <<unified-spec, Requires>> support for OpenCL C 1.2 or newer. |
| | `image1d_array_t` footnote:image-functions[] |
| | A 1D image array. |
| |
| <<unified-spec, Requires>> support for OpenCL C 1.2 or newer. |
| | `image2d_depth_t` footnote:image-functions[] |
| | A 2D depth image. |
| |
| <<unified-spec, Requires>> support for OpenCL C 2.0 or newer, or for |
| the `<<cl_khr_depth_images>>` extension macro. |
| | `image2d_array_depth_t` footnote:image-functions[] |
| | A 2D depth image array. |
| |
| <<unified-spec, Requires>> support for OpenCL C 2.0 or newer, or for |
| the `<<cl_khr_depth_images>>` extension macro. |
| | `sampler_t` footnote:image-functions[] |
| | A sampler type. |
| | `queue_t` |
| | A device command-queue. |
| This queue can only be used to enqueue commands from kernels executing |
| on the device. |
| |
| <<unified-spec, Requires>> support for OpenCL C 2.0, or OpenCL C 3.0 or |
| newer and the {opencl_c_device_enqueue} feature. |
| | `ndrange_t` |
| | The N-dimensional range over which a kernel executes. |
| |
| <<unified-spec, Requires>> support for OpenCL C 2.0, or OpenCL C 3.0 or |
| newer and the {opencl_c_device_enqueue} feature. |
| | `clk_event_t` |
| | A device-side event that identifies a command enqueued to |
| a device command-queue. |
| |
| <<unified-spec, Requires>> support for OpenCL C 2.0, or OpenCL C 3.0 or |
| newer and the {opencl_c_device_enqueue} feature. |
| | `reserve_id_t` |
| | A reservation ID. |
| This opaque type is used to identify the reservation for |
| <<pipe-functions,reading and writing a pipe>>. |
| |
| <<unified-spec, Requires>> support for OpenCL C 2.0, or OpenCL C 3.0 or |
| newer and the {opencl_c_pipes} feature. |
| | `event_t` |
| | An event. |
| This can be used to identify <<async-copies,async copies>> from |
| `global` to `local` memory and vice-versa. |
| | `cl_mem_fence_flags` |
| | This is a bitfield and can be 0 or a combination of the following |
| values ORed together: |
| |
| `CLK_GLOBAL_MEM_FENCE` + |
| `CLK_LOCAL_MEM_FENCE` + |
| `CLK_IMAGE_MEM_FENCE` |
| |
| These flags are described in detail in the |
| <<synchronization-functions, synchronization functions>> section. |
| ifdef::cl_khr_gl_msaa_sharing[] |
| | `image2d_msaa_t` |
| | A 2D multi-sample color image. |
| Refer to the <<built-in-image-sampler-less-read-functions, Built-in |
| Image Sampler-less Read Functions>> section for a detailed description |
| of the built-in functions that use this type. |
| |
| <<unified-spec, Requires>> support for the |
| `<<cl_khr_gl_msaa_sharing>>` extension macro. |
| | `image2d_array_msaa_t` |
| | A 2D multi-sample color image array. |
| Refer to the <<built-in-image-sampler-less-read-functions, Built-in |
| Image Sampler-less Read Functions>> section for a detailed description |
| of the built-in functions that use this type. |
| |
| <<unified-spec, Requires>> support for the |
| `<<cl_khr_gl_msaa_sharing>>` extension macro. |
| | `image2d_msaa_depth_t` |
| | A 2D multi-sample depth image. |
| Refer to the <<built-in-image-sampler-less-read-functions, Built-in |
| Image Sampler-less Read Functions>> section for a detailed description |
| of the built-in functions that use this type. |
| |
| <<unified-spec, Requires>> support for the |
| `<<cl_khr_gl_msaa_sharing>>` extension macro. |
| | `image2d_array_msaa_depth_t` |
| | A 2D multi-sample depth image array. |
| Refer to the <<built-in-image-sampler-less-read-functions, Built-in |
| Image Sampler-less Read Functions>> section for a detailed description |
| of the built-in functions that use this type. |
| |
| <<unified-spec, Requires>> support for the |
| `<<cl_khr_gl_msaa_sharing>>` extension macro. |
| endif::cl_khr_gl_msaa_sharing[] |
| |==== |
| |
| [NOTE] |
| ==== |
| The `image2d_t`, `image3d_t`, `image2d_array_t`, `image1d_t`, |
| `image1d_buffer_t`, `image1d_array_t`, `image2d_depth_t`, |
| `image2d_array_depth_t` and `sampler_t` types are only defined if the device |
| supports images, i.e. the value of the <<opencl-device-queries, |
| `CL_DEVICE_IMAGE_SUPPORT` device query>> is `CL_TRUE`. |
| In that case, an OpenCL C 3.0 or newer compiler must also define |
| the {opencl_c_images} feature macro. |
| ==== |
| |
| The C99 derived types (arrays, structs, unions, functions, and pointers), |
| constructed from the built-in <<built-in-scalar-data-types,scalar>>, |
| <<built-in-vector-data-types,vector>>, and |
| <<other-built-in-data-types,other>> data types are supported, with specified |
| <<restrictions,restrictions>>. |
| |
| The following table lists the other built-in data types described in |
| <<table-other-builtin-types,Other Built-in Data Types>> and the corresponding |
| data types available to the application: |
| |
| [cols=",",options="header",] |
| |==== |
| | Type in OpenCL C | API type for application |
| | `image2d_t` | `cl_mem` |
| | `image3d_t` | `cl_mem` |
| | `image2d_array_t` | `cl_mem` |
| | `image1d_t` | `cl_mem` |
| | `image1d_buffer_t` | `cl_mem` |
| | `image1d_array_t` | `cl_mem` |
| | `image2d_depth_t` | `cl_mem` |
| | `image2d_array_depth_t` | `cl_mem` |
| | `sampler_t` | `cl_sampler` |
| | `queue_t` | `cl_command_queue` |
| | `ndrange_t` | N/A |
| | `clk_event_t` | N/A |
| | `reserve_id_t` | N/A |
| | `event_t` | N/A |
| | `cl_mem_fence_flags` | N/A |
| |==== |
| -- |
| |
| |
| [[reserved-data-types]] |
| === Reserved Data Types |
| |
| [open,refpage='reservedDataTypes',desc='Reserved Data Types',type='freeform',spec='clang',anchor='reserved-data-types',xrefs='alignmentOfDataTypes otherDataTypes scalarDataTypes vectorDataTypes'] |
| -- |
| |
| The data type names described in the following table are reserved and cannot |
| be used by applications as type names. |
| The vector data type names defined in <<table-builtin-vector-types,Built-in |
| Vector Data Types>>, but where _n_ is any value other than 2, 3, 4, 8 and 16, |
| are also reserved. |
| |
| [[table-reserved-types]] |
| .Reserved Data Types |
| [cols=",",options="header",] |
| |==== |
| | Type | Description |
| | `bool__n__` |
| | A boolean vector. |
| | `half__n__` |
| | A 16-bit floating-point vector. |
| | `quad`, `quad__n__` |
| | A 128-bit floating-point scalar and vector. |
| | `complex half`, `complex half__n__` |
| | A complex 16-bit floating-point scalar and vector. |
| | `imaginary half`, `imaginary half__n__` |
| | An imaginary 16-bit floating-point scalar and vector. |
| | `complex float`, `complex float__n__` |
| | A complex 32-bit floating-point scalar and vector. |
| | `imaginary float`, `imaginary float__n__` |
| | An imaginary 32-bit floating-point scalar and vector. |
| | `complex double`, `complex double__n__` |
| | A complex 64-bit floating-point scalar and vector. |
| | `imaginary double`, `imaginary double__n__` |
| | An imaginary 64-bit floating-point scalar and vector. |
| | `complex quad`, `complex quad__n__` |
| | A complex 128-bit floating-point scalar and vector. |
| | `imaginary quad`, `imaginary quad__n__` |
| | An imaginary 128-bit floating-point scalar and vector. |
| | `float__n__x__m__` |
| | An _n_ {times} _m_ matrix of single-precision floating-point values |
| stored in column-major order. |
| | `double__n__x__m__` |
| | An _n_ {times} _m_ matrix of double-precision floating-point values |
| stored in column-major order. |
| | `long double`, `long double__n__` |
| | A floating-point scalar and vector type with at least as much |
| precision and range as a `double` and no more precision and range than |
| a quad. |
| | `long long`, `long long__n__` |
| | A 128-bit signed integer scalar and vector. |
| | `unsigned long long`, |
| `ulong long`, |
| `ulong long__n__` |
| | A 128-bit unsigned integer scalar and vector. |
| |==== |
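The reserved-width rule for vector type names stated before this table can be expressed as a small check. The C99 sketch below uses a hypothetical helper name and is not part of any OpenCL API; it only covers the element types from the built-in vector table and does not model the other reserved names listed here, such as `half__n__` or `quad__n__`.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if s is a non-empty run of decimal digits. */
static int all_digits(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s; ++s)
        if (!isdigit((unsigned char)*s))
            return 0;
    return 1;
}

/* Returns 1 for a built-in vector type name ("float4"), 0 for a reserved
 * name of the same pattern with another width ("float5"), -1 otherwise. */
int classify_vector_name(const char *name)
{
    static const char *const elems[] = {
        "char", "uchar", "short", "ushort", "int", "uint",
        "long", "ulong", "float", "double"
    };
    static const char *const widths[] = { "2", "3", "4", "8", "16" };

    for (size_t i = 0; i < sizeof elems / sizeof elems[0]; ++i) {
        size_t len = strlen(elems[i]);
        if (strncmp(name, elems[i], len) != 0 || !all_digits(name + len))
            continue;                      /* wrong element type or no width */
        for (size_t w = 0; w < sizeof widths / sizeof widths[0]; ++w)
            if (strcmp(name + len, widths[w]) == 0)
                return 1;                  /* supported width */
        return 0;                          /* any other width is reserved */
    }
    return -1;
}
```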
| -- |
| |
| |
| [[alignment-of-types]] |
| === Alignment of Types |
| |
| [open,refpage='alignmentOfDataTypes',desc='Alignment of Data Types',type='freeform',spec='clang',anchor='alignment-of-types',xrefs='otherDataTypes reservedDataTypes scalarDataTypes vectorDataTypes'] |
| -- |
| |
| A data item declared to be a data type in memory is always aligned to the |
| size of the data type in bytes. |
| For example, a `float4` variable will be aligned to a 16-byte boundary, and a |
| `char2` variable will be aligned to a 2-byte boundary. |
| |
| For 3-component vector data types, the size of the data type is `4 * |
| sizeof(component)`. |
| This means that a 3-component vector data type will be aligned to a `4 * |
| sizeof(component)` boundary. |
| The *vload3* and *vstore3* built-in functions can be used to read and write |
| 3-component vector data types from and to an array of packed scalar values, |
| respectively. |
| |
| A built-in data type that is not a power of two bytes in size must be |
| aligned to the next larger power of two. |
| This rule applies to built-in types only, not structs or unions. |
| |
| The OpenCL compiler is responsible for aligning data items to the |
| appropriate alignment as required by the data type. |
| For arguments to a `{kernel}` function declared to be a pointer to a data |
| type, the OpenCL compiler can assume that the pointee is always |
| appropriately aligned as required by the data type. |
| The behavior of an unaligned load or store is undefined, except for the |
| <<vector-data-load-and-store-functions,vector data load and store |
| functions>> *vload__n__*, *vload_half__n__*, *vstore__n__*, and |
| *vstore_half__n__*. |
| The vector load functions can read a vector from an address aligned to the |
| element type of the vector. |
| The vector store functions can write a vector to an address aligned to the |
| element type of the vector. |
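The size and alignment rules above reduce to a few lines of arithmetic. This host-side C99 sketch (hypothetical helper names, not an OpenCL API) computes the storage size and required alignment of a built-in scalar or vector type from its component size and lane count, and shows the weaker element-type alignment that *vload__n__* and *vstore__n__* accept.

```c
#include <stddef.h>
#include <stdint.h>

/* Storage size of a built-in scalar (n == 1) or vector type whose lanes are
 * elem_size bytes; a 3-component vector occupies 4 * sizeof(component). */
size_t builtin_type_size(size_t elem_size, int n)
{
    return elem_size * (size_t)(n == 3 ? 4 : n);
}

/* Required alignment: the size, rounded up to the next power of two if it is
 * not one already (a no-op for the sizes built-in types actually have). */
size_t builtin_type_alignment(size_t elem_size, int n)
{
    size_t size = builtin_type_size(elem_size, n);
    size_t align = 1;
    while (align < size)
        align <<= 1;
    return align;
}

/* vloadn / vstoren only need the address aligned to the element type. */
int is_aligned_for_vloadn(uintptr_t addr, size_t elem_size)
{
    return addr % elem_size == 0;
}
```

For instance, `float3` has component size 4 and 3 lanes, giving a size and alignment of 16 bytes, while a *vload3* from a packed `float` array only needs 4-byte alignment.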
| -- |
| |
| |
| [[vector-literals]] |
| === Vector Literals |
| |
| Vector literals can be used to create vectors from a list of scalars, |
| vectors or a mixture thereof. |
| A vector literal can be used either as a vector initializer or as a primary |
| expression. |
| Whether a vector literal can be used as an l-value is implementation-defined. |
| |
| A vector literal is written as a parenthesized vector type followed by a |
| parenthesized comma-delimited list of parameters. |
| A vector literal operates as an overloaded function. |
| The available forms of the function are the set of possible argument |
| lists for which all arguments have the same element type as the result |
| vector, and the total number of elements is equal to the number of elements |
| in the result vector. |
| In addition, a form with a single scalar of the same type as the element |
| type of the vector is available. |
| For example, the following forms are available for `float4`: |
| |
| [source,opencl_c] |
| ---------- |
| (float4)( float, float, float, float ) |
| (float4)( float2, float, float ) |
| (float4)( float, float2, float ) |
| (float4)( float, float, float2 ) |
| (float4)( float2, float2 ) |
| (float4)( float3, float ) |
| (float4)( float, float3 ) |
| (float4)( float ) |
| ---------- |
| |
| Operands are evaluated by standard rules for function evaluation, except |
| that implicit scalar widening shall not occur. |
| The order in which the operands are evaluated is undefined. |
| The operands are assigned to their respective positions in the result vector |
| as they appear in memory order. |
| That is, the first element of the first operand is assigned to `result.x`, |
| the second element of the first operand (or the first element of the second |
| operand if the first operand was a scalar) is assigned to `result.y`, etc. |
| In the case of the form that has a single scalar operand, the operand is |
| replicated across all lanes of the vector. |
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
| float4 f = (float4)(1.0f, 2.0f, 3.0f, 4.0f); |
| uint4 u = (uint4)(1); // u will be (1, 1, 1, 1). |
| float4 f2 = (float4)((float2)(1.0f, 2.0f), (float2)(3.0f, 4.0f)); |
| float4 f3 = (float4)(1.0f, (float2)(2.0f, 3.0f), 4.0f); |
| float4 f4 = (float4)(1.0f, 2.0f); // error: only two of four elements given |
| ---------- |
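The operand placement rule can be modeled on the host with plain arrays. In this C99 sketch, `literal_operand` and `build_vector_literal` are hypothetical names, and every operand is assumed to already have `float` elements, so the no-implicit-widening rule is not modeled. Elements are copied in memory order, and the single-scalar form is replicated across all lanes.

```c
/* One operand of a vector literal: `width` consecutive float elements. */
typedef struct {
    const float *elems;
    int width;           /* 1 for a scalar, n for an n-component vector */
} literal_operand;

/* Fill result[0..result_n-1] from the operands the way a vector literal
 * does. Returns 0 on success, or -1 if the operand list is not a valid form. */
int build_vector_literal(float *result, int result_n,
                         const literal_operand *ops, int num_ops)
{
    /* Single-scalar form: replicate the scalar across all lanes. */
    if (num_ops == 1 && ops[0].width == 1) {
        for (int lane = 0; lane < result_n; ++lane)
            result[lane] = ops[0].elems[0];
        return 0;
    }

    /* Otherwise the element counts must add up exactly to result_n. */
    int total = 0;
    for (int i = 0; i < num_ops; ++i)
        total += ops[i].width;
    if (total != result_n)
        return -1;                       /* e.g. (float4)(1.0f, 2.0f) */

    /* Copy in memory order: the first element goes to result.x, and so on. */
    int lane = 0;
    for (int i = 0; i < num_ops; ++i)
        for (int j = 0; j < ops[i].width; ++j)
            result[lane++] = ops[i].elems[j];
    return 0;
}
```

Calling it with operands `1.0f`, `{2.0f, 3.0f}` and `4.0f` mirrors `(float4)(1.0f, (float2)(2.0f, 3.0f), 4.0f)` and yields `(1.0f, 2.0f, 3.0f, 4.0f)`.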
| |
| |
| [[vector-components]] |
| === Vector Components |
| |
| The components of vector data types can be addressed as |
| `<vector_data_type>.xyzw`. |
| Vector data types with two or more components, such as `char2`, can access `.xy` elements. |
| Vector data types with three or more components, such as `uint3`, can access `.xyz` elements. |
| Vector data types with four or more components, such as `ulong4` or `float8`, can access `.xyzw` elements. |
| |
| In OpenCL C 3.0, the components of vector data types can also be addressed as |
| `<vector_data_type>.rgba`. |
| Vector data types with two or more components can access `.rg` elements. |
| Vector data types with three or more components can access `.rgb` elements. |
| Vector data types with four or more components can access `.rgba` elements. |
| |
| Accessing components beyond those declared for the vector type is an error. |
| For example: |
| |
| [source,opencl_c] |
| ---------- |
| float2 coord; |
| coord.x = 1.0f; // is legal |
| coord.r = 1.0f; // is legal in OpenCL C 3.0 |
| coord.z = 1.0f; // is illegal, since coord only has two components |
| |
| float3 pos; |
| pos.z = 1.0f; // is legal |
| pos.b = 1.0f; // is legal in OpenCL C 3.0 |
| pos.w = 1.0f; // is illegal, since pos only has three components |
| ---------- |
| |
| The component selection syntax allows multiple components to be selected by |
| appending their names after the period (*.*). |
| |
| [source,opencl_c] |
| ---------- |
| float4 c; |
| |
| c.xyzw = (float4)(1.0f, 2.0f, 3.0f, 4.0f); |
| c.z = 1.0f; |
| c.xy = (float2)(3.0f, 4.0f); |
| c.xyz = (float3)(3.0f, 4.0f, 5.0f); |
| ---------- |
| |
| The component selection syntax also allows components to be permuted or |
| replicated. |
| |
| [source,opencl_c] |
| ---------- |
| float4 pos = (float4)(1.0f, 2.0f, 3.0f, 4.0f); |
| |
| float4 swiz = pos.wzyx; // swiz = (4.0f, 3.0f, 2.0f, 1.0f) |
| |
| float4 dup = pos.xxyy; // dup = (1.0f, 1.0f, 2.0f, 2.0f) |
| ---------- |
| |
| The component group notation can occur on the left-hand side of an |
| expression. |
| To form an l-value, swizzling must be applied to an l-value of vector type |
| and must not contain duplicate components; the result is an l-value of |
| scalar or vector type, depending on the number of components specified. |
| Each component selection must form a supported scalar or vector type. |
| |
| [source,opencl_c] |
| ---------- |
| float4 pos = (float4)(1.0f, 2.0f, 3.0f, 4.0f); |
| |
| pos.xw = (float2)(5.0f, 6.0f); // pos = (5.0f, 2.0f, 3.0f, 6.0f) |
| pos.wx = (float2)(7.0f, 8.0f); // pos = (8.0f, 2.0f, 3.0f, 7.0f) |
| pos.xyz = (float3)(3.0f, 5.0f, 9.0f); // pos = (3.0f, 5.0f, 9.0f, 7.0f) |
| pos.xx = (float2)(3.0f, 4.0f); // illegal - 'x' used twice |
| |
| // illegal - mismatch between float2 and float4 |
| pos.xy = (float4)(1.0f, 2.0f, 3.0f, 4.0f); |
| |
| float4 a, b, c, d; |
| float16 x; |
| x = (float16)(a, b, c, d); |
| x = (float16)(a.xxxx, b.xyz, c.xyz, d.xyz, a.yzw); |
| |
| // illegal - component a.xxxxxxx is not a valid vector type |
| x = (float16)(a.xxxxxxx, b.xyz, c.xyz, d.xyz); |
| ---------- |
| |
| Elements of vector data types can also be accessed using a numeric index to |
| refer to the appropriate element in the vector. |
| The numeric indices that can be used are given in the table below: |
| |
| [[table-vector-indices]] |
| .Numeric indices for built-in vector data types |
| [width="100%",cols="<34%,<66%",options="header"] |
| |==== |
| | Vector Components | Numeric indices that can be used |
| | 2-component | 0, 1 |
| | 3-component | 0, 1, 2 |
| | 4-component | 0, 1, 2, 3 |
| | 8-component | 0, 1, 2, 3, 4, 5, 6, 7 |
| | 16-component | 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, |
| a, A, b, B, c, C, d, D, e, E, f, F |
| |==== |
| |
| The numeric indices must be preceded by the letter `s` or `S`. |
| |
| In the following example |
| |
| [source,opencl_c] |
| ---------- |
| float8 f; |
| ---------- |
| |
| `f.s0` refers to the 1^st^ element of the `float8` variable `f` and `f.s7` |
| refers to the 8^th^ element of the `float8` variable `f`. |
| |
| In the following example |
| |
| [source,opencl_c] |
| ---------- |
| float16 x; |
| ---------- |
| |
| `x.sa` (or `x.sA`) refers to the 11^th^ element of the `float16` variable |
| `x` and `x.sf` (or `x.sF`) refers to the 16^th^ element of the `float16` |
| variable `x`. |
| |
| The numeric indices used to refer to an appropriate element in the vector |
| cannot be intermixed with the `.xyzw` notation within the same component |
| selection. |
| |
| For example |
| |
| [source,opencl_c] |
| ---------- |
| float4 f, a; |
| |
| a = f.x12w; // illegal use of numeric indices with .xyzw |
| |
| a.xyzw = f.s0123; // valid |
| ---------- |
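The selection rules in this section can be collected into a small validator. The C99 sketch below is a hypothetical host-side model, not compiler code: it parses either `.xyzw` letters or the `s`/`S` numeric form, rejects out-of-range lanes and mixed notations, requires the selection to form a supported width, and separately checks the no-duplicates rule for l-values. The `.rgba`, `.lo`/`.hi` and `.even`/`.odd` forms are deliberately left out.

```c
#include <string.h>

/* Parse a selector (the text after '.') for a vector with vec_n lanes.
 * Stores lane indices in out and returns how many were selected, or -1. */
int parse_swizzle(const char *sel, int vec_n, int out[16])
{
    static const char letters[] = "xyzw";
    int count = 0;

    if (sel[0] == 's' || sel[0] == 'S') {
        /* Numeric form: hex digits 0-9, a-f, A-F after the 's'. */
        for (const char *p = sel + 1; *p != '\0'; ++p) {
            int idx;
            if (*p >= '0' && *p <= '9')      idx = *p - '0';
            else if (*p >= 'a' && *p <= 'f') idx = *p - 'a' + 10;
            else if (*p >= 'A' && *p <= 'F') idx = *p - 'A' + 10;
            else return -1;                  /* e.g. a letter such as 'x' */
            if (idx >= vec_n || count == 16) return -1;
            out[count++] = idx;
        }
    } else {
        /* Letter form: x, y, z, w name lanes 0 to 3. */
        for (const char *p = sel; *p != '\0'; ++p) {
            const char *hit = strchr(letters, *p);
            if (hit == NULL) return -1;      /* e.g. the digits in "x12w" */
            int idx = (int)(hit - letters);
            if (idx >= vec_n || count == 16) return -1;
            out[count++] = idx;
        }
    }

    /* The selection must itself form a supported scalar or vector type. */
    switch (count) {
    case 1: case 2: case 3: case 4: case 8: case 16: return count;
    default: return -1;
    }
}

/* An l-value selection must not name the same lane twice (pos.xx = ...). */
int is_lvalue_swizzle(const int *idx, int count)
{
    for (int i = 0; i < count; ++i)
        for (int j = i + 1; j < count; ++j)
            if (idx[i] == idx[j])
                return 0;
    return count > 0;
}
```

With this model, `parse_swizzle("wzyx", 4, idx)` yields lanes 3, 2, 1, 0, and `"x12w"` is rejected just as in the example above.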
| |
| Vector data types can use the `.lo` (or `.even`) and `.hi` (or `.odd`) |
| suffixes to get smaller vector types or to combine smaller vector types into a |
| larger vector type. |
| Multiple levels of `.lo` (or `.even`) and `.hi` (or `.odd`) suffixes can be |
| used until they refer to a scalar term. |
| |
| The `.lo` suffix refers to the lower half of a given vector. |
| The `.hi` suffix refers to the upper half of a given vector. |
| |
| The `.even` suffix refers to the even elements of a vector. |
| The `.odd` suffix refers to the odd elements of a vector. |
| |
| Some examples to help illustrate this are given below: |
| |
| [source,opencl_c] |
| ---------- |
| float4 vf; |
| |
| float2 low = vf.lo; // returns vf.xy |
| float2 high = vf.hi; // returns vf.zw |
| |
| float2 even = vf.even; // returns vf.xz |
| float2 odd = vf.odd; // returns vf.yw |
| ---------- |
| |
| The suffixes `.lo` (or `.even`) and `.hi` (or `.odd`) for a 3-component |
| vector type operate as if the 3-component vector type is a 4-component |
| vector type with the value in the `w` component undefined. |
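The lanes picked by each suffix can be spelled out explicitly. This C99 sketch uses hypothetical names and models only the index sets, not the values: it returns the lane indices that `.lo`, `.hi`, `.even` and `.odd` select from an _n_-lane vector, padding a 3-lane vector to 4 lanes as described above, so index 3 in a 3-lane result refers to the undefined `w` lane.

```c
enum half_selector { SEL_LO, SEL_HI, SEL_EVEN, SEL_ODD };

/* Writes the selected lane indices to out and returns how many there are. */
int select_lanes(enum half_selector sel, int n, int out[8])
{
    int padded = (n == 3) ? 4 : n;   /* 3-component vectors act as 4 */
    int count = padded / 2;
    for (int k = 0; k < count; ++k) {
        switch (sel) {
        case SEL_LO:   out[k] = k;         break;  /* lower half */
        case SEL_HI:   out[k] = count + k; break;  /* upper half */
        case SEL_EVEN: out[k] = 2 * k;     break;  /* lanes 0, 2, ... */
        case SEL_ODD:  out[k] = 2 * k + 1; break;  /* lanes 1, 3, ... */
        }
    }
    return count;
}
```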
| |
| Some examples are given below: |
| |
| [source,opencl_c] |
| ---------- |
| float8 vf; |
| float4 odd = vf.odd; |
| float4 even = vf.even; |
| float2 high = vf.even.hi; |
| float2 low = vf.odd.lo; |
| |
| // interleave LR stereo stream |
| float4 left, right; |
| float8 interleaved; |
| interleaved.even = left; |
| interleaved.odd = right; |
| |
| // deinterleave |
| left = interleaved.even; |
| right = interleaved.odd; |
| |
| // transpose a 4x4 matrix |
| |
| void transpose( float4 m[4] ) |
| { |
| // read matrix into a float16 vector |
| float16 x = (float16)( m[0], m[1], m[2], m[3] ); |
| float16 t; |
| |
| // transpose |
| t.even = x.lo; |
| t.odd = x.hi; |
| x.even = t.lo; |
| x.odd = t.hi; |
| |
| // write back |
| m[0] = x.lo.lo; // { m[0][0], m[1][0], m[2][0], m[3][0] } |
| m[1] = x.lo.hi; // { m[0][1], m[1][1], m[2][1], m[3][1] } |
| m[2] = x.hi.lo; // { m[0][2], m[1][2], m[2][2], m[3][2] } |
| m[3] = x.hi.hi; // { m[0][3], m[1][3], m[2][3], m[3][3] } |
| } |
| |
| float3 v3 = (float3)(1.0f, 2.0f, 3.0f); |
| float2 low3 = v3.lo; // (1.0f, 2.0f) |
| float2 high3 = v3.hi; // (3.0f, undefined) |
| ---------- |
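To confirm that the `transpose` example in the block above really produces the transpose, the same four `.even`/`.odd` assignments can be replayed on plain arrays. In this host-side C99 model, `set_even`, `set_odd` and `transpose4x4` are hypothetical names, and lane _k_ of a `float16` is element _k_ of a 16-element array.

```c
/* dst.even = src (8 lanes) and dst.odd = src on 16-lane arrays. */
static void set_even(float dst[16], const float src[8])
{
    for (int k = 0; k < 8; ++k)
        dst[2 * k] = src[k];
}

static void set_odd(float dst[16], const float src[8])
{
    for (int k = 0; k < 8; ++k)
        dst[2 * k + 1] = src[k];
}

void transpose4x4(float m[4][4])
{
    float x[16], t[16];

    /* (float16)(m[0], m[1], m[2], m[3]): rows laid out in memory order. */
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            x[4 * i + j] = m[i][j];

    set_even(t, x);      /* t.even = x.lo (x.lo is x[0..7])  */
    set_odd(t, x + 8);   /* t.odd  = x.hi (x.hi is x[8..15]) */
    set_even(x, t);      /* x.even = t.lo */
    set_odd(x, t + 8);   /* x.odd  = t.hi */

    /* m[0] = x.lo.lo, m[1] = x.lo.hi, m[2] = x.hi.lo, m[3] = x.hi.hi */
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            m[i][j] = x[4 * i + j];
}
```

Filling `m[i][j]` with `4 * i + j` and calling `transpose4x4` leaves `m[i][j] == 4 * j + i` for every entry.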
| |
| Taking the address of a vector element is illegal and results in a |
| compilation error. |
| For example: |
| |
| [source,opencl_c] |
| ---------- |
| float8 vf; |
| |
| float *f = &vf.x; // is illegal |
| float2 *f2 = &vf.s07; // is illegal |
| |
| float4 *odd = &vf.odd; // is illegal |
| float4 *even = &vf.even; // is illegal |
| float2 *high = &vf.even.hi; // is illegal |
| float2 *low = &vf.odd.lo; // is illegal |
| ---------- |
| |
| |
| [[aliasing-rules]] |
| === Aliasing Rules |
| |
| OpenCL C programs shall comply with the C99 type-based aliasing rules |
| defined in <<C99-spec,section 6.5, item 7 of the C99 Specification>>. |
| The OpenCL C built-in vector data types are considered aggregate types |
| footnote:[{fn-aggregate-types}] for the purpose of applying these |
| aliasing rules. |
| |
| [[keywords]] |
| === Keywords |
| |
| The following names are reserved for use as keywords in OpenCL C and shall |
| not be used otherwise. |
| |
| * Names reserved as keywords by C99. |
| * OpenCL C data types defined in <<table-builtin-vector-types,Built-in Vector |
| Data Types>>, |
| <<table-other-builtin-types,Other Built-in Data Types>>, and |
| <<table-reserved-types,Reserved Data Types>>. |
| * Address space qualifiers: `{global}`, `global`, `{local}`, `local`, |
| `{constant}`, `constant`, `{private}`, and `private`. |
| `{generic}` and `generic` are reserved for future use. |
| * Function qualifiers: `{kernel}` and `kernel`. |
| * Access qualifiers: `{read_only}`, `read_only`, `{write_only}`, |
| `write_only`, `{read_write}` and `read_write`. |
| * `uniform`, `pipe`. |
| |
| |
| [[conversions-and-type-casting]] |
| == Conversions and Type Casting |
| |
| |
| [[implicit-conversions]] |
| === Implicit Conversions |
| |
| Implicit conversions between scalar built-in types defined in |
| <<table-builtin-scalar-types,Built-in Scalar Data Types>> (except `void` and |
| `half` footnote:[{fn-cl_khr_fp16}]) are supported. |
| When an implicit conversion is done, it is not just a re-interpretation of |
| the expression's value but a conversion of that value to an equivalent value |
| in the new type. |
| For example, the integer value 5 will be converted to the floating-point |
| value 5.0. |
| |
| Implicit conversions from a scalar type to a vector type are allowed. |
| In this case, the scalar may be subject to the usual arithmetic conversion |
| to the element type used by the vector. |
| The scalar type is then widened to the vector. |
| |
| Implicit conversions between built-in vector data types are disallowed. |
| |
| Implicit conversions for pointer types follow the rules described in the |
| <<C99-spec,C99 Specification>>. |
| |
| |
| [[explicit-casts]] |
| === Explicit Casts |
| |
| Standard typecasts for built-in scalar data types defined in |
| <<table-builtin-scalar-types,Built-in Scalar Data Types>> will perform |
| appropriate conversion (except `void` and `half` footnote:[{fn-cl_khr_fp16}]). |
| In the example below: |
| |
| [source,opencl_c] |
| ---------- |
| float f = 1.0f; |
| int i = (int)f; |
| ---------- |
| |
| `f` stores `0x3F800000` and `i` stores `0x1` which is the floating-point |
| value `1.0f` in `f` converted to an integer value. |
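The difference between converting a value and reinterpreting its bits can be checked in plain C99, whose scalar casts also convert values. In this sketch, `float_bits` and `cast_to_int` are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw IEEE 754 bits of f: a reinterpretation, not a conversion. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* A cast converts the value: 1.0f becomes the integer 1, not 0x3F800000. */
int32_t cast_to_int(float f)
{
    return (int32_t)f;
}
```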
| |
| Explicit casts between vector types are not legal. |
| The examples below will generate a compilation error. |
| |
| [source,opencl_c] |
| ---------- |
| int4 i; |
| uint4 u = (uint4) i; // not allowed |
| |
| float4 f; |
| int4 i2 = (int4) f; // not allowed |
| |
| float4 f2; |
| int8 i3 = (int8) f2; // not allowed |
| ---------- |
| |
| Scalar to vector conversions may be performed by casting the scalar to the |
| desired vector data type. |
| Type casting will also perform appropriate arithmetic conversion. |
| The round to zero rounding mode will be used for conversions to built-in |
| integer vector types. |
| The default rounding mode will be used for conversions to floating-point |
| vector types. |
| When casting a `bool` to a vector integer data type, the vector components |
| will be set to -1 (i.e. all bits set) if the bool value is _true_ and 0 |
| otherwise. |
| |
| Below are some correct examples of explicit casts. |
| |
| [source,opencl_c] |
| ---------- |
| float f = 1.0f; |
| float4 va = (float4)f; |
| |
| // va is a float4 vector with elements (f, f, f, f). |
| |
| uchar u = 0xFF; |
| float4 vb = (float4)u; |
| |
| // vb is a float4 vector with elements |
| // ((float)u, (float)u, (float)u, (float)u). |
| |
| float g = 2.0f; |
| int2 vc = (int2)g; |
| |
| // vc is an int2 vector with elements ((int)g, (int)g). |
| |
| uchar4 vtrue = (uchar4)true; |
| |
| // vtrue is a uchar4 vector with elements (0xff, 0xff, |
| // 0xff, 0xff). |
| ---------- |
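A host-side C99 model of these scalar-to-vector casts is shown below, using arrays in place of vector types. The names are hypothetical. C's own float-to-integer cast truncates toward zero, which matches the rounding rule stated above for integer vector destinations, and a `true` value becomes all bits set in each lane.

```c
#include <stdint.h>

/* Model of (int4)f: every lane gets the converted scalar, rounded toward zero. */
void splat_float_to_int4(float f, int32_t out[4])
{
    for (int lane = 0; lane < 4; ++lane)
        out[lane] = (int32_t)f;          /* C casts truncate toward zero */
}

/* Model of (uchar4)b for a bool b: true -> -1 (0xFF, all bits set), false -> 0. */
void splat_bool_to_uchar4(int b, uint8_t out[4])
{
    for (int lane = 0; lane < 4; ++lane)
        out[lane] = b ? (uint8_t)-1 : 0;
}
```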
| |
| |
| [[explicit-conversions]] |
| === Explicit Conversions |
| |
| Explicit conversions may be performed using the |
| |
| [source,opencl_c] |
| ---------- |
| convert_destType(sourceType) |
| ---------- |
| |
| suite of functions. |
| These provide a full set of type conversions between supported |
| <<built-in-scalar-data-types,scalar>>, |
| <<built-in-vector-data-types,vector>>, and |
| <<other-built-in-data-types,other>> data types except for the following |
| types: `bool`, `half`, `size_t`, `ptrdiff_t`, `intptr_t`, `uintptr_t`, and |
| `void`. |
| |
| The number of elements in the source and destination vectors must match. |
| |
| In the example below: |
| |
| [source,opencl_c] |
| ---------- |
| uchar4 u; |
| int4 c = convert_int4(u); |
| ---------- |
| |
| `convert_int4` converts a `uchar4` vector `u` to an `int4` vector `c`. |
| |
| [source,opencl_c] |
| ---------- |
| float f; |
| int i = convert_int(f); |
| ---------- |
| |
| `convert_int` converts a `float` scalar `f` to an `int` scalar `i`. |
| |
| The behavior of the conversion may be modified by one or two optional |
| modifiers that specify saturation for out-of-range inputs and rounding |
| behavior. |
| |
| The full form of the scalar convert function is: |
| |
| [source,opencl_c] |
| ---------- |
| destType convert_destType<_sat><_roundingMode>(sourceType) |
| ---------- |
| |
| where `destType` is the destination scalar type and `sourceType` is the source scalar type. |
| |
| The full form of the vector convert function is: |
| |
| [source,opencl_c] |
| ---------- |
| destTypen convert_destTypen<_sat><_roundingMode>(sourceTypen) |
| ---------- |
| |
| where `destTypen` is the n-element destination vector type and `sourceTypen` is the n-element source vector type. |
| |
| |
| [[data-types]] |
| ==== Data Types |
| |
| Conversions are available for the following scalar types: `char`, `uchar`, |
| `short`, `ushort`, `int`, `uint`, `long`, `ulong`, `float`, and built-in |
| vector types derived therefrom. |
| The operand and result type must have the same number of elements. |
| The operand and result type may be the same type in which case the |
| conversion has no effect on the type or value of an expression. |
| |
| Conversions between integer types follow the conversion rules specified in |
| <<C99-spec,sections 6.3.1.1 and 6.3.1.3 of the C99 Specification>> except |
| for <<out-of-range-behavior,out-of-range behavior and saturated |
| conversions>>. |
| |
| |
| [[rounding-modes]] |
| ==== Rounding Modes |
| |
| Conversions to and from floating-point type shall conform to IEEE-754 |
| rounding rules. |
| Conversions may have an optional rounding mode modifier described in the |
| following table. |
| |
| [[table-rounding-mode]] |
| .Rounding Modes |
| [cols=",",options="header",] |
| |==== |
| | Modifier | Rounding Mode Description |
| | `_rte` | Round to nearest even |
| | `_rtz` | Round toward zero |
| | `_rtp` | Round toward positive infinity |
| | `_rtn` | Round toward negative infinity |
| | no modifier specified | Use the default rounding mode for this destination |
| type, `_rtz` for conversion to integers or the |
| default rounding mode for conversion to |
| floating-point types. |
| |==== |
| |
| By default, conversions to integer type use the `_rtz` (round toward zero) |
| rounding mode and conversions to floating-point type |
| footnote:[{fn-float-conversion-rounding}] use the default rounding mode. |
| The only default floating-point rounding mode supported is round to nearest |
| even, i.e. the default rounding mode is `_rte` for floating-point types. |
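
As a rough host-side illustration of the four modifiers for a `float` to `int` conversion, the C sketch below computes each rounding by hand rather than through `<fenv.h>`. The enum and function names are hypothetical, and the sketch assumes a finite input whose rounded value fits in an `int`. Out-of-range handling is covered in the next section.

```c
/* Hypothetical names mirroring the _rte, _rtz, _rtp and _rtn modifiers. */
typedef enum { MODE_RTE, MODE_RTZ, MODE_RTP, MODE_RTN } rounding_mode;

/* Models a float-to-int conversion with an explicit rounding mode.
 * Assumes x is finite and that the rounded result fits in an int. */
static int convert_int_mode(float x, rounding_mode mode)
{
    double d = x;                 /* exact: every float is representable as a double */
    long long t = (long long)d;   /* C truncates toward zero */
    double frac = d - (double)t;  /* exact fractional part, in (-1, 1) */

    switch (mode) {
    case MODE_RTZ:
        return (int)t;
    case MODE_RTP:
        return (int)(frac > 0.0 ? t + 1 : t);
    case MODE_RTN:
        return (int)(frac < 0.0 ? t - 1 : t);
    case MODE_RTE:
    default:
        /* Round to nearest; on an exact tie, pick the even neighbor. */
        if (frac > 0.5 || (frac == 0.5 && t % 2 != 0))
            return (int)(t + 1);
        if (frac < -0.5 || (frac == -0.5 && t % 2 != 0))
            return (int)(t - 1);
        return (int)t;
    }
}
```

Under these rules `convert_int_rte(2.5f)` yields 2 and `convert_int_rte(3.5f)` yields 4, because ties go to the even neighbor. The default `_rtz` gives 2 and 3 for the same inputs.
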
| |
| |
| [[out-of-range-behavior]] |
| ==== Out-of-Range Behavior and Saturated Conversions |
| |
| When the conversion operand is either greater than the greatest |
| representable destination value or less than the least representable |
| destination value, it is said to be out-of-range. |
| The result of out-of-range conversion is determined by the conversion rules |
| specified by <<C99-spec,section 6.3 of the C99 Specification>>. |
| When converting from a floating-point type to integer type, the behavior is |
| implementation-defined. |
| |
| Conversions to integer type may opt to convert using the optional saturated |
| mode by appending the `_sat` modifier to the conversion function name. |
| When in saturated mode, values that are outside the representable range |
| shall clamp to the nearest representable value in the destination format. |
| NaN should be converted to 0. |
| |
| Conversions to floating-point type shall conform to IEEE-754 rounding rules. |
| The `_sat` modifier may not be used for conversions to floating-point |
| formats. |
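
The saturation rules can be sketched in host C as follows. The function names are hypothetical. OpenCL C `char` is a signed 8-bit type, so the sketch uses `signed char`, and the `float` case applies the default `_rtz` rounding.

```c
#include <limits.h>

/* Models convert_char_sat(short): clamp to [SCHAR_MIN, SCHAR_MAX]. */
static signed char convert_char_sat_model(short s)
{
    if (s > SCHAR_MAX) return SCHAR_MAX;
    if (s < SCHAR_MIN) return SCHAR_MIN;
    return (signed char)s;
}

/* Models convert_ushort_sat(short): negative values clamp to 0. */
static unsigned short convert_ushort_sat_model(short s)
{
    return s < 0 ? 0 : (unsigned short)s;
}

/* Models convert_int_sat(float) with the default _rtz rounding. */
static int convert_int_sat_model(float f)
{
    if (f != f)
        return 0;                 /* NaN converts to 0 */
    if (f >= 2147483648.0f)       /* 2^31 is the smallest float above INT_MAX */
        return INT_MAX;
    if (f < -2147483648.0f)       /* -2^31 equals INT_MIN and is exact */
        return INT_MIN;
    return (int)f;                /* in range: truncate toward zero */
}
```
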
| |
| |
| [[explicit-conversion-examples]] |
| ==== Explicit Conversion Examples |
| |
| Example 1: |
| |
| [source,opencl_c] |
| ---------- |
| short4 s; |
| |
| // negative values clamped to 0 |
| ushort4 u = convert_ushort4_sat( s ); |
| |
| // values > CHAR_MAX converted to CHAR_MAX |
| // values < CHAR_MIN converted to CHAR_MIN |
| char4 c = convert_char4_sat( s ); |
| ---------- |
| |
| Example 2: |
| |
| [source,opencl_c] |
| ---------- |
| float4 f; |
| |
| // values implementation-defined for |
| // f > INT_MAX, f < INT_MIN or NaN |
| int4 i = convert_int4( f ); |
| |
| // values > INT_MAX clamp to INT_MAX, values < INT_MIN clamp |
| // to INT_MIN. NaN should produce 0. |
| // The _rtz rounding mode is used to produce the integer values. |
| int4 i2 = convert_int4_sat( f ); |
| |
| // similar to convert_int4, except that floating-point values |
| // are rounded to the nearest integer instead of truncated |
| int4 i3 = convert_int4_rte( f ); |
| |
| // similar to convert_int4_sat, except that floating-point values |
| // are rounded to the nearest integer instead of truncated |
| int4 i4 = convert_int4_sat_rte( f ); |
| ---------- |
| |
| Example 3: |
| |
| [source,opencl_c] |
| ---------- |
| int4 i; |
| |
| // convert ints to floats using the default rounding mode. |
| float4 f = convert_float4( i ); |
| |
| // convert ints to floats. Integer values that cannot |
| // be exactly represented as floats should round up to the |
| // next representable float. |
| float4 f2 = convert_float4_rtp( i ); |
| ---------- |
| |
| |
| [[reinterpreting-data-as-another-type]] |
| === Reinterpreting Data as Another Type |
| |
| It is frequently necessary to reinterpret bits in a data type as another |
| data type in OpenCL. |
| This is typically required when direct access to the bits in a |
| floating-point type is needed, for example to mask off the sign bit or make |
| use of the result of a vector <<operators-relational,relational operator>> |
| on floating-point data footnote:[{fn-float-reinterpretation}]. |
| Several methods to achieve this (non-) conversion are frequently practiced |
| in C, including pointer aliasing, unions and memcpy. |
| Of these, only memcpy is strictly correct in C99. |
| Since OpenCL does not provide *memcpy*, other methods are needed. |
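
For comparison, the strictly correct C99 `memcpy` approach looks like this on a host. The sketch assumes the host `float` is a 32-bit IEEE-754 value, which a C11 `_Static_assert` checks. `as_uint_model`, `as_float_model` and `clear_sign_bit` are hypothetical helpers named after the OpenCL C `as_uint` and `as_float` operators described below.

```c
#include <stdint.h>
#include <string.h>

/* Assumes a 32-bit IEEE-754 float on the host, as in OpenCL C. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Host-side model of as_uint(float): copy the bits, no value conversion. */
static uint32_t as_uint_model(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Host-side model of as_float(uint). */
static float as_float_model(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Masks off the sign bit by operating on the raw representation. */
static float clear_sign_bit(float f)
{
    return as_float_model(as_uint_model(f) & 0x7fffffffu);
}
```
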
| |
| [[reinterpreting-types-using-unions]] |
| ==== Reinterpreting Types Using Unions |
| |
| The OpenCL language extends the union to allow the program to access a |
| member of a union object using a member of a different type. |
| The relevant bytes of the representation of the object are treated as an |
| object of the type used for the access. |
| If the type used for access is larger than the representation of the object, |
| then the value of the additional bytes is undefined. |
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
| // d only if double-precision is supported |
| union { float f; uint u; double d; } u; |
| |
| u.u = 1; // u.f contains 2**-149. u.d is undefined -- |
| // depending on endianness the low or high half |
| // of d is unknown |
| |
| u.f = 1.0f; // u.u contains 0x3f800000, u.d contains an |
| // undefined value -- depending on endianness |
| // the low or high half of d is unknown |
| |
| u.d = 1.0; // u.u contains 0x3ff00000 (big endian) or 0 |
| // (little endian). u.f contains either 0x1.ep0f |
| // (big endian) or 0.0f (little endian) |
| ---------- |
| |
| |
| [[reinterpreting-types-using-as_type-and-as_typen]] |
| ==== Reinterpreting Types Using *as_type*() and *as_type__n__*() |
| |
| [open,refpage='as_typen',desc='Reinterpreting Types',type='freeform',spec='clang',anchor='reinterpreting-types-using-as_type-and-as_typen',xrefs='convert_T scalarDataTypes vectorDataTypes'] |
| -- |
| All data types described in <<table-builtin-scalar-types,Built-in Scalar Data |
| Types>> and <<table-builtin-vector-types,Built-in Vector Data Types>> (except |
| `bool`, `void`, and `half` footnote:[{fn-cl_khr_fp16}]) may also be |
| reinterpreted as another data type of the same size using the *as_type*() |
| operator for scalar data types and the *as_type__n__*() operator |
| footnote:[{fn-reinterpret-vector-types}] for vector data types. |
| When the operand and result type contain the same number of elements, the |
| bits in the operand shall be returned directly without modification as the |
| new type. |
| The usual type promotion for function arguments shall not be performed. |
| |
| For example, `*as_float*(0x3f800000)` returns `1.0f`, which is the value |
| that the bit pattern `0x3f800000` has if viewed as an IEEE-754 single |
| precision value. |
| |
| When the operand and result type contain a different number of elements, the |
| result shall be implementation-defined except if the operand is a |
| 4-component vector and the result is a 3-component vector. |
| In this case, the bits in the operand shall be returned directly without |
| modification as the new type. |
| That is, a conforming implementation shall explicitly define a behavior, but |
| two conforming implementations need not have the same behavior when the |
| number of elements in the result and operand types does not match. |
| The implementation may define the result to contain all, some or none of the |
| original bits in whatever order it chooses. |
| It is an error to use the *as_type*() or *as_type__n__*() operator to |
| reinterpret data as a type with a different number of bytes. |
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
| float f = 1.0f; |
| uint u = as_uint(f); // Legal. Contains: 0x3f800000 |
| |
| float4 f4 = (float4)(1.0f, 2.0f, 3.0f, 4.0f); |
| // Legal. Contains: |
| // (int4)(0x3f800000, 0x40000000, 0x40400000, 0x40800000) |
| int4 i4 = as_int4(f4); |
| |
| float4 a, b; |
| int4 is_less = a < b; |
| |
| // Legal. a[i] = a[i] < b[i] ? a[i] : 0.0f |
| a = as_float4(as_int4(a) & is_less); |
| |
| int i; |
| // Legal. Result is implementation-defined. |
| short2 j = as_short2(i); |
| |
| int4 v; |
| // Legal. Result is implementation-defined. |
| short8 k = as_short8(v); |
| |
| float4 p; |
| // Error. Result and operand have different sizes |
| double4 q = as_double4(p); // Only if double-precision is supported. |
| |
| float4 r; |
| // Legal. s.xyz will have the same values as r.xyz. |
| float3 s = as_float3(r); |
| ---------- |
| -- |
| |
| |
| [[pointer-casting]] |
| === Pointer Casting |
| |
| Pointers to old and new types may be cast back and forth to each other. |
| Casting a pointer to a new type represents an unchecked assertion that the |
| address is correctly aligned. |
| The developer will also need to know the endianness of the OpenCL device and |
| the endianness of the data to determine how the scalar and vector data |
| elements are stored in memory. |
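
When data is shared with the host, one way to check the host's byte order is to inspect the first stored byte of a known value, as in the C sketch below. The function name is hypothetical and the sketch assumes either little- or big-endian order. For the device itself, the host API reports byte order through the `CL_DEVICE_ENDIAN_LITTLE` query of `clGetDeviceInfo`.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the lowest-addressed byte of a uint32_t holds its least
 * significant byte (little endian), 0 otherwise (assumed big endian). */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1u;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);   /* inspect storage order without aliasing issues */
    return first_byte == 1u;
}
```
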
| |
| |
| [[usual-arithmetic-conversions]] |
| === Usual Arithmetic Conversions |
| |
| Many operators that expect operands of arithmetic type cause conversions and |
| yield result types in a similar way. |
| The purpose is to determine a common real type for the operands and result. |
| For the specified operands, each operand is converted, without change of |
| type domain, to a type whose corresponding real type is the common real |
| type. |
| For this purpose, all vector types shall be considered to have higher |
| conversion ranks than scalars. |
| Unless explicitly stated otherwise, the common real type is also the |
| corresponding real type of the result, whose type domain is the type domain |
| of the operands if they are the same, and complex otherwise. |
| This pattern is called the usual arithmetic conversions. |
| If the operands are of more than one vector type, then an error shall occur. |
| <<implicit-conversions,Implicit conversions>> between vector types are not |
| permitted. |
| |
| Otherwise, if there is only a single vector type, and all other operands are |
| scalar types, the scalar types are converted to the type of the vector |
| element, then widened into a new vector containing the same number of |
| elements as the vector, by duplication of the scalar value across the width |
| of the new vector. |
| An error shall occur if any scalar operand has greater rank than the type of |
| the vector element. |
| For this purpose, the rank order is defined as follows: |
| |
| . The rank of a floating-point type is greater than the rank of another |
| floating-point type, if the first floating-point type can exactly |
| represent all numeric values in the second floating-point type. |
| (For this purpose, the encoding of the floating-point value is used, |
| rather than the subset of the encoding usable by the device.) |
| . The rank of any floating-point type is greater than the rank of any |
| integer type. |
| . The rank of an integer type is greater than the rank of an integer type |
| with less precision. |
| . The rank of an unsigned integer type is *greater than* the rank of a |
| signed integer type with the same precision |
| footnote:[{fn-integer-conversion-rank}]. |
| . The rank of the bool type is less than the rank of any other type. |
| . The rank of an enumerated type shall equal the rank of the compatible |
| integer type. |
| . For all types, `T1`, `T2` and `T3`, if `T1` has greater rank than `T2`, |
| and `T2` has greater rank than `T3`, then `T1` has greater rank than |
| `T3`. |
| |
| Otherwise, if all operands are scalar, the usual arithmetic conversions |
| apply, per <<C99-spec,section 6.3.1.8 of the C99 Specification>>. |
| |
| [NOTE] |
| ==== |
| Both the standard orderings in <<C99-spec,sections 6.3.1.8 and 6.3.1.1 of |
| the C99 Specification>> were examined and rejected. |
| Had we used integer conversion rank here, `int4 + 0U` would have been legal |
| and had `int4` return type. |
| Had we used standard C99 usual arithmetic conversion rules for scalars, then |
| the standard integer promotion would have been performed on vector integer |
| element types and `short8 + char` would either have a return type of `int8` or |
| be illegal. |
| ==== |
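
Because the rank rules above define a total order over the built-in scalar types, they can be encoded as a single ordered enumeration. The C sketch below is an illustration only. The enum and function are hypothetical, enumerated and extension types are omitted, and it checks only whether a scalar operand may be mixed with a vector operand.

```c
#include <stdbool.h>

/* Hypothetical enumeration listed in increasing rank: bool lowest, then
 * integer types by precision with unsigned above signed of the same
 * precision, then floating-point types by precision. */
typedef enum {
    T_BOOL, T_CHAR, T_UCHAR, T_SHORT, T_USHORT, T_INT, T_UINT,
    T_LONG, T_ULONG, T_HALF, T_FLOAT, T_DOUBLE
} scalar_type;

/* A scalar operand may be mixed with a vector only if its rank does not
 * exceed the rank of the vector element type; it is then converted to the
 * element type and replicated across the vector. */
static bool vector_scalar_mix_is_legal(scalar_type element, scalar_type scalar)
{
    return scalar <= element;
}
```

Under this ordering `int4 + 0U` is rejected because `uint` outranks `int`, and `short8 + char` is accepted with type `short8`, as the note above describes.
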
| |
| |
| [[operators]] |
| == Operators |
| |
| ifdef::refpageOnly[] |
| // This is an index page, generated only in the OpenCL refpages and not in |
| // the Specification. It's here so the index remains consistent with the |
| // spec. |
| [open,refpage='operators',desc='OpenCL Operators',type='freeform',spec='clang',anchor='operators',xrefs=''] |
| -- |
| OpenCL C includes a variety of operators, described individually in the |
| following sections: |
| |
| * link:arithmeticOperators.html[Arithmetic Operators] |
| * link:unaryOperators.html[Unary Operators] |
| * link:prePostOperators.html[Pre- and Post-Operators] |
| * link:relationalOperators.html[Relational Operators] |
| * link:equalityOperators.html[Equality Operators] |
| * link:bitwiseOperators.html[Bitwise Operators] |
| * link:logicalOperators.html[Logical Operators] |
| * link:unaryLogicalOperator.html[Unary Logical Operator] |
| * link:selectionOperator.html[Ternary Selection Operator] |
| * link:shiftOperators.html[Shift Operators] |
| * link:sizeofOperator.html[Sizeof Operator] |
| * link:commaOperator.html[Comma Operator] |
| * link:indirectionOperator.html[Indirection Operator] |
| * link:addressOperator.html[Address Operator] |
| * link:assignmentOperator.html[Assignment Operator] |
| -- |
| endif::refpageOnly[] |
| |
| |
| [[operators-arithmetic]] |
| === Arithmetic Operators |
| |
| [open,refpage='arithmeticOperators',desc='Arithmetic Operators',type='freeform',spec='clang',anchor='operators-arithmetic',xrefs='operators'] |
| -- |
| The arithmetic operators add (*+++*), subtract (*-*), multiply (*+*+*) and |
| divide (*/*) operate on built-in integer and floating-point scalar, and |
| vector data types. |
| The arithmetic operator remainder (*%*) operates on built-in integer scalar |
| and integer vector data types. |
| All arithmetic operators return a result of the same built-in type (integer or |
| floating-point) as the type of the operands, after operand type conversion. |
| After conversion, the following cases are valid: |
| |
| * The two operands are scalars. |
| In this case, the operation is applied, resulting in a scalar. |
| * One operand is a scalar, and the other is a vector. |
| In this case, the scalar may be subject to the usual arithmetic |
| conversion to the element type used by the vector operand. |
| The scalar type is then widened to a vector that has the same number of |
| components as the vector operand. |
| The operation is done component-wise resulting in the same size vector. |
| * The two operands are vectors of the same type. |
| In this case, the operation is done component-wise resulting in the same |
| size vector. |
| |
| All other cases of implicit conversions are illegal. |
| Division on integer types which results in a value that lies outside of the |
| range bounded by the maximum and minimum representable values of the integer |
| type will not cause an exception but will result in an unspecified value. |
| A divide by zero with integer types does not cause an exception but will |
| result in an unspecified value. |
| Division by zero for floating-point types will result in {plusmn}{inf} or |
| NaN as prescribed by the IEEE-754 standard. |
| Use the built-in functions *dot* and *cross* to get, respectively, the |
| vector dot product and the vector cross product. |
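
The vector and scalar case can be pictured with the following host C sketch of `float4 + int`. `float4_model` and `add_float4_int` are hypothetical names. The `int` operand has lower rank than `float`, so it is converted to `float` and replicated before the component-wise addition.

```c
/* Hypothetical host-side stand-in for the OpenCL C float4 type. */
typedef struct { float s[4]; } float4_model;

/* Models float4 + int: the int is converted to the element type, widened
 * to a float4 with identical components, then added component-wise. */
static float4_model add_float4_int(float4_model v, int scalar)
{
    float4_model widened, result;
    for (int k = 0; k < 4; ++k)
        widened.s[k] = (float)scalar;
    for (int k = 0; k < 4; ++k)
        result.s[k] = v.s[k] + widened.s[k];
    return result;
}
```
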
| -- |
| |
| |
| [[operators-unary]] |
| === Unary Operators |
| |
| [open,refpage='unaryOperators',desc='Unary Operators',type='freeform',spec='clang',anchor='operators-unary',xrefs='operators'] |
| -- |
| The arithmetic unary operators (*+* and *-*) operate on built-in scalar and |
| vector types. |
| -- |
| |
| |
| [[operators-prepost]] |
| === Pre- and Post-Operators |
| |
| [open,refpage='prePostOperators',desc='Pre- and Post-Operators',type='freeform',spec='clang',anchor='operators-prepost',xrefs='operators'] |
| -- |
| The arithmetic post- and pre-increment and decrement operators (*--* and |
| *++*) operate on built-in scalar and vector types except the built-in scalar |
| and vector `float` types footnote:[{fn-float-increment-decrement}]. |
| All unary operators work component-wise on their operands. |
| Each produces a result of the same type as its operand. |
| For post- and pre-increment and decrement, the expression must be one that |
| could be assigned to (an l-value). |
| Pre-increment and pre-decrement add 1 to, or subtract 1 from, the contents |
| of the expression they operate on, and the value of the pre-increment or |
| pre-decrement expression is the resulting value of that modification. |
| Post-increment and post-decrement expressions add 1 to, or subtract 1 |
| from, the contents of the expression they operate on, but the resulting |
| expression has the expression's value before the post-increment or |
| post-decrement was executed. |
| -- |
| |
| |
| [[operators-relational]] |
| === Relational Operators |
| |
| [open,refpage='relationalOperators',desc='Relational Operators',type='freeform',spec='clang',anchor='operators-relational',xrefs='operators'] |
| -- |
| The relational operators greater than (*>*), less than (*<*), greater than or |
| equal (*>=*), and less than or equal (*\<=*) operate on scalar and vector types |
| footnote:[{fn-relational-any-all}]. |
| All relational operators result in an integer type. |
| After operand type conversion, the following cases are valid: |
| |
| * The two operands are scalars. |
| In this case, the operation is applied, resulting in an `int` scalar. |
| * One operand is a scalar, and the other is a vector. |
| In this case, the scalar may be subject to the usual arithmetic |
| conversion to the element type used by the vector operand. |
| The scalar type is then widened to a vector that has the same number of |
| components as the vector operand. |
| The operation is done component-wise resulting in the same size vector. |
| * The two operands are vectors of the same type. |
| In this case, the operation is done component-wise resulting in the same |
| size vector. |
| |
| All other cases of implicit conversions are illegal. |
| |
| The result is a scalar signed integer of type `int` if the source operands |
| are scalar and a vector signed integer type of the same size as the source |
| operands if the source operands are vector types. |
| Vector source operands of type `char__n__` and `uchar__n__` return a |
| `char__n__` result; vector source operands of type |
| ifdef::cl_khr_fp16[] |
| `half__n__` footnote:[{fn-half-supported}], |
| endif::cl_khr_fp16[] |
| `short__n__` and |
| `ushort__n__` return a `short__n__` result; vector source operands of type |
| `int__n__`, `uint__n__` and `float__n__` return an `int__n__` result; vector |
| source operands of type `long__n__`, `ulong__n__` and `double__n__` return a |
| `long__n__` result. |
| |
| For scalar types, the relational operators shall return 0 if the specified |
| relation is _false_ and return 1 if the specified relation is _true_. |
| For vector types, the relational operators shall return 0 if the specified |
| relation is _false_ and return -1 (i.e. all bits set) if the specified |
| relation is _true_. |
| The relational operators always return 0 if either argument is not a number |
| (NaN). |
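
The -1 result for vector comparisons is what makes them usable as bit masks, as in the `as_float4(as_int4(f) & is_less)` example in the section on reinterpreting types. The C sketch below models `f < g` for `float4` and the masking step on the host. The struct and function names are hypothetical, and a 32-bit IEEE-754 `float` is assumed.

```c
#include <stdint.h>
#include <string.h>

typedef struct { float s[4]; } float4_model;   /* hypothetical float4 stand-in */
typedef struct { int32_t s[4]; } int4_model;   /* hypothetical int4 stand-in */

/* Models f < g for float4: -1 (all bits set) where true, 0 where false.
 * Comparisons involving NaN are false in C, so they yield 0. */
static int4_model less_than_float4(float4_model f, float4_model g)
{
    int4_model r;
    for (int k = 0; k < 4; ++k)
        r.s[k] = (f.s[k] < g.s[k]) ? -1 : 0;
    return r;
}

/* Models as_float4(as_int4(f) & mask): keeps f where the mask is -1 and
 * produces +0.0f where it is 0. */
static float4_model mask_float4(float4_model f, int4_model mask)
{
    float4_model out;
    for (int k = 0; k < 4; ++k) {
        uint32_t bits;
        memcpy(&bits, &f.s[k], sizeof bits);
        bits &= (uint32_t)mask.s[k];
        memcpy(&out.s[k], &bits, sizeof bits);
    }
    return out;
}
```
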
| -- |
| |
| |
| [[operators-equality]] |
| === Equality Operators |
| |
| [open,refpage='equalityOperators',desc='Equality Operators',type='freeform',spec='clang',anchor='operators-equality',xrefs='operators'] |
| -- |
| The equality operators equal (*==*) and not equal (*!=*) operate on |
| built-in scalar and vector types footnote:[{fn-relational-any-all}]. |
| All equality operators result in an integer type. |
| After operand type conversion, the following cases are valid: |
| |
| * The two operands are scalars. |
| In this case, the operation is applied, resulting in a scalar. |
| * One operand is a scalar, and the other is a vector. |
| In this case, the scalar may be subject to the usual arithmetic |
| conversion to the element type used by the vector operand. |
| The scalar type is then widened to a vector that has the same number of |
| components as the vector operand. |
| The operation is done component-wise resulting in the same size vector. |
| * The two operands are vectors of the same type. |
| In this case, the operation is done component-wise resulting in the same |
| size vector. |
| |
| All other cases of implicit conversions are illegal. |
| |
| The result is a scalar signed integer of type `int` if the source operands |
| are scalar and a vector signed integer type of the same size as the source |
| operands if the source operands are vector types. |
| Vector source operands of type `char__n__` and `uchar__n__` return a |
| `char__n__` result; vector source operands of type |
| ifdef::cl_khr_fp16[] |
| `half__n__` footnote:[{fn-half-supported}], |
| endif::cl_khr_fp16[] |
| `short__n__` and |
| `ushort__n__` return a `short__n__` result; vector source operands of type |
| `int__n__`, `uint__n__` and `float__n__` return an `int__n__` result; vector |
| source operands of type `long__n__`, `ulong__n__` and `double__n__` return a |
| `long__n__` result. |
| |
| For scalar types, the equality operators shall return 0 if the specified |
| relation is _false_ and return 1 if the specified relation is _true_. |
| For vector types, the equality operators shall return 0 if the specified |
| relation is _false_ and return -1 (i.e. all bits set) if the specified |
| relation is _true_. |
| The equality operator equal (*==*) returns 0 if one or both arguments are |
| not a number (NaN). |
| The equality operator not equal (*!=*) returns 1 (for scalar source |
| operands) or -1 (for vector source operands) if one or both arguments are |
| not a number (NaN). |
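
This NaN behavior matches IEEE-754 comparisons in C: `==` is false when either operand is NaN, while `!=` is true. A host C sketch of the scalar and vector forms of `!=` follows. The names are hypothetical.

```c
typedef struct { float s[4]; } float4_model;  /* hypothetical float4 stand-in */
typedef struct { int s[4]; } int4_model;      /* hypothetical int4 stand-in */

/* Scalar f != g: yields 1 or 0, and 1 whenever either operand is NaN. */
static int not_equal_scalar(float f, float g)
{
    return f != g;
}

/* Vector f != g: yields -1 or 0 per component, and -1 whenever either
 * component is NaN. */
static int4_model not_equal_float4(float4_model f, float4_model g)
{
    int4_model r;
    for (int k = 0; k < 4; ++k)
        r.s[k] = (f.s[k] != g.s[k]) ? -1 : 0;
    return r;
}
```
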
| -- |
| |
| |
| [[operators-bitwise]] |
| === Bitwise Operators |
| |
| [open,refpage='bitwiseOperators',desc='Bitwise Operators',type='freeform',spec='clang',anchor='operators-bitwise',xrefs='operators'] |
| -- |
| The bitwise operators and (*&*), or (*|*), exclusive or (*^*), and not |
| (*+~+*) operate on all scalar and vector built-in types except the built-in |
| scalar and vector `float` types. |
| For vector built-in types, the operators are applied component-wise. |
| If one operand is a scalar and the other is a vector, the scalar may be |
| subject to the usual arithmetic conversion to the element type used by the |
| vector operand. |
| The scalar type is then widened to a vector that has the same number of |
| components as the vector operand. |
| The operation is done component-wise resulting in the same size vector. |
| ifdef::cl_khr_fp16[] |
| Vector source operands of type `half__n__` footnote:[{fn-half-supported}] |
| return a `short__n__` result. |
| endif::cl_khr_fp16[] |
| -- |
| |
| |
| [[operators-logical]] |
| === Logical Operators |
| |
| [open,refpage='logicalOperators',desc='Logical Operators',type='freeform',spec='clang',anchor='operators-logical',xrefs='operators'] |
| -- |
| The logical operators and (*&&*) and or (*||*) operate on all scalar and |
| vector built-in types. |
| For scalar built-in types only, and (*&&*) will only evaluate the right-hand |
| operand if the left-hand operand compares unequal to 0. |
| For scalar built-in types only, or (*||*) will only evaluate the right-hand |
| operand if the left-hand operand compares equal to 0. |
| For built-in vector types, both operands are evaluated and the operators are |
| applied component-wise. |
| If one operand is a scalar and the other is a vector, the scalar may be |
| subject to the usual arithmetic conversion to the element type used by the |
| vector operand. |
| The scalar type is then widened to a vector that has the same number of |
| components as the vector operand. |
| The operation is done component-wise resulting in the same size vector. |
| |
| The logical operator exclusive or (*^^*) is reserved. |
| |
| The result is a scalar signed integer of type `int` if the source operands |
| are scalar and a vector signed integer type of the same size as the source |
| operands if the source operands are vector types. |
| Vector source operands of type `char__n__` and `uchar__n__` return a |
| `char__n__` result; vector source operands of type |
| ifdef::cl_khr_fp16[] |
| `half__n__` footnote:[{fn-half-supported}], |
| endif::cl_khr_fp16[] |
| `short__n__` and |
| `ushort__n__` return a `short__n__` result; vector source operands of type |
| `int__n__`, `uint__n__` and `float__n__` return an `int__n__` result; vector |
| source operands of type `long__n__`, `ulong__n__` and `double__n__` return a |
| `long__n__` result. |
| |
| For scalar types, the logical operators shall return 0 if the result of the |
| operation is _false_ and return 1 if the result is _true_. |
| For vector types, the logical operators shall return 0 if the result of the |
| operation is _false_ and return -1 (i.e. all bits set) if the result is _true_. |
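
The vector forms of `&&` and `||` can be modelled as ordinary functions on the host. This also reflects that both operands are evaluated, because the arguments are computed before the call. The sketch is plain C with hypothetical names. Scalar `&&` and `||` in C keep their short-circuit behavior.

```c
typedef struct { int s[4]; } int4_model;  /* hypothetical int4 stand-in */

/* Models a && b for int4: -1 where both components are nonzero, else 0. */
static int4_model logical_and_int4(int4_model a, int4_model b)
{
    int4_model r;
    for (int k = 0; k < 4; ++k)
        r.s[k] = (a.s[k] != 0 && b.s[k] != 0) ? -1 : 0;
    return r;
}

/* Models a || b for int4: -1 where either component is nonzero, else 0. */
static int4_model logical_or_int4(int4_model a, int4_model b)
{
    int4_model r;
    for (int k = 0; k < 4; ++k)
        r.s[k] = (a.s[k] != 0 || b.s[k] != 0) ? -1 : 0;
    return r;
}
```
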
| -- |
| |
| |
| [[operators-logical-unary]] |
| === Unary Logical Operator |
| |
| [open,refpage='unaryLogicalOperator',desc='Unary Logical Operator',type='freeform',spec='clang',anchor='operators-logical-unary',xrefs='operators'] |
| -- |
| The logical unary operator not (*!*) operates on all scalar and vector |
| built-in types. |
| For built-in vector types, the operator is applied component-wise. |
| |
| The result is a scalar signed integer of type `int` if the source operands |
| are scalar and a vector signed integer type of the same size as the source |
| operands if the source operands are vector types. |
| Vector source operands of type `char__n__` and `uchar__n__` return a |
| `char__n__` result; vector source operands of type |
| ifdef::cl_khr_fp16[] |
| `half__n__` footnote:[{fn-half-supported}], |
| endif::cl_khr_fp16[] |
| `short__n__` and |
| `ushort__n__` return a `short__n__` result; vector source operands of type |
| `int__n__`, `uint__n__` and `float__n__` return an `int__n__` result; vector |
| source operands of type `long__n__`, `ulong__n__` and `double__n__` return a |
| `long__n__` result. |
| |
| For scalar types, the logical unary operator shall return 0 if the value of |
| its operand compares unequal to 0, and return 1 if the value of its operand |
| compares equal to 0. |
| For vector types, the logical unary operator shall return 0 if the value of its |
| operand compares unequal to 0, and return -1 (i.e. all bits set) if the |
| value of its operand compares equal to 0. |
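
A host C sketch of the vector form of `!`, using a hypothetical `int4` stand-in:

```c
typedef struct { int s[4]; } int4_model;  /* hypothetical int4 stand-in */

/* Models !v for int4: -1 where the component equals 0, and 0 elsewhere. */
static int4_model logical_not_int4(int4_model v)
{
    int4_model r;
    for (int k = 0; k < 4; ++k)
        r.s[k] = (v.s[k] == 0) ? -1 : 0;
    return r;
}
```
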
| -- |
| |
| |
| [[operators-ternary-selection]] |
| === Ternary Selection Operator |
| |
| [open,refpage='selectionOperator',desc='Ternary Selection Operator',type='freeform',spec='clang',anchor='operators-ternary-selection',xrefs='operators'] |
| -- |
| The ternary selection operator (*?:*) operates on three expressions (_exp1_ |
| *?* _exp2_ *:* _exp3_). |
| This operator evaluates the first expression _exp1_, which can be a scalar |
| or vector result except `float`. |
| If all three expressions are scalar values, the C99 rules for ternary |
| operator are followed. |
| If the result is a vector value, then this is equivalent to calling |
| *select*(_exp3_, _exp2_, _exp1_). |
| The *select* function is described in <<table-builtin-relational,Built-in Scalar |
| and Vector Relational Functions>>. |
| The second and third expressions can be any type, as long as their types match, |
| or there is an <<implicit-conversions,implicit conversion>> that can be |
| applied to one of the expressions to make their types match, or one is a |
| vector and the other is a scalar and the scalar may be subject to the usual |
| arithmetic conversion to the element type used by the vector operand and |
| widened to the same type as the vector type. |
| This resulting matching type is the type of the entire expression. |
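
Because the vector form maps to *select*, the per-component choice depends on the most significant bit of _exp1_, not on whether it is nonzero. The host C sketch below assumes the *select* rule for integer vectors, in which component _i_ is taken from the second argument when the most significant bit of component _i_ of the third argument is set, and from the first argument otherwise. It uses hypothetical names and an `int4` stand-in.

```c
#include <stdint.h>

typedef struct { int32_t s[4]; } int4_model;  /* hypothetical int4 stand-in */

/* Models select(a, b, c) for int4: component k is b.s[k] when the most
 * significant bit of c.s[k] is set, and a.s[k] otherwise. */
static int4_model select_int4(int4_model a, int4_model b, int4_model c)
{
    int4_model r;
    for (int k = 0; k < 4; ++k)
        r.s[k] = (c.s[k] < 0) ? b.s[k] : a.s[k];  /* MSB set means negative */
    return r;
}

/* Models the vector form exp1 ? exp2 : exp3 as select(exp3, exp2, exp1). */
static int4_model ternary_int4(int4_model exp1, int4_model exp2, int4_model exp3)
{
    return select_int4(exp3, exp2, exp1);
}
```

For example, a component of _exp1_ equal to 1 selects _exp3_, whereas the -1 produced by a vector relational operator selects _exp2_.
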
| -- |
| |
| |
| [[operators-shift]] |
| === Shift Operators |
| |
| [open,refpage='shiftOperators',desc='Shift Operators',type='freeform',spec='clang',anchor='operators-shift',xrefs='operators'] |
| -- |
| The right-shift (*>>*) and left-shift (*<<*) operators operate on all scalar |
| and vector built-in types except the built-in scalar and vector `float` |
| types. |
| For built-in vector types, the operators are applied component-wise. |
| For the right-shift (*>>*) and left-shift (*<<*) operators, the rightmost |
| operand must be a scalar if the first operand is a scalar, and the rightmost |
| operand can be a vector or scalar if the first operand is a vector. |
| |
| The result of `E1` *<<* `E2` is `E1` left-shifted by log~2~(N) least significant |
| bits in `E2` viewed as an unsigned integer value, where N is the number of bits |
| used to represent the data type of `E1` after integer promotion |
| footnote:[{fn-integer-promotion}], if `E1` is a scalar, or the number of bits |
| used to represent the type of `E1` elements, if `E1` is a vector. |
| The vacated bits are filled with zeros. |
| |
| The result of `E1` *>>* `E2` is `E1` right-shifted by log~2~(N) least |
| significant bits in `E2` viewed as an unsigned integer value, where N is the |
| number of bits used to represent the data type of `E1` after integer |
| promotion, if `E1` is a scalar, or the number of bits used to represent the |
| type of `E1` elements, if `E1` is a vector. |
| If `E1` has an unsigned type or if `E1` has a signed type and a nonnegative |
| value, the vacated bits are filled with zeros. |
| If `E1` has a signed type and a negative value, the vacated bits are filled |
| with ones. |
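
The shift-count masking and the sign fill can be sketched in host C for a 32-bit `int` operand, where log~2~(N) is 5, and for a `char` vector element, where N is 8 and only 3 bits of the count are used. A scalar `char` is promoted to `int` first, so it follows the 32-bit rule. The function names are hypothetical. The sketch shifts unsigned bit patterns because left-shifting a negative value is undefined in C and right-shifting one is implementation-defined, and it assumes the usual two's-complement conversion back to the signed type.

```c
#include <stdint.h>

/* Models E1 << E2 for a 32-bit int: only the low 5 bits of E2 are used. */
static int32_t shl_int(int32_t e1, uint32_t e2)
{
    return (int32_t)((uint32_t)e1 << (e2 & 31u));
}

/* Models E1 >> E2 for a 32-bit int: vacated bits are filled with ones
 * for a negative E1 and with zeros otherwise. */
static int32_t shr_int(int32_t e1, uint32_t e2)
{
    uint32_t n = e2 & 31u;
    uint32_t bits = (uint32_t)e1 >> n;     /* logical shift: zero fill */
    if (e1 < 0 && n != 0)
        bits |= ~(UINT32_MAX >> n);        /* fill the vacated high bits with ones */
    return (int32_t)bits;
}

/* Models E1 << E2 for a char vector element: N = 8, so 3 bits of E2 count. */
static int8_t shl_char_element(int8_t e1, uint32_t e2)
{
    return (int8_t)((uint8_t)e1 << (e2 & 7u));
}
```
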
| -- |
| |
| |
| [[operators-sizeof]] |
| === Sizeof Operator |
| |
| [open,refpage='sizeofOperator',desc='Sizeof Operator',type='freeform',spec='clang',anchor='operators-sizeof',xrefs='operators'] |
| -- |
| The `sizeof` operator yields the size (in bytes) of its operand, which may |
| be an expression or the parenthesized name of a type, including any |
| <<alignment-of-types,padding bytes needed for alignment>>. |
| The size is determined from the type of the operand. |
| The result is of type `size_t`. |
| If the type of the operand is a variable length array |
| footnote:[{fn-variable-length-array-restriction}] type, the operand is |
| evaluated; otherwise, the operand is not evaluated and the result is an integer |
| constant. |
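This non-evaluation rule is the same as in C99, so it can be illustrated with a minimal host-side C sketch. The helper `sizeof_does_not_evaluate` is hypothetical; it applies `sizeof` to an expression with a side effect whose type, `int`, is not a variable length array type, so the side effect never happens.

```c
#include <stddef.h>

/* Hypothetical helper: stores sizeof(i++) in *size_out and returns i.
   The operand i++ is not evaluated, so the returned value is still 0. */
static int sizeof_does_not_evaluate(size_t *size_out)
{
    int i = 0;
    *size_out = sizeof(i++); /* i++ is never executed */
    return i;
}
```

The size is still computed from the type of the operand, so the stored value is the size of `int`.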
| |
| When applied to an operand that has type `char` or `uchar`, the result is 1. |
| When applied to an operand that has type `short`, `ushort`, or `half`, the |
| result is 2. |
| When applied to an operand that has type `int`, `uint` or `float`, the |
| result is 4. |
| When applied to an operand that has type `long`, `ulong` or `double`, the |
| result is 8. |
| When applied to an operand that is a vector type, the result is the number of |
| components times the size of each scalar component footnote:[{fn-vec3-size}]. |
| When applied to an operand that has array type, the result is the total |
| number of bytes in the array. |
| When applied to an operand that has structure or union type, the result is |
| the total number of bytes in such an object, including internal and trailing |
| padding. |
| The `sizeof` operator shall not be applied to an expression that has |
| function type or an incomplete type, to the parenthesized name of such a |
| type, or to an expression that designates a bit-field struct member |
| footnote:[{fn-bitfield-struct-restriction}]. |
| |
| The behavior of applying the `sizeof` operator to the `bool`, `image2d_t`, |
| `image3d_t`, `image2d_array_t`, `image1d_t`, `image1d_buffer_t`, |
| `image1d_array_t`, `image2d_depth_t`, `image2d_array_depth_t`, |
| `sampler_t`, `queue_t`, `ndrange_t`, `clk_event_t`, `reserve_id_t`, and |
| `event_t` types is implementation-defined. Additionally, the behavior of |
| applying the `sizeof` operator to a pipe object (a type with the `pipe` type |
| specifier keyword) is implementation-defined. |
| -- |
| |
| |
| [[operators-comma]] |
| === Comma Operator |
| |
| [open,refpage='commaOperator',desc='Comma Operator',type='freeform',spec='clang',anchor='operators-comma',xrefs='operators'] |
| -- |
| The comma (*,*) operator operates on expressions by returning the type and |
| value of the rightmost expression in a comma-separated list of expressions. |
| All expressions are evaluated, in order, from left to right. |
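OpenCL C inherits this operator from C99, so a minimal host-side C sketch shows the evaluation order and the result. The helper `comma_example` is hypothetical.

```c
/* Hypothetical helper: the parenthesized comma expression first
   evaluates x = a, then x + b, and yields the value of x + b. */
static int comma_example(int a, int b)
{
    int x;
    int result = (x = a, x + b); /* left operand is evaluated first */
    return result;
}
```

The parentheses are needed here: in a declaration, an unparenthesized comma separates declarators instead of acting as the comma operator.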
| -- |
| |
| |
| [[operators-indirection]] |
| === Indirection Operator |
| |
| [open,refpage='indirectionOperator',desc='Indirection Operator',type='freeform',spec='clang',anchor='operators-indirection',xrefs='operators'] |
| -- |
| The unary (*+*+*) operator denotes indirection. |
| If the operand points to an object, the result is an l-value designating the |
| object. |
| If the operand has type "pointer to __type__", the result has type |
| "__type__". |
| If an invalid value has been assigned to the pointer, the behavior of the unary |
| *+*+* operator is undefined footnote:[{fn-pointer-invalid-value-indirection}]. |
| -- |
| |
| |
| [[operators-address]] |
| === Address Operator |
| |
| [open,refpage='addressOperator',desc='Address Operator',type='freeform',spec='clang',anchor='operators-address',xrefs='operators'] |
| -- |
| The unary (*&*) operator returns the address of its operand. |
| If the operand has type "__type__", the result has type "pointer to |
| __type__". |
| If the operand is the result of a unary *+*+* operator, neither that operator |
| nor the *&* operator is evaluated and the result is as if both were omitted, |
| except that the constraints on the operators still apply and the result is |
| not an l-value. |
| Similarly, if the operand is the result of a *[]* operator, neither the *&* |
| operator nor the unary *+*+* that is implied by the *[]* is evaluated and the |
| result is as if the *&* operator were removed and the *[]* operator were |
| changed to a *+* operator. |
| Otherwise, the result is a pointer to the object designated by its |
| operand footnote:[{fn-address-operator-result}]. |
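Both special cases follow C99, so they can be checked with ordinary host C code. In this minimal sketch the helpers `addr_of_deref` and `addr_of_subscript` are hypothetical: the first takes the address of an indirection, which yields the original pointer without reading memory, and the second takes the address of a subscript, which equals pointer addition.

```c
#include <stddef.h>

/* Hypothetical helper: the address of an indirection.
   Neither operator is evaluated, so the result is p itself,
   even when p is a null pointer (C99 6.5.3.2). */
static int *addr_of_deref(int *p)
{
    return &*p;
}

/* Hypothetical helper: the address of a subscript.
   Equivalent to a + i; the element a[i] is never read. */
static int *addr_of_subscript(int *a, size_t i)
{
    return &a[i];
}
```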
| -- |
| |
| |
| [[operators-assignment]] |
| === Assignment Operator |
| |
| [open,refpage='assignmentOperator',desc='Assignment Operator',type='freeform',spec='clang',anchor='operators-assignment',xrefs='operators'] |
| -- |
| Assignments of values to variable names are done with the assignment |
| operator (*=*), as in |
| |
| [none] |
| * _lvalue_ = _expression_ |
| |
| The assignment operator stores the value of _expression_ into _lvalue_. |
| The _expression_ and _lvalue_ must have the same type, or the expression must |
| have a type in <<table-builtin-scalar-types,Built-in Scalar Data Types>>, in |
| which case an implicit conversion will be done on the expression before the |
| assignment is done. |
| |
| If _expression_ has a scalar type and _lvalue_ has a vector type, the scalar |
| is converted to the element type used by the vector operand. |
| The scalar type is then widened to a vector that has the same number of |
| components as the vector operand. |
| The operation is done component-wise resulting in the same size vector. |
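The two steps (convert, then widen) can be modeled on the host. This C sketch is only an illustration under stated assumptions, not OpenCL C: `float4_model` is a hypothetical plain struct standing in for `float4`, and `assign_int_to_float4` models assigning an `int` scalar such as `3` to a `float4` lvalue.

```c
/* Hypothetical stand-in for the OpenCL C float4 type; a plain struct
   used only to model the two steps, not the real vector type. */
typedef struct {
    float s[4];
} float4_model;

/* Models assigning an int scalar to a float4 lvalue:
   step 1 converts the scalar to the element type (float),
   step 2 replicates it into every component. */
static float4_model assign_int_to_float4(int i)
{
    float e = (float)i;      /* step 1: convert to element type */
    float4_model v;
    for (int k = 0; k < 4; ++k)
        v.s[k] = e;          /* step 2: widen to 4 components */
    return v;
}
```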
| |
| Any other desired type-conversions must be specified explicitly. |
| L-values must be writable. |
| Variables of built-in types, entire structures or arrays, structure |
| fields, l-values with the field selector (*.*) applied to select components |
| or swizzles without repeated fields, l-values within parentheses, and |
| l-values dereferenced with the array subscript operator (*[]*) are all |
| l-values. |
| Other binary or unary expressions, function names, swizzles with repeated |
| fields, and constants cannot be l-values. |
| The ternary operator (*?:*) is also not allowed as an l-value. |
| |
| The order of evaluation of the operands is unspecified. |
| If an attempt is made to modify the result of an assignment operator or to |
| access it after the next sequence point, the behavior is undefined. |
| Other assignment operators are the assignments add into (**+=**), subtract |
| from (*-=*), multiply into (*+*=+*), divide into (*/=*), modulus into (*%=*), |
| left shift by (*<\<=*), right shift by (*>>=*), and into (*&=*), inclusive |
| or into (*|=*), and exclusive or into (*^=*). |
| |
| The expression |
| |
| [none] |
| * _lvalue_ __op__ *=* _expression_ |
| |
| is equivalent to |
| |
| [none] |
| * _lvalue_ = _lvalue_ _op_ _expression_ |
| |
| and the _lvalue_ and _expression_ must satisfy the requirements for both |
| operator _op_ and assignment (*=*). |
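The equivalence can be checked with a minimal host-side C sketch. The helper `compound_matches_expanded` is hypothetical; it applies the compound and expanded forms for three operators to unsigned 32-bit values (chosen so that overflow and shifts are well defined in C) and reports whether each pair agrees.

```c
#include <stdint.h>

/* Hypothetical helper: checks that x op= e gives the same result as
   x = x op e for op = +, <<, and ^ on unsigned 32-bit values. */
static int compound_matches_expanded(uint32_t x, uint32_t e)
{
    uint32_t a = x, b = x;
    a += e;
    b = b + e;

    uint32_t s = e & 31u;   /* keep the C shift count in range */
    uint32_t c = x, d = x;
    c <<= s;
    d = d << s;

    uint32_t f = x, g = x;
    f ^= e;
    g = g ^ e;

    return a == b && c == d && f == g;
}
```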
| |
| [NOTE] |
| ==== |
| Except for the `sizeof` operator, the `half` data type cannot be used with |
| any of the operators described in this section. |
| ==== |
| -- |
| |
| |
| [[vector-operations]] |
| == Vector Operations |
| |
| Vector operations are component-wise. |
| Usually, an operator applied to a vector operates independently on each |
| component of the vector. |
| |
| For example, |
| |
| [source,opencl_c] |
| ---------- |
| float4 v, u; |
| float f; |
| |
| v = u + f; |
| ---------- |
| |
| is equivalent to |
| |
| [source,opencl_c] |
| ---------- |
| v.x = u.x + f; |
| v.y = u.y + f; |
| v.z = u.z + f; |
| v.w = u.w + f; |
| ---------- |
| |
| And |
| |
| [source,opencl_c] |
| ---------- |
| float4 v, u, w; |
| |
| w = v + u; |
| ---------- |
| |
| is equivalent to |
| |
| [source,opencl_c] |
| ---------- |
| w.x = v.x + u.x; |
| w.y = v.y + u.y; |
| w.z = v.z + u.z; |
| w.w = v.w + u.w; |
| ---------- |
| |
| and likewise for most operators and all integer and floating-point vector |
| types. |
| |
| |
| [[address-space-qualifiers]] |
| == Address Space Qualifiers |
| |
| [open,refpage='addressSpaceQualifiers',desc='Address Space Qualifiers',type='freeform',spec='clang',anchor='address-space-qualifiers',xrefs='constant genericAddressSpace global local private'] |
| -- |
| |
| OpenCL C has a hierarchical memory architecture represented by address spaces, as |
| defined in section 5 of <<embedded-c-spec, the Embedded C Specification>>. It |
| extends the C syntax to allow an address space name as a valid type qualifier |
| (section 5.1.2 of <<embedded-c-spec, the Embedded C Specification>>). |
| OpenCL implements disjoint named address spaces with the spellings |
| `{global}`, `{local}`, `{constant}` and `{private}`. |
| The address space qualifier may be used in variable declarations to specify |
| the region where objects are to be allocated. If the type of an |
| object is qualified by an address space name, the object is allocated in the |
| specified address space. Similarly, for pointers, the type pointed to can be qualified |
| by an address space signaling the address space where the object pointed to is located. |
| |
| The address space names can also be spelled without the `+__+` prefix, i.e. |
| `global`, `local`, `constant` and `private`; these spellings are valid and may |
| be substituted for the corresponding address space names with the `+__+` prefix. |
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
| // declares a pointer p in the global address space that |
| // points to an object in the global address space |
| __global int *__global p; |
| |
| void foo (...) |
| { |
| // declares an array of 4 floats in the private address space |
| __private float x[4]; |
| ... |
| } |
| ---------- |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 with the {opencl_c_generic_address_space} |
| feature macro, there is an additional unnamed generic address space. |
| |
| Most of the restrictions from section 5.1.2 and section 5.3 of the |
| <<embedded-c-spec, Embedded C Specification>> apply in OpenCL C, e.g. address |
| spaces cannot be used with a return type, a function parameter, or a function |
| type, and multiple address space qualifiers are not allowed. However, in OpenCL |
| C it is allowed to qualify local variables with an address space qualifier. |
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
| // OK. |
| int f() { ... } |
| |
| // Error. Address space qualifier cannot be used with a non-pointer return type. |
| private int f() { ... } |
| |
| // OK. Address space qualifier can be used with a pointer return type. |
| local int *f() { ... } |
| |
| // Error. Multiple address spaces specified for a type. |
| private local int i; |
| |
| // OK. The first address space qualifies the object pointed to and the second |
| // qualifies the pointer. |
| private int *local ptr; |
| ---------- |
| |
| The `{global}`, `{constant}`, `{local}`, `{private}`, `global`, |
| `constant`, `local`, and `private` names are reserved for use as address |
| space qualifiers and shall not be used otherwise. |
| The `{generic}` and `generic` names are reserved for future use. |
| |
| [NOTE] |
| ==== |
| The size of pointers to different address spaces may differ. |
| It is not correct to assume that, for example, `+sizeof(__global int *)+` |
| always equals `+sizeof(__local int *)+`. |
| ==== |
| -- |
| |
| [[global-or-global]] |
| === `{global}` (or `global`) |
| |
| [open,refpage='global',desc='global Address Space Qualifiers',type='freeform',spec='clang',anchor='global-or-global',xrefs='addressSpaceQualifiers constant genericAddressSpace local private'] |
| -- |
| |
| The `{global}` or `global` address space name is used to refer to memory |
| objects (buffer or image objects) allocated from the `global` memory pool. |
| |
| A buffer memory object can be declared as a pointer to a scalar, vector or |
| user-defined struct. |
| This allows the kernel to read and/or write any location in the buffer. |
| |
| The actual size of the memory object is determined when the memory |
| object is allocated via appropriate API calls in the host code. |
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
| global float4 *color; // An array of float4 elements |
| |
| typedef struct { |
| float a[3]; |
| int b[2]; |
| } foo_t; |
| |
| global foo_t *my_info; // An array of foo_t elements |
| ---------- |
| |
| As image objects are always allocated from the `global` address space, the |
| `{global}` or `global` qualifier should not be specified for image types. |
| The elements of an image object cannot be directly accessed. |
| Built-in functions to read from and write to an image object are provided. |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 with the |
| {opencl_c_program_scope_global_variables} feature, variables at program scope |
| and `static` or `extern` variables inside functions can be declared in the |
| `global` address space. |
| These variables have the same lifetime as the program, and their values |
| persist between calls to any of the kernels in the program. |
| They are not shared across devices; each device has distinct storage for them. |
| -- |
| |
| |
| [[local-or-local]] |
| === `{local}` (or `local`) |
| |
| [open,refpage='local',desc='local Address Space Qualifiers',type='freeform',spec='clang',anchor='local-or-local',xrefs='addressSpaceQualifiers constant genericAddressSpace global private'] |
| -- |
| |
| The `{local}` or `local` address space name is used to describe variables that |
| are allocated in local memory and shared by all work-items in a work-group. |
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
| kernel void my_func(...) |
| { |
| local float a; // A single float allocated |
| // in the local address space |
| |
| local float b[10]; // An array of 10 floats |
| // allocated in the local address space |
| } |
| ---------- |
| [NOTE] |
| ==== |
| Variables allocated in the `{local}` address space inside a kernel |
| function are allocated for each work-group executing the kernel and exist |
| only for the lifetime of the work-group executing the kernel. |
| ==== |
| |
| -- |
| |
| [[constant-or-constant]] |
| === `{constant}` (or `constant`) |
| |
| [open,refpage='constant',desc='constant Address Space Qualifiers',type='freeform',spec='clang',anchor='constant-or-constant',xrefs='addressSpaceQualifiers genericAddressSpace global local private'] |
| -- |
| |
| The `{constant}` or `constant` address space name is used to describe |
| read-only variables that are accessible globally. They may |
| be declared at program scope, in the outermost scope of a kernel function, or inside |
| functions with a `static` or `extern` storage class specifier. Such variables |
| can be accessed by all work-items and by different kernels during program execution. |
| |
| [NOTE] |
| ==== |
| Each argument to a kernel that is a pointer to the `{constant}` address |
| space is counted separately towards the maximum number of such arguments, |
| defined as the value of the <<opencl-device-queries, |
| `CL_DEVICE_MAX_CONSTANT_ARGS` device query>>. |
| ==== |
| |
| Writing to a variable in the constant address space is illegal and results |
| in a compilation error. |
| |
| Example: |
| |
| [source,opencl_c] |
| ---------- |
| constant int a = 3; // int allocated in the constant address space |
| kernel void k1(global int *buf) |
| { |
| buf[a] = ...; // OK. All work items access element with index 3. |
| } |
| kernel void k2(global int *buf) |
| { |
| *buf = a; // OK. All work items store value 3. |
| a = 42; // Error. a is in constant memory. |
| } |
| ---------- |
| |
| Implementations are not required to aggregate these declarations into the |
| smallest number of constant arguments. This behavior is implementation-defined. |
| |
| Thus portable code must conservatively assume that each variable declared |
| inside a function or in program scope allocated in the `{constant}` |
| address space counts as a separate constant argument. |
| -- |
| |
| [[private-or-private]] |
| === `{private}` (or `private`) |
| |
| [open,refpage='private',desc='private Address Space Qualifiers',type='freeform',spec='clang',anchor='private-or-private',xrefs='addressSpaceQualifiers constant genericAddressSpace global local'] |
| -- |
| The private address space is a memory segment that can only be accessed by one |
| work-item. Variables that are not shareable among work-items are allocated in |
| the private address space, and it is the default address space for most |
| variables, in particular variables with automatic storage duration. |
| |
| Example: |
| |
| [source,opencl_c] |
| ---------- |
| kernel void foo(...) |
| { |
| private int i; |
| } |
| ---------- |
| -- |
| |
| [[the-generic-address-space]] |
| === The Generic Address Space |
| |
| [open,refpage='genericAddressSpace',desc='The Generic Address Space',type='freeform',spec='clang',anchor='the-generic-address-space',xrefs='addressSpaceQualifiers constant global local private'] |
| -- |
| |
| The generic address space requires support for OpenCL C 2.0, or OpenCL C 3.0 with |
| the {opencl_c_generic_address_space} feature. It can be used with pointer |
| types and represents a placeholder for any of the named address spaces |
| `global`, `local`, or `private`. It signals that a pointer points to an object |
| in one of these concrete named address spaces. The exact address space |
| can be resolved dynamically during kernel execution. |
| |
| [source,opencl_c] |
| ---------- |
| kernel void foo(int a) |
| { |
| private int b; |
| local int c; |
| int* p = a ? &b : &c; // p points to the local or private address space. |
| } |
| ---------- |
| |
| -- |
| |
| === Usage for Declaration Scopes and Variable Types |
| -- |
| This section describes the use of address space qualifiers with respect to |
| declaration scopes and variable types. |
| |
| Local variables inside functions can be qualified by the private address space |
| qualifier. |
| |
| Variables declared in the outermost compound statement inside the body of the |
| kernel function can be qualified with the `local` or `constant` address space qualifiers. |
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
| kernel void my_func(...) |
| { |
| private float a; // OK. |
| local float b; // OK. |
| |
| if (...) |
| { |
| // Example of variable in __local address space but not |
| // declared at __kernel function scope. |
| local float c; // Error. |
| } |
| } |
| ---------- |
| |
| Program scope variables or variables with an `extern` or `static` storage class |
| specifier: |
| |
| * Must be qualified by `{constant}` in OpenCL C prior to 2.0, or in OpenCL C 3.0 |
| without the {opencl_c_program_scope_global_variables} feature. |
| * Can be qualified by either `{constant}` or `{global}` in OpenCL C 2.0, or in |
| OpenCL C 3.0 with the {opencl_c_program_scope_global_variables} feature. |
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
| // Note: these examples assume OpenCL C 2.0 or the |
| // __opencl_c_program_scope_global_variables feature macro. |
| |
| constant int foo = 1; // OK. |
| global int baz; // OK. |
| global uchar buf[512]; // OK. |
| |
| static global int bat; // OK. Internal linkage. |
| |
| extern constant int foo; // OK. |
| |
| void func(...) |
| { |
| constant static int foo = 1; // OK. |
| global extern int baz; // OK. |
| } |
| |
| global int *global ptr; // OK. |
| constant int *global ptr = &baz; // Error, baz is in the global address space. |
| global int *constant ptr = &baz; // OK. |
| ---------- |
| |
| Kernel function arguments declared to be a pointer or an array of a type |
| must point to one of the named address spaces `{global}`, `{local}` or |
| `{constant}`. |
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
| // OK. |
| kernel void my_kernel(global int *ptr) |
| { |
| ... |
| } |
| // Error, ptr must point to the global, local, or constant address space. |
| kernel void my_kernel(int *ptr) |
| { |
| ... |
| } |
| ---------- |
| -- |
| |
| === Initialization |
| -- |
| Program scope and `static` variables in the `{global}` address space are |
| zero-initialized by default. A constant expression may be given as an initializer. |
| |
| Variables allocated in the `{local}` address space inside a kernel function |
| cannot be initialized. |
| |
| Variables allocated in the `{constant}` address space are required to be initialized, |
| and the values used to initialize these variables must be compile-time constants. |
| |
| Private address space objects are not initialized by default; any initializer |
| may be given. |
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
| global int a = 12; // Initialization is allowed. |
| global int b; // Zero initialized. |
| constant int c = 12; // Initializer is a compile time constant. |
| constant int d; // Error. No initializer provided. |
| kernel void my_func(...) |
| { |
| local float e = 1; // Error. Initializer is not allowed. |
| |
| local float f; |
| f = 1; // Allowed |
| private int g; // Uninitialized. |
| constant int h = g; // Error. Initializer is not a constant expression. |
| } |
| ---------- |
| |
| -- |
| |
| [[addr-spaces-inference]] |
| === Inference |
| |
| -- |
| Address space qualifiers are not required in many cases. If they are not |
| specified explicitly, the default address space is inferred from the |
| declaration scope and the object type. |
| |
| In some situations there is no syntax for specifying an address space in the |
| source; in those situations only the default address space is applicable. |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 with the {opencl_c_program_scope_global_variables} |
| feature, the address space of a variable at program scope or of a `static` |
| or `extern` variable inside a function is inferred to be `{global}`. |
| |
| If the generic address space is supported, i.e. for OpenCL C 2.0 or OpenCL C 3.0 |
| with the {opencl_c_generic_address_space} feature, pointers that are declared |
| without pointing to a named address space point to the generic address space. |
| |
| All string literal storage shall be in the `{constant}` address space. |
| |
| For all other cases that are not listed above, the address space is inferred to |
| be `{private}`. This includes: |
| |
| * All function arguments as well as return values are in the private address |
| space. |
| |
| * Pointers that are declared without pointing to a named address space point |
| to the `{private}` address space if the generic address space is not |
| supported. |
| |
| * Variables inside a function not declared with an address space qualifier |
| are inferred to be in the private address space. |
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
| // Note: these examples assume OpenCL C 2.0 or the |
| // __opencl_c_program_scope_global_variables feature macro. |
| |
| int foo; // Inferred to be in the global address space. |
| |
| static int foo; // Inferred to be in the global address space. |
| |
| int *ptr; // ptr is inferred to be in the global address space. |
| // ptr points to a location in (1) the generic address |
| // space for OpenCL C 2.0 or OpenCL C 3.0 with |
| // __opencl_c_generic_address_space feature or |
| // in (2) the private address space otherwise. |
| |
| int *global ptr; // ptr is declared to be in the global address space. |
| // ptr points to a location in (1) the generic address |
| // space for OpenCL C 2.0 or OpenCL C 3.0 with |
| // __opencl_c_generic_address_space feature or |
| // in (2) the private address space otherwise. |
| |
| constant char *ptr = |
| "Hello"; // string literal is in constant address space. |
| |
| void func(int param) // param is allocated in the private address space. |
| { |
| int foo; // foo is allocated in the private address space. |
| static int foo; // foo is allocated in the global address space. |
| int *ptr; // ptr is allocated in the private address space. |
| // ptr points to a location in (1) the generic address |
| // space for OpenCL C 2.0 or OpenCL C 3.0 with |
| // __opencl_c_generic_address_space feature or |
| // in (2) the private address space otherwise. |
| ... |
| } |
| ---------- |
| |
| [NOTE] |
| ==== |
| Qualifiers must be explicitly specified for: |
| |
| * Program scope variables or variables inside functions with |
| a `static` or `extern` storage class specifier for OpenCL C prior to version 2.0 or |
| OpenCL C 3.0 without the {opencl_c_program_scope_global_variables} feature, and |
| |
| * Pointers used as arguments to kernel functions (the address space pointed |
| to must be specified explicitly). |
| ==== |
| |
| [[table-addr-spaces-summary]] |
| .Address space behavior |
| [width="100%",cols="1,2,2,2",options="header"] |
| |==== |
| | Address Space | Supported Usage | Initialization | Inference |
| |
| | `{global}` |
| | Program scope variables, for OpenCL C 2.0 or |
| OpenCL C 3.0 with the {opencl_c_program_scope_global_variables} feature, |
| |
| `static` or `extern` local variables, for OpenCL C 2.0 or |
| OpenCL C 3.0 with the {opencl_c_program_scope_global_variables} feature, |
| |
| Pointers. |
| | Optional constant initializers, 0-initialized by default. |
| |
| | Program scope variables, for OpenCL C 2.0 or |
| OpenCL C 3.0 with the {opencl_c_program_scope_global_variables} feature. |
| |
| `static` or `extern` local variables, for OpenCL C 2.0 or |
| OpenCL C 3.0 with the {opencl_c_program_scope_global_variables} feature. |
| |
| | `{private}` |
| | Local scope variables, |
| |
| Function arguments and return types, |
| |
| Pointers. |
| |
| | Optional initializers, otherwise no default initialization. |
| | Local scope variables, |
| |
| Function arguments and return types, |
| |
| Pointers in which the address space they point to is not given explicitly, |
| for OpenCL C prior to version 2.0 or OpenCL C 3.0 without the |
| {opencl_c_generic_address_space} feature. |
| |
| | `{constant}` |
| | Program scope variables, |
| |
| Kernel scope variables, |
| |
| String literals, |
| |
| Pointers. |
| | Mandatory initialization with a compile time constant. |
| | String literals. |
| |
| | `{local}` |
| | Kernel scope variables, |
| |
| Pointers. |
| | Not supported. |
| | Not supported. |
| |
| | Generic |
| | Pointers, for OpenCL C 2.0 or OpenCL C 3.0 with the |
| {opencl_c_generic_address_space} feature |
| | Not applicable. |
| | Pointers in which the address space they point to is not given explicitly, |
| for OpenCL C 2.0 or OpenCL C 3.0 with the {opencl_c_generic_address_space} |
| feature. |
| |==== |
| -- |
| |
| [[addr-spaces-conversions]] |
| === Address Space Conversions |
| |
| -- |
| |
| OpenCL implements the address space nesting model for pointers from |
| <<embedded-c-spec, Embedded C, section 5.1.3>> as follows: |
| |
| * In OpenCL the named address spaces `{global}`, `{local}`, |
| `{constant}` and `{private}` are disjoint. |
| * The named address spaces `{global}`, `{local}`, and `{private}` |
| are subsets of the unnamed generic address space. |
| * The unnamed generic address space does not overlap the named `{constant}` |
| address space; the named `{constant}` address space is not in the generic |
| address space. |
| |
| [NOTE] |
| ==== |
| The OpenCL definition of the generic address space is different from the |
| definition in section 5 of the <<embedded-c-spec, Embedded C Specification>>. In |
| OpenCL C, no objects can be allocated in this address space. It can only be used |
| with pointer types, where a pointer pointing to a location in the generic |
| address space can be used for objects allocated in any of the concrete named |
| address spaces `private`, `local`, or `global`. |
| ==== |
| |
| Following section 5.3 of the <<embedded-c-spec, Embedded C Specification>>, a |
| pointer can be converted implicitly (for example, in assignments, when passed as |
| a function argument, or as an operand of an operator) only if the address space |
| of the object it points to is enclosed in the address space pointed to by the |
| destination pointer type. |
| |
| In contrast to the <<embedded-c-spec, Embedded C Specification>>, explicitly |
| converting (i.e. casting) between pointers to non-overlapping address spaces is |
| illegal in OpenCL. |
| |
| Considering the above, the following applies to conversions of pointers pointing |
| to different address spaces: |
| |
| * A pointer that points to the `global`, `local` or `private` address |
| space can be implicitly converted to a pointer to the unnamed generic |
| address space but not vice-versa. |
| * Pointer casts can be used to cast a pointer that points to the `global`, |
| `local` or `private` space to the unnamed generic address space and |
| vice-versa. |
| * A pointer that points to the `constant` address space cannot be cast or |
| implicitly converted to the generic address space. |
| |
| Examples: |
| |
| This is the canonical example. |
| In this example, function `foo` is declared with an argument that is a |
| pointer to the unnamed generic address space. |
| |
| [source,opencl_c] |
| ---------- |
| // Note: these examples assume OpenCL C 2.0 or the |
| // __opencl_c_generic_address_space feature support. |
| |
| void foo(int *a) |
| { |
| *a = *a + 2; |
| } |
| |
| kernel void k1(local int *a) |
| { |
| ... |
| foo(a); |
| ... |
| } |
| |
| kernel void k2(global int *a) |
| { |
| ... |
| foo(a); |
| ... |
| } |
| ---------- |
| |
| In the example below, `var` is a pointer to the unnamed generic address space. |
| A pointer to the `global` or `local` address space may be assigned to `var` |
| depending on the result of a conditional expression. |
| |
| [source,opencl_c] |
| ---------- |
| // Note: these examples assume OpenCL C 2.0 or the |
| // __opencl_c_generic_address_space feature support. |
| |
| kernel void bar(global int *g, local int *l) |
| { |
| int *var; |
| |
| if (get_global_id(0) % 2 == 0) |
| var = g; |
| else |
| var = l; |
| *var = 42; |
| ... |
| } |
| ---------- |
| |
| In the example below, the same pointer to the unnamed generic address |
| space is used to point to objects allocated in different named address spaces. |
| A pointer to the unnamed generic address space may point to |
| objects in the `global`, `local`, and `private` address spaces, |
| but it is not legal for a pointer to the unnamed generic address space to |
| point to an object in the `constant` address space. |
| |
| [source,opencl_c] |
| ---------- |
| // Note: these examples assume OpenCL C 2.0 or the |
| // __opencl_c_generic_address_space feature support. |
| |
| int *ptr; |
| global int g; |
| ptr = &g; // legal |
| |
| local int l; |
| ptr = &l; // legal |
| |
| private int p; |
| ptr = &p; // legal |
| |
| constant int c; |
| ptr = &c; // illegal |
| ---------- |
| |
| In the example below, pointers to named address spaces are assigned to |
| a pointer to the unnamed generic address space. |
| It is legal to assign a pointer to the `global`, `local`, and `private` |
| address spaces to a pointer to the unnamed generic address space without |
| an explicit cast. |
| It is not legal to assign a pointer to the `constant` address space to |
| a pointer to the unnamed generic address space. |
| It is also not legal to assign a pointer to the unnamed generic address |
| space to a pointer to a named address space without a cast. |
| |
| [source,opencl_c] |
| ---------- |
| // Note: these examples assume OpenCL C 2.0 or the |
| // __opencl_c_generic_address_space feature support. |
| |
| global int *gp; |
| local int *lp; |
| private int *pp; |
| constant int *cp; |
| |
| int *p; |
| p = gp; // OK. |
| p = lp; // OK. |
| p = pp; // OK. |
| p = cp; // Error. |
| |
| // it is illegal to convert from a generic pointer |
| // to an explicit address space pointer without a cast: |
| gp = p; // Error. |
| lp = p; // Error. |
| pp = p; // Error. |
| cp = p; // Error. |
| ---------- |
| |
| The example below shows that pointers to different named address spaces |
| cannot be implicitly converted to one another. |
| |
| [source,opencl_c] |
| ---------- |
| global int *gp; |
| local int *lp; |
| private int *pp; |
| constant int *cp; |
| |
| // it is illegal to convert pointers pointing to different |
| // named address spaces. |
| |
| gp = lp; // Error. |
| gp = pp; // Error. |
| gp = cp; // Error. |
| |
| lp = gp; // Error. |
| lp = pp; // Error. |
| lp = cp; // Error. |
| |
| pp = lp; // Error. |
| pp = gp; // Error. |
| pp = cp; // Error. |
| |
| cp = lp; // Error. |
| cp = pp; // Error. |
| cp = gp; // Error. |
| ---------- |
| |
| The example below demonstrates explicit conversions for pointers pointing to |
| different address spaces. |
| |
| [source,opencl_c] |
| ---------- |
| // Note: these examples assume OpenCL C 2.0 or the |
| // __opencl_c_generic_address_space feature support. |
| |
| global int *gp; |
| local int *lp; |
| private int *pp; |
| constant int *cp; |
| |
| int *p; |
| gp = (global int *)lp; // illegal to cast between named address spaces |
| p = (int *)lp; // legal to cast from local to generic |
| gp = (global int*)p; // legal to cast from generic to global |
| ---------- |
| |
| For nested pointers, implicit conversions between address spaces are disallowed. |
| Explicitly casting between different address spaces in nested pointers is |
| allowed, but the use of such pointers can lead to incorrect behavior, such as |
| accessing invalid memory locations. |
| |
| [source,opencl_c] |
| ---------- |
| // Note: these examples assume OpenCL C 2.0 or the |
| // __opencl_c_generic_address_space feature support. |
| |
| kernel void mykernel(...) |
| { |
| // ll is a pointer to a pointer in the local address space, |
| // which points to an integer in the local address space |
| local int *local *ll; |
| |
| // gl is a pointer to a pointer in the local address space, |
| // which points to an integer in the global address space |
| global int *local *gl; |
| |
| // nl is a pointer to a pointer in the local address space, |
| // which points to an integer via the unnamed generic address space |
| int *local * nl; |
| |
| ll = gl; // Error, cannot convert address spaces implicitly |
| // for nested pointers. |
| ll = nl; // Error, cannot convert address spaces implicitly |
| // for nested pointers. |
| ll = (local int* local*)gl; // OK to convert explicitly, |
| // but uses of 'll' can result in |
| // an ill-formed program. |
| ll = (local int* local*)nl; // OK to convert explicitly, |
| // but uses of 'll' can result in |
| // an ill-formed program. |
| } |
| ---------- |
| |
| The following clarifications and examples illustrate how the changes to |
| ISO/IEC 9899:1999 detailed in <<embedded-c-spec, Embedded C, section 5.3>> |
| apply to OpenCL C with the generic address space. |
| |
| *Clause 6.2.5 - Types*: |
| |
| If the address space qualifier on a type T is omitted, refer to |
| <<addr-spaces-inference,Address Space Inference>>. |
| |
| *Clause 6.3.2.3 - Pointers*: |
| |
| Conversions between disjoint address spaces are disallowed in OpenCL; |
| refer to <<addr-spaces-conversions,Address Space Conversions>>. |
| |
| *Clause 6.5.8 - Relational operators*: |
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
| // Note: these examples assume OpenCL C 2.0 or the |
| // __opencl_c_generic_address_space feature support. |
| |
| kernel void test1() |
| { |
| // global variables declared inside a function must be static |
| static global int arr[5] = { 0, 1, 2, 3, 4 }; |
| int *p = &arr[1]; |
| global int *q = &arr[3]; |
| |
| // q implicitly converted to the generic address space |
| // since the generic address space encloses the global |
| // address space |
| if (q >= p) |
| printf("true\n"); |
| |
| // q implicitly converted to the generic address space |
| // since the generic address space encloses the global |
| // address space |
| if (p <= q) |
| printf("true\n"); |
| } |
| ---------- |
| |
| |
| *Clause 6.5.9 - Equality operators*: |
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
| // Note: these examples assume OpenCL C 2.0 or the |
| // __opencl_c_generic_address_space feature support. |
| |
| int *ptr = NULL; |
| local int lval; // local variables cannot be initialized |
| local int *lptr = &lval; |
| static global int gval = SOME_OTHER_VAL; |
| global int *gptr = &gval; |
| |
| lval = SOME_VAL; |
| ptr = lptr; |
| |
| if (ptr == gptr) // legal |
| { |
| ... |
| } |
| |
| if (ptr == lptr) // legal |
| { |
| ... |
| } |
| |
| if (lptr == gptr) // illegal, compiler error |
| { |
| ... |
| } |
| ---------- |
| |
| Consider the following example: |
| |
| [source,opencl_c] |
| ---------- |
| // Note: these examples assume OpenCL C 2.0 or the |
| // __opencl_c_generic_address_space feature support. |
| |
| bool callee(int *p1, int *p2) |
| { |
| if (p1 == p2) |
| return true; |
| return false; |
| } |
| |
| void caller() |
| { |
| global int *gptr = (global int *)0xdeadbeef; |
| private int *pptr = (private int *)0xdeadbeef; |
| |
| // behavior of callee is undefined |
| bool b = callee(gptr, pptr); |
| } |
| ---------- |
| |
| The behavior of `callee` is undefined because `gptr` and `pptr` point into |
| different address spaces. |
| The example above would have the same undefined behavior if the equality |
| operator were replaced with a relational operator. |
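| |
| If code does need to compare generic pointers that may point into different |
| address spaces, one possible approach is to first convert both operands to |
| the same named address space. |
| The sketch below assumes OpenCL C 2.0 or the |
| `+__opencl_c_generic_address_space+` feature, and uses the `to_global` |
| built-in, which returns `NULL` when its argument cannot be cast to the |
| `global` address space. |
| The function name `same_global_location` is hypothetical. |
| |
| [source,opencl_c] |
| ---------- |
| // Note: this sketch assumes OpenCL C 2.0 or the |
| // __opencl_c_generic_address_space feature support. |
| |
| // same_global_location is a hypothetical helper. |
| bool same_global_location(int *p1, int *p2) |
| { |
|     // to_global returns NULL if the pointer cannot be |
|     // cast to the global address space. |
|     global int *g1 = to_global(p1); |
|     global int *g2 = to_global(p2); |
| |
|     // Both operands are now pointers to the global address |
|     // space, so the comparison stays within one address space. |
|     return g1 != NULL && g1 == g2; |
| } |
| ---------- |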
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
| // Note: these examples assume OpenCL C 2.0 or the |
| // __opencl_c_generic_address_space feature support. |
| |
| int *ptr = NULL; |
| local int *lptr = NULL; |
| global int *gptr = NULL; |
| |
| if (ptr == NULL) // legal |
| { |
| ... |
| } |
| |
| if (ptr == lptr) // legal |
| { |
| ... |
| } |
| |
| if (lptr == gptr) // compile-time error |
| { |
| ... |
| } |
| |
| ptr = lptr; // legal |
| |
| intptr_t l = (intptr_t)lptr; |
| if (l == 0) // legal |
| { |
| ... |
| } |
| |
| if (l == NULL) // legal |
| { |
| ... |
| } |
| ---------- |
| |
| *Clause 6.5.15 - Conditional operator*: |
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
| // Note: these examples assume OpenCL C 2.0 or the |
| // __opencl_c_generic_address_space feature support. |
| |
| kernel void test1() |
| { |
| // global variables declared inside a function must be static |
| static global int arr[5] = { 0, 1, 2, 3, 4 }; |
| int *p = &arr[1]; |
| global int *q = &arr[3]; |
| local int *r = NULL; |
| int *val = NULL; |
| |
| // legal. 2nd and 3rd operands are in address spaces |
| // that overlap |
| val = (q >= p) ? q : p; |
| |
| // compiler error. 2nd and 3rd operands are in disjoint |
| // address spaces |
| val = (q >= p) ? q : r; |
| } |
| ---------- |
| |
| *Clause 6.5.16.1 - Simple assignment*: |
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
| // Note: these examples assume OpenCL C 2.0 or the |
| // __opencl_c_generic_address_space feature support. |
| |
| kernel void f() |
| { |
| int *ptr; |
| local int *lptr; |
| global int *gptr; |
| local int val; // local variables cannot be initialized |
| |
| ptr = &val; // legal: implicit cast to generic, then assign |
| lptr = ptr; // illegal: no implicit cast from |
| // generic to local |
| lptr = gptr; // illegal: no implicit cast from |
| // global to local |
| ptr = gptr; // legal: implicit cast from global to generic, |
| // then assign |
| } |
| ---------- |
| |
| *Clause 6.7.3 - Type qualifiers*: |
| |
| Objects with automatic storage duration are in the private address space |
| and can therefore be qualified with `private`/`{private}`. |
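| |
| As a minimal illustration, both variables in the hypothetical kernel below |
| have automatic storage duration and are therefore in the private address |
| space; writing the qualifier explicitly does not change where they live. |
| |
| [source,opencl_c] |
| ---------- |
| kernel void sum_private(global int *out) |
| { |
|     private int a = 1; // explicitly qualified with private |
|     int b = 2;         // no qualifier: also in the private address space |
| |
|     out[get_global_id(0)] = a + b; |
| } |
| ---------- |
| |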
| -- |
| |
| |
| [[access-qualifiers]] |
| == Access Qualifiers |
| |
| [open,refpage='accessQualifiers',desc='Access Qualifiers',type='freeform',spec='clang',anchor='access-qualifiers',xrefs='kernel optionalAttributeQualifiers'] |
| -- |
| |
| Image objects specified as arguments to a kernel can be declared to be |
| read-only or write-only. |
| |
| For OpenCL C 2.0, or with the {opencl_c_read_write_images} feature, |
| image objects specified as arguments to a kernel can additionally be |
| declared to be read-write. |
| |
| The `{read_only}` (or `read_only`) access qualifier specifies that the |
| image object is only being read by a kernel or function. |
| The `{write_only}` (or `write_only`) access qualifier specifies that the |
| image object is only being written to by a kernel or function. |
| The `{read_write}` (or `read_write`) access qualifier specifies that the |
| image object may be both read from and written to by a kernel or function. |
| |
| If no access qualifier is declared, the default access qualifier is `read_only`. |
| |
| In the following example |
| |
| [source,opencl_c] |
| ---------- |
| kernel void |
| foo (read_only image2d_t imageA, |
| write_only image2d_t imageB) |
| { |
| ... |
| } |
| ---------- |
| |
| `imageA` is a read-only 2D image object, and `imageB` is a write-only 2D image |
| object. |
| |
| The sampler-less read image and write image built-ins can be used with images |
| declared with the `{read_write}` (or `read_write`) qualifier. |
| Calling a built-in that reads from an image using a sampler is a compilation |
| error for images declared with the `{read_write}` (or `read_write`) qualifier. |
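| |
| The sketch below illustrates these rules for a kernel that updates an image |
| in place. |
| It assumes OpenCL C 2.0 or the {opencl_c_read_write_images} feature; the |
| kernel name, the scale factor, and `some_sampler` are hypothetical. |
| Each work-item reads and then writes only its own pixel. |
| |
| [source,opencl_c] |
| ---------- |
| // Note: this example assumes OpenCL C 2.0 or the |
| // __opencl_c_read_write_images feature support. |
| |
| kernel void brighten(read_write image2d_t img) |
| { |
|     int2 coord = (int2)((int)get_global_id(0), (int)get_global_id(1)); |
| |
|     // Sampler-less read: allowed for read_write images. |
|     float4 color = read_imagef(img, coord); |
| |
|     // Sampler-less write: allowed for read_write images. |
|     write_imagef(img, coord, color * 1.5f); |
| |
|     // A read that takes a sampler would be a compilation error: |
|     // color = read_imagef(img, some_sampler, coord); |
| } |
| ---------- |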
| |
| Pipe objects specified as arguments to a kernel also use these access |
| qualifiers. |
| See the <<pipe-functions,detailed description on how these access qualifiers |
| can be used with pipes>>. |
| |
| The `{read_only}`, `{write_only}`, `{read_write}`, `read_only`, |
| `write_only` and `read_write` names are reserved for use as access |
| qualifiers and shall not be used otherwise. |
| -- |
| |
| |
| [[function-qualifiers]] |
| == Function Qualifiers |
| |
| |
| [[kernel-or-kernel]] |
| === `{kernel}` (or `kernel`) |
| |
| [open,refpage='kernel',desc='Qualifiers for Kernel Functions',type='freeform',spec='clang',anchor='kernel-or-kernel',xrefs='accessQualifiers optionalAttributeQualifiers',alias='functionQualifiers'] |
| -- |
| |
| The `{kernel}` (or `kernel`) qualifier declares a function to be a kernel |
| that can be executed by an application on one or more OpenCL devices. |
| The following rules apply to functions that are declared with this |
| qualifier: |
| |
| * It can be executed on the device only |
| * It can be called by the host |
| * It is just a regular function call if a `{kernel}` function is called |
| by another kernel function. |
| |
| [NOTE] |
| ==== |
| Kernel functions with variables declared inside the function with the |
| `{local}` or `local` qualifier can be called by the host using appropriate |
| APIs such as *clEnqueueNDRangeKernel*. |
| ==== |
| |
| The `{kernel}` and `kernel` names are reserved for use as function |
| qualifiers and shall not be used otherwise. |
| -- |
| |
| |
| [[optional-attribute-qualifiers]] |
| === Optional Attribute Qualifiers |
| |
| [open,refpage='optionalAttributeQualifiers',desc='Optional Attribute Qualifiers',type='freeform',spec='clang',anchor='optional-attribute-qualifiers',xrefs='accessQualifiers kernel',alias='reqd_work_group_size vec_type_hint work_group_size_hint'] |
| -- |
| |
| The `{kernel}` qualifier can be used with the keyword `+__attribute__+` to |
| declare additional information about the kernel function, as described below. |
| |
| The optional `+__attribute__((vec_type_hint(<type>)))+` |
| footnote:[{fn-vec-type-hint}] is a hint to the compiler and is intended to be a |
| representation of the computational _width_ of the `{kernel}`, and should |
| serve as the basis for calculating processor bandwidth utilization when the |
| compiler is looking to autovectorize the code. |
| In the `+__attribute__((vec_type_hint(<type>)))+` qualifier, `<type>` is one of |
| the built-in vector types listed in <<table-builtin-vector-types,Built-in Vector Data Types>> or |
| one of their constituent scalar element types. |
| If `vec_type_hint (<type>)` is not specified, the kernel is assumed to have |
| the `+__attribute__((vec_type_hint(int)))+` qualifier. |
| |
| For example, where the developer specified a width of `float4`, the compiler |
| should assume that the computation usually uses up to 4 lanes of a `float` |
| vector, and would decide to merge work-items or possibly even separate one |
| work-item into many threads to better match the hardware capabilities. |
| A conforming implementation is not required to autovectorize code, but shall |
| support the hint. |
| A compiler may autovectorize, even if no hint is provided. |
| If an implementation merges N work-items into one thread, it is responsible |
| for correctly handling cases where the number of `global` or `local` |
| work-items in any dimension modulo N is not zero. |
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
| // autovectorize assuming float4 as the |
| // basic computation width |
| __kernel __attribute__((vec_type_hint(float4))) |
| void foo1( __global float4 *p ) { ... } |
| |
| // autovectorize assuming double as the |
| // basic computation width |
| __kernel __attribute__((vec_type_hint(double))) |
| void foo2( __global float4 *p ) { ... } |
| |
| // autovectorize assuming int (default) |
| // as the basic computation width |
| __kernel |
| void foo3( __global float4 *p ) { ... } |
| ---------- |
| |
| If, for example, a `{kernel}` function is declared with |
| |
| [none] |
| * `+__attribute__(( vec_type_hint (float4)))+` |
| |
| (meaning that most operations in the `{kernel}` function are explicitly |
| vectorized using `float4`) and the kernel is running using Intel^{reg}^ |
| Advanced Vector Extensions (Intel^{reg}^ AVX), which implement an |
| 8-float-wide vector unit, the autovectorizer might choose to merge two |
| work-items into one thread, running a second work-item in the high half of |
| the 256-bit AVX register. |
| |
| As another example, a Power4 machine has two scalar double-precision |
| floating-point units with a 6-cycle-deep pipeline. |
| An autovectorizer for the Power4 machine might choose to interleave six |
| work-items of a kernel declared with the `+__attribute__(( vec_type_hint (double2)))+` |
| qualifier into one hardware thread, to ensure that there is always 12-way |
| parallelism available to saturate the FPUs. |
| It might also choose to merge 4 or 8 work-items (or some other number) if it |
| concludes that these are better choices, due to resource utilization |
| concerns or some preference for divisibility by 2. |
| |
| The optional `+__attribute__((work_group_size_hint(X, Y, Z)))+` is a hint to |
| the compiler and is intended to specify the work-group size that may be used, |
| i.e., the value most likely to be specified by the _local_work_size_ argument to |
| *clEnqueueNDRangeKernel*. |
| For example, the `+__attribute__((work_group_size_hint(1, 1, 1)))+` is a |
| hint to the compiler that the kernel will most likely be executed with a |
| work-group size of 1. |
| |
| The optional `+__attribute__((reqd_work_group_size(X, Y, Z)))+` is the |
| work-group size that must be used as the _local_work_size_ argument to |
| *clEnqueueNDRangeKernel*. |
| This allows the compiler to optimize the generated code appropriately for |
| this kernel. |
| |
| If `Z` is one, the _work_dim_ argument to *clEnqueueNDRangeKernel* can be 2 |
| or 3. |
| If `Y` and `Z` are one, the _work_dim_ argument to *clEnqueueNDRangeKernel* |
| can be 1, 2 or 3. |
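| |
| For illustration, the hypothetical kernels below use both attributes. |
| `tile_hint` only suggests a work-group size of 64 x 1 x 1, so other local |
| work sizes remain valid. |
| `tile_2d` must be enqueued with a _local_work_size_ of 16 x 16; because `Z` |
| is one, its _work_dim_ can be 2 or 3. |
| The fixed-size local array in `tile_2d` is a sketch of the kind of |
| optimization the required size makes safe, not a recommended pattern. |
| |
| [source,opencl_c] |
| ---------- |
| // A hint only: any valid local work size may still be used. |
| kernel __attribute__((work_group_size_hint(64, 1, 1))) |
| void tile_hint(global float *data) |
| { |
|     size_t i = get_global_id(0); |
|     data[i] = data[i] * 2.0f; |
| } |
| |
| // The local work size must be exactly 16 x 16 (x 1). |
| kernel __attribute__((reqd_work_group_size(16, 16, 1))) |
| void tile_2d(global float *data, int width) |
| { |
|     // Safe to size statically: the work-group size is |
|     // guaranteed to be 16 x 16 x 1. |
|     local float tile[16][16]; |
| |
|     size_t lx = get_local_id(0); |
|     size_t ly = get_local_id(1); |
|     size_t idx = get_global_id(1) * width + get_global_id(0); |
| |
|     tile[ly][lx] = data[idx]; |
|     barrier(CLK_LOCAL_MEM_FENCE); |
| |
|     // Write back the value loaded by the mirror-image |
|     // work-item in the same row of the tile. |
|     data[idx] = tile[ly][15 - lx]; |
| } |
| ---------- |
| |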
| -- |
| |
| |
| [[storage-class-specifiers]] |
| == Storage-Class Specifiers |
| |
| [open,refpage='storageSpecifiers',desc='Storage-Class Specifiers',type='freeform',spec='clang',anchor='storage-class-specifiers',alias='typedef extern static'] |
| -- |
| |
| The `typedef` storage-class specifier is supported. |
| The `extern` and `static` storage-class specifiers are supported but |
| <<unified-spec, require>> support for OpenCL C 1.2 or newer. |
| The `auto` and `register` storage-class specifiers are not supported. |
| |
| The `extern` storage-class specifier can only be used for functions (kernel |
| and non-kernel functions) and `global` variables declared in program scope |
| or variables declared inside a function (kernel and non-kernel functions). |
| The `static` storage-class specifier can only be used for non-kernel |
| functions, `global` variables declared in program scope and variables inside |
| a function declared in the `global` or `constant` address space. |
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
| extern constant float4 noise_table[256]; |
| static constant float4 color_table[256]; |
| |
| extern kernel void my_foo(image2d_t img); |
| extern void my_bar(global float *a); |
| |
| kernel void my_func(image2d_t img, global float *buf) |
| { |
| extern constant float4 a; |
| static constant float4 b = (float4)(1.0f); // OK. |
| static float c; // Error: No implicit address space |
| global int hurl; // Error: Must be static |
| ... |
| my_foo(img); |
| ... |
| my_bar(buf); |
| ... |
| while (1) |
| { |
| static global int inside; // OK. |
| ... |
| } |
| ... |
| } |
| ---------- |
| -- |
| |
| |
| [[restrictions]] |
| == Restrictions |
| |
| [open,refpage='restrictions',desc='Restrictions',type='freeform',spec='clang',anchor='restrictions'] |
| -- |
| |
| [loweralpha] |
| . The use of pointers is somewhat restricted. |
| The following rules apply: |
| * Arguments to kernel functions declared in a program that are pointers |
| must be declared with the `{global}`, `{constant}` or `{local}` |
| qualifier. |
| * A pointer declared with the `{constant}` qualifier can only be |
| assigned to a pointer declared with the `{constant}` qualifier. |
| * Pointers to functions are not allowed. |
| * Arguments to kernel functions in a program cannot be |
| declared as a pointer to a pointer(s). |
| Variables inside a function or arguments to non-kernel functions in a |
| program can be declared as a pointer to a pointer(s). |
| This restriction only applies to OpenCL C 1.2 or below. |
| . An image type (`image2d_t`, `image3d_t`, `image2d_array_t`, `image1d_t`, |
| `image1d_buffer_t` or `image1d_array_t`) can only be used as the type of |
| a function argument. |
| An image function argument cannot be modified. |
| Elements of an image can only be accessed using the built-in |
| <<image-read-and-write-functions,image read and write functions>>. |
| + |
| An image type cannot be used to declare a variable, a structure or union |
| field, an array of images, a pointer to an image, or the return type of a |
| function. |
| An image type cannot be used with the `{global}`, `{private}`, |
| `{local}` and `{constant}` address space qualifiers. |
| + |
| The sampler type (`sampler_t`) can only be used as the type of a function |
| argument or a variable declared in the program scope or the outermost scope |
| of a kernel function. |
| The behavior of a sampler variable declared in a non-outermost scope of a |
| kernel function is implementation-defined. |
| A sampler argument or variable cannot be modified. |
| + |
| The sampler type cannot be used to declare a structure or union field, an |
| array of samplers, a pointer to a sampler, or the return type of a function. |
| The sampler type cannot be used with the `{local}` and `{global}` |
| address space qualifiers. |
| . [[restrictions-bitfield]] Bit-field struct members are currently not |
| supported. |
| . [[restrictions-variable-length]] Variable length arrays and structures |
| with flexible (or unsized) arrays are not supported. |
| . Variadic functions are not supported, with the exception of `printf` and |
| `enqueue_kernel`. |
| . Variadic macros are not supported. |
| This restriction only applies to OpenCL C 2.0 or below. |
| . If a list of parameters in a function declaration is empty, the function |
| takes no arguments. This is due to the above restriction on variadic |
| functions. |
| . Unless defined in the OpenCL specification, the library functions, |
| macros, types, and constants defined in the C99 standard headers |
| `assert.h`, `ctype.h`, `complex.h`, `errno.h`, `fenv.h`, `float.h`, |
| `inttypes.h`, `limits.h`, `locale.h`, `setjmp.h`, `signal.h`, |
| `stdarg.h`, `stdio.h`, `stdlib.h`, `string.h`, `tgmath.h`, `time.h`, |
| `wchar.h` and `wctype.h` are not available and cannot be included by a |
| program. |
| . The `auto` and `register` storage-class specifiers are not supported. |
| . Predefined identifiers are not supported. |
| This restriction only applies to OpenCL C 1.1 or below. |
| . Recursion is not supported. |
| . The return type of a kernel function must be `void`. |
| . Arguments to kernel functions in a program cannot be declared with the |
| built-in scalar types `bool`, `size_t`, `ptrdiff_t`, `intptr_t`, or |
| `uintptr_t`, or with a struct or union that contains fields declared with |
| one of these built-in scalar types. |
| . `half` is not supported as `half` can be used as a storage format |
| footnote:[{fn-cl_khr_fp16}] only and is not a data type on which |
| floating-point arithmetic can be performed. |
| . Whether irreducible control flow is illegal is implementation-defined. |
| . The following restriction only applies to |
| ifndef::cl_khr_byte_addressable_store[OpenCL C 1.0: +] |
| ifdef::cl_khr_byte_addressable_store[] |
| OpenCL C 1.0, and only if the `<<cl_khr_byte_addressable_store>>` |
| extension macro is not supported: + |
| endif::cl_khr_byte_addressable_store[] |
| Built-in types that are less than 32-bits in size, i.e. |
| `char`, `uchar`, `char2`, `uchar2`, `short`, `ushort`, and `half`, have |
| the following restriction: |
| + |
| * Writes through a pointer to (or to an array of) `char`, |
| `uchar`, `char2`, `uchar2`, `short`, `ushort`, or `half`, or to |
| elements of a struct that are of type `char`, `uchar`, `char2`, |
| `uchar2`, `short`, or `ushort` are not supported. |
| Refer to _section 9.9_ for additional information. |
| + |
| The kernel example below shows what memory operations are not supported on |
| built-in types less than 32-bits in size. |
| + |
| [source,opencl_c] |
| ---------- |
| kernel void |
| do_proc (__global char *pA, short b, |
| __global short *pB) |
| { |
| char x[100]; |
| __private char *px = x; |
| int id = (int)get_global_id(0); |
| short f; |
| |
| f = pB[id] + b; // is allowed |
| px[1] = pA[1]; // error. px cannot be written. |
| pB[id] = b; // error. pB cannot be written |
| } |
| ---------- |
| . The type qualifiers `const`, `restrict` and `volatile` as defined by the |
| C99 specification are supported. |
| These qualifiers cannot be used with `image2d_t`, `image3d_t`, |
| `image2d_array_t`, `image2d_depth_t`, `image2d_array_depth_t`, |
| `image1d_t`, `image1d_buffer_t` and `image1d_array_t` types. |
| Types other than pointer types shall not use the `restrict` qualifier. |
| . The event type (`event_t`) cannot be used as the type of a kernel |
| function argument. |
| The event type cannot be used to declare a program scope variable. |
| The event type cannot be used to declare a structure or union field. |
| The event type cannot be used with the `{local}`, `{constant}` and |
| `{global}` address space qualifiers. |
| . The `clk_event_t`, `ndrange_t` and `reserve_id_t` types cannot be used |
| as arguments to kernel functions that get enqueued from the host. |
| The `clk_event_t` and `reserve_id_t` types cannot be declared in program |
| scope. |
| . Pointer arguments to kernels enqueued by the host must continue to be |
| declared to point to a named address space. |
| . A function in an OpenCL program cannot be called `main`. |
| . Implicit function declaration is not supported. |
| . Program scope variables can be defined with any valid OpenCL C data type |
| except for those in <<table-other-builtin-types,Other Built-in Data Types>>. |
| Such program scope variables may be of any user-defined type, or a pointer |
| to a user-defined type. |
| + |
| In the presence of shared virtual memory, these pointers or pointer |
| members should work as expected as long as they are shared virtual memory |
| pointers and the referenced storage has been mapped appropriately. |
| Program scope variables can be declared with the `{constant}` address space |
| qualifier or, if the {opencl_c_program_scope_global_variables} feature is |
| supported, with the `{global}` address space qualifier. |
| -- |
| ifdef::cl_khr_initialize_memory[] |
| . [[restrictions-initialize-memory]] The following restriction only |
| applies if the `<<cl_khr_initialize_memory>>` extension is supported: + |
| If the context is created with `CL_CONTEXT_MEMORY_INITIALIZE_KHR`, |
| appropriate memory locations as specified by the bit-field are |
| initialized with zeroes, prior to the start of execution of any kernel. |
| The driver chooses when, prior to kernel execution, the initialization of |
| local and/or private memory is performed. |
| The only requirement is that no values set from outside the context can be |
| read during a kernel execution. |
| endif::cl_khr_initialize_memory[] |
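| |
| The sketch below, which assumes a device with image support, shows image |
| and sampler usage that stays within the image and sampler restrictions listed |
| above. |
| The sampler, function, and kernel names are hypothetical. |
| |
| [source,opencl_c] |
| ---------- |
| // Note: this example assumes the device supports images. |
| |
| // A sampler may be declared in program scope, here in the |
| // constant address space. |
| constant sampler_t nearest_clamp = CLK_NORMALIZED_COORDS_FALSE | |
|                                    CLK_ADDRESS_CLAMP_TO_EDGE | |
|                                    CLK_FILTER_NEAREST; |
| |
| // Image types appear only as function argument types, and |
| // the image argument itself is never modified. |
| float4 fetch(read_only image2d_t img, int2 coord) |
| { |
|     return read_imagef(img, nearest_clamp, coord); |
| } |
| |
| kernel void copy_image(read_only image2d_t src, |
|                        write_only image2d_t dst) |
| { |
|     int2 coord = (int2)((int)get_global_id(0), (int)get_global_id(1)); |
|     write_imagef(dst, coord, fetch(src, coord)); |
| |
|     // Each of the following would be an error: |
|     // image2d_t tmp;         // an image type cannot declare a variable |
|     // sampler_t samplers[2]; // arrays of samplers are not allowed |
| } |
| ---------- |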
| |
| |
| [[preprocessor-directives-and-macros]] |
| == Preprocessor Directives and Macros |
| |
| [open,refpage='preprocessorDirectives',desc='Preprocessor Directives and Macros',type='freeform',spec='clang',anchor='preprocessor-directives-and-macros',xrefs='clBuildProgram mathConstants EXTENSION FP_CONTRACT',alias='CL_VERSION_1_0 CL_VERSION_1_1 CL_VERSION_1_2 CL_VERSION_2_0 CL_VERSION_2_1 CL_VERSION_2_2 CL_VERSION_3_0'] |
| -- |
| |
| The preprocessing directives defined by the C99 specification are supported. |
| |
| The *#pragma* directive is described as: |
| |
| [none] |
| * *#pragma* _pp-tokens~opt~_ _new-line_ |
| |
| A *#pragma* directive where the preprocessing token `OPENCL` (used instead |
| of *`STDC`*) does not immediately follow *#pragma* in the directive (prior to |
| any macro replacement) causes the implementation to behave in an |
| implementation-defined manner. |
| The behavior might cause translation to fail or cause the translator or the |
| resulting program to behave in a non-conforming manner. |
| Any such *#pragma* that is not recognized by the implementation is ignored. |
| If the preprocessing token `OPENCL` does immediately follow *#pragma* in the |
| directive (prior to any macro replacement), then no macro replacement is |
| performed on the directive, and the directive shall have one of the |
| following forms whose meanings are described elsewhere: |
| |
| [source,opencl_c] |
| ---------- |
| // on-off-switch is one of ON, OFF, or DEFAULT |
| #pragma OPENCL FP_CONTRACT on-off-switch |
| |
| #pragma OPENCL EXTENSION extensionname : behavior |
| |
| #pragma OPENCL EXTENSION all : behavior |
| ---------- |
| |
| The following predefined macro names are available. |
| |
| `+__FILE__+` :: |
| The presumed name of the current source file (a character string |
| literal). |
| |
| `+__LINE__+` :: |
| The presumed line number (within the current source file) of the current |
| source line (an integer constant). |
| |
| `+__OPENCL_VERSION__+` :: |
| For OpenCL devices with OpenCL version less than or equal to OpenCL 2.0, |
| substitutes an integer value reflecting the OpenCL version supported by the |
| device. |
| This predefined macro is <<unified-spec, deprecated by>> OpenCL 2.1. |
| For OpenCL devices with OpenCL version greater than OpenCL 2.0, it must be |
| defined but may substitute any implementation-defined integer value greater |
| than 200, reflecting OpenCL 2.0. footnote:[{fn-OPENCL_VERSION}] |
| |
| `CL_VERSION_1_0` :: |
| Substitutes the integer 100 reflecting the OpenCL 1.0 version. |
| <<unified-spec, Requires>> support for OpenCL C 1.1 or newer. |
| |
| `CL_VERSION_1_1` :: |
| Substitutes the integer 110 reflecting the OpenCL 1.1 version. |
| <<unified-spec, Requires>> support for OpenCL C 1.1 or newer. |
| |
| `CL_VERSION_1_2` :: |
| Substitutes the integer 120 reflecting the OpenCL 1.2 version. |
| <<unified-spec, Requires>> support for OpenCL C 1.2 or newer. |
| |
| `CL_VERSION_2_0` :: |
| Substitutes the integer 200 reflecting the OpenCL 2.0 version. |
| <<unified-spec, Requires>> support for OpenCL C 2.0 or newer. |
| |
| `CL_VERSION_3_0` :: |
| Substitutes the integer 300 reflecting the OpenCL 3.0 version. |
| <<unified-spec, Requires>> support for OpenCL C 3.0 or newer. |
| |
| `+__OPENCL_C_VERSION__+` :: |
| Substitutes an integer reflecting the OpenCL C version specified by the |
| `-cl-std` build option (see <<opencl-spec,OpenCL Specification>>) to |
| *clBuildProgram* or *clCompileProgram*. |
| If the `-cl-std` build option is not specified, the highest OpenCL C 1.x |
| language version supported by each device is used as the version of |
| OpenCL C when compiling the program for each device. |
| <<unified-spec, Requires>> support for OpenCL C 1.2 or newer. |
| |
| `+__ROUNDING_MODE__+` :: |
| Used to determine the current rounding mode and is set to rte. |
| Only affects the rounding mode of conversions to a float type. |
| <<unified-spec, Deprecated by>> OpenCL C 1.1, along with the |
| `<<cl_khr_select_fprounding_mode>>` extension. |
| |
| `+__ENDIAN_LITTLE__+` :: |
| Used to determine if the OpenCL device is a little endian architecture |
| or a big endian architecture (an integer constant of 1 if the device is |
| little endian and is undefined otherwise). |
| Also refer to the value of the <<opencl-device-queries, |
| `CL_DEVICE_ENDIAN_LITTLE` device query>>. |
| |
| `+__kernel_exec(X, typen)+` (and `kernel_exec(X, typen)`) :: |
| is defined as: |
| |
| [source,opencl_c] |
| ---------- |
| __kernel __attribute__((work_group_size_hint(X, 1, 1))) \ |
| __attribute__((vec_type_hint(typen))) |
| ---------- |
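| |
| For example, the hypothetical kernel below uses `+__kernel_exec+` to combine |
| a work-group size hint of 64 with a `float4` vector type hint; the comment |
| shows the equivalent spelled-out attributes. |
| |
| [source,opencl_c] |
| ---------- |
| // Equivalent to: |
| // __kernel __attribute__((work_group_size_hint(64, 1, 1))) |
| //          __attribute__((vec_type_hint(float4))) |
| __kernel_exec(64, float4) |
| void scale(__global float4 *data, float factor) |
| { |
|     size_t i = get_global_id(0); |
|     data[i] *= factor; |
| } |
| ---------- |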
| |
| `+__IMAGE_SUPPORT__+` :: |
| Used to determine if the OpenCL device supports images. |
| This is an integer constant of 1 if images are supported and is |
| undefined otherwise. |
| Also refer to the value of the <<opencl-device-queries, |
| `CL_DEVICE_IMAGE_SUPPORT` device query>> and the {opencl_c_images} |
| feature. |
| |
| `+__FAST_RELAXED_MATH__+` :: |
| Used to determine if the `-cl-fast-relaxed-math` optimization option is |
| specified in build options given to *clBuildProgram* or |
| *clCompileProgram*. |
| This is an integer constant of 1 if the `-cl-fast-relaxed-math` build |
| option is specified and is undefined otherwise. |
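| |
| A program can test `+__FAST_RELAXED_MATH__+` to choose between a faster, less |
| precise code path and a fully accurate one. |
| In the sketch below, `MY_EXP` and `softplus` are hypothetical names, and |
| switching to `native_exp` under relaxed math is an illustrative design |
| choice rather than a requirement. |
| |
| [source,opencl_c] |
| ---------- |
| #ifdef __FAST_RELAXED_MATH__ |
| // -cl-fast-relaxed-math was specified, so precision is |
| // already relaxed: use the native implementation. |
| #define MY_EXP(x) native_exp(x) |
| #else |
| #define MY_EXP(x) exp(x) |
| #endif |
| |
| kernel void softplus(global float *data) |
| { |
|     size_t i = get_global_id(0); |
|     data[i] = log(1.0f + MY_EXP(data[i])); |
| } |
| ---------- |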
| |
| The `NULL` macro expands to a null pointer constant. |
| An integer constant expression with the value 0, or such an expression cast |
| to type `void *` is called a _null pointer constant_. |
| <<unified-spec, Requires>> support for OpenCL C 2.0 or newer. |
| |
| The macro names defined by the C99 specification but not currently supported |
| by OpenCL are reserved for future use. |
| |
| The predefined identifier `+__func__+` is available. |
| <<unified-spec, Requires>> support for OpenCL C 1.2 or newer. |
| |
| In OpenCL C 3.0 or newer there are a number of optional predefined macros |
| indicating optional language features. Such macros are listed in the |
| <<table-optional-lang-features, optional features in OpenCL C 3.0 table>>. |
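| |
| For example, code that uses the generic address space only when it is |
| available might test both the language version and the corresponding |
| feature macro. |
| This sketch assumes that OpenCL C 2.0 always provides the generic address |
| space and that OpenCL C 3.0 reports it through the |
| `+__opencl_c_generic_address_space+` feature macro; the helper names are |
| hypothetical. |
| |
| [source,opencl_c] |
| ---------- |
| #if (defined(__OPENCL_C_VERSION__) && __OPENCL_C_VERSION__ == 200) || \ |
|     defined(__opencl_c_generic_address_space) |
| // Generic address space available: one helper accepts |
| // pointers to global, local, or private memory. |
| int load_first(int *p) |
| { |
|     return p[0]; |
| } |
| #else |
| // Without it, a separate helper is needed for each |
| // named address space that is used. |
| int load_first_global(global int *p) |
| { |
|     return p[0]; |
| } |
| #endif |
| ---------- |
| |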
| -- |
| |
| |
| [[attribute-qualifiers]] |
| == Attribute Qualifiers |
| |
| // [open,refpage='attribute',desc='Attribute Qualifiers',type='freeform',spec='clang',anchor='attribute-qualifiers',xrefs='attributes-types attributes-variables attributes-blocksAndControlFlow attributes-loopUnroll'] |
| // -- |
| |
| This section describes the syntax with which `+__attribute__+` may be used, |
| and the constructs to which attribute specifiers bind. |
| |
| An attribute specifier is of the form |
| |
| `+__attribute__ ((_attribute-list_))+`. |
| |
| An attribute list is defined as: |
| |
| [role="bnf"] |
| -- |
| _attribute-list_ : :: |
| _attribute~opt~_ + |
| _attribute-list_ , _attribute~opt~_ |
| |
| _attribute_ : :: |
| _attribute-token_ _attribute-argument-clause~opt~_ |
| |
| _attribute-token_ : :: |
| _identifier_ |
| |
| _attribute-argument-clause_ : :: |
| ( _attribute-argument-list_ ) |
| |
| _attribute-argument-list_ : :: |
| _attribute-argument_ + |
| _attribute-argument-list_ , _attribute-argument_ |
| |
| _attribute-argument_ : :: |
| _assignment-expression_ |
| -- |
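| |
| As an illustration of this grammar, the attribute list in the hypothetical |
| declaration below contains two attributes, each with an argument clause, |
| separated by a comma inside a single `+__attribute__+` specifier. |
| |
| [source,opencl_c] |
| ---------- |
| kernel __attribute__((work_group_size_hint(64, 1, 1), vec_type_hint(float4))) |
| void scale_hinted(global float4 *data, float factor) |
| { |
|     data[get_global_id(0)] *= factor; |
| } |
| ---------- |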
| |
| This syntax is taken directly from GCC, but unlike GCC, which allows |
| attributes to be applied only to functions, types, and variables, OpenCL |
| attributes can be associated with: |
| |
| * types; |
| * functions; |
| * variables; |
| * blocks; and |
| * control-flow statements. |
| |
| In general, the rules for how an attribute binds, for a given context, are |
| non-trivial and the reader is pointed to GCC's documentation and Maurer and |
| Wong's paper [See 16. and 17. in _section 11_ - *References*] for the details. |
| |
| // -- end 'attribute' open block |
| |
| [[specifying-attributes-of-types]] |
| === Specifying Attributes of Types |
| |
| [open,refpage='attributes-types',desc='Specifying Attribute of Types',type='freeform',spec='clang',anchor='specifying-attributes-of-types'] |
| -- |
| |
| The keyword `+__attribute__+` allows you to specify special attributes of |
| enum, struct and union types when you define such types. |
| This keyword is followed by an attribute specification inside double |
| parentheses. |
Two attributes are currently defined for types: `aligned` and `packed`.
| |
| You may specify type attributes in an enum, struct or union type declaration |
| or definition, or for other types in a `typedef` declaration. |
| |
| For an enum, struct or union type, you may specify attributes either between |
| the enum, struct or union tag and the name of the type, or just past the |
| closing curly brace of the _definition_. |
| The former syntax is preferred. |
| |
| `aligned (_alignment_)` :: |
| This attribute specifies a minimum alignment (in bytes) for variables of the |
| specified type. |
| + |
| -- |
| For example, the declarations: |
| |
| [source,opencl_c] |
| ---------- |
| struct S { short f[3]; } __attribute__ ((aligned (8))); |
| typedef int more_aligned_int __attribute__ ((aligned (8))); |
| ---------- |
| |
force the compiler to ensure (as far as it can) that each variable whose
type is `struct S` or `more_aligned_int` will be allocated and aligned _at
least_ on an 8-byte boundary.
| |
| Note that the alignment of any given struct or union type is required by the |
| ISO C standard to be at least a perfect multiple of the lowest common |
| multiple of the alignments of all of the members of the struct or union in |
| question and must also be a power of two. |
| This means that you _can_ effectively adjust the alignment of a struct or |
| union type by attaching an aligned attribute to any one of the members of |
| such a type, but the notation illustrated in the example above is a more |
| obvious, intuitive, and readable way to request the compiler to adjust the |
| alignment of an entire struct or union type. |
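As an illustration only, the following host-side C sketch checks the effect
of the two declarations from the example above.
It assumes a GCC- or Clang-compatible host compiler and a 2-byte `short`;
`struct S_plain`, `s_var`, `i_var`, and `is_aligned` are hypothetical names
added for the check.
An OpenCL device compiler may be limited to a smaller maximum alignment, as
described later in this section.

[source,c]
----------
#include <stddef.h>
#include <stdint.h>

/* Declarations from the example above. */
struct S { short f[3]; } __attribute__ ((aligned (8)));
typedef int more_aligned_int __attribute__ ((aligned (8)));

/* Hypothetical comparison type without the attribute:
 * alignment 2 and size 6, assuming a 2-byte short. */
struct S_plain { short f[3]; };

/* Hypothetical variables of the attributed types. */
struct S s_var;
more_aligned_int i_var;

/* Hypothetical helper: nonzero if p lies on an n-byte boundary. */
int is_aligned(const void *p, size_t n)
{
    return ((uintptr_t) p % n) == 0;
}
----------

Because the size of a C type is always a multiple of its alignment,
`sizeof(struct S)` also grows from 6 to 8 bytes in this sketch.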
| |
| As in the preceding example, you can explicitly specify the alignment (in |
| bytes) that you wish the compiler to use for a given struct or union type. |
| Alternatively, you can leave out the alignment factor and just ask the |
| compiler to align a type to the maximum useful alignment for the target |
| machine you are compiling for. |
| For example, you could write: |
| |
| [source,opencl_c] |
| ---------- |
| struct S { short f[3]; } __attribute__ ((aligned)); |
| ---------- |
| |
| Whenever you leave out the alignment factor in an aligned attribute |
| specification, the compiler automatically sets the alignment for the type to |
| the largest alignment which is ever used for any data type on the target |
| machine you are compiling for. |
| In the example above, the size of each `short` is 2 bytes, and therefore the |
| size of the entire `struct S` type is 6 bytes. |
| The smallest power of two which is greater than or equal to that is 8, so |
| the compiler sets the alignment for the entire `struct S` type to 8 bytes. |
| |
| Note that the effectiveness of aligned attributes may be limited by inherent |
| limitations of the OpenCL device and compiler. |
| For some devices, the OpenCL compiler may only be able to arrange for |
| variables to be aligned up to a certain maximum alignment. |
If the OpenCL compiler is only able to align variables up to a maximum of
8-byte alignment, then specifying `aligned(16)` in an `+__attribute__+` will
still only provide you with 8-byte alignment.
| See your platform-specific documentation for further information. |
| |
The aligned attribute can only increase the alignment, but you can decrease
it by specifying packed as well.
See below.
| -- |
| |
| `packed` :: |
This attribute, attached to a struct or union type definition, specifies
that each member of the structure or union is placed to minimize the memory
required.
| When attached to an enum definition, it indicates that the smallest integral |
| type should be used. |
| + |
| -- |
| Specifying this attribute for struct and union types is equivalent to |
| specifying the packed attribute on each of the structure or union members. |
| |
| In the following example, the members of `my_packed_struct` are packed |
| closely together, but the internal layout of its `s` member is not packed. |
| To do that, struct `my_unpacked_struct` would need to be packed, too. |
| |
| [source,opencl_c] |
| ---------- |
| struct my_unpacked_struct |
| { |
| char c; |
| int i; |
| }; |
| |
| struct __attribute__ ((packed)) my_packed_struct |
| { |
| char c; |
| int i; |
| struct my_unpacked_struct s; |
| }; |
| ---------- |
| |
You may only specify this attribute on the definition of an enum, struct or
union, not on a `typedef` which does not also define the enumerated type,
structure or union.
| -- |
| -- |
| |
| [[specifying-attributes-of-functions]] |
| === Specifying Attributes of Functions |
| |
| See <<function-qualifiers,Function Qualifiers>> for the function attribute |
| qualifiers currently supported. |
| |
| |
| [[specifying-attributes-of-variables]] |
| === Specifying Attributes of Variables |
| |
| [open,refpage='attributes-variables',desc='Specifying Attribute of Variables',type='freeform',spec='clang',anchor='specifying-attributes-of-variables'] |
| -- |
| |
| The keyword `+__attribute__+` allows you to specify special attributes of |
| variables or structure fields. |
| This keyword is followed by an attribute specification inside double |
| parentheses. |
| The following attribute qualifiers are currently defined: |
| |
| `aligned (_alignment_)` :: |
| |
| This attribute specifies a minimum alignment for the variable or structure |
| field, measured in bytes. |
| For example, the declaration: |
| + |
| [source,opencl_c] |
| ---------- |
| int x __attribute__ ((aligned (16))) = 0; |
| ---------- |
| + |
| causes the compiler to allocate the global variable `x` on a 16-byte |
| boundary. |
| The alignment value specified must be a power of two. |
| + |
| You can also specify the alignment of structure fields. |
| For example, to create a double-word aligned `int` pair, you could write: |
| + |
| [source,opencl_c] |
| ---------- |
| struct foo { int x[2] __attribute__ ((aligned (8))); }; |
| ---------- |
| + |
| This is an alternative to creating a union with a `double` member that |
| forces the union to be double-word aligned. |
| + |
| As in the preceding examples, you can explicitly specify the alignment (in |
| bytes) that you wish the compiler to use for a given variable or structure |
| field. |
| Alternatively, you can leave out the alignment factor and just ask the |
| compiler to align a variable or field to the maximum useful alignment for |
| the target machine you are compiling for. |
| For example, you could write: |
| + |
| [source,opencl_c] |
| ---------- |
| short array[3] __attribute__ ((aligned)); |
| ---------- |
| + |
| Whenever you leave out the alignment factor in an aligned attribute |
| specification, the OpenCL compiler automatically sets the alignment for the |
| declared variable or field to the largest alignment which is ever used for |
| any data type on the target device you are compiling for. |
| + |
| When used on a struct, or struct member, the aligned attribute can only |
| increase the alignment; in order to decrease it, the packed attribute must |
| be specified as well. |
| When used as part of a `typedef`, the aligned attribute can both increase |
| and decrease alignment, and specifying the packed attribute will generate a |
| warning. |
| + |
| Note that the effectiveness of aligned attributes may be limited by inherent |
| limitations of the OpenCL device and compiler. |
| For some devices, the OpenCL compiler may only be able to arrange for |
| variables to be aligned up to a certain maximum alignment. |
If the OpenCL compiler is only able to align variables up to a maximum of
8-byte alignment, then specifying `aligned(16)` in an `+__attribute__+` will
still only provide you with 8-byte alignment.
| See your platform-specific documentation for further information. |
| |
| `packed` :: |
| |
| The packed attribute specifies that a variable or structure field should |
| have the smallest possible alignment -- one byte for a variable, unless you |
| specify a larger value with the aligned attribute. |
| + |
Here is a structure in which the field `x` is packed, so that it immediately
follows `a`:
| + |
| [source,opencl_c] |
| ---------- |
| struct foo |
| { |
| char a; |
| int x[2] __attribute__ ((packed)); |
| }; |
| ---------- |
| + |
| An attribute list placed at the beginning of a user-defined type applies to |
| the variable of that type and not the type, while attributes following the |
| type body apply to the type. |
| + |
| For example: |
| + |
| [source,opencl_c] |
| ---------- |
| /* a has alignment of 128 */ |
| __attribute__((aligned(128))) struct A {int i;} a; |
| |
| /* b has alignment of 16 */ |
| __attribute__((aligned(16))) struct B {double d;} |
| __attribute__((aligned(32))) b ; |
| |
| struct A a1; /* a1 has alignment of 4 */ |
| |
| struct B b1; /* b1 has alignment of 32 */ |
| ---------- |
| |
| `endian (_endiantype_)` :: |
| |
| The endian attribute determines the byte ordering of a variable. |
| _endiantype_ can be set to `host` indicating the variable uses the |
| endianness of the host processor or can be set to `device` indicating the |
| variable uses the endianness of the device on which the kernel will be |
| executed. |
| The default is `device`. |
| + |
| For example: |
| + |
| [source,opencl_c] |
| ---------- |
| global float4 *p __attribute__ ((endian(host))); |
| ---------- |
| + |
specifies that data stored in memory pointed to by `p` will be in the host
endian format.
| + |
| The endian attribute can only be applied to pointer types that are in the |
| `global` or `constant` address space. |
| The endian attribute cannot be used for variables that are not a pointer |
| type. |
| The endian attribute value for both pointers must be the same when one |
| pointer is assigned to another. |
| |
| `nosvm` :: |
| The `nosvm` attribute can be used with a pointer variable to inform the |
| compiler that the pointer does not refer to a shared virtual memory region. |
| <<unified-spec, Requires>> support for OpenCL C 2.0 or newer. |
| // __attribute__((nosvm)) was both added and deprecated in OpenCL C 2.0, |
| // presumably it was a later revision that deprecated the attribute. |
| |
| [NOTE] |
| ==== |
| The `nosvm` attribute is deprecated, and the compiler can ignore it. |
| ==== |
| |
| -- |
| |
| |
| [[specifying-attributes-of-blocks-and-control-flow-statements]] |
| === Specifying Attributes of Blocks and Control-Flow-Statements |
| |
| [open,refpage='attributes-blocksAndControlFlow',desc='Specifying Attribute of Blocks and Control-Flow-Statements',type='freeform',spec='clang',anchor='specifying-attributes-of-blocks-and-control-flow-statements'] |
| -- |
| |
For basic blocks and control-flow statements, the attribute is placed before
the structure in question, for example:
| |
| [source,opencl_c] |
| ---------- |
| __attribute__((attr1)) {...} |
| |
| for __attribute__((attr2)) (...) __attribute__((attr3)) {...} |
| ---------- |
| |
| Here `attr1` applies to the block in braces and `attr2` and `attr3` apply to |
| the loop's control construct and body, respectively. |
| |
| No attribute qualifiers for blocks and control-flow-statements are currently |
| defined. |
| -- |
| |
| |
| [[specifying-attribute-for-unrolling-loops]] |
| === Specifying Attribute for Unrolling Loops |
| |
| [open,refpage='attributes-loopUnroll',desc='Specifying Attribute For Unrolling Loops',type='freeform',spec='clang',anchor='specifying-attribute-for-unrolling-loops'] |
| -- |
| |
| NOTE: The functionality described in this section <<unified-spec, requires>> |
| support for OpenCL C 2.0 or newer. |
| |
The `+__attribute__((opencl_unroll_hint))+` and
`+__attribute__((opencl_unroll_hint(n)))+` attribute qualifiers can be used
to specify that a loop (`for`, `while`, and `do` loops) can be unrolled.
This attribute qualifier can be used to specify full unrolling or partial
unrolling by a specified amount.
This is a compiler hint, and the compiler may ignore this directive.
| |
_n_ is the loop unrolling factor and must be a positive integral
compile-time constant expression.
An unroll factor of 1 disables unrolling.
If _n_ is not specified, the compiler determines the unrolling factor for
the loop.
| |
| [NOTE] |
| ==== |
| The `+__attribute__((opencl_unroll_hint(n)))+` attribute qualifier must |
| appear immediately before the loop to be affected. |
| ==== |
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
| __attribute__((opencl_unroll_hint(2))) |
| while (*s != 0) |
| *p++ = *s++; |
| ---------- |
| |
This tells the compiler to unroll the above `while` loop by a factor of 2.
| |
| [source,opencl_c] |
| ---------- |
| __attribute__((opencl_unroll_hint)) |
| for (int i=0; i<2; i++) |
| { |
| ... |
| } |
| ---------- |
| |
| In the example above, the compiler will determine how much to unroll the |
| loop. |
| |
| [source,opencl_c] |
| ---------- |
| __attribute__((opencl_unroll_hint(1))) |
| for (int i=0; i<32; i++) |
| { |
| ... |
| } |
| ---------- |
| |
| The above is an example where the loop should not be unrolled. |
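For the first example above (unroll factor 2), the following plain C sketch
shows one possible shape of the unrolled loop.
It only illustrates the kind of transformation the hint permits: the
compiler performs any unrolling itself, may produce a different form, or
may ignore the hint entirely.
The function names are hypothetical.

[source,c]
----------
/* The loop from the first example: copies characters up to,
 * but not including, the terminating zero. */
void copy_rolled(char *p, const char *s)
{
    while (*s != 0)
        *p++ = *s++;
}

/* A hand-unrolled equivalent with factor 2: each trip through the loop
 * performs up to two copies, re-testing the terminator before the second
 * copy so that odd-length inputs behave exactly like the original. */
void copy_unrolled2(char *p, const char *s)
{
    while (*s != 0) {
        *p++ = *s++;
        if (*s == 0)
            break;
        *p++ = *s++;
    }
}
----------

Because both functions produce identical results for every input, the hint
changes performance characteristics only, never program behavior.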
| |
| Below are some examples of invalid usage of |
| `+__attribute__((opencl_unroll_hint(n)))+`. |
| |
| [source,opencl_c] |
| ---------- |
| __attribute__((opencl_unroll_hint(-1))) |
| while (...) |
| { |
| ... |
| } |
| ---------- |
| |
| The above example is an invalid usage of the loop unroll factor as the loop |
| unroll factor is negative. |
| |
| [source,opencl_c] |
| ---------- |
| __attribute__((opencl_unroll_hint)) |
| if (...) |
| { |
| ... |
| } |
| ---------- |
| |
The above example is invalid because the unroll attribute qualifier is used
on a non-loop construct.
| |
| [source,opencl_c] |
| ---------- |
| kernel void |
| my_kernel( ... ) |
| { |
| int x; |
    __attribute__((opencl_unroll_hint(x)))
| for (int i=0; i<x; i++) |
| { |
| ... |
| } |
| } |
| ---------- |
| |
| The above example is invalid because the loop unroll factor is not a |
| compile-time constant expression. |
| -- |
| |
| |
| [[extending-attribute-qualifiers]] |
| === Extending Attribute Qualifiers |
| |
| The attribute syntax can be extended for standard language extensions and |
| vendor specific extensions. |
| Any extensions should follow the naming conventions outlined in the |
| introduction to <<opencl-extension-spec,section 9 in the OpenCL 2.0 |
| Extension Specification>>. |
| |
| Attributes are intended as useful hints to the compiler. |
It is our intention that a particular implementation of OpenCL be free to
ignore all attributes and that the resulting executable binary produce the
same result.
| This does not preclude an implementation from making use of the additional |
| information provided by attributes and performing optimizations or other |
| transformations as it sees fit. |
| In this case it is the programmer's responsibility to guarantee that the |
| information provided is in some sense correct. |
| |
| |
| [[blocks]] |
| == Blocks |
| |
| [open,refpage='blocks',desc='Blocks',type='freeform',spec='clang',anchor='blocks'] |
| -- |
| NOTE: The functionality described in this section <<unified-spec, requires>> |
| support for OpenCL C 2.0, or OpenCL C 3.0 or newer and the |
| {opencl_c_device_enqueue} feature. |
| |
| This section describes the clang block syntax |
| footnote:[{fn-clang-block-syntax}]. |
| |
Like a function type, the Block type is a pair consisting of a result value
type and a list of parameter types.
Blocks are intended to be used much like functions, with the key distinction
being that in addition to executable code they also contain various variable
bindings to automatic (stack) or `global` memory.
| -- |
| |
| |
| [[declaring-and-using-a-block]] |
| === Declaring and Using a Block |
| |
You use the `^` operator to declare a Block variable and to indicate the
beginning of a Block literal.
The body of the Block itself is contained within `{}`, as shown in the
example in the following illustration (as usual with C, `;` indicates the
end of the statement):
| |
| // JPG size: 556 x 228 |
| image:images/block_example.jpg[align="center",title="Block Example"] |
| |
| Notice that the Block is able to make use of variables from the same scope |
| in which it was defined. |
| |
| If you declare a Block as a variable, you can then use it just as you would |
| a function: |
| |
| [source,opencl_c] |
| ---------- |
| int multiplier = 7; |
| |
| int (^myBlock)(int) = ^(int num) { |
| return num * multiplier; |
| }; |
| |
| printf("%d\n", myBlock(3)); |
| // prints 21 |
| ---------- |
| |
| |
| [[declaring-a-block-reference]] |
| === Declaring a Block Reference |
| |
Block variables hold references to Blocks.
You declare them using syntax similar to the syntax for declaring a pointer
to a function, except that you use `^` instead of `*`.
| The Block type fully interoperates with the rest of the C type system. |
| The following are valid Block variable declarations: |
| |
| [source,opencl_c] |
| ---------- |
| void (^blockReturningVoidWithVoidArgument)(void); |
| int (^blockReturningIntWithIntAndCharArguments)(int, char); |
| ---------- |
| |
| A Block that takes no arguments must specify `void` in the argument list. |
A Block reference may not be dereferenced via the pointer dereference
operator `*`, and thus a Block's size may not be computed at compile time.
| |
| Blocks are designed to be fully type safe by giving the compiler a full set |
| of metadata to use to validate use of Blocks, parameters passed to blocks, |
| and assignment of the return value. |
| |
| You can also create types for Blocks -- doing so is generally considered to |
| be best practice when you use a block with a given signature in multiple |
| places: |
| |
| [source,opencl_c] |
| ---------- |
| typedef float (^MyBlockType)(float, float); |
| |
MyBlockType myFirstBlock = ^(float a, float b) { return a + b; };
MyBlockType mySecondBlock = ^(float a, float b) { return a * b; };
| ---------- |
| |
| |
| [[block-literal-expressions]] |
| === Block Literal Expressions |
| |
| A Block literal expression produces a reference to a Block. |
| It is introduced by the use of the *^* token as a unary operator. |
| |
| [role="bnf"] |
| -- |
_Block_literal_expression_ : ::
| ^ _block_decl_ _compound_statement_body_ |
| |
| _block_decl_ : :: |
| empty + |
| _parameter_list_ + |
| _type_expression_ |
| -- |
| |
where _type_expression_ is extended to allow `^` as a Block reference where
`*` is allowed as a function reference.
| |
| The following Block literal: |
| |
| [source,opencl_c] |
| ---------- |
^ void (void) { printf("hello world\n"); }
| ---------- |
| |
produces a reference to a Block that takes no arguments and has no return
value.
| |
| The return type is optional and is inferred from the return statements. |
| If the return statements return a value, they all must return a value of the |
| same type. |
| If there is no value returned the inferred type of the Block is `void`; |
| otherwise it is the type of the return statement value. |
If the return type is omitted and the argument list is `( void )`, the
`( void )` argument list may also be omitted.
| |
| So: |
| |
| [source,opencl_c] |
| ---------- |
^ ( void ) { printf("hello world\n"); }
| ---------- |
| |
| and: |
| |
| [source,opencl_c] |
| ---------- |
^ { printf("hello world\n"); }
| ---------- |
| |
| are exactly equivalent constructs for the same expression. |
| |
| The compound statement body establishes a new lexical scope within that of |
| its parent. |
| Variables used within the scope of the compound statement are bound to the |
| Block in the normal manner with the exception of those in automatic (stack) |
| storage. |
| Thus one may access functions and global variables as one would expect, as |
| well as `static` local variables. |
| |
| Local automatic (stack) variables referenced within the compound statement |
| of a Block are imported and captured by the Block as const copies. |
| The capture (binding) is performed at the time of the Block literal |
| expression evaluation. |
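Plain C has no Blocks, but the capture rule above can be emulated to make
its timing concrete.
The following sketch, which is an emulation and not a description of how
an OpenCL compiler implements Blocks, stores a `const` copy of
`multiplier` in a struct together with a function pointer when the "Block
literal" is evaluated.
The names `mul_block`, `make_mul_block`, and `capture_demo` are
hypothetical.

[source,c]
----------
/* Emulated Block for ^(int num) { return num * multiplier; } */
struct mul_block {
    const int multiplier;                          /* captured const copy */
    int (*invoke)(const struct mul_block *, int);  /* the Block body */
};

static int mul_invoke(const struct mul_block *self, int num)
{
    return num * self->multiplier;
}

/* Plays the role of evaluating the Block literal expression:
 * the capture happens here, by copying the current value. */
struct mul_block make_mul_block(int multiplier)
{
    struct mul_block b = { multiplier, mul_invoke };
    return b;
}

int capture_demo(void)
{
    int multiplier = 7;
    struct mul_block myBlock = make_mul_block(multiplier);
    multiplier = 10;   /* not seen by the Block: it holds a copy of 7 */
    return myBlock.invoke(&myBlock, 3);  /* 3 * 7 */
}
----------

As with a real Block, changing `multiplier` after the literal has been
evaluated does not affect the captured value, so `capture_demo` returns 21
rather than 30.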
| |
| The compiler is not required to capture a variable if it can prove that no |
| references to the variable will actually be evaluated. |
| |
Variables declared in a Block have the same lifetime as variables declared
in a function.
| |
| Block literal expressions may occur within Block literal expressions |
| (nested) and all variables captured by any nested blocks are implicitly also |
| captured in the scopes of their enclosing Blocks. |
| |
| A Block literal expression may be used as the initialization value for Block |
| variables at global or local `static` scope. |
| |
| You can also declare a Block as a global literal in program scope. |
| |
| [source,opencl_c] |
| ---------- |
| int GlobalInt = 0; |
| |
| int (^getGlobalInt)(void) = ^{ return GlobalInt; }; |
| ---------- |
| |
| |
| [[control-flow]] |
| === Control Flow |
| |
The compound statement of a Block is treated much like a function body with
respect to control flow, in that `continue`, `break`, and `goto` do not
escape the Block.
| |
| |
| [[restrictions-1]] |
| === Restrictions |
| |
| The following Blocks features are currently not supported in OpenCL C. |
| |
| * The `+__block+` storage type. |
| * The *Block_copy*() and *Block_release*() functions that copy and release |
| Blocks. |
| * Blocks with variadic arguments. |
| * Arrays of Blocks. |
| * Blocks as structures and union members. |
| |
| Block literals are assumed to allocate memory at the point of definition and |
| to be destroyed at the end of the same scope. |
To support these behaviors, the following additional restrictions
footnote:[{fn-clang-block-function-pointers}] apply in addition to the
feature restrictions above:
| |
| * Block variables must be defined and used in a way that allows them to be |
| statically determinable at build or "`link to executable`" time. |
| In particular: |
  ** Block variables assigned in one scope must be used only within the
     same scope or any nested scope.
  ** The `extern` storage-class specifier cannot be used with program scope
     block variables.
  ** Block variable declarations are implicitly qualified with `const`.
     Therefore all block variables must be initialized at declaration time
     and may not be reassigned.
| ** A block cannot be a return value or a parameter of a function. |
| ** Blocks cannot be used as expressions of the ternary selection operator |
| (*?:*). |
* The unary operators `*` and `&` cannot be used with a Block.
| * Pointers to Blocks are not allowed. |
| * A Block cannot capture another Block variable declared in the outer |
| scope (Example 4). |
* Block capture semantics follow the regular C argument passing convention,
    i.e. arrays are captured by reference (decayed to pointers) and structs
    are captured by value (Example 5).
| |
Some examples that describe legal and illegal use of Blocks in OpenCL C
are shown below.
| |
| Example 1: |
| |
| [source,opencl_c] |
| ---------- |
| void foo(int *x, int (^bar)(int, int)) |
| { |
| *x = bar(*x, *x); |
| } |
| |
| kernel |
| void k(global int *x, global int *z) |
| { |
| if (some expression) |
| foo(x, ^int(int x, int y){return x+y+*z;}); // legal |
| else |
| foo(x, ^int(int x, int y){return (x*y)-*z;}); // legal |
| } |
| ---------- |
| |
| Example 2: |
| |
| [source,opencl_c] |
| ---------- |
| kernel |
| void k(global int *x, global int *z) |
| { |
  int (^tmp)(int, int);
  if (some expression)
  {
    tmp = ^int(int x, int y){return x+y+*z;}; // illegal
  }
  foo(x, tmp);
| } |
| ---------- |
| |
| Example 3: |
| |
| [source,opencl_c] |
| ---------- |
| int GlobalInt = 0; |
| int (^getGlobalInt)(void) = ^{ return GlobalInt; }; // legal |
| int (^getAnotherGlobalInt)(void); // illegal |
| extern int (^getExternGlobalInt)(void); // illegal |
| |
void foo()
{
| ... |
| getGlobalInt = ^{ return 0; }; // illegal - cannot assign to |
| // a global block variable |
| ... |
| } |
| ---------- |
| |
| Example 4: |
| |
| [source,opencl_c] |
| ---------- |
| void (^bl0)(void) = ^{ |
| ... |
| }; |
| |
| kernel void k() |
| { |
| void(^bl1)(void) = ^{ |
| ... |
| }; |
| |
| void(^bl2)(void) = ^{ |
| bl0(); // legal because bl0 is a global |
| // variable available in this scope |
| bl1(); // illegal because bl1 would have to be captured |
| }; |
| } |
| ---------- |
| |
| Example 5: |
| |
| [source,opencl_c] |
| ---------- |
kernel void k()
{
  struct v {
    int arr[2];
  } s = {0, 1};

  void (^bl1)() = ^(){printf("%d\n", s.arr[1]);};
  // array content copied into captured struct location

  int arr[2] = {0, 1};
  void (^bl2)() = ^(){printf("%d\n", arr[1]);};
  // array decayed to pointer while captured

  s.arr[1] = arr[1] = 8;

  bl1(); // prints 1
  bl2(); // prints 8
}
| ---------- |
| |
| |
| [[built-in-functions]] |
| == Built-in Functions |
| |
| The OpenCL C programming language provides a rich set of built-in functions |
| for scalar and vector operations. |
Many of these functions have names similar to those provided in common C
libraries, but they support scalar and vector argument types.
| Applications should use the built-in functions wherever possible instead of |
| writing their own version. |
| |
User-defined OpenCL C functions behave per C standard rules for functions as
| defined in <<C99-spec,section 6.9.1 of the C99 Specification>>. |
| On entry to the function, the size of each variably modified parameter is |
| evaluated and the value of each argument expression is converted to the type |
| of the corresponding parameter as per the |
| <<usual-arithmetic-conversions,usual arithmetic conversion rules>>. |
| Built-in functions described in this section behave similarly, except that |
| in order to avoid ambiguity between multiple forms of the same built-in |
| function, implicit scalar widening shall not occur. |
| Note that some built-in functions described in this section do have forms |
| that operate on mixed scalar and vector types, however. |
| |
| |
| [[work-item-functions]] |
| === Work-Item Functions |
| |
| [open,refpage='workItemFunctions',desc='Work-Item Functions',type='freeform',spec='clang',anchor='work-item-functions',xrefs='',alias='get_enqueued_local_size get_global_id get_global_linear_id get_global_offset get_global_size get_group_id get_local_id get_local_linear_id get_local_size get_num_groups get_work_dim'] |
| -- |
| The following table describes the list of built-in work-item functions that |
| can be used to query the number of dimensions, the global and local work |
| size specified to *clEnqueueNDRangeKernel*, and the global and local |
| identifier of each work-item when this kernel is being executed on a device. |
| |
| [[table-work-item-functions]] |
| .Built-in Work-Item Functions |
| [cols=",",options="header",] |
| |==== |
| | Function | Description |
| | uint *get_work_dim*() |
| | Returns the number of dimensions in use. |
| This is the value given to the _work_dim_ argument specified in |
| *clEnqueueNDRangeKernel*. |
| | size_t *get_global_size*(uint _dimindx_) |
| | Returns the number of global work-items specified for dimension |
| identified by _dimindx_. |
| This value is given by the _global_work_size_ argument to |
| *clEnqueueNDRangeKernel*. |
| |
| Valid values of _dimindx_ are 0 to *get_work_dim*() - 1. |
| For other values of _dimindx_, *get_global_size*() returns 1. |
| | size_t *get_global_id*(uint _dimindx_) |
| | Returns the unique global work-item ID value for dimension identified |
| by _dimindx_. |
| The global work-item ID specifies the work-item ID based on the number |
| of global work-items specified to execute the kernel. |
| |
| Valid values of _dimindx_ are 0 to *get_work_dim*() - 1. |
| For other values of _dimindx_, *get_global_id*() returns 0. |
| | size_t *get_local_size*(uint _dimindx_) |
| | Returns the number of local work-items specified in dimension |
| identified by _dimindx_. |
| This value is at most the value given by the _local_work_size_ |
| argument to *clEnqueueNDRangeKernel* if _local_work_size_ is not |
| `NULL`; otherwise the OpenCL implementation chooses an appropriate |
| _local_work_size_ value which is returned by this function. |
| If the kernel is executed with a non-uniform work-group size |
| footnote:[{fn-non-uniform-work-groups}], calls to this built-in from some |
| work-groups may return different values than calls to this built-in from |
| other work-groups. |
| |
| Valid values of _dimindx_ are 0 to *get_work_dim*() - 1. |
| For other values of _dimindx_, *get_local_size*() returns 1. |
| | size_t *get_enqueued_local_size*( |
| uint _dimindx_) |
| | Returns the same value as that returned by *get_local_size*(_dimindx_) |
| if the kernel is executed with a uniform work-group size. |
| |
| If the kernel is executed with a non-uniform work-group size, returns |
| the number of local work-items in each of the work-groups that make up |
| the uniform region of the global range in the dimension identified by |
| _dimindx_. |
| If the _local_work_size_ argument to *clEnqueueNDRangeKernel* is not |
| `NULL`, this value will match the value specified in |
| _local_work_size_[_dimindx_]. |
| If _local_work_size_ is `NULL`, this value will match the local size |
| that the implementation determined would be most efficient at |
| implementing the uniform region of the global range. |
| |
| Valid values of _dimindx_ are 0 to *get_work_dim*() - 1. |
| For other values of _dimindx_, *get_enqueued_local_size*() returns 1. |
| |
| <<unified-spec, Requires>> support for OpenCL 2.0 or newer. |
| | size_t *get_local_id*(uint _dimindx_) |
| Returns the unique local work-item ID, i.e. the ID of the work-item
    within its work-group, for the dimension identified by _dimindx_.
| |
| Valid values of _dimindx_ are 0 to *get_work_dim*() - 1. |
| For other values of _dimindx_, *get_local_id*() returns 0. |
| | size_t *get_num_groups*(uint _dimindx_) |
| | Returns the number of work-groups that will execute a kernel for |
| dimension identified by _dimindx_. |
| |
| Valid values of _dimindx_ are 0 to *get_work_dim*() - 1. |
| For other values of _dimindx_, *get_num_groups*() returns 1. |
| | size_t *get_group_id*(uint _dimindx_) |
| | *get_group_id* returns the work-group ID which is a number from 0 .. |
| *get_num_groups*(_dimindx_) - 1. |
| |
| Valid values of _dimindx_ are 0 to *get_work_dim*() - 1. |
| For other values, *get_group_id*() returns 0. |
| | size_t *get_global_offset*(uint _dimindx_) |
| *get_global_offset* returns the offset value specified for the dimension
    identified by _dimindx_ in the _global_work_offset_ argument to
    *clEnqueueNDRangeKernel*.
| |
| Valid values of _dimindx_ are 0 to *get_work_dim*() - 1. |
| For other values, *get_global_offset*() returns 0. |
| |
| <<unified-spec, Requires>> support for OpenCL C 1.1 or newer. |
| | size_t *get_global_linear_id*() |
| Returns the work-item's 1-dimensional global ID.
| |
| For 1D work-groups, it is computed as *get_global_id*(0) - |
| *get_global_offset*(0). |
| |
| For 2D work-groups, it is computed as (*get_global_id*(1) - |
| *get_global_offset*(1)) * *get_global_size*(0) {plus} (*get_global_id*(0) - |
| *get_global_offset*(0)). |
| |
| For 3D work-groups, it is computed as +((+*get_global_id*(2) - |
| *get_global_offset*(2)+)+ * *get_global_size*(1) * *get_global_size*(0)+)+ |
| {plus} +((+*get_global_id*(1) - *get_global_offset*(1)+)+ * *get_global_size*(0)+)+ |
| {plus} (*get_global_id*(0) - *get_global_offset*(0)+)+. |
| |
| <<unified-spec, Requires>> support for OpenCL 2.0 or newer. |
| | size_t *get_local_linear_id*() |
| | Returns the work-item's 1-dimensional local ID. |
| |
| For 1D work-groups, it is the same value as *get_local_id*(0). |
| |
| For 2D work-groups, it is computed as |
| *get_local_id*(1) * *get_local_size*(0) {plus} *get_local_id*(0). |
| |
| For 3D work-groups, it is computed as |
| (*get_local_id*(2) * *get_local_size*(1) * *get_local_size*(0)) {plus} |
| (*get_local_id*(1) * *get_local_size*(0)) {plus} *get_local_id*(0). |
| |
| <<unified-spec, Requires>> support for OpenCL 2.0 or newer. |
| |==== |
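| |
| The linearization formulas above can be checked on the host with a short |
| C99 sketch. It is not OpenCL C kernel code: it assumes the per-dimension |
| values that *get_global_id*, *get_global_offset*, *get_global_size*, |
| *get_local_id* and *get_local_size* would return are held in plain arrays, |
| with index 0 being the fastest-varying dimension. The function names |
| `model_global_linear_id` and `model_local_linear_id` are hypothetical and |
| are not built-ins. |
| |
| [source,c] |
| ---------- |
| #include <stddef.h> |
| |
| /* Host-side model of get_global_linear_id(). Index 0 of each array is |
|  * dimension 0 (the fastest-varying one); work_dim is 1, 2 or 3. */ |
| size_t model_global_linear_id(unsigned work_dim, |
|                               const size_t global_id[3], |
|                               const size_t global_offset[3], |
|                               const size_t global_size[3]) |
| { |
|     size_t id = 0; |
|     /* Horner form, slowest dimension first: |
|      * (g2 * S1 + g1) * S0 + g0 == g2 * S1 * S0 + g1 * S0 + g0, |
|      * where gN = global_id[N] - global_offset[N], SN = global_size[N]. */ |
|     for (unsigned d = work_dim; d-- > 0;) |
|         id = id * global_size[d] + (global_id[d] - global_offset[d]); |
|     return id; |
| } |
| |
| /* Host-side model of get_local_linear_id(); local IDs have no offset. */ |
| size_t model_local_linear_id(unsigned work_dim, |
|                              const size_t local_id[3], |
|                              const size_t local_size[3]) |
| { |
|     size_t id = 0; |
|     for (unsigned d = work_dim; d-- > 0;) |
|         id = id * local_size[d] + local_id[d]; |
|     return id; |
| } |
| ---------- |
| |
| For example, in a 2D range with a global offset of (2, 1), a global size of |
| (10, 4), and *get_global_id* values of (5, 3), the model returns |
| (3 - 1) * 10 {plus} (5 - 2) = 23. Accumulating from the slowest dimension |
| down keeps a single loop for 1D, 2D and 3D ranges instead of three formulas. |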
| |
| NOTE: The functionality described in the following table <<unified-spec, |
| requires>> support for |
| ifdef::cl_khr_subgroups[the `<<cl_khr_subgroups>>` extension macro; or for] |
| OpenCL C 3.0 or newer and the {opencl_c_subgroups} feature. |
| |
| The following table lists the built-in work-item functions that can be |
| used to query the size of a sub-group, the number of sub-groups per |
| work-group, the ID of a sub-group within a work-group, and the ID of a |
| work-item within a sub-group when a kernel is executed on a device. |
| |
| [[table-subgroup-work-item-functions]] |
| .Built-in Work-Item Functions for Sub-Groups |
| [cols="a,",options="header",] |
| |==== |
| | Function | Description |
| |
| | uint *get_sub_group_size*() |
| | Returns the number of work-items in the sub-group. |
| This value is no more than the maximum sub-group size and is |
| implementation-defined based on a combination of the compiled kernel and |
| the dispatch dimensions. |
| This will be a constant value for the lifetime of the sub-group. |
| |
| | uint *get_max_sub_group_size*() |
| | Returns the maximum size of a sub-group within the dispatch. |
| This value will be invariant for a given set of dispatch dimensions and a |
| kernel object compiled for a given device. |
| |
| | uint *get_num_sub_groups*() |
| | Returns the number of sub-groups that the current work-group is divided |
| into. |
| |
| This number will be constant for the duration of a work-group's execution. |
| If the kernel is executed with a non-uniform work-group size |
| (i.e. the _global_work_size_ values specified to *clEnqueueNDRangeKernel* |
| are not evenly divisible by the _local_work_size_ values for at least one |
| dimension), calls to this built-in from some work-groups may return |
| different values than calls to this built-in from other work-groups. |
| |
| | uint *get_enqueued_num_sub_groups*() |
| | Returns the same value as that returned by *get_num_sub_groups* if the |
| kernel is executed with a uniform work-group size. |
| |
| If the kernel is executed with a non-uniform work-group size, returns the |
| number of sub-groups in each of the work-groups that make up the uniform |
| region of the global range. |
| |
| | uint *get_sub_group_id*() |
| | *get_sub_group_id* returns the sub-group ID, which is a number from 0 .. |
| *get_num_sub_groups*() - 1. |
| |
| For *clEnqueueTask*, this returns 0. |
| |
| | uint *get_sub_group_local_id*() |
| | Returns the unique work-item ID within the current sub-group. |
| The mapping from *get_local_id*(__dimindx__) to *get_sub_group_local_id* |
| will be invariant for the lifetime of the work-group. |
| |
| |==== |
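| |
| The difference between *get_num_sub_groups* and |
| *get_enqueued_num_sub_groups* for a non-uniform work-group size can be |
| illustrated with a host-side C99 sketch of a 1D dispatch. The sketch rests on |
| an assumption that this specification does not make: that an implementation |
| fills each sub-group with *get_max_sub_group_size*() consecutive work-items |
| and leaves any remainder in the last sub-group. Actual sub-group formation is |
| implementation-defined, and all function names below are hypothetical. |
| |
| [source,c] |
| ---------- |
| #include <stddef.h> |
| |
| /* Number of work-items in work-group `group` along one dimension. |
|  * With a non-uniform work-group size, the last work-group holds the |
|  * remainder of global_size / enqueued_local_size. */ |
| size_t model_work_group_size(size_t global_size, size_t enqueued_local_size, |
|                              size_t group) |
| { |
|     size_t remaining = global_size - group * enqueued_local_size; |
|     return remaining < enqueued_local_size ? remaining : enqueued_local_size; |
| } |
| |
| /* ASSUMPTION: sub-groups are filled with max_sub_group_size consecutive |
|  * work-items, so the count is a ceiling division. */ |
| unsigned model_num_sub_groups(size_t work_group_items, |
|                               unsigned max_sub_group_size) |
| { |
|     return (unsigned)((work_group_items + max_sub_group_size - 1) |
|                       / max_sub_group_size); |
| } |
| |
| /* Same ASSUMPTION: every sub-group is full except possibly the last. */ |
| unsigned model_sub_group_size(size_t work_group_items, |
|                               unsigned max_sub_group_size, |
|                               unsigned sub_group_id) |
| { |
|     size_t remaining = work_group_items |
|                        - (size_t)sub_group_id * max_sub_group_size; |
|     return (unsigned)(remaining < max_sub_group_size ? remaining |
|                                                      : max_sub_group_size); |
| } |
| ---------- |
| |
| With a global size of 100, an enqueued local size of 16, and a maximum |
| sub-group size of 8, this model gives two sub-groups in each of work-groups |
| 0 to 5 and a single sub-group of 4 work-items in work-group 6, while the |
| enqueued count stays 2 for every work-group. |
| |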
| -- |
| |
| |
| [[math-functions]] |
| === Math Functions |
| |
| [open,refpage='mathFunctions',desc='Math Functions',type='freeform',spec='clang',anchor='math-functions',xrefs='commonFunctions integerFunctions',alias='acos acosh acospi asin asinh asinpi atan atan2 atan2pi atanh atanpi cbrt ceil copysign cos cosh cospi divide erf erfc exp exp10 exp2 expm1 fabs fdim floor fma fmax fmin fmod fract frexp half_cos half_divide half_exp half_exp10 half_exp2 half_log half_log10 half_log2 half_powr half_recip half_rsqrt half_sin half_sqrt half_tan hypot ilogb ldexp lgamma lgamma_r log log10 log1p log2 logb mad maxmag minmag modf nan native_cos native_divide native_exp native_exp10 native_exp2 native_log native_log10 native_log2 native_powr native_recip native_rsqrt native_sin native_sqrt native_tan nextafter pow pown powr recip remainder remquo rint rootn round rsqrt sin sincos sinh sinpi sqrt tan tanh tanpi tgamma trunc'] |
| -- |
| The built-in math functions are divided into the following categories: |
| |
| * Built-in functions that have scalar and vector argument versions, and |
| * Built-in functions that take only scalar `float` arguments. |
| |
| The vector versions of the math functions operate component-wise. |
| The description is per-component. |
| |
| The built-in math functions are not affected by the prevailing rounding mode |
| in the calling environment, and always return the same value as they would |
| if called with the round to nearest even rounding mode. |
| |
| The <<table-builtin-math, Built-in Scalar and Vector Argument Math |
| Functions>> table describes the list of built-in math functions that can |
| take scalar or vector arguments. |
| |
| The generic type name `gentype` indicates that the function can take any of |
| |
| * `float`, `float2`, `float3`, `float4`, `float8`, or `float16` |
| * `double` footnote:double-supported[{fn-double-supported}], `double2`, |
| `double3`, `double4`, `double8` or `double16` |
| ifdef::cl_khr_fp16[] |
| * `half` footnote:[{fn-half-supported}], `half2`, `half3`, `half4`, |
| `half8` or `half16` |
| endif::cl_khr_fp16[] |
| |
| as the type for the arguments. |
| |
| The generic type name `gentypef` indicates that the function can take any of |
| |
| * `float`, `float2`, `float3`, `float4`, `float8`, or `float16` |
| |
| as the type for the arguments. |
| |
| The generic type name `gentyped` footnote:[{fn-double-supported}] indicates |
| that the function can take any of |
| |
| * `double`, `double2`, `double3`, `double4`, `double8` or `double16` |
| |
| as the type for the arguments. |
| |
| ifdef::cl_khr_fp16[] |
| The generic type name `gentypeh` footnote:[{fn-half-supported}] indicates |
| that the function can take any of |
| |
| * `half`, `half2`, `half3`, `half4`, `half8` or `half16` |
| |
| as the type for the arguments. |
| |
| NOTE: All functions taking or returning `half` types are supported only when |
| the `<<cl_khr_fp16>>` extension macro is supported. |
| endif::cl_khr_fp16[] |
| |
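| For any specific use of a function with `gentype*` arguments, the actual |
| type has to be the same for all arguments and the return type, unless a |
| particular argument or return type is explicitly given as a concrete type |
| (for example, the `float` _y_ argument of *fmax*(gentypef _x_, float _y_)). |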
| |
| [[table-builtin-math]] |
| .Built-in Scalar and Vector Argument Math Functions |
| [cols=",",options="header",] |
| |==== |
| | Function | Description |
| | gentype *acos*(gentype) |
| | Arc cosine function. Returns an angle in radians. |
| | gentype *acosh*(gentype) |
| | Inverse hyperbolic cosine. Returns an angle in radians. |
| | gentype *acospi*(gentype _x_) |
| | Compute *acos*(_x_) / {pi}. |
| | gentype *asin*(gentype) |
| | Arc sine function. Returns an angle in radians. |
| | gentype *asinh*(gentype) |
| | Inverse hyperbolic sine. Returns an angle in radians. |
| | gentype *asinpi*(gentype _x_) |
| | Compute *asin*(_x_) / {pi}. |
| | gentype *atan*(gentype _y_over_x_) |
| | Arc tangent function. Returns an angle in radians. |
| | gentype *atan2*(gentype _y_, gentype _x_) |
| | Arc tangent of _y_ / _x_. Returns an angle in radians. |
| | gentype *atanh*(gentype) |
| | Hyperbolic arc tangent. Returns an angle in radians. |
| | gentype *atanpi*(gentype _x_) |
| | Compute *atan*(_x_) / {pi}. |
| | gentype *atan2pi*(gentype _y_, gentype _x_) |
| | Compute *atan2*(_y_, _x_) / {pi}. |
| | gentype *cbrt*(gentype) |
| | Compute cube root. |
| | gentype *ceil*(gentype) |
| | Round to integral value using the round to positive infinity rounding |
| mode. |
| | gentype *copysign*(gentype _x_, gentype _y_) |
| | Returns _x_ with its sign changed to match the sign of _y_. |
| | gentype *cos*(gentype _x_) |
| | Compute cosine, where _x_ is an angle in radians. |
| | gentype *cosh*(gentype _x_) |
| | Compute hyperbolic cosine, where _x_ is an angle in radians. |
| | gentype *cospi*(gentype _x_) |
| | Compute *cos*({pi} _x_). |
| | gentype *erfc*(gentype) |
| | Complementary error function. |
| | gentype *erf*(gentype) |
| | Error function encountered in integrating the |
| https://mathworld.wolfram.com/NormalDistribution.html[_normal |
| distribution_]. |
| | gentype *exp*(gentype _x_) |
| | Compute the base-_e_ exponential of _x_. |
| | gentype *exp2*(gentype) |
| | Exponential base 2 function. |
| | gentype *exp10*(gentype) |
| | Exponential base 10 function. |
| | gentype *expm1*(gentype _x_) |
| | Compute _e^x^_ - 1.0. |
| | gentype *fabs*(gentype) |
| | Compute absolute value of a floating-point number. |
| | gentype *fdim*(gentype _x_, gentype _y_) |
| | _x_ - _y_ if _x_ > _y_, +0 if _x_ is less than or equal to _y_. |
| | gentype *floor*(gentype) |
| | Round to integral value using the round to negative infinity rounding |
| mode. |
| | gentype *fma*(gentype _a_, gentype _b_, gentype _c_) |
| | Returns the correctly rounded floating-point representation of the sum |
| of _c_ with the infinitely precise product of _a_ and _b_. |
| Rounding of intermediate products shall not occur. |
| Edge case behavior is per the IEEE 754-2008 standard. |
| | gentype *fmax*(gentype _x_, gentype _y_) + |
| gentypef *fmax*(gentypef _x_, float _y_) + |
| gentyped *fmax*(gentyped _x_, double _y_) |
| |
| ifdef::cl_khr_fp16[gentypeh *fmax*(gentypeh _x_, half _y_)] |
| | Returns _y_ if _x_ < _y_, otherwise it returns _x_. |
| If one argument is a NaN, *fmax*() returns the other argument. |
| If both arguments are NaNs, *fmax*() returns a NaN. |
| | gentype *fmin*(gentype _x_, gentype _y_) + |
| gentypef *fmin*(gentypef _x_, float _y_) + |
| gentyped *fmin*(gentyped _x_, double _y_) |
| |
| ifdef::cl_khr_fp16[gentypeh *fmin*(gentypeh _x_, half _y_)] |
| | Returns _y_ if _y_ < _x_, otherwise it returns _x_. |
| If one argument is a NaN, *fmin*() returns the other argument. |
| If both arguments are NaNs, *fmin*() returns a NaN. |
| footnote:[{fn-fmin-fmax-nan}] |
| | gentype *fmod*(gentype _x_, gentype _y_) |
| | Modulus. |
| Returns _x_ - _y_ * *trunc*(_x_/_y_). |
| | gentype *fract*(gentype _x_, {global} gentype _*iptr_) + |
| gentype *fract*(gentype _x_, {local} gentype _*iptr_) + |
| gentype *fract*(gentype _x_, {private} gentype _*iptr_) |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 or newer with the |
| {opencl_c_generic_address_space} feature: |
| |
| gentype *fract*(gentype _x_, gentype _*iptr_) |
| // TODO The fp16 extension uses the constant `0x1.ffcp-1f` below - unclear |
| // why, see the OpenCL-Docs issue. |
| | Returns *fmin*(_x_ - *floor*(_x_), `0x1.fffffep-1f`). |
| *floor*(_x_) is returned in _iptr_. |
| footnote:[{fn-fract-min}] |
| ifdef::cl_khr_fp16[] |
| | half__n__ **frexp**(half__n__ _x_, {global} int__n__ *exp) + |
| half **frexp**(half _x_, {global} int *exp) |
| |
| half__n__ **frexp**(half__n__ _x_, {local} int__n__ *exp) + |
| half **frexp**(half _x_, {local} int *exp) |
| |
| half__n__ **frexp**(half__n__ _x_, {private} int__n__ *exp) + |
| half **frexp**(half _x_, {private} int *exp) |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 or newer with the |
| {opencl_c_generic_address_space} feature: |
| |
| half__n__ **frexp**(half__n__ _x_, int__n__ *exp) + |
| half **frexp**(half _x_, int *exp) |
| | Extract mantissa and exponent from _x_. |
| For each component the mantissa returned is a `half` with magnitude in |
| the interval [1/2, 1) or 0. |
| Each component of _x_ equals mantissa returned * 2__^exp^__. |
| endif::cl_khr_fp16[] |
| | float__n__ **frexp**(float__n__ _x_, {global} int__n__ *exp) + |
| float **frexp**(float _x_, {global} int *exp) |
| |
| float__n__ **frexp**(float__n__ _x_, {local} int__n__ *exp) + |
| float **frexp**(float _x_, {local} int *exp) |
| |
| float__n__ **frexp**(float__n__ _x_, {private} int__n__ *exp) + |
| float **frexp**(float _x_, {private} int *exp) |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 or newer with the |
| {opencl_c_generic_address_space} feature: |
| |
| float__n__ **frexp**(float__n__ _x_, int__n__ *exp) + |
| float **frexp**(float _x_, int *exp) |
| | Extract mantissa and exponent from _x_. |
| For each component the mantissa returned is a `float` with magnitude |
| in the interval [1/2, 1) or 0. |
| Each component of _x_ equals mantissa returned * 2__^exp^__. |
| | double__n__ **frexp**(double__n__ _x_, {global} int__n__ *exp) + |
| double **frexp**(double _x_, {global} int *exp) |
| |
| double__n__ **frexp**(double__n__ _x_, {local} int__n__ *exp) + |
| double **frexp**(double _x_, {local} int *exp) |
| |
| double__n__ **frexp**(double__n__ _x_, {private} int__n__ *exp) + |
| double **frexp**(double _x_, {private} int *exp) |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 or newer with the |
| {opencl_c_generic_address_space} feature: |
| |
| double__n__ **frexp**(double__n__ _x_, int__n__ *exp) + |
| double **frexp**(double _x_, int *exp) |
| | Extract mantissa and exponent from _x_. |
| For each component the mantissa returned is a `double` with magnitude |
| in the interval [1/2, 1) or 0. |
| Each component of _x_ equals mantissa returned * 2__^exp^__. |
| | gentype *hypot*(gentype _x_, gentype _y_) |
| | Compute the value of the square root of __x__^2^ {plus} __y__^2^ without |
| undue overflow or underflow. |
| | int__n__ *ilogb*(float__n__ _x_) + |
| int *ilogb*(float _x_) + |
| int__n__ *ilogb*(double__n__ _x_) + |
| int *ilogb*(double _x_) |
| |
| ifdef::cl_khr_fp16[] |
| int__n__ *ilogb*(half__n__ _x_) + |
| int *ilogb*(half _x_) |
| endif::cl_khr_fp16[] |
| | Return the exponent as an integer value. |
| | float__n__ *ldexp*(float__n__ _x_, int__n__ _k_) + |
| float__n__ *ldexp*(float__n__ _x_, int _k_) + |
| float *ldexp*(float _x_, int _k_) + |
| double__n__ *ldexp*(double__n__ _x_, int__n__ _k_) + |
| double__n__ *ldexp*(double__n__ _x_, int _k_) + |
| double *ldexp*(double _x_, int _k_) |
| |
| ifdef::cl_khr_fp16[] |
| half__n__ *ldexp*(half__n__ _x_, int__n__ _k_) + |
| half__n__ *ldexp*(half__n__ _x_, int _k_) + |
| half *ldexp*(half _x_, int _k_) |
| endif::cl_khr_fp16[] |
| | Multiply _x_ by 2 to the power _k_. |
| | gentype *lgamma*(gentype _x_) + |
| float__n__ **lgamma_r**(float__n__ _x_, {global} int__n__ *_signp_) + |
| float **lgamma_r**(float _x_, {global} int *_signp_) + |
| double__n__ **lgamma_r**(double__n__ _x_, {global} int__n__ *_signp_) + |
| double **lgamma_r**(double _x_, {global} int *_signp_) |
| |
| ifdef::cl_khr_fp16[] |
| half__n__ **lgamma_r**(half__n__ _x_, {global} int__n__ *_signp_) + |
| half **lgamma_r**(half _x_, {global} int *_signp_) |
| endif::cl_khr_fp16[] |
| |
| float__n__ **lgamma_r**(float__n__ _x_, {local} int__n__ *_signp_) + |
| float **lgamma_r**(float _x_, {local} int *_signp_) + |
| double__n__ **lgamma_r**(double__n__ _x_, {local} int__n__ *_signp_) + |
| double **lgamma_r**(double _x_, {local} int *_signp_) |
| |
| ifdef::cl_khr_fp16[] |
| half__n__ **lgamma_r**(half__n__ _x_, {local} int__n__ *_signp_) + |
| half **lgamma_r**(half _x_, {local} int *_signp_) |
| endif::cl_khr_fp16[] |
| |
| float__n__ **lgamma_r**(float__n__ _x_, {private} int__n__ *_signp_) + |
| float **lgamma_r**(float _x_, {private} int *_signp_) + |
| double__n__ **lgamma_r**(double__n__ _x_, {private} int__n__ *_signp_) + |
| double **lgamma_r**(double _x_, {private} int *_signp_) |
| |
| ifdef::cl_khr_fp16[] |
| half__n__ **lgamma_r**(half__n__ _x_, {private} int__n__ *_signp_) + |
| half **lgamma_r**(half _x_, {private} int *_signp_) |
| endif::cl_khr_fp16[] |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 or newer with the |
| {opencl_c_generic_address_space} feature: |
| |
| float__n__ **lgamma_r**(float__n__ _x_, int__n__ *_signp_) + |
| float **lgamma_r**(float _x_, int *_signp_) + |
| double__n__ **lgamma_r**(double__n__ _x_, int__n__ *_signp_) + |
| double **lgamma_r**(double _x_, int *_signp_) |
| |
| ifdef::cl_khr_fp16[] |
| half__n__ **lgamma_r**(half__n__ _x_, int__n__ *_signp_) + |
| half **lgamma_r**(half _x_, int *_signp_) |
| endif::cl_khr_fp16[] |
| | Log gamma function. |
| Returns the natural logarithm of the absolute value of the gamma |
| function. |
| The sign of the gamma function is returned in the _signp_ argument of |
| *lgamma_r*. |
| | gentype *log*(gentype) |
| | Compute natural logarithm. |
| | gentype *log2*(gentype) |
| | Compute a base 2 logarithm. |
| | gentype *log10*(gentype) |
| | Compute a base 10 logarithm. |
| | gentype *log1p*(gentype _x_) |
| | Compute log~e~(1.0 + _x_). |
| | gentype *logb*(gentype _x_) |
| | Compute the exponent of _x_, which is the integral part of |
| log__~r~__(\|_x_\|). |
| | gentype *mad*(gentype _a_, gentype _b_, gentype _c_) |
| | *mad* computes _a_ * _b_ + _c_. |
| The function may compute _a_ * _b_ + _c_ with reduced accuracy in the |
| embedded profile. |
| See the OpenCL SPIR-V Environment Specification for details. |
| On some hardware the mad instruction may provide better performance |
| than expanded computation of _a_ * _b_ + _c_. |
| footnote:[{fn-mad-caution}] |
| | gentype *maxmag*(gentype _x_, gentype _y_) |
| | Returns _x_ if \|_x_\| > \|_y_\|, _y_ if \|_y_\| > \|_x_\|, otherwise |
| *fmax*(_x_, _y_). |
| |
| <<unified-spec, Requires>> support for OpenCL C 1.1 or newer. |
| | gentype *minmag*(gentype _x_, gentype _y_) |
| | Returns _x_ if \|_x_\| < \|_y_\|, _y_ if \|_y_\| < \|_x_\|, otherwise |
| *fmin*(_x_, _y_). |
| |
| <<unified-spec, Requires>> support for OpenCL C 1.1 or newer. |
| | gentype *modf*(gentype _x_, {global} gentype _*iptr_) + |
| gentype *modf*(gentype _x_, {local} gentype _*iptr_) + |
| gentype *modf*(gentype _x_, {private} gentype _*iptr_) |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 or newer with the |
| {opencl_c_generic_address_space} feature: |
| |
| gentype *modf*(gentype _x_, gentype _*iptr_) |
| | Decompose a floating-point number. |
| The *modf* function breaks the argument _x_ into integral and |
| fractional parts, each of which has the same sign as the argument. |
| It stores the integral part in the object pointed to by _iptr_. |
| | float__n__ *nan*(uint__n__ _nancode_) + |
| float *nan*(uint _nancode_) + |
| double__n__ *nan*(ulong__n__ _nancode_) + |
| double *nan*(ulong _nancode_) |
| |
| ifdef::cl_khr_fp16[] |
| half__n__ *nan*(ushort__n__ _nancode_) + |
| half *nan*(ushort _nancode_) |
| endif::cl_khr_fp16[] |
| | Returns a quiet NaN. |
| The _nancode_ may be placed in the significand of the resulting NaN. |
| | gentype *nextafter*(gentype _x_, gentype _y_) |
| // TODO shouldn't this be "next representable FP value of the precision of |
| // its arguments"? See the OpenCL-Docs issue. |
| | Computes the next representable floating-point value |
| following _x_ in the direction of _y_. |
| Thus, if _y_ is less than _x_, *nextafter*() returns the largest |
| representable floating-point number less than _x_. |
| | gentype *pow*(gentype _x_, gentype _y_) |
| | Compute _x_ to the power _y_. |
| | float__n__ *pown*(float__n__ _x_, int__n__ _y_) + |
| float *pown*(float _x_, int _y_) + |
| double__n__ *pown*(double__n__ _x_, int__n__ _y_) + |
| double *pown*(double _x_, int _y_) |
| |
| ifdef::cl_khr_fp16[] |
| half__n__ *pown*(half__n__ _x_, int__n__ _y_) + |
| half *pown*(half _x_, int _y_) |
| endif::cl_khr_fp16[] |
| | Compute _x_ to the power _y_, where _y_ is an integer. |
| | gentype *powr*(gentype _x_, gentype _y_) |
| | Compute _x_ to the power _y_, where _x_ is >= 0. |
| | gentype *remainder*(gentype _x_, gentype _y_) |
| | Compute the value _r_ such that _r_ = _x_ - _n_*_y_, where _n_ is the |
| integer nearest the exact value of _x_/_y_. |
| If there are two integers closest to _x_/_y_, _n_ shall be the even |
| one. |
| If _r_ is zero, it is given the same sign as _x_. |
| | float__n__ **remquo**(float__n__ _x_, float__n__ _y_, {global} int__n__ _*quo_) + |
| float **remquo**(float _x_, float _y_, {global} int _*quo_) |
| |
| float__n__ **remquo**(float__n__ _x_, float__n__ _y_, {local} int__n__ _*quo_) + |
| float **remquo**(float _x_, float _y_, {local} int _*quo_) |
| |
| float__n__ **remquo**(float__n__ _x_, float__n__ _y_, {private} int__n__ _*quo_) + |
| float **remquo**(float _x_, float _y_, {private} int _*quo_) |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 or newer with the |
| {opencl_c_generic_address_space} feature: |
| |
| float__n__ **remquo**(float__n__ _x_, float__n__ _y_, int__n__ _*quo_) + |
| float **remquo**(float _x_, float _y_, int _*quo_) |
| | The *remquo* function computes the value _r_ such that _r_ = _x_ - |
| _k_*_y_, where _k_ is the integer nearest the exact value of _x_/_y_. |
| If there are two integers closest to _x_/_y_, _k_ shall be the even |
| one. |
| If _r_ is zero, it is given the same sign as _x_. |
| This is the same value that is returned by the *remainder* function. |
| *remquo* also calculates the lower seven bits of the integral quotient |
| _x_/_y_, and gives that value the same sign as _x_/_y_. |
| It stores this signed value in the object pointed to by _quo_. |
| | double__n__ **remquo**(double__n__ _x_, double__n__ _y_, {global} int__n__ _*quo_) + |
| double **remquo**(double _x_, double _y_, {global} int _*quo_) |
| |
| double__n__ **remquo**(double__n__ _x_, double__n__ _y_, {local} int__n__ _*quo_) + |
| double **remquo**(double _x_, double _y_, {local} int _*quo_) |
| |
| double__n__ **remquo**(double__n__ _x_, double__n__ _y_, {private} int__n__ _*quo_) + |
| double **remquo**(double _x_, double _y_, {private} int _*quo_) |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 or newer with the |
| {opencl_c_generic_address_space} feature: |
| |
| double__n__ **remquo**(double__n__ _x_, double__n__ _y_, int__n__ _*quo_) + |
| double **remquo**(double _x_, double _y_, int _*quo_) |
| | The *remquo* function computes the value _r_ such that _r_ = _x_ - |
| _k_*_y_, where _k_ is the integer nearest the exact value of _x_/_y_. |
| If there are two integers closest to _x_/_y_, _k_ shall be the even |
| one. |
| If _r_ is zero, it is given the same sign as _x_. |
| This is the same value that is returned by the *remainder* function. |
| *remquo* also calculates the lower seven bits of the integral quotient |
| _x_/_y_, and gives that value the same sign as _x_/_y_. |
| It stores this signed value in the object pointed to by _quo_. |
| ifdef::cl_khr_fp16[] |
| | half__n__ **remquo**(half__n__ _x_, half__n__ _y_, {global} int__n__ _*quo_) + |
| half **remquo**(half _x_, half _y_, {global} int _*quo_) |
| |
| half__n__ **remquo**(half__n__ _x_, half__n__ _y_, {local} int__n__ _*quo_) + |
| half **remquo**(half _x_, half _y_, {local} int _*quo_) |
| |
| half__n__ **remquo**(half__n__ _x_, half__n__ _y_, {private} int__n__ _*quo_) + |
| half **remquo**(half _x_, half _y_, {private} int _*quo_) |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 or newer with the |
| {opencl_c_generic_address_space} feature: |
| |
| half__n__ **remquo**(half__n__ _x_, half__n__ _y_, int__n__ _*quo_) + |
| half **remquo**(half _x_, half _y_, int _*quo_) |
| | The *remquo* function computes the value _r_ such that _r_ = _x_ - |
| _k_*_y_, where _k_ is the integer nearest the exact value of _x_/_y_. |
| If there are two integers closest to _x_/_y_, _k_ shall be the even |
| one. |
| If _r_ is zero, it is given the same sign as _x_. |
| This is the same value that is returned by the *remainder* function. |
| *remquo* also calculates the lower seven bits of the integral quotient |
| _x_/_y_, and gives that value the same sign as _x_/_y_. |
| It stores this signed value in the object pointed to by _quo_. |
| endif::cl_khr_fp16[] |
| | gentype *rint*(gentype) |
| | Round to integral value (using round to nearest even rounding mode) in |
| floating-point format. |
| Refer to section 7.1 for a description of rounding modes. |
| | float__n__ *rootn*(float__n__ _x_, int__n__ _y_) + |
| float *rootn*(float _x_, int _y_) + |
| double__n__ *rootn*(double__n__ _x_, int__n__ _y_) + |
| double *rootn*(double _x_, int _y_) |
| |
| ifdef::cl_khr_fp16[] |
| half__n__ *rootn*(half__n__ _x_, int__n__ _y_) + |
| half *rootn*(half _x_, int _y_) |
| endif::cl_khr_fp16[] |
| | Compute _x_ to the power 1/_y_. |
| | gentype *round*(gentype _x_) |
| | Return the integral value nearest to _x_ rounding halfway cases away |
| from zero, regardless of the current rounding direction. |
| | gentype *rsqrt*(gentype) |
| | Compute inverse square root. |
| | gentype *sin*(gentype _x_) |
| | Compute sine, where _x_ is an angle in radians. |
| | gentype *sincos*(gentype _x_, {global} gentype _*cosval_) + |
| gentype *sincos*(gentype _x_, {local} gentype _*cosval_) + |
| gentype *sincos*(gentype _x_, {private} gentype _*cosval_) |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 or newer with the |
| {opencl_c_generic_address_space} feature: |
| |
| gentype *sincos*(gentype _x_, gentype _*cosval_) |
| | Compute sine and cosine of _x_. |
| The computed sine is the return value and the computed cosine is |
| returned in _cosval_, where _x_ is an angle in radians. |
| | gentype *sinh*(gentype _x_) |
| | Compute hyperbolic sine, where _x_ is an angle in radians. |
| | gentype *sinpi*(gentype _x_) |
| | Compute *sin*({pi} _x_). |
| | gentype *sqrt*(gentype) |
| | Compute square root. |
| | gentype *tan*(gentype _x_) |
| | Compute tangent, where _x_ is an angle in radians. |
| | gentype *tanh*(gentype _x_) |
| | Compute hyperbolic tangent, where _x_ is an angle in radians. |
| | gentype *tanpi*(gentype _x_) |
| | Compute *tan*({pi} _x_). |
| | gentype *tgamma*(gentype) |
| | Compute the gamma function. |
| | gentype *trunc*(gentype) |
| | Round to integral value using the round to zero rounding mode. |
| |==== |
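| |
| Several entries above are defined by formulas in terms of other built-ins. |
| The host-side C99 sketch below transcribes three of them, *fract*, *maxmag* |
| and *minmag*, using the C library functions `floorf`, `fminf`, `fmaxf` and |
| `fabsf`. It covers finite `float` inputs only: the special-value behavior of |
| *fract* for infinities and NaN is not modeled, and the `model_` function |
| names are hypothetical. |
| |
| [source,c] |
| ---------- |
| #include <math.h> |
| |
| /* fract(): fmin(x - floor(x), 0x1.fffffep-1f), with floor(x) in *iptr. |
|  * The clamp keeps the result below 1.0f when x - floor(x) rounds up |
|  * to 1.0f, e.g. for tiny negative x. Finite x only. */ |
| float model_fract(float x, float *iptr) |
| { |
|     float whole = floorf(x); |
|     *iptr = whole; |
|     return fminf(x - whole, 0x1.fffffep-1f); |
| } |
| |
| /* maxmag(): the argument with the larger magnitude; ties use fmax. */ |
| float model_maxmag(float x, float y) |
| { |
|     float ax = fabsf(x), ay = fabsf(y); |
|     if (ax > ay) return x; |
|     if (ay > ax) return y; |
|     return fmaxf(x, y); |
| } |
| |
| /* minmag(): the argument with the smaller magnitude; ties use fmin. */ |
| float model_minmag(float x, float y) |
| { |
|     float ax = fabsf(x), ay = fabsf(y); |
|     if (ax < ay) return x; |
|     if (ay < ax) return y; |
|     return fminf(x, y); |
| } |
| ---------- |
| |
| For a tiny negative input such as `-0x1p-30f`, *floor* returns -1.0 and |
| _x_ - *floor*(_x_) rounds to 1.0 in single precision; the clamp to |
| `0x1.fffffep-1f` keeps the result below 1.0. |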
| |
| The following table describes these functions: |
| |
| * A subset of functions from <<table-builtin-math,Built-in Scalar and Vector |
| Argument Math Functions>> that are defined with the `+half_+` prefix. |
| These functions are implemented with a minimum of 10 bits of accuracy, |
| i.e. the maximum error is \<= 8192 ulp. |
| * A subset of functions from <<table-builtin-math,Built-in Scalar and Vector |
| Argument Math Functions>> that are defined with the `+native_+` prefix. |
| These functions may map to one or more native device instructions and |
| will typically have better performance than the corresponding |
| functions without the `+native_+` prefix. |
| The accuracy (and in some cases the input range(s)) of these functions |
| is implementation-defined. |
| * `+half_+` and `+native_+` functions for the following basic operations: |
| divide and reciprocal. |
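| |
| To check a result against the 8192 ulp bound quoted above, the distance |
| between two `float` values can be counted in representable steps. The C99 |
| sketch below maps each IEEE 754 single-precision bit pattern to an integer |
| that increases monotonically with the value. This is a common approximation, |
| not the ulp definition used for accuracy in this specification, which is |
| measured against the exact result; NaN inputs are not handled, and the names |
| `ulp_distance` and `within_half_accuracy` are hypothetical. |
| |
| [source,c] |
| ---------- |
| #include <stdint.h> |
| #include <string.h> |
| |
| /* Map an IEEE 754 binary32 float to an integer that increases |
|  * monotonically with its value. +0.0f and -0.0f both map to 0. */ |
| static int64_t ordered_bits(float f) |
| { |
|     int32_t bits; |
|     memcpy(&bits, &f, sizeof bits); /* reinterpret without aliasing UB */ |
|     return bits < 0 ? -(int64_t)(bits & 0x7fffffff) : (int64_t)bits; |
| } |
| |
| /* Number of representable float steps between a and b (NaN not handled). */ |
| int64_t ulp_distance(float a, float b) |
| { |
|     int64_t d = ordered_bits(a) - ordered_bits(b); |
|     return d < 0 ? -d : d; |
| } |
| |
| /* 8192 ulp is the bound quoted for the half_ functions. */ |
| int within_half_accuracy(float approx, float reference) |
| { |
|     return ulp_distance(approx, reference) <= 8192; |
| } |
| ---------- |
| |
| For example, 1.0f and 1.0f {plus} 2^-10^ are 8192 steps apart, because |
| `float` values in [1, 2) are spaced 2^-23^ apart. |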
| |
| The generic type name `gentype` indicates that the functions in the |
| following table can take `float`, `float2`, `float3`, `float4`, `float8` or |
| `float16` as the type for the arguments. |
| |
| ifdef::cl_khr_fp16[] |
| NOTE: The use of `half` in this table does not refer to the argument and |
| return types, which are 32-bit floating-point values, but to the accuracy |
| requirements of the function results. |
| endif::cl_khr_fp16[] |
| |
| [[table-builtin-half-native-math]] |
| .Built-in Scalar and Vector _half_ and _native_ Math Functions |
| [cols=",",options="header",] |
| |==== |
| | Function | Description |
| | gentype *half_cos*(gentype _x_) |
| | Compute cosine. |
| _x_ is an angle in radians, and must be in the range [-2^16^, +2^16^]. |
| | gentype *half_divide*(gentype _x_, gentype _y_) |
| | Compute _x_ / _y_. |
| | gentype *half_exp*(gentype _x_) |
| | Compute the base-_e_ exponential of _x_. |
| | gentype *half_exp2*(gentype _x_) |
| | Compute the base-2 exponential of _x_. |
| | gentype *half_exp10*(gentype _x_) |
| | Compute the base-10 exponential of _x_. |
| | gentype *half_log*(gentype _x_) |
| | Compute natural logarithm. |
| | gentype *half_log2*(gentype _x_) |
| | Compute a base 2 logarithm. |
| | gentype *half_log10*(gentype _x_) |
| | Compute a base 10 logarithm. |
| | gentype *half_powr*(gentype _x_, gentype _y_) |
| | Compute _x_ to the power _y_, where _x_ is >= 0. |
| | gentype *half_recip*(gentype _x_) |
| | Compute reciprocal. |
| | gentype *half_rsqrt*(gentype _x_) |
| | Compute inverse square root. |
| | gentype *half_sin*(gentype _x_) |
| | Compute sine. |
| _x_ is an angle in radians, and must be in the range [-2^16^, +2^16^]. |
| | gentype *half_sqrt*(gentype _x_) |
| | Compute square root. |
| | gentype *half_tan*(gentype _x_) |
| | Compute tangent. |
| _x_ is an angle in radians, and must be in the range [-2^16^, +2^16^]. |
| | | |
| | gentype *native_cos*(gentype _x_) |
| | Compute cosine over an implementation-defined range, where _x_ is an |
| angle in radians. |
| The maximum error is implementation-defined. |
| | gentype *native_divide*(gentype _x_, gentype _y_) |
| | Compute _x_ / _y_ over an implementation-defined range. |
| The maximum error is implementation-defined. |
| | gentype *native_exp*(gentype _x_) |
| | Compute the base-__e__ exponential of _x_ over an |
| implementation-defined range. |
| The maximum error is implementation-defined. |
| | gentype *native_exp2*(gentype _x_) |
| | Compute the base-2 exponential of _x_ over an implementation-defined |
| range. |
| The maximum error is implementation-defined. |
| | gentype *native_exp10*(gentype _x_) |
| | Compute the base-10 exponential of _x_ over an implementation-defined |
| range. |
| The maximum error is implementation-defined. |
| | gentype *native_log*(gentype _x_) |
| | Compute natural logarithm over an implementation-defined range. |
| The maximum error is implementation-defined. |
| | gentype *native_log2*(gentype _x_) |
| | Compute a base 2 logarithm over an |
| implementation-defined range. |
| The maximum error is implementation-defined. |
| | gentype *native_log10*(gentype _x_) |
| | Compute a base 10 logarithm over |
| an implementation-defined range. |
| The maximum error is implementation-defined. |
| | gentype *native_powr*(gentype _x_, gentype _y_) |
| | Compute _x_ to the power _y_, where _x_ is >= 0. |
| The ranges of _x_ and _y_ are implementation-defined. |
| The maximum error is implementation-defined. |
| | gentype *native_recip*(gentype _x_) |
| | Compute reciprocal over an implementation-defined range. |
| The maximum error is implementation-defined. |
| | gentype *native_rsqrt*(gentype _x_) |
| | Compute inverse square root over an implementation-defined range. |
| The maximum error is implementation-defined. |
| | gentype *native_sin*(gentype _x_) |
| | Compute sine over an implementation-defined range, where _x_ is an |
| angle in radians. |
| The maximum error is implementation-defined. |
| | gentype *native_sqrt*(gentype _x_) |
| | Compute square root over an implementation-defined range. |
| The maximum error is implementation-defined. |
| | gentype *native_tan*(gentype _x_) |
| | Compute tangent over an implementation-defined range, where _x_ is an |
| angle in radians. |
| The maximum error is implementation-defined. |
| |==== |
| |
| |
| Support for denormal values is optional for *half_* functions. |
| The *half_* functions may return any result allowed by |
| <<edge-case-behavior-in-flush-to-zero-mode,Edge Case Behavior>>, even when |
| `-cl-denorms-are-zero` (see <<opencl-spec,section 5.8.4.2 of the OpenCL |
| Specification>>) is not in force. |
| Support for denormal values is implementation-defined for *native_* |
| functions. |
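| |
| A host-side harness that compares *half_* results against a reference may |
| therefore need to accept a flushed result even when `-cl-denorms-are-zero` |
| is not in force. The C99 sketch below covers only one narrow case: accepting |
| zero in place of a subnormal reference value. It is hypothetical and |
| partial; it does not model flushing of subnormal inputs or the other cases |
| described under <<edge-case-behavior-in-flush-to-zero-mode,Edge Case |
| Behavior>>, and it compares for exact equality, so in practice it would be |
| combined with a ulp tolerance. |
| |
| [source,c] |
| ---------- |
| #include <math.h> |
| |
| /* Replace a subnormal value with a zero of the same sign; all other |
|  * values (zeros, normals, infinities, NaN) pass through unchanged. */ |
| float flush_subnormal(float x) |
| { |
|     return fpclassify(x) == FP_SUBNORMAL ? copysignf(0.0f, x) : x; |
| } |
| |
| /* Accept either the exact reference or its flushed-to-zero form. |
|  * Note: == treats +0.0f and -0.0f as equal, so the zero's sign is ignored. */ |
| int matches_allowing_flush(float result, float reference) |
| { |
|     return result == reference || result == flush_subnormal(reference); |
| } |
| ---------- |
| |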
| -- |
| |
| |
| [open,refpage='mathConstants',desc='Math Constants',type='freeform',spec='clang',anchor='table-builtin-half-native-math',xrefs='mathFunctions',alias='MAXFLOAT HUGE_VALF INFINITY NAN HUGE_VAL'] |
| -- |
| The following constants are available. |
| Their values are of type `float` and are accurate within the precision of a |
| single-precision floating-point number. |
| |
| [cols=",",options="header",] |
| |==== |
| | Constant Name | Description |
| | `MAXFLOAT` |
| | Value of the largest finite single-precision floating-point number. |
| | `HUGE_VALF` |
| | A positive `float` constant expression. |
| `HUGE_VALF` evaluates to +infinity. |
| Used as an error value returned by the built-in math functions. |
| | `INFINITY` |
| | A constant expression of type `float` representing positive or |
| unsigned infinity. |
| | `NAN` |
| | A constant expression of type `float` representing a quiet NaN. |
| |==== |
| |
| If <<double-precision-support, double-precision is supported by the |
| device>>, then the following constants are also available: |
| |
| [cols=",",options="header",] |
| |==== |
| | Constant Name | Description |
| | `HUGE_VAL` |
| A positive `double` constant expression.
| `HUGE_VAL` evaluates to +infinity. |
| Used as an error value returned by the built-in math functions. |
| |==== |
| -- |
| |
| |
| [[floating-point-macros-and-pragmas]] |
| ==== Floating-point Macros and Pragmas |
| |
| [open,refpage='fpMacros',desc='Floating-Point Macros And Pragmas',type='freeform',spec='clang',anchor='floating-point-macros-and-pragmas',xrefs='integerMacros',alias='FP_CONTRACT FP_FAST_FMAF FP_FAST_FMA macroLimits'] |
| -- |
| The `FP_CONTRACT` pragma can be used to allow (if the state is on) or |
| disallow (if the state is off) the implementation to contract expressions. |
| Each pragma can occur either outside external declarations or preceding all |
| explicit declarations and statements inside a compound statement. |
| When outside external declarations, the pragma takes effect from its |
| occurrence until another `FP_CONTRACT` pragma is encountered, or until the |
| end of the translation unit. |
| When inside a compound statement, the pragma takes effect from its |
| occurrence until another `FP_CONTRACT` pragma is encountered (including |
| within a nested compound statement), or until the end of the compound |
| statement; at the end of a compound statement the state for the pragma is |
| restored to its condition just before the compound statement. |
| If this pragma is used in any other context, the behavior is undefined. |
| |
| The pragma definition to set `FP_CONTRACT` is: |
| |
| [source,opencl_c] |
| ---------- |
| // on-off-switch is one of ON, OFF, or DEFAULT. |
| // The DEFAULT value is ON. |
| #pragma OPENCL FP_CONTRACT on-off-switch |
| ---------- |
| |
| The `FP_FAST_FMAF` macro indicates whether the *fma* function is fast |
| compared with direct code for single precision floating-point. |
| If defined, the `FP_FAST_FMAF` macro shall indicate that the *fma* function |
| generally executes about as fast as, or faster than, a multiply and an add |
| of `float` operands. |
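
C99 `<math.h>` defines a macro with the same name and meaning.
The usual way of consuming `FP_FAST_FMAF` can therefore be sketched in host-side C as shown below.
The helper `mul_add_f` is hypothetical.
Kernel code would apply the same `#ifdef` test around the OpenCL C *fma* built-in.

```c
#include <math.h>

/* Hypothetical helper: returns a * b + c. fmaf() is used only when
 * FP_FAST_FMAF reports that it is about as fast as, or faster than, a
 * separate multiply and add. The two paths can differ in the last bit:
 * fmaf() rounds once, while a * b + c may round twice (unless the
 * compiler contracts the expression). */
static float mul_add_f(float a, float b, float c)
{
#ifdef FP_FAST_FMAF
    return fmaf(a, b, c);
#else
    return a * b + c;
#endif
}
```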
| |
| The macro names given in the following list must use the values specified. |
| These constant expressions are suitable for use in `#if` preprocessing |
| directives. |
| |
| [source,opencl_c] |
| ---------- |
| #define FLT_DIG 6 |
| #define FLT_MANT_DIG 24 |
| #define FLT_MAX_10_EXP +38 |
| #define FLT_MAX_EXP +128 |
| #define FLT_MIN_10_EXP -37 |
| #define FLT_MIN_EXP -125 |
| #define FLT_RADIX 2 |
| #define FLT_MAX 0x1.fffffep127f |
| #define FLT_MIN 0x1.0p-126f |
| #define FLT_EPSILON 0x1.0p-23f |
| ---------- |
| |
| The following table describes the built-in macro names given above in the |
| OpenCL C programming language and the corresponding macro names available to |
| the application. |
| |
| [cols=",",options="header",] |
| |==== |
| | Macro in OpenCL Language | Macro for application |
| | `FLT_DIG` | `CL_FLT_DIG` |
| | `FLT_MANT_DIG` | `CL_FLT_MANT_DIG` |
| | `FLT_MAX_10_EXP` | `CL_FLT_MAX_10_EXP` |
| | `FLT_MAX_EXP` | `CL_FLT_MAX_EXP` |
| | `FLT_MIN_10_EXP` | `CL_FLT_MIN_10_EXP` |
| | `FLT_MIN_EXP` | `CL_FLT_MIN_EXP` |
| | `FLT_RADIX` | `CL_FLT_RADIX` |
| | `FLT_MAX` | `CL_FLT_MAX` |
| | `FLT_MIN` | `CL_FLT_MIN` |
| `FLT_EPSILON` | `CL_FLT_EPSILON`
| |==== |
| |
The macros `FP_ILOGB0` and `FP_ILOGBNAN` shall expand to integer constant
expressions whose values are returned by *ilogb*(_x_) if _x_ is zero or NaN,
respectively.
| The value of `FP_ILOGB0` shall be either `INT_MIN` or `-INT_MAX`. |
| The value of `FP_ILOGBNAN` shall be either `INT_MAX` or `INT_MIN`. |
| |
| The following constants are also available. |
| They are of type `float` and are accurate within the precision of the |
| `float` type. |
| |
| [cols=",",options="header",] |
| |==== |
| | Constant | Description |
| | `M_E_F` | Value of _e_ |
| | `M_LOG2E_F` | Value of log~2~e |
| | `M_LOG10E_F` | Value of log~10~e |
| | `M_LN2_F` | Value of log~e~2 |
| | `M_LN10_F` | Value of log~e~10 |
| | `M_PI_F` | Value of {pi} |
| | `M_PI_2_F` | Value of {pi} / 2 |
| | `M_PI_4_F` | Value of {pi} / 4 |
| | `M_1_PI_F` | Value of 1 / {pi} |
| | `M_2_PI_F` | Value of 2 / {pi} |
| | `M_2_SQRTPI_F` | Value of 2 / {sqrt}{pi} |
| | `M_SQRT2_F` | Value of {sqrt}2 |
| | `M_SQRT1_2_F` | Value of 1 / {sqrt}2 |
| |==== |
| |
| If <<double-precision-support, double-precision is supported by the |
| device>>, then the following macros and constants are also available: |
| |
The `FP_FAST_FMA` macro indicates whether the *fma*() family of functions is
fast compared with direct code for double-precision floating-point.
If defined, the `FP_FAST_FMA` macro shall indicate that the *fma*() function
generally executes about as fast as, or faster than, a multiply and an add
of `double` operands.
| |
| The macro names given in the following list must use the values specified. |
| These constant expressions are suitable for use in `#if` preprocessing |
| directives. |
| |
| [source,opencl_c] |
| ---------- |
| #define DBL_DIG 15 |
| #define DBL_MANT_DIG 53 |
| #define DBL_MAX_10_EXP +308 |
| #define DBL_MAX_EXP +1024 |
| #define DBL_MIN_10_EXP -307 |
| #define DBL_MIN_EXP -1021 |
| #define DBL_MAX 0x1.fffffffffffffp1023 |
| #define DBL_MIN 0x1.0p-1022 |
| #define DBL_EPSILON 0x1.0p-52 |
| ---------- |
| |
| The following table describes the built-in macro names given above in the |
| OpenCL C programming language and the corresponding macro names available to |
| the application. |
| |
| [cols=",",options="header",] |
| |==== |
| | Macro in OpenCL Language | Macro for application |
| | `DBL_DIG` | `CL_DBL_DIG` |
| | `DBL_MANT_DIG` | `CL_DBL_MANT_DIG` |
| | `DBL_MAX_10_EXP` | `CL_DBL_MAX_10_EXP` |
| | `DBL_MAX_EXP` | `CL_DBL_MAX_EXP` |
| | `DBL_MIN_10_EXP` | `CL_DBL_MIN_10_EXP` |
| | `DBL_MIN_EXP` | `CL_DBL_MIN_EXP` |
| | `DBL_MAX` | `CL_DBL_MAX` |
| | `DBL_MIN` | `CL_DBL_MIN` |
| `DBL_EPSILON` | `CL_DBL_EPSILON`
| |==== |
| |
| The following constants are also available. |
They are of type `double` and are accurate within the precision of the
`double` type.
| |
| [cols=",",options="header",] |
| |==== |
| | Constant | Description |
| | `M_E` | Value of _e_ |
| | `M_LOG2E` | Value of log~2~e |
| | `M_LOG10E` | Value of log~10~e |
| | `M_LN2` | Value of log~e~2 |
| | `M_LN10` | Value of log~e~10 |
| | `M_PI` | Value of {pi} |
| | `M_PI_2` | Value of {pi} / 2 |
| | `M_PI_4` | Value of {pi} / 4 |
| | `M_1_PI` | Value of 1 / {pi} |
| | `M_2_PI` | Value of 2 / {pi} |
| | `M_2_SQRTPI` | Value of 2 / {sqrt}{pi} |
| | `M_SQRT2` | Value of {sqrt}2 |
| | `M_SQRT1_2` | Value of 1 / {sqrt}2 |
| |==== |
| |
| ifdef::cl_khr_fp16[] |
| If the `<<cl_khr_fp16>>` extension macro is supported, then the following |
| macros and constants are also available: |
| |
The `FP_FAST_FMA_HALF` macro indicates whether the *fma*() family of
functions is fast compared with direct code for half-precision
floating-point.
| If defined, the `FP_FAST_FMA_HALF` macro shall indicate that the *fma*() |
| function generally executes about as fast as, or faster than, a multiply and |
| an add of `half` operands. |
| |
| The macro names given in the following list must use the values specified. |
These constant expressions are suitable for use in `#if` preprocessing
directives.
| |
| [source,opencl_c] |
| ---- |
| #define HALF_DIG 3 |
| #define HALF_MANT_DIG 11 |
| #define HALF_MAX_10_EXP +4 |
| #define HALF_MAX_EXP +16 |
| #define HALF_MIN_10_EXP -4 |
| #define HALF_MIN_EXP -13 |
| #define HALF_RADIX 2 |
| #define HALF_MAX 0x1.ffcp15h |
| #define HALF_MIN 0x1.0p-14h |
| #define HALF_EPSILON 0x1.0p-10h |
| ---- |
| |
| The following table describes the built-in macro names given above in the |
| OpenCL C programming language and the corresponding macro names available to |
| the application. |
| |
| [cols=",",options="header",] |
| |==== |
| | Macro in OpenCL Language | Macro for application |
| | `HALF_DIG` | `CL_HALF_DIG` |
| | `HALF_MANT_DIG` | `CL_HALF_MANT_DIG` |
| | `HALF_MAX_10_EXP` | `CL_HALF_MAX_10_EXP` |
| | `HALF_MAX_EXP` | `CL_HALF_MAX_EXP` |
| | `HALF_MIN_10_EXP` | `CL_HALF_MIN_10_EXP` |
| | `HALF_MIN_EXP` | `CL_HALF_MIN_EXP` |
| | `HALF_RADIX` | `CL_HALF_RADIX` |
| | `HALF_MAX` | `CL_HALF_MAX` |
| | `HALF_MIN` | `CL_HALF_MIN` |
| `HALF_EPSILON` | `CL_HALF_EPSILON`
| |==== |
| |
| The following constants are also available. |
| They are of type `half` and are accurate within the precision of the `half` |
| type. |
| |
| [cols=",",options="header",] |
| |==== |
| | Constant | Description |
| `M_E_H` | Value of _e_
| | `M_LOG2E_H` | Value of log~2~e |
| | `M_LOG10E_H` | Value of log~10~e |
| | `M_LN2_H` | Value of log~e~2 |
| | `M_LN10_H` | Value of log~e~10 |
| | `M_PI_H` | Value of {pi} |
| | `M_PI_2_H` | Value of {pi} / 2 |
| | `M_PI_4_H` | Value of {pi} / 4 |
| | `M_1_PI_H` | Value of 1 / {pi} |
| | `M_2_PI_H` | Value of 2 / {pi} |
| | `M_2_SQRTPI_H` | Value of 2 / {sqrt}{pi} |
| | `M_SQRT2_H` | Value of {sqrt}2 |
| | `M_SQRT1_2_H` | Value of 1 / {sqrt}2 |
| |==== |
| endif::cl_khr_fp16[] |
| |
| -- |
| |
| |
| [[integer-functions]] |
| === Integer Functions |
| |
| [open,refpage='integerFunctions',desc='Integer Functions',type='freeform',spec='clang',anchor='integer-functions',xrefs='commonFunctions',alias='abs add_sat clamp_integer clz ctz hadd mad24 mad_hi mad_sat integerMax integerMin mul24 mul_hi popcount rotate sub_sat upsample'] |
| -- |
| The <<table-builtin-functions, following table>> describes the built-in integer |
| functions that take scalar or vector arguments. |
| The vector versions of the integer functions operate component-wise. |
| The description is per-component. |
| |
| We use the generic type name `gentype` to indicate that the function can take |
| `char`, `char__n__`, `uchar`, `uchar__n__`, `short`, |
| `short__n__`, `ushort`, `ushort__n__`, `int`, `int__n__`, |
| `uint`, `uint__n__`, `long` footnote:[{fn-int64-supported}], |
| `long__n__`, `ulong`, or `ulong__n__` as the type for the |
| arguments. |
| We use the generic type name `ugentype` to refer to unsigned versions of |
| `gentype`. |
| For example, if `gentype` is `char4`, `ugentype` is `uchar4`. |
| We also use the generic type name `sgentype` to indicate that the function |
| can take a scalar data type, i.e. `char`, `uchar`, `short`, `ushort`, `int`, |
| `uint`, `long`, or `ulong`, as the type for the arguments. |
| For built-in integer functions that take `gentype` and `sgentype` arguments, |
| the `gentype` argument must be a vector or scalar version of the `sgentype` |
| argument. |
| For example, if `sgentype` is `uchar`, `gentype` must be `uchar` or |
| `uchar__n__`. |
| For vector versions, `sgentype` is implicitly widened to `gentype` as |
| described for <<operators-arithmetic,arithmetic operators>>. |
| _n_ is 2, 3, 4, 8, or 16. |
| |
For any specific use of a function with `gentype*` arguments, the actual type
must be the same for all arguments and the return type, unless they are
explicitly specified as an actual type.
| |
| [[table-builtin-functions]] |
| .Built-in Scalar and Vector Integer Argument Functions |
| [cols=",",options="header",] |
| |==== |
| | Function | Description |
| | ugentype *abs*(gentype _x_) |
| | Returns \|x\|. |
| | ugentype *abs_diff*(gentype _x_, gentype _y_) |
| | Returns \|x - y\| without modulo overflow. |
| | gentype *add_sat*(gentype _x_, gentype _y_) |
| | Returns _x_ + _y_ and saturates the result. |
| | gentype *hadd*(gentype _x_, gentype _y_) |
| | Returns (_x_ + _y_) >> 1. |
| The intermediate sum does not modulo overflow. |
| | gentype *rhadd*(gentype _x_, gentype _y_) |
| | Returns (_x_ + _y_ + 1) >> 1. |
| The intermediate sum does not modulo overflow. |
| footnote:[{fn-rhadd-benefit}] |
| | gentype *clamp*(gentype _x_, gentype _minval_, gentype _maxval_) + |
| gentype *clamp*(gentype _x_, sgentype _minval_, sgentype _maxval_) |
| | Returns *min*(*max*(_x_, _minval_), _maxval_). |
| Results are undefined if _minval_ > _maxval_. |
| |
| <<unified-spec, Requires>> support for OpenCL C 1.1 or newer. |
| | gentype *clz*(gentype _x_) |
| | Returns the number of leading 0-bits in _x_, starting at the most |
| significant bit position. |
| If _x_ is 0, returns the size in bits of the type of _x_ or component |
| type of _x_, if _x_ is a vector. |
| | gentype *ctz*(gentype _x_) |
| | Returns the count of trailing 0-bits in _x_. |
| If _x_ is 0, returns the size in bits of the type of _x_ or component |
| type of _x_, if _x_ is a vector. |
| |
<<unified-spec, Requires>> support for OpenCL C 2.0 or newer.
| ifdef::cl_khr_integer_dot_product[] |
| | uint *dot*(uchar4 a, uchar4 b) + |
| int *dot*(char4 a, char4 b) + |
| int *dot*(uchar4 a, char4 b) + |
| int *dot*(char4 a, uchar4 b) |
| | `dot` returns the dot product of the two input vectors `a` and `b`. |
| The components of `a` and `b` are sign- or zero-extended to the width |
| of the destination type and the vectors with extended components are |
| multiplied component-wise. |
| All the components of the resulting vectors are added together to form |
| the final result. |
| |
| <<unified-spec, Requires>> that the |
{opencl_c_integer_dot_product_input_4x8bit} feature macro is defined.
| |
| | uint *dot_acc_sat*(uchar4 a, uchar4 b, uint acc) + |
| int *dot_acc_sat*(char4 a, char4 b, int acc) + |
| int *dot_acc_sat*(uchar4 a, char4 b, int acc) + |
| int *dot_acc_sat*(char4 a, uchar4 b, int acc) |
| a| `dot_acc_sat` returns the saturating addition of the dot product of |
| the two input vectors `a` and `b` and the accumulator `acc`: |
| ---- |
| product = dot(a,b); |
| result = add_sat(product, acc); |
| ---- |
| |
| <<unified-spec, Requires>> that the |
{opencl_c_integer_dot_product_input_4x8bit} feature macro is defined.
| |
| | uint *dot_4x8packed_uu_uint*(uint a, uint b) + |
| int *dot_4x8packed_ss_int*(uint a, uint b) + |
| int *dot_4x8packed_us_int*(uint a, uint b) + |
| int *dot_4x8packed_su_int*(uint a, uint b) |
| Returns *dot* for 4x8-bit input vectors packed into a 32-bit word.
| |
| <<unified-spec, Requires>> that the |
| {opencl_c_integer_dot_product_input_4x8bit_packed} feature macro is |
defined.
| |
| | uint *dot_acc_sat_4x8packed_uu_uint*(uint a, uint b, uint acc) + |
| int *dot_acc_sat_4x8packed_ss_int*(uint a, uint b, int acc) + |
| int *dot_acc_sat_4x8packed_us_int*(uint a, uint b, int acc) + |
| int *dot_acc_sat_4x8packed_su_int*(uint a, uint b, int acc) |
| Returns *dot_acc_sat* for 4x8-bit input vectors packed into a 32-bit
word.
| |
| <<unified-spec, Requires>> that the |
| {opencl_c_integer_dot_product_input_4x8bit_packed} feature macro is |
defined.
| endif::cl_khr_integer_dot_product[] |
| |
| | gentype *mad_hi*(gentype _a_, gentype _b_, gentype _c_) |
| | Returns *mul_hi*(_a_, _b_) + _c_. |
| | gentype *mad_sat*(gentype _a_, gentype _b_, gentype _c_) |
| | Returns _a_ * _b_ + _c_ and saturates the result. |
| | gentype *max*(gentype _x_, gentype _y_) |
| |
| For OpenCL C 1.1 or newer: |
| |
| gentype *max*(gentype _x_, sgentype _y_) |
| | Returns _y_ if _x_ < _y_, otherwise it returns _x_. |
| | gentype *min*(gentype _x_, gentype _y_) |
| |
| For OpenCL C 1.1 or newer: |
| |
| gentype *min*(gentype _x_, sgentype _y_) |
| | Returns _y_ if _y_ < _x_, otherwise it returns _x_. |
| | gentype *mul_hi*(gentype _x_, gentype _y_) |
| | Computes _x_ * _y_ and returns the high half of the product of _x_ and |
| _y_. |
| | gentype *rotate*(gentype _v_, gentype _i_) |
| | For each element in _v_, the bits are shifted left by the number of |
| bits given by the corresponding element in _i_ (subject to the usual |
| <<operators-shift,shift modulo rules>>). |
| Bits shifted off the left side of the element are shifted back in from |
| the right. |
| | gentype *sub_sat*(gentype _x_, gentype _y_) |
| | Returns _x_ - _y_ and saturates the result. |
| | short *upsample*(char _hi_, uchar _lo_) + |
| ushort *upsample*(uchar _hi_, uchar _lo_) + |
| short__n__ *upsample*(char__n__ _hi_, uchar__n__ _lo_) + |
| ushort__n__ *upsample*(uchar__n__ _hi_, uchar__n__ _lo_) |
| | _result_[i] = ((short)_hi_[i] << 8) \| _lo_[i] + |
_result_[i] = ((ushort)_hi_[i] << 8) \| _lo_[i]
| | int *upsample*(short _hi_, ushort _lo_) + |
| uint *upsample*(ushort _hi_, ushort _lo_) + |
| int__n__ *upsample*(short__n__ _hi_, ushort__n__ _lo_) + |
| uint__n__ *upsample*(ushort__n__ _hi_, ushort__n__ _lo_) |
| | _result_[i] = ((int)_hi_[i] << 16) \| _lo_[i] + |
_result_[i] = ((uint)_hi_[i] << 16) \| _lo_[i]
| | long *upsample*(int _hi_, uint _lo_) + |
| ulong *upsample*(uint _hi_, uint _lo_) + |
| long__n__ *upsample*(int__n__ _hi_, uint__n__ _lo_) + |
| ulong__n__ *upsample*(uint__n__ _hi_, uint__n__ _lo_) |
| | _result_[i] = ((long)_hi_[i] << 32) \| _lo_[i] + |
| _result_[i] = ((ulong)_hi_[i] << 32) \| _lo_[i] |
| | gentype *popcount*(gentype _x_) |
| | Returns the number of non-zero bits in _x_. |
| |
| <<unified-spec, Requires>> support for OpenCL C 1.2 or newer. |
| |==== |
| |
| The following table describes fast integer functions that can be used for |
| optimizing performance of kernels. |
| We use the generic type name `gentype` to indicate that the function can |
| take `int`, `int2`, `int3`, `int4`, `int8`, `int16`, `uint`, `uint2`, |
| `uint3`, `uint4`, `uint8` or `uint16` as the type for the arguments. |
| |
| [[table-builtin-fast-integer]] |
| .Built-in 24-bit Integer Functions |
| [cols=",",options="header",] |
| |==== |
| | Function | Description |
| gentype *mad24*(gentype _x_, gentype _y_, gentype _z_)
| Multiply two 24-bit integer values _x_ and _y_ and add the 32-bit
integer result to the 32-bit integer _z_.
Refer to the definition of *mul24* to see how the 24-bit integer
multiplication is performed.
| | gentype *mul24*(gentype _x_, gentype _y_) |
| | Multiply two 24-bit integer values _x_ and _y_. |
_x_ and _y_ are 32-bit integers, but only the low 24 bits are used to
| perform the multiplication. |
| *mul24* should only be used when values in _x_ and _y_ are in the |
| range [-2^23^, 2^23^-1] if _x_ and _y_ are signed integers and in the |
| range [0, 2^24^-1] if _x_ and _y_ are unsigned integers. |
| If _x_ and _y_ are not in this range, the multiplication result is |
| implementation-defined. |
| |==== |
| -- |
| |
| |
| ifdef::cl_khr_extended_bit_ops[] |
| [[extended-bit-operations]] |
| ==== Extended Bit Operations |
| |
| [open,refpage='extendedBitOperations',desc='Extended Bit Operations',type='freeform',spec='clang',anchor='extended-bit-operations',xrefs='commonFunctions',alias='bitfield_insert bitfield_extract_signed bitfield_extract_unsigned bit_reverse'] |
| -- |
| If the `<<cl_khr_extended_bit_ops>>` extension macro is supported, the |
| functions described in the <<table-builtin-extended-bit-operations, Built-in |
| Scalar and Vector Extended Bit Operations>> table can be used with built-in |
| scalar or vector integer types to perform extended bit operations. |
| The functions that operate on vector types operate component-wise. |
| The description is per-component. |
| |
| In the table below, the generic type name `gentype` refers to the built-in |
| integer types `char`, `char__n__`, `uchar`, `uchar__n__`, `short`, |
| `short__n__`, `ushort`, `ushort__n__`, `int`, `int__n__`, `uint`, |
| `uint__n__`, `long`, `long__n__`, `ulong`, and `ulong__n__`. |
| The generic type name `igentype` refers to the built-in signed integer types |
| `char`, `char__n__`, `short`, `short__n__`, `int`, `int__n__`, `long`, and |
| `long__n__`. |
| The generic type name `ugentype` refers to the built-in unsigned integer |
| types `uchar`, `uchar__n__`, `ushort`, `ushort__n__`, `uint`, `uint__n__`, |
| `ulong`, and `ulong__n__`. |
| _n_ is 2, 3, 4, 8, or 16. |
| |
| [[table-builtin-extended-bit-operations]] |
| .Built-in Scalar and Vector Extended Bit Operations |
| [cols="1a,1", options="header"] |
| |=== |
| | Function | Description |
| a| |
| [source,opencl_c] |
| ---- |
| gentype bitfield_insert( |
| gentype base, gentype insert, |
| uint offset, uint count) |
| ---- |
| | Returns a copy of _base_, with a modified bitfield that comes from |
| _insert_. |
| |
| Any bits of the result value numbered outside [_offset_, _offset_ {plus} |
| _count_ - 1] (inclusive) will come from the corresponding bits in |
| _base_. |
| |
| Any bits of the result value numbered inside [_offset_, _offset_ {plus} |
| _count_ - 1] (inclusive) will come from the bits numbered [0, _count_ |
| - 1] (inclusive) of _insert_. |
| |
| _count_ is the number of bits to be modified. |
| If _count_ equals 0, the return value will be equal to _base_. |
| |
If _count_, _offset_, or _offset_ + _count_ is greater than the number of
bits in `gentype` (for scalar types) or in the components of `gentype`
(for vector types), the result is undefined.
| |
| <<unified-spec, Requires>> support for the |
| `<<cl_khr_extended_bit_ops>>` extension macro. |
| a| |
| [source,opencl_c] |
| ---- |
| igentype bitfield_extract_signed( |
| gentype base, |
| uint offset, uint count) |
| ---- |
| | Returns an extracted bitfield from _base_ with sign extension. |
| The type of the return value is always a signed type. |
| |
| The bits of _base_ numbered in [_offset_, _offset_ + _count_ - 1] |
| (inclusive) are returned as the bits numbered in [0, _count_ - 1] |
| (inclusive) of the result. |
| The remaining bits in the result will be sign extended by replicating |
| the bit numbered _offset_ + _count_ - 1 of _base_. |
| |
| _count_ is the number of bits to be extracted. |
| If _count_ equals 0, the result is 0. |
| |
If _count_, _offset_, or _offset_ + _count_ is greater than the number of
bits in `gentype` (for scalar types) or in the components of `gentype`
(for vector types), the result is undefined.
| |
| <<unified-spec, Requires>> support for the |
| `<<cl_khr_extended_bit_ops>>` extension macro. |
| a| |
| [source,opencl_c] |
| ---- |
| ugentype bitfield_extract_unsigned( |
| gentype base, |
| uint offset, uint count) |
| ---- |
| | Returns an extracted bitfield from _base_ with zero extension. |
| The type of the return value is always an unsigned type. |
| |
| The bits of _base_ numbered in [_offset_, _offset_ + _count_ - 1] |
| (inclusive) are returned as the bits numbered in [0, _count_ - 1] |
| (inclusive) of the result. |
| The remaining bits in the result will be zero. |
| |
| _count_ is the number of bits to be extracted. |
| If _count_ equals 0, the result is 0. |
| |
If _count_, _offset_, or _offset_ + _count_ is greater than the number of
bits in `gentype` (for scalar types) or in the components of `gentype`
(for vector types), the result is undefined.
| |
| <<unified-spec, Requires>> support for the |
| `<<cl_khr_extended_bit_ops>>` extension macro. |
| a| |
| [source,opencl_c] |
| ---- |
| gentype bit_reverse( |
| gentype base) |
| ---- |
| | Returns the value of _base_ with reversed bits. |
That is, bit _n_ of the result is taken from bit _width_ - _n_ - 1 of
_base_ (for scalar types) or of the corresponding component of _base_
(for vector types), where _width_ is the number of bits in `gentype`
(for scalar types) or in the components of `gentype` (for vector
types).
| |
| <<unified-spec, Requires>> support for the |
| `<<cl_khr_extended_bit_ops>>` extension macro. |
| |=== |
| -- |
| endif::cl_khr_extended_bit_ops[] |
| |
| |
| [[integer-macros]] |
| ==== Integer Macros |
| |
| [open,refpage='integerMacros',desc='Integer Macros',type='freeform',spec='clang',anchor='integer-macros',xrefs='fpMacros'] |
| -- |
| The macro names given in the following list must use the values specified. |
| The values shall all be constant expressions suitable for use in `#if` |
| preprocessing directives. |
| |
| [source,opencl_c] |
| ---------- |
| #define CHAR_BIT 8 |
| #define CHAR_MAX SCHAR_MAX |
| #define CHAR_MIN SCHAR_MIN |
| #define INT_MAX 2147483647 |
| #define INT_MIN (-2147483647 - 1) |
| #define LONG_MAX 0x7fffffffffffffffL |
| #define LONG_MIN (-0x7fffffffffffffffL - 1) |
| #define SCHAR_MAX 127 |
| #define SCHAR_MIN (-127 - 1) |
| #define SHRT_MAX 32767 |
| #define SHRT_MIN (-32767 - 1) |
| #define UCHAR_MAX 255 |
| #define USHRT_MAX 65535 |
| #define UINT_MAX 0xffffffff |
| #define ULONG_MAX 0xffffffffffffffffUL |
| ---------- |
| |
| The following table describes the built-in macro names given above in the |
| OpenCL C programming language and the corresponding macro names available to |
| the application. |
| |
| [cols=",",options="header",] |
| |==== |
| | Macro in OpenCL Language | Macro for application |
| | `CHAR_BIT` | `CL_CHAR_BIT` |
| | `CHAR_MAX` | `CL_CHAR_MAX` |
| | `CHAR_MIN` | `CL_CHAR_MIN` |
| | `INT_MAX` | `CL_INT_MAX` |
| | `INT_MIN` | `CL_INT_MIN` |
| | `LONG_MAX` | `CL_LONG_MAX` |
| | `LONG_MIN` | `CL_LONG_MIN` |
| | `SCHAR_MAX` | `CL_SCHAR_MAX` |
| | `SCHAR_MIN` | `CL_SCHAR_MIN` |
| | `SHRT_MAX` | `CL_SHRT_MAX` |
| | `SHRT_MIN` | `CL_SHRT_MIN` |
| | `UCHAR_MAX` | `CL_UCHAR_MAX` |
| | `USHRT_MAX` | `CL_USHRT_MAX` |
| | `UINT_MAX` | `CL_UINT_MAX` |
| | `ULONG_MAX` | `CL_ULONG_MAX` |
| |==== |
| -- |
| |
| |
| [[common-functions]] |
| === Common Functions |
| |
| [open,refpage='commonFunctions',desc='Common Functions',type='freeform',spec='clang',anchor='common-functions',xrefs='integerFunctions',alias='commonClamp degrees commonMax commonMin mix radians sign smoothstep step'] |
| -- |
| The <<table-builtin-common, following table>> describes the list of built-in |
| common functions. |
| These all operate component-wise. |
| The description is per-component. |
| |
| The generic type name `gentype` indicates that the function can take any of |
| |
| * `float`, `float2`, `float3`, `float4`, `float8`, or `float16` |
| * `double` footnote:double-supported[{fn-double-supported}], `double2`, |
| `double3`, `double4`, `double8` or `double16` |
| ifdef::cl_khr_fp16[] |
| * `half` footnote:[{fn-half-supported}], `half2`, `half3`, `half4`, |
| `half8` or `half16` |
| endif::cl_khr_fp16[] |
| |
| as the type for the arguments. |
| |
| The generic type name `gentypef` indicates that the function can take any of |
| |
| * `float`, `float2`, `float3`, `float4`, `float8`, or `float16` |
| |
| as the type for the arguments. |
| |
| The generic type name `gentyped` footnote:[{fn-double-supported}] indicates |
| that the function can take any of |
| |
| * `double`, `double2`, `double3`, `double4`, `double8` or `double16` |
| |
| as the type for the arguments. |
| |
| ifdef::cl_khr_fp16[] |
| The generic type name `gentypeh` footnote:[{fn-half-supported}] indicates |
| that the function can take any of |
| |
| * `half`, `half2`, `half3`, `half4`, `half8` or `half16` |
| |
| as the type for the arguments. |
| |
| NOTE: All functions taking or returning `half` types are supported only when |
| the `<<cl_khr_fp16>>` extension macro is supported. |
| endif::cl_khr_fp16[] |
| |
| [[table-builtin-common]] |
| .Built-in Scalar and Vector Argument Common Functions |
| [cols=",",options="header",] |
| |==== |
| | Function | Description |
| | gentype *clamp*(gentype _x_, gentype _minval_, gentype _maxval_) + |
| gentypef *clamp*(gentypef _x_, float _minval_, float _maxval_) + |
| gentyped *clamp*(gentyped _x_, double _minval_, double _maxval_) |
| |
| ifdef::cl_khr_fp16[gentypeh *clamp*(gentypeh _x_, half _minval_, half _maxval_)] |
| | Returns *fmin*(*fmax*(_x_, _minval_), _maxval_). |
| Results are undefined if _minval_ > _maxval_. |
| | gentype *degrees*(gentype _radians_) |
| | Converts _radians_ to degrees, i.e. (180 / {pi}) * _radians_. |
| | gentype *max*(gentype _x_, gentype _y_) + |
| gentypef *max*(gentypef _x_, float _y_) + |
| gentyped *max*(gentyped _x_, double _y_) |
| |
| ifdef::cl_khr_fp16[gentypeh *max*(gentypeh _x_, half _y_)] |
| | Returns _y_ if _x_ < _y_, otherwise it returns _x_. |
| If _x_ or _y_ are infinite or NaN, the return values are undefined. |
| | gentype *min*(gentype _x_, gentype _y_) + |
| gentypef *min*(gentypef _x_, float _y_) + |
| gentyped *min*(gentyped _x_, double _y_) |
| |
| ifdef::cl_khr_fp16[gentypeh *min*(gentypeh _x_, half _y_)] |
| | Returns _y_ if _y_ < _x_, otherwise it returns _x_. |
| If _x_ or _y_ are infinite or NaN, the return values are undefined. |
| | gentype *mix*(gentype _x_, gentype _y_, gentype _a_) + |
| gentypef *mix*(gentypef _x_, gentypef _y_, float _a_) + |
| gentyped *mix*(gentyped _x_, gentyped _y_, double _a_) |
| |
| ifdef::cl_khr_fp16[gentypeh *mix*(gentypeh _x_, gentypeh _y_, half _a_)] |
| a| Returns the linear blend of _x_ and _y_ implemented as: |
| |
| _x_ + (_y_ - _x_) * _a_ |
| |
| _a_ must be a value in the range [0.0, 1.0]. |
| If _a_ is not in the range [0.0, 1.0], the return values are |
| undefined. |
| |
| ifdef::cl_khr_fp16[] |
| NOTE: The half-precision *mix* function can be implemented using |
| contractions such as *mad* or *fma*. |
| endif::cl_khr_fp16[] |
| | gentype *radians*(gentype _degrees_) |
| | Converts _degrees_ to radians, i.e. ({pi} / 180) * _degrees_. |
| | gentype *step*(gentype _edge_, gentype _x_) + |
| gentypef *step*(float _edge_, gentypef _x_) + |
| gentyped *step*(double _edge_, gentyped _x_) |
| |
| ifdef::cl_khr_fp16[gentypeh *step*(half _edge_, gentypeh _x_)] |
| | Returns 0.0 if _x_ < _edge_, otherwise it returns 1.0. |
| | gentype *smoothstep*(gentype _edge0_, gentype _edge1_, gentype _x_) + |
| gentypef *smoothstep*(float _edge0_, float _edge1_, gentypef _x_) + |
| gentyped *smoothstep*(double _edge0_, double _edge1_, gentyped _x_) |
| |
| ifdef::cl_khr_fp16[gentypeh *smoothstep*(half _edge0_, half _edge1_, gentypeh _x_)] |
| a| Returns 0.0 if _x_ \<= _edge0_ and 1.0 if _x_ >= _edge1_ and performs |
| smooth Hermite interpolation between 0 and 1 when _edge0_ < _x_ < |
| _edge1_. |
This is useful when a threshold function with a smooth transition is
needed.
| |
| This is equivalent to: |
| |
| [source,opencl_c] |
| ---------- |
| gentype t; |
| t = clamp ((x - edge0) / (edge1 - edge0), 0, 1); |
| return t * t * (3 - 2 * t); |
| ---------- |
| |
| Results are undefined if _edge0_ >= _edge1_ or if _x_, _edge0_ or _edge1_ is |
| a NaN. |
| |
| ifdef::cl_khr_fp16[] |
NOTE: The half-precision *smoothstep* function can be implemented using
| contractions such as *mad* or *fma*. |
| endif::cl_khr_fp16[] |
| | gentype *sign*(gentype _x_) |
| | Returns 1.0 if _x_ > 0, -0.0 if _x_ = -0.0, +0.0 if _x_ = +0.0, or |
| -1.0 if _x_ < 0. |
| Returns 0.0 if _x_ is a NaN. |
| |==== |
| |
| -- |
| |
| |
| [[geometric-functions]] |
| === Geometric Functions |
| |
| [open,refpage='geometricFunctions',desc='Geometric Functions',type='freeform',spec='clang',anchor='geometric-functions',xrefs='integerFunctions',alias='cross dot distance length normalize fast_distance fast_length fast_normalize'] |
| -- |
The <<table-builtin-geometric, following table>> describes the list of built-in
geometric functions.
Unlike the common functions, these functions do not operate component-wise:
they generally combine the components of their vector arguments, as
described for each function.
| |
| The generic type name `gentypef` indicates that the function can take any of |
| |
| * `float`, `float2`, `float3`, or `float4` |
| |
| as the type for the arguments. |
| |
| The generic type name `gentyped` footnote:[{fn-double-supported}] indicates |
| that the function can take any of |
| |
| * `double`, `double2`, `double3`, or `double4` |
| |
| as the type for the arguments. |
| |
| ifdef::cl_khr_fp16[] |
| The generic type name `gentypeh` footnote:[{fn-half-supported}] indicates |
| that the function can take any of |
| |
| * `half`, `half2`, `half3`, or `half4` |
| |
| as the type for the arguments. |
| |
| NOTE: All functions taking or returning `half` types are supported only when |
| the `<<cl_khr_fp16>>` extension macro is supported. |
| endif::cl_khr_fp16[] |
| |
For any specific use of a function with `gentype*` arguments, the actual type
must be the same for all arguments and for the return type, unless they are
explicitly specified as an actual type.
| |
| [[table-builtin-geometric]] |
| .Built-in Scalar and Vector Argument Geometric Functions |
| [cols=",",options="header",] |
| |==== |
| | Function | Description |
| | float4 *cross*(float4 _p0_, float4 _p1_) + |
| float3 *cross*(float3 _p0_, float3 _p1_) + |
| double4 *cross*(double4 _p0_, double4 _p1_) + |
| double3 *cross*(double3 _p0_, double3 _p1_) |
| |
| ifdef::cl_khr_fp16[] |
| half4 *cross*(half4 _p0_, half4 _p1_) + |
| half3 *cross*(half3 _p0_, half3 _p1_) |
| endif::cl_khr_fp16[] |
| Returns the cross product of _p0.xyz_ and _p1.xyz_.
The _w_ component of the result of the four-component variants is 0.0.
| | float *dot*(gentypef _p0_, gentypef _p1_) + |
| double *dot*(gentyped _p0_, gentyped _p1_) |
| |
| ifdef::cl_khr_fp16[half *dot*(gentypeh _p0_, gentypeh _p1_)] |
| Returns the dot product of _p0_ and _p1_.
| | float *distance*(gentypef _p0_, gentypef _p1_) + |
| double *distance*(gentyped _p0_, gentyped _p1_) |
| |
| ifdef::cl_khr_fp16[half *distance*(gentypeh _p0_, gentypeh _p1_)] |
| | Returns the distance between _p0_ and _p1_. |
| This is calculated as *length*(_p0_ - _p1_). |
| | float *length*(gentypef _p_) + |
| double *length*(gentyped _p_) |
| |
| ifdef::cl_khr_fp16[half *length*(gentypeh _p_)] |
| Returns the length of vector _p_, i.e., {sqrt}(__p.x__^2^ + __p.y__^2^ + ...).
| | gentypef *normalize*(gentypef _p_) + |
| gentyped *normalize*(gentyped _p_) |
| |
| ifdef::cl_khr_fp16[gentypeh *normalize*(gentypeh _p_)] |
| | Returns a vector in the same direction as _p_ but with a length of 1. |
| | | |
| float *fast_distance*(gentypef _p0_, gentypef _p1_)
| | Returns *fast_length*(_p0_ - _p1_). |
| float *fast_length*(gentypef _p_)
| | Returns the length of vector _p_ computed as: |
| |
| *half_sqrt*(__p.x__^2^ + __p.y__^2^ + ...) |
| gentypef *fast_normalize*(gentypef _p_)
| a| Returns a vector in the same direction as _p_ but with a length of 1. |
| *fast_normalize* is computed as: |
| |
| _p_ * *half_rsqrt*(__p.x__^2^ + __p.y__^2^ + ...) |
| |
| The result shall be within 8192 ulps error from the infinitely precise |
| result of |
| |
| [source,opencl_c] |
| ---------- |
| if (all(p == 0.0f)) |
| result = p; |
| else |
| result = p / |
| sqrt(p.x*p.x + p.y*p.y + ...); |
| ---------- |
| |
| with the following exceptions: |
| |
. If the sum of squares is greater than `FLT_MAX`, the values of the
floating-point elements in the result vector are undefined.
. If the sum of squares is less than `FLT_MIN`, the implementation
may return _p_.
| . If the device is in "`denorms are flushed to zero`" mode, individual |
| operand elements with magnitude less than *sqrt*(`FLT_MIN`) may be flushed |
| to zero before proceeding with the calculation. |
| |==== |
| -- |
| |
| |
| [[relational-functions]] |
| === Relational Functions |
| |
| [open,refpage='relationalFunctions',desc='Relational Functions',type='freeform',spec='clang',anchor='relational-functions',xrefs='integerFunctions',alias='all any bitselect isequal isfinite isgreater isgreaterequal isinf isless islessequal islessgreater isnan isnormal isnotequal isordered isunordered select signbit'] |
| -- |
| The <<operators-relational,relational>> and <<operators-equality,equality>> |
| operators (*<*, *\<=*, *>*, *>=*, *!=*, *==*) can be used with scalar and |
| vector built-in types and produce a scalar or vector signed integer result |
| respectively. |
| |
| The functions described in the <<table-builtin-relational,Built-in Scalar and |
| Vector Relational Functions>> table can be used with built-in scalar or vector |
| types as arguments and return a scalar or vector integer result |
| footnote:[{fn-floating-point-exception-nans}]. |
| The argument type `gentype` refers to the following built-in types: `char`, |
| `char__n__`, `uchar`, `uchar__n__`, `short`, `short__n__`, `ushort`, |
| `ushort__n__`, `int`, `int__n__`, `uint`, `uint__n__`, `long` |
| footnote:[{fn-int64-supported}], `long__n__`, `ulong`, `ulong__n__`, `float`, |
| `float__n__`, `double` footnote:[{fn-double-supported}], and |
| `double__n__`. |
| The argument type `igentype` refers to the built-in signed integer types |
| i.e. `char`, `char__n__`, `short`, `short__n__`, `int`, `int__n__`, `long` |
| and `long__n__`. |
| The argument type `ugentype` refers to the built-in unsigned integer types |
| i.e. `uchar`, `uchar__n__`, `ushort`, `ushort__n__`, `uint`, `uint__n__`, |
| `ulong` and `ulong__n__`. |
| _n_ is 2, 3, 4, 8, or 16. |
| |
| The functions *isequal*, *isnotequal*, *isgreater*, *isgreaterequal*, |
| *isless*, *islessequal*, *islessgreater*, *isfinite*, *isinf*, *isnan*, |
| *isnormal*, *isordered*, *isunordered* and *signbit* described in the |
| following table shall return a 0 if the specified relation is _false_ and a |
| 1 if the specified relation is _true_ for scalar argument types. |
| These functions shall return a 0 if the specified relation is _false_ and a |
| -1 (i.e. all bits set) if the specified relation is _true_ for vector |
| argument types. |
| |
| The relational functions *isequal*, *isgreater*, *isgreaterequal*, *isless*, |
| *islessequal*, and *islessgreater* always return 0 if either argument is not |
| a number (NaN). |
*isnotequal* returns 1 if one or both scalar arguments are not a number
(NaN); for vector arguments, each component of the result is -1 if one or
both of the corresponding argument components are NaN.
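
The scalar and vector result conventions and the NaN rules above can be modeled on the host in plain C99. This sketch is illustrative only: the `*_model` names are hypothetical, vectors are represented as arrays, and each result lane uses `int`, as `int__n__` would for `float__n__` arguments. It relies on the IEEE 754 behavior of C's `==` and `!=` operators with NaN operands.

[source,c]
----------
#include <math.h>
#include <stddef.h>

/* Scalar isequal: 1 if x == y, else 0. C's == is false if either operand is NaN. */
int isequal_model(float x, float y)
{
    return (x == y) ? 1 : 0;
}

/* Scalar isnotequal: 1 if x != y. C's != is true if either operand is NaN. */
int isnotequal_model(float x, float y)
{
    return (x != y) ? 1 : 0;
}

/* Scalar isunordered: 1 if x or y is a NaN, else 0. */
int isunordered_model(float x, float y)
{
    return (isnan(x) || isnan(y)) ? 1 : 0;
}

/* Vector isequal over n lanes: -1 (all bits set) where true, 0 where false. */
void isequal_vec_model(const float *x, const float *y, int *result, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        result[i] = (x[i] == y[i]) ? -1 : 0;
}
----------

Note that a lane holding NaN in both inputs still compares unequal, so its *isequal* result is 0.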
| |
| [[table-builtin-relational]] |
| .Built-in Scalar and Vector Relational Functions |
| [cols=",",options="header",] |
| |==== |
| | Function | Description |
| | int *isequal*(float _x_, float _y_) + |
| int__n__ *isequal*(float__n__ _x_, float__n__ _y_) + |
| int *isequal*(double _x_, double _y_) + |
| long__n__ *isequal*(double__n__ _x_, double__n__ _y_) |
| |
| ifdef::cl_khr_fp16[] |
| int *isequal*(half _x_, half _y_) + |
| short__n__ *isequal*(half__n__ _x_, half__n__ _y_) |
| endif::cl_khr_fp16[] |
| | Returns the component-wise compare of _x_ == _y_. |
| | int *isnotequal*(float _x_, float _y_) + |
| int__n__ *isnotequal*(float__n__ _x_, float__n__ _y_) + |
| int *isnotequal*(double _x_, double _y_) + |
| long__n__ *isnotequal*(double__n__ _x_, double__n__ _y_) |
| |
| ifdef::cl_khr_fp16[] |
| int *isnotequal*(half _x_, half _y_) + |
| short__n__ *isnotequal*(half__n__ _x_, half__n__ _y_) |
| endif::cl_khr_fp16[] |
| | Returns the component-wise compare of _x_ != _y_. |
| | int *isgreater*(float _x_, float _y_) + |
| int__n__ *isgreater*(float__n__ _x_, float__n__ _y_) + |
| int *isgreater*(double _x_, double _y_) + |
| long__n__ *isgreater*(double__n__ _x_, double__n__ _y_) |
| |
| ifdef::cl_khr_fp16[] |
| int *isgreater*(half _x_, half _y_) + |
| short__n__ *isgreater*(half__n__ _x_, half__n__ _y_) |
| endif::cl_khr_fp16[] |
| | Returns the component-wise compare of _x_ > _y_. |
| | int *isgreaterequal*(float _x_, float _y_) + |
| int__n__ *isgreaterequal*(float__n__ _x_, float__n__ _y_) + |
| int *isgreaterequal*(double _x_, double _y_) + |
| long__n__ *isgreaterequal*(double__n__ _x_, double__n__ _y_) |
| |
| ifdef::cl_khr_fp16[] |
| int *isgreaterequal*(half _x_, half _y_) + |
| short__n__ *isgreaterequal*(half__n__ _x_, half__n__ _y_) |
| endif::cl_khr_fp16[] |
| | Returns the component-wise compare of _x_ >= _y_. |
| | int *isless*(float _x_, float _y_) + |
| int__n__ *isless*(float__n__ _x_, float__n__ _y_) + |
| int *isless*(double _x_, double _y_) + |
| long__n__ *isless*(double__n__ _x_, double__n__ _y_) |
| |
| ifdef::cl_khr_fp16[] |
| int *isless*(half _x_, half _y_) + |
| short__n__ *isless*(half__n__ _x_, half__n__ _y_) |
| endif::cl_khr_fp16[] |
| | Returns the component-wise compare of _x_ < _y_. |
| | int *islessequal*(float _x_, float _y_) + |
| int__n__ *islessequal*(float__n__ _x_, float__n__ _y_) + |
| int *islessequal*(double _x_, double _y_) + |
| long__n__ *islessequal*(double__n__ _x_, double__n__ _y_) |
| |
| ifdef::cl_khr_fp16[] |
| int *islessequal*(half _x_, half _y_) + |
| short__n__ *islessequal*(half__n__ _x_, half__n__ _y_) |
| endif::cl_khr_fp16[] |
| | Returns the component-wise compare of _x_ \<= _y_. |
| | int *islessgreater*(float _x_, float _y_) + |
| int__n__ *islessgreater*(float__n__ _x_, float__n__ _y_) + |
| int *islessgreater*(double _x_, double _y_) + |
| long__n__ *islessgreater*(double__n__ _x_, double__n__ _y_) |
| |
| ifdef::cl_khr_fp16[] |
| int *islessgreater*(half _x_, half _y_) + |
| short__n__ *islessgreater*(half__n__ _x_, half__n__ _y_) |
| endif::cl_khr_fp16[] |
| Returns the component-wise compare of (_x_ < _y_) \|\| (_x_ > _y_).
| | | |
| | int *isfinite*(float) + |
| int__n__ *isfinite*(float__n__) + |
| int *isfinite*(double) + |
| long__n__ *isfinite*(double__n__) |
| |
| ifdef::cl_khr_fp16[] |
| int *isfinite*(half) + |
| short__n__ *isfinite*(half__n__) |
| endif::cl_khr_fp16[] |
| Test for a finite value.
| | int *isinf*(float) + |
| int__n__ *isinf*(float__n__) + |
| int *isinf*(double) + |
| long__n__ *isinf*(double__n__) |
| |
| ifdef::cl_khr_fp16[] |
| int *isinf*(half) + |
| short__n__ *isinf*(half__n__) |
| endif::cl_khr_fp16[] |
| Test for an infinite value (positive or negative).
| | int *isnan*(float) + |
| int__n__ *isnan*(float__n__) + |
| int *isnan*(double) + |
| long__n__ *isnan*(double__n__) |
| |
| ifdef::cl_khr_fp16[] |
| int *isnan*(half) + |
| short__n__ *isnan*(half__n__) |
| endif::cl_khr_fp16[] |
| | Test for a NaN. |
| | int *isnormal*(float) + |
| int__n__ *isnormal*(float__n__) + |
| int *isnormal*(double) + |
| long__n__ *isnormal*(double__n__) |
| |
| ifdef::cl_khr_fp16[] |
| int *isnormal*(half) + |
| short__n__ *isnormal*(half__n__) |
| endif::cl_khr_fp16[] |
| | Test for a normal value. |
| | int *isordered*(float _x_, float _y_) + |
| int__n__ *isordered*(float__n__ _x_, float__n__ _y_) + |
| int *isordered*(double _x_, double _y_) + |
| long__n__ *isordered*(double__n__ _x_, double__n__ _y_) |
| |
| ifdef::cl_khr_fp16[] |
| int *isordered*(half _x_, half _y_) + |
| short__n__ *isordered*(half__n__ _x_, half__n__ _y_) |
| endif::cl_khr_fp16[] |
| | Test if arguments are ordered. |
| *isordered*() takes arguments _x_ and _y_, and returns the result |
| *isequal*(_x_, _x_) && *isequal*(_y_, _y_). |
| | int *isunordered*(float _x_, float _y_) + |
| int__n__ *isunordered*(float__n__ _x_, float__n__ _y_) + |
| int *isunordered*(double _x_, double _y_) + |
| long__n__ *isunordered*(double__n__ _x_, double__n__ _y_) |
| |
| ifdef::cl_khr_fp16[] |
| int *isunordered*(half _x_, half _y_) + |
| short__n__ *isunordered*(half__n__ _x_, half__n__ _y_) |
| endif::cl_khr_fp16[] |
| | Test if arguments are unordered. |
| *isunordered*() takes arguments _x_ and _y_, returning non-zero if _x_ |
| or _y_ is NaN, and zero otherwise. |
| | int *signbit*(float _x_) + |
| int__n__ *signbit*(float__n__ _x_) + |
| int *signbit*(double _x_) + |
| long__n__ *signbit*(double__n__ _x_) |
| |
| ifdef::cl_khr_fp16[] |
| int *signbit*(half _x_) + |
| short__n__ *signbit*(half__n__ _x_) |
| endif::cl_khr_fp16[] |
| | Test for sign bit. |
| The scalar version of the function returns a 1 if the sign bit in _x_ |
| is set else returns 0. |
The vector version of the function returns the following for each
component in _x_: -1 (i.e. all bits set) if the sign bit of that component
is set, and 0 otherwise.
| | | |
| | int *any*(igentype _x_) |
| |
| Scalar inputs to *any* are <<unified-spec, deprecated by>> OpenCL C version |
| 3.0. |
| Returns 1 if the most significant bit of _x_ (for scalar inputs), or of
any component of _x_ (for vector inputs), is set; otherwise returns 0.
| | int *all*(igentype _x_) |
| |
| Scalar inputs to *all* are <<unified-spec, deprecated by>> OpenCL C version |
| 3.0. |
| Returns 1 if the most significant bit of _x_ (for scalar inputs), or of
every component of _x_ (for vector inputs), is set; otherwise returns 0.
| | | |
| | gentype *bitselect*(gentype _a_, gentype _b_, gentype _c_) |
| | Each bit of the result is the corresponding bit of _a_ if the |
| corresponding bit of _c_ is 0. |
| Otherwise it is the corresponding bit of _b_. |
| | gentype **select**(gentype _a_, gentype _b_, igentype _c_) + |
| gentype **select**(gentype _a_, gentype _b_, ugentype _c_) |
| | For each component of a vector type, |
| |
_result[i]_ = (MSB of _c[i]_ is set) ? _b[i]_ : _a[i]_.
| |
| For a scalar type, _result_ = _c_ ? _b_ : _a_. |
| |
| `igentype` and `ugentype` must have the same number of elements and |
| bits as `gentype` footnote:[{fn-select-vs-ternary}]. |
| |==== |
| -- |
| |
| |
| [[vector-data-load-and-store-functions]] |
| === Vector Data Load and Store Functions |
| |
| [open,refpage='vectorDataLoadandStoreFunctions',desc='Vector Data Load and Store Functions',type='freeform',spec='clang',anchor='vector-data-load-and-store-functions',xrefs='',alias='vloadn vload_half vload_halfn vloada_halfn vstoren vstore_half vstore_halfn vstorea_halfn'] |
| -- |
| |
The <<table-vector-loadstore, Built-in Vector Data Load and Store
Functions>> table lists the supported functions for reading vector types
from memory, and writing them to memory, through a pointer.
| |
| The generic type name `gentype` indicates that the function can take any of |
| |
| * `char`, `uchar`, `short`, `ushort`, `int`, `uint`, `long` |
| footnote:[{fn-int64-supported}] or `ulong` |
| * `float` or `double` footnote:double-supported[{fn-double-supported}] |
ifdef::cl_khr_fp16[]
* `half` footnote:[{fn-half-supported}]
endif::cl_khr_fp16[]

as the type for the arguments.

ifdef::cl_khr_fp16[]
NOTE: All functions taking or returning `half` types are supported only when
the `<<cl_khr_fp16>>` extension macro is supported.
endif::cl_khr_fp16[]
| |
| The generic type name `gentype__n__` indicates an _n_-element vector of |
| `gentype` elements. |
| |
| The generic type name `half__n__` indicates an _n_-element vector of `half` |
| elements. |
| |
The suffix _n_ is also used in the function names (e.g. *vload__n__*,
*vstore__n__*), where _n_ = 2, 3 footnote:[{fn-vec3-vload-vstore}], 4,
8 or 16.
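
The address computation shared by *vload__n__* and *vstore__n__* can be modeled on the host as follows. This is a plain C99 sketch with hypothetical names, using `float` elements and an explicit _n_ parameter. It shows that _offset_ counts whole _n_-element groups, so a 3-element load with _offset_ 2 starts at element 6 of a tightly packed array. It does not model the alignment requirements, or the fourth element of a 3-component result.

[source,c]
----------
#include <stddef.h>

/* Models vloadn: copies n elements starting at p + offset * n into out. */
void vloadn_model(size_t n, size_t offset, const float *p, float *out)
{
    const float *src = p + offset * n;   /* offset is in units of n elements */
    for (size_t i = 0; i < n; ++i)
        out[i] = src[i];
}

/* Models vstoren: copies n elements of data to p + offset * n. */
void vstoren_model(size_t n, const float *data, size_t offset, float *p)
{
    float *dst = p + offset * n;         /* same address rule as the load */
    for (size_t i = 0; i < n; ++i)
        dst[i] = data[i];
}
----------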
| |
| [[table-vector-loadstore]] |
| .Built-in Vector Data Load and Store Functions |
| [cols="7,3",options="header",] |
| |==== |
| | Function | Description |
| | gentype__n__ **vload__n__**(size_t _offset_, const {global} gentype *_p_) + |
| gentype__n__ **vload__n__**(size_t _offset_, const {local} gentype *_p_) + |
| gentype__n__ **vload__n__**(size_t _offset_, const {constant} gentype *_p_) + |
| gentype__n__ **vload__n__**(size_t _offset_, const {private} gentype *_p_) |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 or newer with the |
| {opencl_c_generic_address_space} feature: |
| |
| gentype__n__ **vload__n__**(size_t _offset_, const gentype *_p_) |
| | Return `sizeof(gentype__n__)` bytes of data, where the first `(__n__ * |
| sizeof(gentype))` bytes are read from the address |
| computed as `(_p_ {plus} (_offset_ * _n_))`. |
| The computed address must be 8-bit aligned if `gentype` is `char` or |
| `uchar`; 16-bit aligned if `gentype` is |
| ifdef::cl_khr_fp16[`half`,] |
| `short` or `ushort`; 32-bit aligned if `gentype` is `int`, `uint`, or |
`float`; and 64-bit aligned if `gentype` is `long`, `ulong`, or `double`.
| | void **vstore__n__**(gentype__n__ _data_, size_t _offset_, {global} gentype *_p_) + |
| void **vstore__n__**(gentype__n__ _data_, size_t _offset_, {local} gentype *_p_) + |
| void **vstore__n__**(gentype__n__ _data_, size_t _offset_, {private} gentype *_p_) |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 or newer with the |
| {opencl_c_generic_address_space} feature: |
| |
| void **vstore__n__**(gentype__n__ _data_, size_t _offset_, gentype *_p_) |
| | Write `_n_ * sizeof(gentype)` bytes given by _data_ to the address |
| computed as `(_p_ {plus} (_offset_ * _n_))`. |
| The computed address must be 8-bit aligned if `gentype` is `char` or |
| `uchar`; 16-bit aligned if `gentype` is |
| ifdef::cl_khr_fp16[`half`,] |
| `short` or `ushort`; 32-bit aligned if `gentype` is `int`, `uint`, or |
`float`; and 64-bit aligned if `gentype` is `long`, `ulong`, or `double`.
| | float **vload_half**(size_t _offset_, const {global} half *_p_) + |
| float **vload_half**(size_t _offset_, const {local} half *_p_) + |
| float **vload_half**(size_t _offset_, const {constant} half *_p_) + |
| float **vload_half**(size_t _offset_, const {private} half *_p_) |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 or newer with the |
| {opencl_c_generic_address_space} feature: |
| |
| float **vload_half**(size_t _offset_, const half *_p_) |
| | Read `sizeof(half)` bytes of data from the address computed as `(_p_ |
| {plus} _offset_)`. |
| The data read is interpreted as a `half` value. |
| The `half` value is converted to a `float` value and the `float` value |
| is returned. |
| The computed read address must be 16-bit aligned. |
| | float__n__ **vload_half__n__**(size_t _offset_, const {global} half *_p_) + |
| float__n__ **vload_half__n__**(size_t _offset_, const {local} half *_p_) + |
| float__n__ **vload_half__n__**(size_t _offset_, const {constant} half *_p_) + |
| float__n__ **vload_half__n__**(size_t _offset_, const {private} half *_p_) |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 or newer with the |
| {opencl_c_generic_address_space} feature: |
| |
| float__n__ **vload_half__n__**(size_t _offset_, const half *_p_) |
| Read `(_n_ * sizeof(half))` bytes of data from the address computed as
`(_p_ {plus} (_offset_ * _n_))`.
| The data read is interpreted as a `half__n__` value. |
| The `half__n__` value read is converted to a `float__n__` value and |
| the `float__n__` value is returned. |
| The computed read address must be 16-bit aligned. |
| | void **vstore_half**(float _data_, size_t _offset_, {global} half *_p_) + |
| void **vstore_half{rte}**(float _data_, size_t _offset_, {global} half *_p_) + |
| void **vstore_half{rtz}**(float _data_, size_t _offset_, {global} half *_p_) + |
| void **vstore_half{rtp}**(float _data_, size_t _offset_, {global} half *_p_) + |
| void **vstore_half{rtn}**(float _data_, size_t _offset_, {global} half *_p_) |
| |
| void **vstore_half**(float _data_, size_t _offset_, {local} half *_p_) + |
| void **vstore_half{rte}**(float _data_, size_t _offset_, {local} half *_p_) + |
| void **vstore_half{rtz}**(float _data_, size_t _offset_, {local} half *_p_) + |
| void **vstore_half{rtp}**(float _data_, size_t _offset_, {local} half *_p_) + |
| void **vstore_half{rtn}**(float _data_, size_t _offset_, {local} half *_p_) |
| |
| void **vstore_half**(float _data_, size_t _offset_, {private} half *_p_) + |
| void **vstore_half{rte}**(float _data_, size_t _offset_, {private} half *_p_) + |
| void **vstore_half{rtz}**(float _data_, size_t _offset_, {private} half *_p_) + |
| void **vstore_half{rtp}**(float _data_, size_t _offset_, {private} half *_p_) + |
| void **vstore_half{rtn}**(float _data_, size_t _offset_, {private} half *_p_) |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 or newer with the |
| {opencl_c_generic_address_space} feature: |
| |
| void **vstore_half**(float _data_, size_t _offset_, half *_p_) + |
| void **vstore_half{rte}**(float _data_, size_t _offset_, half *_p_) + |
| void **vstore_half{rtz}**(float _data_, size_t _offset_, half *_p_) + |
| void **vstore_half{rtp}**(float _data_, size_t _offset_, half *_p_) + |
| void **vstore_half{rtn}**(float _data_, size_t _offset_, half *_p_) |
| | The `float` value given by _data_ is first converted to a `half` value |
| using the appropriate rounding mode. |
| The `half` value is then written to the address computed as `(_p_ |
| {plus} _offset_)`. |
| The computed address must be 16-bit aligned. |
| |
| *vstore_half* uses the default rounding mode. |
| The default rounding mode is round to nearest even. |
| | void **vstore_half__n__**(float__n__ _data_, size_t _offset_, {global} half *_p_) + |
| void **vstore_half__n__{rte}**(float__n__ _data_, size_t _offset_, {global} half *_p_) + |
| void **vstore_half__n__{rtz}**(float__n__ _data_, size_t _offset_, {global} half *_p_) + |
| void **vstore_half__n__{rtp}**(float__n__ _data_, size_t _offset_, {global} half *_p_) + |
| void **vstore_half__n__{rtn}**(float__n__ _data_, size_t _offset_, {global} half *_p_) |
| |
| void **vstore_half__n__**(float__n__ _data_, size_t _offset_, {local} half *_p_) + |
| void **vstore_half__n__{rte}**(float__n__ _data_, size_t _offset_, {local} half *_p_) + |
| void **vstore_half__n__{rtz}**(float__n__ _data_, size_t _offset_, {local} half *_p_) + |
| void **vstore_half__n__{rtp}**(float__n__ _data_, size_t _offset_, {local} half *_p_) + |
| void **vstore_half__n__{rtn}**(float__n__ _data_, size_t _offset_, {local} half *_p_) |
| |
| void **vstore_half__n__**(float__n__ _data_, size_t _offset_, {private} half *_p_) + |
| void **vstore_half__n__{rte}**(float__n__ _data_, size_t _offset_, {private} half *_p_) + |
| void **vstore_half__n__{rtz}**(float__n__ _data_, size_t _offset_, {private} half *_p_) + |
| void **vstore_half__n__{rtp}**(float__n__ _data_, size_t _offset_, {private} half *_p_) + |
| void **vstore_half__n__{rtn}**(float__n__ _data_, size_t _offset_, {private} half *_p_) |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 or newer with the |
| {opencl_c_generic_address_space} feature: |
| |
| void **vstore_half__n__**(float__n__ _data_, size_t _offset_, half *_p_) + |
| void **vstore_half__n__{rte}**(float__n__ _data_, size_t _offset_, half *_p_) + |
| void **vstore_half__n__{rtz}**(float__n__ _data_, size_t _offset_, half *_p_) + |
| void **vstore_half__n__{rtp}**(float__n__ _data_, size_t _offset_, half *_p_) + |
| void **vstore_half__n__{rtn}**(float__n__ _data_, size_t _offset_, half *_p_) |
| | The `float__n__` value given by _data_ is converted to a `half__n__` |
| value using the appropriate rounding mode. |
| `_n_ * sizeof(half)` bytes from the `half__n__` value are then written to |
| the address computed as `(_p_ |
| {plus} (_offset_ * _n_))`. |
| The computed address must be 16-bit aligned. |
| |
| *vstore_half__n__* uses the default rounding mode. |
| The default rounding mode is round to nearest even. |
| | void **vstore_half**(double _data_, size_t _offset_, {global} half *_p_) + |
| void **vstore_half{rte}**(double _data_, size_t _offset_, {global} half *_p_) + |
| void **vstore_half{rtz}**(double _data_, size_t _offset_, {global} half *_p_) + |
| void **vstore_half{rtp}**(double _data_, size_t _offset_, {global} half *_p_) + |
| void **vstore_half{rtn}**(double _data_, size_t _offset_, {global} half *_p_) |
| |
| void **vstore_half**(double _data_, size_t _offset_, {local} half *_p_) + |
| void **vstore_half{rte}**(double _data_, size_t _offset_, {local} half *_p_) + |
| void **vstore_half{rtz}**(double _data_, size_t _offset_, {local} half *_p_) + |
| void **vstore_half{rtp}**(double _data_, size_t _offset_, {local} half *_p_) + |
| void **vstore_half{rtn}**(double _data_, size_t _offset_, {local} half *_p_) |
| |
| void **vstore_half**(double _data_, size_t _offset_, {private} half *_p_) + |
| void **vstore_half{rte}**(double _data_, size_t _offset_, {private} half *_p_) + |
| void **vstore_half{rtz}**(double _data_, size_t _offset_, {private} half *_p_) + |
| void **vstore_half{rtp}**(double _data_, size_t _offset_, {private} half *_p_) + |
| void **vstore_half{rtn}**(double _data_, size_t _offset_, {private} half *_p_) |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 or newer with the |
| {opencl_c_generic_address_space} feature: |
| |
| void **vstore_half**(double _data_, size_t _offset_, half *_p_) + |
| void **vstore_half{rte}**(double _data_, size_t _offset_, half *_p_) + |
| void **vstore_half{rtz}**(double _data_, size_t _offset_, half *_p_) + |
| void **vstore_half{rtp}**(double _data_, size_t _offset_, half *_p_) + |
| void **vstore_half{rtn}**(double _data_, size_t _offset_, half *_p_) |
| | The `double` value given by _data_ is first converted to a `half` |
| value using the appropriate rounding mode. |
| The `half` value is then written to the address computed as `(_p_ |
| {plus} _offset_)`. |
| The computed address must be 16-bit aligned. |
| |
| *vstore_half* uses the default rounding mode. |
| The default rounding mode is round to nearest even. |
| | void **vstore_half__n__**(double__n__ _data_, size_t _offset_, {global} half *_p_) + |
| void **vstore_half__n__{rte}**(double__n__ _data_, size_t _offset_, {global} half *_p_) + |
| void **vstore_half__n__{rtz}**(double__n__ _data_, size_t _offset_, {global} half *_p_) + |
| void **vstore_half__n__{rtp}**(double__n__ _data_, size_t _offset_, {global} half *_p_) + |
| void **vstore_half__n__{rtn}**(double__n__ _data_, size_t _offset_, {global} half *_p_) |
| |
| void **vstore_half__n__**(double__n__ _data_, size_t _offset_, {local} half *_p_) + |
| void **vstore_half__n__{rte}**(double__n__ _data_, size_t _offset_, {local} half *_p_) + |
| void **vstore_half__n__{rtz}**(double__n__ _data_, size_t _offset_, {local} half *_p_) + |
| void **vstore_half__n__{rtp}**(double__n__ _data_, size_t _offset_, {local} half *_p_) + |
| void **vstore_half__n__{rtn}**(double__n__ _data_, size_t _offset_, {local} half *_p_) |
| |
| void **vstore_half__n__**(double__n__ _data_, size_t _offset_, {private} half *_p_) + |
| void **vstore_half__n__{rte}**(double__n__ _data_, size_t _offset_, {private} half *_p_) + |
| void **vstore_half__n__{rtz}**(double__n__ _data_, size_t _offset_, {private} half *_p_) + |
| void **vstore_half__n__{rtp}**(double__n__ _data_, size_t _offset_, {private} half *_p_) + |
| void **vstore_half__n__{rtn}**(double__n__ _data_, size_t _offset_, {private} half *_p_) |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 or newer with the |
| {opencl_c_generic_address_space} feature: |
| |
| void **vstore_half__n__**(double__n__ _data_, size_t _offset_, half *_p_) + |
| void **vstore_half__n__{rte}**(double__n__ _data_, size_t _offset_, half *_p_) + |
| void **vstore_half__n__{rtz}**(double__n__ _data_, size_t _offset_, half *_p_) + |
| void **vstore_half__n__{rtp}**(double__n__ _data_, size_t _offset_, half *_p_) + |
| void **vstore_half__n__{rtn}**(double__n__ _data_, size_t _offset_, half *_p_) |
| | The `double__n__` value given by _data_ is converted to a `half__n__` |
| value using the appropriate rounding mode. |
| `_n_ * sizeof(half)` bytes from the `half__n__` value are then written to |
| the address computed as `(_p_ {plus} (_offset_ * _n_))`. |
| The computed address must be 16-bit aligned. |
| |
| *vstore_half__n__* uses the default rounding mode. |
| The default rounding mode is round to nearest even. |
| | float__n__ **vloada_half__n__**(size_t _offset_, const {global} half *_p_) + |
| float__n__ **vloada_half__n__**(size_t _offset_, const {local} half *_p_) + |
| float__n__ **vloada_half__n__**(size_t _offset_, const {constant} half *_p_) + |
| float__n__ **vloada_half__n__**(size_t _offset_, const {private} half *_p_) |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 or newer with the |
| {opencl_c_generic_address_space} feature: |
| |
| float__n__ **vloada_half__n__**(size_t _offset_, const half *_p_) |
| For n = 2, 4, 8 and 16, read `sizeof(half__n__)` bytes of data from
the address computed as `(_p_ {plus} (_offset_ * _n_))`.
| The data read is interpreted as a `half__n__` value. |
| The `half__n__` value read is converted to a `float__n__` value and |
| the `float__n__` value is returned. |
| The computed address must be aligned to `sizeof(half__n__)` bytes. |
| |
For n = 3, *vloada_half3* reads a `half3` from the address computed as
`(_p_ {plus} (_offset_ * 4))` and returns a `float3`.
The computed address must be aligned to `sizeof(half) * 4` bytes.
| |
| | void **vstorea_half__n__**(float__n__ _data_, size_t _offset_, {global} half *_p_) + |
| void **vstorea_half__n__{rte}**(float__n__ _data_, size_t _offset_, {global} half *_p_) + |
| void **vstorea_half__n__{rtz}**(float__n__ _data_, size_t _offset_, {global} half *_p_) + |
| void **vstorea_half__n__{rtp}**(float__n__ _data_, size_t _offset_, {global} half *_p_) + |
| void **vstorea_half__n__{rtn}**(float__n__ _data_, size_t _offset_, {global} half *_p_) |
| |
| void **vstorea_half__n__**(float__n__ _data_, size_t _offset_, {local} half *_p_) + |
| void **vstorea_half__n__{rte}**(float__n__ _data_, size_t _offset_, {local} half *_p_) + |
| void **vstorea_half__n__{rtz}**(float__n__ _data_, size_t _offset_, {local} half *_p_) + |
| void **vstorea_half__n__{rtp}**(float__n__ _data_, size_t _offset_, {local} half *_p_) + |
| void **vstorea_half__n__{rtn}**(float__n__ _data_, size_t _offset_, {local} half *_p_) |
| |
| void **vstorea_half__n__**(float__n__ _data_, size_t _offset_, {private} half *_p_) + |
| void **vstorea_half__n__{rte}**(float__n__ _data_, size_t _offset_, {private} half *_p_) + |
| void **vstorea_half__n__{rtz}**(float__n__ _data_, size_t _offset_, {private} half *_p_) + |
| void **vstorea_half__n__{rtp}**(float__n__ _data_, size_t _offset_, {private} half *_p_) + |
| void **vstorea_half__n__{rtn}**(float__n__ _data_, size_t _offset_, {private} half *_p_) |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 or newer with the |
| {opencl_c_generic_address_space} feature: |
| |
| void **vstorea_half__n__**(float__n__ _data_, size_t _offset_, half *_p_) + |
| void **vstorea_half__n__{rte}**(float__n__ _data_, size_t _offset_, half *_p_) + |
| void **vstorea_half__n__{rtz}**(float__n__ _data_, size_t _offset_, half *_p_) + |
| void **vstorea_half__n__{rtp}**(float__n__ _data_, size_t _offset_, half *_p_) + |
| void **vstorea_half__n__{rtn}**(float__n__ _data_, size_t _offset_, half *_p_) |
| | The `float__n__` value given by _data_ is converted to a `half__n__` |
| value using the appropriate rounding mode. |
| |
| For n = 2, 4, 8 and 16, the `half__n__` value is written to the |
| address computed as `(_p_ {plus} (_offset_ * _n_))`. |
| The computed address must be aligned to `sizeof(half__n__)` bytes. |
| |
| For n = 3, the `half3` value is written |
| to the address computed as `(_p_ {plus} (_offset_ * 4))`. |
| The computed address must be aligned to `sizeof(half) * 4` bytes. |
| |
| *vstorea_half__n__* uses the default rounding mode. |
| The default rounding mode is round to nearest even. |
| | void **vstorea_half__n__**(double__n__ _data_, size_t _offset_, {global} half *_p_) + |
| void **vstorea_half__n__{rte}**(double__n__ _data_, size_t _offset_, {global} half *_p_) + |
| void **vstorea_half__n__{rtz}**(double__n__ _data_, size_t _offset_, {global} half *_p_) + |
| void **vstorea_half__n__{rtp}**(double__n__ _data_, size_t _offset_, {global} half *_p_) + |
| void **vstorea_half__n__{rtn}**(double__n__ _data_, size_t _offset_, {global} half *_p_) |
| |
| void **vstorea_half__n__**(double__n__ _data_, size_t _offset_, {local} half *_p_) + |
| void **vstorea_half__n__{rte}**(double__n__ _data_, size_t _offset_, {local} half *_p_) + |
| void **vstorea_half__n__{rtz}**(double__n__ _data_, size_t _offset_, {local} half *_p_) + |
| void **vstorea_half__n__{rtp}**(double__n__ _data_, size_t _offset_, {local} half *_p_) + |
| void **vstorea_half__n__{rtn}**(double__n__ _data_, size_t _offset_, {local} half *_p_) |
| |
| void **vstorea_half__n__**(double__n__ _data_, size_t _offset_, {private} half *_p_) + |
| void **vstorea_half__n__{rte}**(double__n__ _data_, size_t _offset_, {private} half *_p_) + |
| void **vstorea_half__n__{rtz}**(double__n__ _data_, size_t _offset_, {private} half *_p_) + |
| void **vstorea_half__n__{rtp}**(double__n__ _data_, size_t _offset_, {private} half *_p_) + |
| void **vstorea_half__n__{rtn}**(double__n__ _data_, size_t _offset_, {private} half *_p_) |
| |
| For OpenCL C 2.0, or OpenCL C 3.0 or newer with the |
| {opencl_c_generic_address_space} feature: |
| |
| void **vstorea_half__n__**(double__n__ _data_, size_t _offset_, half *_p_) + |
| void **vstorea_half__n__{rte}**(double__n__ _data_, size_t _offset_, half *_p_) + |
| void **vstorea_half__n__{rtz}**(double__n__ _data_, size_t _offset_, half *_p_) + |
| void **vstorea_half__n__{rtp}**(double__n__ _data_, size_t _offset_, half *_p_) + |
| void **vstorea_half__n__{rtn}**(double__n__ _data_, size_t _offset_, half *_p_) |
| | The `double__n__` value given by _data_ is converted to a `half__n__` |
| value using the appropriate rounding mode. |
| |
| For n = 2, 4, 8 or 16, the `half__n__` value is written to the address |
| computed as `(_p_ {plus} (_offset_ * _n_))`. |
| The computed address must be aligned to `sizeof(half__n__)` bytes. |
| |
| For n = 3, the `half3` value is written |
| to the address computed as `(_p_ {plus} (_offset_ * 4))`. |
| The computed address must be aligned to `sizeof(half) * 4` bytes. |
| |
| *vstorea_half__n__* uses the default rounding mode. |
| The default rounding mode is round to nearest even. |
| |==== |
| |
| The results of vector data load and store functions are undefined if the |
| address being read from or written to is not correctly aligned as described |
| in <<table-vector-loadstore,Built-in Vector Data Load and Store Functions>>. |
| For the store functions described in <<table-vector-loadstore,Built-in Vector |
| Data Load and Store Functions>>, the pointer argument _p_ can be a pointer to |
| `global`, `local`, or `private` memory. |
| For the load functions described in that table, the pointer argument _p_ can |
| be a pointer to `global`, `local`, `constant`, or `private` memory. |
| |
| [NOTE] |
| ==== |
| Variants of the vector data load and store functions that take pointer |
| arguments to the generic address space are also supported. |
| ==== |
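| |
| The address and alignment rules for *vstorea_half__n__* can be summarized in a small host-side C reference model. |
| This is a hedged sketch and not part of OpenCL C. |
| The helper names `vstorea_half_slot`, `vstorea_half_elem_index`, and `vstorea_half_alignment` are hypothetical. |
| The model covers only the address arithmetic, not the conversion to `half`, and it takes `sizeof(half)` to be 2 bytes. |
| |
| [source,c] |
| ---- |
| #include <stddef.h> |
| |
| /* Hypothetical helpers modeling the vstorea_half<n> address rule. |
|  * A half3 store occupies a 4-element slot, like a half4. */ |
| static size_t vstorea_half_slot(unsigned n) |
| { |
|     return (n == 3) ? 4 : n; |
| } |
| |
| /* Element index, counted in half values from p, of the first value written. */ |
| static size_t vstorea_half_elem_index(unsigned n, size_t offset) |
| { |
|     return offset * vstorea_half_slot(n); |
| } |
| |
| /* Required alignment of the computed address in bytes (sizeof(half) == 2). */ |
| static size_t vstorea_half_alignment(unsigned n) |
| { |
|     return 2 * vstorea_half_slot(n); |
| } |
| ---- |
| |
| For example, with _offset_ = 2, a `half3` store starts 8 `half` values past _p_ and needs 8-byte alignment. |
| A `half16` store with _offset_ = 1 starts 16 values past _p_ and needs 32-byte alignment. |
| |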
| -- |
| |
| |
| [[synchronization-functions]] |
| === Synchronization Functions |
| |
| [open,refpage='syncFunctions',desc='Synchronization Functions',type='freeform',spec='clang',anchor='synchronization-functions',alias='barrier work_group_barrier'] |
| -- |
| The following table describes built-in functions to synchronize the work-items |
| in a work-group. |
| |
| // Editors note: The table column widths are chosen so the description |
| // of these functions fits onto a single page. This avoids an error |
| // from optimize-pdf and preserves the entire contents of the table. |
| // If an alternative solution to this issue is discovered then the |
| // table widths can be re-adjusted. |
| |
| [[table-builtin-synchronization]] |
| .Built-in Work-group Synchronization Functions |
| [cols="3,7",options="header",] |
| |==== |
| | Function | Description |
| |
| | void *barrier*( + |
| cl_mem_fence_flags _flags_) |
| |
| For OpenCL C 2.0 or newer, as an alias for *barrier*: |
| |
| void *work_group_barrier*( + |
| cl_mem_fence_flags _flags_) |
| |
| void *work_group_barrier*( + |
| cl_mem_fence_flags _flags_, + |
| memory_scope _scope_) |
| | For these functions, if any work-item in a work-group encounters a |
| barrier, the barrier must be encountered by all work-items in the |
| work-group before any are allowed to continue execution beyond the |
| barrier. |
| |
| If the barrier is inside a conditional statement, then all |
| work-items in the work-group must enter the conditional if any work-item in the work-group enters the |
| conditional statement and executes the barrier. |
| |
| If the barrier is inside a loop, then all work-items in the work-group must execute |
| the barrier on each iteration of the loop if any work-item executes the barrier on that iteration. |
| |
| The *barrier* and *work_group_barrier* functions can specify which |
| memory operations become visible to the appropriate memory scope |
| identified by _scope_ footnote:[{fn-memory-scope-restrictions}]. |
| The _flags_ argument specifies the memory address spaces. |
| This is a bitfield and can be set to 0 or a combination of the |
| following values OR'ed together. |
| When these flags are OR'ed together, the barrier acts as a |
| combined barrier for all address spaces specified by the flags, |
| ordering memory accesses both within and across the specified address |
| spaces. |
| For *barrier* and the *work_group_barrier* variant that does not take a |
| memory scope, the _scope_ is `memory_scope_work_group`. |
| |
| `CLK_LOCAL_MEM_FENCE` - ensure |
| that all `local` memory accesses become visible to all work-items in the |
| work-group. |
| Note that the value of _scope_ is ignored, as the memory scope is |
| always `memory_scope_work_group`. |
| |
| `CLK_GLOBAL_MEM_FENCE` - ensure that |
| all `global` memory accesses become visible to the appropriate memory scope |
| as given by _scope_. |
| |
| `CLK_IMAGE_MEM_FENCE` - ensure that all image memory accesses |
| become visible to the appropriate scope given by _scope_. |
| The value of _scope_ must be `memory_scope_work_group`. |
| |
| The values of _flags_ and _scope_ must be the same for all work-items |
| in the work-group. |
| |==== |
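| |
| The barrier rules above can be pictured with a sequential host-side C model of a work-group tree reduction in `local` memory. |
| This is an illustrative sketch under stated assumptions, not a description of how implementations schedule work-items. |
| `model_group_reduce`, `GROUP_SIZE`, and the `scratch` array standing in for a `local` buffer are all hypothetical. |
| Each inner loop runs, for every work-item in turn, the code that lies between two barriers. |
| Completing an inner loop before starting the next one therefore models what *barrier*(`CLK_LOCAL_MEM_FENCE`) guarantees. |
| |
| [source,c] |
| ---- |
| #include <stddef.h> |
| |
| #define GROUP_SIZE 8  /* hypothetical work-group size; must be a power of two */ |
| |
| /* Sequential model of a kernel that sums GROUP_SIZE values in local memory. |
|  * scratch[] stands in for a __local array; lid plays the role of get_local_id(0). */ |
| static int model_group_reduce(const int in[GROUP_SIZE]) |
| { |
|     int scratch[GROUP_SIZE]; |
| |
|     /* Phase 0: each work-item copies its input into local memory. */ |
|     for (size_t lid = 0; lid < GROUP_SIZE; ++lid) |
|         scratch[lid] = in[lid]; |
|     /* barrier(CLK_LOCAL_MEM_FENCE): every write above is visible below. */ |
| |
|     /* The stride loop is uniform: every work-item runs the same iterations, |
|      * so every work-item reaches the barrier at the end of each iteration. */ |
|     for (size_t stride = GROUP_SIZE / 2; stride > 0; stride /= 2) { |
|         for (size_t lid = 0; lid < GROUP_SIZE; ++lid) { |
|             if (lid < stride)  /* only the add is conditional, not the barrier */ |
|                 scratch[lid] += scratch[lid + stride]; |
|         } |
|         /* barrier(CLK_LOCAL_MEM_FENCE) */ |
|     } |
|     return scratch[0];  /* total, as read by work-item 0 */ |
| } |
| ---- |
| |
| The stride loop's trip count depends only on the work-group size. |
| Every work-item therefore executes the barrier on every iteration, as required, and only the addition is guarded by `lid < stride`. |
| Within one phase, a work-item writes only indices below _stride_ and reads only indices at or above it. |
| For that reason, the order in which the model visits work-items inside a phase does not affect the result. |
| |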
| -- |
| |
| NOTE: The functionality described in the following table <<unified-spec, |
| requires>> support for |
| ifdef::cl_khr_subgroups[the `<<cl_khr_subgroups>>` extension macro; or for] |
| OpenCL C 3.0 or newer and the {opencl_c_subgroups} feature. |
| |
| The following table describes built-in functions to synchronize the work-items |
| in a sub-group. |
| |
| [[table-synchronization-functions]] |
| .Built-in Sub-Group Synchronization Functions |
| [cols="3,7",options="header",] |
| |==== |
| | Function | Description |
| |
| | void **sub_group_barrier**( + |
| cl_mem_fence_flags _flags_) |
| |
| void **sub_group_barrier**( + |
| cl_mem_fence_flags _flags_, + |
| memory_scope _scope_) |
| |
| | For these functions, if any work-item in a sub-group encounters a |
| *sub_group_barrier*, the barrier must be encountered by all work-items in the |
| sub-group before any are allowed to continue execution beyond the barrier. |
| |
| If *sub_group_barrier* is inside a conditional statement, then all |
| work-items within the sub-group must enter the conditional if any work-item in |
| the sub-group enters the conditional statement and executes the |
| *sub_group_barrier*. |
| |
| If the *sub_group_barrier* is inside a loop, then all work-items in the sub-group |
| must execute the barrier on each iteration of the loop if any work-item executes the barrier on that iteration. |
| |
| The *sub_group_barrier* function can specify which |
| memory operations become visible to the appropriate memory scope |
| identified by _scope_. |
| The _flags_ argument specifies the memory address spaces. |
| This is a bitfield and can be set to 0 or a combination of the |
| following values OR'ed together. |
| When these flags are OR'ed together, the barrier acts as a |
| combined barrier for all address spaces specified by the flags, |
| ordering memory accesses both within and across the specified address |
| spaces. |
| For the *sub_group_barrier* variant that does not take a |
| memory scope, the _scope_ is `memory_scope_sub_group`. |
| |
| `CLK_LOCAL_MEM_FENCE` - The *sub_group_barrier* function will either flush |
| any variables stored in local memory or queue a memory fence to ensure |
| correct ordering of memory operations to local memory. |
| |
| `CLK_GLOBAL_MEM_FENCE` - The *sub_group_barrier* function will queue a |
| memory fence to ensure correct ordering of memory operations to global |
| memory. |
| This can be useful when work-items, for example, write to buffer objects |
| and then want to read the updated data from these buffer objects. |
| |
| `CLK_IMAGE_MEM_FENCE` - The *sub_group_barrier* function will queue a memory |
| fence to ensure correct ordering of memory operations to image objects. |
| This can be useful when work-items, for example, write to image objects |
| and then want to read the updated data from these image objects. |
| |
| The value of _scope_ must meet the requirements described in the |
| <<atomic-restrictions,atomic restrictions section>>. |
| |
| |==== |
| |
| |
| [[legacy-mem-fence-functions]] |
| === Legacy Explicit Memory Fence Functions |
| |
| [open,refpage='legacyFenceFunctions',desc='Legacy Explicit Memory Fence Functions',type='freeform',spec='clang',anchor='legacy-mem-fence-functions',alias='mem_fence read_mem_fence write_mem_fence'] |
| -- |
| IMPORTANT: The memory fence functions described in this sub-section are |
| <<unified-spec, deprecated by>> OpenCL C 2.0. |
| |
| The OpenCL C programming language implements the following explicit memory fence functions to provide ordering between memory operations of a work-item. |
| |
| [[table-builtin-explicit-memory-fences]] |
| .Built-in Explicit Memory Fence Functions |
| [cols="3,7",options="header",] |
| |==== |
| | Function | Description |
| |
| | void *mem_fence*( + |
| cl_mem_fence_flags _flags_) |
| |
| | Orders loads and stores of a work-item executing a kernel. This means that |
| loads and stores preceding the *mem_fence* will be committed to memory |
| before any loads and stores following the *mem_fence*. |
| |
| The _flags_ argument specifies the memory address space and can be set to a |
| combination of the following literal values: |
| |
| `CLK_LOCAL_MEM_FENCE` + |
| `CLK_GLOBAL_MEM_FENCE` |
| |
| The value of _flags_ must be the same for all work-items in the work-group. |
| |
| | void *read_mem_fence*( + |
| cl_mem_fence_flags _flags_) |
| |
| | Read memory barrier that orders only loads. |
| |
| The _flags_ argument specifies the memory address space and can be set to a |
| combination of the following literal values: |
| |
| `CLK_LOCAL_MEM_FENCE` + |
| `CLK_GLOBAL_MEM_FENCE` |
| |
| The value of _flags_ must be the same for all work-items in the work-group. |
| |
| | void *write_mem_fence*( + |
| cl_mem_fence_flags _flags_) |
| |
| | Write memory barrier that orders only stores. |
| |
| The _flags_ argument specifies the memory address space and can be set to a |
| combination of the following literal values: |
| |
| `CLK_LOCAL_MEM_FENCE` + |
| `CLK_GLOBAL_MEM_FENCE` |
| |
| The value of _flags_ must be the same for all work-items in the work-group. |
| |
| |==== |
| -- |
| |
| |
| [[address-space-qualifier-functions]] |
| === Address Space Qualifier Functions |
| |
| [open,refpage='addressSpaceQualifierFuncs',desc='Address Space Qualifier Functions',type='freeform',spec='clang',anchor='address-space-qualifier-functions',alias='get_fence to_global to_local to_private'] |
| -- |
| |
| NOTE: The functionality described in this section <<unified-spec, requires>> |
| support for OpenCL C 2.0, or OpenCL C 3.0 or newer and the |
| {opencl_c_generic_address_space} feature. |
| |
| This section describes built-in functions to safely convert from pointers |
| to the generic address space to pointers to named address spaces, and to |
| query the appropriate fence flags for a pointer to the generic address space. |
| We use the generic type name `gentype` to indicate any of the built-in data |
| types supported by OpenCL C or a user-defined type. |
| |
| [[table-builtin-address-qualifier]] |
| .Built-in Address Space Qualifier Functions |
| [cols=",",options="header",] |
| |==== |
| | Function | Description |
| | global gentype * **to_global**(gentype *_ptr_) + |
| const global gentype * **to_global**(const gentype *_ptr_) |
| | Returns a pointer that points to a region in the `global` address |
| space if *to_global* can cast _ptr_ to the `global` address space. |
| Otherwise it returns `NULL`. |
| | local gentype * **to_local**(gentype *_ptr_) + |
| const local gentype * **to_local**(const gentype *_ptr_) |
| | Returns a pointer that points to a region in the `local` address space |
| if *to_local* can cast _ptr_ to the `local` address space. |
| Otherwise it returns `NULL`. |
| | private gentype * **to_private**(gentype *_ptr_) + |
| const private gentype * **to_private**(const gentype *_ptr_) |
| | Returns a pointer that points to a region in the `private` address |
| space if *to_private* can cast _ptr_ to the `private` address space. |
| Otherwise it returns `NULL`. |
| | cl_mem_fence_flags **get_fence**(gentype *_ptr_) + |
| cl_mem_fence_flags **get_fence**(const gentype *_ptr_) |
| | Returns a valid memory fence value for _ptr_. |
| |==== |
| -- |
| |
| |
| [[async-copies]] |
| === Async Copies From Global to Local Memory, Local to Global Memory, and Prefetch |
| |
| [open,refpage='asyncCopyFunctions',desc='Async Copy Functions',type='freeform',spec='clang',anchor='async-copies',xrefs='',alias='async_work_group_copy async_work_group_strided_copy prefetch async_work_group_copy_fence wait_group_events'] |
| -- |
| The OpenCL C programming language implements the <<table-builtin-async-copy, |
| following functions>> that provide asynchronous copies between `global` and |
| `local` memory and a prefetch from `global` memory. |
| |
| The async copy and wait group events functions are performed by all work-items |
| in a work-group and therefore must be encountered by all work-items in a |
| work-group executing the kernel with the same argument values, otherwise the |
| results are undefined. |
| This rule applies to ND-ranges implemented with uniform and non-uniform |
| work-groups. |
| |
| If an async copy or wait group events function is inside a conditional |
| statement, then all work-items in the work-group must enter the conditional if |
| any work-item in the work-group enters the conditional statement and executes |
| the async copy or wait group events function. |
| |
| If an async copy or wait group events function is inside a loop, then all |
| work-items in the work-group must execute the async copy or wait group events |
| function on each iteration of the loop if any work-item executes the async copy |
| or wait group events function on that iteration. |
| |
| The generic type name `gentype` indicates that the function can take any of |
| |
| * `char`, `char__n__`, `uchar`, or `uchar__n__` |
| * `short`, `short__n__`, `ushort`, or `ushort__n__` |
| * `int`, `int__n__`, `uint`, or `uint__n__` |
| * `long` footnote:[{fn-int64-supported}], `long__n__`, `ulong`, or |
| `ulong__n__` |
| * `float` or `float__n__` |
| * `double` footnote:[{fn-double-supported}] or `double__n__` |
| ifdef::cl_khr_fp16[] |
| * `half` footnote:[{fn-half-supported}] or `half__n__` |
| |
| NOTE: All functions taking or returning `half` types are supported only when |
| the `<<cl_khr_fp16>>` extension macro is supported. |
| endif::cl_khr_fp16[] |
| |
| as the type for the arguments unless otherwise stated. |
| _n_ is 2, 3 footnote:[{fn-vec3-async-copy}], 4, 8, or 16. |
| |
| [[table-builtin-async-copy]] |
| .Built-in Async Copy and Prefetch Functions |
| [cols="1a,1",options="header",] |
| |==== |
| | Function | Description |
| | event_t **async_work_group_copy**({local} gentype _*dst_, |
| const {global} gentype *_src_, size_t _num_gentypes_, event_t _event_) + |
| event_t **async_work_group_copy**({global} gentype _*dst_, |
| const {local} gentype *_src_, size_t _num_gentypes_, event_t _event_) |
| | Perform an async copy of _num_gentypes_ `gentype` elements from _src_ to |
| _dst_. |
| |
| Returns an event object that can be used by *wait_group_events* to |
| wait for the async copy to finish. |
| The _event_ argument can also be used to associate the |
| *async_work_group_copy* with a previous async copy allowing an event |
| to be shared by multiple async copies; otherwise _event_ should be |
| zero. |
| |
| 0 can be implicitly and explicitly cast to `event_t` type. |
| |
| If the _event_ argument is non-zero, the event object supplied in the |
| _event_ argument will be returned. |
| |
| This function does not perform any implicit synchronization of source |
| data such as using a *barrier* before performing the copy. |
| | | |
| | event_t **async_work_group_strided_copy**({local} gentype _*dst_, |
| const {global} gentype *_src_, size_t _num_gentypes_, size_t _src_stride_, |
| event_t _event_) + |
| event_t **async_work_group_strided_copy**({global} gentype _*dst_, |
| const {local} gentype *_src_, size_t _num_gentypes_, size_t _dst_stride_, |
| event_t _event_) |
| | Perform an async gather (`global` to `local`) or scatter (`local` to |
| `global`) of _num_gentypes_ `gentype` elements from _src_ to _dst_. |
| The _src_stride_ is the stride in elements for each `gentype` |
| element read from _src_. |
| The _dst_stride_ is the stride in elements for each `gentype` element |
| written to _dst_. |
| |
| Returns an event object that can be used by *wait_group_events* to |
| wait for the async copy to finish. |
| The _event_ argument can also be used to associate the |
| *async_work_group_strided_copy* with a previous async copy allowing an |
| event to be shared by multiple async copies; otherwise _event_ should |
| be zero. |
| |
| 0 can be implicitly and explicitly cast to `event_t` type. |
| |
| If the _event_ argument is non-zero, the event object supplied in the |
| _event_ argument will be returned. |
| |
| This function does not perform any implicit synchronization of source |
| data such as using a *barrier* before performing the copy. |
| |
| The behavior of *async_work_group_strided_copy* is undefined if |
| _src_stride_ or _dst_stride_ is 0, or if the _src_stride_ or |
| _dst_stride_ values cause the _src_ or _dst_ pointers to exceed the |
| upper bounds of the address space during the copy. |
| |
| <<unified-spec, Requires>> support for OpenCL C 1.1 or newer. |
| | | |
| | void **wait_group_events**(int _num_events_, event_t *_event_list_) |
| | Wait for events that identify the *async_work_group_copy* operations |
| to complete. |
| The event objects specified in _event_list_ will be released after the |
| wait is performed. |
| | | |
| | void **prefetch**(const {global} gentype *_p_, size_t _num_gentypes_) |
| | Prefetch `_num_gentypes_ * sizeof(gentype)` bytes into the global |
| cache. |
| The prefetch instruction is applied to a work-item in a work-group and |
| does not affect the functional behavior of the kernel. |
| ifdef::cl_khr_async_work_group_copy_fence[] |
| |[source,opencl_c] |
| ---- |
| void async_work_group_copy_fence( |
| cl_mem_fence_flags flags) |
| ---- |
| | Orders async copies produced by the work-items of a work-group |
| executing a kernel. |
| Async copies preceding the *async_work_group_copy_fence* must complete |
| their access to the designated memory or memories, including both |
| reads-from and writes-to it, before async copies following the fence |
| are allowed to start accessing these memories. |
| In other words, every async copy preceding the |
| *async_work_group_copy_fence* must happen-before every async copy |
| following the fence, with respect to the designated memory or |
| memories. |
| |
| The _flags_ argument specifies the memory address space and can be set |
| to a combination of the following literal values: |
| |
| `CLK_LOCAL_MEM_FENCE` + |
| `CLK_GLOBAL_MEM_FENCE` |
| |
| The async fence is performed by all work-items in a work-group and |
| this built-in function must therefore be encountered by all work-items |
| in a work-group executing the kernel with the same argument values; |
| otherwise the results are undefined. |
| This rule applies to ND-ranges implemented with uniform and |
| non-uniform work-groups. |
| |
| <<unified-spec, Requires>> support for the |
| `<<cl_khr_async_work_group_copy_fence>>` extension macro. |
| endif::cl_khr_async_work_group_copy_fence[] |
| |==== |
| |
| [NOTE] |
| ==== |
| The kernel must wait for the completion of all async copies using the |
| *wait_group_events* built-in function before exiting; otherwise the behavior |
| is undefined. |
| ==== |
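| |
| The element mapping of the two *async_work_group_strided_copy* variants can be expressed as a sequential C reference model. |
| This is a hedged sketch under several assumptions. |
| `model_strided_gather` and `model_strided_scatter` are hypothetical names, and `int` stands in for `gentype`. |
| The loops also run synchronously, whereas the built-ins are asynchronous and must be completed with *wait_group_events*. |
| With a stride of 1, both loops reduce to the contiguous copy performed by *async_work_group_copy*. |
| |
| [source,c] |
| ---- |
| #include <stddef.h> |
| |
| /* global -> local variant: element i of dst is read from element |
|  * i * src_stride of src (a gather). */ |
| static void model_strided_gather(int *dst, const int *src, |
|                                  size_t num_gentypes, size_t src_stride) |
| { |
|     for (size_t i = 0; i < num_gentypes; ++i) |
|         dst[i] = src[i * src_stride]; |
| } |
| |
| /* local -> global variant: element i of src is written to element |
|  * i * dst_stride of dst (a scatter). */ |
| static void model_strided_scatter(int *dst, const int *src, |
|                                   size_t num_gentypes, size_t dst_stride) |
| { |
|     for (size_t i = 0; i < num_gentypes; ++i) |
|         dst[i * dst_stride] = src[i]; |
| } |
| ---- |
| |
| For example, gathering column 0 of a row-major 3x3 `int` matrix into a 3-element array uses _src_stride_ = 3. |
| |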
| -- |
| |
| |
| ifdef::cl_khr_extended_async_copies[] |
| [[extended-async-copies]] |
| ==== Extended Async Copy Functions |
| |
| [open,refpage='extendedAsyncCopyFunctions',desc='Extended Async Copy Functions',type='freeform',spec='clang',anchor='extended-async-copies',xrefs='',alias='async_work_group_copy_2D2D async_work_group_copy_3D3D'] |
| -- |
| If the `<<cl_khr_extended_async_copies>>` extension macro is supported, |
| additional <<table-builtin-extended-async-copy, Built-in Extended Async Copy |
| Functions>> are provided which interpret the source and destination as 2D or |
| 3D data. |
| |
| [NOTE] |
| ==== |
| <<table-builtin-async-copy, *async_work_group_strided_copy*>> is a special |
| case of *async_work_group_copy_2D2D*, namely one which copies a single |
| column to a single line or vice versa. |
| For example: + |
| `async_work_group_strided_copy(dst, src, num_gentypes, src_stride, event)` |
| is equivalent to `async_work_group_copy_2D2D(dst, 0, src, 0, sizeof(gentype), 1, |
| num_gentypes, src_stride, 1, event)`. |
| ==== |
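| |
| The equivalence stated in the note can be checked with a sequential, byte-level C model of *async_work_group_copy_2D2D*. |
| The model follows the description in the <<table-builtin-extended-async-copy,Built-in Extended Async Copy Functions>> table. |
| This is a sketch under stated assumptions: `model_copy_2D2D` is a hypothetical name. |
| The copy runs synchronously instead of returning an `event_t`, and the address-space qualifiers are dropped because plain C has none. |
| |
| [source,c] |
| ---- |
| #include <stddef.h> |
| #include <string.h> |
| |
| /* Hypothetical sequential model of async_work_group_copy_2D2D. |
|  * Offsets and line lengths are in elements; arithmetic is done on char*. */ |
| static void model_copy_2D2D(void *dst, size_t dst_offset, |
|                             const void *src, size_t src_offset, |
|                             size_t num_bytes_per_element, |
|                             size_t num_elements_per_line, |
|                             size_t num_lines, |
|                             size_t src_total_line_length, |
|                             size_t dst_total_line_length) |
| { |
|     const size_t eb = num_bytes_per_element; |
|     char *d = (char *)dst + dst_offset * eb; |
|     const char *s = (const char *)src + src_offset * eb; |
| |
|     for (size_t line = 0; line < num_lines; ++line) { |
|         /* Line k starts k * total_line_length elements past the first line. */ |
|         memcpy(d + line * dst_total_line_length * eb, |
|                s + line * src_total_line_length * eb, |
|                num_elements_per_line * eb); |
|     } |
| } |
| ---- |
| |
| Consider a call with `num_elements_per_line` = 1, `num_lines` = _num_gentypes_, `src_total_line_length` = _src_stride_, and `dst_total_line_length` = 1. |
| Each line is then a single element, so element _i_ of _dst_ is read from element _i_ * _src_stride_ of _src_. |
| That is the gather performed by *async_work_group_strided_copy*. |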
| |
| The functions described in this section support arbitrary `gentype`-based |
| buffers by casting pointers to `void*`. |
| |
| These functions do not perform any implicit synchronization of source data |
| such as using a *barrier* before performing the copy. |
| |
| These functions are performed by all work-items in a work-group and must |
| therefore be encountered by all work-items in a work-group executing the |
| kernel with the same argument values; otherwise the results are undefined. |
| |
| The _src_offset_, _dst_offset_, _src_total_line_length_, |
| _dst_total_line_length_, _src_total_plane_area_ and _dst_total_plane_area_ |
| function arguments are expressed in elements. |
| |
| Both _src_total_line_length_ and _dst_total_line_length_ describe the |
| number of elements between the beginning of the current line and the |
| beginning of the next line. |
| |
| Both _src_total_plane_area_ and _dst_total_plane_area_ describe the |
| number of elements between the beginning of the current plane and the |
| beginning of the next plane. |
| |
| These functions return an event object that can be used by |
| *wait_group_events* to wait for the async copy to finish. |
| The _event_ argument can also be used to associate the async copy with a |
| previous async copy allowing an event to be shared by multiple async copies; |
| otherwise _event_ should be zero. |
| If the _event_ argument is non-zero, the event object supplied as the |
| _event_ argument will be returned. |
| |
| [[table-builtin-extended-async-copy]] |
| .Built-in Extended Async Copy Functions |
| [cols="1a,1",options="header",] |
| |==== |
| | Function | Description |
| a| |
| [source,opencl_c] |
| ---- |
| event_t async_work_group_copy_2D2D( |
| __local void *dst, |
| size_t dst_offset, |
| const __global void *src, |
| size_t src_offset, |
| size_t num_bytes_per_element, |
| size_t num_elements_per_line, |
| size_t num_lines, |
| size_t src_total_line_length, |
| size_t dst_total_line_length, |
| event_t event) |
| |
| event_t async_work_group_copy_2D2D( |
| __global void *dst, |
| size_t dst_offset, |
| const __local void *src, |
| size_t src_offset, |
| size_t num_bytes_per_element, |
| size_t num_elements_per_line, |
| size_t num_lines, |
| size_t src_total_line_length, |
| size_t dst_total_line_length, |
| event_t event) |
| ---- |
| | Perform an async copy of (_num_elements_per_line_ * _num_lines_) |
| elements of size _num_bytes_per_element_ from (_src_ + (_src_offset_ * |
| _num_bytes_per_element_)) to (_dst_ + (_dst_offset_ * |
| _num_bytes_per_element_)). |
| All pointer arithmetic is performed with implicit casting to `char*` |
| by the implementation. |
| Each line contains _num_elements_per_line_ elements of size |
| _num_bytes_per_element_. |
| After each line of transfer, the _src_ address is incremented by |
| _src_total_line_length_ elements (i.e. _src_total_line_length_ * |
| _num_bytes_per_element_ bytes), and the _dst_ address is incremented |
| by _dst_total_line_length_ elements (i.e. _dst_total_line_length_ * |
| _num_bytes_per_element_ bytes), for the next line of transfer. |
| |
| The behavior of *async_work_group_copy_2D2D* is undefined if the |
| source or destination addresses exceed the upper bounds of the address |
| space during the copy. |
| |
| The behavior of *async_work_group_copy_2D2D* is also undefined if the |
| _src_total_line_length_ or _dst_total_line_length_ values are smaller |
| than _num_elements_per_line_, i.e. overlapping of lines is undefined. |
| a| |
| [source,opencl_c] |
| ---- |
| event_t async_work_group_copy_3D3D( |
| __local void *dst, |
| size_t dst_offset, |
| const __global void *src, |
| size_t src_offset, |
| size_t num_bytes_per_element, |
| size_t num_elements_per_line, |
| size_t num_lines, |
| size_t num_planes, |
| size_t src_total_line_length, |
| size_t src_total_plane_area, |
| size_t dst_total_line_length, |
| size_t dst_total_plane_area, |
| event_t event) |
| |
| event_t async_work_group_copy_3D3D( |
| __global void *dst, |
| size_t dst_offset, |
| const __local void *src, |
| size_t src_offset, |
| size_t num_bytes_per_element, |
| size_t num_elements_per_line, |
| size_t num_lines, |
| size_t num_planes, |
| size_t src_total_line_length, |
| size_t src_total_plane_area, |
| size_t dst_total_line_length, |
| size_t dst_total_plane_area, |
| event_t event) |
| ---- |
| | Perform an async copy of ((_num_elements_per_line_ * _num_lines_) * |
| _num_planes_) elements of size _num_bytes_per_element_ from (_src_ + |
| (_src_offset_ * _num_bytes_per_element_)) to (_dst_ + (_dst_offset_ * |
| _num_bytes_per_element_)), arranged in _num_planes_ planes. |
| All pointer arithmetic is performed with implicit casting to `char*` |
| by the implementation. |
| Each plane contains _num_lines_ lines. |
| Each line contains _num_elements_per_line_ elements. |
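| After each line of transfer, the _src_ address is incremented by |
| _src_total_line_length_ elements (i.e. _src_total_line_length_ * |
| _num_bytes_per_element_ bytes), and the _dst_ address is incremented |
| by _dst_total_line_length_ elements (i.e. _dst_total_line_length_ * |
| _num_bytes_per_element_ bytes), for the next line of transfer. |
| Each successive source plane begins _src_total_plane_area_ elements |
| after the beginning of the previous source plane, and each successive |
| destination plane begins _dst_total_plane_area_ elements after the |
| beginning of the previous destination plane. |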
| |
| The behavior of *async_work_group_copy_3D3D* is undefined if the |
| source or destination addresses exceed the upper bounds of the address |
| space during the copy. |
| |
| The behavior of *async_work_group_copy_3D3D* is also undefined if the |
| _src_total_line_length_ or _dst_total_line_length_ values are smaller |
| than _num_elements_per_line_, i.e. overlapping of lines is undefined. |
| |
| The behavior of *async_work_group_copy_3D3D* is also undefined if |
| _src_total_plane_area_ is smaller than (_num_lines_ * |
| _src_total_line_length_), or _dst_total_plane_area_ is smaller than |
| (_num_lines_ * _dst_total_line_length_), i.e. overlapping of planes is |
| undefined. |
| |==== |
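| |
| The same approach extends to three dimensions with a sequential C model of *async_work_group_copy_3D3D*. |
| In this model, line _l_ of plane _p_ starts _p_ * _total_plane_area_ + _l_ * _total_line_length_ elements past the starting address. |
| That rule follows the definitions of the line-length and plane-area arguments given earlier in this section. |
| As before, `model_copy_3D3D` is a hypothetical name, the copy is synchronous, and address-space qualifiers are omitted. |
| |
| [source,c] |
| ---- |
| #include <stddef.h> |
| #include <string.h> |
| |
| /* Hypothetical sequential model of async_work_group_copy_3D3D. |
|  * Offsets, line lengths and plane areas are in elements. */ |
| static void model_copy_3D3D(void *dst, size_t dst_offset, |
|                             const void *src, size_t src_offset, |
|                             size_t num_bytes_per_element, |
|                             size_t num_elements_per_line, |
|                             size_t num_lines, size_t num_planes, |
|                             size_t src_total_line_length, |
|                             size_t src_total_plane_area, |
|                             size_t dst_total_line_length, |
|                             size_t dst_total_plane_area) |
| { |
|     const size_t eb = num_bytes_per_element; |
|     char *d = (char *)dst + dst_offset * eb; |
|     const char *s = (const char *)src + src_offset * eb; |
| |
|     for (size_t plane = 0; plane < num_planes; ++plane) { |
|         for (size_t line = 0; line < num_lines; ++line) { |
|             /* Element offsets of the start of this line in each buffer. */ |
|             size_t s_elem = plane * src_total_plane_area + line * src_total_line_length; |
|             size_t d_elem = plane * dst_total_plane_area + line * dst_total_line_length; |
|             memcpy(d + d_elem * eb, s + s_elem * eb, num_elements_per_line * eb); |
|         } |
|     } |
| } |
| ---- |
| |
| For example, copying the top-left 2x2 corner of each 3x3 plane of a two-plane source into a packed 2x2x2 destination uses line lengths 3 and 2 and plane areas 9 and 4. |
| |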
| -- |
| endif::cl_khr_extended_async_copies[] |
| |
| |
| [[atomic-functions]] |
| === Atomic Functions |
| |
| IMPORTANT: The C11 style atomic functions in this sub-section <<unified-spec, |
| require>> support for OpenCL C 2.0 or newer. However, this statement does not |
| apply to the <<atomic-legacy, "OpenCL C 1.x Legacy Atomics">> descriptions at |
| the end of this sub-section. |
| |
| The OpenCL C programming language implements a subset of the C11 atomics |
| (refer to <<C11-spec,section 7.17 of the C11 Specification>>) and |
| synchronization operations. |
| These operations play a special role in making assignments in one work-item |
| visible to another. |
| A synchronization operation on one or more memory locations is either an |
| acquire operation, a release operation, or both an acquire and release |
| operation footnote:[{fn-atomic-no-consume}]. |
| A synchronization operation without an associated memory location is a fence |
| and can be either an acquire fence, a release fence or both an acquire and |
| release fence. |
| In addition, there are relaxed atomic operations, which are not |
| synchronization operations, and atomic read-modify-write operations, which |
| have special characteristics. |
| |
| The types include |
| |
| [none] |
| * `memory_order` |
| |
| which is an enumerated type whose enumerators identify memory ordering |
| constraints; |
| |
| [none] |
| * `memory_scope` |
| |
| which is an enumerated type whose enumerators identify scope of memory |
| ordering constraints; |
| |
| [none] |
| * `atomic_flag` |
| |
| which is a 32-bit integer type representing a primitive atomic flag; and |
| several atomic analogs of integer types. |
| |
| In the following operation definitions: |
| |
| * An A refers to one of the atomic types. |
| * A C refers to its corresponding non-atomic type. |
| * An M refers to the type of the other argument for arithmetic operations. |
| For atomic integer types, M is C. |
| * The functions not ending in `_explicit` have the same semantics as the |
| corresponding `_explicit` function with `memory_order_seq_cst` for the |
| `memory_order` argument. |
| * The functions that do not have a `memory_scope` argument have the same |
| semantics as the corresponding functions with the `memory_scope` |
| argument set to `memory_scope_device`. |
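| |
| OpenCL C inherits the C11 naming convention described in the list above. |
| The host-side sketch below uses C11 `<stdatomic.h>` to show that a call without the `_explicit` suffix behaves like the `_explicit` call with `memory_order_seq_cst`. |
| It is plain C11 rather than OpenCL C, so it has no `memory_scope` argument, and `add_both_ways` is a hypothetical helper. |
| |
| [source,c] |
| ---- |
| #include <stdatomic.h> |
| |
| /* Hypothetical helper: performs the same seq_cst increment two ways. */ |
| static int add_both_ways(atomic_int *counter) |
| { |
|     /* Without the _explicit suffix, memory_order_seq_cst is implied. */ |
|     int before_implicit = atomic_fetch_add(counter, 1); |
|     /* The same operation with the memory order spelled out. */ |
|     int before_explicit = |
|         atomic_fetch_add_explicit(counter, 1, memory_order_seq_cst); |
|     /* Each call returns the value held before its update, so without |
|      * concurrent writers the two results differ by exactly 1. */ |
|     return before_explicit - before_implicit; |
| } |
| ---- |
| |
| In OpenCL C the corresponding functions may also take a `memory_scope` argument. |
| Omitting it is equivalent to passing `memory_scope_device`. |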
| |
| [NOTE] |
| ==== |
| With fine-grained system SVM, sharing happens at the granularity of |
| individual loads and stores anywhere in host memory. |
| Memory consistency is always guaranteed at synchronization points, but to |
| obtain finer control over consistency, the OpenCL atomics functions may be |
| used to ensure that the updates to individual data values made by one unit |
| of execution are visible to other execution units. |
| In particular, when a host thread needs fine control over the consistency of |
| memory that is shared with one or more OpenCL devices, it must use atomic |
| and fence operations that are compatible with the C11 atomic operations. |
| |
| We cannot require <<C11-spec,C11 atomics>> since host programs can be |
| implemented in other programming languages and versions of C or {cpp}, but we |
| do require that the host programs use atomics and that those atomics be |
| compatible with those in C11. |
| ==== |
| |
| ifdef::refpageOnly[] |
| // This is an index page, generated only in the OpenCL refpages and not in |
| // the Specification. It's here so the index remains consistent with the |
| // spec. |
| [open,refpage='atomicFunctions',desc='Atomic Functions',type='freeform',spec='clang',anchor='atomic-functions',xrefs='memory_order memory_scope'] |
| -- |
| OpenCL C includes a variety of atomic functions, described individually in the |
| following sections: |
| |
| * link:ATOMIC_VAR_INIT.html[ATOMIC_VAR_INIT macro] |
| * link:atomic_init.html[atomic_init function] |
| * link:atomic_work_item_fence.html[Fences] |
| * link:atomicTypes.html[Atomic Integer And Floating-Point Types] |
| * link:atomic_store.html[atomic_store Functions] |
| * link:atomic_load.html[atomic_load Functions] |
| * link:atomic_exchange.html[atomic_exchange Functions] |
| * link:atomic_compare_exchange.html[atomic_compare_exchange Functions] |
| * link:atomic_fetch_key.html[atomic_fetch and modify Functions] |
| * link:atomic_flag.html[Atomic Flag Type and Operations] |
| * link:atomicFlagTestAndSet.html[atomic_flag_test_and_set Functions] |
| * link:atomic_flag_clear.html[atomic_flag_clear Functions] |
| * link:atomicRestrictions.html[Restrictions on Atomic Operations] |
| -- |
| endif::refpageOnly[] |
| |
| |
| [[the-atomic_var_init-macro]] |
| ==== The `ATOMIC_VAR_INIT` Macro |
| |
| [open,refpage='ATOMIC_VAR_INIT',desc='ATOMIC_VAR_INIT macro',type='freeform',spec='clang',anchor='the-atomic_var_init-macro',xrefs='atomicFunctions atomic_init'] |
| -- |
| |
| The `ATOMIC_VAR_INIT` macro expands to a token sequence suitable for |
| initializing an atomic object of a type that is initialization-compatible |
| with _value_. |
| An atomic object with automatic storage duration that is not explicitly |
| initialized using `ATOMIC_VAR_INIT` is initially in an indeterminate state; |
| however, the default (zero) initialization for objects with `static` storage |
| duration is guaranteed to produce a valid state. |
| |
| [source,opencl_c] |
| ---------- |
| #define ATOMIC_VAR_INIT(C value) |
| ---------- |
| |
| This macro can only be used to initialize atomic objects that are declared |
| in program scope in the `global` address space. |
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
| global atomic_int guide = ATOMIC_VAR_INIT(42); |
| ---------- |
| |
| Concurrent access to the variable being initialized, even via an atomic |
| operation, constitutes a data race. |
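|
| As an illustrative sketch (the variable and kernel names are hypothetical), |
| a program-scope counter initialized with `ATOMIC_VAR_INIT` can hand out |
| distinct values to the work-items of a kernel. |
| The sketch assumes support for OpenCL C 2.0, or OpenCL C 3.0 or newer with |
| the {opencl_c_program_scope_global_variables} and |
| {opencl_c_atomic_scope_device} features. |
|
| [source,opencl_c] |
| ---------- |
| // Hypothetical program-scope counter, initialized to zero. |
| global atomic_uint next_ticket = ATOMIC_VAR_INIT(0); |
|
| kernel void take_ticket(global uint *tickets) |
| { |
|     // Within one launch, every work-item receives a different value. |
|     tickets[get_global_id(0)] = |
|         atomic_fetch_add_explicit(&next_ticket, 1u, |
|                                   memory_order_relaxed, |
|                                   memory_scope_device); |
| } |
| ---------- |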
| -- |
| |
| |
| [[the-atomic_init-function]] |
| ==== The atomic_init Function |
| |
| [open,refpage='atomic_init',desc='The atomic_init function',type='freeform',spec='clang',anchor='the-atomic_init-function',xrefs='atomicFunctions ATOMIC_VAR_INIT'] |
| -- |
| |
| The `atomic_init` function non-atomically initializes the atomic object |
| pointed to by _obj_ to the value _value_. |
| |
| [source,opencl_c] |
| ---------- |
| // Requires OpenCL C 3.0 or newer. |
| void atomic_init(volatile __global A *obj, C value) |
| void atomic_init(volatile __local A *obj, C value) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and the |
| // __opencl_c_generic_address_space feature. |
| void atomic_init(volatile A *obj, C value) |
| ---------- |
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
| local atomic_int guide; |
| if (get_local_id(0) == 0) |
| atomic_init(&guide, 42); |
| work_group_barrier(CLK_LOCAL_MEM_FENCE); |
| ---------- |
| |
| NOTE: The function variant that uses the generic address space, i.e. no |
| explicit address space is listed, <<unified-spec, requires>> support for OpenCL |
| C 2.0, or OpenCL C 3.0 or newer and the {opencl_c_generic_address_space} |
| feature. |
| -- |
| |
| |
| [[order-and-consistency]] |
| ==== Order and Consistency |
| |
| [open,refpage='memory_order',desc='Memory Operation Order and Consistency',type='freeform',spec='clang',anchor='order-and-consistency'] |
| -- |
| |
| The enumerated type `memory_order` specifies the detailed regular |
| (non-atomic) memory synchronization operations as defined in |
| <<C11-spec,section 5.1.2.4 of the C11 Specification>>, and may provide for |
| operation ordering. |
| The following table lists the enumeration constants: |
| |
| [[table-memory-orders]] |
| //.Memory Order Enumeration Constants |
| [cols=",",options="header",] |
| |==== |
| | Memory Order | Additional Notes |
| | `memory_order_relaxed` |
| | <<unified-spec, Requires>> support for OpenCL C 2.0 or newer. |
| | `memory_order_acquire` |
| | <<unified-spec, Requires>> support for OpenCL C 2.0, but in OpenCL C 3.0 |
| or newer some uses require the {opencl_c_atomic_order_acq_rel} |
| feature. |
| | `memory_order_release` |
| | <<unified-spec, Requires>> support for OpenCL C 2.0, but in OpenCL C 3.0 |
| or newer some uses require the {opencl_c_atomic_order_acq_rel} |
| feature. |
| | `memory_order_acq_rel` |
| | <<unified-spec, Requires>> support for OpenCL C 2.0, but in OpenCL C 3.0 |
| or newer some uses require the {opencl_c_atomic_order_acq_rel} |
| feature. |
| | `memory_order_seq_cst` |
| | <<unified-spec, Requires>> support for OpenCL C 2.0, or OpenCL C 3.0 or |
| newer and the {opencl_c_atomic_order_seq_cst} feature. |
| |==== |
| |
| The `memory_order` enumeration constants can be used when performing atomic |
| operations on `global` or `local` memory. |
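|
| The following sketch (kernel and argument names are hypothetical) pairs a |
| `memory_order_release` store with a `memory_order_acquire` load so that an |
| ordinary store to `data[0]` is visible to the work-item that observes the |
| flag. |
| It assumes the host initialized `*ready` to 0 and, for OpenCL C 3.0, the |
| {opencl_c_atomic_order_acq_rel} and {opencl_c_atomic_scope_device} features. |
| The consumer inspects the flag once instead of spinning, to avoid relying on |
| forward progress between work-items. |
|
| [source,opencl_c] |
| ---------- |
| kernel void publish_and_observe(global int *data, |
|                                 global atomic_int *ready, |
|                                 global int *out) |
| { |
|     if (get_global_id(0) == 0) { |
|         data[0] = 123;                  // ordinary, non-atomic store |
|         atomic_store_explicit(ready, 1, memory_order_release, |
|                               memory_scope_device); |
|     } else if (get_global_id(0) == 1) { |
|         if (atomic_load_explicit(ready, memory_order_acquire, |
|                                  memory_scope_device) == 1) |
|             out[0] = data[0];           // guaranteed to read 123 |
|         else |
|             out[0] = -1;                // not published yet |
|     } |
| } |
| ---------- |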
| -- |
| |
| |
| [[memory-scope]] |
| ==== Memory Scope |
| |
| [open,refpage='memory_scope',desc='Memory Operation Scope',type='freeform',spec='clang',anchor='memory-scope'] |
| -- |
| |
| The enumerated type `memory_scope` specifies whether the memory ordering |
| constraints given by `memory_order` apply to work-items in a sub-group, |
| work-items in a work-group, or work-items from one or more kernels executing |
| on the device or across devices (in the case of shared virtual memory). |
| The following table lists the enumeration constants: |
| |
| [[table-memory-scopes]] |
| //.Memory Scope Enumeration Constants |
| [cols=",",options="header",] |
| |==== |
| | Memory Scope | Additional Notes |
| | `memory_scope_work_item` |
| | `memory_scope_work_item` can only be used with `atomic_work_item_fence` |
| with flags set to `CLK_IMAGE_MEM_FENCE`. |
| <<unified-spec, Requires>> support for OpenCL C 2.0 or newer. |
| | `memory_scope_sub_group` |
| | <<unified-spec, Requires>> support for |
| ifdef::cl_khr_subgroups[the `<<cl_khr_subgroups>>` extension macro; or for] |
| OpenCL C 3.0 or newer and the {opencl_c_subgroups} feature. |
| | `memory_scope_work_group` |
| | <<unified-spec, Requires>> support for OpenCL C 2.0 or newer. |
| | `memory_scope_device` |
| | <<unified-spec, Requires>> support for OpenCL C 2.0, or OpenCL C 3.0 or |
| newer and the {opencl_c_atomic_scope_device} feature. |
| | `memory_scope_all_svm_devices` |
| | <<unified-spec, Requires>> support for OpenCL C 2.0, or OpenCL C 3.0 or |
| newer and the {opencl_c_atomic_scope_all_devices} feature. |
| | `memory_scope_all_devices` |
| | An alias for `memory_scope_all_svm_devices`. |
| <<unified-spec, Requires>> support for OpenCL C 3.0 or newer and the |
| {opencl_c_atomic_scope_all_devices} feature. |
| |==== |
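|
| As an illustrative sketch (names are hypothetical), a two-level sum can use |
| `memory_scope_work_group` for a `local` atomic that only the work-items of |
| one work-group access, and `memory_scope_device` for a `global` atomic that |
| work-items from different work-groups update. |
| It assumes the host initialized `*total` to 0 and, for OpenCL C 3.0, the |
| {opencl_c_atomic_scope_device} feature. |
|
| [source,opencl_c] |
| ---------- |
| kernel void sum_two_level(global const int *in, global atomic_int *total) |
| { |
|     local atomic_int group_sum; |
|
|     // atomic_init is non-atomic: one work-item performs it, and the |
|     // barrier orders it before the other work-items use group_sum. |
|     if (get_local_id(0) == 0) |
|         atomic_init(&group_sum, 0); |
|     work_group_barrier(CLK_LOCAL_MEM_FENCE); |
|
|     // Only work-items of this work-group access group_sum. |
|     atomic_fetch_add_explicit(&group_sum, in[get_global_id(0)], |
|                               memory_order_relaxed, |
|                               memory_scope_work_group); |
|     work_group_barrier(CLK_LOCAL_MEM_FENCE); |
|
|     // total is updated by work-items from different work-groups. |
|     if (get_local_id(0) == 0) |
|         atomic_fetch_add_explicit(total, |
|                                   atomic_load_explicit(&group_sum, |
|                                       memory_order_relaxed, |
|                                       memory_scope_work_group), |
|                                   memory_order_relaxed, |
|                                   memory_scope_device); |
| } |
| ---------- |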
| |
| // This is no longer correct given `memory_scope_sub_group`. |
| //The memory scope should only be used when performing atomic operations to |
| //global memory. |
| //Atomic operations to `local` memory only guarantee memory ordering in the |
| //work-group not across work-groups and therefore ignore the `memory_scope` |
| //value. |
| -- |
| |
| |
| [[fences]] |
| ==== Fences |
| |
| [open,refpage='atomic_work_item_fence',desc='Fences',type='freeform',spec='clang',anchor='fences',xrefs='atomicFunctions atomicTypes atomic_compare_exchange atomic_exchange atomic_fetch_key atomic_flag atomic_flag_clear atomic_flag_test_and_set atomic_flag_test_and_set_explicit atomic_init atomic_load atomic_store atomic_work_item_fence'] |
| -- |
| The following fence operations are supported. |
| |
| [source,opencl_c] |
| ---------- |
| void atomic_work_item_fence(cl_mem_fence_flags flags, |
| memory_order order, |
| memory_scope scope) |
| |
| // Older syntax memory fences are equivalent to atomic_work_item_fence with the |
| // same flags parameter, memory_scope_work_group scope, and ordering as follows: |
| void mem_fence(cl_mem_fence_flags flags) // memory_order_acq_rel |
| void read_mem_fence(cl_mem_fence_flags flags) // memory_order_acquire |
| void write_mem_fence(cl_mem_fence_flags flags) // memory_order_release |
| ---------- |
| |
| `flags` must be set to `CLK_GLOBAL_MEM_FENCE`, `CLK_LOCAL_MEM_FENCE`, |
| `CLK_IMAGE_MEM_FENCE` or a combination of these values ORed together; |
| otherwise the behavior is undefined. |
| The behavior of calling `atomic_work_item_fence` with `CLK_IMAGE_MEM_FENCE` |
| ORed together with either `CLK_GLOBAL_MEM_FENCE` or `CLK_LOCAL_MEM_FENCE` is |
| equivalent to calling `atomic_work_item_fence` individually for |
| `CLK_IMAGE_MEM_FENCE` and the other flags. |
| Passing both `CLK_GLOBAL_MEM_FENCE` and `CLK_LOCAL_MEM_FENCE` to |
| `atomic_work_item_fence` will synchronize memory operations to both `local` |
| and `global` memory through some shared atomic action, as described in |
| <<opencl-spec,section 3.3.6.2 of the OpenCL Specification>>. |
| |
| Depending on the value of _order_, this operation: |
| |
| * has no effects, if _order_ == `memory_order_relaxed`. |
| * is an acquire fence, if _order_ == `memory_order_acquire`. |
| * is a release fence, if _order_ == `memory_order_release`. |
| * is both an acquire fence and a release fence, if _order_ == |
| `memory_order_acq_rel`. |
| * is a sequentially consistent acquire and release fence, if _order_ == |
| `memory_order_seq_cst`. |
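|
| As an illustrative sketch (names are hypothetical), a release fence and an |
| acquire fence can order ordinary memory accesses around relaxed atomic |
| operations on a flag: the release fence in work-item 0 synchronizes with the |
| acquire fence in work-item 1 once work-item 1 reads the flag value stored |
| after the release fence. |
| It assumes the host initialized `*ready` to 0 and, for OpenCL C 3.0, the |
| {opencl_c_atomic_order_acq_rel} and {opencl_c_atomic_scope_device} features. |
|
| [source,opencl_c] |
| ---------- |
| kernel void fence_publish(global int *data, |
|                           global atomic_int *ready, |
|                           global int *out) |
| { |
|     if (get_global_id(0) == 0) { |
|         data[0] = 123; |
|         // Orders the store to data[0] before the relaxed flag store. |
|         atomic_work_item_fence(CLK_GLOBAL_MEM_FENCE, memory_order_release, |
|                                memory_scope_device); |
|         atomic_store_explicit(ready, 1, memory_order_relaxed, |
|                               memory_scope_device); |
|     } else if (get_global_id(0) == 1) { |
|         if (atomic_load_explicit(ready, memory_order_relaxed, |
|                                  memory_scope_device) == 1) { |
|             // Pairs with the release fence executed by work-item 0. |
|             atomic_work_item_fence(CLK_GLOBAL_MEM_FENCE, |
|                                    memory_order_acquire, |
|                                    memory_scope_device); |
|             out[0] = data[0];   // reads 123 |
|         } |
|     } |
| } |
| ---------- |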
| |
| For images declared with the `read_write` qualifier, `atomic_work_item_fence` |
| must be called with `CLK_IMAGE_MEM_FENCE` to make sure that writes to the |
| image by a work-item become visible to subsequent reads from that image by |
| the same work-item. |
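|
| As a sketch (names are hypothetical; it assumes support for OpenCL C 2.0, or |
| OpenCL C 3.0 or newer with the {opencl_c_read_write_images} and |
| {opencl_c_atomic_order_acq_rel} features), a work-item that writes a pixel |
| and then reads it back places an image fence between the two accesses: |
|
| [source,opencl_c] |
| ---------- |
| kernel void double_in_place(read_write image2d_t img, global float4 *out) |
| { |
|     int2 coord = (int2)(get_global_id(0), get_global_id(1)); |
|     float4 v = read_imagef(img, coord); |
|     write_imagef(img, coord, v * 2.0f); |
|
|     // Makes this work-item's image write visible to its own later read. |
|     atomic_work_item_fence(CLK_IMAGE_MEM_FENCE, memory_order_acq_rel, |
|                            memory_scope_work_item); |
|
|     out[get_global_id(1) * get_global_size(0) + get_global_id(0)] = |
|         read_imagef(img, coord); |
| } |
| ---------- |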
| |
| NOTE: The use of memory order and scope enumerations must respect the |
| <<atomic-restrictions,restrictions section below>>. |
| -- |
| |
| |
| [[atomic-integer-and-floating-point-types]] |
| ==== Atomic Integer and Floating-point Types |
| |
| [open,refpage='atomicTypes',desc='Atomic Integer And Floating-Point Types',type='freeform',spec='clang',anchor='atomic-integer-and-floating-point-types',xrefs='atomicFunctions',alias='atomic_int atomic_uint atomic_long atomic_ulong atomic_float atomic_double atomic_intptr_t atomic_uintptr_t atomic_size_t atomic_ptrdiff_t'] |
| -- |
| |
| The supported atomic type names are: |
| |
| [none] |
| * `atomic_int` |
| * `atomic_uint` |
| * `atomic_long` footnote:atomic-int64-supported[{fn-atomic-int64-supported}] |
| * `atomic_ulong` footnote:atomic-int64-supported[] |
| * `atomic_float` |
| * `atomic_double` footnote:[{fn-atomic-double-supported}] |
| * `atomic_intptr_t` footnote:atomic-size_t-supported[{fn-atomic-size_t-supported}] |
| * `atomic_uintptr_t` footnote:atomic-size_t-supported[] |
| * `atomic_size_t` footnote:atomic-size_t-supported[] |
| * `atomic_ptrdiff_t` footnote:atomic-size_t-supported[] |
| |
| Arguments to a kernel can be declared to be a pointer to the above atomic |
| types or to the `atomic_flag` type. |
|
| The atomic integer, floating-point and pointer types have the same size as |
| their corresponding regular types. |
| The `atomic_flag` type must be implemented as a 32-bit integer. |
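|
| For example, a histogram kernel (names are hypothetical) can take a pointer |
| to `atomic_uint` as a kernel argument. |
| The sketch assumes the host has zero-filled the 256-element `bins` buffer, |
| for example with `clEnqueueFillBuffer`, and, for OpenCL C 3.0, the |
| {opencl_c_atomic_scope_device} feature. |
|
| [source,opencl_c] |
| ---------- |
| kernel void histogram256(global const uchar *in, global atomic_uint *bins) |
| { |
|     uchar v = in[get_global_id(0)]; |
|     // Many work-items may hit the same bin, so the increment is atomic. |
|     atomic_fetch_add_explicit(&bins[v], 1u, memory_order_relaxed, |
|                               memory_scope_device); |
| } |
| ---------- |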
| -- |
| |
| |
| [[operations-on-atomic-types]] |
| ==== Operations on Atomic Types |
| |
| There are only a few kinds of operations on atomic types, though there are |
| many instances of those kinds. |
| This section specifies each general kind. |
| |
| |
| [[atomic_store]] |
| ===== *The atomic_store Functions* |
| |
| [open,refpage='atomic_store',desc='The atomic_store Functions',type='freeform',spec='clang',anchor='atomic_store',xrefs='atomicFunctions atomicTypes atomic_compare_exchange atomic_exchange atomic_fetch_key atomic_flag atomic_flag_clear atomic_flag_test_and_set atomic_flag_test_and_set_explicit atomic_init atomic_load atomic_store atomic_work_item_fence'] |
| -- |
| [source,opencl_c] |
| ---------- |
| // Requires OpenCL C 3.0 or newer and both the __opencl_c_atomic_order_seq_cst |
| // and __opencl_c_atomic_scope_device features. |
| void atomic_store(volatile __global A *object, C desired) |
| void atomic_store(volatile __local A *object, C desired) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and all of the |
| // __opencl_c_generic_address_space, __opencl_c_atomic_order_seq_cst and |
| // __opencl_c_atomic_scope_device features. |
| void atomic_store(volatile A *object, C desired) |
| |
| // Requires OpenCL C 3.0 or newer and the __opencl_c_atomic_scope_device |
| // feature. |
| void atomic_store_explicit(volatile __global A *object, |
| C desired, |
| memory_order order) |
| void atomic_store_explicit(volatile __local A *object, |
| C desired, |
| memory_order order) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and both the |
| // __opencl_c_generic_address_space and __opencl_c_atomic_scope_device |
| // features. |
| void atomic_store_explicit(volatile A *object, |
| C desired, |
| memory_order order) |
| |
| // Requires OpenCL C 3.0 or newer. |
| void atomic_store_explicit(volatile __global A *object, |
| C desired, |
| memory_order order, |
| memory_scope scope) |
| void atomic_store_explicit(volatile __local A *object, |
| C desired, |
| memory_order order, |
| memory_scope scope) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and the |
| // __opencl_c_generic_address_space feature. |
| void atomic_store_explicit(volatile A *object, |
| C desired, |
| memory_order order, |
| memory_scope scope) |
| ---------- |
| |
| The _order_ argument shall not be `memory_order_acquire`, nor |
| `memory_order_acq_rel`. |
| Atomically replaces the value pointed to by _object_ with the value of |
| _desired_. |
| Memory is affected according to the value of _order_. |
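|
| As an illustration (names are hypothetical), the three stores below are |
| written to have the same effect, assuming that the non-explicit form uses |
| `memory_order_seq_cst` and that omitting _scope_ selects |
| `memory_scope_device`, which is consistent with the feature requirements |
| listed for these variants. |
|
| [source,opencl_c] |
| ---------- |
| kernel void store_forms(global atomic_int *a, |
|                         global atomic_int *b, |
|                         global atomic_int *c) |
| { |
|     // Each call stores 1 with seq_cst order at device scope. |
|     atomic_store(a, 1); |
|     atomic_store_explicit(b, 1, memory_order_seq_cst); |
|     atomic_store_explicit(c, 1, memory_order_seq_cst, memory_scope_device); |
| } |
| ---------- |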
| |
| NOTE: The non-explicit `atomic_store` function <<unified-spec, requires>> |
| support for OpenCL C 2.0, or OpenCL C 3.0 or newer and both the |
| {opencl_c_atomic_order_seq_cst} and {opencl_c_atomic_scope_device} |
| features. |
| For the explicit variants, memory order and scope enumerations must respect the |
| <<atomic-restrictions,restrictions section below>>. |
| |
| NOTE: The function variants that use the generic address space, i.e. no |
| explicit address space is listed, <<unified-spec, require>> support for OpenCL |
| C 2.0, or OpenCL C 3.0 or newer and the {opencl_c_generic_address_space} |
| feature. |
| -- |
| |
| |
| [[atomic_load]] |
| ===== *The atomic_load Functions* |
| |
| [open,refpage='atomic_load',desc='The atomic_load Functions',type='freeform',spec='clang',anchor='atomic_load',xrefs='atomicFunctions atomicTypes atomic_compare_exchange atomic_exchange atomic_fetch_key atomic_flag atomic_flag_clear atomic_flag_test_and_set atomic_flag_test_and_set_explicit atomic_init atomic_load atomic_store atomic_work_item_fence'] |
| -- |
| [source,opencl_c] |
| ---------- |
| // Requires OpenCL C 3.0 or newer and both the __opencl_c_atomic_order_seq_cst |
| // and __opencl_c_atomic_scope_device features. |
| C atomic_load(volatile __global A *object) |
| C atomic_load(volatile __local A *object) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and all of the |
| // __opencl_c_generic_address_space, __opencl_c_atomic_order_seq_cst and |
| // __opencl_c_atomic_scope_device features. |
| C atomic_load(volatile A *object) |
| |
| // Requires OpenCL C 3.0 or newer and the __opencl_c_atomic_scope_device |
| // feature. |
| C atomic_load_explicit(volatile __global A *object, |
| memory_order order) |
| C atomic_load_explicit(volatile __local A *object, |
| memory_order order) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and both the |
| // __opencl_c_generic_address_space and __opencl_c_atomic_scope_device |
| // features. |
| C atomic_load_explicit(volatile A *object, |
| memory_order order) |
| |
| // Requires OpenCL C 3.0 or newer. |
| C atomic_load_explicit(volatile __global A *object, |
| memory_order order, |
| memory_scope scope) |
| C atomic_load_explicit(volatile __local A *object, |
| memory_order order, |
| memory_scope scope) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and the |
| // __opencl_c_generic_address_space feature. |
| C atomic_load_explicit(volatile A *object, |
| memory_order order, |
| memory_scope scope) |
| ---------- |
| |
| The _order_ argument shall not be `memory_order_release` nor |
| `memory_order_acq_rel`. |
| Memory is affected according to the value of _order_. |
| Atomically returns the value pointed to by _object_. |
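|
| As a sketch (names are hypothetical), a relaxed load can serve as a |
| best-effort early exit in a search kernel. |
| It assumes the host initialized `*found_index` to -1 and, for OpenCL C 3.0, |
| the {opencl_c_atomic_scope_device} feature. |
| Relaxed order suffices because the flag does not publish any other data; |
| after the launch, `*found_index` holds the index of some matching element, |
| or -1 if there is none. |
|
| [source,opencl_c] |
| ---------- |
| kernel void find_key(global const int *in, int key, |
|                      global atomic_int *found_index) |
| { |
|     // Skip the comparison if a match has already been recorded. |
|     if (atomic_load_explicit(found_index, memory_order_relaxed, |
|                              memory_scope_device) >= 0) |
|         return; |
|
|     int i = (int)get_global_id(0); |
|     if (in[i] == key) |
|         atomic_store_explicit(found_index, i, memory_order_relaxed, |
|                               memory_scope_device); |
| } |
| ---------- |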
| |
| NOTE: The non-explicit `atomic_load` function <<unified-spec, requires>> |
| support for OpenCL C 2.0, or OpenCL C 3.0 or newer and both the |
| {opencl_c_atomic_order_seq_cst} and {opencl_c_atomic_scope_device} |
| features. |
| For the explicit variants, memory order and scope enumerations must respect the |
| <<atomic-restrictions,restrictions section below>>. |
| |
| NOTE: The function variants that use the generic address space, i.e. no |
| explicit address space is listed, <<unified-spec, require>> support for OpenCL |
| C 2.0, or OpenCL C 3.0 or newer and the {opencl_c_generic_address_space} |
| feature. |
| -- |
| |
| |
| [[atomic_exchange]] |
| ===== *The atomic_exchange Functions* |
| |
| [open,refpage='atomic_exchange',desc='The atomic_exchange Functions',type='freeform',spec='clang',anchor='atomic_exchange',xrefs='atomicFunctions atomicTypes atomic_compare_exchange atomic_exchange atomic_fetch_key atomic_flag atomic_flag_clear atomic_flag_test_and_set atomic_flag_test_and_set_explicit atomic_init atomic_load atomic_store atomic_work_item_fence'] |
| -- |
| [source,opencl_c] |
| ---------- |
| // Requires OpenCL C 3.0 or newer and both the __opencl_c_atomic_order_seq_cst |
| // and __opencl_c_atomic_scope_device features. |
| C atomic_exchange(volatile __global A *object, C desired) |
| C atomic_exchange(volatile __local A *object, C desired) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and all of the |
| // __opencl_c_generic_address_space, __opencl_c_atomic_order_seq_cst and |
| // __opencl_c_atomic_scope_device features. |
| C atomic_exchange(volatile A *object, C desired) |
| |
| // Requires OpenCL C 3.0 or newer and the __opencl_c_atomic_scope_device |
| // feature. |
| C atomic_exchange_explicit(volatile __global A *object, |
| C desired, |
| memory_order order) |
| C atomic_exchange_explicit(volatile __local A *object, |
| C desired, |
| memory_order order) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and both the |
| // __opencl_c_generic_address_space and __opencl_c_atomic_scope_device |
| // features. |
| C atomic_exchange_explicit(volatile A *object, |
| C desired, |
| memory_order order) |
| |
| // Requires OpenCL C 3.0 or newer. |
| C atomic_exchange_explicit(volatile __global A *object, |
| C desired, |
| memory_order order, |
| memory_scope scope) |
| C atomic_exchange_explicit(volatile __local A *object, |
| C desired, |
| memory_order order, |
| memory_scope scope) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and the |
| // __opencl_c_generic_address_space feature. |
| C atomic_exchange_explicit(volatile A *object, |
| C desired, |
| memory_order order, |
| memory_scope scope) |
| ---------- |
| |
| Atomically replaces the value pointed to by `object` with `desired`. |
| Memory is affected according to the value of `order`. |
| These operations are read-modify-write operations (as defined by |
| <<C11-spec,section 5.1.2.4 of the C11 Specification>>). |
| Atomically returns the value pointed to by `object` immediately before the |
| effects. |
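|
| As a sketch (names are hypothetical), exchanging 0 into a slot lets exactly |
| one work-item take a nonzero value that the host stored there before the |
| launch: each exchange is a read-modify-write operation, so only the first |
| exchange in the modification order of `*pending` observes the original value |
| and the others observe 0. |
| For OpenCL C 3.0, the {opencl_c_atomic_scope_device} feature is assumed. |
|
| [source,opencl_c] |
| ---------- |
| kernel void take_pending(global atomic_int *pending, global int *taken) |
| { |
|     // Swap 0 into *pending and keep whatever value was there before. |
|     taken[get_global_id(0)] = |
|         atomic_exchange_explicit(pending, 0, memory_order_relaxed, |
|                                  memory_scope_device); |
| } |
| ---------- |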
| |
| NOTE: The non-explicit `atomic_exchange` function <<unified-spec, requires>> |
| support for OpenCL C 2.0, or OpenCL C 3.0 or newer and both the |
| {opencl_c_atomic_order_seq_cst} and {opencl_c_atomic_scope_device} |
| features. |
| For the explicit variants, memory order and scope enumerations must respect the |
| <<atomic-restrictions,restrictions section below>>. |
| |
| NOTE: The function variants that use the generic address space, i.e. no |
| explicit address space is listed, <<unified-spec, require>> support for OpenCL |
| C 2.0, or OpenCL C 3.0 or newer and the {opencl_c_generic_address_space} |
| feature. |
| -- |
| |
| |
| [[atomic_compare_exchange]] |
| ===== *The atomic_compare_exchange Functions* |
| |
| [open,refpage='atomic_compare_exchange',desc='The atomic_compare_exchange Functions',type='freeform',spec='clang',anchor='atomic_compare_exchange',xrefs='atomicFunctions atomicTypes atomic_compare_exchange atomic_exchange atomic_fetch_key atomic_flag atomic_flag_clear atomic_flag_test_and_set atomic_flag_test_and_set_explicit atomic_init atomic_load atomic_store atomic_work_item_fence'] |
| -- |
| [source,opencl_c] |
| ---------- |
| // Requires OpenCL C 3.0 or newer and both the __opencl_c_atomic_order_seq_cst |
| // and __opencl_c_atomic_scope_device features. |
| bool atomic_compare_exchange_strong( |
| volatile __global A *object, |
| __global C *expected, C desired) |
| bool atomic_compare_exchange_strong( |
| volatile __global A *object, |
| __local C *expected, C desired) |
| bool atomic_compare_exchange_strong( |
| volatile __global A *object, |
| __private C *expected, C desired) |
| bool atomic_compare_exchange_strong( |
| volatile __local A *object, |
| __global C *expected, C desired) |
| bool atomic_compare_exchange_strong( |
| volatile __local A *object, |
| __local C *expected, C desired) |
| bool atomic_compare_exchange_strong( |
| volatile __local A *object, |
| __private C *expected, C desired) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and all of the |
| // __opencl_c_generic_address_space, __opencl_c_atomic_order_seq_cst and |
| // __opencl_c_atomic_scope_device features. |
| bool atomic_compare_exchange_strong( |
| volatile A *object, |
| C *expected, C desired) |
| |
| // Requires OpenCL C 3.0 or newer and the __opencl_c_atomic_scope_device |
| // feature. |
| bool atomic_compare_exchange_strong_explicit( |
| volatile __global A *object, |
| __global C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure) |
| bool atomic_compare_exchange_strong_explicit( |
| volatile __global A *object, |
| __local C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure) |
| bool atomic_compare_exchange_strong_explicit( |
| volatile __global A *object, |
| __private C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure) |
| bool atomic_compare_exchange_strong_explicit( |
| volatile __local A *object, |
| __global C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure) |
| bool atomic_compare_exchange_strong_explicit( |
| volatile __local A *object, |
| __local C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure) |
| bool atomic_compare_exchange_strong_explicit( |
| volatile __local A *object, |
| __private C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and both the |
| // __opencl_c_generic_address_space and |
| // __opencl_c_atomic_scope_device features. |
| bool atomic_compare_exchange_strong_explicit( |
| volatile A *object, |
| C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure) |
| |
| // Requires OpenCL C 3.0 or newer. |
| bool atomic_compare_exchange_strong_explicit( |
| volatile __global A *object, |
| __global C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure, |
| memory_scope scope) |
| bool atomic_compare_exchange_strong_explicit( |
| volatile __global A *object, |
| __local C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure, |
| memory_scope scope) |
| bool atomic_compare_exchange_strong_explicit( |
| volatile __global A *object, |
| __private C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure, |
| memory_scope scope) |
| bool atomic_compare_exchange_strong_explicit( |
| volatile __local A *object, |
| __global C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure, |
| memory_scope scope) |
| bool atomic_compare_exchange_strong_explicit( |
| volatile __local A *object, |
| __local C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure, |
| memory_scope scope) |
| bool atomic_compare_exchange_strong_explicit( |
| volatile __local A *object, |
| __private C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure, |
| memory_scope scope) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and the |
| // __opencl_c_generic_address_space feature. |
| bool atomic_compare_exchange_strong_explicit( |
| volatile A *object, |
| C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure, |
| memory_scope scope) |
| |
| // Requires OpenCL C 3.0 or newer and both the __opencl_c_atomic_order_seq_cst |
| // and __opencl_c_atomic_scope_device features. |
| bool atomic_compare_exchange_weak( |
| volatile __global A *object, |
| __global C *expected, C desired) |
| bool atomic_compare_exchange_weak( |
| volatile __global A *object, |
| __local C *expected, C desired) |
| bool atomic_compare_exchange_weak( |
| volatile __global A *object, |
| __private C *expected, C desired) |
| bool atomic_compare_exchange_weak( |
| volatile __local A *object, |
| __global C *expected, C desired) |
| bool atomic_compare_exchange_weak( |
| volatile __local A *object, |
| __local C *expected, C desired) |
| bool atomic_compare_exchange_weak( |
| volatile __local A *object, |
| __private C *expected, C desired) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and all of the |
| // __opencl_c_generic_address_space, __opencl_c_atomic_order_seq_cst and |
| // __opencl_c_atomic_scope_device features. |
| bool atomic_compare_exchange_weak( |
| volatile A *object, |
| C *expected, C desired) |
| |
| // Requires OpenCL C 3.0 or newer and the __opencl_c_atomic_scope_device |
| // feature. |
| bool atomic_compare_exchange_weak_explicit( |
| volatile __global A *object, |
| __global C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure) |
| bool atomic_compare_exchange_weak_explicit( |
| volatile __global A *object, |
| __local C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure) |
| bool atomic_compare_exchange_weak_explicit( |
| volatile __global A *object, |
| __private C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure) |
| bool atomic_compare_exchange_weak_explicit( |
| volatile __local A *object, |
| __global C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure) |
| bool atomic_compare_exchange_weak_explicit( |
| volatile __local A *object, |
| __local C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure) |
| bool atomic_compare_exchange_weak_explicit( |
| volatile __local A *object, |
| __private C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and both the |
| // __opencl_c_generic_address_space and |
| // __opencl_c_atomic_scope_device features. |
| bool atomic_compare_exchange_weak_explicit( |
| volatile A *object, |
| C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure) |
| |
| // Requires OpenCL C 3.0 or newer. |
| bool atomic_compare_exchange_weak_explicit( |
| volatile __global A *object, |
| __global C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure, |
| memory_scope scope) |
| bool atomic_compare_exchange_weak_explicit( |
| volatile __global A *object, |
| __local C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure, |
| memory_scope scope) |
| bool atomic_compare_exchange_weak_explicit( |
| volatile __global A *object, |
| __private C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure, |
| memory_scope scope) |
| bool atomic_compare_exchange_weak_explicit( |
| volatile __local A *object, |
| __global C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure, |
| memory_scope scope) |
| bool atomic_compare_exchange_weak_explicit( |
| volatile __local A *object, |
| __local C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure, |
| memory_scope scope) |
| bool atomic_compare_exchange_weak_explicit( |
| volatile __local A *object, |
| __private C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure, |
| memory_scope scope) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and the |
| // __opencl_c_generic_address_space feature. |
| bool atomic_compare_exchange_weak_explicit( |
| volatile A *object, |
| C *expected, |
| C desired, |
| memory_order success, |
| memory_order failure, |
| memory_scope scope) |
| ---------- |
| |
| The `failure` argument shall not be `memory_order_release` nor |
| `memory_order_acq_rel`. |
| The `failure` argument shall be no stronger than the `success` argument. |
| Atomically compares the value pointed to by `object` for equality with the |
| value pointed to by `expected`; if the comparison is _true_, replaces the |
| value pointed to by `object` with `desired`, and if it is _false_, updates |
| the value pointed to by `expected` with the value pointed to by `object`. |
| Further, if the comparison is _true_, memory is affected according to the |
| value of `success`, and if the comparison is _false_, memory is affected |
| according to the value of `failure`. |
| If the comparison is _true_, these operations are atomic read-modify-write |
| operations (as defined by |
| <<C11-spec,section 5.1.2.4 of the C11 Specification>>). |
| Otherwise, these operations are atomic load operations. |
| |
| [NOTE] |
| ==== |
| The effect of the compare-and-exchange operations is equivalent to the |
| following, performed as a single atomic operation: |
| |
| [source,opencl_c] |
| ---------- |
| if (memcmp(object, expected, sizeof(*object)) == 0) { |
| memcpy(object, &desired, sizeof(*object)); |
| } else { |
| memcpy(expected, object, sizeof(*object)); |
| } |
| ---------- |
| ==== |
| |
| The weak compare-and-exchange operations may fail spuriously |
| footnote:[{fn-atomic-weak-rationale}]. |
| That is, even when the contents of memory referred to by `expected` and |
| `object` are equal, they may return `false` and store back to `expected` |
| the same memory contents that were originally there. |
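|
| A weak compare-and-exchange is typically used in a retry loop. |
| As a sketch (function and kernel names are hypothetical), the loop below |
| adds a value to an `atomic_float`, since the `atomic_fetch` and modify |
| functions are specified for atomic integer types. |
| For OpenCL C 3.0, the {opencl_c_atomic_scope_device} feature is assumed. |
|
| [source,opencl_c] |
| ---------- |
| float atomic_add_float(volatile global atomic_float *p, float v) |
| { |
|     float expected = atomic_load_explicit(p, memory_order_relaxed, |
|                                           memory_scope_device); |
|     // On failure, including a spurious one, expected holds the current |
|     // value, so the next iteration recomputes expected + v and retries. |
|     while (!atomic_compare_exchange_weak_explicit(p, &expected, |
|                                                   expected + v, |
|                                                   memory_order_relaxed, |
|                                                   memory_order_relaxed, |
|                                                   memory_scope_device)) { |
|     } |
|     return expected;   // the value before this addition |
| } |
|
| kernel void sum_floats(global const float *in, global atomic_float *total) |
| { |
|     atomic_add_float(total, in[get_global_id(0)]); |
| } |
| ---------- |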
| |
| These generic functions return the result of the comparison. |
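|
| Because the return value reports whether the replacement took place, a |
| strong compare-and-exchange can elect a single work-item. |
| In this sketch (names are hypothetical), the host is assumed to initialize |
| `*owner` to -1 and, for OpenCL C 3.0, the {opencl_c_atomic_scope_device} |
| feature is assumed. |
| The strong form is used because a spurious failure of the weak form would |
| wrongly report that a work-item lost, possibly leaving no owner at all. |
|
| [source,opencl_c] |
| ---------- |
| kernel void elect_owner(global atomic_int *owner, global int *is_owner) |
| { |
|     int gid = (int)get_global_id(0); |
|     int expected = -1; |
|
|     // Exactly one work-item replaces -1 with its ID; each of the others |
|     // fails and finds the winner's ID in expected. |
|     bool won = atomic_compare_exchange_strong_explicit( |
|         owner, &expected, gid, |
|         memory_order_relaxed, memory_order_relaxed, memory_scope_device); |
|
|     is_owner[gid] = won ? 1 : 0; |
| } |
| ---------- |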
| |
| NOTE: The non-explicit `atomic_compare_exchange_strong` and |
| `atomic_compare_exchange_weak` functions <<unified-spec, require>> support |
| for OpenCL C 2.0, or OpenCL C 3.0 or newer and both the |
| {opencl_c_atomic_order_seq_cst} and {opencl_c_atomic_scope_device} |
| features. |
| For the explicit variants, memory order and scope enumerations must respect the |
| <<atomic-restrictions,restrictions section below>>. |
| |
| NOTE: The function variants that use the generic address space, i.e. no |
| explicit address space is listed, <<unified-spec, require>> support for OpenCL |
| C 2.0, or OpenCL C 3.0 or newer and the {opencl_c_generic_address_space} |
| feature. |
| -- |
| |
| |
| [[atomic_fetch_key]] |
| ===== *The atomic_fetch and modify Functions* |
| |
| [open,refpage='atomic_fetch_key',desc='The atomic_fetch and modify Functions',type='freeform',spec='clang',anchor='atomic_fetch_key',xrefs='atomicFunctions atomicTypes atomic_compare_exchange atomic_exchange atomic_fetch_key atomic_flag atomic_flag_clear atomic_flag_test_and_set atomic_flag_test_and_set_explicit atomic_init atomic_load atomic_store atomic_work_item_fence'] |
| -- |
| The following operations perform arithmetic and bitwise computations. |
| All of these operations are applicable to an object of any atomic integer |
| type. |
| The correspondence between key, operator, and computation is given in the |
| table below: |
| |
| [cols=",,",options="header",] |
| |==== |
| | *key* | *op* | *computation* |
| | `add` | *+* | addition |
| | `sub` | *-* | subtraction |
| | `or` | *\|* | bitwise inclusive or |
| | `xor` | *^* | bitwise exclusive or |
| | `and` | *&* | bitwise and |
| | `min` | *min* | minimum |
| | `max` | *max* | maximum |
| |==== |
| |
| [NOTE] |
| ==== |
| For *atomic_fetch* and modify functions with *key* = `add` or `sub` on |
| atomic types `atomic_intptr_t` and `atomic_uintptr_t`, `M` is `ptrdiff_t`. |
| For *atomic_fetch* and modify functions with *key* = `or`, `xor`, `and`, |
| `min` and `max` on atomic type `atomic_intptr_t`, `M` is `intptr_t`, |
| and on atomic type `atomic_uintptr_t`, `M` is `uintptr_t`. |
| ==== |
| |
| [source,opencl_c] |
| ---------- |
| // Requires OpenCL C 3.0 or newer and both the __opencl_c_atomic_order_seq_cst |
| // and __opencl_c_atomic_scope_device features. |
| C atomic_fetch_key(volatile __global A *object, M operand) |
| C atomic_fetch_key(volatile __local A *object, M operand) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and all of the |
| // __opencl_c_generic_address_space, __opencl_c_atomic_order_seq_cst and |
| // __opencl_c_atomic_scope_device features. |
| C atomic_fetch_key(volatile A *object, M operand) |
| |
| // Requires OpenCL C 3.0 or newer and the __opencl_c_atomic_scope_device feature. |
| C atomic_fetch_key_explicit(volatile __global A *object, |
| M operand, |
| memory_order order) |
| C atomic_fetch_key_explicit(volatile __local A *object, |
| M operand, |
| memory_order order) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and both the |
| // __opencl_c_generic_address_space and __opencl_c_atomic_scope_device |
| // features. |
| C atomic_fetch_key_explicit(volatile A *object, |
| M operand, |
| memory_order order) |
| |
| // Requires OpenCL C 3.0 or newer. |
| C atomic_fetch_key_explicit(volatile __global A *object, |
| M operand, |
| memory_order order, |
| memory_scope scope) |
| C atomic_fetch_key_explicit(volatile __local A *object, |
| M operand, |
| memory_order order, |
| memory_scope scope) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and the |
| // __opencl_c_generic_address_space feature. |
| C atomic_fetch_key_explicit(volatile A *object, |
| M operand, |
| memory_order order, |
| memory_scope scope) |
| ---------- |
| |
| Atomically replaces the value pointed to by `object` with the result of the |
| computation applied to the value pointed to by `object` and the given |
| operand. |
| Memory is affected according to the value of `order`. |
| These operations are atomic read-modify-write operations (as defined by |
| <<C11-spec,section 5.1.2.4 of the C11 Specification>>). |
| For signed integer types, arithmetic is defined to use two's complement |
| representation with silent wrap-around on overflow; there are no undefined |
| results. |
| For address types, the result may be an undefined address, but the |
| operations otherwise have no undefined behavior. |
| Atomically returns the value pointed to by `object` immediately before the |
| effects. |
| |
| NOTE: The non-explicit `atomic_fetch_key` functions <<unified-spec, require>> |
| support for OpenCL C 2.0, or OpenCL C 3.0 or newer and both the |
| {opencl_c_atomic_order_seq_cst} and {opencl_c_atomic_scope_device} |
| features. |
| For the explicit variants, memory order and scope enumerations must respect the |
| <<atomic-restrictions,restrictions section below>>. |
| |
| NOTE: The function variants that use the generic address space, i.e. those |
| for which no explicit address space is listed, <<unified-spec, require>> |
| support for OpenCL C 2.0, or OpenCL C 3.0 or newer and the |
| {opencl_c_generic_address_space} feature. |
| -- |
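| |
| As an illustration only, the following sketch uses `atomic_fetch_add_explicit` |
| to build a histogram. |
| The kernel name and its arguments are hypothetical. The sketch assumes |
| OpenCL C 2.0, or OpenCL C 3.0 or newer with the {opencl_c_atomic_scope_device} |
| feature. It also assumes the host zero-fills `bins` before the kernel is |
| enqueued. |
| |
| [source,opencl_c] |
| ---------- |
| // Hypothetical kernel: each work-item increments the bin selected by its |
| // input value. Assumes num_bins > 0 and that bins starts out zero-filled. |
| kernel void histogram(global const uint *data, |
|                       global atomic_uint *bins, |
|                       uint num_bins) |
| { |
|     uint bin = data[get_global_id(0)] % num_bins; |
| |
|     // Relaxed order suffices: the update orders no other memory accesses, |
|     // and the counts are only read after the kernel has completed. |
|     atomic_fetch_add_explicit(&bins[bin], 1u, |
|                               memory_order_relaxed, |
|                               memory_scope_device); |
| } |
| ---------- |
| |
| The value returned by `atomic_fetch_add_explicit` is the bin count before the |
| increment, and it is discarded here. If it were used as an index instead, the |
| same call would act as a slot allocator. |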
| |
| |
| [[atomic_flag]] |
| ===== *Atomic Flag Type and Operations* |
| |
| [open,refpage='atomic_flag',desc='Atomic Flag Type and Operations',type='freeform',spec='clang',anchor='atomic_flag',xrefs='atomicFunctions atomicTypes atomic_compare_exchange atomic_exchange atomic_fetch_key atomic_flag atomic_flag_clear atomic_flag_test_and_set atomic_flag_test_and_set_explicit atomic_init atomic_load atomic_store atomic_work_item_fence'] |
| -- |
| The `atomic_flag` type provides the classic test-and-set functionality. |
| It has two states, _set_ (value is non-zero) and _clear_ (value is 0). |
| |
| In OpenCL C 2.0, operations on an object of type `atomic_flag` shall be |
| lock-free; in OpenCL C 3.0 or newer, they may be lock-free. |
| |
| The macro `ATOMIC_FLAG_INIT` may be used to initialize an `atomic_flag` to the |
| _clear_ state. |
| An `atomic_flag` that is not explicitly initialized with `ATOMIC_FLAG_INIT` is |
| initially in an indeterminate state. |
| |
| This macro can only be used to initialize `atomic_flag` objects declared at |
| program scope in the `global` address space. |
| |
| Example: |
| |
| [source,opencl_c] |
| ---------- |
| global atomic_flag guard = ATOMIC_FLAG_INIT; |
| ---------- |
| -- |
| |
| |
| [[atomic_flag_test_and_set]] |
| ===== *The atomic_flag_test_and_set Functions* |
| |
| [open,refpage='atomicFlagTestAndSet',desc='The atomic_flag_test_and_set Functions',type='freeform',spec='clang',anchor='atomic_flag_test_and_set',xrefs='atomicFunctions atomicTypes atomic_compare_exchange atomic_exchange atomic_fetch_key atomic_flag atomic_flag_clear atomic_init atomic_load atomic_store atomic_work_item_fence',alias='atomic_flag_test_and_set atomic_flag_test_and_set_explicit'] |
| -- |
| [source,opencl_c] |
| ---------- |
| // Requires OpenCL C 3.0 or newer and both the __opencl_c_atomic_order_seq_cst |
| // and __opencl_c_atomic_scope_device features. |
| bool atomic_flag_test_and_set( |
| volatile __global atomic_flag *object) |
| bool atomic_flag_test_and_set( |
| volatile __local atomic_flag *object) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and all of the |
| // __opencl_c_generic_address_space, __opencl_c_atomic_order_seq_cst and |
| // __opencl_c_atomic_scope_device features. |
| bool atomic_flag_test_and_set( |
| volatile atomic_flag *object) |
| |
| // Requires OpenCL C 3.0 or newer and the __opencl_c_atomic_scope_device |
| // feature. |
| bool atomic_flag_test_and_set_explicit( |
| volatile __global atomic_flag *object, |
| memory_order order) |
| bool atomic_flag_test_and_set_explicit( |
| volatile __local atomic_flag *object, |
| memory_order order) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and both the |
| // __opencl_c_generic_address_space and __opencl_c_atomic_scope_device |
| // features. |
| bool atomic_flag_test_and_set_explicit( |
| volatile atomic_flag *object, |
| memory_order order) |
| |
| // Requires OpenCL C 3.0 or newer. |
| bool atomic_flag_test_and_set_explicit( |
| volatile __global atomic_flag *object, |
| memory_order order, |
| memory_scope scope) |
| bool atomic_flag_test_and_set_explicit( |
| volatile __local atomic_flag *object, |
| memory_order order, |
| memory_scope scope) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and the |
| // __opencl_c_generic_address_space feature. |
| bool atomic_flag_test_and_set_explicit( |
| volatile atomic_flag *object, |
| memory_order order, |
| memory_scope scope) |
| ---------- |
| |
| Atomically sets the value pointed to by `object` to _true_. |
| Memory is affected according to the value of `order`. |
| These operations are atomic read-modify-write operations (as defined by |
| <<C11-spec,section 5.1.2.4 of the C11 Specification>>). |
| Atomically returns the value of `object` immediately before the effects. |
| |
| NOTE: The non-explicit `atomic_flag_test_and_set` function <<unified-spec, |
| requires>> support for OpenCL C 2.0, or OpenCL C 3.0 or newer and both the |
| {opencl_c_atomic_order_seq_cst} and {opencl_c_atomic_scope_device} |
| features. |
| For the explicit variants, memory order and scope enumerations must respect the |
| <<atomic-restrictions,restrictions section below>>. |
| |
| NOTE: The function variants that use the generic address space, i.e. those |
| for which no explicit address space is listed, <<unified-spec, require>> |
| support for OpenCL C 2.0, or OpenCL C 3.0 or newer and the |
| {opencl_c_generic_address_space} feature. |
| -- |
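| |
| For example, `atomic_flag_test_and_set_explicit` can elect a single |
| work-item, because only the call that finds the flag _clear_ returns `false`. |
| The sketch below uses hypothetical names and a program-scope flag. It |
| therefore assumes OpenCL C 2.0, or OpenCL C 3.0 or newer with the |
| `+__opencl_c_program_scope_global_variables+` and |
| {opencl_c_atomic_scope_device} features. |
| |
| [source,opencl_c] |
| ---------- |
| // Program-scope flag, initially clear. It keeps its state across kernel |
| // launches, so only the first caller ever wins unless it is cleared again. |
| global atomic_flag elected = ATOMIC_FLAG_INIT; |
| |
| // Hypothetical kernel: the winning work-item records its global ID. |
| kernel void elect_one(global int *winner) |
| { |
|     // A return value of false means this call changed the flag from clear |
|     // to set, which only one caller can do while the flag stays set. |
|     if (!atomic_flag_test_and_set_explicit(&elected, |
|                                            memory_order_relaxed, |
|                                            memory_scope_device)) { |
|         *winner = (int)get_global_id(0); |
|     } |
| } |
| ---------- |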
| |
| |
| [[atomic_flag_clear]] |
| ===== *The atomic_flag_clear Functions* |
| |
| [open,refpage='atomic_flag_clear',desc='The atomic_flag_clear Functions',type='freeform',spec='clang',anchor='atomic_flag_clear',xrefs='atomicFunctions atomicTypes atomic_compare_exchange atomic_exchange atomic_fetch_key atomic_flag atomic_flag_clear atomic_flag_test_and_set atomic_flag_test_and_set_explicit atomic_init atomic_load atomic_store atomic_work_item_fence'] |
| -- |
| [source,opencl_c] |
| ---------- |
| // Requires OpenCL C 3.0 or newer and both the __opencl_c_atomic_order_seq_cst |
| // and __opencl_c_atomic_scope_device features. |
| void atomic_flag_clear(volatile __global atomic_flag *object) |
| void atomic_flag_clear(volatile __local atomic_flag *object) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and all of the |
| // __opencl_c_generic_address_space, __opencl_c_atomic_order_seq_cst and |
| // __opencl_c_atomic_scope_device features. |
| void atomic_flag_clear(volatile atomic_flag *object) |
| |
| // Requires OpenCL C 3.0 or newer and the __opencl_c_atomic_scope_device |
| // feature. |
| void atomic_flag_clear_explicit( |
| volatile __global atomic_flag *object, |
| memory_order order) |
| void atomic_flag_clear_explicit( |
| volatile __local atomic_flag *object, |
| memory_order order) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and both the |
| // __opencl_c_generic_address_space and __opencl_c_atomic_scope_device |
| // features. |
| void atomic_flag_clear_explicit( |
| volatile atomic_flag *object, |
| memory_order order) |
| |
| // Requires OpenCL C 3.0 or newer. |
| void atomic_flag_clear_explicit( |
| volatile __global atomic_flag *object, |
| memory_order order, |
| memory_scope scope) |
| void atomic_flag_clear_explicit( |
| volatile __local atomic_flag *object, |
| memory_order order, |
| memory_scope scope) |
| |
| // Requires OpenCL C 2.0, or OpenCL C 3.0 or newer and the |
| // __opencl_c_generic_address_space feature. |
| void atomic_flag_clear_explicit( |
| volatile atomic_flag *object, |
| memory_order order, |
| memory_scope scope) |
| ---------- |
| |
| The `order` argument shall be neither `memory_order_acquire` nor |
| `memory_order_acq_rel`. |
| Atomically sets the value pointed to by `object` to _false_. |
| Memory is affected according to the value of `order`. |
| |
| NOTE: The non-explicit `atomic_flag_clear` function <<unified-spec, requires>> |
| support for OpenCL C 2.0, or OpenCL C 3.0 or newer and both the |
| {opencl_c_atomic_order_seq_cst} and {opencl_c_atomic_scope_device} |
| features. |
| For the explicit variants, memory order and scope enumerations must respect the |
| <<atomic-restrictions,restrictions section below>>. |
| |
| NOTE: The function variants that use the generic address space, i.e. those |
| for which no explicit address space is listed, <<unified-spec, require>> |
| support for OpenCL C 2.0, or OpenCL C 3.0 or newer and the |
| {opencl_c_generic_address_space} feature. |
| -- |
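| |
| `ATOMIC_FLAG_INIT` can only initialize program-scope flags in the `global` |
| address space, so a flag in the `local` address space must be put into the |
| _clear_ state explicitly. |
| In the following sketch, one work-item clears a `local` flag before a |
| barrier. After the barrier, exactly one work-item per work-group wins the |
| test-and-set. The kernel name and argument are hypothetical. |
| The sketch assumes OpenCL C 2.0 or newer. It uses only `memory_order_relaxed` |
| and `memory_scope_work_group`, and the <<atomic-restrictions,restrictions>> |
| do not make either of them depend on an optional feature. |
| |
| [source,opencl_c] |
| ---------- |
| // Hypothetical kernel: elects one work-item per work-group. |
| kernel void elect_per_group(global int *winners) |
| { |
|     local atomic_flag flag;  // local flags cannot use ATOMIC_FLAG_INIT |
| |
|     // One work-item puts the flag into the clear state... |
|     if (get_local_id(0) == 0) { |
|         atomic_flag_clear_explicit(&flag, memory_order_relaxed, |
|                                    memory_scope_work_group); |
|     } |
|     // ...and the barrier makes that state visible to the whole work-group. |
|     barrier(CLK_LOCAL_MEM_FENCE); |
| |
|     // Exactly one work-item finds the flag clear, gets false, and wins. |
|     if (!atomic_flag_test_and_set_explicit(&flag, memory_order_relaxed, |
|                                            memory_scope_work_group)) { |
|         winners[get_group_id(0)] = (int)get_local_id(0); |
|     } |
| } |
| ---------- |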
| |
| |
| [[atomic-legacy]] |
| ==== OpenCL C 1.x Legacy Atomics |
| |
| IMPORTANT: The atomic functions described in this sub-section <<unified-spec, |
| require>> support for OpenCL C 1.1 or newer, and are <<unified-spec, |
| deprecated by>> OpenCL C 2.0. |
| |
| OpenCL C 1.x had support for relaxed atomic operations via built-in functions |
| that could operate on any memory address in the `{global}` or `{local}` address |
| spaces. |
| Unlike C11-style atomics, these did not require dedicated atomic types. |
| Instead, they operated on 32-bit signed integers and 32-bit unsigned integers, |
| and, in the case of **atomic_xchg** only, on single-precision floating-point |
| values. |
| These were equivalent to atomic operations with `memory_order_relaxed` |
| consistency and `memory_scope_work_group` scope. |
| |
| NOTE: Some implementations may implement legacy atomics with a stricter memory |
| consistency order than `memory_order_relaxed` or a broader scope than |
| `memory_scope_work_group`. |
| This is permitted because stricter orders and broader scopes fully satisfy the |
| semantics of these minimum requirements. |
| |
| // Copied from table 6.19 in OpenCL 1.2 spec |
| [[table-legacy-atomic-functions]] |
| .Legacy Atomic Functions |
| [cols=",",options="header",] |
| |==== |
| | Function | Description |
| | int **atomic_add**(volatile {global} int *_p_, int _val_) + |
| int **atom_add**(volatile {global} int *_p_, int _val_) |
| |
| uint **atomic_add**(volatile {global} uint *_p_, uint _val_) + |
| uint **atom_add**(volatile {global} uint *_p_, uint _val_) |
| |
| int **atomic_add**(volatile {local} int *_p_, int _val_) + |
| int **atom_add**(volatile {local} int *_p_, int _val_) |
| |
| uint **atomic_add**(volatile {local} uint *_p_, uint _val_) + |
| uint **atom_add**(volatile {local} uint *_p_, uint _val_) |
| | Read the 32-bit value (referred to as _old_) stored at the location pointed |
| to by _p_. Compute (_old_ + _val_) and store the result at the location |
| pointed to by _p_. The function returns _old_. |
| | int **atomic_sub**(volatile {global} int *_p_, int _val_) + |
| int **atom_sub**(volatile {global} int *_p_, int _val_) |
| |
| uint **atomic_sub**(volatile {global} uint *_p_, uint _val_) + |
| uint **atom_sub**(volatile {global} uint *_p_, uint _val_) |
| |
| int **atomic_sub**(volatile {local} int *_p_, int _val_) + |
| int **atom_sub**(volatile {local} int *_p_, int _val_) |
| |
| uint **atomic_sub**(volatile {local} uint *_p_, uint _val_) + |
| uint **atom_sub**(volatile {local} uint *_p_, uint _val_) |
| | Read the 32-bit value (referred to as _old_) stored at the location pointed |
| to by _p_. Compute (_old_ - _val_) and store the result at the location |
| pointed to by _p_. The function returns _old_. |
| | int **atomic_xchg**(volatile {global} int *_p_, int _val_) + |
| int **atom_xchg**(volatile {global} int *_p_, int _val_) |
| |
| uint **atomic_xchg**(volatile {global} uint *_p_, uint _val_) + |
| uint **atom_xchg**(volatile {global} uint *_p_, uint _val_) |
| |
| float **atomic_xchg**(volatile {global} float *_p_, float _val_) |
| |
| int **atomic_xchg**(volatile {local} int *_p_, int _val_) + |
| int **atom_xchg**(volatile {local} int *_p_, int _val_) |
| |
| uint **atomic_xchg**(volatile {local} uint *_p_, uint _val_) + |
| uint **atom_xchg**(volatile {local} uint *_p_, uint _val_) |
| |
| float **atomic_xchg**(volatile {local} float *_p_, float _val_) |
| | Swaps the _old_ value stored at location _p_ with the new value given by |
| _val_. Returns the _old_ value. |
| | int **atomic_inc**(volatile {global} int *_p_) + |
| int **atom_inc**(volatile {global} int *_p_) |
| |
| uint **atomic_inc**(volatile {global} uint *_p_) + |
| uint **atom_inc**(volatile {global} uint *_p_) |
| |
| int **atomic_inc**(volatile {local} int *_p_) + |
| int **atom_inc**(volatile {local} int *_p_) |
| |
| uint **atomic_inc**(volatile {local} uint *_p_) + |
| uint **atom_inc**(volatile {local} uint *_p_) |
| | Read the 32-bit value (referred to as _old_) stored at the location pointed |
| to by _p_. Compute (_old_ + 1) and store the result at the location pointed |
| to by _p_. The function returns _old_. |
| | int **atomic_dec**(volatile {global} int *_p_) + |
| int **atom_dec**(volatile {global} int *_p_) |
| |
| uint **atomic_dec**(volatile {global} uint *_p_) + |
| uint **atom_dec**(volatile {global} uint *_p_) |
| |
| int **atomic_dec**(volatile {local} int *_p_) + |
| int **atom_dec**(volatile {local} int *_p_) |
| |
| uint **atomic_dec**(volatile {local} uint *_p_) + |
| uint **atom_dec**(volatile {local} uint *_p_) |
| | Read the 32-bit value (referred to as _old_) stored at the location pointed |
| to by _p_. Compute (_old_ - 1) and store the result at the location pointed |
| to by _p_. The function returns _old_. |
| | int **atomic_cmpxchg**(volatile {global} int *_p_, int _cmp_, int _val_) + |
| int **atom_cmpxchg**(volatile {global} int *_p_, int _cmp_, int _val_) |
| |
| uint **atomic_cmpxchg**(volatile {global} uint *_p_, uint _cmp_, uint _val_) + |
| uint **atom_cmpxchg**(volatile {global} uint *_p_, uint _cmp_, uint _val_) |
| |
| int **atomic_cmpxchg**(volatile {local} int *_p_, int _cmp_, int _val_) + |
| int **atom_cmpxchg**(volatile {local} int *_p_, int _cmp_, int _val_) |
| |
| uint **atomic_cmpxchg**(volatile {local} uint *_p_, uint _cmp_, uint _val_) + |
| uint **atom_cmpxchg**(volatile {local} uint *_p_, uint _cmp_, uint _val_) |
| | Read the 32-bit value (referred to as _old_) stored at the location pointed |
| to by _p_. Compute (_old_ == _cmp_) ? _val_ : _old_ and store the result at |
| the location pointed to by _p_. The function returns _old_. |
| | int **atomic_min**(volatile {global} int *_p_, int _val_) + |
| int **atom_min**(volatile {global} int *_p_, int _val_) |
| |
| uint **atomic_min**(volatile {global} uint *_p_, uint _val_) + |
| uint **atom_min**(volatile {global} uint *_p_, uint _val_) |
| |
| int **atomic_min**(volatile {local} int *_p_, int _val_) + |
| int **atom_min**(volatile {local} int *_p_, int _val_) |
| |
| uint **atomic_min**(volatile {local} uint *_p_, uint _val_) + |
| uint **atom_min**(volatile {local} uint *_p_, uint _val_) |
| | Read the 32-bit value (referred to as _old_) stored at the location pointed |
| to by _p_. Compute **min**(_old_, _val_) and store the minimum value at the |
| location pointed to by _p_. The function returns _old_. |
| | int **atomic_max**(volatile {global} int *_p_, int _val_) + |
| int **atom_max**(volatile {global} int *_p_, int _val_) |
| |
| uint **atomic_max**(volatile {global} uint *_p_, uint _val_) + |
| uint **atom_max**(volatile {global} uint *_p_, uint _val_) |
| |
| int **atomic_max**(volatile {local} int *_p_, int _val_) + |
| int **atom_max**(volatile {local} int *_p_, int _val_) |
| |
| uint **atomic_max**(volatile {local} uint *_p_, uint _val_) + |
| uint **atom_max**(volatile {local} uint *_p_, uint _val_) |
| | Read the 32-bit value (referred to as _old_) stored at the location pointed |
| to by _p_. Compute **max**(_old_, _val_) and store the maximum value at the |
| location pointed to by _p_. The function returns _old_. |
| | int **atomic_and**(volatile {global} int *_p_, int _val_) + |
| int **atom_and**(volatile {global} int *_p_, int _val_) |
| |
| uint **atomic_and**(volatile {global} uint *_p_, uint _val_) + |
| uint **atom_and**(volatile {global} uint *_p_, uint _val_) |
| |
| int **atomic_and**(volatile {local} int *_p_, int _val_) + |
| int **atom_and**(volatile {local} int *_p_, int _val_) |
| |
| uint **atomic_and**(volatile {local} uint *_p_, uint _val_) + |
| uint **atom_and**(volatile {local} uint *_p_, uint _val_) |
| | Read the 32-bit value (referred to as _old_) stored at the location pointed |
| to by _p_. Compute (_old_ & _val_) and store the result at the location |
| pointed to by _p_. The function returns _old_. |
| | int **atomic_or**(volatile {global} int *_p_, int _val_) + |
| int **atom_or**(volatile {global} int *_p_, int _val_) |
| |
| uint **atomic_or**(volatile {global} uint *_p_, uint _val_) + |
| uint **atom_or**(volatile {global} uint *_p_, uint _val_) |
| |
| int **atomic_or**(volatile {local} int *_p_, int _val_) + |
| int **atom_or**(volatile {local} int *_p_, int _val_) |
| |
| uint **atomic_or**(volatile {local} uint *_p_, uint _val_) + |
| uint **atom_or**(volatile {local} uint *_p_, uint _val_) |
| | Read the 32-bit value (referred to as _old_) stored at the location pointed |
| to by _p_. Compute (_old_ \| _val_) and store the result at the location |
| pointed to by _p_. The function returns _old_. |
| | int **atomic_xor**(volatile {global} int *_p_, int _val_) + |
| int **atom_xor**(volatile {global} int *_p_, int _val_) |
| |
| uint **atomic_xor**(volatile {global} uint *_p_, uint _val_) + |
| uint **atom_xor**(volatile {global} uint *_p_, uint _val_) |
| |
| int **atomic_xor**(volatile {local} int *_p_, int _val_) + |
| int **atom_xor**(volatile {local} int *_p_, int _val_) |
| |
| uint **atomic_xor**(volatile {local} uint *_p_, uint _val_) + |
| uint **atom_xor**(volatile {local} uint *_p_, uint _val_) |
| | Read the 32-bit value (referred to as _old_) stored at the location pointed |
| to by _p_. Compute (_old_ ^ _val_) and store the result at the location |
| pointed to by _p_. The function returns _old_. |
| |==== |
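| |
| The legacy functions can be combined to build operations that the table does |
| not provide. |
| The following sketch assumes OpenCL C 1.1 or newer. A hypothetical helper, |
| `local_atomic_add_float`, emulates an atomic single-precision add on `local` |
| memory. It runs an **atomic_cmpxchg** loop on the value's bit pattern. A |
| hypothetical kernel then uses the helper to sum one value per work-item within |
| each work-group. |
| |
| [source,opencl_c] |
| ---------- |
| // Hypothetical helper (not a built-in function): atomically adds val to *p |
| // and returns the previous value. |
| float local_atomic_add_float(volatile local float *p, float val) |
| { |
|     uint observed = as_uint(*p); |
|     uint expected; |
|     do { |
|         expected = observed; |
|         uint desired = as_uint(as_float(expected) + val); |
|         // Stores desired only if *p still holds the bits read earlier; |
|         // otherwise retries with the value another work-item stored. |
|         observed = atomic_cmpxchg((volatile local uint *)p, expected, desired); |
|     } while (observed != expected); |
|     return as_float(expected); |
| } |
| |
| kernel void group_sum(global const float *in, global float *out) |
| { |
|     local float sum; |
|     if (get_local_id(0) == 0) |
|         sum = 0.0f; |
|     barrier(CLK_LOCAL_MEM_FENCE); |
| |
|     local_atomic_add_float(&sum, in[get_global_id(0)]); |
| |
|     barrier(CLK_LOCAL_MEM_FENCE); |
|     if (get_local_id(0) == 0) |
|         out[get_group_id(0)] = sum; |
| } |
| ---------- |
| |
| The order in which work-items add their values is unspecified, so the |
| rounded per-group sum may differ from run to run. |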
| |
| ifdef::cl_khr_global_int32_base_atomics,cl_khr_global_int32_extended_atomics,cl_khr_local_int32_base_atomics,cl_khr_local_int32_extended_atomics[] |
| A subset of the atomic functions described above is also supported in |
| OpenCL C 1.0 when the corresponding OpenCL extension macros are supported, as |
| described in the <<table-atomic-function-extensions, Atomic Function |
| Extensions>> table below. |
| |
| [[table-atomic-function-extensions]] |
| .Atomic Function Extensions |
| [cols=",",options="header",] |
| |==== |
| | Extension Macro | Supported Functions |
| ifdef::cl_khr_global_int32_base_atomics[] |
| | `<<cl_khr_global_int32_base_atomics>>` |
| | **atom_add** + |
| **atom_sub** + |
| **atom_xchg** + |
| **atom_inc** + |
| **atom_dec** + |
| **atom_cmpxchg** + |
| (with {global} parameters) |
| endif::cl_khr_global_int32_base_atomics[] |
| ifdef::cl_khr_global_int32_extended_atomics[] |
| | `<<cl_khr_global_int32_extended_atomics>>` |
| | **atom_min** + |
| **atom_max** + |
| **atom_and** + |
| **atom_or** + |
| **atom_xor** + |
| (with {global} parameters) |
| endif::cl_khr_global_int32_extended_atomics[] |
| ifdef::cl_khr_local_int32_base_atomics[] |
| | `<<cl_khr_local_int32_base_atomics>>` |
| | **atom_add** + |
| **atom_sub** + |
| **atom_xchg** + |
| **atom_inc** + |
| **atom_dec** + |
| **atom_cmpxchg** + |
| (with {local} parameters) |
| endif::cl_khr_local_int32_base_atomics[] |
| ifdef::cl_khr_local_int32_extended_atomics[] |
| | `<<cl_khr_local_int32_extended_atomics>>` |
| | **atom_min** + |
| **atom_max** + |
| **atom_and** + |
| **atom_or** + |
| **atom_xor** + |
| (with {local} parameters) |
| endif::cl_khr_local_int32_extended_atomics[] |
| |==== |
| endif::cl_khr_global_int32_base_atomics,cl_khr_global_int32_extended_atomics,cl_khr_local_int32_base_atomics,cl_khr_local_int32_extended_atomics[] |
| |
| |
| ifdef::cl_khr_int64_base_atomics,cl_khr_int64_extended_atomics[] |
| [[atomic-legacy-int64]] |
| ==== Legacy 64-Bit Atomic Extensions |
| |
| Similar to the <<atomic-legacy, OpenCL C 1.x Legacy Atomics>>, atomic |
| functions operating on 64-bit integers are provided by extensions. |
| |
| ifdef::cl_khr_int64_base_atomics[] |
| If the `<<cl_khr_int64_base_atomics>>` extension macro is supported, the |
| extension provides the functions described in the |
| <<table-atomic-int64-base, Built-in 64-Bit Base Atomic Functions>> table below. |
| |
| [[table-atomic-int64-base]] |
| .Built-in 64-Bit Base Atomic Functions |
| [cols="9,5",options="header",] |
| |==== |
| |*Function* |*Description* |
| | long **atom_add** (volatile {global} long *_p_, long _val_) + |
| long **atom_add** (volatile {local} long *_p_, long _val_) |
| |
| ulong **atom_add** (volatile {global} ulong *_p_, ulong _val_) + |
| ulong **atom_add** (volatile {local} ulong *_p_, ulong _val_) |
| | Read the 64-bit value (referred to as _old_) stored at the location |
| pointed to by _p_. |
| Compute (_old_ + _val_) and store the result at the location pointed to by _p_. |
| The function returns _old_. |
| | long **atom_sub** (volatile {global} long *_p_, long _val_) + |
| long **atom_sub** (volatile {local} long *_p_, long _val_) |
| |
| ulong **atom_sub** (volatile {global} ulong *_p_, ulong _val_) + |
| ulong **atom_sub** (volatile {local} ulong *_p_, ulong _val_) |
| | Read the 64-bit value (referred to as _old_) stored at the location |
| pointed to by _p_. |
| Compute (_old_ - _val_) and store the result at the location pointed to by _p_. |
| The function returns _old_. |
| | long **atom_xchg** (volatile {global} long *_p_, long _val_) + |
| long **atom_xchg** (volatile {local} long *_p_, long _val_) |
| |
| ulong **atom_xchg** (volatile {global} ulong *_p_, ulong _val_) + |
| ulong **atom_xchg** (volatile {local} ulong *_p_, ulong _val_) |
| | Swaps the _old_ value stored at location _p_ with the new value given by |
| _val_. |
| Returns the _old_ value. |
| | long **atom_inc** (volatile {global} long *_p_) + |
| long **atom_inc** (volatile {local} long *_p_) |
| |
| ulong **atom_inc** (volatile {global} ulong *_p_) + |
| ulong **atom_inc** (volatile {local} ulong *_p_) |
| | Read the 64-bit value (referred to as _old_) stored at the location |
| pointed to by _p_. |
| Compute (_old_ + 1) and store the result at the location pointed to by _p_. |
| The function returns _old_. |
| | long **atom_dec** (volatile {global} long *_p_) + |
| long **atom_dec** (volatile {local} long *_p_) |
| |
| ulong **atom_dec** (volatile {global} ulong *_p_) + |
| ulong **atom_dec** (volatile {local} ulong *_p_) |
| | Read the 64-bit value (referred to as _old_) stored at the location |
| pointed to by _p_. |
| Compute (_old_ - 1) and store the result at the location pointed to by _p_. |
| The function returns _old_. |
| | long **atom_cmpxchg** (volatile {global} long *_p_, long _cmp_, long _val_) + |
| long **atom_cmpxchg** (volatile {local} long *_p_, long _cmp_, long _val_) |
| |
| ulong **atom_cmpxchg** (volatile {global} ulong *_p_, ulong _cmp_, ulong _val_) + |
| ulong **atom_cmpxchg** (volatile {local} ulong *_p_, ulong _cmp_, ulong _val_) |
| | Read the 64-bit value (referred to as _old_) stored at the location |
| pointed to by _p_. |
| Compute (_old_ == _cmp_) ? _val_ : _old_ and store the result at the location |
| pointed to by _p_. |
| The function returns _old_. |
| |==== |
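| |
| The following sketch assumes a device that supports |
| `<<cl_khr_int64_base_atomics>>`. Its hypothetical kernel uses **atom_add** to |
| accumulate 32-bit inputs into a single 64-bit total, so the running sum is not |
| limited to 32 bits. |
| The `#pragma OPENCL EXTENSION` directive is the customary way to enable |
| extension built-in functions. Whether it is strictly required may depend on |
| the OpenCL C version and the implementation. |
| |
| [source,opencl_c] |
| ---------- |
| #pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable |
| |
| // Hypothetical kernel: assumes the host zero-fills *total before launch. |
| kernel void sum_to_u64(global const uint *in, volatile global ulong *total) |
| { |
|     atom_add(total, (ulong)in[get_global_id(0)]); |
| } |
| ---------- |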
| |
| endif::cl_khr_int64_base_atomics[] |
| |
| ifdef::cl_khr_int64_extended_atomics[] |
| If the `<<cl_khr_int64_extended_atomics>>` extension macro is supported, the |
| extension provides the functions described in the |
| <<table-atomic-int64-extended, Built-in 64-Bit Extended Atomic Functions>> |
| table below. |
| |
| [[table-atomic-int64-extended]] |
| .Built-in 64-Bit Extended Atomic Functions |
| [cols=",",options="header",] |
| |==== |
| |*Function* |*Description* |
| | long **atom_min** (volatile {global} long *_p_, long _val_) + |
| long **atom_min** (volatile {local} long *_p_, long _val_) |
| |
| ulong **atom_min** (volatile {global} ulong *_p_, ulong _val_) + |
| ulong **atom_min** (volatile {local} ulong *_p_, ulong _val_) |
| | Read the 64-bit value (referred to as _old_) stored at the location |
| pointed to by _p_. |
| Compute *min*(_old_, _val_) and store the minimum value at the location |
| pointed to by _p_. |
| The function returns _old_. |
| | long **atom_max** (volatile {global} long *_p_, long _val_) + |
| long **atom_max** (volatile {local} long *_p_, long _val_) |
| |
| ulong **atom_max** (volatile {global} ulong *_p_, ulong _val_) + |
| ulong **atom_max** (volatile {local} ulong *_p_, ulong _val_) |
| | Read the 64-bit value (referred to as _old_) stored at the location |
| pointed to by _p_. |
| Compute *max*(_old_, _val_) and store the maximum value at the location |
| pointed to by _p_. |
| The function returns _old_. |
| | long **atom_and** (volatile {global} long *_p_, long _val_) + |
| long **atom_and** (volatile {local} long *_p_, long _val_) |
| |
| ulong **atom_and** (volatile {global} ulong *_p_, ulong _val_) + |
| ulong **atom_and** (volatile {local} ulong *_p_, ulong _val_) |
| | Read the 64-bit value (referred to as _old_) stored at the location |
| pointed to by _p_. |
| Compute (_old_ & _val_) and store the result at the location pointed to by _p_. |
| The function returns _old_. |
| | long **atom_or** (volatile {global} long *_p_, long _val_) + |
| long **atom_or** (volatile {local} long *_p_, long _val_) |
| |
| ulong **atom_or** (volatile {global} ulong *_p_, ulong _val_) + |
| ulong **atom_or** (volatile {local} ulong *_p_, ulong _val_) |
| | Read the 64-bit value (referred to as _old_) stored at the location |
| pointed to by _p_. |
| Compute (_old_ \| _val_) and store the result at the location pointed to by _p_. |
| The function returns _old_. |
| | long **atom_xor** (volatile {global} long *_p_, long _val_) + |
| long **atom_xor** (volatile {local} long *_p_, long _val_) |
| |
| ulong **atom_xor** (volatile {global} ulong *_p_, ulong _val_) + |
| ulong **atom_xor** (volatile {local} ulong *_p_, ulong _val_) |
| | Read the 64-bit value (referred to as _old_) stored at the location |
| pointed to by _p_. |
| Compute (_old_ ^ _val_) and store the result at the location pointed to by _p_. |
| The function returns _old_. |
| |==== |
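| |
| Packing two fields into one 64-bit key lets a single **atom_min** update both |
| of them atomically. |
| The following sketch assumes a device that supports |
| `<<cl_khr_int64_extended_atomics>>` and a global size below 2^32^. Its |
| hypothetical kernel finds the smallest input value together with the lowest |
| index at which that value occurs. |
| |
| [source,opencl_c] |
| ---------- |
| #pragma OPENCL EXTENSION cl_khr_int64_extended_atomics : enable |
| |
| // Hypothetical kernel: assumes the host initializes *result to all ones |
| // (0xFFFFFFFFFFFFFFFF) before launch. |
| kernel void argmin_u32(global const uint *in, volatile global ulong *result) |
| { |
|     uint i = (uint)get_global_id(0); |
|     // Value in the high 32 bits, index in the low 32 bits: comparing keys |
|     // compares values first and breaks ties in favor of the lower index. |
|     ulong key = ((ulong)in[i] << 32) | (ulong)i; |
|     atom_min(result, key); |
| } |
| ---------- |
| |
| After the kernel completes, the minimum value is `+(uint)(*result >> 32)+` and |
| its index is `+(uint)*result+`. |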
| endif::cl_khr_int64_extended_atomics[] |
| |
| NOTE: Atomic operations on 64-bit integers and 32-bit integers (and floats) |
| are also atomic with respect to each other. |
| |
| endif::cl_khr_int64_base_atomics,cl_khr_int64_extended_atomics[] |
| |
| |
| [[atomic-restrictions]] |
| ==== Restrictions |
| |
| [open,refpage='atomicRestrictions',desc='Restrictions on Atomic Operations',type='freeform',spec='clang',anchor='atomic-restrictions',xrefs='atomicFunctions atomicTypes atomic_compare_exchange atomic_exchange atomic_fetch_key atomic_flag atomic_flag_clear atomic_flag_test_and_set atomic_flag_test_and_set_explicit atomic_init atomic_load atomic_store atomic_work_item_fence'] |
| -- |
| * All operations on atomic types must be performed using the built-in |
| atomic functions. |
| C11 and {cpp11} support operators on atomic types. |
| OpenCL C does not support operators with atomic types. |
| Using atomic types with operators should result in a compilation error. |
| * The `atomic_bool`, `atomic_char`, `atomic_uchar`, `atomic_short`, |
| `atomic_ushort`, `atomic_intmax_t` and `atomic_uintmax_t` types are not |
| supported by OpenCL C. |
| * OpenCL C 2.0 requires that the built-in atomic functions on atomic types |
| are lock-free. |
| In OpenCL C 3.0 or newer, built-in atomic functions on atomic types may be |
| lock-free. |
| * The `+_Atomic+` type specifier and `+_Atomic+` type qualifier are not supported |
| by OpenCL C. |
| * The behavior of atomic operations where pointer arguments to the atomic |
| functions refer to an atomic type in the `private` address space is |
| undefined. |
| * Using `memory_order_acquire` with any built-in atomic function except |
| `atomic_work_item_fence` <<unified-spec, requires>> support for OpenCL C |
| 2.0, or OpenCL C 3.0 or newer and the {opencl_c_atomic_order_acq_rel} |
| feature. |
| * Using `memory_order_release` with any built-in atomic function except |
| `atomic_work_item_fence` <<unified-spec, requires>> support for OpenCL C |
| 2.0, or OpenCL C 3.0 or newer and the {opencl_c_atomic_order_acq_rel} |
| feature. |
| * Using `memory_order_acq_rel` with any built-in atomic function except |
| `atomic_work_item_fence` <<unified-spec, requires>> support for OpenCL C |
| 2.0, or OpenCL C 3.0 or newer and the {opencl_c_atomic_order_acq_rel} |
| feature. |
| * Using `memory_order_seq_cst` with any built-in atomic function |
| <<unified-spec, requires>> support for OpenCL C 2.0, or OpenCL C 3.0 or |
| newer and the {opencl_c_atomic_order_seq_cst} feature. |
| * Using `memory_scope_sub_group` with any built-in atomic function |
| <<unified-spec, requires>> support for |
| ifdef::cl_khr_subgroups[the `<<cl_khr_subgroups>>` extension macro; or for] |
| OpenCL C 3.0 or newer and the {opencl_c_subgroups} feature. |
| * Using `memory_scope_device` <<unified-spec, requires>> support for OpenCL |
| C 2.0, or OpenCL C 3.0 or newer and the |
| {opencl_c_atomic_scope_device} feature. |
| * Using `memory_scope_all_svm_devices` <<unified-spec, requires>> |
| support for OpenCL C 2.0, or OpenCL C 3.0 or |
| newer and the {opencl_c_atomic_scope_all_devices} feature. |
| * Using `memory_scope_all_devices` <<unified-spec, requires>> support for OpenCL |
| C 3.0 or newer and the {opencl_c_atomic_scope_all_devices} feature. |
| -- |
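| |
| The following sketch assumes OpenCL C 2.0 or newer, and its kernel name and |
| argument are hypothetical. It stays within these restrictions in three ways. |
| The atomic object is accessed only through built-in functions. The forms that |
| would be rejected appear only as comments. The order and scope it uses, |
| `memory_order_relaxed` and `memory_scope_work_group`, do not depend on any |
| optional feature. |
| |
| [source,opencl_c] |
| ---------- |
| // Hypothetical kernel: counts the work-items in each work-group. |
| kernel void count_per_group(global int *out) |
| { |
|     local atomic_int counter; |
| |
|     if (get_local_id(0) == 0) |
|         atomic_init(&counter, 0);  // non-atomic initialization |
|     barrier(CLK_LOCAL_MEM_FENCE); |
| |
|     // counter += 1;              // not valid: no operators on atomic types |
|     // _Atomic int other;         // not valid: _Atomic is not supported |
|     atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed, |
|                               memory_scope_work_group); |
| |
|     barrier(CLK_LOCAL_MEM_FENCE); |
|     if (get_local_id(0) == 0) |
|         out[get_group_id(0)] = atomic_load_explicit(&counter, |
|                                                     memory_order_relaxed, |
|                                                     memory_scope_work_group); |
| } |
| ---------- |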
| |
| |
| [[miscellaneous-vector-functions]] |
| === Miscellaneous Vector Functions |
| |
| [open,refpage='miscVectorFunctions',desc='Miscellaneous Vector Functions',type='freeform',spec='clang',anchor='miscellaneous-vector-functions',xrefs='atomicFunctions',alias='shuffle vec_step'] |
| -- |
| |
| The OpenCL C programming language implements the following additional |
| built-in vector functions. |
| We use the generic type name `gentype__n__` (or `gentype__m__`) to indicate the |
| built-in data types `char__n__`, `uchar__n__`, `short__n__`, |
| `ushort__n__`, |
| `int__n__`, `uint__n__`, `long__n__` |
| footnote:[{fn-int64-supported}], `ulong__n__`, `half__n__` footnote:[{fn-half-supported}], `float__n__`, or |
| `double__n__` footnote:[{fn-double-supported}] as the type for |
| the arguments unless otherwise stated. |
| We use the generic name `ugentype__n__` to indicate the built-in unsigned |
| integer data types. |
| _n_ and _m_ are each 2, 4, 8, or 16. |
| |
| [[table-misc-vector]] |
| .Built-in Miscellaneous Vector Functions |
| [cols="1,2",options="header",] |
| |==== |
| | Function | Description |
| | int *vec_step*(gentype__n__ _a_) + |
| int *vec_step*(char3 _a_) + |
| int *vec_step*(uchar3 _a_) + |
| int *vec_step*(short3 _a_) + |
| int *vec_step*(ushort3 _a_) + |
| int *vec_step*(half3 _a_) + |
| int *vec_step*(int3 _a_) + |
| int *vec_step*(uint3 _a_) + |
| int *vec_step*(long3 _a_) + |
| int *vec_step*(ulong3 _a_) + |
| int *vec_step*(float3 _a_) + |
| int *vec_step*(double3 _a_) + |
| int *vec_step*(_type_) |
| | The *vec_step* built-in function takes a built-in scalar or vector |
| data type argument and returns an integer value representing the |
| number of elements in the scalar or vector. The argument is not |
| evaluated. |
| |
| For all scalar types, *vec_step* returns 1. |
| |
| The *vec_step* built-in functions that take a 3-component vector |
| return 4. |
| |
| *vec_step* may also take a type name as an argument, e.g. |
| *vec_step*(float2). |
| |
| <<unified-spec, Requires>> support for OpenCL C 1.1 or newer. |
| | gentype__n__ *shuffle*(gentype__m__ _x_, |
| ugentype__n__ _mask_) + |
| gentype__n__ *shuffle2*(gentype__m__ _x_, |
| gentype__m__ _y_, |
| ugentype__n__ _mask_) |
| a| The *shuffle* and *shuffle2* built-in functions construct a |
| permutation of elements from one or two input vectors respectively |
| that are of the same type, returning a vector with the same element |
| type as the input and length that is the same as the shuffle mask. |
| The size of each element in the _mask_ must match the size of each |
| element in the result. |
| For *shuffle*, only the *ilogb*(2__m__-1) least significant bits of each |
| _mask_ element are considered. |
| For *shuffle2*, only the *ilogb*(2__m__-1)+1 least significant bits of |
| each _mask_ element are considered. |
| Other bits in the mask shall be ignored. |
| |
| The elements of the input vectors are numbered from left to right across one |
| or both of the vectors. |
| For this purpose, the number of elements in a vector is given by |
| *vec_step*(gentype__m__). |
| The shuffle _mask_ operand specifies, for each element of the result vector, |
| which element of the one or two input vectors the result element gets. |
| |
| <<unified-spec, Requires>> support for OpenCL C 1.1 or newer. |
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
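| // Reverse the elements of a single vector. |
| { |
|     uint4 mask = (uint4)(3, 2, 1, 0); |
|     float4 a; |
|     float4 r = shuffle(a, mask); |
|     // r.s0123 = a.wzyx |
| } |
| |
| // Concatenate two vectors. |
| { |
|     uint8 mask = (uint8)(0, 1, 2, 3, 4, 5, 6, 7); |
|     float4 a, b; |
|     float8 r = shuffle2(a, b, mask); |
|     // r.s0123 = a.xyzw |
|     // r.s4567 = b.xyzw |
| } |
| |
| // Select four elements from an eight-element vector. |
| { |
|     uint4 mask; |
|     float8 a; |
|     float4 b; |
| |
|     b = shuffle(a, mask); |
| } |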
| ---------- |
| |
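| The following example is not valid, because the 32-bit `uint` elements of |
| _mask_ do not match the 16-bit `short` elements of the result: |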
| |
| [source,opencl_c] |
| ---------- |
| uint8 mask; |
| short16 a; |
| short8 b; |
| |
| b = shuffle(a, mask); // not valid |
| ---------- |
| |==== |
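| |
| The mask rules above can be modelled on the host. The following C sketch is |
| illustrative only: `shuffle_model` and `shuffle2_model` are hypothetical |
| helper names, plain arrays stand in for OpenCL C vector types, and _m_ is |
| assumed to be 2, 4, 8, or 16. Because _m_ is then a power of two, keeping the |
| *ilogb*(2__m__-1) least significant bits of a mask element is the same as |
| masking it with _m_-1, and keeping *ilogb*(2__m__-1)+1 bits is the same as |
| masking it with 2__m__-1. |
| |
| [source,c] |
| ---------- |
| #include <assert.h> |
| #include <stddef.h> |
| |
| /* Host-side model of shuffle for float elements. x has m elements; mask |
|  * and result have n elements. Only the low ilogb(2m-1) bits of each mask |
|  * element are used, which for a power-of-two m is mask & (m - 1). */ |
| static void shuffle_model(const float *x, size_t m, |
|                           const unsigned *mask, float *result, size_t n) |
| { |
|     assert(m == 2 || m == 4 || m == 8 || m == 16); |
|     for (size_t i = 0; i < n; ++i) |
|         result[i] = x[mask[i] & (unsigned)(m - 1)]; |
| } |
| |
| /* Host-side model of shuffle2. The 2m input elements are numbered left to |
|  * right across x and then y, and only the low ilogb(2m-1)+1 bits of each |
|  * mask element are used, i.e. mask & (2m - 1). Other bits are ignored. */ |
| static void shuffle2_model(const float *x, const float *y, size_t m, |
|                            const unsigned *mask, float *result, size_t n) |
| { |
|     assert(m == 2 || m == 4 || m == 8 || m == 16); |
|     for (size_t i = 0; i < n; ++i) { |
|         unsigned sel = mask[i] & (unsigned)(2 * m - 1); |
|         result[i] = sel < m ? x[sel] : y[sel - m]; |
|     } |
| } |
| ---------- |
| |
| For instance, with four-element inputs a *shuffle2* mask element of 13 keeps |
| only its low three bits, 5, and so selects element 1 of _y_. |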
| -- |
| |
| |
| [[printf]] |
| === printf |
| |
| [open,refpage='printfFunction',desc='printf Function',type='freeform',spec='clang',anchor='printf'] |
| -- |
| IMPORTANT: *printf* <<unified-spec, requires>> support for OpenCL C 1.2 or newer. |
| |
| The OpenCL C programming language implements the *printf* function. |
| |
| [[table-printf]] |
| .Built-in printf Function |
| [cols=",",options="header",] |
| |==== |
| | Function | Description |
| | int **printf**(constant char *restrict _format_, ...) |
| | The *printf* built-in function writes output to an |
| implementation-defined stream such as stdout under control of the |
| string pointed to by _format_ that specifies how subsequent arguments |
| are converted for output. |
| If there are insufficient arguments for the format, the behavior is |
| undefined. |
| If the format is exhausted while arguments remain, the excess |
| arguments are evaluated (as always) but are otherwise ignored. |
| The *printf* function returns when the end of the format string is |
| encountered. |
| |
| *printf* returns 0 if it was executed successfully and -1 otherwise. |
| |==== |
| -- |
| |
| |
| [[printf-output-synchronization]] |
| ==== printf Output Synchronization |
| |
| When the event that is associated with a particular kernel invocation is |
| completed, the output of all *printf* calls executed by this kernel |
| invocation is flushed to the implementation-defined output stream. |
| Calling *clFinish* on a command-queue flushes all pending *printf* output |
| from previously enqueued and completed commands to the |
| implementation-defined output stream. |
| When *printf* is executed from multiple work-items concurrently, there is |
| no guarantee of ordering with respect to written data. |
| For example, it is valid for the output of a work-item with a global id |
| (0,0,1) to appear intermixed with the output of a work-item with a global id |
| (0,0,4) and so on. |
| |
| |
| [[printf-format-string]] |
| ==== printf Format String |
| |
| The format shall be a character sequence, beginning and ending in its |
| initial shift state. |
| The format is composed of zero or more directives: ordinary characters (not |
| *%*), which are copied unchanged to the output stream; and conversion |
| specifications, each of which results in fetching zero or more subsequent |
| arguments, converting them, if applicable, according to the corresponding |
| conversion specifier, and then writing the result to the output stream. |
| The format is in the constant address space and must be resolvable at |
| compile time; that is, it cannot be dynamically created by the executing |
| program itself. |
| |
| Each conversion specification is introduced by the character *%*. |
| After the *%*, the following appear in sequence: |
| |
| * Zero or more _flags_ (in any order) that modify the meaning of the |
| conversion specification. |
| * An optional minimum _field width_. |
| If the converted value has fewer characters than the field width, it is |
| padded with spaces (by default) on the left (or right, if the left |
| adjustment flag, described later, has been given) to the field width. |
| The field width takes the form of a nonnegative decimal integer |
| footnote:[{fn-printf-field-width}]. |
| * An optional __precision__ that gives the minimum number of digits to |
| appear for the *d*, *i*, *o*, *u*, *x*, and *X* conversions, the number |
| of digits to appear after the decimal-point character for *a*, *A*, *e*, |
| *E*, *f*, and *F* conversions, the maximum number of significant digits |
| for the *g* and *G* conversions, or the maximum number of bytes to be |
| written for *s* conversions. |
| The precision takes the form of a period (*.*) followed by an optional |
| decimal integer; if only the period is specified, the precision is taken |
| as zero. |
| If a precision appears with any other conversion specifier, the behavior |
| is undefined. |
| * An optional _vector specifier_. |
| * A __length modifier__ that specifies the size of the argument. |
| The _length modifier_ is required when a _vector specifier_ is present, |
| and together they specify the vector type. |
| <<implicit-conversions,Implicit conversions>> between vector types are |
| disallowed. |
| If no _vector specifier_ is present, the _length modifier_ is optional. |
| * A __conversion specifier__ character that specifies the type of |
| conversion to be applied. |
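| |
| The order of these components can be checked with a small parser. The sketch |
| below is a host-side C model, not part of any OpenCL API: |
| `struct conv_spec` and `parse_conv_spec` are hypothetical names, the field |
| width is assumed to be a plain decimal integer, and the function checks only |
| the order of the components, the allowed vector sizes, and the pairing rules |
| between the vector specifier and the length modifier. It does not check |
| whether a length modifier is allowed with a given conversion specifier. |
| |
| [source,c] |
| ---------- |
| #include <ctype.h> |
| #include <string.h> |
| |
| /* The components of one conversion specification, in order. */ |
| struct conv_spec { |
|     char flags[8];   /* zero or more of "-+ #0" */ |
|     int width;       /* -1 if absent */ |
|     int precision;   /* -1 if absent; a lone '.' means 0 */ |
|     int vec_n;       /* 0 if there is no vector specifier */ |
|     char length[3];  /* "", "hh", "h", "hl", or "l" */ |
|     char conversion; /* conversion specifier character */ |
| }; |
| |
| /* Read a nonnegative decimal integer and advance *p past its digits. */ |
| static int read_decimal(const char **p) |
| { |
|     int v = 0; |
|     while (isdigit((unsigned char)**p)) |
|         v = v * 10 + (*(*p)++ - '0'); |
|     return v; |
| } |
| |
| /* Parse the text that follows a '%'. Returns the number of characters |
|  * consumed, or 0 if components are missing, invalid, or out of order. */ |
| static size_t parse_conv_spec(const char *s, struct conv_spec *out) |
| { |
|     const char *p = s; |
|     size_t nf = 0; |
| |
|     memset(out, 0, sizeof *out); |
|     out->width = out->precision = -1; |
|     while (*p != '\0' && strchr("-+ #0", *p) != NULL && |
|            nf < sizeof out->flags - 1) |
|         out->flags[nf++] = *p++; |
|     if (isdigit((unsigned char)*p)) |
|         out->width = read_decimal(&p); |
|     if (*p == '.') { |
|         ++p; |
|         out->precision = read_decimal(&p); |
|     } |
|     if (*p == 'v') { |
|         ++p; |
|         out->vec_n = read_decimal(&p); |
|         if (out->vec_n != 2 && out->vec_n != 3 && out->vec_n != 4 && |
|             out->vec_n != 8 && out->vec_n != 16) |
|             return 0; |
|     } |
|     if (strncmp(p, "hh", 2) == 0 || strncmp(p, "hl", 2) == 0) { |
|         memcpy(out->length, p, 2); |
|         p += 2; |
|     } else if (*p == 'h' || *p == 'l') { |
|         out->length[0] = *p++; |
|     } |
|     if (out->vec_n != 0 && out->length[0] == '\0') |
|         return 0; /* a vector specifier requires a length modifier */ |
|     if (out->vec_n == 0 && strcmp(out->length, "hl") == 0) |
|         return 0; /* hl is only allowed with a vector specifier */ |
|     if (*p == '\0' || strchr("diouxXfFeEgGaAcsp%", *p) == NULL) |
|         return 0; |
|     out->conversion = *p++; |
|     return (size_t)(p - s); |
| } |
| ---------- |
| |
| Applied to the specification from the *printf* examples later in this |
| section, `parse_conv_spec("2.2v4hlf", &cs)` yields a field width of 2, a |
| precision of 2, a vector size of 4, the *hl* length modifier, and the *f* |
| conversion specifier. |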
| |
| The flag characters and their meanings are: |
| |
| *-* The result of the conversion is left-justified within the field. |
| (It is right-justified if this flag is not specified.) |
| |
| *+* The result of a signed conversion always begins with a plus or minus |
| sign. |
| (It begins with a sign only when a negative value is converted if this flag |
| is not specified.) footnote:[{fn-printf-minus-sign}] |
| |
| _space_ If the first character of a signed conversion is not a sign, or if a |
| signed conversion results in no characters, a space is prefixed to the |
| result. |
| If the __space__ and *+* flags both appear, the _space_ flag is ignored. |
| |
| *#* The result is converted to an "`alternative form`". |
| For *o* conversion, it increases the precision, if and only if necessary, |
| to force the first digit of the result to be a zero (if the value and |
| precision are both 0, a single 0 is printed). |
| For *x* (or *X*) conversion, a nonzero result has *0x* (or *0X*) |
| prefixed to it. |
| For *a*, *A*, *e*, *E*, *f*, *F*, *g*, and *G* conversions, |
| the result of converting a floating-point number always contains a |
| decimal-point character, even if no digits follow it. |
| (Normally, a decimal-point character appears in the result of these |
| conversions only if a digit follows it.) For *g* and *G* conversions, |
| trailing zeros are *not* removed from the result. |
| For other conversions, the behavior is undefined. |
| |
| *0* For *d*, *i*, *o*, *u*, *x*, *X*, *a*, *A*, *e*, |
| *E*, *f*, *F*, *g*, and *G* conversions, leading zeros (following |
| any indication of sign or base) are used to pad to the field width rather |
| than performing space padding, except when converting an infinity or NaN. |
| If the *0* and *-* flags both appear, the *0* flag is ignored. |
| For *d*, *i*, *o*, *u*, *x*, and *X* conversions, if a precision |
| is specified, the *0* flag is ignored. |
| For other conversions, the behavior is undefined. |
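| |
| For scalar arguments these flags have the same meaning as in C99, so a host C |
| library's `snprintf` can be used to preview them. The following sketch runs on |
| the host, not on an OpenCL device; `preview_int` is a hypothetical helper, and |
| each comment gives the result that C99 requires for that call. |
| |
| [source,c] |
| ---------- |
| #include <stdio.h> |
| |
| /* Format one int on the host with a C99 format string. The returned |
|  * buffer is overwritten by the next call. */ |
| static const char *preview_int(const char *format, int value) |
| { |
|     static char buf[64]; |
|     snprintf(buf, sizeof buf, format, value); |
|     return buf; |
| } |
| |
| /* Expected C99 results: |
|  *   preview_int("[%-5d]", 42)  -> "[42   ]"  left-justified |
|  *   preview_int("[%5d]", 42)   -> "[   42]"  right-justified by default |
|  *   preview_int("%+d", 42)     -> "+42"      sign always written |
|  *   preview_int("% d", 42)     -> " 42"      space in place of a sign |
|  *   preview_int("%+ d", 42)    -> "+42"      space ignored with + |
|  *   preview_int("%#o", 8)      -> "010"      alternative form for o |
|  *   preview_int("%#x", 255)    -> "0xff"     alternative form for x |
|  *   preview_int("%05d", -42)   -> "-0042"    zeros follow the sign |
|  *   preview_int("[%-05d]", 42) -> "[42   ]"  0 ignored with - |
|  *   preview_int("[%05.3d]", 7) -> "[  007]"  0 ignored with a precision |
|  *   preview_int("%2d", 12345)  -> "12345"    width never truncates |
|  */ |
| ---------- |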
| |
| The vector specifier and its meaning is: |
| |
| **v**__n__ Specifies that a following *a*, *A*, *e*, *E*, *f*, *F*, *g*, *G*, |
| *d*, *i*, *o*, *u*, *x*, or *X* conversion specifier applies to a vector |
| argument, where __n__ is the size of the vector and must be 2, 3, 4, 8 or 16. |
| |
| The vector value is displayed in the following general form: |
| |
| [none] |
| * value1 C value2 C ... C value__n__ |
| |
| where C is the separator character, which is a comma (`,`). |
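| |
| The vector form can be modelled on the host by applying the scalar conversion |
| to each element and joining the results with commas. In this C sketch, |
| `format_vector` is a hypothetical helper; it assumes, as the *printf* |
| examples later in this section show, that the flags, field width, and |
| precision apply to each element separately, and it uses an array of `float` |
| in place of an OpenCL C vector. |
| |
| [source,c] |
| ---------- |
| #include <assert.h> |
| #include <stdio.h> |
| #include <string.h> |
| |
| /* Write n float elements to buf, each formatted with the scalar format |
|  * elem_fmt and separated by ','. Returns the length written, or -1 if the |
|  * result does not fit in size bytes. */ |
| static int format_vector(char *buf, size_t size, const char *elem_fmt, |
|                          const float *v, int n) |
| { |
|     size_t len = 0; |
| |
|     assert(size > 0); |
|     assert(n == 2 || n == 3 || n == 4 || n == 8 || n == 16); |
|     buf[0] = '\0'; |
|     for (int i = 0; i < n; ++i) { |
|         char elem[64]; |
|         size_t elen; |
| |
|         /* Flags, width, and precision apply to each element on its own. */ |
|         snprintf(elem, sizeof elem, elem_fmt, (double)v[i]); |
|         elen = strlen(elem); |
|         if (len + (i > 0) + elen >= size) |
|             return -1; |
|         if (i > 0) |
|             buf[len++] = ',';              /* separator between elements */ |
|         memcpy(buf + len, elem, elen + 1); /* copy including the '\0' */ |
|         len += elen; |
|     } |
|     return (int)len; |
| } |
| ---------- |
| |
| With the format `"%2.2f"` and the elements 1.0, 2.0, 3.0, and 4.0, |
| `format_vector` produces `1.00,2.00,3.00,4.00`, the same text as the |
| `%2.2v4hlf` example later in this section. |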
| |
| If the vector specifier is not used, the length modifiers and their meanings |
| are: |
| |
| *hh* Specifies that a following *d*, *i*, *o*, *u*, *x*, or *X* conversion |
| specifier applies to a `char` or `uchar` argument (the argument will have |
| been promoted according to the integer promotions, but its value shall be |
| converted to `char` or `uchar` before printing). |
| |
| *h* Specifies that a following *d*, *i*, *o*, *u*, *x*, or *X* conversion |
| specifier applies to a `short` or `ushort` argument (the argument will have |
| been promoted according to the integer promotions, but its value shall be |
| converted to `short` or `ushort` before printing). |
| |
| *l* (ell) Specifies that a following *d*, *i*, *o*, *u*, *x*, or *X* |
| conversion specifier applies to a `long` or `ulong` argument. |
| The *l* modifier is supported by the full profile. |
| For the embedded profile, the *l* modifier is supported only if 64-bit |
| integers are supported by the device. |
| |
| If the vector specifier is used, the length modifiers and their meanings |
| are: |
| |
| *hh* Specifies that a following *d*, *i*, *o*, *u*, *x*, or *X* conversion |
| specifier applies to a `char__n__` or `uchar__n__` argument (the argument |
| will not be promoted). |
| |
| *h* Specifies that a following *d*, *i*, *o*, *u*, *x*, or *X* conversion |
| specifier applies to a `short__n__` or `ushort__n__` argument (the argument |
| will not be promoted); that a following *a*, *A*, *e*, *E*, *f*, *F*, *g*, |
| or *G* conversion specifier applies to a `half__n__` |
| footnote:[{fn-half-supported}] argument. |
| |
| *hl* This modifier can only be used with the vector specifier. |
| Specifies that a following *d*, *i*, *o*, *u*, *x*, or *X* conversion |
| specifier applies to an `int__n__` or `uint__n__` argument; that a following |
| *a*, *A*, *e*, *E*, *f*, *F*, *g*, or *G* conversion specifier applies to a |
| `float__n__` argument. |
| |
| *l* (ell) Specifies that a following *d*, *i*, *o*, *u*, *x*, or *X* |
| conversion specifier applies to a `long__n__` or `ulong__n__` argument; that |
| a following *a*, *A*, *e*, *E*, *f*, *F*, *g*, or *G* conversion specifier |
| applies to a `double__n__` argument. |
| The *l* modifier is supported by the full profile. |
| For the embedded profile, the *l* modifier is supported only if 64-bit |
| integers or double-precision floating-point are supported by the device. |
| |
| If a vector specifier appears without a length modifier, the behavior is |
| undefined. |
| The vector data type described by the vector specifier and length modifier |
| must match the data type of the argument; otherwise the behavior is |
| undefined. |
| |
| If a length modifier appears with any conversion specifier other than as |
| specified above, the behavior is undefined. |
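| |
| With a vector specifier, the rules above amount to a lookup from length |
| modifier and conversion specifier to element type. The C sketch below encodes |
| that lookup as data; `vector_element_type` is a hypothetical helper, and a |
| `NULL` result stands for the combinations whose behavior is undefined. |
| |
| [source,c] |
| ---------- |
| #include <stddef.h> |
| #include <string.h> |
| |
| /* Element type selected by a vector specifier together with a length |
|  * modifier, or NULL when the combination is undefined. */ |
| static const char *vector_element_type(const char *length, char conv) |
| { |
|     static const struct { |
|         const char *length; |
|         const char *int_type; /* for d, i, o, u, x, X */ |
|         const char *fp_type;  /* for a, A, e, E, f, F, g, G */ |
|     } table[] = { |
|         { "hh", "char or uchar",   NULL     }, |
|         { "h",  "short or ushort", "half"   }, |
|         { "hl", "int or uint",     "float"  }, |
|         { "l",  "long or ulong",   "double" }, |
|     }; |
| |
|     if (conv == '\0') |
|         return NULL; |
|     for (size_t i = 0; i < sizeof table / sizeof table[0]; ++i) { |
|         if (strcmp(length, table[i].length) != 0) |
|             continue; |
|         if (strchr("diouxX", conv) != NULL) |
|             return table[i].int_type; |
|         if (strchr("aAeEfFgG", conv) != NULL) |
|             return table[i].fp_type; |
|         return NULL; |
|     } |
|     return NULL; /* missing or unknown length modifier: undefined */ |
| } |
| ---------- |
| |
| For example, `vector_element_type("h", 'x')` returns `"short or ushort"`, so |
| the `uint2` argument printed with `%#v2hx` in the invalid example later in |
| this section does not match the vector type. |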
| |
| The conversion specifiers and their meanings are: |
| |
| **d,i** The `int`, `char__n__`, `short__n__`, `int__n__` or `long__n__` |
| argument is converted to signed decimal in the style __[__**-**_]dddd_. |
| The precision specifies the minimum number of digits to appear; if the value |
| being converted can be represented in fewer digits, it is expanded with |
| leading zeros. |
| The default precision is 1. |
| The result of converting a zero value with a precision of zero is no |
| characters. |
| |
| **o,u,x,X** The `uint`, `uchar__n__`, `ushort__n__`, `uint__n__` or |
| `ulong__n__` argument is converted to unsigned octal (*o*), unsigned decimal |
| (*u*), or unsigned hexadecimal notation (*x* or *X*) in the style _dddd_; |
| the letters *abcdef* are used for *x* conversion and the letters *ABCDEF* |
| for *X* conversion. |
| The precision specifies the minimum number of digits to appear; if the value |
| being converted can be represented in fewer digits, it is expanded with |
| leading zeros. |
| The default precision is 1. |
| The result of converting a zero value with a precision of zero is no |
| characters. |
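| |
| Because these integer conversions follow C99, the precision rules can be |
| previewed with a host-side `snprintf`. This sketch runs on the host rather |
| than on a device; `preview_int` is a hypothetical helper, and each comment |
| gives the result that C99 requires. |
| |
| [source,c] |
| ---------- |
| #include <stdio.h> |
| |
| /* Format one int on the host with a C99 format string. The returned |
|  * buffer is overwritten by the next call. */ |
| static const char *preview_int(const char *format, int value) |
| { |
|     static char buf[64]; |
|     snprintf(buf, sizeof buf, format, value); |
|     return buf; |
| } |
| |
| /* Expected C99 results: |
|  *   preview_int("%d", 0)     -> "0"       default precision is 1 |
|  *   preview_int("[%.0d]", 0) -> "[]"      zero with precision 0 |
|  *   preview_int("%.5d", 42)  -> "00042"   padded to the precision |
|  *   preview_int("%.5d", -42) -> "-00042"  sign precedes the digits |
|  *   preview_int("%o", 8)     -> "10"      unsigned octal |
|  *   preview_int("%u", 42)    -> "42"      unsigned decimal |
|  *   preview_int("%x", 255)   -> "ff"      lowercase hexadecimal |
|  *   preview_int("%X", 255)   -> "FF"      uppercase hexadecimal |
|  *   preview_int("[%.0x]", 0) -> "[]"      same zero rule for x |
|  */ |
| ---------- |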
| |
| **f,F** A `double`, `half__n__`, `float__n__` or `double__n__` argument |
| representing a floating-point number is converted to decimal notation in the |
| style __[__**-**__]ddd__**.**_ddd_, where the number of digits after the |
| decimal-point character is equal to the precision specification. |
| If the precision is missing, it is taken as 6; if the precision is zero and |
| the *#* flag is not specified, no decimal-point character appears. |
| If a decimal-point character appears, at least one digit appears before it. |
| The value is rounded to the appropriate number of digits. |
| A `double`, `half__n__`, `float__n__` or `double__n__` argument representing |
| an infinity is converted in one of the styles __[__**-**__]__**inf** or |
| __[__**-**__]__**infinity** (which style is implementation-defined). |
| A `double`, `half__n__`, `float__n__` or `double__n__` argument representing |
| a NaN is converted in one of the styles __[__**-**__]__**nan** or |
| __[__**-**__]__**nan(**__n-char-sequence__**)** (which style, and the |
| meaning of any _n-char-sequence_, is implementation-defined). |
| The *F* conversion specifier produces `INF`, `INFINITY`, or `NAN` instead of |
| *inf*, *infinity*, or *nan*, respectively footnote:[{fn-printf-infinity-nan}]. |
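| |
| The *f* and *F* rules can likewise be previewed with a host-side C99 |
| `snprintf`. This sketch passes a `double`, as C99 does; a device without |
| double-precision support passes a `float` instead. `preview_double` is a |
| hypothetical helper. The infinity spelling is implementation-defined, so the |
| comments list both allowed forms. |
| |
| [source,c] |
| ---------- |
| #include <stdio.h> |
| |
| /* Format one double on the host with a C99 format string. The returned |
|  * buffer is overwritten by the next call. */ |
| static const char *preview_double(const char *format, double value) |
| { |
|     static char buf[64]; |
|     snprintf(buf, sizeof buf, format, value); |
|     return buf; |
| } |
| |
| /* Expected C99 results (INFINITY and NAN come from <math.h>): |
|  *   preview_double("%f", 3.5)       -> "3.500000"  default precision 6 |
|  *   preview_double("%.2f", 3.14159) -> "3.14" |
|  *   preview_double("%.0f", 3.0)     -> "3"         no decimal point |
|  *   preview_double("%#.0f", 3.0)    -> "3."        # keeps the point |
|  *   preview_double("%f", -INFINITY) -> "-inf" or "-infinity" |
|  *   preview_double("%F", INFINITY)  -> "INF" or "INFINITY" |
|  *   preview_double("%F", NAN)       -> contains "NAN" |
|  */ |
| ---------- |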
| |
| **e,E** A `double`, `half__n__`, `float__n__` or `double__n__` argument |
| representing a floating-point number is converted in the style |
| __[__**-**__]d__**.**__ddd__**e{plusmn}**_dd_, where there is one digit |
| (which is nonzero if the argument is nonzero) before the decimal-point |
| character and the number of digits after it is equal to the precision; if |
| the precision is missing, it is taken as 6; if the precision is zero and the |
| *#* flag is not specified, no decimal-point character appears. |
| The value is rounded to the appropriate number of digits. |
| The *E* conversion specifier produces a number with *E* instead of *e* |
| introducing the exponent. |
| The exponent always contains at least two digits, and only as many more |
| digits as necessary to represent the exponent. |
| If the value is zero, the exponent is zero. |
| A `double`, `half__n__`, `float__n__` or `double__n__` argument representing |
| an infinity or NaN is converted in the style of an *f* or *F* conversion |
| specifier. |
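| |
| A host-side preview of the *e* and *E* rules, under the same assumptions as |
| for *f* (`preview_double` is a hypothetical helper, and the comments give the |
| results C99 requires): |
| |
| [source,c] |
| ---------- |
| #include <stdio.h> |
| |
| /* Format one double on the host with a C99 format string. The returned |
|  * buffer is overwritten by the next call. */ |
| static const char *preview_double(const char *format, double value) |
| { |
|     static char buf[64]; |
|     snprintf(buf, sizeof buf, format, value); |
|     return buf; |
| } |
| |
| /* Expected C99 results: |
|  *   preview_double("%e", 12345.678)  -> "1.234568e+04" |
|  *   preview_double("%.2E", 0.000123) -> "1.23E-04" |
|  *   preview_double("%e", 0.0)        -> "0.000000e+00"   zero exponent |
|  *   preview_double("%.0e", 5.0)      -> "5e+00"          no decimal point |
|  *   preview_double("%#.0e", 5.0)     -> "5.e+00"         # keeps the point |
|  *   preview_double("%e", 1e100)      -> "1.000000e+100"  extra digit only |
|  *                                                        when needed |
|  */ |
| ---------- |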
| |
| **g,G** A `double`, `half__n__`, `float__n__` or `double__n__` argument |
| representing a floating-point number is converted in style *f* or *e* (or in |
| style *F* or *E* in the case of a *G* conversion specifier), depending on |
| the value converted and the precision. |
| Let _P_ equal the precision if nonzero, 6 if the precision is omitted, or |
| 1 if the precision is zero. |
| Then, if a conversion with style *E* would have an exponent of _X_: |
| |
| * if _P_ > _X_ {geq} -4, the conversion is with style *f* (or *F*) and |
| precision _P_ - (_X_ + 1); |
| * otherwise, the conversion is with style *e* (or *E*) and precision |
| _P_ - 1. |
| |
| Finally, unless the *#* flag is used, any trailing zeros are removed from |
| the fractional portion of the result and the decimal-point character is |
| removed if there is no fractional portion remaining. |
| A `double`, `half__n__`, `float__n__` or `double__n__` argument |
| representing an infinity or NaN is converted in the style of an *f* or *F* |
| conversion specifier. |
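| |
| The style selection can be written out directly. In this host-side C sketch, |
| `g_uses_f_style` is a hypothetical helper that finds _X_ by formatting the |
| value in style *e* with precision _P_ - 1 and reading back the exponent, so |
| that rounding is taken into account. It assumes a finite argument and uses a |
| negative _precision_ to mean that the precision was omitted. |
| |
| [source,c] |
| ---------- |
| #include <assert.h> |
| #include <stdio.h> |
| #include <stdlib.h> |
| #include <string.h> |
| |
| /* Return 1 if a "%g" conversion of value uses style f, 0 if it uses |
|  * style e. A negative precision means the precision was omitted. */ |
| static int g_uses_f_style(double value, int precision) |
| { |
|     char buf[64]; |
|     int p = precision < 0 ? 6 : (precision == 0 ? 1 : precision); |
|     int x; |
| |
|     assert(p <= 50); /* keeps the e-style text within buf */ |
|     /* X is the exponent that style e with precision P - 1 would print, |
|      * after rounding, so format the value that way and read it back. */ |
|     snprintf(buf, sizeof buf, "%.*e", p - 1, value); |
|     x = atoi(strchr(buf, 'e') + 1); |
|     return p > x && x >= -4; |
| } |
| ---------- |
| |
| For example, 999999.7 with the default precision rounds to `1.00000e+06` in |
| style *e*, so _X_ is 6 rather than 5, style *e* is chosen, and removing the |
| trailing zeros gives `1e+06`. |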
| |
| **a,A** A `double`, `half__n__`, `float__n__` or `double__n__` argument |
| representing a floating-point number is converted in the style |
| __[__**-**__]__**0x**__h__**.**__hhhh__**p{plusmn}**_d_, where there is one |
| hexadecimal digit (which is nonzero if the argument is a normalized |
| floating-point number and is otherwise unspecified) before the decimal-point |
| character footnote:[{fn-printf-hex-float}] and the number of hexadecimal digits |
| after it is equal to the precision; if the precision is missing, then the |
| precision is sufficient for an exact representation of the value; if the |
| precision is zero and the *#* flag is not specified, no decimal-point character |
| appears. |
| The letters *abcdef* are used for *a* conversion and the letters *ABCDEF* |
| for *A* conversion. |
| The *A* conversion specifier produces a number with *X* and *P* instead of |
| *x* and *p*. |
| The exponent always contains at least one digit, and only as many more |
| digits as necessary to represent the decimal exponent of 2. |
| If the value is zero, the exponent is zero. |
| A `double`, `half__n__`, `float__n__` or `double__n__` argument representing |
| an infinity or NaN is converted in the style of an *f* or *F* conversion |
| specifier. |
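| |
| When the precision is omitted, the *a* conversion is exact, so the printed |
| text converts back to the identical value. The host-side C sketch below checks |
| this with the C99 `strtod` function, which accepts hexadecimal floating-point |
| input; `hex_round_trip` is a hypothetical helper. Because the leading |
| hexadecimal digit is not fully specified, the sketch does not assume any |
| particular digit string. |
| |
| [source,c] |
| ---------- |
| #include <stdio.h> |
| #include <stdlib.h> |
| |
| /* Print value with "%a" into buf, then parse it back with strtod. Since |
|  * "%a" without a precision is exact, the result should equal value. */ |
| static double hex_round_trip(double value, char *buf, size_t size) |
| { |
|     snprintf(buf, size, "%a", value); |
|     return strtod(buf, NULL); |
| } |
| ---------- |
| |
| For instance, `hex_round_trip(0.1, buf, sizeof buf)` returns a value equal to |
| 0.1, and the text in `buf` begins with `0x` and contains the `p` exponent |
| marker; for zero, the exponent is written as `p+0`. |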
| |
| [NOTE] |
| ==== |
| The conversion specifiers *f,F,e,E,g,G,a,A* convert a `float` or `half` argument |
| that is a scalar type to a `double` only if <<double-precision, double |
| precision is supported>>. |
| Otherwise, the argument will be a `float` instead of a `double` and the |
| `half` type will be converted to a `float`. |
| ==== |
| |
| *c* The `int` argument is converted to an `unsigned char`, and the resulting |
| character is written. |
| |
| *s* The argument shall be a literal string |
| footnote:[{fn-printf-literal-string}]. |
| Characters from the literal string array are written up to (but not |
| including) the terminating null character. |
| If the precision is specified, no more than that many bytes are written. |
| If the precision is not specified or is greater than the size of the array, |
| the array shall contain a null character. |
| |
| *p* The argument shall be a pointer to *void*. |
| The pointer can refer to a memory region in the `global`, `constant`, |
| `local`, `private`, or generic address space. |
| The value of the pointer is converted to a sequence of printing characters |
| in an implementation-defined manner. |
| |
| *%* A *%* character is written. |
| No argument is converted. |
| The complete conversion specification shall be *%%*. |
| |
| If a conversion specification is invalid, the behavior is undefined. |
| If any argument is not the correct type for the corresponding conversion |
| specification, the behavior is undefined. |
| |
| In no case does a nonexistent or small field width cause truncation of a |
| field; if the result of a conversion is wider than the field width, the |
| field is expanded to contain the conversion result. |
| |
| For *a* and *A* conversions, the value is correctly rounded to a hexadecimal |
| floating-point number with the given precision. |
| |
| A few examples of printf are given below: |
| |
| [source,opencl_c] |
| ---------- |
| float4 f = (float4)(1.0f, 2.0f, 3.0f, 4.0f); |
| uchar4 uc = (uchar4)(0xFA, 0xFB, 0xFC, 0xFD); |
| |
| printf("f4 = %2.2v4hlf\n", f); |
| printf("uc = %#v4hhx\n", uc); |
| ---------- |
| |
| The above two printf calls print the following: |
| |
| [listing] |
| ---------- |
| f4 = 1.00,2.00,3.00,4.00 |
| uc = 0xfa,0xfb,0xfc,0xfd |
| ---------- |
| |
| A few examples of valid use cases of printf for the conversion specifier *s* |
| are given below. |
| The argument value must be a pointer to a literal string. |
| |
| [source,opencl_c] |
| ---------- |
| kernel void my_kernel( ... ) |
| { |
| printf("%s\n", "this is a test string\n"); |
| } |
| ---------- |
| |
| A few examples of invalid use cases of printf for the conversion specifier |
| *s* are given below: |
| |
| [source,opencl_c] |
| ---------- |
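| kernel void my_kernel(global char *s, ... ) |
| { |
|   printf("%s\n", s);     // not valid: s is not a literal string |
|   constant char *p = "this is a test string\n"; |
|   printf("%s\n", p);     // not valid: p is a variable, not a literal string |
|   printf("%s\n", &p[3]); // not valid: not a literal string |
| } |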
| ---------- |
| |
| A few examples of invalid use cases of printf where data types given by the |
| vector specifier and length modifier do not match the argument type are |
| given below: |
| |
| [source,opencl_c] |
| ---------- |
| kernel void my_kernel(global char *s, ... ) |
| { |
| uint2 ui = (uint2)(0x12345678, 0x87654321); |
| |
|   printf("unsigned short value = (%#v2hx)\n", ui);  // not valid: ui is uint2, not ushort2 |
|   printf("unsigned char value = (%#v2hhx)\n", ui);  // not valid: ui is uint2, not uchar2 |
| } |
| ---------- |
| |
| |
| [[differences-between-opencl-c-and-c99-printf]] |
| ==== Differences Between OpenCL C and C99 printf |
| |
| * The *l* modifier followed by a *c* conversion specifier or *s* |
| conversion specifier is not supported by OpenCL C. |
| * The *ll*, *j*, *z*, *t*, and *L* length modifiers are not supported by |
| OpenCL C but are reserved. |
| * The *n* conversion specifier is not supported by OpenCL C but is |
| reserved. |
| * OpenCL C adds the optional *v*__n__ vector specifier to support printing |
| of vector types. |
| * The conversion specifiers *f*, *F*, *e*, *E*, *g*, *G*, *a*, *A* convert |
| a `float` argument to a `double` only if the `double` data type is |
| supported. |
| Refer to the value of the <<opencl-device-queries, |
| `CL_DEVICE_DOUBLE_FP_CONFIG` device query>>. |
| If the `double` data type is not supported, the argument will be a |
| `float` instead of a `double`. |
| * For the embedded profile, the *l* length modifier is supported only if |
| 64-bit integers are supported. |
| * In OpenCL C, *printf* returns 0 if it was executed successfully and -1 |
| otherwise, whereas in C99 *printf* returns the number of characters |
| printed or a negative value if an output or encoding error occurred. |
| * In OpenCL C, the conversion specifier *s* can only be used for arguments |
| that are literal strings. |
| |
| |
| [[image-read-and-write-functions]] |
| === Image Read and Write Functions |
| |
| The built-in functions defined in this section can only be used with image |
| memory objects. |
| An image memory object can be accessed by specific function calls that read |
| from and/or write to specific locations in the image. |
| |
| Support for the image built-in functions is optional. |
| If a device supports images, then the value of the <<opencl-device-queries, |
| `CL_DEVICE_IMAGE_SUPPORT` device query>> is `CL_TRUE` and the OpenCL C |
| compiler for that device must define the `+__IMAGE_SUPPORT__+` macro. |
| A compiler for OpenCL C 3.0 or newer for that device must also support the |
| {opencl_c_images} feature. |
| |
| Image memory objects that are being read by a kernel should be declared with |
| the `read_only` qualifier. |
| *write_image* calls to image memory objects declared with the `read_only` |
| qualifier will generate a compilation error. |
| Image memory objects that are being written to by a kernel should be |
| declared with the `write_only` qualifier. |
| *read_image* calls to image memory objects declared with the `write_only` |
| qualifier will generate a compilation error. |
| *read_image* and *write_image* calls to the same image memory object in a |
| kernel are supported. |
| Image memory objects that are being read and written by a kernel should be |
| declared with the `read_write` qualifier. |
| |
| The *read_image* calls return a four-component floating-point, integer, or |
| unsigned integer color value. |
| The color values returned by *read_image* are identified as _x_, _y_, _z_, |
| _w_ where _x_ refers to the red component, _y_ refers to the green |
| component, _z_ refers to the blue component and _w_ refers to the alpha |
| component. |
| |
| |
| [[samplers]] |
| ==== Samplers |
| |
| [open,refpage='samplers',desc='Image Samplers',type='freeform',spec='clang',anchor='samplers',alias='sampler_t'] |
| -- |
| |
| The image read functions take a sampler argument. |
| The sampler can be passed as an argument to the kernel using |
| *clSetKernelArg*, or can be declared in the outermost scope of kernel |
| functions, or it can be a constant variable of type `sampler_t` declared in |
| the program source. |
| |
| Sampler variables in a program are declared to be of type `sampler_t`. |
| A variable of `sampler_t` type declared in the program source must be |
| initialized with a 32-bit unsigned integer constant, which is interpreted as |
| a bit-field specifying the following properties: |
| |
| * Addressing Mode |
| * Filter Mode |
| * Normalized Coordinates |
| |
| These properties control how elements of an image object are read by |
| *read_image{f|i|ui}*. |
| |
| Samplers can also be declared as global constants in the program source |
| using the following syntax. |
| |
| [source,opencl_c] |
| ---------- |
| const sampler_t <sampler name> = <value>; |
| ---------- |
| |
| or |
| |
| [source,opencl_c] |
| ---------- |
| constant sampler_t <sampler name> = <value>; |
| ---------- |
| |
| or |
| |
| [source,opencl_c] |
| ---------- |
| __constant sampler_t <sampler name> = <value>; |
| ---------- |
| |
| Note that samplers declared using the `constant` qualifier are not counted |
| towards the maximum number of arguments pointing to the constant address |
| space or the maximum size of the `constant` address space allowed per device |
| (i.e. the value of the <<opencl-device-queries, |
| `CL_DEVICE_MAX_CONSTANT_ARGS`>> and <<opencl-device-queries, |
| `CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE`>> device queries). |
| |
| The sampler fields are described in the following table. |
| |
| [[table-sampler-descriptor]] |
| .Sampler Descriptor |
| [cols=",",options="header",] |
| |==== |
| | Sampler State | Description |
| | `<normalized coords>` |
| | Specifies whether the _x_, _y_ and _z_ coordinates are passed in as |
| normalized or unnormalized values. |
| This must be a literal value and can be one of the following |
| predefined enums: |
| |
| `CLK_NORMALIZED_COORDS_TRUE` or `CLK_NORMALIZED_COORDS_FALSE`. |
| |
| All samplers used with a given image in calls to |
| *read_image{f\|i\|ui}* within a kernel must use the same value for |
| `<normalized coords>`. |
| | `<addressing mode>` |
| | Specifies the image addressing mode, i.e. how out-of-range image |
| coordinates are handled. |
| This must be a literal value and can be one of the following |
| predefined enums: |
| |
| `CLK_ADDRESS_MIRRORED_REPEAT` - Flip the image coordinate at every |
| integer junction. |
| This addressing mode can only be used with normalized coordinates. |
| If normalized coordinates are not used, this addressing mode may |
| generate image coordinates that are undefined. |
| |
| `CLK_ADDRESS_REPEAT` - out-of-range image coordinates are wrapped to |
| the valid range. |
| This addressing mode can only be used with normalized coordinates. |
| If normalized coordinates are not used, this addressing mode may |
| generate image coordinates that are undefined. |
| |
| `CLK_ADDRESS_CLAMP_TO_EDGE` - out-of-range image coordinates are |
| clamped to the extent. |
| |
| `CLK_ADDRESS_CLAMP` - out-of-range image coordinates will return a |
| border color footnote:[{fn-CLK_ADDRESS_CLAMP}]. |
| |
| `CLK_ADDRESS_NONE` - for this addressing mode the programmer |
| guarantees that the image coordinates used to sample elements of the |
| image refer to a location inside the image; otherwise the results are |
| undefined. |
| |
| For 1D and 2D image arrays, the addressing mode applies only to the |
| _x_ and (_x, y_) coordinates. |
| The addressing mode for the coordinate which specifies the array index |
| is always `CLK_ADDRESS_CLAMP_TO_EDGE`. |
| |
| | `<filter mode>` |
| | Specifies the filter mode to use. |
| This must be a literal value and can be one of the following |
| predefined enums: `CLK_FILTER_NEAREST` or `CLK_FILTER_LINEAR`. |
| |
| Refer to the <<addressing-and-filter-modes,detailed description of |
| these filter modes>>. |
| |==== |
| |
| *Examples*: |
| |
| [source,opencl_c] |
| ---------- |
| const sampler_t samplerA = CLK_NORMALIZED_COORDS_TRUE | |
| CLK_ADDRESS_REPEAT | |
| CLK_FILTER_NEAREST; |
| ---------- |
| |
| `samplerA` specifies a sampler that uses normalized coordinates, the repeat |
| addressing mode and a nearest filter. |
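| |
| The addressing modes decide which texel a normalized coordinate selects. The |
| following host-side C sketch assumes nearest filtering and the nearest-filter |
| formulas of the OpenCL API specification's addressing and filtering rules; |
| the function names are hypothetical, and _w_ is the image size in texels |
| along the coordinate's dimension. |
| |
| [source,c] |
| ---------- |
| #include <assert.h> |
| #include <math.h> |
| |
| /* CLK_ADDRESS_REPEAT: wrap the normalized coordinate s into [0, 1). */ |
| static int index_repeat(float s, int w) |
| { |
|     float u; |
|     int i; |
| |
|     assert(w > 0); |
|     u = (s - floorf(s)) * (float)w; |
|     i = (int)floorf(u); |
|     if (i > w - 1)      /* s - floor(s) can round up to 1.0f */ |
|         i = i - w; |
|     return i; |
| } |
| |
| /* CLK_ADDRESS_MIRRORED_REPEAT: reflect s at every integer. */ |
| static int index_mirrored_repeat(float s, int w) |
| { |
|     float sp; |
|     int i; |
| |
|     assert(w > 0); |
|     sp = 2.0f * rintf(0.5f * s); /* nearest even integer to s */ |
|     sp = fabsf(s - sp);          /* distance from it, in [0, 1] */ |
|     i = (int)floorf(sp * (float)w); |
|     return i < w - 1 ? i : w - 1; |
| } |
| |
| /* CLK_ADDRESS_CLAMP_TO_EDGE: clamp the texel index to [0, w - 1]. */ |
| static int index_clamp_to_edge(float s, int w) |
| { |
|     int i; |
| |
|     assert(w > 0); |
|     i = (int)floorf(s * (float)w); |
|     if (i < 0) |
|         i = 0; |
|     if (i > w - 1) |
|         i = w - 1; |
|     return i; |
| } |
| ---------- |
| |
| With `samplerA`, which uses `CLK_ADDRESS_REPEAT`, a normalized coordinate of |
| 1.25 on an image 4 texels wide selects the same texel as 0.25 (index 1), |
| whereas `CLK_ADDRESS_MIRRORED_REPEAT` maps 1.25 to index 3. |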
| |
| The maximum number of samplers that can be declared in a kernel can be |
| queried using the `CL_DEVICE_MAX_SAMPLERS` token in *clGetDeviceInfo*. |
| -- |
| |
| |
| [[determining-the-border-color-or-value]] |
| ===== *Determining the Border Color or Value* |
| |
| If `<addressing mode>` in sampler is `CLK_ADDRESS_CLAMP`, then out-of-range |
| image coordinates return the border color. |
| The border color selected depends on the image channel order and can be one |
| of the following values: |
| |
| * If the image channel order is `CL_A`, `CL_INTENSITY`, `CL_Rx`, |
| `CL_RA`, `CL_RGx`, `CL_RGBx`, `CL_sRGBx`, `CL_ARGB`, `CL_BGRA`, |
| `CL_ABGR`, `CL_RGBA`, `CL_sRGBA` or `CL_sBGRA`, the border color is |
| `(0.0f, 0.0f, 0.0f, 0.0f)`. |
| * If the image channel order is `CL_R`, `CL_RG`, `CL_RGB`, or |
| `CL_LUMINANCE`, the border color is `(0.0f, 0.0f, 0.0f, 1.0f)`. |
| * If the image channel order is `CL_DEPTH`, the border value is `0.0f`. |
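| |
| These rules form a lookup keyed by channel order. In this host-side C sketch, |
| channel orders are passed as strings, `border_color` and `struct rgba` are |
| hypothetical, and the single `CL_DEPTH` border value is stored in the `r` |
| field. |
| |
| [source,c] |
| ---------- |
| #include <stddef.h> |
| #include <string.h> |
| |
| struct rgba { float r, g, b, a; }; |
| |
| /* Border color for CLK_ADDRESS_CLAMP, selected by channel order name. |
|  * Returns 1 and sets *out for the orders listed above, 0 otherwise. */ |
| static int border_color(const char *order, struct rgba *out) |
| { |
|     static const char *const opaque_black[] = { |
|         "CL_R", "CL_RG", "CL_RGB", "CL_LUMINANCE", |
|     }; |
|     static const char *const transparent_black[] = { |
|         "CL_A", "CL_INTENSITY", "CL_Rx", "CL_RA", "CL_RGx", "CL_RGBx", |
|         "CL_sRGBx", "CL_ARGB", "CL_BGRA", "CL_ABGR", "CL_RGBA", "CL_sRGBA", |
|         "CL_sBGRA", "CL_DEPTH", /* CL_DEPTH: border value 0.0f in out->r */ |
|     }; |
|     size_t i; |
| |
|     for (i = 0; i < sizeof opaque_black / sizeof opaque_black[0]; ++i) |
|         if (strcmp(order, opaque_black[i]) == 0) { |
|             *out = (struct rgba){0.0f, 0.0f, 0.0f, 1.0f}; |
|             return 1; |
|         } |
|     for (i = 0; i < sizeof transparent_black / sizeof transparent_black[0]; ++i) |
|         if (strcmp(order, transparent_black[i]) == 0) { |
|             *out = (struct rgba){0.0f, 0.0f, 0.0f, 0.0f}; |
|             return 1; |
|         } |
|     return 0; |
| } |
| ---------- |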
| |
| |
| [[srgb-images]] |
| ===== *sRGB Images* |
| |
| The built-in image read functions will perform sRGB to linear RGB |
| conversions if the image is an sRGB image. |
| Likewise, the built-in image write functions perform the linear to |
| sRGB conversion if the image is an sRGB image. |
| |
| Only the R, G, and B components are converted between sRGB and linear RGB. |
| The alpha component is passed through unchanged. |
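| |
| This section does not state the conversion formulas. The C sketch below |
| assumes the standard sRGB transfer functions (IEC 61966-2-1); the exact |
| formulas and accuracy an implementation uses may differ. |
| `srgb_to_linear`, `linear_to_srgb`, `read_srgb_texel`, and |
| `write_srgb_texel` are hypothetical names, and a texel is modelled as four |
| `float` values in R, G, B, A order. |
| |
| [source,c] |
| ---------- |
| #include <math.h> |
| |
| /* Assumed standard sRGB transfer functions for a value in [0, 1]. */ |
| static float srgb_to_linear(float c) |
| { |
|     return c <= 0.04045f ? c / 12.92f |
|                          : powf((c + 0.055f) / 1.055f, 2.4f); |
| } |
| |
| static float linear_to_srgb(float c) |
| { |
|     return c <= 0.0031308f ? c * 12.92f |
|                            : 1.055f * powf(c, 1.0f / 2.4f) - 0.055f; |
| } |
| |
| /* Read from an sRGB image: R, G, B are converted, alpha passes through. */ |
| static void read_srgb_texel(const float in[4], float out[4]) |
| { |
|     for (int k = 0; k < 3; ++k) |
|         out[k] = srgb_to_linear(in[k]); |
|     out[3] = in[3]; |
| } |
| |
| /* Write to an sRGB image: the inverse conversion, alpha again unchanged. */ |
| static void write_srgb_texel(const float in[4], float out[4]) |
| { |
|     for (int k = 0; k < 3; ++k) |
|         out[k] = linear_to_srgb(in[k]); |
|     out[3] = in[3]; |
| } |
| ---------- |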
| |
| |
| [[built-in-image-read-functions]] |
| ==== Built-in Image Read Functions |
| |
| [open,refpage='imageReadFunctions',desc='Built-in Image Read Functions',type='freeform',spec='clang',anchor='built-in-image-read-functions',xrefs='imageQueryFunctions imageSamplerlessReadFunctions imageWriteFunctions',alias='read_imagef read_imagei read_imageui'] |
| -- |
| The following built-in function calls to read images with a sampler are |
| supported footnote:[{fn-read-image-with-sampler}]. |
| |
| ifdef::cl_khr_mipmap_image[] |
| If the `<<cl_khr_mipmap_image>>` extension macro is supported, read |
| functions which do not either |
| |
| * explicitly specify a level of detail _lod_, or |
| * compute a level of detail from _gradient_ parameters |
| |
| read from mip level 0 if _image_ is a mipmapped image. |
| endif::cl_khr_mipmap_image[] |
| |
| [[table-image-read]] |
| .Built-in Image Read Functions |
| [cols=",",options="header",] |
| |==== |
| | Function | Description |
| | float4 *read_imagef*(read_only image2d_t _image_, sampler_t _sampler_, |
| int2 _coord_) + |
| float4 *read_imagef*(read_only image2d_t _image_, sampler_t _sampler_, |
| float2 _coord_) |
| | Use the coordinate (_coord.x_, _coord.y_) to do an element lookup in |
| the 2D image object specified by _image_. |
| |
| *read_imagef* returns floating-point values in the range [0.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to one of |
| the pre-defined packed formats, `CL_UNORM_INT8`, or |
| `CL_UNORM_INT16`. |
| |
| *read_imagef* returns floating-point values in the range [-1.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to |
| `CL_SNORM_INT8` or `CL_SNORM_INT16`. |
| |
| *read_imagef* returns floating-point values for image objects created |
| with _image_channel_data_type_ set to `CL_HALF_FLOAT` or `CL_FLOAT`. |
| |
| The *read_imagef* calls that take integer coordinates must use a |
| sampler with filter mode set to `CLK_FILTER_NEAREST`, normalized |
| coordinates set to `CLK_NORMALIZED_COORDS_FALSE` and addressing mode |
| set to `CLK_ADDRESS_CLAMP_TO_EDGE`, `CLK_ADDRESS_CLAMP` or |
| `CLK_ADDRESS_NONE`; otherwise the values returned are undefined. |
| |
| Values returned by *read_imagef* for image objects with |
| _image_channel_data_type_ values not specified in the description |
| above are undefined. |
| ifdef::cl_khr_fp16[] |
| | | |
| | half4 *read_imageh*(read_only image2d_t _image_, sampler_t _sampler_, |
| int2 _coord_) + |
| half4 *read_imageh*(read_only image2d_t _image_, sampler_t _sampler_, |
| float2 _coord_) |
| | Use the coordinate (_coord.x_, _coord.y_) to do an element lookup in the |
| 2D image object specified by _image_. |
| |
| *read_imageh* returns half-precision floating-point values in the |
| range [0.0, 1.0] for image objects created with |
| _image_channel_data_type_ set to one of the pre-defined packed |
| formats, `CL_UNORM_INT8`, or `CL_UNORM_INT16`. |
| |
| *read_imageh* returns half-precision floating-point values in the |
| range [-1.0, 1.0] for image objects created with |
| _image_channel_data_type_ set to `CL_SNORM_INT8` or `CL_SNORM_INT16`. |
| |
| *read_imageh* returns half-precision floating-point values for image |
| objects created with _image_channel_data_type_ set to `CL_HALF_FLOAT`. |
| |
| The *read_imageh* calls that take integer coordinates must use a |
| sampler with filter mode set to `CLK_FILTER_NEAREST`, normalized |
| coordinates set to `CLK_NORMALIZED_COORDS_FALSE` and addressing mode |
| set to `CLK_ADDRESS_CLAMP_TO_EDGE`, `CLK_ADDRESS_CLAMP` or |
| `CLK_ADDRESS_NONE`; otherwise the values returned are undefined. |
| |
| Values returned by *read_imageh* for image objects with |
| _image_channel_data_type_ values not specified in the description |
| above are undefined. |
| |
| <<unified-spec, Requires>> support for the `<<cl_khr_fp16>>` extension |
| macro. |
| endif::cl_khr_fp16[] |
| | | |
| | int4 *read_imagei*(read_only image2d_t _image_, sampler_t _sampler_, |
| int2 _coord_) + |
| int4 *read_imagei*(read_only image2d_t _image_, sampler_t _sampler_, |
| float2 _coord_) + |
| uint4 *read_imageui*(read_only image2d_t _image_, sampler_t _sampler_, |
| int2 _coord_) + |
| uint4 *read_imageui*(read_only image2d_t _image_, sampler_t _sampler_, |
| float2 _coord_) |
| | Use the coordinate (_coord.x_, _coord.y_) to do an element lookup in |
| the 2D image object specified by _image_. |
| |
| *read_imagei* and *read_imageui* return unnormalized signed integer |
| and unsigned integer values respectively. |
| Each channel will be stored in a 32-bit integer. |
| |
| *read_imagei* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_SIGNED_INT8`, + |
| `CL_SIGNED_INT16` and + |
| `CL_SIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imagei* are undefined. |
| |
| *read_imageui* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_UNSIGNED_INT8`, + |
| `CL_UNSIGNED_INT16` and + |
| `CL_UNSIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imageui* are undefined. |
| |
| The *read_image{i\|ui}* calls support a nearest filter only. |
| The filter_mode specified in _sampler_ must be set to |
| `CLK_FILTER_NEAREST`; otherwise the values returned are undefined. |
| |
| Furthermore, the *read_image{i\|ui}* calls that take integer |
| coordinates must use a sampler with normalized coordinates set to |
| `CLK_NORMALIZED_COORDS_FALSE` and addressing mode set to |
| `CLK_ADDRESS_CLAMP_TO_EDGE`, `CLK_ADDRESS_CLAMP` or |
| `CLK_ADDRESS_NONE`; otherwise the values returned are undefined. |
| | | |
| | float4 *read_imagef*(read_only image3d_t _image_, sampler_t _sampler_, |
| int4 _coord_) + |
| float4 *read_imagef*(read_only image3d_t _image_, sampler_t _sampler_, |
| float4 _coord_) |
| | Use the coordinate (_coord.x_, _coord.y_, _coord.z_) to do an element |
| lookup in the 3D image object specified by _image_. |
| _coord.w_ is ignored. |
| |
| *read_imagef* returns floating-point values in the range [0.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to one of |
| the pre-defined packed formats, `CL_UNORM_INT8`, or `CL_UNORM_INT16`. |
| |
| *read_imagef* returns floating-point values in the range [-1.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to |
| `CL_SNORM_INT8` or `CL_SNORM_INT16`. |
| |
| *read_imagef* returns floating-point values for image objects created |
| with _image_channel_data_type_ set to `CL_HALF_FLOAT` or `CL_FLOAT`. |
| |
| The *read_imagef* calls that take integer coordinates must use a |
| sampler with filter mode set to `CLK_FILTER_NEAREST`, normalized |
| coordinates set to `CLK_NORMALIZED_COORDS_FALSE` and addressing mode |
| set to `CLK_ADDRESS_CLAMP_TO_EDGE`, `CLK_ADDRESS_CLAMP` or |
| `CLK_ADDRESS_NONE`; otherwise the values returned are undefined. |
| |
| Values returned by *read_imagef* for image objects with |
| _image_channel_data_type_ values not specified in the description are |
| undefined. |
| ifdef::cl_khr_fp16[] |
| | | |
| | half4 *read_imageh*(read_only image3d_t _image_, sampler_t _sampler_, |
| int4 _coord_) + |
| half4 *read_imageh*(read_only image3d_t _image_, sampler_t _sampler_, |
| float4 _coord_) |
| | Use the coordinate (_coord.x_, _coord.y_, _coord.z_) to do an element |
| lookup in the 3D image object specified by _image_. |
| _coord.w_ is ignored. |
| |
| *read_imageh* returns half-precision floating-point values in the |
| range [0.0, 1.0] for image objects created with |
| _image_channel_data_type_ set to one of the pre-defined packed formats, |
| `CL_UNORM_INT8`, or `CL_UNORM_INT16`. |
| |
| *read_imageh* returns half-precision floating-point values in the |
| range [-1.0, 1.0] for image objects created with |
| _image_channel_data_type_ set to `CL_SNORM_INT8` or `CL_SNORM_INT16`. |
| |
| *read_imageh* returns half-precision floating-point values for image |
| objects created with _image_channel_data_type_ set to `CL_HALF_FLOAT`. |
| |
| The *read_imageh* calls that take integer coordinates must use a |
| sampler with filter mode set to `CLK_FILTER_NEAREST`, normalized |
| coordinates set to `CLK_NORMALIZED_COORDS_FALSE` and addressing mode |
| set to `CLK_ADDRESS_CLAMP_TO_EDGE`, `CLK_ADDRESS_CLAMP` or |
| `CLK_ADDRESS_NONE`; otherwise the values returned are undefined. |
| |
| Values returned by *read_imageh* for image objects with |
| _image_channel_data_type_ values not specified in the description are |
| undefined. |
| |
| <<unified-spec, Requires>> support for the `<<cl_khr_fp16>>` extension |
| macro. |
| endif::cl_khr_fp16[] |
| | | |
| | int4 *read_imagei*(read_only image3d_t _image_, sampler_t _sampler_, |
| int4 _coord_) + |
| int4 *read_imagei*(read_only image3d_t _image_, sampler_t _sampler_, |
| float4 _coord_) + |
| uint4 *read_imageui*(read_only image3d_t _image_, sampler_t _sampler_, |
| int4 _coord_) + |
| uint4 *read_imageui*(read_only image3d_t _image_, sampler_t _sampler_, |
| float4 _coord_) |
| | Use the coordinate (_coord.x_, _coord.y_, _coord.z_) to do an element |
| lookup in the 3D image object specified by _image_. |
| _coord.w_ is ignored. |
| |
| *read_imagei* and *read_imageui* return unnormalized signed integer |
| and unsigned integer values respectively. |
| Each channel will be stored in a 32-bit integer. |
| |
| *read_imagei* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_SIGNED_INT8`, + |
| `CL_SIGNED_INT16` and + |
| `CL_SIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imagei* are undefined. |
| |
| *read_imageui* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_UNSIGNED_INT8`, + |
| `CL_UNSIGNED_INT16` and + |
| `CL_UNSIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imageui* are undefined. |
| |
| The *read_image{i\|ui}* calls support a nearest filter only. |
| The filter_mode specified in _sampler_ must be set to |
| `CLK_FILTER_NEAREST`; otherwise the values returned are undefined. |
| |
| Furthermore, the *read_image{i\|ui}* calls that take integer |
| coordinates must use a sampler with normalized coordinates set to |
| `CLK_NORMALIZED_COORDS_FALSE` and addressing mode set to |
| `CLK_ADDRESS_CLAMP_TO_EDGE`, `CLK_ADDRESS_CLAMP` or |
| `CLK_ADDRESS_NONE`; otherwise the values returned are undefined. |
| | | |
| | float4 *read_imagef*(read_only image2d_array_t _image_, |
| sampler_t _sampler_, int4 _coord_) + |
| float4 *read_imagef*(read_only image2d_array_t _image_, |
| sampler_t _sampler_, float4 _coord_) |
| | Use _coord.xy_ to do an element lookup in the 2D image identified by |
| _coord.z_ in the 2D image array specified by _image_. |
| |
| *read_imagef* returns floating-point values in the range [0.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to one of |
| the pre-defined packed formats, `CL_UNORM_INT8`, or `CL_UNORM_INT16`. |
| |
| *read_imagef* returns floating-point values in the range [-1.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to |
| `CL_SNORM_INT8` or `CL_SNORM_INT16`. |
| |
| *read_imagef* returns floating-point values for image objects created |
| with _image_channel_data_type_ set to `CL_HALF_FLOAT` or `CL_FLOAT`. |
| |
| The *read_imagef* calls that take integer coordinates must use a |
| sampler with filter mode set to `CLK_FILTER_NEAREST`, normalized |
| coordinates set to `CLK_NORMALIZED_COORDS_FALSE` and addressing mode |
| set to `CLK_ADDRESS_CLAMP_TO_EDGE`, `CLK_ADDRESS_CLAMP` or |
| `CLK_ADDRESS_NONE`; otherwise the values returned are undefined. |
| |
| Values returned by *read_imagef* for image objects with |
| _image_channel_data_type_ values not specified in the description above |
| are undefined. |
| ifdef::cl_khr_fp16[] |
| | | |
| | half4 *read_imageh*(read_only image2d_array_t _image_, sampler_t |
| _sampler_, int4 _coord_) + |
| half4 *read_imageh*(read_only image2d_array_t _image_, sampler_t |
| _sampler_, float4 _coord_) |
| | Use _coord.xy_ to do an element lookup in the 2D image identified by |
| _coord.z_ in the 2D image array specified by _image_. |
| |
| *read_imageh* returns half-precision floating-point values in the |
| range [0.0, 1.0] for image objects created with |
| _image_channel_data_type_ set to one of the pre-defined packed formats, |
| `CL_UNORM_INT8`, or `CL_UNORM_INT16`. |
| |
| *read_imageh* returns half-precision floating-point values in the |
| range [-1.0, 1.0] for image objects created with |
| _image_channel_data_type_ set to `CL_SNORM_INT8` or `CL_SNORM_INT16`. |
| |
| *read_imageh* returns half-precision floating-point values for image |
| objects created with _image_channel_data_type_ set to `CL_HALF_FLOAT`. |
| |
| The *read_imageh* calls that take integer coordinates must use a |
| sampler with filter mode set to `CLK_FILTER_NEAREST`, normalized |
| coordinates set to `CLK_NORMALIZED_COORDS_FALSE` and addressing mode |
| set to `CLK_ADDRESS_CLAMP_TO_EDGE`, `CLK_ADDRESS_CLAMP` or |
| `CLK_ADDRESS_NONE`; otherwise the values returned are undefined. |
| |
| Values returned by *read_imageh* for image objects with |
| _image_channel_data_type_ values not specified in the description above |
| are undefined. |
| |
| <<unified-spec, Requires>> support for the `<<cl_khr_fp16>>` extension |
| macro. |
| endif::cl_khr_fp16[] |
| | | |
| | int4 *read_imagei*(read_only image2d_array_t _image_, sampler_t _sampler_, |
| int4 _coord_) + |
| int4 *read_imagei*(read_only image2d_array_t _image_, sampler_t _sampler_, |
| float4 _coord_) + |
| uint4 *read_imageui*(read_only image2d_array_t _image_, |
| sampler_t _sampler_, int4 _coord_) + |
| uint4 *read_imageui*(read_only image2d_array_t _image_, |
| sampler_t _sampler_, float4 _coord_) |
| | Use _coord.xy_ to do an element lookup in the 2D image identified by |
| _coord.z_ in the 2D image array specified by _image_. |
| |
| *read_imagei* and *read_imageui* return unnormalized signed integer |
| and unsigned integer values respectively. |
| Each channel will be stored in a 32-bit integer. |
| |
| *read_imagei* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_SIGNED_INT8`, + |
| `CL_SIGNED_INT16` and + |
| `CL_SIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imagei* are undefined. |
| |
| *read_imageui* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_UNSIGNED_INT8`, + |
| `CL_UNSIGNED_INT16` and + |
| `CL_UNSIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imageui* are undefined. |
| |
| The *read_image{i\|ui}* calls support a nearest filter only. |
| The filter_mode specified in _sampler_ must be set to |
| `CLK_FILTER_NEAREST`; otherwise the values returned are undefined. |
| |
| Furthermore, the *read_image{i\|ui}* calls that take integer |
| coordinates must use a sampler with normalized coordinates set to |
| `CLK_NORMALIZED_COORDS_FALSE` and addressing mode set to |
| `CLK_ADDRESS_CLAMP_TO_EDGE`, `CLK_ADDRESS_CLAMP` or |
| `CLK_ADDRESS_NONE`; otherwise the values returned are undefined. |
| | | |
| | float4 *read_imagef*(read_only image1d_t _image_, sampler_t _sampler_, |
| int _coord_) + |
| float4 *read_imagef*(read_only image1d_t _image_, sampler_t _sampler_, |
| float _coord_) |
| | Use _coord_ to do an element lookup in the 1D image object specified |
| by _image_. |
| |
| *read_imagef* returns floating-point values in the range [0.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to one of |
| the pre-defined packed formats, `CL_UNORM_INT8`, or `CL_UNORM_INT16`. |
| |
| *read_imagef* returns floating-point values in the range [-1.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to |
| `CL_SNORM_INT8` or `CL_SNORM_INT16`. |
| |
| *read_imagef* returns floating-point values for image objects created |
| with _image_channel_data_type_ set to `CL_HALF_FLOAT` or `CL_FLOAT`. |
| |
| The *read_imagef* calls that take integer coordinates must use a |
| sampler with filter mode set to `CLK_FILTER_NEAREST`, normalized |
| coordinates set to `CLK_NORMALIZED_COORDS_FALSE` and addressing mode |
| set to `CLK_ADDRESS_CLAMP_TO_EDGE`, `CLK_ADDRESS_CLAMP` or |
| `CLK_ADDRESS_NONE`; otherwise the values returned are undefined. |
| |
| Values returned by *read_imagef* for image objects with |
| _image_channel_data_type_ values not specified in the description |
| above are undefined. |
| |
| <<unified-spec, Requires>> support for OpenCL C 1.2 or newer. |
| ifdef::cl_khr_fp16[] |
| | | |
| | half4 *read_imageh*(read_only image1d_t _image_, sampler_t _sampler_, |
| int _coord_) + |
| half4 *read_imageh*(read_only image1d_t _image_, sampler_t _sampler_, |
| float _coord_) |
| | Use _coord_ to do an element lookup in the 1D image object specified |
| by _image_. |
| |
| *read_imageh* returns half-precision floating-point values in the |
| range [0.0, 1.0] for image objects created with |
| _image_channel_data_type_ set to one of the pre-defined packed formats, |
| `CL_UNORM_INT8`, or `CL_UNORM_INT16`. |
| |
| *read_imageh* returns half-precision floating-point values in the |
| range [-1.0, 1.0] for image objects created with |
| _image_channel_data_type_ set to `CL_SNORM_INT8` or `CL_SNORM_INT16`. |
| |
| *read_imageh* returns half-precision floating-point values for image |
| objects created with _image_channel_data_type_ set to `CL_HALF_FLOAT`. |
| |
| The *read_imageh* calls that take integer coordinates must use a |
| sampler with filter mode set to `CLK_FILTER_NEAREST`, normalized |
| coordinates set to `CLK_NORMALIZED_COORDS_FALSE` and addressing mode |
| set to `CLK_ADDRESS_CLAMP_TO_EDGE`, `CLK_ADDRESS_CLAMP` or |
| `CLK_ADDRESS_NONE`; otherwise the values returned are undefined. |
| |
| Values returned by *read_imageh* for image objects with |
| _image_channel_data_type_ values not specified in the description |
| above are undefined. |
| |
| <<unified-spec, Requires>> support for the `<<cl_khr_fp16>>` extension |
| macro. |
| endif::cl_khr_fp16[] |
| | | |
| | int4 *read_imagei*(read_only image1d_t _image_, sampler_t _sampler_, |
| int _coord_) + |
| int4 *read_imagei*(read_only image1d_t _image_, sampler_t _sampler_, |
| float _coord_) + |
| uint4 *read_imageui*(read_only image1d_t _image_, sampler_t _sampler_, |
| int _coord_) + |
| uint4 *read_imageui*(read_only image1d_t _image_, sampler_t _sampler_, |
| float _coord_) |
| | Use _coord_ to do an element lookup in the 1D image object specified |
| by _image_. |
| |
| *read_imagei* and *read_imageui* return unnormalized signed integer |
| and unsigned integer values respectively. |
| Each channel will be stored in a 32-bit integer. |
| |
| *read_imagei* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_SIGNED_INT8`, + |
| `CL_SIGNED_INT16` and + |
| `CL_SIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imagei* are undefined. |
| |
| *read_imageui* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_UNSIGNED_INT8`, + |
| `CL_UNSIGNED_INT16` and + |
| `CL_UNSIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imageui* are undefined. |
| |
| The *read_image{i\|ui}* calls support a nearest filter only. |
| The filter_mode specified in _sampler_ must be set to |
| `CLK_FILTER_NEAREST`; otherwise the values returned are undefined. |
| |
| Furthermore, the *read_image{i\|ui}* calls that take integer |
| coordinates must use a sampler with normalized coordinates set to |
| `CLK_NORMALIZED_COORDS_FALSE` and addressing mode set to |
| `CLK_ADDRESS_CLAMP_TO_EDGE`, `CLK_ADDRESS_CLAMP` or |
| `CLK_ADDRESS_NONE`; otherwise the values returned are undefined. |
| |
| <<unified-spec, Requires>> support for OpenCL C 1.2 or newer. |
| | | |
| | float4 *read_imagef*(read_only image1d_array_t _image_, |
| sampler_t _sampler_, int2 _coord_) + |
| float4 *read_imagef*(read_only image1d_array_t _image_, |
| sampler_t _sampler_, float2 _coord_) |
| | Use _coord.x_ to do an element lookup in the 1D image identified by |
| _coord.y_ in the 1D image array specified by _image_. |
| |
| *read_imagef* returns floating-point values in the range [0.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to one of |
| the pre-defined packed formats, `CL_UNORM_INT8`, or `CL_UNORM_INT16`. |
| |
| *read_imagef* returns floating-point values in the range [-1.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to |
| `CL_SNORM_INT8` or `CL_SNORM_INT16`. |
| |
| *read_imagef* returns floating-point values for image objects created |
| with _image_channel_data_type_ set to `CL_HALF_FLOAT` or `CL_FLOAT`. |
| |
| The *read_imagef* calls that take integer coordinates must use a |
| sampler with filter mode set to `CLK_FILTER_NEAREST`, normalized |
| coordinates set to `CLK_NORMALIZED_COORDS_FALSE` and addressing mode |
| set to `CLK_ADDRESS_CLAMP_TO_EDGE`, `CLK_ADDRESS_CLAMP` or |
| `CLK_ADDRESS_NONE`; otherwise the values returned are undefined. |
| |
| Values returned by *read_imagef* for image objects with |
| _image_channel_data_type_ values not specified in the description above |
| are undefined. |
| |
| <<unified-spec, Requires>> support for OpenCL C 1.2 or newer. |
| ifdef::cl_khr_fp16[] |
| | | |
| | half4 *read_imageh*(read_only image1d_array_t _image_, |
| sampler_t _sampler_, int2 _coord_) + |
| half4 *read_imageh*(read_only image1d_array_t _image_, |
| sampler_t _sampler_, float2 _coord_) |
| | Use _coord.x_ to do an element lookup in the 1D image identified by |
| _coord.y_ in the 1D image array specified by _image_. |
| |
| *read_imageh* returns half-precision floating-point values in the |
| range [0.0, 1.0] for image objects created with |
| _image_channel_data_type_ set to one of the pre-defined packed formats, |
| `CL_UNORM_INT8`, or `CL_UNORM_INT16`. |
| |
| *read_imageh* returns half-precision floating-point values in the |
| range [-1.0, 1.0] for image objects created with |
| _image_channel_data_type_ set to `CL_SNORM_INT8` or `CL_SNORM_INT16`. |
| |
| *read_imageh* returns half-precision floating-point values for image |
| objects created with _image_channel_data_type_ set to `CL_HALF_FLOAT`. |
| |
| The *read_imageh* calls that take integer coordinates must use a |
| sampler with filter mode set to `CLK_FILTER_NEAREST`, normalized |
| coordinates set to `CLK_NORMALIZED_COORDS_FALSE` and addressing mode |
| set to `CLK_ADDRESS_CLAMP_TO_EDGE`, `CLK_ADDRESS_CLAMP` or |
| `CLK_ADDRESS_NONE`; otherwise the values returned are undefined. |
| |
| Values returned by *read_imageh* for image objects with |
| _image_channel_data_type_ values not specified in the description above |
| are undefined. |
| |
| <<unified-spec, Requires>> support for the `<<cl_khr_fp16>>` extension |
| macro. |
| endif::cl_khr_fp16[] |
| | | |
| | int4 *read_imagei*(read_only image1d_array_t _image_, sampler_t _sampler_, |
| int2 _coord_) + |
| int4 *read_imagei*(read_only image1d_array_t _image_, sampler_t _sampler_, |
| float2 _coord_) + |
| uint4 *read_imageui*(read_only image1d_array_t _image_, |
| sampler_t _sampler_, int2 _coord_) + |
| uint4 *read_imageui*(read_only image1d_array_t _image_, |
| sampler_t _sampler_, float2 _coord_) |
| | Use _coord.x_ to do an element lookup in the 1D image identified by |
| _coord.y_ in the 1D image array specified by _image_. |
| |
| *read_imagei* and *read_imageui* return unnormalized signed integer |
| and unsigned integer values respectively. |
| Each channel will be stored in a 32-bit integer. |
| |
| *read_imagei* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_SIGNED_INT8`, + |
| `CL_SIGNED_INT16` and + |
| `CL_SIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imagei* are undefined. |
| |
| *read_imageui* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_UNSIGNED_INT8`, + |
| `CL_UNSIGNED_INT16` and + |
| `CL_UNSIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imageui* are undefined. |
| |
| The *read_image{i\|ui}* calls support a nearest filter only. |
| The filter_mode specified in _sampler_ must be set to |
| `CLK_FILTER_NEAREST`; otherwise the values returned are undefined. |
| |
| Furthermore, the *read_image{i\|ui}* calls that take integer |
| coordinates must use a sampler with normalized coordinates set to |
| `CLK_NORMALIZED_COORDS_FALSE` and addressing mode set to |
| `CLK_ADDRESS_CLAMP_TO_EDGE`, `CLK_ADDRESS_CLAMP` or |
| `CLK_ADDRESS_NONE`; otherwise the values returned are undefined. |
| |
| <<unified-spec, Requires>> support for OpenCL C 1.2 or newer. |
| | | |
| | float *read_imagef*(read_only image2d_depth_t _image_, |
| sampler_t _sampler_, int2 _coord_) + |
| float *read_imagef*(read_only image2d_depth_t _image_, |
| sampler_t _sampler_, float2 _coord_) |
| | Use the coordinate (_coord.x_, _coord.y_) to do an element lookup in |
| the 2D depth image object specified by _image_. |
| |
| *read_imagef* returns a floating-point value in the range [0.0, 1.0] |
| for depth image objects created with _image_channel_data_type_ set to |
| `CL_UNORM_INT16` or `CL_UNORM_INT24`. |
| |
| *read_imagef* returns a floating-point value for depth image objects |
| created with _image_channel_data_type_ set to `CL_FLOAT`. |
| |
| The *read_imagef* calls that take integer coordinates must use a |
| sampler with filter mode set to `CLK_FILTER_NEAREST`, normalized |
| coordinates set to `CLK_NORMALIZED_COORDS_FALSE` and addressing mode |
| set to `CLK_ADDRESS_CLAMP_TO_EDGE`, `CLK_ADDRESS_CLAMP` or |
| `CLK_ADDRESS_NONE`; otherwise the values returned are undefined. |
| |
| Values returned by *read_imagef* for depth image objects with |
| _image_channel_data_type_ values not specified in the description |
| above are undefined. |
| |
| <<unified-spec, Requires>> support for OpenCL C 2.0 or newer, or for |
| the `<<cl_khr_depth_images>>` extension macro. |
| | | |
| | float *read_imagef*(read_only image2d_array_depth_t _image_, |
| sampler_t _sampler_, int4 _coord_) + |
| float *read_imagef*(read_only image2d_array_depth_t _image_, |
| sampler_t _sampler_, float4 _coord_) |
| | Use _coord.xy_ to do an element lookup in the 2D image identified by |
| _coord.z_ in the 2D depth image array specified by _image_. |
| |
| *read_imagef* returns a floating-point value in the range [0.0, 1.0] |
| for depth image objects created with _image_channel_data_type_ set to |
| `CL_UNORM_INT16` or `CL_UNORM_INT24`. |
| |
| *read_imagef* returns a floating-point value for depth image objects |
| created with _image_channel_data_type_ set to `CL_FLOAT`. |
| |
| The *read_imagef* calls that take integer coordinates must use a |
| sampler with filter mode set to `CLK_FILTER_NEAREST`, normalized |
| coordinates set to `CLK_NORMALIZED_COORDS_FALSE` and addressing mode |
| set to `CLK_ADDRESS_CLAMP_TO_EDGE`, `CLK_ADDRESS_CLAMP` or |
| `CLK_ADDRESS_NONE`; otherwise the values returned are undefined. |
| |
| Values returned by *read_imagef* for depth image objects with |
| _image_channel_data_type_ values not specified in the description |
| above are undefined. |
| |
| <<unified-spec, Requires>> support for OpenCL C 2.0 or newer, or for |
| the `<<cl_khr_depth_images>>` extension macro. |
| | | |
| |
| ifdef::cl_khr_mipmap_image[] |
| a| |
| [source,opencl_c] |
| ---- |
| float4 read_imagef( |
| read_only image2d_t image, |
| sampler_t sampler, |
| float2 coord, |
| float lod) |
| |
| int4 read_imagei( |
| read_only image2d_t image, |
| sampler_t sampler, |
| float2 coord, |
| float lod) |
| |
| uint4 read_imageui( |
| read_only image2d_t image, |
| sampler_t sampler, |
| float2 coord, |
| float lod) |
| |
| float read_imagef( |
| read_only image2d_depth_t image, |
| sampler_t sampler, |
| float2 coord, |
| float lod) |
| ---- |
| | Use the coordinate _coord.xy_ to do an element lookup in the mip level |
| specified by _lod_ in the 2D image object specified by _image_. |
| |
| <<unified-spec, Requires>> support for the `<<cl_khr_mipmap_image>>` |
| extension macro. |
| a| |
| [source,opencl_c] |
| ---- |
| float4 read_imagef( |
| read_only image2d_t image, |
| sampler_t sampler, |
| float2 coord, |
| float2 gradient_x, |
| float2 gradient_y) |
| |
| int4 read_imagei( |
| read_only image2d_t image, |
| sampler_t sampler, |
| float2 coord, |
| float2 gradient_x, |
| float2 gradient_y) |
| |
| uint4 read_imageui( |
| read_only image2d_t image, |
| sampler_t sampler, |
| float2 coord, |
| float2 gradient_x, |
| float2 gradient_y) |
| |
| float read_imagef( |
| read_only image2d_depth_t image, |
| sampler_t sampler, |
| float2 coord, |
| float2 gradient_x, |
| float2 gradient_y) |
| ---- |
| | Use the gradients to compute the lod, and use the coordinate _coord.xy_ |
| to do an element lookup in the mip level specified by the computed lod |
| in the 2D image object specified by _image_. |
| |
| <<unified-spec, Requires>> support for the `<<cl_khr_mipmap_image>>` |
| extension macro. |
| a| |
| [source,opencl_c] |
| ---- |
| float4 read_imagef( |
| read_only image1d_t image, |
| sampler_t sampler, |
| float coord, |
| float lod) |
| |
| int4 read_imagei( |
| read_only image1d_t image, |
| sampler_t sampler, |
| float coord, |
| float lod) |
| |
| uint4 read_imageui( |
| read_only image1d_t image, |
| sampler_t sampler, |
| float coord, |
| float lod) |
| ---- |
| | Use the coordinate _coord_ to do an element lookup in the mip level |
| specified by _lod_ in the 1D image object specified by _image_. |
| |
| <<unified-spec, Requires>> support for the `<<cl_khr_mipmap_image>>` |
| extension macro. |
| a| |
| [source,opencl_c] |
| ---- |
| float4 read_imagef( |
| read_only image1d_t image, |
| sampler_t sampler, |
| float coord, |
| float gradient_x, |
| float gradient_y) |
| |
| int4 read_imagei( |
| read_only image1d_t image, |
| sampler_t sampler, |
| float coord, |
| float gradient_x, |
| float gradient_y) |
| |
| uint4 read_imageui( |
| read_only image1d_t image, |
| sampler_t sampler, |
| float coord, |
| float gradient_x, |
| float gradient_y) |
| ---- |
| | Use the gradients to compute the lod, and use the coordinate _coord_ to |
| do an element lookup in the mip level specified by the computed lod in |
| the 1D image object specified by _image_. |
| |
| <<unified-spec, Requires>> support for the `<<cl_khr_mipmap_image>>` |
| extension macro. |
| a| |
| [source,opencl_c] |
| ---- |
| float4 read_imagef( |
| read_only image3d_t image, |
| sampler_t sampler, |
| float4 coord, |
| float lod) |
| |
| int4 read_imagei( |
| read_only image3d_t image, |
| sampler_t sampler, |
| float4 coord, |
| float lod) |
| |
| uint4 read_imageui( |
| read_only image3d_t image, |
| sampler_t sampler, |
| float4 coord, |
| float lod) |
| ---- |
| | Use the coordinate _coord.xyz_ to do an element lookup in the mip |
| level specified by _lod_ in the 3D image object specified by _image_. |
| |
| <<unified-spec, Requires>> support for the `<<cl_khr_mipmap_image>>` |
| extension macro. |
| a| |
| [source,opencl_c] |
| ---- |
| float4 read_imagef( |
| read_only image3d_t image, |
| sampler_t sampler, |
| float4 coord, |
| float4 gradient_x, |
| float4 gradient_y) |
| |
| int4 read_imagei( |
| read_only image3d_t image, |
| sampler_t sampler, |
| float4 coord, |
| float4 gradient_x, |
| float4 gradient_y) |
| |
| uint4 read_imageui( |
| read_only image3d_t image, |
| sampler_t sampler, |
| float4 coord, |
| float4 gradient_x, |
| float4 gradient_y) |
| ---- |
| | Use the gradients to compute the lod, and use the coordinate _coord.xyz_ |
| to do an element lookup in the mip level specified by the computed lod |
| in the 3D image object specified by _image_. |
| |
| <<unified-spec, Requires>> support for the `<<cl_khr_mipmap_image>>` |
| extension macro. |
| a| |
| [source,opencl_c] |
| ---- |
| float4 read_imagef( |
| read_only image1d_array_t image, |
| sampler_t sampler, |
| float2 coord, |
| float lod) |
| |
| int4 read_imagei( |
| read_only image1d_array_t image, |
| sampler_t sampler, |
| float2 coord, |
| float lod) |
| |
| uint4 read_imageui( |
| read_only image1d_array_t image, |
| sampler_t sampler, |
| float2 coord, |
| float lod) |
| ---- |
| | Use the coordinate _coord.x_ to do an element lookup in the 1D image |
| identified by _coord.y_ and mip level specified by _lod_ in the 1D |
| image array specified by _image_. |
| |
| <<unified-spec, Requires>> support for the `<<cl_khr_mipmap_image>>` |
| extension macro. |
| a| |
| [source,opencl_c] |
| ---- |
| float4 read_imagef( |
| read_only image1d_array_t image, |
| sampler_t sampler, |
| float2 coord, |
| float gradient_x, |
| float gradient_y) |
| |
| int4 read_imagei( |
| read_only image1d_array_t image, |
| sampler_t sampler, |
| float2 coord, |
| float gradient_x, |
| float gradient_y) |
| |
| uint4 read_imageui( |
| read_only image1d_array_t image, |
| sampler_t sampler, |
| float2 coord, |
| float gradient_x, |
| float gradient_y) |
| ---- |
| | Use the gradients to compute the lod, and use the coordinate _coord.x_ |
| to do an element lookup in the 1D image identified by _coord.y_ and mip |
| level specified by the computed lod in the 1D image array specified by |
| _image_. |
| |
| <<unified-spec, Requires>> support for the `<<cl_khr_mipmap_image>>` |
| extension macro. |
| a| |
| [source,opencl_c] |
| ---- |
| float4 read_imagef( |
| read_only image2d_array_t image, |
| sampler_t sampler, |
| float4 coord, |
| float lod) |
| |
| int4 read_imagei( |
| read_only image2d_array_t image, |
| sampler_t sampler, |
| float4 coord, |
| float lod) |
| |
| uint4 read_imageui( |
| read_only image2d_array_t image, |
| sampler_t sampler, |
| float4 coord, |
| float lod) |
| |
| float read_imagef( |
| read_only image2d_array_depth_t image, |
| sampler_t sampler, |
| float4 coord, |
| float lod) |
| ---- |
| | Use the coordinate _coord.xy_ to do an element lookup in the 2D image |
| identified by _coord.z_ and mip level specified by _lod_ in the 2D |
| image array specified by _image_. |
| |
| <<unified-spec, Requires>> support for the `<<cl_khr_mipmap_image>>` |
| extension macro. |
| a| |
| [source,opencl_c] |
| ---- |
| float4 read_imagef( |
| read_only image2d_array_t image, |
| sampler_t sampler, |
| float4 coord, |
| float2 gradient_x, |
| float2 gradient_y) |
| |
| int4 read_imagei( |
| read_only image2d_array_t image, |
| sampler_t sampler, |
| float4 coord, |
| float2 gradient_x, |
| float2 gradient_y) |
| |
| uint4 read_imageui( |
| read_only image2d_array_t image, |
| sampler_t sampler, |
| float4 coord, |
| float2 gradient_x, |
| float2 gradient_y) |
| |
| float read_imagef( |
| read_only image2d_array_depth_t image, |
| sampler_t sampler, |
| float4 coord, |
| float2 gradient_x, |
| float2 gradient_y) |
| ---- |
| Use the gradients to compute the lod, and use _coord.xy_ to do an element
lookup in the 2D image identified by _coord.z_ and mip level specified by
the computed lod in the 2D image array specified by _image_.
| |
| <<unified-spec, Requires>> support for the `<<cl_khr_mipmap_image>>` |
| extension macro. |
| endif::cl_khr_mipmap_image[] |
| |
| |==== |
| -- |
| |
| ifdef::cl_khr_mipmap_image[] |
| NOTE: If the `<<cl_khr_mipmap_image>>` extension macro is supported, |
| `CL_SAMPLER_NORMALIZED_COORDS` must be `CL_TRUE` for built-in functions |
| described in the table above that read from a mipmapped image; otherwise |
| behavior is undefined. |
The value specified in the _lod_ argument is clamped to the minimum of
(the actual number of mip levels in the image - 1) and the value specified
for `CL_SAMPLER_LOD_MAX`.
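
The upper clamp described in this note can be sketched in C as follows.
This is an illustrative host-side model only: `clamp_lod_upper` is a hypothetical helper, not an OpenCL C built-in, and its `sampler_lod_max` parameter stands for the value of `CL_SAMPLER_LOD_MAX`.
Any lower bound on _lod_ is not modelled, because this note only states the upper bound.

[source,c]
----
/* Hypothetical helper (not an OpenCL C built-in): clamps an explicit lod
 * to min(num_mip_levels - 1, sampler_lod_max), as described in the note. */
static float clamp_lod_upper(float lod, int num_mip_levels,
                             float sampler_lod_max)
{
    float max_level = (float)(num_mip_levels - 1);
    float upper = max_level < sampler_lod_max ? max_level : sampler_lod_max;
    return lod > upper ? upper : lod;
}
----

For example, with a 4-level mipmapped image and `CL_SAMPLER_LOD_MAX` set to 10.0, an _lod_ of 5.0 is clamped to 3.0.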
| endif::cl_khr_mipmap_image[] |
| |
| |
| [[built-in-image-sampler-less-read-functions]] |
| ==== Built-in Image Sampler-less Read Functions |
| |
| [open,refpage='imageSamplerlessReadFunctions',desc='Built-in Image Sampler-less Read Functions',type='freeform',spec='clang',anchor='built-in-image-sampler-less-read-functions',xrefs='imageQueryFunctions imageReadFunctions imageWriteFunctions'] |
| -- |
| NOTE: Sampler-less image read functions <<unified-spec, require>> support for |
| OpenCL C 1.2 or newer, with some functions requiring support for newer |
| versions of OpenCL C as noted in the <<table-image-samplerless-read, table |
| below>>. |
| |
The sampler-less image read functions behave exactly as the corresponding
<<built-in-image-read-functions,built-in image read functions>> that take
integer coordinates and a sampler with filter mode set to
`CLK_FILTER_NEAREST`, normalized coordinates set to
`CLK_NORMALIZED_COORDS_FALSE`, and addressing mode set to `CLK_ADDRESS_NONE`.
There is one exception, which applies when the _image_channel_data_type_ is
a floating-point type (such as `CL_FLOAT`) and the channel data values are
denormalized: the sampler-less image read function may return the
denormalized data, while the image read function with a sampler argument may
flush the denormalized channel data values to zero.
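
The permitted difference in denormal handling can be modelled on the host.
In the C sketch below, `flush_denorm_to_zero` is a hypothetical helper, not an OpenCL C built-in: it shows what a read with a sampler is allowed (but not required) to do with a denormalized channel value, whereas a sampler-less read may return the same value unchanged.

[source,c]
----
#include <float.h>

/* Hypothetical helper: flushes a denormalized (subnormal) float to a zero
 * of the same sign and returns every other value unchanged. */
static float flush_denorm_to_zero(float v)
{
    /* Nonzero values with magnitude below FLT_MIN are subnormal. */
    if (v != 0.0f && v > -FLT_MIN && v < FLT_MIN)
        return v < 0.0f ? -0.0f : 0.0f;
    return v;
}
----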
| |
| _aQual_ in the following table refers to one of the access qualifiers. |
| For sampler-less read functions this may be `read_only` or `read_write`. |
| |
| [[table-image-samplerless-read]] |
| .Built-in Image Sampler-less Read Functions |
| [cols=",",options="header",] |
| |==== |
| | Function | Description |
| | float4 *read_imagef*(_aQual_ image2d_t _image_, int2 _coord_) |
| | Use the coordinate (_coord.x_, _coord.y_) to do an element lookup in |
| the 2D image object specified by _image_. |
| |
| *read_imagef* returns floating-point values in the range [0.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to one of |
| the pre-defined packed formats or `CL_UNORM_INT8`, or |
| `CL_UNORM_INT16`. |
| |
| *read_imagef* returns floating-point values in the range [-1.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to |
| `CL_SNORM_INT8`, or `CL_SNORM_INT16`. |
| |
| *read_imagef* returns floating-point values for image objects created |
| with _image_channel_data_type_ set to `CL_HALF_FLOAT` or `CL_FLOAT`. |
| |
| Values returned by *read_imagef* for image objects with |
| _image_channel_data_type_ values not specified in the description |
| above are undefined. |
| ifdef::cl_khr_fp16[] |
| | | |
| | half4 *read_imageh*(_aQual_ image2d_t _image_, int2 _coord_) |
| Use the coordinate (_coord.x_, _coord.y_) to do an element lookup in the
| 2D image object specified by _image_. |
| |
| *read_imageh* returns half-precision floating-point values in the |
| range [0.0, 1.0] for image objects created with |
| _image_channel_data_type_ set to one of the pre-defined packed formats |
| or `CL_UNORM_INT8`, or `CL_UNORM_INT16`. |
| |
| *read_imageh* returns half-precision floating-point values in the |
| range [-1.0, 1.0] for image objects created with |
| _image_channel_data_type_ set to `CL_SNORM_INT8`, or `CL_SNORM_INT16`. |
| |
| *read_imageh* returns half-precision floating-point values for image |
| objects created with _image_channel_data_type_ set to `CL_HALF_FLOAT`. |
| |
| Values returned by *read_imageh* for image objects with |
| _image_channel_data_type_ values not specified in the description |
| above are undefined. |
| |
| <<unified-spec, Requires>> support for the `<<cl_khr_fp16>>` extension |
| macro. |
| endif::cl_khr_fp16[] |
| | | |
| | int4 *read_imagei*(_aQual_ image2d_t _image_, int2 _coord_) + |
| uint4 *read_imageui*(_aQual_ image2d_t _image_, int2 _coord_) |
| | Use the coordinate (_coord.x_, _coord.y_) to do an element lookup in |
| the 2D image object specified by _image_. |
| |
| *read_imagei* and *read_imageui* return unnormalized signed integer |
| and unsigned integer values respectively. Each channel will be stored |
| in a 32-bit integer. |
| |
| *read_imagei* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_SIGNED_INT8`, + |
| `CL_SIGNED_INT16` and + |
| `CL_SIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imagei* are undefined. |
| |
| *read_imageui* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_UNSIGNED_INT8`, + |
| `CL_UNSIGNED_INT16` and + |
| `CL_UNSIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imageui* are undefined. |
| | | |
| float4 *read_imagef*(_aQual_ image3d_t _image_, int4 _coord_)
| | Use the coordinate (_coord.x_, _coord.y_, _coord.z_) to do an element |
| lookup in the 3D image object specified by _image_. |
| _coord.w_ is ignored. |
| |
| *read_imagef* returns floating-point values in the range [0.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to one of |
| the pre-defined packed formats or `CL_UNORM_INT8`, or |
| `CL_UNORM_INT16`. |
| |
| *read_imagef* returns floating-point values in the range [-1.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to |
| `CL_SNORM_INT8`, or `CL_SNORM_INT16`. |
| |
| *read_imagef* returns floating-point values for image objects created |
| with _image_channel_data_type_ set to `CL_HALF_FLOAT` or `CL_FLOAT`. |
| |
| Values returned by *read_imagef* for image objects with |
| _image_channel_data_type_ values not specified in the description are |
| undefined. |
| ifdef::cl_khr_fp16[] |
| | | |
| half4 *read_imageh*(_aQual_ image3d_t _image_, int4 _coord_)
| Use the coordinate (_coord.x_, _coord.y_, _coord.z_) to do an element
| lookup in the 3D image object specified by _image_. _coord.w_ is |
| ignored. |
| |
| *read_imageh* returns half-precision floating-point values in the |
| range [0.0, 1.0] for image objects created with |
| _image_channel_data_type_ set to one of the pre-defined packed formats |
| or `CL_UNORM_INT8`, or `CL_UNORM_INT16`. |
| |
| *read_imageh* returns half-precision floating-point values in the |
| range [-1.0, 1.0] for image objects created with |
| _image_channel_data_type_ set to `CL_SNORM_INT8`, or `CL_SNORM_INT16`. |
| |
| *read_imageh* returns half-precision floating-point values for image |
| objects created with _image_channel_data_type_ set to `CL_HALF_FLOAT`. |
| |
| Values returned by *read_imageh* for image objects with |
| _image_channel_data_type_ values not specified in the description are |
| undefined. |
| |
| <<unified-spec, Requires>> support for the `<<cl_khr_fp16>>` extension |
| macro. |
| endif::cl_khr_fp16[] |
| | | |
| | int4 *read_imagei*(_aQual_ image3d_t _image_, int4 _coord_) + |
| uint4 *read_imageui*(_aQual_ image3d_t _image_, int4 _coord_) |
| | Use the coordinate (_coord.x_, _coord.y_, _coord.z_) to do an element |
| lookup in the 3D image object specified by _image_. |
| _coord.w_ is ignored. |
| |
| *read_imagei* and *read_imageui* return unnormalized signed integer |
| and unsigned integer values respectively. |
| Each channel will be stored in a 32-bit integer. |
| |
| *read_imagei* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_SIGNED_INT8`, + |
| `CL_SIGNED_INT16` and + |
| `CL_SIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imagei* are undefined. |
| |
| *read_imageui* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_UNSIGNED_INT8`, + |
| `CL_UNSIGNED_INT16` and + |
| `CL_UNSIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imageui* are undefined. |
| | | |
| | float4 *read_imagef*(_aQual_ image2d_array_t _image_, int4 _coord_) |
| | Use _coord.xy_ to do an element lookup in the 2D image identified by |
| _coord.z_ in the 2D image array specified by _image_. |
| |
| *read_imagef* returns floating-point values in the range [0.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to one of |
| the pre-defined packed formats or `CL_UNORM_INT8`, or |
| `CL_UNORM_INT16`. |
| |
| *read_imagef* returns floating-point values in the range [-1.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to |
| `CL_SNORM_INT8`, or `CL_SNORM_INT16`. |
| |
| *read_imagef* returns floating-point values for image objects created |
| with _image_channel_data_type_ set to `CL_HALF_FLOAT` or `CL_FLOAT`. |
| |
| Values returned by *read_imagef* for image objects with |
| _image_channel_data_type_ values not specified in the description |
| above are undefined. |
| ifdef::cl_khr_fp16[] |
| | | |
| | half4 *read_imageh*(_aQual_ image2d_array_t _image_, int4 _coord_) |
| | Use _coord.xy_ to do an element lookup in the 2D image identified by |
| _coord.z_ in the 2D image array specified by _image_. |
| |
| *read_imageh* returns half-precision floating-point values in the |
| range [0.0, 1.0] for image objects created with |
| _image_channel_data_type_ set to one of the pre-defined packed formats |
| or `CL_UNORM_INT8`, or `CL_UNORM_INT16`. |
| |
| *read_imageh* returns half-precision floating-point values in the |
| range [-1.0, 1.0] for image objects created with |
| _image_channel_data_type_ set to `CL_SNORM_INT8`, or `CL_SNORM_INT16`. |
| |
| *read_imageh* returns half-precision floating-point values for image |
| objects created with _image_channel_data_type_ set to `CL_HALF_FLOAT`. |
| |
| Values returned by *read_imageh* for image objects with |
| _image_channel_data_type_ values not specified in the description |
| above are undefined. |
| |
| <<unified-spec, Requires>> support for the `<<cl_khr_fp16>>` extension |
| macro. |
| endif::cl_khr_fp16[] |
| | | |
| | int4 *read_imagei*(_aQual_ image2d_array_t _image_, int4 _coord_) + |
| uint4 *read_imageui*(_aQual_ image2d_array_t _image_, int4 _coord_) |
| | Use _coord.xy_ to do an element lookup in the 2D image identified by |
| _coord.z_ in the 2D image array specified by _image_. |
| |
| *read_imagei* and *read_imageui* return unnormalized signed integer |
| and unsigned integer values respectively. Each channel will be stored |
| in a 32-bit integer. |
| |
| *read_imagei* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_SIGNED_INT8`, + |
| `CL_SIGNED_INT16` and + |
| `CL_SIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imagei* are undefined. |
| |
| *read_imageui* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_UNSIGNED_INT8`, + |
| `CL_UNSIGNED_INT16` and + |
| `CL_UNSIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imageui* are undefined. |
| | | |
| | float4 *read_imagef*(_aQual_ image1d_t _image_, int _coord_) + |
| float4 *read_imagef*(_aQual_ image1d_buffer_t _image_, int _coord_) |
| | Use _coord_ to do an element lookup in the 1D image or 1D image buffer |
| object specified by _image_. |
| |
| *read_imagef* returns floating-point values in the range [0.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to one of |
| the pre-defined packed formats or `CL_UNORM_INT8`, or |
| `CL_UNORM_INT16`. |
| |
| *read_imagef* returns floating-point values in the range [-1.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to |
| `CL_SNORM_INT8`, or `CL_SNORM_INT16`. |
| |
| *read_imagef* returns floating-point values for image objects created |
| with _image_channel_data_type_ set to `CL_HALF_FLOAT` or `CL_FLOAT`. |
| |
| Values returned by *read_imagef* for image objects with |
| _image_channel_data_type_ values not specified in the description |
| above are undefined. |
| ifdef::cl_khr_fp16[] |
| | | |
| | half4 *read_imageh*(_aQual_ image1d_t _image_, int _coord_) + |
| half4 *read_imageh*(_aQual_ image1d_buffer_t _image_, int _coord_) |
| | Use _coord_ to do an element lookup in the 1D image or 1D image buffer |
| object specified by _image_. |
| |
| *read_imageh* returns half-precision floating-point values in the |
| range [0.0, 1.0] for image objects created with |
| _image_channel_data_type_ set to one of the pre-defined packed formats |
| or `CL_UNORM_INT8`, or `CL_UNORM_INT16`. |
| |
| *read_imageh* returns half-precision floating-point values in the |
| range [-1.0, 1.0] for image objects created with |
| _image_channel_data_type_ set to `CL_SNORM_INT8`, or `CL_SNORM_INT16`. |
| |
| *read_imageh* returns half-precision floating-point values for image |
| objects created with _image_channel_data_type_ set to `CL_HALF_FLOAT`. |
| |
| Values returned by *read_imageh* for image objects with |
| _image_channel_data_type_ values not specified in the description |
| above are undefined. |
| |
| <<unified-spec, Requires>> support for the `<<cl_khr_fp16>>` extension |
| macro. |
| endif::cl_khr_fp16[] |
| | | |
| | int4 *read_imagei*(_aQual_ image1d_t _image_, int _coord_) + |
| uint4 *read_imageui*(_aQual_ image1d_t _image_, int _coord_) + |
| int4 *read_imagei*(_aQual_ image1d_buffer_t _image_, int _coord_) + |
| uint4 *read_imageui*(_aQual_ image1d_buffer_t _image_, int _coord_) |
| | Use _coord_ to do an element lookup in the 1D image or 1D image buffer |
| object specified by _image_. |
| |
| *read_imagei* and *read_imageui* return unnormalized signed integer |
| and unsigned integer values respectively. Each channel will be stored |
| in a 32-bit integer. |
| |
| *read_imagei* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_SIGNED_INT8`, + |
| `CL_SIGNED_INT16` and + |
| `CL_SIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imagei* are undefined. |
| |
| *read_imageui* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_UNSIGNED_INT8`, + |
| `CL_UNSIGNED_INT16` and + |
| `CL_UNSIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imageui* are undefined. |
| | | |
| | float4 *read_imagef*(_aQual_ image1d_array_t _image_, int2 _coord_) |
| | Use _coord.x_ to do an element lookup in the 1D image identified by |
| _coord.y_ in the 1D image array specified by _image_. |
| |
| *read_imagef* returns floating-point values in the range [0.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to one of |
| the pre-defined packed formats or `CL_UNORM_INT8`, or |
| `CL_UNORM_INT16`. |
| |
| *read_imagef* returns floating-point values in the range [-1.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to |
| `CL_SNORM_INT8`, or `CL_SNORM_INT16`. |
| |
| *read_imagef* returns floating-point values for image objects created |
| with _image_channel_data_type_ set to `CL_HALF_FLOAT` or `CL_FLOAT`. |
| |
| Values returned by *read_imagef* for image objects with |
| _image_channel_data_type_ values not specified in the description |
| above are undefined. |
| ifdef::cl_khr_fp16[] |
| | | |
| | half4 *read_imageh*(_aQual_ image1d_array_t _image_, int2 _coord_) |
| Use _coord.x_ to do an element lookup in the 1D image identified by
_coord.y_ in the 1D image array specified by _image_.
| |
| *read_imageh* returns half-precision floating-point values in the |
| range [0.0, 1.0] for image objects created with |
| _image_channel_data_type_ set to one of the pre-defined packed formats |
| or `CL_UNORM_INT8`, or `CL_UNORM_INT16`. |
| |
| *read_imageh* returns half-precision floating-point values in the |
| range [-1.0, 1.0] for image objects created with |
| _image_channel_data_type_ set to `CL_SNORM_INT8`, or `CL_SNORM_INT16`. |
| |
| *read_imageh* returns half-precision floating-point values for image |
| objects created with _image_channel_data_type_ set to `CL_HALF_FLOAT`. |
| |
| Values returned by *read_imageh* for image objects with |
| _image_channel_data_type_ values not specified in the description |
| above are undefined. |
| |
| <<unified-spec, Requires>> support for the `<<cl_khr_fp16>>` extension |
| macro. |
| endif::cl_khr_fp16[] |
| | | |
| | int4 *read_imagei*(_aQual_ image1d_array_t _image_, int2 _coord_) + |
| uint4 *read_imageui*(_aQual_ image1d_array_t _image_, int2 _coord_) |
| | Use _coord.x_ to do an element lookup in the 1D image identified by |
| _coord.y_ in the 1D image array specified by _image_. |
| |
| *read_imagei* and *read_imageui* return unnormalized signed integer |
| and unsigned integer values respectively. Each channel will be stored |
| in a 32-bit integer. |
| |
| *read_imagei* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_SIGNED_INT8`, + |
| `CL_SIGNED_INT16` and + |
| `CL_SIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imagei* are undefined. |
| |
| *read_imageui* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_UNSIGNED_INT8`, + |
| `CL_UNSIGNED_INT16` and + |
| `CL_UNSIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imageui* are undefined. |
| | | |
| | float *read_imagef*(_aQual_ image2d_depth_t _image_, int2 _coord_) |
| | Use the coordinate (_coord.x_, _coord.y_) to do an element lookup in |
| the 2D depth image object specified by _image_. |
| |
| *read_imagef* returns a floating-point value in the range [0.0, 1.0] |
| for depth image objects created with _image_channel_data_type_ set to |
| `CL_UNORM_INT16` or `CL_UNORM_INT24`. |
| |
| *read_imagef* returns a floating-point value for depth image objects |
| created with _image_channel_data_type_ set to `CL_FLOAT`. |
| |
| Values returned by *read_imagef* for image objects with |
| _image_channel_data_type_ values not specified in the description |
| above are undefined. |
| |
| <<unified-spec, Requires>> support for OpenCL C 2.0 or newer, or for |
| the `<<cl_khr_depth_images>>` extension macro. |
| | | |
| | float *read_imagef*(_aQual_ image2d_array_depth_t _image_, int4 _coord_) |
| | Use _coord.xy_ to do an element lookup in the 2D image identified by |
| _coord.z_ in the 2D depth image array specified by _image_. |
| |
| *read_imagef* returns a floating-point value in the range [0.0, 1.0] |
| for depth image objects created with _image_channel_data_type_ set to |
| `CL_UNORM_INT16` or `CL_UNORM_INT24`. |
| |
| *read_imagef* returns a floating-point value for depth image objects |
| created with _image_channel_data_type_ set to `CL_FLOAT`. |
| |
| Values returned by *read_imagef* for image objects with |
| _image_channel_data_type_ values not specified in the description |
| above are undefined. |
| |
| <<unified-spec, Requires>> support for OpenCL C 2.0 or newer, or for |
| the `<<cl_khr_depth_images>>` extension macro. |
| | | |
| |
| ifdef::cl_khr_gl_msaa_sharing[] |
| a| |
| [source,opencl_c] |
| ---- |
| float4 read_imagef( |
| image2d_msaa_t image, |
| int2 coord, |
| int sample) |
| ---- |
| Use the coordinate (_coord.x_, _coord.y_) and _sample_ to do an element
| lookup in the 2D image object specified by _image_. |
| |
| *read_imagef* returns floating-point values in the range [0.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to one of |
| the pre-defined packed formats or `CL_UNORM_INT8`, or |
| `CL_UNORM_INT16`. |
| |
| *read_imagef* returns floating-point values in the range [-1.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to |
| `CL_SNORM_INT8`, or `CL_SNORM_INT16`. |
| |
| *read_imagef* returns floating-point values for image objects created |
| with _image_channel_data_type_ set to `CL_HALF_FLOAT` or `CL_FLOAT`. |
| |
| Values returned by *read_imagef* for image objects with |
| _image_channel_data_type_ values not specified in the description |
| above are undefined. |
| |
| <<unified-spec, Requires>> support for the |
| `<<cl_khr_gl_msaa_sharing>>` extension macro. |
| a| |
| [source,opencl_c] |
| ---- |
| int4 read_imagei(image2d_msaa_t image, |
| int2 coord, |
| int sample) |
| |
| uint4 read_imageui(image2d_msaa_t image, |
| int2 coord, |
| int sample) |
| ---- |
| Use the coordinate (_coord.x_, _coord.y_) and _sample_ to do an element
| lookup in the 2D image object specified by _image_. |
| |
| *read_imagei* and *read_imageui* return unnormalized signed integer |
| and unsigned integer values respectively. |
| Each channel will be stored in a 32-bit integer. |
| |
| *read_imagei* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| * `CL_SIGNED_INT8`, |
| * `CL_SIGNED_INT16`, and |
| * `CL_SIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imagei* are undefined. |
| |
| *read_imageui* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| * `CL_UNSIGNED_INT8`, |
| * `CL_UNSIGNED_INT16`, and |
| * `CL_UNSIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imageui* are undefined. |
| |
| <<unified-spec, Requires>> support for the |
| `<<cl_khr_gl_msaa_sharing>>` extension macro. |
| a| |
| [source,opencl_c] |
| ---- |
| float4 read_imagef(image2d_array_msaa_t image, |
| int4 coord, |
| int sample) |
| ---- |
| | Use _coord.xy_ and _sample_ to do an element lookup in the 2D image |
| identified by _coord.z_ in the 2D image array specified by _image_. |
| |
| *read_imagef* returns floating-point values in the range [0.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to one of |
| the pre-defined packed formats or `CL_UNORM_INT8`, or |
| `CL_UNORM_INT16`. |
| |
| *read_imagef* returns floating-point values in the range [-1.0, 1.0] |
| for image objects created with _image_channel_data_type_ set to |
| `CL_SNORM_INT8`, or `CL_SNORM_INT16`. |
| |
| *read_imagef* returns floating-point values for image objects created |
| with _image_channel_data_type_ set to `CL_HALF_FLOAT` or `CL_FLOAT`. |
| |
| Values returned by *read_imagef* for image objects with |
| _image_channel_data_type_ values not specified in the description |
| above are undefined. |
| |
| <<unified-spec, Requires>> support for the |
| `<<cl_khr_gl_msaa_sharing>>` extension macro. |
| a| |
| [source,opencl_c] |
| ---- |
| int4 read_imagei(image2d_array_msaa_t image, |
| int4 coord, |
| int sample) |
| |
| uint4 read_imageui(image2d_array_msaa_t image, |
| int4 coord, |
| int sample) |
| ---- |
| | Use _coord.xy_ and _sample_ to do an element lookup in the 2D image |
| identified by _coord.z_ in the 2D image array specified by _image_. |
| |
| *read_imagei* and *read_imageui* return unnormalized signed integer |
| and unsigned integer values respectively. |
| Each channel will be stored in a 32-bit integer. |
| |
| *read_imagei* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| * `CL_SIGNED_INT8`, |
| * `CL_SIGNED_INT16`, and |
| * `CL_SIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imagei* are undefined. |
| |
| *read_imageui* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| * `CL_UNSIGNED_INT8`, |
| * `CL_UNSIGNED_INT16`, and |
| * `CL_UNSIGNED_INT32`. |
| |
| If the _image_channel_data_type_ is not one of the above values, the |
| values returned by *read_imageui* are undefined. |
| |
| <<unified-spec, Requires>> support for the |
| `<<cl_khr_gl_msaa_sharing>>` extension macro. |
| a| |
| [source,opencl_c] |
| ---- |
| float read_imagef(image2d_msaa_depth_t image, |
| int2 coord, |
| int sample) |
| ---- |
| Use the coordinate (_coord.x_, _coord.y_) and _sample_ to do an element
| lookup in the 2D depth image object specified by _image_. |
| |
| *read_imagef* returns a floating-point value in the range [0.0, 1.0] |
| for depth image objects created with _image_channel_data_type_ set to |
| `CL_UNORM_INT16` or `CL_UNORM_INT24`. |
| |
| *read_imagef* returns a floating-point value for depth image objects |
| created with _image_channel_data_type_ set to `CL_FLOAT`. |
| |
| Values returned by *read_imagef* for image objects with |
| _image_channel_data_type_ values not specified in the description |
| above are undefined. |
| |
| <<unified-spec, Requires>> support for the |
| `<<cl_khr_gl_msaa_sharing>>` extension macro. |
| a| |
[source,opencl_c]
----
float read_imagef(image2d_array_msaa_depth_t image,
| int4 coord, |
| int sample) |
| ---- |
| | Use _coord.xy_ and _sample_ to do an element lookup in the 2D image |
| identified by _coord.z_ in the 2D depth image array specified by |
| _image_. |
| |
| *read_imagef* returns a floating-point value in the range [0.0, 1.0] |
| for depth image objects created with _image_channel_data_type_ set to |
| `CL_UNORM_INT16` or `CL_UNORM_INT24`. |
| |
| *read_imagef* returns a floating-point value for depth image objects |
| created with _image_channel_data_type_ set to `CL_FLOAT`. |
| |
| Values returned by *read_imagef* for image objects with |
| _image_channel_data_type_ values not specified in the description |
| above are undefined. |
| |
Note: When a multisample image is accessed in a kernel, the access
takes one vector of integers describing which pixel to fetch and an
integer corresponding to the sample number describing which sample
within the pixel to fetch.
_sample_ identifies the sample position in the multisample image.

*For best performance, we recommend that _sample_ be a literal value
so it is known at compile time and the OpenCL compiler can perform
appropriate optimizations for multisample reads on the device*.

No standard sampling instructions are allowed on multisample images.
Accessing a coordinate outside the image, or a sample outside the range
of samples associated with each pixel in the image, results in undefined
behavior.
| |
| <<unified-spec, Requires>> support for the |
| `<<cl_khr_gl_msaa_sharing>>` extension macro. |
| endif::cl_khr_gl_msaa_sharing[] |
| |==== |
| -- |
| |
| |
| [[built-in-image-write-functions]] |
| ==== Built-in Image Write Functions |
| |
| [open,refpage='imageWriteFunctions',desc='Built-in Image Write Functions',type='freeform',spec='clang',anchor='built-in-image-write-functions',xrefs='imageQueryFunctions imageReadFunctions imageSamplerlessReadFunctions',alias='write_imagef write_imagei write_imageui'] |
| -- |
| The following built-in function calls to write images are supported. |
| |
| _aQual_ in the following table refers to one of the access qualifiers. |
| For write functions this may be `write_only` or `read_write`. |
| |
| ifdef::cl_khr_mipmap_image_writes[] |
| If the `<<cl_khr_mipmap_image_writes>>` extension macro is supported, write |
| functions which do not explicitly specify a level of detail _lod_ write to |
| mip level 0 if _image_ is a mipmapped image. |
| _mipwidth_, _mipheight_, and _mipdepth_ in the table refer to the width, |
| height, and depth of the _image_ mip level specified by _lod_ respectively; |
| _miplayers_ refers to the number of layers in _image_; and _miplevels_ |
| refers to the number of mip levels in _image_. |
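
The _mipwidth_ and _mipheight_ bounds used by the write functions can be sketched in C as follows.
This sketch assumes the usual mipmap rule that each level halves the previous extent, rounding down and never going below 1, and that a valid _lod_ lies in [0, _miplevels_ - 1]; both helpers are hypothetical and not part of OpenCL C.

[source,c]
----
#include <stdbool.h>

/* Hypothetical helper: extent of mip level `lod`, assuming max(1, base >> lod).
 * `lod` must be a valid level index (non-negative and small enough to shift). */
static int mip_extent(int base, int lod)
{
    int e = base >> lod;
    return e > 0 ? e : 1;
}

/* Hypothetical helper: true if lod is in [0, miplevels - 1] and (x, y) lies in
 * [0, mipwidth - 1] x [0, mipheight - 1] for that level. */
static bool mip_write_coord_in_range(int x, int y, int lod,
                                     int width, int height, int miplevels)
{
    if (lod < 0 || lod >= miplevels)
        return false;
    return x >= 0 && x < mip_extent(width, lod) &&
           y >= 0 && y < mip_extent(height, lod);
}
----

For a 256 x 256 image, level 3 is 32 x 32, so `(31, 0)` is a valid write location at that level and `(32, 0)` is not.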
| endif::cl_khr_mipmap_image_writes[] |
| |
| ifdef::cl_khr_srgb_image_writes[] |
If the `<<cl_khr_srgb_image_writes>>` extension macro is supported, the
*write_imagef* functions described below may write to sRGB images.
In that case, the function converts the R, G, and B components from linear
to sRGB; the A component is written as-is.
| endif::cl_khr_srgb_image_writes[] |
| |
| |
| [[table-image-write]] |
| .Built-in Image Write Functions |
| [cols=",",options="header",] |
| |==== |
| | Function | Description |
| | void *write_imagef*(_aQual_ image2d_t _image_, int2 _coord_, float4 _color_) + |
| ifdef::cl_khr_fp16[] |
| void *write_imageh*(_aQual_ image2d_t _image_, int2 _coord_, half4 _color_) + |
| endif::cl_khr_fp16[] |
| void *write_imagei*(_aQual_ image2d_t _image_, int2 _coord_, int4 _color_) + |
| void *write_imageui*(_aQual_ image2d_t _image_, int2 _coord_, uint4 _color_) |
| Write the _color_ value to the location specified by _coord.xy_ in the 2D
| image object specified by _image_. |
| Appropriate data format conversion to the specified image format is |
| done before writing the color value. |
| _coord.x_ and _coord.y_ are considered to be unnormalized coordinates, |
| and must be in the range [0, image width-1] and [0, image height-1] |
| respectively. |
| |
| *write_imagef* |
| ifdef::cl_khr_fp16[and *write_imageh*] |
| can only be used with image objects created with |
| _image_channel_data_type_ set to one of the pre-defined packed formats |
| or set to `CL_SNORM_INT8`, `CL_UNORM_INT8`, `CL_SNORM_INT16`, |
| `CL_UNORM_INT16`, `CL_HALF_FLOAT` or `CL_FLOAT`. |
Appropriate data format conversion is done to convert channel
data from a floating-point value to the actual data format in which the
channels are stored.
| |
| *write_imagei* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_SIGNED_INT8`, + |
| `CL_SIGNED_INT16` and + |
| `CL_SIGNED_INT32`. |
| |
| *write_imageui* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_UNSIGNED_INT8`, + |
| `CL_UNSIGNED_INT16` and + |
| `CL_UNSIGNED_INT32`. |
| |
| The behavior of *write_imagef*, |
| ifdef::cl_khr_fp16[*write_imageh*,] |
| *write_imagei* and *write_imageui* for |
| image objects created with _image_channel_data_type_ values not |
| specified in the description above or with _x_ and _y_ coordinate |
| values that are not in the range [0, image width-1] and [0, image |
| height-1], respectively, is undefined. |
| |
| ifdef::cl_khr_fp16[] |
| *write_imageh* <<unified-spec, requires>> support for the |
| `<<cl_khr_fp16>>` extension macro. |
| endif::cl_khr_fp16[] |
| | | |
| | void *write_imagef*(_aQual_ image2d_array_t _image_, int4 _coord_, |
| float4 _color_) + |
| ifdef::cl_khr_fp16[] |
| void *write_imageh*(_aQual_ image2d_array_t _image_, int4 _coord_, |
| half4 _color_) + |
| endif::cl_khr_fp16[] |
| void *write_imagei*(_aQual_ image2d_array_t _image_, int4 _coord_, |
| int4 _color_) + |
| void *write_imageui*(_aQual_ image2d_array_t _image_, int4 _coord_, |
| uint4 _color_) |
| | Write _color_ value to location specified by _coord.xy_ in the 2D |
| image identified by _coord.z_ in the 2D image array specified by |
| _image_. |
| Appropriate data format conversion to the specified image format is |
| done before writing the color value. |
_coord.x_, _coord.y_ and _coord.z_ are considered to be unnormalized
coordinates, and must be in the range [0, image width-1], [0, image
height-1], and [0, image number of layers-1], respectively.
| |
| *write_imagef* |
| ifdef::cl_khr_fp16[and *write_imageh*] |
| can only be used with image objects created with |
| _image_channel_data_type_ set to one of the pre-defined packed formats |
| or set to `CL_SNORM_INT8`, `CL_UNORM_INT8`, `CL_SNORM_INT16`, |
| `CL_UNORM_INT16`, `CL_HALF_FLOAT` or `CL_FLOAT`. |
Appropriate data format conversion is done to convert channel
data from a floating-point value to the actual data format in which the
channels are stored.
| |
| *write_imagei* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_SIGNED_INT8`, + |
| `CL_SIGNED_INT16` and + |
| `CL_SIGNED_INT32`. |
| |
| *write_imageui* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_UNSIGNED_INT8`, + |
| `CL_UNSIGNED_INT16` and + |
| `CL_UNSIGNED_INT32`. |
| |
| The behavior of *write_imagef*, |
| ifdef::cl_khr_fp16[*write_imageh*,] |
| *write_imagei* and *write_imageui* for |
| image objects created with _image_channel_data_type_ values not |
| specified in the description above or with (_x_, _y_, _z_) coordinate |
| values that are not in the range [0, image width-1], [0, image |
| height-1], and [0, image number of layers-1], respectively, is |
| undefined. |
| |
| ifdef::cl_khr_fp16[] |
| *write_imageh* <<unified-spec, requires>> support for the |
| `<<cl_khr_fp16>>` extension macro. |
| endif::cl_khr_fp16[] |
| | | |
| | void *write_imagef*(_aQual_ image1d_t _image_, int _coord_, |
| float4 _color_) + |
| ifdef::cl_khr_fp16[] |
| void *write_imageh*(_aQual_ image1d_t _image_, int _coord_, |
| half4 _color_) + |
| endif::cl_khr_fp16[] |
| void *write_imagei*(_aQual_ image1d_t _image_, int _coord_, |
| int4 _color_) + |
| void *write_imageui*(_aQual_ image1d_t _image_, int _coord_, |
| uint4 _color_) + |
| void *write_imagef*(_aQual_ image1d_buffer_t _image_, int _coord_, |
| float4 _color_) + |
| ifdef::cl_khr_fp16[] |
| void *write_imageh*(_aQual_ image1d_buffer_t _image_, int _coord_, |
| half4 _color_) + |
| endif::cl_khr_fp16[] |
| void *write_imagei*(_aQual_ image1d_buffer_t _image_, int _coord_, |
| int4 _color_) + |
| void *write_imageui*(_aQual_ image1d_buffer_t _image_, int _coord_, |
| uint4 _color_) |
| | Write _color_ value to location specified by _coord_ in the 1D image |
| or 1D image buffer object specified by _image_. |
| Appropriate data format conversion to the specified image format is |
| done before writing the color value. |
| _coord_ is considered to be an unnormalized coordinate, and must be in |
| the range [0, image width-1]. |
| |
| *write_imagef* |
| ifdef::cl_khr_fp16[and *write_imageh*] |
| can only be used with image objects created with |
| _image_channel_data_type_ set to one of the pre-defined packed formats |
| or set to `CL_SNORM_INT8`, `CL_UNORM_INT8`, `CL_SNORM_INT16`, |
| `CL_UNORM_INT16`, `CL_HALF_FLOAT` or `CL_FLOAT`. |
Appropriate data format conversion is done to convert channel
data from a floating-point value to the actual data format in which the
channels are stored.
| |
| *write_imagei* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_SIGNED_INT8`, + |
| `CL_SIGNED_INT16` and + |
| `CL_SIGNED_INT32`. |
| |
| *write_imageui* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_UNSIGNED_INT8`, + |
| `CL_UNSIGNED_INT16` and + |
| `CL_UNSIGNED_INT32`. |
| |
| The behavior of *write_imagef*, |
| ifdef::cl_khr_fp16[*write_imageh*,] |
| *write_imagei* and *write_imageui* for |
| image objects created with _image_channel_data_type_ values not |
| specified in the description above, or with a coordinate value that is |
| not in the range [0, image width-1], is undefined. |
| |
| <<unified-spec, Requires>> support for OpenCL C 1.2 or newer. |
| |
| ifdef::cl_khr_fp16[] |
| *write_imageh* <<unified-spec, requires>> support for the |
| `<<cl_khr_fp16>>` extension macro. |
| endif::cl_khr_fp16[] |
| | | |
| | void *write_imagef*(_aQual_ image1d_array_t _image_, int2 _coord_, |
| float4 _color_) + |
| ifdef::cl_khr_fp16[] |
| void *write_imageh*(_aQual_ image1d_array_t _image_, int2 _coord_, |
| half4 _color_) + |
| endif::cl_khr_fp16[] |
| void *write_imagei*(_aQual_ image1d_array_t _image_, int2 _coord_, |
| int4 _color_) + |
| void *write_imageui*(_aQual_ image1d_array_t _image_, int2 _coord_, |
| uint4 _color_) |
| | Write _color_ value to location specified by _coord.x_ in the 1D image |
| identified by _coord.y_ in the 1D image array specified by _image_. |
| Appropriate data format conversion to the specified image format is |
| done before writing the color value. |
| _coord.x_ and _coord.y_ are considered to be unnormalized coordinates |
| and must be in the range [0, image width-1] and [0, image number of |
| layers-1], respectively. |
| |
| *write_imagef* |
| ifdef::cl_khr_fp16[and *write_imageh*] |
| can only be used with image objects created with |
| _image_channel_data_type_ set to one of the pre-defined packed formats |
| or set to `CL_SNORM_INT8`, `CL_UNORM_INT8`, `CL_SNORM_INT16`, |
| `CL_UNORM_INT16`, `CL_HALF_FLOAT` or `CL_FLOAT`. |
Appropriate data format conversion is done to convert channel
data from a floating-point value to the actual data format in which the
channels are stored.
| |
| *write_imagei* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_SIGNED_INT8`, + |
| `CL_SIGNED_INT16` and + |
| `CL_SIGNED_INT32`. |
| |
| *write_imageui* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_UNSIGNED_INT8`, + |
| `CL_UNSIGNED_INT16` and + |
| `CL_UNSIGNED_INT32`. |
| |
| The behavior of *write_imagef*, |
| ifdef::cl_khr_fp16[*write_imageh*,] |
| *write_imagei* and *write_imageui* for |
| image objects created with _image_channel_data_type_ values not |
| specified in the description above or with (_x_, _y_) coordinate |
| values that are not in the range [0, image width-1] and [0, image |
| number of layers-1], respectively, is undefined. |
| |
| <<unified-spec, Requires>> support for OpenCL C 1.2 or newer. |
| | | |
| | void *write_imagef*(_aQual_ image2d_depth_t _image_, int2 _coord_, |
| float _depth_) |
| | Write _depth_ value to location specified by _coord.xy_ in the 2D |
| depth image object specified by _image_. |
| Appropriate data format conversion to the specified image format is |
| done before writing the depth value. |
_coord.x_ and _coord.y_ are considered to be unnormalized coordinates,
and must be in the range [0, image width-1] and [0, image height-1],
respectively.
| |
| *write_imagef* can only be used with image objects created with |
| _image_channel_data_type_ set to `CL_UNORM_INT16`, `CL_UNORM_INT24` or |
| `CL_FLOAT`. |
Appropriate data format conversion is done to convert the depth value
from a floating-point value to the actual data format associated with the
image.
| |
The behavior of *write_imagef* for
| image objects created with _image_channel_data_type_ values not |
| specified in the description above or with (_x_, _y_) coordinate |
| values that are not in the range [0, image width-1] and [0, image |
| height-1], respectively, is undefined. |
| |
| <<unified-spec, Requires>> support for OpenCL C 2.0 or newer, or for |
| the `<<cl_khr_depth_images>>` extension macro. |
| | | |
| | void *write_imagef*(_aQual_ image2d_array_depth_t _image_, int4 _coord_, |
| float _depth_) |
| | Write _depth_ value to location specified by _coord.xy_ in the 2D |
| image identified by _coord.z_ in the 2D depth image array specified by |
| _image_. |
| Appropriate data format conversion to the specified image format is |
| done before writing the depth value. |
| _coord.x_, _coord.y_ and _coord.z_ are considered to be unnormalized |
| coordinates, and must be in the range [0, image width-1], [0, image |
| height-1], and [0, image number of layers-1], respectively. |
| |
| *write_imagef* can only be used with image objects created with |
| _image_channel_data_type_ set to `CL_UNORM_INT16`, `CL_UNORM_INT24` or |
| `CL_FLOAT`. |
Appropriate data format conversion is done to convert the depth value
from a floating-point value to the actual data format associated with the
image.
| |
The behavior of *write_imagef* for
| image objects created with _image_channel_data_type_ values not |
| specified in the description above or with (_x_, _y_, _z_) coordinate |
values that are not in the range [0, image width-1], [0, image
height-1], and [0, image number of layers-1], respectively, is undefined.
| |
| <<unified-spec, Requires>> support for OpenCL C 2.0 or newer, or for |
| the `<<cl_khr_depth_images>>` extension macro. |
| | | |
| | void *write_imagef*(_aQual_ image3d_t _image_, int4 _coord_, |
| float4 _color_) + |
| ifdef::cl_khr_fp16[] |
| void *write_imageh*(_aQual_ image3d_t _image_, int4 _coord_, |
| half4 _color_) + |
| endif::cl_khr_fp16[] |
| void *write_imagei*(_aQual_ image3d_t _image_, int4 _coord_, |
| int4 _color_) + |
| void *write_imageui*(_aQual_ image3d_t _image_, int4 _coord_, |
| uint4 _color_) |
| | Write _color_ value to the location specified by _coord.xyz_ in the 3D |
| image object specified by _image_. |
| Appropriate data format conversion to the specified image format is |
| done before writing the color value. |
| _coord.x_, _coord.y_ and _coord.z_ are considered to be unnormalized |
| coordinates, and must be in the range [0, image width-1], [0, image |
| height-1], and [0, image depth-1], respectively. |
| |
| *write_imagef* |
| ifdef::cl_khr_fp16[and *write_imageh*] |
| can only be used with image objects created with |
| _image_channel_data_type_ set to one of the pre-defined packed formats |
| or set to `CL_SNORM_INT8`, `CL_UNORM_INT8`, `CL_SNORM_INT16`, |
| `CL_UNORM_INT16`, `CL_HALF_FLOAT` or `CL_FLOAT`. |
| Appropriate data format conversion will be done to convert channel |
| data from a floating-point value to actual data format in which the |
| channels are stored. |

| |
| *write_imagei* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_SIGNED_INT8`, + |
| `CL_SIGNED_INT16`, or + |
| `CL_SIGNED_INT32`. |
| |
| *write_imageui* can only be used with image objects created with |
| _image_channel_data_type_ set to one of the following values: |
| |
| `CL_UNSIGNED_INT8`, + |
| `CL_UNSIGNED_INT16`, or + |
| `CL_UNSIGNED_INT32`. |
| |
| The behavior of *write_imagef*, |
| ifdef::cl_khr_fp16[*write_imageh*,] |
| *write_imagei* and *write_imageui* for |
| image objects with _image_channel_data_type_ values not specified in |
| the description above or with (_x_, _y_, _z_) coordinate values that |
| are not in the range [0, image width-1], [0, image height-1], and [0, |
| image depth-1], respectively, is undefined. |
| |
| <<unified-spec, Requires>> support for OpenCL C 2.0, or OpenCL C 3.0 or |
| newer and the {c_3d_image_writes} feature, or the |
| `<<cl_khr_3d_image_writes>>` extension. |
| |
| ifdef::cl_khr_fp16[] |
| *write_imageh* <<unified-spec, requires>> support for the |
| `<<cl_khr_fp16>>` extension macro. |
| endif::cl_khr_fp16[] |
| |
| ifdef::cl_khr_mipmap_image_writes[] |
| a| |
| [source,opencl_c] |
| ---- |
| void write_imagef( |
| write_only image2d_t image, |
| int2 coord, |
| int lod, |
| float4 color) |
| |
| void write_imagei( |
| write_only image2d_t image, |
| int2 coord, |
| int lod, |
| int4 color) |
| |
| void write_imageui( |
| write_only image2d_t image, |
| int2 coord, |
| int lod, |
| uint4 color) |
| |
| void write_imagef( |
| write_only image2d_depth_t image, |
| int2 coord, |
| int lod, |
| float depth) |
| ---- |
| | Write _color_ value to location specified by _coord.xy_ in the mip |
| level specified by _lod_ in the 2D image object specified by _image_. |
| Appropriate data format conversion to the specified image format is |
| done before writing the color value. |
| |
| _lod_ must be in the range [0, _miplevels_-1]. |
| _coord.x_ and _coord.y_ are considered to be unnormalized coordinates |
| and must be in the range [0, _mipwidth_-1] and [0, _mipheight_-1] |
| respectively. |
| Behavior is undefined if _lod_, _coord.x_, or _coord.y_ is not in |
| range. |
| |
| <<unified-spec, Requires>> support for the |
| `<<cl_khr_mipmap_image_writes>>` extension macro. |
| a| |
| [source,opencl_c] |
| ---- |
| void write_imagef( |
| write_only image1d_t image, |
| int coord, |
| int lod, |
| float4 color) |
| |
| void write_imagei( |
| write_only image1d_t image, |
| int coord, |
| int lod, |
| int4 color) |
| |
| void write_imageui( |
| write_only image1d_t image, |
| int coord, |
| int lod, |
| uint4 color) |
| ---- |
| | Write _color_ value to location specified by _coord_ in the mip level |
| specified by _lod_ in the 1D image object specified by _image_. |
| Appropriate data format conversion to the specified image format is |
| done before writing the color value. |
| |
| _lod_ must be in the range [0, _miplevels_-1]. |
| _coord_ is considered to be an unnormalized coordinate and must be in |
| the range [0, _mipwidth_-1]. |
| Behavior is undefined if _lod_ or _coord_ is not in range. |
| |
| <<unified-spec, Requires>> support for the |
| `<<cl_khr_mipmap_image_writes>>` extension macro. |
| a| |
| [source,opencl_c] |
| ---- |
| void write_imagef( |
| write_only image1d_array_t image, |
| int2 coord, |
| int lod, |
| float4 color) |
| |
| void write_imagei( |
| write_only image1d_array_t image, |
| int2 coord, |
| int lod, |
| int4 color) |
| |
| void write_imageui( |
| write_only image1d_array_t image, |
| int2 coord, |
| int lod, |
| uint4 color) |
| ---- |
| | Write _color_ value to location specified by _coord.x_ in the 1D image |
| identified by _coord.y_ and mip level _lod_ in the 1D image array |
| specified by _image_. |
| Appropriate data format conversion to the specified image format is done |
| before writing the color value. |
| |
| _lod_ must be in the range [0, _miplevels_-1]. |
| _coord.x_ and _coord.y_ are considered to be unnormalized coordinates |
| and must be in the range [0, _mipwidth_-1] and [0, _miplayers_ -1] |
| respectively. |
| Behavior is undefined if _lod_, _coord.x_, or _coord.y_ is not in range. |
| |
| <<unified-spec, Requires>> support for the |
| `<<cl_khr_mipmap_image_writes>>` extension macro. |
| a| |
| [source,opencl_c] |
| ---- |
| void write_imagef( |
| write_only image2d_array_t image, |
| int4 coord, |
| int lod, |
| float4 color) |
| |
| void write_imagei( |
| write_only image2d_array_t image, |
| int4 coord, |
| int lod, |
| int4 color) |
| |
| void write_imageui( |
| write_only image2d_array_t image, |
| int4 coord, |
| int lod, |
| uint4 color) |
| |
| void write_imagef( |
| write_only image2d_array_depth_t image, |
| int4 coord, |
| int lod, |
| float depth) |
| ---- |
| | Write _color_ value to location specified by _coord.xy_ in the 2D image |
| identified by _coord.z_ and mip level _lod_ in the 2D image array |
| specified by _image_. |
| Appropriate data format conversion to the specified image format is done |
| before writing the color value. |
| |
| _lod_ must be in the range [0, _miplevels_-1]. |
| _coord.x_, _coord.y_ and _coord.z_ are considered to be unnormalized |
| coordinates and must be in the range [0, _mipwidth_-1], [0, |
| _mipheight_-1], and [0, _miplayers_-1] respectively. |
| Behavior is undefined if |
| _lod_, _coord.x_, _coord.y_, or _coord.z_ is not in range. |
| |
| <<unified-spec, Requires>> support for the |
| `<<cl_khr_mipmap_image_writes>>` extension macro. |
| a| |
| [source,opencl_c] |
| ---- |
| void write_imagef( |
| write_only image3d_t image, |
| int4 coord, |
| int lod, |
| float4 color) |
| |
| void write_imagei( |
| write_only image3d_t image, |
| int4 coord, |
| int lod, |
| int4 color) |
| |
| void write_imageui( |
| write_only image3d_t image, |
| int4 coord, |
| int lod, |
| uint4 color) |
| ---- |
| | Write _color_ value to location specified by _coord.xyz_ and mip level |
| _lod_ in the 3D image object specified by _image_. |
| Appropriate data format conversion to the specified image format is done |
| before writing the color value. |
| |
| _lod_ must be in the range [0, _miplevels_-1]. |
| _coord.x_, _coord.y_ and _coord.z_ are considered to be unnormalized |
| coordinates and must be in the range [0, _mipwidth_-1], [0, |
| _mipheight_-1] and [0, _mipdepth_-1] respectively. |
| Behavior is undefined if _lod_, _coord.x_, _coord.y_, or _coord.z_ is |
| not in range. |
| |
| <<unified-spec, Requires>> support for the |
| `<<cl_khr_mipmap_image_writes>>` extension macro. |
| endif::cl_khr_mipmap_image_writes[] |
| |
| |==== |
| -- |
| |
| |
| [[built-in-image-query-functions]] |
| ==== Built-in Image Query Functions |
| |
| [open,refpage='imageQueryFunctions',desc='Built-in Image Query Functions',type='freeform',spec='clang',anchor='built-in-image-query-functions',xrefs='imageReadFunctions imageSamplerlessReadFunctions imageWriteFunctions',alias='get_image_width get_image_height get_image_depth get_image_channel_data_type get_image_channel_order get_image_dim get_image_array_size'] |
| -- |
| |
| The following built-in function calls to query image information are |
| supported. |
| |
| _aQual_ in the following table refers to one of the access qualifiers. |
| For query functions this may be `read_only`, `write_only` or `read_write`. |
| |
| [[table-image-query]] |
| .Built-in Image Query Functions |
| [cols=",",options="header",] |
| |==== |
| | Function | Description |
| | int *get_image_width*(_aQual_ image2d_t _image_) + |
| int *get_image_width*(_aQual_ image3d_t _image_) |
| |
| For OpenCL C 1.2 or newer: |
| |
| int *get_image_width*(_aQual_ image1d_t _image_) + |
| int *get_image_width*(_aQual_ image1d_buffer_t _image_) + |
| int *get_image_width*(_aQual_ image1d_array_t _image_) + |
| int *get_image_width*(_aQual_ image2d_array_t _image_) |
| |
| For OpenCL C 2.0 or newer, or if the `<<cl_khr_depth_images>>` extension |
| macro is supported: |
| |
| int *get_image_width*(_aQual_ image2d_depth_t _image_) + |
| int *get_image_width*(_aQual_ image2d_array_depth_t _image_) |
| |
| ifdef::cl_khr_gl_msaa_sharing[] |
| If the `<<cl_khr_gl_msaa_sharing>>` extension macro is supported: |
| |
| int *get_image_width*(_aQual_ image2d_msaa_t image) + |
| int *get_image_width*(_aQual_ image2d_array_msaa_t image) + |
| int *get_image_width*(_aQual_ image2d_msaa_depth_t image) + |
| int *get_image_width*(_aQual_ image2d_array_msaa_depth_t image) |
| endif::cl_khr_gl_msaa_sharing[] |
| | Return the image width in pixels. |
| |
| | int *get_image_height*(_aQual_ image2d_t _image_) + |
| int *get_image_height*(_aQual_ image3d_t _image_) |
| |
| For OpenCL C 1.2 or newer: |
| |
| int *get_image_height*(_aQual_ image2d_array_t _image_) |
| |
| For OpenCL C 2.0 or newer, or if the `<<cl_khr_depth_images>>` extension |
| macro is supported: |
| |
| int *get_image_height*(_aQual_ image2d_depth_t _image_) + |
| int *get_image_height*(_aQual_ image2d_array_depth_t _image_) |
| |
| ifdef::cl_khr_gl_msaa_sharing[] |
| If the `<<cl_khr_gl_msaa_sharing>>` extension macro is supported: |
| |
| int *get_image_height*(_aQual_ image2d_msaa_t image) + |
| int *get_image_height*(_aQual_ image2d_array_msaa_t image) + |
| int *get_image_height*(_aQual_ image2d_msaa_depth_t image) + |
| int *get_image_height*(_aQual_ image2d_array_msaa_depth_t image) |
| endif::cl_khr_gl_msaa_sharing[] |
| | Return the image height in pixels. |
| |
| | int *get_image_depth*(image3d_t _image_) |
| | Return the image depth in pixels. |
| | | |
| | int *get_image_channel_data_type*(_aQual_ image2d_t _image_) + |
| int *get_image_channel_data_type*(_aQual_ image3d_t _image_) |
| |
| For OpenCL C 1.2 or newer: |
| |
| int *get_image_channel_data_type*(_aQual_ image1d_t _image_) + |
| int *get_image_channel_data_type*(_aQual_ image1d_buffer_t _image_) + |
| int *get_image_channel_data_type*(_aQual_ image2d_t _image_) + |
| int *get_image_channel_data_type*(_aQual_ image3d_t _image_) + |
| int *get_image_channel_data_type*(_aQual_ image1d_array_t _image_) + |
| int *get_image_channel_data_type*(_aQual_ image2d_array_t _image_) |
| |
| For OpenCL C 2.0 or newer, or if the `<<cl_khr_depth_images>>` extension |
| macro is supported: |
| |
| int *get_image_channel_data_type*(_aQual_ image2d_depth_t _image_) + |
| int *get_image_channel_data_type*(_aQual_ image2d_array_depth_t _image_) |
| |
| ifdef::cl_khr_gl_msaa_sharing[] |
| If the `<<cl_khr_gl_msaa_sharing>>` extension macro is supported: |
| |
| int *get_image_channel_data_type*(_aQual_ image2d_msaa_t image) + |
| int *get_image_channel_data_type*(_aQual_ image2d_array_msaa_t image) + |
| int *get_image_channel_data_type*(_aQual_ image2d_msaa_depth_t image) + |
| int *get_image_channel_data_type*(_aQual_ image2d_array_msaa_depth_t image) |
| endif::cl_khr_gl_msaa_sharing[] |
| | Return the channel data type. Valid values are: |
| |
| `CLK_SNORM_INT8` + |
| `CLK_SNORM_INT16` + |
| `CLK_UNORM_INT8` + |
| `CLK_UNORM_INT16` + |
| `CLK_UNORM_SHORT_565` + |
| `CLK_UNORM_SHORT_555` + |
| `CLK_UNORM_INT_101010` + |
| `CLK_SIGNED_INT8` + |
| `CLK_SIGNED_INT16` + |
| `CLK_SIGNED_INT32` + |
| `CLK_UNSIGNED_INT8` + |
| `CLK_UNSIGNED_INT16` + |
| `CLK_UNSIGNED_INT32` + |
| `CLK_HALF_FLOAT` + |
| `CLK_FLOAT` + |
| |
| Additionally, for OpenCL C 3.0 or newer: |
| |
| `CLK_UNORM_INT_101010_2` footnote:[{fn-CLK_UNORM_INT_101010_2}] |
| |
| | int *get_image_channel_order*(_aQual_ image2d_t _image_) + |
| int *get_image_channel_order*(_aQual_ image3d_t _image_) |
| |
| For OpenCL C 1.2 or newer: |
| |
| int *get_image_channel_order*(_aQual_ image1d_t _image_) + |
| int *get_image_channel_order*(_aQual_ image1d_buffer_t _image_) + |
| int *get_image_channel_order*(_aQual_ image1d_array_t _image_) + |
| int *get_image_channel_order*(_aQual_ image2d_array_t _image_) |
| |
| For OpenCL C 2.0 or newer, or if the `<<cl_khr_depth_images>>` extension |
| macro is supported: |
| |
| int *get_image_channel_order*(_aQual_ image2d_depth_t _image_) + |
| int *get_image_channel_order*(_aQual_ image2d_array_depth_t _image_) |
| |
| ifdef::cl_khr_gl_msaa_sharing[] |
| If the `<<cl_khr_gl_msaa_sharing>>` extension macro is supported: |
| |
| int *get_image_channel_order*(_aQual_ image2d_msaa_t image) + |
| int *get_image_channel_order*(_aQual_ image2d_array_msaa_t image) + |
| int *get_image_channel_order*(_aQual_ image2d_msaa_depth_t image) + |
| int *get_image_channel_order*(_aQual_ image2d_array_msaa_depth_t image) |
| endif::cl_khr_gl_msaa_sharing[] |
| | Return the image channel order. Valid values are: |
| |
| `CLK_A` + |
| `CLK_R` + |
| `CLK_RG` + |
| `CLK_RA` + |
| `CLK_RGB` + |
| `CLK_RGBA` + |
| `CLK_ARGB` + |
| `CLK_BGRA` + |
| `CLK_INTENSITY` + |
| `CLK_LUMINANCE` |
| |
| Additionally, for OpenCL C 1.1 or newer: |
| |
| `CLK_Rx` + |
| `CLK_RGx` + |
| `CLK_RGBx` |
| |
| Additionally, for OpenCL C 2.0 or newer: |
| |
| `CLK_ABGR` + |
| `CLK_DEPTH` + |
| `CLK_sRGB` + |
| `CLK_sRGBx` + |
| `CLK_sRGBA` + |
| `CLK_sBGRA` |
| |
| | | |
| | int2 *get_image_dim*(_aQual_ image2d_t _image_) |
| |
| For OpenCL C 1.2 or newer: |
| |
| int2 *get_image_dim*(_aQual_ image2d_array_t _image_) |
| |
| For OpenCL C 2.0 or newer, or if the `<<cl_khr_depth_images>>` extension |
| macro is supported: |
| |
| int2 *get_image_dim*(_aQual_ image2d_depth_t _image_) + |
| int2 *get_image_dim*(_aQual_ image2d_array_depth_t _image_) |
| |
| ifdef::cl_khr_gl_msaa_sharing[] |
| If the `<<cl_khr_gl_msaa_sharing>>` extension macro is supported: |
| |
| int2 *get_image_dim*(_aQual_ image2d_msaa_t image) + |
| int2 *get_image_dim*(_aQual_ image2d_array_msaa_t image) + |
| int2 *get_image_dim*(_aQual_ image2d_msaa_depth_t image) + |
| int2 *get_image_dim*(_aQual_ image2d_array_msaa_depth_t image) |
| endif::cl_khr_gl_msaa_sharing[] |
| | Return the 2D image width and height as an `int2` type. |
| The width is returned in the _x_ component, and the height in the _y_ |
| component. |
| |
| | int4 *get_image_dim*(_aQual_ image3d_t _image_) |
| | Return the 3D image width, height, and depth as an `int4` type. |
| The width is returned in the _x_ component, height in the _y_ |
| component, depth in the _z_ component and the _w_ component is 0. |
| | | |
| | For OpenCL C 1.2 or newer: |
| |
| size_t *get_image_array_size*(_aQual_ image2d_array_t _image_) |
| |
| For OpenCL C 2.0 or newer, or if the `<<cl_khr_depth_images>>` extension |
| macro is supported: |
| |
| size_t *get_image_array_size*(_aQual_ image2d_array_depth_t _image_) |
| |
| ifdef::cl_khr_gl_msaa_sharing[] |
| If the `<<cl_khr_gl_msaa_sharing>>` extension macro is supported: |
| |
| size_t *get_image_array_size*(_aQual_ image2d_array_msaa_depth_t _image_) |
| endif::cl_khr_gl_msaa_sharing[] |
| | Return the number of images in the 2D image array. |
| |
| | For OpenCL C 1.2 or newer: |
| |
| size_t *get_image_array_size*(_aQual_ image1d_array_t _image_) |
| | Return the number of images in the 1D image array. |
| |
| ifdef::cl_khr_gl_msaa_sharing[] |
| | If the `<<cl_khr_gl_msaa_sharing>>` extension macro is supported: |
| |
| int *get_image_num_samples*(_aQual_ image2d_msaa_t _image_) + |
| int *get_image_num_samples*(_aQual_ image2d_array_msaa_t _image_) + |
| int *get_image_num_samples*(_aQual_ image2d_msaa_depth_t _image_) + |
| int *get_image_num_samples*(_aQual_ image2d_array_msaa_depth_t _image_) |
| | Return the number of samples in the 2D MSAA image |
| endif::cl_khr_gl_msaa_sharing[] |
| |
| ifdef::cl_khr_mipmap_image[] |
| | If the `<<cl_khr_mipmap_image>>` extension macro is supported: |
| |
| int *get_image_num_mip_levels*(_aQual_ image1d_t _image_) + |
| int *get_image_num_mip_levels*(_aQual_ image2d_t _image_) + |
| int *get_image_num_mip_levels*(_aQual_ image3d_t _image_) + |
| int *get_image_num_mip_levels*(_aQual_ image1d_array_t _image_) + |
| int *get_image_num_mip_levels*(_aQual_ image2d_array_t _image_) + |
| int *get_image_num_mip_levels*(_aQual_ image2d_depth_t _image_) + |
| int *get_image_num_mip_levels*(_aQual_ image2d_array_depth_t _image_) |
| |
| | Return the number of mip levels in _image_. |
| endif::cl_khr_mipmap_image[] |
| |
| |==== |
| |
| The values returned by *get_image_channel_data_type* and |
| *get_image_channel_order* as specified in <<table-image-query,Built-in Image |
| Query Functions>> with the `CLK_` prefixes correspond to the `CL_` prefixes used |
| to describe the <<opencl-channel-order,image channel order>> and |
| <<opencl-channel-data-type,data type>> in the <<opencl-spec,OpenCL |
| Specification>>. |
| For example, both `CL_UNORM_INT8` and `CLK_UNORM_INT8` refer to an image |
| channel data type that is an unnormalized unsigned 8-bit integer. |
| -- |
| |
| |
| [[reading-and-writing-to-the-same-image-in-a-kernel]] |
| ==== Reading and Writing to the Same Image in a Kernel |
| |
| The *atomic_work_item_fence*(`CLK_IMAGE_MEM_FENCE`) built-in function can be |
| used to make sure that sampler-less writes are visible to later reads by the |
| same work-item. |
| Only a scope of `memory_scope_work_item` and an order of |
| `memory_order_acq_rel` is valid for `atomic_work_item_fence` when passed the |
| `CLK_IMAGE_MEM_FENCE` flag. |
| If multiple work-items are writing to and reading from multiple locations in |
| an image, the *work_group_barrier*(`CLK_IMAGE_MEM_FENCE`) should be used. |
| |
| Consider the following example: |
| |
| [source,opencl_c] |
| ---------- |
| kernel void |
| foo(read_write image2d_t img, ... ) |
| { |
| int2 coord; |
| coord.x = (int)get_global_id(0); |
| coord.y = (int)get_global_id(1); |
| |
| float4 clr = read_imagef(img, coord); |
| ... |
| write_imagef(img, coord, clr); |
| |
| // required to ensure that following read from image at |
| // location coord returns the latest color value. |
| atomic_work_item_fence( |
| CLK_IMAGE_MEM_FENCE, |
| memory_order_acq_rel, |
| memory_scope_work_item); |
| |
| float4 clr_new = read_imagef(img, coord); |
| ... |
| |
| } |
| ---------- |
| |
| |
| [[mapping-image-channels-to-color-values-returned-by-read_image-and-color-values-passed-to-write_image-to-image-channels]] |
| ==== Mapping Image Channels to Color Values Returned by read_image and Color Values Passed to write_image to Image Channels |
| |
| The following table describes the mapping of the number of channels of an |
| image element to the appropriate components in the `float4`, `int4` or |
| `uint4` vector data type for the color values returned by |
| *read_image{f|i|ui}* or supplied to *write_image{f|i|ui}*. |
| The unmapped components will be set to 0.0 for red, green and blue channels |
| and will be set to 1.0 for the alpha channel. |
| |
| [cols=",",options="header",] |
| |==== |
| | Channel Order | `float4`, `int4` or `uint4` components of channel data |
| | `CL_R`, `CL_Rx` | (r, 0.0, 0.0, 1.0) |
| | `CL_A` | (0.0, 0.0, 0.0, a) |
| | `CL_RG`, `CL_RGx` | (r, g, 0.0, 1.0) |
| | `CL_RA` | (r, 0.0, 0.0, a) |
| | `CL_RGB`, `CL_RGBx`, `CL_sRGB`, `CL_sRGBx` |
| | (r, g, b, 1.0) |
| | `CL_RGBA`, `CL_BGRA`, `CL_ARGB`, `CL_ABGR`, `CL_sRGBA`, `CL_sBGRA` |
| | (r, g, b, a) |
| | `CL_INTENSITY` | (I, I, I, I) |
| | `CL_LUMINANCE` | (L, L, L, 1.0) |
| |==== |
| |
| For `CL_DEPTH` images, a scalar value is returned by *read_imagef* or |
| supplied to *write_imagef*. |
| <<unified-spec, Requires>> support for OpenCL C 2.0 or newer, or for |
| the `<<cl_khr_depth_images>>` extension macro. |
| |
| [NOTE] |
| ==== |
| A kernel that uses a sampler with the `CL_ADDRESS_CLAMP` addressing mode |
| with multiple images may result in additional samplers being used internally |
| by an implementation. |
If the same sampler is used with multiple images in calls to
*read_image{f|i|ui}*, an implementation may need to allocate an additional
sampler to handle the different border color values that the image formats
being used may require.
These implementation-allocated samplers count against the maximum number of
samplers supported by the device, given by `CL_DEVICE_MAX_SAMPLERS`.
| Enqueuing a kernel that requires more samplers than the implementation can |
| support will result in a `CL_OUT_OF_RESOURCES` error being returned. |
| ==== |
| |
| |
| [[work-group-functions]] |
| === Work-group Collective Functions |
| |
| [open,refpage='workGroupFunctions',desc='Work-group Collective Functions',type='freeform',spec='clang',anchor='work-group-functions',xrefs='',alias='work_group_all work_group_any work_group_broadcast work_group_reduce work_group_scan_exclusive work_group_scan_inclusive'] |
| -- |
| NOTE: The functionality described in this section <<unified-spec, requires>> |
| support for OpenCL C 2.0, or OpenCL C 3.0 or newer and the |
| {opencl_c_work_group_collective_functions} feature. |
| |
This section describes built-in functions that perform collective operations
across a work-group.
| These built-in functions must be encountered by all work-items in a |
| work-group executing the kernel. |
| We use the generic type name `gentype` to indicate the built-in data types |
| `half` footnote:[{fn-half-supported}], `int`, `uint`, `long` |
| footnote:[{fn-int64-supported}], `ulong`, `float` or `double` |
| footnote:[{fn-double-supported}] as the type for the arguments. |
| |
| [[table-builtin-work-group]] |
| .Built-in Work-group Collective Functions |
| [cols=",",options="header",] |
| |==== |
| | Function | Description |
| | int *work_group_all*(int _predicate_) |
| | Evaluates _predicate_ for all work-items in the work-group and returns |
| a non-zero value if _predicate_ evaluates to non-zero for all |
| work-items in the work-group. |
| | int *work_group_any*(int _predicate_) |
| | Evaluates _predicate_ for all work-items in the work-group and returns |
| a non-zero value if _predicate_ evaluates to non-zero for any |
| work-items in the work-group. |
| | gentype *work_group_broadcast*(gentype _a_, size_t _local_id_) + |
| gentype *work_group_broadcast*(gentype _a_, size_t _local_id_x_, |
| size_t _local_id_y_) + |
| gentype *work_group_broadcast*(gentype _a_, size_t _local_id_x_, |
| size_t _local_id_y_, size_t _local_id_z_) |
| Broadcast the value of _a_ from the work-item identified by _local_id_
to all work-items in the work-group.

Behavior is undefined when the value of _local_id_ is not equivalent for
all work-items in the work-group.

Behavior is undefined when _local_id_ is greater than or equal to the
work-group size in the corresponding dimension.
| | gentype *work_group_reduce_<op>*(gentype _x_) |
| Return the result of the reduction operation specified by *<op>* for
all values of _x_ specified by work-items in a work-group.
| | gentype *work_group_scan_exclusive_<op>*(gentype _x_) |
| | Do an exclusive scan operation specified by *<op>* of all values |
| specified by work-items in the work-group. The scan results are |
| returned for each work-item. |
| |
| The scan order is defined by increasing 1D linear global ID within the |
| work-group. |
| | gentype *work_group_scan_inclusive_<op>*(gentype _x_) |
| | Do an inclusive scan operation specified by *<op>* of all values |
| specified by work-items in the work-group. The scan results are |
| returned for each work-item. |
| |
| The scan order is defined by increasing 1D linear global ID within the |
| work-group. |
| |==== |
| |
| The *<op>* in *work_group_reduce_<op>*, *work_group_scan_exclusive_<op>* and |
| *work_group_scan_inclusive_<op>* defines the operator and can be *add*, |
| *min* or *max*. |
| |
| The inclusive scan operation takes a binary operator *op* with _n_ (where _n_ |
| is the size of the work-group) elements [a~0~, a~1~, ... a~n-1~] and returns |
| [a~0~, (a~0~ *op* a~1~), ... (a~0~ *op* a~1~ *op* ... *op* a~n-1~)]. |
| |
| Consider the following example: |
| |
| [source,opencl_c] |
| ---------- |
| void foo(int *p) |
| { |
| ... |
| int prefix_sum_val = work_group_scan_inclusive_add( |
| p[get_local_id(0)]); |
| } |
| ---------- |
| |
| For the example above, let's assume that the work-group size is 8 and _p_ |
| points to the following elements [3 1 7 0 4 1 6 3]. |
| Work-item 0 calls *work_group_scan_inclusive_add* with 3 and returns 3. |
| Work-item 1 calls *work_group_scan_inclusive_add* with 1 and returns 4. |
The full set of values returned by *work_group_scan_inclusive_add* for
work-items 0 ... 7 is [3 4 11 11 15 16 22 25].
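The worked example above can be checked with a sequential C model of one work-group. In this model, array index _i_ stands for the work-item with linear ID _i_. This is an illustrative sketch only, not OpenCL C device code, and `model_scan_inclusive_add` is a hypothetical name.

[source,c]
----------
#include <assert.h>
#include <stddef.h>

/* x[i] is the value passed by the work-item with linear ID i; out[i]
 * receives x[0] + x[1] + ... + x[i], the value that work-item gets back. */
static void model_scan_inclusive_add(const int *x, int *out, size_t n)
{
    assert(n > 0 && x != NULL && out != NULL);
    int running = 0;
    for (size_t i = 0; i < n; ++i) {
        running += x[i];
        out[i] = running;
    }
}
----------

Calling it with [3 1 7 0 4 1 6 3] fills `out` with [3 4 11 11 15 16 22 25]. The last element, 25, is also the value that *work_group_reduce_add* would return to every work-item for the same inputs.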
| |
The exclusive scan operation takes a binary associative operator *op* with
an identity I and _n_ (where _n_ is the size of the work-group) elements
[a~0~, a~1~, ... a~n-1~] and returns [I, a~0~, (a~0~ *op* a~1~), ... (a~0~
*op* a~1~ *op* ... *op* a~n-2~)].
If *op* = add, the identity I is 0.
If *op* = min, the identity I is `INT_MAX`, `UINT_MAX`, `LONG_MAX`, and
`ULONG_MAX` for the `int`, `uint`, `long`, and `ulong` types, respectively,
and `+INF` for floating-point types.
Similarly, if *op* = max, the identity I is `INT_MIN`, 0, `LONG_MIN`, and 0
for the `int`, `uint`, `long`, and `ulong` types, respectively, and `-INF`
for floating-point types.
| For the example above, the exclusive scan add operation on the ordered set |
| [3 1 7 0 4 1 6 3] would return [0 3 4 11 11 15 16 22]. |
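The identity values matter because the first work-item has no predecessors, so its exclusive-scan result is I itself. The sequential C model below covers the `int` case for add, min and max. It is not OpenCL C device code, and `scan_op`, `scan_identity`, `scan_apply` and `model_scan_exclusive` are hypothetical names.

[source,c]
----------
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical selector for the add, min and max operators. */
typedef enum { SCAN_ADD, SCAN_MIN, SCAN_MAX } scan_op;

/* Identity I for the int type: 0 for add, INT_MAX for min, INT_MIN for max. */
static int scan_identity(scan_op op)
{
    switch (op) {
    case SCAN_MIN: return INT_MAX;
    case SCAN_MAX: return INT_MIN;
    default:       return 0;
    }
}

static int scan_apply(scan_op op, int a, int b)
{
    switch (op) {
    case SCAN_MIN: return a < b ? a : b;
    case SCAN_MAX: return a > b ? a : b;
    default:       return a + b;
    }
}

/* x[i] is the value passed by the work-item with linear ID i; out[i]
 * receives I for i == 0 and x[0] op ... op x[i-1] otherwise. */
static void model_scan_exclusive(scan_op op, const int *x, int *out, size_t n)
{
    assert(n > 0 && x != NULL && out != NULL);
    int running = scan_identity(op);
    for (size_t i = 0; i < n; ++i) {
        out[i] = running;
        running = scan_apply(op, running, x[i]);
    }
}
----------

With the ordered set [3 1 7 0 4 1 6 3], the model produces three results:

* add yields [0 3 4 11 11 15 16 22].
* min yields [`INT_MAX` 3 1 1 0 0 0 0].
* max yields [`INT_MIN` 3 3 7 7 7 7 7].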
| |
| [NOTE] |
| ==== |
| The order of floating-point operations is not guaranteed for the |
| *work_group_reduce_<op>*, *work_group_scan_inclusive_<op>* and |
| *work_group_scan_exclusive_<op>* built-in functions that operate on `half`, |
| `float` and `double` data types. |
| The order of these floating-point operations is also non-deterministic for a |
| given work-group. |
| ==== |
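Floating-point addition is not associative, so two legal evaluation orders can produce different results. The small host-side C example below sums the same three `float` values in two orders. `sum_forward` and `sum_backward` are hypothetical helpers. The example only shows why results for floating-point types can vary; it does not model how any particular implementation orders its operations.

[source,c]
----------
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums values[0..n-1] from first to last in float. */
static float sum_forward(const float *values, size_t n)
{
    assert(n == 0 || values != NULL);
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += values[i];
    return acc;
}

/* Hypothetical helper: sums values[0..n-1] from last to first in float. */
static float sum_backward(const float *values, size_t n)
{
    assert(n == 0 || values != NULL);
    float acc = 0.0f;
    for (size_t i = n; i > 0; --i)
        acc += values[i - 1];
    return acc;
}
----------

Summed forward, {1e20f, -1e20f, 1.0f} gives 1.0f. Summed backward, the 1.0f is absorbed when it is added to -1e20f, and the result is 0.0f.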
| -- |
| |
| |
| ifdef::cl_khr_work_group_uniform_arithmetic[] |
| [[work-group-collective-uniform-arithmetic-functions]] |
| === Work-group Collective Uniform Arithmetic Functions |
| |
| [open,refpage='workGroupUniformArithmeticFunctions',desc='Work-group Collective Uniform Arithmetic Functions',type='freeform',spec='clang',anchor='work-group-collective-uniform-arithmetic-functions',xrefs='workGroupFunctions',alias='work_group_all work_group_any work_group_broadcast work_group_reduce work_group_scan_exclusive work_group_scan_inclusive'] |
| -- |
| NOTE: The functionality described in this section <<unified-spec, requires>> |
| support for OpenCL C 2.0 and the `<<cl_khr_work_group_uniform_arithmetic>>` |
| extension macro. |
| |
| The <<table-builtin-work-group-logical>> table describes the OpenCL C |
| programming language built-in functions that perform logical arithmetic |
| operations across work items in a work-group. |
| These functions must be encountered by all work items in a work-group |
| executing the kernel, otherwise the behavior is undefined. |
| For these functions, a non-zero _predicate_ argument or return value is |
| logically `true` and a zero _predicate_ argument or return value is |
| logically `false`. |
| |
| [[table-builtin-work-group-logical]] |
| .Built-in Work-group Logical Arithmetic Functions |
| [cols="2a,1",options="header"] |
| |==== |
| | Function | Description |
| |[source,opencl_c] |
| ---- |
| int work_group_reduce_logical_and(int predicate); |
| int work_group_reduce_logical_or(int predicate); |
| int work_group_reduce_logical_xor(int predicate); |
| ---- |
| | Returns the logical *and*, *or*, or *xor* of _predicate_ for all work |
| items in the work-group. |
| |[source,opencl_c] |
| ---- |
| int work_group_scan_inclusive_logical_and(int predicate); |
| int work_group_scan_inclusive_logical_or(int predicate); |
| int work_group_scan_inclusive_logical_xor(int predicate); |
| ---- |
| | Returns the result of an inclusive scan operation, which is the logical |
| *and*, *or*, or *xor* of _predicate_ for all work items in the |
| work-group with a work-group linear local ID less than or equal to this |
| work item's work-group linear local ID. |
| |[source,opencl_c] |
| ---- |
| int work_group_scan_exclusive_logical_and(int predicate); |
| int work_group_scan_exclusive_logical_or(int predicate); |
| int work_group_scan_exclusive_logical_xor(int predicate); |
| ---- |
| | Returns the result of an exclusive scan operation, which is the logical |
| *and*, *or*, or *xor* of _predicate_ for all work items in the |
| work-group with a work-group linear local ID less than this work item's |
| work-group linear local ID. |
| |
| If there is no work item in the work-group with a work-group linear |
| local ID less than this work item's work-group linear local ID then an |
| identity value `I` is returned. |
| For *and*, the identity value is `true` (non-zero). |
| For *or* and *xor*, the identity value is `false` (zero). |
| |==== |
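A sequential C model of the exclusive logical scans can make the identity values concrete. The model assumes one work-group whose work-items are indexed by linear local ID. It is not OpenCL C device code, and `logical_op` and `model_scan_exclusive_logical` are hypothetical names. The model normalizes every result to 0 or 1, whereas the table above only distinguishes zero from non-zero.

[source,c]
----------
#include <assert.h>
#include <stddef.h>

/* Hypothetical selector for the three logical operators. */
typedef enum { LOGICAL_AND, LOGICAL_OR, LOGICAL_XOR } logical_op;

/* pred[i] is the predicate passed by the work-item with linear local ID i;
 * out[i] receives the value that work-item would get back. */
static void model_scan_exclusive_logical(logical_op op, const int *pred,
                                         int *out, size_t n)
{
    assert(n > 0 && pred != NULL && out != NULL);
    /* Identity: true (1) for and, false (0) for or and xor. */
    int running = (op == LOGICAL_AND) ? 1 : 0;
    for (size_t i = 0; i < n; ++i) {
        out[i] = running;            /* combines pred[0] .. pred[i-1] */
        int p = (pred[i] != 0);      /* any non-zero predicate is true */
        switch (op) {
        case LOGICAL_AND: running = running && p; break;
        case LOGICAL_OR:  running = running || p; break;
        case LOGICAL_XOR: running = running != p; break;
        }
    }
}
----------

For the predicates {5, 0, -2, 0}, the model returns {1, 1, 0, 0} for *and*, {0, 1, 1, 1} for *or*, and {0, 1, 1, 0} for *xor*.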
| |
| The <<table-builtin-work-group-bitwise-integer>> table describes the OpenCL |
| C programming language built-in functions that perform bitwise integer |
| operations across work items in a work-group. |
| These functions must be encountered by all work items in a work-group |
| executing the kernel, otherwise the behavior is undefined. |
| For the functions below, the generic type name `gentype` may be one of the |
| supported built-in scalar data types `int`, `uint`, `long`, and `ulong`. |
| |
| [[table-builtin-work-group-bitwise-integer]] |
| .Built-in Work-group Bitwise Integer Functions |
| [cols="2a,1",options="header"] |
| |==== |
| | Function | Description |
| |[source,opencl_c] |
| ---- |
| gentype work_group_reduce_and(gentype value); |
| gentype work_group_reduce_or(gentype value); |
| gentype work_group_reduce_xor(gentype value); |
| ---- |
| | Returns the bitwise *and*, *or*, or *xor* of _value_ for all work items |
| in the work-group. |
| |[source,opencl_c] |
| ---- |
| gentype work_group_scan_inclusive_and(gentype value); |
| gentype work_group_scan_inclusive_or(gentype value); |
| gentype work_group_scan_inclusive_xor(gentype value); |
| ---- |
| | Returns the result of an inclusive scan operation, which is the bitwise |
| *and*, *or*, or *xor* of _value_ for all work items in the work-group |
| with a work-group linear local ID less than or equal to this work item's |
| work-group linear local ID. |
| |[source,opencl_c] |
| ---- |
| gentype work_group_scan_exclusive_and(gentype value); |
| gentype work_group_scan_exclusive_or(gentype value); |
| gentype work_group_scan_exclusive_xor(gentype value); |
| ---- |
| | Returns the result of an exclusive scan operation, which is the bitwise |
| *and*, *or*, or *xor* of _value_ for all work items in the work-group |
| with a work-group linear local ID less than this work item's work-group |
| linear local ID. |
| |
| If there is no work item in the work-group with a work-group linear |
| local ID less than this work item's work-group linear local ID then an |
| identity value `I` is returned. |
| For *and*, the identity value is `~0` (all bits set). |
| For *or* and *xor*, the identity value is `0`. |
| |==== |
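The bitwise scans follow the same pattern, with `~0` as the identity for *and*. The C sketch below models the `uint` case sequentially, using host `unsigned int` as a stand-in for OpenCL C `uint`. `bitwise_op` and `model_scan_exclusive_bitwise` are hypothetical names, and this is not OpenCL C device code.

[source,c]
----------
#include <assert.h>
#include <stddef.h>

/* Hypothetical selector for the three bitwise operators. */
typedef enum { BITWISE_AND, BITWISE_OR, BITWISE_XOR } bitwise_op;

/* v[i] is the value passed by the work-item with linear local ID i, and
 * out[i] receives that work-item's exclusive-scan result. */
static void model_scan_exclusive_bitwise(bitwise_op op, const unsigned *v,
                                         unsigned *out, size_t n)
{
    assert(n > 0 && v != NULL && out != NULL);
    /* Identity: all bits set for and, zero for or and xor. */
    unsigned running = (op == BITWISE_AND) ? ~0u : 0u;
    for (size_t i = 0; i < n; ++i) {
        out[i] = running;            /* combines v[0] .. v[i-1] */
        switch (op) {
        case BITWISE_AND: running &= v[i]; break;
        case BITWISE_OR:  running |= v[i]; break;
        case BITWISE_XOR: running ^= v[i]; break;
        }
    }
}
----------

For the values {0xF0, 0x3C, 0x0F}, the model returns three results:

* The exclusive *and* scan returns {~0, 0xF0, 0x30}.
* The exclusive *or* scan returns {0, 0xF0, 0xFC}.
* The exclusive *xor* scan returns {0, 0xF0, 0xCC}.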
| |
| The <<table-builtin-work-group-multiplicative>> table describes the OpenCL C |
| programming language built-in functions that perform multiplicative |
| operations across work items in a work-group. |
| These functions must be encountered by all work items in a work-group |
| executing the kernel, otherwise the behavior is undefined. |
| For the functions below, the generic type name `gentype` may be one of the |
| supported built-in scalar data types `int`, `uint`, `long`, `ulong`, |
| `float`, `double` (if double precision is supported), or `half` (if half |
| precision is supported). |
| |
| [[table-builtin-work-group-multiplicative]] |
| .Built-in Work-group Multiplicative Functions |
| [cols="2a,1",options="header"] |
| |==== |
| | Function | Description |
| |[source,opencl_c] |
| ---- |
| gentype work_group_reduce_mul(gentype value); |
| ---- |
| | Returns the multiplication of _value_ for all work items in the |
| work-group. |
| |[source,opencl_c] |
| ---- |
| gentype work_group_scan_inclusive_mul(gentype value); |
| ---- |
| | Returns the result of an inclusive scan operation which is the |
| multiplication of _value_ for all work items in the work-group with a |
| work-group linear local ID less than or equal to this work item's |
| work-group linear local ID. |
| |[source,opencl_c] |
| ---- |
| gentype work_group_scan_exclusive_mul(gentype value); |
| ---- |
| | Returns the result of an exclusive scan operation which is the |
| multiplication of _value_ for all work items in the work-group with a |
| work-group linear local ID less than this work item's work-group linear |
| local ID. |
| |
| If there is no work item in the work-group with a work-group linear |
| local ID less than this work item's work-group linear local ID then the |
| identity value `1` is returned. |
| |==== |
| -- |
| endif::cl_khr_work_group_uniform_arithmetic[] |
| |
| |
| [[pipe-functions]] |
| === Pipe Functions |
| |
| NOTE: The functionality described in this section <<unified-spec, requires>> |
| support for OpenCL C 2.0, or OpenCL C 3.0 or newer and the {opencl_c_pipes} feature. |
| |
| A pipe is identified by specifying the `pipe` keyword with a type. |
| The data type specifies the type of each element in the pipe. |
| The `pipe` keyword is a type specifier. |
| When it is applied to another type *T*, the result is a pipe type whose |
| elements (or packets) are of type *T*. |
The packet type *T* may be any supported OpenCL C scalar or vector integer
or floating-point data type, or a user-defined type built from these scalar
and vector data types.
| |
| Examples: |
| |
| [source,opencl_c] |
| ---------- |
| pipe int4 pipeA; // a pipe with int4 packets |
| |
| pipe user_type_t pipeB; // a pipe with user_type_t packets |
| ---------- |
| |
| The `read_only` (or `{read_only}`) and `write_only` (or `{write_only}`) |
| qualifiers must be used with the `pipe` specifier when a pipe is a parameter |
| of a kernel or of a user-defined function to identify if a pipe can be read |
| from or written to by a kernel and its callees and enqueued child kernels. |
| If no qualifier is specified, `read_only` is assumed. |
| |
| A kernel cannot read from and write to the same pipe object. |
| Using the `read_write` (or `{read_write}`) qualifier with the `pipe` |
| specifier is a compilation error. |
| |
In the following example,
| |
| [source,opencl_c] |
| ---------- |
| kernel void |
| foo (read_only pipe fooA_t pipeA, |
| write_only pipe fooB_t pipeB) |
| { |
| ... |
| } |
| ---------- |
| |
| `pipeA` is a read-only pipe object, and `pipeB` is a write-only pipe object. |
| |
| The macro `CLK_NULL_RESERVE_ID` refers to an invalid reservation ID. |
| |
| |
| [[restrictions-3]] |
| ==== Restrictions |
| |
| * Pipes can only be passed as arguments to a function (including kernel |
| functions). |
| The <<operators,C operators>> cannot be used with variables declared |
| with the pipe specifier. |
| * The `pipe` specifier cannot be used with variables declared inside a |
| kernel, a structure or union field, a pointer type, an array, global |
| variables declared in program scope or the return type of a function. |
| |
| |
| [[built-in-pipe-read-and-write-functions]] |
| ==== Built-in Pipe Read and Write Functions |
| |
| [open,refpage='pipeFunctions',desc='Built-in Pipe Read and Write Functions',type='freeform',spec='clang',anchor='built-in-pipe-read-and-write-functions',xrefs='pipeWorkgroupFunctions pipeQueryFunctions',alias='commit_read_pipe commit_write_pipe is_valid_reserve_id read_pipe reserve_read_pipe reserve_write_pipe write_pipe'] |
| -- |
| |
| The OpenCL C programming language implements the following built-in |
| functions that read from or write to a pipe. |
We use the generic type name `gentype` to indicate the built-in OpenCL C scalar
or vector integer or floating-point data types
footnote:[{fn-float-types-supported}], or any user-defined type built from
these scalar and vector data types, that can be used as the type for the
arguments to the pipe functions listed in the following table.
| |
| [[table-builtin-pipe]] |
| .Built-in Pipe Functions |
| [cols=",",options="header",] |
| |==== |
| | Function | Description |
| | int *read_pipe*(read_only pipe gentype _p_, gentype *_ptr_) |
| | Read packet from pipe _p_ into _ptr_. |
| Returns 0 if *read_pipe* is successful and a negative value if the |
| pipe is empty. |
| int *write_pipe*(write_only pipe gentype _p_, const gentype *_ptr_)
| Write packet specified by _ptr_ to pipe _p_.
Returns 0 if *write_pipe* is successful and a negative value if the
pipe is full.
| int *read_pipe*(read_only pipe gentype _p_, reserve_id_t _reserve_id_,
uint _index_, gentype *_ptr_)
| Read packet from the reserved area of the pipe referred to by
_reserve_id_ and _index_ into _ptr_.

The reserved pipe entries are referred to by indices that go from 0
... _num_packets_ - 1.

Returns 0 if *read_pipe* is successful and a negative value otherwise.
| int *write_pipe*(write_only pipe gentype _p_, reserve_id_t
_reserve_id_, uint _index_, const gentype *_ptr_)
| Write packet specified by _ptr_ to the reserved area of the pipe
referred to by _reserve_id_ and _index_.

The reserved pipe entries are referred to by indices that go from 0
... _num_packets_ - 1.

Returns 0 if *write_pipe* is successful and a negative value
otherwise.
| | reserve_id_t *reserve_read_pipe*(read_only pipe gentype _p_, |
| uint _num_packets_) + |
| reserve_id_t *reserve_write_pipe*(write_only pipe gentype _p_, |
| uint _num_packets_) |
| | Reserve _num_packets_ entries for reading from or writing to pipe _p_. |
| Returns a valid reservation ID if the reservation is successful. |
| | void *commit_read_pipe*(read_only pipe gentype _p_, |
| reserve_id_t _reserve_id_) + |
| void *commit_write_pipe*(write_only pipe gentype _p_, |
| reserve_id_t _reserve_id_) |
| Indicates that all reads and writes to the _num_packets_ entries
associated with reservation _reserve_id_ are completed.
| | bool *is_valid_reserve_id*(reserve_id_t _reserve_id_) |
| | Return _true_ if _reserve_id_ is a valid reservation ID and _false_ |
| otherwise. |
| |==== |
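The reservation functions above can be pictured with a single-threaded C model of one pipe of `int` packets. This is a hypothetical sketch for illustration only, with these limits:

* It is not OpenCL C device code and ignores concurrency.
* It assumes reservations are committed in the order they were made.
* Every name in it (`model_pipe`, `model_reserve_write`, and so on) is invented for this example.
* Only writes go through a reservation; reads use the non-reserving *read_pipe* form.

[source,c]
----------
#include <assert.h>
#include <stdbool.h>

#define MODEL_PIPE_CAPACITY 8u  /* hypothetical maximum packets in the pipe */

typedef struct {
    int packets[MODEL_PIPE_CAPACITY];
    unsigned head;      /* ring slot of the oldest committed packet */
    unsigned count;     /* committed packets visible to readers */
    unsigned reserved;  /* packets reserved for writing, not yet committed */
} model_pipe;

typedef struct {
    unsigned start;        /* first ring slot covered by the reservation */
    unsigned num_packets;  /* 0 marks an invalid reservation ID */
} model_reserve_id;

static bool model_is_valid_reserve_id(model_reserve_id rid)
{
    return rid.num_packets != 0;
}

/* Models reserve_write_pipe: returns an invalid ID if n packets do not fit. */
static model_reserve_id model_reserve_write(model_pipe *p, unsigned n)
{
    model_reserve_id rid = {0u, 0u};
    if (n == 0 || p->count + p->reserved + n > MODEL_PIPE_CAPACITY)
        return rid;
    rid.start = (p->head + p->count + p->reserved) % MODEL_PIPE_CAPACITY;
    rid.num_packets = n;
    p->reserved += n;
    return rid;
}

/* Models write_pipe with a reservation ID. An out-of-range index is
 * undefined behavior in OpenCL C, so the model treats it as a bug. */
static int model_write_reserved(model_pipe *p, model_reserve_id rid,
                                unsigned index, const int *ptr)
{
    assert(model_is_valid_reserve_id(rid) && index < rid.num_packets);
    p->packets[(rid.start + index) % MODEL_PIPE_CAPACITY] = *ptr;
    return 0;
}

/* Models commit_write_pipe: the reserved packets become visible together. */
static void model_commit_write(model_pipe *p, model_reserve_id rid)
{
    assert(model_is_valid_reserve_id(rid) && rid.num_packets <= p->reserved);
    p->reserved -= rid.num_packets;
    p->count += rid.num_packets;
}

/* Models read_pipe without a reservation: negative result if empty. */
static int model_read(model_pipe *p, int *ptr)
{
    if (p->count == 0)
        return -1;
    *ptr = p->packets[p->head];
    p->head = (p->head + 1) % MODEL_PIPE_CAPACITY;
    p->count--;
    return 0;
}
----------

In this model, packets written through a reservation are not visible to `model_read` until `model_commit_write` is called. After the commit, they appear in order as one contiguous group.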
| -- |
| |
| |
| [[built-in-work-group-pipe-read-and-write-functions]] |
| ==== Built-in Work-group Pipe Read and Write Functions |
| |
| [open,refpage='pipeWorkgroupFunctions',desc='Built-in Work-group Pipe Read and Write Functions',type='freeform',spec='clang',anchor='built-in-work-group-pipe-read-and-write-functions',xrefs='pipeFunctions pipeQueryFunctions',alias='work_group_commit_read_pipe work_group_commit_write_pipe work_group_reserve_read_pipe work_group_reserve_write_pipe'] |
| -- |
| |
| The OpenCL C programming language implements the following built-in pipe |
| functions that operate at a work-group level. |
| These built-in functions must be encountered by all work-items in a |
| work-group executing the kernel with the same argument values; otherwise the |
| behavior is undefined. |
We use the generic type name `gentype` to indicate the built-in OpenCL C scalar
or vector integer or floating-point data types
footnote:[{fn-float-types-supported}], or any user-defined type built from
these scalar and vector data types, that can be used as the type for the
arguments to the pipe functions listed in the following table.
| |
| [[table-builtin-pipe-work-group]] |
| .Built-in Pipe Work-group Functions |
| [cols=",",options="header",] |
| |==== |
| | Function | Description |
| | reserve_id_t *work_group_reserve_read_pipe*(read_only pipe gentype _p_, |
| uint _num_packets_) + |
| reserve_id_t *work_group_reserve_write_pipe*(write_only pipe gentype _p_, |
| uint _num_packets_) |
| | Reserve _num_packets_ entries for reading from or writing to pipe _p_. |
| Returns a valid reservation ID if the reservation is successful. |
| |
| The reserved pipe entries are referred to by indices that go from 0 |
| ... _num_packets_ - 1. |
| void *work_group_commit_read_pipe*(read_only pipe gentype _p_,
reserve_id_t _reserve_id_) +
void *work_group_commit_write_pipe*(write_only pipe gentype _p_,
reserve_id_t _reserve_id_)
| Indicates that all reads and writes to the _num_packets_ entries
associated with reservation _reserve_id_ are completed.
| |==== |
| |
| [NOTE] |
| ==== |
| The *read_pipe* and *write_pipe* functions that take a reservation ID as an |
| argument can be used to read from or write to a packet index. |
| These built-ins can be used to read from or write to a packet index one or |
| multiple times. |
| If a packet index that is reserved for writing is not written to using the |
| *write_pipe* function, the contents of that packet in the pipe are |
| undefined. |
| *commit_read_pipe* and *work_group_commit_read_pipe* remove the entries |
| reserved for reading from the pipe. |
*commit_write_pipe* and *work_group_commit_write_pipe* ensure that the
entries reserved for writing are all added in-order as one contiguous set of
packets to the pipe.
| ==== |
| |
The number of active reservations (i.e. reservation IDs that have been
reserved but not committed) per work-item or work-group for a pipe in a
kernel executing on a device cannot exceed the value of the
<<opencl-device-queries,
`CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS` device query>>.
| |
| Work-item based reservations made by a work-item are ordered in the pipe as |
| they are ordered in the program. |
| Reservations made by different work-items that belong to the same work-group |
| can be ordered using the work-group barrier function. |
| The order of work-item based reservations that belong to different |
| work-groups is implementation-defined. |
| |
| Work-group based reservations made by a work-group are ordered in the pipe |
| as they are ordered in the program. |
| The order of work-group based reservations by different work-groups is |
| implementation-defined. |
| -- |
| |
| |
| [[built-in-pipe-query-functions]] |
| ==== Built-in Pipe Query Functions |
| |
| [open,refpage='pipeQueryFunctions',desc='Built-in Pipe Query Functions',type='freeform',spec='clang',anchor='built-in-pipe-query-functions',xrefs='pipeFunctions pipeWorkgroupFunctions',alias='get_pipe_max_packets get_pipe_num_packets'] |
| -- |
| |
| The OpenCL C programming language implements the following built-in query |
| functions for a pipe. |
We use the generic type name `gentype` to indicate the built-in OpenCL C scalar
or vector integer or floating-point data types
footnote:[{fn-float-types-supported}], or any user-defined type built from
these scalar and vector data types, that can be used as the type for the
arguments to the pipe functions listed in the following table.
| |
| _aQual_ in the following table refers to one of the access qualifiers. |
| For pipe query functions this may be `read_only` or `write_only`. |
| |
| [[table-builtin-pipe-query]] |
| .Built-in Pipe Query Functions |
| [cols=",",options="header",] |
| |==== |
| | Function | Description |
| | uint *get_pipe_num_packets*(_aQual_ pipe gentype _p_) |
| | Returns the number of available entries in the pipe. |
| The number of available entries in a pipe is a dynamic value. |
| The value returned should be considered immediately stale. |
| | uint *get_pipe_max_packets*(_aQual_ pipe gentype _p_) |
| Returns the maximum number of packets specified when _p_ was
created.
| |==== |
| -- |
| |
| |
| [[restrictions-4]] |
| ==== Restrictions |
| |
| The following behavior is undefined: |
| |
* A kernel fails to call *reserve_read_pipe* or *reserve_write_pipe* before
  calling *read_pipe* or *write_pipe* that take a reservation ID.
* A kernel calls *read_pipe*, *write_pipe*, *commit_read_pipe* or
  *commit_write_pipe* with an invalid reservation ID.
* A kernel calls *read_pipe* or *write_pipe* with a valid reservation ID
  but with an _index_ that is not a value in the range [0,
  _num_packets_-1] specified to the corresponding call to
  *reserve_read_pipe* or *reserve_write_pipe*.
| * A kernel calls *read_pipe* or *write_pipe* with a reservation ID that |
| has already been committed (i.e. a *commit_read_pipe* or |
| *commit_write_pipe* with this reservation ID has already been called). |
| * A kernel fails to call *commit_read_pipe* for any reservation ID |
| obtained by a prior call to *reserve_read_pipe*. |
| * A kernel fails to call *commit_write_pipe* for any reservation ID |
| obtained by a prior call to *reserve_write_pipe*. |
* The contents of the reserved data packets in the pipe are undefined if
  the kernel does not call *write_pipe* for all entries that were reserved
  by the corresponding call to *reserve_write_pipe*.
* Calls to *read_pipe* that take a reservation ID and to *commit_read_pipe*,
  or calls to *write_pipe* that take a reservation ID and to
  *commit_write_pipe*, for a given reservation ID must be made by the same
  kernel that made the reservation using *reserve_read_pipe* or
  *reserve_write_pipe*.
  The reservation ID cannot be passed to another kernel, including child
  kernels.
| |
| |
| [[enqueuing-kernels]] |
| === Enqueuing Kernels |
| |
| [open,refpage='enqueue_kernel',desc='Enqueuing Kernels',type='freeform',spec='clang',anchor='enqueuing-kernels',xrefs='enqueue_marker'] |
| -- |
| NOTE: The functionality described in this section <<unified-spec, requires>> |
| support for OpenCL C 2.0, or OpenCL C 3.0 or newer and the |
| {opencl_c_device_enqueue} feature. |
| |
| This section describes built-in functions that allow a kernel to |
| enqueue additional work to the same device, without host interaction. |
| A kernel may enqueue code represented by Block syntax, and control execution |
| order with event dependencies including user events and markers. |
| There are several advantages to using the Block syntax: it is more compact; |
| it does not require a cl_kernel object; and enqueuing can be done as a |
| single semantic step. |
| |
The following table describes the list of built-in functions that can be
used to enqueue one or more kernels.
| |
| ifdef::cl_khr_device_enqueue_local_arg_types[] |
| When the `<<cl_khr_device_enqueue_local_arg_types>>` extension macro is |
| supported, the <<table-builtin-kernel-enqueue, Built-in Kernel Enqueue |
| Functions>> and <<table-builtin-kernel-query, Built-in Kernel Query |
| Functions>> described in this section can use any of the built-in OpenCL C |
| scalar or vector integer or floating-point data types, or any user defined |
| type built from these scalar and vector data types, as the pointee type of |
| their arguments. |
| This is indicated by the generic type name `gentype` in those function |
| signatures. |
| |
| When the `<<cl_khr_device_enqueue_local_arg_types>>` extension macro is |
| not supported, the pointee type of these functions must be `void`. |
| |
| :localArgType: gentype |
| endif::cl_khr_device_enqueue_local_arg_types[] |
| |
| ifndef::cl_khr_device_enqueue_local_arg_types[] |
| :localArgType: void |
| endif::cl_khr_device_enqueue_local_arg_types[] |
| |
| The macro `CLK_NULL_EVENT` refers to an invalid device event. |
| The macro `CLK_NULL_QUEUE` refers to an invalid device queue. |
| -- |
| |
| |
| [[built-in-functions-enqueuing-a-kernel]] |
| ==== Built-in Functions - Enqueuing a Kernel |
| |
| [[table-builtin-kernel-enqueue]] |
| .Built-in Kernel Enqueue Functions |
| [cols=",",options="header",] |
| |==== |
| | Built-in Function | Description |
| |
| | int **enqueue_kernel**(queue_t _queue_, kernel_enqueue_flags_t _flags_, |
| const ndrange_t _ndrange_, void (^__block__)(void)) + |
| int **enqueue_kernel**(queue_t _queue_, kernel_enqueue_flags_t _flags_, |
| const ndrange_t _ndrange_, uint _num_events_in_wait_list_, |
| const clk_event_t *_event_wait_list_, clk_event_t *_event_ret_, |
| void (^__block__)(void)) + |
| int **enqueue_kernel**(queue_t _queue_, kernel_enqueue_flags_t _flags_, |
| const ndrange_t _ndrange_, void (^__block__)(local {localArgType} *, ...), |
| uint size0, ...) + |
| int **enqueue_kernel**(queue_t _queue_, kernel_enqueue_flags_t _flags_, |
| const ndrange_t _ndrange_, uint _num_events_in_wait_list_, |
| const clk_event_t *_event_wait_list_, clk_event_t *_event_ret_, |
| void (^__block__)(local {localArgType} *, ...), uint size0, ...) |
| | Enqueue the block for execution to _queue_. |
| |
| If an event is returned, *enqueue_kernel* performs an implicit retain |
| on the returned event. |
| |==== |
| |
| The *enqueue_kernel* built-in function allows a work-item to enqueue a |
| block. |
Work-items can enqueue multiple blocks to one or more device queues.
| |
| The *enqueue_kernel* built-in function returns `CLK_SUCCESS` if the block is |
| enqueued successfully and returns `CLK_ENQUEUE_FAILURE` otherwise. |
If the `-g` compile option is specified in the compiler options passed to
*clCompileProgram* or *clBuildProgram* when compiling or building the parent
| program, the following errors may be returned instead of |
| `CLK_ENQUEUE_FAILURE` to indicate why *enqueue_kernel* failed to enqueue the |
| block: |
| |
| * `CLK_INVALID_QUEUE` if _queue_ is not a valid device queue. |
| * `CLK_INVALID_NDRANGE` if _ndrange_ is not a valid ND-range descriptor or |
| if the program was compiled with `-cl-uniform-work-group-size` and the |
| _local_work_size_ is specified in _ndrange_ but the _global_work_size_ |
| specified in _ndrange_ is not a multiple of the _local_work_size_. |
| * `CLK_INVALID_EVENT_WAIT_LIST` if _event_wait_list_ is `NULL` and |
| _num_events_in_wait_list_ > 0, or if _event_wait_list_ is not `NULL` and |
| _num_events_in_wait_list_ is 0, or if event objects in _event_wait_list_ |
| are not valid events. |
| * `CLK_DEVICE_QUEUE_FULL` if _queue_ is full. |
| * `CLK_INVALID_ARG_SIZE` if size of local memory arguments is 0. |
| * `CLK_EVENT_ALLOCATION_FAILURE` if _event_ret_ is not `NULL` and an event |
| could not be allocated. |
| * `CLK_OUT_OF_RESOURCES` if there is a failure to queue the block in |
| _queue_ because of insufficient resources needed to execute the kernel. |
| |
| Below are some examples of how to enqueue a block. |
| |
| [source,opencl_c] |
| ---------- |
| kernel void |
| my_func_A(global int *a, global int *b, global int *c) |
| { |
| ... |
| } |
| |
| kernel void |
| my_func_B(global int *a, global int *b, global int *c) |
| { |
| ndrange_t ndrange; |
| // build ndrange information |
| ... |
| |
| // example - enqueue a kernel as a block |
    enqueue_kernel(get_default_queue(),
                   CLK_ENQUEUE_FLAGS_WAIT_KERNEL,
                   ndrange,
                   ^{my_func_A(a, b, c);});
| |
| ... |
| } |
| |
| kernel void |
| my_func_C(global int *a, global int *b, global int *c) |
| { |
| ndrange_t ndrange; |
| // build ndrange information |
| ... |
| |
| // note that a, b and c are variables in scope of |
| // the block |
| void (^my_block_A)(void) = ^{my_func_A(a, b, c);}; |
| |
| // enqueue the block variable |
| enqueue_kernel(get_default_queue(), |
| CLK_ENQUEUE_FLAGS_WAIT_KERNEL, |
| ndrange, |
| my_block_A); |
| ... |
| } |
| ---------- |
| |
| The example below shows how to declare a block literal and enqueue it. |
| |
| [source,opencl_c] |
| ---------- |
| kernel void |
| my_func(global int *a, global int *b) |
| { |
| ndrange_t ndrange; |
| // build ndrange information |
| ... |
| |
    // note that a and b are variables in scope of
    // the block
| void (^my_block_A)(void) = |
| ^{ |
| size_t id = get_global_id(0); |
| b[id] += a[id]; |
| }; |
| |
| // enqueue the block variable |
| enqueue_kernel(get_default_queue(), |
| CLK_ENQUEUE_FLAGS_WAIT_KERNEL, |
| ndrange, |
| my_block_A); |
| |
| // or we could have done the following |
| enqueue_kernel(get_default_queue(), |
| CLK_ENQUEUE_FLAGS_WAIT_KERNEL, |
| ndrange, |
| ^{ |
| size_t id = get_global_id(0); |
| b[id] += a[id]; |
|                    }); |
| } |
| ---------- |
| |
| [NOTE] |
| ==== |
| Blocks passed to enqueue_kernel cannot use global variables or stack |
| variables local to the enclosing lexical scope that are a pointer type in |
| the `local` or `private` address space. |
| ==== |
| |
| Example: |
| |
| [source,opencl_c] |
| ---------- |
| kernel void |
| foo(global int *a, local int *lptr, ...) |
| { |
| enqueue_kernel(get_default_queue(), |
| CLK_ENQUEUE_FLAGS_WAIT_KERNEL, |
| ndrange, |
| ^{ |
| size_t id = get_global_id(0); |
| local int *p = lptr; // undefined behavior |
| } ); |
| } |
| ---------- |
| |
| |
| [[arguments-that-are-a-pointer-type-to-local-address-space]] |
| ==== Arguments That are a Pointer Type to Local Address Space |
| |
| A block passed to enqueue_kernel can have arguments declared to be a pointer |
| to `local` memory. |
| The enqueue_kernel built-in function variants allow blocks to be enqueued |
| with a variable number of arguments. |
| Each argument must be declared to be a `void` pointer to local memory. |
| These enqueue_kernel built-in function variants also have a corresponding |
| number of arguments each of type `uint` that follow the block argument. |
| These arguments specify the size of each local memory pointer argument of |
| the enqueued block. |
| |
| Some examples follow: |
| |
| [source,opencl_c] |
| ---------- |
| kernel void |
| my_func_A_local_arg1(global int *a, local int *lptr, ...) |
| { |
| ... |
| } |
| |
| kernel void |
| my_func_A_local_arg2(global int *a, |
| local int *lptr1, local float4 *lptr2, ...) |
| { |
| ... |
| } |
| |
| kernel void |
| my_func_B(global int *a, ...) |
| { |
| ... |
| |
| ndrange_t ndrange = ndrange_1D(...); |
| |
| uint local_mem_size = compute_local_mem_size(); |
| |
| enqueue_kernel(get_default_queue(), |
| CLK_ENQUEUE_FLAGS_WAIT_KERNEL, |
| ndrange, |
| ^(local void *p){ |
| my_func_A_local_arg1(a, (local int *)p, ...);}, |
| local_mem_size); |
| } |
| |
| kernel void |
| my_func_C(global int *a, ...) |
| { |
| ... |
| ndrange_t ndrange = ndrange_1D(...); |
| |
| void (^my_blk_A)(local void *, local void *) = |
| ^(local void *lptr1, local void *lptr2){ |
| my_func_A_local_arg2( |
| a, |
| (local int *)lptr1, |
| (local float4 *)lptr2, ...);}; |
| |
| // calculate local memory size for lptr |
| // argument in local address space for my_blk_A |
| uint local_mem_size = compute_local_mem_size(); |
| |
| enqueue_kernel(get_default_queue(), |
| CLK_ENQUEUE_FLAGS_WAIT_KERNEL, |
| ndrange, |
| my_blk_A, |
| local_mem_size, local_mem_size*4); |
| } |
| ---------- |
| |
| |
| [[a-complete-example]] |
| ==== A Complete Example |
| |
| The example below shows how to implement an iterative algorithm where the |
| host enqueues the first instance of the nd-range kernel (dp_func_A). |
| The kernel dp_func_A will launch a kernel (evaluate_dp_work_A) that will |
| determine whether new nd-range work needs to be performed. |
| If new nd-range work does need to be performed, then evaluate_dp_work_A will |
| enqueue a new instance of dp_func_A. |
| This process is repeated until all the work is completed. |
| |
| [source,opencl_c] |
| ---------- |
| kernel void |
| dp_func_A(queue_t q, ...) |
| { |
| ... |
| |
| // queue a single instance of evaluate_dp_work_A to |
| // device queue q. queued kernel begins execution after |
| // kernel dp_func_A finishes |
| |
| if (get_global_id(0) == 0) |
| { |
| enqueue_kernel(q, |
| CLK_ENQUEUE_FLAGS_WAIT_KERNEL, |
| ndrange_1D(1), |
| ^{evaluate_dp_work_A(q, ...);}); |
| } |
| } |
| |
| kernel void |
| evaluate_dp_work_A(queue_t q,...) |
| { |
| // check if more work needs to be performed |
| bool more_work = check_new_work(...); |
| if (more_work) |
| { |
| size_t global_work_size = compute_global_size(...); |
| |
| void (^dp_func_A_blk)(void) = |
|             ^{dp_func_A(q, ...);}; |
| |
| // get local WG-size for kernel dp_func_A |
| size_t local_work_size = |
| get_kernel_work_group_size(dp_func_A_blk); |
| |
| // build nd-range descriptor |
| ndrange_t ndrange = ndrange_1D(global_work_size, |
| local_work_size); |
| |
| // enqueue dp_func_A |
| enqueue_kernel(q, |
| CLK_ENQUEUE_FLAGS_WAIT_KERNEL, |
| ndrange, |
| dp_func_A_blk); |
| } |
| ... |
| } |
| ---------- |
| |
| |
| [[determining-when-a-child-kernel-begins-execution]] |
| ==== Determining when a Child Kernel Begins Execution |
| |
| The `kernel_enqueue_flags_t` footnote:[{fn-dse-kernel_enqueue_flags_t}] argument |
| to the `enqueue_kernel` built-in functions can be used to specify when the child |
| kernel begins execution. |
| Supported values are described in the <<table-kernel-enqueue-flags, |
| following table>>: |
| |
| [[table-kernel-enqueue-flags]] |
| .Kernel Enqueue Flags |
| [cols=",",options="header",] |
| |==== |
| | `kernel_enqueue_flags_t` enum | Description |
| | `CLK_ENQUEUE_FLAGS_NO_WAIT` |
| | Indicates that the enqueued kernels do not need to wait for the parent |
| kernel to finish execution before they begin execution. |
| | `CLK_ENQUEUE_FLAGS_WAIT_KERNEL` |
| | Indicates that all work-items of the parent kernel must finish |
| executing and all immediate footnote:[{fn-dse-immediate-definition}] side |
| effects committed before the enqueued child kernel may begin execution. |
| | `CLK_ENQUEUE_FLAGS_WAIT_WORK_GROUP` |
| | Indicates that the enqueued kernels wait only for the work-group that |
| enqueued the kernels to finish before they begin execution. |
| footnote:[{fn-dse-CLK_ENQUEUE_FLAGS_WAIT_WORK_GROUP}] |
| |==== |
| |
| [NOTE] |
| ==== |
| The `kernel_enqueue_flags_t` flags are useful when a kernel enqueued from |
| the host and executing on a device enqueues kernels on the device. |
| The kernel enqueued from the host may not have an event associated with it. |
| The `kernel_enqueue_flags_t` flags allow the developer to indicate when the |
| child kernels can begin execution. |
| ==== |
| |
| |
| [[determining-when-a-parent-kernel-has-finished-execution]] |
| ==== Determining When a Parent Kernel has Finished Execution |
| |
| A parent kernel's execution status is considered to be complete when it and |
| all its child kernels have finished execution. |
| The execution status of a parent kernel will be `CL_COMPLETE` if this kernel |
| and all its child kernels finish execution successfully. |
| The execution status of the kernel will be an error code (given by a |
| negative integer value) if it or any of its child kernels encounter an |
| error, or are abnormally terminated. |
| |
| For example, assume that the host enqueues a kernel `k` for execution on a |
| device. |
| When executing on the device, kernel `k` enqueues kernels `A` and `B` to |
| one or more device queues. |
| The enqueue_kernel call that enqueues kernel `B` specifies the event |
| associated with kernel `A` in the `event_wait_list` argument, i.e. kernel |
| `B` waits for kernel `A` to finish execution before it can begin execution. |
| Assume that kernel `A` enqueues kernels `X`, `Y` and `Z`. |
| Kernel `A` is considered to have finished execution, i.e. its execution |
| status is `CL_COMPLETE`, only after `A` and the kernels `A` enqueued (and |
| any kernels these enqueued kernels enqueue and so on) have finished |
| execution. |
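The completion rule above is recursive: a kernel's status reflects its entire subtree of enqueued kernels. The host-side C sketch below models this rule for the `k`, `A`, `B`, `X`, `Y`, `Z` scenario. `struct kernel_node` and `effective_status` are hypothetical. The numeric values of `CL_COMPLETE` and `CL_RUNNING` mirror the OpenCL host headers. Reporting the first error found in a depth-first walk is an assumption of this model, because the text does not say which error code is reported.

```c
#include <stddef.h>

/* Values as defined in the OpenCL host headers (CL/cl.h). */
#define CL_COMPLETE 0x0
#define CL_RUNNING  0x1

/* Hypothetical model of a kernel and the child kernels it enqueued.
 * own_status is CL_COMPLETE, a positive "not finished" status, or a
 * negative error code. */
struct kernel_node {
    int own_status;
    const struct kernel_node *const *children;
    size_t num_children;
};

/* Returns a negative error code if the kernel or any descendant failed,
 * CL_RUNNING if any of them has not finished yet, and CL_COMPLETE only
 * when the kernel and all of its descendants have completed. */
static int effective_status(const struct kernel_node *k)
{
    if (k->own_status < 0)
        return k->own_status;

    int result = (k->own_status == CL_COMPLETE) ? CL_COMPLETE : CL_RUNNING;
    for (size_t i = 0; i < k->num_children; ++i) {
        int child = effective_status(k->children[i]);
        if (child < 0)
            return child;            /* any error fails the whole subtree */
        if (child != CL_COMPLETE)
            result = CL_RUNNING;     /* an unfinished child keeps the parent incomplete */
    }
    return result;
}
```

In this model, `A` reports `CL_RUNNING` while `Z` is still executing, even if `A` itself has returned. Consequently, `k` cannot reach `CL_COMPLETE` until `X`, `Y`, `Z`, and `B` have all completed.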
| |
| |
| [[built-in-functions-kernel-query-functions]] |
| ==== Built-in Functions - Kernel Query Functions |
| |
| // Note: the Unicode "zero width space" (​) is used in some places to |
| // cause long function names to break much more sensibly. |
| // Probably the asciidoc built-in {zwsp} should be used instead. |
| |
| [open,refpage='kernelQueryFunctions',desc='Built-in Functions - Kernel Query Functions',type='freeform',spec='clang',anchor='built-in-functions-kernel-query-functions',xrefs='enqueue_kernel',alias='get_kernel_preferred get_kernel_work_group_size'] |
| -- |
| [[table-builtin-kernel-query]] |
| .Built-in Kernel Query Functions |
| [cols=",",options="header",] |
| |==== |
| | Built-in Function | Description |
| | uint *get_kernel_work_group_size*(void (^block)(void)) + |
| uint *get_kernel_work_group_size*(void (^block)(local {localArgType} *, ...)) |
| | This provides a mechanism to query the maximum work-group size that |
| can be used to execute a block on the device executing the calling kernel. |
| |
| _block_ specifies the block to be enqueued. |
| | uint *get_kernel_preferred_​work_group_size_multiple*( |
| void (^block)(void)) + |
| uint *get_kernel_preferred_​work_group_size_multiple*( |
| void (^block)(local {localArgType} *, ...)) |
| | Returns the preferred multiple of work-group size for launch. |
| This is a performance hint. |
| Specifying a local work size for enqueue_kernel that is not a multiple |
| of the value returned by this query does not cause the block to fail to |
| enqueue, unless the specified work-group size is larger than the device |
| maximum. |
| |==== |
| -- |
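Because the preferred multiple is only a performance hint, a caller might round its work-group size down to a multiple of that hint while staying within the maximum. The host-side C sketch below shows one such policy. `pick_local_size` is hypothetical. In device code, its two inputs would come from *get_kernel_work_group_size* and *get_kernel_preferred_work_group_size_multiple*.

```c
#include <stddef.h>

/* Hypothetical policy: return the largest multiple of preferred_multiple
 * that does not exceed max_wg_size. If no such multiple exists (or the hint
 * is 0), fall back to max_wg_size, which is still a legal local size. */
static size_t pick_local_size(size_t max_wg_size, size_t preferred_multiple)
{
    if (preferred_multiple == 0 || preferred_multiple > max_wg_size)
        return max_wg_size;
    return (max_wg_size / preferred_multiple) * preferred_multiple;
}
```

Other policies are equally valid. Because a non-multiple does not cause the enqueue to fail, the fallback path only affects performance.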
| |
| |
| [[built-in-functions-queuing-other-commands]] |
| ==== Built-in Functions - Queuing Other Commands |
| |
| [open,refpage='enqueue_marker',desc='Built-in Functions - Queuing Other Commands',type='freeform',spec='clang',anchor='built-in-functions-queuing-other-commands',xrefs='enqueue_kernel'] |
| -- |
| |
| The following table describes the list of built-in functions that can be |
| used to enqueue commands such as a marker. |
| |
| [[table-builtin-other-enqueue]] |
| .Built-in Other Enqueue Functions |
| [cols=",",options="header",] |
| |==== |
| | Built-in Function | Description |
| | int *enqueue_marker*(queue_t _queue_, uint _num_events_in_wait_list_, |
| const clk_event_t *_event_wait_list_, clk_event_t *_event_ret_) |
| | Enqueue a marker command to _queue_. |
| |
| The marker command waits for a list of events specified by |
| _event_wait_list_ to complete before the marker completes. |
| |
| _event_ret_ must not be `NULL`; otherwise, this is a no-op. |
| |
| If an event is returned, *enqueue_marker* performs an implicit retain |
| on the returned event. |
| |==== |
| |
| The *enqueue_marker* built-in function returns `CLK_SUCCESS` if the marker |
| command is enqueued successfully and returns `CLK_ENQUEUE_FAILURE` |
| otherwise. |
| If the `-g` compile option is specified in compiler options passed to |
| *clCompileProgram* or *clBuildProgram*, the following errors may be returned |
| instead of `CLK_ENQUEUE_FAILURE` to indicate why *enqueue_marker* failed to |
| enqueue the marker command: |
| |
| * `CLK_INVALID_QUEUE` if _queue_ is not a valid device queue. |
| * `CLK_INVALID_EVENT_WAIT_LIST` if _event_wait_list_ is `NULL`, or if |
| _event_wait_list_ is not `NULL` and _num_events_in_wait_list_ is 0, or |
| if event objects in _event_wait_list_ are not valid events. |
| * `CLK_DEVICE_QUEUE_FULL` if _queue_ is full. |
| * `CLK_EVENT_ALLOCATION_FAILURE` if _event_ret_ is not `NULL` and an event |
| could not be allocated. |
| * `CLK_OUT_OF_RESOURCES` if there is a failure to queue the marker command |
| in _queue_ because of insufficient resources. |
| -- |
| |
| |
| [[built-in-functions-event-functions]] |
| ==== Built-in Functions - Event Functions |
| |
| [open,refpage='eventFunctions',desc='Built-in Event Functions',type='freeform',spec='clang',anchor='built-in-functions-event-functions',xrefs='',alias='capture_event_profiling_info create_user_event is_valid_event release_event retain_event set_user_event_status'] |
| -- |
| |
| The following table describes the list of built-in functions that work on |
| events. |
| |
| [[table-builtin-event]] |
| .Built-in Event Functions |
| [cols=",",options="header",] |
| |==== |
| | Built-in Function | Description |
| |
| | void *retain_event*(clk_event_t _event_) |
| | Increments the event reference count. |
| Behavior is undefined if _event_ is not a valid event. |
| | void *release_event*(clk_event_t _event_) |
| | Decrements the event reference count. |
| The event object is deleted once the event reference count is zero, |
| the specific command identified by this event has completed (or |
| terminated), and there are no commands in any device command-queue that |
| require a wait for this event to complete. |
| Behavior is undefined if _event_ is not a valid event. |
| | | |
| | clk_event_t *create_user_event*() |
| | Create a user event. |
| Returns the user event. |
| The execution status of the user event created is set to |
| `CL_SUBMITTED`. |
| | bool *is_valid_event*(clk_event_t _event_) |
| | Returns _true_ if _event_ is a valid event. |
| Otherwise returns _false_. |
| | void *set_user_event_status*(clk_event_t _event_, int _status_) |
| | Sets the execution status of a user event. |
| Behavior is undefined if _event_ is not a valid event returned by |
| *create_user_event*. |
| _status_ can be either `CL_COMPLETE` or a negative integer value |
| indicating an error. |
| | | |
| | void *capture_event_profiling_info*(clk_event_t _event_, |
| clk_profiling_info _name_, global void *_value_) |
| a| Captures the profiling information for functions that are enqueued as |
| commands. |
| These enqueued commands are identified by unique event objects. |
| The profiling information will be available in _value_ once the |
| command identified by _event_ has completed. |
| |
| Behavior is undefined if _event_ is not a valid event returned by |
| *enqueue_kernel*. |
| |
| _name_ identifies which profiling information is to be queried and can be: |
| |
| `CLK_PROFILING_COMMAND_EXEC_TIME` |
| |
| _value_ is a pointer to two 64-bit values. |
| |
| The first 64-bit value describes the elapsed time, in nanoseconds, |
| `CL_PROFILING_COMMAND_END` - `CL_PROFILING_COMMAND_START` for the command |
| identified by _event_. |
| |
| The second 64-bit value describes the elapsed time, in nanoseconds, |
| `CL_PROFILING_COMMAND_COMPLETE` - `CL_PROFILING_COMMAND_START` for the |
| command identified by _event_. |
| |
| [NOTE] |
| ==== |
| The behavior of capture_event_profiling_info when called multiple times for |
| the same _event_ is undefined. |
| ==== |
| |==== |
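The layout of the two values written for `CLK_PROFILING_COMMAND_EXEC_TIME` can be modeled in host-side C as follows. The timestamps are hypothetical inputs, and `exec_time_values` is a hypothetical helper. On the device, the result is written through the `global void *` _value_ argument of *capture_event_profiling_info* instead.

```c
#include <stdint.h>

/* Hypothetical model of the two 64-bit values (in nanoseconds) produced for
 * CLK_PROFILING_COMMAND_EXEC_TIME, given a command's profiling timestamps. */
static void exec_time_values(uint64_t start_ns, uint64_t end_ns,
                             uint64_t complete_ns, uint64_t value[2])
{
    value[0] = end_ns - start_ns;      /* END - START */
    value[1] = complete_ns - start_ns; /* COMPLETE - START */
}
```

`CL_PROFILING_COMMAND_COMPLETE` marks the point at which the command and any child commands it enqueued have completed. As a result, for a kernel that enqueues child kernels, the second value can be considerably larger than the first.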
| |
| Events can be used to identify commands enqueued to a command-queue from the |
| host. |
| These events created by the OpenCL runtime can only be used on the host, |
| i.e. as events passed in the _event_wait_list_ argument to various |
| *clEnqueue* APIs or runtime APIs that take events as arguments, such as |
| *clRetainEvent*, *clReleaseEvent*, and *clGetEventProfilingInfo*. |
| |
| Similarly, events can be used to identify commands enqueued to a device |
| queue (from a kernel). |
| These event objects cannot be passed to the host or used by OpenCL runtime |
| APIs such as the *clEnqueue* APIs or runtime APIs that take event arguments. |
| |
| *clRetainEvent* and *clReleaseEvent* will return `CL_INVALID_OPERATION` if |
| the specified _event_ refers to any command enqueued to a device queue |
| using *enqueue_kernel* or *enqueue_marker*, or is a user event created by |
| *create_user_event*. |
| |
| Similarly, *clSetUserEventStatus* can only be used to set the execution |
| status of events created using *clCreateUserEvent*. |
| User events created on the device can be set using the |
| *set_user_event_status* built-in function. |
| |
| The example below shows how events can be used with kernels enqueued to |
| multiple device queues. |
| |
| [source,opencl_c] |
| ---------- |
| extern void barA_kernel(...); |
| extern void barB_kernel(...); |
| |
| kernel void |
| foo(queue_t q0, queue_t q1, ...) |
| { |
| ... |
| clk_event_t evt0; |
| |
| // enqueue kernel to queue q0 |
| enqueue_kernel(q0, |
| CLK_ENQUEUE_FLAGS_NO_WAIT, |
| ndrange_A, |
| 0, NULL, &evt0, |
| ^{barA_kernel(...);} ); |
| |
| // enqueue kernel to queue q1 |
| enqueue_kernel(q1, |
| CLK_ENQUEUE_FLAGS_NO_WAIT, |
| ndrange_B, |
| 1, &evt0, NULL, |
| ^{barB_kernel(...);} ); |
| |
|     // release event evt0. The event object is deleted |
|     // only after barA_kernel enqueued in queue q0 has |
|     // finished execution and barB_kernel enqueued in |
|     // queue q1, which waits for evt0, has been submitted |
|     // for execution, i.e. its wait for evt0 is satisfied. |
| release_event(evt0); |
| |
| } |
| ---------- |
| |
| The example below shows how the marker command can be used with kernels |
| enqueued to a device queue. |
| |
| [source,opencl_c] |
| ---------- |
| kernel void |
| foo(queue_t q, ...) |
| { |
| ... |
| clk_event_t marker_event; |
| clk_event_t events[2]; |
| |
| enqueue_kernel(q, |
| CLK_ENQUEUE_FLAGS_NO_WAIT, |
| ndrange, |
| 0, NULL, &events[0], |
| ^{barA_kernel(...);} ); |
| |
| enqueue_kernel(q, |
| CLK_ENQUEUE_FLAGS_NO_WAIT, |
| ndrange, |
| 0, NULL, &events[1], |
| ^{barB_kernel(...);} ); |
| |
| // barA_kernel and barB_kernel can be executed |
| // out-of-order. We need to wait for both these |
| // kernels to finish execution before barC_kernel |
| // starts execution so we enqueue a marker command and |
| // then enqueue barC_kernel that waits on the event |
| // associated with the marker. |
| enqueue_marker(q, 2, events, &marker_event); |
| |
|     enqueue_kernel(q, |
|                    CLK_ENQUEUE_FLAGS_NO_WAIT, |
|                    ndrange, |
|                    1, &marker_event, NULL, |
|                    ^{barC_kernel(...);} ); |
| |
|     release_event(events[0]); |
| release_event(events[1]); |
| release_event(marker_event); |
| } |
| ---------- |
| -- |
| |
| |
| [[built-in-functions-helper-functions]] |
| ==== Built-in Functions - Helper Functions |
| |
| [open,refpage='helperFunctions',desc='Built-in Helper Functions',type='freeform',spec='clang',anchor='built-in-functions-helper-functions',xrefs='',alias='get_default_queue ndrange ndrange_1D ndrange_2D ndrange_3D'] |
| -- |
| |
| [[table-builtin-helper]] |
| .Built-in Helper Functions |
| [cols=",",options="header",] |
| |==== |
| | Built-in Function | Description |
| | queue_t *get_default_queue*(void) |
| | Returns the default device queue. |
| If a default device queue has not been created, `CLK_NULL_QUEUE` is |
| returned. |
| | | |
| | ndrange_t *ndrange_1D*(size_t _global_work_size_) + |
| ndrange_t *ndrange_1D*(size_t _global_work_size_, |
| size_t _local_work_size_) + |
| ndrange_t *ndrange_1D*(size_t _global_work_offset_, |
| size_t _global_work_size_, size_t _local_work_size_) + |
| ndrange_t *ndrange_2D*(const size_t _global_work_size_[2]) + |
| ndrange_t *ndrange_2D*(const size_t _global_work_size_[2], |
| const size_t _local_work_size_[2]) + |
| ndrange_t *ndrange_2D*(const size_t _global_work_offset_[2], |
| const size_t _global_work_size_[2], |
| const size_t _local_work_size_[2]) + |
| ndrange_t *ndrange_3D*(const size_t _global_work_size_[3]) + |
| ndrange_t *ndrange_3D*(const size_t _global_work_size_[3], |
| const size_t _local_work_size_[3]) + |
| ndrange_t *ndrange_3D*(const size_t _global_work_offset_[3], |
| const size_t _global_work_size_[3], |
| const size_t _local_work_size_[3]) |
| | Builds a 1D, 2D or 3D ND-range descriptor. |
| |==== |
| -- |
| |
| |
| [[sub-group-functions]] |
| === Sub-Group Functions |
| |
| [open,refpage='subGroupFunctions',desc='Sub-Group Functions',type='freeform',spec='clang',anchor='sub-group-functions',xrefs='',alias='sub_group_all sub_group_any sub_group_broadcast sub_group_reduce sub_group_scan_exclusive sub_group_scan_inclusive sub_group_reserve_read_pipe sub_group_reserve_write_pipe sub_group_commit_read_pipe sub_group_commit_write_pipe get_kernel_sub_group_count_for_ndrange get_kernel_max_sub_group_size_for_ndrange'] |
| -- |
| NOTE: The functionality described in this section <<unified-spec, requires>> |
| support for |
| ifdef::cl_khr_subgroups[the `<<cl_khr_subgroups>>` extension macro; or for] |
| OpenCL C 3.0 or newer and the {opencl_c_subgroups} feature. |
| |
| The <<table-collective-functions, following table>> describes OpenCL C |
| programming language built-in functions that operate on a sub-group level. |
| These built-in functions must be encountered by all work-items in the |
| sub-group executing the kernel. |
| For the functions below, the generic type name `gentype` may be one of the |
| supported built-in scalar data types `int`, `uint`, `long` |
| footnote:[{fn-int64-supported}], `ulong`, `half` |
| footnote:[{fn-half-supported}], `float`, and `double` |
| footnote:[{fn-double-supported}]. |
| |
| ifdef::cl_khr_subgroup_extended_types[] |
| NOTE: If the `<<cl_khr_subgroup_extended_types>>` extension is supported, the |
| generic type name `gentype` may additionally be `char`, `uchar`, `short`, and |
| `ushort`. |
| For the `sub_group_broadcast` function, `gentype` may additionally be one of |
| the supported built-in vector data types `char__n__`, `uchar__n__`, |
| `short__n__`, `ushort__n__`, `int__n__`, `uint__n__`, `long__n__`, |
| `ulong__n__`, `float__n__`, `half__n__` footnote:[{fn-half-supported}], or |
| `double__n__` footnote:[{fn-double-supported}]. |
| endif::cl_khr_subgroup_extended_types[] |
| |
| [[table-collective-functions]] |
| .Built-in Sub-Group Collective Functions |
| [cols=",",options="header",] |
| |==== |
| | Function | Description |
| |
| | int *sub_group_all* (int _predicate_) |
| | Evaluates _predicate_ for all work-items in the sub-group and returns a |
| non-zero value if _predicate_ evaluates to non-zero for all work-items in |
| the sub-group. |
| |
| | int *sub_group_any* (int _predicate_) |
| | Evaluates _predicate_ for all work-items in the sub-group and returns a |
| non-zero value if _predicate_ evaluates to non-zero for any work-item in |
| the sub-group. |
| |
| | gentype *sub_group_broadcast* ( + |
| gentype _x_, uint _sub_group_local_id_) |
| | Broadcasts the value of _x_ for the work-item identified by |
| _sub_group_local_id_ (value returned by *get_sub_group_local_id*) to all |
| work-items in the sub-group. |
| |
| Behavior is undefined when the value of _sub_group_local_id_ is not |
| equivalent for all work-items in the sub-group. |
| |
| Behavior is undefined when _sub_group_local_id_ is greater than or equal to the |
| sub-group size. |
| |
| | gentype *sub_group_reduce_<op>* ( + |
| gentype _x_) |
| | Returns the result of the reduction operation specified by *<op>* for all values of |
| _x_ specified by work-items in a sub-group. |
| |
| | gentype *sub_group_scan_exclusive_<op>* ( + |
| gentype _x_) |
| | Performs an exclusive scan operation specified by *<op>* of all values specified |
| by work-items in a sub-group. |
| The scan results are returned for each work-item. |
| |
| The scan order is defined by increasing sub-group local ID within the |
| sub-group. |
| |
| | gentype *sub_group_scan_inclusive_<op>* ( + |
| gentype _x_) |
| | Performs an inclusive scan operation specified by *<op>* of all values specified |
| by work-items in a sub-group. |
| The scan results are returned for each work-item. |
| |
| The scan order is defined by increasing sub-group local ID within the |
| sub-group. |
| |
| |==== |
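To make the collective semantics concrete, the following host-side C sketch models a single sub-group as an array indexed by sub-group local ID. The `model_*` functions are hypothetical stand-ins, not OpenCL built-ins. They return exactly 1 for true, whereas the built-ins only guarantee a non-zero value. The sketch also does not capture the requirement that all work-items in the sub-group encounter the call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical models of sub-group collectives. Element i of each array is
 * the argument supplied by the work-item with sub-group local ID i, and n is
 * the sub-group size. */
static int model_sub_group_all(const int *predicate, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (predicate[i] == 0)
            return 0;
    return 1;
}

static int model_sub_group_any(const int *predicate, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (predicate[i] != 0)
            return 1;
    return 0;
}

/* Every work-item receives x from the work-item whose local ID is id.
 * id >= n is undefined behavior for the built-in; here it trips an assert. */
static int model_sub_group_broadcast(const int *x, size_t n, size_t id)
{
    assert(id < n);
    return x[id];
}
```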
| |
| The *<op>* in *sub_group_reduce_<op>*, *sub_group_scan_inclusive_<op>* and *sub_group_scan_exclusive_<op>* defines the operator and can be *add*, *min* or *max*. |
| |
| The exclusive scan operation takes a binary operator *op* with an identity I and _n_ (where _n_ is the size of the sub-group) elements [a~0~, a~1~, ... a~n-1~] and returns [I, a~0~, (a~0~ *op* a~1~), ... (a~0~ *op* a~1~ *op* ... *op* a~n-2~)]. |
| |
| The inclusive scan operation takes a binary operator *op* with an identity I and _n_ (where _n_ is the size of the sub-group) elements [a~0~, a~1~, ... a~n-1~] and returns [a~0~, (a~0~ *op* a~1~), ... (a~0~ *op* a~1~ *op* ... *op* a~n-1~)]. |
| |
| If *op* = *add*, the identity I is 0. |
| If *op* = *min*, the identity I is `INT_MAX`, `UINT_MAX`, `LONG_MAX`, and `ULONG_MAX` for the `int`, `uint`, `long`, and `ulong` types, respectively, and is `+INF` for |
| floating-point types. |
| Similarly, if *op* = *max*, the identity I is `INT_MIN`, 0, `LONG_MIN`, 0, and `-INF`, respectively. |
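The scan and reduction definitions above, including the choice of identity, can be checked against a small host-side C model for `int` values. `enum scan_op` and the `model_*` and `op_*` functions are hypothetical helpers, not OpenCL built-ins. Each array element stands for the value of _x_ from the work-item with that sub-group local ID, in increasing ID order.

```c
#include <limits.h>
#include <stddef.h>

enum scan_op { OP_ADD, OP_MIN, OP_MAX };

/* Identity I for each operator on int, as listed above. */
static int op_identity(enum scan_op op)
{
    switch (op) {
    case OP_MIN: return INT_MAX;
    case OP_MAX: return INT_MIN;
    default:     return 0;          /* OP_ADD */
    }
}

static int op_apply(enum scan_op op, int a, int b)
{
    switch (op) {
    case OP_MIN: return a < b ? a : b;
    case OP_MAX: return a > b ? a : b;
    default:     return a + b;      /* OP_ADD */
    }
}

/* out[i] = I op a[0] op ... op a[i-1]; out[0] is the identity. */
static void model_scan_exclusive(enum scan_op op, const int *a, int *out,
                                 size_t n)
{
    int acc = op_identity(op);
    for (size_t i = 0; i < n; ++i) {
        out[i] = acc;
        acc = op_apply(op, acc, a[i]);
    }
}

/* out[i] = a[0] op ... op a[i]. */
static void model_scan_inclusive(enum scan_op op, const int *a, int *out,
                                 size_t n)
{
    int acc = op_identity(op);
    for (size_t i = 0; i < n; ++i) {
        acc = op_apply(op, acc, a[i]);
        out[i] = acc;
    }
}

/* The reduction equals the last element of the inclusive scan. */
static int model_reduce(enum scan_op op, const int *a, size_t n)
{
    int acc = op_identity(op);
    for (size_t i = 0; i < n; ++i)
        acc = op_apply(op, acc, a[i]);
    return acc;
}
```

For inputs `{3, 1, 4, 1}` with *add*, the exclusive scan yields `{0, 3, 4, 8}`, the inclusive scan yields `{3, 4, 8, 9}`, and the reduction yields 9. This sequential model fixes the evaluation order, which the built-ins do not guarantee for floating-point types.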
| |
| [NOTE] |
| ==== |
| The order of floating-point operations is not guaranteed for the *sub_group_reduce_<op>*, *sub_group_scan_inclusive_<op>* and *sub_group_scan_exclusive_<op>* built-in functions that operate on `half`, `float` and `double` data types. |
| The order of these floating-point operations is also non-deterministic for a given sub-group. |
| ==== |
| |
| NOTE: The functionality described in the following table <<unified-spec, |
| requires>> support |
| ifdef::cl_khr_subgroups[the `<<cl_khr_subgroups>>` extension macro; or for] |
| OpenCL C 3.0 or newer and the {opencl_c_subgroups} and {opencl_c_pipes} |
| features. |
| |
| The <<table-pipe-functions, following table>> describes built-in pipe |
| functions that operate at a sub-group level. |
| These built-in functions must be encountered by all work-items in a sub-group |
| executing the kernel with the same argument values, otherwise the behavior |
| is undefined. |
| The generic type name `gentype` indicates that the built-in OpenCL C |
| scalar or vector integer or floating-point data types, or any user-defined |
| type built from these scalar and vector data types, can be used as the type |
| for the arguments to the pipe functions listed in _table 6.29_. |
| |
| [[table-pipe-functions]] |
| .Built-in Sub-Group Pipe Functions |
| [cols=",",options="header",] |
| |==== |
| | Function | Description |
| |
| | reserve_id_t *sub_group_reserve_read_pipe* ( + |
| read_only pipe gentype _pipe_, + |
| uint _num_packets_) |
| |
| reserve_id_t *sub_group_reserve_write_pipe* ( + |
| write_only pipe gentype _pipe_, + |
| uint _num_packets_) |
| | Reserve _num_packets_ entries for reading from or writing to _pipe_. |
| Returns a valid non-zero reservation ID if the reservation is successful |
| and 0 otherwise. |
| |
| The reserved pipe entries are referred to by indices that go from 0 ... |
| _num_packets_ - 1. |
| |
| | void *sub_group_commit_read_pipe* ( + |
| read_only pipe gentype _pipe_, + |
| reserve_id_t _reserve_id_) |
| |
| void *sub_group_commit_write_pipe* ( + |
| write_only pipe gentype _pipe_, + |
| reserve_id_t _reserve_id_) |
| | Indicates that all reads and writes to _num_packets_ associated with |
| reservation _reserve_id_ are completed. |
| |
| |==== |
| |
| NOTE: Reservations made by a sub-group are ordered in the pipe as they are |
| ordered in the program. |
| Reservations made by different sub-groups that belong to the same work-group |
| can be ordered using sub-group synchronization. |
| The order of sub-group based reservations that belong to different |
| work-groups is implementation-defined. |
| |
| NOTE: The functionality described in the following table <<unified-spec, |
| requires>> support |
| ifdef::cl_khr_subgroups[the `<<cl_khr_subgroups>>` extension macro; or for] |
| OpenCL C 3.0 or newer and the {opencl_c_subgroups} and |
| {opencl_c_device_enqueue} features. |
| |
| The <<table-kernel-query-functions, following table>> describes built-in |
| functions to query sub-group information for a block to be enqueued. |
| |
| [[table-kernel-query-functions]] |
| .Built-in Sub-Group Kernel Query Functions |
| [cols="5,4",options="header",] |
| |==== |
| | Built-in Function | Description |
| |
| | uint *get_kernel_sub_group_count_for_ndrange* ( + |
| const ndrange_t _ndrange_, + |
| void (^block)(void)) |
| |
| uint *get_kernel_sub_group_count_for_ndrange* ( + |
| const ndrange_t _ndrange_, + |
| void (^block)(local void *, ...)) |
| | Returns the number of sub-groups in each work-group of the dispatch (except |
| for the last in cases where the global size does not divide cleanly into |
| work-groups) given the combination of the passed ndrange and block. |
| |
| _block_ specifies the block to be enqueued. |
| |
| | uint *get_kernel_max_sub_group_size_for_ndrange* ( + |
| const ndrange_t _ndrange_, + |
| void (^block)(void)) |
| |
| uint *get_kernel_max_sub_group_size_for_ndrange* ( + |
| const ndrange_t _ndrange_, + |
| void (^block)(local void *, ...)) |
| | Returns the maximum sub-group size for a block. |
| |
| |==== |
| -- |
| |
| |
| ifdef::cl_khr_subgroup_ballot[] |
| [[sub-group-ballot-functions]] |
| ==== Built-in Sub-Group Ballot Functions |
| |
| NOTE: The functionality described in this section <<unified-spec, requires>> |
| support for the `<<cl_khr_subgroup_ballot>>` extension. |
| |
| The <<table-ballot-functions, following table>> describes OpenCL C |
| programming language built-in functions to allow work items in a sub-group |
| to collect and operate on ballots from work items in the sub-group. |
| These functions need not be encountered by all work items in a sub-group |
| executing the kernel. |
| |
| For the `sub_group_non_uniform_broadcast` and `sub_group_broadcast_first` |
| functions, the generic type name `gentype` may be one of the supported |
| built-in scalar data types `char`, `uchar`, `short`, `ushort`, `int`, |
| `uint`, `long`, `ulong`, `float`, `half` footnote:[{fn-half-supported}], and |
| `double` footnote:[{fn-double-supported}]. |
| |
| For the `sub_group_non_uniform_broadcast` function, the generic type name |
| `gentype` may additionally be one of the supported built-in vector data |
| types `char__n__`, `uchar__n__`, `short__n__`, `ushort__n__`, `int__n__`, |
| `uint__n__`, `long__n__`, `ulong__n__`, `float__n__`, `half__n__` |
| footnote:[{fn-half-supported}], or `double__n__` |
| footnote:[{fn-double-supported}]. |
| |
| [[table-ballot-functions]] |
| .Built-in Sub-Group Ballot Functions |
| [cols="1a,1",options="header",] |
| |==== |
| | Function | Description |
| |[source,opencl_c] |
| ---- |
| gentype sub_group_non_uniform_broadcast( |
| gentype value, |
| uint index ) |
| ---- |
| | Returns _value_ for the work item with sub-group local ID equal to |
| _index_. |
| |
| Behavior is undefined when the value of _index_ is not equivalent for |
| all active work items in the sub-group. |
| |
| The return value is undefined if the work item with sub-group local ID |
| equal to _index_ is inactive or if _index_ is greater than or equal to |
| the size of the sub-group. |
| |[source,opencl_c] |
| ---- |
| gentype sub_group_broadcast_first( |
| gentype value ) |
| ---- |
| | Returns _value_ for the work item with the smallest sub-group local ID |
| among active work items in the sub-group. |
| |[source,opencl_c] |
| ---- |
| uint4 sub_group_ballot( |
| int predicate ) |
| ---- |
| | Returns a bitfield combining the _predicate_ values from all work items |
| in the sub-group. |
| Bit zero of the first vector component represents the sub-group local ID |
| zero, with higher-order bits and subsequent vector components |
| representing, in order, increasing sub-group local IDs. |
| The representative bit in the bitfield is set if the work item is active |
| and the _predicate_ is non-zero, and is unset otherwise. |
| |[source,opencl_c] |
| ---- |
| int sub_group_inverse_ballot( |
| uint4 value ) |
| ---- |
| | Returns the predicate value for this work item in the sub-group from the |
| bitfield _value_ representing predicate values from all work items in |
| the sub-group. |
| The predicate return value will be non-zero if the bit in the bitfield |
| _value_ for this work item is set, and zero otherwise. |
| |
| Behavior is undefined when _value_ is not equivalent for all active work |
| items in the sub-group. |
| |
| This is a specialized function that may perform better than the |
| equivalent `sub_group_ballot_bit_extract` on some implementations. |
| |[source,opencl_c] |
| ---- |
| int sub_group_ballot_bit_extract( |
| uint4 value, |
| uint index ) |
| ---- |
| | Returns the predicate value for the work item with sub-group local ID |
| equal to _index_ from the bitfield _value_ representing predicate values |
| from all work items in the sub-group. |
| The predicate return value will be non-zero if the bit in the bitfield |
| _value_ for the work item with sub-group local ID equal to _index_ is |
| set, and zero otherwise. |
| |
| The predicate return value is undefined if _index_ is greater than or |
| equal to the size of the sub-group. |
| |[source,opencl_c] |
| ---- |
| uint sub_group_ballot_bit_count( |
| uint4 value ) |
| ---- |
| | Returns the number of bits that are set in the bitfield _value_, only |
| considering the bits in _value_ that represent predicate values |
| corresponding to sub-group local IDs less than the maximum sub-group |
| size within the dispatch (as returned by `get_max_sub_group_size`). |
| |[source,opencl_c] |
| ---- |
| uint sub_group_ballot_inclusive_scan( |
| uint4 value ) |
| ---- |
| | Returns the number of bits that are set in the bitfield _value_, only |
| considering the bits in _value_ representing work items with a sub-group |
| local ID less than or equal to this work item's sub-group local ID. |
| |[source,opencl_c] |
| ---- |
| uint sub_group_ballot_exclusive_scan( |
| uint4 value ) |
| ---- |
| | Returns the number of bits that are set in the bitfield _value_, only |
| considering the bits in _value_ representing work items with a sub-group |
| local ID less than this work item's sub-group local ID. |
| |[source,opencl_c] |
| ---- |
| uint sub_group_ballot_find_lsb( |
| uint4 value ) |
| ---- |
| | Returns the smallest sub-group local ID with a bit set in the bitfield |
| _value_, only considering the bits in _value_ that represent predicate |
| values corresponding to sub-group local IDs less than the maximum |
| sub-group size within the dispatch (as returned by |
| `get_max_sub_group_size`). |
| If none of the bits representing predicate values for work items in the |
| sub-group are set in the bitfield _value_, the return value is |
| undefined. |
| |[source,opencl_c] |
| ---- |
| uint sub_group_ballot_find_msb( |
| uint4 value ) |
| ---- |
| | Returns the largest sub-group local ID with a bit set in the bitfield |
| _value_, only considering the bits in _value_ that represent predicate |
| values corresponding to sub-group local IDs less than the maximum |
| sub-group size within the dispatch (as returned by |
| `get_max_sub_group_size`). |
| If none of the bits representing predicate values for work items in the |
| sub-group are set in the bitfield _value_, the return value is |
| undefined. |
| |[source,opencl_c] |
| ---- |
| uint4 get_sub_group_eq_mask() |
| ---- |
| | Generates a bitmask where the bit is set in the bitmask if the bit index |
| equals the sub-group local ID and unset otherwise. |
| Bit zero of the first vector component represents the sub-group local ID |
| zero, with higher-order bits and subsequent vector components |
| representing, in order, increasing sub-group local IDs. |
| |[source,opencl_c] |
| ---- |
| uint4 get_sub_group_ge_mask() |
| ---- |
| | Generates a bitmask where the bit is set in the bitmask if the bit index |
| is greater than or equal to the sub-group local ID and less than the |
| maximum sub-group size, and unset otherwise. |
| Bit zero of the first vector component represents the sub-group local ID |
| zero, with higher-order bits and subsequent vector components |
| representing, in order, increasing sub-group local IDs. |
| |[source,opencl_c] |
| ---- |
| uint4 get_sub_group_gt_mask() |
| ---- |
| | Generates a bitmask where the bit is set in the bitmask if the bit index |
| is greater than the sub-group local ID and less than the maximum |
| sub-group size, and unset otherwise. |
| Bit zero of the first vector component represents the sub-group local ID |
| zero, with higher-order bits and subsequent vector components |
| representing, in order, increasing sub-group local IDs. |
| |[source,opencl_c] |
| ---- |
| uint4 get_sub_group_le_mask() |
| ---- |
| | Generates a bitmask where the bit is set in the bitmask if the bit index |
| is less than or equal to the sub-group local ID and unset otherwise. |
| Bit zero of the first vector component represents the sub-group local ID |
| zero, with higher-order bits and subsequent vector components |
| representing, in order, increasing sub-group local IDs. |
| |[source,opencl_c] |
| ---- |
| uint4 get_sub_group_lt_mask() |
| ---- |
| | Generates a bitmask where the bit is set in the bitmask if the bit index |
| is less than the sub-group local ID and unset otherwise. |
| Bit zero of the first vector component represents the sub-group local ID |
| zero, with higher-order bits and subsequent vector components |
| representing, in order, increasing sub-group local IDs. |
| |==== |
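
The bitfield functions in the table above can be cross-checked against a small sequential model. The following sketch is plain host-side C rather than OpenCL C: it assumes a sub-group of at most 128 work items described by hypothetical `predicate[]` and `active[]` arrays, and it stands in for the `uint4` bitfield with a hypothetical `ballot4` struct. All `model_*` names are illustrative only and are not OpenCL built-ins.

[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the OpenCL C uint4 bitfield: s[0] holds
   sub-group local IDs 0..31, s[1] holds IDs 32..63, and so on. */
typedef struct { uint32_t s[4]; } ballot4;

/* Models sub_group_ballot: bit i is set only if work item i is active
   and its predicate is non-zero. */
static ballot4 model_ballot(const int *predicate, const bool *active,
                            uint32_t sg_size)
{
    ballot4 b = { { 0u, 0u, 0u, 0u } };
    assert(sg_size <= 128u);
    for (uint32_t i = 0; i < sg_size; ++i)
        if (active[i] && predicate[i] != 0)
            b.s[i / 32u] |= 1u << (i % 32u);
    return b;
}

/* Models sub_group_ballot_bit_extract for an index below 128. */
static int model_bit_extract(ballot4 b, uint32_t index)
{
    return (int)((b.s[index / 32u] >> (index % 32u)) & 1u);
}

/* Counts the set bits for sub-group local IDs in [0, limit). */
static uint32_t count_bits_below(ballot4 b, uint32_t limit)
{
    uint32_t n = 0;
    for (uint32_t i = 0; i < limit && i < 128u; ++i)
        n += (uint32_t)model_bit_extract(b, i);
    return n;
}

/* sub_group_ballot_bit_count: only IDs below the maximum sub-group size count. */
static uint32_t model_bit_count(ballot4 b, uint32_t max_sg_size)
{
    return count_bits_below(b, max_sg_size);
}

/* sub_group_ballot_inclusive_scan for the work item with local ID lid. */
static uint32_t model_inclusive_scan(ballot4 b, uint32_t lid)
{
    return count_bits_below(b, lid + 1u);
}

/* sub_group_ballot_exclusive_scan for the work item with local ID lid. */
static uint32_t model_exclusive_scan(ballot4 b, uint32_t lid)
{
    return count_bits_below(b, lid);
}

/* sub_group_ballot_find_lsb / _find_msb. Where the built-ins are undefined
   (no relevant bit set), this model returns -1. */
static int model_find_lsb(ballot4 b, uint32_t max_sg_size)
{
    for (uint32_t i = 0; i < max_sg_size && i < 128u; ++i)
        if (model_bit_extract(b, i))
            return (int)i;
    return -1;
}

static int model_find_msb(ballot4 b, uint32_t max_sg_size)
{
    uint32_t limit = max_sg_size < 128u ? max_sg_size : 128u;
    for (uint32_t i = limit; i-- > 0u; )
        if (model_bit_extract(b, i))
            return (int)i;
    return -1;
}
----

In this model, predicates `{0, 1, 1, 0, 1, 0, 0, 1}` with work item 2 inactive produce a ballot with bits 1, 4 and 7 set, so the exclusive scan seen by work item 4 is 1. Using the exclusive scan of a ballot as a per-work-item output offset is a common pattern for stream compaction.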
| |
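The five mask functions differ only in how each bit index is compared with this work item's sub-group local ID, and the *ge* and *gt* masks are additionally limited to bit indices below the maximum sub-group size. The host-side C sketch below spells out those comparisons; `mask4`, `mask_kind` and `model_sub_group_mask` are hypothetical names used only for illustration.

[source,c]
----
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for uint4: bit i of the 128-bit mask is
   bit (i % 32) of s[i / 32]. */
typedef struct { uint32_t s[4]; } mask4;

typedef enum { MASK_EQ, MASK_GE, MASK_GT, MASK_LE, MASK_LT } mask_kind;

/* Models get_sub_group_{eq,ge,gt,le,lt}_mask() for the work item with
   sub-group local ID lid. */
static mask4 model_sub_group_mask(mask_kind kind, uint32_t lid,
                                  uint32_t max_sg_size)
{
    mask4 m = { { 0u, 0u, 0u, 0u } };
    assert(max_sg_size <= 128u && lid < max_sg_size);
    for (uint32_t i = 0; i < 128u; ++i) {
        int set;
        switch (kind) {
        case MASK_EQ: set = (i == lid); break;
        case MASK_GE: set = (i >= lid && i < max_sg_size); break;
        case MASK_GT: set = (i > lid && i < max_sg_size); break;
        case MASK_LE: set = (i <= lid); break;
        default:      set = (i < lid); break; /* MASK_LT */
        }
        if (set)
            m.s[i / 32u] |= 1u << (i % 32u);
    }
    return m;
}
----

With this model, the *lt* mask for sub-group local ID 5 is `0x1F` in its first component. In OpenCL C, combining a ballot with `get_sub_group_lt_mask()` using `&` and passing the result to `sub_group_ballot_bit_count` should therefore count the same bits as `sub_group_ballot_exclusive_scan` on the original ballot.
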
| endif::cl_khr_subgroup_ballot[] |
| |
| |
| ifdef::cl_khr_subgroup_clustered_reduce[] |
| [[sub-group-clustered-reduction-functions]] |
| ==== Built-in Clustered Reduction Functions for Sub-Groups |
| |
| NOTE: The functionality described in this section <<unified-spec, requires>> |
| support for the `<<cl_khr_subgroup_clustered_reduce>>` extension. |
| |
| This section describes arithmetic operations that are performed on a subset |
| of work items in a sub-group, referred to as a cluster. |
| A cluster is described by a specified cluster size. |
| Work items in a sub-group are assigned to clusters such that for cluster |
| size _n_, the _n_ work items in the sub-group with the smallest sub-group |
| local IDs are assigned to the first cluster, the _n_ work items with the |
| next smallest sub-group local IDs are assigned to the second cluster, and |
| so on. |
| Behavior is undefined if the specified cluster size is not an integer |
| constant expression, is not a power-of-two, or is greater than the maximum |
| size of a sub-group within the dispatch. |
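
Because the cluster size is a power of two, the cluster a work item belongs to and the first sub-group local ID of that cluster reduce to simple integer arithmetic. The following plain C sketch, with hypothetical helper names, computes both under the assignment rule described above. The requirement that the cluster size be an integer constant expression is a property of the OpenCL C source and is not modeled here.

[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if x is a non-zero power of two. */
static bool is_power_of_two(uint32_t x)
{
    return x != 0u && (x & (x - 1u)) == 0u;
}

/* Index of the cluster containing sub-group local ID lid. */
static uint32_t cluster_index(uint32_t lid, uint32_t clustersize,
                              uint32_t max_sg_size)
{
    /* Mirrors the preconditions above; violating them is undefined for the
       built-ins, so this model rejects such inputs outright. */
    assert(is_power_of_two(clustersize) && clustersize <= max_sg_size);
    return lid / clustersize;
}

/* Sub-group local ID of the first work item in lid's cluster. For a
   power-of-two cluster size this equals (lid / clustersize) * clustersize. */
static uint32_t cluster_first_id(uint32_t lid, uint32_t clustersize)
{
    assert(is_power_of_two(clustersize));
    return lid & ~(clustersize - 1u);
}
----

For a cluster size of 4, sub-group local IDs 4 through 7 form the second cluster (index 1), whose first ID is 4.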
| |
| |
| ===== Arithmetic Operations |
| |
| The table below describes the OpenCL C programming language built-in |
| functions that perform simple arithmetic operations on a cluster of work |
| items in a sub-group. |
| These functions need not be encountered by all work items in a sub-group |
| executing the kernel. |
| For the functions below, the generic type name `gentype` may be one of the |
| supported built-in scalar data types `char`, `uchar`, `short`, `ushort`, |
| `int`, `uint`, `long`, `ulong`, `float`, `half` |
| footnote:[{fn-half-supported}], and `double` |
| footnote:[{fn-double-supported}]. |
| |
| [[table-clustered-reduce-math-functions]] |
| .Built-in Clustered Arithmetic Functions for Sub-Groups |
| [cols="1a,1",options="header",] |
| |==== |
| | Function | Description |
| |[source,opencl_c] |
| ---- |
| gentype sub_group_clustered_reduce_add( |
| gentype value, uint clustersize ) |
| gentype sub_group_clustered_reduce_mul( |
| gentype value, uint clustersize ) |
| gentype sub_group_clustered_reduce_min( |
| gentype value, uint clustersize ) |
| gentype sub_group_clustered_reduce_max( |
| gentype value, uint clustersize ) |
| ---- |
| | Returns the summation, multiplication, minimum, or maximum of _value_ |
| for all active work items in the sub-group within a cluster of the |
| specified _clustersize_. |
| |==== |
| |
| Note: The order of floating-point operations is not guaranteed for the |
| sub-group clustered reduction built-in functions that operate on |
| floating-point types, and the order of operations may additionally be |
| non-deterministic for a given sub-group. |
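
As a reference for the arithmetic table, the following C sketch computes what `sub_group_clustered_reduce_add` returns to a single work item: the sum of _value_ over the active work items in its cluster. In this sketch, inactive work items keep their position in the cluster but contribute nothing. The function name and the `value[]`/`active[]` arrays are hypothetical, and the loop fixes one summation order, which implementations are not required to follow for floating-point types.

[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models sub_group_clustered_reduce_add(value, clustersize) as seen by the
   work item with sub-group local ID lid. */
static int32_t model_clustered_reduce_add(const int32_t *value,
                                          const bool *active,
                                          uint32_t sg_size, uint32_t lid,
                                          uint32_t clustersize)
{
    assert(clustersize != 0u && (clustersize & (clustersize - 1u)) == 0u);
    uint32_t first = lid & ~(clustersize - 1u);  /* first ID of lid's cluster */
    int32_t sum = 0;                             /* identity for add */
    for (uint32_t i = first; i < first + clustersize && i < sg_size; ++i)
        if (active[i])
            sum += value[i];
    return sum;
}
----

With values 1 through 8 and a cluster size of 4, work items 0 to 3 each receive 10 and work items 4 to 7 each receive 26.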
| |
| |
| ===== Bitwise Operations |
| |
| The table below describes the OpenCL C programming language built-in |
| functions to perform simple bitwise integer operations across a cluster of |
| work items in a sub-group. |
| These functions need not be encountered by all work items in a sub-group |
| executing the kernel. |
| For the functions below, the generic type name `gentype` may be one of |
| the supported built-in scalar data types `char`, `uchar`, `short`, `ushort`, |
| `int`, `uint`, `long`, or `ulong`. |
| |
| [[table-clustered-reduce-bitwise-functions]] |
| .Built-in Clustered Bitwise Functions for Sub-Groups |
| [cols="1a,1",options="header",] |
| |==== |
| | Function | Description |
| |[source,opencl_c] |
| ---- |
| gentype sub_group_clustered_reduce_and( |
| gentype value, uint clustersize ) |
| gentype sub_group_clustered_reduce_or( |
| gentype value, uint clustersize ) |
| gentype sub_group_clustered_reduce_xor( |
| gentype value, uint clustersize ) |
| ---- |
| | Returns the bitwise *and*, *or*, or *xor* of _value_ for all active work |
| items in the sub-group within a cluster of the specified _clustersize_. |
| |==== |
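
The bitwise variants use the same clusters; only the combining operator and its starting value change. In the hypothetical C model below, the accumulator starts from the identity of each operator (all bits set for *and*, zero for *or* and *xor*), so inactive work items have no effect on the result. The `bit_op` enumeration and the function name are illustrative only.

[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { BIT_AND, BIT_OR, BIT_XOR } bit_op;  /* hypothetical */

/* Models sub_group_clustered_reduce_{and,or,xor} for the work item with
   sub-group local ID lid. */
static uint32_t model_clustered_reduce_bitwise(bit_op op,
                                               const uint32_t *value,
                                               const bool *active,
                                               uint32_t sg_size, uint32_t lid,
                                               uint32_t clustersize)
{
    assert(clustersize != 0u && (clustersize & (clustersize - 1u)) == 0u);
    uint32_t first = lid & ~(clustersize - 1u);
    uint32_t acc = (op == BIT_AND) ? ~0u : 0u;   /* operator identity */
    for (uint32_t i = first; i < first + clustersize && i < sg_size; ++i) {
        if (!active[i])
            continue;
        switch (op) {
        case BIT_AND: acc &= value[i]; break;
        case BIT_OR:  acc |= value[i]; break;
        case BIT_XOR: acc ^= value[i]; break;
        }
    }
    return acc;
}
----

For example, with a cluster size of 2, the values `0xF0` and `0x3C` in the first cluster reduce to `0x30` for *and*, `0xFC` for *or*, and `0xCC` for *xor*.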
| |
| |
| ===== Logical Operations |
| |
| The table below describes the OpenCL C programming language built-in |
| functions to perform simple logical operations across a cluster of work |
| items in a sub-group. |
| These functions need not be encountered by all work items in a sub-group |
| executing the kernel. |
| For these functions, a non-zero _predicate_ argument or return value is |
| logically `true` and a zero _predicate_ argument or return value is |
| logically `false`. |
| |
| [[table-clustered-reduce-logical-functions]] |
| .Built-in Clustered Logical Functions for Sub-Groups |
| [cols="3a,2",options="header",] |
| |==== |
| | Function | Description |
| |[source,opencl_c] |
| ---- |
| int sub_group_clustered_reduce_logical_and( |
| int predicate, uint clustersize ) |
| int sub_group_clustered_reduce_logical_or( |
| int predicate, uint clustersize ) |
| int sub_group_clustered_reduce_logical_xor( |
| int predicate, uint clustersize ) |
| ---- |
| | Returns the logical *and*, *or*, or *xor* of _predicate_ for all active |
| work items in the sub-group within a cluster of the specified |
| _clustersize_. |
| |==== |
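
The logical variants first reduce each _predicate_ to `true` or `false`, so they can differ from the bitwise variants applied to the same integers: for predicates `1` and `2`, the logical *and* is `true`, while the bitwise *and* is `0`. The hypothetical C model below makes that normalization explicit and returns `1` for `true`, which is one valid non-zero value. A logical *xor* over a cluster is `true` exactly when an odd number of its active work items have a non-zero predicate.

[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { LOGICAL_AND, LOGICAL_OR, LOGICAL_XOR } logical_op;  /* hypothetical */

/* Models sub_group_clustered_reduce_logical_{and,or,xor} for the work item
   with sub-group local ID lid. Returns 1 for true and 0 for false. */
static int model_clustered_reduce_logical(logical_op op, const int *predicate,
                                          const bool *active, uint32_t sg_size,
                                          uint32_t lid, uint32_t clustersize)
{
    assert(clustersize != 0u && (clustersize & (clustersize - 1u)) == 0u);
    uint32_t first = lid & ~(clustersize - 1u);
    int acc = (op == LOGICAL_AND) ? 1 : 0;       /* operator identity */
    for (uint32_t i = first; i < first + clustersize && i < sg_size; ++i) {
        if (!active[i])
            continue;
        int p = predicate[i] != 0;               /* normalize to 0 or 1 */
        switch (op) {
        case LOGICAL_AND: acc = acc && p; break;
        case LOGICAL_OR:  acc = acc || p; break;
        case LOGICAL_XOR: acc ^= p;       break;
        }
    }
    return acc;
}
----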
| |
| endif::cl_khr_subgroup_clustered_reduce[] |
| |
| |
| ifdef::cl_khr_subgroup_non_uniform_arithmetic[] |
| ==== Built-in Non-Uniform Scan and Reduction Functions for Sub-Groups |
| |
| NOTE: The functionality described in this section <<unified-spec, requires>> |
| support for the `<<cl_khr_subgroup_non_uniform_arithmetic>>` extension. |
| |
| ===== Arithmetic Operations |
| |
| The <<table-non-uniform-math-functions, following table>> describes the |
| OpenCL C programming language built-in functions that perform simple |
| arithmetic operations across work items in a sub-group. |
| These functions need not be encountered by all work items in a sub-group |
| executing the kernel. |
| For the functions below, the generic type name `gentype` may be one of the |
| supported built-in scalar data types `char`, `uchar`, `short`, `ushort`, |
| `int`, `uint`, `long`, `ulong`, `float`, `half` |
| footnote:[{fn-half-supported}], and `double` |
| footnote:[{fn-double-supported}]. |
| |
| [[table-non-uniform-math-functions]] |
| .Built-in Non-Uniform Arithmetic Functions for Sub-Groups |
| [cols="3a,2",options="header",] |
| |==== |
| | Function | Description |
| |[source,opencl_c] |
| ---- |
| gentype sub_group_non_uniform_reduce_add( |
| gentype value ) |
| gentype sub_group_non_uniform_reduce_min( |
| gentype value ) |
| gentype sub_group_non_uniform_reduce_max( |
| gentype value ) |
| gentype sub_group_non_uniform_reduce_mul( |
| gentype value ) |
| ---- |
| | Returns the summation, multiplication, minimum, or maximum of _value_ |
| for all active work items in the sub-group. |
| |
| Note: This behavior is the same as the *add*, *min*, and *max* reduction |
| built-in functions from `<<cl_khr_subgroups>>` and OpenCL 2.1, except |
| these functions support additional types and need not be encountered by |
| all work items in the sub-group executing the kernel. |
| |[source,opencl_c] |
| ---- |
| gentype sub_group_non_uniform_scan_inclusive_add( |
| gentype value ) |
| gentype sub_group_non_uniform_scan_inclusive_min( |
| gentype value ) |
| gentype sub_group_non_uniform_scan_inclusive_max( |
| gentype value ) |
| gentype sub_group_non_uniform_scan_inclusive_mul( |
| gentype value ) |
| ---- |
| | Returns the result of an inclusive scan operation, which is the |
| summation, multiplication, minimum, or maximum of _value_ for all active |
| work items in the sub-group with a sub-group local ID less than or equal |
| to this work item's sub-group local ID. |
| |
| Note: This behavior is the same as the *add*, *min*, and *max* inclusive |
| scan built-in functions from `<<cl_khr_subgroups>>` and OpenCL 2.1, |
| except these functions support additional types and need not be |
| encountered by all work items in the sub-group executing the kernel. |
| |[source,opencl_c] |
| ---- |
| gentype sub_group_non_uniform_scan_exclusive_add( |
| gentype value ) |
| gentype sub_group_non_uniform_scan_exclusive_min( |
| gentype value ) |
| gentype sub_group_non_uniform_scan_exclusive_max( |
| gentype value ) |
| gentype sub_group_non_uniform_scan_exclusive_mul( |
| gentype value ) |
| ---- |
| | Returns the result of an exclusive scan operation, which is the |
| summation, multiplication, minimum, or maximum of _value_ for all active |
| work items in the sub-group with a sub-group local ID less than this |
| work item's sub-group local ID. |
| |
| If there is no active work item in the sub-group with a sub-group local |
| ID less than this work item's sub-group local ID then an identity value |
| `I` is returned. |
| For *add*, the identity value is `0`. |
| For *min*, the identity value is the maximum representable value for |
| integer types, or `+INF` for floating-point types. |
| For *max*, the identity value is the minimum representable value for |
| integer types, or `-INF` for floating-point types. |
| For *mul*, the identity value is `1`. |
| |
| Note: This behavior is the same as the *add*, *min*, and *max* exclusive |
| scan built-in functions from `<<cl_khr_subgroups>>` and OpenCL 2.1, |
| except these functions support additional types and need not be |
| encountered by all work items in the sub-group executing the kernel. |
| |==== |
| |
| Note: The order of floating-point operations is not guaranteed for the |
| sub-group scan and reduction built-in functions that operate on |
| floating-point types, and the order of operations may additionally be |
| non-deterministic for a given sub-group. |
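
The difference between the inclusive and exclusive scans, and the role of the identity value `I`, can be seen in a small sequential model. The C sketch below uses hypothetical names and handles 32-bit signed integers only, so the *min* identity is `INT32_MAX` rather than the `+INF` used for floating-point types. It visits work items in increasing sub-group local ID order, an order that implementations need not follow for floating-point types.

[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Exclusive scan add for the work item with local ID lid: sum over active
   work items with a smaller sub-group local ID, or the identity 0. */
static int32_t model_scan_exclusive_add(const int32_t *value,
                                        const bool *active, uint32_t lid)
{
    int32_t acc = 0;
    for (uint32_t i = 0; i < lid; ++i)
        if (active[i])
            acc += value[i];
    return acc;
}

/* Inclusive scan add: the calling work item is active by definition, so its
   own value is added to the exclusive result. */
static int32_t model_scan_inclusive_add(const int32_t *value,
                                        const bool *active, uint32_t lid)
{
    return model_scan_exclusive_add(value, active, lid) + value[lid];
}

/* Exclusive scan min: the identity is the largest representable int32_t. */
static int32_t model_scan_exclusive_min(const int32_t *value,
                                        const bool *active, uint32_t lid)
{
    int32_t acc = INT32_MAX;
    for (uint32_t i = 0; i < lid; ++i)
        if (active[i] && value[i] < acc)
            acc = value[i];
    return acc;
}
----

For example, if the values `{5, -2, 7, 1}` are supplied with work item 1 inactive, the exclusive *add* scan yields 0, 5 and 12 for work items 0, 2 and 3, and `-2` never contributes.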
| |
| |
| ===== Bitwise Operations |
| |
| The table below describes the OpenCL C programming language built-in |
| functions that perform simple bitwise integer operations across work items |
| in a sub-group. |
| These functions need not be encountered by all work items in a sub-group |
| executing the kernel. |
| For the functions below, the generic type name `gentype` may be one of the |
| supported built-in scalar data types `char`, `uchar`, `short`, `ushort`, |
| `int`, `uint`, `long`, and `ulong`. |
| |
| [[table-non-uniform-bitwise-functions]] |
| .Built-in Non-Uniform Bitwise Functions for Sub-Groups |
| [cols="3a,2",options="header",] |
| |==== |
| | Function | Description |
| |[source,opencl_c] |
| ---- |
| gentype sub_group_non_uniform_reduce_and( |
| gentype value ) |
| gentype sub_group_non_uniform_reduce_or( |
| gentype value ) |
| gentype sub_group_non_uniform_reduce_xor( |
| gentype value ) |
| ---- |
| | Returns the bitwise *and*, *or*, or *xor* of _value_ for all active work |
| items in the sub-group. |
| |[source,opencl_c] |
| ---- |
| gentype sub_group_non_uniform_scan_inclusive_and( |
| gentype value ) |
| gentype sub_group_non_uniform_scan_inclusive_or( |
| gentype value ) |
| gentype sub_group_non_uniform_scan_inclusive_xor( |
| gentype value ) |
| ---- |
| | Returns the result of an inclusive scan operation, which is the bitwise |
| *and*, *or*, or *xor* of _value_ for all active work items in the |
| sub-group with a sub-group local ID less than or equal to this work |
| item's sub-group local ID. |
| |[source,opencl_c] |
| ---- |
| gentype sub_group_non_uniform_scan_exclusive_and( |
| gentype value ) |
| gentype sub_group_non_uniform_scan_exclusive_or( |
| gentype value ) |
| gentype sub_group_non_uniform_scan_exclusive_xor( |
| gentype value ) |
| ---- |
| | Returns the result of an exclusive scan operation, which is the bitwise |
| *and*, *or*, or *xor* of _value_ for all active work items in the |
| sub-group with a sub-group local ID less than this work item's sub-group |
| local ID. |
| |
| If there is no active work item in the sub-group with a sub-group local |
| ID less than this work item's sub-group local ID then an identity value |
| `I` is returned. |
| For *and*, the identity value is `~0` (all bits set). |
| For *or* and *xor*, the identity value is `0`. |
| |==== |
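
The identity values matter for the first active work item, which has no active predecessor. The following hypothetical C model of the bitwise exclusive scans, written for 32-bit unsigned values, starts the *and* scan from all bits set and the *or* and *xor* scans from `0`; the enumeration and function names are illustrative only.

[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { SCAN_AND, SCAN_OR, SCAN_XOR } scan_bit_op;  /* hypothetical */

/* Models sub_group_non_uniform_scan_exclusive_{and,or,xor} for the work item
   with local ID lid: combines the values of active work items with smaller
   sub-group local IDs. */
static uint32_t model_scan_exclusive_bitwise(scan_bit_op op,
                                             const uint32_t *value,
                                             const bool *active, uint32_t lid)
{
    uint32_t acc = (op == SCAN_AND) ? ~0u : 0u;  /* identity value I */
    for (uint32_t i = 0; i < lid; ++i) {
        if (!active[i])
            continue;
        if (op == SCAN_AND)
            acc &= value[i];
        else if (op == SCAN_OR)
            acc |= value[i];
        else
            acc ^= value[i];
    }
    return acc;
}
----

For the values `0xF0`, `0x3C` and `0x0F`, the exclusive *and* scan yields `0xFFFFFFFF`, `0xF0` and `0x30` for work items 0, 1 and 2.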
| |
| |
| ===== Logical Operations |
| |
| The table below describes the OpenCL C programming language built-in |
| functions that perform simple logical operations across work items in a |
| sub-group. |
| These functions need not be encountered by all work items in a sub-group |
| executing the kernel. |
| For these functions, a non-zero _predicate_ argument or return value is |
| logically `true` and a zero _predicate_ argument or return value is |
| logically `false`. |
| |
| [[table-non-uniform-logical-functions]] |
| .Built-in Non-Uniform Logical Functions for Sub-Groups |
| [cols="2a,1",options="header",] |
| |==== |
| | Function | Description |
| |[source,opencl_c] |
| ---- |
| int sub_group_non_uniform_reduce_logical_and( |
| int predicate ) |
| int sub_group_non_uniform_reduce_logical_or( |
| int predicate ) |
| int sub_group_non_uniform_reduce_logical_xor( |
| int predicate ) |
| ---- |
| | Returns the logical *and*, *or*, or *xor* of _predicate_ for all active |
| work items in the sub-group. |
| |[source,opencl_c] |
| ---- |
| int sub_group_non_uniform_scan_inclusive_logical_and( |
| int predicate ) |
| int sub_group_non_uniform_scan_inclusive_logical_or( |
| int predicate ) |
| int sub_group_non_uniform_scan_inclusive_logical_xor( |
| int predicate ) |
| ---- |
| | Returns the result of an inclusive scan operation, which is the logical |
| *and*, *or*, or *xor* of _predicate_ for all active work items in the |
| sub-group with a sub-group local ID less than or equal to this work |
| item's sub-group local ID. |
| |[source,opencl_c] |
| ---- |
| int sub_group_non_uniform_scan_exclusive_logical_and( |
| int predicate ) |
| int sub_group_non_uniform_scan_exclusive_logical_or( |
| int predicate ) |
| int sub_group_non_uniform_scan_exclusive_logical_xor( |
| int predicate ) |
| ---- |
| | Returns the result of an exclusive scan operation, which is the logical |
| *and*, *or*, or *xor* of _predicate_ for all active work items in the |
| sub-group with a sub-group local ID less than this work item's sub-group |
| local ID. |
| |
| If there is no active work item in the sub-group with a sub-group local |
| ID less than this work item's sub-group local ID then an identity value |
| `I` is returned. |
| For *and*, the identity value is `true` (non-zero). |
| For *or* and *xor*, the identity value is `false` (zero). |
| |==== |
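
An exclusive logical *xor* scan gives each work item the parity of the non-zero predicates before it, that is, whether an odd number of earlier active work items had a non-zero predicate. The hypothetical sequential C model below shows the exclusive and inclusive *xor* scans together with the exclusive *and* scan, whose identity is `true`.

[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Exclusive logical xor scan for the work item with local ID lid: parity of
   the non-zero predicates among active work items with smaller IDs
   (identity: false). */
static int model_scan_exclusive_logical_xor(const int *predicate,
                                            const bool *active, uint32_t lid)
{
    int acc = 0;
    for (uint32_t i = 0; i < lid; ++i)
        if (active[i])
            acc ^= (predicate[i] != 0);
    return acc;
}

/* Inclusive variant: also folds in this work item's own predicate. */
static int model_scan_inclusive_logical_xor(const int *predicate,
                                            const bool *active, uint32_t lid)
{
    return model_scan_exclusive_logical_xor(predicate, active, lid) ^
           (predicate[lid] != 0);
}

/* Exclusive logical and scan (identity: true). */
static int model_scan_exclusive_logical_and(const int *predicate,
                                            const bool *active, uint32_t lid)
{
    int acc = 1;
    for (uint32_t i = 0; i < lid; ++i)
        if (active[i])
            acc = acc && (predicate[i] != 0);
    return acc;
}
----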
| |
| endif::cl_khr_subgroup_non_uniform_arithmetic[] |
| |
| |
| ifdef::cl_khr_subgroup_non_uniform_vote[] |
| ==== Built-in Non-Uniform Vote Functions for Sub-Groups |
| |
| NOTE: The functionality described in this section <<unified-spec, requires>> |
| support for the `<<cl_khr_subgroup_non_uniform_vote>>` extension. |
| |
| The <<table-non-uniform-vote-functions, following table>> describes the |
| OpenCL C programming language built-in functions to elect a single work item |
| in a sub-group to perform a task and to collectively vote to determine a |
| boolean condition for the sub-group. |
| These functions need not be encountered by all work items in a sub-group |
| executing the kernel. |
| For the functions below, the generic type name `gentype` may be one of |
| the supported built-in scalar data types `char`, `uchar`, `short`, `ushort`, |
| `int`, `uint`, `long`, `ulong`, `float`, `half` |
| footnote:[{fn-half-supported}], and `double` |
| footnote:[{fn-double-supported}]. |
| |
| [[table-non-uniform-vote-functions]] |
| .Built-in Non-Uniform Vote Functions for Sub-Groups |
| [cols="1a,1",options="header",] |
| |==== |
| | Function | Description |
| |[source,opencl_c] |
| ---- |
| int sub_group_elect() |
| ---- |
| | Elects a single work item in the sub-group to perform a task. |
| |
| This function will return true (non-zero) for the active work item in the |
| sub-group with the smallest sub-group local ID, and false (zero) for all |
| other active work items in the sub-group. |
| |[source,opencl_c] |
| ---- |
| int sub_group_non_uniform_all( |
| int predicate ) |
| ---- |
| | Examines _predicate_ for all active work items in the sub-group and |
| returns a non-zero value if _predicate_ is non-zero for all active work |
| items in the sub-group and zero otherwise. |
| |
| Note: This behavior is the same as `sub_group_all` from |
| `<<cl_khr_subgroups>>` and OpenCL 2.1, except this function need not be |
| encountered by all work items in the sub-group executing the kernel. |
| |[source,opencl_c] |
| ---- |
| int sub_group_non_uniform_any( |
| int predicate ) |
| ---- |
| | Examines _predicate_ for all active work items in the sub-group and |
| returns a non-zero value if _predicate_ is non-zero for any active work |
| item in the sub-group and zero otherwise. |
| |
| Note: This behavior is the same as `sub_group_any` from |
| `<<cl_khr_subgroups>>` and OpenCL 2.1, except this function need not be |
| encountered by all work items in the sub-group executing the kernel. |
| |[source,opencl_c] |
| ---- |
| int sub_group_non_uniform_all_equal( |
| gentype value ) |
| ---- |
| | Examines _value_ for all active work items in the sub-group and returns |
| a non-zero value if _value_ is equivalent for all active work items in |
| the sub-group and zero otherwise. |
| |
| Integer types use a bitwise test for equality. Floating-point types use |
| an ordered floating-point test for equality. |
| |==== |
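
The vote functions can be summarized by a short host-side C model. It is a sketch with hypothetical names that describes the sub-group with an `active[]` array and per-work-item values. The `all_equal` model compares every active value with the first active one using the C `==` operator on `float`, which, like an ordered floating-point comparison, treats `+0.0` and `-0.0` as equal and reports any comparison involving a NaN as unequal.

[source,c]
----
#include <assert.h>
#include <math.h>      /* NAN, for exercising NaN inputs */
#include <stdbool.h>
#include <stdint.h>

/* sub_group_elect: true only for the active work item with the smallest ID. */
static int model_elect(const bool *active, uint32_t sg_size, uint32_t lid)
{
    for (uint32_t i = 0; i < sg_size; ++i)
        if (active[i])
            return i == lid;
    return 0;
}

/* sub_group_non_uniform_all over the active work items. */
static int model_all(const int *predicate, const bool *active, uint32_t sg_size)
{
    for (uint32_t i = 0; i < sg_size; ++i)
        if (active[i] && predicate[i] == 0)
            return 0;
    return 1;
}

/* sub_group_non_uniform_any over the active work items. */
static int model_any(const int *predicate, const bool *active, uint32_t sg_size)
{
    for (uint32_t i = 0; i < sg_size; ++i)
        if (active[i] && predicate[i] != 0)
            return 1;
    return 0;
}

/* sub_group_non_uniform_all_equal for float: each active value is compared
   with the first active value using ordered equality (C's ==). */
static int model_all_equal_float(const float *value, const bool *active,
                                 uint32_t sg_size)
{
    int32_t first = -1;
    for (uint32_t i = 0; i < sg_size; ++i) {
        if (!active[i])
            continue;
        if (first < 0)
            first = (int32_t)i;
        else if (!(value[i] == value[first]))
            return 0;
    }
    return 1;
}
----

In particular, `0.0f` and `-0.0f` compare as equal under this model, whereas a bitwise comparison of the same two values would not.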
| |
| endif::cl_khr_subgroup_non_uniform_vote[] |
| |
| |
| ifdef::cl_khr_subgroup_rotate[] |
| [[sub-group-rotate-functions]] |
| ==== Built-in Sub-Group Rotation Functions |
| |
| NOTE: The functionality described in this section <<unified-spec, requires>> |
| support for the `<<cl_khr_subgroup_rotate>>` extension. |
| |
| The <<table-rotate-functions, following table>> describes specialized |
| OpenCL C programming language built-in functions that allow work items in a |
| sub-group to exchange data. |
| These functions need not be encountered by all work items in a sub-group |
| executing the kernel. |
| For the functions below, the generic type name `gentype` may be one of the |
| supported built-in scalar data types `char`, `uchar`, `short`, `ushort`, |
| `int`, `uint`, `long`, `ulong`, `float`, `half` |
| footnote:[{fn-half-supported}], and `double` |
| footnote:[{fn-double-supported}]. |
| |
| [[table-rotate-functions]] |
| .Built-in Rotation Functions for Sub-Groups |
| [cols="1a,1",options="header",] |
| |==== |
| | Function | Description |
| |
| |[source,opencl_c] |
| ---- |
| gentype sub_group_rotate( |
| gentype value, int delta) |
| ---- |
| | Returns _value_ for the work item with sub-group local ID equal to the |
| remainder of the division of the sum of this work item's sub-group local |
| ID and _delta_ by the maximum sub-group size. + |
| The value of _delta_ is required to be dynamically-uniform for all work |
| items in the sub-group, otherwise the behavior is undefined. |
| |
| The return value is undefined if the work item with sub-group local ID |
| equal to the calculated index is inactive. |
| |[source,opencl_c] |
| ---- |
| gentype sub_group_clustered_rotate( |
| gentype value, int delta, |
| uint clustersize) |
| ---- |
| | Returns _value_ for the work item whose sub-group local ID is the sum of |
| the sub-group local ID of the first work item in the cluster to which this |
| work item belongs and the remainder of the division of the sum of this |
| work item's ID within the cluster and _delta_ by _clustersize_. + |
| The value of _delta_ is required to be dynamically-uniform for all work |
| items in the sub-group, otherwise the behavior is undefined. |
| |
| _clustersize_ must be an integer constant expression and a power of two, |
| smaller than or equal to the maximum sub-group size, otherwise the |
| behavior is undefined. |
| |
| The return value is undefined if the work item with sub-group local ID |
| equal to the calculated index is inactive. |
| |==== |
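
The source-index arithmetic of the two rotation functions can be written as a few lines of C. The sketch below uses hypothetical helper names and assumes a non-negative _delta_, so that C's `%` operator yields the mathematical remainder; negative deltas are not modeled.

[source,c]
----
#include <assert.h>
#include <stdint.h>

/* Sub-group local ID read by sub_group_rotate(value, delta). */
static uint32_t rotate_source_id(uint32_t lid, uint32_t delta,
                                 uint32_t max_sg_size)
{
    return (lid + delta) % max_sg_size;
}

/* Sub-group local ID read by sub_group_clustered_rotate(value, delta,
   clustersize): the rotation stays inside lid's cluster. */
static uint32_t clustered_rotate_source_id(uint32_t lid, uint32_t delta,
                                           uint32_t clustersize)
{
    assert(clustersize != 0u && (clustersize & (clustersize - 1u)) == 0u);
    uint32_t first = lid & ~(clustersize - 1u);  /* first ID of the cluster */
    uint32_t id_in_cluster = lid - first;
    return first + (id_in_cluster + delta) % clustersize;
}
----

With a cluster size of 4 and _delta_ equal to 1, work items 4, 5, 6 and 7 read from work items 5, 6, 7 and 4, so values rotate within each cluster without crossing into the next one.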
| |
| endif::cl_khr_subgroup_rotate[] |
| |
| |
| ifdef::cl_khr_subgroup_shuffle[] |
| ==== Built-in Shuffle Functions for Sub-Groups |
| |
| NOTE: The functionality described in this section <<unified-spec, requires>> |
| support for the `<<cl_khr_subgroup_shuffle>>` extension. |
| |
| The <<table-shuffle-functions, following table>> describes the OpenCL C |
| programming language built-in functions that allow work items in a sub-group |
| to exchange data. |
| These functions need not be encountered by all work items in a sub-group |
| executing the kernel. |
| For the functions below, the generic type name `gentype` may be one of the |
| supported built-in scalar data types `char`, `uchar`, `short`, `ushort`, |
| `int`, `uint`, `long`, `ulong`, `float`, `half` |
| footnote:[{fn-half-supported}], and `double` |
| footnote:[{fn-double-supported}]. |
| |
| [[table-shuffle-functions]] |
| .Built-in Shuffle Functions for Sub-Groups |
| [cols="1a,1",options="header",] |
| |==== |
| | Function | Description |
| |[source,opencl_c] |
| ---- |
| gentype sub_group_shuffle( |
| gentype value, uint index ) |
| ---- |
| | Returns _value_ for the work item with sub-group local ID equal to |
| _index_. |
| The shuffle _index_ need not be the same for all work items in the |
| sub-group. |
| |
| The return value is undefined if the work item with sub-group local ID |
| equal to _index_ is inactive or if _index_ is greater than or equal to |
| the size of the sub-group. |
| |[source,opencl_c] |
| ---- |
| gentype sub_group_shuffle_xor( |
| gentype value, uint mask ) |
| ---- |
| | Returns _value_ for the work item with sub-group local ID equal to |
| this work item's sub-group local ID xor'd with _mask_. |
| The shuffle _mask_ need not be the same for all work items in the |
| sub-group. |
| |
| The return value is undefined if the work item with sub-group local ID |
| equal to the calculated index is inactive or if the calculated index is |
| greater than or equal to the size of the sub-group. |
| |
| This is a specialized function that may perform better than the |
| equivalent `sub_group_shuffle` on some implementations. |
| |==== |
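
A common use of `sub_group_shuffle_xor` is a butterfly reduction: after log2(_n_) exchange steps with masks 1, 2, 4, and so on, every work item holds the sum of all _n_ values. The C sketch below simulates this sequentially on the host, with an array read standing in for the shuffle. It assumes a sub-group whose size is a power of two and whose work items are all active, so every computed index is valid; the function name is hypothetical.

[source,c]
----
#include <assert.h>
#include <stdint.h>

/* Simulates, for every work item at once, the loop
       for (mask = 1; mask < n; mask <<= 1) v += sub_group_shuffle_xor(v, mask);
   v[lid] is the private value of the work item with sub-group local ID lid. */
static void butterfly_all_reduce_add(int32_t *v, uint32_t n)
{
    int32_t next[128];
    assert(n != 0u && n <= 128u && (n & (n - 1u)) == 0u);
    for (uint32_t mask = 1u; mask < n; mask <<= 1) {
        /* All work items read before any of them writes, hence two passes. */
        for (uint32_t lid = 0; lid < n; ++lid)
            next[lid] = v[lid] + v[lid ^ mask];  /* lid ^ mask < n always */
        for (uint32_t lid = 0; lid < n; ++lid)
            v[lid] = next[lid];
    }
}
----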
| |
| endif::cl_khr_subgroup_shuffle[] |
| |
| |
| ifdef::cl_khr_subgroup_shuffle_relative[] |
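| ==== Built-in Relative Shuffle Functions for Sub-Groups |
| |
| NOTE: The functionality described in this section <<unified-spec, requires>> |
| support for the `<<cl_khr_subgroup_shuffle_relative>>` extension. |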
| |
| The table below describes specialized OpenCL C programming language built-in |
| functions that allow work items in a sub-group to exchange data. |
| These functions need not be encountered by all work items in a sub-group |
| executing the kernel. |
| For the functions below, the generic type name `gentype` may be one of the |
| supported built-in scalar data types `char`, `uchar`, `short`, `ushort`, |
| `int`, `uint`, `long`, `ulong`, `float`, `half` |
| footnote:[{fn-half-supported}], and `double` |
| footnote:[{fn-double-supported}]. |
| |
| [[table-shuffle-relative-functions]] |
| .Built-in Relative Shuffle Functions for Sub-Groups |
| [cols="1a,1",options="header",] |
| |==== |
| | Function | Description |
| |[source,opencl_c] |
| ---- |
| gentype sub_group_shuffle_up( |
| gentype value, uint delta ) |
| ---- |
| | Returns _value_ for the work item with sub-group local ID equal to this |
| work item's sub-group local ID minus _delta_. |
| The shuffle _delta_ need not be the same for all work items in the |
| sub-group. |
| |
| The return value is undefined if the work item with sub-group local ID |
| equal to the calculated index is inactive, or _delta_ is greater than |
| this work item's sub-group local ID. |
| |
| This is a specialized function that may perform better than the |
| equivalent `sub_group_shuffle` on some implementations. |
| |[source,opencl_c] |
| ---- |
| gentype sub_group_shuffle_down( |
| gentype value, uint delta ) |
| ---- |
| | Returns _value_ for the work item with sub-group local ID equal to this |
| work item's sub-group local ID plus _delta_. |
| The shuffle _delta_ need not be the same for all work items in the |
| sub-group. |
| |
| The return value is undefined if the work item with sub-group local ID |
| equal to the calculated index is inactive, or this work item's sub-group |
| local ID plus _delta_ is greater than or equal to the size of the |
| sub-group. |
| |
| This is a specialized function that may perform better than the |
| equivalent `sub_group_shuffle` on some implementations. |
| |==== |
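
`sub_group_shuffle_up` is a natural building block for an inclusive scan in the style of Hillis and Steele: at each step a work item adds the value found _delta_ positions below it, and _delta_ doubles. The sequential C sketch below simulates that pattern with a hypothetical function name, assuming all work items are active; the `lid >= delta` guard is exactly the condition under which `sub_group_shuffle_up` returns a defined value for an active source.

[source,c]
----
#include <assert.h>
#include <stdint.h>

/* Simulates, for every work item at once:
       for (delta = 1; delta < n; delta <<= 1) {
           up = sub_group_shuffle_up(x, delta);
           if (lid >= delta) x += up;
       }
   x[lid] is the private value of the work item with sub-group local ID lid. */
static void shuffle_up_inclusive_scan_add(int32_t *x, uint32_t n)
{
    int32_t next[128];
    assert(n <= 128u);
    for (uint32_t delta = 1u; delta < n; delta <<= 1) {
        for (uint32_t lid = 0; lid < n; ++lid)
            next[lid] = (lid >= delta) ? x[lid] + x[lid - delta] : x[lid];
        for (uint32_t lid = 0; lid < n; ++lid)
            x[lid] = next[lid];
    }
}
----
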
| endif::cl_khr_subgroup_shuffle_relative[] |
| |
| |
| [[extended-sub-groups-mapping]] |
| === Sub-Groups Function Mapping and Capabilities |
| |
| This section describes a possible mapping between OpenCL built-in sub-group functions |
| and SPIR-V instructions, together with the SPIR-V capabilities required to use them. |
| |
| This section is informational and non-normative. |
| |
| [cols="1,1,1",options="header"] |
| |==== |
| | OpenCL C Function | SPIR-V BuiltIn or Instruction | Enabling SPIR-V Capability |
| |
| 3+| For OpenCL 2.1 or `<<cl_khr_subgroups>>`: |
| |
| | `get_​sub_​group_​size` |
| | *SubgroupSize* |
| | *Kernel* |
| | `get_​max_​sub_​group_​size` |
| | *SubgroupMaxSize* |
| | *Kernel* |
| | `get_​num_​sub_​groups` |
| | *NumSubgroups* |
| | *Kernel* |
| | `get_​enqueued_​num_​sub_​groups` |
| | *NumEnqueuedSubgroups* |
| | *Kernel* |
| | `get_​sub_​group_​id` |
| | *SubgroupId* |
| | *Kernel* |
| | `get_​sub_​group_​local_​id` |
| | *SubgroupLocalInvocationId* |
| | *Kernel* |
| |
| | `sub_​group_​barrier` |
| | *OpControlBarrier* |
| | None Needed |
| |
| | `sub_​group_​all` |
| | *OpGroupAll* |
| | *Groups* |
| | `sub_​group_​any` |
| | *OpGroupAny* |
| | *Groups* |
| |
| | `sub_​group_​broadcast` |
| | *OpGroupBroadcast* |
| | *Groups* |
| |
| | `sub_​group_​reduce_​add` |
| | *OpGroupIAdd*, *OpGroupFAdd* |
| | *Groups* |
| | `sub_​group_​reduce_​min` |
| | *OpGroupSMin*, *OpGroupUMin*, *OpGroupFMin* |
| | *Groups* |
| | `sub_​group_​reduce_​max` |
| | *OpGroupSMax*, *OpGroupUMax*, *OpGroupFMax* |
| | *Groups* |
| |
| | `sub_​group_​scan_​exclusive_​add` |
| | *OpGroupIAdd*, *OpGroupFAdd* |
| | *Groups* |
| | `sub_​group_​scan_​exclusive_​min` |
| | *OpGroupSMin*, *OpGroupUMin*, *OpGroupFMin* |
| | *Groups* |
| | `sub_​group_​scan_​exclusive_​max` |
| | *OpGroupSMax*, *OpGroupUMax*, *OpGroupFMax* |
| | *Groups* |
| |
| | `sub_​group_​scan_​inclusive_​add` |
| | *OpGroupIAdd*, *OpGroupFAdd* |
| | *Groups* |
| | `sub_​group_​scan_​inclusive_​min` |
| | *OpGroupSMin*, *OpGroupUMin*, *OpGroupFMin* |
| | *Groups* |
| | `sub_​group_​scan_​inclusive_​max` |
| | *OpGroupSMax*, *OpGroupUMax*, *OpGroupFMax* |
| | *Groups* |
| |
| | `sub_​group_​reserve_​read_​pipe` |
| | *OpGroupReserveReadPipePackets* |
| | *Pipes* |
| | `sub_​group_​reserve_​write_​pipe` |
| | *OpGroupReserveWritePipePackets* |
| | *Pipes* |
| | `sub_​group_​commit_​read_​pipe` |
| | *OpGroupCommitReadPipe* |
| | *Pipes* |
| | `sub_​group_​commit_​write_​pipe` |
| | *OpGroupCommitWritePipe* |
| | *Pipes* |
| |
| | `get_​kernel_​sub_​group_​count_​for_​ndrange` |
| | *OpGetKernelNDrangeSubGroupCount* |
| | *DeviceEnqueue* |
| | `get_​kernel_​max_​sub_​group_​size_​for_​ndrange` |
| | *OpGetKernelNDrangeMaxSubGroupSize* |
| | *DeviceEnqueue* |
| |
| ifdef::cl_khr_subgroup_ballot[] |
| 3+| For `<<cl_khr_subgroup_ballot>>`: |
| |
| | `sub_​group_​non_​uniform_​broadcast` |
| | *OpGroupNonUniformBroadcast* |
| | *GroupNonUniformBallot* |
| | `sub_​group_​broadcast_​first` |
| | *OpGroupNonUniformBroadcastFirst* |
| | *GroupNonUniformBallot* |
| |
| | `sub_​group_​ballot` |
| | *OpGroupNonUniformBallot* |
| | *GroupNonUniformBallot* |
| | `sub_​group_​inverse_​ballot` |
| | *OpGroupNonUniformInverseBallot* |
| | *GroupNonUniformBallot* |
| | `sub_​group_​ballot_​bit_​extract` |
| | *OpGroupNonUniformBallotBitExtract* |
| | *GroupNonUniformBallot* |
| | `sub_​group_​ballot_​bit_​count` |
| | *OpGroupNonUniformBallotBitCount* |
| | *GroupNonUniformBallot* |
| | `sub_​group_​ballot_​inclusive_​scan` |
| | *OpGroupNonUniformBallotBitCount* |
| | *GroupNonUniformBallot* |
| | `sub_​group_​ballot_​exclusive_​scan` |
| | *OpGroupNonUniformBallotBitCount* |
| | *GroupNonUniformBallot* |
| | `sub_​group_​ballot_​find_​lsb` |
| | *OpGroupNonUniformBallotFindLSB* |
| | *GroupNonUniformBallot* |
| | `sub_​group_​ballot_​find_​msb` |
| | *OpGroupNonUniformBallotFindMSB* |
| | *GroupNonUniformBallot* |
| |
| | `get_​sub_​group_​eq_​mask` |
| | *SubgroupEqMask* |
| | *GroupNonUniformBallot* |
| | `get_​sub_​group_​ge_​mask` |
| | *SubgroupGeMask* |
| | *GroupNonUniformBallot* |
| | `get_​sub_​group_​gt_​mask` |
| | *SubgroupGtMask* |
| | *GroupNonUniformBallot* |
| | `get_​sub_​group_​le_​mask` |
| | *SubgroupLeMask* |
| | *GroupNonUniformBallot* |
| | `get_​sub_​group_​lt_​mask` |
| | *SubgroupLtMask* |
| | *GroupNonUniformBallot* |
| endif::cl_khr_subgroup_ballot[] |
| |
| ifdef::cl_khr_subgroup_clustered_reduce[] |
| 3+| For `<<cl_khr_subgroup_clustered_reduce>>`: |
| |
| | `sub_​group_​clustered_​reduce_​add` |
| | *OpGroupNonUniformIAdd*, *OpGroupNonUniformFAdd* |
| | *GroupNonUniformClustered* |
| | `sub_​group_​clustered_​reduce_​mul` |
| | *OpGroupNonUniformIMul*, *OpGroupNonUniformFMul* |
| | *GroupNonUniformClustered* |
| | `sub_​group_​clustered_​reduce_​min` |
| | *OpGroupNonUniformSMin*, *OpGroupNonUniformUMin*, *OpGroupNonUniformFMin* |
| | *GroupNonUniformClustered* |
| | `sub_​group_​clustered_​reduce_​max` |
| | *OpGroupNonUniformSMax*, *OpGroupNonUniformUMax*, *OpGroupNonUniformFMax* |
| | *GroupNonUniformClustered* |
| | `sub_​group_​clustered_​reduce_​and` |
| | *OpGroupNonUniformBitwiseAnd* |
| | *GroupNonUniformClustered* |
| | `sub_​group_​clustered_​reduce_​or` |
| | *OpGroupNonUniformBitwiseOr* |
| | *GroupNonUniformClustered* |
| | `sub_​group_​clustered_​reduce_​xor` |
| | *OpGroupNonUniformBitwiseXor* |
| | *GroupNonUniformClustered* |
| | `sub_​group_​clustered_​reduce_​logical_​and` |
| | *OpGroupNonUniformLogicalAnd* |
| | *GroupNonUniformClustered* |
| | `sub_​group_​clustered_​reduce_​logical_​or` |
| | *OpGroupNonUniformLogicalOr* |
| | *GroupNonUniformClustered* |
| | `sub_​group_​clustered_​reduce_​logical_​xor` |
| | *OpGroupNonUniformLogicalXor* |
| | *GroupNonUniformClustered* |
| endif::cl_khr_subgroup_clustered_reduce[] |
| |
| ifdef::cl_khr_subgroup_extended_types[] |
| 3+| For `<<cl_khr_subgroup_extended_types>>`: + |
| Note: This extension adds new types to uniform sub-group operations. |
| |
| | `sub_​group_​broadcast` |
| | *OpGroupBroadcast* |
| | *Groups* |
| |
| | `sub_​group_​reduce_​add` |
| | *OpGroupIAdd*, *OpGroupFAdd* |
| | *Groups* |
| | `sub_​group_​reduce_​min` |
| | *OpGroupSMin*, *OpGroupUMin*, *OpGroupFMin* |
| | *Groups* |
| | `sub_​group_​reduce_​max` |
| | *OpGroupSMax*, *OpGroupUMax*, *OpGroupFMax* |
| | *Groups* |
| |
| | `sub_​group_​scan_​exclusive_​add` |
| | *OpGroupIAdd*, *OpGroupFAdd* |
| | *Groups* |
| | `sub_​group_​scan_​exclusive_​min` |
| | *OpGroupSMin*, *OpGroupUMin*, *OpGroupFMin* |
| | *Groups* |
| | `sub_​group_​scan_​exclusive_​max` |
| | *OpGroupSMax*, *OpGroupUMax*, *OpGroupFMax* |
| | *Groups* |
| |
| | `sub_​group_​scan_​inclusive_​add` |
| | *OpGroupIAdd*, *OpGroupFAdd* |
| | *Groups* |
| | `sub_​group_​scan_​inclusive_​min` |
| | *OpGroupSMin*, *OpGroupUMin*, *OpGroupFMin* |
| | *Groups* |
| | `sub_​group_​scan_​inclusive_​max` |
| | *OpGroupSMax*, *OpGroupUMax*, *OpGroupFMax* |
| | *Groups* |
| endif::cl_khr_subgroup_extended_types[] |
| |
| ifdef::cl_khr_subgroup_non_uniform_arithmetic[] |
| 3+| For `<<cl_khr_subgroup_non_uniform_arithmetic>>`: |
| |
| | `sub_​group_​non_​uniform_​reduce_​add` |
| | *OpGroupNonUniformIAdd*, *OpGroupNonUniformFAdd* |
| | *GroupNonUniformArithmetic* |
| | `sub_​group_​non_​uniform_​reduce_​mul` |
| | *OpGroupNonUniformIMul*, *OpGroupNonUniformFMul* |
| | *GroupNonUniformArithmetic* |
| | `sub_​group_​non_​uniform_​reduce_​min` |
| | *OpGroupNonUniformSMin*, *OpGroupNonUniformUMin*, *OpGroupNonUniformFMin* |
| | *GroupNonUniformArithmetic* |
| | `sub_​group_​non_​uniform_​reduce_​max` |
| | *OpGroupNonUniformSMax*, *OpGroupNonUniformUMax*, *OpGroupNonUniformFMax* |
| | *GroupNonUniformArithmetic* |
| | `sub_​group_​non_​uniform_​reduce_​and` |
| | *OpGroupNonUniformBitwiseAnd* |
| | *GroupNonUniformArithmetic* |
| | `sub_​group_​non_​uniform_​reduce_​or` |
| | *OpGroupNonUniformBitwiseOr* |
| | *GroupNonUniformArithmetic* |
| | `sub_​group_​non_​uniform_​reduce_​xor` |
| | *OpGroupNonUniformBitwiseXor* |
| | *GroupNonUniformArithmetic* |
| | `sub_​group_​non_​uniform_​reduce_​logical_​and` |
| | *OpGroupNonUniformLogicalAnd* |
| | *GroupNonUniformArithmetic* |
| | `sub_​group_​non_​uniform_​reduce_​logical_​or` |
| | *OpGroupNonUniformLogicalOr* |
| | *GroupNonUniformArithmetic* |
| | `sub_​group_​non_​uniform_​reduce_​logical_​xor` |
| | *OpGroupNonUniformLogicalXor* |
| | *GroupNonUniformArithmetic* |
| |
| | `sub_​group_​non_​uniform_​scan_​inclusive_​add` |
| | *OpGroupNonUniformIAdd*, *OpGroupNonUniformFAdd* |
| | *GroupNonUniformArithmetic* |
| | `sub_​group_​non_​uniform_​scan_​inclusive_​mul` |
| | *OpGroupNonUniformIMul*, *OpGroupNonUniformFMul* |
| | *GroupNonUniformArithmetic* |
| | `sub_​group_​non_​uniform_​scan_​inclusive_​min` |
| | *OpGroupNonUniformSMin*, *OpGroupNonUniformUMin*, *OpGroupNonUniformFMin* |
| | *GroupNonUniformArithmetic* |
| | `sub_​group_​non_​uniform_​scan_​inclusive_​max` |
| | *OpGroupNonUniformSMax*, *OpGroupNonUniformUMax*, *OpGroupNonUniformFMax* |
| | *GroupNonUniformArithmetic* |
| | `sub_​group_​non_​uniform_​scan_​inclusive_​and` |
| | *OpGroupNonUniformBitwiseAnd* |
| | *GroupNonUniformArithmetic* |
| | `sub_​group_​non_​uniform_​scan_​inclusive_​or` |
| | *OpGroupNonUniformBitwiseOr* |
| | *GroupNonUniformArithmetic* |
| | `sub_​group_​non_​uniform_​scan_​inclusive_​xor` |
| | *OpGroupNonUniformBitwiseXor* |
| | *GroupNonUniformArithmetic* |
| | `sub_​group_​non_​uniform_​scan_​inclusive_​logical_​and` |
| | *OpGroupNonUniformLogicalAnd* |
| | *GroupNonUniformArithmetic* |
| | `sub_​group_​non_​uniform_​scan_​inclusive_​logical_​or` |
| | *OpGroupNonUniformLogicalOr* |
| | *GroupNonUniformArithmetic* |
| | `sub_​group_​non_​uniform_​scan_​inclusive_​logical_​xor` |
| | *OpGroupNonUniformLogicalXor* |
| | *GroupNonUniformArithmetic* |
| |
| | `sub_​group_​non_​uniform_​scan_​exclusive_​add` |
| | *OpGroupNonUniformIAdd*, *OpGroupNonUniformFAdd* |
| | *GroupNonUniformArithmetic* |
| | `sub_​group_​non_​uniform_​scan_​exclusive_​mul` |
| | *OpGroupNonUniformIMul*, *OpGroupNonUniformFMul* |
| | *GroupNonUniformArithmetic* |
| | `sub_​group_​non_​uniform_​scan_​exclusive_​min` |
| | *OpGroupNonUniformSMin*, *OpGroupNonUniformUMin*, *OpGroupNonUniformFMin* |
| | *GroupNonUniformArithmetic* |
| `sub_​group_​non_​uniform_​scan_​exclusive_​max`
| | *OpGroupNonUniformSMax*, *OpGroupNonUniformUMax*, *OpGroupNonUniformFMax* |
| | *GroupNonUniformArithmetic* |
| `sub_​group_​non_​uniform_​scan_​exclusive_​and`
| | *OpGroupNonUniformBitwiseAnd* |
| | *GroupNonUniformArithmetic* |
| `sub_​group_​non_​uniform_​scan_​exclusive_​or`
| | *OpGroupNonUniformBitwiseOr* |
| | *GroupNonUniformArithmetic* |
| `sub_​group_​non_​uniform_​scan_​exclusive_​xor`
| | *OpGroupNonUniformBitwiseXor* |
| | *GroupNonUniformArithmetic* |
| `sub_​group_​non_​uniform_​scan_​exclusive_​logical_​and`
| | *OpGroupNonUniformLogicalAnd* |
| | *GroupNonUniformArithmetic* |
| `sub_​group_​non_​uniform_​scan_​exclusive_​logical_​or`
| | *OpGroupNonUniformLogicalOr* |
| | *GroupNonUniformArithmetic* |
| `sub_​group_​non_​uniform_​scan_​exclusive_​logical_​xor`
| | *OpGroupNonUniformLogicalXor* |
| | *GroupNonUniformArithmetic* |
| endif::cl_khr_subgroup_non_uniform_arithmetic[] |
| |
| ifdef::cl_khr_subgroup_non_uniform_vote[] |
| 3+| For `<<cl_khr_subgroup_non_uniform_vote>>`: |
| |
| | `sub_​group_​elect` |
| | *OpGroupNonUniformElect* |
| | *GroupNonUniform* |
| | `sub_​group_​non_​uniform_​all` |
| | *OpGroupNonUniformAll* |
| | *GroupNonUniformVote* |
| | `sub_​group_​non_​uniform_​any` |
| | *OpGroupNonUniformAny* |
| | *GroupNonUniformVote* |
| | `sub_​group_​non_​uniform_​all_​equal` |
| | *OpGroupNonUniformAllEqual* |
| | *GroupNonUniformVote* |
| endif::cl_khr_subgroup_non_uniform_vote[] |
| |
| ifdef::cl_khr_subgroup_shuffle[] |
| 3+| For `<<cl_khr_subgroup_shuffle>>`: |
| |
| | `sub_​group_​shuffle` |
| | *OpGroupNonUniformShuffle* |
| | *GroupNonUniformShuffle* |
| | `sub_​group_​shuffle_​xor` |
| | *OpGroupNonUniformShuffleXor* |
| | *GroupNonUniformShuffle* |
| endif::cl_khr_subgroup_shuffle[] |
| |
| ifdef::cl_khr_subgroup_shuffle_relative[] |
| 3+| For `<<cl_khr_subgroup_shuffle_relative>>`: |
| |
| | `sub_​group_​shuffle_​up` |
| | *OpGroupNonUniformShuffleUp* |
| | *GroupNonUniformShuffleRelative* |
| | `sub_​group_​shuffle_​down` |
| | *OpGroupNonUniformShuffleDown* |
| | *GroupNonUniformShuffleRelative* |
| endif::cl_khr_subgroup_shuffle_relative[] |
| |
| |==== |
| |
| |
| [[opencl-numerical-compliance]] |
| = OpenCL Numerical Compliance |
| |
This section describes the features of the <<C99-spec,C99>> and IEEE 754
standards that must be supported by all OpenCL compliant devices, in
particular the functionality required for single precision floating-point
numbers.
Currently, only single precision floating-point is a requirement.
Double-precision floating-point is an optional feature.
| |
| |
| [[rounding-modes-1]] |
| == Rounding Modes |
| |
| Floating-point calculations may be carried out internally with extra |
| precision and then rounded to fit into the destination type. |
| IEEE 754 defines four possible rounding modes: |
| |
| * Round to nearest even |
| * Round toward +{inf} |
| * Round toward -{inf} |
| * Round toward zero |
| |
_Round to nearest even_ is currently the only rounding mode required by the
| OpenCL specification for single precision and double-precision operations and is |
| therefore the default rounding mode |
| footnote:[{fn-float-required-rounding-mode}]. |
| In addition, only static selection of rounding mode is supported. |
| Dynamically reconfiguring the rounding modes as specified by the IEEE 754 |
| spec is unsupported. |
| |
| ifdef::cl_khr_fp16[] |
If the `<<cl_khr_fp16>>` extension macro is supported, the default rounding
mode for half-precision floating-point operations is round to nearest even
if `CL_FP_ROUND_TO_NEAREST` is supported, and round to zero otherwise.

Conversions to half floating-point format must be correctly rounded using
the rounding mode indicated by the `convert` operator, or using the default
rounding mode for half-precision floating-point operations if the operator
does not specify a rounding mode or a C-style cast is used.

Conversions from half to integer format shall be correctly rounded using the
rounding mode indicated by the `convert` operator, or toward zero if the
operator does not specify a rounding mode or a C-style cast is used.
All conversions from half to other floating-point formats are exact.
| endif::cl_khr_fp16[] |
| |
| ifdef::cl_khr_select_fprounding_mode[] |
| [open,refpage='SELECT_ROUNDING_MODE',desc='Select rounding mode for a group of instructions',type='freeform',spec='clang',anchor='select-rounding-mode-macro',xrefs='fpMacros'] |
| -- |
| [[select-rounding-mode]] |
| |
| If the `<<cl_khr_select_fprounding_mode>>` extension macro is supported, the |
| floating-point rounding mode may be specified using the following *#pragma* |
| in the OpenCL program source: |
| |
| [source,opencl_c] |
| ---- |
| #pragma OPENCL SELECT_ROUNDING_MODE <rounding-mode> |
| ---- |
| |
| The _<rounding-mode>_ may be one of the following values: |
| |
| * *rte* - round to nearest even |
| * *rtz* - round to zero |
| * *rtp* - round to positive infinity |
| * *rtn* - round to negative infinity |
| |
If this extension is supported, the OpenCL implementation must support
all four rounding modes for single precision floating-point.
| |
| The *#pragma* sets the rounding mode for all instructions that operate on |
| floating-point types (scalar or vector types) or produce floating-point |
| values that follow this pragma in the program source until the next |
| *#pragma*. |
| Note that the rounding mode specified for a block of code is known at |
| compile time. |
| When inside a compound statement, the pragma takes effect from its |
| occurrence until another *#pragma* is encountered (including within a nested |
| compound statement), or until the end of the compound statement; at the end |
| of a compound statement the state for the pragma is restored to its |
| condition just before the compound statement. |
| Except where otherwise documented, the callee functions do not inherit the |
| rounding mode of the caller function. |
| |
| If the `<<cl_khr_select_fprounding_mode>>` extension is enabled, the |
| `\\__ROUNDING_MODE__` preprocessor symbol shall be defined to be one of the |
| following according to the current rounding mode: |
| |
| [source,opencl_c] |
| ---- |
| #define __ROUNDING_MODE__ rte |
| #define __ROUNDING_MODE__ rtz |
| #define __ROUNDING_MODE__ rtp |
#define __ROUNDING_MODE__ rtn
| ---- |
| |
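This is intended to enable the preprocessor to remap `foo()` to `foo_rte()`
(or to the variant for the current rounding mode), for example by using:

[source,opencl_c]
----
// The operands of ## are not macro-expanded, so an extra level is needed.
#define ROUNDING_PASTE_(name, mode) name ## _ ## mode
#define ROUNDING_PASTE(name, mode)  ROUNDING_PASTE_(name, mode)
#define foo ROUNDING_PASTE(foo, __ROUNDING_MODE__)
----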
| |
| The default rounding mode is round to nearest even. |
| The <<math-functions, Math Functions>>, <<common-functions, Common |
| Functions>>, and <<geometric-functions, Geometric Functions>> are |
| implemented with the round to nearest even rounding mode. |
| Various built-in conversions and the *vstore_half* and *vstorea_half* |
| built-in functions that do not specify a rounding mode inherit the current |
| rounding mode. |
| Conversions from floating-point to integer type always use `rtz` mode, |
| except where the user specifically asks for another rounding mode. |
| |
| NOTE: The `<<cl_khr_select_fprounding_mode>>` extension was deprecated in |
| OpenCL 1.1, and its use is not recommended. |
| -- |
| endif::cl_khr_select_fprounding_mode[] |
| |
| |
| [[inf-nan-and-denormalized-numbers]] |
| == INF, NaN and Denormalized Numbers |
| |
| `INF` and NaNs must be supported. |
| Support for signaling NaNs is not required. |
| |
| Support for denormalized numbers with single precision floating-point is |
| optional. |
| Denormalized single precision floating-point numbers passed as input or |
| produced as the output of single precision floating-point operations such as |
| add, sub, mul, divide, and the functions defined in <<math-functions,math |
| functions>>, <<common-functions,common functions>>, and |
| <<geometric-functions,geometric functions>> may be flushed to zero. |
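To see what flushing means for a particular value, the host-side C sketch below (an illustration, not part of any OpenCL API) classifies a float and replaces a denormalized value with a zero of the same sign, which is one plausible way a flushing device treats such values. The helper name `flush_denorm_to_zero` is hypothetical, and the sketch assumes the host itself does not flush denormals.

```c
#include <float.h>
#include <math.h>

/* Returns x unchanged unless it is denormalized (subnormal), in which case
   a zero carrying the sign of x is returned. INF and NaN pass through. */
float flush_denorm_to_zero(float x)
{
    return fpclassify(x) == FP_SUBNORMAL ? copysignf(0.0f, x) : x;
}
```

`FLT_MIN / 2.0f` is the kind of value affected: it is representable only as a denormal, so a device that flushes may treat it as 0.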
| |
| |
| [[floating-point-exceptions]] |
| == Floating-Point Exceptions |
| |
Floating-point exceptions are disabled in OpenCL.
The result of an operation that raises a floating-point exception must match
the result the IEEE 754 spec specifies for the case where exceptions are not
enabled.
Whether and when the implementation sets floating-point flags or raises
floating-point exceptions is implementation-defined.
This standard provides no method for querying, clearing or setting
floating-point flags or trapping raised exceptions.
Due to the poor performance and non-portability of trap mechanisms and the
impracticality of servicing precise exceptions in a vector context
(especially on heterogeneous hardware), such features are discouraged.
| |
| Implementations that nevertheless support such operations through an |
| extension to the standard shall initialize with all exception flags cleared |
| and the exception masks set so that exceptions raised by arithmetic |
| operations do not trigger a trap to be taken. |
If the underlying work-item is reused by the implementation, the
implementation is, however, not responsible for clearing the flags again or
resetting exception masks to default values before entering the kernel.
| That is to say that kernels that do not inspect flags or enable traps are |
| licensed to expect that their arithmetic will not trigger a trap. |
| Those kernels that do examine flags or enable traps are responsible for |
| clearing flag state and disabling all traps before returning control to the |
| implementation. |
| Whether or when the underlying work-item (and accompanying global |
| floating-point state if any) is reused is implementation-defined. |
| |
| The expressions *math_errorhandling* and `MATH_ERREXCEPT` are reserved for |
| use by this standard, but not defined. |
| Implementations that extend this specification with support for |
| floating-point exceptions shall define *math_errorhandling* and |
| `MATH_ERREXCEPT` per <<C99-spec, TC2 to the C99 Specification>>. |
| |
| |
| [[relative-error-as-ulps]] |
| == Relative Error as ULPs |
| |
| In this section we discuss the maximum relative error defined as ulp (units |
| in the last place). |
| Addition, subtraction, multiplication, fused multiply-add and conversion |
| between integer and a single precision floating-point format are IEEE 754 |
| compliant and are therefore correctly rounded. |
| Conversion between floating-point formats and |
| <<explicit-conversions,explicit conversions>> must be correctly rounded. |
| |
| ifdef::cl_khr_fp16[] |
| If the `<<cl_khr_fp16>>` extension macro is supported, |
addition, subtraction, multiplication, and fused multiply-add operations on half
| types are required to be correctly rounded using the default rounding mode |
| for half-precision floating-point operations. |
| endif::cl_khr_fp16[] |
| |
| The ULP is defined as follows: |
| |
| ==== |
| If _x_ is a real number that lies between two finite consecutive |
| floating-point numbers _a_ and _b_, without being equal to one of them, then |
| ulp(_x_) = |_b_ - _a_|, otherwise ulp(_x_) is the distance between the two |
| non-equal finite floating-point numbers nearest _x_. |
| Moreover, ulp(NaN) is NaN. |
| ==== |
| |
| _Attribution: This definition was taken with consent from Jean-Michel Muller |
| with slight clarification for behavior at zero._ |
| |
| ==== |
| Jean-Michel Muller. On the definition of ulp(x). RR-5504, INRIA. 2005, pp.16. <inria-00070503> |
| Currently hosted at |
| https://hal.inria.fr/inria-00070503/document[https://hal.inria.fr/inria-00070503/document]. |
| ==== |
| |
| The following table describes the minimum accuracy of single precision |
| floating-point arithmetic operations given as ULP values. |
| The reference value used to compute the ULP value of an arithmetic operation |
| is the infinitely precise result. |
| 0 ulp is used for math functions that do not require rounding. |
| |
Result overflow within the specified ULP error is permitted: math functions are
allowed to return infinity for a finite reference value when the floating-point
number that would follow the largest finite value, if the format had
sufficient range, is within the ULP error tolerance.
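As a sketch of the overflow rule, assume a positive reference value greater than 2^127^, so that its ulp is the spacing of the largest binade, 2^104^; the float that would follow `FLT_MAX` if the range allowed is 2^128^. The hypothetical host-side C helper below reports whether returning +infinity stays within a tolerance given in ulp.

```c
#include <float.h>
#include <math.h>

/* Assumes ref > 2^127, so ulp(ref) is taken as 2^104. */
int infinity_is_acceptable(double ref, double tolerance_ulp)
{
    /* The value one ulp past FLT_MAX, i.e. 2^128 (exact in double). */
    const double next_after_max = (double)FLT_MAX + 0x1p104;
    const double ulp = 0x1p104;
    return fabs(next_after_max - ref) <= tolerance_ulp * ulp;
}
```

For example, a reference exactly equal to `FLT_MAX` is 1 ulp away from 2^128^, so infinity is acceptable for a 1 ulp bound but not for a 0.5 ulp bound.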
| |
| [[table-ulp-float-math]] |
| .ULP Values for Single-Precision Built-in Math Functions |
| [cols=",",options="header",] |
| |==== |
| | Function | Min Accuracy - ULP values |
| | _x_ + _y_ | Correctly rounded |
| | _x_ - _y_ | Correctly rounded |
| | _x_ * _y_ | Correctly rounded |
| | 1.0 / _x_ | {leq} 2.5 ulp |
| | _x_ / _y_ | {leq} 2.5 ulp |
| | | |
| | *acos* | {leq} 4 ulp |
| | *acospi* | {leq} 5 ulp |
| | *asin* | {leq} 4 ulp |
| | *asinpi* | {leq} 5 ulp |
| | *atan* | {leq} 5 ulp |
| | *atan2* | {leq} 6 ulp |
| | *atanpi* | {leq} 5 ulp |
| | *atan2pi* | {leq} 6 ulp |
| | *acosh* | {leq} 4 ulp |
| | *asinh* | {leq} 4 ulp |
| | *atanh* | {leq} 5 ulp |
| | *cbrt* | {leq} 2 ulp |
| | *ceil* | Correctly rounded |
| | *clamp* | 0 ulp |
| | *copysign* | 0 ulp |
| | *cos* | {leq} 4 ulp |
| | *cosh* | {leq} 4 ulp |
| | *cospi* | {leq} 4 ulp |
| // 3 operations from the 2 multiplications and 1 subtraction per component |
| | *cross* | absolute error tolerance of 'max * max * (3 * FLT_EPSILON)' per vector component, where _max_ is the maximum input operand magnitude |
| | *degrees* | {leq} 2 ulp |
| // 3 ULP error in sqrt |
| // 1.5 * n cumulative error for multiplications |
| // 0.5 * (n-1) cumulative error for additions |
| // |
| // = 3 + (1.5 * n) + (0.5 * (n - 1)) |
| // = 3 + 1.5n + (0.5n - 0.5) |
| // = 2.5 + 2n |
| | *distance* | {leq} 2.5 + 2n ulp, for gentype with vector width _n_ |
| // n + n-1 Number of operations from n multiples and (n-1) additions |
| // 2n - 1 |
| | *dot* | absolute error tolerance of 'max * max * (2n - 1) * FLT_EPSILON', for vector width _n_ and maximum input operand magnitude _max_ across all vector components |
| | *erfc* | {leq} 16 ulp |
| | *erf* | {leq} 16 ulp |
| | *exp* | {leq} 3 ulp |
| | *exp2* | {leq} 3 ulp |
| | *exp10* | {leq} 3 ulp |
| | *expm1* | {leq} 3 ulp |
| | *fabs* | 0 ulp |
| | *fdim* | Correctly rounded |
| | *floor* | Correctly rounded |
| | *fma* | Correctly rounded |
| | *fmax* | 0 ulp |
| | *fmin* | 0 ulp |
| | *fmod* | 0 ulp |
| | *fract* | Correctly rounded |
| | *frexp* | 0 ulp |
| | *hypot* | {leq} 4 ulp |
| | *ilogb* | 0 ulp |
| // 3 ULP error in sqrt |
| // 0.5 effect on e of taking sqrt(x + e) |
| // 0.5 * n cumulative error for multiplications |
| // 0.5 * (n-1) cumulative error for additions |
| // |
| // = (3 + 0.5 * ((0.5 * n) + (0.5 * (n - 1)))) |
| // = 3 + 0.5 * (n - 0.5) |
| // = 2.75 + 0.5n |
| | *length* | {leq} 2.75 + 0.5n ulp, for gentype with vector width _n_ |
| | *ldexp* | Correctly rounded |
| | *lgamma* | Undefined |
| | *lgamma_r* | Undefined |
| | *log* | {leq} 3 ulp |
| | *log2* | {leq} 3 ulp |
| | *log10* | {leq} 3 ulp |
| | *log1p* | {leq} 2 ulp |
| | *logb* | 0 ulp |
| | *mad* | Implemented either as a correctly rounded fma or |
| as a multiply followed by an add both of which are |
| correctly rounded |
| | *max* | 0 ulp |
| | *maxmag* | 0 ulp |
| | *min* | 0 ulp |
| | *minmag* | 0 ulp |
| | *mix* | absolute error tolerance of 1e-3 |
| | *modf* | 0 ulp |
| | *nan* | 0 ulp |
| | *nextafter* | 0 ulp |
| // 2.5 error in rsqrt + error in multiply |
| // 0.5 * n cumulative error for multiplications |
| // 0.5 * (n-1) cumulative error for additions |
| // |
| // = 2.5 + (0.5 * n) + (0.5 * (n - 1)) |
| // = 2.5 + 0.5n + (0.5n - 0.5) |
| // = 2.0 + n |
| | *normalize* | {leq} 2 + n ulp, for gentype with vector width _n_ |
| | *pow*(_x_, _y_) | {leq} 16 ulp |
| | *pown*(_x_, _y_) | {leq} 16 ulp |
| | *powr*(_x_, _y_) | {leq} 16 ulp |
| | *radians* | {leq} 2 ulp |
| | *remainder* | 0 ulp |
| | *remquo* | 0 ulp |
| | *rint* | Correctly rounded |
| | *rootn* | {leq} 16 ulp |
| | *round* | Correctly rounded |
| | *rsqrt* | {leq} 2 ulp |
| | *sign* | 0 ulp |
| | *sin* | {leq} 4 ulp |
| | *sincos* | {leq} 4 ulp for sine and cosine values |
| | *sinh* | {leq} 4 ulp |
| | *sinpi* | {leq} 4 ulp |
| | *smoothstep* | absolute error tolerance of 1e-5 |
| | *sqrt* | {leq} 3 ulp |
| | *step* | 0 ulp |
| | *tan* | {leq} 5 ulp |
| | *tanh* | {leq} 5 ulp |
| | *tanpi* | {leq} 6 ulp |
| | *tgamma* | {leq} 16 ulp |
| | *trunc* | Correctly rounded |
| | | |
| | *half_cos* | {leq} 8192 ulp |
| | *half_divide* | {leq} 8192 ulp |
| | *half_exp* | {leq} 8192 ulp |
| | *half_exp2* | {leq} 8192 ulp |
| | *half_exp10* | {leq} 8192 ulp |
| | *half_log* | {leq} 8192 ulp |
| | *half_log2* | {leq} 8192 ulp |
| | *half_log10* | {leq} 8192 ulp |
| | *half_powr* | {leq} 8192 ulp |
| | *half_recip* | {leq} 8192 ulp |
| | *half_rsqrt* | {leq} 8192 ulp |
| | *half_sin* | {leq} 8192 ulp |
| | *half_sqrt* | {leq} 8192 ulp |
| | *half_tan* | {leq} 8192 ulp |
| | | |
| |
| // 8192 ULP error in half_sqrt |
| // 1.5 * n cumulative error for multiplications |
| // 0.5 * (n-1) cumulative error for additions |
| // |
| // = 8192 + (1.5 * n) + (0.5 * (n - 1)) |
| // = 8192 + 1.5n + (0.5n - 0.5) |
| // = 8191.5 + 2n |
| | *fast_distance* | {leq} 8191.5 + 2n ulp, for gentype with vector width _n_ |
| |
| // 8192 ULP error in half_sqrt |
| // 0.5 * n cumulative error for multiplications |
| // 0.5 * (n-1) cumulative error for additions |
| // |
| // = 8192 + (0.5 * n) + (0.5 * (n - 1)) |
| // = 8192 + 0.5n + (0.5n - 0.5) |
| // = 8191.5 + n |
| | *fast_length* | {leq} 8191.5 + n ulp, for gentype with vector width _n_ |
| |
| // 8192.5 error in half_rsqrt + error in multiply |
| // 0.5 * n cumulative error for multiplications |
| // 0.5 * (n-1) cumulative error for additions |
| // |
| // = 8192.5 + (0.5 * n) + (0.5 * (n - 1)) |
| // = 8192.5 + 0.5n + (0.5n - 0.5) |
| // = 8192 + n |
| | *fast_normalize* | {leq} 8192 + n ulp, for gentype with vector width _n_ |
| | | |
| | *native_cos* | Implementation-defined |
| | *native_divide* | Implementation-defined |
| | *native_exp* | Implementation-defined |
| | *native_exp2* | Implementation-defined |
| | *native_exp10* | Implementation-defined |
| | *native_log* | Implementation-defined |
| | *native_log2* | Implementation-defined |
| | *native_log10* | Implementation-defined |
| | *native_powr* | Implementation-defined |
| | *native_recip* | Implementation-defined |
| | *native_rsqrt* | Implementation-defined |
| | *native_sin* | Implementation-defined |
| | *native_sqrt* | Implementation-defined |
| | *native_tan* | Implementation-defined |
| |==== |
| |
| The following table describes the minimum accuracy of single precision |
| floating-point arithmetic operations given as ULP values for the embedded |
| profile. |
| The reference value used to compute the ULP value of an arithmetic operation |
| is the infinitely precise result. |
| 0 ulp is used for math functions that do not require rounding. |
| |
| [[table-ulp-embedded]] |
| .ULP Values for the Embedded Profile |
| [cols=",",options="header",] |
| |==== |
| | Function | Min Accuracy - ULP values |
| | _x_ + _y_ | Correctly rounded |
| | _x_ - _y_ | Correctly rounded |
| | _x_ * _y_ | Correctly rounded |
| | 1.0 / _x_ | {leq} 3 ulp |
| | _x_ / _y_ | {leq} 3 ulp |
| | | |
| | *acos* | {leq} 4 ulp |
| | *acospi* | {leq} 5 ulp |
| | *asin* | {leq} 4 ulp |
| | *asinpi* | {leq} 5 ulp |
| | *atan* | {leq} 5 ulp |
| | *atan2* | {leq} 6 ulp |
| | *atanpi* | {leq} 5 ulp |
| | *atan2pi* | {leq} 6 ulp |
| | *acosh* | {leq} 4 ulp |
| | *asinh* | {leq} 4 ulp |
| | *atanh* | {leq} 5 ulp |
| | *cbrt* | {leq} 4 ulp |
| | *ceil* | Correctly rounded |
| | *clamp* | 0 ulp |
| | *copysign* | 0 ulp |
| | *cos* | {leq} 4 ulp |
| | *cosh* | {leq} 4 ulp |
| | *cospi* | {leq} 4 ulp |
| | *cross* | Implementation-defined |
| | *degrees* | {leq} 2 ulp |
| | *distance* | Implementation-defined |
| | *dot* | Implementation-defined |
| | *erfc* | {leq} 16 ulp |
| | *erf* | {leq} 16 ulp |
| | *exp* | {leq} 4 ulp |
| | *exp2* | {leq} 4 ulp |
| | *exp10* | {leq} 4 ulp |
| | *expm1* | {leq} 4 ulp |
| | *fabs* | 0 ulp |
| | *fdim* | Correctly rounded |
| | *floor* | Correctly rounded |
| | *fma* | Correctly rounded |
| | *fmax* | 0 ulp |
| | *fmin* | 0 ulp |
| | *fmod* | 0 ulp |
| | *fract* | Correctly rounded |
| | *frexp* | 0 ulp |
| | *hypot* | {leq} 4 ulp |
| | *ilogb* | 0 ulp |
| | *ldexp* | Correctly rounded |
| | *length* | Implementation-defined |
| | *log* | {leq} 4 ulp |
| | *log2* | {leq} 4 ulp |
| | *log10* | {leq} 4 ulp |
| | *log1p* | {leq} 4 ulp |
| | *logb* | 0 ulp |
| | *mad* | Any value allowed (infinite ulp) |
| | *max* | 0 ulp |
| | *maxmag* | 0 ulp |
| | *min* | 0 ulp |
| | *minmag* | 0 ulp |
| | *mix* | Implementation-defined |
| | *modf* | 0 ulp |
| | *nan* | 0 ulp |
| | *normalize* | Implementation-defined |
| | *nextafter* | 0 ulp |
| | *pow*(_x_, _y_) | {leq} 16 ulp |
| | *pown*(_x_, _y_) | {leq} 16 ulp |
| | *powr*(_x_, _y_) | {leq} 16 ulp |
| | *radians* | {leq} 2 ulp |
| | *remainder* | 0 ulp |
| | *remquo* | 0 ulp |
| | *rint* | Correctly rounded |
| | *rootn* | {leq} 16 ulp |
| | *round* | Correctly rounded |
| | *rsqrt* | {leq} 4 ulp |
| | *sign* | 0 ulp |
| | *sin* | {leq} 4 ulp |
| | *sincos* | {leq} 4 ulp for sine and cosine values |
| | *sinh* | {leq} 4 ulp |
| | *sinpi* | {leq} 4 ulp |
| | *smoothstep* | Implementation-defined |
| | *sqrt* | {leq} 4 ulp |
| | *step* | 0 ulp |
| | *tan* | {leq} 5 ulp |
| | *tanh* | {leq} 5 ulp |
| | *tanpi* | {leq} 6 ulp |
| | *tgamma* | {leq} 16 ulp |
| | *trunc* | Correctly rounded |
| | | |
| | *half_cos* | {leq} 8192 ulp |
| | *half_divide* | {leq} 8192 ulp |
| | *half_exp* | {leq} 8192 ulp |
| | *half_exp2* | {leq} 8192 ulp |
| | *half_exp10* | {leq} 8192 ulp |
| | *half_log* | {leq} 8192 ulp |
| | *half_log2* | {leq} 8192 ulp |
| | *half_log10* | {leq} 8192 ulp |
| | *half_powr* | {leq} 8192 ulp |
| | *half_recip* | {leq} 8192 ulp |
| | *half_rsqrt* | {leq} 8192 ulp |
| | *half_sin* | {leq} 8192 ulp |
| | *half_sqrt* | {leq} 8192 ulp |
| | *half_tan* | {leq} 8192 ulp |
| | | |
| | *fast_distance* | Implementation-defined |
| | *fast_length* | Implementation-defined |
| | *fast_normalize* | Implementation-defined |
| | | |
| | *native_cos* | Implementation-defined |
| | *native_divide* | Implementation-defined |
| | *native_exp* | Implementation-defined |
| | *native_exp2* | Implementation-defined |
| | *native_exp10* | Implementation-defined |
| | *native_log* | Implementation-defined |
| | *native_log2* | Implementation-defined |
| | *native_log10* | Implementation-defined |
| | *native_powr* | Implementation-defined |
| | *native_recip* | Implementation-defined |
| | *native_rsqrt* | Implementation-defined |
| | *native_sin* | Implementation-defined |
| | *native_sqrt* | Implementation-defined |
| | *native_tan* | Implementation-defined |
| |==== |
| |
| The <<table-float-ulp-relaxed,following table>> describes the minimum accuracy |
| of commonly used single precision floating-point arithmetic operations given |
| as ULP values if the `-cl-unsafe-math-optimizations` compiler option is |
| specified when compiling or building an OpenCL program. |
| For derived implementations, the operations used in the derivation may |
| themselves be relaxed according to the following table. |
| The minimum accuracy of math functions not defined in the following table when |
| the `-cl-unsafe-math-optimizations` compiler option is specified is as defined |
| in <<table-ulp-float-math,ULP values for single precision built-in math |
| functions>> when operating in the full profile, and as defined in |
| <<table-ulp-embedded,ULP values for the embedded profile>> when operating in the |
| embedded profile. |
| The reference value used to compute the ULP value of an arithmetic operation |
| is the infinitely precise result. |
| 0 ulp is used for math functions that do not require rounding. |
| |
| Defined minimum accuracy of single precision floating-point arithmetic |
| operations and builtins with `-cl-unsafe-math-optimizations` <<unified-spec, |
| requires>> support for OpenCL C 2.0 or newer. |
| |
| [[table-float-ulp-relaxed]] |
| .ULP Values for Single-Precision Built-in Math Functions With Unsafe Math Optimizations in the Full and Embedded Profiles |
| [cols="3,7",options="header",] |
| |==== |
| | Function | Minimum Accuracy |
| |
| | 1.0 / _x_ |
| | {leq} 2.5 ulp for _x_ in the domain of 2^-126^ to 2^126^ for the full |
| profile, and {leq} 3 ulp for the embedded profile. |
| |
| | _x_ / _y_ |
| | {leq} 2.5 ulp for _x_ in the domain of 2^-62^ to 2^62^ and _y_ in the |
| domain of 2^-62^ to 2^62^ for the full profile, and {leq} 3 ulp for |
| the embedded profile. |
| |
| | *acos*(_x_) |
| | {leq} 4096 ulp |
| |
| | *acosh*(_x_) |
| | Derived implementations may implement as *log*(_x_ + *sqrt*(_x_ * _x_ - 1)). |
| For non-derived implementations, the error is {leq} 8192 ulp. |
| |
| | *acospi*(_x_) |
| Derived implementations may implement as *acos*(_x_) * `M_1_PI_F`.
| For non-derived implementations, the error is {leq} 8192 ulp. |
| |
| | *asin*(_x_) |
| | {leq} 4096 ulp |
| |
| | *asinh*(_x_) |
| | Derived implementations may implement as *log*(_x_ + *sqrt*(_x_ * _x_ + 1)). |
| For non-derived implementations, the error is {leq} 8192 ulp. |
| |
| | *asinpi*(_x_) |
| | Derived implementations may implement as *asin*(_x_) * `M_1_PI_F`. |
| For non-derived implementations, the error is {leq} 8192 ulp. |
| |
| | *atan*(_x_) |
| | {leq} 4096 ulp |
| |
| | *atanh*(_x_) |
| | Defined for _x_ in the domain (-1, 1). |
| For _x_ in [-2^-10^, 2^-10^], derived implementations may implement as _x_. |
| For _x_ outside of [-2^-10^, 2^-10^], derived implementations may implement as |
| 0.5f * *log*\((1.0f + _x_) / (1.0f - _x_)). |
| For non-derived implementations, the error is {leq} 8192 ulp. |
| |
| | *atanpi*(_x_) |
| | Derived implementations may implement as *atan*(_x_) * `M_1_PI_F`. |
| For non-derived implementations, the error is {leq} 8192 ulp. |
| |
| | *atan2*(_y_, _x_) |
| | Derived implementations may implement as *atan*(_y_ / _x_) for _x_ > 0, |
| *atan*(_y_ / _x_) + `M_PI_F` for _x_ < 0 and _y_ > 0, and |
| *atan*(_y_ / _x_) - `M_PI_F` for _x_ < 0 and _y_ < 0. |
| |
| | *atan2pi*(_y_, _x_) |
| | Derived implementations may implement as *atan2*(_y_, _x_) * `M_1_PI_F`. |
| For non-derived implementations, the error is {leq} 8192 ulp. |
| |
| | *cbrt*(_x_) |
| | Derived implementations may implement as *rootn*(_x_, 3). |
| For non-derived implementations, the error is {leq} 8192 ulp. |
| |
| | *cos*(_x_) |
| | For _x_ in the domain [-{pi}, {pi}], the maximum absolute error |
| is {leq} 2^-11^ and larger otherwise. |
| |
| | *cosh*(_x_) |
| | Defined for _x_ in the domain [-88, 88]. |
| Derived implementations may implement as 0.5f * (*exp*(_x_) + *exp*(-_x_)). |
| For non-derived implementations, the error is {leq} 8192 ulp. |
| |
| | *cospi*(_x_) |
| | For _x_ in the domain [-1, 1], the maximum absolute error is {leq} |
| 2^-11^ and larger otherwise. |
| |
| | *exp*(_x_) |
| | {leq} 3 + *floor*(*fabs*(2 * _x_)) ulp for the full profile, and {leq} |
| 4 ulp for the embedded profile. |
| |
| | *exp2*(_x_) |
| | {leq} 3 + *floor*(*fabs*(2 * _x_)) ulp for the full profile, and {leq} |
| 4 ulp for the embedded profile. |
| |
| | *exp10*(_x_) |
| | Derived implementations may implement as *exp2*(_x_ * *log2*(10)). |
| For non-derived implementations, the error is {leq} 8192 ulp. |
| |
| | *expm1*(_x_) |
| | Derived implementations may implement as *exp*(_x_) - 1. |
| For non-derived implementations, the error is {leq} 8192 ulp. |
| |
| | *log*(_x_) |
| | For _x_ in the domain [0.5, 2], the maximum absolute error is {leq} |
| 2^-21^; otherwise the maximum error is {leq} 3 ulp for the full profile |
| and {leq} 4 ulp for the embedded profile. |
| |
| | *log2*(_x_) |
| | For _x_ in the domain [0.5, 2], the maximum absolute error is {leq} |
| 2^-21^; otherwise the maximum error is {leq} 3 ulp for the full profile |
| and {leq} 4 ulp for the embedded profile. |
| |
| | *log10*(_x_) |
| | For _x_ in the domain [0.5, 2], the maximum absolute error is {leq} |
| 2^-21^; otherwise the maximum error is {leq} 3 ulp for the full profile |
| and {leq} 4 ulp for the embedded profile. |
| |
| | *log1p*(_x_) |
| | Derived implementations may implement as *log*(_x_ + 1). |
| For non-derived implementations, the error is {leq} 8192 ulp. |
| |
| | *pow*(_x_, _y_) |
| | Undefined for _x_ = 0 and _y_ = 0. |
| Undefined for _x_ < 0 and non-integer _y_. |
| Undefined for _x_ < 0 and _y_ outside the domain [-2^24^, 2^24^]. |
| For _x_ > 0 or _x_ < 0 and even _y_, derived implementations may implement as |
| *exp2*(_y_ * *log2*(*fabs*(_x_))). |
| For _x_ < 0 and odd _y_, derived implementations may implement as |
| -*exp2*(_y_ * *log2*(*fabs*(_x_))). |
| For _x_ == 0 and non-zero _y_, derived implementations may return zero. |
| For non-derived implementations, the error is {leq} 8192 ulp. |
| footnote:[{fn-pow-performance}] |
| |
| | *pown*(_x_, _y_) |
| | Defined only for integer values of _y_. |
| Undefined for _x_ = 0 and _y_ = 0. |
| For _x_ >= 0 or _x_ < 0 and even _y_, derived implementations may implement as |
| *exp2*(_y_ * *log2*(*fabs*(_x_))). |
| For _x_ < 0 and odd _y_, derived implementations may implement as |
| -*exp2*(_y_ * *log2*(*fabs*(_x_))). |
| For non-derived implementations, the error is {leq} 8192 ulp. |
| |
| | *powr*(_x_, _y_) |
| | Defined only for _x_ >= 0. |
| Undefined for _x_ = 0 and _y_ = 0. |
| Derived implementations may implement as *exp2*(_y_ * *log2*(_x_)). |
| For non-derived implementations, the error is {leq} 8192 ulp. |
| |
| | *rootn*(_x_, _y_) |
| | Defined for _x_ > 0 when _y_ is non-zero; derived implementations |
| may implement this case as *exp2*(*log2*(_x_) / _y_). |
| Defined for _x_ < 0 when _y_ is odd; derived implementations |
| may implement this case as -*exp2*(*log2*(-_x_) / _y_). |
| Defined for _x_ = {plusmn}0 when _y_ > 0; derived implementations may |
| return +0 in this case. |
| For non-derived implementations, the error is {leq} 8192 ulp. |
| |
| | *sin*(_x_) |
| | For _x_ in the domain [-{pi}, {pi}], the maximum absolute error is |
| {leq} 2^-11^ and larger otherwise. |
| |
| | *sincos*(_x_) |
| | ulp values as defined for *sin*(_x_) and *cos*(_x_). |
| |
| | *sinh*(_x_) |
| | Defined for _x_ in the domain [-88, 88]. |
| For _x_ in [-2^-10^, 2^-10^], derived implementations |
| may implement as _x_. |
| For _x_ outside of [-2^-10^, 2^-10^], derived implementations |
| may implement as 0.5f * (*exp*(_x_) - *exp*(-_x_)). |
| For non-derived implementations, the error is {leq} 8192 ulp. |
| |
| | *sinpi*(_x_) |
| | For _x_ in the domain [-1, 1], the maximum absolute error is {leq} |
| 2^-11^ and larger otherwise. |
| |
| | *tan*(_x_) |
| | Derived implementations may implement as |
| *sin*(_x_) * (1.0f / *cos*(_x_)). |
| For non-derived implementations, the error is {leq} 8192 ulp. |
| |
| | *tanh*(_x_) |
| | Defined for _x_ in the domain [-{inf}, {inf}]. |
| For _x_ in [-2^-10^, 2^-10^], derived implementations |
| may implement as _x_. |
| For _x_ outside of [-2^-10^, 2^-10^], derived implementations |
| may implement as (*exp*(_x_) - *exp*(-_x_)) / (*exp*(_x_) + *exp*(-_x_)). |
| For non-derived implementations, the error is {leq} 8192 ulp. |
| |
| | *tanpi*(_x_) |
| | Derived implementations may implement as *tan*(_x_ * `M_PI_F`). |
| For non-derived implementations, the error is {leq} 8192 ulp for _x_ |
| in the domain [-1, 1]. |
| |
| | _x_ * _y_ + _z_ |
| | Implemented either as a correctly rounded *fma* or as a multiply and |
| an add, both of which are correctly rounded. |
| |==== |
| |
| The following table describes the minimum accuracy of double-precision |
| floating-point arithmetic operations given as ULP values. |
| The reference value used to compute the ULP value of an arithmetic operation |
| is the infinitely precise result. |
| 0 ulp is used for math functions that do not require rounding. |
| |
| [[table-ulp-double]] |
| .ULP Values for Double-Precision Built-in Math Functions |
| [cols=",",options="header",] |
| |==== |
| | Function | Min Accuracy - ULP values |
| | _x_ + _y_ | Correctly rounded |
| | _x_ - _y_ | Correctly rounded |
| | _x_ * _y_ | Correctly rounded |
| | 1.0 / _x_ | Correctly rounded |
| | _x_ / _y_ | Correctly rounded |
| | | |
| | *acos* | {leq} 4 ulp |
| | *acospi* | {leq} 5 ulp |
| | *asin* | {leq} 4 ulp |
| | *asinpi* | {leq} 5 ulp |
| | *atan* | {leq} 5 ulp |
| | *atan2* | {leq} 6 ulp |
| | *atanpi* | {leq} 5 ulp |
| | *atan2pi* | {leq} 6 ulp |
| | *acosh* | {leq} 4 ulp |
| | *asinh* | {leq} 4 ulp |
| | *atanh* | {leq} 5 ulp |
| | *cbrt* | {leq} 2 ulp |
| | *ceil* | Correctly rounded |
| | *clamp* | 0 ulp |
| | *copysign* | 0 ulp |
| | *cos* | {leq} 4 ulp |
| | *cosh* | {leq} 4 ulp |
| | *cospi* | {leq} 4 ulp |
| // 3 operations from the 2 multiplications and 1 subtraction per component |
| | *cross* | absolute error tolerance of 'max * max * (3 * FLT_EPSILON)' per vector component, where _max_ is the maximum input operand magnitude |
| | *degrees* | {leq} 2 ulp |
| // 3 ULP error in sqrt |
| // 0.5 effect on e of taking sqrt(x + e) |
| // 1.5 * n cumulative error for multiplications |
| // 0.5 * (n-1) cumulative error for additions |
| // |
| // 2 accounts for error in reference code |
| // |
| // = 2 * (3 + 0.5 * ((1.5 * n) + (0.5 * (n - 1)))) |
| // = 2 * (3 + 0.5 * (1.5n + (0.5n - 0.5))) |
| // = 2 * (3 + 0.5 * (2n - 0.5)) |
| // = 2 * (3 + n - 0.25) |
| // = 2 * (2.75 + n) |
| // = 5.5 + 2n |
| | *distance* | {leq} 5.5 + 2n ulp, for gentype with vector width _n_ |
| // n + n-1 Number of operations from n multiples and (n-1) additions |
| // 2n - 1 |
| | *dot* | absolute error tolerance of 'max * max * (2n - 1) * FLT_EPSILON', for vector width _n_ and maximum input operand magnitude _max_ across all vector components |
| | *erfc* | {leq} 16 ulp |
| | *erf* | {leq} 16 ulp |
| | *exp* | {leq} 3 ulp |
| | *exp2* | {leq} 3 ulp |
| | *exp10* | {leq} 3 ulp |
| | *expm1* | {leq} 3 ulp |
| | *fabs* | 0 ulp |
| | *fdim* | Correctly rounded |
| | *floor* | Correctly rounded |
| | *fma* | Correctly rounded |
| | *fmax* | 0 ulp |
| | *fmin* | 0 ulp |
| | *fmod* | 0 ulp |
| | *fract* | Correctly rounded |
| | *frexp* | 0 ulp |
| | *hypot* | {leq} 4 ulp |
| | *ilogb* | 0 ulp |
| // 3 ULP error in sqrt |
| // 0.5 effect on e of taking sqrt(x + e) |
| // 0.5 * n cumulative error for multiplications |
| // 0.5 * (n-1) cumulative error for additions |
| // |
| // 2 accounts for error in reference code |
| // |
| // = 2 * (3 + 0.5 * ((0.5 * n) + (0.5 * (n - 1)))) |
| // = 2 * (3 + 0.5 * (n - 0.5)) |
| // = 2 * (2.75 + 0.5n) |
| // = 5.5 + n |
| | *length* | {leq} 5.5 + n ulp, for gentype with vector width _n_ |
| | *ldexp* | Correctly rounded |
| | *log* | {leq} 3 ulp |
| | *log2* | {leq} 3 ulp |
| | *log10* | {leq} 3 ulp |
| | *log1p* | {leq} 2 ulp |
| | *logb* | 0 ulp |
| | *mad* | Any value allowed (infinite ulp) |
| | *max* | 0 ulp |
| | *maxmag* | 0 ulp |
| | *min* | 0 ulp |
| | *minmag* | 0 ulp |
| | *mix* | Implementation-defined |
| | *modf* | 0 ulp |
| | *nan* | 0 ulp |
| | *nextafter* | 0 ulp |
| // 2.5 error in rsqrt + error in multiply |
| // 0.5 effect on e of taking sqrt(x + e) |
| // 0.5 * n cumulative error for multiplications |
| // 0.5 * (n-1) cumulative error for additions |
| // |
| // 2 accounts for error in reference code |
| // |
| // = 2 * (2.5 + 0.5 * ((0.5 * n) + (0.5 * (n - 1)))) |
| // = 2 * (2.5 + 0.5 * (0.5n + (0.5n - 0.5))) |
| // = 2 * (2.5 + 0.5 * (n - 0.5)) |
| // = 2 * (2.5 + 0.5n - 0.25) |
| // = 2 * (2.25 + 0.5n) |
| // = 4.5 + n |
| | *normalize* | {leq} 4.5 + n ulp, for gentype with vector width _n_ |
| | *pow*(_x_, _y_) | {leq} 16 ulp |
| | *pown*(_x_, _y_) | {leq} 16 ulp |
| | *powr*(_x_, _y_) | {leq} 16 ulp |
| | *radians* | {leq} 2 ulp |
| | *remainder* | 0 ulp |
| | *remquo* | 0 ulp |
| | *rint* | Correctly rounded |
| | *rootn* | {leq} 16 ulp |
| | *round* | Correctly rounded |
| | *rsqrt* | {leq} 2 ulp |
| | *sign* | 0 ulp |
| | *sin* | {leq} 4 ulp |
| | *sincos* | {leq} 4 ulp for sine and cosine values |
| | *sinh* | {leq} 4 ulp |
| | *sinpi* | {leq} 4 ulp |
| | *smoothstep* | Implementation-defined |
| | *sqrt* | Correctly rounded |
| | *step* | 0 ulp |
| | *tan* | {leq} 5 ulp |
| | *tanh* | {leq} 5 ulp |
| | *tanpi* | {leq} 6 ulp |
| | *tgamma* | {leq} 16 ulp |
| | *trunc* | Correctly rounded |
| |==== |
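
The comment lines in this table's source derive the *distance*, *length*, and *normalize* bounds from per-operation error budgets. A small C sketch can help when checking a result: it evaluates the resulting closed forms for vector width _n_, plus the *dot* absolute tolerance. The function names are hypothetical. The epsilon is a parameter, so that whichever constant the table names can be passed in.

```c
/* Hypothetical helpers evaluating the double-precision table entries
 * for a vector of width n. */
static double distance_ulp_limit(int n)  { return 5.5 + 2.0 * n; }
static double length_ulp_limit(int n)    { return 5.5 + n; }
static double normalize_ulp_limit(int n) { return 4.5 + n; }

/* Absolute tolerance for dot: max * max * (2n - 1) * eps, where max is
 * the largest input magnitude and eps is the epsilon named in the table. */
static double dot_abs_tolerance(double max, int n, double eps)
{
    return max * max * (2.0 * n - 1.0) * eps;
}
```

For example, a four-component *distance* may be off by up to 13.5 ulp, and a three-component *normalize* by up to 7.5 ulp.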
| |
| |
| ifdef::cl_khr_fp16[] |
| If the `<<cl_khr_fp16>>` extension macro is supported, |
| the following table describes the minimum accuracy of half-precision |
| floating-point arithmetic operations given as ULP values. |
| The reference value used to compute the ULP value of an arithmetic operation |
| is the infinitely precise result. |
| 0 ulp is used for math functions that do not require rounding. |
| |
| [[table-ulp-half-math]] |
| .ULP Values for Half-Precision Floating-Point Arithmetic Operations |
| [cols=",,",options="header",] |
| |==== |
| | Function | Min Accuracy - Full Profile | Min Accuracy - Embedded Profile |
| | *_x_ + _y_* | Correctly rounded | Correctly rounded |
| | *_x_ - _y_* | Correctly rounded | Correctly rounded |
| | *_x_ * _y_* | Correctly rounded | Correctly rounded |
| | *1.0 / _x_* | Correctly rounded | \<= 1 ulp |
| | *_x_ / _y_* | Correctly rounded | \<= 1 ulp |
| | | | |
| | *acos* | \<= 2 ulp | \<= 3 ulp |
| | *acosh* | \<= 2 ulp | \<= 3 ulp |
| | *acospi* | \<= 2 ulp | \<= 3 ulp |
| | *asin* | \<= 2 ulp | \<= 3 ulp |
| | *asinh* | \<= 2 ulp | \<= 3 ulp |
| | *asinpi* | \<= 2 ulp | \<= 3 ulp |
| | *atan* | \<= 2 ulp | \<= 3 ulp |
| | *atanh* | \<= 2 ulp | \<= 3 ulp |
| | *atanpi* | \<= 2 ulp | \<= 3 ulp |
| | *atan2* | \<= 2 ulp | \<= 3 ulp |
| | *atan2pi* | \<= 2 ulp | \<= 3 ulp |
| | *cbrt* | \<= 2 ulp | \<= 2 ulp |
| | *ceil* | Correctly rounded | Correctly rounded |
| | *clamp* | 0 ulp | 0 ulp |
| | *copysign* | 0 ulp | 0 ulp |
| | *cos* | \<= 2 ulp | \<= 2 ulp |
| | *cosh* | \<= 2 ulp | \<= 3 ulp |
| | *cospi* | \<= 2 ulp | \<= 2 ulp |
| |
| // 3 operations from the 2 multiplications and 1 subtraction per component |
| | *cross* |
| | absolute error tolerance of 'max * max * (3 * HALF_EPSILON)' per vector |
| component, where _max_ is the maximum input operand magnitude |
| | Implementation-defined |
| | *degrees* | \<= 2 ulp | \<= 2 ulp |
| |
| // 0.5 ULP error in sqrt |
| // 1.5 * n cumulative error for multiplications |
| // 0.5 * (n-1) cumulative error for additions |
| // |
| // = 0.5 + (1.5 * n) + (0.5 * (n - 1)) |
| // = 0.5 + 1.5n + (0.5n - 0.5) |
| // = 2n |
| | *distance* |
| | \<= 2n ulp, for gentype with vector width _n_ |
| | Implementation-defined |
| |
| // n + n-1 Number of operations from n multiples and (n-1) additions |
| // 2n - 1 |
| | *dot* |
| | absolute error tolerance of 'max * max * (2n - 1) * HALF_EPSILON', for |
| vector width _n_ and maximum input operand magnitude _max_ across all |
| vector components |
| | Implementation-defined |
| |
| | *erfc* | \<= 4 ulp | \<= 4 ulp |
| | *erf* | \<= 4 ulp | \<= 4 ulp |
| | *exp* | \<= 2 ulp | \<= 3 ulp |
| | *exp2* | \<= 2 ulp | \<= 3 ulp |
| | *exp10* | \<= 2 ulp | \<= 3 ulp |
| | *expm1* | \<= 2 ulp | \<= 3 ulp |
| | *fabs* | 0 ulp | 0 ulp |
| | *fdim* | Correctly rounded | Correctly rounded |
| | *floor* | Correctly rounded | Correctly rounded |
| | *fma* | Correctly rounded | Correctly rounded |
| | *fmax* | 0 ulp | 0 ulp |
| | *fmin* | 0 ulp | 0 ulp |
| | *fmod* | 0 ulp | 0 ulp |
| | *fract* | Correctly rounded | Correctly rounded |
| | *frexp* | 0 ulp | 0 ulp |
| | *hypot* | \<= 2 ulp | \<= 3 ulp |
| | *ilogb* | 0 ulp | 0 ulp |
| | *ldexp* | Correctly rounded | Correctly rounded |
| |
| // 0.5 ULP error in sqrt |
| // 0.5 effect on e of taking sqrt(x + e) |
| // 0.5 * n cumulative error for multiplications |
| // 0.5 * (n-1) cumulative error for additions |
| // |
| // = (0.5 + 0.5 * ((0.5 * n) + (0.5 * (n - 1)))) |
| // = 0.5 + 0.5 * (n - 0.5) |
| // = 0.25 + 0.5n |
| | *length* |
| | \<= 0.25 + 0.5n ulp, for gentype with vector width _n_ |
| | Implementation-defined |
| | *log* | \<= 2 ulp | \<= 3 ulp |
| | *log2* | \<= 2 ulp | \<= 3 ulp |
| | *log10* | \<= 2 ulp | \<= 3 ulp |
| | *log1p* | \<= 2 ulp | \<= 3 ulp |
| | *logb* | 0 ulp | 0 ulp |
| | *mad* | Implementation-defined | Implementation-defined |
| | *max* | 0 ulp | 0 ulp |
| | *maxmag* | 0 ulp | 0 ulp |
| | *min* | 0 ulp | 0 ulp |
| | *minmag* | 0 ulp | 0 ulp |
| | *mix* | Implementation-defined | Implementation-defined |
| | *modf* | 0 ulp | 0 ulp |
| | *nan* | 0 ulp | 0 ulp |
| | *nextafter* | 0 ulp | 0 ulp |
| |
| // 1.5 error in rsqrt + error in multiply |
| // 0.5 * n cumulative error for multiplications |
| // 0.5 * (n-1) cumulative error for additions |
| // |
| // = 1.5 + (0.5 * n) + (0.5 * (n - 1)) |
| // = 1.5 + 0.5n + (0.5n - 0.5) |
| // = 1.0 + n |
| | *normalize* |
| | \<= 1 + n ulp, for gentype with vector width _n_ |
| | Implementation-defined |
| | *pow*(_x_, _y_) | \<= 4 ulp | \<= 5 ulp |
| | *pown*(_x_, _y_) | \<= 4 ulp | \<= 5 ulp |
| | *powr*(_x_, _y_) | \<= 4 ulp | \<= 5 ulp |
| | *radians* | \<= 2 ulp | \<= 2 ulp |
| | *remainder* | 0 ulp | 0 ulp |
| | *remquo* |
| | 0 ulp for the remainder, at least the lower 7 bits of the integral |
| quotient |
| | 0 ulp for the remainder, at least the lower 7 bits of the integral |
| quotient |
| | *rint* | Correctly rounded | Correctly rounded |
| | *rootn* | \<= 4 ulp | \<= 5 ulp |
| | *round* | Correctly rounded | Correctly rounded |
| | *rsqrt* | \<= 1 ulp | \<= 1 ulp |
| | *sign* | 0 ulp | 0 ulp |
| | *sin* | \<= 2 ulp | \<= 2 ulp |
| | *sincos* |
| | \<= 2 ulp for sine and cosine values |
| | \<= 2 ulp for sine and cosine values |
| | *sinh* | \<= 2 ulp | \<= 3 ulp |
| | *sinpi* | \<= 2 ulp | \<= 2 ulp |
| | *smoothstep* | Implementation-defined | Implementation-defined |
| | *sqrt* | Correctly rounded | \<= 1 ulp |
| | *step* | 0 ulp | 0 ulp |
| | *tan* | \<= 2 ulp | \<= 3 ulp |
| | *tanh* | \<= 2 ulp | \<= 3 ulp |
| | *tanpi* | \<= 2 ulp | \<= 3 ulp |
| | *tgamma* | \<= 4 ulp | \<= 4 ulp |
| | *trunc* | Correctly rounded | Correctly rounded |
| |==== |
| |
| NOTE: _Implementations may perform floating-point operations on_ `half` |
| _scalar or vector data types by converting the_ `half` _values to single |
| precision floating-point values and performing the operation in single |
| precision floating-point. |
| In this case, the implementation will use the_ `half` _scalar or vector data |
| type as a storage only format_. |
| |
| endif::cl_khr_fp16[] |
| |
| |
| [[edge-case-behavior]] |
| == Edge Case Behavior |
| |
| The edge case behavior of the <<math-functions,math functions>> shall |
| conform to <<C99-spec,sections F.9 and G.6 of the C99 Specification>>, |
| except <<additional-requirements-beyond-c99-tc2,where noted below>>. |
| |
| |
| [[additional-requirements-beyond-c99-tc2]] |
| === Additional Requirements Beyond C99 TC2 |
| |
| All functions that return a NaN should return a quiet NaN. |
| |
| *half_<funcname>* functions compute the same function as the function of |
| the same name without the *half_* prefix. |
| They must conform to the same edge case requirements (<<C99-spec,see |
| sections F.9 and G.6 of the C99 Specification>>). |
| For other cases, except where otherwise noted, these single precision |
| functions are permitted to have up to 8192 ulp of error (as measured in the |
| single precision result), although better accuracy is encouraged. |
| |
| The usual allowances for <<relative-error-as-ulps,rounding error>> or |
| <<edge-case-behavior-in-flush-to-zero-mode,flushing behavior>> shall not |
| apply for those values for which <<C99-spec,section F.9 of the C99 |
| Specification>>, or the <<additional-requirements-beyond-c99-tc2,additional |
| requirements>> and <<edge-case-behavior-in-flush-to-zero-mode,edge case |
| behavior>> below (and similar sections for other floating-point precisions) |
| prescribe a result (e.g. *ceil*(-1 < _x_ < 0) returns -0). |
| Those values shall produce exactly the prescribed answers, and no other. |
| Where the {plusmn} symbol is used, the sign shall be preserved. |
| For example, *sin*({plusmn}0) = {plusmn}0 shall be interpreted to mean |
| *sin*(+0) is +0 and *sin*(-0) is -0. |
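
+0 and -0 compare equal with `==`, and a NaN compares unequal to everything, including itself. A test of these exactly prescribed values therefore has to inspect the sign bit and NaN-ness explicitly. A minimal C99 sketch, using the hypothetical helper name `same_value`:

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: true if a and b are both NaN, or are equal and
 * carry the same sign bit (so +0 and -0 are told apart). */
static bool same_value(double a, double b)
{
    if (isnan(a) || isnan(b))
        return isnan(a) && isnan(b);
    return a == b && !signbit(a) == !signbit(b);
}
```

For example, `same_value(ceil(-0.5), -0.0)` checks the *ceil* entry in the list below. A plain `==` comparison against 0.0 would pass even if the sign were wrong.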
| |
| [none] |
| * *acospi*(1) = +0. |
| * *acospi*(_x_) returns a NaN for |_x_| > 1. |
| * *asinpi*({plusmn}0) = {plusmn}0. |
| * *asinpi*(_x_) returns a NaN for |_x_| > 1. |
| * *atanpi*({plusmn}0) = {plusmn}0. |
| * *atanpi*({plusmn}{inf}) = {plusmn}0.5. |
| * *atan2pi*({plusmn}0, -0) = {plusmn}1. |
| * *atan2pi*({plusmn}0, +0) = {plusmn}0. |
| * *atan2pi*({plusmn}0, _x_) returns {plusmn}1 for _x_ < 0. |
| * *atan2pi*({plusmn}0, _x_) returns {plusmn}0 for _x_ > 0. |
| * *atan2pi*(_y_, {plusmn}0) returns -0.5 for _y_ < 0. |
| * *atan2pi*(_y_, {plusmn}0) returns 0.5 for _y_ > 0. |
| * *atan2pi*({plusmn}_y_, -{inf}) returns {plusmn}1 for finite _y_ > 0. |
| * *atan2pi*({plusmn}_y_, +{inf}) returns {plusmn}0 for finite _y_ > 0. |
| * *atan2pi*({plusmn}{inf}, _x_) returns {plusmn}0.5 for finite _x_. |
| * *atan2pi*({plusmn}{inf}, -{inf}) returns {plusmn}0.75. |
| * *atan2pi*({plusmn}{inf}, +{inf}) returns {plusmn}0.25. |
| * *ceil*(-1 < _x_ < 0) returns -0. |
| * *cospi*({plusmn}0) returns 1. |
| * *cospi*(_n_ + 0.5) is +0 for any integer _n_ where _n_ + 0.5 is |
| representable. |
| * *cospi*({plusmn}{inf}) returns a NaN. |
| * *exp10*(-{inf}) returns +0. |
| * *exp10*(+{inf}) returns +{inf}. |
| * *distance*(_x_, _y_) calculates the distance from _x_ to _y_ without |
| overflow or extraordinary precision loss due to underflow. |
| * *fdim*(any, NaN) returns NaN. |
| * *fdim*(NaN, any) returns NaN. |
| * *fmod*({plusmn}0, NaN) returns NaN. |
| * *frexp*({plusmn}{inf}, _exp_) returns {plusmn}{inf} and stores 0 in |
| _exp_. |
| * *frexp*(NaN, _exp_) returns the NaN and stores 0 in _exp_. |
| * *fract*(_x_, _iptr_) shall not return a value greater than or equal to |
| 1.0, and shall not return a value less than 0. |
| * *fract*(+0, _iptr_) returns +0 and +0 in _iptr_. |
| * *fract*(-0, _iptr_) returns -0 and -0 in _iptr_. |
| * *fract*(+{inf}, _iptr_) returns +0 and +{inf} in _iptr_. |
| * *fract*(-{inf}, _iptr_) returns -0 and -{inf} in _iptr_. |
| * *fract*(NaN, _iptr_) returns the NaN and NaN in _iptr_. |
| * *length* calculates the length of a vector without overflow or |
| extraordinary precision loss due to underflow. |
| * *lgamma_r*(_x_, _signp_) returns 0 in _signp_ if _x_ is zero or a |
| negative integer. |
| * *nextafter*(-0, _y_ > 0) returns the smallest positive denormal value. |
| * *nextafter*(+0, _y_ < 0) returns the smallest negative denormal value. |
| * *normalize* shall reduce the vector to unit length, pointing in the |
| same direction without overflow or extraordinary precision loss due to |
| underflow. |
| * *normalize*(_v_) returns _v_ if all elements of _v_ are zero. |
| * *normalize*(_v_) returns a vector full of NaNs if any element is a NaN. |
| * *normalize*(_v_) for which any element in _v_ is infinite shall proceed |
| as if the elements in _v_ were replaced as follows: |
| + |
| [source,opencl_c] |
| ---------- |
| for (int i = 0; i < sizeof(v) / sizeof(v[0]); i++) |
| v[i] = isinf(v[i]) ? copysign(1.0, v[i]) : 0.0 * v[i]; |
| ---------- |
| * *pow*({plusmn}0, -{inf}) returns +{inf}. |
| * *pown*(_x_, 0) is 1 for any _x_, even zero, NaN or infinity. |
| * *pown*({plusmn}0, _n_) is {plusmn}{inf} for odd _n_ < 0. |
| * *pown*({plusmn}0, _n_) is +{inf} for even _n_ < 0. |
| * *pown*({plusmn}0, _n_) is +0 for even _n_ > 0. |
| * *pown*({plusmn}0, _n_) is {plusmn}0 for odd _n_ > 0. |
| * *powr*(_x_, {plusmn}0) is 1 for finite _x_ > 0. |
| * *powr*({plusmn}0, _y_) is +{inf} for finite _y_ < 0. |
| * *powr*({plusmn}0, -{inf}) is +{inf}. |
| * *powr*({plusmn}0, _y_) is +0 for _y_ > 0. |
| * *powr*(+1, _y_) is 1 for finite _y_. |
| * *powr*(_x_, _y_) returns NaN for _x_ < 0. |
| * *powr*({plusmn}0, {plusmn}0) returns NaN. |
| * *powr*(+{inf}, {plusmn}0) returns NaN. |
| * *powr*(+1, {plusmn}{inf}) returns NaN. |
| * *powr*(_x_, NaN) returns the NaN for _x_ >= 0. |
| * *powr*(NaN, _y_) returns the NaN. |
| * *rint*(-0.5 \<= _x_ < 0) returns -0. |
| * *remquo*(_x_, _y_, &_quo_) returns a NaN and 0 in _quo_ if _x_ is |
| {plusmn}{inf}, or if _y_ is 0 and the other argument is non-NaN or if |
| either argument is a NaN. |
| * *rootn*({plusmn}0, _n_) is {plusmn}{inf} for odd _n_ < 0. |
| * *rootn*({plusmn}0, _n_) is +{inf} for even _n_ < 0. |
| * *rootn*({plusmn}0, _n_) is +0 for even _n_ > 0. |
| * *rootn*({plusmn}0, _n_) is {plusmn}0 for odd _n_ > 0. |
| * *rootn*(_x_, _n_) returns a NaN for _x_ < 0 when _n_ is even. |
| * *rootn*(_x_, 0) returns a NaN. |
| * *round*(-0.5 < _x_ < 0) returns -0. |
| * *sinpi*({plusmn}0) returns {plusmn}0. |
| * *sinpi*(+__n__) returns +0 for positive integers _n_. |
| * *sinpi*(-_n_) returns -0 for negative integers _n_. |
| * *sinpi*({plusmn}{inf}) returns a NaN. |
| * *tanpi*({plusmn}0) returns {plusmn}0. |
| * *tanpi*({plusmn}{inf}) returns a NaN. |
| * *tanpi*(_n_) is *copysign*(0.0, _n_) for even integers _n_. |
| * *tanpi*(_n_) is *copysign*(0.0, - _n_) for odd integers _n_. |
| * *tanpi*(_n_ + 0.5) for even integer _n_ is +{inf} where _n_ + 0.5 is |
| representable. |
| * *tanpi*(_n_ + 0.5) for odd integer _n_ is -{inf} where _n_ + 0.5 is |
| representable. |
| * *trunc*(-1 < _x_ < 0) returns -0. |
| |
| |
| [[changes-to-c99-tc2-behavior]] |
| === Changes to C99 TC2 Behavior |
| |
| *modf* behaves as though implemented by: |
| |
| [source,opencl_c] |
| ---------- |
| gentype modf(gentype value, gentype *iptr) |
| { |
| *iptr = trunc( value ); |
| return copysign(isinf( value ) ? 0.0 : value - *iptr, value); |
| } |
| ---------- |
| |
| *rint* always rounds according to the round-to-nearest-even rounding mode, |
| even if the caller is in some other rounding mode. |
| |
| |
| [[edge-case-behavior-in-flush-to-zero-mode]] |
| === Edge Case Behavior in Flush to Zero Mode |
| |
| If denormals are flushed to zero, then a function may return one of four |
| results: |
| |
| . Any conforming result for non-flush-to-zero mode. |
| . If the result given by (1) is sub-normal before rounding, it may be |
| flushed to zero. |
| . Any non-flushed conforming result for the function if one or more of its |
| sub-normal operands are flushed to zero. |
| . If the result given by (3) is sub-normal before rounding, it may be |
| flushed to zero. |
| |
| In each of the above cases, if an operand or result is flushed to zero, the |
| sign of the zero is undefined. |
| |
| If subnormals are flushed to zero, a device may choose to conform to the |
| following edge cases for *nextafter* instead of those listed in the |
| <<additional-requirements-beyond-c99-tc2,additional requirements>> section. |
| |
| [none] |
| * *nextafter*(+smallest normal, _y_ < +smallest normal) = +0. |
| * *nextafter*(-smallest normal, _y_ > -smallest normal) = -0. |
| * *nextafter*(-0, _y_ > 0) returns the smallest positive normal value. |
| * *nextafter*(+0, _y_ < 0) returns the smallest negative normal value. |
| |
| For clarity, subnormals or denormals are defined to be the set of |
| representable numbers in the range 0 < _x_ < `TYPE_MIN` and `-TYPE_MIN` < |
| _x_ < -0. |
| They do not include {plusmn}0. |
| A non-zero number is said to be sub-normal before rounding if after |
| normalization, its radix-2 exponent is less than (`TYPE_MIN_EXP` - 1) |
| footnote:[{fn-min-float-constants}]. |
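
The definition of sub-normal before rounding can be expressed in C99 for single precision. The sketch assumes the host `<float.h>` values `FLT_MIN` and `FLT_MIN_EXP` (-125) match the OpenCL ones. It also assumes the unrounded result is available exactly in a `double`, which holds, for example, for the product of two `float` values. `is_float_subnormal` and `float_subnormal_before_rounding` are hypothetical names.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical: x is a float subnormal, 0 < |x| < FLT_MIN (excludes +/-0). */
static bool is_float_subnormal(float x)
{
    return x != 0.0f && fabsf(x) < FLT_MIN;
}

/* Hypothetical: an exact (unrounded) result, carried in double, would be
 * sub-normal before rounding to float. frexp() gives exact = m * 2^e with
 * m in [0.5, 1), so the normalized exponent is e - 1, and the test
 * e - 1 < FLT_MIN_EXP - 1 reduces to e < FLT_MIN_EXP. */
static bool float_subnormal_before_rounding(double exact)
{
    int e;

    if (exact == 0.0 || isnan(exact) || isinf(exact))
        return false;
    frexp(exact, &e);
    return e < FLT_MIN_EXP;
}
```

An exact result just below `FLT_MIN`, such as 0x1.fffffffp-127, is sub-normal before rounding even though rounding it to `float` yields `FLT_MIN` itself. Under rule 2 above, it may therefore still be flushed to zero.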
| |
| |
| [[image-addressing-and-filtering]] |
| = Image Addressing and Filtering |
| |
| Let w~t~, h~t~ and d~t~ be the width, height (or image array size for a 1D |
| image array) and depth (or image array size for a 2D image array) of the |
| image in pixels. |
| Let _coord.xy_ (also referred to as (_s_,_t_)) or _coord.xyz_ (also referred |
| to as (_s_,_t_,_r_)) be the coordinates specified to *read_image{f|i|ui}*. |
| The sampler specified in *read_image{f|i|ui}* is used to determine how to |
| sample the image and return an appropriate color. |
| |
| |
| [[image-coordinates]] |
| == Image Coordinates |
| |
| The sampler determines how image coordinates are interpreted. |
| If image coordinates specified to *read_image{f|i|ui}* are normalized (as |
| specified in the sampler), the _s_, _t_, and _r_ coordinate values are |
| multiplied by w~t~, h~t~, and d~t~, respectively, to generate the |
| unnormalized coordinate values. |
| For image arrays, the image array coordinate (i.e. _t_ if it is a 1D image |
| array or _r_ if it is a 2D image array) specified to *read_image{f|i|ui}* |
| must always be the unnormalized image coordinate value. |
| |
| Let (_u_,_v_,_w_) represent the unnormalized image coordinate values. |
| |
| |
| [[addressing-and-filter-modes]] |
| == Addressing and Filter Modes |
| |
| We first describe how the addressing and filter modes are applied to |
| generate the appropriate sample locations to read from the image if the |
| addressing mode is neither `CLK_ADDRESS_REPEAT` nor |
| `CLK_ADDRESS_MIRRORED_REPEAT`. |
| |
| After generating the image coordinate (_u_,_v_,_w_), we apply the |
| addressing and filter modes to generate the sample locations to read from |
| the image. |
| |
| If values in (_u_,_v_,_w_) are `INF` or NaN, the behavior of |
| *read_image{f|i|ui}* is undefined. |
| |
| *Filter Mode* `CLK_FILTER_NEAREST` |
| |
| When filter mode is `CLK_FILTER_NEAREST`, the image element in the image |
| that is nearest (in Manhattan distance) to that specified by (_u_,_v_,_w_) |
| is obtained. |
| This means the image element at location (_i_,_j_,_k_) becomes the image |
| element value, where |
| |
| [source,opencl_c] |
| ---------- |
| i = address_mode((int)floor(u)) |
| j = address_mode((int)floor(v)) |
| k = address_mode((int)floor(w)) |
| ---------- |
| |
| For a 3D image, the image element at location (_i_,_j_,_k_) becomes the |
| color value. |
| For a 2D image, the image element at location (_i_,_j_) becomes the color |
| value. |
| |
| The following table describes the address_mode function. |
| |
| [[table-address-modes-texel-location]] |
| .Addressing modes to generate texel location |
| [cols=",",options="header",] |
| |==== |
| | Addressing Mode | Result of address_mode(coord) |
| | `CLK_ADDRESS_CLAMP_TO_EDGE` | clamp (coord, 0, size - 1) |
| | `CLK_ADDRESS_CLAMP` | clamp (coord, -1, size) |
| | `CLK_ADDRESS_NONE` | coord |
| |==== |
| |
| The `size` term in this table is w~t~ for _u_, h~t~ for _v_ and d~t~ for |
| _w_. |
| |
| The `clamp` function used in this table is defined as: |
| |
| [source,opencl_c] |
| ---------- |
| clamp(a, b, c) = (a < b) ? b : ((a > c) ? c : a) |
| ---------- |
| |
| If the selected texel location (_i_,_j_,_k_) refers to a location outside |
| the image, the border color is used as the color value for this texel. |
| |
| *Filter Mode* `CLK_FILTER_LINEAR` |
| |
| When filter mode is `CLK_FILTER_LINEAR`, a 2{times}2 square of image |
| elements for a 2D image or a 2{times}2{times}2 cube of image elements for a |
| 3D image is selected. |
| This 2{times}2 square or 2{times}2{times}2 cube is obtained as follows. |
| |
| Let |
| |
| [source,opencl_c] |
| ---------- |
| i0 = address_mode((int)floor(u - 0.5)) |
| j0 = address_mode((int)floor(v - 0.5)) |
| k0 = address_mode((int)floor(w - 0.5)) |
| i1 = address_mode((int)floor(u - 0.5) + 1) |
| j1 = address_mode((int)floor(v - 0.5) + 1) |
| k1 = address_mode((int)floor(w - 0.5) + 1) |
| a = frac(u - 0.5) |
| b = frac(v - 0.5) |
| c = frac(w - 0.5) |
| ---------- |
| |
| where `frac(x)` denotes the fractional part of x and is computed as `x - |
| floor(x)`. |
| |
| For a 3D image, the image element value is found as |
| |
| [source,opencl_c] |
| ---------- |
| T = (1 - a) * (1 - b) * (1 - c) * T_i0j0k0 |
| + a * (1 - b) * (1 - c) * T_i1j0k0 |
| + (1 - a) * b * (1 - c) * T_i0j1k0 |
| + a * b * (1 - c) * T_i1j1k0 |
| + (1 - a) * (1 - b) * c * T_i0j0k1 |
| + a * (1 - b) * c * T_i1j0k1 |
| + (1 - a) * b * c * T_i0j1k1 |
| + a * b * c * T_i1j1k1 |
| ---------- |
| |
| where `T_ijk` is the image element at location (_i_,_j_,_k_) in the 3D image. |
| |
| For a 2D image, the image element value is found as |
| |
| [source,opencl_c] |
| ---------- |
| T = (1 - a) * (1 - b) * T_i0j0 |
| + a * (1 - b) * T_i1j0 |
| + (1 - a) * b * T_i0j1 |
| + a * b * T_i1j1 |
| ---------- |
| |
| where `T_ij` is the image element at location (_i_,_j_) in the 2D image. |
| |
| If any of the selected `T_ijk` or `T_ij` in the above equations refers to a |
| location outside the image, the border color is used as the color value for |
| `T_ijk` or `T_ij`. |
| |
| If the image channel type is `CL_FLOAT` or `CL_HALF_FLOAT` and any of the |
| image elements `T_ijk` or `T_ij` is `INF` or NaN, the behavior of the built-in |
| image read function is undefined. |
| |
| We now discuss how the addressing and filter modes are applied to generate |
| the appropriate sample locations to read from the image if the addressing |
| mode is `CLK_ADDRESS_REPEAT`. |
| |
| If values in (_s_,_t_,_r_) are `INF` or NaN, the behavior of the built-in |
| image read functions is undefined. |
| |
| *Filter Mode* `CLK_FILTER_NEAREST` |
| |
| When filter mode is `CLK_FILTER_NEAREST`, the image element at location |
| (_i_,_j_,_k_) becomes the image element value, with _i_, _j_, and _k_ |
| computed as |
| |
| [source,opencl_c] |
| ---------- |
| u = (s - floor(s)) * w_t |
| i = (int)floor(u) |
| if (i > w_t - 1) |
| i = i - w_t |
| |
| v = (t - floor(t)) * h_t |
| j = (int)floor(v) |
| if (j > h_t - 1) |
| j = j - h_t |
| |
| w = (r - floor(r)) * d_t |
| k = (int)floor(w) |
| if (k > d_t - 1) |
| k = k - d_t |
| ---------- |
| |
| |
| For a 3D image, the image element at location (_i_,_j_,_k_) becomes the |
| color value. |
| For a 2D image, the image element at location (_i_,_j_) becomes the color |
| value. |
| |
| *Filter Mode* `CLK_FILTER_LINEAR` |
| |
| When filter mode is `CLK_FILTER_LINEAR`, a 2{times}2 square of image |
| elements for a 2D image or a 2{times}2{times}2 cube of image elements for a |
| 3D image is selected. |
| This 2{times}2 square or 2{times}2{times}2 cube is obtained as follows. |
| |
| Let |
| |
| [source,opencl_c] |
| ---------- |
| u = (s - floor(s)) * w_t |
| i0 = (int)floor(u - 0.5) |
| i1 = i0 + 1 |
| if (i0 < 0) |
| i0 = w_t + i0 |
| if (i1 > w_t - 1) |
| i1 = i1 - w_t |
| |
| v = (t - floor(t)) * h_t |
| j0 = (int)floor(v - 0.5) |
| j1 = j0 + 1 |
| if (j0 < 0) |
| j0 = h_t + j0 |
| if (j1 > h_t - 1) |
| j1 = j1 - h_t |
| |
| w = (r - floor(r)) * d_t |
| k0 = (int)floor(w - 0.5) |
| k1 = k0 + 1 |
| if (k0 < 0) |
| k0 = d_t + k0 |
| if (k1 > d_t - 1) |
| k1 = k1 - d_t |
| |
| a = frac(u - 0.5) |
| b = frac(v - 0.5) |
| c = frac(w - 0.5) |
| ---------- |
| |
| where `frac(x)` denotes the fractional part of x and is computed as `x - |
| floor(x)`. |
| |
| For a 3D image, the image element value is found as |
| |
| [source,opencl_c] |
| ---------- |
| T = (1 - a) * (1 - b) * (1 - c) * T_i0j0k0 |
| + a * (1 - b) * (1 - c) * T_i1j0k0 |
| + (1 - a) * b * (1 - c) * T_i0j1k0 |
| + a * b * (1 - c) * T_i1j1k0 |
| + (1 - a) * (1 - b) * c * T_i0j0k1 |
| + a * (1 - b) * c * T_i1j0k1 |
| + (1 - a) * b * c * T_i0j1k1 |
| + a * b * c * T_i1j1k1 |
| ---------- |
| |
| where `T_ijk` is the image element at location (_i_,_j_,_k_) in the 3D image. |
| |
| For a 2D image, the image element value is found as |
| |
| [source,opencl_c] |
| ---------- |
| T = (1 - a) * (1 - b) * T_i0j0 |
| + a * (1 - b) * T_i1j0 |
| + (1 - a) * b * T_i0j1 |
| + a * b * T_i1j1 |
| ---------- |
| |
| where `T_ij` is the image element at location (_i_,_j_) in the 2D image. |
| |
| If the image channel type is `CL_FLOAT` or `CL_HALF_FLOAT` and any of the |
| image elements `T_ijk` or `T_ij` is `INF` or NaN, the behavior of the built-in |
| image read function is undefined. |
| |
| We now discuss how the addressing and filter modes are applied to generate |
| the appropriate sample locations to read from the image if the addressing |
| mode is `CLK_ADDRESS_MIRRORED_REPEAT`. |
| The `CLK_ADDRESS_MIRRORED_REPEAT` addressing mode causes the image to be |
| read as if it is tiled at every integer seam with the interpretation of the |
| image data flipped at each integer crossing. |
| For example, the (_s_,_t_,_r_) coordinates between 2 and 3 are addressed |
| into the image as coordinates from 1 down to 0. |
| If values in (_s_,_t_,_r_) are `INF` or NaN, the behavior of the built-in |
| image read functions is undefined. |
| |
| *Filter Mode* `CLK_FILTER_NEAREST` |
| |
| When filter mode is `CLK_FILTER_NEAREST`, the image element at location |
| (_i_,_j_,_k_) becomes the image element value, with _i_, _j_, and _k_ |
| computed as |
| |
| [source,opencl_c] |
| ---------- |
| s' = 2.0f * rint(0.5f * s) |
| s' = fabs(s - s') |
| u = s' * w_t |
| i = (int)floor(u) |
| i = min(i, w_t - 1) |
| |
| t' = 2.0f * rint(0.5f * t) |
| t' = fabs(t - t') |
| v = t' * h_t |
| j = (int)floor(v) |
| j = min(j, h_t - 1) |
| |
| r' = 2.0f * rint(0.5f * r) |
| r' = fabs(r - r') |
| w = r' * d_t |
| k = (int)floor(w) |
| k = min(k, d_t - 1) |
| ---------- |
| |
| For a 3D image, the image element at location (_i_,_j_,_k_) becomes the |
| color value. |
| For a 2D image, the image element at location (_i_,_j_) becomes the color |
| value. |
| |
| *Filter Mode* `CLK_FILTER_LINEAR` |
| |
| When filter mode is `CLK_FILTER_LINEAR`, a 2{times}2 square of image |
| elements for a 2D image or a 2{times}2{times}2 cube of image elements for a |
| 3D image is selected. |
| This 2{times}2 square or 2{times}2{times}2 cube is obtained as follows. |
| |
| Let |
| |
| [source,opencl_c] |
| ---------- |
| s' = 2.0f * rint(0.5f * s) |
| s' = fabs(s - s') |
| u = s' * w_t |
| i0 = (int)floor(u - 0.5f) |
| i1 = i0 + 1 |
| i0 = max(i0, 0) |
| i1 = min(i1, w_t - 1) |
| |
| t' = 2.0f * rint(0.5f * t) |
| t' = fabs(t - t') |
| v = t' * h_t |
| j0 = (int)floor(v - 0.5f) |
| j1 = j0 + 1 |
| j0 = max(j0, 0) |
| j1 = min(j1, h_t - 1) |
| |
| r' = 2.0f * rint(0.5f * r) |
| r' = fabs(r - r') |
| w = r' * d_t |
| k0 = (int)floor(w - 0.5f) |
| k1 = k0 + 1 |
| k0 = max(k0, 0) |
| k1 = min(k1, d_t - 1) |
| |
| a = frac(u - 0.5) |
| b = frac(v - 0.5) |
| c = frac(w - 0.5) |
| ---------- |
| |
| where `frac(x)` denotes the fractional part of x and is computed as `x - |
| floor(x)`. |
| |
| For a 3D image, the image element value is found as |
| |
| [source,opencl_c] |
| ---------- |
| T = (1 - a) * (1 - b) * (1 - c) * T_i0j0k0 |
| + a * (1 - b) * (1 - c) * T_i1j0k0 |
| + (1 - a) * b * (1 - c) * T_i0j1k0 |
| + a * b * (1 - c) * T_i1j1k0 |
| + (1 - a) * (1 - b) * c * T_i0j0k1 |
| + a * (1 - b) * c * T_i1j0k1 |
| + (1 - a) * b * c * T_i0j1k1 |
| + a * b * c * T_i1j1k1 |
| ---------- |
| |
| where `T_ijk` is the image element at location (_i_,_j_,_k_) in the 3D image. |
| |
| For a 2D image, the image element value is found as |
| |
| [source,opencl_c] |
| ---------- |
| T = (1 - a) * (1 - b) * T_i0j0 |
| + a * (1 - b) * T_i1j0 |
| + (1 - a) * b * T_i0j1 |
| + a * b * T_i1j1 |
| ---------- |
| |
| where `T_ij` is the image element at location (_i_,_j_) in the 2D image. |
| |
| For a 1D image, the image element value is found as |
| |
| [source,opencl_c] |
| ---------- |
| T = (1 - a) * T_i0 |
| + a * T_i1 |
| ---------- |
| |
| where `T_i` is the image element at location (_i_) in the 1D image. |
| |
| If the image channel type is `CL_FLOAT` or `CL_HALF_FLOAT` and any of the |
| image elements `T_ijk` or `T_ij` is `INF` or NaN, the behavior of the built-in |
| image read function is undefined. |
| |
| [NOTE] |
| ==== |
| If the sampler is specified as using unnormalized coordinates |
| (floating-point or integer coordinates), filter mode set to |
| `CLK_FILTER_NEAREST`, and addressing mode set to one of |
| `CLK_ADDRESS_NONE`, `CLK_ADDRESS_CLAMP_TO_EDGE`, or `CLK_ADDRESS_CLAMP`, the |
| <<addressing-and-filter-modes,location of the image element in the image>> |
| given by (_i_,_j_,_k_) will be computed without any loss of precision. |
| |
| For all other sampler combinations of normalized or unnormalized |
| coordinates, filter and addressing modes, the relative error or precision of |
| the addressing mode calculations and the image filter operation are not |
| defined by this revision of the OpenCL specification. |
| To ensure a minimum precision of image addressing and filter calculations |
| across any OpenCL device for these sampler combinations, developers should |
| unnormalize the image coordinate in the kernel, read the required image |
| elements with appropriate calls to *read_image{f|i|ui}* using a sampler that |
| uses unnormalized coordinates, filter mode set to `CLK_FILTER_NEAREST`, and |
| addressing mode set to `CLK_ADDRESS_NONE`, `CLK_ADDRESS_CLAMP_TO_EDGE`, or |
| `CLK_ADDRESS_CLAMP`, and then interpolate the color values read from the |
| image to generate the filtered color value. |
| ==== |
| |
| |
| [[conversion-rules]] |
| == Conversion Rules |
| |
| In this section we discuss conversion rules that are applied when reading |
| and writing images in a kernel. |
| |
| |
| [[conversion-rules-for-normalized-integer-channel-data-types]] |
| === Conversion Rules for Normalized Integer Channel Data Types |
| |
| In this section we discuss converting normalized integer channel data types |
| to floating-point values and vice-versa. |
| |
| |
| [[converting-normalized-integer-channel-data-types-to-floating-point-values]] |
| ==== Converting Normalized Integer Channel Data Types to Floating-point Values |
| |
| For images created with image channel data type of `CL_UNORM_INT8` and |
| `CL_UNORM_INT16`, *read_imagef* will convert the channel values from an |
| 8-bit or 16-bit unsigned integer to normalized floating-point values in the |
| range [`0.0f`, `1.0f`]. |
| |
| For images created with image channel data type of `CL_SNORM_INT8` and |
| `CL_SNORM_INT16`, *read_imagef* will convert the channel values from an |
| 8-bit or 16-bit signed integer to normalized floating-point values in the |
| range [`-1.0f`, `1.0f`]. |
| |
| These conversions are performed as follows: |
| |
| `CL_UNORM_INT8` (8-bit unsigned integer) {rightarrow} `float` |
| |
| [none] |
| * normalized `float` value = `(float)c / 255.0f` |
| |
| `CL_UNORM_INT_101010` (10-bit unsigned integer) {rightarrow} `float` |
| |
| [none] |
| * normalized `float` value = `(float)c / 1023.0f` |
| |
| `CL_UNORM_INT16` (16-bit unsigned integer) {rightarrow} `float` |
| |
| [none] |
| * normalized `float` value = `(float)c / 65535.0f` |
| |
| `CL_SNORM_INT8` (8-bit signed integer) {rightarrow} `float` |
| |
| [none] |
| * normalized `float` value = *max*(`-1.0f`, `(float)c / 127.0f`) |
| |
| `CL_SNORM_INT16` (16-bit signed integer) {rightarrow} `float` |
| |
| [none] |
| * normalized `float` value = *max*(`-1.0f`, `(float)c / 32767.0f`) |
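
As a host-side illustration (C99, hypothetical function names), each conversion above is a single division. The *max* with `-1.0f` only affects the most negative code, which would otherwise fall slightly below `-1.0f`. Because IEEE 754 division is correctly rounded, these direct formulas also produce the exact endpoint values required below.

```c
#include <stdint.h>

/* Hypothetical helpers mirroring the normalized integer -> float rules. */
static float unorm8_to_float(uint8_t c)   { return (float)c / 255.0f; }
static float unorm10_to_float(uint16_t c) { return (float)c / 1023.0f; } /* c in [0, 1023] */
static float unorm16_to_float(uint16_t c) { return (float)c / 65535.0f; }

static float snorm8_to_float(int8_t c)
{
    float f = (float)c / 127.0f;
    return f < -1.0f ? -1.0f : f;  /* max(-1.0f, c / 127.0f) */
}

static float snorm16_to_float(int16_t c)
{
    float f = (float)c / 32767.0f;
    return f < -1.0f ? -1.0f : f;  /* max(-1.0f, c / 32767.0f) */
}
```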
| |
| The precision of the above conversions is \<= 1.5 ulp except for the |
| following cases: |
| |
| For `CL_UNORM_INT8` |
| |
| [none] |
| * 0 must convert to `0.0f` and |
| * 255 must convert to `1.0f` |
| |
| For `CL_UNORM_INT_101010` |
| |
| [none] |
| * 0 must convert to `0.0f` and |
| * 1023 must convert to `1.0f` |
| |
| For `CL_UNORM_INT16` |
| |
| [none] |
| * 0 must convert to `0.0f` and |
| * 65535 must convert to `1.0f` |
| |
| For `CL_SNORM_INT8` |
| |
| [none] |
| * -128 and -127 must convert to `-1.0f`, |
| * 0 must convert to `0.0f` and |
| * 127 must convert to `1.0f` |
| |
| For `CL_SNORM_INT16` |
| |
| [none] |
| * -32768 and -32767 must convert to `-1.0f`, |
| * 0 must convert to `0.0f` and |
| * 32767 must convert to `1.0f` |
| |
| |
| ifdef::cl_khr_fp16[] |
| [[converting-normalized-integer-channel-data-types-to-half-precision-floating-point-values]] |
| ==== Converting Normalized Integer Channel Data Types to Half-Precision Floating-Point Values |
| |
| If the `<<cl_khr_fp16>>` extension is supported, then |
| for images created with image channel data type of `CL_UNORM_INT8` and |
| `CL_UNORM_INT16`, *read_imageh* will convert the channel values from an |
| 8-bit or 16-bit unsigned integer to normalized half-precision floating-point |
| values in the range [`0.0h`, `1.0h`]. |
| |
| For images created with image channel data type of `CL_SNORM_INT8` and |
| `CL_SNORM_INT16`, *read_imageh* will convert the channel values from an |
| 8-bit or 16-bit signed integer to normalized half-precision floating-point |
| values in the range [`-1.0h`, `1.0h`]. |
| |
| These conversions are performed as follows: |
| |
| `CL_UNORM_INT8` (8-bit unsigned integer) {rightarrow} `half` |
| |
| [none] |
| * normalized `half` value = `round_to_half(c / 255)` |
| |
| `CL_UNORM_INT_101010` (10-bit unsigned integer) {rightarrow} `half` |
| |
| [none] |
| * normalized `half` value = `round_to_half(c / 1023)` |
| |
| `CL_UNORM_INT16` (16-bit unsigned integer) {rightarrow} `half` |
| |
| [none] |
| * normalized `half` value = `round_to_half(c / 65535)` |
| |
| `CL_SNORM_INT8` (8-bit signed integer) {rightarrow} `half` |
| |
| [none] |
| * normalized `half` value = *max*(`-1.0h`, `round_to_half(c / 127)`) |
| |
| `CL_SNORM_INT16` (16-bit signed integer) {rightarrow} `half` |
| |
| [none] |
| * normalized `half` value = *max*(`-1.0h`, `round_to_half(c / 32767)`) |
| |
| The precision of the above conversions is \<= 1.5 ulp except for the |
| following cases: |
| |
| For `CL_UNORM_INT8` |
| |
| [none] |
| * 0 must convert to `0.0h` and |
| * 255 must convert to `1.0h` |
| |
| For `CL_UNORM_INT_101010` |
| |
| [none] |
| * 0 must convert to `0.0h` and |
| * 1023 must convert to `1.0h` |
| |
| For `CL_UNORM_INT16` |
| |
| [none] |
| * 0 must convert to `0.0h` and |
| * 65535 must convert to `1.0h` |
| |
| For `CL_SNORM_INT8` |
| |
| [none] |
| * -128 and -127 must convert to `-1.0h`, |
| * 0 must convert to `0.0h` and |
| * 127 must convert to `1.0h` |
| |
| For `CL_SNORM_INT16` |
| |
| [none] |
| * -32768 and -32767 must convert to `-1.0h`, |
| * 0 must convert to `0.0h` and |
| * 32767 must convert to `1.0h` |
| endif::cl_khr_fp16[] |
| |
| |
| [[converting-floating-point-values-to-normalized-integer-channel-data-types]] |
| ==== Converting Floating-Point Values to Normalized Integer Channel Data Types |
| |
| For images created with image channel data type of `CL_UNORM_INT8` and |
| `CL_UNORM_INT16`, *write_imagef* will convert the floating-point color value |
| to an 8-bit or 16-bit unsigned integer. |
| |
| For images created with image channel data type of `CL_SNORM_INT8` and |
| `CL_SNORM_INT16`, *write_imagef* will convert the floating-point color value |
| to an 8-bit or 16-bit signed integer. |
| |
| The preferred method for how conversions from floating-point values to |
| normalized integer values are performed is as follows: |
| |
| `float` {rightarrow} `CL_UNORM_INT8` (8-bit unsigned integer) |
| |
| [none] |
| * *convert_uchar_sat_rte*(`f * 255.0f`) |
| |
| `float` {rightarrow} `CL_UNORM_INT_101010` (10-bit unsigned integer) |
| |
| [none] |
| * *min*(*convert_ushort_sat_rte*(`f * 1023.0f`), `0x3ff`) |
| |
| `float` {rightarrow} `CL_UNORM_INT16` (16-bit unsigned integer) |
| |
| [none] |
| * *convert_ushort_sat_rte*(`f * 65535.0f`) |
| |
| `float` {rightarrow} `CL_SNORM_INT8` (8-bit signed integer) |
| |
| [none] |
| * *convert_char_sat_rte*(`f * 127.0f`) |
| |
| `float` {rightarrow} `CL_SNORM_INT16` (16-bit signed integer) |
| |
| [none] |
| * *convert_short_sat_rte*(`f * 32767.0f`) |
| |
| Please refer to the <<out-of-range-behavior,out-of-range behavior and |
| saturated conversion>> rules. |
| |
| OpenCL implementations may choose to approximate the rounding mode used in |
| the conversions described above. |
| If a rounding mode other than round to nearest even (`_rte`) is used, the |
| absolute error of the implementation-dependent rounding mode relative to |
| the result produced by the round to nearest even rounding mode must be {leq} |
| 0.6. |
| |
| `float` {rightarrow} `CL_UNORM_INT8` (8-bit unsigned integer) |
| |
| [none] |
| * Let f~preferred~ = *convert_uchar_sat_rte*(f * `255.0f`) |
| * Let f~approx~ = *convert_uchar_sat_<impl-rounding-mode>*(f * `255.0f`) |
| * *fabs*(f~preferred~ - f~approx~) must be \<= 0.6 |
| |
| `float` {rightarrow} `CL_UNORM_INT_101010` (10-bit unsigned integer) |
| |
| [none] |
| * Let f~preferred~ = *convert_ushort_sat_rte*(f * `1023.0f`) |
| * Let f~approx~ = *convert_ushort_sat_<impl-rounding-mode>*(f * |
| `1023.0f`) |
| * *fabs*(f~preferred~ - f~approx~) must be \<= 0.6 |
| |
| `float` {rightarrow} `CL_UNORM_INT16` (16-bit unsigned integer) |
| |
| [none] |
| * Let f~preferred~ = *convert_ushort_sat_rte*(f * `65535.0f`) |
| * Let f~approx~ = *convert_ushort_sat_<impl-rounding-mode>*(f * |
| `65535.0f`) |
| * *fabs*(f~preferred~ - f~approx~) must be \<= 0.6 |
| |
| `float` {rightarrow} `CL_SNORM_INT8` (8-bit signed integer) |
| |
| [none] |
| * Let f~preferred~ = *convert_char_sat_rte*(f * `127.0f`) |
| * Let f~approx~ = *convert_char_sat_<impl-rounding-mode>*(f * `127.0f`) |
| * *fabs*(f~preferred~ - f~approx~) must be \<= 0.6 |
| |
| `float` {rightarrow} `CL_SNORM_INT16` (16-bit signed integer) |
| |
| [none] |
| * Let f~preferred~ = *convert_short_sat_rte*(f * `32767.0f`) |
| * Let f~approx~ = *convert_short_sat_<impl-rounding-mode>*(f * |
| `32767.0f`) |
| * *fabs*(f~preferred~ - f~approx~) must be \<= 0.6 |
| |
| |
| ifdef::cl_khr_fp16[] |
| [[converting-half-precision-floating-point-values-to-normalized-integer-channel-data-types]] |
| ==== Converting Half-Precision Floating-point Values to Normalized Integer Channel Data Types |
| |
| If the `<<cl_khr_fp16>>` extension is supported, then |
| for images created with image channel data type of `CL_UNORM_INT8` and |
| `CL_UNORM_INT16`, *write_imageh* will convert the floating-point color value |
| to an 8-bit or 16-bit unsigned integer. |
| |
| For images created with image channel data type of `CL_SNORM_INT8` and |
| `CL_SNORM_INT16`, *write_imageh* will convert the floating-point color value |
| to an 8-bit or 16-bit signed integer. |
| |
| The preferred conversion uses the round to nearest even (`_rte`) rounding |
| mode, but OpenCL implementations may choose to approximate the rounding mode |
| used in the conversions described below. |
| When approximate rounding is used instead of the preferred rounding, the |
| result of the conversion must satisfy the bound given below. |
| |
| `half` {rightarrow} `CL_UNORM_INT8` (8-bit unsigned integer) |
| |
| [none] |
| * Let f~exact~ = *max*(`0`, *min*(`f * 255`, `255`)) |
| * Let f~preferred~ = *convert_uchar_sat_rte*(`f * 255.0f`) |
| * Let f~approx~ = *convert_uchar_sat_<impl-rounding-mode>*(`f * 255.0f`) |
| * *fabs*(f~exact~ - f~approx~) must be \<= 0.6 |
| |
| `half` {rightarrow} `CL_UNORM_INT_101010` (10-bit unsigned integer) |
| |
| [none] |
| * Let f~exact~ = *max*(`0`, *min*(`f * 1023`, `1023`)) |
| * Let f~preferred~ = *min*(*convert_ushort_sat_rte*(`f * 1023.0f`), |
| `1023`) |
| * Let f~approx~ = *convert_ushort_sat_<impl-rounding-mode>*(`f * 1023.0f`) |
| * *fabs*(f~exact~ - f~approx~) must be \<= 0.6 |
| |
| `half` {rightarrow} `CL_UNORM_INT16` (16-bit unsigned integer) |
| |
| [none] |
| * Let f~exact~ = *max*(`0`, *min*(`f * 65535`, `65535`)) |
| * Let f~preferred~ = *convert_ushort_sat_rte*(`f * 65535.0f`) |
| * Let f~approx~ = *convert_ushort_sat_<impl-rounding-mode>*(`f * |
| 65535.0f`) |
| * *fabs*(f~exact~ - f~approx~) must be \<= 0.6 |
| |
| `half` {rightarrow} `CL_SNORM_INT8` (8-bit signed integer) |
| |
| [none] |
| * Let f~exact~ = *max*(`-128`, *min*(`f * 127`, `127`)) |
| * Let f~preferred~ = *convert_char_sat_rte*(`f * 127.0f`) |
| * Let f~approx~ = *convert_char_sat_<impl-rounding-mode>*(`f * 127.0f`) |
| * *fabs*(f~exact~ - f~approx~) must be \<= 0.6 |
| |
| `half` {rightarrow} `CL_SNORM_INT16` (16-bit signed integer) |
| |
| [none] |
| * Let f~exact~ = *max*(`-32768`, *min*(`f * 32767`, `32767`)) |
| * Let f~preferred~ = *convert_short_sat_rte*(`f * 32767.0f`) |
| * Let f~approx~ = *convert_short_sat_<impl-rounding-mode>*(`f * 32767.0f`) |
| * *fabs*(f~exact~ - f~approx~) must be \<= 0.6 |
| endif::cl_khr_fp16[] |
| |
| |
| [[conversion-rules-for-half-precision-floating-point-channel-data-type]] |
| === Conversion Rules for Half-Precision Floating-Point Channel Data Type |
| |
| For images created with a channel data type of `CL_HALF_FLOAT`, the |
| conversions from `half` to `float` are lossless (as described in |
| <<the-half-data-type,"The half data type">>). |
| Conversions from `float` to `half` round the mantissa using the round to |
| nearest even or round to zero rounding mode. |
| Denormalized `half` values, which may be generated when converting a |
| `float` to a `half`, may be flushed to zero. |
| A `float` NaN must be converted to an appropriate NaN in the `half` type. |
| A `float` `INF` must be converted to an appropriate `INF` in the `half` |
| type. |
| |
| |
| [[conversion-rules-for-floating-point-channel-data-type]] |
| === Conversion Rules for Floating-Point Channel Data Type |
| |
| The following rules apply for reading and writing images created with |
| channel data type of `CL_FLOAT`. |
| |
| * NaNs may be converted to NaN values supported by the device. |
| * Denorms can be flushed to zero. |
| * All other values must be preserved. |
| |
| |
| [[conversion-rules-for-signed-and-unsigned-8-bit-16-bit-and-32-bit-integer-channel-data-types]] |
| === Conversion Rules for Signed and Unsigned 8-Bit, 16-Bit and 32-Bit Integer Channel Data Types |
| |
| Calls to *read_imagei* with channel data type values of `CL_SIGNED_INT8`, |
| `CL_SIGNED_INT16` and `CL_SIGNED_INT32` return the unmodified integer values |
| stored in the image at the specified location. |
| |
| Calls to *read_imageui* with channel data type values of `CL_UNSIGNED_INT8`, |
| `CL_UNSIGNED_INT16` and `CL_UNSIGNED_INT32` return the unmodified integer |
| values stored in the image at the specified location. |
| |
| Calls to *write_imagei* will perform one of the following conversions: |
| |
| 32-bit signed integer {rightarrow} 8-bit signed integer |
| |
| [none] |
| * *convert_char_sat*(i) |
| |
| 32-bit signed integer {rightarrow} 16-bit signed integer |
| |
| [none] |
| * *convert_short_sat*(i) |
| |
| 32-bit signed integer {rightarrow} 32-bit signed integer |
| |
| [none] |
| * no conversion is performed |
| |
| Calls to *write_imageui* will perform one of the following conversions: |
| |
| 32-bit unsigned integer {rightarrow} 8-bit unsigned integer |
| |
| [none] |
| * *convert_uchar_sat*(i) |
| |
| 32-bit unsigned integer {rightarrow} 16-bit unsigned integer |
| |
| [none] |
| * *convert_ushort_sat*(i) |
| |
| 32-bit unsigned integer {rightarrow} 32-bit unsigned integer |
| |
| [none] |
| * no conversion is performed |
| |
| The conversions described in this section must be correctly saturated. |
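
For reference, the saturating conversions used by *write_imagei* and *write_imageui* behave like the following host-side C99 helpers. The names are hypothetical stand-ins for *convert_char_sat*, *convert_short_sat*, *convert_uchar_sat*, and *convert_ushort_sat*.

```c
#include <stdint.h>

/* Hypothetical stand-ins for convert_char_sat / convert_short_sat. */
static int8_t sat_char(int32_t i)
{
    if (i < INT8_MIN) return INT8_MIN;
    if (i > INT8_MAX) return INT8_MAX;
    return (int8_t)i;
}

static int16_t sat_short(int32_t i)
{
    if (i < INT16_MIN) return INT16_MIN;
    if (i > INT16_MAX) return INT16_MAX;
    return (int16_t)i;
}

/* Hypothetical stand-ins for convert_uchar_sat / convert_ushort_sat. */
static uint8_t sat_uchar(uint32_t u)
{
    return u > UINT8_MAX ? UINT8_MAX : (uint8_t)u;
}

static uint16_t sat_ushort(uint32_t u)
{
    return u > UINT16_MAX ? UINT16_MAX : (uint16_t)u;
}
```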
| |
| |
| [[conversion-rules-for-srgba-and-sbgra-images]] |
| === Conversion Rules for sRGBA and sBGRA Images |
| |
| Standard RGB (sRGB) data represents colors as a roughly linear ramp of |
| luminosity levels, such that an average observer, under average viewing |
| conditions, can view them as perceptually equal steps on an average display. |
| All 0s map to `0.0f`, and all 1s map to `1.0f`. |
| The sequence of unsigned integer encodings between all 0s and all 1s |
| represents a nonlinear progression in the floating-point interpretation of |
| the numbers from `0.0f` to `1.0f`. |
| For more detail, see the <<sRGB-spec,sRGB color standard>>. |
| |
| Conversion from sRGB space is automatically done by the *read_imagef* |
| built-in functions if the image channel order is one of the sRGB values |
| described above. |
| When reading from an sRGB image, the conversion from sRGB to linear RGB is |
| performed before the filter specified in the sampler passed to |
| *read_imagef* is applied. |
| Conversion to sRGB space is automatically done by the *write_imagef* |
| built-in functions if the image channel order is one of the sRGB values |
| described above and the device supports writing to sRGB images. |
| |
| If the format has an alpha channel, the alpha data is stored in linear color |
| space. |
| |
| The following is the conversion rule for converting a normalized 8-bit |
| unsigned integer sRGB color value to a floating-point linear RGB color value |
| using *read_imagef*. |
| |
| [source,opencl_c] |
| ---------- |
| // Convert the normalized 8-bit unsigned integer R, G and B channel values |
| // to a floating-point value (call it c) as per rules described in section |
| // 8.3.1.1. |
| |
| if (c <= 0.04045) |
| result = c / 12.92; |
| else |
| result = powr((c + 0.055) / 1.055, 2.4); |
| ---------- |
| |
| The resulting floating-point value, if converted back to an sRGB value |
| without rounding to an 8-bit unsigned integer value, must be within 0.5 ulp |
| of the original sRGB value. |
| |
| The following are the conversion rules for converting a linear RGB |
| floating-point color value (call it _c_) to a normalized 8-bit unsigned |
| integer sRGB value using *write_imagef*. |
| |
| [source,opencl_c] |
| ---------- |
| if (c is NaN) |
| c = 0.0; |
| if (c > 1.0) |
| c = 1.0; |
| else if (c < 0.0) |
| c = 0.0; |
| else if (c < 0.0031308) |
| c = 12.92 * c; |
| else |
| c = 1.055 * powr(c, 1.0/2.4) - 0.055; |
| |
| scaled_reference_result = c * 255; |
| channel_component = floor(scaled_reference_result + 0.5); |
| ---------- |
| |
| The precision of the above conversion should be such that |
| |
| [none] |
| * `|generated_channel_component - scaled_reference_result|` {leq} 0.6 |
| |
| where `generated_channel_component` is the actual value produced by the |
| implementation being checked for conformance. |
| |
| |
| [[selecting-an-image-from-an-image-array]] |
| == Selecting an Image From an Image Array |
| |
| Let (_u_,_v_,_w_) represent the unnormalized image coordinate values for |
| reading from and/or writing to a 2D image in a 2D image array. |
| |
| When read using a sampler, the 2D image layer selected is computed as: |
| |
| [none] |
| * layer = *clamp*(*rint*(_w_), 0, d~t~ - 1) |
| |
| otherwise the layer selected is computed as: |
| |
| [none] |
| * layer = _w_ |
| |
| (since _w_ is already an integer) and the result is undefined if _w_ is not |
| one of the integers 0, 1, ... d~t~ - 1. |
| |
| Let (_u_,_v_) represent the unnormalized image coordinate values for reading |
| from and/or writing to a 1D image in a 1D image array. |
| |
| When read using a sampler, the 1D image layer selected is computed as: |
| |
| [none] |
| * layer = *clamp*(*rint*(_v_), 0, h~t~ - 1) |
| |
| otherwise the layer selected is computed as: |
| |
| [none] |
| * layer = _v_ |
| |
| (since _v_ is already an integer) and the result is undefined if _v_ is not |
| one of the integers 0, 1, ... h~t~ - 1. |
| |
| |
| [[references]] |
| = Normative References |
| |
| . [[C99-spec]] "`ISO/IEC 9899:1999 - Programming languages - C`", with |
| technical corrigenda TC1 and TC2, |
| https://www.iso.org/standard/29237.html . |
| References are to sections of this specific version, referred to as the |
| "`C99 Specification`", although other versions exist. |
| . [[C11-spec]] "`ISO/IEC 9899:2011 - Information technology - Programming |
| languages - C`", https://www.iso.org/standard/57853.html . |
| References are to sections of this specific version, referred to as the |
| "`C11 Specification`", although other versions exist. |
| . [[opencl-spec]] "`The OpenCL Specification, Version 3.0, Unified`", |
| https://www.khronos.org/registry/OpenCL/ . |
| References are to sections and tables of this specific version, although |
| other versions exist. |
| . [[opencl-device-queries]] "`Device Queries`" are defined in the |
| <<opencl-spec,OpenCL Specification>> for *clGetDeviceInfo*, and the |
| individual queries are defined in the "`OpenCL Device Queries`" table |
| (4.3) of that Specification. |
| . [[opencl-channel-order,image channel order]] "`Image Channel Order`" is |
| defined in the <<opencl-spec,OpenCL Specification>> in the "`Image |
| Format Descriptor`" section (5.3.1.1), and the individual channel orders |
| are defined in the "`List of supported Image Channel Order Values`" |
| table (5.6) of that Specification. |
| . [[opencl-channel-data-type,image channel data type]] "`Image Channel |
| Data Type`" is defined in the <<opencl-spec,OpenCL Specification>> in the |
| "`Image Format Descriptor`" section (5.3.1.1), and the individual |
| channel data types are defined in the "`List of supported Image Channel |
| Data Types`" table (5.7) of that Specification. |
| . [[opencl-extension-spec]] "`The OpenCL Extension Specification, Version |
| 3.0, Unified`", https://www.khronos.org/registry/OpenCL/ . |
| References are to sections and tables of this specific version, although |
| other versions exist. |
| . [[sRGB-spec]] "`IEC 61966-2-1:1999 Multimedia systems and equipment - |
| Colour measurement and management - Part 2-1: Colour management - |
| Default RGB colour space - sRGB`", |
| https://webstore.iec.ch/publication/6169 . |
| . [[embedded-c-spec]] "`ISO/IEC TR 18037:2008 Programming languages - |
| C - Extensions to support embedded processors`", |
| https://www.iso.org/standard/51126.html . |
| References are to sections of this specific version, referred to as the |
| "`Embedded C Specification`", although other versions exist. |
| |
| :numbered!: |
| |
| include::c/appendix_a.asciidoc[] |
| |
| // This is generating asciidoctor errors: |
| // OpenCL_C.txt: Failed to load AsciiDoc document - undefined method `+' for nil:NilClass |
| // Disabling acknowledgements for now. We have them in the API spec already. |
| //<<< |
| //:numbered!: |
| //include::api/acknowledgements.asciidoc[] |