The OpenCL C 1.2 language provides an expressive variant of the C language with which to program heterogeneous architectures. There is already a significant body of code in the wild written in the OpenCL C language - both in open source and proprietary software. This document explains how the OpenCL C language is mapped onto an implementation of the Vulkan standard for high-performance graphics and compute.
The following subjects are covered:
The sampler map is deprecated. It is no longer necessary to specify a sampler map for literal samplers. Refer to the bindings generated for the literal samplers in descriptor map. The -samplermap option is still accepted, but support will be removed at later date. A single sampler binding is generated for each unique literal sampler in the program.
The descriptor map and it's C++ API are deprecated. Clspv is transitioning to embedding equivalent information directly into the SPIR-V binary via a non-semantic instruction set. The old ‘-descriptormap’ option is no longer accepted by clspv. That file can still be generated by running the new ‘clspv-reflection’ executable. It will produce an equivalent descriptor map.
The SPIR-V as produced from the OpenCL C language can make use of the following additional extensions:
The SPIR-V as produced from the OpenCL C language can make use of the following capabilities:
Shader
as we are targeting the OpenCL C language at a Vulkan implementation.VariablePointersStorageBuffer
, from the SPV_KHR_variable_pointers extension.VariablePointers
, from the SPV_KHR_variable_pointers extension.Int8
if char or uchar types (or composites of them) are used.Int16
if short or ushort types (or composites of them) are used.Int64
if long or ulong types (or composites of them) are used.Float16
if the half type (or composites of it) is used.Float64
if the double type (or composites of it) is used.ImageStorageWriteWithoutFormat
if write_only images are used.ImageQuery
if any image query is used.Image1D
if a write_only image is used.Sampled1D
if a read_only image is used.GroupNonUniform
(see cl_khr_subgroups below)The command-line switch ‘-spv-version’ can be used to specify the SPIR-V output version. Only ‘1.0’ and ‘1.3’ are currently supported, corresponding with vk versions ‘1.0’ and ‘1.1’ respectively.
A Vulkan implementation that is to consume the SPIR-V produced from the OpenCL C language must conform to the following the rules:
shaderInt16
field of VkPhysicalDeviceFeatures
must be set to true.shaderStorageImageReadWithoutFormat
field of VkPhysicalDeviceFeatures
must be set to true.shaderStorageImageWriteWithoutFormat
field of VkPhysicalDeviceFeatures
must be set to true.vkCreateDevice()
where the ppEnabledExtensionNames
field of VkDeviceCreateInfo
contains extension strings “VK_KHR_storage_buffer_storage_class” and “VK_KHR_variable_pointers” must succeed.In order to pass 8- or 16-bit types in the shader interface, Vulkan requires the appropriate bits are set in VkPhysicalDevice8BitStorageFeaturesKHR for 8-bit types or VkPhysicalDevice16BitStorageFeaturesKHR (or the appropriate Vulkan 1.1 or 1.2 feature structs). By default clspv assumes all feature bits are enabled, but provides options to disallow 8- or 16-bit interfaces. -no-8bit-storage reflects 8-bit storage features and -no-16bit-storage reflects 16-bit storage features. Each option can take the following values (can be specified in a comma-separated list or multiple times):
ssbo
: Represents the storage buffer feature bit.ubo
: Represents the uniform and storage buffer feature bit.pushconstant
: Represents the push constant feature bit.For example, if your device only supports storageBuffer16BitAccess (and no 8-bit interfaces), pass the following on the command line:
-no-16bit-storage=ubo,pushconstant -no-8bit-storage=ssbo,ubo,pushconstant
OpenCL C kernel argument types are mapped to Vulkan descriptor types in the following way:
VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE
.VK_DESCRIPTOR_TYPE_STORAGE_IMAGE
.VK_DESCRIPTOR_TYPE_SAMPLER
.VK_DESCRIPTOR_TYPE_STORAGE_BUFFER
. If option -constant-args-ubo' is used and the kernel has constant pointer types, set VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER
.VK_DESCRIPTOR_TYPE_STORAGE_BUFFER
by default. If the option -pod-ubo
is used the descriptor set type isVK_DESCRIPTOR_TYPE_UNIFORM_BUFFER
. If the option -pod-pushconstant
is used the arg is instead passed via push constants.Note: -pod-ubo
and -pod-pushconstant
are exclusive options.
Note: By default, all plain-old-data kernel arguments are collected into a single structure to be passed in to the compute shader as a single resource. If -cluster-pod-kernel-args=0
is specified, each plain-old-data argument will be passed in a via a unique resource.
Note: -pod-pushconstant
cannot be specified with -cluster-pod-kernel-args=0
.
Some OpenCL C language features that are not natively expressible in Vulkan's variant of SPIR-V, require a subtle mapping to how Vulkan SPIR-V represents the corresponding functionality.
An additional preprocessor macro VULKAN
is set, to allow developers to guard OpenCL C functionality based on whether the Vulkan API is being targeted or not. This value is set to 100, to match Vulkan version 1.0.
OpenCL C language kernels take the form:
void kernel foo(global int* a, global float* b, uint c, float2 local* d);
SPIR-V tracks OpenCL C language kernels using OpEntryPoint
opcodes that denote the entry-points where an API interacts with a compute kernel.
Vulkan's variant of SPIR-V requires that the entry-points be void
return functions, and that they take no arguments. To pass data into Vulkan SPIR-V shaders, OpVariable
s are declared outside of the functions, and decorated with DescriptorSet
and Binding
decorations, to denote that the shaders can interact with their data.
The default way to map an OpenCL C language kernel to a Vulkan SPIR-V compute shader is as follows:
__constant
variables are mapped.) This is new default behaviour.-distinct-kernel-descriptor-sets
to get the old behaviour, where each kernel is assigned its own descriptor set number, such that the first kernel has descriptor set 0, and each subsequent kernel is an increment of 1 from the previous.DescriptorSet
.global
or constant
pointer, it is placed into a SPIR-V OpTypeStruct
that is decorated with Block
, and an OpVariable
of this structure type is created and decorated with the corresponding DescriptorSet
and Binding
, using the StorageBuffer
storage class.OpTypeStruct
that is decorated with Block
, and an OpVariable
of this structure type is created and decorated with the corresponding DescriptorSet
and Binding
, using the StorageBuffer
storage class.OpVariable
of the OpTypeImage
or OpTypeSampler
type is created and decorated with the corresponding DescriptorSet
and Binding
, using the UniformConstant
storage class.__local
storage, then no descriptor is generated. Instead, that argument is mapped to a variable in Workgroup
storage class, of type array-of-T. The array size is specified by an integer specialization constant. The specialization ID is reported in the descriptor map file, generated via the -descriptormap
option.The shaders produced use the GLSL450 memory model. As such, there is an assumption of no aliasing by default. The compiler does not generate Aliased decorations currently. Users should be aware of this and ensure they are not relying on aliasing.
Clspv embeds reflection information via use of the [NonSemantic.ClspvReflection]( (http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/nonsemantic/NonSemantic.ClspvReflection.html) non-semantic extended instruction set. It requires SPV_KHR_non_semantic_info. If your Vulkan implementation does not support VK_KHR_shader_non_semantic_info, the reflection instructions should be stripped before loading the module. SPIRV-Tools's optimizer has a transformation to strip non-semantic instructions (use the --strip-reflect
option).
The reflection instructions replace the descriptor map.
The compiler can report the descriptor set and bindings used for samplers in the sampler map and for the kernel arguments, and also array sizing information for pointer-to-local arguments. Run clspv-reflection <spirv> -o <descriptor map>
to produce the descriptor map.
The descriptor map is a text file with comma-separated values.
Consider this example:
// First kernel in the translation unit, and no sampler map is used. kernel void foo(global int* a, float f, global float* b, uint c) {...}
It generates the following descriptor map:
kernel_decl,foo kernel,foo,arg,a,argOrdinal,0,descriptorSet,0,binding,0,offset,0,argKind,buffer kernel,foo,arg,f,argOrdinal,1,descriptorSet,0,binding,1,offset,0,argKind,pod,argSize,4 kernel,foo,arg,b,argOrdinal,2,descriptorSet,0,binding,2,offset,0,argKind,buffer kernel,foo,arg,c,argOrdinal,3,descriptorSet,0,binding,3,offset,0,argKind,pod,argSize,4 spec_constant,workgroup_size_x,spec_id,0 spec_constant,workgroup_size_y,spec_id,1 spec_constant,workgroup_size_z,spec_id,2
The kernels in the module module are declared as follows:
kernel_decl
For kernel arguments of types pointer-to-global, pointer-to-constant, and plain-old-data types, the fields are:
kernel
to indicate a kernel argumentarg
to indicate a kernel argumentargOrdinal
to indicate a kernel argument ordinal position fielddescriptorSet
binding
offset
argKind
buffer
- OpenCL bufferbuffer_ubo
- OpenCL constant buffer. Sent in a uniform buffer.pod
- Plain Old Data, e.g. a scalar, vector, or structure. Sent in a storage buffer.pod_ubo
- Plain Old Data, e.g. a scalar, vector, or structure. Sent in a uniform buffer.pod_pushconstant
- Plain Old Data, e.g. a scalar, vector or structure. Sent in push constants.ro_image
- Read-only imagewo_image
- Write-only imagesampler
- SamplerargSize
Module-wide specialization constants are specified as follows:
spec_constant
to describe the use of a module-wide specialization constantspec_id
Consider this example, which uses pointer-to-local arguments:
kernel void foo(local float* L, global float* A, local float4 *L2) {...}
It generates the following descriptor map:
kernel_decl,foo kernel,foo,arg,L,argOrdinal,0,argKind,local,arrayElemSize,4,arrayNumElemSpecId,3 kernel,foo,arg,A,argOrdinal,1,descriptorSet,0,binding,0,offset,0,argKind,buffer kernel,foo,arg,L2,argOrdinal,2,argKind,local,arrayElemSize,16,arrayNumElemSpecId,4 spec_constant,workgroup_size_x,spec_id,0 spec_constant,workgroup_size_y,spec_id,1 spec_constant,workgroup_size_z,spec_id,2
The kernels in the module module are declared as follows:
kernel_decl
For kernel arguments of type pointer-to-local, the fields are:
kernel
to indicate a kernel argumentarg
to indicate a kernel argumentargOrdinal
to indicate a kernel argument ordinal position fieldargKind
local
to indicate a pointer-to-local argumentarrayElemSize
arrayNumElemSpecId
Workgroup
storage. Specifically, it is the SpecId
decoration on the integer constant that specficies the array size. (This number is always at least 3 so that specialization IDs 0, 1, and 2 can be use for the workgroup size dimensions along x
, y
, and z
.)Module-wide specialization constants are specified as follows:
spec_constant
to describe the use of a module-wide specialization constantspec_id
Notes: Each pointer-to-local argument is assigned its own array type and specialization constant to size the array. Unless you override the array size specialization constant at pipeline creation time, the array will only have one element.
If a sampler map is used, then samplers use descriptor set 0 and kernel descriptor set numbers start at 1. For example, if the sampler map file is mysamplermap
containing:
CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST, CLK_NORMALIZED_COORDS_TRUE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_LINEAR
Then compiling with:
clspv foo.cl -samplermap=mysamplermap -descriptormap=mydescriptormap
Then mydescriptormap
will contain:
sampler,18,samplerExpr,"CLK_ADDRESS_CLAMP_TO_EDGE|CLK_FILTER_NEAREST|CLK_NORMALIZED_COORDS_FALSE",descriptorSet,0,binding,0 sampler,35,samplerExpr,"CLK_ADDRESS_CLAMP_TO_EDGE|CLK_FILTER_LINEAR|CLK_NORMALIZED_COORDS_TRUE",descriptorSet,0,binding,1 kernel,foo,arg,a,argOrdinal,0,descriptorSet,1,binding,0,offset,0,argKind,buffer kernel,foo,arg,f,argOrdinal,1,descriptorSet,1,binding,1,offset,0,argKind,pod,argSize,4 kernel,foo,arg,b,argOrdinal,2,descriptorSet,1,binding,2,offset,0,argKind,buffer kernel,foo,arg,c,argOrdinal,3,descriptorSet,1,binding,3,offset,0,argKind,pod,argSize,4
Normally plain-old-data arguments are passed into the kernel via a storage buffer. Use option -pod-ubo
to pass these parameters in via a uniform buffer. These can be faster to read in the shader.
When option -pod-ubo
is used, the descriptor map list the argKind
of a plain-old-data argument as pod_ubo
rather than the default of pod
.
Normally plain-old-data arguments are passed into the kernel via a storage buffer. Use the option -pod-pushconstant
to pass these parameters in via push constants. The option -cluster-pod-kernel-args=0
cannot be specified. Push constants are intended to provide a fast read path and should be faster to access than a buffer.
When the option -pod-pushconstant
is used, the descriptor map lists the argKind
of plain-old-data arguments as pod_pushconstant
rather than the default of pod
. There is no descriptorset
or binding
information for push constants.
Note: Vulkan implementations have limited push constant storage (default is 128B). clspv provides the option -max-pushconstant-size
to specify (in bytes) the implementation limit for push constants. This is validated at the start of the compile.
Note: Module scope push constanst are currently incompatible with plain-old-data arguments sent as push constants.
TODO(alan-baker): See #529 for the overall plan to address this.
The descriptor map entry for kernel arguments will not contain descriptorSet
or binding
entries. For example, an integer arg f
to kernel foo
might look like:
kernel,foo,arg,f,argOrdinal,1,offset,0,argKind,pod_pushconstant,argSize,4
Normally pointer-to-constant kernel arguments are passed into the kernel via a storage buffer. Use option -constant-args-ubo
to pass these parameters in via a uniform buffer. Uniform buffers can be faster to read in the shader.
The compiler will generate an error if the layout of the buffer does not satisfy the Standard Uniform Buffer Layout rules of the Vulkan specification (see section 15.5.4).
Descriptors can be scarce. So the compiler also has an option -cluster-pod-kernel-args
which can be used to reduce the number of descriptors. When the option is used:
By default this option is enabled. To disable this behavior, pass -cluster-pod-kernel-args=0
to the compiler.
For example:
// First kernel in the translation unit. void kernel foo(global int* a, float f, global float* b, uint c);
In the default case, the bindings are:
a
is mapped to a storage buffer with descriptor set 0, binding 0b
is mapped to a storage buffer with descriptor set 0, binding 1f
and c
are POD arguments, so they are mapped to the first and second members of a struct, and that struct is mapped to a storage buffer with descriptor set 0 and binding 2If -cluster-pod-kernel-args=0
is used:
a
is mapped to a storage buffer with descriptor set 0, binding 0f
is mapped to a storage buffer with descriptor set 0, binding 1b
is mapped to a storage buffer with descriptor set 0, binding 2c
is mapped to a storage buffer with descriptor set 0, binding 3That is, compiling as follows:
clspv foo.cl -descriptormap=myclusteredmap
will produce the following in myclusteredmap
:
kernel,foo,arg,a,argOrdinal,0,descriptorSet,0,binding,0,offset,0,argKind,buffer kernel,foo,arg,b,argOrdinal,2,descriptorSet,0,binding,1,offset,0,argKind,buffer kernel,foo,arg,f,argOrdinal,1,descriptorSet,0,binding,2,offset,0,argKind,pod,argSize,4 kernel,foo,arg,c,argOrdinal,3,descriptorSet,0,binding,2,offset,4,argKind,pod,argSize,4
If foo
were the second kernel in the translation unit, then its arguments would also use descriptor set 0. If foo
were the second kernel in the translation unit and option -distinct-kernel-descriptor-sets
is used, then its arguments would use descriptor set 1.
Compiling with the same sampler map from before:
clspv foo.cl -descriptormap=myclusteredmap -samplermap=mysamplermap
produces the following descriptor map:
sampler,18,samplerExpr,"CLK_ADDRESS_CLAMP_TO_EDGE|CLK_FILTER_NEAREST|CLK_NORMALIZED_COORDS_FALSE",descriptorSet,0,binding,0 sampler,35,samplerExpr,"CLK_ADDRESS_CLAMP_TO_EDGE|CLK_FILTER_LINEAR|CLK_NORMALIZED_COORDS_TRUE",descriptorSet,0,binding,1 kernel,foo,arg,a,argOrdinal,0,descriptorSet,1,binding,0,offset,0,argKind,buffer kernel,foo,arg,b,argOrdinal,2,descriptorSet,1,binding,1,offset,0,argKind,buffer kernel,foo,arg,f,argOrdinal,1,descriptorSet,1,binding,2,offset,0,argKind,pod,argSize,4 kernel,foo,arg,c,argOrdinal,3,descriptorSet,1,binding,2,offset,4,argKind,pod,argSize,4
TODO(dneto): Give an example using images.
The descriptor map includes entries for specialization constants that are used module-wide. The following specialization constants are currently generated:
workgroup\_size\_x
: The x-dimension of workgroup size.workgroup\_size\_y
: The y-dimension of workgroup size.workgroup\_size\_z
: The z-dimension of workgroup size.work_dim
: The work dimensions.global\_offset\_x
: The x-dimension of global offset.global\_offset\_y
: The y-dimension of global offset.global\_offset\_z
: The z-dimension of global offset.By default, each module-scope variable in __constant
address space is mapped to a SPIR-V variable in Private address space, with an intializer. This works only for simple scenarios, where:
In more general cases, use compiler option -module-constants-in-storage-buffer
. In this case:
Consider this example kernel a.cl
:
typedef struct { char c; uint a; float f; } Foo; __constant Foo ppp[3] = {{'a', 0x1234abcd, 1.0}, {'b', 0xffffffff, 1.5}, {0}}; kernel void foo(global uint* A, uint i) { *A = ppp[i].a; }
Compiling as follows:
clspv a.cl -descriptormap=map -module-constants-in-storage-buffer
Produces the following in file map
:
constant,descriptorSet,1,binding,0,hexbytes,61000000cdab34120000803f62000000ffffffff0000c03f000000000000000000000000 kernel,foo,arg,A,argOrdinal,0,descriptorSet,0,binding,0,offset,0,argKind,buffer kernel,foo,arg,i,argOrdinal,1,descriptorSet,0,binding,1,offset,0,argKind,pod
The initialization data are in the line starting with constant
, and its fields are:
constant
to indicate constant initialization datadescriptorSet
binding
kind
buffer
to indicate the use of a storage bufferhexbytes
to indicate the next field is the data, as a sequence of bytes in hexadecimalTake a closer look at the hexadecimal bytes in the example. They are:
61
: ASCII character ‘a’000000
: zero padding to satisfy alignment for the 32-bit integer value that followscdab3412
: the integer value 0x1234abcd
in little-endian format0000803f
: the float value 1.062
: ASCII character ‘b’000000
: zero padding to satisfy alignment for the 32-bit integer value that followsffffffff
: the integer value 0xffffffff
0000c03f
: the float value 1.5000000000000000000000000
: 12 zero bytes representing the zero-initialized third Foo value.Some features, when enabled, require values to be passed by the application via push constants. For each value requested by clspv, an entry will be present in the descriptor map. The format is:
pushconstant
to indicate the entry is a push constantname
to indicate the name of the requested push constant followsoffset
to indicate the offset at which the push constant is expected followssize
to indicate that the total size followsHere is a list of the push constants currently supported:
global_offset
: the 3D global offset used by get_global_offset()
and in global ID calculations. A vector of 3 32-bit integer values. Lower dimensions come first in memory.enqueued_local_size
: the 3D local work size returned by get_enqueued_local_size()
. A vector of 3 32-bit integer values. Lower dimensions come first in memory.global_size
: the 3D global size of the NDRange as returned by get_global_size()
. A vector of 3 32-bit integer values. Lower dimensions come first in memory. Only required when non-uniform NDRanges are supported.num_workgroups
: the 3D number of work groups in the NDRange as returned by get_num_groups()
. A vector of 3 32-bit integer values. Lower dimensions come first in memory. Only required when non-uniform NDRanges are supported.region_offset
: the sum of the global ID offset into the NDRange for this uniform region and the global offset of the NDRange. A vector of 3 32-bit integer values. Lower dimensions come first in memory. Only required when non-uniform NDRanges are supported.region_group_offset
: the 3D group ID offset into the NDRange for this region. A vector of 3 32-bit integer values. Lower dimensions come first in memory. Only required when non-uniform NDRanges are supported.The following attributes are ignored in the OpenCL C source, and thus have no functional impact on the produced SPIR-V:
__attribute__((work_group_size_hint(X, Y, Z)))
__attribute__((packed))
__attribute__ ((endian(host)))
__attribute__ ((endian(device)))
__attribute__((vec_type_hint(<typen>)))
The __attribute__((reqd_work_group_size(X, Y, Z)))
kernel attribute specifies the work-group size that must be used with that kernel.
The OpenCL C language allows the work-group size to be set just before executing the kernel on the device, at clEnqueueNDRangeKernel()
time. Vulkan requires that the work-group size be specified no later than when the VkPipeline
is created, which in OpenCL terms corresponds to when the cl_kernel
is created.
To allow for the maximum flexibility to developers who are used to specifying the work-group size in the host API and not in the device-side kernel language, we can use specialization constants to allow for setting the work-group size at VkPipeline
creation time.
If all kernels in the OpenCL C source use the reqd_work_group_size
attribute, then that attribute will specify the work-group size that must be used and the required values will be set in the SPIR-V via the LocalSize
execution mode.
If at least one of the kernels does not use the reqd_work_group_size
attribute, the Vulkan SPIR-V produced by the compiler will contain specialization constants as follows:
SpecId
, whose value defaults to 1.SpecId
, whose value defaults to 1.SpecId
, whose value defaults to 1.In either case, metadata that can be used by the runtime or host code is recorded in the generated SPIR-V module, using either PropertyRequiredWorkgroupSize
and/or SpecConstantWorkgroupSize
non-semantic instructions.
Signed integer types are mapped down onto their unsigned equivalents in SPIR-V as produced from OpenCL C.
Signed integer modulus (%
) operations, where either argument to the modulus is a negative integer, will result in an undefined result.
OpenCL C language built-in functions are mapped, where possible, onto their GLSL 4.5 built-in equivalents. For example, the OpenCL C language built-in function tan()
is mapped onto GLSL's built-in function tan()
.
The OpenCL C built-in sign()
function does not differentiate between a signed and unsigned 0.0 input value, nor does it return 0.0 if the input value is a NaN.
The OpenCL C built-in mad24()
and mul24()
functions do not perform their operations using 24-bit integers. Instead, they use 32-bit integers, and thus have no performance-improving characteristics over normal 32-bit integer arithmetic.
The OpenCL C work-item functions map to Vulkan SPIR-V as follows:
get_work_dim()
is mapped to the work_dim
spec constant when -work-dim
is enabled (on by default), otherwise it always returns 3.get_global_size()
is implemented by multiplying the result from get_local_size()
by the result from get_num_groups()
.get_global_id()
is mapped to a SPIR-V variable decorated with GlobalInvocationId
. The global offset is added to that variable when -global-offset
or -global-offset-push-constant
is enabled.get_local_size()
is mapped to a SPIR-V variable decorated with WorkgroupSize
.get_local_id()
is mapped to a SPIR-V variable decorated with LocalInvocationId
.get_num_groups()
is mapped to a SPIR-V variable decorated with NumWorkgroups
.get_group_id()
is mapped to a SPIR-V variable decorated with WorkgroupId
.get_global_offset()
is mapped to the global_offset
spec constant or push constant when -global-offset
or -global-offset-push-constant
is enabled, otherwise it always returns 0. Spec constants are used unless -global-offset-push-constant
is specified or the language is set to OpenCL C++ or OpenCL 2.0.Some OpenCL C language features that have no expressible equivalents in Vulkan's variant of SPIR-V are restricted.
OpenCL C language kernels must not be called from other kernels.
Pointers of type half
must not be used as kernel arguments.
Booleans are an abstract type - they have no known compile-time size. Using a boolean type as the argument to the sizeof()
operator will result in an undefined value. The boolean type must not be used to form global
, or constant
variables, nor be used within a struct or union type in the global
, or constant
address spaces.
The char
, char2
, char3
, uchar
, uchar2
, and uchar3
types can be used. To disable general support for these types, use -int8=0
.
The double
, double2
, double3
and double4
types must not be used.
The image1d_buffer_t
type must not be used.
The event_t
type must not be used.
Pointers are an abstract type - they have no known compile-time size. Using a pointer type as the argument to the sizeof()
operator will result in an undefined value.
Pointer-to-integer casts must not be used.
Integer-to-pointer casts must not be used.
Pointers must not be compared for equality or inequality.
Vectors of 8 and 16 elements must not be used.
Recursively defined struct types must not be used.
Since pointers have no known compile-time size, the pointer-sized types size_t
, ptrdiff_t
, uintptr_t
, and intptr_t
do not represent types that are the same size as a pointer. Instead, those types are mapped to 32-bit integer types.
For any OpenCL C language built-in functions that are mapped onto their GLSL 4.5 built-in equivalents, the precision requirements of the OpenCL C language built-ins are not necessarily honoured. In general, Clspv authors expect implementations will satisfy the relaxed precision requirements described in the OpenCL C specification. This means kernels will operate as if compiled with --cl-fast-relaxed-math
. For higher performance (lower accuracy) variants of some builtin functions, clspv also provides the --cl-native-math
option. This option goes beyond fast-relaxed math and provides no precision guarantees (similar to the native_ functions in OpenCL).
The atomic_xchg()
built-in function that takes a floating-point argument must not be used.
The OpenCL 2.0 atomic functions are supported with the following exceptions:
atomic_flag
functions are not supportedmemory_order_seq_cst
is weakened to acquire for loads, release for stores and acquire release for read-modify-write operationsmemory_scope_all_svm_devices
and memory_scope_all_devices
are not supportedatomic_compare_exchange_weak*
is implemented as atomic_compare_exchange_strong*
The convert_<type>_rte()
, convert_<type>_rtz()
, convert_<type>_rtp()
, convert_<type>_rtn()
, convert_<type>_sat()
, convert_<type>_sat_rte()
, convert_<type>_sat_rtz()
, convert_<type>_sat_rtp()
, and convert_<type>_sat_rtn()
built-in functions must not be used.
All supported.
All supported.
All supported.
The vload<size>()
, vstore<size>()
, vstore_half_rtp()
, vstore_half_rtn()
, vstore_half<size>_rtp()
, vstore_half<size>_rtn()
, vstorea_half<size>_rtp()
, and vstorea_half<size>_rtn()
built-in functions must not be used.
The vload_half()
, vload_half<size>()
, vstore_half()
, vstore_half_rte()
, vstore_half_rtz()
, vstore_half<size>()
, vstore_half<size>_rte()
, vstore_half<size>_rtz()
, and vloada_half<size>()
built-in functions are only allowed to use the global
and constant
address spaces.
Note: When 16-bit storage support is not assumed, both vload_half
and vstore_half
assume the pointers are aligned to 4 bytes, not 2 bytes. See issue 6.
Builtin functions vstorea_half2()
, vstorea_half4()
, vstorea_half2_rtz()
, vstorea_half4_rtz()
, vstorea_half2_rte()
, and vstorea_half4_rte()
built-in functions have implementations for global, local, and private address spaces.
The vstore_half_rte()
, vstore_half_rtz()
, vstore_half<size>_rte()
, vstore_half<size>_rtz()
, vstorea_half<size>_rte()
, and vstorea_half<size>_rtz()
built-in functions are not guaranteed to round the result correctly if the destination address was not declared as a half*
on the kernel entry point.
The async_work_group_copy()
, async_work_group_strided_copy()
, wait_group_events()
, and prefetch()
built-in functions must not be used.
The shuffle()
, shuffle2()
and vec_step()
built-in functions must not be used.
The printf()
built-in function must not be used.
The get_image_channel_data_type()
and get_image_channel_order()
built-in functions must not be used.
The OpenCL extension cl_khr_subgroups
requires SPIR-V 1.3 or greater and translates the built-in functions to GroupNonUniform operations and Builtin constants as follows:
get_sub_group_size()
is mapped to BuiltInSubgroupSize
constant. Requires CapabilityGroupNonUniform
capability.get_num_sub_groups()
is mapped to BuiltInNumSubgroups
constant. Requires CapabilityGroupNonUniform
capability.get_sub_group_id()
is mapped to BuiltInSubgroupId
constant. Requires CapabilityGroupNonUniform
capability.get_sub_group_local_id()
is mapped to BuiltInSubgroupLocalInvocationId
constant. Requires CapabilityGroupNonUniform
capability.sub_group_broadcast()
is mapped to OpGroupNonUniformBroadcast
operation. Requires CapabilityGroupNonUniformBallot
capability. For SPIR-V version < 1.5 a constant laneId is required.sub_group_all()
is mapped to OpGroupNonUniformAll
operation. Requires CapabilityGroupNonUniformVote
capability.sub_group_any()
is mapped to OpGroupNonUniformAny
operation. Requires CapabilityGroupNonUniformVote
capability.sub_group_<group_op>_add()
is mapped to OpGroupNonUniformIAdd
operation for Integer types and OpGroupNonUniformFAdd
operation for Float types. Requires CapabilityGroupNonUniformArithmetic
capability.sub_group_<group_op>_min()
is mapped to OpGroupNonUniformSMin
operation for Signed-Integer types, OpGroupNonUniformUMin
operation for Unsigned-Integer types and OpGroupNonUniformFMin
operation for Float types. Requires CapabilityGroupNonUniformArithmetic
capability.sub_group_<group_op>_max()
is mapped to OpGroupNonUniformSMax
operation for Signed-Integer types, OpGroupNonUniformUMax
operation for Unsigned-Integer types and OpGroupNonUniformFMax
operation for Float types. Requires CapabilityGroupNonUniformArithmetic
capability.sub_group_barrier()
is mapped to OpControlBarrier with an execution scope of Subgroup
. If no memory scope is specified, Subgroup
is used. The memory semantics depend on the flags on the barrier.The group_op
qualifier translates as follows:
reduce
maps to GroupOperationReduce
.scan_exclusive
maps to GroupOperationExclusiveScan
.scan_inclusive
maps to GroupOperationInclusiveScan
.These extension built-in functions are not supported:
get_max_sub_group_size()
requires CapabilityKernel (incompatible with Shader)get_enqueued_num_sub_groups()
requires CapabilityKernel (incompatible with Shader)sub_group_reserve_read_pipe()
sub_group_reserve_write_pipe()
sub_group_commit_read_pipe()
sub_group_commit_write_pipe()
get_kernel_sub_group_count_for_ndrange()
get_kernel_max_sub_group_size_for_ndrange()
Clspv is not able to reach full accuracy requirements on the supported builtin functions in all cases. Instead, currently, it is able to meet the requirements tested by OpenCL CTS under relaxed precision requirements. In order to achieve this goal, some functions are implemented using the GLSL.std.450 extended or core instructions, some are implemented in the builtin library and some are emulated by the compiler.
Note: Clspv has been tested against five Vulkan implementations from different vendors and is able to achieve the relaxed accuracy requirements broadly.
add
, subtract
, divide
, multiply
, assignment
, not
, acos
, asin
, ceil
, cos
, cosh
, exp
, exp2
, exp10
, fabs
, floor
, fmax
, fmin
, log
, log2
, log10
, mad
, pow
, powr
, rint
, rsqrt
, sin
, sinh
, tan
, trunc
, half_cos
, half_exp
, half_exp2
, half_exp10
, half_log
, half_log2
, half_log10
, half_powr
, half_rsqrt
, half_sin
, half_tan
, clamp
, degrees
, mix
, radians
, sign
, smoothstep
, step
, cross
, dot
, normalize
, fast_distance
, fast_length
, fast_normalize
.
acosh
, asinh
, atanh
, cbrt
, erfc
, erf
, fma
, fmod
, fract
, frexp
, hypot
, ilogb
, ldexp
, lgamma
, lgamma_r
, logb
, maxmag
, modf
, nan
, nextafter
, remainder
, remquo
, rootn
, sqrt
, tanh
, tgamma
, half_divide
, half_recip
, half_sqrt
, distance
, length
, atan
, atan2pi
, atanpi
, atan2
.
Note: fma
has a very high runtime cost unless compiling with -cl-native-math
. Its accuracy requirements are not relaxed by -cl-fast-relaxed-math
and the library implementation emulates it using integers.
Note: acosh
, asinh
, atanh
, atan
, atan2
, atanpi
and atan2pi
, fma
, fmod
, fract
, frexp
, ldexp
, rsqrt
, half_sqrt
, sqrt
, tanh
, distance
, and length
are all implemented using core or extended instructions when compiling with -cl-native-math
.
acospi
, asinpi
, copysign
, cospi
, expm1
, fdim
, log1p
, pown
, round
, sincos
, sinpi
, tanpi
.
frexp
fails using Swiftshader as an implementation due to an internal error.ldexp
and rsqrt
fail to meet accuracy requirements on some Adreno GPUs.