
 =====================================
 HLSL to SPIR-V Feature Mapping Manual
 =====================================

 .. contents::
    :local:
    :depth: 3

 Introduction
 ============

 This document describes the mappings from HLSL features to SPIR-V for Vulkan
 adopted by the SPIR-V codegen. For how to build, use, or contribute to the
 SPIR-V codegen and its internals, please see the
 `wiki <https://github.com/Microsoft/DirectXShaderCompiler/wiki/SPIR%E2%80%90V-CodeGen>`_
 page.

 `SPIR-V <https://www.khronos.org/registry/spir-v/>`_ is a binary intermediate
 language for representing graphical-shader stages and compute kernels for
 multiple Khronos APIs, such as Vulkan, OpenGL, and OpenCL. At the moment we
 only intend to support the Vulkan flavor of SPIR-V.

 DirectXShaderCompiler is the reference compiler for HLSL. Adding SPIR-V codegen
 in DirectXShaderCompiler will enable the usage of HLSL as a frontend language
 for Vulkan shader programming. Sharing the same code base also means we can
 track the evolution of HLSL more closely and always deliver the best of HLSL to
 developers. Moreover, developers will also have a unified compiler toolchain for
 targeting both DirectX and Vulkan. We believe this effort will benefit the
 general graphics ecosystem.

 Note that this document is expected to be an ongoing effort and grow as we
 implement more and more HLSL features.

 Overview
 ========

 Although they share the same basic concepts, DirectX and Vulkan are still
 different graphics APIs with semantic gaps. HLSL is the native shading language
 for DirectX, so certain HLSL features do not have corresponding mappings in
 Vulkan, and certain Vulkan specific information has no native way to be
 expressed in HLSL source code. This section describes the general translation
 paradigms and how we close some of the major semantic gaps.

 Note that the term "semantic" is overloaded. In HLSL, it can mean the string
 attached to shader input or output. For such cases, we refer to it as "HLSL
 semantic" or "semantic string". For other cases, we just use the normal
 "semantic" term.

 Shader entry function
 ---------------------

 HLSL entry functions can read data from the previous shader stage and write
 data to the next shader stage via function parameters and return values. In
 contrast, Vulkan requires all SPIR-V entry functions to take no parameters and
 return void. All data passed between stages must use global variables in the
 ``Input`` and ``Output`` storage classes.

 To handle this difference, we emit a wrapper function as the SPIR-V entry
 function around the HLSL source code entry function. The wrapper function is
 responsible for reading data from SPIR-V ``Input`` global variables and
 converting it to the types required by the source code entry function
 signature, calling the source code entry function, and then decomposing the
 return value (and ``out``/``inout`` parameters) into the types required by the
 SPIR-V ``Output`` global variables and writing them out. For details about the
 wrapper function, please refer to the `entry function wrapper`_ section.
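
 As a minimal sketch of a source code entry function that needs such a wrapper
 (the struct and member names are hypothetical, and the comment paraphrases the
 description above rather than showing actual compiler output):

 .. code:: hlsl

   struct VSInput {
     float3 position : POSITION;
     float2 uv       : TEXCOORD0;
   };

   struct VSOutput {
     float4 position : SV_Position;
     float2 uv       : TEXCOORD0;
   };

   // Source code entry function: takes a parameter and returns a value.
   // The SPIR-V entry point takes no parameters and returns void, so the
   // wrapper reads the Input variables, builds a VSInput, calls this
   // function, and writes the returned VSOutput to the Output variables.
   VSOutput main(VSInput input) {
     VSOutput output;
     output.position = float4(input.position, 1.0);
     output.uv       = input.uv;
     return output;
   }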

 Shader stage IO interface matching
 ----------------------------------

 HLSL leverages semantic strings to link variables and pass data between shader
 stages. There is great flexibility in how semantic strings can be used: they
 can appear on function parameters, function returns, and struct members.
 In Vulkan, linking variables and passing data between shader stages is done via
 numeric ``Location`` decorations on SPIR-V global variables in the ``Input`` and
 ``Output`` storage classes.

 To help handle such differences, we provide `Vulkan specific attributes`_ to
 let the developer express their intent precisely. The compiler will also try
 its best to deduce the mapping from semantic strings to SPIR-V ``Location``
 numbers when such explicit Vulkan specific attributes are absent. Please see the
 `HLSL semantic and Vulkan Location`_ section for more details about the mapping
 and ``Location`` assignment.

 What complicates matters is Vulkan's strict requirements on interface
 matching. Basically, a variable in the previous stage is considered a match to
 a variable in the next stage if and only if they are decorated with the same
 ``Location`` number and have exactly the same type, except for the outermost
 arrayness in hull/domain/geometry shaders, which is ignored for interface
 matching. This strictness conflicts with the flexibility of HLSL semantic
 strings.

 Some HLSL system-value (SV) semantic strings are mapped to SPIR-V variables
 with builtin decorations, and some are not. HLSL non-SV semantic strings
 should all be mapped to SPIR-V variables without builtin decorations (but with
 ``Location`` decorations).
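
 As a hedged illustration of the matching rule, a vertex shader output and a
 pixel shader input that agree on both ``Location`` number and type could be
 declared with explicit attributes as follows. The struct names, semantic
 strings, and location numbers are hypothetical.

 .. code:: hlsl

   // Vertex shader output.
   struct VSOutput {
     float4 pos : SV_Position;                    // SV semantic, no Location
     [[vk::location(0)]] float2 uv     : TEXCOORD0;
     [[vk::location(1)]] float3 normal : NORMAL;
   };

   // Pixel shader input: each non-SV member uses the same Location number
   // and exactly the same type as its counterpart in VSOutput.
   struct PSInput {
     [[vk::location(0)]] float2 uv     : TEXCOORD0;
     [[vk::location(1)]] float3 normal : NORMAL;
   };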

 With these complications, if we are grouping multiple semantic strings in a
 struct in the HLSL source code, that struct should be flattened and each of
 its members should be mapped separately. For example, for the following:

 .. code:: hlsl

   struct T {
     float2 clip0 : SV_ClipDistance0;
     float3 cull0 : SV_CullDistance0;
     float4 foo   : FOO;
   };

   struct S {
     float4 pos   : SV_Position;
     float2 clip1 : SV_ClipDistance1;
     float3 cull1 : SV_CullDistance1;
     float4 bar   : BAR;
     T      t;
   };

 If we have an ``S`` input parameter in a pixel shader, we should flatten it
 recursively to generate five SPIR-V ``Input`` variables. Three of them are
 decorated with the ``Position``, ``ClipDistance``, and ``CullDistance``
 builtins, and two of them are decorated with ``Location``. (Note that
 ``clip0`` and ``clip1`` are concatenated, as are ``cull0`` and ``cull1``.
 The ``ClipDistance`` and ``CullDistance`` builtins are special and explained
 in the `ClipDistance & CullDistance`_ section.)

 Flattening is contagious because of Vulkan interface matching rules. If we
 flatten a struct in the output of a previous stage, which may create multiple
 variables decorated with different ``Location`` numbers, we also need to
 flatten it in the input of the next stage; otherwise we may have a ``Location``
 mismatch even if both stages share the same definition of the struct. Because
 hull/domain/geometry shaders are optional, we can have different chains of
 shader stages, which means we need to flatten all shader stage interfaces. For
 hull/domain/geometry shaders, inputs/outputs have an additional arrayness.
 So if we see an array of structs in these shaders, we need to flatten it into
 arrays of its fields.

 Vulkan specific features
 ------------------------

 We try to implement Vulkan specific features using the most intuitive and
 non-intrusive ways in HLSL, which means we will prefer native language
 constructs when possible. If that is inadequate, we then consider attaching
 `Vulkan specific attributes`_ to them, or introducing new syntax.

 Descriptors
 ~~~~~~~~~~~

 The compiler provides multiple mechanisms to specify which Vulkan descriptor
 a particular resource binds to.

 In the source code, you can use the ``[[vk::binding(X[, Y])]]`` and
 ``[[vk::counter_binding(X)]]`` attributes. The native ``:register()`` attribute
 is also respected.

 On the command line, you can use the ``-fvk-{b|s|t|u}-shift`` or
 ``-fvk-bind-register`` options.

 If you can modify the source code, the ``[[vk::binding(X[, Y])]]`` and
 ``[[vk::counter_binding(X)]]`` attributes give you fine-grained control over
 descriptor assignment.

 If you cannot modify the source code, you can use command-line options to change
 how the ``:register()`` attribute is handled by the compiler.
 ``-fvk-bind-register`` lets you specify the descriptor for the resource at a
 certain register. ``-fvk-{b|s|t|u}-shift`` lets you apply shifts to all register
 numbers of a certain register type. The two cannot be used together, though.

 When the ``[[vk::combinedImageSampler]]`` attribute is applied, only the
 ``-fvk-t-shift`` value will be used to apply shifts to combined texture and
 sampler resource bindings and any ``-fvk-s-shift`` value will be ignored.

 Without any attribute or command-line option, ``:register(xX, spaceY)`` will be
 mapped to binding ``X`` in descriptor set ``Y``. Note that the register type
 ``x`` is ignored, so this may cause overlap.

 The more specific a mechanism is, the higher precedence it has, and command-line
 options take precedence over source code attributes.

 For more details, see `HLSL register and Vulkan binding`_, `Vulkan specific
 attributes`_, and `Vulkan-specific options`_.
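
 A minimal sketch of the source code mechanisms follows. The resource names and
 the binding and set numbers are hypothetical, and the comments restate the
 rules above under the assumption that no ``-fvk-*`` option is passed.

 .. code:: hlsl

   struct Particle {
     float3 position;
     float  life;
   };

   // Explicit binding 2 in descriptor set 1. The associated counter buffer
   // goes to binding 3 in the same descriptor set.
   [[vk::binding(2, 1), vk::counter_binding(3)]]
   RWStructuredBuffer<Particle> particles;

   // Descriptor set omitted, so it is set to 0.
   [[vk::binding(5)]]
   Texture2D<float4> albedo;

   // No vk:: attribute: register(t0, space2) maps to binding 0 in set 2.
   Texture2D<float4> normalMap : register(t0, space2);

   // The register type is ignored, so s0 in space2 also maps to binding 0
   // in set 2 and overlaps with normalMap above.
   SamplerState linearSampler : register(s0, space2);

 The last two declarations show why relying on ``:register()`` alone may need
 an explicit ``vk::binding`` or a shift option to avoid overlap.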

 Subpass inputs
 ~~~~~~~~~~~~~~

 Within a Vulkan `rendering pass <https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#renderpass>`_,
 a subpass can write results to an output target that can then be read by the
 next subpass as a subpass input. The "Subpass Input" feature concerns the
 ability to read such an output target.

 Subpass inputs are read through two new builtin resource types, available only
 in pixel shaders:

 .. code:: hlsl

   class SubpassInput<T> {
     T SubpassLoad();
   };

   class SubpassInputMS<T> {
     T SubpassLoad(int sampleIndex);
   };

 In the above, ``T`` is a scalar or vector type. If omitted, it defaults to
 ``float4``.

 Subpass inputs are implicitly addressed by the pixel's (x, y, layer) coordinate.
 These objects support reading the subpass input through the methods shown
 above.

 A subpass input is selected by using a new attribute ``vk::input_attachment_index``.
 For example:

 .. code:: hlsl

   [[vk::input_attachment_index(i)]] SubpassInput input;

 A ``vk::input_attachment_index`` of ``i`` selects the ith entry in the input
 pass list. (See the Vulkan API spec for more information.)
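
 Putting the pieces together, a pixel shader reading from both resource types
 might look like the sketch below. The variable names and attachment indices
 are hypothetical.

 .. code:: hlsl

   [[vk::input_attachment_index(0)]] SubpassInput<float4>   colorInput;
   [[vk::input_attachment_index(1)]] SubpassInputMS<float4> colorInputMS;

   float4 main(uint sampleIndex : SV_SampleIndex) : SV_Target {
     // No coordinate is passed: the load is implicitly addressed by the
     // current pixel's (x, y, layer) coordinate.
     float4 single    = colorInput.SubpassLoad();
     // The multisampled variant additionally takes a sample index.
     float4 perSample = colorInputMS.SubpassLoad((int)sampleIndex);
     return 0.5 * (single + perSample);
   }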

 Push constants
 ~~~~~~~~~~~~~~

 Vulkan push constant blocks are represented using normal global variables of
 struct types in HLSL. The variables (not the underlying struct types) should be
 annotated with the ``[[vk::push_constant]]`` attribute.

 Please note that, as per the requirements of Vulkan, "there must be no more than
 one push constant block statically used per shader entry point."
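
 A minimal sketch, assuming a hypothetical struct layout and variable name:

 .. code:: hlsl

   struct PushData {
     float4x4 modelViewProj;
     float4   tint;
   };

   // The attribute goes on the variable, not on the struct type.
   [[vk::push_constant]] PushData pushData;

   float4 main(float4 pos : POSITION) : SV_Position {
     return mul(pushData.modelViewProj, pos);
   }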

 Specialization constants
 ~~~~~~~~~~~~~~~~~~~~~~~~

 To use Vulkan specialization constants, annotate global constants with the
 ``[[vk::constant_id(X)]]`` attribute. For example,

 .. code:: hlsl

   [[vk::constant_id(1)]] const bool  specConstBool  = true;
   [[vk::constant_id(2)]] const int   specConstInt   = 42;
   [[vk::constant_id(3)]] const float specConstFloat = 1.5;

 Shader Record Buffer
 ~~~~~~~~~~~~~~~~~~~~

 SPV_NV_ray_tracing exposes a user-managed buffer in the shader binding table
 via the ``ShaderRecordBufferNV`` storage class. ``ConstantBuffer`` or
 ``cbuffer`` blocks can be mapped to this storage class in HLSL by using the
 ``[[vk::shader_record_nv]]`` annotation. It is applicable only to
 ``ConstantBuffer`` and ``cbuffer`` declarations.

 Please note as per the requirements of VK_NV_ray_tracing, "there must be no
 more than one shader_record_nv block statically used per shader entry point
 otherwise results are undefined."

 The official Khronos ray tracing extension also comes with a SPIR-V storage class
 that has the same functionality. The ``[[vk::shader_record_ext]]`` annotation can
 be used when targeting the SPV_KHR_ray_tracing extension.
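
 As a minimal sketch, assuming a hypothetical record layout and variable name,
 a ``ConstantBuffer`` can be annotated as follows when targeting
 SPV_KHR_ray_tracing. The ``[[vk::shader_record_nv]]`` annotation is placed the
 same way when targeting SPV_NV_ray_tracing.

 .. code:: hlsl

   // Hypothetical per-record data stored in the shader binding table.
   struct HitGroupData {
     float4 albedo;
   };

   // At most one such block may be statically used per entry point.
   [[vk::shader_record_ext]]
   ConstantBuffer<HitGroupData> hitGroupData;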

 Builtin variables
 ~~~~~~~~~~~~~~~~~

 Some of the Vulkan builtin variables have no equivalents in native HLSL
 language. To support them, ``[[vk::builtin("<builtin>")]]`` is introduced.
 Right now the following ``<builtin>`` values are supported:

 * ``PointSize``: The GLSL equivalent is ``gl_PointSize``.
 * ``HelperInvocation``: For Vulkan 1.3 or above, we use its GLSL equivalent
   ``gl_HelperInvocation`` and decorate it with ``HelperInvocation`` builtin
   since Vulkan 1.3 or above supports ``Volatile`` decoration for builtin
   variables. For Vulkan 1.2 or earlier, we do not create a builtin variable for
   ``HelperInvocation``. Instead, we create a variable with ``Private`` storage
   class and set its value as the result of `OpIsHelperInvocationEXT <https://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/EXT/SPV_EXT_demote_to_helper_invocation.html#OpIsHelperInvocationEXT>`_
   instruction.
 * ``BaseVertex``: The GLSL equivalent is ``gl_BaseVertexARB``.
   Requires the ``SPV_KHR_shader_draw_parameters`` extension.
 * ``BaseInstance``: The GLSL equivalent is ``gl_BaseInstanceARB``.
   Requires the ``SPV_KHR_shader_draw_parameters`` extension.
 * ``DrawIndex``: The GLSL equivalent is ``gl_DrawIDARB``.
   Requires the ``SPV_KHR_shader_draw_parameters`` extension.
 * ``DeviceIndex``: The GLSL equivalent is ``gl_DeviceIndex``.
   Requires the ``SPV_KHR_device_group`` extension.
 * ``ViewportMaskNV``: The GLSL equivalent is ``gl_ViewportMask``.

 Please see Vulkan spec. `14.6. Built-In Variables <https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#interfaces-builtin-variables>`_
 for detailed explanation of these builtins.
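
 For example, a vertex shader could write the point size as sketched below. The
 struct name, member names, and the ``PSIZE`` semantic string are illustrative
 assumptions.

 .. code:: hlsl

   struct VSOutput {
     float4 pos : SV_Position;
     // Translated into the PointSize builtin (gl_PointSize in GLSL).
     [[vk::builtin("PointSize")]] float pointSize : PSIZE;
   };

   VSOutput main(float4 pos : POSITION) {
     VSOutput output;
     output.pos       = pos;
     output.pointSize = 4.0;
     return output;
   }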

 Supported extensions
 ~~~~~~~~~~~~~~~~~~~~

 * SPV_KHR_16bit_storage
 * SPV_KHR_device_group
 * SPV_KHR_fragment_shading_rate
 * SPV_KHR_multiview
 * SPV_KHR_post_depth_coverage
 * SPV_KHR_shader_draw_parameters
 * SPV_EXT_descriptor_indexing
 * SPV_EXT_fragment_fully_covered
 * SPV_EXT_mesh_shader
 * SPV_EXT_shader_stencil_support
 * SPV_AMD_shader_early_and_late_fragment_tests
 * SPV_AMD_shader_explicit_vertex_parameter
 * SPV_GOOGLE_hlsl_functionality1
 * SPV_GOOGLE_user_type
 * SPV_NV_mesh_shader

 Vulkan specific attributes
 --------------------------

 `C++ attribute specifier sequence <http://en.cppreference.com/w/cpp/language/attributes>`_
 is a non-intrusive way of providing Vulkan specific information in HLSL.

 The namespace ``vk`` will be used for all Vulkan attributes:

 - ``location(X)``: For specifying the location (``X``) numbers for stage
   input/output variables. Allowed on function parameters, function returns,
   and struct fields.
 - ``binding(X[, Y])``: For specifying the descriptor set (``Y``) and binding
   (``X``) numbers for resource variables. The descriptor set (``Y``) is
   optional; if missing, it will be set to 0. Allowed on global variables.
 - ``counter_binding(X)``: For specifying the binding number (``X``) for the
   associated counter for RW/Append/Consume structured buffer. The descriptor
   set number for the associated counter is always the same as the main resource.
 - ``push_constant``: For marking a variable as the push constant block. Allowed
   on global variables of struct type. At most one variable can be marked as
   ``push_constant`` in a shader.
 - ``offset(X)``: For manually laying out struct members. Annotating a struct
   member with this attribute will force the compiler to put the member at offset
   ``X`` w.r.t. the beginning of the struct. Only allowed on struct members.
 - ``constant_id(X)``: For marking a global constant as a specialization constant.
   Allowed on global variables of boolean/integer/float types.
 - ``input_attachment_index(X)``: To associate the Xth entry in the input pass
   list to the annotated object. Only allowed on objects whose type is
   ``SubpassInput`` or ``SubpassInputMS``.
 - ``builtin("X")``: For specifying that an entity should be translated into a
   certain Vulkan builtin variable. Allowed on function parameters, function
   returns, and struct fields.
 - ``index(X)``: For specifying the index at a specific pixel shader output
   location. Used for dual-source blending.
 - ``post_depth_coverage``: The input variable decorated with SampleMask will
   reflect the result of the EarlyFragmentTests. Only valid on pixel shader entry points.
 - ``combinedImageSampler``: For specifying a Texture (e.g., ``Texture2D``,
   ``Texture1DArray``, ``TextureCube``) and ``SamplerState`` to use the combined image
   sampler (or sampled image) type with the same descriptor set and binding numbers (see
   `wiki page <https://github.com/microsoft/DirectXShaderCompiler/wiki/Vulkan-combined-image-sampler-type>`_
   for more detail).
 - ``early_and_late_tests``: Marks an entry point as enabling early and late depth
   tests. If depth is written via ``SV_Depth``, ``depth_unchanged`` must also be specified
   (``SV_DepthLess`` and ``SV_DepthGreater`` can be written freely). If a stencil reference
   value is written via ``SV_StencilRef``, one of ``stencil_ref_unchanged_front``,
   ``stencil_ref_greater_equal_front``, or ``stencil_ref_less_equal_front`` and
   one of ``stencil_ref_unchanged_back``, ``stencil_ref_greater_equal_back``, or
   ``stencil_ref_less_equal_back`` must be specified.
 - ``depth_unchanged``: Specifies that any depth written to ``SV_Depth`` will not
   invalidate the result of early depth tests. Sets the ``DepthUnchanged`` execution
   mode in SPIR-V. Only valid on pixel shader entry points.
 - ``stencil_ref_unchanged_front``: Specifies that any stencil ref written to
   ``SV_StencilRef`` will not invalidate the result of early stencil tests when
   the fragment is front facing. Sets the ``StencilRefUnchangedFrontAMD`` execution
   mode in SPIR-V. Only valid on pixel shader entry points.
 - ``stencil_ref_greater_equal_front``: Specifies that any stencil ref written to
   ``SV_StencilRef`` will be greater than or equal to the stencil reference value
   set by the API when the fragment is front facing. Sets the ``StencilRefGreaterFrontAMD``
   execution mode in SPIR-V. Only valid on pixel shader entry points.
 - ``stencil_ref_less_equal_front``: Specifies that any stencil ref written to
   ``SV_StencilRef`` will be less than or equal to the stencil reference value
   set by the API when the fragment is front facing. Sets the ``StencilRefLessFrontAMD``
   execution mode in SPIR-V. Only valid on pixel shader entry points.
 - ``stencil_ref_unchanged_back``: Specifies that any stencil ref written to
   ``SV_StencilRef`` will not invalidate the result of early stencil tests when
   the fragment is back facing. Sets the ``StencilRefUnchangedBackAMD`` execution
   mode in SPIR-V. Only valid on pixel shader entry points.
 - ``stencil_ref_greater_equal_back``: Specifies that any stencil ref written to
   ``SV_StencilRef`` will be greater than or equal to the stencil reference value
   set by the API when the fragment is back facing. Sets the ``StencilRefGreaterBackAMD``
   execution mode in SPIR-V. Only valid on pixel shader entry points.
 - ``stencil_ref_less_equal_back``: Specifies that any stencil ref written to
   ``SV_StencilRef`` will be less than or equal to the stencil reference value
   set by the API when the fragment is back facing. Sets the ``StencilRefLessBackAMD``
   execution mode in SPIR-V. Only valid on pixel shader entry points.

 Only ``vk::`` attributes in the above list are supported. Other attributes will
 result in warnings and be ignored by the compiler. All C++11 attributes will
 only trigger warnings and be ignored if not compiling towards SPIR-V.

 For example, to specify the layout of resource variables and the location of
 interface variables:

 .. code:: hlsl

   struct S { ... };

   [[vk::binding(X, Y), vk::counter_binding(Z)]]
   RWStructuredBuffer<S> mySBuffer;

   [[vk::location(M)]] float4
   main([[vk::location(N)]] float4 input: A) : B
   { ... }
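
 As a further sketch with hypothetical names and values, ``vk::offset`` and
 ``vk::index`` could be used as follows. The dual-source blending declaration
 assumes that both outputs share location 0 and differ only in their index.

 .. code:: hlsl

   struct Params {
     float4 scale;
     // Forces this member to offset 32 from the beginning of the struct.
     [[vk::offset(32)]] float4 bias;
   };

   [[vk::push_constant]] Params params;

   struct PSOutput {
     [[vk::location(0), vk::index(0)]] float4 color0 : SV_Target0;
     [[vk::location(0), vk::index(1)]] float4 color1 : SV_Target1;
   };

   PSOutput main(float4 color : COLOR) {
     PSOutput output;
     output.color0 = color * params.scale;
     output.color1 = color + params.bias;
     return output;
   }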

 Macro for SPIR-V
 ----------------

 If SPIR-V CodeGen is enabled and the ``-spirv`` flag is passed on the command
 line (meaning "generate SPIR-V code"), the compiler defines an implicit
 macro ``__spirv__``. For example, this macro can be used to guard the SPIR-V
 specific parts of the HLSL code:

 .. code:: hlsl

   #ifdef __spirv__
   [[vk::binding(X, Y), vk::counter_binding(Z)]]
   #endif
   RWStructuredBuffer<S> mySBuffer;

 SPIR-V version and extension
 ----------------------------

 SPIR-V CodeGen provides two command-line options for fine-grained control over
 the SPIR-V target environment (and hence SPIR-V version) and SPIR-V extensions:

 - ``-fspv-target-env=``: for specifying SPIR-V target environment
 - ``-fspv-extension=``: for specifying allowed SPIR-V extensions

 ``-fspv-target-env=`` accepts a Vulkan target environment (see ``-help`` for
 supported values). If such an option is not given, the CodeGen defaults to
 ``vulkan1.0``. When targeting ``vulkan1.0``, trying to use features that are only
 available in Vulkan 1.1 (SPIR-V 1.3), like `Shader Model 6.0 wave intrinsics`_,
 will trigger a compiler error.

 If ``-fspv-extension=`` is not specified, the CodeGen will select suitable
 SPIR-V extensions to translate the source code. Otherwise, only extensions
 supplied via ``-fspv-extension=`` will be used. If that does not suffice, errors
 will be emitted explaining what additional extensions are required to translate
 what specific feature in the source code. If you want to allow all KHR
 extensions, you can use ``-fspv-extension=KHR``.
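
 The sketch below shows a compute shader that uses a wave intrinsic, together
 with one possible invocation in the leading comment. The file name, entry
 point name, and buffer name are hypothetical.

 .. code:: hlsl

   // Hypothetical file wave.hlsl. One possible invocation:
   //   dxc -T cs_6_0 -E main -spirv -fspv-target-env=vulkan1.1 wave.hlsl
   // With the default vulkan1.0 target, the wave intrinsic below is
   // expected to trigger a compiler error, as described above.
   RWStructuredBuffer<uint> laneIndices;

   [numthreads(64, 1, 1)]
   void main(uint3 tid : SV_DispatchThreadID) {
     laneIndices[tid.x] = WaveGetLaneIndex();
   }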

 Legalization, optimization, validation
 --------------------------------------

 After initial translation of the HLSL source code, SPIR-V CodeGen will further
 conduct legalization (if needed), optimization (if requested), and validation
 (if not turned off). All these three stages are outsourced to `SPIRV-Tools <https://github.com/KhronosGroup/SPIRV-Tools>`_.
 Here are the options controlling these stages:

 * ``-fcgl``: turn off legalization and optimization
 * ``-Od``: turn off optimization
 * ``-Vd``: turn off validation

 Legalization
 ~~~~~~~~~~~~

 HLSL is a fairly permissive language considering the flexibility it provides for
 manipulating resource objects. The developer can create local copies, pass
 them around as function parameters and return values, as long as after certain
 transformations (function inlining, constant evaluation and propagating, dead
 code elimination, etc.), the compiler can remove all temporary copies and
 pinpoint all uses to unique global resource objects.

 Because of this property of HLSL, if we translate the input HLSL source code
 literally into SPIR-V for Vulkan, we will sometimes generate illegal SPIR-V.
 Certain transformations are needed to legalize the literally translated
 SPIR-V. Performing such transformations at the frontend AST level is
 cumbersome or impossible (e.g., function inlining); they are better conducted
 at the SPIR-V level. Therefore, legalization is delegated to SPIRV-Tools.

 Specifically, we need to legalize the following HLSL source code patterns:

 * Using resource types in struct types
 * Creating aliases of global resource objects
 * Control flow involving the above cases

 Legalization transformations will not run unless the above patterns are
 encountered in the source code.

 For more details, please see the `SPIR-V cookbook <https://github.com/Microsoft/DirectXShaderCompiler/tree/master/docs/SPIRV-Cookbook.rst>`_,
 which contains examples of what HLSL code patterns will be accepted and
 generate valid SPIR-V for Vulkan.
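
 For instance, a shader along the following lines (all names are hypothetical)
 combines the first two patterns and would be expected to rely on legalization:

 .. code:: hlsl

   Texture2D<float4> gTexture;
   SamplerState      gSampler;

   // Resource types used as struct members.
   struct Material {
     Texture2D<float4> tex;
     SamplerState      samp;
   };

   float4 main(float2 uv : TEXCOORD0) : SV_Target {
     // Local aliases of global resource objects.
     Material m;
     m.tex  = gTexture;
     m.samp = gSampler;
     return m.tex.Sample(m.samp, uv);
   }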

 Optimization
 ~~~~~~~~~~~~

 Optimization is also delegated to SPIRV-Tools. Right now there is no difference
 between optimization levels greater than zero; they all invoke the same
 optimization recipe, namely the recipe behind ``spirv-opt -O``. If you want to
 run a custom optimization recipe, you can do so using the command-line option
 ``-Oconfig=`` and specifying a comma-separated list of your desired passes.
 The passes are invoked in the specified order.

 For example, you can specify ``-Oconfig=--loop-unroll,--scalar-replacement=300,--eliminate-dead-code-aggressive``
 to first invoke loop unrolling, then scalar replacement of aggregates, and
 finally aggressive dead code elimination. All valid options to
 ``spirv-opt`` are accepted as components of the comma-separated list.

 Here are the typical passes in alphabetical order:

 * ``--ccp``
 * ``--cfg-cleanup``
 * ``--convert-local-access-chains``
 * ``--copy-propagate-arrays``
 * ``--eliminate-dead-branches``
 * ``--eliminate-dead-code-aggressive``
 * ``--eliminate-dead-functions``
 * ``--eliminate-local-multi-store``
 * ``--eliminate-local-single-block``
 * ``--eliminate-local-single-store``
 * ``--flatten-decorations``
 * ``--if-conversion``
 * ``--inline-entry-points-exhaustive``
 * ``--local-redundancy-elimination``
 * ``--loop-fission``
 * ``--loop-fusion``
 * ``--loop-peeling`` (requires ``--loop-peeling-threshold``)
 * ``--loop-unroll``
 * ``--loop-unroll-partial=[<n>]``
 * ``--loop-unswitch``
 * ``--merge-blocks``
 * ``--merge-return``
 * ``--private-to-local``
 * ``--reduce-load-size``
 * ``--redundancy-elimination``
 * ``--remove-duplicates``
 * ``--replace-invalid-opcode``
 * ``--scalar-replacement[=<n>]``
 * ``--simplify-instructions``
 * ``--ssa-rewrite``
 * ``--vector-dce``


 In addition, there are two special batch options; each stands for a recommended
 recipe by itself:

 * ``-O``: A set of passes in an appropriate order that attempt to improve the
   performance of generated code. Same as ``spirv-opt -O``. Also same as SPIR-V
   CodeGen's default recipe.
 * ``-Os``: A set of passes in an appropriate order that attempt to reduce the
   size of the generated code. Same as ``spirv-opt -Os``.

 So if you want to run loop unrolling additionally after the default optimization
 recipe, you can specify ``-Oconfig=-O,--loop-unroll``.

 For the whole list of accepted passes and details about each one, please see
 ``spirv-opt``'s help manual (``spirv-opt --help``), or the SPIRV-Tools `optimizer header file <https://github.com/KhronosGroup/SPIRV-Tools/blob/master/include/spirv-tools/optimizer.hpp>`_.

 Validation
 ~~~~~~~~~~

 Validation is turned on by default as the last stage of SPIR-V CodeGen. Failing
 validation, which indicates there is a CodeGen bug, will trigger a fatal error.
 Please file an issue if you see that.

 Debugging
 ---------

 By default, the compiler will only emit names for types and variables as debug
 information, to aid reading of the generated SPIR-V. The ``-Zi`` option will
 let the compiler emit the following additional debug information:

 * Full path of the main source file using ``OpSource``
 * Preprocessed source code using ``OpSource`` and ``OpSourceContinued``
 * Line information for certain instructions using ``OpLine`` (WIP)
 * DXC Git commit hash using ``OpModuleProcessed`` (requires Vulkan 1.1)
 * DXC command-line options used to compile the shader using ``OpModuleProcessed``
   (requires Vulkan 1.1)

 We chose to embed preprocessed source code instead of original source code to
 avoid pulling in lots of contents unrelated to the current entry point, and
 boilerplate contents generated by engines. We may add a mode for selecting
 between preprocessed single source code and original separated source code in
 the future.

 One thing to note is that, to keep the line numbers consistent with the
 embedded source, the compiler is invoked twice: the first time to preprocess
 the source code, and the second time to compile the preprocessed source code
 as input. So using ``-Zi`` incurs a compile-time performance penalty.

 If you want to have fine-grained control over the categories of emitted debug
 information, you can use ``-fspv-debug=``. It accepts:

 * ``file``: for emitting full path of the main source file
 * ``source``: for emitting preprocessed source code (turns on ``file`` implicitly)
 * ``line``: for emitting line information (turns on ``source`` implicitly)
 * ``tool``: for emitting DXC Git commit hash and command-line options

 These ``-fspv-debug=`` options overrule ``-Zi``, and you can provide multiple
 instances of ``-fspv-debug=``. For example, you can use ``-fspv-debug=file
 -fspv-debug=tool`` to turn on emitting file path and DXC information; source
 code and line information will not be emitted.

 If you want to generate `NonSemantic.Shader.DebugInfo.100 <http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/main/nonsemantic/NonSemantic.Shader.DebugInfo.100.html>`_ extended instructions, you can use
 ``-fspv-debug=vulkan-with-source``. These instructions support source-level
 shader debugging with tools such as RenderDoc, even if the SPIR-V is optimized.
 This option overrules the other ``-fspv-debug`` options above.

 Reflection
 ----------

 Making reflection easier is one of the goals of SPIR-V CodeGen. This section
 provides guidelines about how to reflect on certain facts.

 Note that we generate ``OpName``/``OpMemberName`` instructions for various
 types/variables both explicitly defined in the source code and internally created
 by the compiler. These names are primarily for debugging purposes in the
 compiler. They have "no semantic impact and can safely be removed" according
 to the SPIR-V spec, and they are subject to change without notice. So we do
 not recommend using them for reflection.

 Source code shader profile
 ~~~~~~~~~~~~~~~~~~~~~~~~~~

 The source code shader profile version can be recovered from the "Version"
 operand of the ``OpSource`` instruction. For ``*s_<major>_<minor>``, the "Version"
 operand of ``OpSource`` will be set to ``<major>`` * 100 + ``<minor>`` * 10.
 For example, ``vs_5_1`` will have 510, and ``ps_6_2`` will have 620.

 HLSL Semantic
 ~~~~~~~~~~~~~

 HLSL semantic strings are by default not emitted into the SPIR-V binary module.
 If you need them, specify ``-fspv-reflect``; the compiler will then use
 the ``Op*DecorateStringGOOGLE`` instruction in the `SPV_GOOGLE_hlsl_functionality1 <https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/GOOGLE/SPV_GOOGLE_hlsl_functionality1.asciidoc>`_
 extension to emit them.

 HLSL User Types
 ~~~~~~~~~~~~~~~

 HLSL type information is by default not emitted into the SPIR-V binary module.
 If you need it, specify ``-fspv-reflect``; the compiler will then emit
 ``OpDecorateString*`` instructions with a ``UserTypeGOOGLE`` decoration and the
 `SPV_GOOGLE_user_type <https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/GOOGLE/SPV_GOOGLE_user_type.asciidoc>`_
 extension. A string name for the unambiguous type of the decorated object will
 be included in the user's source using the lowercase type name followed by
 template params. For example, ``Texture2DMSArray<float4, 64> arr`` would be
 decorated with ``OpDecorateString %arr UserTypeGOOGLE "texture2dmsarray:<float4,64>"``.

 Counter buffers for RW/Append/Consume StructuredBuffer
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 The association between a counter buffer and its main RW/Append/Consume
 StructuredBuffer is conveyed by the ``OpDecorateId <structured-buffer-id>
 HLSLCounterBufferGOOGLE <counter-buffer-id>`` instruction from the
 `SPV_GOOGLE_hlsl_functionality1 <https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/GOOGLE/SPV_GOOGLE_hlsl_functionality1.asciidoc>`_
 extension. This information is missing by default; you need to specify
 ``-fspv-reflect`` to direct the compiler to emit it.

 Read-only vs. read-write resource types
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 There are no clear and consistent decorations in the SPIR-V to show whether a
 resource type is translated from a read-only (RO) or read-write (RW) HLSL
 resource type. Instead, you need to use different checks for reflecting different
 resource types:

 * HLSL samplers: RO.
 * HLSL ``Buffer``/``RWBuffer``/``Texture*``/``RWTexture*``: Check the "Sampled"
   operand in the ``OpTypeImage`` instruction they are translated into. "2" means
   RW, "1" means RO.
 * HLSL constant/texture/structured/byte buffers: Check both the
   ``Block``/``BufferBlock`` and ``NonWritable`` decorations. If decorated with
   ``Block`` (``cbuffer`` & ``ConstantBuffer``), then RO; if decorated with
   ``BufferBlock`` and ``NonWritable`` (``tbuffer``, ``TextureBuffer``,
   ``StructuredBuffer``), then RO; otherwise, RW.
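
 To make the checks above concrete, the following sketch lists one hypothetical
 declaration per case. The comments restate which check applies according to
 the rules above; they are not actual compiler output.

 .. code:: hlsl

   SamplerState               samp;    // sampler: RO
   Texture2D<float4>          tex;     // OpTypeImage, Sampled = 1: RO
   RWTexture2D<float4>        rwTex;   // OpTypeImage, Sampled = 2: RW
   Buffer<float4>             buf;     // OpTypeImage, Sampled = 1: RO
   RWBuffer<float4>           rwBuf;   // OpTypeImage, Sampled = 2: RW
   StructuredBuffer<float4>   sBuf;    // BufferBlock + NonWritable: RO
   RWStructuredBuffer<float4> rwSBuf;  // BufferBlock, no NonWritable: RW

   cbuffer Constants {                 // Block: RO
     float4 scale;
   }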


 HLSL Types
 ==========

 This section lists how various HLSL types are mapped.

 Normal scalar types
 -------------------

 `Normal scalar types <https://msdn.microsoft.com/en-us/library/windows/desktop/bb509646(v=vs.85).aspx>`_
 in HLSL are relatively easy to handle and can be mapped directly to SPIR-V
 type instructions:

 =============================== ======================= ================== =========== =================================
        HLSL                      Command Line Option           SPIR-V       Capability       Extension
 =============================== ======================= ================== =========== =================================
 ``bool``                                                ``OpTypeBool``
 ``int``/``int32_t``                                     ``OpTypeInt 32 1``
 ``int16_t``                     ``-enable-16bit-types`` ``OpTypeInt 16 1`` ``Int16``
 ``uint``/``dword``/``uint32_t``                         ``OpTypeInt 32 0``
 ``uint16_t``                    ``-enable-16bit-types`` ``OpTypeInt 16 0`` ``Int16``
 ``half``                                                ``OpTypeFloat 32``
 ``half``/``float16_t``          ``-enable-16bit-types`` ``OpTypeFloat 16``             ``SPV_AMD_gpu_shader_half_float``
 ``float``/``float32_t``                                 ``OpTypeFloat 32``
 ``snorm float``                                         ``OpTypeFloat 32``
 ``unorm float``                                         ``OpTypeFloat 32``
 ``double``/``float64_t``                                ``OpTypeFloat 64`` ``Float64``
 =============================== ======================= ================== =========== =================================

 Please note that ``half`` is translated into a 32-bit floating point number
 when ``-enable-16bit-types`` is not specified, because MSDN says that "this data
 type is provided only for language compatibility. Direct3D 10 shader targets map
 all ``half`` data types to ``float`` data types."
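
 The following minimal pixel shader is a sketch of how the table reads in
 practice. The entry point and variable names are hypothetical, and the comments
 simply restate the table rows for a compilation without
 ``-enable-16bit-types``.

 .. code:: hlsl

   float4 main(float4 color : COLOR) : SV_Target {
     bool  b = color.x > 0.5;  // OpTypeBool
     int   i = (int)color.y;   // OpTypeInt 32 1
     uint  u = (uint)color.z;  // OpTypeInt 32 0
     half  h = (half)color.w;  // OpTypeFloat 32 (OpTypeFloat 16 with -enable-16bit-types)
     float f = b ? (float)i : (float)u;  // OpTypeFloat 32
     return float4(f, (float)h, 0.0, 1.0);
   }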

 Minimal precision scalar types
 ------------------------------

 HLSL also supports various
 `minimal precision scalar types <https://msdn.microsoft.com/en-us/library/windows/desktop/bb509646(v=vs.85).aspx>`_,
 which graphics drivers can implement by using any precision greater than or
 equal to their specified bit precision.
 There are no direct mappings in SPIR-V for these types. We translate them into
 the corresponding 16-bit or 32-bit scalar types with the ``RelaxedPrecision`` decoration.
 We use the 16-bit variants if the ``-enable-16bit-types`` command line option is present.
 For more information on these types, please refer to:
 https://github.com/Microsoft/DirectXShaderCompiler/wiki/16-Bit-Scalar-Types

 ============== ======================= ================== ==================== ============ =================================
     HLSL        Command Line Option          SPIR-V            Decoration       Capability        Extension
 ============== ======================= ================== ==================== ============ =================================
 ``min16float``                         ``OpTypeFloat 32`` ``RelaxedPrecision``
 ``min10float``                         ``OpTypeFloat 32`` ``RelaxedPrecision``
 ``min16int``                           ``OpTypeInt 32 1`` ``RelaxedPrecision``
 ``min12int``                           ``OpTypeInt 32 1`` ``RelaxedPrecision``
 ``min16uint``                          ``OpTypeInt 32 0`` ``RelaxedPrecision``
 ``min16float`` ``-enable-16bit-types`` ``OpTypeFloat 16``                                   ``SPV_AMD_gpu_shader_half_float``
 ``min10float`` ``-enable-16bit-types`` ``OpTypeFloat 16``                                   ``SPV_AMD_gpu_shader_half_float``
 ``min16int``   ``-enable-16bit-types`` ``OpTypeInt 16 1``                      ``Int16``
 ``min12int``   ``-enable-16bit-types`` ``OpTypeInt 16 1``                      ``Int16``
 ``min16uint``  ``-enable-16bit-types`` ``OpTypeInt 16 0``                      ``Int16``
 ============== ======================= ================== ==================== ============ =================================
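
 As a small sketch of the default mapping (no ``-enable-16bit-types``), the
 shader below uses three of the minimal precision types. The names are
 hypothetical, and the comments restate the first rows of the table above.

 .. code:: hlsl

   float4 main(float4 color : COLOR) : SV_Target {
     min16float a = (min16float)color.x;  // OpTypeFloat 32 + RelaxedPrecision
     min16int   b = (min16int)color.y;    // OpTypeInt 32 1 + RelaxedPrecision
     min16uint  c = (min16uint)color.z;   // OpTypeInt 32 0 + RelaxedPrecision
     return float4((float)a, (float)b, (float)c, 1.0);
   }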

 Vectors and matrices
 --------------------

 `Vectors <https://msdn.microsoft.com/en-us/library/windows/desktop/bb509707(v=vs.85).aspx>`_
 and `matrices <https://msdn.microsoft.com/en-us/library/windows/desktop/bb509623(v=vs.85).aspx>`_
 are translated into:

 ==================================== ====================================================
               HLSL                                         SPIR-V
 ==================================== ====================================================
 ``|type|N`` (``N`` > 1)              ``OpTypeVector |type| N``
 ``|type|1``                          The scalar type for ``|type|``
 ``|type|MxN`` (``M`` > 1, ``N`` > 1) ``%v = OpTypeVector |type| N`` ``OpTypeMatrix %v M``
 ``|type|Mx1`` (``M`` > 1)            ``OpTypeVector |type| M``
 ``|type|1xN`` (``N`` > 1)            ``OpTypeVector |type| N``
 ``|type|1x1``                        The scalar type for ``|type|``
 ==================================== ====================================================

 The above table is for float matrices.

 An MxN HLSL float matrix is translated into a SPIR-V matrix with M vectors, each with
 N elements. Conceptually HLSL matrices are row-major while SPIR-V matrices are
 column-major, thus all HLSL matrices are represented by their transposes.
 Doing so may require special handling of certain matrix operations:

 - **Indexing**: no special handling required. ``matrix[m][n]`` will still access
   the correct element since ``m``/``n`` means the ``m``-th/``n``-th row/column
   in HLSL but ``m``-th/``n``-th vector/element in SPIR-V.
 - **Per-element operation**: no special handling required.
 - **Matrix multiplication**: need to swap the operands. ``mat1 x mat2`` should
   be translated as ``transpose(mat2) x transpose(mat1)``. Then the result is
   ``transpose(mat1 x mat2)``.
 - **Storage layout**: ``row_major``/``column_major`` will be translated into
   SPIR-V ``ColMajor``/``RowMajor`` decoration. This is because HLSL matrix
   row/column becomes SPIR-V matrix column/row. If elements in a row/column are
   packed together, they should be loaded into a column/row correspondingly.

 See `Appendix A. Matrix Representation`_ for further explanation regarding these design choices.

 Since the ``Shader`` capability in SPIR-V does not allow parameterizing matrix
 types with non-floating-point types, a non-floating-point MxN matrix is translated
 into an array with M elements, with each element being a vector with N elements.
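
 To make the rules above concrete, here is a minimal sketch with one float
 matrix and one integer matrix. The variable names are hypothetical, and the
 comments restate the mappings described in this section; they are not copied
 from compiler output.

 .. code:: hlsl

   float4 main(float3 v : A) : SV_Target {
     // float2x3 (M = 2, N = 3): a SPIR-V matrix of 2 vectors with 3 floats each.
     float2x3 m = { 1.0, 2.0, 3.0,     // HLSL row 0 -> SPIR-V vector 0
                    4.0, 5.0, 6.0 };   // HLSL row 1 -> SPIR-V vector 1

     // Row 1, column 2 in HLSL; vector 1, element 2 in SPIR-V.
     float e = m[1][2];

     // Matrix multiplication is where the operand swap described above applies.
     float2 r = mul(m, v);

     // int2x3 is not a float matrix: an array of 2 vectors with 3 ints each.
     int2x3 n = { 1, 2, 3, 4, 5, 6 };

     return float4(r, e, (float)n[0][1]);
   }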

 Structs
 -------

 `Structs <https://msdn.microsoft.com/en-us/library/windows/desktop/bb509668(v=vs.85).aspx>`_
 in HLSL are defined in a format similar to C structs. They are translated
 into SPIR-V ``OpTypeStruct``. Depending on the storage classes of the instances,
 a single struct definition may generate multiple ``OpTypeStruct`` instructions
 in SPIR-V. For example, for the following HLSL source code:

 .. code:: hlsl

   struct S { ... };

   ConstantBuffer<S>   myCBuffer;
   StructuredBuffer<S> mySBuffer;

   float4 main() : A {
     S myLocalVar;
     ...
   }

 There will be three different ``OpTypeStruct`` generated, one for each variable
 defined in the above source code. This is because the ``OpTypeStruct`` for
 both ``myCBuffer`` and ``mySBuffer`` will have layout decorations (``Offset``,
 ``MatrixStride``, ``ArrayStride``, ``RowMajor``, ``ColMajor``). However, their
 layout rules are different (by default); ``myCBuffer`` will use vector-relaxed
 OpenGL ``std140`` while ``mySBuffer`` will use vector-relaxed OpenGL ``std430``.
 ``myLocalVar`` will have its ``OpTypeStruct`` without layout decorations.
 Read more about storage classes in the `Constant/Texture/Structured/Byte Buffers`_
 section.

 Structs used as stage inputs/outputs will have semantics attached to their
 members. These semantics are handled in the `entry function wrapper`_.

 Structs used as pixel shader inputs can have optional interpolation modifiers
 for their members, which will be translated according to the following table:

 =========================== ================= =====================
 HLSL Interpolation Modifier SPIR-V Decoration   SPIR-V Capability
 =========================== ================= =====================
 ``linear``                  <none>
 ``centroid``                ``Centroid``
 ``nointerpolation``         ``Flat``
 ``noperspective``           ``NoPerspective``
 ``sample``                  ``Sample``        ``SampleRateShading``
 =========================== ================= =====================
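
 A hedged sketch of a pixel shader input struct that uses each modifier follows.
 The struct, member names, and semantics are hypothetical; the comments restate
 the table above.

 .. code:: hlsl

   struct PSInput {
                     float4 pos    : SV_Position;
     linear          float4 color  : COLOR0;     // no decoration
     centroid        float2 uv     : TEXCOORD0;  // Centroid
     nointerpolation uint   id     : TEXCOORD1;  // Flat
     noperspective   float2 screen : TEXCOORD2;  // NoPerspective
     sample          float3 normal : NORMAL;     // Sample, needs SampleRateShading
   };

   float4 main(PSInput input) : SV_Target {
     float4 c = input.color * float4(input.normal, 1.0);
     c.xy += input.uv + input.screen;
     c.w  += (float)input.id;
     return c;
   }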

 Arrays
 ------

 Sized (either explicitly or implicitly) arrays are translated into SPIR-V
 ``OpTypeArray``. Unsized arrays are translated into ``OpTypeRuntimeArray``.

 Arrays, if used for external resources (residing in SPIR-V ``Uniform`` or
 ``UniformConstant`` storage class), will need layout decorations like the SPIR-V
 ``ArrayStride`` decoration. For arrays of opaque types, e.g., HLSL textures
 or samplers, we don't decorate with ``ArrayStride`` decorations since there are
 no meaningful strides. Similarly for arrays of structured/byte buffers.
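
 The sketch below contrasts a sized array of an opaque type with a sized array
 inside a cbuffer. All names are hypothetical, and the comments restate the
 rules above rather than actual compiler output.

 .. code:: hlsl

   Texture2D    gTextures[4];  // OpTypeArray of an opaque type: no ArrayStride
   SamplerState gSampler;

   cbuffer Params {
     float4 gWeights[4];       // OpTypeArray in a uniform buffer: has ArrayStride
   };

   float4 main(float2 uv : TEXCOORD0) : SV_Target {
     float4 sum = float4(0.0, 0.0, 0.0, 0.0);
     for (uint i = 0; i < 4; ++i)
       sum += gTextures[i].Sample(gSampler, uv) * gWeights[i];
     return sum;
   }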

 User-defined types
 ------------------

 `User-defined types <https://msdn.microsoft.com/en-us/library/windows/desktop/bb509702(v=vs.85).aspx>`_
 are type aliases introduced by typedef. No new types are introduced and we can
 rely on Clang to resolve to the original types.

 Samplers
 --------

 All `sampler types <https://msdn.microsoft.com/en-us/library/windows/desktop/bb509644(v=vs.85).aspx>`_
 will be translated into SPIR-V ``OpTypeSampler``.

 SPIR-V ``OpTypeSampler`` is an opaque type that cannot be parameterized;
 therefore state assignments on sampler types are not supported (yet).

 Textures
 --------

 `Texture types <https://msdn.microsoft.com/en-us/library/windows/desktop/bb509700(v=vs.85).aspx>`_
 are translated into SPIR-V ``OpTypeImage``, with parameters:

 ======================= ==================== ===== =================== ========== ===== ======= == ======= ================ =================
        HLSL                   Vulkan                                        SPIR-V
 ----------------------- -------------------------- ------------------------------------------------------------------------------------------
      Texture Type         Descriptor Type    RO/RW    Storage Class        Dim    Depth Arrayed MS Sampled   Image Format      Capability
 ======================= ==================== ===== =================== ========== ===== ======= == ======= ================ =================
 ``Texture1D``           Sampled Image         RO   ``UniformConstant`` ``1D``      2       0    0    1     ``Unknown``
 ``Texture2D``           Sampled Image         RO   ``UniformConstant`` ``2D``      2       0    0    1     ``Unknown``
 ``Texture3D``           Sampled Image         RO   ``UniformConstant`` ``3D``      2       0    0    1     ``Unknown``
 ``TextureCube``         Sampled Image         RO   ``UniformConstant`` ``Cube``    2       0    0    1     ``Unknown``
 ``Texture1DArray``      Sampled Image         RO   ``UniformConstant`` ``1D``      2       1    0    1     ``Unknown``
 ``Texture2DArray``      Sampled Image         RO   ``UniformConstant`` ``2D``      2       1    0    1     ``Unknown``
 ``Texture2DMS``         Sampled Image         RO   ``UniformConstant`` ``2D``      2       0    1    1     ``Unknown``
 ``Texture2DMSArray``    Sampled Image         RO   ``UniformConstant`` ``2D``      2       1    1    1     ``Unknown``      ``ImageMSArray``
 ``TextureCubeArray``    Sampled Image         RO   ``UniformConstant`` ``Cube``    2       1    0    1     ``Unknown``
 ``Buffer<T>``           Uniform Texel Buffer  RO   ``UniformConstant`` ``Buffer``  2       0    0    1     Depends on ``T`` ``SampledBuffer``
 ``RWBuffer<T>``         Storage Texel Buffer  RW   ``UniformConstant`` ``Buffer``  2       0    0    2     Depends on ``T`` ``SampledBuffer``
 ``RWTexture1D<T>``      Storage Image         RW   ``UniformConstant`` ``1D``      2       0    0    2     Depends on ``T``
 ``RWTexture2D<T>``      Storage Image         RW   ``UniformConstant`` ``2D``      2       0    0    2     Depends on ``T``
 ``RWTexture3D<T>``      Storage Image         RW   ``UniformConstant`` ``3D``      2       0    0    2     Depends on ``T``
 ``RWTexture1DArray<T>`` Storage Image         RW   ``UniformConstant`` ``1D``      2       1    0    2     Depends on ``T``
 ``RWTexture2DArray<T>`` Storage Image         RW   ``UniformConstant`` ``2D``      2       1    0    2     Depends on ``T``
 ======================= ==================== ===== =================== ========== ===== ======= == ======= ================ =================

 The meanings of the headers in the above table are explained in ``OpTypeImage``
 of the SPIR-V spec.
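
 As a hedged reading aid for the table, the compute shader below declares a few
 texture types and annotates each with its ``OpTypeImage`` operands. The
 resource names are hypothetical, and the comments restate table rows.

 .. code:: hlsl

   Texture2D<float4>      gColor;   // Dim 2D,   Arrayed 0, MS 0, Sampled 1, Unknown
   Texture2DArray<float4> gLayers;  // Dim 2D,   Arrayed 1, MS 0, Sampled 1, Unknown
   Texture2DMS<float4>    gMS;      // Dim 2D,   Arrayed 0, MS 1, Sampled 1, Unknown
   TextureCube<float4>    gEnv;     // Dim Cube, Arrayed 0, MS 0, Sampled 1, Unknown
   RWTexture2D<float4>    gOut;     // Dim 2D,   Arrayed 0, MS 0, Sampled 2, format from T
   SamplerState           gSampler;

   [numthreads(8, 8, 1)]
   void main(uint3 id : SV_DispatchThreadID) {
     float2 uv = (float2(id.xy) + 0.5) / 64.0;
     float4 c  = gColor.SampleLevel(gSampler, uv, 0);
     c += gLayers.SampleLevel(gSampler, float3(uv, 0.0), 0);
     c += gMS.Load(int2(id.xy), 0);
     c += gEnv.SampleLevel(gSampler, float3(uv, 1.0), 0);
     gOut[id.xy] = c;
   }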

 Vulkan specific Image Formats
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Since HLSL lacks the syntax for fully specifying image formats for textures in
 SPIR-V, we introduce ``[[vk::image_format("FORMAT")]]`` attribute for texture types.
 For example,

 .. code:: hlsl

   [[vk::image_format("rgba8")]]
   RWBuffer<float4> Buf;

   [[vk::image_format("rg16f")]]
   RWTexture2D<float2> Tex;

   RWTexture2D<float2> Tex2; // Works like before

 ``rgba8`` means ``Rgba8`` `SPIR-V Image Format <https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_image_format_a_image_format>`_.
 The following table lists the mapping between ``FORMAT`` of
 ``[[vk::image_format("FORMAT")]]`` and its corresponding SPIR-V Image Format.

 ======================= ============================================
        FORMAT                   SPIR-V Image Format
 ======================= ============================================
 ``unknown``             ``Unknown``
 ``rgba32f``             ``Rgba32f``
 ``rgba16f``             ``Rgba16f``
 ``r32f``                ``R32f``
 ``rgba8``               ``Rgba8``
 ``rgba8snorm``          ``Rgba8Snorm``
 ``rg32f``               ``Rg32f``
 ``rg16f``               ``Rg16f``
 ``r11g11b10f``          ``R11fG11fB10f``
 ``r16f``                ``R16f``
 ``rgba16``              ``Rgba16``
 ``rgb10a2``             ``Rgb10A2``
 ``rg16``                ``Rg16``
 ``rg8``                 ``Rg8``
 ``r16``                 ``R16``
 ``r8``                  ``R8``
 ``rgba16snorm``         ``Rgba16Snorm``
 ``rg16snorm``           ``Rg16Snorm``
 ``rg8snorm``            ``Rg8Snorm``
 ``r16snorm``            ``R16Snorm``
 ``r8snorm``             ``R8Snorm``
 ``rgba32i``             ``Rgba32i``
 ``rgba16i``             ``Rgba16i``
 ``rgba8i``              ``Rgba8i``
 ``r32i``                ``R32i``
 ``rg32i``               ``Rg32i``
 ``rg16i``               ``Rg16i``
 ``rg8i``                ``Rg8i``
 ``r16i``                ``R16i``
 ``r8i``                 ``R8i``
 ``rgba32ui``            ``Rgba32ui``
 ``rgba16ui``            ``Rgba16ui``
 ``rgba8ui``             ``Rgba8ui``
 ``r32ui``               ``R32ui``
 ``rgb10a2ui``           ``Rgb10a2ui``
 ``rg32ui``              ``Rg32ui``
 ``rg16ui``              ``Rg16ui``
 ``rg8ui``               ``Rg8ui``
 ``r16ui``               ``R16ui``
 ``r8ui``                ``R8ui``
 ``r64ui``               ``R64ui``
 ``r64i``                ``R64i``
 ======================= ============================================

 Constant/Texture/Structured/Byte Buffers
 ----------------------------------------

 There are several buffer types in HLSL:

 - ``cbuffer`` and ``ConstantBuffer``
 - ``tbuffer`` and ``TextureBuffer``
 - ``StructuredBuffer`` and ``RWStructuredBuffer``
 - ``AppendStructuredBuffer`` and ``ConsumeStructuredBuffer``
 - ``ByteAddressBuffer`` and ``RWByteAddressBuffer``

 Note that ``Buffer`` and ``RWBuffer`` are considered texture objects in HLSL.
 They are listed in the above section.

 Please see the following sections for the details of each type. As a summary:

 =========================== ================== ================================ ==================== =================
          HLSL Type          Vulkan Buffer Type    Default Memory Layout Rule    SPIR-V Storage Class SPIR-V Decoration
 =========================== ================== ================================ ==================== =================
 ``cbuffer``                   Uniform Buffer   Vector-relaxed OpenGL ``std140``      ``Uniform``     ``Block``
 ``ConstantBuffer``            Uniform Buffer   Vector-relaxed OpenGL ``std140``      ``Uniform``     ``Block``
 ``tbuffer``                   Storage Buffer   Vector-relaxed OpenGL ``std430``      ``Uniform``     ``BufferBlock``
 ``TextureBuffer``             Storage Buffer   Vector-relaxed OpenGL ``std430``      ``Uniform``     ``BufferBlock``
 ``StructuredBuffer``          Storage Buffer   Vector-relaxed OpenGL ``std430``      ``Uniform``     ``BufferBlock``
 ``RWStructuredBuffer``        Storage Buffer   Vector-relaxed OpenGL ``std430``      ``Uniform``     ``BufferBlock``
 ``AppendStructuredBuffer``    Storage Buffer   Vector-relaxed OpenGL ``std430``      ``Uniform``     ``BufferBlock``
 ``ConsumeStructuredBuffer``   Storage Buffer   Vector-relaxed OpenGL ``std430``      ``Uniform``     ``BufferBlock``
 ``ByteAddressBuffer``         Storage Buffer   Vector-relaxed OpenGL ``std430``      ``Uniform``     ``BufferBlock``
 ``RWByteAddressBuffer``       Storage Buffer   Vector-relaxed OpenGL ``std430``      ``Uniform``     ``BufferBlock``
 =========================== ================== ================================ ==================== =================

 To know more about the Vulkan buffer types, please refer to the Vulkan spec
 `13.1 Descriptor Types <https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#descriptorsets-types>`_.

 Memory layout rules
 ~~~~~~~~~~~~~~~~~~~

 SPIR-V CodeGen supports four sets of memory layout rules for buffer resources
 right now:

 1. Vector-relaxed OpenGL ``std140`` for uniform buffers and vector-relaxed
    OpenGL ``std430`` for storage buffers: these rules satisfy Vulkan `"Standard
    Uniform Buffer Layout" and "Standard Storage Buffer Layout" <https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#interfaces-resources-layout>`_,
    respectively.
    They are the default.
 2. DirectX memory layout rules for uniform buffers and storage buffers:
    they allow packing data on the application side that can be shared with
    DirectX. They can be enabled by ``-fvk-use-dx-layout``.
 3. Strict OpenGL ``std140`` for uniform buffers and strict OpenGL ``std430``
    for storage buffers: they allow packing data on the application side that
    can be shared with OpenGL. They can be enabled by ``-fvk-use-gl-layout``.
 4. Scalar layout rules introduced via ``VK_EXT_scalar_block_layout``, which
    basically align all aggregate types according to their elements'
    natural alignment. They can be enabled by ``-fvk-use-scalar-layout``.

 To use scalar layout, the application side needs to request
 ``VK_EXT_scalar_block_layout``. This is also true for using DirectX memory
 layout since there is no dedicated DirectX layout extension for Vulkan
 (at least for now). So we must request something more permissive.

 In the above, "vector-relaxed OpenGL ``std140``/``std430``" rules mean OpenGL
 ``std140``/``std430`` rules with the following modification for vector type
 alignment:

 1. The alignment of a vector type is set to be the alignment of its element type
 2. If the above causes an `improper straddle <https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#interfaces-resources-layout>`_,
    the alignment will be set to 16 bytes.

 As an example, for the following HLSL definition:

 .. code:: hlsl

   struct S {
       float3 f;
   };

   struct T {
                 float    a_float;
                 float3   b_float3;
                 S        c_S_float3;
                 float2x3 d_float2x3;
       row_major float2x3 e_float2x3;
                 int      f_int_3[3];
                 float2   g_float2_2[2];
   };

 We will have the following offsets for each member:

 ============== ====== ====== ====== ========== ====== ====== ====== ==========
      HLSL             Uniform Buffer                Storage Buffer
 -------------- ------------------------------- -------------------------------
     Member     1 (VK) 2 (DX) 3 (GL) 4 (Scalar) 1 (VK) 2 (DX) 3 (GL) 4 (Scalar)
 ============== ====== ====== ====== ========== ====== ====== ====== ==========
 ``a_float``      0      0      0        0        0      0     0        0
 ``b_float3``     4      4      16       4        4      4     16       4
 ``c_S_float3``   16     16     32       16       16     16    32       16
 ``d_float2x3``   32     32     48       28       32     28    48       28
 ``e_float2x3``   80     80     96       52       64     52    80       52
 ``f_int_3``      112    112    128      76       96     76    112      76
 ``g_float2_2``   160    160    176      88       112    88    128      88
 ============== ====== ====== ====== ========== ====== ====== ====== ==========

 If the above layout rules do not satisfy your needs and you want to manually
 control the layout of struct members, you can use either

 * The native HLSL ``:packoffset()`` attribute: only available for cbuffers; or
 * The Vulkan-specific ``[[vk::offset()]]`` attribute: applies to all resources.

 ``[[vk::offset]]`` overrules ``:packoffset``. Attaching ``[[vk::offset]]``
 to a struct member affects all variables of the struct type in question. So
 sharing the same struct definition having ``[[vk::offset]]`` annotations means
 also sharing the layout.
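
 A minimal sketch of both attributes follows. The struct, cbuffer, and variable
 names as well as the chosen offsets are hypothetical; as noted in this section,
 the compiler is expected to emit the offsets as written.

 .. code:: hlsl

   struct S {
                        float4 a;  // offset decided by the active layout rule
     [[vk::offset(32)]] float4 b;  // explicit offset for every variable of type S
   };

   cbuffer Params {
     float4 tint  : packoffset(c0);    // native HLSL attribute, cbuffer only
     float  scale : packoffset(c1.x);
   };

   StructuredBuffer<S>        gData;
   RWStructuredBuffer<float4> gOut;

   [numthreads(1, 1, 1)]
   void main(uint3 id : SV_DispatchThreadID) {
     S s = gData[id.x];
     gOut[id.x] = (s.a + s.b) * tint * scale;
   }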

 For global variables (which are collected into the ``$Globals`` cbuffer), you
 can use the native HLSL ``:register(c#)`` attribute. Note that ``[[vk::offset]]``
 and ``:packoffset`` cannot be applied to these variables.

 If ``register(cX)`` is used on any global variable, the offset for that variable
 is set to ``X * 16``, and the offset for all other global variables without the
 ``register(c#)`` annotation will be set to the next available address after
 the highest explicit address. For example:

 .. code:: hlsl

   float x : register(c10);   // Offset = 160 (10 * 16)
   float y;                   // Offset = 164 (160 + 4)
   float z: register(c1);     // Offset = 16  (1  * 16)


 These attributes give great flexibility but also responsibility to the
 developer; the compiler will just take in what is specified in the source code
 and emit it to SPIR-V with no error checking.

 ``cbuffer`` and ``ConstantBuffer``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 These two buffer types are treated as uniform buffers using Vulkan's
 terminology. They are translated into an ``OpTypeStruct`` with the
 necessary layout decorations (``Offset``, ``ArrayStride``, ``MatrixStride``,
 ``RowMajor``, ``ColMajor``) and the ``Block`` decoration. The layout rule
 used is vector-relaxed OpenGL ``std140`` (by default). A variable declared as
 one of these types will be placed in the ``Uniform`` storage class.

 For example, the following HLSL source code:

 .. code:: hlsl

   struct T {
     float  a;
     float3 b;
   };

   ConstantBuffer<T> myCBuffer;

 will be translated into

 .. code:: spirv

   ; Layout decoration
   OpMemberDecorate %type_ConstantBuffer_T 0 Offset 0
   OpMemberDecorate %type_ConstantBuffer_T 1 Offset 4
   ; Block decoration
   OpDecorate %type_ConstantBuffer_T Block

   ; Types
   %type_ConstantBuffer_T = OpTypeStruct %float %v3float
   %_ptr_Uniform_type_ConstantBuffer_T = OpTypePointer Uniform %type_ConstantBuffer_T

   ; Variable
   %myCBuffer = OpVariable %_ptr_Uniform_type_ConstantBuffer_T Uniform

 ``tbuffer`` and ``TextureBuffer``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 These two buffer types are treated as storage buffers using Vulkan's
 terminology. They are translated into an ``OpTypeStruct`` with the
 necessary layout decorations (``Offset``, ``ArrayStride``, ``MatrixStride``,
 ``RowMajor``, ``ColMajor``) and the ``BufferBlock`` decoration. All the struct
 members are also decorated with ``NonWritable`` decoration. The layout rule
 used is vector-relaxed OpenGL ``std430`` (by default). A variable declared as
 one of these types will be placed in the ``Uniform`` storage class.
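
 As a hedged sketch, the shader below declares one buffer of each kind. The
 names are hypothetical. Following the description above, both would be expected
 to become ``BufferBlock`` structs with ``NonWritable`` members in the
 ``Uniform`` storage class.

 .. code:: hlsl

   struct T {
     float  a;
     float3 b;
   };

   tbuffer MyTBuffer {
     float4 gScale;
   };

   TextureBuffer<T> myTextureBuffer;

   float4 main() : SV_Target {
     return gScale * float4(myTextureBuffer.b, myTextureBuffer.a);
   }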


 ``StructuredBuffer`` and ``RWStructuredBuffer``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 ``StructuredBuffer<T>``/``RWStructuredBuffer<T>`` is treated as a storage buffer
 using Vulkan's terminology. It is translated into an ``OpTypeStruct`` containing
 an ``OpTypeRuntimeArray`` of type ``T``, with necessary layout decorations
 (``Offset``, ``ArrayStride``, ``MatrixStride``, ``RowMajor``, ``ColMajor``) and
 the ``BufferBlock`` decoration.  The default layout rule used is vector-relaxed
 OpenGL ``std430``. A variable declared as one of these types will be placed in
 the ``Uniform`` storage class.

 For ``RWStructuredBuffer<T>``, each variable will have an associated counter
 variable generated. The counter variable will be of ``OpTypeStruct`` type, which
 only contains a 32-bit integer. The counter variable takes its own binding
 number. ``.IncrementCounter()``/``.DecrementCounter()`` will modify this counter
 variable.

 For example, the following HLSL source code:

 .. code:: hlsl

   struct T {
     float  a;
     float3 b;
   };

   StructuredBuffer<T> mySBuffer;

 will be translated into

 .. code:: spirv

   ; Layout decoration
   OpMemberDecorate %T 0 Offset 0
   OpMemberDecorate %T 1 Offset 4
   OpDecorate %_runtimearr_T ArrayStride 16
   OpMemberDecorate %type_StructuredBuffer_T 0 Offset 0
   OpMemberDecorate %type_StructuredBuffer_T 0 NonWritable
   ; BufferBlock decoration
   OpDecorate %type_StructuredBuffer_T BufferBlock

   ; Types
   %T = OpTypeStruct %float %v3float
   %_runtimearr_T = OpTypeRuntimeArray %T
   %type_StructuredBuffer_T = OpTypeStruct %_runtimearr_T
   %_ptr_Uniform_type_StructuredBuffer_T = OpTypePointer Uniform %type_StructuredBuffer_T

   ; Variable
   %mySBuffer = OpVariable %_ptr_Uniform_type_StructuredBuffer_T Uniform
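
 The counter variable described earlier for ``RWStructuredBuffer<T>`` comes into
 play when the shader calls the counter methods. The following is a minimal
 sketch with hypothetical names; it only shows the HLSL side.

 .. code:: hlsl

   struct T {
     float  a;
     float3 b;
   };

   RWStructuredBuffer<T> myRWSBuffer;

   [numthreads(64, 1, 1)]
   void main(uint3 id : SV_DispatchThreadID) {
     // Modifies the counter variable associated with myRWSBuffer.
     uint slot = myRWSBuffer.IncrementCounter();

     T value;
     value.a = (float)id.x;
     value.b = float3(0.0, 0.0, 0.0);
     myRWSBuffer[slot] = value;
   }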

 ``AppendStructuredBuffer`` and ``ConsumeStructuredBuffer``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 ``AppendStructuredBuffer<T>``/``ConsumeStructuredBuffer<T>`` is treated as a
 storage buffer using Vulkan's terminology. It is translated into an
 ``OpTypeStruct`` containing an ``OpTypeRuntimeArray`` of type ``T``, with
 necessary layout decorations (``Offset``, ``ArrayStride``, ``MatrixStride``,
 ``RowMajor``, ``ColMajor``) and the ``BufferBlock`` decoration. The default
 layout rule used is vector-relaxed OpenGL ``std430``.

 A variable declared as one of these types will be placed in the ``Uniform``
 storage class. Besides, each variable will have an associated counter variable
 generated. The counter variable will be of ``OpTypeStruct`` type, which only
 contains a 32-bit integer. The integer is the total number of elements in the
 buffer. The counter variable takes its own binding number.
 ``.Append()``/``.Consume()`` will use the counter variable as the index and
 adjust it accordingly.

 For example, the following HLSL source code:

 .. code:: hlsl

   struct T {
     float  a;
     float3 b;
   };

   AppendStructuredBuffer<T> myASbuffer;

 will be translated into

 .. code:: spirv

   ; Layout decorations
   OpMemberDecorate %T 0 Offset 0
   OpMemberDecorate %T 1 Offset 4
   OpDecorate %_runtimearr_T ArrayStride 16
   OpMemberDecorate %type_AppendStructuredBuffer_T 0 Offset 0
   OpDecorate %type_AppendStructuredBuffer_T BufferBlock
   OpMemberDecorate %type_ACSBuffer_counter 0 Offset 0
   OpDecorate %type_ACSBuffer_counter BufferBlock

   ; Binding numbers
   OpDecorate %myASbuffer DescriptorSet 0
   OpDecorate %myASbuffer Binding 0
   OpDecorate %counter_var_myASbuffer DescriptorSet 0
   OpDecorate %counter_var_myASbuffer Binding 1

   ; Types
   %T = OpTypeStruct %float %v3float
   %_runtimearr_T = OpTypeRuntimeArray %T
   %type_AppendStructuredBuffer_T = OpTypeStruct %_runtimearr_T
   %_ptr_Uniform_type_AppendStructuredBuffer_T = OpTypePointer Uniform %type_AppendStructuredBuffer_T
   %type_ACSBuffer_counter = OpTypeStruct %int
   %_ptr_Uniform_type_ACSBuffer_counter = OpTypePointer Uniform %type_ACSBuffer_counter

   ; Variables
   %myASbuffer = OpVariable %_ptr_Uniform_type_AppendStructuredBuffer_T Uniform
   %counter_var_myASbuffer = OpVariable %_ptr_Uniform_type_ACSBuffer_counter Uniform
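
 For completeness, here is a hedged sketch of a shader that exercises both
 counters. The buffer names are hypothetical. Per the description above, each
 buffer gets its own counter variable and binding number.

 .. code:: hlsl

   struct T {
     float  a;
     float3 b;
   };

   ConsumeStructuredBuffer<T> myCSbuffer;
   AppendStructuredBuffer<T>  myASbuffer;

   [numthreads(1, 1, 1)]
   void main() {
     T item = myCSbuffer.Consume();  // indexes via the counter of myCSbuffer
     item.a *= 2.0;
     myASbuffer.Append(item);        // indexes via the counter of myASbuffer
   }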

 ``ByteAddressBuffer`` and ``RWByteAddressBuffer``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 ``ByteAddressBuffer``/``RWByteAddressBuffer`` is treated as a storage buffer using
 Vulkan's terminology. It is translated into an ``OpTypeStruct`` containing an
 ``OpTypeRuntimeArray`` of 32-bit unsigned integers, with ``BufferBlock``
 decoration.

 A variable declared as one of these types will be placed in the ``Uniform``
 storage class.

 For example, the following HLSL source code:

 .. code:: hlsl

   ByteAddressBuffer   myBuffer1;
   RWByteAddressBuffer myBuffer2;

 will be translated into

 .. code:: spirv

   ; Layout decorations

   OpDecorate %_runtimearr_uint ArrayStride 4

   OpDecorate %type_ByteAddressBuffer BufferBlock
   OpMemberDecorate %type_ByteAddressBuffer 0 Offset 0
   OpMemberDecorate %type_ByteAddressBuffer 0 NonWritable

   OpDecorate %type_RWByteAddressBuffer BufferBlock
   OpMemberDecorate %type_RWByteAddressBuffer 0 Offset 0

   ; Types

   %_runtimearr_uint = OpTypeRuntimeArray %uint

   %type_ByteAddressBuffer = OpTypeStruct %_runtimearr_uint
   %_ptr_Uniform_type_ByteAddressBuffer = OpTypePointer Uniform %type_ByteAddressBuffer

   %type_RWByteAddressBuffer = OpTypeStruct %_runtimearr_uint
   %_ptr_Uniform_type_RWByteAddressBuffer = OpTypePointer Uniform %type_RWByteAddressBuffer

   ; Variables

   %myBuffer1 = OpVariable %_ptr_Uniform_type_ByteAddressBuffer Uniform
   %myBuffer2 = OpVariable %_ptr_Uniform_type_RWByteAddressBuffer Uniform
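
 A minimal usage sketch for the two buffers follows, assuming a hypothetical
 compute entry point. HLSL addresses these buffers in bytes, while the SPIR-V
 struct above holds a runtime array of 32-bit unsigned integers.

 .. code:: hlsl

   ByteAddressBuffer   myBuffer1;
   RWByteAddressBuffer myBuffer2;

   [numthreads(1, 1, 1)]
   void main(uint3 id : SV_DispatchThreadID) {
     uint byteOffset = id.x * 4;               // one 32-bit word per thread
     uint value = myBuffer1.Load(byteOffset);  // read-only buffer
     myBuffer2.Store(byteOffset, value + 1);   // read-write buffer
   }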

 HLSL Variables and Resources
 ============================

 This section lists how various HLSL variables and resources are mapped.

 According to `Shader Constants <https://msdn.microsoft.com/en-us/library/windows/desktop/bb509581(v=vs.85).aspx>`_,

   There are two default constant buffers available, $Global and $Param. Variables
   that are placed in the global scope are added implicitly to the $Global cbuffer,
   using the same packing method that is used for cbuffers. Uniform parameters in
   the parameter list of a function appear in the $Param constant buffer when a
   shader is compiled outside of the effects framework.

 So all global externally-visible non-resource-type stand-alone variables will
 be collected into a cbuffer named as ``$Globals``, no matter whether they are
 statically referenced by the entry point or not. The ``$Globals`` cbuffer
 follows the same layout rules as a normal cbuffer.

 Storage class
 -------------

 Normal local variables (without any modifier) will be placed in the ``Function``
 SPIR-V storage class. Normal global variables (without any modifier) will be
 placed in the ``Uniform`` or ``UniformConstant`` storage class.

 - ``static``

   - Global variables with ``static`` modifier will be placed in the ``Private``
     SPIR-V storage class. Initializers of such global variables will be translated
     into SPIR-V ``OpVariable`` initializers if possible; otherwise, they will be
     initialized at the very beginning of the `entry function wrapper`_ using
     SPIR-V ``OpStore``.
   - Local variables with ``static`` modifier will also be placed in the
     ``Private`` SPIR-V storage class. Initializers of such local variables will
     also be translated into SPIR-V ``OpVariable`` initializers if possible;
     otherwise, they will be initialized at the very beginning of the enclosing
     function. To make sure that such a local variable is only initialized once,
     a second boolean variable of the ``Private`` SPIR-V storage class will be
     generated to mark its initialization status.

 - ``groupshared``

   - Global variables with ``groupshared`` modifier will be placed in the
     ``Workgroup`` storage class.
   - Note that this modifier overrules ``static``; if both ``groupshared`` and
     ``static`` are applied to a variable, ``static`` will be ignored.

 - ``uniform``

   - This does not affect codegen. Variables will be treated like normal global
     variables.

 - ``extern``

   - This does not affect codegen. Variables will be treated like normal global
     variables.

 - ``shared``

   - This is a hint to the compiler. It will be ignored.

 - ``volatile``

   - This is a hint to the compiler. It will be ignored.
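
 The modifiers above can be illustrated with a small compute shader. This is
 only a sketch: the variable names are hypothetical, and the storage classes in
 the comments are what the rules in this section would be expected to produce.

 .. code:: hlsl

   float gScale;                  // normal global: Uniform (via $Globals)
   Texture2D gTexture;            // normal global resource: UniformConstant
   static float gBias = 0.5;      // Private; constant initializer, so an
                                  // OpVariable initializer can be used
   groupshared float gTile[64];   // Workgroup

   [numthreads(64, 1, 1)]
   void main(uint idx : SV_GroupIndex) {
     float local = gScale + gBias;       // Function

     // Private; constant initializer, so an OpVariable initializer
     // can be used.
     static uint sCalls = 0;

     // Private; the initializer is not a constant, so it is expected to
     // be stored at the start of this function, guarded by an extra
     // Private boolean so that it happens only once.
     static float sFirst = gScale * 2.0;

     sCalls += 1;
     gTile[idx] = local * sFirst * sCalls;
   }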

 HLSL semantic and Vulkan ``Location``
 -------------------------------------

 Direct3D uses HLSL "`semantics <https://msdn.microsoft.com/en-us/library/windows/desktop/bb509647(v=vs.85).aspx>`_"
 to compose and match the interfaces between subsequent stages. These semantic
 strings can appear after struct members, function parameters and return
 values. E.g.,

 .. code:: hlsl

   struct VSInput {
     float4 pos  : POSITION;
     float3 norm : NORMAL;
   };

   float4 VSMain(in  VSInput input,
                 in  float4  tex   : TEXCOORD,
                 out float4  pos   : SV_Position) : TEXCOORD {
     pos = input.pos;
     return tex;
   }

 In contrast, Vulkan stage input and output interface matching is via explicit
 ``Location`` numbers. Details can be found `here <https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#interfaces-iointerfaces>`_.

 To translate HLSL to SPIR-V for Vulkan, semantic strings need to be mapped to
 Vulkan ``Location`` numbers properly. This can be done either explicitly via
 information provided by the developer or implicitly by the compiler.

 Explicit ``Location`` number assignment
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 ``[[vk::location(X)]]`` can be attached to the entities where semantics are
 allowed (struct fields, function parameters, and function returns).
 For the above example we can have:

 .. code:: hlsl

   struct VSInput {
     [[vk::location(0)]] float4 pos  : POSITION;
     [[vk::location(1)]] float3 norm : NORMAL;
   };

   [[vk::location(1)]]
   float4 VSMain(in  VSInput input,
                 [[vk::location(2)]]
                 in  float4  tex     : TEXCOORD,
                 out float4  pos     : SV_Position) : TEXCOORD {
     pos = input.pos;
     return tex;
   }

 In the above, input ``POSITION``, ``NORMAL``, and ``TEXCOORD`` will be mapped to
 ``Location`` 0, 1, and 2, respectively, and output ``TEXCOORD`` will be mapped
 to ``Location`` 1.

 [TODO] Another explicit way: using command-line options

 Please note that the compiler prohibits mixing the explicit and implicit
 approaches for the same SigPoint to avoid complexity and fallibility. However,
 within a given shader stage, it is permitted for one SigPoint to use the
 explicit approach while another uses the implicit approach.
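
 As a hedged illustration of this rule (struct and function names are
 hypothetical), the vertex shader below uses explicit locations for every input
 at the VSIn SigPoint and leaves all outputs at the VSOut SigPoint to the
 compiler. Annotating only one of the two ``VSInput`` fields would mix both
 approaches within VSIn and would therefore be expected to be rejected.

 .. code:: hlsl

   // VSIn: every input carries an explicit Location.
   struct VSInput {
     [[vk::location(0)]] float4 pos : POSITION;
     [[vk::location(1)]] float2 uv  : TEXCOORD0;
   };

   // VSOut: no explicit Location; assignment is left to the compiler.
   struct VSOutput {
     float4 pos : SV_Position;
     float2 uv  : TEXCOORD0;
   };

   VSOutput VSMain(VSInput input) {
     VSOutput output;
     output.pos = input.pos;
     output.uv  = input.uv;
     return output;
   }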

 Implicit ``Location`` number assignment
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Without hints from the developer, the compiler will try its best to map
 semantics to ``Location`` numbers. However, there is no single rule for this
 mapping; semantic strings should be handled case by case.

 Firstly, under certain `SigPoints <https://github.com/Microsoft/DirectXShaderCompiler/blob/master/docs/DXIL.rst#hlsl-signatures-and-semantics>`_,
 some system-value (SV) semantic strings will be translated into SPIR-V
 ``BuiltIn`` decorations:

 .. table:: Mapping from HLSL SV semantic to SPIR-V builtin and execution mode

 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 | HLSL Semantic             | SigPoint    | SPIR-V ``BuiltIn``                     | SPIR-V Execution Mode |   SPIR-V Capability         |
 +===========================+=============+========================================+=======================+=============================+
 |                           | VSOut       | ``Position``                           | N/A                   | ``Shader``                  |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | HSCPIn      | ``Position``                           | N/A                   | ``Shader``                  |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | HSCPOut     | ``Position``                           | N/A                   | ``Shader``                  |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | DSCPIn      | ``Position``                           | N/A                   | ``Shader``                  |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 | SV_Position               | DSOut       | ``Position``                           | N/A                   | ``Shader``                  |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | GSVIn       | ``Position``                           | N/A                   | ``Shader``                  |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | GSOut       | ``Position``                           | N/A                   | ``Shader``                  |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | PSIn        | ``FragCoord``                          | N/A                   | ``Shader``                  |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | MSOut       | ``Position``                           | N/A                   | ``Shader``                  |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | VSOut       | ``ClipDistance``                       | N/A                   | ``ClipDistance``            |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | HSCPIn      | ``ClipDistance``                       | N/A                   | ``ClipDistance``            |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | HSCPOut     | ``ClipDistance``                       | N/A                   | ``ClipDistance``            |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | DSCPIn      | ``ClipDistance``                       | N/A                   | ``ClipDistance``            |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 | SV_ClipDistance           | DSOut       | ``ClipDistance``                       | N/A                   | ``ClipDistance``            |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | GSVIn       | ``ClipDistance``                       | N/A                   | ``ClipDistance``            |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | GSOut       | ``ClipDistance``                       | N/A                   | ``ClipDistance``            |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | PSIn        | ``ClipDistance``                       | N/A                   | ``ClipDistance``            |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | MSOut       | ``ClipDistance``                       | N/A                   | ``ClipDistance``            |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | VSOut       | ``CullDistance``                       | N/A                   | ``CullDistance``            |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | HSCPIn      | ``CullDistance``                       | N/A                   | ``CullDistance``            |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | HSCPOut     | ``CullDistance``                       | N/A                   | ``CullDistance``            |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | DSCPIn      | ``CullDistance``                       | N/A                   | ``CullDistance``            |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 | SV_CullDistance           | DSOut       | ``CullDistance``                       | N/A                   | ``CullDistance``            |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | GSVIn       | ``CullDistance``                       | N/A                   | ``CullDistance``            |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | GSOut       | ``CullDistance``                       | N/A                   | ``CullDistance``            |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | PSIn        | ``CullDistance``                       | N/A                   | ``CullDistance``            |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | MSOut       | ``CullDistance``                       | N/A                   | ``CullDistance``            |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 | SV_VertexID               | VSIn        | ``VertexIndex``                        | N/A                   | ``Shader``                  |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 | SV_InstanceID             | VSIn        | ``InstanceIndex`` or                   | N/A                   | ``Shader``                  |
 |                           |             | ``InstanceIndex - BaseInstance``       |                       |                             |
 |                           |             | with                                   |                       |                             |
 |                           |             | ``-fvk-support-nonzero-base-instance`` |                       |                             |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 | SV_Depth                  | PSOut       | ``FragDepth``                          | N/A                   | ``Shader``                  |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 | SV_DepthGreaterEqual      | PSOut       | ``FragDepth``                          | ``DepthGreater``      | ``Shader``                  |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 | SV_DepthLessEqual         | PSOut       | ``FragDepth``                          | ``DepthLess``         | ``Shader``                  |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 | SV_IsFrontFace            | PSIn        | ``FrontFacing``                        | N/A                   | ``Shader``                  |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | CSIn        | ``GlobalInvocationId``                 | N/A                   | ``Shader``                  |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 | SV_DispatchThreadID       | MSIn        | ``GlobalInvocationId``                 | N/A                   | ``Shader``                  |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | ASIn        | ``GlobalInvocationId``                 | N/A                   | ``Shader``                  |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | CSIn        | ``WorkgroupId``                        | N/A                   | ``Shader``                  |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 | SV_GroupID                | MSIn        | ``WorkgroupId``                        | N/A                   | ``Shader``                  |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | ASIn        | ``WorkgroupId``                        | N/A                   | ``Shader``                  |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | CSIn        | ``LocalInvocationId``                  | N/A                   | ``Shader``                  |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 | SV_GroupThreadID          | MSIn        | ``LocalInvocationId``                  | N/A                   | ``Shader``                  |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | ASIn        | ``LocalInvocationId``                  | N/A                   | ``Shader``                  |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | CSIn        | ``LocalInvocationIndex``               | N/A                   | ``Shader``                  |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 | SV_GroupIndex             | MSIn        | ``LocalInvocationIndex``               | N/A                   | ``Shader``                  |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | ASIn        | ``LocalInvocationIndex``               | N/A                   | ``Shader``                  |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 | SV_OutputControlPointID   | HSIn        | ``InvocationId``                       | N/A                   | ``Tessellation``            |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 | SV_GSInstanceID           | GSIn        | ``InvocationId``                       | N/A                   | ``Geometry``                |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 | SV_DomainLocation         | DSIn        | ``TessCoord``                          | N/A                   | ``Tessellation``            |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | HSIn        | ``PrimitiveId``                        | N/A                   | ``Tessellation``            |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | PCIn        | ``PrimitiveId``                        | N/A                   | ``Tessellation``            |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | DSIn        | ``PrimitiveId``                        | N/A                   | ``Tessellation``            |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | GSIn        | ``PrimitiveId``                        | N/A                   | ``Geometry``                |
 | SV_PrimitiveID            +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | GSOut       | ``PrimitiveId``                        | N/A                   | ``Geometry``                |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | PSIn        | ``PrimitiveId``                        | N/A                   | ``Geometry``                |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           |             |                                        |                       | ``MeshShadingNV``           |
 |                           | MSOut       | ``PrimitiveId``                        | N/A                   |                             |
 |                           |             |                                        |                       | ``MeshShadingEXT``          |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | PCOut       | ``TessLevelOuter``                     | N/A                   | ``Tessellation``            |
 | SV_TessFactor             +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | DSIn        | ``TessLevelOuter``                     | N/A                   | ``Tessellation``            |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | PCOut       | ``TessLevelInner``                     | N/A                   | ``Tessellation``            |
 | SV_InsideTessFactor       +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | DSIn        | ``TessLevelInner``                     | N/A                   | ``Tessellation``            |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 | SV_SampleIndex            | PSIn        | ``SampleId``                           | N/A                   | ``SampleRateShading``       |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 | SV_StencilRef             | PSOut       | ``FragStencilRefEXT``                  | N/A                   | ``StencilExportEXT``        |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 | SV_Barycentrics           | PSIn        | ``BaryCoord*AMD``                      | N/A                   | ``Shader``                  |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | GSOut       | ``Layer``                              | N/A                   | ``Geometry``                |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | PSIn        | ``Layer``                              | N/A                   | ``Geometry``                |
 | SV_RenderTargetArrayIndex +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           |             |                                        |                       | ``MeshShadingNV``           |
 |                           | MSOut       | ``Layer``                              | N/A                   |                             |
 |                           |             |                                        |                       | ``MeshShadingEXT``          |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | GSOut       | ``ViewportIndex``                      | N/A                   | ``MultiViewport``           |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | PSIn        | ``ViewportIndex``                      | N/A                   | ``MultiViewport``           |
 | SV_ViewportArrayIndex     +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           |             |                                        |                       | ``MeshShadingNV``           |
 |                           | MSOut       | ``ViewportIndex``                      | N/A                   |                             |
 |                           |             |                                        |                       | ``MeshShadingEXT``          |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | PSIn        | ``SampleMask``                         | N/A                   | ``Shader``                  |
 | SV_Coverage               +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | PSOut       | ``SampleMask``                         | N/A                   | ``Shader``                  |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 | SV_InnerCoverage          | PSIn        | ``FullyCoveredEXT``                    | N/A                   | ``FragmentFullyCoveredEXT`` |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | VSIn        | ``ViewIndex``                          | N/A                   | ``MultiView``               |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | HSIn        | ``ViewIndex``                          | N/A                   | ``MultiView``               |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | DSIn        | ``ViewIndex``                          | N/A                   | ``MultiView``               |
 | SV_ViewID                 +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | GSIn        | ``ViewIndex``                          | N/A                   | ``MultiView``               |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | PSIn        | ``ViewIndex``                          | N/A                   | ``MultiView``               |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | MSIn        | ``ViewIndex``                          | N/A                   | ``MultiView``               |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | VSOut       | ``PrimitiveShadingRateKHR``            | N/A                   | ``FragmentShadingRate``     |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | GSOut       | ``PrimitiveShadingRateKHR``            | N/A                   | ``FragmentShadingRate``     |
 | SV_ShadingRate            +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | PSIn        | ``ShadingRateKHR``                     | N/A                   | ``FragmentShadingRate``     |
 |                           +-------------+----------------------------------------+-----------------------+-----------------------------+
 |                           | MSOut       | ``PrimitiveShadingRateKHR``            | N/A                   | ``FragmentShadingRate``     |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+
 | SV_CullPrimitive          | MSOut       | ``CullPrimitiveEXT``                   | N/A                   | ``MeshShadingEXT``          |
 +---------------------------+-------------+----------------------------------------+-----------------------+-----------------------------+


 For entities (function parameters, function return values, struct fields) with
 the above SV semantic strings attached, SPIR-V variables of the
 ``Input``/``Output`` storage class will be created. They will have the
 corresponding SPIR-V ``BuiltIn`` decorations according to the above table.

 SV semantic strings not translated into SPIR-V ``BuiltIn`` decorations will be
 handled similarly to non-SV (arbitrary) semantic strings: a SPIR-V variable
 of the ``Input``/``Output`` storage class will be created for each entity with
 such a semantic string. The compiler then sorts all semantic strings in
 declaration order (the default, or if ``-fvk-stage-io-order=decl`` is given) or
 alphabetical order (if ``-fvk-stage-io-order=alpha`` is given), and assigns
 ``Location`` numbers sequentially to the corresponding SPIR-V variables. Note
 that this means flattening all structs if structs are used as function
 parameters or returns.

 There is an exception to the above rule for SV_Target[N]. It will always be
 mapped to ``Location`` number N.
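
 The sketch below applies these rules by hand to a pair of hypothetical structs.
 The ``Location`` numbers in the comments are what the description above would
 be expected to yield; they are not captured compiler output.

 .. code:: hlsl

   struct Varyings {
     // Translated into a BuiltIn (Position at VSOut, FragCoord at PSIn),
     // so no Location is assigned.
     float4 pos   : SV_Position;
     // -fvk-stage-io-order=decl : Location 0
     // -fvk-stage-io-order=alpha: Location 1
     float2 uv    : TEXCOORD0;
     // -fvk-stage-io-order=decl : Location 1
     // -fvk-stage-io-order=alpha: Location 0
     float3 color : COLOR0;
   };

   struct PSOutput {
     float4 albedo : SV_Target0;  // always Location 0
     float4 normal : SV_Target1;  // always Location 1
   };

   PSOutput PSMain(Varyings input) {
     PSOutput output;
     output.albedo = float4(input.color, 1.0);
     output.normal = float4(input.uv, 0.0, 1.0);
     return output;
   }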

 ``ClipDistance & CullDistance``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Variables decorated with ``SV_ClipDistanceX`` can be float or vector of float
 type. To map them into one float array in the struct, we first sort them
 in ascending order of ``X``, and then concatenate them tightly. For example,

 .. code:: hlsl

   struct T {
     float clip0: SV_ClipDistance0;
   };

   struct S {
     float3 clip5: SV_ClipDistance5;
     ...
   };

   void main(T t, S s, float2 clip2 : SV_ClipDistance2) { ... }

 Then we have a float array of size 6 (= 1 + 2 + 3) for ``ClipDistance``, with
 ``clip0`` at offset 0, ``clip2`` at offset 1, ``clip5`` at offset 3.

 Decorating a variable or struct member with the ``ClipDistance`` builtin but not
 requiring the ``ClipDistance`` capability is legal as long as we don't read or
 write the variable or struct member. But given the way we handle the `shader
 entry function`_, this condition is not satisfied, because we need to read their
 contents to prepare for the source code entry function call or write them back
 after the call. So annotating a variable or struct member with
 ``SV_ClipDistanceX`` means requiring the ``ClipDistance`` capability in the
 generated SPIR-V.

 Variables decorated with ``SV_CullDistanceX`` are mapped in the same way.

 Signature packing
 ~~~~~~~~~~~~~~~~~

 Vulkan drivers usually limit the number of available locations, and the limit
 varies by device. To avoid driver crashes caused by exceeding this limit, we
 added experimental signature packing support using the ``Component``
 decoration (see the Vulkan spec "15.1.5. Component Assignment").
 ``-pack-optimized`` is the command line option to enable it.

 At a high level, for a stage variable that needs ``M`` components in ``N``
 locations (e.g., stage variable ``float3 foo[2]`` needs 3 components in 2
 locations), we find the minimum ``K`` such that each of the ``N`` consecutive
 locations in ``[K, K + N)`` has ``M`` contiguous unused Component slots. We
 create a Location decoration instruction for the stage variable with ``K`` and
 a Component decoration instruction with the first component number of those
 ``M`` contiguous unused Component slots.
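
 To make the search concrete, the sketch below walks through it by hand for a
 hypothetical output struct. It assumes that stage variables are processed in
 declaration order and that each location has 4 Component slots; the numbers in
 the comments follow from the description above and are not captured compiler
 output.

 .. code:: hlsl

   struct VSOutput {
     float4 pos    : SV_Position;  // BuiltIn; takes no Location

     // M = 3, N = 2. Locations 0 and 1 are empty, so K = 0:
     // Location 0, Component 0 (components 0..2 of locations 0 and 1).
     float3 foo[2] : FOO;

     // M = 1, N = 1. Location 0 still has component 3 free, so K = 0:
     // Location 0, Component 3.
     float  bar    : BAR;

     // M = 2, N = 1. Location 0 is full and location 1 has only one
     // free slot, so K = 2: Location 2, Component 0.
     float2 baz    : BAZ;
   };

   VSOutput VSMain(float4 pos : POSITION) {
     VSOutput output;
     output.pos    = pos;
     output.foo[0] = pos.xyz;
     output.foo[1] = pos.yzw;
     output.bar    = pos.w;
     output.baz    = pos.xy;
     return output;
   }

 Under these assumptions the three variables occupy locations 0 to 2, whereas
 one location per array element or variable would need locations 0 to 3.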

 HLSL register and Vulkan binding
 --------------------------------

 In shaders for DirectX, resources are accessed via registers; while in shaders
 for Vulkan, it is done via descriptor set and binding numbers. The developer
 can explicitly annotate variables in HLSL to specify descriptor set and binding
 numbers, or leave it to the compiler to derive implicitly from registers.

 Explicit binding number assignment
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 ``[[vk::binding(X[, Y])]]`` can be attached to global variables to specify the
 descriptor set as ``Y`` and binding number as ``X``. The descriptor set number
 is optional; if missing, it will be zero (if the ``-auto-binding-space N``
 command line option is used, then descriptor set #N will be used instead of
 descriptor set #0). RW/append/consume structured buffers have associated
 counters, which will occupy their own Vulkan descriptors.
 ``[[vk::counter_binding(Z)]]`` can be attached to a RW/append/consume
 structured buffer to specify the binding number for the associated counter as
 ``Z``. Note that the set number of the counter is always the same as that of
 the main buffer.

 Implicit binding number assignment
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Without explicit annotations, the compiler will try to deduce descriptor sets
 and binding numbers in the following way:

 If there is ``:register(xX, spaceY)`` specified for the given global variable,
 the corresponding resource will be assigned to descriptor set ``Y`` and binding
 number ``X``, regardless of the register type ``x``. Note that this will cause
 binding number collision if, say, two resources are of different register
 type but the same register number. To solve this problem, four command-line
 options, ``-fvk-b-shift N M``, ``-fvk-s-shift N M``, ``-fvk-t-shift N M``, and
 ``-fvk-u-shift N M``, are provided to shift by ``N`` all binding numbers
 inferred for register type ``b``, ``s``, ``t``, and ``u`` in space ``M``,
 respectively.
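
 A small Python sketch of this mapping, under the assumption that a shift is
 simply added to the register number. The helper name
 ``binding_from_register`` and the ``shifts`` dictionary are illustrative and
 not part of the compiler.

 .. code:: python

   def binding_from_register(reg_type, number, space, shifts=None):
       """Map `register(<reg_type><number>, space<space>)` to (set, binding).

       `shifts` maps (register type, space) to the N passed via
       -fvk-{b|s|t|u}-shift N M; pairs that are not listed are not shifted.
       """
       shifts = shifts or {}
       return space, number + shifts.get((reg_type, space), 0)


   # Without shifts, t0 and u0 in space0 collide on set 0, binding 0.
   collide = binding_from_register("t", 0, 0) == binding_from_register("u", 0, 0)

   # With the equivalent of -fvk-u-shift 10 0, the u-type register moves away.
   shifts = {("u", 0): 10}
   t0 = binding_from_register("t", 0, 0, shifts)
   u0 = binding_from_register("u", 0, 0, shifts)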

 If there is no register specification, the corresponding resource will be
 assigned to the next available binding number, starting from 0, in descriptor
 set #0 (If ``-auto-binding-space N`` command line option is used, then
 descriptor set #N will be used instead of descriptor set #0).

 If there is no register specification AND ``-fvk-auto-shift-bindings`` is specified,
 then the register type will be automatically identified based on the resource
 type (according to the following table), and the appropriate shift will
 automatically be applied according to ``-fvk-*shift N M``.

 .. code:: spirv

   t - for shader resource views (SRV)
       TEXTURE1D
       TEXTURE1DARRAY
       TEXTURE2D
       TEXTURE2DARRAY
       TEXTURE3D
       TEXTURECUBE
       TEXTURECUBEARRAY
       TEXTURE2DMS
       TEXTURE2DMSARRAY
       STRUCTUREDBUFFER
       BYTEADDRESSBUFFER
       BUFFER
       TBUFFER

   s - for samplers
       SAMPLER
       SAMPLER1D
       SAMPLER2D
       SAMPLER3D
       SAMPLERCUBE
       SAMPLERSTATE
       SAMPLERCOMPARISONSTATE

   u - for unordered access views (UAV)
       RWBYTEADDRESSBUFFER
       RWSTRUCTUREDBUFFER
       APPENDSTRUCTUREDBUFFER
       CONSUMESTRUCTUREDBUFFER
       RWBUFFER
       RWTEXTURE1D
       RWTEXTURE1DARRAY
       RWTEXTURE2D
       RWTEXTURE2DARRAY
       RWTEXTURE3D

   b - for constant buffer views (CBV)
       CBUFFER
       CONSTANTBUFFER

 Binding number assignment for resources in cbuffer
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 In general, we use the same binding assignment rules described above for a
 cbuffer. However, when a cbuffer contains one or more resources, a single
 cbuffer inevitably needs multiple binding numbers. For such cbuffers, we first
 assign the next available binding numbers to the resources. Following the
 order of appearance in the cbuffer, a resource that appears earlier gets a
 smaller (earlier available) binding number than a resource that appears
 later. After binding numbers are assigned to all resource members, if the
 cbuffer contains one or more members with non-resource types, the compiler
 creates a struct for the remaining members and assigns the next available
 binding number to the variable with that struct type.

 For example, the binding numbers for the following resources and cbuffers

 .. code:: hlsl

   cbuffer buf0 : register(b0) {
     float4 non_resource0;
   };
   cbuffer buf1 : register(b4) {
     float4 non_resource1;
   };
   cbuffer buf2 {
     float4 non_resource2;
     Texture2D resource0;
     SamplerState resource1;
   };
   cbuffer buf3 : register(b2) {
     SamplerState resource2;
   }

 will be

 - ``buf0``: 0 because of ``register(b0)``

 - ``buf1``: 4 because of ``register(b4)``

 - ``resource2``: 2 because of ``register(b2)``. Note that ``buf3`` is empty
   without ``resource2``. We do not assign a binding number to an empty struct.

 - ``resource0``: 1 because it is the next available binding number.

 - ``resource1``: 3 because it is the next available binding number.

 - ``buf2`` including only ``non_resource2``: 5 because it is the next available
   binding number.

 Summary
 ~~~~~~~

 In summary, the compiler essentially assigns binding numbers in three passes.

 - Firstly it handles all declarations with explicit ``[[vk::binding(X[, Y])]]``
   annotation.

 - Then the compiler processes all remaining declarations with
   ``:register(xX, spaceY)`` annotation, by applying the shift passed in using
   command-line option ``-fvk-{b|s|t|u}-shift N M``, if provided.

   - If ``:register`` assignment is missing and ``-fvk-auto-shift-bindings`` is
     specified, the register type will be automatically detected based on the
     resource type, and the ``-fvk-{b|s|t|u}-shift N M`` will be applied.

 - Finally, the compiler assigns next available binding numbers to the rest in
   the declaration order.

 As an example, for the following code:

 .. code:: hlsl

   struct S { ... };

   ConstantBuffer<S> cbuffer1 : register(b0);
   Texture2D<float4> texture1 : register(t0);
   Texture2D<float4> texture2 : register(t1, space1);
   SamplerState      sampler1;
   [[vk::binding(3)]]
   RWBuffer<float4> rwbuffer1 : register(u5, space2);

 If we compile with ``-fvk-t-shift 10 0 -fvk-t-shift 20 1``:

 - ``rwbuffer1`` will take binding #3 in set #0, since explicit binding
   assignment has precedence over the rest.
 - ``cbuffer1`` will take binding #0 in set #0, since that is what is deduced
   from the register assignment, and no shift is requested for b-type registers
   on the command line.
 - ``texture1`` will take binding #10 in set #0, and ``texture2`` will take
   binding #21 in set #1, since we requested shifts of 10 and 20 for t-type
   registers in spaces 0 and 1, respectively.
 - ``sampler1`` will take binding #1 in set #0, since that's the next available
   binding number in set #0.
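
 The three passes can be modeled with the following Python sketch. It is a
 simplified illustration rather than the compiler's code: ``assign_bindings``
 and the dictionary-based declaration format are made up for this example, and
 ``-fvk-auto-shift-bindings`` and ``-auto-binding-space`` are not modeled.

 .. code:: python

   def assign_bindings(decls, shifts=None, default_set=0):
       """Return {name: (set, binding)} for a list of declarations.

       Each declaration is a dict with a "name" and, optionally,
       "vk_binding": (binding, set) and/or "register": (type, number, space).
       """
       shifts = shifts or {}
       result = {}
       taken = set()

       def place(name, set_, binding):
           result[name] = (set_, binding)
           taken.add((set_, binding))

       # Pass 1: explicit [[vk::binding(X[, Y])]] wins over everything else.
       for d in decls:
           if "vk_binding" in d:
               binding, set_ = d["vk_binding"]
               place(d["name"], set_, binding)

       # Pass 2: :register(xX, spaceY), shifted per register type and space.
       for d in decls:
           if d["name"] not in result and "register" in d:
               reg_type, number, space = d["register"]
               shift = shifts.get((reg_type, space), 0)
               place(d["name"], space, number + shift)

       # Pass 3: the rest take the next free binding, in declaration order.
       for d in decls:
           if d["name"] not in result:
               binding = 0
               while (default_set, binding) in taken:
                   binding += 1
               place(d["name"], default_set, binding)
       return result


   decls = [
       {"name": "cbuffer1", "register": ("b", 0, 0)},
       {"name": "texture1", "register": ("t", 0, 0)},
       {"name": "texture2", "register": ("t", 1, 1)},
       {"name": "sampler1"},
       {"name": "rwbuffer1", "vk_binding": (3, 0), "register": ("u", 5, 2)},
   ]
   # Equivalent of -fvk-t-shift 10 0 -fvk-t-shift 20 1
   bindings = assign_bindings(decls, shifts={("t", 0): 10, ("t", 1): 20})

 Running the sketch on the declarations from the example yields the same
 (set, binding) pairs as listed above.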

 HLSL global variables and Vulkan binding
 ----------------------------------------
 As mentioned above, all global externally-visible non-resource-type stand-alone
 variables will be collected into a cbuffer named ``$Globals``. By default,
 the ``$Globals`` cbuffer is placed in descriptor set #0, and the binding number
 would be the next available binding number in that set. This means the binding
 number depends on where the very first global variable appears in the code.

 Example 1:

 .. code:: hlsl

   float4 someColors;
     // $Globals cbuffer placed at DescriptorSet #0, Binding #0
   Texture2D<float4> texture1;
     // texture1         placed at DescriptorSet #0, Binding #1

 Example 2:

 .. code:: hlsl

   Texture2D<float4> texture1;
     // texture1         placed at DescriptorSet #0, Binding #0
   float4 someColors;
     // $Globals cbuffer placed at DescriptorSet #0, Binding #1

 To provide more control over the descriptor set and binding number of the
 ``$Globals`` cbuffer, you can use the ``-fvk-bind-globals B S`` command-line
 option, which places this cbuffer at descriptor set ``S`` and binding number
 ``B``.

 Example 3: (compiled with ``-fvk-bind-globals 2 1``)

 .. code:: hlsl

   Texture2D<float4> texture1;
     // texture1         placed at DescriptorSet #0, Binding #0
   float4 someColors;
     // $Globals cbuffer placed at DescriptorSet #1, Binding #2

 Note that if the developer chooses to use this command line option, it is their
 responsibility to provide proper numbers and avoid binding overlaps.

 HLSL Expressions
 ================

 Unless explicitly noted, matrix per-element operations will be conducted on
 each component vector and then collected into the result matrix. The following
 sections list the SPIR-V opcodes for scalars and vectors.

 Arithmetic operators
 --------------------

 `Arithmetic operators <https://msdn.microsoft.com/en-us/library/windows/desktop/bb509631(v=vs.85).aspx#Additive_and_Multiplicative_Operators>`_
 (``+``, ``-``, ``*``, ``/``, ``%``) are translated into their corresponding
 SPIR-V opcodes according to the following table.

 +-------+-----------------------------+-------------------------------+--------------------+
 |       | (Vector of) Signed Integers | (Vector of) Unsigned Integers | (Vector of) Floats |
 +=======+=============================+===============================+====================+
 | ``+`` |                         ``OpIAdd``                          |     ``OpFAdd``     |
 +-------+-------------------------------------------------------------+--------------------+
 | ``-`` |                         ``OpISub``                          |     ``OpFSub``     |
 +-------+-------------------------------------------------------------+--------------------+
 | ``*`` |                         ``OpIMul``                          |     ``OpFMul``     |
 +-------+-----------------------------+-------------------------------+--------------------+
 | ``/`` |    ``OpSDiv``               |       ``OpUDiv``              |     ``OpFDiv``     |
 +-------+-----------------------------+-------------------------------+--------------------+
 | ``%`` |    ``OpSRem``               |       ``OpUMod``              |     ``OpFRem``     |
 +-------+-----------------------------+-------------------------------+--------------------+

 Note that for modulo operation, SPIR-V has two sets of instructions: ``Op*Rem``
 and ``Op*Mod``. For ``Op*Rem``, the sign of a non-0 result comes from the first
 operand; while for ``Op*Mod``, the sign of a non-0 result comes from the second
 operand. The HLSL documentation does not mandate which set of instructions
 modulo operations should be translated into; it only says "the % operator is
 defined only in cases where either both sides are positive or both sides are
 negative." So technically it's undefined behavior to use the modulo operation
 with operands of different signs. But considering HLSL's C heritage and the
 behavior of the Clang frontend, we translate modulo operators into ``Op*Rem``
 (except for unsigned integers, which use ``OpUMod`` because there is no
 ``OpURem``).
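
 The difference between the two instruction families can be illustrated with a
 short Python sketch for signed integers. The function names ``s_rem`` and
 ``s_mod`` are chosen for this example only; they mimic the sign rules stated
 above and are not taken from the compiler.

 .. code:: python

   def s_rem(a, b):
       """Remainder whose sign follows the first operand (Op*Rem rule)."""
       q = abs(a) // abs(b)
       if (a < 0) != (b < 0):
           q = -q  # truncate the quotient toward zero
       return a - b * q


   def s_mod(a, b):
       """Remainder whose sign follows the second operand (Op*Mod rule).

       Python's % operator already behaves this way for integers.
       """
       return a % b


   examples = [(7, 3), (-7, 3), (7, -3), (-7, -3)]
   rems = [s_rem(a, b) for a, b in examples]  # [1, -1, 1, -1]
   mods = [s_mod(a, b) for a, b in examples]  # [1, 2, -2, -1]

 The two rules agree whenever both operands have the same sign, which is the
 only case the HLSL documentation defines.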

 For multiplications of float vectors and float scalars, the dedicated SPIR-V
 operation ``OpVectorTimesScalar`` will be used. Similarly, for multiplications
 of float matrices and float scalars, ``OpMatrixTimesScalar`` will be generated.

 Bitwise operators
 -----------------

 `Bitwise operators <https://msdn.microsoft.com/en-us/library/windows/desktop/bb509631(v=vs.85).aspx#Bitwise_Operators>`_
 (``~``, ``&``, ``|``, ``^``, ``<<``, ``>>``) are translated into their
 corresponding SPIR-V opcodes according to the following table.

 +--------+-----------------------------+-------------------------------+
 |        | (Vector of) Signed Integers | (Vector of) Unsigned Integers |
 +========+=============================+===============================+
 | ``~``  |                         ``OpNot``                           |
 +--------+-------------------------------------------------------------+
 | ``&``  |                      ``OpBitwiseAnd``                       |
 +--------+-------------------------------------------------------------+
 | ``|``  |                      ``OpBitwiseOr``                        |
 +--------+-----------------------------+-------------------------------+
 | ``^``  |                      ``OpBitwiseXor``                       |
 +--------+-----------------------------+-------------------------------+
 | ``<<`` |                   ``OpShiftLeftLogical``                    |
 +--------+-----------------------------+-------------------------------+
 | ``>>`` | ``OpShiftRightArithmetic``  | ``OpShiftRightLogical``       |
 +--------+-----------------------------+-------------------------------+

 Note that for ``<<``/``>>``, the right-hand side will be culled: it is masked
 with ``n - 1``, so only its ``log2(n)`` least significant bits are considered,
 where ``n`` is the bitwidth of the left-hand side.
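
 As a rough Python model of this culling, assuming it amounts to a bitwise AND
 of the shift amount with ``n - 1`` (the function names are illustrative only):

 .. code:: python

   def shift_left(lhs, rhs, bitwidth=32):
       """Left shift with the shift amount masked to the operand bitwidth."""
       amount = rhs & (bitwidth - 1)  # cull the right-hand side
       return (lhs << amount) & ((1 << bitwidth) - 1)  # keep `bitwidth` bits


   def shift_right_logical(lhs, rhs, bitwidth=32):
       """Logical right shift of an unsigned value, with the same masking."""
       return (lhs & ((1 << bitwidth) - 1)) >> (rhs & (bitwidth - 1))

 Under this assumption, shifting a 32-bit value by 33 behaves like shifting it
 by 1, and shifting by 32 leaves the value unchanged.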

 Comparison operators
 --------------------

 `Comparison operators <https://msdn.microsoft.com/en-us/library/windows/desktop/bb509631(v=vs.85).aspx#Comparison_Operators>`_
 (``<``, ``<=``, ``>``, ``>=``, ``==``, ``!=``) are translated into their
 corresponding SPIR-V opcodes according to the following table.

 +--------+-----------------------------+-------------------------------+------------------------------+
 |        | (Vector of) Signed Integers | (Vector of) Unsigned Integers |     (Vector of) Floats       |
 +========+=============================+===============================+==============================+
 | ``<``  |  ``OpSLessThan``            |  ``OpULessThan``              |  ``OpFOrdLessThan``          |
 +--------+-----------------------------+-------------------------------+------------------------------+
 | ``<=`` |  ``OpSLessThanEqual``       |  ``OpULessThanEqual``         |  ``OpFOrdLessThanEqual``     |
 +--------+-----------------------------+-------------------------------+------------------------------+
 | ``>``  |  ``OpSGreaterThan``         |  ``OpUGreaterThan``           |  ``OpFOrdGreaterThan``       |
 +--------+-----------------------------+-------------------------------+------------------------------+
 | ``>=`` |  ``OpSGreaterThanEqual``    |  ``OpUGreaterThanEqual``      |  ``OpFOrdGreaterThanEqual``  |
 +--------+-----------------------------+-------------------------------+------------------------------+
 | ``==`` |                     ``OpIEqual``                            |  ``OpFOrdEqual``             |
 +--------+-------------------------------------------------------------+------------------------------+
 | ``!=`` |                     ``OpINotEqual``                         |  ``OpFOrdNotEqual``          |
 +--------+-------------------------------------------------------------+------------------------------+

 Note that for comparison of (vectors of) floats, SPIR-V has two sets of
 instructions: ``OpFOrd*``, ``OpFUnord*``. We translate into ``OpFOrd*`` ones.

 Boolean math operators
 ----------------------

 `Boolean math operators <https://msdn.microsoft.com/en-us/library/windows/desktop/bb509631(v=vs.85).aspx#Boolean_Math_Operators>`_
 (``&&``, ``||``, ``?:``) are translated into their corresponding SPIR-V opcodes
 according to the following table.

 +--------+----------------------+
 |        | (Vector of) Booleans |
 +========+======================+
 | ``&&`` |  ``OpLogicalAnd``    |
 +--------+----------------------+
 | ``||`` |  ``OpLogicalOr``     |
 +--------+----------------------+
 | ``?:`` |  ``OpSelect``        |
 +--------+----------------------+

 Please note that "unlike short-circuit evaluation of ``&&``, ``||``, and ``?:``
 in C, HLSL expressions never short-circuit an evaluation because they are vector
 operations. All sides of the expression are always evaluated."
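
 The absence of short-circuiting can be modeled in Python with a component-wise
 select. In this sketch, ``select`` and ``side`` are hypothetical helpers;
 ``side`` only records that an operand was evaluated.

 .. code:: python

   def select(cond, if_true, if_false):
       """Component-wise ?: in the style of OpSelect."""
       return [t if c else f for c, t, f in zip(cond, if_true, if_false)]


   calls = []


   def side(name, value):
       calls.append(name)  # record that this operand was evaluated
       return value


   # Both operands are evaluated before the per-component choice is made.
   result = select([True, False], side("lhs", [1, 2]), side("rhs", [3, 4]))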

 Unary operators
 ---------------

 For `unary operators <https://msdn.microsoft.com/en-us/library/windows/desktop/bb509631(v=vs.85).aspx#Unary_Operators>`_:

 - ``!`` is translated into ``OpLogicalNot``. Parsing will guarantee the operands
   are of boolean types by inserting necessary casts.
 - ``+`` requires no additional SPIR-V instructions.
 - ``-`` is translated into ``OpSNegate`` and ``OpFNegate`` for (vectors of)
   integers and floats, respectively.

 Casts
 -----

 Casting between (vectors of) scalar types is translated according to the following table:

 +------------+-------------------+-------------------+-------------------+-------------------+
 | From \\ To |        Bool       |       SInt        |      UInt         |       Float       |
 +============+===================+===================+===================+===================+
 |   Bool     |       no-op       |                 select between one and zero               |
 +------------+-------------------+-------------------+-------------------+-------------------+
 |   SInt     |                   |     no-op         |  ``OpBitcast``    | ``OpConvertSToF`` |
 +------------+                   +-------------------+-------------------+-------------------+
 |   UInt     | compare with zero |   ``OpBitcast``   |      no-op        | ``OpConvertUToF`` |
 +------------+                   +-------------------+-------------------+-------------------+
 |   Float    |                   | ``OpConvertFToS`` | ``OpConvertFToU`` |      no-op        |
 +------------+-------------------+-------------------+-------------------+-------------------+

 It is also feasible in HLSL to cast a float matrix to another float matrix with a smaller size.
 This is known as matrix truncation cast. For instance, the following code casts a 3x4 matrix
 into a 2x3 matrix.

 .. code:: hlsl

   float3x4 m = { 1,  2,  3, 4,
                  5,  6,  7, 8,
                  9, 10, 11, 12 };

   float2x3 a = (float2x3)m;

 Such a cast takes the upper-left corner of the original matrix to generate the result.
 In the above example, matrix ``a`` will have 2 rows, with 3 columns each. The first row will be
 ``1, 2, 3`` and the second row will be ``5, 6, 7``.
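
 The same truncation, written as a Python sketch over a row-major list of
 lists (``truncate_matrix`` is a name invented for this illustration):

 .. code:: python

   def truncate_matrix(m, rows, cols):
       """Keep the upper-left rows x cols corner of a row-major matrix."""
       if rows > len(m) or cols > len(m[0]):
           raise ValueError("a truncation cast cannot grow the matrix")
       return [row[:cols] for row in m[:rows]]


   m = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12]]
   a = truncate_matrix(m, 2, 3)  # [[1, 2, 3], [5, 6, 7]]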

 Indexing operator
 -----------------

 The ``[]`` operator can also be used to access elements in a matrix or vector.
 A matrix whose row and/or column count is 1 will be translated into a vector or
 scalar. If a variable is used as the index for the dimension whose count is 1,
 that variable will be ignored in the generated SPIR-V code. This is because
 out-of-bound indexing triggers undefined behavior anyway. For example, for a
 1xN matrix ``mat``, ``mat[index][0]`` will be translated into
 ``OpAccessChain ... %mat %uint_0``. Similarly, a variable index into a size 1
 vector will also be ignored and the only element will always be returned.

 Assignment operators
 --------------------

 Assigning to a struct object may involve decomposing the source struct object
 and assigning each element separately and recursively. This happens when the
 source struct object has a different memory layout from the destination struct
 object. For example, for the following source code:

 .. code:: hlsl

   struct S {
     float    a;
     float2   b;
     float2x3 c;
   };

       ConstantBuffer<S> cbuf;
   RWStructuredBuffer<S> sbuf;

   ...
   sbuf[0] = cbuf;
   ...

 We need to assign each element because ``ConstantBuffer`` and
 ``RWStructuredBuffer`` have different memory layouts.

 HLSL Control Flows
 ==================

 This section lists how various HLSL control flows are mapped.

 Switch statement
 ----------------

 HLSL `switch statements <https://msdn.microsoft.com/en-us/library/windows/desktop/bb509669(v=vs.85).aspx>`_
 are translated into SPIR-V using:

 - **OpSwitch**: if (all case values are integer literals or constant integer
   variables) and (no attribute or the ``forcecase`` attribute is specified)
 - **A series of if statements**: for all other scenarios (e.g., when
   ``flatten``, ``branch``, or ``call`` attribute is specified)
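
 The decision can be summarized with a small Python sketch. The function name
 ``switch_lowering`` and its return strings are illustrative and do not
 correspond to compiler internals.

 .. code:: python

   def switch_lowering(all_cases_constant, attribute=None):
       """Return "OpSwitch" or "if-chain" following the two rules above.

       `all_cases_constant` is True when every case value is an integer
       literal or a constant integer variable; `attribute` is the HLSL switch
       attribute name, or None when no attribute is present.
       """
       if all_cases_constant and attribute in (None, "forcecase"):
           return "OpSwitch"
       return "if-chain"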

 Loops (for, while, do)
 ----------------------

 HLSL `for statements <https://msdn.microsoft.com/en-us/library/windows/desktop/bb509602(v=vs.85).aspx>`_,
 `while statements <https://msdn.microsoft.com/en-us/library/windows/desktop/bb509708(v=vs.85).aspx>`_,
 and `do statements <https://msdn.microsoft.com/en-us/library/windows/desktop/bb509593(v=vs.85).aspx>`_
 are translated into SPIR-V by constructing all necessary basic blocks and using
 ``OpLoopMerge`` to organize as structured loops.

 The HLSL attributes for these statements are translated into SPIR-V loop control
 masks according to the following table:

 +-------------------------+--------------------------------------------------+
 |   HLSL loop attribute   |            SPIR-V Loop Control Mask              |
 +=========================+==================================================+
 |        ``unroll(x)``    |                ``Unroll``                        |
 +-------------------------+--------------------------------------------------+
 |         ``loop``        |              ``DontUnroll``                      |
 +-------------------------+--------------------------------------------------+
 |        ``fastopt``      |              ``DontUnroll``                      |
 +-------------------------+--------------------------------------------------+
 | ``allow_uav_condition`` |           Currently Unimplemented                |
 +-------------------------+--------------------------------------------------+

 HLSL Functions
 ==============

 All functions reachable from the entry-point function will be translated into
 SPIR-V code. Functions not reachable from the entry-point function will be
 ignored.

 Entry function wrapper
 ----------------------

 HLSL entry functions take in parameters and return values. These parameters
 and return values can have semantics attached or, if they are of struct type,
 the struct fields can have semantics attached. However, in Vulkan, the entry
 function must be of the ``void(void)`` signature. To handle this difference,
 for a given entry function ``main``, we will emit a wrapper function for it.

 The wrapper function will take the name of the source code entry function,
 while the source code entry function will have its name prefixed with "src.".
 The wrapper function reads in stage input/builtin variables created according
 to semantics and groups them into composites meeting the requirements of the
 source code entry point. Then the wrapper calls the source code entry point.
 The return value is extracted and components of it will be written to stage
 output/builtin variables created according to semantics. For example:


 .. code:: hlsl

   // HLSL source code

   struct S {
     bool a : A;
     uint2 b: B;
     float2x3 c: C;
   };

   struct T {
     S x;
     int y: D;
   };

   T main(T input) {
     return input;
   }


 .. code:: spirv

   ; SPIR-V code

   %in_var_A = OpVariable %_ptr_Input_bool Input
   %in_var_B = OpVariable %_ptr_Input_v2uint Input
   %in_var_C = OpVariable %_ptr_Input_mat2v3float Input
   %in_var_D = OpVariable %_ptr_Input_int Input

   %out_var_A = OpVariable %_ptr_Output_bool Output
   %out_var_B = OpVariable %_ptr_Output_v2uint Output
   %out_var_C = OpVariable %_ptr_Output_mat2v3float Output
   %out_var_D = OpVariable %_ptr_Output_int Output

   ; Wrapper function starts

   %main    = OpFunction %void None ...
   ...      = OpLabel

   %param_var_input = OpVariable %_ptr_Function_T Function

   ; Load stage input variables and group into the expected composite

   %inA = OpLoad %bool %in_var_A
   %inB = OpLoad %v2uint %in_var_B
   %inC = OpLoad %mat2v3float %in_var_C
   %inS = OpCompositeConstruct %S %inA %inB %inC
   %inD = OpLoad %int %in_var_D
   %inT = OpCompositeConstruct %T %inS %inD
          OpStore %param_var_input %inT

   %ret = OpFunctionCall %T %src_main %param_var_input

   ; Extract component values from the composite and store into stage output variables

   %outS = OpCompositeExtract %S %ret 0
   %outA = OpCompositeExtract %bool %outS 0
           OpStore %out_var_A %outA
   %outB = OpCompositeExtract %v2uint %outS 1
           OpStore %out_var_B %outB
   %outC = OpCompositeExtract %mat2v3float %outS 2
           OpStore %out_var_C %outC
   %outD = OpCompositeExtract %int %ret 1
           OpStore %out_var_D %outD

   OpReturn
   OpFunctionEnd

   ; Source code entry point starts

   %src_main = OpFunction %T None ...

 In this way, we can concentrate all stage input/output/builtin variable
 manipulation in the wrapper function and handle the source code entry function
 just like other normal functions.

 Function parameter
 ------------------

 For a function ``f`` which has a parameter of type ``T``, the generated SPIR-V
 signature will use type ``T*`` for the parameter. At every call site of ``f``,
 additional local variables will be allocated to hold the actual arguments.
 The local variables are passed in as direct function arguments. For example:

 .. code:: hlsl

   // HLSL source code

   float4 f(float a, int b) { ... }

   void caller(...) {
     ...
     float4 result = f(...);
     ...
   }

 .. code:: spirv

   ; SPIR-V code

                 ...
   %i32PtrType = OpTypePointer Function %int
   %f32PtrType = OpTypePointer Function %float
       %fnType = OpTypeFunction %v4float %f32PtrType %i32PtrType
                 ...

            %f = OpFunction %v4float None %fnType
            %a = OpFunctionParameter %f32PtrType
            %b = OpFunctionParameter %i32PtrType
                 ...

       %caller = OpFunction ...
                 ...
      %aAlloca = OpVariable %_ptr_Function_float Function
      %bAlloca = OpVariable %_ptr_Function_int Function
                 ...
                 OpStore %aAlloca ...
                 OpStore %bAlloca ...
       %result = OpFunctionCall %v4float %f %aAlloca %bAlloca
                 ...

 This approach gives us unified handling of function parameters and local
 variables: both of them are accessed via load/store instructions.

 Intrinsic functions
 -------------------

 The following intrinsic HLSL functions have no direct SPIR-V opcode or GLSL
 extended instruction mapping, so they are handled with additional steps:

 - ``dot`` : performs dot product of two vectors, each containing floats or
   integers. If the two parameters are vectors of floats, we use SPIR-V's
   ``OpDot`` instruction to perform the translation. If the two parameters are
   vectors of integers, we multiply corresponding vector elements using
   ``OpIMul`` and accumulate the results using ``OpIAdd`` to compute the dot
   product.
 - ``mul``: performs multiplications. Each argument may be a scalar, vector,
   or matrix. Depending on the argument type, this will be translated into
   one of the multiplication instructions.
 - ``all``: returns true if all components of the given scalar, vector, or
   matrix are true. Performs conversions to boolean where necessary. Uses SPIR-V
   ``OpAll`` for scalar arguments and vector arguments. For matrix arguments,
   performs ``OpAll`` on each row, and then again on the vector containing the
   results of all rows.
 - ``any``: returns true if any component of the given scalar, vector, or matrix
   is true. Performs conversions to boolean where necessary. Uses SPIR-V
   ``OpAny`` for scalar arguments and vector arguments. For matrix arguments,
   performs ``OpAny`` on each row, and then again on the vector containing the
   results of all rows.
 - ``asfloat``: converts the component type of a scalar/vector/matrix from float,
   uint, or int into float. Uses ``OpBitcast``. This method currently does not
   support taking non-float matrix arguments.
 - ``asint``: converts the component type of a scalar/vector/matrix from float or
   uint into int. Uses ``OpBitcast``. This method currently does not support
   conversion into integer matrices.
 - ``asuint``: converts the component type of a scalar/vector/matrix from float
   or int into uint. Uses ``OpBitcast``. This method currently does not support
   conversion into unsigned integer matrices.
 - ``asuint``: Converts a double into two 32-bit unsigned integers. Uses SPIR-V ``OpBitcast``.
 - ``asdouble``: Converts two 32-bit unsigned integers into a double, or four 32-bit unsigned
   integers into two doubles. Uses SPIR-V ``OpVectorShuffle`` and ``OpBitcast``.
 - ``isfinite`` : Determines if the specified value is finite. Since ``OpIsFinite``
   requires the ``Kernel`` capability, translation is done using ``OpIsNan`` and
   ``OpIsInf``.  A given value is finite iff it is not NaN and not infinite.
 - ``clip``: Discards the current pixel if the specified value is less than zero.
   Uses conditional control flow as well as SPIR-V ``OpKill``.
 - ``rcp``: Calculates a fast, approximate, per-component reciprocal.
   Uses SPIR-V ``OpFDiv``.
 - ``lit``: Returns a lighting coefficient vector. This vector is a float4 with
   components of (ambient, diffuse, specular, 1). How ``diffuse`` and ``specular``
   are calculated is explained `here <https://msdn.microsoft.com/en-us/library/windows/desktop/bb509619(v=vs.85).aspx>`_.
 - ``D3DCOLORtoUBYTE4``: Converts a floating-point, 4D vector set by a D3DCOLOR to a UBYTE4.
   This is achieved by performing ``int4(input.zyxw * 255.002)`` using SPIR-V ``OpVectorShuffle``,
   ``OpVectorTimesScalar``, and ``OpConvertFToS``, respectively.
 - ``dst``: Calculates a distance vector. The resulting vector, ``dest``, has the following specifications:
   ``dest.x = 1.0``, ``dest.y = src0.y * src1.y``, ``dest.z = src0.z``, and ``dest.w = src1.w``.
   Uses SPIR-V ``OpCompositeExtract`` and ``OpFMul``.
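
 A few of the expansions listed above can be mirrored in Python to show the
 arithmetic involved. These helpers are sketches with made-up names that
 operate on plain lists and floats; they do not reflect the emitted SPIR-V
 instruction sequences exactly.

 .. code:: python

   import math


   def int_dot(a, b):
       """Integer dot product: multiply element-wise, then accumulate."""
       total = 0
       for x, y in zip(a, b):
           total += x * y
       return total


   def isfinite(x):
       """A value is finite iff it is neither NaN nor infinite."""
       return not math.isnan(x) and not math.isinf(x)


   def d3dcolor_to_ubyte4(v):
       """int4(v.zyxw * 255.002): swizzle, scale, then truncate toward zero."""
       x, y, z, w = v
       return [int(c * 255.002) for c in (z, y, x, w)]


   def dst(src0, src1):
       """Distance vector: (1, src0.y * src1.y, src0.z, src1.w)."""
       return [1.0, src0[1] * src1[1], src0[2], src1[3]]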

 Using SPIR-V opcode
 ~~~~~~~~~~~~~~~~~~~

 The following intrinsic HLSL functions have direct SPIR-V opcodes for them:

 ==================================== =================================
    HLSL Intrinsic Function                   SPIR-V Opcode
 ==================================== =================================
 ``AllMemoryBarrier``                 ``OpMemoryBarrier``
 ``AllMemoryBarrierWithGroupSync``    ``OpControlBarrier``
 ``countbits``                        ``OpBitCount``
 ``DeviceMemoryBarrier``              ``OpMemoryBarrier``
 ``DeviceMemoryBarrierWithGroupSync`` ``OpControlBarrier``
 ``ddx``                              ``OpDPdx``
 ``ddy``                              ``OpDPdy``
 ``ddx_coarse``                       ``OpDPdxCoarse``
 ``ddy_coarse``                       ``OpDPdyCoarse``
 ``ddx_fine``                         ``OpDPdxFine``
 ``ddy_fine``                         ``OpDPdyFine``
 ``fmod``                             ``OpFRem``
 ``fwidth``                           ``OpFwidth``
 ``GroupMemoryBarrier``               ``OpMemoryBarrier``
 ``GroupMemoryBarrierWithGroupSync``  ``OpControlBarrier``
 ``InterlockedAdd``                   ``OpAtomicIAdd``
 ``InterlockedAnd``                   ``OpAtomicAnd``
 ``InterlockedOr``                    ``OpAtomicOr``
 ``InterlockedXor``                   ``OpAtomicXor``
 ``InterlockedMin``                   ``OpAtomicUMin``/``OpAtomicSMin``
 ``InterlockedMax``                   ``OpAtomicUMax``/``OpAtomicSMax``
 ``InterlockedExchange``              ``OpAtomicExchange``
 ``InterlockedCompareExchange``       ``OpAtomicCompareExchange``
 ``InterlockedCompareStore``          ``OpAtomicCompareExchange``
 ``isnan``                            ``OpIsNan``
 ``isinf``                            ``OpIsInf``
 ``reversebits``                      ``OpBitReverse``
 ``transpose``                        ``OpTranspose``
 ``CheckAccessFullyMapped``           ``OpImageSparseTexelsResident``
 ==================================== =================================

 Using GLSL extended instructions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 The following intrinsic HLSL functions are translated using their equivalent
 instruction in the `GLSL extended instruction set <https://www.khronos.org/registry/spir-v/specs/1.0/GLSL.std.450.html>`_.

 ======================= ===================================
 HLSL Intrinsic Function   GLSL Extended Instruction
 ======================= ===================================
 ``abs``                 ``SAbs``/``FAbs``
 ``acos``                ``Acos``
 ``asin``                ``Asin``
 ``atan``                ``Atan``
 ``atan2``               ``Atan2``
 ``ceil``                ``Ceil``
 ``clamp``               ``SClamp``/``UClamp``/``FClamp``
 ``cos``                 ``Cos``
 ``cosh``                ``Cosh``
 ``cross``               ``Cross``
 ``degrees``             ``Degrees``
 ``distance``            ``Distance``
 ``radians``             ``Radians``
 ``determinant``         ``Determinant``
 ``exp``                 ``Exp``
 ``exp2``                ``Exp2``
 ``f16tof32``            ``UnpackHalf2x16``
 ``f32tof16``            ``PackHalf2x16``
 ``faceforward``         ``FaceForward``
 ``firstbithigh``        ``FindSMsb`` / ``FindUMsb``
 ``firstbitlow``         ``FindILsb``
 ``floor``               ``Floor``
 ``fma``                 ``Fma``
 ``frac``                ``Fract``
 ``frexp``               ``FrexpStruct``
 ``ldexp``               ``Ldexp``
 ``length``              ``Length``
 ``lerp``                ``FMix``
 ``log``                 ``Log``
 ``log10``               ``Log2`` (scaled by ``1/log2(10)``)
 ``log2``                ``Log2``
 ``mad``                 ``Fma``
 ``max``                 ``SMax``/``UMax``/``NMax``/``FMax``
 ``min``                 ``SMin``/``UMin``/``NMin``/``FMin``
 ``modf``                ``ModfStruct``
 ``normalize``           ``Normalize``
 ``pow``                 ``Pow``
 ``reflect``             ``Reflect``
 ``refract``             ``Refract``
 ``round``               ``RoundEven``
 ``rsqrt``               ``InverseSqrt``
 ``saturate``            ``FClamp``
 ``sign``                ``SSign``/``FSign``
 ``sin``                 ``Sin``
 ``sincos``              ``Sin`` and ``Cos``
 ``sinh``                ``Sinh``
 ``smoothstep``          ``SmoothStep``
 ``sqrt``                ``Sqrt``
 ``step``                ``Step``
 ``tan``                 ``Tan``
 ``tanh``                ``Tanh``
 ``trunc``               ``Trunc``
 ======================= ===================================
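
 As a worked illustration of the ``log10`` row, the scaling relies on the
 identity log10(x) = log2(x) * (1 / log2(10)). The HLSL sketch below expresses
 the same arithmetic. The function name ``Log10ViaLog2`` is hypothetical, and the
 rounded constant is an assumption rather than the exact value the compiler emits:

 .. code:: hlsl

   // Hypothetical helper: computes log10(x) using only log2, mirroring the
   // "Log2 scaled by 1/log2(10)" translation listed in the table.
   float Log10ViaLog2(float x) {
       // 1 / log2(10) equals log10(2), roughly 0.30103 (rounded; assumption).
       const float kInvLog2Of10 = 0.30103f;
       return log2(x) * kInvLog2Of10;
   }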

 Note on ``NMax``, ``NMin``, ``FMax``, and ``FMin``:

 This compiler supports the ``--ffinite-math-only`` option, which lets it
 assume that the parameters to some operations are never NaN. By default, the
 ``min`` and ``max`` intrinsics generate ``NMin`` and ``NMax`` instructions;
 when this option is enabled, ``FMin`` and ``FMax`` can be generated instead.

 Synchronization intrinsics
 ~~~~~~~~~~~~~~~~~~~~~~~~~~

 Synchronization intrinsics are translated into ``OpMemoryBarrier`` (for the
 non-``WithGroupSync`` variants) or ``OpControlBarrier`` (for the
 ``WithGroupSync`` variants) instructions with the following parameters:

 ======================= ============ ===== ======= ========= ==============
        HLSL                SPIR-V          SPIR-V Memory Semantics
 ----------------------- ------------ --------------------------------------
      Intrinsic          Memory Scope Image Uniform Workgroup AcquireRelease
 ======================= ============ ===== ======= ========= ==============
 ``AllMemoryBarrier``    Device       ✓       ✓         ✓          ✓
 ``DeviceMemoryBarrier`` Device       ✓       ✓                    ✓
 ``GroupMemoryBarrier``  Workgroup                       ✓          ✓
 ======================= ============ ===== ======= ========= ==============

 For the ``*WithGroupSync`` intrinsics, the SPIR-V memory scope and semantics are
 the same as those of their counterparts in the table above. They also take an
 execution scope:

 ==================================== ======================
        HLSL Intrinsic                SPIR-V Execution Scope
 ==================================== ======================
 ``AllMemoryBarrierWithGroupSync``    Workgroup
 ``DeviceMemoryBarrierWithGroupSync`` Workgroup
 ``GroupMemoryBarrierWithGroupSync``  Workgroup
 ==================================== ======================
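
 As a sketch of how these tables apply, consider the hypothetical compute shader
 below. The resource name ``gOutput``, the array ``gPartial``, and the group size
 of 64 are made up for illustration. Going by the tables above, the
 ``GroupMemoryBarrierWithGroupSync()`` call would become an ``OpControlBarrier``
 with ``Workgroup`` execution scope, ``Workgroup`` memory scope, and ``Workgroup``
 plus ``AcquireRelease`` memory semantics:

 .. code:: hlsl

   RWStructuredBuffer<float> gOutput : register(u0);  // hypothetical output buffer
   groupshared float gPartial[64];                    // hypothetical scratch array

   [numthreads(64, 1, 1)]
   void main(uint gi : SV_GroupIndex, uint3 gid : SV_GroupID) {
       // Each thread publishes one value to group-shared memory.
       gPartial[gi] = float(gi);

       // Wait for every thread in the group and make the writes above visible.
       GroupMemoryBarrierWithGroupSync();

       // Only the first thread sums the values written by the whole group.
       if (gi == 0) {
           float sum = 0.0f;
           for (uint i = 0; i < 64; ++i) {
               sum += gPartial[i];
           }
           gOutput[gid.x] = sum;
       }
   }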

 HLSL OO features
 ================

 An HLSL struct/class member method is translated into a normal SPIR-V function
 whose signature has an additional first parameter for the struct/class object
 the method is called on. Every call site of the method is generated to pass in
 the object as the first argument.
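
 The sketch below illustrates the idea at the HLSL source level rather than in
 SPIR-V. The struct ``Counter`` and the free function ``Counter_Add`` are
 hypothetical names; the free function only approximates the shape of the
 generated SPIR-V function and is not what the compiler literally emits:

 .. code:: hlsl

   struct Counter {
       int value;

       // Member method as written by the shader author.
       int Add(int delta) {
           value += delta;
           return value;
       }
   };

   // Approximate shape of the generated function: the object becomes an
   // explicit first parameter (hypothetical name and signature).
   int Counter_Add(inout Counter self, int delta) {
       self.value += delta;
       return self.value;
   }

   int Example() {
       Counter c;
       c.value = 1;
       int a = c.Add(2);           // method call as written in the source
       int b = Counter_Add(c, 2);  // same effect with the object passed explicitly
       return a + b;
   }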

 HLSL struct/class static member variables are translated into SPIR-V variables
 in the ``Private`` storage class.

 HLSL Methods
 ============

 This section lists how various HLSL methods are mapped.

 Buffers
 -------

 ``Buffer``
 ~~~~~~~~~~

 ``.Load()``
 +++++++++++
 Since Buffers are represented as ``OpTypeImage`` with ``Sampled`` set to 1
 (meaning to be used with a sampler), ``OpImageFetch`` is used to perform this
 operation. The return value of ``OpImageFetch`` is always a four-component
 vector, so additional instructions are generated to truncate the vector
 and return the desired number of elements.
 If an output unsigned integer ``status`` argument is present, ``OpImageSparseFetch``
 is used instead. The resulting SPIR-V ``Residency Code`` will be written to ``status``.

 ``operator[]``
 ++++++++++++++
 Handled similarly to ``.Load()``.

 ``.GetDimensions()``
 ++++++++++++++++++++
 Since Buffers are represented as ``OpTypeImage`` with dimension of ``Buffer``,
 ``OpImageQuerySize`` is used to perform this operation.

 ``RWBuffer``
 ~~~~~~~~~~~~

 ``.Load()``
 +++++++++++
 Since RWBuffers are represented as ``OpTypeImage`` with ``Sampled`` set to 2
 (meaning to be used without a sampler), ``OpImageRead`` is used to perform this
 operation. If an output unsigned integer ``status`` argument is present, ``OpImageSparseRead``
 is used instead. The resulting SPIR-V ``Residency Code`` will be written to ``status``.

 ``operator[]``
 ++++++++++++++
 Using ``operator[]`` for reading is handled similarly to ``.Load()``, while for
 writing, the ``OpImageWrite`` instruction is generated.

 ``.GetDimensions()``
 ++++++++++++++++++++
 Since RWBuffers are represented as ``OpTypeImage`` with dimension of ``Buffer``,
 ``OpImageQuerySize`` is used to perform this operation.

 ``StructuredBuffer`` and ``RWStructuredBuffer``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 ``.GetDimensions()``
 ++++++++++++++++++++
 Since StructuredBuffers/RWStructuredBuffers are represented as a struct with one
 member that is a runtime array of structures, ``OpArrayLength`` is invoked on
 the runtime array in order to find the dimension.

 ``ByteAddressBuffer``
 ~~~~~~~~~~~~~~~~~~~~~

 ``.GetDimensions()``
 ++++++++++++++++++++
 Since ByteAddressBuffers are represented as a struct with one member that is a
 runtime array of unsigned integers, ``OpArrayLength`` is invoked on the runtime array
 in order to find the number of unsigned integers. This is then multiplied by 4 to find
 the number of bytes.

 ``.Load()``, ``.Load2()``, ``.Load3()``, ``.Load4()``
 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 ByteAddressBuffers are represented as a struct with one member that is a runtime array of
 unsigned integers. The ``address`` argument passed to the function is first divided by 4
 in order to find the offset into the array (because each array element is 4 bytes). The
 SPIR-V ``OpAccessChain`` instruction is then used to access that offset, and ``OpLoad`` is
 used to load a 32-bit unsigned integer. For ``Load2``, ``Load3``, and ``Load4``, this is
 done 2, 3, and 4 times, respectively. Each time the word offset is incremented by 1 before
 performing ``OpAccessChain``. After all ``OpLoad`` operations are performed, a vector is
 constructed with all the resulting values.
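
 The address arithmetic can be sketched in HLSL as follows. ``gWords`` is a
 hypothetical ``StructuredBuffer<uint>`` standing in for the runtime array of
 unsigned integers, and ``Load2Equivalent`` is a made-up name. The sketch only
 mirrors the steps described above, not the exact code the compiler generates:

 .. code:: hlsl

   // Hypothetical stand-in for the runtime array of unsigned integers.
   StructuredBuffer<uint> gWords : register(t0);

   // Mirrors the lowering described for .Load2(address).
   uint2 Load2Equivalent(uint address) {
       uint word = address / 4;         // byte address -> index of a 4-byte element
       uint first = gWords[word];       // first access chain and load
       uint second = gWords[word + 1];  // word offset incremented by 1, then loaded
       return uint2(first, second);     // build the result vector from both loads
   }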

 ``RWByteAddressBuffer``
 ~~~~~~~~~~~~~~~~~~~~~~~

 ``.GetDimensions()``
 ++++++++++++++++++++
 Since RWByteAddressBuffers are represented as a struct with one member that is a
 runtime array of unsigned integers, ``OpArrayLength`` is invoked on the runtime array
 in order to find the number of unsigned integers. This is then multiplied by 4 to find
 the number of bytes.

 ``.Load()``, ``.Load2()``, ``.Load3()``, ``.Load4()``
 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 RWByteAddressBuffers are represented as a struct with one member that is a runtime array of
 unsigned integers. The ``address`` argument passed to the function is first divided by 4
 in order to find the offset into the array (because each array element is 4 bytes). The
 SPIR-V ``OpAccessChain`` instruction is then used to access that offset, and ``OpLoad`` is
 used to load a 32-bit unsigned integer. For ``Load2``, ``Load3``, and ``Load4``, this is
 done 2, 3, and 4 times, respectively. Each time the word offset is incremented by 1 before
 performing ``OpAccessChain``. After all ``OpLoad`` operations are performed, a vector is
 constructed with all the resulting values.

 ``.Store()``, ``.Store2()``, ``.Store3()``, ``.Store4()``
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 RWByteAddressBuffers are represented as a struct with one member that is a runtime array of
 unsigned integers. The ``address`` argument passed to the function is first divided by 4
 in order to find the offset into the array (because each array element is 4 bytes). The
 SPIR-V ``OpAccessChain`` instruction is then used to access that offset, and ``OpStore`` is
 used to store a 32-bit unsigned integer. For ``Store2``, ``Store3``, and ``Store4``, this is
 done 2, 3, and 4 times, respectively. Each time the word offset is incremented by 1 before
 performing ``OpAccessChain``.

 ``.Interlocked*()``
 +++++++++++++++++++

 ================================= =================================
      HLSL Intrinsic Method                SPIR-V Opcode
 ================================= =================================
 ``.InterlockedAdd()``             ``OpAtomicIAdd``
 ``.InterlockedAnd()``             ``OpAtomicAnd``
 ``.InterlockedOr()``              ``OpAtomicOr``
 ``.InterlockedXor()``             ``OpAtomicXor``
 ``.InterlockedMin()``             ``OpAtomicUMin``/``OpAtomicSMin``
 ``.InterlockedMax()``             ``OpAtomicUMax``/``OpAtomicSMax``
 ``.InterlockedExchange()``        ``OpAtomicExchange``
 ``.InterlockedCompareExchange()`` ``OpAtomicCompareExchange``
 ``.InterlockedCompareStore()``    ``OpAtomicCompareExchange``
 ================================= =================================
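
 For example, the calls in the hypothetical shader below would be expected to
 use the atomic opcodes listed for ``.InterlockedAdd()`` and
 ``.InterlockedMax()``. The buffer name ``gCounters`` and the byte offsets are
 made up for illustration:

 .. code:: hlsl

   RWByteAddressBuffer gCounters : register(u0);  // hypothetical buffer

   [numthreads(64, 1, 1)]
   void main(uint3 tid : SV_DispatchThreadID) {
       uint previous;
       // Atomically add 1 to the 32-bit value at byte offset 0; previous
       // receives the value stored there before the addition.
       gCounters.InterlockedAdd(0, 1, previous);

       // Atomically keep the largest thread index seen so far at byte offset 4.
       gCounters.InterlockedMax(4, tid.x);
   }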

 ``AppendStructuredBuffer``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~

 ``.Append()``
 +++++++++++++

 The associated counter number will be increased by 1 using ``OpAtomicIAdd``.
 The return value of ``OpAtomicIAdd``, which is the original count number, will
 be used as the index for storing the new element. E.g., for ``buf.Append(vec)``:

 .. code:: spirv

   %counter = OpAccessChain %_ptr_Uniform_int %counter_var_buf %uint_0
     %index = OpAtomicIAdd %uint %counter %uint_1 %uint_0 %uint_1
       %ptr = OpAccessChain %_ptr_Uniform_v4float %buf %uint_0 %index
       %val = OpLoad %v4float %vec
              OpStore %ptr %val
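
 The same sequence can be sketched at the HLSL level. The resources ``gData``
 and ``gCounter`` are hypothetical stand-ins for the buffer and its associated
 counter variable; the real counter is managed by the compiler and is not
 declared by the shader author:

 .. code:: hlsl

   RWStructuredBuffer<float4> gData : register(u0);  // hypothetical element storage
   RWStructuredBuffer<int> gCounter : register(u1);  // hypothetical associated counter

   // Mirrors the lowering of buf.Append(vec).
   void AppendEquivalent(float4 vec) {
       int index;
       // Atomically increment the counter; index receives the original count.
       InterlockedAdd(gCounter[0], 1, index);
       // The original count is the slot for the new element.
       gData[index] = vec;
   }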

 ``.GetDimensions()``
 ++++++++++++++++++++
 Since AppendStructuredBuffers are represented as a struct with one member that
 is a runtime array, ``OpArrayLength`` is invoked on the runtime array in order
 to find the number of elements. The stride is also calculated based on GLSL
 ``std430`` as explained above.

 ``ConsumeStructuredBuffer``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

 ``.Consume()``
 ++++++++++++++

 The associated counter number will be decreased by 1 using ``OpAtomicISub``.
 The return value of ``OpAtomicISub`` minus 1, which is the new count number,
 will be used as the index for reading the element. E.g., for
 ``vec = buf.Consume()``:

 .. code:: spirv

   %counter = OpAccessChain %_ptr_Uniform_int %counter_var_buf %uint_0
      %prev = OpAtomicISub %uint %counter %uint_1 %uint_0 %uint_1
     %index = OpISub %uint %prev %uint_1
       %ptr = OpAccessChain %_ptr_Uniform_v4float %buf %uint_0 %index
       %val = OpLoad %v4float %ptr
              OpStore %vec %val
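
 The index computation can also be sketched at the HLSL level. The resources
 ``gData`` and ``gCounter`` are hypothetical stand-ins for the buffer and its
 associated counter variable, and ``ConsumeEquivalent`` is a made-up name:

 .. code:: hlsl

   RWStructuredBuffer<float4> gData : register(u0);  // hypothetical element storage
   RWStructuredBuffer<int> gCounter : register(u1);  // hypothetical associated counter

   // Mirrors the index computation used for consuming an element.
   float4 ConsumeEquivalent() {
       int prev;
       // Atomically decrement the counter; prev receives the original count.
       InterlockedAdd(gCounter[0], -1, prev);
       // The new count (original count minus 1) is the slot to read from.
       int index = prev - 1;
       return gData[index];
   }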

 ``.GetDimensions()``
 ++++++++++++++++++++
 Since ConsumeStructuredBuffers are represented as a struct with one member that
 is a runtime array, ``OpArrayLength`` is invoked on the runtime array in order
 to find the number of elements. The stride is also calculated based on GLSL
 ``std430`` as explained above.

 Read-only textures
 ------------------

 Methods common to all texture types are explained in the "common texture methods"
 section. Methods unique to a specific texture type are explained in the section
 for that texture type.

 Common texture methods
 ~~~~~~~~~~~~~~~~~~~~~~

 ``.Sample(sampler, location[, offset][, clamp][, Status])``
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 Not available to ``Texture2DMS`` and ``Texture2DMSArray``.

 The ``OpImageSampleImplicitLod`` instruction is used to translate ``.Sample()``
 since texture types are represented as ``OpTypeImage``. An ``OpSampledImage`` is
 created based on the ``sampler`` passed to the function. The resulting sampled
 image and the ``location`` passed to the function are used as arguments to
 ``OpImageSampleImplicitLod``, with the optional ``offset`` translated into the
 additional SPIR-V image operand ``ConstOffset`` or ``Offset`` on it. The optional
 ``clamp`` argument will be translated to the ``MinLod`` image operand.

 If an output unsigned integer ``status`` argument is present,
 ``OpImageSparseSampleImplicitLod`` is used instead. The resulting SPIR-V
 ``Residency Code`` will be written to ``status``.
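
 A hypothetical pixel shader exercising all of the optional arguments might look
 like the sketch below. The names ``gTex`` and ``gSampler`` and the fallback color
 are made up for illustration, and the comments describe the translation expected
 from the text above (a constant ``offset`` presumably selecting ``ConstOffset``):

 .. code:: hlsl

   Texture2D<float4> gTex : register(t0);  // hypothetical texture
   SamplerState gSampler : register(s0);   // hypothetical sampler

   float4 main(float2 uv : TEXCOORD0) : SV_Target {
       uint status;
       // offset int2(1, 0): image operand ConstOffset (or Offset)
       // clamp 2.0:         image operand MinLod
       // status present:    sparse variant of the sampling instruction
       float4 color = gTex.Sample(gSampler, uv, int2(1, 0), 2.0f, status);

       // Fall back to opaque black if any accessed texel was not resident.
       if (!CheckAccessFullyMapped(status)) {
           color = float4(0.0f, 0.0f, 0.0f, 1.0f);
       }
       return color;
   }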

 ``.SampleLevel(sampler, location, lod[, offset][, Status])``
 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 Not available to ``Texture2DMS`` and ``Texture2DMSArray``.

 The ``OpImageSampleExplicitLod`` instruction is used to translate this method.
 An ``OpSampledImage`` is created based on the ``sampler`` passed to the function.
 The resulting sampled image and the ``location`` passed to the function are used
 as arguments to ``OpImageSampleExplicitLod``. The ``lod`` passed to the function
 is attached to the instruction as the SPIR-V image operand ``Lod``. The optional
 ``offset`` is also translated into the additional SPIR-V image operand
 ``ConstOffset`` or ``Offset`` on it.

 If an output unsigned integer ``status`` argument is present,
 ``OpImageSparseSampleExplicitLod`` is used instead. The resulting SPIR-V
 ``Residency Code`` will be written to ``status``.

 ``.SampleGrad(sampler, location, ddx, ddy[, offset][, clamp][, Status])``
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 Not available to ``Texture2DMS`` and ``Texture2DMSArray``.

 Similarly to ``.SampleLevel``, the ``ddx`` and ``ddy`` parameters are attached to
 the ``OpImageSampleExplicitLod`` instruction as the SPIR-V image operand
 ``Grad``. The optional ``clamp`` argument will be translated into the ``MinLod``
 image operand.

 If an output unsigned integer ``status`` argument is present,
 ``OpImageSparseSampleExplicitLod`` is used instead. The resulting SPIR-V
 ``Residency Code`` will be written to ``status``.

 ``.SampleBias(sampler, location, bias[, offset][, clamp][, Status])``
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 Not available to ``Texture2DMS`` and ``Texture2DMSArray``.

 The translation is similar to ``.Sample()``, with the ``bias`` parameter
 attached to the ``OpImageSampleImplicitLod`` instruction as the SPIR-V image
 operand ``Bias``.

 If an output unsigned integer ``status`` argument is present,
 ``OpImageSparseSampleImplicitLod`` is used instead. The resulting SPIR-V
 ``Residency Code`` will be written to ``status``.

 ``.SampleCmp(sampler, location, comparator[, offset][, clamp][, Status])``
 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 Not available to ``Texture3D``, ``Texture2DMS``, and ``Texture2DMSArray``.

 The translation is similar to ``.Sample()``, but the
 ``OpImageSampleDrefImplicitLod`` instruction is used.

 If an output unsigned integer ``status`` argument is present,
 ``OpImageSparseSampleDrefImplicitLod`` is used instead. The resulting SPIR-V
 ``Residency Code`` will be written to ``status``.

 ``.SampleCmpLevelZero(sampler, location, comparator[, offset][, Status])``
 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 Not available to ``Texture3D``, ``Texture2DMS``, and ``Texture2DMSArray``.

 The translation is similar to ``.Sample()``, but the
 ``OpImageSampleDrefExplicitLod`` instruction is used, with the additional
 ``Lod`` image operand set to 0.0.

 If an output unsigned integer ``status`` argument is present,
 ``OpImageSparseSampleDrefExplicitLod`` is used instead. The resulting SPIR-V
 ``Residency Code`` will be written to ``status``.

 ``.Gather()``
 +++++++++++++

 Available to ``Texture2D``, ``Texture2DArray``, ``TextureCube``, and
 ``TextureCubeArray``.

 The translation is similar to ``.Sample()``, but the ``OpImageGather``
 instruction is used, with the component set to 0.

 If an output unsigned integer ``status`` argument is present,
 ``OpImageSparseGather`` is used instead. The resulting SPIR-V
 ``Residency Code`` will be written to ``status``.

 ``.GatherRed()``, ``.GatherGreen()``, ``.GatherBlue()``, ``.GatherAlpha()``
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 Available to ``Texture2D``, ``Texture2DArray``, ``TextureCube``, and
 ``TextureCubeArray``.

 The ``OpImageGather`` instruction is used to translate these functions, with
 the component set to 0, 1, 2, and 3, respectively.

 There are a few overloads for these functions:

 - For the overloads taking 4 offset parameters, the offsets are conveyed as an
   additional ``ConstOffsets`` image operand to the instruction if they are all
   constants. Otherwise, 4 separate ``OpImageGather`` instructions are emitted,
   one per offset, each using the ``Offset`` image operand to fetch its texel.
 - For the overloads with the ``status`` parameter, ``OpImageSparseGather``
   is used instead, and the resulting SPIR-V ``Residency Code`` will be
   written to ``status``.
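
 The two cases of the 4-offset overload can be illustrated with the hypothetical
 shader below. The names ``gTex``, ``gSampler``, and ``gDynamicOffset`` are made up
 for illustration, and the comments restate the translation described in the
 list above:

 .. code:: hlsl

   Texture2D<float4> gTex : register(t0);  // hypothetical texture
   SamplerState gSampler : register(s0);   // hypothetical sampler

   cbuffer Params : register(b0) {
       int2 gDynamicOffset;                // hypothetical non-constant offset
   };

   float4 main(float2 uv : TEXCOORD0) : SV_Target {
       // All four offsets are constants: a single gather with ConstOffsets.
       float4 a = gTex.GatherRed(gSampler, uv,
                                 int2(0, 0), int2(1, 0), int2(0, 1), int2(1, 1));

       // One offset is only known at run time: four gathers, each with Offset.
       float4 b = gTex.GatherRed(gSampler, uv,
                                 gDynamicOffset, int2(1, 0), int2(0, 1), int2(1, 1));

       return a + b;
   }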

 ``.GatherCmp()``
 ++++++++++++++++

 Available to ``Texture2D``, ``Texture2DArray``, ``TextureCube``, and
 ``TextureCubeArray``.

 The translation is similar to ``.Sample()``, but the ``OpImageDrefGather``
 instruction is used.

 For the overload with the output unsigned integer ``status`` argument,
 ``OpImageSparseDrefGather`` is used instead. The resulting SPIR-V
 ``Residency Code`` will be written to ``status``.


 ``.GatherCmpRed()``
 +++++++++++++++++++

 Available to ``Texture2D``, ``Texture2DArray``, ``TextureCube``, and
 ``TextureCubeArray``.

 The translation is the same as ``.GatherCmp()``.

 ``.Load(location[, sampleIndex][, offset])``
 ++++++++++++++++++++++++++++++++++++++++++++

 The ``OpImageFetch`` instruction is used for translation because texture types
 are represented as ``OpTypeImage``. The last element in the ``location``
 parameter will be used as the argument to the ``Lod`` SPIR-V image operand attached
 to the ``OpImageFetch`` instruction, and the rest are used as the coordinate
 argument to the instruction. ``offset`` is handled similarly to ``.Sample()``.
 The return value of ``OpImageFetch`` is always a four-component vector, so
 additional instructions are generated to truncate the vector and return
 the desired number of elements.

 For the overload with the output unsigned integer ``status`` argument,
 ``OpImageSparseFetch`` is used instead. The resulting SPIR-V
 ``Residency Code`` will be written to ``status``.
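
 As a small illustration, the hypothetical shader below loads from a
 three-component texture. The name ``gTex`` is made up; the comments restate how
 the ``location`` argument is split and why a truncation is needed:

 .. code:: hlsl

   Texture2D<float3> gTex : register(t0);  // hypothetical three-component texture

   float4 main(float4 pos : SV_Position) : SV_Target {
       // For Texture2D, location is an int3: (x, y) is the coordinate and the
       // last element is the mip level that becomes the Lod image operand.
       int3 location = int3(int2(pos.xy), 0);

       // The fetch yields four components; only three are kept for float3.
       float3 rgb = gTex.Load(location);
       return float4(rgb, 1.0f);
   }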

 ``operator[]``
 ++++++++++++++
 Handled similarly to ``.Load()``.

 ``.mips[lod][position]``
 ++++++++++++++++++++++++

 Not available to ``TextureCube``, ``TextureCubeArray``, ``Texture2DMS``, and
 ``Texture2DMSArray``.

 This method is translated into the ``OpImageFetch`` instruction. The ``lod``
 parameter is attached to the instruction as the argument to the ``Lod`` SPIR-V
 image operand. The ``position`` parameter is used directly as the coordinate
 of the instruction.

 ``.CalculateLevelOfDetail()`` and ``.CalculateLevelOfDetailUnclamped()``
 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 Not available to ``Texture2DMS`` and ``Texture2DMSArray``.

 Since texture types are represented as ``OpTypeImage``, the ``OpImageQueryLod``
 instruction is used for translation. An ``OpSampledImage`` is created based on
 the ``SamplerState`` passed to the function. The resulting sampled image and
 the coordinate passed to the function are used to invoke ``OpImageQueryLod``.
 The result of ``OpImageQueryLod`` is a ``float2``. The first element contains
 the mipmap array layer. The second element contains the unclamped level of detail.

 ``Texture1D``
 ~~~~~~~~~~~~~

 ``.GetDimensions(width)`` or ``.GetDimensions(MipLevel, width, NumLevels)``
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 Since Texture1D is represented as ``OpTypeImage``, the ``OpImageQuerySizeLod`` instruction
 is used for translation. If a ``MipLevel`` argument is passed to ``GetDimensions``, it will
 be used as the ``Lod`` parameter of the query instruction. Otherwise, a ``Lod`` of ``0`` will be used.

 ``Texture1DArray``
 ~~~~~~~~~~~~~~~~~~

 ``.GetDimensions(width, elements)`` or ``.GetDimensions(MipLevel, width, elements, NumLevels)``
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 Since Texture1DArray is represented as ``OpTypeImage``, the ``OpImageQuerySizeLod`` instruction
 is used for translation. If a ``MipLevel`` argument is present, it will be used as the
 ``Lod`` parameter of the query instruction. Otherwise, a ``Lod`` of ``0`` will be used.

 ``Texture2D``
 ~~~~~~~~~~~~~

 ``.GetDimensions(width, height)`` or ``.GetDimensions(MipLevel, width, height, NumLevels)``
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 Since Texture2D is represented as ``OpTypeImage``, the ``OpImageQuerySizeLod`` instruction
 is used for translation. If a ``MipLevel`` argument is present, it will be used as the
 ``Lod`` parameter of the query instruction. Otherwise, a ``Lod`` of ``0`` will be used.
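
 The two overloads can be contrasted with the hypothetical helpers below. The
 names ``gTex``, ``TexelSizeAtMip``, and ``TexelSizeAtBase`` are made up for
 illustration:

 .. code:: hlsl

   Texture2D<float4> gTex : register(t0);  // hypothetical texture

   // MipLevel is passed, so it is used as the Lod of the size query.
   float2 TexelSizeAtMip(uint mip) {
       uint width, height, numLevels;
       gTex.GetDimensions(mip, width, height, numLevels);
       return float2(1.0f / width, 1.0f / height);
   }

   // No MipLevel argument, so the size of level 0 is queried.
   float2 TexelSizeAtBase() {
       uint width, height;
       gTex.GetDimensions(width, height);
       return float2(1.0f / width, 1.0f / height);
   }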

 ``Texture2DArray``
 ~~~~~~~~~~~~~~~~~~

 ``.GetDimensions(width, height, elements)`` or ``.GetDimensions(MipLevel, width, height, elements, NumLevels)``
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 Since Texture2DArray is represented as ``OpTypeImage``, the ``OpImageQuerySizeLod`` instruction
 is used for translation. If a ``MipLevel`` argument is present, it will be used as the
 ``Lod`` parameter of the query instruction. Otherwise, a ``Lod`` of ``0`` will be used.

 ``Texture3D``
 ~~~~~~~~~~~~~

 ``.GetDimensions(width, height, depth)`` or ``.GetDimensions(MipLevel, width, height, depth, NumLevels)``
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 Since Texture3D is represented as ``OpTypeImage``, the ``OpImageQuerySizeLod`` instruction
 is used for translation. If a ``MipLevel`` argument is present, it will be used as the
 ``Lod`` parameter of the query instruction. Otherwise, a ``Lod`` of ``0`` will be used.

 ``Texture2DMS``
 ~~~~~~~~~~~~~~~

 ``.sample[sample][position]``
 +++++++++++++++++++++++++++++
 This method is translated into the ``OpImageFetch`` instruction. The ``sample``
 parameter is attached to the instruction as the argument to the ``Sample``
 SPIR-V image operand. The ``position`` parameter is used directly as the
 coordinate of the instruction.

 ``.GetDimensions(width, height, numSamples)``
 +++++++++++++++++++++++++++++++++++++++++++++
 Since Texture2DMS is represented as ``OpTypeImage`` with ``MS`` of ``1``, the ``OpImageQuerySize`` instruction
 is used to get the width and the height. Furthermore, ``OpImageQuerySamples`` is used to get the numSamples.

 ``.GetSamplePosition(index)``
 +++++++++++++++++++++++++++++
 No SPIR-V instruction maps directly to this method. Right now, it
 is translated into the SPIR-V code for the following HLSL source code:

 .. code:: hlsl

   // count is the number of samples in the Texture2DMS(Array)
   // index is the index of the sample we are trying to get the position

   static const float2 pos2[] = {
       { 4.0/16.0,  4.0/16.0 }, {-4.0/16.0, -4.0/16.0 },
   };

   static const float2 pos4[] = {
       {-2.0/16.0, -6.0/16.0 }, { 6.0/16.0, -2.0/16.0 }, {-6.0/16.0,  2.0/16.0 }, { 2.0/16.0,  6.0/16.0 },
   };

   static const float2 pos8[] = {
       { 1.0/16.0, -3.0/16.0 }, {-1.0/16.0,  3.0/16.0 }, { 5.0/16.0,  1.0/16.0 }, {-3.0/16.0, -5.0/16.0 },
       {-5.0/16.0,  5.0/16.0 }, {-7.0/16.0, -1.0/16.0 }, { 3.0/16.0,  7.0/16.0 }, { 7.0/16.0, -7.0/16.0 },
   };

   static const float2 pos16[] = {
       { 1.0/16.0,  1.0/16.0 }, {-1.0/16.0, -3.0/16.0 }, {-3.0/16.0,  2.0/16.0 }, { 4.0/16.0, -1.0/16.0 },
       {-5.0/16.0, -2.0/16.0 }, { 2.0/16.0,  5.0/16.0 }, { 5.0/16.0,  3.0/16.0 }, { 3.0/16.0, -5.0/16.0 },
       {-2.0/16.0,  6.0/16.0 }, { 0.0/16.0, -7.0/16.0 }, {-4.0/16.0, -6.0/16.0 }, {-6.0/16.0,  4.0/16.0 },
       {-8.0/16.0,  0.0/16.0 }, { 7.0/16.0, -4.0/16.0 }, { 6.0/16.0,  7.0/16.0 }, {-7.0/16.0, -8.0/16.0 },
   };

   float2 position = float2(0.0f, 0.0f);

   if (count == 2) {
       position = pos2[index];
   } else if (count == 4) {
       position = pos4[index];
   } else if (count == 8) {
       position = pos8[index];
   } else if (count == 16) {
       position = pos16[index];
   }

 From the above, it's clear that the current implementation only supports standard
 sample settings, i.e., with 1, 2, 4, 8, or 16 samples. For other cases, the
 implementation will just return ``(float2)0``.

 ``Texture2DMSArray``
 ~~~~~~~~~~~~~~~~~~~~

 ``.sample[sample][position]``
 +++++++++++++++++++++++++++++
 This method is translated into the ``OpImageFetch`` instruction. The ``sample``
 parameter is attached to the instruction as the argument to the ``Sample``
 SPIR-V image operand. The ``position`` parameter is used directly as the
 coordinate of the instruction.

 ``.GetDimensions(width, height, elements, numSamples)``
 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 Since Texture2DMSArray is represented as ``OpTypeImage`` with ``MS`` of ``1``, the ``OpImageQuerySize`` instruction
 is used to get the width, the height, and the elements. Furthermore, ``OpImageQuerySamples`` is used to get the numSamples.

 ``.GetSamplePosition(index)``
 +++++++++++++++++++++++++++++
 Similar to Texture2DMS.

 ``TextureCube``
 ~~~~~~~~~~~~~~~

 ``TextureCubeArray``
 ~~~~~~~~~~~~~~~~~~~~

 Read-write textures
 -------------------

 Methods common to all texture types are explained in the "common texture methods"
 section. Methods unique to a specific texture type are explained in the section
 for that texture type.

 Common texture methods
 ~~~~~~~~~~~~~~~~~~~~~~

 ``.Load()``
 +++++++++++
 Since read-write texture types are represented as ``OpTypeImage`` with
 ``Sampled`` set to 2 (meaning to be used without a sampler), ``OpImageRead`` is
 used to perform this operation.

 For the overload with the output unsigned integer ``status`` argument,
 ``OpImageSparseRead`` is used instead. The resulting SPIR-V
 ``Residency Code`` will be written to ``status``.

 ``operator[]``
 ++++++++++++++
 Using ``operator[]`` for reading is handled similarly to ``.Load()``, while for
 writing, the ``OpImageWrite`` instruction is generated.

 ``RWTexture1D``
 ~~~~~~~~~~~~~~~

 ``.GetDimensions(width)``
 +++++++++++++++++++++++++
 The ``OpImageQuerySize`` instruction is used to find the width.

 ``RWTexture1DArray``
 ~~~~~~~~~~~~~~~~~~~~

 ``.GetDimensions(width, elements)``
 +++++++++++++++++++++++++++++++++++
 The ``OpImageQuerySize`` instruction is used to get a uint2. The first element
 is the width, and the second is the elements.

 ``RWTexture2D``
 ~~~~~~~~~~~~~~~

 ``.GetDimensions(width, height)``
 +++++++++++++++++++++++++++++++++
 The ``OpImageQuerySize`` instruction is used to get a uint2. The first element is the width, and the second
 element is the height.

 ``RWTexture2DArray``
 ~~~~~~~~~~~~~~~~~~~~

 ``.GetDimensions(width, height, elements)``
 +++++++++++++++++++++++++++++++++++++++++++
 The ``OpImageQuerySize`` instruction is used to get a uint3. The first element is the width, the second
 element is the height, and the third is the elements.

 ``RWTexture3D``
 ~~~~~~~~~~~~~~~

 ``.GetDimensions(width, height, depth)``
 ++++++++++++++++++++++++++++++++++++++++
 The ``OpImageQuerySize`` instruction is used to get a uint3. The first element is the width, the second
 element is the height, and the third element is the depth.

 HLSL Shader Stages
 ==================

 Hull Shaders
 ------------

 Hull shaders correspond to Tessellation Control Shaders (TCS) in Vulkan.
 This section describes how Hull shaders are translated to SPIR-V for Vulkan.

 Hull Entry Point Attributes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The following HLSL attributes are attached to the main entry point of hull shaders
 and are translated to SPIR-V execution modes according to the table below:

 .. table:: Mapping from HLSL attribute to SPIR-V execution mode

 +-------------------------+---------------------+--------------------------+
 | HLSL Attribute          |   value             | SPIR-V Execution Mode    |
 +=========================+=====================+==========================+
 |                         | ``quad``            | ``Quads``                |
 |                         +---------------------+--------------------------+
 |    ``domain``           | ``tri``             | ``Triangles``            |
 |                         +---------------------+--------------------------+
 |                         | ``isoline``         | ``Isolines``             |
 +-------------------------+---------------------+--------------------------+
 |                         | ``integer``         | ``SpacingEqual``         |
 |                         +---------------------+--------------------------+
 |                         | ``fractional_even`` | ``SpacingFractionalEven``|
 |    ``partitioning``     +---------------------+--------------------------+
 |                         | ``fractional_odd``  | ``SpacingFractionalOdd`` |
 |                         +---------------------+--------------------------+
 |                         | ``pow2``            |           N/A            |
 +-------------------------+---------------------+--------------------------+
 |                         | ``point``           | ``PointMode``            |
 |                         +---------------------+--------------------------+
 |                         | ``line``            |           N/A            |
 |  ``outputtopology``     +---------------------+--------------------------+
 |                         | ``triangle_cw``     | ``VertexOrderCw``        |
 |                         +---------------------+--------------------------+
 |                         | ``triangle_ccw``    | ``VertexOrderCcw``       |
 +-------------------------+---------------------+--------------------------+
 |``outputcontrolpoints``  | ``n``               | ``OutputVertices n``     |
 +-------------------------+---------------------+--------------------------+

 The ``patchconstantfunc`` attribute does not have a direct equivalent in SPIR-V.
 It specifies the name of the Patch Constant Function. This function is run only
 once per patch. This is further described below.

 InputPatch and OutputPatch
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 Both ``InputPatch<T, N>`` and ``OutputPatch<T, N>`` are translated to an array
 of constant size ``N`` where each element is of type ``T``.

 InputPatch can be passed to the Hull shader main entry function as well as the
 patch constant function. This would include information about each of the ``N``
 vertices that are input to the tessellation control shader.

 OutputPatch is an array containing ``N`` elements (where ``N`` is the number of
 output vertices). Each element of the array is the hull shader output for one
 output vertex. For example, element ``i`` of ``OutputPatch<HSOutput, 3>`` is the
 value returned by the hull shader function for ``SV_OutputControlPointID`` ``i``.
 The array is shared between threads, i.e., in the patch constant function,
 threads for the same patch must see the same values for the elements of
 ``OutputPatch<HSOutput, 3>``.

 The SPIR-V ``InvocationId`` (``SV_OutputControlPointID`` in HLSL) is used to index
 into the InputPatch and OutputPatch arrays to read/write information for the given
 vertex.

 The hull main entry function in HLSL returns only one value (say, of type ``T``), but
 that function is in fact executed once for each control point. The Vulkan spec requires that
 "Tessellation control shader per-vertex output variables and blocks, and tessellation control,
 tessellation evaluation, and geometry shader per-vertex input variables and blocks are required
 to be declared as arrays, with each element representing input or output values for a single vertex
 of a multi-vertex primitive". Therefore, we need to create a stage output variable that is an array
 with elements of type ``T``. The number of elements of the array is equal to the number of
 output control points. Each final output control point is written into the corresponding element in
 the array using ``SV_OutputControlPointID`` as the index.

 Patch Constant Function
 ~~~~~~~~~~~~~~~~~~~~~~~
 As mentioned above, the patch constant function is to be invoked only once per patch.
 As a result, in the SPIR-V module, the `entry function wrapper`_ will first invoke the
 main entry function, and then use an ``OpControlBarrier`` to wait for all vertex
 processing to finish. After the barrier, *only* the first thread (with ``InvocationID`` of 0)
 will invoke the patch constant function. Since the first thread has to see the
 hull shader outputs written by the other threads, the OutputPatch passed to the
 patch constant function is backed by the stage output variable (with ``Output``
 storage class) of the hull shader function.

 The information resulting from the patch constant function will also be returned
 as stage output variables. The output struct of the patch constant function must include
 ``SV_TessFactor`` and ``SV_InsideTessFactor`` fields, which will translate to
 ``TessLevelOuter`` and ``TessLevelInner`` builtin variables, respectively. The remaining
 fields will be flattened and translated into normal stage output variables, one for each field.
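
 As a minimal sketch of the pieces described in this section, the following
 hull shader works on a 3-point patch. The struct names, the ``PCF`` function
 name, and the tessellation factor values are illustrative assumptions and are
 not taken from the compiler's tests:

 .. code:: hlsl

   struct VSOutput { float4 pos : POSITION; };
   struct HSOutput { float4 pos : POSITION; };

   // Output struct of the patch constant function. The two SV_ fields are
   // the ones expected to map to the TessLevelOuter/TessLevelInner builtins.
   struct HSPatchConstants {
     float edges[3] : SV_TessFactor;
     float inside   : SV_InsideTessFactor;
   };

   // Patch constant function: runs once per patch and may read the
   // OutputPatch produced by all hull shader invocations.
   HSPatchConstants PCF(InputPatch<VSOutput, 3> ip,
                        OutputPatch<HSOutput, 3> op) {
     HSPatchConstants pc;
     pc.edges[0] = 1.0f;
     pc.edges[1] = 1.0f;
     pc.edges[2] = 1.0f;
     pc.inside   = 1.0f;
     return pc;
   }

   [domain("tri")]
   [partitioning("integer")]
   [outputtopology("triangle_cw")]
   [outputcontrolpoints(3)]
   [patchconstantfunc("PCF")]
   HSOutput main(InputPatch<VSOutput, 3> ip,
                 uint id : SV_OutputControlPointID) {
     HSOutput o;
     // The returned value ends up in element 'id' of the output array.
     o.pos = ip[id].pos;
     return o;
   }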

 Geometry Shaders
 ----------------

 This section describes how geometry shaders are translated to SPIR-V for Vulkan.

 Geometry Shader Entry Point Attributes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The following HLSL attributes are attached to the main entry point of geometry shaders
 and are translated to SPIR-V execution modes as follows:

 .. table:: Mapping from geometry shader HLSL attribute to SPIR-V execution mode

 +-------------------------+---------------------+--------------------------+
 | HLSL Attribute          |   value             | SPIR-V Execution Mode    |
 +=========================+=====================+==========================+
 |``maxvertexcount``       | ``n``               | ``OutputVertices n``     |
 +-------------------------+---------------------+--------------------------+
 |``instance``             | ``n``               | ``Invocations n``        |
 +-------------------------+---------------------+--------------------------+

 Translation for Primitive Types
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Geometry shader vertex inputs may be qualified with primitive types. Only one primitive type
 is allowed to be used in a given geometry shader. The following table shows the SPIR-V execution
 mode that is used in order to represent the given primitive type.

 .. table:: Mapping from geometry shader primitive type to SPIR-V execution mode

 +---------------------+-----------------------------+
 | HLSL Primitive Type | SPIR-V Execution Mode       |
 +=====================+=============================+
 |``point``            | ``InputPoints``             |
 +---------------------+-----------------------------+
 |``line``             | ``InputLines``              |
 +---------------------+-----------------------------+
 |``triangle``         | ``Triangles``               |
 +---------------------+-----------------------------+
 |``lineadj``          | ``InputLinesAdjacency``     |
 +---------------------+-----------------------------+
 |``triangleadj``      | ``InputTrianglesAdjacency`` |
 +---------------------+-----------------------------+

 Translation of Output Stream Types
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Supported output stream types in geometry shaders are: ``PointStream<T>``,
 ``LineStream<T>``, and ``TriangleStream<T>``. These types are translated as the underlying
 type ``T``, which is recursively flattened into stand-alone variables for each field.

 Furthermore, output stream objects passed to geometry shader entry points are
 required to be annotated with ``inout``, but the generated SPIR-V only contains
 stage output variables for them.

 The following table shows the SPIR-V execution mode that is used in order to represent the
 given output stream.

 .. table:: Mapping from geometry shader output stream type to SPIR-V execution mode

 +---------------------+-----------------------------+
 | HLSL Output Stream  | SPIR-V Execution Mode       |
 +=====================+=============================+
 |``PointStream``      | ``OutputPoints``            |
 +---------------------+-----------------------------+
 |``LineStream``       | ``OutputLineStrip``         |
 +---------------------+-----------------------------+
 |``TriangleStream``   | ``OutputTriangleStrip``     |
 +---------------------+-----------------------------+

 In other shader stages, stage output variables are only written in the `entry
 function wrapper`_ after calling the source code entry function. However,
 geometry shaders can output as many vertices as they wish, by calling the
 ``.Append()`` method on the output stream object. Therefore, it is incorrect to
 have only one flush in the entry function wrapper like other stages. Instead,
 each time a ``*Stream<T>::Append()`` is encountered, all stage output variables
 behind ``T`` will be flushed before the SPIR-V ``OpEmitVertex`` instruction is
 generated. ``.RestartStrip()`` method calls will be translated into the SPIR-V
 ``OpEndPrimitive`` instruction.
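
 The following geometry shader is a small sketch of the constructs covered in
 this section: a ``triangle`` input, a ``TriangleStream`` output, and calls to
 ``Append()`` and ``RestartStrip()``. The struct names and the pass-through
 logic are assumptions made for illustration only:

 .. code:: hlsl

   struct GSInput  { float4 pos : SV_Position; };
   struct GSOutput { float4 pos : SV_Position; };

   [maxvertexcount(3)]
   void main(triangle GSInput input[3],
             inout TriangleStream<GSOutput> stream) {
     for (uint i = 0; i < 3; ++i) {
       GSOutput o;
       o.pos = input[i].pos;
       // Stage output variables are flushed here, then a vertex is emitted.
       stream.Append(o);
     }
     // Ends the current output primitive.
     stream.RestartStrip();
   }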

 Raytracing Shader Stages
 ------------------------

 DirectX Raytracing adds six new shader stages for raytracing, namely ray generation, intersection, closest-hit,
 any-hit, miss, and callable.

 | Refer to the following pages for details:
 | https://docs.microsoft.com/en-us/windows/desktop/direct3d12/direct3d-12-raytracing
 | https://docs.microsoft.com/en-us/windows/desktop/direct3d12/direct3d-12-raytracing-hlsl-reference


 The flow chart for the various stages in a raytracing pipeline is as follows:
 ::

           +---------------------+
           |   Ray generation    |
           +---------------------+
                      |
           TraceRay() |                      +--------------+
                      |      _ _ _ _ _ _ _ _ |   Any Hit    |
                      |     |                +--------------+
                      V     V                       ^
           +---------------------+                  |
           |    Acceleration     |           +--------------+
           |     Structure       |           | Intersection |
           |     Traversal       |           +--------------+
           +---------------------+                  ^
                     |        |                     |
                     |        |_ _ _ _ _ _ _ _ _ _ _|
                     |
                     |
                     V
           +--------------------+            +-------------+
           |      Is Hit ?      |            |  Callable   |
           +--------------------+            +-------------+
               |            |
           Yes |            | No
               V            V
          +---------+    +------+
          | Closest |    | Miss |
          |   Hit   |    |      |
          +---------+    +------+


 | *Note: DXC does not add special shader profiles for raytracing under the -T option.*
 | *All raytracing shaders must be compiled as a library using the lib_6_3/lib_6_4 profile option.*
 | *Note: DXC now targets the SPV_KHR_ray_tracing extension by default.*
 | *This extension is provisional and subject to change.*
 | *To compile for the NV extension, use -fspv-extension=SPV_NV_ray_tracing.*

 Ray Generation Stage
 ~~~~~~~~~~~~~~~~~~~~

 | Ray generation shaders start ray tracing work and work on a compute-like 3D grid of threads.
 | Entry functions of this stage type are annotated with **[shader("raygeneration")]** in HLSL source.
 | Such entry functions must return void and do not accept any arguments.

 | For example:

 .. code:: hlsl

   RaytracingAccelerationStructure rs;
   struct Payload
   {
     float4 color;
   };
   [shader("raygeneration")]
   void main() {
     Payload myPayload = { float4(0.0f,0.0f,0.0f,0.0f) };
     RayDesc rayDesc;
     rayDesc.Origin = float3(0.0f, 0.0f, 0.0f);
     rayDesc.Direction = float3(0.0f, 0.0f, -1.0f);
     rayDesc.TMin = 0.0f;
     rayDesc.TMax = 1000.0f;
     TraceRay(rs, 0x0, 0xff, 0, 1, 0, rayDesc, myPayload);
   }

 Intersection Stage
 ~~~~~~~~~~~~~~~~~~

 | The intersection shader stage is used to implement arbitrary ray-primitive intersections such as spheres or axis-aligned bounding boxes (AABB). Triangle primitives do not require a custom intersection shader.
 | Entry functions of this stage are annotated with **[shader("intersection")]** in HLSL source.
 | Such entry functions must return void and do not accept any arguments.

 | For example:

 .. code:: hlsl

   struct Attribute
   {
     float2 bary;
   };

   [shader("intersection")]
   void main() {
     Attribute myHitAttribute = { float2(0.0f,0.0f) };
     ReportHit(0.0f, 0U, myHitAttribute);
   }


 Closest-Hit Stage
 ~~~~~~~~~~~~~~~~~

 | Hit shaders are invoked when a ray-primitive intersection is found. A closest-hit shader
 | is invoked for the closest intersection point along a ray and can be used to compute interactions
 | at the intersection point or spawn secondary rays.
 | Entry functions of this stage are annotated with **[shader("closesthit")]** in HLSL source.
 | Such entry functions must return void and accept exactly two arguments. The first argument must be an inout
 | variable of user-defined structure type and the second argument must be an in variable of user-defined structure type.

 | For example:

 .. code:: hlsl

   struct Attribute
   {
     float2 bary;
   };
   struct Payload {
     float4 color;
   };
   [shader("closesthit")]
   void main(inout Payload a, in Attribute b) {
     a.color = float4(0.0f,1.0f,0.0f,0.0f);
   }

 Any-Hit Stage
 ~~~~~~~~~~~~~

 | Hit shaders are invoked when a ray-primitive intersection is found. An any-hit shader
 | is invoked for all intersections along a ray with a primitive.
 | Entry functions of this stage are annotated with **[shader("anyhit")]** in HLSL source.
 | Such entry functions must return void and accept exactly two arguments. The first argument must be an inout
 | variable of user-defined structure type and the second argument must be an in variable of user-defined structure type.

 | For example:

 .. code:: hlsl

   struct Attribute
   {
     float2 bary;
   };
   struct Payload {
     float4 color;
   };
   [shader("anyhit")]
   void main(inout Payload a, in Attribute b) {
     a.color = float4(0.0f,1.0f,0.0f,0.0f);
   }

 Miss Stage
 ~~~~~~~~~~

 | Miss shaders are invoked when no intersection is found.
 | Entry functions of this stage are annotated with **[shader("miss")]** in HLSL source.
 | Such entry functions must return void and accept exactly one argument, which must be an inout variable of user-defined structure type.

 | For example:

 .. code:: hlsl

   struct Payload {
     float4 color;
   };
   [shader("miss")]
   void main(inout Payload a) {
     a.color = float4(0.0f,1.0f,0.0f,0.0f);
   }

 Callable Stage
 ~~~~~~~~~~~~~~

 | Callables are generic function calls which can be invoked from either raygeneration, closest-hit,
 | miss or callable shader stages.
 | Entry functions of this stage are annotated with **[shader("callable")]** in HLSL source.
 | Such entry functions must return void and accept exactly one argument. First argument must be an inout
 | variable of user defined structure type.

 | For example:

 .. code:: hlsl

   struct CallData {
     float4 data;
   };
   [shader("callable")]
   void main(inout CallData a) {
     a.data = float4(0.0f,1.0f,0.0f,0.0f);
   }

 Mesh and Amplification Shaders
 ------------------------------

 | DirectX adds two new shader stages for the mesh shading pipeline, namely Mesh and Amplification.
 | Amplification shaders correspond to Task shaders in Vulkan.
 |
 | Refer to the following HLSL and SPIR-V specs for details:
 | https://docs.microsoft.com/<TBD>
 | https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/NV/SPV_NV_mesh_shader.asciidoc
 |
 | This section describes how Mesh and Amplification shaders are translated to SPIR-V for Vulkan.

 Entry Point Attributes
 ~~~~~~~~~~~~~~~~~~~~~~
 The following HLSL attributes are attached to the main entry point of Mesh and/or Amplification
 shaders and are translated to SPIR-V execution modes according to the table below:

 .. table:: Mapping from HLSL attribute to SPIR-V execution mode

 +-----------------------+--------------------+-------------------------+
 |  HLSL Attribute       |   Value            | SPIR-V Execution Mode   |
 +=======================+====================+=========================+
 |``outputtopology``     | ``point``          | ``OutputPoints``        |
 |                       +--------------------+-------------------------+
 | (SPV_NV_mesh_shader)  | ``line``           | ``OutputLinesNV``       |
 |                       |                    |                         |
 |                       +--------------------+-------------------------+
 |                       | ``triangle``       | ``OutputTrianglesNV``   |
 +-----------------------+--------------------+-------------------------+
 |``outputtopology``     | ``point``          | ``OutputPoints``        |
 |                       +--------------------+-------------------------+
 | (SPV_EXT_mesh_shader) | ``line``           | ``OutputLinesEXT``      |
 |                       |                    |                         |
 |                       +--------------------+-------------------------+
 |                       | ``triangle``       | ``OutputTrianglesEXT``  |
 +-----------------------+--------------------+-------------------------+
 | ``numthreads``        | ``X, Y, Z``        | ``LocalSize X, Y, Z``   |
 |                       |                    |                         |
 |                       | ``(X*Y*Z <= 128)`` |                         |
 +-----------------------+--------------------+-------------------------+

 Intrinsics
 ~~~~~~~~~~
 The following HLSL intrinsics are used in Mesh or Amplification shaders
 and are translated to SPIR-V intrinsics according to the table below:

 .. table:: Mapping from HLSL intrinsics to SPIR-V intrinsics for SPV_NV_mesh_shader

 +---------------------------+--------------------+-----------------------------------------+
 |  HLSL Intrinsic           |  Parameters        | SPIR-V Intrinsic                        |
 +===========================+====================+=========================================+
 | ``SetMeshOutputCounts``   | ``numVertices``    | ``PrimitiveCountNV numPrimitives``      |
 |                           |                    |                                         |
 | ``(Mesh shader)``         | ``numPrimitives``  |                                         |
 +---------------------------+--------------------+-----------------------------------------+
 | ``DispatchMesh``          | ``ThreadX``        | ``OpControlBarrier``                    |
 |                           |                    |                                         |
 | ``(Amplification shader)``| ``ThreadY``        | ``TaskCountNV ThreadX*ThreadY*ThreadZ`` |
 |                           |                    |                                         |
 |                           | ``ThreadZ``        |                                         |
 |                           |                    |                                         |
 |                           | ``MeshPayload``    |                                         |
 +---------------------------+--------------------+-----------------------------------------+

 .. table:: Mapping from HLSL intrinsics to SPIR-V intrinsics for SPV_EXT_mesh_shader

 +---------------------------+--------------------+--------------------------------------------------------------+
 |  HLSL Intrinsic           |  Parameters        | SPIR-V Intrinsic                                             |
 +===========================+====================+==============================================================+
 | ``SetMeshOutputCounts``   | ``numVertices``    | ``OpSetMeshOutputsEXT``                                      |
 |                           |                    |                                                              |
 | ``(Mesh shader)``         | ``numPrimitives``  |                                                              |
 +---------------------------+--------------------+--------------------------------------------------------------+
 | ``DispatchMesh``          | ``ThreadX``        | ``OpEmitMeshTasksEXT ThreadX ThreadY ThreadZ MeshPayload``   |
 |                           |                    |                                                              |
 | ``(Amplification shader)``| ``ThreadY``        |                                                              |
 |                           |                    |                                                              |
 |                           | ``ThreadZ``        |                                                              |
 |                           |                    |                                                              |
 |                           | ``MeshPayload``    |                                                              |
 +---------------------------+--------------------+--------------------------------------------------------------+

 | Note: For the ``DispatchMesh`` intrinsic, we also emit ``MeshPayload`` as an output block with the ``PerTaskNV`` decoration.
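
 A minimal amplification shader sketch that calls ``DispatchMesh`` could look
 as follows. The ``MeshPayload`` struct layout, the ``payload`` variable name,
 and the dispatch dimensions are assumptions chosen for illustration:

 .. code:: hlsl

   struct MeshPayload {
     uint meshletOffset;
   };

   // The payload passed to DispatchMesh must live in groupshared memory.
   groupshared MeshPayload payload;

   [numthreads(1, 1, 1)]
   void main() {
     payload.meshletOffset = 0;
     // Launch a 2x1x1 grid of mesh shader thread groups.
     DispatchMesh(2, 1, 1, payload);
   }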

 Mesh Interface Variables
 ~~~~~~~~~~~~~~~~~~~~~~~~
 | Interface variables are defined for Mesh shaders using HLSL modifiers.
 | The following table gives a high-level overview of the mapping:
 |

 .. table:: Mapping from HLSL modifiers to SPIR-V definitions

 +-----------------+-------------------------------------------------------------------------+
 |  HLSL modifier  | SPIR-V definition                                                       |
 +=================+=========================================================================+
 | ``indices``     | Maps to SPIR-V intrinsic ``PrimitiveIndicesNV``                         |
 |                 |                                                                         |
 |                 | Defines SPIR-V Execution Mode ``OutputPrimitivesNV <array-size>``       |
 +-----------------+-------------------------------------------------------------------------+
 | ``vertices``    | Maps to per-vertex out attributes                                       |
 |                 |                                                                         |
 |                 | Defines existing SPIR-V Execution Mode ``OutputVertices <array-size>``  |
 +-----------------+-------------------------------------------------------------------------+
 | ``primitives``  | Maps to per-primitive out attributes with ``PerPrimitiveNV`` decoration |
 +-----------------+-------------------------------------------------------------------------+
 | ``payload``     | Maps to per-task in attributes with ``PerTaskNV`` decoration            |
 +-----------------+-------------------------------------------------------------------------+
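
 As a rough sketch of how these modifiers appear in source, the following mesh
 shader outputs a single triangle. The ``MeshVertex`` struct and the vertex
 positions are made up for illustration:

 .. code:: hlsl

   struct MeshVertex {
     float4 pos : SV_Position;
   };

   [outputtopology("triangle")]
   [numthreads(1, 1, 1)]
   void main(out vertices MeshVertex verts[3],
             out indices uint3 tris[1]) {
     // Declare how many vertices and primitives this thread group outputs.
     SetMeshOutputCounts(3, 1);

     verts[0].pos = float4(-1.0f, -1.0f, 0.0f, 1.0f);
     verts[1].pos = float4( 1.0f, -1.0f, 0.0f, 1.0f);
     verts[2].pos = float4( 0.0f,  1.0f, 0.0f, 1.0f);

     // One triangle made of the three vertices above.
     tris[0] = uint3(0, 1, 2);
   }

 Here the array sizes of the ``vertices`` and ``indices`` parameters (3 and 1)
 are the ``<array-size>`` values referred to in the table.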


 Raytracing in Vulkan and SPIRV
 ==============================

 | SPIR-V codegen is currently supported for NVIDIA platforms via the SPV_NV_ray_tracing extension or
 | on other platforms via the provisional cross-vendor SPV_KHR_ray_tracing extension.
 | SPIR-V specifications for reference:
 | https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/NV/SPV_NV_ray_tracing.asciidoc
 | https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/KHR/SPV_KHR_ray_tracing.asciidoc

 | Vulkan ray tracing samples:
 | https://developer.nvidia.com/rtx/raytracing/vkray


 Raytracing Mapping to SPIR-V
 ----------------------------

 Intrinsics
 ~~~~~~~~~~


 | The following table provides the mapping for system value intrinsics along with supported shader stages.

 ============================    ===============================    ====== ============ =========== ======= ======== ========
         HLSL                               SPIR-V                               HLSL Shader Stage
 ----------------------------    -------------------------------    ---------------------------------------------------------
   System Value Intrinsic               Builtin                     Raygen Intersection Closest Hit Any Hit   Miss   Callable
 ============================    ===============================    ====== ============ =========== ======= ======== ========
 ``DispatchRaysIndex()``         ``LaunchId{NV/KHR}``               ✓       ✓             ✓          ✓       ✓       ✓
 ``DispatchRaysDimensions()``    ``LaunchSize{NV/KHR}``             ✓       ✓             ✓          ✓       ✓       ✓
 ``WorldRayOrigin()``            ``WorldRayOrigin{NV/KHR}``                 ✓             ✓          ✓       ✓
 ``WorldRayDirection()``         ``WorldRayDirection{NV/KHR}``              ✓             ✓          ✓       ✓
 ``RayTMin()``                   ``RayTmin{NV/KHR}``                        ✓             ✓          ✓       ✓
 ``RayTCurrent()``               ``HitT{NV/KHR}``                           ✓             ✓          ✓       ✓
 ``RayFlags()``                  ``IncomingRayFlags{NV/KHR}``               ✓             ✓          ✓       ✓
 ``InstanceIndex()``             ``InstanceId``                             ✓             ✓          ✓
 ``GeometryIndex()``             ``RayGeometryIndexKHR``                    ✓             ✓          ✓
 ``InstanceID()``                ``InstanceCustomIndex{NV/KHR}``            ✓             ✓          ✓
 ``PrimitiveIndex()``            ``PrimitiveId``                            ✓             ✓          ✓
 ``ObjectRayOrigin()``           ``ObjectRayOrigin{NV/KHR}``                ✓             ✓          ✓
 ``ObjectRayDirection()``        ``ObjectRayDirection{NV/KHR}``             ✓             ✓          ✓
 ``ObjectToWorld3x4()``          ``ObjectToWorld{NV/KHR}``                  ✓             ✓          ✓
 ``ObjectToWorld4x3()``          ``ObjectToWorld{NV/KHR}``                  ✓             ✓          ✓
 ``WorldToObject3x4()``          ``WorldToObject{NV/KHR}``                  ✓             ✓          ✓
 ``WorldToObject4x3()``          ``WorldToObject{NV/KHR}``                  ✓             ✓          ✓
 ``HitKind()``                   ``HitKind{NV/KHR}``                                      ✓          ✓
 ============================    ===============================    ====== ============ =========== ======= ======== ========

 | *There is no separate builtin for the transposed matrices ObjectToWorld3x4 and WorldToObject3x4 in SPIR-V, hence we internally transpose during translation.*
 | *GeometryIndex() is only supported under the SPV_KHR_ray_tracing extension.*
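
 For illustration, a closest-hit shader might combine several of these system
 value intrinsics as sketched below. The struct names and the way the payload
 is filled are assumptions; the comments name the builtins listed in the table
 above:

 .. code:: hlsl

   struct Attribute {
     float2 bary;
   };
   struct Payload {
     float4 color;
   };

   [shader("closesthit")]
   void main(inout Payload payload, in Attribute attr) {
     // LaunchId{NV/KHR}
     uint3 launchId = DispatchRaysIndex();

     // WorldRayOrigin, HitT, and WorldRayDirection builtins
     float3 hitPos = WorldRayOrigin() + RayTCurrent() * WorldRayDirection();

     // ObjectToWorld{NV/KHR}
     float3x4 objectToWorld = ObjectToWorld3x4();

     payload.color = float4(hitPos, 1.0f);
   }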

 | The following table provides the mapping for other intrinsics along with supported shader stages.


 ===========================     =================================     ====== ============ =========== ======= ===== ========
         HLSL                               SPIR-V                                 HLSL Shader Stage
 ---------------------------     ---------------------------------     ------------------------------------------------------
    Intrinsic                              Opcode                      Raygen Intersection Closest Hit Any Hit  Miss Callable
 ===========================     =================================     ====== ============ =========== ======= ===== ========
 ``TraceRay``                    ``OpTrace{NV/KHR}``                     ✓                    ✓                    ✓
 ``ReportHit``                   ``OpReportIntersection{NV/KHR}``                  ✓
 ``IgnoreHit``                   ``OpIgnoreIntersection{NV/KHR}``                                      ✓
 ``AcceptHitAndEndSearch``       ``OpTerminateRay{NV/KHR}``                                            ✓
 ``CallShader``                  ``OpExecuteCallable{NV/KHR}``           ✓                    ✓                  ✓     ✓
 ===========================     =================================     ====== ============ =========== ======= ===== ========


 Resource Types
 ~~~~~~~~~~~~~~

 | The following table provides the mapping for new resource types supported in all raytracing shaders.


 ===================================     =======================================
         HLSL Type                                   SPIR-V Opcode
 -----------------------------------     ---------------------------------------
 ``RaytracingAccelerationStructure``     ``OpTypeAccelerationStructure{NV/KHR}``
 ===================================     =======================================

 Interface Variables
 ~~~~~~~~~~~~~~~~~~~

 | Interface variables are created for various ray tracing storage classes based on the intrinsic/shader stage.
 | The following table gives a high-level overview of the mapping.


 =================================       ===========================================================
    SPIR-V Storage Class                        Created For
 ---------------------------------       -----------------------------------------------------------
 ``RayPayload{NV/KHR}``                  Last argument to TraceRay
 ``IncomingRayPayload{NV/KHR}``          First argument of entry for AnyHit/ClosestHit & Miss stage
 ``HitAttribute{NV/KHR}``                Last argument to ReportHit
 ``CallableData{NV/KHR}``                Last argument to CallShader
 ``IncomingCallableData{NV/KHR}``        First argument of entry for Callable stage
 =================================       ===========================================================
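
 As a small sketch of the ``CallShader`` row, the ray generation shader below
 passes a user-defined struct as the last argument. The ``CallData`` layout and
 the shader index ``0`` are assumptions made for illustration:

 .. code:: hlsl

   struct CallData {
     float4 data;
   };

   [shader("raygeneration")]
   void main() {
     CallData myCallData = { float4(0.0f, 0.0f, 0.0f, 0.0f) };
     // Invoke the callable shader at (assumed) shader table index 0.
     // Per the table above, myCallData is the argument for which a
     // CallableData{NV/KHR} interface variable is created.
     CallShader(0, myCallData);
   }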

 RayQuery
 --------

 Ray Query is a subfeature of DirectX ray tracing introduced in version 1.1 of the DirectX ray tracing spec (DXR 1.1).
 DirectX adds the RayQuery object type and its member TraceRayInline(), which performs ray tracing without
 using any separate ray tracing shader stages.
 Shaders can instantiate RayQuery objects as local variables; a RayQuery object acts as a state
 machine for a ray query. The shader calls the RayQuery object's methods to advance the
 query through an acceleration structure and to query traversal information.

 Refer to the following page for details:
 https://microsoft.github.io/DirectX-Specs/d3d/Raytracing.html

 A flow chart for a simple ray query process:

 ::

           +------------------------------+
           |   RayQuery<RAY_FLAG_NONE> q  |
           +------------------------------+
                          |
                          V
           +------------------------------+
           |      q.TraceRayInline()      |
           +------------------------------+
                   |               — — — — — — — — — — — — —
                   |              |                         |
                   |              |              +------------------------+
                   |              |              | Your intersection code |
                   |              |              +------------------------+
                   |              |                         ^
                   V              V                         |
           +------------------------------+      +---------------------+
           |  q.Proceed() // AS traversal |      |  q.CandidateType()  |
           +------------------------------+      +---------------------+
                |                   |                       ^
            No  |                   | Yes                   |
                |                   |_ _ _ _ _ _ _ _ _ _ _ _|
                V
          +------------------------------+
          |     q.CommittedStatus()      |
          +------------------------------+
                        |
                        V
         +----------------------------------+
         | Your Intersection/shader code    |
         +----------------------------------+


 Example:

 .. code:: hlsl

   void main() {
     RayQuery<RAY_FLAG_CULL_NON_OPAQUE | RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCH> q;
     q.TraceRayInline(myAccelerationStructure, 0 , 0xff, myRay);

     // Proceed() drives the acceleration structure traversal loop
     while(q.Proceed()) {
       switch(q.CandidateType()) {
         // retrieve candidate intersection information / do the shading
       }
     }

     // Acceleration structure traversal has ended
     // Get the committed status
     switch(q.CommittedStatus()) {
       // retrieve committed intersection information / do the shading
     }
     }
   }
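
 A more self-contained sketch of the same pattern is shown below as a compute
 shader that tests whether a single ray hits any opaque triangle. The resource
 names ``scene`` and ``result`` and the ray values are assumptions made for
 illustration:

 .. code:: hlsl

   RaytracingAccelerationStructure scene;
   RWStructuredBuffer<uint> result;

   [numthreads(1, 1, 1)]
   void main() {
     RayDesc ray;
     ray.Origin = float3(0.0f, 0.0f, 0.0f);
     ray.Direction = float3(0.0f, 0.0f, -1.0f);
     ray.TMin = 0.0f;
     ray.TMax = 1000.0f;

     // Only opaque triangles are considered, so no candidate needs to be
     // handled in shader code and a single Proceed() call is enough.
     RayQuery<RAY_FLAG_CULL_NON_OPAQUE |
              RAY_FLAG_SKIP_PROCEDURAL_PRIMITIVES |
              RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCH> q;
     q.TraceRayInline(scene, RAY_FLAG_NONE, 0xff, ray);
     q.Proceed();

     result[0] = (q.CommittedStatus() == COMMITTED_TRIANGLE_HIT) ? 1 : 0;
   }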

 Ray Query in SPIRV
 ~~~~~~~~~~~~~~~~~~
 RayQuery SPIR-V codegen is currently supported via the SPV_KHR_ray_query extension.
 SPIR-V specification for reference:
 https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/KHR/SPV_KHR_ray_query.asciidoc

 Object Type
 ~~~~~~~~~~~
 ``RayQuery<RAY_FLAGS>``

 RayQuery represents the state of an inline ray tracing call into an acceleration structure.


 ============ ================================
  HLSL Type            SPIR-V Opcode
 ------------ --------------------------------
 ``RayQuery`` ``OpTypeRayQueryKHR``
 ============ ================================

 RayQuery Mapping to SPIR-V
 ~~~~~~~~~~~~~~~~~~~~~~~~~~

 +---------------------------------------------------+-------------------------------------------------------------------------+
 |      HLSL  RayQuery member Intrinsic              |             SPIR-V Opcode                                               |
 +===================================================+=========================================================================+
 |``.Abort``                                         | ``OpRayQueryTerminateKHR``                                              |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CandidateType``                                 | ``OpRayQueryGetIntersectionTypeKHR``                                    |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CandidateProceduralPrimitiveNonOpaque``         | ``OpRayQueryGetIntersectionCandidateAABBOpaqueKHR``                     |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CandidateInstanceIndex``                        | ``OpRayQueryGetIntersectionInstanceIdKHR``                              |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CandidateInstanceID``                           | ``OpRayQueryGetIntersectionInstanceCustomIndexKHR``                     |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 | ``.CandidateInstanceContributionToHitGroupIndex`` | ``OpRayQueryGetIntersectionInstanceShaderBindingTableRecordOffsetKHR``  |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CandidateGeometryIndex``                        | ``OpRayQueryGetIntersectionGeometryIndexKHR``                           |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CandidatePrimitiveIndex``                       | ``OpRayQueryGetIntersectionPrimitiveIndexKHR``                          |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CandidateObjectRayOrigin``                      | ``OpRayQueryGetIntersectionObjectRayOriginKHR``                         |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CandidateObjectRayDirection``                   | ``OpRayQueryGetIntersectionObjectRayDirectionKHR``                      |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CandidateObjectToWorld3x4``                     | ``OpRayQueryGetIntersectionObjectToWorldKHR``                           |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CandidateObjectToWorld4x3``                     | ``OpRayQueryGetIntersectionObjectToWorldKHR``                           |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CandidateWorldToObject3x4``                     | ``OpRayQueryGetIntersectionWorldToObjectKHR``                           |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CandidateWorldToObject4x3``                     | ``OpRayQueryGetIntersectionWorldToObjectKHR``                           |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CandidateTriangleBarycentrics``                 | ``OpRayQueryGetIntersectionBarycentricsKHR``                            |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CandidateTriangleFrontFace``                    | ``OpRayQueryGetIntersectionFrontFaceKHR``                               |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CommittedStatus``                               | ``OpRayQueryGetIntersectionTypeKHR``                                    |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CommittedInstanceIndex``                        | ``OpRayQueryGetIntersectionInstanceIdKHR``                              |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CommittedInstanceID``                           | ``OpRayQueryGetIntersectionInstanceCustomIndexKHR``                     |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 | ``.CommittedInstanceContributionToHitGroupIndex`` |  ``OpRayQueryGetIntersectionInstanceShaderBindingTableRecordOffsetKHR`` |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CommittedGeometryIndex``                        | ``OpRayQueryGetIntersectionGeometryIndexKHR``                           |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CommittedPrimitiveIndex``                       | ``OpRayQueryGetIntersectionPrimitiveIndexKHR``                          |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CommittedRayT``                                 | ``OpRayQueryGetIntersectionTKHR``                                       |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CommittedObjectRayOrigin``                      | ``OpRayQueryGetIntersectionObjectRayOriginKHR``                         |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CommittedObjectRayDirection``                   | ``OpRayQueryGetIntersectionObjectRayDirectionKHR``                      |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CommittedObjectToWorld3x4``                     | ``OpRayQueryGetIntersectionObjectToWorldKHR``                           |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CommittedObjectToWorld4x3``                     | ``OpRayQueryGetIntersectionObjectToWorldKHR``                           |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CommittedWorldToObject3x4``                     | ``OpRayQueryGetIntersectionWorldToObjectKHR``                           |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CommittedWorldToObject4x3``                     | ``OpRayQueryGetIntersectionWorldToObjectKHR``                           |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CommittedTriangleBarycentrics``                 | ``OpRayQueryGetIntersectionBarycentricsKHR``                            |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CommittedTriangleFrontFace``                    | ``OpRayQueryGetIntersectionFrontFaceKHR``                               |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CommitNonOpaqueTriangleHit``                    | ``OpRayQueryConfirmIntersectionKHR``                                    |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.CommitProceduralPrimitiveHit``                  | ``OpRayQueryGenerateIntersectionKHR``                                   |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.Proceed``                                       | ``OpRayQueryProceedKHR``                                                |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.RayFlags``                                      | ``OpRayQueryGetRayFlagsKHR``                                            |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.RayTMin``                                       | ``OpRayQueryGetRayTMinKHR``                                             |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.TraceRayInline``                                | ``OpRayQueryInitializeKHR``                                             |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.WorldRayDirection``                             | ``OpRayQueryGetWorldRayDirectionKHR``                                   |
 +---------------------------------------------------+-------------------------------------------------------------------------+
 |``.WorldRayOrigin``                                | ``OpRayQueryGetWorldRayOriginKHR``                                      |
 +---------------------------------------------------+-------------------------------------------------------------------------+
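
 As a rough illustration of the mapping above, the following minimal sketch
 traces one inline ray and reads back the committed hit distance. The resource
 names (``scene``, ``result``), the register assignments, and the ray parameters
 are assumptions made for this example only. The comments name the SPIR-V
 instruction each call is expected to translate to according to the table.

 .. code:: hlsl

   // Hypothetical resources; names and registers are for illustration only.
   RaytracingAccelerationStructure scene  : register(t0);
   RWStructuredBuffer<float>       result : register(u1);

   [numthreads(1, 1, 1)]
   void main() {
     RayDesc ray;
     ray.Origin    = float3(0.0, 0.0, 0.0);
     ray.Direction = float3(0.0, 0.0, 1.0);
     ray.TMin      = 0.0;
     ray.TMax      = 1000.0;

     // Treat all geometry as opaque triangles so that no candidate needs to be
     // committed manually inside the loop.
     RayQuery<RAY_FLAG_FORCE_OPAQUE | RAY_FLAG_SKIP_PROCEDURAL_PRIMITIVES> q;

     q.TraceRayInline(scene, RAY_FLAG_NONE, 0xFF, ray); // OpRayQueryInitializeKHR
     while (q.Proceed()) {                              // OpRayQueryProceedKHR
       // Nothing to do: opaque triangle hits are committed automatically.
     }

     // .CommittedStatus -> OpRayQueryGetIntersectionTypeKHR
     if (q.CommittedStatus() == COMMITTED_TRIANGLE_HIT) {
       result[0] = q.CommittedRayT();                   // OpRayQueryGetIntersectionTKHR
     } else {
       result[0] = -1.0;                                // no hit
     }
   }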

 Shader Model 6.0 Wave Intrinsics
 ================================


 Note that wave intrinsics require SPIR-V 1.3, which is supported by Vulkan 1.1.
 If you use wave intrinsics in your source code, you will need to specify
 ``-fspv-target-env=vulkan1.1`` on the command line to target Vulkan 1.1.

 Shader model 6.0 introduces a set of wave operations. ``WaveGetLaneCount()``
 and ``WaveGetLaneIndex()`` are translated into loads from the SPIR-V builtin
 variables ``SubgroupSize`` and ``SubgroupLocalInvocationId``, respectively.
 The rest are translated into SPIR-V group operations with ``Subgroup`` scope
 according to the following chart:

 ============= ============================ =================================== ======================
 Wave Category       Wave Intrinsics               SPIR-V Opcode                SPIR-V Group Operation
 ============= ============================ =================================== ======================
 Query         ``WaveIsFirstLane()``        ``OpGroupNonUniformElect``
 Vote          ``WaveActiveAnyTrue()``      ``OpGroupNonUniformAny``
 Vote          ``WaveActiveAllTrue()``      ``OpGroupNonUniformAll``
 Vote          ``WaveActiveBallot()``       ``OpGroupNonUniformBallot``
 Reduction     ``WaveActiveAllEqual()``     ``OpGroupNonUniformAllEqual``       ``Reduction``
 Reduction     ``WaveActiveCountBits()``    ``OpGroupNonUniformBallotBitCount`` ``Reduction``
 Reduction     ``WaveActiveSum()``          ``OpGroupNonUniform*Add``           ``Reduction``
 Reduction     ``WaveActiveProduct()``      ``OpGroupNonUniform*Mul``           ``Reduction``
 Reduction     ``WaveActiveBitAnd()``       ``OpGroupNonUniformBitwiseAnd``     ``Reduction``
 Reduction     ``WaveActiveBitOr()``        ``OpGroupNonUniformBitwiseOr``      ``Reduction``
 Reduction     ``WaveActiveBitXor()``       ``OpGroupNonUniformBitwiseXor``     ``Reduction``
 Reduction     ``WaveActiveMin()``          ``OpGroupNonUniform*Min``           ``Reduction``
 Reduction     ``WaveActiveMax()``          ``OpGroupNonUniform*Max``           ``Reduction``
 Scan/Prefix   ``WavePrefixSum()``          ``OpGroupNonUniform*Add``           ``ExclusiveScan``
 Scan/Prefix   ``WavePrefixProduct()``      ``OpGroupNonUniform*Mul``           ``ExclusiveScan``
 Scan/Prefix   ``WavePrefixCountBits()``    ``OpGroupNonUniformBallotBitCount`` ``ExclusiveScan``
 Broadcast     ``WaveReadLaneAt()``         ``OpGroupNonUniformBroadcast``
 Broadcast     ``WaveReadLaneFirst()``      ``OpGroupNonUniformBroadcastFirst``
 Quad          ``QuadReadAcrossX()``        ``OpGroupNonUniformQuadSwap``
 Quad          ``QuadReadAcrossY()``        ``OpGroupNonUniformQuadSwap``
 Quad          ``QuadReadAcrossDiagonal()`` ``OpGroupNonUniformQuadSwap``
 Quad          ``QuadReadLaneAt()``         ``OpGroupNonUniformQuadBroadcast``
 ============= ============================ =================================== ======================
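
 The following minimal sketch shows a few of these intrinsics in a compute
 shader. The buffer name ``output``, its register, the file name in the
 comment, and the thread-group size are assumptions made for this example. For
 a ``uint`` operand, the ``*`` in the opcode names above is expected to resolve
 to ``I`` (for example, ``OpGroupNonUniformIAdd``).

 .. code:: hlsl

   // Assumed invocation (wave.hlsl is a hypothetical file name):
   //   dxc -spirv -T cs_6_0 -E main -fspv-target-env=vulkan1.1 wave.hlsl

   // Hypothetical output buffer; the name and register are for illustration only.
   RWStructuredBuffer<uint> output : register(u0);

   [numthreads(64, 1, 1)]
   void main(uint3 tid : SV_DispatchThreadID) {
     uint laneCount = WaveGetLaneCount();   // load from SubgroupSize
     uint total     = WaveActiveSum(tid.x); // OpGroupNonUniformIAdd, Reduction
     uint prefix    = WavePrefixSum(tid.x); // OpGroupNonUniformIAdd, ExclusiveScan

     if (WaveIsFirstLane()) {               // OpGroupNonUniformElect
       output[0] = total + laneCount;
     }
     output[tid.x + 1] = prefix;
   }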

 The Implicit ``vk`` Namespace
 =============================

 Overview
 --------
 We have introduced an implicit namespace (called ``vk``) that will be home to all
 Vulkan-specific functions, enums, etc. Given the similarity between HLSL and
 C++, developers are likely familiar with namespaces -- and implicit namespaces
 (e.g. ``std::`` in C++). The ``vk`` namespace provides an interface for expressing
 Vulkan-specific features (core spec and KHR extensions).

 **The compiler will generate the proper error message (** ``unknown 'vk' identifier`` **)
 if** ``vk::`` **is used for compiling to DXIL.**

 Any intrinsic function or enum in the ``vk`` namespace will be deprecated if an
 equivalent one is added to the default namespace.

 Current Features
 ----------------
 The following intrinsic functions and constants are currently defined in the
 implicit ``vk`` namespace.

 .. code:: hlsl

   // Implicitly defined when compiling to SPIR-V.
   namespace vk {

     const uint CrossDeviceScope = 0;
     const uint DeviceScope      = 1;
     const uint WorkgroupScope   = 2;
     const uint SubgroupScope    = 3;
     const uint InvocationScope  = 4;
     const uint QueueFamilyScope = 5;

     uint64_t ReadClock(in uint scope);
     T        RawBufferLoad<T = uint>(in uint64_t deviceAddress,
                                      in uint alignment = 4);
     void     RawBufferStore<T = uint>(in uint64_t deviceAddress, in T value,
                                       in uint alignment = 4);
   } // end namespace


 Intrinsic Constants
 -------------------
 The following constants are currently defined:

 ========================  ============================================
   Constant                value   (SPIR-V constant equivalent, if any)
 ========================  ============================================
 ``vk::CrossDeviceScope``    ``0`` (``CrossDevice``)
 ``vk::DeviceScope``         ``1`` (``Device``)
 ``vk::WorkgroupScope``      ``2`` (``Workgroup``)
 ``vk::SubgroupScope``       ``3`` (``Subgroup``)
 ``vk::InvocationScope``     ``4`` (``Invocation``)
 ``vk::QueueFamilyScope``    ``5`` (``QueueFamily``)
 ========================  ============================================

 Intrinsic Functions
 -------------------

 ReadClock
 ~~~~~~~~~
 This intrinsic function has the following signature:

 .. code:: hlsl

   uint64_t ReadClock(in uint scope);

 It translates to ``OpReadClockKHR``, defined in `VK_KHR_shader_clock <https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VK_KHR_shader_clock.html>`_.
 One can use the predefined scopes in the ``vk`` namespace to specify the scope argument.
 For example:

 .. code:: hlsl

   uint64_t clock = vk::ReadClock(vk::SubgroupScope);

 RawBufferLoad and RawBufferStore
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The Vulkan extension `VK_KHR_buffer_device_address <https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VK_KHR_buffer_device_address.html>`_
 supports getting the 64-bit address of a buffer and passing it to SPIR-V as a
 Uniform buffer. SPIR-V can use the address to load and store data without a descriptor.
 We add the following intrinsic functions to expose a subset of the
 `VK_KHR_buffer_device_address <https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VK_KHR_buffer_device_address.html>`_
 and `SPV_KHR_physical_storage_buffer <https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/KHR/SPV_KHR_physical_storage_buffer.asciidoc>`_
 functionality to HLSL:

 .. code:: hlsl

   // RawBufferLoad and RawBufferStore use 'uint' for the default template argument.
   // The default alignment is 4. Note that 'alignment' must be a constant integer.
   T RawBufferLoad<T = uint>(in uint64_t deviceAddress, in uint alignment = 4);
   void RawBufferStore<T = uint>(in uint64_t deviceAddress, in T value, in uint alignment = 4);


 These intrinsics allow the shader program to load or store a single value of type T (int, float2, struct, etc.)
 from or to GPU-accessible memory at a given address, similar to ``ByteAddressBuffer.Load()``.
 Additionally, these intrinsics allow users to set the memory alignment for the underlying data.
 We assume a ``uint`` type when the template argument is missing, and we use a value of ``4`` for the default alignment.
 Note that the alignment argument must be a constant integer if it is given.

 Although we support setting the ``alignment`` of the data load and store, we do not currently
 support setting the memory layout for the data. Since these intrinsics are meant to load or store
 "arbitrary" data at an arbitrary device address, we assume that the program loads/stores some "bytes of data"
 whose format or layout is unknown. Therefore, keep in mind that these intrinsics
 load or store ``sizeof(T)`` bytes of data, and that loading/storing a struct
 with a custom memory alignment may yield undefined behavior due to the missing custom memory layout support.
 Loading data with customized memory layouts is future work.

 Using either of these intrinsics adds the ``PhysicalStorageBufferAddresses`` capability and
 the ``SPV_KHR_physical_storage_buffer`` extension requirement, and changes
 the addressing model to ``PhysicalStorageBuffer64``.

 Example:

 .. code:: hlsl

   uint64_t address;
   [numthreads(32, 1, 1)]
   void main(uint3 tid : SV_DispatchThreadID) {
     double foo = vk::RawBufferLoad<double>(address, 8);
     uint bar = vk::RawBufferLoad(address + 8);
     ...
     // Scale the offset by sizeof(uint) so each store honors the default alignment of 4.
     vk::RawBufferStore<uint>(address + 4 * tid.x, bar + tid.x);
   }

 Inline SPIR-V (HLSL version of GL_EXT_spirv_intrinsics)
 =======================================================

 ``GL_EXT_spirv_intrinsics`` is a GLSL extension that allows users to embed
 arbitrary SPIR-V instructions in GLSL code, similar to inline assembly in C
 code. We support the HLSL version of ``GL_EXT_spirv_intrinsics``. See the
 `wiki <https://github.com/microsoft/DirectXShaderCompiler/wiki/GL_EXT_spirv_intrinsics-for-SPIR-V-code-gen>`_
 for the details.

 Supported Command-line Options
 ==============================

 Command-line options supported by SPIR-V CodeGen are listed below. They are
 also recognized by the library API calls.

 General options
 ---------------

 - ``-T``: Specifies shader profile
 - ``-E``: Specifies entry point
 - ``-D``: Defines macro
 - ``-I``: Adds directory to include search path
 - ``-O{|0|1|2|3}``: Specifies optimization level
 - ``-enable-16bit-types``: Enables 16-bit types and disables min precision types
 - ``-Zpc``: Packs matrices in column-major order by default
 - ``-Zpr``: Packs matrices in row-major order by default
 - ``-Fc``: Outputs SPIR-V disassembly to the given file
 - ``-Fe``: Outputs warnings and errors to the given file
 - ``-Fo``: Outputs SPIR-V code to the given file
 - ``-Fh``: Outputs SPIR-V code as a header file
 - ``-Vn``: Specifies the variable name for SPIR-V code in generated header file
 - ``-Zi``: Emits more debug information (see `Debugging`_)
 - ``-Cc``: Colorizes SPIR-V disassembly
 - ``-No``: Adds instruction byte offsets to SPIR-V disassembly
 - ``-H``: Shows header includes and nesting depth
 - ``-Vi``: Shows details about the include process
 - ``-Vd``: Disables SPIR-V verification
 - ``-WX``: Treats warnings as errors
 - ``-no-warnings``: Suppresses all warnings
 - ``-flegacy-macro-expansion``: Expands the operands before performing
   token-pasting operation (fxc behavior)
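
 As a minimal sketch of how a few of these options combine, consider the
 shader below. The file name ``shader.hlsl``, the output file names, the entry
 point name ``VSMain``, and the ``SCALE`` macro are assumptions made for this
 example. The invocations in the comments also use ``-spirv``, which is listed
 under the Vulkan-specific options.

 .. code:: hlsl

   // shader.hlsl (hypothetical file name). Assumed invocations:
   //
   //   SPIR-V binary, with SCALE overridden from the command line:
   //     dxc -spirv -T vs_6_0 -E VSMain -D SCALE=2.0 -Fo shader.spv shader.hlsl
   //
   //   SPIR-V disassembly, treating warnings as errors:
   //     dxc -spirv -T vs_6_0 -E VSMain -WX -Fc shader.spvasm shader.hlsl

   #ifndef SCALE
   #define SCALE 1.0   // default used when -D SCALE=... is not given
   #endif

   float4 VSMain(float4 pos : POSITION) : SV_Position {
     return float4(pos.xyz * SCALE, pos.w);
   }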

 Vulkan-specific options
 -----------------------

 The following command line options are added into ``dxc`` to support SPIR-V
 codegen for Vulkan:

 - ``-spirv``: Generates SPIR-V code.
 - ``-fvk-b-shift N M``: Shifts by ``N`` the inferred binding numbers for all
   resources in b-type registers of space ``M``. Specifically, for a resource
   attached with ``:register(bX, spaceM)`` but not ``[vk::binding(...)]``,
   sets its Vulkan descriptor set to ``M`` and binding number to ``X + N``. If
   you need to shift the inferred binding numbers for more than one space,
   provide more than one such option. If more than one such option is provided
   for the same space, the last one takes effect. If you need to shift the
   inferred binding numbers for all sets, use ``all`` as ``M``.
   See `HLSL register and Vulkan binding`_ for explanation and examples.
 - ``-fvk-t-shift N M``: Similar to ``-fvk-b-shift``, but for t-type registers.
 - ``-fvk-s-shift N M``: Similar to ``-fvk-b-shift``, but for s-type registers.
 - ``-fvk-u-shift N M``: Similar to ``-fvk-b-shift``, but for u-type registers.
 - ``-fvk-auto-shift-bindings``: Automatically detects the register type for
   resources that are missing the ``:register`` assignment, so the above shifts
   can be applied to them if needed.
 - ``-fvk-bind-register xX Y N M`` (short alias: ``-vkbr``): Binds the resource
   at ``register(xX, spaceY)`` to descriptor set ``M`` and binding ``N``. This
   option cannot be used together with other binding assignment options.
   It requires that all resources in the source code have a ``:register()``
   attribute and that all registers have corresponding Vulkan descriptors
   specified using this option. If the ``$Globals`` cbuffer resource is used, it
   must also be bound with ``-fvk-bind-globals``.
 - ``-fvk-bind-globals N M``: Places the ``$Globals`` cbuffer at
   descriptor set #M and binding #N. See `HLSL global variables and Vulkan binding`_
   for explanation and examples.
 - ``-fvk-use-gl-layout``: Uses strict OpenGL ``std140``/``std430``
   layout rules for resources.
 - ``-fvk-use-dx-layout``: Uses DirectX layout rules for resources.
 - ``-fvk-invert-y``: Negates (additively inverts) SV_Position.y before writing
   to stage output. Used to accommodate the difference between Vulkan's
   coordinate system and DirectX's. Only allowed in VS/DS/GS.
 - ``-fvk-use-dx-position-w``: Reciprocates (multiplicatively inverts)
   SV_Position.w after reading from stage input. Used to accommodate the
   difference between Vulkan and DirectX: the w component of SV_Position in PS
   is stored as 1/w in Vulkan. Only recognized in PS; applying it to other
   stages is a no-op.
 - ``-fvk-stage-io-order={alpha|decl}``: Assigns the stage input/output variable
   location number according to alphabetical order or declaration order. See
   `HLSL semantic and Vulkan Location`_ for more details.
 - ``-fspv-reflect``: Emits additional SPIR-V instructions to aid reflection.
 - ``-fspv-debug=<category>``: Controls what category of debug information
   should be emitted. Accepted values are ``file``, ``source``, ``line``, and
   ``tool``. See `Debugging`_ for more details.
 - ``-fspv-extension=<extension>``: Only allows using ``<extension>`` in CodeGen.
   If you want to allow multiple extensions, provide more than one such option. If you
   want to allow *all* KHR extensions, use ``-fspv-extension=KHR``.
 - ``-fspv-target-env=<env>``: Specifies the target environment for this compilation.
   The current valid options are ``vulkan1.0`` and ``vulkan1.1``. If no target
   environment is provided, ``vulkan1.0`` is used as default.
 - ``-fspv-flatten-resource-arrays``: Flattens arrays of textures and samplers
   into individual resources, each taking one binding number. For example, an
   array of 3 textures will become 3 texture resources taking 3 binding numbers.
   This makes the behavior similar to DX. Without this option, you would get 1
   array object taking 1 binding number. Note that arrays of
   {RW|Append|Consume}StructuredBuffers are currently not supported in the
   SPIR-V backend. Also note that this requires the optimizer to be able to
   resolve all array accesses with constant indices. Therefore, all loops using
   the resource arrays must be marked with ``[unroll]``.
 - ``-fspv-entrypoint-name=<name>``: Specifies the SPIR-V entry point name. Defaults
   to the HLSL entry point name.
 - ``-fspv-use-legacy-buffer-matrix-order``: Assumes the legacy matrix order (row
   major) when accessing raw buffers (e.g., ByteAddressBuffer).
 - ``-fspv-preserve-interface``: Preserves all interface variables in the entry
   point, even when those variables are unused.
 - ``-Wno-vk-ignored-features``: Does not emit warnings on ignored features
   resulting from no Vulkan support, e.g., cbuffer member initializer.
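
 To make the register shift options more concrete, here is a minimal sketch.
 The resource names, the shift values, and the file name in the comment are
 assumptions made for this example. The set and binding numbers in the
 comments are what the ``X + N`` rule described for ``-fvk-b-shift`` is
 expected to produce, not verified compiler output.

 .. code:: hlsl

   // Assumed invocation (shader.hlsl is a hypothetical file name):
   //   dxc -spirv -T ps_6_0 -E main -fvk-t-shift 10 0 -fvk-s-shift 20 0 shader.hlsl

   // No -fvk-b-shift is given: expected set 0, binding 0.
   cbuffer Params : register(b0) {
     float4 tint;
   };

   // t1 in space 0 with a t-shift of 10: expected set 0, binding 1 + 10 = 11.
   Texture2D tex : register(t1);

   // s0 in space 0 with an s-shift of 20: expected set 0, binding 0 + 20 = 20.
   SamplerState samp : register(s0);

   // An explicit vk::binding takes precedence, so the shift is not applied:
   // binding 3 in descriptor set 1.
   [[vk::binding(3, 1)]]
   Texture2D other : register(t2);

   float4 main(float2 uv : TEXCOORD0) : SV_Target {
     return tint * tex.Sample(samp, uv) + other.Sample(samp, uv);
   }

 Without the shifts, ``tex`` and ``samp`` would land on bindings 1 and 0 of
 set 0, and ``samp`` would then share binding 0 with ``Params``.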

 Unsupported HLSL Features
 =========================

 The following HLSL language features are not supported in SPIR-V codegen,
 either because of no Vulkan equivalents at the moment, or because of deprecation.

 * Literal/immediate sampler state: deprecated feature. The compiler will
   emit a warning and ignore it.
 * ``abort()`` intrinsic function: no Vulkan equivalent. The compiler will emit
   an error.
 * ``GetRenderTargetSampleCount()`` intrinsic function: no Vulkan equivalent.
   (Its GLSL counterpart is ``gl_NumSamples``, which is not available in GLSL for
   Vulkan.) The compiler will emit an error.
 * ``GetRenderTargetSamplePosition()`` intrinsic function: no Vulkan equivalent.
   (``gl_SamplePosition`` provides similar functionality but it's only for the
   sample currently being processed.) The compiler will emit an error.
 * ``tex*()`` intrinsic functions: deprecated features. The compiler will
   emit errors.
 * ``.GatherCmpGreen()``, ``.GatherCmpBlue()``, ``.GatherCmpAlpha()`` intrinsic
   methods: no Vulkan equivalent. (The SPIR-V ``OpImageDrefGather`` instruction
   does not take a component as input.) The compiler will emit an error.
 * Since ``StructuredBuffer``, ``RWStructuredBuffer``, ``ByteAddressBuffer``, and
   ``RWByteAddressBuffer`` are not represented as image types in SPIR-V, using the
   output unsigned integer ``status`` argument in their ``Load*`` methods is not
   supported. Using these methods with the ``status`` argument will cause a compiler error.
 * Applying ``row_major`` or ``column_major`` attributes to a stand-alone matrix will be
   ignored by the compiler because ``RowMajor`` and ``ColMajor`` decorations in SPIR-V are
   only allowed to be applied to members of structures. A warning will be issued by the compiler.
 * The Hull shader ``partitioning`` attribute may not have the ``pow2`` value. The compiler
   will emit an error. Other attribute values are supported and described in the
   `Hull Entry Point Attributes`_ section.
 * ``cbuffer``/``tbuffer`` member initializer: no Vulkan equivalent. The compiler
   will emit a warning and ignore it.

 Appendix
 ==========

 Appendix A. Matrix Representation
 ---------------------------------
 Consider a matrix in HLSL defined as ``float2x3 m;``. Conceptually, this is a matrix with 2 rows and 3 columns.
 This means that you can access its elements via expressions such as ``m[i][j]``, where ``i`` can be ``{0, 1}`` and ``j`` can be ``{0, 1, 2}``.

 Now let's look at how matrices are defined in SPIR-V:

 .. code:: spirv

   %columnType = OpTypeVector %float      <number of rows>
      %matType = OpTypeMatrix %columnType <number of columns>

 As you can see, SPIR-V conceptually represents matrices as a collection of vectors where each vector is a *column*.

 Now, let's represent our float2x3 matrix in SPIR-V. If we choose a naive translation (3 columns, each of which is a vector of size 2), we get:

 .. code:: spirv

       %v2float = OpTypeVector %float 2
   %mat3v2float = OpTypeMatrix %v2float 3

 Now, let's use this naive translation to access an element of the matrix (e.g. ``m[0][2]``). This is evaluated by first finding ``n = m[0]``, and then finding ``n[2]``.
 Notice that in HLSL, ``m[0]`` represents a row, which is a vector of size 3. But accessing the first dimension of the SPIR-V matrix gives us
 the first column, which is a vector of size 2.

 .. code:: spirv

   ; n is a vector of size 2
   %n = OpAccessChain %v2float %m %int_0

 Notice that in the HLSL access ``m[i][j]``, ``i`` can be ``{0, 1}`` and ``j`` can be ``{0, 1, 2}``.
 But in the SPIR-V ``OpAccessChain`` access, the first index (``i``) can be ``{0, 1, 2}`` and the second index (``j``) can be ``{0, 1}``.
 Therefore, the naive translation does not work well with indexing.

 As a result, we must translate a given HLSL float2x3 matrix (with 2 rows and 3 columns) as a SPIR-V matrix with 3 rows and 2 columns:

 .. code:: spirv

       %v3float = OpTypeVector %float 3
   %mat2v3float = OpTypeMatrix %v3float 2

 This way, all accesses into the matrix can be naturally handled correctly.
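
 The HLSL side of this indexing can be sketched as follows. The buffer name
 ``output`` and its register are assumptions made for this example, and the
 comments describe the intended correspondence with ``%mat2v3float`` rather
 than verified compiler output.

 .. code:: hlsl

   // Hypothetical output buffer; the name and register are for illustration only.
   RWStructuredBuffer<float> output : register(u0);

   [numthreads(1, 1, 1)]
   void main() {
     // HLSL view: 2 rows, 3 columns.
     float2x3 m = { 1.0, 2.0, 3.0,
                    4.0, 5.0, 6.0 };

     // m[0] is an HLSL row with 3 components. With the %mat2v3float
     // representation, index 0 also selects a 3-component vector.
     float3 row0 = m[0];

     // m[0][2]: the first index is in {0, 1} and the second is in {0, 1, 2},
     // the same index ranges as an access chain into %mat2v3float.
     float element = m[0][2];

     output[0] = row0.x;
     output[1] = element;
   }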

 Packing
 ~~~~~~~
 The HLSL ``row_major`` and ``column_major`` type modifiers change the way packing is done.
 The following table provides an example that should make our translation clearer:

 +------------------+---------------------------+---------------------------+-----------------------------+-------------------+
 | Host CPU Data    | HLSL Variable             | GPU (HLSL Representation) | GPU (SPIR-V Representation) | SPIR-V Decoration |
 +==================+===========================+===========================+=============================+===================+
 |``{1,2,3,4,5,6}`` |          ``float2x3``     |  ``[1 3 5]``              |  ``[1 2]``                  |                   |
 |                  |                           |                           |                             |                   |
 |                  |                           |  ``[2 4 6]``              |  ``[3 4]``                  |  ``RowMajor``     |
 |                  |                           |                           |                             |                   |
 |                  |                           |                           |  ``[5 6]``                  |                   |
 +------------------+---------------------------+---------------------------+-----------------------------+-------------------+
 |``{1,2,3,4,5,6}`` | ``column_major float2x3`` |  ``[1 3 5]``              |  ``[1 2]``                  |                   |
 |                  |                           |                           |                             |                   |
 |                  |                           |  ``[2 4 6]``              |  ``[3 4]``                  | ``RowMajor``      |
 |                  |                           |                           |                             |                   |
 |                  |                           |                           |  ``[5 6]``                  |                   |
 +------------------+---------------------------+---------------------------+-----------------------------+-------------------+
 |``{1,2,3,4,5,6}`` |    ``row_major float2x3`` |  ``[1 2 3]``              |  ``[1 4]``                  |                   |
 |                  |                           |                           |                             |                   |
 |                  |                           |  ``[4 5 6]``              |  ``[2 5]``                  | ``ColMajor``      |
 |                  |                           |                           |                             |                   |
 |                  |                           |                           |  ``[3 6]``                  |                   |
 +------------------+---------------------------+---------------------------+-----------------------------+-------------------+
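
 The three rows of the table correspond to declarations such as the ones in
 this minimal sketch. The cbuffer name, member names, and register are
 assumptions made for this example. The decorations in the comments are the
 ones listed in the table, and the first member assumes the default packing
 (that is, ``-Zpr`` is not given).

 .. code:: hlsl

   // Hypothetical cbuffer; names and register are for illustration only.
   cbuffer Matrices : register(b0) {
     float2x3              a; // default packing: member decorated RowMajor
     column_major float2x3 b; // member decorated RowMajor
     row_major    float2x3 c; // member decorated ColMajor
   };

   float4 main() : SV_Target {
     // The m[i][j] indexing syntax is the same for all three members; only the
     // packing of the underlying data differs.
     return float4(a[0][2], b[0][2], c[0][2], 1.0);
   }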