docs/OpenCLCOnVulkan.md - external/github.com/google/clspv - Git at Google

 # OpenCL C 1.2 Language on Vulkan

 ## Overview

 The OpenCL C 1.2 language provides an expressive variant of the C language with
 which to program heterogeneous architectures.
 There is already a significant body of code in the wild written in the OpenCL C
 language - both in open source and proprietary software.
 This document explains how the OpenCL C language is mapped onto an
 implementation of the Vulkan standard for high-performance graphics and compute.

 The following subjects are covered:

 - Which SPIR-V features are used.
 - How the Vulkan API makes use of the Vulkan variant of SPIR-V produced.
 - How OpenCL C language constructs are mapped down onto Vulkan's variant of
   SPIR-V.
 - Restrictions on the OpenCL C language as is to be consumed by a Vulkan
   implementation.

 ## Deprecation Notice

 The sampler map is deprecated. It is no longer necessary to specify a sampler
 map for literal samplers. Refer to the bindings generated for the literal
 samplers in descriptor map. The -samplermap option is still accepted, but
 support will be removed at later date. A single sampler binding is generated
 for each unique literal sampler in the program.

 The descriptor map and it's C++ API are deprecated. Clspv is transitioning to
 embedding equivalent information directly into the SPIR-V binary via a
 [non-semantic instruction
 set](http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/KHR/SPV_KHR_non_semantic_info.html).
 The old '-descriptormap' option is no longer accepted by clspv. That file can
 still be generated by running the new 'clspv-reflection' executable. It will
 produce an equivalent descriptor map.

 ## SPIR-V Features

 The SPIR-V as produced from the OpenCL C language can make use of the following
 additional extensions:

 - _SPV\_KHR\_variable\_pointers_ - to enable the support of more expressive
   pointers that the OpenCL C language can make use of.
 - _SPV\_KHR\_storage\_buffer\_storage\_class_ - required by
   _SPV\_KHR\_variable\_pointers_, to enable use of the StorageBuffer storage
   class.
 - _SPV\_KHR\_non\_semantic\_info_ - to enable embedding reflection information
   directly into the binary.

 The SPIR-V as produced from the OpenCL C language can make use of the following
 capabilities:

 - `Shader` as we are targeting the OpenCL C language at a Vulkan implementation.
 - `VariablePointersStorageBuffer`, from the _SPV\_KHR\_variable\_pointers_ extension.
 - `VariablePointers`, from the _SPV\_KHR\_variable\_pointers_ extension.
   - *Note*: the compiler attempts to add the minimal variable pointers capability required.
 - `Int8` if char or uchar types (or composites of them) are used.
 - `Int16` if short or ushort types (or composites of them) are used.
 - `Int64` if long or ulong types (or composites of them) are used.
 - `Float16` if the half type (or composites of it) is used.
   - *Note*: this requires enabling the _cl\_khr\_fp16_ extension in the source.
 - `Float64` if the double type (or composites of it) is used.
 - `ImageStorageWriteWithoutFormat` if _write\_only_ images are used.
 - `ImageQuery` if any image query is used.
 - `Image1D` if a _write\_only_ image is used.
 - `Sampled1D` if a _read\_only_ image is used.
 - `GroupNonUniform` (see cl_khr_subgroups below)

 The command-line switch '-spv-version' can be used to specify the SPIR-V output version.
 Only '1.0' and '1.3' are currently supported, corresponding with vk versions '1.0' and '1.1'
 respectively.

 ## Vulkan Interaction

 A Vulkan implementation that is to consume the SPIR-V produced from the OpenCL C
 language must conform to the following the rules:

 - If the short/ushort types are used in the OpenCL C:
   - The `shaderInt16` field of `VkPhysicalDeviceFeatures` **must** be set to
     true.
 - If images are used in the OpenCL C:
   - The `shaderStorageImageReadWithoutFormat` field of
     `VkPhysicalDeviceFeatures` **must** be set to true.
   - The `shaderStorageImageWriteWithoutFormat` field of
     `VkPhysicalDeviceFeatures` **must** be set to true.
 - The implementation **must** support extensions _VK\_KHR\_storage\_buffer\_storage\_class_ and
   _VK\_KHR\_variable\_pointers_:
   - A call to `vkCreateDevice()` where the `ppEnabledExtensionNames` field of
     `VkDeviceCreateInfo` contains extension strings
     _"VK\_KHR\_storage\_buffer\_storage\_class"_ and
     _"VK\_KHR\_variable\_pointers"_ **must** succeed.
 - If the implementation does not support _VK\_KHR\_shader\_non\_semantic\_info_
   then shaders should be stripped of non-semantic instruction before loading
   the SPIR-V module.

 ### Storage Capabilities

 In order to pass 8- or 16-bit types in the shader interface, Vulkan requires
 the appropriate bits are set in VkPhysicalDevice8BitStorageFeaturesKHR for
 8-bit types or VkPhysicalDevice16BitStorageFeaturesKHR (or the appropriate
 Vulkan 1.1 or 1.2 feature structs). By default clspv assumes all feature bits
 are enabled, but provides options to disallow 8- or 16-bit interfaces.
 -no-8bit-storage reflects 8-bit storage features and -no-16bit-storage reflects
 16-bit storage features. Each option can take the following values (can be
 specified in a comma-separated list or multiple times):

 - `ssbo`: Represents the storage buffer feature bit.
 - `ubo`: Represents the uniform and storage buffer feature bit.
 - `pushconstant`: Represents the push constant feature bit.

 For example, if your device only supports storageBuffer16BitAccess (and no
 8-bit interfaces), pass the following on the command line:
 ```
 -no-16bit-storage=ubo,pushconstant -no-8bit-storage=ssbo,ubo,pushconstant
 ```

 ### Descriptor Type Mappings

 OpenCL C kernel argument types are mapped to Vulkan descriptor types in the
 following way:

 - If the argument to the kernel is a read only image, the matching Vulkan
   descriptor set type is `VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE`.
 - If the argument to the kernel is a write only image, the matching Vulkan
   descriptor set type is `VK_DESCRIPTOR_TYPE_STORAGE_IMAGE`.
 - If the argument to the kernel is a sampler, the matching Vulkan
   descriptor set type is `VK_DESCRIPTOR_TYPE_SAMPLER`.
 - If the argument to the kernel is a constant or global pointer type, the
   matching Vulkan descriptor set type is `VK_DESCRIPTOR_TYPE_STORAGE_BUFFER`.
   If option -constant-args-ubo' is used and the kernel has constant pointer
   types, set `VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER`.
 - If the argument to the kernel is a plain-old-data type, the matching Vulkan
   descriptor set type is `VK_DESCRIPTOR_TYPE_STORAGE_BUFFER` by default.
   If the option `-pod-ubo` is used the descriptor set type is`VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER`.
   If the option `-pod-pushconstant` is used the arg is instead passed via push
   constants.

 Note: `-pod-ubo` and `-pod-pushconstant` are exclusive options.

 Note: By default, all plain-old-data kernel arguments are collected into a
 single structure to be passed in to the compute shader as a single resource. If
 `-cluster-pod-kernel-args=0` is specified, each plain-old-data argument will be
 passed in a via a unique resource.

 Note: `-pod-pushconstant` cannot be specified with `-cluster-pod-kernel-args=0`.

 ## OpenCL C Modifications

 Some OpenCL C language features that are not natively expressible in Vulkan's
 variant of SPIR-V, require a subtle mapping to how Vulkan SPIR-V represents the
 corresponding functionality.

 ### Compilation

 An additional preprocessor macro `VULKAN` is set, to allow developers to guard
 OpenCL C functionality based on whether the Vulkan API is being targeted or not.
 This value is set to 100, to match Vulkan version 1.0.

 ### Kernels

 OpenCL C language kernels take the form:

     void kernel foo(global int* a, global float* b, uint c, float2 local* d);

 SPIR-V tracks OpenCL C language kernels using `OpEntryPoint` opcodes that denote
 the entry-points where an API interacts with a compute kernel.

 Vulkan's variant of SPIR-V requires that the entry-points be `void` return
 functions, and that they take no arguments.
 To pass data into Vulkan SPIR-V shaders, `OpVariable`s are declared outside of
 the functions, and decorated with `DescriptorSet` and `Binding` decorations, to
 denote that the shaders can interact with their data.

 The default way to map an OpenCL C language kernel to a Vulkan SPIR-V compute
 shader is as follows:

 - If a sampler map file is specified, all literal samplers use descriptor set _0_.
 - By default, all kernels in the translation unit use the same descriptor set
   number, either _0_, _1_, or _2_.  (The particular value depends on whether
   a sampler map is used, and how `__constant` variables are mapped.)
   **This is new default behaviour**.
   - Use option `-distinct-kernel-descriptor-sets` to get the old behaviour,
   where each kernel is assigned its own descriptor set number, such that the first
   kernel has descriptor set _0_, and each subsequent kernel is an increment of
   _1_ from the previous.
 - Except for pointer-to-local arguments, each kernel argument
   is assigned a descriptor binding in that kernel's
   corresponding `DescriptorSet`.
 - If the argument to the kernel is a `global` or `constant` pointer, it is
   placed into a SPIR-V `OpTypeStruct` that is decorated with `Block`, and
   an `OpVariable` of this structure type is created and decorated with the
   corresponding `DescriptorSet` and `Binding`, using the `StorageBuffer` storage
   class.
 - If the argument to the kernel is a plain-old-data type, it is placed into a
   SPIR-V `OpTypeStruct` that is decorated with `Block`, and an
   `OpVariable` of this structure type is created and decorated with the
   corresponding `DescriptorSet` and `Binding`, using the `StorageBuffer` storage
   class.
 - If the argument to the kernel is an image or sampler, an `OpVariable` of the
   `OpTypeImage` or `OpTypeSampler` type is created and decorated with the
   corresponding `DescriptorSet` and `Binding`, using the `UniformConstant`
   storage class.
 - If the argument to the kernel is a pointer to type _T_ in `__local` storage,
   then no descriptor is generated.  Instead, that argument is mapped to a
   variable in `Workgroup` storage class, of type array-of-_T_.  The array size
   is specified by an integer specialization constant. The specialization
   ID is reported in the descriptor map file, generated via the `-descriptormap`
   option.

 The shaders produced use the GLSL450 memory model. As such, there is an assumption of
 no aliasing by default. The compiler does not generate *Aliased* decorations
 currently. Users should be aware of this and ensure they are not relying on aliasing.

 #### Embedded Reflection Instructions

 Clspv embeds reflection information via use of the [NonSemantic.ClspvReflection](
 (http://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/nonsemantic/NonSemantic.ClspvReflection.html)
 non-semantic extended instruction set.  It requires
 SPV\_KHR\_non\_semantic\_info. If your Vulkan implementation does not support
 VK\_KHR\_shader\_non\_semantic\_info, the reflection instructions should be
 stripped before loading the module.
 [SPIRV-Tools](https://github.com/KhronosGroup/SPIRV-Tools)'s optimizer has a
 transformation to strip non-semantic instructions (use the `--strip-reflect`
 option).

 The reflection instructions replace the descriptor map.

 #### Descriptor map (DEPRECATED)

 The compiler can report the descriptor set and bindings used for samplers
 in the sampler map and for the kernel arguments, and also array sizing information for
 pointer-to-local arguments.
 Run `clspv-reflection <spirv> -o <descriptor map>` to produce the descriptor map.

 The descriptor map is a text file with comma-separated values.

 Consider this example:

     // First kernel in the translation unit, and no sampler map is used.
     kernel void foo(global int* a, float f, global float* b, uint c) {...}

 It generates the following descriptor map:

     kernel_decl,foo
     kernel,foo,arg,a,argOrdinal,0,descriptorSet,0,binding,0,offset,0,argKind,buffer
     kernel,foo,arg,f,argOrdinal,1,descriptorSet,0,binding,1,offset,0,argKind,pod,argSize,4
     kernel,foo,arg,b,argOrdinal,2,descriptorSet,0,binding,2,offset,0,argKind,buffer
     kernel,foo,arg,c,argOrdinal,3,descriptorSet,0,binding,3,offset,0,argKind,pod,argSize,4
     spec_constant,workgroup_size_x,spec_id,0
     spec_constant,workgroup_size_y,spec_id,1
     spec_constant,workgroup_size_z,spec_id,2

 The kernels in the module module are declared as follows:
 - `kernel_decl`
 - kernel name

 For kernel arguments of types pointer-to-global, pointer-to-constant, and
 plain-old-data types, the fields are:
 - `kernel` to indicate a kernel argument
 - kernel name
 - `arg` to indicate a kernel argument
 - argument name
 - `argOrdinal` to indicate a kernel argument ordinal position field
 - the argument's 0-based position in the kernel's parameter list
 - `descriptorSet`
 - the DescriptorSet value
 - `binding`
 - the Binding value
 - `offset`
 - The byte offset inside the storage buffer where you should write the argument value.
   This will always be zero, unless you cluster plain-old-data kernel arguments. (See below.)
 - `argKind`
 - a string describing the kind of argument, one of:
   - `buffer` - OpenCL buffer
   - `buffer_ubo` - OpenCL constant buffer. Sent in a uniform buffer.
   - `pod` - Plain Old Data, e.g. a scalar, vector, or structure. Sent in a storage buffer.
   - `pod_ubo` - Plain Old Data, e.g. a scalar, vector, or structure. Sent in a uniform buffer.
   - `pod_pushconstant` - Plain Old Data, e.g. a scalar, vector or structure. Sent in push constants.
   - `ro_image` - Read-only image
   - `wo_image` - Write-only image
   - `sampler` - Sampler
 - `argSize`
 - only present for plain-old-data kernel arguments.

 Module-wide specialization constants are specified as follows:
 - `spec_constant` to describe the use of a module-wide specialization constant
 - specialization constant type (e.g x dimension of WorkgroupSize)
 - `spec_id`
 - the SpecId value

 Consider this example, which uses pointer-to-local arguments:

     kernel void foo(local float* L, global float* A, local float4 *L2) {...}

 It generates the following descriptor map:

     kernel_decl,foo
     kernel,foo,arg,L,argOrdinal,0,argKind,local,arrayElemSize,4,arrayNumElemSpecId,3
     kernel,foo,arg,A,argOrdinal,1,descriptorSet,0,binding,0,offset,0,argKind,buffer
     kernel,foo,arg,L2,argOrdinal,2,argKind,local,arrayElemSize,16,arrayNumElemSpecId,4
     spec_constant,workgroup_size_x,spec_id,0
     spec_constant,workgroup_size_y,spec_id,1
     spec_constant,workgroup_size_z,spec_id,2

 The kernels in the module module are declared as follows:
 - `kernel_decl`
 - kernel name

 For kernel arguments of type pointer-to-local, the fields are:
 - `kernel` to indicate a kernel argument
 - kernel name
 - `arg` to indicate a kernel argument
 - argument name
 - `argOrdinal` to indicate a kernel argument ordinal position field
 - the argument's 0-based position in the kernel's parameter list
 - `argKind`
 - `local` to indicate a pointer-to-local argument
 - `arrayElemSize`
 - the number of bytes in each element of the array
 - `arrayNumElemSpecId`
 - the specialization constant ID used to specify the number of elements to
   allocate for the array in `Workgroup` storage.  Specifically, it is the
   `SpecId` decoration on the integer constant that specficies the array size.
   (This number is always at least 3 so that specialization IDs 0, 1, and 2 can
   be use for the workgroup size dimensions along `x`, `y`, and `z`.)

 Module-wide specialization constants are specified as follows:
 - `spec_constant` to describe the use of a module-wide specialization constant
 - specialization constant type (e.g x dimension of WorkgroupSize)
 - `spec_id`
 - the SpecId value

 Notes: Each pointer-to-local argument is assigned its own array type and
 specialization constant to size the array.  Unless you override the
 array size specialization constant at pipeline creation time, the array will
 only have one element.

 If a sampler map is used, then samplers use descriptor set 0 and kernel descriptor
 set numbers start at 1.  For example, if the sampler map file is `mysamplermap`
 containing:

     CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST,
     CLK_NORMALIZED_COORDS_TRUE  | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_LINEAR

 Then compiling with:

     clspv foo.cl -samplermap=mysamplermap -descriptormap=mydescriptormap

 Then `mydescriptormap` will contain:

     sampler,18,samplerExpr,"CLK_ADDRESS_CLAMP_TO_EDGE|CLK_FILTER_NEAREST|CLK_NORMALIZED_COORDS_FALSE",descriptorSet,0,binding,0
     sampler,35,samplerExpr,"CLK_ADDRESS_CLAMP_TO_EDGE|CLK_FILTER_LINEAR|CLK_NORMALIZED_COORDS_TRUE",descriptorSet,0,binding,1
     kernel,foo,arg,a,argOrdinal,0,descriptorSet,1,binding,0,offset,0,argKind,buffer
     kernel,foo,arg,f,argOrdinal,1,descriptorSet,1,binding,1,offset,0,argKind,pod,argSize,4
     kernel,foo,arg,b,argOrdinal,2,descriptorSet,1,binding,2,offset,0,argKind,buffer
     kernel,foo,arg,c,argOrdinal,3,descriptorSet,1,binding,3,offset,0,argKind,pod,argSize,4

 #### Sending in plain-old-data kernel arguments in uniform buffers

 Normally plain-old-data arguments are passed into the kernel via a storage buffer.
 Use option `-pod-ubo` to pass these parameters in via a uniform buffer.  These can
 be faster to read in the shader.

 When option `-pod-ubo` is used, the descriptor map list the `argKind` of a plain-old-data
 argument as `pod_ubo` rather than the default of `pod`.

 #### Sending in plain-old-data kernel arguments in push constants

 Normally plain-old-data arguments are passed into the kernel via a storage buffer.
 Use the option `-pod-pushconstant` to pass these parameters in via push constants. The
 option `-cluster-pod-kernel-args=0` cannot be specified. Push constants are intended
 to provide a fast read path and should be faster to access than a buffer.

 When the option `-pod-pushconstant` is used, the descriptor map lists the `argKind` of
 plain-old-data arguments as `pod_pushconstant` rather than the default of `pod`. There
 is no `descriptorset` or `binding` information for push constants.

 Note: Vulkan implementations have limited push constant storage (default is 128B).
 clspv provides the option `-max-pushconstant-size` to specify (in bytes) the
 implementation limit for push constants. This is validated at the start of the compile.

 Note: Module scope push constanst are currently incompatible with plain-old-data
 arguments sent as push constants.

 TODO(alan-baker): See #529 for the overall plan to address this.

 The descriptor map entry for kernel arguments will not contain `descriptorSet` or
 `binding` entries. For example, an integer arg `f` to kernel `foo` might look like:

   kernel,foo,arg,f,argOrdinal,1,offset,0,argKind,pod_pushconstant,argSize,4

 #### Sending in pointer-to-constant kernel arguments in uniform buffers

 Normally pointer-to-constant kernel arguments are passed into the kernel via a storage
 buffer. Use option `-constant-args-ubo` to pass these parameters in via a uniform buffer.
 Uniform buffers can be faster to read in the shader.

 The compiler will generate an error if the layout of the buffer does not satisfy
 the Standard Uniform Buffer Layout rules of the Vulkan specification (see section
 [15.5.4](https://www.khronos.org/registry/vulkan/specs/1.1/html/vkspec.html#interfaces-resources)).

 #### Clustering plain-old-data kernel arguments to save descriptors

 Descriptors can be scarce.  So the compiler also has an option
 `-cluster-pod-kernel-args` which can be used to reduce the number of descriptors.
 When the option is used:

 - All plain-old-data (POD) kernel arguments are collected into a single struct
   and passed into the compute shader via a single storage buffer resource.
 - The binding numbers are assigned as previously, except:
   - Binding numbers for non-POD arguments are assigned as if there were no
     POD arguments.
   - The binding number for the struct containing the POD arguments is one more
     than the highest non-POD argument.

 By default this option is enabled. To disable this behavior, pass
 `-cluster-pod-kernel-args=0` to the compiler.

 #### Example descriptor set mapping

 For example:

     // First kernel in the translation unit.
     void kernel foo(global int* a, float f, global float* b, uint c);

 In the default case, the bindings are:

 - `a` is mapped to a storage buffer with descriptor set 0, binding 0
 - `b` is mapped to a storage buffer with descriptor set 0, binding 1
 - `f` and `c` are POD arguments, so they are mapped to the first and
   second members of a struct, and that struct is mapped to a storage
   buffer with descriptor set 0 and binding 2

 If `-cluster-pod-kernel-args=0` is used:

 - `a` is mapped to a storage buffer with descriptor set 0, binding 0
 - `f` is mapped to a storage buffer with descriptor set 0, binding 1
 - `b` is mapped to a storage buffer with descriptor set 0, binding 2
 - `c` is mapped to a storage buffer with descriptor set 0, binding 3

 That is, compiling as follows:

     clspv foo.cl -descriptormap=myclusteredmap

 will produce the following in `myclusteredmap`:

     kernel,foo,arg,a,argOrdinal,0,descriptorSet,0,binding,0,offset,0,argKind,buffer
     kernel,foo,arg,b,argOrdinal,2,descriptorSet,0,binding,1,offset,0,argKind,buffer
     kernel,foo,arg,f,argOrdinal,1,descriptorSet,0,binding,2,offset,0,argKind,pod,argSize,4
     kernel,foo,arg,c,argOrdinal,3,descriptorSet,0,binding,2,offset,4,argKind,pod,argSize,4

 If `foo` were the second kernel in the translation unit, then its arguments
 would also use descriptor set 0.
 If `foo` were the second kernel in the translation unit _and_ option
 `-distinct-kernel-descriptor-sets` is used, then its arguments would
 use descriptor set 1.

 Compiling with the same sampler map from before:

     clspv foo.cl -descriptormap=myclusteredmap -samplermap=mysamplermap

 produces the following descriptor map:

     sampler,18,samplerExpr,"CLK_ADDRESS_CLAMP_TO_EDGE|CLK_FILTER_NEAREST|CLK_NORMALIZED_COORDS_FALSE",descriptorSet,0,binding,0
     sampler,35,samplerExpr,"CLK_ADDRESS_CLAMP_TO_EDGE|CLK_FILTER_LINEAR|CLK_NORMALIZED_COORDS_TRUE",descriptorSet,0,binding,1
     kernel,foo,arg,a,argOrdinal,0,descriptorSet,1,binding,0,offset,0,argKind,buffer
     kernel,foo,arg,b,argOrdinal,2,descriptorSet,1,binding,1,offset,0,argKind,buffer
     kernel,foo,arg,f,argOrdinal,1,descriptorSet,1,binding,2,offset,0,argKind,pod,argSize,4
     kernel,foo,arg,c,argOrdinal,3,descriptorSet,1,binding,2,offset,4,argKind,pod,argSize,4


 TODO(dneto): Give an example using images.

 #### Module-wide Specialization Constants

 The descriptor map includes entries for specialization constants that are used
 module-wide. The following specialization constants are currently generated:
 - `workgroup\_size\_x`: The x-dimension of workgroup size.
 - `workgroup\_size\_y`: The y-dimension of workgroup size.
 - `workgroup\_size\_z`: The z-dimension of workgroup size.
 - `work_dim`: The work dimensions.
 - `global\_offset\_x`: The x-dimension of global offset.
 - `global\_offset\_y`: The y-dimension of global offset.
 - `global\_offset\_z`: The z-dimension of global offset.

 #### Module scope constants

 By default, each module-scope variable in `__constant` address space is mapped to
 a SPIR-V variable in Private address space, with an intializer.  This works only
 for simple scenarios, where:

 - The variable is small, so it's reasonable to fit in a single invocations private registers, and
 - The variable is only read, and in particular its address is not taken.

 In more general cases, use compiler option `-module-constants-in-storage-buffer`. In this case:

 - All module-scope constants are collected into a single SPIR-V storage buffer variable in its
   own descriptor set.
 - The intialization data are written to the descriptor map, and the host program must fill the
   buffer with that data before the kernel executes.

 Consider this example kernel `a.cl`:

     typedef struct {
       char c;
       uint a;
       float f;
     } Foo;
     __constant Foo ppp[3] = {{'a', 0x1234abcd, 1.0}, {'b', 0xffffffff, 1.5}, {0}};

     kernel void foo(global uint* A, uint i) { *A = ppp[i].a; }

 Compiling as follows:

     clspv a.cl -descriptormap=map -module-constants-in-storage-buffer

 Produces the following in file `map`:

     constant,descriptorSet,1,binding,0,hexbytes,61000000cdab34120000803f62000000ffffffff0000c03f000000000000000000000000
     kernel,foo,arg,A,argOrdinal,0,descriptorSet,0,binding,0,offset,0,argKind,buffer
     kernel,foo,arg,i,argOrdinal,1,descriptorSet,0,binding,1,offset,0,argKind,pod

 The initialization data are in the line starting with `constant`, and its fields are:

 - `constant` to indicate constant initialization data
 - `descriptorSet`
 - the DescriptorSet value
 - `binding`
 - the Binding value
 - `kind`
 - `buffer` to indicate the use of a storage buffer
 - `hexbytes` to indicate the next field is the data, as a sequence of bytes in hexadecimal
 - a sequence of bytes expressed in hexadecimal notation, presented in order from lowest
   address to highest address.

 Take a closer look at the hexadecimal bytes in the example. They are:

 - `61`: ASCII character 'a'
 - `000000`: zero padding to satisfy alignment for the 32-bit integer value that
   follows
 - `cdab3412`: the integer value `0x1234abcd` in little-endian format
 - `0000803f`: the float value 1.0
 - `62`: ASCII character 'b'
 - `000000`: zero padding to satisfy alignment for the 32-bit integer value that
   follows
 - `ffffffff`: the integer value `0xffffffff`
 - `0000c03f`: the float value 1.5
 - `000000000000000000000000`: 12 zero bytes representing the zero-initialized third Foo value.

 #### Module Scope Push constants

 Some features, when enabled, require values to be passed by the application via
 push constants. For each value requested by clspv, an entry will be present in
 the descriptor map. The format is:

 - `pushconstant` to indicate the entry is a push constant
 - `name` to indicate the name of the requested push constant follows
 - the name of the push constant requested
 - `offset` to indicate the offset at which the push constant is expected follows
 - the offset within push constants at which the entry is expected
 - `size` to indicate that the total size follows
 - the total size of the push constant that will be used.

 Here is a list of the push constants currently supported:
 - `global_offset`: the 3D global offset used by `get_global_offset()` and in
   global ID calculations. A vector of 3 32-bit integer values. Lower
   dimensions come first in memory.
 - `enqueued_local_size`: the 3D local work size returned by
   `get_enqueued_local_size()`. A vector of 3 32-bit integer values. Lower
   dimensions come first in memory.
 - `global_size`: the 3D global size of the NDRange as returned by
   `get_global_size()`. A vector of 3 32-bit integer values. Lower dimensions
   come first in memory. Only required when non-uniform NDRanges are supported.
 - `num_workgroups`: the 3D number of work groups in the NDRange as returned by
   `get_num_groups()`. A vector of 3 32-bit integer values. Lower dimensions come
   first in memory. Only required when non-uniform NDRanges are supported.
 - `region_offset`: the sum of the global ID offset into the NDRange for this
   uniform region and the global offset of the NDRange. A vector of 3 32-bit
   integer values. Lower dimensions come first in memory. Only required when
   non-uniform NDRanges are supported.
 - `region_group_offset`: the 3D group ID offset into the NDRange for this
   region. A vector of 3 32-bit integer values. Lower dimensions come first in
   memory. Only required when non-uniform NDRanges are supported.

 ### Attributes

 The following attributes are ignored in the OpenCL C source, and thus have
 no functional impact on the produced SPIR-V:

 - `__attribute__((work_group_size_hint(X, Y, Z)))`
 - `__attribute__((packed))`
 - `__attribute__ ((endian(host)))`
 - `__attribute__ ((endian(device)))`
 - `__attribute__((vec_type_hint(<typen>)))`

 The `__attribute__((reqd_work_group_size(X, Y, Z)))` kernel attribute specifies
 the work-group size that **must** be used with that kernel.


 ### Work-Group Size

 The OpenCL C language allows the work-group size to be set just before executing
 the kernel on the device, at `clEnqueueNDRangeKernel()` time.
 Vulkan requires that the work-group size be specified no later than when the
 `VkPipeline` is created, which in OpenCL terms corresponds to when the
 `cl_kernel` is created.

 To allow for the maximum flexibility to developers who are used to specifying
 the work-group size in the host API and not in the device-side kernel
 language, we can use _specialization constants_ to allow for setting the work-group
 size at `VkPipeline` creation time.

 If all kernels in the OpenCL C source use the `reqd_work_group_size` attribute,
 then that attribute will specify the work-group size that must be used and the
 required values will be set in the SPIR-V via the `LocalSize` execution mode.

 If at least one of the kernels does not use the `reqd_work_group_size`
 attribute, the Vulkan SPIR-V produced by the compiler will contain
 specialization constants as follows:

 - The _x_ dimension of the work-group size is stored in a specialization
   constant that is decorated with the `SpecId`, whose value defaults to
   _1_.
 - The _y_ dimension of the work-group size is stored in a specialization
   constant that is decorated with the `SpecId`, whose value defaults to
   _1_.
 - The _z_ dimension of the work-group size is stored in a specialization
   constant that is decorated with the `SpecId`, whose value defaults to
   _1_.

 In either case, metadata that can be used by the runtime or host code is
 recorded in the generated SPIR-V module, using either
 `PropertyRequiredWorkgroupSize` and/or `SpecConstantWorkgroupSize` non-semantic
 instructions.

 ### Types

 #### Signed Integer Types

 Signed integer types are mapped down onto their unsigned equivalents in SPIR-V
 as produced from OpenCL C.

 Signed integer modulus (`%`) operations, where either argument to the modulus is
 a negative integer, will result in an undefined result.

 ### OpenCL C Built-In Functions

 OpenCL C language built-in functions are mapped, where possible, onto their GLSL
 4.5 built-in equivalents.
 For example, the OpenCL C language built-in function `tan()` is mapped onto
 GLSL's built-in function `tan()`.

 #### Common Functions

 The OpenCL C built-in `sign()` function does not differentiate between a signed
 and unsigned 0.0 input value, nor does it return 0.0 if the input value is a
 NaN.

 #### Integer Functions

 The OpenCL C built-in `mad24()` and `mul24()` functions do not perform their
 operations using 24-bit integers. Instead, they use 32-bit integers, and thus
 have no performance-improving characteristics over normal 32-bit integer
 arithmetic.

 #### Work-Item Functions

 The OpenCL C work-item functions map to Vulkan SPIR-V as follows:

 - `get_work_dim()` is mapped to the `work_dim` spec constant when `-work-dim`
    is enabled (on by default), otherwise it always returns 3.
 - `get_global_size()` is implemented by multiplying the result from
   `get_local_size()` by the result from `get_num_groups()`.
 - `get_global_id()` is mapped to a SPIR-V variable decorated with
   `GlobalInvocationId`. The global offset is added to that variable when
    `-global-offset` or `-global-offset-push-constant` is enabled.
 - `get_local_size()` is mapped to a SPIR-V variable decorated with
   `WorkgroupSize`.
 - `get_local_id()` is mapped to a SPIR-V variable decorated with
   `LocalInvocationId`.
 - `get_num_groups()` is mapped to a SPIR-V variable decorated with
   `NumWorkgroups`.
 - `get_group_id()` is mapped to a SPIR-V variable decorated with
   `WorkgroupId`.
 - `get_global_offset()` is mapped to the `global_offset` spec constant or push
   constant when `-global-offset` or `-global-offset-push-constant` is enabled,
   otherwise it always returns 0.  Spec constants are used unless
   `-global-offset-push-constant` is specified or the language is set to OpenCL
   C++ or OpenCL 2.0.

 ## OpenCL C Restrictions

 Some OpenCL C language features that have no expressible equivalents in Vulkan's
 variant of SPIR-V are restricted.

 ### Kernels

 OpenCL C language kernels **must not** be called from other kernels.

 Pointers of type `half` **must not** be used as kernel arguments.

 ### Types
 #### Boolean

 Booleans are an abstract type - they have no known compile-time size.
 Using a boolean type as the argument to the `sizeof()` operator will result in
 an undefined value.
 The boolean type **must not** be used to form `global`, or `constant` variables,
 nor be used within a struct or union type in the `global`, or `constant` address
 spaces.

 #### 8-Bit Types

 The `char`, `char2`, `char3`, `uchar`, `uchar2`, and `uchar3` types
 can be used. To disable general support for these types, use `-int8=0`.

 #### 64-Bit Types

 The `double`, `double2`, `double3` and `double4` types **must not** be used.

 #### Images

 The `image1d_buffer_t` type **must not** be used.

 #### Events

 The `event_t` type **must not** be used.

 #### Pointers

 Pointers are an abstract type - they have no known compile-time size.
 Using a pointer type as the argument to the `sizeof()` operator will result in
 an undefined value.

 Pointer-to-integer casts **must not** be used.

 Integer-to-pointer casts **must not** be used.

 Pointers **must not** be compared for equality or inequality.

 #### 8- and 16-Wide Vectors

 Vectors of 8 and 16 elements **must not** be used.

 #### Recursive Struct Types

 Recursively defined struct types **must not** be used.

 #### Pointer-Sized Types

 Since pointers have no known compile-time size, the pointer-sized types
 `size_t`, `ptrdiff_t`, `uintptr_t`, and `intptr_t` do not represent types that
 are the same size as a pointer.
 Instead, those types are mapped to 32-bit integer types.

 ### Built-In Functions

 For any OpenCL C language built-in functions that are mapped onto their GLSL
 4.5 built-in equivalents, the precision requirements of the OpenCL C language
 built-ins are not necessarily honoured. In general, Clspv authors expect
 implementations will satisfy the relaxed precision requirements described in
 the OpenCL C specification. This means kernels will operate as if compiled with
 `--cl-fast-relaxed-math`. For higher performance (lower accuracy) variants of
 some builtin functions, clspv also provides the `--cl-native-math` option. This
 option goes beyond fast-relaxed math and provides no precision guarantees
 (similar to the native_ functions in OpenCL).

 #### Atomic Functions

 The `atomic_xchg()` built-in function that takes a floating-point argument
 **must not** be used.

 ##### OpenCL 2.0 Atomic Functions

 The OpenCL 2.0 atomic functions are supported with the following exceptions:
 * `atomic_flag` functions are not supported
 * atomic initialization functions and macros are not supported
 * `memory_order_seq_cst` is weakened to acquire for loads, release for stores
   and acquire release for read-modify-write operations
 * `memory_scope_all_svm_devices` and `memory_scope_all_devices` are not supported
 * `atomic_compare_exchange_weak*` is implemented as `atomic_compare_exchange_strong*`
 * Due to Vulkan restrictions, only 32-bit integer types are currently supported

 #### Conversions

 The `convert_<type>_rte()`, `convert_<type>_rtz()`, `convert_<type>_rtp()`,
 `convert_<type>_rtn()`, `convert_<type>_sat()`, `convert_<type>_sat_rte()`,
 `convert_<type>_sat_rtz()`, `convert_<type>_sat_rtp()`, and
 `convert_<type>_sat_rtn()` built-in functions **must not** be used.

 #### Math Functions

 All supported.

 #### Integer Functions

 All supported.

 #### Relational Functions

 All supported.

 #### Vector Data Load and Store Functions

 The `vload<size>()`, `vstore<size>()`, `vstore_half_rtp()`, `vstore_half_rtn()`,
 `vstore_half<size>_rtp()`, `vstore_half<size>_rtn()`, `vstorea_half<size>_rtp()`
 , and `vstorea_half<size>_rtn()` built-in functions **must not** be used.

 The `vload_half()`, `vload_half<size>()`, `vstore_half()`, `vstore_half_rte()`,
 `vstore_half_rtz()`, `vstore_half<size>()`, `vstore_half<size>_rte()`,
 `vstore_half<size>_rtz()`, and `vloada_half<size>()`
 built-in functions
 are only allowed to use the `global` and `constant` address spaces.

 **Note**: When 16-bit storage support is not assumed, both `vload_half` and
 `vstore_half` assume the pointers are aligned to 4 bytes, not 2 bytes.
 See [issue 6](https://github.com/google/clspv/issues/6).

 Builtin functions
 `vstorea_half2()`,
 `vstorea_half4()`,
 `vstorea_half2_rtz()`,
 `vstorea_half4_rtz()`,
 `vstorea_half2_rte()`,
  and
 `vstorea_half4_rte()` built-in functions
 have implementations for global, local, and private
 address spaces.

 The `vstore_half_rte()`, `vstore_half_rtz()`, `vstore_half<size>_rte()`,
 `vstore_half<size>_rtz()`, `vstorea_half<size>_rte()`, and
 `vstorea_half<size>_rtz()` built-in functions are not guaranteed to round the
 result correctly if the destination address was not declared as a `half*` on the
 kernel entry point.

 #### Async Copy and Prefetch Functions

 The `async_work_group_copy()`, `async_work_group_strided_copy()`,
 `wait_group_events()`, and `prefetch()` built-in functions **must not** be used.

 #### Miscellaneous Vector Functions

 The `shuffle()`, `shuffle2()` and `vec_step()` built-in functions **must not**
 be used.

 #### Printf

 The `printf()` built-in function **must not** be used.

 #### Image Read and Write Functions

 The `get_image_channel_data_type()` and `get_image_channel_order()`
 built-in functions **must not** be used.

 #### cl_khr_subgroups extension

 The OpenCL extension `cl_khr_subgroups` requires SPIR-V 1.3 or greater and
 translates the built-in functions to GroupNonUniform operations and Builtin
 constants as follows:

 - `get_sub_group_size()` is mapped to `BuiltInSubgroupSize` constant.
   Requires `CapabilityGroupNonUniform` capability.
 - `get_num_sub_groups()` is mapped to `BuiltInNumSubgroups` constant.
   Requires `CapabilityGroupNonUniform` capability.
 - `get_sub_group_id()` is mapped to `BuiltInSubgroupId` constant.
   Requires `CapabilityGroupNonUniform` capability.
 - `get_sub_group_local_id()` is mapped to `BuiltInSubgroupLocalInvocationId`
   constant.  Requires `CapabilityGroupNonUniform` capability.
 - `sub_group_broadcast()` is mapped to `OpGroupNonUniformBroadcast` operation.
   Requires `CapabilityGroupNonUniformBallot` capability. For SPIR-V version < 1.5
   a constant laneId is required.
 - `sub_group_all()` is mapped to `OpGroupNonUniformAll` operation. Requires
   `CapabilityGroupNonUniformVote` capability.
 - `sub_group_any()` is mapped to `OpGroupNonUniformAny` operation. Requires
   `CapabilityGroupNonUniformVote` capability.
 - `sub_group_<group_op>_add()` is mapped to `OpGroupNonUniformIAdd` operation
   for Integer types and `OpGroupNonUniformFAdd` operation for Float types.
   Requires `CapabilityGroupNonUniformArithmetic` capability.
 - `sub_group_<group_op>_min()` is mapped to `OpGroupNonUniformSMin` operation
   for Signed-Integer types, `OpGroupNonUniformUMin` operation for
   Unsigned-Integer types and `OpGroupNonUniformFMin` operation for Float types.
   Requires `CapabilityGroupNonUniformArithmetic` capability.
 - `sub_group_<group_op>_max()` is mapped to `OpGroupNonUniformSMax` operation
   for Signed-Integer types, `OpGroupNonUniformUMax` operation for
   Unsigned-Integer types and `OpGroupNonUniformFMax` operation for Float types.
   Requires `CapabilityGroupNonUniformArithmetic` capability.
 - `sub_group_barrier()` is mapped to OpControlBarrier with an execution scope of
   `Subgroup`. If no memory scope is specified, `Subgroup` is used. The memory
   semantics depend on the flags on the barrier.

 The `group_op` qualifier translates as follows:

 - `reduce` maps to `GroupOperationReduce`.
 - `scan_exclusive` maps to `GroupOperationExclusiveScan`.
 - `scan_inclusive` maps to `GroupOperationInclusiveScan`.

 These extension built-in functions are not supported:

 - `get_max_sub_group_size()` requires CapabilityKernel (incompatible with Shader)
 - `get_enqueued_num_sub_groups()` requires CapabilityKernel (incompatible with Shader)
 - `sub_group_reserve_read_pipe()`
 - `sub_group_reserve_write_pipe()`
 - `sub_group_commit_read_pipe()`
 - `sub_group_commit_write_pipe()`
 - `get_kernel_sub_group_count_for_ndrange()`
 - `get_kernel_max_sub_group_size_for_ndrange()`

 ### Numerical Compliance

 Clspv is not able to reach full accuracy requirements on the supported builtin
 functions in all cases. Instead, currently, it is able to meet the requirements
 tested by OpenCL CTS under relaxed precision requirements. In order to achieve
 this goal, some functions are implemented using the GLSL.std.450 extended or
 core instructions, some are implemented in the builtin library and some are
 emulated by the compiler.

 Note: Clspv has been tested against five Vulkan implementations from different
 vendors and is able to achieve the relaxed accuracy requirements broadly.

 #### Implementations Using Core and Extended Instructions

 `add`, `subtract`, `divide`, `multiply`, `assignment`, `not`, `acos`, `asin`,
 `ceil`, `cos`, `cosh`, `exp`, `exp2`, `exp10`, `fabs`, `floor`, `fmax`, `fmin`,
 `log`, `log2`, `log10`, `mad`, `pow`, `powr`, `rint`, `rsqrt`, `sin`, `sinh`,
 `tan`, `trunc`, `half_cos`, `half_exp`, `half_exp2`, `half_exp10`, `half_log`,
 `half_log2`, `half_log10`, `half_powr`, `half_rsqrt`, `half_sin`, `half_tan`,
 `clamp`, `degrees`, `mix`, `radians`, `sign`, `smoothstep`, `step`, `cross`,
 `dot`, `normalize`, `fast_distance`, `fast_length`, `fast_normalize`.

 #### Implementations Using Builtin Library

 `acosh`, `asinh`, `atanh`, `cbrt`, `erfc`, `erf`, `fma`, `fmod`, `fract`,
 `frexp`, `hypot`, `ilogb`, `ldexp`, `lgamma`, `lgamma_r`, `logb`, `maxmag`,
 `modf`, `nan`, `nextafter`, `remainder`, `remquo`, `rootn`, `sqrt`, `tanh`,
 `tgamma`, `half_divide`, `half_recip`, `half_sqrt`, `distance`, `length`,
 `atan`, `atan2pi`, `atanpi`, `atan2`.

 Note: `fma` has a very high runtime cost unless compiling with
 `-cl-native-math`. Its accuracy requirements are not relaxed by
 `-cl-fast-relaxed-math` and the library implementation emulates it using
 integers.

 Note: `acosh`, `asinh`, `atanh`, `atan`, `atan2`, `atanpi` and `atan2pi`,
 `fma`, `fmod`, `fract`, `frexp`, `ldexp`, `rsqrt`, `half_sqrt`, `sqrt`, `tanh`,
 `distance`, and `length` are all implemented using core or extended
 instructions when compiling with `-cl-native-math`.

 #### Implementations Using Emulation

 `acospi`, `asinpi`, `copysign`, `cospi`, `expm1`, `fdim`, `log1p`, `pown`,
 `round`, `sincos`, `sinpi`, `tanpi`.

 #### Known Conformance Issues

 * `frexp` fails using Swiftshader as an implementation due to an internal error.
 * `ldexp` and `rsqrt` fail to meet accuracy requirements on some Adreno GPUs.