| // Copyright 2017-2024 The Khronos Group. This work is licensed under a |
| // Creative Commons Attribution 4.0 International License; see |
| // http://creativecommons.org/licenses/by/4.0/ |
| |
| // Please keep footnotes in alphabetical order! |
| |
| // Note: Mostly follows the suggested syntax from: |
| // https://github.com/asciidoctor/asciidoctor-pdf/issues/1397 |
| |
| :fn-address-operator-result: pass:n[ \ |
| Thus, *&*E* is equivalent to *E* (even if *E* is a null pointer), and *&(E1[E2\])* is equivalent to *\((E1) {plus} (E2))*. \ |
| It is always true that if *E* is an l-value that is a valid operand of the unary *&* operator, *+*&E+* is an l-value equal to *E*. \ |
| ] |
| |
| :fn-aggregate-types: pass:n[ \ |
| That is, for the purpose of applying type-based aliasing rules, a built-in vector data type will be considered equivalent to the corresponding array type. \ |
| ] |
| |
| :fn-atomic-no-consume: pass:n[ \ |
| The <<C11-spec,C11>> consume operation is not supported. \ |
| ] |
| |
| :fn-atomic-double-supported: pass:n[ \ |
| The `atomic_double` type is only supported if double precision is supported and the *cl_khr_int64_base_atomics* and *cl_khr_int64_extended_atomics* extensions are supported and have been enabled. \ |
| If this is the case then an OpenCL C 3.0 compiler must also define the `+__opencl_c_fp64+` feature. \ |
| ] |
| |
| :fn-atomic-int64-supported: pass:n[ \ |
| The atomic_long and atomic_ulong types are supported if the *cl_khr_int64_base_atomics* and *cl_khr_int64_extended_atomics* extensions are supported and have been enabled. \ |
| If this is the case then an OpenCL C 3.0 compiler must also define the `+__opencl_c_int64+` feature. \ |
| ] |
| |
| :fn-atomic-size_t-supported: pass:n[ \ |
| If the device address space is 64-bits, the data types `atomic_intptr_t`, `atomic_uintptr_t`, `atomic_size_t` and `atomic_ptrdiff_t` are supported if the *cl_khr_int64_base_atomics* and *cl_khr_int64_extended_atomics* extensions are supported and have been enabled. \ |
| ] |
| |
| :fn-atomic-weak-rationale: pass:n[ \ |
| This spurious failure enables implementation of compare-and-exchange on a broader class of machines, e.g. load-locked store-conditional machines. \ |
| ] |
| |
| :fn-bitfield-struct-restriction: pass:n[ \ |
| Bit-field struct members are <<restrictions-bitfield, not supported in OpenCL C>>. \ |
| ] |
| |
| :fn-bool: pass:n[ \ |
| When any scalar value is converted to `bool`, the result is 0 if the value compares equal to 0; otherwise, the result is 1. \ |
| ] |
| |
| :fn-cl_double: pass:n[ \ |
| <<unified-spec, Requires>> support for OpenCL C 1.2 or above. \ |
| Also see extension *cl_khr_fp64*. \ |
| ] |
| |
| :fn-cl_khr_fp16: pass:n[ \ |
| Unless the *cl_khr_fp16* extension is supported and has been enabled. \ |
| ] |
| |
| :fn-clang-block-syntax: pass:n[ \ |
| This syntax is already part of the clang source tree on which most vendors have based their OpenCL implementations. \ |
| Additionally, blocks based closures are supported by the clang open source C compiler as well as Mac OS X's C and Objective C compilers. \ |
| Specifically, Mac OS X's Grand Central Dispatch allows applications to queue tasks as a block. \ |
| ] |
| |
| :fn-clang-block-function-pointers: pass:n[ \ |
| OpenCL C <<restrictions,does not allow function pointers>> primarily because it is difficult or expensive to implement generic indirections to executable code in many hardware architectures that OpenCL targets. \ |
| OpenCL C's design of Blocks is intended to respect that same condition, yielding the restrictions listed here. \ |
| As such, Blocks allow a form of dynamically enqueued function scheduling without providing a form of runtime synchronous dynamic dispatch analogous to function pointers. \ |
| ] |
| |
| :fn-CLK_ADDRESS_CLAMP: pass:n[ \ |
| This is similar to the `GL_ADDRESS_CLAMP_TO_BORDER` addressing mode. \ |
| ] |
| |
| :fn-CLK_UNORM_INT_101010_2: pass:n[ \ |
| Although `CL_UNORM_INT_101010_2` was added in OpenCL 2.1, because there was no OpenCL C 2.1 this image channel order <<unified-spec, requires>> OpenCL 3.0. \ |
| ] |
| |
| :fn-double: pass:n[ \ |
| The `double` scalar type is an optional type that is supported if the value of the `CL_DEVICE_DOUBLE_FP_CONFIG` device query is not zero. \ |
| If this is the case then an OpenCL C 3.0 compiler must also define the `+__opencl_c_fp64+` feature macro. \ |
| ] |
| |
| :fn-double-supported: pass:n[ \ |
| Only if double precision is supported. \ |
| In OpenCL C 3.0 this will be indicated by the presence of the `+__opencl_c_fp64+` feature macro. \ |
| ] |
| |
| :fn-double-vec: pass:n[ \ |
| The `double__n__` vector type is an optional type that is supported if the value of the `CL_DEVICE_DOUBLE_FP_CONFIG` device query is not zero. \ |
| If this is the case then an OpenCL C 3.0 compiler must also define the `+__opencl_c_fp64+` feature macro. \ |
| ] |
| |
| :fn-dse-CLK_ENQUEUE_FLAGS_WAIT_WORK_GROUP: pass:n[ \ |
| This acts as a memory synchronization point between work-items in a work-group and child kernels enqueued by work-items in the work-group. \ |
| ] |
| |
| :fn-dse-immediate-definition: pass:n[ \ |
| Immediate meaning not side effects resulting from child kernels. \ |
| The side effects would include stores to `global` memory and pipe reads and writes. \ |
| ] |
| |
| :fn-dse-kernel_enqueue_flags_t: pass:n[ \ |
| Implementations are not required to honor this flag. \ |
| Implementations may not schedule kernel launch earlier than the point specified by this flag, however. \ |
| ] |
| |
| :fn-float-conversion-rounding: pass:n[ \ |
| For conversions to floating-point format, when a finite source value exceeds the maximum representable finite floating-point destination value, the rounding mode will affect whether the result is the maximum finite floating-point value or infinity of same sign as the source value, per IEEE-754 rules for rounding. \ |
| ] |
| |
| :fn-float-increment-decrement: pass:n[ \ |
| The pre- and post- increment operators may have unexpected behavior on floating-point values and are therefore not supported for floating-point scalar and vector built-in types. \ |
| For example, if variable _a_ has type `float` and holds the value `0x1.0p25f`, then `_a_{pp}` returns `0x1.0p25f`. + \ |
| Also, `(_a_{pp})--` is not guaranteed to return _a_, if _a_ has fractional value. + \ |
| In non-default rounding modes, `(_a_{pp})--` may produce the same result as `_a_{pp}` or `_a_--` for large _a_. \ |
| ] |
| |
| :fn-float-reinterpretation: pass:n[ \ |
| In addition, some other extensions to the C language designed to support a particular vector ISA (e.g. AltiVec{trade}, CELL Broadband Engine{trade} Architecture) use such conversions in conjunction with swizzle operators to achieve type un-conversion. \ |
| So as to support legacy code of this type, *as_typen*() allows conversions between vectors of the same size but different numbers of elements, even though the behavior of this sort of conversion is not likely to be portable except to other OpenCL implementations for the same hardware architecture. + \ |
| AltiVec is a trademark of Motorola Inc. + \ |
| Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. \ |
| ] |
| |
| :fn-float-required-rounding-mode: pass:n[ \ |
| Except for the embedded profile where either round to zero or round to nearest rounding mode may be supported for single precision floating-point. \ |
| ] |
| |
| :fn-floating-point-exception-nans: pass:n[ \ |
| If an implementation extends this specification to support IEEE-754 flags or exceptions, then all built-in functions defined in the following table shall proceed without raising the _invalid_ floating-point exception when one or more of the operands are NaNs. \ |
| ] |
| |
| :fn-float-types-supported: pass:n[ \ |
| The `half` scalar and vector types can only be used if the *cl_khr_fp16* extension is supported and has been enabled. \ |
| The `double` scalar and vector types can only be used if `double` precision is supported, e.g. for OpenCL C 3.0 the `+__opencl_c_fp64+` feature macro is present. \ |
| ] |
| |
| :fn-fmin-fmax-nan: pass:n[ \ |
| *fmin* and *fmax* behave as defined by C99 and may not match the IEEE 754-2008 definition for *minNum* and *maxNum* with regard to signaling NaNs. \ |
| Specifically, signaling NaNs may behave as quiet NaNs. \ |
| ] |
| |
| :fn-fract-min: pass:n[ \ |
| The *min*() operator is there to prevent *fract*(-small) from returning 1.0. \ |
| It returns the largest positive floating-point number less than 1.0. \ |
| ] |
| |
| :fn-half-supported: pass:n[ \ |
| Only if the *cl_khr_fp16* extension is supported and has been enabled. \ |
| ] |
| |
| :fn-image-functions: pass:n[ \ |
| Refer to the detailed description of the built-in <<image-read-and-write-functions,Image Read and Write Functions>> that use this type. \ |
| ] |
| |
| :fn-int64-supported: pass:n[ \ |
| Only if 64-bit integers are supported. \ |
| In OpenCL C 3.0 this will be indicated by the presence of the `+__opencl_c_int64+` feature macro. \ |
| ] |
| |
| :fn-integer-conversion-rank: pass:n[ \ |
| This is different from the standard integer conversion rank described in <<C99-spec,section 6.3.1.1 of the C99 Specification>>. \ |
| ] |
| |
| :fn-integer-promotion: pass:n[ \ |
| Integer promotion is described in <<C99-spec,section 6.3.1.1 of the C99 Specification>>. \ |
| ] |
| |
| :fn-long: pass:n[ \ |
| The `long`, `unsigned long` and `ulong` scalar types are optional types for EMBEDDED profile devices that are supported if the value of the `CL_DEVICE_EXTENSIONS` device query contains *cles_khr_int64*. \ |
| An OpenCL C 3.0 compiler must also define the `+__opencl_c_int64+` feature macro unconditionally for FULL profile devices, or for EMBEDDED profile devices that support these types. \ |
| ] |
| |
| :fn-long-vec: pass:n[ \ |
| The `long__n__` and `ulong__n__` vector types are optional types for EMBEDDED profile devices that are supported if the value of the `CL_DEVICE_EXTENSIONS` device query contains *cles_khr_int64*. \ |
| An OpenCL C 3.0 compiler must also define the `+__opencl_c_int64+` feature macro unconditionally for FULL profile devices, or for EMBEDDED profile devices that support these types. \ |
| ] |
| |
| :fn-mad-caution: pass:n[ \ |
| The user is cautioned that for some usages, e.g. *mad*(a, b, -a*b), the \ |
| definition of *mad*() is loose enough in the embedded profile \ |
| or with half-precision arguments \ |
| that almost any result is allowed from *mad*() for some values of a and b. \ |
| ] |
| |
| :fn-memory-scope-restrictions: pass:n[ \ |
| Refer to the description and restrictions for <<memory-scope,`memory_scope`>>. \ |
| ] |
| |
| :fn-min-float-constants: pass:n[ \ |
| Here `TYPE_MIN` and `TYPE_MIN_EXP` should be substituted by constants appropriate to the floating-point type under consideration, such as `FLT_MIN` and `FLT_MIN_EXP` for `float`. \ |
| ] |
| |
| :fn-non-uniform-work-groups: pass:n[ \ |
| I.e. the _global_work_size_ values specified to *clEnqueueNDRangeKernel* are not evenly divisible by the _local_work_size_ values for each dimension. \ |
| ] |
| |
| :fn-OPENCL_VERSION: pass:n[ \ |
| When OpenCL C is compiled offline, `+__OPENCL_VERSION__+` may be defined and may substitute any implementation-defined integer value. \ |
| ] |
| |
| :fn-pointer-invalid-value-indirection: pass:n[ \ |
| Among the invalid values for dereferencing a pointer by the unary *+*+* operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime. \ |
| If *+*P+* is an l-value and *T* is the name of an object pointer type, *+*(T)P+* is an l-value that has a type compatible with that to which *T* points. \ |
| ] |
| |
| :fn-pow-performance: pass:n[ \ |
| On some implementations, *powr*() or *pown*() may perform faster than *pow*(). \ |
| If _x_ is known to be >= 0, consider using *powr*() in place of *pow*(), or if _y_ is known to be an integer, consider using *pown*() in place of *pow*(). \ |
| ] |
| |
| :fn-printf-field-width: pass:n[ \ |
| Note that *0* is taken as a flag, not as the beginning of a field width. \ |
| ] |
| |
| :fn-printf-hex-float: pass:n[ \ |
| Binary implementations can choose the hexadecimal digit to the left of the decimal-point character so that subsequent digits align to nibble (4-bit) boundaries. \ |
| ] |
| |
| :fn-printf-infinity-nan: pass:n[ \ |
| When applied to infinite and NaN values, the *-*, *+*, and _space_ flag characters have their usual meaning; the *#* and *0* flag characters have no effect. \ |
| ] |
| |
| :fn-printf-literal-string: pass:n[ \ |
| No special provisions are made for multibyte characters. \ |
| The behavior of *printf* with the *s* conversion specifier is undefined if the argument value is not a pointer to a literal string. \ |
| ] |
| |
| :fn-printf-minus-sign: pass:n[ \ |
| The results of all floating conversions of a negative zero, and of negative values that round to zero, include a minus sign. \ |
| ] |
| |
| :fn-read-image-with-sampler: pass:n[ \ |
| Note that the built-in function calls to read images with a sampler are not supported for `image1d_buffer_t` image types. \ |
| ] |
| |
| :fn-reinterpret-vector-types: pass:n[ \ |
| While the union is intended to reflect the organization of data in memory, the *as_type*() and *as_type__n__*() constructs are intended to reflect the organization of data in register. \ |
| The *as_type*() and *as_type__n__*() constructs are intended to compile to no instructions on devices that use a shared register file designed to operate on both the operand and result types. \ |
| Note that while differences in memory organization are expected to largely be limited to those arising from endianness, the register based representation may also differ due to size of the element in register. \ |
| For example, an architecture may load a `char` into a 32-bit register, or a `char` vector into a SIMD vector register with fixed 32-bit element size. \ |
| If the element count does not match, then the implementation should pick a data representation that most closely matches what would happen if an appropriate result type operator was applied to a register containing data of the source type. \ |
| If the number of elements matches, then the *as_type__n__*() should faithfully reproduce the behavior expected from a similar data type reinterpretation using memory/unions. \ |
| So, for example if an implementation stores all single precision data as `double` in register, it should implement *as_int*(`float`) by first down-converting the `double` to single precision and then (if necessary) moving the single precision bits to a register suitable for operating on integer data. \ |
| If data stored in different address spaces do not have the same endianness, then the "`dominant endianness`" of the device should prevail. \ |
| ] |
| |
| :fn-relational-any-all: pass:n[ \ |
| To test whether any or all elements in the result of a vector relational operator test _true_, for example to use in the context in an *if ( )* statement, please see the <<relational-functions,*any* and *all* built-ins>>. \ |
| ] |
| |
| :fn-rhadd-benefit: pass:n[ \ |
| Frequently vector operations need n + 1 bits temporarily to calculate a result. \ |
| The *rhadd* instruction gives you an extra bit without needing to upsample and downsample. \ |
| This can be a profound performance win. \ |
| ] |
| |
| :fn-select-vs-ternary: pass:n[ \ |
| This definition means that the behavior of select and the ternary operator for vector and scalar types is dependent on different interpretations of the bit pattern of _c_. \ |
| ] |
| |
| :fn-size_t: pass:n[ \ |
| This is a 32-bit type if the value of the `CL_DEVICE_ADDRESS_BITS` device query is 32-bits, and a 64-bit type if the value of the query is 64-bits. \ |
| ] |
| |
| :fn-variable-length-array-restriction: pass:n[ \ |
| Variable length arrays are <<restrictions-variable-length,not supported in OpenCL C>>. \ |
| ] |
| |
| :fn-vec3-async-copy: pass:n[ \ |
| *async_work_group_copy* and *async_work_group_strided_copy* for 3-component vector types behave as *async_work_group_copy* and *async_work_group_strided_copy* respectively for 4-component vector types. \ |
| ] |
| |
| :fn-vec3-size: pass:n[ \ |
| Except for 3-component vectors whose size is defined as 4 times the size of each scalar component. \ |
| ] |
| |
| :fn-vec3-vload-vstore: pass:n[ \ |
| *vload3* and *vload_half3* read (_x_,_y_,_z_) components from address `(_p_ + (_offset_ * 3))` into a 3-component vector. \ |
| *vstore3* and *vstore_half3* write (_x_,_y_,_z_) components from a 3-component vector to address `(_p_ + (_offset_ * 3))`. \ |
| In addition, *vloada_half3* reads (_x_,_y_,_z_) components from address `(_p_ + (_offset_ * 4))` into a 3-component vector and *vstorea_half3* writes (_x_,_y_,_z_) components from a 3-component vector to address `(_p_ {plus} (_offset_ * 4))`. \ |
| Whether *vloada_half3* and *vstorea_half3* read/write padding data between the third vector element and the next alignment boundary is implementation-defined. \ |
| The *vloada_* and *vstorea_* variants are provided to access data that is aligned to the size of the vector, and are intended to enable performance on hardware that can take advantage of the increased alignment. \ |
| ] |
| |
| :fn-vec-type-hint: pass:n[ \ |
| Implicit in autovectorization is the assumption that any libraries called from the `+__kernel+` must be recompilable at run time to handle cases where the compiler decides to merge or separate workitems. \ |
| This probably means that such libraries can never be hard coded binaries or that hard coded binaries must be accompanied either by source or some retargetable intermediate representation. \ |
| This may be a code security question for some. \ |
| ] |
| |
| :fn-vector-types: pass:n[ \ |
| Built-in vector data types are supported by the OpenCL implementation even if the underlying compute device does not natively support any or all of the vector data types. \ |
| They are to be converted by the device compiler to appropriate instructions that use underlying built-in types supported natively by the compute device. \ |
| Refer to Appendix B in the OpenCL API specification for a description of the order of the components of a vector type in memory. \ |
| ] |