cxx/annotation.txt - external/github.com/KhronosGroup/OpenCL-Docs - Git at Google

 // Copyright 2017-2024 The Khronos Group. This work is licensed under a
 // Creative Commons Attribution 4.0 International License; see
 // http://creativecommons.org/licenses/by/4.0/

 <<<

 '''''
 [[ftn2]]
 [2] The `double` data type is an optional type that is supported if `CL_DEVICE_DOUBLE_FP_CONFIG` in table 4.3 for a device is not zero.

 [[ftn3]]
 [3] The question mark ? in numerical selector refers to special undefined component of vector; reading from it results in undefined value, writing to it is discarded.

 [[ftn4]]
 [4] Only if the *cl_khr_fp16* extension is enabled and has been supported

 [[ftn5]]
 [5] For conversions to floating-point format, when a finite source value exceeds the maximum representable finite floating-point destination value, the rounding mode will affect whether the result is the maximum finite floating-point value or infinity of same sign as the source value, per IEEE-754 rules for rounding.

 [[ftn6]]
 [6] The `as_type<T>` function is intended to reflect the organization of data in register.
 The `as_type<T>` construct is intended to compile to no instructions on devices that use a shared register file designed to operate on both the operand and result types.
 Note that while differences in memory organization are expected to largely be limited to those arising from endianness, the register based representation may also differ due to size of the element in register.
 (For example, an architecture may load a char into a 32-bit register, or a char vector into a SIMD vector register with fixed 32-bit element size.)
 If the element count does not match, then the implementation should pick a data representation that most closely matches what would happen if an appropriate result type operator was applied to a register containing data of the source type.
 So, for example if an implementation stores all single precision data as double in register, it should implement `as_type<int>(float)` by first downconverting the double to single precision and then (if necessary) moving the single precision bits to a register suitable for operating on integer data.
 If data stored in different address spaces do not have the same endianness, then the "dominant endianness" of the device should prevail.

 [[ftn7]]
 [7] `memory_order_consume` is not supported in OpenCL {cpp}

 [[ftn8]]
 [8] This value for `memory_scope` can only be used with `atomic_fence` with flags set to `mem_fence::image`.

 [[ftn9]]
 [9] We can't require {cpp14} atomics since host programs can be implemented in other programming languages and versions of C or {cpp}, but we do require that the host programs use atomics and that those atomics be compatible with those in {cpp14}.

 [[ftn10]]
 [10] The `atomic_long` and `atomic_ulong` types are supported if the *cl_khr_int64_base_atomics* and *cl_khr_int64_extended_atomics* extensions are supported and have been enabled.

 [[ftn11]]
 [11] The `atomic_double` type is only supported if double precision is supported and the *cl_khr_int64_base_atomics* and *cl_khr_int64_extended_atomics* extensions are supported and have been enabled.

 [[ftn12]]
 [12] If the device address space is 64-bits, the data types `atomic_intptr_t`, `atomic_uintptr_t`, `atomic_size_t` and `atomic_ptrdiff_t` are supported only if the *cl_khr_int64_base_atomics* and *cl_khr_int64_extended_atomics* extensions are supported and have been enabled.

 [[ftn13]]
 [13] The `\*_ms` types are supported only if the *cl_khr_gl_msaa_sharing* and *cl_khr_gl_depth_images* extensions are supported and have been enabled.

 [[ftn14]]
 [14] Immediate meaning not side effects resulting from child kernels. The side effects would include stores to global memory and pipe reads and writes.

 [[ftn15]]
 [15] This acts as a memory synchronization point between work-items in a work-group and child kernels enqueued by work-items in the work-group.

 // Footnote 16 removed - duplicated footnote 17

 [[ftn17]]
 [17] i.e. the `global_work_size` values specified to `clEnqueueNDRangeKernel` are not evenly divisible by the `local_work_size` values for each dimension.

 [[ftn18]]
 [18] Only if double precision is supported and has been enabled.

 [[ftn19]]
 [19] Refer to the <<order-and-scope, _Memory order and scope_>> section for description of `memory_scope`.

 [[ftn20]]
 [20] The `min()` operator is there to prevent `fract(-small)` from returning 1.0.
 It returns the largest positive floating-point number less than 1.0.

 [[ftn21]]
 [21] fmin and fmax behave as defined by {cpp14} and may not match the IEEE 754-2008 definition for minNum and maxNum with regard to signaling NaNs.
 Specifically, signaling NaNs may behave as quiet NaNs.

 [[ftn22]]
 [22] The user is cautioned that for some usages, e.g. `mad(a, b, -a*b)`, the definition of `mad()` in the embedded profile is loose enough that almost any result is allowed from `mad()` for some values of `a` and `b`.

 [[ftn23]]
 [23] Frequently vector operations need n + 1 bits temporarily to calculate a result.
 The rhadd instruction gives you an extra bit without needing to upsample and downsample. This can be a profound performance win.

 [[ftn24]]
 [24] The primary purpose of the printf function is to help in debugging OpenCL kernels.

 [[ftn25]]
 [25] Note that _0_ is taken as a flag, not as the beginning of a field width.

 [[ftn26]]
 [26] The results of all floating conversions of a negative zero, and of negative values that round to zero, include a minus sign.

 [[ftn27]]
 [27] When applied to infinite and NaN values, the -, +, and space flag characters have their usual meaning; the # and _0_ flag characters have no effect.

 [[ftn28]]
 [28] Binary implementations can choose the hexadecimal digit to the left of the decimal-point character so that subsequent digits align to nibble (4-bit) boundaries.

 [[ftn29]]
 [29] No special provisions are made for multibyte characters.
 The behavior of printf with the _s_ conversion specifier is undefined if the argument value is not a pointer to a literal string.

 [[ftn30]]
 [30] Except for the embedded profile whether either round to zero or round to nearest rounding mode may be supported for single precision floating-point.

 [[ftn31]]
 [31] The ULP values for built-in math functions `lgamma` and `lgamma_r` is currently undefined.

 [[ftn32]]
 [32] 0 ulp is used for math functions that do not require rounding.

 [[ftn33]]
 [33] On some implementations, `powr()` or `pown()` may perform faster than `pow()`.
 If `x` is known to be `>= 0`, consider using `powr()` in place of `pow()`, or if `y` is known to be an integer, consider using `pown()` in place of `pow()`.

 // Footnote 34 removed - duplicated footnote 32

 [[ftn35]]
 [35] Here `TYPE_MIN` and `TYPE_MIN_EXP` should be substituted by constants appropriate to the floating-point type under consideration, such as `FLT_MIN` and `FLT_MIN_EXP` for float.
	// Copyright 2017-2024 The Khronos Group. This work is licensed under a
	// Creative Commons Attribution 4.0 International License; see
	// http://creativecommons.org/licenses/by/4.0/

	<<<

	'''''
	[[ftn2]]
	[2] The `double` data type is an optional type that is supported if `CL_DEVICE_DOUBLE_FP_CONFIG` in table 4.3 for a device is not zero.

	[[ftn3]]
	[3] The question mark ? in numerical selector refers to special undefined component of vector; reading from it results in undefined value, writing to it is discarded.

	[[ftn4]]
	[4] Only if the cl_khr_fp16 extension is enabled and has been supported

	[[ftn5]]
	[5] For conversions to floating-point format, when a finite source value exceeds the maximum representable finite floating-point destination value, the rounding mode will affect whether the result is the maximum finite floating-point value or infinity of same sign as the source value, per IEEE-754 rules for rounding.

	[[ftn6]]
	[6] The `as_type<T>` function is intended to reflect the organization of data in register.
	The `as_type<T>` construct is intended to compile to no instructions on devices that use a shared register file designed to operate on both the operand and result types.
	Note that while differences in memory organization are expected to largely be limited to those arising from endianness, the register based representation may also differ due to size of the element in register.
	(For example, an architecture may load a char into a 32-bit register, or a char vector into a SIMD vector register with fixed 32-bit element size.)
	If the element count does not match, then the implementation should pick a data representation that most closely matches what would happen if an appropriate result type operator was applied to a register containing data of the source type.
	So, for example if an implementation stores all single precision data as double in register, it should implement `as_type<int>(float)` by first downconverting the double to single precision and then (if necessary) moving the single precision bits to a register suitable for operating on integer data.
	If data stored in different address spaces do not have the same endianness, then the "dominant endianness" of the device should prevail.

	[[ftn7]]
	[7] `memory_order_consume` is not supported in OpenCL {cpp}

	[[ftn8]]
	[8] This value for `memory_scope` can only be used with `atomic_fence` with flags set to `mem_fence::image`.

	[[ftn9]]
	[9] We can't require {cpp14} atomics since host programs can be implemented in other programming languages and versions of C or {cpp}, but we do require that the host programs use atomics and that those atomics be compatible with those in {cpp14}.

	[[ftn10]]
	[10] The `atomic_long` and `atomic_ulong` types are supported if the cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics extensions are supported and have been enabled.

	[[ftn11]]
	[11] The `atomic_double` type is only supported if double precision is supported and the cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics extensions are supported and have been enabled.

	[[ftn12]]
	[12] If the device address space is 64-bits, the data types `atomic_intptr_t`, `atomic_uintptr_t`, `atomic_size_t` and `atomic_ptrdiff_t` are supported only if the cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics extensions are supported and have been enabled.

	[[ftn13]]
	[13] The `\_ms` types are supported only if the cl_khr_gl_msaa_sharing* and cl_khr_gl_depth_images extensions are supported and have been enabled.

	[[ftn14]]
	[14] Immediate meaning not side effects resulting from child kernels. The side effects would include stores to global memory and pipe reads and writes.

	[[ftn15]]
	[15] This acts as a memory synchronization point between work-items in a work-group and child kernels enqueued by work-items in the work-group.

	// Footnote 16 removed - duplicated footnote 17

	[[ftn17]]
	[17] i.e. the `global_work_size` values specified to `clEnqueueNDRangeKernel` are not evenly divisible by the `local_work_size` values for each dimension.

	[[ftn18]]
	[18] Only if double precision is supported and has been enabled.

	[[ftn19]]
	[19] Refer to the <<order-and-scope, _Memory order and scope_>> section for description of `memory_scope`.

	[[ftn20]]
	[20] The `min()` operator is there to prevent `fract(-small)` from returning 1.0.
	It returns the largest positive floating-point number less than 1.0.

	[[ftn21]]
	[21] fmin and fmax behave as defined by {cpp14} and may not match the IEEE 754-2008 definition for minNum and maxNum with regard to signaling NaNs.
	Specifically, signaling NaNs may behave as quiet NaNs.

	[[ftn22]]
	[22] The user is cautioned that for some usages, e.g. `mad(a, b, -a*b)`, the definition of `mad()` in the embedded profile is loose enough that almost any result is allowed from `mad()` for some values of `a` and `b`.

	[[ftn23]]
	[23] Frequently vector operations need n + 1 bits temporarily to calculate a result.
	The rhadd instruction gives you an extra bit without needing to upsample and downsample. This can be a profound performance win.

	[[ftn24]]
	[24] The primary purpose of the printf function is to help in debugging OpenCL kernels.

	[[ftn25]]
	[25] Note that _0_ is taken as a flag, not as the beginning of a field width.

	[[ftn26]]
	[26] The results of all floating conversions of a negative zero, and of negative values that round to zero, include a minus sign.

	[[ftn27]]
	[27] When applied to infinite and NaN values, the -, +, and space flag characters have their usual meaning; the # and _0_ flag characters have no effect.

	[[ftn28]]
	[28] Binary implementations can choose the hexadecimal digit to the left of the decimal-point character so that subsequent digits align to nibble (4-bit) boundaries.

	[[ftn29]]
	[29] No special provisions are made for multibyte characters.
	The behavior of printf with the _s_ conversion specifier is undefined if the argument value is not a pointer to a literal string.

	[[ftn30]]
	[30] Except for the embedded profile whether either round to zero or round to nearest rounding mode may be supported for single precision floating-point.

	[[ftn31]]
	[31] The ULP values for built-in math functions `lgamma` and `lgamma_r` is currently undefined.

	[[ftn32]]
	[32] 0 ulp is used for math functions that do not require rounding.

	[[ftn33]]
	[33] On some implementations, `powr()` or `pown()` may perform faster than `pow()`.
	If `x` is known to be `>= 0`, consider using `powr()` in place of `pow()`, or if `y` is known to be an integer, consider using `pown()` in place of `pow()`.

	// Footnote 34 removed - duplicated footnote 32

	[[ftn35]]
	[35] Here `TYPE_MIN` and `TYPE_MIN_EXP` should be substituted by constants appropriate to the floating-point type under consideration, such as `FLT_MIN` and `FLT_MIN_EXP` for float.