This page summarizes the major functional and performance changes in each release of the 4.x series.
All performance data on this page is measured on an Intel Core i5-9600K clocked at 4.2 GHz, running astcenc using AVX2 and 6 threads.
Status: May 2024
The 4.8.0 release is a minor maintenance release.
-DASTCENC_UBSAN=ON on the CMake configure line.
Status: January 2024
The 4.7.0 release is a major maintenance release, fixing rounding behavior in the decompressor to match the Khronos specification. This fix includes the addition of explicit support for optimizing for decode_unorm8 rounding.
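To illustrate how a rounding mismatch can cause single-LSB differences, the sketch below reduces a 16-bit interpolated value to 8 bits in two ways: by keeping the top byte, and by rounding the normalized value to the nearest 8-bit code. Which reduction a particular decode path uses is an assumption made here for illustration only. The Khronos ASTC specification and the decode mode extension define the normative rules. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep only the top 8 bits of a 16-bit UNORM value.
inline std::uint8_t unorm16_to_unorm8_truncate(std::uint16_t v)
{
    return static_cast<std::uint8_t>(v >> 8);
}

// Hypothetical helper: round v / 65535 to the nearest multiple of 1/255.
inline std::uint8_t unorm16_to_unorm8_round(std::uint16_t v)
{
    std::uint32_t scaled = static_cast<std::uint32_t>(v) * 255u + 32767u;
    return static_cast<std::uint8_t>(scaled / 65535u);
}

// Count how many of the 65536 possible inputs map to different 8-bit codes.
inline unsigned count_lsb_mismatches()
{
    unsigned mismatches = 0;
    for (std::uint32_t v = 0; v <= 0xFFFFu; v++)
    {
        std::uint16_t v16 = static_cast<std::uint16_t>(v);
        std::uint8_t a = unorm16_to_unorm8_truncate(v16);
        std::uint8_t b = unorm16_to_unorm8_round(v16);
        // The two reductions never disagree by more than one code (one LSB).
        assert(a == b || a + 1 == b || b + 1 == a);
        mismatches += (a != b) ? 1u : 0u;
    }
    return mismatches;
}
```

For example, an input of 448 truncates to 1 but rounds to 2. A compressor that assumes one reduction while the hardware applies the other can therefore choose an encoding that is off by one LSB in the decoded output.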
Reminder - the codec library API is not designed to be binary compatible across versions. We always recommend rebuilding your client-side code using the updated astcenc.h header.
decode_unorm8 extension rounding rules. This bug could result in LSB bit flips relative to the standard specification.
alignas() in the reference C implementation, as the default alignas(16) is narrower than the native minimum alignment requirement on some CPUs.
ASTCENC_FLG_USE_DECODE_UNORM8. This flag indicates that the image will be used with the decode_unorm8 decode mode. When set during compression, this allows the compressor to use the correct rounding when determining the best encoding.
-decode_unorm8. This option indicates that the image will be used with the decode_unorm8 decode mode. This option is automatically set for decompression (-d*) and trial (-t*) tool operation if the decompressed output image is stored to an 8-bit per component file format. This option must be set manually for compression (-c*) tool operation, as the desired decode mode cannot be reliably determined.
-silent is used.
Status: November 2023
The 4.6.1 release is a minor maintenance release to fix a scaling bug on large core count Windows systems.
astcenc command line tool can now use more than 64 cores on large core count systems. This change doubled command line performance for -exhaustive compression when tested on a 96-core/192-thread system.
astcenc command line tool are now included in the prebuilt release binaries.
Status: November 2023
The 4.6.0 release retunes the compressor heuristics to improve performance at the cost of trivial losses in image quality. It also includes some minor bug fixes and code quality improvements.
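Two of the search changes in this release apply only to blocks containing between 25 and 83 texels (inclusive). The sketch below shows which block footprints that window covers, assuming the texel count is simply width x height x depth. The function name is hypothetical, and this is a simplified model rather than the codec's actual heuristic code.

```cpp
#include <cassert>

// Hypothetical helper: true if a block footprint falls inside the 25..83
// texel window (inclusive) targeted by the 4.6.0 search heuristic changes.
constexpr bool in_retuned_texel_window(unsigned x, unsigned y, unsigned z = 1)
{
    return x * y * z >= 25 && x * y * z <= 83;
}

static_assert(!in_retuned_texel_window(4, 4), "4x4 has 16 texels");
static_assert(in_retuned_texel_window(5, 5), "5x5 has 25 texels");
static_assert(in_retuned_texel_window(10, 8), "10x8 has 80 texels");
static_assert(!in_retuned_texel_window(10, 10), "10x10 has 100 texels");
```

Under this model, 2D footprints from 5x5 up to 10x8 are affected, while 4x4, 5x4, and 10x10 or larger are not.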
Reminder - the codec library API is not designed to be binary compatible across versions. We always recommend rebuilding your client-side code using the updated astcenc.h header.
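The reinterpret_cast change in this release is about strict aliasing. One general, standards-conforming way to read an object's bytes as another type is std::memcpy (or std::bit_cast in C++20). The sketch below shows that pattern. It is not necessarily the exact replacement astcenc adopted, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Read the bit pattern of a float without violating strict aliasing.
// Dereferencing reinterpret_cast<const std::uint32_t*>(&value) would be
// undefined behavior; std::memcpy is well defined and compilers typically
// lower it to a single register move.
inline std::uint32_t float_as_bits(float value)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "unexpected float size");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    return bits;
}

// The inverse conversion, using the same memcpy pattern.
inline float bits_as_float(std::uint32_t bits)
{
    float value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}
```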
ASTCENC_FLG_DECOMPRESS_ONLY flag.
reinterpret_cast in the core codec to avoid strict aliasing violations.
-medium search quality no longer tests 4-partition encodings for block sizes between 25 and 83 texels (inclusive). This improves performance for a tiny drop in image quality.
-thorough and higher search qualities no longer test the mode0 first search for block sizes between 25 and 83 texels (inclusive). This improves performance for a tiny drop in image quality.
TUNE_MAX_PARTITIONING_CANDIDATES has been reduced from 32 to 8 to reduce the size of stack-allocated data structures. This causes a tiny drop in image quality for the -verythorough and -exhaustive presets.
Status: June 2023
The 4.5.0 release is a maintenance release with small image quality improvements, and a number of build system quality of life improvements.
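The floating-point contraction settings listed for this release matter because a fused multiply-add rounds once, while a separate multiply and add round twice. The same source can therefore produce different bits depending on whether the compiler fuses the operations, which breaks cross-platform invariance. The sketch below shows one input where the two results differ. The function names are hypothetical, and this is a general IEEE 754 illustration rather than code from astcenc.

```cpp
#include <cassert>
#include <cmath>

// Evaluate a * b + c with the product rounded to double before the add,
// as happens when the compiler is not allowed to contract the expression.
// The volatile store prevents the compiler from fusing the two operations.
inline double mul_then_add(double a, double b, double c)
{
    volatile double product = a * b;
    return product + c;
}

// Evaluate a * b + c with a single rounding, as a contracted (fused) build would.
inline double fused_mul_add(double a, double b, double c)
{
    return std::fma(a, b, c);
}
```

With a = 1 + 2^-30, b = 1 - 2^-30, and c = -1, the exact result is -2^-60. The fused version returns that value, while the two-step version returns 0 because a * b first rounds to 1.0.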
-ffp-model=precise with -ffp-contract=off, which is needed to restore invariance due to recent changes in compiler defaults.
/fp:precise instead of /fp:strict, which is now possible because precise no longer implies contraction. This should improve performance for MSVC builds.
-ffp-model=precise with -ffp-contract=on. This should improve performance on older Clang versions, which defaulted to no contraction.
/fp:precise with /fp:contract. This should improve performance for MSVC builds.
ASTCENC_ prefix to add a namespace and group options when the library is used in a larger project.
ASTCENC_UNIVERSAL_BUILD for building macOS universal binaries has been improved to include the x86_64h slice for AVX2 builds. Universal builds are now on by default for macOS, and always include NEON (arm64), SSE4.1 (x86_64), and AVX2 (x86_64h) variants.
ASTCENC_NO_INVARIANCE has been inverted to remove the negated option, and is now ASTCENC_INVARIANCE with a default of ON. Disabling this option can substantially improve performance, but images can differ across platforms and compilers.
Status: March 2023
The 4.4.0 release is a minor release with image quality improvements, a small performance boost, and a few new quality-of-life features.
astcenccli_entry.cpp for an example of code performing this check.
-DSHAREDLIB=ON CMake option, resulting in e.g. libastcenc-avx2-shared.so. Note that the command line tool is always statically linked.
[Chart: Relative performance vs 4.3 release]
Status: January 2023
The 4.3.1 release is a minor maintenance release. No performance or image quality changes are expected.
-2/3/4partitioncandidatelimit CLI options.
-3/4partitionindexlimit CLI options.
stb_image.h v2.28, which includes multiple fixes and improvements for image loading.
Status: January 2023
The 4.3.0 release is an optimization release. There are minor performance and image quality improvements in this release.
Reminder - the codec library API is not designed to be binary compatible across versions. We always recommend rebuilding your client-side code using the updated astcenc.h header.
windows.h include for MinGW compatibility.
The -mask command line option, ASTCENC_FLG_MAP_MASK in the library API, has been removed.
QUANT_256 encodings. This gives a small image quality improvement for the 4x4 block size.
decimation_info lookup tables. This significantly reduces compressor memory footprint and improves context creation time. The impact increases with the active block size.
[Chart: Relative performance vs 4.2 release]
Status: November 2022
The 4.2.0 release is an optimization release. There are significant performance improvements, minor image quality improvements, and library interface changes in this release.
Reminder - the codec library API is not designed to be binary compatible across versions. We always recommend rebuilding your client-side code using the updated astcenc.h header.
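The quality deltas in these notes, such as the 0.1 to 0.25 dB gain for -exhaustive below, are expressed in decibels. Assuming they are PSNR figures, the sketch below shows the standard 8-bit PSNR calculation and how a dB change maps to a relative change in mean squared error. It is a generic reference, not astcenc's own metric code, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Peak signal-to-noise ratio, in dB, between two 8-bit images stored as flat
// arrays of components. Returns +infinity for identical inputs.
inline double psnr_8bit(const std::vector<std::uint8_t>& ref,
                        const std::vector<std::uint8_t>& test)
{
    assert(ref.size() == test.size() && !ref.empty());
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < ref.size(); i++)
    {
        double diff = static_cast<double>(ref[i]) - static_cast<double>(test[i]);
        sum_sq += diff * diff;
    }
    double mse = sum_sq / static_cast<double>(ref.size());
    if (mse == 0.0)
    {
        return std::numeric_limits<double>::infinity();
    }
    return 10.0 * std::log10((255.0 * 255.0) / mse);
}

// Ratio of new MSE to old MSE for a PSNR improvement of delta_db.
inline double mse_ratio_for_db_gain(double delta_db)
{
    return std::pow(10.0, -delta_db / 10.0);
}
```

For example, a 0.25 dB improvement corresponds to an MSE ratio of about 0.944, that is, roughly 5.6% lower mean squared error.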
config_init() as well as in context_alloc().
The -exhaustive mode now runs full trials on more partitioning candidates and block candidates. This improves image quality by 0.1 to 0.25 dB, but slows down compression by 3x. The -verythorough and -thorough modes also test more candidates.
-verythorough, has been introduced to provide a standard performance point between -thorough and the re-tuned -exhaustive mode. This new mode is faster and higher quality than the -exhaustive preset in the 4.1 release.
-medium and -thorough searches, for a minor loss in image quality.
-thorough searches, for a minor loss in image quality.
avgs_and_dirs() calculation.
[Chart: Relative performance vs 4.0 and 4.1 releases]
Status: August 2022
The 4.1.0 release is a maintenance release. There is no performance or image quality change in this release.
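The KTX change listed for this release replaces the legacy luminance format enums with their modern equivalents. The sketch below models that mapping from component count to an OpenGL format enum. The enum values are the standard OpenGL constants, but the helper and namespace are hypothetical, and the real KTX writer in astcenc may select formats differently.

```cpp
#include <cassert>
#include <cstdint>

// Standard OpenGL format enum values, namespaced to avoid clashing with GL headers.
namespace gl
{
    constexpr std::uint32_t RED  = 0x1903;
    constexpr std::uint32_t RG   = 0x8227;
    constexpr std::uint32_t RGB  = 0x1907;
    constexpr std::uint32_t RGBA = 0x1908;
}

// Hypothetical helper: choose the format enum for an uncompressed KTX image
// from its component count. One- and two-component (luminance and
// luminance_alpha) data use GL_RED and GL_RG instead of the legacy
// GL_LUMINANCE (0x1909) and GL_LUMINANCE_ALPHA (0x190A) enums.
constexpr std::uint32_t ktx_gl_format(unsigned components)
{
    return components == 1 ? gl::RED
         : components == 2 ? gl::RG
         : components == 3 ? gl::RGB
         : gl::RGBA;
}
```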
GL_LUMINANCE or GL_LUMINANCE_ALPHA format enums when writing KTX output files. Luminance textures now use the GL_RED format and luminance_alpha textures now use the GL_RG format.
-dimage option to generate diagnostic images showing aspects of the compression encoding. The output file name with its extension stripped is used as the stem of the diagnostic image file names.
maskmovdqu instructions, as they can generate faults on masked lanes.
Status: July 2022
The 4.0.0 release introduces some major performance enhancements, and a number of larger changes to the heuristics used in the codec to find a more effective cost:quality trade-off.
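The -ssw example below, where -ssw ra is equivalent to -cw 1 0 0 1, can be read as a mapping from the components a shader samples to per-component error weights. The sketch models only that documented case: each sampled component gets weight 1 and the rest get 0. The function name is hypothetical, and the real option parsing in astcenc may handle more cases than this.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical helper: derive per-component error weights (R, G, B, A) from a
// shader sampling swizzle such as "ra". A component gets weight 1 if the
// shader reads it and 0 otherwise. Characters other than r, g, b, and a are
// ignored in this sketch.
inline std::array<float, 4> weights_from_sampling_swizzle(const std::string& swizzle)
{
    const std::string components = "rgba";
    std::array<float, 4> weights {0.0f, 0.0f, 0.0f, 0.0f};
    for (char c : swizzle)
    {
        std::string::size_type index = components.find(c);
        if (index != std::string::npos)
        {
            weights[index] = 1.0f;
        }
    }
    return weights;
}
```

Passing "ra" yields {1, 0, 0, 1}, which matches the -cw 1 0 0 1 weighting given in the release notes.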
The -array option for specifying the number of image planes for ASTC 3D volumetric block compression has been renamed to -zdim.
bin instead of astcenc, allowing the CMake install step to write binaries into /usr/local/bin if the user wishes to do so.
The -ssw option, for specifying the shader sampling swizzle, has been added as a convenience alternative to the -cw option. This is needed to correct error weighting during compression if not all components are read in the shader. For example, to extract and compress two components from an RGBA input image, weighting the two components equally when sampling through .ra in the shader, use -esw ggga -ssw ra. In this example, -ssw ra is equivalent to the alternative -cw 1 0 0 1 encoding.
The -a alpha weighting option has been re-enabled in the backend, and now again applies alpha scaling to the RGB error metrics when encoding. This is based on the maximum alpha in each block, not the individual texel alpha values used in the earlier implementation.
-repeats <count> for testing, which repeats compression and decompression <count> times. Reported performance metrics also now separate compression and decompression scores.
cl.exe and clangcl.exe compilers.
cl.exe and clangcl.exe compilers.
NO_INVARIANCE builds will enable the -ffp-contract=fast option for all targets when using Clang or GCC. In addition, AVX2 targets will also set the -mfma option. This reduces image quality by up to 0.2 dB (normally much less), but improves performance by up to 5-20%.
QUANT_11 or lower. Higher quantization levels assume the default 0-1 range, which is less accurate but much faster.
[Chart: Relative performance vs 3.7 release]
Copyright © 2022-2024, Arm Limited and contributors. All rights reserved.