The Tracy profiler has been set up. Get the doc here.
Used Tracy version: 0.12.1 (alpha)
Relevant CMake options:
- `-D VVL_ENABLE_TRACY` — Enable Tracy
- `-D VVL_ENABLE_TRACY_CPU_MEMORY` — Enable Tracy CPU memory profiling
- `-D VVL_TRACY_CALLSTACK=<N>` — Define maximum collected call stack size (default: 48)
- `-D VVL_ENABLE_TRACY_GPU` — Enable GPU profiling (only retrieves timings for draw, dispatch and trace rays commands)

⚠️ A large call stack size can have a noticeable impact on performance
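As an illustration, a configure step enabling all the Tracy options might look like the following (the source/build directories and build type are placeholders, not prescribed by VVL):

```shell
# Hypothetical configure invocation; adjust paths and build type to your setup.
cmake -S . -B build \
      -D CMAKE_BUILD_TYPE=RelWithDebInfo \
      -D VVL_ENABLE_TRACY=ON \
      -D VVL_ENABLE_TRACY_CPU_MEMORY=ON \
      -D VVL_TRACY_CALLSTACK=10 \
      -D VVL_ENABLE_TRACY_GPU=ON
```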
⚠️ Make sure the various dependencies are compiled with the same optimisation levels and debug settings as VVL, otherwise expect crashes in TracyAlloc/Free
You may notice a stall when shutting down the profiled application: have a look at the profiler, the “query backlog” (satellite icon, around top right) is probably being emptied. It can take some time.
Meant to be used with applications that are not short-lived, and that create only one VkInstance. => So for now, forget about profiling our test suite...
Manual profiler lifetimes are used: the profiler is started when the VVL shared library is loaded, and it is shut down at vkDestroyInstance time. => It is thus assumed that applications only create one instance; as of writing this appears to be the case for most applications.
CPU memory profiling cannot be used with mimalloc. It would need additional setup: a quick stab at it showed that it blows up Tracy.
Those features seem to be present on all drivers used by the main VVL developers.
⚠️ Required features are added by hacking into ValidationStateTracker::PreCallRecordCreateDevice, so profiling can only work if ValidationStateTracker is somehow instantiated.
As of writing, the Tracy GPU profiling implementation only allows tracing individual commands, and expects every profiled command to actually be submitted: otherwise, due to the current query collection algorithm, queries recorded after an unsubmitted command will never be seen. Profiling is supposed to work on any application, and some do hit this failing case. To cope, a custom Tracy fork is now used, with new features allowing profiling at queue submit level: for every vkQueueSubmit, a timestamp is inserted before and after its submissions. This way only GPU workloads that are actually executed get profiled, bypassing the previously mentioned limitation. For now, this coarse profiling level is enough.
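The submit-level wrapping described above can be sketched roughly as follows. This is a non-runnable illustration, not the actual fork's code: the intercept function name, `BeginCmdBuf`/`EndAndSubmit` helpers, and the query bookkeeping are all hypothetical; only the Vulkan entry points (`vkCmdWriteTimestamp`, `vkQueueSubmit`) are real.

```cpp
// Sketch only: assumes a layer-side intercept of vkQueueSubmit, a shared
// timestamp query pool, and hypothetical helpers for small command buffers.
VkResult InterceptedQueueSubmit(VkQueue queue, uint32_t submitCount,
                                const VkSubmitInfo* pSubmits, VkFence fence) {
    // Record a "before" timestamp and submit it ahead of the app's work.
    VkCommandBuffer before = BeginCmdBuf();
    vkCmdWriteTimestamp(before, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
                        query_pool, next_query_id++);
    EndAndSubmit(queue, before);

    // Forward the application's submissions unchanged.
    const VkResult result = vkQueueSubmit(queue, submitCount, pSubmits, fence);

    // Record an "after" timestamp once the submissions have executed.
    VkCommandBuffer after = BeginCmdBuf();
    vkCmdWriteTimestamp(after, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
                        query_pool, next_query_id++);
    EndAndSubmit(queue, after);

    // Both timestamps belong to work that is guaranteed to execute, so query
    // collection never stalls waiting on a query that will never be written.
    return result;
}
```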
When looking at the Statistics > GPU window of a trace, you may notice that some action submits are missing. It just means that by the time the application shut down, not all GPU timestamp queries had been retrieved.