Background to read prior to reading this
Descriptor Buffers (`VK_EXT_descriptor_buffer`) add a whole set of challenges for GPU-AV, and this document walks through the design decisions made.
The one silver lining of Descriptor Buffers is they don't touch the SPIR-V at all, so no change to our shader instrumentation is needed to support them.
The following `VkPhysicalDeviceDescriptorBufferPropertiesEXT` limits are worth keeping in mind as they shape how we need to think about adding GPU-AV:

- `maxResourceDescriptorBufferBindings`
- `storageBufferDescriptorSize` - we can't call `vkGetDescriptorSetLayoutSizeEXT` before device creation time, so we might need to use this as an estimate of how much memory we need to take for descriptors.
- `resourceDescriptorBufferAddressSpaceSize`
- `maxResourceDescriptorBufferRange`
- `descriptorBufferOffsetAlignment`

The main first step to add support for GPU-AV/DebugPrintf is finding a way to inject our descriptors inside the Descriptor Buffer such that our shaders can access it.
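To ground that, here is a rough sketch of querying these limits (the `gpu` handle name is just a placeholder):

```c++
#include <vulkan/vulkan.h>

// Query the descriptor buffer properties that shape the design decisions below.
VkPhysicalDeviceDescriptorBufferPropertiesEXT db_props = {};
db_props.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DESCRIPTOR_BUFFER_PROPERTIES_EXT;

VkPhysicalDeviceProperties2 props2 = {};
props2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
props2.pNext = &db_props;
vkGetPhysicalDeviceProperties2(gpu, &props2);

// The limits discussed above
const uint32_t     max_resource_bindings = db_props.maxResourceDescriptorBufferBindings;
const size_t       ssbo_descriptor_size  = db_props.storageBufferDescriptorSize;
const VkDeviceSize address_space_size    = db_props.resourceDescriptorBufferAddressSpaceSize;
const VkDeviceSize max_range             = db_props.maxResourceDescriptorBufferRange;
const VkDeviceSize offset_alignment      = db_props.descriptorBufferOffsetAlignment;
```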
There are some core problems that prevent us from easily adding GPU-AV (or even DebugPrintf) support.
If we want to inject our descriptors, we want to be able to use `memcpy` or just point `vkGetDescriptorEXT` at the descriptor buffer memory directly. The issue occurs if the memory is not host visible; then we would want to call `vkCmdCopyBuffer` instead, but that can't be called inside a render pass instance.
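As a hedged sketch of what the host-visible case could look like, writing a storage buffer descriptor straight into the mapped descriptor buffer with `vkGetDescriptorEXT` (the `mapped_ptr`, `chosen_offset`, `output_buffer_address`, and `output_buffer_size` names are hypothetical placeholders, not actual GPU-AV internals):

```c++
// Describe the GPU-AV output buffer we want the shader to see.
VkDescriptorAddressInfoEXT addr_info = {};
addr_info.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_ADDRESS_INFO_EXT;
addr_info.address = output_buffer_address;  // from vkGetBufferDeviceAddress
addr_info.range = output_buffer_size;

VkDescriptorGetInfoEXT get_info = {};
get_info.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_GET_INFO_EXT;
get_info.type = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
get_info.data.pStorageBuffer = &addr_info;

// Write the descriptor bytes straight into the mapped descriptor buffer memory.
// This only works because the memory is host visible.
vkGetDescriptorEXT(device, &get_info,
                   db_props.storageBufferDescriptorSize,
                   static_cast<uint8_t*>(mapped_ptr) + chosen_offset);
```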
Using small numbers, let's say the `resourceDescriptorBufferAddressSpaceSize` is 1024 bytes, but `maxResourceDescriptorBufferRange` is only 256 bytes. In this case, the user might bind the offset in a command buffer at 0, then 256, then 512, then back to 256 and 0 again. If we want to add our 64 bytes of descriptors somewhere, we would need to keep track of which window is currently bound and replace the old memory.
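To make the pattern concrete, here is a hedged illustration of that binding sequence (the `cmd` and `pipeline_layout` handles and the draws are placeholders):

```c++
// Hypothetical numbers from above: 1024-byte address space, 256-byte max range.
const uint32_t buffer_index = 0;  // the buffer bound via vkCmdBindDescriptorBuffersEXT
const VkDeviceSize offsets[] = {0, 256, 512, 256, 0};
for (VkDeviceSize offset : offsets) {
    vkCmdSetDescriptorBufferOffsetsEXT(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline_layout,
                                       0 /*firstSet*/, 1 /*setCount*/, &buffer_index, &offset);
    vkCmdDraw(cmd, 3, 1, 0, 0);
    // Each draw only sees a 256-byte window of the descriptor buffer, so the 64 bytes of
    // GPU-AV descriptors must land inside whichever window is bound for that draw.
}
```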
The idea of Push Descriptors is that one `pSetLayouts` entry in your `VkPipelineLayout` can be "push descriptors". Instead of calling `vkCmdSetDescriptorBufferOffsetsEXT(set = x)` you just call `vkCmdPushDescriptorSetKHR(set = x)`. The advantage of this is we can just push when we want and fully ignore the other problems listed above. The disadvantage is we basically need to restrict users from using Push Descriptors with Descriptor Buffers now, since only a single set layout can have `VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR` and we need to be the ones to use it.
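A minimal sketch of what that push could look like right before a draw (the `kInstrumentationSet` index and `output_buffer` handle are hypothetical placeholders):

```c++
// The set layout for kInstrumentationSet is assumed to be the one push-descriptor
// layout in the pipeline layout; mixing it with descriptor buffers is why
// descriptorBufferPushDescriptors becomes a requirement.
VkDescriptorBufferInfo buffer_info = {output_buffer, 0, VK_WHOLE_SIZE};

VkWriteDescriptorSet write = {};
write.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
write.dstBinding = 0;
write.descriptorCount = 1;
write.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
write.pBufferInfo = &buffer_info;

// No descriptor buffer offsets needed for this set, we can push whenever we want.
vkCmdPushDescriptorSetKHR(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline_layout,
                          kInstrumentationSet, 1, &write);
```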
If the user also wants to use Push Descriptors we run into new problems:

- `descriptorBufferPushDescriptors` is now required (seems every GPU I checked does support it though!).
- We have to fit inside the `maxPushDescriptors` limit.
- We need a `set`/`binding` which has already been baked into the instrumented shader code.

The sad answer is there is not going to be a single "magic bullet" here and we will either need to:
- limit what `VK_EXT_descriptor_buffer` tooling we support, or
- take different code paths for `VK_EXT_descriptor_buffer` depending on what the user does.

The first trade off is around the Descriptor Buffer being host visible or not.
We could `memcpy`/`vkGetDescriptorEXT` our descriptor into it directly, but only when the memory is host visible.

For Classic descriptors we use the dynamic offset in `vkCmdBindDescriptorSets` to mark on the GPU which draw we are at, which can't be used now. `vkCmdCopyBuffer` was never used because of the restriction of using it inside a render pass instance. That leaves a few options:
- Give up and accept all render pass draws are grouped together.
- Do a `vkCmdCopyBuffer` at the top of a render pass and call it a day!
- Use Push Descriptors to set which draw we are at.
  - This still depends on the `set`/`binding` the user picks for Push Descriptors.
- Allocate all possible combinations and copy (with `memcpy` or `vkCmdCopyBuffer`) them inside the Descriptor Buffer somewhere.
  - Wherever we put them, the buffer needs to have been created with `VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT`.
  - We would need to adjust `vkCmdBindDescriptorBuffersEXT` to make sure we can see the buffer. But if `maxResourceDescriptorBufferRange` is small, we still might not be able to see it.

So after lots of discussions, we found the easiest thing to do is just have our own Descriptor Buffer and bind it ourselves. Those who read closely above might have noticed the concern around `maxResourceDescriptorBufferBindings`; it turns out very few devices have only the spec minimum limit of 1, and as of this writing they are all older Intel devices.
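A rough sketch of what binding our own Descriptor Buffer alongside the user's could look like (all handle and address names here are placeholders, not actual GPU-AV internals):

```c++
VkDescriptorBufferBindingInfoEXT bindings[2] = {};

// The user's descriptor buffer, re-recorded as they requested it.
bindings[0].sType = VK_STRUCTURE_TYPE_DESCRIPTOR_BUFFER_BINDING_INFO_EXT;
bindings[0].address = user_descriptor_buffer_address;
bindings[0].usage = VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT;

// Our own descriptor buffer holding the GPU-AV descriptors. This consumes one of the
// device's maxResourceDescriptorBufferBindings slots.
bindings[1].sType = VK_STRUCTURE_TYPE_DESCRIPTOR_BUFFER_BINDING_INFO_EXT;
bindings[1].address = gpu_av_descriptor_buffer_address;
bindings[1].usage = VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT;

vkCmdBindDescriptorBuffersEXT(cmd, 2, bindings);

// Point the reserved instrumentation set at our buffer (index 1), offset 0.
const uint32_t buffer_index = 1;
const VkDeviceSize offset = 0;
vkCmdSetDescriptorBufferOffsetsEXT(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline_layout,
                                   kInstrumentationSet /*firstSet*/, 1 /*setCount*/,
                                   &buffer_index, &offset);
```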
The plan forward is to take 1 of the `maxResourceDescriptorBufferBindings` away from the user and, if the device only supports 1 binding, fall back to something that will likely still work. The goal here is to sacrifice a few older devices for the sanity of the GPU-AV code development.
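A hypothetical sketch of that policy (names are placeholders):

```c++
// Reserve one resource descriptor buffer binding for GPU-AV and only expose the rest
// to the application, falling back when the device only has the spec minimum of 1.
bool use_fallback_path = false;
uint32_t bindings_for_user = db_props.maxResourceDescriptorBufferBindings;
if (bindings_for_user > 1) {
    bindings_for_user -= 1;  // GPU-AV quietly keeps this last binding slot
} else {
    use_fallback_path = true;  // older devices with only 1 binding get a best-effort path
}
```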