Fix Device::availableMemory crash on Vulkan 1.0 instances #1712
|
Are you able to recreate the problem in other ways? I'd like to reproduce the problem on my system so I can test your PR and other possible approaches. I've just tried on my Kubuntu 26.04 + AMD8700G system by selecting the Lavapipe driver using vsgdeviceselection --select 2, and it works.

$ vsgdeviceselection models/openstreetmap.vsgt --select 2
vkEnumerateInstanceVersion() 4211029
VK_API_VERSION = 1.4.341.0
Selected vsg::PhysicalDevice ref_ptr<vsg::PhysicalDevice>(vsg::PhysicalDevice 0x78deb10f1838) llvmpipe (LLVM 21.1.8, 256 bits) deviceType = 4

Other approaches I'm curious about include changing the default from VK_API_VERSION_1_0 to VK_API_VERSION_1_1 when creating the Vulkan instance. Another change would be to make the supported check something that is done within vsg::Device on creation rather than on every call to availableMemory. I am also inclined to have separate code blocks for the memory available check, one with the memory budget extension and one without, rather than having ? usage sprinkled through the code. The latter is more compact, but it is harder to interpret what is going on in each case. |
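For readers comparing the two version lines in the console output above, here is a minimal sketch of how a packed Vulkan API version number decodes into its fields. The bit layout follows the VK_API_VERSION_* macros in the Vulkan headers; the struct and function names are hypothetical, and the major.minor.patch.variant print order is an assumption inferred from the output shown, not taken from the VSG source.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical holder for the four fields of a packed Vulkan API version.
struct ApiVersionSketch
{
    uint32_t versionVariant; // bits 29..31
    uint32_t versionMajor;   // bits 22..28
    uint32_t versionMinor;   // bits 12..21
    uint32_t versionPatch;   // bits 0..11
};

// Unpack a value such as the one returned by vkEnumerateInstanceVersion().
ApiVersionSketch decodeApiVersion(uint32_t packed)
{
    ApiVersionSketch v;
    v.versionVariant = packed >> 29;
    v.versionMajor = (packed >> 22) & 0x7Fu;
    v.versionMinor = (packed >> 12) & 0x3FFu;
    v.versionPatch = packed & 0xFFFu;
    return v;
}

// Format as major.minor.patch.variant (assumed print order, see lead-in).
std::string toString(const ApiVersionSketch& v)
{
    return std::to_string(v.versionMajor) + "." + std::to_string(v.versionMinor) + "." +
           std::to_string(v.versionPatch) + "." + std::to_string(v.versionVariant);
}
```

With this layout, toString(decodeApiVersion(4211029)) gives "1.4.341.0", which matches the VK_API_VERSION line reported above.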
|
Honestly, I'm not sure how to reproduce the bug on a physical system. I came across the issue while building my project within a CI instance that's an LXC container without a physical GPU passed through. I think running a headless VSG application within any unprivileged Linux container system will reproduce the problem. |
|
I'm not so familiar with the internals of either Vulkan or VulkanSceneGraph. If you say that |
|
I have been experimenting with runtime checks of the VK_EXT_memory_budget extension with the following code:

vsg::info("Device supportsApiVersion(VK_API_VERSION_1_1) = ", supportsApiVersion(VK_API_VERSION_1_1));
vsg::info("VK_EXT_MEMORY_BUDGET_EXTENSION_NAME = ", VK_EXT_MEMORY_BUDGET_EXTENSION_NAME, ", supportsDeviceExtension(VK_EXT_MEMORY_BUDGET_EXTENSION_NAME) = ", supportsDeviceExtension(VK_EXT_MEMORY_BUDGET_EXTENSION_NAME));

When I run it I see:

info: Device supportsApiVersion(VK_API_VERSION_1_1) = 1
info: VK_EXT_MEMORY_BUDGET_EXTENSION_NAME = VK_EXT_memory_budget, supportsDeviceExtension(VK_EXT_MEMORY_BUDGET_EXTENSION_NAME) = 0

But when I run the following, the extension is listed:

$ vsgdeviceselection --extensions | grep VK_EXT_memory_budget
extensionName = VK_EXT_memory_budget, spec = 1

I don't know the reason for this discrepancy. |
|
OK, I've figured out the discrepancy: VK_EXT_memory_budget isn't enabled by default, so the Device::supportsDeviceExtension(..) method doesn't return true. However, this extension is working fine on my system without it being enabled, so I presume the extension has been promoted or enabled by default on my NVidia drivers. |
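The discrepancy described above comes down to two different lists: the extensions a physical device advertises, and the extensions that were actually requested when the logical device was created. A minimal sketch of that distinction follows, using a hypothetical DeviceExtensionsSketch type rather than the real vsg::Device; the assumption that the --extensions listing reflects the advertised set is inferred from this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the two extension lists discussed in the thread.
struct DeviceExtensionsSketch
{
    std::vector<std::string> advertised; // what the driver reports as available
    std::vector<std::string> enabled;    // what was requested at device creation

    // True when the driver lists the extension, whether or not it was enabled.
    bool isAdvertised(const std::string& name) const
    {
        return std::find(advertised.begin(), advertised.end(), name) != advertised.end();
    }

    // True only when the extension was explicitly enabled on the logical device.
    bool isEnabled(const std::string& name) const
    {
        return std::find(enabled.begin(), enabled.end(), name) != enabled.end();
    }
};
```

A device built with advertised = {"VK_KHR_swapchain", "VK_EXT_memory_budget"} and enabled = {"VK_KHR_swapchain"} reproduces the observation: the extension shows up in the listing while the enabled check returns false.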
|
I have implemented an alternative approach: https://github.com/vsg-dev/VulkanSceneGraph/tree/availableMemory_checks @mbait Could you test this branch? If it works for your use case, I'll merge it with VSG master. |
|
No, that doesn't work for me. I still experience segfaults running the test app in a headless container. The branch's check is one half of the precondition (device extension) but misses the other (instance API version / For a short period of time I can provide you with a remote container you will be able to ssh into if you need it, just let me know. Or you can set one up on your own machine with Docker or Podman. The idea is to have Lavapipe as the default driver, because manual selection might not work as you expect. |
vkGetPhysicalDeviceMemoryProperties2 and VkPhysicalDeviceMemoryBudgetPropertiesEXT were called unconditionally, but the former requires Vulkan 1.1 and the latter requires VK_EXT_memory_budget. On a 1.0 instance this triggered undefined behaviour and crashed Lavapipe during scene compilation. Cache the combined precondition as a Device::memory_budget bool, and fall back to vkGetPhysicalDeviceMemoryProperties with heap sizes when it is unset.
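As a rough illustration of the caching idea in the commit message above, the sketch below evaluates the combined precondition once at construction and stores it in a memory_budget flag. DeviceSketch, makeApiVersion, and the constructor arguments are hypothetical stand-ins; only the memory_budget name, the two method names, and the extension string come from this PR, and the real vsg::Device differs.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <utility>

// Pack major/minor the same way the Vulkan version macros do (patch = 0).
constexpr uint32_t makeApiVersion(uint32_t major_, uint32_t minor_)
{
    return (major_ << 22) | (minor_ << 12);
}

// Hypothetical stand-in for a device that caches the precondition once.
struct DeviceSketch
{
    uint32_t apiVersion;
    std::set<std::string> enabledExtensions;
    bool memory_budget = false; // cached: API >= 1.1 AND extension enabled

    DeviceSketch(uint32_t instanceApiVersion, std::set<std::string> extensions) :
        apiVersion(instanceApiVersion),
        enabledExtensions(std::move(extensions))
    {
        // Evaluated once here instead of on every availableMemory() call.
        memory_budget = supportsApiVersion(makeApiVersion(1, 1)) &&
                        supportsDeviceExtension("VK_EXT_memory_budget");
    }

    bool supportsApiVersion(uint32_t version) const { return apiVersion >= version; }

    bool supportsDeviceExtension(const std::string& name) const
    {
        return enabledExtensions.count(name) != 0;
    }
};
```

In this sketch a device created against a 1.0 instance leaves memory_budget unset even when the extension is enabled, which is the case the branch discussed earlier in the thread reportedly missed.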
9d802ed to b634908
|
Reworked to follow the caching pattern from #1711 / branch
|
Description
Device::availableMemory() unconditionally calls vkGetPhysicalDeviceMemoryProperties2 with a chained VkPhysicalDeviceMemoryBudgetPropertiesEXT. Both have preconditions that are never checked:
vkGetPhysicalDeviceMemoryProperties2 requires Vulkan 1.1 (or VK_KHR_get_physical_device_properties2, promoted to core in 1.1).
VkPhysicalDeviceMemoryBudgetPropertiesEXT requires the VK_EXT_memory_budget device extension.
When a VSG application creates a vsg::Instance with VK_API_VERSION_1_0 (still the default of vsg::Instance::create), this function dispatches an unresolved entry point and feeds the driver a struct it never advertised support for. Behaviour is undefined. On Mesa Lavapipe it segfaults inside a driver worker thread during normal scene compilation (MemoryBufferPools::reserveBuffer -> Device::availableMemory). The Khronos validation layer flags it as:
Fix
Gate the 1.1 path on both supportsApiVersion(VK_API_VERSION_1_1) and supportsDeviceExtension(VK_EXT_MEMORY_BUDGET_EXTENSION_NAME). When either is missing, fall back to the Vulkan 1.0 vkGetPhysicalDeviceMemoryProperties and treat memoryHeaps[i].size as the budget with zero live usage. The fallback returns a conservative upper bound, so buffer pool sizing degrades gracefully instead of crashing.
No public API changes. The 1.1 + extension code path is unchanged for users who request it.
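The fallback described above can be sketched without any Vulkan calls by modelling each heap as a size plus an optional budget and usage. HeapSketch and the three functions are hypothetical names, and summing across heaps is an illustrative choice rather than a statement of what Device::availableMemory does. The two paths are kept as separate functions, in line with the preference voiced earlier in the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one memory heap.
struct HeapSketch
{
    uint64_t size;   // total heap size, as in memoryHeaps[i].size
    uint64_t budget; // only meaningful when the budget extension is usable
    uint64_t usage;  // only meaningful when the budget extension is usable
};

// Path 1: budget extension usable, so available = budget - usage per heap.
uint64_t availableWithBudget(const std::vector<HeapSketch>& heaps)
{
    uint64_t available = 0;
    for (const auto& heap : heaps)
    {
        if (heap.budget > heap.usage) available += heap.budget - heap.usage;
    }
    return available;
}

// Path 2: fallback, treating the heap size as the budget with zero usage.
uint64_t availableFromHeapSizes(const std::vector<HeapSketch>& heaps)
{
    uint64_t available = 0;
    for (const auto& heap : heaps) available += heap.size;
    return available;
}

// Dispatch on the cached precondition rather than re-checking per call.
uint64_t availableMemorySketch(const std::vector<HeapSketch>& heaps, bool memoryBudget)
{
    if (memoryBudget) return availableWithBudget(heaps);
    return availableFromHeapSizes(heaps);
}
```

Because the fallback ignores live usage, it can only over-report relative to the budget path, which is why it acts as an upper bound rather than an exact figure.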
Type of change
How Has This Been Tested?
Reproduced and verified on Mesa 25.2.8 Lavapipe (lvp_icd.json) in a GPU-less Linux container with a minimal headless VSG application (vsg::Instance::create(..., VK_API_VERSION_1_0), offscreen framebuffer, one Builder::createSphere frame).
libvulkan_lvp.so during viewer->compile(), validation layer reports the API-version violation above.
Test Configuration:
Checklist