
Fix Device::availableMemory crash on Vulkan 1.0 instances #1712

Open
mbait wants to merge 1 commit into vsg-dev:master from mbait:fix/availableMemory-vulkan-1.0-fallback

@mbait commented May 16, 2026

Description

Device::availableMemory() unconditionally calls vkGetPhysicalDeviceMemoryProperties2 with a chained VkPhysicalDeviceMemoryBudgetPropertiesEXT. Both have preconditions that are never checked:

  • vkGetPhysicalDeviceMemoryProperties2 requires Vulkan 1.1 (or VK_KHR_get_physical_device_properties2, promoted to core in 1.1).
  • VkPhysicalDeviceMemoryBudgetPropertiesEXT requires the VK_EXT_memory_budget device extension.

When a VSG application creates a vsg::Instance with VK_API_VERSION_1_0 (still the default for vsg::Instance::create), this function dispatches an unresolved entry point and feeds the driver a struct it never advertised support for. Behaviour is undefined. On Mesa Lavapipe it segfaults inside a driver worker thread during normal scene compilation (MemoryBufferPools::reserveBuffer -> Device::availableMemory). The Khronos validation layer flags it as:

vkGetPhysicalDeviceMemoryProperties2(): Attempted to call with an effective API version of 1.0.0 … but this API was not promoted until version 1.1.0.

Fix

Gate the 1.1 path on both supportsApiVersion(VK_API_VERSION_1_1) and supportsDeviceExtension(VK_EXT_MEMORY_BUDGET_EXTENSION_NAME). When either is missing, fall back to the Vulkan 1.0 vkGetPhysicalDeviceMemoryProperties and treat memoryHeaps[i].size as the budget with zero live usage. The fallback therefore reports an upper bound on available memory (the full heap size, ignoring current usage), so buffer pool sizing degrades gracefully instead of crashing.

No public API changes. The 1.1 + extension code path is unchanged for users who request it.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Reproduced and verified on Mesa 25.2.8 Lavapipe (lvp_icd.json) in a GPU-less Linux container with a minimal headless VSG application (vsg::Instance::create(..., VK_API_VERSION_1_0), offscreen framebuffer, one Builder::createSphere frame).

  • Before the patch: SIGSEGV inside libvulkan_lvp.so during viewer->compile(), and the validation layer reports the API-version violation above.
  • After the patch: clean run, no validation errors, expected image written.

Test Configuration:

  • OS: Ubuntu 24.04 (LXC container, no GPU)
  • Vulkan loader: 1.3.275
  • Driver: Mesa 25.2.8 Lavapipe (LLVM 20.1.2)
  • Compiler: GCC 13.3.0

Checklist

  • My code follows the style guidelines of this project (clang-format clean)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings

@robertosfield (Collaborator)

Are you able to recreate the problem in other ways?

I'd like to recreate the problem on my system so I can test your PR and other possible approaches. I've just tried on my Kubuntu 26.04 + AMD 8700G system, selecting the Lavapipe driver with vsgdeviceselection --select 2, and it works.

$ vsgdeviceselection models/openstreetmap.vsgt --select 2
vkEnumerateInstanceVersion() 4211029
VK_API_VERSION = 1.4.341.0
Selected vsg::PhysicalDevice ref_ptr<vsg::PhysicalDevice>(vsg::PhysicalDevice 0x78deb10f1838) llvmpipe (LLVM 21.1.8, 256 bits) deviceType = 4

One approach I'm curious about is changing the default from VK_API_VERSION_1_0 to VK_API_VERSION_1_1 when creating the Vulkan instance. A second would be to perform the support check once, when vsg::Device is created, rather than on every call to availableMemory.

I am also inclined to have separate code blocks for the available-memory check, one with the memory budget extension and one without, rather than having ?: usage sprinkled through the code. The latter is more compact but makes it harder to see what is going on in each case.

@mbait (Author) commented May 19, 2026

Honestly, I'm not sure how to reproduce the bug on a physical system. I came across the issue while building my project in a CI instance that runs in an LXC container without a GPU passed through. I think running a headless VSG application within any unprivileged Linux container will reproduce the problem.

@mbait (Author) commented May 19, 2026

I'm not very familiar with the internals of either Vulkan or VulkanSceneGraph. If availableMemory is a hot path, then sure, the check needs to live somewhere else; I'm open to suggestions and hints. In general, though, I think the ability to run VSG applications headless within containers is a great feature that helps projects increase confidence in their automated builds, so I look forward to fixing the issue that currently blocks it.

@robertosfield (Collaborator)

I have been experimenting with runtime checks for the VK_EXT_memory_budget extension using the following code:

    vsg::info("Device supportsApiVersion(VK_API_VERSION_1_1) = ", supportsApiVersion(VK_API_VERSION_1_1));
    vsg::info("VK_EXT_MEMORY_BUDGET_EXTENSION_NAME = ", VK_EXT_MEMORY_BUDGET_EXTENSION_NAME, ", supportsDeviceExtension(VK_EXT_MEMORY_BUDGET_EXTENSION_NAME) = ", supportsDeviceExtension(VK_EXT_MEMORY_BUDGET_EXTENSION_NAME));

When I run it I see:

info: Device supportsApiVersion(VK_API_VERSION_1_1) = 1
info: VK_EXT_MEMORY_BUDGET_EXTENSION_NAME = VK_EXT_memory_budget, supportsDeviceExtension(VK_EXT_MEMORY_BUDGET_EXTENSION_NAME) = 0

But... when I run vsgdeviceselection --extensions it lists VK_EXT_memory_budget:

$ vsgdeviceselection --extensions | grep VK_EXT_memory_budget
    extensionName = VK_EXT_memory_budget, spec = 1

I don't know the reason for this discrepancy.

@robertosfield (Collaborator)

OK, I've figured out the discrepancy: VK_EXT_memory_budget isn't enabled by default, so the Device::supportsDeviceExtension(..) method doesn't return true.

However, the extension works fine on my system without being enabled, so I presume it has been promoted or is enabled by default in my NVIDIA drivers.

@robertosfield (Collaborator)

I have implemented an alternative approach:

https://github.com/vsg-dev/VulkanSceneGraph/tree/availableMemory_checks

@mbait Could you test this branch? If it works for your use case I'll merge it into VSG master.

@mbait (Author) commented May 20, 2026

No, that doesn't work for me: I still experience segfaults running the test app in a headless container.

The branch's check covers one half of the precondition (the device extension) but misses the other (the instance API version / KHR companion). My PR also gates on supportsApiVersion(VK_API_VERSION_1_1), which is why it survives the 1.0 case.

For a short period I can provide a remote container you can ssh into if you need it; just let me know. Alternatively, you can set one up on your own machine with Docker or Podman. The key is to have Lavapipe as the default driver, because manual selection might not behave as you expect.

vkGetPhysicalDeviceMemoryProperties2 and VkPhysicalDeviceMemoryBudgetPropertiesEXT
were called unconditionally, but the former requires Vulkan 1.1 and the latter
requires VK_EXT_memory_budget. On a 1.0 instance this triggered undefined
behaviour and crashed Lavapipe during scene compilation. Cache the combined
precondition as a Device::memory_budget bool, and fall back to
vkGetPhysicalDeviceMemoryProperties with heap sizes when it is unset.
@mbait force-pushed the fix/availableMemory-vulkan-1.0-fallback branch from 9d802ed to b634908 on May 21, 2026 00:34
@mbait (Author) commented May 21, 2026

Reworked to follow the caching pattern from #1711 / branch availableMemory_checks:

  • Adds const bool Device::memory_budget set once at the end of Device::Device() and re-used on every availableMemory() call.
  • Same field name and type as availableMemory_checks so reconciling the two changes is mechanical.
  • The precondition stored in the cache combines both supportsApiVersion(VK_API_VERSION_1_1) and supportsDeviceExtension(VK_EXT_MEMORY_BUDGET_EXTENSION_NAME). The API-version half is what fixes the original crash: without it, a vsg::Instance::create(..., VK_API_VERSION_1_0) on a driver that supports VK_EXT_memory_budget (e.g. Mesa Lavapipe) still dispatches vkGetPhysicalDeviceMemoryProperties2 against an instance that never advertised support for it, crashing the driver. Verified against the same headless PoC: green with this PR, still SIGSEGVs on availableMemory_checks alone.
