NVIDIA Open GPU Kernel Modules Version
NVRM version: NVIDIA UNIX Open Kernel Module for x86_64 580.95.05 Release Build (dvs-builder@U22-I3-B17-02-5) Tue Sep 23 09:55:41 UTC 2025
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
Operating System and Version
Ubuntu 24.04.3 LTS
Kernel Release
6.14.0-37-generic
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
Hardware: GPU
GPU 0: NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition (UUID: GPU-ef5135f3-1177-e4c8-dd47-3818ddbe9182)
GPU 1: NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition (UUID: GPU-9d278fa4-f945-cee7-9cb4-2626f3fc5
Describe the bug
Hello,
We've encountered an issue when running LLMs with inference frameworks such as vLLM or SGLang in a multi-GPU configuration. When I attempt to shut down the machine, either via sudo shutdown now or the desktop UI, it occasionally reboots instead of powering off. After it reboots once, I am usually able to shut it down normally. The issue is non-deterministic: sometimes the machine shuts down correctly, and other times it restarts. We tested four machines with the configuration below, and all of them show the same issue. Please help us fix it.
- Motherboard: Gigabyte TRX50 AI TOP
- CPU: AMD Ryzen Threadripper 9960X 24-Cores
- GPU: 2x NVIDIA RTX PRO 6000 Blackwell Max-Q
- PSU: FSP2500-57APB
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX PRO 6000 Blac...    Off |   00000000:21:00.0 Off |                  Off |
| 30%   33C    P8              5W /  300W |     276MiB /  97887MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA RTX PRO 6000 Blac...    Off |   00000000:C1:00.0 Off |                  Off |
| 30%   34C    P8             15W /  300W |      15MiB /  97887MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            2126      G   /usr/lib/xorg/Xorg                      118MiB |
|    0   N/A  N/A            2276      G   /usr/bin/gnome-shell                     24MiB |
|    1   N/A  N/A            2126      G   /usr/lib/xorg/Xorg                        4MiB |
cat /proc/driver/nvidia/params | grep DynamicPowerManagement
DynamicPowerManagement: 3
DynamicPowerManagementVideoMemoryThreshold: 200
cat /proc/driver/nvidia/gpus/0000\:21\:00.0/power
Runtime D3 status:          Disabled by default
Video Memory:               Active
GPU Hardware Support:
 Video Memory Self Refresh: Not Supported
 Video Memory Off:          Supported
S0ix Power Management:
 Platform Support:          Not Supported
 Status:                    Disabled
Notebook Dynamic Boost:     Not Supported
cat /proc/driver/nvidia/gpus/0000\:c1\:00.0/power
Runtime D3 status:          Disabled by default
Video Memory:               Active
GPU Hardware Support:
 Video Memory Self Refresh: Not Supported
 Video Memory Off:          Supported
S0ix Power Management:
 Platform Support:          Not Supported
 Status:                    Disabled
Notebook Dynamic Boost:     Not Supported
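For anyone comparing power settings across several machines, the per-GPU checks above can be collected in one pass. The following is only a sketch: the function name gpu_rtd3_status is made up for illustration, and the directory argument exists only so the function can be pointed at sample data instead of the real procfs path.

```shell
# Print "<pci-address> <Runtime D3 status>" for every GPU power file.
# gpu_rtd3_status is a hypothetical helper name, not part of the driver.
# The optional argument defaults to the procfs directory used above.
gpu_rtd3_status() {
    dir="${1:-/proc/driver/nvidia/gpus}"
    for f in "$dir"/*/power; do
        # Skip silently if the glob did not match or the file is unreadable.
        [ -r "$f" ] || continue
        addr=$(basename "$(dirname "$f")")
        # Keep only the value after the "Runtime D3 status:" label.
        status=$(sed -n 's/^Runtime D3 status:[[:space:]]*//p' "$f")
        printf '%s %s\n' "$addr" "$status"
    done
}
```

Running gpu_rtd3_status with no arguments on an affected machine should list one line per GPU, which makes it easier to spot a machine whose setting differs from the others.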
To Reproduce
- vllm serve --model Qwen/Qwen3-VL-30B-A3B-Instruct --tensor-parallel-size 2 --gpu-memory-utilization 0.9
- sudo shutdown now
- It restarts instead of shutting down
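To narrow down whether the operating system itself requested a reboot or the machine restarted after a normal power-off request, the journal of the previous boot can be checked after the unexpected restart. This is a rough sketch under assumptions: the function name classify_shutdown is hypothetical, the matched strings assume systemd describes poweroff.target as "System Power Off" and reboot.target as "System Reboot", and it only helps if the journal is persistent and the late shutdown messages were actually written to disk.

```shell
# Read journal text on stdin and report which shutdown target was reached.
# classify_shutdown is a hypothetical helper name.
# Assumption: systemd logs "Reached target ... System Power Off" for
# poweroff.target and "Reached target ... System Reboot" for reboot.target.
classify_shutdown() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'Reached target.*System Power Off'; then
        echo poweroff
    elif printf '%s\n' "$log" | grep -q 'Reached target.*System Reboot'; then
        echo reboot
    else
        # Nothing matched: the messages may be missing or worded differently.
        echo unknown
    fi
}
```

After the unexpected restart, something like journalctl -b -1 -o cat | classify_shutdown could be run. A result of poweroff would suggest the restart happened after the OS had already asked for power-off, while unknown means the log does not settle the question.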
Bug Incidence
Always
nvidia-bug-report.log.gz
Running nvidia-bug-report.sh... complete.
Summary of Skipped Sections:
Skipped Component | Details
================================================================================
ldd output | glxinfo not found
--------------------------------------------------------------------------------
vulkaninfo output | vulkaninfo not found
--------------------------------------------------------------------------------
ibstat output | ibstat not found
--------------------------------------------------------------------------------
acpidump output | acpidump not found
--------------------------------------------------------------------------------
mst output | mst not found
--------------------------------------------------------------------------------
nvlsm-bug-report.sh output | nvlsm-bug-report.sh not found
--------------------------------------------------------------------------------
Summary of Errors:
Error Component | Details | Resolution
=========================================================================================================================
More Info
No response