[Question]: When using the GPU Operator, how do you integrate with Node Problem Detector for GPU checks? e.g. to check if drivers are installed

I migrated from running host drivers to managing drivers with the GPU Operator on our clusters. Previously, we ran Node Problem Detector to check various information about the devices and drivers and set node conditions. However, many of those checks relied on nvidia-smi.
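Even though nvidia-smi is not installed on the host itself, checks that used to call it can run the copy shipped in the driver container instead. The following is a minimal sketch, assuming the driver container's root filesystem is visible on the host at `/run/nvidia/driver` (a common GPU Operator default, but verify it for your installation) and that the NPD pod mounts that path and has the privileges needed to `chroot`. `run_nvidia_smi` and its parameters are hypothetical names.

```python
import subprocess

# Assumption: the GPU Operator driver container's root filesystem is exposed on
# the host at this path; verify it for your installation.
DRIVER_ROOT = "/run/nvidia/driver"


def run_nvidia_smi(args, driver_root=DRIVER_ROOT, timeout=30):
    """Run the driver container's nvidia-smi via chroot (hypothetical helper).

    Returns (returncode, output). returncode is None if chroot could not be
    started or the command timed out.
    """
    cmd = ["chroot", driver_root, "nvidia-smi", *args]
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as err:
        return None, str(err)
    # Return stdout on success and stderr otherwise so callers can log the reason.
    output = proc.stdout if proc.returncode == 0 else proc.stderr
    return proc.returncode, output


if __name__ == "__main__":
    # Example query: GPU name and driver version, one CSV line per GPU.
    rc, out = run_nvidia_smi(["--query-gpu=name,driver_version", "--format=csv,noheader"])
    print(rc, out.strip())
```

An existing nvidia-smi based check could wrap this helper and map a non-zero or `None` return code to a failing result.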

Given that nvidia-smi is not available on the host when using the GPU Operator, what is the best approach for integrating with Node Problem Detector? Specifically, we want a node condition that tells us whether the drivers are installed. We plan to use it with https://kubernetes.io/blog/2026/02/03/introducing-node-readiness-controller/ to taint new nodes until they are ready.
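A driver-installed condition could avoid nvidia-smi entirely by having an NPD custom plugin check host-visible files. Below is a minimal sketch under several assumptions: `/proc/driver/nvidia/version` appears once the NVIDIA kernel module is loaded; `/run/nvidia/validations/driver-ready` is assumed to be a flag written by the GPU Operator's validator (verify the path for your operator version and mount it into the NPD pod); and the script follows NPD's custom plugin convention of exit code 0 = OK, 1 = NonOK, 2 = Unknown, with stdout used as the condition message. The condition type `NVIDIADriverProblem`, its reasons, the script path, and the `--print-config` flag are hypothetical. The generated monitor config is modeled on NPD's example custom plugin configs, so check the field names against your NPD version.

```python
#!/usr/bin/env python3
"""Sketch of an NPD custom plugin reporting whether the NVIDIA driver
deployed by the GPU Operator is loaded and validated."""
import json
import os
import sys

# NPD custom plugin exit codes: 0 = OK, 1 = NonOK, 2 = Unknown.
OK, NONOK, UNKNOWN = 0, 1, 2

# Exists once the nvidia kernel module is loaded.
PROC_VERSION = "/proc/driver/nvidia/version"
# Assumed readiness flag written by the operator's validator (verify for your version).
DRIVER_READY = "/run/nvidia/validations/driver-ready"

# Hypothetical condition type and plugin path used in the generated config.
CONDITION = "NVIDIADriverProblem"
SCRIPT_PATH = "/config/plugin/check_nvidia_driver.py"


def check_driver(proc_version=PROC_VERSION, ready_flag=DRIVER_READY):
    """Return (exit_code, message) following NPD's custom plugin convention."""
    if not os.path.exists(proc_version):
        return NONOK, "NVIDIA kernel module not loaded"
    if not os.path.exists(ready_flag):
        return NONOK, "driver loaded but validation flag missing"
    try:
        with open(proc_version) as f:
            first_line = f.readline().strip()
    except OSError as err:
        return UNKNOWN, f"cannot read {proc_version}: {err}"
    return OK, first_line or "NVIDIA driver ready"


def build_monitor_config(script_path=SCRIPT_PATH, interval="30s"):
    """Build a custom plugin monitor config modeled on NPD's example configs."""
    return {
        "plugin": "custom",
        "pluginConfig": {
            "invoke_interval": interval,
            "timeout": "10s",
            "max_output_length": 80,
            "concurrency": 1,
        },
        "source": "nvidia-driver-custom-plugin-monitor",
        "conditions": [
            {
                # Default (healthy) state: the condition is reported as False.
                "type": CONDITION,
                "reason": "NVIDIADriverReady",
                "message": "NVIDIA driver is loaded and validated",
            }
        ],
        "rules": [
            {
                # A NonOK exit from the script sets the condition to True.
                "type": "permanent",
                "condition": CONDITION,
                "reason": "NVIDIADriverNotReady",
                "path": script_path,
                "timeout": "5s",
            }
        ],
    }


def main(argv):
    if "--print-config" in argv:
        print(json.dumps(build_monitor_config(), indent=2))
        return OK
    code, message = check_driver()
    print(message)  # NPD uses stdout as the condition message
    return code


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

NPD conditions of this kind describe a problem, so `NVIDIADriverProblem` is True while the driver is not ready. The readiness rule that removes the taint would need to match that polarity.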

[Question]: When using GPU operator, how do you integrate with node problem detector for gpu checks? e.g. to check if drivers are installed #2137