Use std::optional instead of direct default argument for LaunchParams #5980
* LaunchParams is constructed when nvfuser_direct is loaded.
* It uses at::cuda::getCurrentDeviceProperties(), which initializes the CUDA context.
!test
PR Reviewer Guide
Here are some key observations to aid the review process:
🧪 PR contains tests
⚡ Recommended focus areas for review: API Consistency
Test failures
(Medium, 1) NVFuser validation mismatch in TmaPersistentTest on dlcluster_h100
| Test Name | H100 | Source |
|---|---|---|
| TmaPersistentTestP.TmaInnerPersistentRmsNorm/__bfloat_2048_5120 | ❌ | Link |
Greptile Summary
This PR fixes a CUDA driver initialization error that occurs when importing The fix replaces the default
Confidence Score: 5/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User as Python User
participant Mod as nvfuser_direct module
participant Bind as pybind11 bindings
participant KE as KernelExecutor
participant LP as LaunchParams
participant CUDA as CUDA Runtime
Note over User,CUDA: BEFORE (broken)
User->>Mod: import nvfuser_direct
Mod->>Bind: Register KernelExecutor bindings
Bind->>LP: LaunchParams() [default arg]
LP->>LP: assertValid()
LP->>CUDA: getCurrentDeviceProperties()
CUDA-->>LP: ERROR: driver version insufficient
Note over User,CUDA: AFTER (fixed)
User->>Mod: import nvfuser_direct
Mod->>Bind: Register KernelExecutor bindings
Note right of Bind: Default = py::none() (no LaunchParams created)
User->>KE: executor.compile(fusion, args)
KE->>LP: LaunchParams() via value_or()
LP->>LP: assertValid()
LP->>CUDA: getCurrentDeviceProperties()
CUDA-->>LP: OK (CUDA is available at runtime)
Last reviewed commit: 3e40e6c
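The "AFTER" half of the diagram can be sketched in plain C++ without pybind11. This is a minimal illustration, not the PR's actual code: `FakeLaunchParams` and `compileWith` are hypothetical stand-ins, and the construction counter stands in for the CUDA query that the real `LaunchParams` performs. The default is an empty `std::optional`, so nothing is constructed until the function is called.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for LaunchParams. The counter makes construction
// observable; in the PR the side effect is a CUDA device-properties query.
struct FakeLaunchParams {
  static inline int constructed = 0;  // number of instances built from scratch
  int block_size;
  explicit FakeLaunchParams(int bs = 128) : block_size(bs) { ++constructed; }
};

// Hypothetical stand-in for the bound compile function. The default argument
// is an empty optional, so declaring this function constructs no
// FakeLaunchParams at all.
int compileWith(
    const std::optional<FakeLaunchParams>& launch_constraints = std::nullopt) {
  // The fallback is built here, at call time. Note that value_or evaluates
  // its argument even when the optional already holds a value.
  const FakeLaunchParams params =
      launch_constraints.value_or(FakeLaunchParams());
  assert(params.block_size > 0);  // stand-in for a validity check
  return params.block_size;
}
```

Calling `compileWith()` with no argument returns the fallback block size of 128, and `FakeLaunchParams::constructed` stays at zero until that first call. Passing `FakeLaunchParams(256)` returns 256.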
wujingyue
left a comment
I believe I'm missing the context. Why is this the right fix?
CUDA error: CUDA driver version is insufficient for CUDA runtime version
That sounds like a dealbreaker. I don't think any GPU code (including nvfuser) can run with an insufficient driver version.
From ^^^ @xwang233 The smoke test purposefully uses an incorrect driver. You're supposed to be able to |
wujingyue
left a comment
LaunchParams is constructed when nvfuser_direct is loaded because it is a default argument for KernelExecutor.
Could you say more about this? Is KernelExecutor::compile called accidentally?
.def(
    "compile",
    [](KernelExecutor& self,
       Fusion* fusion,
       const py::iterable& args,
       const LaunchParams& launch_constraints,
       const CompileParams& compile_params,
       SchedulerType scheduler_type) {
      self.compile(
          fusion,
          from_pyiterable(args),
          launch_constraints,
          compile_params,
          scheduler_type);
    },
R"(
Compile a fusion into a CUDA kernel
Parameters
----------
fusion : Fusion
The fusion to compile.
args : KernelArgumentHolder, optional
The kernel arguments. If empty, will be populated during run.
launch_constraints : LaunchParams, optional
Constraints for kernel launch parameters.
compile_params : CompileParams, optional
Parameters for kernel compilation.
scheduler_type : SchedulerType, optional
The type of scheduler to use (default: None).
Returns
-------
None
)",
    py::arg("fusion"),
    py::arg("args") = py::list(),
    py::arg("launch_constraints") = LaunchParams(),  // <<<< Default argument
    py::arg("compile_params") = CompileParams(),
    py::arg("scheduler_type") = SchedulerType::None)
Gemini Summary: Stack trace: when loading
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/opt/pyenv/lib/python3.12/site-packages/nvfuser_direct/__init__.py", line 23, in <module>
from ._C_DIRECT import * # noqa: F401,F403
^^^^^^^^^^^^^^^^^^^^^^^^
ImportError: CUDA error: CUDA driver version is insufficient for CUDA runtime version
Search for `cudaErrorInsufficientDriver' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Device-side assertions were explicitly omitted for this error check; the error probably arose while initializing the DSA handlers.
Exception raised from dsa_get_device_count at /pytorch/c10/cuda/CUDADeviceAssertionHost.cpp:60 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x70fab6b71efd in /opt/pyenv/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xc0d4 (0x70fab6e9a0d4 in /opt/pyenv/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDAKernelLaunchRegistry::CUDAKernelLaunchRegistry() + 0x9a (0x70fab6ed2bea in /opt/pyenv/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::cuda::CUDAKernelLaunchRegistry::get_singleton_ref() + 0x4a (0x70fab6ed2eba in /opt/pyenv/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #4: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, unsigned int, bool) + 0x55 (0x70fab6ed3b85 in /opt/pyenv/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #5: c10::cuda::current_device() + 0x33 (0x70fab6ed4bc3 in /opt/pyenv/lib/python3.12/site-packages/torch/lib/libc10_cuda.so)
frame #6: at::cuda::getCurrentDeviceProperties() + 0x9 (0x70f983291939 in /opt/pyenv/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #7: nvfuser::LaunchParams::assertValid() + 0x45 (0x70f8d7482dc5 in /opt/pyenv/lib/python3.12/site-packages/nvfuser_direct/../nvfuser_common/lib/libnvfuser_codegen.so)
frame #8: <unknown function> + 0x13d81c (0x70f8db30681c in /opt/pyenv/lib/python3.12/site-packages/nvfuser_direct/_C_DIRECT.cpython-312-x86_64-linux-gnu.so)
frame #9: <unknown function> + 0x62dbc (0x70f8db22bdbc in /opt/pyenv/lib/python3.12/site-packages/nvfuser_direct/_C_DIRECT.cpython-312-x86_64-linux-gnu.so)
frame #10: <unknown function> + 0x581c6 (0x70f8db2211c6 in /opt/pyenv/lib/python3.12/site-packages/nvfuser_direct/_C_DIRECT.cpython-312-x86_64-linux-gnu.so)
<omitting python frames> |
KernelExecutor. CUDA error: CUDA driver version is insufficient for CUDA runtime version when importing nvfuser_direct. smoke_test check for this, so the PR uses std::optional instead of direct default argument for LaunchParams
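Why a direct default argument is built at import time can be sketched without pybind11. The names below (`FakeParams`, `FakeBinding`, `registerBinding`, `events`) are hypothetical and only mimic the behavior: a binding that stores its default by value has to construct that value when the binding is registered, which for a Python extension module means when the module is imported.

```cpp
#include <string>
#include <vector>

// Log of events in the order they happen.
std::vector<std::string> events;

// Hypothetical stand-in for LaunchParams: construction is recorded in the log.
struct FakeParams {
  FakeParams() { events.push_back("FakeParams constructed"); }
};

// Hypothetical binding entry that stores its default by value, much like a
// default given as `py::arg("x") = value` is captured when `.def(...)` runs.
struct FakeBinding {
  std::string name;
  FakeParams default_value;
};

// Simulates module initialization: the default is built here, before any
// user code has called the bound function.
FakeBinding registerBinding(const std::string& name) {
  FakeBinding binding{name, FakeParams()};
  events.push_back("registered " + name);
  return binding;
}
```

After `registerBinding("compile")`, the log holds the construction event followed by the registration event, even though no compile call was ever made. This matches the stack trace, where `LaunchParams::assertValid()` runs during the import of `_C_DIRECT`.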