Fix CUDA build with contrib ops disabled #28554
Draft
Copilot wants to merge 2 commits into
The CUDA Attention kernel implementation (core/providers/cuda/llm/attention.cc) depends on contrib ops (flash attention, memory efficient attention, unfused attention helpers from contrib_ops/cuda/bert/). When DISABLE_CONTRIB_OPS is defined, these dependencies are unavailable, causing compilation failures.

Fix by:
1. Excluding attention.h/attention.cc from the CUDA provider build when contrib ops are disabled (cmake change).
2. Guarding the Attention kernel class declarations and registrations in cuda_execution_provider.cc with #ifndef DISABLE_CONTRIB_OPS.

The CPU EP still provides the standard ONNX domain Attention kernel as fallback when the CUDA implementation is unavailable.

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/4bbef367-4e58-49e5-9bca-8d5a2c8ee872
Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix onnxruntime build with CUDA enabled and contrib ops disabled" to "Fix CUDA build with contrib ops disabled" on May 19, 2026
@@ -3083,9 +3089,11 @@ static Status RegisterCudaKernels(KernelRegistry& kernel_registry) {
    BuildKernelCreateInfo<ONNX_OPERATOR_VERSIONED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 23, 23, Unsqueeze)>,

    // Opset 24
Contributor
Suggested change
| // Opset 24 | |
| // Opset 24 |
@@ -3005,9 +3009,11 @@ static Status RegisterCudaKernels(KernelRegistry& kernel_registry) {
    BuildKernelCreateInfo<ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 22, BFloat16, Sin)>,

    // Opset 23
Contributor
Suggested change
| // Opset 23 | |
| // Opset 23 |
Description
The CUDA Attention kernel (core/providers/cuda/llm/attention.cc) depends on contrib_ops internals (flash attention, memory efficient attention, unfused attention helpers) but was compiled unconditionally. When building with --disable_contrib_ops, GetAttentionKernelOptions() is unavailable (guarded by #ifndef DISABLE_CONTRIB_OPS in cuda_kernel.h), causing a compile error.

Changes:
- cmake/onnxruntime_providers_cuda.cmake: Exclude attention.h/attention.cc from the CUDA provider source list when contrib ops are disabled
- cuda_execution_provider.cc: Guard Attention kernel forward declarations and BuildKernelCreateInfo registrations (opset 23 and 24) with #ifndef DISABLE_CONTRIB_OPS

The CPU EP still provides the ONNX domain Attention kernel as fallback.
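For reviewers unfamiliar with the pattern, the guard applied in cuda_execution_provider.cc can be sketched roughly as below. This is a simplified, standalone illustration, not the actual onnxruntime code: KernelEntry, BuildFn, the Build* functions, RegisterKernels and HasKernel are all hypothetical stand-ins for the real BuildKernelCreateInfo table. The point it shows is that both the declarations and the table entries must sit behind the same #ifndef DISABLE_CONTRIB_OPS guard, otherwise the table would reference symbols that were never compiled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a kernel registration record
// (not the real onnxruntime KernelCreateInfo).
struct KernelEntry {
  std::string op_name;
  int since_version;
};

// Hypothetical stand-in for the BuildKernelCreateInfo<...> function pointers.
using BuildFn = KernelEntry (*)();

// A kernel with no contrib-ops dependency: always compiled.
KernelEntry BuildUnsqueeze23() { return {"Unsqueeze", 23}; }

#ifndef DISABLE_CONTRIB_OPS
// Kernels that depend on contrib-ops helpers: declared only when
// contrib ops are part of the build.
KernelEntry BuildAttention23() { return {"Attention", 23}; }
KernelEntry BuildAttention24() { return {"Attention", 24}; }
#endif

// Builds the registry from a static function table. The guarded entries
// mirror the guarded declarations above.
std::vector<KernelEntry> RegisterKernels() {
  static const BuildFn function_table[] = {
      BuildUnsqueeze23,
#ifndef DISABLE_CONTRIB_OPS
      BuildAttention23,
      BuildAttention24,
#endif
  };

  std::vector<KernelEntry> registry;
  for (BuildFn fn : function_table) {
    registry.push_back(fn());
  }
  return registry;
}

// Returns true if any registered entry has the given operator name.
bool HasKernel(const std::vector<KernelEntry>& registry, const std::string& op_name) {
  for (const KernelEntry& entry : registry) {
    if (entry.op_name == op_name) return true;
  }
  return false;
}
```

Compiling this sketch with -DDISABLE_CONTRIB_OPS drops both Attention entries from the table while Unsqueeze stays registered, which is the behavior the two guarded hunks in this PR aim for.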
Motivation and Context
Building onnxruntime with CUDA enabled and --disable_contrib_ops fails with a compile error. This is a valid build configuration (useful for reducing compile time) that should be supported.