Avoid CUDA context initialization during op compatibility checks at import by Achyuthan-S · Pull Request #8078 · deepspeedai/DeepSpeed

Achyuthan-S · 2026-06-19T11:05:15Z

Summary

Importing DeepSpeed initialized a CUDA context in the parent process, which breaks fork()-based multiprocessing for the lifetime of that process. This PR makes import deepspeed fork-safe.

Fixes #7918.

Root cause

deepspeed/git_version_info.py runs builder.is_compatible() for every op at import time. Eight CUDA op builders call torch.cuda.get_device_properties(0).major inside is_compatible(). That call triggers torch.cuda._lazy_init() and creates a CUDA context in the parent. Any subsequent fork() whose child touches CUDA then fails with:

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with
multiprocessing, you must use the 'spawn' start method

(torch.cuda.is_available() is NVML-backed and fork-safe on modern PyTorch — get_device_properties is the call that poisons the process.)

Fix

Add CUDAOpBuilder.cuda_capability_major(), a fork-safe capability probe that returns the device's compute-capability major only when it is safe to read:

- skips the probe when no CUDA context exists yet (not torch.cuda.is_initialized()), so a plain import deepspeed never creates one;
- skips it inside a forked child whose inherited context is invalid (torch.cuda._is_in_bad_fork()), mirroring the existing guard added in #7977 (Avoid CUDA reinit error in CI tests);
- returns None in both of those cases.
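
The decision logic described above could be sketched roughly as follows. This is not the PR's actual implementation: the free function `cuda_capability_major`, its `cuda` parameter (a stand-in for `torch.cuda`, injected so the sketch runs without PyTorch or a GPU), and the `fake_cuda` helper are assumptions made for illustration. Only `is_initialized()`, `_is_in_bad_fork()`, and `get_device_properties(0).major` come from the description.

```python
from types import SimpleNamespace
from typing import Any, Optional


def cuda_capability_major(cuda: Any) -> Optional[int]:
    """Return the compute-capability major of device 0, or None if probing is unsafe.

    `cuda` is a hypothetical stand-in for the torch.cuda module.
    """
    # No CUDA context yet: probing would create one, so skip.
    if not cuda.is_initialized():
        return None
    # Forked child with an invalid inherited context: skip as well.
    if cuda._is_in_bad_fork():
        return None
    # A context already exists, so reading device properties is safe.
    return cuda.get_device_properties(0).major


def fake_cuda(initialized: bool, bad_fork: bool, major: int = 8) -> SimpleNamespace:
    """Build a minimal fake exposing the three calls the helper relies on."""
    return SimpleNamespace(
        is_initialized=lambda: initialized,
        _is_in_bad_fork=lambda: bad_fork,
        get_device_properties=lambda index: SimpleNamespace(major=major),
    )
```

With the fake, `cuda_capability_major(fake_cuda(False, False))` returns None, while an initialized, non-forked fake yields its configured major.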

All eight builders now route through this helper and skip the compute-capability check when the device cannot be probed safely, deferring it to build/load time (where a context already exists). The is_rocm_pytorch() / is_available() guards stay in the callers.

Behavior note

When the capability cannot be probed safely (e.g. at import before CUDA is initialized), the compute-capability gate in is_compatible() is skipped rather than failing. The real check still runs at build/load time once a context exists, so this only relaxes a redundant import-time check in exchange for fork safety.
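
To illustrate the relaxed gate, a caller might treat an unknown capability as "not yet checkable" rather than "incompatible". The function name `capability_gate_passes` and the `min_major` parameter are hypothetical and not taken from any DeepSpeed builder; this is a sketch of the behavior described above, not the PR's code.

```python
from typing import Optional


def capability_gate_passes(major: Optional[int], min_major: int) -> bool:
    """Decide whether the compute-capability gate should pass.

    `major` is None when the device could not be probed safely; in that
    case the check is deferred instead of being reported as a failure.
    """
    if major is None:
        # Unknown capability: skip the gate here; the real check is
        # expected to run later, at build/load time.
        return True
    return major >= min_major
```

The trade-off is that an import-time compatibility report can be optimistic on an unsupported device; the description argues this is acceptable because the build/load path repeats the check.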

Tests

- Three unit tests for the helper's decision tree (not-initialized → skip, initialized → probe, bad-fork → skip), following the existing test_bad_fork_jit_* pattern in tests/unit/ops/test_op_builder.py (mocked torch.cuda, no GPU required).
- test_import_deepspeed_does_not_initialize_cuda: a subprocess regression test asserting that import deepspeed leaves CUDA uninitialized.
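
Mocked tests of the three branches might look like the sketch below. The helper is re-declared locally as a hypothetical free function taking a `cuda` argument so that the example runs without PyTorch; the test names are illustrative and are not the ones used in the PR.

```python
from unittest import mock


def cuda_capability_major(cuda):
    """Hypothetical stand-in for the helper; `cuda` plays the role of torch.cuda."""
    if not cuda.is_initialized() or cuda._is_in_bad_fork():
        return None
    return cuda.get_device_properties(0).major


def test_not_initialized_skips_probe():
    cuda = mock.MagicMock()
    cuda.is_initialized.return_value = False
    assert cuda_capability_major(cuda) is None
    # The key property: the context-creating call was never made.
    cuda.get_device_properties.assert_not_called()


def test_initialized_probes_device():
    cuda = mock.MagicMock()
    cuda.is_initialized.return_value = True
    cuda._is_in_bad_fork.return_value = False
    cuda.get_device_properties.return_value.major = 8
    assert cuda_capability_major(cuda) == 8
    cuda.get_device_properties.assert_called_once_with(0)


def test_bad_fork_skips_probe():
    cuda = mock.MagicMock()
    cuda.is_initialized.return_value = True
    cuda._is_in_bad_fork.return_value = True
    assert cuda_capability_major(cuda) is None
    cuda.get_device_properties.assert_not_called()
```

Asserting that `get_device_properties` was not called checks the property that matters for fork safety, rather than only the return value.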

To validate on GPU, run pytest --forked tests/unit/ops/deepspeed4science/test_DS4Sci_EvoformerAttention.py (the repro from #7918). It should pass on a CUDA runner instead of hitting the re-initialize error.

Docs

Updated CONTRIBUTING.md and docs/contributing.md to clarify that --forked is safe now that import deepspeed no longer initializes CUDA, which resolves the contradiction called out in the issue.

cc @tjruwase @loadams @tohtana

…ai#7918)

import deepspeed eagerly calls is_compatible() for all ops; eight builders probed get_device_properties(0), which lazy-inits CUDA and breaks fork()-based multiprocessing. Gate the probe on is_initialized() via a shared CUDAOpBuilder.cuda_capability_major() helper, and clarify that pytest --forked is safe now that import no longer initializes a CUDA context.

Fixes deepspeedai#7918

Signed-off-by: Achyuthan Sivasankar <achyuthan.sivasankar@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

Copilot

Pull request overview

This PR addresses a fork-safety issue where import deepspeed could initialize a CUDA context (via import-time op compatibility checks), breaking fork()-based multiprocessing. It introduces a fork-safe CUDA capability probe and updates CUDA op builders to avoid context creation during import.

Changes:

- Add CUDAOpBuilder.cuda_capability_major() to safely query compute capability only when CUDA is already initialized and not in a bad-fork state.
- Update affected CUDA op builders’ is_compatible() logic to use the helper and skip capability gating when probing would be unsafe.
- Add unit/regression tests and update contributing docs to reflect that --forked is now safe with DeepSpeed imports.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

File	Description
tests/unit/ops/test_op_builder.py	Adds unit tests for the new helper and a subprocess regression test to ensure `import deepspeed` doesn’t initialize CUDA.
op_builder/builder.py	Introduces `CUDAOpBuilder.cuda_capability_major()` with guards to avoid CUDA context initialization.
op_builder/transformer_inference.py	Switches capability checks to the fork-safe helper and gates comparisons on `None`.
op_builder/spatial_inference.py	Switches Ampere gating to the fork-safe helper and guards on `None`.
op_builder/ragged_utils.py	Switches capability checks to the fork-safe helper and guards on `None`.
op_builder/ragged_ops.py	Switches capability checks to the fork-safe helper and guards on `None`.
op_builder/inference_cutlass_builder.py	Switches capability checks to the fork-safe helper and guards on `None`.
op_builder/inference_core_ops.py	Switches capability checks to the fork-safe helper and guards on `None`.
op_builder/fp_quantizer.py	Switches capability checks to the fork-safe helper and guards on `None`.
op_builder/evoformer_attn.py	Switches capability checks to the fork-safe helper and guards on `None`.
docs/contributing.md	Updates contributing guidance to clarify that `--forked` is safe now that imports don’t initialize CUDA.
CONTRIBUTING.md	Mirrors the contributing guidance update from `docs/contributing.md`.

+    check = (
+        "import torch, deepspeed; "
+        "assert not torch.cuda.is_initialized(), "  #ignore-cuda
+        "'import deepspeed initialized a CUDA context (issue #7918)'")
+    result = subprocess.run([sys.executable, "-c", check], capture_output=True, text=True)
+    if "ModuleNotFoundError" in result.stderr:
+        pytest.skip("deepspeed/torch not importable in a subprocess in this environment")
+    assert result.returncode == 0, result.stderr


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 02b1c335cd

chatgpt-codex-connector · 2026-06-19T11:09:42Z

+        "import torch, deepspeed; "
+        "assert not torch.cuda.is_initialized(), "  #ignore-cuda


Verify fork safety, not just CUDA context state

This regression check can pass while the fork failure still exists. import deepspeed still runs op compatibility checks that call torch.cuda.is_available(), and PyTorch only documents that call as non-poisoning when PYTORCH_NVML_BASED_CUDA_CHECK=1 is set (https://docs.pytorch.org/docs/stable/generated/torch.cuda.is_available.html). Because is_available() can poison fork without making torch.cuda.is_initialized() true, CUDA-enabled environments can still fail in a forked child even though this assertion succeeds. The test should fork after the import and touch CUDA in the child; alternatively, the import path must either avoid is_available() or use the NVML-based availability check.
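
One way to follow that suggestion is to fork after the import and have the child touch CUDA, reporting success through its exit code. The sketch below shows only the fork-and-report harness, using the standard library. `run_in_forked_child` is a hypothetical helper name, not something in the PR or in DeepSpeed. It is POSIX-only because it relies on os.fork(), and it assumes Python 3.9+ for os.waitstatus_to_exitcode().

```python
import os
from typing import Callable


def run_in_forked_child(fn: Callable[[], None]) -> int:
    """Fork, run `fn` in the child, and return the child's exit code.

    Exit code 0 means `fn` returned normally; 1 means it raised.
    A negative value means the child was killed by a signal.
    """
    pid = os.fork()
    if pid == 0:
        # Child: never return into the parent's call stack; always _exit.
        try:
            fn()
        except BaseException:
            os._exit(1)
        os._exit(0)
    # Parent: wait for the child and decode its exit status.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

In a real regression test, the parent would import deepspeed first and then pass a callable that touches CUDA (for example, one that calls torch.cuda.init()), asserting that the result is 0. That part is omitted here because it needs a CUDA machine.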

Copilot AI review requested due to automatic review settings June 19, 2026 11:05

Achyuthan-S requested review from loadams, tjruwase and tohtana as code owners June 19, 2026 11:05

Copilot started reviewing on behalf of Achyuthan-S June 19, 2026 11:05

Achyuthan-S mentioned this pull request Jun 19, 2026 in Fork safety #7918 (Open)

Copilot AI reviewed Jun 19, 2026

chatgpt-codex-connector Bot reviewed Jun 19, 2026

Achyuthan-S wants to merge 1 commit into deepspeedai:master from Achyuthan-S:fix/import-fork-safety
