
Add cross-platform device support (Apple Silicon / MPS, CPU)#23

Open
MerrittLegacy wants to merge 1 commit into dotsimulate:SDTD_031_stable from MerrittLegacy:SDTD_031_stable


@MerrittLegacy


What

Makes the core StreamDiffusion pipeline run on Apple Silicon (MPS) and CPU in addition to CUDA, without regressing CUDA. Previously the package failed to import/run on a Mac because torch.cuda.* calls were executed unconditionally and the device was effectively hardcoded.

Why

On macOS there is no CUDA, so torch.cuda.Event, torch.cuda.synchronize(), empty_cache(), get_device_properties(), etc. raise or are unavailable, crashing the pipeline before any inference can run. Mac users (e.g. TouchDesigner / StreamDiffusionTD on Apple Silicon) had no working path.
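For illustration, guards of this kind could look like the minimal sketch below. It assumes PyTorch 2.x, which provides `torch.mps.synchronize`, `torch.mps.empty_cache` and `torch.mps.current_allocated_memory`. The helper names `synchronize`, `empty_cache` and `allocated_bytes` are hypothetical and not taken from this PR. Each helper checks availability before touching a backend, and torch is imported lazily so the CPU path never imports it.

```python
def _torch():
    import torch  # lazy import: the "cpu" early returns below never reach this
    return torch


def synchronize(device: str) -> None:
    """Wait for queued GPU work on `device`; no-op on CPU."""
    if device.startswith("cpu"):
        return
    torch = _torch()
    if device.startswith("cuda") and torch.cuda.is_available():
        torch.cuda.synchronize(device)
    elif device.startswith("mps") and torch.backends.mps.is_available():
        torch.mps.synchronize()


def empty_cache(device: str) -> None:
    """Release cached allocator memory on `device`; no-op on CPU."""
    if device.startswith("cpu"):
        return
    torch = _torch()
    if device.startswith("cuda") and torch.cuda.is_available():
        torch.cuda.empty_cache()
        torch.cuda.ipc_collect()
    elif device.startswith("mps") and torch.backends.mps.is_available():
        torch.mps.empty_cache()


def allocated_bytes(device: str) -> int:
    """Bytes currently held by tensors on `device`; 0 where not tracked."""
    if device.startswith("cpu"):
        return 0
    torch = _torch()
    if device.startswith("cuda") and torch.cuda.is_available():
        return torch.cuda.memory_allocated(device)
    if device.startswith("mps") and torch.backends.mps.is_available():
        return torch.mps.current_allocated_memory()
    return 0
```

Callers pass the resolved device string, so a CUDA host still reaches exactly the same `torch.cuda.*` calls, while a Mac or CPU-only machine never does.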

Changes

  • wrapper.py
    • Resolve the requested device against availability (cuda → mps → cpu); a config that still says "cuda" transparently uses MPS on a Mac, while CUDA machines are unaffected. (The previous code forced CPU on any machine without MPS.)
    • Guard cuda.empty_cache / synchronize / ipc_collect and memory queries behind is_available(), with MPS branches (torch.mps.*).
    • Restore self.use_denoising_batch so it honors the constructor parameter. It had been hardcoded to False, which broke img2img with the error "iteration over a 0-d tensor".
    • Skip xformers on Mac (unsupported).
  • pipeline.py — fall back to time.perf_counter() for timing when CUDA events are unavailable (MPS/CPU) instead of constructing torch.cuda.Event.
  • preprocessing/base_orchestrator.py — guard cuda.synchronize() in cleanup.
  • setup.py — allow install on macOS (MPS/CPU torch); drop cuda-python and TensorRT extras on darwin; relax the CUDA-only hard requirement.
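The device resolution in wrapper.py can be sketched as a pure function, assuming the availability flags are probed separately. `resolve_device` and `detect_backends` are hypothetical names, not the PR's actual code. Keeping the fallback policy apart from the probe makes it testable on any machine, including one without torch installed. Raising on unknown device kinds is an illustrative choice, not something the PR states.

```python
from typing import Tuple


def detect_backends() -> Tuple[bool, bool]:
    """Probe (cuda_available, mps_available) from the installed PyTorch."""
    import torch  # lazy, so resolve_device() can be exercised without torch

    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return torch.cuda.is_available(), mps_ok


def resolve_device(requested: str, cuda_available: bool, mps_available: bool) -> str:
    """Return a usable device string, falling back along cuda -> mps -> cpu.

    A "cuda" (or "cuda:N") request is honored as-is on a CUDA host, becomes
    "mps" on Apple Silicon and "cpu" otherwise. An "mps" request falls back
    to "cpu". Unknown device kinds raise ValueError instead of silently
    degrading.
    """
    req = requested.strip().lower()
    kind = req.split(":", 1)[0]  # "cuda:1" -> "cuda"
    if kind == "cuda":
        if cuda_available:
            return req
        return "mps" if mps_available else "cpu"
    if kind == "mps":
        return "mps" if mps_available else "cpu"
    if kind == "cpu":
        return "cpu"
    raise ValueError(f"unsupported device: {requested!r}")
```

At runtime the two pieces would be combined as `resolve_device("cuda", *detect_backends())`, which keeps "cuda" on a CUDA host and yields "mps" on a Mac.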

Compatibility / risk

  • CUDA: unchanged. device="cuda" on a CUDA host still selects CUDA and all cuda.* calls run exactly as before.
  • Mac: verified end-to-end. SDXL-Turbo loads on MPS and streams img2img (~2.7 FPS at 512×512 with acceleration="none").
  • TensorRT remains CUDA-only and is simply skipped on Mac.

Notes for reviewers

Scoped to the core library only. TouchDesigner-integration changes (shared-memory transport on Mac, etc.) live outside this package and are not included here.

🤖 Generated with Claude Code

Make the core pipeline run on Apple Silicon (MPS) and CPU in addition to
CUDA, without regressing CUDA. Previously every torch.cuda.* call ran
unconditionally and the device was effectively hardcoded, so the package
failed to import/run on a Mac.

- wrapper.py: resolve the requested device against availability
  (cuda -> mps -> cpu); a config that still says "cuda" transparently
  uses MPS on a Mac. Guard cuda.empty_cache/synchronize/ipc_collect and
  memory queries behind is_available() with MPS/CPU branches. Restore
  self.use_denoising_batch to honor the parameter (forcing it False broke
  img2img with "iteration over a 0-d tensor"). Skip xformers on Mac.
- pipeline.py: fall back to time.perf_counter() when CUDA events are
  unavailable (MPS/CPU) instead of crashing on torch.cuda.Event.
- preprocessing/base_orchestrator.py: guard cuda.synchronize() in cleanup.
- setup.py: allow install on Mac (MPS/CPU torch); drop cuda-python and
  TensorRT extras on darwin; relax the CUDA-only hard requirement.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
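The pipeline.py timing fallback could be sketched roughly as follows. `StageTimer` and its `use_cuda_events` and `sync` parameters are hypothetical, not the PR's actual code. One assumption worth flagging: MPS kernels run asynchronously, so a bare `time.perf_counter()` measures only host-side dispatch unless queued GPU work is flushed first, which is what the optional `sync` callable is for.

```python
import time
from typing import Callable, Optional


class StageTimer:
    """Measure one pipeline stage in milliseconds.

    With use_cuda_events=True it records a pair of torch.cuda.Event objects
    (CUDA hosts only). Otherwise it falls back to time.perf_counter(), calling
    the optional `sync` callable before each host-side timestamp so queued
    asynchronous GPU work is included in the measurement.
    """

    def __init__(self, use_cuda_events: bool = False,
                 sync: Optional[Callable[[], None]] = None) -> None:
        self.use_cuda_events = use_cuda_events
        self.sync = sync
        self._start_event = None
        self._end_event = None
        self._t0 = 0.0

    def start(self) -> None:
        if self.use_cuda_events:
            import torch  # only imported on the CUDA path
            self._start_event = torch.cuda.Event(enable_timing=True)
            self._end_event = torch.cuda.Event(enable_timing=True)
            self._start_event.record()
            return
        if self.sync is not None:
            self.sync()
        self._t0 = time.perf_counter()

    def stop(self) -> float:
        """Return milliseconds elapsed since start()."""
        if self.use_cuda_events:
            self._end_event.record()
            self._end_event.synchronize()  # block until the end event completes
            return self._start_event.elapsed_time(self._end_event)  # already in ms
        if self.sync is not None:
            self.sync()
        return (time.perf_counter() - self._t0) * 1000.0
```

On a CUDA host this would be constructed with `use_cuda_events=True`; on Apple Silicon, passing `sync=torch.mps.synchronize` keeps the perf_counter numbers comparable to what CUDA events would report; on CPU no `sync` is needed.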