Add cross-platform device support (Apple Silicon / MPS, CPU) by MerrittLegacy · Pull Request #23 · dotsimulate/StreamDiffusion

MerrittLegacy · 2026-06-21T04:53:56Z

What

Makes the core StreamDiffusion pipeline run on Apple Silicon (MPS) and CPU in addition to CUDA, without regressing CUDA. Previously the package failed to import/run on a Mac because torch.cuda.* calls were executed unconditionally and the device was effectively hardcoded.
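The wrapper.py change under Changes resolves the requested device in the order cuda → mps → cpu. One way to express that rule is sketched below. The function names resolve_device and probe_backends are hypothetical, not the wrapper's actual API. The decision logic is kept free of torch so it can be checked on any machine, and torch is imported only when probing.

```python
def resolve_device(requested: str, cuda_available: bool, mps_available: bool) -> str:
    """Resolve a requested device string against what this machine has.

    Fallback order is cuda -> mps -> cpu: a config that still says "cuda"
    runs on MPS on Apple Silicon, while a CUDA host keeps CUDA (including
    an explicit index such as "cuda:1").
    """
    requested = requested.strip().lower()
    kind = requested.split(":", 1)[0]  # "cuda:1" -> "cuda"
    if kind == "cuda" and cuda_available:
        return requested
    if kind in ("cuda", "mps") and mps_available:
        return "mps"
    return "cpu"  # explicit "cpu", or nothing better is available


def probe_backends() -> tuple[bool, bool]:
    """Ask PyTorch which accelerators exist (torch is imported lazily)."""
    import torch

    return torch.cuda.is_available(), torch.backends.mps.is_available()
```

On a Mac, resolve_device("cuda", *probe_backends()) would return "mps", and the result can be passed straight to torch.device(...).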

Why

On macOS there is no CUDA, so torch.cuda.Event, torch.cuda.synchronize(), empty_cache(), get_device_properties(), etc. raise or are unavailable, crashing the pipeline before any inference can run. Mac users (e.g. TouchDesigner / StreamDiffusionTD on Apple Silicon) had no working path.
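A minimal sketch of the guard pattern for these calls is shown below. It assumes PyTorch 2.0 or newer, where torch.mps.synchronize() and torch.mps.empty_cache() exist. The helper names device_synchronize and release_cached_memory are hypothetical. The optional torch_mod argument exists only so the branching can be exercised without a GPU; by default it imports torch.

```python
import importlib
from typing import Any


def _get_torch(torch_mod: Any = None) -> Any:
    """Return the given module, or import torch lazily."""
    return torch_mod if torch_mod is not None else importlib.import_module("torch")


def device_synchronize(torch_mod: Any = None) -> str:
    """Block until queued GPU work finishes; return the backend used."""
    torch = _get_torch(torch_mod)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
        return "cuda"
    if torch.backends.mps.is_available():
        torch.mps.synchronize()
        return "mps"
    return "cpu"  # CPU ops run synchronously, so there is nothing to wait for


def release_cached_memory(torch_mod: Any = None) -> str:
    """Release unused cached allocator memory on whichever backend is present."""
    torch = _get_torch(torch_mod)
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        torch.cuda.ipc_collect()  # also reclaim memory from released CUDA IPC handles
        return "cuda"
    if torch.backends.mps.is_available():
        torch.mps.empty_cache()
        return "mps"
    return "cpu"  # no device cache to release
```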

Changes

wrapper.py
- Resolve the requested device against availability (cuda → mps → cpu); a config that still says "cuda" transparently uses MPS on a Mac, while CUDA machines are unaffected. (Previous code forced cpu on any non-MPS box.)
- Guard torch.cuda.empty_cache / synchronize / ipc_collect and the memory queries behind torch.cuda.is_available(), with MPS branches (torch.mps.*).
- Restore self.use_denoising_batch so it honors the constructor parameter. It had been hardcoded to False, which broke img2img with "iteration over a 0-d tensor".
- Skip xformers on Mac (unsupported).
pipeline.py
- Fall back to time.perf_counter() for timing when CUDA events are unavailable (MPS/CPU) instead of constructing torch.cuda.Event.
preprocessing/base_orchestrator.py
- Guard cuda.synchronize() in cleanup.
setup.py
- Allow install on macOS (MPS/CPU torch); drop the cuda-python and TensorRT extras on darwin; relax the CUDA-only hard requirement.
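One way to express the setup.py change is with PEP 508 environment markers, so that pip itself skips the CUDA-only packages on macOS at install time. This is a sketch under assumptions: the "tensorrt" extra name and its package list are illustrative and may not match the project's setup.py.

```python
# Sketch only: package names and the "tensorrt" extra are illustrative
# and may not match the project's actual setup.py.
NOT_MAC = 'sys_platform != "darwin"'

install_requires = [
    "torch",                    # the user installs the CUDA, MPS or CPU build
    f"cuda-python; {NOT_MAC}",  # CUDA bindings: skipped on macOS
]

extras_require = {
    # pip ignores each requirement whose marker evaluates to false, so
    # `pip install .[tensorrt]` on a Mac pulls in nothing extra.
    "tensorrt": [f"{pkg}; {NOT_MAC}" for pkg in ("tensorrt", "polygraphy", "onnx-graphsurgeon")],
}

# In setup.py these would be passed through as:
# setup(..., install_requires=install_requires, extras_require=extras_require)
```

Markers are evaluated on the installing machine, so the same package metadata works everywhere. A sys.platform check inside setup.py, by contrast, is frozen into any wheel built from it.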

Compatibility / risk

CUDA: unchanged. device="cuda" on a CUDA host still selects CUDA and all cuda.* calls run exactly as before.
Mac: verified end-to-end; SDXL-Turbo loads on MPS and streams img2img (~2.7 FPS at 512×512, acceleration: none).
TensorRT remains CUDA-only and is simply skipped on Mac.
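The sketch below shows how the acceleration setting might be downgraded on machines that cannot run it, following the rules stated here and under Changes: TensorRT needs CUDA, and xformers is skipped on Mac. effective_acceleration is a hypothetical helper. Falling back to "none" rather than raising an error is an assumption, not necessarily what the PR does.

```python
def effective_acceleration(requested: str, device: str, platform: str) -> str:
    """Downgrade acceleration modes the current machine cannot run.

    `device` is a resolved device string ("cuda", "cuda:0", "mps", "cpu");
    `platform` is a sys.platform value such as "darwin" or "linux".
    """
    requested = requested.lower()
    on_cuda = device.split(":", 1)[0] == "cuda"
    if requested == "tensorrt" and not on_cuda:
        return "none"  # TensorRT engines require an NVIDIA GPU
    if requested == "xformers" and platform == "darwin":
        return "none"  # xformers is not supported on macOS
    return requested
```

A caller would typically pass sys.platform and the already-resolved device, for example effective_acceleration("tensorrt", "mps", sys.platform) on a Mac.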

Notes for reviewers

Scoped to the core library only. TouchDesigner-integration changes (shared-memory transport on Mac, etc.) live outside this package and are not included here.

🤖 Generated with Claude Code

Make the core pipeline run on Apple Silicon (MPS) and CPU in addition to CUDA, without regressing CUDA. Previously every torch.cuda.* call ran unconditionally and the device was effectively hardcoded, so the package failed to import/run on a Mac.

- wrapper.py: resolve the requested device against availability (cuda -> mps -> cpu); a config that still says "cuda" transparently uses MPS on a Mac. Guard cuda.empty_cache/synchronize/ipc_collect and memory queries behind is_available() with MPS/CPU branches. Restore self.use_denoising_batch to honor the parameter (forcing it False broke img2img with "iteration over a 0-d tensor"). Skip xformers on Mac.
- pipeline.py: fall back to time.perf_counter() when CUDA events are unavailable (MPS/CPU) instead of crashing on torch.cuda.Event.
- preprocessing/base_orchestrator.py: guard cuda.synchronize() in cleanup.
- setup.py: allow install on Mac (MPS/CPU torch); drop cuda-python and TensorRT extras on darwin; relax the CUDA-only hard requirement.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
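The pipeline.py timing fallback can be sketched as a small context manager that uses CUDA events when CUDA is in use and time.perf_counter() otherwise. StageTimer and its sync parameter are hypothetical names. Because MPS also executes asynchronously, the sketch optionally calls a synchronize function (for example torch.mps.synchronize) around the host clock so that queued GPU work is included in the measurement; whether the PR's code does this is not stated.

```python
import time
from typing import Callable, Optional


class StageTimer:
    """Measure elapsed milliseconds; use CUDA events only on the CUDA path."""

    def __init__(self, use_cuda_events: bool, sync: Optional[Callable[[], None]] = None) -> None:
        self.use_cuda_events = use_cuda_events
        self.sync = sync  # e.g. torch.mps.synchronize on Apple Silicon
        self.elapsed_ms = 0.0

    def __enter__(self) -> "StageTimer":
        if self.use_cuda_events:
            import torch  # only needed when CUDA is actually in use

            self._start = torch.cuda.Event(enable_timing=True)
            self._end = torch.cuda.Event(enable_timing=True)
            self._start.record()
        else:
            if self.sync is not None:
                self.sync()  # drain earlier queued work so it is not counted
            self._t0 = time.perf_counter()
        return self

    def __exit__(self, *exc) -> bool:
        if self.use_cuda_events:
            self._end.record()
            self._end.synchronize()  # wait until the GPU reaches the end event
            self.elapsed_ms = self._start.elapsed_time(self._end)
        else:
            if self.sync is not None:
                self.sync()  # wait for queued device work before reading the clock
            self.elapsed_ms = (time.perf_counter() - self._t0) * 1000.0
        return False  # never swallow exceptions from the timed block
```

On a Mac this might be used as StageTimer(use_cuda_events=False, sync=torch.mps.synchronize); on a CUDA host, StageTimer(use_cuda_events=True) keeps the original event-based timing.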

MerrittLegacy wants to merge 1 commit into dotsimulate:SDTD_031_stable from MerrittLegacy:SDTD_031_stable
