| OS | Architectures | Acceleration | Status |
|---|---|---|---|
| GNU/Linux | arm, arm64, x86_64 | - | Supported |
| GNU/Linux | arm, arm64, x86_64 | CUDA | Supported |
| GNU/Linux | arm, arm64, x86_64 | BLAS | Supported |
| Windows | x86_64 | BLAS | Untested |
| Windows | x86_64 | CUDA | Supported |
| macOS | x86_64 | - | Supported |
| macOS | aarch64 | - | Supported |
| macOS | aarch64 | Metal | Supported |
| GNU/Linux | x86_64 | Vulkan | Supported |
| Android | arm, arm64, x86_64 | - | Supported |
| Android | arm, arm64, x86_64 | CUDA | Untested |
| iOS / iPadOS | aarch64 | - | Supported |
| iOS / iPadOS | aarch64 | Metal | Supported (A13+ / M-series) |
CUDA >= 12.2 is required for CUDA-accelerated systems.
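Before building, it can help to confirm the installed toolkit meets that minimum. A quick check, assuming `nvcc` is on your `PATH` (the grep pattern is an assumption about `nvcc --version` output, which normally contains e.g. `release 12.4`):

```sh
# Compare the installed CUDA toolkit version against the 12.2 minimum.
required="12.2"
installed="$(nvcc --version | grep -oE 'release [0-9]+\.[0-9]+' | cut -d' ' -f2)"
# sort -V orders versions numerically, so the requirement is met when
# the required version sorts first (or the two are equal).
if [ "$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n1)" = "$required" ]; then
    echo "CUDA $installed meets the >= $required requirement"
else
    echo "CUDA $installed is too old; need >= $required" >&2
fi
```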
With Rust installed:
CPU only (no acceleration):
```sh
cargo build --release
```

Metal (Apple Silicon):
```sh
cargo build --release --features metal
```

CUDA:
```sh
cargo build --release --features cuda
```

If your system has multiple CUDA toolkit versions installed, set `CUDA_HOME` to the version supported by your driver to avoid library version mismatches:

```sh
CUDA_HOME=/usr/local/cuda-12.4 cargo build --release --features cuda
```

Vulkan (Steam Deck, AMD/Intel GPUs):
```sh
cargo build --release --features vulkan
```

CUDA on Windows:
Windows workers require an NVIDIA GPU driver and the CUDA Toolkit >= 12.2 (the installer sets CUDA_PATH automatically).
```sh
cargo build --release --features cuda
```

For Pascal, Maxwell, or other GPUs with compute capability < 7.0, the upstream `candle-kernels` crate requires patches. See `cake-core/src/backends/cuda/compat/` for a one-command fix:
```sh
./cake-core/src/backends/cuda/compat/patch.sh
cargo build --release --features cuda
```

The `cake-mobile-app/` directory contains a Kotlin Multiplatform (Compose Multiplatform) worker app that runs on both iOS and Android with shared UI/logic code.
Android (requires cargo-ndk and Android NDK):
```sh
make mobile_android
# installs: cake-mobile-app/androidApp/build/outputs/apk/debug/androidApp-debug.apk
```

iOS (run on macOS with Xcode installed):
```sh
make mobile_ios
# then open cake-mobile-app/iosApp/iosApp.xcodeproj in Xcode and build/deploy
```

`make mobile_ios` builds the Rust static library (`libcake_mobile.a` with Metal), compiles the KMP shared framework via Gradle, and copies it into the Xcode project. Metal acceleration is enabled on A13+ / M-series devices; older devices fall back to CPU automatically.
By default, inference runs on CPU. Enable GPU acceleration with:
| Feature | Backend | Platforms | Notes |
|---|---|---|---|
| `cuda` | NVIDIA CUDA (PTX kernels + flash-attn) | Linux, Windows | Best for NVIDIA GPUs |
| `metal` | Apple Metal (MSL shaders + fused SDPA) | macOS, iOS | Best for Apple Silicon (~42 tok/s on M3 Pro with 0.8B model) |
| `accelerate` | Apple Accelerate (AMX hardware) | macOS | CPU-only; 2.7x faster F32 matmul via Apple BLAS. No F16 support; use `metal` for F16 models |
| `vulkan` | Vulkan via wgpu | Linux, Windows, Steam Deck | Portable GPU backend |
| `flash-attn` | Flash Attention 2 (implies `cuda`) | Linux, Windows | Fused attention kernel for long sequences |
Multiple backends can be compiled together — the runtime auto-selects based on available hardware.
Apple Silicon guidance: Use metal for best performance. The accelerate feature only helps CPU inference with F32 models — for F16 models (default), CPU without accelerate is actually faster (26 vs 23 tok/s) because F16 halves memory bandwidth vs the F32 conversion Accelerate requires.
By default, all text model architectures are compiled in. To build only for specific models:
```sh
# Only LLaMA support
cargo build --release --no-default-features --features llama

# Only Qwen2 support
cargo build --release --no-default-features --features qwen2

# Only Qwen3.5 support
cargo build --release --no-default-features --features qwen3_5
```