| OS | Architectures | Acceleration | Status |
|---|---|---|---|
| GNU/Linux | arm, arm64, x86_64 | - | Supported |
| GNU/Linux | arm, arm64, x86_64 | CUDA | Supported |
| GNU/Linux | arm, arm64, x86_64 | BLAS | Supported |
| Windows | x86_64 | BLAS | Untested |
| Windows | x86_64 | CUDA | Supported |
| macOS | x86_64 | - | Supported |
| macOS | aarch64 | - | Supported |
| macOS | aarch64 | Metal | Supported |
| GNU/Linux | x86_64 | Vulkan | Supported |
| Android | arm, arm64, x86_64 | - | Supported |
| Android | arm, arm64, x86_64 | CUDA | Untested |
| iOS / iPadOS | aarch64 | - | Supported |
| iOS / iPadOS | aarch64 | Metal | Supported (A13+ / M-series) |
CUDA >= 12.2 is required for CUDA-accelerated systems.
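Before building, it can help to confirm the installed toolkit meets that minimum. A quick check, assuming `nvcc` is on your `PATH` (the grep pattern is an assumption about `nvcc --version` output, which normally contains e.g. `release 12.4`):

```sh
# Compare the installed CUDA toolkit version against the 12.2 minimum.
required="12.2"
installed="$(nvcc --version | grep -oE 'release [0-9]+\.[0-9]+' | cut -d' ' -f2)"
# sort -V orders versions numerically, so the requirement is met when
# the required version sorts first (or the two are equal).
if [ "$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n1)" = "$required" ]; then
    echo "CUDA $installed meets the >= $required requirement"
else
    echo "CUDA $installed is too old; need >= $required" >&2
fi
```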
With Rust installed:
CPU only (no acceleration):
```sh
cargo build --release
```

Metal (Apple Silicon):
```sh
cargo build --release --features metal
```

CUDA:
```sh
cargo build --release --features cuda
```

If your system has multiple CUDA toolkit versions installed, set `CUDA_HOME` to the version supported by your driver to avoid library version mismatches:

```sh
CUDA_HOME=/usr/local/cuda-12.4 cargo build --release --features cuda
```

Vulkan (Steam Deck, AMD/Intel GPUs):
```sh
cargo build --release --features vulkan
```

CUDA on Windows:
Windows workers require an NVIDIA GPU driver and the CUDA Toolkit >= 12.2 (the installer sets CUDA_PATH automatically).
```sh
cargo build --release --features cuda
```

For Pascal, Maxwell, or other GPUs with compute capability < 7.0, the upstream `candle-kernels` crate requires patches. See `cake-core/src/backends/cuda/compat/` for a one-command fix:
```sh
./cake-core/src/backends/cuda/compat/patch.sh
cargo build --release --features cuda
```

The `cake-mobile-app/` directory contains a Kotlin Multiplatform (Compose Multiplatform) worker app that runs on both iOS and Android with shared UI/logic code.
Android (requires cargo-ndk and Android NDK):
```sh
make mobile_android
# installs: cake-mobile-app/androidApp/build/outputs/apk/debug/androidApp-debug.apk
```

iOS (run on macOS with Xcode installed):
```sh
make mobile_ios
# then open cake-mobile-app/iosApp/iosApp.xcodeproj in Xcode and build/deploy
```

`make mobile_ios` builds the Rust static library (`libcake_mobile.a` with Metal), compiles the KMP shared framework via Gradle, and copies it into the Xcode project. Metal acceleration is enabled on A13+ / M-series devices; older devices fall back to CPU automatically.
By default, inference runs on CPU. Enable GPU acceleration with:
| Feature | Backend | Platforms | Notes |
|---|---|---|---|
| `cuda` | NVIDIA CUDA (PTX kernels + flash-attn) | Linux, Windows | Best for NVIDIA GPUs |
| `metal` | Apple Metal (MSL shaders + fused SDPA) | macOS, iOS | Best for Apple Silicon (~42 tok/s on M3 Pro with 0.8B model) |
| `accelerate` | Apple Accelerate (AMX hardware) | macOS | CPU-only; 2.7x faster F32 matmul via Apple BLAS. No F16 support; use `metal` for F16 models |
| `vulkan` | Vulkan via wgpu | Linux, Windows, Steam Deck | Portable GPU backend |
| `flash-attn` | Flash Attention 2 (implies `cuda`) | Linux, Windows | Fused attention kernel for long sequences |
Multiple backends can be compiled together — the runtime auto-selects based on available hardware.
Apple Silicon guidance: Use metal for best performance. The accelerate feature only helps CPU inference with F32 models — for F16 models (default), CPU without accelerate is actually faster (26 vs 23 tok/s) because F16 halves memory bandwidth vs the F32 conversion Accelerate requires.
By default, all text model architectures are compiled in. To build only for specific models:
```sh
# Only LLaMA support
cargo build --release --no-default-features --features llama

# Only Qwen2 support
cargo build --release --no-default-features --features qwen2

# Only Qwen3.5 support
cargo build --release --no-default-features --features qwen3_5
```