safetensors-streaming

Drop-in Rust replacement for safetensors.safe_open() and safetensors.torch.load_file() that loads model weights faster by bypassing Python overhead.

Experimental — this project is under active development. APIs may change. Use at your own risk.

What it does

The standard safetensors Python library reads weights through a C extension with Python-level iteration. safetensors-streaming reimplements the hot path in Rust via PyO3, returning tensors directly to PyTorch with less overhead.

It works as a monkey-patch — two lines of code and every library that uses safetensors (transformers, diffusers, vLLM, etc.) automatically uses the faster path.

Usage

import safetensors_streaming

# Patch safetensors globally
safetensors_streaming.patch()

# Everything downstream uses the faster implementation
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")

Or use a context manager for scoped patching:

import safetensors_streaming

with safetensors_streaming.patched():
    model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")

Benchmarks

All benchmarks run on an RTX 4090, loading from local cache (no network). Each test loads the model twice — once with the stock safetensors library, once with safetensors-streaming patched in.

Library	Model	Params	Baseline	Patched	Speedup
transformers	Phi-3-mini-4k	3.8B	2.0s	0.2s	8.4x
transformers	Qwen2.5-7B	7.6B	2.2s	0.3s	7.1x
transformers	Qwen2.5-14B	14.8B	2.8s	0.7s	3.9x
diffusers	FLUX.2 Klein 4B	8.0B	29.3s	7.6s	3.86x
diffusers	SD-Turbo	1.3B	4.5s	2.5s	1.77x
sentence-transformers	bge-large-en	335M	1.33s	1.03s	1.29x
vLLM	Qwen2.5-7B	7.6B	13.2s	11.0s	1.21x

The biggest gains are in transformers where weight loading dominates total time. In vLLM, engine initialization (process spawning, KV cache allocation, kernel warmup) accounts for ~75% of startup, so the weight loading speedup has less overall impact.

How it works

The stock safetensors Python library uses safe_open() which returns a handle, then Python code calls get_tensor() in a loop for each tensor. Each call crosses the Python/C boundary and allocates individually.

safetensors-streaming parses the safetensors header in Rust, then reads all tensor data in bulk with minimal Python interaction. For local files it uses memory-mapped I/O. The Rust implementation also supports HTTP Range requests and S3 for streaming weights directly from remote storage.

Install

Requires Python 3.10+, PyTorch, and a Rust toolchain (1.80+).

git clone https://github.com/lightsofapollo/safetensors-streaming.git
cd safetensors-streaming
pip install maturin
maturin develop --release

This compiles the Rust extension and installs it into your current Python environment.

Project status

This is experimental software. Known limitations:

Only tested with float16, bfloat16, and float32 dtypes
Streaming from URLs (HTTP/S3) works but is not yet optimized for production use
The monkey-patching approach covers safe_open() and load_file() — other safetensors APIs are not patched

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
.beads		.beads
.claude/skills		.claude/skills
.github/workflows		.github/workflows
benches		benches
crates		crates
dist		dist
python-cuda/safetensors_streaming_cu12		python-cuda/safetensors_streaming_cu12
python/safetensors_streaming		python/safetensors_streaming
scripts		scripts
tests		tests
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
GPU_CLI_BUGS.md		GPU_CLI_BUGS.md
LICENSE		LICENSE
README.md		README.md
gpu.jsonc		gpu.jsonc
pyproject-cuda.toml		pyproject-cuda.toml
pyproject.toml		pyproject.toml
requirements-dev.txt		requirements-dev.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

safetensors-streaming

What it does

Usage

Benchmarks

How it works

Install

Project status

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

safetensors-streaming

What it does

Usage

Benchmarks

How it works

Install

Project status

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages