parakeet-rs

Pure Rust implementation of NVIDIA's Parakeet-TDT-0.6B-v3 speech-to-text model, built on Candle with Metal GPU acceleration for Apple Silicon.

Highlights

3.83% WER on LibriSpeech test-clean (NVIDIA published: 1.93%)
7.6x real-time on Apple Silicon with Metal GPU
25 languages — English, French, German, Spanish, and 21 more European languages
No Python runtime — single static binary, no ONNX/CoreML/MLX dependency
Automatic model download from HuggingFace Hub
Output formats: plain text, JSON, SRT, VTT subtitles

Architecture

Full implementation of the FastConformer-TDT pipeline:

Component	Description
Preprocessor	128-bin log-mel spectrogram (rustfft + GPU filterbank)
Subsampling	8x depthwise-separable conv stack (3 stride-2 layers)
Encoder	24-layer FastConformer (relative positional attention, Macaron FFN, GLU conv)
Decoder	Token-and-Duration Transducer (2-layer LSTM + joint network)
Tokenizer	8192 BPE vocabulary via SentencePiece (multilingual)

Quick Start

# Build (requires Rust 1.85+)
cargo build --release --features metal -p parakeet

# Transcribe audio (downloads model on first run, ~2.4GB)
./target/release/parakeet --input recording.wav

# With timestamps
./target/release/parakeet --input recording.wav --timestamps

# SRT subtitles
./target/release/parakeet --input recording.wav --format srt --output recording.srt

# JSON output with word-level timestamps
./target/release/parakeet --input recording.wav --format json

Using a local model

# Convert NeMo checkpoint to SafeTensors (one-time)
python3 scripts/convert_nemo.py

# Use local model directory
./target/release/parakeet --model-dir converted_model --input recording.wav

Supported Languages

Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Ukrainian

Requirements

Rust 1.85+ (edition 2024)
macOS with Apple Silicon (M1/M2/M3/M4) for Metal acceleration
16-bit mono WAV input at any sample rate (resampled internally to 16kHz)

Benchmarks

Measured on Apple Silicon with --features metal:

Metric	Value
WER (LibriSpeech test-clean, 500 samples)	3.83%
Real-time factor	0.131
Speed	7.6x real-time
Model size	0.6B parameters (~2.4GB)

Also supports v2 (English-only, 2.66% WER) — auto-detected from model directory.

Project Structure

parakeet/
  src/
    audio.rs         # Log-mel spectrogram (rustfft + GPU matmul)
    subsampling.rs   # 8x depthwise-separable conv subsampling
    encoder.rs       # 24-layer FastConformer encoder
    decoder.rs       # TDT decoder (LSTM + joint network)
    decoding.rs      # Greedy TDT decoding with duration skipping
    tokenizer.rs     # BPE tokenizer
    pipeline.rs      # End-to-end transcription pipeline
    config.rs        # Model configuration (v2 + v3)
    weights.rs       # HF Hub download + SafeTensors loading
    wer.rs           # Word error rate computation
    bin/parakeet.rs  # CLI entry point
scripts/
  convert_nemo.py    # NeMo .nemo → SafeTensors converter
  benchmark_wer.py   # LibriSpeech WER benchmark

License

MIT

Model weights are subject to NVIDIA's CC-BY-4.0 license.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
parakeet		parakeet
planning/parakeet		planning/parakeet
scripts		scripts
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

parakeet-rs

Highlights

Architecture

Quick Start

Using a local model

Supported Languages

Requirements

Benchmarks

Project Structure

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

parakeet-rs

Highlights

Architecture

Quick Start

Using a local model

Supported Languages

Requirements

Benchmarks

Project Structure

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages