feat: implement Rust proxy gateway for vLLM Responses API#24

Open
leseb wants to merge 6 commits into vllm-project:main from leseb:building-the-agentic-api-in-rust-after-project-mig

Conversation

leseb (Collaborator) commented May 11, 2026

Summary

  • Implements the full Rust proxy gateway for the vLLM Responses API, replacing the Python gateway stub
  • Supports both streaming (SSE) and non-streaming proxy modes with proper hop-by-hop header filtering
  • Includes API key injection from environment, comprehensive error mapping (502 for connection errors, 504 for timeouts), and CORS support
  • Adds CLI with two modes: standalone (--llm-api-base) and integrated (serve <model> spawning vLLM as a subprocess)
  • Modular architecture: config, proxy, app, server modules with a clean separation of concerns
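The hop-by-hop header filtering mentioned above can be sketched in plain Rust. This is an illustrative stand-in, not the PR's actual code: the constant list follows RFC 9110 §7.6.1, and the real gateway would operate on axum/reqwest `HeaderMap`s rather than a `(name, value)` vector.

```rust
/// Headers that apply only to a single transport connection (RFC 9110 §7.6.1)
/// and must not be forwarded by a proxy.
const HOP_BY_HOP: &[&str] = &[
    "connection",
    "keep-alive",
    "proxy-authenticate",
    "proxy-authorization",
    "te",
    "trailer",
    "transfer-encoding",
    "upgrade",
];

/// Returns only the headers that are safe to forward upstream.
/// (Hypothetical helper; simplified `(name, value)` representation.)
fn filter_hop_by_hop(headers: &[(String, String)]) -> Vec<(String, String)> {
    headers
        .iter()
        .filter(|(name, _)| !HOP_BY_HOP.contains(&name.to_ascii_lowercase().as_str()))
        .cloned()
        .collect()
}

fn main() {
    let headers = vec![
        ("Content-Type".to_string(), "application/json".to_string()),
        ("Connection".to_string(), "keep-alive".to_string()),
        ("Transfer-Encoding".to_string(), "chunked".to_string()),
    ];
    let forwarded = filter_hop_by_hop(&headers);
    assert_eq!(forwarded.len(), 1);
    assert_eq!(forwarded[0].0, "Content-Type");
    println!("forwarded: {forwarded:?}");
}
```

Matching is case-insensitive because HTTP header field names are; a production filter would also strip any header named in the `Connection` header's value.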

Test Plan

  • 11 tests covering: non-stream passthrough, stream passthrough, hop-by-hop header stripping, auth injection, client auth precedence, upstream HTTP error passthrough, mid-stream failure handling, connection error → 502, timeout → 504
  • All tests pass: cargo test (11/11 green)
  • Clippy clean: cargo clippy --all-targets -- -D warnings
  • Formatting clean: cargo fmt -- --check

🤖 Generated with Claude Code

Replaces the Python gateway stub with a full Rust implementation
using axum, reqwest, and tokio. Supports both streaming (SSE) and
non-streaming proxy modes, hop-by-hop header filtering, API key
injection, and comprehensive error mapping (502/504).

Includes 11 tests covering passthrough, auth, streaming, error
propagation, mid-stream failure, connection errors, and timeouts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
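The 502/504 error mapping described in the commit message can be sketched as follows. The `UpstreamFailure` enum and `status_for` function are hypothetical stand-ins; the actual gateway would branch on `reqwest::Error`'s `is_connect()` / `is_timeout()` predicates.

```rust
use std::time::Duration;

/// Simplified stand-in for the failure cases a reqwest-based client
/// can report when talking to the upstream.
#[derive(Debug)]
enum UpstreamFailure {
    /// Could not establish a connection (refused, DNS failure, ...).
    Connect(String),
    /// The upstream did not respond within the configured deadline.
    Timeout(Duration),
}

/// Maps an upstream failure onto the HTTP status the gateway returns:
/// connection errors become 502 Bad Gateway, timeouts 504 Gateway Timeout.
fn status_for(failure: &UpstreamFailure) -> u16 {
    match failure {
        UpstreamFailure::Connect(_) => 502,
        UpstreamFailure::Timeout(_) => 504,
    }
}

fn main() {
    assert_eq!(status_for(&UpstreamFailure::Connect("refused".into())), 502);
    assert_eq!(status_for(&UpstreamFailure::Timeout(Duration::from_secs(30))), 504);
    println!("error mapping OK");
}
```

Keeping this mapping in one small function makes the 502-vs-504 distinction easy to unit-test, which matches the "connection error → 502, timeout → 504" cases in the test plan.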
Derive clap::Args on RuntimeConfig directly and use #[arg(skip)] for
llm_api_base, eliminating the duplicate GatewayOpts struct and the
manual build_config mapping function.

Addresses review feedback from @maralbahari.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
@maralbahari (Collaborator) commented:

@leseb I was wondering whether we should validate that the library choices are optimal and verify the stateless proxy's throughput/latency against the Python baseline. When I was translating Python to Rust, I noticed a performance regression compared to the Python baseline, so I think it's best to guard against such regressions from the start.

Introduces a proper Error enum via thiserror, replacing
Box<dyn std::error::Error> across the library. ProxyState::new now
returns Result instead of panicking on HTTP client build failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>

leseb commented May 13, 2026

> @leseb I was wondering whether we should validate that the library choices are optimal and verify the stateless proxy's throughput/latency against the Python baseline. When I was translating Python to Rust, I noticed a performance regression compared to the Python baseline, so I think it's best to guard against such regressions from the start.

Can you share more? I'll add criterion benchmarks.

Measures proxy latency overhead by comparing direct upstream requests
against proxied requests for both non-streaming and streaming (SSE)
paths. Establishes a baseline for regression detection.

Results on Apple M-series:
  non_stream/direct:  ~65 µs
  non_stream/proxied: ~133 µs  (68 µs overhead)
  stream/direct:      ~91 µs
  stream/proxied:     ~462 µs  (371 µs overhead)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
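The direct-vs-proxied comparison in the benchmark commit can be illustrated with a criterion-free, std-only timing sketch. Criterion adds warm-up, outlier rejection, and statistical analysis on top of this; the `mean_micros` helper and the closure stand-ins below are hypothetical, not the PR's benchmark code.

```rust
use std::time::Instant;

/// Mean wall-clock latency of `f` over `iters` iterations, in microseconds.
/// (Bare-bones version of what criterion does with proper statistics.)
fn mean_micros<F: FnMut()>(iters: u32, mut f: F) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed().as_micros() as f64 / iters as f64
}

fn main() {
    // Stand-ins for the "direct" and "proxied" request paths; a real
    // benchmark would issue HTTP requests against a local mock upstream
    // with and without the gateway in between.
    let direct = || {
        std::hint::black_box(1 + 1);
    };
    let proxied = || {
        std::hint::black_box((0u64..100).sum::<u64>());
    };

    let d = mean_micros(10_000, direct);
    let p = mean_micros(10_000, proxied);
    println!("direct: {d:.3} µs/iter, proxied: {p:.3} µs/iter, overhead: {:.3} µs", p - d);
}
```

The overhead numbers reported in the commit (direct vs. proxied, non-stream and stream) are exactly this kind of paired measurement, which is what makes them usable as a regression baseline.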
@leseb leseb marked this pull request as ready for review May 13, 2026 14:42
leseb and others added 2 commits May 13, 2026 16:49
Replaces generic "upstream" terminology with "vllm" to make it
clear what the gateway is proxying to. Affects config fields,
error variants, function names, variables, and test helpers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>