feat: implement Rust proxy gateway for vLLM Responses API#24

Open
leseb wants to merge 6 commits into vllm-project:main from leseb:building-the-agentic-api-in-rust-after-project-mig

Conversation

leseb (Collaborator) commented May 11, 2026

Summary

  • Implements the full Rust proxy gateway for the vLLM Responses API, replacing the Python gateway stub
  • Supports both streaming (SSE) and non-streaming proxy modes with proper hop-by-hop header filtering
  • Includes API key injection from environment, comprehensive error mapping (502 for connection errors, 504 for timeouts), and CORS support
  • Adds CLI with two modes: standalone (--llm-api-base) and integrated (serve <model> spawning vLLM as a subprocess)
  • Modular architecture: config, proxy, app, server modules with a clean separation of concerns
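The hop-by-hop header filtering mentioned above can be sketched in plain Rust. This is an illustrative stand-in, not the PR's actual code: the constant list follows RFC 9110 §7.6.1, and the real gateway would operate on axum/reqwest `HeaderMap`s rather than a `(name, value)` vector.

```rust
/// Headers that apply only to a single transport connection (RFC 9110 §7.6.1)
/// and must not be forwarded by a proxy.
const HOP_BY_HOP: &[&str] = &[
    "connection",
    "keep-alive",
    "proxy-authenticate",
    "proxy-authorization",
    "te",
    "trailer",
    "transfer-encoding",
    "upgrade",
];

/// Returns only the headers that are safe to forward upstream.
/// (Hypothetical helper; simplified `(name, value)` representation.)
fn filter_hop_by_hop(headers: &[(String, String)]) -> Vec<(String, String)> {
    headers
        .iter()
        .filter(|(name, _)| !HOP_BY_HOP.contains(&name.to_ascii_lowercase().as_str()))
        .cloned()
        .collect()
}

fn main() {
    let headers = vec![
        ("Content-Type".to_string(), "application/json".to_string()),
        ("Connection".to_string(), "keep-alive".to_string()),
        ("Transfer-Encoding".to_string(), "chunked".to_string()),
    ];
    let forwarded = filter_hop_by_hop(&headers);
    assert_eq!(forwarded.len(), 1);
    assert_eq!(forwarded[0].0, "Content-Type");
    println!("forwarded: {forwarded:?}");
}
```

Matching is case-insensitive because HTTP header field names are; a production filter would also strip any header named in the `Connection` header's value.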

Test Plan

  • 11 tests covering: non-stream passthrough, stream passthrough, hop-by-hop header stripping, auth injection, client auth precedence, upstream HTTP error passthrough, mid-stream failure handling, connection error → 502, timeout → 504
  • All tests pass: cargo test (11/11 green)
  • Clippy clean: cargo clippy --all-targets -- -D warnings
  • Formatting clean: cargo fmt -- --check

🤖 Generated with Claude Code

Replaces the Python gateway stub with a full Rust implementation
using axum, reqwest, and tokio. Supports both streaming (SSE) and
non-streaming proxy modes, hop-by-hop header filtering, API key
injection, and comprehensive error mapping (502/504).

Includes 11 tests covering passthrough, auth, streaming, error
propagation, mid-stream failure, connection errors, and timeouts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
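The 502/504 error mapping described in the commit message can be sketched as follows. The `UpstreamFailure` enum and `status_for` function are hypothetical stand-ins; the actual gateway would branch on `reqwest::Error`'s `is_connect()` / `is_timeout()` predicates.

```rust
use std::time::Duration;

/// Simplified stand-in for the failure cases a reqwest-based client
/// can report when talking to the upstream.
#[derive(Debug)]
enum UpstreamFailure {
    /// Could not establish a connection (refused, DNS failure, ...).
    Connect(String),
    /// The upstream did not respond within the configured deadline.
    Timeout(Duration),
}

/// Maps an upstream failure onto the HTTP status the gateway returns:
/// connection errors become 502 Bad Gateway, timeouts 504 Gateway Timeout.
fn status_for(failure: &UpstreamFailure) -> u16 {
    match failure {
        UpstreamFailure::Connect(_) => 502,
        UpstreamFailure::Timeout(_) => 504,
    }
}

fn main() {
    assert_eq!(status_for(&UpstreamFailure::Connect("refused".into())), 502);
    assert_eq!(status_for(&UpstreamFailure::Timeout(Duration::from_secs(30))), 504);
    println!("error mapping OK");
}
```

Keeping this mapping in one small function makes the 502-vs-504 distinction easy to unit-test, which matches the "connection error → 502, timeout → 504" cases in the test plan.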
Derive clap::Args on RuntimeConfig directly and use #[arg(skip)] for
llm_api_base, eliminating the duplicate GatewayOpts struct and the
manual build_config mapping function.

Addresses review feedback from @maralbahari.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
@maralbahari (Collaborator) commented:

@leseb I was wondering whether we should validate that the library choices are optimal and verify the stateless proxy's throughput/latency against the Python baseline. When I was translating Python to Rust, I noticed a performance regression compared to the Python baseline, so I think it's best to guard against such regressions from the start.

Introduces a proper Error enum via thiserror, replacing
Box<dyn std::error::Error> across the library. ProxyState::new now
returns Result instead of panicking on HTTP client build failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>

leseb commented May 13, 2026

> @leseb I was wondering whether we should validate that the library choices are optimal and verify the stateless proxy's throughput/latency against the Python baseline. When I was translating Python to Rust, I noticed a performance regression compared to the Python baseline, so I think it's best to guard against such regressions from the start.

Can you share more? I'll add criterion benchmarks.

Measures proxy latency overhead by comparing direct upstream requests
against proxied requests for both non-streaming and streaming (SSE)
paths. Establishes a baseline for regression detection.

Results on Apple M-series:
  non_stream/direct:  ~65 µs
  non_stream/proxied: ~133 µs  (68 µs overhead)
  stream/direct:      ~91 µs
  stream/proxied:     ~462 µs  (371 µs overhead)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
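The direct-vs-proxied comparison in the benchmark commit can be illustrated with a criterion-free, std-only timing sketch. Criterion adds warm-up, outlier rejection, and statistical analysis on top of this; the `mean_micros` helper and the closure stand-ins below are hypothetical, not the PR's benchmark code.

```rust
use std::time::Instant;

/// Mean wall-clock latency of `f` over `iters` iterations, in microseconds.
/// (Bare-bones version of what criterion does with proper statistics.)
fn mean_micros<F: FnMut()>(iters: u32, mut f: F) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed().as_micros() as f64 / iters as f64
}

fn main() {
    // Stand-ins for the "direct" and "proxied" request paths; a real
    // benchmark would issue HTTP requests against a local mock upstream
    // with and without the gateway in between.
    let direct = || {
        std::hint::black_box(1 + 1);
    };
    let proxied = || {
        std::hint::black_box((0u64..100).sum::<u64>());
    };

    let d = mean_micros(10_000, direct);
    let p = mean_micros(10_000, proxied);
    println!("direct: {d:.3} µs/iter, proxied: {p:.3} µs/iter, overhead: {:.3} µs", p - d);
}
```

The overhead numbers reported in the commit (direct vs. proxied, non-stream and stream) are exactly this kind of paired measurement, which is what makes them usable as a regression baseline.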
@leseb leseb marked this pull request as ready for review May 13, 2026 14:42
leseb and others added 2 commits May 13, 2026 16:49
Replaces generic "upstream" terminology with "vllm" to make it
clear what the gateway is proxying to. Affects config fields,
error variants, function names, variables, and test helpers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>