feat: implement Rust proxy gateway for vLLM Responses API #24
Conversation
Replaces the Python gateway stub with a full Rust implementation using axum, reqwest, and tokio. Supports both streaming (SSE) and non-streaming proxy modes, hop-by-hop header filtering, API key injection, and comprehensive error mapping (502/504). Includes 11 tests covering passthrough, auth, streaming, error propagation, mid-stream failure, connection errors, and timeouts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
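The hop-by-hop header filtering mentioned above can be sketched in plain std Rust. This is a minimal illustration of the RFC 7230 §6.1 rule (connection-scoped headers must not be forwarded by a proxy), not the PR's actual code; the function names and the tuple-based header representation are hypothetical stand-ins for the axum/reqwest header types:

```rust
use std::collections::HashSet;

/// Headers that apply only to a single transport connection (RFC 7230 §6.1)
/// and therefore must not be forwarded by a proxy.
fn is_hop_by_hop(name: &str) -> bool {
    let hop_by_hop: HashSet<&str> = [
        "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
        "te", "trailers", "transfer-encoding", "upgrade",
    ].into_iter().collect();
    hop_by_hop.contains(name.to_ascii_lowercase().as_str())
}

/// Keep only end-to-end headers when forwarding a request upstream.
fn filter_headers<'a>(headers: &[(&'a str, &'a str)]) -> Vec<(&'a str, &'a str)> {
    headers.iter().copied().filter(|(name, _)| !is_hop_by_hop(name)).collect()
}

fn main() {
    let incoming = [
        ("content-type", "application/json"),
        ("transfer-encoding", "chunked"), // hop-by-hop: dropped
        ("authorization", "Bearer token"),
    ];
    let forwarded = filter_headers(&incoming);
    assert_eq!(forwarded.len(), 2);
    println!("{forwarded:?}");
}
```

The same set-membership check works against `http::HeaderMap` in the real proxy; lowercasing makes the comparison case-insensitive, matching HTTP header semantics.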
Derive clap::Args on RuntimeConfig directly and use #[arg(skip)] for llm_api_base, eliminating the duplicate GatewayOpts struct and the manual build_config mapping function. Addresses review feedback from @maralbahari.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
@leseb I was wondering if we should validate that the library choices are optimal and verify stateless proxy throughput/latency against the Python baseline. When I was working on translating Python to Rust previously, I noticed a performance regression compared to the Python baseline, so I think it would be best to guard against such regressions from the start.
Introduces a proper Error enum via thiserror, replacing Box<dyn std::error::Error> across the library. ProxyState::new now returns Result instead of panicking on HTTP client build failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
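A minimal sketch of what such an error enum looks like, with hand-written Display and Error impls standing in for what the thiserror derive generates (the variant names here are hypothetical, not the PR's actual ones):

```rust
use std::fmt;

// Hypothetical variants; the PR's actual enum may differ.
#[derive(Debug)]
enum Error {
    /// Upstream returned an invalid or unreachable response (mapped to 502).
    Upstream(String),
    /// Upstream did not respond in time (mapped to 504).
    Timeout,
    /// The HTTP client could not be built at startup.
    ClientBuild(String),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Upstream(msg) => write!(f, "upstream error: {msg}"),
            Error::Timeout => write!(f, "upstream request timed out"),
            Error::ClientBuild(msg) => write!(f, "failed to build HTTP client: {msg}"),
        }
    }
}

impl std::error::Error for Error {}

// A constructor can then surface client-build failure as a Result
// instead of panicking (hypothetical stand-in for ProxyState::new):
fn build_state() -> Result<(), Error> {
    // ...client construction would go here; failure maps to Error::ClientBuild.
    Ok(())
}

fn main() {
    let e = Error::Timeout;
    println!("{e}");
    assert!(build_state().is_ok());
}
```

With thiserror, the Display impl collapses into `#[error("...")]` attributes on each variant, which is what the commit adopts.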
Can you share more? I'll add criterion benchmarks.
Measures proxy latency overhead by comparing direct upstream requests against proxied requests for both non-streaming and streaming (SSE) paths. Establishes a baseline for regression detection.

Results on Apple M-series:

- non_stream/direct: ~65 µs
- non_stream/proxied: ~133 µs (68 µs overhead)
- stream/direct: ~91 µs
- stream/proxied: ~462 µs (371 µs overhead)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
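The PR uses criterion for the actual benchmarks; as a rough std-only illustration of the direct-vs-proxied comparison idea (the closures below are stand-in workloads, not the real request paths, and `mean_ns` is a hypothetical helper):

```rust
use std::time::Instant;

/// Time a closure over `iters` iterations and return the mean latency in nanoseconds.
/// Unlike criterion, this does no warm-up or statistical analysis.
fn mean_ns<F: FnMut()>(mut f: F, iters: u32) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed().as_nanos() as f64 / iters as f64
}

fn main() {
    // Stand-in workloads: the real benchmark issues HTTP requests
    // directly to the upstream vs. through the proxy.
    let direct = mean_ns(|| { std::hint::black_box(1 + 1); }, 10_000);
    let proxied = mean_ns(|| { std::hint::black_box((0u64..100).sum::<u64>()); }, 10_000);
    // Overhead = proxied mean minus direct mean, as reported above.
    println!("direct: {direct:.0} ns, proxied: {proxied:.0} ns, overhead: {:.0} ns",
             proxied - direct);
}
```

criterion adds warm-up, outlier rejection, and confidence intervals on top of this basic measure-and-subtract scheme, which is why it is the better choice for regression detection.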
Replaces generic "upstream" terminology with "vllm" to make it clear what the gateway is proxying to. Affects config fields, error variants, function names, variables, and test helpers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
Summary
- Two modes: standalone (`--llm-api-base`) and integrated (`serve <model>`, spawning vLLM as a subprocess)
- `config`, `proxy`, `app`, and `server` modules with a clean separation of concerns

Test Plan
- `cargo test` (11/11 green)
- `cargo clippy --all-targets -- -D warnings`
- `cargo fmt -- --check`

🤖 Generated with Claude Code