Mailroom fd hygiene#1610
Draft
DanGould wants to merge 6 commits into
Draft
Conversation
CONNECT and WebSocket bootstrap tunnels proxied bytes with copy_bidirectional and no timeout, spawned without any concurrency limit. Each tunnel pins two file descriptors (the inbound upgraded socket and the outbound TCP stream), so a stalled or malicious client could hold descriptors open indefinitely and exhaust the process limit -- the one relay path whose descriptor use was unbounded. Add a shared TunnelLimits holding a semaphore and a timeout. Acquire an owned permit before any work and reject with 503 when the budget is exhausted; hold the permit for the tunnel's lifetime. Run the copy through a timeout that tears the tunnel down once exceeded, releasing its descriptors. Total tunnel descriptor use is now bounded by the concurrency cap.
Collaborator
Coverage Report for CI Build 27118534399Warning Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes. Coverage decreased (-0.2%) to 85.137%Details
Uncovered Changes
Coverage Regressions67 previously-covered lines in 2 files lost coverage.
Coverage Stats
💛 - Coveralls |
Three paths could consume descriptors without bound. Cap accepted connections: wrap the listener in a LimitedListener that acquires an owned semaphore permit per accept, held for the connection lifetime, so the server sheds rather than exhausts descriptors under load. Applied to the plain and non-TLS serve paths. Bound the whole bootstrap tunnel, not just the byte copy: wrap the post-upgrade tunnel future in one tokio timeout covering the upgrade, gateway connect, and copy, and drop the now-unused copy_bidirectional_with_timeout helper. Bound gateway probes: give the prober a probe timeout and wrap its outbound request, so a hung gateway cannot pin an outbound descriptor.
33bf751 to
5688896
Compare
The ACME (TLS-direct, :443) serve path used axum-server's bind and so bypassed the LimitedListener cap, leaving it able to exhaust descriptors after the plain path was bounded. Replace it with a hand-rolled AcmeListener over tokio_rustls_acme's Incoming plus a LimitedTcpIncoming that applies the same accepted- connection semaphore, so the ACME path sheds under load like the others. The rewrite drops axum-server and switches cert renewal to futures::StreamExt, so remove the now-unused axum-server and tokio-stream dependencies and the tokio-rustls-acme "axum" feature.
The directory exhausted its descriptors by accumulating idle inbound connections: axum::serve sets no keep-alive idle timeout and no connection cap, so connections a peer relay leaves open are never reclaimed. Because OHTTP hides end users behind relays, a single peer can pin every descriptor and wedge the listener. Replace axum::serve with a hyper_util auto::Builder accept loop so per-connection HTTP state is reachable, and set a header-read timeout that closes connections idle between requests. It is connection-state aware: it never fires during an in-flight handler, so a long-poll awaiting its mailbox (up to config.timeout) is not interrupted. Keep-alive stays enabled so planned relay-directory HTTP/2 multiplexing remains possible. Bound concurrency with a configurable global cap that reserves descriptor headroom below the soft RLIMIT_NOFILE (read via rustix). Export active-connection and shed counters so accumulation is observable. Limits are configurable via the [connection] section.
The global connection cap alone lets one peer relay pin every slot and starve the others. Add a per-source cap (default 1024) keyed on the connecting peer IP -- IPv4-mapped addresses normalized to IPv4, IPv6 aggregated by a configurable prefix (default /64). This is peer-relay fairness at the transport layer. It is not end-user rationing -- end users are hidden behind OHTTP relays, so the directory only ever sees peer relays -- and it is not the deferred BIP 77 anti-abuse layer. Over-cap connections are shed without consuming a global permit, and counted in the shed metric.
5688896 to
d7b59f2
Compare
The per-source default of 1024 pins one peer relay at the same threshold that originally wedged the directory: the incident peak was ~1009 connections from a single relay (pj.achow101.com) plus several thousand more queued in the listen backlog, so real demand from one relay reached the low thousands. Under OHTTP the directory sees only the relay, not the end users behind it, so a per-source cap throttles a legitimate aggregating relay rather than an abuser. Until relay-directory HTTP/2 multiplexing lands -- which collapses a relay's footprint from one descriptor per receiver to a handful -- default the per-source cap to the global cap (off) and raise the global cap to 32768: well above observed demand, still far below the 65536 descriptor ceiling, so the directory sheds gracefully instead of wedging and no trusted relay is clipped.
0d0d0f7 to
a777a52
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Bear with me as I post this draft while it's still in flux and needs a bit of cleaning. Getting it out early so you can see what's up. Follow up to #1608 with actual fixes.
This is the result of adversarial back and forth between Claude and Codex (between environment setup too, so context has been a bit wonky. Commits definitely need cleaning before this gets close to merge)
Pull Request Checklist
Please confirm the following before requesting review:
AI
in the body of this PR.