Prevent shared GitHub GraphQL bucket exhaustion across agent servers #544
Merged
…stion

GitHub's GraphQL rate limit (5,000 pt/hr) is PER-USER, so multiple agent servers authed as the same personal account (bborn) share one bucket and drain it, causing intermittent "GraphQL bucket is exhausted" failures.

- internal/github/auth.go: CheckAuth() inspects the local gh identity and GraphQL headroom, classifying personal vs GitHub App/bot accounts and detecting logged-out / expired (401) tokens. Findings() turns that into ordered severity findings.
- cmd/task/main.go: new `ty doctor` command that renders those findings: it warns on personal-account auth, expired tokens, gh not logged in, and low GraphQL headroom, and exits non-zero on hard errors.
- internal/executor: agent system instructions now tell agents to prefer REST for PR reads and use `gh run watch` / REST check-runs with backoff instead of busy-polling `gh pr checks`.
- Tests for account classification, error parsing, findings, and guidance.

Follow-up (out of this Go repo's scope): provision each agent server with its own GitHub App installation token during launch/setup, mirroring the offerlab-devs[bot] pattern, for an independent rate-limit bucket.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Err tests

Addresses plan-exit-review findings:

- 1A: add `ty doctor --strict` so fleet sweeps can branch on exit code (warnings like personal-account auth now exit non-zero under --strict; default behavior is unchanged, so only hard errors exit non-zero).
- 2A: document why pr.go's batch-gate threshold (200) and auth.go's operator-warn threshold (500) intentionally differ, cross-referencing each other so a future tuner sees both.
- 3A: extract pure classifyUserErr() from CheckAuth and table-test the stderr->auth-state mapping (expired / logged-out / unknown), so a change to gh's wording is caught rather than silently mis-routed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem
Agents intermittently fail with "GraphQL bucket is exhausted" and fall back to REST.
Root cause (verified 2026-05-29): GitHub's GraphQL rate limit (5,000 points/hr) is PER-USER, not per-token. Multiple TaskYou agent servers all authenticate `gh` as the same personal account (bborn), so they share one bucket. GraphQL-backed `gh pr` commands, especially `gh pr checks` polling loops, collectively drain it. Servers using a GitHub App (bot) identity like `offerlab-agents[bot]` get their own independent bucket and don't contend.
What this PR does
- `ty doctor`: a new diagnostic command that inspects the local `gh` auth and GraphQL headroom and warns about the conditions that cause contention: personal-account auth, expired tokens, `gh` not logged in, and low GraphQL headroom.
- `internal/github/auth.go`: `CheckAuth()` probes `gh api user` + `gh api rate_limit`, classifies personal vs GitHub App/bot identities, and detects logged-out/expired tokens. `Findings()` converts that to ordered severity findings (reused by `ty doctor`).
- Agent guidance: agent system instructions now tell agents to prefer REST for PR reads (separate 5k bucket) and to use `gh run watch` / REST check-runs with backoff instead of busy-polling `gh pr checks`.
Tests
- `internal/github/auth_test.go`: account classification, 401/logged-out parsing, and findings/severity for each auth state.
- `internal/executor/executor_test.go`: locks the new GitHub guidance into the agent system prompt.
All affected packages build, vet, and test clean.
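For readers who want a feel for the kind of logic these tests cover, here is a minimal Go sketch of account classification and headroom findings. It is not the code in `internal/github/auth.go`: the names `isBot`, `graphQLRemaining`, `findings`, `Finding`, and `Severity` are hypothetical, and the `[bot]` suffix / `"Bot"` type heuristic is an assumption about how an identity might be classified. The 500-point threshold is the operator-warn value mentioned in the commit message, and the JSON shape is the `resources.graphql.remaining` field returned by `gh api rate_limit`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Severity ranks findings; higher is worse. (Hypothetical type.)
type Severity int

const (
	Warn Severity = iota + 1
	Error
)

// Finding is one diagnostic line a doctor-style command could render.
type Finding struct {
	Severity Severity
	Message  string
}

// warnThreshold mirrors the operator-warn threshold (500) cited in the PR.
const warnThreshold = 500

// isBot is an ASSUMED heuristic: app identities report type "Bot"
// and carry a "[bot]" login suffix.
func isBot(login, accountType string) bool {
	return accountType == "Bot" || strings.HasSuffix(login, "[bot]")
}

// graphQLRemaining reads resources.graphql.remaining from a
// `gh api rate_limit` response body.
func graphQLRemaining(body []byte) (int, error) {
	var rl struct {
		Resources struct {
			GraphQL struct {
				Remaining int `json:"remaining"`
			} `json:"graphql"`
		} `json:"resources"`
	}
	if err := json.Unmarshal(body, &rl); err != nil {
		return 0, err
	}
	return rl.Resources.GraphQL.Remaining, nil
}

// findings returns diagnostics with hard errors first, then warnings.
func findings(loggedIn bool, login, accountType string, remaining int) []Finding {
	if !loggedIn {
		// Nothing else can be checked without a valid login.
		return []Finding{{Error, "gh is not logged in or the token has expired"}}
	}
	var out []Finding
	if !isBot(login, accountType) {
		out = append(out, Finding{Warn, "personal account: GraphQL bucket is shared with other servers"})
	}
	if remaining < warnThreshold {
		out = append(out, Finding{Warn, fmt.Sprintf("low GraphQL headroom: %d points remaining", remaining)})
	}
	return out
}

func main() {
	body := []byte(`{"resources":{"graphql":{"limit":5000,"remaining":120}}}`)
	remaining, err := graphQLRemaining(body)
	if err != nil {
		panic(err)
	}
	for _, f := range findings(true, "bborn", "User", remaining) {
		fmt.Println(f.Message)
	}
}
```

Keeping the classification and findings functions pure (no `gh` subprocess calls inside them) is what makes table tests over each auth state straightforward; the real implementation may split responsibilities differently.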
Note
This PR was itself opened via the REST API (`POST /repos/.../pulls`) because the shared GraphQL bucket was exhausted at creation time: a live demonstration of the exact problem, and of the documented REST workaround.
Follow-up (out of this Go repo's scope)
Provision each agent server with its own GitHub App installation token during launch/setup (which lives in the launch plugin), mirroring the `offerlab-devs[bot]` pattern, so every server gets an independent rate-limit bucket.
🤖 Generated with Claude Code