feat: autonomous PR-watcher (Phase C) by michaelzwang13 · Pull Request #29 · michaelzwang13/AgentOS

michaelzwang13 · 2026-05-24T18:12:11Z

Closes #9. Closes #28.

The last remaining piece of the Code Review Engineer epic (#10). A background asyncio loop in the platform's FastAPI lifespan now walks running code-review-engineer agents × their watched_repos, fetches open PRs from GitHub, dedups against reviewed_prs (Phase D), and dispatches a review task. The agent's LLM then decides to invoke github-pr-review from inside its container — the watcher is the trigger, not the bookkeeper.

What changed

New table — watched_repos (backend/migrations/004_code_review_engineer.sql, mirrored in schema.sql)

unique(user_id, owner, repo) enforces Prevent double-review when multiple agents watch the same repo #28: two agents under the same user cannot watch the same repo. Without it, both would dispatch reviews of the same PR and the user pays 2× tokens.

New model — WatchedRepoModel (backend/app/models/watched_repo.py)

create/list_by_agent/list_all/get_by_id/delete. create translates the PG unique-constraint violation into WatchedRepoExists so the router can return 409.

New router — watched_repos (backend/app/routers/watched_repos.py)

POST /agents/{id}/watched-repos (201) / 409 on cross-agent conflict
GET /agents/{id}/watched-repos
DELETE /agents/{id}/watched-repos/{watched_id}
User-side X-Api-Key auth, scoped to caller's agents (404 if not owned).

New service — PRWatcher (backend/app/services/pr_watcher.py)

120s cadence, single global loop iterating across all running CRE agents.
Per-(agent, repo) error isolation: one failed GitHub call must not stop the rest.
409 from the sidecar = "agent busy, defer to next tick" (no queue, no fork).
Startup-only 30-min staleness gate — on the first tick after platform boot, ignore PRs older than 30 min so we don't backlog-review what accumulated while the platform was down. Steady-state ticks have no time gate.

Lifespan wiring (backend/app/main.py)

lifespan handler starts the watcher task on boot, cancels it cleanly on shutdown.
PR_WATCHER_ENABLED=false opts out — used by the test suite (conftest.py) so TestClient(app) doesn't spin up a real background loop.

Memory injection rides for free — Dispatcher.dispatch_task already injects agent_memory into role_context (Phase D). Every watcher-triggered review gets current memory by reusing that path.

Why a single global loop (not per-agent)

One asyncio.Task iterating across all agents serially is plenty for hackathon scale. Per-agent watchers would multiply the loop count with no real benefit, and complicate the lifespan teardown.

Why dedup stays at the agent level

reviewed_prs is still unique(agent_id, owner, repo, pr_number). #28 is prevented at the watched_repos insert, so by the time a PR is being reviewed, only one agent ever sees it. If we ever allow multi-agent overlap (we won't soon), we'd revisit reviewed_prs then — not now.

Tests

17 new — 125 → 142 total, all passing.

test_watched_repos.py (9): router success/auth/404/409 paths, model conflict translation.
test_pr_watcher.py (8): dispatch-on-fresh, dedup-skip, 409-defer, per-repo isolation, startup staleness, steady-state ignores age, no-agents short-circuit, clean cancel.

Smoke test (against live backend)

Verified end-to-end via /tmp/phaseC_smoke.py:

POST → 201
Same user, different agent, same repo → 409 from the live PG unique constraint (Prevent double-review when multiple agents watch the same repo #28 verified, not just unit-tested)
GET lists per agent
PRWatcher.tick() walks the real DB, hits the missing-GitHub-cred case, per-repo isolation swallows it — tick completes cleanly
DELETE → 204
After delete, the previously-blocked agent can subscribe

Migration

Same 004_code_review_engineer.sql continues to absorb the Code Review Engineer epic (Phases B, D, now C). Idempotent.

Deliberately out of scope

Frontend UI for subscribing/unsubscribing repos — folds into Hire & offboard flow missing from the UI #13 (hire/offboard UI). Demo via curl for now.
Multi-watcher coordination — Prevent double-review when multiple agents watch the same repo #28 prevents it; we don't need to design around overlapping watchers.

Test plan

pytest passes (142/142)
Smoke test against live backend
Manual end-to-end: hire a Code Review Engineer, subscribe to a repo with a real OAuth credential, open a PR, watch the platform logs for pr_watcher: dispatched review …

🤖 Generated with Claude Code

Background asyncio loop in the platform's FastAPI lifespan that walks running code-review-engineer agents × their watched_repos, fetches open PRs from GitHub, dedups against reviewed_prs, and dispatches a review task — the last remaining piece of the Code Review Engineer epic. - watched_repos table (#28: unique(user_id, owner, repo) prevents two agents under the same user from watching the same repo and posting duplicate reviews) - WatchedRepoModel translates the PG unique-constraint violation into WatchedRepoExists so the router can return 409 - watched_repos router: POST/GET/DELETE /agents/{id}/watched-repos under user-side X-Api-Key auth - PRWatcher service: 120s cadence, per-(agent,repo) error isolation, 409-from-sidecar treated as "agent busy, defer to next tick", startup-only 30-min staleness gate to avoid backlog-flooding on platform restart - main.py lifespan starts/cancels the watcher; PR_WATCHER_ENABLED=false opt-out for tests and ad-hoc debug runs - 17 new tests (125 → 142, all passing) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

michaelzwang13

Code Review Summary

PR #29: autonomous PR-watcher (Phase C)

This PR implements the final piece of the Code Review Engineer epic - a background asyncio loop that polls GitHub for open PRs and dispatches review tasks to running agents.

Review

Strengths:

Clean architecture with proper separation of concerns
Good error isolation (per-repo failures do not break the loop)
Smart deduplication via unique(user_id, owner, repo) constraint prevents double reviews (#28)
Proper lifespan management with graceful shutdown
30-minute startup staleness gate prevents backlog flooding on platform restart
409 handling for busy agents is thoughtful
Good test coverage (17 new tests, 125 to 142)

Minor notes:

The broad except Exception with string matching for duplicate key errors in watched_repo.py is a bit fragile, but acceptable given Supabase client limitations
Consider adding a TypedDict for list_running_by_role() return type for better type safety

Overall: Well-structured implementation with clean code, proper documentation, and thoughtful edge case handling. LGTM!

michaelzwang13 commented May 24, 2026

View reviewed changes

michaelzwang13 merged commit d18a240 into main May 24, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: autonomous PR-watcher (Phase C)#29

feat: autonomous PR-watcher (Phase C)#29
michaelzwang13 merged 1 commit into
mainfrom
feat/autonomous-pr-watcher

michaelzwang13 commented May 24, 2026 •

edited

Loading

Uh oh!

michaelzwang13 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

michaelzwang13 commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changed

Why a single global loop (not per-agent)

Why dedup stays at the agent level

Tests

Smoke test (against live backend)

Migration

Deliberately out of scope

Test plan

Uh oh!

michaelzwang13 left a comment

Choose a reason for hiding this comment

Code Review Summary

Review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

michaelzwang13 commented May 24, 2026 •

edited

Loading