Severity: safety. A consult reviewer lane must be read-only/advisory. The Gemini (agy/Antigravity) adapter's agent was observed executing real, mutating commands inside a builder worktree — including a porch state transition (porch done <project>) and a file write. Two distinct failures, the second serious.
(1) Recurring — empty sandbox, no real review
The Gemini lane repeatedly launches in an empty temporary sandbox directory with no repo/diff context, so it never sees the code under review. Its transcript is the agent flailing in that empty dir — "I will list the contents of the workspace directory…", inspecting antigravity-cli folders, env vars, and permissions — instead of reviewing a diff. No VERDICT: line is produced.
This has recurred on nearly every consult run across a multi-PR session. The Claude and Codex lanes receive the worktree diff correctly; the Gemini lane does not.
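One way to make this failure loud instead of silent is a pre-flight check before the lane launches. The sketch below is a hypothetical illustration, not the adapter's actual code: `ReviewContext`, `preflight`, and the field names are all assumed for the example.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ReviewContext:
    """Hypothetical bundle of what a reviewer lane is launched with."""
    workdir: Path   # directory the agent starts in
    diff_text: str  # diff under review, as handed to the lane


def preflight(ctx: ReviewContext) -> list[str]:
    """Return a list of problems; an empty list means the lane can start."""
    problems = []
    if not ctx.workdir.is_dir():
        problems.append(f"workdir does not exist: {ctx.workdir}")
    elif not any(ctx.workdir.iterdir()):
        problems.append(f"workdir is empty: {ctx.workdir}")
    if not ctx.diff_text.strip():
        problems.append("no diff was passed to the lane")
    return problems
```

If `preflight` returns any problems, the run could be recorded as a tooling failure up front instead of letting the agent explore an empty directory.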
Knock-on effect: when a lane produces no verdict, porch defaults the result to REQUEST_CHANGES (see the third proposed fix below), so Gemini contributes a false blocking verdict on essentially every PR. Reviewers learn to dismiss it as "just the gemini tooling failure", which is dangerous: it trains people to ignore a REQUEST_CHANGES.
(2) NEW — the reviewer agent executed real commands in the worktree
On a re-run, the Gemini agent ran real commands inside the builder worktree:
- Ran porch done <project>, which advanced porch protocol state and fired a gate.
- Wrote a spurious file into the project's codev/projects/<project>/ directory.
In the observed instance the end state happened to be correct (the review was complete, so the gate landed where it belonged). But that was luck, not safety. A reviewer agent with shell access to porch/git/the filesystem can:
- Advance a protocol past a gate before review is actually complete.
- Mutate, delete, or commit code in the worktree it's supposed to passively review.
- Pollute the project's audit trail with stray files.
Why this matters
The consult trust model is that reviewer lanes read + emit a verdict — they are advisory, not actors. A reviewer that can execute porch done or write files is a privilege-escalation-shaped bug: it can corrupt orchestration state and the worktree it's reviewing.
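The "advisory, not actor" model can be illustrated with a deny-by-default command policy. This is a minimal sketch under stated assumptions: the allowlist contents and the `is_allowed` helper are hypothetical, and nothing here reflects how the agy adapter actually grants tool access.

```python
import shlex

# Assumption: a small set of read-only programs a reviewer may run.
READ_ONLY_PROGRAMS = frozenset({"cat", "ls", "grep", "head", "tail", "wc"})
# Characters that enable redirection, chaining, grouping, or substitution.
SHELL_METACHARACTERS = set("><|;&`$()\n")


def is_allowed(command: str) -> bool:
    """Deny by default: allow only an allowlisted program with plain arguments."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    return bool(argv) and argv[0] in READ_ONLY_PROGRAMS
```

Under this policy `porch done <project>`, any `git` invocation, and any output redirection are all rejected. A string filter like this is not a sandbox on its own; it would need to sit alongside OS-level enforcement such as a read-only mount of the worktree.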
Proposed fixes
- Sandbox consult reviewer agents read-only. The Gemini/Antigravity (agy) reviewer must run with no ability to execute porch, git, or filesystem-write commands. Its only output is a verdict + comments. Audit how the agy adapter grants tool/command access; it evidently runs with a shell that has real porch + write access to the worktree.
- Fix the empty-sandbox context plumbing. The adapter must receive the builder worktree's diff/context the same way the Claude and Codex lanes do. Until it does, it structurally cannot review.
- Porch: distinguish "absent verdict / tooling failure" from REQUEST_CHANGES. A lane that produces no VERDICT: line should be recorded as SKIPPED/ERRORED, not defaulted to a blocking REQUEST_CHANGES. The current default trains operators to ignore real REQUEST_CHANGES verdicts.
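The third fix could look roughly like the sketch below. It is not porch's actual verdict parser: the `LaneResult` enum, the `APPROVE` value, and the exact `VERDICT:` line format are assumptions made for illustration.

```python
import re
from enum import Enum


class LaneResult(Enum):
    APPROVE = "APPROVE"                  # assumed verdict name
    REQUEST_CHANGES = "REQUEST_CHANGES"
    ERRORED = "ERRORED"                  # no usable VERDICT: line (tooling failure)


# Matches a line such as "VERDICT: REQUEST_CHANGES"; the format is assumed.
VERDICT_LINE = re.compile(r"^\s*VERDICT:\s*([A-Z_]+)\s*$", re.MULTILINE)


def parse_lane_output(text: str) -> LaneResult:
    """Map a lane transcript to a result without defaulting to a block."""
    matches = VERDICT_LINE.findall(text)
    if not matches:
        return LaneResult.ERRORED
    try:
        return LaneResult(matches[-1])  # the last verdict line wins
    except ValueError:                  # unrecognized verdict word
        return LaneResult.ERRORED


def is_blocking(result: LaneResult) -> bool:
    """Only an explicit REQUEST_CHANGES blocks; ERRORED is reported separately."""
    return result is LaneResult.REQUEST_CHANGES
```

With this shape, the empty-sandbox transcript maps to ERRORED and shows up as a tooling failure, so a REQUEST_CHANGES always means a reviewer actually asked for changes.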
Evidence
- The Gemini lane's review-output file for an affected run is the empty-sandbox flailing transcript (lists workspace dir, inspects antigravity-cli, checks env/permissions; never references the actual diff).
- Builder report from an affected run: the agent "went rogue and executed real commands in the worktree (ran porch done … and wrote a spurious rebuttals file, since removed); porch state verified consistent afterward."
- Pattern: empty-sandbox / no-verdict on the Gemini lane across ~7 consecutive PRs in one session.
Areas
Primary: area/consult (the agy/Antigravity reviewer adapter — read-only sandboxing + empty-sandbox context). Secondary: area/porch (the absent-verdict → REQUEST_CHANGES default in the verdict parser).