
Consult: Gemini (agy/Antigravity) reviewer runs in empty sandbox + can execute real commands in the worktree (ran porch done, wrote files) #1051

@amrmelsayed

Description


Severity: safety. A consult reviewer lane must be read-only and advisory. The Gemini (agy/Antigravity) adapter's agent was observed executing real, mutating commands inside a builder worktree, including a porch state transition (porch done <project>) and a file write. There are two distinct failures; the second is serious.

(1) Recurring — empty sandbox, no real review

The Gemini lane repeatedly launches in an empty temporary sandbox directory with no repo/diff context, so it never sees the code under review. Its transcript is the agent flailing in that empty dir — "I will list the contents of the workspace directory…", inspecting antigravity-cli folders, env vars, and permissions — instead of reviewing a diff. No VERDICT: line is produced.

This has recurred on nearly every consult run across a multi-PR session. The Claude and Codex lanes receive the worktree diff correctly; the Gemini lane does not.
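One way to make the Gemini lane independent of whatever directory it launches in is to compute the diff inside the builder worktree up front and embed it in the prompt. The lane should also refuse to start when that diff is empty, rather than launching with nothing to review. The sketch below is illustrative only: ReviewContext, collect_review_context, build_review_prompt and the base_ref parameter are hypothetical names, not the actual consult adapter's API, and the prompt wording is an assumption.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class ReviewContext:
    worktree: str  # absolute path of the builder worktree under review
    base_ref: str  # ref the PR is compared against (hypothetical, e.g. "main")
    diff: str


def collect_review_context(worktree: str, base_ref: str) -> ReviewContext:
    """Compute the diff inside the builder worktree, not the lane's own cwd."""
    result = subprocess.run(
        ["git", "-C", worktree, "diff", f"{base_ref}...HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return ReviewContext(worktree=worktree, base_ref=base_ref, diff=result.stdout)


def build_review_prompt(ctx: ReviewContext) -> str:
    """Embed the diff in the prompt so the reviewer never has to go looking for it."""
    if not ctx.diff.strip():
        # Fail fast instead of launching a reviewer with nothing to review.
        raise ValueError(
            f"empty diff for {ctx.worktree} against {ctx.base_ref}; not launching reviewer"
        )
    return (
        "Review the following diff from the builder worktree. "
        "End with exactly one line of the form 'VERDICT: <decision>'.\n\n"
        "--- BEGIN DIFF ---\n"
        f"{ctx.diff}\n"
        "--- END DIFF ---\n"
    )
```

Because the context is built once from the worktree, the same prompt text could be handed to every lane, so no lane depends on its launch directory to find the code.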

Knock-on effect: when a lane produces no verdict, porch defaults the result to REQUEST_CHANGES (see proposed fix 3). Gemini therefore contributes a false blocking verdict on essentially every PR, and reviewers learn to dismiss it as "just the gemini tooling failure". That is dangerous: it trains people to ignore a REQUEST_CHANGES.

(2) NEW — the reviewer agent executed real commands in the worktree

On a re-run, the Gemini agent ran real commands inside the builder worktree:

  • porch done <project> — which advanced porch protocol state and fired a gate.
  • Wrote a spurious file into the project's codev/projects/<project>/ directory.

In the observed instance the end state happened to be correct (the review was complete, so the gate landed where it belonged). But that was luck, not safety. A reviewer agent with shell access to porch/git/the filesystem can:

  • Advance a protocol past a gate before review is actually complete.
  • Mutate, delete, or commit code in the worktree it's supposed to passively review.
  • Pollute the project's audit trail with stray files.
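As defense in depth, porch itself could refuse state-changing subcommands when it detects that it is being invoked from a reviewer lane. This is a sketch under explicit assumptions: the CONSULT_REVIEWER_LANE environment marker and check_porch_invocation are hypothetical, it is written in Python purely for illustration, and "done" is the only state-changing subcommand this issue confirms.

```python
from typing import Mapping

# Hypothetical marker the consult adapter would set in every reviewer lane's environment.
REVIEWER_LANE_ENV = "CONSULT_REVIEWER_LANE"

# porch subcommands that change protocol state. Only "done" is confirmed by this
# issue; any other state-changing subcommands would be added here.
MUTATING_SUBCOMMANDS = {"done"}


def check_porch_invocation(subcommand: str, env: Mapping[str, str]) -> None:
    """Refuse state-changing porch subcommands when called from a reviewer lane."""
    if subcommand in MUTATING_SUBCOMMANDS and env.get(REVIEWER_LANE_ENV) == "1":
        raise PermissionError(
            f"porch {subcommand} is not permitted from a consult reviewer lane"
        )
```

porch's entrypoint would run this check before dispatching a subcommand. An agent with a real shell can unset an environment variable, so this guard only catches accidents. The actual fix is still removing shell access from the lane (proposed fix 1).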

Why this matters

The consult trust model is that reviewer lanes read + emit a verdict — they are advisory, not actors. A reviewer that can execute porch done or write files is a privilege-escalation-shaped bug: it can corrupt orchestration state and the worktree it's reviewing.
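One way to make the "read + emit a verdict" model structural is to give the reviewer no shell at all and expose only a narrow read tool confined to the worktree. The issue does not describe agy's own configuration options, so this sketch does not use any of them. ReadOnlyWorkspace, read_file, dispatch and ALLOWED_TOOLS are hypothetical names showing the shape of such a tool surface, not the adapter's real interface.

```python
from pathlib import Path
from typing import Any, Dict


class ReadOnlyWorkspace:
    """Hypothetical tool surface for a reviewer lane: read files, nothing else."""

    def __init__(self, root: str) -> None:
        self.root = Path(root).resolve()

    def read_file(self, relative_path: str, max_bytes: int = 200_000) -> str:
        # resolve() follows symlinks and collapses "..", so escapes are caught below.
        target = (self.root / relative_path).resolve()
        if target != self.root and self.root not in target.parents:
            raise PermissionError(f"{relative_path!r} is outside the worktree")
        with target.open("rb") as fh:
            return fh.read(max_bytes).decode("utf-8", errors="replace")


# Deliberately no run_command or write_file: porch, git and file writes are
# simply not reachable from the reviewer.
ALLOWED_TOOLS = {"read_file"}


def dispatch(ws: ReadOnlyWorkspace, tool: str, args: Dict[str, Any]) -> str:
    """Route a reviewer tool call, rejecting anything outside the allowlist."""
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool!r} is not available to reviewer lanes")
    return ws.read_file(**args)
```

An allowlist of tools is easier to audit than a denylist of shell commands. Even a "read-only" command can write (for example, `git diff --output=<file>`), so filtering command strings tends to leak.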

Proposed fixes

  1. Sandbox consult reviewer agents read-only. The Gemini/Antigravity (agy) reviewer must run with no ability to execute porch, git, or filesystem-write commands. Its only output is a verdict + comments. Audit how the agy adapter grants tool/command access — it evidently runs with a shell that has real porch + write access to the worktree.
  2. Fix the empty-sandbox context plumbing. The adapter must receive the builder worktree's diff/context the same way the Claude and Codex lanes do. Until it does, it structurally cannot review.
  3. Porch: distinguish "absent verdict / tooling failure" from REQUEST_CHANGES. A lane that produces no VERDICT: line should be recorded as SKIPPED/ERRORED, not defaulted to a blocking REQUEST_CHANGES. The current default trains operators to ignore real REQUEST_CHANGES verdicts.
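A minimal sketch of fix 3: parse each lane's output into an explicit result, and record a missing VERDICT: line as SKIPPED (or a failed lane as ERRORED) instead of REQUEST_CHANGES. The enum, regex and function names are hypothetical, and APPROVE is an assumed name for the non-blocking verdict. Only the VERDICT: prefix and REQUEST_CHANGES come from this issue.

```python
import re
from enum import Enum
from typing import Dict, List, Optional


class LaneResult(Enum):
    APPROVE = "APPROVE"  # assumed name of the non-blocking verdict
    REQUEST_CHANGES = "REQUEST_CHANGES"
    SKIPPED = "SKIPPED"  # lane ran but produced no VERDICT: line
    ERRORED = "ERRORED"  # lane failed, or its verdict token is unrecognized


VERDICT_RE = re.compile(r"^\s*VERDICT:\s*([A-Z_]+)\s*$", re.MULTILINE)


def parse_lane(output: Optional[str], exit_code: int = 0) -> LaneResult:
    if output is None or exit_code != 0:
        return LaneResult.ERRORED
    matches = VERDICT_RE.findall(output)
    if not matches:
        # This is the case that previously defaulted to a blocking REQUEST_CHANGES.
        return LaneResult.SKIPPED
    token = matches[-1]  # use the last VERDICT: line if several appear
    if token in (LaneResult.SKIPPED.value, LaneResult.ERRORED.value):
        return LaneResult.ERRORED  # a lane may not self-report these states
    try:
        return LaneResult(token)
    except ValueError:
        return LaneResult.ERRORED


def summarize(results: Dict[str, LaneResult]) -> Dict[str, List[str]]:
    """Separate real blocking verdicts from lanes that never delivered one."""
    return {
        "blocking": [lane for lane, r in results.items() if r is LaneResult.REQUEST_CHANGES],
        "missing": [
            lane for lane, r in results.items()
            if r in (LaneResult.SKIPPED, LaneResult.ERRORED)
        ],
    }
```

Surfacing the "missing" list separately (e.g. "gemini: no verdict") keeps tooling failures visible without letting them block. A REQUEST_CHANGES then once again means a reviewer actually asked for changes.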

Evidence

  • The Gemini lane's review-output file for an affected run is the empty-sandbox flailing transcript (lists workspace dir, inspects antigravity-cli, checks env/permissions — never references the actual diff).
  • Builder report from an affected run: the agent "went rogue and executed real commands in the worktree (ran porch done … and wrote a spurious rebuttals file, since removed); porch state verified consistent afterward."
  • Pattern: empty-sandbox / no-verdict on the Gemini lane across ~7 consecutive PRs in one session.

Areas

Primary: area/consult (the agy/Antigravity reviewer adapter — read-only sandboxing + empty-sandbox context). Secondary: area/porch (the absent-verdict → REQUEST_CHANGES default in the verdict parser).

Metadata


Assignees

No one assigned

    Labels

    area/consult (Consult CLI / consultation tooling), bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
