Skip to content

feat(agent): add in-pipeline pre-PR self-review phase#263

Open
nizar-lahlali wants to merge 4 commits into
mainfrom
feat/262-pre-pr-self-review
Open

feat(agent): add in-pipeline pre-PR self-review phase#263
nizar-lahlali wants to merge 4 commits into
mainfrom
feat/262-pre-pr-self-review

Conversation

@nizar-lahlali

@nizar-lahlali nizar-lahlali commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Adds an optional self-review pipeline phase where the LLM critiques its own cumulative diff before the PR is created
  • New self_review.py orchestration module with run_self_review() that generates the diff, invokes run_agent() a second time with a review-focused prompt, and accumulates turns/cost
  • New prompts/self_review.py with a focused review prompt template (correctness, bugs, security, style, test gaps checklist)
  • TaskConfig extended with self_review_enabled (default False) and self_review_max_turns (default 5)
  • Fields threaded through build_config(), get_config(), server.py _extract_invocation_params(), and pipeline.py run_task()
  • NEW: Self-review summary posted as a PR comment — the review agent writes .self-review-summary.md with structured findings, the pipeline posts it via gh pr comment, then deletes the file so it never appears in the diff

Design decisions:

  • Fail-open: self-review errors never block PR creation
  • Uses remaining turns/budget from original max_turns allocation (capped at self_review_max_turns)
  • Second run_agent() call gives the model fresh context and a clean review perspective
  • Skipped for pr_review (read-only), empty diff, no remaining turns/budget
  • Feature is opt-in (disabled by default) — no behavior change for existing users
  • Summary file approach (vs. parsing trajectory): deterministic, decoupled from SDK internals
  • Comment posted after PR creation (natural ordering — PR must exist before commenting)

Closes #262

Test plan

  • All 34 unit tests pass (tests/test_self_review.py) — includes 13 new tests for summary reader and PR comment posting
  • All existing tests pass (no regressions)
  • Ruff lint passes on all new/modified files
  • Manual validation: set SELF_REVIEW_ENABLED=true in local batch mode to verify the phase executes and summary is posted
  • Integration: deploy and submit a task with self_review_enabled: true in the orchestrator payload

@nizar-lahlali nizar-lahlali requested a review from a team as a code owner June 4, 2026 17:39
@krokoko

krokoko commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

I think this is fine, just should be exposed using the new workflow system, a.k.a: add a method to do self review, and make it configurable as a step in the coding task workflow yaml file

bgagent added 3 commits June 22, 2026 14:34
Adds an optional self-review phase between agent execution and post-hooks
where the LLM critiques its own cumulative diff before the PR is created.
This improves first-pass PR quality by catching bugs, style issues, and
test gaps before human review.

- New self_review.py orchestration module with run_self_review()
- New prompts/self_review.py with focused review prompt template
- TaskConfig extended with self_review_enabled and self_review_max_turns
- Fields threaded through build_config, get_config, server, pipeline
- Fail-open: self-review errors never block PR creation
- Uses remaining turns/budget from original allocation (capped)
- Feature is opt-in (disabled by default)
After the self-review phase completes, the agent writes a structured
summary of findings to `.self-review-summary.md`. The pipeline reads
this file, posts it as a PR comment via `gh pr comment`, then deletes
it so it never appears in the PR diff. Fail-open: if the file is
missing or the comment fails to post, the pipeline continues normally.
Addresses review feedback: rather than a hardcoded inline pipeline phase
gated by a per-task API flag, self-review is now a first-class workflow
StepKind, configurable as a step in the coding workflow YAML.

- Add `self_review` StepKind to workflow.schema.json (with a `max_turns`
  step field, 1-20, default 5) + models.Step; excluded from repo-less
  workflows (schema conditional + validator rule 3) since it diffs/re-runs
  the agent against the cloned tree.
- Register `_handle_self_review` in the runner's STEP_HANDLERS; it wraps
  run_self_review and accumulates the review loop's turns/cost onto the
  shared agent_result. Non-side-effecting, so it may use on_failure:
  continue (advisory / fail-open).
- Drive the step from pipeline.run_task via `_execute_self_review_step`
  (only_kinds={"self_review"}), mirroring _execute_agent_step. Runs after
  the cancel short-circuit and before post-hooks; posts the summary PR
  comment after ensure_pr when the step ran.
- Declare the step in coding/new-task-v1.yaml (opt-in by presence;
  on_failure: continue). Remove it to disable; tune via max_turns.
- Drop the now-redundant self_review_enabled / self_review_max_turns
  fields from TaskConfig, build_config, get_config, server params, and
  run_task — configurability lives in the workflow YAML.
- Add golden validator-corpus fixtures (valid self_review step; repo-less
  + self_review rejected) to lock parity.

Agent quality green: ruff, ty (0 errors), 1129 tests, 79.87% coverage.
@nizar-lahlali nizar-lahlali force-pushed the feat/262-pre-pr-self-review branch from 19b763c to 3211230 Compare June 22, 2026 20:16
@nizar-lahlali nizar-lahlali requested a review from a team as a code owner June 22, 2026 20:16
@theagenticguy

Copy link
Copy Markdown
Contributor

It might be semantics but should this be called a self review or should it be explicitly described as a isolated sub agent critic (aka adversarial review)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(agent): add in-pipeline pre-PR self-review phase

3 participants