feat: automated ai powered qa agent for pr reviews and commits on prs by amaan-bhati · Pull Request #830 · keploy/docs

amaan-bhati · 2026-04-14T08:02:25Z

Implemented a fully autonomous QA Review Agent integrated directly into our GitHub Actions. The agent acts as an elite, automated repository maintainer that validates every Pull Request, especially those from Open Source contributors against a massive, highly specific rulebook derived from our Docusaurus/React architecture.

It prevents Docusaurus SSR crashes, eliminates styling fragmentations (enforcing Tailwind), and ensures all markdown files maintain strict SEO/MDX standards without requiring humans to catch every missing alt-tag or bad window.location call.

How It Validates Changes (The Workflow)

When an OSS contributor opens or pushes to a PR, the agent executes the following validation pipeline:

1. Diff & Context Gathering

The workflow (qa-review.yml) triggers our Python agent. It securely fetches the contributor's diff via the GitHub API and loads our newly constructed 900-line QA_GUIDELINES.md rulebook into memory.

Verified Baseline: Evaluated to ensure the pull_request execution token securely restricts modifications directly to the isolated PR context rather than leaking global read/write keys.
Source Reference: GitHub Actions: Automatic Token Authentication

2. Second-Order Impact Mapping

Before reading the code, the agent cross-references the changed files against our codebase_map.json dependency graph.
Example: If a contributor modifies QuickStartFilter.js, the agent automatically maps out every .mdx file relying on that component to ensure no downstream props or layouts were broken by the isolated change.

Verified Baseline: The /qa-agent/scripts/build_codebase_map.py architecture utilizes Python's native modules to safely construct reverse dependency mappings from modern ECMA / JSX imports without executing untrusted AST code.
Source Reference: Python 3 pathlib & Iteration Docs

3. The 4-Pass AI Validation

The agent packages the diff, dependency graph, and guidelines, sending them to Claude (Anthropic). Claude executes 4 strict passes:

Pass 1 (Mechanical): Checks for basic hygiene (no inline styles, no leftover console.log, correct Frontmatter presence).
Pass 2 (Architectural): Enforces Docusaurus SSR rules (e.g., verifying ExecutionEnvironment.canUseDOM is used before running browser APIs).

Verified Baseline: Validated the strict requirement to wrap window calls inside ExecutionEnvironment.canUseDOM or useEffect to prevent ReferenceError: window is not defined during the SSG generation pass.
Source Reference: Docusaurus SSR Advanced Guide
Pass 3 (Topological): Evaluates if the change creates unhandled "second-order" breakages based on the map.
Pass 4 (Markdown): Validates semantic formatting, SEO tags, Admonitions, and strict absolute asset pathing.

Verified Baseline: The structuring of the qa_review.py analytical prompt adheres strictly to Anthropic's interactive role-playing constraints, structurally separating raw data (local PR Diff) from the execution pipeline (System Checklist).
Source Reference: Anthropic Prompt Engineering Official Docs & Interactive Prompt Tutorial

4. Automated Feedback Delivery

The agent logs back into the GitHub PR and posts its findings natively as a comment, organizing feedback by severity:

CRITICAL: Blocks the merge (e.g., SSR build crashes, syntax breaks).
WARNING: Architecture violations (e.g., using Bootstrap instead of Tailwind, missing alt text).
INFO: Mentor-style suggestions to guide OSS contributors towards our best practices.

Setup

Requires ANTHROPIC_API_KEY to be set within the Repository Action Secrets.
(Note: GITHUB_TOKEN is natively injected).

Architecture Origins & Verified Sources

This document serves as an exhaustive bibliography of the concepts, architectural patterns, and verified engineering sources used to design the Autonomous QA Review Agent.

1. The Idea: LLM Code Review & Dependency Graph Context

The Challenge: Out-of-the-box, Large Language Models (LLMs) struggle to effectively review code if they only see a single isolated file or git diff. If a developer edits <QuickStartFilter />, the LLM has no idea what other files break. Injecting the entire repository into the LLM context limits performance and rapidly exceeds cost ceilings.
The Solution: We combined LLM prompts with a Static Dependency Graph. Instead of blindly sending code, the Python script builds a map of the repository's imports. It performs a "topological lookup" to trace the blast radius of a change before compiling the final context window for Claude. This is an emerging industry standard for LLM code agents.

Verified Industry Sources & Research:

Codebase Graph Analysis: Shifting beyond simple text-based Retrieval-Augmented Generation (RAG) towards Graph-Augmented retrieval for code. Moving Beyond RAG: Codebase Graphs
Anthropic Context Sizing: Best practices for structuring massive context windows without dilution. Anthropic Prompt Engineering: Long Context Windows

2. Designing the Python Automation (`qa_review.py`)

The Challenge: We needed a Python environment that runs securely inside a CI pipeline, gracefully intercepts GitHub webhooks, parses the PR diff, and structures a request.
The Solution: We utilized PyGithub to interface securely via automated Action Tokens, specifically pulling the raw textual diff without needing to perform risky git checkout commands on untrusted contributor forks.

Anthropic SDK: Utilizes the official anthropic Python client to query Claude 3.5 via the Messages API securely.
- Source: Anthropic Python SDK Documentation
PyGithub: Bypasses local git clone risks by directly scraping the .diff_url payload via GitHub's API.
- Source: PyGithub PullRequest Objects
Graph Linking: Leverages native Python libraries (pathlib, re) for traversing imports without untrusted execution.
- Source: Python pathlib Reference

3. Creating the GitHub Automation (`.github/workflows/qa-review.yml`)

The Challenge: The agent needed to run autonomously whenever Open Source (OSS) contributors push code, but it had to operate safely so malicious actors couldn't steal the environment secrets during execution.
The Solution: We bound the execution directly to the .github/workflows system relying on the pull_request trigger. This native trigger deliberately restricts the scope of the pre-injected $GITHUB_TOKEN to "read/comment-only" bounds specifically tied to the PR context.

Verified Sources of Truth:

GitHub Actions Event Triggers: Specifically using pull_request over pull_request_target to safely sandbox untrusted fork executions. GitHub Actions: Security Guidelines for Fork PRs

4. Designing the Ruleset (`QA_GUIDELINES.md`)

The Challenge: Knowing exactly what rules an LLM should aggressively hunt for within a Docusaurus/React architecture.
The Solution: We used anitgravity and codex to craft a massive Markdown file that acts as the "Brain" of the LLM. It focuses intensely on Static Site Generator (SSG) strictness, specifically the fact that Docusaurus is pre-rendered via Node.js before shipping to the client browser.

Verified Sources of Truth:

Docusaurus Node.js SSR Compliance: The fundamental rule requiring browser interactions (window.location) to be isolated behind ExecutionEnvironment.canUseDOM or React useEffect hooks. Docusaurus SSR Advanced Guide
Anthropic System Prompt Design: Structurally separating the Rules (QA_GUIDELINES.md) from the Data (The Pull Request Diff) using XML-tags to prevent prompt injection or hallucination. Anthropic Prompt Engineering Interactive Tutorial
Markdown SEO Adherence: Imposing rigid frontmatter mapping for Algolia and Docusaurus TOC generators natively. Docusaurus Frontmatter SEO Guide

Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>

Copilot

Pull request overview

Adds an automated “QA Review Agent” to run on PR events and post an AI-generated QA review comment, using a repo-specific QA guidelines document and a prebuilt dependency map to flag second-order risks.

Changes:

Introduces a Python-based QA review script that fetches PR diffs, builds a structured prompt, calls Anthropic, and comments the results back on the PR.
Adds a GitHub Actions workflow to run the agent on PR events, manual dispatch, and /qa-review issue comments.
Adds repo-derived QA guidelines and a committed codebase_map.json to support rule checking and downstream risk analysis.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 14 comments.

Show a summary per file

File	Description
`qa-agent/scripts/qa_review.py`	Implements the PR diff fetch → prompt build → Anthropic review → PR comment flow, plus “second-order” dependency hints.
`qa-agent/requirements.txt`	Adds Python dependencies for Anthropic + PyGithub.
`qa-agent/codebase_map.json`	Provides a static dependency map used for downstream risk reporting.
`qa-agent/QA_GUIDELINES.md`	Adds the rule/checklist framework the agent enforces in its output.
`.github/workflows/qa-review.yml`	Defines CI triggers and job steps to run the agent and post results on PRs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-14T08:08:49Z

.github/workflows/qa-review.yml

+        github.event.issue.pull_request != null &&
+        (
+          contains(github.event.comment.body, '/qa-review')
+        )


The issue_comment trigger allows anyone who can comment on a PR to run this workflow with repository secrets (including ANTHROPIC_API_KEY) and pull-requests: write, which is an easy vector for cost/abuse. Restrict execution to trusted users (e.g., author_association in {OWNER,MEMBER,COLLABORATOR} or a repo permission check) and/or require a label/maintainer-only command before running.

Suggested change

)

) &&

contains(fromJson('["OWNER","MEMBER","COLLABORATOR"]'), github.event.comment.author_association)

Copilot · 2026-04-14T08:08:50Z

.github/workflows/qa-review.yml

+      - name: Run QA review agent
+        env:
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          PR_NUMBER: ${{ steps.context.outputs.pr_number }}
+          GITHUB_REPOSITORY: ${{ github.repository }}
+          REVIEW_MODE: ${{ steps.context.outputs.mode }}
+        run: python qa-agent/scripts/qa_review.py


On pull_request events from forks, secrets.ANTHROPIC_API_KEY won’t be available, so this step will fail (and may block external contributors). Add a job/step guard to skip gracefully when required secrets are missing (e.g., if: secrets.ANTHROPIC_API_KEY != '') and optionally post a neutral comment explaining it was skipped.

Copilot · 2026-04-14T08:08:50Z

qa-agent/requirements.txt

+anthropic>=0.40.0
+PyGithub>=2.1.1


Using open-ended version ranges (>=) for anthropic/PyGithub can make the QA workflow non-deterministic and break unexpectedly when upstream releases introduce breaking changes. Pin to known-good versions (or at least cap major versions) to keep CI stable and make updates intentional.

Suggested change

anthropic>=0.40.0

PyGithub>=2.1.1

anthropic==0.40.0

PyGithub==2.1.1

Copilot · 2026-04-14T08:08:50Z

qa-agent/scripts/qa_review.py

+
+Triggered by GitHub Actions on:
+- pull_request (opened, synchronize, reopened)
+- push to any open PR


The module docstring lists “push to any open PR” as a trigger, but the current workflow/job conditions don’t run the agent on push events. Update the docstring (or the workflow) so the documented triggers match actual behavior; otherwise it will mislead maintainers debugging CI behavior.

Suggested change

- push to any open PR

Copilot · 2026-04-14T08:08:51Z

qa-agent/scripts/qa_review.py

+def run_review(prompt: str) -> str:
+    client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)
+    message = client.messages.create(
+        model="claude-3-5-sonnet-20241022",
+        max_tokens=4000,
+        messages=[{"role": "user", "content": prompt}]
+    )
+    return message.content[0].text


This workflow sends PR title/body + diff content to Anthropic. That’s a potential data-exfiltration vector if a PR contains secrets or sensitive code. Consider adding explicit opt-in controls (e.g., only run on labeled PRs / trusted authors), redacting common secret patterns before sending, and documenting this behavior for contributors to avoid unintentionally sharing sensitive data with a third party.

Copilot · 2026-04-14T08:08:52Z

.github/workflows/qa-review.yml

+          elif [ "${{ github.event_name }}" = "issue_comment" ]; then
+            echo "pr_number=${{ github.event.issue.number }}" >> $GITHUB_OUTPUT
+            # check if comment says /qa-review fast
+            if echo "${{ github.event.comment.body }}" | grep -q "fast"; then


The fast mode detection matches any occurrence of the substring "fast" in the comment body (e.g., "breakfast"), which can unintentionally switch to fast mode. Consider parsing the command more strictly (e.g., match /qa-review fast as a token or anchor the regex) to avoid accidental mode changes.

Suggested change

if echo "${{ github.event.comment.body }}" | grep -q "fast"; then

if echo "${{ github.event.comment.body }}" | grep -Eq '(^|[[:space:]])/qa-review[[:space:]]+fast([[:space:]]|$)'; then

Copilot · 2026-04-14T08:08:52Z

qa-agent/scripts/qa_review.py

+    body = (
+        f"## 🔍 QA Review\n\n"
+        f"{review_text}"
+        f"{skipped_note}\n\n"
+        f"---\n"
+        f"*qa-agent · [Guidelines](./qa-agent/QA_GUIDELINES.md) · "


The comment footer link [Guidelines](./qa-agent/QA_GUIDELINES.md) is a relative URL; in PR/issue comments this typically resolves relative to the PR page URL, not the repository root, so it won’t point to the file. Use an absolute GitHub URL (e.g., https://github.com/<owner>/<repo>/blob/<ref>/qa-agent/QA_GUIDELINES.md) or a repo-root absolute path (/owner/repo/blob/...) so the link works reliably.

Suggested change

body = (

f"## 🔍 QA Review\n\n"

f"{review_text}"

f"{skipped_note}\n\n"

f"---\n"

f"*qa-agent · [Guidelines](./qa-agent/QA_GUIDELINES.md) · "

guidelines_url = f"https://github.com/{repo.full_name}/blob/{pr.base.ref}/qa-agent/QA_GUIDELINES.md"

body = (

f"## 🔍 QA Review\n\n"

f"{review_text}"

f"{skipped_note}\n\n"

f"---\n"

f"*qa-agent · [Guidelines]({guidelines_url}) · "

Copilot · 2026-04-14T08:08:52Z

.github/workflows/qa-review.yml

+      )
+
+    permissions:
+      pull-requests: write


qa_review.py posts via pr.create_issue_comment(...), which uses the Issues comments API. This job’s permissions block grants pull-requests: write but not issues: write; with explicit permissions set, missing scopes are none, so the comment call may 403. Add issues: write (as done in .github/workflows/greetings.yml) or switch to a PR review comment API that matches the granted permissions.

Suggested change

pull-requests: write

pull-requests: write

issues: write

Copilot · 2026-04-14T08:08:53Z

qa-agent/scripts/qa_review.py

+    Load the pre-built codebase dependency map.
+    This tells the agent which files depend on which, so it can do second-order analysis.
+    Built by build_codebase_map.py (run during agent setup or on schedule).


load_codebase_map() references a build_codebase_map.py generator, but that script isn’t included anywhere in qa-agent/ in this PR. Either add the generator (and document how/when to run it) or adjust the docstring to reflect the actual process for updating codebase_map.json, otherwise the map will drift and the “second-order analysis” will become misleading.

Suggested change

Load the pre-built codebase dependency map.

This tells the agent which files depend on which, so it can do second-order analysis.

Built by build_codebase_map.py (run during agent setup or on schedule).

Load the checked-in codebase dependency map.

This tells the agent which files depend on which, so it can do second-order analysis.

Keep codebase_map.json updated whenever dependency relationships change so the analysis stays accurate.

Copilot · 2026-04-14T08:08:53Z

qa-agent/scripts/qa_review.py

+# --- configuration ---
+ANTHROPIC_API_KEY  = os.environ["ANTHROPIC_API_KEY"]
+GITHUB_TOKEN       = os.environ["GITHUB_TOKEN"]
+PR_NUMBER          = int(os.environ["PR_NUMBER"])
+REPO_NAME          = os.environ["GITHUB_REPOSITORY"]
+REVIEW_MODE        = os.environ.get("REVIEW_MODE", "full")


Environment variables are read at import time via os.environ[...] (and int(os.environ["PR_NUMBER"])). In GitHub Actions (especially pull_request from forks or when the secret isn’t set), this will raise a KeyError before main() runs, causing the workflow to fail without a helpful message. Move env parsing into main() and use os.getenv with explicit validation; if required secrets are missing, exit cleanly (or skip the job) with a clear next step (e.g., set ANTHROPIC_API_KEY secret).

Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>

feat: automated ai powered qa agent

fe6051d

Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>

Copilot AI review requested due to automatic review settings April 14, 2026 08:02

Copilot started reviewing on behalf of amaan-bhati April 14, 2026 08:02 View session

add updated readme

8854662

Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>

Copilot AI reviewed Apr 14, 2026

View reviewed changes

feat: add more details to the qa guidelines

18c7057

Signed-off-by: amaan-bhati <amaanbhati49@gmail.com>

amaan-bhati changed the title ~~feat: automated ai powered qa agent~~ feat: automated ai powered qa agent for pr reviews and commits on prs Apr 14, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: automated ai powered qa agent for pr reviews and commits on prs#830

feat: automated ai powered qa agent for pr reviews and commits on prs#830
amaan-bhati wants to merge 3 commits intomainfrom
qa-agent

amaan-bhati commented Apr 14, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 14, 2026

Uh oh!

Copilot AI Apr 14, 2026

Uh oh!

Copilot AI Apr 14, 2026

Uh oh!

Copilot AI Apr 14, 2026

Uh oh!

Copilot AI Apr 14, 2026

Uh oh!

Copilot AI Apr 14, 2026

Uh oh!

Copilot AI Apr 14, 2026

Uh oh!

Copilot AI Apr 14, 2026

Uh oh!

Copilot AI Apr 14, 2026

Uh oh!

Copilot AI Apr 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	)
	) &&
	contains(fromJson('["OWNER","MEMBER","COLLABORATOR"]'), github.event.comment.author_association)

	if echo "${{ github.event.comment.body }}" \| grep -q "fast"; then
	if echo "${{ github.event.comment.body }}" \| grep -Eq '(^\|[[:space:]])/qa-review[[:space:]]+fast([[:space:]]\|$)'; then

Uh oh!

Conversation

amaan-bhati commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

How It Validates Changes (The Workflow)

1. Diff & Context Gathering

2. Second-Order Impact Mapping

3. The 4-Pass AI Validation

4. Automated Feedback Delivery

Setup

Architecture Origins & Verified Sources

1. The Idea: LLM Code Review & Dependency Graph Context

2. Designing the Python Automation (qa_review.py)

3. Creating the GitHub Automation (.github/workflows/qa-review.yml)

4. Designing the Ruleset (QA_GUIDELINES.md)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

amaan-bhati commented Apr 14, 2026 •

edited

Loading

2. Designing the Python Automation (`qa_review.py`)

3. Creating the GitHub Automation (`.github/workflows/qa-review.yml`)

4. Designing the Ruleset (`QA_GUIDELINES.md`)