Skip to content

feat(v2): agent consolidation — evaluator rename, tester agent, QA skill#171

Merged
dean0x merged 7 commits intomainfrom
feat/evaluator-rename-tester-agent
Apr 3, 2026
Merged

feat(v2): agent consolidation — evaluator rename, tester agent, QA skill#171
dean0x merged 7 commits intomainfrom
feat/evaluator-rename-tester-agent

Conversation

@dean0x
Copy link
Copy Markdown
Owner

@dean0x dean0x commented Apr 2, 2026

Summary

Agent and skill consolidation to improve semantic clarity and expand development workflow coverage:

  • Evaluator agent: Renamed from shepherd for clearer analysis-focused intent
  • Tester agent: New agent for test validation and QA workflows
  • QA skill: Comprehensive quality assurance patterns and best practices
  • UI Design plugin: Renamed from frontend-design for consistency with skill naming

Changes

  • 35 files modified/created
  • 913 insertions(+), 82 deletions(-)
  • Includes agent file renames, skill additions, plugin renames, and cascading reference updates

Test Plan

  • Build passes: npm run build
  • Test suite passes: all 134+ tests
  • CLI verified: devflow list shows updated plugin and agent registry
  • No broken references in documentation or configurations

…ill, UI design plugin

- Rename shepherd agent to evaluator (analysis-focused naming)
- Add tester agent for test validation
- Add QA skill for quality assurance workflows
- Rename devflow-frontend-design plugin to devflow-ui-design
- Update all references across 25 files (docs, plugins, skills, tests, configs)

This consolidation improves semantic clarity and expands agent coverage for comprehensive development workflows.
Copy link
Copy Markdown
Owner Author

@dean0x dean0x left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

Based on 9 review reports (architecture, complexity, consistency, documentation, performance, regression, security, testing, typescript), here are the findings:

Blocking Issues (≥80% confidence)

1. Duplicate step 8 in README (95% confidence)

  • File: plugins/devflow-implement/README.md:33
  • Problem: Steps 8 (Simplification) and 8 (PR Creation) have the same number
  • Fix: Change line 33 from 8. **PR Creation** to 9. **PR Creation**

2. Workflow order mismatch (82% confidence)

  • File: plugins/devflow-implement/README.md:25-33
  • Problem: README shows QA Testing → Simplification → PR, but actual commands run Simplification → Evaluator → Tester → PR
  • Fix: Reorder README workflow to match actual execution (see detailed comment)

3. README skill count stale (90% confidence)

  • File: plugins/devflow-implement/README.md:51
  • Problem: Header says (9) but plugin now includes qa skill
  • Fix: Add qa to skill list and update count to (10)

4. Missing qa skill in ambient plugin (82% confidence)

  • File: plugins/devflow-ambient/.claude-plugin/plugin.json
  • Problem: Plugin declares tester agent but not qa skill in array
  • Fix: Add "qa" to skills array

5. Tester agent exceeds size convention (85-90% confidence)

  • File: shared/agents/tester.md:1-195
  • Problem: 195 lines exceeds CLAUDE.md target of 80-120 for Worker agents
  • Fix: Extract Dev Server Lifecycle and Browser Execution to shared/skills/qa/references/browser-testing.md

Security Issues (HIGH confidence)

6. Arbitrary Bash execution without sandboxing (85% confidence)

  • File: shared/agents/tester.md:44-66
  • Problem: Tester executes scenarios derived from ORIGINAL_REQUEST without command restrictions
  • Fix: Add Bash restrictions to Boundaries section (no rm -rf, sudo, eval, etc.)

7. Dev server log at predictable path (82% confidence)

  • File: shared/agents/tester.md:94
  • Problem: /tmp/devflow-tester-server.log vulnerable to symlink attacks
  • Fix: Use mktemp /tmp/devflow-tester-XXXXXX.log instead

Performance Issues

8. Variable shadowing in installer.ts (82% confidence)

  • File: src/cli/utils/installer.ts:215
  • Problem: agentsTarget declared twice in different scopes
  • Fix: Hoist before loop and reuse in both locations

9. Background memory update unbounded loop (83% confidence)

  • File: scripts/hooks/background-memory-update:3-9
  • Problem: Removed tail -3 pre-filter unbounds subprocess spawning
  • Fix: Re-add tail -3 before while loop

Lower-Confidence Suggestions (60-79%)

  • Missing tools frontmatter on Tester agent (65%)
  • SRP tension in Tester (three distinct responsibilities) (70%)
  • CHANGELOG not updated for shepherd→evaluator rename (60%, historical only)
  • Legacy cleanup inconsistency between init.ts and installer.ts (70%)

Recommendation

CHANGES_REQUESTED — All blocking issues should be fixed before merge. The security concerns (arbitrary Bash, symlink attack) are the highest priority. The documentation fixes (duplicate step 8, workflow order, skill count) affect user understanding and should be corrected.

6. **Alignment Check** - Evaluator validates against original request
7. **QA Testing** - Tester executes scenario-based acceptance tests
8. **Simplification** - Simplifier refines code clarity
8. **PR Creation** - Git agent creates pull request
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

**Step numbering error (95% confidence)

This step should be numbered `9` (not `8`). The sequence should be:

    1. QA Testing
    1. Simplification
    1. PR Creation ← fix this line

@@ -27,8 +27,9 @@ npx devflow-kit init --plugin=implement
3. **Implementation** - Coder agent implements on feature branch
4. **Validation** - Validator runs build/test/lint checks
5. **Self-Review** - Scrutinizer evaluates against 9-pillar framework
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Workflow order does not match actual execution (82% confidence)

README shows QA Testing (7) before Simplification (8), but both implement.md and implement-teams.md run Simplification much earlier (Phase 9) before Evaluator (Phase 12) and Tester (Phase 13).

Correct order to match actual command phases:

  1. Exploration
  2. Planning
  3. Implementation
  4. Validation
  5. Simplification ← moved here
  6. Self-Review
  7. Alignment Check
  8. QA Testing
  9. PR Creation

- `tester` - Scenario-based QA testing
- `validator` - Build/test validation

### Skills (9)
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skill count is stale (90% confidence)

Header says `(9)` but plugin.json now includes `qa` skill. README lists 9 total from core+plugin dependencies.

Add `qa` to the list and update count:
```

Skills (10)

  • software-design - Result types, DI, immutability
  • git - Git safety, atomic commits
  • implementation-patterns - CRUD, API, events
  • testing - Test quality, coverage
  • boundary-validation - Boundary validation
  • self-review - 9-pillar framework
  • qa - Scenario-based acceptance testing
  • typescript - TypeScript patterns
  • react - React patterns
  • accessibility - Keyboard, ARIA, focus
    ```

- **Integration**: Components work together correctly
- **Regression**: Existing behavior preserved (if applicable)
6. **Execute scenarios**: Run each via Bash (or browser for web scenarios) — capture stdout/stderr, exit codes, file state
7. **Evaluate results**: Compare actual vs expected behavior for each scenario
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bash execution without sandboxing (85% confidence - SECURITY)

Tester designs scenarios from ORIGINAL_REQUEST and executes them via Bash. A malicious issue could induce harmful commands.

Add explicit restrictions to Boundaries section:

  • NEVER execute: rm -rf, sudo, eval, exec, dd, mkfs
  • NEVER write outside /tmp/devflow-tester-*/
  • ONLY run: test runners, curl, read-only inspection, build commands

- Defaults by framework: Next.js→3000, Vite→5173, CRA→3000, Django→8000, Go→8080

### 4. Start Server (if not already running)
- Run in background: `npm run dev > /tmp/devflow-tester-server.log 2>&1 &`
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Symlink attack risk (82% confidence - SECURITY)

Hardcoded path `/tmp/devflow-tester-server.log` is vulnerable on multi-user systems (CWE-59).

Use `mktemp` instead:
```bash
LOG=$(mktemp /tmp/devflow-tester-XXXXXX.log)
npm run dev > "$LOG" 2>&1 &
```


# Tester Agent

You are a scenario-based QA specialist. You design and execute acceptance tests that verify implementation behavior from the user's perspective. You test what was asked for, not implementation details. You report results with evidence — you never fix code yourself.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agent size exceeds convention (85% confidence)

195 lines exceeds CLAUDE.md target of 80-120 for Worker agents. Dev Server Lifecycle (lines 70-106) and Browser Execution (lines 108-125) are procedural.

Suggested fix:

  1. Extract to `shared/skills/qa/references/browser-testing.md`
  2. Reference from agent with brief summary + Read instruction
  3. This brings agent to ~140 lines (within limit)

}

// Clean up legacy agent files (renamed or removed agents from prior versions)
const agentsTarget = path.join(claudeDir, 'agents', 'devflow');
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable shadowing: `agentsTarget` declared twice (82% confidence)

`agentsTarget` is declared both inside the for-loop (line 196) and after it (line 215), creating maintenance hazard.

Hoist before loop and reuse:
```typescript
// Before for-loop (around line 167):
const agentsTarget = path.join(claudeDir, 'agents', 'devflow');

// Inside/after loop: reuse, don't redeclare
```

Dean Sharon added 6 commits April 3, 2026 02:11
…s list

- Correct step ordering to match actual command pipeline (Simplification
  before Self-Review, before Alignment Check, before QA Testing)
- Fix duplicate step number 8 (PR Creation renumbered to 9)
- Replace outdated skills list with current plugin.json skills (6 skills:
  agent-teams, implementation-patterns, knowledge-persistence, qa,
  self-review, worktree-support)
…d-memory-update

Re-add tail -3 pre-filter before the json_extract_messages while loop so at most
3 subprocesses are spawned per pipeline regardless of transcript size. Add
head -c 100000 on each line before piping to json_extract_messages to prevent
resource exhaustion from oversized transcript entries.
…NT_NAMES consistency tests

Adds two missing test coverage items from batch-5 review:
- Verify devflow-implement and devflow-ambient declare evaluator/tester agents and qa skill
- Verify no LEGACY_AGENT_NAMES entry overlaps with current plugin agent registrations
… update skill count

- Add missing 'qa' skill to devflow-ambient plugin.json (tester agent dependency)
- Hoist agentsTarget before loop in installer.ts to eliminate variable shadowing
- Update README skill count from 35 to 38 to match actual shared/skills/ directory count
…procedures

Add tools frontmatter restricting Tester to Read/Grep/Glob/Bash and Chrome MCP
tools only. Extract Dev Server Lifecycle and Browser Execution sections to
shared/skills/qa/references/browser-testing.md, reducing agent from 196 to 142
lines. Move Bash execution deny list (rm -rf, sudo, eval, package installs) and
filesystem write constraints into the reference. Fix predictable /tmp log path
to use mktemp, and restrict .env reading to PORT variable only via grep.
…nce, extend test

- Add qa to devflow-ambient skills array in plugins.ts (was in plugin.json only)
- Remove duplicate browser-testing reference in tester.md
- Extend ambient test to assert qa skill dependency
@dean0x dean0x merged commit efb5a0d into main Apr 3, 2026
4 checks passed
@dean0x dean0x deleted the feat/evaluator-rename-tester-agent branch April 3, 2026 10:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant