feat(v2): agent consolidation — evaluator rename, tester agent, QA skill#171
feat(v2): agent consolidation — evaluator rename, tester agent, QA skill#171
Conversation
…ill, UI design plugin - Rename shepherd agent to evaluator (analysis-focused naming) - Add tester agent for test validation - Add QA skill for quality assurance workflows - Rename devflow-frontend-design plugin to devflow-ui-design - Update all references across 25 files (docs, plugins, skills, tests, configs) This consolidation improves semantic clarity and expands agent coverage for comprehensive development workflows.
dean0x
left a comment
There was a problem hiding this comment.
Code Review Summary
Based on 9 review reports (architecture, complexity, consistency, documentation, performance, regression, security, testing, typescript), here are the findings:
Blocking Issues (≥80% confidence)
1. Duplicate step 8 in README (95% confidence)
- File:
plugins/devflow-implement/README.md:33 - Problem: Steps 8 (Simplification) and 8 (PR Creation) have the same number
- Fix: Change line 33 from
8. **PR Creation**to9. **PR Creation**
2. Workflow order mismatch (82% confidence)
- File:
plugins/devflow-implement/README.md:25-33 - Problem: README shows QA Testing → Simplification → PR, but actual commands run Simplification → Evaluator → Tester → PR
- Fix: Reorder README workflow to match actual execution (see detailed comment)
3. README skill count stale (90% confidence)
- File:
plugins/devflow-implement/README.md:51 - Problem: Header says
(9)but plugin now includesqaskill - Fix: Add
qato skill list and update count to(10)
4. Missing qa skill in ambient plugin (82% confidence)
- File:
plugins/devflow-ambient/.claude-plugin/plugin.json - Problem: Plugin declares
testeragent but notqaskill in array - Fix: Add
"qa"to skills array
5. Tester agent exceeds size convention (85-90% confidence)
- File:
shared/agents/tester.md:1-195 - Problem: 195 lines exceeds CLAUDE.md target of 80-120 for Worker agents
- Fix: Extract Dev Server Lifecycle and Browser Execution to
shared/skills/qa/references/browser-testing.md
Security Issues (HIGH confidence)
6. Arbitrary Bash execution without sandboxing (85% confidence)
- File:
shared/agents/tester.md:44-66 - Problem: Tester executes scenarios derived from ORIGINAL_REQUEST without command restrictions
- Fix: Add Bash restrictions to Boundaries section (no rm -rf, sudo, eval, etc.)
7. Dev server log at predictable path (82% confidence)
- File:
shared/agents/tester.md:94 - Problem:
/tmp/devflow-tester-server.logvulnerable to symlink attacks - Fix: Use
mktemp /tmp/devflow-tester-XXXXXX.loginstead
Performance Issues
8. Variable shadowing in installer.ts (82% confidence)
- File:
src/cli/utils/installer.ts:215 - Problem:
agentsTargetdeclared twice in different scopes - Fix: Hoist before loop and reuse in both locations
9. Background memory update unbounded loop (83% confidence)
- File:
scripts/hooks/background-memory-update:3-9 - Problem: Removed
tail -3pre-filter unbounds subprocess spawning - Fix: Re-add
tail -3before while loop
Lower-Confidence Suggestions (60-79%)
- Missing
toolsfrontmatter on Tester agent (65%) - SRP tension in Tester (three distinct responsibilities) (70%)
- CHANGELOG not updated for shepherd→evaluator rename (60%, historical only)
- Legacy cleanup inconsistency between init.ts and installer.ts (70%)
Recommendation
CHANGES_REQUESTED — All blocking issues should be fixed before merge. The security concerns (arbitrary Bash, symlink attack) are the highest priority. The documentation fixes (duplicate step 8, workflow order, skill count) affect user understanding and should be corrected.
plugins/devflow-implement/README.md
Outdated
| 6. **Alignment Check** - Evaluator validates against original request | ||
| 7. **QA Testing** - Tester executes scenario-based acceptance tests | ||
| 8. **Simplification** - Simplifier refines code clarity | ||
| 8. **PR Creation** - Git agent creates pull request |
There was a problem hiding this comment.
**Step numbering error (95% confidence)
This step should be numbered `9` (not `8`). The sequence should be:
-
- QA Testing
-
- Simplification
-
- PR Creation ← fix this line
plugins/devflow-implement/README.md
Outdated
| @@ -27,8 +27,9 @@ npx devflow-kit init --plugin=implement | |||
| 3. **Implementation** - Coder agent implements on feature branch | |||
| 4. **Validation** - Validator runs build/test/lint checks | |||
| 5. **Self-Review** - Scrutinizer evaluates against 9-pillar framework | |||
There was a problem hiding this comment.
Workflow order does not match actual execution (82% confidence)
README shows QA Testing (7) before Simplification (8), but both implement.md and implement-teams.md run Simplification much earlier (Phase 9) before Evaluator (Phase 12) and Tester (Phase 13).
Correct order to match actual command phases:
- Exploration
- Planning
- Implementation
- Validation
- Simplification ← moved here
- Self-Review
- Alignment Check
- QA Testing
- PR Creation
plugins/devflow-implement/README.md
Outdated
| - `tester` - Scenario-based QA testing | ||
| - `validator` - Build/test validation | ||
|
|
||
| ### Skills (9) |
There was a problem hiding this comment.
Skill count is stale (90% confidence)
Header says `(9)` but plugin.json now includes `qa` skill. README lists 9 total from core+plugin dependencies.
Add `qa` to the list and update count:
```
Skills (10)
- software-design - Result types, DI, immutability
- git - Git safety, atomic commits
- implementation-patterns - CRUD, API, events
- testing - Test quality, coverage
- boundary-validation - Boundary validation
- self-review - 9-pillar framework
- qa - Scenario-based acceptance testing
- typescript - TypeScript patterns
- react - React patterns
- accessibility - Keyboard, ARIA, focus
```
| - **Integration**: Components work together correctly | ||
| - **Regression**: Existing behavior preserved (if applicable) | ||
| 6. **Execute scenarios**: Run each via Bash (or browser for web scenarios) — capture stdout/stderr, exit codes, file state | ||
| 7. **Evaluate results**: Compare actual vs expected behavior for each scenario |
There was a problem hiding this comment.
Bash execution without sandboxing (85% confidence - SECURITY)
Tester designs scenarios from ORIGINAL_REQUEST and executes them via Bash. A malicious issue could induce harmful commands.
Add explicit restrictions to Boundaries section:
- NEVER execute: rm -rf, sudo, eval, exec, dd, mkfs
- NEVER write outside /tmp/devflow-tester-*/
- ONLY run: test runners, curl, read-only inspection, build commands
shared/agents/tester.md
Outdated
| - Defaults by framework: Next.js→3000, Vite→5173, CRA→3000, Django→8000, Go→8080 | ||
|
|
||
| ### 4. Start Server (if not already running) | ||
| - Run in background: `npm run dev > /tmp/devflow-tester-server.log 2>&1 &` |
There was a problem hiding this comment.
Symlink attack risk (82% confidence - SECURITY)
Hardcoded path `/tmp/devflow-tester-server.log` is vulnerable on multi-user systems (CWE-59).
Use `mktemp` instead:
```bash
LOG=$(mktemp /tmp/devflow-tester-XXXXXX.log)
npm run dev > "$LOG" 2>&1 &
```
|
|
||
| # Tester Agent | ||
|
|
||
| You are a scenario-based QA specialist. You design and execute acceptance tests that verify implementation behavior from the user's perspective. You test what was asked for, not implementation details. You report results with evidence — you never fix code yourself. |
There was a problem hiding this comment.
Agent size exceeds convention (85% confidence)
195 lines exceeds CLAUDE.md target of 80-120 for Worker agents. Dev Server Lifecycle (lines 70-106) and Browser Execution (lines 108-125) are procedural.
Suggested fix:
- Extract to `shared/skills/qa/references/browser-testing.md`
- Reference from agent with brief summary + Read instruction
- This brings agent to ~140 lines (within limit)
src/cli/utils/installer.ts
Outdated
| } | ||
|
|
||
| // Clean up legacy agent files (renamed or removed agents from prior versions) | ||
| const agentsTarget = path.join(claudeDir, 'agents', 'devflow'); |
There was a problem hiding this comment.
Variable shadowing: `agentsTarget` declared twice (82% confidence)
`agentsTarget` is declared both inside the for-loop (line 196) and after it (line 215), creating maintenance hazard.
Hoist before loop and reuse:
```typescript
// Before for-loop (around line 167):
const agentsTarget = path.join(claudeDir, 'agents', 'devflow');
// Inside/after loop: reuse, don't redeclare
```
…s list - Correct step ordering to match actual command pipeline (Simplification before Self-Review, before Alignment Check, before QA Testing) - Fix duplicate step number 8 (PR Creation renumbered to 9) - Replace outdated skills list with current plugin.json skills (6 skills: agent-teams, implementation-patterns, knowledge-persistence, qa, self-review, worktree-support)
…d-memory-update Re-add tail -3 pre-filter before the json_extract_messages while loop so at most 3 subprocesses are spawned per pipeline regardless of transcript size. Add head -c 100000 on each line before piping to json_extract_messages to prevent resource exhaustion from oversized transcript entries.
…NT_NAMES consistency tests Adds two missing test coverage items from batch-5 review: - Verify devflow-implement and devflow-ambient declare evaluator/tester agents and qa skill - Verify no LEGACY_AGENT_NAMES entry overlaps with current plugin agent registrations
… update skill count - Add missing 'qa' skill to devflow-ambient plugin.json (tester agent dependency) - Hoist agentsTarget before loop in installer.ts to eliminate variable shadowing - Update README skill count from 35 to 38 to match actual shared/skills/ directory count
…procedures Add tools frontmatter restricting Tester to Read/Grep/Glob/Bash and Chrome MCP tools only. Extract Dev Server Lifecycle and Browser Execution sections to shared/skills/qa/references/browser-testing.md, reducing agent from 196 to 142 lines. Move Bash execution deny list (rm -rf, sudo, eval, package installs) and filesystem write constraints into the reference. Fix predictable /tmp log path to use mktemp, and restrict .env reading to PORT variable only via grep.
…nce, extend test - Add qa to devflow-ambient skills array in plugins.ts (was in plugin.json only) - Remove duplicate browser-testing reference in tester.md - Extend ambient test to assert qa skill dependency
Summary
Agent and skill consolidation to improve semantic clarity and expand development workflow coverage:
Changes
Test Plan
npm run builddevflow listshows updated plugin and agent registry