feat(v2): agent consolidation — evaluator rename, tester agent, QA skill by dean0x · Pull Request #171 · dean0x/devflow

dean0x · 2026-04-02T22:55:08Z

Summary

Agent and skill consolidation to improve semantic clarity and expand development workflow coverage:

Evaluator agent: Renamed from shepherd for clearer analysis-focused intent
Tester agent: New agent for test validation and QA workflows
QA skill: Comprehensive quality assurance patterns and best practices
UI Design plugin: Renamed from frontend-design for consistency with skill naming

Changes

35 files modified/created
913 insertions(+), 82 deletions(-)
Includes agent file renames, skill additions, plugin renames, and cascading reference updates

Test Plan

Build passes: npm run build
Test suite passes: all 134+ tests
CLI verified: devflow list shows updated plugin and agent registry
No broken references in documentation or configurations

…ill, UI design plugin - Rename shepherd agent to evaluator (analysis-focused naming) - Add tester agent for test validation - Add QA skill for quality assurance workflows - Rename devflow-frontend-design plugin to devflow-ui-design - Update all references across 25 files (docs, plugins, skills, tests, configs) This consolidation improves semantic clarity and expands agent coverage for comprehensive development workflows.

dean0x

Code Review Summary

Based on 9 review reports (architecture, complexity, consistency, documentation, performance, regression, security, testing, typescript), here are the findings:

Blocking Issues (≥80% confidence)

1. Duplicate step 8 in README (95% confidence)

File: plugins/devflow-implement/README.md:33
Problem: Steps 8 (Simplification) and 8 (PR Creation) have the same number
Fix: Change line 33 from 8. **PR Creation** to 9. **PR Creation**

2. Workflow order mismatch (82% confidence)

File: plugins/devflow-implement/README.md:25-33
Problem: README shows QA Testing → Simplification → PR, but actual commands run Simplification → Evaluator → Tester → PR
Fix: Reorder README workflow to match actual execution (see detailed comment)

3. README skill count stale (90% confidence)

File: plugins/devflow-implement/README.md:51
Problem: Header says (9) but plugin now includes qa skill
Fix: Add qa to skill list and update count to (10)

4. Missing qa skill in ambient plugin (82% confidence)

File: plugins/devflow-ambient/.claude-plugin/plugin.json
Problem: Plugin declares tester agent but not qa skill in array
Fix: Add "qa" to skills array

5. Tester agent exceeds size convention (85-90% confidence)

File: shared/agents/tester.md:1-195
Problem: 195 lines exceeds CLAUDE.md target of 80-120 for Worker agents
Fix: Extract Dev Server Lifecycle and Browser Execution to shared/skills/qa/references/browser-testing.md

Security Issues (HIGH confidence)

6. Arbitrary Bash execution without sandboxing (85% confidence)

File: shared/agents/tester.md:44-66
Problem: Tester executes scenarios derived from ORIGINAL_REQUEST without command restrictions
Fix: Add Bash restrictions to Boundaries section (no rm -rf, sudo, eval, etc.)

7. Dev server log at predictable path (82% confidence)

File: shared/agents/tester.md:94
Problem: /tmp/devflow-tester-server.log vulnerable to symlink attacks
Fix: Use mktemp /tmp/devflow-tester-XXXXXX.log instead

Performance Issues

8. Variable shadowing in installer.ts (82% confidence)

File: src/cli/utils/installer.ts:215
Problem: agentsTarget declared twice in different scopes
Fix: Hoist before loop and reuse in both locations

9. Background memory update unbounded loop (83% confidence)

File: scripts/hooks/background-memory-update:3-9
Problem: Removed tail -3 pre-filter unbounds subprocess spawning
Fix: Re-add tail -3 before while loop

Lower-Confidence Suggestions (60-79%)

Missing tools frontmatter on Tester agent (65%)
SRP tension in Tester (three distinct responsibilities) (70%)
CHANGELOG not updated for shepherd→evaluator rename (60%, historical only)
Legacy cleanup inconsistency between init.ts and installer.ts (70%)

Recommendation

CHANGES_REQUESTED — All blocking issues should be fixed before merge. The security concerns (arbitrary Bash, symlink attack) are the highest priority. The documentation fixes (duplicate step 8, workflow order, skill count) affect user understanding and should be corrected.

dean0x · 2026-04-02T23:00:19Z

plugins/devflow-implement/README.md

+6. **Alignment Check** - Evaluator validates against original request
+7. **QA Testing** - Tester executes scenario-based acceptance tests
+8. **Simplification** - Simplifier refines code clarity
 8. **PR Creation** - Git agent creates pull request


**Step numbering error (95% confidence)

This step should be numbered `9` (not `8`). The sequence should be:

QA Testing

Simplification

PR Creation ← fix this line

dean0x · 2026-04-02T23:00:30Z

plugins/devflow-implement/README.md

@@ -27,8 +27,9 @@ npx devflow-kit init --plugin=implement
 3. **Implementation** - Coder agent implements on feature branch
 4. **Validation** - Validator runs build/test/lint checks
 5. **Self-Review** - Scrutinizer evaluates against 9-pillar framework


Workflow order does not match actual execution (82% confidence)

README shows QA Testing (7) before Simplification (8), but both implement.md and implement-teams.md run Simplification much earlier (Phase 9) before Evaluator (Phase 12) and Tester (Phase 13).

Correct order to match actual command phases:

Exploration

Planning

Implementation

Validation

Simplification ← moved here

Self-Review

Alignment Check

QA Testing

PR Creation

dean0x · 2026-04-02T23:00:32Z

plugins/devflow-implement/README.md

+- `tester` - Scenario-based QA testing
 - `validator` - Build/test validation

 ### Skills (9)


Skill count is stale (90% confidence)

Header says `(9)` but plugin.json now includes `qa` skill. README lists 9 total from core+plugin dependencies.

Add `qa` to the list and update count:
```

Skills (10)

software-design - Result types, DI, immutability

git - Git safety, atomic commits

implementation-patterns - CRUD, API, events

testing - Test quality, coverage

boundary-validation - Boundary validation

self-review - 9-pillar framework

qa - Scenario-based acceptance testing

typescript - TypeScript patterns

react - React patterns

accessibility - Keyboard, ARIA, focus
```

dean0x · 2026-04-02T23:00:47Z

shared/agents/tester.md

+   - **Integration**: Components work together correctly
+   - **Regression**: Existing behavior preserved (if applicable)
+6. **Execute scenarios**: Run each via Bash (or browser for web scenarios) — capture stdout/stderr, exit codes, file state
+7. **Evaluate results**: Compare actual vs expected behavior for each scenario


Bash execution without sandboxing (85% confidence - SECURITY)

Tester designs scenarios from ORIGINAL_REQUEST and executes them via Bash. A malicious issue could induce harmful commands.

Add explicit restrictions to Boundaries section:

NEVER execute: rm -rf, sudo, eval, exec, dd, mkfs

NEVER write outside /tmp/devflow-tester-*/

ONLY run: test runners, curl, read-only inspection, build commands

dean0x · 2026-04-02T23:00:49Z

shared/agents/tester.md

+- Defaults by framework: Next.js→3000, Vite→5173, CRA→3000, Django→8000, Go→8080
+
+### 4. Start Server (if not already running)
+- Run in background: `npm run dev > /tmp/devflow-tester-server.log 2>&1 &`


Symlink attack risk (82% confidence - SECURITY)

Hardcoded path `/tmp/devflow-tester-server.log` is vulnerable on multi-user systems (CWE-59).

Use `mktemp` instead:
```bash
LOG=$(mktemp /tmp/devflow-tester-XXXXXX.log)
npm run dev > "$LOG" 2>&1 &
```

dean0x · 2026-04-02T23:00:51Z

shared/agents/tester.md

+
+# Tester Agent
+
+You are a scenario-based QA specialist. You design and execute acceptance tests that verify implementation behavior from the user's perspective. You test what was asked for, not implementation details. You report results with evidence — you never fix code yourself.


Agent size exceeds convention (85% confidence)

195 lines exceeds CLAUDE.md target of 80-120 for Worker agents. Dev Server Lifecycle (lines 70-106) and Browser Execution (lines 108-125) are procedural.

Suggested fix:

Extract to `shared/skills/qa/references/browser-testing.md`

Reference from agent with brief summary + Read instruction

This brings agent to ~140 lines (within limit)

dean0x · 2026-04-02T23:01:03Z

src/cli/utils/installer.ts

  }

+  // Clean up legacy agent files (renamed or removed agents from prior versions)
+  const agentsTarget = path.join(claudeDir, 'agents', 'devflow');


Variable shadowing: `agentsTarget` declared twice (82% confidence)

`agentsTarget` is declared both inside the for-loop (line 196) and after it (line 215), creating maintenance hazard.

Hoist before loop and reuse:
```typescript
// Before for-loop (around line 167):
const agentsTarget = path.join(claudeDir, 'agents', 'devflow');

// Inside/after loop: reuse, don't redeclare
```

…s list - Correct step ordering to match actual command pipeline (Simplification before Self-Review, before Alignment Check, before QA Testing) - Fix duplicate step number 8 (PR Creation renumbered to 9) - Replace outdated skills list with current plugin.json skills (6 skills: agent-teams, implementation-patterns, knowledge-persistence, qa, self-review, worktree-support)

…d-memory-update Re-add tail -3 pre-filter before the json_extract_messages while loop so at most 3 subprocesses are spawned per pipeline regardless of transcript size. Add head -c 100000 on each line before piping to json_extract_messages to prevent resource exhaustion from oversized transcript entries.

…NT_NAMES consistency tests Adds two missing test coverage items from batch-5 review: - Verify devflow-implement and devflow-ambient declare evaluator/tester agents and qa skill - Verify no LEGACY_AGENT_NAMES entry overlaps with current plugin agent registrations

… update skill count - Add missing 'qa' skill to devflow-ambient plugin.json (tester agent dependency) - Hoist agentsTarget before loop in installer.ts to eliminate variable shadowing - Update README skill count from 35 to 38 to match actual shared/skills/ directory count

…procedures Add tools frontmatter restricting Tester to Read/Grep/Glob/Bash and Chrome MCP tools only. Extract Dev Server Lifecycle and Browser Execution sections to shared/skills/qa/references/browser-testing.md, reducing agent from 196 to 142 lines. Move Bash execution deny list (rm -rf, sudo, eval, package installs) and filesystem write constraints into the reference. Fix predictable /tmp log path to use mktemp, and restrict .env reading to PORT variable only via grep.

…nce, extend test - Add qa to devflow-ambient skills array in plugins.ts (was in plugin.json only) - Remove duplicate browser-testing reference in tester.md - Extend ambient test to assert qa skill dependency

dean0x commented Apr 2, 2026

View reviewed changes

Dean Sharon added 6 commits April 3, 2026 02:11

refactor: simplifier cleanup — sync qa skill, remove duplicate refere…

e326f37

…nce, extend test - Add qa to devflow-ambient skills array in plugins.ts (was in plugin.json only) - Remove duplicate browser-testing reference in tester.md - Extend ambient test to assert qa skill dependency

dean0x merged commit efb5a0d into main Apr 3, 2026
4 checks passed

dean0x deleted the feat/evaluator-rename-tester-agent branch April 3, 2026 10:42

dean0x mentioned this pull request Apr 7, 2026

v2.0.0 Release: Plan/Design workflow, working memory fix, learning system review #170

Open

42 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(v2): agent consolidation — evaluator rename, tester agent, QA skill#171

feat(v2): agent consolidation — evaluator rename, tester agent, QA skill#171
dean0x merged 7 commits intomainfrom
feat/evaluator-rename-tester-agent

dean0x commented Apr 2, 2026

Uh oh!

dean0x left a comment

Uh oh!

dean0x Apr 2, 2026

Uh oh!

dean0x Apr 2, 2026

Uh oh!

dean0x Apr 2, 2026

Uh oh!

dean0x Apr 2, 2026

Uh oh!

dean0x Apr 2, 2026

Uh oh!

dean0x Apr 2, 2026

Uh oh!

dean0x Apr 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant


		# Tester Agent

		You are a scenario-based QA specialist. You design and execute acceptance tests that verify implementation behavior from the user's perspective. You test what was asked for, not implementation details. You report results with evidence — you never fix code yourself.

Conversation

dean0x commented Apr 2, 2026

Summary

Changes

Test Plan

Uh oh!

dean0x left a comment

Choose a reason for hiding this comment

Code Review Summary

Blocking Issues (≥80% confidence)

Security Issues (HIGH confidence)

Performance Issues

Lower-Confidence Suggestions (60-79%)

Recommendation

Uh oh!

dean0x Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

dean0x Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

dean0x Apr 2, 2026

Choose a reason for hiding this comment

Skills (10)

Uh oh!

dean0x Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

dean0x Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

dean0x Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

dean0x Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant