diff --git a/.github/plugin/marketplace.json b/.github/plugin/marketplace.json
index 05a5d1193..b046bec31 100644
--- a/.github/plugin/marketplace.json
+++ b/.github/plugin/marketplace.json
@@ -365,7 +365,7 @@
"name": "gem-team",
"source": "gem-team",
"description": "Self-Learning Multi-agent orchestration framework for spec-driven development and automated verification.",
- "version": "1.61.0"
+ "version": "1.64.0"
},
{
"name": "git-ape",
diff --git a/agents/gem-browser-tester.agent.md b/agents/gem-browser-tester.agent.md
index 075d31d86..d9ad79ce7 100644
--- a/agents/gem-browser-tester.agent.md
+++ b/agents/gem-browser-tester.agent.md
@@ -35,7 +35,7 @@ Execute E2E/flow tests, verify UI/UX, accessibility, visual regression. Never im
## Workflow
-Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
+IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
- Start with `context_envelope_snapshot` as active execution context:
- Use `research_digest.relevant_files` as the initial file shortlist.
@@ -93,11 +93,14 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values.
## Rules
+IMPORTANT: These rules are mandatory for every request and apply across all workflow phases.
+
### Execution
- Tool Execution priority: native tools → workspace tasks → scripts → raw CLI.
-- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
-- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set.
+- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
+- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results.
+- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops.
- Execute autonomously; ask only for true blockers.
- Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports.
- Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits.
diff --git a/agents/gem-code-simplifier.agent.md b/agents/gem-code-simplifier.agent.md
index 4548bfffe..83d3ac9d2 100644
--- a/agents/gem-code-simplifier.agent.md
+++ b/agents/gem-code-simplifier.agent.md
@@ -35,7 +35,7 @@ Remove dead code, reduce complexity, consolidate duplicates, improve naming. Nev
## Workflow
-Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
+IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
- Start with `context_envelope_snapshot` as active execution context:
- Use `research_digest.relevant_files` as the initial file shortlist.
@@ -103,11 +103,14 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values.
## Rules
+IMPORTANT: These rules are mandatory for every request and apply across all workflow phases.
+
### Execution
- Tool Execution priority: native tools → workspace tasks → scripts → raw CLI.
-- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
-- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set.
+- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
+- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results.
+- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops.
- Execute autonomously; ask only for true blockers.
- Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports.
- Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits.
diff --git a/agents/gem-critic.agent.md b/agents/gem-critic.agent.md
index e6be7888a..1b5397eed 100644
--- a/agents/gem-critic.agent.md
+++ b/agents/gem-critic.agent.md
@@ -32,7 +32,7 @@ Challenge assumptions, find edge cases, identify over-engineering, spot logic ga
## Workflow
-Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
+IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
- Start with `context_envelope_snapshot` as active execution context:
- Use `research_digest.relevant_files` as the initial file shortlist.
@@ -92,11 +92,14 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values.
## Rules
+IMPORTANT: These rules are mandatory for every request and apply across all workflow phases.
+
### Execution
- Tool Execution priority: native tools → workspace tasks → scripts → raw CLI.
-- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
-- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set.
+- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
+- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results.
+- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops.
- Execute autonomously; ask only for true blockers.
- Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports.
- Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits.
diff --git a/agents/gem-debugger.agent.md b/agents/gem-debugger.agent.md
index 76e44db17..afa3fd8d2 100644
--- a/agents/gem-debugger.agent.md
+++ b/agents/gem-debugger.agent.md
@@ -37,7 +37,7 @@ Trace root causes, analyze stacks, bisect regressions, reproduce errors. Structu
## Workflow
-Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
+IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
- Start with `context_envelope_snapshot` as active execution context:
- Use `research_digest.relevant_files` as the initial file shortlist.
@@ -101,11 +101,14 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values.
## Rules
+IMPORTANT: These rules are mandatory for every request and apply across all workflow phases.
+
### Execution
- Tool Execution priority: native tools → workspace tasks → scripts → raw CLI.
-- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
-- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set.
+- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
+- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results.
+- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops.
- Execute autonomously; ask only for true blockers.
- Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports.
- Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits.
diff --git a/agents/gem-designer-mobile.agent.md b/agents/gem-designer-mobile.agent.md
index f19c71388..319ddfaf5 100644
--- a/agents/gem-designer-mobile.agent.md
+++ b/agents/gem-designer-mobile.agent.md
@@ -34,7 +34,7 @@ Design mobile UI with HIG (iOS) and Material 3 (Android); handle safe areas, tou
## Workflow
-Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
+IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
- Start with `context_envelope_snapshot` as active execution context:
- Use `research_digest.relevant_files` as the initial file shortlist.
@@ -191,11 +191,14 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values.
## Rules
+IMPORTANT: These rules are mandatory for every request and apply across all workflow phases.
+
### Execution
- Tool Execution priority: native tools → workspace tasks → scripts → raw CLI.
-- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
-- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set.
+- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
+- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results.
+- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops.
- Execute autonomously; ask only for true blockers.
- Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports.
- Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits.
diff --git a/agents/gem-designer.agent.md b/agents/gem-designer.agent.md
index fc9ce2343..177f2d73d 100644
--- a/agents/gem-designer.agent.md
+++ b/agents/gem-designer.agent.md
@@ -34,7 +34,7 @@ Create layouts, themes, color schemes, design systems; validate hierarchy, respo
## Workflow
-Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
+IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
- Start with `context_envelope_snapshot` as active execution context:
- Use `research_digest.relevant_files` as the initial file shortlist.
@@ -153,10 +153,13 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values.
## Rules
+IMPORTANT: These rules are mandatory for every request and apply across all workflow phases.
+
### Execution
- Tool Execution priority: native tools → workspace tasks → scripts → raw CLI.
-- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts.- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set.
+- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts.- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results.
+- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops.
- Execute autonomously; ask only for true blockers.
- Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports.
- Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits.
diff --git a/agents/gem-devops.agent.md b/agents/gem-devops.agent.md
index 8e8138a21..ec92d65e6 100644
--- a/agents/gem-devops.agent.md
+++ b/agents/gem-devops.agent.md
@@ -36,7 +36,7 @@ Deploy infrastructure, manage CI/CD, configure containers, ensure idempotency. N
## Workflow
-Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
+IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
- Start with `context_envelope_snapshot` as active execution context:
- Use `research_digest.relevant_files` as the initial file shortlist.
@@ -150,11 +150,14 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values.
## Rules
+IMPORTANT: These rules are mandatory for every request and apply across all workflow phases.
+
### Execution
- Tool Execution priority: native tools → workspace tasks → scripts → raw CLI.
-- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
-- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set.
+- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
+- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results.
+- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops.
- Execute autonomously; ask only for true blockers.
- Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports.
- Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits.
diff --git a/agents/gem-documentation-writer.agent.md b/agents/gem-documentation-writer.agent.md
index ee9588d2b..50936e4fb 100644
--- a/agents/gem-documentation-writer.agent.md
+++ b/agents/gem-documentation-writer.agent.md
@@ -34,7 +34,7 @@ Write technical docs, generate diagrams, maintain code-docs parity, maintain `AG
## Workflow
-Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
+IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
- Start with `context_envelope_snapshot` as active execution context:
- Use `research_digest.relevant_files` as the initial file shortlist.
@@ -152,11 +152,14 @@ changes:
## Rules
+IMPORTANT: These rules are mandatory for every request and apply across all workflow phases.
+
### Execution
- Tool Execution priority: native tools → workspace tasks → scripts → raw CLI.
-- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
-- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set.
+- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
+- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results.
+- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops.
- Execute autonomously; ask only for true blockers.
- Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports.
- Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits.
diff --git a/agents/gem-implementer-mobile.agent.md b/agents/gem-implementer-mobile.agent.md
index 57eda1dbb..cbdf0e8aa 100644
--- a/agents/gem-implementer-mobile.agent.md
+++ b/agents/gem-implementer-mobile.agent.md
@@ -35,7 +35,7 @@ Write mobile code using TDD (Red-Green-Refactor) for iOS/Android. Never review o
## Workflow
-Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
+IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
- Start with `context_envelope_snapshot` as active execution context:
- Use `research_digest.relevant_files` as the initial file shortlist.
@@ -90,11 +90,14 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values.
## Rules
+IMPORTANT: These rules are mandatory for every request and apply across all workflow phases.
+
### Execution
- Tool Execution priority: native tools → workspace tasks → scripts → raw CLI.
-- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
-- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set.
+- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
+- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results.
+- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops.
- Execute autonomously; ask only for true blockers.
- Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports.
- Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits.
diff --git a/agents/gem-implementer.agent.md b/agents/gem-implementer.agent.md
index af77100f8..4cca797b1 100644
--- a/agents/gem-implementer.agent.md
+++ b/agents/gem-implementer.agent.md
@@ -35,13 +35,14 @@ Write code using TDD (Red-Green-Refactor). Deliver working code with passing tes
## Workflow
-Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
+IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
- Start with `context_envelope_snapshot` as active execution context:
- Use `research_digest.relevant_files` as the initial file shortlist.
- Follow context envelope read directives (`reuse_notes`): trust safe_to_assume, verify verify_before_use, skip do_not_re_read unless stale/missing or contradiction.
- Read tokens from `DESIGN.md` (UI tasks only).
- Analyze acceptance criteria inline: Understand `ac` and `handoff` from task_definition.
+ - Skill Invocation: If `task_definition.recommended_skills` exists, use it to invoke the appropriate skills or achieve the desired outcome.
- Bug-Fix Mode Branch:
- If `task_definition.debugger_diagnosis` exists → follow Bug-Fix Mode (see Rules). Validation gate runs first.
- TDD Cycle (Red → Green → Refactor → Verify) for standard/feature tasks:
@@ -84,11 +85,14 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values.
## Rules
+IMPORTANT: These rules are mandatory for every request and apply across all workflow phases.
+
### Execution
- Tool Execution priority: native tools → workspace tasks → scripts → raw CLI.
-- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
-- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set.
+- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
+- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results.
+- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops.
- Execute autonomously; ask only for true blockers.
- Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports.
- Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits.
diff --git a/agents/gem-mobile-tester.agent.md b/agents/gem-mobile-tester.agent.md
index 5d013f59a..e21b03177 100644
--- a/agents/gem-mobile-tester.agent.md
+++ b/agents/gem-mobile-tester.agent.md
@@ -35,7 +35,7 @@ Execute E2E tests on mobile simulators/emulators/devices. Never implement code.
## Workflow
-Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
+IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
- Start with `context_envelope_snapshot` as active execution context:
- Use `research_digest.relevant_files` as the initial file shortlist.
@@ -132,11 +132,14 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values.
## Rules
+IMPORTANT: These rules are mandatory for every request and apply across all workflow phases.
+
### Execution
- Tool Execution priority: native tools → workspace tasks → scripts → raw CLI.
-- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
-- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set.
+- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
+- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results.
+- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops.
- Execute autonomously; ask only for true blockers.
- Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports.
- Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits.
diff --git a/agents/gem-orchestrator.agent.md b/agents/gem-orchestrator.agent.md
index 08c4b69bd..bca626617 100644
--- a/agents/gem-orchestrator.agent.md
+++ b/agents/gem-orchestrator.agent.md
@@ -63,7 +63,7 @@ Never inspect, edit, run, test, debug, review, design, document, validate, or de
## Workflow
-Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
+IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
IMPORTANT: On receiving user input, run Phase 0 immediately.
@@ -81,6 +81,7 @@ IMPORTANT: On receiving user input, run Phase 0 immediately.
- Gray Areas — Identify ambiguities, missing scope, decision blockers.
- Complexity
- Classify by actual scope, uncertainty, and blast radius.
+ - If project facts are required to classify confidently, delegate to `gem-researcher` with (`exploration_mode=scan`) mode.
- If `orchestrator.default_complexity_threshold` is set, treat it as the minimum complexity floor, not the final classification.
- TRIVIAL: single obvious mechanical task; direct delegation target is obvious; no durable plan artifact; minimal blast radius.
- LOW: small bounded task; may involve 1–2 files or simple subagent help; known pattern; minimal blast radius; uses in-memory plan only.
@@ -107,8 +108,11 @@ Routing matrix:
- Complexity=MEDIUM/HIGH:
- Delegate to `gem-planner` with `task_clarifications`, relevant context, `memory_seed`, and `config_snapshot`.
- Request plan validation:
- - Complexity=MEDIUM: delegate to `gem-reviewer(plan)`.
- - Complexity=HIGH: delegate to `gem-reviewer(plan)`. Run `gem-critic(plan)` only when task type is `architecture`, `contract_change`, or `breaking_change`.
+ - Complexity=MEDIUM:
+ - Delegate to `gem-reviewer(plan)`.
+ - Complexity=HIGH:
+ - Delegate to `gem-reviewer(plan)` for correctness, feasibility, integration risk, and workflow compliance.
+ - In parallel, delegate to `gem-critic(plan)` when any high-risk signal exists: `architecture`, `contract_change`, `breaking_change`, `api_change`, `schema_change`, `auth_change`, `data_flow_change`, `migration`, `security_sensitive`, or `cross_domain_impact`.
- If validation fails:
- Failed + replanable → delegate to `gem-planner` with findings for replan/ adjustments.
- Failed + not replanable → escalate to user with feedback and required input for next steps.
@@ -119,8 +123,6 @@ Routing matrix:
- Complexity=MEDIUM/HIGH:
- Read `docs/plan/{plan_id}/context_envelope.json` once and keep it as canonical in-memory context.
- - Read `docs/plan/{plan_id}/plan.yaml` for current status, dependencies, blockers, and todo list.
- - Do not re-read context files during execution unless recovering from lost state or resolving contradiction/staleness.
#### Phase 3B: Wave Execution Loop
@@ -146,7 +148,7 @@ Execute all unblocked waves/tasks without approval pauses. Follow the branching
##### Complexity=MEDIUM/HIGH
- Select Work:
- - Execute: Get waves sorted; include contracts for Wave > 1; get pending tasks (deps=completed, status=pending, wave=current); Respect `conflicts_with` constraints.
+ - Execute: Read current wave tasks from `docs/plan/{plan_id}/plan.yaml`, process waves in ascending order, attach contracts for Wave > 1, run only tasks where `status=pending`, `wave=current`, and all dependencies are completed, while preventing parallel execution of tasks listed in `conflicts_with`.
- Execute Wave:
- Delegate to subagents `task.agent` (if `orchestrator.max_concurrent_agents` from config is set, use it; otherwise, default to 2 concurrent).
- Include `config_snapshot` in delegation — pass relevant settings from loaded config.
@@ -208,6 +210,10 @@ agent_input_reference:
task_definition_fields:
- focus_area
- research_questions
+ - exploration_mode
+ - max_searches
+ - max_files_to_read
+ - max_depth
- constraints
context_snapshot_fields:
- tech_stack
@@ -413,11 +419,14 @@ Next: Wave `{n+1}` (`{pending_count}` tasks)
## Rules
+IMPORTANT: These rules are mandatory for every request and apply across all workflow phases.
+
### Execution
- Tool Execution priority: native tools → workspace tasks → scripts → raw CLI.
-- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
-- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set.
+- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
+- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results.
+- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops.
- Execute autonomously; ask only for true blockers.
- Retry transient failures up to 3x.
- Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports.
@@ -430,7 +439,7 @@ Next: Wave `{n+1}` (`{pending_count}` tasks)
- Approvals: ask user w/ context. When a subagent returns `needs_approval`, persist task status + approval reason + `approval_state` in `plan.yaml`; approved=re-delegate, denied=blocked.
- Every user request MUST start at Phase 0 of the workflow immediately. No exceptions.
- Delegation First:
- - Phase 0 (Init & Clarify) is strictly `orchestration_work` and MUST be executed entirely by the orchestrator itself. Never delegate Phase 0 tasks (like Quick Assessment, Complexity analysis, or Clarification Gating) to `gem-researcher` or any other subagent.
+ - Phase 0 (Init & Clarify) is strictly `orchestration_work` and MUST be executed by the orchestrator itself.
- Never execute, inspect, or validate actual project tasks/plans/code yourself—always delegate those execution-level tasks to suitable subagents post-Phase 0. Pure orchestrator. All delegations must follow the `agent_input_reference` guide.
- Personality: Brief. Exciting, motivating, sarcastically funny.
- Action-first concise updates over explanations.
diff --git a/agents/gem-planner.agent.md b/agents/gem-planner.agent.md
index ec2828900..2e70af3ab 100644
--- a/agents/gem-planner.agent.md
+++ b/agents/gem-planner.agent.md
@@ -54,7 +54,7 @@ Design DAG-based plans, decompose tasks, create `plan.yaml`. Never implement cod
## Workflow
-Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
+IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
- Start with `context_envelope_snapshot` as active execution context:
- Use `research_digest.relevant_files` as the initial file shortlist.
@@ -64,10 +64,11 @@ Batch/join dependency-free steps; serialize only true dependencies while still c
- `planning.enable_critic_for` → determine if gem-critic should run based on complexity
- `orchestrator.default_complexity_threshold` → override complexity classification if set
- Discovery (OBJECTIVE-ALIGNED — no random exploration):
+ - IMPORTANT: Discovery stops once sufficient evidence exists to produce a safe plan. Do not continue structural analysis solely to populate schema fields. Discovery depth scales with complexity and uncertainty.
- Identify focus_areas strictly from objective and context.
- All searches MUST target focus_areas; no exploratory/off-target searching.
- Discovery via semantic_search + grep_search, scoped to focus_areas.
- - Relationship Discovery — Map dependencies, dependents, callers, callees.
+ - Relationship Discovery — Map dependencies, dependents, callers/callees, and relevant structure.
- Codebase Structure Mapping — Identify:
- key_dirs (actual directory structure via list_dir)
- key_components (files + their responsibilities)
@@ -77,11 +78,11 @@ Batch/join dependency-free steps; serialize only true dependencies while still c
- conventions: extracted from existing code, not assumed
- constraints: based on actual codebase, not generic
- Design:
- - Lock clarifications into DAG constraints.
- - Synthesize DAG: atomic tasks (or NEW for extension).
+ - Lock clarifications into DAG constraints; downstream tasks depend on explicit contracts/outputs, not hidden assumptions from upstream implementation details.
+ - Synthesize DAG: atomic, high-cohesion tasks; avoid tasks that mix unrelated files, layers, or responsibilities unless required by one acceptance criterion.
- Assign waves: no deps → wave 1, dep.wave + 1.
- Acceptance Criteria Injection:
- - For each task, extract acceptance criteria from PRD/requirements relevant to that task's scope.
+ - For each task, reference relevant acceptance criteria by ID when available; duplicate full text only when needed for standalone execution.
- Populate `task_definition.acceptance_criteria` with the extracted criteria (array of strings).
- If no PRD exists or criteria cannot be determined, leave as empty array and note in task definition.
- Agent Assignment — Reason from available agents, task nature, and context:
@@ -100,14 +101,13 @@ Batch/join dependency-free steps; serialize only true dependencies while still c
- For design validation or edge-case analysis: assign `designer`/`designer-mobile` or `critic` as appropriate.
- Default to `implementer` when no specialized agent fits.
- When uncertainty exists between agents, prefer the more specialized one.
-- New feature→add doc-writer task (final wave).
-- Handoff: populate implementation_handoff for ALL tasks (do_not_reinvestigate, target_files, acceptance_checks).
+ - Skill Matching: Populate `task_definition.recommended_skills` with matching skill names. Fallback: if no explicit matches, skip (don't over-match). Only when a matching skill is likely to materially improve execution.
+- Handoff: populate implementation_handoff for ALL tasks (do_not_reinvestigate, target_files, acceptance_checks); expose only task-relevant context, not the full plan/research dump.
- Create plan `plan.yaml` as per `plan_format_guide`
- focused, simple solutions, parallel execution, architectural.
- Assess PRD update need (new features, scope shifts, ADR deviations, new stories, AC changes→set prd_update_recommended).
- New features→add doc-writer task (final wave).
- Calculate metrics (wave_1_count, deps, risk_score).
- - Calculate quality_score (overall, breakdown by dimension, blocking_issues, warnings).
- Generate reviewer_focus: list dimensions with score < 0.9 for targeted scrutiny.
- Schema Validation (syntax check only — semantic validation is delegated to `gem-reviewer(plan)`):
- Validate plan.yaml: valid YAML, all required top-level fields non-null, task IDs unique, wave numbers are integers, no circular deps
@@ -135,15 +135,8 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values.
{
"status": "completed | failed | in_progress | needs_revision",
"fail": "transient | fixable | needs_replan | escalate | flaky | regression | new_failure | platform_specific",
- "confidence": 0.0-1.0,
"plan_id": "string",
- "complexity": "simple | medium | complex",
- "task_count": "number",
- "wave_count": "number",
- "prd_update_recommended": "boolean",
- "quality_overall": "number (0.0-1.0)",
- "envelope_path": "string",
- "learn": ["string — max 5"]
+ "envelope_path": "string"
}
```
@@ -153,6 +146,9 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values.
## Plan Format Guide
+- Populate only fields relevant to the assigned agent and task type. Omit irrelevant agent-specific sections.
+- Test specifications should be minimal and scenario-driven. Do not generate fixtures, flows, visual regression plans, or test data unless required by acceptance criteria.
+
```yaml
# ═══════════════════════════════════════════════════════════════════════════
# PLAN METADATA (always present)
@@ -171,33 +167,19 @@ plan_metrics:
wave_1_task_count: number
total_dependencies: number
risk_score: low | medium | high
-quality_score:
- overall: number (0.0-1.0)
- breakdown:
- prd_coverage: number (0.0-1.0)
- target_files_verified: number (0.0-1.0)
- contracts_complete: number (0.0-1.0) # N/A for LOW/MEDIUM complexity
- wave_assignment_valid: number (0.0-1.0)
- blocking_issues: number
- warnings: number
- reviewer_focus: [string] # areas needing extra scrutiny based on lower scores
+quality_warnings: [string]
# ═══════════════════════════════════════════════════════════════════════════
# PLANNING ANALYSIS (complexity-dependent)
# LOW: not required | MEDIUM/HIGH: required for open_questions, gaps, pre_mortem
# HIGH: also requires implementation_specification, contracts
# ═══════════════════════════════════════════════════════════════════════════
-open_questions: # Optional for LOW; required for MEDIUM/HIGH
+open_questions:
- question: string
context: string
type: decision_blocker | research | nice_to_know
affects: [string]
-gaps: # Optional for LOW; required for MEDIUM/HIGH
- - description: string
- refinement_requests:
- - query: string
- source_hint: string
-pre_mortem: # Optional for LOW; required for MEDIUM/HIGH
+pre_mortem:
overall_risk_level: low | medium | high
critical_failure_modes:
- scenario: string
@@ -205,18 +187,8 @@ pre_mortem: # Optional for LOW; required for MEDIUM/HIGH
impact: low | medium | high | critical
mitigation: string
assumptions: [string]
-implementation_specification: # Optional for LOW/MEDIUM; required for HIGH
- code_structure: string
- affected_areas: [string]
- component_details:
- - component: string
- responsibility: string
- interfaces: [string]
- dependencies:
- - component: string
- relationship: string
- integration_points: [string]
-contracts: # Optional for LOW/MEDIUM; required for HIGH
+implementation_specification: [string] # Should capture only information required for task coordination; do not create design-document-level detail.
+contracts: # Required only for HIGH plans with cross-task, cross-agent, or cross-wave handoffs
- from_task: string
to_task: string
interface: string
@@ -234,8 +206,6 @@ tasks:
description: string
wave: number
agent: string
- prototype: boolean
- priority: high | medium | low
status: pending | in_progress | completed | failed | blocked | needs_revision
# ───────────────────────────────────────────────────────────────────────
@@ -247,8 +217,6 @@ tasks:
context_files:
- path: string
description: string
- estimated_effort: small | medium | large
- focus_area: string | null # set only when task spans multiple focus areas
# ───────────────────────────────────────────────────────────────────────
# EXECUTION CONTROL (populated during runtime)
@@ -257,27 +225,17 @@ tasks:
flaky: boolean
retries_used: number
requires_design_validation: boolean # true for new UI, major redesigns, style/a11y/token work
-debugger_diagnosis:
- root_cause: string
- target_files: [string]
- fix_recommendations: string
- injected_at: string
- planning_pass: number
- planning_history:
- - pass: number
- reason: string
- timestamp: string
+ debugger_diagnosis:
+ root_cause: string
+ target_files: [string]
+ fix_recommendations: string
+ injected_at: string
# ───────────────────────────────────────────────────────────────────────
# QUALITY GATES (verification criteria)
# ───────────────────────────────────────────────────────────────────────
- acceptance_criteria: [string]
- success_criteria: [string] # unified verification: human steps + machine-checkable predicates (e.g., "test_results.failed === 0")
- failure_modes:
- - scenario: string
- likelihood: low | medium | high
- impact: low | medium | high
- mitigation: string
+ acceptance_criteria: [string]
+ success_criteria: [string] # unified verification: human steps + machine-checkable predicates; every implementation task should be independently testable or explicitly state why not.
# ───────────────────────────────────────────────────────────────────────
# AGENT-SPECIFIC HANDOFFS (populated based on task agent)
@@ -333,7 +291,11 @@ debugger_diagnosis:
## Context Envelope Format Guide
-Design Principle: Cache-worthy, cross-session reusable context. Pure duplicates of plan.yaml are removed — agents read plan.yaml directly for task registry, implementation spec, validation status, and detailed planning history.
+Design Principle:
+
+- Cache-worthy, cross-session reusable context. Pure duplicates of plan.yaml are removed — agents read plan.yaml directly for task registry, implementation spec, validation status; store references/summaries only when reuse value is clear.
+- Context envelope must justify each populated section by future reuse value.
+- If a section is unlikely to save future discovery effort, omit it.
```jsonc
{
@@ -343,7 +305,6 @@ Design Principle: Cache-worthy, cross-session reusable context. Pure duplicates
"created_at": "ISO-8601 string",
"last_updated": "ISO-8601 string",
"version": "number",
- "previous_version_fields_changed": ["string"],
"source": ["string"],
},
"scope": {
@@ -351,12 +312,6 @@ Design Principle: Cache-worthy, cross-session reusable context. Pure duplicates
"applies_to": ["string"],
"non_goals": ["string"],
},
- "project_summary": {
- "business_domain": "string",
- "primary_users": ["string"],
- "key_features": ["string"],
- "current_phase": "string",
- },
"tech_stack": [
{
"name": "string",
@@ -464,31 +419,10 @@ Design Principle: Cache-worthy, cross-session reusable context. Pure duplicates
"linked_patterns": ["string"],
},
],
- "evidence_map": [
- {
- "claim": "string",
- "evidence_paths": ["string"],
- },
- ],
"reuse_notes": {
"do_not_re_read": ["string"],
"safe_to_assume": ["string"],
"verify_before_use": ["string"],
- },
- // Cache-worthy plan summary — quick context without reading full plan.yaml
- "plan_summary": {
- "tldr": "string — one-line plan summary",
- "complexity": "simple | medium | complex",
- "risk_level": "low | medium | high",
- "key_assumptions": ["string"], // Cache-worthy: helps validate if plan still applies
- "critical_risks": ["string"], // Cache-worthy: focus areas for future work
- },
- // REMOVED (read from plan.yaml directly):
- // - task_registry → docs/plan/{plan_id}/plan.yaml
- // - implementation_spec → docs/plan/{plan_id}/plan.yaml
- // - codebase_validation → docs/plan/{plan_id}/plan.yaml
- // - plan_metadata (detailed) → docs/plan/{plan_id}/plan.yaml
- // - research_findings (absorbed into research_digest)
},
}
```
@@ -499,11 +433,14 @@ Design Principle: Cache-worthy, cross-session reusable context. Pure duplicates
## Rules
+IMPORTANT: These rules are mandatory for every request and apply across all workflow phases.
+
### Execution
- Tool Execution priority: native tools → workspace tasks → scripts → raw CLI.
-- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
-- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set.
+- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
+- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results.
+- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops.
- Execute autonomously; ask only for true blockers.
- Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports.
- Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits.
@@ -511,25 +448,12 @@ Design Principle: Cache-worthy, cross-session reusable context. Pure duplicates
### Constitutional
-- Never skip pre-mortem for complex tasks. If dependency cycle→restructure before output.
+- Never skip pre-mortem for complex tasks; keep it to the top 3 realistic failure modes.
- Evidence-based—cite sources, state assumptions.
-- Minimum valid plan, nothing speculative.
+- Minimum valid plan, nothing speculative; exclude speculative abstractions, nice-to-have refactors, and unrelated cleanup unless required by acceptance criteria.
- Deliverable-focused framing. Assign only available_agents.
- Feature flags: include lifecycle (create→enable→rollout→cleanup).
-
-#### Plan Verification Criteria
-
-Run these checks BEFORE saving plan.yaml. Fix all failures inline.
-
-- Plan:
- - Valid YAML, required fields, unique task IDs, valid status values
- - Concise, dense, complete, focused on implementation, avoids fluff/verbosity
-- DAG: No circular deps, all dep IDs exist, no_deps → wave_1
-- Contracts: Valid from_task/to_task IDs, interfaces defined (required for HIGH complexity)
-- Tasks: Valid agent assignments, failure_modes for high/medium tasks, verification present, success_criteria defined when needed
- - Every debugger task has a paired implementer task (wave N+1 or later)
- - If acceptance_criteria mentions tests → target_files must include test file paths
-- Pre-mortem: overall_risk_level defined, critical_failure_modes present
-- Implementation spec: code_structure, affected_areas, component_details defined
+- Prefer extension points and additive changes over invasive rewrites when existing architecture supports them.
+- Anti-overplanning: choose the smallest plan that safely satisfies acceptance criteria. Do not add tasks, contracts, agents, research, validation matrices, or documentation unless required by complexity, risk, or explicit acceptance criteria.
diff --git a/agents/gem-researcher.agent.md b/agents/gem-researcher.agent.md
index 6394b17b1..f28b2903e 100644
--- a/agents/gem-researcher.agent.md
+++ b/agents/gem-researcher.agent.md
@@ -1,7 +1,7 @@
---
-description: "Codebase exploration — patterns, dependencies, architecture discovery."
+description: "Codebase exploration — patterns, dependencies, architecture discovery. Supports multiple exploration modes for cost-controlled research."
name: gem-researcher
-argument-hint: "Enter plan_id, objective, focus_area (optional), and context_envelope_snapshot."
+argument-hint: "Enter plan_id, objective, focus_area (optional), exploration_mode (optional), and context_envelope_snapshot."
disable-model-invocation: false
user-invocable: false
mode: subagent
@@ -32,21 +32,37 @@ Explore codebase, identify patterns, map dependencies. Return structured JSON fi
## Workflow
-Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
+IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
+
+Modes: Use `exploration_mode` to control cost and depth. Default is `scan` for backward compatibility.
+
+- `scan` — Quick keyword/pattern match, top N results. Low cost. No relationship mapping.
+- `deep` — Full semantic + grep + relationship mapping. High cost. Use for architecture/impact analysis.
+- `audit` — Inventory/checklist style. Low-medium cost. Lists what exists without deep tracing.
+- `trace` — Follow a specific call/data chain end-to-end. Medium cost. Limited depth hops.
+- `question` — Targeted lookup for a concrete question. Low cost. Returns focused answer.
- Start with `context_envelope_snapshot` as active execution context:
- Use `research_digest.relevant_files` as the initial file shortlist.
- Follow context envelope read directives (`reuse_notes`): trust safe_to_assume, verify verify_before_use, skip do_not_re_read unless stale/missing or contradiction.
- Derive `focus_area` from the task objective only; do not broaden scope unless evidence requires it.
+- Determine mode from `task_definition.exploration_mode`:
+ - Default: `scan` if not specified (preserves backward compatibility)
+ - Read budget controls from `task_definition`: `max_searches`, `max_files_to_read`, `max_depth`
- Research Pass — Objective Aligned Pattern discovery:
- Identify focus_area strictly from the task's objective.
- Discovery via semantic_search + grep_search, scoped to focus_area.
- - Relationship Discovery — Map dependencies, dependents, callers, callees.
+ - Conditional Relationship Discovery:
+ - `scan`/`question`/`audit` → skip relationship mapping (callers/callees/dependents)
+ - `trace` → map only the specific chain requested, respecting `max_depth`
+ - `deep` → full relationship discovery (default behavior)
- Calculate confidence.
-- Early Exit:
- - If confidence ≥ 0.70 → skip relationships + detailed → Synthesize Phase.
- - If decision_blockers resolved AND confidence ≥ 0.60 AND no critical open questions → early exit.
- - Else → continue.
+- Early Exit — in order of priority:
+ 1. Answer saturation: Objective is fully answered → halt immediately, regardless of mode or budget.
+ 2. Mode confidence threshold reached → halt.
+ 3. Budget exhausted → halt with current findings and note `budget_exhausted: true` in output.
+ 4. Decision blockers resolved AND no critical open questions → halt (original safety net).
+ - Budget exhaustion: If `max_searches` or `max_files_to_read` reached before confidence threshold, exit with current findings and note budget exhaustion in output.
- Output:
- Return JSON per Output Format.
@@ -58,21 +74,53 @@ Batch/join dependency-free steps; serialize only true dependencies while still c
Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values.
+````json
+## Output Format
+
+Return ONLY valid JSON. Omit nulls, empty arrays, false booleans, and zero values.
+
```json
{
- "status": "completed | failed | in_progress | needs_revision",
- "task_id": "string",
+ "status": "completed | failed | needs_revision",
"plan_id": "string",
- "fail": "transient | fixable | needs_replan | escalate | flaky | regression | new_failure | platform_specific",
- "confidence": 0.0-1.0,
- "complexity": "simple | medium | complex",
- "tldr": "string — dense bullet summary",
- "coverage_percent": "number (0-100)",
- "decision_blockers": "number",
- "open_questions": ["string — max 3"],
- "gaps": ["string — max 3"],
- "learn": ["string — max 5"]
+ "task_id": "string",
+ "mode": "scan | deep | audit | trace | question",
+ "confidence": 0.0,
+ "workflow_complexity_hint": "TRIVIAL | LOW | MEDIUM | HIGH",
+ "tldr": "string — dense 1-3 bullet summary",
+ "evidence": [
+ {
+ "type": "match | pattern | dependency | architecture | blocker | gap",
+ "file": "string",
+ "line": 123,
+ "note": "string"
+ }
+ ],
+ "blockers": ["string — max 3"],
+ "next_questions": ["string — max 3"],
+ "budget": {
+ "searches": 0,
+ "files_read": 0,
+ "depth_hops": 0,
+ "exhausted": true
+ },
+ "fail": "transient | fixable | needs_replan | escalate | flaky | regression | new_failure | platform_specific"
}
+````
+
+Rules:
+
+- Include `workflow_complexity_hint` only when relevant to assessment or Phase 0 classification.
+- Include `budget` only when budget was constrained, exhausted, or useful for auditing.
+- Include `fail` only when `status` is `failed` or `needs_revision`.
+- Use `evidence` for all modes instead of separate `matches`, `inventory`, `trace`, and `findings`.
+- Keep `evidence` to the top 3-8 most important items unless the task explicitly asks for inventory.
+- `workflow_complexity_hint` is advisory only. The orchestrator decides final `workflow_complexity`.
+
+```
+
+```
+
```
@@ -81,15 +129,18 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values.
## Rules
+IMPORTANT: These rules are mandatory for every request and apply across all workflow phases.
### Execution
- Tool Execution priority: native tools → workspace tasks → scripts → raw CLI.
-- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
-- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set.
+- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
+- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results.
+- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops.
- Execute autonomously; ask only for true blockers.
- Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports.
- Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits.
- Test on sample/small input before full run.
+- Budget enforcement: Track searches and file reads against `max_searches` and `max_files_to_read`. Halt exploration and return current findings when budget exhausted.
### Constitutional
@@ -109,4 +160,12 @@ Start at 0.5. Adjust:
Early exit: confidence≥0.70 OR (confidence≥0.60 AND decision_blockers resolved AND no critical open questions).
+#### Mode-Specific Adjustments
+
+- `scan`/`question`: Start at 0.6 (cheaper to find matches), cap bonus at +0.20
+- `audit`: Start at 0.5, +0.05 per item inventoried
+- `trace`: Start at 0.5, +0.10 per chain step traced (max +0.30)
+- `deep`: Original rules apply
+
+```
diff --git a/agents/gem-reviewer.agent.md b/agents/gem-reviewer.agent.md
index 71f95b02a..224cadd02 100644
--- a/agents/gem-reviewer.agent.md
+++ b/agents/gem-reviewer.agent.md
@@ -35,7 +35,7 @@ Scan security issues, detect secrets, verify PRD compliance. Never implement cod
## Workflow
-Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
+IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
- Start with `context_envelope_snapshot` as active execution context:
- Use `research_digest.relevant_files` as the initial file shortlist.
@@ -55,10 +55,6 @@ Batch/join dependency-free steps; serialize only true dependencies while still c
- Wave parallelism, conflicts_with not parallel.
- Wave assignment: tasks with no dependencies are in wave 1.
- Tasks have verification + acceptance_criteria.
- - Test file inclusion: if acceptance_criteria requires tests, verify target_files includes corresponding test file using pattern matching.
- - Report missing test files as non-critical findings.
- - PRD alignment, valid agents.
- - Tech stack: context_envelope.tech_stack exists and is non-empty.
- Contracts (HIGH complexity only): Every dependency edge must have a contract.
- Diagnose-then-fix: every debugger task has a paired implementer task in a later wave.
- Status:
@@ -120,11 +116,14 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values.
## Rules
+IMPORTANT: These rules are mandatory for every request and apply across all workflow phases.
+
### Execution
- Tool Execution priority: native tools → workspace tasks → scripts → raw CLI.
-- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
-- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set.
+- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
+- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results.
+- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops.
- Execute autonomously; ask only for true blockers.
- Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports.
- Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits.
diff --git a/agents/gem-skill-creator.agent.md b/agents/gem-skill-creator.agent.md
index 9953f6c9d..9d916f4c8 100644
--- a/agents/gem-skill-creator.agent.md
+++ b/agents/gem-skill-creator.agent.md
@@ -33,7 +33,7 @@ Extract reusable patterns from agent outputs and package as structured skill fil
## Workflow
-Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
+IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern.
- Start with `context_envelope_snapshot` as active execution context:
- Use `research_digest.relevant_files` as the initial file shortlist.
@@ -148,11 +148,14 @@ metadata:
## Rules
+IMPORTANT: These rules are mandatory for every request and apply across all workflow phases.
+
### Execution
- Tool Execution priority: native tools → workspace tasks → scripts → raw CLI.
-- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
-- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set.
+- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts.
+- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results.
+- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops.
- Execute autonomously; ask only for true blockers.
- Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports.
- Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits.
diff --git a/docs/README.agents.md b/docs/README.agents.md
index 0e3aface0..657d66a5c 100644
--- a/docs/README.agents.md
+++ b/docs/README.agents.md
@@ -112,7 +112,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-agents) for guidelines on how to
| [Gem Mobile Tester](../agents/gem-mobile-tester.agent.md)
[](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-mobile-tester.agent.md)
[](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-mobile-tester.agent.md) | Mobile E2E testing — Detox, Maestro, iOS/Android simulators. | |
| [Gem Orchestrator](../agents/gem-orchestrator.agent.md)
[](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-orchestrator.agent.md)
[](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-orchestrator.agent.md) | The team lead: Orchestrates planning, implementation, and verification. | |
| [Gem Planner](../agents/gem-planner.agent.md)
[](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-planner.agent.md)
[](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-planner.agent.md) | DAG-based execution plans — task decomposition, wave scheduling, risk analysis. | |
-| [Gem Researcher](../agents/gem-researcher.agent.md)
[](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-researcher.agent.md)
[](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-researcher.agent.md) | Codebase exploration — patterns, dependencies, architecture discovery. | |
+| [Gem Researcher](../agents/gem-researcher.agent.md)
[](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-researcher.agent.md)
[](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-researcher.agent.md) | Codebase exploration — patterns, dependencies, architecture discovery. Supports multiple exploration modes for cost-controlled research. | |
| [Gem Reviewer](../agents/gem-reviewer.agent.md)
[](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-reviewer.agent.md)
[](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-reviewer.agent.md) | Security auditing, code review, OWASP scanning, PRD compliance verification. | |
| [Gem Skill Creator](../agents/gem-skill-creator.agent.md)
[](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-skill-creator.agent.md)
[](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-skill-creator.agent.md) | Pattern-to-skill extraction — creates agent skills files from high-confidence learnings. | |
| [Gilfoyle Code Review Mode](../agents/gilfoyle.agent.md)
[](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgilfoyle.agent.md)
[](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgilfoyle.agent.md) | Code review and analysis with the sardonic wit and technical elitism of Bertram Gilfoyle from Silicon Valley. Prepare for brutal honesty about your code. | |
diff --git a/plugins/gem-team/.github/plugin/plugin.json b/plugins/gem-team/.github/plugin/plugin.json
index 7f60eea65..dd0ca5c97 100644
--- a/plugins/gem-team/.github/plugin/plugin.json
+++ b/plugins/gem-team/.github/plugin/plugin.json
@@ -1,6 +1,6 @@
{
"name": "gem-team",
- "version": "1.61.0",
+ "version": "1.64.0",
"description": "Self-Learning Multi-agent orchestration framework for spec-driven development and automated verification.",
"author": {
"name": "mubaidr",