diff --git a/.github/plugin/marketplace.json b/.github/plugin/marketplace.json index 05a5d1193..b046bec31 100644 --- a/.github/plugin/marketplace.json +++ b/.github/plugin/marketplace.json @@ -365,7 +365,7 @@ "name": "gem-team", "source": "gem-team", "description": "Self-Learning Multi-agent orchestration framework for spec-driven development and automated verification.", - "version": "1.61.0" + "version": "1.64.0" }, { "name": "git-ape", diff --git a/agents/gem-browser-tester.agent.md b/agents/gem-browser-tester.agent.md index 075d31d86..d9ad79ce7 100644 --- a/agents/gem-browser-tester.agent.md +++ b/agents/gem-browser-tester.agent.md @@ -35,7 +35,7 @@ Execute E2E/flow tests, verify UI/UX, accessibility, visual regression. Never im ## Workflow -Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. +IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. - Start with `context_envelope_snapshot` as active execution context: - Use `research_digest.relevant_files` as the initial file shortlist. @@ -93,11 +93,14 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values. ## Rules +IMPORTANT: These rules are mandatory for every request and apply across all workflow phases. + ### Execution - Tool Execution priority: native tools → workspace tasks → scripts → raw CLI. -- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts. -- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. +- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts. +- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results. +- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops. - Execute autonomously; ask only for true blockers. - Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports. - Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits. diff --git a/agents/gem-code-simplifier.agent.md b/agents/gem-code-simplifier.agent.md index 4548bfffe..83d3ac9d2 100644 --- a/agents/gem-code-simplifier.agent.md +++ b/agents/gem-code-simplifier.agent.md @@ -35,7 +35,7 @@ Remove dead code, reduce complexity, consolidate duplicates, improve naming. Nev ## Workflow -Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. +IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. - Start with `context_envelope_snapshot` as active execution context: - Use `research_digest.relevant_files` as the initial file shortlist. @@ -103,11 +103,14 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values. ## Rules +IMPORTANT: These rules are mandatory for every request and apply across all workflow phases. + ### Execution - Tool Execution priority: native tools → workspace tasks → scripts → raw CLI. -- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts. -- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. +- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts. +- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results. +- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops. - Execute autonomously; ask only for true blockers. - Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports. - Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits. diff --git a/agents/gem-critic.agent.md b/agents/gem-critic.agent.md index e6be7888a..1b5397eed 100644 --- a/agents/gem-critic.agent.md +++ b/agents/gem-critic.agent.md @@ -32,7 +32,7 @@ Challenge assumptions, find edge cases, identify over-engineering, spot logic ga ## Workflow -Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. +IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. - Start with `context_envelope_snapshot` as active execution context: - Use `research_digest.relevant_files` as the initial file shortlist. @@ -92,11 +92,14 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values. ## Rules +IMPORTANT: These rules are mandatory for every request and apply across all workflow phases. + ### Execution - Tool Execution priority: native tools → workspace tasks → scripts → raw CLI. -- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts. -- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. +- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts. +- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results. +- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops. - Execute autonomously; ask only for true blockers. - Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports. - Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits. diff --git a/agents/gem-debugger.agent.md b/agents/gem-debugger.agent.md index 76e44db17..afa3fd8d2 100644 --- a/agents/gem-debugger.agent.md +++ b/agents/gem-debugger.agent.md @@ -37,7 +37,7 @@ Trace root causes, analyze stacks, bisect regressions, reproduce errors. Structu ## Workflow -Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. +IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. - Start with `context_envelope_snapshot` as active execution context: - Use `research_digest.relevant_files` as the initial file shortlist. @@ -101,11 +101,14 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values. ## Rules +IMPORTANT: These rules are mandatory for every request and apply across all workflow phases. + ### Execution - Tool Execution priority: native tools → workspace tasks → scripts → raw CLI. -- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts. -- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. +- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts. +- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results. +- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops. - Execute autonomously; ask only for true blockers. - Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports. - Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits. diff --git a/agents/gem-designer-mobile.agent.md b/agents/gem-designer-mobile.agent.md index f19c71388..319ddfaf5 100644 --- a/agents/gem-designer-mobile.agent.md +++ b/agents/gem-designer-mobile.agent.md @@ -34,7 +34,7 @@ Design mobile UI with HIG (iOS) and Material 3 (Android); handle safe areas, tou ## Workflow -Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. +IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. - Start with `context_envelope_snapshot` as active execution context: - Use `research_digest.relevant_files` as the initial file shortlist. @@ -191,11 +191,14 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values. ## Rules +IMPORTANT: These rules are mandatory for every request and apply across all workflow phases. + ### Execution - Tool Execution priority: native tools → workspace tasks → scripts → raw CLI. -- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts. -- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. +- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts. +- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results. +- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops. - Execute autonomously; ask only for true blockers. - Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports. - Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits. diff --git a/agents/gem-designer.agent.md b/agents/gem-designer.agent.md index fc9ce2343..177f2d73d 100644 --- a/agents/gem-designer.agent.md +++ b/agents/gem-designer.agent.md @@ -34,7 +34,7 @@ Create layouts, themes, color schemes, design systems; validate hierarchy, respo ## Workflow -Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. +IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. - Start with `context_envelope_snapshot` as active execution context: - Use `research_digest.relevant_files` as the initial file shortlist. @@ -153,10 +153,13 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values. ## Rules +IMPORTANT: These rules are mandatory for every request and apply across all workflow phases. + ### Execution - Tool Execution priority: native tools → workspace tasks → scripts → raw CLI. -- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts.- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. +- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts.- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results. +- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops. - Execute autonomously; ask only for true blockers. - Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports. - Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits. diff --git a/agents/gem-devops.agent.md b/agents/gem-devops.agent.md index 8e8138a21..ec92d65e6 100644 --- a/agents/gem-devops.agent.md +++ b/agents/gem-devops.agent.md @@ -36,7 +36,7 @@ Deploy infrastructure, manage CI/CD, configure containers, ensure idempotency. N ## Workflow -Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. +IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. - Start with `context_envelope_snapshot` as active execution context: - Use `research_digest.relevant_files` as the initial file shortlist. @@ -150,11 +150,14 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values. ## Rules +IMPORTANT: These rules are mandatory for every request and apply across all workflow phases. + ### Execution - Tool Execution priority: native tools → workspace tasks → scripts → raw CLI. -- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts. -- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. +- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts. +- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results. +- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops. - Execute autonomously; ask only for true blockers. - Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports. - Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits. diff --git a/agents/gem-documentation-writer.agent.md b/agents/gem-documentation-writer.agent.md index ee9588d2b..50936e4fb 100644 --- a/agents/gem-documentation-writer.agent.md +++ b/agents/gem-documentation-writer.agent.md @@ -34,7 +34,7 @@ Write technical docs, generate diagrams, maintain code-docs parity, maintain `AG ## Workflow -Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. +IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. - Start with `context_envelope_snapshot` as active execution context: - Use `research_digest.relevant_files` as the initial file shortlist. @@ -152,11 +152,14 @@ changes: ## Rules +IMPORTANT: These rules are mandatory for every request and apply across all workflow phases. + ### Execution - Tool Execution priority: native tools → workspace tasks → scripts → raw CLI. -- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts. -- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. +- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts. +- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results. +- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops. - Execute autonomously; ask only for true blockers. - Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports. - Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits. diff --git a/agents/gem-implementer-mobile.agent.md b/agents/gem-implementer-mobile.agent.md index 57eda1dbb..cbdf0e8aa 100644 --- a/agents/gem-implementer-mobile.agent.md +++ b/agents/gem-implementer-mobile.agent.md @@ -35,7 +35,7 @@ Write mobile code using TDD (Red-Green-Refactor) for iOS/Android. Never review o ## Workflow -Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. +IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. - Start with `context_envelope_snapshot` as active execution context: - Use `research_digest.relevant_files` as the initial file shortlist. @@ -90,11 +90,14 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values. ## Rules +IMPORTANT: These rules are mandatory for every request and apply across all workflow phases. + ### Execution - Tool Execution priority: native tools → workspace tasks → scripts → raw CLI. -- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts. -- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. +- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts. +- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results. +- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops. - Execute autonomously; ask only for true blockers. - Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports. - Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits. diff --git a/agents/gem-implementer.agent.md b/agents/gem-implementer.agent.md index af77100f8..4cca797b1 100644 --- a/agents/gem-implementer.agent.md +++ b/agents/gem-implementer.agent.md @@ -35,13 +35,14 @@ Write code using TDD (Red-Green-Refactor). Deliver working code with passing tes ## Workflow -Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. +IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. - Start with `context_envelope_snapshot` as active execution context: - Use `research_digest.relevant_files` as the initial file shortlist. - Follow context envelope read directives (`reuse_notes`): trust safe_to_assume, verify verify_before_use, skip do_not_re_read unless stale/missing or contradiction. - Read tokens from `DESIGN.md` (UI tasks only). - Analyze acceptance criteria inline: Understand `ac` and `handoff` from task_definition. + - Skill Invocation: If `task_definition.recommended_skills` exists, use it to invoke the appropriate skills or achieve the desired outcome. - Bug-Fix Mode Branch: - If `task_definition.debugger_diagnosis` exists → follow Bug-Fix Mode (see Rules). Validation gate runs first. - TDD Cycle (Red → Green → Refactor → Verify) for standard/feature tasks: @@ -84,11 +85,14 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values. ## Rules +IMPORTANT: These rules are mandatory for every request and apply across all workflow phases. + ### Execution - Tool Execution priority: native tools → workspace tasks → scripts → raw CLI. -- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts. -- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. +- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts. +- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results. +- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops. - Execute autonomously; ask only for true blockers. - Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports. - Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits. diff --git a/agents/gem-mobile-tester.agent.md b/agents/gem-mobile-tester.agent.md index 5d013f59a..e21b03177 100644 --- a/agents/gem-mobile-tester.agent.md +++ b/agents/gem-mobile-tester.agent.md @@ -35,7 +35,7 @@ Execute E2E tests on mobile simulators/emulators/devices. Never implement code. ## Workflow -Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. +IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. - Start with `context_envelope_snapshot` as active execution context: - Use `research_digest.relevant_files` as the initial file shortlist. @@ -132,11 +132,14 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values. ## Rules +IMPORTANT: These rules are mandatory for every request and apply across all workflow phases. + ### Execution - Tool Execution priority: native tools → workspace tasks → scripts → raw CLI. -- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts. -- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. +- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts. +- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results. +- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops. - Execute autonomously; ask only for true blockers. - Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports. - Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits. diff --git a/agents/gem-orchestrator.agent.md b/agents/gem-orchestrator.agent.md index 08c4b69bd..bca626617 100644 --- a/agents/gem-orchestrator.agent.md +++ b/agents/gem-orchestrator.agent.md @@ -63,7 +63,7 @@ Never inspect, edit, run, test, debug, review, design, document, validate, or de ## Workflow -Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. +IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. IMPORTANT: On receiving user input, run Phase 0 immediately. @@ -81,6 +81,7 @@ IMPORTANT: On receiving user input, run Phase 0 immediately. - Gray Areas — Identify ambiguities, missing scope, decision blockers. - Complexity - Classify by actual scope, uncertainty, and blast radius. + - If project facts are required to classify confidently, delegate to `gem-researcher` with (`exploration_mode=scan`) mode. - If `orchestrator.default_complexity_threshold` is set, treat it as the minimum complexity floor, not the final classification. - TRIVIAL: single obvious mechanical task; direct delegation target is obvious; no durable plan artifact; minimal blast radius. - LOW: small bounded task; may involve 1–2 files or simple subagent help; known pattern; minimal blast radius; uses in-memory plan only. @@ -107,8 +108,11 @@ Routing matrix: - Complexity=MEDIUM/HIGH: - Delegate to `gem-planner` with `task_clarifications`, relevant context, `memory_seed`, and `config_snapshot`. - Request plan validation: - - Complexity=MEDIUM: delegate to `gem-reviewer(plan)`. - - Complexity=HIGH: delegate to `gem-reviewer(plan)`. Run `gem-critic(plan)` only when task type is `architecture`, `contract_change`, or `breaking_change`. + - Complexity=MEDIUM: + - Delegate to `gem-reviewer(plan)`. + - Complexity=HIGH: + - Delegate to `gem-reviewer(plan)` for correctness, feasibility, integration risk, and workflow compliance. + - In parallel, delegate to `gem-critic(plan)` when any high-risk signal exists: `architecture`, `contract_change`, `breaking_change`, `api_change`, `schema_change`, `auth_change`, `data_flow_change`, `migration`, `security_sensitive`, or `cross_domain_impact`. - If validation fails: - Failed + replanable → delegate to `gem-planner` with findings for replan/ adjustments. - Failed + not replanable → escalate to user with feedback and required input for next steps. @@ -119,8 +123,6 @@ Routing matrix: - Complexity=MEDIUM/HIGH: - Read `docs/plan/{plan_id}/context_envelope.json` once and keep it as canonical in-memory context. - - Read `docs/plan/{plan_id}/plan.yaml` for current status, dependencies, blockers, and todo list. - - Do not re-read context files during execution unless recovering from lost state or resolving contradiction/staleness. #### Phase 3B: Wave Execution Loop @@ -146,7 +148,7 @@ Execute all unblocked waves/tasks without approval pauses. Follow the branching ##### Complexity=MEDIUM/HIGH - Select Work: - - Execute: Get waves sorted; include contracts for Wave > 1; get pending tasks (deps=completed, status=pending, wave=current); Respect `conflicts_with` constraints. + - Execute: Read current wave tasks from `docs/plan/{plan_id}/plan.yaml`, process waves in ascending order, attach contracts for Wave > 1, run only tasks where `status=pending`, `wave=current`, and all dependencies are completed, while preventing parallel execution of tasks listed in `conflicts_with`. - Execute Wave: - Delegate to subagents `task.agent` (if `orchestrator.max_concurrent_agents` from config is set, use it; otherwise, default to 2 concurrent). - Include `config_snapshot` in delegation — pass relevant settings from loaded config. @@ -208,6 +210,10 @@ agent_input_reference: task_definition_fields: - focus_area - research_questions + - exploration_mode + - max_searches + - max_files_to_read + - max_depth - constraints context_snapshot_fields: - tech_stack @@ -413,11 +419,14 @@ Next: Wave `{n+1}` (`{pending_count}` tasks) ## Rules +IMPORTANT: These rules are mandatory for every request and apply across all workflow phases. + ### Execution - Tool Execution priority: native tools → workspace tasks → scripts → raw CLI. -- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts. -- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. +- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts. +- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results. +- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops. - Execute autonomously; ask only for true blockers. - Retry transient failures up to 3x. - Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports. @@ -430,7 +439,7 @@ Next: Wave `{n+1}` (`{pending_count}` tasks) - Approvals: ask user w/ context. When a subagent returns `needs_approval`, persist task status + approval reason + `approval_state` in `plan.yaml`; approved=re-delegate, denied=blocked. - Every user request MUST start at Phase 0 of the workflow immediately. No exceptions. - Delegation First: - - Phase 0 (Init & Clarify) is strictly `orchestration_work` and MUST be executed entirely by the orchestrator itself. Never delegate Phase 0 tasks (like Quick Assessment, Complexity analysis, or Clarification Gating) to `gem-researcher` or any other subagent. + - Phase 0 (Init & Clarify) is strictly `orchestration_work` and MUST be executed by the orchestrator itself. - Never execute, inspect, or validate actual project tasks/plans/code yourself—always delegate those execution-level tasks to suitable subagents post-Phase 0. Pure orchestrator. All delegations must follow the `agent_input_reference` guide. - Personality: Brief. Exciting, motivating, sarcastically funny. - Action-first concise updates over explanations. diff --git a/agents/gem-planner.agent.md b/agents/gem-planner.agent.md index ec2828900..2e70af3ab 100644 --- a/agents/gem-planner.agent.md +++ b/agents/gem-planner.agent.md @@ -54,7 +54,7 @@ Design DAG-based plans, decompose tasks, create `plan.yaml`. Never implement cod ## Workflow -Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. +IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. - Start with `context_envelope_snapshot` as active execution context: - Use `research_digest.relevant_files` as the initial file shortlist. @@ -64,10 +64,11 @@ Batch/join dependency-free steps; serialize only true dependencies while still c - `planning.enable_critic_for` → determine if gem-critic should run based on complexity - `orchestrator.default_complexity_threshold` → override complexity classification if set - Discovery (OBJECTIVE-ALIGNED — no random exploration): + - IMPORTANT: Discovery stops once sufficient evidence exists to produce a safe plan. Do not continue structural analysis solely to populate schema fields. Discovery depth scales with complexity and uncertainty. - Identify focus_areas strictly from objective and context. - All searches MUST target focus_areas; no exploratory/off-target searching. - Discovery via semantic_search + grep_search, scoped to focus_areas. - - Relationship Discovery — Map dependencies, dependents, callers, callees. + - Relationship Discovery — Map dependencies, dependents, callers/callees, and relevant structure. - Codebase Structure Mapping — Identify: - key_dirs (actual directory structure via list_dir) - key_components (files + their responsibilities) @@ -77,11 +78,11 @@ Batch/join dependency-free steps; serialize only true dependencies while still c - conventions: extracted from existing code, not assumed - constraints: based on actual codebase, not generic - Design: - - Lock clarifications into DAG constraints. - - Synthesize DAG: atomic tasks (or NEW for extension). + - Lock clarifications into DAG constraints; downstream tasks depend on explicit contracts/outputs, not hidden assumptions from upstream implementation details. + - Synthesize DAG: atomic, high-cohesion tasks; avoid tasks that mix unrelated files, layers, or responsibilities unless required by one acceptance criterion. - Assign waves: no deps → wave 1, dep.wave + 1. - Acceptance Criteria Injection: - - For each task, extract acceptance criteria from PRD/requirements relevant to that task's scope. + - For each task, reference relevant acceptance criteria by ID when available; duplicate full text only when needed for standalone execution. - Populate `task_definition.acceptance_criteria` with the extracted criteria (array of strings). - If no PRD exists or criteria cannot be determined, leave as empty array and note in task definition. - Agent Assignment — Reason from available agents, task nature, and context: @@ -100,14 +101,13 @@ Batch/join dependency-free steps; serialize only true dependencies while still c - For design validation or edge-case analysis: assign `designer`/`designer-mobile` or `critic` as appropriate. - Default to `implementer` when no specialized agent fits. - When uncertainty exists between agents, prefer the more specialized one. -- New feature→add doc-writer task (final wave). -- Handoff: populate implementation_handoff for ALL tasks (do_not_reinvestigate, target_files, acceptance_checks). + - Skill Matching: Populate `task_definition.recommended_skills` with matching skill names. Fallback: if no explicit matches, skip (don't over-match). Only when a matching skill is likely to materially improve execution. +- Handoff: populate implementation_handoff for ALL tasks (do_not_reinvestigate, target_files, acceptance_checks); expose only task-relevant context, not the full plan/research dump. - Create plan `plan.yaml` as per `plan_format_guide` - focused, simple solutions, parallel execution, architectural. - Assess PRD update need (new features, scope shifts, ADR deviations, new stories, AC changes→set prd_update_recommended). - New features→add doc-writer task (final wave). - Calculate metrics (wave_1_count, deps, risk_score). - - Calculate quality_score (overall, breakdown by dimension, blocking_issues, warnings). - Generate reviewer_focus: list dimensions with score < 0.9 for targeted scrutiny. - Schema Validation (syntax check only — semantic validation is delegated to `gem-reviewer(plan)`): - Validate plan.yaml: valid YAML, all required top-level fields non-null, task IDs unique, wave numbers are integers, no circular deps @@ -135,15 +135,8 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values. { "status": "completed | failed | in_progress | needs_revision", "fail": "transient | fixable | needs_replan | escalate | flaky | regression | new_failure | platform_specific", - "confidence": 0.0-1.0, "plan_id": "string", - "complexity": "simple | medium | complex", - "task_count": "number", - "wave_count": "number", - "prd_update_recommended": "boolean", - "quality_overall": "number (0.0-1.0)", - "envelope_path": "string", - "learn": ["string — max 5"] + "envelope_path": "string" } ``` @@ -153,6 +146,9 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values. ## Plan Format Guide +- Populate only fields relevant to the assigned agent and task type. Omit irrelevant agent-specific sections. +- Test specifications should be minimal and scenario-driven. Do not generate fixtures, flows, visual regression plans, or test data unless required by acceptance criteria. + ```yaml # ═══════════════════════════════════════════════════════════════════════════ # PLAN METADATA (always present) @@ -171,33 +167,19 @@ plan_metrics: wave_1_task_count: number total_dependencies: number risk_score: low | medium | high -quality_score: - overall: number (0.0-1.0) - breakdown: - prd_coverage: number (0.0-1.0) - target_files_verified: number (0.0-1.0) - contracts_complete: number (0.0-1.0) # N/A for LOW/MEDIUM complexity - wave_assignment_valid: number (0.0-1.0) - blocking_issues: number - warnings: number - reviewer_focus: [string] # areas needing extra scrutiny based on lower scores +quality_warnings: [string] # ═══════════════════════════════════════════════════════════════════════════ # PLANNING ANALYSIS (complexity-dependent) # LOW: not required | MEDIUM/HIGH: required for open_questions, gaps, pre_mortem # HIGH: also requires implementation_specification, contracts # ═══════════════════════════════════════════════════════════════════════════ -open_questions: # Optional for LOW; required for MEDIUM/HIGH +open_questions: - question: string context: string type: decision_blocker | research | nice_to_know affects: [string] -gaps: # Optional for LOW; required for MEDIUM/HIGH - - description: string - refinement_requests: - - query: string - source_hint: string -pre_mortem: # Optional for LOW; required for MEDIUM/HIGH +pre_mortem: overall_risk_level: low | medium | high critical_failure_modes: - scenario: string @@ -205,18 +187,8 @@ pre_mortem: # Optional for LOW; required for MEDIUM/HIGH impact: low | medium | high | critical mitigation: string assumptions: [string] -implementation_specification: # Optional for LOW/MEDIUM; required for HIGH - code_structure: string - affected_areas: [string] - component_details: - - component: string - responsibility: string - interfaces: [string] - dependencies: - - component: string - relationship: string - integration_points: [string] -contracts: # Optional for LOW/MEDIUM; required for HIGH +implementation_specification: [string] # Should capture only information required for task coordination; do not create design-document-level detail. +contracts: # Required only for HIGH plans with cross-task, cross-agent, or cross-wave handoffs - from_task: string to_task: string interface: string @@ -234,8 +206,6 @@ tasks: description: string wave: number agent: string - prototype: boolean - priority: high | medium | low status: pending | in_progress | completed | failed | blocked | needs_revision # ─────────────────────────────────────────────────────────────────────── @@ -247,8 +217,6 @@ tasks: context_files: - path: string description: string - estimated_effort: small | medium | large - focus_area: string | null # set only when task spans multiple focus areas # ─────────────────────────────────────────────────────────────────────── # EXECUTION CONTROL (populated during runtime) @@ -257,27 +225,17 @@ tasks: flaky: boolean retries_used: number requires_design_validation: boolean # true for new UI, major redesigns, style/a11y/token work -debugger_diagnosis: - root_cause: string - target_files: [string] - fix_recommendations: string - injected_at: string - planning_pass: number - planning_history: - - pass: number - reason: string - timestamp: string + debugger_diagnosis: + root_cause: string + target_files: [string] + fix_recommendations: string + injected_at: string # ─────────────────────────────────────────────────────────────────────── # QUALITY GATES (verification criteria) # ─────────────────────────────────────────────────────────────────────── - acceptance_criteria: [string] - success_criteria: [string] # unified verification: human steps + machine-checkable predicates (e.g., "test_results.failed === 0") - failure_modes: - - scenario: string - likelihood: low | medium | high - impact: low | medium | high - mitigation: string + acceptance_criteria: [string] + success_criteria: [string] # unified verification: human steps + machine-checkable predicates; every implementation task should be independently testable or explicitly state why not. # ─────────────────────────────────────────────────────────────────────── # AGENT-SPECIFIC HANDOFFS (populated based on task agent) @@ -333,7 +291,11 @@ debugger_diagnosis: ## Context Envelope Format Guide -Design Principle: Cache-worthy, cross-session reusable context. Pure duplicates of plan.yaml are removed — agents read plan.yaml directly for task registry, implementation spec, validation status, and detailed planning history. +Design Principle: + +- Cache-worthy, cross-session reusable context. Pure duplicates of plan.yaml are removed — agents read plan.yaml directly for task registry, implementation spec, validation status; store references/summaries only when reuse value is clear. +- Context envelope must justify each populated section by future reuse value. +- If a section is unlikely to save future discovery effort, omit it. ```jsonc { @@ -343,7 +305,6 @@ Design Principle: Cache-worthy, cross-session reusable context. Pure duplicates "created_at": "ISO-8601 string", "last_updated": "ISO-8601 string", "version": "number", - "previous_version_fields_changed": ["string"], "source": ["string"], }, "scope": { @@ -351,12 +312,6 @@ Design Principle: Cache-worthy, cross-session reusable context. Pure duplicates "applies_to": ["string"], "non_goals": ["string"], }, - "project_summary": { - "business_domain": "string", - "primary_users": ["string"], - "key_features": ["string"], - "current_phase": "string", - }, "tech_stack": [ { "name": "string", @@ -464,31 +419,10 @@ Design Principle: Cache-worthy, cross-session reusable context. Pure duplicates "linked_patterns": ["string"], }, ], - "evidence_map": [ - { - "claim": "string", - "evidence_paths": ["string"], - }, - ], "reuse_notes": { "do_not_re_read": ["string"], "safe_to_assume": ["string"], "verify_before_use": ["string"], - }, - // Cache-worthy plan summary — quick context without reading full plan.yaml - "plan_summary": { - "tldr": "string — one-line plan summary", - "complexity": "simple | medium | complex", - "risk_level": "low | medium | high", - "key_assumptions": ["string"], // Cache-worthy: helps validate if plan still applies - "critical_risks": ["string"], // Cache-worthy: focus areas for future work - }, - // REMOVED (read from plan.yaml directly): - // - task_registry → docs/plan/{plan_id}/plan.yaml - // - implementation_spec → docs/plan/{plan_id}/plan.yaml - // - codebase_validation → docs/plan/{plan_id}/plan.yaml - // - plan_metadata (detailed) → docs/plan/{plan_id}/plan.yaml - // - research_findings (absorbed into research_digest) }, } ``` @@ -499,11 +433,14 @@ Design Principle: Cache-worthy, cross-session reusable context. Pure duplicates ## Rules +IMPORTANT: These rules are mandatory for every request and apply across all workflow phases. + ### Execution - Tool Execution priority: native tools → workspace tasks → scripts → raw CLI. -- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts. -- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. +- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts. +- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results. +- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops. - Execute autonomously; ask only for true blockers. - Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports. - Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits. @@ -511,25 +448,12 @@ Design Principle: Cache-worthy, cross-session reusable context. Pure duplicates ### Constitutional -- Never skip pre-mortem for complex tasks. If dependency cycle→restructure before output. +- Never skip pre-mortem for complex tasks; keep it to the top 3 realistic failure modes. - Evidence-based—cite sources, state assumptions. -- Minimum valid plan, nothing speculative. +- Minimum valid plan, nothing speculative; exclude speculative abstractions, nice-to-have refactors, and unrelated cleanup unless required by acceptance criteria. - Deliverable-focused framing. Assign only available_agents. - Feature flags: include lifecycle (create→enable→rollout→cleanup). - -#### Plan Verification Criteria - -Run these checks BEFORE saving plan.yaml. Fix all failures inline. - -- Plan: - - Valid YAML, required fields, unique task IDs, valid status values - - Concise, dense, complete, focused on implementation, avoids fluff/verbosity -- DAG: No circular deps, all dep IDs exist, no_deps → wave_1 -- Contracts: Valid from_task/to_task IDs, interfaces defined (required for HIGH complexity) -- Tasks: Valid agent assignments, failure_modes for high/medium tasks, verification present, success_criteria defined when needed - - Every debugger task has a paired implementer task (wave N+1 or later) - - If acceptance_criteria mentions tests → target_files must include test file paths -- Pre-mortem: overall_risk_level defined, critical_failure_modes present -- Implementation spec: code_structure, affected_areas, component_details defined +- Prefer extension points and additive changes over invasive rewrites when existing architecture supports them. +- Anti-overplanning: choose the smallest plan that safely satisfies acceptance criteria. Do not add tasks, contracts, agents, research, validation matrices, or documentation unless required by complexity, risk, or explicit acceptance criteria. diff --git a/agents/gem-researcher.agent.md b/agents/gem-researcher.agent.md index 6394b17b1..f28b2903e 100644 --- a/agents/gem-researcher.agent.md +++ b/agents/gem-researcher.agent.md @@ -1,7 +1,7 @@ --- -description: "Codebase exploration — patterns, dependencies, architecture discovery." +description: "Codebase exploration — patterns, dependencies, architecture discovery. Supports multiple exploration modes for cost-controlled research." name: gem-researcher -argument-hint: "Enter plan_id, objective, focus_area (optional), and context_envelope_snapshot." +argument-hint: "Enter plan_id, objective, focus_area (optional), exploration_mode (optional), and context_envelope_snapshot." disable-model-invocation: false user-invocable: false mode: subagent @@ -32,21 +32,37 @@ Explore codebase, identify patterns, map dependencies. Return structured JSON fi ## Workflow -Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. +IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. + +Modes: Use `exploration_mode` to control cost and depth. Default is `scan` for backward compatibility. + +- `scan` — Quick keyword/pattern match, top N results. Low cost. No relationship mapping. +- `deep` — Full semantic + grep + relationship mapping. High cost. Use for architecture/impact analysis. +- `audit` — Inventory/checklist style. Low-medium cost. Lists what exists without deep tracing. +- `trace` — Follow a specific call/data chain end-to-end. Medium cost. Limited depth hops. +- `question` — Targeted lookup for a concrete question. Low cost. Returns focused answer. - Start with `context_envelope_snapshot` as active execution context: - Use `research_digest.relevant_files` as the initial file shortlist. - Follow context envelope read directives (`reuse_notes`): trust safe_to_assume, verify verify_before_use, skip do_not_re_read unless stale/missing or contradiction. - Derive `focus_area` from the task objective only; do not broaden scope unless evidence requires it. +- Determine mode from `task_definition.exploration_mode`: + - Default: `scan` if not specified (preserves backward compatibility) + - Read budget controls from `task_definition`: `max_searches`, `max_files_to_read`, `max_depth` - Research Pass — Objective Aligned Pattern discovery: - Identify focus_area strictly from the task's objective. - Discovery via semantic_search + grep_search, scoped to focus_area. - - Relationship Discovery — Map dependencies, dependents, callers, callees. + - Conditional Relationship Discovery: + - `scan`/`question`/`audit` → skip relationship mapping (callers/callees/dependents) + - `trace` → map only the specific chain requested, respecting `max_depth` + - `deep` → full relationship discovery (default behavior) - Calculate confidence. -- Early Exit: - - If confidence ≥ 0.70 → skip relationships + detailed → Synthesize Phase. - - If decision_blockers resolved AND confidence ≥ 0.60 AND no critical open questions → early exit. - - Else → continue. +- Early Exit — in order of priority: + 1. Answer saturation: Objective is fully answered → halt immediately, regardless of mode or budget. + 2. Mode confidence threshold reached → halt. + 3. Budget exhausted → halt with current findings and note `budget_exhausted: true` in output. + 4. Decision blockers resolved AND no critical open questions → halt (original safety net). + - Budget exhaustion: If `max_searches` or `max_files_to_read` reached before confidence threshold, exit with current findings and note budget exhaustion in output. - Output: - Return JSON per Output Format. @@ -58,21 +74,53 @@ Batch/join dependency-free steps; serialize only true dependencies while still c Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values. +````json +## Output Format + +Return ONLY valid JSON. Omit nulls, empty arrays, false booleans, and zero values. + ```json { - "status": "completed | failed | in_progress | needs_revision", - "task_id": "string", + "status": "completed | failed | needs_revision", "plan_id": "string", - "fail": "transient | fixable | needs_replan | escalate | flaky | regression | new_failure | platform_specific", - "confidence": 0.0-1.0, - "complexity": "simple | medium | complex", - "tldr": "string — dense bullet summary", - "coverage_percent": "number (0-100)", - "decision_blockers": "number", - "open_questions": ["string — max 3"], - "gaps": ["string — max 3"], - "learn": ["string — max 5"] + "task_id": "string", + "mode": "scan | deep | audit | trace | question", + "confidence": 0.0, + "workflow_complexity_hint": "TRIVIAL | LOW | MEDIUM | HIGH", + "tldr": "string — dense 1-3 bullet summary", + "evidence": [ + { + "type": "match | pattern | dependency | architecture | blocker | gap", + "file": "string", + "line": 123, + "note": "string" + } + ], + "blockers": ["string — max 3"], + "next_questions": ["string — max 3"], + "budget": { + "searches": 0, + "files_read": 0, + "depth_hops": 0, + "exhausted": true + }, + "fail": "transient | fixable | needs_replan | escalate | flaky | regression | new_failure | platform_specific" } +```` + +Rules: + +- Include `workflow_complexity_hint` only when relevant to assessment or Phase 0 classification. +- Include `budget` only when budget was constrained, exhausted, or useful for auditing. +- Include `fail` only when `status` is `failed` or `needs_revision`. +- Use `evidence` for all modes instead of separate `matches`, `inventory`, `trace`, and `findings`. +- Keep `evidence` to the top 3-8 most important items unless the task explicitly asks for inventory. +- `workflow_complexity_hint` is advisory only. The orchestrator decides final `workflow_complexity`. + +``` + +``` + ``` @@ -81,15 +129,18 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values. ## Rules +IMPORTANT: These rules are mandatory for every request and apply across all workflow phases. ### Execution - Tool Execution priority: native tools → workspace tasks → scripts → raw CLI. -- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts. -- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. +- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts. +- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results. +- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops. - Execute autonomously; ask only for true blockers. - Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports. - Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits. - Test on sample/small input before full run. +- Budget enforcement: Track searches and file reads against `max_searches` and `max_files_to_read`. Halt exploration and return current findings when budget exhausted. ### Constitutional @@ -109,4 +160,12 @@ Start at 0.5. Adjust: Early exit: confidence≥0.70 OR (confidence≥0.60 AND decision_blockers resolved AND no critical open questions). +#### Mode-Specific Adjustments + +- `scan`/`question`: Start at 0.6 (cheaper to find matches), cap bonus at +0.20 +- `audit`: Start at 0.5, +0.05 per item inventoried +- `trace`: Start at 0.5, +0.10 per chain step traced (max +0.30) +- `deep`: Original rules apply + +``` diff --git a/agents/gem-reviewer.agent.md b/agents/gem-reviewer.agent.md index 71f95b02a..224cadd02 100644 --- a/agents/gem-reviewer.agent.md +++ b/agents/gem-reviewer.agent.md @@ -35,7 +35,7 @@ Scan security issues, detect secrets, verify PRD compliance. Never implement cod ## Workflow -Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. +IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. - Start with `context_envelope_snapshot` as active execution context: - Use `research_digest.relevant_files` as the initial file shortlist. @@ -55,10 +55,6 @@ Batch/join dependency-free steps; serialize only true dependencies while still c - Wave parallelism, conflicts_with not parallel. - Wave assignment: tasks with no dependencies are in wave 1. - Tasks have verification + acceptance_criteria. - - Test file inclusion: if acceptance_criteria requires tests, verify target_files includes corresponding test file using pattern matching. - - Report missing test files as non-critical findings. - - PRD alignment, valid agents. - - Tech stack: context_envelope.tech_stack exists and is non-empty. - Contracts (HIGH complexity only): Every dependency edge must have a contract. - Diagnose-then-fix: every debugger task has a paired implementer task in a later wave. - Status: @@ -120,11 +116,14 @@ Return ONLY valid JSON. CRITICAL: Omit nulls, empty arrays, zero values. ## Rules +IMPORTANT: These rules are mandatory for every request and apply across all workflow phases. + ### Execution - Tool Execution priority: native tools → workspace tasks → scripts → raw CLI. -- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts. -- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. +- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts. +- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results. +- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops. - Execute autonomously; ask only for true blockers. - Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports. - Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits. diff --git a/agents/gem-skill-creator.agent.md b/agents/gem-skill-creator.agent.md index 9953f6c9d..9d916f4c8 100644 --- a/agents/gem-skill-creator.agent.md +++ b/agents/gem-skill-creator.agent.md @@ -33,7 +33,7 @@ Extract reusable patterns from agent outputs and package as structured skill fil ## Workflow -Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. +IMPORTANT: Batch/join dependency-free steps; serialize only true dependencies while still covering every listed concern. - Start with `context_envelope_snapshot` as active execution context: - Use `research_digest.relevant_files` as the initial file shortlist. @@ -148,11 +148,14 @@ metadata: ## Rules +IMPORTANT: These rules are mandatory for every request and apply across all workflow phases. + ### Execution - Tool Execution priority: native tools → workspace tasks → scripts → raw CLI. -- Batch by default: Plan the action graph first, then execute all independent tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively, but serialize calls that depend on prior results, mutate the same file/resource, require validation, or may create conflicts. -- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. +- Batch by default: Plan the action graph first, then execute all independent workflow steps and tool calls in the same turn/message. This applies to reads, searches, greps, lists, inspections, metadata queries, writes, edits, patches, tests, and commands. Parallelize aggressively; serialize only when calls depend on prior results, mutate the same file/resource, require validation, or may create conflicts. +- Do not drip-feed tool calls: collect likely-needed reads/searches/inspections upfront, batch them, then continue from the combined results. +- Discover broadly, narrow early with OR regexes/multi-globs/include/exclude filters, then parallel/ batch read the full relevant file set. Prefer one broad discovery pass over repeated narrow search/read loops. - Execute autonomously; ask only for true blockers. - Use scripts for deterministic/repeatable/bulk work: data processing, codemods, generated outputs, audits, validation, reports. - Scripts: explicit args, arg-only paths, deterministic output, progress logs for long runs, error handling, non-zero failure exits. diff --git a/docs/README.agents.md b/docs/README.agents.md index 0e3aface0..657d66a5c 100644 --- a/docs/README.agents.md +++ b/docs/README.agents.md @@ -112,7 +112,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-agents) for guidelines on how to | [Gem Mobile Tester](../agents/gem-mobile-tester.agent.md)
[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-mobile-tester.agent.md)
[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-mobile-tester.agent.md) | Mobile E2E testing — Detox, Maestro, iOS/Android simulators. | | | [Gem Orchestrator](../agents/gem-orchestrator.agent.md)
[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-orchestrator.agent.md)
[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-orchestrator.agent.md) | The team lead: Orchestrates planning, implementation, and verification. | | | [Gem Planner](../agents/gem-planner.agent.md)
[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-planner.agent.md)
[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-planner.agent.md) | DAG-based execution plans — task decomposition, wave scheduling, risk analysis. | | -| [Gem Researcher](../agents/gem-researcher.agent.md)
[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-researcher.agent.md)
[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-researcher.agent.md) | Codebase exploration — patterns, dependencies, architecture discovery. | | +| [Gem Researcher](../agents/gem-researcher.agent.md)
[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-researcher.agent.md)
[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-researcher.agent.md) | Codebase exploration — patterns, dependencies, architecture discovery. Supports multiple exploration modes for cost-controlled research. | | | [Gem Reviewer](../agents/gem-reviewer.agent.md)
[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-reviewer.agent.md)
[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-reviewer.agent.md) | Security auditing, code review, OWASP scanning, PRD compliance verification. | | | [Gem Skill Creator](../agents/gem-skill-creator.agent.md)
[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-skill-creator.agent.md)
[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgem-skill-creator.agent.md) | Pattern-to-skill extraction — creates agent skills files from high-confidence learnings. | | | [Gilfoyle Code Review Mode](../agents/gilfoyle.agent.md)
[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgilfoyle.agent.md)
[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fgilfoyle.agent.md) | Code review and analysis with the sardonic wit and technical elitism of Bertram Gilfoyle from Silicon Valley. Prepare for brutal honesty about your code. | | diff --git a/plugins/gem-team/.github/plugin/plugin.json b/plugins/gem-team/.github/plugin/plugin.json index 7f60eea65..dd0ca5c97 100644 --- a/plugins/gem-team/.github/plugin/plugin.json +++ b/plugins/gem-team/.github/plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "gem-team", - "version": "1.61.0", + "version": "1.64.0", "description": "Self-Learning Multi-agent orchestration framework for spec-driven development and automated verification.", "author": { "name": "mubaidr",