diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 1c2721e..c49e595 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -74,12 +74,22 @@ Edit the relevant `SKILL.md` or data file. Test by running the skill locally wit ## Testing -There is no automated test harness for skills — they are instruction sets interpreted by Claude Code, not code with unit tests. The validation steps are: +The repo ships a [tiered test suite](tests/README.md) that runs Claude Code headlessly (`claude -p`) to assert observable skill behavior — faster and more reliable than manual walkthroughs. At minimum, run these before opening a PR: + +```bash +./tests/run-tests.sh # tier-2 invariants + tier-1 skill knowledge tests (~4-5 min) +./tests/run-tests.sh --verbose # show per-assertion output +``` + +Additional validation steps: 1. **Load the plugin**: `claude --plugin-dir .` — confirm no startup errors. 2. **Run the skill manually**: invoke `/discover-workflows` or `/install-workflow` and walk through the flow. -3. **Validate lock files** (if you changed `.lock.yml` files): `gh aw validate` — safe, does not recompile. -4. **Check grep counts** (if you applied the OAuth tweak): see [skills/install-workflow/auth.md](skills/install-workflow/auth.md#step-4--verify-the-tweak-shape). +3. **Run the fast test suite**: `./tests/run-tests.sh` — catches skill instruction drift and forbidden patterns without a full end-to-end run. +4. **Validate lock files** (if you changed `.lock.yml` files): `gh aw validate` — safe, does not recompile. +5. **Check grep counts** (if you applied the OAuth tweak): see [skills/install-workflow/auth.md](skills/install-workflow/auth.md#step-4--verify-the-tweak-shape). + +For changes to `catalog/agent-team/*.md`, also run the tier-3 end-to-end tests before opening a PR — these exercise the full pipeline on a live playground repo and catch handoff breakage that fast tests miss. See [Running tier-3](tests/README.md#running-tier-3). Never test by committing untested changes to `main`. The installed workflows run on push to `main`, so a broken install skill or a bad `.lock.yml` will trigger a live workflow run. diff --git a/catalog/agent-team/README.md b/catalog/agent-team/README.md index 5490837..8ab7e26 100644 --- a/catalog/agent-team/README.md +++ b/catalog/agent-team/README.md @@ -104,15 +104,53 @@ Then apply the OAuth token tweak to each `.lock.yml` per [`skills/install-workfl 1. Open an issue describing what you want built. 2. Add the single label `agent-team`. 3. Watch the thread. Each role posts its contribution as a comment; the implementer opens a draft PR that closes the issue when merged. -4. Human override at any time: add `state:blocked` to halt, edit a comment to steer the next agent, or manually `gh workflow run` a specific role to retry a stuck stage. Manual dispatches must pass the required `workflow_dispatch` inputs, and the downstream workflow markdown must read them via `${{ github.event.inputs.* }}`. -5. **Retrying a blocked task**: clear `state:blocked`, then re-add `agent-team`. Spec-agent treats it as a fresh dispatch (because the state:* labels are gone and the spec markers are already satisfied — actually: to redo from scratch, also delete the prior spec comment). +4. Human override at any time: add `state:blocked` to halt, edit a comment to steer the next agent, or manually re-dispatch a specific role to retry a stuck stage. All required inputs must be explicit — agents will not infer missing values from labels or history: + + ```bash + # Re-dispatch the planner (e.g. spec ran but planner stalled) + gh workflow run planner-agent.lock.yml -f issue_number= -f iteration=1 + + # Re-dispatch the implementer — first attempt (opens a new draft PR) + gh workflow run implementer-agent.lock.yml -f issue_number= -f iteration= + + # Re-dispatch the implementer — kickback (updates the existing PR) + gh workflow run implementer-agent.lock.yml -f issue_number= -f iteration= -f pr_number= + + # Re-dispatch the reviewer + gh workflow run reviewer-agent.lock.yml -f pr_number= -f issue_number= -f iteration= + ``` + +5. **Retrying a blocked task** (pipeline stalled, not just a single stage): clear `state:blocked`, then re-add `agent-team`. Spec-agent treats it as a fresh dispatch (because the state:* labels are gone and the spec markers are already satisfied — actually: to redo from scratch, also delete the prior spec comment). ## Limits and gotchas - **Concurrency**: each workflow uses `concurrency: group: agent-team-issue-${issue_number}` so only one role runs at a time per issue. - **Max iterations**: default 3 (reviewer kickback → implementer). The counter lives on the `iteration` input passed through the dispatch chain, bumped exclusively by the reviewer on kickback. -- **Input propagation**: planner / implementer / reviewer must fail loudly if required `workflow_dispatch` inputs are missing. Do not rely on label search or recent-activity inference as a fallback. +- **Input propagation**: planner / implementer / reviewer fail loudly if required `workflow_dispatch` inputs are missing or unresolved. When `issue_number` is present, the stalled agent adds `state:blocked` and posts: `🛑 agent-team: workflow_dispatch inputs were not propagated. Re-dispatch with valid inputs.` When `issue_number` itself is absent, the agent emits a `missing_data` safe-output and terminates. Neither case infers missing values from labels or activity. Re-dispatch manually with all inputs explicit to recover (see step 4 above). +- **`pr_number` on the implementer**: optional input. Blank or absent means first attempt — the implementer opens a new draft PR. A real PR number (passed by the reviewer on kickback) tells the implementer to push updates to the existing PR branch instead of opening a new one. - **Non-UI only**: no screenshot capture. Reviewer validates via tests/CI status + reading the diff. - **Cost**: a single task can easily spend 4× the tokens of a monolithic workflow. Set `timeout-minutes` conservatively and monitor the first few runs. - **No auto-merge**: the reviewer approves but never merges. Humans merge. - **Dispatch visibility**: each `dispatch-workflow` call shows up as a new run in the Actions tab, linked to the upstream run. Makes the chain visible. + +## Troubleshooting + +### `state:blocked` with "workflow_dispatch inputs were not propagated" + +A downstream agent received an empty or unresolved input. To recover: + +1. Remove the `state:blocked` label. +2. Re-dispatch the stalled role manually, passing every required input explicitly (see step 4 above). Agents never infer missing values — omitting any input will re-trigger the same block. + +### Implementer opened a new PR instead of updating the existing one + +The `pr_number` input was blank or absent during a kickback dispatch. The implementer always treats a missing `pr_number` as "first attempt → open new PR." Close the duplicate, then re-dispatch with the correct `pr_number`: + +```bash +gh workflow run implementer-agent.lock.yml \ + -f issue_number= -f iteration= -f pr_number= +``` + +### Spec agent did nothing after the `agent-team` label was added + +By design: spec-agent exits silently if the issue already has any `state:*` label or a `` block. Remove stale `state:*` labels from a prior run, then re-add the `agent-team` label. To rerun from scratch, also delete the prior spec comment.