verkyyi · github-actions · May 23, 2026
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -74,12 +74,22 @@ Edit the relevant `SKILL.md` or data file. Test by running the skill locally wit
 
 ## Testing
 
-There is no automated test harness for skills — they are instruction sets interpreted by Claude Code, not code with unit tests. The validation steps are:
+The repo ships a [tiered test suite](tests/README.md) that runs Claude Code headlessly (`claude -p`) to assert observable skill behavior — faster and more reliable than manual walkthroughs. At minimum, run these before opening a PR:
+
+```bash
+./tests/run-tests.sh            # tier-2 invariants + tier-1 skill knowledge tests (~4-5 min)
+./tests/run-tests.sh --verbose  # show per-assertion output
+```
+
+Additional validation steps:
 
 1. **Load the plugin**: `claude --plugin-dir .` — confirm no startup errors.
 2. **Run the skill manually**: invoke `/discover-workflows` or `/install-workflow` and walk through the flow.
-3. **Validate lock files** (if you changed `.lock.yml` files): `gh aw validate` — safe, does not recompile.
-4. **Check grep counts** (if you applied the OAuth tweak): see [skills/install-workflow/auth.md](skills/install-workflow/auth.md#step-4--verify-the-tweak-shape).
+3. **Run the fast test suite**: `./tests/run-tests.sh` — catches skill instruction drift and forbidden patterns without a full end-to-end run.
+4. **Validate lock files** (if you changed `.lock.yml` files): `gh aw validate` — safe, does not recompile.
+5. **Check grep counts** (if you applied the OAuth tweak): see [skills/install-workflow/auth.md](skills/install-workflow/auth.md#step-4--verify-the-tweak-shape).
+
+For changes to `catalog/agent-team/*.md`, also run the tier-3 end-to-end tests before opening a PR — these exercise the full pipeline on a live playground repo and catch handoff breakage that fast tests miss. See [Running tier-3](tests/README.md#running-tier-3).
 
 Never test by committing untested changes to `main`. The installed workflows run on push to `main`, so a broken install skill or a bad `.lock.yml` will trigger a live workflow run.
 

diff --git a/catalog/agent-team/README.md b/catalog/agent-team/README.md
@@ -104,15 +104,53 @@ Then apply the OAuth token tweak to each `.lock.yml` per [`skills/install-workfl
 1. Open an issue describing what you want built.
 2. Add the single label `agent-team`.
 3. Watch the thread. Each role posts its contribution as a comment; the implementer opens a draft PR that closes the issue when merged.
-4. Human override at any time: add `state:blocked` to halt, edit a comment to steer the next agent, or manually `gh workflow run` a specific role to retry a stuck stage. Manual dispatches must pass the required `workflow_dispatch` inputs, and the downstream workflow markdown must read them via `${{ github.event.inputs.* }}`.
-5. **Retrying a blocked task**: clear `state:blocked`, then re-add `agent-team`. Spec-agent treats it as a fresh dispatch (because the state:* labels are gone and the spec markers are already satisfied — actually: to redo from scratch, also delete the prior spec comment).
+4. Human override at any time: add `state:blocked` to halt, edit a comment to steer the next agent, or manually re-dispatch a specific role to retry a stuck stage. All required inputs must be explicit — agents will not infer missing values from labels or history:
+
+   ```bash
+   # Re-dispatch the planner (e.g. spec ran but planner stalled)
+   gh workflow run planner-agent.lock.yml -f issue_number=<N> -f iteration=1
+
+   # Re-dispatch the implementer — first attempt (opens a new draft PR)
+   gh workflow run implementer-agent.lock.yml -f issue_number=<N> -f iteration=<N>
+
+   # Re-dispatch the implementer — kickback (updates the existing PR)
+   gh workflow run implementer-agent.lock.yml -f issue_number=<N> -f iteration=<N> -f pr_number=<PR-N>
+
+   # Re-dispatch the reviewer
+   gh workflow run reviewer-agent.lock.yml -f pr_number=<PR-N> -f issue_number=<N> -f iteration=<N>
+   ```
+
+5. **Retrying a blocked task** (pipeline stalled, not just a single stage): clear `state:blocked`, then re-add `agent-team`. Spec-agent treats it as a fresh dispatch (because the state:* labels are gone and the spec markers are already satisfied — actually: to redo from scratch, also delete the prior spec comment).
 
 ## Limits and gotchas
 
 - **Concurrency**: each workflow uses `concurrency: group: agent-team-issue-${issue_number}` so only one role runs at a time per issue.
 - **Max iterations**: default 3 (reviewer kickback → implementer). The counter lives on the `iteration` input passed through the dispatch chain, bumped exclusively by the reviewer on kickback.
-- **Input propagation**: planner / implementer / reviewer must fail loudly if required `workflow_dispatch` inputs are missing. Do not rely on label search or recent-activity inference as a fallback.
+- **Input propagation**: planner / implementer / reviewer fail loudly if required `workflow_dispatch` inputs are missing or unresolved. When `issue_number` is present, the stalled agent adds `state:blocked` and posts: `🛑 agent-team: workflow_dispatch inputs were not propagated. Re-dispatch with valid inputs.` When `issue_number` itself is absent, the agent emits a `missing_data` safe-output and terminates. Neither case infers missing values from labels or activity. Re-dispatch manually with all inputs explicit to recover (see step 4 above).
+- **`pr_number` on the implementer**: optional input. Blank or absent means first attempt — the implementer opens a new draft PR. A real PR number (passed by the reviewer on kickback) tells the implementer to push updates to the existing PR branch instead of opening a new one.
 - **Non-UI only**: no screenshot capture. Reviewer validates via tests/CI status + reading the diff.
 - **Cost**: a single task can easily spend 4× the tokens of a monolithic workflow. Set `timeout-minutes` conservatively and monitor the first few runs.
 - **No auto-merge**: the reviewer approves but never merges. Humans merge.
 - **Dispatch visibility**: each `dispatch-workflow` call shows up as a new run in the Actions tab, linked to the upstream run. Makes the chain visible.
+
+## Troubleshooting
+
+### `state:blocked` with "workflow_dispatch inputs were not propagated"
+
+A downstream agent received an empty or unresolved input. To recover:
+
+1. Remove the `state:blocked` label.
+2. Re-dispatch the stalled role manually, passing every required input explicitly (see step 4 above). Agents never infer missing values — omitting any input will re-trigger the same block.
+
+### Implementer opened a new PR instead of updating the existing one
+
+The `pr_number` input was blank or absent during a kickback dispatch. The implementer always treats a missing `pr_number` as "first attempt → open new PR." Close the duplicate, then re-dispatch with the correct `pr_number`:
+
+```bash
+gh workflow run implementer-agent.lock.yml \
+  -f issue_number=<N> -f iteration=<N> -f pr_number=<existing-PR-number>
+```
+
+### Spec agent did nothing after the `agent-team` label was added
+
+By design: spec-agent exits silently if the issue already has any `state:*` label or a `<!-- agent-team:spec -->` block. Remove stale `state:*` labels from a prior run, then re-add the `agent-team` label. To rerun from scratch, also delete the prior spec comment.