Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 13 additions & 3 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -74,12 +74,22 @@ Edit the relevant `SKILL.md` or data file. Test by running the skill locally wit

## Testing

There is no automated test harness for skills — they are instruction sets interpreted by Claude Code, not code with unit tests. The validation steps are:
The repo ships a [tiered test suite](tests/README.md) that runs Claude Code headlessly (`claude -p`) to assert observable skill behavior — faster and more reliable than manual walkthroughs. At minimum, run these before opening a PR:

```bash
./tests/run-tests.sh # tier-2 invariants + tier-1 skill knowledge tests (~4-5 min)
./tests/run-tests.sh --verbose # show per-assertion output
```

Additional validation steps:

1. **Load the plugin**: `claude --plugin-dir .` — confirm no startup errors.
2. **Run the skill manually**: invoke `/discover-workflows` or `/install-workflow` and walk through the flow.
3. **Validate lock files** (if you changed `.lock.yml` files): `gh aw validate` — safe, does not recompile.
4. **Check grep counts** (if you applied the OAuth tweak): see [skills/install-workflow/auth.md](skills/install-workflow/auth.md#step-4--verify-the-tweak-shape).
3. **Run the fast test suite**: `./tests/run-tests.sh` — catches skill instruction drift and forbidden patterns without a full end-to-end run.
4. **Validate lock files** (if you changed `.lock.yml` files): `gh aw validate` — safe, does not recompile.
5. **Check grep counts** (if you applied the OAuth tweak): see [skills/install-workflow/auth.md](skills/install-workflow/auth.md#step-4--verify-the-tweak-shape).

For changes to `catalog/agent-team/*.md`, also run the tier-3 end-to-end tests before opening a PR — these exercise the full pipeline on a live playground repo and catch handoff breakage that fast tests miss. See [Running tier-3](tests/README.md#running-tier-3).

Never test by committing untested changes to `main`. The installed workflows run on push to `main`, so a broken install skill or a bad `.lock.yml` will trigger a live workflow run.

Expand Down
44 changes: 41 additions & 3 deletions catalog/agent-team/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -104,15 +104,53 @@ Then apply the OAuth token tweak to each `.lock.yml` per [`skills/install-workfl
1. Open an issue describing what you want built.
2. Add the single label `agent-team`.
3. Watch the thread. Each role posts its contribution as a comment; the implementer opens a draft PR that closes the issue when merged.
4. Human override at any time: add `state:blocked` to halt, edit a comment to steer the next agent, or manually `gh workflow run` a specific role to retry a stuck stage. Manual dispatches must pass the required `workflow_dispatch` inputs, and the downstream workflow markdown must read them via `${{ github.event.inputs.* }}`.
5. **Retrying a blocked task**: clear `state:blocked`, then re-add `agent-team`. Spec-agent treats it as a fresh dispatch (because the state:* labels are gone and the spec markers are already satisfied — actually: to redo from scratch, also delete the prior spec comment).
4. Human override at any time: add `state:blocked` to halt, edit a comment to steer the next agent, or manually re-dispatch a specific role to retry a stuck stage. All required inputs must be explicit — agents will not infer missing values from labels or history:

```bash
# Re-dispatch the planner (e.g. spec ran but planner stalled)
gh workflow run planner-agent.lock.yml -f issue_number=<N> -f iteration=1

# Re-dispatch the implementer — first attempt (opens a new draft PR)
gh workflow run implementer-agent.lock.yml -f issue_number=<N> -f iteration=<N>

# Re-dispatch the implementer — kickback (updates the existing PR)
gh workflow run implementer-agent.lock.yml -f issue_number=<N> -f iteration=<N> -f pr_number=<PR-N>

# Re-dispatch the reviewer
gh workflow run reviewer-agent.lock.yml -f pr_number=<PR-N> -f issue_number=<N> -f iteration=<N>
```

5. **Retrying a blocked task** (pipeline stalled, not just a single stage): clear `state:blocked`, then re-add `agent-team`. Spec-agent treats it as a fresh dispatch (because the state:* labels are gone and the spec markers are already satisfied — actually: to redo from scratch, also delete the prior spec comment).

## Limits and gotchas

- **Concurrency**: each workflow uses `concurrency: group: agent-team-issue-${issue_number}` so only one role runs at a time per issue.
- **Max iterations**: default 3 (reviewer kickback → implementer). The counter lives on the `iteration` input passed through the dispatch chain, bumped exclusively by the reviewer on kickback.
- **Input propagation**: planner / implementer / reviewer must fail loudly if required `workflow_dispatch` inputs are missing. Do not rely on label search or recent-activity inference as a fallback.
- **Input propagation**: planner / implementer / reviewer fail loudly if required `workflow_dispatch` inputs are missing or unresolved. When `issue_number` is present, the stalled agent adds `state:blocked` and posts: `🛑 agent-team: workflow_dispatch inputs were not propagated. Re-dispatch with valid inputs.` When `issue_number` itself is absent, the agent emits a `missing_data` safe-output and terminates. Neither case infers missing values from labels or activity. Re-dispatch manually with all inputs explicit to recover (see step 4 above).
- **`pr_number` on the implementer**: optional input. Blank or absent means first attempt — the implementer opens a new draft PR. A real PR number (passed by the reviewer on kickback) tells the implementer to push updates to the existing PR branch instead of opening a new one.
- **Non-UI only**: no screenshot capture. Reviewer validates via tests/CI status + reading the diff.
- **Cost**: a single task can easily spend 4× the tokens of a monolithic workflow. Set `timeout-minutes` conservatively and monitor the first few runs.
- **No auto-merge**: the reviewer approves but never merges. Humans merge.
- **Dispatch visibility**: each `dispatch-workflow` call shows up as a new run in the Actions tab, linked to the upstream run. Makes the chain visible.

## Troubleshooting

### `state:blocked` with "workflow_dispatch inputs were not propagated"

A downstream agent received an empty or unresolved input. To recover:

1. Remove the `state:blocked` label.
2. Re-dispatch the stalled role manually, passing every required input explicitly (see step 4 above). Agents never infer missing values — omitting any input will re-trigger the same block.

### Implementer opened a new PR instead of updating the existing one

The `pr_number` input was blank or absent during a kickback dispatch. The implementer always treats a missing `pr_number` as "first attempt → open new PR." Close the duplicate, then re-dispatch with the correct `pr_number`:

```bash
gh workflow run implementer-agent.lock.yml \
-f issue_number=<N> -f iteration=<N> -f pr_number=<existing-PR-number>
```

### Spec agent did nothing after the `agent-team` label was added

By design: spec-agent exits silently if the issue already has any `state:*` label or a `<!-- agent-team:spec -->` block. Remove stale `state:*` labels from a prior run, then re-add the `agent-team` label. To rerun from scratch, also delete the prior spec comment.