Merged
66 changes: 66 additions & 0 deletions .github/skills/evaluate-skills/SKILL.md
@@ -0,0 +1,66 @@
---
name: evaluate-skills
description: Use when creating, updating, or reviewing Vally evals for plugin skills. Covers eval.yaml, fixtures, graders, expect_skills, suites, tags, and eval coverage for new or changed skills.
---

# Evaluate plugin skills

Use the [Vally reference docs](https://microsoft.github.io/vally/) for schema and grader details.

This repo keeps evals under `evals/` and currently uses `.vally.yaml` to define the `pr`
suite from evals tagged with `priority: p0`.

## Layout

```text
.vally.yaml
evals/
├── <plugin-name>/
│   ├── eval.yaml
│   └── fixtures/
│       └── <scenario>/
│           └── ...
└── ...
```

- Each plugin keeps its eval spec in `evals/<plugin-name>/eval.yaml`.
- Put sample inputs under `evals/<plugin-name>/fixtures/`.
- Seed fixture files into the eval environment with `environment.files`.

## When to use this skill

Use this skill when the request involves:

- adding evals for a new plugin skill
- updating evals after changing an existing plugin skill
- writing or editing `evals/<plugin-name>/eval.yaml`
- defining fixtures, graders, `expect_skills`, tags, or suites
- checking whether plugin-skill coverage is missing

## Writing evals

1. Add or update `evals/<plugin-name>/eval.yaml` with `name`, `version`,
`description`, `tags`, `defaults`, `scoring`, and `stimuli`.
2. Define one or more `stimuli` with a realistic user `prompt`, the skill path in
`environment.skills`, and any seeded files in `environment.files`.
3. Use skill paths relative to the eval file, for example
`../../plugins/linting/skills/check-spelling`.
4. Add `constraints.expect_skills` so the eval asserts the intended skill was used.
5. Prefer outcome-based `graders` such as `file-matches`, `file-not-matches`, and
`file-exists`.
6. Keep fixtures small and assertions specific to the behavior the skill should change.
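
To sanity-check step 3, the following minimal Python sketch shows how a relative skill path lines up with the repository layout. `resolve_skill_path` is a hypothetical helper, not part of Vally, and the `evals/linting/eval.yaml` location is an assumed example; the only rule it relies on is that skill paths are relative to the eval file.

```python
import posixpath


def resolve_skill_path(eval_file: str, skill_path: str) -> str:
    """Resolve a skill path relative to the directory holding the eval file.

    Hypothetical helper for illustration only; Vally does its own resolution.
    """
    eval_dir = posixpath.dirname(eval_file)
    # Join, then collapse the ".." segments to get a repo-relative path.
    return posixpath.normpath(posixpath.join(eval_dir, skill_path))


if __name__ == "__main__":
    # Assumed example: the eval spec for the linting plugin.
    print(resolve_skill_path(
        "evals/linting/eval.yaml",
        "../../plugins/linting/skills/check-spelling",
    ))
```

If the resolved path does not point at a directory containing `SKILL.md`, the relative path in `environment.skills` is probably off by a level.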

## Configuring evals

1. Keep `.vally.yaml` in sync if you add new suites or change how evals are grouped.
2. Use eval `tags` to drive suite filters; the current pull request suite, for
example, selects evals tagged `priority: p0`.
3. Lint specs with `npx -y @microsoft/vally-cli@0.6.0 lint --eval-spec evals`.
4. Run the pull request suite with
`COPILOT_GITHUB_TOKEN=... npx -y @microsoft/vally-cli@0.6.0 eval --suite pr --output-dir vally-results --junit`.
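
Steps 3 and 4 can be chained in a small wrapper. This is a sketch, assuming Python 3.9+ and `npx` on `PATH`; the function names are hypothetical, and the arguments are copied from the two commands above rather than from any additional Vally documentation.

```python
import os
import subprocess

# CLI prefix copied from the commands in this section.
VALLY = ["npx", "-y", "@microsoft/vally-cli@0.6.0"]


def lint_command(spec_dir: str = "evals") -> list[str]:
    """Build the argv for linting eval specs."""
    return [*VALLY, "lint", "--eval-spec", spec_dir]


def eval_command(suite: str = "pr", output_dir: str = "vally-results") -> list[str]:
    """Build the argv for running a suite with JUnit output."""
    return [*VALLY, "eval", "--suite", suite, "--output-dir", output_dir, "--junit"]


def run_checks() -> None:
    """Lint first, then run the suite; stop at the first failure."""
    subprocess.run(lint_command(), check=True)
    # The eval command above is shown with COPILOT_GITHUB_TOKEN set,
    # so this sketch refuses to run the suite without it.
    if not os.environ.get("COPILOT_GITHUB_TOKEN"):
        raise SystemExit("Set COPILOT_GITHUB_TOKEN before running the suite.")
    subprocess.run(eval_command(), check=True)
```

Call `run_checks()` from the repository root. Linting first keeps schema mistakes from consuming a full eval run.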

## Coverage rule

When adding a new plugin skill, or materially changing an existing plugin skill's
behavior, add or update eval coverage in the corresponding `evals/<plugin-name>/`
directory as part of the same change.
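
A coarse way to spot missing coverage is to compare `plugins/` against `evals/`. The sketch below is a hypothetical helper, not part of this repo or of Vally, and it works at plugin level only: it reports plugins that ship at least one `SKILL.md` but have no `evals/<plugin-name>/eval.yaml`. It does not check whether each individual skill is exercised.

```python
from pathlib import Path


def plugins_missing_evals(repo_root: Path) -> list[str]:
    """Return names of plugins that ship skills but have no eval spec."""
    missing: list[str] = []
    plugins_dir = repo_root / "plugins"
    if not plugins_dir.is_dir():
        return missing
    for plugin_dir in sorted(plugins_dir.iterdir()):
        skills_dir = plugin_dir / "skills"
        # A plugin counts as having skills if any skill folder has a SKILL.md.
        has_skills = skills_dir.is_dir() and any(
            (skill / "SKILL.md").is_file() for skill in skills_dir.iterdir()
        )
        eval_spec = repo_root / "evals" / plugin_dir.name / "eval.yaml"
        if has_skills and not eval_spec.is_file():
            missing.append(plugin_dir.name)
    return missing


if __name__ == "__main__":
    for name in plugins_missing_evals(Path(".")):
        print(f"missing evals/{name}/eval.yaml")
```

An empty result only means every plugin has an eval spec; reviewing `expect_skills` in each spec is still needed to confirm per-skill coverage.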
8 changes: 8 additions & 0 deletions AGENTS.md
@@ -7,6 +7,7 @@ Plugin marketplace for [GitHub Copilot CLI] and [Claude Code]. Defined in `.clau
| File | Purpose |
| --- | --- |
| `.claude-plugin/marketplace.json` | Marketplace manifest with plugin versions and sources |
| `.github/skills/<skill-name>/SKILL.md` | Repo-local skill for repository-specific guidance that should not always live in `AGENTS.md` |
| `plugins/<name>/.claude-plugin/plugin.json` | Plugin details: name, description, version, author, skills |
| `plugins/<name>/skills/<skill-name>/SKILL.md` | Skill — YAML frontmatter (`name`, `description`, `compatibility`) + instructions |
| `plugins/<name>/skills/<skill-name>/scripts/` | Python scripts and `requirements.txt` for the skill |
@@ -23,6 +24,7 @@ Each skill folder follows the [Agent Skills](https://agentskills.io) layout:
```

Skill paths in `plugin.json` are relative to the plugin directory (e.g., `"./skills/check-spelling"` resolves from `plugins/linting/`).
Repo-local skills under `.github/skills/` use the same skill-folder layout.

## Versioning

@@ -58,6 +60,12 @@ Keep the `README.md` **Plugins** section in sync with the marketplace:
2. **Remove** the subsection when a plugin is deleted from the marketplace.
3. **Update** description, skill names, and skill descriptions when they change in `plugin.json` or `SKILL.md` frontmatter.

## Evals

Add or update associated eval coverage under `evals/<plugin-name>/` when adding a new plugin skill or making substantial changes to an existing one.

Use `.github/skills/evaluate-skills/SKILL.md` when creating, updating, or reviewing Vally evals for plugin skills. It covers `eval.yaml`, fixtures, graders, `expect_skills`, suites, tags, and eval coverage for new or changed skills.

## Pre-commit checklist

1. Plugin files changed → bump plugin `version` in both `marketplace.json` and `plugin.json`.