diff --git a/.github/skills/evaluate-skills/SKILL.md b/.github/skills/evaluate-skills/SKILL.md
new file mode 100644
index 0000000..6e694ea
--- /dev/null
+++ b/.github/skills/evaluate-skills/SKILL.md
@@ -0,0 +1,66 @@
+---
+name: evaluate-skills
+description: Use when creating, updating, or reviewing Vally evals for plugin skills. Covers eval.yaml, fixtures, graders, expect_skills, suites, tags, and eval coverage for new or changed skills.
+---
+
+# Evaluate plugin skills
+
+Use the [Vally reference docs](https://microsoft.github.io/vally/) for schema and grader details.
+
+This repo keeps evals under `evals/` and currently uses `.vally.yaml` to define the `pr`
+suite from evals tagged with `priority: p0`.
+
+## Layout
+
+```text
+.vally.yaml
+evals/
+├── /
+│   ├── eval.yaml
+│   └── fixtures/
+│       └── /
+│           └── ...
+└── ...
+```
+
+- Each plugin keeps its eval spec in `evals//eval.yaml`.
+- Put sample inputs under `evals//fixtures/`.
+- Seed fixture files into the eval environment with `environment.files`.
+
+## When to use this skill
+
+Use this skill when the request involves:
+
+- adding evals for a new plugin skill
+- updating evals after changing an existing plugin skill
+- writing or editing `evals//eval.yaml`
+- defining fixtures, graders, `expect_skills`, tags, or suites
+- checking whether plugin-skill coverage is missing
+
+## Writing evals
+
+1. Add or update `evals//eval.yaml` with `name`, `version`,
+   `description`, `tags`, `defaults`, `scoring`, and `stimuli`.
+2. Define one or more `stimuli` with a realistic user `prompt`, the skill path in
+   `environment.skills`, and any seeded files in `environment.files`.
+3. Use skill paths relative to the eval file, for example
+   `../../plugins/linting/skills/check-spelling`.
+4. Add `constraints.expect_skills` so the eval asserts the intended skill was used.
+5. Prefer outcome-based `graders` such as `file-matches`, `file-not-matches`, and
+   `file-exists`.
+6. Keep fixtures small and assertions specific to the behavior the skill should change.
+
+## Configuring evals
+
+1. Keep `.vally.yaml` in sync if you add new suites or change how evals are grouped.
+2. Use eval `tags` for suite filters such as the current `priority: p0` pull request
+   suite.
+3. Lint specs with `npx -y @microsoft/vally-cli@0.6.0 lint --eval-spec evals`.
+4. Run the pull request suite with
+   `COPILOT_GITHUB_TOKEN=... npx -y @microsoft/vally-cli@0.6.0 eval --suite pr --output-dir vally-results --junit`.
+
+## Coverage rule
+
+When adding a new plugin skill, or materially changing an existing plugin skill's
+behavior, add or update eval coverage in the corresponding `evals//`
+directory as part of the same change.
diff --git a/AGENTS.md b/AGENTS.md
index e68063b..e6dbf9e 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -7,6 +7,7 @@ Plugin marketplace for [GitHub Copilot CLI] and [Claude Code]. Defined in `.clau
 | File | Purpose |
 | --- | --- |
 | `.claude-plugin/marketplace.json` | Marketplace manifest with plugin versions and sources |
+| `.github/skills//SKILL.md` | Repo-local skill for repository-specific guidance that should not always live in `AGENTS.md` |
 | `plugins//.claude-plugin/plugin.json` | Plugin details: name, description, version, author, skills |
 | `plugins//skills//SKILL.md` | Skill — YAML frontmatter (`name`, `description`, `compatibility`) + instructions |
 | `plugins//skills//scripts/` | Python scripts and `requirements.txt` for the skill |
@@ -23,6 +24,7 @@ Each skill folder follows the [Agent Skills](https://agentskills.io) layout:
 ```
 
 Skill paths in `plugin.json` are relative to the plugin directory (e.g., `"./skills/check-spelling"` resolves from `plugins/linting/`).
+Repo-local skills under `.github/skills/` use the same skill-folder layout.
 
 ## Versioning
 
@@ -58,6 +60,12 @@ Keep the `README.md` **Plugins** section in sync with the marketplace:
 2. **Remove** the subsection when a plugin is deleted from the marketplace.
 3. **Update** description, skill names, and skill descriptions when they change in `plugin.json` or `SKILL.md` frontmatter.
 
+## Evals
+
+Add or update associated eval coverage under `evals//` when adding a new plugin skill or making substantial changes to an existing one.
+
+Use `.github/skills/evaluate-skills/SKILL.md` for eval work: use when creating, updating, or reviewing Vally evals for plugin skills. Covers `eval.yaml`, fixtures, graders, `expect_skills`, suites, tags, and eval coverage for new or changed skills.
+
 ## Pre-commit checklist
 
 1. Plugin files changed → bump plugin `version` in both `marketplace.json` and `plugin.json`.
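
Review note (not part of the patch): the coverage rule this change introduces could be backed by a small mechanical check. The sketch below is only an illustration and is not a Vally feature. It assumes that each plugin's eval directory under `evals/` reuses the plugin's directory name under `plugins/`, and that a skill is any `skills/*/SKILL.md` inside a plugin. The function name `plugins_missing_evals` is hypothetical.

```python
from pathlib import Path


def plugins_missing_evals(repo_root: Path) -> list[str]:
    """Return names of plugins that ship skills but have no eval spec.

    Assumption: evals live at evals/NAME/eval.yaml, where NAME is the
    plugin's directory name under plugins/.
    """
    missing: list[str] = []
    plugins_dir = repo_root / "plugins"
    if not plugins_dir.is_dir():
        return missing

    for plugin_dir in sorted(p for p in plugins_dir.iterdir() if p.is_dir()):
        # A plugin counts as having skills if any skills/*/SKILL.md exists.
        has_skill = any(plugin_dir.glob("skills/*/SKILL.md"))
        eval_spec = repo_root / "evals" / plugin_dir.name / "eval.yaml"
        if has_skill and not eval_spec.is_file():
            missing.append(plugin_dir.name)
    return missing
```

Calling `plugins_missing_evals(Path("."))` from the repository root would list plugins with no eval spec at all. This is a coarse check: it does not confirm that every skill in a plugin is exercised by a stimulus, which still needs review against `constraints.expect_skills`.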