From cd5b7f3b3f2278a0bf3920c20ece22d052468740 Mon Sep 17 00:00:00 2001
From: Pedro Paulo Vezza Campos
Date: Mon, 18 May 2026 14:43:59 -0700
Subject: [PATCH] docs(skill): improve description to fix 20 pp recall gap in LLM routing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The current one-liner ("Automate browser interactions...") causes the
model to miss 20% of valid invocations when routing from description
text alone. This was verified by benchmarking 5 variants with OpenAI
Codex as judge against a 50-prompt test set (40 should-trigger,
10 should-not).

Benchmark methodology: skill renamed to a fictional name
(`headless-pilot`) so the judge cannot use prior knowledge of
playwright-cli, forcing it to route purely from description text.
All descriptions had perfect precision (1.000); recall was what
differed.

Results:
  baseline (current)    recall=0.800  F1=0.889  (8 misses)
  v1 explicit triggers  recall=0.950  F1=0.974  (2 misses)
  v2 intent-first       recall=0.925  F1=0.961  (3 misses)
  v3 verb-dense         recall=0.950  F1=0.974  (2 misses)
  v4 this PR (winner)   recall=0.975  F1=0.987  (1 miss)

The 8 prompts the baseline misses all have unambiguous playwright-cli
solutions but don't contain "automate" or "browser interactions":
record video, export PDF, mock API endpoint, run a specific spec file,
intercept network requests, debug a failing test by file path.

The winning description uses "Use when the user says: ..." with quoted
natural-language trigger phrases. This gives the routing model a
direct string-match signal rather than requiring it to infer that
"record a demo" ≡ "automate browser interactions".

Only change: the `description` field in the YAML frontmatter.
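
For reviewers who want to sanity-check the results above, here is a minimal sketch that re-derives recall and F1 from the reported miss counts. It is not the benchmark harness itself. It assumes the 40 should-trigger prompts and zero false positives implied by the stated precision of 1.000; the names `recall_and_f1` and `MISSES_BY_VARIANT` are hypothetical and exist only for this illustration.

```python
# Re-derive the recall/F1 figures in the results from the miss counts.
# Assumptions (taken from the message, not from any harness code):
#   - 40 should-trigger prompts in the test set
#   - precision is 1.000 for every variant, i.e. zero false positives
SHOULD_TRIGGER = 40

# Miss counts as reported in the results (hypothetical variable name).
MISSES_BY_VARIANT = {
    "baseline (current)": 8,
    "v1 explicit triggers": 2,
    "v2 intent-first": 3,
    "v3 verb-dense": 2,
    "v4 this PR (winner)": 1,
}


def recall_and_f1(misses, false_positives=0):
    """Return (recall, F1) rounded to 3 decimals for a given miss count."""
    true_positives = SHOULD_TRIGGER - misses
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / SHOULD_TRIGGER
    f1 = 2 * precision * recall / (precision + recall)
    return round(recall, 3), round(f1, 3)


if __name__ == "__main__":
    for variant, misses in MISSES_BY_VARIANT.items():
        recall, f1 = recall_and_f1(misses)
        print(f"{variant:<22} recall={recall:.3f} F1={f1:.3f}")
```

With precision fixed at 1.000, F1 depends only on recall, which is why v1 and v3 (2 misses each) report identical scores.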
---
 skills/playwright-cli/SKILL.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/skills/playwright-cli/SKILL.md b/skills/playwright-cli/SKILL.md
index f034c32..15d3400 100644
--- a/skills/playwright-cli/SKILL.md
+++ b/skills/playwright-cli/SKILL.md
@@ -1,6 +1,10 @@
 ---
 name: playwright-cli
-description: Automate browser interactions, test web pages and work with Playwright tests.
+description: >
+  Use when the user says: "go to this URL", "click this button", "fill this
+  form", "take a screenshot", "scrape this page", "log in to X", "run my
+  Playwright tests", "this test is failing", "write a test for", "mock this
+  API", "record a demo". The skill opens a live browser with Playwright and drives it.
 allowed-tools: Bash(playwright-cli:*) Bash(npx:*) Bash(npm:*)
 ---