From cd5b7f3b3f2278a0bf3920c20ece22d052468740 Mon Sep 17 00:00:00 2001
From: Pedro Paulo Vezza Campos <pedro@vezza.com.br>
Date: Mon, 18 May 2026 14:43:59 -0700
Subject: [PATCH] docs(skill): improve description to fix 20 pp recall gap in
 LLM routing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The current one-liner ("Automate browser interactions...") causes the
routing model to miss 20% of valid invocations (8 of 40) when it routes
from the description text alone. This was measured by benchmarking 5
description variants, with OpenAI Codex as the judge, against a
50-prompt test set (40 should-trigger, 10 should-not).

Benchmark methodology: the skill was renamed to a fictional name
(`headless-pilot`) so the judge could not use prior knowledge of
playwright-cli, forcing it to route purely from the description text.
Every description achieved perfect precision (1.000); only recall
differed.

Results:
  baseline (current)   recall=0.800  F1=0.889  (8 misses)
  v1 explicit triggers recall=0.950  F1=0.974  (2 misses)
  v2 intent-first      recall=0.925  F1=0.961  (3 misses)
  v3 verb-dense        recall=0.950  F1=0.974  (2 misses)
  v4 this PR (winner)  recall=0.975  F1=0.987  (1 miss)
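
The recall and F1 columns follow directly from the miss counts. Because
precision is 1.000 for every variant, F1 reduces to 2R / (1 + R). The
minimal Python sketch below recomputes the table under that assumption
(40 should-trigger prompts, zero false positives). `scores` and
`variants` are illustrative names, not part of the benchmark harness.

```python
# Recompute precision, recall, and F1 from raw miss counts.
# Assumption: 40 should-trigger prompts, as stated in the methodology,
# and zero false positives for every variant (precision = 1.000).
SHOULD_TRIGGER = 40


def scores(misses: int, false_positives: int = 0) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for a variant with `misses` false negatives."""
    tp = SHOULD_TRIGGER - misses          # should-trigger prompts routed correctly
    precision = tp / (tp + false_positives)
    recall = tp / SHOULD_TRIGGER
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Miss counts copied from the results table above.
variants = {
    "baseline (current)": 8,
    "v1 explicit triggers": 2,
    "v2 intent-first": 3,
    "v3 verb-dense": 2,
    "v4 this PR (winner)": 1,
}

for name, misses in variants.items():
    p, r, f1 = scores(misses)
    print(f"{name:<21} precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Rounded to three decimals, the printed rows match the table above.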

The 8 prompts the baseline misses all have unambiguous playwright-cli
solutions, yet none contains "automate" or "browser interactions".
They cover tasks such as recording a video, exporting a PDF, mocking
an API endpoint, running a specific spec file, intercepting network
requests, and debugging a failing test by file path.

The winning description opens with "Use when the user says: ..."
followed by quoted natural-language trigger phrases. This gives the
routing model a direct string-match signal instead of requiring it to
infer that "record a demo" ≡ "automate browser interactions".

Only change: the `description` field in the YAML frontmatter.
---
 skills/playwright-cli/SKILL.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/skills/playwright-cli/SKILL.md b/skills/playwright-cli/SKILL.md
index f034c32..15d3400 100644
--- a/skills/playwright-cli/SKILL.md
+++ b/skills/playwright-cli/SKILL.md
@@ -1,6 +1,10 @@
 ---
 name: playwright-cli
-description: Automate browser interactions, test web pages and work with Playwright tests.
+description: >
+  Use when the user says: "go to this URL", "click this button", "fill this
+  form", "take a screenshot", "scrape this page", "log in to X", "run my
+  Playwright tests", "this test is failing", "write a test for", "mock this
+  API", "record a demo". The skill opens a live browser with Playwright and drives it.
 allowed-tools: Bash(playwright-cli:*) Bash(npx:*) Bash(npm:*)
 ---