Fix remaining AFDocs checks: HTML/MD parity, llms.txt coverage, and discoverability directives {WIP}#2929

Draft
shahbaz17 wants to merge 6 commits into main from imp-agt-scr-90-100

shahbaz17 (Member) commented May 27, 2026

Description

Add turndown + gfm dependencies and enhance llms-html-injector to derive per-page .md files from rendered HTML (cheerio + turndown), prepend an llms.txt directive to .md files, inject <link rel="alternate" type="text/markdown"> into HTML heads along with an sr-only body directive, prune stale links from llms-*.txt, and generate llms-all-*.txt indexes from sitemap.xml. Also mark various DocItem UI chrome with data-markdown-ignore so HTML→MD parity ignores UI-only elements, and update static/llms.txt to reference the new complete indexes. package.json and package-lock.json are updated to include the new dependencies.
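
For reviewers, the sitemap-driven index step could look roughly like the sketch below. It is not the plugin's code: `extractSitemapUrls`, `buildSectionIndex`, the example host, and the "# title plus bullet list" layout are all assumptions made for illustration, and page URLs are listed as-is because this thread does not show how the plugin maps them to .md files.

```javascript
// Hypothetical helper: collect every <loc> URL from a sitemap.xml string.
// Assumes URLs contain no XML entities such as &amp;.
function extractSitemapUrls(sitemapXml) {
  const urls = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let match;
  while ((match = locPattern.exec(sitemapXml)) !== null) {
    urls.push(match[1]);
  }
  return urls;
}

// Hypothetical helper: build the text of one llms-all-*.txt style index
// from the URLs whose path starts with sectionPrefix.
function buildSectionIndex(title, urls, sectionPrefix) {
  const lines = urls
    .filter((url) => new URL(url).pathname.startsWith(sectionPrefix))
    .sort()
    .map((url) => `- ${url}`);
  return [`# ${title}`, "", ...lines, ""].join("\n");
}
```

Deriving the index from sitemap.xml rather than from the docs source tree means the index only lists pages that were actually published.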

Issue(s) fixed

Fixes #

Preview

Checklist

  • If this PR updates or adds documentation content that changes or adds technical meaning, it has received an approval from an engineer or DevRel from the relevant team.
  • If this PR updates or adds documentation content, it has received an approval from a technical writer.

External contributor checklist

  • I've read the contribution guidelines.
  • I've created a new issue (or assigned myself to an existing issue) describing what this PR addresses.

Note

Medium Risk
Large build-time content transformation affects every published .md and llms*.txt artifact; risk is operational (build time, parity regressions) rather than security, with no runtime auth or payment changes.

Overview
Extends the llms-html-injector post-build pipeline so Agent Score / AFDocs checks pass for LLM-oriented docs: per-page .md is regenerated from rendered HTML (cheerio + node-html-markdown) instead of raw MDX, with KaTeX and escape rules tuned for markdown-content-parity.

Adds llms.txt directives (blockquote on every .md, sr-only body + path-relative rel="alternate" markdown links in HTML), sitemap-driven llms-all-*.txt indexes, pruning of stale links in section llms-*.txt, and Vercel preview host rewriting for absolute URLs in text artifacts. static/llms.txt now points at the complete page indexes.
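
As an illustration of the blockquote directive on every .md file, here is a minimal sketch. The directive wording and the `prependLlmsDirective` name are hypothetical, since the actual text the plugin writes is not shown in this thread.

```javascript
// Hypothetical directive text (an assumption, not the PR's wording).
const LLMS_DIRECTIVE = "> For the complete documentation index, see /llms.txt";

// Prepend the directive as a leading blockquote.
// Idempotent, so re-running the post-build step does not stack directives.
function prependLlmsDirective(markdown, directive = LLMS_DIRECTIVE) {
  if (markdown.startsWith(directive)) return markdown;
  return `${directive}\n\n${markdown}`;
}
```

The idempotency check matters if the post-build step can run more than once over the same build output.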

DocItem layout marks breadcrumbs, TOC, copy button, footer, and paginator with data-markdown-ignore so UI chrome is excluded from HTML↔MD comparison. New deps: cheerio, node-html-markdown.

Reviewed by Cursor Bugbot for commit 614554b. Bugbot is set up for automated code reviews on this repo.

Add turndown + gfm dependencies and enhance llms-html-injector to derive per-page .md files from rendered HTML (cheerio + turndown), prepend llms.txt directive to .md files, inject <link rel="alternate" type="text/markdown"> into HTML heads and an sr-only body directive, prune stale links from llms-*.txt, and generate llms-all-*.txt indexes from sitemap.xml. Also mark various DocItem UI chrome with data-markdown-ignore so HTML→MD parity ignores UI-only elements, and update static/llms.txt to reference the new complete indexes. package.json and package-lock.json updated to include the new dependencies.
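
The stale-link pruning mentioned above might be sketched as follows. This is an assumption-laden illustration, not the plugin's implementation: `pruneStaleLinks` is a hypothetical name, the set of existing paths is assumed to come from walking the build output, and only the first Markdown link on each line is checked.

```javascript
// Hypothetical helper: drop lines of an llms-*.txt file whose link target
// is not in the set of paths that exist in the build output.
function pruneStaleLinks(llmsText, existingPaths) {
  const linkPattern = /\]\(([^)\s]+)\)/; // first "](target)" on the line
  return llmsText
    .split("\n")
    .filter((line) => {
      const match = linkPattern.exec(line);
      if (!match) return true; // keep headings, blank lines, and prose
      // The placeholder base lets relative and absolute targets share one code path.
      const pathname = new URL(match[1], "https://placeholder.invalid").pathname;
      return existingPaths.has(pathname);
    })
    .join("\n");
}
```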
shahbaz17 requested review from a team as code owners May 27, 2026 07:21

vercel Bot commented May 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
metamask-docs | Ready | Preview, Comment | May 29, 2026 11:43am

socket-security Bot commented May 27, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License
Added | node-html-markdown@2.0.0 | 100 | 100 | 100 | 81 | 100

Comment thread src/plugins/llms-html-injector/index.js Outdated
Comment thread src/theme/DocItem/Layout/index.jsx Outdated
Swap turndown/turndown-plugin-gfm for node-html-markdown and update llms HTML->MD pipeline. package.json/package-lock: add node-html-markdown and remove turndown deps. src/plugins/llms-html-injector: replace TurndownService with NodeHtmlMarkdown, add custom translators/options to preserve <details>/<summary> and narrow escaping for parity, change injected markdown alt link to use path-only hrefs, and add Vercel preview detection + host-rewrite logic to rewrite production URLs to the preview origin for llms*.txt and per-page .md files. src/theme/DocItem/Layout/index.jsx: change data-markdown-ignore wrapper from <span> to <div> around CopyPageButton so it is stripped correctly during MD generation.
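
The host-rewrite step for text artifacts could be sketched like this. `rewriteToPreviewHost` and the example origins are hypothetical; the real logic lives in src/plugins/llms-html-injector and may differ.

```javascript
// Hypothetical helper: replace every occurrence of the production origin in a
// text artifact (llms*.txt or a per-page .md) with the preview origin.
function rewriteToPreviewHost(text, productionOrigin, previewOrigin) {
  // Escape regex metacharacters in the origin (for example the dots in the host).
  const escaped = productionOrigin.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Require a path, whitespace, closing delimiter, or end of text after the
  // origin so that a longer hostname sharing the same prefix is left alone.
  const pattern = new RegExp(`${escaped}(?=[/\\s)"'>]|$)`, "g");
  // A function replacer avoids "$" in previewOrigin being read as a pattern.
  return text.replace(pattern, () => previewOrigin);
}
```

One trade-off of the lookahead: an origin followed directly by sentence punctuation, such as a trailing period, is not rewritten.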

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit c60b679.

Comment thread src/plugins/llms-html-injector/index.js
Add cheerio dependency and update preview URL handling for Vercel deployments.

resolvePreviewSiteUrl now prefers VERCEL_BRANCH_URL and falls back to VERCEL_URL, with expanded JSDoc explaining Vercel env vars and rationale. postProcessLlmsOutput logs which env var was used and rewrites build artifacts to point at the preview host so AFDocs checks (llms txt/markdown link checks) work correctly on branch/preview/development deployments.
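
A minimal sketch of the resolution order described above, assuming Vercel's documented system environment variables (`VERCEL_ENV`, `VERCEL_BRANCH_URL`, `VERCEL_URL`, where the URL variables carry a host without a scheme). The return shape and the decision to return `null` for production builds are assumptions; the PR's actual function may behave differently.

```javascript
// Sketch only: resolve the origin that preview build artifacts should point at.
// Returns null when no rewrite should happen.
function resolvePreviewSiteUrl(env = process.env) {
  // Assumption: production builds (and non-Vercel builds) keep canonical URLs.
  if (!env.VERCEL_ENV || env.VERCEL_ENV === "production") return null;
  // Prefer the stable branch alias, then fall back to the per-deployment URL.
  for (const name of ["VERCEL_BRANCH_URL", "VERCEL_URL"]) {
    const host = env[name];
    if (host) return { url: `https://${host}`, source: name };
  }
  return null;
}
```

Returning the variable name alongside the URL makes it easy to log which one was used, as the commit message describes.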
shahbaz17 marked this pull request as draft May 27, 2026 08:28
shahbaz17 changed the title from "llms: regenerate markdown, inject directives {WIP}" to "Fix remaining AFDocs checks: HTML/MD parity, llms.txt coverage, and discoverability directives {WIP}" May 27, 2026
shahbaz17 marked this pull request as ready for review May 27, 2026 08:39
shahbaz17 marked this pull request as draft May 27, 2026 08:52
Regenerate per-page .md files from their rendered HTML and mark KaTeX MathML spans to improve HTML/Markdown parity. Renamed and expanded the HTML->MD flow (regenerateMdFromHtml): walk build/**/index.html, convert article content to markdown, create missing .md siblings, and report regenerated/created/skipped counts. Add a regex and injection step to mark <span class="katex-mathml"> with data-markdown-ignore (idempotent) and log how many spans were marked. Improve katex handling in htmlToMarkdown by extracting the visual .katex-html text into a <code> node to avoid duplicated math content and by flattening Docusaurus chrome inside <details> so NHM output matches the rendered HTML. Tweak node-html-markdown options/translators (details/summary handling and line-start escaping) and update injector return values and logging. Also add a Vercel route header to serve llms*.txt as text/markdown with utf-8 charset.
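
The idempotent KaTeX marking step might look something like the sketch below. The regex and the `markKatexMathml` name are illustrative assumptions rather than the plugin's code, and the pattern only handles double-quoted class attributes.

```javascript
// Hypothetical helper: add data-markdown-ignore to <span class="katex-mathml">
// opening tags that do not already carry it, and count how many were marked.
function markKatexMathml(html) {
  let marked = 0;
  // The negative lookahead skips tags that are already marked, which makes
  // a second pass over the same HTML a no-op.
  const pattern =
    /<span\b(?![^>]*\bdata-markdown-ignore\b)([^>]*\bclass="[^"]*\bkatex-mathml\b[^"]*"[^>]*)>/g;
  const output = html.replace(pattern, (_match, attrs) => {
    marked += 1;
    return `<span${attrs} data-markdown-ignore>`;
  });
  return { html: output, marked };
}
```

Returning the count supports the kind of "how many spans were marked" logging the commit message mentions.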
Rewrite requests for the site root to /llms.txt in middleware because the homepage has no .md sibling (the root index.html is skipped by the llms-html-injector), ensuring the documentation index is served as text/markdown. Add a Content-Signal line to static/robots.txt to advertise search/AI allowances (search=yes, ai-input=yes, ai-train=yes). Add a Link header for "/" in vercel.json pointing to </llms.txt> as rel="service-doc" and rel="alternate"; type="text/markdown" so clients can discover the markdown service doc.
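
To make the Link header concrete, the sketch below builds a vercel.json `headers` entry for "/" as a plain object. Combining both relations into one comma-separated header value is an assumption, as is the `buildRootLinkHeaderEntry` name; the PR's vercel.json may structure this differently.

```javascript
// Hypothetical helper: build a vercel.json "headers" entry that advertises
// /llms.txt from the site root as both a service doc and a Markdown alternate.
function buildRootLinkHeaderEntry(target = "/llms.txt") {
  const value = [
    `<${target}>; rel="service-doc"`,
    `<${target}>; rel="alternate"; type="text/markdown"`,
  ].join(", ");
  return { source: "/", headers: [{ key: "Link", value }] };
}

// Example: print the entry as it would appear inside vercel.json.
console.log(JSON.stringify(buildRootLinkHeaderEntry(), null, 2));
```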
Replace external MetaMask Provider URL with the internal /metamask-connect/evm/reference/provider-api/ link across many embedded-wallets EVM React Native docs. Also include minor updates to SDK docs, plugins, example maps, static images/logos and vercel.json to keep assets and tooling in sync.