Fix remaining AFDocs checks: HTML/MD parity, llms.txt coverage, and discoverability directives {WIP} #2929
Draft
shahbaz17 wants to merge 6 commits into
Add turndown + gfm dependencies and enhance llms-html-injector to derive per-page .md files from rendered HTML (cheerio + turndown), prepend llms.txt directive to .md files, inject <link rel="alternate" type="text/markdown"> into HTML heads and an sr-only body directive, prune stale links from llms-*.txt, and generate llms-all-*.txt indexes from sitemap.xml. Also mark various DocItem UI chrome with data-markdown-ignore so HTML→MD parity ignores UI-only elements, and update static/llms.txt to reference the new complete indexes. package.json and package-lock.json updated to include the new dependencies.
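The "prepend llms.txt directive to .md files" step could look roughly like the sketch below. The directive wording and the `prependDirective` name are hypothetical; the PR description does not quote the actual text or show the plugin's internals.

```javascript
// Hypothetical directive wording; the PR does not quote the real text.
const DIRECTIVE =
  '> For the complete documentation index, see [llms.txt](/llms.txt).';

/**
 * Prepend the directive blockquote to a markdown document.
 * Idempotent: a document that already starts with the directive
 * is returned unchanged, so repeated builds do not stack copies.
 */
function prependDirective(markdown, directive = DIRECTIVE) {
  if (markdown.startsWith(directive)) return markdown;
  return `${directive}\n\n${markdown}`;
}
```

Keeping the check idempotent matters because the injector runs on every build and may see files it has already processed.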
Swap turndown/turndown-plugin-gfm for node-html-markdown and update llms HTML->MD pipeline. package.json/package-lock: add node-html-markdown and remove turndown deps. src/plugins/llms-html-injector: replace TurndownService with NodeHtmlMarkdown, add custom translators/options to preserve <details>/<summary> and narrow escaping for parity, change injected markdown alt link to use path-only hrefs, and add Vercel preview detection + host-rewrite logic to rewrite production URLs to the preview origin for llms*.txt and per-page .md files. src/theme/DocItem/Layout/index.jsx: change data-markdown-ignore wrapper from <span> to <div> around CopyPageButton so it is stripped correctly during MD generation.
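A minimal sketch of the path-only alternate link injection, using plain string handling rather than the plugin's actual code. The `injectMarkdownAlternate` name and the `/docs/intro/` to `/docs/intro.md` sibling convention are assumptions for illustration.

```javascript
/**
 * Inject <link rel="alternate" type="text/markdown"> into an HTML head.
 * pagePath is the page's URL path, e.g. "/docs/intro/".
 * Assumption: the markdown sibling lives at the same path with ".md" appended.
 */
function injectMarkdownAlternate(html, pagePath) {
  // Skip pages that already carry the link so the step stays idempotent.
  if (html.includes('type="text/markdown"')) return html;
  const trimmed = pagePath.replace(/\/+$/, '');
  // The site root has no .md sibling, so nothing is injected there.
  if (trimmed === '') return html;
  const tag = `<link rel="alternate" type="text/markdown" href="${trimmed}.md">`;
  // A function replacement avoids "$" sequences being treated as patterns.
  return html.replace(/<\/head>/i, (close) => `${tag}${close}`);
}
```

Because the href carries no host, the same built HTML resolves correctly on production and on preview deployments.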
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit c60b679.
Add cheerio dependency and update preview URL handling for Vercel deployments. resolvePreviewSiteUrl now prefers VERCEL_BRANCH_URL and falls back to VERCEL_URL, with expanded JSDoc explaining Vercel env vars and rationale. postProcessLlmsOutput logs which env var was used and rewrites build artifacts to point at the preview host so AFDocs checks (llms txt/markdown link checks) work correctly on branch/preview/development deployments.
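The preference order described here can be sketched as follows. Only the names `resolvePreviewSiteUrl`, `VERCEL_BRANCH_URL`, and `VERCEL_URL` come from the commit message; the `VERCEL_ENV` guard, the return shape, the `rewriteHost` helper, and the example origins are assumptions.

```javascript
/**
 * Return the preview origin for a non-production Vercel build, or null.
 * Prefers VERCEL_BRANCH_URL and falls back to VERCEL_URL.
 * Vercel exposes both values without a protocol, so "https://" is added.
 * Assumption: production builds are detected through VERCEL_ENV.
 */
function resolvePreviewSiteUrl(env = process.env) {
  if (!env.VERCEL_ENV || env.VERCEL_ENV === 'production') return null;
  let source = null;
  if (env.VERCEL_BRANCH_URL) source = 'VERCEL_BRANCH_URL';
  else if (env.VERCEL_URL) source = 'VERCEL_URL';
  if (!source) return null;
  // Returning the variable name lets the caller log which one was used.
  return { origin: `https://${env[source]}`, source };
}

/** Replace every occurrence of the production origin with the preview origin. */
function rewriteHost(text, productionOrigin, previewOrigin) {
  return text.split(productionOrigin).join(previewOrigin);
}

// Usage with a hypothetical production origin:
const preview = resolvePreviewSiteUrl({
  VERCEL_ENV: 'preview',
  VERCEL_BRANCH_URL: 'docs-git-branch.vercel.app',
  VERCEL_URL: 'docs-abc123.vercel.app',
});
const rewritten = rewriteHost(
  '- [Intro](https://docs.example.com/intro.md)',
  'https://docs.example.com',
  preview.origin,
);
```

The branch URL stays stable across commits on the same branch, which is one plausible reason to prefer it over the per-deployment `VERCEL_URL`.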
Regenerate per-page .md files from their rendered HTML and mark KaTeX MathML spans to improve HTML/Markdown parity. Renamed and expanded the HTML->MD flow (regenerateMdFromHtml): walk build/**/index.html, convert article content to markdown, create missing .md siblings, and report regenerated/created/skipped counts. Add a regex and injection step to mark <span class="katex-mathml"> with data-markdown-ignore (idempotent) and log how many spans were marked. Improve katex handling in htmlToMarkdown by extracting the visual .katex-html text into a <code> node to avoid duplicated math content and by flattening Docusaurus chrome inside <details> so NHM output matches the rendered HTML. Tweak node-html-markdown options/translators (details/summary handling and line-start escaping) and update injector return values and logging. Also add a Vercel route header to serve llms*.txt as text/markdown with utf-8 charset.
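The idempotent KaTeX marking step might be approximated with a regex like the one below. The pattern and the `markKatexMathml` name are illustrative assumptions, not the regex the PR actually ships, and the sketch assumes `class` is the first attribute on the span.

```javascript
// Matches the opening tag of <span class="katex-mathml"> only when it does
// not already carry data-markdown-ignore (negative lookahead).
const KATEX_MATHML_OPEN =
  /<span class="katex-mathml"(?![^>]*data-markdown-ignore)([^>]*)>/g;

/**
 * Add data-markdown-ignore to every unmarked katex-mathml span.
 * Returns the rewritten HTML plus the number of spans marked,
 * so the caller can log the count.
 */
function markKatexMathml(html) {
  let marked = 0;
  const output = html.replace(KATEX_MATHML_OPEN, (_match, rest) => {
    marked += 1;
    return `<span class="katex-mathml" data-markdown-ignore${rest}>`;
  });
  return { html: output, marked };
}
```

Running the function a second time over its own output marks zero spans, which is the idempotency the commit message calls out.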
Rewrite requests for the site root to /llms.txt in middleware because the homepage has no .md sibling (the root index.html is skipped by the llms-html-injector), ensuring the documentation index is served as text/markdown. Add a Content-Signal line to static/robots.txt to advertise search/AI allowances (search=yes, ai-input=yes, ai-train=yes). Add a Link header for "/" in vercel.json pointing to </llms.txt> as rel="service-doc" and rel="alternate"; type="text/markdown" so clients can discover the markdown service doc.
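As a rough sketch of the root rewrite and the `Link` header value: the function names are hypothetical, and gating the rewrite on an `Accept: text/markdown` header is an assumption, since the commit message does not say which requests the middleware matches.

```javascript
/**
 * Decide which markdown document should answer a request, if any.
 * Assumption: only requests that accept text/markdown are rewritten.
 */
function markdownTargetFor(pathname, acceptHeader = '') {
  if (!acceptHeader.includes('text/markdown')) return null;
  // The homepage has no .md sibling, so the llms.txt index stands in for it.
  if (pathname === '/' || pathname === '') return '/llms.txt';
  return null; // other paths are left to the existing middleware logic
}

/** Build the Link header value described for "/" in vercel.json. */
function rootLinkHeader(target = '/llms.txt') {
  return [
    `<${target}>; rel="service-doc"`,
    `<${target}>; rel="alternate"; type="text/markdown"`,
  ].join(', ');
}
```

The two link relations share one target, so a client that understands either `service-doc` or `alternate` can find the index.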
Replace external MetaMask Provider URL with the internal /metamask-connect/evm/reference/provider-api/ link across many embedded-wallets EVM React Native docs. Also include minor updates to SDK docs, plugins, example maps, static images/logos and vercel.json to keep assets and tooling in sync.

Description
Add turndown + gfm dependencies and enhance llms-html-injector to derive per-page .md files from rendered HTML (cheerio + turndown), prepend llms.txt directive to .md files, inject <link rel="alternate" type="text/markdown"> into HTML heads and an sr-only body directive, prune stale links from llms-*.txt, and generate llms-all-*.txt indexes from sitemap.xml. Also mark various DocItem UI chrome with data-markdown-ignore so HTML→MD parity ignores UI-only elements, and update static/llms.txt to reference the new complete indexes. package.json and package-lock.json updated to include the new dependencies.
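The sitemap-driven index and stale-link pruning could be sketched as below, under stated assumptions: the helper names, the index line format, and the rule that a link is stale when its URL is absent from the known set are all illustrative, not taken from the plugin.

```javascript
/** Extract every <loc> URL from sitemap XML. */
function sitemapUrls(xml) {
  return [...xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/g)].map((m) => m[1]);
}

/** Render a flat llms-all-style index; the line format is an assumption. */
function buildIndex(title, urls) {
  const lines = urls.map((url) => `- [${url}](${url})`);
  return `# ${title}\n\n${lines.join('\n')}\n`;
}

/** Drop markdown list items whose link target is not a known URL. */
function pruneStaleLinks(text, knownUrls) {
  const known = new Set(knownUrls);
  return text
    .split('\n')
    .filter((line) => {
      const match = line.match(/^\s*[-*]\s*\[[^\]]*\]\(([^)\s]+)\)/);
      // Lines without a list-item link (headings, prose) are always kept.
      return !match || known.has(match[1]);
    })
    .join('\n');
}
```

Driving both steps from the same URL set keeps the complete index and the per-section files consistent with what was actually built.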
Issue(s) fixed
Fixes #
Preview
Checklist
External contributor checklist
Note
Medium Risk
Large build-time content transformation affects every published .md and llms*.txt artifact; risk is operational (build time, parity regressions) rather than security, with no runtime auth or payment changes.
Overview
Extends the llms-html-injector post-build pipeline so Agent Score / AFDocs checks pass for LLM-oriented docs: per-page .md is regenerated from rendered HTML (cheerio + node-html-markdown) instead of raw MDX, with KaTeX and escape rules tuned for markdown-content-parity.
Adds llms.txt directives (blockquote on every .md, sr-only body + path-relative rel="alternate" markdown links in HTML), sitemap-driven llms-all-*.txt indexes, pruning of stale links in section llms-*.txt, and Vercel preview host rewriting for absolute URLs in text artifacts. static/llms.txt now points at the complete page indexes.
DocItem layout marks breadcrumbs, TOC, copy button, footer, and paginator with data-markdown-ignore so UI chrome is excluded from HTML↔MD comparison. New deps: cheerio, node-html-markdown.
Reviewed by Cursor Bugbot for commit 614554b. Bugbot is set up for automated code reviews on this repo.