feat: generate llms-full.txt with complete docs content #476
Open
sandroqdb wants to merge 3 commits into
The website's llms.txt has long advertised a full documentation corpus at /docs/llms-full.txt, but no such file was ever generated, so the URL 404s.

This adds scripts/generate-llms-full.js, which walks the sidebar (same order as llms.txt) and concatenates every doc's full markdown content into static/llms-full.txt, served at /docs/llms-full.txt. MDX processing (partials, component conversion, import stripping, heading bumping) reuses plugins/raw-markdown/convert-components, so the output matches the per-page .md endpoints exactly. Each doc entry carries a Source: line pointing at its canonical markdown URL.

Output on current content: 354 docs, 2.69 MB. Wired into prebuild and gitignored like the other generated files.

Companion to questdb/questdb.io#2923, which repairs the llms.txt link; once this deploys, the Full Documentation Content link can be restored there.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
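The concatenation described above can be pictured with a minimal sketch. This is not the code in scripts/generate-llms-full.js: the function names, the exact blank-line layout, and the example.invalid URLs are assumptions for illustration; only the "title, then Source: line, then body" shape comes from this PR.

```javascript
// Hypothetical helper: render one doc as a title line, a Source: line, and its body.
// The "## " prefix and the blank-line spacing are assumptions for this sketch.
function renderEntry({ title, url, body }) {
  return [`## ${title}`, "", `Source: ${url}`, "", body.trim(), ""].join("\n");
}

// Concatenate entries in the order given (the real generator follows sidebar order).
function renderCorpus(docs) {
  return docs.map(renderEntry).join("\n");
}

// Placeholder docs; the URLs are not real endpoints.
const corpus = renderCorpus([
  { title: "Doc A", url: "https://example.invalid/docs/a.md", body: "Hello A\n" },
  { title: "Doc B", url: "https://example.invalid/docs/b.md", body: "Hello B\n" },
]);
console.log(corpus);
```

Keeping the Source: line directly under each title lets a reader of the combined file trace any passage back to its per-page markdown URL.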
🚀 Build success! Latest successful preview: https://preview-476--questdb-documentation.netlify.app/docs/ Commit SHA: 5d99349
…logic
Fixes from high-effort review of the first revision:
- Docs attached to a category only via link: {type: 'doc'} were silently
dropped; both generators now include them (llms.txt gains the one doc
the sidebar only references that way, cookbook/sql/finance/index).
- Pass the remote-repo-example plugin's data to convertAllComponents so
<RemoteRepoExample /> renders real code instead of its 'Example not
found' fallback.
- Doc ids listed in multiple sidebar positions are rendered once in
llms-full.txt (4 duplicate entries skipped, logged); doc count now
reflects rendered docs only.
- Extract canonical-URL construction into scripts/lib/docs-urls.js,
mirroring plugins/raw-markdown/index.js exactly (fixes latent
multi-segment relative-slug divergence) and shared by both the
llms.txt and llms-full.txt generators. Verified: URL set identical to
production llms.txt except the one added doc.
- Parse frontmatter with gray-matter (existing dep, same as the
raw-markdown plugin) instead of a hand-rolled regex.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
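To illustrate the first and third fixes above, here is a minimal sketch of a sidebar walk that includes docs attached to a category only via link: {type: 'doc'} and renders each doc id once. The function names and sample ids are hypothetical, the sidebar shape is a simplified version of the Docusaurus item format, and the real generators may differ in detail.

```javascript
// Record a doc id unless it was already seen at an earlier sidebar position.
function addOnce(id, seen, out) {
  if (seen.has(id)) {
    console.warn(`skipping duplicate doc id: ${id}`);
    return;
  }
  seen.add(id);
  out.push(id);
}

// Collect doc ids in sidebar order (hypothetical helper, not the PR's code).
function collectDocIds(items, seen = new Set(), out = []) {
  for (const item of items) {
    if (typeof item === "string") {
      // Shorthand form: a bare string is a doc id.
      addOnce(item, seen, out);
    } else if (item.type === "doc") {
      addOnce(item.id, seen, out);
    } else if (item.type === "category") {
      // A category can carry its own landing doc via link: {type: 'doc', id}.
      if (item.link && item.link.type === "doc") {
        addOnce(item.link.id, seen, out);
      }
      collectDocIds(item.items || [], seen, out);
    }
  }
  return out;
}

// Sample sidebar with made-up ids; "introduction" appears twice on purpose.
const sidebar = [
  "introduction",
  {
    type: "category",
    label: "Cookbook",
    link: { type: "doc", id: "cookbook/index" },
    items: [{ type: "doc", id: "cookbook/a" }, "cookbook/b", "introduction"],
  },
];
const ids = collectDocIds(sidebar);
console.log(ids);
```

Sharing one `seen` set across the whole recursion is what makes a doc listed in several positions render only at its first occurrence.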
…lience
- Bump body headings by 2 (H1->H3) instead of 1: introduction.md and changelog.mdx carry body H1s that landed at H2 (the per-doc delimiter level), creating phantom doc boundaries. Verified fence-aware: 352 real H2 doc headers == 352 Source lines.
- Fix section labeling: loose top-level docs before the first category form an 'Overview' section (no more duplicate 'Getting Started' headers), and loose docs after a category (changelog) get their own title-labeled section instead of folding into the preceding category.
- Never fail the docs build on a GitHub flake: remote example data is only used for llms-full.txt, so loadContent gets one retry and then degrades to placeholder examples for that build instead of aborting the whole deploy.
- Gate category link docs with subtreeContainsDoc (moved to shared scripts/lib/sidebar-utils.js, used by both generators) so llms-full orders them identically to llms.txt; buffer section bodies so a section whose docs all rendered elsewhere emits no bare header.
- docs-urls: restore the introduction -> index.md fallback as a safety net against slug-extraction failure; document that a trailing slash in a slug is deliberately not stripped (the raw-markdown plugin writes '<slug>.md' verbatim, so stripping would link a path it never writes).
llms.txt output verified byte-identical before/after the shared-walker refactor.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
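The heading bump in the first item can be sketched as follows. This is an illustrative, simplified version and not the logic in plugins/raw-markdown/convert-components: bumpHeadings is a hypothetical name, and the fence handling covers only plain backtick or tilde fences.

```javascript
// Build the fence marker programmatically so this sample nests cleanly in markdown.
const FENCE = "`".repeat(3);

// Shift ATX headings down by `levels`, leaving lines inside fenced code untouched.
function bumpHeadings(markdown, levels = 2) {
  let fence = null; // marker of the currently open fence, or null outside code
  return markdown
    .split("\n")
    .map((line) => {
      const m = line.match(/^ {0,3}(`{3,}|~{3,})(.*)$/);
      if (m) {
        const [, marker, rest] = m;
        if (fence === null) {
          fence = marker; // opening fence (may carry an info string such as "sh")
        } else if (
          marker[0] === fence[0] &&
          marker.length >= fence.length &&
          rest.trim() === ""
        ) {
          fence = null; // closing fence: same character, at least as long, no info string
        }
        return line;
      }
      if (fence !== null) return line; // code content, e.g. a shell comment
      const heading = line.match(/^(#{1,6})(\s.*)$/);
      if (!heading) return line;
      const depth = Math.min(6, heading[1].length + levels); // markdown caps at H6
      return "#".repeat(depth) + heading[2];
    })
    .join("\n");
}

const input = ["# Title", "text", FENCE + "sh", "# not a heading", FENCE, "## Sub"].join("\n");
const bumped = bumpHeadings(input);
console.log(bumped);
```

With a bump of 2, a body H1 lands at H3, below the H2 level used to delimit docs, and the shell comment inside the fence is left alone.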
What
Add scripts/generate-llms-full.js, which generates static/llms-full.txt: the complete text of all documentation as a single file, served at https://questdb.com/docs/llms-full.txt.

Why
The website's llms.txt (questdb.io repo) has long advertised a full docs corpus at /docs/llms-full.txt, but no such file was ever generated, so the URL 404s. questdb/questdb.io#2923 fixes the links on its side and auto-detects this file at build time, so the two PRs can merge in any order.

How
- Walks documentation/sidebars.js in the same order as the llms.txt generator. Top-level categories become # sections; loose docs before the first category form an Overview section; loose docs after a category (changelog) get their own title-labeled section. Category link: {type: 'doc'} pages are included exactly where llms.txt lists them (shared subtreeContainsDoc gate). Docs listed in several sidebar positions render once.
- Renders through the plugins/raw-markdown/convert-components pipeline with the remote-repo-example plugin's data, so <RemoteRepoExample /> shows real code and output matches the per-page .md endpoints. Doc entries: ## title + Source: canonical markdown URL + body with headings bumped by 2 (some docs carry body H1s; bump-by-1 would collide with the doc-delimiter level).
- Shared modules: scripts/lib/docs-urls.js (canonical URLs, mirrors the raw-markdown plugin exactly, incl. an introduction→index.md safety net) and scripts/lib/sidebar-utils.js, both used by the two generators.
- Wired into prebuild, gitignored like llms.txt/reference-full.md. Frontmatter via gray-matter (existing dep).

Review history
Two high-effort multi-agent review rounds; every confirmed finding fixed.
Known accepted trade-offs (noted, deliberately out of scope): the MDX→md conversion runs at prebuild and in the plugin's postBuild (as generate-reference-full.js already does); the render pipeline and URL logic are shared between the two generator scripts, but the plugin itself doesn't consume the shared modules yet. Consolidating generation into the plugin's postBuild is the right future cleanup, but it means touching the code path that produces every production .md page.

Verified locally (fence-aware where relevant)
- 352 ## doc headers == 352 Source: lines: every doc boundary is real, no phantoms.
- cookbook/sql/finance.md present; vpin.md exactly once; zero "Example not found" placeholders (real Java/Python/… example code verified).
- llms.txt: URL set = production + exactly the one previously-missing category-link doc; byte-identical before/after the shared-module refactor.

🤖 Generated with Claude Code
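A fence-aware boundary check of the kind described above could look roughly like this. It is a sketch under stated assumptions, not the check that was actually run: countDocBoundaries is a hypothetical name, fences are assumed not to nest, and the sample text uses placeholder URLs.

```javascript
// Build the fence marker programmatically so this sample nests cleanly in markdown.
const FENCE = "`".repeat(3);

// Count H2 headers and Source: lines that sit outside fenced code blocks.
// Assumption: every fence line toggles the state (no nested or mismatched fences).
function countDocBoundaries(text) {
  let inFence = false;
  let h2 = 0;
  let sources = 0;
  for (const line of text.split("\n")) {
    if (/^ {0,3}(`{3,}|~{3,})/.test(line)) {
      inFence = !inFence;
      continue;
    }
    if (inFence) continue; // ignore look-alike headers inside code samples
    if (/^## /.test(line)) h2 += 1;
    if (/^Source: /.test(line)) sources += 1;
  }
  return { h2, sources };
}

const sample = [
  "# Section",
  "## Doc A",
  "Source: https://example.invalid/docs/a.md",
  FENCE,
  "## looks like a header, but is code",
  FENCE,
  "## Doc B",
  "Source: https://example.invalid/docs/b.md",
].join("\n");

const counts = countDocBoundaries(sample);
console.log(counts);
```

If the two counts differ on the generated file, some H2 outside a code fence is not a real doc boundary (or a doc entry lost its Source: line).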