feat: add robots.txt allowing search engines and AI crawlers #134

Draft
TaprootFreak wants to merge 1 commit into develop from feat/robots-allow-ai-crawlers
@TaprootFreak
Collaborator

Add repo-controlled robots.txt that allows search engines and AI crawlers

This adds a version-controlled robots.txt for the public DFX documentation
(docs.dfx.swiss) that explicitly allows both search engines and AI agents
to crawl, index, and learn from the content.

What

  • New file: src/.vuepress/public/robots.txt
  • VuePress copies everything in .vuepress/public/ verbatim to the published
    site root, so it is served at https://docs.dfx.swiss/robots.txt.
  • Grants all three content signals (search=yes, ai-input=yes, ai-train=yes)
    and lists the major AI crawlers individually (ClaudeBot, GPTBot,
    Google-Extended, CCBot, Bytespider, Amazonbot, Applebot-Extended,
    meta-externalagent) in addition to the wildcard group, because some bots
    honor only a record that names them explicitly.
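
As a rough illustration of the layout described above, the following sketch generates a file with a wildcard group plus one group per named crawler and checks it with Python's standard `urllib.robotparser`. The exact `Content-Signal:` directive spelling and the `Allow: /` rule are assumptions; the file in this PR may be worded differently. `build_robots_txt` and `parse_robots` are hypothetical helper names.

```python
from urllib.robotparser import RobotFileParser

# Crawler names as listed in the PR description.
AI_CRAWLERS = [
    "ClaudeBot", "GPTBot", "Google-Extended", "CCBot",
    "Bytespider", "Amazonbot", "Applebot-Extended", "meta-externalagent",
]

# Assumed directive syntax; the PR text only shows the key=value pairs.
CONTENT_SIGNAL = "Content-Signal: search=yes, ai-input=yes, ai-train=yes"


def build_robots_txt(agents):
    """Return robots.txt text: a wildcard group plus one group per agent."""
    groups = [
        f"User-agent: {agent}\n{CONTENT_SIGNAL}\nAllow: /\n"
        for agent in ["*", *agents]
    ]
    # Joining with "\n" leaves one blank line between groups.
    return "\n".join(groups)


def parse_robots(text):
    """Parse robots.txt text with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


robots_txt = build_robots_txt(AI_CRAWLERS)
parser = parse_robots(robots_txt)

# Every named crawler should be allowed to fetch the site root.
allowed = {a: parser.can_fetch(a, "https://docs.dfx.swiss/") for a in AI_CRAWLERS}
```

The standard-library parser ignores directives it does not know, so the content-signal line does not affect the allow/disallow result here; it only checks that each named group resolves to "allowed".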

Why

The documentation is public, and we want it to be discoverable by search
engines and usable both as input for AI agents (including RAG) and as training
data. Keeping the policy in the repo makes it authoritative and reviewable.

No Sitemap line

The site does not serve or generate a sitemap: GET /sitemap.xml returns a
soft-404 (text/html, the SPA 404.html), and the VuePress build output
contains no sitemap.xml. A Sitemap: directive was therefore intentionally
omitted.
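
One way to make the soft-404 check repeatable is a small heuristic over the response metadata. This is only a sketch: `is_real_sitemap` is a hypothetical helper, and the accepted media types and root elements are assumptions about what a genuine sitemap response would look like.

```python
def is_real_sitemap(status: int, content_type: str, body: str) -> bool:
    """Heuristic: True only if the response plausibly is an XML sitemap."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if status != 200:
        return False
    if media_type not in {"application/xml", "text/xml"}:
        # An SPA fallback page is served as text/html and ends up here.
        return False
    head = body.lstrip()[:512].lower()
    return "<urlset" in head or "<sitemapindex" in head
```

A response like the one observed for /sitemap.xml (text/html carrying the 404 page) is rejected by the media-type check, regardless of its status code.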

Verification

  • Built locally with npm run build; confirmed robots.txt lands at the
    published root (dist/robots.txt) with identical content.
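
The "identical content" check could be scripted along these lines. This is a minimal sketch: `copied_verbatim` is a hypothetical helper, and the paths in the usage comment assume the script runs from a directory where both the source file and the dist output are reachable as written.

```python
import filecmp
from pathlib import Path


def copied_verbatim(source: Path, built: Path) -> bool:
    """True if the built file exists and matches the source byte for byte."""
    # shallow=False forces a content comparison instead of a stat() comparison.
    return built.is_file() and filecmp.cmp(source, built, shallow=False)


# Example usage after `npm run build` (paths are assumptions):
# copied_verbatim(Path("src/.vuepress/public/robots.txt"), Path("dist/robots.txt"))
```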

Note on activation

This file becomes the effective crawl policy only once the site's hosting-level
"Manage robots.txt / Block AI bots" toggle is disabled at the CDN, so that the
repo-served file is what visitors and crawlers receive.
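
To confirm after the toggle is disabled that crawlers really receive the permissive policy, the served text can be parsed and checked per crawler. This is a sketch under assumptions: `blocked_agents` is a hypothetical helper, and `served_text` stands for whatever https://docs.dfx.swiss/robots.txt returns at the time of the check (fetching it is left out here).

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(served_text, agents, url="https://docs.dfx.swiss/"):
    """Return the agents that the served robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(served_text.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]
```

An empty result for the crawler names listed in this PR would suggest the repo file, or at least an equally permissive one, is what is being served; any name in the result points to a policy that still blocks that crawler.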

Add a version-controlled robots.txt served from the VuePress public dir
(copied to the published site root) that explicitly allows search engines
and AI agents to crawl, index, and learn from the public documentation.

It grants all content signals (search, ai-input, ai-train) and lists the
major AI crawlers individually in addition to the wildcard group, since some
honor only their own named record.