131 changes: 131 additions & 0 deletions -scripts/README-llm-files.md
@@ -0,0 +1,131 @@
# Generating LLM Files

This directory contains scripts to automatically generate `llms.txt` and `llms-full.txt` files for LLM consumption.

## Overview

The LLM files provide structured documentation references that help AI assistants:
- Find the correct documentation pages
- Understand the documentation structure
- Reduce hallucinations by providing accurate URLs
- Discover all available integration options

## Files

- `generate-llm-files.js` - Node.js script that generates the LLM files
- `generate-llm-files.sh` - Shell wrapper script for easier execution

## Usage

### Option 1: After Local Build

1. Build the documentation site:
   ```bash
   yarn antora ./antora-playbook.yml
   ```

2. Generate LLM files from the local sitemap:
   ```bash
   yarn generate-llm-files
   # or
   ./-scripts/generate-llm-files.sh
   ```

### Option 2: From Remote Sitemap

Generate directly from the published sitemap (useful for syncing with production):

```bash
yarn generate-llm-files-from-url
# or
node ./-scripts/generate-llm-files.js https://www.tiny.cloud/docs/antora-sitemap.xml
```

### Option 3: Custom Sitemap Source

```bash
node ./-scripts/generate-llm-files.js /path/to/sitemap.xml
# or
node ./-scripts/generate-llm-files.js https://example.com/sitemap.xml
```

## Workflow

### Manual Regeneration (Current Approach)

**After major/minor/patch releases:**
1. Run the script to regenerate files from the production sitemap:
   ```bash
   yarn generate-llm-files-from-url
   ```
   This ensures the LLM files match what's actually published on the live site.

   Alternatively, if you need to generate from a local build:
   ```bash
   yarn generate-llm-files
   ```
2. Review the generated files in a PR
3. Commit and merge

**Why not automated in CI/CD?**
- The script makes 400+ HTTP requests to fetch H1 titles (~4-5 minutes)
- Resource-intensive and slow for every build
- Manual review ensures quality before committing
- Validates no 404s are listed and titles match actual page content

### File Locations

The files are generated in `modules/ROOT/attachments/`:
- `llms.txt` - Simplified, curated documentation index (~105 lines)
- `llms-full.txt` - Complete documentation index with all pages (~700 lines)

**Post-build:** Files are moved to the root directory (handled in a separate PR) and are accessible at:
- `https://www.tiny.cloud/docs/tinymce/latest/llms.txt`
- `https://www.tiny.cloud/docs/llms-full.txt`

## How It Works

1. **Reads sitemap.xml** - Extracts all documentation URLs from the sitemap (only `/latest/` URLs)
2. **Fetches H1 titles** - Makes HTTP requests to each page to get the actual H1 title (validates no 404s)
3. **Generates titles** - Uses fetched H1 titles, falls back to URL-based titles if fetch fails
4. **Categorizes pages** - Groups by topic (integrations, plugins, API, etc.) based on URL patterns
5. **Deduplicates** - Removes duplicate URLs and makes titles unique within categories
6. **Generates structured markdown** - Creates both simplified (`llms.txt`) and complete (`llms-full.txt`) indexes
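
As a rough illustration of steps 1, 3, and 5, the sketch below extracts `/latest/` URLs from a sitemap string, removes duplicates, and derives a fallback title from the URL. The function names (`extractLatestUrls`, `titleFromUrl`) and the regex are assumptions made for this example; the real `generate-llm-files.js` may be structured differently.

```javascript
// Hypothetical helpers: names and regex are illustrative,
// not taken from generate-llm-files.js.

// Pull every <loc> value out of a sitemap XML string and keep only /latest/ URLs.
function extractLatestUrls(sitemapXml) {
  const urls = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let match;
  while ((match = locPattern.exec(sitemapXml)) !== null) {
    if (match[1].includes('/latest/')) {
      urls.push(match[1]);
    }
  }
  // A Set drops duplicate URLs while preserving first-seen order.
  return [...new Set(urls)];
}

// Build a fallback title from the last path segment of the URL.
function titleFromUrl(url) {
  const segments = new URL(url).pathname.split('/').filter(Boolean);
  const slug = segments[segments.length - 1] || '';
  return slug
    .split('-')
    .filter(Boolean)
    .map((word) => word[0].toUpperCase() + word.slice(1))
    .join(' ');
}
```

A URL-derived title of this kind would only be used when the H1 fetch fails, as described in step 3.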

## Customization

The script uses hardcoded categorization logic. To customize:

1. Edit `generate-llm-files.js`
2. Modify the `categorizeUrl()` function to adjust categorization
3. Update `generateLLMsTxt()` and `generateLLMsFullTxt()` to change output format
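
To give a sense of what URL-pattern categorization can look like, here is a minimal sketch built around a rule table. The patterns, the category names, and the `categorizeUrlSketch` name are all assumptions for illustration; check the actual `categorizeUrl()` function before editing it.

```javascript
// Illustrative rule table: patterns and category names are assumptions,
// not the ones used in generate-llm-files.js.
const CATEGORY_RULES = [
  { category: 'API Reference', pattern: /\/apis\// },
  { category: 'Integrations', pattern: /\/(react|vue|angular)-/ },
  { category: 'Plugins', pattern: /plugin/ },
];

// Return the first matching category, or a catch-all when no rule matches.
function categorizeUrlSketch(url) {
  const { pathname } = new URL(url);
  const rule = CATEGORY_RULES.find(({ pattern }) => pattern.test(pathname));
  return rule ? rule.category : 'Other';
}
```

With a table like this, supporting a new page type would mean adding one rule. Rule order matters, because the first match wins.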

## Notes

- The script requires Node.js and the `sanitize-html` package (installed via `yarn install`)
- Generated files are written to `modules/ROOT/attachments/`
- Uses only the sitemap (no dependency on `nav.adoc`)
- Fetches actual H1 titles from pages (validates no 404s)
- Rate-limited fetching: 10 concurrent requests with 100ms delay between batches
- Request timeout: 10 seconds per page
- Security: Validates URLs to prevent SSRF attacks (only allows tiny.cloud domains)
- Handles HTML entity decoding (for example, `&rsquo;` → `'`)
- Filters out error pages and duplicate URLs
- Makes titles unique within categories (e.g., "ES6 and npm (Webpack)", "ES6 and npm (Rollup)")
- Falls back to URL-based title generation if H1 fetch fails
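
The domain allowlist and batched fetching noted above could be sketched roughly as follows. This is not the script's actual code: `isAllowedUrl`, `runInBatches`, and the HTTPS-only check are assumptions, and the worker is passed in so the batching logic can be shown without making network calls.

```javascript
// Hypothetical sketch of the limits listed above; not the script's actual code.
const BATCH_SIZE = 10;      // concurrent requests per batch
const BATCH_DELAY_MS = 100; // pause between batches

// Accept only HTTPS URLs on tiny.cloud or one of its subdomains.
// (Requiring HTTPS is an assumption of this sketch.)
function isAllowedUrl(rawUrl) {
  try {
    const { protocol, hostname } = new URL(rawUrl);
    return protocol === 'https:' &&
      (hostname === 'tiny.cloud' || hostname.endsWith('.tiny.cloud'));
  } catch {
    return false; // not a parseable URL
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `worker` over `items` in fixed-size batches, pausing between batches.
async function runInBatches(items, worker, batchSize = BATCH_SIZE, delayMs = BATCH_DELAY_MS) {
  const results = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    // All items in one batch run concurrently.
    results.push(...(await Promise.all(batch.map((item) => worker(item)))));
    if (start + batchSize < items.length) {
      await sleep(delayMs);
    }
  }
  return results;
}
```

Matching the hostname exactly or by a `.tiny.cloud` suffix rejects look-alike hosts such as `tiny.cloud.example.com`. A per-request timeout could be added inside the worker, for example by passing `AbortSignal.timeout(10000)` as the `signal` option to `fetch`.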

## Troubleshooting

**Error: "Source not found"**
- Make sure the sitemap path is correct
- For remote URLs, check your internet connection
- For local files, ensure Antora has generated the site first

**Missing page titles**
- If H1 fetch fails, the script uses URL-based title generation as fallback
- Check that pages return valid HTML with H1 tags
- 404 pages are automatically filtered out
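
When debugging a missing title, it can help to test the H1 lookup on a saved copy of the page. The sketch below is a simplified, regex-based stand-in and makes no claim about how the script itself parses HTML (the script depends on `sanitize-html`); `extractH1` and `resolveTitle` are hypothetical names.

```javascript
// Hypothetical sketch: a regex-based H1 extractor with a fallback title.
// The real script may parse HTML differently.
function extractH1(html) {
  const match = /<h1\b[^>]*>([\s\S]*?)<\/h1>/i.exec(html);
  if (!match) {
    return null;
  }
  // Strip nested tags and collapse runs of whitespace.
  const text = match[1].replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim();
  return text || null;
}

// Prefer the page's H1; otherwise use the caller-supplied fallback.
function resolveTitle(html, fallbackTitle) {
  return extractH1(html) ?? fallbackTitle;
}
```

If `extractH1` returns `null` for a page, the page either has no H1 element or has an empty one, which is the situation where a URL-based fallback title would apply.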

**Incorrect categorization**
- Review the `categorizeUrl()` function (note: function name is singular, not plural)
- Add custom patterns for new page types