A modern client-side tool to bulk convert PDFs into a unified document. Features drag-and-drop, auto dark mode, and multi-language support. Built with React and PDF.js.
- Bulk Extraction: Drag & drop unlimited PDF files at once.
- Smart Formatting: Heuristic algorithms attempt to reconstruct paragraphs and detect headers (H1, bold text) from the raw PDF stream.
- Multi-Format Output: Export merged content as:
- Markdown (.md): Perfect for LLM context or note-taking apps.
- HTML (.html): Ready for web use.
- Plain Text (.txt): Raw data.
- Runs entirely in the browser: No server, no backend, no installation required.
- Privacy-first: Your files never leave your computer.
- Auto-Adaptive UI: Detects the system language (EN/IT) and theme (light/dark) automatically.
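As a rough illustration of the auto-adaptive behaviour, language and theme can be derived from standard browser signals: `navigator.language` and the `prefers-color-scheme` media query. This is a minimal sketch, not the project's actual code; the names `pickLanguage`, `detectLanguage`, and `detectTheme` are hypothetical, and the fallback to English and light mode is an assumption.

```typescript
type Lang = "en" | "it";
type Theme = "light" | "dark";

// Minimal view of the browser globals used below, so the sketch also
// type-checks and runs outside a browser (where they may be missing).
interface BrowserGlobals {
  navigator?: { language?: string };
  matchMedia?: (query: string) => { matches: boolean };
}
const env = globalThis as unknown as BrowserGlobals;

// Map a language tag such as "it-IT" to a supported UI language.
// Anything that is not Italian falls back to English (assumed default).
function pickLanguage(tag: string | undefined): Lang {
  const primary = (tag ?? "").toLowerCase().split("-")[0];
  return primary === "it" ? "it" : "en";
}

// Read the browser's preferred language, if one is exposed.
function detectLanguage(): Lang {
  return pickLanguage(env.navigator?.language);
}

// Read the OS-level colour scheme; fall back to light when matchMedia is unavailable.
function detectTheme(): Theme {
  const prefersDark =
    env.matchMedia?.("(prefers-color-scheme: dark)").matches ?? false;
  return prefersDark ? "dark" : "light";
}
```

Keeping `pickLanguage` separate from the browser lookup makes the mapping easy to check without a DOM.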
- Upload your PDF(s) via the drag & drop interface.
- The tool uses PDF.js to parse the binary data of each file locally.
- It extracts text items and sorts them by coordinates (Y/X) to reconstruct the reading order.
- An algorithm analyzes font size and spacing to determine line breaks and headers.
- Turndown converts the reconstructed HTML into clean Markdown (if Markdown output is selected).
- Download the single, unified document containing all data.
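The ordering and header steps above can be sketched roughly as follows. This is an illustrative approximation, not the tool's actual algorithm: the `TextRun` shape, the `LINE_TOLERANCE` and `HEADER_RATIO` values, and the function names are all assumptions. For brevity it emits Markdown directly, whereas the tool itself hands that step to Turndown.

```typescript
// Hypothetical shape: one text run with its page position and font size.
interface TextRun {
  text: string;
  x: number; // left edge in PDF units
  y: number; // baseline in PDF units (origin at bottom-left, so larger = higher)
  fontSize: number;
}

interface Line {
  text: string;
  fontSize: number;
  isHeader: boolean;
}

// Assumed tuning values, not taken from the project.
const LINE_TOLERANCE = 2; // baselines within this distance share a line
const HEADER_RATIO = 1.3; // lines this much larger than body text count as headers

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function toLines(runs: TextRun[]): Line[] {
  if (runs.length === 0) return [];

  // Step 1: top-to-bottom order (descending y).
  const byY = [...runs].sort((a, b) => b.y - a.y);

  // Step 2: group runs whose baselines are close enough to be one line.
  const groups: TextRun[][] = [];
  for (const run of byY) {
    const current = groups[groups.length - 1];
    if (current && Math.abs(current[0].y - run.y) <= LINE_TOLERANCE) {
      current.push(run);
    } else {
      groups.push([run]);
    }
  }

  // Step 3: treat the median font size as body text and flag larger lines.
  const bodySize = median(runs.map((r) => r.fontSize));
  return groups.map((group) => {
    const leftToRight = [...group].sort((a, b) => a.x - b.x);
    const fontSize = Math.max(...leftToRight.map((r) => r.fontSize));
    return {
      text: leftToRight.map((r) => r.text).join(" ").replace(/\s+/g, " ").trim(),
      fontSize,
      isHeader: fontSize >= bodySize * HEADER_RATIO,
    };
  });
}

// Simplified stand-in for the HTML + Turndown step: headers become "# ...".
function toMarkdown(lines: Line[]): string {
  return lines.map((l) => (l.isHeader ? `# ${l.text}` : l.text)).join("\n\n");
}
```

Grouping by baseline before sorting by X matters: sorting on Y then X in a single pass would misorder runs on the same visual line whose baselines differ by a fraction of a unit.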
- Zero-Dependency Setup (Offline): Can be run offline by simply opening `index.html` if the libraries are downloaded locally.
- Header & Format Detection: Unlike standard "Select All > Copy" methods, this tool tries to preserve the semantic structure of the document (Titles, Bold text).
- No Server Limits: Since it runs on your client machine, you are not limited by server upload caps or timeouts; the practical ceiling is your device's memory and CPU.
- LLM Context Preparation: Quickly merge 20+ PDFs into one Markdown file to feed into ChatGPT or Claude.
- Research Consolidation: Combine multiple papers into a single searchable text file.
- Privacy: Sensitive documents remain on your device.
- All processing is done locally in your browser.
- No file is sent to any server.
- No data is stored; memory is cleared upon page refresh.
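To make the local-processing claim concrete, here is a hedged sketch of how positioned text could be pulled out of a PDF entirely in memory. It relies only on PDF.js calls that exist (`getDocument`, `numPages`, `getPage`, `getTextContent`), but the wrapper `extractText`, the `OpenPdf` type, and the `PositionedText` shape are hypothetical, and the tool's real code may be organised differently. The PDF opener is passed in as a parameter so the sketch has no hard dependency on the library.

```typescript
// Minimal structural types covering the parts of the PDF.js API used here.
interface PageLike {
  getTextContent(): Promise<{ items: unknown[] }>;
}
interface DocumentLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PageLike>;
}
// In the browser this would be: (data) => pdfjsLib.getDocument({ data }).promise
type OpenPdf = (data: Uint8Array) => Promise<DocumentLike>;

interface PositionedText {
  page: number;
  text: string;
  x: number;
  y: number;
}

// PDF.js text items carry `str` and a 6-element `transform` matrix;
// marked-content entries have neither and are skipped.
function isTextItem(item: unknown): item is { str: string; transform: number[] } {
  if (typeof item !== "object" || item === null) return false;
  const candidate = item as { str?: unknown; transform?: unknown };
  return typeof candidate.str === "string" && Array.isArray(candidate.transform);
}

async function extractText(data: Uint8Array, openPdf: OpenPdf): Promise<PositionedText[]> {
  const doc = await openPdf(data);
  const out: PositionedText[] = [];
  // PDF.js page numbers start at 1.
  for (let page = 1; page <= doc.numPages; page++) {
    const content = await (await doc.getPage(page)).getTextContent();
    for (const item of content.items) {
      if (!isTextItem(item)) continue;
      // transform[4] and transform[5] are the x and y translation of the run.
      out.push({ page, text: item.str, x: item.transform[4], y: item.transform[5] });
    }
  }
  return out;
}
```

In a drop handler, the bytes would come from `new Uint8Array(await file.arrayBuffer())`, so the file contents go straight from the `File` object to the parser without any upload.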
Simply visit the Demo Page.
- Clone this repository.
- Ensure the library files (`react.js`, `pdf.js`, etc.) are in the root folder.
- Open `index.html` in your browser.
- Toggle the comments in the `<head>` of the HTML file to switch from CDN to Local libraries.
- Text-Based PDFs Only: This tool extracts text layers. It does not perform OCR. If your PDF is a scanned image (without a text layer), use my PDF Accessibility Fixer instead.
- Complex Layouts: While it handles standard documents well, complex multi-column layouts or tables might be extracted linearly.
- Formatting: The reconstruction is heuristic; it may not be pixel-perfect compared to the original visual layout.
MIT License. See LICENSE for details.
- PDF.js for parsing.
- Turndown for HTML-to-Markdown conversion.
- React & TailwindCSS for the UI.

