Local pdf extractor

A modern client-side tool that converts multiple PDFs into a single unified document. Features drag-and-drop upload, automatic dark mode, and multi-language support (EN/IT). Built with React and PDF.js.

👉 Click here to test the page! 👈

🚀 Features

Bulk Extraction: Drag and drop any number of PDF files at once; the only practical limit is your browser's available memory.
Smart Formatting: Heuristic algorithms attempt to reconstruct paragraphs and detect headers (H1, bold text) from the raw PDF stream.
Multi-Format Output: Export merged content as:
- Markdown (.md): Perfect for LLM context or note-taking apps.
- HTML (.html): Ready for web use.
- Plain Text (.txt): Raw data.
Runs entirely in the browser: No server, no backend, no installation required.
Privacy-first: Your files never leave your computer.
Auto-Adaptive UI: Automatically detects the system language (EN/IT) and theme (Light/Dark).
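
As a rough illustration of the Auto-Adaptive UI feature, the sketch below shows one way language and theme detection could be done with standard browser APIs (`navigator.language` and the `prefers-color-scheme` media query). The helper names `pickLanguage` and `pickTheme` are hypothetical, and toggling a `dark` class on the root element is an assumption based on a common TailwindCSS setup; the project's actual code may differ.

```javascript
// Hypothetical helpers: names and structure are illustrative, not taken from the project.
const SUPPORTED_LANGUAGES = ['en', 'it'];

// Map a language tag such as "it-IT" to a supported UI language, defaulting to English.
function pickLanguage(tag) {
  const primary = String(tag || '').toLowerCase().split('-')[0];
  return SUPPORTED_LANGUAGES.includes(primary) ? primary : 'en';
}

// Map the result of the prefers-color-scheme media query to a theme name.
function pickTheme(prefersDark) {
  return prefersDark ? 'dark' : 'light';
}

// Browser wiring: runs only when a DOM is available.
if (typeof window !== 'undefined' && typeof window.matchMedia === 'function') {
  const language = pickLanguage(navigator.language);
  const prefersDark = window.matchMedia('(prefers-color-scheme: dark)').matches;
  document.documentElement.lang = language;
  // Assumption: dark mode is switched by a "dark" class on the root element.
  document.documentElement.classList.toggle('dark', pickTheme(prefersDark) === 'dark');
}
```

Keeping the decision logic in pure functions makes it easy to check outside a browser, for example `pickLanguage('it-IT')` gives `'it'` and any unsupported tag falls back to `'en'`.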

🛠️ How it works

Upload your PDF(s) via the drag & drop interface.
The tool uses PDF.js to parse the binary data of each file locally.
It extracts text items and sorts them by coordinates (Y/X) to reconstruct the reading order.
An algorithm analyzes font size and spacing to determine line breaks and headers.
Turndown converts the structure into clean Markdown (if selected).
Download the single, unified document containing all data.
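
The coordinate-sorting step above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes each text item has the shape PDF.js returns from `page.getTextContent()` (a `str` field plus a `transform` array whose fifth and sixth entries are the x and y positions), and the function name `groupIntoLines` and the `tolerance` parameter are hypothetical.

```javascript
// Group PDF.js-style text items into lines of text in reading order.
// Assumed item shape: { str: string, transform: [a, b, c, d, x, y] }.
function groupIntoLines(items, tolerance = 2) {
  // PDF coordinates start at the bottom-left, so a larger y is higher on the page.
  const sorted = items
    .filter((item) => item.str.trim() !== '')
    .map((item) => ({ text: item.str.trim(), x: item.transform[4], y: item.transform[5] }))
    .sort((a, b) => b.y - a.y || a.x - b.x);

  const lines = [];
  for (const item of sorted) {
    const current = lines[lines.length - 1];
    // Items whose y is within the tolerance of the line's first item share that line.
    if (current && Math.abs(current.y - item.y) <= tolerance) {
      current.parts.push(item);
    } else {
      lines.push({ y: item.y, parts: [item] });
    }
  }

  // Inside each line, order the pieces left to right and join them with spaces.
  return lines.map((line) =>
    line.parts
      .sort((a, b) => a.x - b.x)
      .map((part) => part.text)
      .join(' ')
  );
}

const sample = [
  { str: 'world', transform: [1, 0, 0, 1, 60, 700] },
  { str: 'Second line', transform: [1, 0, 0, 1, 10, 680] },
  { str: 'Hello', transform: [1, 0, 0, 1, 10, 701] },
];
console.log(groupIntoLines(sample)); // [ 'Hello world', 'Second line' ]
```

The tolerance absorbs small baseline differences between items on the same visual line. A single top-to-bottom pass like this is also why multi-column pages tend to come out interleaved.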

🏆 What makes it special?

Zero-Dependency Setup (Offline): Can be run offline by simply opening index.html if libraries are downloaded locally.
Header & Format Detection: Unlike standard "Select All > Copy" methods, this tool tries to preserve the semantic structure of the document (Titles, Bold text).
No Server Limits: Since it runs on your client machine, you are not limited by server upload caps or timeouts; the practical ceiling is your browser's available memory.
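
To give an idea of what font-size-based header detection might look like, here is a small sketch. It is an assumption about the general technique, not the tool's real heuristic: it treats the most common (rounded) font size as body text and flags any line at least 20% larger as a header. The names `mostCommonSize` and `classifyLines`, the input shape `{ text, fontSize }`, and the `headerRatio` default are all hypothetical.

```javascript
// Find the most frequent font size, rounded to the nearest integer.
function mostCommonSize(lines) {
  const counts = new Map();
  for (const { fontSize } of lines) {
    const size = Math.round(fontSize);
    counts.set(size, (counts.get(size) || 0) + 1);
  }
  let best = 0;
  let bestCount = -1;
  for (const [size, count] of counts) {
    if (count > bestCount) {
      best = size;
      bestCount = count;
    }
  }
  return best;
}

// Label each line as "header" or "body" by comparing it with the dominant size.
function classifyLines(lines, headerRatio = 1.2) {
  const bodySize = mostCommonSize(lines);
  return lines.map(({ text, fontSize }) => ({
    text,
    kind: fontSize >= bodySize * headerRatio ? 'header' : 'body',
  }));
}

const classified = classifyLines([
  { text: 'Introduction', fontSize: 24 },
  { text: 'First paragraph line.', fontSize: 12 },
  { text: 'Second paragraph line.', fontSize: 11.9 },
]);
console.log(classified.map((line) => line.kind)); // [ 'header', 'body', 'body' ]
```

A plain copy and paste loses exactly this size information, which is why the result of "Select All > Copy" has no headings.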

💡 Why use this tool?

LLM Context Preparation: Quickly merge 20+ PDFs into one Markdown file to feed into ChatGPT or Claude.
Research Consolidation: Combine multiple papers into a single searchable text file.
Privacy: Sensitive documents remain on your device.
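
The merge step behind these use cases can be pictured with the sketch below, which joins already-extracted text from several files into one document in each of the three output formats. It is only an illustration under stated assumptions: the function `mergeDocuments` and the input shape `{ name, text }` are hypothetical, and the Markdown here is built by simple string templates rather than through Turndown, which the tool itself uses.

```javascript
// Escape the characters that are significant in HTML text content.
function escapeHtml(value) {
  return value.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// One renderer per output format; each turns a single document into a string.
const RENDERERS = {
  md: (doc) => `# ${doc.name}\n\n${doc.text}`,
  html: (doc) => `<h1>${escapeHtml(doc.name)}</h1>\n<pre>${escapeHtml(doc.text)}</pre>`,
  txt: (doc) => `${doc.name}\n\n${doc.text}`,
};

// Merge extracted documents ({ name, text }) into one string in the chosen format.
function mergeDocuments(docs, format = 'md') {
  if (!Object.prototype.hasOwnProperty.call(RENDERERS, format)) {
    throw new Error(`Unsupported format: ${format}`);
  }
  return docs.map(RENDERERS[format]).join('\n\n');
}

const merged = mergeDocuments(
  [
    { name: 'paper-a.pdf', text: 'Text of the first paper.' },
    { name: 'paper-b.pdf', text: 'Text of the second paper.' },
  ],
  'md'
);
console.log(merged);
```

Putting each file name in a heading keeps the source of every passage visible after merging, which helps when the combined file is pasted into an LLM prompt or searched later.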

🔒 Privacy & Security

All processing is done locally in your browser.
No file is sent to any server.
No data is stored; memory is cleared upon page refresh.

⚡ Getting Started

Online

Simply visit the Demo Page.

Local Installation (Offline)

Clone this repository.
Ensure the library files (react.js, pdf.js, etc.) are in the root folder.
Toggle the comments in the <head> of the HTML file to switch from the CDN to the local libraries.
Open index.html in your browser.

✨ Limitations & Notes

Text-Based PDFs Only: This tool extracts text layers. It does not perform OCR. If your PDF is a scanned image (without a text layer), use my PDF Accessibility Fixer instead.
Complex Layouts: While it handles standard documents well, complex multi-column layouts or tables might be extracted linearly.
Formatting: The reconstruction is heuristic; it may not be pixel-perfect compared to the original visual layout.
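
Because scanned PDFs carry no text layer, a quick check for empty pages can tell you early whether a file needs OCR instead. The sketch below is a hypothetical helper, not part of the tool: it assumes each page is represented by its array of PDF.js-style text items (objects with a `str` field), and the names `hasTextLayer` and `findPagesWithoutText` are made up for illustration.

```javascript
// A page has a usable text layer if at least one item contains non-whitespace text.
function hasTextLayer(items) {
  return items.some((item) => typeof item.str === 'string' && item.str.trim() !== '');
}

// Return the 1-based numbers of pages that appear to have no extractable text.
function findPagesWithoutText(pages) {
  const missing = [];
  pages.forEach((items, index) => {
    if (!hasTextLayer(items)) {
      missing.push(index + 1);
    }
  });
  return missing;
}

const pages = [
  [{ str: 'Chapter 1' }], // normal text page
  [],                     // scanned page: no text items at all
  [{ str: '   ' }],       // only whitespace
];
console.log(findPagesWithoutText(pages)); // [ 2, 3 ]
```

If every page of a file is reported as missing, the PDF is most likely a scan and this tool will produce an empty result for it.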

📖 License

MIT License. See LICENSE for details.

🙏 Credits & Inspiration

PDF.js for parsing.
Turndown for HTML-to-Markdown conversion.
React & TailwindCSS for the UI.


R0mb0/Local_pdf_extractor
