
exbuf/peekdocs


👀 peekdocs

Python 3.10+   ·   License: MIT

Actively maintained — last reviewed July 2026.

🌍 GUI available in 7 languages: English, Español (ES), Français (FR), Deutsch (DE), 日本語 (JP), 简体中文 (CN), and Português brasileiro (BR).

In each non-English language, the main workflow is translated: the main page, the search buttons, the Advanced Search Options, and the most common status messages. Help windows, detailed dialogs, CLI messages, and output reports remain in English; see the English overview below for full details.

Legal terms (MIT License, warranty, dependency licenses) are binding only in English.

You have files. You need to find something in them.

peekdocs is a local search workbench that does exactly that across 100+ file types — Word, PDF, Excel, email, archives, source code, and scanned documents via OCR — through a single pipeline. Saved-search suites group recurring workflows into one combined report; a regex pattern workbench runs named collections across any folder; everything runs locally with no network. GUI, CLI, and Python API. Runs on Windows, macOS, and Linux. Free and open-source under the MIT License.

Built for people who prefer private, transparent, deterministic tools. No cloud, no telemetry, no network calls.

Typical workflow: Search a folder of mixed-format documents → inspect matches in the Results Preview → generate a highlighted DOCX or HTML report → save the search → add it to a Search Suite → schedule it weekly.

Who Is It For?

peekdocs is built for anyone who has files and needs to find something in them — across many kinds of files at once (Word, PDF, Excel, email, scanned documents, archives, and 100+ more), entirely on your own computer.

A few examples of what people could do with it:

  • Developer / programmer — Run a regex collection against a source tree and generate JSON. peekdocs also covers the documents that live outside the source tree: legacy specs and requirements in Word/PDF, email archives from past projects, vendor documentation and SDK guides in PDF, archived releases inside .zip / .7z files, scanned whiteboard photos (OCR), old project logs and meeting notes. Find "what did the client say about the authentication requirement in 2019" — pull the answer out of a .docx email attachment buried in a .zip archive without unpacking anything. Also useful for searching across entire codebases — every TODO/FIXME/HACK across all your projects at once, pre-upgrade audit for deprecated APIs, config/build files (.yaml, .toml, .json, .gradle, .cmake, ...), multi-repo search from a parent folder. Use Lines Before/After to see the full function or block surrounding each match. peekdocs handles 40+ source-code and shell-script extensions; see Supported File Types for the full list.
  • Sysadmin — Search 20 GB of log files for a request ID across mixed archives. peekdocs reads .gz, .bz2, .zip, and .tar archives natively, so you don't have to unpack before searching. Long-running --watch mode streams NDJSON matches to stdout for pipeline integration; pair with saved regex collections for patterns your team runs regularly.
  • AI/ML engineer — Search training logs for specific metrics, hyperparameters, or error messages across experiment runs. Find every reference to a model name, checkpoint path, or dataset version across scripts, configs, and documentation. peekdocs reads Jupyter notebooks (.ipynb), JSONL training data (.jsonl), Scala Spark pipelines (.scala), and all common config formats. Search across READMEs, docstrings, and markdown files for outdated model names or deprecated API versions.
  • IT consultant — Search a folder of client documents for a set of terms. Or carry the standalone binary on a USB stick and run against a client's drive without installing anything — --output-dir back to the USB, --no-index to leave zero artifacts on the client machine. See Portable / consulting use for the full workflow and the five common engagement types.
  • Data researcher — Search hundreds of CSV and Excel files for a specific value, account number, or outlier. Cross-reference interview transcripts, survey responses, and field notes for the same keyword to triangulate findings. Literature review: search 500 downloaded PDFs for a method name, author, or statistical technique. Find which analysis scripts reference a specific dataset, parameter, or threshold.
  • Engineer — Search 200 datasheets, design reviews, test reports, and failure analyses for a component value, part number, or tolerance across PDFs and scanned drawings. Find which documents reference a standard (MIL-STD-810, IEC 61508, ISO 9001). Search old design reviews and trade studies to find why a decision was made years ago. Locate error codes and symptoms across equipment manuals and maintenance logs. OCR reads scanned engineering drawings and handwritten notes. The highlighted Word report can be attached to a design review or emailed directly. Supported engineering formats: .m (MATLAB), .v .vhd .vhdl .sv (Verilog/VHDL/SystemVerilog), .cir .sp .spice (SPICE netlists), .dxf (AutoCAD interchange), .vsdx (Visio diagrams), .cmake (CMake build files).
  • Documentation team / tech writer — Search for outdated references, inconsistent terminology, deprecated product names, or specific version numbers across an entire documentation set. Verify consistency across Word docs, PDFs, HTML exports, and Markdown files in a single search.
  • Auditor or review specialist — Sweep a folder of contracts, financial schedules, vendor correspondence, and scanned exhibits for named patterns (account numbers, party names, dollar thresholds, date ranges, status markers) using a saved collection of evidentiary patterns. Suites capture the methodology for a recurring engagement — same searches, same folder, repeatable result. The Regex Search workbench holds named pattern collections reusable across engagements. OCR reads scanned exhibits and image-based PDFs. Optional SHA-256 fingerprints (--hash) attach to every matched file so the same run can be reproduced and verified later. The highlighted DOCX or HTML report is the handoff artifact — matches highlighted in yellow, grouped by section, ready to attach to a workpaper. Repeatable methodology, OCR for scanned exhibits, SHA-256 reproducibility, no-cloud confidentiality. peekdocs is a finding tool, not an engagement-management platform: it doesn't track reviewer assignments, redactions, or time — pair it with whatever case-management workflow your firm already uses.
  • Researcher — Search 3,000 PDFs (journal articles, interview transcripts, survey responses, field notes, datasets) for a specific term, author, citation, or data point and export highlighted results. OCR reads scanned source materials and historical documents. The highlighted Word report doubles as an annotated bibliography.
  • Small business owner — Find vendor contracts expiring in the next 90 days. Save searches by name and reload them later; search across contracts, invoices, reports, and correspondence for terms, pricing, or expiration dates. Personal side: pull a tax document from any of the last seven years across mixed folders — tax returns, insurance policies, receipts, warranties, estate documents.
  • Office worker — Find all invoices over $10,000 from 2024. (fully worked, GUI and CLI, in User Guide → Example 8)
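The Sysadmin scenario rests on reading compressed logs in place. As a rough illustration of that idea only (this is not peekdocs's archive reader, and it assumes UTF-8 text members), Python's standard library can stream lines out of .tar, .zip, .gz, and .bz2 files without extracting anything to disk:

```python
import bz2
import gzip
import io
import tarfile
import zipfile
from typing import Iterator, List, Tuple

TAR_SUFFIXES = (".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tbz2")


def iter_archive_lines(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (member_name, line) for every text line inside a .tar*, .zip,
    .gz, or .bz2 file, reading in memory without extracting to disk."""
    lower = path.lower()
    if lower.endswith(TAR_SUFFIXES):  # checked first: .tar.gz also ends in .gz
        with tarfile.open(path, "r:*") as tar:  # "r:*" auto-detects compression
            for member in tar:
                fh = tar.extractfile(member) if member.isfile() else None
                if fh is None:
                    continue
                with io.TextIOWrapper(fh, encoding="utf-8", errors="replace") as text:
                    for line in text:
                        yield member.name, line.rstrip("\n")
    elif lower.endswith(".zip"):
        with zipfile.ZipFile(path) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue
                with io.TextIOWrapper(zf.open(info), encoding="utf-8", errors="replace") as text:
                    for line in text:
                        yield info.filename, line.rstrip("\n")
    elif lower.endswith((".gz", ".bz2")):
        # Single-file compressors: the "member" is the archive itself.
        opener = gzip.open if lower.endswith(".gz") else bz2.open
        with opener(path, "rt", encoding="utf-8", errors="replace") as text:
            for line in text:
                yield path, line.rstrip("\n")
    else:
        raise ValueError(f"unsupported archive type: {path}")


def grep_archive(path: str, needle: str) -> List[Tuple[str, str]]:
    """Return (member_name, line) pairs whose line contains needle."""
    return [(name, line) for name, line in iter_archive_lines(path) if needle in line]
```

For example, grep_archive("app.log.gz", "req-7f3a") (a hypothetical file and request ID) returns each matching line together with the member it came from. A real reader would also need to handle nested archives, binary members, and non-UTF-8 encodings.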

The audiences and scenarios above describe possible uses of peekdocs. peekdocs is provided "as is" under the MIT License, without warranty of any kind, express or implied.

What makes peekdocs distinctive

The combination of local + privacy-first + grep-like power + OCR + regex workflows + reporting + automation across heterogeneous document collections is unusual. peekdocs delivers all of them in one tool.

Watch peekdocs in action

peekdocs has three search modes, each with its own big button on the main page, color-coded so you can tell them apart in the clips below:

  • Standard Search (the large blue Run button on the left; the smaller blue square next to it is the Search Wizard — a form-builder on-ramp with 20 pre-built search-type forms, not a fourth mode) — the everyday keyword search: type terms, pick a folder, hit Run. Shown in the hero clip below.
  • Search Suites (green button) — a named group of saved standard searches that run together and produce one combined highlighted report. The recurring-workflow-in-one-click mode.
  • Regex Search (orange button) — a named collection of regex patterns run against a folder, with per-pattern match counts and per-pattern report sections. The evidentiary-pattern workbench.

Start with the short getting-started clip below, then the hero clip (a Standard Search), Suites, and Regex Search, capped by a tour of the settings surface — every knob one click away.

Prefer to pause, rewind, or scrub to a specific moment? Every clip below is also available as a pausable MP4 on the maintainer's personal site.

Getting started with peekdocs: the first-time on-ramp. Point at a folder, run a first search, and open the highlighted report. The clips that follow drill into the three search modes and the settings surface.

A ~46-second walkthrough as a looping GIF: peekdocs searches for "budget" across a 10,411-file folder and reports back in 3.17 seconds (measured on a MacBook M4 Pro), with matches highlighted in yellow in the preview pane. The clip then opens the File Types and Categories charts to show the breadth of what was searched in that single pass: PDFs, Word and Excel docs, slides, emails, e-books, OCR'd images, archives, source code, and plain text.

Search Suites running Quarterly Content Audit, a user-built suite of six standard searches (draft / stale / TODO / owner-missing / outdated-link / deprecated-terminology sweeps) fired together with one click, with results merged into one combined highlighted report. The recurring-workflow-in-one-click mode.

Regex Search running a user-built collection of 10 patterns (TODO / FIXME / HACK markers, Python and JavaScript debug statements, breakpoint / pdb drops, @deprecated markers, UPPER_CASE constants, SemVer version strings) against a source tree in one pass, with per-pattern match counts in the results popup and per-pattern sections in the report. The evidentiary-pattern workbench.

A tour of the settings surface: sliding open the left curtain to reveal settings, expanding the Advanced Search Options row, and cycling through App Size, Language (7 UI translations), Tooltips, Light / Dark mode, and Preview Size. Every knob is one click away; none of it is buried behind a Preferences dialog.

Free   ·   Open-Source (MIT License)   ·   No Cloud   ·   Private   ·   Easy to Use

Windows   ·   macOS   ·   Linux     |     GUI   ·   CLI   ·   Python API


Feature Highlights

A workbench for document collections: search them, characterize them through built-in analysis tools, produce highlighted reports, monitor folders live via --watch, and drive it all through whichever interface fits — GUI, CLI, or Python API.

  • 100+ file types in one query — Word, PDF, Excel, email, source code, archives, scanned PDFs (OCR), and more, searched simultaneously.
  • Local-only by design — no network calls, no telemetry, no account; runs with your normal user permissions on Windows, macOS, and Linux.
  • Cloud-output guard — every report write is checked against a cloud-sync detector; iCloud Drive, OneDrive, Google Drive, and Dropbox output paths trigger a prompt (redirect to a safe local folder / write anyway / cancel), or silently redirect if the sticky Advanced Search Options checkbox is on. The no-cloud confidentiality check is enforced at write time, not just at search time.
  • Search depth beyond grep — 20-form Search Wizard, regex collections, Boolean / fuzzy / proximity / inverse / range, plus a long-running --watch mode for live folder monitoring.
  • Built-in analysis and reporting — Duplicate Finder, File Inventory, Age Distribution, Change Tracking; highlighted reports in DOCX / HTML / PDF and machine-readable CSV / JSON / NDJSON.
  • Repeatable workflows — Saved Searches, Search Suites, Regex Collections, Schedule Search, Search History, and Diff Snapshots compose into one workflow system.
  • Same engine across GUI, CLI, and Python API — schemas are shared, so a search you build in the GUI today drives from a Python script or cron job tomorrow with identical results.
  • Polished GUI — yellow-highlighted matches in the preview and the reports, tooltips on every control, dark/light/system theme, adjustable text size, and contextual ? help popups throughout.
  • Works in any language — like most modern search tools, peekdocs uses Unicode-based exact-character matching, so it can search documents in any language (no stemming or word segmentation; it works equally for English prose, Chinese text, code identifiers, and account numbers). The peekdocs GUI itself is also partially translated into 7 languages, which is uncommon for a search tool at this scale; translation contributions reviewed by native speakers are welcome.
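To make the cloud-output guard concrete, here is a minimal name-based heuristic. It is an assumption for illustration, not peekdocs's detector: the folder names below are common defaults for iCloud Drive, OneDrive, Dropbox, and Google Drive, and a check like this misses sync folders mounted as drive letters or moved to custom locations (and would flag a local folder that merely shares one of these names).

```python
from pathlib import Path

# Illustrative folder names used by common sync clients; not peekdocs's list.
CLOUD_MARKERS = (
    "Mobile Documents",  # iCloud Drive on macOS (~/Library/Mobile Documents)
    "iCloudDrive",       # iCloud for Windows
    "CloudStorage",      # macOS File Provider root used by newer sync clients
    "OneDrive",          # also matches "OneDrive - Contoso"
    "Dropbox",           # also matches "Dropbox (Personal)"
    "Google Drive",
)


def looks_cloud_synced(path: str) -> bool:
    """True if any path component is, or starts with, a known sync-folder name."""
    for part in Path(path).expanduser().parts:
        for marker in CLOUD_MARKERS:
            if part == marker or part.startswith(marker + " "):
                return True
    return False


def safe_output_dir(requested: str, local_fallback: str) -> str:
    """Redirect report output to local_fallback when requested looks cloud-synced."""
    return local_fallback if looks_cloud_synced(requested) else requested
```

Running the check at write time, right before a report is saved, is what keeps a later change of output folder from slipping past it.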

Detail and caveats on each capability live in the Features section below.


How these compose

The Feature Highlights above list the primitives individually. This section is about what happens when you combine them — three compositions that aren't obvious from the bullet list:

Live pattern sweep: --watch + --regex-collection. Watch a folder and re-run a saved regex collection whenever a file is created or modified, emitting one self-contained NDJSON record per match to stdout. Pipe stdout to jq, a log shipper, or a shell loop that fires a notification; the result is a live pattern sentinel with no cron job and no polling. (Note: --on-match fires on batch searches, not in --watch mode; with --watch, the NDJSON stream on stdout is the notification channel.)

peekdocs --watch --regex-collection "my patterns" -d ~/docs -r \
  | jq -c '{file, line, pattern_name}'

See the worked example in USER_GUIDE.md § A worked example: real-time pattern monitoring with --watch.
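A shell loop is not the only possible consumer for that stream. The sketch below parses the NDJSON lines in Python; it assumes only the three field names used in the jq example (file, line, pattern_name) and tolerates records that lack them or lines that are not JSON objects.

```python
import json
import sys
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Parse NDJSON: one JSON object per line; blank or malformed lines are skipped."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            print(f"skipping non-JSON line: {raw[:60]!r}", file=sys.stderr)
            continue
        if isinstance(record, dict):  # ignore bare arrays, numbers, etc.
            yield record


def format_match(record: dict) -> str:
    """One-line summary using the field names from the jq example above."""
    return f"[{record.get('pattern_name')}] {record.get('file')}:{record.get('line')}"
```

Saved as a module and fed from the pipe, the loop `for record in iter_ndjson(sys.stdin): print(format_match(record), flush=True)` handles the stream one line at a time; swap the print for whatever notification call you prefer.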

Provenance audit: --diff + --hash. --hash embeds a SHA-256 fingerprint of each matched file in the JSON output. Capture a baseline, wait, capture again, and --diff the two: results are bucketed into new / removed / changed / modified, and "changed" means the file's content actually differs, not just its mtime. Match-level and content-level change detection in one workflow.

peekdocs --hash --stdout budget > baseline.json
# …weeks later…
peekdocs --hash --stdout budget > current.json
peekdocs --diff baseline.json current.json

See the worked example in USER_GUIDE.md § A worked example: audit engagement provenance.
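"Changed" can mean changed content rather than a new mtime because the comparison runs on digests. A minimal sketch of that idea, assuming each snapshot has already been reduced to a {path: sha256} mapping; peekdocs's own JSON layout and --diff buckets are not reproduced here, and this version has no "modified" bucket:

```python
import hashlib
from typing import Dict, List


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's bytes, read in chunks so large files stay cheap on memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def diff_snapshots(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Bucket paths by comparing two {path: sha256} snapshots.

    Content is compared by digest, so a file whose mtime changed but whose
    bytes did not lands in "unchanged"."""
    old, new = set(baseline), set(current)
    both = old & new
    return {
        "new": sorted(new - old),
        "removed": sorted(old - new),
        "changed": sorted(p for p in both if baseline[p] != current[p]),
        "unchanged": sorted(p for p in both if baseline[p] == current[p]),
    }
```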

Scheduled pattern scan: cron / Task Scheduler + --regex-collection. The GUI's Schedule Search (Tools → Schedule Search) generates a copy-paste-ready cron (macOS/Linux) or Task Scheduler (Windows) command that invokes peekdocs --regex-collection NAME --timestamp, producing a dated report every N hours or days with no manual runs. Pair it with --on-match CMD for notifications when patterns actually appear (email, Slack, PagerDuty: you write the script, and peekdocs invokes it with the match count and report paths as environment variables).

See the worked example in USER_GUIDE.md § A worked example: nightly source-tree watch, which layers this composition with the provenance-audit one to build a full "detect + notify + preserve evidence" workflow.

For every flag and composition above as a copy-pasteable one-liner, see the Complete CLI Reference in USER_GUIDE — 197+ commands, grouped by feature, searchable with Cmd+F / Ctrl+F.


Local-only by design. No network calls, no telemetry, no cloud, no account. peekdocs runs entirely on your machine with your normal user permissions — no admin or root required, and it works fine on air-gapped systems with no internet connection.


Why local? Most people have at least some documents they would rather not hand to a third party — drafts, work-in-progress, personal correspondence, financial paperwork. peekdocs is local-only because that's the only way the answer to "where does this go?" stays "nowhere — it stayed on my machine." The tradeoff is real: peekdocs doesn't summarize, doesn't answer questions about your documents, doesn't infer meaning. Those are jobs cloud AI tools do well; peekdocs is for finding exact text in a lot of files, repeatably, on your own machine.


Transparency over magic. If a file wasn't searched, peekdocs tells you why. If OCR couldn't extract text, you'll know. If a report was created, you'll know where it is. peekdocs favors observable behavior over hidden processing.


Quick install

  1. No Python? Download the standalone app — the GUI and CLI binaries are separate downloads; pick what you need.

  2. Have Python 3.10+? A single command installs everything — the GUI, the CLI, and the Python API:

    pipx install git+https://github.com/exbuf/peekdocs.git

    (Already installed? Upgrade with pipx upgrade peekdocs.)

See Installation below for per-platform notes, the pip alternative, upgrade, and uninstall.

Windows tip: if this fails with an SSL / SNI / certificate error in Command Prompt, try the same command in PowerShell instead. See docs/INSTALLATION.md → Windows cmd.exe SSL / SNI / certificate errors for the diagnosis and fix.

What running peekdocs looks like:

# Search from the terminal — peekdocs searches the current directory,
# so cd to the folder you want first
cd ~/Documents
peekdocs "budget"
# Found 47 match(es) in 12 file(s). Files searched: 238 (142.50 MB).
#   2024_tax_return_summary.pdf: 8
#   quarterly_report_Q1.docx: 6
#   vendor_contract_2024.pdf: 5  ...

# Search with the GUI
peekdocs-gui

# Search from the Python API — pass a real path (no shell ~ expansion here)
import os
from peekdocs import search
results = search(["budget"], directory=os.path.expanduser("~/Documents"))
for match in results.matches:
    print(f"{match.filename}:{match.line_num} {match.text}")
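The API example prints one line per match. To get a per-file tally like the one the CLI prints, the matches can be grouped by filename. This sketch relies only on the match.filename attribute used above; the helper name is hypothetical:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def matches_per_file(matches: Iterable) -> List[Tuple[str, int]]:
    """Count matches per file (via each match's .filename), most matches first;
    ties are broken alphabetically so the output order is deterministic."""
    counts = Counter(m.filename for m in matches)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

With results from the search call above: `for name, n in matches_per_file(results.matches): print(f"{name}: {n}")`.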


CLI at a Glance

# Recursive search for "budget"
peekdocs -r budget

# Preflight: how many files would this search touch, and how big?
peekdocs --dry-run -r ~/Documents budget

# Regex pattern, piped through jq for the match count
peekdocs --stdout -x "\d{3}-\d{4}" | jq '.matches_found'

# Run a saved Search Suite by name
peekdocs --suite "Code hygiene"

peekdocs -h shows every flag, file type, and regex pattern. The User Guide covers the CLI in full.

Pointing peekdocs at your whole home directory or / is slow — even with --dry-run. Tree walks across ~/Library, every git repo, every node_modules, every Python venv, and every browser cache can easily mean hundreds of thousands of files; the enumeration phase alone can run 5–10+ minutes before any content is read. Press Ctrl+C to cancel at any time. Narrow the path (peekdocs -r ~/Documents budget) or restrict file types (peekdocs -r -t pdf,docx,xlsx ~ budget) to cut the corpus to seconds. During long runs, peekdocs prints Scanning files (this may take a while on large folders)... to stderr while enumerating, then switches to a live [██░░] 12345/89201 file.pdf progress bar once content reads begin.
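Why narrowing helps: enumeration is a tree walk whose cost grows with every folder it enters. The sketch below is an illustrative preflight, not peekdocs's --dry-run implementation. It counts files and bytes while pruning a hypothetical skip list of bulky folders, and optionally keeps only chosen extensions, mirroring the -t filter:

```python
import os
from typing import Iterable, Optional, Tuple

# Hypothetical skip list of folders that inflate tree walks; not peekdocs's own.
SKIP_DIRS = {".git", "node_modules", ".venv", "venv", "__pycache__"}


def preflight(root: str, extensions: Optional[Iterable[str]] = None) -> Tuple[int, int]:
    """Return (file_count, total_bytes) under root, pruning SKIP_DIRS and,
    if extensions is given (e.g. ["pdf", "docx"]), keeping only those types."""
    wanted = {e.lower().lstrip(".") for e in extensions} if extensions else None
    count = total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped folders.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            ext = os.path.splitext(name)[1].lower().lstrip(".")
            if wanted is not None and ext not in wanted:
                continue
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # broken symlink, permission error, or file vanished
            count += 1
    return count, total
```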

Features

peekdocs has three search modes, each writing its own family of reports, named after the mode, next to your documents so the families never collide:

| Mode | How to run | Reports |
| --- | --- | --- |
| Standard Search | Blue Run Standard Search button on the main screen, or peekdocs <terms> | peekdocs_standard_results.{txt,docx,csv,json,pdf,html} |
| Regex Search | Orange Regex Search button on the main screen (opens the regex popup; its own Run Regex Search button executes the collection), or peekdocs --regex-collection NAME | peekdocs_regex_results.{txt,docx} |
| Suite (group of saved searches) | Green Search Suites button on the main screen (opens the suite popup; its own Run Search Suite button executes the selected suite), or peekdocs --suite NAME | peekdocs_suite_results.{txt,docx,html,csv,json} |

The "mode" is the workflow, not the flag set. A one-off peekdocs -x "pattern" (or -z, -w, -W) is a Standard Search with a regex/fuzzy/wildcard flag and writes peekdocs_standard_results.*. Only the dedicated Regex Search workflow — the GUI popup or --regex-collection — produces peekdocs_regex_results.*.

All three share the same engine, flags, and 100+ file-type support. The matching peekdocs_<mode>_results.* naming means a Regex run never overwrites a Standard run (and vice versa), and peekdocs --clear / Clear Files can find them by prefix. Within a mode, each run overwrites the previous report — add --timestamp (CLI) or check Timestamp in Advanced Search Options (GUI) to append _YYYYMMDD_HHMMSS so every run is preserved. The Schedule Search dialog enables timestamping by default for cron / Task Scheduler use.
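The resulting file names can be sketched as follows. This assumes the _YYYYMMDD_HHMMSS suffix goes just before the extension; the exact placement is an assumption, so check a real timestamped run.

```python
from datetime import datetime
from typing import Optional

MODES = {"standard", "regex", "suite"}


def report_name(mode: str, ext: str, when: Optional[datetime] = None) -> str:
    """Build peekdocs_<mode>_results[_YYYYMMDD_HHMMSS].<ext>."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    stem = f"peekdocs_{mode}_results"
    if when is not None:  # timestamped runs never overwrite each other
        stem += when.strftime("_%Y%m%d_%H%M%S")
    return f"{stem}.{ext}"
```

Without a timestamp, two runs in the same mode produce the same name, which is why the second run overwrites the first.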

Naming convention: no exceptions. Every file peekdocs creates starts with one of two prefixes: peekdocs_ for visible outputs (the reports above, the error log, and Tools-menu outputs; release binaries use the dash variant peekdocs-), or .peekdocs for hidden user-state and per-folder dotfiles (~/.peekdocsrc, ~/.peekdocs_history.json, .peekdocs_collection.json, .peekdocs.db, etc.). Anything in your folders that doesn't start with one of these prefixes was not created by peekdocs. For the per-file inventory (what each file contains, its sensitivity rating, and how to clean it up), see docs/SECURITY.md.
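Because the convention is strict, a prefix test is enough to tell peekdocs files from your own. A read-only sketch (it lists and never deletes, and it is not the implementation behind --clear); note that a file you named yourself with one of these prefixes would also match:

```python
from pathlib import Path
from typing import List

# The prefixes named in the convention: outputs, release binaries, dotfiles.
PEEKDOCS_PREFIXES = ("peekdocs_", "peekdocs-", ".peekdocs")


def created_by_peekdocs(name: str) -> bool:
    """True if a file name follows the peekdocs naming convention."""
    return name.startswith(PEEKDOCS_PREFIXES)


def list_peekdocs_files(folder: str) -> List[Path]:
    """List (never delete) files directly in folder that carry a peekdocs prefix."""
    return sorted(p for p in Path(folder).iterdir() if p.is_file() and created_by_peekdocs(p.name))
```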

Search & discovery

  • 100+ file types — Word, PDF, Excel, PowerPoint, emails (.eml, .msg, .pst, .mbox), archives (.zip, .7z, .rar), source code (Python, C/C++, Java, Go, Rust, and more), engineering files (MATLAB, Verilog, VHDL, SPICE, DXF, Visio), Apple Pages/Numbers/Keynote, calendars (.ics), contacts (.vcf), e-books, HTML, and more. Note: .pst requires libpff-python (no Windows wheel) and .rar requires the unrar tool — see Prerequisites
  • Search modes — plain keywords, AND/OR, Boolean expressions, regex, wildcards, fuzzy matching, whole-word, word proximity, line proximity
  • Range queries — filter by dollar amounts, dates, percentages, ages, file sizes
  • OCR — search scanned PDFs and images (requires Tesseract)
  • Multi-folder search — search across multiple folders at once, with optional recursive searching into subfolders. Click +Folder to add folders, or type semicolon-separated paths. Results are combined from all folders
  • Inverse search — find files that are missing required content
  • Search Wizard — guided search builder with 20 pre-built search types (phone, email, dollar range, date range, Boolean, fuzzy, and more) plus a regex pattern builder with 35 named patterns across 6 categories — no flags or regex knowledge needed
  • ▶ Save / ▶ Reload — save a configured search by name and reload it later with one click
  • Recent searches — your last 10 searches are remembered for reuse. Each entry captures the full search context (terms, folder, and every Advanced Search Options setting), so selecting one from the ▼ Recent popup restores all of it in one click. With the search bar focused, the Up/Down arrow keys walk through the same list but copy only the search-terms text into the bar, leaving your current Advanced options untouched. Use the arrows to reuse the wording with the current settings, and the Recent popup to get the whole configuration back. ▶ Save keeps a configuration permanently under a name, beyond the rolling 10-entry Recent window
  • Search index — optional SQLite FTS5 index for faster repeated searches
  • Works in any language — Unicode-based text handling; searches documents in any language with exact character-sequence matching (no stemming or word segmentation). Documentation is English-only. The GUI ships a partial UI translation in seven languages (English, Español, Français, Deutsch, 日本語, 简体中文, Português brasileiro) covering the search workflow (see Works in any language in the Feature Highlights above), but help popups, dialogs, the CLI banner, and reports remain in English. The PDF report uses a Latin-1 font, so non-Latin text appears as ? in .pdf output only; use .docx, .html, .txt, .json, or .csv for non-Latin content.
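Range queries work on values, not text. As an illustration of the technique (the pattern and helper below are assumptions, not peekdocs's range syntax), a dollar-range filter finds amounts with a regex and compares them numerically, so $9,999.99 and $12,500 are judged by value rather than as strings:

```python
import re
from decimal import Decimal
from typing import List

# Matches $1,234.56, $10000, or $ 250; an illustrative pattern, not peekdocs's own.
DOLLAR = re.compile(r"\$\s?(\d{1,3}(?:,\d{3})+|\d+)(?:\.(\d{1,2}))?")


def dollar_amounts_in_range(text: str, low: Decimal, high: Decimal) -> List[Decimal]:
    """Return every dollar amount in text with low <= amount <= high (inclusive)."""
    found = []
    for m in DOLLAR.finditer(text):
        whole = m.group(1).replace(",", "")
        cents = m.group(2) or "0"
        amount = Decimal(f"{whole}.{cents}")  # Decimal avoids float rounding surprises
        if low <= amount <= high:
            found.append(amount)
    return found
```

The bounds are inclusive, so "over $10,000" in the strict sense would need low set just above 10000.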

Reporting

  • Highlighted reports — results saved to .docx and .pdf with yellow-highlighted matches, .txt with full context, and optional CSV and JSON output
  • Results preview — see matches inline in the GUI with highlighted terms. View Text on any matched file shows the file's full extracted text with every match highlighted, without opening external software. Double-click any file to open in its native application; click DOCX, HTML, or PDF to open the highlighted multi-file report
  • HTML export — no Word or LibreOffice? Enable HTML output and the highlighted report opens in any browser. The file is stored locally — nothing is uploaded, and it's easy to share by email
  • Desktop notification on complete — opt-in checkbox in Advanced Search Options. When a Standard / Suite / Regex run finishes, fires a native desktop notification (macOS Notification Center, Windows toast, Linux libnotify) with the match count, file count, and elapsed time. Suppressed when the peekdocs window is focused — if you can already see the result, no notification fires. No data leaves the machine
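The yellow highlighting in an HTML report amounts to escaping the text and wrapping each match in a mark element. A minimal sketch of that technique (not peekdocs's report generator); browsers render <mark> with a yellow background by default:

```python
import html
import re
from typing import Iterable


def highlight_html(text: str, terms: Iterable[str]) -> str:
    """Escape text for HTML and wrap every case-insensitive match in <mark>."""
    ordered = sorted({t for t in terms if t}, key=lambda t: (-len(t), t))
    if not ordered:
        return html.escape(text)
    # Longest terms first, so "budget plan" wins over "budget" at the same position.
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)
    pieces, pos = [], 0
    for m in pattern.finditer(text):
        # Match on the raw text, then escape each piece, so document text can't inject markup.
        pieces.append(html.escape(text[pos:m.start()]))
        pieces.append("<mark>" + html.escape(m.group(0)) + "</mark>")
        pos = m.end()
    pieces.append(html.escape(text[pos:]))
    return "".join(pieces)
```

Wrapping the result in a minimal HTML page gives a file that any browser can open offline.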

Analysis

  • Collection Summary — one-page consolidated overview of the search folder: total file count and size, oldest/newest file, top file types, age histogram, top 10 largest files, recent-activity counts, unsearchable breakdown, and empty-file count — all in a single fast pass
  • File Inventory — instant summary of every file in a folder: total count, size breakdown by type, oldest and newest files
  • Duplicate Finder — finds identical files by content (not just name), shows how much space is wasted by extra copies
  • Large Files — shows the 50 biggest files so you can reclaim disk space
  • Empty Files — finds zero-byte files: failed downloads, placeholders, junk
  • File Age Distribution — histogram of how recently files were modified, in six buckets from 0–6 months out to 10+ years. Useful for archives, document collections, and personal files — surfaces stale folders at a glance and shows what fraction of a collection is recent activity vs. long-untouched material
  • Recent Changes — which files were modified in the last 7, 30, or 90 days
  • Protected Files: detects password-protected PDFs, Word/Excel/PowerPoint files, and ZIP/7z/RAR archives that peekdocs can't search
  • Unsearchable Files — categorizes every file peekdocs cannot search (unsupported types, oversized, empty, hidden / OS metadata, peekdocs-created) with counts and per-category file lists. Answers "what fraction of this folder is even searchable?" before you run a search
  • Bookmarks — pin files from search results for quick access later
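The Duplicate Finder's idea (identical content rather than identical names, plus the space taken by the extra copies) can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not peekdocs's implementation: it groups files by size first, confirms matches with SHA-256, skips zero-byte files (the Empty Files view covers those), and counts every copy beyond the first as wasted space. The function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_duplicates(root: str) -> tuple[list[list[str]], int]:
    """Return (groups of identical files, bytes used by the extra copies)."""
    # Pass 1: only files with the same size can have the same content.
    by_size: dict[int, list[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups: list[list[str]] = []
    wasted = 0
    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:
            continue  # empty files and unique sizes are not reported here
        # Pass 2: confirm identical content with a cryptographic hash.
        by_hash: dict[str, list[str]] = defaultdict(list)
        for path in paths:
            by_hash[sha256_of(path)].append(path)
        for same in by_hash.values():
            if len(same) > 1:
                groups.append(sorted(same))
                wasted += size * (len(same) - 1)  # every copy beyond the first
    return groups, wasted
```

Hashing only files that share a size keeps the expensive step small on typical collections, where most sizes are unique.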

Automation & integration

  • Search Suites — group saved searches into a named suite and run them all at once (green Search Suites button on the main screen)
  • Repeatable workflows — Saved Searches, Search Suites, Regex Collections, Schedule Search, Search History, and Diff Snapshots compose into a workflow system: define a search by name; group related searches into a suite; reuse pattern sets via Regex Collections; schedule a suite to run on a cadence; audit every run via Search History; compare today's run against last week's via Diff Snapshots.
  • Search History — automatic diary of every search you run: date, terms, match count, file count, elapsed time
  • Diff Snapshots — compare two saved scans to see what files are new, changed, removed, or unchanged between them
  • Schedule Search — generates a ready-to-paste cron (Mac/Linux) or Task Scheduler (Windows) command to run any saved search suite or regex collection on a schedule. Step-by-step instructions walk you through pasting it into the scheduler
  • Indexes — build, refresh, or delete the optional search index that makes repeated searches dramatically faster
  • Three interfaces — terminal CLI, point-and-click GUI (peekdocs-gui), Python API
  • Cross-platform — Windows, macOS, Linux

Privacy & transparency

  • Offline and private — your documents never leave your computer. peekdocs never uploads, transmits, alters, moves, or deletes your files. No cloud, no accounts, no subscriptions. Everything runs locally and stays local
  • Read-only — peekdocs never modifies, moves, or deletes your files. It does create its own output files (reports, indexes, settings) and can delete those when you ask (e.g., Tools → Clear Files, Tools → Indexes → Delete Index(es))
  • Delete on Close: one checkbox deletes every result file and the search index created during the session when you close peekdocs. Saved reports, saved searches, settings, and bookmarks are preserved
  • Safe defaults — files over 100 MB are skipped automatically to prevent slow searches and memory issues; archives that would expand past 500 MB are skipped to prevent archive bombs. Adjust Max File Size in Advanced Search Options or set it to 0 for no limit
  • Excluded Files view — after each search, see exactly which files were skipped and why (unsupported type, oversized, hidden, etc.) — no guessing what was missed
  • Error Log: opens peekdocs_errors.log, which lists any files that couldn't be read and why (corrupt, locked, password-protected, etc.)
  • Clear Files — selectively delete peekdocs's output files (reports, error log, saved searches, index) from the current folder
  • Clean Folder — same idea for any other folder, in case peekdocs files were generated elsewhere
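The archive-bomb guard described under Safe defaults amounts to checking how large an archive says it will become before extracting anything. Here is a minimal Python sketch for .zip files. It assumes the cap is 500 MiB and relies only on the uncompressed sizes recorded in the archive's directory. It is an illustration, not peekdocs's code, and zip_expands_past is a hypothetical name.

```python
import zipfile

# Assumed cap for this sketch: "500 MB" interpreted as 500 MiB.
DEFAULT_LIMIT = 500 * 1024 * 1024


def zip_expands_past(path: str, limit: int = DEFAULT_LIMIT) -> bool:
    """True if the members' declared uncompressed sizes add up to more than limit."""
    with zipfile.ZipFile(path) as archive:
        # file_size is the uncompressed size recorded in the archive's
        # central directory, so this reads metadata only and extracts nothing.
        total = sum(info.file_size for info in archive.infolist())
    return total > limit
```

Declared sizes come from the archive itself and can be forged, so a stricter guard would also count bytes as members are actually read.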

Supported File Types

Category Formats
Documents .doc .docx .epub .html .key .md .odp .odt .pages .pdf .ppt .pptx .rst .rtf .tex
Spreadsheets .csv .numbers .ods .tsv .xls .xlsx
Email .eml .mbox .msg .pst (.pst requires libpff-python — no Windows wheel; see Troubleshooting)
Archives .7z .bz2 .gz .rar .tar .tgz .zip (.rar requires the unrar tool — see Prerequisites)
Calendar/Contacts .ics .vcf
Source Code .asm .bat .c .cmake .cpp .cs .css .f .f90 .go .gradle .h .hpp .java .js .kt .lua .pl .ps1 .py .r .rb .rs .s .scala .scss .sh .swift .tcl .ts .vb
Engineering .cir .dxf .m .sp .spice .sv .v .vhd .vhdl .vsdx
Data/Config .cfg .conf .dockerfile .env .graphql .gql .ini .json .jsonl .log .makefile .ndjson .properties .proto .sql .tf .toml .txt .xml .yaml .yml
Notebooks .ipynb (Jupyter)
Images (OCR) .bmp .jpg .jpeg .png .tif .tiff (requires -O flag)

Note: Apple Numbers (.numbers) and Keynote (.key) files created with recent versions of iWork use a protobuf-based internal format. peekdocs extracts whatever readable text exists inside these files, which may be partial. Older iWork files extract fully. Apple Pages (.pages) is fully supported.

Installation

Prerequisites · Option A: Standalone Download · Option B: pipx (for Python users) · Upgrading

Cautious about installing? See docs/INSTALL_SAFETY.md — plain-English explanation of what peekdocs does and doesn't do, what the SmartScreen / Gatekeeper warnings actually mean, and five ways to verify the download yourself before you run it (checksum match, VirusTotal scan, network monitor, source-code grep, sandbox install).

Prerequisites

Using Option A (standalone download)? Skip this section — no prerequisites needed.

  • Python 3.10+: required for Option B and the source install. macOS: brew install python (or python.org). Windows: python.org, and check "Add Python to PATH". Linux: sudo apt install python3-venv python3-pip python3-tk. Per-platform deep dives are in docs/INSTALLATION.md
  • Tkinter: GUI only (the CLI works without it). Windows: included. macOS Homebrew: brew install python-tk@<version>. Linux: covered by python3-tk above
  • pipx: recommended over pip for Option B. pip install pipx (Windows) · brew install pipx (macOS) · sudo apt install pipx (Linux). Then run pipx ensurepath and reopen your terminal
  • Tesseract (optional): OCR for scanned PDFs and images. brew install tesseract · Windows installer · sudo apt install tesseract-ocr
  • UnRAR (optional): search inside .rar archives. brew install unrar · WinRAR · sudo apt install unrar
  • libpff-python (optional): search inside Outlook .pst archives (no Windows wheel). macOS/Linux: pip install libpff-python (with a pipx install, use pipx inject peekdocs libpff-python so it lands in peekdocs's own venv). Windows: convert .pst to .mbox; see TROUBLESHOOTING.md

Everything else installs automatically. pipx install (or pip install) downloads the 18 Python libraries peekdocs needs (PDF reader, Word/Excel/PowerPoint parsers, email reader, and more) plus their transitive dependencies — typically around 200 packages and a few hundred megabytes of disk space. See Dependencies for the full list and what each one does.

Option A: Standalone Download (no Python needed)

Pick this if you don't have Python installed or don't want to install it. No setup — just download and run. (If you already have Python set up, Option B is one command, gives you the CLI and Python API alongside the GUI, and starts noticeably faster — especially on macOS.)

The GUI and CLI standalones are separate downloads. Grab whichever fits how you'll use peekdocs — or both. The GUI is the click-driven interface for interactive search and report viewing; the CLI is for scripting from the terminal, running on a schedule (cron / Task Scheduler), and piping JSON output into other tools. They're independent — installing one doesn't require the other.

Why two binaries instead of one? Each standalone is built with PyInstaller, which freezes its own Python interpreter and every dependency into a single executable. A PyInstaller bundle has one entry point — it can't be both a GUI launcher and a CLI without one carrying the other's weight (the CLI would haul tkinter / customtkinter it never uses; the GUI would carry CLI-only argument-parsing surface). Splitting them keeps each binary small and lets each ship independently. The pipx / pip install path doesn't have this constraint — it drops both peekdocs and peekdocs-gui console scripts into one shared venv from a single command.

Direct GUI downloads (always the latest release):

Platform Download After download
Windows peekdocs-gui-windows.exe Double-click to run. First launch: Windows SmartScreen blocks the .exe with "Windows protected your PC" — click More info (small link near the top of the dialog) → Run anyway (the button that appears). This is expected for unsigned open-source software and does not indicate the app is unsafe.
macOS peekdocs-gui-macos.zip Unzip, open peekdocs-gui.app. First launch: macOS Gatekeeper shows a dialog with only Done / Move to Trash (no Open button). Two ways to bypass — both expected for unsigned open-source software, neither indicates the app is unsafe: (1) System Settings UI: open System Settings → Privacy & Security, scroll down to the message "peekdocs-gui.app" was blocked because it is not from an identified developer, click Open Anyway, then re-launch the app and click Open in the confirmation dialog. (2) Terminal one-liner: xattr -dr com.apple.quarantine ~/Downloads/peekdocs-gui.app, then double-click. Each new download (including upgrades) re-triggers the warning.
Linux peekdocs-gui-linux In the download folder (typically ~/Downloads): cd ~/Downloads && chmod +x peekdocs-gui-linux && ./peekdocs-gui-linux. No first-launch security prompt on Linux.

Why the warnings appear and the full per-platform bypass walkthrough: First-launch security warnings below.

Direct CLI downloads (always the latest release):

Platform Download After download
Windows peekdocs-cli-windows.exe cd $HOME\Downloads (PowerShell) or cd %USERPROFILE%\Downloads (cmd.exe), then peekdocs-cli-windows.exe --version (cmd.exe: the bare name works) or .\peekdocs-cli-windows.exe --version (PowerShell needs the .\ prefix). First launch: SmartScreen may block the .exe; click More info → Run anyway. For global access from any terminal, see Windows: make peekdocs work from any terminal below the table. PowerShell-specific --% token and .rar/.pst limitations: docs/INSTALLATION.md → CLI on Windows footnotes.
macOS peekdocs-cli-macos.zip Safari auto-unzips it into a peekdocs/ folder (other browsers leave the .zip, so unzip it first). The binary is peekdocs/peekdocs; the folder also contains _internal/ with the bundled Python and libraries. cd ~/Downloads && xattr -dr com.apple.quarantine peekdocs && ./peekdocs/peekdocs --version. For global access from any terminal: sudo mv peekdocs /usr/local/lib/peekdocs && sudo ln -s /usr/local/lib/peekdocs/peekdocs /usr/local/bin/peekdocs && sudo xattr -dr com.apple.quarantine /usr/local/lib/peekdocs, after which peekdocs "query" /path works from any terminal session. The post-move xattr matters: without it, Gatekeeper re-verifies on every launch. The CLI ships as a folder rather than a single binary because PyInstaller's --onedir mode skips the self-extraction step that --onefile performs on every invocation; on macOS this cut startup for the unsigned CLI from ~5–7 s to ~1–2 s.
Linux peekdocs-cli-linux In the download folder: cd ~/Downloads && chmod +x peekdocs-cli-linux && ./peekdocs-cli-linux --version. Optionally sudo mv peekdocs-cli-linux /usr/local/bin/peekdocs for global access.

Running the CLI from the download folder — the ./ / .\ prefix rule. When you run a downloaded executable from the same folder you're sitting in, most shells require an explicit prefix telling them "look here, not on PATH":

  • macOS: ./peekdocs/peekdocs --version — the unzip produces a folder; the launcher is one level inside (forward slash + dot, then into the folder)
  • Linux: ./peekdocs-cli-linux --version (forward slash + dot)
  • Windows PowerShell: .\peekdocs-cli-windows.exe --version (backslash + dot)
  • Windows cmd.exe: peekdocs-cli-windows.exe --version (bare name works; cmd.exe includes the current directory in its search by default)

The reason: shells search $PATH ($env:Path on Windows) for executables, and the current directory isn't on PATH by default on macOS / Linux / PowerShell (a security default — prevents accidentally running a malicious binary in a folder you cd'd into). The ./ or .\ prefix overrides that. Once you've installed the binary to a folder that is on PATH (/usr/local/bin on macOS / Linux, $HOME\bin on Windows after the steps below), the prefix becomes unnecessary and peekdocs ... works from any directory.
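To make the lookup rule concrete, here is a small Python model of how a POSIX shell (macOS / Linux) resolves a command: a name containing a slash is used as written, and a bare name is looked up only in the PATH entries. It illustrates shell behavior and is not part of peekdocs. The function name is hypothetical, and real shells add hashing and built-ins on top.

```python
import os
from typing import Optional


def resolve_like_posix_shell(command: str, path_value: str) -> Optional[str]:
    """Return the file a POSIX shell would run for `command`, or None."""
    if "/" in command:
        # ./peekdocs or /usr/local/bin/peekdocs: used as written, PATH is skipped.
        if os.path.isfile(command) and os.access(command, os.X_OK):
            return command
        return None
    for directory in path_value.split(":"):
        # POSIX treats an empty PATH entry as the current directory.
        candidate = os.path.join(directory or ".", command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

A bare peekdocs sitting in the current folder is therefore not found unless that folder appears in PATH, which is exactly what the ./ prefix works around.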

Windows: make peekdocs work from any terminal. Rename the CLI to peekdocs.exe, move it into $HOME\bin, and add that folder to your user PATH. Run this in PowerShell from the download folder:

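Rename-Item peekdocs-cli-windows.exe peekdocs.exe
New-Item -ItemType Directory -Force -Path "$HOME\bin" | Out-Null
Move-Item -Force peekdocs.exe "$HOME\bin\"   # -Force overwrites an older copy on upgrade
# Append to the *user* PATH only: $env:Path also contains the machine PATH, which must not be copied into it
$userPath = [Environment]::GetEnvironmentVariable("Path", "User")
[Environment]::SetEnvironmentVariable("Path", "$userPath;$HOME\bin", "User")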

Open a fresh PowerShell window afterward; peekdocs --version then works from any directory.

Or browse the Releases page for older versions, the full asset list (all six GUI + CLI binaries side by side), or release notes. On the GitHub repo page, "Releases" is in the right sidebar under "About" — it's easy to miss if you're not looking for it.

* First-launch security warnings (one-time, per platform). Free, open-source software that hasn't paid for an OS-vendor code-signing certificate triggers a warning on first launch. This is normal and does not mean the software is unsafe.

  • Windows (SmartScreen): Click More info → Run anyway.

  • macOS (Gatekeeper): Recent macOS (Sequoia / Sonoma) shows a warning dialog with only Done and Move to Trash — no Open button. The bypass:

    1. Click Done to dismiss the warning.
    2. Open System Settings → Privacy & Security, scroll down to "peekdocs-gui.app was blocked...", and click Open Anyway.
    3. Re-launch the app and click Open in the final confirm dialog.

    From then on a regular double-click on that copy works. Each new download (including upgrades) re-triggers the warning — the trust is per downloaded file, not per app. The one-line terminal alternative is faster if you upgrade often: xattr -dr com.apple.quarantine ~/Downloads/peekdocs-gui.app. Full walkthrough: docs/INSTALLATION.md → macOS first-launch Gatekeeper. Note: Safari auto-unzips downloaded .zip files, so you'll see peekdocs-gui.app directly in Downloads rather than the peekdocs-gui-macos.zip you clicked — no extra unzip step.

  • Linux: Open a terminal in the folder where the file landed (typically ~/Downloads), then chmod +x peekdocs-gui-linux && ./peekdocs-gui-linux. The ./ prefix is required because the current directory is not on $PATH by default — ./ tells the shell "run the file in this folder." If you moved the file elsewhere, cd there first or run it by absolute path (/path/to/peekdocs-gui-linux).

Upgrading. No need to uninstall the old version first — just download the new version from the same direct download links above and overwrite the existing file (GUI, CLI, or both — whichever you use). Your settings and saved searches live in your home directory, not in the executable — nothing is lost. See Uninstalling below for full removal instructions.

No dependency breakage. Each standalone bundles Python, all libraries, and peekdocs into one self-contained download (a single executable, or an app or folder on macOS), frozen at versions that were tested together. There is nothing external to upgrade, conflict, or break.

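Safe for your computer. No installation option (standalone, pipx, or source) installs system services, writes to the registry, or interferes with any other program. The standalone, pipx, and source (venv) installs also leave your existing Python untouched; only a plain pip install adds peekdocs to whichever Python environment you run it in.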


Done with Option A? Skip ahead to Quick Start. If you have Python installed, Option B below is the better path — one command, faster startup, and you get the CLI and Python API alongside the GUI.

Option B: Quick Install with pipx (for Python users)

If you already have Python set up — or you want the CLI and Python API alongside the GUI — one command installs everything. Works the same on every OS.

pipx install git+https://github.com/exbuf/peekdocs.git    # recommended (isolated venv)
# — or —
pip install git+https://github.com/exbuf/peekdocs.git     # if you prefer pip

These are the first-time install commands. To upgrade later, use pipx upgrade peekdocs (or pip install --upgrade git+https://github.com/exbuf/peekdocs.git). pipx upgrade is cleaner than pipx install --force — it replaces the package's contents in place instead of leaving stale .dist-info directories around (which can desync the reported version from the running code).
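If you suspect a stale .dist-info entry, one way to check is to list every copy of the package's metadata that Python can see. This sketch uses the standard library's importlib.metadata; run it with the interpreter of the environment peekdocs is installed in (for pipx, that app's own venv). More than one version in the output suggests leftover metadata. The helper names are ours, not part of peekdocs.

```python
import re
from importlib.metadata import distributions


def canonical_name(name: str) -> str:
    """Normalize a project name per PEP 503 (case-insensitive; -, _ and . equivalent)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_versions(project: str) -> list[str]:
    """Versions of every distribution named `project` visible on sys.path."""
    wanted = canonical_name(project)
    versions = []
    for dist in distributions():
        try:
            name = dist.metadata.get("Name")  # None if the field is missing
        except Exception:
            continue  # unreadable metadata: skip it rather than crash
        if name and canonical_name(name) == wanted:
            versions.append(dist.version)
    return sorted(versions)


if __name__ == "__main__":
    print(installed_versions("peekdocs"))
```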

After install, peekdocs and peekdocs-gui work from any terminal, any folder, every time — even after restarting your computer. pipx manages the underlying virtual environment for you (pip drops the package into whichever Python environment you used). To uninstall completely: pipx uninstall peekdocs (or pip uninstall peekdocs). See the User Guide for what is and isn't preserved across upgrades.

GUI prerequisite — only if you'll use peekdocs-gui:

  • macOS Homebrew Python: brew install python-tk@3.14 (match your python@<version>)
  • Linux: sudo apt install python3-tk
  • Windows / python.org macOS installer: already included — nothing to do

Niche cases (macOS python3.13 selection, no-git ZIP install, Windows pipx fallback, source install for contributors) are documented in docs/INSTALLATION.md.

Upgrading

Your saved searches, settings, indexes, and reports are stored outside the peekdocs installation — in your home directory and your document folders. Upgrading replaces only the code. These files are never overwritten by an upgrade:

  • ~/.peekdocsrc — your saved settings and preferences
  • ~/.peekdocs_history.json — your search history
  • ~/.peekdocs_bookmarks.json — your bookmarks
  • .peekdocs_collection.json (in each search folder) — your saved searches and search suites
  • .peekdocs.db (in each search folder) — your search index
  • peekdocs_report_*, peekdocs_accumulated_* files — your saved reports

How to upgrade depends on which install method you used:

  • Standalone (Option A): download the new file from the Releases page and replace the old one. No need to uninstall first.
  • pipx (Option B): pipx upgrade peekdocs — replaces the package contents in place without leaving stale .dist-info directories behind. (pipx install --force git+… also works but can accumulate stale dist-info entries that desync the reported version from the running code; pipx uninstall peekdocs && pipx install git+… is the nuclear option if you ever hit that.) Windows note: if either upgrade method fails with "Access is denied" on .pyd / .dll / python.exe files, the existing venv is being held open by a running peekdocs process (or a terminal sitting inside the venv folder). See pipx upgrade on Windows: locked files for the recovery walkthrough. macOS and Linux aren't affected — they let a running process keep using a file that's been replaced.
  • Source install: cd peekdocs && git pull && pip install -e . (see CONTRIBUTING.md).
  • Niche paths (no-git ZIP, Windows pip fallback): see docs/INSTALLATION.md.

Uninstalling

peekdocs doesn't use a system installer — no registry entries, no system services, no kernel extensions. "Uninstalling" just means deleting the executable (standalone) or the Python package (pipx / pip). Your settings, history, bookmarks, saved searches, and indexes are stored in your home directory and search folders — they persist after uninstall so you can reinstall later and pick up where you left off. To wipe those too, see the factory reset paragraph at the end of this section.

How to uninstall depends on which install method you used:

  • Standalone (Option A):
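    • Windows: delete peekdocs-gui-windows.exe and/or peekdocs-cli-windows.exe from wherever you saved them (Downloads, Desktop, etc.). If you followed the steps to make peekdocs work from any terminal, delete $HOME\bin\peekdocs.exe instead, and optionally remove $HOME\bin from your user PATH.
    • macOS: drag peekdocs-gui.app from Finder to the Trash. For the CLI, delete the peekdocs/ folder from Downloads, or, if you installed it globally, run sudo rm /usr/local/bin/peekdocs && sudo rm -rf /usr/local/lib/peekdocs.
    • Linux: delete peekdocs-gui-linux and/or peekdocs-cli-linux from wherever you put them. If you moved the CLI onto PATH, delete it there too (e.g., sudo rm /usr/local/bin/peekdocs).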
  • pipx (Option B): pipx uninstall peekdocs — removes the isolated venv cleanly.
  • pip: pip uninstall peekdocs — removes the package from whichever Python environment you installed into.
  • Source install: pip uninstall peekdocs from inside the venv you used. Then rm -rf the cloned repo folder if you no longer need it.

Factory reset (complete wipe). The files listed under Upgrading above are intentionally preserved by uninstall. If you also want those gone — settings, search history, bookmarks, saved searches, indexes, saved reports — delete them manually:

# macOS / Linux
rm -f ~/.peekdocsrc ~/.peekdocs_history.json ~/.peekdocs_bookmarks.json
rm -rf ~/peekdocs_reports
# Plus, in each folder you ever searched:
# rm -f .peekdocs_collection.json .peekdocs.db .peekdocs.db-wal .peekdocs.db-shm
# Windows PowerShell
Remove-Item $HOME\.peekdocsrc, $HOME\.peekdocs_history.json, $HOME\.peekdocs_bookmarks.json -ErrorAction SilentlyContinue
Remove-Item $HOME\peekdocs_reports -Recurse -ErrorAction SilentlyContinue
# Plus, in each folder you ever searched, remove .peekdocs_collection.json and .peekdocs.db*

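Together with the uninstall steps above, that removes peekdocs's settings, history, bookmarks, and per-folder data. Result files and saved reports written into the folders you searched (peekdocs_*) are not covered by these commands: run peekdocs --clear-all in each of those folders before uninstalling, or delete the files by hand. After that, no trace of peekdocs remains on your machine.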
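Before deleting anything by hand, it can help to see which peekdocs files a folder tree actually holds. This read-only Python sketch lists (never deletes) files whose names match the patterns mentioned in this README. The pattern list is an assumption drawn from those names, so a given version may write others, and find_peekdocs_files is a hypothetical helper.

```python
import fnmatch
import os

# Patterns taken from file names mentioned in this README (an assumption:
# your version may create others).
PEEKDOCS_PATTERNS = (
    ".peekdocs_collection.json",  # saved searches and suites
    ".peekdocs.db*",              # index, plus its -wal / -shm companions
    "peekdocs_*_results*",        # per-search result files
    "peekdocs_report_*",          # named saved reports
    "peekdocs_accumulated_*",     # accumulated reports
    "peekdocs_errors.log",        # error log
)


def find_peekdocs_files(root: str) -> list[str]:
    """Walk root and return peekdocs-created files, sorted. Nothing is deleted."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(fnmatch.fnmatch(name, pattern) for pattern in PEEKDOCS_PATTERNS):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Reviewing the list first keeps a manual cleanup from touching anything that merely looks similar.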

Quick Start

Want a quick demo first? Clone this repo and try peekdocs on the bundled samples: cd samples/engineering_test && peekdocs BUILD -r returns 29 hits across multiple source-code and engineering file types (the corpus spans 41 extensions in total). No setup beyond installing peekdocs.

GUI

peekdocs-gui

On first launch, the GUI opens with a Getting Started tab that walks you through your first search. Close it when you're ready to dive in, or skip it and follow these four steps:

  1. Click Browse to select a folder (or Single File to search a specific file)
  2. Type your search terms
  3. Click Run Standard Search
  4. View highlighted matches in the preview pane. To also save a Word report, check DOCX in Advanced Search Options before searching (or HTML, PDF, etc.).

The search bar covers the common case: type your keywords and click Run Standard Search. For more advanced searches you have two choices:

  • Configure Advanced Search Options yourself (regex, fuzzy, Boolean, range queries, and all other settings). Click the ▶ Advanced Search Options header to expand the inline panel in the left pane.
  • Let the Search Wizard configure them for you. Click the blue Search Wizard button on the main page (between Run Standard Search and Search Suites), pick a search type from 20 pre-built forms, fill in your values, and click Apply. The wizard, including its separate regex pattern builder (35 named patterns across 6 categories), fills in Advanced Search Options automatically.

The green Search Suites button (run a group of saved searches together) sits on the main screen next to Run Standard Search. The Tools menu in the upper-right also includes Schedule Search, which generates a ready-to-paste cron / Task Scheduler command rather than installing the schedule for you.

The Search tab is split side by side: a scrollable controls column on the left and a results-preview column on the right, with a draggable sash between them. The right pane carries the search-results headline (files searched · matches · elapsed time), Matched / Excluded count buttons, a Chart popup, and the matches themselves. The left pane carries Steps 1–4, the status row, the report-open buttons, and the collapsible Advanced Search Options panel. The split opens slightly biased toward the left pane (52%) so the row of five output-format checkboxes fits when the window first appears; drag the blue sash to rebalance.

If buttons overlap or text looks too large, use the Text Size dropdown on the bottom-right toolbar to adjust (Normal is recommended).

Terminal

If you used Option B (pipx), or installed the standalone CLI onto your PATH as described under Option A, peekdocs works from any terminal. Otherwise, run the standalone CLI from its folder with the ./ or .\ prefix. If you used the source install for contributors, navigate to the cloned repo folder and activate the virtual environment first:

cd /path/to/peekdocs                 # the folder containing pyproject.toml
source venv/bin/activate             # macOS/Linux (you'll see (venv) in your prompt)
venv\Scripts\activate                # Windows

Tip: Type peekdocs with no arguments to see a handy cheat sheet of all search modes, common options, and cleanup commands — right above your command prompt. Type peekdocs -h for the full reference with all flags, file types, and regex patterns.

Then navigate to your documents and search:

cd /path/to/your/documents
peekdocs budget                      # search for "budget"
peekdocs budget revenue              # OR search (any term)
peekdocs -a budget revenue           # AND search (both terms)
peekdocs -r budget                   # include subfolders
peekdocs -t pdf,docx budget          # only PDFs and Word docs
peekdocs -x "\d{3}-\d{2}-\d{4}"     # regex (9-digit ID with dashes)
peekdocs -e "(budget OR revenue) AND NOT draft"   # Boolean expression
peekdocs -R amount:1000..5000 budget # range query
peekdocs -R date:2024-01-01..2024-12-31 invoice  # date range (also accepts 01/01/2024 format)
peekdocs -P 3 budget acme            # line proximity (terms within 3 lines)
peekdocs --open docx budget          # search and auto-open the .docx report
peekdocs --open html budget          # auto-generate HTML and open in your browser
peekdocs --open csv budget           # auto-generate CSV and open in Excel/LibreOffice
peekdocs --open pdf budget           # auto-generate PDF and open in a PDF viewer
peekdocs --open json budget          # auto-generate JSON and open in a text editor
peekdocs -sa archive --open docx budget  # append to accumulated report and open it
peekdocs -sa archive --open html budget  # append and open accumulated report in browser
peekdocs --clear                    # delete peekdocs_*_results* files in current directory
peekdocs --clear-all                # delete all peekdocs output files (results, saved reports, index)
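Line proximity (-P) keeps a hit only when the terms fall within N lines of each other. The sketch below models that idea for two terms on an already-extracted list of lines. It assumes case-insensitive substring matching and reads "within 3 lines" as a line-number distance of at most 3; both are assumptions about peekdocs's exact semantics, and within_lines is a hypothetical name.

```python
def within_lines(lines: list[str], term_a: str, term_b: str, n: int) -> list[tuple[int, int]]:
    """Return 1-based (line_a, line_b) pairs where the terms are at most n lines apart."""
    a, b = term_a.lower(), term_b.lower()
    a_lines = [i for i, text in enumerate(lines, start=1) if a in text.lower()]
    b_lines = [i for i, text in enumerate(lines, start=1) if b in text.lower()]
    return [(i, j) for i in a_lines for j in b_lines if abs(i - j) <= n]


if __name__ == "__main__":
    doc = ["Budget 2024", "", "", "ACME Corp", "", "", "", "", "acme again"]
    print(within_lines(doc, "budget", "acme", 3))  # [(1, 4)]
```

The second "acme" on line 9 is 8 lines from "Budget", so it falls outside a window of 3.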

No matches? An empty first search is common. Try -r to include subfolders or -z for typo tolerance; drop -W if you had whole-word matching on (it excludes partial matches such as "logger" when searching for "log"); or open one of the files to check that your terms actually appear in it. Run peekdocs --list-files to confirm peekdocs sees the files you expect.

Why doesn't the OR match count add up? OR mode counts each matching line ONCE, even when more than one of your terms appears on it. So if bowling alone finds 342 matches and tunick alone finds 23, an OR search for bowling tunick will return fewer than 365 whenever some lines mention both words. For example, if the OR total is 350, that means 15 lines contain both terms — inclusion-exclusion: |A ∪ B| = |A| + |B| − |A ∩ B|. To list those overlap lines, re-run with -a (AND mode) — it returns exactly the intersection. The same explanation lives inside the GUI under Advanced Search Options → ? help → Match counting in OR mode.
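The arithmetic can be checked on any small set of lines. This Python sketch assumes case-insensitive substring matching on whole lines (a simplification of peekdocs's matching). It counts per-term lines and OR lines, then recovers the overlap by inclusion-exclusion. The function names are illustrative only.

```python
def line_counts(lines: list[str], term_a: str, term_b: str) -> tuple[int, int, int]:
    """Return (lines with A, lines with B, lines with A or B), each line counted once."""
    a, b = term_a.lower(), term_b.lower()
    has_a = [a in line.lower() for line in lines]
    has_b = [b in line.lower() for line in lines]
    return sum(has_a), sum(has_b), sum(x or y for x, y in zip(has_a, has_b))


def overlap(count_a: int, count_b: int, or_total: int) -> int:
    """Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|."""
    return count_a + count_b - or_total


if __name__ == "__main__":
    print(overlap(342, 23, 350))  # 15, the figure from the bowling / tunick example
    lines = ["bowling night", "Tunick and bowling", "tunick only", "nothing here"]
    a, b, either = line_counts(lines, "bowling", "tunick")
    print(a, b, either, overlap(a, b, either))  # 2 2 3 1
```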

If you used the source install, you'll see (venv) before each command in your terminal; that's normal and means the virtual environment is active.

Results are saved to peekdocs_standard_results.txt in the current directory — the same folder your terminal is in when you run the search. The .txt report is always written and cannot be disabled because the GUI's Results Preview pane and the Matched Files popup both parse it; the matplotlib match-heatmap and other downstream views all read from it too. Every other format is opt-in: peekdocs_standard_results.docx (the highlighted Word report) is produced when DOCX is checked under Advanced Search Options → Output formats in the GUI, or when -o docx is passed on the CLI. CSV / JSON / PDF / HTML work the same way — opt in via the GUI checkbox or -o csv,json,pdf,html. A typical CLI invocation that produces TXT + DOCX is peekdocs -o docx <terms>; to also write HTML, peekdocs -o docx,html <terms>.

All result files are overwritten each time you run a new search. To keep previous results, use -s my_report to save a named copy (saved as peekdocs_report_my_report.txt/.docx so peekdocs never searches its own reports), or --timestamp to add a date/time stamp to each filename so nothing is ever overwritten.

The .docx report opens in whatever app you've set as your OS default for .docx files — Microsoft Word or LibreOffice (free) are common choices. The .txt report works on any computer with no extra software.

To clean up output files: peekdocs --clear (deletes results files) or peekdocs --clear-all (deletes results, saved reports, error log, and index). Neither touches your saved searches or settings.

Run peekdocs -h for the full flag reference with examples. The complete flag list with detailed descriptions is in the User Guide. All flags can be combined freely except: regex (-x), fuzzy (-z), and wildcard (-w) are mutually exclusive (pick one); and expression mode (-e) cannot be combined with AND (-a), exclude (-n), or proximity (-p) since those are built into the expression syntax.

Python API

from peekdocs import search

if __name__ == "__main__":  # required: see the note below
    result = search(["budget", "revenue"], directory="/path/to/docs")

    print(f"Found {len(result.matches)} matches in {len(result.files_searched)} files")
    for match in result.matches:
        # Each match records its file, its line number, and the matched text
        print(f"  {match.filename}:{match.line_num}: {match.text}")

The if __name__ == "__main__": guard is required — peekdocs uses multiprocessing internally, and on macOS and Windows child processes re-import the calling script. Without the guard, the script will crash with RuntimeError on those platforms. See the API Reference for all parameters and options.
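Because each match carries a filename and a line number, results are easy to reshape. As a minimal sketch that relies only on the filename and line_num attributes used in the example above (lines_by_file is our own helper, not part of the peekdocs API), this groups matches per file:

```python
from collections import defaultdict


def lines_by_file(matches) -> dict[str, list[int]]:
    """Map each filename to its sorted, de-duplicated matching line numbers."""
    grouped: dict[str, set[int]] = defaultdict(set)
    for match in matches:
        grouped[match.filename].add(match.line_num)
    return {name: sorted(nums) for name, nums in grouped.items()}
```

Call it inside the same guarded block, for example lines_by_file(result.matches), to get one entry per matched file.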


Stuck? Run peekdocs --check first — or, if you're using the GUI, open Tools → System Check for the same diagnostic in a window. Either way verifies Python, dependencies, Tesseract, SQLite, and free disk space and tells you what's missing. If the check looks clean but you're still hitting issues, see FAQ & Troubleshooting for common questions and fixes across Windows, macOS, and Linux.

Documentation

Document Description
User Guide Complete reference — GUI, CLI flags, search modes, indexing, file reference
Walkthroughs Seven annotated screenshot tours — same search across three interfaces, Advanced Search Options, Regex Search, Search Suites, Diff Snapshots, Schedule Search, and peekdocs --check
Installation Per-platform Python prerequisites, optional tools (Tesseract, UnRAR, libpff-python), CLI-on-Windows footnotes, and less-common install paths
API Reference Python library API — search() function, parameters, return values
Glossary 85 peekdocs terms: FTS5, regex modes, deterministic, exit codes, Tesseract, jq, SIEM, MSP, network calls, and more
FAQ & Troubleshooting Common questions and solutions for Windows, macOS, and Linux
Security architecture Deep dive for IT and Security teams — data architecture, per-file sensitivity notes, and limitations outside the application's control
Reporting security issues Vulnerability-reporting policy — preferred channel, supported versions, scope, expected response timing
Changelog Version history and release notes
Contributing How to report bugs, suggest features, and submit code

Why peekdocs?

Every search tool — grep, OS file search, cloud AI assistants, enterprise search software — matches text at its core. The differences are in what each one can read, how it presents results, what stays private, and what you can do with the output.

If all you need is to find a word in a plain text file, many search tools work well. If you want to see inside your own files — across 100+ file formats, with context, in a report you can share, without uploading anything — that's what peekdocs was built for.

Why Is peekdocs a Search and Analysis Tool?

peekdocs is a search tool because it helps you find information across PDFs, Office documents, email archives, source code, scanned documents, and 100+ other file types. It is also an analysis tool because it helps you characterize document collections, not just search them. Features such as Duplicate Finder, File Inventory, Large Files, Recent Changes, Protected Files, Diff Snapshots, Bookmarks, and Search History reveal patterns, changes, and characteristics within your files. peekdocs does not interpret results, assign risk scores, or make decisions for you; instead, it gathers and organizes information so you can analyze it yourself. In that sense, peekdocs goes beyond answering "Where is this?" and also helps answer "What do I have?", "What changed?", "What is duplicated?", and "What is taking up the most space?"

Compared with built-in OS search (Windows Search, macOS Spotlight, Linux file managers). OS search is convenient for everyday file discovery. peekdocs is purpose-built for document-search workflows across mixed-format collections, including .pst, .msg, .7z, .rar, .odt, .eml, .mbox, Jupyter notebooks, and scanned PDFs. Results show where each match occurs (filename, line number, surrounding context). Searches can run in Boolean, fuzzy, regex, proximity, or range mode, and you can save them by name, group them into suites, and produce highlighted .docx, .pdf, and .html reports to save or share. You build and refresh the index on demand, and the same searches work across the GUI, CLI, and Python API.

Compared with cloud AI document tools. Cloud AI tools excel at summarization, question answering, semantic search, and extracting meaning from large document collections, and they are often the right choice for those tasks. peekdocs serves a different purpose: it runs entirely on your computer. For keyword, pattern, date, amount, regex, fuzzy, and proximity searches across mixed-format folders, peekdocs delivers deterministic and repeatable results while keeping your documents local.

peekdocs processes the whole folder in one local pass with no upload step, using the same engine whether the folder holds dozens of files or many thousands. It reads 100+ file types natively, including archives (.zip, .7z, .rar) and email containers (Outlook .pst and .msg, plus .mbox) opened in place, and it OCRs scanned PDFs and images when you enable the -O flag. Because nothing is uploaded, there are no upload limits or connection speeds to plan around; folder size and file formats affect only how long the local search takes (see Performance).

peekdocs's JSON output is also the deterministic keyword-retrieval half of a fully-local privacy-preserving LLM workflow. Use peekdocs to narrow a 10,000-file corpus to the 30 files containing the exact terms, dates, or regex patterns you care about, then feed those 30 (not the 10,000) to a local model — Llama 3, Mistral, Gemma, or whatever you run via Ollama, llama.cpp, or LM Studio — for summarization or Q&A. peekdocs doesn't produce embeddings; it returns precise file paths, line numbers, and optional SHA-256 content fingerprints — the structured inputs a local LLM needs to ground its citations against your actual source files. Nothing leaves your machine.
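The narrowing step can be scripted. The sketch below assumes the JSON export is a list of match records with "file", "line", and "text" keys. Those field names and the top-level array shape are hypothetical, so check a real peekdocs JSON file before relying on them. The script ranks files by match count and formats the surviving matches with [path:line] citations for a local model's prompt.

```python
import json
from collections import Counter
from pathlib import Path


def load_records(json_path: str) -> list[dict]:
    """Load match records; assumes (hypothetically) a top-level JSON array."""
    return json.loads(Path(json_path).read_text(encoding="utf-8"))


def select_files(records: list[dict], limit: int = 30) -> list[str]:
    """Return up to `limit` file paths, most matches first."""
    counts = Counter(rec["file"] for rec in records)
    # Ties keep first-seen order, so repeated runs on the same input agree.
    return [path for path, _ in counts.most_common(limit)]


def build_context(records: list[dict], files: list[str]) -> str:
    """Tag each kept match with [path:line] so the model can cite its source."""
    keep = set(files)
    return "\n".join(
        f"[{rec['file']}:{rec['line']}] {rec['text']}"
        for rec in records
        if rec["file"] in keep
    )
```

The string from `build_context` is what you would paste into, or send to, a locally running model. Only the selected files' matched lines reach the model, and each carries a citation back to its source file.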

Compared with grep. For plain-text search in a terminal, grep is excellent — use it. peekdocs is built for mixed-format document collections (PDF, Word, Excel, PowerPoint, email, OCR-able scans), with highlighted reports, saved searches, search suites, regex collections, indexing, a GUI, and a Python API. Both can live in your toolkit; they're designed for different jobs.

Capability grep peekdocs
Plain text files (.txt, .log, .csv) Yes Yes
PDF text extraction Requires external conversion (pdftotext) Built in
Word documents (.docx) Requires external conversion Built in
Excel spreadsheets (.xlsx) Requires external conversion Built in
PowerPoint presentations (.pptx) Requires external conversion Built in
Email files and archives (.eml, .msg, .mbox, .pst) Requires external conversion Built in
OCR (scanned PDFs and images) Requires external OCR pipeline Built in (-O)
EPUB, RTF, ODT, ODS, ODP, archives Format-specific tools required Built in
Source code (40+ extensions) Yes Yes
Highlighted .docx / .pdf / .html reports No Yes
CSV and JSON export Requires scripting Built in (-o csv,json)
Boolean expressions Requires shell composition Yes (-e "A AND (B OR C)")
Proximity search Requires custom scripting Yes (-p 5)
Fuzzy / typo-tolerant matching Requires specialized tools Yes (-z)
Range queries (amounts, dates) Requires custom scripting Yes (-R amount:1000..5000)
Saved searches and suites No Yes
Regex collections (batch pattern sets) Requires scripting Built in (--regex-collection)
Search index with on-demand refresh Requires separate indexing tool Built in (--index)
Consistent behavior across Windows, macOS, and Linux Varies (GNU vs BSD grep) Same flags on all three platforms
GUI No Yes
Python API No Yes
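The -p flag in the table enables proximity search. The sketch below shows the general idea of a word-window match: two terms count as near each other if their word positions differ by at most n. Counting in words, and the simple tokenizer, are assumptions made for illustration. This is not peekdocs' matcher; the User Guide defines what -p actually measures.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation acts as a separator."""
    return re.findall(r"\w+", text.lower())


def within_n_words(text: str, a: str, b: str, n: int) -> bool:
    """True if some occurrence of `a` is at most n word positions from `b`."""
    words = tokenize(text)
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    i = j = 0
    # Both position lists are sorted, so a merge-style walk finds the
    # closest pair in O(len(pos_a) + len(pos_b)).
    while i < len(pos_a) and j < len(pos_b):
        if abs(pos_a[i] - pos_b[j]) <= n:
            return True
        if pos_a[i] < pos_b[j]:
            i += 1
        else:
            j += 1
    return False
```

A plain grep for "invoice" and "paid" would match both sentences of a line like "The invoice was paid late; the second invoice, however, was paid on time." A window check like this one distinguishes pairs that sit close together from pairs that are far apart.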

What peekdocs Is Not

In one line: peekdocs is a search utility — not a judgment engine, not a compliance certifier, not a forensic platform, not a threat-assessment tool.

peekdocs is a general-purpose local text-search application. To set honest expectations, here are the things it is not, alongside the kind of tool you would reach for instead:

  • Not a security or threat-detection product. peekdocs matches the text patterns you give it. It does not score risk, classify findings, recognize malware, or judge whether a match is good or bad — that's your call. For threat detection, reach for a dedicated security product.
  • Not a substitute for human review. peekdocs surfaces matches; it does not decide which matches matter. Treat its output as a starting point for code review, document review, or whatever judgment task brought you here.
  • Not a forensic or evidence-collection system. The optional SHA-256 with --hash is a content fingerprint for snapshot comparison, not notarized, tamper-evident, or court-admissible evidence handling. For chain-of-custody workflows, reach for a dedicated forensic suite.
  • Not an AI or summarization tool. peekdocs does not infer, summarize, paraphrase, answer questions, or reason about what your documents say. It finds matches; that's it. For summarization or question-answering, use an LLM-based system.
  • Not a file manager or backup tool. peekdocs reads your files; it never moves, modifies, renames, syncs, archives, or version-controls them. It writes its own report and state files — every one named with the peekdocs_ prefix (visible outputs) or .peekdocs prefix (hidden user-state / per-folder dotfiles), with no exceptions — and nothing else.
  • Not networked. peekdocs operates only on files mounted as local paths. It does not crawl websites, hit APIs, read SharePoint or Confluence over a network, or talk to a remote search index. A mapped network drive that appears as a regular folder works; everything else does not.
  • Not a search-index server or enterprise document platform. peekdocs runs as a single-user CLI / GUI / library on one machine. It does not host a shared indexable corpus for many users, manage permissions or roles, version content, or expose an HTTP API for other systems to query. For multi-user document management, reach for Elasticsearch / OpenSearch / Solr (search servers) or SharePoint / M-Files / Documentum / Box (enterprise document platforms).
  • Not a high-assurance or safety-critical tool. peekdocs is offered under the MIT License "as is" without warranty. It is not designed for environments where an incorrect or missed match could cause significant harm. Users remain solely responsible for how they use and interpret its output.

For what peekdocs is, see Feature Highlights and the User Guide.

Performance

Test machine: MacBook Pro, Apple M-series, 24 GB RAM, SSD, Python 3.13. peekdocs used 7 of 14 cores (its default is half; adjustable in Advanced Search Options). Your results will vary depending on CPU, RAM, disk type (SSD vs hard drive), and whether files are local or on a network drive.

Mixed-format test (realistic documents)

The file mix represents a typical home or small business folder:

File type % of files Examples
PDF 35% Bank statements, receipts, tax forms, manuals
Word (.docx) 25% Letters, resumes, reports, contracts
Plain text (.txt, .csv, .log) 15% Notes, data exports, logs
Excel (.xlsx) 10% Budgets, lists, financial records
Email (.eml) 8% Exported correspondence
PowerPoint (.pptx) 5% Presentations
Other (.html, .rtf) 2% Saved web pages, legacy docs

Results (files stored locally on SSD). Each test folder contained the mix of file types shown above. Individual file sizes varied (PDFs 50–500 KB, Word docs 20–200 KB, text files 1–50 KB, etc.). "Total size" is the entire folder.

Files Total folder size Search time
1,000 13 MB ~1 second (no index)
10,000 133 MB ~5 seconds (no index)
50,000 663 MB ~22 seconds (no index)
105 real Word docs 1,878 MB ~4 seconds without index, 0.24 seconds with index

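10× more files doesn't mean 10× longer: in these runs, going from 1,000 to 10,000 files took about 5× as long. peekdocs processes files in parallel across multiple CPU cores, and fixed startup costs are spread across more files as the folder grows.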
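Here is a back-of-the-envelope check on the table above. Its times are rounded ("~"), and the straight-line model (a fixed cost plus a per-file cost) is an assumption for illustration, not something peekdocs reports.

```python
# (files, seconds) from the mixed-format table above; times are approximate.
RUNS = [(1_000, 1.0), (10_000, 5.0), (50_000, 22.0)]


def files_per_second(files: int, seconds: float) -> float:
    return files / seconds


def two_point_fit(p1, p2):
    """Fit seconds = fixed + per_file * files through two (files, seconds) points."""
    (n1, t1), (n2, t2) = p1, p2
    per_file = (t2 - t1) / (n2 - n1)
    fixed = t1 - per_file * n1
    return fixed, per_file


for files, seconds in RUNS:
    print(f"{files:>6,} files: {files_per_second(files, seconds):>6,.0f} files/s")

fixed, per_file = two_point_fit(RUNS[1], RUNS[2])
print(f"fixed ~{fixed:.2f} s, marginal ~{per_file * 1000:.3f} ms/file")
print(f"model at 1,000 files: ~{fixed + per_file * 1_000:.2f} s")
```

With these numbers, throughput rises from about 1,000 to about 2,270 files per second. The two-point fit suggests roughly 0.75 s of fixed overhead plus about 0.43 ms per file. That model predicts about 1.2 s for 1,000 files, which is close to the measured ~1 s.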

Plain-text stress test

We also tested with small .txt files (~113 bytes each) to see how peekdocs handles extreme file counts:

Files Search time
10,000 1.4 seconds
50,000 4.1 seconds
1,000,000 90 seconds

What does testing 1,000,000 files prove? These were tiny text files (~113 bytes each), not real documents, and real collections rarely look like that. The test confirms that peekdocs doesn't crash, doesn't run out of memory, and produces correct results at extreme scale. It's a stress test of the software's stability, not a realistic performance benchmark; the mixed-format results above reflect real-world performance.

Should you build an index?

Direct search is fast enough for most folders — just click Run Standard Search. An index helps when you have large files or search the same folder repeatedly:

Situation Index helps? Why
Large files (PDFs, Word, Excel) Yes Skips expensive parsing — about 18× faster on the 105-Word-doc test in the Performance section
Same folder searched repeatedly Yes Pre-pays parsing cost once
Files on a network drive Yes Reads local index instead of files over the network
Small files, small folder No Direct search is already fast enough
One-time search you won't repeat No Build time won't be recouped

To try it: open Tools → Indexes, click Build Index(es), or run peekdocs --index.
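The project's glossary lists FTS5, and the index is a SQLite file (.peekdocs.db). The sketch below shows the general idea with Python's built-in sqlite3 module: extracted text is stored once, and later queries hit the full-text index instead of re-parsing files. The table name and schema are hypothetical, not peekdocs' actual schema. Refreshing changed files is omitted, and the code needs an SQLite build with FTS5 enabled, which most Python distributions include.

```python
import sqlite3


def build_index(db_path: str, docs: dict[str, str]) -> sqlite3.Connection:
    """Store already-extracted text once so later searches skip file parsing."""
    conn = sqlite3.connect(db_path)
    # FTS5 maintains an inverted index (token -> rows). UNINDEXED keeps the
    # path column stored but excludes it from full-text matching.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS docs USING fts5(path UNINDEXED, body)"
    )
    conn.executemany("INSERT INTO docs(path, body) VALUES (?, ?)", docs.items())
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, query: str) -> list[str]:
    """Run an FTS5 query (supports AND, OR, NOT, "phrases") on stored text."""
    rows = conn.execute(
        "SELECT path FROM docs WHERE docs MATCH ? ORDER BY path", (query,)
    )
    return [path for (path,) in rows]
```

In a real index, the `docs` text would come from the PDF, Word, and Excel parsers. That parsing cost is what the index pays once instead of on every search.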

First-run timing and the banner notice

The first time peekdocs searches a folder, it builds the search index by reading every file once. This can take from a few seconds (small folders) to a few minutes (thousands of files, large PDFs, or scanned documents). Later searches use the index and are much faster; in the 105-Word-doc test above, an indexed search took 0.24 seconds.

To make this expectation clear up front, peekdocs prints a short notice in the CLI banner when the search folder has no index yet:

Note: no search index for this folder yet — the first search builds
  one (may take longer); subsequent searches are much faster.
  Use --no-index to skip indexing entirely.

The notice appears only when it's relevant, so quiet modes and machine-readable output stay clean:

Scenario Notice shown?
Cold folder (no .peekdocs.db) — interactive search ✓ shown
Warm folder (index exists) — not shown
--no-index flag passed — not shown
Non-search command (--check, --runs, --diff, --list-files, --clear*, --index*, --config) — not shown
Quiet mode (-q or -qq) — banner suppressed entirely — not shown
--stdout JSON output mode — JSON pipeline stays clean — not shown
--runs --json / --diff --json — machine-parsed output stays clean — not shown

Folder detection honors -d/--directory, so running peekdocs -d /some/other/folder TODO checks for an index in that folder, not in the current directory.
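The rules in the table reduce to a short predicate. The sketch below is a hypothetical restatement for readers scripting around peekdocs; the function and parameter names are illustrative and are not peekdocs internals.

```python
from pathlib import Path

INDEX_NAME = ".peekdocs.db"  # per-folder index file named in the table above


def should_show_cold_index_notice(
    directory: str,
    *,
    is_search: bool,
    no_index: bool = False,
    quiet: bool = False,
    stdout_json: bool = False,
) -> bool:
    """Mirror the table: show the notice only for a cold, interactive search."""
    # Non-search commands, --no-index, -q/-qq, and --stdout JSON all suppress it.
    if not is_search or no_index or quiet or stdout_json:
        return False
    # Check the folder given with -d, not the current working directory.
    return not (Path(directory) / INDEX_NAME).exists()
```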

If you'd rather avoid indexing entirely, add --no-index to your CLI command or uncheck Use Index in the GUI. Searches then read files directly each time, which is fine for one-off searches but slower for repeated searches in the same folder. See the "Why is my first search slow but later searches are fast?" FAQ entry for additional notes, including the 2>/dev/null idiom for fully silent automation.

Cold-cache first search even with the index already built. Once the index exists, the first search in a fresh terminal session is still slower than the next (typically a few seconds vs. half a second), and no rebuild is involved. Three things cause this: the OS filesystem cache is cold for the .peekdocs.db file (often hundreds of MB), each fresh invocation pays Python interpreter startup, and the refresh_index os.stat() pass hits disk on its first walk. After the first search in a session, peekdocs is typically sub-second. The same FAQ entry covers this in more detail, along with a way to pre-warm the cache via a scheduled job.

Network folders: If your files are on a network drive, searches will be slower because every file must be read over the network. Building an index is strongly recommended — the first build is slow, but all subsequent searches query the local index instead.

Why Python? Python has mature, well-established libraries for every file format peekdocs supports: PyMuPDF for PDFs, python-docx for Word, openpyxl for Excel, python-pptx for PowerPoint, and dozens more. In C++ or Rust, equivalent libraries are missing for some formats and, where they exist, would take substantial work to integrate. Python also runs on Windows, macOS, and Linux without recompilation, installs with a single pip command (no compiling from source), and produces readable open-source code that anyone can inspect or extend. The Python API means any Python programmer can call peekdocs directly from their own scripts. As for speed, the performance-critical work (PDF decoding, ZIP decompression, regex matching) is handled by C-backed libraries under the hood: Python orchestrates, and C does the heavy lifting. peekdocs parallelizes with multiprocessing (separate OS processes, not threads). Each process has its own interpreter and its own GIL (Global Interpreter Lock, which lets only one thread run Python bytecode at a time), so the GIL does not limit parallel searching.

Platform Notes

Tested on: macOS (development machine), Windows 10/11, and Linux Mint 22.3 (Cinnamon) in a VirtualBox VM on Windows. The CLI and GUI work on all three platforms.

  • High-DPI displays (4K monitors) — if buttons overlap or text looks too large, use the Text Size dropdown on the bottom-right toolbar to adjust. Normal is recommended for most screens
  • Antivirus software (Windows) — some antivirus programs flag Python scripts as suspicious. If peekdocs is blocked, add your Python installation or the peekdocs folder to your antivirus allow list
  • Files locked by other programs (Windows) — Windows locks files that are open in another program. If peekdocs reports "permission denied" on a file, close the program that has it open and search again. Errors are logged to peekdocs_errors.log
  • Corporate firewalls — if pip or pipx can't download packages, use the Standalone Download (no Python, no network needed beyond the initial download) or the ZIP-based pipx install documented in docs/INSTALLATION.md
  • macOS file picker vs Windows — on macOS, the file picker includes a preview panel; on Windows, it does not — this is an OS difference, not peekdocs
  • Linux GUI requires python3-tk — the CLI works without it, but peekdocs-gui needs tkinter. Install with sudo apt install python3-tk (see Prerequisites)

File Handling

peekdocs handles a wide range of real-world file issues automatically on all platforms:

Issue Windows macOS Linux What happens
Word/Excel lock files (~$) Yes Yes Rare Silently skipped
System files (Thumbs.db, .DS_Store) Yes Yes Silently skipped
Temp files (~) Yes Yes Yes Silently skipped
Symlinks Rare Yes Yes Silently skipped
Password-protected archives Yes Yes Yes Reported with clear message
Cloud-only placeholders (OneDrive, iCloud) Yes Yes Rare Reported: "download the file first"
Path length limit (260 chars) Yes Files in archives silently skipped
Raw .gz files (not tar) Yes Yes Yes Decompressed and searched
SSL .key files Yes Yes Yes Detected as non-Keynote, skipped
BOM in text files Common Rare Rare Stripped automatically
macOS resource forks (._) Yes Silently skipped
Named pipes / sockets Possible Yes Detected via stat(), skipped
Virtual filesystems (/proc, /sys) Yes Excluded from recursive search
Corrupted files Yes Yes Yes Logged to error log, search continues
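Several rows of the table come down to checks on a file's name or its lstat() mode. The sketch below illustrates them with the standard library. The function names are hypothetical, and the rules are a simplified reading of the table, not peekdocs' code. It treats names ending in "~" as temp files, which is an assumption. The read helper handles only raw .gz streams and a UTF-8 BOM; .tar.gz files would go to an archive reader instead.

```python
import gzip
import stat
from pathlib import Path
from typing import Optional

SYSTEM_FILES = {"thumbs.db", ".ds_store"}


def skip_reason(path: Path) -> Optional[str]:
    """Return why a file would be skipped, or None if it should be searched."""
    name = path.name
    if name.startswith("~$"):
        return "Office lock file"
    if name.lower() in SYSTEM_FILES:
        return "system file"
    if name.startswith("._"):
        return "macOS resource fork"
    if name.endswith("~"):  # assumption: editor-style backup/temp names
        return "temp file"
    if path.is_symlink():
        return "symlink"
    # lstat() inspects the entry itself without opening it, which matters for
    # FIFOs: opening one for reading can block forever.
    mode = path.lstat().st_mode
    if stat.S_ISFIFO(mode) or stat.S_ISSOCK(mode):
        return "named pipe or socket"
    return None


def read_text(path: Path) -> str:
    """Decompress raw .gz files and strip a leading UTF-8 BOM if present."""
    data = path.read_bytes()
    if path.suffix == ".gz":
        data = gzip.decompress(data)
    # "utf-8-sig" drops a leading BOM and behaves like plain UTF-8 otherwise.
    return data.decode("utf-8-sig", errors="replace")
```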

See File-handling details by platform in the User Guide for the reasoning behind each row and platform-specific behavior. For installation and runtime gotchas, see TROUBLESHOOTING.md.

Preparing Your Documents for Searching

Most digital files (PDFs from banks, Word docs, emails, spreadsheets) are already searchable — just point peekdocs at the folder and search. No preparation needed.

For paper documents (tax returns, receipts, old letters), you'll need to scan them first:

  1. Scan at 300 DPI — this is the sweet spot for text recognition. Lower resolutions produce poor OCR results. Most scanners default to 300 DPI.
  2. Save as searchable PDF — modern scanners with built-in OCR (like the Fujitsu ScanSnap) automatically embed a text layer in the PDF. peekdocs reads these directly — no OCR flag needed.
  3. If your scanner doesn't have OCR — save as PDF, JPG, or PNG. peekdocs can still search these using its OCR feature (enable the OCR checkbox in the GUI or use the -O flag in the CLI). Requires Tesseract to be installed.
  4. Already have image-only PDFs? If you have a backlog of scans without a text layer, ocrmypdf (free, open-source, runs locally) adds a text layer in place. Install with brew install ocrmypdf (macOS), pipx install ocrmypdf (Windows; ocrmypdf also needs Tesseract and Ghostscript installed), or sudo apt install ocrmypdf (Linux), then run ocrmypdf input.pdf input.pdf (same path twice = convert in place). Batch a folder with for f in *.pdf; do ocrmypdf --skip-text "$f" "$f"; done. The --skip-text flag leaves already-searchable PDFs alone, so it's safe to re-run. Once converted, peekdocs finds them without the -O flag. peekdocs itself never modifies your PDFs; ocrmypdf is a separate tool you opt into for permanent conversion.
  5. Organize by topic, not by date — folders like Tax Returns, Insurance, Receipts make it easier to target searches. But peekdocs also works fine with one big folder and recursive search.
  6. Phone camera works too — take a photo of a document and save it as JPG or PNG. peekdocs can OCR it. For best results, photograph in good lighting with the document flat and square in the frame.

Consider going paperless. Scanned PDFs are widely accepted for tax and financial records — the IRS has accepted digital records since 1997, and banks, brokerages, and the IRS itself deliver documents as PDFs. Scan your paper receipts and tax returns, then organize them into folders. Once digitized, peekdocs can search years of documents in seconds — no more digging through shoeboxes. (Consult your tax advisor for your specific situation.)

Tip: Before selling or donating a computer, search your entire documents folder for sensitive data — passwords, account numbers, and personal information you may have forgotten about.

Questions and troubleshooting

Common questions, installation gotchas, and platform-specific issues are collected in docs/TROUBLESHOOTING.md — ~90 entries covering search behavior, indexes, OCR, scheduling, email archives, network drives, uninstall steps, PDF report caveats, and more.

Quick diagnostic: run peekdocs --check (CLI) or open Tools → System Check (GUI). Both report your Python version, dependency status, Tesseract availability, SQLite version, and free disk space — most install-time issues resolve there.

Found a bug or have a feature idea? Open an issue on GitHub.

Glossary

The full glossary of peekdocs terms (FTS5, regex modes, deterministic, exit codes, Tesseract, jq, SIEM, MSP technician, and 85 entries in all — including a list of common Python networking libraries peekdocs deliberately does not use) lives in docs/GLOSSARY.md.

For IT and Security Teams

If you're evaluating peekdocs for your organization, here are the answers to the questions your security team will ask:

Question Answer
Does it send data anywhere? No. peekdocs has no network calls, no telemetry, no tracking, no analytics, no phone-home. It never connects to the internet. All processing happens locally on the user's machine.
Does it store what it finds? Yes — results are written to disk as a .txt report (always written, used internally by the GUI preview pane and chart views). Optional formats — DOCX, CSV, JSON, PDF, HTML — are opt-in via the Advanced Search Options checkboxes or -o docx,csv,json,pdf,html on the CLI. These files contain matched text from your documents. Use Delete on Close to automatically remove them when you close the app, or Wipe Session (Tools → Clear Files) to remove them immediately. Cloud-output guard: before every write, peekdocs checks whether the output directory is inside a cloud-synced folder (iCloud Drive, OneDrive, Google Drive, Dropbox). If it is, the GUI shows a modal (redirect to ~/peekdocs_reports / write anyway / cancel); the CLI aborts with exit 2 unless --allow-cloud-output is passed. Turn on the sticky Redirect cloud-synced output paths to ~/peekdocs_reports checkbox in Advanced Search Options (or set redirect_cloud_output=true in ~/.peekdocsrc) to silently redirect without prompting — recommended for auditors and consultants who want the no-cloud confidentiality check always applied.
What about the search index? The optional search index (.peekdocs.db) is a SQLite database that contains the extracted text of every indexed file — this means it holds a searchable copy of your document content, including any sensitive data in those documents. Treat the index file with the same care as the documents themselves. The index is never required (uncheck "Index" to search files directly), and Wipe Session (Tools → Clear Files) deletes the index along with all result files, preview content, and search history. If you index a folder containing sensitive documents, consider deleting the index when you're done.
Can it access files the user can't? No. peekdocs runs with the user's own file permissions. It cannot read files the user doesn't already have access to. It does not elevate privileges or bypass OS security.
What kind of tool is it? A general-purpose local text search application. It reads documents you point it at, reports what it found, and writes nothing else. See Disclaimer.
What does it install? Python packages only — no system services, no drivers, no registry entries, no background processes. It runs when launched and stops when closed.
Can it modify or delete user files? No. peekdocs only reads user files. It creates its own report and state files — every one named with the peekdocs_ prefix (visible outputs) or .peekdocs prefix (hidden user-state / per-folder dotfiles), with no exceptions — but never modifies, moves, or deletes any user documents.
Is the source code available? Yes. Fully open-source under the MIT License. Available for audit at github.com/exbuf/peekdocs.
How is it installed? Via pipx from the public GitHub source (pipx install git+https://github.com/exbuf/peekdocs.git; upgrade with pipx upgrade peekdocs) — fully auditable, no unsigned executables required. (PyPI upload is planned.)
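The cloud-output guard described in the table can be approximated with a path check. The sketch below is a heuristic illustration, not peekdocs' detector. The folder-name markers are assumptions, and sync clients can be configured to other locations. It also assumes that the sticky redirect setting takes precedence over --allow-cloud-output. The helper names are hypothetical.

```python
from pathlib import Path, PurePath
from typing import Optional

# Assumed markers: top-level sync folders, plus macOS locations for iCloud
# Drive (Library/Mobile Documents) and File Provider apps (Library/CloudStorage).
SINGLE_PREFIXES = ("OneDrive", "Google Drive", "Dropbox", "iCloudDrive")
PAIRS = {("Library", "Mobile Documents"), ("Library", "CloudStorage")}


def is_cloud_synced(path: PurePath) -> bool:
    """Heuristic: does any path component look like a cloud-sync folder?"""
    parts = path.parts
    if any(part.startswith(SINGLE_PREFIXES) for part in parts):
        return True
    return any(pair in PAIRS for pair in zip(parts, parts[1:]))


def choose_output_dir(
    requested: PurePath,
    *,
    redirect: bool = False,
    allow_cloud: bool = False,
    home: Optional[PurePath] = None,
) -> PurePath:
    """Redirect, allow, or refuse a cloud-synced output directory."""
    if not is_cloud_synced(requested):
        return requested
    if redirect:  # assumed precedence: the sticky redirect setting wins
        return (home if home is not None else Path.home()) / "peekdocs_reports"
    if allow_cloud:
        return requested
    raise SystemExit(2)  # mirrors the CLI's documented exit code 2
```

The `home` parameter exists so the logic can be exercised with fixed paths. In normal use it falls back to the current user's home directory.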

For the deep dive — every file peekdocs writes (path, contents, sensitivity rating, cleanup), plus a documented list of risks that are outside the application's control (process arguments, swap space, force-kill, backup software, etc.) — see docs/SECURITY.md. To report a suspected vulnerability, see SECURITY.md at the repository root.

Testing

Unit tests — 648 pytest tests that verify correctness: exact match counts, error messages, edge cases, argument validation, regex patterns, expression parsing, range queries, and more.

pytest tests/ -v

Integration test — end-to-end runs of every search mode and flag combination. Verifies that flag combinations run without crashing, all output formats are generated, file type coverage across 100+ sample files is reported, and match counts are confirmed stable. Results are saved to peekdocs_global_test_results.txt. The bash script is run on macOS and Linux, the PowerShell script on Windows, before each release. See the script headers for details.

cd samples/test-files
bash peekdocs_global_test_unix.sh "test file for peekdocs"    # macOS / Linux
# Windows: powershell -ExecutionPolicy Bypass -File peekdocs_global_test_windows.ps1 "test file for peekdocs"

Contributing

Ideas, bug reports, and pull requests are welcome. See CONTRIBUTING.md for details. PRs require a Developer Certificate of Origin sign-off — one flag on git commit -s; full how-to in CONTRIBUTING.md.

If peekdocs saves you time, star the repo and share feedback — it helps others discover the tool.

Author

Built by Robert D. Schoening — electrical engineer, U.S. software patent holder, and independent developer. Developed with assistance from Claude Code by Anthropic. All architecture, review, testing, and maintenance performed by the author.

Why I built it. I built peekdocs to solve a problem I had myself: searching large collections of mixed-format documents locally, privately, and efficiently. It also became an opportunity to learn AI-assisted software development and explore what a single developer can build with today's tools. After relying on it in my own workflow, I decided to share it as free and open-source software under the MIT License.

Disclaimer

peekdocs is provided as a general-purpose local text-search tool under the MIT License, offered "as is" without warranty of any kind.

Regex Search performs pattern matching against text. Results depend entirely on the patterns the user supplies, and may include false positives or miss content that does not match those patterns. Review results in context before making decisions.

The tool is not designed or intended for high-assurance or safety-critical use cases. Users remain solely responsible for how they use and interpret its output.

License

Copyright (c) 2026 Robert D. Schoening. peekdocs's own source code is licensed under the MIT License.

Note on dependencies

peekdocs depends on a number of third-party Python libraries, each with its own license. End users running peekdocs locally are not affected by this. The AGPL and similar copyleft terms govern distribution and modification (and, for the AGPL, offering a modified version to users over a network), not local use. A user who installs peekdocs to search their own documents triggers no obligations.

peekdocs's dependency tree includes a mix of permissive (MIT / BSD / Apache 2.0 / ISC / CC0) and copyleft (LGPL / GPL / AGPL) licenses. The most significant ones to be aware of are:

  • PyMuPDF (the PDF reader) — AGPL v3 or a commercial license from Artifex Software
  • EbookLib (the EPUB reader) — AGPL v3 (no documented commercial-license alternative)
  • extract-msg (Outlook .msg email reader) — GPL
  • py7zr, fpdf2, and the optional libpff-python — LGPL (weak copyleft, generally permits proprietary use through dynamic linking)

For the full per-library license listing — including every direct dependency declared in pyproject.toml, grouped by license category, with upstream links — see THIRD_PARTY_NOTICES.md.

Developers integrating peekdocs into derivative work should be aware that the dependency chain transitively carries AGPL / GPL / LGPL obligations. Three common scenarios:

  • Your derivative work is open-source under an AGPL-compatible license. Straightforward — all licenses coexist.
  • Your derivative work is closed-source or under a permissive license that's incompatible with AGPL (MIT, BSD, Apache 2.0, etc.). You have three practical options: (a) accept that the combined work falls under AGPL terms, (b) acquire a commercial PyMuPDF license from Artifex Software for the PDF-reader piece and avoid the .epub reading code path entirely (since EbookLib has no commercial-license alternative), or (c) vendor or replace these libraries with permissively-licensed alternatives where your use case allows.
  • Internal-only company use without distribution. Generally fine. Copyleft obligations are triggered by distribution / conveyance, not by internal use.

peekdocs makes no representations about license compatibility in your downstream context — consult your own counsel for derivative-work questions.

About

Document search and analysis across 100+ file types — offline, private, OCR-enabled, with highlighted reports, regex, Boolean, fuzzy, proximity, wildcard, and search suites. Windows, macOS, Linux. GUI, CLI, and Python API. Free and open-source (MIT).
