A Rust CLI tool that recursively discovers Git repositories, captures state changes, generates diffs, extracts code elements with full snippets, and produces security-focused reports for code review and audit workflows.
- Repository Discovery: Recursively scan directories for Git repos with configurable filters
- State Tracking: Capture pre/post-pull state with commit hashes, messages, and dirty detection
- Diff Generation: Automatic N vs N-1 and historical diff creation with file manifests
- Element Extraction: Parse diffs to identify functions, structs, classes, imports, and more across 10+ languages
- Code Snippets: Extract full before/after code with boundary detection and context windows
- Security Tagging: 18 built-in security patterns (crypto, auth, secrets, SQL injection, XSS, etc.)
- Multi-Format Reports: JSON, Markdown, text, and SARIF outputs with cross-repo security overview
- Branch-Diff Mode: Diff any two refs (branches, tags, commits) in a single repo, ideal for PR reviews
- Performance: Parallel processing with progress bars, LRU caching, and incremental mode
- Installation
- Quick Start
- Usage
- Report Structure
- Configuration
- Architecture
- Testing
- Documentation
- Contributing
git clone https://github.com/Teycir/DiffCatcher.git
cd DiffCatcher
cargo build --release
./target/release/diffcatcher --help
- Rust 1.70+
- Git 2.0+
# Scan all repos in a directory (fetch-only, no modifications)
diffcatcher ~/projects
# Pull updates and generate security report
diffcatcher ~/projects --pull -o ./report
# Diff two branches in a single repo (PR review mode)
diffcatcher ./my-repo --diff main..feature/auth -o ./pr-report
# Generate SARIF output for GitHub Code Scanning
diffcatcher ~/projects --summary-format sarif,json -o ./report
# Dry run to see what would be scanned
diffcatcher ~/projects --dry-run
# Fast scan with 8 parallel workers
diffcatcher ~/projects -j 8 --quiet

# Scan with default settings (fetch-only)
diffcatcher <ROOT_DIR>
# Custom output directory
diffcatcher ~/projects -o ./my-report
# Include nested repos and follow symlinks
diffcatcher ~/projects --nested --follow-symlinks
# Skip hidden directories
diffcatcher ~/projects --skip-hidden

# Fetch only (default - no working tree changes)
diffcatcher ~/projects
# Actually pull changes
diffcatcher ~/projects --pull
# Force pull with stash/pop for dirty repos
diffcatcher ~/projects --pull --force-pull
# Use rebase strategy
diffcatcher ~/projects --pull --pull-strategy rebase
# Skip fetch/pull entirely (historical diffs only)
diffcatcher ~/projects --no-pull

# Skip element extraction (raw diffs only)
diffcatcher ~/projects --no-summary-extraction
# Extract elements but skip code snippets
diffcatcher ~/projects --no-snippets
# Adjust snippet context and limits
diffcatcher ~/projects --snippet-context 10 --max-snippet-lines 300
# Limit elements per diff
diffcatcher ~/projects --max-elements 1000

# Skip security tagging
diffcatcher ~/projects --no-security-tags
# Include test files in security analysis
diffcatcher ~/projects --include-test-security
# Use custom security patterns
diffcatcher ~/projects --security-tags-file ./custom-patterns.json

DiffCatcher can auto-load project-local configuration from:
- <ROOT_DIR>/.diffcatcher.toml (default)
- a custom file via --config <FILE>
- disabled with --no-config
Example:
output = "reports-local"
no_pull = true
history_depth = 2
summary_formats = ["json", "txt"]
no_security_tags = false
[plugins]
security_pattern_files = ["plugins/security-extra.json"]
extractor_files = ["plugins/extractors.json"]

CLI flags still override config values when explicitly set.
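The precedence rule above (explicit CLI flag, then config file, then built-in default) can be pictured as a layered merge over optional values. The following is a minimal sketch under assumptions, not DiffCatcher's actual code: `Layer`, `Resolved`, and `resolve` are hypothetical names, and the `"./reports"` fallback is a placeholder for the timestamped default directory.

```rust
/// Hypothetical settings layer: `None` means "not set at this level".
#[derive(Debug, Clone, Default, PartialEq)]
struct Layer {
    output: Option<String>,
    no_pull: Option<bool>,
    history_depth: Option<u32>,
}

/// Fully resolved settings after all layers have been merged.
#[derive(Debug, PartialEq)]
struct Resolved {
    output: String,
    no_pull: bool,
    history_depth: u32,
}

/// CLI values win over config-file values, which win over defaults.
fn resolve(cli: &Layer, config: &Layer) -> Resolved {
    Resolved {
        output: cli
            .output
            .clone()
            .or_else(|| config.output.clone())
            // Placeholder default; the real tool appends a timestamp.
            .unwrap_or_else(|| "./reports".to_string()),
        no_pull: cli.no_pull.or(config.no_pull).unwrap_or(false),
        history_depth: cli.history_depth.or(config.history_depth).unwrap_or(2),
    }
}
```

Modeling every setting as an `Option` is what makes "explicitly set" detectable: a flag left at its default stays `None` on the CLI layer and therefore does not mask the config file.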
DiffCatcher supports two plugin types:
- Security pattern plugins via --security-plugin-file <FILE> (repeatable)
- Extractor plugins via --extractor-plugin-file <FILE> (repeatable)
The security plugin format matches the --security-tags-file JSON format (fields: version, mode, tags).
Extractor plugin format:
{
"version": 1,
"extractors": [
{
"name": "policy-rule",
"kind": "Config",
"regex": "^policy\\s+([A-Za-z_][A-Za-z0-9_]*)"
}
]
}

# Diff two branches in a single repo
diffcatcher ./my-repo --diff main..feature/auth
# Diff specific commits
diffcatcher ./my-repo --diff abc123..def456
# Diff with SARIF output for CI integration
diffcatcher ./my-repo --diff origin/main..HEAD --summary-format sarif -o ./pr-report

The --diff BASE..HEAD flag skips repository discovery and fetch/pull. It directly diffs two refs (branches, tags, or commit SHAs) and runs the full extraction and security tagging pipeline on the result.
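To illustrate the BASE..HEAD syntax, here is a small sketch of how such a spec could be split into its two refs. The function name `parse_diff_spec` is hypothetical, and rejecting the three-dot form (`a...b`) is an assumption of this sketch; the tool's real parsing rules may differ.

```rust
/// Split a `BASE..HEAD` spec into `(base, head)`.
/// Returns `None` when the separator is missing, either side is empty,
/// or the three-dot form is used (an assumption of this sketch).
fn parse_diff_spec(spec: &str) -> Option<(&str, &str)> {
    let (base, head) = spec.split_once("..")?;
    if base.is_empty() || head.is_empty() || head.starts_with('.') {
        return None;
    }
    Some((base, head))
}
```

Splitting on the first `..` is safe here because Git does not allow `..` inside a ref name, so slashes and dashes in names such as `feature/auth` pass through untouched.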
# Generate SARIF alongside other formats
diffcatcher ~/projects --summary-format sarif,json,md
# SARIF-only for CI/CD upload
diffcatcher ~/projects --summary-format sarif -o ./report

When sarif is included in --summary-format, a results.sarif file is written to the report root. This file follows the SARIF 2.1.0 standard and integrates with GitHub Code Scanning, VS Code SARIF Viewer, Azure DevOps, and other SARIF-compatible tools.
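For readers unfamiliar with SARIF, the sketch below builds the smallest useful 2.1.0 document: one run, one tool driver, and a list of results with a file location each. It is an illustration only. `Finding`, `json_escape`, and `to_sarif` are hypothetical names, the fixed `"warning"` level is an assumption, and a real implementation would more likely use a JSON serialization library than hand-built strings.

```rust
/// Hypothetical finding; field names are illustrative only.
struct Finding<'a> {
    rule_id: &'a str,
    message: &'a str,
    file: &'a str,
    line: u32,
}

/// Escape the characters that must be escaped inside a JSON string.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Render findings as a minimal SARIF 2.1.0 document (single run).
fn to_sarif(tool_name: &str, findings: &[Finding]) -> String {
    let results: Vec<String> = findings
        .iter()
        .map(|f| {
            // One physical location per result: file URI plus start line.
            let location = format!(
                r#"{{"physicalLocation":{{"artifactLocation":{{"uri":"{}"}},"region":{{"startLine":{}}}}}}}"#,
                json_escape(f.file),
                f.line
            );
            format!(
                r#"{{"ruleId":"{}","level":"warning","message":{{"text":"{}"}},"locations":[{}]}}"#,
                json_escape(f.rule_id),
                json_escape(f.message),
                location
            )
        })
        .collect();
    format!(
        r#"{{"version":"2.1.0","runs":[{{"tool":{{"driver":{{"name":"{}"}}}},"results":[{}]}}]}}"#,
        json_escape(tool_name),
        results.join(",")
    )
}
```

The fields used here (`version`, `runs`, `tool.driver.name`, `results[].message.text`, `locations[].physicalLocation`) come from the SARIF standard itself; how DiffCatcher maps security tags to `ruleId` and `level` is not specified in this README.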
# Incremental mode (skip unchanged repos)
diffcatcher ~/projects --incremental -o ./report
# Filter by branch pattern
diffcatcher ~/projects --branch-filter "main"
# Adjust history depth
diffcatcher ~/projects --history-depth 5
# JSON output for CI/CD
diffcatcher ~/projects --quiet --json > result.json
# Verbose output with discovered paths
diffcatcher ~/projects --verbose

<report_dir>/
├── summary.json                  # Global summary
├── summary.md                    # Markdown summary
├── results.sarif                 # SARIF 2.1.0 output (when --summary-format sarif)
├── security_overview.json        # Cross-repo security aggregation
├── security_overview.md
├── <repo-name>/
│   ├── status.json               # Repo state
│   ├── pull_log.txt
│   └── diffs/
│       ├── diff_N_vs_N-1.patch       # Raw unified diff
│       ├── changes_N_vs_N-1.txt      # File manifest
│       ├── summary_N_vs_N-1.json     # Element extraction
│       ├── summary_N_vs_N-1.md
│       └── snippets/
│           ├── 001_validate_token_ADDED.rs
│           ├── 002_check_permissions_BEFORE.rs
│           ├── 002_check_permissions_AFTER.rs
│           └── 002_check_permissions.diff
└── ...
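The snippet file names in the tree above follow a visible pattern: a zero-padded index, the element name, an optional state suffix (ADDED, BEFORE, AFTER), and an extension. A minimal sketch of that naming scheme follows; `snippet_file_name` is a hypothetical helper inferred from the example names, not the tool's actual function.

```rust
/// Build a snippet file name such as `001_validate_token_ADDED.rs`.
/// `suffix` is `None` for files like `002_check_permissions.diff`.
fn snippet_file_name(index: usize, element: &str, suffix: Option<&str>, ext: &str) -> String {
    match suffix {
        Some(s) => format!("{:03}_{}_{}.{}", index, element, s, ext),
        None => format!("{:03}_{}.{}", index, element, ext),
    }
}
```

Sharing one index across the BEFORE, AFTER, and .diff files keeps all artifacts for a single modified element adjacent when the directory is sorted by name.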
| Flag | Default | Description |
|---|---|---|
| -o, --output | ./reports/<timestamp> | Report output directory |
| -j, --parallel | 4 | Concurrent repo processing |
| -t, --timeout | 120 | Git operation timeout (seconds) |
| -d, --history-depth | 2 | Historical commits to diff |
| --snippet-context | 5 | Context lines around changes |
| --max-snippet-lines | 200 | Max lines per snippet |
| --max-elements | 500 | Max elements per diff |
| --diff | (none) | Diff two refs in a single repo (BASE..HEAD) |
| --summary-format | json,md | Output formats: json, md, txt, sarif |

See diffcatcher --help for all options.
Create a JSON file with custom patterns:
{
"version": 1,
"mode": "extend",
"tags": [
{
"tag": "pii-handling",
"description": "PII data processing",
"severity": "High",
"patterns": ["ssn", "social_security", "passport"]
}
]
}

Use with --security-tags-file ./patterns.json
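To make the pattern file's role concrete, the sketch below tags a changed line by checking whether any pattern of a rule occurs in it. This is an assumption-laden illustration: `TagRule` and `matching_tags` are hypothetical names, and case-insensitive substring matching is a guess at the semantics. The actual tagger in src/security/tagger.rs may use regexes or other rules.

```rust
/// Hypothetical in-memory form of one entry from the "tags" array.
struct TagRule {
    tag: String,
    severity: String,
    patterns: Vec<String>,
}

/// Return the tags whose patterns occur in `line`
/// (case-insensitive substring match, an assumption of this sketch).
fn matching_tags<'a>(rules: &'a [TagRule], line: &str) -> Vec<&'a str> {
    let haystack = line.to_lowercase();
    rules
        .iter()
        .filter(|r| {
            r.patterns
                .iter()
                .any(|p| haystack.contains(&p.to_lowercase()))
        })
        .map(|r| r.tag.as_str())
        .collect()
}
```

Plain substring matching is cheap but prone to false positives (a short pattern such as "ssn" also matches unrelated identifiers that happen to contain it), which is one reason to keep custom patterns specific.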
src/
├── cli.rs                  # Argument parsing
├── scanner.rs              # Repository discovery
├── git/                    # Git operations
│   ├── commands.rs         # Git wrappers
│   ├── state.rs            # State capture
│   ├── diff.rs             # Diff generation
│   └── file_retrieval.rs
├── extraction/             # Element extraction
│   ├── parser.rs           # Unified diff parser
│   ├── elements.rs         # Element detection
│   ├── snippets.rs         # Code snippet extraction
│   ├── boundary.rs         # Bracket/indentation tracking
│   └── languages/          # Language-specific patterns
├── security/               # Security tagging
│   ├── tagger.rs           # Pattern matching
│   ├── patterns.rs         # Built-in patterns
│   └── overview.rs         # Cross-repo aggregation
└── report/                 # Report generation
    ├── writer.rs           # Directory structure
    ├── json.rs             # JSON serialization
    ├── sarif.rs            # SARIF 2.1.0 output
    ├── markdown.rs         # Markdown formatting
    └── snippet_writer.rs
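The bracket tracking attributed to boundary.rs in the layout above can be sketched as a brace-depth counter: starting at the line where an element begins, scan forward until the first opened brace is closed again. The function below is a simplified illustration with a hypothetical name (`find_block_end`); it ignores braces inside strings and comments and does not cover indentation-based languages, which the real module presumably handles.

```rust
/// Return the index of the line on which the block that opens at or after
/// `start` is closed, or `None` if the braces never balance.
fn find_block_end(lines: &[&str], start: usize) -> Option<usize> {
    let mut depth: i32 = 0;
    let mut seen_open = false;
    for (i, line) in lines.iter().enumerate().skip(start) {
        for c in line.chars() {
            match c {
                '{' => {
                    depth += 1;
                    seen_open = true;
                }
                '}' => depth -= 1,
                _ => {}
            }
        }
        // The block ends once at least one brace was opened and all are closed.
        if seen_open && depth <= 0 {
            return Some(i);
        }
    }
    None
}
```

With the start and end line known, a snippet extractor can slice out the full element and then widen the range by the configured context lines.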
# Run all tests
cargo test
# Run specific test suite
cargo test security_tagger
# Run with output
cargo test -- --nocapture

Test coverage includes:
- Unit tests for extraction, security tagging, boundary detection
- Integration tests for state capture, diff generation, reports
- Golden-file tests for extraction accuracy
- Edge case tests (detached HEAD, bare repos, single-commit)
# Compile benchmark binaries
cargo bench --no-run
# Run benchmark harness
cargo bench --bench core_bench

Benchmark source lives in benches/core_bench.rs and tracks parser/extraction throughput.
GitHub Actions workflows are included:
- .github/workflows/ci.yml: format check, clippy, tests, bench build
- .github/workflows/release.yml: tag-based release packaging and GitHub release publishing
- Plan.md - Full specification (v1.2)
- Roadmap.md - Implementation roadmap and progress
- Security patterns reference (see src/security/patterns.rs)
All modules include comprehensive inline documentation. Key modules:
- src/extraction/parser.rs - Unified diff parser with hunk extraction
- src/extraction/elements.rs - Language-aware code element detection
- src/extraction/snippets.rs - Full code snippet extraction with boundary detection
- src/security/tagger.rs - Security pattern matching engine
- src/git/commands.rs - Git operation wrappers
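As a taste of what hunk extraction involves, the sketch below parses a unified diff hunk header of the form `@@ -old_start,old_len +new_start,new_len @@`. It follows the standard unified diff format, in which an omitted length means 1. The names `HunkHeader`, `parse_range`, and `parse_hunk_header` are hypothetical and not taken from parser.rs.

```rust
/// Line ranges described by one `@@ ... @@` hunk header.
#[derive(Debug, PartialEq)]
struct HunkHeader {
    old_start: u32,
    old_len: u32,
    new_start: u32,
    new_len: u32,
}

/// Parse `start,len` or a bare `start` (length defaults to 1).
fn parse_range(s: &str) -> Option<(u32, u32)> {
    match s.split_once(',') {
        Some((start, len)) => Some((start.parse().ok()?, len.parse().ok()?)),
        None => Some((s.parse().ok()?, 1)),
    }
}

/// Parse a hunk header line; returns `None` for any other kind of line.
fn parse_hunk_header(line: &str) -> Option<HunkHeader> {
    let rest = line.strip_prefix("@@ -")?;
    // Anything after the closing `@@` is optional section context.
    let (ranges, _context) = rest.split_once(" @@")?;
    let (old, new) = ranges.split_once(" +")?;
    let (old_start, old_len) = parse_range(old)?;
    let (new_start, new_len) = parse_range(new)?;
    Some(HunkHeader { old_start, old_len, new_start, new_len })
}
```

The text after the closing `@@` usually names the enclosing function, so a parser can keep it as a cheap first hint for element detection before applying language-specific patterns.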
Generate full API docs:
cargo doc --open

#rust #git #security #code-review #diff-analysis #static-analysis #devops #cli-tool #audit #vulnerability-detection #code-quality #snippet-extraction #parallel-processing #security-scanning
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure cargo test passes
- Submit a pull request
MIT License - see LICENSE file for details
- Author: Teycir Ben Soltane
- Email: teycir@pxdmail.net
- Website: teycirbensoltane.tn