file-deduplication

dotfurther / OpenDiscoverPlatformCaseStudy

Case study using dotfurther's Open Discover Platform with the RavenDB document store to rapidly build a demonstration application capable of full-text search, eDiscovery, and information governance.

metadata text-extraction full-text full-text-search ravendb ediscovery indexing-engine file-format-detection data-breach file-deduplication pii information-governance-catalog personally-identifiable-information archive-extractor pii-detection file-identification full-text-extraction document-ingestion information-governance

Updated May 28, 2024

sph-mn / sdupes

fast, parallel duplicate file detection

fast command-line file-deduplication statically-linked duplicate-file-finder duplicate-file-detection

Updated Nov 18, 2025
C

9001 / smf

size-match folders

file-deduplication

Updated Sep 4, 2022
Python

VersBinarii / deedoo

File deduplicator

rust tool rust-lang cli-app cli-tool file-deduplication rust-tools

Updated Apr 27, 2022
Rust

koro666 / dedupe

Identical File Hardlinker

linux freebsd deduplication file-deduplication

Updated May 24, 2021
C

gitzain / clonezapper

CloneZapper is a Python script that hunts down identical files within a directory and its subdirectories, ruthlessly eliminating them!

disk-space file-management directory-utilities file-deduplication file-comparison

Updated Jun 1, 2024
Python

levitation-opensource / DuplicateFileFinder

Duplicate file finder and de-duplicator. A tool that detects duplicate files and replaces them with symlinks to a shared file in a special shared files folder.

deduplication deduplicator file-deduplication duplicate-finder duplicate-file-finder symlinker

Updated Mar 28, 2026
C#

dffdgdg / FindDuplicates

This project is a powerful tool for finding and analyzing duplicate files in a specified directory. The program efficiently identifies identical files based on their content, using the SHA-256 hashing algorithm. It supports configurable parameters, such as the minimum file size to check and ignoring certain…

python hashing productivity multithreading data-deduplication file-system sha256 file-management system-utility cli-tool dev-tools file-deduplication file-comparison disk-cleanup command-line-utility duplicate-file-finder

Updated Feb 14, 2025
Python

dwin / goDeduplicate

Golang Deduplication Utility Library

golang golang-library deduplication file-deduplication

Updated Dec 29, 2017
Go

omkarium / clonehunter

A simple command line utility that identifies groups of identical files and displays them to the console.

cli-app file-deduplication

Updated Mar 5, 2025
Rust

daedalus / duperemover

duperemover: a file deduplication tool

bloom-filter hashes deduplication file-deduplication

Updated Mar 25, 2026
Python

ngpepin / lshash

A corpus-hygiene utility for RAG data pipelines that identifies duplicate content risk, quantifies duplication with actionable statistics, and supports controlled remediation before indexing. It enables staged audit-then-cull workflows that improve retrieval quality, reduce embedding/indexing cost, and strengthen governance in knowledge curation.

bash dotnet knowledge-management data-quality file-deduplication data-curation data-governance rag document-deduplication retrieval-augmented-generation corpus-hygiene

Updated May 3, 2026
Shell

Skippia / twin-scanner-cli

Finds duplicate files across one or more folders by scanning .txt and/or .torrent files and, depending on the selected mode (readonly: true | false), reports information about the duplicated files and/or extracts them into new folders.

nodejs cli typescript functional-programming file-scanner inquirer fp-ts file-deduplication eslint-plugin-functional inquirer-fuzzy-path

Updated May 2, 2026
TypeScript

TamKungZ / ImageMergePy

A GUI tool to merge images/videos from multiple folders, remove duplicates with SHA-256, and rename files into a clean sequential format.

desktop-app python cli gui cross-platform image-processing file-management batch-processing file-deduplication nuitka image-merge media-organizer pyside6 file-deduplication-tool

Updated Apr 13, 2026
Python

BaseMax / go-smart-deduper

A high-performance file deduplication tool that detects and manages duplicate files using content hashing and intelligent similarity analysis.

go golang files file file-deduplication file-checker file-duplicator file-duplicate file-deduplication-tool file-deduplication-windows file-duplicates-cleanup-windows

Updated Dec 19, 2025
Go

MahdiKaseAtashin / cleanpulse

Cross-platform duplicate file scanner written in Go, featuring fast content-hash detection, smart filtering, safe deletion workflows, CSV/JSON reporting, and both CLI and desktop GUI interfaces.

desktop-app go cli golang cross-platform duplicate-files developer-tools sha256 file-management file-deduplication fyne duplicate-file-finder

Updated Apr 14, 2026
Go

file-deduplication

Here are 24 public repositories matching this topic...

kornelski / dupe-krill

jRimbault / yadf

thushan / smash

dotfurther / OpenDiscoverSDK

dotfurther / OpenDiscoverPlatformCaseStudy

sph-mn / sdupes

9001 / smf

VersBinarii / deedoo

koro666 / dedupe

gitzain / clonezapper

levitation-opensource / DuplicateFileFinder

dffdgdg / FindDuplicates

dwin / goDeduplicate

omkarium / clonehunter

daedalus / duperemover

ngpepin / lshash

Skippia / twin-scanner-cli

TamKungZ / ImageMergePy

BaseMax / go-smart-deduper

MahdiKaseAtashin / cleanpulse
