List of duplicate file finders

This is a list of software tools that find and clean up duplicate files in one or more directories.

Open Source

Language | *nix | Windows | OS X | CLI | GUI | Software
Python | Yes | Yes | Yes | Yes | No | ActiveState Recipe - a minimal Python command-line tool that only detects duplicates
Python | Yes | Yes | Yes | ? | ? | dedupe_copy - filters out duplicates while copying and allows automatic reordering
C | Yes | Cygwin | Homebrew or MacPorts | Yes | No | duff - a Unix command-line utility for quickly finding duplicates in a given set of files
C++ | No | Yes | No | ? | Yes | Duff - a GUI duplicate file finder and processor for Windows
C | Yes | Yes | ? | Yes | ? | dupedit - compares many files at once without checksumming. Avoids comparing files against themselves when multiple paths point to the same file.
Python | Yes | Yes | Yes | ? | Yes | dupeguru - runs on various platforms. Special versions for music and pictures are available.
C++ | Yes | Yes | No | Yes | Yes | Duplicate Files Finder - GUI application for Windows and Linux.
Perl | ? | ? | ? | ? | ? | dupious - Perl-based duplicate finder for small to large systems or multi-server setups. Formerly finddup.pl.
C | Yes | Cygwin | ? | ? | ? | dupmerge - POSIX-compliant C; runs on various platforms (Win32/64 with Cygwin, *nix, Linux, etc.)
Perl | Yes | ? | Yes | ? | ? | dupseek - written in Perl, with an algorithm optimized to reduce reads
Python | Yes | Yes | Yes | Yes | No | fastdupes - fast and small Python command-line tool to find duplicates
Perl | ? | Yes | ? | ? | ? | fdf - Perl/C-based and runs across most platforms (Win32, *nix and probably others). Uses MD5, SHA-1 and other checksum algorithms.
Perl | Yes | Yes | Yes | ? | ? | fdupe - a small Perl script that does its job quickly and efficiently.[1]
C | Yes | No | Homebrew | Yes | No | fdupes - command-line tool written in C. Compares MD5 hashes, then byte-by-byte. Can also compare hardlinks.
C | Yes | Yes | Untested | Yes | No | fdupes-jody - enhanced fork of fdupes with much higher performance. This version has also been ported to Windows.
Java | Yes | Yes | Yes | Yes | No | findrepe - free Java-based command-line tool designed for efficient searching of duplicate files; it can search within ZIP and JAR archives. (GNU/Linux, Mac OS X, *nix, Windows)
C | Yes | Cygwin | ? | ? | ? | freedup - POSIX-compliant C; runs across platforms (Windows with Cygwin, Linux, AIX, etc.)
Perl | Yes | ? | ? | ? | ? | freedups - Perl script that hardlinks duplicates to save space and caches file checksums.
Python | Yes | No | No | Yes | Yes | fslint - has a command-line interface and a GUI.
Python | Yes | ? | ? | Yes | No | hardlinkpy - a tool to hardlink together identical files in order to save space. It is a complete rewrite of, and improvement over, the original hardlink.c code (written by Jakub Jelinek <jakub@redhat.com>). It is orders of magnitude faster than hardlink.c due to a more efficient algorithm.
Python | Yes | Yes | Yes | Yes | No | liten - pure Python deduplication command-line tool and library, using MD5 checksums and a novel byte-comparison algorithm. (Linux, Mac OS X, *nix, Windows)
Python | Yes | No | Yes | Yes | No | liten2 - a rewrite of the original Liten; still a command-line tool, but with a faster interactive mode using SHA-1 checksums. (Linux, Mac OS X, *nix)
C# | No GUI | Yes | No GUI | Yes | Yes | ndupfinder - uses MD5 hashing to efficiently find duplicates. Binaries are not currently available; the user must compile it. A WPF GUI is available for Windows.
Batch | No | Yes | No | Yes | No | phdeldup - very simple, easily modifiable .BAT script that deletes duplicate files matching a specified mask, using only native Windows shell commands (comp, dir, del).
C++ | Yes | Cygwin | Yes | Yes | ? | rdfind - one of the few tools that rank duplicates by the order of the input parameters (directories to scan), so that files in "original/well-known" sources are not deleted when multiple directories are given. Uses MD5 or SHA-1.
Python | Yes | Partial | Yes | Yes | No | remdups - small Python command-line tool that writes an intermediate hash-list file and uses it to produce an option-driven shell script that removes files.
C, Perl, SH | Yes | Cygwin | No | ? | ? | repeats - C and SH, from littleutils. Compares file sizes, then partial-read hashes, then full-read hashes, then (optionally) byte-for-byte. Highly efficient. (Linux, *nix, Cygwin)
Bash | Yes | N/A | N/A | ? | ? | rmdupe - a shell script that uses Linux tools to detect and remove duplicates.
C | Yes | No | Experimental | Yes | No | rmlint - command-line tool with options to find other lint and duplicate directories. Can use incremental byte-by-byte comparison or various hashing algorithms. Heavily optimized, including use of the FIEMAP ioctl[2] on Linux.
C | Yes | Yes | Yes | Yes | No | ssdeep - identifies almost-identical files using context-triggered piecewise hashing (CTPH)
Java | Yes | Yes | Yes | ? | Yes | DFS - searches by content, size or name
C++ | Yes | ? | ? | Yes | No | ua - Unix/Linux command-line tool designed to work with find (and similar tools).
Python | Yes | Yes | Yes | Yes | Yes | pddf - a tool to find duplicate files, with fast and full scan modes.
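
The staged comparison described for repeats (file sizes, then partial-read hashes, then full-read hashes, then byte-for-byte) and for fdupes can be sketched in Python as below. This is a minimal illustration, not code taken from either tool: the 4096-byte prefix size and the use of SHA-256 in place of MD5 are assumptions, and the device/inode check is one possible way to get dupedit's behavior of not comparing a file against itself when several paths name it.

```python
import hashlib
import os
from collections import defaultdict
from typing import Callable, Iterable, List

PARTIAL_BYTES = 4096  # assumed size of the "partial read" prefix
CHUNK = 1 << 16       # read size used for full hashing and comparison


def digest(path: str, limit: int = -1) -> bytes:
    """SHA-256 of the first `limit` bytes of a file, or of the whole file if limit < 0."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        if limit >= 0:
            h.update(f.read(limit))
        else:
            for chunk in iter(lambda: f.read(CHUNK), b""):
                h.update(chunk)
    return h.digest()


def same_bytes(a: str, b: str) -> bool:
    """Byte-for-byte comparison, the final guard against hash collisions."""
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ca, cb = fa.read(CHUNK), fb.read(CHUNK)
            if ca != cb:
                return False
            if not ca:  # both files exhausted at the same point
                return True


def refine(groups: List[List[str]], key: Callable[[str], object]) -> List[List[str]]:
    """Split every candidate group by `key`; singletons cannot be duplicates."""
    out = []
    for group in groups:
        buckets = defaultdict(list)
        for path in group:
            buckets[key(path)].append(path)
        out.extend(g for g in buckets.values() if len(g) > 1)
    return out


def find_duplicates(paths: Iterable[str]) -> List[List[str]]:
    seen = set()
    by_size = defaultdict(list)
    for path in paths:
        st = os.stat(path)  # follows symlinks
        ident = (st.st_dev, st.st_ino)
        if ident in seen:  # another path already names this same file
            continue
        seen.add(ident)
        by_size[st.st_size].append(path)  # stage 1: group by size

    groups = [g for g in by_size.values() if len(g) > 1]
    groups = refine(groups, lambda p: digest(p, PARTIAL_BYTES))  # stage 2: prefix hash
    groups = refine(groups, digest)                              # stage 3: full hash

    result = []
    for group in groups:  # stage 4: byte-for-byte confirmation
        clusters: List[List[str]] = []
        for path in group:
            for cluster in clusters:
                if same_bytes(cluster[0], path):
                    cluster.append(path)
                    break
            else:
                clusters.append([path])
        result.extend(c for c in clusters if len(c) > 1)
    return result
```

Each stage reads more data only for files that survived the previous, cheaper stage: a file whose size is unique is never opened, and full hashes are computed only for files whose prefixes already match.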

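Once a group of identical files is known, rdfind's idea of ranking duplicates by the order of the input directories can be combined with the hard-linking done by freedups and hardlinkpy. The sketch below is a hypothetical illustration, not the documented behavior of any of these tools: the ranking rule (earliest input directory, then shallowest path, then path name), the function names and the `.dedup-tmp` temporary-name scheme are assumptions. Hard links only work within one filesystem, so `os.link` raises an error if the kept file and a duplicate live on different devices.

```python
import os
from typing import List, Sequence, Tuple


def rank_key(path: str, roots: Sequence[str]) -> Tuple[int, int, str]:
    """Smaller sorts first: earliest input root, then shallowest path, then name."""
    ap = os.path.abspath(path)
    for index, root in enumerate(roots):
        r = os.path.abspath(root)
        # Assumes all paths are on one drive; commonpath raises ValueError otherwise.
        if os.path.commonpath([ap, r]) == r:  # `path` lies under this root
            return (index, ap.count(os.sep), ap)
    return (len(roots), ap.count(os.sep), ap)  # outside every root: lowest rank


def plan(group: List[str], roots: Sequence[str]) -> Tuple[str, List[str]]:
    """Return (file to keep, files to replace) for one group of identical files."""
    ordered = sorted(group, key=lambda p: rank_key(p, roots))
    return ordered[0], ordered[1:]


def hardlink_duplicates(keep: str, duplicates: List[str]) -> None:
    """Replace each duplicate with a hard link to `keep` (same filesystem only)."""
    for dup in duplicates:
        tmp = dup + ".dedup-tmp"  # hypothetical temporary name; must not already exist
        os.link(keep, tmp)
        os.replace(tmp, dup)  # on POSIX the rename is atomic, so `dup` never disappears
```

Listing the trusted directory first in `roots` keeps the copy found there and turns the others into links to it; reversing the order reverses which copy survives.
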
Commercial or with a more restrictive license

See also

References

  1. User "Dr. Liviu Daia" (12 December 2009, 16:03 GMT-8). "Re: Comparing large amounts of files".
  2. kernel.org documentation of the FIEMAP ioctl.

External links

External Comparisons