rmlint [TARGET_DIR_OR_FILES ...] [//] [TARGET_DIR_OR_FILES ...] [-] [OPTIONS]
rmlint
finds space waste and other broken things on your filesystem and offers
to remove it. Types of waste include:
In order to find the lint, rmlint
is given one or more directories to traverse.
If no directories or files were given, the current working directory is assumed.
rmlint
will take care of things like filesystem loops and symlinks during
traversing.
Found duplicates are divided into the original and duplicates. Original
are what rmlint
thinks to be the file that was first there. You can drive
the original detection with the -S option. If you know which path contains the
originals you can prefix the path with //,
Note: rmlint
will not delete any files. It only produces executable output
for you to remove it.
-T --types="description" (default: defaults): | |
---|---|
Configure the types of lint rmlint is supposed to find. The description string enumerates the types that shall be investigated, separated by a space, comma or semicolon (actually more separators work). At the beginning of the string certain groups may be specified.
All following lint types must be one of the following, optionally prefixed with a + or - to select or deselect it:
Warning: It is good practice to enclose the description in quotes. In obscure cases argument parsing might fail in weird ways: # -ed is recognized as -e and -d here, -d takes "-s 10M" as parameter.
# This will fail to do the supposed, finding also files smaller than 10M.
$ rmlint -T all -ef -ed -s10M /media/music/
|
|
-o --output=spec / -O --add-output=spec (default: -o sh:rmlint.sh -o pretty:stdout -o summary:stdout): | |
Configure the way rmlint outputs it's results. You link a formatter to a
file through If this options is specified, rmlint's defaults are overwritten. The option can be specified several times and formatters can be specified more than once for different files. --add-output works the same way, but does not overwrite the defaults. Both -o and -O may not be specified at the same time. For a list of formatters and their options, look at the Formatters section below. |
|
-c --config=spec[=value] (default: none): | |
Configure a formatter. This option can be used to fine-tune the behaviour of the existing formatters. See the Formatters section for details on the available keys. If the value is omitted it is set to a truthy value. |
:-a --algorithm=name
(default: spooky) :
Choose the hash algorithm to use for finding duplicate files. The following well-known algorithms are available:
spooky, city, murmur, md5. sha1, sha256, sha512.
If not explicitly stated in the name the hashfunctions use 128 bit. There are variations of the above functions:
- bastard: 256bit, half seeded city, half murmur.
- city256, city512, murmur256, murmur512: Slower variations with more bits.
- spooky32, spooky64: Faster version of spooky with less bits.
- paranoid: No hash function, compares files byte-by-byte.
-v --loud / -V --quiet : | |
---|---|
Increase or decrease the verbosity. You can pass these options several times. This only affects rmlint's logging on stderr, but not the outputs defined with -o. |
|
-g --progress / -G --no-progress (default): | |
Convinience shortcut for |
|
-p --paranoid / -P --less-paranoid (default): | |
Increase the paranoia of rmlints internals. Both options can be specified up to three times. They do not do any work themselves, but set some other options implicitly as a shortcut.
The last one is not a hash function in the traditional meaning, but performs a byte-by-byte comparison of each file. See also --max-paranoid-ram. For the adventurous, it is also possible to decrease the default paranoia:
This is really not recommended though. |
|
-D --merge-directories ([experimental] default: disabled): | |
Makes rmlint use a special mode where all found duplicates are collected and checked if whole directory trees are duplicates. This is an HIGHLY EXPERIMENTAL FEATURE and was/is tricky to implement right. Use with caution. You always should make sure that the investigated directory is not modified during rmlint or it's removal scripts run. Output is deferred until all duplicates were found. Sole duplicate groups are printed after the directories. --sortcriteria applies for directories too, but 'p' or 'P' (path index) has no defined (useful) meaning. Sorting takes only place when the number of preferred files in the directory differs. Notes:
|
|
-q --clamp-low=[fac.tor|percent%|offset] (default: 0) / -Q --clamp-top=[fac.tor|percent%|offset] (default: 1.0): | |
The argument can be either passed as factor (a number with a Only look at the content of files in the range of from This is useful in a few cases where a file consists of a constant sized header or footer. With this option you can just compare the data in between. Also it might be useful for approximate comparison where it suffices when the file is the same in the middle part. |
|
-u --max-paranoid-ram=size : | |
Apply a maximum number of bytes to use for --paranoid.
The |
|
-w --with-color (default) / -W --no-with-color : | |
Use color escapes for pretty output or disable them. If you pipe rmlints output to a file -W is assumed automatically. |
|
-h --help / -H --version : | |
Show this manual or print the version string. |
-t --threads=N** (default: 16): | |
---|---|
The number of threads to use during file tree traversal and hashing.
|
|
-s --size=range (default: all): | |
Only consider files in a certain size range. The format of range is min-max, where both ends can be specified as a number with an optional multiplier. The available multipliers are:
The size format is about the same as dd(1) uses. Example: "100KB-2M". It's also possible to specify only one size. In this case the size is interpreted as "up to this size". |
|
-d --max-depth=depth (default: INF): | |
Only recurse up to this depth. A depth of 1 would disable recursion and is equivalent to a directory listing. |
|
-l --hardlinked / -L --no-hardlinked (default): | |
By default |
|
-f --followlinks / -F --no-followlinks / -FF --see-symlinks (**default**) : | |
Follow symbolic links? If file system loops occur Note: Hardlinks are always followed, but it depends on |
|
-x --crossdev (default) / -X --no-crossdev : | |
Do cross over mount points ( |
|
-r --hidden / -R --no-hidden (default): | |
Also traverse hidden directories? This is often not a good idea, since
directories like |
|
-b --match-basename / -B --no-match-basename (default): | |
Only consider those files as dupes that have the same basename.
See also |
|
-e --match-with-extension / -E --no-match-with-extension (default): | |
Only consider those files as dupes that have the same file extension.
For example two photos would only match if they are a |
|
-i --match-without-extension / -I --no-match-without-extension (default): | |
Only consider those files as dupes that have the same basename minus the file
extension. For example: |
|
-n --newer-than-stamp=<timestamp_filename> / -N --newer-than=<iso8601_timestamp_or_unix_timestamp> : | |
Only consider files (and their size siblings for duplicates) newer than a certain modification time (mtime). The age barrier may be given as seconds since the epoch or as ISO8601-Timestamp like 2014-09-08T00:12:32+0200.
If you want to take only the files (and not their size siblings) you can
use
Note: you can make rmlint write out a compatible timestamp with:
|
-k --keep-all-tagged / -K --keep-all-untagged (default): | |
---|---|
Don't delete any duplicates that are in original paths. (Paths that were named after //). Note: for lint types other than duplicates, |
|
-m --must-match-tagged / -M --must-match-untagged (default): | |
Only look for duplicates of which one is in original paths. (Paths that were named after //). |
|
-S --sortcriteria=criteria (default: m): | |
One can have multiple criteria, e.g.: |
csv
: Format all found lint as comma-separated-value list.
Available options:
sh
: Format all found lint as shellscript. Sane defaults for most
lint-types are set. This formatter is activated as default.
Available options:
json
: Print a JSON-formatted dump of all found reports.
Outputs all finds as a json document. The document is a list of dictionaries,
where the first and last element is the header and the footer respectively,
everything between are data-dictionaries.
Available options:
py
: Outputs a python script and a JSON document, just like the json formatter.
The JSON document is written to .rmlint.json
, executing the script will
make it read from there. This formatter is mostly intented for complex usecases
where the lint needs special handling. Therefore the python script can be modified
to do things standard rmlint
is not able to do easily.
stamp
:
Outputs a timestamp of the time rmlint
was run.
Available options:
progressbar
: Shows a progressbar. This is meant for use with stdout or
stderr.
See also: -g
(--progress
) for a convinience shortcut option.
Available options:
pretty
: Shows all found items in realtimes nicely colored. This formatter
is activated as default.
summary
: Shows counts of files and their respective size after the run.
Also list all written files.
fdupes
: Prints an output similar to the popular duplicate finder
fdupes(1). At first a progressbar is printed on stderr. Afterwards the
found files are printed on stdout; each set of duplicates gets printed as a
block separated by newlines. Originals are highlighted in green. At the bottom
a summary is printed on stderr. This is mostly useful for scripts that are used to
parsing this format. We recommend the json
formatter for every other
scripting purpose.
Available options:
-f / --omitfirst
option in fdupes(1)
. Omits the
first line of each set of duplicates (i.e. the original file.-1 / --sameline
option in fdupes(1)
. Does not
print newlines between files, only a space. Newlines are printed only between
sets of duplicates.rmlint
Check the current working directory for duplicates.
find ~/pics -iname '*.png' | ./rmlint -
Read paths from stdin and check all png files for duplicates.
rmlint files_backup // files --keep-all-tagged --must-match-tagged
Check for duplicate files between the current files and the backup of it. Only files in files_backup would be reported as duplicate. Additionally, all reported duplicates must occur in both paths.
rmlint
computes
at least 3 hashes per file and requires them to be the same size, it's very
unlikely to happen. If you're really wary, try the --paranoid option.rmlint
recognized as duplicate is modified afterwards, resulting in a
different file. This is a general problem and cannot be solved from rmlint's
side alone. You should never modify the data until ``rmlint`` and the
shellscript has been run through. Careful persons might even consider to
mount the filesystem you are scanning read-only.If you found a bug, have a feature requests or want to say something nice, please visit https://github.com/sahib/rmlint/issues.
Please make sure to describe your problem in detail. Always include the version
of rmlint
(--version
). If you experienced a crash, please include
at least one of the following information with a debug build of rmlint
:
gdb --ex run -ex bt --args rmlint -vvv [your_options]
valgrind --leak-check=no rmlint -vvv [your_options]
You can build a debug build of rmlint
like this:
git clone git@github.com:sahib/rmlint.git
cd rmlint
scons DEBUG=1
sudo scons install # Optional
rmlint
is licensed under the terms of the GPLv3.
See the COPYRIGHT file that came with the source for more information.
rmlint
was written by:
Also see the THANKS file for other people that helped us.
If you consider a donation you can use Flattr or buy us a beer if we meet: