======
rmlint
======

------------------------------------------------------
find duplicate files and other space waste efficiently
------------------------------------------------------

.. Stuff in curly braces gets replaced by SCons

SYNOPSIS
========

rmlint [TARGET_DIR_OR_FILES ...] [//] [TAGGED_TARGET_DIR_OR_FILES ...] [-] [OPTIONS]

DESCRIPTION
===========

``rmlint`` finds space waste and other broken things on your filesystem.

Types of waste include:

* Duplicate files and directories.
* Nonstripped binaries (binaries with debug symbols).
* Broken links.
* Empty files and directories.
* Files with broken user or group id.

``rmlint`` will not delete any files. It does however produce executable output
(for example a shell script) to help you delete the files if you want to.

In order to find the lint, ``rmlint`` is given one or more directories to traverse.
If no directories or files were given, the current working directory is assumed.
By default, ``rmlint`` will ignore hidden files and will not follow symlinks (see
traversal options below).  ``rmlint`` will first find "other lint" and then search
the remaining files for duplicates.

Duplicate sets will be displayed as an original and one or more duplicates.  You
can set criteria for how ``rmlint`` chooses the original using the ``-S`` option (by default it
chooses the first-named path on the command line, or if that is equal then the
oldest file based on mtime).  You can also specify that certain paths **only** contain
originals by naming the path after the special path separator **//**.

Examples are given at the end of this manual.

OPTIONS
=======

General Options
---------------

:``-T --types="list"`` (**default\:** *defaults*):

    Configure the types of lint ``rmlint`` will look for. The ``list`` string is a
    comma-separated list of lint types or lint groups (other separators like
    semicolon or space also work).

    One of the following groups can be specified at the beginning of the list:

    * ``all``: Enables all lint types.
    * ``defaults``: Enables all lint types except ``nonstripped``.
    * ``minimal``: ``defaults`` minus ``emptyfiles`` and ``emptydirs``.
    * ``minimaldirs``: ``defaults`` minus ``emptyfiles``, ``emptydirs`` and
      ``duplicates``, but with ``duplicatedirs``.
    * ``none``: Disable all lint types [default].

    Any of the following lint types can be added individually, or deselected by
    prefixing with a **-**:

    * ``badids``, ``bi``: Find files with a bad UID, a bad GID or both.
    * ``badlinks``, ``bl``: Find bad symlinks pointing nowhere.
    * ``emptydirs``, ``ed``: Find empty directories.
    * ``emptyfiles``, ``ef``: Find empty files.
    * ``nonstripped``, ``ns``: Find nonstripped binaries.
    * ``duplicates``, ``df``: Find duplicate files.
    * ``duplicatedirs``, ``dd``: Find duplicate directories. 

    **WARNING:** It is good practice to enclose the list in quotes. Otherwise
    argument parsing might fail in weird ways in obscure cases.

:``-o --output=spec`` / ``-O --add-output=spec`` (**default\:** *-o sh\:rmlint.sh -o pretty\:stdout -o summary\:stdout*):

    Configure the way ``rmlint`` outputs its results. A ``spec`` is in the
    form ``format:file`` or just ``format``.  A file might either be an arbitrary
    path or ``stdout`` or ``stderr``.  If file is omitted, ``stdout`` is assumed.

    If ``-o`` is specified, rmlint's defaults are overwritten.  With ``-O`` the
    defaults are preserved.  Either ``-o`` or ``-O`` may be specified multiple
    times to get multiple outputs, including multiple outputs of the same format.

    For a list of formatters and their options, refer to the **Formatters**
    section below.

:``-c --config=spec[=value]`` (**default\:** *none*):

    Configure a format. This option can be used to fine-tune the behaviour of 
    the existing formatters. See the **Formatters** section for details on the
    available keys.

    If the value is omitted it is set to a true value.

:``-z --perms[=[rwx]]`` (**default\:** *no check*):

    Only look into a file if it is readable, writable or executable by the current
    user. Which of these permissions to check can be given as an argument made up
    of the letters *"rwx"*.

    If no argument is given, *"rw"* is assumed. Note that *r* does basically
    nothing user-visible since ``rmlint`` will ignore unreadable files anyway.
    It's just there for the sake of completeness.

    By default this check is not done. 

:``-a --algorithm=name`` (**default\:** *sha1*):

    Choose the algorithm to use for finding duplicate files.  The algorithm can be
    either **paranoid** (byte-by-byte file comparison) or use one of several file hash
    algorithms to identify duplicates.  The following well-known algorithms are available:

    **spooky**, **city**, **murmur**, **xxhash**, **md5**, **sha1**, **sha256**,
    **sha512**, **farmhash**.

    The above are all 128 bit except sha1 (160 bit), sha256 and sha512.  There are also
    some compound variations of the above functions:

    * **bastard:** 256 bit, combining **city** and **murmur**.
    * **city256, city512, murmur256, murmur512:** Use multiple 128-bit hashes with different seeds.
    * **spooky32, spooky64:** Faster versions of **spooky** with fewer bits.
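
The idea behind hash-based duplicate detection can be illustrated with a short
Python sketch. This is not ``rmlint``'s actual implementation; the function
names ``file_digest`` and ``find_duplicates`` are made up for illustration, and
only algorithms known to Python's ``hashlib`` (such as ``sha1``) are assumed.
Files are first grouped by size, and only candidates of equal size are hashed:

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Hash a file in chunks so large files never sit in memory at once."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths, algorithm="sha1"):
    """Return lists of paths whose size and digest are both equal."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for path in same_size:
            by_digest[file_digest(path, algorithm)].append(path)
        groups.extend(g for g in by_digest.values() if len(g) > 1)
    return groups
```

A byte-by-byte comparison (as in **paranoid**) would replace the digest step
with a direct comparison of the file contents.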

:``-p --paranoid`` / ``-P --less-paranoid`` (**default**):

    Increase or decrease the paranoia of ``rmlint``'s duplicate algorithm.

    * **-p** is equivalent to **--algorithm=sha512**
    * **-pp** is equivalent to **--algorithm=paranoid**

    * **-P** is equivalent to **--algorithm bastard**
    * **-PP** is equivalent to **--algorithm spooky**

:``-v --loud`` / ``-V --quiet``:
    
    Increase or decrease the verbosity. You can pass these options several
    times. This only affects ``rmlint``'s logging on *stderr*, but not the outputs
    defined with **-o**. Passing either option more than three times has no
    effect.

:``-g --progress`` / ``-G --no-progress`` (**default**):

    Convenience shortcut for ``-o progressbar -o summary -o sh:rmlint.sh -VVV``.

    Note: This flag clears all previous outputs. Specify any additional outputs
    after this flag!

:``-D --merge-directories`` (**default\:** *disabled*):

    Makes ``rmlint`` use a special mode where all found duplicates are collected and
    checked if whole directory trees are duplicates. Use with caution: You
    should always make sure that the investigated directory is not modified
    while ``rmlint`` or its removal script is running.

    Output is deferred until all duplicates have been found.
    Duplicate directories are printed first, followed by any remaining duplicate files.

    **--rank-by** applies to directories too, but 'p' or 'P' (path index)
    has no defined (i.e. useful) meaning. Sorting only takes place when the number of
    preferred files in the directory differs.

    **NOTES:**

    * This option enables ``--partial-hidden`` and ``-@`` (``--see-symlinks``)
      for convenience. If this is not desired, you should change this after
      specifying ``-D``.
    * This feature might not deliver perfect results in corner cases.
    * This feature might add some runtime.

:``-y --sort-by=order`` (**default\:** *none*):

    During output, sort the found duplicate groups by criteria described by `order`.
    `order` is a string that may consist of one or more of the following letters:

    * `s`: Sort by size of group.
    * `a`: Sort alphabetically by the basename of the original.
    * `m`: Sort by mtime of the original.
    * `p`: Sort by path-index of the original.
    * `o`: Sort by natural found order (might be different on each run).
    * `n`: Sort by number of files in the group.

    The letters may also be written uppercase (similar to ``-S /
    --rank-by``) to reverse the sorting. Note that ``rmlint`` has to hold
    back all results to the end of the run before sorting and printing.

:``--gui``:

    Start the optional graphical frontend to ``rmlint`` called ``Shredder``.

    This will only work when ``Shredder`` and its dependencies are installed.
    See also: http://rmlint.readthedocs.org/en/latest/gui.html

    The GUI has its own set of options, see ``--gui --help`` for a list.  These
    should be placed at the end, i.e. ``rmlint --gui [options]``.

:``--hash``:

    Make ``rmlint`` work as a multi-threaded file hash utility, similar to the
    popular ``md5sum`` or ``sha1sum`` utilities, but faster.
    A set of paths given on the commandline or from *stdin* is hashed using one
    of the available hash algorithms.  Use ``rmlint --hash -h`` to see options.

:``-w --with-color`` (**default**) / ``-W --no-with-color``:

    Use color escapes for pretty output or disable them.
    If you pipe ``rmlint``'s output to a file, ``-W`` is assumed automatically.

:``-h --help`` / ``-H --show-man``:

    Show a shorter reference help text (``-h``) or this full man page (``-H``).

:``--version``:

    Print the version of rmlint. Includes git revision and compile time
    features.

Traversal Options
-----------------

:``-s --size=range`` (**default\:** *all*):

    Only consider files in a certain size range.
    The format of `range` is `min-max`, where both ends can be specified
    as a number with an optional multiplier. The available multipliers are:

    - *C* (1^1), *W* (2^1), *B* (512^1), *K* (1000^1), *KB* (1024^1), *M* (1000^2), *MB* (1024^2), *G* (1000^3), *GB* (1024^3),
    - *T* (1000^4), *TB* (1024^4), *P* (1000^5), *PB* (1024^5), *E* (1000^6), *EB* (1024^6)

    The size format is about the same as `dd(1)` uses. Example: **"100KB-2M"**.

    It's also possible to specify only one size. In this case the size is
    interpreted as *"bigger than this size"*. If you want to filter for files
    *up to this size* you can add a ``-`` in front (``-s -1M``).
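
To make the range syntax concrete, here is a minimal Python sketch that turns a
``range`` string into a pair of byte counts. It is an illustration only, not
``rmlint``'s parser: the names ``parse_size`` and ``parse_range`` are
hypothetical, and the sketch assumes multipliers are written in uppercase
exactly as listed above.

```python
import re

# Multiplier names and values are taken from the list above.
MULTIPLIERS = {
    "C": 1, "W": 2, "B": 512,
    "K": 1000, "KB": 1024,
    "M": 1000**2, "MB": 1024**2,
    "G": 1000**3, "GB": 1024**3,
    "T": 1000**4, "TB": 1024**4,
    "P": 1000**5, "PB": 1024**5,
    "E": 1000**6, "EB": 1024**6,
}


def parse_size(text):
    """Convert a string such as '100KB' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([A-Z]*)", text.strip())
    if not match:
        raise ValueError("not a size: %r" % text)
    number, suffix = match.groups()
    if suffix and suffix not in MULTIPLIERS:
        raise ValueError("unknown multiplier: %r" % suffix)
    return int(number) * MULTIPLIERS.get(suffix, 1)


def parse_range(spec):
    """Return (min_bytes, max_bytes); max_bytes is None when unbounded."""
    low, separator, high = spec.partition("-")
    if not separator:
        return parse_size(low), None  # single size: lower bound only
    minimum = parse_size(low) if low else 0
    maximum = parse_size(high) if high else None
    return minimum, maximum
```

With these assumptions, ``parse_range("100KB-2M")`` yields the bounds 102400
and 2000000 bytes, and ``parse_range("-1M")`` yields an upper bound only.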

:``-d --max-depth=depth`` (**default\:** *INF*):

    Only recurse up to this depth. A depth of 1 would disable recursion and is
    equivalent to a directory listing.

:``-l --hardlinked`` (**default**) / ``-L --no-hardlinked``:

    Whether to report hardlinked files as duplicates.

:``-f --followlinks`` / ``-F --no-followlinks`` / ``-@ --see-symlinks`` (**default**):

    ``-f`` will always follow symbolic links. If file system loops occur
    ``rmlint`` will detect this. If ``-F`` is specified, symbolic links will be
    ignored completely. If ``-@`` is specified, ``rmlint`` will see symlinks and
    treat them like small files with the path to their target in them. The
    latter is the default behaviour, since it is a sensible default for
    ``--merge-directories``.

:``-x --no-crossdev`` / ``-X --crossdev`` (**default**):

    Always stay on the same device (``-x``),
    or allow crossing mountpoints (``-X``).

:``-r --hidden`` / ``-R --no-hidden`` (**default**) / ``--partial-hidden``:

    Also traverse hidden directories? This is often not a good idea, since
    directories like ``.git/`` would be investigated.
    With ``--partial-hidden`` hidden files and folders are only considered if
    they're inside duplicate directories (see ``--merge-directories``).

:``-b --match-basename``:

    Only consider those files as dupes that have the same basename. See also
    ``man 1 basename``. The comparison of the basenames is case-insensitive.

:``-B --unmatched-basename``:

    Only consider those files as dupes that do not share the same basename.
    See also ``man 1 basename``. The comparison of the basenames is case-insensitive.

:``-e --match-with-extension`` / ``-E --no-match-with-extension`` (**default**):

    Only consider those files as dupes that have the same file extension. For
    example two photos would only match if both are ``.png`` files. The extension is
    compared case-insensitively, so ``.PNG`` is the same as ``.png``.

:``-i --match-without-extension`` / ``-I --no-match-without-extension`` (**default**):

    Only consider those files as dupes that have the same basename minus the file
    extension. For example: ``banana.png`` and ``banana.jpeg`` would be considered,
    while ``apple.png`` and ``peach.png`` won't. The comparison is also
    case-insensitive.
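
The three name-based filters above can be sketched as simple predicates in
Python. The function names are hypothetical, and the sketch assumes that the
"extension" is whatever follows the last dot of the basename (so ``a.tar.gz``
has the extension ``.gz``); ``rmlint`` itself may split names differently.

```python
import os


def same_basename(a, b):
    """Case-insensitive basename comparison, as described for -b."""
    return os.path.basename(a).lower() == os.path.basename(b).lower()


def same_extension(a, b):
    """Case-insensitive extension comparison, as described for -e."""
    return os.path.splitext(a)[1].lower() == os.path.splitext(b)[1].lower()


def same_stem(a, b):
    """Compare basenames with the extension removed, as described for -i."""
    stem_a = os.path.splitext(os.path.basename(a))[0].lower()
    stem_b = os.path.splitext(os.path.basename(b))[0].lower()
    return stem_a == stem_b
```

For the examples given above, ``same_stem("banana.png", "banana.jpeg")`` is
true while ``same_stem("apple.png", "peach.png")`` is false.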

:``-n --newer-than-stamp=<timestamp_filename>`` / ``-N --newer-than=<iso8601_timestamp_or_unix_timestamp>``:

    Only consider files (and their size siblings for duplicates) newer than a
    certain modification time (*mtime*).  The age barrier may be given as
    seconds since the epoch or as ISO8601-Timestamp like
    *2014-09-08T00:12:32+0200*. 

    ``-n`` expects a file from which it can read the timestamp. After the
    ``rmlint`` run, the file will be updated with the current timestamp.
    If the file does not initially exist, no filtering is done but the stampfile
    is still written.

    ``-N`` in contrast takes the timestamp directly and will not write anything.

    Note that ``rmlint`` will find duplicates newer than ``timestamp``, even if the original is
    older.  If you only want to find duplicates where both original and duplicate are newer
    than ``timestamp`` you can use ``find(1)``:

    * ``find -mtime -1 | rmlint - # find all files younger than a day``

    *Note:* you can make rmlint write out a compatible timestamp with:

    * ``-O stamp:stdout  # Write a seconds-since-epoch timestamp to stdout on finish.``
    * ``-O stamp:stdout -c stamp:iso8601 # Same, but write as ISO8601.``
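
The two accepted timestamp notations can be handled uniformly, as the following
Python sketch suggests. It is only an illustration of the filtering rule
described above; ``parse_timestamp`` and ``newer_than`` are hypothetical names,
and the ISO8601 form is assumed to look like the example given
(*2014-09-08T00:12:32+0200*).

```python
from datetime import datetime


def parse_timestamp(text):
    """Return seconds since the epoch for either accepted notation."""
    text = text.strip()
    try:
        return float(text)  # plain seconds since the epoch
    except ValueError:
        pass
    # Otherwise expect ISO8601 with a numeric UTC offset such as +0200.
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S%z").timestamp()


def newer_than(mtimes, stamp):
    """Keep the paths whose mtime lies after the given timestamp."""
    limit = parse_timestamp(stamp)
    return [path for path, mtime in mtimes.items() if mtime > limit]
```

Here ``mtimes`` is assumed to be a dictionary mapping paths to modification
times in seconds since the epoch.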

Original Detection Options
--------------------------

:``-k --keep-all-tagged`` / ``-K --keep-all-untagged``:

    Don't delete any duplicates that are in tagged paths (``-k``) or that are
    in non-tagged paths (``-K``).
    (Tagged paths are those that were named after **//**).

:``-m --must-match-tagged`` / ``-M --must-match-untagged``:

    Only look for duplicates of which at least one is in one of the tagged paths
    (``-m``) or in one of the untagged paths (``-M``).
    (Tagged paths are those that were named after **//**).

:``-S --rank-by=criteria`` (**default\:** *pm*):

    Sort the files in a group of duplicates by one or more criteria.    

    - **m**: keep lowest mtime (oldest)       **M**: keep highest mtime (newest)
    - **a**: keep first alphabetically        **A**: keep last alphabetically
    - **p**: keep first named path            **P**: keep last named path
    - **d**: keep path with lowest depth      **D**: keep path with highest depth
    - **l**: keep path with shortest basename **L**: keep path with longest basename
    - **r**: keep paths matching regex        **R**: keep path not matching regex
    - **x**: keep basenames matching regex    **X**: keep basenames not matching regex

    Alphabetical sort will only use the basename of the file and ignore its case.
    One can have multiple criteria, e.g.: ``-S am`` will choose first alphabetically; if tied then by mtime.
    **Note:** original path criteria (specified using `//`) will always take first priority over `-S` options.
    
    For more fine grained control, it is possible to give a regular expression
    to sort by. This can be useful when you know a common fact that identifies
    original paths (like a path component being ``src``). 

    To use the regular expression you simply enclose it in the criteria string
    by adding ``<REGULAR_EXPRESSION>`` after specifying ``r`` or ``x``. Example:
    ``-S 'r<.*\.bak$>'`` makes all files that have a ``.bak`` suffix original files.

    Warning: When using **r** or **x**, try to make your regex as specific
    as possible! Good practice includes adding a ``$`` anchor at the end of the regex.

    Tip: **l** is useful for files like ``file.mp3`` vs ``file.1.mp3`` or ``file.mp3.bak``.
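
How several criteria combine can be sketched in Python with repeated stable
sorts. This is an assumption-laden illustration rather than ``rmlint``'s code:
the record fields ``path``, ``mtime`` and ``path_index`` are invented for the
example, depth is approximated by counting ``/`` characters, and the regex
criteria (**r**, **x**) are left out.

```python
import os

# Each lowercase letter maps to a key function; smaller keys rank first.
CRITERIA = {
    "m": lambda f: f["mtime"],
    "a": lambda f: os.path.basename(f["path"]).lower(),
    "p": lambda f: f["path_index"],
    "d": lambda f: f["path"].count("/"),
    "l": lambda f: len(os.path.basename(f["path"])),
}


def rank(files, criteria="pm"):
    """Order a duplicate group; the first element is kept as the original."""
    ranked = list(files)
    # Apply the least significant criterion first. list.sort() is stable,
    # so letters further left in the string take priority over later ones.
    for letter in reversed(criteria):
        key = CRITERIA[letter.lower()]
        ranked.sort(key=key, reverse=letter.isupper())
    return ranked
```

With the default ``pm``, files from the first named path win, and ties are
broken in favour of the oldest mtime.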

Caching
-------

:``--replay [path.json]``:

    Read an existing json file and re-output it. This is very useful if you want
    to reformat, refilter or resort the output you got from a previous run.
    Usage is simple: Just pass ``--replay`` on the second run, with the other
    options changed to the new formatters or filters. You can also merge several
    previous runs by using ``--replay`` more than once. In this case it will merge
    all files given and output them as one big run.

    If you want to view only the duplicates of certain subdirectories, just pass
    them on the commandline as usual.
 
    If ``path.json`` is not given then `./rmlint.json` is used as default.

    By design, some options will not have any effect. Those are:
    
    - `--followlinks`
    - `--algorithm`
    - `--paranoid`
    - `--clamp-low`
    - `--hardlinked`
    - `--write-unfinished`
    - ... and all other caching options below.

:``--xattr-read`` / ``--xattr-write`` / ``--xattr-clear``:

    Read or write cached checksums from the extended file attributes.
    This feature can be used to speed up consecutive runs.

    This is a slightly more secure alternative to ``--cache``, but the same notes
    as in ``--cache`` apply.

    **NOTE:** Many tools do not support extended file attributes properly,
    resulting in a loss of the information when copying the file or editing it.
    Also, this is a Linux-specific feature that does not work on all filesystems
    and only works if you have write permissions to the file.

:``-C --cache file.json``:

    Read checksums from a *json* file. This *json* file is the same one that is
    written via ``-o json``, but you can also enrich the *json* with
    the checksums of sieved out files via ``--write-unfinished``.

    Usage example: ::

        $ rmlint large_cluster/ -O json:cache.json -U   # first run.
        $ rmlint large_cluster/ -C cache.json           # second run.

    **CAUTION:** This is a potentially unsafe feature. The cache file might be
    changed accidentally, potentially causing ``rmlint`` to report false
    positives. As a security feature the `mtime` of each cached file is checked
    against the `mtime` recorded at the time the checksum was created.

    **NOTE:** The speedup you may experience may vary wildly. In some cases the
    parsing of the json file might take longer than the actual hashing. Also,
    the cached json file will not be of use when doing many modifications
    between the runs, i.e. causing an update of `mtime` on most files. This
    feature is mostly intended for large datasets in order to prevent the
    re-hashing of large files. 
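
The `mtime` safeguard described above amounts to a simple validity check, which
the following Python sketch illustrates. The entry layout (keys ``path``,
``mtime`` and ``checksum``) is hypothetical and does not reflect the real
layout of the json file; the sketch only shows when a cached checksum would be
trusted.

```python
import os


def usable_checksum(entry):
    """Return the cached checksum, or None if the file changed since."""
    try:
        current_mtime = os.stat(entry["path"]).st_mtime
    except OSError:
        return None  # file vanished or cannot be inspected
    if current_mtime != entry["mtime"]:
        return None  # modified after the checksum was stored: rehash
    return entry["checksum"]
```

A file whose `mtime` changed between runs falls back to regular hashing, which
is why the cache helps little when most files were touched in the meantime.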

:``-U --write-unfinished``: 

    Include files in output that have not been hashed fully (i.e. files that
    do not appear to have a duplicate). This is mainly useful in conjunction
    with ``--cache``. When re-running rmlint on a large dataset this can greatly
    speed up a re-run in some cases.

    This option also applies for ``--xattr-write``. 

Rarely used, miscellaneous options
----------------------------------

:``-t --threads=N`` (**default\:** *16*):

    The number of threads to use during file tree traversal and hashing.
    ``rmlint`` probably knows better than you how to set the value.

:``-u --max-paranoid-mem=size``:

    Apply a maximum number of bytes to use for **--paranoid**. 
    The ``size``-description has the same format as for **--size**.

:``-q --clamp-low=[fac.tor|percent%|offset]`` (**default\:** *0*) / ``-Q --clamp-top=[fac.tor|percent%|offset]`` (**default\:** *1.0*):

    The argument can be either passed as factor (a number with a ``.`` in it),
    a percent value (suffixed by ``%``) or as absolute number or size spec, like in ``--size``.

    Only look at the content of files in the range from ``low`` to
    (including) ``high``. This means, if the range is less than ``-q 0%`` to
    ``-Q 100%``, then only partial duplicates are searched. If the file size is
    less than the clamp limits, the file is ignored during traversing. Be careful when
    using this function, you can easily get dangerous results for small files.

    This is useful in a few cases where a file consists of a constant sized
    header or footer. With this option you can just compare the data in between.
    Also it might be useful for approximate comparison where it suffices when
    the file is the same in the middle part.
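
The three argument forms can be translated into byte offsets as in the
following Python sketch. It is a simplified illustration with hypothetical
function names: absolute values are assumed to be plain byte counts (size
multipliers as in ``--size`` are not handled), and a file smaller than an
absolute limit is reported as ``None`` to mirror the "file is ignored" rule.

```python
def clamp_offset(spec, file_size):
    """Translate one clamp argument into an absolute byte offset."""
    if spec.endswith("%"):
        return int(file_size * float(spec[:-1]) / 100)  # percent value
    if "." in spec:
        return int(file_size * float(spec))  # factor such as 0.5
    offset = int(spec)  # absolute offset in bytes
    if offset > file_size:
        return None  # file is smaller than the clamp limit
    return offset


def clamped_range(file_size, low="0", high="1.0"):
    """Return (start, end) of the bytes to compare, or None to skip."""
    start = clamp_offset(low, file_size)
    end = clamp_offset(high, file_size)
    if start is None or end is None:
        return None
    return start, end
```

For a 1000 byte file, ``clamped_range(1000, "10%", "0.5")`` selects the bytes
from offset 100 up to offset 500.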

:``--with-fiemap`` (**default**) / ``--without-fiemap``:
    
    Enable or disable reading the file extents on rotational disks in order to
    optimize disk access patterns.

:``--with-metadata-cache`` / ``--without-metadata-cache`` (**default**):

    Swap certain file metadata attributes onto disk in order to save memory.
    This can help to save memory for very big datasets (several million files)
    where storing the paths alone can eat up several GB RAM.
    Enabling swapping will cause slowdowns in exchange.

    Sometimes the memory savings may be small since ``rmlint`` already compresses
    paths by storing them in a special tree structure.

    This feature may not play nice with some other options, causing heavy load
    and long computations: 
    
    - The ``--match-*`` family of options.
    - ``--cache`` might use more memory and takes longer.

    Some of those restrictions might be removed in future ``rmlint`` versions.

    The metadata cache will be stored in ``$XDG_CACHE_HOME/rmlint/$pid``. If the
    cache cannot be created, ``rmlint`` warns you and falls back to normal
    uncached mode.
    
FORMATTERS
==========

* ``csv``: Output all found lint as comma-separated-value list. 
  
  Available options:

  * *no_header*: Do not write a first line describing the column headers.

* ``sh``: Output all found lint as shell script. This formatter is activated
  by default.
  
  Available options:

  * *cmd*: Specify a user defined command to run on duplicates. 
    The command can be any valid ``/bin/sh``-expression. The duplicate 
    path and original path can be accessed via ``"$1"`` and ``"$2"``. 
    The command will be written to the ``user_command`` function in the
    ``sh``-file produced by rmlint.

  * *handler*: Define a comma separated list of handlers to try on duplicate
    files in that given order until one handler succeeds. Handlers are just the
    name of a way of getting rid of the file and can be any of the following:

    * ``clone``: ``btrfs`` only. Try to clone both files with the
      BTRFS_IOC_FILE_EXTENT_SAME ``ioctl(3p)``. This will physically delete
      duplicate extents. Needs at least kernel 4.2.
    * ``reflink``: Try to reflink the duplicate file to the original. See also
      ``--reflink`` in ``man 1 cp``. Fails if the filesystem does not support
      it.
    * ``hardlink``: Replace the duplicate file with a hardlink to the original
      file. Fails if both files are not on the same partition.
    * ``symlink``: Tries to replace the duplicate file with a symbolic link to
      the original. Never fails.
    * ``remove``: Remove the file using ``rm -rf``. (``-r`` for duplicate dirs).
      Never fails.
    * ``usercmd``: Use the provided user defined command (``-c
      sh:cmd=something``). Never fails.

    Default is ``remove``.
  
  * *link*: Shortcut for ``-c sh:handler=clone,reflink,hardlink,symlink``.
  * *hardlink*: Shortcut for ``-c sh:handler=hardlink,symlink``.
  * *symlink*: Shortcut for ``-c sh:handler=symlink``.
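
The "try each handler until one succeeds" behaviour can be pictured with a
small Python sketch. It does not reproduce the generated shell script: the
function names are hypothetical, only the ``hardlink``, ``symlink`` and
``remove`` handlers are modelled (``clone`` and ``reflink`` need
filesystem-specific calls), and links are created under an assumed temporary
name before replacing the duplicate.

```python
import os


def replace_with(link_func, dupe, orig):
    """Create the link under a temporary name, then swap it into place."""
    temp = dupe + ".tmp"  # hypothetical temporary name
    link_func(orig, temp)
    os.replace(temp, dupe)


HANDLERS = {
    "hardlink": lambda dupe, orig: replace_with(os.link, dupe, orig),
    "symlink": lambda dupe, orig: replace_with(
        os.symlink, dupe, os.path.abspath(orig)),
    "remove": lambda dupe, orig: os.remove(dupe),
}


def handle_duplicate(dupe, orig, handlers=("remove",)):
    """Try each named handler in order; return the name that succeeded."""
    for name in handlers:
        try:
            HANDLERS[name](dupe, orig)
            return name
        except OSError:
            continue  # this handler failed, fall through to the next one
    return None
```

Calling ``handle_duplicate(dupe, orig, ("hardlink", "symlink"))`` corresponds
to the *hardlink* shortcut: a symbolic link is only attempted when hardlinking
fails, for example across partitions.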

* ``json``: Print a JSON-formatted dump of all found reports.
  The document is a list of dictionaries, where the first and last elements
  are the header and the footer respectively; everything in between is a
  data dictionary.

  Available options:

  - *no_header=[true|false]:* Print the header with metadata.
  - *no_footer=[true|false]:* Print the footer with statistics.
  - *oneline=[true|false]:* Print one json document per line.
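
Since the document is a plain list, a script can separate header, data and
footer by position. The following Python sketch shows this under the
assumption that neither *no_header* nor *no_footer* was set; the function name
``split_report`` is hypothetical and no particular keys inside the dictionaries
are assumed.

```python
import json


def split_report(text):
    """Split a json report into (header, data entries, footer)."""
    document = json.loads(text)
    if not isinstance(document, list) or len(document) < 2:
        raise ValueError("expected a list with at least header and footer")
    return document[0], document[1:-1], document[-1]
```

The data entries can then be filtered or grouped with ordinary list and
dictionary operations.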

* ``py``: Outputs a python script and a JSON document, just like the **json** formatter.
  The JSON document is written to ``.rmlint.json``; executing the script will
  make it read from there. This formatter is mostly intended for complex use-cases
  where the lint needs special handling. Therefore the python script can be modified
  to do things standard ``rmlint`` is not able to do easily.

* ``stamp``:

  Outputs a timestamp of the time ``rmlint`` was run.

  Available options:

  - *iso8601=[true|false]:* Write an ISO8601 formatted timestamp instead of
    seconds since epoch.

* ``progressbar``: Shows a progressbar. This is meant for use with **stdout** or
  **stderr** [default].
  
  See also: ``-g`` (``--progress``) for a convenience shortcut option.
 
  Available options:

  * *update_interval=number:* Number of milliseconds to wait between updates.
    Higher values use less resources (default 50).
  * *ascii:* Do not attempt to use unicode characters, which might not be
    supported by some terminals. 
  * *fancy:* Use a more fancy style for the progressbar.

* ``pretty``: Shows all found items in realtime nicely colored. This formatter
  is activated by default.

* ``summary``: Shows counts of files and their respective size after the run.
  Also list all written output files. 

* ``fdupes``: Prints an output similar to the popular duplicate finder
  **fdupes(1)**. At first a progressbar is printed on **stderr.** Afterwards the
  found files are printed on **stdout;** each set of duplicates gets printed as a
  block separated by newlines. Originals are highlighted in green. At the bottom 
  a summary is printed on **stderr**. This is mostly useful for scripts that were
  set up for parsing fdupes output. We recommend the ``json`` formatter for every other
  scripting purpose.

  Available options:

  * *omitfirst:* Same as the ``-f / --omitfirst`` option in ``fdupes(1)``. Omits the
    first line of each set of duplicates (i.e. the original file).
  * *sameline:* Same as the ``-1 / --sameline`` option in ``fdupes(1)``. Does not
    print newlines between files, only a space. Newlines are printed only between
    sets of duplicates.

EXAMPLES
========

This is a collection of common use cases and other tricks:

* Check the current working directory for duplicates.

  ``$ rmlint``

* Show a progressbar:

  ``$ rmlint -g``

* Quick re-run on large datasets using different ranking criteria on second run:

  ``$ rmlint large_dir/ # First run; writes rmlint.json``

  ``$ rmlint --replay rmlint.json large_dir -S MaD``

* Search only for duplicates and duplicate directories:

  ``$ rmlint -T "df,dd" .``

* Compare files byte-by-byte in current directory:

  ``$ rmlint -pp .``

* Find duplicates with same basename (excluding extension):

  ``$ rmlint -i``

* Do more complex traversal using ``find(1)``.

  ``$ find /usr/lib -iname '*.so' -type f | rmlint - # find all duplicate .so files``

  ``$ find ~/pics -iname '*.png' | rmlint - # compare png files only``

* Limit file size range to investigate:

  ``$ rmlint -s 2GB    # Find everything >= 2GB``

  ``$ rmlint -s 0-2GB  # Find everything <  2GB``

* Only find writable and executable files:

  ``$ rmlint --perms wx``

* Reflink on btrfs, else try to hardlink duplicates to original. If that does
  not work, replace duplicate with a symbolic link:

  ``$ rmlint -c sh:link`` 

* Inject user-defined command into shell script output:

  ``$ rmlint -o sh -c sh:cmd='echo "original:" "$2" "is the same as" "$1"'``

* Use *data* as master directory. Find **only** duplicates in *backup* that are
  also in *data*. Do not delete any files in *data*:

  ``$ rmlint backup // data --keep-all-tagged --must-match-tagged``

PROBLEMS
========

1. **False Positives:** Depending on the options you use, there is a very slight risk 
   of false positives (files that are erroneously detected as duplicate).
   The default hash function (SHA1) is pretty safe but in theory it is possible for
   two files to have the same hash. This happens about once in 2 ** 80 files, so
   it is very, very unlikely. If you're concerned just use the ``--paranoid`` (``-pp``)
   option. This will compare all the files byte-by-byte and is not much slower than SHA1.

2. **File modification during or after rmlint run:** It is possible that a file
   that ``rmlint`` recognized as duplicate is modified afterwards, resulting in a
   different file.  If you use the rmlint-generated shell script to delete the duplicates,
   you can run it with the ``-p`` option to do a full re-check of the duplicate against
   the original before it deletes the file.

SEE ALSO
========

* `find(1)`
* `rm(1)`
* `cp(1)`

Extended documentation and an in-depth tutorial can be found at:

* http://rmlint.rtfd.org

BUGS
====

If you found a bug, have a feature request or want to say something nice, please
visit https://github.com/sahib/rmlint/issues.

Please make sure to describe your problem in detail. Always include the version
of ``rmlint`` (``--version``). If you experienced a crash, please include the
output of at least one of the following commands, run with a debug build of ``rmlint``:

* ``gdb -ex run -ex bt --args rmlint -vvv [your_options]``
* ``valgrind --leak-check=no rmlint -vvv [your_options]``

You can build a debug build of ``rmlint`` like this:

* ``git clone git@github.com:sahib/rmlint.git``
* ``cd rmlint``
* ``scons DEBUG=1``
* ``sudo scons install  # Optional`` 

LICENSE
=======

``rmlint`` is licensed under the terms of the GPLv3.

See the COPYRIGHT file that came with the source for more information.

PROGRAM AUTHORS
===============

``rmlint`` was written by:

* Christopher <sahib> Pahl 2010-2015 (https://github.com/sahib)
* Daniel <SeeSpotRun> T.   2014-2015 (https://github.com/SeeSpotRun)

Also see http://rmlint.rtfd.org for other people who helped us.

If you consider a donation you can use *Flattr* or buy us a beer if we meet:

https://flattr.com/thing/302682/libglyr