ddar

ddar is a free de-duplicating archiver for Unix. Save space, bandwidth and time by storing duplicate regions of data only once.

Key Features

How It Works

Alternatives

Caveats

Licence

ddar is protected by copyright. If you wish, you may distribute it subject to the terms of the GNU GPL version 3.

Installation

I use Ubuntu 8.04, 10.04 and 10.10 in various combinations. Success and failure reports and contributions for wider system support are appreciated.

Ubuntu

8.04 Hardy Heron: install python-protobuf (all architectures) and then ddar (i386/amd64).

10.04 Lucid Lynx and 10.10 Maverick Meerkat: install ddar (i386/amd64). The installer will automatically pull in the python-protobuf dependency from the official tree.

Debian

For Debian stable and oldstable, use my Ubuntu Hardy backport packages as follows:

5.0 Lenny: install python-protobuf (all architectures) and then ddar (i386/amd64).

6.0 Squeeze: install ddar (i386/amd64). You may also need to pull in the python-protobuf dependency from the official tree.

Python sdist

You will need standard compiler tools (Debian: build-essential), setuptools (Debian: python-setuptools) and the Python development packages (Debian: python-dev) installed.

ddar depends on google.protobuf. Due to a bug in protobuf, setuptools/easy_install cannot currently install this automatically. On Debian and Ubuntu, you can install the package python-protobuf. For Ubuntu 8.04 (Hardy Heron), you can use my backport (all architectures).

Once you have protobuf installed, unpack the source dist and run python setup.py install. The man page is ddar.1; at the moment you have to install this manually if using setuptools (patches welcome!).
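
Because setuptools does not install the man page, it has to be copied by hand. Here is a minimal sketch, assuming the conventional man1 directory under a prefix such as /usr/local. install_manpage is a hypothetical helper and not part of the ddar distribution:

```shell
# Hypothetical helper: copy ddar.1 into <prefix>/share/man/man1.
#   $1 = path to ddar.1 in the unpacked source
#   $2 = install prefix (for example /usr/local; an assumption, adjust as needed)
install_manpage() {
    dest="$2/share/man/man1"
    mkdir -p "$dest" && cp "$1" "$dest/ddar.1" && chmod 644 "$dest/ddar.1"
}

# Intended use from the unpacked source directory (usually needs root):
#   python setup.py install
#   install_manpage ddar.1 /usr/local
```

After this, man ddar should find the page, provided the chosen prefix is on the system's man path.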

Source

The source is used to create the Python sdist and Debian packages. Satisfy the Build-Depends from debian/control and then make sdist or debuild as needed. For older versions of Debian and Ubuntu you can use my backports PPA for protobuf.

Getting Started

Back up from the local machine to a remote archive

    $ tar c source_dir|gzip --rsyncable|ddar cf server:dest_archive

If a member name cannot be determined (for example, because ddar is reading from stdin, as in this example), ddar auto-generates a suitable name based on the current date. To override this behaviour, use -N name.
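
When the date-based default is not descriptive enough, a name can be built explicitly and passed with -N. The following is a minimal sketch. member_name is a hypothetical helper, its naming scheme is only an illustration and is not the format ddar generates itself, and the placement of -N after the archive argument is an assumption, so check the man page:

```shell
# Hypothetical helper: build a member name such as "source_dir-2011-01-31".
# The scheme is illustrative; it is not ddar's own auto-generated format.
member_name() {
    printf '%s-%s\n' "$1" "$(date +%Y-%m-%d)"
}

# Intended use (commented out so the sketch runs without ddar installed;
# the position of -N on the command line is an assumption):
#   tar c source_dir | gzip --rsyncable | ddar cf server:dest_archive -N "$(member_name source_dir)"

# Print the name that would be used today.
member_name source_dir
```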

Back up from the local machine to a local external disk

    $ tar c source_dir|gzip --rsyncable|ddar cf /mnt/external_disk/dest_archive

Back up from a remote machine to a local archive

    $ ddar cf dest_archive server:\!"'tar c source_dir|gzip --rsyncable'"

The ! tells the remote ddar instance to shell out to the specified command, which generates the data on its stdout. The ! must be escaped (here with a backslash) to stop bash from treating it as a history lookup. The remote command is run with sh -c, so the inner single quotes are needed to protect it from expansion.
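
To see what the local shell actually hands to ddar, the argument can be printed instead of executed. This is a small sketch and not part of ddar itself. remote_spec is a variable name chosen here for illustration:

```shell
# Capture the argument exactly as the local shell would pass it to ddar.
# \! becomes a literal "!", and the outer double quotes are removed,
# leaving the inner single quotes in place for the remote side.
remote_spec=server:\!"'tar c source_dir|gzip --rsyncable'"

# prints: server:!'tar c source_dir|gzip --rsyncable'
printf '%s\n' "$remote_spec"
```

The local shell never sees the pipe as a pipeline: it sits inside the double-quoted part of the word, so it only takes effect once the command is run on the remote side.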

Display archive contents

    $ ddar tf archive

Extract a gzipped tarball stored in the archive

    $ ddar xf archive|gzip -dc|tar x

If a member is not specified, ddar will extract the most recently added member (based on insertion order). To extract a specific member, name it as a positional argument.
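
The extraction pipeline can be wrapped in a small shell function that accepts an optional member name. This is a sketch that relies on the behaviour described above, namely that the member follows the archive as a positional argument. restore_member is a name invented here, and tar xf - is used in place of tar x only to make reading from stdin explicit:

```shell
# Hypothetical wrapper: extract a gzipped tarball stored in a ddar archive
# into the current directory.
#   $1 = archive
#   $2 = optional member name (ddar picks the most recently added if omitted)
restore_member() {
    if [ -n "${2:-}" ]; then
        ddar xf "$1" "$2" | gzip -dc | tar xf -
    else
        ddar xf "$1" | gzip -dc | tar xf -
    fi
}

# Intended use:
#   restore_member archive              # most recently added member
#   restore_member archive member_name  # a specific member
```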

Further reading

For more information, see the man page.

Comments and Bug Reports

For now, please contact me directly (reCAPTCHA link via bit.ly). I’ll set something else up when I need to.

Future Directions

Here are some thoughts on possible future improvements. Feedback and feature requests appreciated!

Credits

ddar is sponsored by Synctus, a multi-master, conflict-free, real-time file replication system. ddar was inspired by Tarsnap, a cloud-based de-duplicating backup tool.

