Digest(3) User Contributed Perl Documentation Digest(3)NAMEFile::RsyncP::Digest - Perl interface to rsync message digest
algorithms
SYNOPSIS
use File::RsyncP::Digest;
$rsDigest = new File::RsyncP::Digest;
# specify rsync protocol version (default is <= 26 -> buggy digests).
$rsDigest->protocol(version);
# file MD4 digests
$rsDigest->reset();
$rsDigest->add(LIST);
$rsDigest->addfile(HANDLE);
$digest = $rsDigest->digest();
$string = $rsDigest->hexdigest();
# Return 32 byte pair of digests (protocol <= 26 and >= 27).
$digestPair = $rsDigest->digest2();
$digest = File::RsyncP::Digest->hash(SCALAR);
$string = File::RsyncP::Digest->hexhash(SCALAR);
# block digests
$digests = $rsDigest->blockDigest($data, $blockSize, $md4DigestLen,
$checksumSeed);
$digests = $rsDigest->blockDigestUpdate($state, $blockSize,
$blockLastLen, $md4DigestLen, $checksumSeed);
$digests2 = $rsDigest->blockDigestExtract($digests16, $md4DigestLen);
DESCRIPTION
The File::RsyncP::Digest module allows you to compute rsync digests,
including the RSA Data Security Inc. MD4 Message Digest algorithm, and
Adler32 checksums from within Perl programs.
Rsync Digests
Rsync uses two main digests (or checksums), for checking with very high
probability that the underlying data is identical, without the need to
exchange the underlying data.
The server (remote) side of rsync generates a checksumSeed (usually
unix time()) that is exchanged during the protocol startup. This seed
is used in both the file and MD4 checksum calculations. This causes
the block and file checksums to change every time Rsync is run.
File Digest
This is an MD4 digest of the checksum seed, followed by the entire
file's contents. This digest is 128 bits long. The file digest is
sent at the end of a file's deltas to ensure that the reconstructed
file is correct. This digest is also optionally computed and sent
as part of the file list if the --checksum option is specified to
rsync.
Block digest
Each file is divided into blocks of default length 700 bytes. The
digest of each block is formed by computing the Adler32 checksum of
the block, and also the MD4 digest of the block followed by the
checksum seed. During phase 1, just the first two bytes of the MD4
digest are sent, meaning the total digest is 6 bytes or 48 bits (4
bytes for Adler32 and the first 2 bytes of the MD4 digest). During
phase 2 (which is necessary for received files that have an
incorrect file digest), the entire MD4 checksum is used (128 bits)
meaning the block digest is 20 bytes or 160 bits. (Prior to rsync
protocol XXX, the full 20 byte digest was sent every time and there
was only a single phase.)
This module contains routines for computing file and block digests in a
manner that is identical to rsync.
Incidentally, rsync contains two bugs in its implementation of MD4 (up
to and including rsync protocol version 26):
· MD4Final() is not called when the data size (ie: file or block size
plus 4 bytes for the checksum seed) is a multiple of 64.
· MD4 is not correct for total data sizes greater than 512MB (2^32
bits). Rsync's MD4 only maintains the data size using a 32 bit
counter, so it overflows for file sizes bigger than 512MB.
The effects of these bugs are benign: the MD4 digest should not be
cryptographically weakened and both sides are consistent.
This module implements both versions of the MD4 digest: the buggy
version for protocol versions <= 26 and the correct version for
protocol versions >= 27. The default mode is the buggy version
(protocol versions <= 26).
You can specify the rsync protocol version to determine which MD4
version is used:
# specify rsync protocol version (default is <= 26 -> buggy digests).
$rsDigest->protocol(version);
Also, you can get both digests in a single call. The result is
returned as a single 32 byte scalar: the first 16 bytes is the buggy
digest and the second 16 bytes is the correct digest:
# Return 32 byte pair of digests (protocol <= 26 and >= 27).
$digestPair = $rsDigest->digest2();
Usage
A new rsync digest context object is created with the new operation.
Multiple simultaneous digest contexts can be maintained, if desired.
Computing Block Digests
After a context is created, the function to compute block checksums is:
$digests = $rsDigest->blockDigest($data, $blockSize, $md4DigestLen,
$checksumSeed)
The first argument is the data, which can contain as much raw data as
you wish (ie: multiple blocks). Both the Adler32 checksum and the MD4
checksum are computed for each block in data. The partial end block
(if present) is also processed. The 4 bytes of the integer
checksumSeed is added at the end of each block digest calculation if it
is non-zero. The blockSize is specified in the second argument
(default is 700). The third argument, md4DigestLen, specifies how many
bytes of the MD4 digest are included in the returned data. Rsync uses
a value of 2 for the first pass (meaning 6 bytes of total digests are
returned per block), and all 16 bytes for the second pass (meaning 20
bytes of total digests are returned per block). The returned number of
bytes is the number of bytes in each digest (Alder32 + partial/compete
MD4) times the number of blocks:
(4 + md4DigestLen) * ceil(length(data) / blockSize);
To allow block checksums to be cached (when checksumSeed is unknown),
and then quickly updated with the known checksumSeed, the checksum data
should be first computed with a digest length of -1 and a checksumSeed
of 0:
$state = $rsDigest->blockDigest($data, $blockSize, -1, 0);
The returned $state should be saved for later retrieval, together with
the length of the last partial block (eg: length($data) % $blockSize).
The length of $state depends upon the number of blocks and the block
size. In addition to the 16 bytes of MD4 state, up to 63 bytes of
unprocessed data per block also is saved in $state. For each block,
16 + ($blockSize % 64)
bytes are saved in $state, so $state is most compact when $blockSize is
a multiple of 64. (The last, partial, block might have a smaller block
size, requiring up to 63 bytes of state even if $blockSize is a
multiple of 64.)
Once the checksumSeed is known the updated checksums can then be
computed using:
$digests = $rsDigest->blockDigestUpdate($state, $blockSize,
$blockLastLen, $md4DigestLen, $checksumSeed);
The first argument is the cached checksums from blockDigest. The third
argument is the length of the (partial) last block.
Alternatively, I hope to add a --checksum-seed=n option to rsync that
allows the checksum seed to be set to 0. This causes the checksum seed
to be omitted from the MD4 calculation and it makes caching the
checksums much easier. A zero checksum seed does not weaken the block
digest. I'm not sure whether or not it weakens the file digest (the
checksum seed is applied at the start of the file digest and end of the
block digest). In this case, the full 16 byte checksums should be
computed using:
$digests16 = $rsDigest->blockDigest($data, $blockSize, 16, 0);
and for phase 1 the 2 byte MD4 substrings can be extracted with:
$digests2 = $rsDigest->blockDigestExtract($digests16, 2);
The original $digests16 does not need any additional processing for
phase 2.
Computing File Digests
In addition, functions identical to Digest::MD4 are provided that allow
rsync's MD4 file digest to be computed. The checksum seed, if non-
zero, is included at the start of the data, before the file's contents
are added.
The context is updated with the add operation which adds the strings
contained in the LIST parameter. Note, however, that "add('foo',
'bar')", "add('foo')" followed by "add('bar')" and "add('foobar')"
should all give the same result.
The final MD4 message digest value is returned by the digest operation
as a 16-byte binary string. This operation delivers the result of add
operations since the last new or reset operation. Note that the digest
operation is effectively a destructive, read-once operation. Once it
has been performed, the context must be reset before being used to
calculate another digest value.
Several convenience functions are also provided. The addfile operation
takes an open file-handle and reads it until end-of file in 1024 byte
blocks adding the contents to the context. The file-handle can either
be specified by name or passed as a type-glob reference, as shown in
the examples below. The hexdigest operation calls digest and returns
the result as a printable string of hexdecimal digits. This is exactly
the same operation as performed by the unpack operation in the examples
below.
The hash operation can act as either a static member function (ie you
invoke it on the MD4 class as in the synopsis above) or as a normal
virtual function. In both cases it performs the complete MD4 cycle
(reset, add, digest) on the supplied scalar value. This is convenient
for handling small quantities of data. When invoked on the class a
temporary context is created. When invoked through an already created
context object, this context is used. The latter form is slightly more
efficient. The hexhash operation is analogous to hexdigest.
EXAMPLES
use File::RsyncP::Digest;
my $rsDigest = new File::RsyncP::Digest;
$rsDigest->add('foo', 'bar');
$rsDigest->add('baz');
my $digest = $rsDigest->digest();
print("Rsync MD4 Digest is " . unpack("H*", $digest) . "\n");
The above example would print out the message
Rsync MD4 Digest is 6df23dc03f9b54cc38a0fc1483df6e21
To compute the rsync phase 1 block checksums (4 + 2 = 6 bytes per
block) for a 2000 byte file containing 700 a's, 700 b's and 600 c's,
with a checksum seed of 0x12345678:
use File::RsyncP::Digest;
my $rsDigest = new File::RsyncP::Digest;
my $data = ("a" x 700) . ("b" x 700) . ("c" x 600);
my $digest = $rsDigest->rsyncChecksum($data, 700, 2, 0x12345678);
print("Rsync block checksums are " . unpack("H*", $digest) . "\n");
This will print:
Rsync block checksums are 3c09a624641bf80b0ce3abd208e8645d5b49
The same result can be achieved in two steps by saving the state, and
then finishing the calculation:
my $state = $rsDigest->blockDigest($data, 700, -1, 0);
my $digest = $rsDigest->blockDigestUpdate($state, 700,
length($data) % 700, 2, 0x12345678);
or by computing full-length MD4 digests, and extracting the 2 byte
version:
my $digest16 = $rsDigest->blockDigest($data, 700, 16, 0x12345678);
my $digest = $rsDigest->blockDigestExtract($digest16, 2);
LICENSE
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2 of the License, or (at your
option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License in
the LICENSE file along with this program; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
02111-1307 USA.
The MD4 algorithm is defined in RFC1320. The basic C code implementing
the algorithm is derived from that in the RFC and is covered by the
following copyright:
MD4 is Copyright (C) 1990-2, RSA Data Security, Inc. All rights
reserved.
License to copy and use this software is granted provided that it
is identified as the "RSA Data Security, Inc. MD4 Message-Digest
Algorithm" in all material mentioning or referencing this software
or this function.
License is also granted to make and use derivative works provided
that such works are identified as "derived from the RSA Data
Security, Inc. MD4 Message-Digest Algorithm" in all material
mentioning or referencing the derived work.
RSA Data Security, Inc. makes no representations concerning either
the merchantability of this software or the suitability of this
software for any particular purpose. It is provided "as is"
without express or implied warranty of any kind.
These notices must be retained in any copies of any part of this
documentation and/or software.
This copyright does not prohibit distribution of any version of Perl
containing this extension under the terms of the GNU or Artistic
licences.
AUTHORFile::RsyncP::Digest was written by Craig Barratt
<cbarratt@users.sourceforge.net> based on Digest::MD4 and the Adler32
implementation was based on rsync 2.5.5.
Digest::MD4 was adapted by Mike McCauley ("mikem@open.com.au"), based
entirely on MD5-1.7, written by Neil Winton
("N.Winton@axion.bt.co.uk").
Rsync was written by Andrew Tridgell <tridge@samba.org> and Paul
Mackerras. It is available under a GPL license. See
<http://rsync.samba.org>.
SEE ALSO
See <http://perlrsync.sourceforge.net> for File::RsyncP's SourceForge
home page.
See File::RsyncP, File::RsyncP::FileIO and File::RsyncP::FileList.
perl v5.18.2 2010-07-25 Digest(3)