[endian] Use in Hash algorithms

28 May 2010

Popular hash algorithms (MD5, SHA, ...) include a preprocessing stage
that turns bytes (or individual bits) into 32- or 64-bit words using a
particular byte (and bit) order.
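Here is a minimal sketch of the two byte orders involved. It uses hypothetical helpers, load_be32 and load_le32, which are not part of any library. MD5 reads each 32-bit message word little-endian. SHA-1 and SHA-256 read each word big-endian.

    #include <cassert>
    #include <cstdint>

    // Big-endian load (SHA-1 / SHA-256 style):
    // the first byte becomes the most significant byte of the word.
    std::uint32_t load_be32(const unsigned char* p) {
        return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
               (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
    }

    // Little-endian load (MD5 style):
    // the first byte becomes the least significant byte of the word.
    std::uint32_t load_le32(const unsigned char* p) {
        return  std::uint32_t(p[0])        | (std::uint32_t(p[1]) << 8) |
               (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
    }

    // Usage check: the same four bytes give different words in each order.
    void check_loads() {
        const unsigned char bytes[4] = {0x67, 0x45, 0x23, 0x01};
        assert(load_le32(bytes) == 0x01234567u);
        assert(load_be32(bytes) == 0x67452301u);
    }

Written this way, the shifts are independent of the host's own endianness, so the same code gives the same words on any platform.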

For my Hash library, I ended up writing a pack function that takes an
endianness and the number of bits in the input and output values, and
combines or splits the input as needed.

A simple example:

    {
    array<uint8_t, 8> in = {{0x67, 0x45, 0x23, 0x01, 0xEF, 0xCD, 0xAB, 0x89}};
    array<uint32_t, 2> out;
    // little_octet_big_bit: the first octet becomes the low byte of each word
    pack<little_octet_big_bit, 8, 32>(in, out);
    array<uint32_t, 2> eout = {{0x01234567, 0x89ABCDEF}};
    assert(out == eout);
    }

As a bonus, it also handles non-byte units:

    {
    array<uint8_t, 3> in = {{31, 17, 4}};
    array<uint16_t, 1> out;
    // big_bit: the first 5-bit value goes in the most significant position
    pack<big_bit, 5, 15>(in, out);
    array<uint16_t, 1> eout = {{(31 << 10) | (17 << 5) | (4 << 0)}};
    assert(out == eout);
    }
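Below is a minimal sketch of the combining half of such a pack function. It is not the library's implementation. The names pack_sketch and the order enum are hypothetical. Each input element is treated as one indivisible unit, so there is no separate octet-versus-bit ordering as in little_octet_big_bit. Only the case where OutBits is a multiple of InBits is handled. Under those assumptions, it reproduces both examples above.

    #include <array>
    #include <cassert>
    #include <cstddef>
    #include <cstdint>

    // Hypothetical element order for this sketch only:
    // big    = first input element goes in the most significant position,
    // little = first input element goes in the least significant position.
    enum class order { big, little };

    // Combine groups of InBits-wide input values into OutBits-wide outputs.
    // Assumes every input value fits in InBits bits.
    template <order O, unsigned InBits, unsigned OutBits,
              typename InT, std::size_t N, typename OutT, std::size_t M>
    void pack_sketch(const std::array<InT, N>& in, std::array<OutT, M>& out) {
        static_assert(InBits > 0 && OutBits <= 64, "unsupported widths");
        static_assert(OutBits % InBits == 0, "sketch only handles combining");
        static_assert(N * InBits == M * OutBits, "sizes must match");
        constexpr unsigned per_word = OutBits / InBits;
        for (std::size_t i = 0; i < M; ++i) {
            std::uint64_t word = 0;
            for (unsigned j = 0; j < per_word; ++j) {
                std::uint64_t v = in[i * per_word + j];
                unsigned shift = (O == order::big)
                    ? (per_word - 1 - j) * InBits  // first element most significant
                    : j * InBits;                  // first element least significant
                word |= v << shift;
            }
            out[i] = static_cast<OutT>(word);
        }
    }

    // Usage check mirroring the first example (8-bit octets into 32-bit words).
    void check_examples() {
        std::array<std::uint8_t, 8> in = {{0x67, 0x45, 0x23, 0x01,
                                           0xEF, 0xCD, 0xAB, 0x89}};
        std::array<std::uint32_t, 2> out;
        pack_sketch<order::little, 8, 32>(in, out);
        assert(out[0] == 0x01234567u && out[1] == 0x89ABCDEFu);
    }

Splitting, where OutBits is smaller than InBits, would run the same arithmetic in reverse, extracting each output value with a shift and a mask.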

An extensive set of examples can be found here:
<http://svn.boost.org/svn/boost/sandbox/hash/libs/hash/test/pack.cpp>

I use it to turn the input into words, to turn the message length into
words for padding, to figure out where in the word the "1" padding bit
goes, and to turn the state back into octets for display. Enough
optimizations are selected via SFINAE that the first example reduces
to a memcpy on x86, yet pack doesn't require contiguous input. (It's
perfectly happy with single-pass input, though usually somewhat slower
in that case.)
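The padding uses can be made concrete with a stand-alone sketch of MD5/SHA-1 style padding for a byte-aligned message. It uses plain shifts rather than pack, and split_u64 and pad_message are hypothetical names. The "1" bit lands as the octet 0x80 because both algorithms take bits within an octet most significant first. The 64-bit bit length is split into octets least significant first for MD5, and most significant first for SHA-1/SHA-256.

    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Split a 64-bit value into 8 octets: little == true emits the least
    // significant octet first (MD5), little == false the most significant
    // octet first (SHA-1 / SHA-256).
    void split_u64(std::uint64_t v, bool little, unsigned char out[8]) {
        for (int i = 0; i < 8; ++i) {
            int shift = little ? 8 * i : 8 * (7 - i);
            out[i] = static_cast<unsigned char>(v >> shift);
        }
    }

    // Pad a byte-aligned message: a 0x80 octet (the "1" bit then seven 0 bits),
    // zero octets until the length is 56 mod 64, then the original length in
    // bits as 8 octets in the requested order.
    std::vector<unsigned char> pad_message(const std::vector<unsigned char>& msg,
                                           bool little_length) {
        std::vector<unsigned char> padded(msg);
        padded.push_back(0x80);
        while (padded.size() % 64 != 56)
            padded.push_back(0x00);
        unsigned char len[8];
        split_u64(static_cast<std::uint64_t>(msg.size()) * 8, little_length, len);
        padded.insert(padded.end(), len, len + 8);
        return padded;
    }

    // Usage check: "abc" is 24 bits long and pads to one 64-octet block.
    void check_padding() {
        std::vector<unsigned char> abc = {'a', 'b', 'c'};
        std::vector<unsigned char> md5 = pad_message(abc, true);
        assert(md5.size() == 64 && md5[3] == 0x80 && md5[56] == 24);
    }

The same split_u64 loop, applied to each state word, is also the kind of conversion needed when turning the final state back into octets for display.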

I'm not sure how widely applicable this form of the solution would be,
but I think it's a case where both the byte-swapping version and
Beman's swap-on-load approach are awkward.

~ Scott McMurray
