[MD5] now in the vault

Hi all,
I finished my own MD5 implementation today and decided to boostify it. It's not header-only, which is slightly inconvenient, but Jamfiles, docs and tests are provided. Comments on the interface and other things are welcome!
http://boost-consulting.com/vault/index.php?action=downloadfile&filename=md5_v1.zip&directory=&
Just extract it to your boost directory and run 'bjam md5'.
Regards, Kevin Sopp

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Kevin Sopp Sent: 29 July 2007 22:01 To: boost@lists.boost.org Subject: [boost] [MD5] now in the vault
Hi all, I finished my own MD5 implementation today and decided to boostify it. It's not header only which is slightly inconvenient but Jamfiles, Docs and Tests are provided. Comments on the interface and other things are welcome!
http://boost-consulting.com/vault/index.php?action=downloadfile&filename=md5_v1.zip&directory=&
just extract it to your boost directory and run 'bjam md5'
At a glance, it looks like a useful addition to the Boost repertoire. Two questions:
1 Is it bound to be header-only? Or is it just that you haven't tried to make it header-only?
2 Some references to vulnerability as a security feature would be useful. Is this a weakness in its role as tamper-evident packaging, or in other applications?
Paul
--- Paul A Bristow Prizet Farmhouse, Kendal, Cumbria UK LA8 8AB +44 1539561830 & SMS, Mobile +44 7714 330204 & SMS pbristow@hetp.u-net.com

On 7/30/07, Paul A Bristow <pbristow@hetp.u-net.com> wrote:
1 Is it bound to be header-only? Or is it just that you haven't tried to make it header-only?
I haven't tried to make it header-only, but it should be easy to do so.
2 Some references to vulnerability as a security feature would be useful. Is this a weakness in its role as tamper-evident packaging, or in other applications?
I'm no expert in this domain and haven't read up on these weaknesses, so I can't say anything useful about that.
After implementing the MD5 algorithm I had a look at SHA1 and found that it uses the same structural approach to computing a message digest; the only differences are the core routine and the size of the digest. What I'm doing right now is factoring out the common code for this family of hash functions so that I can just plug in a different hash core and reuse all the interface code. The interface has become a bit richer and I have implemented the SHA1 algorithm. I am now in bug-fixing mode, but I'm not sure what went wrong yet ;)
Kevin
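The "plug in a different hash core" idea above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the vault: the names toy_core, digest_context and digest_of are hypothetical, the core is a deliberately trivial stand-in rather than MD5 or SHA1, and the padding is simplified (real MD5/SHA1 padding also appends the message length).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical toy core. NOT MD5 or SHA-1 and not secure: it only has the
// shape (block size, state, compress step) that such cores share.
struct toy_core {
    static constexpr std::size_t block_size = 8;  // octets per block
    using state_type = std::uint32_t;

    static state_type initial_state() { return 0x811C9DC5u; }

    // Mix one full block into the state (FNV-1a style mixing).
    static void compress(state_type& state, const std::uint8_t* block) {
        for (std::size_t i = 0; i < block_size; ++i) {
            state ^= block[i];
            state *= 16777619u;
        }
    }
};

// Shared front end: buffering and padding are written once for every core.
template <class Core>
class digest_context {
public:
    void update(const std::uint8_t* data, std::size_t size) {
        for (std::size_t i = 0; i < size; ++i) absorb(data[i]);
    }

    // Simplified padding: a 0x80 marker, then zeros up to the block boundary.
    // Real MD5 and SHA-1 padding also appends the message length.
    typename Core::state_type finish() {
        absorb(0x80);
        while (fill_ != 0) absorb(0x00);
        return state_;
    }

private:
    // Buffer one octet and run the core whenever a block is complete.
    void absorb(std::uint8_t octet) {
        buffer_[fill_++] = octet;
        if (fill_ == Core::block_size) {
            Core::compress(state_, buffer_.data());
            fill_ = 0;
        }
    }

    std::array<std::uint8_t, Core::block_size> buffer_{};
    std::size_t fill_ = 0;
    typename Core::state_type state_ = Core::initial_state();
};

// Convenience wrapper: digest of a whole string in one call.
template <class Core>
typename Core::state_type digest_of(const std::string& text) {
    digest_context<Core> context;
    context.update(reinterpret_cast<const std::uint8_t*>(text.data()), text.size());
    return context.finish();
}
```

With this shape, supporting another algorithm would mean writing only a new core type; digest_context and digest_of stay untouched. Feeding the input in several update() calls gives the same result as one call, since the buffering lives in the shared front end.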

Is your implementation endian invariant?

On 7/31/07, Arash Partow <arash@partow.net> wrote:
Is your implementation endian invariant?
I do not have the code in front of me right now but I believe that I used only portable operations. Of course if your input data changed from little to big endian then it will produce a different hash. You could also run the unit test file that comes with it on a big endian machine and see what happens. Kevin

Hi all,
first of all, my two cents from quickly glancing at the provided MD5 code. Basically all of this was already mentioned by previous posters and by Kevin himself.
* The library should definitely be made header-only, and during my quick inspection I did not discover anything that would stand in the way of that.
* The library should be extended into a more general framework into which other MD algorithms can easily be integrated.
* The implementation should be endian invariant to make cross platform digest checks feasible.
The last observation sparked the idea of introducing a simple abstraction layer I call octet iterators. As I envision them, these, in their most basic form, would be iterators that allow iterating over the octets of builtin types in a defined byte order (e.g. network byte order). In a second step, higher-level constructs could be introduced, of which two come to mind naturally:
1. Easy composition operations to allow iterating over the octets of user-defined types.
2. Adapters of other iterator types to make it possible to easily iterate over the octets of a std::list<anything>, for example.
Such a library would come in handy not only for the problem at hand, but also for other libraries that have to deal with low-level aspects, such as the binary iostreams library discussed recently.
I'd greatly appreciate any feedback on this topic, and if it turns out to be of general use to the Boost community, I would be happy to give it a shot.
Regards, Kimon

On 8/7/07, Kimon Hoffmann <Kimon.Hoffmann@rapidsolution.de> wrote: Hi,
* The library should definitely be made header-only, and during my quick inspection I did not discover anything that would stand in the way of that.
All that is needed is a bit of work.
* The library should be extended into a more general framework into which other MD algorithms can easily be integrated.
I have done that; in fact, in the meantime I have implemented MD4, SHA-1, SHA224, SHA256, SHA384 and SHA512 in addition to MD5. I have added tests for these and a benchmark. I am now finalizing the message digest lazily, which saves a context switch, CPU time and a little memory, because only one context is now needed instead of two. This is still on my disk though and not in the vault.
* The implementation should be endian invariant to make cross platform digest checks feasible.
The message digest algorithms cannot know the layout of your data, i.e. whether the data you're passing in is a bunch of int32_t or int64_t, etc., or mixed types (which will be the case for many binary formats). However, the implementation _is_ endian invariant in the sense that if you pass in the same data on big/little endian machines you will receive the same digest.
The last observation sparked the idea of introducing a simple abstraction layer I call octet iterators.
It sounds interesting, but I doubt there are many people who would use it to iterate over homogeneous binary data. Maybe there is an audience for endian conversion functions, which everyone has written or seen at least once ;)
Kevin
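The kind of endian conversion function mentioned above can be sketched with shifts alone, which keeps it independent of the host's byte order. The names octets4, store_be32, store_le32 and load_be32 are made up for this example. It also shows why the digest depends on the chosen convention: the same integer yields two different octet sequences.

```cpp
#include <array>
#include <cstdint>

using octets4 = std::array<std::uint8_t, 4>;

// Most significant octet first (big endian, network byte order).
inline octets4 store_be32(std::uint32_t v) {
    return {{static_cast<std::uint8_t>(v >> 24),
             static_cast<std::uint8_t>(v >> 16),
             static_cast<std::uint8_t>(v >> 8),
             static_cast<std::uint8_t>(v)}};
}

// Least significant octet first (little endian).
inline octets4 store_le32(std::uint32_t v) {
    return {{static_cast<std::uint8_t>(v),
             static_cast<std::uint8_t>(v >> 8),
             static_cast<std::uint8_t>(v >> 16),
             static_cast<std::uint8_t>(v >> 24)}};
}

// Inverse of store_be32: rebuild the value from big-endian octets.
inline std::uint32_t load_be32(const octets4& o) {
    return (std::uint32_t{o[0]} << 24) | (std::uint32_t{o[1]} << 16) |
           (std::uint32_t{o[2]} << 8) | std::uint32_t{o[3]};
}
```

For the value 0x01020304, store_be32 gives the octets 01 02 03 04 and store_le32 gives 04 03 02 01 on every platform. Hashing one or the other gives different digests, so both sides of a cross-platform check have to agree on one convention.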

Kevin Sopp wrote:
The last observation sparked the idea of introducing a simple abstraction layer I call octet iterators.
It sounds interesting, but I doubt there are many people who would use it to iterate over homogeneous binary data. Maybe there is an audience for endian conversion functions, which everyone has written or seen at least once ;)
This seems to be similar to the "Dataflow iterators" which are part of the serialization library. Robert Ramey

Hi Kevin,
* The implementation should be endian invariant to make cross platform digest checks feasible.
The message digest algorithms cannot know the layout of your data, i.e. whether the data you're passing in is a bunch of int32_t or int64_t, etc., or mixed types (which will be the case for many binary formats). However, the implementation _is_ endian invariant in the sense that if you pass in the same data on big/little endian machines you will receive the same digest.
I'm sorry, of course it is endian invariant, since it hashes sequences of octets and therefore endianness is not an issue. What I actually meant was that if someone wanted to hash custom types, like POD structs, a lot of mindless and error-prone code is needed. The most straightforward approach, casting the instance into an array of chars, is also the most flawed one, because of uninitialized data in padding bytes and, of course, endianness. So the correct solution would be to build a character array by hand, converting values to a defined byte order as necessary. Since your library already provides iterator-based hashing, octet iterators would easily allow hashing user-defined types, performing any needed conversion on the fly.
Regards, Kimon
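The "build a character array by hand" approach described above might look like the following sketch. The struct record and the helpers put_be16, put_be32 and canonical_octets are hypothetical names chosen for illustration. Each field is written in a defined byte order, and padding bytes are never read, because memory is not reinterpreted at all.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical POD type. Compilers typically insert padding after `tag`
// and possibly after `flags`, so sizeof(record) is usually larger than 7.
struct record {
    std::uint8_t  tag;
    std::uint32_t id;
    std::uint16_t flags;
};

// Append a 16-bit value, most significant octet first.
inline void put_be16(std::vector<std::uint8_t>& out, std::uint16_t v) {
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v & 0xFFu));
}

// Append a 32-bit value, most significant octet first.
inline void put_be32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    out.push_back(static_cast<std::uint8_t>(v >> 24));
    out.push_back(static_cast<std::uint8_t>((v >> 16) & 0xFFu));
    out.push_back(static_cast<std::uint8_t>((v >> 8) & 0xFFu));
    out.push_back(static_cast<std::uint8_t>(v & 0xFFu));
}

// Field-by-field serialization: 1 + 4 + 2 = 7 octets on every platform.
inline std::vector<std::uint8_t> canonical_octets(const record& r) {
    std::vector<std::uint8_t> out;
    out.reserve(7);
    out.push_back(r.tag);
    put_be32(out, r.id);
    put_be16(out, r.flags);
    return out;
}
```

The resulting octet sequence is what would be handed to the digest's iterator-based interface. An octet iterator would produce the same sequence lazily, without the intermediate buffer.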
participants (5)
- Arash Partow
- Kevin Sopp
- Kimon Hoffmann
- Paul A Bristow
- Robert Ramey