Any interest in hashing algorithms SHA and/or FNV1a?

I have been working on implementations of the SHA and FNV-1a class of hashing algorithms. Both are header-only.

SHA (Secure Hash Algorithms): implementations for SHA-1, -224, -256, -384, and -512. It does not implement the more recent SHA-3 (Keccak) algorithm. It would not be difficult to add support for SHA-512/224 and SHA-512/256 if they were desired. The routines support bit-level messages, meaning they can correctly digest messages that do not end on a byte boundary. More information here: http://en.wikipedia.org/wiki/Secure_Hash_Algorithm

FNV (Fowler–Noll–Vo): the header supports the FNV-1a algorithm, which has better distribution characteristics than its cousin, FNV-1. I have added support for 32-, 64-, 128-, 256-, and 1024-bit variants. This is not a cryptographically secure algorithm, but it is much faster than its crypto cousins, and as such is a solid algorithm for obtaining unique hash values for, e.g., data structures. I have also implemented a constexpr variant of the algorithm (32-bit and 64-bit only) for const char*, to further performance when needed by pushing the hash to compile-time computation. More information here: http://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function

The algorithms are currently part of the Adobe Source Libraries on GitHub:

SHA: https://github.com/stlab/adobe_source_libraries/blob/sha_cleanup/adobe/sha.h... (note: the master branch version needs updating to this one.)
FNV-1a: https://github.com/stlab/adobe_source_libraries/blob/master/adobe/fnv.hpp

Would these algorithms be of general use to the Boost community? If so I would be willing to submit them formally.

On 11/12/2013 1:52 PM, foster brereton wrote:
I have been working on implementations of the SHA and FNV-1a class of hashing algorithms. Both are header-only.
SHA (Secure Hash Algorithms): implementations for SHA-1, -224, -256, -384, and -512. It does not implement the more recent SHA-3 (Keccak) algorithm. It would not be difficult to add support for SHA-512/224 and SHA-512/256 if they were desired. The routines support bit-level messages, meaning they can correctly digest messages that do not end on a byte boundary. More information here: http://en.wikipedia.org/wiki/Secure_Hash_Algorithm
FNV (Fowler–Noll–Vo): the header supports the FNV-1a algorithm, which has better distribution characteristics than its cousin, FNV-1. I have added support for 32-, 64-, 128-, 256-, and 1024-bit variants. This is not a cryptographically secure algorithm, but it is much faster than its crypto cousins, and as such is a solid algorithm for obtaining unique hash values for, e.g., data structures. I have also implemented a constexpr variant of the algorithm (32-bit and 64-bit only) for const char*, to further performance when needed by pushing the hash to compile-time computation. More information here: http://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function
The algorithms are currently part of the Adobe Source Libraries on GitHub:
SHA: https://github.com/stlab/adobe_source_libraries/blob/sha_cleanup/adobe/sha.h... (note: the master branch version needs updating to this one.)
FNV-1a: https://github.com/stlab/adobe_source_libraries/blob/master/adobe/fnv.hpp
Would these algorithms be of general use to the Boost community? If so I would be willing to submit them formally.
Have you seen: https://github.com/boost-vault/Miscellaneous/blob/master/boost_crypto.zip https://github.com/boost-vault/Miscellaneous/blob/master/crypto_v01.zip

I'll take a look at your ASL links today. Do you have any comparisons with openssl or libcryptocpp?

I'm primarily interested in MD5 & SHA1 for legacy reasons. My understanding is that these algorithms are not parallelizable due to inherent data dependencies. In my situation I need both MD5 & SHA1 over the same sequence. I'd like to treat these as a composite hash. My thought is that this should allow the compiler/processor to better schedule instructions, leading to better performance. That's my hypothesis anyway.

The other issue I have is that each of the above-mentioned APIs duplicates buffering of incoming data. So in my case I've got the disk, the OS, std::streambuf, MD5, and SHA1 each doing buffering. It seems that these could be merged, and there would be benefit to interleaving disk I/O with processing the composite hash on a 64-byte block basis. I know the number of incoming full blocks and would like to call the vault's crypto_v01 process_block functions for those and then call update with the remaining partial block.

So, bottom line - I'm interested. Thanks, Jeff

On Wed, Nov 13, 2013 at 6:43 AM, Jeff Flinn
On 11/12/2013 1:52 PM, foster brereton wrote:
[snip]
Have you seen: https://github.com/boost-vault/Miscellaneous/blob/master/boost_crypto.zip https://github.com/boost-vault/Miscellaneous/blob/master/crypto_v01.zip
I have not - I'll grab them and have a look.
I'll take a look at your ASL links today. Do you have any comparisons with openssl or libcryptocpp?
I do not - I am assuming the specific context you're thinking of is performance?
The other issue I have is that each of the above-mentioned APIs duplicates buffering of incoming data. So in my case I've got the disk, the OS, std::streambuf, MD5, and SHA1 each doing buffering. It seems that these could be merged, and there would be benefit to interleaving disk I/O with processing the composite hash on a 64-byte block basis. I know the number of incoming full blocks and would like to call the vault's crypto_v01 process_block functions for those and then call update with the remaining partial block.
Both FNV and SHA implementations do no additional buffering. They do store internal state to manage the current digest, operating message block, etc. There is no duplication of user data during digest.
So, bottom line - I'm interested.
Great - I know ASL has an MD5 implementation as well (also header-only, with minimal dependencies). I haven't looked at it in a long while, and I'm sure it could do with some cleaning up (especially given what's available now in C++11). If there's enough demand I can roll md5 into the mix, too. -foster

On 11/13/2013 12:21 PM, foster brereton wrote:
On Wed, Nov 13, 2013 at 6:43 AM, Jeff Flinn wrote:
On 11/12/2013 1:52 PM, foster brereton wrote:
[snip]
Have you seen: https://github.com/boost-vault/Miscellaneous/blob/master/boost_crypto.zip https://github.com/boost-vault/Miscellaneous/blob/master/crypto_v01.zip
I have not - I'll grab them and have a look.
I'll take a look at your ASL links today. Do you have any comparisons with openssl or libcryptocpp?
I do not - I am assuming the specific context you're thinking of is performance?
Yes, with the bottom line being overall throughput of the data and hashes of many files.
The other issue I have is that each of the above-mentioned APIs duplicates buffering of incoming data. So in my case I've got the disk, the OS, std::streambuf, MD5, and SHA1 each doing buffering. It seems that these could be merged, and there would be benefit to interleaving disk I/O with processing the composite hash on a 64-byte block basis. I know the number of incoming full blocks and would like to call the vault's crypto_v01 process_block functions for those and then call update with the remaining partial block.
Both FNV and SHA implementations do no additional buffering. They do store internal state to manage the current digest, operating message block, etc. There is no duplication of user data during digest.
Cool if I can call with a single block at a time.
So, bottom line - I'm interested.
Great - I know ASL has an MD5 implementation as well (also header-only, with minimal dependencies). I haven't looked at it in a long while, and I'm sure it could do with some cleaning up (especially given what's available now in C++11). If there's enough demand I can roll md5 into the mix, too.
Very timely! :-) Jeff

On Nov 12, 2013, at 10:52 AM, foster brereton
I have been working on implementations of the SHA and FNV-1a class of hashing algorithms. Both are header-only.
[ snip algorithm descriptions ]
The algorithms are currently part of the Adobe Source Libraries on GitHub:
SHA: https://github.com/stlab/adobe_source_libraries/blob/sha_cleanup/adobe/sha.h... (note: the master branch version needs updating to this one.)
FNV-1a: https://github.com/stlab/adobe_source_libraries/blob/master/adobe/fnv.hpp
Would these algorithms be of general use to the Boost community? If so I would be willing to submit them formally.
I believe so, yes. There's been interest off and on over the years in a Boost.Crypto library. So far, there's been either (a) a lack of people willing to put in the work, or (b) disagreement on the interface. A set of hash implementations would be a nice start for this. I'll try to take a look at your code this week. -- Marshall Clow, Idio Software

On Wed, Nov 13, 2013 at 8:31 AM, Marshall Clow wrote:
On Nov 12, 2013, at 10:52 AM, foster brereton wrote:
[ snip initial proposal / rfc ]
I believe so, yes.
There's been interest off and on over the years in a Boost.Crypto library. So far, there's been either (a) a lack of people willing to put in the work, or (b) disagreement on the interface.
While it is probably too early to start religious wars about interfaces, traditionally I have seen hash algorithms with APIs in triples: init -> update -> finish. I would be in favor of replicating that interface, as well as providing a single digest(...) call that wraps them all together, for more straightforward cases.
A set of hash implementations would be a nice start for this. I'll try to take a look at your code this week.
Great - if you have any questions/comments/concerns just ask. As I mentioned to Jeffrey Flinn I am willing to loop in ASL's md5 implementation, too, if it is worthwhile to do so. -foster

On 12 Nov 2013 at 10:52, foster brereton wrote:
Would these algorithms be of general use to the Boost community? If so I would be willing to submit them formally.
Be aware that AFIO (currently in the peer review queue) is slowly gaining an asynchronous batch hash engine. It's a bit different from your normal hash engine, because it can do things like use SIMD to process four SHA256 streams in parallel, and then use AFIO's closure engine to parallelise that 4-SHA256 processing across multiple cores, i.e. achieve sixteen parallel SHA256 processing streams. This lets one drop the normal 14.9 cycles/byte down to around 1.4 cycles/byte amortised for SHA256 [1], a big win. The batch hash engine uses a compile-time plugin system, so it can be arbitrarily extended with other hash implementations if suitably rewritten to fit.

Of course AFIO may not pass peer review, so any work done here may be moot. That therefore should not inhibit your efforts.

In case anyone wants to know "when will it be ready?", well, being unemployed and having to relocate back to Europe has punched a huge hole in my productivity. Once I'm back (January) I would expect things to move far more swiftly. I should have something which passes basic unit tests this month though - I already have a proof of concept working; it's just converting it into something not so hacked together.

[1] These figures are for an Intel i7-3770K running x64. Other CPUs vary hugely (e.g. ARM, whose NEON unit is not well scheduled by GCC or clang).

Niall -- Currently unemployed and looking for work. Work Portfolio: http://careers.stackoverflow.com/nialldouglas/

On 11/13/2013 12:11 PM, Niall Douglas wrote:
On 12 Nov 2013 at 10:52, foster brereton wrote:
Would these algorithms be of general use to the Boost community? If so I would be willing to submit them formally.
Be aware that AFIO (currently in the peer review queue) is slowly gaining an asynchronous batch hash engine. It's a bit different from your normal hash engine, because it can do things like use SIMD to process four SHA256 streams in parallel, and then using AFIO's closure engine to parallelise that 4-SHA256 processing across multiple cores i.e. achieve sixteen parallel SHA256 processing streams. This lets one drop the normal 14.9 cycles/byte down to around 1.4 cycles/byte amortised for SHA256 [1], a big win. The batch hash engine uses a compile-time plugin system, so it can be arbitrarily extended with other hash implementations if suitably rewritten to fit.
Of course AFIO may not pass peer review, so any work done here may be moot. That therefore should not inhibit your efforts.
In case anyone wants to know "when will it be ready?", well being unemployed and having to relocate back to Europe has punched a huge hole in my productivity. Once I'm back (January) I would expect things to move far more swiftly. I should have something which passes basic unit tests this month though - I already have a proof of concept working, it's just converting it into something not so hacked together.
[1] These figures are for an Intel i7-3770K running x64. Other CPUs vary hugely (e.g. ARM, whose NEON unit is not well scheduled by GCC or clang).
Sounds interesting. Is the hash engine customizable to composite algorithms? Particularly MD5/SHA1?

I'm blocked from using AFIO (on Windows) for my file processing, as I need to go through the BackupRead/BackupWrite APIs, which are explicitly incompatible with overlapped I/O. I haven't looked at whether AFIO could offer benefits with that restriction. Thanks, Jeff

On 13 Nov 2013 at 12:48, Jeff Flinn wrote:
Be aware that AFIO (currently in the peer review queue) is slowly gaining an asynchronous batch hash engine. It's a bit different from your normal hash engine, because it can do things like use SIMD to process four SHA256 streams in parallel, and then using AFIO's closure engine to parallelise that 4-SHA256 processing across multiple cores i.e. achieve sixteen parallel SHA256 processing streams. This lets one drop the normal 14.9 cycles/byte down to around 1.4 cycles/byte amortised for SHA256 [1], a big win. The batch hash engine uses a compile-time plugin system, so it can be arbitrarily extended with other hash implementations if suitably rewritten to fit.
Sounds interesting. Is the hash engine customizable to composite algorithms? Particularly MD5/SHA1?
The API currently works by you instantiating a hash_engine template with the hash you want, e.g. hash_engine<SHA256>. In the resulting class instance you have a member function for creating new hashes, which returns a vector of fresh handles; a batch function for enqueuing extra memory ranges to a hash handle, which returns a vector of futures representing the completed processing of that memory range; and if you pass in a null pointer block it enqueues termination of the hash. Each hash handle provides a future which becomes ready when the hash is terminated.

There is absolutely nothing stopping anyone from creating a hash_engine<MD5> and a hash_engine<SHA1> and feeding incoming memory ranges to both engines. Enqueuing ranges is instantaneous, as hashing occurs asynchronously. I would assume the MD5 engine would likely complete before the SHA1 engine, but that's easily coped with thanks to the futures.
I'm blocked from using AFIO (on Windows) for my file processing, as I need to go through the BackupRead/BackupWrite APIs, which are explicitly incompatible with overlapped I/O. I haven't looked at whether AFIO could offer benefits with that restriction.
AFIO *always* opens all handles with backup semantics, because AFIO understands symlinks on Windows and treats them (nearly) identically to POSIX.

If you go direct to the NT kernel API, the kernel provides a lovely function, NtQueryInformationFile, which when used with the FILE_ALL_INFORMATION class returns all possible metadata about a file, and another function, NtSetInformationFile, which can set all possible metadata about a file. All you need to do above that is to query and store extended attributes and the security object, again both of which are trivial to do when using the NT kernel API directly.

My point is that it is relatively easy to write your own BackupRead and BackupWrite functions, if you skip the Win32 layer and go straight to the kernel. BTW, I would be more than happy to accept any patches adding such a feature to AFIO, because BackupRead and BackupWrite are seriously broken according to Microsoft themselves :)

Niall

On 12.11.2013 22:52, foster brereton wrote:
Would these algorithms be of general use to the Boost community? If so I would be willing to submit them formally.
It would be nice to have the popular cryptographic algorithms in Boost, but I'm not sure that it is reasonable to reimplement them. It may be better, for example, to invite the author of Crypto++ into the Boost community.
The algorithms are currently part of the Adobe Source Libraries on GitHub: SHA:
https://github.com/stlab/adobe_source_libraries/blob/sha_cleanup/adobe/sha.h...
I'm not a lawyer, but I'm afraid that the MIT license is slightly more restrictive than the Boost license. -- Best regards, Sergey Cheban

On Wed, Nov 13, 2013 at 11:27 AM, Sergey Cheban
On 12.11.2013 22:52, foster brereton wrote:
Would these algorithms be of general use to the Boost community? If so I would be willing to submit them formally.
It would be nice to have the popular cryptographic algorithms in Boost, but I'm not sure that it is reasonable to reimplement them. It may be better, for example, to invite the author of Crypto++ into the Boost community.
You are welcome to start down this road - the Crypto++ authors invariably know more about the domain than I do. That said, if my implementations would serve as a stop-gap until they contribute what they can, so be it.
The algorithms are currently part of the Adobe Source Libraries on GitHub: SHA:
https://github.com/stlab/adobe_source_libraries/blob/sha_cleanup/adobe/sha.h...
I'm not a lawyer, but I'm afraid that the MIT license is slightly more restrictive than the Boost license.
The licensing of the files would have to change. I will follow up on this, but there has already been talk about moving ASL over to the Boost license, so if that happens then it would make this issue moot. -foster

I have been working on implementations of the SHA and FNV-1a class of hashing algorithms. Both are header-only.
<snip>
Would these algorithms be of general use to the Boost community?
Yes. Absolutely.
I thought there were some restrictive licensing conditions on the algorithms themselves that were incompatible with the BPL. But others will know this better than I do.
I have two suggestions, if it ever gets that far.
1) Consider making the interface similar to Boost's CRC interface, which already exists. I think this does the init --- digest --- finish thing, more or less, whereby the concept of the final XOR value might be the finish.
2) Please, please, please make them templates suitable for use in *all* systems, including 8-bit through 64-bit microcontrollers. Thereby I explicitly mean making the so-called "padding length" (of bits in your case) a template parameter. Hash algos with any kind of 64-bit use cannot be sensibly ported to a wide range of micros.
I have MD5, SHA-1, SHA-256 proven to work, header-only, on 8-bit through 64-bit compilers, tested in hard real time, if there is any interest. These are, however, byte-oriented.
Keep going!
Sincerely, Chris.
On Wednesday, November 13, 2013 1:40 AM, foster brereton

On Wed, Nov 13, 2013 at 11:40 AM, Christopher Kormanyos
Yes. Absolutely.
I thought there were some restrictive licensing conditions on the algorithms themselves that were incompatible with BPL. But others will know this better than I do.
I have two suggestions, if it ever gets that far.
1) Consider making the interface similar to Boost's CRC interface, which already exists. I think this does the init --- digest --- finish thing, more or less, whereby the concept of the final XOR value might be the finish.
An interface is being provided that closely mimics init-digest-finish. In OpenSSL's implementation of SHA it's init-update-final, and I am thinking it would be best to conform to other SHA implementations as closely as possible.
2) Please, please please make them templates suitable for use in *all* systems including 8-bit through 64-bit microcontrollers. Thereby I explicitly mean making the so called "padding length" (of bits in your case) a template parameter. Hash algos with any kind of 64-bit use cannot be sensibly ported to a wide range of micros.
I'm not entirely sure how the padding length changes based on the processor upon which the routine is running. The pad length is spec'd to SHA, which I am sure you know, so I must be missing something here. Nevertheless I am willing to work with you to get things right on, e.g., 8-bit micros. -foster

2) Please, please please make them templates suitable for use in *all* systems including 8-bit through 64-bit microcontrollers. Thereby I explicitly mean making the so called "padding length" (of bits in your case) a template parameter. Hash algos with any kind of 64-bit use cannot be sensibly ported to a wide range of micros.
I'm not entirely sure how the padding length changes based on the processor upon which the routine is running. The pad length is spec'd to SHA, which I am sure you know, so I must be missing something here. Nevertheless I am willing to work with you to get things right on, e.g., 8-bit micros.

Let me try to add a few details. Most developers only think of large desktop PCs when developing hash algorithms. They are, however, extremely useful for small embedded systems, for such versatile tasks as checking the integrity of ROM/flash memory regions, securing small data packages in communication protocols, or data hiding when field programming with a boot loader, etc.

Consider the counter that counts the total number of bits in the digest sequence. Some people call this the padding length, although I find this name non-intuitive. This is the running total bit count that will be integrated into the hash in the finalize stage. Many programmers simply set the size of this counter to the maximum, which may be an unsigned 64-bit integer. This makes sense when supporting very long data streams. But consider a microcontroller that might handle small data payloads. These might fit into an unsigned 8-bit integer. In this case, forcing the microcontroller to manipulate a 64-bit counter leads to unduly poor performance. So let the user decide the size of the counter via a template parameter.

In addition, there are numerous loops in hashing algorithms that run to an upper limit of, say, 64 or 80. Instead of using a plain integer for these, it is possible to retain portable performance and still handle the size of the loop by using uint_least8_t or uint_fast8_t.

There are several such design choices. If you would like to compare notes, you may be interested in md5, sha1, or sha256 in the directory shown below at git.

https://github.com/ckormanyos/real-time-cpp/tree/master/ref_app/src/math/che...

There is another interface detail that can lead to a useful flexibility.
In addition to the init --- digest --- finalize scheme, some use cases need to finalize a running hash, yet retain its state. This can be accomplished with a kind of function such as "get_result_and_nochange_the_state". In this way, a running hash final result can be obtained, but more bytes can be fed into the hash. This can be a constant member function, as a copy of the hash object needs to be made.

If you would like to compare notes, or need some MD5, SHA-1, SHA-256 test cases based on the NIST vectors, or just would like to see a really efficient microcontroller (and PC) implementation, the codes in my git above may be useful.

Keep up the good work! Sincerely, Chris.
On Wednesday, November 13, 2013 8:56 PM, foster brereton
Yes. Absolutely.
I thought there were some restrictive licensing conditions on the algorithms themselves that were incompatible with BPL. But others will know this better than I do.
I have two suggestions, if it ever gets that far.
1) Consider making the interface similar to Boost's CRC interface, which already exists. I think this does the init --- digest --- finish thing, more or less, whereby the concept of the final XOR value might be the finish.
An interface is being provided that closely mimics init-digest-finish. In OpenSSL's implementation of SHA it's init-update-final, and I am thinking it would be best to conform to other SHA implementations as closely as possible.
2) Please, please please make them templates suitable for use in *all* systems including 8-bit through 64-bit microcontrollers. Thereby I explicitly mean making the so called "padding length" (of bits in your case) a template parameter. Hash algos with any kind of 64-bit use cannot be sensibly ported to a wide range of micros.
I'm not entirely sure how the padding length changes based on the processor upon which the routine is running. The pad length is spec'd to SHA, which I am sure you know, so I must be missing something here. Nevertheless I am willing to work with you to get things right on e.g., 8-bit micros. -foster

On Wed, Nov 13, 2013 at 3:03 PM, Christopher Kormanyos
2) Please, please please make them templates suitable for use in *all* systems including 8-bit through 64-bit microcontrollers. Thereby I explicitly mean making the so called "padding length" (of bits in your case) a template parameter. Hash algos with any kind of 64-bit use cannot be sensibly ported to a wide range of micros.
I'm not entirely sure how the padding length changes based on the processor upon which the routine is running. The pad length is spec'd to SHA, which I am sure you know, so I must be missing something here. Nevertheless I am willing to work with you to get things right on e.g., 8-bit micros.
Let me try to add a few details.
[ snip very helpful details ]
Yes, I see what you are saying now - thanks for the thorough explanation of the issue. I'll go back and chew on this idea, as I like it a lot (and wonder if there are other types that, if made native to the architecture on which the hash is running, would speed it up). - foster

On 11/13/13 6:03 PM, Christopher Kormanyos wrote: ...
There is another interface detail that can lead to a useful flexibility. In addition to init --- digest --- finalize scheme, some use cases need to finalize a running hash, yet retain its state. This can be accomplished with a kind of function such as "get_result_and_nochange_the_state". In this way, a running hash final result can be obtained, but more bytes can be fed into the hash. This can be a constant member function, as a copy of the hash object needs to be made.
I've this exact situation - currently two hashes are maintained.
If you would like to compare notes, or need some MD5, SHA-1, SHA-256 test cases based on the NIST vectors, or just would like to see a really efficient microcontroller (and PC) implementation, the codes in my git above may be useful.
Keep up the good work!
+1 Jeff

On Wed, Nov 13, 2013 at 7:26 PM, Jeff Flinn
On 11/13/13 6:03 PM, Christopher Kormanyos wrote: ...
There is another interface detail that can lead to a useful flexibility. In addition to init --- digest --- finalize scheme, some use cases need to finalize a running hash, yet retain its state. This can be accomplished with a kind of function such as "get_result_and_nochange_the_state". In this way, a running hash final result can be obtained, but more bytes can be fed into the hash. This can be a constant member function, as a copy of the hash object needs to be made.
I've this exact situation - currently two hashes are maintained.
So what if the hash classes were written such that this were the default behavior? Specifically that finalize copies the message block state, returns the final digest to the user, and leaves the engine ready to continue processing the message digest. We can still clear the message block state in the destructor, leaving the postconditions of the class the same at the end of its lifetime. For most use cases things would be no different for the user, but there would be much greater flexibility of use without adding more APIs. - foster

So what if the hash classes were written such that this were the default behavior? Specifically that finalize copies the message block state, returns the final digest to the user, and leaves the engine ready to continue processing the message digest. We can still clear the message block state in the destructor, leaving the postconditions of the class the same at the end of its lifetime. For most use cases things would be no different for the user, but there would be much greater flexibility of use without adding more APIs.
I view this as a design choice.

The tried-and-true init --- digest --- finalize mechanism actually stems from a procedural-language approach. This is because explicit initialization or finalization (as you mention) can be eliminated with class-internal flags. So maybe an object-oriented design needs a slightly altered paradigm.

Maybe it would be a good idea to mimic the interface of Boost.Crc, in so far as there are analogies such as init, final XOR --> finalize, etc.

I think these kinds of design choices, such as the interface, the class template parameters, etc., generally should be agreed upon up front.

Sincerely, Chris.
On Thursday, November 14, 2013 5:31 AM, foster brereton
On 11/13/13 6:03 PM, Christopher Kormanyos wrote: ...
There is another interface detail that can lead to a useful flexibility. In addition to init --- digest --- finalize scheme, some use cases need to finalize a running hash, yet retain its state. This can be accomplished with a kind of function such as "get_result_and_nochange_the_state". In this way, a running hash final result can be obtained, but more bytes can be fed into the hash. This can be a constant member function, as a copy of the hash object needs to be made.
I've this exact situation - currently two hashes are maintained.
So what if the hash classes were written such that this were the default behavior? Specifically that finalize copies the message block state, returns the final digest to the user, and leaves the engine ready to continue processing the message digest. We can still clear the message block state in the destructor, leaving the postconditions of the class the same at the end of its lifetime. For most use cases things would be no different for the user, but there would be much greater flexibility of use without adding more APIs. - foster

On Thu, Nov 14, 2013 at 11:39 AM, Christopher Kormanyos
I view this as a design choice.
The tried-and-true init --- digest --- finalize mechanism actually stems from a procedural-language approach. This is because explicit initialization or finalization (as you mention) can be eliminated with class-internal flags. So maybe an object-oriented design needs a slightly altered paradigm.
Maybe it would be a good idea to mimic the interface of Boost.Crc, in so far as there are analogies such as init, final XOR --> finalize, etc.
I think these kinds of design choices such as the interface, the class template parameters, etc. generally should be agreed upon up front.
Well this is why I asked. If I were to move forward with a formal proposal of these headers into Boost I would want to get a consensus of opinion on these design choices, to have a solid rationale for building it out the way I did. It is obvious others have thought more about these interfaces than I, and there are use cases that I have not thought of that dramatically alter the interface. With that in mind, let me see if I can revise the code to accommodate a broader audience. -foster

Hi Chris, On 11/13/2013 6:03 PM, Christopher Kormanyos wrote: ...
There are several such design choices. If you would like to compare notes, you may be interested in md5, sha1, or sha256 in the directory shown below at git.
https://github.com/ckormanyos/real-time-cpp/tree/master/ref_app/src/math/che...
There is another interface detail that can lead to useful flexibility. In addition to the init --- digest --- finalize scheme, some use cases need to finalize a running hash yet retain its state. This can be accomplished with a function such as "get_result_and_nochange_the_state". In this way, a running hash's final result can be obtained, but more bytes can still be fed into the hash. This can be a const member function, since a copy of the hash object needs to be made.
If you would like to compare notes, or need some MD5, SHA-1, SHA-256 test cases based on the NIST vectors, or just would like to see a really efficient microcontroller (and PC) implementation, the code in my git repository above may be useful.
I've looked at your MD5 class declaration at the above link. In the "input" methods:

void process_data(const std::uint8_t*, count_type);
void process_data(const char*, count_type);

all the code but the call to perform_algorithm() deals with buffering, which ends up being redundant with the buffering typically provided by client code. The boost vault hash api exposes a process_block() method, avoiding redundant buffering. This allows me to do something like:

boost::crypto::md5 h;
boost::for_each(block_range, [&h](const block& b){ h.process_block(b); other(b); });
h.input(tail_data_ptr, tail_data_count);
h.digest();

where block_range::value_type is a complete 64-byte block, each of which gets hashed and passed on for additional processing. In my case the only call to input with data less than the block size is always paired with a digest()/finalize() call, so these could be merged into a digest_with_subblock(tail_data_ptr, tail_data_count) method.

Sometimes I need to break up the block_range into multiple segments, maintaining both outer and inner hashes. Enforcing this break on a block boundary allows me to copy the single outer hash and finalize for the first segment. Additional segments then continue pushing blocks to the outer hash along with pushing to a new inner hash.

Jeff
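A minimal sketch of the block-oriented interface described above, including the merged digest_with_subblock() call. The class and its mixing function are stand-ins for illustration (not MD5 and not the boost vault API); only the shape of the interface matters here:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Block-oriented hash interface: the caller owns buffering, the hash
// only ever sees whole blocks plus one final sub-block call.
class block_hash {
public:
    static constexpr std::size_t block_size = 64;
    using block = std::array<unsigned char, block_size>;

    // Accumulate exactly one full block; no internal buffering.
    void process_block(const block& b) {
        for (unsigned char c : b) mix(c);
        length_ += block_size;
    }

    // The merged call: the only sub-block input is the tail, so it is
    // paired with finalization in one method.
    std::uint64_t digest_with_subblock(const unsigned char* p, std::size_t n) {
        for (std::size_t i = 0; i != n; ++i) mix(p[i]);
        length_ += n;
        mix(0x80);                    // padding marker (stand-in scheme)
        for (int i = 0; i != 8; ++i)  // fold in the total length
            mix(static_cast<unsigned char>(length_ >> (8 * i)));
        return state_;
    }

private:
    void mix(unsigned char c) {
        state_ ^= c;
        state_ *= 0x100000001b3ull;
    }
    std::uint64_t state_ = 0xcbf29ce484222325ull;
    std::uint64_t length_ = 0;
};
```

With this shape, a block range can be pushed block-by-block (and forwarded elsewhere in the same pass), and the tail plus finalization cost a single call.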

On Thu, Nov 14, 2013 at 9:24 AM, Jeff Flinn wrote:
all the code but the call to perform_algorithm() deals with buffering, which ends up being redundant with the buffering typically provided by client code. The boost vault hash api exposes a process_block() method, avoiding redundant buffering. This allows me to do something like:
boost::crypto::md5 h;
boost::for_each(block_range, [&h](const block& b){ h.process_block(b); other(b); });
h.input(tail_data_ptr, tail_data_count);
h.digest();
[snip]

I see what you mean now about the double-buffering, and my hash code is guilty of the same offense. I can look into modifying it to expose the process_block routine, however...

I have been doing profiling of my SHA implementations, and copying the data around doesn't even show up on the radar -- the vast majority of the time, by far, is in the actual digest routine. The profile for MD5 may look different, but at least in the case of SHA I don't know how much the straight buffer processing gains you.

- foster

On 11/14/2013 1:54 PM, foster brereton wrote:
On Thu, Nov 14, 2013 at 9:24 AM, Jeff Flinn wrote: [snip]
all the code but the call to perform_algorithm() deals with buffering, which ends up being redundant with the buffering typically provided by client code. The boost vault hash api exposes a process_block() method, avoiding redundant buffering. This allows me to do something like:
boost::crypto::md5 h;
boost::for_each(block_range, [&h](const block& b){ h.process_block(b); other(b); });
h.input(tail_data_ptr, tail_data_count);
h.digest();
[snip]
I see what you mean now about the double-buffering, and my hash code is guilty of the same offense. I can look into modifying it to expose the process_block routine, however...
I have been doing profiling of my SHA implementations, and copying the data around doesn't even show up on the radar -- the vast majority of the time, by far, is in the actual digest routine. The profile for MD5 may look different, but at least in the case of SHA I don't know how much the straight-buffer-processing gains you.
It may not show up in profiling in isolation. My actual case defines h.process_block() as CompositeHash...

I have been doing profiling of my SHA implementations, and copying the data around doesn't even show up on the radar -- the vast majority of the time, by far, is in the actual digest routine. The profile for MD5 may look different, but at least in the case of SHA I don't know how much the straight-buffer-processing gains you.
Well, earlier I was talking about resource-critical microcontrollers. For a micro with, say, a few kilobytes of RAM, saving 64 of them by eliminating redundant internal buffering is a real gain.

Personally, I see this as more of a design choice. But I think I will try to optimize my own hashes accordingly.
Good stuff!
Sincerely, Chris.
On Thursday, November 14, 2013 7:54 PM, foster brereton wrote: [snip]

On 14 Nov 2013 at 10:54, foster brereton wrote:
I see what you mean now about the double-buffering, and my hash code is guilty of the same offense. I can look into modifying it to expose the process_block routine, however...
I have been doing profiling of my SHA implementations, and copying the data around doesn't even show up on the radar -- the vast majority of the time, by far, is in the actual digest routine. The profile for MD5 may look different, but at least in the case of SHA I don't know how much the straight-buffer-processing gains you.
In AFIO's hash engine I find the SIMD 4-SHA256 implementation is *very* sensitive to unnecessary buffering. I suspect this is because Ivy Bridge can run no more than four prefetchers at once, so the 4-SHA256 implementation, which sucks in four separate streams, exhausts the prefetchers quickly.

BTW, try your hash implementation on an Intel Atom! If you can get it fast there, you'll have covered all your bases, e.g. on ARM etc. I spent quite some time getting my proof-of-concept implementation not running pathologically slow on the Atom 220 netbook I'm using to type this, and it was well worth the effort because I threw away my previous design completely.

Niall

--
Currently unemployed and looking for work.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/

Hi Jeff,
... all the code but the call to perform_algorithm() deals with buffering, which ends up being redundant with the buffering typically provided by client code.
Thank you for this astute observation. Somehow, it never dawned on me that the class-internal buffering can be avoided at the cost of slightly extended zero-buffer and padding calculations for the final (non-modulus-64) block.

It looks like I owe you 64 bytes of precious microcontroller SRAM.
Sincerely, Chris.
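For reference, the "slightly extended" finalization amounts to padding the tail into one or two final 64-byte blocks on the fly, with no persistent internal buffer. A sketch of that calculation for the MD5/SHA-1/SHA-256 padding scheme follows (the function name is hypothetical; the scheme itself is the standard one: a 0x80 marker, zeros to 56 mod 64, then the 64-bit bit count, stored little-endian by MD5 and big-endian by the SHAs):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Build the padded final block(s) for a message whose tail (the bytes
// past the last whole 64-byte block) is supplied directly by the caller.
std::vector<unsigned char> pad_tail(const unsigned char* tail,
                                    std::size_t tail_len,       // must be < 64
                                    std::uint64_t total_bytes,  // whole message
                                    bool big_endian_length) {
    // One final block if marker + length fit, otherwise two.
    std::size_t padded = (tail_len < 56) ? 64 : 128;
    std::vector<unsigned char> out(padded, 0);
    for (std::size_t i = 0; i != tail_len; ++i) out[i] = tail[i];
    out[tail_len] = 0x80; // single 1-bit marker, then zeros
    std::uint64_t bits = total_bytes * 8;
    for (int i = 0; i != 8; ++i) {
        unsigned char byte = static_cast<unsigned char>(bits >> (8 * i));
        // i counts from the least significant byte of the bit count.
        out[padded - (big_endian_length ? 1 + i : 8 - i)] = byte;
    }
    return out;
}
```

For the classic three-byte message "abc" under the SHA (big-endian) convention this yields the familiar single 64-byte block: data, 0x80, zeros, and 0x18 (24 bits) in the last byte.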
On Thursday, November 14, 2013 6:25 PM, Jeff Flinn wrote: [snip]

On 11/14/2013 2:28 PM, Christopher Kormanyos wrote:
Hi Jeff,
... all the code but the call to perform_algorithm() deals with buffering, which ends up being redundant with the buffering typically provided by client code.
Thank you for this astute observation. Somehow, it never dawned on me that the class-internal buffering can be avoided at the cost of slightly extended zero-buffer and padding calculations for the final (non-modulus-64) block.
It looks like I owe you 64 bytes of precious microcontroller SRAM.
Hmm, where to spend them... I haven't done an inventory on the block sizes of all of the hash algorithms, but it seems to me the primary interface of the hash types should include:

- a compile-time block size
- a function to accumulate a block
- a function to accumulate trailing final bytes

IMO buffering should be a separate component, which could be used to implement the layered convenience functions built upon the primary interface. Then I can use Boost.Math's static_lcm to get a minimum required block size, and my composite hash would make an integral number of calls for each sub-hash.

Jeff
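Jeff's proposed layering might look something like the following sketch, where toy_hash is a hypothetical model of the primary interface (compile-time block size, whole-block accumulator, tail accumulator; its mixing is illustrative only) and buffered<> is the separate buffering component built on top:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Minimal model of the proposed primary interface; the mixing is a
// stand-in, not a real digest, and real code would also pad/finalize.
struct toy_hash {
    static constexpr std::size_t block_size = 64;
    void process_block(const std::array<unsigned char, block_size>& b) {
        for (unsigned char c : b) mix(c);
    }
    std::uint64_t digest_with_subblock(const unsigned char* p, std::size_t n) {
        for (std::size_t i = 0; i != n; ++i) mix(p[i]);
        return state;
    }
    void mix(unsigned char c) { state ^= c; state *= 0x100000001b3ull; }
    std::uint64_t state = 0xcbf29ce484222325ull;
};

// Buffering as a separate, reusable layer over any such model: it owns
// the partial-block storage and forwards only whole blocks downward.
template <class Hash>
class buffered {
public:
    void input(const unsigned char* p, std::size_t n) {
        while (n != 0) {
            std::size_t room = Hash::block_size - fill_;
            std::size_t take = (n < room) ? n : room;
            std::memcpy(buf_.data() + fill_, p, take);
            fill_ += take; p += take; n -= take;
            if (fill_ == Hash::block_size) { h_.process_block(buf_); fill_ = 0; }
        }
    }
    auto digest() { return h_.digest_with_subblock(buf_.data(), fill_); }
private:
    Hash h_;
    std::array<unsigned char, Hash::block_size> buf_{};
    std::size_t fill_ = 0;
};
```

Clients that already produce whole blocks use the primary interface directly and pay nothing for buffering; clients with arbitrary-sized input wrap the hash in buffered<> and get the familiar convenience calls.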
participants (7)

- Christopher Kormanyos
- foster brereton
- Jeff Flinn
- Jeff Flinn
- Marshall Clow
- Niall Douglas
- Sergey Cheban