
- What is your evaluation of the design? Please see the documentation section; most of the comments there are about the design at least as much as about how it is documented.
- What is your evaluation of the implementation? The implementation is unsurprisingly of high quality.
- What is your evaluation of the documentation?
The documentation is very comprehensive. Sufficiently so that it called out all the surprising features of what I'd expected would simply be a refined (given the author) implementation of the approach described in the paper "Types Don't Know #". The documentation describes the hash algorithm requirements early. I expected to be unsurprised, and initially was (compared with the hash algorithm requirements in the paper), but the fact that every hash algorithm must have a constructor taking an integer seed and one taking a byte sequence immediately caught my attention. I didn't expect that all of the listed hash algorithms would have either or both of those constructors, so I checked the reference docs for the algorithms to find out what this requirement has added to the "usual" expectations for each algorithm. (A sketch of the required shape, as I read it, follows below.)

FNV-1a looked like a predictable, minimal answer to the question of how to seed something that isn't a keyed hash: just feed the "key" in as a prefix by calling update(). That did make me wonder why the concept was complicated with constructors that don't do anything the user couldn't do post-construction by simply calling update() (the second sketch below illustrates this point). But the xxHash algorithm specification (as opposed to the hash2 library) does define a (result-type-sized) seed, and sure enough hash2 provides a constructor that operates as expected, with the twist that the seed parameter is not of result_type, so it can be larger than the seed the 32-bit algorithm supports. This discrepancy, together with the requirement that different seeds give different results, forces larger seed values to be handled creatively. The approach taken is: if the seed is not small enough to be converted to result_type without overflow, use the low 32 bits for the specified seeding, then mix the high 32 bits into the state "somehow" (I checked the implementation; it is as if the high 32 bits were copied into unsigned char b[4] and update(b, 4) were then called, four times, I think).

So what happens when constructing from a byte sequence? In this case we fall back to default construction followed by update(). Or rather, the docs say update() and a call to result(). That looks odd. Why would one call result()? Are further update() calls valid after calling result()? Reading more of the docs: every hash algorithm is required not only to have these constructors, but also to be an extensible hash. Another surprise. It turns out that result() having the side effect of changing the state is required and expected, and it is being used here to "mix" the seed. I'm not 100% sure why. Not that it's bad, but just how arbitrary is each algorithm's seeding behaviour? Is it reasonable to be creating seeding conventions specific to the library? The obvious answer is: if you don't like it, write a different algorithm implementation.

It is true that, for the purpose of hashing unordered container keys, any "reasonable" seeding convention is good enough, even where hash-flooding protection is needed. Perhaps using the library to hash arbitrary C++ objects for any and all uses is getting into an area where the user needs to be more aware of potential interoperability issues. Even so, the specification for each provided algorithm should be very explicit about exactly how it seeds, and about the fact that there is no interoperability of these features with any other implementation of the "same" algorithm. Unless of course there is: if an algorithm in hash2 follows an external specification for seeding and/or extension, that should be clearly documented and referenced.
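For reference, this is roughly the shape that, as I read the documentation, every hash algorithm is required to have. It is only an illustrative skeleton of the requirements discussed above, not the library's code, and I may be slightly off on the exact parameter types:

    #include <cstdint>
    #include <cstddef>

    // Illustrative skeleton only: the members the documented hash algorithm
    // concept appears to require (declarations, no definitions).
    struct hash_algorithm_sketch
    {
        using result_type = std::uint32_t;  // or an array-like byte sequence

        hash_algorithm_sketch();                                   // default (unseeded)
        explicit hash_algorithm_sketch( std::uint64_t seed );      // seeded from an integer
        hash_algorithm_sketch( void const* seed, std::size_t n );  // seeded from a byte sequence

        void update( void const* data, std::size_t n );  // absorb input, callable repeatedly
        result_type result();                            // non-const: advances the state,
                                                         // so it may be called more than once
    };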
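And to illustrate the point about the integer-seed constructor of an unkeyed algorithm such as FNV-1a adding nothing the user couldn't do themselves: below is a minimal, self-contained FNV-1a sketch (my own, not hash2's implementation, whose documented seeding convention also involves a result() call) in which a hypothetical seed constructor simply feeds the seed bytes in as a prefix via update().

    #include <cstdint>
    #include <cstddef>
    #include <cassert>

    struct fnv1a_32_sketch
    {
        std::uint32_t state = 0x811C9DC5u;  // FNV-1a 32-bit offset basis

        fnv1a_32_sketch() = default;

        // Hypothetical seed constructor: just prefixes the message with the seed bytes.
        explicit fnv1a_32_sketch( std::uint64_t seed )
        {
            update( &seed, sizeof( seed ) );
        }

        void update( void const* p, std::size_t n )
        {
            auto const* b = static_cast<unsigned char const*>( p );
            for( std::size_t i = 0; i < n; ++i )
            {
                state ^= b[ i ];
                state *= 0x01000193u;  // FNV 32-bit prime
            }
        }

        std::uint32_t result() const { return state; }
    };

    int main()
    {
        std::uint64_t seed = 0x0123456789ABCDEFull;

        fnv1a_32_sketch seeded( seed );          // via the constructor
        seeded.update( "hello", 5 );

        fnv1a_32_sketch manual;                  // the user doing the same by hand
        manual.update( &seed, sizeof( seed ) );
        manual.update( "hello", 5 );

        assert( seeded.result() == manual.result() );
    }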
An implementation of KangarooTwelve (https://keccak.team/kangarootwelve.html) would make an interesting addition, as it has a comprehensive set of functions that might be expected to directly fulfil the concepts (among other nice features).
- What is your evaluation of the potential usefulness of the library? Do you already use it in industry?
I haven't used it yet, but I've used hash algorithms for various things: some cryptographic, some purely for high-performance (relative to resources) unordered containers and data structures. I would definitely use this library for unordered-container key hashing, and the convenience of creating a hash for a user-defined type independently of the algorithm choice is a valuable capability. The reason I haven't had much to say about that aspect of the library is that I don't have any complaints or suggestions for improvement, except that I think support for hashing unordered containers should be dropped, because there doesn't seem to be a robust and performant generic solution (if there is, it's not in the library). I would think twice before using the library for any cryptographic purpose, not because of any specific issues (other than the unordered-container issue), but because it is yet another implementation of the raw hash algorithms without any specific benefits, and with an interface that is (for good reason) restricted and tailored to the uses envisioned in the paper.
- Did you try to use the library? With which compiler(s)? Did you have any problems?
I compiled and ran some tests using g++ 14.2. It was very easy to do this; modular Boost is great!
- How much effort did you put into your evaluation?
A few hours.
A glance? A quick reading? In-depth study?
I took a glance, mistook it for a nice simple library to make unordered containers easier to use with arbitrary key types, and thought I wouldn't dig deeper and probably wouldn't have a lot to add beyond voting to accept. But the extended scope of the algorithm concept, and also the aspiration to solve the problem of hashing multisets / unordered containers as key types, made me take more notice. I don't think the generalisations of the hash algorithm concept do any harm, but there is potential for surprise, and the potential to expect that the hash algorithms so defined and encapsulated should be used for "everything". The ergonomics are a bit quirky. I do wonder whether adopting names that are more descriptive of the hash algorithm operation (if we were talking only about sponges I'd say absorb() and squeeze() in place of update() and result()) might be in order, to show just how different the algorithm concepts are, but I'm not sure that every hash algorithm should provide the latter. Perhaps it is more that everything should provide update(), an extensible hash should provide "stir()", and everything should provide a (const) result(); a sketch of what I mean follows below. Surely the few places where result() is called multiple times with the expectation that the state changes can and should make this explicit? I'm also not sure I like result types being either array-like byte sequences or integers, but perhaps optimisers can't quite make that choice irrelevant from a performance perspective. I'm not sure the library should even try to solve the problem of extracting "good" bits from a hash that doesn't have great dispersion properties. There isn't anything "wrong" with a hash function that is fast but less than uniform and certainly not preimage resistant; there is something wrong with trying to "make it work" where it shouldn't be used in the first place.
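To make the naming suggestion concrete, this is the kind of split I have in mind. It is only a sketch of an alternative shape; "stir" is a name I am inventing here, not something the library or the paper proposes.

    #include <cstdint>
    #include <cstddef>

    // Hypothetical alternative shape: a plain hash exposes update() and a
    // const result(); an extensible hash adds an explicit state-advancing
    // operation instead of giving result() that side effect.
    struct plain_hash_sketch
    {
        void update( void const* data, std::size_t n );  // absorb input
        std::uint64_t result() const;                     // observation only, no state change
    };

    struct extensible_hash_sketch
    {
        void update( void const* data, std::size_t n );  // absorb input
        void stir();                                      // hypothetical: explicit state advance
                                                          // for repeated extraction
        std::uint64_t result() const;                     // observation only, no state change
    };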
- Are you knowledgeable about the problem domain? I know enough to be dangerous, and I know enough to know that I am (dangerous). This library is mostly a way to avoid footgun problems while easing the task for an average-knowledge user like me.
Ensure to explicitly include with your review: ACCEPT, REJECT, or
CONDITIONAL ACCEPT (with acceptance conditions). I vote to ACCEPT the library. I don't think adding explicit conditions is necessary as I expect the author will address any issues.