
On 08/12/2024 at 12:42, Peter Dimov via Boost wrote:
... this is implied by "equivalent to the original", because if the copy produced a different output, it wouldn't be equivalent.
- Should the HashAlgorithm support calls to "update" and "result" several times, interleaving both calls?
It should, yes. Any calls are allowed, in any order.
Understood.
- I don't know if SHA-1/RIPEMD-128 should be in the library at all (they are insecure, and I'm not sure they are useful in any real context).
SHA-1 is still widely used, e.g. for Git commit hashes. We added RIPEMD-128 in order to have a recommended replacement for MD5 in the case where the digest has to be exactly 128 bits.
Ok.
- The original proposal mentions a "variadic version of hash_append" (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#variadic); maybe that would be a useful utility in the library.
It might be, yes.
Let's add it to the to-do list ;-)
- The hash_append_unordered_range function is surprising for people like me who are not experts on hashing, but it's an important concept for users. The original proposal explains the problem (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#unordered); I feel the problem should also be explained in the Hash2 documentation, along with how the library design solves it (using a copy of the HashAlgorithm, as proposed in N3980).
N3980 also says that "And there are various other schemes. They are all implementable. But they each have their advantages and disadvantages. Therefore this proposal proposes none of them. Should the future expose the ideal hash_append specification for unordered containers, it can always be added at that time."
Why does Hash2 offer a single alternative?
I don't know of a better one.
Constructing a temporary std::set is not a serious alternative. In addition to being allocating, slow, and potentially throwing, it also relies on the existence of operator<, which an unordered container doesn't require.
Ok. Yes, the other alternative described in the original paper is much worse. I'm not sure how much the hash quality is weakened by the current implementation: all element hashes are converted to 64 bits, added, and a final hash is computed. It seems to me that combining and shuffling all the original result_type bits from each individual hash would provide a better hash for the whole sequence, but I'm certainly not able to prove that mathematically...
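For reference, the order-independent scheme being discussed could be sketched like this (std::hash stands in for the per-element hasher here; the library uses a copy of the outer hash algorithm instead):

```cpp
#include <cstdint>
#include <functional>
#include <unordered_set>

// Hash each element independently, reduce each digest to 64 bits,
// and combine with a commutative operation (addition), so the
// container's iteration order does not affect the final value.
std::uint64_t unordered_digest( std::unordered_set<int> const& s )
{
    std::uint64_t sum = 0;

    for( int x: s )
    {
        sum += static_cast<std::uint64_t>( std::hash<int>()( x ) );
    }

    return sum; // this combined value is then fed to the outer hasher
}
```

Two sets holding the same elements produce the same digest regardless of insertion (and therefore iteration) order.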
- N3980 proposes a "type-erased hasher" (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#type_erased_hasher); would it be interesting to include it in Hash2?
I had the type erased hasher in the "planned additions" list for a long time, partly because I thought I needed it for the hash2sum example. But it doesn't meet the hash algorithm requirements, because it doesn't have the correct constructors and because the copy constructor is allocating and potentially throwing (the latter is not, strictly speaking, a violation but it's still undesirable.)
Type erasure requires trade-offs, but for code using idioms like pimpl, you can't use templates, or it's not desirable to template code that previously lived in a .cpp file. One option is for boost::hash2::type_erased_hasher to have enough small buffer optimization (at least for the hashes implemented in the boost::hash2 library itself) so that allocation can be avoided in most cases. A good type_erased_hasher is not trivial, and certainly useful.
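The classic N3980-style type erasure could be sketched roughly like this (hypothetical names, a 64-bit result_type assumed for simplicity, and the small buffer optimization mentioned above omitted, so this version allocates on copy):

```cpp
#include <cstdint>
#include <cstddef>
#include <memory>

// Sketch of a type-erased hasher: a virtual interface plus a
// templated model wrapping any concrete hash algorithm.
class type_erased_hasher
{
private:

    struct concept_t
    {
        virtual ~concept_t() = default;
        virtual void update( void const* p, std::size_t n ) = 0;
        virtual std::uint64_t result() = 0;
        virtual std::unique_ptr<concept_t> clone() const = 0;
    };

    template<class H> struct model_t: concept_t
    {
        H h_;

        explicit model_t( H h ): h_( h ) {}

        void update( void const* p, std::size_t n ) override { h_.update( p, n ); }
        std::uint64_t result() override { return h_.result(); }

        std::unique_ptr<concept_t> clone() const override
        {
            return std::unique_ptr<concept_t>( new model_t( *this ) );
        }
    };

    std::unique_ptr<concept_t> impl_;

public:

    template<class H> explicit type_erased_hasher( H h )
        : impl_( new model_t<H>( h ) ) {}

    // The copy constructor allocates, which is exactly the problem
    // discussed above w.r.t. the hash algorithm requirements.
    type_erased_hasher( type_erased_hasher const& r )
        : impl_( r.impl_->clone() ) {}

    void update( void const* p, std::size_t n ) { impl_->update( p, n ); }
    std::uint64_t result() { return impl_->result(); }
};
```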
- The documentation shows (in the chapter "Use with Unordered Containers") several "hash" classes that are needed to use Hash2 with std/boost unordered_xxx classes (or boost::intrusive::unordered_xxx types). I think it would be a good idea to include such utilities in the Hash2 library (under names like "hasher" and "seeded_hasher").
One of these, or maybe some different one, will eventually be included, once I decide on which.
Great.
- The design says "The size_type member type of the flavor affects how container and range sizes (typically of type size_t) are serialized. Since the size of size_t in bytes can vary, serializing the type directly results in different hash values when the code is compiled for 64 bit or for 32 bit. Using a fixed width type avoids this".
I don't agree with this: between 32- and 64-bit compilation you will still get differences when hashing user-defined types that contain pointers, stored std::size_t/ptrdiff_t members, etc.
There are still a lot of types for which the size type is the only source of variability, e.g. std::vector<int>.
IMHO the xxx_flavour types defined in the library should define size_type as std::size_t. Those wanting hashes that are portable between 32 and 64 bit need to make a bigger effort than just configuring Hash2, and defining size_type as std::uint64_t will probably reduce performance for 32-bit users...
It doesn't.
Show me the numbers ;-) Maybe if benchmark results are added to the documentation this can be shown.
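To make the size_type point above concrete, here is a sketch of the two serialization choices (illustrative functions, not the library's; little-endian byte order assumed for the memcpy):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Widening to a fixed std::uint64_t yields the same 8 input bytes on
// 32-bit and 64-bit targets, so the hash input is identical.
std::vector<unsigned char> serialize_size_fixed( std::size_t n )
{
    std::uint64_t v = n;
    std::vector<unsigned char> out( 8 );
    std::memcpy( out.data(), &v, 8 );
    return out;
}

// Serializing the native std::size_t yields 4 bytes on a 32-bit
// target and 8 on a 64-bit one, so e.g. std::vector<int> hashes
// differently on the two platforms.
std::vector<unsigned char> serialize_size_native( std::size_t n )
{
    std::vector<unsigned char> out( sizeof( n ) );
    std::memcpy( out.data(), &n, sizeof( n ) );
    return out;
}
```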
- There is an asymmetry between the "writeNNle" and "writeNNbe" implementations (little vs. big endian). The little-endian implementations use detail::is_constant_evaluated() and endianness checks to be able to use memcpy in some cases (https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/detail/write.hpp#L26), whereas this is not done in the big-endian implementations (https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/detail/write.hpp#L73).
Probably few people use big-endian platforms (except those like me who still need to compile for powerpc targets...), but I would expect an equivalent implementation for the big-endian functions.
The implementations are equivalent in practice. GCC and Clang recognize the pattern used and turn it into a single mov; the memcpy path is only necessary for MSVC, which doesn't support big endian architectures.
Ok, understood.
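For reference, the byte-shift pattern being discussed looks roughly like this (a sketch, not the library's code); GCC and Clang recognize it and emit a single mov (plus a bswap for the non-native order):

```cpp
#include <cstdint>

// Write a 64-bit value in little-endian order, one byte at a time.
inline void write64le_sketch( unsigned char* p, std::uint64_t v )
{
    for( int i = 0; i < 8; ++i )
    {
        p[ i ] = static_cast<unsigned char>( v >> ( 8 * i ) );
    }
}

// Same pattern in big-endian order; the per-byte shifts are what
// the compilers pattern-match into a single store.
inline void write64be_sketch( unsigned char* p, std::uint64_t v )
{
    for( int i = 0; i < 8; ++i )
    {
        p[ i ] = static_cast<unsigned char>( v >> ( 8 * ( 7 - i ) ) );
    }
}
```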
- class Str example: N is missing its type, the <algorithm> include is missing, operator== has the wrong type...
Sorry, why does operator== have the wrong type?
Because operator== for Str compares X instances instead of Str instances:

    class Str
    {
    private:

        static constexpr N = 32; // missing type

        std::uint8_t size_ = 0;
        char data_[ N ] = {};

    public:

        friend constexpr bool operator==( X const& x1, X const& x2 ) // X instead of Str
        {
            return x1.size_ == x2.size_ &&
                std::equal( x1.data_, x1.data_ + x1.size_, x2.data_ );
        }
    };
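A corrected version of that example might look like this (sketch; constexpr std::equal requires C++20):

```cpp
#include <algorithm>
#include <cstdint>
#include <cstddef>

class Str
{
private:

    static constexpr std::size_t N = 32; // N now has a type

    std::uint8_t size_ = 0;
    char data_[ N ] = {};

public:

    // operator== now takes Str, not X
    friend constexpr bool operator==( Str const& x1, Str const& x2 )
    {
        return x1.size_ == x2.size_ &&
            std::equal( x1.data_, x1.data_ + x1.size_, x2.data_ );
    }
};
```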
- All "Use with Unordered Containers" examples should be compilable, including the needed headers.
The examples are actually compilable, I just stripped the #includes when embedding them in the documentation to avoid the unnecessary repetition. Maybe I shouldn't have. :-)
I've found (at least with my libraries) that some users like to copy-paste examples from the docs and want them to work without having to figure out what's missing.

Best,

Ion