Re: [boost] [hash2][review] Early review (due to holidays)

6 Dec 2024

      On 12/6/24 18:54, Julien Blanc via Boost wrote:
...
6 décembre 2024 à 14:24 "Andrey Semashev via Boost" a écrit:
...
I also had compile-time dependencies in mind when I said "lightweight".
To take the example from the docs:
https://pdimov.github.io/hash2/doc/html/hash2.html#hashing_objects_user_defi...
class X
 {
 private:
 std::string a;
 int b;
// ...
 public:
 template<class Hash, class Flavor>
 friend void tag_invoke( boost::hash2::hash_append_tag const&,
 Hash& h, Flavor const& f, X const& v )
 {
 boost::hash2::hash_append(h, f, v.a);
 boost::hash2::hash_append(h, f, v.b);
 }
 };
This requires one to depend on Boost.Hash2 for hash_append_tag and
hash_append. I would prefer if there was a way (properly documented and
supported) to avoid this dependency entirely. I.e. so that something
like this is a working and advertised way to add support for hashing to
your type:
class X
 {
 private:
 std::string a;
 int b;
// ...
 public:
 template<class Hash, class Flavor>
 friend void hash_append( Hash& h, Flavor const& f, X const& v )
 {
 h.update(f, v.a);
 h.update(f, v.a.size());
 h.update(f, v.b);
 }
 };
X x;
 boost::hash2::fnv1a_64 h;
 boost::hash2::hash_append(h, {}, x); // works
I think i start to understand what you mean. In what i have in mind :
* hash algorithm implementation and interface do not depend on boost::hash2
  (this is the case currently)
* custom object hashing does depend on boost::hash2 (this is also the case 
  currently)
From your examples, it seems to me that you would like custom object hashing
to also be independant from boost::hash2. Is that accurate?
Yes.
...
...
...
...
and potentially compatible with std::hash2, if one ever comes to light.
I do not see how this is related.
What I meant by this is that in the above example Hash could become
std::fnv1a_64 or whatever, if the std library adopts the proposal, and
the existing hashing infrastructure will automatically support it. This
is one of the main selling points of the library/proposal, after all.
X x;
 std::fnv1a_64 h;
 std::hash_append(h, {}, x); // works, with no changes to X
What i had in mind was more:
X x;
std::fnv1a_64 h;
boost::hash2::hash_append(h, {}, x); // works, with no changes to X
Notice the namespace is still boost::hash2. I expect it to work whatever std::fnv1a_64
finally chose as its interface.
Sure, but why use boost::hash2::hash_append when there's, presumably,
std::hash_append, since you've already switched to std::fnv1a_64?

But it doesn't really matter. My main point was that the support for
hashing in user's type X remains functional and doesn't need updating or
duplication.

Now, after Peter's comments on the tag argument, I can see this would be
difficult - at least the tags would be different between
boost::hash2::hash_append and std::hash_append, assuming that both use
the tag_invoke protocol. This is unfortunate, but even then just the
different tags would be better than entire different APIs, so I still
see value in a member-style hash_append API in the h argument (which
would probably be a wrapper or a proxy for the underlying hasher). A way
to have zero forward declarations would be perfect, though.

And I should also mention that I realize that std::hash2 may be
incompatible with Boost.Hash2 or even never be standardized, so the
whole compatibility argument is rather theoretical at this point.
...
On a broader view, Claudio's made a very valid point, which made me realize that the
current review goals may be a bit unusual. The usual review focus on finding the best
suitable interface for the library. However, i don't think such interface exists:
* span<void const> is not available, even with a bleeding edge compiler.
* support for older C++ versions mandates support for void* + size pointer as separate
arguments (no span vocabulary type), which we know will be flagged by compilers as unsafe
So, in my opinion, part of the review should also focus on how will the library 
evolve toward a safer/modern version, with minimal burden for the library user. 
Basically we know that we will deliver an interface that is outdated by current
standards, but that we have very valid reasons to do that (these reasons would 
not hold if we were targetting std). We should think about of we handle the
cohabitation with a safer interface, and how we provide a smooth transition 
path to users, taking full advantage of available compiler features, so that 
when they update their code to newer standards and safer patterns, boost::hash2 is
not a concern and just works flawlessly.
Personally, I think the "safety" argument doesn't really work in
practice. If you want decent performance, you will likely want to
operate on raw pointers and memory in the hasher implementation. In some
cases you might want to use SIMD intrinsics or process bytes as
system-sized words, for example, which is already throwing type safety
out of the window. You'll also not want bounds checking to get in the
way in your hot loop.

I think, the answer to memory safety won't be the span as the argument
type, which will only get in the way in the hasher implementation, but a
markup of the pointer+size arguments so that the compiler is able to
reason about the buffer size anyway. I think, MSVC and gcc already have
attributes of that kind, though I'm not sure what exactly the compilers
are able to do with them. Maybe, those attributes need to be
standardized instead of pushing span everywhere.