[hash2][review] Early review (due to holidays)

Hello.
My review is the summary of a few discussions I've had with Peter and
Christian regarding the API design being employed for hash2. I'm aware
hash2 is inspired by one of Hinnat's proposals to standardisation that
didn't come to fruition. I would like first to commend the authors for
their work on this project. Having good libraries for this type of task is
important, especially when the big names like OpenSSL provide not-so-great
interfaces. Having said that, I have a few thoughts on what this library
may be missing.
*Spanification*
There's a significant change in mood with regards to safe library design
taking place in the industry. With clang, we now
have -Wunsafe-buffer-usage, and this warning has been continuously improved
recently. For reference https://clang.llvm.org/docs/SafeBuffers.html
When I was going through this library's code I was left with a frustrating
thought that boost should be at the forefront of this type of innovation:
safe, and intuitive library design. Each function that takes a pair of
void*/size_t should be a span

On Wed, Dec 4, 2024 at 2:09 AM Claudio DeSouza via Boost
function that takes a pair of void*/size_t should be a span
or rather `std::span<[const ]std::byte>`. --DD https://en.cppreference.com/w/cpp/types/byte https://en.cppreference.com/w/cpp/container/span/as_bytes

Dominique Devienne wrote:
On Wed, Dec 4, 2024 at 2:09 AM Claudio DeSouza via Boost
wrote: function that takes a pair of void*/size_t should be a span
or rather `std::span<[const ]std::byte>`. --DD
https://en.cppreference.com/w/cpp/types/byte https://en.cppreference.com/w/cpp/container/span/as_bytes
Or rather, the nonexistent span<void>, which is like void*, but better. N3980's (equivalent of) update takes by void const*. Originally, I had that changed to unsigned char const* (or even to byte_type const*, with the idea of switching to std::byte eventually), but this proved very cumbersome in practice. Untyped byte sequences sometimes come as char[], sometimes as unsigned char[], sometimes (if you're modern) as std::byte[], and taking by any of these means that the rest need an ugly reinterpret_cast. So I switched `update` back to void const*. The "type safety" gains aren't worth the substantial decrease in usability. (And then had to add unsigned char const* back because constexpr.) The hypothetical span<void> would take either of those three byte-like types (and maybe char8_t[] as well), so that the user doesn't need to reinterpret_cast.

So I switched `update` back to void const*. The "type safety" gains aren't worth the substantial decrease in usability.
In my experience, the usability/readability gains are on the side of using span. This may involve providing extensions, but at the end of the day you get more C++-friendly constructs.

By the way, C++20 offers byte extensions https://en.cppreference.com/w/cpp/container/span/as_bytes and interestingly enough they do use `std::byte`.

On Wed, Dec 4, 2024 at 3:23 PM Claudio DeSouza via Boost < boost@lists.boost.org> wrote:
By the way, C++20 offers byte extensions
Not to speak for author, but if I understand correctly library is C++11 library intentionally(it explicitly mentions g++ 4.8 in documentation, which is 2013 compiler), so any std::byte interface would be optional/#ifdef flavor, and I am not a huge fan of those APIs. Sure we could add boost::byte to boost, but that is a different discussion.

Ivan Matek wrote:
On Wed, Dec 4, 2024 at 3:23 PM Claudio DeSouza via Boost < boost@lists.boost.org> wrote:
By the way, C++20 offers byte extensions
Not to speak for author, but if I understand correctly library is C++11 library intentionally(it explicitly mentions g++ 4.8 in documentation, which is 2013 compiler), so any std::byte interface would be optional/#ifdef flavor, and I am not a huge fan of those APIs.
It is C++11, yes. We briefly considered requiring C++14 for constexpr reasons, but feedback on Slack wasn't positive.

As I explained in my review, yes, for the authors to accommodate what I'm
asking, it is fair to say that this library would have to depend on some
robust theoretical span library, span2 if you will, and that's why I didn't
vote accept nor reject, because it seems unfair to impose such a
requirement. There are certainly challenges to reconcile these demands
across different C++ standards.
On Wed, Dec 4, 2024 at 4:14 PM Ivan Matek
On Wed, Dec 4, 2024 at 3:23 PM Claudio DeSouza via Boost < boost@lists.boost.org> wrote:
By the way, C++20 offers byte extensions
Not to speak for author, but if I understand correctly library is C++11 library intentionally(it explicitly mentions g++ 4.8 in documentation, which is 2013 compiler), so any std::byte interface would be optional/#ifdef flavor, and I am not a huge fan of those APIs. Sure we could add boost::byte to boost, but that is a different discussion.

Claudio DeSouza wrote:
So I switched `update` back to void const*. The "type safety" gains aren't worth the substantial decrease in usability.
In my experience, the usability/readability gains are on the side of using span. This may involve providing extensions, but at the end of the day you get more C++-friendly constructs.
Well... no, they aren't. Suppose you have the data in `std::vector<unsigned char> v`. With the current `update` that's hash.update( v.data(), v.size() ); And if you have the data in `std::string st`, that's hash.update( st.data(), st.size() ); Now suppose `update` takes `span<unsigned char const>`. For `v`, this is hash.update( v ); which is nice; but for `st`, that would be (assuming C++17) hash.update( as_bytes( span{st} ) ); which is not so nice. In Chromium you have as_byte_span, which does as_bytes(span(x)), so it'd be better.

I tend to disagree. In my opinion even this:
hash.update( as_bytes( span{st} ) );
Is better than this:
hash.update( v.data(), v.size() );
Something like as_byte_span is just an extra nicety, but I think it is easy
to underestimate how you can avoid mistakes with the first example as
opposed to the second.
On Wed, Dec 4, 2024 at 2:38 PM Peter Dimov via Boost
Claudio DeSouza wrote:
So I switched `update` back to void const*. The "type safety" gains aren't worth the substantial decrease in usability.
In my experience, the usability/readability gains are on the side of using span. This may involve providing extensions, but at the end of the day you get more C++-friendly constructs.
Well... no, they aren't.
Suppose you have the data in `std::vector<unsigned char> v`. With the current `update` that's
hash.update( v.data(), v.size() );
And if you have the data in `std::string st`, that's
hash.update( st.data(), st.size() );
Now suppose `update` takes `span<unsigned char const>`. For `v`, this is
hash.update( v );
which is nice; but for `st`, that would be (assuming C++17)
hash.update( as_bytes( span{st} ) );
which is not so nice.
In Chromium you have as_byte_span, which does as_bytes(span(x)), so it'd be better.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Claudio DeSouza wrote:
I tend to disagree. In my opinion even this:
hash.update( as_bytes( span{st} ) );
Is better than this:
hash.update( v.data(), v.size() );
Maybe. But the two cases with the current interface are consistent - you do the same thing for both vector and string - and the two cases with the "normal" span aren't. I would prefer a span-style interface where the two cases look the same, like this hash.update( v ); hash.update( st ); There'd be no questions here as to whether this is a usability gain compared to the current interface. (Ideally, in a hypothetical stdlib, .data would return a span and the above would be hash.update( v.data() ); hash.update( st.data() ); to make it explicit that we're dropping to a lower level interface, but one can't have everything.)
Something like as_byte_span is just an extra nicety, but I think it is easy to underestimate how you can avoid mistakes with the first example as opposed to the second.
On Wed, Dec 4, 2024 at 2:38 PM Peter Dimov via Boost
wrote: Claudio DeSouza wrote:
So I switched `update` back to void const*. The "type safety" gains aren't worth the substantial decrease in usability.
In my experience, the usability/readability gains are on the side of using span. This may involve providing extensions, but at the end of the day you get more C++-friendly constructs.
Well... no, they aren't.
Suppose you have the data in `std::vector<unsigned char> v`. With the current `update` that's
hash.update( v.data(), v.size() );
And if you have the data in `std::string st`, that's
hash.update( st.data(), st.size() );
Now suppose `update` takes `span<unsigned char const>`. For `v`, this is
hash.update( v );
which is nice; but for `st`, that would be (assuming C++17)
hash.update( as_bytes( span{st} ) );
which is not so nice.
In Chromium you have as_byte_span, which does as_bytes(span(x)), so it'd be better.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Am 04.12.24 um 17:01 schrieb Peter Dimov via Boost:
Claudio DeSouza wrote:
I tend to disagree. In my opinion even this:
hash.update( as_bytes( span{st} ) );
Is better than this:
hash.update( v.data(), v.size() ); Maybe.
But the two cases with the current interface are consistent - you do the same thing for both vector and string - and the two cases with the "normal" span aren't.
I would prefer a span-style interface where the two cases look the same, like this
hash.update( v ); hash.update( st );
There'd be no questions here as to whether this is a usability gain compared to the current interface.
(Ideally, in a hypothetical stdlib, .data would return a span and the above would be
hash.update( v.data() ); hash.update( st.data() );
to make it explicit that we're dropping to a lower level interface, but one can't have everything.)
If the hash-algorithms would accept `span<const void>` (or <const char>) instead as the only `update` method (no need for overloads) we could have hash_append( span{v} ); hash_append( span{st} ); by accepting a templated span and casting it to the span-type expected by the hash-algorithms. And/or a `hash2::span` that accepts any range-like in the ctor and doing the cast. This would allow the save interface Claudio proposed while still having the same interface. Or at least have it at the level of the hash algorithm as user interaction is through `hash_append` provided by the library so that could make all necessary "spanification" if I didn't miss anything

Alexander Grund wrote:
If the hash-algorithms would accept `span<const void>` (or <const char>) instead as the only `update` method (no need for overloads) we could have
hash_append( span{v} ); hash_append( span{st} );
by accepting a templated span and casting it to the span-type expected by the hash-algorithms. And/or a `hash2::span` that accepts any range-like in the ctor and doing the cast.
Yes, obviously.
However, there's an additional concern here that people perhaps don't
appreciate. The hash algorithm requirements govern user-defined algorithms
as well. N3980 wasn't supposed to be a library of hash algorithms; it was
supposed to be a framework into which any user-defined hash algorithm
can plug into, and be used for hashing C++ objects.
That's why the hash algorithm interface is kept minimal and uncomplicated
(which is at odds with making it as ergonomic as possible.)
I can obviously support any form of `update` by making it templated and do
the appropriate enable_if incantations, but I don't want to impose this
requirement on hash algorithm authors.
And I don't want to force hash algorithm authors to have to include

And I don't want to force hash algorithm authors to have to include
, because this would mean a physical dependency on Hash2. At the moment, you can define a hash algorithm without including anything from Hash2, or anything from Boost. And that's how it has to be. But you are currently defining the interface of the hash algorithms: constexpr void update( unsigned char const* data, std::size_t n ); My proposal is to define it as e.g. constexpr void update(span<unsigned char const>);
I.e. to go range-safe from the start. That would not depend on the hash library but on the chosen span library, which might be more acceptable. However in C++11/14 this would require the Boost span, so I guess this is also off the table, unless the interface would be required to support span-likes which could be done by users via templates and not require a specific span implementation. But even with the current interface of "potentially unsafe" hash algorithms the `hash_append` could be written to accept only spans/ranges. The proposed `hash2::span` would then only appear in the interface of the library, i.e. `hash_append` My thinking is: If this is intended to push N3980 forward for inclusion in a future standard then I'd expect it to not use the "old-style" pointer+size-pairs but std::span readily available in the targeted standard.

Alexander Grund wrote:
But you are currently defining the interface of the hash algorithms: constexpr void update( unsigned char const* data, std::size_t n ); My proposal is to define it as e.g. constexpr void update(span<unsigned char const>);
Which span is that, though?
I.e. to go range-safe from the start. That would not depend on the hash library but on the chosen span library, which might be more acceptable. However in C++11/14 this would require the Boost span, so I guess this is also off the table, unless the interface would be required to support span-likes which could be done by users via templates and not require a specific span implementation.
Even if we could assume C++20 and std::span, the interface taking std::span<unsigned char const> is inferior in usability to the current approach because it doesn't work well for ranges of `char` or `byte`.
But even with the current interface of "potentially unsafe" hash algorithms the `hash_append` could be written to accept only spans/ranges.
I'm not sure what this means. hash_append already accepts spans and ranges, and is type safe.

Am 05.12.24 um 13:55 schrieb Peter Dimov via Boost:
Alexander Grund wrote:
But you are currently defining the interface of the hash algorithms: constexpr void update( unsigned char const* data, std::size_t n ); My proposal is to define it as e.g. constexpr void update(span<unsigned char const>); Which span is that, though? I left that open to the "chosen span library" I referenced belowor a template that accepts a span-like type. Even if we could assume C++20 and std::span, the interface taking std::span<unsigned char const> is inferior in usability to the current approach because it doesn't work well for ranges of `char` or `byte`. I copied that from the source and missed it in the documentation. So span<void const> then or as that isn't accepted span<char const> assuming sizeof(char)==sizeof(byte) I'm not sure what this means. hash_append already accepts spans and ranges, and is type safe. My bad, sorry for not double-checking.

Alexander Grund wrote:
So span<void const> then or as that isn't accepted span<char const> assuming sizeof(char)==sizeof(byte)
span<void const> doesn't exist, though. I was giving it as an example of something that I would have added to the hypothetical hash2::span, but no existing span library has it. If we limit ourselves to std::span, I suppose we can just add three (or four) overloads, one per byte type. (Four because of char8_t.) Definitely something to consider when Boost drops support for C++17. Should be any time now.

On Thu, Dec 5, 2024 at 7:31 AM Peter Dimov via Boost
Alexander Grund wrote:
So span<void const> then or as that isn't accepted span<char const> assuming sizeof(char)==sizeof(byte)
span<void const> doesn't exist, though. I was giving it as an example of something that I would have added to the hypothetical hash2::span, but no existing span library has it.
If we limit ourselves to std::span, I suppose we can just add three (or four) overloads, one per byte type.
(Four because of char8_t.)
This part confuses me. It's always safe to cast to void const *; what not just have a single template that takes a std::span<T>, and cast it's .begin() and .end() to void const *? Zach

Zach Laine wrote:
On Thu, Dec 5, 2024 at 7:31 AM Peter Dimov via Boost
wrote: Alexander Grund wrote:
So span<void const> then or as that isn't accepted span<char const> assuming sizeof(char)==sizeof(byte)
span<void const> doesn't exist, though. I was giving it as an example of something that I would have added to the hypothetical hash2::span, but no existing span library has it.
If we limit ourselves to std::span, I suppose we can just add three (or four) overloads, one per byte type.
(Four because of char8_t.)
This part confuses me. It's always safe to cast to void const *; what not just have a single template that takes a std::span<T>, and cast it's .begin() and .end() to void const *?
If you mean this template<class T> void update( std::span<T> sp ); then the reason to not use it is that you can't pass things like `string` and `vector<unsigned char>` and `char[65536]` directly because deduction will not work. I'm not particularly fond of the user-facing syntax being hash.update( std::span{buffer} ); instead of hash.update( buffer );

On Thu, Dec 5, 2024 at 10:47 AM Peter Dimov via Boost
Zach Laine wrote:
On Thu, Dec 5, 2024 at 7:31 AM Peter Dimov via Boost
wrote: Alexander Grund wrote:
So span<void const> then or as that isn't accepted span<char const> assuming sizeof(char)==sizeof(byte)
span<void const> doesn't exist, though. I was giving it as an example of something that I would have added to the hypothetical hash2::span, but no existing span library has it.
If we limit ourselves to std::span, I suppose we can just add three (or four) overloads, one per byte type.
(Four because of char8_t.)
This part confuses me. It's always safe to cast to void const *; what not just have a single template that takes a std::span<T>, and cast it's .begin() and .end() to void const *?
If you mean this
template<class T> void update( std::span<T> sp );
then the reason to not use it is that you can't pass things like `string` and `vector<unsigned char>` and `char[65536]` directly because deduction will not work.
I'm not particularly fond of the user-facing syntax being
hash.update( std::span{buffer} );
instead of
hash.update( buffer );
Thanks, that makes sense. Looking at the docs, I think I get it now -- this is a user-provided API that your lib requires, right? If that's the case, I think I'd also require what you have, since there's no single concrete span type you could write there. An alternative might be to simply change the API depending on the value of __cplusplus -- void* + size_t before C++20, and std::spanstd::byte in C++20 and later. I realize some people might hate this. :) Zach

On 12/5/24 15:27, Alexander Grund via Boost wrote:
And I don't want to force hash algorithm authors to have to include
, because this would mean a physical dependency on Hash2. At the moment, you can define a hash algorithm without including anything from Hash2, or anything from Boost. And that's how it has to be. But you are currently defining the interface of the hash algorithms: constexpr void update( unsigned char const* data, std::size_t n ); My proposal is to define it as e.g. constexpr void update(span<unsigned char const>); I.e. to go range-safe from the start. That would not depend on the hash library but on the chosen span library, which might be more acceptable. However in C++11/14 this would require the Boost span, so I guess this is also off the table, unless the interface would be required to support span-likes which could be done by users via templates and not require a specific span implementation.
But even with the current interface of "potentially unsafe" hash algorithms the `hash_append` could be written to accept only spans/ranges. The proposed `hash2::span` would then only appear in the interface of the library, i.e. `hash_append`
My thinking is: If this is intended to push N3980 forward for inclusion in a future standard then I'd expect it to not use the "old-style" pointer+size-pairs but std::span readily available in the targeted standard.
What is the practical advantage of using span (of any provider) over the
traditional pointer+size? I emphasize the "practical" part, as opposed
to any personal preference to "modern-ness", code style or anything like
that.
I'm asking because in my practice I often don't see the point in wrapper
classes such as span

What is the practical advantage of using span (of any provider) over the traditional pointer+size? I emphasize the "practical" part, as opposed to any personal preference to "modern-ness", code style or anything like that.
Like I explained in my original review, there is now a new type of program analysis guarded behind `-Wunsafe-buffers-usage` that this library would greatly benefit from, if it were spanified. The warning itself is opt-in to users. In the end of the day, this is about removing the chance of certain categories of mistakes around the use of pointers, while improving readability.

Andrey Semashev wrote:
What is the practical advantage of using span (of any provider) over the traditional pointer+size? I emphasize the "practical" part, as opposed to any personal preference to "modern-ness", code style or anything like that.
The practical advantage is that if a library is rewritten to use spans throughout, and never pointer arithmetic, this (a) automatically adds bounds checks everywhere and (b) makes the library warning-free under the new -Wunsafe-buffer-usage, which is all the rage nowadays because Chromium developers are rewriting Chromium as above. https://clang.llvm.org/docs/SafeBuffers.html Longer term, it's quite likely that a memory safety "Profile" (Profiles are also all the rage nowadays because they are Bjarne and Herb's proposal to "solve" safety) will be essentially equivalent to -Werror=unsafe-buffer-usage as far as pointer arithmetic is concerned. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3081r0.pdf

No dia 5 de dez. de 2024, às 12:52, Peter Dimov via Boost
escreveu: […] And I don't want to force hash algorithm authors to have to include
, because this would mean a physical dependency on Hash2. At the moment, you can define a hash algorithm without including anything from Hash2, or anything from Boost. And that's how it has to be.
You can define the interface of update as accepting a templatized arg with span-like semantics. Joaquin M Lopez Munoz

Joaquín M López Muñoz wrote:
No dia 5 de dez. de 2024, às 12:52, Peter Dimov via Boost
escreveu: […] And I don't want to force hash algorithm authors to have to include
, because this would mean a physical dependency on Hash2. At the moment, you can define a hash algorithm without including anything from Hash2, or anything from Boost. And that's how it has to be. You can define the interface of update as accepting a templatized arg with span-like semantics.
That's what I said in the part you omitted. I consider this a burden that I'd rather not impose on authors of hash algorithms.

I particularly think that at this point span is heading on a direction to be as fundamental as string_view, etc. Claudio

On Thu, Dec 5, 2024 at 5:11 AM Joaquín M López Muñoz via Boost < boost@lists.boost.org> wrote:
You can define the interface of update as accepting a templatized arg with span-like semantics.
This could work. Instead of writing: h.update( span ); You instead write: boost::hash2::update( h, span ); The signatures for update can be conditionally compiled based on C++ version, availability of boost::span, or it could be a function template based on a span-like concept. Hasher's update signature would remain as ( void const*, std::size_t ). Authors of hash algorithms would not need to include a span header or add additional signatures. Thanks to Matt Borland for helping me work this out. Regards

based on a span-like concept. Hasher's update signature would remain as ( void const*, std::size_t ). Authors of hash algorithms would not need to include a span header or add additional signatures.
Am I missing something? Why would that be a benefit? Having safe interfaces is the whole point I was raising. A span header is not something onerous to include. Why should any hashing authors not be provided with friendly, safer interfaces? Claudio.

Le jeudi 05 décembre 2024 à 12:06 -0800, Vinnie Falco via Boost a écrit :
On Thu, Dec 5, 2024 at 5:11 AM Joaquín M López Muñoz via Boost < boost@lists.boost.org> wrote:
You can define the interface of update as accepting a templatized arg with span-like semantics.
This could work. Instead of writing:
h.update( span );
You instead write:
boost::hash2::update( h, span );
+1 on this approach. The proposed library currently consists of two things : * a concept that hashers must use (for hash implementors) * an hash_append helper to ease the use of hash functions with objects I think it should evolve into something more like : * a concept that hashers must use * a public api for hash users This means that for the library user, the hash object would basically be an opaque type, only accessed through calls like boost::hash2::hash_append(h, X) or boost::hash2::update(h, Y). The public api can then use a set of overloads for Y to anything we want to support (the multiple needed span<X>, a pair of pointers, a pointer + a size, etc.). That burden is left out of hash implementors.
The signatures for update can be conditionally compiled based on C++ version, availability of boost::span, or it could be a function template based on a span-like concept.
+1
Hasher's update signature would remain as (void const*, std::size_t ).
Or, hasher's signature can be selected through a set of supported ones. I'm not convinced by the ”hash writer shall not depend on boost::hash2” argument, but if that really matters then it's technically doable without forcing the use of the pointer + size pair for hash implementors. Regards, Julien

You can define the interface of update as accepting a templatized arg with span-like semantics.
This could work. Instead of writing:
h.update( span );
You instead write:
boost::hash2::update( h, span );
+1 on this approach.
The proposed library currently consists of two things :
* a concept that hashers must use (for hash implementors) * an hash_append helper to ease the use of hash functions with objects As far as I understood this is exactly how the library is implemented: It is not `boost::hash2::update( h, span )` but `boost::hash2::hash_append( h, {}, span )` but thats minor.
So this discussion is only on which "concept that hashers must use".
If you mean this
template<class T> void update( std::span<T> sp );
then the reason to not use it is that you can't pass things like `string` and `vector<unsigned char>` and `char[65536]` directly because deduction will not work.
I'm not particularly fond of the user-facing syntax being
hash.update( std::span{buffer} );
instead of
hash.update( buffer ); Related to the above: Isn't the user-facing API the `hash_append` function and this update method is part of the concept for hash algorithms? And `hash_append` can massage the input into whatever that concept requires.
Given that I kind see that point why void*-size is acceptable, except for the new warning/safety-guidline against such pairs. If that should be adhered too (note that I no longer lean strongly either way) the interface could be defined using a SpanConcept: `void update(SpanConcept)` where `concept SpanConcept = requires(T t) { t.data(); t.size(); }` (Use of C++20 syntax only for illustration) So implementers are not required to support a particular span but to accept any span-like that `hash_append` passes to it. However I see the shortcoming that this would possibly exclude `template<typename T> void update(std::span<T>)` because `hash_append` might use `boost::span` which isn't(?) convertible to std::span (or whatever span type the hash algorithm accepts). So hash-algorithms need to use `template<class T> void update(T data_span)` exclusively, which isn't great either. --> void*-size is likely the most flexible and least burden for hash algorithm authors. If there is the way to silence the compiler warning in a reasonable way I don't see a reason against it.

6 décembre 2024 à 09:31 "Alexander Grund via Boost" a écrit:
The proposed library currently consists of two things :
* a concept that hashers must use (for hash implementors) * an hash_append helper to ease the use of hash functions with objects
As far as I understood this is exactly how the library is implemented: It is not `boost::hash2::update( h, span )` but `boost::hash2::hash_append( h, {}, span )` but thats minor.
I have a different understanding. Quoting the doc, “This library addresses two major use cases: hashing an untyped sequence of bytes, and hashing C++ objects.”. But for the former, the library relies on the user to directly call the hasher object update function. This is the part i don't feel comfortable, with, i think a hasher completly opaque for the user of the library would be a better design. That's just a minor change in term of implementation.
So this discussion is only on which "concept that hashers must use".
I believe part of the discussion is also a misunderstanding on what the public api is. It seems we don't all have the same opinion on this. Mine is that update is also part of the public api, not only hash_append. This public api needs to be at least compatible with modern / safe paradigms. This is not necessarily
Related to the above: Isn't the user-facing API the `hash_append` function and this update method is part of the concept for hash algorithms? And `hash_append` can massage the input into whatever that concept requires.
If i understand correctly: std::span<char> v("123", 3) hash_append(h, v); // hash "123" + the size 3, as v is a container h.update(v); // hash only "123". Or did i miss something? Regards, Julien

On 12/6/24 12:22, Julien Blanc via Boost wrote:
6 décembre 2024 à 09:31 "Alexander Grund via Boost" a écrit:
The proposed library currently consists of two things :
* a concept that hashers must use (for hash implementors) * an hash_append helper to ease the use of hash functions with objects
As far as I understood this is exactly how the library is implemented: It is not `boost::hash2::update( h, span )` but `boost::hash2::hash_append( h, {}, span )` but thats minor.
I have a different understanding.
Quoting the doc, “This library addresses two major use cases: hashing an untyped sequence of bytes, and hashing C++ objects.”.
But for the former, the library relies on the user to directly call the hasher object update function. This is the part i don't feel comfortable, with, i think a hasher completly opaque for the user of the library would be a better design. That's just a minor change in term of implementation.
A hasher must expose a common public interface anyway to be compatible with hash_append, and at this point why not use the interface in the user types hash functions directly? What value a free hash_append function would bring? The only thing that comes to mind is it could have a more flexible interface, as Vinnie suggested, to keep the hasher interface minimal, but that's about it. I do think though that whether or not a free hash_append function is provided, using the hasher directly through its public interface should be allowed. This would allow the user types' hash functions not depend on Boost.Hash2 and therefore make support for hashing more lightweight and potentially compatible with std::hash2, if one ever comes to light.

6 décembre 2024 à 11:15 "Andrey Semashev via Boost" a écrit:
A hasher must expose a common public interface anyway to be compatible with hash_append
This is stricter than what is actually required. hash_append can be made to work with hashers having a different interface, as long as they follow a concept known to hash_append / hash_update. This concept does not need to rely on types defined by boost::hash2, the dependency is the other way (boost::hash2 must include a specialization for the interface of the hasher).
, and at this point why not use the interface in the user types hash functions directly? What value a free hash_append function would bring? The only thing that comes to mind is it could have a more flexible interface, as Vinnie suggested, to keep the hasher interface minimal, but that's about it.
That is imho largely worth it. This also gives a lot more flexibility / room for evolution. There is an order of magnitude between the number of usages of the hashing api, and the number of hashers used in a project. Changing the hashing api means changing every call to hash_append (resp hash_update). Changing the hasher concept only means changing a few files. But, as i said before, you can even support multiple hasher concepts in hash_append / hash_update, so changing the hasher recommended api can be made while keeping old hasher api still working. Something that cannot be done if you talk directly to the hasher.
I do think though that whether or not a free hash_append function is provided, using the hasher directly through its public interface should be allowed.
I don't suggest to make changes to specifically forbid such use. Just that the nominal / advertised usage uses a thin wrapper (which, i believe, would be fully inlined and not result in additional runtime cost).
This would allow the user types' hash functions not depend on Boost.Hash2 and therefore
This is not required by what i suggest.
make support for hashing more lightweight
I expect the wrapper to incur no additional runtime cost (over manual conversions that would be done at call site instead).
and potentially compatible with std::hash2, if one ever comes to light.
I do not see how this is related. Actually, i think it's quite the opposite: by decoupling the user API from the hasher concept API, you should make it easier to integrate new standards / requirements on either side. Or i'm missing something obvious here. Regards, Julien

On 12/6/24 14:13, Julien Blanc via Boost wrote:
6 décembre 2024 à 11:15 "Andrey Semashev via Boost" a écrit:
A hasher must expose a common public interface anyway to be compatible with hash_append
This is stricter than what is actually required. hash_append can be made to work with hashers having a different interface, as long as they follow a concept known to hash_append / hash_update. This concept does not need to rely on types defined by boost::hash2, the dependency is the other way (boost::hash2 must include a specialization for the interface of the hasher).
, and at this point why not use the interface in the user types hash functions directly? What value a free hash_append function would bring? The only thing that comes to mind is it could have a more flexible interface, as Vinnie suggested, to keep the hasher interface minimal, but that's about it.
That is imho largely worth it. This also gives a lot more flexibility / room for evolution. There is an order of magnitude between the number of usages of the hashing api, and the number of hashers used in a project. Changing the hashing api means changing every call to hash_append (resp hash_update). Changing the hasher concept only means changing a few files.
But, as i said before, you can even support multiple hasher concepts in hash_append / hash_update, so changing the hasher recommended api can be made while keeping old hasher api still working. Something that cannot be done if you talk directly to the hasher.
I do think though that whether or not a free hash_append function is provided, using the hasher directly through its public interface should be allowed.
I don't suggest to make changes to specifically forbid such use. Just that the nominal / advertised usage uses a thin wrapper (which, i believe, would be fully inlined and not result in additional runtime cost).
This would allow the user types' hash functions not depend on Boost.Hash2 and therefore
This is not required by what i suggest.
make support for hashing more lightweight
I expect the wrapper to incur no additional runtime cost (over manual conversions that would be done at call site instead).
I also had compile-time dependencies in mind when I said "lightweight".
To take the example from the docs:
https://pdimov.github.io/hash2/doc/html/hash2.html#hashing_objects_user_defi...
class X
{
private:
std::string a;
int b;
// ...
public:
template
and potentially compatible with std::hash2, if one ever comes to light.
I do not see how this is related.
What I meant by this is that in the above example Hash could become std::fnv1a_64 or whatever, if the std library adopts the proposal, and the existing hashing infrastructure will automatically support it. This is one of the main selling points of the library/proposal, after all. X x; std::fnv1a_64 h; std::hash_append(h, {}, x); // works, with no changes to X

Andrey Semashev wrote:
I also had compile-time dependencies in mind when I said "lightweight". To take the example from the docs:
https://pdimov.github.io/hash2/doc/html/hash2.html#hashing_objects_user_defi...
class X { private: std::string a; int b;
// ... public: template
friend void tag_invoke( boost::hash2::hash_append_tag const&, Hash& h, Flavor const& f, X const& v ) { boost::hash2::hash_append(h, f, v.a); boost::hash2::hash_append(h, f, v.b); } }; This requires one to depend on Boost.Hash2 for hash_append_tag and hash_append. I would prefer if there was a way (properly documented and supported) to avoid this dependency entirely.
Once things stabilize, you will be allowed to copy hash_append_fwd.hpp locally (or just declare the functions yourself.) This has always been the intent, but GCC bugs and BOOST_CXX14_CONSTEXPR complicated matters. As I mentioned, we briefly considered requiring C++14 so that the latter isn't an issue, but the feedback on Slack was negative. People love C++11 for some reason or other. Either way, this still wouldn't have solved the GCC issue. https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/hash_append...

As I mentioned, we briefly considered requiring C++14 so that the latter isn't an issue, but the feedback on Slack was negative. People love C++11 for some reason or other. Either way, this still wouldn't have solved the GCC issue.
But is Slack representative of a broader C++ audience. It depends which channel you asked at, because if you asked at #boost, you will have a biased response in favour of C++11.

Claudio DeSouza wrote:
As I mentioned, we briefly considered requiring C++14 so that the latter isn't an issue, but the feedback on Slack was negative. People love C++11 for some reason or other. Either way, this still wouldn't have solved the GCC issue.
But is Slack representative of a broader C++ audience. It depends which channel you asked at, because if you asked at #boost, you will have a biased response in favour of C++11.
Well, we can ask here. Do people care about C++11 support in Hash2? Would it be acceptable to impose a minimum requirement of C++14?

On 12/6/24 17:01, Peter Dimov via Boost wrote:
Claudio DeSouza wrote:
As I mentioned, we briefly considered requiring C++14 so that the latter isn't an issue, but the feedback on Slack was negative. People love C++11 for some reason or other. Either way, this still wouldn't have solved the GCC issue.
But is Slack representative of a broader C++ audience. It depends which channel you asked at, because if you asked at #boost, you will have a biased response in favour of C++11.
Well, we can ask here.
Do people care about C++11 support in Hash2? Would it be acceptable to impose a minimum requirement of C++14?
There has to be a cost/benefit analysis. That is, if C++14 is the baseline, will that significantly affect the library design and implementation? And what is the estimate of C++11-only user base that will be cut off? On the first question, library authors are probably the ones who can comment. The second one is more difficult to answer, but I think someone posted here survey results a while ago. Stock compiler versions in the major OSes is another indication. Personally, I'm fine with anything up to and including C++17.

On Fri, Dec 6, 2024 at 6:02 AM Peter Dimov via Boost
Do people care about C++11 support in Hash2? Would it be acceptable to impose a minimum requirement of C++14?
If C++14 allows for a better public-facing design then it is probably best to require it. Can we get a Tony Table? Thanks

Vinnie Falco wrote:
On Fri, Dec 6, 2024 at 6:02 AM Peter Dimov via Boost
mailto:boost@lists.boost.org > wrote: Do people care about C++11 support in Hash2? Would it be acceptable to impose a minimum requirement of C++14?
If C++14 allows for a better public-facing design then it is probably best to require it. Can we get a Tony Table?
No, the public-facing design doesn't change. `BOOST_CXX14_CONSTEXPR` is replaced by the less ugly `constexpr` and that's pretty much it.

On Fri, Dec 6, 2024 at 7:33 AM Peter Dimov
No, the public-facing design doesn't change. `BOOST_CXX14_CONSTEXPR` is replaced by the less ugly `constexpr` and that's pretty much it.
My next question is for those individuals who proclaim that requiring C++14 and later is ok. What is the specific rationale you used to determine whether the loss of users is worth the benefit of "BOOST_CXX14_CONSTEXPR replaced by the less ugly constexpr`?" Thanks

6 décembre 2024 à 16:48 "Vinnie Falco via Boost" a écrit:
On Fri, Dec 6, 2024 at 7:33 AM Peter Dimov
wrote: No, the public-facing design doesn't change. `BOOST_CXX14_CONSTEXPR` is replaced by the less ugly `constexpr` and that's pretty much it.
My next question is for those individuals who proclaim that requiring C++14 and later is ok. What is the specific rationale you used to determine whether the loss of users is worth the benefit of "BOOST_CXX14_CONSTEXPR replaced by the less ugly constexpr`?"
Not speaking for the others, but “i don't use c++11 anymore” is my usual answer. Finally dropped gcc4.8, migrated everything to C++20, won't go back for sure :). Regards, Julien

Vinnie Falco wrote:
On Fri, Dec 6, 2024 at 7:33 AM Peter Dimov
mailto:pdimov@gmail.com > wrote: No, the public-facing design doesn't change. `BOOST_CXX14_CONSTEXPR` is replaced by the less ugly `constexpr` and that's pretty much it.
My next question is for those individuals who proclaim that requiring C++14 and later is ok. What is the specific rationale you used to determine whether the loss of users is worth the benefit of "BOOST_CXX14_CONSTEXPR replaced by the less ugly constexpr`?"
They don't need a rationale. I'm asking their opinions on whether a C++14 requirement would be fine _for them_. Not asking them to speculate whether a C++14 requirement wouldn't be fine _for some unspecified other people_.

On Fri, Dec 6, 2024 at 7:57 AM Peter Dimov
They don't need a rationale. I'm asking their opinions on whether a C++14 requirement would be fine _for them_. Not asking them to speculate whether a C++14 requirement wouldn't be fine _for some unspecified other people_.
When I read "Do people care about C++11 support in hash2" I interpret "people" as "unspecified other people". And when I read "would it be acceptable to impose a minimum requirement of C++14" I understand this to be asking if it is acceptable to impose the C++14 requirement on everyone. My apologies for the misunderstanding. Thanks

On Fri, Dec 6, 2024 at 7:57 AM Peter Dimov pdimov@gmail.com wrote:
They don't need a rationale. I'm asking their opinions on whether a C++14 requirement would be fine for them. Not asking them to speculate whether a C++14 requirement wouldn't be fine for some unspecified other people.
When I read "Do people care about C++11 support in hash2" I interpret "people" as "unspecified other people". And when I read "would it be acceptable to impose a minimum requirement of C++14" I understand this to be asking if it is acceptable to impose the C++14 requirement on everyone. My apologies for the misunderstanding.
Thanks
To my knowledge now that RHEL 7 has gone EOL there is no supported OS without a C++14 compatible compiler by default. Naked constexpr will break MSVC 14.0, but that's not a big loss. Anecdotally Math and Multiprecision moved to C++14 a few years ago with 0 complaints. I know for fact those run on pretty special hardware (e.g. Oak Ridge's supercomputers) with their own specialized compilers and the situation is fine. Matt

On 12/6/24 18:48, Vinnie Falco via Boost wrote:
On Fri, Dec 6, 2024 at 7:33 AM Peter Dimov
wrote: No, the public-facing design doesn't change. `BOOST_CXX14_CONSTEXPR` is replaced by the less ugly `constexpr` and that's pretty much it.
My next question is for those individuals who proclaim that requiring C++14 and later is ok. What is the specific rationale you used to determine whether the loss of users is worth the benefit of "BOOST_CXX14_CONSTEXPR replaced by the less ugly constexpr`?"
If you're addressing me, I've expressed my thoughts in my previous post: https://lists.boost.org/Archives/boost/2024/12/258515.php I'm not proclaiming anything, and instead I'm saying that the decision is on the authors. If the authors do not think that "replacing BOOST_CXX14_CONSTEXPR with conspexpr" is worth the cost, then don't require C++14, simple.

On 06.12.24 15:01, Peter Dimov via Boost wrote:
Do people care about C++11 support in Hash2? Would it be acceptable to impose a minimum requirement of C++14?
Personally, I'm OK with anything up to C++20. Upgrading my codebase to C++20 was difficult, but it's done now, and I won't be going back. On a more philosophical note, I'd rather see a new library drop C++11 support before it is released than 20 years after it is released. Dropping support will prevent some projects from using the library, but it won't break any existing code because that code does not exist yet. Dropping support later *will* break existing code at that point, even if the C++ standard being dropped is 33 years old at that point. -- Rainer Deyke - rainerd@eldwood.com

On 12/6/24 16:47, Peter Dimov via Boost wrote:
Andrey Semashev wrote:
I also had compile-time dependencies in mind when I said "lightweight". To take the example from the docs:
https://pdimov.github.io/hash2/doc/html/hash2.html#hashing_objects_user_defi...
class X { private: std::string a; int b;
// ... public: template
friend void tag_invoke( boost::hash2::hash_append_tag const&, Hash& h, Flavor const& f, X const& v ) { boost::hash2::hash_append(h, f, v.a); boost::hash2::hash_append(h, f, v.b); } }; This requires one to depend on Boost.Hash2 for hash_append_tag and hash_append. I would prefer if there was a way (properly documented and supported) to avoid this dependency entirely.
Once things stabilize, you will be allowed to copy hash_append_fwd.hpp locally (or just declare the functions yourself.)
This has always been the intent, but GCC bugs and BOOST_CXX14_CONSTEXPR complicated matters.
As I mentioned, we briefly considered requiring C++14 so that the latter isn't an issue, but the feedback on Slack was negative. People love C++11 for some reason or other. Either way, this still wouldn't have solved the GCC issue.
https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/hash_append...
Forward declarations are kind of fragile and they limit future evolution of the library. And they are tedious to add, considering that they would be needed everywhere where hashing support is needed. I would rather prefer the approach that doesn't need forward declarations.

Andrey Semashev wrote:
https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/hash_append...
Forward declarations are kind of fragile and they limit future evolution of the library. And they are tedious to add, considering that they would be needed everywhere where hashing support is needed. I would rather prefer the approach that doesn't need forward declarations.
Maybe. Something could probably be figured out, but I don't think your sketch would be it. We can't (or rather, don't want to) require from hash algorithm authors to provide the full hash_append functionality via member functions, and wrapping doesn't work that well, at least in my head. Maybe I need to pass a helper object that provides the hash_append API as member functions. Abusing the tag for that would be a bit too clever. :-)

On 12/6/24 17:34, Peter Dimov via Boost wrote:
Andrey Semashev wrote:
https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/hash_append...
Forward declarations are kind of fragile and they limit future evolution of the library. And they are tedious to add, considering that they would be needed everywhere where hashing support is needed. I would rather prefer the approach that doesn't need forward declarations.
Maybe. Something could probably be figured out, but I don't think your sketch would be it.
We can't (or rather, don't want to) require from hash algorithm authors to provide the full hash_append functionality via member functions, and wrapping doesn't work that well, at least in my head.
Maybe I need to pass a helper object that provides the hash_append API as member functions.
Yes, that sounds good to me.
Abusing the tag for that would be a bit too clever. :-)
On that note, what's the use of the tag argument? I have only seen hash_append_tag so far, and I don't think adding new tags in the future would be any different from using separate functions (i.e. to support a new tag would require about the same amount of user's code modifications as adding a new friend function to the user's classes).

Andrey Semashev wrote:
On that note, what's the use of the tag argument? I have only seen hash_append_tag so far, and I don't think adding new tags in the future would be any different from using separate functions (i.e. to support a new tag would require about the same amount of user's code modifications as adding a new friend function to the user's classes).
This is the tag_invoke pattern: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1895r0.pdf Instead of every library making up its own identifier for ADL purposes (e.g. in Hash2 I used do_hash_append in the past), we use tag_invoke with a library-specific tag. The advantage is that the tag is namespaced. If two libraries pick the same name for their ADL customization point, they will conflict; but if two libraries pick the same tag name, they will not conflict because the tags are in their respective namespaces. We use tag_invoke in Boost.JSON as well.

6 décembre 2024 à 14:24 "Andrey Semashev via Boost" a écrit:
I expect the wrapper to incur no additional runtime cost (over manual conversions that would be done at call site instead).
I also had compile-time dependencies in mind when I said "lightweight". To take the example from the docs:
https://pdimov.github.io/hash2/doc/html/hash2.html#hashing_objects_user_defi...
class X { private: std::string a; int b;
// ... public: template
friend void tag_invoke( boost::hash2::hash_append_tag const&, Hash& h, Flavor const& f, X const& v ) { boost::hash2::hash_append(h, f, v.a); boost::hash2::hash_append(h, f, v.b); } }; This requires one to depend on Boost.Hash2 for hash_append_tag and hash_append. I would prefer if there was a way (properly documented and supported) to avoid this dependency entirely. I.e. so that something like this is a working and advertised way to add support for hashing to your type:
class X { private: std::string a; int b;
// ... public: template
friend void hash_append( Hash& h, Flavor const& f, X const& v ) { h.update(f, v.a); h.update(f, v.a.size()); h.update(f, v.b); } }; X x; boost::hash2::fnv1a_64 h; boost::hash2::hash_append(h, {}, x); // works
I think i start to understand what you mean. In what i have in mind : * hash algorithm implementation and interface do not depend on boost::hash2 (this is the case currently) * custom object hashing does depend on boost::hash2 (this is also the case currently)
From your examples, it seems to me that you would like custom object hashing to also be independant from boost::hash2. Is that accurate?
and potentially compatible with std::hash2, if one ever comes to light.
I do not see how this is related.
What I meant by this is that in the above example Hash could become std::fnv1a_64 or whatever, if the std library adopts the proposal, and the existing hashing infrastructure will automatically support it. This is one of the main selling points of the library/proposal, after all.
X x; std::fnv1a_64 h; std::hash_append(h, {}, x); // works, with no changes to X
What i had in mind was more: X x; std::fnv1a_64 h; boost::hash2::hash_append(h, {}, x); // works, with no changes to X Notice the namespace is still boost::hash2. I expect it to work whatever std::fnv1a_64 finally chose as its interface. On a broader view, Claudio's made a very valid point, which made me realize that the current review goals may be a bit unusual. The usual review focus on finding the best suitable interface for the library. However, i don't think such interface exists: * span<void const> is not available, even with a bleeding edge compiler. * support for older C++ versions mandates support for void* + size pointer as separate arguments (no span vocabulary type), which we know will be flagged by compilers as unsafe So, in my opinion, part of the review should also focus on how will the library evolve toward a safer/modern version, with minimal burden for the library user. Basically we know that we will deliver an interface that is outdated by current standards, but that we have very valid reasons to do that (these reasons would not hold if we were targetting std). We should think about of we handle the cohabitation with a safer interface, and how we provide a smooth transition path to users, taking full advantage of available compiler features, so that when they update their code to newer standards and safer patterns, boost::hash2 is not a concern and just works flawlessly. I understand that it is asking a lot to the library authors. Regards, Julien

On 12/6/24 18:54, Julien Blanc via Boost wrote:
6 décembre 2024 à 14:24 "Andrey Semashev via Boost" a écrit:
I also had compile-time dependencies in mind when I said "lightweight". To take the example from the docs:
https://pdimov.github.io/hash2/doc/html/hash2.html#hashing_objects_user_defi...
class X { private: std::string a; int b;
// ... public: template
friend void tag_invoke( boost::hash2::hash_append_tag const&, Hash& h, Flavor const& f, X const& v ) { boost::hash2::hash_append(h, f, v.a); boost::hash2::hash_append(h, f, v.b); } }; This requires one to depend on Boost.Hash2 for hash_append_tag and hash_append. I would prefer if there was a way (properly documented and supported) to avoid this dependency entirely. I.e. so that something like this is a working and advertised way to add support for hashing to your type:
class X { private: std::string a; int b;
// ... public: template
friend void hash_append( Hash& h, Flavor const& f, X const& v ) { h.update(f, v.a); h.update(f, v.a.size()); h.update(f, v.b); } }; X x; boost::hash2::fnv1a_64 h; boost::hash2::hash_append(h, {}, x); // works
I think i start to understand what you mean. In what i have in mind :
* hash algorithm implementation and interface do not depend on boost::hash2 (this is the case currently) * custom object hashing does depend on boost::hash2 (this is also the case currently)
From your examples, it seems to me that you would like custom object hashing to also be independant from boost::hash2. Is that accurate?
Yes.
and potentially compatible with std::hash2, if one ever comes to light.
I do not see how this is related.
What I meant by this is that in the above example Hash could become std::fnv1a_64 or whatever, if the std library adopts the proposal, and the existing hashing infrastructure will automatically support it. This is one of the main selling points of the library/proposal, after all.
X x; std::fnv1a_64 h; std::hash_append(h, {}, x); // works, with no changes to X
What i had in mind was more:
X x; std::fnv1a_64 h; boost::hash2::hash_append(h, {}, x); // works, with no changes to X
Notice the namespace is still boost::hash2. I expect it to work whatever std::fnv1a_64 finally chose as its interface.
Sure, but why use boost::hash2::hash_append when there's, presumably, std::hash_append, since you've already switched to std::fnv1a_64? But it doesn't really matter. My main point was that the support for hashing in user's type X remains functional and doesn't need updating or duplication. Now, after Peter's comments on the tag argument, I can see this would be difficult - at least the tags would be different between boost::hash2::hash_append and std::hash_append, assuming that both use the tag_invoke protocol. This is unfortunate, but even then just the different tags would be better than entire different APIs, so I still see value in a member-style hash_append API in the h argument (which would probably be a wrapper or a proxy for the underlying hasher). A way to have zero forward declarations would be perfect, though. And I should also mention that I realize that std::hash2 may be incompatible with Boost.Hash2 or even never be standardized, so the whole compatibility argument is rather theoretical at this point.
On a broader view, Claudio's made a very valid point, which made me realize that the current review goals may be a bit unusual. The usual review focus on finding the best suitable interface for the library. However, i don't think such interface exists:
* span<void const> is not available, even with a bleeding edge compiler. * support for older C++ versions mandates support for void* + size pointer as separate arguments (no span vocabulary type), which we know will be flagged by compilers as unsafe
So, in my opinion, part of the review should also focus on how will the library evolve toward a safer/modern version, with minimal burden for the library user. Basically we know that we will deliver an interface that is outdated by current standards, but that we have very valid reasons to do that (these reasons would not hold if we were targetting std). We should think about of we handle the cohabitation with a safer interface, and how we provide a smooth transition path to users, taking full advantage of available compiler features, so that when they update their code to newer standards and safer patterns, boost::hash2 is not a concern and just works flawlessly.
Personally, I think the "safety" argument doesn't really work in practice. If you want decent performance, you will likely want to operate on raw pointers and memory in the hasher implementation. In some cases you might want to use SIMD intrinsics or process bytes as system-sized words, for example, which is already throwing type safety out of the window. You'll also not want bounds checking to get in the way in your hot loop. I think, the answer to memory safety won't be the span as the argument type, which will only get in the way in the hasher implementation, but a markup of the pointer+size arguments so that the compiler is able to reason about the buffer size anyway. I think, MSVC and gcc already have attributes of that kind, though I'm not sure what exactly the compilers are able to do with them. Maybe, those attributes need to be standardized instead of pushing span everywhere.

I think, the answer to memory safety won't be the span as the argument type, which will only get in the way in the hasher implementation, but a markup of the pointer+size arguments so that the compiler is able to reason about the buffer size anyway. I think, MSVC and gcc already have attributes of that kind, though I'm not sure what exactly the compilers are able to do with them. Maybe, those attributes need to be standardized instead of pushing span everywhere.
It depends what you mean by this, but span already offers compile time
fixed extants, that can be used with generic algorithms and entirely elide the runtime checks, if you want to support that type of generic construct. Claudio.

On 12/6/24 19:51, Claudio DeSouza via Boost wrote:
I think, the answer to memory safety won't be the span as the argument type, which will only get in the way in the hasher implementation, but a markup of the pointer+size arguments so that the compiler is able to reason about the buffer size anyway. I think, MSVC and gcc already have attributes of that kind, though I'm not sure what exactly the compilers are able to do with them. Maybe, those attributes need to be standardized instead of pushing span everywhere.
It depends what you mean by this, but span already offers compile time fixed extants, that can be used with generic algorithms and entirely elide the runtime checks, if you want to support that type of generic construct.
I don't think fixed extents are applicable to the hasher interface. And fixed extents are not as useful as the dynamic extent in general, in my experience, as most of the time we deal with variable-sized sequences.

I’m clarifying things. Putting safety in quotes and asserting span can’t be used because of simd or fixed sized annotations is just not the case.

On 12/6/24 20:00, Claudio DeSouza via Boost wrote:
I’m clarifying things. Putting safety in quotes and asserting span can’t be used because of simd or fixed sized annotations is just not the case.
I'm not saying that span can't be used. I'm saying there are downsides to it and there probably are better alternatives in this case.

But the examples you presented of downsides are not applicable. If hash2 wants to encode buffer size in the interface then there are tradeoffs, but span can accommodate to those. Claims of span not being compatible with simd or other optimisations are just not the case. So the downsides mentioned pretty much are no applicable as opposed to any alternative. On Fri, 6 Dec 2024 at 17:05, Andrey Semashev via Boost < boost@lists.boost.org> wrote:
On 12/6/24 20:00, Claudio DeSouza via Boost wrote:
I’m clarifying things. Putting safety in quotes and asserting span can’t be used because of simd or fixed sized annotations is just not the case.
I'm not saying that span can't be used. I'm saying there are downsides to it and there probably are better alternatives in this case.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On 12/6/24 20:09, Claudio DeSouza via Boost wrote:
On Fri, 6 Dec 2024 at 17:05, Andrey Semashev via Boost < boost@lists.boost.org> wrote:
On 12/6/24 20:00, Claudio DeSouza via Boost wrote:
I’m clarifying things. Putting safety in quotes and asserting span can’t be used because of simd or fixed sized annotations is just not the case.
I'm not saying that span can't be used. I'm saying there are downsides to it and there probably are better alternatives in this case.
But the examples you presented of downsides are not applicable. If hash2 wants to encode buffer size in the interface then there are tradeoffs, but span can accommodate to those. Claims of span not being compatible with simd or other optimisations are just not the case. So the downsides mentioned pretty much are no applicable as opposed to any alternative.
The only way to reliably avoid the potential overhead of span and e.g. load the data from it e.g. into a SIMD register is to get a pointer into the span. And if you want to avoid bounds checked on every iteration, you would have to resort to pointer arithmetics. At this point, what additional safety does span bring?

Andrey Semashev wrote:
And fixed extents are not as useful as the dynamic extent in general, in my experience, as most of the time we deal with variable-sized sequences.
I also used to think that way, but that was because I didn't understand
the purpose of span.
The purpose of span is to replace pointer arguments. If your function
takes
void f1( unsigned char p[] );
you use
void f1( span<unsigned char> p );
and if it takes
void f2( unsigned char p[4] );
you use
void f2( span

I've been reading several comments on the list and I think a few different things are being discussed concurrently so I'm hoping to clarify and address them. There are two interfaces: one for implementers, and one for users. A goal for implementers is to not have to require any Hash2 headers for their algorithms to be used by the library. For users, we want safe interfaces that always do the Right Thing. hash_append is a proven interface for working with user-facing structures and solves the subtle and difficult problems it needs to, such as the "ab" | "c and "a" | "bc" case, as we've already discussed. Unfortunately, hash_append isn't really suitable as a safe interface for incremental hashing of byte streams, as we've noted that hash_append will append a hash of the size once it's done. update() works properly for hashing byte streams (as it must) but because it's intended as part of an implementation, it's not exactly suitable for most end-users who just want to hash their 3 GB file. We could even go so far as to say that it is an "unsafe" interface. So, we want safety. Vinnie has already alluded to using a free function hash_update which will accept a span. This is, as far as I can see, the best solution to the problems at hand. With this, we can achieve our goals for both implementers and users because the update interface doesn't have to change and then we can layer all the type safety externally to the free functions in a non-intrusive way. - Christian

Christian Mazakas wrote:
update() works properly for hashing byte streams (as it must) but because it's intended as part of an implementation, it's not exactly suitable for most end- users who just want to hash their 3 GB file.
It is, though. Here's the actual example: https://pdimov.github.io/hash2/doc/html/hash2.html#example_md5sum static void md5sum( std::FILE* f, char const* fn ) { boost::hash2::md5_128 hash; int const N = 4096; unsigned char buffer[ N ]; for( ;; ) { std::size_t n = std::fread( buffer, 1, N, f ); if( std::ferror( f ) ) { std::fprintf( stderr, "'%s': read error: %s\n", fn, std::strerror( errno ) ); return; } if( n == 0 ) break; hash.update( buffer, n ); } std::string digest = to_string( hash.result() ); std::printf( "%s *%s\n", digest.c_str(), fn ); } Note how the `update` call is perfectly intuitive and suitable for end use. If we try to replace this with a `span`, it's not going to become any better. Why is that? We know that `span` must result in better code? Well it is because std::fread doesn't return a `span`, as doesn't most of the rest of existing code in 2025. If everything used `span`, everyone else should also use it, but we aren't there yet. Nowadays you get a `size_t`, and the existing `update` will happily take it, without forcing you to construct an unnecessary `span` out of it.

On Fri, Dec 6, 2024 at 6:12 PM Peter Dimov via Boost
Andrey Semashev wrote:
And fixed extents are not as useful as the dynamic extent in general, in my experience, as most of the time we deal with variable-sized sequences.
The purpose of span is to replace pointer arguments. If your function takes
I agree, but I wonder if we can maybe have our cake and eat it too, i.e. no need to pick one or the other, just provide easy default for most users and use no span interface for "low level" API.
I think one important thing is to remember that currently span will be
processed as it's bytes + it's size when you call hash_append.
If you want it processed as it's bytes we need to call hash_append_range with
begin and end iterator.
My first attempt of fixing this potential for confusion while addressing
concerns about performance would be to provide higher level api for users
and keep low level API for authors of algorithms since when it comes to
hashing even tiny overhead of using span can be problematic.
To steal placeholder name from other discussions we will call these helpers
simple_hash_{something}. It would use default flavor and not provide ptr,
len interface.
But now here we get to the point that as mentioned above sometimes span is
not treated as it's bytes.
So my intuition would be that we would need to have 2 different helpers
addressing 2 distinct use cases:
1. simple_hash_bytes
2. simple_hash_values
Naming is not great, but I hope you get the idea.
simple_hash_bytes can only process span/vector/array like
arguments(contiguous bytes of some range where underlying data has same
size as char), it would only append bytes, not size. Does not know how to
hash int, std::string, etc... just byte like contiguous ranges.
simple_hash_values processes values, e.g. your user defined type for which
you implemented tag_invoke, std::string, std::pair, int, etc.... here span
is hashed as it's bytes and size.
I think this would cover 2 common use cases:
1. simple_hash_bytes - I am getting my data in batches and hashing
them , but I want same result, e.g. if my file content is 1234 bytes no
matter how I split those 1234 bytes in multiple calls with span I will
get same result.
2. simple_hash_values - I am hashing multiple values together, e.g.
std::pair
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On 12/6/24 20:12, Peter Dimov via Boost wrote:
Andrey Semashev wrote:
And fixed extents are not as useful as the dynamic extent in general, in my experience, as most of the time we deal with variable-sized sequences.
I also used to think that way, but that was because I didn't understand the purpose of span.
The purpose of span is to replace pointer arguments. If your function takes
void f1( unsigned char p[] );
you use
void f1( span<unsigned char> p );
and if it takes
void f2( unsigned char p[4] );
you use
void f2( span
p ); The fixed extent span here does two things: first, it inserts a runtime check that the extent of the passed span is >= 4.
void g( span<unsigned char> p ) { f2( p ); // implicit assert( p.size() >= 4 ); }
Second, it can be used to optimize out runtime checks in operator[]:
void f2( span
p ) { uint32_t v = p[0] + (p[1] << 8) + (p[2] << 16) + (p[3] << 24); } Since it's statically known that 0<4, 1<4, 2<4, 3<4, these accesses don't cause any asserts to be inserted.
Yes, I understand, but the thing is I very rarely have to write `void
f2( unsigned char p[4] );` in the first place. Most of the time I get a
variable amount of data that I need to process, so I have either an
iterator range or a pointer and size. And if there are fixed-sized
fragments of that data that I need to process, pretty much always I have
checked the entire size (or at least some outer size) of the data
beforehand, so no checks needed for those individual fragments.
So, for example, if I have to parse an RTP packet, I would
void on_rtp_packet(const uint8_t* packet, size_t size)
{
// RTP fixed header is 12 bytes long
if (size < 12)
throw std::invalid_argument("RTP packet too short");
// Parse 12-byte fixed header
uint16_t seqn = read_be16(packet + 2);
uint32_t timestamp = read_be32(packet + 4);
}
Here, I may accept an iterator_range

Andrey Semashev wrote:
Yes, I understand, but the thing is I very rarely have to write `void f2( unsigned char p[4] );` in the first place. Most of the time I get a variable amount of data that I need to process, so I have either an iterator range or a pointer and size. And if there are fixed-sized fragments of that data that I need to process, pretty much always I have checked the entire size (or at least some outer size) of the data beforehand, so no checks needed for those individual fragments.
So, for example, if I have to parse an RTP packet, I would
void on_rtp_packet(const uint8_t* packet, size_t size) { // RTP fixed header is 12 bytes long if (size < 12) throw std::invalid_argument("RTP packet too short");
// Parse 12-byte fixed header uint16_t seqn = read_be16(packet + 2); uint32_t timestamp = read_be32(packet + 4); }
read_be32 takes uint8_t const[4] here, because it has an implicit precondition
that the argument has at least 4 valid bytes.
So
uint16_t read_be16( span

On 12/6/24 20:45, Peter Dimov via Boost wrote:
Andrey Semashev wrote:
Yes, I understand, but the thing is I very rarely have to write `void f2( unsigned char p[4] );` in the first place. Most of the time I get a variable amount of data that I need to process, so I have either an iterator range or a pointer and size. And if there are fixed-sized fragments of that data that I need to process, pretty much always I have checked the entire size (or at least some outer size) of the data beforehand, so no checks needed for those individual fragments.
So, for example, if I have to parse an RTP packet, I would
void on_rtp_packet(const uint8_t* packet, size_t size) { // RTP fixed header is 12 bytes long if (size < 12) throw std::invalid_argument("RTP packet too short");
// Parse 12-byte fixed header uint16_t seqn = read_be16(packet + 2); uint32_t timestamp = read_be32(packet + 4); }
read_be32 takes uint8_t const[4] here, because it has an implicit precondition that the argument has at least 4 valid bytes.
So
uint16_t read_be16( span
p ); uint32_t read_be32( span p ); void on_rtp_packet( span
packet ) { // RTP fixed header is 12 bytes long if (packet.size() < 12) throw std::invalid_argument("RTP packet too short"); // Parse 12-byte fixed header uint16_t seqn = read_be16(packet + 2); uint32_t timestamp = read_be32(packet + 4); }
Since the optimizer sees that packet.size() >= 12, it can elide the checks against 2 and 4.
I don't trust the optimizer. Because it failed me more than once. If you want your code fast, you better write it fast yourself and not hope that the compiler does a good job for you.

On 12/6/24 20:52, Peter Dimov via Boost wrote:
Andrey Semashev wrote:
I don't trust the optimizer. Because it failed me more than once.
That's not the point.
The point is to make life easier for the optimizer, by giving it more information to work with (in the form of the fixed extent).
I've already relieved it from work by not writing the unnecessary checks. :)

Andrey Semashev wrote:
On 12/6/24 20:52, Peter Dimov via Boost wrote:
Andrey Semashev wrote:
I don't trust the optimizer. Because it failed me more than once.
That's not the point.
The point is to make life easier for the optimizer, by giving it more information to work with (in the form of the fixed extent).
I've already relieved it from work by not writing the unnecessary checks. :)
Sure, but you also didn't write the necessary checks that would have caught a bug where you accidentally read past the end of the packet due to a bad copy and paste.

On Fri, Dec 6, 2024 at 6:48 PM Andrey Semashev via Boost < boost@lists.boost.org> wrote:
I don't trust the optimizer. Because it failed me more than once.
If you want your code fast, you better write it fast yourself and not hope that the compiler does a good job for you.
it is also partially an ABI issue on Windows https://youtu.be/PCP3ckEqYK8?feature=shared&t=2203
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

You can see practical manifestations of these examples in e.g https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/md5.hpp where at https://github.com/pdimov/hash2/blob/26b16023dc639ae354e246458719e4b569bc1d9... you have BOOST_CXX14_CONSTEXPR void transform( unsigned char const block[ 64 ] ) with the implicit precondition that `block` points at at least 64 valid bytes, and then at https://github.com/pdimov/hash2/blob/26b16023dc639ae354e246458719e4b569bc1d9... you have for( int i = 0; i < 16; ++i ) { x[ i ] = detail::read32le( block + i * 4 ); } which relies on this implicit precondition to read 16 subspans of size 4. (detail::read32le also has an implicit precondition that its argument points at at least 4 valid bytes.) Replacing these pointer arguments with fixed size spans would theoretically result in all the bounds checks (except one at the top level) being optimized out even if the functions aren't inlined.
-----Original Message----- From: Peter Dimov
Sent: Friday, 6 December 2024 19:13 To: boost@lists.boost.org Subject: RE: [boost] [hash2][review] Early review (due to holidays) Andrey Semashev wrote:
And fixed extents are not as useful as the dynamic extent in general, in my experience, as most of the time we deal with variable-sized sequences.
I also used to think that way, but that was because I didn't understand the purpose of span.
The purpose of span is to replace pointer arguments. If your function takes
void f1( unsigned char p[] );
you use
void f1( span<unsigned char> p );
and if it takes
void f2( unsigned char p[4] );
you use
void f2( span
p ); The fixed extent span here does two things: first, it inserts a runtime check that the extent of the passed span is >= 4.
void g( span<unsigned char> p ) { f2( p ); // implicit assert( p.size() >= 4 ); }
Second, it can be used to optimize out runtime checks in operator[]:
void f2( span
p ) { uint32_t v = p[0] + (p[1] << 8) + (p[2] << 16) + (p[3] << 24); } Since it's statically known that 0<4, 1<4, 2<4, 3<4, these accesses don't cause any asserts to be inserted.

On 12/6/24 20:30, Peter Dimov via Boost wrote:
You can see practical manifestations of these examples in e.g
https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/md5.hpp
where at
https://github.com/pdimov/hash2/blob/26b16023dc639ae354e246458719e4b569bc1d9...
you have
BOOST_CXX14_CONSTEXPR void transform( unsigned char const block[ 64 ] )
with the implicit precondition that `block` points at at least 64 valid bytes, and then at
https://github.com/pdimov/hash2/blob/26b16023dc639ae354e246458719e4b569bc1d9...
you have
for( int i = 0; i < 16; ++i ) { x[ i ] = detail::read32le( block + i * 4 ); }
which relies on this implicit precondition to read 16 subspans of size 4.
Assuming you had code like this:
void transform( span

Andrey Semashev wrote:
Assuming you had code like this:
void transform( span
block ) { for( int i = 0; i < 16; ++i ) { x[ i ] = detail::read32le( block.subspan(i * 4, 4) ); } } I would presume, the subspan call would have a runtime check for the offset and size. Then the converting constructor from span<const unsigned char> returned from subspan to span
accepted by read32le would have to check the size again. Yes, the compiler could optimize those checks away, if it's smart enough.
It could, yes. Giving the compiler the opportunity to elide the checks is the point of the fixed extent span. If you use dynamic extent spans everywhere, the compiler can't optimize out the checks using only local knowledge, it will need to rely on inlining and have to work quite a bit harder.

detail::read32le( block.subspan(i * 4).first<4>()); So you also get a sized span, and avoid checks. On Fri, 6 Dec 2024 at 17:42, Andrey Semashev via Boost < boost@lists.boost.org> wrote:
On 12/6/24 20:30, Peter Dimov via Boost wrote:
You can see practical manifestations of these examples in e.g
https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/md5.hpp
where at
https://github.com/pdimov/hash2/blob/26b16023dc639ae354e246458719e4b569bc1d9...
you have
BOOST_CXX14_CONSTEXPR void transform( unsigned char const block[ 64
] )
with the implicit precondition that `block` points at at least 64 valid
bytes,
and then at
https://github.com/pdimov/hash2/blob/26b16023dc639ae354e246458719e4b569bc1d9...
you have
for( int i = 0; i < 16; ++i ) { x[ i ] = detail::read32le( block + i * 4 ); }
which relies on this implicit precondition to read 16 subspans of size 4.
Assuming you had code like this:
void transform( span
block ) { for( int i = 0; i < 16; ++i ) { x[ i ] = detail::read32le( block.subspan(i * 4, 4) ); } } I would presume, the subspan call would have a runtime check for the offset and size. Then the converting constructor from span<const unsigned char> returned from subspan to span
accepted by read32le would have to check the size again. Yes, the compiler could optimize those checks away, if it's smart enough. Or it could not, e.g. if optimizations are disabled or it fails to propagate constants for some reason. But none of these checks are actually necessary, except maybe one at the entry, to enforce the size of the entire block. Maybe not even that, if the callers of transform are guaranteed to always pass the whole blocks. _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On 12/6/24 21:46, Claudio DeSouza via Boost wrote:
detail::read32le( block.subspan(i * 4).first<4>());
So you also get a sized span, and avoid checks.
I think, first() would need to check the dynamic span size anyway. PS: Please, don't top-post.
On Fri, 6 Dec 2024 at 17:42, Andrey Semashev via Boost < boost@lists.boost.org> wrote:
On 12/6/24 20:30, Peter Dimov via Boost wrote:
You can see practical manifestations of these examples in e.g
https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/md5.hpp
where at
https://github.com/pdimov/hash2/blob/26b16023dc639ae354e246458719e4b569bc1d9...
you have
BOOST_CXX14_CONSTEXPR void transform( unsigned char const block[ 64
] )
with the implicit precondition that `block` points at at least 64 valid
bytes,
and then at
https://github.com/pdimov/hash2/blob/26b16023dc639ae354e246458719e4b569bc1d9...
you have
for( int i = 0; i < 16; ++i ) { x[ i ] = detail::read32le( block + i * 4 ); }
which relies on this implicit precondition to read 16 subspans of size 4.
Assuming you had code like this:
void transform( span
block ) { for( int i = 0; i < 16; ++i ) { x[ i ] = detail::read32le( block.subspan(i * 4, 4) ); } } I would presume, the subspan call would have a runtime check for the offset and size. Then the converting constructor from span<const unsigned char> returned from subspan to span
accepted by read32le would have to check the size again. Yes, the compiler could optimize those checks away, if it's smart enough. Or it could not, e.g. if optimizations are disabled or it fails to propagate constants for some reason. But none of these checks are actually necessary, except maybe one at the entry, to enforce the size of the entire block. Maybe not even that, if the callers of transform are guaranteed to always pass the whole blocks. _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

The compiler can still see the max value being passed in and your example block has a fixed extent. By using first<> you get another span of fixed extent. This loop in general would benefit from being changed to something that leverages the fixed extent even more. There are different approaches, but ideally you want to iterate in a stride, which relies on the fixed extent and always returned subspans of fixed extents each step, so you dispense all the arithmetic access and subspan calls. On Fri, 6 Dec 2024 at 18:56, Andrey Semashev via Boost < boost@lists.boost.org> wrote:
On 12/6/24 21:46, Claudio DeSouza via Boost wrote:
detail::read32le( block.subspan(i * 4).first<4>());
So you also get a sized span, and avoid checks.
I think, first() would need to check the dynamic span size anyway.
PS: Please, don't top-post.
Assuming you had code like this:
void transform( span
block ) { for( int i = 0; i < 16; ++i ) { x[ i ] = detail::read32le( block.subspan(i * 4, 4) ); } } I would presume, the subspan call would have a runtime check for the offset and size. Then the converting constructor from span<const unsigned char> returned from subspan to span
accepted by read32le would have to check the size again. Yes, the compiler could optimize those checks away, if it's smart enough. Or it could not, e.g. if optimizations are disabled or it fails to propagate constants for some reason. But none of these checks are actually necessary, except maybe one at the entry, to enforce the size of the entire block. Maybe not even that, if the callers of transform are guaranteed to always pass the whole blocks. _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
On Fri, 6 Dec 2024 at 17:42, Andrey Semashev via Boost < boost@lists.boost.org> wrote:
On 12/6/24 20:30, Peter Dimov via Boost wrote:
You can see practical manifestations of these examples in e.g
https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/md5.hpp
where at
https://github.com/pdimov/hash2/blob/26b16023dc639ae354e246458719e4b569bc1d9...
you have
BOOST_CXX14_CONSTEXPR void transform( unsigned char const block[ 64
] )
with the implicit precondition that `block` points at at least 64 valid
bytes,
and then at
https://github.com/pdimov/hash2/blob/26b16023dc639ae354e246458719e4b569bc1d9...
you have
for( int i = 0; i < 16; ++i ) { x[ i ] = detail::read32le( block + i * 4 ); }
which relies on this implicit precondition to read 16 subspans of size
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Claudio DeSouza wrote:
The compiler can still see the max value being passed in and your example block has a fixed extent. By using first<> you get another span of fixed extent.
block has a fixed extent, but block.subspan(i * 4) does not. ...
On 12/6/24 21:46, Claudio DeSouza via Boost wrote:
detail::read32le( block.subspan(i * 4).first<4>());
So you also get a sized span, and avoid checks.

Correct, and that's why you `first<N>()` after, however the question is how
effective the compiler is at eliminating this temporary in between.
Claudio.
On Fri, Dec 6, 2024 at 7:07 PM Peter Dimov via Boost
Claudio DeSouza wrote:
The compiler can still see the max value being passed in and your example block has a fixed extent. By using first<> you get another span of fixed extent.
block has a fixed extent, but block.subspan(i * 4) does not.
...
On 12/6/24 21:46, Claudio DeSouza via Boost wrote:
detail::read32le( block.subspan(i * 4).first<4>());
So you also get a sized span, and avoid checks.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Additionally, you are also allowed to subspan without a dynamic extent, like in: s.subspan<2u, 4u>() My point is that it all depends on what you are doing and what is possible in that context.

On Fri, Dec 6, 2024 at 1:38 PM Andrey Semashev via Boost < boost@lists.boost.org> wrote:
Unless this is a new addition, I don't see such a member in std::span.
https://en.cppreference.com/w/cpp/container/span/subspan Thanks

On 12/7/24 00:42, Vinnie Falco wrote:
On Fri, Dec 6, 2024 at 1:38 PM Andrey Semashev via Boost
mailto:boost@lists.boost.org> wrote: Unless this is a new addition, I don't see such a member in std::span.
Ah, yes, I missed this, thanks. Anyway, it doesn't work as the offset is a runtime value in this case.

So to go back to my point: If you really want to avoid checks, and you do know the size beforehand, you can entirely avoid it by harnessing this type of compile-time information and leveraging the compiler in your favour. It all depends how dedicated you are to make sure you get that. Granted, with dynamic extent the number of checks will depend on several factors, including implementation, and hardening settings the user is building with. It is win after win with span. On Fri, Dec 6, 2024 at 9:47 PM Andrey Semashev via Boost < boost@lists.boost.org> wrote:
On 12/7/24 00:42, Vinnie Falco wrote:
On Fri, Dec 6, 2024 at 1:38 PM Andrey Semashev via Boost
mailto:boost@lists.boost.org> wrote: Unless this is a new addition, I don't see such a member in std::span.
Ah, yes, I missed this, thanks.
Anyway, it doesn't work as the offset is a runtime value in this case.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On 12/7/24 00:51, Claudio DeSouza via Boost wrote:
On Fri, Dec 6, 2024 at 9:47 PM Andrey Semashev via Boost < boost@lists.boost.org> wrote:
On 12/7/24 00:42, Vinnie Falco wrote:
On Fri, Dec 6, 2024 at 1:38 PM Andrey Semashev via Boost
mailto:boost@lists.boost.org> wrote: Unless this is a new addition, I don't see such a member in std::span.
Ah, yes, I missed this, thanks.
Anyway, it doesn't work as the offset is a runtime value in this case.
So to go back to my point: If you really want to avoid checks, and you do know the size beforehand, you can entirely avoid it by harnessing this
type
of compile-time information and leveraging the compiler in your favour. It all depends how dedicated you are to make sure you get that. Granted, with dynamic extent the number of checks will depend on several factors, including implementation, and hardening settings the user is building with.
It is win after win with span.
To convert all the checks that are done in runtime with the code I presented earlier you would have to convert that loop to a series of statements such as this: x[ 0 ] = detail::read32le( block.subspan<0, 4>() ); x[ 1 ] = detail::read32le( block.subspan<4, 4>() ); x[ 3 ] = detail::read32le( block.subspan<8, 4>() ); x[ 4 ] = detail::read32le( block.subspan<12, 4>() ); // ... x[ 15 ] = detail::read32le( block.subspan<60, 4>() ); Or perform some kind of tricks with variadic templates and fold expressions. At least, until `template for` (P1306) is accepted. So, I suppose, it is doable in this limited case. But not something I would willingly do, given that I can achieve the same result with a plain loop and pointers. There's your example of span getting in the way.

Yes, that's the naive way to implement it. Like I said, depending how dedicated you are, in modern C++ you can pretty much just fold something like that in an elegant way. Also, you need to understand that "just using a pointer" doesn't necessarily mean no checks either in many cases. I think the main issue here is that with span we can eliminate all the arithmetic around subscript access, which makes code more readable, and harder to get wrong, while delivering the same result, and with tools that allow us to provide the compiler with more information about what we are doing. So the overall result is something more robust than "just a pointer", and the safety becomes just a natural conclusion of this. Claudio.

Andrey Semashev wrote:
To convert all the checks that are done in runtime with the code I presented earlier you would have to convert that loop to a series of statements such as this:
x[ 0 ] = detail::read32le( block.subspan<0, 4>() ); x[ 1 ] = detail::read32le( block.subspan<4, 4>() ); x[ 3 ] = detail::read32le( block.subspan<8, 4>() ); x[ 4 ] = detail::read32le( block.subspan<12, 4>() ); // ... x[ 15 ] = detail::read32le( block.subspan<60, 4>() );
Or perform some kind of tricks with variadic templates and fold expressions.
You can write this with mp_for_each, but it's not necessary because compilers figure out the for loop easily: https://godbolt.org/z/d861cPf1K

Common span win. I knew that was the case, but I was replying from the perspective of the argument "I don't trust optimisers". In practice, yes, compilers are much different these days, and the more information can be inferred about any given operation, the better the chances of optimisations to be very effective in shedding dead weight. There's a whole debate around bound checks because of that, and the trend is that these types of optimisations will continue to improve going forward, as the demand for bound checks elision increases. Claudio.

On Fri, Dec 6, 2024 at 9:12 AM Peter Dimov via Boost
The purpose of span is to replace pointer arguments. If your function takes ... void f2( unsigned char p[4] ); you use void f2( span
p );
This is quite informative, thank you. For my entire career I have avoided passing C-style arrays because of ignorance of how they work. Is it a copy? Is it just a pointer? Does the compiler know the extent? And so on. I don't like the look of function signatures which accept language arrays, because they do not quite work the same way as other types. The fixed extent span here does two things: first, it inserts a runtime
check that the extent of the passed span is >= 4. ... Since it's statically known that 0<4, 1<4, 2<4, 3<4, these accesses don't cause any asserts to be inserted.
This is a strong rationale for using a span-like type in the signatures for those implementation functions. Thanks

On Fri, Dec 6, 2024 at 7:06 PM Vinnie Falco via Boost
This is quite informative, thank you. For my entire career I have avoided passing C-style arrays because of ignorance of how they work. Is it a copy? Is it just a pointer? Does the compiler know the extent? And so on. I don't like the look of function signatures which accept language arrays, because they do not quite work the same way as other types.
I had the same "issue" until I remembered a simple C++ program that shows how compiler sees this functions.
void f(int arr[3]) { } void f(int arr[4]) { } compiler will complain: <source>:5:6: error: redefinition of 'void f(int*)' 5 | void f(int arr[4]) { | <source>:1:6: note: 'void f(int*)' previously defined here 1 | void f(int arr[3]) { So I also in general prefer fixed extent span, but we must remember that that can lead to code size bloat, and that is usually undetectable in micro benchmarks, but can slow down large programs.

On Fri, Dec 6, 2024 at 7:19 PM Vinnie Falco
How?
maybe we are not talking about same situation, but this is what I meant,
godbolt https://godbolt.org/z/K8xEjEoMK link [image: image.png] Compiler will generate "specialized" functions for each different size of span with non dynamic extent. Here you can see how he implemented summation for 3 and 4 integers in different ways. This is great for performance and makes checking easier since as Peter explained compiler knows more, but it creates larger binaries(generally speaking, I know compilers are smart, can inline, for 2 instantiations does not really matter, etc). Function taking dynamic span or runtime specified n will probably be slower because it does normal loop, but there is only one copy of it in resulting assembly. There is a real life example of this here in fmt. note link gives certificate error and I did not manage to find another link https://vitaut.net/posts/2020/reducing-library-size/

Hi Vinnie,
did not know pasting image will get me sent straight to moderator approval
due to message size, apologies, here is my message with image removed.
On Fri, Dec 6, 2024 at 7:48 PM Ivan Matek
On Fri, Dec 6, 2024 at 7:19 PM Vinnie Falco
wrote: How?
maybe we are not talking about same situation, but this is what I meant,
godbolt https://godbolt.org/z/K8xEjEoMK link
If you look at
nt f_span_static<3ul>(std::__1::span
you will see that compiler will generate "specialized" functions for each different size of span with non dynamic extent. Here you can see how he implemented summation for 3 and 4 integers in different ways.
This is great for performance and makes checking easier since as Peter explained compiler knows more, but it creates larger binaries(generally speaking, I know compilers are smart, can inline, for 2 instantiations does not really matter, etc). Function taking dynamic span or runtime specified n will probably be slower because it does normal loop, but there is only one copy of it in resulting assembly.
There is a real life example of this here in fmt. note link gives certificate error and I did not manage to find another link https://vitaut.net/posts/2020/reducing-library-size/

Yes, this is the challenge with optimisations, caching, etc, is: in many
cases you have to make a trade-off between size/performance. I think it is
also important to mention that if you ultimately want to tag a block as
unsafe buffers usage and just do pointer arithmetics on top of the memory
you received through a span, you are free to do that. That's your
prerogative. That's an aspect of the abstract machine you now have a much
finer control over because you spanified your interfaces. The other way
around is not true though. You can opt-in to spanify your interfaces and
retain the ability to deep dive in a lower level type of code, and use
pointers as you see fit, However, you don't benefit from higher
abstractions once all you have is just offer pointer/length pairs. I think
Peter has made this case in a much more educated manner that I have.
Claudio.
On Fri, Dec 6, 2024 at 11:06 PM Ivan Matek via Boost
On Fri, Dec 6, 2024 at 7:19 PM Vinnie Falco
wrote: How?
maybe we are not talking about same situation, but this is what I meant,
godbolt https://godbolt.org/z/K8xEjEoMK link [image: image.png] Compiler will generate "specialized" functions for each different size of span with non dynamic extent. Here you can see how he implemented summation for 3 and 4 integers in different ways.
This is great for performance and makes checking easier since as Peter explained compiler knows more, but it creates larger binaries(generally speaking, I know compilers are smart, can inline, for 2 instantiations does not really matter, etc). Function taking dynamic span or runtime specified n will probably be slower because it does normal loop, but there is only one copy of it in resulting assembly.
There is a real life example of this here in fmt. note link gives certificate error and I did not manage to find another link https://vitaut.net/posts/2020/reducing-library-size/
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Andrey Semashev wrote:
I think, the answer to memory safety won't be the span as the argument type, which will only get in the way in the hasher implementation, but a markup of the pointer+size arguments so that the compiler is able to reason about the buffer size anyway. I think, MSVC and gcc already have attributes of that kind, though I'm not sure what exactly the compilers are able to do with them. Maybe, those attributes need to be standardized instead of pushing span everywhere.
Microsoft has tried these sorts of annotations pretty extensively, and they have moved towards requiring span. (In C they still need to rely on the annotations, of course, but in C++ code, they now prefer span.) std::span has actually been proposed by Microsoft, on the basis of that experience.

On 06.12.24 17:32, Andrey Semashev via Boost wrote:
On 12/6/24 18:54, Julien Blanc via Boost wrote:
X x; std::fnv1a_64 h; boost::hash2::hash_append(h, {}, x); // works, with no changes to X
Notice the namespace is still boost::hash2. I expect it to work whatever std::fnv1a_64 finally chose as its interface.
Sure, but why use boost::hash2::hash_append when there's, presumably, std::hash_append, since you've already switched to std::fnv1a_64?
Because in generic code you wouldn't know if your hash algorithm is supplied by std or boost::hash2. -- Rainer Deyke - rainerd@eldwood.com

If i understand correctly:
std::span<char> v("123", 3)
hash_append(h, v); // hash "123" + the size 3, as v is a container h.update(v); // hash only "123".
Or did i miss something?
I think this is actually an important question: How to (conveniently) hash only a range? I.e. with the library, not the hasher. As in the example there might be use cases to calculate e.g. the checksum of a file using the hash algorithm as a template. For this the size of the data/file must not be included in the hash. I've seen `hash_append_range` which could be used but needs to get the begin/end manually. I think a 3 parameter form `hash_append_range(Hash&, Flavor&, Range)` would be a good addition to support the above use case without having to call the hasher directly. I.e. to provide a similar interface/abstraction to `hash_append` Maybe name it `hash_append_range_without_size` to discourage people from using it without knowing the caveats (multiple ranges, empty ranges, ...)

Alexander Grund wrote:
As far as I understood this is exactly how the library is implemented: It is not `boost::hash2::update( h, span )` but `boost::hash2::hash_append( h, {}, span )` but thats minor.
Not quite, because hash_append would add the size in this case. The equivalent is hash_append_range, but it's old school and takes two iterators. Maybe I need to add an overload of hash_append_range that actually takes a range. Incidentally, hash_append_range is what you already have to use if you invoke the hash algorithm at compile time on e.g. char[], because void* and reinterpret_cast don't work in constexpr. See the second example in https://pdimov.github.io/hash2/doc/html/hash2.html#example_compile_time_hash... Calling either `update` won't work in this case.

On Thu, Dec 5, 2024 at 12:06 PM Vinnie Falco
...Instead of writing:
h.update( span );
You instead write:
boost::hash2::update( h, span );
Anyway it looks like this wasn't clear so I guess I'll try again and be even more explicit. Currently, the hash algorithm provides the following member function for sending a contiguous range of binary data: void update( void const*, std::size_t ); The combination of pointer and size passed as individual arguments is triggering to certain folks, who instead prefer the following, or some variation of it (for example, using uint8_t instead of byte) void update( std::span< std::byte const > ); This is disfavored by the library author (and myself) for reasons already posted to the mailing list which I will not repeat. Yet an argument can be made that the use of a span could be favorable to tooling which performs buffer safety analysis. Thus I propose to add the following additional free function. Doing so allows authors of hash functions to call the free function with a span instead of the update member function which accepts a pointer and size, without burdening the authors of hash algorithms: namespace boost::hash2 { template< typename Hasher > void update( Hasher& h, std::span< std::byte const > s ) { h.update( s.data(), s.size() ); } } I have to stress that this is not the same as the hash_append function which by coincidence has an identical signature (and different implementation). This is the alternative to: 1. changing the signature of update in the Hasher concept, or 2. adding an overload of update to the Hasher concept (both of these are disfavored for reasons already given). Thanks

5 décembre 2024 à 12:52 "Peter Dimov via Boost" a écrit:
I can obviously support any form of `update` by making it templated and do the appropriate enable_if incantations, but I don't want to impose this requirement on hash algorithm authors.
Can't it be fully done inside hash_append instead, without requiring the hash update itself being template? In c++20, it's pretty trivial to invoke a callable object with the correct set of arguments (see https://godbolt.org/z/KhP5bcM3d for the kind of things i'm thinking of). Of course it makes the Hash concept pretty hard to define, as it's kinda polymorphic… I don't remember how doable it is with C++11, just pretty sure that it will be a lot more painful. Regards, Julien

There has been some discussion about this and the consensus is that
`uint8_t` is a better option than `std::byte` as the former represents a
determinate number of bytes (
https://isocpp.org/wiki/faq/intrinsic-types#bits-per-byte). So `span<const
uint8_t>, and `span
On Wed, Dec 4, 2024 at 2:09 AM Claudio DeSouza via Boost
wrote: function that takes a pair of void*/size_t should be a span
or rather `std::span<[const ]std::byte>`. --DD
https://en.cppreference.com/w/cpp/types/byte https://en.cppreference.com/w/cpp/container/span/as_bytes

On Wed, Dec 4, 2024 at 2:36 PM Claudio DeSouza via Boost < boost@lists.boost.org> wrote:
There has been some discussion about this and the consensus is that `uint8_t` is a better option than `std::byte` as the former represents a determinate number of bytes ( https://isocpp.org/wiki/faq/intrinsic-types#bits-per-byte). So `span<const uint8_t>, and `span
` for immutable and mutable bytes respectively.
I presume think static_assert for CHAR_BIT value is enough, I am not aware of anybody using systems where this is not true, despite what faq says... but I must admit I am not familiar with all weird archs so I could be wrong

Ivan Matek wrote:
On Wed, Dec 4, 2024 at 2:36 PM Claudio DeSouza via Boost < boost@lists.boost.org> wrote:
There has been some discussion about this and the consensus is that `uint8_t` is a better option than `std::byte` as the former represents a determinate number of bytes ( https://isocpp.org/wiki/faq/intrinsic-types#bits-per-byte). So `span<const uint8_t>, and `span
` for immutable and mutable bytes respectively. I presume think static_assert for CHAR_BIT value is enough, I am not aware of anybody using systems where this is not true, despite what faq says... but I must admit I am not familiar with all weird archs so I could be wrong
Using uint8_t is equivalent to asserting that CHAR_BIT is 8. In practice, uint8_t, when defined, is always unsigned char, except when it's `char` on platforms that strive to make your life interesting. But that's non-conforming. There's an attempt to make CHAR_BIT == 8 required http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3477r0.html but it might be slightly ahead of its time because a few holdover platforms with CHAR_BIT == 32 still remain in use. (On these, uint8_t is a compile time error.)
participants (13)
-
Alexander Grund
-
Andrey Semashev
-
Christian Mazakas
-
Claudio DeSouza
-
Dominique Devienne
-
Ivan Matek
-
Joaquín M López Muñoz
-
Julien Blanc
-
Matt Borland
-
Peter Dimov
-
Rainer Deyke
-
Vinnie Falco
-
Zach Laine