[hash2][review] Early review (due to holidays)

Claudio DeSouza

4 Dec 2024 4 Dec '24

1:09 a.m.

Hello. My review is the summary of a few discussions I've had with Peter and Christian regarding the API design being employed for hash2. I'm aware hash2 is inspired by one of Hinnat's proposals to standardisation that didn't come to fruition. I would like first to commend the authors for their work on this project. Having good libraries for this type of task is important, especially when the big names like OpenSSL provide not-so-great interfaces. Having said that, I have a few thoughts on what this library may be missing. *Spanification* There's a significant change in mood with regards to safe library design taking place in the industry. With clang, we now have -Wunsafe-buffer-usage, and this warning has been continuously improved recently. For reference https://clang.llvm.org/docs/SafeBuffers.html When I was going through this library's code I was left with a frustrating thought that boost should be at the forefront of this type of innovation: safe, and intuitive library design. Each function that takes a pair of void*/size_t should be a span<uint_8>. The model of how to approach these areas is a solved issue at this point. A robust span type with intuitive extensions would have matching constructors for all sorts of contiguous ranges, including native arrays, which would make using this library much simpler. It would also afford the necessary compile-time validations that you get with static extents in a span. At the current state, hash2 relies on a lot of pinky-promise guarantees that we tend to live with in C++, but that in fact can be avoided without much (or any) overhead. *Digest type* Due to limitations to how much of std::array is constexpr in different C++ versions, this library offers its own digest type. However, this just wraps a stack array. I feel that rather than returning a digest that is stack allocated, it would have been better for algorithms to take a fixed size span where the digest can be written to. I say this because in many cases a stack allocated array will be fine. However a mutable fixed span would provide more flexibility about the storage, for instance, a vector. There's also the matter of hardening (https://libcxx.llvm.org/Hardening.html) which one loses by using this particular digest type, rather than storage provided by a standard type. But I would also like to mention that sha3 has a variable digest size, so again, another reason why an out param mutable span would be a more flexible choice on the whole. *Boost Identity* I have been reluctant to write a review because it seems unfair to demand of hash2 to provide robust spanified interfaces, when no other library before was required to do so. However, at the same time, considering that this type of code tends to be used in safety-sensitive contexts, I fear this library may fail to achieve its potential, when a lot of its interfaces are unsafe. if I had to bring this library to any project I'm working on, I would have to wrap it into safe spanified interfaces, to hide away all these pointer/length pairs around, before I could use it in my code. However, I understand that to produce a spanified design, boost would have first to introduce a mature span implementation + extensions that can be leveraged for this type of code design. *Security* I think for things like HMAC, there is a question of allocators that zero-out memory, of secure obfuscated memory content in case of crash dumps, etc, that this library has not been attentive to, so I thought it was worth mentioning. *In conclusion* I'm here to support the authors. I don't feel like voting either accept or reject, because I think Boost as a whole has not provided the mission critical answer to that. I want this library to succeed. I just wish it was a bit more ambitious in its goal on the library-design space. Claudio.

Show replies by date

Dominique Devienne

4 Dec 4 Dec

1:02 p.m.

On Wed, Dec 4, 2024 at 2:09 AM Claudio DeSouza via Boost <boost@lists.boost.org> wrote:

...

function that takes a pair of void*/size_t should be a span<uint_8>

or rather `std::span<[const ]std::byte>`. --DD https://en.cppreference.com/w/cpp/types/byte https://en.cppreference.com/w/cpp/container/span/as_bytes

Peter Dimov

1:31 p.m.

Dominique Devienne wrote:

...

On Wed, Dec 4, 2024 at 2:09 AM Claudio DeSouza via Boost <boost@lists.boost.org> wrote:

...
function that takes a pair of void*/size_t should be a span<uint_8>

or rather `std::span<[const ]std::byte>`. --DD

https://en.cppreference.com/w/cpp/types/byte https://en.cppreference.com/w/cpp/container/span/as_bytes

Or rather, the nonexistent span<void>, which is like void*, but better. N3980's (equivalent of) update takes by void const*. Originally, I had that changed to unsigned char const* (or even to byte_type const*, with the idea of switching to std::byte eventually), but this proved very cumbersome in practice. Untyped byte sequences sometimes come as char[], sometimes as unsigned char[], sometimes (if you're modern) as std::byte[], and taking by any of these means that the rest need an ugly reinterpret_cast. So I switched `update` back to void const*. The "type safety" gains aren't worth the substantial decrease in usability. (And then had to add unsigned char const* back because constexpr.) The hypothetical span<void> would take either of those three byte-like types (and maybe char8_t[] as well), so that the user doesn't need to reinterpret_cast.

Claudio DeSouza

2:18 p.m.

...

So I switched `update` back to void const*. The "type safety" gains aren't worth the substantial decrease in usability.

In my experience, the usability/readability gains are on the side of using span. This may involve providing extensions, but at the end of the day you get more C++-friendly constructs.

Claudio DeSouza

2:23 p.m.

By the way, C++20 offers byte extensions https://en.cppreference.com/w/cpp/container/span/as_bytes and interestingly enough they do use `std::byte`.

Ivan Matek

4:14 p.m.

On Wed, Dec 4, 2024 at 3:23 PM Claudio DeSouza via Boost < boost@lists.boost.org> wrote:

...

By the way, C++20 offers byte extensions

Not to speak for author, but if I understand correctly library is C++11 library intentionally(it explicitly mentions g++ 4.8 in documentation, which is 2013 compiler), so any std::byte interface would be optional/#ifdef flavor, and I am not a huge fan of those APIs. Sure we could add boost::byte to boost, but that is a different discussion.

Peter Dimov

4:22 p.m.

Ivan Matek wrote:

...

On Wed, Dec 4, 2024 at 3:23 PM Claudio DeSouza via Boost < boost@lists.boost.org> wrote:

...
By the way, C++20 offers byte extensions

Not to speak for author, but if I understand correctly library is C++11 library intentionally(it explicitly mentions g++ 4.8 in documentation, which is 2013 compiler), so any std::byte interface would be optional/#ifdef flavor, and I am not a huge fan of those APIs.

It is C++11, yes. We briefly considered requiring C++14 for constexpr reasons, but feedback on Slack wasn't positive.

Claudio DeSouza

4:23 p.m.

As I explained in my review, yes, for the authors to accommodate what I'm asking, it is fair to say that this library would have to depend on some robust theoretical span library, span2 if you will, and that's why I didn't vote accept nor reject, because it seems unfair to impose such a requirement. There are certainly challenges to reconcile these demands across different C++ standards. On Wed, Dec 4, 2024 at 4:14 PM Ivan Matek <libbooze@gmail.com> wrote:

...

On Wed, Dec 4, 2024 at 3:23 PM Claudio DeSouza via Boost < boost@lists.boost.org> wrote:

...
By the way, C++20 offers byte extensions

Not to speak for author, but if I understand correctly library is C++11 library intentionally(it explicitly mentions g++ 4.8 in documentation, which is 2013 compiler), so any std::byte interface would be optional/#ifdef flavor, and I am not a huge fan of those APIs. Sure we could add boost::byte to boost, but that is a different discussion.

Peter Dimov

2:38 p.m.

Claudio DeSouza wrote:

...

...
So I switched `update` back to void const*. The "type safety" gains aren't worth the substantial decrease in usability.

In my experience, the usability/readability gains are on the side of using span. This may involve providing extensions, but at the end of the day you get more C++-friendly constructs.

Well... no, they aren't. Suppose you have the data in `std::vector<unsigned char> v`. With the current `update` that's hash.update( v.data(), v.size() ); And if you have the data in `std::string st`, that's hash.update( st.data(), st.size() ); Now suppose `update` takes `span<unsigned char const>`. For `v`, this is hash.update( v ); which is nice; but for `st`, that would be (assuming C++17) hash.update( as_bytes( span{st} ) ); which is not so nice. In Chromium you have as_byte_span, which does as_bytes(span(x)), so it'd be better.

Claudio DeSouza

2:41 p.m.

I tend to disagree. In my opinion even this: hash.update( as_bytes( span{st} ) ); Is better than this: hash.update( v.data(), v.size() ); Something like as_byte_span is just an extra nicety, but I think it is easy to underestimate how you can avoid mistakes with the first example as opposed to the second. On Wed, Dec 4, 2024 at 2:38 PM Peter Dimov via Boost <boost@lists.boost.org> wrote:

...

Claudio DeSouza wrote:

...
...
So I switched `update` back to void const*. The "type safety" gains aren't worth the substantial decrease in usability.

In my experience, the usability/readability gains are on the side of using span. This may involve providing extensions, but at the end of the day you get more C++-friendly constructs.

Well... no, they aren't.

Suppose you have the data in `std::vector<unsigned char> v`. With the current `update` that's

hash.update( v.data(), v.size() );

And if you have the data in `std::string st`, that's

hash.update( st.data(), st.size() );

Now suppose `update` takes `span<unsigned char const>`. For `v`, this is

hash.update( v );

which is nice; but for `st`, that would be (assuming C++17)

hash.update( as_bytes( span{st} ) );

which is not so nice.

In Chromium you have as_byte_span, which does as_bytes(span(x)), so it'd be better.

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Peter Dimov

4:01 p.m.

Claudio DeSouza wrote:

...

I tend to disagree. In my opinion even this:

hash.update( as_bytes( span{st} ) );

Is better than this:

hash.update( v.data(), v.size() );

Maybe. But the two cases with the current interface are consistent - you do the same thing for both vector and string - and the two cases with the "normal" span aren't. I would prefer a span-style interface where the two cases look the same, like this hash.update( v ); hash.update( st ); There'd be no questions here as to whether this is a usability gain compared to the current interface. (Ideally, in a hypothetical stdlib, .data would return a span and the above would be hash.update( v.data() ); hash.update( st.data() ); to make it explicit that we're dropping to a lower level interface, but one can't have everything.)

...

Something like as_byte_span is just an extra nicety, but I think it is easy to underestimate how you can avoid mistakes with the first example as opposed to the second.

On Wed, Dec 4, 2024 at 2:38 PM Peter Dimov via Boost <boost@lists.boost.org> wrote:

...
Claudio DeSouza wrote:

...
...
So I switched `update` back to void const*. The "type safety" gains aren't worth the substantial decrease in usability.

In my experience, the usability/readability gains are on the side of using span. This may involve providing extensions, but at the end of the day you get more C++-friendly constructs.

Well... no, they aren't.

Suppose you have the data in `std::vector<unsigned char> v`. With the current `update` that's

hash.update( v.data(), v.size() );

And if you have the data in `std::string st`, that's

hash.update( st.data(), st.size() );

Now suppose `update` takes `span<unsigned char const>`. For `v`, this is

hash.update( v );

which is nice; but for `st`, that would be (assuming C++17)

hash.update( as_bytes( span{st} ) );

which is not so nice.

In Chromium you have as_byte_span, which does as_bytes(span(x)), so it'd be better.

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Alexander Grund

5 Dec 5 Dec

8:54 a.m.

Am 04.12.24 um 17:01 schrieb Peter Dimov via Boost:

...

Claudio DeSouza wrote:

...
I tend to disagree. In my opinion even this:

hash.update( as_bytes( span{st} ) );

Is better than this:

hash.update( v.data(), v.size() ); Maybe.

But the two cases with the current interface are consistent - you do the same thing for both vector and string - and the two cases with the "normal" span aren't.

I would prefer a span-style interface where the two cases look the same, like this

hash.update( v ); hash.update( st );

There'd be no questions here as to whether this is a usability gain compared to the current interface.

(Ideally, in a hypothetical stdlib, .data would return a span and the above would be

hash.update( v.data() ); hash.update( st.data() );

to make it explicit that we're dropping to a lower level interface, but one can't have everything.)

If the hash-algorithms would accept `span<const void>` (or <const char>) instead as the only `update` method (no need for overloads) we could have hash_append( span{v} ); hash_append( span{st} ); by accepting a templated span and casting it to the span-type expected by the hash-algorithms. And/or a `hash2::span` that accepts any range-like in the ctor and doing the cast. This would allow the save interface Claudio proposed while still having the same interface. Or at least have it at the level of the hash algorithm as user interaction is through `hash_append` provided by the library so that could make all necessary "spanification" if I didn't miss anything

Peter Dimov

11:52 a.m.

Alexander Grund wrote:

...

If the hash-algorithms would accept `span<const void>` (or <const char>) instead as the only `update` method (no need for overloads) we could have

hash_append( span{v} ); hash_append( span{st} );

by accepting a templated span and casting it to the span-type expected by the hash-algorithms. And/or a `hash2::span` that accepts any range-like in the ctor and doing the cast.

Yes, obviously. However, there's an additional concern here that people perhaps don't appreciate. The hash algorithm requirements govern user-defined algorithms as well. N3980 wasn't supposed to be a library of hash algorithms; it was supposed to be a framework into which any user-defined hash algorithm can plug into, and be used for hashing C++ objects. That's why the hash algorithm interface is kept minimal and uncomplicated (which is at odds with making it as ergonomic as possible.) I can obviously support any form of `update` by making it templated and do the appropriate enable_if incantations, but I don't want to impose this requirement on hash algorithm authors. And I don't want to force hash algorithm authors to have to include <boost/hash2/span.hpp>, because this would mean a physical dependency on Hash2. At the moment, you can define a hash algorithm without including anything from Hash2, or anything from Boost. And that's how it has to be.

Alexander Grund

12:27 p.m.

...

And I don't want to force hash algorithm authors to have to include <boost/hash2/span.hpp>, because this would mean a physical dependency on Hash2. At the moment, you can define a hash algorithm without including anything from Hash2, or anything from Boost. And that's how it has to be. But you are currently defining the interface of the hash algorithms: constexpr void update( unsigned char const* data, std::size_t n ); My proposal is to define it as e.g. constexpr void update(span<unsigned char const>);

I.e. to go range-safe from the start. That would not depend on the hash library but on the chosen span library, which might be more acceptable. However in C++11/14 this would require the Boost span, so I guess this is also off the table, unless the interface would be required to support span-likes which could be done by users via templates and not require a specific span implementation. But even with the current interface of "potentially unsafe" hash algorithms the `hash_append` could be written to accept only spans/ranges. The proposed `hash2::span` would then only appear in the interface of the library, i.e. `hash_append` My thinking is: If this is intended to push N3980 forward for inclusion in a future standard then I'd expect it to not use the "old-style" pointer+size-pairs but std::span readily available in the targeted standard.

Peter Dimov

12:55 p.m.

Alexander Grund wrote:

...

But you are currently defining the interface of the hash algorithms: constexpr void update( unsigned char const* data, std::size_t n ); My proposal is to define it as e.g. constexpr void update(span<unsigned char const>);

Which span is that, though?

...

I.e. to go range-safe from the start. That would not depend on the hash library but on the chosen span library, which might be more acceptable. However in C++11/14 this would require the Boost span, so I guess this is also off the table, unless the interface would be required to support span-likes which could be done by users via templates and not require a specific span implementation.

Even if we could assume C++20 and std::span, the interface taking std::span<unsigned char const> is inferior in usability to the current approach because it doesn't work well for ranges of `char` or `byte`.

...

But even with the current interface of "potentially unsafe" hash algorithms the `hash_append` could be written to accept only spans/ranges.

I'm not sure what this means. hash_append already accepts spans and ranges, and is type safe.

Alexander Grund

1:03 p.m.

Am 05.12.24 um 13:55 schrieb Peter Dimov via Boost:

...

Alexander Grund wrote:

...
But you are currently defining the interface of the hash algorithms: constexpr void update( unsigned char const* data, std::size_t n ); My proposal is to define it as e.g. constexpr void update(span<unsigned char const>); Which span is that, though? I left that open to the "chosen span library" I referenced belowor a template that accepts a span-like type. Even if we could assume C++20 and std::span, the interface taking std::span<unsigned char const> is inferior in usability to the current approach because it doesn't work well for ranges of `char` or `byte`. I copied that from the source and missed it in the documentation. So span<void const> then or as that isn't accepted span<char const> assuming sizeof(char)==sizeof(byte) I'm not sure what this means. hash_append already accepts spans and ranges, and is type safe. My bad, sorry for not double-checking.

Peter Dimov

1:31 p.m.

Alexander Grund wrote:

...

So span<void const> then or as that isn't accepted span<char const> assuming sizeof(char)==sizeof(byte)

span<void const> doesn't exist, though. I was giving it as an example of something that I would have added to the hypothetical hash2::span, but no existing span library has it. If we limit ourselves to std::span, I suppose we can just add three (or four) overloads, one per byte type. (Four because of char8_t.) Definitely something to consider when Boost drops support for C++17. Should be any time now.

Zach Laine

4:08 p.m.

On Thu, Dec 5, 2024 at 7:31 AM Peter Dimov via Boost <boost@lists.boost.org> wrote:

...

Alexander Grund wrote:

...
So span<void const> then or as that isn't accepted span<char const> assuming sizeof(char)==sizeof(byte)

span<void const> doesn't exist, though. I was giving it as an example of something that I would have added to the hypothetical hash2::span, but no existing span library has it.

If we limit ourselves to std::span, I suppose we can just add three (or four) overloads, one per byte type.

(Four because of char8_t.)

This part confuses me. It's always safe to cast to void const *; what not just have a single template that takes a std::span<T>, and cast it's .begin() and .end() to void const *? Zach

Peter Dimov

4:47 p.m.

Zach Laine wrote:

...

On Thu, Dec 5, 2024 at 7:31 AM Peter Dimov via Boost <boost@lists.boost.org> wrote:

...
Alexander Grund wrote:

...
So span<void const> then or as that isn't accepted span<char const> assuming sizeof(char)==sizeof(byte)

span<void const> doesn't exist, though. I was giving it as an example of something that I would have added to the hypothetical hash2::span, but no existing span library has it.

If we limit ourselves to std::span, I suppose we can just add three (or four) overloads, one per byte type.

(Four because of char8_t.)

This part confuses me. It's always safe to cast to void const *; what not just have a single template that takes a std::span<T>, and cast it's .begin() and .end() to void const *?

If you mean this template<class T> void update( std::span<T> sp ); then the reason to not use it is that you can't pass things like `string` and `vector<unsigned char>` and `char[65536]` directly because deduction will not work. I'm not particularly fond of the user-facing syntax being hash.update( std::span{buffer} ); instead of hash.update( buffer );

Zach Laine

6:26 p.m.

On Thu, Dec 5, 2024 at 10:47 AM Peter Dimov via Boost <boost@lists.boost.org> wrote:

...

Zach Laine wrote:

...
On Thu, Dec 5, 2024 at 7:31 AM Peter Dimov via Boost <boost@lists.boost.org> wrote:

...
Alexander Grund wrote:

...
So span<void const> then or as that isn't accepted span<char const> assuming sizeof(char)==sizeof(byte)

span<void const> doesn't exist, though. I was giving it as an example of something that I would have added to the hypothetical hash2::span, but no existing span library has it.

If we limit ourselves to std::span, I suppose we can just add three (or four) overloads, one per byte type.

(Four because of char8_t.)

This part confuses me. It's always safe to cast to void const *; what not just have a single template that takes a std::span<T>, and cast it's .begin() and .end() to void const *?

If you mean this

template<class T> void update( std::span<T> sp );

then the reason to not use it is that you can't pass things like `string` and `vector<unsigned char>` and `char[65536]` directly because deduction will not work.

I'm not particularly fond of the user-facing syntax being

hash.update( std::span{buffer} );

instead of

hash.update( buffer );

Thanks, that makes sense. Looking at the docs, I think I get it now -- this is a user-provided API that your lib requires, right? If that's the case, I think I'd also require what you have, since there's no single concrete span type you could write there. An alternative might be to simply change the API depending on the value of __cplusplus -- void* + size_t before C++20, and std::span<std::byte> in C++20 and later. I realize some people might hate this. :) Zach

Andrey Semashev

1:05 p.m.

On 12/5/24 15:27, Alexander Grund via Boost wrote:

...

...
And I don't want to force hash algorithm authors to have to include <boost/hash2/span.hpp>, because this would mean a physical dependency on Hash2. At the moment, you can define a hash algorithm without including anything from Hash2, or anything from Boost. And that's how it has to be. But you are currently defining the interface of the hash algorithms: constexpr void update( unsigned char const* data, std::size_t n ); My proposal is to define it as e.g. constexpr void update(span<unsigned char const>);

I.e. to go range-safe from the start. That would not depend on the hash library but on the chosen span library, which might be more acceptable. However in C++11/14 this would require the Boost span, so I guess this is also off the table, unless the interface would be required to support span-likes which could be done by users via templates and not require a specific span implementation.

But even with the current interface of "potentially unsafe" hash algorithms the `hash_append` could be written to accept only spans/ranges. The proposed `hash2::span` would then only appear in the interface of the library, i.e. `hash_append`

My thinking is: If this is intended to push N3980 forward for inclusion in a future standard then I'd expect it to not use the "old-style" pointer+size-pairs but std::span readily available in the targeted standard.

What is the practical advantage of using span (of any provider) over the traditional pointer+size? I emphasize the "practical" part, as opposed to any personal preference to "modern-ness", code style or anything like that. I'm asking because in my practice I often don't see the point in wrapper classes such as span<uint8_t> and iterator_range<uint8_t*> because the original data is often received as a pointer+size pair (or a pair of pointers) and then consumed through a pointer+size - either by my own code or by a third party code, such as an external library, which is often a C library. A wrapper class is useful for storing as a data member in a class for associating the pointer and the size, but as far as passing the range around is concerned, I don't see much point. I do agree with Peter that not having to depend on Boost.Hash2 to be able to define hash functions and adding support for hashing to user's types is an important advantage.

Claudio DeSouza

1:12 p.m.

...

What is the practical advantage of using span (of any provider) over the traditional pointer+size? I emphasize the "practical" part, as opposed to any personal preference to "modern-ness", code style or anything like that.

Like I explained in my original review, there is now a new type of program analysis guarded behind `-Wunsafe-buffers-usage` that this library would greatly benefit from, if it were spanified. The warning itself is opt-in to users. In the end of the day, this is about removing the chance of certain categories of mistakes around the use of pointers, while improving readability.

Peter Dimov

1:38 p.m.

Andrey Semashev wrote:

...

What is the practical advantage of using span (of any provider) over the traditional pointer+size? I emphasize the "practical" part, as opposed to any personal preference to "modern-ness", code style or anything like that.

The practical advantage is that if a library is rewritten to use spans throughout, and never pointer arithmetic, this (a) automatically adds bounds checks everywhere and (b) makes the library warning-free under the new -Wunsafe-buffer-usage, which is all the rage nowadays because Chromium developers are rewriting Chromium as above. https://clang.llvm.org/docs/SafeBuffers.html Longer term, it's quite likely that a memory safety "Profile" (Profiles are also all the rage nowadays because they are Bjarne and Herb's proposal to "solve" safety) will be essentially equivalent to -Werror=unsafe-buffer-usage as far as pointer arithmetic is concerned. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3081r0.pdf

Joaquín M López Muñoz

1:11 p.m.

...

No dia 5 de dez. de 2024, às 12:52, Peter Dimov via Boost <boost@lists.boost.org> escreveu:

[…] And I don't want to force hash algorithm authors to have to include <boost/hash2/span.hpp>, because this would mean a physical dependency on Hash2. At the moment, you can define a hash algorithm without including anything from Hash2, or anything from Boost. And that's how it has to be.

You can define the interface of update as accepting a templatized arg with span-like semantics. Joaquin M Lopez Munoz

Peter Dimov

1:40 p.m.

Joaquín M López Muñoz wrote:

...

...
No dia 5 de dez. de 2024, às 12:52, Peter Dimov via Boost <boost@lists.boost.org> escreveu:

[…] And I don't want to force hash algorithm authors to have to include <boost/hash2/span.hpp>, because this would mean a physical dependency on Hash2. At the moment, you can define a hash algorithm without including anything from Hash2, or anything from Boost. And that's how it has to be.

You can define the interface of update as accepting a templatized arg with span-like semantics.

That's what I said in the part you omitted. I consider this a burden that I'd rather not impose on authors of hash algorithms.

Claudio DeSouza

2:13 p.m.

I particularly think that at this point span is heading on a direction to be as fundamental as string_view, etc. Claudio

Peter Dimov

2:28 p.m.

Claudio DeSouza wrote:

...

I particularly think that at this point span is heading on a direction to be as fundamental as string_view, etc.

span is more fundamental than string_view; it's basically a fat pointer. (Like T* under CHERI, or T? in Cyclone, etc.)

Vinnie Falco

8:06 p.m.

On Thu, Dec 5, 2024 at 5:11 AM Joaquín M López Muñoz via Boost < boost@lists.boost.org> wrote:

...

You can define the interface of update as accepting a templatized arg with span-like semantics.

This could work. Instead of writing: h.update( span ); You instead write: boost::hash2::update( h, span ); The signatures for update can be conditionally compiled based on C++ version, availability of boost::span, or it could be a function template based on a span-like concept. Hasher's update signature would remain as ( void const*, std::size_t ). Authors of hash algorithms would not need to include a span header or add additional signatures. Thanks to Matt Borland for helping me work this out. Regards

Claudio DeSouza

6 Dec 6 Dec

1:28 a.m.

...

based on a span-like concept. Hasher's update signature would remain as ( void const*, std::size_t ). Authors of hash algorithms would not need to include a span header or add additional signatures.

Am I missing something? Why would that be a benefit? Having safe interfaces is the whole point I was raising. A span header is not something onerous to include. Why should any hashing authors not be provided with friendly, safer interfaces? Claudio.

Julien Blanc

7:30 a.m.

Le jeudi 05 décembre 2024 à 12:06 -0800, Vinnie Falco via Boost a écrit :

...

On Thu, Dec 5, 2024 at 5:11 AM Joaquín M López Muñoz via Boost < boost@lists.boost.org> wrote:

...
You can define the interface of update as accepting a templatized arg with span-like semantics.

This could work. Instead of writing:

h.update( span );

You instead write:

boost::hash2::update( h, span );

+1 on this approach. The proposed library currently consists of two things : * a concept that hashers must use (for hash implementors) * an hash_append helper to ease the use of hash functions with objects I think it should evolve into something more like : * a concept that hashers must use * a public api for hash users This means that for the library user, the hash object would basically be an opaque type, only accessed through calls like boost::hash2::hash_append(h, X) or boost::hash2::update(h, Y). The public api can then use a set of overloads for Y to anything we want to support (the multiple needed span<X>, a pair of pointers, a pointer + a size, etc.). That burden is left out of hash implementors.

...

The signatures for update can be conditionally compiled based on C++ version, availability of boost::span, or it could be a function template based on a span-like concept.

...

Hasher's update signature would remain as (void const*, std::size_t ).

Or, hasher's signature can be selected through a set of supported ones. I'm not convinced by the ”hash writer shall not depend on boost::hash2” argument, but if that really matters then it's technically doable without forcing the use of the pointer + size pair for hash implementors. Regards, Julien

Alexander Grund

8:31 a.m.

...

...
...
You can define the interface of update as accepting a templatized arg with span-like semantics.

This could work. Instead of writing:

h.update( span );

You instead write:

boost::hash2::update( h, span );

+1 on this approach.

The proposed library currently consists of two things :

* a concept that hashers must use (for hash implementors) * an hash_append helper to ease the use of hash functions with objects As far as I understood this is exactly how the library is implemented: It is not `boost::hash2::update( h, span )` but `boost::hash2::hash_append( h, {}, span )` but thats minor.

So this discussion is only on which "concept that hashers must use".

...

If you mean this

template<class T> void update( std::span<T> sp );

then the reason to not use it is that you can't pass things like `string` and `vector<unsigned char>` and `char[65536]` directly because deduction will not work.

I'm not particularly fond of the user-facing syntax being

hash.update( std::span{buffer} );

instead of

hash.update( buffer ); Related to the above: Isn't the user-facing API the `hash_append` function and this update method is part of the concept for hash algorithms? And `hash_append` can massage the input into whatever that concept requires.

Given that I kind see that point why void*-size is acceptable, except for the new warning/safety-guidline against such pairs. If that should be adhered too (note that I no longer lean strongly either way) the interface could be defined using a SpanConcept: `void update(SpanConcept)` where `concept SpanConcept = requires(T t) { t.data(); t.size(); }` (Use of C++20 syntax only for illustration) So implementers are not required to support a particular span but to accept any span-like that `hash_append` passes to it. However I see the shortcoming that this would possibly exclude `template<typename T> void update(std::span<T>)` because `hash_append` might use `boost::span` which isn't(?) convertible to std::span (or whatever span type the hash algorithm accepts). So hash-algorithms need to use `template<class T> void update(T data_span)` exclusively, which isn't great either. --> void*-size is likely the most flexible and least burden for hash algorithm authors. If there is the way to silence the compiler warning in a reasonable way I don't see a reason against it.

Julien Blanc

9:22 a.m.

6 décembre 2024 à 09:31 "Alexander Grund via Boost" a écrit:

...

...
The proposed library currently consists of two things :

* a concept that hashers must use (for hash implementors) * an hash_append helper to ease the use of hash functions with objects

As far as I understood this is exactly how the library is implemented: It is not `boost::hash2::update( h, span )` but `boost::hash2::hash_append( h, {}, span )` but thats minor.

I have a different understanding. Quoting the doc, “This library addresses two major use cases: hashing an untyped sequence of bytes, and hashing C++ objects.”. But for the former, the library relies on the user to directly call the hasher object update function. This is the part i don't feel comfortable, with, i think a hasher completly opaque for the user of the library would be a better design. That's just a minor change in term of implementation.

...

So this discussion is only on which "concept that hashers must use".

I believe part of the discussion is also a misunderstanding on what the public api is. It seems we don't all have the same opinion on this. Mine is that update is also part of the public api, not only hash_append. This public api needs to be at least compatible with modern / safe paradigms. This is not necessarily

...

Related to the above: Isn't the user-facing API the `hash_append` function and this update method is part of the concept for hash algorithms? And `hash_append` can massage the input into whatever that concept requires.

If i understand correctly: std::span<char> v("123", 3) hash_append(h, v); // hash "123" + the size 3, as v is a container h.update(v); // hash only "123". Or did i miss something? Regards, Julien

Andrey Semashev

10:15 a.m.

On 12/6/24 12:22, Julien Blanc via Boost wrote:

...

6 décembre 2024 à 09:31 "Alexander Grund via Boost" a écrit:

...
...
The proposed library currently consists of two things :

* a concept that hashers must use (for hash implementors) * an hash_append helper to ease the use of hash functions with objects

As far as I understood this is exactly how the library is implemented: It is not `boost::hash2::update( h, span )` but `boost::hash2::hash_append( h, {}, span )` but thats minor.

I have a different understanding.

Quoting the doc, “This library addresses two major use cases: hashing an untyped sequence of bytes, and hashing C++ objects.”.

But for the former, the library relies on the user to directly call the hasher object update function. This is the part i don't feel comfortable, with, i think a hasher completly opaque for the user of the library would be a better design. That's just a minor change in term of implementation.

A hasher must expose a common public interface anyway to be compatible with hash_append, and at this point why not use the interface in the user types hash functions directly? What value a free hash_append function would bring? The only thing that comes to mind is it could have a more flexible interface, as Vinnie suggested, to keep the hasher interface minimal, but that's about it. I do think though that whether or not a free hash_append function is provided, using the hasher directly through its public interface should be allowed. This would allow the user types' hash functions not depend on Boost.Hash2 and therefore make support for hashing more lightweight and potentially compatible with std::hash2, if one ever comes to light.

Julien Blanc

11:13 a.m.

6 décembre 2024 à 11:15 "Andrey Semashev via Boost" a écrit:

...

A hasher must expose a common public interface anyway to be compatible with hash_append

This is stricter than what is actually required. hash_append can be made to work with hashers having a different interface, as long as they follow a concept known to hash_append / hash_update. This concept does not need to rely on types defined by boost::hash2, the dependency is the other way (boost::hash2 must include a specialization for the interface of the hasher).

...

, and at this point why not use the interface in the user types hash functions directly? What value a free hash_append function would bring? The only thing that comes to mind is it could have a more flexible interface, as Vinnie suggested, to keep the hasher interface minimal, but that's about it.

That is imho largely worth it. This also gives a lot more flexibility / room for evolution. There is an order of magnitude between the number of usages of the hashing api, and the number of hashers used in a project. Changing the hashing api means changing every call to hash_append (resp hash_update). Changing the hasher concept only means changing a few files. But, as i said before, you can even support multiple hasher concepts in hash_append / hash_update, so changing the hasher recommended api can be made while keeping old hasher api still working. Something that cannot be done if you talk directly to the hasher.

...

I do think though that whether or not a free hash_append function is provided, using the hasher directly through its public interface should be allowed.

I don't suggest to make changes to specifically forbid such use. Just that the nominal / advertised usage uses a thin wrapper (which, i believe, would be fully inlined and not result in additional runtime cost).

...

This would allow the user types' hash functions not depend on Boost.Hash2 and therefore

This is not required by what i suggest.

...

make support for hashing more lightweight

I expect the wrapper to incur no additional runtime cost (over manual conversions that would be done at call site instead).

...

and potentially compatible with std::hash2, if one ever comes to light.

I do not see how this is related. Actually, i think it's quite the opposite: by decoupling the user API from the hasher concept API, you should make it easier to integrate new standards / requirements on either side. Or i'm missing something obvious here. Regards, Julien

Andrey Semashev

1:24 p.m.

On 12/6/24 14:13, Julien Blanc via Boost wrote:

...

6 décembre 2024 à 11:15 "Andrey Semashev via Boost" a écrit:

...
A hasher must expose a common public interface anyway to be compatible with hash_append

This is stricter than what is actually required. hash_append can be made to work with hashers having a different interface, as long as they follow a concept known to hash_append / hash_update. This concept does not need to rely on types defined by boost::hash2, the dependency is the other way (boost::hash2 must include a specialization for the interface of the hasher).

...
, and at this point why not use the interface in the user types hash functions directly? What value a free hash_append function would bring? The only thing that comes to mind is it could have a more flexible interface, as Vinnie suggested, to keep the hasher interface minimal, but that's about it.

That is imho largely worth it. This also gives a lot more flexibility / room for evolution. There is an order of magnitude between the number of usages of the hashing api, and the number of hashers used in a project. Changing the hashing api means changing every call to hash_append (resp hash_update). Changing the hasher concept only means changing a few files.

But, as i said before, you can even support multiple hasher concepts in hash_append / hash_update, so changing the hasher recommended api can be made while keeping old hasher api still working. Something that cannot be done if you talk directly to the hasher.

...
I do think though that whether or not a free hash_append function is provided, using the hasher directly through its public interface should be allowed.

I don't suggest to make changes to specifically forbid such use. Just that the nominal / advertised usage uses a thin wrapper (which, i believe, would be fully inlined and not result in additional runtime cost).

...
This would allow the user types' hash functions not depend on Boost.Hash2 and therefore

This is not required by what i suggest.

...
make support for hashing more lightweight

I expect the wrapper to incur no additional runtime cost (over manual conversions that would be done at call site instead).

I also had compile-time dependencies in mind when I said "lightweight". To take the example from the docs: https://pdimov.github.io/hash2/doc/html/hash2.html#hashing_objects_user_defi... class X { private: std::string a; int b; // ... public: template<class Hash, class Flavor> friend void tag_invoke( boost::hash2::hash_append_tag const&, Hash& h, Flavor const& f, X const& v ) { boost::hash2::hash_append(h, f, v.a); boost::hash2::hash_append(h, f, v.b); } }; This requires one to depend on Boost.Hash2 for hash_append_tag and hash_append. I would prefer if there was a way (properly documented and supported) to avoid this dependency entirely. I.e. so that something like this is a working and advertised way to add support for hashing to your type: class X { private: std::string a; int b; // ... public: template<class Hash, class Flavor> friend void hash_append( Hash& h, Flavor const& f, X const& v ) { h.update(f, v.a); h.update(f, v.a.size()); h.update(f, v.b); } }; X x; boost::hash2::fnv1a_64 h; boost::hash2::hash_append(h, {}, x); // works

...

...
and potentially compatible with std::hash2, if one ever comes to light.

I do not see how this is related.

What I meant by this is that in the above example Hash could become std::fnv1a_64 or whatever, if the std library adopts the proposal, and the existing hashing infrastructure will automatically support it. This is one of the main selling points of the library/proposal, after all. X x; std::fnv1a_64 h; std::hash_append(h, {}, x); // works, with no changes to X

Peter Dimov

1:47 p.m.

Andrey Semashev wrote:

...

I also had compile-time dependencies in mind when I said "lightweight". To take the example from the docs:

https://pdimov.github.io/hash2/doc/html/hash2.html#hashing_objects_user_defi...

class X { private: std::string a; int b;

// ... public: template<class Hash, class Flavor> friend void tag_invoke( boost::hash2::hash_append_tag const&, Hash& h, Flavor const& f, X const& v ) { boost::hash2::hash_append(h, f, v.a); boost::hash2::hash_append(h, f, v.b); } };

This requires one to depend on Boost.Hash2 for hash_append_tag and hash_append. I would prefer if there was a way (properly documented and supported) to avoid this dependency entirely.

Once things stabilize, you will be allowed to copy hash_append_fwd.hpp locally (or just declare the functions yourself.) This has always been the intent, but GCC bugs and BOOST_CXX14_CONSTEXPR complicated matters. As I mentioned, we briefly considered requiring C++14 so that the latter isn't an issue, but the feedback on Slack was negative. People love C++11 for some reason or other. Either way, this still wouldn't have solved the GCC issue. https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/hash_append...

Claudio DeSouza

1:51 p.m.

...

As I mentioned, we briefly considered requiring C++14 so that the latter isn't an issue, but the feedback on Slack was negative. People love C++11 for some reason or other. Either way, this still wouldn't have solved the GCC issue.

But is Slack representative of a broader C++ audience. It depends which channel you asked at, because if you asked at #boost, you will have a biased response in favour of C++11.

Peter Dimov

2:01 p.m.

Claudio DeSouza wrote:

...

...
As I mentioned, we briefly considered requiring C++14 so that the latter isn't an issue, but the feedback on Slack was negative. People love C++11 for some reason or other. Either way, this still wouldn't have solved the GCC issue.

But is Slack representative of a broader C++ audience. It depends which channel you asked at, because if you asked at #boost, you will have a biased response in favour of C++11.

Well, we can ask here. Do people care about C++11 support in Hash2? Would it be acceptable to impose a minimum requirement of C++14?

Andrey Semashev

2:14 p.m.

On 12/6/24 17:01, Peter Dimov via Boost wrote:

...

Claudio DeSouza wrote:

...
...
As I mentioned, we briefly considered requiring C++14 so that the latter isn't an issue, but the feedback on Slack was negative. People love C++11 for some reason or other. Either way, this still wouldn't have solved the GCC issue.

But is Slack representative of a broader C++ audience. It depends which channel you asked at, because if you asked at #boost, you will have a biased response in favour of C++11.

Well, we can ask here.

Do people care about C++11 support in Hash2? Would it be acceptable to impose a minimum requirement of C++14?

There has to be a cost/benefit analysis. That is, if C++14 is the baseline, will that significantly affect the library design and implementation? And what is the estimate of C++11-only user base that will be cut off? On the first question, library authors are probably the ones who can comment. The second one is more difficult to answer, but I think someone posted here survey results a while ago. Stock compiler versions in the major OSes is another indication. Personally, I'm fine with anything up to and including C++17.

Peter Dimov

4:06 p.m.

Andrey Semashev wrote:

...

Personally, I'm fine with anything up to and including C++17.

If only. I could definitely use a few `if constexpr` here and there.

Dominique Devienne

3:23 p.m.

On Fri, Dec 6, 2024 at 3:02 PM Peter Dimov via Boost <boost@lists.boost.org> wrote:

...

Claudio DeSouza wrote:

...
...
the feedback on Slack was negative.

But is Slack representative of a broader C++ audience?

Do people care about C++11 support in Hash2?

No. --DD

Alexander Grund

3:26 p.m.

...

Well, we can ask here.

Do people care about C++11 support in Hash2? Would it be acceptable to impose a minimum requirement of C++14? For a new library I think it's very much ok.

Vinnie Falco

3:27 p.m.

On Fri, Dec 6, 2024 at 6:02 AM Peter Dimov via Boost <boost@lists.boost.org> wrote:

...

Do people care about C++11 support in Hash2? Would it be acceptable to impose a minimum requirement of C++14?

If C++14 allows for a better public-facing design then it is probably best to require it. Can we get a Tony Table? Thanks

Peter Dimov

3:33 p.m.

Vinnie Falco wrote:

...

On Fri, Dec 6, 2024 at 6:02 AM Peter Dimov via Boost <boost@lists.boost.org <mailto:boost@lists.boost.org> > wrote:

Do people care about C++11 support in Hash2? Would it be acceptable to impose a minimum requirement of C++14?

If C++14 allows for a better public-facing design then it is probably best to require it. Can we get a Tony Table?

No, the public-facing design doesn't change. `BOOST_CXX14_CONSTEXPR` is replaced by the less ugly `constexpr` and that's pretty much it.

Vinnie Falco

3:48 p.m.

On Fri, Dec 6, 2024 at 7:33 AM Peter Dimov <pdimov@gmail.com> wrote:

...

No, the public-facing design doesn't change. `BOOST_CXX14_CONSTEXPR` is replaced by the less ugly `constexpr` and that's pretty much it.

My next question is for those individuals who proclaim that requiring C++14 and later is ok. What is the specific rationale you used to determine whether the loss of users is worth the benefit of "BOOST_CXX14_CONSTEXPR replaced by the less ugly constexpr`?" Thanks

Julien Blanc

3:57 p.m.

6 décembre 2024 à 16:48 "Vinnie Falco via Boost" a écrit:

...

On Fri, Dec 6, 2024 at 7:33 AM Peter Dimov <pdimov@gmail.com> wrote:

...
No, the public-facing design doesn't change. `BOOST_CXX14_CONSTEXPR` is replaced by the less ugly `constexpr` and that's pretty much it.

My next question is for those individuals who proclaim that requiring C++14 and later is ok. What is the specific rationale you used to determine whether the loss of users is worth the benefit of "BOOST_CXX14_CONSTEXPR replaced by the less ugly constexpr`?"

Not speaking for the others, but “i don't use c++11 anymore” is my usual answer. Finally dropped gcc4.8, migrated everything to C++20, won't go back for sure :). Regards, Julien

Peter Dimov

3:57 p.m.

Vinnie Falco wrote:

...

On Fri, Dec 6, 2024 at 7:33 AM Peter Dimov <pdimov@gmail.com <mailto:pdimov@gmail.com> > wrote:

No, the public-facing design doesn't change. `BOOST_CXX14_CONSTEXPR` is replaced by the less ugly `constexpr` and that's pretty much it.

My next question is for those individuals who proclaim that requiring C++14 and later is ok. What is the specific rationale you used to determine whether the loss of users is worth the benefit of "BOOST_CXX14_CONSTEXPR replaced by the less ugly constexpr`?"

They don't need a rationale. I'm asking their opinions on whether a C++14 requirement would be fine _for them_. Not asking them to speculate whether a C++14 requirement wouldn't be fine _for some unspecified other people_.

Vinnie Falco

4:19 p.m.

On Fri, Dec 6, 2024 at 7:57 AM Peter Dimov <pdimov@gmail.com> wrote:

...

They don't need a rationale. I'm asking their opinions on whether a C++14 requirement would be fine _for them_. Not asking them to speculate whether a C++14 requirement wouldn't be fine _for some unspecified other people_.

When I read "Do people care about C++11 support in hash2" I interpret "people" as "unspecified other people". And when I read "would it be acceptable to impose a minimum requirement of C++14" I understand this to be asking if it is acceptable to impose the C++14 requirement on everyone. My apologies for the misunderstanding. Thanks

Matt Borland

4:26 p.m.

...

On Fri, Dec 6, 2024 at 7:57 AM Peter Dimov pdimov@gmail.com wrote:

...

...
They don't need a rationale. I'm asking their opinions on whether a C++14 requirement would be fine for them. Not asking them to speculate whether a C++14 requirement wouldn't be fine for some unspecified other people.

...

When I read "Do people care about C++11 support in hash2" I interpret "people" as "unspecified other people". And when I read "would it be acceptable to impose a minimum requirement of C++14" I understand this to be asking if it is acceptable to impose the C++14 requirement on everyone. My apologies for the misunderstanding.

...

Thanks

To my knowledge now that RHEL 7 has gone EOL there is no supported OS without a C++14 compatible compiler by default. Naked constexpr will break MSVC 14.0, but that's not a big loss. Anecdotally Math and Multiprecision moved to C++14 a few years ago with 0 complaints. I know for fact those run on pretty special hardware (e.g. Oak Ridge's supercomputers) with their own specialized compilers and the situation is fine. Matt

Andrey Semashev

4 p.m.

On 12/6/24 18:48, Vinnie Falco via Boost wrote:

...

On Fri, Dec 6, 2024 at 7:33 AM Peter Dimov <pdimov@gmail.com> wrote:

...
No, the public-facing design doesn't change. `BOOST_CXX14_CONSTEXPR` is replaced by the less ugly `constexpr` and that's pretty much it.

My next question is for those individuals who proclaim that requiring C++14 and later is ok. What is the specific rationale you used to determine whether the loss of users is worth the benefit of "BOOST_CXX14_CONSTEXPR replaced by the less ugly constexpr`?"

If you're addressing me, I've expressed my thoughts in my previous post: https://lists.boost.org/Archives/boost/2024/12/258515.php I'm not proclaiming anything, and instead I'm saying that the decision is on the authors. If the authors do not think that "replacing BOOST_CXX14_CONSTEXPR with conspexpr" is worth the cost, then don't require C++14, simple.

Rainer Deyke

7 Dec 7 Dec

12:47 p.m.

On 06.12.24 15:01, Peter Dimov via Boost wrote:

...

Do people care about C++11 support in Hash2? Would it be acceptable to impose a minimum requirement of C++14?

Personally, I'm OK with anything up to C++20. Upgrading my codebase to C++20 was difficult, but it's done now, and I won't be going back. On a more philosophical note, I'd rather see a new library drop C++11 support before it is released than 20 years after it is released. Dropping support will prevent some projects from using the library, but it won't break any existing code because that code does not exist yet. Dropping support later *will* break existing code at that point, even if the C++ standard being dropped is 33 years old at that point. -- Rainer Deyke - rainerd@eldwood.com

Andrey Semashev

6 Dec 6 Dec

2:17 p.m.

On 12/6/24 16:47, Peter Dimov via Boost wrote:

...

Andrey Semashev wrote:

...
I also had compile-time dependencies in mind when I said "lightweight". To take the example from the docs:

https://pdimov.github.io/hash2/doc/html/hash2.html#hashing_objects_user_defi...

class X { private: std::string a; int b;

// ... public: template<class Hash, class Flavor> friend void tag_invoke( boost::hash2::hash_append_tag const&, Hash& h, Flavor const& f, X const& v ) { boost::hash2::hash_append(h, f, v.a); boost::hash2::hash_append(h, f, v.b); } };

This requires one to depend on Boost.Hash2 for hash_append_tag and hash_append. I would prefer if there was a way (properly documented and supported) to avoid this dependency entirely.

Once things stabilize, you will be allowed to copy hash_append_fwd.hpp locally (or just declare the functions yourself.)

This has always been the intent, but GCC bugs and BOOST_CXX14_CONSTEXPR complicated matters.

As I mentioned, we briefly considered requiring C++14 so that the latter isn't an issue, but the feedback on Slack was negative. People love C++11 for some reason or other. Either way, this still wouldn't have solved the GCC issue.

https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/hash_append...

Forward declarations are kind of fragile and they limit future evolution of the library. And they are tedious to add, considering that they would be needed everywhere where hashing support is needed. I would rather prefer the approach that doesn't need forward declarations.

Peter Dimov

2:34 p.m.

Andrey Semashev wrote:

...

...
https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/hash_append...

Forward declarations are kind of fragile and they limit future evolution of the library. And they are tedious to add, considering that they would be needed everywhere where hashing support is needed. I would rather prefer the approach that doesn't need forward declarations.

Maybe. Something could probably be figured out, but I don't think your sketch would be it. We can't (or rather, don't want to) require from hash algorithm authors to provide the full hash_append functionality via member functions, and wrapping doesn't work that well, at least in my head. Maybe I need to pass a helper object that provides the hash_append API as member functions. Abusing the tag for that would be a bit too clever. :-)

Andrey Semashev

2:41 p.m.

On 12/6/24 17:34, Peter Dimov via Boost wrote:

...

Andrey Semashev wrote:

...
...
https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/hash_append...

Forward declarations are kind of fragile and they limit future evolution of the library. And they are tedious to add, considering that they would be needed everywhere where hashing support is needed. I would rather prefer the approach that doesn't need forward declarations.

Maybe. Something could probably be figured out, but I don't think your sketch would be it.

We can't (or rather, don't want to) require from hash algorithm authors to provide the full hash_append functionality via member functions, and wrapping doesn't work that well, at least in my head.

Maybe I need to pass a helper object that provides the hash_append API as member functions.

Yes, that sounds good to me.

...

Abusing the tag for that would be a bit too clever. :-)

On that note, what's the use of the tag argument? I have only seen hash_append_tag so far, and I don't think adding new tags in the future would be any different from using separate functions (i.e. to support a new tag would require about the same amount of user's code modifications as adding a new friend function to the user's classes).

Peter Dimov

2:48 p.m.

Andrey Semashev wrote:

...

On that note, what's the use of the tag argument? I have only seen hash_append_tag so far, and I don't think adding new tags in the future would be any different from using separate functions (i.e. to support a new tag would require about the same amount of user's code modifications as adding a new friend function to the user's classes).

This is the tag_invoke pattern: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1895r0.pdf Instead of every library making up its own identifier for ADL purposes (e.g. in Hash2 I used do_hash_append in the past), we use tag_invoke with a library-specific tag. The advantage is that the tag is namespaced. If two libraries pick the same name for their ADL customization point, they will conflict; but if two libraries pick the same tag name, they will not conflict because the tags are in their respective namespaces. We use tag_invoke in Boost.JSON as well.

Julien Blanc

3:54 p.m.

6 décembre 2024 à 14:24 "Andrey Semashev via Boost" a écrit:

...

...
I expect the wrapper to incur no additional runtime cost (over manual conversions that would be done at call site instead).

I also had compile-time dependencies in mind when I said "lightweight". To take the example from the docs:

https://pdimov.github.io/hash2/doc/html/hash2.html#hashing_objects_user_defi...

class X { private: std::string a; int b;

// ... public: template<class Hash, class Flavor> friend void tag_invoke( boost::hash2::hash_append_tag const&, Hash& h, Flavor const& f, X const& v ) { boost::hash2::hash_append(h, f, v.a); boost::hash2::hash_append(h, f, v.b); } };

This requires one to depend on Boost.Hash2 for hash_append_tag and hash_append. I would prefer if there was a way (properly documented and supported) to avoid this dependency entirely. I.e. so that something like this is a working and advertised way to add support for hashing to your type:

class X { private: std::string a; int b;

// ... public: template<class Hash, class Flavor> friend void hash_append( Hash& h, Flavor const& f, X const& v ) { h.update(f, v.a); h.update(f, v.a.size()); h.update(f, v.b); } };

X x; boost::hash2::fnv1a_64 h; boost::hash2::hash_append(h, {}, x); // works

I think i start to understand what you mean. In what i have in mind : * hash algorithm implementation and interface do not depend on boost::hash2 (this is the case currently) * custom object hashing does depend on boost::hash2 (this is also the case currently)

...

From your examples, it seems to me that you would like custom object hashing to also be independant from boost::hash2. Is that accurate?

...

...
...
and potentially compatible with std::hash2, if one ever comes to light.

I do not see how this is related.

What I meant by this is that in the above example Hash could become std::fnv1a_64 or whatever, if the std library adopts the proposal, and the existing hashing infrastructure will automatically support it. This is one of the main selling points of the library/proposal, after all.

X x; std::fnv1a_64 h; std::hash_append(h, {}, x); // works, with no changes to X

What i had in mind was more: X x; std::fnv1a_64 h; boost::hash2::hash_append(h, {}, x); // works, with no changes to X Notice the namespace is still boost::hash2. I expect it to work whatever std::fnv1a_64 finally chose as its interface. On a broader view, Claudio's made a very valid point, which made me realize that the current review goals may be a bit unusual. The usual review focus on finding the best suitable interface for the library. However, i don't think such interface exists: * span<void const> is not available, even with a bleeding edge compiler. * support for older C++ versions mandates support for void* + size pointer as separate arguments (no span vocabulary type), which we know will be flagged by compilers as unsafe So, in my opinion, part of the review should also focus on how will the library evolve toward a safer/modern version, with minimal burden for the library user. Basically we know that we will deliver an interface that is outdated by current standards, but that we have very valid reasons to do that (these reasons would not hold if we were targetting std). We should think about of we handle the cohabitation with a safer interface, and how we provide a smooth transition path to users, taking full advantage of available compiler features, so that when they update their code to newer standards and safer patterns, boost::hash2 is not a concern and just works flawlessly. I understand that it is asking a lot to the library authors. Regards, Julien

Andrey Semashev

4:32 p.m.

On 12/6/24 18:54, Julien Blanc via Boost wrote:

...

6 décembre 2024 à 14:24 "Andrey Semashev via Boost" a écrit:

...
I also had compile-time dependencies in mind when I said "lightweight". To take the example from the docs:

https://pdimov.github.io/hash2/doc/html/hash2.html#hashing_objects_user_defi...

class X { private: std::string a; int b;

// ... public: template<class Hash, class Flavor> friend void tag_invoke( boost::hash2::hash_append_tag const&, Hash& h, Flavor const& f, X const& v ) { boost::hash2::hash_append(h, f, v.a); boost::hash2::hash_append(h, f, v.b); } };

This requires one to depend on Boost.Hash2 for hash_append_tag and hash_append. I would prefer if there was a way (properly documented and supported) to avoid this dependency entirely. I.e. so that something like this is a working and advertised way to add support for hashing to your type:

class X { private: std::string a; int b;

// ... public: template<class Hash, class Flavor> friend void hash_append( Hash& h, Flavor const& f, X const& v ) { h.update(f, v.a); h.update(f, v.a.size()); h.update(f, v.b); } };

X x; boost::hash2::fnv1a_64 h; boost::hash2::hash_append(h, {}, x); // works

I think i start to understand what you mean. In what i have in mind :

* hash algorithm implementation and interface do not depend on boost::hash2 (this is the case currently) * custom object hashing does depend on boost::hash2 (this is also the case currently)

From your examples, it seems to me that you would like custom object hashing to also be independant from boost::hash2. Is that accurate?

Yes.

...

...
...
...
and potentially compatible with std::hash2, if one ever comes to light.

I do not see how this is related.

What I meant by this is that in the above example Hash could become std::fnv1a_64 or whatever, if the std library adopts the proposal, and the existing hashing infrastructure will automatically support it. This is one of the main selling points of the library/proposal, after all.

X x; std::fnv1a_64 h; std::hash_append(h, {}, x); // works, with no changes to X

What i had in mind was more:

X x; std::fnv1a_64 h; boost::hash2::hash_append(h, {}, x); // works, with no changes to X

Notice the namespace is still boost::hash2. I expect it to work whatever std::fnv1a_64 finally chose as its interface.

Sure, but why use boost::hash2::hash_append when there's, presumably, std::hash_append, since you've already switched to std::fnv1a_64? But it doesn't really matter. My main point was that the support for hashing in user's type X remains functional and doesn't need updating or duplication. Now, after Peter's comments on the tag argument, I can see this would be difficult - at least the tags would be different between boost::hash2::hash_append and std::hash_append, assuming that both use the tag_invoke protocol. This is unfortunate, but even then just the different tags would be better than entire different APIs, so I still see value in a member-style hash_append API in the h argument (which would probably be a wrapper or a proxy for the underlying hasher). A way to have zero forward declarations would be perfect, though. And I should also mention that I realize that std::hash2 may be incompatible with Boost.Hash2 or even never be standardized, so the whole compatibility argument is rather theoretical at this point.

...

On a broader view, Claudio's made a very valid point, which made me realize that the current review goals may be a bit unusual. The usual review focus on finding the best suitable interface for the library. However, i don't think such interface exists:

* span<void const> is not available, even with a bleeding edge compiler. * support for older C++ versions mandates support for void* + size pointer as separate arguments (no span vocabulary type), which we know will be flagged by compilers as unsafe

So, in my opinion, part of the review should also focus on how will the library evolve toward a safer/modern version, with minimal burden for the library user. Basically we know that we will deliver an interface that is outdated by current standards, but that we have very valid reasons to do that (these reasons would not hold if we were targetting std). We should think about of we handle the cohabitation with a safer interface, and how we provide a smooth transition path to users, taking full advantage of available compiler features, so that when they update their code to newer standards and safer patterns, boost::hash2 is not a concern and just works flawlessly.

Personally, I think the "safety" argument doesn't really work in practice. If you want decent performance, you will likely want to operate on raw pointers and memory in the hasher implementation. In some cases you might want to use SIMD intrinsics or process bytes as system-sized words, for example, which is already throwing type safety out of the window. You'll also not want bounds checking to get in the way in your hot loop. I think, the answer to memory safety won't be the span as the argument type, which will only get in the way in the hasher implementation, but a markup of the pointer+size arguments so that the compiler is able to reason about the buffer size anyway. I think, MSVC and gcc already have attributes of that kind, though I'm not sure what exactly the compilers are able to do with them. Maybe, those attributes need to be standardized instead of pushing span everywhere.

Claudio DeSouza

4:51 p.m.

...

I think, the answer to memory safety won't be the span as the argument type, which will only get in the way in the hasher implementation, but a markup of the pointer+size arguments so that the compiler is able to reason about the buffer size anyway. I think, MSVC and gcc already have attributes of that kind, though I'm not sure what exactly the compilers are able to do with them. Maybe, those attributes need to be standardized instead of pushing span everywhere.

It depends what you mean by this, but span already offers compile time

fixed extants, that can be used with generic algorithms and entirely elide the runtime checks, if you want to support that type of generic construct. Claudio.

Andrey Semashev

4:55 p.m.

On 12/6/24 19:51, Claudio DeSouza via Boost wrote:

...

...
I think, the answer to memory safety won't be the span as the argument type, which will only get in the way in the hasher implementation, but a markup of the pointer+size arguments so that the compiler is able to reason about the buffer size anyway. I think, MSVC and gcc already have attributes of that kind, though I'm not sure what exactly the compilers are able to do with them. Maybe, those attributes need to be standardized instead of pushing span everywhere.

It depends what you mean by this, but span already offers compile time fixed extants, that can be used with generic algorithms and entirely elide the runtime checks, if you want to support that type of generic construct.

I don't think fixed extents are applicable to the hasher interface. And fixed extents are not as useful as the dynamic extent in general, in my experience, as most of the time we deal with variable-sized sequences.

Claudio DeSouza

5 p.m.

I’m clarifying things. Putting safety in quotes and asserting span can’t be used because of simd or fixed sized annotations is just not the case.

Andrey Semashev

5:05 p.m.

On 12/6/24 20:00, Claudio DeSouza via Boost wrote:

...

I’m clarifying things. Putting safety in quotes and asserting span can’t be used because of simd or fixed sized annotations is just not the case.

I'm not saying that span can't be used. I'm saying there are downsides to it and there probably are better alternatives in this case.

Claudio DeSouza

5:09 p.m.

But the examples you presented of downsides are not applicable. If hash2 wants to encode buffer size in the interface then there are tradeoffs, but span can accommodate to those. Claims of span not being compatible with simd or other optimisations are just not the case. So the downsides mentioned pretty much are no applicable as opposed to any alternative. On Fri, 6 Dec 2024 at 17:05, Andrey Semashev via Boost < boost@lists.boost.org> wrote:

...

On 12/6/24 20:00, Claudio DeSouza via Boost wrote:

...
I’m clarifying things. Putting safety in quotes and asserting span can’t be used because of simd or fixed sized annotations is just not the case.

I'm not saying that span can't be used. I'm saying there are downsides to it and there probably are better alternatives in this case.

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Andrey Semashev

5:55 p.m.

On 12/6/24 20:09, Claudio DeSouza via Boost wrote:

...

On Fri, 6 Dec 2024 at 17:05, Andrey Semashev via Boost < boost@lists.boost.org> wrote:

...
On 12/6/24 20:00, Claudio DeSouza via Boost wrote:

...
I’m clarifying things. Putting safety in quotes and asserting span can’t be used because of simd or fixed sized annotations is just not the case.

I'm not saying that span can't be used. I'm saying there are downsides to it and there probably are better alternatives in this case.

But the examples you presented of downsides are not applicable. If hash2 wants to encode buffer size in the interface then there are tradeoffs, but span can accommodate to those. Claims of span not being compatible with simd or other optimisations are just not the case. So the downsides mentioned pretty much are no applicable as opposed to any alternative.

The only way to reliably avoid the potential overhead of span and e.g. load the data from it e.g. into a SIMD register is to get a pointer into the span. And if you want to avoid bounds checked on every iteration, you would have to resort to pointer arithmetics. At this point, what additional safety does span bring?

Peter Dimov

5:12 p.m.

Andrey Semashev wrote:

...

And fixed extents are not as useful as the dynamic extent in general, in my experience, as most of the time we deal with variable-sized sequences.

I also used to think that way, but that was because I didn't understand the purpose of span. The purpose of span is to replace pointer arguments. If your function takes void f1( unsigned char p[] ); you use void f1( span<unsigned char> p ); and if it takes void f2( unsigned char p[4] ); you use void f2( span<unsigned char, 4> p ); The fixed extent span here does two things: first, it inserts a runtime check that the extent of the passed span is >= 4. void g( span<unsigned char> p ) { f2( p ); // implicit assert( p.size() >= 4 ); } Second, it can be used to optimize out runtime checks in operator[]: void f2( span<unsigned char, 4> p ) { uint32_t v = p[0] + (p[1] << 8) + (p[2] << 16) + (p[3] << 24); } Since it's statically known that 0<4, 1<4, 2<4, 3<4, these accesses don't cause any asserts to be inserted.

Christian Mazakas

5:23 p.m.

I've been reading several comments on the list and I think a few different things are being discussed concurrently so I'm hoping to clarify and address them. There are two interfaces: one for implementers, and one for users. A goal for implementers is to not have to require any Hash2 headers for their algorithms to be used by the library. For users, we want safe interfaces that always do the Right Thing. hash_append is a proven interface for working with user-facing structures and solves the subtle and difficult problems it needs to, such as the "ab" | "c and "a" | "bc" case, as we've already discussed. Unfortunately, hash_append isn't really suitable as a safe interface for incremental hashing of byte streams, as we've noted that hash_append will append a hash of the size once it's done. update() works properly for hashing byte streams (as it must) but because it's intended as part of an implementation, it's not exactly suitable for most end-users who just want to hash their 3 GB file. We could even go so far as to say that it is an "unsafe" interface. So, we want safety. Vinnie has already alluded to using a free function hash_update which will accept a span. This is, as far as I can see, the best solution to the problems at hand. With this, we can achieve our goals for both implementers and users because the update interface doesn't have to change and then we can layer all the type safety externally to the free functions in a non-intrusive way. - Christian

Peter Dimov

5:40 p.m.

Christian Mazakas wrote:

...

update() works properly for hashing byte streams (as it must) but because it's intended as part of an implementation, it's not exactly suitable for most end- users who just want to hash their 3 GB file.

It is, though. Here's the actual example: https://pdimov.github.io/hash2/doc/html/hash2.html#example_md5sum static void md5sum( std::FILE* f, char const* fn ) { boost::hash2::md5_128 hash; int const N = 4096; unsigned char buffer[ N ]; for( ;; ) { std::size_t n = std::fread( buffer, 1, N, f ); if( std::ferror( f ) ) { std::fprintf( stderr, "'%s': read error: %s\n", fn, std::strerror( errno ) ); return; } if( n == 0 ) break; hash.update( buffer, n ); } std::string digest = to_string( hash.result() ); std::printf( "%s *%s\n", digest.c_str(), fn ); } Note how the `update` call is perfectly intuitive and suitable for end use. If we try to replace this with a `span`, it's not going to become any better. Why is that? We know that `span` must result in better code? Well it is because std::fread doesn't return a `span`, as doesn't most of the rest of existing code in 2025. If everything used `span`, everyone else should also use it, but we aren't there yet. Nowadays you get a `size_t`, and the existing `update` will happily take it, without forcing you to construct an unnecessary `span` out of it.

Ivan Matek

5:24 p.m.

On Fri, Dec 6, 2024 at 6:12 PM Peter Dimov via Boost <boost@lists.boost.org> wrote:

...

Andrey Semashev wrote:

...
And fixed extents are not as useful as the dynamic extent in general, in my experience, as most of the time we deal with variable-sized sequences.

The purpose of span is to replace pointer arguments. If your function takes

I agree, but I wonder if we can maybe have our cake and eat it too, i.e. no need to pick one or the other, just provide easy default for most users and use no span interface for "low level" API.

I think one important thing is to remember that currently span will be processed as it's bytes + it's size when you call hash_append. If you want it processed as it's bytes we need to call hash_append_range with begin and end iterator. My first attempt of fixing this potential for confusion while addressing concerns about performance would be to provide higher level api for users and keep low level API for authors of algorithms since when it comes to hashing even tiny overhead of using span can be problematic. To steal placeholder name from other discussions we will call these helpers simple_hash_{something}. It would use default flavor and not provide ptr, len interface. But now here we get to the point that as mentioned above sometimes span is not treated as it's bytes. So my intuition would be that we would need to have 2 different helpers addressing 2 distinct use cases: 1. simple_hash_bytes 2. simple_hash_values Naming is not great, but I hope you get the idea. simple_hash_bytes can only process span/vector/array like arguments(contiguous bytes of some range where underlying data has same size as char), it would only append bytes, not size. Does not know how to hash int, std::string, etc... just byte like contiguous ranges. simple_hash_values processes values, e.g. your user defined type for which you implemented tag_invoke, std::string, std::pair, int, etc.... here span is hashed as it's bytes and size. I think this would cover 2 common use cases: 1. simple_hash_bytes - I am getting my data in batches and hashing them , but I want same result, e.g. if my file content is 1234 bytes no matter how I split those 1234 bytes in multiple calls with span I will get same result. 2. simple_hash_values - I am hashing multiple values together, e.g. std::pair<std::span<char>, std::span<char>> and I do *not *want .first and .second to just be concatenated. Now you can obviously construct a usecase where neither of helpers work, e.g. I am getting some data from a file in chunks, then I want to append hash of std::string. But I think most cases are situations where people just hash bytes in chunks, and situations where people hash values(each value in 1 call). Potentially user can wrongly assume that simple_hash_values with span will behave as simple_hash_bytes, but I think documentation can explain that in few sentences pretty clearly.

...

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Andrey Semashev

5:25 p.m.

On 12/6/24 20:12, Peter Dimov via Boost wrote:

...

Andrey Semashev wrote:

...
And fixed extents are not as useful as the dynamic extent in general, in my experience, as most of the time we deal with variable-sized sequences.

I also used to think that way, but that was because I didn't understand the purpose of span.

The purpose of span is to replace pointer arguments. If your function takes

void f1( unsigned char p[] );

you use

void f1( span<unsigned char> p );

and if it takes

void f2( unsigned char p[4] );

you use

void f2( span<unsigned char, 4> p );

The fixed extent span here does two things: first, it inserts a runtime check that the extent of the passed span is >= 4.

void g( span<unsigned char> p ) { f2( p ); // implicit assert( p.size() >= 4 ); }

Second, it can be used to optimize out runtime checks in operator[]:

void f2( span<unsigned char, 4> p ) { uint32_t v = p[0] + (p[1] << 8) + (p[2] << 16) + (p[3] << 24); }

Since it's statically known that 0<4, 1<4, 2<4, 3<4, these accesses don't cause any asserts to be inserted.

Yes, I understand, but the thing is I very rarely have to write `void f2( unsigned char p[4] );` in the first place. Most of the time I get a variable amount of data that I need to process, so I have either an iterator range or a pointer and size. And if there are fixed-sized fragments of that data that I need to process, pretty much always I have checked the entire size (or at least some outer size) of the data beforehand, so no checks needed for those individual fragments. So, for example, if I have to parse an RTP packet, I would void on_rtp_packet(const uint8_t* packet, size_t size) { // RTP fixed header is 12 bytes long if (size < 12) throw std::invalid_argument("RTP packet too short"); // Parse 12-byte fixed header uint16_t seqn = read_be16(packet + 2); uint32_t timestamp = read_be32(packet + 4); } Here, I may accept an iterator_range<const uint8_t*> or span<const uint8_t> as argument in on_rtp_packet, it's not important. The essential part is that read_be* don't need to check the size because the caller already has.

Peter Dimov

5:45 p.m.

Andrey Semashev wrote:

...

Yes, I understand, but the thing is I very rarely have to write `void f2( unsigned char p[4] );` in the first place. Most of the time I get a variable amount of data that I need to process, so I have either an iterator range or a pointer and size. And if there are fixed-sized fragments of that data that I need to process, pretty much always I have checked the entire size (or at least some outer size) of the data beforehand, so no checks needed for those individual fragments.

So, for example, if I have to parse an RTP packet, I would

void on_rtp_packet(const uint8_t* packet, size_t size) { // RTP fixed header is 12 bytes long if (size < 12) throw std::invalid_argument("RTP packet too short");

// Parse 12-byte fixed header uint16_t seqn = read_be16(packet + 2); uint32_t timestamp = read_be32(packet + 4); }

read_be32 takes uint8_t const[4] here, because it has an implicit precondition that the argument has at least 4 valid bytes. So uint16_t read_be16( span<uint8_t const, 2> p ); uint32_t read_be32( span<uint8_t const, 4> p ); void on_rtp_packet( span<uint8_t const> packet ) { // RTP fixed header is 12 bytes long if (packet.size() < 12) throw std::invalid_argument("RTP packet too short"); // Parse 12-byte fixed header uint16_t seqn = read_be16(packet + 2); uint32_t timestamp = read_be32(packet + 4); } Since the optimizer sees that packet.size() >= 12, it can elide the checks against 2 and 4.

Andrey Semashev

5:48 p.m.

On 12/6/24 20:45, Peter Dimov via Boost wrote:

...

Andrey Semashev wrote:

...
Yes, I understand, but the thing is I very rarely have to write `void f2( unsigned char p[4] );` in the first place. Most of the time I get a variable amount of data that I need to process, so I have either an iterator range or a pointer and size. And if there are fixed-sized fragments of that data that I need to process, pretty much always I have checked the entire size (or at least some outer size) of the data beforehand, so no checks needed for those individual fragments.

So, for example, if I have to parse an RTP packet, I would

void on_rtp_packet(const uint8_t* packet, size_t size) { // RTP fixed header is 12 bytes long if (size < 12) throw std::invalid_argument("RTP packet too short");

// Parse 12-byte fixed header uint16_t seqn = read_be16(packet + 2); uint32_t timestamp = read_be32(packet + 4); }

read_be32 takes uint8_t const[4] here, because it has an implicit precondition that the argument has at least 4 valid bytes.

So

uint16_t read_be16( span<uint8_t const, 2> p ); uint32_t read_be32( span<uint8_t const, 4> p );

void on_rtp_packet( span<uint8_t const> packet ) { // RTP fixed header is 12 bytes long if (packet.size() < 12) throw std::invalid_argument("RTP packet too short");

// Parse 12-byte fixed header uint16_t seqn = read_be16(packet + 2); uint32_t timestamp = read_be32(packet + 4); }

Since the optimizer sees that packet.size() >= 12, it can elide the checks against 2 and 4.

I don't trust the optimizer. Because it failed me more than once. If you want your code fast, you better write it fast yourself and not hope that the compiler does a good job for you.

Peter Dimov

5:52 p.m.

Andrey Semashev wrote:

...

I don't trust the optimizer. Because it failed me more than once.

That's not the point. The point is to make life easier for the optimizer, by giving it more information to work with (in the form of the fixed extent).

Andrey Semashev

5:56 p.m.

On 12/6/24 20:52, Peter Dimov via Boost wrote:

...

Andrey Semashev wrote:

...
I don't trust the optimizer. Because it failed me more than once.

That's not the point.

The point is to make life easier for the optimizer, by giving it more information to work with (in the form of the fixed extent).

I've already relieved it from work by not writing the unnecessary checks. :)

Peter Dimov

6:01 p.m.

Andrey Semashev wrote:

...

On 12/6/24 20:52, Peter Dimov via Boost wrote:

...
Andrey Semashev wrote:

...
I don't trust the optimizer. Because it failed me more than once.

That's not the point.

The point is to make life easier for the optimizer, by giving it more information to work with (in the form of the fixed extent).

I've already relieved it from work by not writing the unnecessary checks. :)

Sure, but you also didn't write the necessary checks that would have caught a bug where you accidentally read past the end of the packet due to a bad copy and paste.

Ivan Matek

5:55 p.m.

On Fri, Dec 6, 2024 at 6:48 PM Andrey Semashev via Boost < boost@lists.boost.org> wrote:

...

I don't trust the optimizer. Because it failed me more than once.

If you want your code fast, you better write it fast yourself and not hope that the compiler does a good job for you.

it is also partially an ABI issue on Windows https://youtu.be/PCP3ckEqYK8?feature=shared&t=2203

...

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Peter Dimov

5:30 p.m.

You can see practical manifestations of these examples in e.g https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/md5.hpp where at https://github.com/pdimov/hash2/blob/26b16023dc639ae354e246458719e4b569bc1d9... you have BOOST_CXX14_CONSTEXPR void transform( unsigned char const block[ 64 ] ) with the implicit precondition that `block` points at at least 64 valid bytes, and then at https://github.com/pdimov/hash2/blob/26b16023dc639ae354e246458719e4b569bc1d9... you have for( int i = 0; i < 16; ++i ) { x[ i ] = detail::read32le( block + i * 4 ); } which relies on this implicit precondition to read 16 subspans of size 4. (detail::read32le also has an implicit precondition that its argument points at at least 4 valid bytes.) Replacing these pointer arguments with fixed size spans would theoretically result in all the bounds checks (except one at the top level) being optimized out even if the functions aren't inlined.

...

-----Original Message----- From: Peter Dimov <pdimov@gmail.com> Sent: Friday, 6 December 2024 19:13 To: boost@lists.boost.org Subject: RE: [boost] [hash2][review] Early review (due to holidays)

Andrey Semashev wrote:

...
And fixed extents are not as useful as the dynamic extent in general, in my experience, as most of the time we deal with variable-sized sequences.

I also used to think that way, but that was because I didn't understand the purpose of span.

The purpose of span is to replace pointer arguments. If your function takes

void f1( unsigned char p[] );

you use

void f1( span<unsigned char> p );

and if it takes

void f2( unsigned char p[4] );

you use

void f2( span<unsigned char, 4> p );

The fixed extent span here does two things: first, it inserts a runtime check that the extent of the passed span is >= 4.

void g( span<unsigned char> p ) { f2( p ); // implicit assert( p.size() >= 4 ); }

Second, it can be used to optimize out runtime checks in operator[]:

void f2( span<unsigned char, 4> p ) { uint32_t v = p[0] + (p[1] << 8) + (p[2] << 16) + (p[3] << 24); }

Since it's statically known that 0<4, 1<4, 2<4, 3<4, these accesses don't cause any asserts to be inserted.

Andrey Semashev

5:42 p.m.

On 12/6/24 20:30, Peter Dimov via Boost wrote:

...

You can see practical manifestations of these examples in e.g

https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/md5.hpp

where at

https://github.com/pdimov/hash2/blob/26b16023dc639ae354e246458719e4b569bc1d9...

you have

BOOST_CXX14_CONSTEXPR void transform( unsigned char const block[ 64 ] )

with the implicit precondition that `block` points at at least 64 valid bytes, and then at

https://github.com/pdimov/hash2/blob/26b16023dc639ae354e246458719e4b569bc1d9...

you have

for( int i = 0; i < 16; ++i ) { x[ i ] = detail::read32le( block + i * 4 ); }

which relies on this implicit precondition to read 16 subspans of size 4.

Assuming you had code like this: void transform( span<unsigned char const, 64> block ) { for( int i = 0; i < 16; ++i ) { x[ i ] = detail::read32le( block.subspan(i * 4, 4) ); } } I would presume, the subspan call would have a runtime check for the offset and size. Then the converting constructor from span<const unsigned char> returned from subspan to span<const unsigned char, 4> accepted by read32le would have to check the size again. Yes, the compiler could optimize those checks away, if it's smart enough. Or it could not, e.g. if optimizations are disabled or it fails to propagate constants for some reason. But none of these checks are actually necessary, except maybe one at the entry, to enforce the size of the entire block. Maybe not even that, if the callers of transform are guaranteed to always pass the whole blocks.

Peter Dimov

5:49 p.m.

Andrey Semashev wrote:

...

Assuming you had code like this:

void transform( span<unsigned char const, 64> block ) { for( int i = 0; i < 16; ++i ) { x[ i ] = detail::read32le( block.subspan(i * 4, 4) ); } }

I would presume, the subspan call would have a runtime check for the offset and size. Then the converting constructor from span<const unsigned char> returned from subspan to span<const unsigned char, 4> accepted by read32le would have to check the size again. Yes, the compiler could optimize those checks away, if it's smart enough.

It could, yes. Giving the compiler the opportunity to elide the checks is the point of the fixed extent span. If you use dynamic extent spans everywhere, the compiler can't optimize out the checks using only local knowledge, it will need to rely on inlining and have to work quite a bit harder.

Claudio DeSouza

6:46 p.m.

detail::read32le( block.subspan(i * 4).first<4>()); So you also get a sized span, and avoid checks. On Fri, 6 Dec 2024 at 17:42, Andrey Semashev via Boost < boost@lists.boost.org> wrote:

...

On 12/6/24 20:30, Peter Dimov via Boost wrote:

...
You can see practical manifestations of these examples in e.g

https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/md5.hpp

where at

https://github.com/pdimov/hash2/blob/26b16023dc639ae354e246458719e4b569bc1d9...

...
you have

BOOST_CXX14_CONSTEXPR void transform( unsigned char const block[ 64

] )

...
with the implicit precondition that `block` points at at least 64 valid

bytes,

...
and then at

https://github.com/pdimov/hash2/blob/26b16023dc639ae354e246458719e4b569bc1d9...

...
you have

for( int i = 0; i < 16; ++i ) { x[ i ] = detail::read32le( block + i * 4 ); }

which relies on this implicit precondition to read 16 subspans of size 4.

Assuming you had code like this:

void transform( span<unsigned char const, 64> block ) { for( int i = 0; i < 16; ++i ) { x[ i ] = detail::read32le( block.subspan(i * 4, 4) ); } }

I would presume, the subspan call would have a runtime check for the offset and size. Then the converting constructor from span<const unsigned char> returned from subspan to span<const unsigned char, 4> accepted by read32le would have to check the size again. Yes, the compiler could optimize those checks away, if it's smart enough. Or it could not, e.g. if optimizations are disabled or it fails to propagate constants for some reason. But none of these checks are actually necessary, except maybe one at the entry, to enforce the size of the entire block. Maybe not even that, if the callers of transform are guaranteed to always pass the whole blocks.

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Andrey Semashev

6:55 p.m.

On 12/6/24 21:46, Claudio DeSouza via Boost wrote:

...

detail::read32le( block.subspan(i * 4).first<4>());

So you also get a sized span, and avoid checks.

I think, first() would need to check the dynamic span size anyway. PS: Please, don't top-post.

...

On Fri, 6 Dec 2024 at 17:42, Andrey Semashev via Boost < boost@lists.boost.org> wrote:

...
On 12/6/24 20:30, Peter Dimov via Boost wrote:

...
You can see practical manifestations of these examples in e.g

https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/md5.hpp

where at

https://github.com/pdimov/hash2/blob/26b16023dc639ae354e246458719e4b569bc1d9...

...
you have

BOOST_CXX14_CONSTEXPR void transform( unsigned char const block[ 64

] )

...
with the implicit precondition that `block` points at at least 64 valid

bytes,

...
and then at

https://github.com/pdimov/hash2/blob/26b16023dc639ae354e246458719e4b569bc1d9...

...
you have

for( int i = 0; i < 16; ++i ) { x[ i ] = detail::read32le( block + i * 4 ); }

which relies on this implicit precondition to read 16 subspans of size 4.

Assuming you had code like this:

void transform( span<unsigned char const, 64> block ) { for( int i = 0; i < 16; ++i ) { x[ i ] = detail::read32le( block.subspan(i * 4, 4) ); } }

I would presume, the subspan call would have a runtime check for the offset and size. Then the converting constructor from span<const unsigned char> returned from subspan to span<const unsigned char, 4> accepted by read32le would have to check the size again. Yes, the compiler could optimize those checks away, if it's smart enough. Or it could not, e.g. if optimizations are disabled or it fails to propagate constants for some reason. But none of these checks are actually necessary, except maybe one at the entry, to enforce the size of the entire block. Maybe not even that, if the callers of transform are guaranteed to always pass the whole blocks.

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Claudio DeSouza

7:03 p.m.

The compiler can still see the max value being passed in and your example block has a fixed extent. By using first<> you get another span of fixed extent. This loop in general would benefit from being changed to something that leverages the fixed extent even more. There are different approaches, but ideally you want to iterate in a stride, which relies on the fixed extent and always returned subspans of fixed extents each step, so you dispense all the arithmetic access and subspan calls. On Fri, 6 Dec 2024 at 18:56, Andrey Semashev via Boost < boost@lists.boost.org> wrote:

...

On 12/6/24 21:46, Claudio DeSouza via Boost wrote:

...
detail::read32le( block.subspan(i * 4).first<4>());

So you also get a sized span, and avoid checks.

I think, first() would need to check the dynamic span size anyway.

PS: Please, don't top-post.

...
...
Assuming you had code like this:

void transform( span<unsigned char const, 64> block ) { for( int i = 0; i < 16; ++i ) { x[ i ] = detail::read32le( block.subspan(i * 4, 4) ); } }

I would presume, the subspan call would have a runtime check for the offset and size. Then the converting constructor from span<const unsigned char> returned from subspan to span<const unsigned char, 4> accepted by read32le would have to check the size again. Yes, the compiler could optimize those checks away, if it's smart enough. Or it could not, e.g. if optimizations are disabled or it fails to propagate constants for some reason. But none of these checks are actually necessary, except maybe one at the entry, to enforce the size of the entire block. Maybe not even that, if the callers of transform are guaranteed to always pass the whole blocks.

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

...
On Fri, 6 Dec 2024 at 17:42, Andrey Semashev via Boost < boost@lists.boost.org> wrote:

...
On 12/6/24 20:30, Peter Dimov via Boost wrote:

...
You can see practical manifestations of these examples in e.g

https://github.com/pdimov/hash2/blob/develop/include/boost/hash2/md5.hpp

...
where at

https://github.com/pdimov/hash2/blob/26b16023dc639ae354e246458719e4b569bc1d9...

...
...
you have

BOOST_CXX14_CONSTEXPR void transform( unsigned char const block[ 64

] )

...
with the implicit precondition that `block` points at at least 64 valid

bytes,

...
and then at

https://github.com/pdimov/hash2/blob/26b16023dc639ae354e246458719e4b569bc1d9...

...
...
you have

for( int i = 0; i < 16; ++i ) { x[ i ] = detail::read32le( block + i * 4 ); }

which relies on this implicit precondition to read 16 subspans of size

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Peter Dimov

7:07 p.m.

Claudio DeSouza wrote:

...

The compiler can still see the max value being passed in and your example block has a fixed extent. By using first<> you get another span of fixed extent.

block has a fixed extent, but block.subspan(i * 4) does not. ...

...

...
On 12/6/24 21:46, Claudio DeSouza via Boost wrote:

...
detail::read32le( block.subspan(i * 4).first<4>());

So you also get a sized span, and avoid checks.

Claudio DeSouza

8:35 p.m.

Correct, and that's why you `first<N>()` after, however the question is how effective the compiler is at eliminating this temporary in between. Claudio. On Fri, Dec 6, 2024 at 7:07 PM Peter Dimov via Boost <boost@lists.boost.org> wrote:

...

Claudio DeSouza wrote:

...
The compiler can still see the max value being passed in and your example block has a fixed extent. By using first<> you get another span of fixed extent.

block has a fixed extent, but block.subspan(i * 4) does not.

...

...
...
On 12/6/24 21:46, Claudio DeSouza via Boost wrote:

...
detail::read32le( block.subspan(i * 4).first<4>());

So you also get a sized span, and avoid checks.

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Claudio DeSouza

8:46 p.m.

Additionally, you are also allowed to subspan without a dynamic extent, like in: s.subspan<2u, 4u>() My point is that it all depends on what you are doing and what is possible in that context.

Andrey Semashev

9:38 p.m.

On 12/6/24 23:46, Claudio DeSouza via Boost wrote:

...

Additionally, you are also allowed to subspan without a dynamic extent, like in:

s.subspan<2u, 4u>()

Unless this is a new addition, I don't see such a member in std::span.

Vinnie Falco

9:42 p.m.

On Fri, Dec 6, 2024 at 1:38 PM Andrey Semashev via Boost < boost@lists.boost.org> wrote:

...

Unless this is a new addition, I don't see such a member in std::span.

https://en.cppreference.com/w/cpp/container/span/subspan Thanks

Andrey Semashev

9:47 p.m.

On 12/7/24 00:42, Vinnie Falco wrote:

...

On Fri, Dec 6, 2024 at 1:38 PM Andrey Semashev via Boost <boost@lists.boost.org <mailto:boost@lists.boost.org>> wrote:

Unless this is a new addition, I don't see such a member in std::span.

https://en.cppreference.com/w/cpp/container/span/subspan <https:// en.cppreference.com/w/cpp/container/span/subspan>

Ah, yes, I missed this, thanks. Anyway, it doesn't work as the offset is a runtime value in this case.

Claudio DeSouza

9:51 p.m.

So to go back to my point: If you really want to avoid checks, and you do know the size beforehand, you can entirely avoid it by harnessing this type of compile-time information and leveraging the compiler in your favour. It all depends how dedicated you are to make sure you get that. Granted, with dynamic extent the number of checks will depend on several factors, including implementation, and hardening settings the user is building with. It is win after win with span. On Fri, Dec 6, 2024 at 9:47 PM Andrey Semashev via Boost < boost@lists.boost.org> wrote:

...

On 12/7/24 00:42, Vinnie Falco wrote:

...
On Fri, Dec 6, 2024 at 1:38 PM Andrey Semashev via Boost <boost@lists.boost.org <mailto:boost@lists.boost.org>> wrote:

Unless this is a new addition, I don't see such a member in std::span.

https://en.cppreference.com/w/cpp/container/span/subspan <https:// en.cppreference.com/w/cpp/container/span/subspan>

Ah, yes, I missed this, thanks.

Anyway, it doesn't work as the offset is a runtime value in this case.

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Andrey Semashev

10:09 p.m.

On 12/7/24 00:51, Claudio DeSouza via Boost wrote:

...

On Fri, Dec 6, 2024 at 9:47 PM Andrey Semashev via Boost < boost@lists.boost.org> wrote:

...
On 12/7/24 00:42, Vinnie Falco wrote:

...
On Fri, Dec 6, 2024 at 1:38 PM Andrey Semashev via Boost <boost@lists.boost.org <mailto:boost@lists.boost.org>> wrote:

Unless this is a new addition, I don't see such a member in std::span.

https://en.cppreference.com/w/cpp/container/span/subspan <https:// en.cppreference.com/w/cpp/container/span/subspan>

Ah, yes, I missed this, thanks.

Anyway, it doesn't work as the offset is a runtime value in this case.

So to go back to my point: If you really want to avoid checks, and you do know the size beforehand, you can entirely avoid it by harnessing this

type

...

of compile-time information and leveraging the compiler in your favour. It all depends how dedicated you are to make sure you get that. Granted, with dynamic extent the number of checks will depend on several factors, including implementation, and hardening settings the user is building with.

It is win after win with span.

To convert all the checks that are done in runtime with the code I presented earlier you would have to convert that loop to a series of statements such as this: x[ 0 ] = detail::read32le( block.subspan<0, 4>() ); x[ 1 ] = detail::read32le( block.subspan<4, 4>() ); x[ 3 ] = detail::read32le( block.subspan<8, 4>() ); x[ 4 ] = detail::read32le( block.subspan<12, 4>() ); // ... x[ 15 ] = detail::read32le( block.subspan<60, 4>() ); Or perform some kind of tricks with variadic templates and fold expressions. At least, until `template for` (P1306) is accepted. So, I suppose, it is doable in this limited case. But not something I would willingly do, given that I can achieve the same result with a plain loop and pointers. There's your example of span getting in the way.

Claudio DeSouza

10:15 p.m.

Yes, that's the naive way to implement it. Like I said, depending how dedicated you are, in modern C++ you can pretty much just fold something like that in an elegant way. Also, you need to understand that "just using a pointer" doesn't necessarily mean no checks either in many cases. I think the main issue here is that with span we can eliminate all the arithmetic around subscript access, which makes code more readable, and harder to get wrong, while delivering the same result, and with tools that allow us to provide the compiler with more information about what we are doing. So the overall result is something more robust than "just a pointer", and the safety becomes just a natural conclusion of this. Claudio.

Peter Dimov

7 Dec 7 Dec

12:35 a.m.

Andrey Semashev wrote:

...

To convert all the checks that are done in runtime with the code I presented earlier you would have to convert that loop to a series of statements such as this:

x[ 0 ] = detail::read32le( block.subspan<0, 4>() ); x[ 1 ] = detail::read32le( block.subspan<4, 4>() ); x[ 3 ] = detail::read32le( block.subspan<8, 4>() ); x[ 4 ] = detail::read32le( block.subspan<12, 4>() ); // ... x[ 15 ] = detail::read32le( block.subspan<60, 4>() );

Or perform some kind of tricks with variadic templates and fold expressions.

You can write this with mp_for_each, but it's not necessary because compilers figure out the for loop easily: https://godbolt.org/z/d861cPf1K

Claudio DeSouza

3:17 a.m.

Common span win. I knew that was the case, but I was replying from the perspective of the argument "I don't trust optimisers". In practice, yes, compilers are much different these days, and the more information can be inferred about any given operation, the better the chances of optimisations to be very effective in shedding dead weight. There's a whole debate around bound checks because of that, and the trend is that these types of optimisations will continue to improve going forward, as the demand for bound checks elision increases. Claudio.

Peter Dimov

6 Dec 6 Dec

7:04 p.m.

Claudio DeSouza wrote:

...

detail::read32le( block.subspan(i * 4).first<4>());

So you also get a sized span, and avoid checks.

It doesn't matter, you get the check either way; it's just in first<> instead of the constructor.

Vinnie Falco

6:06 p.m.

On Fri, Dec 6, 2024 at 9:12 AM Peter Dimov via Boost <boost@lists.boost.org> wrote:

...

The purpose of span is to replace pointer arguments. If your function takes ... void f2( unsigned char p[4] ); you use void f2( span<unsigned char, 4> p );

This is quite informative, thank you. For my entire career I have avoided passing C-style arrays because of ignorance of how they work. Is it a copy? Is it just a pointer? Does the compiler know the extent? And so on. I don't like the look of function signatures which accept language arrays, because they do not quite work the same way as other types. The fixed extent span here does two things: first, it inserts a runtime

...

check that the extent of the passed span is >= 4. ... Since it's statically known that 0<4, 1<4, 2<4, 3<4, these accesses don't cause any asserts to be inserted.

This is a strong rationale for using a span-like type in the signatures for those implementation functions. Thanks

Ivan Matek

6:16 p.m.

On Fri, Dec 6, 2024 at 7:06 PM Vinnie Falco via Boost <boost@lists.boost.org> wrote:

...

This is quite informative, thank you. For my entire career I have avoided passing C-style arrays because of ignorance of how they work. Is it a copy? Is it just a pointer? Does the compiler know the extent? And so on. I don't like the look of function signatures which accept language arrays, because they do not quite work the same way as other types.

I had the same "issue" until I remembered a simple C++ program that shows how compiler sees this functions.

void f(int arr[3]) { } void f(int arr[4]) { } compiler will complain: <source>:5:6: error: redefinition of 'void f(int*)' 5 | void f(int arr[4]) { | <source>:1:6: note: 'void f(int*)' previously defined here 1 | void f(int arr[3]) { So I also in general prefer fixed extent span, but we must remember that that can lead to code size bloat, and that is usually undetectable in micro benchmarks, but can slow down large programs.

Vinnie Falco

6:18 p.m.

On Fri, Dec 6, 2024 at 10:16 AM Ivan Matek <libbooze@gmail.com> wrote:

...

...that can lead to code size bloat

How? Thanks

Ivan Matek

6:48 p.m.

On Fri, Dec 6, 2024 at 7:19 PM Vinnie Falco <vinnie.falco@gmail.com> wrote:

...

...
How?

maybe we are not talking about same situation, but this is what I meant,

godbolt <https://godbolt.org/z/K8xEjEoMK> link [image: image.png] Compiler will generate "specialized" functions for each different size of span with non dynamic extent. Here you can see how he implemented summation for 3 and 4 integers in different ways. This is great for performance and makes checking easier since as Peter explained compiler knows more, but it creates larger binaries(generally speaking, I know compilers are smart, can inline, for 2 instantiations does not really matter, etc). Function taking dynamic span or runtime specified n will probably be slower because it does normal loop, but there is only one copy of it in resulting assembly. There is a real life example of this here in fmt. note link gives certificate error and I did not manage to find another link https://vitaut.net/posts/2020/reducing-library-size/

Ivan Matek

6:52 p.m.

Hi Vinnie, did not know pasting image will get me sent straight to moderator approval due to message size, apologies, here is my message with image removed. On Fri, Dec 6, 2024 at 7:48 PM Ivan Matek <libbooze@gmail.com> wrote:

...

On Fri, Dec 6, 2024 at 7:19 PM Vinnie Falco <vinnie.falco@gmail.com> wrote:

...
...
How?

maybe we are not talking about same situation, but this is what I meant,

godbolt <https://godbolt.org/z/K8xEjEoMK> link

If you look at nt f_span_static<3ul>(std::__1::span<int, 3ul>): mov eax,DWORD PTR [rdi+0x4] add eax,DWORD PTR [rdi] add eax,DWORD PTR [rdi+0x8] ret nop DWORD PTR [rax+0x0] int f_span_static<4ul>(std::__1::span<int, 4ul>): movdqu xmm0,XMMWORD PTR [rdi] pshufd xmm1,xmm0,0xee paddd xmm1,xmm0 pshufd xmm0,xmm1,0x55 paddd xmm0,xmm1 movd eax,xmm0 ret

...

you will see that compiler will generate "specialized" functions for each different size of span with non dynamic extent. Here you can see how he implemented summation for 3 and 4 integers in different ways.

This is great for performance and makes checking easier since as Peter explained compiler knows more, but it creates larger binaries(generally speaking, I know compilers are smart, can inline, for 2 instantiations does not really matter, etc). Function taking dynamic span or runtime specified n will probably be slower because it does normal loop, but there is only one copy of it in resulting assembly.

There is a real life example of this here in fmt. note link gives certificate error and I did not manage to find another link https://vitaut.net/posts/2020/reducing-library-size/

Claudio DeSouza

11:14 p.m.

Yes, this is the challenge with optimisations, caching, etc, is: in many cases you have to make a trade-off between size/performance. I think it is also important to mention that if you ultimately want to tag a block as unsafe buffers usage and just do pointer arithmetics on top of the memory you received through a span, you are free to do that. That's your prerogative. That's an aspect of the abstract machine you now have a much finer control over because you spanified your interfaces. The other way around is not true though. You can opt-in to spanify your interfaces and retain the ability to deep dive in a lower level type of code, and use pointers as you see fit, However, you don't benefit from higher abstractions once all you have is just offer pointer/length pairs. I think Peter has made this case in a much more educated manner that I have. Claudio. On Fri, Dec 6, 2024 at 11:06 PM Ivan Matek via Boost <boost@lists.boost.org> wrote:

...

On Fri, Dec 6, 2024 at 7:19 PM Vinnie Falco <vinnie.falco@gmail.com> wrote:

...
...
How?

maybe we are not talking about same situation, but this is what I meant,

godbolt <https://godbolt.org/z/K8xEjEoMK> link [image: image.png] Compiler will generate "specialized" functions for each different size of span with non dynamic extent. Here you can see how he implemented summation for 3 and 4 integers in different ways.

This is great for performance and makes checking easier since as Peter explained compiler knows more, but it creates larger binaries(generally speaking, I know compilers are smart, can inline, for 2 instantiations does not really matter, etc). Function taking dynamic span or runtime specified n will probably be slower because it does normal loop, but there is only one copy of it in resulting assembly.

There is a real life example of this here in fmt. note link gives certificate error and I did not manage to find another link https://vitaut.net/posts/2020/reducing-library-size/

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Peter Dimov

4:55 p.m.

Andrey Semashev wrote:

...

I think, the answer to memory safety won't be the span as the argument type, which will only get in the way in the hasher implementation, but a markup of the pointer+size arguments so that the compiler is able to reason about the buffer size anyway. I think, MSVC and gcc already have attributes of that kind, though I'm not sure what exactly the compilers are able to do with them. Maybe, those attributes need to be standardized instead of pushing span everywhere.

Microsoft has tried these sorts of annotations pretty extensively, and they have moved towards requiring span. (In C they still need to rely on the annotations, of course, but in C++ code, they now prefer span.) std::span has actually been proposed by Microsoft, on the basis of that experience.

Rainer Deyke

7 Dec 7 Dec

1:22 p.m.

On 06.12.24 17:32, Andrey Semashev via Boost wrote:

...

On 12/6/24 18:54, Julien Blanc via Boost wrote:

...

...
X x; std::fnv1a_64 h; boost::hash2::hash_append(h, {}, x); // works, with no changes to X

Notice the namespace is still boost::hash2. I expect it to work whatever std::fnv1a_64 finally chose as its interface.

Sure, but why use boost::hash2::hash_append when there's, presumably, std::hash_append, since you've already switched to std::fnv1a_64?

Because in generic code you wouldn't know if your hash algorithm is supplied by std or boost::hash2. -- Rainer Deyke - rainerd@eldwood.com

Alexander Grund

6 Dec 6 Dec

11:56 a.m.

...

If i understand correctly:

std::span<char> v("123", 3)

hash_append(h, v); // hash "123" + the size 3, as v is a container h.update(v); // hash only "123".

Or did i miss something?

I think this is actually an important question: How to (conveniently) hash only a range? I.e. with the library, not the hasher. As in the example there might be use cases to calculate e.g. the checksum of a file using the hash algorithm as a template. For this the size of the data/file must not be included in the hash. I've seen `hash_append_range` which could be used but needs to get the begin/end manually. I think a 3 parameter form `hash_append_range(Hash&, Flavor&, Range)` would be a good addition to support the above use case without having to call the hasher directly. I.e. to provide a similar interface/abstraction to `hash_append` Maybe name it `hash_append_range_without_size` to discourage people from using it without knowing the caveats (multiple ranges, empty ranges, ...)

Peter Dimov

12:09 p.m.

Alexander Grund wrote:

...

As far as I understood this is exactly how the library is implemented: It is not `boost::hash2::update( h, span )` but `boost::hash2::hash_append( h, {}, span )` but thats minor.

Not quite, because hash_append would add the size in this case. The equivalent is hash_append_range, but it's old school and takes two iterators. Maybe I need to add an overload of hash_append_range that actually takes a range. Incidentally, hash_append_range is what you already have to use if you invoke the hash algorithm at compile time on e.g. char[], because void* and reinterpret_cast don't work in constexpr. See the second example in https://pdimov.github.io/hash2/doc/html/hash2.html#example_compile_time_hash... Calling either `update` won't work in this case.

Vinnie Falco

3:32 p.m.

On Fri, Dec 6, 2024 at 12:31 AM Alexander Grund via Boost < boost@lists.boost.org> wrote:

...

As far as I understood this is exactly how the library is implemented:

Your understanding is incorrect.

Vinnie Falco

3:44 p.m.

On Thu, Dec 5, 2024 at 12:06 PM Vinnie Falco <vinnie.falco@gmail.com> wrote:

...

...Instead of writing:

h.update( span );

You instead write:

boost::hash2::update( h, span );

Anyway it looks like this wasn't clear so I guess I'll try again and be even more explicit. Currently, the hash algorithm provides the following member function for sending a contiguous range of binary data: void update( void const*, std::size_t ); The combination of pointer and size passed as individual arguments is triggering to certain folks, who instead prefer the following, or some variation of it (for example, using uint8_t instead of byte) void update( std::span< std::byte const > ); This is disfavored by the library author (and myself) for reasons already posted to the mailing list which I will not repeat. Yet an argument can be made that the use of a span could be favorable to tooling which performs buffer safety analysis. Thus I propose to add the following additional free function. Doing so allows authors of hash functions to call the free function with a span instead of the update member function which accepts a pointer and size, without burdening the authors of hash algorithms: namespace boost::hash2 { template< typename Hasher > void update( Hasher& h, std::span< std::byte const > s ) { h.update( s.data(), s.size() ); } } I have to stress that this is not the same as the hash_append function which by coincidence has an identical signature (and different implementation). This is the alternative to: 1. changing the signature of update in the Hasher concept, or 2. adding an overload of update to the Hasher concept (both of these are disfavored for reasons already given). Thanks

Julien Blanc

5 Dec 5 Dec

4:11 p.m.

5 décembre 2024 à 12:52 "Peter Dimov via Boost" a écrit:

...

I can obviously support any form of `update` by making it templated and do the appropriate enable_if incantations, but I don't want to impose this requirement on hash algorithm authors.

Can't it be fully done inside hash_append instead, without requiring the hash update itself being template? In c++20, it's pretty trivial to invoke a callable object with the correct set of arguments (see https://godbolt.org/z/KhP5bcM3d for the kind of things i'm thinking of). Of course it makes the Hash concept pretty hard to define, as it's kinda polymorphic… I don't remember how doable it is with C++11, just pretty sure that it will be a lot more painful. Regards, Julien

Claudio DeSouza

4 Dec 4 Dec

1:35 p.m.

There has been some discussion about this and the consensus is that `uint8_t` is a better option than `std::byte` as the former represents a determinate number of bytes ( https://isocpp.org/wiki/faq/intrinsic-types#bits-per-byte). So `span<const uint8_t>, and `span<uint8_t>` for immutable and mutable bytes respectively. On Wed, Dec 4, 2024 at 1:03 PM Dominique Devienne <ddevienne@gmail.com> wrote:

...

On Wed, Dec 4, 2024 at 2:09 AM Claudio DeSouza via Boost <boost@lists.boost.org> wrote:

...
function that takes a pair of void*/size_t should be a span<uint_8>

or rather `std::span<[const ]std::byte>`. --DD

https://en.cppreference.com/w/cpp/types/byte https://en.cppreference.com/w/cpp/container/span/as_bytes

Ivan Matek

2:05 p.m.

On Wed, Dec 4, 2024 at 2:36 PM Claudio DeSouza via Boost < boost@lists.boost.org> wrote:

...

There has been some discussion about this and the consensus is that `uint8_t` is a better option than `std::byte` as the former represents a determinate number of bytes ( https://isocpp.org/wiki/faq/intrinsic-types#bits-per-byte). So `span<const uint8_t>, and `span<uint8_t>` for immutable and mutable bytes respectively.

I presume think static_assert for CHAR_BIT value is enough, I am not aware of anybody using systems where this is not true, despite what faq says... but I must admit I am not familiar with all weird archs so I could be wrong

Peter Dimov

2:13 p.m.

Ivan Matek wrote:

...

On Wed, Dec 4, 2024 at 2:36 PM Claudio DeSouza via Boost < boost@lists.boost.org> wrote:

...
There has been some discussion about this and the consensus is that `uint8_t` is a better option than `std::byte` as the former represents a determinate number of bytes ( https://isocpp.org/wiki/faq/intrinsic-types#bits-per-byte). So `span<const uint8_t>, and `span<uint8_t>` for immutable and mutable bytes respectively.

I presume think static_assert for CHAR_BIT value is enough, I am not aware of anybody using systems where this is not true, despite what faq says... but I must admit I am not familiar with all weird archs so I could be wrong

Using uint8_t is equivalent to asserting that CHAR_BIT is 8. In practice, uint8_t, when defined, is always unsigned char, except when it's `char` on platforms that strive to make your life interesting. But that's non-conforming. There's an attempt to make CHAR_BIT == 8 required http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3477r0.html but it might be slightly ahead of its time because a few holdover platforms with CHAR_BIT == 32 still remain in use. (On these, uint8_t is a compile time error.)

218

Age (days ago)

221

Last active (days ago)

List overview

Download

107 comments

13 participants

participants (13)

Alexander Grund
Andrey Semashev
Christian Mazakas
Claudio DeSouza
Dominique Devienne
Ivan Matek
Joaquín M López Muñoz
Julien Blanc
Matt Borland
Peter Dimov
Rainer Deyke
Vinnie Falco
Zach Laine