[review][fixed_string] End of review period approaching and a plea

Joaquin M López Muñoz

3 Dec 2019

8:10 p.m.

The review period for candidate Boost.FixedString finishes tomorrow, Dec 4.

There's an ongoing lively discussion on many aspects of the library, which is very good, but only three of you have cast your acceptance/rejection vote, which is not good.

This is a gentle petition to the people engaging in the review who haven't voted yet:

Vinícius, Nevin, Mike, Andrzej, JeanHeyd, Hans, Peter, Julien, Andrey, Tim, Gavin, Emil,

(and anyone who might be still lurking) to please submit a review with an explicit vote. If you feel like you need some more time to do the work, I think we are in a position to extend the review period, but please express your intentions before I ask the review wizards.

Thank you!

Joaquín M López Muñoz
Review manager for candidate Boost.FixedString


Krystian Stasiowski

4 Dec

4:34 a.m.

Hi all,

Thank you all for participating in the review so far. I'd like to outline some of the changes that have been implemented thus far, and others that are pending because there is no clear consensus. The live branch can be found at https://github.com/18/fixed_string/tree/review-patch

- Empty class specialization for N = 0: Implemented
- Use of smallest possible type to store size: Implemented
- More noexcept: Mostly implemented; I still need to address the exception specification for template overloads intended for `string_view`
- Hash support: Implemented for both std::hash and boost::hash
- Change `substr` to return a `fixed_string` and add a `subview` function: Implemented
- Fixed the user configuration macro and changed its name to BOOST_FIXED_STRING_STANDALONE
- Added mandatory meta folder

A few items on the list of changes remain:

- `constexpr` all the things! Only C++20 allows members to be left uninitialized by a constexpr constructor, so use in previous revisions of the standard would require initialization of each element, which could impose an unacceptable performance penalty
- We still have not settled on a name that is concise and descriptive
- No consensus has been reached on how operator+ should be implemented
- Documentation needs to be updated

Thank you,
Krystian Stasiowski

On Tue, Dec 3, 2019 at 3:11 PM Joaquin M López Muñoz via Boost <boost@lists.boost.org> wrote:

...

The review period for candidate Boost.FixedString finishes tomorrow, Dec 4.

There's an ongoing lively discussion on many aspects of the library, which is very good, but only three of you have cast your acceptance/rejection vote, which is not good.

This is a gentle petition to the people engaging in the review who haven't voted yet:

Vinícius, Nevin, Mike, Andrzej, JeanHeyd, Hans, Peter, Julien, Andrey, Tim, Gavin, Emil,

(and anyone who might be still lurking) to please submit a review with an explicit vote. If you feel like you need some more time to do the work, I think we are in a position to extend the review period, but please express your intentions before I ask the review wizards.

Thank you!

Joaquín M López Muñoz Review manager for candidate Boost.FixedString

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
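Krystian's list mentions storing the size in the smallest possible type. A minimal sketch of how such a type could be chosen at compile time with nested std::conditional follows; the alias name smallest_width_t is a hypothetical illustration (Peter uses the same name informally later in the thread) and is not taken from the library under review.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical alias (not the library's code): the narrowest unsigned
// integer type that can represent every size in [0, N].
template <std::size_t N>
using smallest_width_t = typename std::conditional<
    (N <= std::numeric_limits<std::uint8_t>::max()), std::uint8_t,
    typename std::conditional<
        (N <= std::numeric_limits<std::uint16_t>::max()), std::uint16_t,
        typename std::conditional<
            (N <= std::numeric_limits<std::uint32_t>::max()), std::uint32_t,
            std::uint64_t>::type>::type>::type;
```

With this alias, a string of capacity 100 would spend one byte on its size and a string of capacity 1000 would spend two.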

Bjørn Roald

7:35 a.m.

...

On 4 Dec 2019, at 05:34, Krystian Stasiowski via Boost <boost@lists.boost.org> wrote:

- We still have not settled on a name that is concise and descriptive

Maybe the name should emphasize storage placement rather than the fixedness of the capacity: inplace_string

bjørn

Peter Dimov

7:41 a.m.

Krystian Stasiowski wrote:

...

- Empty class specialization for N = 0: Implemented

I don't like this change. A special case for close to zero benefit that changes the semantics of data() to not be unique per instance. Storing the size (as capacity - size) in the last char for N < 256 will have more impact, but I'm not sure that it too is worth the added complexity.

degski

1:42 p.m.

On Wed, 4 Dec 2019 at 01:41, Peter Dimov via Boost <boost@lists.boost.org> wrote:

...

Krystian Stasiowski wrote:

...
- Empty class specialization for N = 0: Implemented

I don't like this change. A special case for close to zero benefit that changes the semantics of data() to not be unique per instance.

+1.

Storing the size (as capacity - size) in the last char for N < 256 will have more impact, but I'm not sure that it too is worth the added complexity.



Vinnie Falco

2:03 p.m.

On Tue, Dec 3, 2019 at 11:42 PM Peter Dimov via Boost <boost@lists.boost.org> wrote:

...

I don't like this change. A special case for close to zero benefit that changes the semantics of data() to not be unique per instance.

I don't feel strongly about it either way, but since when is data() guaranteed to be unique per instance? Thanks

Peter Dimov

4:01 p.m.

Vinnie Falco wrote:

...

On Tue, Dec 3, 2019 at 11:42 PM Peter Dimov via Boost <boost@lists.boost.org> wrote:

...
I don't like this change. A special case for close to zero benefit that changes the semantics of data() to not be unique per instance.

I don't feel strongly about it either way, but since when is data() guaranteed to be unique per instance?

This is a property of the implementation that is no longer true after the change. It may not have been "guaranteed" but it was true.
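To make the data() question concrete, here is a minimal sketch, assuming a hypothetical class tiny_string rather than the reviewed library. The primary template owns its buffer, so data() differs per object, while an empty specialization for N == 0 has no storage and must return one shared pointer. That pointer cannot be nullptr, because it must still denote a valid empty null-terminated string.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical minimal fixed-capacity string, for illustration only.
template <std::size_t N>
class tiny_string {
    static_assert(N < 256, "sketch keeps the size in one byte");
    unsigned char size_ = 0;  // current length
    char data_[N + 1] = {};   // N characters plus the null terminator
public:
    const char* data() const noexcept { return data_; }  // points into this object
    std::size_t size() const noexcept { return size_; }
};

// Empty-class specialization for N == 0: there is no per-object storage, so
// data() must return a shared pointer. It cannot be nullptr, because the
// result must still be a valid null-terminated (empty) string.
template <>
class tiny_string<0> {
public:
    const char* data() const noexcept {
        static const char terminator = '\0';  // one object shared by all instances
        return &terminator;
    }
    std::size_t size() const noexcept { return 0; }
};
```

A function-local static is used instead of returning the literal "" because successive evaluations of a string literal are not guaranteed to yield the same object.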

Andrey Semashev

6:32 p.m.

On 2019-12-04 19:01, Peter Dimov via Boost wrote:

...

Vinnie Falco wrote:

...
...
On Tue, Dec 3, 2019 at 11:42 PM Peter Dimov via Boost <boost@lists.boost.org> wrote:

I don't like this change. A special case for close to zero benefit that changes the semantics of data() to not be unique per instance.

I don't feel strongly about it either way, but since when is data() guaranteed to be unique per instance?

This is a property of the implementation that is no longer true after the change. It may not have been "guaranteed" but it was true.

I don't think it should be guaranteed.

Peter Dimov

6:52 p.m.

Andrey Semashev wrote:

...

I don't think it should be guaranteed.

I honestly see no reason for it not to be. The difference between a one-byte class and a zero-byte class is only relevant on a few occasions, which rarely occur for fixed_string. Unnecessary lack of consistency is a clear downside.

Phil Endecott

3:12 p.m.


Peter Dimov <pdimov@gmail.com> wrote:

...

Krystian Stasiowski wrote:

...
- Empty class specialization for N = 0: Implemented

I don't like this change. A special case for close to zero benefit that changes the semantics of data() to not be unique per instance.

I would hope to see close to the same semantics as std::array<T,0>, which I believe allows data() to return nullptr.

...

Storing the size (as capacity - size) in the last char for N < 256 will have more impact, but I'm not sure that it too is worth the added complexity.

Why the last char, rather than always having the size (of whatever appropriate type) first? Is the idea that this makes data() and c_str() essentially no-ops? I guess the benefit of that depends on how often you need empty() or size() vs. data() or c_str(). Or is there some other issue? Regards, Phil.

Alexander Grund

3:24 p.m.

...

I would hope to see close to the same semantics as std::array<T,0>, which I believe allows data() to return nullptr.

I don't think so:

"There is a special case for a zero-length array (N == 0). In that case, array.begin() == array.end(), which is some unique value. The effect of calling front() or back() on a zero-sized array is undefined." from https://en.cppreference.com/w/cpp/container/array

IMO this excludes nullptr, as that won't be unique.

Andrey Semashev

6:30 p.m.

On 2019-12-04 18:24, Alexander Grund via Boost wrote:

...

...
I would hope to see close to the same semantics as std::array<T,0>, which I believe allows data() to return nullptr.

I don't think so:

"There is a special case for a zero-length array (N == 0). In that case, array.begin() == array.end(), which is some unique value. The effect of calling front() or back() on a zero-sized array is undefined." from https://en.cppreference.com/w/cpp/container/array

IMO this excludes nullptr as that won't be unique

I don't think iterators from different instances of a container are comparable. IOW, "unique" means distinct from any possible values of valid iterators obtained from this particular container instance. That might not apply to pointers, though. I don't remember whether there are any guarantees wrt. data() of a zero-sized array, for example.

Peter Dimov

6:44 p.m.

Andrey Semashev wrote:

...

On 2019-12-04 18:24, Alexander Grund via Boost wrote:

...
...
I would hope to see close to the same semantics as std::array<T,0>, which I believe allows data() to return nullptr.

I don't think so:

"There is a special case for a zero-length array (N == 0). In that case, array.begin() == array.end(), which is some unique value. The effect of calling front() or back() on a zero-sized array is undefined." from https://en.cppreference.com/w/cpp/container/array

IMO this excludes nullptr as that won't be unique

I don't think iterators from different instances of a container are comparable. IOW, "unique" means distinct from any possible values of valid iterators obtained from this particular container instance.

That might not apply to pointers, though.

Pointers are comparable with ==, and array<T, 0> usually returns reinterpret_cast<T*>(this) from data() to satisfy the uniqueness requirement. Or used to return - I see that libstdc++ returns nullptr now, probably because reinterpret_cast can't be constexpr. https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/a...

JeanHeyd Meneide

6:56 p.m.

On Wed, Dec 4, 2019 at 1:50 PM Peter Dimov via Boost <boost@lists.boost.org> wrote:

...

Andrey Semashev wrote:

...
On 2019-12-04 18:24, Alexander Grund via Boost wrote:

...
...
I would hope to see close to the same semantics as std::array<T,0>, which I believe allows data() to return nullptr.

I don't think so:

"There is a special case for a zero-length array (N == 0). In that case, array.begin() == array.end(), which is some unique value. The effect of calling front() or back() on a zero-sized array is undefined." from https://en.cppreference.com/w/cpp/container/array

IMO this excludes nullptr as that won't be unique

I don't think iterators from different instances of a container are comparable. IOW, "unique" means distinct from any possible values of valid iterators obtained from this particular container instance.

That might not apply to pointers, though.

Pointers are comparable with ==, and array<T, 0> usually returns reinterpret_cast<T*>(this) from data() to satisfy the uniqueness requirement.

Or used to return - I see that libstdc++ returns nullptr now, probably because reinterpret_cast can't be constexpr.

https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/a...

Precisely for this reason, but also the uniqueness requirement was on `begin()` and `end()`, not `.data()`:

"In the case that N == 0, begin() == end() == unique value. The return value of data() is unspecified." -- Working Draft, http://eel.is/c++draft/array#zero-2

begin() and end() are already restricted library-wide to not be comparable with begin() and end() -- let alone any iterator -- from a container that does not generate them, so cross-container uniqueness has never been a thing. There is an issue open to generate better wording around this to make it more clear: https://cplusplus.github.io/LWG/lwg-active.html#2157

Alexander Grund

5 Dec

7:59 a.m.

On 04.12.19 19:30, Andrey Semashev via Boost wrote:

...

On 2019-12-04 18:24, Alexander Grund via Boost wrote:

...
...
I would hope to see close to the same semantics as std::array<T,0>, which I believe allows data() to return nullptr.

I don't think so:

"There is a special case for a zero-length array (N == 0). In that case, array.begin() == array.end(), which is some unique value. The effect of calling front() or back() on a zero-sized array is undefined." from https://en.cppreference.com/w/cpp/container/array

IMO this excludes nullptr as that won't be unique

I don't think iterators from different instances of a container are comparable. IOW, "unique" means distinct from any possible values of valid iterators obtained from this particular container instance.

That might not apply to pointers, though. I don't remember whether there are any guarantees wrt. data() of a zero-sized array, for example.

You are right: "If size() is 0, data() may or may not return a null pointer." https://en.cppreference.com/w/cpp/container/array/data

Peter Dimov

4 Dec

4:05 p.m.


Phil Endecott wrote:

...

Peter Dimov <pdimov@gmail.com> wrote:

...
Krystian Stasiowski wrote:

...
- Empty class specialization for N = 0: Implemented

I don't like this change. A special case for close to zero benefit that changes the semantics of data() to not be unique per instance.

I would hope to see close to the same semantics as std::array<T,0>, which I believe allows data() to return nullptr.

nullptr is invalid here, because the string is null-terminated.

...

...
Storing the size (as capacity - size) in the last char for N < 256 will have more impact, but I'm not sure that it too is worth the added complexity.

Why the last char, rather than always having the size (of whatever appropriate type) first? Is the idea that this makes data() and c_str() essentially no-ops?

The idea here is that you win one byte by reusing the last byte of the storage as the size, overlapping it with the null terminator in the size() == N case (because capacity - size becomes 0).
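The last-byte trick can be sketched as follows, assuming N < 256, a plain char buffer, and a hypothetical class name last_byte_string. The byte at index N holds N - size(), so it becomes 0, and therefore doubles as the null terminator, exactly when the string is full. Only assign() is shown; a real class would need the full string interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch: capacity N, storage of exactly N + 1 bytes, no separate size member.
template <std::size_t N>
class last_byte_string {
    static_assert(N < 256, "the remaining capacity must fit in one byte");
    char buf_[N + 1];  // buf_[N] stores N - size(); it is 0 (the terminator) when full

public:
    last_byte_string() noexcept {
        buf_[0] = '\0';
        buf_[N] = static_cast<char>(N);  // empty: remaining capacity is N
    }

    std::size_t size() const noexcept {
        // Every call reads the tail byte and performs a subtraction.
        return N - static_cast<unsigned char>(buf_[N]);
    }

    const char* c_str() const noexcept { return buf_; }

    // Copies s; returns false (leaving the string unchanged) if s does not fit.
    bool assign(const char* s) noexcept {
        const std::size_t n = std::strlen(s);
        if (n > N) return false;
        std::memcpy(buf_, s, n);
        buf_[n] = '\0';                      // terminator; this is buf_[N] when n == N
        buf_[N] = static_cast<char>(N - n);  // writes 0 again when the string is full
        return true;
    }
};
```

The object is exactly N + 1 bytes, but size() is no longer a plain member read, which is the cost discussed in the following messages.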

Andrey Semashev

6:10 p.m.


On 2019-12-04 19:05, Peter Dimov via Boost wrote:

...

Phil Endecott wrote:

...
Peter Dimov <pdimov@gmail.com> wrote:

...
...
Storing the size (as capacity - size) in the last char for N < 256 will have more impact, but I'm not sure that it too is worth the added complexity.

Why the last char, rather than always having the size (of whatever appropriate type) first? Is the idea that this makes data() and c_str() essentially no-ops?

The idea here is that you win one byte by reusing the last byte of the storage as the size, overlapping it with the null terminator in the size() == N case (because capacity - size becomes 0).

I'm not sure this would actually be beneficial in terms of performance. Ignoring the fact that size() becomes more expensive, and this is a relatively often used function, you also have to access the tail of the storage, which is likely on a different cache line than the beginning of the string. It is more likely that the user will want to process the string in the forward direction, possibly not until the end (think comparison operators, copy/assignment, for instance). If the string is not close to full capacity, you would fetch the tail cache line only to get the string size. It is for this reason that placing any auxiliary members, like the size, before the storage array is preferable. Of course, if you prefer memory size over speed, placing the size in the last byte is preferable.

Peter Dimov

6:23 p.m.


Andrey Semashev wrote:

...

...
The idea here is that you win one byte by reusing the last byte of the storage as the size, overlapping it with the null terminator in the size() == N case (because capacity - size becomes 0).

I'm not sure this would actually be beneficial in terms of performance. Ignoring the fact that size() becomes more expensive, and this is a relatively often used function, you also have to access the tail of the storage, which is likely on a different cache line than the beginning of the string.

Yes, probably. As I said, it's not clear that we should bother with it (I wouldn't), but it at least gives consistent space savings for all short strings. That said, one could use it for N < 32, when the size will be in the same cache line.

Krystian Stasiowski

8:55 p.m.


On Wed, Dec 4, 2019 at 1:24 PM Peter Dimov via Boost <boost@lists.boost.org> wrote:

...

Yes, probably. As I said, it's not clear that we should bother with it (I wouldn't), but it at least gives consistent space savings for all short strings.

That said, one could use it for N < 32, when the size will be in the same cache line.

What should I do? I can implement this, but since `size()` would have to be called as opposed to simply accessing the member (this already happens, since a proxy function is used due to the specialization for zero size, but it only returns the member each time, so it is easily inlined), as Andrey mentioned, this may not be cache friendly.

Peter Dimov

10:24 p.m.


Krystian Stasiowski wrote:

...

On Wed, Dec 4, 2019 at 1:24 PM Peter Dimov via Boost <boost@lists.boost.org> wrote:

...
Yes, probably. As I said, it's not clear that we should bother with it (I wouldn't), but it at least gives consistent space savings for all short strings.

That said, one could use it for N < 32, when the size will be in the same cache line.

What should I do? I can implement this, but since `size()` would have to be called as opposed to simply accessing the member (this already happens, since a proxy function is used due to the specialization for zero size, but it only returns the member each time, so it is easily inlined), as Andrey mentioned, this may not be cache friendly.

Well, as I said, if it were up to me, I wouldn't use this optimization.

smallest_width_t size_; char data_[ N+1 ];

is good enough for me, in all cases, including N == 0. This makes fixed_string<0> two bytes instead of one, but I'm pretty sure I (and 99.4% of the Earth population) can live with that.

With Boost components, especially fundamental ones, I always prefer clarity of implementation over optimizations (at the margin).
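A sketch of the plain layout Peter prefers, with hypothetical names (plain_fixed_string, and size_type_for standing in for the smallest_width_t he mentions): the size comes first, the array holds N + 1 characters, and N == 0 gets no special treatment, so every object has its own data() address and a zero-capacity string costs two bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical: narrowest of uint8_t / uint16_t / uint32_t that can hold N
// (this sketch does not handle N >= 2^32).
template <std::size_t N>
using size_type_for = typename std::conditional<(N < 256), std::uint8_t,
    typename std::conditional<(N < 65536), std::uint16_t, std::uint32_t>::type>::type;

// Size first, then N + 1 characters; no special case for N == 0.
template <std::size_t N>
struct plain_fixed_string {
    size_type_for<N> size_ = 0;
    char data_[N + 1] = {};  // N characters plus the null terminator

    std::size_t size() const noexcept { return size_; }  // a direct member read
    const char* data() const noexcept { return data_; }  // unique per object
};
```

Because the size sits in front of the characters, a forward scan of a short string touches the size and the first characters on the same cache line.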

Andrey Semashev

11:19 p.m.


On 2019-12-05 01:24, Peter Dimov via Boost wrote:

...

Krystian Stasiowski wrote:

...
On Wed, Dec 4, 2019 at 1:24 PM Peter Dimov via Boost <boost@lists.boost.org> wrote:

...
Yes, probably. As I said, it's not clear that we should bother with it (I wouldn't), but it at least gives consistent space savings for all short strings.

That said, one could use it for N < 32, when the size will be in the same cache line.

What should I do? I can implement this, but since `size()` would have to be called as opposed to simply accessing the member (this already happens, since a proxy function is used due to the specialization for zero size, but it only returns the member each time, so it is easily inlined), as Andrey mentioned, this may not be cache friendly.

Well as I said, if it were up to me, I wouldn't use this optimization.

smallest_width_t size_; char data_[ N+1 ];

is good enough for me, in all cases, including N == 0. This makes fixed_string<0> two bytes instead of one, but I'm pretty sure I (and 99.4% of the Earth population) can live with that.

I think fixed_string<0> should be specialized to an empty class. Not that I have a specific use case for fixed_string<0>, but making the class empty in general is useful for EBO and [[no_unique_address]] and e.g. tuples that employ these techniques. If we can attach the attribute to a captured value in a lambda, that'd be useful as well.

Peter Dimov

11:43 p.m.


Andrey Semashev wrote:

...

I think fixed_string<0> should be specialized to an empty class. Not that I have a specific use case for fixed_string<0>, but making the class empty in general is useful for EBO and [[no_unique_address]] and e.g. tuples that employ these techniques.

Since values of the same type can't have the same address, you'll be gaining at most one byte per tuple even if you put 1089 fixed_string<0>s in it. Although I suppose tuple<unsigned long, fixed_string<0>> can be 8 bytes instead of 16. My position here is as it has always been; when an actual user asks for this and provides an actual use case in which it leads to actual measurable benefits, add it. Otherwise, not.
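Peter's tuple<unsigned long, fixed_string<0>> remark relies on the empty base optimization: many std::tuple implementations store their elements as base classes, and an empty base need not occupy storage, whereas an empty member takes at least one byte plus padding. In this sketch, empty_tag is a hypothetical stand-in for an empty fixed_string<0>; exact sizes are implementation-specific (typically 8 versus 16 bytes on a 64-bit Linux target), so only the relative sizes are meaningful.

```cpp
#include <cassert>
#include <type_traits>

struct empty_tag {};  // hypothetical stand-in for an empty fixed_string<0>

// Empty type as an ordinary member: it needs its own byte, and padding follows.
struct as_member {
    unsigned long value;
    empty_tag tag;
};

// Empty type as a base class: the empty base optimization lets it take no space.
struct as_base : empty_tag {
    unsigned long value;
};
```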

Andrey Semashev

11:50 p.m.


On 2019-12-05 02:43, Peter Dimov via Boost wrote:

...

Andrey Semashev wrote:

...
I think fixed_string<0> should be specialized to an empty class. Not that I have a specific use case for fixed_string<0>, but making the class empty in general is useful for EBO and [[no_unique_address]] and e.g. tuples that employ these techniques.

Since values of the same type can't have the same address, you'll be gaining at most one byte per tuple even if you put 1089 fixed_string<0>s in it.

Is that so? I thought [[no_unique_address]] was supposed to allow that.

...

Although I suppose tuple<unsigned long, fixed_string<0>> can be 8 bytes instead of 16.

If such a tuple is e.g. bound into std::function, this may mean a difference between embedded storage and dynamic memory allocation.
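On the question above: [[no_unique_address]] (C++20) lets an empty member share an address with other members, but it does not lift the rule that two distinct objects of the same type must have distinct addresses, which is the basis of Peter's "at most one byte" remark. A small sketch with a hypothetical empty_tag type follows; sizes differ across compilers, and MSVC is known to ignore the standard spelling in favour of [[msvc::no_unique_address]].

```cpp
#include <cassert>

struct empty_tag {};  // hypothetical empty type

// Both members request no unique address, but because they have the same
// type they must still end up at distinct addresses.
struct two_tags {
    [[no_unique_address]] empty_tag a;
    [[no_unique_address]] empty_tag b;
    unsigned long value;
};
```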

Peter Koch Larsen

8:50 p.m.


On Wed, Dec 4, 2019 at 7:10 PM Andrey Semashev via Boost <boost@lists.boost.org> wrote:

...

On 2019-12-04 19:05, Peter Dimov via Boost wrote:

...
Phil Endecott wrote:

...
Peter Dimov <pdimov@gmail.com> wrote:

...
...
Storing the size (as capacity - size) in the last char for N < 256 will have more impact, but I'm not sure that it too is worth the added complexity.

Why the last char, rather than always having the size (of whatever appropriate type) first? Is the idea that this makes data() and c_str() essentially no-ops?

The idea here is that you win one byte by reusing the last byte of the storage as the size, overlapping it with the null terminator in the size() == N case (because capacity - size becomes 0).

I'm not sure this would actually be beneficial in terms of performance. Ignoring the fact that size() becomes more expensive, and this is a relatively often used function, you also have to access the tail of the storage, which is likely on a different cache line than the beginning of the string. It is more likely that the user will want to process the string in the forward direction, possibly not until the end (think comparison operators, copy/assignment, for instance). If the string is not close to full capacity, you would fetch the tail cache line only to get the string size.

It is for this reason that placing any auxiliary members, like the size, before the storage array is preferable. Of course, if you prefer memory size over speed, placing the size in the last byte is preferable.

There is another reason to place the size (or rather the free size) at the end of the data: C compatibility. I have a similar class named fc_string where, for a fixed_string of N chars, I use one extra char for the free space. This character steps in and becomes the null terminator when the string is full, so I use no extra space in any case. If (for char-based arrays) more than 255 bytes are used, I store 255 in the end and the free size in the characters below the last one. This has the advantage that fixed_string<N> becomes a drop-in replacement for char[N+1] whenever the string is not required to be normalised, while still having an O(1) size.

I doubt that caching matters (much) for performance. In practice, fixed_strings are not large, and if they are, the time to access the individual characters will probably dominate anyway. If there are many small strings, they will typically be stored in an array, and accessing the size of a string will load the beginning of the next string into cache.
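The fc_string class is not shown in the thread, so the following is only a guess at the tail encoding described above, reading it as applying when the free space itself does not fit in one byte. A count below 255 is stored directly in the last byte (and is 0, i.e. the terminator, when the string is full); otherwise the last byte holds the marker 255 and the full count is stored in the unused bytes just below it. The name tail_encoded_string is hypothetical, and an unsigned char buffer is used for simplicity where a true drop-in for char[N+1] would use char.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch of a tail-encoded free-space count in an N + 1 byte buffer.
template <std::size_t N>
struct tail_encoded_string {
    unsigned char buf[N + 1];  // characters, terminator, and the free-space count

    std::size_t free_space() const noexcept {
        // Small counts (and every count when N < 255) are stored directly.
        if (N < 255 || buf[N] != 255) return buf[N];
        // Marker 255: the real count sits in the bytes just below buf[N].
        std::size_t spare;
        std::memcpy(&spare, buf + N - sizeof spare, sizeof spare);
        return spare;
    }

    std::size_t size() const noexcept { return N - free_space(); }

    // Records a new length n; assumes n <= N and buf[0..n) already holds the characters.
    void set_size(std::size_t n) noexcept {
        const std::size_t spare = N - n;
        buf[n] = 0;  // null terminator; this is buf[N] itself when the string is full
        if (spare < 255) {
            buf[N] = static_cast<unsigned char>(spare);  // 0 when full
        } else {
            // spare >= 255 > sizeof(std::size_t), so these bytes lie past the terminator.
            std::memcpy(buf + N - sizeof spare, &spare, sizeof spare);
            buf[N] = 255;
        }
    }
};
```

The object is exactly N + 1 bytes for any N, at the cost of a branch in size().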

Andrey Semashev

11:07 p.m.


On 2019-12-04 23:50, Peter Koch Larsen wrote:

...

On Wed, Dec 4, 2019 at 7:10 PM Andrey Semashev via Boost <boost@lists.boost.org> wrote:

...
On 2019-12-04 19:05, Peter Dimov via Boost wrote:

...
The idea here is that you win one byte by reusing the last byte of the storage as the size, overlapping it with the null terminator in the size() == N case (because capacity - size becomes 0).

I'm not sure this would actually be beneficial in terms of performance. Ignoring the fact that size() becomes more expensive, and this is a relatively often used function, you also have to access the tail of the storage, which is likely on a different cache line than the beginning of the string. It is more likely that the user will want to process the string in the forward direction, possibly not until the end (think comparison operators, copy/assignment, for instance). If the string is not close to full capacity, you would fetch the tail cache line only to get the string size.

It is for this reason that placing any auxiliary members, like the size, before the storage array is preferable. Of course, if you prefer memory size over speed, placing the size in the last byte is preferable.

There is another reason to place the size (or rather the free size) at the end of the data: C compatibility. I have a similar class named fc_string where, for a fixed_string of N chars, I use one extra char for the free space. This character steps in and becomes the null terminator when the string is full, so I use no extra space in any case. If (for char-based arrays) more than 255 bytes are used, I store 255 in the end and the free size in the characters below the last one.

C compatibility beyond zero termination of strings is non-existent. You have that special convention, and that is fine, but that convention is not standard and only you know and follow it. No C function would be able to use that extra information without explicit support. Your special use case does not make an argument for designing a general utility like fixed_string.

...

I doubt that caching matters (much) for performance.

It all depends on the use case, of course, but memory bandwidth is the main bottleneck in modern systems. From the space standpoint, there is little difference between N and N+1 or even N+4 or N+8 bytes for a fixed_string<N> object. Given this, it is preferable to choose a data layout that is more efficient in terms of memory accesses and computation complexity on typical use.

Peter Koch Larsen

11:30 p.m.


On Thu, Dec 5, 2019 at 12:07 AM Andrey Semashev via Boost <boost@lists.boost.org> wrote:

...

On 2019-12-04 23:50, Peter Koch Larsen wrote:

...
On Wed, Dec 4, 2019 at 7:10 PM Andrey Semashev via Boost <boost@lists.boost.org> wrote:

...
On 2019-12-04 19:05, Peter Dimov via Boost wrote:

...
The idea here is that you win one byte by reusing the last byte of the storage as the size, overlapping it with the null terminator in the size() == N case (because capacity - size becomes 0).

I'm not sure this would actually be beneficial in terms of performance. Ignoring the fact that size() becomes more expensive, and this is a relatively often used function, you also have to access the tail of the storage, which is likely on a different cache line than the beginning of the string. It is more likely that the user will want to process the string in the forward direction, possibly not until the end (think comparison operators, copy/assignment, for instance). If the string is not close to full capacity, you would fetch the tail cache line only to get the string size.

It is for this reason that placing any auxiliary members, like the size, before the storage array is preferable. Of course, if you prefer memory size over speed, placing the size in the last byte is preferable.

There is another reason to place the size (or rather the free size) at the end of the data: C compatibility. I have a similar class named fc_string where, for a fixed_string of N chars, I use one extra char for the free space. This character steps in and becomes the null terminator when the string is full, so I use no extra space in any case. If (for char-based arrays) more than 255 bytes are used, I store 255 in the end and the free size in the characters below the last one.

C compatibility beyond zero termination of strings is non-existent. You have that special convention, and that is fine, but that convention is not standard and only you know and follow it. No C function would be able to use that extra information without explicit support. Your special use case does not make an argument for designing a general utility like fixed_string.

It is not a convention. It is a question of enforcing a memory size and a layout. For embedded systems, this can be important. We have types that are required to be standard layout and have an alignment of 1, something that we enforce programmatically. I believe that fixed_string is sufficiently specialised to also take embedded development into consideration.

...

...
I doubt that caching matters (much) for performance.

It all depends on the use case, of course, but memory bandwidth is the main bottleneck in modern systems.

From the space standpoint, there is little difference between N and N+1 or even N+4 or N+8 bytes for a fixed_string<N> object. Given this, it is preferable to choose a data layout that is more efficient in terms of memory accesses and computation complexity on typical use.

It is not just N + 4 or N + 8. Considering alignment restrictions it could be N + 7 or N + 15. This is significant and also a waste of cache.
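The compile-time enforcement mentioned above (standard layout, alignment 1, fixed size) can be expressed with static_assert. Below is a sketch with hypothetical names, where name16 stands in for a fixed-capacity string that keeps its bookkeeping inside its character buffer; it is not code from either party in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical compile-time contract: T must be standard layout, have
// alignment 1, and occupy exactly Size bytes. Naming ok instantiates the
// class, which evaluates the checks.
template <class T, std::size_t Size>
struct layout_contract {
    static_assert(std::is_standard_layout<T>::value, "T must be standard layout");
    static_assert(alignof(T) == 1, "T must have alignment 1");
    static_assert(sizeof(T) == Size, "T must have exactly the agreed size");
    static constexpr bool ok = true;
};

// Stand-in for a 15-character string whose count lives in its last byte.
struct name16 {
    char buf[16];
};

static_assert(layout_contract<name16, 16>::ok, "name16 violates its layout contract");
```

A layout with, say, a std::uint16_t size member in front would typically fail the alignment check on common platforms, which is the trade-off being argued in this exchange.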

Andrey Semashev

11:45 p.m.


On 2019-12-05 02:30, Peter Koch Larsen wrote:

...

On Thu, Dec 5, 2019 at 12:07 AM Andrey Semashev via Boost <boost@lists.boost.org> wrote:

...
C compatibility beyond zero termination of strings is non-existent. You have that special convention, and that is fine, but that convention is not standard and only you know and follow it. No C function would be able to use that extra information without explicit support. Your special use case does not make an argument for designing a general utility like fixed_string.

It is not a convention. It is a question of enforcing a memory size and a layout. For embedded systems, this can be important. We have types that are required to be standard layout and have an alignment of 1, something that we enforce programmatically.

When you need a specific memory layout, you should use a specialized structure or direct byte-wise memory accesses. No general purpose utility guarantees any particular binary representation, and neither should fixed_string.

...

I believe that fixed_string is sufficiently specialised to also take embedded development into consideration.

fixed_string can be optimized for speed or space considerations, which may play in favor of embedded systems, but it should not be specialized to embedded systems, let alone to a specific memory layout.

...

...
From the space standpoint, there is little difference between N and N+1 or even N+4 or N+8 bytes for a fixed_string<N> object. Given this, it is preferable to choose a data layout that is more efficient in terms of memory accesses and computation complexity on typical use.

It is not just N + 4 or N + 8. Considering alignment restrictions it could be N + 7 or N + 15. This is significant and also a waste of cache.

If you place the size before the array, you will normally not have unused space between the size and the storage, because the alignment of the storage is less than that of the size. You can only have an alignment gap of up to 3 bytes if the size alignment is lower than that of the character type (i.e. the worst case of fixed_string<N, char32_t> will have a size of N+4 if the size is represented by 1 to 4 bytes, and N+8 if it is 8).

Andrey Semashev

5 Dec

12:01 a.m.

On 2019-12-05 02:45, Andrey Semashev wrote:

If you place the size before the array, you will normally not have unused space between the size and the storage, because the alignment of the storage is less than that of the size. You can only have an alignment gap of up to 3 bytes if the size alignment is lower than that of the character type (i.e. the worst case of fixed_string<N, char32_t> will have a size of N+4 if the size is represented by 1 to 4 bytes, and N+8 if it is 8).

Sorry, that should be N*sizeof(char32_t)+4 and N*sizeof(char32_t)+8, respectively.

Gavin Lambert

3:09 a.m.

On 5/12/2019 12:45, Andrey Semashev wrote:

...

fixed_string can be optimized for speed or space considerations, which may play in favor of embedded systems, but it should not be specialized to embedded systems, let alone to a specific memory layout.

But when one of the stated design goals is to target "embedded environments without a free store", it seems reasonable to consider the memory layout of the type quite closely.

I think Peter's point is that there is another use case for plain-array-storage strings. I don't quite think that fixed_string in its current role is intended (or suitable) for this case, but, other than the storage, another type which does implement it would end up looking nearly identical to fixed_string.

For example, if you already have some blittable data structures:

struct ImportantData { int32_t a; int16_t b; int16_t c; char d[32]; };

The memory layout of this type is guaranteed (assuming platform alignments and padding are taken into account or specified explicitly), and it can be safely memcpy'd or stream.write()'d. Wouldn't it be nice if you could replace "d" with a fixed_string<32> (or <31>, depending on how you count the null terminator), and preserve all of those properties, while gaining the string methods?

The (big) downside is that if you want to actually be able to swap between these two implementations (e.g. given existing serialized data in the old format), then you can't do fancy things with the storage such as putting the size at the end (or indeed anywhere).

This then starts leading you down the rabbit-hole of storage adapter policy templates so the user can configure how it reads and writes the length (maybe it keeps it in c; maybe it doesn't store it at all and just uses strlen; maybe it does keep it at the end of d but with some strlen fallback). I hope we can all agree that this would be an over-engineering nightmare, despite the initial attractiveness of the idea of an all-in-one "array-like string" type.

Andrey Semashev

8:44 a.m.

On 2019-12-05 06:09, Gavin Lambert via Boost wrote:

...

On 5/12/2019 12:45, Andrey Semashev wrote:

...
fixed_string can be optimized for speed or space considerations, which may play in favor of embedded systems, but it should not be specialized to embedded systems, let alone to a specific memory layout.

But when one of the stated design goals is to target "embedded environments without a free store", it seems reasonable to consider the memory layout of the type quite closely.

Yes, but only from the implementation efficiency perspective, which is my point.

...

I think Peter's point is that there is another use case for plain-array-storage strings. I don't quite think that fixed_string in its current role is intended (or suitable) for this case, but, other than the storage, another type which does implement it would end up looking nearly identical to fixed_string.

For example, if you already have some blittable data structures:

struct ImportantData { int32_t a; int16_t b; int16_t c; char d[32]; };

The memory layout of this type is guaranteed (assuming platform alignments and padding are taken into account or specified explicitly), and it can be safely memcpy'd or stream.write()'d.

Not really, there's also the endianness issue. Also, strictly speaking, IIRC up to C++20 the binary representation of char is not guaranteed (i.e. it's not required to be two's complement), and even after that its size and signedness are not guaranteed. You should be using (u)int8_t instead.

...

Wouldn't it be nice if you could replace "d" with a fixed_string<32> (or <31>, depending on how you count the null terminator), and preserve all of those properties, while gaining the string methods?

No, unless you write your own fixed_string, which guarantees the specific memory representation you require. You can't replace it with a general purpose fixed_string, which was not written for your specific use case and is not required to maintain your required memory representation. In the same way, you can't replace ImportantData with a std::tuple<int32_t, int16_t, int16_t, std::array<char, 32>> and expect it to preserve the memory layout. It may actually preserve it, but only by pure coincidence.

...

The (big) downside is that if you want to actually be able to swap between these two implementations (e.g. given existing serialized data in the old format), then you can't do fancy things with the storage such as putting the size at the end (or indeed anywhere).

This then starts leading you down the rabbit-hole of storage adapter policy templates so the user can configure how it reads and writes the length (maybe it keeps it in c; maybe it doesn't store it at all and just uses strlen; maybe it does keep it at the end of d but with some strlen fallback). I hope we can all agree that this would be an over-engineering nightmare, despite the initial attractiveness of the idea of an all-in-one "array-like string" type.

Right.

Krystian Stasiowski

6:07 p.m.

Hi all, constexpr has been implemented for the majority of functions (there are 3-4 overloads that will require modification for constexpr). If you would like to play around with it, you can find the source here: https://github.com/18/fixed_string/tree/constexpr

Peter Koch Larsen

10:28 p.m.

On Thu, Dec 5, 2019 at 9:45 AM Andrey Semashev via Boost <boost@lists.boost.org> wrote:

...

On 2019-12-05 06:09, Gavin Lambert via Boost wrote:

...
On 5/12/2019 12:45, Andrey Semashev wrote:

...
fixed_string can be optimized for speed or space considerations, which may play in favor of embedded systems, but it should not be specialized to embedded systems, let alone to a specific memory layout.

What do you mean by specialized? It is designed to also favour embedded systems.

...

But when one of the stated design goals is to target "embedded environments without a free store", it seems reasonable to consider the memory layout of the type quite closely.

Yes, but only from the implementation efficiency perspective, which is my point.

Why? There are two goals: usability and efficiency. You should weigh these and not just unconditionally focus on one aspect. In practice I believe that efficiency will not suffer measurably no matter how you represent the length.

...

...
I think Peter's point is that there is another use case for plain-array-storage strings. I don't quite think that fixed_string in its current role is intended (or suitable) for this case, but, other than the storage, another type which does implement it would end up looking nearly identical to fixed_string.

For example, if you already have some blittable data structures:

struct ImportantData { int32_t a; int16_t b; int16_t c; char d[32]; };

Gavin expresses my point of view quite precisely except that we are a bit stricter wrt. alignment.

...

...
The memory layout of this type is guaranteed (assuming platform alignments and padding are taken into account or specified explicitly), and it can be safely memcpy'd or stream.write()'d.

Not really, there's also the endianness issue. Also, strictly speaking, IIRC up to C++20 the binary representation of char is not guaranteed (i.e. it's not required to be two's complement), and even after that its size and signedness are not guaranteed. You should be using (u)int8_t instead.

That does not really matter. As embedded developers, we know our hardware. We do not require our code to be ultra-portable and may well be happy working only on platforms with 8-bit two's complement characters.

...

...
Wouldn't it be nice if you could replace "d" with a fixed_string<32> (or <31>, depending on how you count the null terminator), and preserve all of those properties, while gaining the string methods?

No, unless you write your own fixed_string, which guarantees the specific memory representation you require.

Or unless it is provided by Boost?

...

You can't replace it with a general purpose fixed_string, which was not written for your specific use case and is not required to maintain your required memory representation. In the same way, you can't replace ImportantData with a std::tuple<int32_t, int16_t, int16_t, std::array<char, 32>> and expect it to preserve the memory layout. It may actually preserve it, but only by pure coincidence.

There is no such guarantee for std::tuple. This is not a surprise: we program against a standard.

...
The (big) downside is that if you want to actually be able to swap between these two implementations (e.g. given existing serialized data in the old format), then you can't do fancy things with the storage such as putting the size at the end (or indeed anywhere).

This then starts leading you down the rabbit-hole of storage adapter policy templates so the user can configure how it reads and writes the length (maybe it keeps it in c; maybe it doesn't store it at all and just uses strlen; maybe it does keep it at the end of d but with some strlen fallback). I hope we can all agree that this would be an over-engineering nightmare, despite the initial attractiveness of the idea of an all-in-one "array-like string" type.

Andrey Semashev

6 Dec

12:17 a.m.

On 2019-12-06 01:28, Peter Koch Larsen wrote:

...

On Thu, Dec 5, 2019 at 9:45 AM Andrey Semashev via Boost <boost@lists.boost.org> wrote:

...
On 2019-12-05 06:09, Gavin Lambert via Boost wrote:

...
On 5/12/2019 12:45, Andrey Semashev wrote:

...
fixed_string can be optimized for speed or space considerations, which may play in favor of embedded systems, but it should not be specialized to embedded systems, let alone to a specific memory layout.

What do you mean by specialized? It is designed to also favour embedded systems.

"Specialized" means catered to a specific use case. You can implement fixed_string many ways, and as long as it doesn't perform dynamic memory allocations I would call it "favoring embedded systems". Requiring it to have a specific binary representation is something else entirely.

...

...
...
But when one of the stated design goals is to target "embedded environments without a free store", it seems reasonable to consider the memory layout of the type quite closely.

Yes, but only from the implementation efficiency perspective, which is my point.

Why? There are two goals: usability and efficiency. You should weigh these and not just unconditionally focus on one aspect. In practice I believe that efficiency will not suffer measurably no matter how you represent the length.

I'm not discussing usability because from the API perspective, usability is the same regardless of how you store the size internally.

...

...
...
Wouldn't it be nice if you could replace "d" with a fixed_string<32> (or <31>, depending on how you count the null terminator), and preserve all of those properties, while gaining the string methods?

No, unless you write your own fixed_string, which guarantees the specific memory representation you require.

Or unless it is provided by Boost?

If you name it <your_representation_name_here>_string and document it as a string that is targeted specifically to your use case, and there are enough reviewers interested in it - sure. Though personally I don't think Boost would be the right place for such a component since Boost libraries tend to be general purpose, even if within their specific domains. fixed_string, as I understand it, is a more or less generic string, which is basically a counterpart of static_vector, that's it. Don't try to make it something else, please.

...

...
You can't replace it with a general purpose fixed_string, which was not written for your specific use case and is not required to maintain your required memory representation. In the same way, you can't replace ImportantData with a std::tuple<int32_t, int16_t, int16_t, std::array<char, 32>> and expect it to preserve the memory layout. It may actually preserve it, but only by pure coincidence.

There is no such guarantee for std::tuple. This is not a surprise: we program against a standard.

Right. IMO, there shouldn't be such a guarantee for fixed_string either.

Alex Hagen-Zanker

4 Dec

11:04 a.m.

...

We still have not settled on a name that is concise and descriptive

How about capped_string? Perhaps it is too British, but it does mean "with a fixed capacity".

Phil Endecott

2:35 p.m.

joaquinlopezmunoz@gmail.com wrote:

...

only three of you have cast your acceptance/rejection vote

Reviews aren't votes!

Joaquin M López Muñoz

3:38 p.m.

On 04/12/2019 at 15:35, Phil Endecott via Boost wrote:

...

joaquinlopezmunoz@gmail.com wrote:

...
only three of you have cast your acceptance/rejection vote

Reviews aren't votes!

Absolutely. But proper reviews must include the vote, just as you've done. Thank you, Joaquín M López Muñoz Review manager for candidate Boost.FixedString

participants (13)

Alex Hagen-Zanker
Alexander Grund
Andrey Semashev
Bjørn Roald
degski
Gavin Lambert
JeanHeyd Meneide
Joaquin M López Muñoz
Krystian Stasiowski
Peter Dimov
Peter Koch Larsen
Phil Endecott
Vinnie Falco