Hi y'all,
I've hacked together a cstring_view class and would like to ask if it could be included in boost.utility or boost.core.
https://gist.github.com/klemens-morgenstern/fc02dab8b37fc77ac80efbca30f297cf
The main motivation is to have a view that works with C APIs. I also need the char_traits for dealing with odd character semantics in some system APIs (e.g. environment keys on Windows are case-insensitive but case-preserving).
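For illustration, a case-insensitive traits class of the kind mentioned here could look roughly like the following. This is only a sketch, not the code from the gist: ci_char_traits and ci_string are hypothetical names, and folding with std::toupper is an assumption that merely approximates how Windows compares environment keys.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical traits: compare characters case-insensitively, store them unchanged.
struct ci_char_traits : std::char_traits<char> {
    // Assumption: ASCII-style folding via std::toupper is good enough for the sketch.
    static char fold(char c) {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    static bool eq(char a, char b) { return fold(a) == fold(b); }
    static bool lt(char a, char b) { return fold(a) < fold(b); }
    static int compare(const char* a, const char* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }
    static const char* find(const char* s, std::size_t n, char c) {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], c)) return s + i;
        return nullptr;
    }
};

// Hypothetical alias: a string that is case-insensitive but case-preserving.
using ci_string = std::basic_string<char, ci_char_traits>;
```

With these traits, ci_string("Path") compares equal to ci_string("PATH"), while the stored characters keep their original case.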
I don't see how this is supposed to maintain the invariant (null termination) without its own storage.
Consider:
    cstring_view sv; // is sv.data() pointing to a null-terminated string? If so, where did it come from?
    cstring_view sv1("abcdefg");
    sv1.substr(3, 2); // what's the value of strlen(sv.data()) here? How can it be 2?
-- Marshall
On Tue, 2022-05-10 at 21:15 -0700, Marshall Clow via Boost wrote:
> I don't see how this is supposed to maintain the invariant (null termination) without its own storage.
There's a static "" for default initialization:
    BOOST_CONSTEXPR static const_pointer null_char_() { return null_char_(CharT{}); }
    BOOST_CONSTEXPR static const char* null_char_(char) { return ""; }
    BOOST_CONSTEXPR static const wchar_t* null_char_(wchar_t) { return L""; }
> Consider:
>     cstring_view sv; // is sv.data() pointing to a null-terminated string? If so, where did it come from?
>     cstring_view sv1("abcdefg");
>     sv1.substr(3, 2); // what's the value of strlen(sv.data()) here? How can it be 2?
The substr returns a string_view if you pass it a second parameter:
    BOOST_CONSTEXPR basic_cstring_view substr(size_type pos = 0) const;
    BOOST_CONSTEXPR string_view_type substr(size_type pos, size_type n) const;
I can only move the starting character with substr and maintain the null terminator, so if I change the end, I'll get a regular string_view.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
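The two answers in this reply (a static "" for the default state, and two substr overloads with different return types) can be sketched as follows. This is a simplified illustration assuming C++17, not the code from the gist: cstring_view_sketch is a hypothetical char-only class, and clamping an out-of-range pos (as well as mapping a null pointer to "") is an assumption made to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical, simplified stand-in for the proposed class (char only).
class cstring_view_sketch {
public:
    // The default state views a static empty literal, so c_str() is always terminated.
    cstring_view_sketch() noexcept : ptr_("") {}
    // Assumption: a null pointer is treated like an empty string.
    cstring_view_sketch(const char* s) noexcept : ptr_(s ? s : "") {}
    cstring_view_sketch(const std::string& s) noexcept : ptr_(s.c_str()) {}

    const char* c_str() const noexcept { return ptr_; }
    std::size_t size() const noexcept { return std::strlen(ptr_); }

    // Moving only the start keeps the terminator, so the result is still a C string.
    cstring_view_sketch substr(std::size_t pos) const noexcept {
        const std::size_t n = size();
        return cstring_view_sketch(ptr_ + (pos < n ? pos : n));
    }

    // Limiting the length may end before the terminator, so a plain string_view is returned.
    std::string_view substr(std::size_t pos, std::size_t n) const noexcept {
        const std::size_t len = size();
        return std::string_view(ptr_, len).substr(pos < len ? pos : len, n);
    }

private:
    const char* ptr_;
};
```

For a view over "abcdefg", substr(3) still yields a null-terminated "defg", while substr(3, 2) yields a two-character string_view that makes no promise about what follows it.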
> I don't see how this is supposed to maintain the invariant (null termination) without its own storage.
I had similar concerns, but then I looked at the code, and ... you know, it answers the question.
> Consider:
>     cstring_view sv; // is sv.data() pointing to a null-terminated string? If so, where did it come from?
The appropriate static character literal.
>     cstring_view sv1("abcdefg");
>     sv1.substr(3, 2); // what's the value of strlen(sv.data()) here? How can it be 2?
Length-constrained substring returns string_view.
Seth
I guess when I read the impl I think to myself: what this is missing is a _length member. But then it just becomes boost::string_view. What value does this impl add, other than being smaller than boost::string_view because it lacks a _length member?
bien
David Bien wrote:
> I guess when I read the impl I think to myself: What this is missing is a _length member. But then it just becomes boost::string_view.
> What value added is there to this impl except that it is smaller than boost::string_view due to lacking a _length member?
Please see https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1402r0.pdf No idea why this particular implementation doesn't store the size. I would.
On Fri, 2022-05-13 at 21:09 +0300, Peter Dimov via Boost wrote:
> Please see https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1402r0.pdf
> No idea why this particular implementation doesn't store the size. I would.
I want to use it as a small wrapper around system APIs that can be implicitly constructed from any matching string type, e.g. boost::static_string as well as const char*. Calling `strlen` would just be unnecessary work.
    void set_env(cstring_view name, cstring_view value, error_code & ec)
    {
        auto e = ::setenv(name.c_str(), value.c_str(), 1); // 1: overwrite an existing value
        if (e != 0) // setenv returns 0 on success
            ec = some_error;
    }
I also stripped out most of the functions to just have a bare-bones view of a cstring in the PR I submitted: https://github.com/boostorg/utility/pull/100
The basic idea: provide whatever you can do on a null-terminated string; for anything more, just use a string_view. I don't think there's much utility to the cstring_view outside of interaction with C APIs.
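A portable sketch of the same pattern, assuming nothing beyond the standard library: a pointer-only reference that converts implicitly from const char* and std::string and is handed straight to a C function, with no strlen call on the way. The names cstring_ref_sketch and parse_long are hypothetical, and std::strtol merely stands in for a system API such as setenv.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical pointer-only reference to a null-terminated string.
class cstring_ref_sketch {
public:
    // Precondition (not checked): s points to a null-terminated sequence.
    cstring_ref_sketch(const char* s) noexcept : ptr_(s) {}
    cstring_ref_sketch(const std::string& s) noexcept : ptr_(s.c_str()) {}
    const char* c_str() const noexcept { return ptr_; }

private:
    const char* ptr_;
};

// Hypothetical wrapper around a C API that expects a null-terminated string.
// Returns true and stores the value if the whole text parses as a base-10 integer.
inline bool parse_long(cstring_ref_sketch text, long& out) {
    char* end = nullptr;
    const long value = std::strtol(text.c_str(), &end, 10);
    if (end == text.c_str() || *end != '\0')
        return false;  // nothing parsed, or trailing characters left over
    out = value;
    return true;
}
```

Callers can pass a string literal or a std::string without spelling out c_str(), and the wrapper never needs to know the length.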
On May 13, 2022, at 10:42 AM, David Bien via Boost wrote:
> I guess when I read the impl I think to myself: What this is missing is a _length member. But then it just becomes boost::string_view.
> What value added is there to this impl except that it is smaller than boost::string_view due to lacking a _length member?
If you're careful and don't do much with it, it can hand you back a null-terminated string.
-- Marshall
PS. I note that P1402 (the paper proposing cstring_view for the standard) was reviewed by LEWG in 2019, and the resolution of that group was "We will not pursue P1402R0 or this problem space".
Marshall Clow wrote:
> If you're careful and don't do much with it, it can hand you back a null-terminated string.
??? In what scenarios will it not give you a null-terminated string?
Exactly. But then it should just be called cstring_wrapper. Calling it cstring_view seems to impart qualities to it that it doesn't have; for instance, I would assume that I could get a cstring_view that is a subview of an existing cstring_view.
On Fri, 2022-05-13 at 18:20 +0000, David Bien via Boost wrote:
> Exactly. But then it should just be called cstring_wrapper. Calling it cstring_view seems to impart qualities to it that it doesn't have; for instance, I would assume that I could get a cstring_view that is a subview of an existing cstring_view.
That's a fair point. I was thinking about naming it cstring_ref originally; the paper referenced by Peter made me reconsider. With cstring_ref, someone else could write a cstring_view that also contains the size.
Do note, however, that you can get that subview if the result is still a cstring. Thus only moving the start of the cstring works.
On May 13, 2022, at 11:18 AM, Peter Dimov via Boost wrote:
> ???
> In what scenarios will it not give you a null-terminated string?
    char arr[6] = "hello";
    cstring_view csv(arr);
    assert(strlen(csv.data()) == 5);
    arr[5] = '!';
    assert(strlen(csv.data()) == 5); // boom
-- Marshall
PS. It promises to give you a null-terminated string, but has no way to actually guarantee that.
Marshall Clow wrote:
>     char arr[6] = "hello";
>     cstring_view csv(arr);
>     assert(strlen(csv.data()) == 5);
>     arr[5] = '!';
>     assert(strlen(csv.data()) == 5); // boom
The main use of cstring_view, like string_view, is as a parameter (and return) type. So if you have a function
    void f1( cstring_view csv );
it's true that if inside f1 you write to some random character this may invalidate csv's promise to be null-terminated, but I see little salient difference between this and
    void f2( char const* csv ); // pre: csv is null-terminated char seq
where f2 writing to a carefully chosen char may also invalidate the precondition. Typing "cstring_view" is merely a different way of spelling out the "pre" of f2.
Similarly,
    cstring_view g1();
is an alternative way of spelling
    char const* g2(); // post: the return value is a null-terminated char seq
On 13.05.22 20:39, Marshall Clow via Boost wrote:
> PS. It promises to give you a null-terminated string, but has no way to actually guarantee that.
That's an issue with views in general, not just cstring_view.
    std::string s = "hello";
    string_view sv = s;
    assert(sv.size() == 5);
    s += "!";
    assert(sv.size() == 5); // boom
It is the responsibility of the creator of a view to ensure that the object being viewed does not change in a way that breaks the invariants of the view while the view is in use.
-- Rainer Deyke (rainerd@eldwood.com)
On May 13, 2022, at 12:29 PM, Rainer Deyke via Boost wrote:
> That's an issue with views in general, not just cstring_view.
>     std::string s = "hello";
>     string_view sv = s;
>     assert(sv.size() == 5);
>     s += "!";
>     assert(sv.size() == 5); // boom
I don't see the problem here (and when I run the code I get no error, after adding the missing 'std::'). No assertion failure; no undefined behavior (unlike the cstring_view example).
-- Marshall
Marshall Clow wrote:
> I don't see the problem here (and when I run the code I get no error, after adding the missing 'std::').
> No assertion failure; no undefined behavior (unlike the cstring_view example)
Only because "hello!" fits into the small buffer, I suspect. If `s` reallocates, `sv` would be left dangling.
On May 13, 2022, at 3:19 PM, Peter Dimov via Boost wrote:
> Only because "hello!" fits into the small buffer, I suspect. If `s` reallocates, `sv` would be left dangling.
Agreed. But even if the string *did* reallocate, the call "assert(sv.size() == 5)" is still valid and well defined. In the cstring_view example I wrote, there are no allocations (it's a static buffer), and the call exhibits undefined behavior (as well as the assertion failure).
The whole point of cstring_view is "I have a sequence of N characters here, and I *swear* that the N+1st one is a NUL".
-- Marshall
P.S. std::string has the same behavior (which I really dislike), but at least it owns the storage, so it can enforce the presence of the NUL.
On 14.05.22 00:42, Marshall Clow via Boost wrote:
> Agreed. But even if the string *did* reallocate, the call "assert(sv.size() == 5)" is still valid and well defined.
No it's not. sv.size() works by subtracting pointers, and it's only legal to subtract two pointers if they point into the same memory region. Which sv.begin() and sv.end() no longer do if s reallocates. It's subtle, but it's definitely undefined behavior. -- Rainer Deyke (rainerd@eldwood.com)
On 14.05.2022 at 09:45, Rainer Deyke via Boost wrote:
> No it's not. sv.size() works by subtracting pointers, and it's only legal to subtract two pointers if they point into the same memory region. Which sv.begin() and sv.end() no longer do if s reallocates. It's subtle, but it's definitely undefined behavior.
Not really. The standard (in its current draft) is silent about the invalidation of size() and talks only about iterators, references and pointers with respect to the viewed object [string.view.template.general]/2. On top of that, afaik all major implementations have agreed on and settled on the same structure layout as shown in the standard as exposition only. So technically, this is unspecified. Ciao Dani -- PGP/GPG: 2CCB 3ECB 0954 5CD3 B0DB 6AA0 BA03 56A1 2C4638C5
On 14.05.22 10:31, Daniela Engert via Boost wrote:
> Not really. The standard (in its current draft) is silent about the invalidation of size() and talks only about iterators, references and pointers with respect to the viewed object [string.view.template.general]/2. On top of that, afaik all major implementations have agreed on and settled on the same structure layout as shown in the standard as exposition only. So technically, this is unspecified.
I did not know that. Still, almost every other use of an invalidated string_view results in undefined behavior.
    assert(sv[0] == 'h'); // boom
    assert(sv == "hello"); // boom
-- Rainer Deyke (rainerd@eldwood.com)
On May 14, 2022, at 12:45 AM, Rainer Deyke via Boost wrote:
> No it's not. sv.size() works by subtracting pointers, and it's only legal to subtract two pointers if they point into the same memory region. Which sv.begin() and sv.end() no longer do if s reallocates. It's subtle, but it's definitely undefined behavior.
Um, no. string_view::size() *could* work by subtracting two pointers, but it's certainly not *required* to do so.
From the implementation of boost::string_view:
    BOOST_CONSTEXPR size_type size() const BOOST_NOEXCEPT { return len_; }
From the implementation of libc++'s string_view:
    _LIBCPP_CONSTEXPR _LIBCPP_INLINE_VISIBILITY size_type size() const _NOEXCEPT { return __size; }
From the implementation of libstdc++'s string_view:
    constexpr size_type size() const noexcept { return this->_M_len; }
-- Marshall
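The pointer-plus-length layout quoted above can be illustrated with a small stand-in. This is a sketch under stated assumptions, not any library's actual code: ptr_len_view and size_after_growth are hypothetical names. It shows only that size() reads a stored member and never touches the viewed characters; it says nothing about what the standard guarantees for std::string_view.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical view that stores a pointer and a length.
struct ptr_len_view {
    const char* ptr;
    std::size_t len;

    explicit ptr_len_view(const std::string& s) noexcept
        : ptr(s.data()), len(s.size()) {}

    // Reads only the stored member; the viewed memory is not accessed.
    std::size_t size() const noexcept { return len; }
};

// Builds a view, then grows the source string well past any small buffer.
inline std::size_t size_after_growth() {
    std::string s = "hello";
    ptr_len_view v(s);
    s.append(1000, '!');  // very likely reallocates; v.ptr must not be dereferenced now
    return v.size();      // still the length captured at construction
}
```

Any access through ptr after the growth, such as indexing or comparing contents, would still be invalid; only the stored length remains readable.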
participants (7)
- Daniela Engert
- David Bien
- Klemens Morgenstern
- Marshall Clow
- Peter Dimov
- Rainer Deyke
- Seth