I noticed string_ref doesn't have a constructor for a string literal. Wouldn't this save a call to strlen for a common case?
Ex.
template< std::size_t N > basic_string_ref( const charT( &str )[N] ) : basic_string_ref( str, N-1 ) { static_assert(N >= 1, "not a string literal"); }
string_ref test( "test" );
I see that adding this directly doesn't work as the compiler is always choosing to decay and pick the const char* overload.
Is there no way to make this work?
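The overload choice described above can be reproduced in isolation. For a string literal argument, binding to the array reference and decaying to a pointer both rank as exact matches, and the tie is then broken in favour of the non-template function. The sketch below uses two hypothetical probe types (not part of string_ref) that only record which constructor was selected.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical probe type: records which constructor overload resolution picked.
struct probe {
    bool from_array;

    // Non-template overload, like the existing const charT* constructor.
    probe(const char*) : from_array(false) {}

    // Template overload, like the proposed array constructor.
    template <std::size_t N>
    probe(const char (&)[N]) : from_array(true) {}
};

// Same array constructor, but with no pointer overload to compete with.
struct array_only_probe {
    bool from_array;

    template <std::size_t N>
    array_only_probe(const char (&)[N]) : from_array(true) {}
};

inline void check_overloads() {
    assert(!probe("test").from_array);            // the pointer overload wins the tie
    assert(array_only_probe("test").from_array);  // the array overload is viable on its own
}
```

So the array constructor is not rejected; it simply loses to the non-template pointer constructor whenever both are present.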
AMDG On 11/01/2013 04:42 PM, Michael Marcin wrote:
I noticed string_ref doesn't have a constructor for a string literal. Wouldn't this save a call to strlen for a common case?
Ex.
template< std::size_t N > basic_string_ref( const charT( &str )[N] ) : basic_string_ref( str, N-1 ) { static_assert(N >= 1, "not a string literal"); }
string_ref test( "test" );
I see that adding this directly doesn't work as the compiler is always choosing to decay and pick the const char* overload.
Is there no way to make this work?
The behavior is not guaranteed to be the same, anyway. Not all char arrays are string literals.
In Christ, Steven Watanabe
On Friday 01 November 2013 18:42:05 Michael Marcin wrote:
I noticed string_ref doesn't have a constructor for a string literal. Wouldn't this save a call to strlen for a common case?
Ex.
template< std::size_t N > basic_string_ref( const charT( &str )[N] )
: basic_string_ref( str, N-1 )
{ static_assert(N >= 1, "not a string literal"); }
string_ref test( "test" );
I see that adding this directly doesn't work as the compiler is always choosing to decay and pick the const char* overload.
Is there no way to make this work?
See boost/log/utility/string_literal.hpp; the above constructor is not quite enough. It doesn't protect against construction from an arbitrary array of characters, so string_ref shouldn't have such a constructor; it should always call strlen.
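To illustrate the concern about arrays that are not literals, here is a minimal sketch. The `view` struct and the `from_pointer` / `from_array` helpers are hypothetical stand-ins for string_ref and its constructors, chosen so the two length rules can be compared side by side on a partially filled buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for string_ref: a pointer plus a length.
struct view {
    const char* ptr;
    std::size_t len;
};

// What the existing pointer constructor does: scan for the terminator.
inline view from_pointer(const char* s) {
    view v = {s, std::strlen(s)};
    return v;
}

// What the proposed array constructor would do: trust the array extent.
template <std::size_t N>
view from_array(const char (&s)[N]) {
    view v = {s, N - 1};
    return v;
}

inline void check_buffer() {
    char buf[16] = "hi";                 // a buffer, not a literal; the tail is zero-filled
    assert(from_pointer(buf).len == 2);  // length of the text actually stored
    assert(from_array(buf).len == 15);   // extent minus one, padding bytes included
}
```

The array parameter accepts the non-const buffer as well, so nothing in the signature restricts it to literals.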
On 2.11.2013. 0:42, Michael Marcin wrote:
I noticed string_ref doesn't have a constructor for a string literal. Wouldn't this save a call to strlen for a common case?
You first have to find a compiler that does not already eliminate the strlen ;)
On 2 November 2013 10:27, Domagoj Saric wrote:
On 2.11.2013. 0:42, Michael Marcin wrote:
I noticed string_ref doesn't have a constructor for a string literal. Wouldn't this save a call to strlen for a common case?
You first have to find a compiler that does not already eliminate the strlen ;)
Indeed, the strlen call on a string literal can (and should) be optimised away, so the current constructor should already produce the optimal result.
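Whether a particular strlen call is folded depends on the compiler and optimisation level, so it cannot be shown portably in source. What can be shown is that the length of a literal is a compile-time quantity. The sketch below uses a hypothetical C++11 `constexpr` helper, `literal_length`, which is not part of string_ref.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: a C++11 constexpr equivalent of strlen, so the result
// can be checked by the compiler itself.
constexpr std::size_t literal_length(const char* s) {
    return *s == '\0' ? 0 : 1 + literal_length(s + 1);
}

// Evaluated entirely at compile time; no run-time scan remains.
static_assert(literal_length("test") == 4, "length is known at compile time");

inline void check_lengths() {
    // The run-time call gives the same answer.
    assert(std::strlen("test") == literal_length("test"));
}
```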
On Nov 1, 2013, at 4:42 PM, Michael Marcin wrote:
I noticed string_ref doesn't have a constructor for a string literal. Wouldn't this save a call to strlen for a common case?
Ex.
template< std::size_t N > basic_string_ref( const charT( &str )[N] ) : basic_string_ref( str, N-1 ) { static_assert(N >= 1, "not a string literal"); }
string_ref test( "test" );
So, what should string_ref ( "test\0test" ) do?
{ ptr, 4 } --> current behavior
{ ptr, 9 } --> your suggested behavior
-- Marshall Clow
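The two results can be checked directly. In this sketch `length_by_scan` and `length_by_extent` are hypothetical names for the two length rules under discussion: scanning for the terminator versus taking the array extent minus one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Length as the existing pointer constructor would compute it.
inline std::size_t length_by_scan(const char* s) { return std::strlen(s); }

// Length as the proposed array constructor would compute it.
template <std::size_t N>
std::size_t length_by_extent(const char (&)[N]) { return N - 1; }

inline void check_embedded_null() {
    assert(length_by_scan("test\0test") == 4);    // stops at the embedded '\0'
    assert(length_by_extent("test\0test") == 9);  // ten chars in the array, minus one
    assert(length_by_scan("test") == length_by_extent("test"));
}
```

For literals without an embedded '\0' the two rules agree; they differ only in cases like the one above.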
On 11/2/2013 9:43 AM, Marshall Clow wrote:
On Nov 1, 2013, at 4:42 PM, Michael Marcin wrote:
I noticed string_ref doesn't have a constructor for a string literal. Wouldn't this save a call to strlen for a common case?
Ex.
template< std::size_t N > basic_string_ref( const charT( &str )[N] ) : basic_string_ref( str, N-1 ) { static_assert(N >= 1, "not a string literal"); }
string_ref test( "test" );
So, what should string_ref ( "test\0test" ) do?
{ ptr, 4 } --> current behavior { ptr, 9 } --> your suggested behavior
So, what should
const char s[] = {'0', '1', '2'}; string_ref test{s};
do?
Neither seems to be very important and can be handled by requiring sane preconditions.
2013/11/3 Michael Marcin
On 11/2/2013 9:43 AM, Marshall Clow wrote:
On Nov 1, 2013, at 4:42 PM, Michael Marcin wrote:
I noticed string_ref doesn't have a constructor for a string literal.
Wouldn't this save a call to strlen for a common case?
Ex.
template< std::size_t N > basic_string_ref( const charT( &str )[N] ) : basic_string_ref( str, N-1 ) { static_assert(N >= 1, "not a string literal"); }
string_ref test( "test" );
So, what should string_ref ( "test\0test" ) do?
{ ptr, 4 } --> current behavior { ptr, 9 } --> your suggested behavior
So, what should
const char s[] = {'0', '1', '2'}; string_ref test{s};
do?
Neither seems to be very important and can be handled by requiring sane preconditions.
Marshall, how about adding the following constructor:
template< std::size_t N > basic_string_ref( const charT( &str )[N] ) : basic_string_ref( str, std::min(N, strlen(str)) ) /* pseudo code, we'll need something like strlen_s */ {}
Such a constructor won't change the current behavior
string_ref ( "test\0test" ) // { ptr, 4 }
but will also work for non-zero terminated fixed length arrays:
const char s[] = {'0', '1', '2'}; string_ref test(s); // {ptr, 3}
-- Best regards, Antony Polukhin
On Nov 2, 2013, at 10:49 PM, Antony Polukhin wrote:
2013/11/3 Michael Marcin
On 11/2/2013 9:43 AM, Marshall Clow wrote:
On Nov 1, 2013, at 4:42 PM, Michael Marcin wrote:
I noticed string_ref doesn't have a constructor for a string literal.
Wouldn't this save a call to strlen for a common case?
Ex.
template< std::size_t N > basic_string_ref( const charT( &str )[N] ) : basic_string_ref( str, N-1 ) { static_assert(N >= 1, "not a string literal"); }
string_ref test( "test" );
So, what should string_ref ( "test\0test" ) do?
{ ptr, 4 } --> current behavior { ptr, 9 } --> your suggested behavior
So, what should
const char s[] = {'0', '1', '2'}; string_ref test{s};
do?
Neither seems to be very important and can be handled by requiring sane preconditions.
Marshall, how about adding the following constructor:
template< std::size_t N > basic_string_ref( const charT( &str )[N] ) : basic_string_ref( str, std::min(N, strlen(str)) ) /* pseudo code, we'll need something like strlen_s */ {}
Such constructor won't change the current behavior
string_ref ( "test\0test" ) // { ptr, 4 }
but will also work for non-zero terminated fixed length arrays:
const char s[] = {'0', '1', '2'}; string_ref test(s); // {ptr, 3}
No, actually, it won't, because the strlen will read past the end of the array, looking for the terminating NULL. (and that's undefined behavior)
Personally, I'm content with the current situation where
* If you have a NULL terminated string (by far the most common case), the { pointer } constructor works fine.
* If you have something else, you have to use the { pointer, size } constructor.
-- Marshall Clow
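The two-constructor arrangement described above can be sketched as follows. The `view` struct is a hypothetical stand-in for string_ref with only those two constructors; the point is that the unterminated array never reaches strlen because the caller states the length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for string_ref with the two constructors described above.
struct view {
    const char* ptr;
    std::size_t len;

    // Precondition: s points to a null-terminated string.
    view(const char* s) : ptr(s), len(std::strlen(s)) {}

    // No terminator required: the caller states the length.
    view(const char* s, std::size_t n) : ptr(s), len(n) {}
};

inline void check_explicit_size() {
    const char digits[] = {'0', '1', '2'};  // deliberately not terminated
    view v(digits, sizeof digits);          // sizeof counts bytes, which is fine for char
    assert(v.len == 3);
    assert(std::memcmp(v.ptr, "012", 3) == 0);

    view w("test");                         // terminated, so the scan is safe
    assert(w.len == 4);
}
```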
On 7/11/2013 05:19, Quoth Marshall Clow:
On Nov 2, 2013, at 10:49 PM, Antony Polukhin wrote:
template< std::size_t N > basic_string_ref( const charT( &str )[N] ) : basic_string_ref( str, std::min(N, strlen(str)) ) /* pseudo code, we'll need something like strlen_s */ {}
Such constructor won't change the current behavior
string_ref ( "test\0test" ) // { ptr, 4 }
but will also work for non-zero terminated fixed length arrays:
const char s[] = {'0', '1', '2'}; string_ref test(s); // {ptr, 3}
No, actually, it won't, because the strlen will read past the end of the array, looking for the terminating NULL. (and that's undefined behavior)
I was thinking about pointing that out earlier but I assumed that resolving that issue was what he meant by the "strlen_s" comment. Although given a strlen_s that eliminates that possibility, the call to std::min would be redundant, so either way it's a little odd. (Though most of the time you'd be able to get away with the above even as written, because the chances that you'd run off the top of the stack before encountering a null byte are almost nil. Not that it should be encouraged, of course.)
Personally, I'm content with the current situation where
* If you have a NULL terminated string (by far the most common case), the { pointer } constructor works fine.
* If you have something else, you have to use the { pointer, size } constructor.
While I mostly agree with this, it is probably just as common to have a char buffer on the stack that you *think* is null terminated, but might not be. (Particularly if the classic "strncpy" has been involved at some point.) That's where having this sort of constructor could be beneficial, to act as a backstop against such issues. Though personally I'm still inclined to leave it explicitly up to the application -- the app code needs to be much more aware of what it's doing if it's intentionally passing around (potentially) non-terminated strings; and if it's accidental then it's a bug that should be fixed immediately (eg. with a safe_strncpy) rather than being quietly resolved by a library.
2013/11/7 Gavin Lambert
On 7/11/2013 05:19, Quoth Marshall Clow:
On Nov 2, 2013, at 10:49 PM, Antony Polukhin wrote:
template< std::size_t N > basic_string_ref( const charT( &str )[N] ) : basic_string_ref( str, std::min(N, strlen(str)) ) /* pseudo code, we'll need something like strlen_s */ {}
Such constructor won't change the current behavior
string_ref ( "test\0test" ) // { ptr, 4 }
but will also work for non-zero terminated fixed length arrays:
const char s[] = {'0', '1', '2'}; string_ref test(s); // {ptr, 3}
No, actually, it won't, because the strlen will read past the end of the array, looking for the terminating NULL. (and that's undefined behavior)
I was thinking about pointing that out earlier but I assumed that resolving that issue was what he meant by the "strlen_s" comment.
<...> Yes, I meant strlen_s. This function is not in the Standard, so the pseudocode `std::min(N, strlen(str))` was provided to describe the idea.
While I mostly agree with this, it is probably just as common to have a char buffer on the stack that you *think* is null terminated, but might not be. (Particularly if the classic "strncpy" has been involved at some point.) That's where having this sort of constructor could be beneficial, to act as a backstop against such issues.
Though personally I'm still inclined to leave it explicitly up to the application -- the app code needs to be much more aware of what it's doing if it's intentionally passing around (potentially) non-terminated strings; and if it's accidental then it's a bug that should be fixed immediately (eg. with a safe_strncpy) rather than being quietly resolved by a library.
This is a good point. But from a usability standpoint, the user may assume that the library will determine the size in this situation:
const char s[] = {'0', '1', '2'}; /* User knows that this is not zero terminated */
string_ref test(s); // Size of the array is known at this point. Why not have {ptr, 3}?
Questions about non-zero-terminated strings arise quite often.
On the other hand, if we apply strlen_s, then users may be surprised by the following:
const char s[] = {'0', '1', '2', '\0', '\1'}; /* User knows that this has a fixed size */
string_ref test(s); // Size of the array is known at this point. Why *do we have* {ptr, 3}?
Looks like there is no solution that will satisfy all users.
-- Best regards, Antony Polukhin
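Both outcomes can be reproduced with a bounded scan. Since strlen_s is not in the Standard, this sketch uses `std::memchr` as a stand-in; `bounded_length` is a hypothetical helper name, and it never reads beyond the N elements of the array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical bounded scan: index of the first '\0' in the array, or N if none.
template <std::size_t N>
std::size_t bounded_length(const char (&s)[N]) {
    const void* hit = std::memchr(s, '\0', N);
    return hit ? static_cast<std::size_t>(static_cast<const char*>(hit) - s) : N;
}

inline void check_bounded() {
    const char no_terminator[] = {'0', '1', '2'};
    const char embedded[] = {'0', '1', '2', '\0', '\1'};

    assert(bounded_length(no_terminator) == 3);  // no '\0' found: whole array
    assert(bounded_length(embedded) == 3);       // stops at the embedded '\0'
    assert(bounded_length("test\0test") == 4);   // same as the pointer constructor
}
```

The first array yields 3 because the scan hits the bound, the second yields 3 because it hits a '\0' with two elements still to go, which is exactly the ambiguity raised above.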
On 7/11/2013 20:33, Quoth Antony Polukhin:
But from the view of usability user may assume that library will determine the size in this situation:
const char s[] = {'0', '1', '2'}; /* User knows that this is not zero terminated*/ string_ref test(s); // Size of array is determined at this point. Why not have {ptr, 3}?
Questions about nonzero terminated strings arise quite often.
On the other hand, if we apply the strlen_s then users may be surprised by the following:
const char s[] = {'0', '1', '2', '\0', '\1'}; /* User knows that this has fixed size*/ string_ref test(s); // Size of array is determined at this point. Why *we have* {ptr, 3}?
Looks like there is no solution that will satisfy all the users.
Which is why I think it ought to be explicit (always requiring the size to be provided -- the app can call a strlen_s equivalent itself if it wishes that behaviour).
The first example in particular actually seems potentially dangerous to me, because the ability to handle the non-terminated string is "hidden", making it more likely that the author would forget and pass s to something else that's expecting a terminated string.
Or perhaps they get into the habit of writing code like the first example, and then at some point someone moves s to be a parameter (or extracts the string_ref construction into a subfunction). This code style increases the risk that they won't notice the array decaying into a pointer and end up calling the non-array constructor, which is expecting a terminated string.
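The refactoring hazard can be shown without any string_ref at all. In this sketch `still_an_array` and `after_refactor` are hypothetical names; the second function keeps the array syntax in its parameter list, yet the parameter type is adjusted to a pointer, so an array constructor would silently stop being selected.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical probe: reports whether the argument still has array type.
template <class T>
bool still_an_array(const T&) {
    return std::is_array<T>::value;
}

// After "moving s to be a parameter": the array syntax is kept, but the
// parameter type is adjusted to const char*, so the extent is gone.
inline bool after_refactor(const char s[]) {
    return still_an_array(s);
}

inline void check_decay() {
    const char s[] = {'0', '1', '2'};
    assert(still_an_array(s));   // local array: the extent 3 is visible
    assert(!after_refactor(s));  // same spelling at the use site, now a pointer
}
```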
On 7 November 2013 16:00, Gavin Lambert wrote:
The first example in particular actually seems potentially dangerous to me, because the ability to handle the non-terminated string is "hidden", making it more likely that the author would forget and pass s to something else that's expecting a terminated string.
How does that differ from anything else that uses a C string? I just don't see how string_ref is any worse.
-- Nevin ":-)" Liber
On 8/11/2013 11:27, Quoth Nevin Liber:
On 7 November 2013 16:00, Gavin Lambert wrote:
The first example in particular actually seems potentially dangerous to me, because the ability to handle the non-terminated string is "hidden", making it more likely that the author would forget and pass s to something else that's expecting a terminated string.
How does that differ from anything else that uses a C string?
I just don't see how string_ref is any worse.
The current version of string_ref is not any worse. What we're discussing here is a possible extra constructor, which I am saying is potentially useful but possibly too dangerous to be worthwhile.
On Nov 7, 2013, at 5:00 PM, Gavin Lambert wrote:
On 7/11/2013 20:33, Quoth Antony Polukhin:
But from the view of usability user may assume that library will determine the size in this situation:
const char s[] = {'0', '1', '2'}; /* User knows that this is not zero terminated*/ string_ref test(s); // Size of array is determined at this point. Why not have {ptr, 3}?
Questions about nonzero terminated strings arise quite often.
On the other hand, if we apply the strlen_s then users may be surprised by the following:
const char s[] = {'0', '1', '2', '\0', '\1'}; /* User knows that this has fixed size*/ string_ref test(s); // Size of array is determined at this point. Why *we have* {ptr, 3}?
Looks like there is no solution that will satisfy all the users.
Which is why I think it ought to be explicit (always requiring the size to be provided -- the app can call a strlen_s equivalent itself if it wishes that behaviour).
The first example in particular actually seems potentially dangerous to me, because the ability to handle the non-terminated string is "hidden", making it more likely that the author would forget and pass s to something else that's expecting a terminated string.
If code assumes a string_ref refers to a null-terminated string, that code is wrong.
Or perhaps they get into the habit of writing code like the first example, and then at some point someone moves s to be a parameter (or extracts the string_ref construction into a subfunction). This code style increases the risk that they won't notice the array decaying into a pointer and end up calling the non-array constructor, which is expecting a terminated string.
Why not add named constructors to clarify the caller's intent?
___ Rob
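One possible shape for such named constructors is sketched below. The class `view` and the names `from_c_str`, `from_literal` and `from_array` are all hypothetical; none of them exist in Boost. Each factory states which length rule applies, so the choice is visible at the call site and does not depend on overload resolution.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for string_ref; none of these names exist in Boost.
class view {
public:
    // The argument is a null-terminated string: scan for the terminator.
    static view from_c_str(const char* s) { return view(s, std::strlen(s)); }

    // The argument is a literal-like array: drop the trailing terminator.
    template <std::size_t N>
    static view from_literal(const char (&s)[N]) { return view(s, N - 1); }

    // The argument is a raw array: every element is content.
    template <std::size_t N>
    static view from_array(const char (&s)[N]) { return view(s, N); }

    const char* data() const { return ptr_; }
    std::size_t size() const { return len_; }

private:
    view(const char* s, std::size_t n) : ptr_(s), len_(n) {}

    const char* ptr_;
    std::size_t len_;
};

inline void check_named() {
    const char raw[] = {'0', '1', '2'};
    assert(view::from_array(raw).size() == 3);
    assert(view::from_literal("test\0test").size() == 9);
    assert(view::from_c_str("test\0test").size() == 4);
}
```

If `raw` is later turned into a pointer parameter, `from_array(raw)` stops compiling instead of quietly switching to the terminator-scanning rule.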
On 8/11/2013 12:40, Quoth Rob Stewart:
On Nov 7, 2013, at 5:00 PM, Gavin Lambert wrote:
const char s[] = {'0', '1', '2'}; /* User knows that this is not zero terminated*/ string_ref test(s); // Size of array is determined at this point. Why not have {ptr, 3}?
[...]
The first example in particular actually seems potentially dangerous to me, because the ability to handle the non-terminated string is "hidden", making it more likely that the author would forget and pass s to something else that's expecting a terminated string.
If code assumes a string_ref refers to a null-terminated string, that code is wrong.
I was referring to "s", the original char array, not to "test", the string_ref. ie. if string_ref handled it gracefully and "hidden", the user might forget that other things they can legally (according to the compiler) pass "s" to might not. But that's reaching a bit. My main objection relates to the surprise decay from array to pointer if the code is refactored, and that this would change the behaviour in a potentially surprising way if that constructor existed, without a sufficiently obvious change to the code.
Why not add named constructors to clarify the caller's intent?
That would be better. Still, where an lvalue is available, I don't see any particular benefit to this -- almost anything you could do with this suggested new constructor you could do with string_ref(s, boost::size(s)). (Or _countof, or ARRAY_LENGTH, or sizeof, depending on your char type and platform of choice.) The only case I can see that it might provide some benefit (other than being slightly shorter) would be when passing a string literal directly -- but string literals are always null-terminated, and the compiler should be able to optimise away the strlen, so I don't think it gains much in the end. I suppose it's possible to imagine other cases where you might have an rvalue array, but most that I can think of seem overly contrived to me, especially as the utility of string_ref on an rvalue is limited since it's a non-owning object.
participants (10)
- Andrey Semashev
- Antony Polukhin
- Domagoj Saric
- Gavin Lambert
- Jonathan Wakely
- Marshall Clow
- Michael Marcin
- Nevin Liber
- Rob Stewart
- Steven Watanabe