Constructing string_ref from rvalue string

I've just tested our codebase with the 1.64.0 master snapshot, and i've encountered an issue which i've tracked down to this commit: https://github.com/boostorg/utility/commit/9960d9f395b79ee860e39064cd4696 1f76d2cb55 Basically this removes the constructor which allows constructing a boost::string_ref from an rvalue std::string. I am wondering if this is really that dangerous. In our codebase we almost exlusively use boost::string_ref as function parameters, with the understanding that passed string_refs parameters are only valid within the function body. (so you need to copy it into an std::string if you need to access it afterwards) With these constraints i think it is OK to construct a string_ref from an rvalue std::string because the temporary should live long enough. (but please correct me if i'm wrong here) Is the safety (?) added by this commit worth it if it breaks working code? Simon

On 17/03/2017 20:29, Simon Sasburg via Boost wrote:
I've just tested our codebase with the 1.64.0 master snapshot, and i've encountered an issue which i've tracked down to this commit:
https://github.com/boostorg/utility/commit/9960d9f395b79ee860e39064cd4696 1f76d2cb55
Basically this removes the constructor which allows constructing a boost::string_ref from an rvalue std::string. I am wondering if this is really that dangerous. In our codebase we almost exlusively use boost::string_ref as function parameters, with the understanding that passed string_refs parameters are only valid within the function body. (so you need to copy it into an std::string if you need to access it afterwards) With these constraints i think it is OK to construct a string_ref from an rvalue std::string because the temporary should live long enough. (but please correct me if i'm wrong here)
If you are calling a function that returns a std::string (thus you have an rvalue temporary) and you are immediately wrapping this in a string_ref (as another temporary) and then passing this as a function parameter, AFAIK the compiler is within its rights to delete the temporary std::string prior to the call as it is not the actual parameter. (To put it more generally: only the temporaries that are the actual parameters are "safe" -- any temporaries used to construct those temporaries might be deleted prior to the call.) Although you might get away with it sometimes (perhaps even most of the time) as it probably wouldn't do so without aggressive optimisation -- but you shouldn't rely on this; it's unsafe usage and subject to the whims of the compiler. The safe version is to store the returned std::string as an lvalue (eg. a local variable) prior to the function call that constructs the temporary rvalue string_refs.
Is the safety (?) added by this commit worth it if it breaks working code?
Can you post an example of such working code that's broken by this change? (Assuming that it's not covered by the above case.)

On Sun, Mar 19, 2017 at 10:42 PM, Gavin Lambert via Boost
If you are calling a function that returns a std::string (thus you have an rvalue temporary) and you are immediately wrapping this in a string_ref (as another temporary) and then passing this as a function parameter, AFAIK the compiler is within its rights to delete the temporary std::string prior to the call as it is not the actual parameter.
Nope. Those temporaries live until the end of the enclosing full-expression.

On 20 Mar 2017, at 03:14, Tim Song via Boost
wrote: On Sun, Mar 19, 2017 at 10:42 PM, Gavin Lambert via Boost
wrote: If you are calling a function that returns a std::string (thus you have an rvalue temporary) and you are immediately wrapping this in a string_ref (as another temporary) and then passing this as a function parameter, AFAIK the compiler is within its rights to delete the temporary std::string prior to the call as it is not the actual parameter.
Nope. Those temporaries live until the end of the enclosing full-expression.
I thought it worth bringing to your attention that clang-tidy has a check to find misuse of string_view construction from temporaries: http://clang.llvm.org/extra/clang-tidy/checks/misc-dangling-handle.html Regards Jonathan
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On 20/03/2017 16:14, Tim Song wrote:
On Sun, Mar 19, 2017 at 10:42 PM, Gavin Lambert wrote:
If you are calling a function that returns a std::string (thus you have an rvalue temporary) and you are immediately wrapping this in a string_ref (as another temporary) and then passing this as a function parameter, AFAIK the compiler is within its rights to delete the temporary std::string prior to the call as it is not the actual parameter.
Nope. Those temporaries live until the end of the enclosing full-expression.
Well, yes, in theory. Though I've been burned in the past by some (older) compilers doing it wrong in these sorts of cases, so that usage makes me nervous. Any modern compiler shouldn't have an issue with it though.

On Mon, Mar 20, 2017 at 11:57 PM, Gavin Lambert via Boost
On 20/03/2017 16:14, Tim Song wrote:
On Sun, Mar 19, 2017 at 10:42 PM, Gavin Lambert wrote:
If you are calling a function that returns a std::string (thus you have an rvalue temporary) and you are immediately wrapping this in a string_ref (as another temporary) and then passing this as a function parameter, AFAIK the compiler is within its rights to delete the temporary std::string prior to the call as it is not the actual parameter.
Nope. Those temporaries live until the end of the enclosing full-expression.
Well, yes, in theory. Though I've been burned in the past by some (older) compilers doing it wrong in these sorts of cases, so that usage makes me nervous. Any modern compiler shouldn't have an issue with it though.
void f2(const string&); string f1(); f1(f2()); Isn't this a very common pattern? -- Olaf

On 23/03/2017 03:28, Olaf van der Spek via Boost wrote:
void f2(const string&); string f1();
f1(f2());
Isn't this a very common pattern?
Yes (although you have it reversed), but that's not what we were talking about. We were talking about this: void f1(const string_ref&); string f2(); f1(f2()); Or this: void f1(const string_ref&); string_ref f2(const string_ref&); string f3(); f1(f2(f3())); They both *should* be safe because the temporary string should not be destroyed until after f1 is called. But some old/embedded compilers would sometimes get this wrong when optimising. You can also get into trouble even on modern compilers if binding is involved in certain patterns, since then the full-expression might end before the call actually happens. I did make an error in my initial response; I've just been burned by buggy implementations in the past and it makes me nervous, so I jumped the gun. I apologise for any confusion.

On 03/23/17 01:38, Gavin Lambert via Boost wrote:
On 23/03/2017 03:28, Olaf van der Spek via Boost wrote:
void f2(const string&); string f1();
f1(f2());
Isn't this a very common pattern?
Yes (although you have it reversed), but that's not what we were talking about. We were talking about this:
void f1(const string_ref&); string f2();
f1(f2());
Or this:
void f1(const string_ref&); string_ref f2(const string_ref&); string f3();
f1(f2(f3()));
They both *should* be safe because the temporary string should not be destroyed until after f1 is called. But some old/embedded compilers would sometimes get this wrong when optimising.
IMHO, compiler bugs should not be the reason for interface design choices. And BTW, I know Sun Pro compiler (Oracle Studio nowdays) had a mode with the opposite behavior - it would not destroy temporaries until the end of the scope.
You can also get into trouble even on modern compilers if binding is involved in certain patterns, since then the full-expression might end before the call actually happens.
I did make an error in my initial response; I've just been burned by buggy implementations in the past and it makes me nervous, so I jumped the gun. I apologise for any confusion.
Whatever we decide to do about this issue, I think we should make the choice before 1.64 is released. I think, this is a new change for 1.64, and it's not too late to roll it back if we choose so (IMHO, we should).

I've just blogged about this topic: http://foonathan.net/blog/2017/03/22/string_view-temporary.html (Planned that post before this thread) In my opinion I think `boost::string_ref` and `std::string_view` are two different things, `boost::string_ref` is for use cases where `std::string_view` is potentially more dangerous (due to rvalue strings) like in return values, and `std::string_view` is for arguments as `boost::string_ref` isn't great there (due to not accepting rvalue strings). Blog post goes into a little bit more detail. I'm just throwing my opinion out there. Jonathan

On Wed, Mar 22, 2017 at 11:38 PM, Gavin Lambert via Boost
On 23/03/2017 03:28, Olaf van der Spek via Boost wrote:
void f2(const string&); string f1();
f1(f2());
Isn't this a very common pattern?
Yes (although you have it reversed), but that's not what we were talking about.
It's the same underlying issue, replacing const string& by string_view doesn't change it.
We were talking about this:
void f1(const string_ref&); string f2();
f1(f2());
Or this:
void f1(const string_ref&);
AFAIK string_ref should be passed by value, not by const&. -- Olaf

On 23/03/2017 20:57, Olaf van der Spek via Boost wrote:
It's the same underlying issue, replacing const string& by string_view doesn't change it.
It does, because it constructs an additional temporary. The compiler then has an opportunity to destruct the first temporary at the wrong time. (Which it should not use, if it follows the standard, but that was the whole point.) In any case, I think we've bothered the deceased equine enough by now.
We were talking about this:
void f1(const string_ref&); string f2();
f1(f2());
Or this:
void f1(const string_ref&);
AFAIK string_ref should be passed by value, not by const&.
It's larger than a pointer. Ergo, pass by reference. Granted it doesn't make a lot of difference as it's trivial to copy so that performance isn't much of an issue, but why waste stack/register space unnecessarily?

On 2017-03-23 22:46, Gavin Lambert via Boost wrote:
On 23/03/2017 20:57, Olaf van der Spek via Boost wrote:
It's the same underlying issue, replacing const string& by string_view doesn't change it.
It does, because it constructs an additional temporary. The compiler then has an opportunity to destruct the first temporary at the wrong time. (Which it should not use, if it follows the standard, but that was the whole point.)
In any case, I think we've bothered the deceased equine enough by now.
We were talking about this:
void f1(const string_ref&); string f2();
f1(f2());
Or this:
void f1(const string_ref&);
AFAIK string_ref should be passed by value, not by const&.
It's larger than a pointer. Ergo, pass by reference.
Granted it doesn't make a lot of difference as it's trivial to copy so that performance isn't much of an issue, but why waste stack/register space unnecessarily?
How about when *using* the parameter. Effectively passing a reference to a pointer may not be the best way to get high performance. Bo Persson

On 24/03/2017 12:42, Bo Persson wrote:
On 2017-03-23 22:46, Gavin Lambert wrote:
On 23/03/2017 20:57, Olaf van der Spek wrote:
AFAIK string_ref should be passed by value, not by const&.
It's larger than a pointer. Ergo, pass by reference.
Granted it doesn't make a lot of difference as it's trivial to copy so that performance isn't much of an issue, but why waste stack/register space unnecessarily?
How about when *using* the parameter. Effectively passing a reference to a pointer may not be the best way to get high performance.
It makes no real difference. Any interaction with string_ref will be by calling member methods, which in turn require passing a pointer to the "this" instance (either on the stack or in register, depending on calling convention). Either way, it's the same thing -- it either passes the reference it already has as a pointer or it passes the address of the local copy. (Or more likely they'll be inlined.) It might possibly inject one extra stack indirection if the reference is still in the stack rather than in a register, but that'll be in L1 cache (at least near function entry) and be pretty darn fast anyway. If in doubt, profile it later. Performance is a tricksy beast, especially once you go multi-core. It's not that big a deal, though, in this case. But const& is still the safest default parameter style, until you can prove that copies are cheaper than a stack indirection (or that moves are, and you can always move). Don't even get me started on the people who think nothing of passing shared_ptrs around by value.

This is a misfeature and is inconsistent with std::string_view.
I just made a ticket for it here
https://svn.boost.org/trac/boost/ticket/12917
On 17 March 2017 at 07:29, Simon Sasburg via Boost
I've just tested our codebase with the 1.64.0 master snapshot, and i've encountered an issue which i've tracked down to this commit:
https://github.com/boostorg/utility/commit/9960d9f395b79ee860e39064cd4696 1f76d2cb55
Basically this removes the constructor which allows constructing a boost::string_ref from an rvalue std::string. I am wondering if this is really that dangerous. In our codebase we almost exlusively use boost::string_ref as function parameters, with the understanding that passed string_refs parameters are only valid within the function body. (so you need to copy it into an std::string if you need to access it afterwards) With these constraints i think it is OK to construct a string_ref from an rvalue std::string because the temporary should live long enough. (but please correct me if i'm wrong here)
Is the safety (?) added by this commit worth it if it breaks working code?
Simon
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/ mailman/listinfo.cgi/boost

On Fri, Mar 17, 2017 at 12:29 AM, Simon Sasburg via Boost < boost@lists.boost.org> wrote:
I've just tested our codebase with the 1.64.0 master snapshot, and i've encountered an issue which i've tracked down to this commit:
https://github.com/boostorg/utility/commit/9960d9f395b79ee860e39064cd4696 1f76d2cb55
Basically this removes the constructor which allows constructing a boost::string_ref from an rvalue std::string. I am wondering if this is really that dangerous. In our codebase we almost exlusively use boost::string_ref as function parameters, with the understanding that passed string_refs parameters are only valid within the function body. (so you need to copy it into an std::string if you need to access it afterwards) With these constraints i think it is OK to construct a string_ref from an rvalue std::string because the temporary should live long enough. (but please correct me if i'm wrong here)
Is the safety (?) added by this commit worth it if it breaks working code?
I'll also point out that that commit was in response to this bug report: https://svn.boost.org/trac/boost/ticket/11740 -- Marshall

I'll also point out that that commit was in response to this bug report: https://svn.boost.org/trac/boost/ticket/11740
I see. Using boost::string_ref in that way (local variable) would raise a big red flag for us. So i understand that the change does add some safety, but it also breaks usages that are completely fine. We 'solved' the safety issue with a really simple rule (only use boost::string_ref as function parameters).

On Mar 24, 2017 08:23, "Simon Sasburg via Boost"
I'll also point out that that commit was in response to this bug report: https://svn.boost.org/trac/boost/ticket/11740
I see. Using boost::string_ref in that way (local variable) would raise a big red flag for us. So i understand that the change does add some safety, but it also breaks usages that are completely fine. We 'solved' the safety issue with a really simple rule (only use boost::string_ref as function parameters). _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/ mailman/listinfo.cgi/boost That's exactly what I said in my blog post. Function parameters are a special use case that require a different type, that type is std::string_view, for everything else a boost::string_ref that doesn't bind to rvalues could be repurposed.

On 24 March 2017 at 07:57, Jonathan Müller via Boost
That's exactly what I said in my blog post. Function parameters are a special use case that require a different type, that type is std::string_view, for everything else a boost::string_ref that doesn't bind to rvalues could be repurposed.
That sounds like a pernickety solution to me. I'd find it easier to remember the risks than have to remember which type to use at any particular moment. Regardless, my main concern would be that string_view isn't compatible with the standard. Generally when I see a Boost version of a standard library object I expect it be reasonably close to a drop in replacement.

On Fri, Mar 24, 2017 at 2:44 AM, Daniel James via Boost < boost@lists.boost.org> wrote:
On 24 March 2017 at 07:57, Jonathan Müller via Boost
wrote: That's exactly what I said in my blog post. Function parameters are a special use case that require a different type, that type is std::string_view, for everything else a boost::string_ref that doesn't bind to rvalues could be repurposed.
That sounds like a pernickety solution to me. I'd find it easier to remember the risks than have to remember which type to use at any particular moment.
Regardless, my main concern would be that string_view isn't compatible with the standard. Generally when I see a Boost version of a standard library object I expect it be reasonably close to a drop in replacement.
I have reverted this change for this release. -- Marshall

On Fri, Mar 24, 2017 at 4:39 AM, Marshall Clow via Boost
On Fri, Mar 17, 2017 at 12:29 AM, Simon Sasburg via Boost < boost@lists.boost.org> wrote:
I've just tested our codebase with the 1.64.0 master snapshot, and i've encountered an issue which i've tracked down to this commit:
https://github.com/boostorg/utility/commit/9960d9f395b79ee860e39064cd4696 1f76d2cb55
Basically this removes the constructor which allows constructing a boost::string_ref from an rvalue std::string. I am wondering if this is really that dangerous. In our codebase we almost exlusively use boost::string_ref as function parameters, with the understanding that passed string_refs parameters are only valid within the function body. (so you need to copy it into an std::string if you need to access it afterwards) With these constraints i think it is OK to construct a string_ref from an rvalue std::string because the temporary should live long enough. (but please correct me if i'm wrong here)
Is the safety (?) added by this commit worth it if it breaks working code?
I'll also point out that that commit was in response to this bug report: https://svn.boost.org/trac/boost/ticket/11740
The regression breaks perfectly valid code. Could it be reverted? -- Olaf
participants (11)
-
Andrey Semashev
-
Bo Persson
-
Daniel James
-
Gavin Lambert
-
Jonathan Coe
-
Jonathan Müller
-
Marshall Clow
-
Mathias Gaunard
-
Olaf van der Spek
-
Simon Sasburg
-
Tim Song