Re: [boost] Heads up - string_ref landing

Marshall Clow <mclow.lists <at> gmail.com> writes:
Please let me know what you think!
As probably everyone, we have our own device for this. Here are some points and experience we gathered.
1. This class is essentially just an iterator_range<char*> (modulo template for const/wchar_t), so it should either inherit from it or have corresponding converting ctors/operators. In this sense Olaf/Gennadiy's remarks are pretty valid. OTOH size is needed very frequently and having it precomputed is a good thing, so conversion approach seems to be better (but then we lose passing by reference as iterator_range, type_traits etc). I'm not sure what's more important.
2. Given the above, the name is misleading as it's not a reference to std::string. We use the name char_range.
3. It's worth having a static constructor 'literal' (templated on size) to construct char_ranges from literals, as the compiler knows their size at compile time (minus the zero terminator). It can be constexpr too. In the same manner, a static function 'from_array', embracing a whole array of chars and assuming there is no zero terminator, is useful for working with structures representing messages in char-based protocols with fixed-width fields.
4. (hack) We had to add a member function 'assign' (this is just operator=) to make boost::tokenizer return char_ranges. I don't like this part at all as it contradicts the meaning of, say, std::vector::assign, so I'd prefer another solution, probably in the form of a customization point in boost::tokenizer.
Thanks, Maxim

2012/11/16 Yanchenko Maxim <maximyanchenko@yandex.ru>:
Marshall Clow <mclow.lists <at> gmail.com> writes:
Please let me know what you think!
As probably everyone, we have our own device for this. Here are some points and experience we gathered.
1. This class is essentially just an iterator_range<char*> (modulo template for const/wchar_t), so it should either inherit from it or have corresponding converting ctors/operators. In this sense Olaf/Gennadiy's remarks are pretty valid. OTOH size is needed very frequently and having it precomputed is a good thing, so conversion approach seems to be better (but then we lose passing by reference as iterator_range, type_traits etc). I'm not sure what's more important.
+1 for having corresponding explicit converting ctors/operators.
2. Given the above, the name is misleading as it's not a reference to std::string. We use the name char_range.
Better than other names, but at first glance it is not clear that char_range can be used as a string.
3. It's worth having a static constructor 'literal' (templated on size) to construct char_ranges from literals, as the compiler knows their size at compile time (minus the zero terminator). It can be constexpr too. In the same manner, a static function 'from_array', embracing a whole array of chars and assuming there is no zero terminator, is useful for working with structures representing messages in char-based protocols with fixed-width fields.
Instead of 'literal' I'd propose the following constructor:
template <size_type N> explicit string_ref(const Char (&str)[N]);
As far as I know, lots of people are unhappy with the current design of std::basic_string. They think that it has too many member functions in it. If those functions were also implemented as free functions, more containers would be able to reuse them (they could be reused by the basic_string implementation in Boost.Container and by the string_ref implementation). Maybe the authors of Boost.StringAlgo, Boost.Container and Boost.StringRef could cooperate for better code reuse?
-- Best regards, Antony Polukhin

On November 16, 2012 10:27:33 AM Antony Polukhin <antoshkka@gmail.com> wrote:
2012/11/16 Yanchenko Maxim <maximyanchenko@yandex.ru>:
Marshall Clow <mclow.lists <at> gmail.com> writes:
Please let me know what you think!
As probably everyone, we have our own device for this. Here are some points and experience we gathered.
1. This class is essentially just an iterator_range<char*> (modulo template for const/wchar_t), so it should either inherit from it or have corresponding converting ctors/operators. In this sense Olaf/Gennadiy's remarks are pretty valid. OTOH size is needed very frequently and having it precomputed is a good thing, so conversion approach seems to be better (but then we lose passing by reference as iterator_range, type_traits etc). I'm not sure what's more important.
+1 for having corresponding explicit converting ctors/operators.
As long as iterator_range uses the begin()/end() protocol and string_ref has the corresponding member functions it should work without any special constructors/operators, shouldn't it?
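To illustrate Andrey's point: range-generic code written against the begin()/end() protocol accepts any type that provides those members, so no converting constructors are needed in that direction. This is a minimal sketch in standard C++ (no Boost); `char_view` and `count_char` are hypothetical names, not part of any proposal in this thread.

```cpp
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical minimal view: it exposes only begin()/end(), nothing else.
class char_view {
public:
    char_view(const char* b, const char* e) : b_(b), e_(e) {}
    const char* begin() const { return b_; }
    const char* end() const { return e_; }

private:
    const char* b_;
    const char* e_;
};

// Range-generic algorithm: it relies only on the begin()/end() protocol,
// so std::string and char_view both work without converting constructors.
template <class Range>
std::size_t count_char(const Range& r, char c) {
    std::size_t n = 0;
    for (auto it = std::begin(r); it != std::end(r); ++it) {
        if (*it == c) ++n;
    }
    return n;
}
```

For example, `count_char(std::string("banana"), 'a')` and `count_char(char_view(p, p + 3), 'a')` go through the same template. Constructing a view *from* an arbitrary range is a different question, which is what the reply below raises.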
2. Given the above, the name is misleading as it's not a reference to std::string. We use the name char_range.
Better than other names, but at first glance it is not clear that char_range can be used as a string.
I'll add external_string or ext_string as an alternative (because it operates on characters in external storage).
3. It's worth having a static constructor 'literal' (templated on size) to construct char_ranges from literals, as the compiler knows their size at compile time (minus the zero terminator). It can be constexpr too. In the same manner, a static function 'from_array', embracing a whole array of chars and assuming there is no zero terminator, is useful for working with structures representing messages in char-based protocols with fixed-width fields.
Instead of 'literal' I'd propose the following constructor:
template <size_type N> explicit string_ref(const Char (&str)[N]);
-1 for constructor, +1 for static member or free function generator. Restricting it to constant string literals involves SFINAE machinery (which definitely is worth being implemented in a library rather than being written by users). But even with SFINAE it's not 100% bulletproof. The user has to explicitly show that he knows that the characters are a literal and he knows what he is doing. FWIW, I've written my own basic_string_literal class that only operates on string literals and has the necessary SFINAE protection and a free function generator. It pretty much resembles a read-only std::string otherwise.

16.11.12, 10:30, "Antony Polukhin":
Better than other names, but at first glance it is not clear that char_range can be used as a string.
Depends on what you understand by string. It can't grow, for example. No name can embrace all features.
3. It's worth having a static constructor 'literal' (templated on size) to construct char_ranges from literals, as the compiler knows their size at compile time (minus the zero terminator). It can be constexpr too. In the same manner, a static function 'from_array', embracing a whole array of chars and assuming there is no zero terminator, is useful for working with structures representing messages in char-based protocols with fixed-width fields.
Instead of 'literal' I'd propose the following constructor:
template <size_type N> explicit string_ref(const Char (&str)[N]);
See the difference:
char s[20] = "ab\0c";
char_range a(s);                                 // size 2 because of strlen inside, runtime
char_range b = char_range::literal("ab\0c");     // size 4, compile time
char_range c = char_range::from_array(s);        // size 20, compile time
char_range d = char_range::from_array("ab\0c");  // size 5, compile time
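A minimal sketch of the three construction paths above, assuming a `char_range` that stores a pointer and a size. The class is hypothetical, written only to reproduce the sizes listed, and is not an existing Boost API. Because the array bound `N` is deduced from the argument, `literal` and `from_array` need no `strlen` and can be constexpr.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical char_range following the description in this thread.
class char_range {
public:
    // Runtime path: stops at the first '\0' because of strlen.
    explicit char_range(const char* s) : p_(s), n_(std::strlen(s)) {}

    // Literal: size known at compile time, minus the terminating '\0'.
    // Embedded zeros count as ordinary characters.
    template <std::size_t N>
    static constexpr char_range literal(const char (&s)[N]) {
        return char_range(s, N - 1);
    }

    // Whole array: every element counts, no terminator is assumed.
    template <std::size_t N>
    static constexpr char_range from_array(const char (&s)[N]) {
        return char_range(s, N);
    }

    constexpr const char* data() const { return p_; }
    constexpr std::size_t size() const { return n_; }

private:
    constexpr char_range(const char* p, std::size_t n) : p_(p), n_(n) {}
    const char* p_;
    std::size_t n_;
};
```

Note that `literal` cannot tell a real string literal from any other `const char[N]`; it simply trusts the caller, which is why Maxim wants it spelled out explicitly.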
As far as I know, lots of people are unhappy with the current design of std::basic_string. They think that it has too many member functions in it. If those functions were also implemented as free functions, more containers would be able to reuse them (they could be reused by the basic_string implementation in Boost.Container and by the string_ref implementation). Maybe the authors of Boost.StringAlgo, Boost.Container and Boost.StringRef could cooperate for better code reuse?
I believe Boost.StringAlgo already does the job. Thanks, Maxim

On Fri, Nov 16, 2012 at 6:30 AM, Jeffrey Lee Hellrung, Jr. <jeffrey.hellrung@gmail.com> wrote:
How is this different from (say) contiguous_range< char > / contiguous_range< char const > ? I can imagine a contiguous_range<T> that wraps a pair of T*s, which would seem to be a simple generalization of your proposed string_ref.
Does contiguous_range exist?
On Fri, Nov 16, 2012 at 7:27 AM, Antony Polukhin <antoshkka@gmail.com> wrote:
1. This class is essentially just an iterator_range<char*> (modulo template for const/wchar_t), so it should either inherit from it or have corresponding converting ctors/operators. In this sense Olaf/Gennadiy's remarks are pretty valid. OTOH size is needed very frequently and having it precomputed is a good thing, so conversion
Size is cheap to calculate, it's not worth storing it. Standardizing the layout of this type might be good too for ABI stability and maybe interoperability.
approach seems to be better (but then we lose passing by reference as iterator_range, type_traits etc). I'm not sure what's more important.
+1 for having corresponding explicit converting ctors/operators.
Shouldn't they be implicit?
void f(str_ref);
string s = "Olaf";
f(s);
You don't want to have to wrap all those arguments in an explicit constructor.
2. Given the above, the name is misleading as it's not a reference to std::string. We use the name char_range.
Better than other names, but at first glance it is not clear that char_range can be used as a string.
I went for str_ref. It's a bit shorter and the link to string is a bit weaker. There's also mutable_str_ref for non-const char and data_ref for and mutable_data_ref for unsigned char. -- Olaf

2012/11/16 Andrey Semashev <andrey.semashev@gmail.com>:
On November 16, 2012 10:27:33 AM Antony Polukhin <antoshkka@gmail.com> wrote:
2012/11/16 Yanchenko Maxim <maximyanchenko@yandex.ru>:
Marshall Clow <mclow.lists <at> gmail.com> writes:
Please let me know what you think!
As probably everyone, we have our own device for this. Here are some points and experience we gathered.
1. This class is essentially just an iterator_range<char*> (modulo template for const/wchar_t), so it should either inherit from it or have corresponding converting ctors/operators. In this sense Olaf/Gennadiy's remarks are pretty valid. OTOH size is needed very frequently and having it precomputed is a good thing, so conversion approach seems to be better (but then we lose passing by reference as iterator_range, type_traits etc). I'm not sure what's more important.
+1 for having corresponding explicit converting ctors/operators. (explicit or not?)
As long as iterator_range uses the begin()/end() protocol and string_ref has the corresponding member functions it should work without any special constructors/operators, shouldn't it?
Yes, but constructing string_ref from iterator_range requires additional constructor.
2012/11/16 Yanchenko Maxim <maximyanchenko@yandex.ru>:
See the difference:
char s[20] = "ab\0c";
char_range a(s);                                 // size 2 because of strlen inside, runtime
char_range b = char_range::literal("ab\0c");     // size 4, compile time
char_range c = char_range::from_array(s);        // size 20, compile time
char_range d = char_range::from_array("ab\0c");  // size 5, compile time
Missed that difference the first time.
approach seems to be better (but then we lose passing by reference as iterator_range, type_traits etc). I'm not sure what's more important.
+1 for having corresponding explicit converting ctors/operators.
Shouldn't they be implicit?
void f(str_ref);
string s = "Olaf"; f(s);
Looks nice. And what about constructors from std::array? BTW, I'd like to see a std::basic_string to_string()/str() and std::array to_array()/array() member functions. -- Best regards, Antony Polukhin

On Fri, Nov 16, 2012 at 12:52 PM, Antony Polukhin <antoshkka@gmail.com> wrote:
2012/11/16 Andrey Semashev <andrey.semashev@gmail.com>:
As long as iterator_range uses the begin()/end() protocol and string_ref has the corresponding member functions it should work without any special constructors/operators, shouldn't it?
Yes, but constructing string_ref from iterator_range requires additional constructor.
string_ref could probably use the same protocol with the restriction for iterator type being a pointer. Random access is not enough because a contiguous array of characters is needed.

On Fri, Nov 16, 2012 at 9:52 AM, Antony Polukhin <antoshkka@gmail.com> wrote:
2012/11/16 Andrey Semashev <andrey.semashev@gmail.com>:
+1 for having corresponding explicit converting ctors/operators.
Shouldn't they be implicit?
void f(str_ref);
string s = "Olaf"; f(s);
Looks nice. And what about constructors from std::array?
It should be implicitly constructible from any contiguous (char) range.
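Olaf's position (implicit conversions, and construction from any contiguous char range) might look like the following sketch. `str_ref` and `length_of` are hypothetical; detecting a "contiguous char range" via `data()`/`size()` is an assumption, since pre-C++20 code has no standard contiguity concept. Maxim's lifetime objection applies to every implicit path shown here.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical str_ref with implicit conversions (not an existing library type).
class str_ref {
public:
    // Implicit from a NUL-terminated C string, so f("Olaf") works.
    str_ref(const char* s) : p_(s), n_(std::strlen(s)) {}

    // Explicit pointer + length, for data that is not NUL-terminated.
    str_ref(const char* p, std::size_t n) : p_(p), n_(n) {}

    // Implicit from any type whose data() yields something convertible to
    // const char* and which has size(): std::string, std::vector<char> and
    // std::array<char, N> all qualify. Raw arrays are excluded (no .data()).
    template <class R,
              class = std::enable_if_t<std::is_convertible<
                  decltype(std::declval<const R&>().data()), const char*>::value>,
              class = decltype(std::declval<const R&>().size())>
    str_ref(const R& r) : p_(r.data()), n_(r.size()) {}

    const char* data() const { return p_; }
    std::size_t size() const { return n_; }

private:
    const char* p_;
    std::size_t n_;
};

// One signature serves every caller; no wrapping at the call sites.
inline std::size_t length_of(str_ref s) { return s.size(); }
```

With this, `length_of(s)` for a `std::string s`, `length_of("Olaf")`, and calls with a `std::array<char, 3>` or `std::vector<char>` all compile unchanged.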
BTW, I'd like to see a std::basic_string to_string()/str() and std::array to_array()/array() member functions.
Why member functions? Array is fixed size, str_ref isn't. How would you handle a size mismatch? -- Olaf

16.11.12, 12:30, "Olaf van der Spek":
OTOH size is needed very frequently and having it precomputed is a good thing, so conversion
Size is cheap to calculate, it's not worth storing it. Standardizing the layout of this type might be good too for ABI stability and maybe interoperability.
Yes, it's cheap to calculate (end ptr is equally cheap to calculate btw) but you'll end up calculating it all the time in almost every function. At least my analysis showed that in real app size() is called much more frequently than end(). OTOH, storing only pointers is conceptually cleaner.
approach seems to be better (but then we lose passing by reference as iterator_range, type_traits etc). I'm not sure what's more important.
+1 for having corresponding explicit converting ctors/operators.
Shouldn't they be implicit?
Not from std::string. Same argument as for not having implicit conversion to char*.
There's also mutable_str_ref for non-const char and data_ref for and mutable_data_ref for unsigned char.
These are typedefs I assume? Thanks, Maxim

On Fri, Nov 16, 2012 at 10:28 AM, Yanchenko Maxim <maximyanchenko@yandex.ru> wrote:
16.11.12, 12:30, "Olaf van der Spek":
OTOH size is needed very frequently and having it precomputed is a good thing, so conversion
Size is cheap to calculate, it's not worth storing it. Standardizing the layout of this type might be good too for ABI stability and maybe interoperability.
Yes, it's cheap to calculate (end ptr is equally cheap to calculate btw) but you'll end up calculating it all the time in almost every function. At least my analysis showed that in real app size() is called much more frequently than end().
Really? Do you use subscripted access then? With iterator access you'd use end().
OTOH, storing only pointers is conceptually cleaner.
You can derive size in bytes and memory range from a ptr pair without type info. That may be handy when debugging and in other situations. You can't do that with (ptr, size) without type info.
Shouldn't they be implicit?
Not from std::string. Same argument as for not having implicit conversion to char*.
What argument would that be?
There's also mutable_str_ref for non-const char and data_ref for and mutable_data_ref for unsigned char.
These are typedefs I assume?
Yes, they are. -- Olaf

16.11.12, 13:37, "Olaf van der Spek" <ml@vdspek.org>:
On Fri, Nov 16, 2012 at 10:28 AM
Yes, it's cheap to calculate (end ptr is equally cheap to calculate btw) but you'll end up calculating it all the time in almost every function. At least my analysis showed that in real app size() is called much more frequently than end().
Really? Do you use subscripted access then? With iterator access you'd use end().
Not only subscripted access. Taking a subrange also requires knowing size. Copying from/to (read memcpy) - same. Filling (read memset) - same. Comparing (read memcmp) - same. (char_range is an optimization technique so we aim for maximum speed. If you don't maximize speed you'd be happy with simple and safe std::string copies.) end() is needed only in operations at the end of the range or sequential operations on the range as a whole.
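Maxim's list of size-consuming operations can be made concrete with a (pointer, size) layout. This is a sketch; `sized_ref`, `substr` and `compare` are hypothetical names, and `substr` here clamps out-of-range arguments instead of throwing, unlike `std::string::substr`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical (pointer, size) view: the stored size feeds directly into
// subranges and memcmp-based comparison; end() is derived on demand.
class sized_ref {
public:
    sized_ref(const char* p, std::size_t n) : p_(p), n_(n) {}
    const char* data() const { return p_; }
    std::size_t size() const { return n_; }
    const char* end() const { return p_ + n_; }

    // Subrange [pos, pos + count), clamped to the available characters.
    sized_ref substr(std::size_t pos, std::size_t count) const {
        pos = std::min(pos, n_);
        return sized_ref(p_ + pos, std::min(count, n_ - pos));
    }

    // Lexicographic three-way comparison: memcmp over the common prefix,
    // then the shorter range orders first.
    int compare(sized_ref o) const {
        std::size_t k = std::min(n_, o.n_);
        int r = k ? std::memcmp(p_, o.p_, k) : 0;
        if (r != 0) return r;
        return n_ < o.n_ ? -1 : (n_ > o.n_ ? 1 : 0);
    }

private:
    const char* p_;
    std::size_t n_;
};
```

With a (begin, end) layout the same operations would first compute `end - begin`; that is cheap, which is Olaf's counterpoint, but it happens in nearly every call.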
OTOH, storing only pointers is conceptually cleaner.
You can derive size in bytes and memory range from a ptr pair without type info. That may be handy when debugging and in other situations. You can't do that with (ptr, size) without type info.
Sorry, I don't understand this, could you elaborate, please?
Shouldn't they be implicit?
Not from std::string. Same argument as for not having implicit conversion to char*.
What argument would that be?
You are giving away a reference to string internals that are subject to change/die anytime. Making it explicit and visible in the caller code ensures that the programmer will take special measures to make sure that the string doesn't change/die while there's a char_range looking into it. Consider std::vector<char_range>, for example. For the same reason we have explicit char_range::literal and char_range::from_array. Thanks, Maxim

On Fri, Nov 16, 2012 at 11:31 AM, Yanchenko Maxim <maximyanchenko@yandex.ru> wrote:
Not only subscripted access. Taking a subrange also requires knowing size. Copying from/to (read memcpy) - same. Filling (read memset) - same. Comparing (read memcmp) - same.
Those are C-style constructs. The C++-style equivalents are iterator-based.
(char_range is an optimization technique so we aim for maximum speed. If you don't maximize speed you'd be happy with simple and safe std::string copies.) end() is needed only in operations at the end of the range or sequential operations on the range as a whole.
OTOH, storing only pointers is conceptually cleaner.
You can derive size in bytes and memory range from a ptr pair without type info. That may be handy when debugging and in other situations. You can't do that with (ptr, size) without type info.
Sorry, I don't understand this, could you elaborate, please?
Suppose you have two pointers, 0xa0 (begin) and 0xb0 (end). The size in bytes is 0x10. Suppose you have one pointer (0xa0) and one size (0x10). Does this point to the same memory? Yes if sizeof(value_type) == 1, no otherwise. You can't tell to what memory range it points without knowing sizeof(value_type)
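Olaf's arithmetic, spelled out. The two functions are hypothetical illustrations of what can be recovered from each stored layout; addresses are modelled as plain integers to mirror his 0xa0/0xb0 example.

```cpp
#include <cstddef>
#include <cstdint>

// (begin, end) layout: two raw addresses. The byte extent is a plain
// subtraction and needs no knowledge of the element type.
constexpr std::uintptr_t byte_extent_from_pair(std::uintptr_t begin,
                                               std::uintptr_t end) {
    return end - begin;
}

// (begin, size) layout: an address plus an element count. Turning the
// count into bytes requires sizeof(T), i.e. type information.
template <class T>
constexpr std::size_t byte_extent_from_count(std::size_t count) {
    return count * sizeof(T);
}
```

With a 4-byte `value_type`, the pair (0xa0, 0x10) covers 0xa0..0xe0 rather than 0xa0..0xb0, and nothing in the two stored values tells you which.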
Shouldn't they be implicit?
Not from std::string. Same argument as for not having implicit conversion to char*.
What argument would that be?
You are giving away a reference to string internals that are subject to change/die anytime.
Isn't that by definition for a reference? It applies to const string& too. I don't think that's a good reason.
Making it explicit and visible in the caller code ensures that the programmer will take special measures to make sure that the string doesn't change/die while there's a char_range looking into it.
Consider std::vector<char_range>, for example.
For the same reason we have explicit char_range::literal and char_range::from_array.
I'd like this to work:
void f(str_ref);
f("Olaf");
-- Olaf

On 16/11/12 11:43, Olaf van der Spek wrote:
I'd like this to work: void f(str_ref);
f("Olaf");
It definitely needs to work, otherwise string_ref is useless. The whole point is that it should behave like std::string but avoid useless copies.

16.11.12, 17:30, "Mathias Gaunard" <mathias.gaunard@ens-lyon.org>:
On 16/11/12 11:43, Olaf van der Spek wrote:
I'd like this to work: void f(str_ref);
f("Olaf");
It definitely needs to work, otherwise string_ref is useless. The whole point is that it should behave like std::string but avoid useless copies.
If this is all you need, just use plain references. Thanks, Maxim

On 16/11/12 15:13, Yanchenko Maxim wrote:
16.11.12, 17:30, "Mathias Gaunard" <mathias.gaunard@ens-lyon.org>:
On 16/11/12 11:43, Olaf van der Spek wrote:
I'd like this to work: void f(str_ref);
f("Olaf");
It definitely needs to work, otherwise string_ref is useless. The whole point is that it should behave like std::string but avoid useless copies.
If this is all you need, just use plain references.
void f(std::string const&); f("Olaf"); does make a useless copy.

Olaf van der Spek <ml <at> vdspek.org> writes:
On Fri, Nov 16, 2012 at 3:25 PM, Mathias Gaunard <mathias.gaunard <at> ens-lyon.org> wrote:
void f(std::string const&); f("Olaf");
Are compilers/optimizers not smart enough to construct the temporary object at compile time?
I'm afraid they are not smart enough to eliminate an unneeded temporary when it's something sophisticated like std::string... Maxim

On Nov 16, 2012, at 7:00 AM, Maxim Yanchenko <MaximYanchenko@yandex.ru> wrote:
Olaf van der Spek <ml <at> vdspek.org> writes:
On Fri, Nov 16, 2012 at 3:25 PM, Mathias Gaunard <mathias.gaunard <at> ens-lyon.org> wrote:
void f(std::string const&); f("Olaf");
Are compilers/optimizers not smart enough to construct the temporary object at compile time?
I'm afraid they are not smart enough to eliminate an unneeded temporary when it's something sophisticated like std::string…
Actually, this is a really good example.
void f(std::string const&);
void g(string_ref);
f("Olaf");
g("Olaf");
In the call to "f", the compiler will create a temporary std::string. This will involve a call to strlen (possibly done at compile time), a memory allocation (modulo the small-string optimization), and copying the data.
In the call to "g", the compiler will create a temporary string_ref. This will involve a call to strlen (possibly done at compile time).
And for most cases, the code internal to "f" and "g" won't care that they got a string_ref instead of a const std::string.
-- Marshall
Marshall Clow     Idio Software   <mailto:mclow.lists@gmail.com>
A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait).
        -- Yu Suzuki
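Marshall's two calls can be observed by checking which buffer each callee actually sees. This sketch substitutes C++17 `std::string_view` for the proposed `string_ref` (an assumption: for this purpose both are just a pointer plus a length); `data_seen_by_f` and `data_seen_by_g` are hypothetical helpers.

```cpp
#include <string>
#include <string_view>

// Like Marshall's f: a literal argument is first turned into a temporary
// std::string, which holds its own copy of the characters.
inline const void* data_seen_by_f(const std::string& s) {
    return s.data();
}

// Like Marshall's g, with std::string_view standing in for string_ref:
// only a pointer and a length are recorded, so the caller's buffer is seen.
inline const void* data_seen_by_g(std::string_view s) {
    return s.data();
}
```

Passing the same character array to both shows that `g` points straight at the caller's storage while `f` points into a distinct temporary. The allocation Marshall mentions is not visible this way; only the copy is.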

16.11.2012, 19:30, "Marshall Clow" <mclow.lists@gmail.com>:
And for most cases, the code internal to "f" and "g" won't care that they got a string_ref instead of a const std::string.
In many, not most, and definitely not all. Here is an example I gave to Olaf:
std::vector<char_range> v;
v.push_back( "foo" );              // OK - lifetime of string literal is infinite
v.push_back( std::string("bar") ); // BOOM
{
    std::string s = "foobar";
    v.push_back(s);                // BOOM
}
When it comes to ownership, explicit is better than implicit. Every time you want to store a char_range for future use, you want to be 100% sure that everything it's pointing at will live longer than the char_range. How can you enforce that if everything is implicit? Forget about help from the compiler; you won't even see the places of such ownership leaks! Making the char_range ctor explicit makes a programmer think twice: "OK, here I'm converting this string to a char_range - is it supposed to live longer than my string? Will my string live long enough and *unchanged* (no resize/reallocation)?"
char_range is an optimization tool, and when used with std::string it makes the string's abstraction leak, as it relies on the hidden knowledge that the source std::string
1) will live longer than the char_range, and
2) won't resize/reallocate its internal buffer, as the char_range points directly into it.
It's essentially a naked pointer into volatile memory and should be used with the same level of (pre)caution (and better not used at all until you need to optimize speed - otherwise just good old *safe* copies of std::string should be used by default).
Thanks, Maxim
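Maxim's "explicit is better than implicit" rule can be enforced by the type system and checked with traits. A sketch, assuming a hypothetical `char_range` that views a `std::string`; the deleted rvalue overload is an extra guard that was not proposed in the thread, added here because explicitness alone still accepts `char_range(std::string("bar"))`.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>  // for std::vector<char_range> in the usage below

// Hypothetical char_range: a view into a std::string must be requested
// explicitly, so every such place is visible in the calling code.
class char_range {
public:
    explicit char_range(const std::string& s) : p_(s.data()), n_(s.size()) {}

    // Extra guard (not from the thread): refuse temporaries outright.
    explicit char_range(std::string&&) = delete;

    const char* data() const { return p_; }
    std::size_t size() const { return n_; }

private:
    const char* p_;
    std::size_t n_;
};

// No silent conversion, so std::vector<char_range>::push_back(s) won't compile.
static_assert(!std::is_convertible<const std::string&, char_range>::value,
              "std::string must not convert to char_range implicitly");
// Spelled-out construction from a named string is still allowed.
static_assert(std::is_constructible<char_range, const std::string&>::value,
              "explicit construction from an lvalue is allowed");
// Temporaries hit the deleted overload.
static_assert(!std::is_constructible<char_range, std::string>::value,
              "construction from a temporary std::string is rejected");
```

A call site then reads `v.push_back(char_range(s));`. The compiler still cannot check that `s` outlives `v`; the explicit spelling only makes the spot visible to the reader, which is exactly the point Maxim makes.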

On Fri, Nov 16, 2012 at 7:18 PM, Yanchenko Maxim <maximyanchenko@yandex.ru> wrote:
[...] char_range is an optimization tool, and when used with std::string it causes its abstraction to leak as it utilizes the hidden knowledge that the source std::string [...]
I do not see it as an optimization tool. As written in the first paragraph of the paper, it is intended to remove such nonsense in the future:
void open(const char *p, ios_base::openmode = ...);
void open(const std::string &p, ios_base::openmode = ...); // added in C++11
runtime_error(const char *p);
runtime_error(const std::string &p); // added in C++11
This discussion has already been raised here in the context of boost program_options.
-- Yakov

17.11.2012, 02:29, "Yakov Galka" <ybungalobill@gmail.com>:
On Fri, Nov 16, 2012 at 7:18 PM, Yanchenko Maxim <maximyanchenko@yandex.ru> wrote:
I do not see it as an optimization tool. As written in the first paragraph of the paper it is intended to remove such nonsense in the future:
void open(const char *p, ios_base::openmode = ...);
void open(const std::string &p, ios_base::openmode = ...); // added in C++11
runtime_error(const char *p);
runtime_error(const std::string &p); // added in C++11
First, it wasn't a big deal to write c_str() while sending an std::string to open(). Second, instead of adding an overload, they could just replace it with the std::string version - everything would work as expected (except code that uses exact signatures). Or do you care about unnecessary std::string creation? Then it's optimization. (I believe you don't really care about it in the face of file operations.) Again, char_range utilizes unsafe knowledge about std::string internals. This is generally a bad thing and should be considered only when you want to optimize something. If you care not about speed but about a clean and *generic* interface (just std::string is already clean enough), then you need a templated open(iterator_range, openmode) where iterator_range can be any char sequence (e.g. coming from expression-template code like dir+'/'+filename+'.'+ext). But it's clearly overkill, especially given that inside open() there is a call to an API function that takes the filename as a zero-terminated C string - therefore you'll need to do c_str() anyway.
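Maxim's last point (the OS call wants a zero-terminated string anyway) in code. A sketch using C++17 `std::string_view` as the view type; `c_api_take_path` is a hypothetical stand-in for an API such as POSIX `open()` that reads up to the first '\0', and it only measures the string instead of touching the file system.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical stand-in for a C API that only understands NUL-terminated
// strings; it reports how many characters it would have read.
inline std::size_t c_api_take_path(const char* path) {
    return std::strlen(path);
}

// A view-taking front end cannot forward v.data() as-is, because a view
// need not be NUL-terminated. It has to materialize a std::string first.
inline std::size_t open_path(std::string_view v) {
    std::string owned(v);                   // copy plus terminating '\0'
    return c_api_take_path(owned.c_str());
}
```

So a single view-taking signature removes the overload pair from the interface, but for this kind of API the copy Maxim mentions simply moves inside the function.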
This discussion has already been raised here in the context of boost program_options.
Could you give me a reference subj or a link, please? Thanks, Maxim

On Fri, Nov 16, 2012 at 8:50 PM, Yanchenko Maxim <maximyanchenko@yandex.ru> wrote:
17.11.2012, 02:29, "Yakov Galka" <ybungalobill@gmail.com>:
On Fri, Nov 16, 2012 at 7:18 PM, Yanchenko Maxim <maximyanchenko@yandex.ru> wrote:
I do not see it as an optimization tool. As written in the first paragraph of the paper it is intended to remove such nonsense in the future:
void open(const char *p, ios_base::openmode = ...);
void open(const std::string &p, ios_base::openmode = ...); // added in C++11
runtime_error(const char *p);
runtime_error(const std::string &p); // added in C++11
First, it wasn't a big deal to write c_str() while sending an std::string to open().
I agree. I would leave it a const char *.
Second, instead of adding an overload, they could just replace it with the std::string version - everything would work as expected (except code that uses exact signatures).
I guess you cannot use exact signatures of member functions (the implementation is permitted to add arbitrary default parameters). This will, however, break existing code. When you call f.open("xyz") no std::string is constructed in C++03, so no operator new is called. Changing it to const std::string & would change this behavior.
Or do you care about unnecessary std::string creation? Then it's optimization. (I believe you don't really care about it in the face of file operations.)
I do not care for the file case, but for other things that accept strings it may matter (like runtime_error -- it adds one more heap allocation. This is important in error handling, where you do not want to increase the likelihood of failing to create an exception object).
[...]
If you care not about speed but about a clean and *generic* interface (just std::string is already clean enough), then you need a templated open(iterator_range, openmode) where iterator_range can be any char sequence (e.g. coming from expression-template code like dir+'/'+filename+'.'+ext).
I care about code that is as generic as possible within the limits of separate compilation.
[...]
This discussion has already been raised here in the context of boost program_options.
Could you give me a reference subj or a link, please?
http://boost.2283326.n4.nabble.com/program-options-Some-methods-take-const-c... -- Yakov

On 16/11/12 16:00, Maxim Yanchenko wrote:
Olaf van der Spek <ml <at> vdspek.org> writes:
On Fri, Nov 16, 2012 at 3:25 PM, Mathias Gaunard <mathias.gaunard <at> ens-lyon.org> wrote:
void f(std::string const&); f("Olaf");
Are compilers/optimizers not smart enough to construct the temporary object at compile time?
I'm afraid they are not smart enough to eliminate an unneeded temporary when it's something sophisticated like std::string...
Doing that kind of transformation, assuming memory requirements were relaxed to actually allow them, would require full-blown static analysis on all of f's body. So no, you can't expect any compiler to do it.

Mathias Gaunard <mathias.gaunard <at> ens-lyon.org> writes:
If this is all you need, just use plain references.
void f(std::string const&); f("Olaf");
does make a useless copy.
OK. I thought you were talking about Olaf's desire to have implicit conversion from std::string. For this we use just f(char_range::literal("Olaf")); If it's possible to detect at compile time that our argument is a literal, I'm happy to make such a ctor implicit, but I doubt it's possible. Maxim

On Fri, Nov 16, 2012 at 6:50 PM, Maxim Yanchenko <maximyanchenko@yandex.ru> wrote:
For this we use just f(char_range::literal("Olaf"));
Sorry for jumping in but shouldn't it be from_literal? It's more aligned with from_array you suggested before.

Andrey Semashev <andrey.semashev <at> gmail.com> writes:
For this we use just f(char_range::literal("Olaf"));
Sorry for jumping in but shouldn't it be from_literal? It's more aligned with from_array you suggested before.
Yes, makes sense, though a bit lengthy. Maybe both should be just 'literal' and 'array', without 'from_'. I don't feel too strongly about it; I just want them to be explicit and visible. Thanks, Maxim

On 16/11/12 15:50, Maxim Yanchenko wrote:
Mathias Gaunard <mathias.gaunard <at> ens-lyon.org> writes:
If this is all you need, just use plain references.
void f(std::string const&); f("Olaf");
does make a useless copy.
OK. I thought you were talking about Olaf's desire to have implicit conversion from std::string. For this we use just f(char_range::literal("Olaf"));
basic_string_ref<T> is meant to assume, when implicitly constructed from a T(&)[N] or T const*, that the input is null-terminated. I don't even understand why people are still discussing this and making ridiculous proposals. If you want the above, use boost::iterator_range<char*>.

On November 18, 2012 1:29:06 AM Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 16/11/12 15:50, Maxim Yanchenko wrote:
Mathias Gaunard <mathias.gaunard <at> ens-lyon.org> writes:
If this is all you need, just use plain references.
void f(std::string const&); f("Olaf");
does make a useless copy.
OK. I thought you were talking about Olaf's desire to have implicit conversion from std::string. For this we use just f(char_range::literal("Olaf"));
basic_string_ref<T> is meant to assume, when implicitly constructed from a T(&)[N] or T const*, that the input is null-terminated.
I don't even understand why people are still discussing this and making ridiculous proposals. If you want the above use boost::iterator_range<char*>.
There is no way to safely detect a string literal, so unless you want to strlen it anyway the construction has to be explicit.

On 17/11/12 23:34, Andrey Semashev wrote:
There is no way to safely detect a string literal, so unless you want to strlen it anyway the construction has to be explicit.
There is no need to detect anything.
string_ref is defined like so:
string_ref::string_ref(const char* s) : ptr(s), size(strlen(s)) {}
And it's like this by design. string_ref doesn't even have construction from an arbitrary range in N3334. But if it were to be added, the behaviour wrt string literals would need to stay the same. The fact that you can't tell apart an array from a string literal using types is irrelevant.

There is no need to detect anything.
string_ref is defined like so
string_ref::string_ref(const char* s) : ptr(s), size(strlen(s)) {}
Mathias, it's not just compile-time size (which is good to have as well, btw). What about literals with a zero character inside, like "foo\0bar"? I believe string_ref should be able to handle the case, in order to be useful. That's why I'm proposing literal and from_array constructor functions (please see my other email with examples). And I'd like to see them explicit for clear expression of ownership (while Olaf seems to be in strong opposition to this). Maxim

On Mon, Nov 19, 2012 at 4:42 AM, Yanchenko Maxim <maximyanchenko@yandex.ru> wrote:
Mathias, it's not just compile-time size (which is good to have as well, btw). What about literals with a zero character inside, like "foo\0bar"?
What about them? Such literals aren't proper null-terminated strings, are they?
I believe string_ref should be able to handle the case, in order to be useful.
For such literals the (const char*, size_t) constructor should be used. -- Olaf

On Mon, Nov 19, 2012 at 4:49 PM, Olaf van der Spek <ml@vdspek.org> wrote:
On Mon, Nov 19, 2012 at 4:42 AM, Yanchenko Maxim <maximyanchenko@yandex.ru> wrote:
Mathias, it's not just compile-time size (which is good to have as well, btw). What about literals with a zero character inside, like "foo\0bar"?
What about them? Such literals aren't proper null-terminated strings, are they?
I believe string_ref should be able to handle the case, in order to be useful.
For such literals the (const char*, size_t) constructor should be used.
This is simply not practical:
    // code duplication
    string_ref lit("Hello, World!", sizeof("Hello, World!") - 1);
    wstring_ref wlit(L"Hello, World!", sizeof(L"Hello, World!") / sizeof(wchar_t) - 1);
    // error prone
    string_ref lit2("Hello, World!", 13);
Although I don't aim to address the \0 in the middle of the string literal, I'd like such construction to be cleaner and preferably free of strlen/wcslen:
    string_ref lit = string_ref::from_literal("Hello, World!");
or
    string_ref lit = make_literal_ref("Hello, World!");
or whatever. Not every compiler is able to optimize away strlen/wcslen on a literal, and this generator is intended to remove the need for them altogether. PS: I don't really understand why we are arguing about this little helper anyway. There are people who will find it useful; others can safely ignore it.
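Here is a sketch of such a generator. It assumes C++17 and uses `std::basic_string_view` as a stand-in for string_ref. The name `make_literal_ref` follows the suggestion in the message and is hypothetical. One template covers both narrow and wide literals, and the length is computed at compile time from the array extent. As Mathias notes, the same overload would also accept a non-literal char array, so it only gives the right size for genuine literals.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical generator: the length comes from the array extent, so no
// strlen/wcslen call is needed. Assumes the argument is a literal, i.e. its
// last element is the terminator.
template <class CharT, std::size_t N>
constexpr std::basic_string_view<CharT> make_literal_ref(const CharT (&lit)[N]) noexcept {
    return std::basic_string_view<CharT>(lit, N - 1);
}
```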

On Sat, Nov 17, 2012 at 11:29 PM, Mathias Gaunard < mathias.gaunard@ens-lyon.org> wrote:
On 16/11/12 15:50, Maxim Yanchenko wrote:
Mathias Gaunard <mathias.gaunard <at> ens-lyon.org> writes:
If this is all you need, just use plain references.
void f(std::string const&); f("Olaf");
does make a useless copy.
OK. I thought you were talking about Olaf's desire to have implicit conversion from std::string. For this we use just f(char_range::literal("Olaf"));
basic_string_ref<T> is meant to assume, when it is implicitly constructed from a T(&)[N] or a T const*, that the string is null-terminated.
Important point! The current wording does not require the string_ref to point to a null-terminated string. This (a) enables working with substrings, (b) makes it unclear how it is better than iterator_range, and (c) does not solve the problem for the fstream::open(const char*/std::string&) case. Do you really think that trading (c) for (a) is a good idea? -- Yakov

On Fri, Nov 23, 2012 at 1:07 PM, Yakov Galka <ybungalobill@gmail.com> wrote:
Important point! The current wording does not require the string_ref to point to a null-terminated string. This (a) enables working with substrings, (b) makes it unclear how it is better than iterator_range, and (c) does not solve the problem for the fstream::open(const char*/std::string&) case.
Do you really think that trading (c) for (a) is a good idea?
Yes, I do. If you'd like to have a zstring_ref, you should write one. ;) A string_ref that's not required to be null-terminated is far more useful. -- Olaf
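This short sketch illustrates the trade-off Yakov describes. It uses C++17 `std::string_view`, which grew out of the string_ref proposals and is likewise not required to be null-terminated. A view of a substring points into the middle of a larger buffer. Handing its data() pointer to a C API that expects a terminator therefore reads past the end of the view, so a copy is needed. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Returns a view of the first space-separated word: it points into the
// caller's buffer and is generally not followed by '\0'.
inline std::string_view first_word(std::string_view s) {
    return s.substr(0, s.find(' '));
}

// Stand-in for a legacy C API that relies on a terminating '\0'.
inline std::size_t c_api_length(const char* s) { return std::strlen(s); }
```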

On Fri, Nov 23, 2012 at 7:17 AM, Olaf van der Spek <ml@vdspek.org> wrote:
On Fri, Nov 23, 2012 at 1:07 PM, Yakov Galka <ybungalobill@gmail.com> wrote:
Important point! The current wording does not require the string_ref to point to a null-terminated string. This (a) enables working with substrings, (b) makes it unclear how it is better than iterator_range, and (c) does not solve the problem for the fstream::open(const char*/std::string&) case.
Do you really think that trading (c) for (a) is a good idea?
Yes, I do. If you'd like to have a zstring_ref, you should write one. ;)
A string_ref that's not required to be null-terminated is far more useful.
-- Olaf
It does (to me at least) raise the question of whether string_ref is the correct name, as it is missing what might be seen as a fundamental aspect of strings - null termination. Tony

AMDG On 11/23/2012 02:55 PM, Gottlob Frege wrote:
On Fri, Nov 23, 2012 at 7:17 AM, Olaf van der Spek <ml@vdspek.org> wrote:
A string_ref that's not required to be null-terminated is far more useful.
It does (to me at least) raise the question of whether string_ref is the correct name, as it is missing what might be seen as a fundamental aspect of strings - null termination.
fundamental? hardly. std::string is not guaranteed to be null terminated. Besides, if you want a null-terminated string without ownership, const char* works just fine. In Christ, Steven Watanabe

On Sat, Nov 24, 2012 at 1:25 AM, Steven Watanabe <watanabesj@gmail.com> wrote:
AMDG
On 11/23/2012 02:55 PM, Gottlob Frege wrote:
On Fri, Nov 23, 2012 at 7:17 AM, Olaf van der Spek <ml@vdspek.org> wrote:
A string_ref that's not required to be null-terminated is far more useful.
It does (to me at least) raise the question of whether string_ref is the correct name, as it is missing what might be seen as a fundamental aspect of strings - null termination.
fundamental? hardly. std::string is not guaranteed to be null terminated.
Guaranteed (almost) since C++11. c_str() and data() now *mean the same thing*, and s[s.size()] returned null even in C++03. The only thing that is not guaranteed is that *(&s[0] + s.size()) is null. Anyway the point is that you can retrieve a null terminated string from std::string in constant time, without copying anything. So it is practically null terminated.
Besides, if you want a null-terminated string without ownership, const char* works just fine.
It does not, as you cannot implicitly convert std::string (or anything else suitable) to const char*. -- Yakov

On Sat, Nov 24, 2012 at 8:42 AM, Yakov Galka <ybungalobill@gmail.com> wrote:
fundamental? hardly. std::string is not guaranteed to be null terminated.
Guaranteed (almost) since C++11. c_str() and data() now *mean the same thing*,
Really? Got a reference for that?
and s[s.size()] returned null even in C++03.
Is that defined behaviour? You're essentially dereferencing end().
The only thing that is not guaranteed is that *(&s[0] + s.size()) is null. Anyway the point is that you can retrieve a null terminated string from std::string in constant time, without copying anything. So it is practically null terminated.
True, but that's std::string. Not all strings are std::string. "A string_ref that's not required to be null-terminated is far more useful." still holds. -- Olaf

On Sat, Nov 24, 2012 at 3:17 PM, Olaf van der Spek <ml@vdspek.org> wrote:
On Sat, Nov 24, 2012 at 8:42 AM, Yakov Galka <ybungalobill@gmail.com> wrote:
fundamental? hardly. std::string is not guaranteed to be null terminated.
Guaranteed (almost) since C++11. c_str() and data() now *mean the same thing*,
Really? Got a reference for that?
See C++11 [string.accessors]/1-3.
and s[s.size()] returned null even in C++03.
Is that defined behaviour? You're essentially dereferencing end().
Yes, it is and always was well defined, and no, you are not dereferencing end(); that is just not how it is specified. See C++98 and C++03 [lib.string.access]/1, where this is the case for the const version only, and C++11 [string.access]/1-2, which corrects the wording for the non-const version too.
The only thing that is not guaranteed is that *(&s[0] + s.size()) is null. Anyway the point is that you can retrieve a null terminated string from std::string in constant time, without copying anything. So it is practically null terminated.
True, but that's std::string. Not all strings are std::string.
"A string_ref that's not required to be null-terminated is far more useful." still holds.
True, but then it is unclear how it is string-specific compared to iterator_range. Or, said otherwise, it solves a different problem. -- Yakov
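The C++11 guarantees Yakov cites can be checked directly. This minimal sketch assumes a C++11 or later standard library. It shows that data() and c_str() return the same pointer, and that s[s.size()] on a const string yields '\0' without dereferencing end(). The helper name is hypothetical.

```cpp
#include <cassert>
#include <string>

// Since C++11, data() and c_str() are specified identically, and
// operator[](size()) returns a reference to a value-initialized char.
inline bool terminated_in_place(const std::string& s) {
    return s.data() == s.c_str() && s[s.size()] == '\0';
}
```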

24.11.2012 2:55, Gottlob Frege wrote:
It does (to me at least) raise the question of whether string_ref is the correct name, as it is missing what might be seen as a fundamental aspect of strings - null termination.
Btw, what about boost::substring, boost::wsubstring and boost::basic_substring<T>? To me, that name means a part of an existing string with a string-like API.
-- Best regards, Sergey Cheban

On Nov 23, 2012, at 7:17 AM, Olaf van der Spek <ml@vdspek.org> wrote:
On Fri, Nov 23, 2012 at 1:07 PM, Yakov Galka <ybungalobill@gmail.com> wrote:
Important point! The current wording does not require the string_ref to point to a null-terminated string. This (a) enables working with substrings, (b) makes it unclear how it is better than iterator_range, and (c) does not solve the problem for the fstream::open(const char*/std::string&) case.
Do you really think that trading (c) for (a) is a good idea?
Yes, I do. If you'd like to have a zstring_ref, you should write one. ;)
A string_ref that's not required to be null-terminated is far more useful.
+1. There are many use cases in which a start plus a length (or end), with no expectation of null termination, is useful. In that design, a substring is like an iterator pair with convenient string syntax. That permits extracting an iterator pair, the length, a substring, etc., while offering concatenation and other common string operations, none of which require a null terminator. IOW, you get a useful and convenient subset of std::string's interface, plus substring support. There are times when a null-terminated string is required; legacy APIs are usually the problem. Thus, it is necessary to convert, and a separate type for that is appropriate. If constructed from a null-terminated string_ref, it can simply reference the string_ref's data. Otherwise, it can copy the data to the free store or use the small-buffer optimization (SBO) to avoid the allocation. There's no extra overhead with that approach (unless program logic allows you to omit the length and termination checks, but then you could just use a char const * directly). Whether you choose to make that conversion via a string_ref member function or a converting constructor of the other class is open for debate. ___ Rob
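The following hypothetical sketch shows one way the separate conversion type Rob describes could look. `std::string_view` stands in for string_ref. A view cannot discover whether a '\0' follows its last character without reading past its end, so this sketch assumes the caller passes that knowledge in as a flag. Terminated input is referenced in place. Short unterminated input is copied into an inline buffer (the small-buffer optimization), and longer input goes to the free store via std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical adapter for APIs that need a null-terminated string.
class c_str_adapter {
public:
    // null_terminated: the caller guarantees sv.data()[sv.size()] == '\0'.
    explicit c_str_adapter(std::string_view sv, bool null_terminated = false) {
        if (null_terminated) {
            ptr_ = sv.data();                    // reference in place, no copy
        } else if (sv.size() < sizeof(small_)) {
            sv.copy(small_, sv.size());          // small-buffer path
            small_[sv.size()] = '\0';
            ptr_ = small_;
        } else {
            heap_.assign(sv.data(), sv.size());  // free-store path
            ptr_ = heap_.c_str();
        }
    }

    // A copied ptr_ would still point into the source object's buffer.
    c_str_adapter(const c_str_adapter&) = delete;
    c_str_adapter& operator=(const c_str_adapter&) = delete;

    const char* c_str() const noexcept { return ptr_; }

private:
    char small_[32];
    std::string heap_;
    const char* ptr_ = nullptr;
};
```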

On Sun, Nov 25, 2012 at 3:25 PM, Rob Stewart <robertstewart@comcast.net> wrote:
There are times when a null terminated string is required. Legacy APIs are usually the problem. Thus, it is necessary to convert. A separate type for that is appropriate. If constructed from a null terminated string_ref, it can simply reference the string_ref's data. Otherwise, it can copy the data to the free store or use the SBO to avoid the allocation. There's no extra overhead with that approach (unless program logic allows you to omit the length and termination checks, but then you could just use a char const * directly). Whether you choose to make that conversion via a string_ref member function or a converting constructor of the other class is open for debate.
I must say I'm a little concerned by the fact that the number of string types keeps increasing. If I design my library interface (let's assume it doesn't use legacy APIs internally for now), what string type should I use? I want my library to be usable with any string type, and I don't want to provide overloads for all possible string types. It would seem that string_ref is the answer, but I don't see any support for third-party string types in it. I will be able to do this:
    void foo(string_ref const&);
    foo("hello");
    string str; foo(str);
But this is still not supported:
    QString qstr; foo(qstr);
    string_literal lit = "Hello"; foo(lit);
If string_ref is nothing more than a pair of iterators with a few additional member functions, I find iterator_range< const char* > far superior, because it has the begin()/end() extension mechanism. The member algorithms can easily be replaced with the general ones, so they don't really add any value to string_ref. And if you add yet another zstring_ref to that zoo, you're only making things worse. If string_ref is to be proposed for inclusion (and yes, I would like it to follow the common protocol for new libraries and not be silently committed), the first thing I would like to know is how it is better than iterator_range< const char* > and what problems it solves that can't be solved with iterator_range. If there aren't any significant advantages, I'd prefer not to introduce yet another string type.

On Nov 25, 2012, at 7:30 AM, Andrey Semashev <andrey.semashev@gmail.com> wrote:
On Sun, Nov 25, 2012 at 3:25 PM, Rob Stewart <robertstewart@comcast.net> wrote:
There are times when a null terminated string is required. Legacy APIs are usually the problem. Thus, it is necessary to convert. A separate type for that is appropriate.
I must say I'm a little concerned by the fact that the number of string types increases.
That's a valid concern, but note that we already have a great many types in this realm: char *, std::string, plus third party types.
If I design my library interface (let's assume it doesn't use legacy APIs internally for now), what string type should I use?
std::string is appropriate, unless you care about copies and free store allocations, in which case, we're suggesting string_ref.
I want my library to be used with any string type and I don't want to provide overloads for all possible string types.
That's an impossible order, unless you add compile-time dispatching to your code, and then "all possible string types" means as many as you care to support. boost::string_ref can be extended similarly, but that would never work for std::string_ref.
It would seem that string_ref is the answer, but I don't see any support for third-party string types in it. I will be able to do this:
void foo(string_ref const&);
foo("hello");
string str; foo(str);
It can also support iterator pairs and even std::vector<char>.
But that is still not supported:
QString qstr; foo(qstr);
As above.
string_literal lit = "Hello"; foo(lit);
Why this type?
If string_ref is nothing more than a pair of iterators with a few additional member functions, I find iterator_range< const char* > much more superior because it has the begin()/end() extension mechanism.
That forces every call to extract or compute an iterator range, which is less convenient and more error prone.
The member algorithms can easily be replaced with the general ones, so they don't really add any value to string_ref.
I agree that member versus free is a matter of syntax except for subscripting. (There may be more exceptions, but that one occurred to me.) Subscripting isn't critical, but certainly is convenient and string-like.
And if you add yet another zstring_ref to that zoo, you're only making things worse.
It's only for the times when null termination is required. The two types could even be the same class template with different termination policies.
If string_ref is to be proposed for inclusion (and yes, I would like it to follow the common protocol for the new libraries and not silently committed) the first thing I would like to know is how it is better than iterator_range< const char* > and what problems it solves that can't be solved with iterator_range. If there aren't any significant advantages I'd prefer not to introduce yet another string type.
How'd I do? ___ Rob
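The following is a hypothetical sketch of Rob's "one class template, different termination policies" idea; all names are illustrative. Both policies accept sources that are known to be terminated. Only the non-terminated policy accepts a raw pointer plus length, and only the terminated policy exposes c_str(). A terminated ref converts implicitly to a non-terminated one, but not the other way round.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical policy tags.
struct not_terminated {};
struct null_terminated {};

template <class Policy>
class basic_char_ref {
public:
    // Both policies: these sources are known to be followed by '\0'.
    basic_char_ref(const char* s) : p_(s), n_(std::strlen(s)) {}
    basic_char_ref(const std::string& s) : p_(s.c_str()), n_(s.size()) {}

    // Pointer plus length carries no termination guarantee: not_terminated only.
    template <class P = Policy,
              class = typename std::enable_if<std::is_same<P, not_terminated>::value>::type>
    basic_char_ref(const char* p, std::size_t n) : p_(p), n_(n) {}

    // Widening conversion: a terminated ref is also a valid unterminated one.
    template <class P = Policy,
              class = typename std::enable_if<std::is_same<P, not_terminated>::value>::type>
    basic_char_ref(const basic_char_ref<null_terminated>& z) : p_(z.data()), n_(z.size()) {}

    const char* data() const noexcept { return p_; }
    std::size_t size() const noexcept { return n_; }

    // c_str() exists only for the null_terminated policy.
    template <class P = Policy,
              class = typename std::enable_if<std::is_same<P, null_terminated>::value>::type>
    const char* c_str() const noexcept { return p_; }

private:
    const char* p_;
    std::size_t n_;
};

using char_ref = basic_char_ref<not_terminated>;
using zchar_ref = basic_char_ref<null_terminated>;
```

Each policy yields a distinct type, which is the API/ABI cost Andrey raises in reply: a compiled library has to pick one instantiation per parameter.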

On Mon, Nov 26, 2012 at 2:23 PM, Rob Stewart <robertstewart@comcast.net> wrote:
On Nov 25, 2012, at 7:30 AM, Andrey Semashev <andrey.semashev@gmail.com> wrote:
If I design my library interface (let's assume it doesn't use legacy APIs internally for now), what string type should I use?
std::string is appropriate, unless you care about copies and free store allocations, in which case, we're suggesting string_ref.
The problem with std::string is the same as with string_ref - it doesn't support implicit construction from an arbitrary range, so my examples with custom string types would still not work.
I want my library to be used with any string type and I don't want to provide overloads for all possible string types.
That's an impossible order, unless you add compile-time dispatching to your code, and then "all possible string types" means as many as you care to support. boost::string_ref can be extended similarly, but that would never work for std::string_ref.
It is possible, if the third-party strings follow the begin()/end() protocol. Ok, it's not all possible string types but it is at least extensible.
It would seem that string_ref is the answer, but I don't see any support for third-party string types in it. I will be able to do this:
void foo(string_ref const&);
foo("hello");
string str; foo(str);
It can also support iterator pairs and even std::vector<char>.
How? Did I miss a string_ref constructor from a range?
string_literal lit = "Hello"; foo(lit);
Why this type?
If string_ref is nothing more than a pair of iterators with a few additional member functions, I find iterator_range< const char* > much more superior because it has the begin()/end() extension mechanism.
That forces every call to extract or compute an iterator range, which is less convenient and more error prone.
No, this is not needed. iterator_range has implicit constructor from a range, so the conversion will be hidden from both the user and the library developer.
The member algorithms can easily be replaced with the general ones, so they don't really add any value to string_ref.
I agree that member versus free is a matter of syntax except for subscripting. (There may be more exceptions, but that one occurred to me.) Subscripting isn't critical, but certainly is convenient and string-like.
iterator_range has operator[] for random access iterators.
And if you add yet another zstring_ref to that zoo, you're only making things worse.
It's only for the times when null termination is required. The two types could even be the same class template with different termination policies.
Extracting the termination policy into a template parameter is a possibility, but it has drawbacks of its own. It makes it harder to provide a stable API/ABI for compiled libraries.
If string_ref is to be proposed for inclusion (and yes, I would like it to follow the common protocol for the new libraries and not silently committed) the first thing I would like to know is how it is better than iterator_range< const char* > and what problems it solves that can't be solved with iterator_range. If there aren't any significant advantages I'd prefer not to introduce yet another string type.
How'd I do?
So far I can see only one significant difference between string_ref and iterator_range: string_ref is assumed to refer to a contiguous range. I'm not sure the distinction is enough to create a new library rather than extend Boost.Range and Boost.Iterator with a notion of a contiguous range and iterator. You could call the new range type string_ref, but that unnecessarily narrows the scope of the component. After all, why not have a contiguous range of ints, for example?

On Mon, Nov 26, 2012 at 12:56 PM, Andrey Semashev <andrey.semashev@gmail.com> wrote:
The problem with std::string is the same as with string_ref - it doesn't support implicit construction from an arbitrary range, so my examples with custom string types would still not work.
You're right, although I expect both to get support when detection of contiguous ranges becomes possible.
the component. After all, why not have a contiguous range of ints, for example?
Like boost::iterator_range<const int*>? -- Olaf

On Mon, Nov 26, 2012 at 4:03 PM, Olaf van der Spek <ml@vdspek.org> wrote:
On Mon, Nov 26, 2012 at 12:56 PM, Andrey Semashev <andrey.semashev@gmail.com> wrote:
the component. After all, why not have a contiguous range of ints, for example?
Like boost::iterator_range<const int*>?
Yes.

On 26 November 2012 11:56, Andrey Semashev <andrey.semashev@gmail.com> wrote:
The problem with std::string is the same as with string_ref - it doesn't support implicit construction from an arbitrary range, so my examples with custom string types would still not work.
Shouldn't construction from an arbitrary range be explicit? Arbitrary implicit conversions are problematic. To get implicit construction from third party strings, I'd use some sort of explicit customisation mechanism.

On Mon, Nov 26, 2012 at 4:52 PM, Daniel James <dnljms@gmail.com> wrote:
On 26 November 2012 11:56, Andrey Semashev <andrey.semashev@gmail.com> wrote:
The problem with std::string is the same as with string_ref - it doesn't support implicit construction from an arbitrary range, so my examples with custom string types would still not work.
Shouldn't construction from an arbitrary range be explicit? Arbitrary implicit conversions are problematic. To get implicit construction from third party strings, I'd use some sort of explicit customisation mechanism.
If the string_ref or range type (let's call it contiguous_range< const char* >) is not implicitly convertible from other string types, then it is useless for the use cases I pointed out. The point is to make interfaces transparently support any string type. As a side note, I wonder if it should be contiguous_range< const char* > or contiguous_range< const char >? Since the range is contiguous, we can always store pointers internally, even if the referred range has other iterator types (e.g. std::vector).
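Here is a minimal sketch of the contiguous_range< const char > variant. It assumes C++17 for std::data and std::size, and the class itself is hypothetical. It stores a pointer and a length, so any source that exposes contiguous storage converts implicitly, whatever its iterator type. There is one catch: a raw array or string literal also converts, with std::size counting the terminating zero. A real design would therefore still want a separate literal path.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical contiguous range; T is the element type, not the iterator.
template <class T>
class contiguous_range {
public:
    constexpr contiguous_range(T* p, std::size_t n) noexcept : p_(p), n_(n) {}

    // Implicit conversion from any lvalue whose std::data() converts to T*.
    template <class R,
              class = typename std::enable_if<std::is_convertible<
                  decltype(std::data(std::declval<R&>())), T*>::value>::type>
    contiguous_range(R& r) : p_(std::data(r)), n_(std::size(r)) {}

    T* begin() const noexcept { return p_; }
    T* end() const noexcept { return p_ + n_; }
    std::size_t size() const noexcept { return n_; }
    T& operator[](std::size_t i) const noexcept { return p_[i]; }

private:
    T* p_;
    std::size_t n_;
};

// Hypothetical third-party string type exposing data()/size().
struct legacy_string {
    const char* data() const { return "abc"; }
    std::size_t size() const { return 3; }
};
```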

On 26 November 2012 13:26, Andrey Semashev <andrey.semashev@gmail.com> wrote:
On Mon, Nov 26, 2012 at 4:52 PM, Daniel James <dnljms@gmail.com> wrote:
On 26 November 2012 11:56, Andrey Semashev <andrey.semashev@gmail.com> wrote:
The problem with std::string is the same as with string_ref - it doesn't support implicit construction from an arbitrary range, so my examples with custom string types would still not work.
Shouldn't construction from an arbitrary range be explicit? Arbitrary implicit conversions are problematic. To get implicit construction from third party strings, I'd use some sort of explicit customisation mechanism.
If the string_ref or range type (let's call it contiguous_range< const char* >)
A string isn't the same thing as a range of characters.
is not implicitly convertible from other string types then it is useless for use cases I pointed out. The thing is to make interfaces transparently support any string types.
That's why I suggested a customisation mechanism. Something would allow you to indicate that a third party type is a string and, if necessary, how to get a string_ref from it. Perhaps an ADL hook, or a template class that is specialized for strings, or something else entirely.
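This sketches one possible shape for such a customisation mechanism: a specialised traits template. All names are hypothetical, and `std::string_view` stands in for the referenced characters. A type opts in by specialising `string_traits`, and string_ref then converts from it implicitly. Anything else, such as a std::vector<char> of binary data, needs an explicit construction.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical customisation point. The primary template is empty, so
// unknown types are simply not strings.
template <class T>
struct string_traits {};

// Opt-in for std::string.
template <>
struct string_traits<std::string> {
    static std::string_view view(const std::string& s) { return s; }
};

// Hypothetical third-party string type, opted in by its author.
struct third_party_string {
    std::vector<char> buf;
};

template <>
struct string_traits<third_party_string> {
    static std::string_view view(const third_party_string& s) {
        return std::string_view(s.buf.data(), s.buf.size());
    }
};

// Stand-in for string_ref: implicit from literals and opted-in string types,
// explicit from any other run of characters.
class string_ref {
public:
    string_ref(const char* s) : sv_(s) {}

    template <class T,
              class = decltype(string_traits<T>::view(std::declval<const T&>()))>
    string_ref(const T& s) : sv_(string_traits<T>::view(s)) {}

    explicit string_ref(std::string_view sv) : sv_(sv) {}

    std::size_t size() const noexcept { return sv_.size(); }
    char operator[](std::size_t i) const { return sv_[i]; }

private:
    std::string_view sv_;
};
```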

On Mon, Nov 26, 2012 at 5:48 PM, Daniel James <dnljms@gmail.com> wrote:
On 26 November 2012 13:26, Andrey Semashev <andrey.semashev@gmail.com> wrote:
On Mon, Nov 26, 2012 at 4:52 PM, Daniel James <dnljms@gmail.com> wrote:
On 26 November 2012 11:56, Andrey Semashev <andrey.semashev@gmail.com> wrote:
The problem with std::string is the same as with string_ref - it doesn't support implicit construction from an arbitrary range, so my examples with custom string types would still not work.
Shouldn't construction from an arbitrary range be explicit? Arbitrary implicit conversions are problematic. To get implicit construction from third party strings, I'd use some sort of explicit customisation mechanism.
If the string_ref or range type (let's call it contiguous_range< const char* >)
A string isn't the same thing as a range of characters.
Why?
is not implicitly convertible from other string types then it is useless for use cases I pointed out. The thing is to make interfaces transparently support any string types.
That's why I suggested a customisation mechanism. Something would allow you to indicate that a third party type is a string and, if necessary, how to get a string_ref from it. Perhaps an ADL hook, or a template class that is specialized for strings, or something else entirely.
That would mean that the range is limited to strings only. I'm not sure this limitation is justified.

On Mon, Nov 26, 2012 at 6:13 AM, Andrey Semashev <andrey.semashev@gmail.com> wrote:
On Mon, Nov 26, 2012 at 5:48 PM, Daniel James <dnljms@gmail.com> wrote:
On 26 November 2012 13:26, Andrey Semashev <andrey.semashev@gmail.com> wrote:
On Mon, Nov 26, 2012 at 4:52 PM, Daniel James <dnljms@gmail.com> wrote:
On 26 November 2012 11:56, Andrey Semashev <andrey.semashev@gmail.com> wrote:
The problem with std::string is the same as with string_ref - it doesn't support implicit construction from an arbitrary range, so my examples with custom string types would still not work.
Shouldn't construction from an arbitrary range be explicit? Arbitrary implicit conversions are problematic. To get implicit construction from third party strings, I'd use some sort of explicit customisation mechanism.
If the string_ref or range type (let's call it contiguous_range< const char* >)
A string isn't the same thing as a range of characters.
Why?
is not implicitly convertible from other string types then it is useless for use cases I pointed out. The thing is to make interfaces transparently support any string types.
That's why I suggested a customisation mechanism. Something would allow you to indicate that a third party type is a string and, if necessary, how to get a string_ref from it. Perhaps an ADL hook, or a template class that is specialized for strings, or something else entirely.
That would mean that the range is limited to strings only. I'm not sure this limitation is justified.
I agree with Andrey. At least for what Marshall was originally proposing this for, iterator_range< char [const] * > seems sufficient for the API of string algorithms. I don't even think a contiguous_range template need be defined (as I had suggested earlier and as Andrey had also mentioned); just partially specialize iterator_range< T * > if you want a broader API than what iterator_range already offers. E.g., add implicit conversions from other contiguous ranges of Ts (this assumes some hook to determine whether a range is a contiguous range; I like the previous suggestion to add a new traversal category deriving from random access). - Jeff

On 26 November 2012 14:13, Andrey Semashev <andrey.semashev@gmail.com> wrote:
On Mon, Nov 26, 2012 at 5:48 PM, Daniel James <dnljms@gmail.com> wrote:
A string isn't the same thing as a range of characters.
Why?
Strings are one of the most important types in programming, and they're usually handled differently from ranges. A string is essentially a thing, not a sequence of things. Using a range confuses the string's representation with its type. A range of characters can be binary data or a collection of small integers; I don't want them coming anywhere near string-handling code, and in C++ the type system is the best way to handle that. It should be easy to have distinct overloads for strings and ranges.
is not implicitly convertible from other string types then it is useless for use cases I pointed out. The thing is to make interfaces transparently support any string types.
That's why I suggested a customisation mechanism. Something would allow you to indicate that a third party type is a string and, if necessary, how to get a string_ref from it. Perhaps an ADL hook, or a template class that is specialized for strings, or something else entirely.
That would mean that the range is limited to strings only.
Not at all, you could still explicitly convert to a string_ref.
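Here is a minimal sketch of Daniel's point about distinct overloads, assuming C++17 `std::string_view` as the string type. The `describe` overloads are hypothetical. Because strings have their own type, a string overload and a generic range overload can coexist, and a vector of bytes never lands in the string path.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>
#include <vector>

// String path: literals and std::string convert to std::string_view.
inline std::string describe(std::string_view) { return "string"; }

// Range path: anything that is not convertible to a string type.
template <class Range,
          class = typename std::enable_if<
              !std::is_convertible<const Range&, std::string_view>::value>::type>
std::string describe(const Range&) { return "range"; }
```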

On Tue, Nov 27, 2012 at 3:42 PM, Daniel James <dnljms@gmail.com> wrote:
On 26 November 2012 14:13, Andrey Semashev <andrey.semashev@gmail.com> wrote:
On Mon, Nov 26, 2012 at 5:48 PM, Daniel James <dnljms@gmail.com> wrote:
A string isn't the same thing as a range of characters.
Why?
Strings are one of the most important types in programming, and they're usually handled differently from ranges. A string is essentially a thing, not a sequence of things. Using a range confuses the string's representation for its type.
I guess, that depends on the perspective. I imagine strings as sequences of characters and typically my string processing code is similar to that of other ranges. You just have a certain additional knowledge of its elements' nature.
A range of characters can be binary data, or a collection of small integers, I don't want them coming anywhere near string handling code, and in C++ the type system is the best way to handle that. It should be easy to have distinct overloads for strings and ranges.
Probably, although I can hardly imagine overloads of a single method that have different semantics for a string and a range of characters. This would likely be the sign of a poor interface. OTOH, interfaces that just want to accept "some string" are common.
That's why I suggested a customisation mechanism. Something would allow you to indicate that a third party type is a string and, if necessary, how to get a string_ref from it. Perhaps an ADL hook, or a template class that is specialized for strings, or something else entirely.
That would mean that the range is limited to strings only.
Not at all, you could still explicitly convert to a string_ref.
I would have been happy with what you suggest if I were sure this customization mechanism was going to be adopted by third-party strings. I have good faith in begin()/end() adoption, at least because this mechanism is used by the language core (range-based for), and it is logical for strings to support it. I am not so sure about a type trait or some other mechanism that detects string types.

26.11.12, 20:54, "Daniel James" <dnljms@gmail.com>:
On 26 November 2012 11:56, Andrey Semashev <andrey.semashev@gmail.com> wrote:
The problem with std::string is the same as with string_ref - it doesn't support implicit construction from an arbitrary range, so my examples with custom string types would still not work.
Shouldn't construction from an arbitrary range be explicit? Arbitrary implicit conversions are problematic. To get implicit construction from third party strings, I'd use some sort of explicit customisation mechanism.
Hi Daniel, I'm fully with you here. The construction of string_ref should be explicit (except maybe for literals, if we can detect them at compile time), as we're giving away pointers to something that is externally managed. Look at std::string::c_str/data, any smart_ptr::get etc. - everything is explicit for a reason. Implicit conversions to pointers are very dangerous here. No container gives away its internals implicitly, and this is a Good Thing. Consider a vector of pointers, for example. You don't want to implicitly put a pointer managed by a smart pointer there; you want it to be explicit, so it's visible and you don't forget to put a reset/release near the push_back. The same applies to a vector of string_ref: you want to be sure that the string referred to by it lives as long as needed, and there is no other way except explicit construction. Thanks, Maxim

On Mon, Nov 26, 2012 at 7:03 PM, Yanchenko Maxim <maximyanchenko@yandex.ru> wrote:
26.11.12, 20:54, "Daniel James" <dnljms@gmail.com>:
On 26 November 2012 11:56, Andrey Semashev <andrey.semashev@gmail.com> wrote:
The problem with std::string is the same as with string_ref - it doesn't support implicit construction from an arbitrary range, so my examples with custom string types would still not work.
Shouldn't construction from an arbitrary range be explicit? Arbitrary implicit conversions are problematic. To get implicit construction from third party strings, I'd use some sort of explicit customisation mechanism.
Hi Daniel, I'm fully with you here. The construction of string_ref should be explicit (except maybe for literals, if we can detect them at compile time), as we're giving away pointers to something that is externally managed. Look at std::string::c_str/data, any smart_ptr::get etc. - everything is explicit for a reason. Implicit conversions to pointers are very dangerous here. No container gives away its internals implicitly, and this is a Good Thing.
Consider a vector of pointers, for example. You don't want to implicitly put a pointer managed by a smart pointer there; you want it to be explicit, so it's visible and you don't forget to put a reset/release near the push_back. The same applies to a vector of string_ref: you want to be sure that the string referred to by it lives as long as needed, and there is no other way except explicit construction.
I was under the impression that string_ref's proposed purpose was to provide a generic concrete (i.e., non-template) interface for string algorithms (both as parameters and results), and for this purpose, the above arguments seem irrelevant. The lifetime concerns are no worse than for a std::string const &. - Jeff

On Tue, Nov 27, 2012 at 9:48 AM, Jeffrey Lee Hellrung, Jr. <jeffrey.hellrung@gmail.com> wrote:
On Mon, Nov 26, 2012 at 7:03 PM, Yanchenko Maxim <maximyanchenko@yandex.ru>wrote:
Consider a vector of pointers, for example. You don't want to implicitly put a pointer managed by a smart pointer there, you want it to be explicit so it's visible and you don't forget to put reset/release near a push_back. Same applies to a vector of string_ref - you want to be sure that the string referred to by it lives as long as needed, and there is no other way except explicit construction.
I was under the impression that string_ref's proposed purpose was to provide a generic concrete (i.e., non-template) interface for string algorithms (both as parameters and results), and for this purpose, the above arguments seem irrelevant. The lifetime concerns are no worse than for a std::string const &.
+1, string_ref by definition doesn't own the referred characters. Its presence is enough indication that the storage of the characters has to be managed elsewhere. And from the user's perspective it should be as transparent as possible.

27.11.12, 14:24, "Andrey Semashev" <andrey.semashev@gmail.com>":
+1, string_ref by definition doesn't own the referred characters. Its presence is enough indication that the storage of the characters has to be managed elsewhere.
Unfortunately there is absolutely no indication of its presence if you make the construction implicit. Implicitly constructible types are invisible at the point of use; this is the problem. This is exactly what I'm talking about: there should be a clear indication of what's going on, and it should be at the point of use, not in the function declaration or in the docs. Making it implicit invites error-prone user code.
And from the user's perspective it should be as transparent as possible.
What about making implicit conversion from auto_ptr<T> to T*? Enforcing use of get() is not as transparent as possible either. What about getting rid of c_str()? Isn't safety the reason for having both? Thanks, Maxim

On Tue, Nov 27, 2012 at 10:44 AM, Yanchenko Maxim <maximyanchenko@yandex.ru> wrote:
27.11.12, 14:24, "Andrey Semashev" <andrey.semashev@gmail.com>":
+1, string_ref by definition doesn't own the referred characters. Its presence is enough indication that the storage of the characters has to be managed elsewhere.
Unfortunately there is absolutely no indication of its presence if you make the construction implicit. Implicitly constructible types are invisible at the point of use; this is the problem. This is exactly what I'm talking about: there should be a clear indication of what's going on, and it should be at the point of use, not in the function declaration or in the docs. Making it implicit invites error-prone user code.
And from the user's perspective it should be as transparent as possible.
What about making implicit conversion from auto_ptr<T> to T*? Enforcing use of get() is not as transparent as possible either. What about getting rid of c_str()? Isn't safety the reason for having both?
When passing arguments to functions, c_str() and get() are exactly the things we want to get rid of. That's where string_ref helps. If you want to store string_ref in a container then either bear in mind that the string_refs do not own the storage or choose a different type for that purpose.

27.11.12, 13:51, "Jeffrey Lee Hellrung, Jr." <jeffrey.hellrung@gmail.com>":
On Mon, Nov 26, 2012 at 7:03 PM, Yanchenko Maxim <maximyanchenko@yandex.ru>wrote:
Hi Daniel, I'm fully with you here. The construction of string_ref should be explicit (except maybe literals if we can detect them at compile time) as we're giving away pointers to something that is externally managed. Look at std::string::c_str/data, any smart_ptr::get etc - everything is explicit for a reason. Implicit conversions to pointers are very dangerous here. No containers give away their internals implicitly, and this is a Good Thing.
Consider a vector of pointers, for example. You don't want to implicitly put a pointer managed by a smart pointer there, you want it to be explicit so it's visible and you don't forget to put reset/release near a push_back. Same applies to a vector of string_ref - you want to be sure that the string referred to by it lives as long as needed, and there is no other way except explicit construction.
I was under the impression that string_ref's proposed purpose was to provide a generic concrete (i.e., non-template) interface for string algorithms (both as parameters and results), and for this purpose, the above arguments seem irrelevant. The lifetime concerns are no worse than for a std::string const &.
Should we ban vector<string_ref> then? Otherwise it's error-prone: vector<string_ref> v; std::string s; v.push_back(s); Containers of string_ref are very useful while working with external textual data, e.g. parsing - to store various chunks of parsed text like identifiers. But if you want to add some predefined identifiers from other sources like std::string - you want the compiler to warn you if you're going to do something dangerous. Explicit construction is very useful and saves you from leaks and ownership errors. This is from my team's real experience with our string_ref-like class. Thanks, Maxim
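A minimal sketch of the explicit-only design argued for above, assuming a hypothetical `char_range` class (the name Maxim's team uses for its own string_ref-like type; this is not boost::string_ref). Because the constructor from `std::string` is `explicit`, `v.push_back(s)` no longer compiles and the caller has to write `char_range(s)`, which makes the borrow visible at the point of use.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical non-owning view over characters stored elsewhere.
class char_range {
public:
    char_range() : ptr_(nullptr), len_(0) {}
    char_range(const char* p, std::size_t n) : ptr_(p), len_(n) {}

    // Explicit: borrowing a std::string's buffer must be spelled out.
    explicit char_range(const std::string& s) : ptr_(s.data()), len_(s.size()) {}

    const char* data() const { return ptr_; }
    std::size_t size() const { return len_; }

private:
    const char* ptr_;
    std::size_t len_;
};

// The implicit conversion is gone...
static_assert(!std::is_convertible<const std::string&, char_range>::value,
              "std::string must not convert implicitly");
// ...but explicit construction is still available.
static_assert(std::is_constructible<char_range, const std::string&>::value,
              "explicit construction from std::string is allowed");

// Usage: every borrow is visible where the element is added.
inline std::vector<char_range> collect(const std::vector<std::string>& names) {
    std::vector<char_range> v;
    for (const std::string& s : names)
        v.push_back(char_range(s));  // v.push_back(s) would not compile
    return v;  // valid only while `names` is alive and unmodified
}
```

The trade-off Jeff raises still applies: the explicit form is noisier at every call site, including plain function calls where the lifetime is not in question.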

On Tue, Nov 27, 2012 at 10:56 AM, Yanchenko Maxim <maximyanchenko@yandex.ru> wrote:
27.11.12, 13:51, "Jeffrey Lee Hellrung, Jr." <jeffrey.hellrung@gmail.com>":
On Mon, Nov 26, 2012 at 7:03 PM, Yanchenko Maxim <maximyanchenko@yandex.ru>wrote:
Hi Daniel, I'm fully with you here. The construction of string_ref should be explicit (except maybe literals if we can detect them at compile time) as we're giving away pointers to something that is externally managed. Look at std::string::c_str/data, any smart_ptr::get etc - everything is explicit for a reason. Implicit conversions to pointers are very dangerous here. No containers give away their internals implicitly, and this is a Good Thing.
Consider a vector of pointers, for example. You don't want to implicitly put a pointer managed by a smart pointer there, you want it to be explicit so it's visible and you don't forget to put reset/release near a push_back. Same applies to a vector of string_ref - you want to be sure that the string referred to by it lives as long as needed, and there is no other way except explicit construction.
I was under the impression that string_ref's proposed purpose was to provide a generic concrete (i.e., non-template) interface for string algorithms (both as parameters and results), and for this purpose, the above arguments seem irrelevant. The lifetime concerns are no worse than for a std::string const &.
Should we ban vector<string_ref> then? Otherwise it's error-prone:
vector<string_ref> v; std::string s; v.push_back(s);
Containers of string_ref are very useful while working with external textual data, e.g. parsing - to store various chunks of parsed text like identifiers. But if you want to add some predefined identifiers from other sources like std::string - you want the compiler to warn you if you're going to do something dangerous. Explicit construction is very useful and saves you from leaks and ownership errors. This is from my team's real experience with our string_ref-like class.
I think banning is an overstatement. Caution must be exercised, true. Just as much caution as with the following code: vector<pair<const char*, size_t> > v; std::string s; v.push_back(make_pair(s.c_str(), s.size())); I admit it's more explicit, but nevertheless the string_ref variant is still valid.

On Mon, Nov 26, 2012 at 10:56 PM, Yanchenko Maxim <maximyanchenko@yandex.ru>wrote:
27.11.12, 13:51, "Jeffrey Lee Hellrung, Jr." <jeffrey.hellrung@gmail.com>":
On Mon, Nov 26, 2012 at 7:03 PM, Yanchenko Maxim <maximyanchenko@yandex.ru>wrote:
Hi Daniel, I'm fully with you here. The construction of string_ref should be explicit (except maybe literals if we can detect them at compile time) as we're giving away pointers to something that is externally managed. Look at std::string::c_str/data, any smart_ptr::get etc - everything is explicit for a reason. Implicit conversions to pointers are very dangerous here. No containers give away their internals implicitly, and this is a Good Thing.
Consider a vector of pointers, for example. You don't want to implicitly put a pointer managed by a smart pointer there, you want it to be explicit so it's visible and you don't forget to put reset/release near a push_back. Same applies to a vector of string_ref - you want to be sure that the string referred to by it lives as long as needed, and there is no other way except explicit construction.
I was under the impression that string_ref's proposed purpose was to provide a generic concrete (i.e., non-template) interface for string algorithms (both as parameters and results), and for this purpose, the above arguments seem irrelevant. The lifetime concerns are no worse than for a std::string const &.
Should we ban vector<string_ref> then?
Well, I don't know, we don't ban vector< reference_wrapper<T> > or vector< iterator_range<I> > (do we?), so probably not.
Otherwise it's error-prone:
vector<string_ref> v; std::string s; v.push_back(s);
I fail to see the error in the above 3 lines :)
Containers of string_ref are very useful while working with external textual data, e.g. parsing - to store various chunks of parsed text like identifiers. But if you want to add some predefined identifiers from other sources like std::string - you want the compiler to warn you if you're going to do something dangerous. Explicit construction is very useful and saves you from leaks and ownership errors.
Yes, but it also makes code uglier and less readable. I think we can agree there's a balance.
This is from my team's real experience with our string_ref-like class.
Good to know, and, on the flip side, I can't speak from such experience, so you got me there. But as far as I'm concerned, string_ref isn't much more than iterator_range< char [const] * >, and iterator_range< char [const] * > already has some of the implicit constructors one might be interested in. So, it's hard for me to see how we're creating this huge trap for ownership issues; if anything, one already exists. Do you likewise advocate deprecation of iterator_range's implicit constructors? - Jeff

On 27 November 2012 03:03, Yanchenko Maxim <maximyanchenko@yandex.ru> wrote:
26.11.12, 20:54, "Daniel James" <dnljms@gmail.com>":
Shouldn't construction from an arbitrary range be explicit? Arbitrary implicit conversions are problematic. To get implicit construction from third party strings, I'd use some sort of explicit customisation mechanism.
Hi Daniel, I'm fully with you here. The construction of string_ref should be explicit (except maybe literals if we can detect them at compile time) as we're giving away pointers to something that is externally managed.
Sorry, I must not have been clear enough, since a few people seem to have misunderstood me. I think that string_ref should be implicitly convertible from appropriate strings, explicitly convertible from other types. Implicit conversion from third party strings could be supported by customisation.
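Daniel's middle ground can be sketched with a trait used as an opt-in customisation point. Everything here is hypothetical: `is_string_like` is an illustrative name, and `third_party::fixed_string` stands in for some external string type. Types that opt in convert implicitly; other contiguous char containers such as `std::vector<char>` still convert, but only explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical customisation point: specialise to allow implicit conversion.
template <class T>
struct is_string_like : std::false_type {};

template <>
struct is_string_like<std::string> : std::true_type {};

class string_ref {
public:
    string_ref(const char* s) : ptr_(s), len_(std::strlen(s)) {}

    // Implicit for types that opted in as "appropriate strings".
    template <class S,
              typename std::enable_if<is_string_like<S>::value, int>::type = 0>
    string_ref(const S& s) : ptr_(s.data()), len_(s.size()) {}

    // Explicit for any other type exposing data()/size().
    template <class R,
              typename std::enable_if<!is_string_like<R>::value, int>::type = 0>
    explicit string_ref(const R& r) : ptr_(r.data()), len_(r.size()) {}

    const char* data() const { return ptr_; }
    std::size_t size() const { return len_; }

private:
    const char* ptr_;
    std::size_t len_;
};

namespace third_party {  // stand-in for an external string type
struct fixed_string {
    char buf[16];
    std::size_t n;
    const char* data() const { return buf; }
    std::size_t size() const { return n; }
};
}

// The third-party type opts in without touching string_ref itself.
template <>
struct is_string_like<third_party::fixed_string> : std::true_type {};

static_assert(std::is_convertible<std::string, string_ref>::value, "implicit");
static_assert(std::is_convertible<third_party::fixed_string, string_ref>::value,
              "implicit after opting in");
static_assert(!std::is_convertible<std::vector<char>, string_ref>::value,
              "not implicit");
static_assert(std::is_constructible<string_ref, std::vector<char>>::value,
              "but explicitly constructible");
```

The specialisation must be visible before the first conversion from that type, which is the usual constraint on trait-based customisation.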

On 26/11/12 13:52, Daniel James wrote:
On 26 November 2012 11:56, Andrey Semashev <andrey.semashev@gmail.com> wrote:
The problem with std::string is the same as with string_ref - it doesn't support implicit construction from an arbitrary range, so my examples with custom string types would still not work.
Shouldn't construction from an arbitrary range be explicit? Arbitrary implicit conversions are problematic. To get implicit construction from third party strings, I'd use some sort of explicit customisation mechanism.
I never understood what problems people had with implicit conversion. Sure, a template implicit conversion operator is dangerous, but all other uses, in particular implicit constructors, have no problem really.

On Nov 26, 2012, at 6:56 AM, Andrey Semashev <andrey.semashev@gmail.com> wrote:
On Mon, Nov 26, 2012 at 2:23 PM, Rob Stewart <robertstewart@comcast.net> wrote:
On Nov 25, 2012, at 7:30 AM, Andrey Semashev <andrey.semashev@gmail.com> wrote:
If I design my library interface (let's assume it doesn't use legacy APIs internally for now), what string type should I use?
std::string is appropriate, unless you care about copies and free store allocations, in which case, we're suggesting string_ref.
The problem with std::string is the same as with string_ref - it doesn't support implicit construction from an arbitrary range, so my examples with custom string types would still not work.
That's right. We have no universal string/range type for that purpose, so you use the standard string type.
I want my library to be used with any string type and I don't want to provide overloads for all possible string types.
That's an impossible order, unless you add compile-time dispatching to your code, and then "all possible string types" means as many as you care to support. boost::string_ref can be extended similarly, but that would never work for std::string_ref.
It is possible, if the third-party strings follow the begin()/end() protocol.
Now you're changing the rules. TP strings don't all provide iterators.
Ok, it's not all possible string types but it is at least extensible.
Sure. That's within the criteria I mentioned.
It would seem that string_ref is the answer, but I don't see any support for third-party string types in it. I will be able to do this:
void foo(string_ref const&);
foo("hello");
string str; foo(str);
It can also support iterator pairs and even std::vector<char>.
How? Did I miss a string_ref constructor from a range?
I haven't been able to examine the proposed string_ref yet. I'm speaking of possibilities and my own class.
string_literal lit = "Hello"; foo(lit);
Why this type?
If string_ref is nothing more than a pair of iterators with a few additional member functions, I find iterator_range< const char* > far superior because it has the begin()/end() extension mechanism.
That forces every call to extract or compute an iterator range, which is less convenient and more error prone.
No, this is not needed. iterator_range has implicit constructor from a range, so the conversion will be hidden from both the user and the library developer.
That only applies to types recognized as ranges. It isn't all string types. The same support should be part of string_ref, but an important distinction is that string_ref requires a contiguous range.
The member algorithms can easily be replaced with the general ones, so they don't really add any value to string_ref.
I agree that member versus free is a matter of syntax except for subscripting. (There may be more exceptions, but that one occurred to me.) Subscripting isn't critical, but certainly is convenient and string-like.
iterator_range has operator[] for random access iterators.
OK, I'll have to check for any other examples.
And if you add yet another zstring_ref to that zoo, you're only making things worse.
It's only for the times when null termination is required. The two types could even be the same class template with different termination policies.
Extracting termination policy to a template parameter is a possibility but it has drawbacks of its own. It makes it harder to provide a stable API/ABI for compiled libraries.
You'd only use the terminated one in APIs in rare cases, so a separate class is simpler.
If string_ref is to be proposed for inclusion (and yes, I would like it to follow the common protocol for the new libraries and not be silently committed) the first thing I would like to know is how it is better than iterator_range< const char* > and what problems it solves that can't be solved with iterator_range. If there aren't any significant advantages I'd prefer not to introduce yet another string type.
How'd I do?
So far I can see only one significant difference of string_ref from an iterator_range: string_ref is assumed to refer to a contiguous range. I'm not sure the distinction is enough to create a new library rather than extend Boost.Range and Boost.Iterator to introduce a notion of a contiguous range and iterator thereof. You could call the new range type string_ref but that unnecessarily narrows the scope of the component. After all, why not have a contiguous range of ints, for example?
There are semantic differences between a contiguous range of characters and a string, but a contiguous range type would be useful in and of itself. ___ Rob
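A minimal sketch of the generic contiguous range discussed above, assuming a hypothetical `contiguous_range<T>` template (the name Andrey uses later in the thread; the class is illustrative, not a Boost component). It works for any element type, and a `string_ref` then reduces to `contiguous_range<const char>` plus a constructor from a C string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical non-owning view over a contiguous sequence of T.
template <class T>
class contiguous_range {
public:
    contiguous_range() : first_(nullptr), last_(nullptr) {}
    contiguous_range(T* first, T* last) : first_(first), last_(last) {}

    T* begin() const { return first_; }
    T* end() const { return last_; }
    std::size_t size() const { return static_cast<std::size_t>(last_ - first_); }
    bool empty() const { return first_ == last_; }
    T& operator[](std::size_t i) const { return first_[i]; }  // no bounds check

private:
    T* first_;
    T* last_;
};

// string_ref as a contiguous range of const char plus C-string construction.
class string_ref : public contiguous_range<const char> {
public:
    using contiguous_range<const char>::contiguous_range;  // (first, last)
    string_ref(const char* s)
        : contiguous_range<const char>(s, s + std::strlen(s)) {}
};
```

Whether the semantic differences Rob mentions justify a separate string-flavoured interface on top is exactly the open question; the sketch only shows that the storage model is shared.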

On Tue, Nov 27, 2012 at 2:36 PM, Rob Stewart <robertstewart@comcast.net> wrote:
On Nov 26, 2012, at 6:56 AM, Andrey Semashev <andrey.semashev@gmail.com> wrote:
The problem with std::string is the same as with string_ref - it doesn't support implicit construction from an arbitrary range, so my examples with custom string types would still not work.
That's right. We have no universal string/range type for that purpose, so you use the standard string type.
My point was that, in my understanding, string_ref is aimed to solve this issue in a transparent way but the proposal lacks the necessary interface. I would have used string_ref to unify string-related interfaces if it transparently supported multiple string types, not limited by those defined in STL (and Boost, if boost::string_ref is to be implemented). Limiting it to particular types defeats its purpose.
It is possible, if the third-party strings follow the begin()/end() protocol.
Now you're changing the rules. TP strings don't all provide iterators.
Any reasonable string type will have some notion of iterators, be that custom types or pointers or a pointer and a size, whatever. As long as this holds, the third-party string type can be adopted. I understand that not all (nearly none?) third-party strings support begin()/end() protocol now, but I expect them to support eventually. Even if they don't, the necessary overloads can be provided externally.
No, this is not needed. iterator_range has implicit constructor from a range, so the conversion will be hidden from both the user and the library developer.
That only applies to types recognized as ranges. It isn't all string types. The same support should be part of string_ref, but an important distinction is that string_ref requires a contiguous range.
iterator_range doesn't check whether its constructor argument is a range. If applying begin()/end() to it is a valid operation, the conversion will succeed. I'd like string_ref to behave the same way. I see only one corner case: C strings. But I believe the solution is possible. Either begin()/end() can be defined for const char* or the string_ref can have the corresponding constructor. The latter is one (and only, AFAICS) reason to have string_ref type in addition to contiguous_range.
Extracting termination policy to a template parameter is a possibility but it has drawbacks of its own. It makes it harder to provide a stable API/ABI for compiled libraries.
You'd only use the terminated one in APIs in rare cases, so a separate class is simpler.
So I would not introduce it at all for that reason. Just use std::string in such cases.
There are semantic differences between a contiguous range of characters and a string, but a contiguous range type would be useful in and of itself.
The semantic difference is a matter of content and its interpretation. You can store non-printable elements in std::string (and it is sometimes more convenient and efficient than std::vector< char >) and printable characters in std::vector< char >. The interface of std::vector< char > and std::string is mostly the same when it comes to string processing (not counting std::string members that can be replaced with free algorithms). The same applies to string_ref and contiguous_range< const char* >, the only notable difference being the construction from const char*.
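The conversion Andrey asks for can be sketched as a constrained template constructor on a hypothetical `string_ref`. One deliberate substitution: instead of testing begin()/end(), the sketch tests C++17 `std::data()`/`std::size()`, because begin()/end() alone do not guarantee the contiguous storage that Rob points out string_ref requires. The separate `const char*` constructor covers the C-string corner case.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <iterator>     // std::data, std::size (C++17)
#include <list>
#include <string>
#include <type_traits>
#include <utility>      // std::declval
#include <vector>

class string_ref {
public:
    // The C-string corner case: a plain pointer has no std::data()/std::size().
    string_ref(const char* s) : ptr_(s), len_(std::strlen(s)) {}

    // Implicit from any type whose std::data() converts to const char* and
    // whose std::size() is valid: std::string, std::vector<char>,
    // std::array<char, N>, or a third-party string with data()/size() members.
    // std::list<char> is rejected because it has no data().
    template <class R,
              class = std::enable_if_t<std::is_convertible<
                  decltype(std::data(std::declval<const R&>())), const char*>::value>,
              class = decltype(std::size(std::declval<const R&>()))>
    string_ref(const R& r) : ptr_(std::data(r)), len_(std::size(r)) {}

    const char* data() const { return ptr_; }
    std::size_t size() const { return len_; }

private:
    const char* ptr_;
    std::size_t len_;
};

static_assert(!std::is_convertible<std::list<char>, string_ref>::value,
              "non-contiguous ranges are rejected");
static_assert(!std::is_convertible<std::vector<int>, string_ref>::value,
              "ranges of non-char elements are rejected");
```

A third-party string that lacks data()/size() members could still be adapted externally, in the spirit of Andrey's remark that "the necessary overloads can be provided externally", though the exact mechanism is left open here.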

On Nov 27, 2012, at 7:01 AM, Andrey Semashev <andrey.semashev@gmail.com> wrote:
On Tue, Nov 27, 2012 at 2:36 PM, Rob Stewart <robertstewart@comcast.net> wrote:
On Nov 26, 2012, at 6:56 AM, Andrey Semashev <andrey.semashev@gmail.com> wrote:
The problem with std::string is the same as with string_ref - it doesn't support implicit construction from an arbitrary range, so my examples with custom string types would still not work.
That's right. We have no universal string/range type for that purpose, so you use the standard string type.
My point was that, in my understanding, string_ref is aimed to solve this issue in a transparent way but the proposal lacks the necessary interface.
I didn't realize you were arguing WRT the proposed class versus the concept, which is what I've been doing.
I would have used string_ref to unify string-related interfaces if it transparently supported multiple string types, not limited by those defined in STL (and Boost, if boost::string_ref is to be implemented). Limiting it to particular types defeats its purpose.
OK, I suspect we're agreeing more than disagreeing. Here's the I/F of my string_ref:
- converting ctors from:
  o char const *
  o std::string const &
  o std::vector<char> const &
  o const_substring const & (my substring type)
- other ctors:
  o char const *, size_t
  o char const *, char const *
  o char const (&)[N]
- similar assignment operators
- similar assign() member functions
- bool is_null()
- safe bool or explicit bool conversion operator
- char const * data()
- size_t length()
- char const * begin()/end()
- string_ref substr()
- char operator[](size_t)
(I think that's a complete list. I'm doing it from memory now.) It is very string-like and convenient. The same behaviors would be messier without a class (versus a range type and algorithms), though less general. I have not extended mine to support arbitrary ranges, via Boost.Range, simply because the need hasn't arisen, but it can be done. Likewise for arbitrary iterator pairs.
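A sketch of a subset of the interface listed above, written for illustration only (this is not Rob's actual code). It keeps the converting constructors from `char const *`, `std::string` and `std::vector<char>`, the `(pointer, length)` and `(first, last)` constructors, `is_null()`, an explicit `bool` conversion, `data()`, `length()`, `begin()`/`end()`, `substr()` and `operator[]`. The `const_substring` constructor, assignment operators and `assign()` are omitted, as is `char const (&)[N]`: alongside a `char const *` constructor, string literals generally resolve to the non-template pointer overload anyway. Following Rob's later clarification in this thread that the bool conversion means `!empty()`, "null" is modeled as "empty".

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

class string_ref {
public:
    string_ref() : ptr_(nullptr), len_(0) {}
    string_ref(const char* s) : ptr_(s), len_(s ? std::strlen(s) : 0) {}
    string_ref(const std::string& s) : ptr_(s.data()), len_(s.size()) {}
    string_ref(const std::vector<char>& v) : ptr_(v.data()), len_(v.size()) {}
    string_ref(const char* p, std::size_t n) : ptr_(p), len_(n) {}
    string_ref(const char* first, const char* last)
        : ptr_(first), len_(static_cast<std::size_t>(last - first)) {}

    // Assumption: "null" means "no characters", per the later discussion.
    bool is_null() const { return len_ == 0; }
    explicit operator bool() const { return !is_null(); }

    const char* data() const { return ptr_; }  // not necessarily '\0'-terminated
    std::size_t length() const { return len_; }
    const char* begin() const { return ptr_; }
    const char* end() const { return ptr_ + len_; }

    char operator[](std::size_t i) const { return ptr_[i]; }  // no bounds check

    // Clamps n like std::string::substr, but never allocates or copies.
    string_ref substr(std::size_t pos, std::size_t n = std::size_t(-1)) const {
        assert(pos <= len_);
        std::size_t rest = len_ - pos;
        return string_ref(ptr_ + pos, n < rest ? n : rest);
    }

private:
    const char* ptr_;
    std::size_t len_;
};
```

Returning `data()` rather than `c_str()` reflects the point that a string_ref is never assumed to be null terminated.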
It is possible, if the third-party strings follow the begin()/end() protocol.
Now you're changing the rules. TP strings don't all provide iterators.
Any reasonable string type will have some notion of iterators, be that custom types or pointers or a pointer and a size, whatever. As long as this holds, the third-party string type can be adopted.
I understand that not all (nearly none?) third-party strings support begin()/end() protocol now, but I expect them to support eventually. Even if they don't, the necessary overloads can be provided externally.
I think such support is a reasonable addition.
No, this is not needed. iterator_range has implicit constructor from a range, so the conversion will be hidden from both the user and the library developer.
That only applies to types recognized as ranges. It isn't all string types. The same support should be part of string_ref, but an important distinction is that string_ref requires a contiguous range.
iterator_range doesn't check whether its constructor argument is a range. If applying begin()/end() to it is a valid operation, the conversion will succeed. I'd like string_ref to behave the same way.
OK
I see only one corner case: C strings. But I believe the solution is possible. Either begin()/end() can be defined for const char* or the string_ref can have the corresponding constructor. The latter is one (and only, AFAICS) reason to have string_ref type in addition to contiguous_range.
(char const *, size_t) is also common and convenient.
Extracting termination policy to a template parameter is a possibility but it has drawbacks of its own. It makes it harder to provide a stable API/ABI for compiled libraries.
You'd only use the terminated one in APIs in rare cases, so a separate class is simpler.
So I would not introduce it at all for that reason. Just use std::string in such cases.
Using std::string loses the possibility of using the string_ref when it references a null terminated range. Thus, you'd always allocate and copy.
There are semantic differences between a contiguous range of characters and a string, but a contiguous range type would be useful in and of itself.
The semantic difference is a matter of content and its interpretation. You can store non-printable elements in std::string (and it is sometimes more convenient and efficient than std::vector< char >) and printable characters in std::vector< char >. The interface of std::vector< char > and std::string is mostly the same when it comes to string processing (not counting std::string members that can be replaced with free algorithms). The same applies to string_ref and contiguous_range< const char* >, the only notable difference being the construction from const char*.
I've never used std::string for non-string character storage. I use std::vector<char>. I realize that precludes any SBO opportunity, but I'd use another, non-string type in that case. Like Daniel, I see string processing as special. Maybe I'm just stuck in my old ways. ___ Rob
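Rob's suggestion that string_ref and zstring_ref could be one class template with different termination policies might look roughly like the following. The tag types and `basic_string_ref` are hypothetical names; the sketch only shows how a policy can gate `c_str()` and the `(pointer, length)` constructor at compile time. It also makes Andrey's concern concrete: the two aliases are distinct types, so a compiled library has to commit to one of them in its ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

struct unterminated {};     // may end anywhere inside a larger buffer
struct null_terminated {};  // guaranteed to be followed by '\0'

template <class Termination>
class basic_string_ref {
public:
    // Both sources are '\0'-terminated, so they suit either policy.
    basic_string_ref(const char* s) : ptr_(s), len_(std::strlen(s)) {}
    basic_string_ref(const std::string& s) : ptr_(s.c_str()), len_(s.size()) {}

    // An arbitrary (pointer, length) pair cannot promise a terminator,
    // so this constructor exists only for the unterminated policy.
    template <class T = Termination,
              typename std::enable_if<std::is_same<T, unterminated>::value, int>::type = 0>
    basic_string_ref(const char* p, std::size_t n) : ptr_(p), len_(n) {}

    const char* data() const { return ptr_; }
    std::size_t length() const { return len_; }

    // Instantiated only when called; fails to compile for unterminated views.
    const char* c_str() const {
        static_assert(std::is_same<Termination, null_terminated>::value,
                      "c_str() requires the null_terminated policy");
        return ptr_;
    }

private:
    const char* ptr_;
    std::size_t len_;
};

using string_ref = basic_string_ref<unterminated>;
using zstring_ref = basic_string_ref<null_terminated>;
```

With a zstring_ref parameter, a caller holding a std::string or a C string pays no allocation or copy, which is the case Rob describes as lost when falling back to std::string.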

On 28.11.2012 14:48, Rob Stewart wrote:
OK, I suspect we're agreeing more than disagreeing. Here's the I/F of my string_ref:
- converting ctors from: o char const * o std::string const & o std::vector<char> const & o const_substring const & (my substring type) - other ctors:
What about a default constructor?
o char const *, size_t o char const *, char const * o char const (&)[N]
Maybe it is reasonable to add bool trim_last = true here? This would allow non-null-terminated character arrays.
- similar assignment operators - similar assign() member functions - bool is_null()
I propose empty() instead. I don't think that a substring (and especially a string_REF) can be "null".
- safe bool or explicit bool conversion operator
I don't think this is a good idea.
- char const * data() - size_t length()
I propose the alias size().
- char const * begin()/end()
+cbegin/cend.
- string_ref substr() - char operator[](size_t)
A throwing at() may be useful, too.
(I think that's a complete list. I'm doing it from memory now.)
What about pop_back(), pop_front(), swap()?
And again, I propose "substring" instead of "string_ref". -- Sergey Cheban

On Dec 6, 2012, at 8:18 AM, Sergey Cheban <s.cheban@drweb.com> wrote:
On 28.11.2012 14:48, Rob Stewart wrote:
OK, I suspect we're agreeing more than disagreeing. Here's the I/F of my string_ref:
- converting ctors from: o char const * o std::string const & o std::vector<char> const & o const_substring const & (my substring type) - other ctors:
What about a default constructor?
Yes
o char const *, size_t o char const *, char const * o char const (&)[N]
Maybe it is reasonable to add bool trim_last = true here? This would allow non-null-terminated character arrays.
I can see value in that.
- similar assignment operators - similar assign() member functions - bool is_null()
I propose empty() instead. I don't think that a substring (and especially a string_REF) can be "null".
I suppose it should be empty(), though I hate that it isn't the right part of speech. That is, "empty" should be synonymous with "clear". Still, consistency with std::string should probably take precedence.
- safe bool or explicit bool conversion operator
I don't think this is a good idea.
Why not?
- char const * data() - size_t length()
I propose the alias size().
string_ref isn't a container like string, so I prefer just length().
- char const * begin()/end()
+cbegin/cend.
Ah, yes.
- string_ref substr() - char operator[](size_t)
A throwing at() may be useful, too.
I suppose so.
(I think that's a complete list. I'm doing it from memory now.)
What about pop_back(), pop_front(), swap()?
string_ref isn't a container, so pop_back() and pop_front() are inappropriate. However, back(), front(), and swap() are reasonable.
string_ref isn't a container, so pop_back() and pop_front() are inappropriate. However, back(), front(), and swap() are reasonable.
And again, I propose "substring" instead of "string_ref".
I also have [const_]substring classes which have a different interface, so I disagree. (There is, of course, some overlap.) ___ Rob

On 09.12.2012 17:13, Rob Stewart wrote:
>>> - safe bool or explicit bool conversion operator
>> I don't think this is a good idea.
> Why not?
This seems to be not intuitive and not so safe. The std::string has no such operator.
>>> - char const * data()
>>> - size_t length()
>> I propose the alias size().
> string_ref isn't a container like string, so I prefer just length().
Ok.
>>> (I think that's a complete list. I'm doing it from memory now.)
>> What about pop_back(), pop_front(), swap()?
> string_ref isn't a container, so pop_back() and pop_front() are inappropriate. However, back(), front(), and swap() are reasonable.
It is not a container (i.e., it does not own the content) but pop_* methods still may remove the characters from it. Btw, there also are [r]find*(), r[c]begin()/r[c]end() and compare() groups of methods in the std::basic_string.
>> And again, I propose "substring" instead of "string_ref".
> I also have [const_]substring classes which have a different interface, so I disagree. (There is, of course, some overlap.)
1. Are these classes in the Boost library and/or namespace?
2. Do these classes do a different job, or do they just have a different interface?
-- Sergey Cheban

On Dec 10, 2012, at 7:55 AM, Sergey Cheban <s.cheban@drweb.com> wrote:
> On 09.12.2012 17:13, Rob Stewart wrote:
>>>> - safe bool or explicit bool conversion operator
>>> I don't think this is a good idea.
>> Why not?
> This seems to be not intuitive and not so safe.
It is quite intuitive to me. true means non-null, and false means null.
> The std::string has no such operator.
Why does that matter? It's still convenient. It would be a nice addition to std::string.
>>> What about pop_back(), pop_front(), swap()?
>> string_ref isn't a container, so pop_back() and pop_front() are inappropriate. However, back(), front(), and swap() are reasonable.
> It is not a container (i.e., it does not own the content) but pop_* methods still may remove the characters from it.
They wouldn't remove characters, they would only "forget" them. The semantics are sufficiently different that I don't think I would find them intuitive.
> Btw, there also are [r]find*(), r[c]begin()/r[c]end() and compare() groups of methods in the std::basic_string.
I don't think of string_ref as so complete an analogue to string. I see such things as the purview of a substring class, though that's purely subjective.
>>> And again, I propose "substring" instead of "string_ref".
>> I also have [const_]substring classes which have a different interface, so I disagree. (There is, of course, some overlap.)
> 1. Are these classes in the Boost library and/or namespace?
No. They are my own classes which I've not proposed to Boost.
> 2. Do these classes do a different job, or do they just have a different interface?
They reference a std::string and operate on a subset of the string's characters. They have special constructors and replicate most of string's interface. ___ Rob
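For comparison only: `std::string_view`, standardized in C++17 well after this thread, provides `remove_prefix()`/`remove_suffix()` for the "forget" semantics Rob describes, rather than container-style `pop_front()`/`pop_back()`. The hypothetical `trim_spaces` helper below shrinks a view while the referenced `std::string` stays untouched.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Shrinks the view from both ends; the underlying characters are never modified.
inline std::string_view trim_spaces(std::string_view v) {
    while (!v.empty() && v.front() == ' ')
        v.remove_prefix(1);  // forget the first character
    while (!v.empty() && v.back() == ' ')
        v.remove_suffix(1);  // forget the last character
    return v;
}
```

The result still points into the original buffer, so it is valid only as long as that buffer is.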

On Tue, Dec 11, 2012 at 11:47 PM, Rob Stewart <robertstewart@comcast.net>wrote:
On Dec 10, 2012, at 7:55 AM, Sergey Cheban <s.cheban@drweb.com> wrote:
On 09.12.2012 17:13, Rob Stewart wrote:
- safe bool or explicit bool conversion operator
I don't think this is a good idea.
Why not?
This seems to be not intuitive and not so safe.
It is quite intuitive to me. true means non-null, and false means null.
Would it mean !empty() or !!begin()?
The std::string has no such operator.
Why does that matter? It's still convenient. It would be a nice addition to std::string.
std::string does not have a 'null' state. -- Yakov

On Dec 12, 2012, at 2:38 AM, Yakov Galka <ybungalobill@gmail.com> wrote:
On Tue, Dec 11, 2012 at 11:47 PM, Rob Stewart <robertstewart@comcast.net>wrote:
On Dec 10, 2012, at 7:55 AM, Sergey Cheban <s.cheban@drweb.com> wrote:
On 09.12.2012 17:13, Rob Stewart wrote:
- safe bool or explicit bool conversion operator
I don't think this is a good idea.
Why not?
This seems to be not intuitive and not so safe.
It is quite intuitive to me. true means non-null, and false means null.
Would it mean !empty() or !!begin()?
The former certainly. string_ref's are never assumed to be null terminated (which is why I provide data() and not c_str()), so the latter is not sensible.
The std::string has no such operator.
Why does that matter? It's still convenient. It would be a nice addition to std::string.
std::string does not have a 'null' state.
It's the state represented by empty(). ___ Rob
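For reference, the !empty() reading of the bool conversion might look like this in C++11. This is a sketch on a hypothetical `char_view` class, not the actual string_ref code; the two static_asserts check that the conversion is explicit, i.e. available in conditions but never applied implicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical minimal view in which the bool conversion means "non-empty",
// as Rob proposes; it says nothing about whether the pointer is null.
class char_view {
public:
    char_view() : ptr_(nullptr), len_(0) {}
    char_view(const char* p, std::size_t n) : ptr_(p), len_(n) {}

    const char* data() const { return ptr_; }
    std::size_t size() const { return len_; }
    bool empty() const { return len_ == 0; }

    // C++11 explicit conversion: allowed in if/while conditions and with
    // !, && and ||, but never applied implicitly (e.g. when passing a
    // char_view to a function that takes bool).
    explicit operator bool() const { return !empty(); }

private:
    const char* ptr_;
    std::size_t len_;
};

static_assert(!std::is_convertible<char_view, bool>::value,
              "no implicit conversion to bool");
static_assert(std::is_constructible<bool, char_view>::value,
              "explicit conversion to bool is available");
```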

On Wed, Dec 12, 2012 at 2:55 PM, Rob Stewart <robertstewart@comcast.net> wrote:
On Dec 12, 2012, at 2:38 AM, Yakov Galka <ybungalobill@gmail.com> wrote:
On Tue, Dec 11, 2012 at 11:47 PM, Rob Stewart <robertstewart@comcast.net> wrote:
On Dec 10, 2012, at 7:55 AM, Sergey Cheban <s.cheban@drweb.com> wrote:
On 09.12.2012 17:13, Rob Stewart wrote:
> - safe bool or explicit bool conversion operator I don't think this is a good idea. Why not? This seems to be not intuitive and not so safe.
It is quite intuitive to me. true means non-null, and false means null.
Would it mean !empty() or !!begin()?
The former certainly. string_ref's are never assumed to be null terminated (which is why I provide data() and not c_str()), so the latter is not sensible.
I wrote !!begin(), not !!*begin(). In other words, string_ref can *actually be a null pointer*, or at least I haven't seen anyone saying the contrary.
The std::string has no such operator.
Why does that matter? It's still convenient. It would be a nice addition to std::string.
std::string does not have a 'null' state.
It's the state represented by empty().
I am not sure it is so certainly accepted by all. -- Yakov

On Wed, Dec 12, 2012 at 2:21 PM, Yakov Galka <ybungalobill@gmail.com> wrote:
I wrote !!begin(), not !!*begin(). In other words, string_ref can *actually be a null pointer*, or at least I haven't seen anyone saying the contrary.
begin() isn't guaranteed to return a pointer and !! doesn't work on an iterator (in general).
std::string does not have a 'null' state.
It's the state represented by empty().
I am not sure it is so certainly accepted by all.
I'm sure it's not accepted by some. -- Olaf

On Dec 12, 2012, at 8:21 AM, Yakov Galka <ybungalobill@gmail.com> wrote:
On Wed, Dec 12, 2012 at 2:55 PM, Rob Stewart <robertstewart@comcast.net> wrote:
On Dec 12, 2012, at 2:38 AM, Yakov Galka <ybungalobill@gmail.com> wrote:
On Tue, Dec 11, 2012 at 11:47 PM, Rob Stewart <robertstewart@comcast.net> wrote:
On Dec 10, 2012, at 7:55 AM, Sergey Cheban <s.cheban@drweb.com> wrote:
On 09.12.2012 17:13, Rob Stewart wrote:
>> - safe bool or explicit bool conversion operator > I don't think this is a good idea. Why not? This seems to be not intuitive and not so safe.
It is quite intuitive to me. true means non-null, and false means null.
Would it mean !empty() or !!begin()?
The former certainly. string_ref's are never assumed to be null terminated (which is why I provide data() and not c_str()), so the latter is not sensible.
I wrote !!begin(), not !!*begin(). In other words, string_ref can *actually be a null pointer*, or at least I haven't seen anyone saying the contrary.
Of course. I misread your intent. You're assuming that the iterator type is a pointer. Besides, if string_ref holds a null pointer, I'd expect empty() to return true. Thus, the operator, and empty(), always mean a string with no characters.
The std::string has no such operator.
Why does that matter? It's still convenient. It would be a nice addition to std::string.
std::string does not have a 'null' state.
It's the state represented by empty().
I am not sure it is so certainly accepted by all.
It is the logical interpretation. It indicates whether the string is non-empty. I don't see any other generally useful interpretation, do you? ___ Rob

On 12/12/2012 10:49 AM, Rob Stewart wrote:
On Dec 12, 2012, at 8:21 AM, Yakov Galka <ybungalobill@gmail.com> wrote:
On Wed, Dec 12, 2012 at 2:55 PM, Rob Stewart <robertstewart@comcast.net> wrote:
On Dec 12, 2012, at 2:38 AM, Yakov Galka <ybungalobill@gmail.com> wrote:
On Tue, Dec 11, 2012 at 11:47 PM, Rob Stewart <robertstewart@comcast.net> wrote:
On Dec 10, 2012, at 7:55 AM, Sergey Cheban <s.cheban@drweb.com> wrote:
On 09.12.2012 17:13, Rob Stewart wrote:
>>> - safe bool or explicit bool conversion operator >> I don't think this is a good idea. > Why not? This seems to be not intuitive and not so safe.
It is quite intuitive to me. true means non-null, and false means null.
Would it mean !empty() or !!begin()?
The former certainly. string_ref's are never assumed to be null terminated (which is why I provide data() and not c_str()), so the latter is not sensible.
I wrote !!begin(), not !!*begin(). In other words, string_ref can *actually be a null pointer*, or at least I haven't seen anyone saying the contrary.
Of course. I misread your intent. You're assuming that the iterator type is a pointer. Besides, if string_ref holds a null pointer, I'd expect empty() to return true. Thus, the operator, and empty(), always mean a string with no characters.
The std::string has no such operator.
Why does that matter? It's still convenient. It would be a nice addition to std::string.
std::string does not have a 'null' state.
It's the state represented by empty().
I am not sure it is so certainly accepted by all.
It is the logical interpretation. It indicates whether the string is non-empty. I don't see any other generally useful interpretation, do you?
+1
Code like:
    if (string_ref sr = func(...)) {
        // operate only on non-empty string_refs
    }
IMHO, it shouldn't matter whether sr refers to null with size 0, or to some empty range whose begin() may still refer to a valid location in a larger range.
Jeff
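Jeff's pattern can be sketched with a hypothetical `value_of()` helper standing in for `func(...)`, again on a hypothetical `char_view` whose explicit bool conversion means !empty(). Both the {nullptr, 0} result and the empty-but-non-null result fail the `if` test, which is the behaviour Jeff argues for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical view with the "non-empty" bool conversion.
class char_view {
public:
    char_view() : ptr_(nullptr), len_(0) {}
    char_view(const char* p, std::size_t n) : ptr_(p), len_(n) {}
    const char* data() const { return ptr_; }
    std::size_t size() const { return len_; }
    explicit operator bool() const { return len_ != 0; }

private:
    const char* ptr_;
    std::size_t len_;
};

// Hypothetical stand-in for Jeff's func(...): the text after the first '='.
// With no '=' it returns a null view {nullptr, 0}; with a trailing '=' it
// returns an empty view that still points at the end of the input.
inline char_view value_of(const char* s) {
    const char* eq = std::strchr(s, '=');
    if (eq == nullptr) return char_view();
    return char_view(eq + 1, std::strlen(eq + 1));
}

// Counts the lines that have a non-empty value; the null and the
// empty-but-non-null results are both skipped by the same test.
inline int count_values(const char* const* lines, std::size_t n) {
    int count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (char_view v = value_of(lines[i])) {
            assert(v.data() != nullptr); // non-empty implies a real pointer
            ++count;
        }
    }
    return count;
}
```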

On Wed, Dec 12, 2012 at 10:49 AM, Rob Stewart <robertstewart@comcast.net> wrote:
std::string does not have a 'null' state.
It's the state represented by empty().
I am not sure it is so certainly accepted by all.
It is the logical interpretation. It indicates whether the string is non-empty. I don't see any other generally useful interpretation, do you?
Qt's QString has both empty() and isNull() and they are not always the same. Basically, empty() is a zero-byte string, but null is a never-been-set-or-allocated string, i.e.
    QString name = database.get("Name");
name.empty() == true means the Name field was empty; name.isNull() == true means the Name field doesn't exist in the database. Somewhat like optional<string>. I am NOT saying whether this is a good thing. Just a widely known example of interpretation.
Tony

On Dec 12, 2012, at 12:30 PM, Gottlob Frege <gottlobfrege@gmail.com> wrote:
On Wed, Dec 12, 2012 at 10:49 AM, Rob Stewart <robertstewart@comcast.net> wrote:
std::string does not have a 'null' state.
It's the state represented by empty().
I am not sure it is so certainly accepted by all.
It is the logical interpretation. It indicates whether the string is non-empty. I don't see any other generally useful interpretation, do you?
Qt's QString has both empty() and isNull() and they are not always the same. Basically empty() is a zero-byte string, but null is a never-been-set-or-allocated string. [snip]
Somewhat like optional<string>.
That's how I'd spell it. ___ Rob
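For the "missing vs. empty" distinction, the optional<string> spelling Rob prefers might look like the sketch below. It assumes C++17's std::optional (boost::optional<std::string> would play the same role in 2012-era code) and a hypothetical `get_field()` over a std::map standing in for Tony's database.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for Tony's database record: field name -> value.
using record = std::map<std::string, std::string>;

// "Field missing" and "field present but empty" are different answers,
// so the null state lives outside the string, in std::optional (C++17).
inline std::optional<std::string> get_field(const record& r,
                                            const std::string& name) {
    auto it = r.find(name);
    if (it == r.end()) return std::nullopt; // the field doesn't exist
    return it->second;                      // may be an empty string
}
```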

On Thu, Dec 13, 2012 at 5:20 AM, Rob Stewart <robertstewart@comcast.net> wrote:
It is the logical interpretation. It indicates whether the string is non-empty. I don't see any other generally useful interpretation, do you?
Qt's QString has both empty() and isNull() and they are not always the same. Basically empty() is a zero-byte string, but null is a never-been-set-or-allocated string. [snip]
Somewhat like optional<string>.
That's how I'd spell it.
Me too maybe. But either way, there is another 'generally useful interpretation'. And fairly well known. For string_ref, it could mean the difference between these 2 string_refs, assuming a ptr+size implementation: { ptr != 0, size == 0 } and { ptr == 0, size == hopefully_zero }

On Thu, Dec 13, 2012 at 7:05 PM, Gottlob Frege <gottlobfrege@gmail.com> wrote:
On Thu, Dec 13, 2012 at 5:20 AM, Rob Stewart <robertstewart@comcast.net> wrote:
It is the logical interpretation. It indicates whether the string is non-empty. I don't see any other generally useful interpretation, do you?
Qt's QString has both empty() and isNull() and they are not always the same. Basically empty() is a zero-byte string, but null is a never-been-set-or-allocated string. [snip]
Somewhat like optional<string>.
That's how I'd spell it. [...] For string_ref, it could mean the difference between these 2 string_refs, assuming a ptr+size implementation:
{ ptr != 0, size == 0 } and { ptr == 0, size == hopefully_zero }
Right. And I want to note that it does matter. At least for std::string, whose constructor basic_string(const charT* s, size_type n, const Allocator& a = Allocator()); explicitly requires that `s` be a non-null pointer (even if n == 0). -- Yakov
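If a {ptr, size} view is converted to std::string, the precondition Yakov quotes means a null pointer should not be handed to basic_string(const charT* s, size_type n), even with n == 0. A defensive sketch, using a hypothetical helper name `to_std_string`:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper for turning a {ptr, size} view into a std::string.
// As quoted in this thread, basic_string(const charT* s, size_type n)
// does not allow a null s even when n == 0, so the {nullptr, 0} case is
// routed to the default constructor instead.
inline std::string to_std_string(const char* p, std::size_t n) {
    if (p == nullptr) {
        assert(n == 0); // a null view must not claim any characters
        return std::string();
    }
    return std::string(p, n);
}
```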

On Thu, Dec 13, 2012 at 6:18 PM, Yakov Galka <ybungalobill@gmail.com> wrote:
[...] For string_ref, it could mean the difference between these 2 string_refs, assuming a ptr+size implementation:
{ ptr != 0, size == 0 } and { ptr == 0, size == hopefully_zero }
Right. And I want to note that it does matter. At least for std::string, whose constructor
True. In general containers / ranges have an empty state but not a null state and operator bool() should return !empty() IMO.
basic_string(const charT* s, size_type n, const Allocator& a = Allocator());
explicitly requires that `s` be a non-null pointer (even if n == 0).
Sounds like a bug / defect. -- Olaf

On 13 December 2012 11:36, Olaf van der Spek <ml@vdspek.org> wrote:
On Thu, Dec 13, 2012 at 6:18 PM, Yakov Galka <ybungalobill@gmail.com> wrote:
basic_string(const charT* s, size_type n, const Allocator& a = Allocator());
explicitly requires that `s` be a non-null pointer (even if n == 0).
Sounds like a bug / defect.
I agree (and just sent in a defect report). -- Nevin ":-)" Liber <mailto:nevin@eviloverlord.com> (847) 691-1404

On 13 December 2012 14:27, Nevin Liber <nevin@eviloverlord.com> wrote:
On 13 December 2012 11:36, Olaf van der Spek <ml@vdspek.org> wrote:
On Thu, Dec 13, 2012 at 6:18 PM, Yakov Galka <ybungalobill@gmail.com> wrote:
basic_string(const charT* s, size_type n, const Allocator& a = Allocator());
explicitly requires that `s` be a non-null pointer (even if n == 0).
Sounds like a bug / defect.
I agree (and just sent in a defect report).
They won't accept it as a defect. *sigh* Now to go make a feature request for C++14... -- Nevin ":-)" Liber

On Dec 13, 2012, at 9:05 AM, Gottlob Frege <gottlobfrege@gmail.com> wrote:
On Thu, Dec 13, 2012 at 5:20 AM, Rob Stewart <robertstewart@comcast.net> wrote:
It is the logical interpretation. It indicates whether the string is non-empty. I don't see any other generally useful interpretation, do you?
Qt's QString has both empty() and isNull() and they are not always the same. Basically empty() is a zero-byte string, but null is a never-been-set-or-allocated string. [snip]
Somewhat like optional<string>.
That's how I'd spell it.
Me too maybe. But either way, there is another 'generally useful interpretation'. And fairly well known.
For string_ref, it could mean the difference between these 2 string_refs, assuming a ptr+size implementation:
{ ptr != 0, size == 0 } and { ptr == 0, size == hopefully_zero }
Just added another set of tests for string_refs. All zero-length string_refs are == -- Marshall Marshall Clow Idio Software <mailto:mclow.lists@gmail.com> A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait). -- Yu Suzuki
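One way to get Marshall's "all zero-length string_refs are ==" behaviour for a {ptr, size} view is to compare lengths first and short-circuit on zero. This is a sketch on a hypothetical `char_view` aggregate, not the code Marshall checked in; as a side effect it never calls char_traits::compare with a null pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical {ptr, size} view; ptr may be null only when len == 0.
struct char_view {
    const char* ptr;
    std::size_t len;
};

// Value equality: equal lengths and equal characters. Every zero-length
// view compares equal to every other, whether its ptr is null or points
// into some buffer.
inline bool operator==(char_view a, char_view b) {
    assert(a.ptr != nullptr || a.len == 0);
    assert(b.ptr != nullptr || b.len == 0);
    if (a.len != b.len) return false;
    if (a.len == 0) return true;
    return std::char_traits<char>::compare(a.ptr, b.ptr, a.len) == 0;
}

inline bool operator!=(char_view a, char_view b) { return !(a == b); }
```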

On Dec 13, 2012, at 12:05 PM, Gottlob Frege <gottlobfrege@gmail.com> wrote:
On Thu, Dec 13, 2012 at 5:20 AM, Rob Stewart <robertstewart@comcast.net> wrote:
It is the logical interpretation. It indicates whether the string is non-empty. I don't see any other generally useful interpretation, do you?
Qt's QString has both empty() and isNull() and they are not always the same. Basically empty() is a zero-byte string, but null is a never-been-set-or-allocated string. [snip]
Somewhat like optional<string>.
That's how I'd spell it.
Me too maybe. But either way, there is another 'generally useful interpretation'. And fairly well known.
For string_ref, it could mean the difference between these 2 string_refs, assuming a ptr+size implementation:
{ ptr != 0, size == 0 } and { ptr == 0, size == hopefully_zero }
The unset interpretation of "null" is not uncommon, but I don't agree with its being generally useful. It is useful in some circumstances, but "is empty" is much more broadly useful, hence my statement. ___ Rob

Another heads-up: I just moved string_ref into boost/utility, and checked in some (very preliminary) documentation. -- Marshall

on Wed Dec 12 2012, Gottlob Frege <gottlobfrege-AT-gmail.com> wrote:
On Wed, Dec 12, 2012 at 10:49 AM, Rob Stewart <robertstewart@comcast.net> wrote:
std::string does not have a 'null' state.
It's the state represented by empty().
I am not sure it is so certainly accepted by all.
It is the logical interpretation. It indicates whether the string is non-empty. I don't see any other generally useful interpretation, do you?
Qt's QString has both empty() and isNull() and they are not always the same. Basically empty() is a zero-byte string, but null is a never-been-set-or-allocated string.
That's dumb, though.
ie
QString name = database.get("Name");
name.empty() == true means the Name field was empty; name.isNull() == true means the Name field doesn't exist in the database.
Somewhat like optional<string>.
I am NOT saying whether this is a good thing. Just a widely known example of interpretation.
I AM saying it's a bad thing! :-) -- Dave Abrahams BoostPro Computing Software Development Training http://www.boostpro.com Clang/LLVM/EDG Compilers C++ Boost

On Sun, Dec 16, 2012 at 12:48 AM, Dave Abrahams <dave@boostpro.com> wrote:
on Wed Dec 12 2012, Gottlob Frege <gottlobfrege-AT-gmail.com> wrote: [...]
Qt's QString has both empty() and isNull() and they are not always the same. Basically empty() is a zero-byte string, but null is a never-been-set-or-allocated string.
That's dumb, though.
It is a matter of value semantics vs. pointer semantics. QString is "more like" a pointer to a reference-counted string.
ie
QString name = database.get("Name");
name.empty() == true means the Name field was empty; name.isNull() == true means the Name field doesn't exist in the database.
Somewhat like optional<string>.
I am NOT saying whether this is a good thing. Just a widely known example of interpretation.
I AM saying it's a bad thing! :-)
I agree with you that the null state of QString is a bad thing. But this by itself does not justify interpreting an empty() string to be false. Let me explain. The conversion to bool usually tests the validity of the object so one can do some other operations on it. For instance this is what it means in iostreams (stream is in non-fail state so one can continue reading & writing to it) and pointers (not null so one can dereference it). These examples are of well understood validness of states. On the other hand, for strings, it is dubious that empty() strings are in any way invalid. All operations still can be done on them, like concatenation, find, operator [] with index <= size(), etc... Having said that, personally I do not object to empty() being interpreted as false. I am only trying to judge this suggestion objectively. Cheers, -- Yakov

on Sun Dec 16 2012, Yakov Galka <ybungalobill-AT-gmail.com> wrote:
On Sun, Dec 16, 2012 at 12:48 AM, Dave Abrahams <dave@boostpro.com> wrote:
on Wed Dec 12 2012, Gottlob Frege <gottlobfrege-AT-gmail.com> wrote: [...]
Qt's QString has both empty() and isNull() and they are not always the same. Basically empty() is a zero-byte string, but null is a never-been-set-or-allocated string.
That's dumb, though.
It is a matter of value semantics vs. pointer semantics. QString is "more like" a pointer to a reference-counted string.
But that's a dumb thing for QString to be ;-)
ie
QString name = database.get("Name");
name.empty() == true means the Name field was empty; name.isNull() == true means the Name field doesn't exist in the database.
Somewhat like optional<string>.
I am NOT saying whether this is a good thing. Just a widely known example of interpretation.
I AM saying it's a bad thing! :-)
I agree with you that the null state of QString is a bad thing. But this by itself does not justify interpreting an empty() string to be false. Let me explain.
The conversion to bool usually tests the validity of the object
Now hold on a minute! I don't buy this premise. In what sense is a null pointer or a zero integer "invalid?"
so one can do some other operations on it. For instance this is what it means in iostreams (stream is in non-fail state so one can continue reading & writing to it)
OK
and pointers (not null so one can dereference it).
It tests for a useful singular state. But if you want to follow this line of reasoning... front(), back(), and operator[] (probably among others) can't be used on an empty string.
These examples are of well understood validness of states. On the other hand, for strings, it is dubious that empty() strings are in any way invalid.
Just as it's dubious that a null pointer or a zero integer is in any way invalid.
All operations still can be done on them, like concatenation, find, operator [] with index <= size(), etc...
See above.
Having said that, personally I do not object to empty() being interpreted as false. I am only trying to judge this suggestion objectively.
-- Dave Abrahams

On Wed, Dec 12, 2012 at 1:55 PM, Rob Stewart <robertstewart@comcast.net> wrote:
Would it mean !empty() or !!begin()?
The former certainly.
string_ref's are never assumed to be null terminated (which is why I provide data() and not c_str()), so the latter is not sensible.
It isn't !!*begin(), so null termination isn't related. -- Olaf

On 12.12.2012 1:47, Rob Stewart wrote:
>>>>> - safe bool or explicit bool conversion operator
>>>> I don't think this is a good idea.
>>> Why not?
>> This seems to be not intuitive and not so safe.
> It is quite intuitive to me. true means non-null, and false means null.
If basic_substring<T> were convertible to bool, it could be used to compare basic_substring<char> with basic_substring<wchar_t> (with meaningless results).
>> The std::string has no such operator.
> Why does that matter? It's still convenient. It would be a nice addition to std::string.
I think that the substring class should be consistent with the existing standard string interface. If you think that operator bool is a good addition to the [sub]string, you may propose it to the standard committee.
>>>> What about pop_back(), pop_front(), swap()?
>>> string_ref isn't a container, so pop_back() and pop_front() are inappropriate. However, back(), front(), and swap() are reasonable.
>> It is not a container (i.e., it does not own the content) but pop_* methods still may remove the characters from it.
> They wouldn't remove characters, they would only "forget" them. The semantic is sufficiently different that I don't think I would find them intuitive.
What is the difference? The substring does not own the characters anyway.
>> Btw, there also are [r]find*(), r[c]begin()/r[c]end() and compare() groups of methods in the std::basic_string.
> I don't think of string_ref as so complete an analogue to string. I see such things as the purview of a substring class, though that's purely subjective.
Why not? For example, it may be used as a key in some temporary std::map.
>>>> And again, I propose "substring" instead of "string_ref".
>>> I also have [const_]substring classes which have a different interface, so I disagree. (There is, of course, some overlap.)
>> 1. Are these classes in the Boost library and/or namespace?
> No. They are my own classes which I've not proposed to Boost.
I respect your needs, but I don't think that it is a good idea for the Boost library to avoid using convenient names just because these names are used by somebody who uses the boost namespace implicitly.
>> 2. Do these classes do a different job, or do they just have a different interface?
> They reference a std::string and operate on a subset of the string's characters. They have special constructors and replicate most of string's interface.
It seems that you may just switch to boost::substring and get rid of your substring implementation some day.
-- Sergey Cheban

On Dec 12, 2012, at 9:05 AM, Sergey Cheban <s.cheban@drweb.com> wrote:
> On 12.12.2012 1:47, Rob Stewart wrote:
>>>>>> - safe bool or explicit bool conversion operator
>>>>> I don't think this is a good idea.
>>>> Why not?
>>> This seems to be not intuitive and not so safe.
>> It is quite intuitive to me. true means non-null, and false means null.
> If basic_substring<T> were convertible to bool, it could be used to compare basic_substring<char> with basic_substring<wchar_t> (with meaningless results).
It would not be meaningless, though certainly not the likely intent. Still, all that's needed are equality operators between the types to poison the flawed comparison.
>>> The std::string has no such operator.
>> Why does that matter? It's still convenient. It would be a nice addition to std::string.
> I think that the substring class should be consistent with the existing standard string interface. If you think that operator bool is a good addition to the [sub]string, you may propose it to the standard committee.
We're discussing string_ref, not substring. Furthermore, why can't string_ref have a feature before string gets it?
>>>>> What about pop_back(), pop_front(), swap()?
>>>> string_ref isn't a container, so pop_back() and pop_front() are inappropriate. However, back(), front(), and swap() are reasonable.
>>> It is not a container (i.e., it does not own the content) but pop_* methods still may remove the characters from it.
>> They wouldn't remove characters, they would only "forget" them. The semantic is sufficiently different that I don't think I would find them intuitive.
> What is the difference? The substring does not own the characters anyway.
As I said, without ownership, they don't seem semantically valid.
>>> Btw, there also are [r]find*(), r[c]begin()/r[c]end() and compare() groups of methods in the std::basic_string.
>> I don't think of string_ref as so complete an analogue to string. I see such things as the purview of a substring class, though that's purely subjective.
> Why not? For example, it may be used as a key in some temporary std::map.
What's your point? map uses < by default.
>>>>> And again, I propose "substring" instead of "string_ref".
>>>> I also have [const_]substring classes which have a different interface, so I disagree. (There is, of course, some overlap.)
>>> 1. Are these classes in the Boost library and/or namespace?
>> No. They are my own classes which I've not proposed to Boost.
> I respect your needs, but I don't think that it is a good idea for the Boost library to avoid using convenient names just because these names are used by somebody who uses the boost namespace implicitly.
I don't understand how your comment applies.
>>> 2. Do these classes do a different job, or do they just have a different interface?
>> They reference a std::string and operate on a subset of the string's characters. They have special constructors and replicate most of string's interface.
> It seems that you may just switch to boost::substring and get rid of your substring implementation some day.
Of course, but there isn't one, and this discussion is about string_ref.
___ Rob
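Rob's "poison" remark can be sketched as follows. The hypothetical `basic_view` below deliberately keeps an implicit operator bool (the risky case Sergey describes), and a deleted mixed-character-type operator== makes `view<char> == view<wchar_t>` ill-formed instead of silently comparing two bools. `has_equal` is a small hypothetical detection trait used only to check this at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical view template with a deliberately *implicit* operator bool,
// i.e. the risky case where two unrelated views could be compared as bools.
template <class CharT>
class basic_view {
public:
    basic_view(const CharT* p, std::size_t n) : ptr_(p), len_(n) {}
    const CharT* data() const { return ptr_; }
    std::size_t size() const { return len_; }
    operator bool() const { return len_ != 0; }

private:
    const CharT* ptr_;
    std::size_t len_;
};

// Same character type: equal lengths and equal characters.
template <class CharT>
bool operator==(const basic_view<CharT>& a, const basic_view<CharT>& b) {
    return a.size() == b.size() &&
           (a.size() == 0 ||
            std::char_traits<CharT>::compare(a.data(), b.data(), a.size()) == 0);
}

// Poison: for mixed character types this deleted template is an exact match,
// so it beats the built-in == reached through the bool conversions, and the
// expression fails to compile.
template <class A, class B>
bool operator==(const basic_view<A>&, const basic_view<B>&) = delete;

// Hypothetical detection trait: is "T == U" a valid expression?
template <class T, class U, class = void>
struct has_equal : std::false_type {};

template <class T, class U>
struct has_equal<T, U, decltype(void(std::declval<T>() == std::declval<U>()))>
    : std::true_type {};

static_assert(has_equal<basic_view<char>, basic_view<char>>::value,
              "same character type compares");
static_assert(!has_equal<basic_view<char>, basic_view<wchar_t>>::value,
              "mixed character types are rejected");
```

With C++11's explicit operator bool the mixed comparison would already fail to compile, so a deleted overload like this mainly matters when the conversion is implicit.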

12.12.2012 20:02, Rob Stewart wrote:
>>>>>>> - safe bool or explicit bool conversion operator
>>>>>> I don't think this is a good idea.
>>>>> Why not?
>>>> This seems to be not intuitive and not so safe.
>>> It is quite intuitive to me. true means non-null, and false means null.
>> If basic_substring<T> were convertible to bool, it could be used to compare basic_substring<char> with basic_substring<wchar_t> (with meaningless results).
> It would not be meaningless, though certainly not the likely intent. Still, all that's needed are equality operators between the types to poison the flawed comparison.
OK. What about boost::lexical_cast? There is no lexical_cast<unsigned>(const string_ref &) yet, but there already is lexical_cast<unsigned>(bool). So lexical_cast<unsigned>( string_ref("5") ) will return 1.
>>>> The std::string has no such operator.
>>> Why does that matter? It's still convenient. It would be a nice addition to std::string.
>> I think that the substring class should be consistent with the existing standard string interface. If you think that operator bool is a good addition to the [sub]string, you may propose it to the standard committee.
> We're discussing string_ref, not substring. Furthermore, why can't string_ref have a feature before string gets it?
OK, let it be string_ref. :-) The Boost classes may extend the standard interfaces, and they often do. But I think there is some reason why the std::string interface does not include operator bool.
>>>>>> What about pop_back(), pop_front(), swap()?
>>>>> string_ref isn't a container, so pop_back() and pop_front() are inappropriate. However, back(), front(), and swap() are reasonable.
>>>> It is not a container (i.e., it does not own the content) but pop_* methods still may remove the characters from it.
>>> They wouldn't remove characters, they would only "forget" them. The semantic is sufficiently different that I don't think I would find them intuitive.
>> What is the difference? The substring does not own the characters anyway.
> As I said, without ownership, they don't seem semantically valid.
OK, I agree that pop() without push() looks inconsistent, so I don't insist on it.
>>>> Btw, there also are [r]find*(), r[c]begin()/r[c]end() and compare() groups of methods in the std::basic_string.
>>> I don't think of string_ref as so complete an analogue to string. I see such things as the purview of a substring class, though that's purely subjective.
>> Why not? For example, it may be used as a key in some temporary std::map.
> What's your point? map uses < by default.
And what will operators <, <=, == etc. use to compare string_refs if there is no compare() method? OK, each of them may compare the string_refs directly.
>>>>>> And again, I propose "substring" instead of "string_ref".
>>>>> I also have [const_]substring classes which have a different interface, so I disagree. (There is, of course, some overlap.)
>>>> 1. Are these classes in the Boost library and/or namespace?
>>> No. They are my own classes which I've not proposed to Boost.
>> I respect your needs, but I don't think that it is a good idea for the Boost library to avoid using convenient names just because these names are used by somebody who uses the boost namespace implicitly.
> I don't understand how your comment applies.
You said that the Boost library should not use the name "substring" for a class that represents (but does not own) part of an existing string, because you already use this name for a class with the same meaning in your private code. To me, this sounds strange. If the name "substring" is good for you, why is boost::substring bad for a similar class in the Boost library?
>>>> 2. Do these classes do a different job, or do they just have a different interface?
>>> They reference a std::string and operate on a subset of the string's characters. They have special constructors and replicate most of string's interface.
>> It seems that you may just switch to boost::substring and get rid of your substring implementation some day.
> Of course, but there isn't one, and this discussion is about string_ref.
For me, "substring" is just an alternative name for string_ref. The name string_ref looks worse to me because:
1. It is not a kind of C++ reference.
2. It is not related to std::string.
So both parts of the name "string_ref" are misleading.
-- Best regards, Sergey Cheban
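Sergey's std::map use case and his question about compare() can be sketched together: a hypothetical `char_view` with a std::string::compare-style three-way `compare()`, relational operators built on it, and a word counter (`count_words`, also hypothetical) whose map keys point into the input text rather than copying it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical non-owning view: a pointer and a length.
class char_view {
public:
    char_view(const char* p, std::size_t n) : ptr_(p), len_(n) {}
    std::size_t size() const { return len_; }

    // Three-way comparison in the style of std::string::compare:
    // lexicographic over the common prefix, then the shorter view first.
    int compare(char_view other) const {
        std::size_t n = std::min(len_, other.len_);
        int r = n ? std::char_traits<char>::compare(ptr_, other.ptr_, n) : 0;
        if (r != 0) return r;
        return len_ < other.len_ ? -1 : (len_ > other.len_ ? 1 : 0);
    }

private:
    const char* ptr_;
    std::size_t len_;
};

// The relational operators can all be written in terms of compare().
inline bool operator<(char_view a, char_view b) { return a.compare(b) < 0; }
inline bool operator==(char_view a, char_view b) { return a.compare(b) == 0; }

// Hypothetical helper: counts space-separated words without copying them.
// The keys point into `text`, so the map must not outlive it.
inline std::map<char_view, int> count_words(const std::string& text) {
    std::map<char_view, int> counts;
    std::size_t start = 0;
    while (start <= text.size()) {
        std::size_t end = text.find(' ', start);
        if (end == std::string::npos) end = text.size();
        if (end > start) ++counts[char_view(text.data() + start, end - start)];
        start = end + 1;
    }
    return counts;
}
```

Because the keys reference `text`, such a map is only safe while the referenced buffer is alive and unmodified, which is the usual caveat for any non-owning key.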

On Dec 12, 2012, at 3:58 PM, Sergey Cheban <s.cheban@drweb.com> wrote:
> 12.12.2012 20:02, Rob Stewart wrote:
>>>>>>>> - safe bool or explicit bool conversion operator
>>>>>>> I don't think this is a good idea.
>>>>>> Why not?
>>>>> This seems to be not intuitive and not so safe.
>>>> It is quite intuitive to me. true means non-null, and false means null.
>>> If basic_substring<T> were convertible to bool, it could be used to compare basic_substring<char> with basic_substring<wchar_t> (with meaningless results).
>> It would not be meaningless, though certainly not the likely intent. Still, all that's needed are equality operators between the types to poison the flawed comparison.
> OK. What about boost::lexical_cast? There is no lexical_cast<unsigned>(const string_ref &) yet, but there already is lexical_cast<unsigned>(bool). So lexical_cast<unsigned>( string_ref("5") ) will return 1.
This is a safe-bool operator we're discussing. In C++11, it's an explicit bool conversion operator.
>>>>>>> And again, I propose "substring" instead of "string_ref".
>>>>>> I also have [const_]substring classes which have a different interface, so I disagree. (There is, of course, some overlap.)
>>>>> 1. Are these classes in the Boost library and/or namespace?
>>>> No. They are my own classes which I've not proposed to Boost.
>>> I respect your needs, but I don't think that it is a good idea for the Boost library to avoid using convenient names just because these names are used by somebody who uses the boost namespace implicitly.
>> I don't understand how your comment applies.
> You said that the Boost library should not use the name "substring" for a class that represents (but does not own) part of an existing string, because you already use this name for a class with the same meaning in your private code.
Ah, I see the confusion now. I meant that I see substrings as being different, so "string_ref" is more appropriate for the current purpose.
>>>>> 2. Do these classes do a different job, or do they just have a different interface?
>>>> They reference a std::string and operate on a subset of the string's characters. They have special constructors and replicate most of string's interface.
>>> It seems that you may just switch to boost::substring and get rid of your substring implementation some day.
>> Of course, but there isn't one, and this discussion is about string_ref.
> For me, "substring" is just an alternative name for string_ref. The name string_ref looks worse to me because:
> 1. It is not a kind of C++ reference.
> 2. It is not related to std::string.
> So both parts of the name "string_ref" are misleading.
We use "string" in reference to char *'s, too. string_ref extends that to any sequence of characters, not necessarily null terminated. string_ref refers to memory owned elsewhere, so it is a reference to that memory. It is the broader English meaning of "reference" we're using, not the C++ meaning. Notice also my description of my substring class as operating on a std::string. It doesn't operate on any other sequence of characters, hence my distinguishing their names.
___ Rob
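Sergey's lexical_cast concern is really about implicit conversion to bool. The sketch below uses a hypothetical `to_unsigned(bool)` as a stand-in for a bool-only overload (it does not model how lexical_cast actually selects its conversion) and two hypothetical view types that differ only in whether operator bool is explicit.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Two hypothetical views that differ only in how they convert to bool.
struct implicit_view {
    const char* ptr;
    std::size_t len;
    operator bool() const { return len != 0; }          // implicit
};

struct explicit_view {
    const char* ptr;
    std::size_t len;
    explicit operator bool() const { return len != 0; } // C++11 explicit
};

// Hypothetical stand-in for a conversion that only has a bool overload.
inline unsigned to_unsigned(bool b) { return b ? 1u : 0u; }

// The implicit version silently reaches the bool overload; the explicit
// version does not convert, so to_unsigned(explicit_view{...}) would not compile.
static_assert(std::is_convertible<implicit_view, bool>::value, "");
static_assert(!std::is_convertible<explicit_view, bool>::value, "");
```

With the implicit conversion, `to_unsigned(implicit_view{"5", 1})` yields 1 rather than a parse error, which is the surprise Sergey describes; the explicit form still works in `if (v)` but turns the accidental call into a compile error.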

On Sun, Dec 9, 2012 at 3:13 PM, Rob Stewart <robertstewart@comcast.net> wrote:
- char const * data() - size_t length() I propose the alias size().
string_ref isn't a container like string, so I prefer just length().
I propose adding size() and throwing length() away. Really. 1. According to the standard, string is not a container either. 2. initializer_list, which is semantically very similar to string_ref, does use .size(). 3. Leaving the scope of the standard, Boost.Range uses boost::size(x) for ranges. I.e. size is a property of a range, not a container. 4. I always found string::length() to be an inconsistency in the standard library. I guess the reason for its existence is purely historical. -- Yakov

On 12/12/2012 2:50 AM, Yakov Galka wrote:
On Sun, Dec 9, 2012 at 3:13 PM, Rob Stewart <robertstewart@comcast.net>wrote:
- char const * data()
- size_t length()
I propose the alias size().
string_ref isn't a container like string, so I prefer just length().
I propose adding size() and throwing length() away. Really.
1. According to the standard, string is not a container either.
2. initializer_list, which is semantically very similar to string_ref, does use .size().
3. Leaving the scope of the standard, Boost.Range uses boost::size(x) for ranges. I.e. size is a property of a range, not a container.
4. I always found string::length() to be an inconsistency in the standard library. I guess the reason for its existence is purely historical.
+1
I've got template code that currently takes strings and vector<char>s and uses size. With string_ref being a natural candidate, I'd hope it would work out of the box. Jeff

On Dec 12, 2012, at 8:26 AM, Jeff Flinn <Jeffrey.Flinn@gmail.com> wrote:
On 12/12/2012 2:50 AM, Yakov Galka wrote:
On Sun, Dec 9, 2012 at 3:13 PM, Rob Stewart <robertstewart@comcast.net>wrote:
- char const * data()
- size_t length()
I propose the alias size().
string_ref isn't a container like string, so I prefer just length().
I propose adding size() and throwing length() away. Really.
1. According to the standard, string is not a container either.
Ok, but string contains both.
2. initializer_list, which is semantically very similar to string_ref, does use .size().
OK
3. Leaving the scope of the standard, Boost.Range uses boost::size(x) for ranges. I.e. size is a property of a range, not a container.
OK
4. I always found string::length() to be an inconsistency in the standard library. I guess the reason for its existence is purely historical.
I don't consider it a defect. It is the common vernacular to speak of a string's length.
+1
I've got template code that currently takes strings and vector<char>s and uses size. With string_ref being a natural candidate, I'd hope it would work out of the box.
That's a reasonable argument. Perhaps both size() and length() are appropriate. ___ Rob

On 12/12/2012 10:54 AM, Rob Stewart wrote:
On Dec 12, 2012, at 8:26 AM, Jeff Flinn <Jeffrey.Flinn@gmail.com> wrote:
On 12/12/2012 2:50 AM, Yakov Galka wrote:
On Sun, Dec 9, 2012 at 3:13 PM, Rob Stewart <robertstewart@comcast.net>wrote:
- char const * data()
- size_t length()
I propose the alias size().
string_ref isn't a container like string, so I prefer just length().
I propose adding size() and throwing length() away. Really.
1. According to the standard, string is not a container either.
Ok, but string contains both.
2. initializer_list, which is semantically very similar to string_ref, does use .size().
OK
3. Leaving the scope of the standard, Boost.Range uses boost::size(x) for ranges. I.e. size is a property of a range, not a container.
OK
4. I always found string::length() to be an inconsistency in the standard library. I guess the reason for its existence is purely historical.
I don't consider it a defect. It is the common vernacular to speak of a string's length.
+1
I've got template code that currently takes strings and vector<char>s and uses size. With string_ref being a natural candidate, I'd hope it would work out of the box.
That's a reasonable argument.
Perhaps both size() and length() are appropriate.
Seems reasonable. Jeff
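The size()/length() outcome can be sketched as follows, assuming a hypothetical string_ref_sketch class (not the Boost class). length() simply forwards to size(), so string-style and range-style callers both work, and a generic template written against size() and operator[] (the hypothetical count_char) accepts std::string, std::vector<char> and the view alike, which is the situation Jeff describes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical non-owning view; illustrates offering both spellings.
class string_ref_sketch {
public:
    string_ref_sketch(const char* p) : ptr_(p), len_(std::strlen(p)) {}
    string_ref_sketch(const char* p, std::size_t n) : ptr_(p), len_(n) {}

    std::size_t size() const { return len_; }        // range/container spelling
    std::size_t length() const { return size(); }    // string spelling, same value
    char operator[](std::size_t i) const { return ptr_[i]; }

private:
    const char* ptr_;
    std::size_t len_;
};

// Generic code written only against size() and operator[].
template <typename Seq>
std::size_t count_char(const Seq& s, char c) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        if (s[i] == c) ++n;
    return n;
}
```

For example, count_char applied to std::string("banana"), to a std::vector<char> holding the same characters, or to a string_ref_sketch over them returns 3 in each case.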

On Wed, Nov 28, 2012 at 2:48 AM, Rob Stewart <robertstewart@comcast.net> wrote:
o char const (&)[N]
This constructor is dangerous in cases like
char space[100]; snprintf(space, 100, "format", args); string_ref str(space);
so I think most of the suggestions on this list have moved toward a more explicit but very verbose string_ref::from_literal("foo\0bar").
http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2012/n3468.pdf proposed adding various user-defined literals to the standard library. What do you folks think of string_ref UDLs like:
namespace boost {
namespace string_literals {
#define BOOST_DEFINE_STRING_REF_LITERAL(CharT) \
  constexpr basic_string_ref<CharT> operator"" _s( \
      const CharT* str, size_t len) { \
    return basic_string_ref<CharT>(str, len); \
  }
BOOST_DEFINE_STRING_REF_LITERAL(char);
BOOST_DEFINE_STRING_REF_LITERAL(wchar_t);
BOOST_DEFINE_STRING_REF_LITERAL(char16_t);
BOOST_DEFINE_STRING_REF_LITERAL(char32_t);
#undef BOOST_DEFINE_STRING_REF_LITERAL
}
}
using namespace boost::string_literals;
constexpr boost::string_ref glbl = "Constexpr"_s;
constexpr boost::string_ref contains_null = "Const\0expr"_s;
static_assert(contains_null.at(6) == 'e', "Expect string to extend past null");
static_assert(glbl.size() == sizeof("Constexpr") - 1, "Expect string to omit trailing null");
(Tested with Clang r163674)
?
Jeffrey

Le 10/12/12 22:16, Jeffrey Yasskin a écrit :
On Wed, Nov 28, 2012 at 2:48 AM, Rob Stewart <robertstewart@comcast.net> wrote:
o char const (&)[N]
This constructor is dangerous in cases like
char space[100]; snprintf(space, 100, "format", args); string_ref str(space);
so I think most of the suggestions on this list have moved toward a more explicit but very verbose string_ref::from_literal("foo\0bar").
I guess this is because the proposed Boost interface should work on C++98 compilers. This doesn't mean that it could not provide something more adapted to C++11.
http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2012/n3468.pdf proposed adding various user-defined literals to the standard library. What do you folks think of string_ref UDLs like:
namespace boost {
namespace string_literals {
#define BOOST_DEFINE_STRING_REF_LITERAL(CharT) \
  constexpr basic_string_ref<CharT> operator"" _s( \
      const CharT* str, size_t len) { \
    return basic_string_ref<CharT>(str, len); \
  }
BOOST_DEFINE_STRING_REF_LITERAL(char);
BOOST_DEFINE_STRING_REF_LITERAL(wchar_t);
BOOST_DEFINE_STRING_REF_LITERAL(char16_t);
BOOST_DEFINE_STRING_REF_LITERAL(char32_t);
#undef BOOST_DEFINE_STRING_REF_LITERAL
}
}
using namespace boost::string_literals;
constexpr boost::string_ref glbl = "Constexpr"_s;
constexpr boost::string_ref contains_null = "Const\0expr"_s;
static_assert(contains_null.at(6) == 'e', "Expect string to extend past null");
static_assert(glbl.size() == sizeof("Constexpr") - 1, "Expect string to omit trailing null");
(Tested with Clang r163674)
?
I think that the suffix _s is used also for seconds. Can both be used without ambiguity? If not, is _str too long for a suffix for strings? Vicente

On Mon, Dec 10, 2012 at 3:36 PM, Vicente J. Botet Escriba <vicente.botet@wanadoo.fr> wrote:
Le 10/12/12 22:16, Jeffrey Yasskin a écrit :
On Wed, Nov 28, 2012 at 2:48 AM, Rob Stewart <robertstewart@comcast.net> wrote:
o char const (&)[N]
This constructor is dangerous in cases like
char space[100]; snprintf(space, 100, "format", args); string_ref str(space);
so I think most of the suggestions on this list have moved toward a more explicit but very verbose string_ref::from_literal("foo\0bar").
I guess this is because the proposed Boost interface should work on C++98 compilers. This doesn't mean that it could not provide something more adapted to C++11.
I'm thinking largely in terms of the C++14 proposal, and it sounds like you're saying that, if we target >=C++11, then a UDL would satisfy your want for an array-length-deducing "constructor"? I recognize that Boost probably needs to do something to support C++98 users, although maybe that should be "pass in a length if your literal contains a \0."
http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2012/n3468.pdf proposed adding various user-defined literals to the standard library. What do you folks think of string_ref UDLs like:
namespace boost {
namespace string_literals {
#define BOOST_DEFINE_STRING_REF_LITERAL(CharT) \
  constexpr basic_string_ref<CharT> operator"" _s( \
      const CharT* str, size_t len) { \
    return basic_string_ref<CharT>(str, len); \
  }
BOOST_DEFINE_STRING_REF_LITERAL(char);
BOOST_DEFINE_STRING_REF_LITERAL(wchar_t);
BOOST_DEFINE_STRING_REF_LITERAL(char16_t);
BOOST_DEFINE_STRING_REF_LITERAL(char32_t);
#undef BOOST_DEFINE_STRING_REF_LITERAL
}
}
using namespace boost::string_literals;
constexpr boost::string_ref glbl = "Constexpr"_s;
constexpr boost::string_ref contains_null = "Const\0expr"_s;
static_assert(contains_null.at(6) == 'e', "Expect string to extend past null");
static_assert(glbl.size() == sizeof("Constexpr") - 1, "Expect string to omit trailing null");
(Tested with Clang r163674)
?
I think that the suffix _s is used also for seconds. Can both be used without ambiguity? If not, is _str too long for a suffix for strings?
_s can be used for both. See [lex.ext] for the lookup rules, but 123_s is looked up as operator"" _s(unsigned long long) or template<char... Digits> operator"" _s() or operator"" _s(const char*), while "123"_s is looked up as operator"" _s(const char*, size_t). IIRC, there was some nervousness in the committee about using that to distinguish, but N3468 seems to say that they resolved this in Portland in favor of using 's' for both string and seconds. (Note that I disagree with that paper about using the 's' UDL to return non-constexpr types, but I can hash that out with the committee. I'm mostly asking this list about whether any UDL spelling would remove the desire for an explicit array constructor.) Thanks, Jeffrey
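To illustrate the lookup rule Jeffrey cites, here is a sketch with two hypothetical types (seconds_sketch, str_ref_sketch) in a hypothetical namespace sketch_literals. An integer literal such as 5_s selects the unsigned long long overload, while a string literal such as "a\0b"_s selects the (const char*, std::size_t) overload, whose length counts embedded nulls and excludes the terminating one.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types; neither is a Boost or standard type.
struct seconds_sketch {
    unsigned long long count;
};

struct str_ref_sketch {
    const char* data;
    std::size_t size;
};

namespace sketch_literals {
// Selected for integer literals such as 5_s.
constexpr seconds_sketch operator"" _s(unsigned long long n) {
    return seconds_sketch{n};
}
// Selected for string literals such as "5"_s; len excludes the trailing null.
constexpr str_ref_sketch operator"" _s(const char* str, std::size_t len) {
    return str_ref_sketch{str, len};
}
}  // namespace sketch_literals
```

With "using namespace sketch_literals;" in scope, 5_s yields a seconds_sketch whose count is 5, and "a\0b"_s yields a str_ref_sketch of size 3.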

On Dec 10, 2012, at 4:16 PM, Jeffrey Yasskin <jyasskin@googlers.com> wrote:
On Wed, Nov 28, 2012 at 2:48 AM, Rob Stewart <robertstewart@comcast.net> wrote:
o char const (&)[N]
This constructor is dangerous in cases like
char space[100]; snprintf(space, 100, "format", args); string_ref str(space);
so I think most of the suggestions on this list have moved toward a more explicit but very verbose string_ref::from_literal("foo\0bar").
Misuse is the user's fault, of course, but I understand the value of enlisting the compiler's help and making things verbose to turn mistakes into errors or to make them more obvious. In your example, from_literal() is incorrect as the construction should be from a char * (after array to pointer decay), with an implicit strlen(), not from a string literal. So, I think you meant that from_array() should replace the referenced constructor so string_ref(char const *) matches the call in your example instead. That's a reasonable change. ___ Rob

On Wed, Dec 12, 2012 at 4:49 AM, Rob Stewart <robertstewart@comcast.net> wrote:
On Dec 10, 2012, at 4:16 PM, Jeffrey Yasskin <jyasskin@googlers.com> wrote:
On Wed, Nov 28, 2012 at 2:48 AM, Rob Stewart <robertstewart@comcast.net> wrote:
o char const (&)[N]
This constructor is dangerous in cases like
char space[100]; snprintf(space, 100, "format", args); string_ref str(space);
so I think most of the suggestions on this list have moved toward a more explicit but very verbose string_ref::from_literal("foo\0bar").
Misuse is the user's fault, of course, but I understand the value of enlisting the compiler's help and making things verbose to turn mistakes into errors or to make them more obvious.
In your example, from_literal() is incorrect as the construction should be from a char * (after array to pointer decay), with an implicit strlen(), not from a string literal. So, I think you meant that from_array() should replace the referenced constructor so string_ref(char const *) matches the call in your example instead. That's a reasonable change.
I don't understand this paragraph. I was saying that the suggestions on this list were tending toward string_ref("foo\0bar").size()==3 and string_ref::from_literal("foo\0bar").size()==7. I'm not clear on what people intend string_ref::from_array("foo\0bar") to do, but maybe it includes the trailing \0? Then I suggested that from_literal might be replaced by a C++11 UDL, but you ignored that part of my email. Jeffrey
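The semantics Jeffrey summarizes (string_ref("foo\0bar").size()==3, string_ref::from_literal("foo\0bar").size()==7) can be sketched as below. string_ref_sketch is a hypothetical class, not Boost's string_ref. Because it has no array constructor, the snprintf buffer from the earlier example decays to a pointer and gets a strlen-based length instead of 99 or 100.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical view illustrating the semantics summarized above; not Boost's API.
class string_ref_sketch {
public:
    // Pointer constructor: length found with strlen, so it stops at the first '\0'.
    // Arrays (including char space[100]) decay to a pointer and land here.
    string_ref_sketch(const char* p) : ptr_(p), len_(std::strlen(p)) {}

    // Explicit factory for literals: length is N - 1, so embedded nulls are
    // kept and only the terminating null is dropped.
    template <std::size_t N>
    static string_ref_sketch from_literal(const char (&lit)[N]) {
        return string_ref_sketch(lit, N - 1);
    }

    std::size_t size() const { return len_; }
    const char* data() const { return ptr_; }

private:
    string_ref_sketch(const char* p, std::size_t n) : ptr_(p), len_(n) {}
    const char* ptr_;
    std::size_t len_;
};
```

The explicit name keeps the compile-time length available for literals without making every array argument take that path silently.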

On Thu, Dec 13, 2012 at 3:48 AM, Jeffrey Yasskin <jyasskin@googlers.com> wrote:
Then I suggested that from_literal might be replaced by a C++11 UDL, but you ignored that part of my email.
I feel concerned about defining UDL operators in the public interface of a library. AFAICT, the operators don't support any scoping, which makes name clashes quite likely. Therefore I'm opposed to the UDL approach and would like to see generator functions. This approach will also support C++03.

On Dec 12, 2012, at 6:48 PM, Jeffrey Yasskin <jyasskin@googlers.com> wrote:
On Wed, Dec 12, 2012 at 4:49 AM, Rob Stewart <robertstewart@comcast.net> wrote:
On Dec 10, 2012, at 4:16 PM, Jeffrey Yasskin <jyasskin@googlers.com> wrote:
On Wed, Nov 28, 2012 at 2:48 AM, Rob Stewart <robertstewart@comcast.net> wrote:
o char const (&)[N]
This constructor is dangerous in cases like
char space[100]; snprintf(space, 100, "format", args); string_ref str(space);
so I think most of the suggestions on this list have moved toward a more explicit but very verbose string_ref::from_literal("foo\0bar").
Misuse is the user's fault, of course, but I understand the value of enlisting the compiler's help and making things verbose to turn mistakes into errors or to make them more obvious.
In your example, from_literal() is incorrect as the construction should be from a char * (after array to pointer decay), with an implicit strlen(), not from a string literal. So, I think you meant that from_array() should replace the referenced constructor so string_ref(char const *) matches the call in your example instead. That's a reasonable change.
I don't understand this paragraph. I was saying that the suggestions on this list were tending toward string_ref("foo\0bar").size()==3 and string_ref::from_literal("foo\0bar").size()==7. I'm not clear on what people intend string_ref::from_array("foo\0bar") to do, but maybe it includes the trailing \0?
I didn't follow that part very carefully, it seems, if your summary is correct. I do think it is important to be able to use an entire array, so from_array() would be useful apart from from_literal().
Then I suggested that from_literal might be replaced by a C++11 UDL, but you ignored that part of my email.
Since I know almost nothing about UDLs, what should I have done differently? ___ Rob
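A sketch of from_array as Rob describes it here and Maxim describes it earlier in the thread: it spans the whole array, with no terminator assumed, which suits fixed-width fields in char-based protocol messages. char_range_sketch and the order_msg layout are hypothetical illustrations, not an existing API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical view; from_array is the name used in this thread, not a Boost API.
class char_range_sketch {
public:
    char_range_sketch(const char* p, std::size_t n) : ptr_(p), len_(n) {}

    // Covers all N elements of the array; no terminating '\0' is assumed
    // or stripped, unlike from_literal.
    template <std::size_t N>
    static char_range_sketch from_array(const char (&arr)[N]) {
        return char_range_sketch(arr, N);
    }

    std::size_t size() const { return len_; }
    const char* data() const { return ptr_; }

private:
    const char* ptr_;
    std::size_t len_;
};

// Hypothetical message with fixed-width, space- or zero-padded fields that
// carry no null terminator.
struct order_msg {
    char symbol[8];
    char quantity[6];
};
```

Here from_array(m.symbol) has size 8 and from_array(m.quantity) has size 6, whatever bytes the fields contain.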

16.11.2012, 14:45, "Olaf van der Spek" <ml@vdspek.org>:
On Fri, Nov 16, 2012 at 11:31 AM, Yanchenko Maxim <maximyanchenko@yandex.ru> wrote:
Not only subscripted access. Taking a subrange also requires knowing size. Copying from/to (read memcpy) - same. Filling (read memset) - same. Comparing (read memcmp) - same.
Those are C-style constructs. The C++-style equivalents are iterator-based.
Those are high-performance constructs. We can only pray that a compiler will be smart enough to convert our iterator-based code to memcpy/memcmp/memset, and from my experience compilers are not nearly as smart if it's slightly beyond trivial cases. (char_range is an optimization technique so we aim for maximum speed. If you don't maximize speed you'd be happy with simple and safe std::string copies.)
Suppose you have two pointers, 0xa0 (begin) and 0xb0 (end). The size in bytes is 0x10. Suppose you have one pointer (0xa0) and one size (0x10). Does this point to the same memory?
"this" means 0xa0+0x10? By construction - yes, they do. We trust the caller to give us the correct size (or a correct pair of begin/end pointers, from which we compute the size in our ctor). std::string makes the same assumptions.
Yes if sizeof(value_type) == 1, no otherwise. You can't tell to what memory range it points without knowing sizeof(value_type)
Ah. The first pointer (0xa0) is typed, so we surely know value_type. That's why your 0xa0 - 0xb0 works. They are not void*, they are value_type*.
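The sizeof(value_type) point in this exchange can be made concrete with typed pointers: end - begin counts elements, and the byte extent is that count times sizeof(T). So 16 (0x10) elements span 16 bytes for char but 16 * sizeof(char32_t) bytes for char32_t. element_count and byte_count are hypothetical helpers for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Typed pointer difference counts elements, not bytes.
template <typename T>
std::size_t element_count(const T* first, const T* last) {
    return static_cast<std::size_t>(last - first);
}

// Byte extent of [first, last): element count times the element size.
// Only for T = char (sizeof(T) == 1) do the two numbers coincide.
template <typename T>
std::size_t byte_count(const T* first, const T* last) {
    return element_count(first, last) * sizeof(T);
}
```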
Shouldn't they be implicit?
Not from std::string. Same argument as for not having implicit conversion to char*.
What argument would that be?
You are giving away a reference to string internals that are subject to change/die anytime.
Isn't that by definition for a reference? It applies to const string& too. I don't think that's a good reason.
It's not a reference to std::string, it's a reference to *internals* of std::string. Those internals are managed by std::string exclusively. I.e. if you have a reference to std::string and you expand the string, the reference will continue to work with no problem, while a reference to internals will be invalidated (iterators are the simplest example of references to internals that get invalidated). But when you give away iterators, you do it explicitly via begin/end. In the same way, if you give away a reference to std::string internals, you do it explicitly via data/c_str. This makes potentially dangerous code visible. The same should be done with char_range construction from std::string::data - it should be explicit.
Btw, const references are not that harmless, consider this innocent-looking code:
struct S { const std::string& ref_; S(const std::string& ref): ref_(ref) {} };
S s1("foo"); S s2(std::string("bar"));
Making it explicit and visible in the caller code ensures that the programmer will take special measures to make sure that the string doesn't change/die while there's a char_range looking into it.
Consider std::vector<char_range>, for example.
Back to this example:
// std::vector<std::string> v; - too slow, upgrading to our new char_range!
std::vector<char_range> v;
v.push_back( "foo" );
v.push_back( std::string("bar") ); // BOOM
When pushing stuff to this vector, we want to be 100% sure that strings that gave away their char_ranges will live longer than the vector and live unchanged. And for this we need all the help a compiler can give us, namely - force us to explicitly declare the give-away and fail to compile otherwise. char_range is an efficient, but dangerous technique. I'm not a particular fan of Python, but when it comes to ownership management in C++, I prefer their maxim "explicit is better than implicit".
For the same reason we have explicit char_range::literal and char_range::from_array.
I'd like this to work: void f(str_ref); f("Olaf");
f( char_range::literal("Olaf") ); Explicit and with size known at compile-time (so compiler can utilize this knowledge).
Thanks, Maxim
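A sketch of the "explicit is better than implicit" rule Maxim argues for, using a hypothetical char_range_sketch (not Maxim's actual char_range). The std::string constructor is explicit, so v.push_back(std::string("bar")) no longer compiles and every borrow of a string's buffer has to be spelled out at the call site.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical view modelled on the char_range described above; not a Boost type.
class char_range_sketch {
public:
    // Compile-time size for literals, minus the terminating null.
    template <std::size_t N>
    static char_range_sketch literal(const char (&lit)[N]) {
        return char_range_sketch(lit, N - 1);
    }

    // Explicit: borrowing a std::string's buffer must be written out.
    explicit char_range_sketch(const std::string& s) : ptr_(s.data()), len_(s.size()) {}

    std::size_t size() const { return len_; }

private:
    char_range_sketch(const char* p, std::size_t n) : ptr_(p), len_(n) {}
    const char* ptr_;
    std::size_t len_;
};

static_assert(!std::is_convertible<std::string, char_range_sketch>::value,
              "no implicit conversion from std::string");
static_assert(std::is_constructible<char_range_sketch, const std::string&>::value,
              "explicit construction is still available");
```

Note that explicit only makes the borrow visible: char_range_sketch(std::string("bar")) still compiles and still dangles once the temporary dies, so the caller remains responsible for keeping the string alive and unchanged.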

On 16 November 2012 10:54, Yanchenko Maxim <maximyanchenko@yandex.ru> wrote:
Those are high-performance constructs. We can only pray that a compiler will be smart enough to convert our iterator-based code to memcpy/memcmp/memset, and from my experience compilers are not nearly as smart if it's slightly beyond trivial cases.
Now we are getting somewhere. Actual experience. Could you elaborate on the compilers and constructs that need to be hand optimized into equivalent code because the optimizers aren't doing it themselves? Or are there better constructs with size that aren't equivalent to their pointer counterparts? -- Nevin ":-)" Liber <mailto:nevin@eviloverlord.com> (847) 691-1404

Nevin Liber <nevin <at> eviloverlord.com> writes:
Those are high-performance constructs. We can only pray that a compiler will be smart enough to convert our iterator-based code to memcpy/memcmp/memset, and from my experience compilers are not nearly as smart if it's slightly beyond trivial cases.
Now we are getting somewhere. Actual experience. Could you elaborate on the compilers and constructs that need to be hand optimized into equivalent code because the optimizers aren't doing it themselves? Or are there better constructs with size that aren't equivalent to their pointer counterparts?
All memcpy/memcmp/memset functions require ptr+size to be passed. So we either compute the size manually every time from begin-end pointers (which costs next to nothing compared to the mem* call itself) or carry it on board. So for this particular set of use cases I believe it doesn't matter whether it's a pair of pointers or a pointer and a size - the mem* functions will run an order of magnitude longer anyway. And here I'd prefer having 2 pointers as it's conceptually cleaner, as (again) char_range is essentially just an iterator_range<char*>. But there are other operations, e.g. sub_string (sub_range in our case) accepts 2 indexes and the second one can be anything up to std::string::npos (see std::string interface) meaning "to the end". So you need size to calculate the result and to avoid jumping beyond the range. As all operations here are very simple, eliminating computation of the pointer difference can give some extra speed. This might make some difference in parsers when you have a big input string and then all lexemes and thousands of references from the AST to corresponding text ranges are just char_ranges (subranges) pointing into the big string. These are all my speculations; I don't have performance figures of ptr+ptr vs. ptr+size (I measured it one or two years ago in our project (we use ptr+ptr, and I considered switching to ptr+size), and probably didn't notice any observable difference as I didn't finally switch - I no longer remember the details. And yes, we explicitly used mem* functions as we weren't satisfied with the code GCC generated). I hope LLVM people (authors of the original proposal) could share their experience as well. Thanks, Maxim
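Maxim's sub_range argument can be sketched with a pointer+size layout. Because the count may be npos, meaning "to the end", the stored size is what clamps the result and rejects an out-of-range position, mirroring std::string::substr. char_range_sketch and sub_range are hypothetical names here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical pointer+size view; only the sub_range operation is shown.
class char_range_sketch {
public:
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);

    char_range_sketch(const char* p, std::size_t n) : ptr_(p), len_(n) {}

    // Like std::string::substr: count may be npos ("to the end"), so the
    // stored size clamps the result and rejects pos > size().
    char_range_sketch sub_range(std::size_t pos, std::size_t count = npos) const {
        if (pos > len_) throw std::out_of_range("sub_range: pos > size()");
        return char_range_sketch(ptr_ + pos, std::min(count, len_ - pos));
    }

    std::size_t size() const { return len_; }
    const char* data() const { return ptr_; }

private:
    const char* ptr_;
    std::size_t len_;
};
```

For a view over "hello world" (size 11), sub_range(6) yields the 5 characters of "world", sub_range(11) is empty, and sub_range(12) throws std::out_of_range.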

Nevin Liber wrote:
Now we are getting somewhere. Actual experience. Could you elaborate on the compilers and constructs that need to be hand optimized into equivalent code because the optimizers aren't doing it themselves?
Compilers are, in my experience, better at optimizing i < n loops than they are at optimizing i != end loops. They are better at constant-folding the n, too, which then helps with eventual unrolls. But why are we having this discussion at all? These are private members of the class; the author can do whatever he likes.
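Peter's point that the layout is private can be illustrated with two hypothetical classes exposing the same begin()/end()/size() interface while storing ptr+len and ptr+ptr respectively; generic code such as the hypothetical count_spaces cannot tell them apart. Its counted i < n loop is the form Peter reports compilers handle better; this sketch makes no performance claim of its own.

```cpp
#include <cassert>
#include <cstddef>

// Layout A: pointer + length; end() is computed on demand.
class view_ptr_len {
public:
    view_ptr_len(const char* p, std::size_t n) : ptr_(p), len_(n) {}
    const char* begin() const { return ptr_; }
    const char* end() const { return ptr_ + len_; }
    std::size_t size() const { return len_; }
private:
    const char* ptr_;
    std::size_t len_;
};

// Layout B: pointer + pointer; size() is computed on demand.
class view_ptr_ptr {
public:
    view_ptr_ptr(const char* p, std::size_t n) : first_(p), last_(p + n) {}
    const char* begin() const { return first_; }
    const char* end() const { return last_; }
    std::size_t size() const { return static_cast<std::size_t>(last_ - first_); }
private:
    const char* first_;
    const char* last_;
};

// Counted loop (i < n) written against the shared public interface only.
template <typename View>
std::size_t count_spaces(const View& v) {
    std::size_t n = 0;
    const char* p = v.begin();
    for (std::size_t i = 0, e = v.size(); i < e; ++i)
        if (p[i] == ' ') ++n;
    return n;
}
```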

on Fri Nov 16 2012, "Peter Dimov" <lists-AT-pdimov.com> wrote:
Nevin Liber wrote:
Now we are getting somewhere. Actual experience. Could you elaborate on the compilers and constructs that need to be hand optimized into equivalent code because the optimizers aren't doing it themselves?
Compilers are, in my experience, better at optimizing i < n loops than they are at optimizing i != end loops. They are better at constant-folding the n, too, which then helps with eventual unrolls.
+1
But why are we having this discussion at all? These are private members of the class; the author can do whatever he likes.
+10 -- Dave Abrahams BoostPro Computing Software Development Training http://www.boostpro.com Clang/LLVM/EDG Compilers C++ Boost

On 16/11/12 18:38, Nevin Liber wrote:
On 16 November 2012 10:54, Yanchenko Maxim <maximyanchenko@yandex.ru> wrote:
Those are high-performance constructs. We can only pray that a compiler will be smart enough to convert our iterator-based code to memcpy/memcmp/memset, and from my experience compilers are not nearly as smart if it's slightly beyond trivial cases.
Now we are getting somewhere. Actual experience. Could you elaborate on the compilers and constructs that need to be hand optimized into equivalent code because the optimizers aren't doing it themselves? Or are there better constructs with size that aren't equivalent to their pointer counterparts?
Hand-optimized code is better than what you obtain through automated code generation optimization algorithms, since those algorithms are only as smart as the transformations the people who engineered them could make the system detect and do, and are therefore one-size-fits-all. On the other hand, hand-optimized code is optimized specially for the particular thing in question. memcpy on x86-64, for example, is typically 1.5x faster when specifically hand-optimized (and that's assuming the compiler detected that it could apply vectorization, which is most of the time not the case).

On Fri, Nov 16, 2012 at 5:54 PM, Yanchenko Maxim <maximyanchenko@yandex.ru> wrote:
Those are C-style constructs. The C++-style equivalents are iterator-based.
Those are high-performance constructs. We can only pray that a compiler will be smart enough to convert our iterator-based code to memcpy/memcmp/memset, and from my experience compilers are not nearly as smart if it's slightly beyond trivial cases.
AFAIK MSVC has library code to use memcpy for std::copy if possible.
(char_range is an optimization technique so we aim for maximum speed. If you don't maximize speed you'd be happy with simple and safe std::string copies.)
Could you stop trying to say what I should be happy with please? It's not just about performance. You can't pass a CString (MFC) or QString (Qt) to a function taking a const string&.
Suppose you have two pointers, 0xa0 (begin) and 0xb0 (end). The size in bytes is 0x10. Suppose you have one pointer (0xa0) and one size (0x10). Does this point to the same memory?
"this" means 0xa0+0x10? By construction - yes, they do.
No, it could mean 0xa0 + 0x40 if sizeof(value_type) == 4.
Yes if sizeof(value_type) == 1, no otherwise. You can't tell to what memory range it points without knowing sizeof(value_type)
Ah. The first pointer (0xa0) is typed, so we surely know value_type.
No, we don't always know the type.
That's why your 0xa0 - 0xb0 works. They are not void*, they are value_type*.
Isn't that by definition for a reference? It applies to const string& too. I don't think that's a good reason.
It's not a reference to std::string, it's a reference to *internals* of std::string. Those internals are managed by std::string exclusively. I.e. if you have a reference to std::string and you expand the string, the reference will continue to work with no problem, while a reference to internals will be invalidated (the simplest example of a reference to internals are invalidating iterators). But when you give away iterators, you do it explicitly via begin/end. Same way, if you give away a reference to std::string internals, you do it explicitly via data/c_str. This make potentially dangerous code visible. Same should be done with char_range construction from std::string::data - it should be explicit.
Btw, const references are not that harmless, consider this innocent-looking code:
struct S { const std::string& ref_; S(const std::string& ref): ref_(ref) {} };
S s1("foo"); S s2(std::string("bar"));
I know. The danger is in storing a reference (or pointer) to something that may die before the reference. It's not in passing a function argument as reference.
// std::vector<std::string> v; - too slow, upgrading to our new char_range!
std::vector<char_range> v;
v.push_back( "foo" );
v.push_back( std::string("bar") ); // BOOM
When pushing stuff to this vector, we want to be 100% sure that strings that gave away their char_ranges will live longer than the vector and live unchanged. And for this we need all the help a compiler can give us, namely - force us to explicitly declare the give-away and fail to compile otherwise.
I think if you need that kind of 'safety', C++ isn't the language for you.
For the same reason we have explicit char_range::literal and char_range::from_array.
I'd like this to work: void f(str_ref); f("Olaf");
f( char_range::literal("Olaf") ); Explicit and with size known at compile-time (so compiler can utilize this knowledge).
Explicit and unclean. -- Olaf

On 16/11/12 17:54, Yanchenko Maxim wrote:
16.11.2012, 14:45, "Olaf van der Spek" <ml@vdspek.org>:
Yes if sizeof(value_type) == 1, no otherwise. You can't tell to what memory range it points without knowing sizeof(value_type)
Ah. The first pointer (0xa0) is typed, so we surely know value_type.
I assume the point of his argument is that you could write most of the code without templates, meaning smaller code size and faster compilation times.

On 16/11/12 10:33, Olaf van der Spek wrote:
On Fri, Nov 16, 2012 at 10:28 AM, Yanchenko Maxim <maximyanchenko@yandex.ru> wrote:
16.11.12, 12:30, "Olaf van der Spek":
OTOH size is needed very frequently and having it precomputed is a good thing, so conversion
Size is cheap to calculate, it's not worth storing it. Standardizing the layout of this type might be good too for ABI stability and maybe interoperability.
Yes, it's cheap to calculate (end ptr is equally cheap to calculate btw) but you'll end up calculating it all the time in almost every function. At least my analysis showed that in real app size() is called much more frequently than end().
Really? Do you use subscripted access then? With iterator access you'd use end().
Most std::string member functions (which are replicated in string_ref) use a subscript interface.

On 16/11/12 05:47, Yanchenko Maxim wrote:
3. It's worth having a static constructor 'literal' (templated with size) to construct char_ranges from literals - as the compiler knows their size at compile time (minus the zero terminator). It can be a constexpr too.
strlen could be made to be constexpr, as it is already in several implementations.
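A sketch of Mathias's suggestion, assuming a hypothetical const_strlen written as a C++11 constexpr function (a single recursive return statement). The length of a literal then becomes a compile-time constant without a size-templated constructor. Unlike the size-templated 'literal' constructor quoted above, it stops at the first embedded null.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constexpr strlen in C++11 style: counts characters up to,
// but not including, the first '\0'.
constexpr std::size_t const_strlen(const char* s) {
    return *s == '\0' ? 0 : 1 + const_strlen(s + 1);
}

// Evaluated at compile time for literals.
static_assert(const_strlen("Constexpr") == 9, "nine characters");
static_assert(const_strlen("") == 0, "empty string");
```

The same function also works at run time on ordinary buffers, e.g. const_strlen on a char array holding "abc" returns 3.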
participants (20)
- Andrey Semashev
- Antony Polukhin
- Daniel James
- Dave Abrahams
- Gottlob Frege
- Jeff Flinn
- Jeffrey Lee Hellrung, Jr.
- Jeffrey Yasskin
- Marshall Clow
- Mathias Gaunard
- Maxim Yanchenko
- Nevin Liber
- Olaf van der Spek
- Peter Dimov
- Rob Stewart
- Sergey Cheban
- Steven Watanabe
- Vicente J. Botet Escriba
- Yakov Galka
- Yanchenko Maxim