Heads up - string_ref landing

I'm about to check in some new functionality into the string_algo library; an implementation of string_ref.
A string_ref is a non-owning reference to a string. It is implemented as a {pointer, length} pair, and is exceedingly useful when parsing, and manipulating strings in "read-only" ways.
The header will be in boost/algorithm/string_ref.hpp, but I welcome discussion about where it should live. Ideally, I think it should be just "boost/string_ref.hpp".
Basic sanity tests will be checked in with the header file - I have tested on clang, clang11, and gcc. One reason for checking this in now is to get the tests run on lots of different systems.
Docs will be coming soon, but http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3442.html is the proposal for "a future version of C++" ;-)
Please let me know what you think!
-- Marshall
Marshall Clow Idio Software <mailto:mclow.lists@gmail.com>
A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait). -- Yu Suzuki
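To make the {pointer, length} idea concrete, here is a minimal sketch of such a non-owning view. The names simple_string_ref and count_char are hypothetical and chosen only for illustration; this is not the implementation being checked in, just one way the description above could look in code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical minimal view type; NOT the actual Boost implementation.
class simple_string_ref {
public:
    simple_string_ref() : ptr_(nullptr), len_(0) {}
    simple_string_ref(const char* s) : ptr_(s), len_(std::strlen(s)) {}
    simple_string_ref(const char* s, std::size_t n) : ptr_(s), len_(n) {}
    simple_string_ref(const std::string& s) : ptr_(s.data()), len_(s.size()) {}

    const char* data() const { return ptr_; }
    std::size_t size() const { return len_; }
    bool empty() const { return len_ == 0; }
    char operator[](std::size_t i) const { return ptr_[i]; }

private:
    const char* ptr_;  // not owned; the referenced buffer must outlive the view
    std::size_t len_;
};

// Counts occurrences of c without copying the underlying characters.
inline std::size_t count_char(simple_string_ref s, char c) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        if (s[i] == c) ++n;
    return n;
}
```

A function such as count_char can then accept a std::string or a string literal through the same non-template parameter, and neither call copies the characters.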

Hi Marshall, On Thu, Nov 15, 2012 at 8:19 PM, Marshall Clow <mclow.lists@gmail.com> wrote:
I'm about to check in some new functionality into the string_algo library; an implementation of string_ref.
A string_ref is a non-owning reference to a string. It is implemented as a {pointer, length} pair, and is exceedingly useful when parsing, and manipulating strings in "read-only" ways.
Nice and thanks for your work! Personally I've been missing something like this in the standard many times and had to write some such class for several different projects.
The header will be in boost/algorithm/string_ref.hpp, but I welcome discussion about where it should live. Ideally, I think it should be just "boost/string_ref.hpp".
+1 to boost/string_ref.hpp (it is not an algorithm)
Basic sanity tests will be checked in with the header file - I have tested on clang, clang11, and gcc. One reason for checking this in now is to get the tests run on lots of different systems.
Docs will be coming soon, but http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3442.html is the proposal for "a future version of C++" ;-)
I hope it gets included into the standard soon. Best, Matus

On Nov 15, 2012, at 11:31 AM, Matus Chochlik <chochlik@gmail.com> wrote:
Hi Marshall,
On Thu, Nov 15, 2012 at 8:19 PM, Marshall Clow <mclow.lists@gmail.com> wrote:
I'm about to check in some new functionality into the string_algo library; an implementation of string_ref.
A string_ref is a non-owning reference to a string. It is implemented as a {pointer, length} pair, and is exceedingly useful when parsing, and manipulating strings in "read-only" ways.
Nice and thanks for your work! Personally I've been missing something like this in the standard many times and had to write some such class for several different projects.
You're welcome. I use it a bunch, too.
The header will be in boost/algorithm/string_ref.hpp, but I welcome discussion about where it should live. Ideally, I think it should be just "boost/string_ref.hpp".
+1 to boost/string_ref.hpp (it is not an algorithm)
I agree - but it is (will be useful) for string algorithms.
Basic sanity tests will be checked in with the header file - I have tested on clang, clang11, and gcc. One reason for checking this in now is to get the tests run on lots of different systems.
Docs will be coming soon, but http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3442.html is the proposal for "a future version of C++" ;-)
I hope it gets included into the standard soon.
"Soon" and "C++ standard" are generally not words you see together. We're hoping for 2014. -- Marshall Marshall Clow Idio Software <mailto:mclow.lists@gmail.com> A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait). -- Yu Suzuki

On Thu, Nov 15, 2012 at 1:39 PM, Marshall Clow <mclow.lists@gmail.com> wrote:
On Nov 15, 2012, at 11:31 AM, Matus Chochlik <chochlik@gmail.com> wrote:
The header will be in boost/algorithm/string_ref.hpp, but I welcome discussion about where it should live. Ideally, I think it should be just "boost/string_ref.hpp".
+1 to boost/string_ref.hpp (it is not an algorithm)
I agree - but it is (will be useful) for string algorithms.
-20 ;-) We *really* should stop putting headers in the top level boost dir. The only other obvious place is "boost/utility/string_ref.hpp". -- -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

On 15 November 2012 19:46, Rene Rivera <grafikrobot@gmail.com> wrote:
On Thu, Nov 15, 2012 at 1:39 PM, Marshall Clow <mclow.lists@gmail.com> wrote:
On Nov 15, 2012, at 11:31 AM, Matus Chochlik <chochlik@gmail.com> wrote:
The header will be in boost/algorithm/string_ref.hpp, but I welcome discussion about where it should live. Ideally, I think it should be just "boost/string_ref.hpp".
+1 to boost/string_ref.hpp (it is not an algorithm)
I agree - but it is (will be useful) for string algorithms.
-20 ;-) We *really* should stop putting headers in the top level boost dir. The only other obvious place is "boost/utility/string_ref.hpp".
I agree. The top-level directory is overpopulated. Shouldn't this live in Boost.Container next to string? That library's introduction exactly fits what this is (one of the "latest draft features"). Thanks for donating this Marshall. I've used something similar with Boost.Spirit before and seen big speed improvements, so I'd like to try your implementation out. -- Darren

On Nov 15, 2012, at 3:10 PM, Darren Garvey <darren.garvey@gmail.com> wrote:
On 15 November 2012 19:46, Rene Rivera <grafikrobot@gmail.com> wrote:
On Thu, Nov 15, 2012 at 1:39 PM, Marshall Clow <mclow.lists@gmail.com> wrote:
On Nov 15, 2012, at 11:31 AM, Matus Chochlik <chochlik@gmail.com> wrote:
The header will be in boost/algorithm/string_ref.hpp, but I welcome discussion about where it should live. Ideally, I think it should be just "boost/string_ref.hpp".
+1 to boost/string_ref.hpp (it is not an algorithm)
I agree - but it is (will be useful) for string algorithms.
-20 ;-) We *really* should stop putting headers in the top level boost dir.
Agreed
The only other obvious place is "boost/utility/string_ref.hpp".
I agree. The top-level directory is overpopulated. Shouldn't this live in Boost.Container next to string? That library's introduction exactly fits what this is (one of the "latest draft features").
I certainly don't think of a string_ref as a container. It owns no elements. Utility makes a lot of sense to me. ___ Rob

On Fri, Nov 16, 2012 at 7:54 AM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 15/11/12 21:10, Darren Garvey wrote:
I agree. The top-level directory is overpopulated.
What's wrong with it? If it's going to be in the top-level boost namespace, it might as well be a top-level file too.
We are in the process of modularizing boost libraries because lumping all of boost into a single repo, or tarball, or whatever doesn't scale. So think about the future - in a modularized boost where the top-level is really just references to libraries, what library does this belong in? --Beman

On 16/11/12 15:13, Beman Dawes wrote:
On Fri, Nov 16, 2012 at 7:54 AM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 15/11/12 21:10, Darren Garvey wrote:
I agree. The top-level directory is overpopulated.
What's wrong with it? If it's going to be in the top-level boost namespace, it might as well be a top-level file too.
We are in the process of modularizing boost libraries because lumping all of boost into a single repo, or tarball, or whatever doesn't scale.
So think about the future - in a modularized boost where the top-level is really just references to libraries, what library does this belong in?
It can be in any module. Surely any module can add files anywhere, as long as two modules do not have files with conflicting names.

On Nov 15, 2012, at 11:46 AM, Rene Rivera <grafikrobot@gmail.com> wrote:
On Thu, Nov 15, 2012 at 1:39 PM, Marshall Clow <mclow.lists@gmail.com> wrote:
On Nov 15, 2012, at 11:31 AM, Matus Chochlik <chochlik@gmail.com> wrote:
The header will be in boost/algorithm/string_ref.hpp, but I welcome discussion about where it should live. Ideally, I think it should be just "boost/string_ref.hpp".
+1 to boost/string_ref.hpp (it is not an algorithm)
I agree - but it is (will be useful) for string algorithms.
-20 ;-) We *really* should stop putting headers in the top level boost dir. The only other obvious place is "boost/utility/string_ref.hpp".
That's a good idea. The good news is that I don't need to decide until (about) 15-Dec ;-) -- Marshall

On Thu, Nov 15, 2012 at 8:19 PM, Marshall Clow <mclow.lists@gmail.com> wrote:
I'm about to check in some new functionality into the string_algo library; an implementation of string_ref.
Great!
A string_ref is a non-owning reference to a string. It is implemented as a {pointer, length} pair, and is exceedingly useful when parsing, and manipulating strings in "read-only" ways.
Wouldn't a ptr pair be better?
The header will be in boost/algorithm/string_ref.hpp, but I welcome discussion about where it should live. Ideally, I think it should be just "boost/string_ref.hpp".
I'd go with the latter. I'm missing explicit operator bool() const { return !empty(); } Olaf

AMDG On 11/15/2012 11:39 AM, Olaf van der Spek wrote:
On Thu, Nov 15, 2012 at 8:19 PM, Marshall Clow <mclow.lists@gmail.com> wrote:
I'm about to check in some new functionality into the string_algo library; an implementation of string_ref.
Great!
A string_ref is a non-owning reference to a string. It is implemented as a {pointer, length} pair, and is exceedingly useful when parsing, and manipulating strings in "read-only" ways.
Wouldn't a ptr pair be better?
Why? I don't see that it makes any difference whatsoever.
The header will be in boost/algorithm/string_ref.hpp, but I welcome discussion about where it should live. Ideally, I think it should be just "boost/string_ref.hpp".
I'd go with the latter. I'm missing explicit operator bool() const { return !empty(); }
-1. A string is not a boolean value nor is an empty string a special invalid value. If you want to know whether a string is empty, you should call s.empty(). In Christ, Steven Watanabe

On Thu, Nov 15, 2012 at 9:32 PM, Steven Watanabe <watanabesj@gmail.com> wrote:
Why? I don't see that it makes any difference whatsoever.
It's just an implementation detail, indeed.
The header will be in boost/algorithm/string_ref.hpp, but I welcome discussion about where it should live. Ideally, I think it should be just "boost/string_ref.hpp".
I'd go with the latter. I'm missing explicit operator bool() const { return !empty(); }
-1. A string is not a boolean value
Neither is an iostream. Or an int or a pointer. What's your point?
nor is an empty string a special invalid value. If you want to know whether a string is empty, you should call s.empty().
That's a bit hard when you've got code like while (string_ref s = f()) or if (string_ref s = f()) Empty/non-empty is a clear and frequently tested distinction and one of the two happens to be the default value. -- Olaf
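The pattern Olaf describes can be sketched as follows. Everything here (token_ref, next_token, count_tokens) is a hypothetical stand-in invented for illustration, not part of the proposed string_ref; it only shows how an explicit operator bool lets a declaration serve as a loop condition.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical view type used only for illustration.
struct token_ref {
    const char* ptr;
    std::size_t len;
    bool empty() const { return len == 0; }
    // The conversion under discussion: contextually true when non-empty.
    explicit operator bool() const { return !empty(); }
};

// Hypothetical tokenizer: returns the next space-delimited token and advances
// the cursor; returns an empty token_ref once the input is exhausted.
inline token_ref next_token(const char*& cursor) {
    while (*cursor == ' ') ++cursor;
    const char* start = cursor;
    while (*cursor != '\0' && *cursor != ' ') ++cursor;
    return token_ref{start, static_cast<std::size_t>(cursor - start)};
}

inline std::size_t count_tokens(const char* text) {
    std::size_t n = 0;
    const char* cursor = text;
    while (token_ref t = next_token(cursor)) {  // declaration as condition
        (void)t;  // the token itself is unused in this sketch
        ++n;
    }
    return n;
}
```

Without the conversion, the same loop needs the declaration hoisted out of the condition and an explicit !t.empty() test, which is the trade-off being argued in this exchange.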

Marshall Clow wrote:
I'm about to check in some new functionality into the string_algo library; an implementation of string_ref.
A string_ref is a non-owning reference to a string. It is implemented as a {pointer, length} pair, and is exceedingly useful when parsing, and manipulating strings in "read-only" ways.
I did something like this once. I called mine const_string_facade - I also had a mutable version - and it was a template that took an iterator pair. This lets you adapt something like a vector<char> into a string. BUT, I think my feeling is that this is actually taking us in the wrong direction: my current coding style tries to avoid the "special" features of std::string and prefers the things that are common to other containers, and std::algorithms. Regards, Phil.

On Nov 15, 2012, at 12:41 PM, "Phil Endecott" <spam_from_boost_dev@chezphil.org> wrote:
Marshall Clow wrote:
I'm about to check in some new functionality into the string_algo library; an implementation of string_ref.
A string_ref is a non-owning reference to a string. It is implemented as a {pointer, length} pair, and is exceedingly useful when parsing, and manipulating strings in "read-only" ways.
I mistyped: A string_ref is a non-owning reference to a contiguous sequence of characters.
I did something like this once. I called mine const_string_facade - I also had a mutable version - and it was a template that took an iterator pair. This lets you adapt something like a vector<char> into a string.
BUT, I think my feeling is that this is actually taking us in the wrong direction: my current coding style tries to avoid the "special" features of std::string and prefers the things that are common to other containers, and std::algorithms.
Yes. And No. ;-) When you're writing generic code (and many of us do), you want to be as general as possible. In that case, you want a pair of iterators, or ... a range! However, when you are dealing with contiguous runs of characters (a surprisingly large set of cases), then a specific solution (like string_ref) can give you a significant performance boost. The LLVM experience has been quite enlightening about this, and while I don't have first-hand knowledge of what happened inside of Google, the reports I get from people there say the same thing. -- Marshall

On Thu, Nov 15, 2012 at 12:51 PM, Marshall Clow <mclow.lists@gmail.com> wrote:
On Nov 15, 2012, at 12:41 PM, "Phil Endecott" <spam_from_boost_dev@chezphil.org> wrote:
Marshall Clow wrote:
I'm about to check in some new functionality into the string_algo library; an implementation of string_ref.
A string_ref is a non-owning reference to a string. It is implemented as a {pointer, length} pair, and is exceedingly useful when parsing, and manipulating strings in "read-only" ways.
I mistyped: A string_ref is a non-owning reference to a contiguous sequence of characters.
How is this different from (say) contiguous_range< char > / contiguous_range< char const > ? I can imagine a contiguous_range<T> that wraps a pair of T*s, which would seem to be a simple generalization of your proposed string_ref. - Jeff

On 16/11/12 06:30, Jeffrey Lee Hellrung, Jr. wrote:
How is this different from (say) contiguous_range< char > / contiguous_range< char const > ? I can imagine a contiguous_range<T> that wraps a pair of T*s, which would seem to be a simple generalization of your proposed string_ref.
It isn't. The only difference is that string_ref provides an interface for substring operations, similar to std::string but without returning copies of the data, hence the string in the name.

On Fri, Nov 16, 2012 at 2:00 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 16/11/12 06:30, Jeffrey Lee Hellrung, Jr. wrote:
How is this different from (say) contiguous_range< char > / contiguous_range< char const > ? I can imagine a contiguous_range<T> that wraps a pair of T*s, which would seem to be a simple generalization of your proposed string_ref.
It isn't. The only difference is that string_ref provides an interface for substring operations, similar to std::string but without returning copies of the data, hence the string in the name.
That'd be better as a non-member function, wouldn't it? -- Olaf

On 16/11/12 14:04, Olaf van der Spek wrote:
On Fri, Nov 16, 2012 at 2:00 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 16/11/12 06:30, Jeffrey Lee Hellrung, Jr. wrote:
How is this different from (say) contiguous_range< char > / contiguous_range< char const > ? I can imagine a contiguous_range<T> that wraps a pair of T*s, which would seem to be a simple generalization of your proposed string_ref.
It isn't. The only difference is that string_ref provides an interface for substring operations, similar to std::string but without returning copies of the data, hence the string in the name.
That'd be better as a non-member function, wouldn't it?
It arguably would, but that wouldn't mimic std::string's interface. I'm talking of stuff like
    string_ref a = "foobar";
    string_ref b = a.substr(2, 4);
    assert( b == "obar" );
This is somewhat equivalent to the same code without the "_ref", except the above doesn't actually copy anything (modifying b will modify a). See the paper for more info.
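The no-copy behaviour of substr can be checked directly. The sketch below assumes a C++17 compiler and uses std::string_view as a stand-in for the proposed string_ref; that substitution is an assumption made so the example is self-contained, and the function name is hypothetical. Because the view is read-only, the shared storage shows up when the owning string is modified.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// std::string_view is used here as a stand-in for the proposed string_ref.
inline bool substr_shares_storage() {
    std::string owner = "foobar";
    std::string_view a = owner;
    std::string_view b = a.substr(2, 4);  // views "obar"; no characters copied

    bool same_chars = (b == "obar");
    bool no_copy = (b.data() == owner.data() + 2);

    // std::string::substr, by contrast, returns a new string with its own buffer.
    std::string copy = owner.substr(2, 4);
    bool copied = (copy.data() != owner.data() + 2);

    // Changing a character in the owner is visible through the view,
    // but not through the independent copy.
    owner[2] = 'X';
    bool sees_change = (b == "Xbar") && (copy == "obar");

    return same_chars && no_copy && copied && sees_change;
}
```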

On Fri, Nov 16, 2012 at 2:33 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
That'd be better as a non-member function, wouldn't it?
It arguably would, but that wouldn't mimic std::string's interface.
Isn't that a good thing? If we want to move away from that interface, designing new types with that interface is not a good plan. -- Olaf

On Fri, Nov 16, 2012 at 5:00 AM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 16/11/12 06:30, Jeffrey Lee Hellrung, Jr. wrote:
How is this different from (say) contiguous_range< char > / contiguous_range< char const > ? I can imagine a contiguous_range<T> that wraps a pair of T*s, which would seem to be a simple generalization of your proposed string_ref.
It isn't. The only difference is that string_ref provides an interface for substring operations, similar to std::string but without returning copies of the data, hence the string in the name.
Nothing here sounds specific to strings other than naming. That's why it seems like unnecessary specificity to me. IMHO, I would think
    typedef contiguous_range< char > string_ref;
    typedef contiguous_range< char const > string_cref;
would be better, and if you want an interface with names specifically tailored to strings, make them free functions. (And, no, contiguous_range does not exist in Boost, but given we already have a model of it in the proposed string_ref, it seems like a simple matter to provide it.) - Jeff

On Fri, Nov 16, 2012 at 8:50 AM, Peter Dimov <lists@pdimov.com> wrote:
Jeffrey Lee Hellrung, Jr. wrote:
IMHO, I would think
typedef contiguous_range< char > string_ref; typedef contiguous_range< char const > string_cref;
would be better, ...
In what ways would it be better?
For one, it would provide a single implementation of a contiguous range of char, char const, unsigned char, unsigned char const, wchar_t, and wchar_t const. Secondly (and I haven't read the C++ proposal), I'm gathering one use case for string_ref is to provide a non-template API to accept or return a sequence of characters more generically than using a std::string but with no loss in runtime performance (here meaning no copying of the underlying characters). E.g., a function whose parameter is string_ref may accept a std::string or a std::vector< char >, which I think is an improvement over a parameter of type std::string. But there's nothing inherently string-like about that design; this use case exists for other value types as well. I used this technique recently: A base class defined a virtual function with signature {virtual contiguous_range< foo > bar() const}, and the derived classes could optionally back their collection of foo's by either a C-array, a {boost,std}::array, a std::vector, or my_custom_dynamic_array. Lastly, and I don't know how big a deal this is, but one could write more tailored range algorithms for contiguous_ranges of POD types (using std::memcpy or std::memset, for example, which may produce more optimal code (?) than std::copy or std::fill, resp.). So I can see a plausible advantage in having a single template class abstracting contiguous ranges. That's just my thinking at the moment, I could be missing something. - Jeff
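The virtual-function use case Jeff mentions can be sketched roughly as below. contiguous_range does not exist in Boost (as noted earlier in the thread), so the template here is a hypothetical minimal version, and source, array_source, vector_source and sum are illustrative names only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical contiguous_range<T>: a non-owning {pointer, length} pair.
template <class T>
class contiguous_range {
public:
    contiguous_range(T* p, std::size_t n) : ptr_(p), len_(n) {}
    T* begin() const { return ptr_; }
    T* end() const { return ptr_ + len_; }
    std::size_t size() const { return len_; }
private:
    T* ptr_;
    std::size_t len_;
};

struct source {
    virtual ~source() {}
    // Non-template virtual API; the backing storage is up to the derived class.
    virtual contiguous_range<const int> values() const = 0;
};

class array_source : public source {
    std::array<int, 3> data_{{1, 2, 3}};
public:
    contiguous_range<const int> values() const override {
        return contiguous_range<const int>(data_.data(), data_.size());
    }
};

class vector_source : public source {
    std::vector<int> data_{10, 20};
public:
    contiguous_range<const int> values() const override {
        return contiguous_range<const int>(data_.data(), data_.size());
    }
};

// Works against the base class only; no elements are copied.
inline int sum(const source& s) {
    int total = 0;
    for (int v : s.values()) total += v;
    return total;
}
```

The caller sees one return type regardless of whether the elements live in a std::array or a std::vector, which is the "genericity without templates" being described.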

Jeffrey Lee Hellrung, Jr. wrote:
... I haven't read the C++ proposal ...
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3442.html
For one, it would provide a single implementation of a contiguous range of char, char const, unsigned char, unsigned char const, wchar_t, and wchar_t const.
It does.
But there's nothing inherently string-like about that design;
There are a number of very string-like things about the design.
Lastly, and I don't know how big a deal this is, but one could write more tailored range algorithms for contiguous_ranges of POD types (using std::memcpy or std::memset, for example, which may produce more optimal code (?) than std::copy or std::fill, resp.). So I can see a plausible advantage in having a single template class abstracting contiguous ranges.
To do this, you should accept an arbitrary Range and check a trait (is_contiguous), not hardcode contiguous_range<> (for instance, std::vector isn't contiguous_range<>, but it is a contiguous range).
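One way to read Peter's suggestion is sketched below. No is_contiguous trait existed in Boost or the standard library at the time of this thread, so the trait, its specializations, and the copy_to helper are all hypothetical. The helper returns whether the memcpy path was taken, purely so the dispatch is observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <list>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical trait: false by default, opted into per container.
template <class Range> struct is_contiguous : std::false_type {};
template <class T, class A>
struct is_contiguous<std::vector<T, A>> : std::true_type {};
template <class A>
struct is_contiguous<std::vector<bool, A>> : std::false_type {};  // packed bits
template <class C, class Tr, class A>
struct is_contiguous<std::basic_string<C, Tr, A>> : std::true_type {};

// Fast path: one memcpy over the whole block.
template <class Range, class T>
bool copy_impl(const Range& r, T* out, std::true_type) {
    if (!r.empty()) std::memcpy(out, &*r.begin(), r.size() * sizeof(T));
    return true;
}

// General path: element-by-element copy.
template <class Range, class T>
bool copy_impl(const Range& r, T* out, std::false_type) {
    for (const auto& v : r) *out++ = v;
    return false;
}

// Accepts an arbitrary range and dispatches on the trait.
template <class Range>
bool copy_to(const Range& r, typename Range::value_type* out) {
    typedef typename Range::value_type T;
    return copy_impl(r, out,
        std::integral_constant<bool,
            is_contiguous<Range>::value &&
            std::is_trivially_copyable<T>::value>());
}
```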

On Fri, Nov 16, 2012 at 8:16 PM, Peter Dimov <lists@pdimov.com> wrote:
[...]
Lastly, and I don't know how big a deal this is, but one could write more tailored range algorithms for contiguous_ranges of POD types (using std::memcpy or std::memset, for example, which may produce more optimal code (?) than std::copy or std::fill, resp.). So I can see a plausible advantage in having a single template class abstracting contiguous ranges.
To do this, you should accept an arbitrary Range and check a trait (is_contiguous), not hardcode contiguous_range<> (for instance, std::vector isn't contiguous_range<>, but it is a contiguous range).
I would attack this problem in a different way. There should be a contiguous_iterator_tag : random_access_iterator_tag in the standard; then we get two things for free:
* Paragraphs ?.2.2/3 are no longer relevant. We will be able to construct a string_ref from any Range of contiguous_iterators.
* The standard library implementation can specialize standard algorithms such as std::copy to use the optimized memcpy whenever contiguous_iterators to a POD type are given.
contiguous_range<> would be essentially an iterator_range<> with contiguous_iterators. -- Yakov

17.11.2012, 02:47, "Peter Dimov" <lists@pdimov.com>:
Yakov Galka wrote:
I would attack this problem in a different way. There should be a contiguous_iterator_tag : random_access_iterator_tag in the standard, ...
This deserves a standard proposal, IMO.
Isn't just an std::is_pointer<> enough to make a choice? iterator_range<T*> is already a contiguous range as T* is a contiguous iterator. Maxim

On Fri, Nov 16, 2012 at 8:59 PM, Yanchenko Maxim <maximyanchenko@yandex.ru> wrote:
17.11.2012, 02:47, "Peter Dimov" <lists@pdimov.com>:
Yakov Galka wrote:
I would attack this problem in a different way. There should be a contiguous_iterator_tag : random_access_iterator_tag in the standard, ...
This deserves a standard proposal, IMO.
Isn't just an std::is_pointer<> enough to make a choice? iterator_range<T*> is already a contiguous range as T* is a contiguous iterator.
For the pointers case, not for capturing the fact that std::string::iterator and std::vector::iterator are also guaranteed to be contiguous. Contiguous means something like &i[n] == &*i + n. -- Yakov
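The property Yakov states, &i[n] == &*i + n, can be written as a small runtime check. The helper name satisfies_contiguity is hypothetical; it only tests the property for a given iterator and element count and is not a substitute for a compile-time tag or trait.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Checks the property &i[n] == &*i + n for every n in [0, count).
// Requires a random access iterator with at least `count` valid elements.
template <class Iterator>
bool satisfies_contiguity(
        Iterator i,
        typename std::iterator_traits<Iterator>::difference_type count) {
    typedef typename std::iterator_traits<Iterator>::difference_type diff_t;
    for (diff_t n = 0; n < count; ++n)
        if (&i[n] != &*i + n) return false;
    return true;
}
```

The check holds for raw pointers and for std::vector and std::string iterators, even when those iterators are class types rather than pointers, which is why std::is_pointer alone does not capture the concept.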

All, On Fri, Nov 16, 2012 at 11:59 AM, Yanchenko Maxim <maximyanchenko@yandex.ru> wrote:
17.11.2012, 02:47, "Peter Dimov" <lists@pdimov.com>:
Yakov Galka wrote:
I would attack this problem in a different way. There should be a contiguous_iterator_tag : random_access_iterator_tag in the standard, ...
This deserves a standard proposal, IMO.
Isn't just an std::is_pointer<> enough to make a choice? iterator_range<T*> is already a contiguous range as T* is a contiguous iterator.
The std library implementations I've looked into (VC++ & libc++) already specialize things like fill & copy for POD pointer types. Such a proposal would improve the speed of applying such algorithms to T* wrapping iterators, where those iterators don't store anything but T* on the stack. (They might have a static variable which they use for iterator invalidation checking -- I recall something like that was discussed for libc++.)
In any case, the utility of such a trait seems limited, but existent. Nate

Yanchenko Maxim wrote:
17.11.2012, 02:47, "Peter Dimov" <lists@pdimov.com>:
Yakov Galka wrote:
I would attack this problem in a different way. There should be a contiguous_iterator_tag : random_access_iterator_tag in the standard, ...
This deserves a standard proposal, IMO.
Isn't just an std::is_pointer<> enough to make a choice?
No, because vector<>::iterator is not guaranteed to be a pointer, and isn't, on many implementations. Similarly, string::iterator is not guaranteed to be char* (but often is).

On Fri, Nov 16, 2012 at 10:16 AM, Peter Dimov <lists@pdimov.com> wrote:
Jeffrey Lee Hellrung, Jr. wrote:
... I haven't read the C++ proposal ...
For one, it would provide a single implementation of a contiguous range of char, char const, unsigned char, unsigned char const, wchar_t, and wchar_t const.
It does.
But there's nothing inherently string-like about that design;
There are a number of very string-like things about the design.
We're talking about 2 different designs. The way Marshall had originally phrased it (or, at least, how I originally read it), it was just a wrapper around a pair of char*s, or a char* and a std::size_t, not much more than that. N3442, on the other hand, is entirely specific to strings, but some of the motivations of N3442 seem questionable (specifically, copying and perpetuating the already bloated interface of std::string; I can see where this is desirable, but I can also see where it's undesirable). If we separate the string-specific stuff out as free functions, then what's left is a component that offers a level of genericity without templates and without sacrificing performance. I gathered Marshall had proposed this primarily in order to make the StringAlgorithms API more flexible without loss of performance. Actually, now that I think about it, contiguous_range<T> == iterator_range<T*>, AFAICT. Maybe iterator_range<T*> might want some member functions added for conversion to/from other contiguous ranges, and maybe there are some other interface tweaks one can make specific for pointers.
Lastly, and I don't know how big a deal this is, but one could write more tailored range algorithms for contiguous_ranges of POD types (using std::memcpy or std::memset, for example, which may produce more optimal code (?) than std::copy or std::fill, resp.). So I can see a plausible advantage in having a single template class abstracting contiguous ranges.
To do this, you should accept an arbitrary Range and check a trait (is_contiguous), not hardcode contiguous_range<> (for instance, std::vector isn't contiguous_range<>, but it is a contiguous range).
True; indeed, I like Yakov's suggestion of a contiguous_traversal_tag (sorry I'm slightly modifying the suggestion to use boost::traversal_tags rather than std::iterator_tags). - Jeff

On Fri, Nov 16, 2012 at 10:20 PM, Jeffrey Lee Hellrung, Jr. <jeffrey.hellrung@gmail.com> wrote:
Actually, now that I think about it, contiguous_range<T> == iterator_range<T*>, AFAICT. Maybe iterator_range<T*> might want some member functions added for conversion to/from other contiguous ranges, and maybe there are some other interface tweaks one can make specific for pointers.
contiguous_range should have data() and operator[], iterator_range doesn't have them. iterator_range is also problematic when constructing from literals as literals are implicitly treated as arrays. BTW, contiguous_range sounds like the array_ref proposal. Olaf

On Sat, Nov 17, 2012 at 2:38 AM, Olaf van der Spek <ml@vdspek.org> wrote:
On Fri, Nov 16, 2012 at 10:20 PM, Jeffrey Lee Hellrung, Jr. <jeffrey.hellrung@gmail.com> wrote:
Actually, now that I think about it, contiguous_range<T> == iterator_range<T*>, AFAICT. Maybe iterator_range<T*> might want some member functions added for conversion to/from other contiguous ranges, and maybe there are some other interface tweaks one can make specific for pointers.
contiguous_range should have data() and operator[], iterator_range doesn't have them. iterator_range is also problematic when constructing from literals as literals are implicitly treated as arrays.
BTW, contiguous_range sounds like the array_ref proposal.
What would be the advantage of having a separate array_ref class over partially specializing iterator_range<T*> (preserving backward compatibility, of course)? I'm thinking now the latter might be sufficient, if indeed there is a different interface than the primary template (is data() that important? doesn't iterator_range already have operator[]?). - Jeff

On Sat, Nov 17, 2012 at 3:55 PM, Jeffrey Lee Hellrung, Jr. <jeffrey.hellrung@gmail.com> wrote:
contiguous_range should have data() and operator[], iterator_range doesn't have them. iterator_range is also problematic when constructing from literals as literals are implicitly treated as arrays.
BTW, contiguous_range sounds like the array_ref proposal.
What would be the advantage of having a separate array_ref class over partially specializing iterator_range<T*> (preserving backward compatibility, of course)?
Maybe nothing. Currently this doesn't work though: std::vector<int> a; boost::iterator_range<int*> b = a;
I'm thinking now the latter might be sufficient, if indeed there is a different interface than the primary template (is data() that important? doesn't iterator_range already have operator[]?).
Yes, it does have operator[], my bad. data() is frequently used to pass the range to a C-style function. -- Olaf
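A short sketch of the data() use case follows. The int_range type, make_range, and sum_ints are hypothetical names introduced for illustration; sum_ints stands in for any C-style function that takes a pointer and a count, and make_range shows the explicit construction from a vector that the implicit conversion above does not provide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pointer-pair range with the data()/size() members discussed.
struct int_range {
    int* first;
    int* last;
    int* data() const { return first; }
    std::size_t size() const { return static_cast<std::size_t>(last - first); }
};

// Explicit construction from a vector's contiguous storage.
inline int_range make_range(std::vector<int>& v) {
    return int_range{v.data(), v.data() + v.size()};
}

// Stand-in for a C-style API that takes a pointer and an element count.
inline long sum_ints(const int* values, std::size_t count) {
    long total = 0;
    for (std::size_t i = 0; i < count; ++i) total += values[i];
    return total;
}
```

With data() available, the range can be handed to such a function as sum_ints(r.data(), r.size()) without spelling out &*r.begin().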

On Thu, Nov 15, 2012 at 3:41 PM, Phil Endecott <spam_from_boost_dev@chezphil.org> wrote:
Marshall Clow wrote:
I'm about to check in some new functionality into the string_algo library; an implementation of string_ref.
A string_ref is a non-owning reference to a string. It is implemented as a {pointer, length} pair, and is exceedingly useful when parsing, and manipulating strings in "read-only" ways.
I did something like this once. I called mine const_string_facade - I also had a mutable version - and it was a template that took an iterator pair. This lets you adapt something like a vector<char> into a string.
basic_string_ref doesn't template its member functions with container or iterator parameter types, so it can't do something like adapting a vector<char> or list<char> into a string. A range-like type (basic_string_range?) would be better for that, handles iterator types generically, and also would be able to handle NTCTS more efficiently. OTOH, operations that require random access iterators can use basic_string_ref directly, but can't use a basic_string_range type, which typically only requires input iterators.
BUT, I think my feeling is that this is actually taking us in the wrong direction: my current coding style tries to avoid the "special" features of std::string and prefers the things that are common to other containers, and std::algorithms.
Yeah, I have some similar misgivings. I'd be happier if basic_string_ref and basic_string_range were part of the same proposal so users got in the habit of choosing the approach that is best for their need. --Beman

On 15/11/12 21:41, Phil Endecott wrote:
BUT, I think my feeling is that this is actually taking us in the wrong direction: my current coding style tries to avoid the "special" features of std::string and prefers the things that are common to other containers, and std::algorithms.
My understanding is that string_ref is meant for people who do not use templates. Notice how the proposal is coming from Google and LLVM, mostly C++ projects which prefer runtime polymorphism. And of course, runtime polymorphism on arbitrary iterators or ranges would be extremely inefficient.

Marshall Clow <mclow.lists <at> gmail.com> writes:
I'm about to check in some new functionality into the string_algo library; an implementation of string_ref.
A string_ref is a non-owning reference to a string. It is implemented as a {pointer, length} pair, and is
My experience is that we are better off with 2 pointers
exceedingly useful when parsing, and manipulating strings in "read-only" ways.
* This is identical to boost/test/utils/basic_cstring.hpp, which I have used in Boost.Test forever (and in all other projects I ever worked on).
* I personally like my name better, but I am not attached to it. I also like names like const_string, string_buffer for different specializations.
* The template should support both char and const char as character types.
* Feel free to hijack my unit test if you want.
* There are some extra features in my interface you can consider as well.
Regards, Gennadiy

On 16/11/12 01:21, Gennadiy Rozental wrote:
My experience is that we are better off with 2 pointers
Supposedly pointer + size allows better codegen on some compilers. The fact that it was designed like so by people from LLVM, who have a ridiculously strong focus on the performance of their data structures (to the point of mostly rewriting the whole STL with slightly different algorithms better tuned to their use cases), should be taken into consideration.
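
For readers comparing the two representations, here is a minimal sketch in C++11 of both layouts. The names ref_ptr_len and ref_two_ptr are hypothetical and used only for illustration. The sketch shows which operations are stored directly and which are computed in each layout; it makes no claim about which one a given compiler optimizes better.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout A: pointer + size.
struct ref_ptr_len {
    const char* ptr;
    std::size_t len;

    std::size_t size() const { return len; }       // stored directly
    const char* begin() const { return ptr; }
    const char* end() const { return ptr + len; }  // computed: one addition

    // Dropping a prefix updates both members.
    void remove_prefix(std::size_t n) {
        assert(n <= len);
        ptr += n;
        len -= n;
    }
};

// Hypothetical layout B: two pointers.
struct ref_two_ptr {
    const char* first;
    const char* last;

    // computed: one subtraction
    std::size_t size() const { return static_cast<std::size_t>(last - first); }
    const char* begin() const { return first; }
    const char* end() const { return last; }       // stored directly

    // Dropping a prefix updates one member.
    void remove_prefix(std::size_t n) {
        assert(n <= size());
        first += n;
    }
};
```

Both structs describe the same range of characters; they differ only in which of size() and end() is stored and which is derived.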

On 16 November 2012 07:04, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 16/11/12 01:21, Gennadiy Rozental wrote:
My experience is that we are better off with 2 pointers
Supposedly pointer + size allows better codegen on some compilers. The fact that it was designed like so by people from LLVM, who have a ridiculously strong focus on the performance of their data structures (to the point of mostly rewriting the whole STL with slightly different algorithms better tuned to their use cases), should be taken into consideration.
Without concrete examples, I have no idea how to take that into consideration. -- Nevin ":-)" Liber <mailto:nevin@eviloverlord.com> (847) 691-1404

On Nov 15, 2012, at 2:19 PM, Marshall Clow <mclow.lists@gmail.com> wrote:
I'm about to check in some new functionality into the string_algo library; an implementation of string_ref.
A string_ref is a non-owning reference to a string. It is implemented as a {pointer, length} pair, and is exceedingly useful when parsing, and manipulating strings in "read-only" ways.
I have such a class, too. I'll compare yours with mine to see if there's anything to add (or change :).
The header will be in boost/algorithm/string_ref.hpp, but I welcome discussion about where it should live.
This has nothing to do with string algos, besides offering iterators for use with algos, so it doesn't belong in that library. ___ Rob

On 15/11/12 19:19, Marshall Clow wrote:
I'm about to check in some new functionality into the string_algo library; an implementation of string_ref.
Isn't this really a separate library? If you want to use it as part of the string_algo library, make it a 'detail'. Don't get me wrong, I think something like this is good. I'm just saying that if it can be made a detail in string_algo, do that and then separately move for standalone status in Boost; if not, then it is really just a separate library and should be treated as such.
A string_ref is a non-owning reference to a string. It is implemented as a {pointer, length} pair, and is exceedingly useful when parsing, and manipulating strings in "read-only" ways.
The header will be in boost/algorithm/string_ref.hpp, but I welcome discussion about where it should live. Ideally, I think it should be just "boost/string_ref.hpp".
I think it should be under 'strings' or 'utility'.
Basic sanity tests will be checked in with the header file - I have tested on clang, clang11, and gcc. One reason for checking this in now is to get the tests run on lots of different systems.
Docs will be coming soon, but http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3442.html is the proposal for "a future version of C++" ;-)
I assume you are following the N3442 interface? (I didn't check).
Please let me know what you think!
If this is really a separate library (I think it is) then shouldn't it be reviewed and then string_algo updated to work with it? This would have the additional benefit of providing feedback in general for N3442. Jamie

On 16/11/12 12:39, Jamie Allsop wrote:
If this is really a separate library (I think it is) then shouldn't it be reviewed and then string_algo updated to work with it? This would have the additional benefit of providing feedback in general for N3442.
It is an interesting idea to use the Boost review process to review C++ proposals.

On Dec 12, 2012, at 9:56 AM, Olaf van der Spek <ml@vdspek.org> wrote:
On Thu, Nov 15, 2012 at 8:19 PM, Marshall Clow <mclow.lists@gmail.com> wrote:
I'm about to check in some new functionality into the string_algo library; an implementation of string_ref.
Hi Marshall,
What's the status?
Well, it's been checked in, and tests are running - and there's been lots of discussion ;-) It needs more tests, and docs (which I hope to write this week), and a final home. boost/utility seems to be the consensus as to where it should live. Unless I get docs/more tests written (which I hope will happen) it won't be part of 1.53.0. -- Marshall
participants (18)
- Beman Dawes
- Darren Garvey
- Gennadiy Rozental
- Jamie Allsop
- Jeffrey Lee Hellrung, Jr.
- Marshall Clow
- Mathias Gaunard
- Matus Chochlik
- Nathan Crookston
- Nevin Liber
- Olaf van der Spek
- Peter Dimov
- Phil Endecott
- Rene Rivera
- Rob Stewart
- Steven Watanabe
- Yakov Galka
- Yanchenko Maxim