[c++TR2] N3334, Proposing array_ref<T> and string_ref

On Sat, Jan 28, 2012 at 8:12 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 01/28/2012 05:46 PM, Beman Dawes wrote:
Beman.github.com/string-interoperability/interop_white_paper.html describes Boost components intended to ease string interoperability in general and Unicode string interoperability in particular.
These proposals are the Boost version of the TR2 proposals made in N3336, Adapting Standard Library Strings and I/O to a Unicode World. See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3336.html.
I'm very interested in hearing comments about either the Boost or the TR2 proposal. Are these useful additions? Is there a better way to achieve the same easy interoperability goals?
I think you should consider the points being made in N3334.
See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3334.html
While this proposal isn't from Boost, it impacts interests of Boost developers enough that I think it is worth discussing here as a separate topic.
Mathias continues:
While that proposal is in my opinion not good enough, it raises an important issue that is often present with std::string-based or similar designs.
A function that takes a std::string, or a boost::filesystem::path for that matter, necessarily causes the [caller] to copy the data into a heap-allocated buffer, even if there is no need to.
Some std library string implementations avoid the heap allocation for small strings, but still there is an unnecessary copy happening even in those implementations.
Your point is well taken and I've often worried about it with boost::filesystem::path.
Use of the range concept would solve that issue, but then that requires making the function a template. A type-erased range would be possible, but that has significant performance overhead. A string_ref or path_ref is maybe the lesser evil.
One of my blink reactions is that array_ref<T> and basic_string_ref<charT, traits> are range generators and I was a bit surprised to see the implementation was a pointer and length rather than two pointers. Or better yet, two iterators or an explicit range component. With iterators, a basic_string_ref could do encoding conversions on-the-fly without need of temporary strings. But I have no idea if that is workable or actually is better.
What do other Boosters think?
--Beman
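To make the copying argument concrete, here is a minimal pointer-plus-length view in the spirit of N3334's basic_string_ref. It is only a sketch under stated assumptions: the class `string_ref_sketch` and the function `count_char` are hypothetical names, not N3334's interface, and only `char` is handled. The point is that an ordinary non-template function taking such a view accepts a string literal, a `std::string`, or a (pointer, length) pair without copying any characters into a new buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical non-owning view: just a pointer and a length, no allocation.
class string_ref_sketch {
public:
    // From a NUL-terminated C string: the length is computed, nothing is copied.
    string_ref_sketch(const char* s) : ptr_(s), len_(std::strlen(s)) {}
    // From an explicit (pointer, length) pair; need not be NUL-terminated.
    string_ref_sketch(const char* s, std::size_t n) : ptr_(s), len_(n) {}
    // From a std::string: refers to the string's own buffer.
    string_ref_sketch(const std::string& s) : ptr_(s.data()), len_(s.size()) {}

    const char* data() const { return ptr_; }
    std::size_t size() const { return len_; }
    char operator[](std::size_t i) const {
        assert(i < len_);  // precondition: index in range
        return ptr_[i];
    }

private:
    const char* ptr_;
    std::size_t len_;
};

// Ordinary (non-template) function: callers never build a std::string to call it.
std::size_t count_char(string_ref_sketch s, char c) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        if (s[i] == c) ++n;
    return n;
}
```

For example, `count_char("banana", 'a')` reads directly from the literal. Because the view does not own its characters, it must not outlive the string it refers to; that is the price of avoiding the copy.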

2012/1/30 Beman Dawes <bdawes@acm.org>
One of my blink reactions is that array_ref<T> and basic_string_ref<charT, traits> are range generators and I was a bit surprised to see the implementation was a pointer and length rather than two pointers.
Implementations are free to use two pointers if that's faster on some platform.
Or better yet, two iterators or an explicit range component. With iterators, a basic_string_ref could do encoding conversions on-the-fly without need of temporary strings. But I have no idea if that is workable or actually is better.
I don't see how this could work unless each access to basic_string_ref involves a virtual function call or all functions accepting basic_string_ref must be templates.
    // Writes a string to file. This function doesn't care who owns
    // the string.
    void WriteToFile(basic_string_ref<char, ...> s);
What should go in ... if the string is allowed to do encoding conversion on-the-fly?
This is the same problem we have with ranges. If you want to write a function that accepts a range of integers, you either have to implement it as a template or use any_iterator/any_range, which could be too inefficient.
Roman Perepelitsa.
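Roman's objection can be made concrete with the type-erased alternative he mentions. Suppose `WriteToFile` stays a non-template but its argument may transcode on the fly: one way to do that is an abstract character source whose every step is a virtual call. Everything below (`char_source`, `view_source`, `upper_source`, `consume`) is hypothetical, and ASCII upper-casing merely stands in for a real encoding conversion.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical type-erased character source: one virtual call per character.
struct char_source {
    virtual ~char_source() {}
    // Stores the next character in c and returns true, or returns false at the end.
    virtual bool next(char& c) = 0;
};

// Pass-through over a contiguous buffer (what a pointer+length view gives for free).
class view_source : public char_source {
public:
    view_source(const char* p, std::size_t n) : p_(p), end_(p + n) {
        assert(p != nullptr || n == 0);
    }
    bool next(char& c) override {
        if (p_ == end_) return false;
        c = *p_++;
        return true;
    }
private:
    const char* p_;
    const char* end_;
};

// "Conversion on the fly": wraps another source and transforms each character.
// ASCII upper-casing is only a stand-in for a real encoding conversion.
class upper_source : public char_source {
public:
    explicit upper_source(char_source& inner) : inner_(inner) {}
    bool next(char& c) override {
        if (!inner_.next(c)) return false;
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        return true;
    }
private:
    char_source& inner_;
};

// Non-template consumer (think WriteToFile): it never sees the concrete type,
// but pays an indirect call for every character it reads.
std::string consume(char_source& src) {
    std::string out;
    char c;
    while (src.next(c)) out += c;
    return out;
}
```

Compared with a pointer-plus-length string_ref, the consumer here cannot hand a contiguous buffer onward in one call; it must pull characters one indirect call at a time, which is the overhead Roman and Mathias refer to.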

On Mon, Jan 30, 2012 at 9:30 AM, Roman Perepelitsa <roman.perepelitsa@gmail.com> wrote:
2012/1/30 Beman Dawes <bdawes@acm.org>
One of my blink reactions is that array_ref<T> and basic_string_ref<charT, traits> are range generators and I was a bit surprised to see the implementation was a pointer and length rather than two pointers.
Implementations are free to use two pointers if that's faster on some platform.
Sure.
Or better yet, two iterators or an explicit range component. With iterators, a basic_string_ref could do encoding conversions on-the-fly without need of temporary strings. But I have no idea if that is workable or actually is better.
I don't see how this could work unless each access to basic_string_ref involves a virtual function call or all functions accepting basic_string_ref must be templates. [...] If you want to write a function that accepts a range of integers, you either have to implement it as a template or use any_iterator/any_range, which could be too inefficient.
The other alternative is to use the boost::filesystem::path/N3336 approach, which avoids conversion inefficiency if no conversion is necessary, but does come at the cost of creating a temporary when a conversion is required. Anyhow, that's all an aside to the real questions: What are the pros and cons of N3334 in general and basic_string_ref in particular? Thanks, --Beman
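A rough sketch of the trade-off Beman describes, written as a hypothetical overload pair (`write_text` and `append_narrow` are illustrative names, not the boost::filesystem::path or N3336 API): narrow input is forwarded as-is, while wide input is first converted into a temporary. The ASCII-only narrowing is an assumption standing in for a real conversion. Note that path itself stores its own string, so unlike this sketch it still copies in the no-conversion case, which is Mathias's original complaint.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical low-level sink that only understands narrow characters.
// A real one might write to a file; this one appends to a std::string.
void append_narrow(std::string& sink, const char* p, std::size_t n) {
    sink.append(p, n);
}

// Narrow input: no conversion needed, so the caller's buffer is used directly.
void write_text(std::string& sink, const std::string& s) {
    append_narrow(sink, s.data(), s.size());
}

// Wide input: a conversion is required, so a temporary narrow string is built.
// ASCII-only narrowing is an assumption of this sketch, not a real conversion.
void write_text(std::string& sink, const std::wstring& ws) {
    std::string tmp;
    tmp.reserve(ws.size());
    for (wchar_t wc : ws) {
        assert(static_cast<unsigned long>(wc) < 128);  // sketch precondition: ASCII only
        tmp += static_cast<char>(wc);
    }
    append_narrow(sink, tmp.data(), tmp.size());
}
```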

On Mon, Jan 30, 2012 at 3:20 PM, Beman Dawes <bdawes@acm.org> wrote:
One of my blink reactions is that array_ref<T> and basic_string_ref<charT, traits> are range generators and I was a bit surprised to see the implementation was a pointer and length rather than two pointers. Or better yet, two iterators or an explicit range component. With iterators, a basic_string_ref could do encoding conversions on-the-fly without need of temporary strings. But I have no idea if that is workable or actually is better.
What do other Boosters think?
I think the idea is great. In fact, I've written similar classes: http://code.google.com/p/xbt/source/browse/trunk/xbt/misc/xbt/data_ref.h
I've posted about the idea on this list before, but received few responses.
The idea is that you've got a non-template function that takes an array. Often the types used are (const void*, size_t) or (const char*, size_t), which is cumbersome. Iterators instead of pointers wouldn't really work.
std::string with small string optimizations is sub-optimal if the input is not a std::string but, for example, a std::array.
N3334 does not really address the (const void*, size_t) case.
BTW, isn't there a forum / mailing list to discuss these proposals?
Olaf
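Olaf's (const void*, size_t) case can be sketched as a read-only byte view. This is not the interface of his data_ref.h: `data_ref_sketch` and `checksum` are hypothetical, and only std::string, std::vector, and std::array get converting constructors here. A non-template function taking the view accepts a std::array directly, which is exactly the case where a std::string parameter would force a copy.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical read-only byte view replacing (const void*, size_t) parameter pairs.
class data_ref_sketch {
public:
    data_ref_sketch(const void* p, std::size_t n)
        : ptr_(static_cast<const unsigned char*>(p)), len_(n) {
        assert(p != nullptr || n == 0);
    }
    // Converting constructors for a few contiguous containers; nothing is copied.
    data_ref_sketch(const std::string& s) : data_ref_sketch(s.data(), s.size()) {}
    template <class T, class A>
    data_ref_sketch(const std::vector<T, A>& v)
        : data_ref_sketch(v.data(), v.size() * sizeof(T)) {}
    template <class T, std::size_t N>
    data_ref_sketch(const std::array<T, N>& a)
        : data_ref_sketch(a.data(), N * sizeof(T)) {}

    const unsigned char* data() const { return ptr_; }
    std::size_t size() const { return len_; }  // size in bytes

private:
    const unsigned char* ptr_;
    std::size_t len_;
};

// Non-template consumer; a simple additive byte sum is used purely for illustration.
unsigned checksum(data_ref_sketch d) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < d.size(); ++i) sum += d.data()[i];
    return sum;
}
```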

On Mon, Jan 30, 2012 at 6:34 PM, Olaf van der Spek <ml@vdspek.org> wrote:
What do other Boosters think?
I think the idea is great. In fact, I've written similar classes: http://code.google.com/p/xbt/source/browse/trunk/xbt/misc/xbt/data_ref.h
I've posted about the idea on this list before, but received few responses.
The idea is that you've got a non-template function that takes an array. Often types used are (const void*, size_t) or (const char*, size_t), which is cumbersome. Iterators instead of pointers wouldn't really work.
std::string with small string optimizations is sub-optimal if input is not an std::string, but for example std::array.
N3334 does not really address the (const void*, size_t) case.
BTW, isn't there a forum / mailing list to discuss these proposals?
Yes, and they are usually better choices than the Boost list for discussion about committee proposals. https://groups.google.com/forum/#!forum/comp.std.c++ for one, but it hasn't been very active recently. The committee has its own discussion lists, but they are available only to members. I particularly value the feedback from Boost members, and thought there might be a lot of Boost interest in this particular proposal. Thanks, --Beman

Olaf van der Spek wrote:
On Mon, Jan 30, 2012 at 3:20 PM, Beman Dawes <bdawes@acm.org> wrote:
What do other Boosters think?
I think the idea is great. In fact, I've written similar classes:
I wrote and use a string_ref class. I made no attempt to support wchar_t. It provides a constructor for std::vector<char>, besides the constructors given in N3334. I also have a const_substring type for which there's a constructor. The list can grow large, of course, so relying on conversions to string_ref from other string-like types is reasonable. I don't agree with replicating std::string's overly large interface in string_ref. I'd prefer to use free functions to augment the functionality.
N3334 does not really address the (const void*, size_t) case.
They support that for string_ref, but not array_ref. Go figure.
_____
Rob Stewart robert.stewart@sig.com
Software Engineer using std::disclaimer;
Dev Tools & Components
Susquehanna International Group, LLP http://www.sig.com
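Rob's preference for a small class plus free functions might look like the following. This is a hypothetical sketch, not his class: `small_string_ref` offers only construction (including from std::vector<char>, as he mentions) plus `data()` and `size()`, and `starts_with`/`ends_with` are layered on as free functions rather than copied from std::string's member interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical minimal view: construction plus data()/size(), nothing more.
class small_string_ref {
public:
    small_string_ref(const char* s) : ptr_(s), len_(std::strlen(s)) {}
    small_string_ref(const char* s, std::size_t n) : ptr_(s), len_(n) {
        assert(s != nullptr || n == 0);
    }
    small_string_ref(const std::string& s) : ptr_(s.data()), len_(s.size()) {}
    // Extra source type; vector::data() may be called even on an empty vector.
    small_string_ref(const std::vector<char>& v) : ptr_(v.data()), len_(v.size()) {}

    const char* data() const { return ptr_; }
    std::size_t size() const { return len_; }

private:
    const char* ptr_;
    std::size_t len_;
};

// Functionality added as free functions; std::equal on an empty range never
// dereferences, so a null data() with size 0 is safe here.
bool starts_with(small_string_ref s, small_string_ref prefix) {
    return prefix.size() <= s.size() &&
           std::equal(prefix.data(), prefix.data() + prefix.size(), s.data());
}

bool ends_with(small_string_ref s, small_string_ref suffix) {
    return suffix.size() <= s.size() &&
           std::equal(suffix.data(), suffix.data() + suffix.size(),
                      s.data() + (s.size() - suffix.size()));
}
```

New operations can then be added without touching the class, and every string-like type with a converting constructor picks them up automatically.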

Feedback on the paper:
1. I support the additions, especially since I have encountered similar concepts implemented independently in multiple places, including my own code. It seems like a natural compromise between separate compilation and generality.
2. Instead of adding a bunch of implicit constructors and implicit conversion operators to each container with contiguous storage, one can proceed as follows.
a) std::contiguous_iterator_tag can be added to the standard (see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html for example). It would be a great addition on its own, since generic code can optimize appropriately when ContiguousIterators over is_trivially_copyable<value_type> elements are used.
b) Add *one* implicit constructor to array_ref:
    template<class R>
    array_ref(const R& x,
              typename enable_if<
                  is_base_of<contiguous_iterator_tag,
                             typename iterator_traits<decltype(begin(x))>::iterator_category>::value,
                  int>::type = 0);
    // use &*begin(x), (end(x) - begin(x)) to initialize.
On Mon, Jan 30, 2012 at 16:20, Beman Dawes <bdawes@acm.org> wrote:
On Sat, Jan 28, 2012 at 8:12 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 01/28/2012 05:46 PM, Beman Dawes wrote:
Beman.github.com/string-interoperability/interop_white_paper.html describes Boost components intended to ease string interoperability in general and Unicode string interoperability in particular.
These proposals are the Boost version of the TR2 proposals made in N3336, Adapting Standard Library Strings and I/O to a Unicode World. See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3336.html.
I'm very interested in hearing comments about either the Boost or the TR2 proposal. Are these useful additions? Is there a better way to achieve the same easy interoperability goals?
I think you should consider the points being made in N3334.
See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3334.html
While this proposal isn't from Boost, it impacts interests of Boost developers enough that I think it is worth discussing here as a separate topic.
Mathias continues:
While that proposal is in my opinion not good enough, it raises an important issue that is often present with std::string-based or similar designs.
A function that takes a std::string, or a boost::filesystem::path for that matter, necessarily causes the [caller] to copy the data into a heap-allocated buffer, even if there is no need to.
Some std library string implementations avoid the heap allocation for small strings, but still there is an unnecessary copy happening even in those implementations.
Your point is well taken and I've often worried about it with boost::filesystem::path.
Use of the range concept would solve that issue, but then that requires making the function a template. A type-erased range would be possible, but that has significant performance overhead. A string_ref or path_ref is maybe the lesser evil.
One of my blink reactions is that array_ref<T> and basic_string_ref<charT, traits> are range generators and I was a bit surprised to see the implementation was a pointer and length rather than two pointers. Or better yet, two iterators or an explicit range component. With iterators, a basic_string_ref could do encoding conversions on-the-fly without need of temporary strings. But I have no idea if that is workable or actually is better.
What do other Boosters think?
--Beman
-- Yakov
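Yakov's suggestion of one constrained constructor in place of many can be sketched in C++11. Because the standard containers do not advertise contiguity through their iterator categories, a hypothetical opt-in trait `is_contiguous_container` stands in for the proposed contiguous_iterator_tag, and the constructor reads `data()`/`size()` (which the opted-in containers provide) instead of `&*begin(x)`. `array_ref_sketch` and `sum_ints` are illustrative names, not N3334's interface.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical opt-in trait standing in for the proposed contiguous_iterator_tag.
template <class R> struct is_contiguous_container : std::false_type {};
template <class T, class A>
struct is_contiguous_container<std::vector<T, A> > : std::true_type {};
template <class A>  // vector<bool> does not store contiguous bool objects
struct is_contiguous_container<std::vector<bool, A> > : std::false_type {};
template <class T, std::size_t N>
struct is_contiguous_container<std::array<T, N> > : std::true_type {};
template <class C, class Tr, class A>
struct is_contiguous_container<std::basic_string<C, Tr, A> > : std::true_type {};

template <class T>
class array_ref_sketch {
public:
    array_ref_sketch(const T* p, std::size_t n) : ptr_(p), len_(n) {}

    // The single generic converting constructor: enabled only for opted-in
    // containers whose value_type is T. data()/size() avoid dereferencing
    // begin(), which would be invalid for an empty container.
    template <class R>
    array_ref_sketch(const R& r,
                     typename std::enable_if<
                         is_contiguous_container<R>::value &&
                             std::is_same<typename R::value_type, T>::value,
                         int>::type = 0)
        : ptr_(r.data()), len_(r.size()) {}

    const T* data() const { return ptr_; }
    std::size_t size() const { return len_; }
    const T& operator[](std::size_t i) const {
        assert(i < len_);  // precondition: index in range
        return ptr_[i];
    }

private:
    const T* ptr_;
    std::size_t len_;
};

// A non-template function accepting any opted-in contiguous sequence of int.
long sum_ints(array_ref_sketch<int> a) {
    long total = 0;
    for (std::size_t i = 0; i < a.size(); ++i) total += a[i];
    return total;
}
```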

On Tue, Jan 31, 2012 at 7:57 PM, Yakov Galka <ybungalobill@gmail.com> wrote:
b) Add *one* implicit constructor to array_ref:
template<class R>
array_ref(const R& x,
          typename enable_if<
              is_base_of<contiguous_iterator_tag,
                         typename iterator_traits<decltype(begin(x))>::iterator_category>::value,
              int>::type = 0);
// use &*begin(x), (end(x) - begin(x)) to initialize.
You should check whether the range is empty before dereferencing begin(). Olaf
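Olaf's correction matters because when a range is empty, `begin(x)` equals `end(x)`, and dereferencing it is undefined behavior. Below is a minimal sketch of the guarded initialization, assuming the caller has already established that the range is contiguous (which is what the proposed tag would check); `contiguous_ptr`, `ptr_len`, and `make_ptr_len` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <vector>

// Pointer to the first element of a contiguous range [first, last), computed
// via &*first as in the thread, but guarded: an empty range yields nullptr
// instead of dereferencing a past-the-end iterator.
template <class It>
const typename std::iterator_traits<It>::value_type*
contiguous_ptr(It first, It last) {
    return first == last ? nullptr : std::addressof(*first);
}

// Hypothetical (pointer, length) pair, as array_ref would store it.
template <class T>
struct ptr_len {
    const T* ptr;
    std::size_t len;
};

// Caller's responsibility: R must really be contiguous (e.g. not std::list).
template <class R>
ptr_len<typename R::value_type> make_ptr_len(const R& r) {
    ptr_len<typename R::value_type> result;
    result.ptr = contiguous_ptr(std::begin(r), std::end(r));
    result.len = static_cast<std::size_t>(std::end(r) - std::begin(r));
    return result;
}
```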
participants (5)
- Beman Dawes
- Olaf van der Spek
- Roman Perepelitsa
- Stewart, Robert
- Yakov Galka