
On Sat, Jan 28, 2012 at 8:12 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 01/28/2012 05:46 PM, Beman Dawes wrote:
Beman.github.com/string-interoperability/interop_white_paper.html describes Boost components intended to ease string interoperability in general and Unicode string interoperability in particular.
These proposals are the Boost version of the TR2 proposals made in N3336, Adapting Standard Library Strings and I/O to a Unicode World. See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3336.html.
I'm very interested in hearing comments about either the Boost or the TR2 proposal. Are these useful additions? Is there a better way to achieve the same easy interoperability goals?
I think you should consider the points being made in N3334.
See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3334.html

While this proposal isn't from Boost, it impacts interests of Boost developers enough that I think it is worth discussing here as a separate topic.

Mathias continues:
While that proposal is in my opinion not good enough, it raises an important issue that is often present with std::string-based or similar designs.
A function that takes a std::string, or a boost::filesystem::path for that matter, necessarily causes the [caller] to copy the data into a heap-allocated buffer, even if there is no need to.
Some std library string implementations avoid the heap allocation for small strings, but still there is an unnecessary copy happening even in those implementations.

Your point is well taken and I've often worried about it with boost::filesystem::path.
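To make the copy concrete, here is a minimal sketch. The class string_ref_sketch is a hypothetical pointer-plus-length view written only for illustration; it is not the class proposed in N3334, and the function names are made up. It checks whether the callee ends up looking at the caller's own characters or at a copy of them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical non-owning view: a pointer plus a length.
// Illustrative only; not the interface proposed in N3334.
class string_ref_sketch {
public:
    string_ref_sketch(const char* s) : data_(s), size_(std::strlen(s)) {}
    string_ref_sketch(const std::string& s) : data_(s.data()), size_(s.size()) {}
    const char* data() const { return data_; }
    std::size_t size() const { return size_; }
private:
    const char* data_;
    std::size_t size_;
};

// Callee taking std::string: a const char* argument is first converted to a
// temporary std::string, which owns its own copy of the characters.
inline bool sees_caller_buffer_via_string(const std::string& s, const char* caller) {
    return s.data() == caller;
}

// Callee taking the view: it reads the caller's characters in place.
inline bool sees_caller_buffer_via_ref(string_ref_sketch s, const char* caller) {
    return s.data() == caller;
}

inline void demo_copy_vs_view() {
    const char* path = "a/path/long/enough/to/defeat/a/small-string/buffer.txt";
    // The std::string parameter never aliases the caller's buffer: it was copied.
    assert(!sees_caller_buffer_via_string(path, path));
    // The view parameter does alias it: nothing was copied.
    assert(sees_caller_buffer_via_ref(path, path));
}
```

With a small-string optimization the temporary may avoid the heap, but the first assertion still holds, because the characters are copied into the temporary's internal buffer.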
Use of the range concept would solve that issue, but then that requires making the function a template. A type-erased range would be possible, but that has significant performance overhead. A string_ref or path_ref is maybe the lesser evil.
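The three options can be sketched side by side. Everything below is hypothetical and simplified for illustration: erased_char_range is a toy type-erased range built on std::function, and char_view_sketch stands in for a string_ref. Neither is taken from Boost or from N3334.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Option 1: a template over an iterator range. No copy, but the function
// must be a template, so its body is needed wherever it is called.
template <class Iterator>
std::size_t count_slashes_range(Iterator first, Iterator last) {
    std::size_t n = 0;
    for (; first != last; ++first)
        if (*first == '/') ++n;
    return n;
}

// Option 2: a toy type-erased range. The consuming function is an ordinary
// function, but every character goes through an indirect call.
struct erased_char_range {
    std::function<bool(char&)> next;  // writes next char; returns false at the end
};

template <class Iterator>
erased_char_range erase_range(Iterator first, Iterator last) {
    return erased_char_range{[first, last](char& out) mutable -> bool {
        if (first == last) return false;
        out = *first;
        ++first;
        return true;
    }};
}

inline std::size_t count_slashes_erased(erased_char_range r) {
    std::size_t n = 0;
    char c = 0;
    while (r.next(c))
        if (c == '/') ++n;
    return n;
}

// Option 3: a non-owning pointer-plus-length view. An ordinary function,
// no copy, but it only accepts contiguous character storage.
struct char_view_sketch {
    const char* data;
    std::size_t size;
};

inline std::size_t count_slashes_ref(char_view_sketch s) {
    return count_slashes_range(s.data, s.data + s.size);
}
```

All three give the same answer for the same input; they differ in where the cost lands: template instantiation for the first, a per-element indirect call for the second, and a restriction to contiguous storage for the third.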
One of my first reactions is that array_ref<T> and basic_string_ref<charT, traits> are range generators, and I was a bit surprised to see the implementation was a pointer and length rather than two pointers. Or better yet, two iterators or an explicit range component. With iterators, a basic_string_ref could do encoding conversions on-the-fly without needing temporary strings. But I have no idea if that is workable or is actually better.

What do other Boosters think?

--Beman