Re: [boost] [strings][unicode] Proposals for Improved String Interoperability in a Unicode World

29 Jan 2012

      On 01/28/2012 05:46 PM, Beman Dawes wrote:
...
> Beman.github.com/string-interoperability/interop_white_paper.html
> describes Boost components intended to ease string interoperability in
> general and Unicode string interoperability in particular.
> These proposals are the Boost version of the TR2 proposals made in
> N3336, Adapting Standard Library Strings and I/O to a Unicode World.
> See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3336.html.
> I'm very interested in hearing comments about either the Boost or the
> TR2 proposal. Are these useful additions? Is there a better way to
> achieve the same easy interoperability goals?

I think you should consider the points being made in N3334.
While that proposal is, in my opinion, not good enough, it raises an 
important issue that is often present in std::string-based or similar 
designs.

A function that takes a std::string, or a boost::filesystem::path for 
that matter, necessarily forces the caller to copy the data into a 
heap-allocated buffer, even when there is no need to.

Use of the range concept would solve that issue, but that requires 
making the function a template. A type-erased range would be possible, 
but that has significant performance overhead.
A string_ref or path_ref is maybe the lesser evil.
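
To make the trade-off concrete, here is a minimal sketch of what a 
string_ref could look like. This is not the interface proposed in N3334; 
the class layout and the count_slashes function are hypothetical names 
chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical non-owning view over a contiguous character sequence.
// It stores only a pointer and a length; it never allocates or copies.
class string_ref {
public:
    string_ref(const char* s) : data_(s), size_(std::strlen(s)) {}
    string_ref(const char* s, std::size_t n) : data_(s), size_(n) {}
    string_ref(const std::string& s) : data_(s.data()), size_(s.size()) {}

    const char* begin() const { return data_; }
    const char* end() const { return data_ + size_; }
    std::size_t size() const { return size_; }

    char operator[](std::size_t i) const {
        assert(i < size_);  // precondition: index within the viewed range
        return data_[i];
    }

private:
    const char* data_;
    std::size_t size_;
};

// Illustrative non-template function: it accepts string literals, raw
// buffers and std::string alike, without requiring a std::string to be
// constructed at the call site.
inline std::size_t count_slashes(string_ref s) {
    std::size_t n = 0;
    for (const char* p = s.begin(); p != s.end(); ++p)
        if (*p == '/') ++n;
    return n;
}
```

With this sketch, count_slashes("/usr/local/lib") reads the literal in 
place, and the function stays an ordinary non-template. The price is 
that the view must not outlive the buffer it refers to.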
...
> Where is the best home for the Boost proposals? A separate library?
> Part of some existing library?
> Are these proposals orthogonal to the need for deeper Unicode
> functionality, such as Mathias Gaunard's Unicode components?

It seems all you really care about is having iterator adaptors that do 
character set conversion, allowing any range in any encoding to be 
lazily converted to a particular Unicode encoding.
This has always been the goal of my library, which somewhat provides 
that along with more advanced Unicode features. Those two things could 
live separately though.
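
As a rough illustration of such a converting adaptor, the sketch below 
wraps an iterator over UTF-8 code units and yields UTF-32 code points 
on demand. It is not taken from either library: utf8_decode_iterator is 
a hypothetical name, the input is assumed to be well-formed UTF-8 (no 
validation is done), and the wrapped iterator is assumed to be at least 
a forward iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical input iterator adaptor: decodes one UTF-8 sequence per
// step and exposes the resulting UTF-32 code point.
// Assumes well-formed input; a real implementation must validate.
template <class It>
class utf8_decode_iterator {
public:
    typedef std::input_iterator_tag iterator_category;
    typedef char32_t value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const char32_t* pointer;
    typedef char32_t reference;

    utf8_decode_iterator(It pos, It end)
        : pos_(pos), next_(pos), end_(end), cp_(0) { decode(); }

    char32_t operator*() const {
        assert(pos_ != end_);  // must not dereference the end position
        return cp_;
    }
    utf8_decode_iterator& operator++() {
        pos_ = next_;
        decode();
        return *this;
    }
    utf8_decode_iterator operator++(int) {
        utf8_decode_iterator tmp(*this);
        ++*this;
        return tmp;
    }
    friend bool operator==(const utf8_decode_iterator& a,
                           const utf8_decode_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const utf8_decode_iterator& a,
                           const utf8_decode_iterator& b) {
        return !(a == b);
    }

private:
    // Decode the sequence starting at pos_ and leave next_ just past it.
    void decode() {
        next_ = pos_;
        if (pos_ == end_) return;
        unsigned char lead = static_cast<unsigned char>(*next_++);
        int trail = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        cp_ = trail == 0 ? lead : lead & (0x3F >> trail);
        for (; trail > 0 && next_ != end_; --trail)
            cp_ = (cp_ << 6) | (static_cast<unsigned char>(*next_++) & 0x3F);
    }

    It pos_, next_, end_;
    char32_t cp_;
};
```

A pair such as utf8_decode_iterator<const char*>(first, last) and 
utf8_decode_iterator<const char*>(last, last) then forms a lazy UTF-32 
range over a UTF-8 buffer; nothing is converted until the range is 
traversed.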

For standardization, the problem with iterator adaptors is that they 
cannot be as fast as free functions operating on pointers, unless the 
optimizer is pretty darn good. The conversion algorithms are also fully 
templated and cannot be put in the library binary.
Those are disadvantages compared to the mechanisms that exist today in 
the standard.
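
For contrast, a pointer-based conversion routine might look like the 
sketch below. The name utf8_to_utf32 and its signature are hypothetical, 
and the input is again assumed to be well-formed. The point is only 
that the signature involves no templates, so the definition could be 
compiled once into a library binary.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-template conversion routine operating on pointers.
// Assumes well-formed UTF-8 and an output buffer with room for at least
// (last - first) elements. Returns the number of code points written.
std::size_t utf8_to_utf32(const char* first, const char* last, char32_t* out) {
    assert(first <= last);
    char32_t* const start = out;
    while (first != last) {
        unsigned char lead = static_cast<unsigned char>(*first++);
        int trail = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        char32_t cp = trail == 0 ? lead : lead & (0x3F >> trail);
        for (; trail > 0 && first != last; --trail)
            cp = (cp << 6) | (static_cast<unsigned char>(*first++) & 0x3F);
        *out++ = cp;
    }
    return static_cast<std::size_t>(out - start);
}
```

The loop is a plain one over raw pointers, which is the shape 
optimizers tend to handle well, but the conversion is eager and needs a 
destination buffer up front.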

By the way, you only have input iterator adaptors. In my library I've 
implemented bidirectional iterator adaptors and output iterator adaptors.
You've only been considering input, but output can also be useful 
depending on the situation.
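
To show what the output direction can look like, here is a minimal 
sketch of an output iterator adaptor that accepts UTF-32 code points 
and writes UTF-8 code units to a wrapped output iterator. It is not the 
interface of my library; utf8_encode_iterator and to_utf8 are 
hypothetical names, and the code points are assumed to be valid scalar 
values (no surrogates, nothing above U+10FFFF).

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical output iterator adaptor: each assigned UTF-32 code point
// is encoded as 1 to 4 UTF-8 code units written to the wrapped iterator.
template <class Out>
class utf8_encode_iterator {
public:
    typedef std::output_iterator_tag iterator_category;
    typedef void value_type;
    typedef void difference_type;
    typedef void pointer;
    typedef void reference;

    explicit utf8_encode_iterator(Out out) : out_(out) {}

    utf8_encode_iterator& operator=(char32_t cp) {
        assert(cp <= 0x10FFFF);  // assumption: valid code point range
        if (cp < 0x80) {
            *out_++ = static_cast<char>(cp);
        } else if (cp < 0x800) {
            *out_++ = static_cast<char>(0xC0 | (cp >> 6));
            *out_++ = static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            *out_++ = static_cast<char>(0xE0 | (cp >> 12));
            *out_++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            *out_++ = static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            *out_++ = static_cast<char>(0xF0 | (cp >> 18));
            *out_++ = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            *out_++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            *out_++ = static_cast<char>(0x80 | (cp & 0x3F));
        }
        return *this;
    }

    // Usual output iterator protocol: dereference and increment are no-ops.
    utf8_encode_iterator& operator*() { return *this; }
    utf8_encode_iterator& operator++() { return *this; }
    utf8_encode_iterator& operator++(int) { return *this; }

private:
    Out out_;
};

// Illustrative helper: encode a UTF-32 string into a UTF-8 std::string.
inline std::string to_utf8(const std::u32string& in) {
    std::string out;
    std::copy(in.begin(), in.end(),
              utf8_encode_iterator<std::back_insert_iterator<std::string> >(
                  std::back_inserter(out)));
    return out;
}
```

Because the adaptor is an ordinary output iterator, any algorithm that 
produces code points (std::copy, std::transform and so on) can write 
straight into a UTF-8 container without an intermediate UTF-32 buffer.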