
On 01/28/2012 05:46 PM, Beman Dawes wrote:
Beman.github.com/string-interoperability/interop_white_paper.html describes Boost components intended to ease string interoperability in general and Unicode string interoperability in particular.
These proposals are the Boost version of the TR2 proposals made in N3336, Adapting Standard Library Strings and I/O to a Unicode World. See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3336.html.
I'm very interested in hearing comments about either the Boost or the TR2 proposal. Are these useful additions? Is there a better way to achieve the same easy interoperability goals?
I think you should consider the points made in N3334. While that proposal is, in my opinion, not good enough, it raises an important issue that is often present in std::string-based or similar designs. A function that takes a std::string, or a boost::filesystem::path for that matter, forces the data to be copied into a heap-allocated buffer whenever the caller holds it in some other form, even if there is no need to. Using the range concept would solve that issue, but it requires making the function a template. A type-erased range would be possible, but it has significant performance overhead. A string_ref or path_ref is maybe the lesser evil.
Where is the best home for the Boost proposals? A separate library? Part of some existing library?
Are these proposals orthogonal to the need for deeper Unicode functionality, such as Mathias Gaunard's Unicode components?
It seems all you really care about is having iterator adaptors that do character set conversion, so that any range in any encoding can be lazily converted to a particular Unicode encoding. This has always been the goal of my library, which more or less provides that, along with more advanced Unicode features. Those two things could live separately, though. For standardization, the problem with iterator adaptors is that they cannot be as fast as free functions operating on pointers, unless the optimizer is pretty darn good. The conversion algorithms are also fully templated, so they cannot be compiled into the library binary. Those are disadvantages compared to the mechanisms that exist in the standard today. By the way, you only have input iterator adaptors. In my library I've implemented bidirectional and output iterator adaptors as well. You've only considered input, but output can also be useful depending on the situation.