
Eric Niebler wrote:
Agree. Thanks Zach. I'm discouraged that every time the issue of a Unicode library comes up, the discussion immediately descends into a debate about how to design yet another string class. Such a high level wrapper *might* be useful (strong emphasis on "might"), but the core must be the Unicode algorithms, and the design for a Unicode library must start there.
Since there seems to be a lot of concern about making a new string type, how about the following (off-the-cuff):

* Iterator filters a la Zach's message:

    typedef std::basic_string<char16_t> utf16_string;
    utf16_string u_string = /*...*/;
    std::string std_string = /*...*/;

    typedef boost::recoding_iterator<boost::utf16, boost::utf8> utf16_to_utf8_iter;
    std::copy(utf16_to_utf8_iter(u_string.begin()),
              utf16_to_utf8_iter(u_string.end()),
              std::back_inserter(std_string));

* Runtime-defined filters:

    typedef boost::recoding_iterator<boost::utf16, boost::runtime> utf16_to_any_iter;
    boost::runtime *my_codec = /*...*/;
    std::copy(utf16_to_any_iter(u_string.begin(), my_codec),
              utf16_to_any_iter(u_string.end(), my_codec),
              std::back_inserter(std_string));

* Shorthand for the above two points:

    boost::transcode(u_string, boost::utf16(), std_string, boost::utf8());

* String views that wrap up the encoding type and the data (a container of some kind: strings, vector<char>s, ropes, etc.):

    boost::estring_view<utf8> my_utf8_string(std_string);
    boost::estring_view<> my_rt_string(str, my_codec);
    boost::transcode(my_utf8_string, my_rt_string);

Luckily, most of the work I've done is in making the encoding facets extensible and choosable at runtime, so I wouldn't mourn the loss of my (frankly none-too-zazzy) string class.

- Jim
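
For readers who want something concrete to try, here is a minimal sketch of the algorithm-first style proposed above, written as a free function in the shape of std::copy. It is not the proposed boost::recoding_iterator or boost::transcode (neither exists in Boost as written here); the name utf16_to_utf8 and the choice to replace unpaired surrogates with U+FFFD are assumptions made for illustration only.

```cpp
#include <cstdint>
#include <iterator>
#include <string>

// Hypothetical helper (not a Boost API): reads UTF-16 code units from
// [first, last) and writes the equivalent UTF-8 bytes through `out`.
// Assumption: unpaired surrogates are replaced with U+FFFD.
template <typename InputIt, typename OutputIt>
OutputIt utf16_to_utf8(InputIt first, InputIt last, OutputIt out)
{
    while (first != last) {
        std::uint32_t cp = static_cast<std::uint16_t>(*first);
        ++first;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: valid only if a low surrogate follows.
            std::uint32_t lo = 0;
            if (first != last)
                lo = static_cast<std::uint16_t>(*first);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                ++first;
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            // Low surrogate with no preceding high surrogate.
            cp = 0xFFFD;
        }

        // Encode the code point as 1 to 4 UTF-8 bytes.
        if (cp < 0x80) {
            *out++ = static_cast<char>(cp);
        } else if (cp < 0x800) {
            *out++ = static_cast<char>(0xC0 | (cp >> 6));
            *out++ = static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            *out++ = static_cast<char>(0xE0 | (cp >> 12));
            *out++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            *out++ = static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            *out++ = static_cast<char>(0xF0 | (cp >> 18));
            *out++ = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            *out++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            *out++ = static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

It would be called much like the std::copy example: utf16_to_utf8(u_string.begin(), u_string.end(), std::back_inserter(std_string)). Because it only needs iterators, it works on strings, vector<char16_t>s, ropes, or anything else that exposes a range, with no new string class required.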