Re: [boost] [unicode] Interest Check / Proof of Concept

20 Nov 2008

Eric Niebler wrote:
...
Agree. Thanks Zach. I'm discouraged that every time the issue of a 
Unicode library comes up, the discussion immediately descends into a 
debate about how to design yet another string class. Such a high level 
wrapper *might* be useful (strong emphasis on "might"), but the core 
must be the Unicode algorithms, and the design for a Unicode library 
must start there.

Since it seems like there's a lot of concern with making a new string 
type, how about the following (off-the-cuff):

* Iterator filters a la Zach's message:

	typedef std::basic_string<char16_t> utf16_string;

	utf16_string u_string = /*...*/;
	std::string std_string = /*...*/;

	typedef boost::recoding_iterator<boost::utf16, boost::utf8>
		utf16_to_utf8_iter;
	std::copy(utf16_to_utf8_iter(u_string.begin()),
		utf16_to_utf8_iter(u_string.end()),
		std::back_inserter(std_string));

* Runtime-defined filters:

	typedef boost::recoding_iterator<boost::utf16,boost::runtime>
		utf16_to_any_iter;
	boost::runtime *my_codec = /*...*/;
	std::copy(utf16_to_any_iter(u_string.begin(), my_codec),
		utf16_to_any_iter(u_string.end(), my_codec),
		std::back_inserter(std_string));

* Shorthand for the above two points:

	boost::transcode(u_string, boost::utf16(),
		std_string, boost::utf8());

* String views that can wrap up the encoding type and the data (a 
container of some kind: strings, vector<char>s, ropes, etc):

	boost::estring_view<boost::utf8> my_utf8_string(std_string);
	boost::estring_view<> my_rt_string(str, my_codec);

	boost::transcode(my_utf8_string, my_rt_string);
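
For concreteness, here is a minimal sketch of the kind of work the UTF-16 to UTF-8 cases in the first and third bullets would have to do underneath. None of this is an existing Boost interface: `encode_utf8` and `utf16_to_utf8` are hypothetical names, and replacing unpaired surrogates with U+FFFD is an assumed error policy, not one proposed in this thread.

```cpp
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical helper (not part of Boost): writes the UTF-8 encoding of
// one code point to an output iterator and returns the advanced iterator.
template <typename OutputIt>
OutputIt encode_utf8(char32_t cp, OutputIt out)
{
    assert(cp <= 0x10FFFF);  // precondition: valid Unicode range
    if (cp < 0x80) {
        *out++ = static_cast<char>(cp);
    } else if (cp < 0x800) {
        *out++ = static_cast<char>(0xC0 | (cp >> 6));
        *out++ = static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        *out++ = static_cast<char>(0xE0 | (cp >> 12));
        *out++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        *out++ = static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        *out++ = static_cast<char>(0xF0 | (cp >> 18));
        *out++ = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        *out++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        *out++ = static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical free function: decodes UTF-16 code units from
// [first, last) and writes UTF-8 bytes to out.
template <typename InputIt, typename OutputIt>
OutputIt utf16_to_utf8(InputIt first, InputIt last, OutputIt out)
{
    const char32_t replacement = 0xFFFD;  // assumed policy for bad input
    while (first != last) {
        const char32_t unit = static_cast<char16_t>(*first++);
        char32_t cp = unit;
        if (unit >= 0xD800 && unit <= 0xDBFF) {
            // High surrogate: it must be followed by a low surrogate.
            cp = replacement;
            if (first != last) {
                const char32_t low = static_cast<char16_t>(*first);
                if (low >= 0xDC00 && low <= 0xDFFF) {
                    ++first;
                    cp = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00);
                }
            }
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            cp = replacement;  // low surrogate with no preceding high one
        }
        out = encode_utf8(cp, out);
    }
    return out;
}
```

With that in place, `utf16_to_utf8(u_string.begin(), u_string.end(), std::back_inserter(std_string))` plays the role of the `std::copy` call in the first bullet. A real recoding iterator would do the same decode/encode step lazily, one code point per increment.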

Luckily, most of the work I've done is in making the encoding facets 
extensible and choosable at runtime, so I wouldn't mourn the loss of my 
(frankly none-too-zazzy) string class.
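
To illustrate what "extensible and choosable at runtime" could look like, here is a rough sketch assuming a virtual codec interface and a name-keyed registry. Everything in it (`codec`, `latin1_codec`, `utf8_codec`, `codec_registry`, `encode_with`) is a hypothetical name for illustration; it is not the facet design referred to above, whose details are not given in this message.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical interface: a codec turns one code point into bytes.
struct codec {
    virtual ~codec() {}
    virtual void encode(char32_t cp, std::string& out) const = 0;
};

// Latin-1 covers only U+0000..U+00FF; anything else becomes '?'
// (an assumed fallback policy).
struct latin1_codec : codec {
    void encode(char32_t cp, std::string& out) const override {
        out += cp <= 0xFF ? static_cast<char>(cp) : '?';
    }
};

// UTF-8 encoder for code points up to U+10FFFF.
struct utf8_codec : codec {
    void encode(char32_t cp, std::string& out) const override {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
};

// Registry keyed by encoding name, so users can add codecs and pick
// one at runtime (e.g. from a charset label read out of a file).
class codec_registry {
    std::map<std::string, std::shared_ptr<const codec>> codecs_;
public:
    void add(const std::string& name, std::shared_ptr<const codec> c) {
        codecs_[name] = std::move(c);
    }
    const codec& find(const std::string& name) const {
        auto it = codecs_.find(name);
        if (it == codecs_.end())
            throw std::out_of_range("unknown codec: " + name);
        return *it->second;
    }
};

// Encodes a sequence of code points with whichever codec was chosen.
std::string encode_with(const codec& c, const std::u32string& text) {
    std::string out;
    for (char32_t cp : text)
        c.encode(cp, out);
    return out;
}
```

The cost of this approach is a virtual call per code point. A compile-time codec tag such as `boost::utf8` in the examples above avoids that, which is presumably why both forms are worth offering.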

- Jim

James Porter