
On Mon, 17 Jan 2011 09:39:20 -0500, Chad Nelson wrote:
Right now, the utf*_t classes assume that any std::string fed directly into them is meant to be translated as-is. It's assumed to consist of characters that should be directly encoded as their unsigned values. That works perfectly for seven-bit ASCII text, but may be problematic for values with the high bit set.
I've done some research, and it looks like it would require little effort to create an os::string_t type that uses the current locale, and to assume that all raw std::strings containing eight-bit values are encoded in that locale's character set instead.
I'm not sure about the os namespace ;) What about just calling it native_t, like your other class, but in the same namespace as utf8_t etc.?
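To make the concern concrete, here is a minimal sketch of the "as-is" behaviour described above. It is not the library's code: `widen_as_is` and `has_high_bit` are hypothetical helpers invented for illustration. Widening each byte to its unsigned value is lossless for seven-bit ASCII, but for a byte such as 0xE9 the result (U+00E9) is only right if the input happened to be Latin-1.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not part of the proposed library: widen each byte
// to a code point using its unsigned value (the "as-is" behaviour).
std::vector<std::uint32_t> widen_as_is(const std::string& raw) {
    std::vector<std::uint32_t> out;
    out.reserve(raw.size());
    for (char c : raw) {
        // Cast through unsigned char so high-bit bytes do not sign-extend.
        out.push_back(static_cast<unsigned char>(c));
    }
    return out;
}

// Hypothetical helper: true if any byte has the high bit set, i.e. the
// string's meaning depends on which eight-bit encoding it was written in.
bool has_high_bit(const std::string& raw) {
    for (char c : raw) {
        if (static_cast<unsigned char>(c) > 0x7F) return true;
    }
    return false;
}
```

A locale-aware native_t would presumably take over exactly the strings for which `has_high_bit` returns true; how it decodes them is outside this sketch.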
Design-wise, ascii_t would need to change slightly after this, to throw on anything that can't fit into a *seven*-bit value, rather than eight-bit. I'll add the default-character option to both types as well, and maybe make other improvements as I have time.
Sounds good.
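The seven-bit rule with a default-character option could look roughly like the sketch below. This is an assumption about the intended behaviour, not the actual ascii_t: the function name `to_seven_bit`, the use of '\0' to mean "no default character", and the exception type are all invented for illustration.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the seven-bit check; not the real ascii_t.
// If replacement is '\0', throw on the first byte above 0x7F; otherwise
// substitute the replacement character for each such byte.
std::string to_seven_bit(const std::string& raw, char replacement = '\0') {
    std::string out;
    out.reserve(raw.size());
    for (char c : raw) {
        if (static_cast<unsigned char>(c) <= 0x7F) {
            out.push_back(c);            // fits in seven bits, keep as-is
        } else if (replacement != '\0') {
            out.push_back(replacement);  // default-character option
        } else {
            throw std::invalid_argument("byte does not fit in seven bits");
        }
    }
    return out;
}
```

With this sketch, `to_seven_bit("caf\xE9", '?')` yields "caf?", while the same call without a replacement character throws.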
Artyom, since you seem to have more experience with this stuff than I, what do you think? Would those alterations take care of your objections?
Also, Artyom's Boost.Locale does very sophisticated encoding conversion, but the Unicode conversions done by utf*_t look (scarily?) small. Do they do as good a job, or should these classes make use of the conversions in Boost.Locale?
Alex
--
Easy SFTP for Windows Explorer (http://www.swish-sftp.org)