Re: [boost] [General] Always treat std::strings as UTF-8

17 Jan 2011


On Mon, 17 Jan 2011 09:39:20 -0500, Chad Nelson wrote:
...
Right now, the utf*_t classes assume that any std::string fed directly
into them is meant to be translated as-is. It's assumed to consist of
characters that should be directly encoded as their unsigned values.
That works perfectly for seven-bit ASCII text, but may be problematic
for values with the high-bit set.
I've done some research, and it looks like it would require little
effort to create an os::string_t type that uses the current locale, and
assume all raw std::strings that contain eight-bit values are coded in
that instead.

I'm not sure about the os namespace ;)  What about just calling it native_t,
like your other class, but in the same namespace as utf8_t etc.?
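
To make the high-bit problem concrete, here is a minimal sketch of what
"directly encoded as their unsigned values" amounts to. The function name
is hypothetical and this is not the actual utf*_t code; it only assumes
that each byte is taken as the code point with the same number (which is
what ISO-8859-1 does) and then written out as UTF-8.

```cpp
#include <string>

// Hypothetical helper, not part of the utf*_t classes under discussion.
// Treats every byte of `raw` as the code point with the same unsigned
// value and re-encodes that code point as UTF-8.
std::string bytes_as_code_points_to_utf8(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    for (char ch : raw) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            // Seven-bit ASCII is identical in UTF-8.
            out.push_back(static_cast<char>(c));
        } else {
            // U+0080..U+00FF need two UTF-8 bytes: 110xxxxx 10xxxxxx.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

ASCII input comes back unchanged. A byte such as 0x80, which is the euro
sign in Windows-1252, comes back as the control character U+0080, so the
result is only right when the input really was ISO-8859-1.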
...
Design-wise, ascii_t would need to change slightly after this, to throw
on anything that can't fit into a *seven*-bit value, rather than
eight-bit. I'll add the default-character option to both types as well,
and maybe make other improvements as I have time.

Sounds good.
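
As I read the proposal, the seven-bit check plus the default-character
option could look roughly like the sketch below. The function name and both
signatures are my own assumptions for illustration, not the real ascii_t
interface.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical sketch; the name and signatures are assumptions.
// Strict form: throws if any byte does not fit into seven bits.
std::string require_seven_bit(const std::string& raw) {
    for (char ch : raw) {
        if (static_cast<unsigned char>(ch) > 0x7F) {
            throw std::range_error("value does not fit in seven bits");
        }
    }
    return raw;
}

// Default-character form: replaces offending bytes instead of throwing.
std::string require_seven_bit(const std::string& raw, char fallback) {
    std::string out(raw);
    for (char& ch : out) {
        if (static_cast<unsigned char>(ch) > 0x7F) {
            ch = fallback;
        }
    }
    return out;
}
```

With this, require_seven_bit("caf\xE9") throws, while
require_seven_bit("caf\xE9", '?') yields "caf?".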
...
Artyom, since you seem to have more experience with this stuff than I,
what do you think? Would those alterations take care of your objections?

Also, Artyom's Boost.Locale does very sophisticated encoding conversion, but
the Unicode conversions done by utf*_t look (scarily?) small.  Do they do
as good a job, or should these classes make use of the conversions in
Boost.Locale?
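
For reference, these are the checks I would expect even a small UTF-8
decoder to make: truncated sequences, stray continuation bytes, overlong
forms, surrogates, and values above U+10FFFF. I have not verified which of
them utf*_t or Boost.Locale perform; the function below is a hypothetical
sketch of the checks, not code from either library.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns true only if `s` is well-formed UTF-8.
bool is_well_formed_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra;     // continuation bytes expected after the lead
        std::uint32_t cp;      // code point being assembled
        std::uint32_t min_cp;  // smallest code point allowed at this length
        if (lead < 0x80)                { extra = 0; cp = lead;        min_cp = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min_cp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (n - i - 1 < extra) return false;  // sequence cut off at the end

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min_cp) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += extra + 1;
    }
    return true;
}
```

A converter that skips the overlong check, for instance, would accept
"\xC0\xAF" as a '/' character, which is the classic way such input slips
past path filters.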

Alex


-- 
Easy SFTP for Windows Explorer (http://www.swish-sftp.org)

Alexander Lamaison