
A few points:

1. wchar_t comes from C and was first defined at a time when 16 bits were indeed enough; its size was never specified exactly.
2. Keeping a compatible ABI is very important, so you should just get used to the fact that wchar_t may be 2 or 4 bytes.
3. Ignoring non-BMP code points is a very bad idea, so you should assume that std::wstring holds UTF-16 or UTF-32, not UCS-2 or UCS-4!
4. If you write new applications, don't use wchar_t; use std::string and UTF-8. This is a perfectly good and correct solution.
5. If you support wchar_t, support UTF-16, and support it well.

So IMHO the facet we are talking about should support UTF-16 as well, and should work with the correct definition of UTF-8. You can blame the standards, you can blame Microsoft or even Unicode, but you still have to live with it, and as long as you live with it, do it right.

Artyom

P.S.: I think UTF-16 should die: http://stackoverflow.com/questions/1049947/should-utf-16-be-considered-harmf...

P.P.S.: As long as Boost supports wchar_t, I believe it should support UTF-16, even if it is a nightmare.
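For illustration, here is a minimal sketch (not the Boost facet itself; the function name and layout are made up for this post) of what points 2-5 mean in practice: decode one UTF-8 sequence while rejecting the overlong forms and surrogate values behind the security problems in the subject line, then append the result to a std::wstring as UTF-16 when wchar_t is 2 bytes and as UTF-32 when it is 4 bytes.

// Minimal sketch: decode one UTF-8 sequence starting at p, rejecting
// overlong forms, surrogate code points and anything above U+10FFFF,
// then append the result to a std::wstring as UTF-16 or UTF-32 depending
// on the width of wchar_t.  Returns the number of bytes consumed, 0 on error.
// char32_t is from the current C++ drafts; use boost::uint32_t on older compilers.
#include <cstddef>
#include <string>

std::size_t decode_and_append(const unsigned char* p, std::size_t len,
                              std::wstring& out)
{
    if (len == 0) return 0;
    char32_t cp;
    std::size_t n;                                   // expected sequence length
    if      (p[0] < 0x80)           { cp = p[0];        n = 1; }
    else if ((p[0] & 0xE0) == 0xC0) { cp = p[0] & 0x1F; n = 2; }
    else if ((p[0] & 0xF0) == 0xE0) { cp = p[0] & 0x0F; n = 3; }
    else if ((p[0] & 0xF8) == 0xF0) { cp = p[0] & 0x07; n = 4; }
    else return 0;                                   // invalid lead byte
    if (len < n) return 0;                           // truncated sequence
    for (std::size_t i = 1; i < n; ++i) {
        if ((p[i] & 0xC0) != 0x80) return 0;         // missing continuation byte
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    static const char32_t min_for_len[] = { 0, 0x0, 0x80, 0x800, 0x10000 };
    if (cp < min_for_len[n]) return 0;               // overlong encoding
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                                    // out of range or surrogate
    if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
        out.push_back(static_cast<wchar_t>(cp));     // UTF-32, or a BMP character
    } else {
        cp -= 0x10000;                               // encode as a surrogate pair
        out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
    }
    return n;
}

A real codecvt facet additionally has to report partial conversions when a sequence is split across buffer boundaries, which this sketch ignores.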
----- Original Message ----
From: Sebastian Redl <sebastian.redl@getdesigned.at>
To: boost@lists.boost.org
Sent: Mon, October 18, 2010 9:36:17 AM
Subject: Re: [boost] boost utf-8 code conversion facet has security problems
On 18.10.2010 08:07, Patrick Horgan wrote:
On 10/16/2010 06:10 AM, Sebastian Redl wrote:
On 16.10.2010, at 00:23, Patrick Horgan wrote:
Support of the recent C++ drafts requires a char32_t basic type anyway, so I can't imagine anyone using a 16-bit wchar_t going forward,

There's absolutely no way Windows programming will ever change wchar_t away from 16 bits, and people will continue to use it.

Then that implies that it can only hold UCS-2. That's a choice. In C99, the type wchar_t is officially intended to be used only for 32-bit ISO 10646 values, independent of the currently used locale. C99 subclause 6.10.8 specifies that the value of the macro __STDC_ISO_10646__ shall be "an integer constant of the form yyyymmL (for example, 199712L), intended to indicate that values of type wchar_t are the coded representations of the characters defined by ISO/IEC 10646, along with all amendments and technical corrigenda as of the specified year and month." Of course Microsoft isn't able to define that, since you can't hold 21 bits in a 16-bit data type.
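As a concrete illustration of that clause (not part of the original exchange), a tiny test program can report what a given implementation promises about wchar_t:

// Reports whether the implementation defines __STDC_ISO_10646__, i.e.
// whether wchar_t values are ISO 10646 code points.  A 16-bit wchar_t
// (as on MSVC) cannot satisfy that, so the macro stays undefined there.
#include <iostream>

int main()
{
#if defined(__STDC_ISO_10646__)
    std::cout << "wchar_t holds ISO 10646 code points as of "
              << __STDC_ISO_10646__ << "L; sizeof(wchar_t) = "
              << sizeof(wchar_t) << '\n';
#else
    std::cout << "__STDC_ISO_10646__ is not defined; sizeof(wchar_t) = "
              << sizeof(wchar_t) << '\n';
#endif
}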
Microsoft defines wchar_t to be a 2-byte UTF-16 code unit, screw the standards.
Sebastian