
On Mon, Mar 4, 2013 at 1:32 AM, Jan Hudec <bulb@ucw.cz> wrote:
On Sat, Mar 02, 2013 at 14:56:52 +0400, Andrey Semashev wrote:
Suppose I have a logging application that writes log records in wide (wchar_t, UTF-16)
wchar_t does not have to be UTF-16. On most non-Windows platforms it is UCS-4.
The standard also seems to expect each wchar_t to hold a complete code point, which isn't the case with UTF-16, so UTF-16 isn't really supported. That said, everybody uses it as UTF-16 on Windows, because Microsoft jumped on the Unicode bandwagon too early and baked a 2-byte wchar_t into the API, so UTF-16 is now the only option there for supporting Unicode beyond version 2.0.
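To make the "complete code point" point concrete: any character outside the Basic Multilingual Plane, e.g. U+1F600, needs two 16-bit UTF-16 code units (a surrogate pair), so a 16-bit wchar_t cannot hold it as a single element. The helper below is hypothetical and only applies the standard surrogate formula; it is an illustration, not part of any library.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper: split a supplementary-plane code point into a UTF-16
// surrogate pair. Precondition: 0x10000 <= cp <= 0x10FFFF.
std::pair<char16_t, char16_t> to_surrogate_pair(char32_t cp) {
    // Subtracting 0x10000 leaves a 20-bit value.
    const std::uint32_t v = static_cast<std::uint32_t>(cp) - 0x10000u;
    // The top 10 bits go into the high surrogate (0xD800..0xDBFF).
    const char16_t high = static_cast<char16_t>(0xD800u + (v >> 10));
    // The bottom 10 bits go into the low surrogate (0xDC00..0xDFFF).
    const char16_t low = static_cast<char16_t>(0xDC00u + (v & 0x3FFu));
    return {high, low};
}
```

For U+1F600 this gives 0xD83D 0xDE00, which is exactly what a `u"\U0001F600"` literal contains: two char16_t elements for one character.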
Yes, I'm aware of that. I have Windows in mind.
However, it seems that the locale should be the same as the one imbued into the stream (basic_ostream::imbue makes sure of that).
Now why do you think so? basic_ios::imbue makes it the *default*, but I don't think it forbids overriding the buffer's locale.
Come to think of it, you may be right. I cannot find any further indication that the same locale is expected.
What this leads to is that, in order to achieve my goal, the locale would have to convert narrow UTF-8 characters to wide UTF-16 characters, and wide UTF-16 characters to narrow characters representing the UTF-16LE byte sequence. Is it possible to make such an asymmetric locale with Boost.Locale? Or is there another way of doing this?
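The two conversions described here are independent of each other, which is easy to see when they are done in memory with the C++11 `<codecvt>` facets. A minimal sketch, assuming a C++11 standard library; the helper names are hypothetical, and `std::wstring_convert` and the `<codecvt>` facets, while standard C++11, are deprecated since C++17.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: decode narrow UTF-8 into wchar_t code points.
// Per the standard, codecvt_utf8<wchar_t> yields UCS-4 with a 32-bit wchar_t
// but only UCS-2 with a 16-bit one (codecvt_utf8_utf16 handles surrogates).
std::wstring utf8_to_wide(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.from_bytes(utf8); // throws std::range_error on invalid input
}

// Hypothetical helper: encode wchar_t text as UTF-16LE bytes, without a BOM
// (no generate_header flag is passed).
std::string wide_to_utf16le(const std::wstring& wide) {
    std::wstring_convert<
        std::codecvt_utf16<wchar_t, 0x10FFFF, std::little_endian>> conv;
    return conv.to_bytes(wide);
}
```

Each direction uses its own facet, so no single "asymmetric" facet is needed if the two conversions can be attached to different objects.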
It's not needed. Just imbue two different locales. You only have to be careful about the order, because imbuing the stream also overwrites the buffer's locale, so imbue the stream first and the buffer second.
As I said above, wchar_t does not have to be UTF-16, so the buffer needs a locale with the codecvt_utf16 facet and the stream needs a locale with the codecvt_utf8 facet.
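A minimal sketch of the two-locale setup, assuming the C++11 `<codecvt>` facets (deprecated since C++17); `write_utf16le` is a hypothetical helper. The stream-level locale with codecvt_utf8 mirrors the suggestion above; the bytes written to disk are determined by the buffer's locale. Per the standard, codecvt_utf16<wchar_t> treats wchar_t as UCS-4 only when wchar_t is 32-bit; with a 16-bit wchar_t it treats it as UCS-2, so supplementary characters are not handled there.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: write `text` to `path` as UTF-16LE without a BOM.
bool write_utf16le(const std::string& path, const std::wstring& text) {
    std::wofstream out; // not opened yet, so the buffer can be imbued safely

    // 1. Stream locale first: basic_ios::imbue also imbues rdbuf().
    out.imbue(std::locale(std::locale::classic(),
                          new std::codecvt_utf8<wchar_t>));

    // 2. Buffer locale second, overriding what step 1 propagated.
    //    Its codecvt turns wchar_t into UTF-16LE bytes on output.
    out.rdbuf()->pubimbue(std::locale(std::locale::classic(),
        new std::codecvt_utf16<wchar_t, 0x10FFFF, std::little_endian>));

    // Binary mode keeps the runtime from rewriting bytes (e.g. CRLF on Windows).
    out.open(path, std::ios::binary);
    if (!out) return false;
    out << text;
    out.close(); // flushes through the buffer's codecvt
    return !out.fail();
}
```

Swapping steps 1 and 2 would leave the buffer with the codecvt_utf8 locale, which is the ordering pitfall mentioned above.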
Alternatively, you can wrap boost::iostreams::file_sink in an explicit boost::iostreams::code_converter that uses codecvt_utf16, and imbue the outer stream with codecvt_utf8.
All of these assume the availability of codecvt_utf16 from C++11 (codecvt_utf8 could be replaced with a facet generated by Boost.Locale, I guess). Also, there seems to be no codecvt_utf32 for some reason, in case I wanted to write UTF-32 encoded files. As far as I can see, Boost.Locale does not provide the C++11 codecvt facets. Is that right? Is this support planned?