
On Sat, Mar 02, 2013 at 14:56:52 +0400, Andrey Semashev wrote:
Suppose I have a logging application that writes log records in wide (wchar_t, UTF-16)
wchar_t does not have to be UTF-16. On most non-Windows platforms it is UCS-4. The standard also seems to expect each wchar_t to contain a complete code point, which isn't the case with UTF-16, so strictly speaking UTF-16 isn't supported. That said, everybody uses it as UTF-16 on Windows, because Microsoft jumped on the Unicode bandwagon too early and baked a 2-byte wchar_t into the API, so using UTF-16 is now the only way to support Unicode beyond version 2.0 there.
and narrow (char, UTF-8) encodings and I want these logs to be stored in a UTF-16LE encoded file. For simplicity, let's assume that I write log files with std::wofstream. Now, the standard says that the file stream buffer is supposed to convert wide characters to byte sequences using the locale imbued into the buffer.
Yes, right. And `operator<<(std::wostream &, const char *)` uses the locale imbued in the stream.
However, it seems that the locale should be the same as the one imbued into the stream (basic_ostream::imbue makes sure of that).
Why do you think so? basic_ios::imbue makes it the *default*, but I don't think it forbids overriding the buffer's locale afterwards.
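A minimal sketch of that behaviour, using a string stream so that no file is involved. The names `LocaleNames` and `imbue_order_demo` are made up for illustration, and the check relies on the fact that a locale built from another locale plus a facet has no name, so `name()` returns "*" for it.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper type: locale names observed at each step.
struct LocaleNames {
    std::string stream;                // stream locale after the override
    std::string buffer;                // buffer locale after the override
    std::string buffer_after_reimbue;  // buffer locale after imbuing the stream again
};

LocaleNames imbue_order_demo() {
    std::wostringstream out;
    assert(out.rdbuf() != nullptr);

    // An unnamed locale: the classic locale plus a UTF-16 codecvt facet.
    std::locale custom(std::locale::classic(), new std::codecvt_utf16<wchar_t>);

    // Imbuing the stream also sets the buffer's locale...
    out.imbue(std::locale::classic());
    // ...but the buffer's locale can then be overridden on its own.
    out.rdbuf()->pubimbue(custom);

    LocaleNames names;
    names.stream = out.getloc().name();
    names.buffer = out.rdbuf()->getloc().name();

    // Imbuing the stream again overwrites the buffer's locale.
    out.imbue(std::locale::classic());
    names.buffer_after_reimbue = out.rdbuf()->getloc().name();
    return names;
}
```

With this, `stream` should come out as "C", `buffer` as "*", and `buffer_after_reimbue` as "C" again, which is why the order of the two calls matters.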
What this leads to is that in order to achieve my goal the locale should be able to convert narrow characters of UTF-8 to wide characters of UTF-16 and wide characters of UTF-16 to narrow characters representing byte sequence of UTF16LE. Is it possible to make such an asymmetric locale with Boost.Locale? Or maybe there is another way of doing this?
It's not needed. Just imbue two different locales. You only have to be careful about the order, because imbuing the stream overwrites the buffer's locale. As I said above, wchar_t does not have to be UTF-16, so the buffer needs to use a locale with a codecvt_utf16 facet and the stream needs to use a locale with a codecvt_utf8 facet. Alternatively, you can use boost::iostreams::file_sink wrapped in an explicit boost::iostreams::code_converter using codecvt_utf16 and imbue the outer stream with codecvt_utf8.
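A rough sketch of the buffer side only, assuming the standard `<codecvt>` header is available (it is deprecated since C++17). The helper name `write_utf16le` is made up for illustration, the stream locale is left as the classic one, and the narrow-to-wide side is not shown. The text is kept to BMP characters because codecvt_utf16 with a 2-byte wchar_t does not produce surrogate pairs.

```cpp
#include <cassert>
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: writes `text` to `path` as UTF-16LE (no BOM)
// and returns the raw bytes that ended up in the file.
std::string write_utf16le(const std::string& path, const std::wstring& text) {
    {
        std::wofstream out;
        // Stream locale first: basic_ios::imbue also resets the buffer's locale.
        out.imbue(std::locale::classic());
        // Then override only the buffer's locale with a UTF-16LE codecvt facet.
        out.rdbuf()->pubimbue(std::locale(
            std::locale::classic(),
            new std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>));
        // Binary mode avoids newline translation inside 2-byte code units.
        out.open(path.c_str(), std::ios::binary | std::ios::trunc);
        assert(out.is_open());
        out << text;
    }  // the stream is closed here, which flushes the converted bytes

    // Read the file back as plain bytes for inspection.
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

For example, `write_utf16le("log.txt", L"h\u00e9")` should leave the four bytes 68 00 E9 00 in the file. The buffer is imbued before the file is opened so that the facet is in place before any output is converted.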
An additional question. Is it possible to achieve my goal with std::ofstream (as opposed to std::wofstream)? I have a very strong suspicion that the answer is no because the narrow characters will pass on unconverted to the file instead of being translated from UTF-8 to UTF-16LE, but maybe I'm missing something.
All streams accept their own character type and plain char, but not other character types. So you can't write a wide string to a narrow stream at all.

-- 
Jan 'Bulb' Hudec <bulb@ucw.cz>