Re: [boost] [locale] Composing asymmetric locale for character encoding conversion

2 Mar 2013

...
________________________________
From: Andrey Semashev <andrey.semashev@gmail.com>
To: boost@lists.boost.org 
Sent: Saturday, March 2, 2013 12:56 PM
Subject: [boost] [locale] Composing asymmetric locale for character encoding conversion
Hi,
Suppose I have a logging application that writes log records in wide
(wchar_t, UTF-16) and narrow (char, UTF-8) encodings and I want these
logs to be stored in a UTF-16LE encoded file. For simplicity, let's
assume that I write log files with std::wofstream. Now, the standard
says that the file stream buffer is supposed to convert wide
characters to byte sequences using the locale imbued into the buffer.
In general this is done by the codecvt facet, but it is designed to convert
wide characters to an 8-bit encoding and vice versa.
...
However, it seems that the locale should be the same as the one imbued
into the stream (basic_ostream::imbue makes sure of that). 
No, you can install your own codecvt facet into an existing locale object and
then imbue it into the stream.
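
A minimal sketch of that idea, using only the standard library. The names
utf16le_out_codecvt, make_utf16le_locale and encode_with are hypothetical, and
the facet assumes wchar_t already holds UTF-16 code units (as on Windows). It
only handles output; reading through it is not implemented.

```cpp
#include <cwchar>   // std::mbstate_t
#include <locale>
#include <string>

// Hypothetical facet: writes each wchar_t code unit as two little-endian bytes.
// Assumption: wchar_t holds UTF-16 code units. Values above 0xFFFF (possible
// where wchar_t is 32 bits wide) are reported as errors.
class utf16le_out_codecvt : public std::codecvt<wchar_t, char, std::mbstate_t> {
protected:
    result do_out(std::mbstate_t&, const wchar_t* from, const wchar_t* from_end,
                  const wchar_t*& from_next, char* to, char* to_end,
                  char*& to_next) const override {
        from_next = from;
        to_next = to;
        while (from_next != from_end) {
            if (to_end - to_next < 2) return partial;  // output buffer is full
            unsigned long u = static_cast<unsigned long>(*from_next);
            if (u > 0xFFFF) return error;
            *to_next++ = static_cast<char>(u & 0xFF);         // low byte first
            *to_next++ = static_cast<char>((u >> 8) & 0xFF);  // then high byte
            ++from_next;
        }
        return ok;
    }
    // The encoding is stateless, so nothing has to be written at the end.
    result do_unshift(std::mbstate_t&, char* to, char*, char*& to_next) const override {
        to_next = to;
        return noconv;
    }
    bool do_always_noconv() const noexcept override { return false; }
    int do_encoding() const noexcept override { return 2; }   // fixed width
    int do_max_length() const noexcept override { return 2; }
};

// Copy an existing locale and replace only its wide codecvt facet.
std::locale make_utf16le_locale(const std::locale& base = std::locale::classic()) {
    return std::locale(base, new utf16le_out_codecvt);
}

// Run text through whatever wide codecvt facet is installed in loc, calling
// the same out() member that a file stream buffer uses.
std::string encode_with(const std::locale& loc, const std::wstring& text) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(loc);
    std::string bytes(text.size() * cvt.max_length(), '\0');
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* from_next = nullptr;
    char* to_next = nullptr;
    cvt.out(state, text.data(), text.data() + text.size(), from_next,
            &bytes[0], &bytes[0] + bytes.size(), to_next);
    bytes.resize(to_next - &bytes[0]);
    return bytes;
}
```

With a file the locale would be imbued before opening, roughly:
std::wofstream out; out.imbue(make_utf16le_locale()); out.open(path,
std::ios::binary); where path is a placeholder. Binary mode is probably
needed on Windows so that newline translation does not touch the 0x0A bytes.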
...
What this
leads to is that in order to achieve my goal the locale should be able
to convert narrow characters of UTF-8 to wide characters of UTF-16 and
wide characters of UTF-16 to narrow characters representing byte
sequence of UTF16LE. Is it possible to make such an asymmetric locale
with Boost.Locale? Or maybe there is another way of doing this?
No, what you are probably looking for is an interface
that provides both `std::basic_ostream<char>` and `std::basic_ostream<wchar_t>`.

And then implement your own stream buffer that would do the conversion.
...
An additional question. Is it possible to achieve my goal with
...
std::ofstream (as opposed to std::wofstream)?
No, you will need:

1. two different wide and narrow streams.
2. Your custom stream buffer that would convert input characters
   to your arbitrary encoding

You'd better start from Boost.Iostreams and use the boost::locale::utf::*
functions for character set manipulation.
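
The two points above can be sketched roughly as follows. To keep the example
self-contained it uses plain std::streambuf subclasses and a hand-written
UTF-8 decoder instead of Boost.Iostreams and boost::locale::utf. All names
(put_utf16le, utf8_narrow_buf, wide_buf, dual_log) are hypothetical, and the
decoder is simplified: it does not reject overlong forms or surrogates.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Write one code point (or one UTF-16 code unit) to sink as UTF-16LE bytes.
// Values of 0x10000 and above are split into a surrogate pair.
inline void put_utf16le(std::streambuf& sink, std::uint32_t cp) {
    auto put_unit = [&sink](std::uint32_t u) {
        sink.sputc(static_cast<char>(u & 0xFF));
        sink.sputc(static_cast<char>((u >> 8) & 0xFF));
    };
    if (cp >= 0x10000) {
        cp -= 0x10000;
        put_unit(0xD800 + (cp >> 10));
        put_unit(0xDC00 + (cp & 0x3FF));
    } else {
        put_unit(cp);
    }
}

// Narrow side: takes UTF-8 bytes one at a time and forwards UTF-16LE.
class utf8_narrow_buf : public std::streambuf {
public:
    explicit utf8_narrow_buf(std::streambuf& sink) : sink_(sink) {}
protected:
    int_type overflow(int_type c) override {
        if (traits_type::eq_int_type(c, traits_type::eof()))
            return traits_type::not_eof(c);
        unsigned char b = static_cast<unsigned char>(traits_type::to_char_type(c));
        if (pending_ == 0) {
            if (b < 0x80) put_utf16le(sink_, b);
            else if ((b & 0xE0) == 0xC0) { cp_ = b & 0x1F; pending_ = 1; }
            else if ((b & 0xF0) == 0xE0) { cp_ = b & 0x0F; pending_ = 2; }
            else if ((b & 0xF8) == 0xF0) { cp_ = b & 0x07; pending_ = 3; }
            else put_utf16le(sink_, 0xFFFD);  // stray byte becomes U+FFFD
        } else if ((b & 0xC0) == 0x80) {
            cp_ = (cp_ << 6) | (b & 0x3F);
            if (--pending_ == 0) put_utf16le(sink_, cp_);
        } else {
            pending_ = 0;                     // broken sequence: emit U+FFFD
            put_utf16le(sink_, 0xFFFD);       // and drop the offending byte
        }
        return c;
    }
private:
    std::streambuf& sink_;
    std::uint32_t cp_ = 0;
    int pending_ = 0;  // continuation bytes still expected
};

// Wide side: forwards each wchar_t, whether it is a UTF-16 unit or UTF-32.
class wide_buf : public std::wstreambuf {
public:
    explicit wide_buf(std::streambuf& sink) : sink_(sink) {}
protected:
    int_type overflow(int_type c) override {
        if (traits_type::eq_int_type(c, traits_type::eof()))
            return traits_type::not_eof(c);
        put_utf16le(sink_, static_cast<std::uint32_t>(traits_type::to_char_type(c)));
        return c;
    }
private:
    std::streambuf& sink_;
};

// One narrow and one wide stream sharing the same byte sink.
struct dual_log {
    explicit dual_log(std::streambuf& sink)
        : nbuf(sink), wbuf(sink), narrow(&nbuf), wide(&wbuf) {}
    utf8_narrow_buf nbuf;
    wide_buf wbuf;
    std::ostream narrow;
    std::wostream wide;
};
```

Neither buffer sets up a put area, so every character goes through
overflow(). That keeps the sketch short and keeps output from the two streams
in call order, at the cost of speed. For a real file the sink would be a
std::filebuf opened in binary mode.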
...
I have a very strong
suspicion that the answer is no because the narrow characters will
pass on unconverted to the file instead of being translated from UTF-8
to UTF-16LE, but maybe I'm missing something.
Yes, you are correct: codecvt<char,char> is a no-op.
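
This can be checked directly by asking the facet itself. A small sketch,
where narrow_codecvt_is_noop is a hypothetical helper name:

```cpp
#include <cwchar>   // std::mbstate_t
#include <locale>

// Returns true when the narrow codecvt facet of loc never converts anything,
// which is why std::ofstream writes char data to the file unchanged.
bool narrow_codecvt_is_noop(const std::locale& loc = std::locale::classic()) {
    typedef std::codecvt<char, char, std::mbstate_t> facet_type;
    return std::use_facet<facet_type>(loc).always_noconv();
}
```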
...
Thank you.
Artyom Beilis
--------------
CppCMS - C++ Web Framework:   http://cppcms.com/
CppDB - C++ SQL Connectivity: http://cppcms.com/sql/cppdb/
...