I have some code that converts between UTF-16 and multibyte character
sets (MBCS). It is Windows-only:
template<typename FROM, typename TO>
struct convert
{
    // Generic case: only compiles when FROM and TO are the same type.
    basic_string<TO> operator()(const basic_string<FROM>& from)
    {
        return from;
    }
};

// wchar_t -> char: UTF-16 to the current multibyte code page.
template<>
struct convert<wchar_t, char>
{
    string operator()(const wstring& from)
    {
        return utf16_to_mbcs(from);
    }
private:
    string utf16_to_mbcs(const wstring& ws)
    {
        if(ws.empty()) return string();
        // Assumes at most two bytes per UTF-16 code unit.
        const size_t BUFFER_SIZE = (ws.size() << 1) + 1;
        shared_array<char> p_mcb(new char[BUFFER_SIZE]);
        // Skip a leading byte order mark, if present.
        bool has_utf16le_bom = (0xFEFF == ws[0]);
        int count = ::WideCharToMultiByte(
            AreFileApisANSI() ? CP_THREAD_ACP : CP_OEMCP,
            WC_NO_BEST_FIT_CHARS,
            (has_utf16le_bom ? ws.substr(1) : ws).c_str(),
            static_cast<int>(has_utf16le_bom ? ws.size() - 1 : ws.size()),
            p_mcb.get(),
            static_cast<int>(BUFFER_SIZE),
            0,
            0);
        // A return value of 0 signals failure.
        return (0 == count)
            ? string()
            : string(p_mcb.get(), count);
    }
};

// char -> wchar_t: current multibyte code page to UTF-16.
template<>
struct convert<char, wchar_t>
{
    wstring operator()(const string& from)
    {
        return mbcs_to_utf16(from);
    }
private:
    wstring mbcs_to_utf16(const string& s)
    {
        if(s.empty()) return wstring();
        const size_t BUFFER_SIZE = (s.size() << 1) + 1;
        shared_array<wchar_t> p_ws(new wchar_t[BUFFER_SIZE]);
        int count = ::MultiByteToWideChar(
            AreFileApisANSI() ? CP_THREAD_ACP : CP_OEMCP,
            MB_PRECOMPOSED,
            s.c_str(),
            static_cast<int>(s.size()),
            p_ws.get(),
            static_cast<int>(BUFFER_SIZE));
        // A return value of 0 signals failure.
        return (0 == count)
            ? wstring()
            : wstring(p_ws.get(), count);
    }
};
Date: Thu, 16 Jul 2009 10:39:31 +0200
From: plarroy
My approach is to use std::string everywhere with UTF-8 internally,
converting to other charsets only when needed.
I use IBM's ICU library and wrote a boost::iostreams filter to convert
encodings. Once that is in place it takes a lot of complexity away. I
use it like this:
// setup a conversion from charset to utf-8
filt_streamb.push(ucnv_filter(charset.c_str(), "utf-8"));
istream is(&filt_streamb);
Perhaps there would be interest in adding this charset conversion to
the boost::iostreams filter examples.
Regards.
Oh, I also forgot to mention that I am using boost::filesystem::path. I
guess this means I need to use wchar_t everywhere (std::wstring,
boost::filesystem::wpath, etc.) and just let wxWidgets do the
encoding/decoding? If I don't have to do any encoding/decoding myself,
there really is no need for a special object. But just in case, I would
like to have the encoding/decoding abilities.
On Sun, Jun 14, 2009 at 12:27 PM, Robert Dailey wrote:
Hi everyone,
I did a bit of googling to see if Boost 1.39 has any portable support
for UTF-16 encoded strings, but I did not find any. I'm currently using
wxWidgets in my application, and I need a decent string object to use.
I know that wxWidgets has UTF-16 string support through wxString;
however, I do not want to expose this object in my interfaces. I want to
remain as abstracted away from wxWidgets as possible. Having said that,
if someone could tell me whether there is any existing UTF-16 string
support in Boost, I'd appreciate it. I did not find anything in the
vault, sandbox, or trunk in Boost.
If Boost has no such string object, could someone give me a head start
on where to look? Thanks.