[serialization?] converting utf8 string to unicode wstring
Hello,

I am trying to accomplish the subject with the help of boost's utf8_codecvt_facet. I based my code on this example: http://www.boost.org/doc/libs/1_40_0/libs/serialization/doc/codecvt.html . The only difference is that my utf8 text resides in std::string:

#include <sstream>
#include <iostream>
#include "boost/archive/detail/utf8_codecvt_facet.hpp"
// link with boost/libs/serialization/src/utf8_codecvt_facet.cpp

int main()
{
    std::string utf;
    utf.resize(11);
    // hardcode some utf8 text
    utf[0] = 0xd7; utf[1] = 0x90;
    utf[2] = 0xd7; utf[3] = 0x99;
    utf[4] = 0xd7; utf[5] = 0x92;
    utf[6] = 0xd7; utf[7] = 0x95;
    utf[8] = 0xd7; utf[9] = 0xa8;
    utf[10] = 0x0;
    std::locale old_locale;
    std::locale utf8_locale(old_locale, new boost::archive::detail::utf8_codecvt_facet());
    std::locale::global(utf8_locale);
    std::stringstream in;
    in.imbue(utf8_locale);
    in.str(utf);
    std::wstringstream out;
    out << in;
    std::wcout << out.str() << std::endl;
}

The above code doesn't work: the "out" buffer doesn't contain the correct Unicode interpretation of the string. Actually, all I want is a C++ equivalent to the following WinAPI:

#include "windows.h"

int main()
{
    std::string utf;
    utf.resize(11);
    // hardcode some utf8 text
    utf[0] = 0xd7; utf[1] = 0x90;
    utf[2] = 0xd7; utf[3] = 0x99;
    utf[4] = 0xd7; utf[5] = 0x92;
    utf[6] = 0xd7; utf[7] = 0x95;
    utf[8] = 0xd7; utf[9] = 0xa8;
    utf[10] = 0x0;
    wchar_t outBuff[11];
    MultiByteToWideChar(CP_UTF8, 0, utf.c_str(), -1, outBuff, 10);
}

...which works well. Any idea would be greatly appreciated!

Thanks.
On Wed, Oct 7, 2009 at 11:58 AM, Igor R wrote:
[snip]
Er... I thought UTF8 *is* a form of Unicode? Looks like you are trying to convert UTF8 to UTF16, for what reason?
Er... I thought UTF8 *is* a form of Unicode?
Well, kind of...
Looks like you are trying to convert UTF8 to UTF16, for what reason?
Because there are lots of APIs that expect "real" Unicode, i.e. wide-character strings (wchar_t *). But after thinking about it again... all the places where I need it converted are in Windows-specific code anyway, so I'd better keep it as is, and where needed I'll just use MultiByteToWideChar().
boost-users-bounces@lists.boost.org wrote:
Hello,
I try to accomplish the subj with help of boost's utf8_codecvt_facet.
[snip]
    std::locale utf8_locale(old_locale, new boost::archive::detail::utf8_codecvt_facet());
    std::locale::global(utf8_locale);
    std::stringstream in;
    in.imbue(utf8_locale);
    in.str(utf);
    std::wstringstream out;
    out << in;
    std::wcout << out.str() << std::endl;
}
The above code doesn't work: "out" buffer doesn't contain correct unicode interpretation of the string. Actually, all i want is a c++ equivalent to the following WinAPI:
I *think* that this invokes operator<<(const void*) on "out", passing the result of operator void*() on "in".

What might have had a chance of producing something near the desired behavior would be to invoke operator<<(wostream&, const char*), like:

out << in.str().c_str()

Sadly, this won't work either, as it uses ctype::widen, not codecvt::do_in, to make the conversion. AFAIK, the only use of codecvt in the iostreams library is for the conversion between the "external" and "internal" encodings done by basic_filebuf.

To get the desired behavior, I think you'll have to write your own operator<<(wostream&, const string&) and use the codecvt facet inside it. IIRC, C++0x includes things like wstring_convert to do that.
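
As a rough illustration of "use the codecvt facet inside it", the sketch below calls the facet's public in() member directly on the bytes of a std::string. The names to_wide and wide_codecvt are hypothetical, invented for this example. It assumes the facet is a std::codecvt<wchar_t, char, std::mbstate_t> (which boost's utf8_codecvt_facet is believed to derive from) and that each input byte produces at most one wchar_t, which holds for UTF-8. Treat it as a starting point, not a tested replacement for MultiByteToWideChar().

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical alias: the facet type that converts char (external) to wchar_t (internal).
typedef std::codecvt<wchar_t, char, std::mbstate_t> wide_codecvt;

// Hypothetical helper: decode `bytes` into a wide string using facet.in().
std::wstring to_wide(const std::string& bytes, const wide_codecvt& facet)
{
    if (bytes.empty())
        return std::wstring();

    // Assumption: one input byte yields at most one wchar_t (true for UTF-8),
    // so a buffer of bytes.size() wide characters is large enough.
    std::wstring wide(bytes.size(), L'\0');

    std::mbstate_t state = std::mbstate_t();   // initial conversion state
    const char* from = bytes.data();
    const char* from_end = from + bytes.size();
    const char* from_next = from;
    wchar_t* to = &wide[0];
    wchar_t* to_end = to + wide.size();
    wchar_t* to_next = to;

    std::codecvt_base::result r =
        facet.in(state, from, from_end, from_next, to, to_end, to_next);

    // Anything other than `ok` (malformed or truncated input) is reported as an error.
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("to_wide: conversion failed");

    wide.resize(static_cast<std::size_t>(to_next - to));  // drop unused tail
    return wide;
}
```

With the locale from the original post, the facet could presumably be obtained as std::use_facet<wide_codecvt>(utf8_locale) and passed to to_wide(utf, ...). Note that the original string has size 11 and contains an explicit trailing 0x0 byte, which would be converted like any other character.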
participants (3)
- Eric MALENFANT
- Igor R
- OvermindDL1