[Boost.Locale] [Boost.Iostreams] Example with code converter does not work with clang
I'm trying the example on the "Character Set Conversions" page of the Boost.Locale documentation:

http://www.boost.org/doc/libs/1_49_0/libs/locale/doc/html/charset_handling.h...

(the full code I'm running is at the end of this message).

When compiling with gcc:

g++ -o test -lboost_locale test.cpp

It runs correctly - it prints "שלום" to standard output.

However, when compiling with clang:

clang++ -o test -lboost_locale test.cpp

It produces, as far as I can tell, gibberish - it prints "ש××" to standard output. The hex values of the output are:

0xc3 0x97 0xc2 0xa9 0xc3 0x97 0xc2 0x9c 0xc3 0x97 0xc2 0x95 0xc3 0x97 0xc2 0x9d

--------

What I'd like is to have a stream that accepts wide strings in the current locale and produces narrow strings in UTF-8 - am I completely off the track? If so, what should I do instead, and if not, what am I doing wrong?

Thanks!

- Jesse Beder

-------

(code follows)

#include <boost/iostreams/stream.hpp>
#include <boost/iostreams/categories.hpp>
#include <boost/iostreams/code_converter.hpp>
#include <boost/locale.hpp>
#include <iostream>

namespace io = boost::iostreams;

class consumer {
public:
    typedef char char_type;
    typedef io::sink_tag category;
    std::streamsize write(const char* s, std::streamsize n)
    {
        std::cout.write(s,n);
        return n;
    }
};

int main()
{
    typedef io::code_converter<consumer> converter_device;
    typedef io::stream<converter_device> converter_stream;

    consumer cons;
    converter_device dev;
    boost::locale::generator gen;
    dev.imbue(gen("en_US.UTF-8"));
    dev.open(cons);
    converter_stream stream;
    stream.open(dev);
    stream << L"שלום";
    return 0;
}
----- Original Message -----
When compiling with gcc:
g++ -o test -lboost_locale test.cpp
It runs correctly - it prints "שלום" to standard output.
However, when compiling with clang:
clang++ -o test -lboost_locale test.cpp
It produces, as far as I can tell, gibberish - it prints "ש××" to standard output. The hex values of the output are:
0xc3 0x97 0xc2 0xa9 0xc3 0x97 0xc2 0x9c 0xc3 0x97 0xc2 0x95 0xc3 0x97 0xc2 0x9d
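A note on the hex dump: the bytes are consistent with double encoding. The UTF-8 encoding of "שלום" is d7 a9 d7 9c d7 95 d7 9d, and if each of those bytes is treated as a separate code point (U+00D7, U+00A9, ...) and then encoded as UTF-8 again, the result is exactly the sequence reported above. The following is only a sketch of that arithmetic using the standard library; the helper names to_utf8 and encode_each_byte are made up for illustration and are not part of Boost or of the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Encode a single Unicode code point as UTF-8.
std::string to_utf8(unsigned long cp)
{
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Simulate the suspected failure: every byte of an already UTF-8 encoded
// string is taken as its own code point and encoded to UTF-8 a second time.
std::string encode_each_byte(const std::string& utf8)
{
    std::string out;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        out += to_utf8(static_cast<unsigned char>(utf8[i]));
    }
    return out;
}
```

Feeding the eight UTF-8 bytes of "שלום" to encode_each_byte gives the sixteen bytes from the dump, which suggests the wide literal reached the stream as eight wide characters, one per source byte, rather than as four Hebrew letters.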
It is clang... It does not handle the character set of string literals correctly.

If you encode the wide string as L"\u05e9\u05dc\u05d5\u05dd" the sample should work.

So if your software does not embed inline Unicode literals, there should be no problem.

Artyom Beilis
--------------
CppCMS - C++ Web Framework: http://cppcms.com/
CppDB - C++ SQL Connectivity: http://cppcms.com/sql/cppdb/
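To make the suggested workaround concrete: the four letters of "שלום" are U+05E9, U+05DC, U+05D5 and U+05DD, so the literal can be spelled with universal character names and no longer depends on how the compiler decodes the source file. The sketch below uses only the standard library; shalom_escaped and has_expected_code_points are hypothetical names chosen for this example, and the check assumes wchar_t can hold each of these code points in one element (true for BMP characters on common platforms).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The greeting written with universal character names instead of raw
// Hebrew characters, so the source file itself stays pure ASCII.
std::wstring shalom_escaped()
{
    return L"\u05e9\u05dc\u05d5\u05dd";
}

// Returns true if the string holds exactly the four expected code points,
// one per wchar_t.
bool has_expected_code_points(const std::wstring& s)
{
    static const unsigned long expected[] = { 0x05E9, 0x05DC, 0x05D5, 0x05DD };
    const std::size_t count = sizeof(expected) / sizeof(expected[0]);
    if (s.size() != count) {
        return false;
    }
    for (std::size_t i = 0; i < count; ++i) {
        if (static_cast<unsigned long>(s[i]) != expected[i]) {
            return false;
        }
    }
    return true;
}

// Optional sanity check a program could call before writing to the stream.
void check_literal()
{
    assert(has_expected_code_points(shalom_escaped()));
}
```

In the original program this would amount to replacing the last stream statement with stream << L"\u05e9\u05dc\u05d5\u05dd"; and leaving the rest unchanged.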