[date_time] UNICODE and wcout problems
Hi,
in MSVC 8.0 _UNICODE build, after I send a ptime to wcout, I can no
longer print "international" characters. When I use a temporary
wostringstream for printing the ptime, everything is OK. Minimal repro
see below (the "2" at the end is never printed). With some characters
like the "š" in the example, the output is totally cut off; with others,
like "á", the codepage is changed, so the characters are displayed
incorrectly.
Any suggestions?
Thanks,
Filip
// _UNICODE must be defined; the "2" is never printed.
#include <string>
#include <iostream>
#include <sstream>
#include
Filip Konvička wrote:
Hi,
in MSVC 8.0 _UNICODE build, after I send a ptime to wcout, I can no longer print "international" characters. When I use a temporary wostringstream for printing the ptime, everything is OK. Minimal repro see below (the "2" at the end is never printed). With some characters like the "š" in the example, the output is totally cut off; with others, like "á", the codepage is changed, so the characters are displayed incorrectly.
Any suggestions?
Thanks, Filip
// _UNICODE must be defined; the "2" is never printed.
#include <string>
#include <iostream>
#include <sstream>
#include
#include
using boost::posix_time::ptime;
using boost::date_time::not_a_date_time;
int main()
{
    wstring intl=L"\x161";
    wostringstream ss;
    ss << ptime(not_a_date_time);
    wcout << ss.str() << endl;
    wcout << intl << endl;
    wcout << L"1" << endl;
    wcout << ptime(not_a_date_time) << endl;
    wcout << intl << endl;
    wcout << L"2" << endl;
}
No ideas. I suspect this is a bad interaction between streams and terminal I/O: somehow the Unicode output is making the subsequent data invisible or something. You could change the characters used for not_a_date_time in the facet and see if that makes things better, but there's no telling when you'd run into this in some other context.
Jeff
This is a problem in the MSVC libraries. If you print a character above code 255 then the stream crashes and is good for nothing afterwards. Only std::wstringstream doesn't have this problem.
I think if you buy the Dinkumware libraries this works. The _cputws function does work as you would expect, but it can't be piped. The behaviour also changes if you run the program from a command shell with Unicode turned on (cmd.exe /u).
I did talk to PJ Plauger about it on clc. This is what he explained:
"When you write to a wofstream, the wchar_t sequence you write gets converted to a byte sequence written to the file. How that conversion occurs depends on the codecvt facet you choose. Choose none and you get some default. In the case of VC++ the default is pretty stupid -- the first 256 codes get written as single bytes and all other wide-character codes fail to write."
http://groups.google.com/group/comp.lang.c++/browse_thread/thread/3c203253708befb5/1bc5d68887f1a72d?lnk=st&q=&rnum=107
Kirit
Kirit Sælensminde 26.5.2007 5:35:
This is a problem in the MSVC libraries. If you print a character above code 255 then the stream crashes and is good for nothing afterwards. Only std::wstringstream doesn't have this problem.
I think if you buy the Dinkumware libraries this works. The _cputws function does work as you would expect, but it can't be piped. The behaviour also changes if you run the program from a command shell with Unicode turned on ( cmd.exe /u).
I did talk to PJ Plauger about it on clc. This is what he explained:
"When you write to a wofstream, the wchar_t sequence you write gets converted to a byte sequence written to the file. How that conversion occurs depends on the codecvt facet you choose. Choose none and you get some default. In the case of VC++ the default is pretty stupid -- the first 256 codes get written as single bytes and all other wide-character codes fail to write."
How do you explain that the workaround works, then? When I don't send any ptime to wcout, all wcin / wcout i/o works as expected, including international characters (all I do is call setlocale(LC_ALL, ".OEM"); at startup).
Filip
Filip Konvička wrote:
How do you explain that the workaround works, then? When I don't send any ptime to wcout, all wcin / wcout i/o works as expected, including international characters (all I do is call setlocale(LC_ALL, ".OEM"); at startup).
I'm not sure which workaround you're referring to I'm afraid. I didn't
notice any call to setlocale in your example. As for why the \x161
displays I don't know (if that is what you are saying happens). Is it a
character available in the code page for the machine you are using?
The Unicode for the console should be able to handle the full range of
display restricted only by the font in use. The streams implementation
doesn't have such wide applicability though as it narrows it to eight
bit output. All the experimentation I've done leads me to the conclusion
that _cputws is able to display the widest range of characters properly,
but only if you start the command shell with the Unicode handling turned on.
I'm not suggesting that this is the only possible explanation for what
you are seeing, but it seemed a reasonable possibility given your
description.
K
This is the program I was using to test things (again must be compiled
with _UNICODE):
#include <iostream>
#include
How do you explain that the workaround works, then? When I don't send any ptime to wcout, all wcin / wcout i/o works as expected, including international characters (all I do is call setlocale(LC_ALL, ".OEM"); at startup).
I'm not sure which workaround you're referring to I'm afraid. I didn't notice any call to setlocale in your example. As for why the \x161 displays I don't know (if that is what you are saying happens). Is it a character available in the code page for the machine you are using?
The workaround is that I format the ptime via a wostringstream rather than sending it directly to wcout. The character \x161 displays correctly in the first case, but does not (or, more exactly, looks different) after sending a ptime to wcout.
The Unicode for the console should be able to handle the full range of display restricted only by the font in use. The streams implementation doesn't have such wide applicability though as it narrows it to eight bit output. All the experimentation I've done leads me to the conclusion that _cputws is able to display the widest range of characters properly, but only if you start the command shell with the Unicode handling turned on.
That does not matter much, I think. The "setlocale(LC_ALL, ".OEM")" call was indeed left out from my example, as it only ensures that characters received via wcin are correctly translated to wchar_t (I assume that the other option is running cmd /u, thanks for the tip!). If I don't call this, the characters that I read from the console via wcin are different from what I see in the debugger watch window, however they are translated to their original form when printed back to wcout. (Unless I send a ptime to wcout, that is.)
I'm not suggesting that this is the only possible explanation for what you are seeing, but it seemed a reasonable possibility given your description.
:-) I'm *not* having problems reading/writing international characters as long as I don't send ptime to wcout.
I did some debugging on the matter and I suspect that the line boost/date_time/posix_time/posix_time_io.hpp:61 is the culprit. It changes the locale of wcout and whatever machinery should revert this back, it does not. This agrees with my observation that using a wostringstream for formatting does not damage wcout.
What do you think?
Cheers,
Filip
I did some debugging on the matter and I suspect that the line boost/date_time/posix_time/posix_time_io.hpp:61 is the culprit. It changes the locale of wcout and whatever machinery should revert this back, it does not. This agrees with my observation that using a wostringstream for formatting does not damage wcout.
What do you think?
No ideas? I really think that the above code is defective. Should I submit a bug report then?
Cheers,
Filip
Filip Konvička wrote:
I did some debugging on the matter and I suspect that the line boost/date_time/posix_time/posix_time_io.hpp:61 is the culprit. It changes the locale of wcout and whatever machinery should revert this back, it does not. This agrees with my observation that using a wostringstream for formatting does not damage wcout.
What do you think?
No ideas? I really think that the above code is defective. Should I submit a bug report then?
It's not, according to my understanding (see other mail). Right now I'd say the bug belongs to the standard library provider...
Jeff
Filip Konvička wrote:
...catching up....
:-) I'm *not* having problems reading/writing international characters as long as I don't send ptime to wcout.
I did some debugging on the matter and I suspect that the line boost/date_time/posix_time/posix_time_io.hpp:61 is the culprit. It changes the locale of wcout and whatever machinery should revert this back, it does not. This agrees with my observation that using a wostringstream for formatting does not damage wcout.
What do you think?
By itself I don't see how this is a problem. You'll note the previous line:
std::locale l = std::locale(os.getloc(), f);
This code makes a copy of the existing locale with the facet added, which then gets imbued into the stream. So, all the previously existing locale settings should be preserved.
Jeff
I did some debugging on the matter and I suspect that the line boost/date_time/posix_time/posix_time_io.hpp:61 is the culprit. It changes the locale of wcout and whatever machinery should revert this back, it does not. This agrees with my observation that using a wostringstream for formatting does not damage wcout.
By itself I don't see how this is a problem. You'll note the previous line:
std::locale l = std::locale(os.getloc(), f);
This code makes a copy of the existing locale with the facet added, which then gets imbued into the stream. So, all the previously existing locale settings should be preserved.
Hmm, I see... Thanks for the explanation, I'll stick to stringstream formatting then.
Cheers,
Filip
Hello,
I think 64-bit support for toolset=sun is missing in boost 1.34.0. I'm using bjam to build boost.
My environment:
OS: Solaris 10 x86_64 (with latest patches)
CC: Sun C++ 5.8 Patch 121018-10 2007/02/21 (Sun Studio 11 with latest patches)
With 1.32.0 I used the following command to build 64-bit binaries:
tools/build/jam_src/bin/bjam -sHAVE_ICU=1 -sICU_PATH=$$PREFIX -sTOOLS=sunpro -d2 -j4 -sBUILD=debug\ "<instruction-set>athlon64" --prefix=$$PREFIX --includedir=$$PREFIX/include --libdir=$$PREFIX/lib/amd64
Now I'm using the following command to build boost:
tools/jam/src/bin/bjam --v2 --toolset=sun stdlib=sun-stlport -d2 -j4 variant=debug --without-iostreams --without-python --without-regex --prefix=$$PREFIX --includedir=$$PREFIX/include --libdir=$$PREFIX/lib/amd64
I see no way (without patching tools/build/v2/tools/sun.jam) to get 64-bit binaries. Am I missing something?
Best regards,
Markus
participants (4)
- Filip Konvička
- Jeff Garland
- Kirit Sælensminde
- Markus Bernhardt