
Hello, I hope someone here will be able to help with this. Sorry, but the background is a little long-winded....

I've just discovered to my dismay that Microsoft's implementation of fstream, ifstream and ofstream are fatally flawed. I'm not sure if these characters will display correctly across this mailing list but consider the following string:-

ΔΗΜΗΤΡΗΣ

which is apparently the Greek name "Dimitris". In an English locale, the following path would only be considered valid (by Windows) in a Unicode environment:-

C:/Users/ΔΗΜΗΤΡΗΣ/some_file_name

That's fine - because the user name contains characters that aren't in the English code page. However in Greece, that same path should qualify as a valid, non-Unicode file path and should therefore be openable without needing Unicode. For example, when building with MSVC this works:-

#include <io.h>
#include <fcntl.h>
int file = _open("C:/Users/ΔΗΜΗΤΡΗΣ/some_file_name", _O_RDONLY);

However, this fails to work - even though it uses the same path:-

#include <fstream>
using std::fstream;
fstream file("C:/Users/ΔΗΜΗΤΡΗΣ/some_file_name", fstream::in | fstream::out);

To cut a long story short, I noticed that libboost offers its own 'filesystem' implementation. Does boost::filesystem implement its own fstream, ifstream and ofstream? If so, would they likely suffer from the same problem (on Windows)? If not, where can I find some examples of using boost::filesystem?

Thanks,
John

Hello, I hope someone here will be able to help with this. Sorry, but the background is a little long-winded....
I've just discovered to my dismay that Microsoft's implementation of fstream, ifstream and ofstream are fatally flawed. I'm not sure if these characters will display correctly across this mailing list but consider the following character set:-
ΔΗΜΗΤΡΗΣ
which is apparently the Greek name "Dimitris". In an English locale, the following path would only be considered valid (by Windows) in a Unicode environment:-
C:/Users/ΔΗΜΗΤΡΗΣ/some_file_name
That's fine - because the user name doesn't contain valid English characters. However in Greece, that same path should qualify as a valid, non-Unicode file path and should therefore be openable without needing Unicode. For example, when building with MSVC this works:-
#include <io.h>
#include <fcntl.h>
int file = _open("C:/Users/ΔΗΜΗΤΡΗΣ/some_file_name", _O_RDONLY);
However, this fails to work - even though it uses the same path:-
#include <fstream>
using std::fstream;
fstream file("C:/Users/ΔΗΜΗΤΡΗΣ/some_file_name", fstream::in | fstream::out);
That's very strange. I've just tried the following: changed the system locale to Russian (Win7: Region&Language --> Administrative --> change system locale...), and tried the above C++ code with a path in Russian -- it works well. (Note that your source file should be in the appropriate code page!)
To cut a long story short, I noticed that libboost offers its own 'filesystem' implementation. Does boost::filesystem implement its own fstream, ifstream and ofstream? If so, would they likely suffer from the same problem (on Windows)? If not, where can I find some examples of using boost::filesystem?
Here's its documentation: http://www.boost.org/doc/libs/1_48_0/libs/filesystem/v3/doc/reference.html#F...

John Emmas
I've just discovered to my dismay that Microsoft's implementation of fstream, ifstream and ofstream are fatally flawed.
How so? You fail to explain how it does anything other than what the standard tells you it does. Please give a code example of this flaw. If you are using a UTF16LE path, casting it to const char *, and passing it as an argument to ifstream, then it is no wonder you're not getting the results you expect.
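To illustrate the casting pitfall described above, here is a minimal sketch of what a narrow-string API sees when a UTF-16 buffer is reinterpreted as const char *. It assumes a little-endian machine (as on x86 Windows), and `narrow_length_after_cast` is a made-up name used only for this illustration.

```cpp
#include <cstddef>
#include <cstring>

// Returns how many bytes a narrow-string API would read before hitting the
// first zero byte when a UTF-16 buffer is reinterpreted as const char*.
// Illustrative helper only; the name is not from any library.
std::size_t narrow_length_after_cast(const char16_t* utf16_text)
{
    // Inspecting an object through a char pointer is allowed, but the bytes
    // are still UTF-16 code units, not a narrow string.
    const char* as_narrow = reinterpret_cast<const char*>(utf16_text);
    return std::strlen(as_narrow);
}
```

On a little-endian machine, an ASCII character such as 'C' is stored as the bytes 0x43 0x00, so the narrow API would treat the path as ending after its first character. A stream opened that way is looking for a different file name altogether.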

On 24 Jan 2012, at 20:00, Christopher wrote:
John Emmas writes:
I've just discovered to my dismay that Microsoft's implementation of fstream, ifstream and ofstream are fatally flawed.
How so?
You fail to explain how it does anything other than what the standard tells you it does.
Please give a code example of this flaw.
I discovered this afternoon that it's already been acknowledged by Microsoft as a bug. It affects VC8 and 9 but is fixed in VC10.
http://connect.microsoft.com/VisualStudio/feedback/details/361133/a-call-to-...
Since I'm using VC8 I'll be trying some workarounds tomorrow. I did try the fstream implementation from boost::filesystem (v1.40, I think) but it seemed to have the same problem.
Thanks for your reply.
John
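One workaround that may be worth trying on VC8/VC9 is to select the user's default locale before opening the stream. This is only a sketch under an assumption: that the failure comes from the runtime converting the narrow path to a wide one while the default "C" locale is still in effect. It has not been verified against VC8, and `open_with_user_locale` is a hypothetical helper name.

```cpp
#include <clocale>
#include <fstream>
#include <string>

// Hypothetical helper: switch the C runtime's character-handling locale to
// the user's default, then open the narrow path as usual.
// Assumption: the VC8/VC9 failure is caused by the narrow-to-wide path
// conversion running under the default "C" locale.
bool open_with_user_locale(std::fstream& file, const std::string& path,
                           std::ios_base::openmode mode)
{
    // "" selects the environment's default locale. If the call fails, the
    // previous locale simply stays in effect and we try the open anyway.
    std::setlocale(LC_CTYPE, "");

    file.open(path.c_str(), mode);
    return file.is_open();
}
```

For the path in this thread the call would look like `open_with_user_locale(file, "C:/Users/ΔΗΜΗΤΡΗΣ/some_file_name", std::fstream::in | std::fstream::out)`. Note that changing the locale affects the whole process, so it is probably best done once at start-up.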

John Emmas
On 24 Jan 2012, at 20:00, Christopher wrote:
John Emmas writes:
I've just discovered to my dismay that Microsoft's implementation of fstream, ifstream and ofstream are fatally flawed.
How so?
You fail to explain how it does anything other than what the standard tells you it does.
Please give a code example of this flaw.
I discovered this afternoon that it's already been acknowledged by Microsoft as a bug. It affects VC8 and 9 but is fixed in VC10.
http://connect.microsoft.com/VisualStudio/feedback/details/361133/a-call-to-the-std-filebuf-open-method-with-a-multibyte-path-that-worked-in-vc7-fails-in-vc9
Since I'm using VC8 I'll be trying some workarounds tomorrow. I did try the fstream implementation from boost::filesystem (v1.40, I think) but it seemed to have the same problem. Thanks for your reply.
I still don't understand the problem, even after reading the MS Connect write-up. A full code example is not presented.

std::fstream takes a const char * for the file name. Standard C++ streams are documented to use _ANSI C strings_ for their arguments. If you are passing any unicode character that falls outside of the range covered by the ANSI character set, then you are passing an invalid parameter. If you use wfstream, you may then pass in, as an argument, a wide string of UTF16LE characters that fall in the same range as the ASCII set.

At the time of the last standard, C++ did not consider unicode at all. The new C++11 standard supposedly addresses it, with better locales and facets.

You will also find that, regardless of whether the project settings are unicode or not, whatever text you pass into a stream and out to file is going to be transformed to the character set your machine's environment is set to use. In my case, I might pass a UTF16LE encoded wstring to wfstream and then open the file with a hex editor to find it has been transformed to Windows 1252. That is also documented, but a pain in the arse.

So, I conceded that, in order to actually stream unicode to file, I'd have to read and write it as bytes, and insert the BOM myself. It would seem that, for the time being, "There is no standard C++ or Microsoft way to stream unicode to file" (without some nuances).

This is all what I gleaned through my debugging woes with unicode. I could be wrong.
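The "write it as bytes and insert the BOM myself" approach could look roughly like the sketch below. It writes UTF-16 code units little-endian through a binary std::ofstream, so no locale conversion is applied. The function name `write_utf16le_with_bom` is invented for this example, and std::u16string assumes a C++11 compiler (on older MSVC a std::wstring would play the same role).

```cpp
#include <fstream>
#include <string>

// Writes UTF-16 code units to a file as raw bytes, little-endian, preceded
// by the byte order mark FF FE. No locale or codecvt facet is involved.
// The function name is illustrative, not part of any library.
bool write_utf16le_with_bom(const std::string& path, const std::u16string& text)
{
    std::ofstream out(path.c_str(), std::ios::binary | std::ios::trunc);
    if (!out) {
        return false;
    }

    out.put('\xFF').put('\xFE');  // BOM for UTF-16LE

    for (char16_t unit : text) {
        out.put(static_cast<char>(unit & 0xFF));         // low byte first
        out.put(static_cast<char>((unit >> 8) & 0xFF));  // then high byte
    }
    return out.good();
}
```

Opening the stream in binary mode matters here: in text mode on Windows, a byte equal to 0x0A inside a code unit would be expanded to 0x0D 0x0A and corrupt the output.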

On Thu, Jan 26, 2012 at 11:55 AM, Christopher wrote:
So, I conceded that, in order to actually stream unicode to file, I'd have to read and write it as bytes, and insert the BOM myself. It would seem, that for the time being, "There is no standard C++ or Microsoft way to stream unicode to file" (without some nuances).
This is all what I gleaned through my debugging woes with unicode. I could be wrong.
Have you tried Boost.Locale? I'm using it for UTF-8 in my Windows application and file output seems to be working fine (obviously you have to imbue the fstream with the locale provided by the library). I output text to a file (Greek, Russian, and Arabic, 3 lines, all in the same file), and then opened it in UltraEdit, which correctly opened it as a UTF-8 encoded file... Disclaimer: I am by no means an i18n/l10n expert so what I'm saying may be totally stupid and wrong.
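For readers without Boost, the "imbue the fstream with a locale" step can be sketched with the standard library alone. This is not Boost.Locale: it swaps in std::codecvt_utf8 from <codecvt>, which exists from C++11 but is deprecated since C++17, so treat it as an illustration of the mechanism rather than a recommendation. `write_wide_as_utf8` is a hypothetical name.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Writes a wide string to a file as UTF-8 by imbuing the stream with a
// locale that carries a UTF-8 conversion facet.
// Standard-library stand-in for the Boost.Locale approach; name is illustrative.
bool write_wide_as_utf8(const std::string& path, const std::wstring& text)
{
    std::wofstream out;

    // Imbue before opening so the file buffer uses the facet from the start.
    // The locale object takes ownership of the facet allocated here.
    out.imbue(std::locale(std::locale::classic(),
                          new std::codecvt_utf8<wchar_t>));

    out.open(path.c_str(), std::ios::binary | std::ios::trunc);
    if (!out) {
        return false;
    }

    out << text;
    out.flush();
    return out.good();
}
```

With Boost.Locale the locale would instead come from the library's generator, as described in the message above. One caveat of this stand-in: where wchar_t is 16 bits wide (as on Windows), std::codecvt_utf8<wchar_t> only covers characters in the Basic Multilingual Plane.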
participants (4)
- Christopher
- Igor R
- John Emmas
- Joshua Boyce