[filesystem] path does not use global locale's codecvt facet - bug or feature

Hello,

Boost.Filesystem v3 uses wide paths under Windows and can convert them from narrow ones using a codecvt facet, so I would expect that if the global locale has a special codecvt facet installed, Boost.Filesystem would use it, i.e.:

    int main()
    {
        boost::locale::generator locale_generator;
        std::locale::global(locale_generator("en_US.UTF-8"));
        // Now the default codecvt facet is the UTF-8 one.
        boost::filesystem::path p("שלום.txt");
        boost::filesystem::ofstream test(p);
    }

However, this does not work as expected! I found that you need to imbue the locale explicitly:

    boost::filesystem::path p;
    p.imbue(std::locale()); // global one
    p = "שלום.txt";
    boost::filesystem::ofstream test(p);

Now it works.

Should I open a ticket for this, or is this "planned" behavior?

On 3/3/2011 8:31 AM, Artyom wrote:
Should I open a ticket for this or this is "planned" behavior?
If it is documented, then it seems to me that this would be the normal behavior. If it is not documented I would open a ticket about it.

On Thu, Mar 3, 2011 at 8:31 AM, Artyom <artyomtnk@yahoo.com> wrote:
Should I open a ticket for this or this is "planned" behavior?
That depends. The docs were recently updated (Feb 20, rev 69073) to provide more detail. For Windows, including Cygwin and MinGW, this is part of what the docs say:

"The default imbued locale provides a codecvt facet that invokes Windows MultiByteToWideChar or WideCharToMultiByte API's with a codepage of CP_THREAD_ACP if Windows AreFileApisANSI() is true, otherwise codepage CP_OEMCP. [Rationale: this is the current behavior of C and C++ programs that perform file operations using narrow character string to identify paths. Changing this in the Filesystem library would be too surprising, particularly where user input is involved. -- end rationale]"

So your original code won't do what you want. It would only work if the codepage was a UTF-8 codepage. Most likely it wasn't, and that's why you needed to do an explicit imbue.

It would help to know what compiler you are using, including the version number. Most VC++ and recent gcc or MinGW should be OK.

You also have to be sure "שלום.txt" is handled as UTF-8 by your editor and your compiler, and not converted to some other encoding. I'm guessing that's not a problem, as your imbue change wouldn't have worked correctly otherwise.

--Beman
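To make the point about codepages concrete, here is a minimal portable sketch of why UTF-8 bytes only survive the narrow-to-wide conversion when the converter itself assumes UTF-8. It does not call the Windows API or Boost.Filesystem; decode_utf8 and decode_single_byte are hypothetical helpers written for this illustration, and ISO-8859-1 is used only as a stand-in for "some single-byte codepage", not as the codepage Windows would actually pick.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decode well-formed UTF-8 into code points.
// Sketch only: it performs no validation of malformed input.
std::vector<std::uint32_t> decode_utf8(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i < bytes.size();) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp = 0;
        std::size_t len = 1;
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }
        // Fold in the continuation bytes, six payload bits each.
        for (std::size_t k = 1; k < len && i + k < bytes.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical stand-in for a single-byte codepage conversion:
// ISO-8859-1 maps every byte value b directly to code point U+00b.
std::vector<std::uint32_t> decode_single_byte(const std::string& bytes) {
    std::vector<std::uint32_t> out;
    for (unsigned char b : bytes) {
        out.push_back(b);
    }
    return out;
}
```

The UTF-8 encoding of "שלום" is the eight bytes D7 A9 D7 9C D7 95 D7 9D. decode_utf8 turns them into the four code points U+05E9, U+05DC, U+05D5, U+05DD, while decode_single_byte produces eight unrelated code points, which is the kind of mangled wide filename a non-UTF-8 default codecvt would hand to the operating system.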

On Thu, Mar 3, 2011 at 8:29 PM, Beman Dawes <bdawes@acm.org> wrote:
That depends. The docs recently (Feb 20, rev 69073) got updated to provide more detail. For Windows, including Cygwin and MinGW, this is part of what the docs say:
"The default imbued locale provides a codecvt facet that invokes Windows MultiByteToWideChar or WideCharToMultiByte API's with a codepage of CP_THREAD_ACP if Windows AreFileApisANSI()is true, otherwise codepage CP_OEMCP. [Rationale: this is the current behavior of C and C++ programs that perform file operations using narrow character string to identify paths. Changing this in the Filesystem library would be too surprising, particularly where user input is involved. -- end rationale]"
It should use CP_ACP, not CP_THREAD_ACP, because that's what the Windows API functions (CreateFile etc.) and the C library functions (fopen etc.) use. The C++ library functions (fstream::open etc.) do in fact use the C global locale (by way of mbstowcs, if I remember correctly).

(CP_THREAD_ACP should never be used for converting code pages - it's based on the User Locale, which is for sort orders and numeric formats.)

Yechezkel Mett

On Sun, Mar 6, 2011 at 6:32 AM, Yechezkel Mett <ymett.on.boost@gmail.com> wrote:
It should use CP_ACP not CP_THREAD_ACP, because that's what the Windows API functions (CreateFile etc) and the C library functions (fopen etc) use. The C++ library functions (fstream::open etc) do in fact use the C global locale (by way of mbstowcs if I remember correctly).
(CP_THREAD_ACP should never be used for converting code pages - it's based on the User Locale which is for sort orders and numeric formats.)
Hum... I marked your previous message http://lists.boost.org/Archives/boost/2010/11/173382.php for action, and then never did anything about it. Sorry, my mistake.

It would be very confusing and error prone if std::fstream, boost::filesystem::fstream, and boost::filesystem operational functions treat a narrow string filename differently. Since std::fstream can't be changed, that implies whatever a given standard library does should be what boost filesystem does.

That further implies that if library A does it one way, and library B does it a different way, boost filesystem should do it the way standard library version does it, even if that means a program using filesystem compiled with VC++ could behave differently than if compiled with some other compiler.

Does that make sense?

So a test case is needed that will distinguish between the C++ standard library fstream using CP_ACP, C global locale, or something totally different.

Do you already have such test code or could you put something together?

Thanks,

--Beman

On Sun, Mar 6, 2011 at 4:52 PM, Beman Dawes <bdawes@acm.org> wrote:
Hum... I marked your previous message http://lists.boost.org/Archives/boost/2010/11/173382.php for action, and then never did anything about it. Sorry, my mistake.
It would be very confusing and error prone if std::fstream, boost::filesystem::fstream, and boost::filesystem operational functions treat a narrow string filename differently. Since std::fstream can't be changed, that implies whatever a given standard library does should be what boost filesystem does.
That further implies that if library A does it one way, and library B does it a different way, boost filesystem should do it the way standard library version does it, even if that means a program using filesystem compiled with VC++ could behave differently than if compiled with some other compiler.
We've just run into this problem too. Our application is running under Chinese Windows. The C locale is set using std::setlocale(LC_CTYPE,"") to match the active code page of the operating system (Chinese), but the thread locale is adjusted to English to select the English resources, not the Chinese ones. A filename selected (e.g. from a file open dialog box) that includes Chinese characters will fail if a boost::filesystem::path object is constructed from the const char* ANSI path. If we just pipe the const char* directly into std::ifstream then it works OK.

We had to patch Boost.Filesystem locally to use CP_ACP rather than CP_THREAD_ACP. Would love to see this officially changed.

Regards,

Pete

On Fri, Mar 18, 2011 at 1:09 PM, PB <newbarker@gmail.com> wrote:
We've just run into this problem too. Our application is running under Chinese Windows. The C locale is set using std::setlocale(LC_CTYPE,"") to match the active code page of the operating system (Chinese), but the thread locale is adjusted to English to select the English resources, not the Chinese ones. A filename selected (e.g. from a file open dialog box) that includes Chinese characters will fail if a boost::filesystem::path object is constructed from the const char* ANSI path. If we just pipe the const char* directly into std::ifstream then it works OK.
We had to patch Boost.Filesystem locally to use CP_ACP rather than CP_THREAD_ACP. Would love to see this officially changed.
Be aware that std::setlocale(LC_CTYPE,"") won't do the right thing if the User Locale is not the same as the System Locale. I recommend the following instead:

    std::locale::global(std::locale(str(boost::format(".%||") % GetACP()).c_str(), LC_CTYPE));

Yechezkel Mett
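For readers puzzled by the boost::format part of that incantation: as far as I can tell it only builds a locale name of the form "." followed by the numeric code page, which is the form the Microsoft C runtime accepts for selecting a code page. A rough standard-library equivalent of just that string-building step is sketched below; codepage_locale_name is a hypothetical helper name, and on Windows the argument would be the value returned by GetACP().

```cpp
#include <string>

// Hypothetical helper: build the ".<codepage>" locale name that the
// boost::format(".%||") % GetACP() expression is assumed to produce.
// The code page is passed in so the sketch stays portable.
std::string codepage_locale_name(unsigned int codepage) {
    return "." + std::to_string(codepage);
}
```

With code page 1252 the helper yields ".1252". Passing that name on to std::locale remains specific to the Windows C runtime, so this sketch covers only the naming step and not the locale construction itself.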

On Sun, Mar 6, 2011 at 6:52 PM, Beman Dawes <bdawes@acm.org> wrote:
It would be very confusing and error prone if std::fstream, boost::filesystem::fstream, and boost::filesystem operational functions treat a narrow string filename differently. Since std::fstream can't be changed, that implies whatever a given standard library does should be what boost filesystem does.
That further implies that if library A does it one way, and library B does it a different way, boost filesystem should do it the way standard library version does it, even if that means a program using filesystem compiled with VC++ could behave differently than if compiled with some other compiler.
Does that make sense?
It does, though one could argue that Boost should always do it one way, even across multiple implementations that work differently, for portability reasons. (I consider consistency and portability reasons to use Boost over the implementation-supplied library.)

Note that consistency with the Windows API is more likely what the user would expect, unless he's been bitten by the std::fstream behaviour already.

I would recommend providing the correct incantation to set the global locale to the Windows ANSI codepage as a note in the documentation - it's not obvious:

    std::locale::global(std::locale(str(boost::format(".%||") % GetACP()).c_str(), LC_CTYPE));
So a test case is needed that will distinguish between the C++ standard library fstream using CP_ACP, C global locale, or something totally different.
Do you already have such test code or could you put something together?
I don't have test code (I traced through the library in the debugger to work out what it was doing), and for licencing reasons I don't think I could provide such code if I did create it.

I would imagine the way to test it would be to create files with known names and content in varying codepages (using CreateFileW or by supplying files with the test), and then attempt to open the files with fstream whilst varying the locale.

Yechezkel Mett
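One possible shape for the "open with fstream whilst varying the locale" half of such a test is sketched below, using only standard C and C++ library calls. ProbeResult and probe_open are hypothetical names introduced here; creating the files with known wide names (for example with CreateFileW) is deliberately left out, and the sketch assumes the C library honours the LC_CTYPE setting when it interprets the narrow filename, which is the very thing the test would be trying to find out.

```cpp
#include <clocale>
#include <fstream>
#include <string>

// Hypothetical result type for one probe.
struct ProbeResult {
    bool locale_available;  // false if setlocale rejected the locale name
    bool opened;            // true if std::ifstream opened the file
};

// Hypothetical probe: try to open narrow_path with std::ifstream while
// LC_CTYPE is temporarily set to locale_name, then restore the old setting.
ProbeResult probe_open(const std::string& narrow_path, const char* locale_name) {
    // Copy the current setting; the returned pointer may be invalidated
    // by the next setlocale call.
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    const std::string saved = current ? current : "C";

    ProbeResult result{false, false};
    if (std::setlocale(LC_CTYPE, locale_name) != nullptr) {
        result.locale_available = true;
        std::ifstream in(narrow_path.c_str(), std::ios::binary);
        result.opened = in.is_open();
    }

    std::setlocale(LC_CTYPE, saved.c_str());
    return result;
}
```

A test driver would call probe_open once per (filename bytes, locale name) pair and tabulate which combinations open successfully; comparing that table across standard libraries would show whether a given fstream follows CP_ACP, the C global locale, or something else.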
participants (5)
- Artyom
- Beman Dawes
- Edward Diener
- PB
- Yechezkel Mett