[nowide] Request for interest (nowide unicode support for windows)

Hello All,
I've recently written a small library that allows writing platform-independent, Unicode-aware applications transparently.
The problem: basic operations such as opening or deleting a file are generally taken for granted, and indeed the STL provides std::fstream and the C library provides a FILE* API such as std::fopen, std::remove and std::rename. However, this API is broken under Windows as soon as we talk about basic localization such as Unicode file names.
Unlike all other operating systems, Windows development moved to redesigning the API around wide characters instead of adopting backward-compatible UTF-8 locales in the core system.
Result: it is a total nightmare to write any kind of Unicode-aware cross-platform program.
Even using wide strings and calling _wfopen or _wremove where needed is not enough, as there is no simple replacement for std::fstream. MSVC provides a non-standard extension where std::fstream::open() accepts a wide string, but this is not available in many other compilers, including MinGW gcc, whose libstdc++ is shared across multiple platforms. Not to mention that the standard does not define std::fstream::open(wchar_t const *,...) (nor does TR1).
So... the proposed solution (in short):
    namespace boost {
    namespace nowide {
    #if !defined(BOOST_WIN32)
        using namespace std;
    #else // Windows Wide API
        std::wstring convert(std::string const &);
        std::string convert(std::wstring const &);
        FILE *fopen(char const *,char const *);
        FILE *freopen(char const *,char const *,FILE *);
        int remove(char const *);
        int rename(char const *,char const *);
        template<typename Char,typename Traits ...>
        basic_filebuf {
            ...
            open(char const *,...);
            ...
        };
        template<typename Char...> basic_istream {...};
        template<typename Char...> basic_ostream {...};
        template<typename Char...> basic_fstream {...};
    #endif
    }
    }
On non-Win32 platforms it would use the native API (and most POSIX OSes use UTF-8 natively). On Windows each of these classes and functions assumes UTF-8 strings as input and maps the underlying functions to their _w* alternatives or, in the case of basic_filebuf, implements basic_filebuf over the FILE* API and _wfopen.
This would make it much easier to write cross-platform applications using a unified, standard API instead of the non-standard wide API.
Note: the boost::nowide::convert functions would allow any library to be adapted transparently to the widely used UTF-8 API instead of the Win32 one.
I have implemented this for my own projects; I'm asking whether Boost is interested in something like this at all.
Artyom
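As a rough illustration of the layer described above, here is a minimal sketch, not the actual nowide code. The namespace `nowide_sketch` and the names `widen`, `utf8_fopen` and `utf8_remove` are hypothetical. On Windows it assumes the Win32 `MultiByteToWideChar` call and the CRT functions `_wfopen` and `_wremove`; everywhere else it simply forwards to the standard functions.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <wchar.h>
#include <windows.h>
#endif

namespace nowide_sketch {  // hypothetical namespace, for illustration only

#ifdef _WIN32
// UTF-8 -> UTF-16 via the Win32 API; returns an empty string for empty or invalid input.
inline std::wstring widen(const std::string& utf8) {
    if (utf8.empty()) return std::wstring();
    const int len = static_cast<int>(utf8.size());
    const int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                      utf8.data(), len, NULL, 0);
    if (n <= 0) return std::wstring();
    std::wstring wide(static_cast<std::size_t>(n), L'\0');
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        utf8.data(), len, &wide[0], n);
    return wide;
}
#endif

// Open a file whose name is given as UTF-8 on every platform.
inline std::FILE* utf8_fopen(const char* path, const char* mode) {
#ifdef _WIN32
    return _wfopen(widen(path).c_str(), widen(mode).c_str());
#else
    return std::fopen(path, mode);  // POSIX passes the bytes through unchanged
#endif
}

// Delete a file whose name is given as UTF-8 on every platform.
inline int utf8_remove(const char* path) {
#ifdef _WIN32
    return _wremove(widen(path).c_str());
#else
    return std::remove(path);
#endif
}

}  // namespace nowide_sketch
```

Calling code stays identical on all platforms: `nowide_sketch::utf8_fopen(name, "w")` with a UTF-8 `name`, and no `wchar_t` appears outside the `#ifdef`.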

AMDG Artyom wrote:
I've recently written a small library that allows writing platform independent Unicode aware applications transparently.
Problem, basic stuff like opening a file, deleting it, is generally accepted for granted and indeed, STL provides std::fstream and C library provides FILE* API like std::fopen, std::remove, std::rename.
However, this API is broken under Windows any time we talk about some basic localization like using Unicode file names.
Unlike all other Windows development moved to redesigning API to wide characters instead of adapting backward compatible UTF-8 locales into the core system.
Result: it is total nightmare to write any kind of Unicode aware cross platform programs.
Even when trying to use wide strings and calling _wfopen or _wremove where needed is not enough, as there is no simple replacement for std::fstream. MSVC provided non-standard extension where std::fstream::open() receives wide string, but this is not accepted by many other compilers including MinGW gcc that has shared libstdc++ over multiple platforms. Not talking about that standard does not define std::fstream::open(wchar_t const *,...) (and not in Tr1 as well).
What about boost::filesystem::path? In Christ, Steven Watanabe

What about boost::filesystem::path?
In Christ, Steven Watanabe
This has nothing to do with filesystem::path; it is about fixing the standard library under Windows, where std::fopen or std::fstream is not capable of opening ordinary files with Unicode names. And BTW, a plain boost::filesystem::path has exactly the same issue under Microsoft Windows when it is not a "wide path".
Artyom

On Sun, Jun 13, 2010 at 12:48 PM, Artyom <artyomtnk@yahoo.com> wrote:
What about boost::filesystem::path?
In Christ, Steven Watanabe
This has nothing to do with filesystem::path, it is about fixing issues of standard library under Windows where fopen of std::fstream is not capable of opening ordinary files.
And BTW simple boost::filesystem::path has exactly the same issue when it is not "wide path" under Microsoft Windows.
Version 3, now in trunk, is totally "wide path" under Windows, at least with the Microsoft supplied standard library. And even with Cygwin, everything is totally wide path except that wide paths in file opens are converted to narrow paths for the actual i/o stream call. --Beman

Version 3, now in trunk, is totally "wide path" under Windows, at least with the Microsoft supplied standard library. And even with Cygwin, everything is totally wide path except that wide paths in file opens are converted to narrow paths for the actual i/o stream call.
Question: can I write:
    boost::filesystem::fstream f("שלום.txt",std::ios_base::out);
where "שלום.txt" is a UTF-8 string, and a Unicode file name will be created? If so, way to go. If you are suggesting:
    boost::filesystem::fstream f(L"שלום.txt",std::ios_base::out);
then this is not what I'm talking about. I'm not talking about "wide" paths. This is exactly what I meant by "nowide": make a library compatible with C/C++ **standard** functions like std::fstream::open(char const *,...) or std::fopen(char const *,...) but fully Unicode enabled (UTF-8), as they are on all other operating systems, without all the "wide" API.
Is somebody interested?
Artyom

On Sun, 13 Jun 2010 10:24:15 -0700 (PDT), Artyom wrote:
I'm not talking about "Wide" path -- this is exectly what I was writing "nowide" make a library compatible with C/C++ **standard** functions like std::fstream::open(char const *,...) or std::fopen(char const *,...) but be fully Unicode enabled (utf-8) as they are on all-other operating systems without all "wide" api.
Is somebody interested?
I am. I don't know if this is the right solution but it's definitely worth some thought. ATM I'm writing my library interfaces to take basic_path<T> parameters so that Windows developers can pass a fs::wpath and others can pass fs::path. It would be nice if everyone could pass the same thing. Alex -- Easy SFTP for Windows Explorer (http://www.swish-sftp.org)

AMDG Artyom wrote:
Question:
Can I write:
boost::filesystem::fstream f("שלום.txt",std::ios_base::out);
When "שלום.txt" is UTF-8 string and Unicode file name will be created? If so, way to go.
In v3, yes. In Christ, Steven Watanabe

On Tue, Jun 15, 2010 at 10:08 AM, Steven Watanabe <watanabesj@gmail.com> wrote:
AMDG
Artyom wrote:
Question:
Can I write:
boost::filesystem::fstream f("שלום.txt",std::ios_base::out);
When "שלום.txt" is UTF-8 string and Unicode file name will be created? If so, way to go.
In v3, yes.
There are some caveats, but it should work, and there are some fairly similar test cases passing on all compilers. To actually write the string literal like that, the compiler must accept UTF-8 in string literals, for example. On Windows, the codepage has to be set to UTF-8. Those are issues that affect any solution, not just filesystem v3. --Beman

On Tue, 15 Jun 2010 11:59:21 -0400, Beman Dawes wrote:
Can I write:
boost::filesystem::fstream f("שלום.txt",std::ios_base::out);
When "שלום.txt" is UTF-8 string and Unicode file name will be created? If so, way to go.
In v3, yes.
There are some caveats, but it should work, and there are some fairly similar test cases passing all compilers.
This is not what I'd understood from our previous discussion. I was under the impression filesystem v3 running on Windows would take this narrow path string and convert it to UTF-16 using the *local code page*. This means the example above would only work if the computer in question were set to Hebrew. Even that might not work - I'm not sure if the Hebrew code page contains the necessary characters to represent 'txt'. Did I misunderstand? Alex -- Easy SFTP for Windows Explorer (http://www.swish-sftp.org)

Can I write:
boost::filesystem::fstream
f("שלום.txt",std::ios_base::out);
When "שלום.txt" is UTF-8 string and
Unicode file name will be created?
If so, way to go.
In v3, yes.
There are some caveats, but it should work, and there are some fairly similar test cases passing all compilers.
This is not what I'd understood from our previous discussion. I was under the impression filesystem v3 running on Windows would take this narrow path string and convert it to UTF-16 using the *local code page*. This means the example above would only work if the computer in question were set to Hebrew.
[snip]
Did I misunderstand?
Yes, you did. When I talk about UTF-8 I mean all of Unicode and not a subset. For example, in my case I want to open a file:
    std::ofstream f("سلام-שלום-Peace-Мир.txt")
I can't do this on Windows (and only on Windows). So I open it with
    nowide::ofstream f("سلام-שלום-Peace-Мир.txt")
and it works on Windows as well. The only operating system that does not allow **any** file to be opened with std::fstream::open or std::fopen is Windows, and this is what the library wants to fix.
You can download my code here: http://art-blog.no-ip.info/files/nowide.zip
It gives you:
- STL's: nowide::ifstream, nowide::ofstream, nowide::fstream, nowide::filebuf
- stdlib's: nowide::fopen, nowide::freopen, nowide::remove, nowide::rename
All using UTF-8 strings (as is usual on all modern operating systems).
Artyom

On Tue, 15 Jun 2010 12:38:06 -0700 (PDT), Artyom wrote:
Can I write:
boost::filesystem::fstream f("שלום.txt",std::ios_base::out);
When "שלום.txt" is UTF-8 string and Unicode file name will be created? If so, way to go.
In v3, yes.
There are some caveats, but it should work, and there are some fairly similar test cases passing all compilers.
This is not what I'd understood from our previous discussion. I was under the impression filesystem v3 running on Windows would take this narrow path string and convert it to UTF-16 using the *local code page*. This means the example above would only work if the computer in question were set to Hebrew.
[snip]
Did I misunderstand?
Yes, you did.
When I was talking about UTF-8 I mean Unicode and not subset.
Me too. I'm saying that Filesystem v3 on Windows doesn't interpret narrow strings as UTF-8 by default. Beman said that it did but I beg to differ. Here's what the comments say:
    // For Windows, wchar_t strings do not undergo conversion. char strings
    // are converted using the "ANSI" or "OEM" code pages, as determined by
    // the AreFileApisANSI() function, or, if a conversion argument is given,
    // using a conversion object modeled on std::wstring_convert.
In other words "שלום.txt" would be interpreted as being in whatever encoding the local code page is set to and would, therefore, produce a path containing gibberish for most people. This is standard Windows behaviour :P
Your problem is yet another step further than this. Assuming fs3 correctly converted "שלום.txt" to the UTF-16 equivalent, how do you then open a file using this wide-char name? Well, MSVC has wchar_t overloads so this works fine. You're right about glibc++/MinGW though. fs::fstream will fail there.
Rather than introducing a nowide library, why don't we just try to fix this in Boost.Filesystem?
Alex
-- Easy SFTP for Windows Explorer (http://www.swish-sftp.org)
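To make the "gibberish" concrete, here is a small sketch of what happens when UTF-8 bytes are read as if they belonged to a single-byte code page. It assumes ISO-8859-1 purely for simplicity (real Windows code pages such as 1252 differ in the 0x80-0x9F range), and the function name `bytes_as_latin1` is made up for this example. Each input byte becomes one character, and the result is re-encoded as UTF-8 so it can be printed.

```cpp
#include <string>

// Treat every byte as one ISO-8859-1 character and re-encode the result as UTF-8.
// This mimics (approximately) a narrow string being decoded with the wrong code page.
inline std::string bytes_as_latin1(const std::string& bytes) {
    std::string out;
    for (char ch : bytes) {
        const unsigned char b = static_cast<unsigned char>(ch);
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));                  // ASCII is unchanged
        } else {
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));    // U+0080..U+00FF lead byte
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));  // continuation byte
        }
    }
    return out;
}
```

Under this assumption the two UTF-8 bytes of "ש" (0xD7 0xA9) come out as the two characters "×©", so a four-letter Hebrew name turns into eight unrelated characters while the ".txt" suffix survives.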

Me too.
I'm saying that Filesystem v3 on Windows doesn't interpret narrow strings as UTF-8 by default. Berman said that it did but I beg to differ. Here's what the comments say:
// For Windows, wchar_t strings do not undergo conversion. char strings // are converted using the "ANSI" or "OEM" code pages, as determined by // the AreFileApisANSI() function, or, if a conversion argument is given, // using a conversion object modeled on std::wstring_convert.
In other words "שלום.txt" would be interpreted as being in whatever encoding the local code page is set to and would, therefore, produce a path containing gibberish for most people. This is standard Windows behaviour :P
This standard Windows behavior is exactly **the** problem. To be honest, have you seen anybody using a "wide-path" outside of the Windows world? Do you actually need such a "wide-path" on POSIX platforms? The answer is no. Actually, a POSIX OS does not care about the filename charset, as I can create a file
    std::ofstream f("\xf9\xec\xe5\xed.txt");
whose name is שלום in ISO-8859-8 but invalid UTF-8. It is still a valid file name (even though the locale is a UTF-8 locale).
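The claim that those four bytes are not UTF-8 can be checked with a small validator. This is an illustrative sketch with a hypothetical function name, `is_valid_utf8`. It rejects stray continuation bytes, overlong forms, surrogates and values above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Returns true if s is well-formed UTF-8.
inline bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (c < 0x80)                    { len = 1; cp = c; }
        else if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
        else return false;               // stray continuation or invalid lead byte
        if (i + len > n) return false;   // sequence is truncated
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (len == 3 && cp < 0x800) return false;                      // overlong
        if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return false; // overlong or too large
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;                // surrogate
        i += len;
    }
    return true;
}
```

The ISO-8859-8 name fails at its very first byte, since 0xF9 can never start a UTF-8 sequence, while the UTF-8 spelling of the same word (D7 A9 D7 9C D7 95 D7 9D) passes.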
Your problem is yet another step further than this. Assuming fs3 correctly converted "שלום.txt" to the UTF-16 equivalent, how do you then open a file using this wide-char name? Well, MSVC has wchar_t overloads so this works fine. You're right about glibc++/MinGW though. fs::fstream will fail there. Rather than introducing a nowide library, why don't we just try to fix this in Boost.Filesystem?
I think that this can be fixed (the way I fixed it in nowide, implementing a filebuf over stdio + _wfopen):
http://art-blog.no-ip.info/files/nowide.zip
But this is one particular problem. There are more. What about filesystem::remove and others? From what I see in the code, it supports only path and not wpath.
---------------------
But this is part of one bigger problem. When I develop cross-platform applications that need to remove, rename or create a file, I can write against standard, platform-independent C++, against POSIX operating systems, or against MS Windows. These are my options:
    OS \ Str | std::string | std::wstring
    ---------+-------------+--------------
    Std C++  | Ok          | Not Defined!
    POSIX    | Ok          | Not Defined!
    WinAPI   | Not UTF-8   | Ok
What I can see: either I use wide strings, which work only on Windows but require me to convert to another encoding for file operations; or I use normal strings, as the standard requires, and have problems with Windows because it is not fully supported; or I write two kinds of code:
- one for Windows using "wide" strings,
- one for everything else using normal strings,
because Windows does not support a UTF-8 code page.
So far? Why? Why do you need all this if you can just create a tiny layer that makes Windows support a UTF-8 code page by converting std::string to std::wstring and calling the appropriate API?
My Opinion:
-----------
- There is neither use nor need for "wide" strings in file system operations on any platform but Windows.
- Introducing boost::filesystem::wpath does not help, as it is meaningless on other OSes.
- Using wide strings is extremely error prone in cross-platform applications, as on Windows they are UTF-16 and on POSIX they are UTF-32.
Wide path support just makes our applications more complicated and error prone. So... just create an API that is friendly to UTF-8 strings and forget about this hell.
-------------
But from what I see this will never happen in Boost, as it is too Windows-centric, and Windows is too ignorant of the basic needs of programmers who want to write portable programs.
Regards,
Artyom
P.S.: The title of this mail is "request for interest". It is OK not to have any.

On Wed, 16 Jun 2010 12:50:19 -0700 (PDT), Artyom wrote:
I think that this can be fixed (the way I fixed it in nowide implementing fstreambuf over stdio+_wfopen)
http://art-blog.no-ip.info/files/nowide.zip
But this is one particular problem.
There are more. What about filesystem::remove and others? From what I see in the code, it supports only path and not wpath
Really? I doubt that. In FSv2 it takes a template path:
    template <class Path> bool remove(const Path& p, system::error_code & ec = singular );
This delegates to DeleteFileA if passed a path and DeleteFileW if passed a wpath. glibc++/MinGW presumably uses the posix_remove API so this does, again, suffer from the problem. We could work around it in Boost, though I can't help but feel this is a MinGW problem: if it wants to work the Windows way it should provide wide APIs as well; if it wants to pretend it's POSIX it should interpret narrow strings as UTF-8.
When I develop cross platform applications I have following options for operating of files.
For example when I want to remove, rename, create a file in a program writing cross platform applications, writing using standard platform independent C++, Writing for POSIX operating systems and for MS Windows.
OS \ Str | std::string | std::wstring | ----------------------------------------------- Std C++ | Ok | Not Defined! POSIX | Ok | Not Defined! WinAPI | Not UTF-8 | Ok
What I can see. I need either use wide strings that works only on Windows but require me to convert to other encoding for operations on files.
Or I may use normal strings as standard requires and have problems with Windows as it is not fully supported.
We could potentially fix this in Filesystem v3 if it interpreted incoming narrow strings as UTF-8. Then you could create a 'path' using whichever type of string you like and the boost::filesystem functions would 'just work' (ok, issues with MinGW but nothing we can't work around by incorporating your code).
So far? Why? Why do you need all this if you can just create a tiny layer that makes Window support UTF-8 code page by converting std::string to std::wstring and calling appropriate API?
Yep, that's pretty much what I'm saying.
- Introducing boost::filesystem::wpath does not help as it meaningless on other OSes.
It's gone in v3.
So... Just create an API that is friendly to UTF-8 strings and forget about this hell.
+1 from me with one modification: don't prevent using wide path on Windows. Often you will need to pass a wide path that you get from somewhere else and it would be a pain if we had to convert these to UTF-8 manually.
But from what I see this will never happen in Boost as it is too Windows centric, and Windows is too ignorant to basic programmers needs who want to write a portable programs.
Why? Boost.Filesystem v3 almost does all of this already. It would need two changes to make it work exactly as you want:
- Interpret narrow strings as UTF-8 by default on Windows (the user could always imbue it with the local code page facet if they really wanted to interact with the 'A' versions of Windows APIs).
- Work around the MinGW 'bug' by incorporating some of your code.
P.S.: The title of this mail is request for interest. It is ok not to have one.
I'm very much interested. Alex -- Easy SFTP for Windows Explorer (http://www.swish-sftp.org)

There are more. What about filesystem::remove and others? From what I see in the code, it supports only path and not wpath
Really? I doubt that. In FSv2 it takes a template path:
I was talking about v3
This delegates to DeleteFileA if passed a path and DeleteFileW if passed a wpath. glibc++/MinGW presumably uses the posix_remove API so this does, again, suffer from the problem. We could work around it in Boost, though I can't help but feel this is a MinGW problem: if it wants to work the Windows way it should provide wide APIs as well; if it wants to pretend it's POSIX it should interpret narrow strings as UTF-8.
It is not about "pretending to work on POSIX". GCC's libstdc++ uses the CRTL: if you call the stdlib remove it uses DeleteFileA, and if you call _wremove it uses DeleteFileW. And you can use _wremove in MinGW, as it is a CRTL function. This has absolutely nothing to do with POSIX.
We could potentially fix this in Filesystem v3 if it interpreted incoming narrow strings as UTF-8. Then you could create a 'path' using whichever type of string you like and the boost::filesystem functions would 'just work' (ok, issues with MinGW but nothing we can't work around by incorporating your code).
This would be a very good solution.
It's gone in v3.
Very good.
So... Just create an API that is friendly to UTF-8 strings and forget about this hell.
+1 from me with one modification: don't prevent using wide path on Windows. Often you will need to pass a wide path that you get from somewhere else and it would be a pain if we had to convert these to UTF-8 manually.
Agreed. If Windows users want to use wide paths, let them, but that code would be Windows-only.
Why? Boost.Filesystem v3 almost does all of this already. It would need two changes to make it work exactly as you want:
- Interpret narrow strings as UTF-8 by default on Windows (the user could always imbue it with the local code page facet if the really wanted to interact with the 'A' versions of Windows APIs).
This is not a solution: Windows has never supported a UTF-8 code page for its narrow APIs, does not support one now, and according to the links Lars Viklund posted it seems it never will. See this quote:
Judging by assorted postings by Michael Kaplan (Unicode Grandmaster at Microsoft), there seems to be much fun to be derived from trying to use the UTF-8 codepage with narrow APIs.
[1] http://blogs.msdn.com/b/michkap/archive/2006/07/14/665714.aspx [2] http://blogs.msdn.com/b/michkap/archive/2006/10/11/816996.aspx [3] http://blogs.msdn.com/b/michkap/archive/2006/03/13/550191.aspx [4] http://blogs.msdn.com/b/michkap/archive/2007/05/11/2547703.aspx
Lars Viklund
So the only way to do this right is to **always** use the wide API on Windows and convert normal strings to wide ones just before calling the appropriate API functions.
- Work around the MinGW 'bug' by incorporating some of your code.
I just want to be clear... This is not a bug (I know you put it in quotes). This is what the C++ standard says: std::basic_filebuf **does not** have an open() member function that takes wide strings. Artyom

On Wed, 16 Jun 2010 23:50:39 -0700 (PDT), Artyom wrote:
There are more. What about filesystem::remove and others? From what I see in the code, it supports only path and not wpath
Really? I doubt that. In FSv2 it takes a template path:
I was talking about v3
v3 makes it even easier. Internally it always calls DeleteFileW on Windows regardless of whether the path was created from a narrow or wide string.
Why? Boost.Filesystem v3 almost does all of this already. It would need two changes to make it work exactly as you want:
- Interpret narrow strings as UTF-8 by default on Windows (the user could always imbue it with the local code page facet if the really wanted to interact with the 'A' versions of Windows APIs).
This is not solution:
Windows had never supported, does not support according to Lars Viklund links it seems like it will never be supported.
I think you misunderstood me. I meant Boost.Filesystem could interpret narrow strings as UTF-8 by default on Windows. Then it can convert the string and store it internally as a wide string (it already does this); then its operations (remove, fstream, etc.) can call the W versions of the Windows API functions (which, again, it already does).
So the only way to do the thing right is **always** use Wide API on windows and convert normal strings to wide one just before calling apropriate API functions.
Yes. Except that Filesystem handles this by converting immediately. It makes little difference. Alex -- Easy SFTP for Windows Explorer (http://www.swish-sftp.org)

I was talking about v3
v3 makes it even easier. Internally it always calls DeleteFileW on Windows regardless of whether the path was created from a narrow or wide string.
Excellent
Windows had never supported, does not support according to Lars Viklund links it seems like it will never be supported.
I think you misunderstood me. I meant Boost.Filesystem could interpret narrow strings as UTF-8 by default on Windows.
This would be really great.
So the only way to do the thing right is **always** use Wide API on windows and convert normal strings to wide one just before calling appropriate API functions.
Yes. Except that Filesystem handles this by converting immediately. It makes little difference.
If so, this is the way to go. Having boost::filesystem v3 treat paths as UTF-8 on Windows would solve lots of issues. Artyom

I thought some actual code might be of interest. Attached is a screenshot showing that the code worked as expected with VC++ 9.0.
--Beman
    // Select v3; defined before any Boost.Filesystem header is included
    #define BOOST_FILESYSTEM_VERSION 3
    #include <boost/filesystem/fstream.hpp>
    #include <boost/filesystem/detail/utf8_codecvt_facet.hpp>
    #include <iostream>
    int main()
    {
      // "שלום" is "hello" in Hebrew; thanks to Artyom for the example
      std::locale global_loc = std::locale();
      std::locale loc(global_loc, new boost::filesystem::detail::utf8_codecvt_facet);
      // From here on, narrow path strings are converted as UTF-8
      std::locale old_loc = boost::filesystem::path::imbue(loc);
      boost::filesystem::ofstream f("שלום.narrow");
      return 0;
    }

That's what Alexander Lamaison was talking about. And I like it. I would even prefer that UTF-8 be the default, but I don't mind having to imbue the locale with a UTF-8 facet.
Small points:
- This would not work on MinGW and many other compilers/standard libraries, as they do not have a wide overload of filebuf::open.
- boost::details::utf8_codecvt... does not support UTF-16, only UCS-2, so it would not work correctly with characters outside of the BMP. But this is rather a limitation of the standard and a problem in Boost's implementation of this facet. I'm still not sure whether it is possible to implement this facet correctly in terms of the standard's definition.
But my code in nowide implements a filebuf that uses "FILE *", so it can be opened with "_wfopen". I think it can be adopted into Boost.Filesystem.
Artyom
--- On Thu, 6/17/10, Beman Dawes <bdawes@acm.org> wrote:
From: Beman Dawes <bdawes@acm.org> Subject: Re: [boost] [nowide] Request for interest (nowide unicode support for windows) To: "boost" <boost@lists.boost.org> Date: Thursday, June 17, 2010, 9:52 PM I thought some actual code might be of interest. Attached is a screenshot showing that the code worked as expected with VC++ 9.0.
--Beman
#include <boost/filesystem/fstream.hpp> #include <boost/filesystem/detail/utf8_codecvt_facet.hpp> #include <iostream>
#define BOOST_FILESYSTEM_VERSION 3
int main() { // "שלום" is "hello" in Hebrew; thanks to Artyom for the example
std::locale global_loc = std::locale(); std::locale loc(global_loc, new boost::filesystem::detail::utf8_codecvt_facet); std::locale old_loc = boost::filesystem::path::imbue(loc);
boost::filesystem::ofstream f("שלום.narrow");
return 0; }
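The UCS-2 versus UTF-16 point raised above comes down to surrogate pairs. The sketch below is illustrative only: the function name `utf8_to_utf16` is made up, and it assumes well-formed UTF-8 input. It decodes each code point and emits one UTF-16 unit for the BMP and two for everything above U+FFFF. A converter limited to UCS-2 has no way to represent the second case.

```cpp
#include <cstddef>
#include <string>

// Convert well-formed UTF-8 to UTF-16 code units (no validation is performed).
inline std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (c < 0x80)                { cp = c;        len = 1; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; len = 2; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; len = 3; }
        else                         { cp = c & 0x07; len = 4; }
        // Fold in the 6 payload bits of each continuation byte.
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));  // BMP: a single unit, same as UCS-2
        } else {
            cp -= 0x10000;                             // 20 bits remain
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // low surrogate
        }
    }
    return out;
}
```

Hebrew letters such as "ש" (U+05E9) sit in the BMP and need one unit, which is why the example in this thread works with a UCS-2 facet. A character such as U+1F600 needs the pair D83D DE00.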

On Thu, 17 Jun 2010 12:22:57 -0700 (PDT), Artyom wrote:
Thats what Alexander Lamaison was talking about. And I like it.
Even I would prefer that UTF-8 would be default. But does not mind require imbuing UTF-8 facet to locale.
Nowadays, isn't it more likely that a narrow string is UTF-8 encoded than in a particular code page? Alex -- Easy SFTP for Windows Explorer (http://www.swish-sftp.org)

On 06/16/2010 11:50 PM, Artyom wrote:
To be honest, have you seen anybody using "wide-path" outside of Windows scope? Do you actually need such "wide-path" for POSIX platforms?
Well, we actually use wide paths all around in our code, and Boost.Filesystem does a great job at providing a portable API for all platforms, including POSIX. Personally, I think that wide paths are more convenient than UTF-8 since it is easier to apply string processing algorithms on them.

2010/6/16 Alexander Lamaison <awl03@doc.ic.ac.uk>:
I'm saying that Filesystem v3 on Windows doesn't interpret narrow strings as UTF-8 by default. Beman said that it did...
There is a misunderstanding here. V3, like any Windows program, by default interprets narrow strings according to the File code page. You have to configure that yourself if you want it to be UTF-8. Since that is a pain, and you are using Microsoft or one of the other compilers that support wide opens, it seems easier just to convert from the narrow string to the wide string yourself. But if you want to fool around getting the codepage support in place, V3 should handle it AFAIK.
but I beg to differ. Here's what the comments say:
// For Windows, wchar_t strings do not undergo conversion. char strings // are converted using the "ANSI" or "OEM" code pages, as determined by // the AreFileApisANSI() function, or, if a conversion argument is given, // using a conversion object modeled on std::wstring_convert.
In other words "שלום.txt" would be interpreted as being in whatever encoding the local code page is set to and would, therefore, produce a path containing gibberish for most people. This is standard Windows behaviour :P
Your problem is yet another step further than this. Assuming fs3 correctly converted "שלום.txt" to the UTF-16 equivalent, how do you then open a file using this wide-char name? Well, MSVC has wchar_t overloads so this works fine. You're right about glibc++/MinGW though. fs::fstream will fail there. Rather than introducing a nowide library, why don't we just try to fix this in Boost.Filesystem?
Agreed. If anyone wants to submit a patch for glibc++/MinGW that uses the wide Windows API, that would be a better solution. --Beman

I'm saying that Filesystem v3 on Windows doesn't interpret narrow strings as UTF-8 by default. Beman said that it did...
There is a misunderstanding here. V3, like any Windows program, by default interprets narrow strings according to the File code page. You have to configure that yourself if you want it to be UTF-8.
Windows does not support a UTF-8 code page at all (actually it is only supported as a parameter of WideCharToMultiByte/MultiByteToWideChar).
Agreed. If anyone wants to submit a patch for glibc++/MinGW that uses the wide Windows API, that would be a better solution.
This is not a bug, and there is nothing to fix. std::filebuf::open(wchar_t const *,..) is not standard but a Microsoft-specific extension. Artyom

Can I write:
boost::filesystem::fstream
f("שלום.txt",std::ios_base::out);
When "שלום.txt" is UTF-8 string and Unicode file
name will be created?
If so, way to go.
In v3, yes.
Are you sure about this? How will the file be opened? Can you explain the path the UTF-8 string takes until it reaches the Win32 API system call or the standard library call?
The C++ standard defines open only with "char const *". Quoting the latest C++ standard draft (section 27.9.1, std::basic_filebuf):
    // 27.9.1.4 Members:
    bool is_open() const;
    basic_filebuf<charT,traits>* open(const char* s, ios_base::openmode mode);
    basic_filebuf<charT,traits>* open(const string& s, ios_base::openmode mode);
    basic_filebuf<charT,traits>* close();
And this is how it is defined in GCC's libstdc++. As far as I can see, you use std::basic_filebuf to implement boost::filesystem::basic_fstream. So how do you open a "wide" path or a "UTF-8" path using these functions?
- The standard library does not accept "wchar_t const *" as a parameter to open (with the exception of the MSVC-specific extension).
- The Windows API does not support a UTF-8 codepage.
So how do you deal with it?
---------------
In the small library I have written, I actually implement basic_filebuf over stdio and use the CRTL function _wfopen to open files with Unicode file names. I haven't seen anything like that in boost::filesystem::v3. So am I missing something?
Artyom
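The "filebuf over stdio" idea can be sketched as a stream buffer that forwards to a `FILE*`. This is a deliberately minimal, output-only illustration and not the nowide implementation. The class name `stdio_outbuf` is hypothetical, and it relies on stdio's own buffering instead of managing a put area.

```cpp
#include <cstddef>
#include <cstdio>
#include <ostream>
#include <streambuf>

// Output-only stream buffer that writes through to a C FILE*.
// The FILE* is not owned: the caller opens and closes it.
class stdio_outbuf : public std::streambuf {
public:
    explicit stdio_outbuf(std::FILE* file) : file_(file) {}

protected:
    // Called for single characters, since no put area is set up.
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof())) {
            return traits_type::not_eof(ch);  // nothing to write
        }
        return std::fputc(traits_type::to_char_type(ch), file_) == EOF
                   ? traits_type::eof()
                   : ch;
    }

    // Bulk writes go straight to fwrite.
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        return static_cast<std::streamsize>(
            std::fwrite(s, 1, static_cast<std::size_t>(n), file_));
    }

    // std::flush and std::endl end up here.
    int sync() override { return std::fflush(file_) == 0 ? 0 : -1; }

private:
    std::FILE* file_;
};
```

Because the buffer only needs a `FILE*`, the handle can come from `std::fopen` on POSIX or from `_wfopen` on Windows, and a `std::ostream os(&buf);` built on top behaves like any other stream. A full replacement would also need input, seeking and `codecvt` handling.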

On Tue, Jun 15, 2010 at 2:36 PM, Artyom <artyomtnk@yahoo.com> wrote:
Can I write:
boost::filesystem::fstream
f("שלום.txt",std::ios_base::out);
When "שלום.txt" is UTF-8 string and Unicode file
name will be created?
If so, way to go.
In v3, yes.
Are you sure about this? How the file will be open?
Depends on the standard library implementation. The Dinkumware library, used by Microsoft and some others, has an additional constructor/open that takes a wide character string. The fallback is to use the standard narrow character constructor/open.
Can you explain what is the path the UTF-8 string passes till the Win32API system call or standard library call?
C++ standard defines open only with "char const *"
The wide character overloads are Dinkumware / Microsoft extensions.
Quoting latest C++ standard draft (section 27.7 std::basic_streambuf)
// 27.9.1.4 Members: bool is_open() const; basic_filebuf<charT,traits>* open(const char* s, ios_base::openmode mode); basic_filebuf<charT,traits>* open(const string& s, ios_base::openmode mode); basic_filebuf<charT,traits>* close();
And this it is defined on GCC's libstdc++.
Yep, so if that library is in use, the fallback is to use the narrow character open. And of course that is also what is used on POSIX-like systems.
As I can see you use std::basic_filebuf for implementing boost::filesystem::basic_fstream.
So how do you open "Wide" path or "UTF-8" path using these functions?
- Standard library does not accept "wchar_t const *" as parameter to open (with exception of MSVC specific extension) - Windows API does not support UTF-8 codepage.
So how do you deal with it?
Use the Microsoft UTF-8 codepage, 65001. HTH, --Beman

On Wed, Jun 16, 2010 at 05:23:19PM -0400, Beman Dawes wrote:
On Tue, Jun 15, 2010 at 2:36 PM, Artyom <artyomtnk@yahoo.com> wrote:
- Standard library does not accept "wchar_t const *" as parameter to open (with exception of MSVC specific extension) - Windows API does not support UTF-8 codepage.
So how do you deal with it?
Use the Microsoft UTF-codepage, 65001
Judging by assorted postings by Michael Kaplan (Unicode Grandmaster at Microsoft), there seems to be much fun to be derived from trying to use the UTF-8 codepage with narrow APIs. [1] http://blogs.msdn.com/b/michkap/archive/2006/07/14/665714.aspx [2] http://blogs.msdn.com/b/michkap/archive/2006/10/11/816996.aspx [3] http://blogs.msdn.com/b/michkap/archive/2006/03/13/550191.aspx [4] http://blogs.msdn.com/b/michkap/archive/2007/05/11/2547703.aspx -- Lars Viklund | zao@acc.umu.se

In v3, yes.
Are you sure about this? How the file will be open?
Depends on the standard library implementation. The Dinkumware library, used by Microsoft and some others, has an additional constructor/open that takes a wide character string. The fallback is to use the standard narrow character constructor/open.
The standard library, according to the standard, does not accept wide strings. So MSVC/Dinkumware implements this... So what? Does STLport implement this? Does GNU libstdc++ implement this? Does the Intel compiler's standard library implement this? Does the Comeau C/C++ standard library implement this? Does the C++0x standard define this? This "wide string" file access is an exceptional feature rather than a common one. Artyom

Version 3, now in trunk, is totally "wide path" under Windows, at least with the Microsoft supplied standard library. And even with Cygwin, everything is totally wide path except that wide paths in file opens are converted to narrow paths for the actual i/o stream call.
--Beman
A quick glance at boost::filesystem::v3::basic_filebuf: there is a problem with the MinGW implementation. libstdc++ (in accordance with the standard) does not support opening files with wide-character names, so as far as I can see you will not be able to open even a wide path with boost::filesystem::ifstream on the MinGW platform. Correct me if I'm wrong or missed something. (Not talking about Cygwin, which has native UTF-8 support.)
participants (6): Alexander Lamaison, Andrey Semashev, Artyom, Beman Dawes, Lars Viklund, Steven Watanabe