[filesystem] Problems with wpath on Linux

Hi,

I'm trying to use fs::wpath on Linux and I'm running into problems with the internal conversion of wide strings to the external representation. The problem can be demonstrated with the following code:

    #include <iostream>
    #include <locale>
    #include <boost/filesystem/path.hpp>
    #include <boost/filesystem/convenience.hpp>

    namespace fs = boost::filesystem;

    int main()
    {
        // Set the global locale to the environment locale
        std::locale::global(std::locale(""));
        // Set the wpath locale to the global locale
        fs::wpath_traits::imbue(std::locale());
        fs::wpath mypath(L"/tmp/some/directory");
        fs::create_directories(mypath);
    }

To me, this code looks correct and should work. But it terminates with the following message:

    terminate called after throwing an instance of 'boost::filesystem::basic_filesystem_error< boost::filesystem::basic_path<std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >, boost::filesystem::wpath_traits> >'
      what(): boost::filesystem::wpath::to_external conversion error
    Aborted

At the same time, the following code works:

    #include <iostream>
    #include <boost/filesystem/path.hpp>
    #include <boost/filesystem/convenience.hpp>
    #include <libs/filesystem/src/utf8_codecvt_facet.hpp>

    namespace fs = boost::filesystem;

    int main()
    {
        fs::detail::utf8_codecvt_facet utf8_facet;
        std::locale loc( std::locale(), &utf8_facet );
        fs::wpath_traits::imbue( loc );
        fs::wpath mypath(L"/tmp/some/directory");
        fs::create_directories(mypath);
    }

which uses a UTF-8 facet from Boost itself. My understanding is that the first example should work too. Who is wrong here - me or Boost?

Alexei Alexandrov, on 25 January 2008 07:11:
    namespace fs = boost::filesystem;

    int main()
    {
        // Set the global locale to the environment locale
        std::locale::global(std::locale(""));
        // Set the wpath locale to the global locale
        fs::wpath_traits::imbue(std::locale());
        fs::wpath mypath(L"/tmp/some/directory");
        fs::create_directories(mypath);
    }
To me, this code looks correct and should work. But it terminates with the following message:
terminate called after throwing an instance of 'boost::filesystem::basic_filesystem_error< boost::filesystem::basic_path<std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >, boost::filesystem::wpath_traits> >' what(): boost::filesystem::wpath::to_external conversion error Aborted
At the same time, the following code works:
[snip: Same program as above, using fs::detail::utf8_codecvt_facet instead of the system's locale]
which uses a UTF-8 facet from boost itself.
The first example should work too - this is my understanding. Who is wrong here - me or boost?
... or your platform's implementation of codecvt? This is a wild guess, as I don't know anything about your platform (in particular: which implementation of the standard library, and the locale environment variables used when running your program).

Éric Malenfant
---------------------------------------------
Quidquid latine dictum sit, altum viditur.

Eric MALENFANT <Eric.Malenfant <at> sagem-interstar.com> writes:
The first example should work too - this is my understanding. Who is wrong here - me or boost?
... or your platform's implementation of codecvt?
This is a wild guess as I don't know anything about your platform (in particular: which implementation of the standard library, and the locale environment variables used when running your program)
Yes, I should have provided this information in the first place. It's Linux, x86, gcc 3.4.6. The locale is en_US.UTF-8.

Alexei Alexandrov wrote:
Eric MALENFANT <Eric.Malenfant <at> sagem-interstar.com> writes:
The first example should work too - this is my understanding. Who is wrong here - me or boost? ... or your platform's implementation of codecvt?
This is a wild guess as I don't know anything about your platform (in particular: which implementation of the standard library, and the locale environment variables used when running your program)
Yes, I should have provided this information in the first place. It's Linux, x86, gcc 3.4.6. Locale is en_US.UTF-8
So is this information enough? This issue is a real showstopper for me since it doesn't seem to be possible to use wpath on Linux correctly on systems where locale is not UTF-8! -- Alexei Alexandrov

Beman Dawes wrote:
Alexei Alexandrov wrote:
Alexei Alexandrov wrote:
So is this information enough? This issue is a real showstopper for me since it doesn't seem to be possible to use wpath on Linux correctly on systems where locale is not UTF-8!
What version of Boost are you using?
Ah, sorry again for not providing these details - I'm using Boost.Filesystem from the 1.34.1 release. I also ran the failing use case under valgrind - it showed a number of "conditional jumps on uninitialized value" somewhere deep under libstdc++ and also a couple of "4 bytes uninitialized read". I don't know if it's related to the problem though - I was just trying to do what I can. The problem is rather serious for me - I'm ready to do whatever is needed to help the Boost.Filesystem maintainer (is it you?) investigate and fix the problem. -- Alexei Alexandrov

Alexei Alexandrov wrote:
Beman Dawes wrote:
Alexei Alexandrov wrote:
So is this information enough? This issue is a real showstopper for me since it doesn't seem to be possible to use wpath on Linux correctly on systems where locale is not UTF-8! What version of Boost are you using?
Ah, sorry again for not providing these details - I'm using boost.filesystem from 1.34.1 release.
I also ran the failing use case under valgrind - it showed a number of "conditional jumps on uninitialized value" somewhere deep under libstdc++ and also a couple of "4 bytes uninitialized read". I don't know if it's related to the problem though - I was just trying to do what I can.
The problem is rather serious for me - I'm ready to do whatever it's needed to help the boost.filesystem maintainer (is it you?) investigate and fix the problem.
Beman, is there any way to help with investigating/fixing this issue? I also wonder whether wpath is being tested on Linux as part of the Boost test suite. I mean, am I the only one who has reported this problem, or has nobody used wchar_t with the standard codecvt on Linux so far? -- Alexei Alexandrov

On Mon, Jan 28, 2008 at 10:02:26AM +0300, Alexei Alexandrov wrote:
Alexei Alexandrov wrote:
I also ran the failing use case under valgrind - it showed a number of "conditional jumps on uninitialized value" somewhere deep under libstdc++ and also a couple of "4 bytes uninitialized read". I don't know if it's related to the problem though - I was just trying to do what I can.
The problem is rather serious for me - I'm ready to do whatever it's needed to help the boost.filesystem maintainer (is it you?) investigate and fix the problem.
Beman, is there any way to help with investigating/fixing this issue? I also wonder whether wpath is being tested on Linux as part of Boost test suite? I mean, I'm the only one who reported this problem or just nobody used wchar_t with standard codecvt on Linux so far?
There is simply no need for wchar_t on Linux. If you use a classical encoding in your filesystem, it is an 8-bit one (unless you use an Asian language such as Japanese). All modern distributions have already switched to UTF-8 as the default encoding, and for this you don't need wchar_t either. Use ordinary char* streams for this ... Remember that with UTF-8 you can always tell where the current character ends, even if you only have a pointer to an arbitrary byte (in the middle of a multi-byte character). It's also useless to group bytes pairwise, as a valid UTF-8 character can consist of more than two bytes. char* is really sufficient.
Jens

Jens Seidel wrote:
There is simply no need for wchar_t on Linux. If you use a classical encoding in your filesystem, it is an 8-bit one (unless you use an Asian language such as Japanese). All modern distributions have already switched to UTF-8 as the default encoding, and for this you don't need wchar_t either. Use ordinary char* streams for this ...
Remember that with UTF-8 you can always tell where the current character ends, even if you only have a pointer to an arbitrary byte (in the middle of a multi-byte character). It's also useless to group bytes pairwise, as a valid UTF-8 character can consist of more than two bytes. char* is really sufficient.
wchar_t is required on Windows, if Linux doesn't support it fully cross platform work is complicated. Jim

Jens Seidel wrote:
There is simply no need for wchar_t on Linux. If you use a classical encoding in your filesystem, it is an 8-bit one (unless you use an Asian language such as Japanese). All modern distributions have already switched to UTF-8 as the default encoding, and for this you don't need wchar_t either. Use ordinary char* streams for this ...
Remember that with UTF-8 you can always tell where the current character ends, even if you only have a pointer to an arbitrary byte (in the middle of a multi-byte character). It's also useless to group bytes pairwise, as a valid UTF-8 character can consist of more than two bytes. char* is really sufficient.
This is more of a design-choice discussion. As for Boost, it has wpath in its interfaces on Linux, so support is claimed. We made a design choice to use wchar_t cross-platform since the code targets both Windows and Linux. This is why I want to get it working. -- Alexei Alexandrov

Alexei Alexandrov wrote:
I mean, I'm the only one who reported this problem or just nobody used wchar_t with standard codecvt on Linux so far?
We use boost::filesystem::wpaths on Linux without problems (that I'm aware of). Also, IIUC, explicitly imbue()-ing the environment locale (std::locale("")) is not necessary, as it seems to be the default (look at libs/filesystem/src/path.cpp).

Éric Malenfant
---------------------------------------------
Why is lemon juice made with artificial flavor, and dishwashing liquid made with real lemons?

Eric MALENFANT wrote:
Alexei Alexandrov wrote:
I mean, I'm the only one who reported this problem or just nobody used wchar_t with standard codecvt on Linux so far?
We use boost::filesystem::wpaths on Linux without problems (that I'm aware of).
Also, IIUC, explicitly imbue()-ing the environment locale (std::locale("")) is not necessary, as it seems to be the default (look at libs/filesystem/src/path.cpp)
This is very valuable information, thanks! -- Alexei Alexandrov

Eric MALENFANT wrote:
Alexei Alexandrov wrote:
I mean, I'm the only one who reported this problem or just nobody used wchar_t with standard codecvt on Linux so far?
We use boost::filesystem::wpaths on Linux without problems (that I'm aware of).
Additional question: are you sure you use fs::wpaths with environment locale? Not boost UTF-8 codecvt facet (this is what is done in libs/filesystem/test/wide_test.cpp). Because boost UTF-8 codecvt facet works fine for me too, but I want to get system locale working - I don't want to rely on system locale encoding being UTF-8.
Also, IIUC, explicitly imbue()-ing the environment locale (std::locale("")) is not necessary, as it seems to be the default (look at libs/filesystem/src/path.cpp)
This is true. So you don't imbue anything at all and it works for you? I'd really appreciate this clarification. Thanks a lot to you and to all who are helping me in this thread! -- Alexei Alexandrov

Alexei Alexandrov, on 29 January 2008 01:07:
Eric MALENFANT wrote:
Alexei Alexandrov wrote:
I mean, I'm the only one who reported this problem or just nobody used wchar_t with standard codecvt on Linux so far?
We use boost::filesystem::wpaths on Linux without problems (that I'm aware of).
Additional question: are you sure you use fs::wpaths with environment locale? Not boost UTF-8 codecvt facet (this is what is done in libs/filesystem/test/wide_test.cpp). Because boost UTF-8 codecvt facet works fine for me too, but I want to get system locale working - I don't want to rely on system locale encoding being UTF-8.
Also, IIUC, explicitly imbue()-ing the environment locale (std::locale("")) is not necessary, as it seems to be the default (look at libs/filesystem/src/path.cpp)
This is true. So you don't imbue anything at all and it works for you? I'd really appreciate this clarification.
I just did a full search for "imbue" across our entire codebase, and the only occurrences I found were on iostreams. So yes, we don't imbue anything, and it works for us.

Éric Malenfant
---------------------------------------------
In business, if two people always agree, one of them is unnecessary.

Alexei Alexandrov, on 29 January 2008 15:19:
Eric MALENFANT wrote:
I just did a full search for "imbue" across our entire codebase, and the only occurrences I found were on iostreams. So yes, we don't imbue anything, and it works for us.
Oh, one more question - what are the platform and the compiler?
gcc 4.1.1, on RHEL 4.

Éric Malenfant

In case it helps track down the source of this, I get the same error.

test.c:
---------------
#include <iostream>
#include <boost/filesystem/path.hpp>
#include <boost/filesystem/convenience.hpp>

namespace fs = boost::filesystem;

int main() {
    fs::wpath mypath(L"/tmp/some/directory");
    fs::create_directories(mypath);
}
---------------

program output:
---------------
terminate called after throwing an instance of 'boost::filesystem::basic_filesystem_error<boost::filesystem::basic_path<std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >, boost::filesystem::wpath_traits> >'
  what(): boost::filesystem::wpath::to_external conversion error
Aborted
---------------

gdb backtrace:
---------------
(gdb) backtrace
#0 0xb7fd7410 in __kernel_vsyscall ()
#1 0xb7d6c085 in raise () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7d6da01 in abort () from /lib/tls/i686/cmov/libc.so.6
#3 0xb7f7c480 in __gnu_cxx::__verbose_terminate_handler () from /usr/lib/libstdc++.so.6
#4 0xb7f79d05 in ?? () from /usr/lib/libstdc++.so.6
#5 0xb7f79d42 in std::terminate () from /usr/lib/libstdc++.so.6
#6 0xb7f79e6a in __cxa_throw () from /usr/lib/libstdc++.so.6
#7 0xb7fbc15e in boost::filesystem::wpath_traits::to_external () from /usr/lib/libboost_filesystem-gcc42-1_34_1.so.1.34.1
#8 0x0804abe7 in boost::filesystem::basic_path<std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >, boost::filesystem::wpath_traits>::external_file_string (this=0xbfdffb9c) at /usr/include/boost/filesystem/path.hpp:302
#9 0x0804b35d in boost::filesystem::exists<boost::filesystem::basic_path<std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >, boost::filesystem::wpath_traits> > (ph=@0xbfdffb9c) at /usr/include/boost/filesystem/operations.hpp:279
#10 0x0804b49d in boost::filesystem::exists (ph=@0xbfdffb9c) at /usr/include/boost/filesystem/operations.hpp:601
#11 0x0804ba1d in boost::filesystem::create_directories<boost::filesystem::basic_path<std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >, boost::filesystem::wpath_traits> > (ph=@0xbfdffb9c) at /usr/include/boost/filesystem/convenience.hpp:42
#12 0x0804bbb9 in boost::filesystem::create_directories (ph=@0xbfdffb9c) at /usr/include/boost/filesystem/convenience.hpp:88
#13 0x0804a3e0 in main () at test.c:9
---------------

GCC 4.2.3
Ubuntu 8.10 (Hardy Heron)
en_US.UTF-8 Locale
Boost 1.34.1

--Yarias

Alexei Alexandrov wrote:
Beman, is there any way to help with investigating/fixing this issue? I also wonder whether wpath is being tested on Linux as part of Boost test suite?
Yes. See boost-root/libs/filesystem/test/wide_test.cpp.
I mean, I'm the only one who reported this problem or just nobody used wchar_t with standard codecvt on Linux so far?
You are the only one I can recall reporting this problem, but I don't have any way to know how widely used the facility is on Linux. --Beman

Beman Dawes wrote:
Alexei Alexandrov wrote: [ snip ]
I mean, I'm the only one who reported this problem or just nobody used wchar_t with standard codecvt on Linux so far?
You are the one I can recall reporting this problem, but I don't have any way to know how widely used the facility is on Linux.
I'm currently using wpath on Linux as I'm writing a gui app using wxWidgets that I want to run on Windows also. This seemed to be the best approach. To get it to work I had to imbue wpath_traits with the experimental UTF8 locale as the example (wide_test.cpp I think) does. I have yet to really try this in anger, but regardless, I too am interested in this. Jamie

Jamie Allsop wrote:
To get it to work I had to imbue wpath_traits with the experimental UTF8 locale as the example (wide_test.cpp I think) does.
I have yet to really try this in anger, but regardless, I too am interested in this.
It works fine for me too when imbuing boost UTF-8 codecvt facet - see the original post. But it doesn't work somehow when imbuing system codecvt (system locale encoding is UTF-8). This is what looks like a bug. I don't know where it is yet - boost, me, or my libstdc++ implementation (gcc 3.4.6). -- Alexei Alexandrov

Beman Dawes wrote:
Alexei Alexandrov wrote:
Beman, is there any way to help with investigating/fixing this issue? I also wonder whether wpath is being tested on Linux as part of Boost test suite?
Yes. See boost-root/libs/filesystem/test/wide_test.cpp.
The test imbues the Boost UTF-8 codecvt facet, and this works fine for me too. What doesn't work is imbuing the system locale's codecvt. I'll look into it further. It could be a libstdc++ bug, but I don't think so: I tested the same locale by imbuing it into a wofstream and writing some international wchar_t data, and the data in the file was converted to UTF-8 properly. -- Alexei Alexandrov

Alexei Alexandrov wrote:
The problem is rather serious for me - I'm ready to do whatever it's needed to help the boost.filesystem maintainer (is it you?) investigate and fix the problem.
I'm the maintainer. One possible way to isolate the problem is to try a codecvt operation on the locale of interest without involving any boost code at all. If that works, the problem is likely within Boost.Filesystem. But if that fails, the problem is with the locale or use of it, not with Boost.Filesystem. I've got a Linux system here I can test on, but I'm not very familiar with Linux so am hesitant to start testing here as it always takes me awhile to come up to speed on Linux. --Beman

Beman Dawes wrote:
Alexei Alexandrov wrote:
Beman Dawes wrote:
One possible way to isolate the problem is to try a codecvt operation on the locale of interest without involving any boost code at all. If that works, the problem is likely within Boost.Filesystem. But if that fails, the problem is with the locale or use of it, not with Boost.Filesystem.
This is what I did. It was something like

    int main()
    {
        std::wofstream of("test.txt");
        of.imbue(std::locale(""));
        of << utf8_to_wide("Some Russian string in UTF-8") << std::endl;
    }

and the data in the output file appeared correctly as UTF-8.
-- Alexei Alexandrov
Participants (7): Alexei Alexandrov, Beman Dawes, Eric MALENFANT, James Talbut, Jamie Allsop, Jens Seidel, Yarias