[filesystem][cygwin] Standard conformance for wide characters

I've decided not to attempt support for Cygwin in the next version of Boost.Filesystem. Cygwin's lack of library support for wchar_t is the problem. For the current Boost.Filesystem version (v2), the necessary workarounds are so pervasive that the implementation code is much harder to read and maintain. Witness the number of bug reports that are Cygwin specific. For v3, currently under development, trying to support Cygwin would be even harder, and would cause a serious delay in development. Plus I'm tired of waiting for the Cygwin folks to come into full C++ conformance. IIUC, the reason Cygwin doesn't provide C++ standard library support for wchar_t is that the underlying C library is missing the C wchar_t functions. Perhaps Boosters who care about Cygwin could spearhead an effort to add the missing C support? The needed functionality isn't all that complex; the main problem might be just learning enough about how Cygwin/GCC is configured and built to be able to add a fairly small number of C functions. Thoughts? --Beman

I've decided not to attempt support for Cygwin in the next version of Boost.Filesystem.
I understand your point. You already do so much for the community! I do not know how to fix the Cygwin gcc, but is your v3 going to work for non-wide characters on that platform? F. Bron

2009/1/13 <frederic.bron@alcan.com>:
I've decided not to attempt support for Cygwin in the next version of Boost.Filesystem.
I understand your point. You already do so much for the community! I do not know how to fix the Cygwin gcc, but is your v3 going to work for non-wide characters on that platform?
No. That's the problem. Narrow character only support ends up messing up the code beyond my level of tolerance. For a library that builds in wide character support right from the start, it really isn't C++ anymore if the standard library's wide character support isn't there. The implementation is using wide characters internally, most of the interface is templatized, and wide character test cases are intermingled with narrow character test cases. Perhaps I'm just getting old and cranky, but it's been way over 10 years since all this stuff was standardized. It is time for cygwin to stop tormenting developers and provide full C++ 98 conformance. --Beman

Beman Dawes wrote:
Perhaps I'm just getting old and cranky, but it's been way over 10 years since all this stuff was standardized. It is time for cygwin to stop tormenting developers and provide full C++ 98 conformance.
You mean we are getting 'export' finally? :-) Yeah! Bo Persson

On Tue, Jan 13, 2009 at 4:58 PM, Bo Persson <bop@gmb.dk> wrote:
Beman Dawes wrote:
Perhaps I'm just getting old and cranky, but it's been way over 10 years since all this stuff was standardized. It is time for cygwin to stop tormenting developers and provide full C++ 98 conformance.
You mean we are getting 'export' finally? :-)
Ah, yes, 'export'. I stand corrected. Change that to: "... and provide de facto C++ 98 conformance." --Beman

On Tue, Jan 13, 2009 at 6:16 AM, Beman Dawes <bdawes@acm.org> wrote:
I've decided not to attempt support for Cygwin in the next version of Boost.Filesystem.
I pulled the plug on Cygwin support at my company a couple of years ago for the lack of wchar_t support. They ought to support it.
Cygwin's lack of library support for wchar_t is the problem. For the current Boost.Filesystem version (v2), the necessary workarounds are so pervasive that the implementation code is much harder to read and maintain.
Is it true that had the interface of boost::filesystem been defined in terms of utf8, then the only platform on which wchar_t support would have been instrumental is Windows, and we wouldn't have had problems with Cygwin?
IIUC, the reason Cygwin doesn't provide C++ standard library support for wchar_t is that the underlying C library is missing the C wchar_t functions. Perhaps Boosters who care about Cygwin could spearhead an effort to add the missing C support?
You're right that it's missing the needed C library functions, but I was able to hack into the C++ headers to trick them to support std::wstring anyway. That's not the same as full wchar_t support, but it was the only thing I personally cared about. My hack broke with one of the Cygwin releases that followed and I gave up on tracking down why, but I suppose if someone cares enough about this it can be done (my approach was to inject a modified version of one of the gcc-specific header files into the compiler include path, I'm not sure if this would be good enough for Boost.) Emil Dotchevski Reverge Studios, Inc. http://www.revergestudios.com/reblog/index.php?n=ReCode

On Tue, Jan 13, 2009 at 1:17 PM, Emil Dotchevski <emildotchevski@gmail.com> wrote:
On Tue, Jan 13, 2009 at 6:16 AM, Beman Dawes <bdawes@acm.org> wrote:
I've decided not to attempt support for Cygwin in the next version of Boost.Filesystem.
I pulled the plug on Cygwin support at my company a couple of years ago for the lack of wchar_t support. They ought to support it.
Since in this case Cygwin is just a packager of other people's work, it isn't on their radar screen. I doubt it would take much work to add gcc wide character support; the compiler already supports wchar_t as does the C++ standard library. IIUC, the only piece of the puzzle that is missing is the C language library support. I suspect any of us could code that up in short order, but I don't want to take the time to figure out how to integrate it into their build system and shepherd the changes through their process.
Cygwin's lack of library support for wchar_t is the problem. For the current Boost.Filesystem version (v2), the necessary workarounds are so pervasive that the implementation code is much harder to read and maintain.
Is it true that had the interface of boost::filesystem been defined in terms of utf8, then the only platform on which wchar_t support would have been instrumental is Windows, and we wouldn't have had problems with Cygwin?
The problem isn't so much the interface as the internals. Windows' native character type for file names is wchar_t.
IIUC, the reason Cygwin doesn't provide C++ standard library support for wchar_t is that the underlying C library is missing the C wchar_t functions. Perhaps Boosters who care about Cygwin could spearhead an effort to add the missing C support?
You're right that it's missing the needed C library functions, but I was able to hack into the C++ headers to trick them to support std::wstring anyway. That's not the same as full wchar_t support, but it was the only thing I personally cared about.
My hack broke with one of the Cygwin releases that followed and I gave up on tracking down why, but I suppose if someone cares enough about this it can be done (my approach was to inject a modified version of one of the gcc-specific header files into the compiler include path, I'm not sure if this would be good enough for Boost.)
I suspect it is far easier to just fix the problem at the Cygwin source code level than to try to hack and maintain patches for Cygwin object code distributions. --Beman

Beman Dawes wrote:
I doubt it would take much work to add gcc wide character support; the compiler already supports wchar_t as does the C++ standard library. IIUC, the only piece of the puzzle that is missing is the C language library support.
That is correct, and I'm told some work has gone into this very recently, so anybody interested may want to check into this to try to accelerate the process. Regards, Stefan -- ...ich hab' noch einen Koffer in Berlin...

On Wed, Jan 14, 2009 at 5:56 AM, Beman Dawes <bdawes@acm.org> wrote:
On Tue, Jan 13, 2009 at 1:17 PM, Emil Dotchevski
Is it true that had the interface of boost::filesystem been defined in terms of utf8, then the only platform on which wchar_t support would have been instrumental is Windows, and we wouldn't have had problems with Cygwin?
The problem isn't so much the interface as the internals. Windows' native character type for file names is wchar_t.
I was thinking that had the boost::filesystem interface been defined in terms of UTF-8, then on Cygwin, Unicode file names would have passed through boost::filesystem flawlessly, only to fail at Cygwin level due to lack of UTF-8 support (in Cygwin.) But that would have been their problem, not ours, and we wouldn't be talking about dropping (all) Cygwin support. I Emil Dotchevski Reverge Studios, Inc. http://www.revergestudios.com/reblog/index.php?n=ReCode

On Wed, Jan 14, 2009 at 11:40 AM, Emil Dotchevski <emildotchevski@gmail.com> wrote:
On Wed, Jan 14, 2009 at 5:56 AM, Beman Dawes <bdawes@acm.org> wrote:
On Tue, Jan 13, 2009 at 1:17 PM, Emil Dotchevski
Is it true that had the interface of boost::filesystem been defined in terms of utf8, then the only platform on which wchar_t support would have been instrumental is Windows, and we wouldn't have had problems with Cygwin?
The problem isn't so much the interface as the internals. Windows' native character type for file names is wchar_t.
I was thinking that had the boost::filesystem interface been defined in terms of UTF-8, then on Cygwin, Unicode file names would have passed through boost::filesystem flawlessly, only to fail at Cygwin level due to lack of UTF-8 support (in Cygwin.)
But that would have been their problem, not ours, and we wouldn't be talking about dropping (all) Cygwin support.
I
Sorry hit Send prematurely :) I wanted to say "I'm probably missing something but I can't see it." Emil Dotchevski Reverge Studios, Inc. http://www.revergestudios.com/reblog/index.php?n=ReCode

On Tue, Jan 13, 2009 at 2:16 PM, Beman Dawes <bdawes@acm.org> wrote:
I've decided not to attempt support for Cygwin in the next version of Boost.Filesystem.
I don't entirely understand the implications of this. Without Boost.Filesystem, are we effectively saying that Boost as a whole no longer supports the predominant free C++ compiler, or is that a bit of an overstatement? Does lack of Boost.Filesystem support impact the usability of other areas of Boost? Thanks, Rob.

On Wed, Jan 14, 2009 at 12:57 AM, Robert Jones <robertgbjones@gmail.com> wrote:
On Tue, Jan 13, 2009 at 2:16 PM, Beman Dawes <bdawes@acm.org> wrote:
I've decided not to attempt support for Cygwin in the next version of Boost.Filesystem.
I don't entirely understand the implications of this. Without Boost.Filesystem, are we effectively saying that Boost as a whole no longer supports the predominant free C++ compiler, or is that a bit of an overstatement?
Excuse me, but MSVC is the predominant free C++ compiler on Windows. :) Emil Dotchevski Reverge Studios, Inc. http://www.revergestudios.com/reblog/index.php?n=ReCode

Emil Dotchevski wrote:
Excuse me, but MSVC is the predominant free C++ compiler on Windows. :)
You obviously don't use the same definition of 'Free'. :-) Stefan -- ...ich hab' noch einen Koffer in Berlin...

Edward Diener wrote:
You obviously don't use the same definition of 'Free'.
:-)
Stefan
The express version is free.
It is free of charge, but not Free. Stefan -- ...ich hab' noch einen Koffer in Berlin...

on Thu Jan 15 2009, Stefan Seefeld <seefeld-AT-sympatico.ca> wrote:
Edward Diener wrote:
You obviously don't use the same definition of 'Free'.
:-)
Stefan
The express version is free.
It is free of charge, but not Free.
Stefan
OK, everyone, I'm going to ask you all -- preemptively -- to settle down. This is an old and somewhat un-Boost-ish argument; I'm sure there's a more appropriate place out there to have it. :-) Thanks, -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Wed, Jan 14, 2009 at 3:57 AM, Robert Jones <robertgbjones@gmail.com> wrote:
On Tue, Jan 13, 2009 at 2:16 PM, Beman Dawes <bdawes@acm.org> wrote:
I've decided not to attempt support for Cygwin in the next version of Boost.Filesystem.
I don't entirely understand the implications of this. Without Boost.Filesystem, are we effectively saying that Boost as a whole no longer supports the predominant free C++ compiler, or is that a bit of an overstatement?
That's way too strong, although:
Does lack of Boost.Filesystem support impact the usability of other areas of Boost?
Probably. But if anyone cares about this, they should fix Cygwin. --Beman

I've decided not to attempt support for Cygwin in the next version of Boost.Filesystem.
I have posted your comments on the cygwin mailing list and post the answer below. It says that cygwin is not using the GNU C library but newlib which does not have the full set of wide character functions. Any idea how to help newlib get the missing functions?
Cygwin is using the wide char functions provided by newlib. Newlib (http://sources.redhat.com/newlib/) is still lacking a couple of wide char functions so far, namely:
fwprintf fwscanf swprintf swscanf vfwprintf vswprintf vwprintf wprintf wscanf wcstod wcstof wcstold wcsftime wcstok
wcstok has been contributed but isn't checked in so far. As for all the other functions, contributions to newlib are always welcome on the newlib mailing list.

On Thu, Feb 5, 2009 at 8:35 AM, <frederic.bron@alcan.com> wrote:
I've decided not to attempt support for Cygwin in the next version of Boost.Filesystem.
I have posted your comments on the cygwin mailing list and post the answer below.
Ha! You must have more charm than I do; there was no reply when I asked a similar question on both that and the cygwin apps list. Good work!
It says that cygwin is not using the GNU C library but newlib which does not have the full set of wide character functions. Any idea how to help newlib get the missing functions?
Cygwin is using the wide char functions provided by newlib. Newlib (http://sources.redhat.com/newlib/) is still lacking a couple of wide char functions so far, namely:
fwprintf fwscanf swprintf swscanf vfwprintf vswprintf vwprintf wprintf wscanf wcstod wcstof wcstold wcsftime wcstok
wcstok has been contributed but isn't checked in so far. As for all the other functions, contributions to newlib are always welcome on the newlib mailing list.
The list of functions and what mailing list to work with are exactly what I wanted to find out. That list of functions just isn't that scary, particularly since the narrow version of the same set of functions would be available to use as a guideline. I'll post something on the newlib list to try to get a bit of direction from them, but my current plan is to twist people's arms to get that set of functions contributed to newlib, and on into cygwin. --Beman
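As an illustration of how small these functions can be, here is a rough sketch of a wcstok look-alike written by analogy with the narrow re-entrant tokenizers. It is not newlib code and not the contributed wcstok mentioned above; the names sketch, in_set, and wcstok_sketch are hypothetical, and the three-argument shape is assumed to follow the C99 wcstok contract.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {  // hypothetical namespace, not part of newlib or Cygwin

// True if c occurs in the null-terminated delimiter set.
inline bool in_set(wchar_t c, const wchar_t* set) {
    for (; *set != L'\0'; ++set)
        if (*set == c) return true;
    return false;
}

// Assumed contract (modelled on C99 wcstok): pass the buffer on the first
// call and a null pointer afterwards; *state carries the scan position.
inline wchar_t* wcstok_sketch(wchar_t* str, const wchar_t* delims,
                              wchar_t** state) {
    assert(delims != NULL && state != NULL);
    wchar_t* p = (str != NULL) ? str : *state;
    if (p == NULL) return NULL;                       // input already exhausted
    while (*p != L'\0' && in_set(*p, delims)) ++p;    // skip leading delimiters
    if (*p == L'\0') { *state = NULL; return NULL; }  // nothing but delimiters
    wchar_t* token = p;
    while (*p != L'\0' && !in_set(*p, delims)) ++p;   // scan to end of token
    if (*p == L'\0') {
        *state = NULL;                                // last token in the buffer
    } else {
        *p = L'\0';                                   // terminate token in place
        *state = p + 1;                               // resume after delimiter
    }
    return token;
}

}  // namespace sketch
```

Apart from the character type and the L prefixes, this has the same shape a narrow-character tokenizer would have, which is the point being made about using the narrow functions as a guideline.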

on Thu Feb 05 2009, frederic.bron-AT-alcan.com wrote:
I've decided not to attempt support for Cygwin in the next version of Boost.Filesystem.
I have posted your comments on the cygwin mailing list and post the answer below. It says that cygwin is not using the GNU C library but newlib which does not have the full set of wide character functions. Any idea how to help newlib get the missing functions?
Implement them? Maybe it's too facile an answer, but I can't think of anything else... -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Thu, Feb 5, 2009 at 5:04 PM, David Abrahams <dave@boostpro.com> wrote:
on Thu Feb 05 2009, frederic.bron-AT-alcan.com wrote:
I've decided not to attempt support for Cygwin in the next version of Boost.Filesystem.
I have posted your comments on the cygwin mailing list and post the answer below. It says that cygwin is not using the GNU C library but newlib which does not have the full set of wide character functions. Any idea how to help newlib get the missing functions?
Implement them?
Maybe it's too facile an answer, but I can't think of anything else...
It seems to me that it is very difficult to implement complete wide character support. The thing is, all that boost::filesystem needs is std::wstring (right?), and std::wstring needs only a handful of wide character functions. That subset should be trivial to implement, and then it's just a matter of whether the rest of the standard library implementation can be configured to enable std::wstring only. I still wish filesystem used utf-8 instead but oh well. :) Emil Dotchevski Reverge Studios, Inc. http://www.revergestudios.com/reblog/index.php?n=ReCode
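To give a feel for what such a "handful" might look like, here is a sketch of a few wide-character primitives written as plain loops. Emil does not list the functions; the set below mirrors the standard C names wcslen, wmemcmp, wmemchr, and wmemcpy, and the idea that a char_traits<wchar_t> implementation would forward to roughly these is an assumption, not something confirmed in the thread. All names ending in _sketch are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {  // hypothetical namespace; not taken from any real C library

// Length of a null-terminated wide string, in wchar_t units.
inline std::size_t wcslen_sketch(const wchar_t* s) {
    assert(s != NULL);
    std::size_t n = 0;
    while (s[n] != L'\0') ++n;
    return n;
}

// Three-way comparison of the first n units of a and b.
inline int wmemcmp_sketch(const wchar_t* a, const wchar_t* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        if (a[i] != b[i]) return a[i] < b[i] ? -1 : 1;
    return 0;
}

// Pointer to the first occurrence of c in the first n units, or NULL.
inline const wchar_t* wmemchr_sketch(const wchar_t* s, wchar_t c,
                                     std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        if (s[i] == c) return s + i;
    return NULL;
}

// Copy n units from src to dst; the ranges must not overlap.
inline wchar_t* wmemcpy_sketch(wchar_t* dst, const wchar_t* src,
                               std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
    return dst;
}

}  // namespace sketch
```

None of these involve locales or formatting, which is why this subset is so much easier than the wprintf and wscanf families.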

Any idea how to help newlib get the missing functions?
Implement them?
Maybe it's too facile an answer, but I can't think of anything else...
Thanks for the advice, but I have absolutely no idea how to use wide characters. I have never used them. The C++ standard does not say what can be done with wide characters. For example, how can I properly read a file written in UTF-8? I don't know. If you have a good book on the topic to recommend, I would be glad to have a look. F. Bron

frederic.bron wrote:
Thanks for the advice but I have absolutely no idea of how to use wide characters. I have never used them. C++ standard does not say what can be done with wide characters. For example how can I read properly a file written in UTF8?
I wanted to reply: std::wifstream f("AFileInUtf8.txt"); f.imbue(std::locale(std::locale(), new boost::utf8_codecvt_facet<wchar_t>)); But it seems that the codecvt_facet in boost/libs/detail/utf8_codecvt_facet.cpp only supports UTF-8<=>UCS-4 conversions. It thus seems that it can't be used with wchar_t on a platform where sizeof(wchar_t) != 4, right? Am I missing something?
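For illustration only, one way to sidestep the sizeof(wchar_t) question is to read the file as raw bytes and decode them by hand into 32-bit code points, without going through wchar_t at all. This is a minimal sketch, not the Boost facet and not a recommendation from the thread; sketch, code_point, read_bytes, and decode_utf8 are hypothetical names, and the decoder does not reject overlong encodings or surrogate values.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {  // hypothetical namespace, unrelated to Boost

typedef unsigned long code_point;  // at least 32 bits on every platform

// Read a whole file as raw bytes; no character conversion is applied.
inline std::string read_bytes(const char* path) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in) throw std::runtime_error("cannot open file");
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Decode UTF-8 bytes into code points. Simplification: overlong forms and
// surrogate code points are accepted rather than rejected.
inline std::vector<code_point> decode_utf8(const std::string& bytes) {
    std::vector<code_point> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t len;
        code_point cp;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + len > bytes.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append 6 payload bits
        }
        assert(cp <= 0x1FFFFFUL);         // at most 21 payload bits
        out.push_back(cp);
        i += len;
    }
    return out;
}

}  // namespace sketch
```

A call such as sketch::decode_utf8(sketch::read_bytes("AFileInUtf8.txt")) would then yield one element per code point regardless of how wide wchar_t is on the platform.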

Eric MALENFANT wrote:
frederic.bron wrote:
Thanks for the advice but I have absolutely no idea of how to use wide characters. I have never used them. C++ standard does not say what can be done with wide characters. For example how can I read properly a file written in UTF8?
I wanted to reply: std::wifstream f("AFileInUtf8.txt"); f.imbue(std::locale(std::locale(), new boost::utf8_codecvt_facet<wchar_t>));
But it seems that the codecvt_facet in boost/libs/detail/utf8_codecvt_facet.cpp only supports UTF-8<=>UCS-4 conversions. It thus seems that it can't be used with wchar_t on a platform where sizeof(wchar_t) != 4, right?
Am I missing something?
There's a similar codecvt_facet in the serialization library under the archive details. Perhaps it can be used? Jeff

Jeff Flinn wrote:
Eric MALENFANT wrote:
But it seems that the codecvt_facet in boost/libs/detail/utf8_codecvt_facet.cpp only supports UTF-8<=>UCS-4 conversions. It thus seems that it can't be used with wchar_t on a platform where sizeof(wchar_t) != 4, right?
Am I missing something?
There's a similar codecvt_facet in the serialization library under the archive details. Perhaps it can be used?
If you're talking about libs/serialization/src/utf8_codecvt_facet.cpp, it is no different. In fact, this file only #includes ../../detail/utf8_codecvt_facet.cpp.

on Fri Feb 06 2009, frederic.bron-AT-alcan.com wrote:
Any idea how to help newlib get the missing functions?
Implement them?
Maybe it's too facile an answer, but I can't think of anything else...
Thanks for the advice but I have absolutely no idea of how to use wide characters. I have never used them. C++ standard does not say what can be done with wide characters. For example how can I read properly a file written in UTF8?
UTF8 isn't wide characters; it's unicode stored in 8-bit code units. See http://unicode.org/faq/utf_bom.html#UTF8, so wstring would really be an inappropriate way to store UTF8. Wide characters are actually not characters either; they're "code units." See http://unicode.org/faq/utf_bom.html#utf16-1 for example. In general, AFAIK, a wstring is just a container of wchar_ts and C++ doesn't say anything about how those wchar_ts map onto glyphs. Since wchar_t is only required to be a 16-bit quantity, the most likely encoding to store in a wstring is UTF16, but it could be anything. So I don't think any of the wchar_t* c-string functions can possibly be much more than textual copies of the regular char* c-string functions, but operating on a different datatype. HTH, -- Dave Abrahams BoostPro Computing http://www.boostpro.com
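To make the "code unit" distinction concrete, here is a small sketch that shows how many code units a single code point needs in UTF-8 and how it is split into UTF-16 units. The names sketch, utf8_unit_count, and encode_utf16 are hypothetical, and unsigned short is assumed to be a 16-bit type, which the standard does not strictly guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {  // hypothetical namespace for illustration

// Number of 8-bit code units UTF-8 needs for one code point.
inline std::size_t utf8_unit_count(unsigned long cp) {
    assert(cp <= 0x10FFFFUL);
    if (cp < 0x80UL)    return 1;
    if (cp < 0x800UL)   return 2;
    if (cp < 0x10000UL) return 3;
    return 4;
}

// Encode one code point as UTF-16 code units (assumes 16-bit unsigned short).
// Code points above U+FFFF become a surrogate pair, i.e. two units.
inline std::vector<unsigned short> encode_utf16(unsigned long cp) {
    assert(cp <= 0x10FFFFUL);
    assert(cp < 0xD800UL || cp > 0xDFFFUL);  // surrogates are not code points
    std::vector<unsigned short> units;
    if (cp < 0x10000UL) {
        units.push_back(static_cast<unsigned short>(cp));
    } else {
        const unsigned long v = cp - 0x10000UL;  // 20 bits remain
        units.push_back(static_cast<unsigned short>(0xD800UL + (v >> 10)));
        units.push_back(static_cast<unsigned short>(0xDC00UL + (v & 0x3FFUL)));
    }
    return units;
}

}  // namespace sketch
```

With these, U+20AC takes three units in UTF-8 but one in UTF-16, while U+1F600 takes four units in UTF-8 and two in UTF-16, so neither a char nor a 16-bit wchar_t can be equated with "a character".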

On Thu, Feb 5, 2009 at 8:04 PM, David Abrahams <dave@boostpro.com> wrote:
on Thu Feb 05 2009, frederic.bron-AT-alcan.com wrote:
I've decided not to attempt support for Cygwin in the next version of Boost.Filesystem.
I have posted your comments on the cygwin mailing list and post the answer below. It says that cygwin is not using the GNU C library but newlib which does not have the full set of wide character functions. Any idea how to help newlib get the missing functions?
Implement them?
Maybe it's too facile an answer, but I can't think of anything else...
I'm also running on the assumption that implementing them is in the cards, but I wanted assurance that if we went to the effort of implementing them, the implementations would be accepted into newlib. Thanks mostly to Frederic, we seem to have finally attracted the attention of some of the key people. I've subscribed to the newlib list and am starting to get a handle on what their requirements are. --Beman

on Fri Feb 06 2009, Beman Dawes <bdawes-AT-acm.org> wrote:
On Thu, Feb 5, 2009 at 8:04 PM, David Abrahams <dave@boostpro.com> wrote:
on Thu Feb 05 2009, frederic.bron-AT-alcan.com wrote:
I've decided not to attempt support for Cygwin in the next version of Boost.Filesystem.
I have posted your comments on the cygwin mailing list and post the answer below. It says that cygwin is not using the GNU C library but newlib which does not have the full set of wide character functions. Any idea how to help newlib get the missing functions?
Implement them?
Maybe it's too facile an answer, but I can't think of anything else...
I'm also running on the assumption that implementing them is in the cards, but I wanted assurance that if we went to the effort of implementing them, the implementations would be accepted into newlib.
Oh, I was sort of suggesting that Frederic should try to implement them if he really wants Cygwin support. But if you want to take it on, it's of course OK with me... -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Sat, Feb 7, 2009 at 5:09 PM, David Abrahams <dave@boostpro.com> wrote:
Oh, I was sort of suggesting that Frederic should try to implement them if he really wants Cygwin support. But if you want to take it on, it's of course OK with me...
Progress report: Corinna Vinschen, the Cygwin project co-leader, has implemented wprintf, fwprintf, swprintf, vwprintf, vfwprintf, and vswprintf, and they have been added to the Newlib trunk. Additional comments and suggestions came from Craig Howland, Jeff Johnston, and Eric Blake. Thanks to these folks for moving Newlib, and thus Cygwin C++, wide character support forward! That leaves only the wscanf functions still to go. --Beman

participants (10)
- Beman Dawes
- Bo Persson
- David Abrahams
- Edward Diener
- Emil Dotchevski
- Eric MALENFANT
- frederic.bron@alcan.com
- Jeff Flinn
- Robert Jones
- Stefan Seefeld