[filesystem] Version 3 request for comments

A prototype implementation of version 3 of Boost.Filesystem is available for comment. The critical feature of this prototype is the replacement of the basic_path class template with a single class path that supports both narrow and wide character paths. It also supports user-supplied types, and will support the new C++0x char16_t and char32_t character types as compiler and standard library support becomes available. There is still a lot of additional work to do before V3 could be added to the trunk and eventually released. For example, work so far has concentrated on the new functionality, and ignored breaking changes to existing code. But enough is working to make requesting comments useful. See http://svn.boost.org/svn/boost/sandbox/filesystem-v3/libs/filesystem/doc/v3_... for a fuller description of the prototype and links to the prototype implementation. I've got some specific issues I'd like opinions on, but would first like to hear comments from anyone interested. --Beman
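To make the "single class instead of a class template" idea concrete, here is a minimal sketch. It is not the prototype's code: toy_path, native_length and the byte-by-byte widening are assumptions made for illustration, and the widening is only correct for ASCII input (the announcement does not say how the prototype converts narrow sources).

```cpp
#include <cassert>
#include <string>

// Hypothetical toy class for illustration only; NOT the V3 prototype's path.
class toy_path {
public:
    typedef std::wstring string_type;  // assumed internal representation

    // Narrow source: widened byte by byte, which is only correct for ASCII.
    toy_path(const std::string& narrow) {
        for (std::string::size_type i = 0; i < narrow.size(); ++i)
            native_ += static_cast<wchar_t>(static_cast<unsigned char>(narrow[i]));
    }

    // Wide source: stored as-is.
    toy_path(const std::wstring& wide) : native_(wide) {}

    const string_type& native() const { return native_; }

private:
    string_type native_;
};

// One non-template signature serves both narrow and wide callers.
inline std::wstring::size_type native_length(const toy_path& p) {
    return p.native().size();
}

// Usage: both calls go through the same class, no template parameter needed.
inline void toy_path_demo() {
    assert(native_length(std::string("a/b.txt")) == 7);
    assert(native_length(std::wstring(L"a/b.txt")) == 7);
}
```

With a class template such as basic_path, native_length would itself have to be a template or be overloaded once per character type.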

Beman Dawes wrote:
A prototype implementation of version 3 of Boost.Filesystem is available for comment.
........
I've got some specific issues I'd like opinions on, but would first like to hear comments from anyone interested.
--Beman
Hi. First of all, thank you for your work on the filesystem library. Apparently the new version has wchar_t internal representations for all Windows OSs. However, Windows 9x versions do not support wchar_t (at least natively). I'm not sure if dropping support for Windows 9x is desired, but it would certainly be a difference from previous versions. In case you need any help with these platforms I'll be glad to do what I can. Best regards, Jorge

On Mon, Feb 2, 2009 at 8:28 AM, Jorge Lodos Vigil <lodos@segurmatica.cu> wrote:
... First of all, thank you for your work in the filesystem library.
You are welcome!
Apparently the new version has wchar_t internal representations for all Windows OSs. However, Windows 9x versions do not support wchar_t (at least natively). I'm not sure if dropping support for Windows 9x is desired, but it would certainly be a difference from previous versions.
Is there a real-life case to be made for supporting Windows 9X? Cygwin is also a concern because of lack of standard library wchar_t support. I'm trying to get more information from the Cygwin/gcc maintainers, but no response so far. I think a technical solution might be possible by supporting builds for Windows that use narrow characters and the narrow character API. But there would have to be reasonable (i.e. limited) expectations. Such a build wouldn't magically enable encodings that aren't supported by the operating system and/or file system in use.
In case you need any help with these platforms I'll be glad to do what I can.
I would need help with testing, at the least. --Beman

Beman Dawes:
Cygwin is also a concern because of lack of standard library wchar_t support. I'm trying to get more information from the Cygwin/gcc maintainers, but no response so far.
Would it not be possible to support Cygwin by fixing the narrow encoding to UTF-8? This doesn't seem to require any stdlib support for wchar_t. wchar_t* will still work and this might be enough in practice for interfacing with the OS for tasks not covered by boost.fs (common dialog boxes, command line arguments and so on).

Beman Dawes wrote:
Apparently the new version has wchar_t internal representations for all Windows OSs. However, Windows 9x versions do not support wchar_t (at least natively). I'm not sure if dropping support for Windows 9x is desired, but it would certainly be a difference from previous versions.
Is there a real-life case to be made for supporting Windows 9X?
Unfortunately yes. We use boost::filesystem in our AV engine, which is still used by many Windows 9x clients, including several on Windows 95! Given the interface standardization, we will use boost::filesystem on 9x anyway, either by keeping v2 or, if you decide not to support 9x in v3, by adding what is needed to it ourselves. What I mean is that we will devote resources to this task either way.
I think a technical solution might be possible by supporting builds for Windows that use narrow characters and the narrow character API. But there would have to be reasonable (i.e. limited) expectations. Such a build wouldn't magically enable encodings that aren't supported by the operating system and/or file system in use.
You are right of course, no magical encodings are possible. It will suffice if ANSI Windows APIs are called in this build instead of wide ones. This is the problem introduced in v3: the narrow API versions are never used.
In case you need any help with these platforms I'll be glad to do what I can.
I would need help with testing, at the least.
Please don't hesitate to ask for what you need. If you prefer, you may contact me directly. Best regards, Jorge

On Mon, Feb 2, 2009 at 12:31 PM, Beman Dawes <bdawes@acm.org> wrote:
Cygwin is also a concern because of lack of standard library wchar_t support. I'm trying to get more information from the Cygwin/gcc maintainers, but no response so far.
I've gotten in contact with the Cygwin and Newlib maintainers, who turn out to be much the same people. Turns out there is very little remaining work to get wide character support into cygwin/gcc. The holdup appears to be lack of volunteer time rather than any technical problems. It is really just a matter of copying a very few functions from FreeBSD and getting them to compile in the newlib environment. I tried that yesterday with one of the largest functions and had it compiling in an hour or so. The biggest problem is, no surprise, trying to get their build system to work. I'm still working on that, but at least I now know where to ask questions and get answers. Bottom line is that it really does appear easier to get wide character support working in Cygwin/gcc than to continue to have to build workarounds into Boost code. --Beman

Cygwin is also a concern because of lack of standard library wchar_t support. I'm trying to get more information from the Cygwin/gcc maintainers, but no response so far.
I've gotten in contact with the Cygwin and Newlib maintainers, who turn out to be much the same people. Turns out there is very little remaining work to get wide character support into cygwin/gcc. The holdup appears to be lack of volunteer time rather than any technical problems. It is really just a matter of copying a very few functions from FreeBSD and getting them to compile in the newlib environment. I tried that yesterday with one of the largest functions and had it compiling in an hour or so. The biggest problem is, no surprise, trying to get their build system to work. I'm still working on that, but at least I now know where to ask questions and get answers.
Bottom line is that it really does appear easier to get wide character support working in Cygwin/gcc than to continue to have to build workarounds into Boost code.
Thank you, F. Bron Avis : Ce message et toute pièce jointe sont la propriété d'Alcan et sont destinés seulement aux personnes ou à l'entité à qui le message est adressé. Si vous avez reçu ce message par erreur, veuillez le détruire et en aviser l'expéditeur par courriel. Si vous n'êtes pas le destinataire du message, vous n'êtes pas autorisé à utiliser, à copier ou à divulguer le contenu du message ou ses pièces jointes en tout ou en partie. Notice: This message and any attachments are the property of Alcan and are intended solely for the named recipients or entity to whom this message is addressed. If you have received this message in error please inform the sender via e-mail and destroy the message. If you are not the intended recipient you are not allowed to use, copy or disclose the contents or attachments in whole or in part.

Beman Dawes:
A prototype implementation of version 3 of Boost.Filesystem is available for comment.
...
See http://svn.boost.org/svn/boost/sandbox/filesystem-v3/libs/filesystem/doc/v3_... for a fuller description of the prototype and links to the prototype implementation.
locale( "" ) is not correct on Mac OS X, where the API always takes UTF-8 char* and the filesystem uses wchar_t internally (AFAIK). One might make a case that a default UTF-8 locale is more "correct" on all OSes. On Solaris, it is the recommended choice by Sun for path encoding (AFAIK). All Linuxes are also moving to UTF-8 by default. There do exist vocal minorities that prefer other encodings though. :-) I'm, personally, using UTF-8 as the narrow encoding on Windows as well. The reason it works is that if one always gets a path object from the FS library it doesn't matter that the encoding doesn't match the ANSI APIs because they never get used. In addition, UTF-8 is roundtrip-safe, that is, path( p.string() ) produces a path that is equivalent to p; this is not necessarily the case with the system encoding, where some paths cannot be represented as a narrow string.
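The roundtrip property mentioned above can be sketched without any filesystem library. In the example below, std::u32string stands in for the internal wide representation (an assumption; the thread talks about wchar_t), the function names are hypothetical, and the decoder assumes well-formed input. The lossy ASCII conversion stands in for a system encoding that cannot represent every path.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Encode code points as UTF-8 (assumes valid Unicode scalar values).
inline std::string to_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Decode UTF-8 back to code points (no validation; well-formed input assumed).
inline std::u32string from_utf8(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        int extra;  // number of continuation bytes that follow the lead byte
        if (b < 0x80)      { cp = b;        extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else               { cp = b & 0x07; extra = 3; }
        ++i;
        for (int k = 0; k < extra && i < in.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        out += cp;
    }
    return out;
}

// Stand-in for a limited system encoding: non-ASCII code points become '?'.
inline std::string to_ascii_lossy(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in)
        out += (cp < 0x80) ? static_cast<char>(cp) : '?';
    return out;
}

// Usage: UTF-8 survives the round trip, the limited encoding does not.
inline void roundtrip_demo() {
    const std::u32string p = U"caf\u00E9/\u65E5\u672C";
    assert(from_utf8(to_utf8(p)) == p);
    assert(to_ascii_lossy(p).size() == p.size());
    assert(from_utf8(to_ascii_lossy(p)) != p);
}
```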

See http://svn.boost.org/svn/boost/sandbox/filesystem-v3/libs/filesystem/doc/v3_... for a fuller description of the prototype and links to the prototype implementation.
locale( "" ) is not correct on Mac OS X, where the API always takes UTF-8 char* and the filesystem uses wchar_t internally (AFAIK).
This is irrelevant from a practical point of view, but the standard requires that "" be recognised as a valid locale: "explicit locale(const char* std_name ); Effects: Constructs a locale using standard C locale names, e.g. "POSIX". The resulting locale implements semantics defined to be associated with that name. Throws: runtime_error if the argument is not valid, or is null. Remarks: The set of valid string argument values is "C", "", and any implementation-defined values." John.
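A small sketch of what the quoted wording means in practice: std::locale("") asks for the user-preferred locale, and the constructor is specified to throw std::runtime_error for an argument the implementation does not accept. The helper name preferred_locale_name is hypothetical, and the fallback to the classic locale is just one possible policy.

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>
#include <string>

// Returns the name of the user-preferred locale, i.e. std::locale("").
// Falls back to the classic "C" locale if construction throws, for example
// when the environment names a locale the implementation cannot provide.
inline std::string preferred_locale_name() {
    try {
        return std::locale("").name();
    } catch (const std::runtime_error&) {
        return std::locale::classic().name();
    }
}

// Usage: the result depends on the environment, so only check it is non-empty.
inline void locale_demo() {
    assert(!preferred_locale_name().empty());
    assert(std::locale("C").name() == "C");
}
```

Whether the locale obtained this way is the right one for converting paths is a separate question, which is the point raised about Mac OS X.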

John Maddock:
See http://svn.boost.org/svn/boost/sandbox/filesystem-v3/libs/filesystem/doc/v3_... for a fuller description of the prototype and links to the prototype implementation.
locale( "" ) is not correct on Mac OS X, where the API always takes UTF-8 char* and the filesystem uses wchar_t internally (AFAIK).
This is irrelevant from a practical point of view, but the standard requires that "" be recognised as a valid locale:
"" is a valid locale, but it is not correct to use it for path conversions. Paths on Mac OS X are always UTF-8 (when viewed through the POSIX API). (They are also normalized in a variation of form D, but this can probably be ignored for our purposes.) Disclaimer: this is what I read in Apple's docs. I have never used Mac OS X myself. :-)

On Sunday 01 February 2009 18:53:11 Beman Dawes wrote:
A prototype implementation of version 3 of Boost.Filesystem is available for comment.
I have one specific question about how your proposal handles the following case: Imagine a system where strings for filesystems are by default encoded in a certain encoding, e.g. Latin-1. Now, I mount a CD that is encoded in UTF-8. The system itself neither enforces Latin-1 nor does it in any way validate the mounted filesystem, so I might have a path that contains parts in Latin-1 and parts in UTF-8. How does your proposed filesystem library handle such a beast? Uli

On Fri, Feb 6, 2009 at 1:22 PM, Ulrich Eckhardt <doomster@knuut.de> wrote:
On Sunday 01 February 2009 18:53:11 Beman Dawes wrote:
A prototype implementation of version 3 of Boost.Filesystem is available for comment.
I have one specific question about how your proposal handles the following case: Imagine a system where strings for filesystems are by default encoded in a certain encoding, e.g. Latin-1. Now, I mount a CD that is encoded in UTF-8. The system itself neither enforces Latin-1 nor does it in any way validate the mounted filesystem, so I might have a path that contains parts in Latin-1 and parts in UTF-8.
Yes, and I think that is possible whether you are using Boost.Filesystem or native API calls.
How does your proposed filesystem library handle such a beast?
You could decompose the path, setting the appropriate locale/codecvt facet at each step of the way. That assumes you know how each element of the path is encoded. Alternately, you could decompose the path into elements of path::string_type, and then convert the elements that need it to whatever other encoding you wish. So Boost.Filesystem helps with some of the details, but it is still really up to you to know what encodings are involved. Did that answer your question? --Beman
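The second approach described above (decompose into elements, then convert only the elements that need it) might look roughly like the following. This is a sketch under stated assumptions: it works on raw byte strings rather than on a path object, uses a hand-written Latin-1 to UTF-8 conversion instead of a codecvt facet, and all names (element_encoding, split_elements, normalize_to_utf8) are hypothetical. As noted above, the caller still has to know how each element is encoded.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class element_encoding { latin1, utf8 };  // hypothetical tag per element

// Split a raw byte path on '/' (empty elements are kept so joining is lossless).
inline std::vector<std::string> split_elements(const std::string& raw) {
    std::vector<std::string> elements;
    std::string current;
    for (char c : raw) {
        if (c == '/') { elements.push_back(current); current.clear(); }
        else          { current += c; }
    }
    elements.push_back(current);
    return elements;
}

// Latin-1 bytes map directly to code points U+0000..U+00FF.
inline std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    for (char ch : in) {
        unsigned char b = static_cast<unsigned char>(ch);
        if (b < 0x80) {
            out += ch;
        } else {
            out += static_cast<char>(0xC0 | (b >> 6));
            out += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    return out;
}

// Convert each element according to its known encoding; elements without an
// entry in `encodings` are assumed to be UTF-8 already.
inline std::string normalize_to_utf8(const std::string& raw,
                                     const std::vector<element_encoding>& encodings) {
    const std::vector<std::string> elements = split_elements(raw);
    std::string out;
    for (std::size_t i = 0; i < elements.size(); ++i) {
        if (i != 0) out += '/';
        const bool is_latin1 =
            i < encodings.size() && encodings[i] == element_encoding::latin1;
        out += is_latin1 ? latin1_to_utf8(elements[i]) : elements[i];
    }
    return out;
}

// Usage: first element is Latin-1 ("caf" + 0xE9), second is already UTF-8.
inline void mixed_encoding_demo() {
    std::vector<element_encoding> encs;
    encs.push_back(element_encoding::latin1);
    encs.push_back(element_encoding::utf8);
    assert(normalize_to_utf8("caf\xE9/\xE6\x97\xA5", encs) ==
           "caf\xC3\xA9/\xE6\x97\xA5");
}
```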

Beman Dawes <bdawes@acm.org> wrote:
A prototype implementation of version 3 of Boost.Filesystem is available for comment.
This was also true in the old version, but one thing that bothers me is that the filesystem library defines a number of types: regular_file, directory_file, fifo_file, symlink_file, etc. Why does it append "_file" to the end of all of the names? Why is it not just regular_file, directory, fifo, symlink, etc.? The corresponding identification functions are named is_directory and is_symlink, which seems inconsistent. Cheers, Walter Landry wlandry@caltech.edu p.s. I noticed that you changed the names of "nlink" and "readlink" to the more clear names "hard_link_count" and "read_symlink". This might be needlessly confusing for those familiar with Unix, but even Ken Thompson said that, if he had to do it over again, he would have spelled "creat" with an "e" ;)

On Sun, Feb 8, 2009 at 5:12 AM, Walter Landry <wlandry@caltech.edu> wrote:
Beman Dawes <bdawes@acm.org> wrote:
A prototype implementation of version 3 of Boost.Filesystem is available for comment.
This was also true in the old version, but one thing that bothers me is that the filesystem library defines a number of types: regular_file, directory_file, fifo_file, symlink_file, etc. Why does it append "_file" to the end of all of the names? Why is it not just regular_file, directory, fifo, symlink, etc.? The corresponding identification functions are named is_directory and is_symlink, which seems inconsistent.
It is inconsistent. _file was appended to avoid polluting the namespace with enumeration constant names "regular", "directory", etc, that might later be needed for other purposes. In C++0x, the preferred practice for such constants is probably going to be to use a scoped enum:
enum class file_type { status_unknown, file_not_found, regular, directory, symlink, block, character, fifo, socket, type_unknown };
That's a lot more satisfying. We can emulate that approach in C++03 like this:
#ifndef BOOST_NO_SCOPED_ENUM
enum class file_type { status_unknown, file_not_found, regular, directory, symlink, block, character, fifo, socket, type_unknown };
typedef file_type file_type_enum;
#else
namespace file_type { enum types { status_unknown, file_not_found, regular, directory, symlink, block, character, fifo, socket, type_unknown }; }
typedef file_type::types file_type_enum;
#endif
Is such a scoped enum approach worth taking? I haven't made up my mind; I'd like to think about it for a few days and also see what could be done to preserve existing code.
Cheers, Walter Landry wlandry@caltech.edu
p.s. I noticed that you changed the names of "nlink" and "readlink" to the more clear names "hard_link_count" and "read_symlink". This might be needlessly confusing for those familiar with Unix, but even Ken Thompson said that, if he had to do it over again, he would have spelled "creat" with an "e" ;)
I really prefer the clearer names:-) Thanks, --Beman

----- Original Message ----- From: "Beman Dawes" <bdawes@acm.org> To: <boost@lists.boost.org> Sent: Wednesday, February 11, 2009 4:50 PM Subject: Re: [boost] [filesystem] Version 3 request for comments
In C++0x, the preferred practice for such constants is probably going to be to use a scoped enum:
enum class file_type { status_unknown, file_not_found, regular, directory, symlink, block, character, fifo, socket, type_unknown };
That's a lot more satisfying. We can emulate that approach in C++03 like this:
#ifndef BOOST_NO_SCOPED_ENUM
enum class file_type { status_unknown, file_not_found, regular, directory, symlink, block, character, fifo, socket, type_unknown };
typedef file_type file_type_enum;
#else
namespace file_type { enum types { status_unknown, file_not_found, regular, directory, symlink, block, character, fifo, socket, type_unknown }; }
typedef file_type::types file_type_enum;
#endif
Is such a scoped enum approach worth taking?
I haven't made up my mind; I'd like to think about it for a few days and also see what could be done to preserve existing code.
Hi, the question is how the user will use these enumerations. If BOOST_NO_SCOPED_ENUM is not defined, the user can write
file_type e = file_type::file_not_found;
which evidently will not compile when it is defined. How can we force the user to write
file_type_enum e = file_type::file_not_found;
which is not intuitive, i.e. there is no explicit relation between file_type_enum and file_type. What about making this relation explicit using a metafunction? I don't know if the following is clearer:
enumeration<file_type>::type e = file_type::file_not_found;
Best, Vicente
__________________________________
#ifndef BOOST_NO_SCOPED_ENUM
template <typename E> struct enumeration { typedef E type; };
#else
template <typename E> struct enumeration { typedef typename E::type type; };
#endif
#ifndef BOOST_NO_SCOPED_ENUM
enum class file_type { status_unknown, file_not_found, regular, directory, symlink, block, character, fifo, socket, type_unknown };
typedef file_type file_type_enum;
#else
struct file_type { enum type { status_unknown, file_not_found, regular, directory, symlink, block, character, fifo, socket, type_unknown }; };
typedef file_type::type file_type_enum;
#endif

On Wed, Feb 11, 2009 at 11:18 AM, vicente.botet <vicente.botet@wanadoo.fr> wrote:
Hi, the question is how the user will use these enumerations. If BOOST_NO_SCOPED_ENUM is not defined, the user can write
file_type e = file_type::file_not_found;
which evidently will not compile when it is defined. How can we force the user to write
file_type_enum e = file_type::file_not_found;
which is not intuitive, i.e. there is no explicit relation between file_type_enum and file_type.
Agreed.
What about making this relation explicit using a metafunction?
I don't know if the following is clearer
enumeration<file_type>::type e = file_type::file_not_found;
That doesn't seem any clearer. Maybe the enum should just be moved into class file_status. That would limit the scope of the constants, so the _file could be dropped. Might be simpler, and there is no C++03 compatibility issue. --Beman
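The alternative floated here, nesting the enum inside class file_status, could be sketched as below. The enumerator list is taken from the thread with the _file suffix dropped; everything else (the enum name type_enum, the constructor, the type() accessor and the free function is_directory taking a file_status) is an assumption for illustration, not the library's actual interface.

```cpp
#include <cassert>

// Hypothetical sketch: the enum lives inside the class, so its constants are
// scoped as file_status::regular, file_status::directory, and so on.
// This is plain C++03; no scoped-enum support is required.
class file_status {
public:
    enum type_enum {
        status_unknown, file_not_found, regular, directory, symlink,
        block, character, fifo, socket, type_unknown
    };

    explicit file_status(type_enum t = status_unknown) : type_(t) {}

    type_enum type() const { return type_; }

private:
    type_enum type_;
};

// The constant name now matches the predicate name without a suffix.
inline bool is_directory(const file_status& s) {
    return s.type() == file_status::directory;
}

// Usage: constants are qualified by the class, not by a namespace or enum class.
inline void file_status_demo() {
    assert(is_directory(file_status(file_status::directory)));
    assert(!is_directory(file_status(file_status::regular)));
}
```

One trade-off of this layout is that unqualified names such as regular and directory stay free in the enclosing namespace, at the cost of slightly longer qualified names at call sites.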

Beman Dawes wrote:
#ifndef BOOST_NO_SCOPED_ENUM
enum class file_type { status_unknown, file_not_found, regular, directory, symlink, block, character, fifo, socket, type_unknown };
typedef file_type file_type_enum;
#else
namespace file_type { enum types { status_unknown, file_not_found, regular, directory, symlink, block, character, fifo, socket, type_unknown }; }
typedef file_type::types file_type_enum;
#endif
Is such a scoped enum approach worth taking?
I didn't dig into the code, but do file_not_found and status_unknown really describe the type of the file? Should they be extracted to a separate enum? FWIW, I often use such a technique and find it very useful. The users' code becomes much clearer, although more verbose (I can live with it). I vote for it. PS: I would also s/type_unknown/unknown/, if status_unknown was extracted.

A prototype implementation of version 3 of Boost.Filesystem is available for comment.
Have you now merged it to the trunk? nearly all tests now fail for cygwin. I will try to run the tests with cygwin 1.7 to see if it is better. F. Bron

On Fri, Apr 3, 2009 at 4:33 AM, <frederic.bron@alcan.com> wrote:
A prototype implementation of version 3 of Boost.Filesystem is available for comment.
Have you now merged it to the trunk?
No, and I won't merge it to trunk until code breakage is reduced.
nearly all tests now fail for cygwin. I will try to run the tests with cygwin 1.7 to see if it is better.
I'm really hoping Cygwin will add wide string support soon. The stumbling block was Newlib, and the critical and difficult fixes have now been merged into the Newlib trunk. AFAIK, the only remaining task is for Cygwin's libstdc++ configuration to enable wide strings. You might query the Cygwin mailing list to inquire about progress, and report back if they have any time estimate when wide string support might actually ship. Thanks, --Beman

On Fri, Apr 10, 2009 at 7:37 AM, Beman Dawes <bdawes@acm.org> wrote:
nearly all tests now fail for cygwin. I will try to run the tests with cygwin 1.7 to see if it is better.
I'm really hoping Cygwin will add wide string support soon. The stumbling block was Newlib, and the critical and difficult fixes have now been merged into the Newlib trunk. AFAIK, the only remaining task is for Cygwin's libstdc++ configuration to enable wide strings.
You might query the Cygwin mailing list to inquire about progress, and report back if they have any time estimate when wide string support might actually ship.
Releasing boost.filesystem which requires wide strings support is a good way to get cygwin users to complain. So the failing cygwin tests are a good thing, IMO. Emil Dotchevski Reverge Studios, Inc. http://www.revergestudios.com/reblog/index.php?n=ReCode

Have you now merged it to the trunk?
No, and I won't merge it to trunk until code breakage is reduced.
nearly all tests now fail for cygwin. I will try to run the tests with cygwin 1.7 to see if it is better.
I'm really hoping Cygwin will add wide string support soon. The stumbling block was Newlib, and the critical and difficult fixes have now been merged into the Newlib trunk. AFAIK, the only remaining task is for Cygwin's libstdc++ configuration to enable wide strings.
You might query the Cygwin mailing list to inquire about progress, and report back if they have any time estimate when wide string support might actually ship.
It appears that it is no longer possible to run the regression tests on cygwin (I think that the definition of BOOST_FILESYSTEM_NARROW_ONLY has changed when building process_jam_log), and as I have been told that running the tests on a USB FAT32 disk is not a good idea, I have stopped running them; I do not have enough space on my internal hard disk (maybe this could change later). I wanted to run the tests with the beta version of cygwin 1.7, which says it has wide char support, but I couldn't... Frédéric
participants (11)
- Andrey Semashev
- Beman Dawes
- Edd Dawson
- Emil Dotchevski
- frederic.bron@alcan.com
- John Maddock
- Jorge Lodos Vigil
- Peter Dimov
- Ulrich Eckhardt
- vicente.botet
- Walter Landry