[filesystem] Version 3 of Boost.Filesystem added to trunk

Because version 3 will break some user code, both v2 and v3 will be shipped for several releases. For 1.44, the default is v2 and the user has to explicitly switch to v3.
I'd really appreciate it if some Boosters could give v3 a try on your own code that uses Boost.Filesystem.
If you already have trunk on your machine, just run "svn up" on it. Otherwise, checkout a copy:
    mkdir boost-trunk
    svn co http://svn.boost.org/svn/boost/trunk boost-trunk
If you prefer, "svn export http://svn.boost.org/svn/boost/trunk boost-trunk" should also work.
Then read boost-trunk/libs/filesystem/dual_versions.html, and follow the instructions there.
Then please let me know your results:
* Are the instructions for changing to version 3 sufficient? They are pretty minimal, but I'm assuming those doing this have a high level of background knowledge.
* Did you have any problem changing to version 3?
* Did you have problems compiling your code against version 3?
* If there were compile errors, do you have any ideas about how to mitigate the problem?
* Did you have any problems testing or running your code with version 3?
* Any other comments or suggestions?
Thanks, --Beman

On 2 June 2010 11:07, Beman Dawes <bdawes@acm.org> wrote:
Because version 3 will break some user code, both v2 and v3 will be shipped for several releases. For 1.44, the default is v2 and the user has to explicitly switch to v3.
I notice that you have to rebuild boost.filesystem when switching versions. Does that mean that you can't use both from one install of boost? I would have assumed that there'd be filesystem_v2 and filesystem_v3 include directories, namespaces, and libraries. The filesystem directory would look at a #define to pick which headers to include and which namespace to mention in a using directive. The switching code would simply change the default of the #define and symlink the appropriate library. My thinking here is that I'd like to be able to install boost on my system, have old code pick up the v2 default, but still be able to use v3 explicitly in new code, should I so choose. ~ Scott McMurray
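The layout described in this message can be sketched in plain C++ without Boost. This is only an illustration under stated assumptions: the names `demo`, `filesystem_v2`, `filesystem_v3`, `version()`, and the macro `DEMO_FILESYSTEM_VERSION` are hypothetical, and a namespace alias stands in for the using directive mentioned above.

```cpp
#include <cassert>

// Hypothetical selector macro. The installed default is v2 unless the
// user defines the macro before this point.
#ifndef DEMO_FILESYSTEM_VERSION
#define DEMO_FILESYSTEM_VERSION 2
#endif

namespace demo {

// Both versions are always present, each in its own namespace.
namespace filesystem_v2 {
inline int version() { return 2; }
}

namespace filesystem_v3 {
inline int version() { return 3; }
}

// The unversioned name is only an alias for whichever version was selected.
#if DEMO_FILESYSTEM_VERSION == 3
namespace filesystem = filesystem_v3;
#else
namespace filesystem = filesystem_v2;
#endif

} // namespace demo
```

With this arrangement, old code that names `demo::filesystem` picks up the default, while new code can spell out `demo::filesystem_v3` regardless of how the macro is set.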

On Wed, Jun 2, 2010 at 1:37 PM, Scott McMurray <me22.ca+boost@gmail.com> wrote:
On 2 June 2010 11:07, Beman Dawes <bdawes@acm.org> wrote:
Because version 3 will break some user code, both v2 and v3 will be shipped for several releases. For 1.44, the default is v2 and the user has to explicitly switch to v3.
I notice that you have to rebuild boost.filesystem when switching versions. Does that mean that you can't use both from one install of boost?
That's correct.
I would have assumed that there'd be filesystem_v2 and filesystem_v3 include directories, namespaces, and libraries. The filesystem directory would look at a #define to pick which headers to include and which namespace to mention in a using directive.
Anything like that I could think of seemed more work than it is worth. Much existing code, I suspect, will compile unchanged on v3. If it turns out there is lots of existing code that won't "just work" on V3, I want to find out the details before deciding on a course of action. It may be that v3 needs modifications.
The switching code would simply change the default of the #define and symlink the appropriate library.
I considered a symlink approach, but that's a problem on some platforms.
My thinking here is that I'd like to be able to install boost on my system, have old code pick up the v2 default, but still be able to use v3 explicitly in new code, should I so choose.
That would make a lot of sense if it turns out there is a lot of v2 based code that won't work with v3. Particularly code that isn't trivial to change to work with v3. So let me know if you are running into a lot of problems when you try your existing code. --Beman

On 2 June 2010 15:04, Beman Dawes <bdawes@acm.org> wrote:
Much existing code, I suspect, will compile unchanged on v3. If it turns out there is lots of existing code that won't "just work" on V3,
I suspect so as well, but it only takes one package that needs v2 for a linux distro to ship with v2, and if so using v3 there requires a supplementary boost install. What's the rationale for not making v3 the default right away?

On Wed, Jun 2, 2010 at 11:23 PM, Scott McMurray <me22.ca+boost@gmail.com> wrote:
What's the rationale for not making v3 the default right away?
To limit initial exposure to early adopters so that it is easier to deal with any fallout from what is a really major upgrade. That said, I'm going to be testing to see what the impact of v3 as default would be on Boost libraries and tools. If that goes smoothly, maybe we might want to consider shipping with v3 as default. Thanks, --Beman

On Wed, Jun 2, 2010 at 1:37 PM, Scott McMurray <me22.ca+boost@gmail.com> wrote:
On 2 June 2010 11:07, Beman Dawes <bdawes@acm.org> wrote: ... I would have assumed that there'd be filesystem_v2 and filesystem_v3 include directories, namespaces, and libraries. The filesystem directory would look at a #define to pick which headers to include and which namespace to mention in a using directive. The switching code would simply change the default of the #define and symlink the appropriate library.
FWIW, I'm prototyping this approach to better understand what is involved. So-far-so-good. No need for symlinks. Taking somewhat less time than I expected. Stay tuned... --Beman

Hi Beman,
I tried converting quickbook last night. It was pretty simple, there were a few compile errors but I was able to fix them all by either using the replacement methods or removing 'fs::native' from the constructor. That's for g++ 4.5 on darwin. There were a few compile errors for Clang, I'll look into them if you want.
If v2 is default, does that mean that v3 won't be going through our regression tests?
Some comments on the parts of the documentation I looked at:
http://svn.boost.org/svn/boost/trunk/libs/filesystem/v3/libs/filesystem/doc/...
It says 'native_file_string' is now 'file_string', which doesn't exist. According to the reference it's either 'native', or perhaps one of the 'string' methods. I think you should also note in the reference that 'native' returns a reference while the other methods return by copy, it's implicit but easily overlooked.
http://svn.boost.org/svn/boost/trunk/libs/filesystem/v3/libs/filesystem/doc/...
Since the home page says that cygwin uses the windows api, I assume the description of absolute paths on cygwin is wrong.
http://svn.boost.org/svn/boost/trunk/libs/filesystem/v3/libs/filesystem/doc/...
I read this to find out about generic and native paths but completely missed that the next section describes how to get the formats from a path object, as I didn't realise what a 'path observer' is. Something along the lines of, 'Methods for accessing these formats are covered in the next section.' would be clearer. It'd also help if you linked back to the section that describes 'tut4'. I jumped into the middle of the tutorial and had to look back to find out what it did.
Daniel

Hi Daniel,
I tried converting quickbook last night. It was pretty simple, there were a few compile errors but I was able to fix them all by either using the replacement methods or removing 'fs::native' from the constructor. That's for g++ 4.5 on darwin.
Hum... The configuration is currently such that you aren't getting all the transition aids. I'm pretty sure quickbook should work out of the box without any changes. Let me give it a try. I'll get back to you once a fix is committed. --Beman PS: I'll also reply to your docs suggestions, but I'd like to clear the configuration issue first.

On Thu, Jun 3, 2010 at 7:20 PM, Beman Dawes <bdawes@acm.org> wrote:
Hi Daniel,
I tried converting quickbook last night. It was pretty simple, there were a few compile errors but I was able to fix them all by either using the replacement methods or removing 'fs::native' from the constructor. That's for g++ 4.5 on darwin.
Hum... The configuration is currently such that you aren't getting all the transition aids. I'm pretty sure quickbook should work out of the box without any changes. Let me give it a try. I'll get back to you once a fix is committed.
OK, after revision 62420, quickbook works as is with v3. While it is great when existing code works with v3 without changes, it would be better for the long run to avoid use of deprecated names and features. All new code should define BOOST_FILESYSTEM_NO_DEPRECATED. I'm working on better docs for transitioning existing code. Thanks, --Beman
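As a rough sketch of how a macro such as BOOST_FILESYSTEM_NO_DEPRECATED can gate transition aids, consider the toy class below. It is not the Boost.Filesystem implementation: `demo::path` and `DEMO_NO_DEPRECATED` are hypothetical names, and forwarding the old `native_file_string()` name to `string()` is an assumption made purely for illustration.

```cpp
#include <cassert>
#include <string>

namespace demo {

// Toy stand-in for a path class (hypothetical, for illustration only).
class path {
public:
    explicit path(const std::string& s) : s_(s) {}

    // Current name.
    std::string string() const { return s_; }

#ifndef DEMO_NO_DEPRECATED
    // Old name kept as a transition aid. Defining DEMO_NO_DEPRECATED
    // removes it, so any remaining use becomes a compile error.
    std::string native_file_string() const { return string(); }
#endif

private:
    std::string s_;
};

} // namespace demo
```

Existing code keeps compiling by default, while new code that defines the macro before including the header can no longer call the old name by accident.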

On Thu, Jun 3, 2010 at 6:40 AM, Daniel James <dnljms@gmail.com> wrote:
... Some comments on the parts of the documentation I looked at: ...
I just committed a bunch of doc changes that should address all those issues. Thanks, --Beman

On Wed, 2 Jun 2010 11:07:34 -0400, Beman Dawes wrote:
* Any other comments or suggestions?
Some code I have wraps a SFTP library that returns paths as UTF-8 strings. These need to be converted to the local code page before displaying to the user. My original plan was to typedef basic_path with a traits class of my own that I could imbue with a specific locale to do the conversion. With FSv3 the only way to do this seems to be path::imbue() which imbues *all* paths, globally. Or am I missing something? Alex -- SFTP for Windows Explorer (http://www.swish-sftp.org)

On Thu, Jun 3, 2010 at 6:48 PM, Alexander Lamaison <awl03@doc.ic.ac.uk> wrote:
On Wed, 2 Jun 2010 11:07:34 -0400, Beman Dawes wrote:
* Any other comments or suggestions?
Some code I have wraps a SFTP library that returns paths as UTF-8 strings. These need to be converted to the local code page before displaying to the user. My original plan was to typedef basic_path with a traits class of my own that I could imbue with a specific locale to do the conversion.
With FSv3 the only way to do this seems to be path::imbue() which imbues *all* paths, globally. Or am I missing something?
Yep, there is another way. First convert the UTF-8 string to a UTF-16 string using software of your choice. Then construct a boost::filesystem::path with the UTF-16 string as an argument. Alternately, assign the UTF-16 string to a boost::filesystem::path. Say the resulting path is named p. Then p.string() will return a std::string containing the path in native format, native codepage. Does that solve your problem? --Beman
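One possible shape for the "software of your choice" step is a small hand-written converter. The sketch below is an assumption for illustration and not part of Boost.Filesystem: `demo::utf8_to_utf16` is a hypothetical helper, it returns a `std::u16string` (on Windows the code units would be copied into a `std::wstring` before being handed to a path), and it is deliberately minimal in that it does not reject overlong encodings or encoded surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

namespace demo {

// Converts UTF-8 to UTF-16. Throws std::invalid_argument on a bad lead
// byte, a bad continuation byte, truncated input, or a value > U+10FFFF.
inline std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Outside the BMP: encode as a high/low surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

} // namespace demo
```

The result would then be used to construct or assign to a path as described above; that step is left out here because it depends on Boost.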

On Thu, 3 Jun 2010 20:04:26 -0400, Beman Dawes wrote:
On Thu, Jun 3, 2010 at 6:48 PM, Alexander Lamaison <awl03@doc.ic.ac.uk> wrote:
On Wed, 2 Jun 2010 11:07:34 -0400, Beman Dawes wrote:
* Any other comments or suggestions?
Some code I have wraps a SFTP library that returns paths as UTF-8 strings. These need to be converted to the local code page before displaying to the user. My original plan was to typedef basic_path with a traits class of my own that I could imbue with a specific locale to do the conversion.
With FSv3 the only way to do this seems to be path::imbue() which imbues *all* paths, globally. Or am I missing something?
Yep, there is another way. First convert the UTF-8 string to a UTF-16 string using software of your choice. Then construct a boost::filesystem::path with the UTF-16 string as an argument. Alternately, assign the UTF-16 string to a boost::filesystem::path.
Say the resulting path is named p. Then p.string() will return a std::string containing the path in native format, native codepage.
Does that solve your problem?
Possibly, I'll get back to you once I've actually tried implementing something.
My next question relates to the Windows API A/W variants. In another library I've written many generic functions that wrap these kinds of API calls by taking a basic_path and selecting the A or W variant depending on whether the caller passes a basic_string<char> or a basic_string<wchar_t>. I imagine this is going to be hard to port to v3 if I still want the caller to be able to select the A or W variant.
Is there any reason why I should even care about being able to call the A variants? My original theory was that someone could write a program that always used basic_string<char> and could use that on Win9x but I've not actually tested that theory.
Alex -- SFTP for Windows Explorer (http://www.swish-sftp.org)
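The A/W selection described in this message can be sketched with ordinary overloading on the string's character type. Everything below is a hypothetical stand-in: `create_directory_a` and `create_directory_w` merely report which variant was chosen, where real code on Windows would call a pair such as CreateDirectoryA and CreateDirectoryW.

```cpp
#include <cassert>
#include <string>

namespace demo {

// Stand-ins for a narrow ("A") and wide ("W") API pair. They only report
// which variant was selected; no directory is created.
inline char create_directory_a(const char*)    { return 'A'; }
inline char create_directory_w(const wchar_t*) { return 'W'; }

// The caller selects the variant through the character type passed in.
inline char create_directory(const std::basic_string<char>& p) {
    return create_directory_a(p.c_str());
}

inline char create_directory(const std::basic_string<wchar_t>& p) {
    return create_directory_w(p.c_str());
}

} // namespace demo
```

With a single non-template path type the character type is no longer visible at the call site, which is why porting this pattern to v3 is not straightforward.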

Would it be possible to have some constructors (or static functions) to create a path from an UTF-8 or UTF-16 string?

On Fri, Jun 4, 2010 at 3:57 AM, Sylvain Pointeau <sylvain.pointeau@gmail.com> wrote:
Would it be possible to have some constructors (or static functions) to create a path from an UTF-8 or UTF-16 string?
UTF-16 via wchar_t is already supported for Windows (and possibly other environments) where wchar_t is a UTF-16 type.
Also, UTF-16 will be supported for all c++0x compilers via char16_t strings.
UTF-8 strings are already supported for environments like Mac OS X where UTF-8 is the default narrow string encoding.
UTF-8 strings are already supported for all environments, via path::imbue().
Temporary use of any locale is already supported via class scoped_path_locale.
Now it would also be possible to provide a constructor, etc, that takes a locale, but is that worth cluttering the interface? I went back and forth, on that, but the use cases I could think of were not very compelling.
--Beman

On 6/4/2010 7:06 AM, Beman Dawes wrote:
UTF-16 via wchar_t is already supported for Windows (and possibly other environments) where wchar_t is a UTF-16 type.
IIRC, Windows uses wchar_t for UCS-2, not UTF-16. No surrogate pairs. -- Eric Niebler BoostPro Computing http://www.boostpro.com

On Fri, Jun 4, 2010 at 9:17 AM, Eric Niebler <eric@boostpro.com> wrote:
On 6/4/2010 7:06 AM, Beman Dawes wrote:
UTF-16 via wchar_t is already supported for Windows (and possibly other environments) where wchar_t is a UTF-16 type.
IIRC, Windows uses wchar_t for UCS-2, not UTF-16. No surrogate pairs.
My understanding is that Windows switched to UTF-16 a while ago.
http://msdn.microsoft.com/en-us/library/dd374081%28VS.85%29.aspx says "Unicode-enabled functions in Windows use UTF-16"
http://en.wikipedia.org/wiki/UTF-16/UCS-2 says "UTF-16 is the native internal representation of text in the Microsoft Windows 2000/XP/2003/Vista/CE..." and "Older Windows NT systems (prior to Windows 2000) only support UCS-2."
I haven't actually run tests to see if surrogate pairs are handled correctly in filenames.
--Beman

On 6/4/2010 11:01 AM, Beman Dawes wrote:
On Fri, Jun 4, 2010 at 9:17 AM, Eric Niebler <eric@boostpro.com> wrote:
IIRC, Windows uses wchar_t for UCS-2, not UTF-16. No surrogate pairs.
My understanding is that Windows switched to UTF-16 awhile ago.
<snip> Hawt! My information was out of date. I'll stop spreading FUD. -- Eric Niebler BoostPro Computing http://www.boostpro.com

On 4 June 2010 11:09, Eric Niebler <eric@boostpro.com> wrote:
On 6/4/2010 11:01 AM, Beman Dawes wrote:
My understanding is that Windows switched to UTF-16 awhile ago.
Hawt! My information was out of date. I'll stop spreading FUD.
The APIs can perhaps handle it, but I'm not convinced that any tools do. Even Explorer in Windows 7 still requires that you hit backspace twice to delete a character outside the BMP. In contrast, Nautilus gets this right in Linux.
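The UCS-2 versus UTF-16 distinction in this exchange comes down to surrogate pairs: a character outside the BMP occupies two 16-bit code units, so software that treats each unit as a character sees two items where the user sees one. A minimal sketch, assuming a hypothetical helper `demo::count_code_points` that pairs a high surrogate with a following low surrogate and counts anything else as one code point:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace demo {

// Counts code points in a UTF-16 string. A high surrogate (D800..DBFF)
// immediately followed by a low surrogate (DC00..DFFF) counts once;
// every other code unit, including an unpaired surrogate, counts once.
inline std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t u = s[i];
        const bool high = u >= 0xD800 && u <= 0xDBFF;
        const bool next_low = i + 1 < s.size()
                              && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        if (high && next_low) {
            ++i;  // skip the low half of the pair
        }
        ++n;
    }
    return n;
}

} // namespace demo
```

A tool that deletes one code unit per backspace needs two presses for such a character, which matches the Explorer behaviour reported above.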

AMDG Beman Dawes wrote:
On Fri, Jun 4, 2010 at 3:57 AM, Sylvain Pointeau <sylvain.pointeau@gmail.com> wrote:
Would it be possible to have some constructors (or static functions) to create a path from an UTF-8 or UTF-16 string?
UTF-16 via wchar_t is already supported for Windows (and possibly other environments) where wchar_t is a UTF-16 type.
Also, UTF-16 will be supported for all c++0x compilers via char16_t strings.
UTF-8 strings are already supported for environments like Mac OS X where UTF-8 is the default narrow string encoding.
UTF-8 strings are already supported for all environments, via path::imbue().
Temporary use of any locale is already supported via class scoped_path_locale.
Is this thread-safe?
Now it would also be possible to provide a constructor, etc, that takes a locale, but is that worth cluttering the interface? I went back and forth, on that, but the use cases I could think of were not very compelling.
In Christ, Steven Watanabe

On 06/04/2010 08:55 PM, Steven Watanabe wrote:
Is this thread-safe?
It seems not. It uses path_locale function that has an unprotected function-local static. I can also see a few namespace-scope non-POD variables in *.cpp.
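The pattern being described can be sketched as follows. This is a toy reconstruction under assumptions, not the code in path.hpp: the names mirror the ones mentioned in the thread but live in a hypothetical `demo` namespace, and the guard simply swaps a process-wide `std::locale`.

```cpp
#include <cassert>
#include <locale>

namespace demo {

// Process-wide locale shared by every conversion. C++11 makes the
// initialisation of the static thread-safe, but nothing protects the
// later reads and writes through the returned reference.
inline std::locale& path_locale() {
    static std::locale loc = std::locale::classic();
    return loc;
}

// RAII guard: installs a locale on construction, restores the previous
// one on destruction. Another thread converting a path while the guard
// is alive observes the temporary locale, and concurrent guards race on
// the shared object.
class scoped_path_locale {
public:
    explicit scoped_path_locale(const std::locale& loc)
        : saved_(path_locale()) {
        path_locale() = loc;
    }
    ~scoped_path_locale() { path_locale() = saved_; }

    scoped_path_locale(const scoped_path_locale&) = delete;
    scoped_path_locale& operator=(const scoped_path_locale&) = delete;

private:
    std::locale saved_;
};

} // namespace demo
```

In a single thread the guard behaves as expected; the problem only appears once two threads touch `path_locale()` at the same time.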

On Fri, Jun 4, 2010 at 12:55 PM, Steven Watanabe <watanabesj@gmail.com> wrote:
AMDG
Beman Dawes wrote:
...
Temporary use of any locale is already supported via class scoped_path_locale.
Is this thread-safe?
No. Good point. That's probably a strong argument for adding a constructor that takes a locale. Since the uses are probably not all that common, maybe additional overloads for other operations can be avoided. Thanks! --Beman

On Fri, 4 Jun 2010 07:06:01 -0400, Beman Dawes wrote:
On Fri, Jun 4, 2010 at 3:57 AM, Sylvain Pointeau <sylvain.pointeau@gmail.com> wrote:
Would it be possible to have some constructors (or static functions) to create a path from an UTF-8 or UTF-16 string? ... Now it would also be possible to provide a constructor, etc, that takes a locale, but is that worth cluttering the interface? I went back and forth, on that, but the use cases I could think of were not very compelling.
For my purposes it would be nice to be able to set a per-path locale rather than a global one. This way, my SFTP paths, which are only ever displayed to the user, can be held in paths that deal in UTF-8/UTF-16 without forcing that upon other use of class path. These other uses can, for example, continue to use local-codepage/UTF-16 making them suitable for passing to Windows API functions. Alex

On Fri, Jun 4, 2010 at 6:37 PM, Alexander Lamaison <awl03@doc.ic.ac.uk> wrote:
On Fri, 4 Jun 2010 07:06:01 -0400, Beman Dawes wrote:
On Fri, Jun 4, 2010 at 3:57 AM, Sylvain Pointeau <sylvain.pointeau@gmail.com> wrote:
Would it be possible to have some constructors (or static functions) to create a path from an UTF-8 or UTF-16 string? ... Now it would also be possible to provide a constructor, etc, that takes a locale, but is that worth cluttering the interface? I went back and forth, on that, but the use cases I could think of were not very compelling.
For my purposes it would be nice to be able to set a per-path locale rather than a global one.
My *strong* objection to a per-path locale is that it adds additional state, which increases size and complicates reasoning. The most likely change would be to add an overload with a locale argument to one or more functions. I'm giving that serious thought.
This way, my SFTP paths, which are only ever displayed to the user, can be held in paths that deal in UTF-8/UTF-16 without forcing that upon other use of class path. These other uses can, for example, continue to use local-codepage/UTF-16 making them suitable for passing to Windows API functions.
I'm concerned enough about your use case to want to firm up a possible interface. I'll post something when I've had a chance to work on it a bit more. --Beman
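An overload with a locale argument, as opposed to per-path or global state, might look roughly like this. The sketch is an assumption for illustration only: `demo::to_wide` is a hypothetical function, and it uses `std::ctype<wchar_t>::widen` as a simple stand-in for the codecvt-based conversion a real path class would perform.

```cpp
#include <cassert>
#include <locale>
#include <string>

namespace demo {

// The locale is supplied per call, so no object carries extra state and
// no global needs to be changed. ctype::widen converts one char at a
// time; it stands in for a real multibyte conversion here.
inline std::wstring to_wide(const std::string& narrow, const std::locale& loc) {
    const std::ctype<wchar_t>& ct = std::use_facet<std::ctype<wchar_t> >(loc);
    std::wstring wide(narrow.size(), L'\0');
    if (!narrow.empty()) {
        ct.widen(narrow.data(), narrow.data() + narrow.size(), &wide[0]);
    }
    return wide;
}

// Convenience overload for callers happy with a default locale.
inline std::wstring to_wide(const std::string& narrow) {
    return to_wide(narrow, std::locale::classic());
}

} // namespace demo
```

Because the locale travels with the call, two threads can convert with different locales at the same time without interfering with each other.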

On Thu, Jun 3, 2010 at 8:32 PM, Alexander Lamaison <awl03@doc.ic.ac.uk> wrote:
On Thu, 3 Jun 2010 20:04:26 -0400, Beman Dawes wrote:
On Thu, Jun 3, 2010 at 6:48 PM, Alexander Lamaison <awl03@doc.ic.ac.uk> wrote:
On Wed, 2 Jun 2010 11:07:34 -0400, Beman Dawes wrote:
* Any other comments or suggestions?
Some code I have wraps a SFTP library that returns paths as UTF-8 strings. These need to be converted to the local code page before displaying to the user. My original plan was to typedef basic_path with a traits class of my own that I could imbue with a specific locale to do the conversion.
With FSv3 the only way to do this seems to be path::imbue() which imbues *all* paths, globally. Or am I missing something?
Yep, there is another way. First convert the UTF-8 string to a UTF-16 string using software of your choice. Then construct a boost::filesystem::path with the UTF-16 string as an argument. Alternately, assign the UTF-16 string to a boost::filesystem::path.
Say the resulting path is named p. Then p.string() will return a std::string containing the path in native format, native codepage.
Does that solve your problem?
Possibly, I'll get back to you once I've actually tried implementing something.
My next question relates to the Windows API A/W variants. In another library I've written many generic functions that wrap these kinds of API calls by taking a basic_path and selecting the A or W variant depending on whether the caller passes a basic_string<char> or a basic_string<wchar_t>. I imagine this is going to be hard to port to v3 if I still want the caller to be able to select the A or W variant.
That's essentially what the v2 implementation of operations functions did. See libs/filesystem/src/operations.cpp. Life got much simpler when I switched to just using the W variants. I'm sorry I ever messed with the A variants.
Is there any reason why I should even care about being able to call the A variants? My original theory was that someone could write a program that always used basic_string<char> and could use that on Win9x but I've not actually tested that theory.
Are you willing to compromise your design and implementation so that someone could use Win9x? I'm not:-) Thanks for the comments and questions! --Beman

On Thu, 3 Jun 2010 20:04:26 -0400, Beman Dawes wrote:
On Thu, Jun 3, 2010 at 6:48 PM, Alexander Lamaison <awl03@doc.ic.ac.uk> wrote:
On Wed, 2 Jun 2010 11:07:34 -0400, Beman Dawes wrote:
* Any other comments or suggestions?
Some code I have wraps a SFTP library that returns paths as UTF-8 strings. These need to be converted to the local code page before displaying to the user. My original plan was to typedef basic_path with a traits class of my own that I could imbue with a specific locale to do the conversion.
With FSv3 the only way to do this seems to be path::imbue() which imbues *all* paths, globally. Or am I missing something?
Yep, there is another way. First convert the UTF-8 string to a UTF-16 string using software of your choice. Then construct a boost::filesystem::path with the UTF-16 string as an argument. Alternately, assign the UTF-16 string to a boost::filesystem::path.
If I construct a path directly using a UTF-8 char string, will the UTF-8 to UTF-16 conversion not be taken care of by codecvt? Alex -- SFTP for Windows Explorer (http://www.swish-sftp.org)

On Fri, Jun 4, 2010 at 5:03 AM, Alexander Lamaison <awl03@doc.ic.ac.uk> wrote:
On Thu, 3 Jun 2010 20:04:26 -0400, Beman Dawes wrote:
On Thu, Jun 3, 2010 at 6:48 PM, Alexander Lamaison <awl03@doc.ic.ac.uk> wrote:
On Wed, 2 Jun 2010 11:07:34 -0400, Beman Dawes wrote:
* Any other comments or suggestions?
Some code I have wraps a SFTP library that returns paths as UTF-8 strings. These need to be converted to the local code page before displaying to the user. My original plan was to typedef basic_path with a traits class of my own that I could imbue with a specific locale to do the conversion.
With FSv3 the only way to do this seems to be path::imbue() which imbues *all* paths, globally. Or am I missing something?
Yep, there is another way. First convert the UTF-8 string to a UTF-16 string using software of your choice. Then construct a boost::filesystem::path with the UTF-16 string as an argument. Alternately, assign the UTF-16 string to a boost::filesystem::path.
If I construct a path directly using a UTF-8 char string, will the UTF-8 to UTF-16 conversion not be taken care of by codecvt?
Yes, if the path locale is UTF-8. I was assuming you were talking about Windows, and didn't want to use path::imbue(). By the way, I completely forgot to mention class scoped_path_locale. See path.hpp, starting about line 452. --Beman

On 06/02/2010 07:07 PM, Beman Dawes wrote:
Because version 3 will break some user code, both v2 and v3 will be shipped for several releases. For 1.44, the default is v2 and the user has to explicitly switch to v3.
I'd really appreciate it if some Boosters could give v3 a try on your own code that uses Boost.Filesystem.
I think that the approach of manual switching between v2 and v3 is not very good. The packagers will not be able to ship both versions of the library in Linux distributions, which effectively means that v3 will not get adopted.
Also, if I want my code to be compatible with both versions of the library, I cannot see a way to detect which version is currently installed. This is the case with my Boost.Log - it's not compatible with v3, mostly because path is not a template anymore. I would like people to be able to use Boost.Log regardless of which Boost.Filesystem their application uses.
I would really prefer to have both versions of the library available at any time. Either as two separate libraries (like signals and signals2) or as two parts of one library (like spirit v2 and classic). In the latter case it could be compiled into one binary, with v2 and v3 being in different namespaces and "namespace filesystem" being an alias to one of them.
And from a quick glance at the code I see that initial_path is still not thread-safe. See this ticket: https://svn.boost.org/trac/boost/ticket/3531

On Sat, Jun 5, 2010 at 4:20 AM, Andrey Semashev <andrey.semashev@gmail.com> wrote:
On 06/02/2010 07:07 PM, Beman Dawes wrote:
Because version 3 will break some user code, both v2 and v3 will be shipped for several releases. For 1.44, the default is v2 and the user has to explicitly switch to v3.
I'd really appreciate it if some Boosters could give v3 a try on your own code that uses Boost.Filesystem.
I think that the approach of manual switching between v2 and v3 is not very good. The packagers will not be able to ship both versions of the library in Linux distributions, which effectively means that v3 will not get adopted.
Also, if I want my code to be compatible with both versions of the library, I cannot see a way to detect which version is currently installed. This is the case with my Boost.Log - it's not compatible with v3, mostly because path is not a template anymore. I would like people to be able to use Boost.Log regardless of which Boost.Filesystem their application uses.
You and Scott McMurray have convinced me that this is a real problem.
I would really prefer to have both versions of the library available at any time. Either as two separate libraries (like signals and signals2) or as two parts of one library (like spirit v2 and classic). In the latter case it could be compiled into one binary, with v2 and v3 being in different namespaces and "namespace filesystem" being an alias to one of them.
That's the approach I'm prototyping. Looks promising. Stay tuned... Thanks, --Beman
participants (8): Alexander Lamaison, Andrey Semashev, Beman Dawes, Daniel James, Eric Niebler, Scott McMurray, Steven Watanabe, Sylvain Pointeau