[filesystem] Major changes before standardization


Beman Dawes

11 Nov 2004

5:20 p.m.

I'm currently working on a major revision of the Boost filesystem library aimed at getting it ready to be proposed to the C++ committee for the next Library TR.

The critical technical change required is internationalization. The plan is to provide a templated basic_path class, with typedefs for path and wpath. In other words, an approach very similar to the current std::basic_string, std::string, and std::wstring.

Doing this adds a certain amount of complexity compared to the current path class. For example, a path_traits class has to be introduced to give basic_paths on user defined types a way to import the conversions, delimiters, and other traits.

To offset the added complexity, I'd like to reduce some of the complexities in the current design:

* Eliminate the distinction between native and generic grammars; either would be permitted in all contexts. As well as simplifying the class interface a bit, this will also eliminate a source of user confusion.

* Move the name error checking from the basic_path class into stand alone functions. Although the current error checking could be further improved by Peter Dimov's suggested change allowing failure to be treated as either a warning or a hard error, moving it out of the class simplifies the interface, allows full path (rather than name-by-name) error checks, and eases internationalization.

The primary downside of these changes will be that users who want path portability checks will have to code calls to checking functions that are currently being called automatically.

The internationalization of the library is a big enough change that we will probably want to have at least a mini-review once the changes are complete.

Comments?

--Beman
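For readers who want something concrete, here is a minimal sketch of the shape described above: a class template parameterized on the character type, a traits class that supplies the separator, and two typedefs. It is not code from the thread (none had been posted at this point). The names basic_path, path_traits, path, and wpath come from the message; every member shown, including separator(), operator/=, and string(), is an assumption made for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical traits: one specialization per character type. Only the
// separator is shown; conversions would also live here in a fuller design.
template <class Ch> struct path_traits;

template <> struct path_traits<char> {
    static char separator() { return '/'; }
};

template <> struct path_traits<wchar_t> {
    static wchar_t separator() { return L'/'; }
};

template <class Ch, class Traits = path_traits<Ch> >
class basic_path {
public:
    typedef std::basic_string<Ch> string_type;

    basic_path() {}
    basic_path(const string_type& s) : str_(s) {}

    // Append one name, inserting a separator between elements.
    basic_path& operator/=(const string_type& name) {
        // Precondition for this sketch only: 'name' is a single element.
        assert(name.find(Traits::separator()) == string_type::npos);
        if (!str_.empty() && str_[str_.size() - 1] != Traits::separator())
            str_ += Traits::separator();
        str_ += name;
        return *this;
    }

    const string_type& string() const { return str_; }

private:
    string_type str_;
};

// The two everyday instantiations, mirroring std::string and std::wstring.
typedef basic_path<char> path;
typedef basic_path<wchar_t> wpath;
```

A basic_path over a user-defined character type would need its own path_traits specialization, which is the extra complexity the message refers to.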


Peter Dimov

11 Nov

5:41 p.m.

Beman Dawes wrote:

...

The critical technical change required is internationalization. The plan is to provide a templated basic_path class, with typedefs for path and wpath. In other words, an approach very similar to the current std::basic_string, std::string, and std::wstring.

Doing this adds a certain amount of complexity compared to the current path class. For example, a path_traits class has to be introduced to give basic_paths on user defined types a way to import the conversions, delimiters, and other traits.

If your basic_path takes a path_traits parameter - which means that you're, in fact, misnaming a policy class as 'path_traits' - this is likely to create basic_string-ish problems further down the path. In addition, basic_path on user-defined types simply doesn't make sense (to me at least), because the user can't just define his own path class. The set of supported paths is determined by the capabilities of the underlying filesystem layer. In particular, the user cannot define the conversions between the different path types, because they are implementation defined.

Bruno Martínez Aguerre

8:58 p.m.

Peter Dimov wrote:

...

Beman Dawes wrote:

...
The critical technical change required is internationalization. The plan is to provide a templated basic_path class, with typedefs for path and wpath. In other words, an approach very similar to the current std::basic_string, std::string, and std::wstring.

Doing this adds a certain amount of complexity compared to the current path class. For example, a path_traits class has to be introduced to give basic_paths on user defined types a way to import the conversions, delimiters, and other traits.

If your basic_path takes a path_traits parameter - which means that you're, in fact, misnaming a policy class as 'path_traits' - this is likely to create basic_string-ish problems further down the path.

In addition, basic_path on user-defined types simply doesn't make sense (to me at least), because the user can't just define his own path class. The set of supported paths is determined by the capabilities of the underlying filesystem layer. In particular, the user cannot define the conversions between the different path types, because they are implementation defined.

It does make sense to me. You could build an archive file yourself and use Boost.Filesystem to navigate it. Bruno Martínez

Peter Dimov

11:03 p.m.

Bruno Martínez Aguerre wrote:

...

Peter Dimov wrote:

...
In addition, basic_path on user-defined types simply doesn't make sense (to me at least), because the user can't just define his own path class. The set of supported paths is determined by the capabilities of the underlying filesystem layer. In particular, the user cannot define the conversions between the different path types, because they are implementation defined.

It does make sense to me. You could build an archive file yourself and use Boost.Filesystem to navigate it.

Boost.Filesystem cannot navigate archives. It is a portable wrapper over the native filesystem API. You may use basic_path<something> as a custom path and write your own filesystem library to navigate your archives, but (1) this is not the stated goal of Boost.Filesystem and (2) why would you want to make people's lives miserable by using a nonstandard path is beyond me.

Beman Dawes

12 Nov

1:11 a.m.

At 12:41 PM 11/11/2004, Peter Dimov wrote:

...

Beman Dawes wrote:

...
The critical technical change required is internationalization. The plan is to provide a templated basic_path class, with typedefs for path and wpath. In other words, an approach very similar to the current std::basic_string, std::string, and std::wstring.

Doing this adds a certain amount of complexity compared to the current path class. For example, a path_traits class has to be introduced to give basic_paths on user defined types a way to import the conversions, delimiters, and other traits.

If your basic_path takes a path_traits parameter - which means that you're, in fact, misnaming a policy class as 'path_traits' - this is likely to create basic_string-ish problems further down the path.

I think I had better post some actual code rather than try to explain the approach taken. I'm definitely nervous about it. It will be a couple of weeks, but I will post some of my proof-of-concept code.

...

In addition, basic_path on user-defined types simply doesn't make sense (to me at least), because the user can't just define his own path class.

The user in fact can define his or her own path class, although granted there are a lot of constraints.

...

The set of supported paths is determined by the capabilities of the underlying filesystem layer.

Each O/S has an external data type and encoding which is used to represent paths in the external filesystem. For example, POSIX uses 1 byte with various implementation defined encoding and Windows uses 2 bytes with a Unicode encoding. The user can't change that. But inside a program the user has more freedom.

...

In particular, the user cannot define the conversions between the different path types, because they are implementation defined.

The default conversion is implementation defined, but users can supply their own conversion. One use case I have in mind is a character based O/S which uses some MBCS encoding of paths that isn't UTF-8, but the user wishes to burn a CD with UTF-8 encoding. The user should be able to provide such a conversion function, overriding the implementation defined default.

Note however that whether or not such a user supplied conversion will work sensibly or at all is very operating system dependent. The filesystem library can't do anything about what the O/S accepts or doesn't accept.

Another case of particular interest is Windows where the external type is 2 bytes and the user chooses path, which is char based, as the internal type for directory iteration. What happens when a directory entry uses the high-order byte? The default conversion supplied by the Windows API is lossy; the high order byte is simply discarded. An alternative conversion function might consider this to be an error and throw.

Now assuming the filesystem library chooses one of those approaches as the default, some users will prefer the other approach and they should be permitted to supply such a conversion function.

--Beman
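The two policies contrasted above can be sketched as a pair of functions. This is an illustration of the lossy-versus-throwing choice only, not the behavior of any Windows API (the reply that follows points out that the real conversion depends on the code page). The names narrow_lossy and narrow_strict are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Lossy policy: keep only the low-order byte of each wide character.
inline std::string narrow_lossy(const std::wstring& w) {
    std::string out;
    out.reserve(w.size());
    for (wchar_t c : w)
        out += static_cast<char>(static_cast<unsigned long>(c) & 0xFF);
    return out;
}

// Strict policy: refuse any character that does not fit in one byte.
inline std::string narrow_strict(const std::wstring& w) {
    std::string out;
    out.reserve(w.size());
    for (wchar_t c : w) {
        if (static_cast<unsigned long>(c) > 0xFF)
            throw std::range_error("wide character does not fit in a char");
        out += static_cast<char>(c);
    }
    return out;
}
```

Whichever of the two a library picked as its default, a caller who wanted the other would have to be able to substitute it, which is the argument being made for user-supplied conversions.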

Peter Dimov

1:36 a.m.

Beman Dawes wrote:

...

At 12:41 PM 11/11/2004, Peter Dimov wrote:

...
In particular, the user cannot define the conversions between the different path types, because they are implementation defined.

The default conversion is implementation defined, but users can supply their own conversion. One use case I have in mind is a character based O/S which uses some MBCS encoding of paths that isn't UTF-8, but the user wishes to burn a CD with UTF-8 encoding. The user should be able to provide such a conversion function, overriding the implementation defined default.

I see no need for custom conversions in this case. The user can just supply the appropriate narrow UTF-8 path directly. I also don't see where the implementation-defined default enters the picture, as the translation is between char and char.

...

Another case of particular interest is Windows where the external type is 2 bytes and the user chooses path, which is char based, as the internal type for directory iteration. What happens when a directory entry uses the high-order byte? The default conversion supplied by the Windows API is lossy; the high order byte is simply discarded.

The conversion depends on the current codepage, AFAIK. The high byte is only discarded if you're using windows-1252 (a superset of ISO-8859-1). A custom conversion simply cannot work portably. Besides, if the user knows that he'll need a custom conversion, it's much, much easier to just use wpath. Anything besides path/wpath is even less useful than basic_string that isn't string or wstring, amazing as this may be, and we all know how popular basic_string is.

Peter Dimov

2:03 a.m.

Peter Dimov wrote:

...

Beman Dawes wrote:

...
The default conversion is implementation defined, but users can supply their own conversion. One use case I have in mind is a character based O/S which uses some MBCS encoding of paths that isn't UTF-8, but the user wishes to burn a CD with UTF-8 encoding. The user should be able to provide such a conversion function, overriding the implementation defined default.

I see no need for custom conversions in this case. The user can just supply the appropriate narrow UTF-8 path directly. I also don't see where the implementation-defined default enters the picture, as the translation is between char and char.

I guess you have something like this in mind:

    void burn( wpath const & source, wpath const & dest );

where 'source' needs to be converted to a narrow path using encoding #1, but 'dest' needs to be converted to a narrow path using encoding #2 (UTF-8), _but_ the system doesn't know that so the user needs to override the conversion of 'dest'. Still makes no sense to me. :-)

    void burn( path const & source, path const & dest );

can be used to accomplish this, but it's a hypothetical scenario. If the system doesn't know that the CD needs its path UTF-8 encoded, nobody will be able to read that CD back!

But I may be missing something. Perhaps if you illustrate the user path/conversion examples with code...

...

From where I sit, the library provides:

    void fs_function( path const & p );
    void fs_function( wpath const & p );

One of these is native, the other is either native or converts and calls the other. I can't "imbue" a custom conversion in path or wpath, but I don't have to, because I can just convert the path myself beforehand (with the exact same loss of portability).

Turning the filesystem API into

    template<class Ch, class Tr> void fs_function( basic_path<Ch, Tr> const & p );

(thereby moving the entire implementation in headers) doesn't seem to buy us much; the native API is not, and will never be, templatized. :-) Not to mention that third-party libraries will be reluctant to adopt this style, as this requires them to ship source.

In fact, many libraries will probably even omit the wpath overloads. If we could somehow combine path and wpath into one class, this would neatly sidestep this missing wpath overload problem. :-)
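The "one is native, the other converts and calls the other" arrangement might look like the following on a narrow-only platform. This is a sketch under stated assumptions: path and wpath are stood in for by plain string typedefs, to_narrow is a hypothetical placeholder that only handles 7-bit ASCII, and fs_function returns a string purely so the call route can be observed.

```cpp
#include <cassert>
#include <string>

// Stand-ins for the library's path and wpath; assumptions for this sketch.
typedef std::string path;
typedef std::wstring wpath;

// Placeholder conversion, valid for 7-bit ASCII only. A real library would
// use whatever conversion the platform or the library itself provides.
inline path to_narrow(const wpath& w) {
    path out;
    for (wchar_t c : w) {
        assert(static_cast<unsigned long>(c) < 0x80);
        out += static_cast<char>(c);
    }
    return out;
}

// Native overload: on a narrow-only OS this is where the system call goes.
inline std::string fs_function(const path& p) {
    return "narrow API: " + p;
}

// Non-native overload: converts, then forwards to the native one.
inline std::string fs_function(const wpath& p) {
    return fs_function(to_narrow(p));
}
```

Both overloads are ordinary functions, so the implementation can stay in a compiled library rather than moving into headers.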

Jonathan Turkanis

7:44 a.m.

"Peter Dimov" <pdimov@mmltd.net> wrote in message:

...

In fact, many libraries will probably even omit the wpath overloads. If we could somehow combine path and wpath into one class, this would neatly sidestep this missing wpath overload problem. :-)

...

From your smiley it looks like you might be joking, but I think this is an idea worth exploring. Frustrated by repeated conversions of strings from narrow to wide and back when passing them between various APIs, I once wrote a string class which could store either a wide string or a narrow string, converting only when necessary. This might be a reasonable strategy for paths. Constructors could take either narrow or wide strings, and explicit conversion functions string() and wstring() could be provided.

Jonathan
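A string that stores one representation and produces the other only on demand could be sketched roughly as follows. This is not the class described in the message, which was never posted; dual_string and its members are hypothetical, and the conversions are placeholders that only handle 7-bit ASCII.

```cpp
#include <cassert>
#include <string>

// Hypothetical class: keeps whichever form it was constructed from and
// builds the other lazily, caching the result.
class dual_string {
public:
    dual_string(const std::string& s)
        : narrow_(s), has_narrow_(true), has_wide_(false) {}
    dual_string(const std::wstring& w)
        : wide_(w), has_narrow_(false), has_wide_(true) {}

    // Explicit conversion to narrow; converts at most once.
    const std::string& string() const {
        if (!has_narrow_) {
            narrow_.clear();
            for (wchar_t c : wide_) {
                assert(static_cast<unsigned long>(c) < 0x80);  // ASCII-only placeholder
                narrow_ += static_cast<char>(c);
            }
            has_narrow_ = true;
        }
        return narrow_;
    }

    // Explicit conversion to wide; converts at most once.
    const std::wstring& wstring() const {
        if (!has_wide_) {
            wide_.clear();
            for (char c : narrow_) {
                assert(static_cast<unsigned char>(c) < 0x80);  // ASCII-only placeholder
                wide_ += static_cast<wchar_t>(c);
            }
            has_wide_ = true;
        }
        return wide_;
    }

private:
    mutable std::string narrow_;
    mutable std::wstring wide_;
    mutable bool has_narrow_;
    mutable bool has_wide_;
};
```

A value passed between narrow-only APIs never pays for a wide conversion, and vice versa.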

Beman Dawes

3:50 p.m.

At 08:36 PM 11/11/2004, Peter Dimov wrote:

...

Beman Dawes wrote:

...
At 12:41 PM 11/11/2004, Peter Dimov wrote:

...
In particular, the user cannot define the conversions between the different path types, because they are implementation defined.

The default conversion is implementation defined, but users can supply their own conversion. One use case I have in mind is a character based O/S which uses some MBCS encoding of paths that isn't UTF-8, but the user wishes to burn a CD with UTF-8 encoding. The user should be able to provide such a conversion function, overriding the implementation defined default.

I see no need for custom conversions in this case. The user can just supply the appropriate narrow UTF-8 path directly.

I considered that. For most users, who will rarely if ever need a custom conversion, it would be fine to require them to do any custom conversion themselves before constructing a path. But a few users will need to do custom conversions for virtually every use of the library (probably because their O/S just traffics in raw chars, yet they need a wide character encoding.) These users would be helped a great deal by custom conversions. Maybe that is an extreme corner case - it certainly would simplify the design to eliminate custom conversions. See my reply to Vladimir Prus regarding a single path class for an example. One case where a custom conversion is required is for a user defined string type. There isn't any default; the user has to supply the conversion. I know you don't believe in the usefulness of such user defined string types, but I'd be surprised if the committee would accept elimination of user defined string types.

...

Anything besides path/wpath is even less useful than basic_string that isn't string or wstring, amazing as this may be, and we all know how popular basic_string is.

The issue isn't the popularity of basic_string. As long as there are even a few users who depend on basic_strings other than string and wstring, the committee will probably want to support it. Also, remember that basic_string<char16_t> and basic_string<char32_t> may well be mandated in the fairly close future. --Beman

Peter Dimov

7:26 p.m.

Beman Dawes wrote:

...

At 08:36 PM 11/11/2004, Peter Dimov wrote:

...
Beman Dawes wrote:

...
At 12:41 PM 11/11/2004, Peter Dimov wrote:

...
In particular, the user cannot define the conversions between the different path types, because they are implementation defined.

The default conversion is implementation defined, but users can supply their own conversion. One use case I have in mind is a character based O/S which uses some MBCS encoding of paths that isn't UTF-8, but the user wishes to burn a CD with UTF-8 encoding. The user should be able to provide such a conversion function, overriding the implementation defined default.

I see no need for custom conversions in this case. The user can just supply the appropriate narrow UTF-8 path directly.

I considered that. For most users, who will rarely if ever need a custom conversion, it would be fine to require them to do any custom conversion themselves before constructing a path. But a few users will need to do custom conversions for virtually every use of the library (probably because their O/S just traffics in raw chars, yet they need a wide character encoding.) These users would be helped a great deal by custom conversions.

I still don't get it. I guess that we need code. Either way, it is the user doing the conversion. They aren't helped one bit.

We have two OSes in use today. Windows, which takes either path or wpath, and POSIX et al, which takes only a path. If the user wants to use something that is neither a path nor a wpath, he must convert it to one of those.

There is nothing the library can do, and providing smoke and mirrors just to make it _seem_ that other paths are supported, when in reality they simply are not, is both a disservice and a needless complication. IMO.

...

One case where a custom conversion is required is for a user defined string type. There isn't any default; the user has to supply the conversion.

The user needs to convert the user-defined string type to either path or wpath. This is not something that the filesystem library can, or should, do for him; a function can handle this conversion easily. This use case does not imply that there must exist a basic_path for every basic_string, because such a basic_path is not a filesystem path. The filesystem simply does not take user-defined strings, never will, and no amount of traitification can change that.

...

I know you don't believe in the usefulness of such user defined string types, but I'd be surprised if the committee would accept elimination of user defined string types.

There is no such elimination. basic_string works exactly as before, and OSes work exactly as before.

...

Also, remember that basic_string<char16_t> and basic_string<char32_t> may well be mandated in the fairly close future.

The filesystem library provides an interface to the native OS filesystem API. If that API can take char32_t (which is not the case today on any platform AFAIK), then the library needs to be able to take char32_t. On such a platform wchar_t will probably be char32_t, so a wpath can be used as-is. This is similar to the current status quo on Windows, where the UTF-16 wpath encoding dictates that wchar_t is char16_t. Custom conversions don't help, because only the system knows how a char32_t name maps to a char name.

Anyway, here's a summary of my position (assuming two path types):

    void fs_function( path const & p );
    void fs_function( wpath const & p );

Windows ("dual") implementation: First overload calls FsFunctionA, second FsFunctionW. No conversion is done by the library, because the assumption is that only the system can do it right.

POSIX ("single") implementation: First overload calls ::fs_function, second does a library-supplied conversion and invokes the first.

User needs to use basic_string<UDT>:

    wpath path_from_UDT( basic_string<UDT> const & s );

This covers the filesystem part.

I suspect that what you want is to provide the generic path grammar part, templated on arbitrary character types. (A native grammar probably won't work for characters that aren't native.) That may be nice, and in fact I remember suggesting that before :-) but I'm really not sure whether this outweighs the fact that the filesystem-specific part of the design is encumbered with supporting the kitchen sink, because I don't recall ever needing path manipulation for something that is not char or wchar_t. One way to provide the necessary functionality is to expose a collection of algorithms that allow the user to do path manipulations on arbitrary character ranges. ;-)

Either way, we need examples before we can move the discussion forward.
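The closing suggestion, path algorithms over arbitrary character ranges, could take a form like this. It is a sketch only: leaf_begin and leaf are hypothetical names, the separator is passed in explicitly, and only the generic grammar's single-separator case is handled.

```cpp
#include <cassert>
#include <string>

// Hypothetical algorithm: returns an iterator to the start of the last
// element of the path held in [first, last).
template <class BidirIt, class Ch>
BidirIt leaf_begin(BidirIt first, BidirIt last, Ch separator) {
    BidirIt it = last;
    while (it != first) {
        BidirIt prev = it;
        --prev;
        if (*prev == separator) break;  // stop just after the last separator
        it = prev;
    }
    return it;
}

// Convenience wrapper for any std::basic_string character type.
template <class Ch>
std::basic_string<Ch> leaf(const std::basic_string<Ch>& p, Ch separator) {
    return std::basic_string<Ch>(leaf_begin(p.begin(), p.end(), separator),
                                 p.end());
}
```

Because nothing here touches the filesystem, the same function serves char, wchar_t, or char32_t strings without any path class being involved.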

Beman Dawes

9:16 p.m.

At 02:26 PM 11/12/2004, Peter Dimov wrote:

...

Beman Dawes wrote:

...

...
(probably because their O/S just traffics in raw chars, yet they need a wide character encoding.) These users would be helped a great deal by custom conversions.

I still don't get it. I guess that we need code. Either way, it is the user doing the conversion. They aren't helped one bit.

Yes, either way the user has to supply the conversion code. But without direct support, the user must call that code manually (or more likely in a user written wrapper.) But if path directly supports user conversions, then the conversion functions are called automatically by path objects when needed, and that is a win for the user.

...

We have two OSes in use today. Windows, which takes either path or wpath, and POSIX et al, which takes only a path. If the user wants to use something that is neither a path nor a wpath, he must convert it to one of those.

Yes. Or more exactly, to "native_path_string_type" which is an implementation defined typedef, AFAWK, always to std::string or std::wstring.

...

There is nothing the library can do, and providing smoke and mirrors just to make it _seem_ that other paths are supported, when in reality they simply are not, is both a disservice and a needless complication. IMO.

I don't understand why you say they aren't supported. Is it because the user is supplying the conversion functions?

...

...
One case where a custom conversion is required is for a user defined string type. There isn't any default; the user has to supply the conversion.

The user needs to convert the user-defined string type to either path or wpath. This is not something that the filesystem library can, or should, do for him; a function can handle this conversion easily.

Yes, of course. The question is whether the user calls the function directly, or passes it to the library so that the library will call the function when needed.

...

This use case does not imply that there must exist a basic_path for every basic_string, because such a basic_path is not a filesystem path. The filesystem simply does not take user-defined strings, never will, and no amount of traitification can change that.

The filesystem operational functions never see the user-defined strings. Conversion has already occurred. The operational functions are not templated and only deal with the native_path_string_type.
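One way to read this arrangement is sketched here, with the caveat that no actual code had been posted and everything beyond the name native_path_string_type is an assumption. The user supplies a conversion (utf8_traits::to_native, hypothetical), the path object calls it, and the operational function (byte_length, also hypothetical) is non-templated and sees only the native string. The thin forwarding template is an added convenience, not something the message describes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption: a POSIX-like platform whose native path strings are narrow.
typedef std::string native_path_string_type;

// Non-templated operational function: it only ever sees native strings.
inline std::size_t byte_length(const native_path_string_type& p) {
    return p.size();
}

// Hypothetical user-supplied conversion: UTF-32 code points to UTF-8 bytes.
// Surrogates and out-of-range values are not checked in this sketch.
struct utf8_traits {
    static native_path_string_type to_native(const std::u32string& s) {
        native_path_string_type out;
        for (char32_t c : s) {
            if (c < 0x80) {
                out += static_cast<char>(c);
            } else if (c < 0x800) {
                out += static_cast<char>(0xC0 | (c >> 6));
                out += static_cast<char>(0x80 | (c & 0x3F));
            } else if (c < 0x10000) {
                out += static_cast<char>(0xE0 | (c >> 12));
                out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (c & 0x3F));
            } else {
                out += static_cast<char>(0xF0 | (c >> 18));
                out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
                out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (c & 0x3F));
            }
        }
        return out;
    }
};

template <class String, class Traits>
class basic_path {
public:
    explicit basic_path(const String& s) : s_(s) {}
    // The path object, not the caller, invokes the user's conversion.
    native_path_string_type native() const { return Traits::to_native(s_); }
private:
    String s_;
};

// Thin forwarding layer: converts, then calls the non-templated function.
template <class String, class Traits>
std::size_t byte_length(const basic_path<String, Traits>& p) {
    return byte_length(p.native());
}
```

The disagreement in the thread is whether this automatic call is worth the template machinery, compared with the user calling the conversion once and constructing a plain path from the result.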

...

...

Either way, we need examples before we can move the discussion forward.

I'll post proof-of-concept level code, but it will be several weeks. --Beman

Peter Dimov

11:50 p.m.

Beman Dawes wrote:

...

At 02:26 PM 11/12/2004, Peter Dimov wrote:

...
We have two OSes in use today. Windows, which takes either path or wpath, and POSIX et al, which takes only a path. If the user wants to use something that is neither a path nor a wpath, he must convert it to one of those.

Yes. Or more exactly, to "native_path_string_type" which is an implementation defined typedef, AFAWK, always to std::string or std::wstring.

I'm not sure that this is the right way to handle "dual" OSes, such as Windows NT. On Windows NT both string types are "native" from the API point of view. The library should not attempt to convert the path supplied from the user! This may or may not produce the desired results. The OS must be trusted to do the necessary conversion (it knows whether the filesystem on which the path resides stores narrow or wide names, and how to convert between them in this particular context). "Single" OSes that only support... well, a single string type, will obviously need help from the library. :-)

Beman Dawes

13 Nov

1:19 a.m.

At 06:50 PM 11/12/2004, Peter Dimov wrote:

...

Beman Dawes wrote:

...
At 02:26 PM 11/12/2004, Peter Dimov wrote:

...
We have two OSes in use today. Windows, which takes either path or wpath, and POSIX et al, which takes only a path. If the user wants to use something that is neither a path nor a wpath, he must convert it to one of those.

Yes. Or more exactly, to "native_path_string_type" which is an implementation defined typedef, AFAWK, always to std::string or std::wstring.

I'm not sure that this is the right way to handle "dual" OSes, such as Windows NT. On Windows NT both string types are "native" from the API point of view. The library should not attempt to convert the path supplied from the user! This may or may not produce the desired results. The OS must be trusted to do the necessary conversion (it knows whether the filesystem on which the path resides stores narrow or wide names, and how to convert between them in this particular context).

It might be messy to make the single path class approach work if Windows is viewed as a dual rather than wide O/S.

    path p;
    p /= "foo";
    if ( some_bool ) p /= L"kühl";

    if ( exists( p ) ) ... // use wide or narrow API depending on some_bool

I guess path objects could keep track of whether or not they had ever been modified by an argument other than a char string, and use the Windows wide API. Seems messy...

--Beman
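The bookkeeping floated in that last paragraph can be sketched as a single path class that remembers whether a wide argument was ever appended. This is an illustration only: the class, needs_wide_api, and api_for are hypothetical, the internal storage is arbitrarily chosen to be wide, and the narrow-to-wide step is a 7-bit ASCII placeholder.

```cpp
#include <cassert>
#include <string>

class path {
public:
    path() : wide_used_(false) {}

    // Narrow append: does not change which API would be chosen.
    path& operator/=(const std::string& name) {
        append(widen(name));
        return *this;
    }

    // Wide append: from now on the wide API would be required.
    path& operator/=(const std::wstring& name) {
        wide_used_ = true;
        append(name);
        return *this;
    }

    bool needs_wide_api() const { return wide_used_; }
    const std::wstring& wstring() const { return str_; }

private:
    // Placeholder widening, valid for 7-bit ASCII only.
    static std::wstring widen(const std::string& s) {
        std::wstring out;
        for (char c : s) {
            assert(static_cast<unsigned char>(c) < 0x80);
            out += static_cast<wchar_t>(c);
        }
        return out;
    }

    void append(const std::wstring& name) {
        if (!str_.empty()) str_ += L'/';
        str_ += name;
    }

    std::wstring str_;
    bool wide_used_;
};

// Stand-in for an operational function such as exists(): reports which
// flavor of the OS API it would call for this path.
inline const char* api_for(const path& p) {
    return p.needs_wide_api() ? "wide" : "narrow";
}
```

The flag is sticky, so the choice of API depends on the whole history of the object, which is part of what makes the approach feel messy.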

Peter Dimov

2:16 a.m.

Beman Dawes wrote:

...

It might be messy to make the single path class approach work if Windows is viewed as a dual rather than wide O/S.

    path p;
    p /= "foo";
    if ( some_bool ) p /= L"kühl";

    if ( exists( p ) ) ... // use wide or narrow API depending on some_bool

I guess path objects could keep track of whether or not they had ever been modified by an argument other than a char string, and use the Windows wide API. Seems messy...

It's worse. Some versions of Windows (9x) are narrow-minded, but you don't know that at compile time. ;-) You pretty much have to treat Windows as a dual OS; it's impossible to choose a native character type until the program is run.

Beman Dawes

5:12 p.m.

At 09:16 PM 11/12/2004, Peter Dimov wrote:

...

Beman Dawes wrote:

...
It might be messy to make the single path class approach work if Windows is viewed as a dual rather than wide O/S.

    path p;
    p /= "foo";
    if ( some_bool ) p /= L"kühl";

    if ( exists( p ) ) ... // use wide or narrow API depending on some_bool

I guess path objects could keep track of whether or not they had ever been modified by an argument other than a char string, and use the Windows wide API. Seems messy...

It's worse. Some versions of Windows (9x) are narrow-minded, but you don't know that at compile time. ;-)

Damn! I always forget about Win 9x; I moved to NT at the first beta and never looked back. So what happens in Win 9x when you use the wide API?

...

You pretty much have to treat Windows as a dual OS; it's impossible to choose a native character type until the program is run.

That's a concern. The codepage issue you brought up is also a concern. I need to do some more research, clearly.

In thinking more about a single path class versus a path class template, the single path class approach looks really tough if you can't identify a single native character type. Yet the single path class approach is really appealing in many ways.

--Beman

Peter Dimov

6:04 p.m.

Beman Dawes wrote:

...

At 09:16 PM 11/12/2004, Peter Dimov wrote:

...
It's worse. Some versions of Windows (9x) are narrow-minded, but you don't know that at compile time. ;-)

Damn! I always forget about Win 9x; I moved to NT at the first beta and never looked back.

So what happens in Win 9x when you use the wide API?

It doesn't work in a variety of ways, some of them very frustrating.

Beman Dawes

9:46 p.m.

At 01:04 PM 11/13/2004, Peter Dimov wrote:

...

Beman Dawes wrote:

...
At 09:16 PM 11/12/2004, Peter Dimov wrote:

...
It's worse. Some versions of Windows (9x) are narrow-minded, but you don't know that at compile time. ;-)

Damn! I always forget about Win 9x; I moved to NT at the first beta and never looked back.

So what happens in Win 9x when you use the wide API?

It doesn't work in a variety of ways, some of them very frustrating.

Then there is Win CE, which is wide-only IIRC. --Beman

John Maddock

14 Nov

4:10 p.m.

...

Damn! I always forget about Win 9x; I moved to NT at the first beta and never looked back.

So what happens in Win 9x when you use the wide API?

It probably won't load the executable, or else just return a failure code, however there is something called the Microsoft Layer for Unicode that adds limited Unicode support to Win95/98/ME, see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/mslu/winpro.... If we insist that programs use this when run on these platforms (which let's be honest are all legacy systems now), then we can use wchar_t as the internal native character type and let Microsoft's own libraries take care of code pages and translations. Presumably if they haven't got this code correct then no one will :-) John.

Beman Dawes

5:25 p.m.

At 11:10 AM 11/14/2004, John Maddock wrote:

...

...
Damn! I always forget about Win 9x; I moved to NT at the first beta and never looked back.

So what happens in Win 9x when you use the wide API?

It probably won't load the executable, or else just return a failure code, however there is something called the Microsoft Layer for Unicode that adds limited Unicode support to Win95/98/ME, see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/mslu/winprog/microsoft_layer_for_unicode_on_windows_95_98_me_systems.asp.

I've been assuming Microsoft's Layer for Unicode, and was assuming that Peter meant there were serious problems even if it was enabled. Maybe there are some serious problems but it looks to me that we only need WideCharToMultiByte, MultiByteToWideChar, AreFileApisANSI, and the wide file management functions to work as documented. We don't care about problems in the other portions of the API.

...

If we insist that programs use this when run on these platforms (which let's be honest are all legacy systems now), then we can use wchar_t as the internal native character type and let Microsoft's own libraries take care of code pages and translations. Presumably if they haven't got this code correct then no one will :-)

I reread the Microsoft docs last night and came to the same conclusion you do. (As I have every time I've read them). I'm going to do a prototype implementation on the assumption that we can use wchar_t as the internal native character type for Windows, and char for POSIX. If someone can provide a breaking test case, we will look at it and see how serious it is. Thanks, --Beman

Peter Dimov

8:11 p.m.

Beman Dawes wrote:

...

I've been assuming Microsoft's Layer for Unicode, and was assuming that Peter meant there were serious problems even if it was enabled.

No, MSLU works, as far as I know. But we can't assume its existence. A library that doesn't support Win9x without MSLU will not be useful for users that need to write code that works on a wide variety of machines. Win9x's installed base, although declining, is still significant.

Beman Dawes

10:25 p.m.

At 03:11 PM 11/14/2004, Peter Dimov wrote:

...

Beman Dawes wrote:

...
I've been assuming Microsoft's Layer for Unicode, and was assuming that Peter meant there were serious problems even if it was enabled.

No, MSLU works, as far as I know. But we can't assume its existence. A library that doesn't support Win9x without MSLU will not be useful for users that need to write code that works on a wide variety of machines. Win9x's installed base, although declining, is still significant.

Is there a reason not to just point such users to http://www.microsoft.com/msdownload/platformsdk/sdkupdate/psdkredist.htm? or provide the redistributables in an app's installer if it has one?

Even if we provided an alternate implementation that considered Win9X to have char based native path strings, we still have to do conversions between strings and wstrings; filesystem::path traffics in either regardless of the native platform. Since the default conversion has to be the one provided by the operating system if any, we still can't get away from needing the Microsoft Layer for Unicode on Win9X.

The idea that the default conversion has to be the one provided by the operating system (if the O/S supports conversion between wide and narrow character paths) isn't something that I invented. It has been mentioned by a number of the developers with actual wide/narrow experience that I've asked for advice over the last couple of years. I can think of two committee members (not from the same company, either) who were really adamant about it.

--Beman

Aaron W. LaFramboise

10:47 p.m.

Beman Dawes wrote:

At 03:11 PM 11/14/2004, Peter Dimov wrote:

Beman Dawes wrote:

I've been assuming Microsoft's Layer for Unicode, and was assuming that Peter meant there were serious problems even if it was enabled.

No, MSLU works, as far as I know. But we can't assume its existence. A library that doesn't support Win9x without MSLU will not be useful for users that need to write code that works on a wide variety of machines. Win9x's installed base, although declining, is still significant.

Is there a reason not to just point such users to http://www.microsoft.com/msdownload/platformsdk/sdkupdate/psdkredist.htm, or to provide the redistributables in an app's installer if it has one?

While this isn't an issue to me personally, there are a great number of users for whom MSLU is unacceptable. For whatever reason, MSLU has rather stringent redistribution terms that are significantly more strict than the BSL, and mutually incompatible with many open source licenses, including the GPL.

I would recommend avoiding MSLU if possible because it will stymie adoption of the library by certain categories of users, and will possibly cause unintentional infringement or misunderstanding. It also undermines the intentions of the BSL.

Due to these problems, at least one project has attempted to reimplement MSLU under a more friendly license, such as the so-far incomplete MZLU from Mozilla. The open source libunicows replaces Microsoft's pseudo-import library, and will dynamically use MSLU or MZLU, depending on what is available at load time.

However, if only a small number of functions are needed, which I suspect would be the case, it might be best to eliminate this problematic dependency entirely, and dynamically resolve the wide functions as needed, using appropriate fallback functionality if they are unavailable. The functionality could probably be encapsulated transparently within some sort of helper class.

Aaron W. LaFramboise
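The "resolve the wide function at run time, otherwise fall back" idea might look roughly like the sketch below. It is portable on purpose and only illustrates the shape of such a helper class: `symbol_table` is a hypothetical stand-in for a platform lookup such as GetProcAddress, `"ExistsW"` is an invented export name, and `narrow_ascii` is a placeholder conversion that assumes plain ASCII. None of these names come from the library or from the thread.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a run-time symbol lookup: maps an export
// name to a callable, or has no entry when the function is missing.
using wide_exists_fn = std::function<bool(const std::wstring&)>;
using symbol_table   = std::map<std::string, wide_exists_fn>;

// Placeholder narrowing used only by the fallback path. Assumes ASCII;
// anything else becomes '?'. Real code would use the OS conversion.
inline std::string narrow_ascii(const std::wstring& w) {
    std::string s;
    for (wchar_t c : w)
        s.push_back(c >= 0 && c < 128 ? static_cast<char>(c) : '?');
    return s;
}

// Helper that hides whether the wide entry point was found.
class exists_helper {
public:
    exists_helper(const symbol_table& exports,
                  std::function<bool(const std::string&)> narrow_api)
        : narrow_api_(std::move(narrow_api)) {
        assert(narrow_api_ != nullptr);        // the fallback must always exist
        auto it = exports.find("ExistsW");     // hypothetical export name
        if (it != exports.end()) wide_api_ = it->second;
    }

    bool has_wide_api() const { return wide_api_ != nullptr; }

    // Uses the wide function when it was resolved, else narrows and falls back.
    bool exists(const std::wstring& p) const {
        return wide_api_ ? wide_api_(p) : narrow_api_(narrow_ascii(p));
    }

private:
    wide_exists_fn wide_api_;
    std::function<bool(const std::string&)> narrow_api_;
};
```

Callers only ever see `exists(std::wstring)`; whether the wide function was present is decided once, when the helper is constructed.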

Beman Dawes

15 Nov 15 Nov

11:46 p.m.

At 05:47 PM 11/14/2004, Aaron W. LaFramboise wrote:

...

While this isn't an issue to me personally, there are a great number of users for which MSLU is unacceptable. For whatever reason, MSLU has rather stringent redistribution terms that are significantly more strict than the BSL, and mutually incompatible with many open source licenses, including the GPL.

I would recommend avoiding MSLU if possible because it will stymie adoption of the library by certain categories of users, and will possibly cause unintentional infringement or misunderstanding. It also undermines the intentions of the BSL.

Due to these problems, at least one project has attempted to reimplement MSLU under a more friendly license, such as the so-far incomplete MZLU from Mozilla. The open source libunicows replaces Microsoft's pseudo-import library, and will dynamically use MSLU or MZLU, depending on what is available at load time.

Boost wouldn't be the entity redistributing the helper code, so that reduces the problem somewhat.

...

However, if only a small number of functions are needed, which I suspect would be the case, it might be best to eliminate this problematic dependency entirely, and dynamically resolve the wide functions as needed, using appropriate fallback functionality if they are unavailable. The functionality could probably be encapsulated transparently within some sort of helper class.

I'm just not sure how much effort boost developers would be willing to put in to accommodate legacy operating systems which already have workarounds available from several sources. The issue may be moot anyhow. If Win9X is treated as a dual wide/narrow system (which we may do anyhow), and a program only uses narrow paths, then the wide API functions will never be called. Thanks, --Beman

Peter Dimov

10:29 a.m.

Beman Dawes wrote:

...

At 03:11 PM 11/14/2004, Peter Dimov wrote:

...
No, MSLU works, as far as I know. But we can't assume its existence. A library that doesn't support Win9x without MSLU will not be useful for users that need to write code that works on a wide variety of machines. Win9x's installed base, although declining, is still significant.

Is there a reason not to just point such users to http://www.microsoft.com/msdownload/platformsdk/sdkupdate/psdkredist.htm, or to provide the redistributables in an app's installer if it has one?

It might be acceptable to require applications to ship unicows.dll. But then again, we don't require pthreads-win32.dll, we decided to reinvent the wheel. :-)

...

Even if we provided an alternate implementation that considered Win9X to have char based native path strings, we still have to do conversions between strings and wstrings; filesystem::path traffics in either regardless of the native platform. Since the default conversion has to be the one provided by the operating system if any, we still can't get away from needing the Microsoft Layer for Unicode on Win9X.

We ought to be able to duplicate the MSLU behavior by using WideCharToMultiByte. But MSLU presence is not the main issue here. The main issue is the native character type, which is not wchar_t on Windows 9x, even with MSLU installed.

...

The idea that the default conversion has to be the one provided by the operating system (if the O/S supports conversion between wide and narrow character paths) isn't something that I invented. It has been mentioned by a number of the developers with actual wide/narrow experience that I've asked for advice over the last couple of years. I can think of two committee members (not from the same company, either) who were really adamant about it.

I don't understand this paragraph. Choosing the wrong native character type causes redundant roundtrip conversions, one in Boost.Filesystem, one in the OS (or the MSLU add-on, which is not part of the OS, BTW.) I don't see what the reasonable opinions of the two committee members have to do with that. FWIW, I've repeatedly expressed the same opinion.

Peter Dimov

12:13 p.m.

Peter Dimov wrote:

...

Choosing the wrong native character type causes redundant roundtrip conversions, one in Boost.Filesystem, one in the OS.

Let me expand on that a little.

It is _fundamentally wrong_ to assume that all present and future OS APIs have a single native character type.

Consider a case where a dual API OS has access to two logical volumes C: and D:, where the file system on C: stores the filenames as 16 bit UTF-16, and the file system on D: uses narrow characters.

Now the behavior of the calls is as follows:

    CreateFileA( "C:/foo.txt" );  // char -> wchar_t OS conversion
    CreateFileW( L"C:/foo.txt" ); // no OS conversion
    CreateFileA( "D:/foo.txt" );  // no OS conversion
    CreateFileW( L"D:/foo.txt" ); // wchar_t -> char OS conversion

Furthermore, consider a typical scenario where the application has its own "native" character type, app_char_t. In a design that enforces a single "native" character type boost_fs_char_t ("native" is a deceptive term due to the above scenario), there are potentially redundant (and not necessarily preserving) conversions from app_char_t to boost_fs_char_t and then from boost_fs_char_t to the filesystem character type.

In my opinion, the Boost filesystem library should pass the application characters _exactly as-is_ to the underlying OS API, whenever possible. It should not impose its own "native character" ideas upon the user nor upon the OS.
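A minimal sketch of the "pass the characters exactly as-is" rule follows. It does not call any real OS function: `fake_os`, `open_a`, `open_w` and `fs_open` are hypothetical names standing in for a dual narrow/wide API pair such as CreateFileA / CreateFileW and for a library entry point. The only thing it shows is that the caller's character type selects the OS entry point and no conversion happens in the library layer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a dual-API OS. Each entry point only
// records that it was called, so the dispatch can be observed.
struct fake_os {
    std::vector<std::string> log;
    void open_a(const char* p)    { assert(p != nullptr); log.push_back("A"); }
    void open_w(const wchar_t* p) { assert(p != nullptr); log.push_back("W"); }
};

// Library layer: the overload is chosen by the caller's character type
// and forwards to the matching OS entry point without converting.
inline void fs_open(fake_os& os, const std::string& p)  { os.open_a(p.c_str()); }
inline void fs_open(fake_os& os, const std::wstring& p) { os.open_w(p.c_str()); }
```

With this shape, `fs_open(os, "C:/foo.txt")` reaches the narrow entry point and `fs_open(os, L"D:/foo.txt")` reaches the wide one; any conversion that is still needed is left to the OS.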

Beman Dawes

3:14 p.m.

At 07:13 AM 11/15/2004, Peter Dimov wrote:

Peter Dimov wrote:

Choosing the wrong native character type causes redundant roundtrip conversions, one in Boost.Filesystem, one in the OS.

Let me expand on that a little.

It is _fundamentally wrong_ to assume that all present and future OS APIs have a single native character type.

The actual wording of PJP's paper was that for paths (not the entire OS API), one type could be considered "fundamental".

...

Consider a case where a dual API OS has access to two logical volumes C: and D:, where the file system on C: stores the filenames as 16 bit UTF-16, and the file system on D: uses narrow characters.

That happens all the time on Windows. Often the A: drive is a narrow character FAT filesystem.

...

Now the behavior of the calls is as follows:

    CreateFileA( "C:/foo.txt" );  // char -> wchar_t OS conversion
    CreateFileW( L"C:/foo.txt" ); // no OS conversion
    CreateFileA( "D:/foo.txt" );  // no OS conversion
    CreateFileW( L"D:/foo.txt" ); // wchar_t -> char OS conversion

Yes, that's my understanding too.

...

Furthermore, consider a typical scenario where the application has its own "native" character type, app_char_t. In a design that enforces a single "native" character type boost_fs_char_t ("native" is a deceptive term due to the above scenario), there are potentially redundant (and not necessarily preserving) conversions from app_char_t to boost_fs_char_t and then from boost_fs_char_t to the filesystem character type.

Yes. Note that even if a dual scheme is used, that same situation might arise:

    if ( fs::exists( "c:foo" ) ) ...
    if ( fs::exists( L"d:foo" ) ) ...

Notice that a narrow character path was given for the wide-character filesystem and a wide character path given for the narrow-character file system. If the type of the user supplied path is what determines the API to use, the O/S may still have to do conversions when there is a mismatch with the file system.

Do you see any alternative?

If the library queried the O/S about the path (which I'm not sure is always possible) to see if the filesystem was wide or narrow, a conversion would still have to be done if the user supplied path used the other char type. That saves nothing and adds the cost of the query.

In my opinion, the Boost filesystem library should pass the application characters _exactly as-is_ to the underlying OS API, whenever possible. It should not impose its own "native character" ideas upon the user nor upon the OS.

Your strongest argument IMO is the point about conversions not necessarily being value preserving. (I guess we could tell Windows users that they should not expect such conversions to work unless supported by the applicable codepage. But that seems like spin rather than a real solution.)

The efficiency argument is certainly real, but I don't see it as being quite as strong. (It will be important for some users, however. Think of very small or embedded systems.)

If the rule is that there is some type (char or wchar_t) associated with each path, and the library will always use the native API of that type if available, then it seems to me that the arguments in favor of a single path class weaken considerably. Sure, the library can keep track at runtime of whether a particular path is wide or narrow, but it is much more normal in C++ to distinguish at compile time. In other words, separate path and wpath classes.

In discussion on the C++ committee's library reflector, there wasn't demand for a templatized basic_path type. AFAICS, a templatized basic_path type could be added later if demand arose.

--Beman

Peter Dimov

4:43 p.m.

Beman Dawes wrote:

...

At 07:13 AM 11/15/2004, Peter Dimov wrote:

...
Now the behavior of the calls is as follows:

    CreateFileA( "C:/foo.txt" );  // char -> wchar_t OS conversion
    CreateFileW( L"C:/foo.txt" ); // no OS conversion
    CreateFileA( "D:/foo.txt" );  // no OS conversion
    CreateFileW( L"D:/foo.txt" ); // wchar_t -> char OS conversion

Yes, that's my understanding too.

Right. Now this is what happens when you declare wchar_t to be "native".

fs::function( "C:/foo.txt" ) converts to wchar_t, circumventing the OS conversion (you just argued that this was unacceptable).

fs::function( L"C:/foo.txt" ) works as before.

fs::function( "D:/foo.txt" ) converts to wchar_t, then the OS converts back (to "D:/foo.txt" if we are lucky).

fs::function( L"D:/foo.txt" ) works as before.
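The four cases can be tallied with a small bookkeeping model. This is not a measurement of any real system, only a restatement of the argument in code; `width`, `conversions`, `single_native_wide` and `pass_through` are names invented for this sketch.

```cpp
// Character width of the caller's string or of the volume's file names.
enum class width { narrow, wide };

// How many conversions happen, and where.
struct conversions {
    int in_library;
    int in_os;
};

// Design 1: the library declares wchar_t "native" and always calls the
// wide OS API. Narrow input is widened by the library; a narrow volume
// forces the OS to narrow it again.
constexpr conversions single_native_wide(width user, width volume) {
    return conversions{ user == width::narrow ? 1 : 0,
                        volume == width::narrow ? 1 : 0 };
}

// Design 2: the library forwards the caller's characters unchanged, so
// only the OS converts, and only on a mismatch with the volume.
constexpr conversions pass_through(width user, width volume) {
    return conversions{ 0, user != volume ? 1 : 0 };
}
```

In this model the narrow path on the narrow volume D: costs two conversions under the single-native design and none under pass-through, which is the round trip described above.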

...

...
Furthermore, consider a typical scenario where the application has its own "native" character type, app_char_t. In a design that enforces a single "native" character type boost_fs_char_t ("native" is a deceptive term due to the above scenario), there are potentially redundant (and not necessarily preserving) conversions from app_char_t to boost_fs_char_t and then from boost_fs_char_t to the filesystem character type.

Yes. Note that even if a dual scheme is used, that same situation might arise:

    if ( fs::exists( "c:foo" ) ) ...
    if ( fs::exists( L"d:foo" ) ) ...

Notice that a narrow character path was given for the wide-character filesystem and a wide character path given for the narrow-character file system. If the type of the user supplied path is what determines the API to use, the O/S may still have to do conversions when there is a mismatch with the file system.

Of course. The point I am making is that, on a dual OS, (a) the only place where conversions happen must be the OS and (b) at most one conversion should occur.

...

Do you see any alternative?

There need not be an alternative. The idea is to not do worse than that.

...

Your strongest argument IMO is the point about conversions not necessarily being value preserving.

This is only an example of how things can go wrong. Another would be when a service pack or a new version changes the default OS conversion. If you rely on the ability to duplicate the OS behavior exactly, there'll be trouble. The bottom line is that, on a dual OS, if we get a narrow string from the user, we should ultimately pass a narrow string to the OS, and if we get a wide string, we should pass a wide string. Everything else will be wrong in some contexts.

Peter Dimov

4:55 p.m.

Peter Dimov wrote:

...

Of course. The point I am making is that, on a dual OS, (a) the only place where conversions happen must be the OS and (b) at most one conversion should occur.

Under the assumption that the user is consistent in his use of a single character type, of course. With a single path design, mixing char and wchar_t in the same path object will obviously force the library to convert. A separate path/wpath design never needs to convert above the dual API layer (although the dual API emulation layer that sits atop single API OSes will convert).

Beman Dawes

11:34 p.m.

At 11:43 AM 11/15/2004, Peter Dimov wrote:

... The bottom line is that, on a dual OS, if we get a narrow string from the user, we should ultimately pass a narrow string to the OS, and if we get a wide string, we should pass a wide string. Everything else will be wrong in some contexts.

You have pretty well convinced me! I'm off tomorrow on a three day business trip, and I'll let the arguments rattle around in my head awhile to be sure.

That still leaves the question of single path vs separate path and wpath classes. A single path class looks very good in a lot of ways but we still have to decide how to deal with this case on a dual narrow/wide O/S:

    path p( "foo" );
    p /= L"bar";

How about a rule that if any portion of a path is wide, the entire path gets converted to wide? I haven't tried to follow that through to see what the implications would be.

What about directory iteration? Is that wide or narrow? Don't directory_iterators have to come as two types, narrow and wide?

--Beman
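One way to picture the "any wide portion makes the whole path wide" rule is the sketch below. It is an illustration only, not the library's design: `mixed_path` and `widen_ascii` are invented names, the separator is hard-coded to '/', and the widening is a placeholder that assumes plain ASCII where real code would use the OS conversion.

```cpp
#include <cassert>
#include <string>

// Placeholder widening; assumes ASCII. Real code would use the OS conversion.
inline std::wstring widen_ascii(const std::string& s) {
    return std::wstring(s.begin(), s.end());
}

// A single path class that stays narrow until a wide element is appended,
// then converts what it has and stays wide from that point on.
class mixed_path {
public:
    mixed_path(const char* s)    : narrow_(s), is_wide_(false) {}
    mixed_path(const wchar_t* s) : wide_(s), is_wide_(true) {}

    mixed_path& operator/=(const char* s) {
        if (is_wide_) { wide_ += L'/'; wide_ += widen_ascii(s); }
        else          { narrow_ += '/'; narrow_ += s; }
        return *this;
    }

    mixed_path& operator/=(const wchar_t* s) {
        if (!is_wide_) {                    // first wide element: widen everything
            wide_ = widen_ascii(narrow_);
            narrow_.clear();
            is_wide_ = true;
        }
        wide_ += L'/';
        wide_ += s;
        return *this;
    }

    bool is_wide() const { return is_wide_; }
    const std::string&  narrow() const { assert(!is_wide_); return narrow_; }
    const std::wstring& wide() const   { assert(is_wide_);  return wide_; }

private:
    std::string  narrow_;
    std::wstring wide_;
    bool         is_wide_;
};
```

Note that the width is tracked at run time here, and that the widening step is exactly the kind of library-side conversion the earlier messages argue against.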

Peter Dimov

16 Nov 16 Nov

12:31 a.m.

Beman Dawes wrote:

...

That still leaves the question of single path vs separate path and wpath classes. A single path class looks very good in a lot of ways but we still have to decide how to deal with this case on a dual narrow/wide O/S: path p( "foo" ); p /= L"bar";

How about a rule that if any portion of a path is wide, the entire path gets converted to wide?

It's either that, or preserving the original width (thickness?). I'm not sure which is better, but see below.

...

What about directory iteration? Is that wide or narrow? Don't directory_iterators have to come as two types, narrow and wide?

With a path+wpath design, it's the user's choice. Iterating over a path returns a narrow iterator, and iterating over a wpath returns a wide iterator. With a single path, we have two options. Do as above, or choose a preferred character type whenever the library needs to return a path to the user (that may or may not vary depending on the filesystem the path points to.) In the latter case, it makes sense to make operator/= preserve the width of the returned path; presumably the library had its reasons to choose one over the other. See, I was right about the single path design creating more problems. ;-) path+wpath looks almost trivial in comparison.

Vladimir Prus

18 Nov 18 Nov

12:31 p.m.

Peter Dimov wrote:

...

Beman Dawes wrote:

...
That still leaves the question of single path vs separate path and wpath classes. A single path class looks very good in a lot of ways but we still have to decide how to deal with this case on a dual narrow/wide O/S: path p( "foo" ); p /= L"bar";

How about a rule that if any portion of a path is wide, the entire path gets converted to wide?

It's either that, or preserving the original width (thickness?). I'm not sure which is better, but see below.

The very same question arises with path/wpath:

    path p( "foo" );
    p /= wpath(L"bar");

...

...
What about directory iteration? Is that wide or narrow? Don't directory_iterators have to come as two types, narrow and wide?

With a path+wpath design, it's the user's choice. Iterating over a path returns a narrow iterator, and iterating over a wpath returns a wide iterator.

With a single path, we have two options. Do as above, or choose a preferred character type whenever the library needs to return a path to the user (that may or may not vary depending on the filesystem the path points to.)

What does Windows do if a file has a wide filename, you use the narrow interface, and the wide filename cannot be converted to an 8-bit encoding without loss of data?

- Volodya

Peter Dimov

12:47 p.m.

Vladimir Prus wrote:

...

Peter Dimov wrote:

...
Beman Dawes wrote:

...
That still leaves the question of single path vs separate path and wpath classes. A single path class looks very good in a lot of ways but we still have to decide how to deal with this case on a dual narrow/wide O/S: path p( "foo" ); p /= L"bar";

How about a rule that if any portion of a path is wide, the entire path gets converted to wide?

It's either that, or preserving the original width (thickness?). I'm not sure which is better, but see below.

The very same question arises with path/wpath:

    path p( "foo" );
    p /= wpath(L"bar");

We can avoid it by not defining a mixed operator/=, though.

Vladimir Prus

1 p.m.

Peter Dimov wrote:

...

...
The very same question arises with path/wpath:

    path p( "foo" );
    p /= wpath(L"bar");

We can avoid it by not defining a mixed operator/=, though.

Yes, but what if I need to do this? Then there should be some conversion function, and why can't that conversion function be called by a mixed operator/=, then?

- Volodya

Peter Dimov

1:18 p.m.

Vladimir Prus wrote:

...

Peter Dimov wrote:

...
...
The very same question arises with path/wpath:

    path p( "foo" );
    p /= wpath(L"bar");

We can avoid it by not defining a mixed operator/=, though.

Yes, but what if I need to do this? Then there should be some conversion function, and why can't that conversion function be called by a mixed operator/=, then?

Because the library may not convert the way you want it to. Even if a wpath is convertible to a path, the converting constructor should probably be explicit.

    p /= wpath(L"bar");

seems fine at first sight, but it's actually an error; it should have been

    p /= "bar";

Without an implicit conversion, the call has to be written as

    p /= path(wpath(L"bar"));

and the redundant conversions are much more evident.

And to get back to the original question, p always stores a narrow path, so there is no ambiguity whether L"bar" needs to be narrowed, or "foo" is to be widened.

There is a potential ambiguity with a mixed operator/, though, if one is provided. Which is why there shouldn't be a mixed operator/. ;-)
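The explicit-conversion argument can be sketched as follows. This is a toy `basic_path`, not the class under discussion: its members are assumptions made for illustration, the separator is hard-coded, and the cross-width conversion is an ASCII-only placeholder. What it does show is that with an explicit converting constructor and a same-type-only `operator/=`, a mixed append does not compile unless the conversion is written out.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

template <class String>
class basic_path {
public:
    typedef typename String::value_type char_type;

    basic_path(const String& s)    : s_(s) {}
    basic_path(const char_type* s) : s_(s) {}

    // Cross-width construction exists but must be spelled out by the caller.
    // Placeholder conversion: a per-character cast, valid for ASCII only.
    template <class Other>
    explicit basic_path(const basic_path<Other>& other) {
        for (auto c : other.str())
            s_.push_back(static_cast<char_type>(c));
    }

    // Only same-type appends are provided; there is no mixed operator/=.
    basic_path& operator/=(const basic_path& rhs) {
        s_ += static_cast<char_type>('/');
        s_ += rhs.s_;
        return *this;
    }

    const String& str() const { return s_; }

private:
    String s_;
};

typedef basic_path<std::string>  path;
typedef basic_path<std::wstring> wpath;
```

With these definitions `p /= "bar"` and `p /= path(wpath(L"bar"))` compile for a `path p`, while `p /= wpath(L"bar")` is rejected because the converting constructor is explicit.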

John Maddock

15 Nov 15 Nov

10:43 a.m.

...

No, MSLU works, as far as I know. But we can't assume its existence. A library that doesn't support Win9x without MSLU will not be useful for users that need to write code that works on a wide variety of machines. Win9x's installed base, although declining, is still significant.

And after I suggested that, I realised that there are compilers other than MSVC :-)

Actually, given the small number of APIs involved, it wouldn't be that hard to wrap them ourselves entirely within the library (i.e. no separate dll to redistribute etc.), but we can still leave worrying about that till after Beman's proved the other concepts work out OK; whatever we end up doing, this detail is solvable IMO.

John.

Peter Dimov

12 Nov 12 Nov

11:53 p.m.

Beman Dawes wrote:

...

At 02:26 PM 11/12/2004, Peter Dimov wrote:

...
I still don't get it. I guess that we need code. Either way, it is the user doing the conversion. They aren't helped one bit.

Yes, either way the user has to supply the conversion code. But without direct support, the user must call that code manually (or more likely in a user written wrapper.) But if path directly supports user conversions, then the conversion functions are called automatically by path objects when needed, and that is a win for the user.

Let's illustrate this with an example. I have a path type X that stores UTF-8 and a path type Y that stores Shift-JIS, both are instantiations of basic_path<char, Tr> for some TrX and TrY. How do you envision the conversion being called (a converting constructor, explicit, implicit, a dedicated function) and which of TrX and TrY will be used for the conversion? ;-)

Beman Dawes

13 Nov 13 Nov

1:40 a.m.

At 06:53 PM 11/12/2004, Peter Dimov wrote:

Beman Dawes wrote:

At 02:26 PM 11/12/2004, Peter Dimov wrote:

I still don't get it. I guess that we need code. Either way, it is the user doing the conversion. They aren't helped one bit.

Yes, either way the user has to supply the conversion code. But without direct support, the user must call that code manually (or more likely in a user written wrapper.) But if path directly supports user conversions, then the conversion functions are called automatically by path objects when needed, and that is a win for the user.

Let's illustrate this with an example. I have a path type X that stores UTF-8 and a path type Y that stores Shift-JIS, both are instantiations of basic_path<char, Tr> for some TrX and TrY.

How do you envision the conversion being called (a converting constructor, explicit, implicit, a dedicated function) and which of TrX and TrY will be used for the conversion?

My prototype looks like this:

    template< class String >
    class basic_path ...
    {
    public:
      basic_path( const String & ); // uses default for path_traits<String>
      basic_path( const String &, conversion_object ); // imbue non-default
      ...

(I'm vague on the type of conversion_object because it is still in flux.)

So this design does not have any way to mix Shift-JIS and UTF-8.

The single path class design would allow mixing the two. In that design conversion occurs at the time individual elements are appended, so whichever conversions are in effect at the time are applied.

    path p( L"..." ); // default UTF-8 in effect
    p.imbue( Shift-JIS conversion ); // exact form undecided as of yet
    p /= L"..."; // the leaf will be Shift-JIS encoded

The single path class design is more flexible.

--Beman

Edward Diener

12 Nov 12 Nov

11:06 p.m.

Beman Dawes wrote:

...

At 08:36 PM 11/11/2004, Peter Dimov wrote:

...
Anything besides path/wpath is even less useful than basic_string that isn't string or wstring, amazing as this may be, and we all know how popular basic_string is.

The issue isn't the popularity of basic_string. As long as there are even a few users who depend on basic_strings other than string and wstring, the committee will probably want to support it.

Also, remember that basic_string<char16_t> and basic_string<char32_t> may well be mandated in the fairly close future.

I agree and have also attempted to make this point before, particularly in comp.std.c++. It is certainly possible that other native character types will be added to C++ in the future. Perhaps also some sort of Unicode character encoding, as has been discussed in another thread. Surely it must be easier to use the C++ template to create strings using a new character type than to tack on more functionality to a generalized string class that must accommodate all future character types.

Thorsten Ottosen

11 Nov 11 Nov

6:46 p.m.

Hi Beman,

"Beman Dawes" <bdawes@acm.org> wrote in message news:6.0.3.0.2.20041111114438.028c55d8@mailhost.esva.net...
| I'm currently working on a major revision of the Boost filesystem library
| aimed at getting it ready to be proposed to the C++ committee for the next
| Library TR.

I have a few requests:

1. there should be a really easy way to iterate over a directory in a recursive fashion; possibly with the choice of doing DFS or BFS. I know we have discussed this before, but I feel it is so commonly used that it should be supported.

2. there should be functions like this:

    template< class Seq >
    Seq load_file( const std::string& );

    template< class Seq >
    Seq load_file( const path& );

which would permit stuff like

    std::string s = load_file<std::string>( "foo.txt" );

A naive implementation would simply do

    Seq seq( iter, iter );
    return seq;

but the idea is of course that faster methods like mem-mapping could be used.

-Thorsten
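For reference, the naive version of `load_file` described above could be written with standard stream iterators roughly as follows. Only the `std::string` overload is shown; throwing `std::runtime_error` on an unopenable file and reading in binary mode are assumptions of this sketch, not part of the request.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Naive implementation: read the whole file through stream-buffer
// iterators and build the sequence from the iterator pair. A faster
// implementation could memory-map the file instead.
template <class Seq>
Seq load_file(const std::string& file_name) {
    std::ifstream in(file_name.c_str(), std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + file_name);
    std::istreambuf_iterator<char> first(in), last;
    return Seq(first, last);
}
```

Any `Seq` constructible from a pair of input iterators over `char` works, so `load_file<std::string>( "foo.txt" )` reads the same way as in the request.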

Beman Dawes

12 Nov 12 Nov

1:17 a.m.

New subject: [filesystem] Major changes before standardization

At 01:46 PM 11/11/2004, Thorsten Ottosen wrote:

Hi Beman,

"Beman Dawes" <bdawes@acm.org> wrote in message news:6.0.3.0.2.20041111114438.028c55d8@mailhost.esva.net...
| I'm currently working on a major revision of the Boost filesystem library
| aimed at getting it ready to be proposed to the C++ committee for the next
| Library TR.

I have a few requests:

1. there should be a really easy way to iterate over a directory in a recursive fashion; possibly with the choice of doing DFS or BFS. I know we have discussed this before, but I feel it is so commonly used that it should be supported.

2. there should be functions like this:

template< class Seq > Seq load_file( const std::string& );

template< class Seq > Seq load_file( const path& );

which would permit stuff like

std::string s = load_file<std::string>( "foo.txt" );

A naive implementation would simply do Seq seq( iter, iter ); return seq; but the idea is of course that faster methods like mem-mapping could be used.

AFAICS, those functions can be built on top of the existing filesystem functionality. Thus they are lower priority for me than internationalization.

But, yes, they would be useful, and I have often had to hand-code both recursion and bulk load/save. That's the point of the convenience header: to be able to have at hand convenient functionality built on top of the library's core functionality.

--Beman

David Abrahams

13 Nov 13 Nov

1:22 a.m.

"Thorsten Ottosen" <nesotto@cs.auc.dk> writes:

...

I have a few requests:

1. there should be a really easy way to iterate over a directory in a recursive fashion; possibly with the choice of doing DFS or BFS. I know we have discussed this before, but I feel it is so commonly used that it should be supported.

Do

    python -c "import os; help(os.walk)"

for a description of a proven interface for this.

--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

Vladimir Prus

12 Nov 12 Nov

7:20 a.m.

Beman Dawes wrote:

...

The critical technical change required is internationalization. The plan is to provide a templated basic_path class, with typedefs for path and wpath. In other words, an approach very similar to the current std::basic_string, std::string, and std::wstring.

I don't think that's a good idea. Which type is supposed to be used in binary library interfaces? path or wpath? If 'path', then what happens if I pass a wpath to that library? If 'wpath' should always be used, then why would I ever want to use 'path'?

I'd be much happier with a design like this:

    class path {
    public:
        path(char*);
        path(wchar_t*);
        string file_string() const;
        wstring file_wstring() const;
    };

Could you comment on the discussion we had on this on the Boost-users list:

http://thread.gmane.org/gmane.comp.lib.boost.user/6337
http://news.gmane.org/find-root.php?message_id=%3c200406221823.i5MINpjj02002...

- Volodya

Peter Dimov

11:26 a.m.

Vladimir Prus wrote:

...

Beman Dawes wrote:

...
The critical technical change required is internationalization. The plan is to provide a templated basic_path class, with typedefs for path and wpath. In other words, an approach very similar to the current std::basic_string, std::string, and std::wstring.

I don't think that's a good idea. Which type is supposed to be used in binary library interfaces? path or wpath? If 'path', then what happens if I pass a wpath to that library? If 'wpath' should always be used, then why would I ever want to use 'path'?

Binary library interfaces should provide both.

...

I'd be much happier with design like this:

class path { public: path(char*); path(wchar_t*); string file_string() const; wstring file_wstring() const; };

Right, this makes omitting an overload impossible, but it may create other problems. :-)

Beman Dawes

4:34 p.m.

New subject: [filesystem] Major changes before standardization

At 02:20 AM 11/12/2004, Vladimir Prus wrote:

Beman Dawes wrote:

The critical technical change required is internationalization. The plan is to provide a templated basic_path class, with typedefs for path and wpath. In other words, an approach very similar to the current std::basic_string, std::string, and std::wstring.

I don't think that's a good idea. Which type is supposed to be used in binary library interfaces? path or wpath?

For the operations functions, neither. Instead a common base class which works for any kind of basic_path. That way you don't expose the native API in headers, or get a size explosion.

...

If 'path', then what happens if I pass a wpath to that library?

Implementation defined conversion occurs.

...

If 'wpath' should always be used, then why would I ever want to use 'path'?

wpath would be a much better choice for programs which are expected to handle international character sets. path is fine for programs which never expect to handle international character sets.

...

I'd be much happier with design like this:

class path { public: path(char*); path(wchar_t*); string file_string() const; wstring file_wstring() const; };

A single path class approach is really interesting. Note, however, that it is a good bit more complicated than your synopsis above because of the need to provide templated member functions to handle user defined types. Ditto for std::basic_string types. One of the advantages is that it extends to user programs some of the flexibility that otherwise would only be available by hand coding overloads of user functions.

I spent a while last week trying to work out the specifics of a single path class approach. Most of it worked pretty well. What stopped me was that if a user is allowed to supply conversion functions, then it is easy to inadvertently compose paths (via operator /=) which mix encoding schemes in the same path. The thought of a filename encoded as UTF-8 living in a directory encoded in Shift-JIS boggled my mind. Is this something that we should allow the user to do, if they wish, and if their O/S gives them enough freedom to do so? Or is it so perverse that we would be negligent in making it easy?

I'll give the single path approach some more thought, going on the assumption that there is no need to detect or diagnose mixed encodings. That was the showstopper, but it is probably overly protective.

...

Could you comment on the discussion we had on this on the Boost-users list:

http://thread.gmane.org/gmane.comp.lib.boost.user/6337

It just takes too much time to comment in detail. Mostly I agree with you and Peter, but wonder if you are aware of some of the committee's views. Need to support various string types, proposal to add wide character filename signatures, etc. Thanks, --Beman
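To make the single path class idea concrete, here is a minimal sketch of how such a class might convert everything to one internal representation at construction time, so that operator /= never sees two encodings. It is not code from the library or from the thread. The wide internal storage, the ASCII-only conversions, and the const-qualified constructor parameters are all assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

// Sketch only: a single path class that accepts narrow or wide input and
// keeps one internal (wide) representation. The conversions assume 7-bit
// ASCII; a real library would need locale- or encoding-aware conversions.
class path {
public:
    path(const char* s) {
        for (; *s; ++s) {
            // ASCII-only assumption for the narrow constructor.
            assert(static_cast<unsigned char>(*s) < 0x80);
            str_ += static_cast<wchar_t>(*s);
        }
    }
    path(const wchar_t* s) : str_(s) {}

    // Join with '/'. No normalization or checking is attempted.
    path& operator/=(const path& rhs) {
        if (!str_.empty() && !rhs.str_.empty()) str_ += L'/';
        str_ += rhs.str_;
        return *this;
    }

    // Narrow form; characters outside ASCII become '?' in this sketch.
    std::string file_string() const {
        std::string out;
        for (std::wstring::size_type i = 0; i < str_.size(); ++i) {
            const unsigned long c = static_cast<unsigned long>(str_[i]);
            out += c < 0x80 ? static_cast<char>(c) : '?';
        }
        return out;
    }

    std::wstring file_wstring() const { return str_; }

private:
    std::wstring str_;
};
```

With this shape, path("docs") /= L"notes.txt" mixes narrow and wide input in user code but not in the stored path. The cost is that a narrow result may be lossy, which is the kind of trade-off the thread is weighing.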

Peter Dimov

6:54 p.m.

Beman Dawes wrote:

...

At 02:20 AM 11/12/2004, Vladimir Prus wrote:

...
I'd be much happier with design like this:

class path {
public:
    path(char*);
    path(wchar_t*);
    string file_string() const;
    wstring file_wstring() const;
};

A single path class approach is really interesting. Note, however, that it is a good bit more complicated than your synopsis above because of the need to provide templated member functions to handle user defined types.

Is there _really_ such a need?

Beman Dawes

8:43 p.m.

At 01:54 PM 11/12/2004, Peter Dimov wrote:

...

Beman Dawes wrote:

...
At 02:20 AM 11/12/2004, Vladimir Prus wrote:

...
I'd be much happier with design like this:

class path {
public:
    path(char*);
    path(wchar_t*);
    string file_string() const;
    wstring file_wstring() const;
};

A single path class approach is really interesting. Note, however, that it is a good bit more complicated than your synopsis above because of the need to provide templated member functions to handle user defined types.

Is there _really_ such a need?

That's a good question. I personally think the need for UDT support is too marginal to worry about. I've just posted a query to the C++ Library Working Group's reflector to see if anyone there has strong feelings one way or another. Thanks, --Beman

Peter Dimov

13 Nov 13 Nov

12:11 a.m.

Beman Dawes wrote:

...

At 01:54 PM 11/12/2004, Peter Dimov wrote:

...
Beman Dawes wrote:

...
A single path class approach is really interesting. Note, however, that it is a good bit more complicated than your synopsis above because of the need to provide templated member functions to handle user defined types.

Is there _really_ such a need?

That's a good question. I personally think the need for UDT support is too marginal to worry about.

The interesting thing is that I can see a need for a generic path library (filesystem-independent). It even makes perfect sense for this library to be templated on the character type (because the operations do not depend on a specific type) and if I squint just right I see some benefits in adding a 'class Tr = path_traits<Ch>' parameter that will supply the path separator character and the escape character (if any). However I don't see how such a generic path library can be separated from the filesystem code. The generic grammar is simply not expressive enough to handle all native path quirks.
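As a rough illustration of the generic, filesystem-independent path described above, the sketch below templates a path on the character type and takes the separator from a traits parameter. Only the names basic_path, path_traits, path and wpath come from the thread. The member names, the separator() function, and the behaviour of operator /= are hypothetical, and the escape character is left out entirely.

```cpp
#include <cassert>
#include <string>

// Hypothetical traits: only the separator is supplied in this sketch.
template <class Ch>
struct path_traits;

template <>
struct path_traits<char> {
    static char separator() { return '/'; }
};

template <>
struct path_traits<wchar_t> {
    static wchar_t separator() { return L'/'; }
};

template <class Ch, class Tr = path_traits<Ch> >
class basic_path {
public:
    typedef std::basic_string<Ch> string_type;

    basic_path() {}
    basic_path(const Ch* s) {
        assert(s != 0);  // a null pointer is not a valid path here
        str_ = s;
    }
    basic_path(const string_type& s) : str_(s) {}

    // Append rhs, inserting a separator unless one is already present
    // or either side is empty.
    basic_path& operator/=(const basic_path& rhs) {
        if (!str_.empty() && !rhs.str_.empty()
            && str_[str_.size() - 1] != Tr::separator())
            str_ += Tr::separator();
        str_ += rhs.str_;
        return *this;
    }

    const string_type& string() const { return str_; }

private:
    string_type str_;
};

typedef basic_path<char> path;
typedef basic_path<wchar_t> wpath;
```

Nothing here touches the operating system, which is the point of the generic layer. It also shows the limitation raised in the message: a grammar this simple has no way to express native quirks.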

David Abrahams

1:24 a.m.

A meta-comment: it sounds to me like this library may not be ready to be proposed for standardization yet. Shouldn't we give these issues some time to bake first? -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

Beman Dawes

2:16 a.m.

At 08:24 PM 11/12/2004, David Abrahams wrote:

...

A meta-comment: it sounds to me like this library may not be ready to be proposed for standardization yet. Shouldn't we give these issues some time to bake first?

We'll have a Boost implementation long before standardization. Most of the proposal will be identical to the current boost library. Most of class path will be identical to the current class path. Since it will be two or more years before final acceptance of a standards proposal, it seems there will be plenty of time to bake. We'll see. --Beman

Beman Dawes

1:45 a.m.

At 07:11 PM 11/12/2004, Peter Dimov wrote:

...

Beman Dawes wrote:

...
At 01:54 PM 11/12/2004, Peter Dimov wrote:

...
Beman Dawes wrote:

...
A single path class approach is really interesting. Note, however, that it is a good bit more complicated than your synopsis above because of the need to provide templated member functions to handle user defined types.

Is there _really_ such a need?

That's a good question. I personally think the need for UDT support is too marginal to worry about.

The interesting thing is that I can see a need for a generic path library (filesystem-independent). It even makes perfect sense for this library to be templated on the character type (because the operations do not depend on a specific type) and if I squint just right I see some benefits in adding a 'class Tr = path_traits<Ch>' parameter that will supply the path separator character and the escape character (if any).

However I don't see how such a generic path library can be separated from the filesystem code. The generic grammar is simply not expressive enough to handle all native path quirks.

That's correct. For example, it doesn't have an escape mechanism to represent '/' in an element name. At least so far, no one has come up with any practical cases where that is a problem. --Beman

Rogier van Dalen

16 Nov 16 Nov

2:05 p.m.

On Thu, 11 Nov 2004 12:20:23 -0500, Beman Dawes <bdawes@acm.org> wrote:

...

I'm currently working on a major revision of the Boost filesystem library aimed at getting it ready to be proposed to the C++ committee for the next Library TR.

The critical technical change required is internationalization. The plan is to provide a templated basic_path class, with typedefs for path and wpath. In other words, an approach very similar to the current std::basic_string, std::string, and std::wstring.

Does this suppose that all OS paths can be represented as a string of wchar's? Does it suppose that the whole range that wchar_t can cover is usable in file names? Would that have to be added to the wchar_t definition? What if I want to use a 32-bit Unicode codepoint for a filename on Windows? Regards, Rogier

Peter Dimov

4:12 p.m.

Rogier van Dalen wrote:

...

What if I want to use a 32-bit Unicode codepoint for a filename on Windows?

Windows doesn't expose an API that takes 32 bit characters.

John Maddock

4:16 p.m.

...

Does this suppose that all OS paths can be represented as a string of wchar's? Does it suppose that the whole range that wchar_t can cover is usable in file names? Would that have to be added to the wchar_t definition? What if I want to use a 32-bit Unicode codepoint for a filename on Windows?

wchar_t is supposed to be able to hold all the characters that are supported by the operating system, so any narrow character multi-byte sequence should be representable as a wide character one, but...

Not all wide character sequences can be represented as a narrow character one (depending upon the encoding used), and if the underlying OS uses narrow character strings as path names (unix), and the wide character sequence can not be converted, then the file can not exist, so we get an error either which way.

As for using UTF-32 strings on Windows, are surrogates allowed as path name characters? If "yes" then I guess it's supported (but you may have to do the UTF-32 to UTF-16 conversion yourself); this is an area that needs more thought.

John.
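For readers wondering what doing the UTF-32 to UTF-16 conversion yourself would involve, the sketch below encodes each code point above U+FFFF as a surrogate pair. The function name is hypothetical, and it uses the C++11 types char16_t and char32_t, which postdate this thread. It is not part of the filesystem library.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode UTF-32 code points as UTF-16 code units,
// using a surrogate pair for anything above U+FFFF.
std::u16string utf32_to_utf16(const std::u32string& in) {
    std::u16string out;
    for (char32_t cp : in) {
        // Accept only Unicode scalar values: at most U+10FFFF and not a
        // surrogate code point.
        assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            const char32_t v = cp - 0x10000;  // 20-bit offset
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
        }
    }
    return out;
}
```

For example, U+1F600 becomes the two code units 0xD83D and 0xDE00. Whether a given Windows API accepts such pairs in file names is a separate question that this sketch does not answer.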

Beman Dawes

19 Nov 19 Nov

7:46 p.m.

At 11:16 AM 11/16/2004, John Maddock wrote:

...

... Not all wide character sequences can be represented as a narrow character one (depending upon the encoding used), and if the underlying OS uses narrow character strings as path names (unix), and the wide character sequence can not be converted, then the file can not exist, so we get an error either which way.

Notice too that if a narrow character path O/S is very restrictive as to what characters can appear in a path, then common encodings usually viewed as quite portable wouldn't work for that O/S. --Beman

Cory Nelson

13 Feb 13 Feb

10:10 a.m.

Well? It's been a few months, has there been any progress? I know a good deal of people excited to use wpath :)

On Thu, 11 Nov 2004 12:20:23 -0500, Beman Dawes <bdawes@acm.org> wrote:

...

I'm currently working on a major revision of the Boost filesystem library aimed at getting it ready to be proposed to the C++ committee for the next Library TR.

The critical technical change required is internationalization. The plan is to provide a templated basic_path class, with typedefs for path and wpath. In other words, an approach very similar to the current std::basic_string, std::string, and std::wstring.

Doing this adds a certain amount of complexity compared to the current path class. For example, a path_traits class has to be introduced to give basic_paths on user defined types a way to import the conversions, delimiters, and other traits.

To offset the added complexity, I'd like to reduce some of the complexities in the current design:

* Eliminate the distinction between native and generic grammars; either would be permitted in all contexts. As well as simplifying the class interface a bit, this will also eliminate a source of user confusion.

* Move the name error checking from the basic_path class into stand alone functions. Although the current error checking could be further improved by Peter Dimov's suggested change allowing failure to be treated as either a warning or a hard error, moving it out of the class simplifies the interface, allows full path (rather than name-by-name) error checks, and eases internationalization.

The primary downside of these changes will be that users who want path portability checks will have to code calls to checking functions that are currently being called automatically.

The internationalization of the library is a big enough change that we will probably want to have at least a mini-review once the changes are complete.

Comments?

--Beman

-- Cory Nelson http://www.int64.org

Beman Dawes

10:05 p.m.

At 05:10 AM 2/13/2005, Cory Nelson wrote:

...

Well? It's been a few months, has there been any progress? I know a good deal of people excited to use wpath :)

I'm very actively coding and testing.

The code implementing the path header is pretty much complete, except for some of the name checking functions, and is passing current narrow regression tests on Windows.

The code implementing the operations header is perhaps 95% complete for Windows, and 50% complete for POSIX, and is passing current narrow regression tests on Windows except for the few functions not yet implemented.

I've done a bit of informal testing of wpath; since the code is now fully templatized, anything that works for path tends to work for wpath since both are just typedefs of basic_path<>.

I also now have a full Linux machine set up, so I'll be able to test on Linux without having to borrow parts from my Windows box.

The plan is to check in an "i18n" branch within the next two weeks or so.

I'm glad to hear that there is at least one person out there who is interested in wpath:-)

--Beman

55 comments

12 participants

participants (12)

Aaron W. LaFramboise
Beman Dawes
Bruno Martínez Aguerre
Cory Nelson
David Abrahams
Edward Diener
John Maddock
Jonathan Turkanis
Peter Dimov
Rogier van Dalen
Thorsten Ottosen
Vladimir Prus