[filesystem] i18n branch committed to CVS


Beman Dawes

23 Mar 2005

3:07 p.m.

CVS now contains a branch "i18n" of the filesystem directories:

* Class templates basic_path, basic_directory_iterator, etc., support narrow, wide, and user-defined path types. Typedefs path, directory_iterator, etc., are provided, so most existing code continues to work.
* Name checking has been moved out of class basic_path, so it no longer gets in the way of those not using the name checking facility. Deprecated versions of various functions are provided to preserve existing code.
* The need to explicitly identify native format paths via special constructors has been eliminated, simplifying casual use of the library. Deprecated versions of those constructors are provided to preserve existing code.

More detailed documentation is available in the i18n CVS branch at <boost-root>/libs/filesystem/doc/i18n.html. Most of the other documentation has also been updated to reflect the changes.

Current issues with the i18n branch include:

* Testing has been limited to Intel 8.1, Metrowerks 9.3, and Microsoft 7.1 on Windows, and GCC 3.3 on Mac OS X. How much effort to put into broken compilers is going to be a real issue: internationalization may have to be disabled, and that is a lot of work.
* Some tests (particularly in operations_test.cpp) are commented out. Minor issues need to be resolved and the tests reactivated.
* Two or three of the less important doc files haven't been updated yet.
* The convenience header and source code haven't been converted yet.
* The UTF-8 conversion facet header and source are duplicated. Need to identify and fix the issues that prevented the boost/detail version from being used.
* The "// TODO" notes in the code need to be revisited, issues fixed, and the notes eliminated.
* Testing on Windows 9X would be helpful, and issues fixed or documented.
* Backed-up issues and patches from the mailing list need to be addressed.
* Windows tests need to be run on a FAT filesystem to see what issues arise. All of the Windows analysis and design effort so far has gone into NTFS, so there may be critical problems with filesystems that only support narrow paths.
* The POSIX wpath implementation assumes that UTF-8 is always the operating system's preferred external path encoding. If any Boost users are concerned about other encodings, please let me know.
* A timing test is needed, particularly for directory iteration. There is concern over excessive copying by basic_path conversion functions, so we need to test whether speed is a real problem in a realistic application.

Any comments would be appreciated (but please look at the docs first, as a lot of effort went into rationale and trying to answer anticipated questions).

My plan is to pause development for a while, both to allow time for comments to be made, and also so I can attack a backlog of non-Boost work that has been building up.

--Beman
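For readers new to the design, here is a minimal toy sketch of what "class templates supporting narrow and wide path types, with typedefs path and wpath" means. Everything in it is hypothetical and far simpler than the real basic_path (no traits, no name checking, '/' as the only separator); it only shows why a narrow path and a wide path are distinct types that cannot be mixed without an explicit conversion.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy for illustration only; NOT Boost.Filesystem's basic_path.
// One class template, parameterized on the string type, plus narrow and wide typedefs.
template <class String>
class basic_path {
public:
    typedef String string_type;
    typedef typename String::value_type value_type;

    basic_path() {}
    basic_path(const value_type* s) : str_(s) {}
    basic_path(const string_type& s) : str_(s) {}

    // Appends rhs with a '/' separator. Only the same specialization is
    // accepted, so appending a wpath to a path fails to compile instead of
    // silently converting between char and wchar_t.
    basic_path& operator/=(const basic_path& rhs) {
        if (!str_.empty() && !rhs.str_.empty())
            str_ += value_type('/');
        str_ += rhs.str_;
        return *this;
    }

    const string_type& string() const { return str_; }

private:
    string_type str_;
};

typedef basic_path<std::string>  path;   // narrow-character paths
typedef basic_path<std::wstring> wpath;  // wide-character paths
```

With this toy, `path p("a"); p /= path("b");` yields "a/b", while `p /= wpath(L"b");` is rejected at compile time because no conversion from wpath to path exists.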


Vladimir Prus

31 Mar

9:22 a.m.

On Wednesday 23 March 2005 18:07, Beman Dawes wrote:

...

CVS now contains a branch "i18n" of the filesystem directories:

* Class templates basic_path, basic_directory_iterator, etc, support narrow, wide, and user-defined path types. Typedefs path, directory_iterator, etc, are provided, so most existing code continues to work.

I recall we had a long discussion concerning basic_path vs. single path type. I don't think results of that discussion are present in i18n.html -- essentially, there's no rationale for going with basic_path.

There were several distinct issues. First is that if you have a single path that stores Unicode, then exists(path("foo")) will perform char -> wchar_t conversion inside path constructor, and that conversion might not be exactly the same as what the OS would have performed. One issue is that the program might not have initialized the global locale with locale(""). Another is that the conversion performed by the OS might be different from that of locale(""). I must admit I don't know when that might be the case on Windows (and POSIX doesn't do such conversions). So, I'd really like to know about real use cases. After all, QFile + QString works on Windows. See the docs at http://doc.trolltech.com/3.3/qfile.html

The second issue, only relevant if the above one is real, is mixing different types of path. With a single path:

    path p("a"), p2(L"b");
    p /= p2; // must do conversion, might not do what's desired

With basic_path:

    path p("a");
    wpath p2(L"b");
    p /= p2;       // won't compile
    p /= path(p2); // explicit conversion is clearly seen

This again relies on the assumption that conversion from char to wchar_t might not do exactly the same as the OS conversion would.

The third issue is that I don't like the templated implementation of all functions. There's already a compiled library, so why not move all the code there? For example:

    class common_path {
    public:
        char* data;
        bool is_wide;
    };

    bool exists(const common_path& p)
    {
        if (p.is_wide)
            return SomeOSFunctionW((wchar_t*)p.data);
        else
            return SomeOSFunctionA(p.data);
    }

Also I note that there's no conversion from basic_path<char> to basic_path<wchar_t> or vice versa, as far as I can say. To recall my argument for conversion: say I have a library which exposes paths in the interface, should I use path or wpath in it? If I use path, then due to missing conversion, the library is unusable with other code that uses wpath. So I need to use wpath. And so basically, all libraries need to use wpath everywhere. So, why do you need path at all?
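The single-path alternative can be sketched in modern C++ as follows. This is a hypothetical illustration, not code from any library: std::variant (C++17) replaces the raw char* plus bool flag of common_path, and os_call_narrow/os_call_wide are invented stand-ins for an operating system's narrow ("A") and wide ("W") entry points.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Which OS entry point a call was routed to (for the example only).
enum class os_api { narrow, wide };

// Hypothetical single path type: stores either a narrow or a wide string and
// remembers which. std::variant plays the role of common_path's
// char* data plus bool is_wide.
class any_path {
public:
    any_path(const char* s) : rep_(std::string(s)) {}
    any_path(const wchar_t* s) : rep_(std::wstring(s)) {}

    bool is_wide() const { return std::holds_alternative<std::wstring>(rep_); }

    // Dispatch without any conversion: one check of the stored alternative,
    // then the matching narrow or wide "OS function".
    os_api call_os() const {
        if (is_wide())
            return os_call_wide(std::get<std::wstring>(rep_));
        return os_call_narrow(std::get<std::string>(rep_));
    }

private:
    // Invented stand-ins for SomeOSFunctionA / SomeOSFunctionW.
    static os_api os_call_narrow(const std::string&) { return os_api::narrow; }
    static os_api os_call_wide(const std::wstring&) { return os_api::wide; }

    std::variant<std::string, std::wstring> rep_;
};
```

The design point is that a path constructed from char never needs converting as long as it is not combined with a wide one; the cost is one runtime check per OS call.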

...

* The POSIX wpath implementation assumes that UTF-8 is always the operating system's preferred external path encoding. If any Boost users are concerned about other encodings, please let me know.

I certainly do. The standard encoding for Russian on Linux is KOI8-R. Probably, we need to use the conversion facet that's part of the global locale. Qt uses

    char *charset = nl_langinfo(CODESET);

and the values of the LC_ALL, LC_CTYPE and LANG variables. But then it contains its own translation tables. So using locale("") is the best guess, I think.

- Volodya
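Following the suggestion to use locale(""), here is a minimal sketch of widening an externally encoded path string through a locale's std::codecvt<wchar_t, char, std::mbstate_t> facet. widen_with_locale is a hypothetical helper, not part of Boost; a real implementation would also need the reverse direction and better error reporting.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Sketch (not Boost.Filesystem code): convert a narrow, externally encoded
// path string to std::wstring using the codecvt facet of a caller-chosen
// locale, e.g. std::locale("") to follow the user's environment settings
// instead of assuming UTF-8. Any failure is reported by throwing.
std::wstring widen_with_locale(const std::string& in, const std::locale& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> cvt_type;
    const cvt_type& cvt = std::use_facet<cvt_type>(loc);

    if (in.empty())
        return std::wstring();

    // For the standard facet each wide character consumes at least one
    // narrow char, so in.size() wide characters is enough output space.
    std::wstring out(in.size(), L'\0');
    std::mbstate_t state = std::mbstate_t();
    const char* from_next = nullptr;
    wchar_t* to_next = nullptr;

    std::codecvt_base::result r = cvt.in(state,
        in.data(), in.data() + in.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);

    if (r != std::codecvt_base::ok || from_next != in.data() + in.size())
        throw std::runtime_error("narrow-to-wide path conversion failed");

    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

An application would typically pass std::locale("") (which throws std::runtime_error if the environment names an unknown locale) or the global std::locale(), so a KOI8-R system gets KOI8-R decoding rather than UTF-8.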

Dan Rosen

4:53 p.m.

Hi, I've been stuck for the past month working on a Win32 i18n project that seems like it will never end. I don't have much background in this area, but I can answer a question or two.

...

First is that if you have a single path that stores Unicode, then

exists(path("foo"))

will perform char -> wchar_t conversion inside path constructor, and that conversion might not be exactly the same as what the OS would have performed. One issue is that the program might not have initialized the global locale with locale(""). Another is that the conversion performed by the OS might be different from that of locale("").

So, one thing I know is that Windows 9x and NT-class systems behave differently in this respect. You're probably aware that NT-class systems traffic in wchar_t* strings encoded in UCS-2 internally, and that 9x-class systems deal with char* strings encoded in the system's ANSI codepage. Additionally, on Win2k/XP you have the ability to set a thread's ANSI codepage separately from the system's ANSI codepage. So, I'm not 100% positive about this, but I believe an example of where locale() will differ from what Windows wants is the following case:

- The OS is Win2k/XP, which stores strings as UCS-2,
- The system and thread codepages differ,
- You initialize a path("foo"), requiring a conversion up to UCS-2.

I think in this case, locale() won't give you what you want. I'm no expert on this, though, so it's worth checking.

...

    path p("a"), p2(L"b");
    p /= p2; // must do conversion, might not do what's desired

I think this is important to get right. Having path and wpath distinct from each other, and forcing explicit conversion, seems like exposing a choice to users in the interface that's entirely orthogonal to filesystem manipulation. My apologies if this has already been discussed ad nauseam, but it seems to me like the "do the right thing" string conversions should be encapsulated in a different library.

...

Also I note that there's no conversion from basic_path<char> to basic_path<wchar_t> or vice versa, as far as I can say. To recall my argument for conversion: say I have a library which exposes paths in the interface, should I use path or wpath in it?

What seems to be common practice on Windows is something like this:

    typedef std::basic_string<TCHAR> tstring;

where TCHAR is a macro which expands either to "char" or "wchar_t" depending on whether _UNICODE is defined. This tends to be clumsy, in my opinion. I fear the same practice would be adopted for basic_path<>.

Cheers,
dr
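To make the concern concrete, here is a portable imitation of the TCHAR idiom. MY_UNICODE and MY_T are hypothetical macros standing in for _UNICODE and the Windows text macros; the point is only that the character type is chosen once per build, which is the practice Dan fears would spread to basic_path<>.

```cpp
#include <cassert>
#include <string>

// Portable imitation of the Windows TCHAR idiom. MY_UNICODE and MY_T are
// hypothetical macros standing in for _UNICODE and the Windows text macros.
#ifdef MY_UNICODE
typedef wchar_t tchar;
#define MY_T(s) L##s
#else
typedef char tchar;
#define MY_T(s) s
#endif

typedef std::basic_string<tchar> tstring;

// Compiles in either mode, but the character type is fixed per build:
// a narrow build cannot accept a std::wstring without an explicit conversion.
inline tstring join(const tstring& a, const tstring& b) {
    return a + MY_T("/") + b;
}
```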

Beman Dawes

2 May

12:59 p.m.

At 05:22 AM 3/31/2005, Vladimir Prus wrote:

...

On Wednesday 23 March 2005 18:07, Beman Dawes wrote:

...
CVS now contains a branch "i18n" of the filesystem directories:

* Class templates basic_path, basic_directory_iterator, etc, support narrow, wide, and user-defined path types. Typedefs path, directory_iterator, etc, are provided, so most existing code continues to work.

I recall we had a long discussion concerning basic_path vs. single path type. I don't think results of that discussion are present in i18n.html -- essentially, there's no rationale for going with basic_path.

OK, I'll add rationale. Here is a first draft:

During preliminary internationalization discussion on the Boost developer's list, a design was considered for a single path class which could hold either narrow or wide character based paths. That design was rejected because:

* There were technical issues with conversions when a narrow path was appended to a wide path, and vice versa. The concern was that double conversions could cause incorrect results, that conversions best left to the operating system would be performed, and that the technical complexity was too great in relation to perceived benefits. User-defined types would only make the problem worse.
* The design was, for many applications, an over-generalization with runtime memory and speed costs which would have to be paid for even when not needed.
* There was concern that the design would be confusing to users, given that the standard library already uses single-value-type strings, rather than strings which morph value types as needed.

...

...

Also I note that there's no conversion from basic_path<char> to basic_path<wchar_t> or vice versa, as far as I can say. To recall my argument for conversion: say I have a library which exposes paths in the interface, should I use path or wpath in it? If I use path, then due to missing conversion, the library is unusable with other code that uses wpath. So I need to use wpath. And so basically, all libraries need to use wpath everywhere. So, why do you need path at all?

Applications which need wide-character internationalization will use wpath or other wide-character basic_path types. Applications which don't need wide-character internationalization will use path. Both are needed - they serve different user needs.

...

...
* The POSIX wpath implementation assumes that UTF-8 is always the operating system's preferred external path encoding. If any Boost users are concerned about other encodings, please let me know.

I certainly do. The standard encoding for Russian on Linux is KOI8-R. Probably, we need to use the conversion facet that's part of the global locale. ... So using locale("") is the best guess, I think.

Hum... Point taken. I was hoping to avoid use of global locale because of past unhappy experience with inconsistencies between different UNIX flavors. Perhaps that situation has improved. Maybe a UTF-8 fallback could be provided for systems where global locale is unreliable. --Beman

Vladimir Prus

11 May

6:03 a.m.

On Monday 02 May 2005 16:59, Beman Dawes wrote:

...

...
I recall we had a long discussion concerning basic_path vs. single path type. I don't think results of that discussion are present in i18n.html -- essentially, there's no rationale for going with basic_path.

OK, I'll add rationale. Here is a first draft:

During preliminary internationalization discussion on the Boost developer's list, a design was considered for a single path class which could hold either narrow or wide character based paths. That design was rejected because:

* There were technical issues with conversions when a narrow path was appended to a wide path, and vice versa. The concern was that double conversions could cause incorrect results, that conversions best left to the operating system would be performed, and that the technical complexity was too great in relation to perceived benefits. User-defined types would only make the problem worse.

I think this statement is not proved. Essentially, you are saying that there's an operating system that performs some char->wchar and wchar->char conversions in path operations, but does not provide any API to do the same conversion on plain char* and wchar_t* pointers. I find this somewhat hard to believe. It might be true that std::locale cannot do the same conversion as the OS's fs layer, but:

1. It's a bug in std::locale design/implementation.
2. You don't need to use std::locale; you can use the OS API.

...

* The design was, for many applications, an over-generalization with runtime memory and speed costs which would have to be paid for even when not needed.

I disagree. Consider that your current design does not allow to mix different path types at all. So, we should evaluate the performance of single path design only for the case where char/wchar_t are never fixed -- that is all paths are created either from char, or from wchar_t. Then, the memory overhead is a single bool flag, telling if a path was created from char or wchar_t. No operation will need to do any conversion, so runtime overhead is just checking of that flag. I find this overhead very small, compared to the size of memory allocated for path, and the amount of work done by path method. Not to mention that a single OS call is likely to be 1000 times more expensive than this single comparison.

...

* There was concern that the design would be confusing to users, given that the standard library already uses single-value-type strings, rather than strings which morph value types as needed.

I don't think we should stick to std::string design, given that most environments with good Unicode support (Qt, Java, .Net) use a single string type.

...

Also I note that there's no conversion from basic_path<char> to basic_path<wchar_t> or vice versa, as far as I can say. To recall my argument for conversion: say I have a library which exposes paths in the interface, should I use path or wpath in it? If I use path, then due to missing conversion, the library is unusable with other code that uses wpath. So I need to use wpath. And so basically, all libraries need to use wpath everywhere. So, why do you need path at all?

Applications which need wide-character internationalization will use wpath or other wide-character basic_path types. Applications which don't need wide-character internationalization will use path. Both are needed - they serve different user needs.

I think you're missing my point. Yes, the decision for application can probably be made. But if I'm writing a library I don't know if it will be used by application that needs wide paths, or application that does not need wide paths. I have to decide which path type to use in the interface (I'm talking about binary interface specifically). But if there's no path<->wpath conversion, then whatever type I choose, some applications will have trouble using the library, because they would not be able to convert between path types on the library boundary. Even if I provide both types in the interface, if there's no standard path<->wpath conversion, I'll have to either:

- write such conversion myself, or
- duplicate all code of the library -- for path and for wpath

- Volodya
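One way a library can serve both kinds of applications without duplicating its code is to implement everything once for wide strings and add thin narrow overloads that convert at the boundary. The sketch below is hypothetical: mylib, count_components and to_wide are invented names, and to_wide accepts ASCII only as a placeholder for a real conversion (the OS API or a locale facet), which is exactly the piece Vladimir argues a library author should not have to write.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

namespace mylib {

// Placeholder conversion (assumption): ASCII only. A real library would call
// the OS API or a locale's codecvt facet here.
inline std::wstring to_wide(const std::string& s) {
    std::wstring out;
    out.reserve(s.size());
    for (char c : s) {
        if (static_cast<unsigned char>(c) > 0x7F)
            throw std::range_error("non-ASCII path needs a real conversion");
        out += static_cast<wchar_t>(c);
    }
    return out;
}

// The single real implementation, written once against wide strings:
// counts the non-empty '/'-separated components of a path.
inline std::size_t count_components(const std::wstring& p) {
    std::size_t n = 0;
    bool in_name = false;
    for (wchar_t c : p) {
        if (c == L'/') {
            in_name = false;
        } else if (!in_name) {
            in_name = true;
            ++n;
        }
    }
    return n;
}

// Narrow entry point: convert once at the library boundary, then forward.
inline std::size_t count_components(const std::string& p) {
    return count_components(to_wide(p));
}

}  // namespace mylib
```

Both entry points share one implementation, so the only duplicated code is the one-line forwarding overload; the quality of the whole scheme rests on the conversion function.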

Beman Dawes

18 May

9:26 p.m.

At 02:03 AM 5/11/2005, Vladimir Prus wrote:

...

On Monday 02 May 2005 16:59, Beman Dawes wrote:

...

I recall we had a long discussion concerning basic_path vs. single path type. I don't think results of that discussion are present in i18n.html -- essentially, there's no rationale for going with basic_path.

OK, I'll add rationale. Here is a first draft:

During preliminary internationalization discussion on the Boost developer's list, a design was considered for a single path class which could hold either narrow or wide character based paths. That design was rejected because:

* There were technical issues with conversions when a narrow path was appended to a wide path, and vice versa. The concern was that double conversions could cause incorrect results, that conversions best left to the operating system would be performed, and that the technical complexity was too great in relation to perceived benefits. User-defined types would only make the problem worse.

I think this statement is not proved. Essentially, you are saying that there's an operating system that performs some char->wchar and wchar->char conversions in path operations, but does not provide any API to do the same conversion on plain char* and wchar_t* pointers. I find this somewhat hard to believe.

Windows, for one. Although that is really beside the point. The worry is the need for conversions when a path changes from wide to narrow, or vice versa.

...

...
* The design was, for many applications, an over-generalization with runtime memory and speed costs which would have to be paid for even when not needed.

I disagree. Consider that your current design does not allow to mix different path types at all. So, we should evaluate the performance of single path design only for the case where char/wchar_t are never fixed [mixed?] -- that is all paths are created either from char, or from wchar_t.

Then, the memory overhead is a single bool flag, telling if a path was created from char or wchar_t.

The memory overhead I was worried about wasn't user space for the bool, but the need to link in both narrow and wide versions of functions, particularly on low memory embedded systems.

...

No operation will need to do any conversion, so runtime overhead is just checking of that flag. I find this overhead very small, compared to the size of memory allocated for path, and the amount of work done by path method. Not to mention that a single OS call is likely to be 1000 times more expensive than this single comparison.

That comparison isn't a worry for me either.

...

...
* There was concern that the design would be confusing to users, given that the standard library already uses single-value-type strings, rather than strings which morph value types as needed.

I don't think we should stick to std::string design, given that most environments with good Unicode support (Qt, Java, .Net) use a single string type.

A lot of people say they don't like the std::string design, but it is the standard for C++. Perhaps someday another string design will become popular, but that isn't even on the horizon AFAIKS.

...

...
...
Also I note that there's no conversion from basic_path<char> to basic_path<wchar_t> or vice versa, as far as I can say. To recall my argument for conversion: say I have a library which exposes paths in the interface, should I use path or wpath in it? If I use path, then due to missing conversion, the library is unusable with other code that uses wpath. So I need to use wpath.

Yes. It is the same situation as with std::string vs std::wstring. If you think your app may sometimes have to deal correctly with wide strings (or paths) you should use std::wstring (and wpath).

...

...
...
And so basically, all libraries need to use wpath everywhere. So, why do you need path at all?

Applications which need wide-character internationalization will use wpath or other wide-character basic_path types. Applications which don't need wide-character internationalization will use path. Both are needed - they serve different user needs.

I think you're missing my point. Yes, the decision for application can probably be made. But if I'm writing a library I don't know if it will be used by application that needs wide paths, or application that does not need wide paths.

I have to decide which path type to use in the interface (I'm talking about binary interface specifically). But if there's no path<->wpath conversion, then whatever type I choose, some applications will have trouble using the library, because they would not be able to convert between path types on the library boundary.

Even if I provide both types in the interface, if there's no standard path<->wpath conversion, I'll have to either:

- write such conversion myself, or
- duplicate all code of the library -- for path and for wpath

Partially in answer to this very valid concern, I've exposed the wpath_traits conversion interface. I'm not sure that is a complete solution, but at least you wouldn't have to write the conversion code yourself.

Please note that I'm not saying a single-path-type design is dumb or anything like that. It is just that it would be too big a leap without a lot of experimentation, trial use, etc. It would be a lot better to start with a single-string-type design. That's all just too big a project for me, and too much of a research project.

I'm very happy with the new version of Boost.Filesystem. I think it smooths many of the rough spots of the current 1.33 version. It attacks most of the problems users have had head-on. If someone else wants to do a new library that is even better, great! But that's a new library, not the current one.

Thanks for the comments,
--Beman

Vladimir Prus

3 Jun

12:13 p.m.

On Thursday 19 May 2005 01:26, Beman Dawes wrote:

...

...
...
the operating system would be performed, and that the technical complexity was too great in relation to perceived benefits. User-defined types would only make the problem worse.

I think this statement is not proved. Essentially, you are saying that there's an operating system that performs some char->wchar and wchar->char conversions in path operations, but does not provide any API to do the same conversion on plain char* and wchar_t* pointers. I find this somewhat hard to believe.

Windows, for one.

Could you be more specific? Which transformation done by the filesystem can't be approximated with the call to MultiByteToWideChar or WideCharToMultiByte?

...

Although that is really beside the point. The worry is the need for conversions when a path changes from wide to narrow, or vice versa.

Why is it a worry?

...

...
I disagree. Consider that your current design does not allow to mix different path types at all. So, we should evaluate the performance of single path design only for the case where char/wchar_t are never fixed [mixed?] -- that is all

Yes, "mixed".

...

...
paths are created either from char, or from wchar_t.

Then, the memory overhead is a single bool flag, telling if a path was created from char or wchar_t.

The memory overhead I was worried about wasn't user space for the bool, but the need to link in both narrow and wide versions of functions, particularly on low memory embedded systems.

Do you have the specifics? What OS/hardware do you have in mind? IIRC, Windows converts all user-provided paths into internal representation anyway. And doesn't Java, which uses a single string type, work on such low-memory devices as mobile phones?

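...

I don't think we should stick to std::string design, given that most environments with good Unicode support (Qt, Java, .Net) use a single string type.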

A lot of people say they don't like the std::string design, but it is the standard for C++. Perhaps someday another string design will become popular, but that isn't even on the horizon AFAIKS.

And if boost::path is accepted into the standard as a templated class, then any new string class will have even fewer chances. "Look, the path class is also templated", everybody will say.

...

...
...
...
Also I note that there's no conversion from basic_path<char> to basic_path<wchar_t> or vice versa, as far as I can say. To recall my argument for conversion: say I have a library which exposes paths in the interface, should I use path or wpath in it? If I use path, then due to missing conversion, the library is unusable with other code that uses wpath. So I need to use wpath.

Yes. It is the same situation as with std::string vs std::wstring. If you think your app may sometimes have to deal correctly with wide strings (or paths) you should use std::wstring (and wpath).

I keep on making the same argument over and over, but you don't hear it. If I'm writing a library, I have no idea what kind of string the applications will pass to the library. And BTW, what if application's requirements change over time?

...

...
Even if I provide both types in the interface, if there's no standard path<->wpath conversion, I'll have to either:

- write such conversion myself, or
- duplicate all code of the library -- for path and for wpath

Partially in answer to this very valid concern, I've exposed the wpath_traits conversion interface. I'm not sure that is a complete solution, but at least you wouldn't have to write the conversion code yourself.

Well, I don't understand how to use it. Can you sketch the code to convert path to wpath and vice versa? The wpath_traits code seems to deal with strings only.

...

Please note that I'm not saying a single-path-type design is dumb or anything like that. It is just that it would be too big a leap without a lot of experimentation, trial use, etc.

Then, probably it's too early to standardize boost::path. After it's in, it won't be possible to add yet another path type. - Volodya

Beman Dawes

9 Jun

2:48 a.m.

"Vladimir Prus" <ghost@cs.msu.su> wrote in message news:200506031613.04544.ghost@cs.msu.su...

...

On Thursday 19 May 2005 01:26, Beman Dawes wrote:

...
... Essentially, you are saying that there's an operating system that performs some char->wchar and wchar->char conversions in path operations, but does not provide any API to do the same conversion on plain char* and wchar_t* pointers. I find this somewhat hard to believe.

Windows, for one.

Could you be more specific? Which transformation done by the filesystem can't be approximated with the call to MultiByteToWideChar or WideCharToMultiByte?

It is the "approximated" that is the concern. An approximately correct path isn't good enough. Users expect exact rather than approximate behavior.

...

...
Although that is really beside the point. The worry is the need for conversions when a path changes from wide to narrow, or vice versa.

Why is it a worry?

Some conversions are lossy. That is, they are not fully reversible. So an unnecessary round trip can lose data.
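The lossy-round-trip concern can be illustrated with a deliberately simple narrow encoding. In this hypothetical sketch the narrow side is ISO-8859-1 (Latin-1), an assumption chosen because its mapping to Unicode is trivial; characters with no Latin-1 representation are replaced by '?', similar to the default-character substitution some real conversion APIs perform.

```cpp
#include <cassert>
#include <string>

// Assumption for the example: the narrow encoding is ISO-8859-1 (Latin-1),
// whose code points map one-to-one onto U+0000..U+00FF.

// Wide -> narrow. Characters outside Latin-1 cannot be represented and are
// replaced by '?', so information is lost.
std::string narrow_latin1(const std::wstring& w) {
    std::string out;
    for (wchar_t c : w)
        out += (static_cast<unsigned long>(c) <= 0xFF) ? static_cast<char>(c) : '?';
    return out;
}

// Narrow -> wide. Always exact for Latin-1 input.
std::wstring widen_latin1(const std::string& s) {
    std::wstring out;
    for (char c : s)
        out += static_cast<wchar_t>(static_cast<unsigned char>(c));
    return out;
}
```

After a wide-to-narrow-to-wide round trip, a name containing U+0436 (Cyrillic small letter zhe) no longer names the same file, while a pure Latin-1 name such as L"caf\u00e9" comes back unchanged.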

...

...

...
...
Then, the memory overhead is a single bool flag, telling if a path was created from char or wchar_t.

The memory overhead I was worried about wasn't user space for the bool, but the need to link in both narrow and wide versions of functions, particularly on low memory embedded systems.

Do you have the specifics? What OS/hardware do you have in mind? IIRC, Windows converts all user-provided paths into internal representation anyway. And doesn't Java, which uses a single string type, work on such low-memory devices as mobile phones?

Phones that support Java are hardly "low-memory" devices. Regardless, as the cost of memory declines, what constitutes undue memory use changes. So you are correct to point out that this is a minor and declining concern.

...

...
A lot of people say they don't like the std::string design, but it is the standard for C++. Perhaps someday another string design will become popular, but that isn't even on the horizon AFAIKS.

And if boost::path is accepted into the standard as a templated class, then any new string class will have even fewer chances. "Look, the path class is also templated", everybody will say.

There is a proposal outstanding to add std::string overloads to many of the existing library classes. It isn't just the filesystem library that is further entrenching std::basic_string in the standard. If you want to argue for a runtime polymorphic string class that can change its internal representation as needed, that's fine. But the way to prove the point is to develop and popularize such a class. Then a library like Boost.Filesystem would have a base to build on. But it doesn't seem to me that Boost.Filesystem is the place to do experiments with dynamically self-configuring strings.

...

...
...
...
...
Also I note that there's no conversion from basic_path<char> to basic_path<wchar_t> or vice versa, as far as I can say. To recall my argument for conversion: say I have a library which exposes paths in the interface, should I use path or wpath in it? If I use path, then due to missing conversion, the library is unusable with other code that uses wpath. So I need to use wpath.

Yes. It is the same situation as with std::string vs std::wstring. If you think your app may sometimes have to deal correctly with wide strings (or paths) you should use std::wstring (and wpath).

I keep on making the same argument over and over, but you don't hear it. If I'm writing a library, I have no idea what kind of string the applications will pass to the library. And BTW, what if application's requirements change over time?

I do very much hear that argument, and find it a very strong argument indeed. But I don't see that as a problem to be solved at the level of Boost.Filesystem. Rather, what's needed is a replacement for std::basic_string that offers runtime polymorphic self-configuration and interoperability. That is the place to start development IMO. Not in Boost.Filesystem. Boost.Filesystem is an innocent bystander that simply uses the std::basic_string compile-time polymorphism because that is the current standard.

...

...
...
Even if I provide both types in the interface, if there's no standard path<->wpath conversion, I'll have to either:

- write such conversion myself, or
- duplicate all code of the library -- for path and for wpath

Partially in answer to this very valid concern, I've exposed the wpath_traits conversion interface. I'm not sure that is a complete solution, but at least you wouldn't have to write the conversion code yourself.

Well, I don't understand how to use it. Can you sketch the code to convert path to wpath and vice versa? The wpath_traits code seems to deal with strings only.

There is already an example of a user defined path based on strings of longs (which was a real-world example mentioned by someone in the LWG). I'll do another example, where the string type is std::wstring, but the external encoding is user supplied.
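The promised kind of example, a wide internal string with a user-supplied external encoding, might look roughly like the following. This is a hypothetical sketch whose names and signatures are invented; it is not the wpath_traits interface in the i18n branch. The encoding is a policy class, and ascii_encoding is an illustrative choice that throws instead of substituting characters.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical sketch; names and signatures are invented for illustration and
// are NOT Boost.Filesystem's wpath_traits. Internal strings are std::wstring;
// the external (OS-facing) encoding is supplied by the user as a policy class.
template <class Encoding>
struct user_encoded_wpath_traits {
    typedef std::wstring internal_string_type;
    typedef std::string  external_string_type;

    // Internal -> external, e.g. before passing a path to the OS.
    static external_string_type to_external(const internal_string_type& s)
    { return Encoding::encode(s); }

    // External -> internal, e.g. for names returned by directory iteration.
    static internal_string_type to_internal(const external_string_type& s)
    { return Encoding::decode(s); }
};

// Example user-supplied encoding policy (assumption): 7-bit ASCII only,
// throwing rather than silently substituting characters.
struct ascii_encoding {
    static std::string encode(const std::wstring& w) {
        std::string out;
        for (wchar_t c : w) {
            if (static_cast<unsigned long>(c) > 0x7F)
                throw std::range_error("character not representable in ASCII");
            out += static_cast<char>(c);
        }
        return out;
    }
    static std::wstring decode(const std::string& s) {
        std::wstring out;
        for (char c : s) {
            if (static_cast<unsigned char>(c) > 0x7F)
                throw std::range_error("byte is not ASCII");
            out += static_cast<wchar_t>(c);
        }
        return out;
    }
};
```

Converting a narrow path string to its wide form is then a single to_internal call, which is the kind of path-to-wpath helper Vladimir asked for; swapping the policy class changes the external encoding without touching the path code.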

...

...
Please note that I'm not saying a single-path-type design is dumb or anything like that. It is just that it would be too big a leap without a lot of experimentation, trial use, etc.

Then, probably it's too early to standardize boost::path. After it's in, it won't be possible to add yet another path type.

People sometimes argue that it is premature to standardize some library component because a better one is just around the corner. The shared_ptr proposal had to fight that battle, for example. The LWG evaluates each proposal on its merits, as perceived at the time the proposal is considered. Once in a very great while, a proposal comes along that is good enough to cause removal of some component that was already accepted. The STL proposal was good enough to justify the removal of a now-forgotten dynamic vector called dynarray that was already voted in. If a much better string class comes along, well, then a filesystem library will be one of many libraries that will have to adapt to accommodate it.

--Beman

Vladimir Prus

23 Jun

2:28 p.m.

On Thursday 09 June 2005 06:48, Beman Dawes wrote:

...

...
...
Windows, for one.

Could you be more specific? Which transformation done by the filesystem can't be approximated with the call to MultiByteToWideChar or WideCharToMultiByte?

It is the "approximated" that is the concern. An approximately correct path isn't good enough. Users expect exact rather than approximate behavior.

Well, "approximated" is the wrong word. I meant: which transformation done by the filesystem cannot be done by an appropriate API function with appropriate flags? And, *please*, give some specifics. A rationale that says "on some operating system some transformation can't be done by the user-level API" is too vague, IMO.

...

...
...
Although that is really beside the point. The worry is the need for conversions when a path changes from wide to narrow, or vice versa.

Why is it a worry?

Some conversions are lossy. That is, they are not fully reversible. So an unnecessary round trip can lose data.

Again, specifics? Note also that in a single-path design you need conversion only when mixing different path types. It won't happen on initialization, for example.
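As a deliberately simple, hypothetical illustration of Beman's point (not code from either library): narrowing to an 8-bit encoding such as Latin-1 must do something with characters it cannot represent, and substituting a placeholder such as '?' is a common strategy. Once that happens, widening again cannot recover the original name. The functions narrow_latin1, widen_latin1 and survives_round_trip are illustrative names only.

```cpp
#include <string>

// Hypothetical lossy narrowing: characters outside Latin-1 (code point > 0xFF)
// are replaced with '?'; everything else maps to the byte of the same value.
std::string narrow_latin1(const std::wstring& w) {
    std::string out;
    out.reserve(w.size());
    for (wchar_t c : w) {
        unsigned long cp = static_cast<unsigned long>(c);
        out += (cp <= 0xFF) ? static_cast<char>(cp) : '?';
    }
    return out;
}

// Widening Latin-1 is lossless: every byte maps to the same code point.
std::wstring widen_latin1(const std::string& s) {
    std::wstring out;
    out.reserve(s.size());
    for (char c : s)
        out += static_cast<wchar_t>(static_cast<unsigned char>(c));
    return out;
}

// True if wide -> narrow -> wide gives back exactly the original string.
bool survives_round_trip(const std::wstring& w) {
    return widen_latin1(narrow_latin1(w)) == w;
}
```

A name such as L"caf\u00e9.txt" survives the round trip, but a Cyrillic name comes back with '?' in place of each Cyrillic letter, so it no longer names the same file.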

...

...
I keep on making the same argument over and over, but you don't hear it. If I'm writing a library, I have no idea what kind of string the applications will pass to the library. And BTW, what if the application's requirements change over time?

I do very much hear that argument, and find it a very strong argument indeed. But I don't see that as a problem to be solved at the level of Boost.Filesystem. Rather, it calls for a replacement for std::basic_string that offers runtime polymorphic self-configuration and interoperability. That is the place to start development IMO. Not in Boost.Filesystem. Boost.Filesystem is an innocent bystander that simply uses the std::basic_string compile-time polymorphism because that is the current standard.

OK, I understand your position. The problem is that boost::filesystem is something we can discuss, while I see the chances of getting a new string class into the standard as zero. In fact, there is a single-string design, QString (http://doc.trolltech.com/4.0/qstring.html), and IMO it beats std::string + lexical_cast + boost::format, and is rather popular. But that does not help, because boost::filesystem can't use a non-standard class, and the standard string is not going to change.
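To make the compile-time polymorphism Beman refers to concrete, here is a small hypothetical library function written once as a template over the string type, so it accepts std::string, std::wstring, or another std::basic_string specialization. It also shows the cost Vladimir points out: a library must either be written as templates (typically in headers) or commit to one string type. The name leaf_of is illustrative, not a Boost.Filesystem function.

```cpp
#include <string>

// Hypothetical helper: return the part of a '/'-separated path after the last
// separator. The character type is taken from the string type itself.
template <class String>
String leaf_of(const String& p) {
    using Char = typename String::value_type;
    typename String::size_type pos = p.find_last_of(static_cast<Char>('/'));
    return pos == String::npos ? p : p.substr(pos + 1);
}
```

The same source serves narrow and wide callers, but each string type gets its own instantiation; nothing lets a single compiled function accept either kind of string at run time.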

...

...
Well, I don't understand how to use it. Can you sketch the code to convert path to wpath and vice versa? The wpath_traits code seems to deal with strings only.

There is already an example of a user defined path based on strings of longs (which was a real-world example mentioned by someone in the LWG).

I'll do another example, where the string type is std::wstring, but the external encoding is user supplied.

Ok.

...

...
...
Please note that I'm not saying a single-path-type design is dumb or anything like that. It is just that it would be too big a leap without a lot of experimentation, trial use, etc.

Then, probably it's too early to standardize boost::path. After it's in, it won't be possible to add yet another path type.

People sometimes argue that it is premature to standardize some library component because a better one is just around the corner. The shared_ptr proposal had to fight that battle, for example. The LWG evaluates each proposal on its merits, as perceived at the time the proposal is considered. Once in a very great while, a proposal comes along that is good enough to cause removal of some component that was already accepted. The STL proposal was good enough to justify the removal of a now-forgotten dynamic vector called dynarray that was already voted in. If a much better string class comes along, well, then a filesystem library will be one of many libraries that will have to adapt to accommodate it.

I can only hope it's really possible. - Volodya

Beman Dawes

10:38 p.m.

"Vladimir Prus" <ghost@cs.msu.su> wrote in message news:200506231828.03375.ghost@cs.msu.su...

...

On Thursday 09 June 2005 06:48, Beman Dawes wrote:

...

...
I'll do another example, where the string type is std::wstring, but the external encoding is user supplied.

Ok.

That example is now committed on the "i18n" branch. On Monday I plan to post a copy somewhere accessible, and ask for a mini-review of all the i18n changes. --Beman

Beman Dawes

18 May

8:53 p.m.

"Vladimir Prus" <ghost@cs.msu.su> wrote in message news:200503311322.33735.ghost@cs.msu.su...

...

On Wednesday 23 March 2005 18:07, Beman Dawes wrote:

...
CVS now contains a branch "i18n" of the filesystem directories:

...

* The POSIX wpath implementation assumes that UTF-8 is always the operating system's preferred external path encoding. If any Boost users are concerned about other encodings, please let me know.

I certainly do. The standard encoding for Russian on Linux is KOI8-R. Probably we need to use the conversion facet that's part of the global locale. Qt uses

char *charset = nl_langinfo (CODESET);

and the values of the LC_ALL, LC_CTYPE and LANG variables. But then Qt contains its own translation tables. So using locale("") is the best guess, I think.

OK, I've added wpath_traits::imbue() functions so the user can control the locale to get the conversion facet. If the user doesn't call imbue(), the default will be the global locale at the time of first use of a wpath conversion. But see below.
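The behaviour Beman describes (a user-replaceable conversion locale that otherwise defaults to the global locale in effect at first use) can be sketched as follows. This is a hypothetical illustration, not Boost's actual wpath_traits code; conversion_locale is an invented name. It relies on two standard facts: a function-local static is initialised on the first call, and a default-constructed std::locale is a copy of the current global locale.

```cpp
#include <locale>

// Hypothetical sketch of a process-wide locale used for path conversions.
class conversion_locale {
public:
    // Replace the conversion locale and return the previous one.
    // Not synchronised: call it before other threads start converting.
    static std::locale imbue(const std::locale& loc) {
        std::locale old = storage();
        storage() = loc;
        return old;
    }

    // The locale whose codecvt facet a conversion would use.
    static std::locale getloc() { return storage(); }

private:
    static std::locale& storage() {
        // Initialised on first use by copying the global locale in effect then.
        static std::locale loc;
        return loc;
    }
};
```

A program that wants the environment's encoding would call conversion_locale::imbue(std::locale("")) early; one that never calls imbue() gets whatever the global locale was when the first conversion happened.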

...

So using locale("") is the best guess, I think.

After reading the C++, C, and POSIX standards, I would say you are right. But wide_test got a conversion failure from the codecvt facet on the Mac. Ditto for the global locale. I haven't tested on POSIX yet. (See aside, below.)

I gave up and changed wide_test to imbue() a locale with the Boost UTF-8 codecvt facet. That allows testing to produce uniform results on all platforms, so it is OK for testing, and it also tests the imbue mechanism itself.

--Beman

<aside>
I've ordered a KVM switch (with attached cables, only $28) so I won't have to crawl around on the floor every time I want to switch between Mac and Linux test machines. Couldn't believe how cheap they have gotten. Ditto dual-monitor video cards and 17" LCD monitors: 17" LCD down to $209 US at Dell, including shipping! If you aren't running a dual-monitor rig, you are missing a lot of programming productivity, IMO. Run your IDE or editor on one monitor, reference material on the other. Wonderful setup.
</aside>

Vladimir Prus

24 Jun

9:55 a.m.

On Thursday 19 May 2005 00:53, Beman Dawes wrote:

...

...
and the values of the LC_ALL, LC_CTYPE and LANG variables. But then Qt contains its own translation tables. So using locale("") is the best guess, I think.

OK, I've added wpath_traits::imbue() functions so the user can control the locale to get the conversion facet. If the user doesn't call imbue(), the default will be the global locale at the time of first use of a wpath conversion. But see below.

Good.

...

...
So using locale("") is the best guess, I think.

After reading the C++, C, and POSIX standards, I would say you are right. But wide_test got a conversion failure from the codecvt facet on the Mac. Ditto for the global locale. I haven't tested on POSIX yet.

Looking at the test, it creates various files with various funny names. So, if locale("") is not Unicode, you'll indeed get conversion failures. Or am I missing something? - Volodya
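Vladimir's point can be checked directly on a POSIX system: if the environment's codeset is not a Unicode encoding, names containing characters outside it cannot be converted. A minimal sketch, assuming a POSIX platform (nl_langinfo and CODESET come from <langinfo.h>, the same call Vladimir quotes from Qt); codeset_of is a hypothetical name.

```cpp
#include <clocale>
#include <string>
#include <langinfo.h>  // POSIX, not standard C++: nl_langinfo, CODESET

// Hypothetical helper: report the character encoding (codeset) of the named
// C locale. Passing "" selects the locale from the environment (LC_ALL,
// LC_CTYPE, LANG), the C-level counterpart of std::locale("").
// Returns an empty string if that locale cannot be set.
// Note: setlocale changes process-wide state and is not thread-safe.
std::string codeset_of(const char* locale_name) {
    // Save the current locale so it can be restored afterwards.
    const char* current = std::setlocale(LC_ALL, nullptr);
    const std::string saved = current ? current : "C";

    std::string codeset;
    if (std::setlocale(LC_ALL, locale_name) != nullptr) {
        const char* cs = nl_langinfo(CODESET);
        if (cs != nullptr) codeset = cs;
    }
    std::setlocale(LC_ALL, saved.c_str());
    return codeset;
}
```

On a typical Linux desktop codeset_of("") returns something like "UTF-8", or "KOI8-R" for the Russian setup Vladimir describes; the exact spelling of codeset names varies between C libraries.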

Beman Dawes

26 Jun

11:42 p.m.

"Vladimir Prus" <ghost@cs.msu.su> wrote in message news:200506241355.50000.ghost@cs.msu.su...

...

On Thursday 19 May 2005 00:53, Beman Dawes wrote:

...
...
So using locale("") is the best guess, I think.

After reading the C++, C, and POSIX standards, I would say you are right. But wide_test got a conversion failure from the codecvt facet on the Mac. Ditto for the global locale. I haven't tested on POSIX yet.

Looking at the test, it creates various files with various funny names. So, if locale("") is not Unicode, you'll indeed get conversion failures. Or am I missing something?

I eventually added the ability to imbue the locale, so if locale("") isn't satisfactory, the user has an out. The wide_test program was changed to always imbue a locale which uses the Boost UTF-8 codecvt facet. That cleared the problems with tests on the Mac. --Beman
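The fix Beman describes for wide_test can be sketched with standard C++ alone: build a locale whose wchar_t/char conversion facet is a UTF-8 converter regardless of the environment, then convert through whatever facet a locale provides. The sketch uses std::codecvt_utf8 (C++11, deprecated since C++17) as a stand-in for the Boost UTF-8 facet, whose interface is not shown in the thread; make_utf8_locale and encode are hypothetical names.

```cpp
#include <algorithm>
#include <codecvt>   // std::codecvt_utf8: C++11, deprecated since C++17
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: the classic locale with its wchar_t <-> char facet
// replaced by a UTF-8 converter. The locale takes ownership of the facet.
std::locale make_utf8_locale() {
    return std::locale(std::locale::classic(), new std::codecvt_utf8<wchar_t>);
}

// Hypothetical helper: encode a wide string with loc's codecvt facet.
std::string encode(const std::wstring& w, const std::locale& loc) {
    using cvt_t = std::codecvt<wchar_t, char, std::mbstate_t>;
    const cvt_t& cvt = std::use_facet<cvt_t>(loc);
    std::mbstate_t state{};
    std::string out(w.size() * std::max(cvt.max_length(), 1) + 1, '\0');
    const wchar_t* from_next = nullptr;
    char* to_next = nullptr;
    if (cvt.out(state, w.data(), w.data() + w.size(), from_next,
                &out[0], &out[0] + out.size(), to_next) != std::codecvt_base::ok)
        throw std::runtime_error("wide-to-narrow conversion failed");
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

Because the facet is fixed rather than taken from locale(""), a test built this way produces the same bytes on every platform, which is the uniformity Beman wanted for wide_test.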

12 comments

participants (3)

Beman Dawes
Dan Rosen
Vladimir Prus