
At 07:13 AM 11/15/2004, Peter Dimov wrote:
Peter Dimov wrote:
Choosing the wrong native character type causes redundant roundtrip conversions, one in Boost.Filesystem, one in the OS.
Let me expand on that a little.
It is _fundamentally wrong_ to assume that all present and future OS APIs
have a single native character type.
The actual wording of PJP's paper was that for paths (not the entire OS API), one type could be considered "fundamental".
Consider a case where a dual API OS has access to two logical volumes C: and D:, where the file system on C: stores the filenames as 16 bit UTF-16, and the file system on D: uses narrow characters.
That happens all the time on Windows. Often the A: drive is a narrow character FAT filesystem.
Now the behavior of the calls is as follows:
CreateFileA( "C:/foo.txt" );  // char -> wchar_t OS conversion
CreateFileW( L"C:/foo.txt" ); // no OS conversion
CreateFileA( "D:/foo.txt" );  // no OS conversion
CreateFileW( L"D:/foo.txt" ); // wchar_t -> char OS conversion
Yes, that's my understanding too.
Furthermore, consider a typical scenario where the application has its own "native" character type, app_char_t. In a design that enforces a single "native" character type boost_fs_char_t ("native" is a deceptive term due to the above scenario), there are potentially redundant (and not necessarily preserving) conversions from app_char_t to boost_fs_char_t and then from boost_fs_char_t to the filesystem character type.
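To make the "not necessarily preserving" point concrete, here is a minimal sketch of a lossy round trip. The helpers to_narrow_lossy, to_wide and round_trips are hypothetical names invented for this illustration, and the narrowing rule (anything outside 7-bit ASCII becomes '?') is only an assumed stand-in for a code page that lacks the character. It is not what any particular OS or Boost.Filesystem does.

```cpp
#include <cassert>
#include <string>

namespace conv_sketch {

// Hypothetical narrowing step: characters outside 7-bit ASCII are replaced
// by '?', the way a code page without that character might behave.
inline std::string to_narrow_lossy(const std::wstring& w) {
    std::string out;
    for (wchar_t c : w) {
        bool ascii = static_cast<unsigned long>(c) < 0x80UL;
        out.push_back(ascii ? static_cast<char>(c) : '?');
    }
    return out;
}

// Hypothetical widening step: each narrow character maps to the same value.
inline std::wstring to_wide(const std::string& s) {
    std::wstring out;
    for (unsigned char c : s) {
        out.push_back(static_cast<wchar_t>(c));
    }
    return out;
}

// True only if wide -> narrow -> wide gives back the original name.
inline bool round_trips(const std::wstring& w) {
    return to_wide(to_narrow_lossy(w)) == w;
}

} // namespace conv_sketch
```

Under these assumptions a plain ASCII name survives the round trip, while a name containing a character the narrow encoding cannot represent does not, so an extra conversion layer can change which file is being named.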
Yes. Note that even if a dual scheme is used, that same situation might arise:

   if ( fs::exists( "c:foo" ) ) ...
   if ( fs::exists( L"d:foo" ) ) ...

Notice that a narrow character path was given for the wide-character filesystem and a wide character path given for the narrow-character file system. If the type of the user supplied path is what determines the API to use, the O/S may still have to do conversions when there is a mismatch with the file system. Do you see any alternative?

If the library queried the O/S about the path (which I'm not sure is always possible) to see if the filesystem was wide or narrow, a conversion would still have to be done if the user supplied path used the other char type. That saves nothing and adds the cost of the query.
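A rough sketch of what "the type of the user supplied path determines the API" could look like follows. Everything in it is an assumption made for illustration: os_exists_narrow and os_exists_wide are hypothetical stand-ins for a dual-API OS (in the spirit of CreateFileA and CreateFileW), and they only report which entry point was reached. It is not the Boost.Filesystem interface.

```cpp
#include <cassert>
#include <string>

namespace dual_sketch {

// Records which hypothetical OS entry point handled the call.
enum class api_used { narrow, wide };

// Hypothetical stand-ins for the narrow and wide variants of an OS call.
inline api_used os_exists_narrow(const char*)  { return api_used::narrow; }
inline api_used os_exists_wide(const wchar_t*) { return api_used::wide; }

// Overloads selected at compile time by the caller's character type.
// The library layer passes the characters through without converting them;
// any conversion needed to match the volume is left to the OS.
inline api_used exists(const std::string& p)  { return os_exists_narrow(p.c_str()); }
inline api_used exists(const std::wstring& p) { return os_exists_wide(p.c_str()); }

} // namespace dual_sketch
```

With this shape, exists( "c:foo" ) reaches the narrow entry point and exists( L"d:foo" ) reaches the wide one, regardless of how the target volume stores its names.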
In my opinion, the Boost filesystem library should pass the application characters _exactly as-is_ to the underlying OS API, whenever possible. It should not impose its own "native character" ideas upon the user nor upon
the OS.
Your strongest argument IMO is the point about conversions not necessarily being value preserving. (I guess we could tell Windows users that they should not expect such conversions to work unless supported by the applicable codepage. But that seems spin rather than a real solution.)

The efficiency argument is certainly real, but I don't see it as being quite as strong. (It will be important for some users, however. Think of very small or embedded systems.)

If the rule is that there is some type (char or wchar_t) associated with each path, and the library will always use the native API of that type if available, then it seems to me that the arguments in favor of a single path class weaken considerably. Sure the library can keep track at runtime of whether a particular path is wide or narrow, but it is much more normal in C++ to distinguish at compile time. In other words, separate path and wpath classes.

In discussion on the C++ committee's library reflector, there wasn't demand for a templatized basic_path type. AFAICS, a templatized basic_path type could be added later if demand arose.

--Beman
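For readers weighing the compile-time option, here is one possible sketch of how separate path and wpath types and a templatized basic_path could coexist. The member names (value_type, string_type, string()) are assumptions chosen for the illustration, not a proposed interface.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

namespace path_sketch {

// Hypothetical character-parameterized path; the character type is fixed
// at compile time rather than tracked at runtime.
template <class CharT>
class basic_path {
public:
    typedef CharT value_type;
    typedef std::basic_string<CharT> string_type;

    basic_path(const CharT* s) : str_(s) {}

    // Returns the stored characters exactly as the caller supplied them.
    const string_type& string() const { return str_; }

private:
    string_type str_;
};

// Two distinct types, so narrow and wide paths cannot be mixed by accident.
typedef basic_path<char>    path;
typedef basic_path<wchar_t> wpath;

} // namespace path_sketch
```

Because path and wpath are different types here, overloads can choose the narrow or wide OS call at compile time, and user code that only ever names path and wpath would not need to know about the template.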