
Recently, I needed to fix a program so that it works properly with paths that cannot be represented by narrow strings under Windows. The problematic filenames came from the user via drag and drop.

I tried to go the wstring route, the same approach that the filesystem library takes, but it was harder than I thought. Many parts of the code base assumed narrow paths. In the end I reverted the changes and just encoded the wide path into UTF-8 at the very start, passed the UTF-8 string through the existing code, then decoded the UTF-8 into a wstring at the very end, immediately before calling the Windows API. It worked.

What does this mean in the [filesystem] context? Basically, instead of:

    template<class String, class Traits> class basic_path;

I used something similar to:

    class path
    {
    private:

        string data_; // UTF-8, exposition only

    public:

        path( wstring const & s );
        path( string const & s, encoding_type encoding = system_default );

        wstring to_wstring() const;
        string to_string( encoding_type encoding = system_default ) const;
    };

--
Peter Dimov
http://www.pdimov.com
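As an illustration of the encode-at-the-start, decode-at-the-end idea described above, here is a minimal portable sketch of the two conversions. The function names wide_to_utf8, utf8_to_wide and append_utf8 are hypothetical and do not come from the original post; on Windows the API functions WideCharToMultiByte and MultiByteToWideChar with CP_UTF8 would probably be the more natural choice. The sketch assumes well-formed input and does no validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one Unicode code point to a UTF-8 string (1 to 4 bytes).
inline void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Wide string -> UTF-8. Surrogate pairs (as found in 16-bit wchar_t
// strings on Windows) are combined into a single code point.
inline std::string wide_to_utf8(const std::wstring& w) {
    std::string out;
    for (std::size_t i = 0; i < w.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(w[i]);
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < w.size()) {
            std::uint32_t lo = static_cast<std::uint32_t>(w[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i; // the low surrogate has been consumed
            }
        }
        append_utf8(out, cp);
    }
    return out;
}

// UTF-8 -> wide string. Assumes well-formed UTF-8 (no validation).
inline std::wstring utf8_to_wide(const std::string& s) {
    std::wstring out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i++]);
        std::uint32_t cp;
        int extra; // number of continuation bytes that follow
        if (b < 0x80)      { cp = b;        extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else               { cp = b & 0x07; extra = 3; }
        for (; extra > 0 && i < s.size(); --extra, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        if (sizeof(wchar_t) == 2 && cp >= 0x10000) {
            // 16-bit wchar_t: emit a surrogate pair
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}
```

With helpers like these, the wide filename received from drag and drop would be passed through wide_to_utf8 once on entry, carried through the existing narrow-string code unchanged, and passed through utf8_to_wide just before the wide Windows API call.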

"Peter Dimov" <pdimov@mmltd.net> wrote in message news:00af01c6334a$b9ed2200$6407a8c0@pdimov2...
Seems like a reasonable and practical approach. I've wondered several times if we wouldn't have been better off if Microsoft had chosen UTF-8 as their Windows external representation, too.

--Beman

Beman Dawes wrote:
I don't think that they could have done that because of legacy FAT filesystems that could have been using narrow paths with an arbitrary encoding.

But my point is that the library can use UTF-8 as its _internal portable encoding_, encoding into UTF-8 when it is given a path as a wstring or a (string, encoding) pair, and decoding into the appropriate (string, encoding) or wstring when it passes a path to the OS. Everything else can be string-based.

With this approach, we can have a single path class that handles everything. No need to choose between a narrow path and a wide path, and no need to encode the character encoding into the path type.

I've tried to communicate this via code, apparently with mixed success. :-)
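To make the "(string, encoding) pair" part of this concrete, here is a small sketch of a single path class that keeps UTF-8 internally and converts only at its boundaries. It is an illustration under stated assumptions, not the proposed library interface: the enumerators utf8 and latin1 are hypothetical stand-ins for the system_default of the earlier message (Latin-1 was picked only because converting it needs no OS support), operator== is added for the example, and the wstring overloads are left out to keep it short.

```cpp
#include <cstddef>
#include <string>

// Hypothetical encodings, used here in place of system_default.
enum class encoding_type { utf8, latin1 };

class path {
    std::string data_; // always UTF-8, whatever the input encoding was

public:
    // Encode into UTF-8 on the way in.
    path(const std::string& s, encoding_type encoding = encoding_type::utf8) {
        if (encoding == encoding_type::utf8) {
            data_ = s;
            return;
        }
        // Latin-1: every byte is the code point U+0000..U+00FF itself.
        for (unsigned char c : s) {
            if (c < 0x80) {
                data_ += static_cast<char>(c);
            } else {
                data_ += static_cast<char>(0xC0 | (c >> 6));
                data_ += static_cast<char>(0x80 | (c & 0x3F));
            }
        }
    }

    // Decode from UTF-8 on the way out.
    std::string to_string(encoding_type encoding = encoding_type::utf8) const {
        if (encoding == encoding_type::utf8) return data_;
        std::string out;
        for (std::size_t i = 0; i < data_.size(); ++i) {
            unsigned char b = static_cast<unsigned char>(data_[i]);
            if (b < 0x80) {
                out += static_cast<char>(b);
            } else if ((b == 0xC2 || b == 0xC3) && i + 1 < data_.size()) {
                // Two-byte sequence for U+0080..U+00FF: fits in Latin-1.
                unsigned char next = static_cast<unsigned char>(data_[++i]);
                out += static_cast<char>(((b & 0x03) << 6) | (next & 0x3F));
            } else {
                // Not representable in Latin-1: substitute and skip
                // the continuation bytes of this character.
                out += '?';
                while (i + 1 < data_.size() &&
                       (static_cast<unsigned char>(data_[i + 1]) & 0xC0) == 0x80)
                    ++i;
            }
        }
        return out;
    }

    friend bool operator==(const path& a, const path& b) {
        return a.data_ == b.data_;
    }
};
```

In this sketch two paths built from differently encoded inputs have the same type and compare equal when they name the same characters, which is the sense in which the encoding does not need to be part of the path type.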
participants (3): Beman Dawes, Peter Dimov, Thorsten Ottosen