
Dave Abrahams wrote:
*Scenario D:* We try for scenario A, and people still use QStrings, wxStrings, etc.
*Scenario E:* We add another string class and everyone adopts it.
The problem with using a Unicode string, be it QString or utf8_string, to represent paths is that it forces you to pick an encoding under POSIX. When the OS gives you a file name as char*, you have to interpret it in order to store it in your Unicode string. Then, to give it back to the OS, you have to de-interpret it.

This forces you to choose between two evils. You can opt for a single-byte encoding such as ISO-8859-1, which gives you a perfect round trip, but then people can enter a Cyrillic file name in your Unicode-enabled GUI and see something odd happen on disk, even when their shell is configured for UTF-8 and can show Cyrillic names. Or you can choose UTF-8, in which case the OS can give you a name that you can't decode properly, because it's invalid UTF-8.

There is no single good answer to this, of course; even if you go with my recommended approach of treating paths as byte sequences unless and until you need to display them (in which case you treat them as UTF-8), there will still be paths that won't show up properly on the screen. But the program will be able to work with them, even if they are undisplayable.

To give a simple example:

    int my_main( int ac, char const* av[] )
    {
        my_fopen( av[1] );
    }

Since files can have arbitrary byte sequences as names under POSIX (Mac OS X excluded), if my_fopen insists on taking valid UTF-8, it will refuse to open the file.
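
To make the byte-sequence approach concrete, here is a minimal sketch of what I mean. The names byte_path, is_valid_utf8 and open_for_reading are hypothetical and exist only for this illustration; they are not part of any proposed library. The idea is that the bytes received from the OS are stored and handed back untouched, and UTF-8 interpretation happens only when a name has to be displayed. Replacing undecodable bytes with '?' is just one possible display policy, chosen here because it is short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Returns true if `s` is well-formed UTF-8. Rejects overlong forms,
// surrogates, and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) { ++i; continue; }           // plain ASCII
        std::size_t len;
        unsigned long cp;
        if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                         // stray continuation or bad lead byte
        if (i + len > n) return false;             // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (len == 2 && cp < 0x80) return false;     // overlong
        if (len == 3 && cp < 0x800) return false;    // overlong
        if (len == 4 && cp < 0x10000) return false;  // overlong
        if (cp > 0x10FFFF) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        i += len;
    }
    return true;
}

// Hypothetical path holder: keeps the bytes exactly as the OS supplied them.
struct byte_path {
    std::string bytes;

    // What goes back to the OS: the untouched bytes.
    const char* native() const { return bytes.c_str(); }

    // What goes on the screen: the bytes themselves if they are valid UTF-8,
    // otherwise a lossy copy with every non-ASCII byte replaced by '?'.
    std::string display() const {
        if (is_valid_utf8(bytes)) return bytes;
        std::string out = bytes;
        for (char& ch : out)
            if (static_cast<unsigned char>(ch) >= 0x80) ch = '?';
        assert(out.size() == bytes.size());
        return out;
    }
};

// Opening a file never looks at the encoding, so an undisplayable
// name is still usable.
std::FILE* open_for_reading(const byte_path& p) {
    return std::fopen(p.native(), "rb");
}
```

With this arrangement, a name such as "caf\xE9.txt" (ISO-8859-1 bytes, invalid as UTF-8) shows up on screen as "caf?.txt", but open_for_reading still passes the original bytes to the OS, so the program can work with the file.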