Re: [boost] [general] What will string handling in C++ look like in the future [was Always treat ... ]

19 Jan 2011

      Dave Abrahams wrote:
...
*Scenario D:* We try for scenario A, and people still use QStrings, 
wxStrings, etc.
*Scenario E:* We add another string class and everyone adopts it.

The problem with using a Unicode string, be it QString or utf8_string, to 
represent paths is that it forces you to pick an encoding under POSIX. When 
the OS gives you a file name as char*, you have to interpret it to store it 
in your Unicode string. Then, to give it back to the OS, you have to 
de-interpret it. This forces you to choose between two evils. You can opt to 
use a single-byte encoding such as ISO-8859-1, which gives you a perfect 
round trip, but then people can enter a Cyrillic file name in your 
Unicode-enabled GUI and see something odd happen on disk, even when their 
shell is configured as UTF-8 and can show Cyrillic names. Or you can choose 
UTF-8, in which case the OS can give you a name that you can't decode 
properly, because it's invalid UTF-8.
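
To make the first of the two evils concrete, here is a rough sketch of the 
ISO-8859-1 option. The names decode_latin1 and encode_latin1 are made up for 
illustration and are not from QString or any other library. Every byte maps 
to the code point with the same value, so any name the OS hands you survives 
the round trip, but a code point above U+00FF (a Cyrillic letter, say) has 
no byte to map back to.

```cpp
#include <string>

// Hypothetical helper: interpret each byte as the ISO-8859-1 code point
// with the same numeric value. This can never fail.
std::u32string decode_latin1(const std::string& bytes)
{
    std::u32string text;
    for (unsigned char b : bytes)
        text.push_back(b);
    return text;
}

// Hypothetical helper: the inverse mapping. Returns false as soon as it
// meets a code point above U+00FF, which has no ISO-8859-1 byte.
bool encode_latin1(const std::u32string& text, std::string& bytes_out)
{
    bytes_out.clear();
    for (char32_t cp : text) {
        if (cp > 0xFF)
            return false;
        bytes_out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
    }
    return true;
}
```

What an application does when encode_latin1 fails is up to the application; 
the sketch only shows that the name typed by the user cannot reach the disk 
unchanged under this scheme.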

There is no single good answer to this, of course; even if you go with my 
recommended approach of treating paths as byte sequences unless and until 
you need to display them (in which case you treat them as UTF-8), there'll 
still be paths that won't show up properly on the screen. But the program 
will be able to work with them, even if they are undisplayable.
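
One way this might look in code, as a sketch only: the path stays in a 
std::string as raw bytes, and a separate display copy is produced on demand. 
The names utf8_sequence_length and display_name are invented for this 
example, and replacing each undecodable byte with U+FFFD is my assumption 
about what "won't show up properly" could mean in practice.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: length of the well-formed UTF-8 sequence starting
// at s[i], or 0 if the bytes there are not well-formed UTF-8.
std::size_t utf8_sequence_length(const std::string& s, std::size_t i)
{
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t len;
    unsigned long cp;
    if (lead < 0x80)                { len = 1; cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
    else return 0;                       // stray continuation or invalid lead byte
    if (i + len > s.size()) return 0;    // sequence is truncated
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char cont = static_cast<unsigned char>(s[i + k]);
        if ((cont & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (cont & 0x3F);
    }
    static const unsigned long smallest[] = { 0, 0, 0x80, 0x800, 0x10000 };
    if (cp < smallest[len]) return 0;              // overlong encoding
    if (cp > 0x10FFFF) return 0;                   // beyond Unicode range
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;    // surrogate
    return len;
}

// Hypothetical helper: build a UTF-8 string for the screen. Bytes that do
// not decode are shown as U+FFFD; the raw path itself is left untouched.
std::string display_name(const std::string& raw_path)
{
    std::string shown;
    std::size_t i = 0;
    while (i < raw_path.size()) {
        const std::size_t len = utf8_sequence_length(raw_path, i);
        if (len == 0) {
            shown += "\xEF\xBF\xBD";   // UTF-8 encoding of U+FFFD
            ++i;
        } else {
            shown.append(raw_path, i, len);
            i += len;
        }
    }
    return shown;
}
```

The program keeps passing raw_path to the OS, so a file with an undecodable 
name can still be opened, renamed, or deleted; only the string returned by 
display_name is lossy.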

To give a simple example:

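void my_fopen( char const* name ); // hypothetical function that opens the named file

int my_main( int ac, char const* av[] )
{
    my_fopen( av[1] ); // av[1] holds whatever bytes the OS supplied
    return 0;
}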

Since files can have nearly arbitrary byte sequences as names under POSIX 
(anything without '/' or NUL; Mac OS X excluded), a my_fopen that insists on 
taking valid UTF-8 will refuse to open any file whose name isn't valid UTF-8.

Peter Dimov