
Alexander Lamaison wrote:
On Fri, 14 Jan 2011 00:48:43 -0800 (PST), Artyom wrote: ... Two problems with this approach:
- Even if the encoding under POSIX platforms is not UTF-8, you will still be able to open files, close them, stat them, and perform any other operations regardless of encoding, because the POSIX API is encoding-agnostic; this is why it works well.
This isn't a problem, right? This is exactly why it _does_ work :D Assume the strings are in the OS-default encoding, don't mess with them, and hand them to the OS API, which knows how to treat them.
It doesn't always work. On Mac OS X, paths must be UTF-8; the OS isn't encoding-agnostic there, because the HFS+ file system stores file names as UTF-16 (much like NTFS). You can achieve something similar on Linux by mounting an HFS+ or NTFS file system; the encoding is then specified at mount time and must likewise be observed. Of course, file systems that store file names as arbitrary null-terminated byte sequences are typically encoding-agnostic.

For my own code, I've gradually reached the conclusion that I should always use UTF-8 encoded narrow paths. This may not be feasible for a library (yet), because people still insist on using other encodings on Unix-like OSes, usually KOI8-R. :-) I'm anxiously awaiting the day everyone in the Linux/Unix world finally switches to UTF-8 so we can be done with this question once and for all.