
Alexander Lamaison wrote:
On Fri, 14 Jan 2011 00:48:43 -0800 (PST), Artyom wrote: ... Two problems with this approach:
- Even if the encoding under POSIX platforms is not UTF-8, you will still be able to open files, close them, stat them, and perform any other operations regardless of encoding, because the POSIX API is encoding-agnostic; this is why it works well.
This isn't a problem, right? This is exactly why it _does_ work :D Assume the strings are in the OS-default encoding, don't mess with them, and hand them to the OS API, which knows how to treat them.
It doesn't always work. On Mac OS X, paths must be UTF-8; the OS isn't encoding-agnostic there, because the HFS+ file system stores file names as UTF-16 (much like NTFS). You can achieve something similar on Linux by mounting an HFS+ or NTFS file system; the encoding is then specified at mount time and must likewise be observed. Of course, file systems that store file names as arbitrary null-terminated byte sequences are typically encoding-agnostic.

For my own code, I've gradually reached the conclusion that I should always use UTF-8 encoded narrow paths. This may not be feasible for a library (yet), because people still insist on using other encodings on Unix-like OSes, usually KOI8-R. :-) I'm anxiously awaiting the day everyone in the Linux/Unix world finally switches to UTF-8 so we can be done with this question once and for all.