
Mathias Gaunard wrote:
POSIX system calls expect the text they receive as char* to be encoded in the current character locale.
No, POSIX system calls (under most Unix OSes, except on Mac OS X) are encoding-agnostic: they receive a null-terminated byte sequence (NTBS) without interpreting it. On Mac OS X, file paths must be UTF-8. Locales are not considered.
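To make that concrete, here is a minimal sketch (my own illustration, assuming a typical Linux file system) that creates a file whose name contains bytes forming no valid UTF-8 sequence. On Linux and most other Unixes the kernel accepts the name as an opaque NTBS; on Mac OS X the same call should fail, because its file systems enforce UTF-8:

#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    // 0xFF 0xFE cannot appear in well-formed UTF-8, yet the kernel
    // does not care: a path is just bytes with no NUL and no '/'.
    const char name[] = "prefix-\xFF\xFE-suffix";
    int fd = open(name, O_CREAT | O_WRONLY, 0644);
    if (fd == -1) { perror("open"); return 1; }
    close(fd);
    std::puts("created a file whose name is valid in no common encoding");
    return 0;
}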
To write cross-platform code, you need to convert your UTF-8 input to the locale encoding when calling system calls, and convert text you receive from those system calls from the locale encoding to UTF-8.
This is one possible way to do it (blindly using UTF-8 is another). Strictly speaking, on an encoding-agnostic file system you must not convert anything to anything, because the conversion may irretrievably lose the original path. For display purposes, of course, you have to pick an encoding somehow.

There is no "current" character locale on Unix, by the way, unless you count the environment variables; the OS itself doesn't care. Using the current C locale (LANG=...) lets you display file names the same way the 'ls' command does, whereas using UTF-8 lets your user enter file names that are not representable in the LANG locale.
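As a sketch of that display-time conversion, using only POSIX iconv and nl_langinfo (the function name utf8_from_locale is just an illustrative choice, and error handling is abbreviated):

#include <iconv.h>
#include <langinfo.h>
#include <clocale>
#include <stdexcept>
#include <string>

// Convert a raw file name from the encoding implied by LANG/LC_* to UTF-8.
std::string utf8_from_locale(const std::string& raw)
{
    // nl_langinfo(CODESET) reports e.g. "ISO-8859-1" for LANG=fr_FR
    // and "UTF-8" for LANG=fr_FR.UTF-8.
    iconv_t cd = iconv_open("UTF-8", nl_langinfo(CODESET));
    if (cd == (iconv_t)-1)
        throw std::runtime_error("unsupported locale encoding");

    std::string out(raw.size() * 4 + 4, '\0'); // worst-case growth
    char* in_p = const_cast<char*>(raw.data());
    size_t in_left = raw.size();
    char* out_p = &out[0];
    size_t out_left = out.size();

    if (iconv(cd, &in_p, &in_left, &out_p, &out_left) == (size_t)-1)
        out.clear(); // not representable; the caller decides what to show
    else
        out.resize(out.size() - out_left);

    iconv_close(cd);
    return out;
}

int main()
{
    std::setlocale(LC_ALL, ""); // adopt LANG/LC_* so CODESET is meaningful
    // ... read a directory and pass each d_name through utf8_from_locale ...
}

Going the other way (UTF-8 input to the locale encoding before a system call) is the same sketch with the iconv_open arguments swapped.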
Windows is exactly the same, except it's got two sets of locales and two sets of system calls.
Nope. It doesn't have two sets of locales.
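What Windows does have is one ANSI code page per process plus a parallel set of wide-character entry points (the -A and -W function pairs). A sketch of the usual portable approach, keeping UTF-8 internally and converting to UTF-16 at the call boundary (utf16_from_utf8 is my own helper name):

#include <windows.h>
#include <string>

// Convert UTF-8 to the UTF-16 that the -W system calls expect.
std::wstring utf16_from_utf8(const std::string& u8)
{
    if (u8.empty()) return std::wstring();
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                u8.data(), (int)u8.size(), nullptr, 0);
    if (n == 0) return std::wstring(); // invalid UTF-8 input
    std::wstring u16(n, L'\0');
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        u8.data(), (int)u8.size(), &u16[0], n);
    return u16;
}

int main()
{
    std::wstring path = utf16_from_utf8("caf\xC3\xA9.txt"); // "café.txt"
    // CreateFileW takes the name as UTF-16 directly; CreateFileA would
    // route it through the ANSI code page instead.
    HANDLE h = CreateFileW(path.c_str(), GENERIC_WRITE, 0, nullptr,
                           CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h != INVALID_HANDLE_VALUE) CloseHandle(h);
    return 0;
}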
So your technique for writing independent code is relying on the user to use a UTF-8 locale?
More or less. The code itself doesn't depend on the user locale; it always works. But to see the actual names in a terminal, you need a UTF-8 locale, which is now the recommended setup on all Unix OSes.
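For what it's worth, a program can detect whether it is running under such a locale. A small sketch:

#include <langinfo.h>
#include <clocale>
#include <cstdio>
#include <cstring>

int main()
{
    std::setlocale(LC_ALL, ""); // adopt the environment's locale
    const char* cs = nl_langinfo(CODESET);
    // Some systems spell the codeset "utf8"; a robust check would
    // normalize the name before comparing.
    if (std::strcmp(cs, "UTF-8") != 0)
        std::fprintf(stderr, "warning: locale codeset is %s, not UTF-8; "
                             "file names may display incorrectly\n", cs);
    return 0;
}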