
Beman Dawes:
Yes. The situation on POSIX systems is quite messy.
I'm not sure that it is as messy as is usually claimed. There are basically two cases:

1. The filesystem is "8-bit neutral": it stores the NTBS exactly as passed and returns it unmodified.

2. The filesystem uses UTF-16 (NTFS and HFS+). In this case the OS translates the NTBS to UTF-16 for storage (using the system code page on Windows, UTF-8 on Mac OS X, and the code page specified at mount time on Linux) and translates the UTF-16 name back when returning it to the application. Note that on HFS+ the round trip may not reproduce the original NTBS even for valid UTF-8 input, because HFS+ normalizes names to a decomposed Unicode form. It does, however, reproduce the same string as read by the user.

Most of the perceived complexity comes from the fact that people living in world (1) can't comprehend that non-neutral filesystems exist, and expect to be able to (a) pass arbitrary byte strings to the OS and (b) get them back unchanged. This leads to a related mistaken belief: that the user can choose the encoding of the input.
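A rough model of the two cases may make the round-trip argument concrete. This is a Python sketch under stated assumptions: `neutral_fs_roundtrip` and `utf16_fs_roundtrip` are hypothetical names that simulate the translation steps described above, not calls into any real filesystem API. `unicodedata.normalize("NFD", ...)` stands in for HFS+ normalization, which actually uses its own variant of NFD, so treat that step as an approximation.

```python
import unicodedata
from typing import Optional

# Hypothetical model of the two filesystem cases; no real filesystem is touched.


def neutral_fs_roundtrip(name: bytes) -> bytes:
    """Case 1: an "8-bit neutral" filesystem stores the NTBS bytes verbatim."""
    stored = bytes(name)  # no decoding, no normalization
    return stored


def utf16_fs_roundtrip(name: bytes, encoding: str, normalize: Optional[str] = None) -> bytes:
    """Case 2: the OS decodes the NTBS with `encoding` (system code page,
    UTF-8, or mount-time code page), stores it as UTF-16, and encodes it
    back with the same `encoding` when the name is returned."""
    text = name.decode(encoding)  # raises UnicodeDecodeError for bytes the encoding rejects
    if normalize is not None:
        # Approximation of HFS+ normalization; HFS+ uses its own NFD variant.
        text = unicodedata.normalize(normalize, text)
    stored = text.encode("utf-16-le")  # what the filesystem actually keeps
    return stored.decode("utf-16-le").encode(encoding)


if __name__ == "__main__":
    precomposed = "caf\u00e9".encode("utf-8")  # U+00E9 encoded as two UTF-8 bytes

    # Case 1: arbitrary bytes survive unchanged.
    print(neutral_fs_roundtrip(b"\xff\xfe") == b"\xff\xfe")  # True

    # Case 2 with normalization: same text as read by the user, different bytes.
    back = utf16_fs_roundtrip(precomposed, "utf-8", normalize="NFD")
    print(back)  # b'cafe\xcc\x81'
    print(back == precomposed)  # False
    print(unicodedata.normalize("NFC", back.decode("utf-8")) == "caf\u00e9")  # True

    # Case 2 with bytes that are not valid UTF-8: the name cannot be translated.
    try:
        utf16_fs_roundtrip(b"\xff", "utf-8")
    except UnicodeDecodeError:
        print("b'\\xff' is not valid UTF-8")
```

In this model the precomposed "café" comes back as different bytes that still renormalize to the same text, while a lone `0xff` byte, harmless in case (1), cannot enter case (2) at all. What a real OS does with such invalid input is not modeled here.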