Re: [boost] [filesystem]Extracting path as string from wpath

20 Oct 2008

On Sun, Oct 19, 2008 at 5:17 AM, Ulrich Eckhardt <doomster@knuut.de> wrote:
...
On Friday 17 October 2008 17:45:28 Emil Dotchevski wrote:
...
"In Mac OS X's VFS API file names are, by definition, canonically
decomposed Unicode, encoded using UTF-8."
This means that precomposed characters are forbidden and combining
diacritics must be used to replace them.
See http://developer.apple.com/qa/qa2001/qa1173.html.
Danger: read the whole document! The point is that nothing guarantees this
encoding; it is by no means enforced by the OS. So, in order to be able to
use non-compliant media (e.g. ones with codepage encodings, possibly
even unknown codepage encodings), you have to treat the strings received
from the filesystem as byte strings. The only things you can rely on are:
- Termination with a null byte.
- Segments separated with a path separator (i.e. '/').
Otherwise, converting it to a text string is a lossy conversion because of
the unreliable encoding (though assuming UTF-8 as a default works). Similarly,
encoding to a byte string isn't reliable, because the encoding of the
filesystem isn't guaranteed.
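
A minimal sketch of the byte-string treatment described above, not taken from Boost.Filesystem. The function names split_segments and looks_like_utf8 are hypothetical. The first relies only on the two guarantees listed (a terminating null byte and '/' as separator) and copies every other byte unchanged. The second is a structural check that could be run before assuming the UTF-8 default; it is an assumption of this sketch that such a check is wanted, and it does not reject every overlong or surrogate form.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split a raw, null-terminated path into segments without decoding it.
// Only the terminating null byte and the '/' separator are interpreted.
// Empty segments (from a leading, trailing, or repeated '/') are skipped.
std::vector<std::string> split_segments(const char* raw_path)
{
    std::vector<std::string> segments;
    std::string current;
    for (const char* p = raw_path; *p != '\0'; ++p) {
        if (*p == '/') {
            if (!current.empty()) {
                segments.push_back(current);
                current.clear();
            }
        } else {
            current.push_back(*p);  // copy the byte unchanged
        }
    }
    if (!current.empty())
        segments.push_back(current);
    return segments;
}

// Structural UTF-8 check: each lead byte must be followed by the right
// number of continuation bytes (10xxxxxx). Simplified on purpose: some
// overlong and surrogate encodings are not rejected.
bool looks_like_utf8(const std::string& bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;
        if (lead < 0x80)                      extra = 0;  // ASCII
        else if (lead >= 0xC2 && lead <= 0xDF) extra = 1;
        else if (lead >= 0xE0 && lead <= 0xEF) extra = 2;
        else if (lead >= 0xF0 && lead <= 0xF4) extra = 3;
        else return false;                    // invalid lead byte

        if (bytes.size() - i <= extra)
            return false;                     // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                return false;                 // not a continuation byte
        }
        i += extra + 1;
    }
    return true;
}
```

A segment for which looks_like_utf8 returns false (for example a name written in a Latin-1 codepage) would be kept as bytes rather than converted to text, since the conversion could not be reversed reliably.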
BTW:
- A similar discussion took place on the Python developers' mailing list.
The current state seems to be to implement both a Unicode API and one using
byte strings in parallel, though I'm not advocating that approach.
- The same problem is present on all POSIX systems (BSDs, Linux, ...), though
there you don't have the UTF-8 default but rather the encoding of the
LC_CTYPE locale.
Yes. The situation on POSIX systems is quite messy. I've been discussing it
with the POSIX folks, and I get conflicting answers depending on the example
presented. Part of the problem is that the documented behavior of the POSIX
command line utilities is different from the program API behavior. Also,
real-world behavior sometimes seems different from the POSIX specifications.
Sigh.

I'd really like to be put in contact with someone who has access to and is
familiar with POSIX variants used in Asia.

--Beman

Beman Dawes