
On Mon, Jan 23, 2012 at 9:28 AM, Yakov Galka <ybungalobill@gmail.com> wrote:
On Mon, Jan 23, 2012 at 14:47, Beman Dawes <bdawes@acm.org> wrote:
On Mon, Jan 23, 2012 at 4:46 AM, Yakov Galka <ybungalobill@gmail.com> wrote: [...]
Unfortunately it boils down to the interface whence you can get a c_str() to a UTF-16 string only.
That's not correct.
It's correct. I state that path::c_str() returns UTF-16 on Windows. It's a fact. So the encoding isn't an implementation detail but a part of the interface.
As quoted above, you said only that "...the interface whence you can get a c_str() to a UTF-16 string only." The interface includes multiple observers, which return values in various encodings other than UTF-16. Because those observers return string objects, you can call c_str() on the returned value to get a pointer in that encoding.

During the design discussions, two other alternatives were considered:

(1) Always hold the path internally in a char string encoded as UTF-8. The cost on Windows is that a conversion has to be done before every file system operation. The cost on POSIX is that a double conversion has to be done before every file system operation if the native encoding is not UTF-8.

(2) Hold two strings internally, one in the native type and encoding, the other in UTF-8. The cost is keeping them in sync, with the conversions that implies, for some definition of "in sync".

If class std::basic_string itself had better support for string interoperability, class path would be able to sidestep at least some of the conversion headaches.

--Beman