
At Thu, 20 Jan 2011 00:07:18 +0200, Peter Dimov wrote:
Dave Abrahams wrote:
At Wed, 19 Jan 2011 23:02:02 +0200, Peter Dimov wrote:
My answer is different. T is std::string, and:
- on POSIX OSes, this string is taken directly from the OS and given directly to the OS, without any conversion;
- on Windows, this string is UTF-8 and is converted to UTF-16 before being given to the OS, and converted from UTF-16 after being received from it. This conversion should tolerate broken UTF-16 because the OS does so as well.
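
To make the "tolerate broken UTF-16" point concrete, here is a minimal, portable sketch of the UTF-16 to UTF-8 direction. The names `append_utf8` and `narrow_lenient` are hypothetical and do not come from any proposal in this thread, and a real Windows implementation might well call the OS conversion routines instead. The assumption made here is that an unpaired surrogate is passed through as a 3-byte sequence rather than rejected, so the result is not strictly valid UTF-8 in that case.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: append the UTF-8 style encoding of `cp` to `out`.
// Surrogate values (U+D800..U+DFFF) are deliberately NOT rejected.
inline void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical lenient conversion: well-formed surrogate pairs are combined
// into one code point; a lone surrogate is encoded as-is instead of failing.
inline std::string narrow_lenient(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        const bool is_high = cp >= 0xD800 && cp <= 0xDBFF;
        if (is_high && i + 1 < in.size()) {
            const std::uint32_t low = in[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;  // consume the low half of the pair
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

On POSIX no such function would be involved at all, since the bytes are handed to the OS unchanged.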
...
I prefer to have semantic constraints/invariants like "this is UTF-8 encoded" represented in the type system and enforced by public library interfaces. I'm arguing for a future like that.
But the semantics I outlined above only have this constraint under Windows.
Sorry, I don't understand what you're saying here. But let me say a little more about my point; maybe that will help.

If I get a std::string from "somewhere", I don't know what encoding it's in, if any. The abstraction presented by std::string is essentially "sequence of individually addressable and mutable chars that by convention represents text in some unspecified way." It has lots of interface that is aimed at manipulating the raw sequence of chars, and none that helps with an interpretation of those chars.

IIUC, you're talking about changing the abstraction presented by std::string to "sequence of individually addressable and mutable chars that by convention represents text encoded as utf-8."

I would prefer to be handling something that presents the abstraction "character string." I'm not sure exactly what that looks like, but I'm pretty sure the "individually addressable and mutable chars" part should go. I'd like to see an interface that prevents corrupting the underlying data such that it no longer represents a valid sequence of characters (or at least makes it highly unlikely that such corruption could happen accidentally).

Furthermore, there are lots of string-y things I'd want to do that aren't provided (or aren't provided well) by std::string, e.g.

    if (s1.starts_with(s2)) {...}

Does this make more sense?

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
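
As an illustration only, the kind of interface described above (the encoding invariant carried by the type, no mutable access to individual chars, plus a `starts_with`) might be sketched as follows. The names `utf8_string` and `is_valid_utf8` are hypothetical and are not part of std::string or of any library discussed in this thread. The sketch assumes strict validation, so unlike the tolerant Windows conversion mentioned earlier it rejects encoded surrogates.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical check: true if `s` is well-formed UTF-8. Rejects stray or
// missing continuation bytes, overlong forms, surrogates, and values above
// U+10FFFF.
inline bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        if (b < 0x80) { ++i; continue; }  // ASCII
        std::size_t len;
        unsigned long cp, min;
        if ((b & 0xE0) == 0xC0)      { len = 2; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; min = 0x10000; }
        else return false;           // stray continuation or invalid lead byte
        if (n - i < len) return false;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}

// Hypothetical wrapper: the invariant "bytes_ is valid UTF-8" is established
// once at construction, and no member hands out mutable access to the bytes.
class utf8_string {
public:
    explicit utf8_string(std::string bytes) : bytes_(std::move(bytes)) {
        if (!is_valid_utf8(bytes_))
            throw std::invalid_argument("not valid UTF-8");
    }

    // Read-only view of the encoded bytes.
    const std::string& bytes() const { return bytes_; }

    // Because both operands are valid UTF-8, a byte-wise prefix match is
    // also a prefix match on code points.
    bool starts_with(const utf8_string& prefix) const {
        return bytes_.compare(0, prefix.bytes_.size(), prefix.bytes_) == 0;
    }

private:
    std::string bytes_;
};
```

With this sketch, `utf8_string("h\xC3\xA9llo").starts_with(utf8_string("h\xC3\xA9"))` is true, while constructing from the truncated sequence `"\xC3"` throws. Note that the comparison works on code points only; it does not address normalization or grapheme clusters.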