
At Mon, 17 Jan 2011 21:46:36 -0800, Emil Dotchevski wrote:
>> I think the reason to use separate types is to provide a type-safety barrier between your functions that operate on UTF-8 and system or third-party interfaces that don't, or may not. In principle, that should force you to think about encoding and decoding at all the places where it may be needed, and should allow you to code naturally and with confidence where everybody is operating in UTF-8-land. The typical failures I've seen where there is no such mechanism (e.g. in Python, where there's no static typing) happen because programmers lose track of whether what they're handling is encoded as UTF-8 or not.
> UTF-8 allows the use of char * for type erasure for strings, much like void * allows that in general.
Yes, that's exactly my point, although this isn't a property of UTF-8; it's a more general thing. In a dynamic language like Python everything is type-erased.
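
To make the barrier idea concrete, here is a minimal sketch, not a proposal for an actual interface. Every name in it (utf8_string, from_utf8_unchecked, decode_latin1, legacy_get_name, count_code_points) is hypothetical, and the assumption that the legacy call returns Latin-1 is made up purely for illustration.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical tag type: a string whose bytes are asserted to be UTF-8.
class utf8_string {
public:
    // The only way in is a named, explicit call, so a raw char const*
    // or std::string cannot slip into UTF-8 code by implicit conversion.
    // This sketch does not validate the bytes.
    static utf8_string from_utf8_unchecked(std::string bytes) {
        return utf8_string(std::move(bytes));
    }
    const std::string& bytes() const { return bytes_; }

private:
    explicit utf8_string(std::string b) : bytes_(std::move(b)) {}
    std::string bytes_;
};

// Stand-in for a legacy interface that hands back a type-erased
// char const*. Assumption for this sketch: the bytes are Latin-1.
inline const char* legacy_get_name() { return "caf\xE9"; }

// The boundary: the one place where the encoding question is answered.
inline utf8_string decode_latin1(const char* s) {
    std::string out;
    for (; *s; ++s) {
        unsigned char c = static_cast<unsigned char>(*s);
        if (c < 0x80) {
            out += static_cast<char>(c);                // ASCII: one byte
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));   // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F)); // continuation byte
        }
    }
    return utf8_string::from_utf8_unchecked(std::move(out));
}

// Code in "UTF-8-land": the parameter type says the encoding is known.
inline std::size_t count_code_points(const utf8_string& s) {
    std::size_t n = 0;
    for (unsigned char c : s.bytes())
        if ((c & 0xC0) != 0x80) ++n;  // skip continuation bytes
    return n;
}
```

With this in place, count_code_points(legacy_get_name()) does not compile, because nothing converts a char const* to utf8_string implicitly. The caller has to write count_code_points(decode_latin1(legacy_get_name())), which is exactly the spot where someone is forced to think about the encoding.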
> Using C++ type tags to discriminate between different data pointed to by void pointers is mostly redundant.
Exactly. I'm suggesting, essentially, that you avoid void pointers except where you're forced to use them: at the boundaries with "legacy" interfaces.

-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com