Re: [boost] [General] Always treat std::strings as UTF-8

18 Jan 2011


At Mon, 17 Jan 2011 21:46:36 -0800, Emil Dotchevski wrote:
...
...
I think the reason to use separate types is to provide a type-safety
barrier between your functions that operate on utf-8 and system or
3rd-party interfaces that don't or may not.  In principle, that should
force you to think about encoding and decoding at all the places where
it may be needed, and should allow you to code naturally and with
confidence where everybody is operating in utf8-land.  The typical
failures I've seen where there is no such mechanism (e.g. in Python,
where there's no static typing) happen because programmers lose
track of whether what they're handling is encoded as utf-8 or not.
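
To make the barrier idea concrete, here is a minimal sketch.  All of
the names in it (utf8_string, decode, bytes, count_code_points) are
made up for illustration and are not from any Boost library, and the
validity check is deliberately simplified: it only looks at lead and
continuation bytes and does not reject overlong forms or surrogates.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical wrapper: holding a utf8_string means the bytes were checked.
class utf8_string {
public:
    // The only way in from untyped bytes: validate at the boundary.
    static utf8_string decode(std::string bytes) {
        if (!is_valid_utf8(bytes))
            throw std::invalid_argument("not valid UTF-8");
        return utf8_string(std::move(bytes));
    }

    // The only way out: an explicit call, visible at legacy call sites.
    const std::string& bytes() const { return data_; }

private:
    explicit utf8_string(std::string b) : data_(std::move(b)) {}

    // Simplified structural check (assumption: good enough for a sketch).
    static bool is_valid_utf8(const std::string& s) {
        std::size_t i = 0;
        while (i < s.size()) {
            unsigned char c = static_cast<unsigned char>(s[i]);
            std::size_t extra;
            if (c < 0x80) extra = 0;
            else if ((c & 0xE0) == 0xC0) extra = 1;
            else if ((c & 0xF0) == 0xE0) extra = 2;
            else if ((c & 0xF8) == 0xF0) extra = 3;
            else return false;  // stray continuation or invalid lead byte
            if (s.size() - i - 1 < extra) return false;  // truncated sequence
            for (std::size_t k = 1; k <= extra; ++k) {
                unsigned char cc = static_cast<unsigned char>(s[i + k]);
                if ((cc & 0xC0) != 0x80) return false;
            }
            i += extra + 1;
        }
        return true;
    }

    std::string data_;
};

// Code in "utf8-land" takes the tagged type and can assume the encoding.
inline std::size_t count_code_points(const utf8_string& s) {
    std::size_t n = 0;
    for (unsigned char c : s.bytes())
        if ((c & 0xC0) != 0x80) ++n;  // count every non-continuation byte
    return n;
}
```

A function like count_code_points can't be called with a plain
std::string by accident; the caller has to go through decode() first,
which is exactly the place where you're forced to think about encoding.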
> UTF-8 allows the use of char * for type erasure for strings, much like
> void * allows that in general.
Yes, that's exactly my point, although this isn't a property of UTF-8;
it's a more general thing.  In a dynamic language like Python
everything is type-erased.
...
> Using C++ type tags to discriminate
> between different data pointed to by void pointers is mostly redundant
Exactly.  I'm suggesting, essentially, to avoid the use of void
pointers except where you're forced to, at the boundaries with
"legacy" interfaces.
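
As a rough illustration of what I mean by confining void pointers to
the boundary, here is a sketch.  The legacy_for_each interface and the
for_each_value wrapper are both hypothetical, invented for this
example; the point is only that the cast back from void * happens in
one place, and everything above it stays typed.

```cpp
#include <cstddef>

// Hypothetical legacy interface: the context pointer is type-erased.
typedef void (*legacy_callback)(int value, void* context);

inline void legacy_for_each(const int* values, std::size_t n,
                            legacy_callback cb, void* context) {
    for (std::size_t i = 0; i < n; ++i)
        cb(values[i], context);
}

// Typed wrapper: the one place where void * appears in user code.
template <class F>
void for_each_value(const int* values, std::size_t n, F& f) {
    struct thunk {
        static void call(int value, void* context) {
            // Safe only because this wrapper passed &f as the context.
            (*static_cast<F*>(context))(value);
        }
    };
    legacy_for_each(values, n, &thunk::call, &f);
}
```

Callers of for_each_value never see a void *, so there is nothing for
them to lose track of.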

-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com