Re: [boost] [General] Always treat std::strings as UTF-8

18 Jan 2011

On Tue, Jan 18, 2011 at 1:39 PM, Dave Abrahams <dave@boostpro.com> wrote:
> At Tue, 18 Jan 2011 19:46:41 +0200,
> Peter Dimov wrote:
>> Dave Abrahams wrote:
>>> At Tue, 18 Jan 2011 13:27:29 +0200,
>>> Peter Dimov wrote:
>>>> Dave Abrahams wrote:
>>>>> I think the reason to use separate types is to provide a type-safety
>>>>> barrier between your functions that operate on utf-8 and system or
>>>>> 3rd-party interfaces that don't or may not.  In principle, that should
>>>>> force you to think about encoding and decoding at all the places where
>>>>> it may be needed, and should allow you to code naturally and with
>>>>> confidence where everybody is operating in utf8-land.
>>>>
>>>> Yes, in principle. It isn't terribly necessary if everybody is
>>>> operating in UTF-8 land though.
>>>
>>> But they won't be.  That's not today's reality.
>>
>> They should be, though. As a practical matter, the difference between
>> taking/returning a string and taking/returning an utf8_t is to force
>> people to write an explicit conversion. This penalizes people who are
>> already in UTF-8 land because it forces them to use utf8_t( s,
>> encoding_utf8 ) and s.c_str( encoding_utf8 ) everywhere, without any
>> gain or need. It's true that for people whose strings are not UTF-8,
>> forcing those explicit conversions may be considered a good thing. So
>> it depends on what your goals are. Do you want to promote the use of
>> UTF-8 for all strings, or do you want to enable people to remain in
>> non-UTF-8-land?
>
> Oh, I get it.  Nevermind :-)

On second thought...

There are two ways this could go AFAICS:

1. We just use std::string for UTF-8 and eventually the whole world
will catch up
2. We establish some other type for UTF-8 and *it* becomes the lingua franca

Aren't things still enough of a mess out there that #2 is just as
likely to work well?
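
To make the two options concrete, here is a minimal sketch, not a
proposal.  The names utf8_t and encoding_utf8 follow the spelling used
earlier in the thread; the tag type, the two function names, and the
code-point counter are made up for illustration, and nothing here
validates that the bytes really are UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical tag type, modeled on the `encoding_utf8` argument
// mentioned earlier in the thread.
struct encoding_utf8_t {};
const encoding_utf8_t encoding_utf8 = {};

// Option 2: a distinct type. The only ways in and out name the encoding,
// so a plain std::string never converts to it silently.
class utf8_t {
public:
    // The caller asserts that `s` already holds UTF-8 bytes.
    // This sketch performs no validation.
    utf8_t(const std::string& s, encoding_utf8_t) : bytes_(s) {}

    // Explicit way back out to raw bytes.
    const char* c_str(encoding_utf8_t) const { return bytes_.c_str(); }

private:
    std::string bytes_;
};

// Option 1: plain std::string, UTF-8 by convention only.
// Counts code points by skipping continuation bytes (10xxxxxx).
std::size_t code_points_plain(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Option 2: the same operation behind the type barrier.
// Passing a bare std::string here does not compile.
std::size_t code_points_typed(const utf8_t& s) {
    return code_points_plain(std::string(s.c_str(encoding_utf8)));
}
```

With option 1 a caller writes code_points_plain(s) and the compiler
accepts any byte string; with option 2 the call site has to spell out
code_points_typed(utf8_t(s, encoding_utf8)), which is exactly the extra
conversion Peter is objecting to.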
-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com