
On Tue, Jan 18, 2011 at 1:39 PM, Dave Abrahams <dave@boostpro.com> wrote:
At Tue, 18 Jan 2011 19:46:41 +0200, Peter Dimov wrote:
Dave Abrahams wrote:
At Tue, 18 Jan 2011 13:27:29 +0200, Peter Dimov wrote:
Dave Abrahams wrote:
I think the reason to use separate types is to provide a type-safety barrier between your functions that operate on utf-8 and system or 3rd-party interfaces that don't or may not. In principle, that should force you to think about encoding and decoding at all the places where it may be needed, and should allow you to code naturally and with confidence where everybody is operating in utf8-land.
Yes, in principle. It isn't terribly necessary if everybody is operating in UTF-8 land though.
But they won't be. That's not today's reality.
They should be, though. As a practical matter, the difference between taking/returning a string and taking/returning a utf8_t is to force people to write an explicit conversion. This penalizes people who are already in UTF-8 land because it forces them to use utf8_t( s, encoding_utf8 ) and s.c_str( encoding_utf8 ) everywhere, without any gain or need. It's true that for people whose strings are not UTF-8, forcing those explicit conversions may be considered a good thing. So it depends on what your goals are. Do you want to promote the use of UTF-8 for all strings, or do you want to enable people to remain in non-UTF-8-land?
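For readers who want to see the trade-off concretely, here is a minimal sketch of what such a wrapper might look like. Only the names utf8_t and encoding_utf8 and the two call shapes utf8_t( s, encoding_utf8 ) and s.c_str( encoding_utf8 ) come from the message above; the tag type encoding_utf8_t, the helper functions byte_length and legacy_length, and everything about the implementation are assumptions made for illustration, not an API anyone in the thread proposed in this form.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical tag type. The thread only mentions the constant
// encoding_utf8; the type behind it is an assumption.
struct encoding_utf8_t {};
const encoding_utf8_t encoding_utf8 = {};

// Hypothetical wrapper: a string whose bytes the caller has declared
// to be UTF-8. This sketch does not validate the bytes.
class utf8_t {
public:
    // Two-argument constructor, so a std::string never converts
    // implicitly: the caller must name the encoding.
    utf8_t(std::string s, encoding_utf8_t) : bytes_(std::move(s)) {}

    // Tagged accessor: leaving UTF-8 land is spelled out as well.
    const char* c_str(encoding_utf8_t) const { return bytes_.c_str(); }

private:
    std::string bytes_;
};

// The type-safety barrier: a plain std::string cannot be passed
// where a utf8_t is expected.
static_assert(!std::is_convertible<std::string, utf8_t>::value,
              "std::string must not convert implicitly to utf8_t");

// A function living in UTF-8 land (hypothetical).
std::size_t byte_length(const utf8_t& s) {
    return std::strlen(s.c_str(encoding_utf8));
}

// Stand-in for a system or third-party interface whose encoding is
// unspecified (hypothetical).
std::size_t legacy_length(const char* s) {
    return std::strlen(s);
}
```

With this shape, byte_length( raw ) does not compile for a std::string raw; the caller has to write byte_length( utf8_t( raw, encoding_utf8 ) ), and likewise legacy_length( u.c_str( encoding_utf8 ) ) on the way out. That is the explicit conversion being described: a useful checkpoint when raw might not be UTF-8, and pure overhead when it always is.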
Oh, I get it. Never mind :-)
On second thought... There are two ways this could go AFAICS:

1. We just use std::string for UTF-8 and eventually the whole world will catch up
2. We establish some other type for UTF-8 and *it* becomes the lingua franca

Aren't things still enough of a mess out there that #2 is just as likely to work well?

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com