Re: [boost] [General] Always treat std::strings as UTF-8

18 Jan 2011


      ...
On 01/18/2011 03:27 AM, Peter Dimov wrote:
Dave Abrahams wrote:
...
I think the reason to use separate types is to provide a type-safety
barrier between your functions that operate on utf-8 and system or
3rd-party interfaces that don't or may not.  In principle, that should
force you to think about encoding and decoding at all the places where
it may be needed, and should allow you to code naturally and with
confidence where everybody is operating in utf8-land.
Yes, in principle. It isn't terribly necessary if everybody is 
operating in UTF-8 land though. It's a bit like defining a separate 
integer type for nonnegative ints for type safety reasons - useful in 
theory, but nobody does it.
Are you saying that no one uses unsigned int for non-negative ints?
I'm thinking I'm just misunderstanding you.  I work with whole groups of
people that are careful to declare things to match their use to take 
advantage of the compiler diagnostics.  Show me any large body of code 
where people are sloppy about this and I'll turn on the appropriate warnings 
and find bugs for you by inspection.  My experience is that declaring 
everything int is something beginners do but once they've been bitten by 
the inevitable subtle and not so subtle bugs, intermediate level 
programmers learn to declare as unsigned things that will always be 
non-negative and for which it would be a mistake to ever be negative.  
In spite of being a good programmer with years of experience I make a 
constant series of sloppy coding errors and am thankful for every 
category the compiler will tell me about.  Everyone that has ever worked 
at a place that builds with warnings turned up and wants the warnings 
gone has gone through this and learned these lessons.  That's why I 
think I'm probably misunderstanding you.
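
To make the point about declarations and diagnostics concrete, here is a minimal sketch. The helper element_or_default is hypothetical and invented for this illustration. Declaring the index as an unsigned type states that a negative value is a mistake, and it keeps the bounds check free of mixed signed/unsigned comparisons.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, for illustration only. The index is declared
// std::size_t because a negative index would always be a bug.
int element_or_default(const std::vector<int>& v, std::size_t index, int fallback) {
    // Both operands are unsigned, so there is no hidden conversion here.
    return index < v.size() ? v[index] : fallback;
}

// If `index` were declared as int instead, `index < v.size()` would compare
// a signed value with an unsigned one. A negative index would be converted
// to a huge unsigned value before the comparison. GCC and Clang can report
// that kind of comparison under -Wsign-compare.
```

With the warning enabled, the sloppy int version is flagged at compile time rather than found later by debugging.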
...
If you're designing an interface that takes UTF-8 strings, it still 
may be worth it to have the parameters be of a utf8-specific type, if 
you want to force your users to think about the encoding of the 
argument each time they call one of your functions... this is a 
legitimate design decision. If you're in control of the whole program, 
though, it's usually not worth it - you just keep everything in UTF-8.
It's exactly why you would do it.  It gets the compiler involved and it 
will give you diagnostics that make it harder for you to do the wrong 
thing.  If the converting constructors for the utf-8 specific type are 
all explicit, so you can't accidentally get rid of the warning and 
_still_ have incorrect code, all the better.  Better to be correct by 
design when you can.
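
A minimal sketch of what such a type could look like, assuming a hypothetical wrapper named utf8_string (it is not an existing Boost or standard type, and code_point_count is likewise invented for the example). The only converting constructor is explicit, so a plain std::string never becomes a utf8_string silently. This sketch does not validate the bytes; it only shows the type barrier.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical UTF-8 wrapper, for illustration only.
class utf8_string {
public:
    // Explicit: the caller has to spell out "these bytes are UTF-8".
    explicit utf8_string(std::string bytes) : bytes_(std::move(bytes)) {}

    const std::string& bytes() const { return bytes_; }

private:
    std::string bytes_;
};

// Hypothetical function that lives in "utf8-land".
// Counts code points by skipping UTF-8 continuation bytes (10xxxxxx).
std::size_t code_point_count(const utf8_string& s) {
    std::size_t n = 0;
    for (unsigned char c : s.bytes()) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Construction is possible, but only when written out at the call site.
static_assert(std::is_constructible<utf8_string, std::string>::value,
              "explicit construction is allowed");
static_assert(!std::is_convertible<std::string, utf8_string>::value,
              "implicit conversion is rejected");
```

With this in place, code_point_count(some_std_string) fails to compile, while code_point_count(utf8_string(some_std_string)) compiles and makes the encoding assumption visible at the call.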

Patrick

Patrick Horgan