
On 01/18/2011 03:27 AM, Peter Dimov wrote:
> Dave Abrahams wrote:
>> I think the reason to use separate types is to provide a type-safety barrier between your functions that operate on utf-8 and system or 3rd-party interfaces that don't or may not. In principle, that should force you to think about encoding and decoding at all the places where it may be needed, and should allow you to code naturally and with confidence where everybody is operating in utf8-land.
>
> Yes, in principle. It isn't terribly necessary if everybody is operating in UTF-8 land though. It's a bit like defining a separate integer type for nonnegative ints for type safety reasons - useful in theory, but nobody does it.

Are you saying that no one uses unsigned int for non-negative ints? I'm thinking I'm just misunderstanding you. I work with whole groups of people that are careful to declare things to match their use, to take advantage of the compiler diagnostics. Show me any large body of code where people are sloppy about this and I'll turn on the appropriate warnings and find bugs for you by inspection.

My experience is that declaring everything int is something beginners do, but once they've been bitten by the inevitable subtle and not so subtle bugs, intermediate level programmers learn to declare as unsigned things that will always be non-negative and for which it would be a mistake to ever be negative. In spite of being a good programmer with years of experience, I make a constant series of sloppy coding errors and am thankful for every category the compiler will tell me about. Everyone that has ever worked at a place that builds with warnings turned up and wants the warnings gone has gone through this and learned these lessons. That's why I think I'm probably misunderstanding you.

> If you're designing an interface that takes UTF-8 strings, it still may be worth it to have the parameters be of a utf8-specific type, if you want to force your users to think about the encoding of the argument each time they call one of your functions... this is a legitimate design decision. If you're in control of the whole program, though, it's usually not worth it - you just keep everything in UTF-8.

It's exactly why you would do it. It gets the compiler involved and it will give you diagnostics that make it harder for you to do the wrong thing. If the converting constructors for the utf-8 specific type are all explicit, so you can't accidentally get rid of the warning and _still_ have incorrect code, all the better. Better to be correct by design when you can.

Patrick
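
To make the point about explicit converting constructors concrete, here is a minimal sketch. It is not taken from any proposed Boost interface: the names utf8_string, byte_length, from_latin1 and boundary_example are all hypothetical, and the wrapper only tags a string as UTF-8 without validating it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical wrapper: tags a std::string as "known to hold UTF-8".
// It performs no validation; it only carries the encoding in the type.
class utf8_string {
public:
    // explicit: a plain std::string never converts silently.
    explicit utf8_string(std::string bytes) : bytes_(std::move(bytes)) {}
    const std::string& bytes() const { return bytes_; }
private:
    std::string bytes_;
};

// The compiler enforces the barrier: no implicit std::string -> utf8_string.
static_assert(!std::is_convertible<std::string, utf8_string>::value,
              "utf8_string must not be implicitly constructible");

// Hypothetical interface that only accepts UTF-8 text.
std::size_t byte_length(const utf8_string& text) {
    return text.bytes().size();
}

// Hypothetical boundary conversion: ISO-8859-1 (Latin-1) bytes to UTF-8.
utf8_string from_latin1(const std::string& latin1) {
    std::string out;
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));               // ASCII: one byte
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));   // lead byte
            out.push_back(static_cast<char>(0x80 | (c & 0x3F))); // continuation
        }
    }
    return utf8_string(std::move(out));
}

// Usage at a boundary: the caller has to say how the bytes are encoded.
void boundary_example() {
    std::string legacy = "abc";               // encoding unknown to the type system
    // byte_length(legacy);                   // would not compile: no implicit conversion
    assert(byte_length(from_latin1(legacy)) == 3);
}
```

With a non-explicit constructor the commented-out call would compile, and the encoding question would be skipped silently. That is the "get rid of the warning and still have incorrect code" case.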