
Alexander Lamaison wrote:
I don't understand how it could possibly not help. If I see an API function call_me(std::string arg), I know next to nothing about what it's expecting from the string (except that, by convention, it tends to mean 'string in OS-default encoding').
You should read the documentation of call_me (*). Yes, I know that in the real world the documentation often doesn't specify an encoding (worse - the encoding varies between platforms and even versions of the same library), but if the developer of call_me hasn't bothered to document the encoding of the argument, he won't bother to use a special UTF-8 type for the argument, either. :-)

(*) And the documentation should either say that call_me accepts UTF-8, or that call_me is encoding-agnostic, that is, it treats the string as a byte sequence.

I can think of one reason to use a separate type - if you want to overload on encoding:

    void f( latin1_t arg );
    void f( utf8_t arg );

In most such cases that spring to mind, however, what the user actually wants is:

    void f( string arg, encoding_t enc );

or even

    void f( string arg, string encoding );

In principle, as Chad Nelson says, it's useful to have separate types if the program uses several different encodings at once, fixed at compile time. I don't consider such a way of programming a good idea, though. Strings should be either byte sequences or UTF-8; input can be of any encoding, possibly not known until runtime, but it should always be either processed as a byte sequence or converted to UTF-8 as a first step.

Regarding the OS-default encoding: if, on Windows, you ever encounter or create a string in the OS-default encoding, you've already lost - this code can't be correct. :-)