
Otherwise you should:
1. Reinvent the string
Or at least wrap it. ;-)
2. Reinvent the standard library to use the new string
Not entirely necessary, for the same reason that very few changes to the standard library are needed when you switch from char strings to char16_t strings to char32_t strings -- the standard library, designed around the idea of iterators, is mostly type-agnostic.
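(A minimal sketch of that claim, assuming nothing beyond standard C++11: the same iterator-based algorithm compiles and runs unchanged over different code-unit types.)

    #include <algorithm>
    #include <iostream>
    #include <string>

    int main() {
        // The same iterator-based algorithm works regardless of the code-unit type.
        std::string    narrow = "hello";
        std::u32string wide   = U"hello";

        bool n = std::find(narrow.begin(), narrow.end(), 'l')  != narrow.end();
        bool w = std::find(wide.begin(),   wide.end(),   U'l') != wide.end();

        std::cout << n << " " << w << "\n"; // prints: 1 1
    }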
Ok... A few things:
1. UTF-32 is a waste of space - don't use it unless you are doing something like handling individual code points (char32_t).
2. UTF-16 is too error prone (see: UTF-16 considered harmful).
3. There is no special type char8_t distinct from char, so you can't use it (see the sketch below).
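On point 3, a small sketch of why an alias does not help; the names utf8_char and utf8_unit are hypothetical, used only for illustration:

    // A typedef does NOT introduce a distinct type, so you cannot overload
    // (or specialize) on "UTF-8-ness" versus plain narrow strings.
    typedef char utf8_char;               // hypothetical alias

    void process(const char* s);          // narrow-string overload
    // void process(const utf8_char* s);  // error: redeclares the same function

    // The only way to get a genuinely distinct type is to wrap the code unit:
    struct utf8_unit { char value; };     // hypothetical wrapper
    void process(const utf8_unit* s);     // now a distinct overload is possible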
The utf*_t types provide fully functional iterators,
Ok, let's think: what do you need iterators for? Accessing "characters"? If so, you are most likely doing something terribly wrong, because you are ignoring the fact that codepoint != character. I would say such an iterator is wrong by design, unless you are developing a Unicode algorithm that actually operates on code points.
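To illustrate why iterating code points does not give you characters, a minimal example (standard C++11 only): one user-perceived character made of two code points.

    #include <iostream>
    #include <string>

    int main() {
        // One user-perceived character, two code points:
        // U+0065 LATIN SMALL LETTER E followed by U+0301 COMBINING ACUTE ACCENT.
        std::u32string s = U"e\u0301";

        // A code-point iterator reports 2 elements, yet a reader sees one character.
        std::cout << "code points: " << s.size() << "\n"; // prints 2
    }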
so they'll work fine with most library functions, so long as those functions don't care that some characters are encoded as multiple bytes. It's just the ones that assume that a single byte represents all characters that you have to replace, and you'd have to replace those regardless of whether you're using a new string type or not, if you're using any multi-byte encoding.
Ok... The paragraph above is inherently wrong. First of all, let's clean things up:
that some characters are encoded as multiple bytes
Characters are not code points.
the ones that assume that a single byte represents all characters
Please, I want to make this statement even clearer: C H A R A C T E R != C O D E P O I N T. This holds even in single-byte encodings - for example, windows-1255 is a single-byte encoding and may still represent a single character using 1, 2 or 3 bytes! (See the sketch below.) Once again: when you work with strings, you don't work with them as a series of characters; you work with them as text entities - text chunks.
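A sketch of that windows-1255 point; the byte values are read off the windows-1255 code page (base letter shin plus two combining points) and are shown only as an illustration:

    #include <iostream>
    #include <string>

    int main() {
        // windows-1255 is a single-byte encoding, yet one Hebrew character
        // (shin + shin dot + qamats) is a base letter plus two combining
        // points: 3 bytes for 1 user-perceived character.
        std::string shin_with_points = "\xF9\xD1\xC8";

        std::cout << "bytes: " << shin_with_points.size() << "\n"; // prints 3
    }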
and you'd have to replace those regardless of whether you're using a new string type or not, if you're using any multi-byte encoding.
No, I would not, because I don't look at a string as a sequence of code points - by themselves they are meaningless. Code points are meaningful only in terms of Unicode algorithms that know how to combine them. So if you want to handle text chunks, you will have to use some Unicode-aware library anyway.
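For instance, a Unicode-aware library such as Boost.Locale (assumed here to be built with its ICU backend) applies the Unicode algorithms itself while working directly on std::string text chunks; a minimal sketch:

    #include <boost/locale.hpp>
    #include <iostream>
    #include <string>

    int main() {
        // The library, not the string type, carries the Unicode knowledge
        // (here: full case mapping).
        boost::locale::generator gen;
        std::locale loc = gen("en_US.UTF-8");

        std::string text = "gr\xC3\xBC\xC3\x9F";                 // "grüß" as UTF-8 bytes
        std::cout << boost::locale::to_upper(text, loc) << "\n"; // "GRÜSS" (ß -> SS)
    }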
It is just neither feasible nor necessary.
My code says it's perfectly feasible. ;-) Whether it's necessary or not is up to the individual developer, but the type-safety it offers is more in line with the design philosophy of C++ than using std::string for everything. I hate to harp on the same tired example, but why do you really need any pointer type other than void*? It's the same idea.
No, it isn't. A string is a text chunk. You can combine them, concatenate them, search for specific substrings, or refer to ASCII characters (for example in HTML) and parse them, and this is perfectly doable with the standard std::string, regardless of whether it is UTF-8, Latin-1 or another ISO-8859-* ASCII-compatible encoding. This is very different. Giving you a "utf-8" string or a UTF-8 container would give you a false feeling that you are doing something right. Unicode is not about splitting a string into code points or iterating over them... It is a totally different thing.

Artyom
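A small illustration of the ASCII-compatibility point above (the markup and Hebrew payload are made-up example data):

    #include <iostream>
    #include <string>

    int main() {
        // UTF-8 is ASCII-compatible: no byte of a multi-byte sequence overlaps
        // the ASCII range, so searching for ASCII markup inside a UTF-8
        // std::string needs no special "UTF-8 string" type.
        std::string html = "<p>\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D</p>"; // "<p>שלום</p>" in UTF-8

        std::size_t open  = html.find("<p>");
        std::size_t close = html.find("</p>");

        std::cout << "tag found: " << (open != std::string::npos)
                  << ", payload bytes: " << (close - (open + 3)) << "\n"; // 1, 8
    }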