
Erik Wien wrote:
It's good to have one string class for library interoperability reasons. Otherwise library A would demand utf8_string, library B would demand utf16_string, and library C would demand utf32_string. No matter which one you choose, you'll pay a price. (This doesn't change even if you spell utf8_string as string<utf8>.)
That is true. Although strings of different encodings should be assignable to each other, libraries taking references to encoded_strings would still need some conversion to be done.
We have a similar problem today with basic_string<char> and basic_string<wchar_t>, and I think it could also be solved in a way that is very similar to what is done in the <string> header.
Just to clarify: string and wstring in the standard have a huge problem. There is no direct way to convert a string to a wstring, because there's simply no appropriate converting constructor.
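To make the missing constructor concrete, here is a minimal sketch. The static_assert checks that std::wstring cannot be constructed from std::string. The helper widen_ascii is a hypothetical name, not part of any library; it goes through the iterator-range constructor, which only copies code units one by one and performs no encoding conversion, so it is assumed to be meaningful for 7-bit ASCII input only.

```cpp
#include <string>
#include <type_traits>

// The standard provides no converting constructor from std::string
// to std::wstring, so this holds at compile time.
static_assert(!std::is_constructible<std::wstring, std::string>::value,
              "std::wstring has no constructor taking std::string");

// Hypothetical helper (not a standard function): widens each char to
// wchar_t via the iterator-range constructor. No encoding conversion
// happens here, so the result is only meaningful for 7-bit ASCII input.
inline std::wstring widen_ascii(const std::string& narrow) {
    return std::wstring(narrow.begin(), narrow.end());
}
```

Anything beyond ASCII needs a real transcoding step, which is exactly what the standard string classes do not offer through a constructor.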
If we typedef a unicode_string or something as encoded_string<utf16>, and promote that as THE string class, most users would use that as their primary string representation, and simply be oblivious to the underlying encoding. (A good thing.)
That would still make it easy for a user to use some different encoding without good reason.
Advanced users could (just like we do today with basic_string) choose to support multiple encodings by templating their own functions on encoding as well.
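A rough sketch of what that could look like, under stated assumptions: the encoding tags utf8 and utf16, the code_unit typedef, the member code_unit_count, and the function storage_bytes are all made up for illustration. The thread only names encoded_string and unicode_string, not their interface.

```cpp
#include <cstddef>
#include <string>

// Hypothetical encoding tags; each one names its code unit type.
struct utf8  { typedef char     code_unit; };
struct utf16 { typedef char16_t code_unit; };

// Hypothetical minimal encoded_string: it only stores code units.
template <class Encoding>
class encoded_string {
public:
    typedef typename Encoding::code_unit code_unit;

    explicit encoded_string(const std::basic_string<code_unit>& units)
        : units_(units) {}

    // Number of stored code units (not code points).
    std::size_t code_unit_count() const { return units_.size(); }

private:
    std::basic_string<code_unit> units_;
};

// The typedef promoted as "the" string class for ordinary users.
typedef encoded_string<utf16> unicode_string;

// A function templated on the encoding: it accepts unicode_string and
// any other encoded_string<E> without a conversion at the call site.
template <class Encoding>
std::size_t storage_bytes(const encoded_string<Encoding>& s) {
    return s.code_unit_count() * sizeof(typename Encoding::code_unit);
}
```

A user who sticks to unicode_string never sees the template parameter, while every function written this way is instantiated once per encoding it is used with, which is where the compile-time cost comes from.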
Oh well. I just hope nobody will ever write an implementation of XML parser + XML Schema + XPath + XQuery + SOAP + HTML renderer that is fully templated on the string type, unless the same person first makes gcc ten times faster.

- Volodya