
Dear list,

following the whole string encoding discussion, I would like to make some suggestions.
From the debate it is becoming clear that an instant switch from the encoding-agnostic/platform-native std::string to a UTF-8-encoded std::string is not likely to happen.
Then it was proposed that we create a utf8_t string type that would be used *together* (for all eternity) with the standard basic_string<>. While I see the advantages here, I have (as I already said elsewhere) the following problem with this approach: a name like utf8_t, u8string, string_utf8, etc. suggests, at least to me (and I've discussed this off the list with several people), that UTF-8 is still something special. IMO it also sends the message that it is OK to remain forever with the various encodings and std::string as it is today. We should *IMO* endorse the opposite.

My suggestion is the following: let us create a class called boost::string that will have all the properties that a string handling class in 2011+ A.D. should have, basically what std::string should have been. Then there are two alternatives:

a) When all the zillions of lines of legacy code in FORTRAN, COBOL, BASIC, LOGO, etc. :) are fixed / ported / abandoned, UTF-8 becomes a true standard for text encoding widely accepted by the whole IT industry and markets, and all the issues that prevent us from doing the transition now are resolved, this string becomes the standard, like many other things from Boost in the past, and replaces the current std::string.

b) As some (having much more insight into how the standardizing committee works than I do) have pointed out, it will never become a true standard. But with Boost's influence it at least becomes a de-facto standard for strings and is (hopefully) adopted by the libraries that currently feel the need to invent string classes themselves (with good reason).

I've also uploaded the file string_proposal.zip into the vault. It contains my (naive and inexpert) idea of what the interface for boost::string and the related classes could look like (it still needs some work and it is completely un-optimized, un-beautified, etc.).
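
To make the kind of interface under discussion a bit more concrete, here is a minimal sketch of a string class that is always UTF-8 inside and converts at its boundaries. It is not the code from string_proposal.zip: the namespace sketch, the class name and every member (from_utf32, from_utf8, utf8, to_utf32, length) are hypothetical and chosen only for illustration. Like the proposal, it does no real UTF-8 validation or error handling, and the platform-dependent std::wstring conversions are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical stand-in for the proposed boost::string.
// Invariant (by convention only, nothing is validated): bytes_ holds UTF-8.
class string {
public:
    string() = default;

    // Build from UTF-32 code points, an input whose encoding is unambiguous.
    static string from_utf32(const std::u32string& code_points) {
        string s;
        for (char32_t cp : code_points) s.append_code_point(cp);
        return s;
    }

    // Adopt bytes that the caller promises are already UTF-8 (not checked).
    static string from_utf8(std::string bytes) {
        string s;
        s.bytes_ = std::move(bytes);
        return s;
    }

    // Raw UTF-8 bytes, e.g. for passing to byte-oriented APIs.
    const std::string& utf8() const { return bytes_; }

    // Number of code points: every byte that is not a continuation byte
    // (10xxxxxx) starts a new code point.
    std::size_t length() const {
        std::size_t n = 0;
        for (unsigned char b : bytes_)
            if ((b & 0xC0) != 0x80) ++n;
        return n;
    }

    // Decode to UTF-32. Assumes well-formed input; malformed input yields
    // garbage code points but never reads past the end of the buffer.
    std::u32string to_utf32() const {
        std::u32string out;
        std::size_t i = 0;
        while (i < bytes_.size()) {
            unsigned char b = static_cast<unsigned char>(bytes_[i]);
            // Number of continuation bytes announced by the lead byte.
            std::size_t extra = b < 0x80 ? 0 : b < 0xE0 ? 1 : b < 0xF0 ? 2 : 3;
            // Payload bits of the lead byte: 7, 5, 4 or 3 of them.
            char32_t cp = extra == 0 ? b : (b & (0x3F >> extra));
            ++i;
            for (; extra > 0 && i < bytes_.size(); --extra, ++i)
                cp = (cp << 6) | (static_cast<unsigned char>(bytes_[i]) & 0x3F);
            out.push_back(cp);
        }
        return out;
    }

private:
    // Encode one code point as 1 to 4 UTF-8 bytes. Surrogates are not rejected.
    void append_code_point(char32_t cp) {
        assert(cp <= 0x10FFFF);
        if (cp < 0x80) {
            bytes_ += static_cast<char>(cp);
        } else if (cp < 0x800) {
            bytes_ += static_cast<char>(0xC0 | (cp >> 6));
            bytes_ += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            bytes_ += static_cast<char>(0xE0 | (cp >> 12));
            bytes_ += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            bytes_ += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            bytes_ += static_cast<char>(0xF0 | (cp >> 18));
            bytes_ += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            bytes_ += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            bytes_ += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }

    std::string bytes_;
};

}  // namespace sketch
```

With this sketch, sketch::string::from_utf32(U"a\u00E9") stores three bytes but reports a length() of two, and to_utf32() returns the original code points. Keeping the only unchecked entry point (from_utf8) explicit is one possible design choice; a real implementation would validate there and decide how to report errors.
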
/me ducks and covers :)

The idea is to let std::string/wstring stay platform-specifically encoded as they are now, but also to let boost::string handle the conversions as transparently as possible, so that if the standard adopts it, std::string would become a synonym for boost::string.

It is only partially implemented, and there are two examples showing how things could work, but the real UTF-8 validation, transcoding and error handling are of course missing. Remember that it is aimed at the design of the interfaces at this point.

If you have the time, have a look, and if my suggestions and/or the code look completely wrong, please feel free to slash them to pieces :), and if you feel up to it, propose something better.

If this, or something completely different and much better that comes out of it, is agreed upon, we could set up a dedicated git repository for Boost.String and maybe see whether the newly suggested collaborative development in per-Boost-component repositories really works. :)

If some of the people who are skilled with Unicode would join or lead the effort, it would be awesome.

Best,
Matus