
OK, if the long-term plan is:
1) design and implement boost::string using UTF-8 doing all the things like code-point iteration, character iteration, convenience stuff like starts-with, ends-with, replace, trim, etc., etc. with as much backward compatibility with std::string as possible without hindering progress
2) try really hard to push it to the standard
then I'm on board with that.
Some of those could be problematic (I've run across references implying that 0x20 isn't the universal word-separation character, so trim would at least need some extra parameters), but for the most part, I'd agree with it.
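To make the "extra parameters" point concrete, here is a minimal sketch of a trim that takes the set of trimmable code points as a caller-supplied predicate. The names `trim`, `is_space_fn`, and `default_is_space` are hypothetical and not part of any proposed boost::string. It works on already-decoded code points (`std::u32string`) to stay short, and the default predicate is a small, non-exhaustive assumption rather than the full Unicode White_Space set.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical predicate type: decides whether a code point is trimmable.
using is_space_fn = std::function<bool(char32_t)>;

// Assumed default: a few common white-space code points, NOT the complete
// Unicode White_Space property.
inline bool default_is_space(char32_t c) {
    return c == U' ' || c == U'\t' || c == U'\n' || c == U'\r'
        || c == 0x00A0   // NO-BREAK SPACE
        || c == 0x3000;  // IDEOGRAPHIC SPACE
}

// Strip leading and trailing code points for which is_space returns true.
inline std::u32string trim(const std::u32string& s,
                           const is_space_fn& is_space = default_is_space) {
    std::size_t first = 0;
    while (first < s.size() && is_space(s[first])) {
        ++first;
    }
    std::size_t last = s.size();
    while (last > first && is_space(s[last - 1])) {
        --last;
    }
    return s.substr(first, last - first);
}
```

A caller who disagrees with the default (for instance, one who does not want NO-BREAK SPACE stripped) passes their own predicate instead of the string type hard-coding a single answer.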
And also it is locale dependent. Unicode defines four kinds of text boundaries: grapheme, word, and sentence boundaries (http://www.unicode.org/reports/tr29/), plus line-break boundaries (http://www.unicode.org/reports/tr14/). Most of them are also locale dependent, as they require the use of dictionaries. So unless you want to carry locale information in the string, I don't think it is good to put these into the string itself.

Artyom