
On Thu, 27 Jan 2011 12:51, Nevin Liber <nevin@eviloverlord.com> wrote:
I'd like to see this broken up into three discussions:
1. Immutable strings.
Immutable or not, I don't see a direct use for modifying individual code-units (e.g. char, wchar_t) in a string; too many things can go wrong. Some kind of manipulation of code-points, yes, but not code-units. Even code-points are not the end of the story: a single grapheme may need multiple code-points when combining characters are used, and sometimes a single code-point, such as a ligature, represents several graphemes.
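To make the code-unit versus code-point distinction concrete, here is a minimal sketch in C++11. The helper names count_code_points and has_valid_utf8_structure are made up for this illustration and are not part of any proposed string class. The structural check only looks at lead and continuation bytes; it does not reject overlong forms or surrogates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code-points in UTF-8 text by counting every
// byte that is not a continuation byte (10xxxxxx).
// Assumes the input is well-formed UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Hypothetical helper: checks only that each lead byte is followed by the
// right number of continuation bytes. Not a full validator.
bool has_valid_utf8_structure(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        if (c < 0x80)                len = 1;  // ASCII
        else if ((c & 0xE0) == 0xC0) len = 2;  // 110xxxxx
        else if ((c & 0xF0) == 0xE0) len = 3;  // 1110xxxx
        else if ((c & 0xF8) == 0xF0) len = 4;  // 11110xxx
        else return false;                     // stray continuation byte
        if (i + len > s.size()) return false;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;
        }
        i += len;
    }
    return true;
}
```

With "e\xCC\x81" (the letter e followed by U+0301 COMBINING ACUTE ACCENT), size() reports 3 code-units and count_code_points reports 2 code-points, yet a reader sees one grapheme. Overwriting the single code-unit at index 1 with 'x' leaves a stray continuation byte behind, so has_valid_utf8_structure returns false. That is the kind of breakage I mean.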
2. utf8 strings.
Although I personally prefer UTF-8 encoded strings, the internal encoding is more or less irrelevant for an implementation based on a rope or a similar non-contiguous data structure. I believe this is what Dean Michael Berris is suggesting. I think this is especially true if direct access to individual code-units is prevented. For an implementation using a contiguous data structure and providing a constant-time c_str member function, I'd really want to see some option to set the internal encoding of strings. Performance-wise, it may be preferable to use UTF-16 internally when calling, for example, the Win32 API, if an extra copy can be avoided.
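As a rough illustration of the extra copy, here is a portable C++11 sketch of what a UTF-8 backed string has to do before it can hand text to a UTF-16 interface. The function name utf8_to_utf16 is hypothetical; on Windows one would more likely call MultiByteToWideChar. The sketch rejects structurally broken input but does not check for overlong encodings.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: transcodes UTF-8 into a freshly allocated UTF-16
// buffer. The allocation and copy happen on every call; a string that
// stores UTF-16 internally would not need them.
std::u16string utf8_to_utf16(const std::string& utf8) {
    std::u16string out;
    out.reserve(utf8.size());  // upper bound on the number of UTF-16 units
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        char32_t cp;
        std::size_t len;
        if (c < 0x80)                { cp = c;        len = 1; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; len = 2; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; len = 3; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > utf8.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(utf8[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code-points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

If the string class let the user pick UTF-16 as its internal encoding, c_str could return the existing buffer directly and this whole function would drop out of the call path.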
3. Unrealistic pipe dream about replacing std::string.
Replacing std::string will never happen. Deprecating std::string in favor of std::text/std::unicode/std::xstring may happen in the long run.

Regards,
Anders Dalvander
--
WWFSMD?