
"Erik Wien" <wien@start.no> writes:
"Rogier van Dalen" <rogiervd@gmail.com> wrote in message
I hadn't yet looked at it this way, but you are right from a theoretical point of view at least. To get more to practical matters, what do you think this should do:
unicode::string s = ...; s += 0xDC01; // An isolated surrogate, which is nonsense
? Should it throw, or convert the isolated surrogate to U+FFFD REPLACEMENT CHARACTER (Unicode standard 4 Section 2.7), or something else? And what should the member function with the opposite behaviour be called?
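To make the two options in the question concrete, here is a minimal sketch of a check that either throws or substitutes U+FFFD when handed a surrogate. The names `surrogate_policy`, `is_surrogate` and `check_scalar` are hypothetical and are not taken from the library under discussion; the sketch only looks at the surrogate range and ignores other invalid values.

```cpp
#include <stdexcept>

// Hypothetical policy switch; not part of any proposed interface.
enum class surrogate_policy { throw_error, replace };

// U+FFFD REPLACEMENT CHARACTER.
constexpr char32_t replacement_character = 0xFFFD;

// True for any value in the surrogate range U+D800..U+DFFF.
inline bool is_surrogate(char32_t c) {
    return c >= 0xD800 && c <= 0xDFFF;
}

// Returns the value that should be stored for c.
// Under throw_error a surrogate raises std::invalid_argument;
// under replace it is turned into U+FFFD.
inline char32_t check_scalar(char32_t c, surrogate_policy policy) {
    if (!is_surrogate(c)) {
        return c;
    }
    if (policy == surrogate_policy::throw_error) {
        throw std::invalid_argument("isolated surrogate");
    }
    return replacement_character;
}
```

With this shape, the "opposite behaviour" asked about would just be the other policy value rather than a second member function, which is one possible answer and not necessarily the best one.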
The best solution would be never to append single code units, only code points. The += operator would determine how many code units are required for the given code point.
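As a rough illustration of an append that works out the number of code units itself, the sketch below encodes one code point as UTF-8 into a `std::string`. The function name `append_code_point` is hypothetical, and the sketch assumes the caller has already rejected surrogates and values above U+10FFFF.

```cpp
#include <string>

// Hypothetical helper: appends one code point to a UTF-8 byte string,
// emitting one to four code units depending on the value.
// Assumes cp is a valid scalar value (no surrogates, cp <= 0x10FFFF).
inline void append_code_point(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        // One code unit: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two code units: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Three code units: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Four code units: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```

A UTF-16 string would do the same thing with one or two 16-bit code units, so an isolated surrogate could never be appended through this interface.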
Is this going to be illegal for most fs, then?

    std::copy(
        std::istream_iterator<char>(f),
        std::istream_iterator<char>(),
        std::back_inserter(my_utf8_string));

I think it pretty much has to work.

-- 
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com
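For reference, the snippet above only compiles if the target type exposes a `value_type` and a `push_back` that accepts a single `char`, that is, a single code unit. The following is a minimal sketch of such a type; `utf8_bytes` and `read_units` are hypothetical names standing in for `my_utf8_string` and the surrounding code, and the sketch performs no validation of the incoming bytes.

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical stand-in for my_utf8_string: it accepts raw code units
// one at a time, which is what std::back_inserter requires.
class utf8_bytes {
public:
    typedef char value_type;  // needed by std::back_insert_iterator

    // Appends a single code unit without checking well-formedness.
    void push_back(char unit) { units_.push_back(unit); }

    const std::string& units() const { return units_; }

private:
    std::string units_;
};

// Copies every char the stream yields into a utf8_bytes object.
inline utf8_bytes read_units(std::istream& in) {
    utf8_bytes result;
    std::copy(std::istream_iterator<char>(in),
              std::istream_iterator<char>(),
              std::back_inserter(result));
    return result;
}
```

Note that `std::istream_iterator<char>` reads with `operator>>`, so it skips whitespace unless `std::noskipws` is set on the stream. A type that accepts only code points would need either a different insert iterator or a decoding step between the stream and the string.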