
On Mon, 24 Jan 2011 19:28:50 +0800 Dean Michael Berris <mikhailberis@gmail.com> wrote:
On Mon, Jan 24, 2011 at 3:04 PM, Patrick Horgan <phorgan1@gmail.com> wrote:
[...] I'm with you here, but to be fair to Chad, you could add to that list a string of utf-8 encoded characters. If a string contains things with a particular encoding there's value in being able to keep track of whether it's validly encoded. It may very well be that a std::string is part of another type, or that there's some encoding wrapper that lets you see it as utf-8 in the same way an external iterator lets you look at chars.
Sure, however I personally don't see the value of making the encoding an intrinsic property of a string object. [...]
Then I think we have different purposes, and I'll absent myself from this part of the discussion after this reply. Before I go, I'll note in passing that I've started on the modifications to the UTF types, and I found that it made sense to omit many of the mutating functions from utf8_t and utf16_t, at least the ones that operate on anything other than the end of the string.
Are you saying that you try it as utf-8, it doesn't decode and then you try utf-32 to see if it works? Cause the same string couldn't be both. Or are you saying that the string has some underlying encoding but something lets it be viewed in other encodings, for example it might actually be EUC, but external iterators let you view it as utf-8 or utf-16 or utf-32 interpreting on the fly?
I'm saying the string could contain whatever it contains (which is largely of little consequence) but that you can give a "view" of the string as UTF-8 if it's valid UTF-8, or UTF-32 if it's valid UTF-32. [...]
For what it's worth, that's the basic concept that I've adopted for the utf*_t modifications. The utf*_t gives only a code-point iterator (you can also get a char/char16_t/char32_t iterator from the type returned by the encoded() function). I plan to write a separate character iterator that will accept code-points and return actual Unicode characters.
--
Chad Nelson
Oak Circle Software, Inc.
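
For readers following along, the "view" idea discussed above (a string holds whatever bytes it holds, and a read-only wrapper exposes code points only if those bytes are valid UTF-8) might be sketched roughly as below. This is not the utf8_t implementation under discussion: the names utf8_view, try_make and decode_one are hypothetical and chosen only for illustration. Only encoded() and the notion of a code-point iterator are taken from the message itself.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode one UTF-8 sequence starting at s[i] (requires i < s.size()).
// On success, stores the code point in `out`, advances i, and returns true.
// Rejects truncated input, bad continuation bytes, overlong forms,
// surrogates (U+D800..U+DFFF) and values above U+10FFFF.
inline bool decode_one(const std::string& s, std::size_t& i, char32_t& out) {
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    std::size_t len;
    char32_t cp, min;
    if (b0 < 0x80)                { len = 1; cp = b0;        min = 0; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
    else return false;                       // stray continuation or invalid lead byte
    if (s.size() - i < len) return false;    // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) return false;
        cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
    i += len;
    out = cp;
    return true;
}

// Hypothetical read-only view: it neither copies the string nor tags it with
// an encoding. It only refers to bytes that passed validation in try_make.
class utf8_view {
public:
    // Minimal forward iterator over code points (enough for range-for).
    class iterator {
    public:
        iterator(const std::string* s, std::size_t pos) : s_(s), pos_(pos) {}
        char32_t operator*() const {
            std::size_t i = pos_;
            char32_t cp = 0;
            decode_one(*s_, i, cp);          // cannot fail: bytes were validated
            return cp;
        }
        iterator& operator++() {
            char32_t cp = 0;
            decode_one(*s_, pos_, cp);       // advance by one whole sequence
            return *this;
        }
        bool operator==(const iterator& o) const { return pos_ == o.pos_; }
        bool operator!=(const iterator& o) const { return pos_ != o.pos_; }
    private:
        const std::string* s_;
        std::size_t pos_;
    };

    utf8_view() : s_(nullptr) {}

    // Binds `out` to `bytes` and returns true only if `bytes` is valid UTF-8.
    // The caller must keep `bytes` alive for as long as the view is used.
    static bool try_make(const std::string& bytes, utf8_view& out) {
        std::size_t i = 0;
        char32_t cp = 0;
        while (i < bytes.size())
            if (!decode_one(bytes, i, cp)) return false;
        out.s_ = &bytes;
        return true;
    }

    // Only meaningful after a successful try_make.
    iterator begin() const { return iterator(s_, 0); }
    iterator end() const { return iterator(s_, s_->size()); }
    const std::string& encoded() const { return *s_; }  // raw char access

private:
    const std::string* s_;
};
```

Because the view exposes no mutating operations, the variable-width problem with mid-string edits never arises in this sketch. A UTF-16 or UTF-32 view could follow the same pattern over the same underlying bytes, succeeding only when the bytes are valid in that encoding.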