
Hi Chad,

Like Mathias I'm not very enthusiastic about the approach that you're taking here - but there is plenty of space for different approaches, so if you want to do it like this you are welcome to do so. My own approach has been to:

- Store text in sequence-of-byte containers of whatever sort seem appropriate, i.e. std::string, std::vector<char>, raw memory etc.
- Use iterator adaptors to access that data as UTF-8 when appropriate.
- Use std::algorithms like find(begin,end,what) rather than std::string members.

This works for me, and I recommend it.

So I have one comment on this exchange:

Chad Nelson wrote:
>> There is no need for any reasoning: look at the code of your code point iterator. It uses a pointer and indexes, and is therefore not a generic iterator adaptor.
>
> It wasn't meant to be generic. It was meant to be exactly what it is: an iterator specific to the UTF type where it's defined. For that purpose, it's designed exactly as it should be, IMHO.
>
>> Iterating through code points is fully generic and should work for any forward iterator or bidirectional iterator, not just a pointer.
>
> I could make it fully generic, but it wouldn't be nearly as efficient that way. I chose to do the extra work to make it efficient.

I have to challenge your efficiency comment. I have UTF-8 encoding and decoding that works with generic iterators, including pointers, and I have no efficiency issues resulting from its genericity. In fact I spent some time carefully optimising it and I believe that when used with pointers it is as good as I could get by writing it in assembler.

Regards,

Phil.