Re: [boost] [UTF String] UTF String library 1.5 ready for perusal

10 Feb 2011

Hi Chad,

Like Mathias I'm not very enthusiastic about the approach that you're 
taking here - but there is plenty of space for different approaches, so 
if you want to do it like this you are welcome to do so.

My own approach has been to:
- Store text in sequence-of-byte containers of whatever sort seem 
appropriate, i.e. std::string, std::vector<char>, raw memory etc.
- Use iterator adaptors to access that data as UTF-8 when appropriate.
- Use standard algorithms such as std::find(begin, end, what) rather than
std::string member functions.

This works for me, and I recommend it.
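
To make the three points above concrete, here is a minimal sketch of the idea. It is not the actual code referred to in this message: utf8_iterator, as_utf8 and code_point_index are hypothetical names invented for illustration, and the decoder assumes the bytes are already valid UTF-8.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical adaptor: presents a sequence of UTF-8 bytes as a sequence
// of char32_t code points. Assumes valid UTF-8; it does no error recovery.
template <typename Iter>
class utf8_iterator {
public:
    // Values are returned by value, so this is tagged as an input iterator,
    // although it can be traversed more than once if Iter can.
    using iterator_category = std::input_iterator_tag;
    using value_type        = char32_t;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const char32_t*;
    using reference         = char32_t;

    utf8_iterator() = default;
    explicit utf8_iterator(Iter pos) : pos_(pos) {}

    // Decode the code point that starts at the current byte.
    char32_t operator*() const {
        Iter p = pos_;
        const unsigned char lead = static_cast<unsigned char>(*p);
        assert((lead & 0xC0) != 0x80);  // must not start on a continuation byte
        const int extra = trailing(lead);
        char32_t cp = (extra == 0) ? lead : (lead & (0x3F >> extra));
        for (int i = 0; i < extra; ++i) {
            ++p;
            cp = (cp << 6) | (static_cast<unsigned char>(*p) & 0x3F);
        }
        return cp;
    }

    // Step over the whole encoded sequence, not just one byte.
    utf8_iterator& operator++() {
        std::advance(pos_, 1 + trailing(static_cast<unsigned char>(*pos_)));
        return *this;
    }
    utf8_iterator operator++(int) { utf8_iterator t = *this; ++*this; return t; }

    friend bool operator==(const utf8_iterator& a, const utf8_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const utf8_iterator& a, const utf8_iterator& b) {
        return !(a == b);
    }

    Iter base() const { return pos_; }  // underlying byte position

private:
    // Number of continuation bytes that follow the given lead byte.
    static int trailing(unsigned char lead) {
        if (lead < 0x80) return 0;
        if (lead < 0xE0) return 1;
        if (lead < 0xF0) return 2;
        return 3;
    }
    Iter pos_{};
};

template <typename Iter>
utf8_iterator<Iter> as_utf8(Iter it) { return utf8_iterator<Iter>(it); }

// Position, counted in code points, of the first occurrence of `what`;
// equals the total number of code points if `what` is absent.
template <typename Container>
std::size_t code_point_index(const Container& bytes, char32_t what) {
    auto first = as_utf8(bytes.begin());
    auto last  = as_utf8(bytes.end());
    return static_cast<std::size_t>(
        std::distance(first, std::find(first, last, what)));
}
```

The same std::find call works whether the bytes live in a std::string, a std::vector<char> or raw memory, because as_utf8 only needs a pair of iterators (a pair of const char* pointers will do).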

So I have one comment on this exchange:

Chad Nelson wrote:
...
...
>> There is no need for any reasoning: look at the code of your code
>> point iterator. It uses a pointer and indexes, and is therefore not a
>> generic iterator adaptor.
> It wasn't meant to be generic. It was meant to be exactly what it is:
> an iterator specific to the UTF type where it's defined. For that
> purpose, it's designed exactly as it should be, IMHO.
> ...
>> Iterating through code points is fully generic and should work for
>> any forward iterator or bidirectional iterator, not just a pointer.
> I could make it fully generic, but it wouldn't be nearly as efficient
> that way. I chose to do the extra work to make it efficient.

I have to challenge your efficiency comment.  I have UTF-8 encoding and 
decoding that works with generic iterators, including pointers, and I 
have no efficiency issues resulting from its genericity.  In fact I 
spent some time carefully optimising it and I believe that when used 
with pointers it is as good as I could get by writing it in assembler.
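
As an illustration of what "works with generic iterators, including pointers" can look like on the encoding side, here is a minimal sketch. It is not the optimised code mentioned above: encode_utf8 is a hypothetical name, and the function assumes its argument is a valid Unicode scalar value.

```cpp
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical helper: writes the UTF-8 encoding of `cp` through any
// output iterator and returns the advanced iterator.
// Assumes cp is a valid scalar value; surrogates are not rejected.
template <typename OutIt>
OutIt encode_utf8(char32_t cp, OutIt out) {
    assert(cp <= 0x10FFFF);  // debug-only range check
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        *out++ = static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        *out++ = static_cast<char>(0xC0 | (cp >> 6));
        *out++ = static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        *out++ = static_cast<char>(0xE0 | (cp >> 12));
        *out++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        *out++ = static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        *out++ = static_cast<char>(0xF0 | (cp >> 18));
        *out++ = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        *out++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        *out++ = static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Called with a char* the template instantiates to plain pointer stores, and called with std::back_inserter(some_string) the same source appends to a container. The sketch only shows that one body of code serves both cases; it makes no measured performance claim.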

Regards,  Phil.

Phil Endecott