Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter

14 Aug 2011

Soares Chen Ruo Fei wrote:
>> ...
>> with non-Unicode CJK encodings like Shift-JIS or GBK there is no
>> way to go backward
>> ...
> Ah, I see, so that's quite nasty, but it can actually still be done
> at the cost of efficiency. Since the iterator already has the begin
> and end boundary iterators, it can simply re-scan from the beginning
> of the string. Although doing so is roughly O(N^2), it shouldn't have
> a significant impact, as developers rarely use these multi-byte
> encodings and even more rarely use the reverse decoding function.

As a general point, I believe it's a bad idea to hide a surprise like 
O(N^2) instead of O(N) complexity in a "rare" case.  Doing so means 
that users will implement something that seems to work, and then get 
bitten later when it doesn't work in the field.  (For example, the 
first time that a customer in Japan tries to process a 1 MB file and it 
takes a million times longer than expected.)
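To make the cost concrete, here is a minimal sketch of the decrement-by-rescanning idea, assuming a simplified Shift-JIS-like rule for character length (a real decoder has more cases). The names `char_length`, `prev_by_rescan` and `reverse_traversal_steps` are hypothetical and not part of Boost.Ustr. Each backward step re-walks the string from the start, so walking the whole string backward performs 1 + 2 + ... + N forward steps.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified Shift-JIS-style rule (an assumption for illustration, not a full
// decoder): bytes 0x81-0x9F and 0xE0-0xFC start a two-byte character and every
// other byte is a one-byte character. Trail bytes overlap both the single-byte
// and lead-byte ranges, so a character start cannot be found by looking back.
inline std::size_t char_length(const std::string& s, std::size_t i) {
    unsigned char b = static_cast<unsigned char>(s[i]);
    bool lead = (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
    return (lead && i + 1 < s.size()) ? 2 : 1;
}

// Hypothetical decrement-by-rescanning: walk forward from the start of the
// string to find the character that ends at `pos`. Each call inspects O(pos)
// characters; `steps` accumulates how many.
inline std::size_t prev_by_rescan(const std::string& s, std::size_t pos,
                                  std::size_t& steps) {
    assert(pos > 0 && pos <= s.size());
    std::size_t i = 0, last = 0;
    while (i < pos) {
        last = i;
        i += char_length(s, i);
        ++steps;
    }
    return last;
}

// Total forward steps needed to visit every character in reverse order.
inline std::size_t reverse_traversal_steps(const std::string& s) {
    std::size_t steps = 0;
    for (std::size_t pos = s.size(); pos > 0;)
        pos = prev_by_rescan(s, pos, steps);
    return steps;
}
```

For a string of 1,000 two-byte characters, a full backward pass in this sketch costs 500,500 inner-loop steps against 1,000 for a forward pass, and doubling the length roughly quadruples the cost.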

It would be better to not provide the inefficient case at all.  Compare 
with how std::list doesn't provide random access, even though it could 
do so in O(N).  Looking at your character set iterator, it seems to me 
that you could have a forward-only iterator and a bidirectional 
iterator for UTF, but only the former for these other encodings.  Not 
storing the begin iterator when only forward iteration is needed also 
saves space.
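A minimal sketch of that split, assuming two hypothetical encoding policies (`utf8_encoding`, `shift_jis_encoding`) and cursor classes that are illustrations only, not Boost.Ustr's actual API. The forward cursor stores just the current position and the end; the bidirectional cursor also stores `begin`, and a `static_assert` rejects it for an encoding that cannot step backward in O(1), much as `std::list` simply has no `operator[]`. To keep it short, `current()` returns the raw bytes of a character rather than a decoded code point.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical encoding policies for illustration; not Boost.Ustr's API.
struct utf8_encoding {
    // UTF-8 is self-synchronizing: continuation bytes (10xxxxxx) never look
    // like the first byte of a character, so stepping back is O(1).
    static constexpr bool supports_backward = true;
    static std::size_t length(unsigned char lead) {  // assumes valid UTF-8
        return lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
    }
    static bool is_continuation(unsigned char b) { return (b & 0xC0) == 0x80; }
};

struct shift_jis_encoding {
    // Trail bytes overlap the single-byte and lead-byte ranges, so the start
    // of the previous character cannot be found by looking backward.
    static constexpr bool supports_backward = false;
    static std::size_t length(unsigned char lead) {  // simplified lead-byte rule
        return ((lead >= 0x81 && lead <= 0x9F) || (lead >= 0xE0 && lead <= 0xFC)) ? 2 : 1;
    }
};

// Position just past the character starting at pos, clamped to end.
template <typename Encoding>
const char* next_char(const char* pos, const char* end) {
    std::size_t n = Encoding::length(static_cast<unsigned char>(*pos));
    return pos + std::min<std::ptrdiff_t>(static_cast<std::ptrdiff_t>(n), end - pos);
}

// Forward-only cursor: stores just the current position and the end.
template <typename Encoding>
class forward_cursor {
public:
    forward_cursor(const char* pos, const char* end) : pos_(pos), end_(end) {}
    bool at_end() const { return pos_ == end_; }
    // Raw bytes of the current character (decoding omitted in this sketch).
    std::string current() const { return std::string(pos_, next_char<Encoding>(pos_, end_)); }
    forward_cursor& operator++() { pos_ = next_char<Encoding>(pos_, end_); return *this; }
private:
    const char* pos_;
    const char* end_;
};

// Bidirectional cursor: also stores begin so operator-- knows where to stop.
// Instantiating it for an encoding without O(1) backward steps fails to compile.
template <typename Encoding>
class bidirectional_cursor {
    static_assert(Encoding::supports_backward,
                  "backward iteration is not offered for this encoding");
public:
    bidirectional_cursor(const char* begin, const char* pos, const char* end)
        : begin_(begin), pos_(pos), end_(end) {}
    bool at_begin() const { return pos_ == begin_; }
    bool at_end() const { return pos_ == end_; }
    std::string current() const { return std::string(pos_, next_char<Encoding>(pos_, end_)); }
    bidirectional_cursor& operator++() { pos_ = next_char<Encoding>(pos_, end_); return *this; }
    bidirectional_cursor& operator--() {
        assert(pos_ != begin_);
        // Step back over continuation bytes until we reach a lead byte.
        do { --pos_; } while (pos_ != begin_ &&
                              Encoding::is_continuation(static_cast<unsigned char>(*pos_)));
        return *this;
    }
private:
    const char* begin_;
    const char* pos_;
    const char* end_;
};
```

Writing `bidirectional_cursor<shift_jis_encoding>` is rejected at compile time, so the O(N^2) path never exists to be stumbled on, and the forward cursor is one pointer smaller.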

Regards,  Phil.

Phil Endecott