
Soares Chen Ruo Fei wrote:
with non-Unicode CJK encodings like Shift-JIS or GBK there is no way to go backward
Ahh, I see, so that's quite nasty, but it can still be done at the cost of efficiency. Since the iterator already holds the begin and end boundary iterators, it can simply re-iterate from the beginning of the string. Although doing so makes a full reverse traversal roughly O(N^2), it shouldn't have a significant impact, as developers rarely use these multi-byte encodings and use the reverse decoding function even more rarely.
As a general point, I believe it's a bad idea to hide a surprise like O(N^2) instead of O(N) complexity in a "rare" case. Doing so means that users will implement something that seems to work, and then get bitten later when it doesn't work in the field. (For example, the first time that a customer in Japan tries to process a 1 MB file and it takes a million times longer than expected.)

It would be better to not provide the inefficient case at all. Compare with how std::list doesn't provide random access, even though it could do so in O(N).

Looking at your character set iterator, it seems to me that you could have a forward-only iterator and a bidirectional iterator for UTF, but only the former for these other encodings. Not storing the begin iterator when only forward iteration is needed also saves space.

Regards, Phil.
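
To make the trade-off concrete, here is a minimal sketch, not the actual character set iterator under discussion. All names (utf8_tag, shift_jis_tag, can_step_back, utf8_prev, sjis_next, sjis_prev_by_rescan) are hypothetical, and the Shift-JIS lead-byte ranges are the usual 0x81-0x9F and 0xE0-0xFC, with no validation of malformed input. It shows why a UTF-8 step back is cheap, why a Shift-JIS step back needs a rescan from the start, and how an encoding tag could advertise only the iterator category it supports efficiently.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <type_traits>

// Hypothetical encoding tags: each one advertises only the iterator
// category that it can support without hidden extra cost.
struct utf8_tag      { using iterator_category = std::bidirectional_iterator_tag; };
struct shift_jis_tag { using iterator_category = std::forward_iterator_tag; };

// True when the encoding's iterator may be decremented.
template <class Encoding>
constexpr bool can_step_back() {
    return std::is_base_of<std::bidirectional_iterator_tag,
                           typename Encoding::iterator_category>::value;
}

// UTF-8: continuation bytes always look like 10xxxxxx, so stepping back
// only has to skip them. At most three extra bytes are examined.
inline std::size_t utf8_prev(const std::string& s, std::size_t pos) {
    assert(pos > 0 && pos <= s.size());
    do {
        --pos;
    } while (pos > 0 && (static_cast<unsigned char>(s[pos]) & 0xC0) == 0x80);
    return pos;
}

// Shift-JIS: a lead byte announces a two-byte character.
inline bool sjis_is_lead(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Forward decoding is unambiguous: look at the current byte only.
inline std::size_t sjis_next(const std::string& s, std::size_t pos) {
    assert(pos < s.size());
    const bool two_bytes =
        sjis_is_lead(static_cast<unsigned char>(s[pos])) && pos + 1 < s.size();
    return pos + (two_bytes ? 2 : 1);
}

// Backward is ambiguous: a trail byte such as 0x41 is indistinguishable
// from ASCII 'A' when seen in isolation. The only safe way back is to
// rescan from the beginning, which costs O(N) per step and therefore
// O(N^2) for a full reverse traversal.
inline std::size_t sjis_prev_by_rescan(const std::string& s, std::size_t pos) {
    assert(pos > 0 && pos <= s.size());
    std::size_t cur = 0;
    std::size_t prev = 0;
    while (cur < pos) {
        prev = cur;
        cur = sjis_next(s, cur);
    }
    return prev;
}
```

With tags like these, generic code can check can_step_back<shift_jis_tag>() at compile time and refuse to instantiate a reverse traversal, instead of silently falling back to something like sjis_prev_by_rescan.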