
On Mon, Jul 20, 2009 at 18:42, Eric Niebler<eric@boostpro.com> wrote:
Mathias Gaunard wrote:
Rogier van Dalen wrote:
Non-checking iterator adaptors can be faster. That would be useful when you know that a string is safe, for example, in a UTF string type that has a validity invariant.
I suppose that type of string should probably use optimized iterators that exploit the fact that it is stored in contiguous, properly aligned memory anyway, so it will need special code.
There are 2 orthogonal issues here: 1) whether a sequence is stored in contiguous memory, and 2) whether it is already guaranteed to be well-formed UTF-XX.
I think where confusion could arise is this: even though these issues are orthogonal, if it were just about optimising, it might be acceptable to write code for a specific special case.

However, one sensible policy is "repairing" an invalid string: interpreting overlong sequences, and replacing uninterpretable code units by U+FFFD "Replacement character". This is similar to Cory's "ReplaceCheckFailures". Such a policy is necessary if a program needs to read a corrupted UTF file and make the most of it.

On the other hand, the current behaviour of throwing an error at overlong or invalid sequences is also sensible. The one-to-one relation between encoded and decoded form makes it the safest choice. It can guarantee there are no NUL characters in the decoded form that were not in the encoded form.

I think both of these policies (and possibly others that I haven't thought of) will need to be supported. Checking policies are therefore not just an optimisation.
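
To make the difference between the two policies concrete, here is a rough C++ sketch of a UTF-8 decoder parameterised on an error policy. All names in it (`strict_policy`, `repair_policy`, `decode_utf8`, `accept_overlong`, `on_error`) are made up for illustration and are not taken from the library under discussion or from Cory's code; it is only one possible reading of "throw" versus "repair".

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical policy: any overlong or invalid sequence is an error.
struct strict_policy {
    static constexpr bool accept_overlong = false;
    static char32_t on_error() { throw std::runtime_error("ill-formed UTF-8"); }
};

// Hypothetical policy: interpret overlong sequences, and replace
// anything uninterpretable by U+FFFD.
struct repair_policy {
    static constexpr bool accept_overlong = true;
    static char32_t on_error() { return 0xFFFD; }
};

template <class ErrorPolicy>
std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t len;   // sequence length announced by the lead byte
        char32_t cp;       // code point accumulated so far
        char32_t min_cp;   // smallest code point that needs `len` bytes
        if (b0 < 0x80)                { len = 1; cp = b0;        min_cp = 0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min_cp = 0x10000; }
        else {
            // Stray continuation byte or invalid lead byte.
            out.push_back(ErrorPolicy::on_error());
            ++i;
            continue;
        }

        // Consume continuation bytes, stopping early at the first bad one.
        std::size_t n = 1;
        for (; n < len && i + n < in.size(); ++n) {
            unsigned char b = static_cast<unsigned char>(in[i + n]);
            if ((b & 0xC0) != 0x80) break;
            cp = (cp << 6) | (b & 0x3F);
        }

        bool complete = (n == len);
        bool overlong = cp < min_cp;
        bool in_range = cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
        bool ok = complete && in_range &&
                  (!overlong || ErrorPolicy::accept_overlong);

        out.push_back(ok ? cp : ErrorPolicy::on_error());
        i += n;  // n >= 1, so the loop always makes progress
    }
    return out;
}
```

With this sketch, the bytes `C0 80` decode to U+0000 under `repair_policy` but throw under `strict_policy`, which is exactly the "no NULs that were not in the encoded form" guarantee at stake.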
Conflating the two will lead to bad design. I agree with Rogier.
That is great...
The routines should make checking a policy. Iterators should be non-checked. Checked iterators can be adaptors.
... but I'm not sure I understand what you mean. I read this as "you can build a checking iterator adaptor on top of a non-checking iterator adaptor". I don't think this is true for decoding UTF. I suspect, therefore, that I misunderstand something.

Cheers,
Rogier