Re: [boost] [tokenizer] is exception unaware

On Jul 18, 2006, at 6:38 AM, boost-request@lists.boost.org wrote:
Sebastian Redl said: (by the date of Tue, 18 Jul 2006 14:06:59 +0200)
Janek Kozicki wrote:
boost::tokenizer<>::iterator pos = tok.begin(); - is this (IMO bad) design intentional?
Boost.Tokenizer is written so as to model an iterator. Have you ever seen an iterator throw an exception on reaching the end of a sequence?
The rationale for container iterators is that iterating over containers should not be slowed down by exceptions. Those iterators don't even check for the iterator's validity! Trying to access beyond the container is not detected by the program; it simply causes a segmentation fault. That's fine for ordinary containers.
The requirements on iterators don't come from containers - they come from algorithms. It is not valid to dereference an end() iterator of any kind, and introducing a check for end() on each dereference would be redundant: the algorithm's loop condition has already performed exactly that comparison (see the loop sketch below).
So yes, the design is probably intentional.
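
To make the redundancy concrete, here is a minimal sketch (illustrative only, not the standard library's actual code) of the shape such an algorithm loop already has:

    // The loop condition is the one and only end() check an algorithm needs;
    // a throwing check inside operator* would repeat the same comparison.
    template <typename I,  // I models InputIterator
              typename F>  // F models UnaryFunction
    F for_each_sketch(I first, I last, F f) {
        while (first != last) {   // iterator known dereferenceable past this point
            f(*first);            // so this dereference needs no further check
            ++first;
        }
        return f;
    }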
However, tokenizer::iterator is neither a typical container nor a typical iterator. It is even unusual in that it does check the iterator's validity: trying to access beyond the end causes assert(valid_) to fail (roughly the pattern sketched below).
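
That check is roughly the following pattern - a minimal sketch of the idea only, not Boost.Tokenizer's actual source; the class and member names are illustrative:

    #include <cassert>

    // Sketch: a debug-mode validity flag checked on dereference. With NDEBUG
    // defined the assert compiles away, so correct code pays nothing for it.
    template <typename Token>
    class asserting_token_iterator {
        bool  valid_;  // set false once the iterator has consumed the last token
        Token token_;  // the current token, meaningful only while valid_ is true
    public:
        asserting_token_iterator() : valid_(false), token_() {}

        const Token& operator*() const {
            assert(valid_);  // past-the-end access aborts here instead of
                             // silently reading stale data
            return token_;
        }
        // construction from a sequence, increment, and comparison omitted
    };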
tokenizer::iterator is not a container at all; it is an iterator adaptor. Some interesting bits about iterator adaptors: any context-free grammar can be expressed as an iterator adaptor - or, put a different way, a context-free grammar satisfies the requirements of an iterator adaptor. Tokenizer is just one example of this. If the base iterator satisfies InputIterator, it can be adapted as an InputIterator; if the base iterator satisfies ForwardIterator, it can be adapted as a ForwardIterator (Bidirectional and RandomAccess iterators can also be adapted to Forward). Any InputIterator adaptor can also be adapted to an OutputIterator adaptor - hence any context-free grammar can be expressed as either (you can tokenize on the way in or on the way out). Consider expressing it as:

    token_iter = std::copy(tok.begin(), tok.end(), std::back_inserter(my_vector));

Now you can see that an additional check for end() on each step of the tokenizer is redundant. You might also consider writing a bounded algorithm:

    #include <utility>  // std::pair, std::make_pair

    template <typename I,  // I models InputIterator
              typename O>  // O models OutputIterator
    std::pair<I, O> copy_bounded(I in_first, I in_limit, O out_first, O out_limit)
    {
        while (in_first != in_limit && out_first != out_limit) {
            *out_first = *in_first;
            ++in_first;
            ++out_first;
        }
        return std::make_pair(in_first, out_first);
    }

A collection of such bounded functions would be a nice contribution to Boost (or ASL).

You could also write a bounds-checking iterator adaptor which throws on dereferencing end() [or on incrementing past end()] - a sketch of such an adaptor follows below. I would be careful about using one with the standard algorithms - I'd have to survey the library to come up with a list, but some algorithms are memory adaptive and not likely to be expecting a failure at these points. It should be safe, though, to bounds-check output iterators: they could throw on the assignment through the dereference, which would be equivalent to throwing on the dereference itself.
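A minimal sketch of such a throwing adaptor for input iterators - the name checked_iterator and the choice of std::out_of_range are illustrative, not taken from any existing library:

    #include <iterator>   // std::iterator_traits
    #include <stdexcept>  // std::out_of_range

    // Hypothetical adaptor: wraps an InputIterator and throws instead of
    // invoking undefined behavior when used at its end position.
    template <typename I>  // I models InputIterator
    class checked_iterator {
        I it_;
        I end_;
    public:
        checked_iterator(I it, I end) : it_(it), end_(end) {}

        typename std::iterator_traits<I>::reference operator*() const {
            if (it_ == end_) throw std::out_of_range("dereference of end()");
            return *it_;
        }
        checked_iterator& operator++() {
            if (it_ == end_) throw std::out_of_range("increment past end()");
            ++it_;
            return *this;
        }
        friend bool operator==(const checked_iterator& a, const checked_iterator& b)
            { return a.it_ == b.it_; }
        friend bool operator!=(const checked_iterator& a, const checked_iterator& b)
            { return a.it_ != b.it_; }
    };

And a hypothetical use of copy_bounded above with the tokenizer, stopping at whichever limit is reached first and never dereferencing either iterator at its end:

    #include <boost/tokenizer.hpp>
    #include <string>
    #include <vector>

    int main() {
        std::string s("a,b,c,d");
        boost::tokenizer<> tok(s);
        std::vector<std::string> v(2);  // room for only two tokens
        copy_bounded(tok.begin(), tok.end(), v.begin(), v.end());
        // v now holds "a" and "b"; no per-dereference end() check was needed
    }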
Microsoft provides bounds-checking iterators in their library which are intended to improve security (a very misguided notion, but I won't comment further on that topic) - such iterators simply terminate the program if the bounds are violated, and for their intended purpose that is the correct behavior.

Sean

participants (1)
-
Sean Parent