Re: [boost] [tokenizer] is exception unaware

On Jul 18, 2006, at 6:38 AM, boost-request@lists.boost.org wrote:
Sebastian Redl said: (by the date of Tue, 18 Jul 2006 14:06:59 +0200)
Janek Kozicki wrote:
boost::tokenizer<>::iterator pos = tok.begin(); - is this (IMO bad) design intentional?
Boost.Tokenizer is written so as to model an iterator. Have you ever seen an iterator throw an exception on reaching the end of a sequence?
The rationale for container iterators is that iterating over containers should not be slowed down by exceptions. Those iterators don't even check for the iterator's validity! Trying to access beyond the container is not detected by the program; it simply causes a segmentation fault. That's fine for ordinary containers.
The requirements on iterators don't come from containers - they come from algorithms. It is not valid to dereference an end() iterator of any kind, and introducing a check for end() on each dereference would be redundant: the algorithm's loop condition has already performed exactly that comparison (see the loop sketch below).
So yes, the design is probably intentional.
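
To make the redundancy concrete, here is a minimal sketch (illustrative only, not the standard library's actual code) of the shape such an algorithm loop already has:

    // The loop condition is the one and only end() check an algorithm needs;
    // a throwing check inside operator* would repeat the same comparison.
    template <typename I,  // I models InputIterator
              typename F>  // F models UnaryFunction
    F for_each_sketch(I first, I last, F f) {
        while (first != last) {   // iterator known dereferenceable past this point
            f(*first);            // so this dereference needs no further check
            ++first;
        }
        return f;
    }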
However, tokenizer::iterator is neither a typical container nor a typical iterator. It is even unusual in that it does check the iterator's validity: trying to access beyond the end causes assert(valid_) to fail (roughly the pattern sketched below).
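
That check is roughly the following pattern - a minimal sketch of the idea only, not Boost.Tokenizer's actual source; the class and member names are illustrative:

    #include <cassert>

    // Sketch: a debug-mode validity flag checked on dereference. With NDEBUG
    // defined the assert compiles away, so correct code pays nothing for it.
    template <typename Token>
    class asserting_token_iterator {
        bool  valid_;  // set false once the iterator has consumed the last token
        Token token_;  // the current token, meaningful only while valid_ is true
    public:
        asserting_token_iterator() : valid_(false), token_() {}

        const Token& operator*() const {
            assert(valid_);  // past-the-end access aborts here instead of
                             // silently reading stale data
            return token_;
        }
        // construction from a sequence, increment, and comparison omitted
    };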
tokenizer::iterator is not a container at all; it is an iterator adaptor. Some interesting bits about iterator adaptors: any context-free grammar can be expressed as an iterator adaptor - or, put a different way, a context-free grammar satisfies the requirements of an iterator adaptor. Tokenizer is just one example of this. If the base iterator satisfies InputIterator, it can be adapted as an InputIterator; if the base iterator satisfies ForwardIterator, it can be adapted as a ForwardIterator (Bidirectional and RandomAccess iterators can also be adapted to Forward). Any InputIterator adaptor can also be adapted to an OutputIterator adaptor - hence any context-free grammar can be expressed as either (you can tokenize on the way in or on the way out). Consider expressing it as:

    token_iter = std::copy(tok.begin(), tok.end(), std::back_inserter(my_vector));

Now you can see that an additional check for end() on each step of the tokenizer is redundant. You might also consider writing a bounded algorithm:

    #include <utility>  // std::pair, std::make_pair

    template <typename I,  // I models InputIterator
              typename O>  // O models OutputIterator
    std::pair<I, O> copy_bounded(I in_first, I in_limit, O out_first, O out_limit)
    {
        while (in_first != in_limit && out_first != out_limit) {
            *out_first = *in_first;
            ++in_first;
            ++out_first;
        }
        return std::make_pair(in_first, out_first);
    }

A collection of such bounded functions would be a nice contribution to Boost (or ASL).

You could also write a bounds-checking iterator adaptor which throws on dereferencing end() [or on incrementing past end()] - a sketch of such an adaptor follows below. I would be careful about using one with the standard algorithms - I'd have to survey the library to come up with a list, but some algorithms are memory adaptive and not likely to be expecting a failure at these points. It should be safe, though, to bounds-check output iterators: they could throw on the assignment through the dereference, which would be equivalent to throwing on the dereference itself.
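A minimal sketch of such a throwing adaptor for input iterators - the name checked_iterator and the choice of std::out_of_range are illustrative, not taken from any existing library:

    #include <iterator>   // std::iterator_traits
    #include <stdexcept>  // std::out_of_range

    // Hypothetical adaptor: wraps an InputIterator and throws instead of
    // invoking undefined behavior when used at its end position.
    template <typename I>  // I models InputIterator
    class checked_iterator {
        I it_;
        I end_;
    public:
        checked_iterator(I it, I end) : it_(it), end_(end) {}

        typename std::iterator_traits<I>::reference operator*() const {
            if (it_ == end_) throw std::out_of_range("dereference of end()");
            return *it_;
        }
        checked_iterator& operator++() {
            if (it_ == end_) throw std::out_of_range("increment past end()");
            ++it_;
            return *this;
        }
        friend bool operator==(const checked_iterator& a, const checked_iterator& b)
            { return a.it_ == b.it_; }
        friend bool operator!=(const checked_iterator& a, const checked_iterator& b)
            { return a.it_ != b.it_; }
    };

And a hypothetical use of copy_bounded above with the tokenizer, stopping at whichever limit is reached first and never dereferencing either iterator at its end:

    #include <boost/tokenizer.hpp>
    #include <string>
    #include <vector>

    int main() {
        std::string s("a,b,c,d");
        boost::tokenizer<> tok(s);
        std::vector<std::string> v(2);  // room for only two tokens
        copy_bounded(tok.begin(), tok.end(), v.begin(), v.end());
        // v now holds "a" and "b"; no per-dereference end() check was needed
    }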
Microsoft provides bounds-checking iterators in their library which are intended to improve security (a very misguided notion, but I won't comment further on that topic) - such iterators simply terminate the program if the bounds are violated, and for their intended purpose that is the correct behavior.

Sean

participants (1)
-
Sean Parent