Boost::Python iterators are off by 1?

Dan Eloff

2 Sep 2004 2 Sep '04

7:33 a.m.

If this is my fault the docs need to be more explanatory, but here's the problem: for the sequence (0, 1, 2), the Boost::Python iterator starts at 1 and ends 1 past 2 ( where 3 might be ). It does this because it calls the C++ iterators functions in this order: operator == post-increment dereference It should imho call them like (which would correct the issue) operator == dereference post-increment Like a normal for loop would execute: condition, code, action. The code to reproduce this is: #include <iterator> #include <boost/python.hpp> namespace py = boost::python; struct ctn { struct iterator: public std::iterator< std::input_iterator_tag, py::str > { int mNum; iterator( int num = 0 ): mNum( num ) {} py::str operator *() const { char temp[20]; const char * str = itoa( mNum, temp, 10 ); return py::str( str, strlen( str ) ); } iterator& operator ++(int) { mNum++; return *this; } friend bool operator ==( iterator lhs, iterator rhs ) { return ( lhs.mNum == rhs.mNum ); } }; iterator begin() { return iterator( 0 ); } iterator end() { return iterator( 3 ); // the range (0, 1, 2) three items } }; BOOST_PYTHON_MODULE(iter_bug) { py::class_< ctn >( "ctn" ) .def( "__iter__", py::iterator< ctn >() ) ; }

...

...
...
from iter_bug import * c = ctn() for i in c: ... print i ... 1 2 3

Notice how it's off by one because of the very strange order of doing things. Please help! -Dan

Show replies by date

David Abrahams

2 Sep 2 Sep

10:50 a.m.

Dan Eloff <dan.eloff@gmail.com> writes:

...

If this is my fault the docs need to be more explanatory

I don't think so, unless it's Boost.Python's job to explain how to build C++ iterators. Your iterator is nonconforming in many ways. I suggest you use the Boost Iterator Library if you need to build one (http://www.boost.org/libs/iterator). In this case, you could probably use a transform_iterator over a counting_iterator. The problem here is that your operator++ doesn't return a copy of the original iterator, the way it should (but there are lots of other problems, so use the iterator library instead):

...

The code to reproduce this is:

#include <iterator>

#include <boost/python.hpp>

namespace py = boost::python;

struct ctn { struct iterator: public std::iterator< std::input_iterator_tag, py::str > { int mNum;

iterator( int num = 0 ): mNum( num ) {}

py::str operator *() const { char temp[20]; const char * str = itoa( mNum, temp, 10 ); return py::str( str, strlen( str ) ); }

iterator operator ++(int) { iterator tmp = *this; mNum++; return tmp; }

...

friend bool operator ==( iterator lhs, iterator rhs ) { return ( lhs.mNum == rhs.mNum ); } };

iterator begin() { return iterator( 0 ); }

iterator end() { return iterator( 3 ); // the range (0, 1, 2) three items } };

HTH, -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

Dan Eloff

7:16 p.m.

Yes, it was a bit of a simplification on the real iterator that was causing the problem, but I forgot that I hadn't implement post increment correctly. At the time I didn't see how it would get me into trouble, now I do :) My input iterator is a little tricky to create because I don't know how many times it can be incremented before reaching the end. The end() iterator simply holds a NULL pointer, incrementing it does nothing. When the begin() iterator is incremented enough I set it's pointer to NULL and it becomes equal to end(). This causes a lot of non-standard issues ( like end() == end()+1 ). I'm hoping that particular one doesn't cause a problem for me. Again it seems the trouble is with me rather than Boost::Python. Thanks for setting me straight, you saved me a lot of debugging time. -Dan

David Abrahams

8:15 p.m.

Dan Eloff <dan.eloff@gmail.com> writes:

...

Yes, it was a bit of a simplification on the real iterator that was causing the problem, but I forgot that I hadn't implement post increment correctly. At the time I didn't see how it would get me into trouble, now I do :) My input iterator is a little tricky to create because I don't know how many times it can be incremented before reaching the end. The end() iterator simply holds a NULL pointer, incrementing it does nothing. When the begin() iterator is incremented enough I set it's pointer to NULL and it becomes equal to end(). This causes a lot of non-standard issues ( like end() == end()+1 ). I'm hoping that particular one doesn't cause a problem for me.

Just use boost::iterator_facade to implement your iterator and myriad issues will disappear.

...

Again it seems the trouble is with me rather than Boost::Python. Thanks for setting me straight, you saved me a lot of debugging time.

You're welcome. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

Dan Eloff

3 Sep 3 Sep

12:36 a.m.

...

Just use boost::iterator_facade to implement your iterator and myriad issues will disappear.

Almost, it cut the code down to about 1/3 of the size and 1/6 the complexity, allowing me to much more easily see where the problems were and how to fix them. However, one problem did not go away. Boost::Python does the C++ equivelent of: for( iter it = ctn.begin(); !(it == ctn.end()); ) *it++; The choice of post-increment was evil in my scenario. I had a C third-part function returning a pointer to a heap allocated structure or NULL if there is no more data. Calling the function a second time frees the previous results memory. No matter how I looked at the problem, I kept ending up with 2 of these structures at any given point in time, one being held by the temporary iterator being returned from the post increment, and one by the modified original iterator. One was of course invalid. I could have copied it into a new memory block before obtaining the second, but doing that would double the copying and memory usage of an already memory intensive operation. I did eventually get it working, but through a blatant hack only. Knowing the order the operations, I realised that if I put the code from the increment() function into equal() function before the comparison is made then everything will work out fine. I was unable to think of a better solution, perhaps you can think of one? May I ask why Boost::Python uses post-increment and then derefencing instead of pre-increment after derefencing. I don't understand why it was done this way, but I'm sure there was a good reason. Cheers, -Dan

David Abrahams

2:12 a.m.

Dan Eloff <dan.eloff@gmail.com> writes:

...

...
Just use boost::iterator_facade to implement your iterator and myriad issues will disappear.

Almost, it cut the code down to about 1/3 of the size and 1/6 the complexity, allowing me to much more easily see where the problems were and how to fix them.

However, one problem did not go away. Boost::Python does the C++ equivelent of:

for( iter it = ctn.begin(); !(it == ctn.end()); ) *it++;

The choice of post-increment was evil in my scenario. I had a C third-part function returning a pointer to a heap allocated structure or NULL if there is no more data. Calling the function a second time frees the previous results memory.

No matter how I looked at the problem, I kept ending up with 2 of these structures at any given point in time, one being held by the temporary iterator being returned from the post increment, and one by the modified original iterator.

One was of course invalid. I could have copied it into a new memory block before obtaining the second, but doing that would double the copying and memory usage of an already memory intensive operation.

I did eventually get it working, but through a blatant hack only. Knowing the order the operations, I realised that if I put the code from the increment() function into equal() function before the comparison is made then everything will work out fine.

I was unable to think of a better solution, perhaps you can think of one?

For things like this the solution is usually to store the dereferenced value inside the iterator itself. You might look into using boost::optional for efficiency in this case.

...

May I ask why Boost::Python uses post-increment and then derefencing instead of pre-increment after derefencing. I don't understand why it was done this way, but I'm sure there was a good reason.

I'm not sure there _is_ a good reason. If you'd like to patch it to do it the other way, and if all of the Boost.Python regression tests still pass, I'll apply your patch. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

David Abrahams

2:13 a.m.

Dan Eloff <dan.eloff@gmail.com> writes:

...

Just use boost::iterator_facade to implement your iterator and myriad

...
issues will disappear.

Almost, it cut the code down to about 1/3 of the size and 1/6 the complexity, allowing me to much more easily see where the problems were and how to fix them.

Oh, BTW, Boost.Python correspondence should go to the C++-sig: http://www.boost.org/more/mailing_lists.htm#cplussig -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

Dan Eloff

5:27 a.m.

...

For things like this the solution is usually to store the dereferenced value inside the iterator itself. You might look into using boost::optional for efficiency in this case.

If I store it in the iterator I get crashes because a reference is floating around to the long dead temporary iterator returned from post increment. I can store it in the parent (container) class of the iterator ( what I do now anyway ) but I need to store two values with post increment.

...

I'm not sure there _is_ a good reason. If you'd like to patch it to do it the other way, and if all of the Boost.Python regression tests still pass, I'll apply your patch.

If only so nobody else will have to go through what I just did today and yesterday I will make the patch. I'll post to the C++-sig when I've got it.

David Abrahams

1:17 p.m.

Dan Eloff <dan.eloff@gmail.com> writes:

...

...
For things like this the solution is usually to store the dereferenced value inside the iterator itself. You might look into using boost::optional for efficiency in this case.

If I store it in the iterator I get crashes because a reference is floating around to the long dead temporary iterator returned from post increment.

Dereference can return by value. That's normal for input iterators. But IIRC, the user of an input iterator isn't allowed to bind a reference to or take the address of the result of dereferencing anyway. So you might want to fix the code that does that.

...

I can store it in the parent (container) class of the iterator ( what I do now anyway ) but I need to store two values with post increment.

...
I'm not sure there _is_ a good reason. If you'd like to patch it to do it the other way, and if all of the Boost.Python regression tests still pass, I'll apply your patch.

If only so nobody else will have to go through what I just did today and yesterday I will make the patch. I'll post to the C++-sig when I've got it.

Please try to run the regression tests first! -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

7637

Age (days ago)

7638

Last active (days ago)

List overview

Download

8 comments

2 participants

participants (2)

Dan Eloff
David Abrahams