[GSOC] Regarding the project Boost.DeVector

Hi, I am interested in doing the Boost.DeVector project for GSOC 2009. I had certain queries regarding this project: 1. Does the standard specify any sort of implementation for deque, or does it only specify the requirements and interface? I ask this because if we assume a dynamic array implementation which grows on both the sides, implementing 'reserve', and specifying the size of the chunk should be a trivial task. 2. What are the examples of the low-level operations for which this container is over-encapsulated? Regards Satyam

Hi Satyam, 1. Does the standard specify any sort of implementation for deque, or
does it only specify the requirements and interface? I ask this because if we assume a dynamic array implementation which grows on both the sides, implementing 'reserve', and specifying the size of the chunk should be a trivial task.
No, the standard only specifies requirements and interfaces, and I do not think you will be able to extend std::deque to implement the desired functionality using free function templates alone. My impression is that this project involves some re-design of the interface, though it should respect the standard container and sequence container requirements, and presumably should be as close as feasible to the interface of std::deque and std::vector (part of the "bringing the best of them both together" thing). You probably do not need to "assume a dynamic array implementation which grows on both the sides" since you are going to implement the container, thus you can choose whatever implementation you want so long as you can appropriately specify the interface and requirements. Regards, Eugene Wee P.S.: I have never tried implementing a deque myself, but I recall Josuttis saying that a deque is typically implemented as a list of blocks with the first and last blocks growing in opposite directions.

Hi Eugene,
You probably do not need to "assume a dynamic array implementation which grows on both the sides" since you are going to implement the container, thus you can choose whatever implementation you want so long as you can appropriately specify the interface and requirements.
Thanks for clarifying that. Since we can use any implementation for this, (which satisfies the requirements and an appropriate interface of course), I suggest using a dynamic array which grows on both sides. With this implementation, we could easily provide a function like reserve(x,y) which reserves x units of memory in the front and y units at the back. I would like to request your and other boost developers/users feedback on this approach. Also, with a list of blocks approach, wouldn't it be rather difficult to provide amortized O(1) random access? Regards Satyam

Hi Satyam, Thanks for clarifying that. Since we can use any implementation for
this, (which satisfies the requirements and an appropriate interface of course), I suggest using a dynamic array which grows on both sides.
As in you allocate one contiguous block, and start using it from the middle? The problem would be that you lose the property that pointers/iterators/references are not invalidated if the container only grows at the ends, since you may need to perform re-allocation after the beginning or end of the single contiguous block is reached. Also, with a list of blocks approach, wouldn't it be rather difficult
to provide amortized O(1) random access?
The list of blocks could be a vector of pointers, as described in the "Segmented Iterators and Hierarchical Algorithms" article that the ideas page links to. Regards, Eugene Wee

On Sunday 22 March 2009 13:37:03 Satyam Shekhar wrote:
1. Does the standard specify any sort of implementation for deque, or does it only specify the requirements and interface?
You might be interested in (a draft for the) standard itself http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2800.pdf The link is from wikipedia's entry for C+0xx. -- Kind regards, Esben

Hi Satyam, Satyam Shekhar skrev:
Hi,
2. What are the examples of the low-level operations for which this container is over-encapsulated?
One thing is the iterators. As the paper by Austern explains for( segment : c.segments() ) for( unsigned n = 0u; n < segment.size; ++ n ) foo( segment[n] ) is faster than using iterators. Another is related to serialization, see e.g. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2212.html -Thorsten

Hi Thorsten, I have been working on my proposal for the last few days and am almost done. Some of the questions which came to my mind: a. Why would deque need a reserve function? Vector needs it because growing a vector is costly due to relocations, and a reserve can reduce those relocations. A deque which is *generally* (from the reading I did and the opinions on this list) implemented as a list of blocks (whose addresses are stored in an array) will not need the relocations because the blocks are all scattered in memory. The only case where relocations would be required is when the array storing the addresses grows and needs to be relocated. Will this be frequent? Is this the case we are targeting when we want to add the reserve function? b. Why isn't the reserve function already there in a deque? Assuming there is a valid case for this function, adding it shouldn't be too difficult. Allocate a block(s) and add the address to the top-level array. c. Why would be want to specify the size of the chunks? Again, as in a., relocation does not take place in a deque except the top-level array, and changing the chunk size will not change the frequency of relocations in that. Also, with blocks of unequal sizes, providing constant time random access will be difficult. Regarding the project proposal, I wished to know whether to put it on the mailing list first, or submit it through the gsoc app. Regards Satyam

Satyam Shekhar skrev:
Hi Thorsten, I have been working on my proposal for the last few days and am almost done. Some of the questions which came to my mind:
a. Why would deque need a reserve function? Vector needs it because growing a vector is costly due to relocations, and a reserve can reduce those relocations. A deque which is *generally* (from the reading I did and the opinions on this list) implemented as a list of blocks (whose addresses are stored in an array) will not need the relocations because the blocks are all scattered in memory. The only case where relocations would be required is when the array storing the addresses grows and needs to be relocated. Will this be frequent? Is this the case we are targeting when we want to add the reserve function?
b. Why isn't the reserve function already there in a deque? Assuming there is a valid case for this function, adding it shouldn't be too difficult. Allocate a block(s) and add the address to the top-level array.
Well, I can't say why exactly std::deque is like it is. But I guess it has something to do with shrinking the container automatically when elements are erased (std::vector never does this). Now, I think it will be possible to implement reserve such that it allocates one big chunk with the new segments pointing into this chunk by adding a flag saying if the segment can be deallocated.
c. Why would be want to specify the size of the chunks? Again, as in a., relocation does not take place in a deque except the top-level array, and changing the chunk size will not change the frequency of relocations in that. Also, with blocks of unequal sizes, providing constant time random access will be difficult.
The chunk size needs to be the same. So if the chunk size is changed after elements is added to the container, it would need to reallocate to new segments (sort of like rehash()). But, initially, it would be very handy to set the segment size such that it matches your particular application, and such that allocation performance is portable accross compilers. With std::deque you just don't know.
Regarding the project proposal, I wished to know whether to put it on the mailing list first, or submit it through the gsoc app.
I guess you should do both. -Thorten

On Friday 27 March 2009 20:02:41 Satyam Shekhar wrote:
b. Why isn't the reserve function already there in a deque? Assuming there is a [use] valid case for this function, adding it shouldn't be too difficult. Allocate a block(s) and add the address to the top-level array.
I'm pretty sure the assumption "there is a valid [use] case" is false. reserve() is in vector so that clients can control when a possible complete realloc+move is going to happen, and possibly even avoid it altogether. This is important for a lot of reasons, including that all iterators get invalidated. For deque, new blocks are simply allocated as needed and the existing blocks are never reallocated. Thus, there would be little point with preallocating the array. -- kind regards, Esben

Esben Mose Hansen skrev:
On Friday 27 March 2009 20:02:41 Satyam Shekhar wrote:
b. Why isn't the reserve function already there in a deque? Assuming there is a [use] valid case for this function, adding it shouldn't be too difficult. Allocate a block(s) and add the address to the top-level array.
I'm pretty sure the assumption "there is a valid [use] case" is false.
reserve() is in vector so that clients can control when a possible complete realloc+move is going to happen, and possibly even avoid it altogether. This is important for a lot of reasons, including that all iterators get invalidated. For deque, new blocks are simply allocated as needed and the existing blocks are never reallocated. Thus, there would be little point with preallocating the array.
Please see my other post. -Thorsten

Another thought about this project - the paper by Matthew Austern about segmented iterators mentions 'segmentedness' of an iterator as orthogonal to it's traversal or reference category. Now one way to approach the task of creating segmented iterators for Devector is to use the Boost.Iterator library's iterator_facade/iterator_adaptor. But would it be worthwhile to modify Boost.Iterator itself and define new concepts using segmentedness as another orthogonal category, and then modify iterator_facade to support it? One of the factors influencing this would obviously be whether segmented iterators are going to be used widely in the future for other containers. (hash tables etc) Regards Satyam

Satyam Shekhar wrote:
One of the factors influencing this would obviously be whether segmented iterators are going to be used widely in the future for other containers. (hash tables etc)
The segmented iterator abstraction certainly makes a lot of sense, so is there any reason why it should not be used widely in the future? Perhaps Andrei Alexandrescu will tell us some reasons why not: Dave Abrahams (http://lists.boost.org/Archives/boost/2009/03/150068.php) wrote:
The official schedule is now live at http://www.boostcon.com/program! A few highlights of the "mouth-watering content" (in the words of one enrollee) are:
* A keynote address from Andrei Alexandrescu called "Iterators Must Go." It's sure to be provocative!
I hope there will be some way to learn about his opinions even for non-BoostCon participants. Regards, Thomas

Satyam Shekhar wrote:
But would it be worthwhile to modify Boost.Iterator itself and define new concepts using segmentedness as another orthogonal category, and then modify iterator_facade to support it? One of the factors influencing this would obviously be whether segmented iterators are going to be used widely in the future for other containers.
segmented iterators can also be used to iterate through recursive structures, such as trees, if I understood what I read on the subject correctly.

Mathias Gaunard on March 30 wrote:
Satyam Shekhar wrote:
But would it be worthwhile to modify Boost.Iterator itself and define new concepts using segmentedness as another orthogonal category, and then modify iterator_facade to support it? One of the factors influencing this would obviously be whether segmented iterators are going to be used widely in the future for other containers.
segmented iterators can also be used to iterate through recursive structures, such as trees, if I understood what I read on the subject correctly.
On a second thought (while deleting old mail...), this statement is misleading for several reasons: 1) A "normal" non-segmented iterator is perfectly able to iterate through recursive structures, such as trees. 2) A segmented iterator would be more efficient for "compile-time" recursice structures, but for "run-time" recursive structures, it would just fall back to the behavior (and efficiency) of a "normal" non-segmented iterator. 3) The "real" efficiency advantage of segmented iterators will normally come from the "inner-most" iteration, so whether a given recursive data structure will gain something by providing segmented-iterators will only depend on whether it is possible to provide a more efficient "inner-most" iteration for that data structure than without segmented iterators.

Satyam Shekhar skrev:
Another thought about this project - the paper by Matthew Austern about segmented iterators mentions 'segmentedness' of an iterator as orthogonal to it's traversal or reference category. Now one way to approach the task of creating segmented iterators for Devector is to use the Boost.Iterator library's iterator_facade/iterator_adaptor. But would it be worthwhile to modify Boost.Iterator itself and define new concepts using segmentedness as another orthogonal category, and then modify iterator_facade to support it? One of the factors influencing this would obviously be whether segmented iterators are going to be used widely in the future for other containers. (hash tables etc)
Well, one of the major obstacles with segmented iterators is that no standard algorithms support them. I guess it would be yet another GSOC project to implement this for boost.range ex etc. -Thorsten
participants (6)
-
Esben Mose Hansen
-
Eugene Wee
-
Mathias Gaunard
-
Satyam Shekhar
-
Thomas Klimpel
-
Thorsten Ottosen