[GSoC] Heaps & Queues (draft proposal)

Hi everyone. I am Lintao from Sun Yat-sen University, China. This is my first time to send mail on mailing list. I'm a little nervous:-). I am interested in Heaps & Queues provided by Boost in 2010 GSoC. And I have finished my draft proposal after viewing the mailing list these days. Thank you for your comments and suggestions. ====== project proposal ====== In my opinion, most of work is to reuse and rearrange the existing data structure in boost/pending and other data structures which have already existed.[1] The most important thing I should consider is the APIs design of the new library. So for each data structure, it should not only agree to the formulation provided by the BGL, but also should be easy to remember and user-friendly. Besides, I would like to add d-ary heap and splay heap into the new library. Because d-ary heap is a variation of binary heap and has faster running time. As for splay heap, it has the attractive property that recently accessed elements are quick to access again. I think it will be very useful in application software programming. If time permits, I want to add others implementations of heap [2] into the new library. Below is my design to the library APIs. template <class T, class Compare = std::less<T>> class heap{ public: * void insert(T& key);* * void delete(T* key);* * void update(T* t, T& new_key);* T* getRoot(); T* find(T& key); bool empty() const; size_type size() const; void print() const; private: void print_recur() const; protected: T* _root; }; NOTICE: * operations in bold mean it should be implement by the subclass. * operations print and print_recur is optional. But I prefer to include them because it'll be helpful when debugging the programs using this library. The class mentioned above is the base class for all the heap structures in the new library. The classes such as Fibonacci heap, relaxed heap, binomial heap and splay heap should inherit from this abstract class and add additional operations if needed. And I would prefer to use lists rather than vectors (as shown in boost/pending/fibonacci_heap.hpp) to implement it for three reasons. 1. It will save memory since the later needs extra space to store the key to value mapping. 2. Vector requires continuous memory while list doesn’t. It does not matter when the amount of data is small. But when the amount of data becomes large, it will be a tough problem; 3. It takes time to resize and reallocate the memory for vectors. Of course, list implementation has its own drawback such as easy to leak memory and cost more time when searching the heap. From my point of view, it is acceptable comparing with the problems mentioned before. As the adaptor of heaps, the priority queue API may like: template<class T, class Compare = std::less<T>, class Container = fibonacci_heap<T, Compare>> class priority_queue{ public: void push(T& key); T pop(); const T& front() const; T& front(); T* find(T& key); void update(T* t, T& new_key); bool empty() const; size_type size() const; void print() const; private: void print_recur() const; }; [1][2] It means the data structure in this URL: https://svn.boost.org/svn/boost/sandbox/libs/pri_queue/ Thank you for reading.

Just a couple of technical points.
* void update(T* t, T& new_key);*
What if T is non-copyable, doesn't support efficient copies, or the user only wants to update part of t? I don't think that this is quite the right solution. And I would prefer to use lists rather than vectors (as shown in
boost/pending/fibonacci_heap.hpp) to implement it for three reasons. 1. It will save memory since the later needs extra space to store the key to value mapping. 2. Vector requires continuous memory while list doesn’t. It does not matter when the amount of data is small. But when the amount of data becomes large, it will be a tough problem; 3. It takes time to resize and reallocate the memory for vectors.
Why not allow the user select the underlying container type? Andrew Sutton andrew.n.sutton@gmail.com

iaml wrote:
And I would prefer to use lists rather than vectors (as shown in boost/pending/fibonacci_heap.hpp) to implement it for three reasons. 1. It will save memory since the later needs extra space to store the key to value mapping. 2. Vector requires continuous memory while list doesn't. It does not matter when the amount of data is small. But when the amount of data becomes large, it will be a tough problem; 3. It takes time to resize and reallocate the memory for vectors.
Of course, list implementation has its own drawback such as easy to leak memory and cost more time when searching the heap. From my point of view, it is acceptable comparing with the problems mentioned before. As the adaptor of heaps, the priority queue API may like:
I think you are wrong about all three objections to vector. 1. Vectors should save you memory over lists. 2. Large vectors causing memory allocation failure due to fragmentation isn't generally a problem in 64 bit platforms and the right way to deal with the problem in 32 bit systems is to not fragment your memory in the first place. You should not sacrifice performance of a library to accommodate poorly written applications on some platforms at the expense of all other applications and all other platforms. 3. Vectors are faster than lists for almost all use cases. I also agree with Andrew that you can parameterize for container type. In the stl, heap operations there is a very clear abstraction between data structures and algorithms. I would think that providing that abstraction should be a goal of the project. Regards, Luke

I think you are wrong about all three objections to vector.
1. Vectors should save you memory over lists.
2. Large vectors causing memory allocation failure due to fragmentation isn't generally a problem in 64 bit platforms and the right way to deal with the problem in 32 bit systems is to not fragment your memory in the first place. You should not sacrifice performance of a library to accommodate poorly written applications on some platforms at the expense of all other applications and all other platforms.
3. Vectors are faster than lists for almost all use cases.
I also agree with Andrew that you can parameterize for container type. In the stl, heap operations there is a very clear abstraction between data structures and algorithms. I would think that providing that abstraction should be a goal of the project.
Well said. The only thing I would add to this is that a linked list will not suffice for a binary heap implementation, if that's what's being proposed, but binary tree will. The difference is in these highlighted nicely in the difference between std::set/map and the upcoming Boost.Container flat_set/flat_map [1], which I just discovered :) Andrew Sutton andrew.n.sutton@gmail.com [1] http://svn.boost.org/svn/boost/sandbox/move/libs/container/doc/html/index.ht...
participants (3)
-
Andrew Sutton
-
iaml
-
Simonson, Lucanus J