GSoC 2010: Heaps and Queues

Dan Larkin

24 Mar 2010 24 Mar '10

9:43 p.m.

I'm another potential student throwing my hat in the ring for the heap project. I'll save most of my personal details for my proposal, but for right now I thought I'd try to get a bit of a dialogue going with the community to help me understand what it is you guys are looking for and what it is I can work towards. So I'll start with my current understanding of the situation. The way I see it, the current heap implementations in boost/pending are rather fractured. They just have simple push/pop interfaces with no mutability (well, there's mutable_heap.hpp, but that's separate and not fleshed out). All in all, what seems to be needed is a standardized heap concept/API along with a few fully-implemented models. Standard heap functionality begs the following functions: - findMin - deleteMin - insert - changeKey - merge These can all be implemented quite efficiently (either worst case or amortized O(logn) time or better) for any heap model. I would be looking to implement the d-ary, binomial, and Fibonacci heaps for sure, with relaxed and Brodal heaps as a possibility, should time allow it. These heap models would be built as extensions of a base Heap class, which could be easily wrapped in a PriorityQueue class which takes the heap-type as a construction parameter (and defaults to a d-ary heap for instance). I know that a stable priority queue was mentioned in earlier discussion, but I'm not sure what to do about that just yet. It'll take a little more thinking before I could say whether I think it'd fall under the scope of the summer project, or if it should be put on the back-burner for later addition to the library. I guess that's basically how I see the problem so far. Please let me know what you think, or what other sort of information/planning/etc. you would like to hear from me prior to my proposal/draft. Dan Larkin

Show replies by date

Dan Larkin

26 Mar 26 Mar

3:54 p.m.

...

- findMin - deleteMin - insert - changeKey - merge These can all be implemented quite efficiently (either worst case or amortized O(logn) time or better) for any heap model.

Just realized this isn't entirely true. A merge on two heaps can be O(n) depending on the model. In any case, still eagerly awaiting some sort of feedback. Dan Larkin

Andrew Sutton

4:10 p.m.

...

I'm another potential student throwing my hat in the ring for the heap project.

Hi Dan, Sorry I've been slow responding. I'm having trouble keeping track of which GSoC email I've responded to :)

...

The way I see it, the current heap implementations in boost/pending are rather fractured. They just have simple push/pop interfaces with no mutability (well, there's mutable_heap.hpp, but that's separate and not fleshed out).

That's a fairly accurate description of the state of things. You should also look at the C++ Standard heap algorithms and priority queue. You might want to consider what how you might change those to support mutability in standard heaps. I know that a stable priority queue was mentioned in earlier discussion, but

...

I'm not sure what to do about that just yet.

In general a priority queue can essentially be thought of as a queue adaptor for an (partially?) ordered data structure. Stability is a property of the ordering of elements in the underlying data structure. Such that two equivalent objects in the queue will be popped in the reverse order of their insertion. Or was it the same order? You could probably make arguments for either or as long as its deterministic. I guess that's basically how I see the problem so far. Please let me know

...

what you think, or what other sort of information/planning/etc. you would like to hear from me prior to my proposal/draft.

Draft proposals are welcome. It's easier to give feedback than to suggest ideas :) Andrew Sutton andrew.n.sutton@gmail.com

Rajendran Thirupugalsamy

5:10 p.m.

Binomial heap and Fibonacci heap can do these operations in constant or O(log n) amortized. Basically merge operation can be done in log n. On Fri, Mar 26, 2010 at 12:10 PM, Andrew Sutton <andrew.n.sutton@gmail.com>wrote:

...

...
I'm another potential student throwing my hat in the ring for the heap project.

Hi Dan,

Sorry I've been slow responding. I'm having trouble keeping track of which GSoC email I've responded to :)

...
The way I see it, the current heap implementations in boost/pending are rather fractured. They just have simple push/pop interfaces with no mutability (well, there's mutable_heap.hpp, but that's separate and not fleshed out).

That's a fairly accurate description of the state of things. You should also look at the C++ Standard heap algorithms and priority queue. You might want to consider what how you might change those to support mutability in standard heaps.

I know that a stable priority queue was mentioned in earlier discussion, but

...
I'm not sure what to do about that just yet.

In general a priority queue can essentially be thought of as a queue adaptor for an (partially?) ordered data structure. Stability is a property of the ordering of elements in the underlying data structure. Such that two equivalent objects in the queue will be popped in the reverse order of their insertion. Or was it the same order? You could probably make arguments for either or as long as its deterministic.

I guess that's basically how I see the problem so far. Please let me know

...
what you think, or what other sort of information/planning/etc. you would like to hear from me prior to my proposal/draft.

Draft proposals are welcome. It's easier to give feedback than to suggest ideas :)

Andrew Sutton andrew.n.sutton@gmail.com _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

-- Regards, Rajendran.T

Dan Larkin

2 Apr 2 Apr

3:39 a.m.

Andrew Sutton wrote:

...

Draft proposals are welcome. It's easier to give feedback than to suggest ideas :)

So, now that it's into the proper application period, would it be best to submit a draft proposal here for community feedback, or simply into the proper application and take care of revisions there? Dan Larkin

Andrew Sutton

1:31 p.m.

...

So, now that it's into the proper application period, would it be best to submit a draft proposal here for community feedback, or simply into the proper application and take care of revisions there?

You could the "project description" to the list. That's usually what you want to have people look at. Andrew Sutton andrew.n.sutton@gmail.com

Dan Larkin

3 Apr 3 Apr

3:47 p.m.

Andrew Sutton wrote:

...

You could the "project description" to the list. That's usually what you want to have people look at.

Okay, sounds good. To that effect: I would like to implement a library which allows a simple, unified concept for a heap/priority queue with a common interface. Through this common interface, I would allow the user to select the specific model to use based on their needs. The standard interface would allow the following functions to be used regardless of the model: - insert - delete - merge - findMin - deleteMin - decreaseKey Furthermore, I would implement the following heap models: - D-ary - Binomial - Fibonacci - Brodal (may be replaced with a different model with similar performance, pending further research) That's the gist of it. I'm not sure how much detail you want in the proposal, since a lot of the detail would be essentially in the form of code and I personally feel it would encroach upon the domain of the actual project work itself. In any case let me know what all of you think and what extra information you would want from me. Dan Larkin

Andrew Sutton

10:42 p.m.

...

That's the gist of it. I'm not sure how much detail you want in the proposal, since a lot of the detail would be essentially in the form of code and I personally feel it would encroach upon the domain of the actual project work itself. In any case let me know what all of you think and what extra information you would want from me.

I don't think it's a bad thing to suggest interfaces (in the form of code) in a proposal. Also, if you can compare and contrast with the current heap/priority queue data structures already in Boost, then you'll be doing a lot to demonstrate your understanding the project requirements. That's my opinion, anyway. Andrew Sutton andrew.n.sutton@gmail.com

Dan Larkin

4 Apr 4 Apr

4:47 p.m.

Andrew Sutton wrote:

...

I don't think it's a bad thing to suggest interfaces (in the form of code) in a proposal. Also, if you can compare and contrast with the current heap/priority queue data structures already in Boost, then you'll be doing a lot to demonstrate your understanding the project requirements. That's my opinion, anyway.

Okay. I'm about to head out for the day, but I'll write something up later tonight or tomorrow. Dan Larkin

Andrew Sutton

11:12 p.m.

...

I don't think it's a bad thing to suggest interfaces (in the form of code)

...
in a proposal. Also, if you can compare and contrast with the current heap/priority queue data structures already in Boost, then you'll be doing a lot to demonstrate your understanding the project requirements. That's my opinion, anyway.

Okay. I'm about to head out for the day, but I'll write something up later tonight or tomorrow.

Sounds good. There seems to be a lot of people submitting proposals for this project. Make sure that a) you actually describe what you're proposing to do and b) you put that in the context of what Boost already has. I cannot state how much better that makes an application. Andrew Sutton andrew.n.sutton@gmail.com

Dan Larkin

6 Apr 6 Apr

3:29 a.m.

Okay. Here's draft revision 1: I would like to implement a library which allows a simple, unified concept for a heap/priority queue with a common interface. Through this common interface, I would allow the user to select the specific model to use based on their needs. The standard interface would allow the following functions to be used regardless of the model: - insert - delete - merge - findMin - deleteMin - decreaseKey Furthermore, I would implement the following heap models: - D-ary - Binomial - Fibonacci - Brodal (may be replaced with a different model with similar performance, pending further research) As it stands, the current heap implementations in Boost C++ (specifically in sandbox/libs/pri_queue), take what I think to be the wrong approach. Heaps are conceptually very simple structures, as are their interfaces. By nature, they have a single important element (the minimum), and the rest is treated as a mystery to the outside world. I feel like the inclusion of iterators is unnecessary, and possibly misguided. This may be a lack of understanding on my part, but I'm frankly confused by how they apply to a heap structure. In any case, I also feel like a lot of the internal structure can be simplified, and memory usage minimized, by keeping only pointers to elements. This may seem excessive in the case of simple data types like integers, but I believe it will lead to much simpler memory management, despite the face that the user will need to handle some of it by themselves. That said, I do believe that the existing d-heap and f-heap implementations can be used as a guide to help me in my development process, in both examining the design decisions and verifying performance. Below are draft interfaces for the base Heap class and the PriorityQueue wrapper class. I've left out some of the tedious bits, such as constructors, in favor of keeping the focus on the functional interface. /** * Abstract base class specifying a simple, but fully-featured min-heap * interface. To simplify internal structure, ease the process of making * partial updates to potentially large member elements, and minimize memory * usage, only element pointers are stored, rather than actual data. * * @tparam T Type of elements stored in the heap * @tparam Compare Comparator for type T used for heap-ordering. Defaults to * standard '<' operator. */ template <class T, class Compare = std::less<T>> class Heap { public: /** * Inserts a new element into the heap. * * @param elem Pointer to element to insert */ virtual void insert( T* elem ) = 0; /** * Removes an arbitrary element from the heap and corrects the ordering * if needed. * * @param elem Pointer to element to remove */ virtual void remove( T* elem ) = 0; /** * Signals the heap that a given element has been updated. The heap may * need to be updated to guarantee a correct ordering. * * @param elem Pointer to element which has been updated */ virtual void update( T* elem ) = 0; /** * Returns the pointer to the minimum element in the heap. */ virtual T* findMin() = 0; /** * Removes the minimum element from the heap. */ virtual void removeMin() = 0; /** * Merges two heaps with the same type of elements. * * @param other Secondary heap to be merged */ virtual void merge( Heap<T,Compare> other ) = 0; /** * Returns true if no elements remain in the heap, false otherwise. */ virtual bool empty() = 0; /** * Returns the number of elements in the heap. */ virtual int size() = 0; }; /** * A flexible priority queue which allows specification of the underlying data * structure upon construction, chosen from the different implementations of * boost::Heap. The default structure is a binary heap. * * @tparam T Type of elements stored in the queue * @tparam Compare Comparator of type T for priority ordering. Defaults to * standard '<' operator. */ template <class T, class Compare = std::less<T>> class PriorityQueue { public: /** * Adds a new element to the queue. * * @param elem Pointer to element to push */ void push( T* elem ); /** * Retrieves the pointer to the minimum element in the queue and returns * it, removing the element from the queue afterwards. */ T* pop(); /** * Removes an arbitrary element from the queue and updates the ordering if * needed. * * @param elem Pointer to element to remove */ void remove( T* elem ); /** * Signals the priority queue that a given element has been updated. The * queue may need to be updated to guarantee a correct ordering. * * @param elem Pointer to element which has been updated */ void update( T* elem ); /** * Merges two priority queues with the same type of elements. * * @param other Secondary heap to merge */ void merge( PriorityQueue<T,Compare> other ); /** * Returns true if no elements remain in the queue, false otherwise. */ bool empty(); /** * Returns the number of elements in the heap. */ int size(); private: //! Heap containing the elements of the priority queue. Responsible for //! maintaining minimum element and providing other I/O functionality. Heap<T,Compare> data; }; (I apologize if the code looks a bit ugly here, but it should fix itself when copied to a text editor.) Once again, feedback is welcome. If my interpretation of existing code or the project idea is wrong, please, please point out my stupidity for my sake? Thanks. Dan Larkin

Andrew Sutton

12:53 p.m.

...

As it stands, the current heap implementations in Boost C++ (specifically in sandbox/libs/pri_queue), take what I think to be the wrong approach. Heaps are conceptually very simple structures, as are their interfaces. By nature, they have a single important element (the minimum), and the rest is treated as a mystery to the outside world. I feel like the inclusion of iterators is unnecessary, and possibly misguided. This may be a lack of understanding on my part, but I'm frankly confused by how they apply to a heap structure.

Pointers are used for updating elements after modifying them. Iterators are just abstract pointers (more or less), so really they play the same role. They aren't really intended to be used for actual iteration. Besides, you actually need the iterators for node-based data structures since T* may point inside a node. In any case, I also feel like a lot of the internal structure can be

...

simplified, and memory usage minimized, by keeping only pointers to elements. This may seem excessive in the case of simple data types like integers, but I believe it will lead to much simpler memory management, despite the face that the user will need to handle some of it by themselves. That said, I do believe that the existing d-heap and f-heap implementations can be used as a guide to help me in my development process, in both examining the design decisions and verifying performance.

This is probably a good evaluation.

...

Below are draft interfaces for the base Heap class and the PriorityQueue wrapper class. I've left out some of the tedious bits, such as constructors, in favor of keeping the focus on the functional interface.

If you plan to put code in your proposal keep it succinct and address the code in the text. Basically, leave out the comments :) Andrew Sutton andrew.n.sutton@gmail.com

Dan Larkin

3:26 p.m.

Andrew Sutton wrote:

...

Pointers are used for updating elements after modifying them. Iterators are just abstract pointers (more or less), so really they play the same role. They aren't really intended to be used for actual iteration. Besides, you actually need the iterators for node-based data structures since T* may point inside a node.

Ah, that makes much more sense. I haven't looked too deeply into too many Boost libraries yet, so I'm not aware - is that iterator abstraction the preferred method over internal pointer manipulation? Originally I was considering the node pointers to be strictly internal structure and something that didn't really need interfacing, but I can see now why it would be useful, especially when extending the Boost classes to custom solutions.

...

If you plan to put code in your proposal keep it succinct and address the code in the text. Basically, leave out the comments :)

I'm used to code being written to speak for itself, but that's for when it's standing alone, not embedded in a proposal. Point taken. Once again, about to head into class so I can't take care of the revisions now, but later tonight I'll get revision 2 worked up and ready. (Also I apologize that I'm a bit more distracted than I usually would be. This is an exam week for me.) Dan Larkin

Andrew Sutton

10:57 p.m.

...

Ah, that makes much more sense. I haven't looked too deeply into too many Boost libraries yet, so I'm not aware - is that iterator abstraction the preferred method over internal pointer manipulation?

Not really "internal", just pointer-like. Typically they're used for iterating -- you can tell by the name :) but they're also used in pointer-like ways. Look at the STL list interfaces, especially insert() and erase().

...

I'm used to code being written to speak for itself, but that's for when it's standing alone, not embedded in a proposal. Point taken. Once again, about to head into class so I can't take care of the revisions now, but later tonight I'll get revision 2 worked up and ready. (Also I apologize that I'm a bit more distracted than I usually would be. This is an exam week for me.)

I don't believe that code does speak for itself; let the interface show what you want and describe in the text of the proposal. Don't feel too bad about being distracted. I've been at a postdoc interview for the last 2 days. Imagine how I feel :) Andrew Sutton andrew.n.sutton@gmail.com

Dan Larkin

7 Apr 7 Apr

5:57 p.m.

Andrew Sutton wrote:

...

Not really "internal", just pointer-like. Typically they're used for iterating -- you can tell by the name :) but they're also used in pointer-like ways. Look at the STL list interfaces, especially insert() and erase().

Okay, so once again, focusing on studying for my exam tonight right now, but just doing a bit of research to get the gears turning in the background. Standard iterator functionality requires '++', '--', '==', 'begin()', and 'end()', correct? Did I miss anything? Other than that, I'm still a confused about the behavior of iterators within partially ordered data structures. It's one thing to have a pointer abstraction, but traversal seems to be an issue. In particular, if any changes to the structure are made during the traversal, the structure could re-arrange itself in such a way that individual nodes could be traversed multiple times before others are visited once, without any changes to the node value itself. This seems like poor behavior to me, and I'm not sure what to do with it. Perhaps someone could shed some light on the issue?

...

Don't feel too bad about being distracted. I've been at a postdoc interview for the last 2 days. Imagine how I feel :)

Oh man. I don't even want to think about that. D: Dan Larkin

Dan Larkin

6 p.m.

Dan Larkin wrote:

...

Okay, so once again, focusing on studying for my exam tonight right now, but just doing a bit of research to get the gears turning in the background. Standard iterator functionality requires '++', '--', '==', 'begin()', and 'end()', correct? Did I miss anything?

Silly me, there's plenty of documentation for this telling me that my original guess was quite wrong. This is what happens when I'm worrying about exams... Dan Larkin

Stewart, Robert

7:21 p.m.

Dan Larkin wrote:

...

Andrew Sutton wrote:

...
Not really "internal", just pointer-like. Typically they're used for iterating -- you can tell by the name :) but they're also used in pointer-like ways. Look at the STL list interfaces, especially insert() and erase().

Okay, so once again, focusing on studying for my exam tonight right now, but just doing a bit of research to get the gears turning in the background. Standard iterator functionality requires '++', '--', '==', 'begin()', and 'end()', correct? Did I miss anything?

It really depends upon the iterator category as to which operations are required. However, you certainly missed the dereferencing operator.

...

Other than that, I'm still a confused about the behavior of iterators within partially ordered data structures. It's one thing to have a pointer abstraction, but traversal seems to be an issue. In particular, if any changes to the structure are made during the traversal, the structure could re-arrange itself in such a way that individual nodes could be traversed multiple times before others are visited once, without any changes to the node value itself. This seems like poor behavior to me, and I'm not sure what to do with it. Perhaps someone could shed some light on the issue?

You define which container operations invalidate iterators and its up to clients to avoid those operations while iterating. _____ Rob Stewart robert.stewart@sig.com Software Engineer, Core Software using std::disclaimer; Susquehanna International Group, LLP http://www.sig.com IMPORTANT: The information contained in this email and/or its attachments is confidential. If you are not the intended recipient, please notify the sender immediately by reply and immediately delete this message and all its attachments. Any review, use, reproduction, disclosure or dissemination of this message or any attachment by an unintended recipient is strictly prohibited. Neither this message nor any attachment is intended as or should be construed as an offer, solicitation or recommendation to buy or sell any security or other financial instrument. Neither the sender, his or her employer nor any of their respective affiliates makes any warranties as to the completeness or accuracy of any of the information contained herein or that this message or any of its attachments is free of viruses.

Andrew Sutton

8 Apr 8 Apr

12:11 p.m.

...

Okay, so once again, focusing on studying for my exam tonight right now, but just doing a bit of research to get the gears turning in the background. Standard iterator functionality requires '++', '--', '==', 'begin()', and 'end()', correct? Did I miss anything?

Begin and end are specific to a Container or Range (capitalization indicates a Concept), but more or less, right. Iterators also support dereferencing (*x if x has Iterator type).

...

Other than that, I'm still a confused about the behavior of iterators within partially ordered data structures. It's one thing to have a pointer abstraction, but traversal seems to be an issue. In particular, if any changes to the structure are made during the traversal, the structure could re-arrange itself in such a way that individual nodes could be traversed multiple times before others are visited once, without any changes to the node value itself. This seems like poor behavior to me, and I'm not sure what to do with it. Perhaps someone could shed some light on the issue?

That's a good observation, and has some interesting implications. By exposing an iterator (or pointer) into the heap, you admit the possibility that a user can modify an object in such a way that the heap property will be violated. For example, *x = 5; // Oops. Let's assume that this makes *x the minimum value in the heap. If x doesn't refer to the top of the heap, then we've violated the heap property. Conceivable, you could do two things: 1. Automatically update the heap if the user changes a value thru an iterator. This is a really bad idea for a number of reasons, but mostly because it's hard to generically trap modifying writes. Second, as you've observed, it could have some unknown side-effects on other iterators into the heap. Maybe they're invalid, maybe they now point to different values. I don't know. 2. Let the use update the modified value on their own. This means that heap will, for a short time, be invalid, but there aren't any complex side effects until the programmer invokes them. Of course, if I'd read the subsequent posts, then these questions would have already been answered :) Andrew Sutton andrew.n.sutton@gmail.com

iaml

4:29 p.m.

...

2. Let the use update the modified value on their own. This means that heap will, for a short time, be invalid, but there aren't any complex side effects until the programmer invokes them.

Sorry to interrupt your discussion. I don't quite understand the method mentioned above. Do you mean postponing the work for later operations? Which means change the key in heap but not heapify them until some operation (such as insertion) is called?

Andrew Sutton

10:02 p.m.

...

...
2. Let the use update the modified value on their own. This means that heap will, for a short time, be invalid, but there aren't any complex side effects until the programmer invokes them.

...

mentioned above. Do you mean postponing the work for later operations? Which means change the key in heap but not heapify them until some operation (such as insertion) is called?

If the programmer is going to modify an "en-heaped" object (an object that's been put into a heap), then they would need to update the heap before calling any other modifying operations. Maybe this approach allows too much freedom. I suppose you could also offer a method: void update(iterator pos, const T& x); // *pos = x; update(pos); That would prevent modifying operations in between updates. Andrew Sutton andrew.n.sutton@gmail.com

Dan Larkin

9 Apr 9 Apr

2:23 a.m.

Andrew Sutton wrote:

...

Maybe this approach allows too much freedom. I suppose you could also offer a method:

void update(iterator pos, const T& x); // *pos = x; update(pos);

That would prevent modifying operations in between updates.

This runs into the reference copying issue again. This assumes that T has a well-behaved and efficient copy-constructor, and does a full replacement of x rather than allowing for a partial update. I'm not convinced that the extra safety offers enough of a benefit over allowing the user to update it on their own: *x = newValue; update(x); Or, perhaps some more complex structure that would drive my point home a bit more effectively: x->field[999] = newValue; update(x); As long as the library is explicit about the fact that updating data will temporarily void the heap property until an update() call is made on the data in question, it should work much better this way. That said, I think the issue of iterator behavior inside the heap will be quite an issue. Iterator-value association should not be a major issue to maintain (or the iterators would be rather useless otherwise). Structural changes could definitely alter the order of traversal, but again, that should be okay given adequate warning about which heap operations could result in changes. Working up another revision currently. Hopefully it'll be the final one, since it's due in less than 24 hours, but who knows. And, of course, I'll be around for plenty of extra questioning in the interim period if the proposal leaves anything out. :D Dan Larkin

Steven Watanabe

2:42 a.m.

Dan Larkin wrote:

...

Andrew Sutton wrote:

...
Maybe this approach allows too much freedom. I suppose you could also offer a method:

void update(iterator pos, const T& x); // *pos = x; update(pos);

That would prevent modifying operations in between updates.

This runs into the reference copying issue again. This assumes that T has a well-behaved and efficient copy-constructor, and does a full replacement of x rather than allowing for a partial update. I'm not convinced that the extra safety offers enough of a benefit over allowing the user to update it on their own:

*x = newValue; update(x);

Or, perhaps some more complex structure that would drive my point home a bit more effectively:

x->field[999] = newValue; update(x);

As long as the library is explicit about the fact that updating data will temporarily void the heap property until an update() call is made on the data in question, it should work much better this way.

Boost.MultiIndex uses the following interface "template<typename Modifier> bool modify(iterator position,Modifier mod); Requires: Modifier is a model of Unary Function accepting arguments of type value_type&. position is a valid dereferenceable iterator of the index..." http://www.boost.org/libs/multi_index/doc/reference/ord_indices.html#modifie... In Christ, Steven Watanabe

Dan Larkin

3:26 a.m.

Okay, pending any incredibly prompt responses, this may be the final revision for the project detail section of my proposal, seeing as it's due in about 13 hours now, and I'll be spending several of those sleeping and another couple in class. ===================================== I would like to implement a library which allows a simple, unified concept for a heap/priority queue with a common interface. Through this common interface, I would allow the user to select the specific model to use based on their needs. The standard interface would allow the following functions to be used regardless of the model: - insert - delete - merge - findMin - deleteMin - decreaseKey Furthermore, I would implement the following heap models: - D-ary - Binomial - Fibonacci - Brodal (may be replaced with a different model with similar performance, pending further research) As it stands, the current heap implementations in Boost C++ (specifically in sandbox/libs/pri_queue), take what I think to be slightly the wrong approach. Heaps are conceptually very simple structures, as are their interfaces. By nature, they have a single important element (the minimum), and the rest is treated as a mystery to the outside world. Thus, the issue of iterator access is rather tricky. Many operations can change the structure of the heap so as to invalidate a traversal. These trouble spots would need to be documented. Along the same lines, I feel like a lot of the internal structure can be simplified and memory usage minimized by keeping only pointers to elements. This may seem excessive in the case of simple data types like integers, but I believe it will lead to much simpler memory management, despite the fact that the user will need to handle some of it by themselves. This leads nicely into the use of iterators to simply encapsulate the original data pointers and give them some (arbitrary) order within the structure. This offers a mutability scheme which should perform much better than the reference-based one currently implemented in Boost. A reference-based scheme assumes efficient copy-constructors and does not allow partial updates, both issues which can be circumvented quite gracefully by a pointer-based scheme. All that said, I do believe that the existing d-heap and f-heap implementations can be used as a guide to help me in my development process, in both examining the design decisions along the way and verifying performance. Below are draft interfaces for the base Heap class and the PriorityQueue wrapper class. I've left out some of the tedious bits, such as constructors, in favor of keeping the focus on the functional interface. template <class T, class Compare = std::less<T>> class Heap { public: virtual iterator& insert( T* elem ) = 0; virtual void remove( iterator& elem ) = 0; virtual void update( iterator& elem ) = 0; virtual iterator& findMin() = 0; virtual void removeMin() = 0; virtual void merge( Heap<T,Compare> other ) = 0; virtual bool empty() = 0; virtual int size() = 0; class iterator { public: T& operator* (); T* operator-> (); iterator& operator++(); iterator& operator--(); bool operator== (iterator const& other); bool operator!= (iterator const& other); }; virtual iterator& begin() = 0; virtual iterator& end() = 0; }; template <class T, class Compare = std::less<T>> class PriorityQueue { public: iterator& push( T* elem ); iterator& pop(); void remove( iterator& elem ); void update( iterator& elem ); void merge( PriorityQueue<T,Compare> other ); bool empty(); int size(); class iterator { public: T& operator* (); T* operator-> (); iterator& operator++(); iterator& operator--(); bool operator== (iterator const& other); bool operator!= (iterator const& other); }; virtual iterator& begin() = 0; virtual iterator& end() = 0; private: Heap<T,Compare> data; }; Dan Larkin

Steven Watanabe

3:37 a.m.

AMDG Dan Larkin wrote:

...

template <class T, class Compare = std::less<T>> class Heap {

public:

virtual iterator& insert( T* elem ) = 0;

Maybe I missed something but why pure virtual functions?

...

virtual void remove( iterator& elem ) = 0; virtual void update( iterator& elem ) = 0;

Why is the iterator passed by non-const reference?

...

virtual iterator& findMin() = 0; virtual void removeMin() = 0;

top and pop would be more consistent with std::priority_queue.

...

<snip> virtual iterator& begin() = 0; virtual iterator& end() = 0;

How can these possible return a /reference/ to an iterator? In Christ, Steven Watanabe

Dan Larkin

4:10 a.m.

Steven Watanabe wrote:

...

Maybe I missed something but why pure virtual functions?

Heap is simply a concept, and should be treated as an abstract base class, I believe, and should never be instantiated. Individual models should take up all the slack of actually implementing these methods. Do I have the wrong understanding?

...

...
virtual void remove( iterator& elem ) = 0; virtual void update( iterator& elem ) = 0;

Why is the iterator passed by non-const reference?

It shouldn't be. I got tied up a bit in the different levels of dereferencing in my head and was thinking const references wouldn't allow me to do what was needed. I blame it on the algorithms exam I just took frying my brain a little too much for one night.

...

...
virtual iterator& findMin() = 0; virtual void removeMin() = 0;

top and pop would be more consistent with std::priority_queue.

findMin and removeMin (or even moreso deleteMin) are more consistent with standard heap terminology. It's a well-known group of data structures taught in curricula all over and studied since who knows when. Why conform to the terminology of a specific implementation in a programming language that's not nearly as old or well-established? I'm providing the PriorityQueue class as essentially a wrapper with the potentially more appealing push/pop terminology for this reason. Sorry if that sounded overly aggressive, it wasn't my intent.

...

...
<snip> virtual iterator& begin() = 0; virtual iterator& end() = 0;

How can these possible return a /reference/ to an iterator?

This was a careless mistake on my part. Thank you for calling me on it. :D Dan Larkin

Steven Watanabe

4:31 a.m.

AMDG Dan Larkin wrote:

...

Steven Watanabe wrote:

...
Maybe I missed something but why pure virtual functions?

Heap is simply a concept, and should be treated as an abstract base class, I believe, and should never be instantiated. Individual models should take up all the slack of actually implementing these methods. Do I have the wrong understanding?

I'm confused about whether this is a real class and what it's actually doing. Is it actually going to be the base class for the different heap implementations? In my experience, templates and virtual functions are not usually combined in this way. The normal way to specify the concept requirements is something like http://www.boost.org/doc/html/Assignable.html

...

...
...
virtual iterator& findMin() = 0; virtual void removeMin() = 0;

top and pop would be more consistent with std::priority_queue.

findMin and removeMin (or even moreso deleteMin) are more consistent with standard heap terminology. It's a well-known group of data structures taught in curricula all over and studied since who knows when. Why conform to the terminology of a specific implementation in a programming language that's not nearly as old or well-established?

Because it's the language that you're implementing it in. Consistent naming makes it easier to plug in different types (for instance, what if I wanted to write a function that could take either your heap or a standard priority_queue). findMin is also inconsistent with standard and Boost naming conventions, which use lowercase_and_underscores for everything except Concepts and MACROS.

...

I'm providing the PriorityQueue class as essentially a wrapper with the potentially more appealing push/pop terminology for this reason. Sorry if that sounded overly aggressive, it wasn't my intent.

If the different names are the only reason to separate PriorityQueue and Heap classes, then I'm strongly against it. In Christ, Steven Watanabe

Dan Larkin

5:07 a.m.

Steven Watanabe wrote:

...

I'm confused about whether this is a real class and what it's actually doing. Is it actually going to be the base class for the different heap implementations?

Yes. As you can see in PriorityQueue, it's used as a real base class. Just an abstract one.

...

...
...
...
virtual iterator& findMin() = 0; virtual void removeMin() = 0;

top and pop would be more consistent with std::priority_queue.

findMin and removeMin (or even moreso deleteMin) are more consistent with standard heap terminology. It's a well-known group of data structures taught in curricula all over and studied since who knows when. Why conform to the terminology of a specific implementation in a programming language that's not nearly as old or well-established?

Because it's the language that you're implementing it in. Consistent naming makes it easier to plug in different types (for instance, what if I wanted to write a function that could take either your heap or a standard priority_queue). findMin is also inconsistent with standard and Boost naming conventions, which use lowercase_and_underscores for everything except Concepts and MACROS.

Boost naming conventions are a good reason to change the name. I haven't done my full research on this subject yet, so I simply used my own standard camel-case convention. That said, I still think that the naming of the methods should stick to the algorithms rather than the STL. (Just to play the devil's advocate, there should be little or no reason to have function take either the STL priority_queue or my heap since the d-ary heap and priority_queue should have the same exact performance. Only reason would be if you wanted one of the other heaps, in which case pass a heap and forget the priority_queue.)

...

...
I'm providing the PriorityQueue class as essentially a wrapper with the potentially more appealing push/pop terminology for this reason. Sorry if that sounded overly aggressive, it wasn't my intent.

If the different names are the only reason to separate PriorityQueue and Heap classes, then I'm strongly against it.

Well, it's a different interface, and more importantly allows a pluggable structure. That seems like reason enough to me, but I'm certainly open to being persuaded.

...

This is dubious W.R.T. exception safety, because if anything between the change of the value and the call to update throws or if update itself throws, the invariants of the heap will be broken.

This is certainly an issue, and one that has been discussed at length previously in the conversation (though not specifically with regard to exception safety). In short, yes. Any mutability will involve the heap properties being violated for a period of time. This particular implementation will rely on the user to be somewhat careful about maintaining their updates and, as you pointed out, handling exceptions. Dan Larkin Dan Larkin

Andrew Sutton

1:01 p.m.

Steven Watanabe wrote:

...

...
Maybe I missed something but why pure virtual functions?

Heap is simply a concept, and should be treated as an abstract base class, I believe, and should never be instantiated. Individual models should take up all the slack of actually implementing these methods. Do I have the wrong understanding?

I know that Steven addresses some of the issues in previous emails, but I thought I'd also add my own suggestion. Forget about concepts and abstract classes for now. Let each kind of heap be its own heap. If you try to over-generalize too early, you'll probably end up writing a lot of weird adaptive corner cases. After you have the

...

virtual iterator& findMin() = 0;

...
...
virtual void removeMin() = 0;

top and pop would be more consistent with std::priority_queue.

...

findMin and removeMin (or even moreso deleteMin) are more consistent with standard heap terminology. It's a well-known group of data structures taught in curricula all over and studied since who knows when. Why conform to the terminology of a specific implementation in a programming language that's not nearly as old or well-established? I'm providing the PriorityQueue class as essentially a wrapper with the potentially more appealing push/pop terminology for this reason. Sorry if that sounded overly aggressive, it wasn't my intent.

Because using greater<> as a comparator instead of less<> changes the semantics of "min" to the semantics of "max", which makes findMin and removeMin a little disingenuous at best. Really, the "top" of the heap is either the min or max depending how the heap is parameterized. You could probably make a case for "remove" instead of "pop", since heaps aren't really queue's unless they're being used as priority queues.

...

virtual iterator& begin() = 0;

...
...
virtual iterator& end() = 0;

How can these possible return a /reference/ to an iterator?

This was a careless mistake on my part. Thank you for calling me on it. :D

Same with insert(). Iterators are almost always passed by value, and if you're not passing by value then you probably need to make a very compelling case for not doing so. Welcome to Boost :) I hope we're not discouraging you. This is very good discussion. Andrew Sutton andrew.n.sutton@gmail.com

Dan Larkin

1:29 p.m.

Andrew Sutton wrote:

...

I know that Steven addresses some of the issues in previous emails, but I thought I'd also add my own suggestion. Forget about concepts and abstract classes for now. Let each kind of heap be its own heap. If you try to over-generalize too early, you'll probably end up writing a lot of weird adaptive corner cases. After you have the

Honestly, I think iterator behavior is the only place where that is even an issue. The other functions are well-defined for any heap model and shouldn't be any trouble. That said, without a unified concept, this project loses most, if not it's entire, purpose. The concept is almost as important to making the library useful as having the heaps themselves.

...

Because using greater<> as a comparator instead of less<> changes the semantics of "min" to the semantics of "max", which makes findMin and removeMin a little disingenuous at best. Really, the "top" of the heap is either the min or max depending how the heap is parameterized.

You could probably make a case for "remove" instead of "pop", since heaps aren't really queue's unless they're being used as priority queues.

Ah, that's a much more compelling argument. I had been having this argument with myself internally, but decided to stick to the min-convention due to the default behavior. I think that just won the case for using top-based vocabulary.

...

Welcome to Boost :) I hope we're not discouraging you. This is very good discussion.

Not at all. I'd much rather be torn a new one (pardon the imagery) than coddled. You learn a lot more and get a lot more done that way. Well, provided it doesn't devolve into a petty flame war that is. ;) Dan Larkin

Andrew Sutton

1:48 p.m.

...

Honestly, I think iterator behavior is the only place where that is even an issue. The other functions are well-defined for any heap model and shouldn't be any trouble.

That said, without a unified concept, this project loses most, if not it's entire, purpose. The concept is almost as important to making the library useful as having the heaps themselves.

As the author of the original project idea, I am going to disagree with you :) Developing concepts is not a top-down process, nor is it a "chicken and the egg" problem. One way to look at concepts is that they describe common features of related kinds of types, although there is a much more precise way of saying this. How do you know the commonalities of heaps and/or queues if you haven't implemented any? It's always possible to guess, but why spend the time? The project is to implement data structures, not describe them. Andrew Sutton andrew.n.sutton@gmail.com

Dan Larkin

3:36 p.m.

Andrew Sutton wrote:

...

Developing concepts is not a top-down process, nor is it a "chicken and the egg" problem. One way to look at concepts is that they describe common features of related kinds of types, although there is a much more precise way of saying this. How do you know the commonalities of heaps and/or queues if you haven't implemented any?

It's always possible to guess, but why spend the time? The project is to implement data structures, not describe them.

Well, maybe I'm taking a non-standard development approach, but for me the design process goes something like this: - Research algorithm/structure - Determine required functionality - Design interface to reflect the relationship between the theory and the implementation If the previously established theory (which describes the structure) groups a set of structures into a concept and group of "models" (sure, they're still concepts themselves, but...) that all have a well defined set of commonalities, it's not much of an issue to determine what they are, is it? And doesn't a good design from the beginning make the development process easier as time progresses? You end up with a much more coherent library if you have a well-designed interface to which the implementation conforms rather than an implementation that has an interface slapped on top. And of course you can then define an interface later and adapt all the pre-existing structures to match it, but again - if you know at least approximately what the interface should look like, why not start with it in mind? It keeps the code much cleaner. I know that I can't say I've implemented all the heaps. Nor can I say that I know exactly line for line how I will do so. But I can say that I know approximately what functionality I want out of them, and this is what I am trying to reflect with the abstract base class. It is still a draft interface, and subject to change as I begin implementing the various structures and seeing how differences may help to make the library better. But I think it does need to exist from a design decision standpoint. Dan Larkin

Dan Larkin

4:24 p.m.

I guess I should also say that I do understand that writing concepts is not a top-down process in general; however, in this case, given that heaps are well-documented theoretically and that Boost already has a few different heap implementations lacking a proper concept (or perhaps rather needing an improved one is more accurate), it's not entirely top-down here either. In this case, writing a concept first is partially top-down from the theory perspective and partially bottom-up from the learning from other implementations perspective. From that concept then, which should be relatively balanced and involve much less guesswork than usual due to the bidirectional approach, can be used as top-down guidance for library development. There, I think that got my point across a little better than my impassioned rambling from earlier this morning. Hopefully that thought process at least makes sense, even if you don't agree with it? Dan Larkin

Steven Watanabe

4:44 a.m.

AMDG Dan Larkin wrote:

...

*x = newValue; update(x);

Or, perhaps some more complex structure that would drive my point home a bit more effectively:

x->field[999] = newValue; update(x);

As long as the library is explicit about the fact that updating data will temporarily void the heap property until an update() call is made on the data in question, it should work much better this way.

This is dubious W.R.T. exception safety, because if anything between the change of the value and the call to update throws or if update itself throws, the invariants of the heap will be broken. In Christ, Steven Watanabe

Daniel Mahler

6 Apr 6 Apr

6:08 p.m.

Hi Dan, On Tue, Apr 6, 2010 at 5:29 AM, Dan Larkin <danielhlarkin@gmail.com> wrote:

...

Okay. Here's draft revision 1:

I would like to implement a library which allows a simple, unified concept for a heap/priority queue with a common interface. Through this common interface, I would allow the user to select the specific model to use based on their needs.

On thing to consider is making it an extension of BGL's Buffer concept http://www.boost.org/doc/libs/1_42_0/libs/graph/doc/Buffer.html. I am not sure how widely it is used, but compatibility should never hurt. cheers D

...

The standard interface would allow the following functions to be used regardless of the model: - insert - delete - merge - findMin - deleteMin - decreaseKey

Furthermore, I would implement the following heap models: - D-ary - Binomial - Fibonacci - Brodal (may be replaced with a different model with similar performance, pending further research)

As it stands, the current heap implementations in Boost C++ (specifically in sandbox/libs/pri_queue), take what I think to be the wrong approach. Heaps are conceptually very simple structures, as are their interfaces. By nature, they have a single important element (the minimum), and the rest is treated as a mystery to the outside world. I feel like the inclusion of iterators is unnecessary, and possibly misguided. This may be a lack of understanding on my part, but I'm frankly confused by how they apply to a heap structure. In any case, I also feel like a lot of the internal structure can be simplified, and memory usage minimized, by keeping only pointers to elements. This may seem excessive in the case of simple data types like integers, but I believe it will lead to much simpler memory management, despite the face that the user will need to handle some of it by themselves. That said, I do believe that the existing d-heap and f-heap implementations can be used as a guide to help me in my development process, in both examining the design decisions and verifying performance.

Below are draft interfaces for the base Heap class and the PriorityQueue wrapper class. I've left out some of the tedious bits, such as constructors, in favor of keeping the focus on the functional interface.

/** * Abstract base class specifying a simple, but fully-featured min-heap * interface. To simplify internal structure, ease the process of making * partial updates to potentially large member elements, and minimize memory * usage, only element pointers are stored, rather than actual data. * * @tparam T Type of elements stored in the heap * @tparam Compare Comparator for type T used for heap-ordering. Defaults to * standard '<' operator. */ template <class T, class Compare = std::less<T>> class Heap {

public:

/** * Inserts a new element into the heap. * * @param elem Pointer to element to insert */ virtual void insert( T* elem ) = 0;

/** * Removes an arbitrary element from the heap and corrects the ordering * if needed. * * @param elem Pointer to element to remove */ virtual void remove( T* elem ) = 0;

/** * Signals the heap that a given element has been updated. The heap may * need to be updated to guarantee a correct ordering. * * @param elem Pointer to element which has been updated */ virtual void update( T* elem ) = 0;

/** * Returns the pointer to the minimum element in the heap. */ virtual T* findMin() = 0;

/** * Removes the minimum element from the heap. */ virtual void removeMin() = 0;

/** * Merges two heaps with the same type of elements. * * @param other Secondary heap to be merged */ virtual void merge( Heap<T,Compare> other ) = 0;

/** * Returns true if no elements remain in the heap, false otherwise. */ virtual bool empty() = 0;

/** * Returns the number of elements in the heap. */ virtual int size() = 0;

};

/** * A flexible priority queue which allows specification of the underlying data * structure upon construction, chosen from the different implementations of * boost::Heap. The default structure is a binary heap. * * @tparam T Type of elements stored in the queue * @tparam Compare Comparator of type T for priority ordering. Defaults to * standard '<' operator. */ template <class T, class Compare = std::less<T>> class PriorityQueue {

public:

/** * Adds a new element to the queue. * * @param elem Pointer to element to push */ void push( T* elem );

/** * Retrieves the pointer to the minimum element in the queue and returns * it, removing the element from the queue afterwards. */ T* pop();

/** * Removes an arbitrary element from the queue and updates the ordering if * needed. * * @param elem Pointer to element to remove */ void remove( T* elem );

/** * Signals the priority queue that a given element has been updated. The * queue may need to be updated to guarantee a correct ordering. * * @param elem Pointer to element which has been updated */ void update( T* elem );

/** * Merges two priority queues with the same type of elements. * * @param other Secondary heap to merge */ void merge( PriorityQueue<T,Compare> other );

/** * Returns true if no elements remain in the queue, false otherwise. */ bool empty();

/** * Returns the number of elements in the heap. */ int size();

private:

//! Heap containing the elements of the priority queue. Responsible for //! maintaining minimum element and providing other I/O functionality. Heap<T,Compare> data;

};

(I apologize if the code looks a bit ugly here, but it should fix itself when copied to a text editor.)

Once again, feedback is welcome. If my interpretation of existing code or the project idea is wrong, please, please point out my stupidity for my sake? Thanks.

Dan Larkin

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Andrew Sutton

7 Apr 7 Apr

10:58 a.m.

...

...
I would like to implement a library which allows a simple, unified concept for a heap/priority queue with a common interface. Through this common interface, I would allow the user to select the specific model to use based on their needs.

On thing to consider is making it an extension of BGL's Buffer concept http://www.boost.org/doc/libs/1_42_0/libs/graph/doc/Buffer.html. I am not sure how widely it is used, but compatibility should never hurt.

I wouldn't focus too much on the Buffer concept as a target for an common interface. I think it's more important to get the data structure interfaces individually correct first. Andrew Sutton andrew.n.sutton@gmail.com

5595

Age (days ago)

5611

Last active (days ago)

List overview

Download

34 comments

7 participants

participants (7)

Andrew Sutton
Dan Larkin
Daniel Mahler
iaml
Rajendran Thirupugalsamy
Steven Watanabe
Stewart, Robert