
All,

I'd like to announce the beginning of the review of the Heap library, written by Tim Blechmann and resulting from Tim's 2010 Google Summer of Code project. The library implements a number of different heaps (or priority queues), including a binary heap, d-ary heap, binomial heap, Fibonacci heap, pairing heap, and skew heap. The library also provides an adaptor to make heaps mutable, allowing a programmer to modify a value and update its position within the heap. The library also supports stability, ensuring that elements pushed into a heap with the same priority are popped in the order they were pushed. The library is not targeted at any specific domain, nor is it directly related to any specific libraries; it is intended to be a collection of general-purpose data structures (not unlike the Boost.Unordered library).

The documentation and source code can be downloaded from here:

Documentation: http://tim.klingt.org/boost_heap
Code: http://tim.klingt.org/git?p=boost_heap.git;a=snapshot;h=HEAD;sf=tgz

Unlike previous reviews, this review will be assisted by a code review application, Code Collaborator (by SmartBear). The product was tested during a BoostCon session, and we are eager to see how well it scales to a community review. You can use the link below to access and review the Heap library.

http://demo.smartbear.com/boost/go?page=ReviewDisplay&reviewid=4

The application allows line-by-line comments on the submitted source code in addition to providing features for general defect and issue reporting. If you are interested in using this system, we will have to register you as a user of the site. You can request access by replying to this message. I'm still learning my way around the system, so there are bound to be some missteps along the way :) If you are not interested in using the system, then feel free to send your review by email as with any other review.

Reviews of the library should address the following questions.

- What is your evaluation of the design?
- What is your evaluation of the implementation?
- What is your evaluation of the documentation?
- What is your evaluation of the potential usefulness of the library?
- Did you try to use the library? With what compiler? Did you have any problems?
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
- Are you knowledgeable about the problem domain?

And finally, every review should answer this question: Do you think the library should be accepted as a Boost library? Be sure to say this explicitly so that your other comments don't obscure your overall opinion. Please note that some questions may be easier to answer in email than with the review tool.

As always, reviews are important to the continued quality of the Boost C++ Libraries. Thank you in advance for your comments and contributions.

Andrew

PS. The review was slated to start on Sunday, but I was not able to make the announcement until today.

On 31-05-2011 16:05, Andrew Sutton wrote:
All,
I'd like to announce the beginning of the review of the Heap library, written by Tim Blechmann
A quick glance. On the surface it looks really good. For the documentation, I suggest that pre- and postconditions are more clearly stated, and that the implementation checks these with assertions, e.g. for increase()/decrease() (didn't look at the source). Please also specify clearly what the requirements are for T on which the containers are templated. Is the implementation maximally generic in this area? Are the benchmarks operating on the exact same data? I'm fine with using random data as long as the same elements are in the queues being compared. Similarly, one random sequence used in push comparisons. I couldn't tell if that is already the case. Something to think about: how difficult would it be to refactor the code such that it can be used like today, but also as intrusive containers (cf. Boost.Intrusive)? When one needs many small priority queues, I bet intrusive containers are just damn fast. Good work -Thorsten

On 31-05-2011 16:54, Thorsten Ottosen wrote:
On 31-05-2011 16:05, Andrew Sutton wrote:
All,
I'd like to announce the beginning of the review of the Heap library, written by Tim Blechmann
A quick glance. On the surface it looks really good.
Also, if it is not already possible, those heaps that use a std::vector or similar internally should have this data structure as a template argument so that we can easily change it. In some applications it might even be possible to use boost::array<T,N>. -Thorsten

I'd like to announce the beginning of the review of the Heap library, written by Tim Blechmann
A quick glance. On the surface it looks really good.
Also, if it is not already possible, those heaps that use a std::vector or similar internally should have this data structure as a template argument so that we can easily change it. In some applications it might even be possible to use boost::array<T,N>.
hm ... not sure about this: the implementation requires push_back/pop_back (which are not available for boost::array) and random access iterators. the std::vector can be configured to use a different allocator via its template parameters and one can use `reserve' to reserve memory for the vector ... so i do not really see a specific use case, where one really wants to use a different container, or do you have a specific case? cheers, tim

On 01-06-2011 15:11, Tim Blechmann wrote:
I'd like to announce the beginning of the review of the Heap library, written by Tim Blechmann
A quick glance. On the surface it looks really good.
Also, if it is not already possible, those heaps that use a std::vector or similar internally should have this data structure as a template argument so that we can easily change it. In some applications it might even be possible to use boost::array<T,N>.
hm ... not sure about this: the implementation requires push_back/pop_back (which are not available for boost::array) and random access iterators. the std::vector can be configured to use a different allocator via its template parameters and one can use `reserve' to reserve memory for the vector ...
so i do not really see a specific use case, where one really wants to use a different container, or do you have a specific case?
You might know the max size of your heap, and so you can make a simple wrapper around an array. Other cases would be to use something like stlsoft::auto_buffer. -Thorsten

I'd like to announce the beginning of the review of the Heap library, written by Tim Blechmann
A quick glance. On the surface it looks really good.
Also, if it is not already possible, those heaps that use a std::vector or similar internally should have this data structure as a template argument so that we can easily change it. In some applications it might even be possible to use boost::array<T,N>.
hm ... not sure about this: the implementation requires push_back/pop_back (which are not available for boost::array) and random access iterators. the std::vector can be configured to use a different allocator via its template parameters and one can use `reserve' to reserve memory for the vector ...
so i do not really see a specific use case, where one really wants to use a different container, or do you have a specific case?
You might know the max size of your heap, and so you can make a simple wrapper around an array. Other cases would be to use something like stlsoft::auto_buffer.
one could use a `container' argument similar to the std::priority_queue ... but i somehow like the way to pass the allocator as template argument (mainly for consistency among the different heaps) ... but of course, it would be possible to catch this at compile time ... so that either a container or an allocator is passed, but not both ... tim

On 01-06-2011 18:14, Tim Blechmann wrote:
so i do not really see a specific use case, where one really wants to use a different container, or do you have a specific case?
You might know the max size of your heap, and so you can make a simple wrapper around an array. Other cases would be to use something like stlsoft::auto_buffer.
one could use a `container' argument similar to the std::priority_queue ... but i somehow like the way to pass the allocator as template argument (mainly for consistency among the different heaps) ... but of course, it would be possible to catch this at compile time ... so that either a container or an allocator is passed, but not both ...
Passing an allocator is not nearly as powerful a feature as passing the container. The same goes for flat_set/flat_map or circular_buffer, which I have argued should have such an argument too. -Thorsten

one could use a `container' argument similar to the std::priority_queue ... but i somehow like the way to pass the allocator as template argument (mainly for consistency among the different heaps) ... but of course, it would be possible to catch this at compile time ... so that either a container or an allocator is passed, but not both ...
Passing an allocator is not nearly as powerful a feature as passing the container.
nope ... but i would like to ensure API consistency between node-based heaps and container adaptors. cheers, tim

On 01-06-2011 19:27, Tim Blechmann wrote:
one could use a `container' argument similar to the std::priority_queue ... but i somehow like the way to pass the allocator as template argument (mainly for consistency among the different heaps) ... but of course, it would be possible to catch this at compile time ... so that either a container or an allocator is passed, but not both ...
Passing an allocator is not nearly as powerful a feature as passing the container.
nope ... but i would like to ensure API consistency between node-based heaps and container adaptors
Why? This is not exactly the same as defining a generic algorithm in terms of iterators. Lots of containers have different numbers of template arguments. I fail to see what benefits this "consistency" gives us. -Thorsten

one could use a `container' argument similar to the std::priority_queue ... but i somehow like the way to pass the allocator as template argument (mainly for consistency among the different heaps) ... but of course, it would be possible to catch this at compile time ... so that either a container or an allocator is passed, but not both ...
Passing an allocator is not nearly as powerful a feature as passing the container.
nope ... but i would like to ensure API consistency between node-based heaps and container adaptors
Why? This is not exactly the same as defining a generic algorithm in terms of iterators. Lots of containers have different numbers of template arguments. I fail to see what benefits this "consistency" gives us.
a user who wants to find the best implementation for a specific application can simply replace the type name ... my proposal above would imply that either a container or an allocator can be specified: give an allocator if you just want to use a custom allocator (consistent among all heaps) or give a container if you want to specify a custom container (consistent among container adaptors). what's wrong with this approach? tim
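To make the either-or dispatch Tim describes concrete, here is a rough sketch of how it could be resolved at compile time. The trait, the detection test via a nested rebind template, and all names are illustrative assumptions, not code from the submitted library.

    #include <memory>
    #include <vector>

    // Crude allocator detection: (pre-C++11) allocators carry a nested
    // rebind template, whereas containers generally do not.
    template <typename A>
    struct is_allocator
    {
        typedef char yes;
        typedef char (&no)[2];

        template <typename U>
        static yes test(typename U::template rebind<int>::other*);
        template <typename U>
        static no  test(...);

        static const bool value = sizeof(test<A>(0)) == sizeof(yes);
    };

    // Pick the underlying container: use the argument directly if it is a
    // container, or std::vector<T, Arg> if the argument is an allocator.
    template <typename T, typename Arg, bool IsAlloc = is_allocator<Arg>::value>
    struct select_container                 // Arg is a container
    {
        typedef Arg type;
    };

    template <typename T, typename Arg>
    struct select_container<T, Arg, true>   // Arg is an allocator
    {
        typedef std::vector<T, Arg> type;
    };

    // select_container<int, std::allocator<int> >::type is std::vector<int, std::allocator<int> >
    // select_container<int, std::deque<int> >::type     is std::deque<int>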

Also, if it is not already possible, those heaps that use a std::vector or similar internally should have this data structure as a template argument so that we can easily change it. In some applications it might even be possible to use boost::array<T,N>.
hm ... not sure about this: the implementation requires push_back/pop_back (which are not available for boost::array) and random access iterators. the std::vector can be configured to use a different allocator via its template parameters and one can use `reserve' to reserve memory for the vector ...
so i do not really see a specific use case, where one really wants to use a different container, or do you have a specific case?
I think the data structure *could* be adapted to use an array, but the array would have to be adapted to emulate a back insertion sequence. There may be cases where using a deque gives better performance than a vector, especially if the value_type is large (say, > 512B). This could be a moot point since priority queues are almost always used indirectly (with pointers, handles, or descriptors). Those are typically in the 4-32 byte range. I think there's value in considering parameterizing over the container type. There are probably use cases where bounding the heap size is useful. I'd like to know about these use cases before committing to make the change, however. Andrew
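As a rough illustration of the adaptation Andrew mentions, a fixed-capacity buffer only has to expose the operations Tim listed (push_back/pop_back and random access iterators). The name and interface subset below are assumptions for illustration only, and the sketch requires T to be default-constructible and assignable; it is not part of the proposed library.

    #include <cstddef>
    #include <cassert>
    #include <boost/array.hpp>

    // Minimal fixed-capacity back-insertion sequence over boost::array.
    template <typename T, std::size_t N>
    class static_vector
    {
        boost::array<T, N> data_;   // fixed storage
        std::size_t size_;          // number of live elements

    public:
        typedef T value_type;
        typedef typename boost::array<T, N>::iterator iterator;

        static_vector() : size_(0) {}

        void push_back(const T& x) { assert(size_ < N); data_[size_++] = x; }
        void pop_back()            { assert(size_ > 0); --size_; }

        T&       operator[](std::size_t i)       { return data_[i]; }
        const T& operator[](std::size_t i) const { return data_[i]; }
        T&       front()                          { return data_[0]; }

        std::size_t size()  const { return size_; }
        bool        empty() const { return size_ == 0; }

        iterator begin() { return data_.begin(); }
        iterator end()   { return data_.begin() + size_; }
    };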

On 01-06-2011 15:49, Andrew Sutton wrote:
I think there's value in considering parameterizing over the container type. There are probably use cases where bounding the heap size is useful. I'd like to know about these use cases before committing to make the change, however.
The cases are abundant. So many problems are only tractable for small input sizes. In my PhD work I have algorithms that will need many small priority queues. But even for larger sizes, it is still useful to pass e.g. stlsoft::auto_buffer as a container. An allocator is not nearly as powerful. Also, take a look at the STL from Electronic Arts ... they provide this feature for several types, e.g. circular_buffer. -Thorsten

hi thorsten,
For the documentation, I suggest that pre- and postconditions are more clearly stated,
what kind of pre- and postconditions are you referring to? do you have anything specific in mind? or is there a library which does this in a very good way?
and that the implementation checks these with assertions, e.g. for increase()/decrease() (didn't look at the source).
... this is a good point ... i should probably add some sanity checks
Please also specify clearly what the requirements are for T on which the containers are templated. Is the implementation maximally generic in this area?
i do not really see any restriction on the held type ...
Are the benchmarks operating on the exact same data? I'm fine with using random data as long as the same elements are in the queues being compared. Similarly, one random sequence used in push comparisons. I couldn't tell if that is already the case.
it is the same sequence of random values read from a pre-filled array
Something to think about: how difficult would it be to refactor the code such that it can be used like today, but also as intrusive containers (cf. Boost.Intrusive)? When one needs many small priority queues, I bet intrusive containers are just damn fast.
during last gsoc i discussed this with my mentor, who suggested focusing on non-intrusive implementations. in general, there are two types of implementations: node-based and container adaptors. the performance characteristics really depend on the size of your nodes and on your use case ... cheers, tim

On 01-06-2011 15:04, Tim Blechmann wrote:
hi thorsten,
For the documentation, I suggest that pre- and postconditions are more clearly stated,
what kind of pre- and postconditions are you referring to? do you have anything specific in mind? or is there a library which does this in a very good way?
and that the implementation checks these with assertions, e.g. for increase()/decrease() (didn't look at the source).
... this is a good point ... i should probably add some sanity checks
I couldn't find a precondition in the docs for e.g. increase().
Please also specify clearly what the requirements are for T on which the containers are templated. Is the implementation maximally generic in this area?
i do not really see any restriction on the held type ...
? surely it must support a linear ordering? operator<? -Thorsten

what kind of pre- and postconditions are you referring to? do you have anything specific in mind? or is there a library which does this in a very good way?
and that the implementation checks these with assertions, e.g. for increase()/decrease() (didn't look at the source).
... this is a good point ... i should probably add some sanity checks
I couldn't find a precondition in the docs for e.g. increase().
the concepts -> mutability section of the documentation describes this API in detail, but i will try to improve the documentation in the class reference ...
Please also specify clearly what the requirements are for T on which the containers are templated. Is the implementation maximally generic in this area?
i do not really see any restriction on the held type ...
? surely it must support a linear ordering? operator<?
well, i meant, i do not see any non-trivial restriction on the held type ;) of course it must support a total order cheers, tim

On 01-06-2011 18:24, Tim Blechmann wrote:
what kind of pre- and postconditions are you referring to? do you have anything specific in mind? or is there a library which does this in a very good way?
and that the implementation checks these with assertions, e.g. for increase()/decrease() (didn't look at the source).
... this is a good point ... i should probably add some sanity checks
I couldn't find a precondition in the docs for e.g. increase().
the concepts -> mutability section of the documentation describes this API in detail, but i will try to improve the documentation in the class reference ...
Also, I presume pop() requires !empty(). /All/ functions should also have a throws clause, perhaps conditionally on the held type. In this context, exception-safety guarantees should also be stated. Additionally, each function should state if it invalidates iterators and handles (but only the ones that need to state it). This is really needed. -Thorsten

the concepts -> mutability section of the documentation describes this API in detail, but i will try to improve the documentation in the class reference ...
Also, I presume pop() requires !empty(). /All/ functions should also have a throws clause, perhaps conditionally on the held type. In this context, exception-safety guarantees should also be stated. Additionally, each function should state if it invalidates iterators and handles (but only the ones that need to state it).
will have a look ... in general boost.heap does not throw any exceptions by itself, but of course the held type, the container or the allocator can throw ... tim

Unlike previous reviews, this review will be assisted by a code review application, Code Collaborator (by SmartBear). The product was tested during a BoostCon session, and we are eager to see how well it scales to a community review. You can use the link below to access and review the Heap library.
http://demo.smartbear.com/boost/go?page=ReviewDisplay&reviewid=4
Just a quick addendum on this topic... If you are interested in reviewing, you should register as a user here:

http://demo.smartbear.com/boost/

You can be formally added as a reviewer to any current review by the author or (I think) any other reviewer. I may be wrong about the latter, but I was able to add another reviewer this morning. Andrew

Andrew Sutton wrote:
Unlike previous reviews, this review will be assisted by a code review application, Code Collaborator (by SmartBear). The product was tested during a BoostCon session, and we are eager to see how well it scales to a community review. You can use the link below to access and review the Heap library.
<http://demo.smartbear.com/boost/go?page=ReviewDisplay&reviewid=4>
Just a quick addendum on this topic... If you are interested in reviewing, you should register as a user here:
http://demo.smartbear.com/boost/
You can be formally added as a reviewer to any current review by the author or (I think) any other reviewer. I may be wrong about the latter, but I was able to add another reviewer this morning.
Two additional points:

1. Most participating via Code Collaborator should be "observers," not "reviewers." The latter are expected to "accept" each comment and defect. My suggestion is that the review manager be the sole "reviewer."

2. Code Collaborator should not be used for design level discussion or for the final review message that answers the traditional questions, including whether to accept the library. Instead, Code Collaborator is for commenting on, or marking defects against, particular lines of code or documentation. There is a General Chat area, but that should be used for comments regarding the content actually uploaded, something missing that should have been uploaded, or summary comments on most of the code or documentation. Higher level discussions and formal reviews should be posted to this list as in the past.

A tool like Code Collaborator, Crucible, etc. that supported threaded discussions and formal reviews would be ideal, but we don't have such a tool at present.

Rob Stewart

These are good observations.
Two additional points:
1. Most participating via Code Collaborator should be "observers," not "reviewers." The latter are expected to "accept" each comment and defect. My suggestion is that the review manager be the sole "reviewer."
I agree with this in principle, but I'm curious to hear if there are any dissenting opinions :)
2. Code Collaborator should not be used for design level discussion or for the final review message that answers the traditional questions, including whether to accept the library.
I find the communication features of Code Collaborator lacking for precisely this reason and would also prefer to discuss and debate high-level design issues on the mailing list. Andrew

Andrew Sutton wrote:
These are good observations.
Two additional points:
1. Most participating via Code Collaborator should be "observers," not "reviewers." The latter are expected to "accept" each comment and defect. My suggestion is that the review manager be the sole "reviewer."
I agree with this in principle, but I'm curious to hear if there are any dissenting opinions :)
I find it interesting that the reviewer can accept the responses the author gives to his or her own comments or defects. Once the review is finished, the review manager can change the role of all the other reviewers to observers. This gives the review manager an idea of how many issues have not been addressed.
2. Code Collaborator should not be used for design level discussion or for the final review message that answers the traditional questions, including whether to accept the library.
I find the communication features of Code Collaborator lacking for precisely this reason and would also prefer to discuss and debate high-level design issues on the mailing list.
I have not managed to copy and paste from the source into the comments. Is there a way? If not, this is quite limiting. Best, Vicente

Andrew Sutton wrote:
I have not managed to copy and paste from the source into the comments. Is there a way? If not, this is quite limiting.
I tried to copy some code into an email and couldn't seem to figure it out...
The fancy behavior of the code pane, which synchronizes the comments pane to the line clicked, is typically a good thing. Unfortunately, there is no "copy" mode to disable that behavior. However, the "[Download]" links above the code provide a means to get the code on your machine so you can copy what you wish. It isn't ideal, but it is functional.

Rob Stewart

On 6/1/2011 10:14 AM, Stewart, Robert wrote:
Andrew Sutton wrote:
Unlike previous reviews, this review will be assisted by a code review application, Code Collaborator (by SmartBear). The product was tested during a BoostCon session, and we are eager to see how well it scales to a community review. You can use the link below to access and review the Heap library.
<http://demo.smartbear.com/boost/go?page=ReviewDisplay&reviewid=4>
Just a quick addendum on this topic... If you are interested in reviewing, you should register as a user here:
http://demo.smartbear.com/boost/
You can be formally added as a reviewer to any current review by the author or (I think) any other reviewer. I may be wrong about the latter, but I was able to add another reviewer this morning.
Two additional points:
1. Most participating via Code Collaborator should be "observers," not "reviewers." The latter are expected to "accept" each comment and defect. My suggestion is that the review manager be the sole "reviewer."
2. Code Collaborator should not be used for design level discussion or for the final review message that answers the traditional questions, including whether to accept the library. Instead, Code Collaborator is for commenting on, or marking defects against, particular lines of code or documentation. There is a General Chat area, but that should be used for comments regarding the content actually uploaded, something missing that should have been uploaded, or summary comments on most of the code or documentation. Higher level discussions and formal reviews should be posted to this list as in the past.
A tool like Code Collaborator, Crucible, etc. that supported threaded discussions and formal reviews would be ideal, but we don't have such a tool at present.
I agree with your comments above, especially this last. When I looked at Code Collaborator I was sort of amazed that the main thing missing from the tool was the ability to have comments as the main part of the review, with code review being just a part of the review. Somehow I feel they put things opposite to the way a real review works, but they seem to view reviews as code collaborations and not actual reviews/comments about software. Oh well, each tool to its own specifications, but it was definitely not for me or my idea of what a review of software is all about. I will say no more since comments under this heading should really be about the Heaps library and not about Code Collaborator.

I'd like to announce the beginning of the review of the Heap library,
I have a couple of issues that I'd like to see addressed, but I'd like to discuss them rather than just noting them in the review app.

One of my primary objections to the current design is the use of unspecified policy parameters in the template parameter list of the main data structures. I think that it is important to limit the number of top-level template parameters to only non-policy parameters. I would recommend encapsulating policies in a single top-level parameter, for example:

    template<typename T, typename Policies = default_b_heap_policies<T>>
    class b_heap { ... };

where default_b_heap_policies determines the set of ancillary parameters to the instantiation of the data structure. It could be a template alias to a more concrete specification.

There are two main reasons I suggest this. First, it simplifies the interface for general usage. There are only two parameters, their order is well specified, you can describe each parameter with concepts (probably), and general usage can probably ignore the second parameter. Second, it protects the template interface from later design decisions. For example, if you accept Thorsten's recommendation to parameterize over the container type, you don't have to add a new top-level template parameter; you would only have to modify the default policies and the metafunctions that query for it.

Andrew

I'd like to announce the beginning of the review of the Heap library,
I have a couple of issues that I'd like to see addressed, but I'd like to discuss them rather than just noting them in the review app.
One of my primary objections to the current design is the use of unspecified policy parameters in the template parameter list of the main data structures. I think that it is important to limit the number of top-level template parameters to only non-policy parameters.
of course this can be discussed ... i somehow like this explicit way of specifying parameters, but this is probably my personal preference ... i think it would be great to have a clear policy among all (or all new) boost libraries ... then this can be discussed once and only once ... i would prefer to have clear interface guidelines instead of arguing about personal preferences :)
Second, it protects the template interface from later design decisions. For example, if you accept Thorsten's recommendation to parameterize over the container type, you don't have to add a new top-level template parameter; you would only have to modify the default policies and the metafunctions that query for it.
... i haven't really worked with c++0x, yet ... but isn't this addressed by variadic templates? cheers, tim

One of my primary objections to the current design is the use of unspecified policy parameters in the template parameter list of the main data structures. I think that it is important to limit the number of top-level template parameters to only non-policy parameters.
of course this can be discussed ... i somehow like this explicit way of specifying parameters, but this is probably my personal preference ...
But they're not explicit; the parameters are entirely too vague. If I look at the class definition, I should be able to generate some reasonable hypothesis about the kinds of types accepted by the template parameters.
i think it would be great to have a clear policy among all (or all new) boost libraries ... then this can be discussed once and only once ... i would prefer to have clear interface guidelines instead of arguing about personal preferences :)
There's a lot of value in having a consistent design style, but trying to pin one down for all of Boost may be a little bit beyond the scope of this review :) I am curious to hear some other opinions on this.
... i haven't really worked with c++0x, yet ... but isn't this addressed by variadic templates?
Yes and no. You can certainly support arbitrary policies with variadic templates (basically a tuple of policies). I built a small experimental library that did exactly this in 2009. I didn't like the result because it made the resulting classes so vague.
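A minimal sketch of that variadic approach, with tag and trait names chosen purely for illustration, shows both how it works and why the class declaration ends up saying so little about its parameters:

    #include <type_traits>

    // Policies are tag types; the heap accepts any number of them and a
    // trait searches the pack.
    struct stable {};                       // example policy tag

    template <typename T, typename... Ps> struct contains;
    template <typename T> struct contains<T> : std::false_type {};
    template <typename T, typename P, typename... Ps>
    struct contains<T, P, Ps...>
        : std::conditional<std::is_same<T, P>::value,
                           std::true_type,
                           contains<T, Ps...> >::type {};

    template <typename T, typename... Policies>
    class d_ary_heap
    {
        static const bool is_stable = contains<stable, Policies...>::value;
        // ...
    };

    // d_ary_heap<int>          -- defaults
    // d_ary_heap<int, stable>  -- stable heap; but the declaration alone no
    //                             longer says which argument types are valid,
    //                             which is the vagueness being discussed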

One of my primary objections to the current design is the use of unspecified policy parameters in the template parameter list of the main data structures. I think that it is important to limit the number of top-level template parameters to only non-policy parameters.
of course this can be discussed ... i somehow like this explicit way of specifying parameters, but this is probably my personal preference ...
But they're not explicit; the parameters are entirely too vague. If I look at the class definition, I should be able to generate some reasonable hypothesis about the kinds of types accepted by the template parameters.
hm ... i have loosely modeled the interface on boost.intrusive (probably because i was using it extensively in the months before working on boost.heap). tim

Andrew Sutton wrote:
Please note that some questions may be easier to answer in email than with the review tool.
Hi, I'm forwarding here some comments from Code Collaborator, as requested by Andrew:

"VE: How do the objects per page impact the behavior/performance of the b_heap?

AS: I think that there may be a serious design flaw here. The number of objects per page should depend on the size of the object--much like the implementation of deque. I think that you should be able to find a function that derives this value based on the size of the object and the cache properties of the host machine. A non-trivial, but I think important feature. This discussion should be elevated to the mailing list."

Best, Vicente

I forward here some comments from CC requested by Andrew
"VE: How the objects per page impact the behavior/performances of the b_heap? AS: I think that there may be a serious design flaw here. The number of objects per page should depend on the size of the object--much like the implementation of deque.
I think that you should be able to find a function that derives this value based on the size of the object and the cache properties of the host machine. A non-trivial, but I think important feature.
This discussion should be elevated to the mailing list.
In retrospect, "serious design flaw" is not a good description for this issue :) but I think that this is important to address. I was going to bring this up today. My suspicion is that allowing users to freely specify the number of objects per page could lead to performance degradation if they choose unwisely. I further suspect that you can formulate some heuristic that optimizes choice size of the object. I'd look at GCC's deque and see what what's going on there. LibC++ takes employs a similar heuristic. Cache properties will also impact performance related to this choice, but I think that information is not always available at compile time. Also, I think that cache properties matter more as the size of the heap gets large, so it may not I'm not fundamentally opposed to allowing customization of this parameter (in order to support experimentation), but I think that the default value should be derived differently. Also, the documentation must explain why the parameter might be varied, and in general why it should not. Andrew

"VE: How the objects per page impact the behavior/performances of the b_heap? AS: I think that there may be a serious design flaw here. The number of objects per page should depend on the size of the object--much like the implementation of deque.
I think that you should be able to find a function that derives this value based on the size of the object and the cache properties of the host machine. A non-trivial, but I think important feature.
This discussion should be elevated to the mailing list.
In retrospect, "serious design flaw" is not a good description for this issue :) but I think that this is important to address.
I was going to bring this up today. My suspicion is that allowing users to freely specify the number of objects per page could lead to performance degradation if they choose unwisely. I further suspect that you can formulate some heuristic that optimizes this choice based on the size of the object.
well, there are a number of constraints:
* is our container aligned to cache boundaries? ... the allocator needs to do a good job for this
* what is the size of a cache line? ... can we get this at runtime? do we want to get this at runtime? for boost.lockfree i am using a hardcoded compile-time constant to compute the padding size to avoid false sharing
* what is the size of our objects? in general, this data structure is only reasonable for nodes that are reasonably small
I'd look at GCC's deque and see what's going on there. libc++ employs a similar heuristic.
i suppose it is implemented as a linked list of nodes of a certain size?
I'm not fundamentally opposed to allowing customization of this parameter (in order to support experimentation), but I think that the default value should be derived differently. Also, the documentation must explain why the parameter might be varied, and in general why it should not.
well, the idea is to hold multiple levels of the tree in a single cache line or vm page. so i think it is pretty important to expose this in the interface ... but maybe a better discussion of the underlying principle is needed. also it may be better to remove any notion about `cache line size' or `page size' but replace it with `memory locality', which is a broader term and implies that experimenting with different `objects_per_group<>' may be interesting. although my feeling tells me that the default value of 8 will be the first and the last answer for 99% of all use cases ... tim
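A quick back-of-the-envelope calculation shows why a group size of 8 lines up well with common hardware, assuming 8-byte elements and 64-byte cache lines (the real b_heap layout may differ in detail):

    #include <cstdio>

    int main()
    {
        const unsigned element_size = 8;   // e.g. a pointer-sized handle
        const unsigned cache_line   = 64;
        const unsigned group_size   = cache_line / element_size;   // == 8

        // A complete binary subtree with d levels has 2^d - 1 nodes, so a
        // group of 8 can keep a 3-level (7-node) subtree in one cache line:
        // three comparisons of a sift-down touch one line instead of three.
        const unsigned levels = 3;
        const unsigned nodes  = (1u << levels) - 1;                  // == 7

        std::printf("%u objects per group, %u-node subtree per line\n",
                    group_size, nodes);
        return 0;
    }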

I'd look at GCC's deque and see what's going on there. libc++ employs a similar heuristic.
i suppose it is implemented as a linked list of nodes of a certain size?
Linked segments of contiguous memory. The number of objects per segment depends on object size.
also it may be better to remove any notion about `cache line size' or `page size' but replace it with `memory locality', which is a broader term and implies that experimenting with different `objects_per_group<>' may be interesting. although my feeling tells me that the default value of 8 will be the first and the last answer for 99% of all use cases ...
How would you specify locality? A metafunction on the object size and other configuration parameters (page size, cache line width, etc.)? I don't doubt that you're right about 8 being a good default value, but I think you need to empirically demonstrate that the choice is right for a variety of reasonable object sizes (say, up to 32 bytes) and on both 32-bit and 64-bit systems. Andrew

also it may be better to remove any notion about `cache line size' or `page size' but replace it with `memory locality', which is a broader term and implies that experimenting with different `objects_per_group<>' may be interesting. although my feeling tells me that the default value of 8 will be the first and the last answer for 99% of all use cases ...
How would you specify locality? A metafunction on the object size and other configuration parameters (page size, cache line width, etc.)?
grouping more objects will make the algorithm slower in terms of instructions, but faster since one will access multiple objects in adjacent memory regions ... i would simply expose this parameter to the user and encourage experimentation ...
I don't doubt that you're right about 8 being a good default value, but I think you need to empirically demonstrate that the choice is right for a variety of reasonable object sizes (say, up to 32 bytes) and on both 32-bit and 64-bit systems.
i can add some benchmarks for various configurations ... cheers, tim

How would you specify locality? A metafunction on the object size and other configuration parameters (page size, cache line width, etc.)?
grouping more objects will make the algorithm slower in terms of instructions, but faster since one will access multiple objects in adjacent memory regions ... i would simply expose this parameter to the user and encourage experimentation ...
Yes, but what form will the parameter take? Andrew

How would you specify locality? A metafunction on the object size and other configuration parameters (page size, cache line width, etc.)?
grouping more objects will make the algorithm slower in terms of instructions, but faster since one will access multiple objects in adjacent memory regions ... i would simply expose this parameter to the user and encourage experimentation ...
Yes, but what form will the parameter take?
the number of objects per group ... like it is now

Andrew Sutton wrote:
Unlike previous reviews, this review will be assisted by a code review application, Code Collaborator (by SmartBear). The product was tested during a BoostCon session, and we are eager to see how well it scales to a community review. You can use the link below to access and review the Heap library.
http://demo.smartbear.com/boost/go?page=ReviewDisplay&reviewid=4
The application allows line-by-line comments on the submitted source code in addition to providing features for general defect and issue reporting. If you are interested in using this system, we will have to register you as a user of the site. You can request access by replying to this message.
Can you add me as "Observer", so that I can add "Defect Reports" instead of just "Comments" (see b_heap.hpp line 288 for example) that just get ignored?
participants (7)
- Andrew Sutton
- Edward Diener
- Stewart, Robert
- Thomas Klimpel
- Thorsten Ottosen
- Tim Blechmann
- Vicente Botet