
"VE: How the objects per page impact the behavior/performances of the b_heap? AS: I think that there may be a serious design flaw here. The number of objects per page should depend on the size of the object--much like the implementation of deque.
I think that you should be able to find a function that derives this value based on the size of the object and the cache properties of the host machine. A non-trivial but, I think, important feature.
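
A minimal sketch of the kind of derivation AS describes, assuming a fixed 64-byte cache line; the names (assumed_cache_line_bytes, default_objects_per_page) and the constants are illustrative, not part of the proposed library:

    #include <algorithm>
    #include <cstddef>

    // Assumed cache-line size; ideally this would come from the platform
    // or the build configuration rather than being hardcoded.
    constexpr std::size_t assumed_cache_line_bytes = 64;

    // Hypothetical helper: pack at least one cache line's worth of elements
    // into each page/group, but never fewer than two elements.
    template <typename T>
    constexpr std::size_t default_objects_per_page()
    {
        return std::max<std::size_t>(2, assumed_cache_line_bytes / sizeof(T));
    }

    static_assert(default_objects_per_page<double>() == 8,
                  "eight 8-byte elements fill a 64-byte line");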
This discussion should be elevated to the mailing list.
In retrospect, "serious design flaw" is not a good description for this issue :) but I think that this is important to address.
I was going to bring this up today. My suspicion is that allowing users to freely specify the number of objects per page could lead to performance degradation if they choose unwisely. I further suspect that you can formulate a heuristic that optimizes this choice based on the size of the object.
well, there are a number of constraints:
* is our container aligned to cache boundaries? ... the allocator needs to do a good job here
* what is the size of a cache line? ... can we get this at runtime? do we want to get this at runtime? for boost.lockfree I am using a hardcoded compile-time constant to compute the padding size and avoid false sharing (see the sketch below)
* what is the size of our objects? in general, this data structure is only reasonable for nodes that are reasonably small
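
The approach Tim mentions can be sketched roughly like this; it is illustrative code, not the actual boost.lockfree source, and the 64-byte constant is an assumption:

    #include <cstddef>

    // Hardcoded compile-time cache-line size (assumed, not detected at runtime).
    constexpr std::size_t cacheline_bytes = 64;

    // Pad a small member up to a full cache line so that two neighbouring
    // instances never share a line (avoids false sharing).
    template <typename T>
    struct padded
    {
        static_assert(sizeof(T) < cacheline_bytes, "sketch assumes a small member");
        T value;
        char pad[cacheline_bytes - sizeof(T)];
    };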
I'd look at GCC's deque and see what's going on there. libc++ employs a similar heuristic.
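
For reference, the shape of that heuristic is roughly the following (libstdc++ derives its deque node capacity from a fixed 512-byte buffer via __deque_buf_size; the code below is a paraphrase under that assumption, not the library source):

    #include <cstddef>

    // Fixed byte budget per node, in the spirit of libstdc++'s deque.
    constexpr std::size_t deque_buf_bytes = 512;

    // Elements per node: fill the byte budget, or store one element per node
    // if a single element is already larger than the budget.
    constexpr std::size_t deque_buf_size(std::size_t elem_size)
    {
        return elem_size < deque_buf_bytes ? deque_buf_bytes / elem_size : 1;
    }

    static_assert(deque_buf_size(4) == 128, "small elements: many per node");
    static_assert(deque_buf_size(1024) == 1, "oversized elements: one per node");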
I suppose it is implemented as a linked list of nodes of a certain size?
I'm not fundamentally opposed to allowing customization of this parameter (in order to support experimentation), but I think that the default value should be derived differently. Also, the documentation must explain why the parameter might be varied, and in general why it should not.
well, the idea is to hold multiple levels of the tree in a single cache line or VM page, so I think it is pretty important to expose this in the interface ... but maybe a better discussion of the underlying principle is needed. also, it may be better to drop any notion of `cache line size' or `page size' and replace it with `memory locality', which is a broader term and implies that experimenting with different `objects_per_group<>' values may be interesting. although my feeling tells me that the default value of 8 will be the first and the last answer for 99% of all use cases ... tim
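
The arithmetic behind that default can be illustrated with assumed numbers (an 8-byte node and a 64-byte cache line; this is not the library's actual layout code):

    #include <cstddef>

    constexpr std::size_t objects_per_group = 8;   // the default under discussion
    constexpr std::size_t node_bytes        = 8;   // assumed: e.g. a heap of std::size_t keys
    constexpr std::size_t cache_line_bytes  = 64;  // assumed

    // One group of small nodes fills exactly one cache line ...
    static_assert(objects_per_group * node_bytes == cache_line_bytes,
                  "a group of eight 8-byte nodes is one cache line");

    // ... and eight slots are enough for three complete levels of a binary
    // subtree (1 + 2 + 4 = 7 nodes), which is the locality being exploited.
    static_assert(1 + 2 + 4 <= objects_per_group,
                  "three tree levels fit inside a single group");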