Re: [Boost-users] intrusive data structures with custom pointer class
On Fri, 07 Feb 2014 16:00:31 -0500 boost-users-request@lists.boost.org wrote:
Message: 6 Date: Fri, 07 Feb 2014 22:00:23 +0100 From: Ion Gazta?aga
To: boost-users@lists.boost.org Subject: Re: [Boost-users] intrusive data structures with custom pointer class Message-ID: <52F54967.9020704@gmail.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed El 06/02/2014 23:52, Matei David escribi?:
Hi,
I would like to use some boost::intrusive data structures (lists and trees) in a project I'm working on, but I ran into trouble trying to customize the existing classes to my needs. I decided to post my problem here, in case somebody knowledgeable of boost::intrusive *implementation details* can assist me. Thank you in advance.
Hi Meti,
You have a lot of questions, I hope I can answer most of them.
Background:
In the project I'm working on, I need to manage a large number of objects in memory. The data is dynamic, so I need to be able to alloc&dealloc objects, in addition to keeping the objects in various lists&trees (for which I'm trying to use boost::intrusive). The RAM usage is large (think, >100GB), but the number of objects is smaller than 2^32. So one thing I'd like to do is to replace 8-byte raw pointers with 4-byte offsets into a deque. Doing this would solve the alloc/dealloc part. However, I now need to customize boost::intrusive containers to work with these 4-byte offsets instead of raw pointers.
I guess the deque is a global object, right. What do you mean with "offset" into a deque? It's the offset passed to "deque::operator[]"?
Hi and thanks for the reply.
Contrary to some of the things you said, I found a way to (partly?)
implement my specific use case with boost::intrusive. However, I have
some concerns:
- I only tested a small part of the boost::intrusive functionality
with my customizations, so my success might be superficial. Could you
have a look and make an informed guess as to how stable this is?
- I am concerned that further boost updates might inadvertently break
my customizations, and I would hate to list =1.55 as a specific
dependency (as opposed to >=1.55). I understand this can never be
fully avoided, but I'm hoping you could decide to support my use
case, either officially by including it in the code base and the docs,
or at least non-officially by being aware of it.
I found and fixed this along the way:
https://svn.boost.org/trac/boost/ticket/9650
Let me restate my specific use-case: I have several types A, B,..., and
I want each object a1 of type A to be kept in a tree (say) along with
other A objects (not all A objects go in the same tree). Since there
are so many A objects, I care about their memory usage. Specifically, I
want to replace 8-byte pointers (A*) by 4-byte handles (which I call
Ptr<A>). An A-handle is really an uint32_t offset into the array of
type A objects. A B-handle is also an offset, but into another array,
holding B objects. Thus:
- The transformation handle-to-object_address is O(1).
- The transformation object_address-to-handle is prohibitive.
- I can only use A-handles for A objects stored in a unique A-object
array. If I construct an A object elsewhere, I can refer to it with a
raw pointer, but not with an A-handle.
- Objects of type B are stored in a different array than objects of
type A. This means, in particular, that handles are not freely
re-interpretable as real/smart pointers are with casts. So,
dereferencing the A handle 42 gives me the A object number 42 from
the A object array; but dereferencing the same handle reinterpreted
as a B handle gives me the object number 42 from the B object array.
NOTE: These pointer casts are my primary source of concern. I will
come back to this below.
My goal is to make boost::implicit structures work with these handles
instead of raw/smart pointers. Intuition tells me there isn't a serious
reason why they shouldn't, and that this type of scenario is general
enough to be of interest to others, and therefore to boost maintainers.
My (partial) success is here:
https://github.com/mateidavid/sga/raw/matei-dev/src/Dev/factory.hpp
https://github.com/mateidavid/sga/raw/matei-dev/src/Dev/test-i_list-orig.cpp
https://github.com/mateidavid/sga/raw/matei-dev/src/Dev/test-i_tree-orig.cpp
The code is a work in progress. You'll need a few more included files.
I'm compiling with, e.g.:
$ [g++|clang++] [-DNONSTATIC_VALUE_TRAITS]
-I
- Given a type T, I rolled out pointer-replacement class Ptr<T> which holds a 4-byte int, and upon operator*, it produces a Ref<T> object. - A reference-replacement Ref<T> class similarly holds a 4-byte int, it implements operator& which produces the associated Ptr<T> object, and it also provides an operator T& () conversion which dereferences the object from the deque. - With these, I tried a custom traits struct looking like this: value_type = node := T node_ptr = pointer := Ptr<T> reference := Ref<T>
Boost.Intrusive models (or tries to model) a pointer type that can be modeled with std::pointer_traits (more precisely boost::intrusive::pointer_traits). In that model the reference type must be a type that implements the dot operator. Sadly in C++ there is no way to implement a class that overloads this operator. That's why your pseudo-reference type will not work.
I thought about this. There are 2 different perspectives. From the point of view of the boost::intrusive data structures (e.g., list), I think these structures should never ever dereference any object they store. So, e.g. inside list.hpp, if we have somewhere an object val of type "reference" (such as in insert()), the code should never use "o.<field> = 42". Instead, whatever fields the data structure needs to access (like r/w access to parent pointers), they should access them through the traits class methods (e.g. traits::set_field(o,42)). However, this already seems to be the case today. From the POV of the traits classes, indeed, these need to dereference values. First of all, there is a problem with the function signatures. In http://www.boost.org/doc/libs/1_55_0/doc/html/intrusive/value_traits.html the signature of to_node_ptr should be changed from: static node_ptr to_node_ptr (value_type &value) to: static node_ptr to_node_ptr (pointer_traits<pointer>::reference value) This just involves changing the docs; you can change the signature (as I did in my customization) and it compiles just fine. Also, even with the right signature, the function should never do: { return value.get_node_ptr(); } but instead: { return (&value)->get_node_ptr(); } because both reference::operator& and pointer::operator-> are overloadable. However, the concern about traits classes is not as serious as the one about the data structures because users are given the option to write their own traits classes anyhow. So overall, I think the problem with the dot operator isn't serious enough to make this use case unattainable.
A. The traits classes seem to be there to support 2 existing customization types, namely the situation where "node" is a base class of "value" (implementing "base hooks") and the situation where "node" is a member of "value" (implementing "member hooks"). My situation represents a 3rd customization type.
You can use function_hook, the hook does not need to be a member or base class.
The hack I used seems to work for now. In the future, I think it would be nice to have the option to explicitly specify the header node (so the container would just use a pointer) in the interest of clearly separating any type of node/value allocation from the containers. But I wouldn't say that's a priority.
B. Even though the traits classes allow one to specify pointer types, it's possible (?) that under the hood the classes were designed to support only raw pointers and smart pointers, not the indirect offset-like pointers I need.
See previous answer about supported reference types.
Given A&B, it seems to me that in my customization I would need 2 different hacks. In the interest of code stability, I started wondering whether it is feasible to adapt existing boost::intrusive data to my needs (my first instinct) as opposed to simply rolling out my own rbtree.
I don't know the impact of supporting smart reference types. It might be impossible, I haven't tried it.
0. Obviously: Have you seen a similar situation being dealt with before? Maybe there's an existing solution out there. My reluctance to rolling out my own intrusive rbtrees is due to the fact I'd have to test them, as opposed to using a known implementation.
1. Are boost::intrusive data structures supposed to support custom pointer classes (neither real pointers nor smart pointers)? For instance, in 1.46 I found this code in boost/intrusive/detail/tree_algorithms.hpp
Yes, boost::interprocess::offset_ptr (but a raw reference can be converted to a smart pointer easily).
static node_ptr uncast(const_node_ptr ptr) { return node_ptr(const_cast
( ::boost::intrusive::detail::boost_intrusive_get_pointer(ptr))); } However, intrusive lists worked just fine in 1.46, which suggests this was simply a bug. This was fixed in recent Boost releases using pointer_traits to implement this function. You can customize pointer_traits to implement your own version.
2. Can you point me to an example of defining and using custom pointer class with boost::intrusive? To enable compilation (of lists only) in 1.46, I had to customize std::iterator_traits (that part I understood), but then also boost::pointer_to_other and boost::intrusive::detail::smart_ptr_type. Even though the code compiled, I don't know what those are doing exactly, so I'm weary of them.
See the example in libs/intrusive/example/doc_offset_ptr.cpp and boost/interprocess/offset_ptr.hpp. See the most recent boost version for fixed bugs. A dumb smart pointer is used in boost intrusive tests, and it's found in libs/intrusive/test/smart_ptr.hpp, but it's a bit oudated. Anyway I think pointer requirements for Boost.Intrusive might no be correctly specified and it might require some fixes. Recent versions uses pointer_traits, and I recommend you this approach.
3. In 1.55, I have a few more pages of errors, I think mainly due to these pointer customization issues I don't fully understand. Something seems to have changed from 1.46 to 1.55. Is there a reference about what changed? Or is there a portable way of rolling out such a custom pointer class?
pointer_traits was specified and all code was ported to use this customization point. The reference type is assumed to be a raw reference (as there is no way to overload operator dot to implement a working smart reference).
The only why to use Boost.Intrusive would be to implement a smart pointer (of any pointee type) that can be fastly obtained from raw reference. If this is your case, Boost.Intrusive might already have all that you need. If you need a "smart reference", then Boost.Intrusive is not prepared for this and I don't know how many effort would require adding support for this (even if this approach is feasible).
The problem with "node" objects embedded in the container could be solved with an option that allows external allocation of these nodes.
Best,
Ion
Empirically, since I got the containers (at least partially) working
with non-standard references, they must be already prepared to handle
those. Theoretically, as I described above, the containers themselves
should never have to dereference specific fields anyhow, and therefore
should be immune to reference type changes.
The more serious concern, ***and the reason I wrote all this***, is with
pointer casts. Unlike raw or smart pointers, these handles are
type-specific. So casting them to something else and then dereferencing
that object should be done with care. Take bstree, for instance. Its
base class (bstbase3) inherits from node & Value_Traits. The way you
obtain access to the Value_Traits struct (used with stateful traits) is
through a pointer which you later want to dereference:
typedef typename pointer_traits
El 11/02/2014 0:49, Matei David escribió:
However, things seemed to work ok if I don't wrap voids. So the main concern with my use case is that: ***intrusive containers should not cast node_ptr objects into other types of pointers (not even void*), and then attempt to cast them back***. They can cast other pointers (such as value_traits*) to void* and back, but not node_ptr-s. Please have this use case in mind when further developing boost::intrusive. The files I linked above can serve as a basic test case.
Matei, Thanks for your thorough explanation. I'm not sure this casting limitations could be acceptable in future Intrusive developments, but I agree that Intrusive should somehow be used with non-trivial pointers to solve problems like the one you're trying to solve. Right now I don't have enough time to carefully study your work in progress but I appreciate your effort and I'll try to fully understand all the details needed to support this special addressing.
I can help writing up this use case in slightly more detail if you think that would be useful for others.
Thanks. If you think it will be useful to support it in Intrusive, feel free to share your improvements in the mailing list. Best, Ion
participants (2)
-
Ion Gaztañaga
-
Matei David