Re: [boost] Review of Intrusive library begins today March 12

Hi Pavel, Nice to see you've found time to the the review. I will reply the comments that I'll consider most important: Pavel Vozenilek wrote:
* An vector-like pseudo-intrusive container may be added. It is superfluous but people may like it more than lists.
Well, a vector is not even pseudo-intrusive, because it has no nodes. std::vector<> + allocator is not sufficient in your opinion?
(Is a map like intrusive container possible, btw?)
I think a map-like intrusive structure is not very useful. A iset with a value type that has std::pair interface is enough, the advanced search functions can be used to compare the key part with the value type.
* More examples, e.g. one example per every combination of hooks and containers, just to have something to copy, paste and play.
"Every combination" might be too much ;-) but when presenting each container, I might add example using different hooks.
================================================================ One existing library that implement intrusive containers is Data Object Library (http://www.codefarms.com/dol.htm, also http://www.codefarms.com/ptl.htm).
This library uses external, preprocessor like, tool to generate code that handles lifetime management of the data. It meansd that you create an object, add into into how many containers you want and when it is removed from the last container then it gets automagically deleted.
Author of the library claimed development time speedup 10x comparing to classical manual lifetime management.
jjLib (http://sourceforge.net/projects/jjlib/) is based on the same pronciples and also uses external code generator.
I will look these for ideas and new features.
What I imagine as potentially possible is for the Intrusive Container library (or its complement) to completely manage lifetime of inserted objects:
* when an object is removed from the last container it gets automatically deleted.
This would need an embedded reference counted hook. but these reference count should be shared between all the hooks of a value type (including hoooks from base classes). Not an easy task to do in standard C++, I guess ;-), but I will put it in my investigation list.
================================================================ Naming conventions:
* I would personally prefere names like boost::intrusive_slist. The ilist is too easy to overlook and this type of naming is used way too often to cause confusion.
Ok. People consider boost::intrusive::slist good enough, but I have no problem with the names.
* The code snippets should not use generic names as Vector or List for the same reason.
Ok.
* ilist_auto_base_hook etc.: it is not really clear whether the class is intended to be base (it would end with _base).
It's difficult to find a meaningful short name. I think list_base_hook is good enough, but suggestions are welcome.
Perhaps auto_removing_i[ntrusive_]list_base. The word "hook" looks as unnecessary. The member hook may look like: auto_removing_i[ntrusive_]list_support.
Others have proposed using a single class using template parameters to specify the linking policy (safe, normal, auto-unlink).
1. overview.html: the sentence
"On the other hand, an intrusive container does not store copies of passed objects, but it stores the objects themselves."
may be better w/o using work store:
"On the other hand, an intrusive container does not insert copies of passed objects, but it references the objects themselves, without copying their data."
or so.
Your phrase is not convinging for me, but I don't find mine good enough.
"If you want to share an object between two containers, you either have to store multiple copies of those objects or you need to use containers of pointers: std::list<Object*>."
append "... or of smart pointers" just to remind this.
Ok.
------------- typo: Boost.Intrusivecontainer - missing space
Ok.
-------------- "In intrusive containers you don't store a copy of an object, but rather the original object itself."
may be better reworded to avoid "store" which usually implies copying.
Ok, but which other word would you use?
________________________________________________________________ 2. concepts.html:
Evry concept here may contain link to a small but complete example.
If is the second page in documentation and it may be bit too scaring for a newbie to get overloaded with abstract concepts w/o seeing what it could be good for.
I will try to link concepts. Others have proposed moving he full concepts page to the end of the documentation and start with an small summay.
The bullet item starting with "Node Traits" is a bit hard to parse:
* the term basic operations may be explained (or at least differenciated from NodeAlgorithms at this point) * "defines the interface" may be better "expects interface"
typo " to its own class" -> "to his own class"
Ok.
"ValueTraits will contain the information to convert user classes in nodes compatible with Node Algorithms."
I am missing what are the user classes converted into (that's that way I parsed it).
Ok. I will try to improve this.
I do not like the term "Pseudo-Intrusive Container" * it suggest something less worth that the "real container" * it gives impression that the essence of intrusiveness is missing here, which it is not, AFAICS
Perhaps "Intrusive Container with auxiliary allocated memory" looks clumsy but it exactly says what it is and nothing more.
I would prefer something shorter ;-)
In concepts summary: the sentence
"Node Algorithm: A class containing typedefs and static functions that manage a group of nodes."
- the word "manage" is used for the first tima and may mean quite a many things.
Something like "...that define basic operations that can be applied to a groups of nodes"?
The term "Node" if not defined and assumed implicitly (or from one prior code snippet).
...and this one it's quite difficult to define, I'm afraid.
------------- Possibly the names of the concepts may reflect which concetps are already implemented by the library and which always need to be reimplemented by the user.
All these concepts are implemented by the library, even though some are customizable by the user.
Auto-unlink hooks are mentioned here for the first time but ony in one sencence, As this may be one of the most useful features there ay be a link and/or example so the reader would be able to test it immediatelly.
Ok, a link to the auto-unlink hooks page would be enough.
________________________________________________________________ 4. usage.html:
Possibly this page may contain a link to complete examples of all possible combinations, e.g.:
* node with base hook * node with several base hooks * node with member hook * node with several member hooks
Don't you think this is a bit too much? Do you think an example using both (base and member) is too confusing?
It may be mentioned that tag does not need to be fully specified.
Ok.
------------- the code snippet with typedef boost::intrusive::ilist_base_hook<Foo>::value_traits<Foo> FooValueTraits;
contradicts the complete example bellow with:
typedef ilist< ilist_base_hook<my_tag>::value_traits<MyClass> > BaseList;
(in what is used as the template parameter of value_traits).
Well, MyClass is a concrete class of the example, so I thought it was not confusing, I could just write MyClass in both examples, if that's less confusing.
"With these features, without any external reference the user can know if the object has been inserted in a container."
- this should mention the API call to find out this information.
Ok.
--------------- The safe mode (probably) applies to single hook, not for situations with two or more hooks.
No, the safe mode is applied for every hook. If you try to insert an object in a container and it's already inserted in another container that uses the same hook, the code will assert.
The items: * the programmer needs to efficiently track the construction and destruction of objects * exception safety, especially the no-throw guarantee, is needed
may need bit more context.
The item: * additional memory management should be avoided
may talk rather about the allocation.
I'll try to elaborate a bit more.
//This is a container of abstract values: you can't do this with std::list. typedef boost::intrusive::ilist< Window::value_traits<Window> > win_list;
"abstract value" is what?
Ok. "Abstract class" is maybe more correct?
________________________________________________________________ 6. auto_unlink_hooks.html:
"non-constant time containers" - should have backlink to information what it is.
Ok.
---------------- "If several threads are using the same container, removing the object is thread-unsafe."
I guess that is true for removing an item from container explicitly.
I will improve it. The difference is that removing explicitly can be executed with a locked mutex, whereas the destructor is not explicitly called, will be automatically executed and the user might forget this operation.
----------------
"Container contents change silently without touching the container directly. This can lead to surprising effects."
touching => referencing This can lead ... =>
Touching is not referencing. Touching should say "modifying"
The bool used for SafeMode template parameter may be replaced by an enum and symbolic name.
Ok.
ilist_auto_base_hook<>::linked()
may be is_linked() or is_in_container().
is_linked() sounds good. No problem to change it.
--------------- Possible change in API: the second and third template parameters may be merged into one, like:
template<class ValueTraits, class SizeType = std::size_t> class islist;
and for no constant size() used as
ilist<a_tag, boost::size_not_kept>
I've already explained that I consider these properties orthogonal. You might not have a constant-time size but still need a non trivial std::size_t to give support for allocator based non-intrusive containers.
Explanation why the islist is circular and what impact it has on semantics is missing.
Others prefer to remove this reference and not say how it's implemented. I think this should be the case with list/slist/set-containers that can be implemented differently. If more containers are added, like irbtree/avl_tree or similar, this should be explained.
It would be interesting to hear whether the "two-way pointers" (e.g. http://www.semantics.org/code.html#flist) could be used together with Boost.Intrusive Containers.
I will investigate it.
________________________________________________________________ 9. iset_imultiset.html:
"Boost.Intrusive also offers associative containers that can be very useful when creating more complex associative containers,..."
- this is like saying a class may be used to built more complex classes, i.e. redundant (unless I am missing something).
Ok. I´ll remove it.
OT question: could a intrusive container with the data be ROMable? This may be mentioned somewhere.
I don't think so, because an intrusive container must modify the objects.
________________________________________________________________ 10. iunordered_set_iunordered_multiset.html:
memory allocator may be added as template parameter. If someone has really critical need for speed he may employ a zone allocator.
I prefer maintaining memory allocator out of Intrusive. I think this is the correct approach to maintain, even with pseudo-intrusive containers.
________________________________________________________________ 11. advanced_lookups_insertions.html:
"advanced lookup" may be named "matching by partial key", "advanced insertions" could be "optimized insertion".
The lookup by partial key is employed by at least one other library, if found it may be referenced.
Apart from partial keys other types comparable with the value type can be used, so I don't like this term.
________________________________________________________________ 12. erasing_and_destroying.html: [snip] Possible problems with the approach here:
* it is not possible to enforce that the objects destroyed were dynamically allocated.
* it is dangerous when object is part of more than one intrusive container.
* the names remove_and_destroy, erase_and_destroy and clear_and_destroy are not very helpful.
Some have proposed "dispose". I think this feature is really useful to easily build non-intrusive containers above intrusive containers.
________________________________________________________________ 13. clone_from.html: [snip] I noticed the documentation does not mention copyability and assignability of the data handled by intrusive containers. Pehaps the intro should inform what is possible, what is recommended and what dangers there are.
The data handled by intrusive containers have no restrictions. It can be non-copyable and non-moveable, as it's explained in the documentation.
One needs to remember that the cloned container needs to be destroyed explicitly and manually. It feels rather dangerous.
Just like the container from you cloned ;-) That's how intrusive containers work. Your clone function could have just allocate a vector of objects and return a reference instead of calling new.
class X { ... };
X x;
ilist<X> c1; ilist<X> c2; c1.push_back(x);
c1.clone_from(c1, ...); // will work
c1 first clears the target container, so there is nothing to clone.
X x1; X x2;
ilist<X> c3; ilist<X> c4; c3.push_back(x1); c4.push_back(x2);
// will this crash? // (the destroyed will try to delete the local x1) c3.clone_from(c4, ...);
The cloner will not try to delte the local x1, it will just call your destroyer function. Obviously, if you are using local objects this will crash, but with intrusive containers the programmer has the responsibility of memory issues.
________________________________________________________________ 14. using_smart_pointers.html
offset pointer should not be called smart pointer.
Why not? Smart pointers don't need to be related with lifetime management issues, as I understand the term. How would you call offset_ptr, a "pointer class"?
Boost.Smart Pointers is permitted here or not?
No.
The docs may say why Boost.Smart Pointers are not useful here.
The docs already state that the smart pointers need to have the same ownership semantics as raw pointers. But i will explicitly say the Boost.SmartPointer classes are not compatible.
________________________________________________________________ 15. obtaining_iterators_from_values.html [snip] ------------
The docs may say what is this feature good for (traversal, yes, but bit complicated as one needs to keep the first obtained iterator).
An iterator is the key to access to STL algorithms. If you have the value, you have access to STL algorithms.
Something as ability to get the end() and begin(), that would be a killer feature.
I'm afraid this will be difficult.
"local iterators" - a term I never saw before. Without knowing what it is I do not get half of the content here.
It's in the hash table proposal for C++0x.
-----------
Now I understand that the node algorithms + traits are like building blocks of the library (or are they completely separate?).
These are building blocks, but they can be used for other tasks where containers are not suitable.
Custom ValueTraits example:
"...we only need an extra pointer..."
w/o the "extra".
Ok.
Invalid comment in example code (on the bottom of this section): //Now test both lists
Ok.
------------ ValueTraits::value_type vs. NodeTraits::node
- could be named "reusing primitive intrusive operations for different values"
"Reusing node algorithms for different values" might be better.
Boost.Intrusive in space constrained environments
- here may be a table listing (again) the overhead or calculation of the overhead for list/slist/... node and of list/slist/... container.
I will add this as a separate chapter.
"For example, the programmer can use Boost.Intrusive to manage old data structures whose definition can't be changed."
-> ...can customize parts of Boost.Intrusive ...
Ok.
________________________________________________________________ 19. acknowledgments.html:
Joaquin M. Lopez Munoz - some diacritics is missing, doesn't?
Yes. I still don't know how to insert them in Quickbook, maybe a UTF editor is needed to achieve this. In the header HTML style scape sequences can be used, but not in the documentation body. Don't worry Joaquín, I'll fix that ;-)
why is ilist(const ilist &); listed in public interface if it is not implemented? Dtto operator=().
Boostbook/Doxygen error. Recently fixed in CVS thanks to Eric Niebler.
void push_back(value_type &) ;
- why non-const reference? Coudn't it make problems when an temporary is used as the parameter?
Const refernece has no sense with intrusive containers, because the value has to be modified. Temporaries can't bind to non-const references.
Possibly shift(integer how_much = 1) member could be added that would move begin() and end().
Use case: list of tasks to be processed repeatedly.
I'll think about this.
"Containers also know that the a value can be silently erased..." - what does it mean "they know" - that empty() has worse O complexity?
No just that they must cope with the fact that between two operations, the number of present nodes might have changed. For example, the size can't be internally stored (because it would be invalidated if an auto-unlink value operates between two operations).
* Example using BOOST_FOREACH could be added somewhere.
I'm not a big fan of it ;-(
* Does Boost.Assign work with this library?
No idea. But I plan to maintain the library without many dependencies or compile-time overhead.
* Does the library works together with Boost.Serialization?
No. Deserialization would be also a problem. How will be the objects created? (intrusive containers don't manage memory)
* Does the library (in debug mode) detect adding the same object multiple times into the same container?
If you are use safe-mode or auto-unlink hooks, yes.
* perhaps a header with forward declarations could be added into the library, like boost/intrusive_containers_fwd.hpp.
Ok.
________________________________________________________________ EOF
Thanks for your thorough review. Professional, as always, Regards, Ion

"Ion Gaztañaga" wrote: Pavel Vozenilek wrote: _________________________________________________________
* An vector-like pseudo-intrusive container may be added. It is superfluous but people may like it more than lists.
Well, a vector is not even pseudo-intrusive, because it has no nodes. std::vector<> + allocator is not sufficient in your opinion?
What I mean is a container that provides operator[]. Possibly something what keeps an auxiliary array in container and where hook has either * one back-pointer into the internal array inside container (to allow iterator::operator+(int)). * or one back pointer to the container + integer index (this would allow range checking)
================================================================
One existing library that implement intrusive containers is Data Object Library (http://www.codefarms.com/dol.htm, also http://www.codefarms.com/ptl.htm).
jjLib (http://sourceforge.net/projects/jjlib/) is based on the same pronciples and also uses external code generator.
I will look these for ideas and new features.
The online documentation is not perfect. The author had written book Taming C++: Pattern Classes and Persistence for Large Projects, Jiri Soukup, 1994, ACCU review: http://www.accu.org/bookreviews/public/reviews/t/t001196.htm). where the library and the intrusive technique is described. The open source library is underdocumented but source is available. _________________________________________________________
What I imagine as potentially possible is for the Intrusive Container library (or its complement) to completely manage lifetime of inserted objects:
* when an object is removed from the last container it gets automatically deleted.
This would need an embedded reference counted hook. but these reference count should be shared between all the hooks of a value type (including hoooks from base classes). Not an easy task to do in standard C++, I guess ;-), but I will put it in my investigation list.
I though about it a bit more since I wrote the review and here's how I would structure it (provided it is all technically feasible): The library would provide containers and hooks parametrized by one more parameter: a boolean OwnsData (alternatively differently named containes could exist): The containers not owning the data would: * Never ever destroy the data. * Cloning, assignment and copy construction won't be provided. * Erase/etc container methods would only unlink the container's hook. The containers owning data would: * Always own the data collectively (the last owner destroys it) * Assume the data was *always* allocated with new. * The cloning/assignement/copy constructor may be provided, employing copy constructor or the value and new/delete. (I personally would avoid it too as the semantic is too complicated.) * The value would provide few API functions , e.g. * move_value_from_one_container_to_another(), * extract_value_out_of_all_containers() (similar to std::auto_ptr::release) * possibly some pin_value/unpin_value wrapped in a RAII * Auto-hooking would behave as it does now. After the last hook is unlinked there reference to the value will became invalid. * Explicit manual delete of the value would auto-unlink the object first. * Erase/etc container methods would unlink the hook and delete the value itself if it is not owned anymore. There would be no way to mistakenly insert a value parametrized to be owned into non-owning container and vice versa. This approach is (roughly) semantically equivalent to what the libraries above offer for lifetime management. _________________________________________________________
"In intrusive containers you don't store a copy of an object, but rather the original object itself."
may be better reworded to avoid "store" which usually implies copying.
Ok, but which other word would you use?
E.g. In intrusive containers you don't copy an object into the container but rather the original object gets linked together with the other objects in the container. A schema of slist owning two three values may be the best description. _________________________________________________________
"Node Algorithm: A class containing typedefs and static functions that manage a group of nodes."
- the word "manage" is used for the first tima and may mean quite a many things.
Something like "...that define basic operations that can be applied to a groups of nodes"?
yeah ________________________________________________________________
Possibly this page may contain a link to complete examples of all possible combinations, e.g.:
* node with base hook * node with several base hooks * node with member hook * node with several member hooks
Don't you think this is a bit too much? Do you think an example using both (base and member) is too confusing?
The prototypical benefactor of such examples is someone who wrote some code, it didn't work so he, in the act of desperation, starts to look for working minimal example to go up. I go though this regularly. ________________________________________________________________
The safe mode (probably) applies to single hook, not for situations with two or more hooks.
No, the safe mode is applied for every hook. If you try to insert an object in a container and it's already inserted in another container that uses the same hook, the code will assert.
I actually intended to write that a value may have more than one hook and that the text in usage.html should not talk about the whole object. ________________________________________________________________
14. using_smart_pointers.html
offset pointer should not be called smart pointer.
Why not? Smart pointers don't need to be related with lifetime management issues, as I understand the term. How would you call offset_ptr, a "pointer class"? ---------------- I do not like usage of the word "smart" being expanded. Here "pointer-like object" or, uggh, "pseudopointer" could be used. ________________________________________________________________
* Does the library works together with Boost.Serialization?
No. Deserialization would be also a problem. How will be the objects created? (intrusive containers don't manage memory)
The owning containers (as defined above) would allow (de)serialization. Thinking of it the current containers should be serializable now assuming: * the values (physically stored elsewhere) are serialized too * the values were not cloned /Pavel

Pavel Vozenilek wrote:
"Ion Gaztañaga" wrote: Pavel Vozenilek wrote:
_________________________________________________________
* An vector-like pseudo-intrusive container may be added. It is superfluous but people may like it more than lists. Well, a vector is not even pseudo-intrusive, because it has no nodes. std::vector<> + allocator is not sufficient in your opinion?
What I mean is a container that provides operator[].
Possibly something what keeps an auxiliary array in container and where hook has either * one back-pointer into the internal array inside container (to allow iterator::operator+(int)). * or one back pointer to the container + integer index (this would allow range checking)
Still no convinced. What's the difference with a std::vector with a custom allocator?
The open source library is underdocumented but source is available.
Ok. Thanks.
* when an object is removed from the last container it gets automatically deleted. [snip]
I though about it a bit more since I wrote the review and here's how I would structure it (provided it is all technically feasible):
The library would provide containers and hooks parametrized by one more parameter: a boolean OwnsData (alternatively differently named containes could exist):
[snip]
The containers owning data would: * Always own the data collectively (the last owner destroys it) * Assume the data was *always* allocated with new. * The cloning/assignement/copy constructor may be provided, employing copy constructor or the value and new/delete. (I personally would avoid it too as the semantic is too complicated.)
So something similar to std::list<shared_ptr<T> > but intrusive containers. The reference count is shared between all the containers. That would require an intrusive_ptr as a member. But again: the reference count must be common for all the member. I'm thinking that this can be useful to implement an OS-like handle where the handle is an intrusive friendly class, but also reference counted. Interesting but not trivial to manage. I'll think this should wait until the first Intrusive library is out.
* Auto-hooking would behave as it does now. After the last hook is unlinked there reference to the value will became invalid.
I don't think so. Auto-unlink works only with containers that are just formed with nodes and does not have any other "living" data (like the an internal size), because the hooks have no pointer to the container (the are like any other hooks). If we want to support auto-unlink for a container with "living" data I think we should create a new category of linking policy (enhanced auto-unlink) where the hook stores also a pointer to the container where it was placed. This will increase sensibly the size of a hook (list hook would grow 50%) must this might be useful. But I want to maintain the size advantages of the current tiny auto-unlink hook. New linking policies might be created if the user wants auto-unlinking but size overhead is not an issue. I'm still not convinced with about linking intrusive containers and destruction policy. After all, that the job of an allocator. But anyway, I think the idea is interesting but requires more refinement.
_________________________________________________________
"In intrusive containers you don't store a copy of an object, but rather the original object itself."
may be better reworded to avoid "store" which usually implies copying. Ok, but which other word would you use?
E.g.
In intrusive containers you don't copy an object into the container but rather the original object gets linked together with the other objects in the container.
Ok.
No, the safe mode is applied for every hook. If you try to insert an object in a container and it's already inserted in another container that uses the same hook, the code will assert.
I actually intended to write that a value may have more than one hook and that the text in usage.html should not talk about the whole object.
Ok.
________________________________________________________________
14. using_smart_pointers.html
offset pointer should not be called smart pointer.
Why not? Smart pointers don't need to be related with lifetime management issues, as I understand the term. How would you call offset_ptr, a "pointer class"? ----------------
I do not like usage of the word "smart" being expanded. Here "pointer-like object" or, uggh, "pseudopointer" could be used.
According to Wikipedia (if we can trus: "In computer science, a smart pointer is an abstract data type that simulates a pointer while providing additional features, such as automatic garbage collection or bounds checking. These additional features are intended to reduce bugs caused by the use of pointers while retaining efficiency. Smart pointers typically keep track of the objects they point to for the purpose of memory management." Observe "such as" or "tipically". Smart pointers are normally for memory ownership issues but a bounds checking pointer has ownerships semantics. I think offset_ptr is a smart pointer. Offers additional features like memory base address independent. Regards, Ion
participants (2)
-
Ion Gaztañaga
-
Pavel Vozenilek