[intrusive] New version uploaded. Review Request.

Hi to all, I've just uploaded a new version of Intrusive, the intrusive container library. This is a major rewrite of the library, and includes many breaking changes, like class names, headers... You can find the library in Boost.Vault, http://boost-consulting.com/vault/index.php?&direction=0&order=&directory=Containers or Boost.Sandbox CVS These are some of the changes: -> Added iset (intrusive set) container. -> Added constant-time "size()" option to containers. This option adds a member to every container that's updated with every insertion/removal. This changes some complexity guarantees in the containers. This constant-time "size()" is optional and it's not enabled by default. -> Added support for safe-mode and unsafe-mode hooks. In safe hooks each value's constructor puts the hook in a well-known state. Every time a value is inserted in the container, that state is checked. Every time a value is erased from the container, the hook is put in the well-known state. This can detect many programming errors when using intrusive containers. On the other hand, unsafe-mode hooks are faster and some container operations are have constant-time complexity when using unsafe hooks instead of linear-time complexity. -> Added support for non-raw pointers. It's possible to build shared memory intrusive containers using offset_ptr<..> instead of raw pointers. -> Added many member functions to intrusive containers. These members are missing operations that are present in their STL counterparts -> Removed the additional one pointer size overhead from member hooks. Now member and derivation approach have the same size overhead for a value. -> Added non "auto-unlink" hooks as default hooks. Auto-unlink hooks are also provided. -> Optimized rbtree algorithms for "equal_range" and "clone". Clone now offers support to clone trees using custom cloning functors. This allows recycling nodes in non-intrusive containers' operator=(). -> Added support for arbitrary keys in searches and insertion checks for associative containers (iset/imultiset). It's possible to use just an arbitrary key and a predicate comparing that key and the value inserted in the container in many functions: "find", "equal_range", "count", "lower_bound", "upper_bound". This allows searches without constructing a full value, something that in many cases, is quite expensive. -> If pointer alignment is power of 2 (nearly always), Intrusive hooks offer 3 pointer size overhead instead of the typical 3 pointer + integer overhead. This is achieved using the least significant bit of the parent pointer as the "red/black" bit of the node. -> Documentation has been improved and expanded. Added complexity guarantees for nearly all functions. Added a full quickbook/doxygen reference. -> More tests. Tested in WinXP-Visual 7.1 and MinGW-gcc 4.1 I think that Intrusive can serve now as a good basis to build non-intrusive containers (STL-like, multi-index like...). I've rewritten Boost.Interprocess containers using Intrusive containers, and performance and size have been improved. I'm also experimenting with a new logarithmic allocation time best-fit memory algorithm for Interprocess, based on intrusive red-black trees. I would like to request a review for the library to the review manager. While the library waits its review, I'm open to suggestions and improvements. If the library is accepted, my plan is to add more containers (pseudo-intrusive hash-set, maybe n-ary tress) and more functionality to the library. Comments and suggestions welcome! Regards, Ion

Ion Gaztañaga wrote:
-> Added constant-time "size()" option to containers. This option adds a member to every container that's updated with every insertion/removal. This changes some complexity guarantees in the containers. This constant-time "size()" is optional and it's not enabled by default.
I think you should try to model the STL. Indeed, containers such as std::list provide a constant time size function. ilist should do the same.
I think that Intrusive can serve now as a good basis to build non-intrusive containers (STL-like, multi-index like...).
Since intrusive linked lists is a common usage in C, do you have examples of libraries where that could be wrapped in your containers?

On 12/9/06, Mathias Gaunard <mathias.gaunard@etu.u-bordeaux1.fr> wrote:
I think you should try to model the STL. Indeed, containers such as std::list provide a constant time size function. ilist should do the same.
I'm fairly certain that this does not need to be the case. There was a thread a while back about going though the sources to look for uses of list::size exactly because it could be O(n) to ensure that there was always an O(1) empty-using function as an alternative to size. ~ Scott

me22 wrote:
I'm fairly certain that this does not need to be the case.
There was a thread a while back about going though the sources to look for uses of list::size exactly because it could be O(n) to ensure that there was always an O(1) empty-using function as an alternative to size.
I thought containers were only supposed to provide operations if those were efficient. If std::list::size is linear, then it shouldn't have been provided and std::count should have been used instead. I do not have the time to check the spec though, so I'll believe you if you say it's not guaranteed to be O(1).

On 12/9/06, Mathias Gaunard <mathias.gaunard@etu.u-bordeaux1.fr> wrote:
I thought containers were only supposed to provide operations if those were efficient. If std::list::size is linear, then it shouldn't have been provided and std::count should have been used instead.
std::distance, but yeah. I believe that std::list::size should not be provided so that it's obviously linear, allowing std::list::splice to be constant-time because you can cache size yourself but you can't make an O(N) splice O(1).
I do not have the time to check the spec though, so I'll believe you if you say it's not guaranteed to be O(1).
It seems to have its compexity specified as "(Note A)" which is "should be constant", but should != must. A quick google search found http://gcc.gnu.org/ml/libstdc++/2005-11/msg00219.html : "It would also be most interesting if it surveyed something more than boost, or if it at least did a more complete survey of boost. Finding all of the uses of list::size() in generic code is a real problem. Searching boost sources for size() turns up nearly 2000 hits, most of which won't be related to list, but I wouldn't be surprised if a few percent of them were inappropriate calls to list::size() (or could be, depending on template parameters). But I can't make a career out of manually picking through 2000 calls to size () and figuring out which are which. :-) "I realize I've got an uphill (almost sure to loose) battle here. gcc has had an O(N) list::size for forever. And the supporters of that decision are numerous, adamant and smart. When I considered gcc's libstdc++ a competitor, I actually took comfort in that fact, secure in the knowledge that gcc would never come around and challenge me in that department. If gcc's customer base mysteriously saw significant performance gains when migrating to my product, that was a good thing! ;-) Now, ironically, I find myself in a position of trying to argue for the very change that I took comfort in believing would never happen. :-) "My position isn't that a doubly-linked list should have an O(1) size. It is that if a container, any container, has a size(), then that member should be O(1) (same for operator[], at, swap, max_size, capacity, begin, end, front, back, push/pop_front, push/pop_back, empty, default ctor, -- perhaps amortized O(1) on some of those). It would be incredibly easy to give std::list an operator[]. It would also be an incredibly bad idea. "std::list has a size(). We should either get rid of it, or make it O (1). Having it there, and (maybe) be O(N) is nothing but a bear trap waiting to spring on the next person that comes to C++ and hasn't yet memorized Scott's Effective STL (btw I haven't memorized that valuable book either). I think getting rid of list::size is too big of a backwards compatibility hit." ~ Scott

me22 wrote:
On 12/9/06, Mathias Gaunard <mathias.gaunard@etu.u-bordeaux1.fr> wrote:
I think you should try to model the STL. Indeed, containers such as std::list provide a constant time size function. ilist should do the same.
I'm fairly certain that this does not need to be the case.
There was a thread a while back about going though the sources to look for uses of list::size exactly because it could be O(n) to ensure that there was always an O(1) empty-using function as an alternative to size.
I think that Howard Hinnant has a very good paper about this, titled "On std::list::size": http://home.twcny.rr.com/hinnant/cpp_extensions/On_list_size.html He suggests adding a new splice overload taking the number of elements between two iterators to obtain a O(1) size with lists with O(1) size. I've added that overload both to the intrusive singly and doubly linked lists. Historically, SGI STL had a O(N) size std::list. In Interprocess, since I took SGI implementation as my base implementation I had O(N) list. The last version has O(1) size. I think that most users expect list::size() to be O(1) because they think that otherwise, the operation wouldn't be there, just like std::list has no operator[](). All other containers have O(1) size, so is reasonable to expect O(1) size in list. However, with intrusive containers I think that both possibilities should be provided. In most cases, if the user uses an intrusive list to store his objects, he might expect the same behavior than std::list and Windows users (surely using Dinkumware STL) will expect O(1) size. On the other hand, if people use list::splice as a basic operation, they want to obtain O(1) splice. There are more examples of more efficient operations with O(N) size: for an intrusive list "erase(iterator, iterator)" O(1) if we have O(N) size. There is another reason to have a O(N) size() intrusive list: space overhead. If a boost user wants to implement a TR1 unordered_map, a typical implementation is to have bucket array and each bucket is a singly (maybe doubly) linked list of values. The library author does not need islist::size() for nothing, but an empty circular singly linked list can be implemented with just a pointer pointing to itself. Since the hashmap's load factor is normally 1.0, this means that for every inserted object, there is a bucket. Adding the size member duplicates the size of the bucket, and this space overhead is noticeable. If the programmer has an option to have a O(N) size, he reduces the size of a bucket to just one pointer. If we have a "unordered_set<int>" with a load factor = 1.0, and a space-friendly allocator (a node allocator), there is a ~25% size overhead if we use a O(1) size intrusive singly linked list: every node has 2 words (1 word for every stored int, 1 word for the pointer used to implement the linked list) and every the bucket has two words (one for the pointer to implement the linked list and another word to store the size of the linked list). Original Intrusive library containers had O(N) size. Since I think that most users expect to have O(1) size, I've added that possibility. The default is to have O(N) size, but that, of course, can be changed, if people prefer O(1) size. Thanks for the comments, Ion

"Ion Gaztañaga" wrote:
std::list::size":
One solution may be to provide list::size() and have the size itself encoded as a signed integer. If the integer is >= 0 then the size() will return it. After splice() the number will be set negative. Only the next size() call will go through the list and set the correct value again. For a typical list<> use the size() should get amortized to O(1). Alexandrescu had discussed this few years ago in context of his YASLI library. /Pavel

Pavel Vozenilek wrote:
"Ion Gaztañaga" wrote:
std::list::size":
One solution may be to provide list::size() and have the size itself encoded as a signed integer. If the integer is >= 0 then the size() will return it.
After splice() the number will be set negative. Only the next size() call will go through the list and set the correct value again. For a typical list<> use the size() should get amortized to O(1).
Alexandrescu had discussed this few years ago in context of his YASLI library.
This can be an option. The downside is that size() is a const function and we should declare the size member as mutable. This converts a thread-safe function like size() to a non-thread safe one. But I agree that this can be a good solution. To avoid wasting a bit of the internal size integer, we could define size() as "this->size_ - 1", so this->size_ == 0 could represent unknown size and this->size_== 1 an empty list, this->size_ == 2 a list with just 1 element and so on. Regards, Ion

"Ion Gaztañaga" wrote:
One solution may be to provide list::size() and have the size itself encoded as a signed integer. If the integer is >= 0 then the size() will return it.
This can be an option. The downside is that size() is a const function and we should declare the size member as mutable. This converts a thread-safe function like size() to a non-thread safe one.
I don't think much of MT safety is obtained by just relying on atomic list::size(), at least I hope people don't rely on it.
To avoid wasting a bit of the internal size integer, we could define size() as "this->size_ - 1", so this->size_ == 0 could represent unknown size and this->size_== 1 an empty list, this->size_ == 2 a list with just 1 element and so on.
Using negative value to indicate the need for recalculation translates into a single assembler instruction (jz on x86). resize() and empty() would need to do the check as well. /Pavel
participants (5)
-
Ion Gaztañaga
-
Mathias Gaunard
-
me22
-
Pavel Vozenilek
-
Sebastian Redl