[countertree] New Version

Francisco José Tapia

1 May 2011 1 May '11

9:08 p.m.

Hi this message is to announce the new version of the countertree library. You can find the zip file with the code and the documentation in < http://www.boost-consulting.com/vault/index.php?action=downloadfile&filename=countertree.zip&directory=Containers&> or if you want a quick look , you can see in my web page < http://fjtapia.webs.com/works/countertree/index.html> Technically this library is an implementation of a binary red-black counter tree. Colloquially is a balanced tree with an additional counter in each leaf. This permit the access to the elements by the position, like in a vector. It is a random access container. If the information stored is unordered, we have the class boost::vectortree. This class have the same interface than a std::vector. The std::vector is very fast O(k) inserting and deleting at end , but very slow O(N) if you must do in other positions . In the vectortree all the operations are O(logN). It is a bite slower than std:vector inserting and deleting at end, but much more faster for to insert and delete in any other position. If the information stored is ordered, you have the classes boost::set, boost::multiset boost::map and boost::multimap. These classes have identical interfaz than the std::set , std::multiset, std::map and std::multimap, and additionally, access by position with the function at(pos). The iterators are random access and have a function pos() which return their position in the container. You can mix safely access by iterators and access by position. The performance of these classes are similar to the classes of the STL. #include <boost/countertree/set.hpp> #include <iostream> int main ( void ) { //------------------------ begin -------------------- boost::set<int> A ; for ( int i = 0 ; i< 100 ; ++i) A.insert(i); for ( int i = 0 ; i < A.size() ; ++i) std::cout<<A.at(i); return 0 ; }; For to show the utility of this, imagine a large unordered file with the information about the employees of a big company. You have indexes implemented with multimaps, with pairs [value, position], and use the upper_bound and lower_bound functions . You need to select the employees in a range of age and weight. With the boost:multimap you can know the number of elements of the selection. In the weight query we found 100 elements and in the age query 10000 elements. To travel the elements of the weight query asking for the age, is faster than the opposite. Tray it. It fast, useful and easy to understand and use,. They are the STL classes with a few additional functions. I had checked this code with GCC (32 and 64) and Visual Studio 2008 (32) and Visual Studio 1010 (32 and 64). Please, if you find any problem, please mail me ( <fjtapia@gmail.com>) or if you mail Boost, please insert [countertree] in the head of the message. Some days I have not time to read all messages arrived to my mail, but if I see that I detect and respond first. Francisco Tapia <fjtapia@gmail.com>

Show replies by date

Rene Rivera

2 May 2 May

3:33 p.m.

On 5/1/2011 4:08 PM, Francisco José Tapia wrote:

...

or if you want a quick look , you can see in my web page< http://fjtapia.webs.com/works/countertree/index.html>

Please adjust your docs to make it *very* clear this is not an official, or accepted, Boost C++ Library. Preliminaries... Have you looked at the past rank tree and general tree conversations in the Boost dev list? Specifically the TR paper and partial implementation on this subject that was created as a GSoC project?

...

Technically this library is an implementation of a binary red-black counter tree. Colloquially is a balanced tree with an additional counter in each leaf. This permit the access to the elements by the position, like in a vector. It is a random access container.

Have you considered that it's possible to do away with the red-black information as the rank/count information is enough to create the balanced tree?

...

If the information stored is unordered, we have the class boost::vectortree.

You really should have some sub-namespace in that. I.e. "boost::countertree::vectortree" or such. It is frowned upon to insert things into the main boost namespace.

...

This class have the same interface than a std::vector. The std::vector is very fast O(k) inserting and deleting at end , but very slow O(N) if you must do in other positions . In the vectortree all the operations are O(logN). It is a bite slower than std:vector inserting and deleting at end, but much more faster for to insert and delete in any other position.

Are there other differences than performance to the standard containers? I.e. why would I use vectortree instead of some other container? For example, why use a vectortree<std::string> vs. a hashmap<size_t,std::string>? NOTE: Yes, I'm asking very loaded question.. So assume I know a lot about the particulars of this subject ;-) -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org (msn) - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim,yahoo,skype,efnet,gmail

Marsh Ray

4:39 p.m.

New subject: Proposed documentation convention for pre-accepted libs

On 05/02/2011 10:33 AM, Rene Rivera wrote:

...

On 5/1/2011 4:08 PM, Francisco José Tapia wrote:

...
or if you want a quick look , you can see in my web page< http://fjtapia.webs.com/works/countertree/index.html>

Please adjust your docs to make it *very* clear this is not an official, or accepted, Boost C++ Library.

I suggest that Boost adopt a convention for the documentation of libraries that are in the formally proposed or trial balloon state. It seems like a good idea to have an alternate set of identifiers, perhaps an alternate BoostBook stylesheet, macros, etc. that library authors can use when developing libraries that they think they might want to submit. A set of guidelines that would make it easy for an author to put together everything for a well-integrated submission without it looking like he's representing his library as an official Boost project. Then it can be just a few search-and-replace operations away from official once the library is accepted. - Marsh

Stewart, Robert

5:03 p.m.

New subject: Proposed documentation convention for pre-accepted libs

Marsh Ray wrote:

...

On 05/02/2011 10:33 AM, Rene Rivera wrote:

...
On 5/1/2011 4:08 PM, Francisco José Tapia wrote:

...
or if you want a quick look , you can see in my web page< http://fjtapia.webs.com/works/countertree/index.html>

Please adjust your docs to make it *very* clear this is not an official, or accepted, Boost C++ Library.

I suggest that Boost adopt a convention for the documentation of libraries that are in the formally proposed or trial balloon state.

There were extensive discussions of that recently (within the last two years, anyway). _____ Rob Stewart robert.stewart@sig.com Software Engineer using std::disclaimer; Dev Tools & Components Susquehanna International Group, LLP http://www.sig.com IMPORTANT: The information contained in this email and/or its attachments is confidential. If you are not the intended recipient, please notify the sender immediately by reply and immediately delete this message and all its attachments. Any review, use, reproduction, disclosure or dissemination of this message or any attachment by an unintended recipient is strictly prohibited. Neither this message nor any attachment is intended as or should be construed as an offer, solicitation or recommendation to buy or sell any security or other financial instrument. Neither the sender, his or her employer nor any of their respective affiliates makes any warranties as to the completeness or accuracy of any of the information contained herein or that this message or any of its attachments is free of viruses.

Paul A. Bristow

5:12 p.m.

New subject: Proposed documentation convention for pre-accepted libs

...

-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of Marsh Ray Sent: Monday, May 02, 2011 5:40 PM To: boost@lists.boost.org Cc: "=?ISO-8859-1?Q?Francisco_Jos=E9?="@wowbagger.osl.iu.edu Subject: [boost] Proposed documentation convention for pre-accepted libs

On 05/02/2011 10:33 AM, Rene Rivera wrote:

...
On 5/1/2011 4:08 PM, Francisco José Tapia wrote:

...
or if you want a quick look , you can see in my web page< http://fjtapia.webs.com/works/countertree/index.html>

Please adjust your docs to make it *very* clear this is not an official, or accepted, Boost C++ Library.

I suggest that Boost adopt a convention for the documentation of libraries that are in the formally proposed or trial balloon state.

It seems like a good idea to have an alternate set of identifiers, perhaps an alternate BoostBook stylesheet, macros, etc. that library authors can use when developing libraries that they think they might want to submit. A set of guidelines that would make it easy for an author to put together everything for a well-integrated submission without it looking like he's representing his library as an official Boost project.

There was a long thread of discussion about using a logo like "Proposed for Boost" instead of the proper boost.png logo, but people got bored and we failed to reach agreement. I still support this simple mechanism. It encourages authors to get the docs to a good state, but it is quite clear that is not (yet) a Boost reviewed and approved library. A Quickbook [caution This is not yet a Boost library!] is another simple (and popular) option. Paul --- Paul A. Bristow, Prizet Farmhouse, Kendal LA8 8AB UK +44 1539 561830 07714330204 pbristow@hetp.u-net.com

Rene Rivera

8:11 p.m.

New subject: Proposed documentation convention for pre-accepted libs

On 5/2/2011 12:12 PM, Paul A. Bristow wrote:

...

There was a long thread of discussion about using a logo like

"Proposed for Boost" instead of the proper boost.png logo,

<http://svn.boost.org/svn/boost/trunk/doc/src/images/Inspired_by_Boost.png> <http://svn.boost.org/svn/boost/trunk/doc/src/images/Inspired_by_Boost.svg> <http://svn.boost.org/svn/boost/trunk/doc/src/images/Powered_by_Boost.png> <http://svn.boost.org/svn/boost/trunk/doc/src/images/Powered_by_Boost.svg> <http://svn.boost.org/svn/boost/trunk/doc/src/images/Proposed_for_Boost.png> <http://svn.boost.org/svn/boost/trunk/doc/src/images/Proposed_for_Boost.svg>

...

but people got bored and we failed to reach agreement.

I still support this simple mechanism. It encourages authors to get the docs to a good state, but it is quite clear that is not (yet) a Boost reviewed and approved library.

There was work put into the doc tool chain to prevent automatic use of the standard Boost logo when using Boostbook. Which obviously was circumvented, or otherwise not used, in this case. There are options in the Boostbook generation for one to indicate one of the alternate logos. I don't remember what the default was changed to after I first changed though. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org (msn) - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim,yahoo,skype,efnet,gmail

Daniel James

8:54 p.m.

New subject: Proposed documentation convention for pre-accepted libs

On 2 May 2011 21:11, Rene Rivera <grafikrobot@gmail.com> wrote:

...

There was work put into the doc tool chain to prevent automatic use of the standard Boost logo when using Boostbook. Which obviously was circumvented, or otherwise not used, in this case. There are options in the Boostbook generation for one to indicate one of the alternate logos. I don't remember what the default was changed to after I first changed though.

The default is to have no logo outside of boost, which I think is what you set. I changed the settings for css and links. I also the name of the parameter as it wasn't just changing the logo. The details are in the documentation list archives. The countertree documentation is using the logo markup from the website, not from boostbook. So it's added by some sort of post-processing.

Francisco José Tapia

9:21 p.m.

New subject: Proposed documentation convention for pre-accepted libs

It could be a good idea to have a separate page for the documentation of the proposed projects, with the same format, and when a project is accepted , you only need to move the documentation. If you access to the documentation across the page of don't accepted projects, it is clear, and don't need to make two documentations. 2011/5/2 Daniel James <dnljms@gmail.com>

...

On 2 May 2011 21:11, Rene Rivera <grafikrobot@gmail.com> wrote:

...
There was work put into the doc tool chain to prevent automatic use of

the

...
standard Boost logo when using Boostbook. Which obviously was circumvented, or otherwise not used, in this case. There are options in the Boostbook generation for one to indicate one of the alternate logos. I don't remember what the default was changed to after I first changed though.

The default is to have no logo outside of boost, which I think is what you set. I changed the settings for css and links. I also the name of the parameter as it wasn't just changing the logo. The details are in the documentation list archives. The countertree documentation is using the logo markup from the website, not from boostbook. So it's added by some sort of post-processing. _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Marsh Ray

5:12 p.m.

On 05/01/2011 04:08 PM, Francisco José Tapia wrote:

...

Technically this library is an implementation of a binary red-black counter tree. Colloquially is a balanced tree with an additional counter in each leaf. This permit the access to the elements by the position, like in a vector. It is a random access container.

This sounds really useful. I was wanting something like this the other day. I was working on some code for processing named items. Sometimes processing of an item will discover other items needing to be processed. We don't want to process an item twice. So I wrote something like: std::set<string> names_todo; std::set<string> names_done; names_todo.insert( initial set of names ); while ( ! names_todo.empty() ) { string name = remove_first_element(names_todo); // by ref // adds names to 'todo' and 'done' sets. process_name(name, names_todo, names_done); // by ref } But unfortunately always removing the "first" element of a tree is pretty much the worst-case in terms of rebalancing overhead. Probably most algorithms use a stack for the 'todo' list but I was passing in a set anyway and thought it might benefit from the deduplication. However, with your container I'd have decent access to an arbitrary element of the set. Then the part that was remove_first_element(set&) could instead be "remove randomly selected element from set". This way I'd expect average case rather than worst-case performance. What would be even cooler though would be operations like "give me an element among those most efficient to remove" and "most efficient to insert before/after". - Marsh

Scott McMurray

5:49 p.m.

On Mon, May 2, 2011 at 10:12, Marsh Ray <marsh@extendedsubset.com> wrote:

...

However, with your container I'd have decent access to an arbitrary element of the set. Then the part that was remove_first_element(set&) could instead be "remove randomly selected element from set". This way I'd expect average case rather than worst-case performance.

What would be even cooler though would be operations like "give me an element among those most efficient to remove" and "most efficient to insert before/after".

Have you considered building something atop Boost.Intrusive's Splay containers? <http://www.boost.org/doc/libs/1_41_0/doc/html/intrusive/splay_set_multiset.html> Then you can just remove whatever's at the root of the tree each time, since moving the element to the root is the first step in deleting in a splay tree anyway, and you get good (average-case) performance for the "sometimes" inserts. Actually, your "done" set could be a splay_set too, since (barring inserts) you'll be inserting an element that was a child of the previously inserted element... ~ Scott

Marsh Ray

6:16 p.m.

New subject: [countertree] New Version c.f. Intrusive Splay containers

On 05/02/2011 12:49 PM, Scott McMurray wrote:

...

Have you considered building something atop Boost.Intrusive's Splay containers?<http://www.boost.org/doc/libs/1_41_0/doc/html/intrusive/splay_set_multiset.html>

That's looks great. I'll definitely keep that in mind when I get to the point of benchmarking and optimization.

...

Then you can just remove whatever's at the root of the tree each time, since moving the element to the root is the first step in deleting in a splay tree anyway, and you get good (average-case) performance for the "sometimes" inserts.

Actually, your "done" set could be a splay_set too, since (barring inserts) you'll be inserting an element that was a child of the previously inserted element...

So many data structures, so little time ... :-) - Marsh

Francisco José Tapia

9:11 p.m.

Response to Rene Rivera : You are right. I wrote the documentation in that format only by time economy. I wrote in a final format instead of do an intermediate format. I will do a first page of the documentation where indicate it is a preliminary library ,not accepted, and not part of the boost library. The idea about the namespace is right too. I willl change the boost namespace by other dependant of it. About the idea of the rank tree, it is very easy and fast to implement with the countertree library. The idea is very similar to the exposed in the previous message. You have the data stored in a vector, or a file or any other container. You have a “reference” to access to each elements. You implement the index as map or multimaps with pairs (Value, ”Reference”). Imagine you have several index. You make the query with upper_bound and lower_bound. With the countertree map and multimap you can know the size of every query in a O(log N) operation. In a countertree vector_tree, you insert only the “References” of the range with less elements. With the “Reference” , access to each element and check if the values of the others index are in the rank desired, if not , delete it of the vector_tree (It is a O(log N) operation). At the end of the process, in the vector_tree you have the valid “References” . I used this in a 3D locator in a simulator of silicon doping , and it was really fast, with the additional problem of a continuous creation and destroy of elements. In my opinion, if you have an easy way to implement it, it is unnecessary create a complex data structure in order to manage it. Response to Marsh Ray In all the containers of countertree, all the insertion or deletion operation are O(log N). The operation with the first and last elements are slighty faster , because there are pointers to them. But depending of the cost of the comparison, you can expect several millions of operations per second with 1 core, even with data structures of millions of elements. In my computer 1000000 of insertions and deletions in a vector_tree with 1000000 elements spend 0.631 seconds in win32 and 2.23 seconds with GCC 4.5 64 bits Response to Scott McMurray The splay tree must be rebalanced for to optimize the access. The most significant disadvantage is that the height of the splay tree can be linear. The countertree is fast, but if you need more speed, you can build a small cache over the tree, but must be carefully if you store too the position , because if you insert or delete in a previous position to the element stored in the cache, the position of the element change too. In my opinion, the red-black tree make outdated the splay trees, except , perhaps in a very limited environment.

Francisco José Tapia

15 May 15 May

9:35 a.m.

Hi this message is to announce the new version of the countertree library. According with the sugest of Rene Rivera ( Thanks), I had changed the namespaceof the library. Now all is in the namespace cntree. And in the documentation I have a big title with the text preliminary in the head of all the pages. If something is not OK , please mail me , and I will change as soon as possible. You can find the zip file with the code and the documentation in < http://www.boost-consulting.com/vault/index.php?action=downloadfile&filename=countertree.zip&directory=Containers&> or if you want a quick look , you can see in my web page < http://fjtapia.webs.com/works/countertree/index.html> The counter trees are trees where each node have an additional counter. This counter represent the number of nodes under it, including itself. This permit to access to the elements of the tree by their position, like a vector. You can mix safely access by iterators and access by position. This Library provide us : 1.- Vectors with the same speed inserting and deleting in the extreme positions than in the central positions. They are a bit slower than the STL vector inserting and deleting in the extreme positions, but much more faster inserting and deleting in the central positions. 2.- Set, multiset, map, multimap fully compatible with the STL set, multiset, map and multimap, with similar speed.The iterators are, like in a vector, random access (you can add or subtract any integer value to an iterator). You have too access to the elements by position like in a vector, with the function at (position) as you can see in the next example code #include <boost/countertree/set.hpp> #include <iostream> int main ( void ) { //------------------------ begin -------------------- cntree::set<int> A ; for ( int i = 0 ; i< 100 ; ++i) A.insert(i); for ( int i = 0 ; i < A.size() ; ++i) std::cout<<A.at(i); return 0 ; }; The iterators and the data structures are fully compatible with the STL functions and algorithms. For to show the utility of this, imagine a large unordered file with the information about the employees of a big company. You have indexes implemented with multimaps, with pairs [value, position], and use the upper_bound and lower_bound functions . You need to select the employees in a range of age and weight. With the boost:multimap you can know the number of elements of the selection. In the weight query we found 100 elements and in the age query 10000 elements. To travel the elements of the weight query asking for the age, is faster than the opposite. Tray it. It fast, useful and easy to understand and use,. They are the STL classes with a few additional functions. I had checked this code with GCC (32 and 64) and Visual Studio 2008 (32) and Visual Studio 1010 (32 and 64). Please, if you find any problem, please mail me ( <fjtapia@gmail.com>) or if you mail Boost, please insert [countertree] in the head of the message. Some days I have not time to read all messages arrived to my mail, but if I see that I detect and respond quickly. Francisco Tapia <fjtapia@gmail.com>

5187

Age (days ago)

5201

Last active (days ago)

List overview

Download

12 comments

7 participants

participants (7)

Daniel James
Francisco José Tapia
Marsh Ray
Paul A. Bristow
Rene Rivera
Scott McMurray
Stewart, Robert