Is there any interest in this container?

I was wondering if there is any interest in a container that is a model of sequence (just like std::vector or std::list) and with an interface matching that of std::vector (maybe with minor differences due to effiency) but with different time complexities? Basically this is a Random Access Container but with the following time complexity guarantees instead: * insert/erase anywhere in the list: O(lg n) * element Access (i.e. [] or at()): O(lg n) * front()/back(): O(1) This would make it better than a list for random accessing elements and better than a vector for inserting/deleting elements in the middle of the list. I am currently working of an implementation of this container and thought I would see if there would be any interest in such a container from the boost community. Sincerely, Peter Palotas

This seems like a useful data structure, although I can't think of any specific use cases. I am assuming you are implementing it as a balanced binary tree where you keep track of the number of nodes in the subtree rooted at each node. (This allows finding a node by its numeric index in time.) -- Jeremy Maitin-Shepard

Jeremy Maitin-Shepard wrote:
This seems like a useful data structure, although I can't think of any specific use cases.
I use such a data structure right... keeping track of a GUI list that can have items inserted/removed in it. A regular number->item map doesn't work as it has to change the index of items on inserts/deletes.
I am assuming you are implementing it as a balanced binary tree where you keep track of the number of nodes in the subtree rooted at each node. (This allows finding a node by its numeric index in time.)
That is how I implemented it. The data structure is called a "rank tree", describe in the white book. I've mentioned the structure in this list before :-) -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com - 102708583/icq

Rene Rivera <grafik.list <at> redshift-software.com> writes:
That is how I implemented it. The data structure is called a "rank tree", describe in the white book. I've mentioned the structure in this list before
What is the 'white book'? Is this the standard reference for structures in this area? Matt

Matthew Vogt wrote:
Rene Rivera <grafik.list <at> redshift-software.com> writes:
That is how I implemented it. The data structure is called a "rank tree", describe in the white book. I've mentioned the structure in this list before
What is the 'white book'? Is this the standard reference for structures in this area?
Somewhat standard in that it's used in most CS schools. Introduction to Algorithms Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest ISBN 0-262-03141-8 -- ISBN 0-07-013143-0 Second edition at... http://www.bookpool.com/.x/n5753dkkj4/sm/0262032937 Bookpool: Introduction to Algorithms, 2nd Edition Peter mentioned to me privately that the 1st edition is available online. Peter care to provide a link to it? PS. And since Peter also asked, and before more people ask. The description I refer to is in section 15.1 of the first edition. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com - 102708583/icq

Rene Rivera <grafik.list <at> redshift-software.com> writes:
Introduction to Algorithms Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest ISBN 0-262-03141-8 -- ISBN 0-07-013143-0
Peter mentioned to me privately that the 1st edition is available online. Peter care to provide a link to it?
Well, if it is online, it's not easy to find... Slides based on it are available, though, from: http://highered.mcgraw-hill.com/sites/0070131511/student_view0/ Matt

"Rene Rivera" <grafik.list@redshift-software.com> wrote in message news:4057E121.8080207@redshift-software.com...
[...] That is how I implemented it. The data structure is called a "rank tree", describe in the white book. [...]
Hmm...DADS doesn't have it, and I have found that to be a most exhaustive and authoritative source: http://www.nist.gov/dads/ Dave --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.581 / Virus Database: 368 - Release Date: 2/9/2004

Jeremy Maitin-Shepard wrote:
This seems like a useful data structure, although I can't think of any specific use cases.
I am assuming you are implementing it as a balanced binary tree where you keep track of the number of nodes in the subtree rooted at each node. (This allows finding a node by its numeric index in time.)
Yes, that is correct. I'm implementing it as a red-black tree keeping track of the number of nodes rooted at each node just as you described. I'm also thinking about managing a linked list of the elements in the container to allow for constant time iteration through the list. This would add a small O(1) overhead to insertions and deletions, but on the other hand it would probably speed up deletions a little bit as well so it might be a good idea. The use I wanted this structure for is the same as Rene wrote about; for keeping track of eg. a GUI list that can have items inserted/removed in it. Not sure if there are other usages as well nor if this is a general enough usage to include in a library such as boost, but I know I've wanted this container for this purpose on a number of occasions.

"Peter Palotas" <peter@smartbusiness.nu> wrote in message news:auto-000000364385@statement.se...
I was wondering if there is any interest in a container that is a model of sequence (just like std::vector or std::list) and with an interface matching that of std::vector (maybe with minor differences due to effiency) but with different time complexities? [...]
Take a look at the policy-based map in the sandbox. I implement what I called "indexed nodes" which appear to give the complexity characteristics you describe. I'm not sure whether the indexed_set library soon to be reviewed also supports this feature or not, but given all the other features it has, I would be suprised if it didn't. ;) Dave --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.581 / Virus Database: 368 - Release Date: 2/9/2004

That actually sounds pretty much what I was looking for. However I can't seem to find the source in the sandbox (using the web interface). Haven't poked around there before so maybe I'm just stupid, but I can't find the actual header-file in http://cvs.sourceforge.net/viewcvs.py/boost-sandbox/boost-sandbox/libs/map/ Where can I find it? And how stable/complete is your implementation? Regards, Peter
-----Original Message----- From: boost-bounces@lists.boost.org [mailto:boost-bounces@lists.boost.org] On Behalf Of David B. Held Sent: den 17 mars 2004 19:54 To: boost@lists.boost.org Subject: [boost] Re: Is there any interest in this container?
"Peter Palotas" <peter@smartbusiness.nu> wrote in message news:auto-000000364385@statement.se...
I was wondering if there is any interest in a container that is a model of sequence (just like std::vector or std::list) and with an interface matching that of std::vector (maybe with minor differences due to effiency) but with different time complexities? [...]
Take a look at the policy-based map in the sandbox. I implement what I called "indexed nodes" which appear to give the complexity characteristics you describe. I'm not sure whether the indexed_set library soon to be reviewed also supports this feature or not, but given all the other features it has, I would be suprised if it didn't. ;)
Dave
--- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.581 / Virus Database: 368 - Release Date: 2/9/2004
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Peter Palotas wrote:
That actually sounds pretty much what I was looking for. However I can't seem to find the source in the sandbox (using the web interface). Haven't poked around there before so maybe I'm just stupid, but I can't find the actual header-file in http://cvs.sourceforge.net/viewcvs.py/boost-sandbox/boost-sandbox/libs/map/
Where can I find it? And how stable/complete is your implementation?
The headers are like in the regular Boost: ../boost-sandbox/boost/* I look at the implementation and it doesn't do what a rank tree does. It does make it easier to base a rank tree implementation against it. The part that is missing is keeping track of the rank of nodes as the tree changes, the constant time op, that lets one have the logN ops later on. [ Unless I missed something -- So correct me at will ;-) ] To David... It also looks to me that there is a much better way to support the custom index aspect of assoc containers with something I thought about recently, tuple_adaptor... The idea is to have a tuple compatible interface that can access a subset of a larger structure. You can then use the tuple adaptor as the comparison/index. Basically abstracting out the "index" like functionality you have. Here are some notes I made up about it last week...
tuple_adaptor, provides the boost::tuple interface, and types, given an existing structure and some of it's members. For example having this exiting structure:
struct data_op { int number; bool flag; std::string name; }; typedef tuple_adaptor<data_op,&data_op::number,&data_op::flag,&data_op::name> data_op_tuple; data_op_tuple x(1,false,"foo"); x.flag = true; x.get<0>() = 13; This facilitates adapting existing types for use in algorithms that take tuples. Another option this opens is proving a subset tuple interface to a large structure, for example: typedef tuple_adaptor<data_op,&data_op::number,&data_op::name> data_op_tuple_2; <<<< -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com - 102708583/icq

"Rene Rivera" <grafik.list@redshift-software.com> wrote in message news:4058C84D.2000008@redshift-software.com...
[...] I look at the implementation and it doesn't do what a rank tree does. It does make it easier to base a rank tree implementation against it. The part that is missing is keeping track of the rank of nodes as the tree changes, the constant time op, that lets one have the logN ops later on. [ Unless I missed something -- So correct me at will ;-) ]
Hmm...are you sure? Did you look at the rotate_right() and rotate_left() functions? The "indexes" wouldn't be correct if you didn't adjust them on insert/delete. Essentially, what I call "index" is probably what you call "rank".
It also looks to me that there is a much better way to support the custom index aspect of assoc containers with something I thought about recently, tuple_adaptor... The idea is to have a tuple compatible interface that can access a subset of a larger structure. You can then use the tuple adaptor as the comparison/index. Basically abstracting out the "index" like functionality you have. Here are some notes I made up about it last week... [...]
Actually, this sounds almost exactly like what Joaquin's IndexedSet does. You might want to take a closer look at that if you haven't already. I probably will soon so I can give it a thorough review. Dave --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.581 / Virus Database: 368 - Release Date: 2/9/2004

David B. Held wrote:
"Rene Rivera" <grafik.list@redshift-software.com> wrote in message news:4058C84D.2000008@redshift-software.com...
[...] I look at the implementation and it doesn't do what a rank tree does. It does make it easier to base a rank tree implementation against it. The part that is missing is keeping track of the rank of nodes as the tree changes, the constant time op, that lets one have the logN ops later on. [ Unless I missed something -- So correct me at will ;-) ]
Hmm...are you sure? Did you look at the rotate_right() and rotate_left() functions? The "indexes" wouldn't be correct if you didn't adjust them on insert/delete. Essentially, what I call "index" is probably what you call "rank".
No I wasn't sure, hence why I said I may have missed something ;-) Yes you are correct it does have the functionality. It still took me a while to find the exact place even with your hint above, IndexedNode rotate* and update*, that has the count.
does. You might want to take a closer look at that if you haven't already. I probably will soon so I can give it a thorough review.
Will do :-) -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com - 102708583/icq

"Peter Palotas" <peter@smartbusiness.nu> wrote in message news:auto-000000366088@statement.se...
[...] And how stable/complete is your implementation?
Well, it's been in use in a deployed software system for several years now with no apparent bugs, but the use-case does not stress the structure anywhere near exhaustively. For the purpose of reasonably fast insert + lookup, it works beautifully. As far as completeness goes, I shamelessly ripped off the STLport map/set implementation and hacked it. So it's basically as complete as SLTport's map/set. ;) That should also speak to the quality issue, though any bugs in the library are almost certainly mine. I really wish I had had the time to clean it up and formalize it, but it looks like Joaquin's set library might (or possibly just 'could') subsume its functionality anyway, so I will chase other dragons. Dave --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.581 / Virus Database: 368 - Release Date: 2/9/2004

David B. Held <dheld <at> codelogicconsulting.com> writes:
"Peter Palotas" <peter <at> smartbusiness.nu> wrote in message news:auto-000000364385 <at> statement.se...
I was wondering if there is any interest in a container that is a model of sequence (just like std::vector or std::list) and with an interface matching that of std::vector (maybe with minor differences due to effiency) but with different time complexities? [...]
Take a look at the policy-based map in the sandbox. I implement what I called "indexed nodes" which appear to give the complexity characteristics you describe. I'm not sure whether the indexed_set library soon to be reviewed also supports this feature or not, but given all the other features it has, I would be suprised if it didn't. ;)
Well, currently indexed_set does not have such an index type, although it could be probably implemented with relative ease. But I fail to see what this type of structure is useful for. From what I've read on this thread, the main reason one could want to have operator[] is that it allows the programmer to store the index of an element for later retrieval. But this can be also accomplished (and with better performance) if we store an iterator or a reference to the element instead. I guess I'm not fully understanding the use case. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

"Joaquin M Lopez Munoz" <joaquin@tid.es> wrote in message news:loom.20040318T093028-312@post.gmane.org...
[...] I guess I'm not fully understanding the use case.
As one poster mentioned, it has to do with GUIs. Although I've used this structure in non-GUI contexts, the original motivation was the back-end of a Win32 ListView. I had an application where data was being inserted continuously (not at a high rate, but high enough that I didn't want O(N) insert). The data had to be sorted, and you needed to be able to jump to any arbitrary point in the data and start iterating, without doing an O(N) scan. That's because a virtual ListView will let you scroll to any point in the list, and pass a linear index at which you must start dumping items. So if you have a listview with 3,000 items, the user could scroll to item 2,500, at which point your data structure must be able to deliver a screenful of data beginning with that index. So my solution was to modify std::map (since it already had the sorting invariant and reasonably fast insert) so that you got an O(log n) indexed lookup. Once you get the first lookup, you can just iterate for the remaining display elements (amortized constant). In fact, any time you want to present a large list of sorted data to a user (consider a client/server scenario, for instance), and the user can request data beginning at a certain ordinal index, the "rank tree" (as Rene calls it) is a useful structure to have. For instance, imagine a client that can browse a remote table of thousands of employee records that the server has cached in memory. Suppose the client displays 100 records per page, and the user says: "Let me see page 30". If you only have one client, it would not be unreasonable for the server to do a linear search to record 3,000. If you have 50 clients, that linear search starts to look pretty bad, and the O(log n) ordinal/rank search starts to look pretty good. Does that help? Dave --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.581 / Virus Database: 368 - Release Date: 2/9/2004

David B. Held <dheld <at> codelogicconsulting.com> writes: [...]
In fact, any time you want to present a large list of sorted data to a user [...] Does that help?
Yes it does. But then I think we are talking about two different data structures here: * Yours: A sorted container with ordinal-based access. * Peter's: A list-like container with ordinal-based access. Am I right about this? Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

David B. Held <dheld <at> codelogicconsulting.com> writes: [...]
In fact, any time you want to present a large list of
sorted data to a
user [...] Does that help?
Yes it does. But then I think we are talking about two different data structures here:
* Yours: A sorted container with ordinal-based access. * Peter's: A list-like container with ordinal-based access.
Am I right about this?
I can't speak for what David meant, but personally I meant an unsorted container, yes, essentially a container with the same interface (pretty much anyway) as that of vector combined with list. // Peter

On 18. Mar 2004, at 11:50, David B. Held wrote:
[...] I guess I'm not fully understanding the use case. [...] I had an application where data was being inserted continuously (not at a high rate, but high enough that I didn't want O(N) insert). The data had to be sorted, and you needed to be able to jump to any arbitrary point in the data and start iterating, without doing an O(N) scan.
I had a similar problem myself, but I found that the slower (lg N) lookup time (even though iteration from that point was linear) did not make up for the faster insertion, because, most of the time I used the lookup methods -- and in your listview example, I would think the same is the case in 99% of the applications I can envision. Similar with the client/server example, since the 3,000 clients only fetch data from the database as I understood it. My problem was that insertions could come in bursts, and could e.g. all go to the start of the container -- my solution was to create a cache or insertion queue (in lack of a better name), to which items were initially added, and lookup would take this queue into consideration. The queue was of fixed size and flushed (in O(n) time) when it was full. Basically giving me the same theoretical time complexities as only using a vector, but in practice making insertion of m items significantly faster. -- http://www.diku.dk/hjemmesider/studerende/duff/
participants (7)
-
Allan Odgaard
-
David B. Held
-
Jeremy Maitin-Shepard
-
Joaquin M Lopez Munoz
-
Matthew Vogt
-
Peter Palotas
-
Rene Rivera