[multi_index][rfc] random access indices preview

Hello, Some months ago I prompted for interest in a new type of indices for Boost.MultiIndex featuring random access facilities, and received a couple of expressions of itnerest. The preliminary specification was published at: http://195.235.93.150/rnd_indices_spec.html I've just uploaded a preview version of random access indices at FileVault -> Containers (http://tinyurl.com/9zlmj) This preview follows the previous specification save for one point: reserve/capacity is provided for control of an internal array of pointers to the nodes, as explained in http://195.235.93.150/rnd_indices_spec.html#interface. There's no docs yet on this type of indices, but the usage should be straightforward from the preliminay specification notes. Random access indices act as a functional drop-in replacement of sequenced indices, so you can start playing with them very easily. My requests: * Do you find these indices worth including in Boost.MultiIndex? * If so, do you have comments on the specification? Things you'd change/add? There's a list of concrete issues at the end of the specification page, * Do you envision any interesting use case for random access indices? Have you experimented with them? Thank you very much for your feddback, Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

"Joaquín Mª López Muñoz" <joaquin@tid.es> wrote in message news:437879AE.D786AA7D@tid.es... Hello, Some months ago I prompted for interest in a new type of indices for Boost.MultiIndex featuring random access facilities, and received a couple of expressions of itnerest. The preliminary specification was published at: http://195.235.93.150/rnd_indices_spec.html I've just uploaded a preview version of random access indices at FileVault -> Containers (http://tinyurl.com/9zlmj) This preview follows the previous specification save for one point: reserve/capacity is provided for control of an internal array of pointers to the nodes, as explained in http://195.235.93.150/rnd_indices_spec.html#interface. There's no docs yet on this type of indices, but the usage should be straightforward from the preliminay specification notes. Random access indices act as a functional drop-in replacement of sequenced indices, so you can start playing with them very easily. My requests: * Do you find these indices worth including in Boost.MultiIndex? * If so, do you have comments on the specification? Things you'd change/add? There's a list of concrete issues at the end of the specification page, * Do you envision any interesting use case for random access indices? Have you experimented with them? One possible alternative implementation is a modification of the order-statistic tree from "Introduction to Algorithms" by Corman et. all. Basically this is a balanced binary tree where every node contains the size of its left subtree. This results in a data structure where insertion, deletion, indexing and iterator arithmetic are all O(log n) and for which insertion and deletion do not invalidate iterators. Note that while the version shown in "Introduction to Algorithm" is a binary search tree, for your version you can omit the condition that the nodes are in sorted order and just keep the balancing and order-statistic structures. Joe Gottman

Hello Joe, Joe Gottman ha escrito:
"Joaquín Mª López Muñoz" <joaquin@tid.es> wrote in message news:437879AE.D786AA7D@tid.es... * Do you envision any interesting use case for random access indices? Have you experimented with them?
One possible alternative implementation is a modification of the order-statistic tree from "Introduction to Algorithms" by Corman et. all. Basically this is a balanced binary tree where every node contains the size of its left subtree. This results in a data structure where insertion, deletion, indexing and iterator arithmetic are all O(log n) and for which insertion and deletion do not invalidate iterators. Note that while the version shown in "Introduction to Algorithm" is a binary search tree, for your version you can omit the condition that the nodes are in sorted order and just keep the balancing and order-statistic structures.
I think your proposal makes a very different kind of index, since it wouldn't provide random access iterators. FWIW, ranked indices (http://tinyurl.com/b34sb) are based on order-statistics trees and can be used to construct ordered indices with O(log n) positional lookup (operator [] ): not exactly what you propose, but probably equivalent form some applications, and might be included in Boost.MultiIndex in the future. On the other hand, please note that middle-insertion and middle-deletion, though technically O(n) for random access indices, are actually much faster than for std::vector, since there's no element reallocation/shifting at all (only an internal array of pointers is reallocated or shifted accordingly.) So the performance characteristics of random access indices are, IMHO, not so bad. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

On 11/14/05, Joaquín Mª López Muñoz <joaquin@tid.es> wrote:
you can start playing with them very easily. My requests:
* Do you find these indices worth including in Boost.MultiIndex?
I have yet to use Boost.MultiIndex, but I have enjoyed reading the excellent documentation and have meant to find a use for it in my own code. This might just be the feature I need to try it. * If so, do you have comments on the specification? Things you'd
change/add? There's a list of concrete issues at the end of the specification page,
Re: capacity/reserve. I think it might be useful to include these methods and have them operate on the key array as you suggest. Your argument against them is that they are only used to ensure iterator validity. But wouldn't these methods also be useful in optimizing container load time, as they are for std::vector and friends? I know I always call "reserve" on a vector when I know how much data I'm going to load into it ahead of time. * Do you envision any interesting use case for random access indices?
Have you experimented with them?
I have an insertion-order traversal hash table that I use in a situation where MultiIndex + Random Access Iterators might provide a very compelling replacement, and I am eager to try it. Thank you very much for your feddback,
Hope this helps, -- Caleb Epstein caleb dot epstein at gmail dot com

On 11/15/05, Caleb Epstein <caleb.epstein@gmail.com> wrote:
Re: capacity/reserve. I think it might be useful to include these methods and have them operate on the key array as you suggest. Your argument against them is that they are only used to ensure iterator validity. But wouldn't these methods also be useful in optimizing container load time, as they are for std::vector and friends? I know I always call "reserve" on a vector when I know how much data I'm going to load into it ahead of time.
Now that I actually read your entire email it appears that you decided to support capacity/reserve in the implementation and the spec is out of date. Sorry for the misunderstanding and not reading your email fully. As to: I don't particularly like the name random_access. Do you have an alternative
proposal? Ideally, a name should be an adjective (like is already the case for "ordered", "sequenced", etc.) and capture the essential feature of the indices, namely random accessibility.
What aboust "clustered"? It seems a bit like a clustered index in SQL land, which implies a physical ordering to the data. Unless I'm totally off the mark. -- Caleb Epstein caleb dot epstein at gmail dot com

Caleb Epstein ha escrito:
On 11/14/05, Joaquín Mª López Muñoz <joaquin@tid.es> wrote: I don't particularly like the name random_access. Do you have an alternative
proposal? Ideally, a name should be an adjective (like is already the case for "ordered", "sequenced", etc.) and capture the essential feature of the indices, namely random accessibility.
What aboust "clustered"? It seems a bit like a clustered index in SQL land, which implies a physical ordering to the data. Unless I'm totally off the mark.
"Clustered" seems to me a little far-fetched, since it has some connotations from DB world that do not really apply to random access indices: * DB tables usually can't have more than one clustered index. This is not the case with Boost.MultiIndex random access indices. * DB clustered indices are key-based, Boost.MultiIndex random access indices are not. Actually, in DB world "clustered" is an attribute of an index, much like "unique" is. This does not match well the situation in Boost.MultiIndex IMHO. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Caleb Epstein ha escrito:
* If so, do you have comments on the specification? Things you'd
change/add? There's a list of concrete issues at the end of the specification page,
Re: capacity/reserve. I think it might be useful to include these methods and have them operate on the key array as you suggest. Your argument against them is that they are only used to ensure iterator validity. But wouldn't these methods also be useful in optimizing container load time, as they are for std::vector and friends? I know I always call "reserve" on a vector when I know how much data I'm going to load into it ahead of time.
Yep, I finally decided to provide capacity/reserve.
* Do you envision any interesting use case for random access indices?
Have you experimented with them?
I have an insertion-order traversal hash table that I use in a situation where MultiIndex + Random Access Iterators might provide a very compelling replacement, and I am eager to try it.
If you finally do so, please report your results back. Also, I'm on the lookout for interesting use cases of random access indices to include in the examples section, yours could make a good candidate if you don't mind explaining it a little more. Thank you! Joaquín M López Muñoz Telefónica, Investigación y Desarrollo
participants (3)
-
Caleb Epstein
-
Joaquín Mª López Muñoz
-
Joe Gottman