Re: [boost] [multi_index]terminology/design concern

----- Mensaje original ----- De: Gennadiy Rozental <gennadiy.rozental@thomson.com> Fecha: Miércoles, Abril 12, 2006 8:41 pm Asunto: Re: [boost] [multi_index]terminology/design concern Joaquín wrote:
From your description above, seems to me what you understand as an "order" is verily what the index concept (as used in Boost.MultiIndex) tries to model. Basically, an index is an access interface to a set of objects jointly owned by several other indices --in contrast with regular STL containers who own their elements in an exclusive fashion.
[...]
"Order" would be a good candidate, except that it's generally tied to fixed-order containers (sets, maps), and Boost.MultiIndex also includes sequence-like constructs.
I do not see a problem with term "Order". You could have mulit_order_container. While STL present single order ones.
As I said, I'm reluctant to name sequenced indices as "orders", since it is precisely the lack of any fixed order that defines them. Besides, the differences between index interfaces are huge, and referring to them collectively as orders tends to blur (IMHO) these differences, as if the interfaces were homogeneous, which they aren't. Anyway, names were already chosen, published, and pretty much cast in stone when the library was first released in 2004. Back then, this conversation could have had more practical implications, right now I can only concede/deny your points, but no name change is likely to occur.
I'm afraid I'm not quite getting what you mean by "index" in this context. Certainly, the kind of operations you describe
Index is something that provided direct (constant-time) access somewhere within selected order.
can be performed in Boost.MultiIndex by means of so-called "compatible keys" as described in http://tinyurl.com/rouet,
I am not sure how this is related.
so that, for instance, given an ordered index sorted by (last name, first name) one can jump to the records beginning with some given last name. Is this what you're referring to?
Essencially. The same way as book index allows to jump to proper page based on keyword. Could you show some code example? What I want is to iterate through all persons with last name "Smith". All operations should have contant complexity.
So, if your multi-index container is defined like this: typedef multi_index_container< address, indexed_by< ordered_non_unique< // (last_name,first_name) index composite_key< address, member<address,std::string,&address::last_name>, member<address,std::string,&address::first_name> > > ... // other indices
address_book_t;
address_book_t b; then you can write the following: void list(const std::string& last_name) { std::pair<address_book_t::iterator,address_book_t::iterator> p= b.equal_range(last_name); std::copy( p.first,p.second, std::ostream_iterator<address>(std::cout)); } You require that all ops are constant complexity, which is not entirely fulfilled by my example, since equal_range is O(log n). On the other hand, I cannot see why your indexing concept must guarantee constant complexity and not only fast lookup.
If so, the only thing I can say is I never thought of these operations as directly related to an index concept. If you could elaborate on your discussion maybe I'll be seeing your point more clearly.
Thanks for discussing your concerns, I hope at least I've shed some light on them.
IMO we need another level of indirection that would support Index within an Order concept. I do not see how existing design could handle this.
If the need for the kind of bookmarking (allow me to use this term so as to not mix the different meanings we're giving to the word "index") you describe is sufficiently general, such a facility can be built separately from B.MI itself (like a standalone view-like library) or integrated into B.MI as an index decorator: typedef multi_index_container< element, indexed_by< bookmarked< ordered_unique<...> > ...
So, the bookmarked<...> construct can embed any preexistent index ("order" in your terminology) type and supplement it with bookmarking facilities, provided we had a precise definition of what bookmarking boils down to. This kind of stuff can be done within the current design, but then again I yet fail to see how what you describe is a general need not sufficiently covered by current facilities, like for instance compatible keys in ordered indices --clarification here most welcome. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

"JOAQUIN LOPEZ MU?Z" wrote:
Anyway, names were already chosen, published, and pretty much cast in stone when the library was first released in 2004. Back then, this conversation could have had more practical implications, right now I can only concede/deny your points, but no name change is likely to occur.
A page named "Glossary" could be added. The page may provide description of context, rationale for choosen terminology, translation of terms from different libraries etc. Someting similar exists in Statechart library. /Pavel

Essencially. The same way as book index allows to jump to proper page based on keyword. Could you show some code example? What I want is to iterate through all persons with last name "Smith". All operations should have contant complexity.
So, if your multi-index container is defined like this:
typedef multi_index_container< address, indexed_by< ordered_non_unique< // (last_name,first_name) index composite_key< address, member<address,std::string,&address::last_name>, member<address,std::string,&address::first_name> > > ... // other indices
address_book_t;
address_book_t b;
then you can write the following:
void list(const std::string& last_name) { std::pair<address_book_t::iterator,address_book_t::iterator> p= b.equal_range(last_name); std::copy( p.first,p.second, std::ostream_iterator<address>(std::cout)); }
You require that all ops are constant complexity, which is not entirely fulfilled by my example, since equal_range is O(log n). On the other hand, I cannot see why your indexing concept must guarantee constant complexity and not only fast lookup.
That's the primary reason I would need an Index. Aa Order may have have different complexity guarantees. But in some cases I need partucular search to be performed much faster then it is guaranted by order search complexity. Essentially I am looking for an ability to do something like this: address_book_t ab; typedef address_book_t::order<2>::type full_name_order; // this operation is compile time full_name_order fno = ab.get_order<FullName>(); // this operation has constant complexity, since Last name Index provide direct access std::pair<full_name_order::iterator,full_name_order::iterator> p = fno.get_index<LastName>::equal_range( "Smith" ); ... In some cases Index could have non-constant complexity, but still could be used to significantly speedup orderred collection search. Lets say I have Input messages log ordered sequencially (meaning that the order is just natuaral order messages where comming in). And I have int64 number of theses comming in every day. Now for this order I want to define Time index to speedup lookup for messages in partucilar time interval. This operation most probably wont be constant (due to continues nature of time) but still could be implemented much faster then global serach through messages collection. In addition for the same collection I may want to define Symbol (some attribute in a messages) order. And for that order I need first letter index. My primary point in above examples is that both Order and Index are first class citizents and current desing doesn't account for the latter.
IMO we need another level of indirection that would support Index within an Order concept. I do not see how existing design could handle this.
If the need for the kind of bookmarking (allow me to use this term so as to not mix the different meanings we're giving to the word "index") you describe is sufficiently general, such a facility can be built separately from B.MI itself (like a standalone view-like library) or integrated into B.MI as an index decorator:
typedef multi_index_container< element, indexed_by< bookmarked< ordered_unique<...> > ...
So, the bookmarked<...> construct can embed any preexistent index ("order" in your terminology) type and supplement it with bookmarking facilities, provided we had a precise definition of what bookmarking boils down to. This kind of stuff can be done within the current design, but then again I yet fail to see how what you describe is a general need not sufficiently covered by current facilities, like for instance compatible keys in ordered indices --clarification here most welcome.
I am not really crazy about the term Bookmark. IMO Order/Index is much better reflect the nature of the concepts we are trying to model. But above is step in right direction IMO. Gennadiy

Hello Gennadiy, I'm returning to this discussion. Excuse the big delay. By the end of your post, you wrote:
I am not really crazy about the term Bookmark. IMO Order/Index is much better reflect the nature of the concepts we are trying to model. But above is step in right direction IMO.
Well, let me use Index/Bookmark if only during this discussion so as to not introduce further confusion, since I think you agree with me it is concepts we're most interested now, rather than words. I'm not in love with "bookmark" either, it is just a hack word I came by with some resonance to the concept it is being used to refer to here. Now, returning to the original flow of your post: Gennadiy Rozental <gennadiy.rozental <at> thomson.com> writes:
That's the primary reason I would need an Index. Aa Order may have have different complexity guarantees. But in some cases I need partucular search to be performed much faster then it is guaranted by order search complexity. Essentially I am looking for an ability to do something like this:
address_book_t ab; typedef address_book_t::order<2>::type full_name_order;
// this operation is compile time full_name_order fno = ab.get_order<FullName>();
// this operation has constant complexity, since Last name Index provide direct access std::pair<full_name_order::iterator,full_name_order::iterator> p = fno.get_index<LastName>::equal_range( "Smith" );
...
In some cases Index could have non-constant complexity, but still could be used to significantly speedup orderred collection search. Lets say I have Input messages log ordered sequencially (meaning that the order is just natuaral order messages where comming in). And I have int64 number of theses comming in every day. Now for this order I want to define Time index to speedup lookup for messages in partucilar time interval. This operation most probably wont be constant (due to continues nature of time) but still could be implemented much faster then global serach through messages collection. In addition for the same collection I may want to define Symbol (some attribute in a messages) order. And for that order I need first letter index. My primary point in above examples is that both Order and Index are first class citizents and current desing doesn't account for the latter. [...]
IMO we need another level of indirection that would support Index within an Order concept. I do not see how existing design could handle this.
Well, if we concentrate now on the kind of data structures we'd need in order to achieve what you request, I come up with the following conclusions: 1. If the index to be sped up is of the ordered kind, then the only way to have faster lookups for clusters of elements (which, according to my terminology, are groups of elements equivalent under some sorting criterion compatible to that used by the index, see http://tinyurl.com/zm6h3), we'd probably have to resort to hashing (which improves rb-tree O(log n) lookup) and have some inner structures like hash_map<key_type,oredered_index_type::iterator> storing pointers to the first element of each cluster. This has to be properly synchronized with the ordered index on insertion/deletion. 2. If the index to be sped up is of the sequenced kind, we've got a problem: you require that the bookmark refer to clusters by using iterators from the index itself, as implied in your example: std::pair<full_name_order::iterator,full_name_order::iterator> p = fno.get_index<LastName>::equal_range( "Smith" ); and explicitly stated by you in a previous post: "All indexes for the same order present the same iterator which is a view iterator." This is AFAICS impossible if the index is sequenced, since the elements can be freely rearranged and thus a cluster need not be adjacent (from the point of view of the sequenced index iterator). If instead of returning sequenced index iterators we allow the bookmark to use its own iterators, which could then properly traverse clusters in the desired order, we have nothing more nor less than an additional B.MI index!! You can have it ordered or hashed according to your taste. So, my conclusion in this case is that all bookmarking needs for sequenced indices are fully covered by additional indices. This argument applies unchanged to random access indices as well. 3. If we want to bookmark a hashed index, I really don't know how to have faster lookup of clusters than the hashed index itself already provides. As hashed indices do not show any definite traversal order, clustering can only be achieved by an additional hashed index approppiately defined. My hunch is that your request is primarily targeted at ordered indices, anyway. So, I conclude that bookmarking ("indexing" in your terminology) is either already covered by B.MI or irrelevant except in case #1, that of ordered indices ("orders" in your terminology), which can be sped up by using an internal hash table of pointers to the clusters's first elements. I'd love to know whether you agree with this analysis or, on the contrary, I'm missing your point. As for bookmarking ordered indices, I find the idea of some value for massive amounts of data, though I'm by no means certain that something like that should go into B.MI. But certainly I can write some prototype of the idea and let you play with it, in case you're interested. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

"Joaquin M Lopez Munoz" <joaquin@tid.es> wrote in message news:loom.20060420T234946-849@post.gmane.org...
Hello Gennadiy, I'm returning to this discussion. Excuse the big delay.
By the end of your post, you wrote:
I am not really crazy about the term Bookmark. IMO Order/Index is much better reflect the nature of the concepts we are trying to model. But above is step in right direction IMO.
Well, let me use Index/Bookmark if only during this discussion so as to not introduce further confusion, since I think you agree with me it is concepts we're most interested now, rather than words. I'm not in love with "bookmark" either, it is just a hack word I came by with some resonance to the concept it is being used to refer to here.
Now, returning to the original flow of your post:
Gennadiy Rozental <gennadiy.rozental <at> thomson.com> writes:
That's the primary reason I would need an Index. Aa Order may have have different complexity guarantees. But in some cases I need partucular search to be performed much faster then it is guaranted by order search complexity. Essentially I am looking for an ability to do something like this:
address_book_t ab; typedef address_book_t::order<2>::type full_name_order;
// this operation is compile time full_name_order fno = ab.get_order<FullName>();
// this operation has constant complexity, since Last name Index provide direct access std::pair<full_name_order::iterator,full_name_order::iterator> p = fno.get_index<LastName>::equal_range( "Smith" );
...
In some cases Index could have non-constant complexity, but still could be used to significantly speedup orderred collection search. Lets say I have Input messages log ordered sequencially (meaning that the order is just natuaral order messages where comming in). And I have int64 number of theses comming in every day. Now for this order I want to define Time index to speedup lookup for messages in partucilar time interval. This operation most probably wont be constant (due to continues nature of time) but still could be implemented much faster then global serach through messages collection. In addition for the same collection I may want to define Symbol (some attribute in a messages) order. And for that order I need first letter index. My primary point in above examples is that both Order and Index are first class citizents and current desing doesn't account for the latter. [...]
IMO we need another level of indirection that would support Index within an Order concept. I do not see how existing design could handle this.
Well, if we concentrate now on the kind of data structures we'd need in order to achieve what you request, I come up with the following conclusions:
1. If the index to be sped up is of the ordered kind, then the only way to have faster lookups for clusters of elements (which, according to my terminology, are groups of elements equivalent under some sorting criterion compatible to that used by the index, see http://tinyurl.com/zm6h3), we'd probably have to resort to hashing (which improves rb-tree O(log n) lookup) and have some inner structures like hash_map<key_type,oredered_index_type::iterator>
storing pointers to the first element of each cluster. This has to be properly synchronized with the ordered index on insertion/deletion.
It could be hash, it could be random access or it could be any other custom faster lookup. This is essencially is just a way to jump particulat position in the Order based on some conditions. I may have records ordered by creation time for last week and index that allows me to jump to start of todays and yestadays entries.
2. If the index to be sped up is of the sequenced kind, we've got a problem: you require that the bookmark refer to clusters by using iterators from the index itself, as implied in your example:
std::pair<full_name_order::iterator,full_name_order::iterator> p = fno.get_index<LastName>::equal_range( "Smith" );
and explicitly stated by you in a previous post:
"All indexes for the same order present the same iterator which is a view iterator."
This is AFAICS impossible if the index is sequenced,
I may misunderstand what sequenced order is. I understand it as just a "natural" order entries where added into collection with. Why couldn't I have an Index that would allow me to jump directly to the every 10000th entry?
since the elements can be freely rearranged and thus a cluster need not be adjacent (from the point of view of the sequenced index iterator).
What do you mean by rearanged and cluster here? Gennadiy
participants (4)
-
Gennadiy Rozental
-
JOAQUIN LOPEZ MU?Z
-
Joaquin M Lopez Munoz
-
Pavel Vozenilek