[Multi-Index] New "hierarchical" indexing?

older
[Q] regular expressions change the...

Dominique Devienne

10 Jul 2009 10 Jul '09

6:29 p.m.

I'm using BMI containers with several indexes, notably an index on the parent of each entry. But some of my BMIs have a hierarchical parent-child relationship of their entries, and I need to answer hierarchical queries like is-child-of, is-descendant-of, is-ancestor-of, etc... In another use case, entries have a "scenario" key, where scenario is hierarchical, and I want all entries in BMI A from a given scenario and all its ancestors, with the scenarios being in BMI B (i.e. "cross-bmi"). Of course I can do a "full scan" but I'm looking for something hopefully faster/smarter. Has anyone experimented with providing such a specialized index? At least in one instance, I'm manually keeping a separate container with all the edges of the DAG to be able to answer such a query, but then I have to maintain this "index" separately from the BMI, which is precisely what BMI is designed to avoid. This is also a solution that is OK in this particular case (~ 1500 nodes, ~7500 edges), but may be too expensive for some other graphs with more nodes. Any input would be appreciated. Thanks, --DD

Show replies by date

JOAQUIN M. LOPEZ MUÑOZ

11 Jul 11 Jul

10:17 a.m.

________________________________________ De: boost-users-bounces@lists.boost.org [boost-users-bounces@lists.boost.org] En nombre de Dominique Devienne [ddevienne@gmail.com] Enviado el: viernes, 10 de julio de 2009 20:29 Para: boost-users Asunto: [Boost-users] [Multi-Index] New "hierarchical" indexing?

...

In another use case, entries have a "scenario" key, where scenario is hierarchical, and I want all entries in BMI A from a given scenario and all its ancestors, with the scenarios being in BMI B (i.e. "cross-bmi"). Of course I can do a "full scan" but I'm looking for something hopefully faster/smarter.

Not entirely sure if the following is directly applicable to your problem, but anyway. If one of the keys of an index has a hierarchical nature, one can do child-of queries by taking advantage of compatible sorting criteria. Let me introduce an example. Suppose we have entries having an associated scenario which we simply model by a path showing the hierarchical relations among them: typedef std::vector<std::string> path_t; typedef flyweight<path_t> scenario_t; struct entry { entry(const scenario_t& scenario):scenario(scenario){} scenario_t scenario; // payload }; Note that we're compressing scenarios with Boost.Flyweight. This lets us dispose of an auxiliary container to hold the scenarios, which is what you're likely doing now. Our container of entries indexed by scenario is typedef multi_index_container< entry, indexed_by< ordered_non_unique<member<entry,scenario_t,&entry::scenario> > >

...

entry_container_t;

Now we define: struct parent_t:scenario_t { parent_t(const scenario_t& s):scenario_t(s){} }; struct children { bool operator()(const parent_t& p,const scenario_t& s)const { return std::lexicographical_compare( p.get().begin(),p.get().end(), s.get().begin(),s.get().begin()+std::min(p.get().size(),s.get().size())); } bool operator()(const scenario_t& s,const parent_t& p)const { return std::lexicographical_compare( s.get().begin(),s.get().begin()+std::min(p.get().size(),s.get().size()), p.get().begin(),p.get().end()); } }; What is this? parent_t is just a sort of alias to scenario_t that allows us to differentiate between a "terminal" scenario and a scenario used for purposes of recursive lookup. children is a compatible (with the order of entry_container_t) compare predicate that levels off scenario paths in a way that, for instance, if we have parent "a" then "a/b", "a/d", "a/b/e" etc. are deemed equivalent to the parent. This allows us to locate and count children: // count children entries of scenario s c.count(parent_t(s),children()) Please find attached a complete example shwoing the technique. Is this of any help to your particular problem? Thanks for using Boost.MultiIndex. Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

Dominique Devienne

13 Jul 13 Jul

8:54 p.m.

On Sat, Jul 11, 2009 at 5:17 AM, JOAQUIN M. LOPEZ MUÑOZ<joaquin@tid.es> wrote:

...

...
In another use case, entries have a "scenario" key, where scenario is hierarchical, and I want all entries in BMI A from a given scenario and all its ancestors, with the scenarios being in BMI B (i.e. "cross-bmi"). Of course I can do a "full scan" but I'm looking for something hopefully faster/smarter.

Not entirely sure if the following is directly applicable to your problem, but anyway.

Thank you for taking the time to provide a full exemple. That's really nice!

...

typedef multi_index_container< entry, indexed_by< ordered_non_unique<member<entry,scenario_t,&entry::scenario> > > > entry_container_t;

I'm currently using hashed indices only. Does the technique of using a compatible key extend to them? (I suspect not in this case, but I thought I'd ask, just in case I'm misunderstanding.)

...

What is this? parent_t is just a sort of alias to scenario_t that allows us to differentiate between a "terminal" scenario and a scenario used for purposes of recursive lookup. children is a compatible (with the order of entry_container_t) compare predicate that levels off scenario paths in a way that, for instance, if we have parent "a" then "a/b", "a/d", "a/b/e" etc. are deemed equivalent to the parent.

I was thinking about such a Materialized Path impl, after reading http://www.dbazine.com/oracle/or-articles/tropashko4 recently. But I would not have been able to express it in an example as easily as you could. I appreciate the leg up.

...

Is this of any help to your particular problem?

I think it does. I'll need time to digest and adapt it, but it sounds like Materialized Path with an ordered index will avoid a full scan. Thanks again for your help. --DD

joaquin＠tid.es

14 Jul 14 Jul

6:18 a.m.

Dominique Devienne escribió:

...

I'm currently using hashed indices only. Does the technique of using a compatible key extend to them? (I suspect not in this case, but I thought I'd ask, just in case I'm misunderstanding.)

No, it doesn't extend to hashed indices, you have to stick with ordered ones. The crucial fact upon which the parent/children sorting criteria relies is that sorting on scenario_t induces a lexicographical order that lists subtrees together (if you think of paths as forming a directory tree, lexicographical order is equivalent to inorder traversal).

...

...
What is this? parent_t is just a sort of alias to scenario_t that allows us to differentiate between a "terminal" scenario and a scenario used for purposes of recursive lookup. children is a compatible (with the order of entry_container_t) compare predicate that levels off scenario paths in a way that, for instance, if we have parent "a" then "a/b", "a/d", "a/b/e" etc. are deemed equivalent to the parent.

I was thinking about such a Materialized Path impl, after reading http://www.dbazine.com/oracle/or-articles/tropashko4 recently. But I would not have been able to express it in an example as easily as you could. I appreciate the leg up.

I've only taken a cursory look at the reference you provide, but seems indeed that my proposal basically implements the materialized path idea.

...

...
Is this of any help to your particular problem?

I think it does. I'll need time to digest and adapt it, but it sounds like Materialized Path with an ordered index will avoid a full scan.

Good luck with your project, Joaquín M López Muñoz Telefónica, Investigación y Desarrollo

5884

Age (days ago)

5888

Last active (days ago)

List overview

Download

3 comments

3 participants

participants (3)

Dominique Devienne
JOAQUIN M. LOPEZ MUÑOZ
joaquin＠tid.es