[multi-index] retrieve unique keys of a non-unique index

Hi everyone (my apologies if this shows as a report), In the following example is there a more efficient way to retrieve the unique keys of a non-unique index? #include <boost/multi_index_container.hpp> #include <boost/multi_index/member.hpp> #include <boost/multi_index/hashed_index.hpp> #include <boost/tuple/tuple.hpp> #include <algorithm> #include <iostream> #include <iterator> #include <string> using boost::multi_index_container; using namespace boost::multi_index; struct name {}; struct value {}; struct guid {}; struct TaggedNodeEntry { int name; int value; std::string guid; TaggedNodeEntry(int name_, int value_, std::string guid_) : name(name_), value(value_), guid(guid_) {} friend std::ostream& operator<<(std::ostream& os, const TaggedNodeEntry& e) { os << e.name << " " << e.value << " " << e.guid; return os; } }; typedef multi_index_container< TaggedNodeEntry, indexed_by< hashed_non_unique< tag<name>, BOOST_MULTI_INDEX_MEMBER(TaggedNodeEntry, int, name) >, hashed_non_unique< tag<value>, BOOST_MULTI_INDEX_MEMBER(TaggedNodeEntry, int, value) >, hashed_non_unique< tag<guid>, BOOST_MULTI_INDEX_MEMBER(TaggedNodeEntry, std::string, guid) >
TaggedNodes;
int main() { TaggedNodes taggedNodes; taggedNodes.insert(TaggedNodeEntry(1, 1, "one-one")); taggedNodes.insert(TaggedNodeEntry(1, 2, "one-two")); taggedNodes.insert(TaggedNodeEntry(1, 3, "one-three")); taggedNodes.insert(TaggedNodeEntry(2, 1, "two-one")); taggedNodes.insert(TaggedNodeEntry(2, 2, "two-two")); taggedNodes.insert(TaggedNodeEntry(3, 6, "three-six")); std::cout << std::endl << "Unique names :-" << std::endl; boost::multi_index::index<TaggedNodes, name>::type::iterator ic0 = get<name>(taggedNodes).begin(); boost::multi_index::index<TaggedNodes, name>::type::iterator ic1 = get<name>(taggedNodes).end(); while (ic0 != ic1) { std::cout << ic0->name << std::endl; ic0 = get<name>(taggedNodes).equal_range(ic0->name).second; } return 0; }

On Thu, Aug 1, 2013 at 10:47 PM, Rory McCarthy < rorymccarthy77@googlemail.com> wrote:
Why not simply switch from hashed_non_unique to ordered_non_unique? That way you can full-scan that ordered index, and check yourself when the key changes, since you are guaranteed the entries will be clustered by the key, w/o resorting to equal_range() like for the hashed container. You lose a bit of speed for arbitrary lookups compared to the hashed index (not much I suspect), but should gain on getting the unique keys IMHO. --DD

El 01/08/2013 22:47, Rory McCarthy escribió: Hi everyone (my apologies if this shows as a report), In the following example is there a more efficient way to retrieve the unique keys of a non-unique index? [...] boost::multi_index::index<TaggedNodes, name>::type::iterator ic0 = get<name>(taggedNodes).begin(); boost::multi_index::index<TaggedNodes, name>::type::iterator ic1 = get<name>(taggedNodes).end(); while (ic0 != ic1) { std::cout << ic0->name << std::endl; ic0 = get<name>(taggedNodes).equal_range(ic0->name).second; } In big-O terms, this is as good as you can get with hashed indices. A slightly more efficient approach (saving some unnecessary lookup) would be: void dump_entry_name(const TaggedNodeEntry& x) { std::cout<<x.name<<std::endl; } bool equal_entry_name(const TaggedNodeEntry& x,const TaggedNodeEntry& y) { return x.name==y.name; } ... std::unique_copy( get<name>(taggedNodes).begin(),get<name>(taggedNodes).end(), boost::make_function_output_iterator(&dump_entry_name), equal_entry_name); (which gets nicer if you can use lambda functions instead.) This is O(n), n being the number of total elements. If you use ordered indices, then appropriate use of upper_bound to skip a range of equivalent elements (much as you're doing with equal_range) would yield O(m*log(n)), where m is the number of *different* elements: this is better than the original when m << n. Joaquín M López Muñoz Telefónica Digital ________________________________ Este mensaje se dirige exclusivamente a su destinatario. Puede consultar nuestra política de envío y recepción de correo electrónico en el enlace situado más abajo. This message is intended exclusively for its addressee. We only send and receive email on the basis of the terms set out at: http://www.tid.es/ES/PAGINAS/disclaimer.aspx

On Fri, Aug 2, 2013 at 9:36 AM, Joaquín Mª López Muñoz <joaquin@tid.es>wrote:
Ah yes, entries are clustered by keys during iteration whether ordered or hashed, I now realize. Thanks. Here's my STL-challenged version using BOOST_FOREACH. Less generic of course, but hopefully not buggy. --DD if (!get<name>(taggedNodes).empty()) { int curr_key = get<name>(taggedNodes).begin()->name; std::cout << curr_key << std::endl; BOOST_FOREACH (const TaggedNodeEntry& x, get<name>(taggedNodes)) { if (x.name != curr_key) { curr_key = x.name; std::cout << curr_key << std::endl; } } }

Thanks for the suggestions. I'm expecting that there will be relatively few unique name keys, so using ordered_non_unique with O(m*log(n)) complexity will definitely be more efficient, On Fri, Aug 2, 2013 at 1:05 AM, Dominique Devienne <ddevienne@gmail.com>wrote:
participants (3)
-
Dominique Devienne
-
Joaquín Mª López Muñoz
-
Rory McCarthy