Turning Boost.Multi-Index ordered_non_unique index into ordered_unique one, with custom comparator
I would like to support a slightly unusual use-case. We have a class with a integer member, and a BMI container for instances of this class. That member is supposed to be unique, *except* for a special "invalid" value (-1), which several instances can use. Right now, we use an ordered_non_unique index, because of the -1's, but that leaves the door open for other duplicate "valid" values, which should be forbidden. So my question is whether one can index this member uniquely, thanks to a custom comparator, such that the comparator does not violate the usual strict-weak-ordering requirement of ordered containers? I'm thinking kinda like SQL NULLs, which don't compare equal to each other, i.e. in this case -1 would never equal another -1, and use another arbitrary (but stable) ordering specifically for those -1's. This assumes though I can know which instance the int member is part of, i.e. will the comparator get the integer by value or reference, and if a reference, is that a reference to the "internal" element, such that the arbitrary order can be the address of the element for example? Or to avoid element movement in case of reallocs in the container, can the comparator access an iterator to project it into another random-access index of the same container, to use that index's integer-index as the arbitrary order? I'd appreciate some feedback/insights into the above. Thanks, --DD
AMDG On 03/02/2016 03:12 AM, Dominique Devienne wrote:
I would like to support a slightly unusual use-case.
We have a class with a integer member, and a BMI container for instances of this class.
That member is supposed to be unique, *except* for a special "invalid" value (-1), which several instances can use.
Right now, we use an ordered_non_unique index, because of the -1's, but that leaves the door open for other duplicate "valid" values, which should be forbidden.
So my question is whether one can index this member uniquely, thanks to a custom comparator, such that the comparator does not violate the usual strict-weak-ordering requirement of ordered containers? I'm thinking kinda like SQL NULLs, which don't compare equal to each other, i.e. in this case -1 would never equal another -1, and use another arbitrary (but stable) ordering specifically for those -1's. This assumes though I can know which instance the int member is part of, i.e. will the comparator get the integer
You could use the identity key extractor so that the comparator sees the whole object.
by value or reference, and if a reference, is that a reference to the "internal" element, such that the arbitrary order can be the address of the element for example?
You can't use the address, because copies need to be equivalent. You might be able to get away with it if you make the elements non-copyable and only insert them with emplace.
Or to avoid element movement in case of reallocs in the container, can the comparator access an iterator to project it into another random-access index of the same container, to use that index's integer-index as the arbitrary order?
The comparator needs to work on objects that are not in the container. Otherwise, you would have problems with insertion and lookup.
I'd appreciate some feedback/insights into the above. Thanks, --DD
In Christ, Steven Watanabe
Dominique Devienne <ddevienne <at> gmail.com> writes:
I would like to support a slightly unusual use-case. We have a class with a integer member, and a BMI container for instances of this class.
That member is supposed to be unique, *except* for a special "invalid" value (-1), which several instances can use.
Right now, we use an ordered_non_unique index, because of the -1's, but that leaves the door open for other duplicate "valid" values, which should be forbidden.
So my question is whether one can index this member uniquely, thanks to a custom comparator, such that the comparator does not violate the usual strict-weak-ordering requirement of ordered containers?
Steven's answer is, as always, correct, and we can leave it at that. But, if you want to venture into uncharted lands... Suppose we use the following: template<typename T,T puncture> struct punctured_less { bool operator()(const T& x,const T& y)const { if(x==puncture&&y==puncture)return true; return x<y; } }; using set=boost::multi_index_container< int, indexed_by< ordered_unique<identity<int>,punctured_less<int,-1>> >
;
Fist thing to notice is: punctured_less<int,-1> is *not* a strict weak ordering. For instance, it is not irreflexive as ::operator()(-1,-1)==true. But if we write: set s={-3,-2,-1,0,0,1,-1,2,3,-1,4,5,-1,6,7,-1,8,9,9,-1}; for(int x:s)std::cout<<x<<" "; the thing seemingly works and prints -3 -2 -1 -1 -1 -1 -1 -1 0 1 2 3 4 5 6 7 8 9 Note duplicates for 0 and 9 have been rejectet as they should. How come this is behaving so reasonably? The key insight is that insert(x) does as many internal comparisons as *strictly* necessary and not one more. Which means: no comparison is made if it can be inferred from the general properties of strict weak orderings (in particular, x is not compared against itself). Another interesting way to look at this fact is: insert(x) walks down the internal rb tree doing the minimum required number of comparisons (log(N) or the number of levels in the tree) and there is always a strict weak ordering compatible with whatever log(N) outputs we get from the Compare object (hey, even if Compare spits random booleans). In the case of punctured_less<int,-1> Boost.MultiIndex behaves as if each new -1 to be inserted is less that all other -1s already in the container. Insertion sort of works, lookup is more difficult to handle. You can verify that s.lower_bound(-1)==s.lower_bound(0) s.upper_bound(-1)==s.upper_bound(-2) that is, looking up for -1 behaves as if the -1s are not quite there, which is (in some sense) consistent with the fact that -1s never compare false to themselves. Note that [s.lower_bound(-1),s.upper_bound(-1)) is not a proper range (the end precedes the beginning). equal_range also behaves oddly: s.equal_range(-1)==std::make_pair(s.lower_bound(0),s.lower_bound(0) which is at least a valid range (but not the same as std::make_pair(s.lower_bound(-1),s.upper_bound(-1)) as guaranteed by the documentation). Lookup for values other than -1 behaves perfectly, as it is only when invoking operator()(-1,-1) that strict weak ordering properties are violated. If you're able to live in a state of undefined behavior and cope with the oddities described above, well, you might take advantage of this. Joaquín M López Muñoz
On Thu, Mar 3, 2016 at 10:21 AM, Joaquin M LópezMuñoz <joaquin@tid.es> wrote:
Dominique Devienne <ddevienne <at> gmail.com> writes:
I would like to support a slightly unusual use-case. [...]
Steven's answer is, as always, correct, and we can leave it at that. But, if you want to venture into uncharted lands...
it always is indeed. But also terse as well, also as usual :) I appreciate the hand-holding here Joaquin, I can definitely use it!
Suppose we use the following:
template<typename T,T puncture> struct punctured_less { bool operator()(const T& x,const T& y)const { if(x==puncture&&y==puncture)return true;
But can we do better? If like in any initial email, you take into account that the to-be-indexed int is part of a larger struct with other fields, which can be used to "break ties" between elements who's index match the "puncture"? This is why I asked about the comparator accessing the whole element struct.
return x<y; } };
using set=boost::multi_index_container< int, indexed_by< ordered_unique<identity<int>,punctured_less<int,-1>> >
;
Stated differently, this is closer to our actual use-case: struct entry { int uid; // real uuid in reality, but enough for our needs here int occurence; // the field to be indexed, uniquely "except for -1" }; struct by_occurence{}; using bmi = boost::multi_index_container< entry, indexed_by< random_access<>, ordered_unique< tag<by_occurence>, identity<entry>, magical_occurence_less<entry, -1> >
;
except right now, we have, ordered_non_unique< tag<by_occurence>, member<entry,int,&entry::occurence> >
First thing to notice is: punctured_less<int,-1> is *not* a strict weak ordering.
Yes. And that's why I'm asking whether "other" fields can "make it better" somehow. I "think" lookups would be resolved then, but I'm unsure that's "sound". --DD
Dominique Devienne <ddevienne <at> gmail.com> writes:
On Thu, Mar 3, 2016 at 10:21 AM, Joaquin M LópezMuñoz <joaquin <at>
Suppose we use the following: template<typename T,T puncture> struct punctured_less { bool operator()(const T& x,const T& y)const { if(x==puncture&&y==puncture)return true;
But can we do better? If like in any initial email, you take into account that the to-be-indexed int is part of a larger struct with other fields, which can be used to "break ties" between elements who's index match the "puncture"?
[...]
struct entry { int uid; // real uuid in reality, but enough for our needs here int occurence; // the field to be indexed, uniquely "except for -1" };
OK, yes, we can do something with that. What you want is unique sorting by occurrence, except when occurrence==-1 in which case you want to (uniquely) sort lexicographically on uid. This *is* indeed a SWO, so we can do: struct entry { int uid; int occurrence; }; struct entry_occurrence_less { bool operator()(const entry& x,const entry& y)const { if(x.occurrence==-1&&y.occurrence==-1)return x.uid<y.uid; return x.occurrence<y.occurrence; } bool operator()(const entry& x,int y)const { if(x.occurrence==-1&&y==-1)return false; return x.occurrence<y; } bool operator()(int x,const entry& y)const { if(x==-1&&y.occurrence==-1)return false; return x<y.occurrence; } }; using set=boost::multi_index_container< entry, indexed_by< ordered_unique<identity<entry>,entry_occurrence_less> > >; The following set s={ {386,-3},{29387,-2},{2323,-1},{231,0},{394857,0}, {123,1},{38274,-1},{4534,2},{234823,3},{8342,-1}, {5932,4},{12892,5},{928173,-1},{22332,6},{78172,7}, {8784734,-1},{43329,8},{545434,9},{10900,9},{33334,-1}}; for(const entry& x:s)std::cout<<x.occurrence<<" "; outputs -3 -2 -1 -1 -1 -1 -1 -1 0 1 2 3 4 5 6 7 8 9 as it should: duplicates for 0 an 9 are rejected, multiple -1s allowed. Thanks to the overloads of entry_occurrence_less::operator() we can lookup by occurrence as if we were using member<entry,int,&entry::occurence>: std::cout<<s.find(5)->uid<<"\n"; // prints 12892 and lookup on -1 also works as expected. For instance, the following for(auto r=s.equal_range(-1);r.first!=r.second;++r.first){ std::cout<<r.first->uid<<" "; } outputs 2323 8342 33334 38274 928173 8784734 I think this suits exactly your needs, please report back otherwise. Joaquín M López Muñoz
On Thu, Mar 3, 2016 at 2:06 PM, Joaquin M LópezMuñoz <joaquin@tid.es> wrote:
Dominique Devienne <ddevienne <at> gmail.com> writes:
On Thu, Mar 3, 2016 at 10:21 AM, Joaquin M LópezMuñoz <joaquin <at> But can we do better? If like in any initial email, you take into account that the to-be-indexed int is part of a larger struct with other fields, which can be used to "break ties" between elements who's index match the "puncture"?
[...]
struct entry { int uid; // real uuid in reality, but enough for our needs here int occurence; // the field to be indexed, uniquely "except for -1" };
OK, yes, we can do something with that. What you want is unique sorting by occurrence, except when occurrence==-1 in which case you want to (uniquely) sort lexicographically on uid. This *is* indeed a SWO, [...]
Thank you for confirming what I suspected. [...] Thanks to the overloads of entry_occurrence_less::operator()
we can lookup by occurrence as if we were using member<entry,int,&entry::occurence>:
This is the part I was missing, and struggling with. Seems simple, in hindsight :)
I think this suits exactly your needs, please report back otherwise.
Thank you so much for your help Joaquin. (and for BMI of course! My favorite container!!!) This is exactly what I was looking for. Encore merci, --DD
On Thu, Mar 3, 2016 at 2:12 PM, Dominique Devienne <ddevienne@gmail.com> wrote:
On Thu, Mar 3, 2016 at 2:06 PM, Joaquin M LópezMuñoz <joaquin@tid.es> wrote:
Dominique Devienne <ddevienne <at> gmail.com> writes:
[...] Thanks to the overloads of entry_occurrence_less::operator()
we can lookup by occurrence as if we were using
member<entry,int,&entry::occurence>:
This is the part I was missing, and struggling with.
Note that this CompatibleKey feature of BMI is just super handy, BTW. I've used it before, and even wrote a custom comparator with overloads, which means I should have seen your excellent solution myself. Really the "only" thing missing from this container are "runtime" (non-unique) indexes :). --DD
Dominique Devienne <ddevienne <at> gmail.com> writes:
Note that this CompatibleKey feature of BMI is just super handy, BTW.
I've used it before, and even wrote a custom comparator with overloads, which means I should have seen your excellent solution myself.
Toujours un plaisir d'aider les Boosteurs.
Really the "only" thing missing from this container are "runtime" (non-unique) indexes :).
This is an entirely different beast with worse performance and memory behavior than the static counterpart. I toyed with the idea many years ago but never put myself into doing it for real (industry level I mean) Joaquín M López Muñoz
participants (4)
-
Dominique Devienne
-
Joaquin M Lopez Munoz
-
Joaquin M López Muñoz
-
Steven Watanabe