[Multi-Index] performance difference between lower_bound() and upper_bound()
data:image/s3,"s3://crabby-images/c749d/c749d05e54c964f75131343fde6028537575e771" alt=""
I have a multi-index BMI (Boost 1.44), with an ordered_non_unique index on a"name" string field, which is lexicographically ordered. I'm searching for the range of values of the form "$prefix $integer", i.e. "foo 1", "foo 99", etc... to obtain this "range", I initially used (pseudo-code): auto begin = by_name_index().lower_bound(prefix + " 0"); auto end = by_name_index().upper_bound(prefix + " 9"); and within the range I test the suffix for its number'ness and keep the largest number. But I was surprised to see it performed badly for large number of calls. I switched to using instead: auto begin = by_name_index().lower_bound(prefix + " 0"); auto end = by_name_index().lower_bound(prefix + " :"); And this performs much better (20x). The "corpus" of names of this particular test may be skewing the results, since entries are added to the BMI with monotonically increasing number suffixes, and all entries have the prefix I'm looking for, yet I'm very surprised of such a large performance difference between the lower/upper version, and the lower/lower version. Anyone has a clue what could be going on? Thanks, --DD PS: I'm assuming the two versions are equivalent. Let me know if I'm mistaken please. This is not yet thoroughly tested.
data:image/s3,"s3://crabby-images/d15a8/d15a849e756d614839063b3d7e2d9dd31858352b" alt=""
Dominique Devienne
I have a multi-index BMI (Boost 1.44), with an ordered_non_unique index on a"name" string field, which is lexicographically ordered. I'm searching for the range of values of the form "$prefix $integer", i.e. "foo 1", "foo 99", etc...
to obtain this "range", I initially used (pseudo-code):
auto begin = by_name_index().lower_bound(prefix + " 0"); auto end = by_name_index().upper_bound(prefix + " 9");
and within the range I test the suffix for its number'ness and keep the largest number.
But I was surprised to see it performed badly for large number of calls.
I switched to using instead:
auto begin = by_name_index().lower_bound(prefix + " 0"); auto end = by_name_index().lower_bound(prefix + " :");
And this performs much better (20x).
There is no reason why it should, if both ops are returning the sme range. Could you please show the customcomparison predicate you're using? This might shed some light on the issue. Thank you, Joaquín M López Muñoz Telefónica Digital
data:image/s3,"s3://crabby-images/c749d/c749d05e54c964f75131343fde6028537575e771" alt=""
On Thu, Nov 21, 2013 at 5:38 PM, Joaquin M Lopez Munoz
Dominique Devienne
writes: [...] to obtain this "range", I initially used (pseudo-code):
auto begin = by_name_index().lower_bound(prefix + " 0"); auto end = by_name_index().upper_bound(prefix + " 9");
[...] I switched to using instead:
auto begin = by_name_index().lower_bound(prefix + " 0"); auto end = by_name_index().lower_bound(prefix + " :");
And this performs much better (20x).
There is no reason why it should, if both ops are returning the same range. Could you please show the custom comparison predicate you're using? This might shed some light on the issue.
Thanks Joaquín. Unfortunately, it's not easy to share the actual BMI definition, which uses intermediary templates for the index definitions (for reuse in other BMIs), a custom extractor of the name to work-around VC "thunk" warnings (since name() is a pure virtual method), and the string-type is a legacy custom one. I do not specify a custom comparison predicate to the BMI for name(), so it relies on std::less and calls the custom op< of our string type, which calls strncmp after some null checks. So it's a strict "ascii" lexicographical order, no locale used, and I'm confident it's ordering things correctly (it's been in place for months/years). Thanks for confirming it should not matter. I'll try with different sets of names in the index (the one I'm using now is very special), and I will confirm the same ranges are returned. --DD
data:image/s3,"s3://crabby-images/d15a8/d15a849e756d614839063b3d7e2d9dd31858352b" alt=""
Dominique Devienne
On Thu, Nov 21, 2013 at 5:38 PM, Joaquin M Lopez Munoz
tid.es> wrote:
Dominique Devienne
writes: [...] to obtain this "range", I initially used (pseudo-code):
auto begin = by_name_index().lower_bound(prefix + " 0"); auto end = by_name_index().upper_bound(prefix + " 9");>> [...] I
switched to using instead:>
auto begin = by_name_index().lower_bound(prefix + " 0"); auto end = by_name_index().lower_bound(prefix + " :");> And this performs much better (20x). There is no reason why it should, if both ops are returning the same range. Could you please show the custom comparison predicate you're using? This might shed some light on the issue.
Thanks Joaquín.
[...]
I do not specify a custom comparison predicate to the BMI for name(), so it relies on std::less and calls the custom op< of our string type, which calls strncmp after some null checks. So it's a strict "ascii" lexicographical order, no locale used, and I'm confident it's ordering things correctly (it's been in place for months/years).
OK, this gives extra info: I thought you were using a custom comparator
to test for prefix-equality --now I see you aren't, but instead you
just rely on plain regular string comparison.
So,consider
typedef multi_index_container<
std::string,
indexed_by<
ordered_non_unique
multi_t;
multi_t c={ "a 0","a 000","a 1","a 9","a 90","a :" }; The following auto first=c.lower_bound("a 0"); auto last=c.upper_bound("a 9"); while(first!=last)std::cout<<*first++<<", "; produces the output a 0, a 000, a 1, a 9 while this auto first=c.lower_bound("a 0"); auto last=c.lower_bound("a :"); while(first!=last)std::cout<<*first++<<", "; produces a 0, a 000, a 1, a 9, a 90, See the difference? Is this related to your problem? Joaquín M López Muñoz Telefónica Digital
data:image/s3,"s3://crabby-images/c749d/c749d05e54c964f75131343fde6028537575e771" alt=""
On Fri, Nov 22, 2013 at 9:21 PM, Joaquin M Lopez Munoz
[...] The following
auto first=c.lower_bound("a 0"); auto last=c.upper_bound("a 9"); while(first!=last)std::cout<<*first++<<", ";
produces the output
a 0, a 000, a 1, a 9
while this
auto first=c.lower_bound("a 0"); auto last=c.lower_bound("a :"); while(first!=last)std::cout<<*first++<<", ";
produces
a 0, a 000, a 1, a 9, a 90,
See the difference? Is this related to your problem?
Indeed it is. You're right on the money, thank you Joaquín! By not taking the full-range into account, I was not really getting the "max suffix", so when I was trying to insert a new entry with name "$prefix $max_suffix_int", I was again getting a conflict, which triggered another loop necessary for a different conflict resolution scheme (the new one is supposed to find a non-conflicting name in a single pass, at the cost of scanning the range of possible conflicts to find the max_suffix_int of really conflicting entries - some entries have the same name, butother keys make them non-conflicting). I also tested that both lower_bound(prefix + " :") and upper_bound(prefix + " 9999999999") (I use an int32_t suffix) give me the same upper range iterator, with similar performance. So it was all in my buggy code (well, more that I can't properly sort lexicographically in my head...), and BMI is behaving as expected (as I suspected all along). Thank you again for helping me get to the bottom of this. --DD
data:image/s3,"s3://crabby-images/d15a8/d15a849e756d614839063b3d7e2d9dd31858352b" alt=""
El 28/11/2013 13:55, Dominique Devienne escribió: By not taking the full-range into account, I was not really getting the "max suffix", so when I was trying to insert a new entry with name "$prefix $max_suffix_int", I was again getting a conflict, which triggered another loop necessary for a different conflict resolution scheme (the new one is supposed to find a non-conflicting name in a single pass, at the cost of scanning the range of possible conflicts to find the max_suffix_int of really conflicting entries - some entries have the same name, butother keys make them non-conflicting). I also tested that both lower_bound(prefix + " :") and upper_bound(prefix + " 9999999999") (I use an int32_t suffix) give me the same upper range iterator, with similar performance. Watch out: depending on how your strings look like, these two bounds are not equivalent. For instance: "p 9999999999" < "p 9999999999 foo" < "p :" I understand you're already taking this into account, but wanted to point it out just in case. Best, Joaquín M López Muñoz Telefónica Digital ________________________________ Este mensaje se dirige exclusivamente a su destinatario. Puede consultar nuestra política de envío y recepción de correo electrónico en el enlace situado más abajo. This message is intended exclusively for its addressee. We only send and receive email on the basis of the terms set out at: http://www.tid.es/ES/PAGINAS/disclaimer.aspx
data:image/s3,"s3://crabby-images/c749d/c749d05e54c964f75131343fde6028537575e771" alt=""
On Fri, Nov 29, 2013 at 10:34 AM, Joaquín Mª López Muñoz
El 28/11/2013 13:55, Dominique Devienne escribió: Watch out: depending on how your strings look like, these two bounds are not equivalent. For instance:
"p 9999999999" < "p 9999999999 foo" < "p :"
I understand you're already taking this into account, but wanted to point it out just in case. Best,
Thanks. Useful reminder. I'm checking the end pointer returned by strtol against the end of the string, so I discard entries which are not strictly prefix + space + integer. --DD
participants (3)
-
Dominique Devienne
-
Joaquin M Lopez Munoz
-
Joaquín Mª López Muñoz