[Multi-Index] performance difference between lower_bound() and upper_bound()

newer
Re: [Boost-users] [Thread] Using...

Dominique Devienne

21 Nov 2013 21 Nov '13

2:14 p.m.

I have a multi-index BMI (Boost 1.44), with an ordered_non_unique index on a"name" string field, which is lexicographically ordered. I'm searching for the range of values of the form "$prefix $integer", i.e. "foo 1", "foo 99", etc... to obtain this "range", I initially used (pseudo-code): auto begin = by_name_index().lower_bound(prefix + " 0"); auto end = by_name_index().upper_bound(prefix + " 9"); and within the range I test the suffix for its number'ness and keep the largest number. But I was surprised to see it performed badly for large number of calls. I switched to using instead: auto begin = by_name_index().lower_bound(prefix + " 0"); auto end = by_name_index().lower_bound(prefix + " :"); And this performs much better (20x). The "corpus" of names of this particular test may be skewing the results, since entries are added to the BMI with monotonically increasing number suffixes, and all entries have the prefix I'm looking for, yet I'm very surprised of such a large performance difference between the lower/upper version, and the lower/lower version. Anyone has a clue what could be going on? Thanks, --DD PS: I'm assuming the two versions are equivalent. Let me know if I'm mistaken please. This is not yet thoroughly tested.

Attachments:

attachment.html (text/html — 1.7 KB)

Show replies by date

Joaquin M Lopez Munoz

21 Nov 21 Nov

4:38 p.m.

New subject: [Boost-users] [Multi-Index] performance difference between lower_bound() and upper_bound()

Dominique Devienne <ddevienne <at> gmail.com> writes:

...

I have a multi-index BMI (Boost 1.44), with an ordered_non_unique index on a"name" string field, which is lexicographically ordered. I'm searching for the range of values of the form "$prefix $integer", i.e. "foo 1", "foo 99", etc...

to obtain this "range", I initially used (pseudo-code):

auto begin = by_name_index().lower_bound(prefix + " 0"); auto end = by_name_index().upper_bound(prefix + " 9");

and within the range I test the suffix for its number'ness and keep the largest number.

But I was surprised to see it performed badly for large number of calls.

I switched to using instead:

auto begin = by_name_index().lower_bound(prefix + " 0"); auto end = by_name_index().lower_bound(prefix + " :");

And this performs much better (20x).

There is no reason why it should, if both ops are returning the sme range. Could you please show the customcomparison predicate you're using? This might shed some light on the issue. Thank you, Joaquín M López Muñoz Telefónica Digital

Dominique Devienne

22 Nov 22 Nov

9:03 a.m.

New subject: [Multi-Index] performance difference between lower_bound() and upper_bound()

On Thu, Nov 21, 2013 at 5:38 PM, Joaquin M Lopez Munoz <joaquin@tid.es>wrote:

...

Dominique Devienne <ddevienne <at> gmail.com> writes:

...
[...] to obtain this "range", I initially used (pseudo-code):

auto begin = by_name_index().lower_bound(prefix + " 0"); auto end = by_name_index().upper_bound(prefix + " 9");

[...] I switched to using instead:

auto begin = by_name_index().lower_bound(prefix + " 0"); auto end = by_name_index().lower_bound(prefix + " :");

And this performs much better (20x).

There is no reason why it should, if both ops are returning the same range. Could you please show the custom comparison predicate you're using? This might shed some light on the issue.

Thanks Joaquín. Unfortunately, it's not easy to share the actual BMI definition, which uses intermediary templates for the index definitions (for reuse in other BMIs), a custom extractor of the name to work-around VC "thunk" warnings (since name() is a pure virtual method), and the string-type is a legacy custom one. I do not specify a custom comparison predicate to the BMI for name(), so it relies on std::less and calls the custom op< of our string type, which calls strncmp after some null checks. So it's a strict "ascii" lexicographical order, no locale used, and I'm confident it's ordering things correctly (it's been in place for months/years). Thanks for confirming it should not matter. I'll try with different sets of names in the index (the one I'm using now is very special), and I will confirm the same ranges are returned. --DD

Joaquin M Lopez Munoz

8:21 p.m.

New subject: [Boost-users] [Multi-Index] performance difference between lower_bound() and upper_bound()

Dominique Devienne <ddevienne <at> gmail.com> writes:

...

On Thu, Nov 21, 2013 at 5:38 PM, Joaquin M Lopez Munoz <joaquin <at>

tid.es> wrote:

...

Dominique Devienne <ddevienne <at> gmail.com> writes:

...
[...] to obtain this "range", I initially used (pseudo-code):

...
auto begin = by_name_index().lower_bound(prefix + " 0"); auto end = by_name_index().upper_bound(prefix + " 9");>> [...] I

switched to using instead:>

...
auto begin = by_name_index().lower_bound(prefix + " 0"); auto end = by_name_index().lower_bound(prefix + " :");> And this performs much better (20x). There is no reason why it should, if both ops are returning the same range. Could you please show the custom comparison predicate you're using? This might shed some light on the issue.

Thanks Joaquín.

[...]

I do not specify a custom comparison predicate to the BMI for name(), so it relies on std::less and calls the custom op< of our string type, which calls strncmp after some null checks. So it's a strict "ascii" lexicographical order, no locale used, and I'm confident it's ordering things correctly (it's been in place for months/years).

OK, this gives extra info: I thought you were using a custom comparator to test for prefix-equality --now I see you aren't, but instead you just rely on plain regular string comparison. So,consider typedef multi_index_container< std::string, indexed_by< ordered_non_unique<identity<std::string>> >

...

multi_t;

multi_t c={ "a 0","a 000","a 1","a 9","a 90","a :" }; The following auto first=c.lower_bound("a 0"); auto last=c.upper_bound("a 9"); while(first!=last)std::cout<<*first++<<", "; produces the output a 0, a 000, a 1, a 9 while this auto first=c.lower_bound("a 0"); auto last=c.lower_bound("a :"); while(first!=last)std::cout<<*first++<<", "; produces a 0, a 000, a 1, a 9, a 90, See the difference? Is this related to your problem? Joaquín M López Muñoz Telefónica Digital

Dominique Devienne

28 Nov 28 Nov

12:55 p.m.

New subject: [Multi-Index] performance difference between lower_bound() and upper_bound()

On Fri, Nov 22, 2013 at 9:21 PM, Joaquin M Lopez Munoz <joaquin@tid.es>wrote:

...

[...] The following

auto first=c.lower_bound("a 0"); auto last=c.upper_bound("a 9"); while(first!=last)std::cout<<*first++<<", ";

produces the output

a 0, a 000, a 1, a 9

while this

auto first=c.lower_bound("a 0"); auto last=c.lower_bound("a :"); while(first!=last)std::cout<<*first++<<", ";

produces

a 0, a 000, a 1, a 9, a 90,

See the difference? Is this related to your problem?

Indeed it is. You're right on the money, thank you Joaquín! By not taking the full-range into account, I was not really getting the "max suffix", so when I was trying to insert a new entry with name "$prefix $max_suffix_int", I was again getting a conflict, which triggered another loop necessary for a different conflict resolution scheme (the new one is supposed to find a non-conflicting name in a single pass, at the cost of scanning the range of possible conflicts to find the max_suffix_int of really conflicting entries - some entries have the same name, butother keys make them non-conflicting). I also tested that both lower_bound(prefix + " :") and upper_bound(prefix + " 9999999999") (I use an int32_t suffix) give me the same upper range iterator, with similar performance. So it was all in my buggy code (well, more that I can't properly sort lexicographically in my head...), and BMI is behaving as expected (as I suspected all along). Thank you again for helping me get to the bottom of this. --DD

Joaquín Mª López Muñoz

29 Nov 29 Nov

9:34 a.m.

New subject: [Multi-Index] performance difference between lower_bound() and upper_bound()

El 28/11/2013 13:55, Dominique Devienne escribió: By not taking the full-range into account, I was not really getting the "max suffix", so when I was trying to insert a new entry with name "$prefix $max_suffix_int", I was again getting a conflict, which triggered another loop necessary for a different conflict resolution scheme (the new one is supposed to find a non-conflicting name in a single pass, at the cost of scanning the range of possible conflicts to find the max_suffix_int of really conflicting entries - some entries have the same name, butother keys make them non-conflicting). I also tested that both lower_bound(prefix + " :") and upper_bound(prefix + " 9999999999") (I use an int32_t suffix) give me the same upper range iterator, with similar performance. Watch out: depending on how your strings look like, these two bounds are not equivalent. For instance: "p 9999999999" < "p 9999999999 foo" < "p :" I understand you're already taking this into account, but wanted to point it out just in case. Best, Joaquín M López Muñoz Telefónica Digital ________________________________ Este mensaje se dirige exclusivamente a su destinatario. Puede consultar nuestra política de envío y recepción de correo electrónico en el enlace situado más abajo. This message is intended exclusively for its addressee. We only send and receive email on the basis of the terms set out at: http://www.tid.es/ES/PAGINAS/disclaimer.aspx

Dominique Devienne

10:26 a.m.

New subject: [Multi-Index] performance difference between lower_bound() and upper_bound()

On Fri, Nov 29, 2013 at 10:34 AM, Joaquín Mª López Muñoz <joaquin@tid.es>wrote:

...

El 28/11/2013 13:55, Dominique Devienne escribió: Watch out: depending on how your strings look like, these two bounds are not equivalent. For instance:

"p 9999999999" < "p 9999999999 foo" < "p :"

I understand you're already taking this into account, but wanted to point it out just in case. Best,

Thanks. Useful reminder. I'm checking the end pointer returned by strtol against the end of the string, so I discard entries which are not strictly prefix + space + integer. --DD

4269

Age (days ago)

4277

Last active (days ago)

List overview

Download

6 comments

3 participants

participants (3)

Dominique Devienne
Joaquin M Lopez Munoz
Joaquín Mª López Muñoz