Re: [boost] [flat_set] When to sort?

26 Mar 2017

      On 03/26/17 14:00, Vladimir Batov via Boost wrote:
...
On 2017-03-26 21:28, Andrey Semashev via Boost wrote:
...
What you're probably seeing is the overhead that comes from inserting
in the middle of the vector that is used by flat_set internally. This
especially matters if elements have inefficient move/copy.
Yes, I initially thought so too... but the value_type is merely a pair
of pointers. The comparison operation is expensive. So, it appears it
plays quite a role during initial construction. When I eliminated that
as described and sorted once... huge boost.
Well, if the ordering predicate plays the defining role, I don't quite 
see where your gain is. Inserting N elements in an empty flat_set or 
inserting N elements in a vector and then sorting would give you the 
same number of calls to the predicate, unless I'm missing something.

I still think that moving around lots of pointers on every insertion 
(you said the end size was ~50000 elements and I'm assuming normal 
distribution of insert positions) is the culprit. E.g. towards the end 
of construction inserting an element in the middle would have to move 
50000 * 8 = ~390 KiB of data, which is quite a lot. Did you try 
profiling the code?
...
...
No, this won't work in general because these methods are not supposed
to modify the container. In particular, they don't invalidate
iterators or references, and calling these methods concurrently is
thread-safe.
I do not feel those begin() et al will invalidate anything because the
first call to, say, begin() will sort the underlying container once.
After that it all works as usual, right?
No, not quite.

   flat_set< int > fs;
   fs.reserve(5);
   auto it1 = fs.begin();
   auto it2 = fs.insert(42).first;
   auto it3 = f2.begin(); // invalidates it1 and it2

Also, as I mentioned, multiple threads may me calling begin() or find() 
concurrently. These methods must also have const-qualified versions to 
be able to be used on non-mutable containers.