On 03/26/2017 10:35 PM, Andrey Semashev via Boost wrote:
On 03/26/17 14:00, Vladimir Batov via Boost wrote:
On 2017-03-26 21:28, Andrey Semashev via Boost wrote:
What you're probably seeing is the overhead that comes from inserting in the middle of the vector that is used by flat_set internally. This especially matters if elements have inefficient move/copy.
Yes, I initially thought so too... but the value_type is merely a pair of pointers. The comparison operation is expensive. So, it appears it plays quite a role during initial construction. When I eliminated that as described and sorted once... huge boost.
Well, if the ordering predicate plays the defining role, I don't quite see where your gain is. Inserting N elements in an empty flat_set or inserting N elements in a vector and then sorting would give you the same number of calls to the predicate, unless I'm missing something.
I must have failed to describe my use-case as it is not "inserting N elements in an empty flat_set". In fact, it is rather inserting quite a number (about 50,000 to 200,000 so far) of value_type instances on one-by-one basis.
I still think that moving around lots of pointers on every insertion (you said the end size was ~50000 elements and I'm assuming normal distribution of insert positions) is the culprit. E.g. towards the end of construction inserting an element in the middle would have to move 50000 * 8 = ~390 KiB of data, which is quite a lot.
It might well be. I did not profile what exactly contributes so much during flat_set construction. I simply chopped the whole construction off (replaced with straight emplace_back) to see huge 30-50% improvement.
Did you try profiling the code?
:-) You are just teasing me, right? :-)
No, this won't work in general because these methods are not supposed to modify the container. In particular, they don't invalidate iterators or references, and calling these methods concurrently is thread-safe.
I do not feel those begin() et al will invalidate anything because the first call to, say, begin() will sort the underlying container once. After that it all works as usual, right?
No, not quite.
flat_set< int > fs; fs.reserve(5); auto it1 = fs.begin(); auto it2 = fs.insert(42).first; auto it3 = f2.begin(); // invalidates it1 and it2
Well, I do understand that insertion into a std::vector invalidates the iterators. :-) I thought you argued (when you said "No, this won't work in general because...") that "my" approach (of only sorting on when-needed basis) had different to the flat_set behavior. I argued that it was the same. That is, no changes in behavior but a huge construction performance boost. That fact should not probably be simply ignored.
Also, as I mentioned, multiple threads may me calling begin() or find() concurrently. These methods must also have const-qualified versions to be able to be used on non-mutable containers.
I can't see any of it as a fundamental problem. They all addressed in
implementation:
template