[Containers] Complexity of flat_*::insert(first,last)

Hi Ion, This library looks great. A quick comment about range-insertion in the flat containers, e.g.: template<typename InputIterator> void insert(InputIterator first, InputIterator last); Requires: i, j are not iterators into *this. Effects: inserts each element from the range [i,j) . Complexity: N log(size()+N) (N is the distance from i to j) search time plus N*size() insertion time. (Note typo: i,j vs, first,last) What implementation are you assuming to get that complexity? It seems to me that it is worse than it could be if you implement it something like this (PSEUDO-CODE): template<typename InputIterator> void insert(InputIterator first, InputIterator last) { iterator m = end(); insert(end(), first, last); sort(m,end()); inplace_merge(begin(),m,end()); } (I've only ever done this with multi* containers. For set and map you'll need something to avoid duplicates.) Regards, Phil. p.s. Having typed that I have developed a feeling of deja vu. Have I asked you this before?

El 03/08/2011 19:27, Phil Endecott escribió:
Hi Ion,
(Note typo: i,j vs, first,last)
Thanks.
What implementation are you assuming to get that complexity? It seems to me that it is worse than it could be if you implement it something like this (PSEUDO-CODE):
I took the first part from SGI documentation (the standard also says "N log(a.size() + N)") since I was implementing it as a loop of simple insertions: http://www.sgi.com/tech/stl/MultipleAssociativeContainer.html Average complexity for insert range is at most O(N * log(size() + N)), where N is j - i. The second part is vector insertion time in case is a vector front insertion by N. I think it should prepend "at most ". Your code should be: N ¿? //insert + O(N log(N)) //sort (comparisons) + O(size()+N) //inplace_merge (comparisons) if using intermediate memory I think (I'm not an expert and it could depend on N's size) your code has lower complexity (and an additional allocation, I think it could be faster depending on the cost of the copies). I guess we can just measure ;-) Ion PS: For set and map I don't know how to avoid duplicate issue. PS2: I offer some insertion guarantees for duplicates (see N1780): insert(end(), t) //insert at the end of duplicates insert(begin(), t)// insert at from of duplicates http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1780.html insert(first, last) was offering insertion in the end of duplicates (it was just a loop of simple insertions). This algorithm changes this, although the documentation does not offer any guarantee so I guess we can change it. Or use stable_sort (with additional memory N log N like std::sort) to guarantee it (inplace_merge is stable).

On 03/08/11 22:18, Ion Gaztañaga wrote:
El 03/08/2011 19:27, Phil Endecott escribió:
Hi Ion,
(Note typo: i,j vs, first,last)
Thanks.
What implementation are you assuming to get that complexity? It seems to me that it is worse than it could be if you implement it something like this (PSEUDO-CODE):
I took the first part from SGI documentation (the standard also says "N log(a.size() + N)") since I was implementing it as a loop of simple insertions:
http://www.sgi.com/tech/stl/MultipleAssociativeContainer.html
Average complexity for insert range is at most O(N * log(size() + N)), where N is j - i.
The second part is vector insertion time in case is a vector front insertion by N. I think it should prepend "at most ".
Your code should be:
N ¿? //insert + O(N log(N)) //sort (comparisons) + O(size()+N) //inplace_merge (comparisons) if using intermediate memory
I think (I'm not an expert and it could depend on N's size) your code has lower complexity (and an additional allocation, I think it could be faster depending on the cost of the copies).
No; the existing complexity is better. Let S=size() and N=aS. You have N log(S+N) = aS log(S(1+a)) = S(a log(S) + a log(1+a)) < S(a log(S) + 1+2a+a log(a)) (If you don't think that's obvious see <http://www.wolframalpha.com/input/?i=a+Log[1%2Ba]%3C1%2B2a%2Ba+Log[a]>) = S(1+2a + a(log(S)+log(a))) = S + aS(2 + log(aS)) = S + N(2 + log(N)) = N + N log(N) + S + N which is the alternative. Of course, there are constant factors to worry about, but my gut says they work in favour of the existing implementation too. John Bytheway

On 04/08/11 01:41, John Bytheway wrote:
No; the existing complexity is better. Let S=size() and N=aS. You have
N log(S+N) = aS log(S(1+a)) = S(a log(S) + a log(1+a)) < S(a log(S) + 1+2a+a log(a))
(If you don't think that's obvious see <http://www.wolframalpha.com/input/?i=a+Log[1%2Ba]%3C1%2B2a%2Ba+Log[a]>)
Or, if you want to be paranoid about the base of the logarithm: <http://www.wolframalpha.com/input/?i=a+Log[2%2C+1%2Ba]%3C1%2B2a%2Ba+Log[2%2C+a]>
= S(1+2a + a(log(S)+log(a))) = S + aS(2 + log(aS)) = S + N(2 + log(N)) = N + N log(N) + S + N
which is the alternative. Of course, there are constant factors to worry about, but my gut says they work in favour of the existing implementation too.
John Bytheway
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

El 04/08/2011 2:41, John Bytheway escribió:
No; the existing complexity is better. Let S=size() and N=aS. You have
N log(S+N) = aS log(S(1+a)) = S(a log(S) + a log(1+a)) < S(a log(S) + 1+2a+a log(a))
(If you don't think that's obvious see <http://www.wolframalpha.com/input/?i=a+Log[1%2Ba]%3C1%2B2a%2Ba+Log[a]>)
= S(1+2a + a(log(S)+log(a))) = S + aS(2 + log(aS)) = S + N(2 + log(N)) = N + N log(N) + S + N
which is the alternative. Of course, there are constant factors to worry about, but my gut says they work in favour of the existing implementation too.
Thanks! I think I also missed some copies. In sort+merge, first N was suppossed to be about insertion (which is wrong since sort and merge also make copies) and not about sorting (comparisons), so sorting complexity would be: N log(N) + S + N = S(alog(S) + 1+a+a log(a)) Regarding copies: With existing implementation we have some middle insertions for each value (that would be quadratic: N*size()). With merge+sort, we have less copies on first insertion (we insert at vector's end which is at worst linear) but I haven't taken in care the internal copies that sort() and inplace_merge() need to do. And I should use my own sort() and inplace_merge to support movable objects for C++03 compilers. So maybe I'll stick to current implementation for now ;-) Best, Ion PS: And all that without considering the time complexity a new call has, depending on the free blocks the allocator manages ;-)

El 04/08/2011 19:29, Phil Endecott escribió:
Hi John,
John Bytheway wrote:
No; the existing complexity is better. Let S=size() and N=aS. You have
N log(S+N)
Sorry, I'm lost already. What is N log(S+N) supposed to be the complexity of?
I guess it means the number of comparisons when inserting N elements in a sorted vector/tree. N insertions, each one takes logarithmic complexity (log2(S+N) comparisons). Ion

On 04/08/11 18:29, Phil Endecott wrote:
Hi John,
John Bytheway wrote:
No; the existing complexity is better. Let S=size() and N=aS. You have
N log(S+N)
Sorry, I'm lost already. What is N log(S+N) supposed to be the complexity of?
It's what Ion said two posts ago:
I took the first part from SGI documentation (the standard also says "N log(a.size() + N)") since I was implementing it as a loop of simple insertions:
http://www.sgi.com/tech/stl/MultipleAssociativeContainer.html
Average complexity for insert range is at most O(N * log(size() + N)), where N is j - i.
John

John Bytheway wrote:
On 04/08/11 18:29, Phil Endecott wrote:
Hi John,
John Bytheway wrote:
No; the existing complexity is better. Let S=size() and N=aS. You have
N log(S+N)
Sorry, I'm lost already. What is N log(S+N) supposed to be the complexity of?
It's what Ion said two posts ago:
I took the first part from SGI documentation (the standard also says "N log(a.size() + N)") since I was implementing it as a loop of simple insertions:
OK, but that's only the "first part". (I don't know why Ion has chosen to break down the complexity into separate parts for each step. As we all know, O(A+B) is the same as O(A) if A is always larger than B. As your posts show, this causes confusion.) To me it's fairly clear that implementing insert(range) as a simple loop will be O( N * (S+N) ) : you're doing N separate inserts, and each one will have to move on average half of the existing elements along by one. A one-pass algorithm like std::inplace_merge must do much better than that, i.e. O(N log N + S). FYI, the approach that I took in my equivalent class was to require the user to explicitly sort the container after doing inserts. This makes it possible to perform efficient batch updates without a temporary container, at the cost of a more complex and less standard interface. Regards, Phil.

My instinct is that pre-sorting the incoming range is worthwhile. I will now try to convince myself otherwise: template<typename InputIterator> void insert(InputIterator first, InputIterator last) { // however it is done, I expect something like this would be worthwhile: // allocate once, enough space for all: make_room(distance(first, last)); // presorting basically tells us where to start looking when inserting the next item // ie once we insert the first pre-sorted item, we know the next one will be somewhere after that. // Is that equivalent to something like the following, where we do a similar comparison as we go? // imagine insert tells us where it was inserted // or some internal version of insert does. location = insert(first); for (i = ++first; i < last; i++) { if ( *i < *(i - 1) ) location = insert_somewhere_before(location, i); else location = insert_somewhere_after(location, i); } } Then it just looks like this is equivalent to using the previous to pick a pivot point. Which isn't any big win. Or did I do too much slight of hand there? Tony

Gottlob Frege wrote:
My instinct is that pre-sorting the incoming range is worthwhile.
If you're going to do the insert-in-a-loop, that dominates and it is still O( N * (S+N) ). I think sorting brings down the constant by a factor of 2 compared to random data but it doesn't change the big-O. If you're going to do an inplace_merge or similar, that requires that the range is sorted. Phil.

El 03/08/2011 19:27, Phil Endecott escribió:
Hi Ion,
This library looks great. A quick comment about range-insertion in the flat containers, e.g.:
Phil, it would great if you could find time to do a small review of the library, August it's not a good month to find reviewers ;-) Best, Ion
participants (4)
-
Gottlob Frege
-
Ion Gaztañaga
-
John Bytheway
-
Phil Endecott