Hi, Adam!
On Sat, 05 Apr 2014 16:59:18 +0200, Adam Wulkiewicz wrote:
> Again, the times you measure are too small to reasonably tell which one
> is faster. You could perform each test 10k more times to keep the
> results around 1 s.
I believe my measurements are reliable. I'm sorry I did not explain this earlier and did not provide a clarification on github.

Firstly, the output is the time per one operation, in milliseconds. E.g., if it says 1.00e-4, it means 1e-7 seconds (0.0000001 s) per operation. The operation is performed many times and the total time is divided by the number of operations.

Secondly, each measurement covers many atomic operations, both from repetition and from array growth. If the line before the array says "averaged on 500 measurements" and the array size is 1024, it means there were 500 arrays and each of them grew from 512 to 1024 elements, giving 500 * 512 = 256000 atomic additions. Each of them took 1.17e-04 ms on average, so together they took about 30 ms. The granularity of the Linux timer used in the test is 1 ms, so the tolerance in this case is about 3%.
> Instead of comparing it with a different implementation I guess it
> would be sufficient to contain only 2 children in the internal nodes.
Well, in fact I found that the numbers 14/28 are near the optimum. (There are several optima, for different operations.) Theory agrees with practice: performance is best when a leaf/branch fits in one cache line. (My container does allow a minimum of L=4, M=4, where each branch and leaf may hold from 2 to 4 items; it generally behaves worse.) I tried to compare it with rope because I was asked to do so. I'll compare it with other similar containers as well. Rope has other problems, such as rebalancing at critical moments, and so on.

Best regards,
A.K.