Jeremy Murphy
OK. I might have missed it, but I can't see in the docs an explanation of how to choose L and M for a given data structure and cache line size. Is there a formula? ... I guess I'm concerned that these L and M values limit the ease of portability, by which I mean the additional effort required to consider the performance effects of the parameters on one machine or another is very different to the ease of use of std::vector.
Jeremy, I proposed good default values (L=30, M=60), which can however be changed. The default values are good for most PCs. I performed experiments on my laptop and found that these values sit on a plateau: if you halve or double L or M, the result changes by no more than 7 to 10%. However, if you decrease them strongly (L=4, M=4), you can lose 50% of the performance, and if you increase them strongly (M=500), insertion/deletion becomes slow. ___ To be brief, you should use the default values ___. (Compare this with vector behavior: you use the default growth strategy, but you can change it if you want.)

If you want the best performance, you should take into account not only the machine, but also the operations your application performs, the size of the structures you store, the memory manager, and the typical size of the container. If the application mostly reads/accesses, a large M is preferable. If it mostly inserts/deletes, a small M is preferable. The default L is good enough unless you want to split/merge containers or perform bulk insertions/deletions (in that case L should be slightly increased). A large L is slightly better for big containers. Not to mention memory allocation: if a ptree_seq contains 1 element, it still allocates a whole leaf of M+ elements. So mobile apps may prefer smaller L and M to save memory. ___ To be brief, if you want 100% performance, you should tune L and M during the profiling stage, when the whole application is complete ___.
Well, the point of the bibliography is primarily for anyone looking at your data structure to understand why it is designed that way. A specific reference to a textbook or paper allows us to see that and also potentially see what you missed. If your implementation is based on a Wikipedia article then cite that. Maybe I'm banging the science drum too hard, but I personally find that kind of explicit trail of theory helpful.
I mentioned the Wikipedia article about the rope just because it helps the common understanding: it shows how to keep a sequence in a tree-like structure. The idea is quite simple: keep the number of elements in each subtree next to the pointer to that subtree. If you insert (or delete) an element, you modify log(N) subtrees, so you update log(N) counts. If you want to access the element with a given index, you use a search algorithm which is very similar to the tree-search algorithm, but you compare the index against these counts instead of comparing keys. Also, I mention the rope because it is very respectable and is included in gcc under the name __gnu_cxx::rope. However, the rest is my own optimization. Firstly, I use a B-tree instead of a binary tree, which makes the data structure cache-efficient. Secondly, I use some algorithmic tricks which improve performance by a constant factor. The best example is the recursion-free, one-pass update algorithm, which I described on my title page. To be brief, I used the already known concept of keeping a sequence in a tree-like structure, and then effectively applied some known optimizations to it. Best regards, Aleksandr Kupriianov