[container] New container: why is everybody silent?
Hi boost community! Please tell me something about my container, don't be silent. I firmly intend to offer it to boost, because it's much faster than competitors (by a constant factor). It's intended to replace std::vector in cases when an application requires many insertions/deletions. It takes O(log(N)) per insertion/deletion.
Docs: http://alexkupri.github.com/array/
Code: https://github.com/alexkupri/array
Best regards, Aleksandr Kupriianov
I think that most Boost library devs are currently focused on fixing the issues with the incoming release, which is a bit difficult to get out because it's the first time there is a release with the git-ified boost sources. If you don't get any feedback right now, try again after the release is done?
I am interested in using the container on a microcontroller,
for which performance is crucial. I intend to use your container
in combination with a custom allocator.
I think folks are still working very hard on the release.
I am also very busy with normal work and the GSoC 2014
program. So could you please grant us maybe a bit
more time?
Cheers
Chris
On Monday, July 14, 2014 8:28 PM, Alexander Kuprijanov
Christopher Kormanyos
Hi Chris!
Thank you for your answer. If you have any questions regarding the container,
feel free to ask me.
By the way, if you are going to use it on a microcontroller, different template parameters L, M might be better (smaller, I think), though the defaults are good enough, I believe. I mean: btree_seq
Hi Alex,
I can provide you with my benchmark results below. Please do
not be discouraged by the potentially disappointing report, as
I suspect my benchmark might be stressing the node-linked
list in the wrong areas --- and with the wrong competition
(std::vector).
I tested the container with a floating-point benchmark
that also performs numerous sequential operations on
iterators ranging from [begin, end).
I used a desktop PC with Microsoft Visual Studio 2013
and an ARM(R) A8 microcontroller with GCC 4.8.2.
I used the container with a custom allocator having
a memory pool of 512 bytes.
Here is my report, with my interpretation of it below:
---------------------------------------------------------------
I found many warnings in both GCC and VC:
- redundant semicolons (empty statements)
- local variables shadowing class members
- non-explicit conversion to/from signed/unsigned
- signed/unsigned comparison
- unary minus applied to unsigned type
- unreferenced local variables
I stripped the implementation:
* I removed all exception handling and asserts.
I identified the container uses in the benchmark:
* Create two containers.
* One container has 4 floats, the other has 5 floats.
Do the following in a loop about 10 times:
* On each container apply std::accumulate with std::multiplies.
* On each container apply std::for_each with element incrementation.
* Do the normal floating-point operations and loop management.
I use a custom allocator with 512 byte pool:
* typedef util::ring_allocator<T> allocator_type;
I use either btree_seq or std::vector:
* typedef btree_seq container_type;
* typedef std::vector container_type;
I perform the numerical correctness analysis:
* Run on PC with VS2012. Result OK.
* Run on ARM A8 with GCC. Result OK.
I perform the size analysis on the ARM(R):
* using neither : 7344 byte
* using std::vector : 8928 byte, delta = 1584
* using btree_seq : 15968 byte, delta = 8624
I perform the runtime analysis on the ARM(R):
* using std::vector : 4.4 microseconds
* using btree_seq : 5.3 microseconds
---------------------------------------------------------------
Interpretation:
The btree_seq obtains the correct result and can be used on a PC as well as a microcontroller with a custom allocator. This is very positive!
The btree_seq creates much more code than the corresponding use of std::vector. In addition, btree_seq runs significantly slower than std::vector in this particular benchmark.
The code could be potentially cleaned up significantly. For this, build at very high warning levels and eliminate all warnings.
As mentioned earlier, some C++11 awareness might be mandatory: awareness of move construction, nullptr where appropriate, and construction/assignment with std::initializer_list. Boost has C++11 compiler switches for these and more C++11 features.
If you would like to see the code of the benchmark, it is available in the public domain.
I encourage you to keep going.
Cheers Chris
Hi, Chris!
1) Thank you for spending some time to test my container.
2) Performance. Yes, for iterative and random access the vector is slightly better (but only slightly; I'm also encouraged by the 5.3 : 4.4 score). Vector is the champion for these operations, and probably nobody can outperform it. But as soon as insertions/deletions are necessary, my container can outperform it.
3) Real app. Are you writing a real app for a microcontroller now? If you use insertions/deletions, especially random ones, my container can significantly outperform. Even if you just use push_back, my thing can slightly outperform. However, if you use iterative access, vector wins, but not by much. I suggest you try my thing in the real app.
4) Benchmark - can you provide a link?
5) C++11 - fully accepted. I'll deal with it soon.
6) Warnings - fully accepted.
Best regards, and thank you again, Aleksandr
2) Performance. Yes, for iterative and random access the vector is slightly better (but only slightly; I'm also encouraged by the 5.3 : 4.4 score). Vector is the champion for these operations, and probably nobody can outperform it. But as soon as insertions/deletions are necessary, my container can outperform it.
I agree. Then it is important to provide the users with information regarding where btree_seq is most efficient, and where it is less efficient. Some library developers might wonder how btree_seq is different from std::list. What missing niche in the STL does btree_seq intend to fill?
3) Real app. Are you writing the real app for microcontroller now? If you use insertions/deletions, especially random ones, my container can significantly outperform.
Yes. I can work out a test case.
Even if you just use push_back, my thing can slightly outperform. However, if you use iterative access, vector wins, but not by much. I suggest you try my thing in the real app.
Yes. That's a good idea.
4) Benchmark - can you provide a link?
Yes.
5) C++11 - fully accepted. I'll deal with it soon.
OK. Give me a few days for cleanup. It's all at Github. Links will follow.
6) Warnings - fully accepted.
Some very high GCC warning settings are shown below.
-Wall \
-Wextra \
-pedantic \
-Wmain \
-Wundef \
-Wsign-conversion \
-Wunused-parameter \
-Wuninitialized \
-Wshadow \
-Wunreachable-code \
-Wswitch-default \
-Wswitch-enum \
-Wcast-align \
-Wmissing-include-dirs \
-Winit-self \
-Wfloat-equal \
-Wdouble-promotion
If you have -std=c++11, then you can also use:
-Wzero-as-null-pointer-constant
I also tested btree_seq on an 8-bit Atmel(R) AVR(R)
microcontroller today. It was about 500 times slower than
the ARM(R) core, but got the right results. So now
btree_seq has seen some action on 8-bit, 32-bit, and 64-bit
systems.
-----------------------------------------------------------------
8-bit Atmel(R) AVR(R) microcontroller
Size analysis:
* using neither : 2240 byte
* using std::vector : 4448 byte, delta = 2208
* using btree_seq : 13504 byte, delta = 11264
Runtime analysis:
* using std::vector : 1500 microseconds
* using btree_seq : 2200 microseconds
-----------------------------------------------------------------
Cheers
Chris
On Wednesday, July 16, 2014 8:42 AM, Aleksandr Kupriianov
Hi, Chris!
Christopher Kormanyos
I agree. Then it is important to provide the users with information regarding where btree_seq is most efficient, and where it is less efficient.
Thank you so much for testing my thing. External testing is psychologically very important, because my own tests may be too gentle. Random insertions and deletions are where it's the champion.
Some library developers might wonder how btree_seq is different from std::list. What missing niche in the STL does btree_seq intend to fill?
In short, my thing is between list and vector. It sells everything for an O(log(N)) price. In a list you can insert something in the middle in constant time, but only if you already have an iterator; getting to the middle takes O(N) time. If you insert in the middle of a vector, it requires O(N) anyway.
Another niche is possibly porting string onto its basis, i.e., something like this:
typedef string_abstraction
8-bit Atmel(R) AVR(R) microcontroller
Size analysis: * using neither : 2240 byte * using std::vector : 4448 byte, delta = 2208 * using btree_seq : 13504 byte, delta = 11264
I'm not sure I understand your words "using neither"; that's why I asked for a link. Yes, the size for the 8-bit Atmel is a little unexpected; normally it takes roughly 2 times more memory than vector. Thank you 1e10 times for testing. Best regards, Aleksandr
On 17 July 2014 06:44, Aleksandr Kupriianov
Some library developers might wonder how btree_seq is different from std::list. What missing niche in the STL does btree_seq intend to fill?
In short, my thing is between list and vector.
Instead of yet another container, have you thought about adding another kind of index to Boost.MultiIndex? The big benefit of that is that it makes it very easy to compare different types of indices with real-world data. I've done this kind of thing with sequenced indices vs. random access indices.
Another niche is possible porting string on its basis. I.e., something like this:
typedef string_abstraction string;
typedef string_abstraction scalable_string;
Unlikely to be that useful, as the iterator invalidation and contiguity guarantees are vastly different in the two cases.
-- Nevin ":-)" Liber mailto:nevin@eviloverlord.com (847) 691-1404
8-bit Atmel(R) AVR(R) microcontroller
Size analysis: * using neither : 2240 byte * using std::vector : 4448 byte, delta = 2208 * using btree_seq : 13504 byte, delta = 11264
I'm not sure I understand your words "using neither", that's why I asked for a link. Yes, size for 8-bit Atmel is a little bit unexpected; normally it takes roughly 2 times more memory than vector.
The sizes refer to the actual size of constant program
code. This is something that most PC programmers
rarely consider in normal development.
When I write "neither", it means the size of the microcontroller
program code without the benchmark part active. In other
words, neither vector nor btree_seq is being used,
because none of the benchmark code is active.
The benchmark is linked below. Actually it is more a benchmark
of floating-point portability than of containers.
But it is well-tested code, and I have it available. So I decided
to use it for a preliminary look at btree_seq.
The benchmark is called within ugly compiler switches
when these lines are un-commented.
//#define CFG_USE_APP_BENCHMARK_FPU
// #define CFG_APP_BENCHMARK_FPU_TYPE CFG_APP_BENCHMARK_FPU_TYPE_HYPERG
https://github.com/ckormanyos/real-time-cpp/blob/master/ref_app/src/app/benc...
The code for the mathematical hypergeometric function in
the actual benchmark run is here.
https://github.com/ckormanyos/real-time-cpp/blob/master/ref_app/src/app/benc...
The function hypergeometric_pfq is being called.
I can (and maybe should) work out some simpler test
cases that more clearly expose the functionality of
btree_seq.
Cheers, Chris
On Thursday, July 17, 2014 1:45 PM, Aleksandr Kupriianov
On 14/07/2014 20:28, Alexander Kuprijanov wrote:
Hi boost community!
Please tell me something about my container, don't be silent. I firmly intend to offer it to boost, because it's much faster than competitors (by a constant factor).
Sorry, I have your previous post on my to-do list but no time to even read a bit about it.
To make it part of Boost.Container, it should support portable non-raw pointers, move semantics, C++11 allocators and other features supported by Boost.Container.
A btree-based data structure seems useful. However, in your links I see little documentation on how the data structure works, how you mapped an associative container into a sequence container, the implementation of insert, erase, operator[]... That would help a lot to receive more feedback.
Best, Ion
Ion Gaztañaga
To make it part of Boost.Container, it should support portable non-raw pointers, move semantics, C++11 allocators and other features supported by Boost.Container.
A btree based data structure seems useful. However, in your links, I see little documentation on how the data structure works, how you mapped an associative container into a sequence container, the implementation of insert,erase,operator[]... That would help a lot to receive more feedback.
Hi Ion! Thank you for the feedback. I'll take your advice and add the description first, and the C++11 features after that. By the way, it's not exactly "mapping a sequence onto an associative container"; it's an implementation of a sequence using a tree-like structure. Best regards, Aleksandr
Ion Gaztañaga
A btree based data structure seems useful. However, in your links, I see little documentation on how the data structure works, how you mapped an associative container into a sequence container, the implementation of insert,erase,operator[]... That would help a lot to receive more feedback.
Hi Ion! The documentation added: http://alexkupri.github.io/array/ C++11 features coming soon. Best regards, Aleksandr
Ion Gaztañaga
Sorry, I have your previous post on my to-do list but no time to even read a bit about it.
To make it part of Boost.Container, it should support portable non-raw pointers, move semantics, C++11 allocators and other features supported by Boost.Container.
Hi Ion! All requirements fulfilled: documentation, license, C++11, warnings suppressed. http://alexkupri.github.io/array/ Please add it to the review schedule. Best regards, Aleksandr
On Mon, Aug 11, 2014 at 8:52 AM, Aleksandr Kupriianov
All requirements fulfilled: documentation, license, c++11, warnings suppressed. http://alexkupri.github.io/array/
It does not yet support the C++11 allocator model. Some changes required:
1. Do not call .construct or .destroy directly, but only via std::allocator_traits. Note: If you want to support both C++11 and C++03 compilers, then do this when BOOST_NO_CXX11_ALLOCATOR is _not_ defined; when it is defined, use placement new for construction. Note: Do this only for construction of value_type objects; for all other construction use placement new.
2. Do not use the ::pointer member types of the allocator type directly. Use std::allocator_traits to obtain these types.
3. A C++11 allocator's 'pointer' type may not be a raw pointer, so do not make that assumption. p = a.allocate(n); - p should be the allocator's ::pointer type. If you need a raw pointer from it, std::addressof(*p) will get you that address - but preserve the value of p for as long as you keep that address around, because p is what you will give back to .deallocate().
Note: Both std::addressof and std::allocator_traits mentioned have Boost equivalents, boost::addressof and boost::allocator_traits, if you want to avoid the BOOST_NO_CXX11_* checks and conditional code for each.
Glen
Just curious if this container might be better suited to sorting (and re-sorting) than a std::vector. And, I guess, for containers of large structs that are relatively expensive to copy or swap by value? - Nigel
I'm pretty excited by this container; I'll start testing it as soon as I find an excuse for it. Did you create it as part of a thesis? If so, it would be nice to see the thesis too. If not, it would be nice to see some discussion about which/whose B-Tree you have implemented and a bibliography. I'm not intimately familiar with this literature but a question that immediately comes to mind is whether your implementation is cache-oblivious and why. My limited understanding is that a CO structure would be more general and thus work on supercomputers and microcontrollers equally well without the programmer needing to tweak the L and M parameters. I still need to read the relevant paper [1] and your implementation in detail. Cheers. Jeremy [1] http://epubs.siam.org/doi/abs/10.1137/S0097539701389956 http://www.cc.gatech.edu/~bader/COURSES/UNM/ece637-Fall2003/papers/BDF00.pdf
Jeremy Murphy
Did you create it as part of a thesis? If so, it would be nice to see the thesis too. If not, it would be nice to see some discussion about which/whose B-Tree you have implemented and a bibliography. I'm not intimately familiar with this literature but a question that immediately comes to mind is whether your implementation is cache-oblivious and why.
1) It's not part of a thesis. 2) Well, it is cache-efficient, not cache-oblivious. (In other words, you should choose L and M so that they fit the desired data structure and the size of a cache line. If it were cache-oblivious, it would work for all cache-line sizes.) A cache-oblivious implementation (like in your reference) would probably require a custom allocator and might trigger problems with fragmentation. On the other hand, many processors have small cache lines (for example, i7 has 64 bytes for L1, L2 and L3), so I don't think a cache-oblivious implementation is necessary. 3) I just combined the concept of a sequence container with a b-tree for access. I did not use any existing concepts or code; I did it from scratch. There are many implementations of similar concepts, for example, __gnu_cxx::rope, which combines a binary tree and a sequence in a similar way. There is a Wikipedia article on the rope. There is a comparison of my container with rope in my main.cpp, although it is switched off.
On 18 August 2014 19:24, Aleksandr Kupriianov
2) Well, it is cache-efficient, not cache-oblivious. (In other words, you should choose L and M so, that they fit well for a desired data structure and size of cache line. If it was cache-oblivious, it would work for all sizes of cache line.) Cache-oblivious implementation (like in your reference) would probably require custom allocator and might trigger problems with fragmentation. On the other hand, many processors have small size of cache line (for example, i7 has 64 bytes for L1, L2 and L3), so I don't think that cache-oblivious implementation is necessary.
OK. I might have missed it, but I can't see in the docs an explanation of how to choose L and M for a given data structure and cache line size. Is there a formula? Knowing the data structure size sounds reasonable, but how many people know the size of their cache line? And do I understand correctly that since it is specified as a compile-time constant, the value in the source code needs to be changed, or at least checked, when compiled on a given machine for the first time? I guess I'm concerned that these L and M values limit the ease of portability, by which I mean the additional effort required to consider the performance effects of the parameters on one machine or another is very different from the ease of use of std::vector.
3) I just combined the concept of a sequence container with a b-tree for access. I did not use any existing concepts or code; I did it from scratch. There are many implementations of similar concepts, for example, __gnu_cxx::rope. rope combines a binary tree and a sequence in a similar way. There is a Wikipedia article on the rope. There is a comparison of my container with rope in my main.cpp, although it is switched off.
Well, the point of the bibliography is primarily for anyone looking at your data structure to understand why it is designed that way. A specific reference to a textbook or paper allows us to see that and also potentially see what you missed. If your implementation is based on a Wikipedia article then cite that. Maybe I'm banging the science drum too hard, but I personally find that kind of explicit trail of theory helpful. Jeremy
Jeremy Murphy
OK. I might have missed it, but I can't see in the docs an explanation of how to choose L and M for a given data structure and cache line size. Is there a formula? ... I guess I'm concerned that these L and M values limit the ease of portability, by which I mean the additional effort required to consider the performance effects of the parameters on one machine or another is very different to the ease of use of std::vector.
Jeremy, I proposed good default values (L=30, M=60), which can however be changed. The default values are good for most PCs. I've performed experiments on my laptop and found that these values form a plateau: if you change L or M by a factor of two, the result changes by no more than 7 to 10%. However, if you decrease them strongly (L=4, M=4), you can lose 50% of performance; if you increase them strongly (M=500), insertion/deletion will be performed slowly. ___To be brief, you should use the default values.___ (Compare this with vector behavior: you use the default growth strategy, but you can change it if you want.)
If you want the best performance, you should take into account not only the machine, but also the operations your application performs, the size of your structure, the memory manager, and the typical size of the container. If the application likes to read/access, a large M is preferable. If it likes to insert/delete, a small M is preferable. The default L is good enough unless you want to split/merge containers or perform bulk insertions/deletions (in that case L should be slightly increased). A large L is slightly better for big containers. Not to mention memory allocation: if btree_seq contains 1 element, it allocates a whole leaf of M+ elements. So mobile apps can prefer smaller L and M for memory saving. ___To be brief, if you want 100% performance, you should tune L and M during the profiling stage, when the whole application is complete.___
Well, the point of the bibliography is primarily for anyone looking at your data structure to understand why it is designed that way. A specific reference to a textbook or paper allows us to see that and also potentially see what you missed. If your implementation is based on a Wikipedia article then cite that. Maybe I'm banging the science drum too hard, but I personally find that kind of explicit trail of theory helpful.
I mentioned the Wikipedia article about rope just because it helps the common understanding. It shows how to keep a sequence in a tree-like structure. The idea is quite simple: keep the number of elements in each subtree next to the pointer to that subtree. If you insert (delete) an element, you modify log(N) subtrees, so you update log(N) numbers. If you want to access an element with a given index, you use a search algorithm very similar to the tree-search algorithm, but driven by these counts instead of keys. Also, I mention rope because it is very respectable and is included in gcc under the name __gnu_cxx::rope. However, the rest is my optimizations. Firstly, I use a btree instead of a binary tree, which makes the data structure cache-efficient. Secondly, I use some algorithmic tricks which improve performance (by a constant factor). The best example is the recursion-free, one-pass update algorithm, which I described on my title page. To be brief, I used the already-known concept of keeping a sequence in a tree-like structure, and then effectively applied some known optimizations to it. Best regards, Aleksandr Kupriianov
Nigel Stewart
Just curious if this container might be better suited to sorting (and re-sorting) than a std::vector.
btree_seq is better than vector in insert/delete-intensive applications. Since sorting does not require insert/delete, btree_seq will be comparable to (but slower than) vector for sorting.
And, I guess, for containers of large structs that are relatively expensive to copy or swap by value?
I expect that after including btree_seq into boost, ptr_btree_seq will be built upon the btree_seq, like ptr_vector upon vector. Leaves of ptr_btree_seq will physically contain only pointers to the large objects, but the container will have the same interface as btree_seq. (This should not take much effort, since ptr_* implementation already exists for vector.)
On Tue, Jul 15, 2014 at 4:28 AM, Alexander Kuprijanov
Hi boost community!
Please tell me something about my container, don't be silent. I firmly intend to offer it to boost, because it's much faster than competitors (by a constant factor).
It's intended to replace std::vector in cases, when an application requires multiple insertion/deletion. It takes O(log(N)) for insertion/deletion.
Docs: http://alexkupri.github.com/array/ Code: https://github.com/alexkupri/array
Hi Alexander,
I have already made a few comments. I'd like to say a bit more about one of the previous comments.
Normally, the Boost community does NOT review projects published without the Boost license. Are you going to add the Boost license to your project?
Regards, Vadim Stadnik
Vadim Stadnik
Hi, Vadim! Thank you for your kind words. 1) License - accepted. 2) Documentation - accepted (coming soon). 3) Examples - accepted (after adding c++11 features). Best regards, Aleksandr
participants (10)
- Aleksandr Kupriianov
- Alexander Kuprijanov
- Christopher Kormanyos
- Glen Fernandes
- Ion Gaztañaga
- Jeremy Murphy
- Klaim - Joël Lamotte
- Nevin Liber
- Nigel Stewart
- Vadim Stadnik