Re: [boost] Interest in a container which can hold multiple data types?

Sorry about the README file, I pretty much just uploaded what I had to github so people can grab the source. Admittedly, things there are lacking. I'll work on filling out the readme with some instructions hopefully today. The main difference between this and a container of boost:any is that the container of boost::any still is a container of one data type, just being that one data type is a very flexible one. This container natively supports insertion of different data types. So, you could do, int my_int = 5; double my_double0 = 9342.132; double my_double1 = 987.654; std::string my_string("This is a string."); std::vector<double> my_vec; my_vector.push_back(123.456); omni my_container; my_container.push_back(my_int); my_container.push_back(my_double); my_container.push_back(my_string); my_container.push_back(my_vector); and the container happily stores the data. You can then access the data through iterators //access all doubles for (auto itr = my_container.begin<double>(); itr != my_containert.end<double>(); ++itr) { std::cout << *itr << std::endl; } or through specifying the data type and index //get the 0th element string std::cout << my_container.at<string>(0) << std::endl; -- James Armstrong

On 5/4/2015 6:33 PM, James Armstrong wrote:
This also works with std::vector<boost::any>.
This could be useful, but wouldn't it be better to write this as generic algorithms capable of working with any container of boost::any, not just vector? Better yet, any container of type erased objects (boost::any, boost::variant, etc.) using something like boost::get<T>()?

I would actually prefer not to use type erasure: make a strong-typed and efficient heterogenous container. Something like this... ? template <typename... Args> struct tuple_vector : std::tuple< std::vector<Args>... > { typename std::tuple<Args...> type; template <typename T> void push_back(const T& value) { std::get< std::vector<T> >(*this).push_back(value); } template <typename T> void push_back( T&& value) { std::get< std::vector<T> >(*this).push_back(value); } }; struct Point { double x; double y; }; struct Line { double x; double y; double a; double l; }; struct Rectangle { double x; double y; double a; double w; double h; }; struct Circle { double x; double y; double r; }; tuple_vector<Point, Line, Rectangle, Circle> shapes; shapes.push_back(Point{1.0, 1.0} ); shapes.push_back(Point{1.0, 1.0} ); shapes.push_back(Circle{1.0, 1.0, .3} ); shapes.push_back(Rectangle{0.0, 0.0, 0.0, 1.0, 1.0} );

On 5/4/2015 9:50 PM, Thijs (M.A.) van den Berg wrote:
This is neat, but it' not quite the same thing. Unlike std::vector<boost::any>, your tuple_vector does not organize objects into strictly linear arrangement. Using your example, there is no way to draw shapes in the order they were inserted into container.

On 5/5/2015 9:31 AM, Thijs (M.A.) van den Berg wrote:
One could conceivably add another internal container to maintain overall index. But than again, is such tuple_vector<Point, Line, Rectangle, Circle> really better than std::vector<boost::variant<Point, Line, Rectangle, Circle>>? It would use less memory per object when some stored types are smaller than others, and in some circumstances these savings would exceed overhead of multiple internal containers and an additional index container. Any other advantages?

I think if you require preservation of ordering then you'll indeed need an extra index, but then you'll lose in speed and performance, and then I would just use a combination of std::vector and boost::variant / boost::any.

2015-05-05 17:21 GMT+08:00 Boris Rasin <boris@pointx.org>:
If the data sequence shows some affinity, I guess an ordered tuple_vector-based sequence can provide some performance benefit in traversal (with some crafted for_each method). That is, a sequence of [AAABBB] may be traversed faster than [ABABAB] for such a container, and I believe in the later case such that the types are uniformly distributed, vector<variant> will perform better.

On 5/5/2015 1:12 PM, TONGARI J wrote:
Why would a sequence of [AAABBB] be traversed faster for such a container compared to vector<variant>? Especially if it would need additional look up in internal index container on every iteration?

I don't think it makes sense to add an index to an tuple_vector, it's more like an unordered heterogenous container, it'll be good at iterating all "B"s though. Every variant will have pros and cons when implemented correctly. In general, the challenge imo is iterating heterogenous containers, a specialized for_each will probably be very handy.

M.a. wrote:
Such container would be the most useful for people wanting to fight with performance degradation caused by some non-perfect memory access patterns, cache misses, etc. There are at least two concepts: 1. A heterogenous container/collection grouping objects of the same type in the same place in memory (homogeneous containers) This is AFAIU something you're planning to implement. Take a look at the poly_collection by Joaquín M López Muñoz http://bannalia.blogspot.com/2014/05/fast-polymorphic-collections.html 2. A semantically homogeneous container breaking large objects into smaller ones in order to separate the hot and cold data to improve caching I think tuple_vector would be a good name for such container. Some time ago I implemented a dirty version full of hacks only to do some benchmarking: https://github.com/awulkiew/test-tuple_vector The goal was to implement a collection semantically the same as vector of tuples but internally breaking the tuple into elements and storing those elements in separate containers. The tricky part is that the reference type of such collection is not a true reference but a tuple of references, or a similar type which should semantically work like a reference, with collapsing, etc. AFAIK such container could also be used if some use case required to store specific chunks of data in buffers, e.g. in GPU processing where the GPU must be fed with one type of data. Regards, Adam

2015-05-05 23:17 GMT+08:00 Boris Rasin <boris@pointx.org>:
The point is bulk operation. The internal could use some interval/span based structure for the indices , e.g. (A, 3), (B, 3) for [AAABBB]. Traversing the container in order then collapses to tight loops for consecutive data of the same type, that is, one `switch` for multiple elements, so the overhead could be amortized, while for vector<variant>, there's always one switch for one element.

TONGARI J <tongari95 <at> gmail.com> writes:
I wrote something on this some time ago, maybe worth having a look at: http://bannalia.blogspot.com/2014/05/fast-polymorphic-collections.html http://bannalia.blogspot.com/2014/05/fast-polymorphic-collections-with.html Joaquín M López Muñoz Telefónica

On 5/5/2015 11:59 PM, Joaquín M Lopez Munoz wrote:
Yes, I know. I was just pointing out the fact that in your articles you explore performance differences between a container with polymorphic storage and container with storage by value, while we here were discussing possible performance differences between two containers storing data by value.

Boris Rasin <boris <at> pointx.org> writes:
Ok, understood. I've written a small performance test of vector<variant<Ts...>> and sorted vector<variant<Ts...>> vs. a collection class het_collection<Ts...> storing values of the same type contiguously and providing a specialized for_each memfun: https://www.dropbox.com/s/qy9qbf0eyf1itsy/het_collection.cpp?dl=0 Timing for for_each on 1k-10M elements (MSVC 12.0): het_collection;vector_variant;sorted_vector_variant; for_each: 1000;0.0225453;0.0822232;0.0840048; 10000;0.0229827;0.0828214;0.0821378; 100000;0.0229448;0.081315;0.0839802; 10000000;0.0250193;0.0906496;0.0879296; So, unsurprisingly, het_collection does much better as for_each on a vector<variant> needs to check type on each iteration. Sorting the vector so that values of the same type lie together does not have any impact on performance. Joaquín M López Muñoz Telefónica

2015-05-06 20:58 GMT+08:00 Joaquin M Lopez Munoz <joaquin@tid.es>:
In case that you misunderstood my previous post, the assumption that when values of the same type lie together could result in better performance was for ordered tuple_vector-based sequence, not for vector_variant. Your benchmark result is indeed what we expected, what I'd like to see is the comparison between *ordered* het_collection and vector_variant.

TONGARI J <tongari95 <at> gmail.com> writes:
Yes, definitely I'm not fully getting your point, my apologies. What do you mean by ordered het_collection? As it is designed het_collection clusters values of the same type together.. Do you mean a version where the order of insertion is arbitrary as in a vector? If so, any sensible design of such a het_collection_v2 would boil down to esentially the same as vector<variant>, I'd say. Joaquín M López Muñoz Telefónica

2015-05-06 21:55 GMT+08:00 Joaquin M Lopez Munoz <joaquin@tid.es>:
What I have in mind is a het_collection with interval/span based structure for ordering. You can think it as a het_collection that also retains the insertion order. And I agree that in general case, vector<variant> is the choice, only when you know the values of the same type tend to lie together beforehand could a ordered het_collection give better performance.

On 5/6/2015 3:58 PM, Joaquin M Lopez Munoz wrote:
So traversing such "unordered" het_collection is indeed faster. Thanks for the data. The question is how useful such container would be in real life if it only supported "unordered" traversal, i.e. did not satisfy requirements of "sequence container"? As I suggested previously, one could add additional internal index container to keep track of linear element arrangement, allowing both linear traversal and fast "unordered" traversal when desired. Does this make any sense?

Vicente J. Botet Escriba <vicente.botet <at> wanadoo.fr> writes:
Boost.MultiIndex is not powerful enough to support this: multi_index_containers are node-based by design, so there's no way we can have values stored contiguosly (unless via an indirection, which ruins caching). Joaquín M López Muñoz Telefónica

Le 06/05/15 17:36, Joaquin M Lopez Munoz a écrit :
Hi Joaquin, sorry I believed that a direct access index was stored contiguously even if it uses a node base design. After a second thought I see that it is not possible to implement it, at least efficiently. Vicente

My 2cents as a data point related to this discussion: I use polymorphic containers in my projects but I had to implement them myself as the poly_collection was not available before and it don't always match my needs. I realized through usage that I had very different needs for polymorphic containers so maybe it would be interesting to some people here to have some info about use case. I have interest in seeing polymorphic containers in boost too. Here is what I use currently: - polymorphic collection of vectors: This is more or less the poly collection, with no guarantee of order but which does guarantee contiguity of values for one type, so that it is possible to efficiently go through all the values of one type. I tend to use this for several pool-like systems which provide services for sets of data which should be processed by the same "thread". - a container storing only one value per type: I use this for some kind of type-identified-messages board system where I just store the last message value of each type of message in the container. Of course for one type there can be no value too. It's more or less a map<typeid, anyvalue> but the interface is just simpler. - a set, giving guarantee of unique values, but that does take any kind of value type. I have an old review about this, so the code is similar to this: https://gist.github.com/Klaim/10599137 (I guess it could be optimized but I'm not an expert)

2015-05-06 22:41 GMT+08:00 Boris Rasin <boris@pointx.org>:
FWIW, I made a simple proof-of-concept, it's basically a het_collection with ordering. https://gist.github.com/jamboree/5a5797e8869168cc64d9 Maybe someone would give it a performance test.

TONGARI J <tongari95 <at> gmail.com> writes:
Can't you maybe adapt the testing framework at https://www.dropbox.com/s/qy9qbf0eyf1itsy/het_collection.cpp?dl=0 and run that locally? Joaquín M López Muñoz Telefónica

2015-05-07 18:28 GMT+08:00 Joaquin M Lopez Munoz <joaquin@tid.es>:
https://gist.github.com/jamboree/7fca5045a1bcd880ba4b [MSVC14 RC] variant_sequence(no-order);variant_sequence;variant_sequence(sorted);vector_variant; for_each: 1000;0.00741892;0.033256;0.0234027;0.045456; 10000;0.0073138;0.0325596;0.0230426;0.0449622; 100000;0.00748698;0.0391825;0.0230043;0.045153; 10000000;0.0128108;0.0405801;0.0248128;0.0497122; [g++ 4.9.2/MinGW64] variant_sequence(no-order);variant_sequence;variant_sequence(sorted);vector_variant; for_each: 1000;0.0034023;0.0368542;0.00955831;0.0217586; 10000;0.003645;0.0363854;0.0095947;0.0210316; 100000;0.00367347;0.0425784;0.00957825;0.0213344; 10000000;0.00921831;0.0457604;0.0129568;0.0289718; As expected, variant_sequence performs better than vector_variant when the data is grouped, while it performs worse than vector_variant when the data is dispersed.
participants (9)
-
Adam Wulkiewicz
-
Boris Rasin
-
James Armstrong
-
Joaquin M Lopez Munoz
-
Joaquín M Lopez Munoz
-
Klaim - Joël Lamotte
-
Thijs (M.A.) van den Berg
-
TONGARI J
-
Vicente J. Botet Escriba