Re: [boost] Interest in a container which can hold multiple data types?
Sorry about the README file, I pretty much just uploaded what I had to github so people can grab the source. Admittedly, things there are lacking. I'll work on filling out the readme with some instructions hopefully today. The main difference between this and a container of boost:any is that the container of boost::any still is a container of one data type, just being that one data type is a very flexible one. This container natively supports insertion of different data types. So, you could do, int my_int = 5; double my_double0 = 9342.132; double my_double1 = 987.654; std::string my_string("This is a string."); std::vector<double> my_vec; my_vector.push_back(123.456); omni my_container; my_container.push_back(my_int); my_container.push_back(my_double); my_container.push_back(my_string); my_container.push_back(my_vector); and the container happily stores the data. You can then access the data through iterators //access all doubles for (auto itr = my_container.begin<double>(); itr != my_containert.end<double>(); ++itr) { std::cout << *itr << std::endl; } or through specifying the data type and index //get the 0th element string std::cout << my_container.at<string>(0) << std::endl; -- James Armstrong
On 5/4/2015 6:33 PM, James Armstrong wrote:
The main difference between this and a container of boost:any is that the container of boost::any still is a container of one data type, just being that one data type is a very flexible one. This container natively supports insertion of different data types. So, you could do,
int my_int = 5; double my_double0 = 9342.132; double my_double1 = 987.654; std::string my_string("This is a string."); std::vector<double> my_vec; my_vector.push_back(123.456);
omni my_container;
my_container.push_back(my_int); my_container.push_back(my_double); my_container.push_back(my_string); my_container.push_back(my_vector);
and the container happily stores the data.
This also works with std::vector<boost::any>.
You can then access the data through iterators
//access all doubles for (auto itr = my_container.begin<double>(); itr != my_containert.end<double>(); ++itr) { std::cout << *itr << std::endl; }
or through specifying the data type and index
//get the 0th element string std::cout << my_container.at<string>(0) << std::endl;
This could be useful, but wouldn't it be better to write this as generic algorithms capable of working with any container of boost::any, not just vector? Better yet, any container of type erased objects (boost::any, boost::variant, etc.) using something like boost::get<T>()?
On 04 May 2015, at 18:51, Boris Rasin <boris@pointx.org> wrote:
On 5/4/2015 6:33 PM, James Armstrong wrote: The main difference between this and a container of boost:any is that the container of boost::any still is a container of one data type, just being that one data type is a very flexible one. This container natively supports insertion of different data types. So, you could do,
int my_int = 5; double my_double0 = 9342.132; double my_double1 = 987.654; std::string my_string("This is a string."); std::vector<double> my_vec; my_vector.push_back(123.456);
omni my_container;
my_container.push_back(my_int); my_container.push_back(my_double); my_container.push_back(my_string); my_container.push_back(my_vector);
and the container happily stores the data.
This also works with std::vector<boost::any>.
You can then access the data through iterators
//access all doubles for (auto itr = my_container.begin<double>(); itr != my_containert.end<double>(); ++itr) { std::cout << *itr << std::endl; }
or through specifying the data type and index
//get the 0th element string std::cout << my_container.at<string>(0) << std::endl;
This could be useful, but wouldn't it be better to write this as generic algorithms capable of working with any container of boost::any, not just vector? Better yet, any container of type erased objects (boost::any, boost::variant, etc.) using something like boost::get<T>()?
I would actually prefer not to use type erasure: make a strong-typed and efficient heterogenous container. Something like this... ? template <typename... Args> struct tuple_vector : std::tuple< std::vector<Args>... > { typename std::tuple<Args...> type; template <typename T> void push_back(const T& value) { std::get< std::vector<T> >(*this).push_back(value); } template <typename T> void push_back( T&& value) { std::get< std::vector<T> >(*this).push_back(value); } }; struct Point { double x; double y; }; struct Line { double x; double y; double a; double l; }; struct Rectangle { double x; double y; double a; double w; double h; }; struct Circle { double x; double y; double r; }; tuple_vector<Point, Line, Rectangle, Circle> shapes; shapes.push_back(Point{1.0, 1.0} ); shapes.push_back(Point{1.0, 1.0} ); shapes.push_back(Circle{1.0, 1.0, .3} ); shapes.push_back(Rectangle{0.0, 0.0, 0.0, 1.0, 1.0} );
On 5/4/2015 9:50 PM, Thijs (M.A.) van den Berg wrote:
I would actually prefer not to use type erasure: make a strong-typed and efficient heterogenous container. Something like this... ?
template <typename... Args> struct tuple_vector : std::tuple< std::vector<Args>... > { typename std::tuple<Args...> type;
template <typename T> void push_back(const T& value) { std::get< std::vector<T> >(*this).push_back(value); }
template <typename T> void push_back( T&& value) { std::get< std::vector<T> >(*this).push_back(value); } };
struct Point { double x; double y; }; struct Line { double x; double y; double a; double l; }; struct Rectangle { double x; double y; double a; double w; double h; }; struct Circle { double x; double y; double r; };
tuple_vector<Point, Line, Rectangle, Circle> shapes; shapes.push_back(Point{1.0, 1.0} ); shapes.push_back(Point{1.0, 1.0} ); shapes.push_back(Circle{1.0, 1.0, .3} ); shapes.push_back(Rectangle{0.0, 0.0, 0.0, 1.0, 1.0} );
This is neat, but it' not quite the same thing. Unlike std::vector<boost::any>, your tuple_vector does not organize objects into strictly linear arrangement. Using your example, there is no way to draw shapes in the order they were inserted into container.
tuple_vector<Point, Line, Rectangle, Circle> shapes; shapes.push_back(Point{1.0, 1.0} );
This is neat, but it' not quite the same thing. Unlike std::vector<boost::any>, your tuple_vector does not organize objects into strictly linear arrangement. Using your example, there is no way to draw shapes in the order they were inserted into container.
Good point! So it loses overall ordering, only preservers it at the type level. The reason is that objects gets stored in linear memory per type.
On 5/5/2015 9:31 AM, Thijs (M.A.) van den Berg wrote:
tuple_vector<Point, Line, Rectangle, Circle> shapes; shapes.push_back(Point{1.0, 1.0} ); This is neat, but it' not quite the same thing. Unlike std::vector<boost::any>, your tuple_vector does not organize objects into strictly linear arrangement. Using your example, there is no way to draw shapes in the order they were inserted into container.
Good point! So it loses overall ordering, only preservers it at the type level. The reason is that objects gets stored in linear memory per type.
One could conceivably add another internal container to maintain overall index. But than again, is such tuple_vector<Point, Line, Rectangle, Circle> really better than std::vector<boost::variant<Point, Line, Rectangle, Circle>>? It would use less memory per object when some stored types are smaller than others, and in some circumstances these savings would exceed overhead of multiple internal containers and an additional index container. Any other advantages?
On 05 May 2015, at 11:21, Boris Rasin <boris@pointx.org> wrote:
On 5/5/2015 9:31 AM, Thijs (M.A.) van den Berg wrote:
tuple_vector<Point, Line, Rectangle, Circle> shapes; shapes.push_back(Point{1.0, 1.0} ); This is neat, but it' not quite the same thing. Unlike std::vector<boost::any>, your tuple_vector does not organize objects into strictly linear arrangement. Using your example, there is no way to draw shapes in the order they were inserted into container. Good point! So it loses overall ordering, only preservers it at the type level. The reason is that objects gets stored in linear memory per type.
One could conceivably add another internal container to maintain overall index. But than again, is such tuple_vector<Point, Line, Rectangle, Circle> really better than std::vector<boost::variant<Point, Line, Rectangle, Circle>>? It would use less memory per object when some stored types are smaller than others, and in some circumstances these savings would exceed overhead of multiple internal containers and an additional index container. Any other advantages?
I think if you require preservation of ordering then you'll indeed need an extra index, but then you'll lose in speed and performance, and then I would just use a combination of std::vector and boost::variant / boost::any.
2015-05-05 17:21 GMT+08:00 Boris Rasin <boris@pointx.org>:
On 5/5/2015 9:31 AM, Thijs (M.A.) van den Berg wrote:
tuple_vector<Point, Line, Rectangle, Circle> shapes;
shapes.push_back(Point{1.0, 1.0} );
This is neat, but it' not quite the same thing. Unlike std::vector<boost::any>, your tuple_vector does not organize objects into strictly linear arrangement. Using your example, there is no way to draw shapes in the order they were inserted into container.
Good point! So it loses overall ordering, only preservers it at the type level. The reason is that objects gets stored in linear memory per type.
One could conceivably add another internal container to maintain overall index. But than again, is such tuple_vector<Point, Line, Rectangle, Circle> really better than std::vector<boost::variant<Point, Line, Rectangle, Circle>>? It would use less memory per object when some stored types are smaller than others, and in some circumstances these savings would exceed overhead of multiple internal containers and an additional index container. Any other advantages?
If the data sequence shows some affinity, I guess an ordered tuple_vector-based sequence can provide some performance benefit in traversal (with some crafted for_each method). That is, a sequence of [AAABBB] may be traversed faster than [ABABAB] for such a container, and I believe in the later case such that the types are uniformly distributed, vector<variant> will perform better.
On 5/5/2015 1:12 PM, TONGARI J wrote:
2015-05-05 17:21 GMT+08:00 Boris Rasin <boris@pointx.org>:
On 5/5/2015 9:31 AM, Thijs (M.A.) van den Berg wrote:
tuple_vector<Point, Line, Rectangle, Circle> shapes;
shapes.push_back(Point{1.0, 1.0} );
This is neat, but it' not quite the same thing. Unlike std::vector<boost::any>, your tuple_vector does not organize objects into strictly linear arrangement. Using your example, there is no way to draw shapes in the order they were inserted into container.
Good point! So it loses overall ordering, only preservers it at the type level. The reason is that objects gets stored in linear memory per type.
One could conceivably add another internal container to maintain overall index. But than again, is such tuple_vector<Point, Line, Rectangle, Circle> really better than std::vector<boost::variant<Point, Line, Rectangle, Circle>>? It would use less memory per object when some stored types are smaller than others, and in some circumstances these savings would exceed overhead of multiple internal containers and an additional index container. Any other advantages?
If the data sequence shows some affinity, I guess an ordered tuple_vector-based sequence can provide some performance benefit in traversal (with some crafted for_each method).
That is, a sequence of [AAABBB] may be traversed faster than [ABABAB] for such a container, and I believe in the later case such that the types are uniformly distributed, vector<variant> will perform better.
Why would a sequence of [AAABBB] be traversed faster for such a container compared to vector<variant>? Especially if it would need additional look up in internal index container on every iteration?
On 05 May 2015, at 17:17, Boris Rasin <boris@pointx.org> wrote:
On 5/5/2015 1:12 PM, TONGARI J wrote: 2015-05-05 17:21 GMT+08:00 Boris Rasin <boris@pointx.org>:
On 5/5/2015 9:31 AM, Thijs (M.A.) van den Berg wrote:
tuple_vector<Point, Line, Rectangle, Circle> shapes;
shapes.push_back(Point{1.0, 1.0} ); This is neat, but it' not quite the same thing. Unlike std::vector<boost::any>, your tuple_vector does not organize objects into strictly linear arrangement. Using your example, there is no way to draw shapes in the order they were inserted into container.
Good point! So it loses overall ordering, only preservers it at the type level. The reason is that objects gets stored in linear memory per type. One could conceivably add another internal container to maintain overall index. But than again, is such tuple_vector<Point, Line, Rectangle, Circle> really better than std::vector<boost::variant<Point, Line, Rectangle, Circle>>? It would use less memory per object when some stored types are smaller than others, and in some circumstances these savings would exceed overhead of multiple internal containers and an additional index container. Any other advantages?
If the data sequence shows some affinity, I guess an ordered tuple_vector-based sequence can provide some performance benefit in traversal (with some crafted for_each method).
That is, a sequence of [AAABBB] may be traversed faster than [ABABAB] for such a container, and I believe in the later case such that the types are uniformly distributed, vector<variant> will perform better.
Why would a sequence of [AAABBB] be traversed faster for such a container compared to vector<variant>? Especially if it would need additional look up in internal index container on every iteration?
I don't think it makes sense to add an index to an tuple_vector, it's more like an unordered heterogenous container, it'll be good at iterating all "B"s though. Every variant will have pros and cons when implemented correctly. In general, the challenge imo is iterating heterogenous containers, a specialized for_each will probably be very handy.
M.a. wrote:
On 05 May 2015, at 17:17, Boris Rasin <boris@pointx.org> wrote:
On 5/5/2015 1:12 PM, TONGARI J wrote: 2015-05-05 17:21 GMT+08:00 Boris Rasin <boris@pointx.org>:
On 5/5/2015 9:31 AM, Thijs (M.A.) van den Berg wrote:
> shapes.push_back(Point{1.0, 1.0} ); This is neat, but it' not quite the same thing. Unlike std::vector<boost::any>, your tuple_vector does not organize objects into strictly linear arrangement. Using your example, there is no way to draw shapes in the order they were inserted into container.
Good point! So it loses overall ordering, only preservers it at the type level. The reason is that objects gets stored in linear memory per type. One could conceivably add another internal container to maintain overall index. But than again, is such tuple_vector<Point, Line, Rectangle, Circle> really better than std::vector<boost::variant<Point, Line, Rectangle, Circle>>? It would use less memory per object when some stored types are smaller than others, and in some circumstances these savings would exceed overhead of multiple internal containers and an additional index container. Any other advantages? If the data sequence shows some affinity, I guess an ordered tuple_vector-based sequence can provide some performance benefit in
tuple_vector<Point, Line, Rectangle, Circle> shapes; traversal (with some crafted for_each method).
That is, a sequence of [AAABBB] may be traversed faster than [ABABAB] for such a container, and I believe in the later case such that the types are uniformly distributed, vector<variant> will perform better. Why would a sequence of [AAABBB] be traversed faster for such a container compared to vector<variant>? Especially if it would need additional look up in internal index container on every iteration?
I don't think it makes sense to add an index to an tuple_vector, it's more like an unordered heterogenous container, it'll be good at iterating all "B"s though. Every variant will have pros and cons when implemented correctly.
In general, the challenge imo is iterating heterogenous containers, a specialized for_each will probably be very handy.
Such container would be the most useful for people wanting to fight with performance degradation caused by some non-perfect memory access patterns, cache misses, etc. There are at least two concepts: 1. A heterogenous container/collection grouping objects of the same type in the same place in memory (homogeneous containers) This is AFAIU something you're planning to implement. Take a look at the poly_collection by Joaquín M López Muñoz http://bannalia.blogspot.com/2014/05/fast-polymorphic-collections.html 2. A semantically homogeneous container breaking large objects into smaller ones in order to separate the hot and cold data to improve caching I think tuple_vector would be a good name for such container. Some time ago I implemented a dirty version full of hacks only to do some benchmarking: https://github.com/awulkiew/test-tuple_vector The goal was to implement a collection semantically the same as vector of tuples but internally breaking the tuple into elements and storing those elements in separate containers. The tricky part is that the reference type of such collection is not a true reference but a tuple of references, or a similar type which should semantically work like a reference, with collapsing, etc. AFAIK such container could also be used if some use case required to store specific chunks of data in buffers, e.g. in GPU processing where the GPU must be fed with one type of data. Regards, Adam
2015-05-05 23:17 GMT+08:00 Boris Rasin <boris@pointx.org>:
On 5/5/2015 1:12 PM, TONGARI J wrote:
2015-05-05 17:21 GMT+08:00 Boris Rasin <boris@pointx.org>:
On 5/5/2015 9:31 AM, Thijs (M.A.) van den Berg wrote:
tuple_vector<Point, Line, Rectangle, Circle> shapes;
shapes.push_back(Point{1.0, 1.0} );
This is neat, but it' not quite the same thing. Unlike
std::vector<boost::any>, your tuple_vector does not organize objects into strictly linear arrangement. Using your example, there is no way to draw shapes in the order they were inserted into container.
Good point! So it loses overall ordering, only preservers it at the
type level. The reason is that objects gets stored in linear memory per type.
One could conceivably add another internal container to maintain
overall index. But than again, is such tuple_vector<Point, Line, Rectangle, Circle> really better than std::vector<boost::variant<Point, Line, Rectangle, Circle>>? It would use less memory per object when some stored types are smaller than others, and in some circumstances these savings would exceed overhead of multiple internal containers and an additional index container. Any other advantages?
If the data sequence shows some affinity, I guess an ordered tuple_vector-based sequence can provide some performance benefit in traversal (with some crafted for_each method).
That is, a sequence of [AAABBB] may be traversed faster than [ABABAB] for such a container, and I believe in the later case such that the types are uniformly distributed, vector<variant> will perform better.
Why would a sequence of [AAABBB] be traversed faster for such a container compared to vector<variant>? Especially if it would need additional look up in internal index container on every iteration?
The point is bulk operation. The internal could use some interval/span based structure for the indices , e.g. (A, 3), (B, 3) for [AAABBB]. Traversing the container in order then collapses to tight loops for consecutive data of the same type, that is, one `switch` for multiple elements, so the overhead could be amortized, while for vector<variant>, there's always one switch for one element.
TONGARI J <tongari95 <at> gmail.com> writes:
If the data sequence shows some affinity, I guess an ordered tuple_vector-based sequence can provide some performance benefit in traversal (with some crafted for_each method).
That is, a sequence of [AAABBB] may be traversed faster than [ABABAB] for such a container, and I believe in the later case such that the types are uniformly distributed, vector<variant> will perform better.
I wrote something on this some time ago, maybe worth having a look at: http://bannalia.blogspot.com/2014/05/fast-polymorphic-collections.html http://bannalia.blogspot.com/2014/05/fast-polymorphic-collections-with.html Joaquín M López Muñoz Telefónica
On 5/5/2015 7:49 PM, Joaquin M Lopez Munoz wrote:
TONGARI J <tongari95 <at> gmail.com> writes:
If the data sequence shows some affinity, I guess an ordered tuple_vector-based sequence can provide some performance benefit in traversal (with some crafted for_each method).
That is, a sequence of [AAABBB] may be traversed faster than [ABABAB] for such a container, and I believe in the later case such that the types are uniformly distributed, vector<variant> will perform better. I wrote something on this some time ago, maybe worth having a look at:
http://bannalia.blogspot.com/2014/05/fast-polymorphic-collections.html http://bannalia.blogspot.com/2014/05/fast-polymorphic-collections-with.html
Interesting read, thanks for the links. Although in this case we are talking about vector<variant> which stores data by value.
Boris Rasin <boris <at> pointx.org> writes:
On 5/5/2015 7:49 PM, Joaquin M Lopez Munoz wrote:
I wrote something on this some time ago, maybe worth having a look at:
http://bannalia.blogspot.com/2014/05/fast-polymorphic-collections.html http://bannalia.blogspot.com/2014/05/fast-polymorphic-collections-with.html
Interesting read, thanks for the links. Although in this case we are talking about vector<variant> which stores data by value.
poly_collection does store data by value. Also, it is more space-efficient than vector<variant>. Joaquín M López Muñoz Telefónica
On 5/5/2015 11:59 PM, Joaquín M Lopez Munoz wrote:
Boris Rasin <boris <at> pointx.org> writes:
On 5/5/2015 7:49 PM, Joaquin M Lopez Munoz wrote:
I wrote something on this some time ago, maybe worth having a look at:
http://bannalia.blogspot.com/2014/05/fast-polymorphic-collections.html http://bannalia.blogspot.com/2014/05/fast-polymorphic-collections-with.html
Interesting read, thanks for the links. Although in this case we are talking about vector<variant> which stores data by value. poly_collection does store data by value. Also, it is more space-efficient than vector<variant>.
Yes, I know. I was just pointing out the fact that in your articles you explore performance differences between a container with polymorphic storage and container with storage by value, while we here were discussing possible performance differences between two containers storing data by value.
Boris Rasin <boris <at> pointx.org> writes:
On 5/5/2015 7:49 PM, Joaquin M Lopez Munoz wrote:
TONGARI J <tongari95 <at> gmail.com> writes:
If the data sequence shows some affinity, I guess an ordered tuple_vector-based sequence can provide some performance benefit in traversal (with some crafted for_each method).
That is, a sequence of [AAABBB] may be traversed faster than [ABABAB] for such a container, and I believe in the later case such that the types are uniformly distributed, vector<variant> will perform better. I wrote something on this some time ago, maybe worth having a look at:
http://bannalia.blogspot.com/2014/05/fast-polymorphic-collections.html http://bannalia.blogspot.com/2014/05/fast-polymorphic-collections-with.html
Interesting read, thanks for the links. Although in this case we are talking about vector<variant> which stores data by value.
Ok, understood. I've written a small performance test of vector<variant<Ts...>> and sorted vector<variant<Ts...>> vs. a collection class het_collection<Ts...> storing values of the same type contiguously and providing a specialized for_each memfun: https://www.dropbox.com/s/qy9qbf0eyf1itsy/het_collection.cpp?dl=0 Timing for for_each on 1k-10M elements (MSVC 12.0): het_collection;vector_variant;sorted_vector_variant; for_each: 1000;0.0225453;0.0822232;0.0840048; 10000;0.0229827;0.0828214;0.0821378; 100000;0.0229448;0.081315;0.0839802; 10000000;0.0250193;0.0906496;0.0879296; So, unsurprisingly, het_collection does much better as for_each on a vector<variant> needs to check type on each iteration. Sorting the vector so that values of the same type lie together does not have any impact on performance. Joaquín M López Muñoz Telefónica
2015-05-06 20:58 GMT+08:00 Joaquin M Lopez Munoz <joaquin@tid.es>:
Boris Rasin <boris <at> pointx.org> writes:
On 5/5/2015 7:49 PM, Joaquin M Lopez Munoz wrote:
TONGARI J <tongari95 <at> gmail.com> writes:
If the data sequence shows some affinity, I guess an ordered tuple_vector-based sequence can provide some performance benefit in traversal (with some crafted for_each method).
That is, a sequence of [AAABBB] may be traversed faster than [ABABAB]
for
such a container, and I believe in the later case such that the types are uniformly distributed, vector<variant> will perform better. I wrote something on this some time ago, maybe worth having a look at:
http://bannalia.blogspot.com/2014/05/fast-polymorphic-collections.html
http://bannalia.blogspot.com/2014/05/fast-polymorphic-collections-with.html
Interesting read, thanks for the links. Although in this case we are talking about vector<variant> which stores data by value.
Ok, understood. I've written a small performance test of vector<variant<Ts...>> and sorted vector<variant<Ts...>> vs. a collection class het_collection<Ts...> storing values of the same type contiguously and providing a specialized for_each memfun:
https://www.dropbox.com/s/qy9qbf0eyf1itsy/het_collection.cpp?dl=0
Timing for for_each on 1k-10M elements (MSVC 12.0):
het_collection;vector_variant;sorted_vector_variant; for_each: 1000;0.0225453;0.0822232;0.0840048; 10000;0.0229827;0.0828214;0.0821378; 100000;0.0229448;0.081315;0.0839802; 10000000;0.0250193;0.0906496;0.0879296;
So, unsurprisingly, het_collection does much better as for_each on a vector<variant> needs to check type on each iteration. Sorting the vector so that values of the same type lie together does not have any impact on performance.
In case that you misunderstood my previous post, the assumption that when values of the same type lie together could result in better performance was for ordered tuple_vector-based sequence, not for vector_variant. Your benchmark result is indeed what we expected, what I'd like to see is the comparison between *ordered* het_collection and vector_variant.
TONGARI J <tongari95 <at> gmail.com> writes:
2015-05-06 20:58 GMT+08:00 Joaquin M Lopez Munoz <joaquin <at> tid.es>:
Ok, understood. I've written a small performance test of vector<variant<Ts...>> and sorted vector<variant<Ts...>> vs. a collection class het_collection<Ts...> storing values of the same type contiguously and providing a specialized for_each memfun:
[...]
So, unsurprisingly, het_collection does much better as for_each on a vector<variant> needs to check type on each iteration. Sorting the vector so that values of the same type lie together does not have any impact on performance.
In case that you misunderstood my previous post, the assumption that when values of the same type lie together could result in better performance was for ordered tuple_vector-based sequence, not for vector_variant.
Your benchmark result is indeed what we expected, what I'd like to see is the comparison between *ordered* het_collection and vector_variant.
Yes, definitely I'm not fully getting your point, my apologies. What do you mean by ordered het_collection? As it is designed het_collection clusters values of the same type together.. Do you mean a version where the order of insertion is arbitrary as in a vector? If so, any sensible design of such a het_collection_v2 would boil down to esentially the same as vector<variant>, I'd say. Joaquín M López Muñoz Telefónica
2015-05-06 21:55 GMT+08:00 Joaquin M Lopez Munoz <joaquin@tid.es>:
TONGARI J <tongari95 <at> gmail.com> writes:
2015-05-06 20:58 GMT+08:00 Joaquin M Lopez Munoz <joaquin <at> tid.es>:
Ok, understood. I've written a small performance test of vector<variant<Ts...>> and sorted vector<variant<Ts...>> vs. a collection class het_collection<Ts...> storing values of the same type contiguously and providing a specialized for_each memfun:
[...]
So, unsurprisingly, het_collection does much better as for_each on a vector<variant> needs to check type on each iteration. Sorting the vector so that values of the same type lie together does not have any impact on performance.
In case that you misunderstood my previous post, the assumption that when values of the same type lie together could result in better performance was for ordered tuple_vector-based sequence, not for vector_variant.
Your benchmark result is indeed what we expected, what I'd like to see is the comparison between *ordered* het_collection and vector_variant.
Yes, definitely I'm not fully getting your point, my apologies. What do you mean by ordered het_collection? As it is designed het_collection clusters values of the same type together.. Do you mean a version where the order of insertion is arbitrary as in a vector? If so, any sensible design of such a het_collection_v2 would boil down to esentially the same as vector<variant>, I'd say.
What I have in mind is a het_collection with interval/span based structure for ordering. You can think it as a het_collection that also retains the insertion order. And I agree that in general case, vector<variant> is the choice, only when you know the values of the same type tend to lie together beforehand could a ordered het_collection give better performance.
On 5/6/2015 3:58 PM, Joaquin M Lopez Munoz wrote:
Boris Rasin <boris <at> pointx.org> writes:
Interesting read, thanks for the links. Although in this case we are talking about vector<variant> which stores data by value. Ok, understood. I've written a small performance test of vector<variant<Ts...>> and sorted vector<variant<Ts...>> vs. a
collection class het_collection<Ts...> storing values of the same type contiguously and providing a specialized for_each memfun:
https://www.dropbox.com/s/qy9qbf0eyf1itsy/het_collection.cpp?dl=0
Timing for for_each on 1k-10M elements (MSVC 12.0):
het_collection;vector_variant;sorted_vector_variant; for_each: 1000;0.0225453;0.0822232;0.0840048; 10000;0.0229827;0.0828214;0.0821378; 100000;0.0229448;0.081315;0.0839802; 10000000;0.0250193;0.0906496;0.0879296;
So, unsurprisingly, het_collection does much better as for_each on a vector<variant> needs to check type on each iteration. Sorting the vector so that values of the same type lie together does not have any impact on performance.
So traversing such "unordered" het_collection is indeed faster. Thanks for the data. The question is how useful such container would be in real life if it only supported "unordered" traversal, i.e. did not satisfy requirements of "sequence container"? As I suggested previously, one could add additional internal index container to keep track of linear element arrangement, allowing both linear traversal and fast "unordered" traversal when desired. Does this make any sense?
Le 06/05/15 16:41, Boris Rasin a écrit :
So traversing such "unordered" het_collection is indeed faster. Thanks for the data. The question is how useful such container would be in real life if it only supported "unordered" traversal, i.e. did not satisfy requirements of "sequence container"? As I suggested previously, one could add additional internal index container to keep track of linear element arrangement, allowing both linear traversal and fast "unordered" traversal when desired. Does this make any sense?
I suspect this is the role of Boost.MultiIndex :) Vicente
Vicente J. Botet Escriba <vicente.botet <at> wanadoo.fr> writes:
Le 06/05/15 16:41, Boris Rasin a écrit :
So traversing such "unordered" het_collection is indeed faster. Thanks for the data. The question is how useful such container would be in real life if it only supported "unordered" traversal, i.e. did not satisfy requirements of "sequence container"? As I suggested previously, one could add additional internal index container to keep track of linear element arrangement, allowing both linear traversal and fast "unordered" traversal when desired. Does this make any sense?
I suspect this is the role of Boost.MultiIndex :)
Boost.MultiIndex is not powerful enough to support this: multi_index_containers are node-based by design, so there's no way we can have values stored contiguosly (unless via an indirection, which ruins caching). Joaquín M López Muñoz Telefónica
Le 06/05/15 17:36, Joaquin M Lopez Munoz a écrit :
Vicente J. Botet Escriba <vicente.botet <at> wanadoo.fr> writes:
Le 06/05/15 16:41, Boris Rasin a écrit :
So traversing such "unordered" het_collection is indeed faster. Thanks for the data. The question is how useful such container would be in real life if it only supported "unordered" traversal, i.e. did not satisfy requirements of "sequence container"? As I suggested previously, one could add additional internal index container to keep track of linear element arrangement, allowing both linear traversal and fast "unordered" traversal when desired. Does this make any sense?
I suspect this is the role of Boost.MultiIndex :)
Boost.MultiIndex is not powerful enough to support this: multi_index_containers are node-based by design, so there's no way we can have values stored contiguosly (unless via an indirection, which ruins caching).
Hi Joaquin, sorry I believed that a direct access index was stored contiguously even if it uses a node base design. After a second thought I see that it is not possible to implement it, at least efficiently. Vicente
My 2cents as a data point related to this discussion: I use polymorphic containers in my projects but I had to implement them myself as the poly_collection was not available before and it don't always match my needs. I realized through usage that I had very different needs for polymorphic containers so maybe it would be interesting to some people here to have some info about use case. I have interest in seeing polymorphic containers in boost too. Here is what I use currently: - polymorphic collection of vectors: This is more or less the poly collection, with no guarantee of order but which does guarantee contiguity of values for one type, so that it is possible to efficiently go through all the values of one type. I tend to use this for several pool-like systems which provide services for sets of data which should be processed by the same "thread". - a container storing only one value per type: I use this for some kind of type-identified-messages board system where I just store the last message value of each type of message in the container. Of course for one type there can be no value too. It's more or less a map<typeid, anyvalue> but the interface is just simpler. - a set, giving guarantee of unique values, but that does take any kind of value type. I have an old review about this, so the code is similar to this: https://gist.github.com/Klaim/10599137 (I guess it could be optimized but I'm not an expert)
2015-05-06 22:41 GMT+08:00 Boris Rasin <boris@pointx.org>:
On 5/6/2015 3:58 PM, Joaquin M Lopez Munoz wrote:
Boris Rasin <boris <at> pointx.org> writes:
Interesting read, thanks for the links. Although in this case we are
talking about vector<variant> which stores data by value.
Ok, understood. I've written a small performance test of vector<variant<Ts...>> and sorted vector<variant<Ts...>> vs. a collection class het_collection<Ts...> storing values of the same type contiguously and providing a specialized for_each memfun:
https://www.dropbox.com/s/qy9qbf0eyf1itsy/het_collection.cpp?dl=0
Timing for for_each on 1k-10M elements (MSVC 12.0):
het_collection;vector_variant;sorted_vector_variant; for_each: 1000;0.0225453;0.0822232;0.0840048; 10000;0.0229827;0.0828214;0.0821378; 100000;0.0229448;0.081315;0.0839802; 10000000;0.0250193;0.0906496;0.0879296;
So, unsurprisingly, het_collection does much better as for_each on a vector<variant> needs to check type on each iteration. Sorting the vector so that values of the same type lie together does not have any impact on performance.
So traversing such "unordered" het_collection is indeed faster. Thanks for the data. The question is how useful such container would be in real life if it only supported "unordered" traversal, i.e. did not satisfy requirements of "sequence container"? As I suggested previously, one could add additional internal index container to keep track of linear element arrangement, allowing both linear traversal and fast "unordered" traversal when desired. Does this make any sense?
FWIW, I made a simple proof-of-concept, it's basically a het_collection with ordering. https://gist.github.com/jamboree/5a5797e8869168cc64d9 Maybe someone would give it a performance test.
TONGARI J <tongari95 <at> gmail.com> writes:
FWIW, I made a simple proof-of-concept, it's basically a het_collection with ordering.
https://gist.github.com/jamboree/5a5797e8869168cc64d9
Maybe someone would give it a performance test.
Can't you maybe adapt the testing framework at https://www.dropbox.com/s/qy9qbf0eyf1itsy/het_collection.cpp?dl=0 and run that locally? Joaquín M López Muñoz Telefónica
2015-05-07 18:28 GMT+08:00 Joaquin M Lopez Munoz <joaquin@tid.es>:
TONGARI J <tongari95 <at> gmail.com> writes:
FWIW, I made a simple proof-of-concept, it's basically a het_collection with ordering.
https://gist.github.com/jamboree/5a5797e8869168cc64d9
Maybe someone would give it a performance test.
Can't you maybe adapt the testing framework at
https://www.dropbox.com/s/qy9qbf0eyf1itsy/het_collection.cpp?dl=0
and run that locally?
https://gist.github.com/jamboree/7fca5045a1bcd880ba4b [MSVC14 RC] variant_sequence(no-order);variant_sequence;variant_sequence(sorted);vector_variant; for_each: 1000;0.00741892;0.033256;0.0234027;0.045456; 10000;0.0073138;0.0325596;0.0230426;0.0449622; 100000;0.00748698;0.0391825;0.0230043;0.045153; 10000000;0.0128108;0.0405801;0.0248128;0.0497122; [g++ 4.9.2/MinGW64] variant_sequence(no-order);variant_sequence;variant_sequence(sorted);vector_variant; for_each: 1000;0.0034023;0.0368542;0.00955831;0.0217586; 10000;0.003645;0.0363854;0.0095947;0.0210316; 100000;0.00367347;0.0425784;0.00957825;0.0213344; 10000000;0.00921831;0.0457604;0.0129568;0.0289718; As expected, variant_sequence performs better than vector_variant when the data is grouped, while it performs worse than vector_variant when the data is dispersed.
participants (9)
-
Adam Wulkiewicz
-
Boris Rasin
-
James Armstrong
-
Joaquin M Lopez Munoz
-
Joaquín M Lopez Munoz
-
Klaim - Joël Lamotte
-
Thijs (M.A.) van den Berg
-
TONGARI J
-
Vicente J. Botet Escriba