[poly_collection] User-defined polymorphism model

Hi Joaquín and all reviewers, Before we reach the end of the review period, I wanted to comment an (IMHO) interesting feature used in the internal design of PolyCollection. The library offers three of polymorphism models: base-derived, boost::any and signature-sharing callable types. Under the hood, the library abstracts those polymorphic models into a generic class in the detail namespace: template<typename Model,typename Allocator> class poly_collection; and each concrete container is just implemented in terms of poly_collection: template<typename Base,typename Allocator> class base_collection : public detail::poly_collection<detail::base_model<Base>,Allocator> The first template parameter (Model) is the abstraction ("concept") that implements the non-generic part of each polymorphism model. A model implements the following types and operations (comments on each attribute are my interpretation, they can be completely wrong): template<class ModelArg> class Model { // //Value-related attributes // //The type we want to expose as value_type of the container using value_type= ...; //Is T a subtype of the general type (acceptable as value)? template<class T> using is_subtype= ... //Is T the "most refined type"? template<class T> using is_terminal=... //Id of a value template<class T> static const std::type_info& subtypeid(const T& x); //Obtain a type-erased pointer to the value template<class T> static void* subaddress(T& x); template<typename T> static const void* subaddress(const T& x); // //Segment-related // //Segment iterators that point to type_erased value_types using base_iterator=... using const_base_iterator=... //const<->non-const conversions static base_iterator nonconst_iterator(const_base_iterator it); template<typename T> static iterator<T> nonconst_iterator(const_iterator<T> it); //Concrete iterators for Derived template<typename Derived> using iterator=...; template<typename Derived> using const_iterator=... //Generic type, comparable with base_iterator. //base_iterator == the last returned base_sentinel //means that base_iterator points one-past the last //element of a segment. using base_sentinel=...; using const_base_sentinel=... //Construct a segment for type T using the given allocator and //return it type-erased. template<typename T,typename Allocator> static detail::segment_backend<Model>* make(const Allocator& al); }; I have several questions on this polymorphism model. 1) Wouldn't be a good idea to make this model-based polymorphism (or a refined one) public? Maybe the current model needs more work to support additional polymorphism models. Of course requirements for each operation in a model should be detailed. A correct abstraction to define user-provided polymorphism models is hard but this seems really a great feature that makes this library more valuable for users. 2) If understand this model abstraction correctly, each Model chooses the segment type to store values (it could be any sequence type, I guess). The segment_backend abstraction does not only abstract ConcreteTypes from value_types (using void pointers that will be casted to real types in the implementation of the virtual function), but also the segment type (the sequence holding the concrete values). Current implementation uses two segment types: split_segment (stores Concrete types and generic value_type's in two separate vectors) and packed_segment (stores ConcreteTypes which already contain value_type's as a subobject). function_collection and any_collection use the first segment, base_collection uses the second. Why are different segment types needed? Can't we store value_types and ConcreteTypes in a struct and use only one type of segment? If this is possible, I guess iteration performance would be improved for function_collection and any_collection. Model definition could be simplified. On the other hand, if the sequence type can be left out of the model definition, the underlying sequence type (change vector with deque, small vector or any other) could be made configurable. Would this be useful for users? 3) I understand that caching a base_sentinel in each segment is an optimization to avoid a virtual function call when iterating with base_iterators. Is this assumption correct? Why is a different base_sentinel type needed to mark the end of each segment? Wouldn't a "base_iterator" pointing to that on-past-the end position suffice? Is an optimization (maybe base_sentinel is cheaper) or a must have? Thanks, Ion PD: I have more design questions related to other topics but I will post a different thread for each one.

Em 14 may 2017, às 1:32, Ion Gaztañaga <igaztanaga@gmail.com> escreveu:
Hi Joaquín and all reviewers,
Before we reach the end of the review period, I wanted to comment an (IMHO) interesting feature used in the internal design of PolyCollection. The library offers three of polymorphism models: base-derived, boost::any and signature-sharing callable types.
Under the hood, the library abstracts those polymorphic models into a generic class in the detail namespace:
[...]
I have several questions on this polymorphism model.
1) Wouldn't be a good idea to make this model-based polymorphism (or a refined one) public? Maybe the current model needs more work to support additional polymorphism models. Of course requirements for each operation in a model should be detailed. A correct abstraction to define user-provided polymorphism models is hard but this seems really a great feature that makes this library more valuable for users.
I'd be really conservative here and withhold such a big move until real use cases are found. The problem is once this is made public the design of the lib becomes effectively frozen.
2) If understand this model abstraction correctly, each Model chooses the segment type to store values (it could be any sequence type, I guess). The segment_backend abstraction does not only abstract ConcreteTypes from value_types (using void pointers that will be casted to real types in the implementation of the virtual function), but also the segment type (the sequence holding the concrete values).
Current implementation uses two segment types: split_segment (stores Concrete types and generic value_type's in two separate vectors) and packed_segment (stores ConcreteTypes which already contain value_type's as a subobject). function_collection and any_collection use the first segment, base_collection uses the second.
Why are different segment types needed? Can't we store value_types and ConcreteTypes in a struct and use only one type of segment? If this is possible, I guess iteration performance would be improved for function_collection and any_collection. Model definition could be simplified.
I tried using packed_segment with function_collection and any_collection, profiled and found split_segment is faster: basically, it gets you more values per cache line.
On the other hand, if the sequence type can be left out of the model definition, the underlying sequence type (change vector with deque, small vector or any other) could be made configurable. Would this be useful for users?
This was also proposed by Thorsten, please see my answer there.
3) I understand that caching a base_sentinel in each segment is an optimization to avoid a virtual function call when iterating with base_iterators. Is this assumption correct?
Yes, this is it.
Why is a different base_sentinel type needed to mark the end of each segment? Wouldn't a "base_iterator" pointing to that on-past-the end position suffice? Is an optimization (maybe base_sentinel is cheaper) or a must have?
An optimization. base_iterator would do. Joaquín M López Muñoz

On 14/05/2017 9:51, Joaquín M López Muñoz via Boost wrote:
I have several questions on this polymorphism model.
1) Wouldn't be a good idea to make this model-based polymorphism (or a refined one) public? Maybe the current model needs more work to support additional polymorphism models. Of course requirements for each operation in a model should be detailed. A correct abstraction to define user-provided polymorphism models is hard but this seems really a great feature that makes this library more valuable for users.
I'd be really conservative here and withhold such a big move until real use cases are found. The problem is once this is made public the design of the lib becomes effectively frozen.
Right. We'd need more to test more polymorphism models in order to deduce a more generic model. Noted for a future improvement.
Current implementation uses two segment types: split_segment (stores Concrete types and generic value_type's in two separate vectors) and packed_segment (stores ConcreteTypes which already contain value_type's as a subobject). function_collection and any_collection use the first segment, base_collection uses the second.
Why are different segment types needed? Can't we store value_types and ConcreteTypes in a struct and use only one type of segment? If this is possible, I guess iteration performance would be improved for function_collection and any_collection. Model definition could be simplified.
I tried using packed_segment with function_collection and any_collection, profiled and found split_segment is faster: basically, it gets you more values per cache line.
Ok, now I see it. Since poly_collection optimizes processing and not space or other operations, split_segment seems a good choice as objects are tightly packed and all additional data is stored in another segment. packed_segment is an optimization when value_type is already stored inside the object (like a base class). Makes sense.
On the other hand, if the sequence type can be left out of the model definition, the underlying sequence type (change vector with deque, small vector or any other) could be made configurable. Would this be useful for users?
This was also proposed by Thorsten, please see my answer there.
Right. Maybe the least intrusive approach is to require element contiguity so something like small-vector could be used. But I agree this is not an essential feature. Best, Ion
participants (2)
-
Ion Gaztañaga
-
Joaquín M López Muñoz