[cpo-proposal] presentation of the idea

Santiago Tapia

13 Aug 2013 13 Aug '13

12:07 a.m.

Hi everyone, I would like to make a proposal for a new library. I only have a draft without documentation but I would like to know people’s opinion about the idea before making a formal proposal. The library will be named CPO (containers for polymorphic objects) and it will provide containers to store dynamic allocated objects from a class hierarchy. I have drafted one container and I have two more in process. The cpo library could be an alternative solution for boost pointer container library or some other solutions that use pointers. In order to understand the idea, I think that the following example of the use of the library might be better than a long explanation: #include <boost/cpo/classifier.hpp> #include <iostream> using namespace boost::cpo; class base { public: virtual int do_something(int a) const = 0; }; class derived_A : public base { public: derived_A(int b) : data(b) {} virtual int do_something(int a) const { return a + data; } protected: int data; }; class derived_B : public base { public: derived_B(int b) : data(b) {} virtual int do_something(int a) const { return a - data; } protected: int data; }; int main(int argc, char** argv) { classifier<base, std::allocator> collection; collection.insert( derived_A(21) ); collection.insert( derived_B(40) ); classifier<base, std::allocator>::iterator i, e = collection.end(); for ( i = collection.begin(); i != e; ++i ) { const base& x = *i; std::cout << x.do_something(6) << std::endl; } return 0; } ---- Program output: 27 -34 ---- Some details and advantages must be remarked in the provided example: - There are not any pointers in the client code (the example). - The container's name is classifier because it classifies objects internally using the RTTI. - The container classifier is _not_ a sequence. - Any iterator of the classifier dereferences to the base class. - The allocator argument in the classifier will be applied to every object added to the classifier. Thus, the allocator is provided without its argument. The std::allocator is the default argument. - The classifier uses continuous storage in memory for objects of the same type, therefore it is expected to improve cache efficiency with respect to solutions that use pointers (and new to allocate objects), especially if the client code uses some cache-aware allocator. I think that the idea will be interesting and useful. Although I have registered a project in sourceforge, classifier_container, I must insist: the library is under development and without documentation. I look forward to your comments and I thank you for them in advance. Best Regards, Santiago Tapia CITEF CENTRO DE INVESTIGACIÓN EN TECNOLOGÍAS FERROVIARIAS | RAILWAY TECHNOLOGY RESEARCH CENTRE UNIVERSIDAD POLITÉCNICA DE MADRID ESCUELA TÉCNICA SUPERIOR DE INGENIEROS INDUSTRIALES Santiago Tapia Fernández José Gutiérrez Abascal, 2 | 28006 Madrid TEL + 34 91 336 4267 | FAX + 34 91 561 8618 E-MAIL stapia@etsii.upm.es; www.citef.industriales.upm.es

Show replies by date

Larry Evans

13 Aug 13 Aug

12:42 p.m.

On 08/12/13 19:07, Santiago Tapia wrote:

...

Hi everyone,

I would like to make a proposal for a new library. I only have a draft without documentation but I would like to know people’s opinion about the idea before making a formal proposal.

The library will be named CPO (containers for polymorphic objects) and it will provide containers to store dynamic allocated objects from a class hierarchy. I have drafted one container and I have two more in process. The cpo library could be an alternative solution for boost pointer container library or some other solutions that use pointers. In order to understand the idea, I think that the following example of the use of the library might be better than a long explanation:

#include <boost/cpo/classifier.hpp> #include <iostream>

using namespace boost::cpo;

class base { public: virtual int do_something(int a) const = 0; };

class derived_A : public base { public: derived_A(int b) : data(b) {} virtual int do_something(int a) const { return a + data; } protected: int data; };

class derived_B : public base { public: derived_B(int b) : data(b) {} virtual int do_something(int a) const { return a - data; } protected: int data; };

int main(int argc, char** argv) { classifier<base, std::allocator> collection;

collection.insert( derived_A(21) ); collection.insert( derived_B(40) );

classifier<base, std::allocator>::iterator i, e = collection.end();

for ( i = collection.begin(); i != e; ++i ) { const base& x = *i; std::cout << x.do_something(6) << std::endl; }

return 0; } [snip] Hi Santiago,

What if the class hierarchy is used to model a graph that contains cycles. For example, something like a spirit grammar where a non-terminal on the rhs of a non-terminals definition? Wouldn't this require some sort of smart pointer or other garbage collection method with the ability to collect cycles? -regards, Larry

Klaim - Joël Lamotte

4:35 p.m.

On Tue, Aug 13, 2013 at 2:42 PM, Larry Evans <cppljevans@suddenlink.net>wrote:

...

What if the class hierarchy is used to model a graph that contains cycles. For example, something like a spirit grammar where a non-terminal on the rhs of a non-terminals definition? Wouldn't this require some sort of smart pointer or other garbage collection method with the ability to collect cycles?

My understanding of the proposed container is that it's mostly like having a collection of vectors, one for each final types, but exposing only one base interface in container interface. If I'm correct, I don't see how the problem you describe might be a problem. Joel.

Larry Evans

6:25 p.m.

On 08/13/13 11:35, Klaim - Joël Lamotte wrote:> On Tue, Aug 13, 2013 at 2:42 PM, Larry Evans <cppljevans@suddenlink.net>wrote:

...

...
What if the class hierarchy is used to model a graph that contains cycles. For example, something like a spirit grammar where a non-terminal on the rhs of a non-terminals definition? Wouldn't this require some sort of smart pointer or other garbage collection method with the ability to collect cycles?

My understanding of the proposed container is that it's mostly like having a collection of vectors, one for each final types, but exposing only one base interface in container interface. If I'm correct, I don't see how the problem you describe might be a

problem.

...

To be clear, you're saying it was not Santiago's intention to use this libray to create something like this class: class base_container : public base { public: virtual int do_something(int a) const; classifier<base, std::allocator> collection; }; Is so, that intention was not clear from Santiago's OP. -regards, Larry

Larry Evans

6:54 p.m.

On 08/13/13 13:25, Larry Evans wrote:

...

On 08/13/13 11:35, Klaim - Joël Lamotte wrote:> On Tue, Aug 13, 2013 at 2:42 PM, Larry Evans <cppljevans@suddenlink.net>wrote:

...
...
What if the class hierarchy is used to model a graph that contains cycles. For example, something like a spirit grammar where a non-terminal on the rhs of a non-terminals definition? Wouldn't this require some sort of smart pointer or other garbage collection method with the ability to collect cycles?

My understanding of the proposed container is that it's mostly like having a collection of vectors, one for each final types, but exposing only one base interface in container interface. If I'm correct, I don't see how the problem you describe might be a

problem.

...
To be clear, you're saying it was not Santiago's intention to use this libray to create something like this class:

class base_container : public base { public: virtual int do_something(int a) const; classifier<base, std::allocator> collection; };

Rereading the OP, I see: collection.insert( derived_A(21) ); which means a completely new object is created during the insert; hence, no cycles in pointer graph can occur. Sorry for noise.

Klaim - Joël Lamotte

4:33 p.m.

On Tue, Aug 13, 2013 at 2:07 AM, Santiago Tapia <santiago.tapia@upm.es>wrote:

...

I think that the idea will be interesting and useful.

It feels like I've written similar containers several times through the years, so it might be useful, but in the same time in the recent years I didn't use such mechanism, not sure why. Maybe a classic implementation example would make a clearer point on which case it solves. Joel Lamotte

Andrey Semashev

7:33 p.m.

On Tuesday 13 August 2013 02:07:15 Santiago Tapia wrote:

...

Hi everyone,

I would like to make a proposal for a new library. I only have a draft without documentation but I would like to know people’s opinion about the idea before making a formal proposal.

The library will be named CPO (containers for polymorphic objects) and it will provide containers to store dynamic allocated objects from a class hierarchy. I have drafted one container and I have two more in process. The cpo library could be an alternative solution for boost pointer container library or some other solutions that use pointers. In order to understand the idea, I think that the following example of the use of the library might be better than a long explanation:

[snip] I created containers of dynamically typed elements with Boost.Intrusive a few times. It was a bit more verbose than your example but the resulting solution was quite efficient and it had all the properties of a container. The drawback was that the solution was intrusive w.r.t. the stored types. Do you think you could create a set of containers (e.g. list, set, map) of polymorphic objects based on Boost.Intrusive without requiring to modify the types of the elements?

...

Some details and advantages must be remarked in the provided example:

- The container classifier is _not_ a sequence.

Why? Is this intentional?

...

- The allocator argument in the classifier will be applied to every object added to the classifier. Thus, the allocator is provided without its argument. The std::allocator is the default argument.

It's better not to use template template parameters in the container interface as it limits its usefullness because users won't be able to specify non- template allocators or template allocators with different template arguments. You can use the standard interface for allocator rebinding: typedef allocator<T> allocator_T; typedef typename allocator_T::template rebind<U>::other allocator_U; You can use std::allocator<void> by default then. All in all the basic idea looks interesting, although I'm not sure about the particular classifier container. It looks very specific to me, the same functionality could be achieved with a multimap or unordered_multimap of polymorphic elements.

Andrey Semashev

7:37 p.m.

On Tuesday 13 August 2013 23:33:25 you wrote:

...

On Tuesday 13 August 2013 02:07:15 Santiago Tapia wrote:

...
Hi everyone,

I would like to make a proposal for a new library. I only have a draft without documentation but I would like to know people’s opinion about the idea before making a formal proposal.

The library will be named CPO (containers for polymorphic objects) and it will provide containers to store dynamic allocated objects from a class hierarchy. I have drafted one container and I have two more

in process. The cpo library could be an alternative solution for

boost pointer container library or some other solutions that use pointers. In order to understand the idea, I think that the following example of the

...
use of the library might be better than a long explanation: [snip]

I created containers of dynamically typed elements with Boost.Intrusive a few times. It was a bit more verbose than your example but the resulting solution was quite efficient and it had all the properties of a container. The drawback was that the solution was intrusive w.r.t. the stored types. Do you think you could create a set of containers (e.g. list, set, map) of polymorphic objects based on Boost.Intrusive without requiring to modify the types of the elements?

...
Some details and advantages must be remarked in the provided example:

- The container classifier is _not_ a sequence.

Why? Is this intentional?

...
- The allocator argument in the classifier will be applied to every object added to the classifier. Thus, the allocator is provided without its argument. The std::allocator is the default argument.

It's better not to use template template parameters in the container interface as it limits its usefullness because users won't be able to specify non- template allocators or template allocators with different template arguments. You can use the standard interface for allocator rebinding:

typedef allocator<T> allocator_T; typedef typename allocator_T::template rebind<U>::other allocator_U;

You can use std::allocator<void> by default then.

All in all the basic idea looks interesting, although I'm not sure about the particular classifier container. It looks very specific to me, the same functionality could be achieved with a multimap or unordered_multimap of polymorphic elements.

...or I should have said "multiset or unordered_multiset".

Thorsten Ottosen

16 Aug 16 Aug

11 a.m.

Hi Santiago,

...

Hi everyone,

I would like to make a proposal for a new library. I only have a draft without documentation but I would like to know people’s opinion about the idea before making a formal proposal.

The library will be named CPO (containers for polymorphic objects) and it will provide containers to store dynamic allocated objects from a class hierarchy. I have drafted one container and I have two more in process.

[snip]

...

int main(int argc, char** argv) { classifier<base, std::allocator> collection;

collection.insert( derived_A(21) ); collection.insert( derived_B(40) );

classifier<base, std::allocator>::iterator i, e = collection.end();

for ( i = collection.begin(); i != e; ++i ) { const base& x = *i; std::cout << x.do_something(6) << std::endl; }

return 0; }

I would like to see the use of forwrding in-place construction: collection.push_back<derived_A>( 21 ); using variadic templates.

...

- The container's name is classifier because it classifies objects internally using the RTTI.

I had similar ideas that I wanted to implement. I didn't expect to use RTTI. Basically I wanted the class hierarchy's base class to derive from a class with pure virtual functions that allowed the container to perform the inplace-contruction, copying and alignment stuff.

...

- The container classifier is _not_ a sequence.

I would expect the container to allow forward iteration. I assume you mean that erase/insert into the middle is prohibited? I imagined that only push_back/pop_back was supported.

...

- The classifier uses continuous storage in memory for objects of the same type, therefore it is expected to improve cache efficiency with respect to solutions that use pointers (and new to allocate objects), especially if the client code uses some cache-aware allocator.

I wanted to call my class template polymorphic_vector, taking base class and allocator template arguments. I imagined that the storage would be completely contigious, even when many different types of objects where stored into the container. This would give maximum cache efficiency. Then if the user wanted to sort or otherwise manipulate the sequence, he could just create an std::vector<T*> of the elements (Thus, push_back should probably return a pointer to the constructed element). All in all, I hope you will persue this idea. I certainly have many use-cases for it in code where efficiency matters, and in code where using Boost.Intrusive is just must more work. kind regards Thorsten

Larry Evans

1:33 p.m.

On 08/16/13 06:00, Thorsten Ottosen wrote: [snip]

...

...
- The classifier uses continuous storage in memory for objects of the same type, therefore it is expected to improve cache efficiency with respect to solutions that use pointers (and new to allocate objects), especially if the client code uses some cache-aware allocator.

I wanted to call my class template polymorphic_vector, taking base class and allocator template arguments. I imagined that the storage would be completely contigious, even when many different types of objects where stored into the container. This would give maximum cache efficiency.

Hi Thorsten, I'm curious how you'd be able to store different types in contiguous storage. I would guess the base class would have to have a virtual derived_alignof and derived_sizeof methods which would return alignof(Derived) and sizeof(Derived), and then the container would use placement new and these virtual functions and a container_buffer type std::vector<char> to figure out where to place the next element in the container_buffer? Since boost::fusion::vector<T1,T2,...,Tn> stores different types, T1,T2,...,Tn in contiguous storage, I'm wondering why not use boost::fusion::vector? [snip] -regards, Larry

Thorsten Ottosen

19 Aug 19 Aug

11:59 a.m.

On 16-08-2013 15:33, Larry Evans wrote:

...

On 08/16/13 06:00, Thorsten Ottosen wrote: [snip]

...
...
- The classifier uses continuous storage in memory for objects of the same type, therefore it is expected to improve cache efficiency with respect to solutions that use pointers (and new to allocate objects), especially if the client code uses some cache-aware allocator.

I wanted to call my class template polymorphic_vector, taking base class and allocator template arguments. I imagined that the storage would be completely contigious, even when many different types of objects where stored into the container. This would give maximum cache efficiency.

Hi Thorsten,

I'm curious how you'd be able to store different types in contiguous storage. I would guess the base class would have to have a virtual derived_alignof and derived_sizeof methods which would return alignof(Derived) and sizeof(Derived), and then the container would use placement new and these virtual functions and a container_buffer type std::vector<char> to figure out where to place the next element in the container_buffer?

That sounds like the basic idea. Alignment may not be needed if we assume some kind of maximal alignment. Sizeof may not be needed either if the syntax for adding objects is cont.push_back<Derived>( x, y );

...

Since boost::fusion::vector<T1,T2,...,Tn> stores different types, T1,T2,...,Tn in contiguous storage, I'm wondering why not use boost::fusion::vector?

So this vector has no requirement on the size of the container (at compile time)? I guess fusion::vector uses some kind of variant storage approach? If so, a lot of space may be wasted if Derived1 is small and Derived2 is large. Also, We would really like not to have T1,T2,...,Tn to be complete types when using a polymorphic_vector, and we would really like not to mention all the potential types stored up front. regards Thorsten

Larry Evans

12:54 p.m.

On 08/19/13 06:59, Thorsten Ottosen wrote:

...

On 16-08-2013 15:33, Larry Evans wrote:

...
On 08/16/13 06:00, Thorsten Ottosen wrote: [snip]

...
...
- The classifier uses continuous storage in memory for objects of the same type, therefore it is expected to improve cache efficiency with respect to solutions that use pointers (and new to allocate objects), especially if the client code uses some cache-aware allocator.

I wanted to call my class template polymorphic_vector, taking base class and allocator template arguments. I imagined that the storage would be completely contigious, even when many different types of objects where stored into the container. This would give maximum cache efficiency.

Hi Thorsten,

I'm curious how you'd be able to store different types in contiguous storage. I would guess the base class would have to have a virtual derived_alignof and derived_sizeof methods which would return alignof(Derived) and sizeof(Derived), and then the container would use placement new and these virtual functions and a container_buffer type std::vector<char> to figure out where to place the next element in the container_buffer?

That sounds like the basic idea. Alignment may not be needed if we assume some kind of maximal alignment. Sizeof may not be needed either if the syntax for adding objects is

cont.push_back<Derived>( x, y );

Using a templated push_back, IIUC, would require some sort of container, such as std::vector<char> cont, which contains contigous storage, and the push_back should find the next location, i_aligned, in std::vector<char> which is at alignment, alignof(Derived). and would then push_back i_aligned chars, then further push_back sizeof(Derived) char's, the new inplace at cont.begin()+i_aligned. Does that make sense?

...

...
Since boost::fusion::vector<T1,T2,...,Tn> stores different types, T1,T2,...,Tn in contiguous storage, I'm wondering why not use boost::fusion::vector?

So this vector has no requirement on the size of the container (at compile time)?

OOPS. IIUC, fusion::vector does require all types be known at compile time.

...

I guess fusion::vector uses some kind of variant storage approach?

No. fusion::vector uses the preprocessor to generate the code for a struct something like: struct vectorN { T1 m1; T2 m2, ..., TN mN; }; You're probably wondering how that could be done with the preprocessor, and I did have a look at how awhile back, but can't remember the details now, other than it took a bit of work to understand.

...

If so, a lot of space may be wasted if Derived1 is small and Derived2 is large.

Also, We would really like not to have T1,T2,...,Tn to be complete types when using a polymorphic_vector, and we would really like not to mention all the potential types stored up front.

Ah! OK, I understand now. Thanks. -regards, Larry

Larry Evans

1:10 p.m.

On 08/19/13 07:54, Larry Evans wrote:

...

On 08/19/13 06:59, Thorsten Ottosen wrote: [snip]

...
On 16-08-2013 15:33, Larry Evans wrote:

...
On 08/16/13 06:00, Thorsten Ottosen wrote: [snip] That sounds like the basic idea. Alignment may not be needed if we assume some kind of maximal alignment. Sizeof may not be needed either if the syntax for adding objects is

cont.push_back<Derived>( x, y );

Using a templated push_back, IIUC, would require some sort of container, such as std::vector<char> cont, which contains contigous storage, and the push_back should find the next location, i_aligned, in std::vector<char> which is at alignment, alignof(Derived). and would then push_back i_aligned chars, then further push_back sizeof(Derived) char's, the new inplace at cont.begin()+i_aligned.

Does that make sense?

OOPS. If all those push_back's of a char causes the vector to be resized in placed in a different location, then wouldn't that require all the realignments for each element to be recalculated :( Hmmm. Not if &cont[0] is stored at a max aligned location, which I assume it would be since the storage would be created on the heap with new[] which, according to the c++ standard ( can't remember which page or section), returns a maximally aligned storage location. And since the start of cont is maximally aligned, any offsets in that cont with a given alignment would have the same alignment when moved, IIUC. -regards, Larry [snip]

Thorsten Ottosen

1:44 p.m.

On 19-08-2013 15:10, Larry Evans wrote:

...

On 08/19/13 07:54, Larry Evans wrote:

...
On 08/19/13 06:59, Thorsten Ottosen wrote:

...

...
...
cont.push_back<Derived>( x, y );

Using a templated push_back, IIUC, would require some sort of container, such as std::vector<char> cont, which contains contigous storage, and the push_back should find the next location, i_aligned, in std::vector<char> which is at alignment, alignof(Derived). and would then push_back i_aligned chars, then further push_back sizeof(Derived) char's, the new inplace at cont.begin()+i_aligned.

Does that make sense?

OOPS. If all those push_back's of a char causes the vector to be resized in placed in a different location, then wouldn't that require all the realignments for each element to be recalculated :(

Hmmm. Not if &cont[0] is stored at a max aligned location, which I assume it would be since the storage would be created on the heap with new[] which, according to the c++ standard ( can't remember which page or section), returns a maximally aligned storage location.

Exactly. The relative locations should stay the same, only the start address would change.

...

And since the start of cont is maximally aligned, any offsets in that cont with a given alignment would have the same alignment when moved, IIUC.

Yes, at this level it is nice to take advantage of the memcopying built into vector<char>. I guess the above means that the idea of letting push_back return the address of the object is not going to work /unless no reallocation happens/. This may be too fragile, so maybe we should embed a singly linked list directly together with the DerivedX instances and update it on copying. This seems to contradict the use of memcpy. Hm. Maybe we can avoid storing the linked list al together. Forward iteration would find the next element by asking the current element by its size and by computing i_aligned (somehow). Maybe that's the way to go. regards -Thorsten

Larry Evans

1:57 p.m.

On 08/19/13 08:44, Thorsten Ottosen wrote:

...

On 19-08-2013 15:10, Larry Evans wrote:

...
On 08/19/13 07:54, Larry Evans wrote:

...
On 08/19/13 06:59, Thorsten Ottosen wrote:

...
...
...
cont.push_back<Derived>( x, y );

Using a templated push_back, IIUC, would require some sort of container, such as std::vector<char> cont, which contains contigous storage, and the push_back should find the next location, i_aligned, in std::vector<char> which is at alignment, alignof(Derived). and would then push_back i_aligned chars, then further push_back sizeof(Derived) char's, the new inplace at cont.begin()+i_aligned.

Does that make sense?

OOPS. If all those push_back's of a char causes the vector to be resized in placed in a different location, then wouldn't that require all the realignments for each element to be recalculated :(

Hmmm. Not if &cont[0] is stored at a max aligned location, which I assume it would be since the storage would be created on the heap with new[] which, according to the c++ standard ( can't remember which page or section), returns a maximally aligned storage location.

Exactly. The relative locations should stay the same, only the start address would change.

...
And since the start of cont is maximally aligned, any offsets in that cont with a given alignment would have the same alignment when moved, IIUC.

Yes, at this level it is nice to take advantage of the memcopying built into vector<char>.

I guess the above means that the idea of letting push_back return the address of the object is not going to work /unless no reallocation happens/. This may be too fragile, so maybe we should embed a singly linked list directly together with the DerivedX instances and update it on copying. This seems to contradict the use of memcpy. Hm.

Maybe we can avoid storing the linked list al together. Forward iteration would find the next element by asking the current element by its size and by computing i_aligned (somehow). Maybe that's the way to go.

OR, by having the container also contain a std::vector<std::size_t> offsets;//offsets of values stored. as well as: std::vector<char> storage;//for storing values at offsets. This would trade space for time, but it would also allow random access. -regards, Larry

Thorsten Ottosen

2:06 p.m.

On 19-08-2013 15:57, Larry Evans wrote:

...

On 08/19/13 08:44, Thorsten Ottosen wrote:

...

...
Maybe we can avoid storing the linked list al together. Forward iteration would find the next element by asking the current element by its size and by computing i_aligned (somehow). Maybe that's the way to go.

OR, by having the container also contain a std::vector<std::size_t> offsets;//offsets of values stored. as well as: std::vector<char> storage;//for storing values at offsets.

This would trade space for time, but it would also allow random access.

Yeah, random access indexing, probably not full random access iterators. Since the size of vector<T*> is no bigger than vector<size_t>, it's probably better just to allow forward iteration and then let the user create a vector<T*> as they want (for full random access manipulation). -Thorsten

Larry Evans

2:30 p.m.

On 08/19/13 09:06, Thorsten Ottosen wrote:

...

On 19-08-2013 15:57, Larry Evans wrote:

...
On 08/19/13 08:44, Thorsten Ottosen wrote:

...
...
Maybe we can avoid storing the linked list al together. Forward iteration would find the next element by asking the current element by its size and by computing i_aligned (somehow). Maybe that's the way to go.

OR, by having the container also contain a std::vector<std::size_t> offsets;//offsets of values stored. as well as: std::vector<char> storage;//for storing values at offsets.

This would trade space for time, but it would also allow random access.

Yeah, random access indexing, probably not full random access iterators.

Since the size of vector<T*> is no bigger than vector<size_t>, it's probably better just to allow forward iteration and then let the user create a vector<T*> as they want (for full random access manipulation).

-Thorsten

However I assume you meant vector<T*> to contain pointers to elements in the vector<char> storage of the container. But what happens if storage needs to be resized? OTOH, using vector<std::size_t> to store the offsets would not be invalidated by resizing storage. -Larry

Thorsten Ottosen

2:41 p.m.

On 19-08-2013 16:30, Larry Evans wrote:

...

On 08/19/13 09:06, Thorsten Ottosen wrote:

...

...
Since the size of vector<T*> is no bigger than vector<size_t>, it's probably better just to allow forward iteration and then let the user create a vector<T*> as they want (for full random access manipulation).

-Thorsten

However I assume you meant vector<T*> to contain pointers to elements in the vector<char> storage of the container. But what happens if storage needs to be resized? OTOH, using vector<std::size_t> to store the offsets would not be invalidated by resizing storage.

True, there are different tradeoffs. At least in my use cases, the objects would be loaded at application start, and remain in memory until the application exists. Therefore, I can create a vector<T*> object after loading the data and rely on the pointers to be valid throughout the entire application lifetime. Perhaps push_back should return the offset instead, leaving it up to the user if he needs to store the offset or the pointers? Anyway, I think the basic functionality of this library should strive for minimal overhead, and then perhaps expose various container adaptors that add more functionality. regards -Thorsten

Larry Evans

8:18 p.m.

On 08/19/13 09:41, Thorsten Ottosen wrote:

...

On 19-08-2013 16:30, Larry Evans wrote:

...
On 08/19/13 09:06, Thorsten Ottosen wrote:

...
...
Since the size of vector<T*> is no bigger than vector<size_t>, it's probably better just to allow forward iteration and then let the user create a vector<T*> as they want (for full random access manipulation).

-Thorsten

However I assume you meant vector<T*> to contain pointers to elements in the vector<char> storage of the container. But what happens if storage needs to be resized? OTOH, using vector<std::size_t> to store the offsets would not be invalidated by resizing storage.

True, there are different tradeoffs.

At least in my use cases, the objects would be loaded at application start, and remain in memory until the application exists. Therefore, I can create a vector<T*> object after loading the data and rely on the pointers to be valid throughout the entire application lifetime.

Perhaps push_back should return the offset instead, leaving it up to the user if he needs to store the offset or the pointers?

Anyway, I think the basic functionality of this library should strive for minimal overhead, and then perhaps expose various container adaptors that add more functionality.

regards

-Thorsten

The 1st attachment uses the offset method for the container and has the: template<typename F> void push_back(F const&); method. The output is in 2nd attachment. -Larry

Larry Evans

20 Aug 20 Aug

12:38 a.m.

On 08/19/13 15:18, Larry Evans wrote: [snip]

...

The 1st attachment uses the offset method for the container and has the:

template<typename F> void push_back(F const&);

method.

[snip] OOPS, f_offset didn't have padding added. Corrected in this version.

Thorsten Ottosen

8:30 a.m.

On 19-08-2013 22:18, Larry Evans wrote:

...

...
The 1st attachment uses the offset method for the container and has the:

template<typename F> void push_back(F const&);

method.

The output is in 2nd attachment.

You work fast :-) Since the classes in a hierarchy is normally not copyable (though perhaps cloneable), I would make push_back like this: template< class T, typename... Args > void push_back( Args... args ) { // A. static assert that T* is convertiable to Base* // B. all the usual stuff // C. in-place construct from std::forward(args)... } -Thorsten

Rob Stewart

9:04 a.m.

On Aug 19, 2013, at 9:44 AM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:

...

On 19-08-2013 15:10, Larry Evans wrote:

...
On 08/19/13 07:54, Larry Evans wrote:

...
On 08/19/13 06:59, Thorsten Ottosen wrote:

...
...
...
cont.push_back<Derived>( x, y );

Using a templated push_back, IIUC, would require some sort of container, such as std::vector<char> cont, which contains contigous storage, and the push_back should find the next location, i_aligned, in std::vector<char> which is at alignment, alignof(Derived). and would then push_back i_aligned chars, then further push_back sizeof(Derived) char's, the new inplace at cont.begin()+i_aligned.

Does that make sense?

OOPS. If all those push_back's of a char causes the vector to be resized in placed in a different location, then wouldn't that require all the realignments for each element to be recalculated :(

Reallocation causes a bigger problem...

...

...
Hmmm. Not if &cont[0] is stored at a max aligned location, which I assume it would be since the storage would be created on the heap with new[] which, according to the c++ standard ( can't remember which page or section), returns a maximally aligned storage location.

Exactly. The relative locations should stay the same, only the start address would change.

Sure, the new objects would be laid out in memory the same after a reallocation, but...

...

...
And since the start of cont is maximally aligned, any offsets in that cont with a given alignment would have the same alignment when moved, IIUC.

Yes, at this level it is nice to take advantage of the memcopying built into vector<char>.

Here's the problem. Replicating the bits of and object isn't the same as copying it. Many classes store a this pointer or rely on a data member address as a unique key, etc. Copy constructors must run. If the container manages the reallocations by noticing when it is about to occur, creates a new vector with greater capacity, clones each element into the new vector, destroys the elements in the old vector, then swaps the two vectors, it can work. I should think a deque would be better as it doesn't require all of that work. ___ Rob (Sent from my portable computation engine)

Thorsten Ottosen

9:23 a.m.

On 20-08-2013 11:04, Rob Stewart wrote:

...

On Aug 19, 2013, at 9:44 AM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:

...

...
Yes, at this level it is nice to take advantage of the memcopying built into vector<char>.

Here's the problem. Replicating the bits of and object isn't the same as copying it. Many classes store a this pointer or rely on a data member address as a unique key, etc. Copy constructors must run.

OK. :-) I was waiting for an expert to join the discussion.

...

If the container manages the reallocations by noticing when it is about to occur, creates a new vector with greater capacity, clones each element into the new vector, destroys the elements in the old vector, then swaps the two vectors, it can work.

Yeah, ok. I guess there will a need for one or two virtual functions in a base class after all. Copy-construction is one way, move-construction probably better.

...

I should think a deque would be better as it doesn't require all of that work.

Well, if we had a deque with better control over the chuck size, then perhaps. But I doubt it's a good idea. The container should of course have a reserve( unsigned words ) function as well as an shrink_to_fit() such that reallocations/memory use can be minimized. -Thorsten

Larry Evans

11:55 a.m.

On 08/20/13 04:23, Thorsten Ottosen wrote:

...

On 20-08-2013 11:04, Rob Stewart wrote:

...
On Aug 19, 2013, at 9:44 AM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:

...
...
Yes, at this level it is nice to take advantage of the memcopying built into vector<char>.

Here's the problem. Replicating the bits of and object isn't the same as copying it. Many classes store a this pointer or rely on a data member address as a unique key, etc. Copy constructors must run.

Good catch. Thanks Rob.

...

OK. :-) I was waiting for an expert to join the discussion.

...
If the container manages the reallocations by noticing when it is about to occur, creates a new vector with greater capacity, clones each element into the new vector, destroys the elements in the old vector, then swaps the two vectors, it can work.

Yeah, ok. I guess there will a need for one or two virtual functions in a base class after all. Copy-construction is one way, move-construction probably better.

...
I should think a deque would be better as it doesn't require all of that work.

Well, if we had a deque with better control over the chuck size, then perhaps. But I doubt it's a good idea.

I had thought about deque; however, I couldn't think of how a deque<char> would work because the chuck size would have to be large enough to handle the largest size of the elements placed in the container. Since, IIUC, that's not known until the actual container.push_back<Field>(a_field) is run, and can change each time container.push_back is called, deque couldn't work. I also did study briefly the segmented sequences idea discussed in thread: http://article.gmane.org/gmane.comp.lib.boost.devel/142262 However, I did not study long enough to see if it would solve the problem :( [snip] -Larry

Larry Evans

12:11 p.m.

On 08/20/13 06:55, Larry Evans wrote:

...

On 08/20/13 04:23, Thorsten Ottosen wrote:

...
On 20-08-2013 11:04, Rob Stewart wrote:

...
On Aug 19, 2013, at 9:44 AM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:

...
...
Yes, at this level it is nice to take advantage of the memcopying built into vector<char>.

Here's the problem. Replicating the bits of and object isn't the same as copying it. Many classes store a this pointer or rely on a data member address as a unique key, etc. Copy constructors must run.

Good catch. Thanks Rob.

However, this is a specific case of a more general problem. Suppose the polymorphic_object at container[i+j] contains a pointer of the polymorphic_obect at container[i]. Then after resizing, the pointer will become invalid. The specific case Rob mentions is when j=0. Maybe the class should just warn the user not to do this or suggest a workaround, maybe instead of storing a pointer, just store the index into the container? [snip] -Larry

Thorsten Ottosen

12:44 p.m.

On 20-08-2013 14:11, Larry Evans wrote:

...

On 08/20/13 06:55, Larry Evans wrote:

...
On 08/20/13 04:23, Thorsten Ottosen wrote:

...
On 20-08-2013 11:04, Rob Stewart wrote:

...
On Aug 19, 2013, at 9:44 AM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:

...
...
Yes, at this level it is nice to take advantage of the memcopying built into vector<char>.

Here's the problem. Replicating the bits of and object isn't the same as copying it. Many classes store a this pointer or rely on a data member address as a unique key, etc. Copy constructors must run.

Good catch. Thanks Rob.

However, this is a specific case of a more general problem. Suppose the polymorphic_object at container[i+j] contains a pointer of the polymorphic_obect at container[i]. Then after resizing, the pointer will become invalid. The specific case Rob mentions is when j=0.

Maybe the class should just warn the user not to do this or suggest a workaround, maybe instead of storing a pointer, just store the index into the container?

Pointer stability is an /inherent/ "problem" compared to using e.g. ptr_vector<T>. I see no other solution than warning the user about pointer invalidation, and advice the user to fill the container before storing any pointers in other places, or as you say, to use indexes. Since we need a base class, then maybe somthing like this is a starting point: class polymorphic_object { public: virtual ~polymorphic_object() {} private: polymorphic_object( const polymorphic_object& ); polymorphic_object& operator=( const polymorphic_object& ); public: static void copy( const polymorphic_object& this_, char* to ) { // preconditions this_.do_copy( to ); // postconditions } static void move( polymorphic_object& this_, char* to ) { // preconditions this_.do_move( to ); // postconditions } private: virtual polymorphic_object* do_copy( char* to ) const = 0; virtual polymorphic_object* do_move( char* to ) = 0; }; regards Thorsten

Larry Evans

22 Aug 22 Aug

2:52 p.m.

On 08/20/13 07:44, Thorsten Ottosen wrote:

...

On 20-08-2013 14:11, Larry Evans wrote:

...
On 08/20/13 06:55, Larry Evans wrote:

...
On 08/20/13 04:23, Thorsten Ottosen wrote:

...
On 20-08-2013 11:04, Rob Stewart wrote:

...
On Aug 19, 2013, at 9:44 AM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:

...
...
Yes, at this level it is nice to take advantage of the memcopying built into vector<char>.

Here's the problem. Replicating the bits of and object isn't the same as copying it. Many classes store a this pointer or rely on a data member address as a unique key, etc. Copy constructors must run.

Good catch. Thanks Rob.

However, this is a specific case of a more general problem. Suppose the polymorphic_object at container[i+j] contains a pointer of the polymorphic_obect at container[i]. Then after resizing, the pointer will become invalid. The specific case Rob mentions is when j=0.

Maybe the class should just warn the user not to do this or suggest a workaround, maybe instead of storing a pointer, just store the index into the container?

Pointer stability is an /inherent/ "problem" compared to using e.g. ptr_vector<T>. I see no other solution than warning the user about pointer invalidation, and advice the user to fill the container before storing any pointers in other places, or as you say, to use indexes.

Since we need a base class, then maybe somthing like this is a starting point:

class polymorphic_object { public: virtual ~polymorphic_object() {}

private: polymorphic_object( const polymorphic_object& ); polymorphic_object& operator=( const polymorphic_object& );

public: static void copy( const polymorphic_object& this_, char* to ) { // preconditions this_.do_copy( to ); // postconditions }

static void move( polymorphic_object& this_, char* to ) { // preconditions this_.do_move( to ); // postconditions }

private: virtual polymorphic_object* do_copy( char* to ) const = 0; virtual polymorphic_object* do_move( char* to ) = 0; };

regards

Thorsten

Hi Thorsten, I'm not sure how these do_copy and do_move virtual member functions would work differently than just using the derived copy CTOR's. The do_{copy,move} would still have to know how to map the old pointer to the new pointer, if pointers were used. OTOH, if only offsets were used, then wouldn't simply using the vector<char>::swap work? OTOH, the problem with using offsets is that any function taking the offset argument to access the actual polymorphic_object in the container of polymorphic_object's would have to be given, in addition, a pointer to that container. OTOH, I'm wondering if borrowing from the code in a copying garbage collector would provide a solution. For example, if the: std::vector<char> storage; was the storage in the container, then when it was resized such that a new storage region needed to be used, then wouldn't the code in a copying garbage collector which changes all pointers in the from space to pointers in the to space solve the problem? Just a thought. I've never written a copying garbage collector; hence, I may be missing something obvious. -regards, Larry

Larry Evans

3:14 p.m.

On 08/22/13 09:52, Larry Evans wrote:

...

On 08/20/13 07:44, Thorsten Ottosen wrote:

...
On 20-08-2013 14:11, Larry Evans wrote:

...
On 08/20/13 06:55, Larry Evans wrote:

...
On 08/20/13 04:23, Thorsten Ottosen wrote:

...
On 20-08-2013 11:04, Rob Stewart wrote:

...
On Aug 19, 2013, at 9:44 AM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:

...
> Yes, at this level it is nice to take advantage of the memcopying > built into vector<char>.

Here's the problem. Replicating the bits of and object isn't the same as copying it. Many classes store a this pointer or rely on a data member address as a unique key, etc. Copy constructors must run.

Good catch. Thanks Rob.

However, this is a specific case of a more general problem. Suppose the polymorphic_object at container[i+j] contains a pointer of the polymorphic_obect at container[i]. Then after resizing, the pointer will become invalid. The specific case Rob mentions is when j=0.

Maybe the class should just warn the user not to do this or suggest a workaround, maybe instead of storing a pointer, just store the index into the container?

Pointer stability is an /inherent/ "problem" compared to using e.g. ptr_vector<T>. I see no other solution than warning the user about pointer invalidation, and advice the user to fill the container before storing any pointers in other places, or as you say, to use indexes.

Since we need a base class, then maybe somthing like this is a starting point:

class polymorphic_object { public: virtual ~polymorphic_object() {}

private: polymorphic_object( const polymorphic_object& ); polymorphic_object& operator=( const polymorphic_object& );

public: static void copy( const polymorphic_object& this_, char* to ) { // preconditions this_.do_copy( to ); // postconditions }

static void move( polymorphic_object& this_, char* to ) { // preconditions this_.do_move( to ); // postconditions }

private: virtual polymorphic_object* do_copy( char* to ) const = 0; virtual polymorphic_object* do_move( char* to ) = 0; };

regards

Thorsten

Hi Thorsten,

I'm not sure how these do_copy and do_move virtual member functions would work differently than just using the derived copy CTOR's. The do_{copy,move} would still have to know how to map the old pointer to the new pointer, if pointers were used. OTOH, if only offsets were used, then wouldn't simply using the vector<char>::swap work? OTOH, the problem with using offsets is that any function taking the offset argument to access the actual polymorphic_object in the container of polymorphic_object's would have to be given, in addition, a pointer to that container.

OTOH, I'm wondering if borrowing from the code in a copying garbage collector would provide a solution. For example, if the:

std::vector<char> storage;

was the storage in the container, then when it was resized such that a new storage region needed to be used, then wouldn't the code in a copying garbage collector which changes all pointers in the from space to pointers in the to space solve the problem?

Just a thought. I've never written a copying garbage collector; hence, I may be missing something obvious.

-regards, Larry

Some googling for "copy collector stl allocator" lead to: http://computer-programming-forum.com/81-vc/cf24648b1c6c8921.htm which talks about a similar problem with memory mapped files.

Larry Evans

3:42 p.m.

On 08/22/13 10:14, Larry Evans wrote:

...

On 08/22/13 09:52, Larry Evans wrote:

...
On 08/20/13 07:44, Thorsten Ottosen wrote:

...
On 20-08-2013 14:11, Larry Evans wrote:

...
On 08/20/13 06:55, Larry Evans wrote:

...
On 08/20/13 04:23, Thorsten Ottosen wrote:

...
On 20-08-2013 11:04, Rob Stewart wrote: > On Aug 19, 2013, at 9:44 AM, Thorsten Ottosen > <thorsten.ottosen@dezide.com> wrote:

>> Yes, at this level it is nice to take advantage of the memcopying >> built into vector<char>. > > Here's the problem. Replicating the bits of and object isn't the same > as copying it. Many classes store a this pointer or rely on a data > member address as a unique key, etc. Copy constructors must run.

Good catch. Thanks Rob.

However, this is a specific case of a more general problem. Suppose the polymorphic_object at container[i+j] contains a pointer of the polymorphic_obect at container[i]. Then after resizing, the pointer will become invalid. The specific case Rob mentions is when j=0.

Maybe the class should just warn the user not to do this or suggest a workaround, maybe instead of storing a pointer, just store the index into the container?

Pointer stability is an /inherent/ "problem" compared to using e.g. ptr_vector<T>. I see no other solution than warning the user about pointer invalidation, and advice the user to fill the container before storing any pointers in other places, or as you say, to use indexes.

Since we need a base class, then maybe somthing like this is a starting point:

class polymorphic_object { public: virtual ~polymorphic_object() {}

private: polymorphic_object( const polymorphic_object& ); polymorphic_object& operator=( const polymorphic_object& );

public: static void copy( const polymorphic_object& this_, char* to ) { // preconditions this_.do_copy( to ); // postconditions }

static void move( polymorphic_object& this_, char* to ) { // preconditions this_.do_move( to ); // postconditions }

private: virtual polymorphic_object* do_copy( char* to ) const = 0; virtual polymorphic_object* do_move( char* to ) = 0; };

regards

Thorsten

Hi Thorsten,

I'm not sure how these do_copy and do_move virtual member functions would work differently than just using the derived copy CTOR's. The do_{copy,move} would still have to know how to map the old pointer to the new pointer, if pointers were used. OTOH, if only offsets were used, then wouldn't simply using the vector<char>::swap work? OTOH, the problem with using offsets is that any function taking the offset argument to access the actual polymorphic_object in the container of polymorphic_object's would have to be given, in addition, a pointer to that container.

OTOH, I'm wondering if borrowing from the code in a copying garbage collector would provide a solution. For example, if the:

std::vector<char> storage;

was the storage in the container, then when it was resized such that a new storage region needed to be used, then wouldn't the code in a copying garbage collector which changes all pointers in the from space to pointers in the to space solve the problem?

Just a thought. I've never written a copying garbage collector; hence, I may be missing something obvious.

-regards, Larry

Some googling for "copy collector stl allocator" lead to:

http://computer-programming-forum.com/81-vc/cf24648b1c6c8921.htm

which talks about a similar problem with memory mapped files.

Even more relevant is: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2486.pdf which, on page 6 has a section, "Relocatable Container" which says: Using a specialty pointer that stores offsets, rather than raw pointer values, I can make a container capable of being streamed from one process to the next. Sounds like this would solve the problem. -regards, Larry

Thorsten Ottosen

5:01 p.m.

On 22-08-2013 17:42, Larry Evans wrote:

...

Even more relevant is:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2486.pdf

which, on page 6 has a section, "Relocatable Container" which says:

Using a specialty pointer that stores offsets, rather than raw pointer values, I can make a container capable of being streamed from one process to the next.

Sounds like this would solve the problem.

I guess this is what boost.interprocess already does? -Thorsten

Thorsten Ottosen

3:38 p.m.

On 22-08-2013 16:52, Larry Evans wrote:

...

On 08/20/13 07:44, Thorsten Ottosen wrote:

...

...
Since we need a base class, then maybe somthing like this is a starting point:

class polymorphic_object { public: virtual ~polymorphic_object() {}

private: polymorphic_object( const polymorphic_object& ); polymorphic_object& operator=( const polymorphic_object& );

public: static void copy( const polymorphic_object& this_, char* to ) { // preconditions this_.do_copy( to ); // postconditions }

static void move( polymorphic_object& this_, char* to ) { // preconditions this_.do_move( to ); // postconditions }

private: virtual polymorphic_object* do_copy( char* to ) const = 0; virtual polymorphic_object* do_move( char* to ) = 0; };

regards

Thorsten

Hi Thorsten,

I'm not sure how these do_copy and do_move virtual member functions would work differently than just using the derived copy CTOR's.

We know the exact type in cont.push_back<Derived>( ... ) but at the time when the container needs to move/copy elements around, the exact type of the objects has been lost. All we know is that in polymorphic_vector<Base> the elements are somehow derived from Base. Hence we can't call the the CTOR's. To get to the next element, I guess we need a virtual object_size() function too.

...

The do_{copy,move} would still have to know how to map the old pointer to the new pointer, if pointers were used.

Yes, the derived classes' do_{copy,move} would call their respective constructors. The postcondition in copy/move would assert that the newly constructed object is of the correct type, using typeid. If you want to make the class automatically track pointers stored inside derived classes pointing to other derived classes, then I think it is way to complicated. The user should be adviced/required to construct all the elements and then set up pointers if that is needed.

...

OTOH, if only offsets were used, then wouldn't simply using the vector<char>::swap work?

Doesn't that have the same problems as memcpy'ing? That is, if we don't call the constructor, we don't get the implicit stuff inside the classes properly constructed. regards Thorsten

Larry Evans

4:27 p.m.

On 08/22/13 10:38, Thorsten Ottosen wrote: [snip]

...

...
OTOH, if only offsets were used, then wouldn't simply using the vector<char>::swap work?

Doesn't that have the same problems as memcpy'ing? That is, if we don't call the constructor, we don't get the implicit stuff inside the classes properly constructed.

OK. Maybe so. I was still under the impression that the only problem with simply copying the old std::vector<char>::data() to a new std::vector<char>::data() is that this could copy the raw pointers verbatim which would mean those raw pointers in the new std::vector<char>::data() would point somewhere still in the old std::vector<char>::data(), which would be invalid after the resize. OTOH, with offsets instead of raw pointers, the offsets would be w.r.t. the existing std::vector<char>::data(); hence, would still be valid since offsets would remain invariant. Although I haven't looked at, this looks like something that could be used: http://www.boost.org/doc/html/interprocess/offset_ptr.html (Thanks to Bjorn Reese for sending me a private email pointing to this reference). -regards,

...

regards

Thorsten

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Ben Pope

23 Aug 23 Aug

2:54 a.m.

On 23/08/13 00:27, Larry Evans wrote:

...

On 08/22/13 10:38, Thorsten Ottosen wrote: [snip]

...
...
OTOH, if only offsets were used, then wouldn't simply using the vector<char>::swap work?

Doesn't that have the same problems as memcpy'ing? That is, if we don't call the constructor, we don't get the implicit stuff inside the classes properly constructed.

OK. Maybe so. I was still under the impression that the only problem with simply copying the old std::vector<char>::data() to a new std::vector<char>::data() is that this could copy the raw pointers verbatim which would mean those raw pointers in the new std::vector<char>::data() would point somewhere still in the old std::vector<char>::data(), which would be invalid after the resize. OTOH, with offsets instead of raw pointers, the offsets would be w.r.t. the existing std::vector<char>::data(); hence, would still be valid since offsets would remain invariant.

But how can you be sure an arbitrary non-POD object in the vector doesn't hold a pointer to itself? Also, this problem has just been found in Folly fbvector: https://github.com/facebook/folly/issues/35 Ben

Larry Evans

3:32 a.m.

On 08/22/13 21:54, Ben Pope wrote:

...

On 23/08/13 00:27, Larry Evans wrote:

...
On 08/22/13 10:38, Thorsten Ottosen wrote: [snip]

...
...
OTOH, if only offsets were used, then wouldn't simply using the vector<char>::swap work?

Doesn't that have the same problems as memcpy'ing? That is, if we don't call the constructor, we don't get the implicit stuff inside the classes properly constructed.

OK. Maybe so. I was still under the impression that the only problem with simply copying the old std::vector<char>::data() to a new std::vector<char>::data() is that this could copy the raw pointers verbatim which would mean those raw pointers in the new std::vector<char>::data() would point somewhere still in the old std::vector<char>::data(), which would be invalid after the resize. OTOH, with offsets instead of raw pointers, the offsets would be w.r.t. the existing std::vector<char>::data(); hence, would still be valid since offsets would remain invariant.

But how can you be sure an arbitrary non-POD object in the vector doesn't hold a pointer to itself?

Raw pointers would be outlawed by the class' documentation. Instead, offsets would be used. In the case of container[i] containing a pointer to itself, this would be implemented as just something like container_offset_ptr(i). Now, as mentioned in the post you responded to, there's the boost::offset_ptr: http://www.boost.org/doc/html/interprocess/offset_ptr.html which would simply contain a zero offset representing a pointer to self. This is a much more elegant solution, AFAICT, than using offsets in the whole container.

...

Also, this problem has just been found in Folly fbvector: https://github.com/facebook/folly/issues/35

That doesn't use boost::offset_ptr. Could that problem be solved using boost::offset_ptr? -Larry

Thorsten Ottosen

7:41 a.m.

On 23-08-2013 04:54, Ben Pope wrote:

...

But how can you be sure an arbitrary non-POD object in the vector doesn't hold a pointer to itself?

Also, this problem has just been found in Folly fbvector: https://github.com/facebook/folly/issues/35

WE can't, hence we need to call copy/move constructors via the virtual functions. -Thorsten

Larry Evans

3:08 p.m.

On 08/23/13 02:41, Thorsten Ottosen wrote:

...

On 23-08-2013 04:54, Ben Pope wrote:

...
But how can you be sure an arbitrary non-POD object in the vector doesn't hold a pointer to itself?

Also, this problem has just been found in Folly fbvector: https://github.com/facebook/folly/issues/35

WE can't, hence we need to call copy/move constructors via the virtual functions.

-Thorsten

The attached code show a non-POD object containing a raw pointer to itself and an offset_ptr to itself. It also shows an in-place construction of such a non-POD object in a "from" std::vector<char>. Then a COPY CTOR of another "to" std::vector<char>. The printouts (also attached) show that the raw_ptr has wrong value in the "to" but the offset_ptr has the correct value. AFAICT, this shows the offset_ptr would solve the problem of moving the storage in case the storage needs to be resized. Am I missing something? -Larry

Thorsten Ottosen

27 Aug 27 Aug

11:44 a.m.

On 23-08-2013 17:08, Larry Evans wrote:

...

AFAICT, this shows the offset_ptr would solve the problem of moving the storage in case the storage needs to be resized.

Am I missing something?

I can't tell. Some interprocess experts might want to way in here? Anyway, I can live with a container that has unstable pointers/references after push_back. std::vector<T> is like that, so what's exactly the difference here? -Thorsten

Larry Evans

12:09 p.m.

On 08/27/13 06:44, Thorsten Ottosen wrote:

...

On 23-08-2013 17:08, Larry Evans wrote:

...
AFAICT, this shows the offset_ptr would solve the problem of moving the storage in case the storage needs to be resized.

Am I missing something?

I can't tell. Some interprocess experts might want to way in here?

Anyway, I can live with a container that has unstable pointers/references after push_back. std::vector<T> is like that, so what's exactly the difference here?

The offset_ptr<polymorophic_object> solves the unstable pointer problem caused by moving the storage buffer. That's one problem 1st mentioned by Rob Stewart here: http://article.gmane.org/gmane.comp.lib.boost.devel/243582 and then again by Ben Pope here: http://article.gmane.org/gmane.comp.lib.boost.devel/243633 The code I last posted shows the offset_ptr still points to the *this even after the move, although the raw pointer points to the old location. Also, no virtual methods are needed to preserve this stable pointer feature. Using just offset_ptr seems to do the job. -regards, Larry

Gottlob Frege

28 Aug 28 Aug

4:51 a.m.

<snip> big thread about polymorphic container ... </snip> I hope I didn't miss anything in the thread. If I'm repeating something, sorry. I don't think we want or need to limit what we put into the container. Neither by forcing certain virtual functions onto the base class, nor by limiting what the contained types can and can't do (like have pointers to themselves and do funky things in their copy/move constructors). When push_back<Triangle> is called, we create an instance of an ItemHandler<Triangle> class, that derives from BaseItemHandler, and implements copy/move virtually. ie type-erasure. ItemHandler<Triangle> knows how to move/copy Triangles. It is not Triangle's job! We put the ItemHandler<Triangle> in a map<RTTI, BaseItemHandler*>. Note: - we only need *one* ItemHandler<Triangle> for all Triangles, not one for each. - Triangle doesn't have any do_copy/move/size virtual functions, ItemHandler<Triangle> does. - we can store Triangle mixed in with Squares and other shapes or isolated in a group of Triangles. Maybe different containers will choose differently. - if stored mixed together, then we have a parallel vector that is a pointer to BaseItemHandler* for each item in the main vector. ie (typed in email, so should not compile) struct BaseItemHandler { virtual void do_copy(char * dest, char const * src) const = 0; virtual std::size_t size() const = 0; }; template <typename T> struct ItemHandler : BaseItemHandler { virtual void do_copy(char * dest, char const * src) const { T * typed_src = reinterpret_cast<T*>(src); new (dest) T(*typed_src); // in-place copy construct } virtual std::size_t size() const { return sizeof(T); } }; template <typename Item> void classifier::push_back(Item const & item) { char const * key = typeid(Item).name(); // or use raw name or address of type info // add/get handler to/from map HandlerMap::iterator itHandler = handlerMap.find(key); BaseItemHandler * handler = 0; if (itHandler == handlerMap.end()) { // new type we haven't seen before handler = new ItemHandler<Item>; handlerMap[key] = handler; } else { handler = *itHandler; } char * space = make_space_for_item(handler->size()); // make space in vector<char> handler->do_copy(space, item); inplace copy construct handlerList.push_back(handler); // handler for this item // note that handler is in handlerMap *once* (ie one per type) // and same pointer may be in handlerList multiple times, as it appears once for each item of type Item } classifier::iterator & classifier::iterator::operator++(int) { // iterator has pointer/iterator/offset into handlerList (called 'itHandler') BasItemHandler * handler = *itHandler; // and pointer/iterator/offset into vector<char>, called 'data' // ++ on data data += handler->size(); // ++ on handler ++itHandler; // iterator now points to next object in the vector, // and points to its associated handler return *this; } Tony On Tue, Aug 27, 2013 at 8:09 AM, Larry Evans <cppljevans@suddenlink.net>wrote:

...

On 08/27/13 06:44, Thorsten Ottosen wrote:

...
On 23-08-2013 17:08, Larry Evans wrote:

...
AFAICT, this shows the offset_ptr would solve the problem of moving the storage in case the storage needs to be resized.

Am I missing something?

I can't tell. Some interprocess experts might want to way in here?

Anyway, I can live with a container that has unstable pointers/references after push_back. std::vector<T> is like that, so what's exactly the difference here?

The offset_ptr<polymorophic_object> solves the unstable pointer problem caused by moving the storage buffer. That's one problem 1st mentioned by Rob Stewart here:

http://article.gmane.org/gmane.comp.lib.boost.devel/243582

and then again by Ben Pope here:

http://article.gmane.org/gmane.comp.lib.boost.devel/243633

The code I last posted shows the offset_ptr still points to the *this even after the move, although the raw pointer points to the old location. Also, no virtual methods are needed to preserve this stable pointer feature. Using just offset_ptr seems to do the job.

-regards, Larry

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Thorsten Ottosen

6:57 a.m.

On 28-08-2013 06:51, Gottlob Frege wrote:

...

<snip> big thread about polymorphic container ... </snip>

I hope I didn't miss anything in the thread. If I'm repeating something, sorry.

I don't think we want or need to limit what we put into the container. Neither by forcing certain virtual functions onto the base class, nor by limiting what the contained types can and can't do (like have pointers to themselves and do funky things in their copy/move constructors).

When push_back<Triangle> is called, we create an instance of an ItemHandler<Triangle> class, that derives from BaseItemHandler, and implements copy/move virtually. ie type-erasure. ItemHandler<Triangle> knows how to move/copy Triangles. It is not Triangle's job!

We put the ItemHandler<Triangle> in a map<RTTI, BaseItemHandler*>.

[Thorsten:] This is a bad idea IMO. The point of such a container is to provide as little fragmentation of the heap as possible, to get better locality of data (when the data consists of polymorphic objetcs). With std::vector<int> we get great locality. We don't yet have such an alternative for polymorphic objects. kind regards -Thorsten

Gottlob Frege

3:50 p.m.

On Wed, Aug 28, 2013 at 2:57 AM, Thorsten Ottosen < thorsten.ottosen@dezide.com> wrote:

...

On 28-08-2013 06:51, Gottlob Frege wrote:

...
We put the ItemHandler<Triangle> in a map<RTTI, BaseItemHandler*>.

[Thorsten:]

This is a bad idea IMO. The point of such a container is to provide as little fragmentation of the heap as possible, to get better locality of data (when the data consists of polymorphic objetcs). With

std::vector<int>

we get great locality. We don't yet have such an alternative for polymorphic objects.

kind regards

-Thorsten

Well I think there are trade offs. Is locality the number one feature? It might be better to store the size (and type) of each object right beside the object inside the data vector then. So at least iterating is local. I would still use Type Handlers for copy/move. I think it is OK if copy/move is slightly less local. You could also probably track whether any of the Items have non-trivial copy/move operators. ie in push_back<Item>() increment a nonTrivalCounter. If nonTrivialCounter == 0 on copy/move, then do a memcpy, otherwise use the Handlers to copy.

Thorsten Ottosen

29 Aug 29 Aug

8:23 a.m.

On 28-08-2013 17:50, Gottlob Frege wrote:

...

On Wed, Aug 28, 2013 at 2:57 AM, Thorsten Ottosen < thorsten.ottosen@dezide.com> wrote:

...

Well I think there are trade offs. Is locality the number one feature?

Yes. If that os not your use case, then you have many of other options which are somewhat easier to use.

...

It might be better to store the size (and type) of each object right beside the object inside the data vector then. So at least iterating is local.

Could be an option. Might even be a good one. Experiments will tell.

...

I would still use Type Handlers for copy/move. I think it is OK if copy/move is slightly less local.

You could also probably track whether any of the Items have non-trivial copy/move operators. ie in push_back<Item>() increment a nonTrivalCounter. If nonTrivialCounter == 0 on copy/move, then do a memcpy, otherwise use the Handlers to copy.

This is way too complicated. It's unlikely that all objects would be trivial, probably impossible, since the objects needs to have a base class. -Thorsten

Larry Evans

28 Aug 28 Aug

12:51 p.m.

...

<snip> big thread about polymorphic container ... </snip>

I hope I didn't miss anything in the thread. If I'm repeating something, sorry.

I don't think we want or need to limit what we put into the container. Neither by forcing certain virtual functions onto the base class, nor by limiting what the contained types can and can't do (like have pointers to themselves and do funky things in their copy/move constructors).

When push_back<Triangle> is called, we create an instance of an ItemHandler<Triangle> class, that derives from BaseItemHandler, and implements copy/move virtually. ie type-erasure. ItemHandler<Triangle> knows how to move/copy Triangles. It is not Triangle's job!

We put the ItemHandler<Triangle> in a map<RTTI, BaseItemHandler*>.

Note: - we only need *one* ItemHandler<Triangle> for all Triangles, not one for each. - Triangle doesn't have any do_copy/move/size virtual functions, ItemHandler<Triangle> does.

- we can store Triangle mixed in with Squares and other shapes or isolated in a group of Triangles. Maybe different containers will choose differently. - if stored mixed together, then we have a parallel vector that is a pointer to BaseItemHandler* for each item in the main vector.

ie (typed in email, so should not compile)

struct BaseItemHandler { virtual void do_copy(char * dest, char const * src) const = 0; virtual std::size_t size() const = 0; };

template <typename T> struct ItemHandler : BaseItemHandler { virtual void do_copy(char * dest, char const * src) const { T * typed_src = reinterpret_cast<T*>(src); new (dest) T(*typed_src); // in-place copy construct } virtual std::size_t size() const { return sizeof(T); } };

template <typename Item> void classifier::push_back(Item const & item) { char const * key = typeid(Item).name(); // or use raw name or address of type info

// add/get handler to/from map HandlerMap::iterator itHandler = handlerMap.find(key); BaseItemHandler * handler = 0; if (itHandler == handlerMap.end()) { // new type we haven't seen before handler = new ItemHandler<Item>; handlerMap[key] = handler; } else { handler = *itHandler; }

char * space = make_space_for_item(handler->size()); // make space in vector<char> handler->do_copy(space, item); inplace copy construct How does make_space_for_item assure the return value is aligned

On 08/27/13 23:51, Gottlob Frege wrote: properly for an Item type? [snip]

Gottlob Frege

3:37 p.m.

On Wed, Aug 28, 2013 at 8:51 AM, Larry Evans <cppljevans@suddenlink.net>wrote:

...

On 08/27/13 23:51, Gottlob Frege wrote:

...
char * space = make_space_for_item(handler->size()); // make space in vector<char> handler->do_copy(space, item); inplace copy construct

How does make_space_for_item assure the return value is aligned properly for an Item type? [snip]

Well, I guess it would also need some kind of handler->alignment(). make_space_for_item(handler->size(), handler->alignment()); you would look at the end of the vector<char> and see if it aligns or how much you need to fudge there. So add enough for alignment + size and return the aligned address.

Larry Evans

29 Aug 29 Aug

4:08 p.m.

On 08/28/13 10:37, Gottlob Frege wrote:

...

On Wed, Aug 28, 2013 at 8:51 AM, Larry Evans <cppljevans@suddenlink.net>wrote:

...
On 08/27/13 23:51, Gottlob Frege wrote:

...
char * space = make_space_for_item(handler->size()); // make space in vector<char> handler->do_copy(space, item); inplace copy construct

How does make_space_for_item assure the return value is aligned properly for an Item type? [snip]

Well, I guess it would also need some kind of handler->alignment().

make_space_for_item(handler->size(), handler->alignment());

you would look at the end of the vector<char> and see if it aligns or how much you need to fudge there. So add enough for alignment + size and return the aligned address.

This "add enough for alignment+size" is what attachment to: http://article.gmane.org/gmane.comp.lib.boost.devel/243575 does; however, that code doesn't have anything like a handler->alignment() since all the information about the alignment requirements can be gotten from the Item template argument to the push_back member function. HTH. -Larry

Thorsten Ottosen

19 Aug 19 Aug

1:12 p.m.

On 19-08-2013 14:54, Larry Evans wrote:

...

On 08/19/13 06:59, Thorsten Ottosen wrote:

...

...
That sounds like the basic idea. Alignment may not be needed if we assume some kind of maximal alignment. Sizeof may not be needed either if the syntax for adding objects is

cont.push_back<Derived>( x, y );

Using a templated push_back, IIUC, would require some sort of container, such as std::vector<char> cont, which contains contigous storage, and the push_back should find the next location, i_aligned, in std::vector<char> which is at alignment, alignof(Derived). and would then push_back i_aligned chars, then further push_back sizeof(Derived) char's, the new inplace at cont.begin()+i_aligned.

Does that make sense?

Yes. And I think that very often i_aligned would be zero due to the internal padding in classes. -Thorsten

4337

Age (days ago)

4353

Last active (days ago)

List overview

Download

45 comments

8 participants

participants (8)

Andrey Semashev
Ben Pope
Gottlob Frege
Klaim - Joël Lamotte
Larry Evans
Rob Stewart
Santiago Tapia
Thorsten Ottosen