Interest in boost.deepcopy

Just gauging initial interest in a boost.deepcopy library, analogous to deepcopy in Python. Each type would register its deepcopy behaviour with the library (there are similarities to boost.serialize here). Boost.deepcopy would supply out-of-the-box registration for POD types, pointer types, arrays, STL containers and common boost containers. Example:

    template<typename T>
    void deepcopy(deepcopycontext& ctxt, const T& obj) { ...; }

Why not use boost::serialize and serialize into memory and then out again, I hear you say? That's a great approach, but only when every object requires the same deepcopy behaviour (it's also not terribly optimal). The problem is, I need specific behaviour in some types. Specifically, I need copy-on-write behaviour for a Buffer class that I'm using. For example:

    void deepcopy(deepcopycontext& ctxt, const Buf& b)
    {
        if(!ctxt.contains(&b))
        {
            // Doesn't copy the buffer data, just holds a shared ptr internally.
            // The data is only cloned when b/b2 is written to.
            Buf b2(b);
            ctxt.insert(&b, b2);
        }
    }

I attempted to model this a while ago using boost.serialize, but it turned out pretty clunky; I started needing a few decorator classes for special cases, etc.

Apologies for the oversimplified examples above, I just want to gauge interest in the gist of it. Please let me know if this overlaps with existing boost functionality (I'm not aware of anything atm). Feedback appreciated.

Thx,
Allan
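Roughly, the deepcopycontext referred to above might look something like the following. This is purely a sketch to make the proposal concrete - the class and its contains()/insert()/find() members are assumptions extrapolated from the example, not an existing API:

    #include <map>
    #include <boost/make_shared.hpp>
    #include <boost/shared_ptr.hpp>

    // Tracks which originals have already been copied, so that an object shared
    // by several parts of a hierarchy is duplicated exactly once per deep copy.
    class deepcopycontext
    {
    public:
        bool contains(const void* original) const
        {
            return copies_.count(original) != 0;
        }

        // Record the copy made for 'original'.
        template<typename T>
        void insert(const void* original, const T& copy)
        {
            copies_[original] = boost::make_shared<T>(copy);
        }

        // Retrieve a previously made copy, or an empty pointer if there is none.
        template<typename T>
        boost::shared_ptr<T> find(const void* original) const
        {
            std::map<const void*, boost::shared_ptr<void> >::const_iterator it =
                copies_.find(original);
            if (it == copies_.end())
                return boost::shared_ptr<T>();
            return boost::static_pointer_cast<T>(it->second);
        }

    private:
        std::map<const void*, boost::shared_ptr<void> > copies_;
    };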

Consider an object hierarchy in which there may be multiple references to shared objects. A deep copy should give an identical, but entirely separate, copy of the entire structure. Also consider e.g. std::map<std::string,T*>... the std::map copy constructor is not going to create new instances of T in the copy - this is a shallow copy. You could argue that I should write a custom smart pointer that performs a deep copy in its copy constructor, but for multiple reasons I'd rather not do this, and would prefer to use boost::shared_ptr as normal. On Tue, Oct 25, 2011 at 1:13 PM, Steven Watanabe <watanabesj@gmail.com> wrote:
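To spell out the std::map point above with a compilable example (Widget is just a stand-in type):

    #include <cassert>
    #include <map>
    #include <string>

    struct Widget { int value; };

    int main()
    {
        std::map<std::string, Widget*> original;
        original["a"] = new Widget();

        // The map's copy constructor copies the pointer values, not the pointees:
        // both maps end up referring to the very same Widget instance.
        std::map<std::string, Widget*> copy(original);
        assert(copy["a"] == original["a"]);

        delete original["a"];
        return 0;
    }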

On Tue, Oct 25, 2011 at 1:28 PM, GMan <gmannickg@gmail.com> wrote:
Because I don't want to have to introduce a custom shared_ptr class across my whole codebase when a standard one will do? Because I'm binding my library with boost.python and I don't want to have to write a bunch of code to deal with a custom pointer storage type.
That's the standard solution. You can use T instead of T*, or std::unique_ptr<T>, or a boost::ptr_map.
How does either of these suggestions deal with avoiding duplication of shared objects within a hierarchy? Furthermore, a ptr_map won't cut it: my keys also need to be deep-copied, as they are a non-trivial type.
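For reference, here is roughly how the ptr_map suggestion behaves (Widget is a stand-in type): the owned values are cloned on copy, but keys go through their ordinary copy constructors, and any sharing expressed outside the map is simply not visible to it:

    #include <cassert>
    #include <string>
    #include <boost/ptr_container/ptr_map.hpp>

    struct Widget
    {
        explicit Widget(int v) : value(v) {}
        int value;
    };

    int main()
    {
        boost::ptr_map<std::string, Widget> m;
        std::string key("a");
        m.insert(key, new Widget(1));

        // Copying a ptr_map clones each owned Widget (deep for the values)...
        boost::ptr_map<std::string, Widget> m2(m);
        assert(&m2.at("a") != &m.at("a"));

        // ...but keys are copied with their normal copy constructors, and an
        // object referenced from two different places would be cloned twice,
        // i.e. the sharing is not preserved.
        return 0;
    }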

On Mon, Oct 24, 2011 at 7:42 PM, Allan Johns <allan.johns@drdstudios.com> wrote:
Because I don't want to have to introduce a custom shared_ptr class across my whole codebase when a standard one will do?
Well, apparently it's not doing the job, since you'd need to write more utility code anyway. :P If I understand correctly.
I'm not sure they do; I didn't quite understand your case. I feel like either there's an existing, more orthodox approach, or, if this is indeed the best approach, that it's quite situational at best. -- GMan, Nick Gorski

On Tue, Oct 25, 2011 at 1:29 PM, Julien Nitard <julien.nitard@m4tp.org> wrote:
Often but not always. Sometimes a type will want specific behaviour, such as sharing an internal reference with the original object because there is some COW pattern implemented.
Does it work properly if (in your example above) T is a polymorphic class ? If yes, how ?
Yes, and in the same way boost.serialize works... each polymorphic class is registered beforehand.
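As a sketch of what "registered beforehand" could mean here (all names below are hypothetical, by loose analogy with BOOST_CLASS_EXPORT): the library would keep a registry of per-derived-type clone functions keyed on the dynamic type, so a deep copy through a base pointer dispatches correctly without a virtual clone() on the class itself:

    #include <cassert>
    #include <map>
    #include <string>
    #include <typeinfo>

    class Object { public: virtual ~Object() {} };

    // Hypothetical registry: dynamic type name -> function that clones it.
    typedef Object* (*clone_fn)(const Object&);

    std::map<std::string, clone_fn>& clone_registry()
    {
        static std::map<std::string, clone_fn> registry;
        return registry;
    }

    // Registering a derived type ahead of time, much as boost.serialize exports
    // polymorphic classes before archives ever see them.
    template<typename Derived>
    struct deepcopy_registrar
    {
        static Object* clone(const Object& o)
        {
            return new Derived(static_cast<const Derived&>(o));
        }
        deepcopy_registrar() { clone_registry()[typeid(Derived).name()] = &clone; }
    };

    class IntAttribute : public Object { public: int value; };
    static deepcopy_registrar<IntAttribute> register_int_attribute;

    // A deep copy through a base pointer now dispatches on the dynamic type.
    Object* deepcopy_polymorphic(const Object& o)
    {
        std::map<std::string, clone_fn>::const_iterator it =
            clone_registry().find(typeid(o).name());
        assert(it != clone_registry().end() && "type was not registered");
        return it->second(o);
    }

    int main()
    {
        IntAttribute a;
        a.value = 7;
        const Object& base = a;
        Object* copy = deepcopy_polymorphic(base);   // creates a new IntAttribute
        delete copy;
        return 0;
    }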

This definitely sounds like a very worthwhile addition to me. Given that it follows some proven patterns/concepts from boost::serialize (for instance), it should take a lot of the guesswork out of how it should/would work. I'd love to see it added, pls...

Gabe Levy
Senior Software Engineer, HiQ Systems Pty Ltd
WSD, 201L, Edinburgh, SA

...the user code simpler. While the elegance of the usage may be obvious, the question of its performance may arise. So as long as it is implemented in a performance-aware way (i.e. for the best performance possible), I'd have no problem seeing some/many overall benefits.

However, it would then be HIGHLY desirable to support generic allocators with the allocators' type-defined pointer types. That would make it "boost::interprocess" compatible, and that requirement would be very high on my list.

Thank you, and looking forward to something like this.

Regards,
Gabe

Gabe Levy
Senior Software Engineer, HiQ Systems Pty Ltd
WSD, 201L, Edinburgh, SA
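For anyone unfamiliar with the allocator point: Boost.Interprocess allocators expose a non-raw pointer type (offset_ptr), so generic deep-copy code that assumed T* internally would not be usable with its containers. A small compile-time illustration:

    #include <boost/interprocess/allocators/allocator.hpp>
    #include <boost/interprocess/managed_shared_memory.hpp>
    #include <boost/interprocess/offset_ptr.hpp>

    namespace bip = boost::interprocess;

    // An allocator for ints living in a managed shared memory segment.
    typedef bip::allocator<int, bip::managed_shared_memory::segment_manager>
        shm_allocator;

    // Its pointer type is offset_ptr<int>, not int*. Deep-copy machinery that
    // wants to be interprocess-compatible must traverse and rebuild structures
    // through allocator_type::pointer rather than assuming raw pointers.
    typedef shm_allocator::pointer shm_pointer;   // bip::offset_ptr<int>

    int main() { return 0; }   // nothing to run; the typedefs are the point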

No smart pointer; a separate class for each instance of the need for deep copy. You have a missing object in your design that owns the "entire structure" and implements the deep copy as its assignment operator. That object is responsible for de-duplicating pointers to shared objects. Whenever this comes up in whatever guise, a C++ garbage collector or smarter smart pointer, the answer is always the same: get the ownership of memory clear in your design, use RAII to manage memory, and use the assignment operator to make copies. You shouldn't be trying to make sense of cyclical dependencies of raw pointers in the first place. Fix the bad design, don't enable it with a clever library. Regards, Luke
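A minimal sketch of the kind of "missing owner object" being described, with made-up names (Node, Graph): the owner holds every node, and its copy constructor performs the de-duplicating deep copy via a map from original nodes to their copies (assignment then follows via copy-and-swap):

    #include <cstddef>
    #include <map>
    #include <vector>

    struct Node
    {
        int value;
        std::vector<Node*> children;   // may point at nodes shared within the graph
    };

    class Graph
    {
    public:
        Graph() {}

        // Copying the owner *is* the deep copy; each shared node is copied once.
        Graph(const Graph& other)
        {
            std::map<const Node*, Node*> remap;

            // First pass: clone every owned node exactly once.
            for (std::size_t i = 0; i < other.nodes_.size(); ++i)
            {
                Node* copy = new Node(*other.nodes_[i]);  // children still point into 'other'
                nodes_.push_back(copy);
                remap[other.nodes_[i]] = copy;
            }

            // Second pass: redirect child pointers to the new nodes, preserving
            // the sharing structure inside the copy (assumes all children are owned).
            for (std::size_t i = 0; i < nodes_.size(); ++i)
                for (std::size_t j = 0; j < nodes_[i]->children.size(); ++j)
                    nodes_[i]->children[j] = remap[nodes_[i]->children[j]];
        }

        Graph& operator=(Graph other)   // copy-and-swap
        {
            nodes_.swap(other.nodes_);
            return *this;
        }

        ~Graph()
        {
            for (std::size_t i = 0; i < nodes_.size(); ++i)
                delete nodes_[i];
        }

        // The graph owns every node it hands out.
        Node* add(int value)
        {
            nodes_.push_back(new Node());
            nodes_.back()->value = value;
            return nodes_.back();
        }

    private:
        std::vector<Node*> nodes_;
    };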

In my case I have several data types - tables, buffers, tuples, attributes etc. I need to be able to clone (i.e. deepcopy) any part of one of these hierarchical structures... there is no "entire structure", if you will.

I understand what's being said about memory ownership, but in this case I have full control of my problem domain, and such a generic deep copy library would be useful and save time - otherwise I'm just going to have to implement deep copy behaviour inside all of my classes anyway (which is actually what I have at the moment). This pattern has come up several times before in my work, so it isn't a one-off, and the motivation is not to deal with cyclic dependencies (although that should probably be dealt with).

Perhaps there should be a 'deep copy context' that you can create for your own code or share from other libraries, so that, for example, one library's idea of what "deep copying" an std::vector means can differ from another library's. Would this address your concern over ambiguity of memory ownership?

Given that this behaviour is implemented as a standard module in another language (Python), I'm surprised it's being dismissed so easily?

Allan

ps -- I'm not a python programmer, just wanted to make that clear! Well, I do use it, but... you know what I mean.

On Wed, Oct 26, 2011 at 8:33 AM, Simonson, Lucanus J <lucanus.j.simonson@intel.com> wrote:

On Tue, Oct 25, 2011 at 3:54 PM, Allan Johns <allan.johns@drdstudios.com> wrote:
It would help *me* if you gave a more concrete example of why this is useful...I feel like you're speaking in generalities that I'm having trouble connecting with :/ And, if it helps me, it might help others. - Jeff

Sure. Consider a (simplified) subset of the data types in my case:

    class Object {};

    class AttributeBase : public Object {};

    template<typename T>
    class Attribute : public AttributeBase
    {
        shared_ptr<T> value;
    };

    template<typename K>
    class Table : public Object
    {
        std::map<K, shared_ptr<Object> > map;
    };

    template<typename T, typename K>
    class Buffer : public Object
    {
        shared_ptr<std::vector<T> > data;
        shared_ptr<Table<K> > attributes;   // in practice holds AttributeBase entries
    };

These objects are often arranged in a hierarchy - a table contains attributes, other tables, or buffers. An attribute in turn contains a shared_ptr to a POD type, and a buffer contains data in a std::vector (which may become a shared resource due to COW behaviour) and also contains a table of attributes. There may be cyclic dependencies that I can't do anything about (the user has created the hierarchy, and I can't suffer the cost of cycle detection at runtime).

I need to create a separate copy of some subset of this hierarchy, and this is initiated by the user - it might be a buffer, a table, an attribute, etc. At the moment this is achieved by each class containing a virtual clone() function that implements the "deep" copy. This function is passed a "cloning context" instance, which is a map of newly allocated instances, so that duplicates (i.e. shared objects) are avoided. There is a special case (Buffer) where a "clone" operation does not result in an identical and separate copy of the data; instead, the internal std::vector is shared with the new Buffer until such time as either is written to (COW).

Thus I have three implementations of clone() in this example, and I'm using a virtual function. If the deepcopy library that I'm describing existed, then two things would happen:
- I'd drop the virtual.
- I'd just have deepcopy specializations for Table, Buffer etc.

In the 'typical' cloning cases (everything but Buffer), the implementation would be a one-liner, deferring deep copying of the already-supported types (shared_ptr, std::map) to the existing deepcopy implementations. The deepcopy library would take care of the object tracking side of things (avoiding duplicates etc), much like boost.serialize does currently. In the trivial case (i.e. one where there is no specific deepcopy behaviour like COW) the end result would be the same as boost.serializing the original hierarchy to memory, then back out again.

hth
A

On Wed, Oct 26, 2011 at 10:04 AM, Jeffrey Lee Hellrung, Jr. <jeffrey.hellrung@gmail.com> wrote:
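To make the Table/Buffer cases above concrete, here is roughly what those specializations might look like. This is illustrative only and won't compile as-is: it leans on the hypothetical deepcopycontext from the first post, on library-supplied overloads that don't exist, and it assumes the members above are accessible (e.g. via friendship):

    // Assumed to be supplied by the library out of the box:
    //   template<typename T> void deepcopy(deepcopycontext&, const shared_ptr<T>&);
    //   template<typename K, typename V> void deepcopy(deepcopycontext&, const std::map<K,V>&);

    // Typical case (Table): a one-liner deferring to the built-in overloads,
    // which handle keys, shared_ptrs and duplicate tracking.
    template<typename K>
    void deepcopy(deepcopycontext& ctxt, const Table<K>& t)
    {
        deepcopy(ctxt, t.map);
    }

    // Special case (Buffer): the data is shared with the copy (COW) rather than
    // cloned; only the attribute table is deep-copied.
    template<typename T, typename K>
    void deepcopy(deepcopycontext& ctxt, const Buffer<T, K>& b)
    {
        if (!ctxt.contains(&b))
        {
            Buffer<T, K> b2(b);            // shares 'data'; cloned lazily on first write
            deepcopy(ctxt, b.attributes);  // attributes still get the full treatment
            ctxt.insert(&b, b2);
        }
    }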

on Tue Oct 25 2011, Allan Johns <allan.johns-AT-drdstudios.com> wrote:
Pickling is considered "evil" in the BuildBot project precisely because you can't control the boundaries (http://irclogs.jackgrigg.com/irc.freenode.net/buildbot/2011-05-09), though I realize it's sometimes convenient. Seems to me that you could tackle this need by implementing a special "cloning_archive" type for Boost.Serialization, and be done with it. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Wed, Oct 26, 2011 at 10:17 AM, Dave Abrahams <dave@boostpro.com> wrote:
I very much wanted to do exactly this, Dave, and initially I did. But it didn't really work out, especially taking custom deepcopy behaviour like COW into account. I did end up with something that worked, but it was pretty clunky.

One example: boost.serialize code already exists for std::vector etc. However, in cases where assignment is equivalent to a deep copy (such as std::vector<int>) the actual data serialization had to be skipped. So I needed an "assignment_is_deepcopy" trait for all types involved.

Then there were temporaries to worry about. Existing class serialization code can copy data to a temporary variable, then serialize from there. This will break "clone" serializing - you need to somehow tell the archive that the data in this case is temporary and needs to be copied in full, rather than just serializing a reference, which is the typical case and necessary for optimised performance (simply copying all data is not acceptable - too expensive). So a tmp<> decorator class was needed, which meant that the solution wasn't extensible to existing serialization code from other libraries.

There is also the unnecessary overhead involved. Boost.serialization stores tracking data that isn't necessary when you don't need to worry about persistence (for example, class ids etc). Not to mention the ubiquitous boost::serialize::nvp, which is unnecessary here.

thx
A
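For the curious, the "assignment_is_deepcopy" trait mentioned above might have looked something like this minimal sketch (names taken from the description; nothing here is part of Boost):

    #include <vector>

    // Default: assume assignment is NOT a sufficient deep copy.
    template<typename T>
    struct assignment_is_deepcopy
    {
        static const bool value = false;
    };

    // For value types with no indirection, plain assignment already deep-copies.
    template<>
    struct assignment_is_deepcopy<int>
    {
        static const bool value = true;
    };

    // A container of such a type can also be copied by assignment alone.
    template<typename T>
    struct assignment_is_deepcopy<std::vector<T> >
    {
        static const bool value = assignment_is_deepcopy<T>::value;
    };

    // A cloning archive could then branch on the trait, e.g.:
    //   if (assignment_is_deepcopy<T>::value) { dst = src; }
    //   else                                  { /* recurse member-wise */ }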

on Mon Oct 24 2011, Allan Johns <allan.johns-AT-drdstudios.com> wrote:
Just gauging initial interest in a boost.deepcopy library
<pet peeve alert> Sorry to slam this idea out of the gate, but the whole notion of a "deep copy" is broken and wrong (in my humble opinion). When you say you're going to "deep copy" an object it shows you don't understand the boundaries of that object's value. The object's value is copied by its copy constructor, and compared by its operator== (assuming it has one). If your "deep copy" extends beyond the boundaries of the value, there's no way of knowing how far it should extend. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Dave Abrahams <dave@boostpro.com> writes:
Fascinating discussion. I'm curious about your statement here. Do you object to "deep copy" as implemented in a copy constructor or the notion of "deep copy" of some aggregate data structure in general? For example, I have often needed to clone some branch of a tree data structure where nodes contain pointers to other nodes. For me this is in the context of a compiler and subtree cloning is a convenient way to perform code duplication. In the past I have implemented a virtual clone() member for each kind of tree node in order to get polymorphic behavior. clone() clones child nodes and all that. If your objection covers cases like the above, I am very interested in alternative solutions. Thanks! -Dave
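The virtual clone() pattern being described usually looks something like this (Expr/IntLiteral/Add are just illustrative node types):

    // Each node kind clones itself and recursively clones its children.
    struct Expr
    {
        virtual ~Expr() {}
        virtual Expr* clone() const = 0;
    };

    struct IntLiteral : Expr
    {
        explicit IntLiteral(int v) : value(v) {}
        IntLiteral* clone() const { return new IntLiteral(*this); }   // covariant return
        int value;
    };

    struct Add : Expr
    {
        Add(Expr* l, Expr* r) : lhs(l), rhs(r) {}
        ~Add() { delete lhs; delete rhs; }
        Add* clone() const { return new Add(lhs->clone(), rhs->clone()); }
        Expr* lhs;
        Expr* rhs;
    };

    int main()
    {
        Add original(new IntLiteral(1), new IntLiteral(2));
        Expr* copy = original.clone();   // duplicates the whole subtree
        delete copy;
        return 0;
    }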

on Wed Nov 02 2011, greened-AT-obbligato.org (David A. Greene) wrote:
I object to the terms "shallow copy" and "deep copy," because they're imprecise, and because they tend to confuse the notion of copying (full stop), which is fundamental and well-defined. I want to be able to talk about "copying this list of pointers" without someone saying "wait, do you mean a deep or a shallow copy?" And when they do say that, what do you suppose "shallow" and "deep" mean? It isn't clear:

* shallow might mean that the two lists share storage, so that changing the first pointer in the original list changes the first pointer in the copy.
* shallow might mean merely that no effort is made to clone the objects being pointed to by the list elements (i.e. ordinary std::list<T*> semantics).

So then I need to ask for a definition of shallow/deep. The whole thing is a mess that arises all the time in languages with mutation but without an intrinsic notion of value semantics, but there's no reason we should go into that territory in C++.
Awesome. Cloning a subtree is a well-defined operation. If nodes in the tree happen to contain pointers that aren't part of the tree's parent/child/sibling link structure, I know they get copied bitwise and that's the end of it.
It might make sense to create a class that encapsulates a subtree and whose copy ctor implements the clone operation. Then again, it might not; depends on your application. -- Dave Abrahams BoostPro Computing http://www.boostpro.com
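For completeness, the last suggestion - a class that encapsulates a subtree and whose copy constructor is the clone operation - might look roughly like this (Node/Subtree are illustrative names, and a strict tree with no shared nodes is assumed):

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    struct Node
    {
        int value;
        std::vector<Node*> children;
    };

    // Owns a subtree; copying the wrapper clones the whole subtree, so the
    // ordinary copy semantics of Subtree *are* the "clone subtree" operation.
    class Subtree
    {
    public:
        explicit Subtree(Node* root) : root_(root) {}
        Subtree(const Subtree& other) : root_(clone_nodes(other.root_)) {}

        Subtree& operator=(Subtree other)   // copy-and-swap
        {
            std::swap(root_, other.root_);
            return *this;
        }

        ~Subtree() { destroy(root_); }

        Node* root() const { return root_; }

    private:
        static Node* clone_nodes(const Node* n)
        {
            if (!n) return 0;
            Node* copy = new Node();
            copy->value = n->value;
            for (std::size_t i = 0; i < n->children.size(); ++i)
                copy->children.push_back(clone_nodes(n->children[i]));
            return copy;
        }

        static void destroy(Node* n)
        {
            if (!n) return;
            for (std::size_t i = 0; i < n->children.size(); ++i)
                destroy(n->children[i]);
            delete n;
        }

        Node* root_;
    };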
participants (9)
- Allan Johns
- Dave Abrahams
- GMan
- greened@obbligato.org
- Jeffrey Lee Hellrung, Jr.
- Julien Nitard
- Levy, Gabriel (Contractor)
- Simonson, Lucanus J
- Steven Watanabe