
SUMMARY
=======

I would like to summarize the discussion about copy-on-write so far:

* I wrote a copy-on-write pointer implementation
  <https://github.com/ralphtandetzky/cow_ptr.git>
  with the following use cases:
    1. A const-correct pimpl pointer.
    2. A helper class for implementing copy-on-write for higher-level
       structures where copying is expensive (e.g. matrix or image
       classes).
    3. cow_ptr<Base> wraps polymorphic classes, giving them genuine
       value semantics, so they can be put into standard containers
       even if Base is abstract.
    4. It can be used to add cloning to a class hierarchy
       non-intrusively.

* Thread-safety was discussed. (brought up by Alexey)
    - The reference counter is atomic.
    - All const operations on a cow_ptr and its pointee are thread-safe
      as long as const operations on the pointee are thread-safe.
    - If const operations on the pointee are thread-safe, and if it is
      safe to write to a pointee from one thread as long as no one else
      is reading or writing, then the same holds for each individual
      cow_ptr pointing to that object and for all access through these
      cow_ptrs.

* Is cow still necessary? (brought up by Mathias)
    - Since C++11 lets you move objects cheaply instead of copying them,
      an important use case of copy-on-write is gone. Before move
      semantics, returning objects by value was sometimes a serious
      performance problem. Copy-on-write solved that.
    - Cow is still useful today for matrix classes, image classes or
      even trees that share state under the hood but should not
      influence each other when written to.
    - Example: If you want to implement a property tree, you can use
      the approach

        class PropertyTree : public AbstractProperty
        {
        public:
            /* implementation of public interface. */
        private:
            std::list<cow_ptr<AbstractProperty>> properties;
        };

    - Having this you can keep a history of a big property tree in
      memory easily.

        std::vector<PropertyTree> history; // assumed to hold at least one entry
        auto current = history.back();
        current.modify();
        history.push_back( current );

* Is COW unsafe?
  (brought up by Mathias)
    - COW is sometimes considered unsafe. That's why the C++11 standard
      effectively prohibits COW implementations of std::string.
    - The code

        std::string a("Hello world!");
        char * p = &a[11];
        std::string b( a );
        *p = '.'; // modifies a and b, if std::string was implemented using COW.

      does not work correctly for COW implementations of std::string.
    - The reason this does not work is the escaped pointer. When
      escaping pointers are strictly avoided, this effect cannot happen.
      Therefore cow_ptr does not provide a non-const version of the
      get() member function, but a member function modify() (formerly
      known as apply()) which can be used in the following way:

        cow_ptr<MyType> p( new MyType );
        auto q = p;
        p.modify( [&]( MyType * p ){ p->doSomething(); p->doSomethingElse(); } );
        COW_MODIFY(p) { p->doSomething(); p->doSomethingElse(); }; // equivalent to the line above

    - It is still possible for a pointer to escape, but the interface is
      designed to be easy to use correctly and hard to use incorrectly.
    - The interface design of std::string makes a correct COW
      implementation impossible. Hence COW must be considered during
      the interface design phase of a class.

* Alternatives to COW (brought up by Mathias)
    - C++11 move and cloning.
        - Most often unnecessary copies can be avoided using C++11
          move semantics and cloning where necessary.
    - Flyweight factory.
        - Objects are accessed by a hash value. There is always only
          one copy of identical objects. For complex objects that are
          modified often, recalculating the hash and synchronizing the
          hash table thread-safely can be a serious performance
          bottleneck.
    - shared_ptr<T const>
        - Even with shared_ptr<T const> you never know whether there
          is a (non-const) shared_ptr<T> through which the pointee is
          modified. shared_ptrs are really shared. It is likely more
          error-prone to use shared_ptr to implement COW. If T is a
          polymorphic class but does not have a clone() member
          function, then cloning will not work properly because of
          slicing.
          shared_ptr is useful for many things, but it's probably not
          the best tool to implement COW.

* The name (brought up by Peter)
    - cow is an acronym and lower case. It's a farm animal ... enticing
      me to write member function names like "moo". The name does not
      reflect the ability to contain polymorphic value pointers. (Peter)
    - clone_on_write<T> would be a suggestion of mine. It might be
      useful to drop the _ptr suffix completely, since the class has
      value semantics.
    - Others have suggested splitting cow_ptr<T> into read_ptr<T> and
      write_ptr<T> classes.

* Slicing problems (brought up by Vincente)
    - The constructor taking a Y * pointer might lead to slicing
      problems if the pointee is not a Y object but something derived
      from Y.
    - The default_copier will make a runtime check
      assert( typeid(*p) == typeid(Y) ).

* Comparison to adobe::copy_on_write<T>
  <http://cppnow.org/session/value-semantics-and-concepts-based-polymorphism/>
  (brought up by Andreas)
    - This class is constructed by moving a T object into itself.
      Copying is implemented as a cheap copy of a pointer with reference
      counting.
    - Other than constructors, the destructor and assignment operators
      there are only the public member functions read() and write().
      read() returns a const reference to the contained object; write()
      makes an internal copy if the reference count is greater than 1,
      and then returns a non-const reference to the contained object.
    - The class does not support cloning for polymorphic T, but always
      uses the copy constructor of T in order to copy.
    - Hence the class interface is extremely simple.

* Comparison to value_ptr<T>
  <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3339.pdf>
  in N3339 (open-std) (brought up by Vincente)
    - Basic properties: A value_ptr<T> mimics the value semantics of
      its pointee.
      Hence the pointee lifetime is the pointer lifetime, and the
      pointee is copied whenever the pointer is copied. Internally the
      pointee can be of a class derived from T. In this case the object
      is cloned properly.
    - Hence value_ptr<T> covers use cases 1, 3 and 4 of cow_ptr<T>,
      but does not implement copy-on-write (use case 2).
    - value_ptr has the cloner and the deleter as template arguments of
      the class. The current implementation of cow_ptr only has the
      pointee type as template parameter. The cloner and deleter are
      stored dynamically.
    - value_ptr does not have a reference counter.
    - Other than that, the public interfaces of value_ptr<T> and
      cow_ptr<T> are extremely similar.
    - In conjunction with copy_on_write<T> this can be used to do the
      same things as cow_ptr<T> does. The way to use it would be
      copy_on_write<value_ptr<T>>.

* Pointer semantics or value semantics, and nullptr
  (brought up by Vincente)
    - Should the COW class be nullable? If not, then it should probably
      not be called cow_ptr.
    - This question has not been discussed to the end yet. Personally,
      I don't think that null cow_ptrs are very useful.

* Different member and non-member functions (brought up by Vincente)
    - relational operators (brought up by Vincente)
        - It is not clear whether operator==() on cow_ptrs should only
          compare pointers or also pointees. This would depend on
          whether the COW class is considered a pointer or a genuine
          value.
    - release()
        - Should not exist, because the caller would not know which
          deleter to call. (similar to shared_ptr)
    - reset()
        - Will be implemented in order to provide the performance
          benefits.

* The write_ptr<T> and read_ptr<T> solution (brought up by Peter)
    - read_ptr<T> would be similar to shared_ptr<T const> and
      write_ptr<T> would be a unique_ptr<T> equivalent. read_ptr<T> has
      a member function which returns a write_ptr<T> through which the
      pointee can be modified.
      Afterwards the write_ptr<T> can be moved back into the
      read_ptr<T>:

        read_ptr<T> pr;
        if ( write_ptr<T> pw = pr.write() )
        {
            pw->modify();
            pr = std::move( pw );
        }

    - This possibly provides a better separation of concerns (safer,
      clearer, more flexible).
    - However, the above code is not exception-safe if pr becomes a
      nullptr when the write() function is called. It makes
      exception-safe code harder to write.
    - In case the use_count is greater than 1: Should pr.write() make
      the copy? Or should pw.operator->() make the copy? This is not
      sufficiently discussed yet.

Thank you for all your constructive feedback!
Ralph
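
For readers who want to see the copy-on-write step behind a
modify()-style interface in isolation, here is a minimal sketch. It is
not the cow_ptr implementation from the repository: the class toy_cow
and the struct Image are hypothetical names made up for illustration, the
sketch is built on std::shared_ptr for brevity, it assumes T is
copy-constructible and not used polymorphically (so no cloner is
needed), and it is only meant for single-threaded use, since checking
use_count() is not a reliable test under concurrent access.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical, simplified stand-in for a COW handle; NOT the real cow_ptr.
// Assumes T is copy-constructible and non-polymorphic; single-threaded only.
template <typename T>
class toy_cow
{
public:
    explicit toy_cow( T value )
        : p_( std::make_shared<T>( std::move( value ) ) ) {}

    // Read access never copies and only hands out a pointer to const.
    const T * get() const { return p_.get(); }

    // Write access: detach from the other owners first, then run f.
    // The non-const pointer is only visible inside f.
    template <typename F>
    void modify( F f )
    {
        if ( p_.use_count() > 1 )               // shared: make a private copy
            p_ = std::make_shared<T>( *p_ );
        f( p_.get() );
    }

private:
    std::shared_ptr<T> p_;
};

struct Image { int pixel = 0; };                // hypothetical pointee type

// Returns true if writing through one handle leaves its copy untouched.
bool copies_are_independent()
{
    toy_cow<Image> a{ Image{} };
    toy_cow<Image> b = a;                       // cheap: shares the pointee
    const bool shared_before = ( a.get() == b.get() );

    a.modify( []( Image * img ){ img->pixel = 42; } );

    return shared_before
        && a.get()->pixel == 42
        && b.get()->pixel == 0
        && a.get() != b.get();
}
```

Because the only route to a non-const pointer is the callable passed to
modify(), the copy always happens before any write, which is the
property the std::string example above lacks.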