Re: [boost] [smart ptr] Any interest in copy-on-write pointer for C++11?

9 Feb 2013


On 02/09/2013 07:01 AM, Mathias Gaunard wrote:
...
On 08/02/13 18:41, Ralph Tandetzky wrote:
...
Copy-on-write is the most important use case of the class design. Even
larger libraries like Qt use copy-on-write for some types (like QImage
or QPixmap), which makes their interfaces much easier to use.
That's because Qt is broken, and is written to support broken C++ 
programming practices.
If you only copy when you need to copy, then there is no need for the 
COW mechanism. COW was invented to work around issues with code that 
copied but didn't actually need to.
COW also has the nasty effect that your real object is gonna change 
its address just because you modified it. For this reason the C++11 
standard prevents implementations of standard components like 
std::string from using that mechanism.
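The address change described above can be seen in a minimal sketch. It assumes a hypothetical cow_holder<T> built on std::shared_ptr (not any class proposed in this thread) and single-threaded use, since use_count() is not a reliable uniqueness test under concurrency:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal copy-on-write holder, for illustration only:
// copies share one heap object until someone asks for write access.
// Assumes single-threaded use (use_count() is treated as exact).
template <class T>
class cow_holder {
public:
    explicit cow_holder(T value) : p_(std::make_shared<T>(std::move(value))) {}

    // Read access never copies.
    const T& read() const { return *p_; }

    // Write access: if the object is shared, clone it first ("detach").
    // This is exactly where the written-to object gets a new address.
    T& write() {
        if (p_.use_count() > 1)
            p_ = std::make_shared<T>(*p_);
        return *p_;
    }

private:
    std::shared_ptr<T> p_;
};
```

Usage: after `cow_holder<std::string> b = a;` both refer to the same string; calling `a.write()` gives `a` a fresh copy at a new address, while `b` keeps the original object.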
Obviously, COW was invented for code that copied but didn't need to. And 
that's still a valid reason to use COW today in C++11. Yes, you can move 
objects and avoid a copy. Or you can swap cheaply, if you need to. But 
sometimes you have to copy simply because you don't know for sure 
whether the reference count is 1.

By that reasoning the STL is just as broken. For example, 
std::vector<T>::push_back() might change the address of the contained 
data and therefore invalidate all pointers and iterators to it. That's 
still a source of many bugs, unfortunately, especially for newbies in 
C++. A COW implementation of std::string can have the following weird effect:
         std::string a("Hello world!");
         char & p = a[11];
         std::string b( a ); // makes a cheap copy
         p = '.'; // modifies a and b
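To make the aliasing concrete, here is a sketch of a deliberately naive copy-on-write string. The class naive_cow_string is hypothetical. It detaches only at the moment a reference is handed out, so a copy made afterwards shares the buffer that the escaped reference still points into:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Deliberately naive copy-on-write string (hypothetical, for illustration).
// Copies share one buffer; mutable access detaches only at the moment the
// reference is handed out, and nothing stops later copies from sharing it.
class naive_cow_string {
public:
    explicit naive_cow_string(const char* s)
        : p_(std::make_shared<std::string>(s)) {}

    // Mutable access: detach if currently shared, then return a reference.
    // The returned reference escapes and is not tracked afterwards.
    char& operator[](std::size_t i) {
        if (p_.use_count() > 1)
            p_ = std::make_shared<std::string>(*p_);
        return (*p_)[i];
    }

    const std::string& str() const { return *p_; }

private:
    std::shared_ptr<std::string> p_;  // shared between copies
};
```

Production implementations needed extra bookkeeping, for example marking a buffer as unshareable once a mutable reference had been handed out, to avoid exactly this.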
The reason is the escaping reference. The same effect can be achieved 
with QImage because the interface allows escaping pointers to the 
contained image data. The cow_ptr<T> design I'm proposing tries to avoid 
this pitfall. There's a const version of the member function get(), but 
not a mutable one, since that would let a pointer escape instantly. The 
only function that lets a non-const pointer escape directly is 
operator->(), which is reasonable, since it is unlikely for a user of 
the class to write
         T * raw = ptr.operator->();
To still make it easy to modify the pointed-to object, you can use the 
cow_ptr member function apply(), which takes a functor that receives a 
raw pointer to T, or the macro COW_APPLY, as follows:
         auto ptr = make_cow<Type>( /* constructor arguments */ );
         ptr.apply( [&]( Type * p ) { p->modify(); } ); // equivalent to ptr->modify();
         COW_APPLY(ptr) { ptr->modify(); }; // does the same
If you really want to, you can still let a pointer or a reference to the 
pointed-to object escape. But it's harder. This way the interface is 
easy to use correctly and hard to use incorrectly. If you use it 
correctly (i.e. don't let references escape), then it does not suffer 
from the above problem with the COW implementation of std::string, and I 
believe it's safe.
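A minimal sketch of the interface described above follows. The names cow_ptr, make_cow, get() and apply() come from this post, but the implementation is an assumption: it is built on std::shared_ptr, copies through T's copy constructor (no polymorphic cloning), omits the COW_APPLY macro, and treats use_count() as exact, which only holds in single-threaded code:

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Sketch of a cow_ptr<T> along the lines described in the post.
// Assumptions: std::shared_ptr underneath, copies via T's copy constructor,
// single-threaded use of the use_count() check.
template <class T>
class cow_ptr {
public:
    explicit cow_ptr(std::shared_ptr<T> p) : p_(std::move(p)) {}

    // Read-only access: never copies, and only a const pointer escapes.
    const T* get() const { return p_.get(); }
    const T& operator*() const { return *p_; }
    const T* operator->() const { return p_.get(); }

    // Mutable operator->: detaches first, so the non-const pointer it
    // returns refers to an object no other cow_ptr shares at that moment.
    T* operator->() { detach(); return p_.get(); }

    // Preferred way to modify: the raw pointer is only visible inside f.
    template <class F>
    void apply(F&& f) {
        detach();
        std::forward<F>(f)(p_.get());
    }

private:
    void detach() {
        if (p_.use_count() > 1)
            p_ = std::make_shared<T>(*p_);  // deep copy of the shared object
    }
    std::shared_ptr<T> p_;
};

template <class T, class... Args>
cow_ptr<T> make_cow(Args&&... args) {
    return cow_ptr<T>(std::make_shared<T>(std::forward<Args>(args)...));
}
```

With this sketch `ptr->modify()` still works through the non-const operator->, detaching before the call, while get() only ever hands out a const T*.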
...
...
A major reason is that value semantics are easier to reason about than
reference semantics. For this purpose cow_ptr<T> is an enabler. Moving
isn't always sufficient.
Value semantics do not require COW; there is no logical connection between 
your statements. Value semantics mean that when you modify a copy of an 
object, then the original is left unmodified. To achieve this you can 
either copy when requested or delay it until you're actually modifying.
Sorry, my statement was incomplete. The goal is to avoid unnecessary 
copies, because copying can be really expensive (think of the data 
inside an image or matrix class). I would prefer a matrix class with 
value semantics instead of reference semantics. It's easier to reason 
about. And I would like to be able to write
         Matrix a, b(1000,1000);
         a = b;
without having to fear that the whole 1000x1000 matrix is deeply copied.
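One way such a Matrix could delay copying internally is sketched below. The class and its set() and shares_storage_with() members are hypothetical, and std::shared_ptr stands in for cow_ptr to keep the example self-contained (single-threaded use assumed):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical Matrix with value semantics and cheap copies: the element
// storage is shared until one of the copies is written to.
class Matrix {
public:
    Matrix() : Matrix(0, 0) {}
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols),
          data_(std::make_shared<std::vector<double>>(rows * cols, 0.0)) {}

    // Reading never copies.
    double operator()(std::size_t r, std::size_t c) const {
        return (*data_)[r * cols_ + c];
    }

    // Writing copies the storage first if another Matrix still shares it.
    void set(std::size_t r, std::size_t c, double v) {
        if (data_.use_count() > 1)
            data_ = std::make_shared<std::vector<double>>(*data_);
        (*data_)[r * cols_ + c] = v;
    }

    // For illustration only: do two matrices currently share storage?
    bool shares_storage_with(const Matrix& other) const {
        return data_ == other.data_;
    }

private:
    std::size_t rows_, cols_;
    std::shared_ptr<std::vector<double>> data_;
};
```

Client code only sees value semantics: `a = b` copies a pointer, and the million-element copy happens only if one of the two matrices is later modified.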
...
The simplest way to deal with this is to simply copy when asked to 
copy by the user, which is not only straightforward and keeps a good 
separation of concerns, but also means you don't need the delaying 
mechanism and the overhead attached to it.
KISS.
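For comparison, the approach suggested here needs no reference count at all: a plain value type copies when asked, and std::move transfers the storage when the source is no longer needed. PlainMatrix is a hypothetical name used only for this sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical plain value-type matrix: copying always copies the
// elements; there is no reference count and no delayed copy.
class PlainMatrix {
public:
    PlainMatrix() = default;
    PlainMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    double operator()(std::size_t r, std::size_t c) const {
        return data_[r * cols_ + c];
    }
    void set(std::size_t r, std::size_t c, double v) { data_[r * cols_ + c] = v; }

    // Exposed only so the sketch can show whether storage was copied.
    const double* storage() const { return data_.data(); }

private:
    std::size_t rows_ = 0, cols_ = 0;
    std::vector<double> data_;
};
```

Here `PlainMatrix a = b;` always performs the deep copy, while `PlainMatrix c = std::move(b);` reuses b's buffer; the caller decides which one is meant.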
For client code of a class that uses cow_ptr data members internally, it 
is even easier: it need not worry about avoiding copies, because the class 
avoids them automatically. cow_ptr helps to implement that behaviour.
...
...
The class design allows you to copy objects polymorphically. I tried it.
It works. It's useful.
It would work just as well without COW. COW does not affect observable 
behaviour, it's purely an implementation-specific detail.
It should not leak into the interface.
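The claim that polymorphic copying does not need COW can be illustrated with a deep-copying holder that remembers how to clone the dynamic type it was constructed with. poly_value, Shape and Circle are hypothetical names for this sketch, which sticks to C++11 features:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical deep-copying holder for polymorphic objects: copying a
// poly_value copies the held object with its dynamic type. No reference
// count and no copy-on-write are involved.
template <class Base>
class poly_value {
public:
    template <class Derived>
    explicit poly_value(Derived d)
        : p_(new Derived(std::move(d))),
          clone_([](const Base& b) -> std::unique_ptr<Base> {
              // Safe downcast: this cloner is only ever paired with a Derived.
              return std::unique_ptr<Base>(new Derived(static_cast<const Derived&>(b)));
          }) {}

    poly_value(const poly_value& other)
        : p_(other.clone_(*other.p_)), clone_(other.clone_) {}

    poly_value& operator=(const poly_value& other) {
        poly_value tmp(other);  // copy first, then swap in
        std::swap(p_, tmp.p_);
        std::swap(clone_, tmp.clone_);
        return *this;
    }

    Base* operator->() { return p_.get(); }
    const Base* operator->() const { return p_.get(); }

private:
    std::unique_ptr<Base> p_;
    std::unique_ptr<Base> (*clone_)(const Base&);  // knows the dynamic type
};

// Hypothetical example hierarchy (Base needs a virtual destructor).
struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const = 0;
    virtual double size() const = 0;
    virtual void scale(double f) = 0;
};

struct Circle : Shape {
    explicit Circle(double r) : r_(r) {}
    std::string name() const override { return "circle"; }
    double size() const override { return r_; }
    void scale(double f) override { r_ *= f; }
    double r_;
};
```

Each copy clones immediately; COW could be layered on top of this, but the polymorphic behaviour itself does not depend on it.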
It can be an implementation-specific detail and can make code faster 
without the client code knowing about it. But it can also be a thing the 
client code relies upon as with the Matrix class above. Sometimes client 
code might want to know whether copies can be made cheaply.
...
...
I used it in production code.
That just means the code does what you need and is stable enough to 
work reliably in your use cases.
This doesn't say anything about the quality of the design.
At least I could convince you that there are legit use cases. That was 
harder than I expected.