Re: [boost] [submission] STL-compliant containers with copy-on-write

31 Mar 2011

      2011/3/31 Mathias Gaunard <mathias.gaunard@ens-lyon.org>
...
On 31/03/2011 21:19, Alexandr Sterligov wrote:
1. Cache returns cached objects "by value" - this means that objects are
...
copied.
Returning by value does not necessarily mean copying.
It can mean "nothing" (if the returned object is local to the function and
is used to initialize another object) or "move" (if the returned object is
local to the function or is moved from elsewhere).
...
-: Cache is read 100-1000 times more often then written. That's usual. We
...
can avoid 100-1000 copies.
+: almost no synchronization is needed.
2. Cache returns pointer or reference. No copying.
+: no copying, no memcpy...
-: very hard synchronization in case if cache is written. Pointer from
cache
is finally put to another thread (GUI even loop), so only const pointer or
reference may be returned from cache and it makes code in the GUI more
complex - we need to copy data before we'd like to change anything. It
starts to look like "external" COW.
Which is probably really what you want. A COW layer built on top.
Actually, your whole cache idea really looks like Boost.Flyweight, except
the latter purposely disallows mutability.
Immutability of stored object is reason why it cannot be used as cache. And
flyweight pattern is not for such cases.

Scenario:
We have list of appointments. Each appointment consist of several fields:
creator, subject, time, notes, list of attachements and some other stuff.
Creator may change some of the fields, these changes are delivered to all
participants of the appointment through network communication.

User goes to the list of appointment. GUI requests business-logic for it.
There is no object in the cache, server communication is done and
appointment list is received and stored to the cache.
User goes somewhere form the list of appointments and then goes back. GUI
requests business logic again and cached value is received again. But before
it is fully displayed on the screen, bottom layer thread which polls server
invalidates appointment list, as some of the values has been changed by
creator. It happens very seldom.

In following scenario it is nice to use move semantics, and when cache is
invalidated just remove value from the cache, shared_ptr will delete data
which is now in the GUI thread. But here comes the problem, GUI sometimes
would like to change data as well - for example apply filter, or sort it, or
change binary attachement. Full list of appointments and attachements may be
pretty large and both GUI and server side change is really seldom. Good
solution is to move, but then if there is any need to change - copy it.
That's exactly copy-on-write. It may be implemented in the container itself
or on the top like you've noticed. If it is implemented on the top there
will be a lot of logic duplicates all over the code. In my case:
appointments, messages, call history, medical measurements and other medical
data. With COW inside container I've implemented and tested it once and
don't need to take care about move semantics (values inside cache depends
from what's happening in the GUI - not nice...)
...
Another situation is inside high-performance streaming library (TVoIP
...
conference) - there are chain of filters which process media packets in
conveyor manner. Filters are developed separately and may work inplace, or
defer packets to process them later. If there is no COW, filters should
always copy packets for deferring or inplace modification.
Or you could simply move them.
Let's assume following scenario. We have 2 filters. FilterA would may deffer
packet to process it later and FilterB may process packet inplace (destroys
internal data).

Packet is received and put to the chain of filters.
First filter in chain FilterA decides to deffer packet. Filter saves packet
internally and will process it later together with next packet.
Second filter FilterB decides to process it inplace, so it will destroy
internal data of the packet. But this data would be needed later by
FilterA... Nither FilterA nor FilterB doesn't know about each other. They
cannot decide should they move or copy. Without any additional information
passing from one to another one of filters should always copy, but this copy
may be avoided in some cases.

Next packet is received.
FilterA decides to process it together with the previous one and it doesn't
need to save it in internal state.
Second FilterB still decides to process it inplace. FilterB can safely
destroy internal data as no one needs it.

Move semantics cannot handle it. On first packet it should be copyed
somewhere. On second packet it might not be copyed (move semantics). But as
FilterA doesn't know about FilterB and vice versa, there is no chance to
determine when to copy and when to move. There should be some additional
data passing between filters. It may be "COW on the top" or COW internally
in packet.

-- 
Best regards,
Alexander Sterligov