
2011/3/31 Mathias Gaunard <mathias.gaunard@ens-lyon.org>
On 31/03/2011 21:19, Alexandr Sterligov wrote:
1. Cache returns cached objects "by value" - this means that objects are
copied.
Returning by value does not necessarily mean copying. It can mean "nothing" (if the returned object is local to the function and is used to initialize another object) or "move" (if the returned object is local to the function or is moved from elsewhere).
-: Cache is read 100-1000 times more often then written. That's usual. We
can avoid 100-1000 copies. +: almost no synchronization is needed.
2. Cache returns pointer or reference. No copying. +: no copying, no memcpy... -: very hard synchronization in case if cache is written. Pointer from cache is finally put to another thread (GUI even loop), so only const pointer or reference may be returned from cache and it makes code in the GUI more complex - we need to copy data before we'd like to change anything. It starts to look like "external" COW.
Which is probably really what you want. A COW layer built on top. Actually, your whole cache idea really looks like Boost.Flyweight, except the latter purposely disallows mutability.
Immutability of stored object is reason why it cannot be used as cache. And flyweight pattern is not for such cases. Scenario: We have list of appointments. Each appointment consist of several fields: creator, subject, time, notes, list of attachements and some other stuff. Creator may change some of the fields, these changes are delivered to all participants of the appointment through network communication. User goes to the list of appointment. GUI requests business-logic for it. There is no object in the cache, server communication is done and appointment list is received and stored to the cache. User goes somewhere form the list of appointments and then goes back. GUI requests business logic again and cached value is received again. But before it is fully displayed on the screen, bottom layer thread which polls server invalidates appointment list, as some of the values has been changed by creator. It happens very seldom. In following scenario it is nice to use move semantics, and when cache is invalidated just remove value from the cache, shared_ptr will delete data which is now in the GUI thread. But here comes the problem, GUI sometimes would like to change data as well - for example apply filter, or sort it, or change binary attachement. Full list of appointments and attachements may be pretty large and both GUI and server side change is really seldom. Good solution is to move, but then if there is any need to change - copy it. That's exactly copy-on-write. It may be implemented in the container itself or on the top like you've noticed. If it is implemented on the top there will be a lot of logic duplicates all over the code. In my case: appointments, messages, call history, medical measurements and other medical data. With COW inside container I've implemented and tested it once and don't need to take care about move semantics (values inside cache depends from what's happening in the GUI - not nice...)
Another situation is inside high-performance streaming library (TVoIP
conference) - there are chain of filters which process media packets in conveyor manner. Filters are developed separately and may work inplace, or defer packets to process them later. If there is no COW, filters should always copy packets for deferring or inplace modification.
Or you could simply move them.
Let's assume following scenario. We have 2 filters. FilterA would may deffer packet to process it later and FilterB may process packet inplace (destroys internal data). Packet is received and put to the chain of filters. First filter in chain FilterA decides to deffer packet. Filter saves packet internally and will process it later together with next packet. Second filter FilterB decides to process it inplace, so it will destroy internal data of the packet. But this data would be needed later by FilterA... Nither FilterA nor FilterB doesn't know about each other. They cannot decide should they move or copy. Without any additional information passing from one to another one of filters should always copy, but this copy may be avoided in some cases. Next packet is received. FilterA decides to process it together with the previous one and it doesn't need to save it in internal state. Second FilterB still decides to process it inplace. FilterB can safely destroy internal data as no one needs it. Move semantics cannot handle it. On first packet it should be copyed somewhere. On second packet it might not be copyed (move semantics). But as FilterA doesn't know about FilterB and vice versa, there is no chance to determine when to copy and when to move. There should be some additional data passing between filters. It may be "COW on the top" or COW internally in packet. -- Best regards, Alexander Sterligov