Re: [boost] Shared Map

8 Aug 2007

      Jeremy: You might be interested in this paper which basically  
describes (albeit from a different design point of view) your data  
structure:

James R. Driscoll, Neil Sarnak, Daniel Dominic Sleator, Robert Endre  
Tarjan: Making Data Structures Persistent. J. Comput. Syst. Sci. 38 
(1): 86-124 (1989)

As far as the lock goes, why not a shared_ptr per node? (those use  
atomic ref counters)  As long as you do not maintain the size of the  
map, that should be all good.  You can maintain the size in every  
node, and get the size of a collection from the root.

I've heard of shared DAG structures to represent hierarchical shared  
collections, along the same principles.  They work well, if a bit slow.

--Herve Bronnimann

On Aug 7, 2007, at 12:59 AM, Jeremy Bruestle wrote:
...
Greetings,
I've recently implemented a new collection type for a project of mine.
 It works identically to std::map, with the same algorithmic
guarantees, but additionally can be 'copied' in constant time.  That
is:
shared_map<int, int> a;
for(i = 0; i < 100; i++)
   a[i] = i*i;   // Each insert in log(n) of the map size
shared_map<int, int> b = a;   // This is constant time (not linear as
in a std::map)
b[1] = 7;  // Still log(n) of course
b[2] = 3;
assert(a[2] == 4 && b[2] == 3 && a[6] == 36 && b[6] == 36);
The trick is simple: the map is a red-black tree, but we get rid of
the explicit parent pointer for each red-black node.  This complicates
the code, but is algorithmically uninteresting since we always descend
from the root anyway and just need to track the parent as we go,
either in an explicit stack or through recursion (my current
implementation).  The key point however is that the lack of an
explicit parent allows us to use reference counting, and share
children.  Thus the copy from a to b simply copies the root node, and
adds an additional reference count.  All future changes do copy on
write for shared nodes (any with a ref count of 2 or more).
The does lower the overall performance for all operations somewhat,
but for cases like mine where there is a need to copy large maps a
lot, and each map remains quite similar to the other maps, there is a
huge savings in computation and memory usage.  Another nice use is:
copy a map (since it's free) and iterate over it while modifying the
original with no worries.
One caveat, obviously multiple threads require a single global lock
since even apparently different collections share internal state (or I
could add a per node lock, which has the nice effect of general fine
grained tree locking, but is tons of overhead, perhaps a compile time
choice?).
My current implementation is sufficient for my purposes, but with a
little cleanup (making stronger iterator validity guarantees, better
support for erasing ranges, explicit parent stack instead of recursion
perhaps), it could be perhaps useful to everyone.  Also, it's a little
work but straightforward to generalize to shared_set, shared_multimap,
etc.
I was wondering if there would be any interest from people in such a
library?  If so, I would be willing to do the needed cleanup, if not,
it suits me reasonably well as is (I don't even use erase).  Also
curious if anyone has seen such a thing anywhere before. It seems
obvious enough to me, but perhaps it's actually new, I couldn't find
anything like it when I went searching.
Thanks for your time
-Jeremy
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/ 
listinfo.cgi/boost

Re: [boost] Shared Map

Hervé Brönnimann