
Jeremy: You might be interested in this paper which basically describes (albeit from a different design point of view) your data structure: James R. Driscoll, Neil Sarnak, Daniel Dominic Sleator, Robert Endre Tarjan: Making Data Structures Persistent. J. Comput. Syst. Sci. 38 (1): 86-124 (1989) As far as the lock goes, why not a shared_ptr per node? (those use atomic ref counters) As long as you do not maintain the size of the map, that should be all good. You can maintain the size in every node, and get the size of a collection from the root. I've heard of shared DAG structures to represent hierarchical shared collections, along the same principles. They work well, if a bit slow. --Herve Bronnimann On Aug 7, 2007, at 12:59 AM, Jeremy Bruestle wrote:
Greetings,
I've recently implemented a new collection type for a project of mine. It works identically to std::map, with the same algorithmic guarantees, but additionally can be 'copied' in constant time. That is:
shared_map<int, int> a; for(i = 0; i < 100; i++) a[i] = i*i; // Each insert in log(n) of the map size shared_map<int, int> b = a; // This is constant time (not linear as in a std::map) b[1] = 7; // Still log(n) of course b[2] = 3;
assert(a[2] == 4 && b[2] == 3 && a[6] == 36 && b[6] == 36);
The trick is simple: the map is a red-black tree, but we get rid of the explicit parent pointer for each red-black node. This complicates the code, but is algorithmically uninteresting since we always descend from the root anyway and just need to track the parent as we go, either in an explicit stack or through recursion (my current implementation). The key point however is that the lack of an explicit parent allows us to use reference counting, and share children. Thus the copy from a to b simply copies the root node, and adds an additional reference count. All future changes do copy on write for shared nodes (any with a ref count of 2 or more).
The does lower the overall performance for all operations somewhat, but for cases like mine where there is a need to copy large maps a lot, and each map remains quite similar to the other maps, there is a huge savings in computation and memory usage. Another nice use is: copy a map (since it's free) and iterate over it while modifying the original with no worries.
One caveat, obviously multiple threads require a single global lock since even apparently different collections share internal state (or I could add a per node lock, which has the nice effect of general fine grained tree locking, but is tons of overhead, perhaps a compile time choice?).
My current implementation is sufficient for my purposes, but with a little cleanup (making stronger iterator validity guarantees, better support for erasing ranges, explicit parent stack instead of recursion perhaps), it could be perhaps useful to everyone. Also, it's a little work but straightforward to generalize to shared_set, shared_multimap, etc.
I was wondering if there would be any interest from people in such a library? If so, I would be willing to do the needed cleanup, if not, it suits me reasonably well as is (I don't even use erase). Also curious if anyone has seen such a thing anywhere before. It seems obvious enough to me, but perhaps it's actually new, I couldn't find anything like it when I went searching.
Thanks for your time
-Jeremy _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/ listinfo.cgi/boost