
On 8/7/07, François Duranleau <duranlef@iro.umontreal.ca> wrote:
On Tue, 7 Aug 2007, Anthony Williams wrote:
How does this work with iterators? On a normal map you can get an iterator to an entry and it remains valid until that entry is deleted. Can you do this with your shared map? I'm inclined to think that any modification to the map would have to result in all iterators being invalidated, as they may now potentially refer to the wrong copy of a node.
I also wonder how he implements iterators without a pointer to the parent. AFAIK, that is one way to allow stepping from one element to the next (or the previous) from any node, with no knowledge of how we accessed that node. Without a parent pointer, aside from using threaded trees (which would created cycles of shared pointers), I don't know how to do it efficiently (keeping a stack inside the iterator would make them costly to copy, and would require O(log N) memory).
First, in reference to François, you are correct that the iterators contain log(n) components, and thus copy of an iterator is log(n). In reply to Anthony, given the current implementation, you are correct, as my project only needs const_iterators. However there is a solution as follows: First note that the iterators are not a single pointer as mentioned, but a stack of pointers,since they must maintain parent data in some way. The iterator also maintainsa pointer to the collection it was generated from. The stack of pointers allows amortized constant time access to previous and next elements without starting over from root. Now to deal with modification of the underlying collection, we put a 'version number' on each red black tree node, and when we generate the iterator, place the version number alongside the pointers as we do our find(), etc. When we copy red-black node, we inc the version number on the source node (the copy starts at 0). Now our iterators check the pointer's versions as they traverse them to make sure the cached and current numbers are identical. If this check ever fails, the iterator resets itself by looking up the key in the collection anew, which is transparent to the user (although it incurs another log(n) bit of work, breaking the constant time requirement on transversal, but only for the cases where a collection is under heavy modification during transversal). This makes the shared_map have the same iterator stability guarantees as std::map. Additionally, I need to cache the 'begin' iterator, since begin should be constant time, this is also a mild pain but doable. Note the pointers cached by the iterator can't become invalid since an iterator hold reference counts. Also note: the dereference operator for non const iterators may need to do a log(n) copy down the tree if the node in question has a ref-count > 1, otherwise changes to the mapped_type would be visible to others. To avoid needless work when the user isn't going to modify the mapped_type during a dereference, we could return a proxy object with a cast and operator=, etc, but that's probably more hassle than it's worth. Another tricky question is const_iterators. The general semantics for const_iterators in std::map are that they have visibility to changes to the map, but do not allow changes to the map. I can make my const_iterators simpler (not check version number, etc), if I make them see a 'frozen' version of the collection (effectively they iterate over a copy of the collection, since copies are free). This is actually very convenient in many cases, but causes the shared_map to have slightly different semantics from std::map, and I'm not sure if it's a good design decision. On the other hand, there is a high cost to tracking changes which could be avoided for const_iteators, and isn't scanning the collection with a const_iterator while modifying it a fairly cracked thing to do anyway? I can't think of any std::algorithm for example that does any such thing. -Jeremy