[BGL] boost::disjoint_set getting new representative after link()
Hi, I'm using boost::disjoint_sets_with_storage<>. After calling link(rep1, rep2), I want to know the new rep - the representative of the merged set. Obviously, I can call find_set(rep1) to find it. I assume that the extra call to find_set incurs an additional run-time cost per call to link. Is this true or is the call immediate (as in returning the immediate parent of rep1)? If it does incur a run-time cost, then we know that the new-rep is *either *rep1 or rep2. Is there some other way to know which one it is? On a related matter, I am also tracking the *size *of the sets for each link(). Does knowing the size of the rep set before the link help in determining which of rep1 or rep2 will be the new rep? Is there a simple way to extend boost::disjoint_sets_with_storage<> to do the size tracking internally (currenlty I keep an external size vector per rep to keep track of the link()s). Thanks, Adi
On Thu, 21 Mar 2013, Adi Shavit wrote:
Hi, I'm using boost::disjoint_sets_with_storage<>. After calling link(rep1, rep2), I want to know the new rep - the representative of the merged set.
Obviously, I can call find_set(rep1) to find it. I assume that the extra call to find_set incurs an additional run-time cost per call to link. Is this true or is the call immediate (as in returning the immediate parent of rep1)?
If you are using path halving, it appears that it is basically immediate (looks up the parent of whichever representative you gave it, then looks up the parent of that, and does nothing else if the sets were directly joined). Path compression seems to be more complicated but not do too much more work in your use case.
If it does incur a run-time cost, then we know that the new-rep is either rep1 or rep2. Is there some other way to know which one it is?
It does not appear so using the public interface.
On a related matter, I am also tracking the size of the sets for each link(). Does knowing the size of the rep set before the link help in determining which of rep1 or rep2 will be the new rep?
The test is not based on the sizes, but on the ranks of the sets being merged. Search for "union by rank" in https://en.wikipedia.org/wiki/Disjoint-set_data_structure for more explanation of how the representative for the merged set is chosen.
Is there a simple way to extend boost::disjoint_sets_with_storage<> to do the size tracking internally (currenlty I keep an external size vector per rep to keep track of the link()s).
When do you need to know the sizes? Only at the end of whatever process is building up the disjoint set structure? If so, calling compress_sets() then iterating directly over the parent map (the parents() member) might be the easiest way to go. Otherwise, disjoint_sets_with_storage<> does not directly contain any of the logic for maintaining the data structure, so it would not be hard to make your own edited version that maintains sizes. That may not be any better than the approach you are already using, though. -- Jeremiah Willcock
Hi Jeremiah, Thanks for your reply. I ended up writing my own version, it is quite a simple algorithm. It allowed me to do several of the following optimizations (which are relevant for my particular usage): 1. Return the merged rep directly from link(), instead of a void method - zero overhead. 2. Track the set size internally. My runtime decision to incrementally merge sets depends on the momentary size of each set. 3. Replace rank with size - the set size differences are equivalent to the set ranks, so I used the size instead of rank, and thus kept the same mem-footprint and runtime. 4. Track the number of disjoint sets (decreases by 1 for each link() call). Warm regards, Adi
Hi, I'm using boost::disjoint_sets_with_storage<>. After calling link(rep1, rep2), I want to know the new rep - the representative of the merged set.
Obviously, I can call find_set(rep1) to find it. I assume that the extra call to find_set incurs an additional run-time cost per call to link. Is this true or is the call immediate (as in returning the immediate parent of rep1)?
If you are using path halving, it appears that it is basically immediate (looks up the parent of whichever representative you gave it, then looks up the parent of that, and does nothing else if the sets were directly joined). Path compression seems to be more complicated but not do too much more work in your use case.
If it does incur a run-time cost, then we know that the new-rep is either rep1 or rep2. Is there some other way to know which one it is?
It does not appear so using the public interface.
On a related matter, I am also tracking the size of the sets for each link(). Does knowing the size of the rep set before the link help in determining which of rep1 or rep2 will be the new rep?
The test is not based on the sizes, but on the ranks of the sets being merged. Search for "union by rank" in https://en.wikipedia.org/wiki/Disjoint-set_data_structure for more explanation of how the representative for the merged set is chosen.
Is there a simple way to extend boost::disjoint_sets_with_storage<> to do the size tracking internally (currenlty I keep an external size vector per rep to keep track of the link()s).
When do you need to know the sizes? Only at the end of whatever process is building up the disjoint set structure? If so, calling compress_sets() then iterating directly over the parent map (the parents() member) might be the easiest way to go. Otherwise, disjoint_sets_with_storage<> does not directly contain any of the logic for maintaining the data structure, so it would not be hard to make your own edited version that maintains sizes. That may not be any better than the approach you are already using, though.
-- Jeremiah Willcock
participants (2)
-
Adi Shavit
-
Jeremiah Willcock