[BGL] memory leak in graph/r_c_shortest_paths.hpp

Hello, I'm using BGL in my project and, with some input data, I get - sooner or later, as I run r_c_shortest_paths many times - a segfault. This behaviour is apparently quite random, as I can see by implementing a custom visitor that the last label evaluated before the crash changes all the time. I must also say that the graph I use, while always having the same topological structure, has different properties that affect labelling each time I run the algorithm and these vary quite randomly from one run to the other, as they are generated by a random heuristics. Anyway, with one particular set of input data, I manage to have the program segfault (as I said, sooner or later) 100% of the times. This is far from being an MWE, but it's the best I could produce. I tried to debug the memory leak with valgrind and it turns out that r_c_shortest_paths_dispatch is trying to access memory in a block previously freed by: l_alloc.deallocate( cur_outer_splabel.get(), 1 ); in r_c_shortest_paths.hpp:314. After some debugging I started to believe that the problem doesn't rely with my code, but it might be a bug in r_c_shortest_paths.hpp. If this is actually the case and where the problem lies, though, I haven't been able to figure out with certainty. The output from valgrind can be found here: http://pastebin.com/raw.php?i=WWafwFv3 (output cleaned up and with templates substituted) http://pastebin.com/raw.php?i=cb3p6jzM (raw output intended to facilitate reading and with templates substituted) http://pastebin.com/raw.php?i=qeauBQTm (raw output indented to facilitate reading) http://pastebin.com/raw.php?i=hpppf49c (raw output) My code (the whole project) can be found here: https://github.com/alberto-santini/maritime-vrp In order to run it, one must have a licensed copy of CPLEX and change the include and library paths in the Makefile's accordingly. Unfortunately, without putting the labelling part inside a price-and-branch procedure (that uses CPLEX as a solver for the master problem) I wasn't able to produce input data by hand that would lead me to obtain the segfault. I understand this is not the best way to investigate into a bug and will be happy to provide any further help if requested to. Thanks, AS -- View this message in context: http://boost.2283326.n4.nabble.com/BGL-memory-leak-in-graph-r-c-shortest-pat... Sent from the Boost - Users mailing list archive at Nabble.com.

On Mon, 23 Sep 2013, potato_research wrote:
Hello,
I'm using BGL in my project and, with some input data, I get - sooner or later, as I run r_c_shortest_paths many times - a segfault.
This behaviour is apparently quite random, as I can see by implementing a custom visitor that the last label evaluated before the crash changes all the time. I must also say that the graph I use, while always having the same topological structure, has different properties that affect labelling each time I run the algorithm and these vary quite randomly from one run to the other, as they are generated by a random heuristics.
Anyway, with one particular set of input data, I manage to have the program segfault (as I said, sooner or later) 100% of the times. This is far from being an MWE, but it's the best I could produce.
Could you please try the version that is in the trunk now? I added a bunch of assertions to try to diagnose this issue.
I tried to debug the memory leak with valgrind and it turns out that r_c_shortest_paths_dispatch is trying to access memory in a block previously freed by:
l_alloc.deallocate( cur_outer_splabel.get(), 1 );
in r_c_shortest_paths.hpp:314. After some debugging I started to believe that the problem doesn't rely with my code, but it might be a bug in r_c_shortest_paths.hpp. If this is actually the case and where the problem lies, though, I haven't been able to figure out with certainty.
The output from valgrind can be found here: http://pastebin.com/raw.php?i=WWafwFv3 (output cleaned up and with templates substituted) http://pastebin.com/raw.php?i=cb3p6jzM (raw output intended to facilitate reading and with templates substituted) http://pastebin.com/raw.php?i=qeauBQTm (raw output indented to facilitate reading) http://pastebin.com/raw.php?i=hpppf49c (raw output)
My code (the whole project) can be found here: https://github.com/alberto-santini/maritime-vrp
In order to run it, one must have a licensed copy of CPLEX and change the include and library paths in the Makefile's accordingly. Unfortunately, without putting the labelling part inside a price-and-branch procedure (that uses CPLEX as a solver for the master problem) I wasn't able to produce input data by hand that would lead me to obtain the segfault.
Can you dump each set of input data before it goes into r_c_shortest_paths, then send whichever version causes it to fail? -- Jeremiah Willcock

Jeremiah, I'm working on your suggestion of dumping the data and obtaining an MWE that we could make into a test (if it turns out there's a problem in the boost code). In the meanwhile, one of the assertions you added triggered! Here it is: Assertion failed: (p_cur_label->b_is_valid), function r_c_shortest_paths_dispatch, file /Users/alberto/Applications/boost/boost/graph/r_c_shortest_paths.hpp, line 472. Thanks, AS -- View this message in context: http://boost.2283326.n4.nabble.com/BGL-memory-leak-in-graph-r-c-shortest-pat... Sent from the Boost - Users mailing list archive at Nabble.com.

On Mon, 23 Sep 2013, potato_research wrote:
Jeremiah, I'm working on your suggestion of dumping the data and obtaining an MWE that we could make into a test (if it turns out there's a problem in the boost code). In the meanwhile, one of the assertions you added triggered! Here it is:
Assertion failed: (p_cur_label->b_is_valid), function r_c_shortest_paths_dispatch, file /Users/alberto/Applications/boost/boost/graph/r_c_shortest_paths.hpp, line 472.
It's good that the assertion triggered; it probably makes the issue easier to debug. Here are a couple of questions that might suggest places to look: 1. Do the cases that fail have multiple, partially-overlapping Pareto-optimal paths to the goal? Do they have multiple paths that are tied for "goodness"? 2. If you remove all of the clearing of b_is_valid, plus remove all of the object destruction and deallocation calls (to avoid the program crashing), does the answer seem reasonable? -- Jeremiah Willcock

1. Do the cases that fail have multiple, partially-overlapping Pareto-optimal paths to the goal? Do they have multiple paths that are tied for "goodness"?
This is very difficult for me to say, due to the size of the graphs which have around 400 nodes and up to 5000 edges. What I can tell you, though, is that the instances that cause the segfault seem to take significantly longer than the others (i.e. if a non-crashing instance completes in t, a crashing instance takes around 10^4 * t to crash - more on this in the next paragraph) and this makes me think that the reason is that there are less labels that are dominated and - henceforth - more paths "tied for goodness" as you suggest.
2. If you remove all of the clearing of b_is_valid, plus remove all of the object destruction and deallocation calls (to avoid the program crashing), does the answer seem reasonable?
I did as you suggested. The first thing I could do is to confirm the perceived difference in execution time, as mentioned above. Here is an example (program comiled with -O0 -g): Time elapsed (on 0.1-reduced graph, 194 edges): 0.000195 seconds. Time elapsed (on 0.2-reduced graph, 194 edges): 0.0002 seconds. Time elapsed (on 0.3-reduced graph, 907 edges): 0.000176 seconds. Time elapsed (on 0.4-reduced graph, 2271 edges): 0.000179 seconds. Time elapsed (on 0.5-reduced graph, 2922 edges): 0.000182 seconds. Time elapsed (on 0.6-reduced graph, 2922 edges): 0.00018 seconds. Time elapsed (on 0.7-reduced graph, 3003 edges): 0.707146 seconds. <******* Time elapsed (on 0.8-reduced graph, 3232 edges): 2.04194 seconds. <******* Time elapsed (on 0.9-reduced graph, 3470 edges): 3.71097 seconds. <******* I think the marked instances are the ones that would have caused a segfault. [Just for reference, the "reduced graph" the above snippets refer to is tied to the implementation of the algorithm. I first try labelling on a graph where I remove arcs whose cost is > N * maximum_prize_collectable, with N varying from 0.1 to 1 until some negative reduced cost path is found among the pareto-optimal ones.] Removing the destructions and deallocations makes the program complete without crashing. I can't check by hand if the labels obtained are correct until I mange somehow to scale down the problem and still bring the crash about. The only thing I can say is that the whole column generation algorithm always gives the same result (I ran it 40 times) and the result seems to make sense. The next step was to keep the destructions and only remove the deallocations and the program still works and produces the same result. I'm still working at dumping the graphs before executing r_c_shortest_paths, but it's not easy as vertices and edges have shared_ptr to quite complex objects as their bundled properties and so I would need to dump them first and then somehow reconstruct their relationships. In the meanwhile I'm also trying to generate smaller graphs that still exhibit the faulty behaviour - without success up to now. AS -- View this message in context: http://boost.2283326.n4.nabble.com/BGL-memory-leak-in-graph-r-c-shortest-pat... Sent from the Boost - Users mailing list archive at Nabble.com.
participants (2)
-
Jeremiah Willcock
-
potato_research