multi_index_container assertion after crash
Hi,
I have an application which uses a persistent multi_index_container (I am using bip::managed_mapped_file, and after opening the file I do "container = mappedFile->find_or_construct(...)"). I see an interesting behavior when my process crashes or is stopped during heavy use of the multi_index_container: after the process restarts, I have no problems reading or inserting new elements into the container, but if I try to erase elements or call multi_index_container::clear, an assertion pops up (safe_mode::Check_same_owner / safe_mode::Check_valid_iterator). This happens in Debug mode only, where BOOST_MULTI_INDEX_ENABLE_SAFE_MODE is defined.
At first I thought that this kind of assertion failed due to a concurrency issue, but that is not the case (I verified that this data structure is not accessed from more than one thread at a time; the container is locked during access). Does anyone have any idea what can cause this situation, or a resolution?
Thanks a lot in advance,
Eli.
Hi Eli,
I'm afraid you're providing too little information, but let me take a shot in the dark: are you persisting multi_index_container iterators as well? These iterators are *not* persistable in managed memory, as they contain absolute pointers as their representation. Can this be related to the problem?
Other than this, I sincerely don't have a clue. If the crash/stop happens in the middle of a multi_index_container operation, then the data structure can be left in an inconsistent state, though I'm having a hard time figuring out how this inconsistent state can result in safe mode assertions like the ones you describe. Maybe if you could provide a little more context I'd be able to be more helpful than this.
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
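To illustrate why an absolute pointer does not survive re-mapping while an offset does, here is a minimal sketch using only the standard library. The `Node` type, the helper names, and the use of a copied `std::vector` as a stand-in for "the same bytes mapped at a different address" are all assumptions made for illustration; this is not how Boost.MultiIndex lays out its iterators. Boost.Interprocess provides `offset_ptr` for the same purpose.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node type, for illustration only.
struct Node {
    int value = 0;
    Node* next_abs = nullptr;      // absolute address of the next node
    std::ptrdiff_t next_off = 0;   // byte distance to the next node; 0 means "none"
};

// Follow the offset-based link relative to the node's current address.
inline Node* next_by_offset(Node* n) {
    if (n->next_off == 0) return nullptr;
    return reinterpret_cast<Node*>(reinterpret_cast<char*>(n) + n->next_off);
}

struct RelocationResult {
    bool absolute_link_ok;   // does the absolute link point into the relocated copy?
    bool offset_link_ok;     // does the offset link point into the relocated copy?
    int value_via_offset;    // value reached through the offset link
};

inline RelocationResult relocate_and_check() {
    // Two nodes in one block, linked both ways.
    std::vector<Node> original(2);
    original[0].value = 1;
    original[1].value = 2;
    original[0].next_abs = &original[1];
    original[0].next_off = static_cast<std::ptrdiff_t>(sizeof(Node));

    // Copying the block stands in for mapping the same bytes at another address.
    std::vector<Node> copy(original);

    Node* via_offset = next_by_offset(&copy[0]);
    RelocationResult r;
    r.absolute_link_ok = (copy[0].next_abs == &copy[1]);  // still points at 'original'
    r.offset_link_ok = (via_offset == &copy[1]);
    r.value_via_offset = via_offset->value;
    return r;
}
```

Calling `relocate_and_check()` shows the absolute link still referring to the old block, while the offset link resolves inside the relocated one.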
Joaquín hi,
First, thanks for your response.
The iterators I use are not persistent (I retrieve the iterators from the index each time I add or remove an element to/from the container). And even though the crash happens during heavy use of the container, it does not happen in the middle of a multi_index_container operation (I have an application log where I can see exactly where in the code the app crashes). I thought that maybe the multi_index_container should be flushed to disk (which would affect performance), but I didn't find any method to do it.
For more information, here is the test I do in general:
1. First I populate my multi_index_container with elements (~4000).
2. I invoke a process which removes elements from the container (and processes each element after its removal; this phase doesn't involve any Boost structure).
During stage 2 I kill the process via Task Manager.
Thanks a lot!
Eli.
Umm, maybe we're looking at the wrong area and this has nothing to do with persistence. Consider the following piece of pseudocode:
  multi_index_container<...> m;
  auto it=m.find(...);
  m.erase(it);
  auto it2=it; // WRONG
See the last line? Once you've deleted the element a given iterator points at, this iterator becomes invalid and the only thing you can do with it is assign it a new value -- you can't copy it, use it as the right-hand side of an assignment, etc. If safe mode is on, assertions like the one you're seeing will pop up. Maybe you're having such a problem in your code?
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
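For the "remove elements, then process each one" test described earlier in the thread, one way to avoid ever touching an erased iterator is to copy the value out first and continue from the iterator that `erase` returns. The sketch below uses `std::set<int>` as a stand-in for the real container (an assumption made so the example needs only the standard library), and `drain` is a hypothetical name. The ordered indices of multi_index_container offer an `erase(iterator)` that likewise returns the following position, but check the documentation for the index type you actually use.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Remove every element, "processing" each value after it has been removed.
// The erased iterator is never copied or dereferenced again.
inline std::vector<int> drain(std::set<int>& s) {
    std::vector<int> processed;
    auto it = s.begin();
    while (it != s.end()) {
        const int value = *it;       // copy the element out before erasing
        it = s.erase(it);            // continue from the iterator erase() returns
        processed.push_back(value);  // processing happens on the copy
    }
    return processed;
}
```

After `drain(s)` the set is empty and the returned vector holds the removed values in iteration order.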
Joaquín hi,
I understand perfectly what you are suggesting (and I have also experienced such problems before). Therefore I verified that after I obtain the index (idx = container->get<INDEX>();) it is not changed (only read methods such as end(), begin(), etc. are invoked) until I try to remove an element from the container (idx.erase(iterator);), and the assertion pops up on the first call to erase.
Thanks again for your response,
Eli.
[Please don't top-post, see http://www.boost.org/community/policy.html#quoting ]
Well, where does iterator come from? Is there any chance that
a) it's end() because the lookup condition was not met?
b) it's invalid because you already deleted the element it was pointing to?
c) it belongs to a different container? Have you swapped containers, maybe?
Is this systematic, i.e. does it happen the first time that particular line of code is hit, or does it happen more or less at random?
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
Joaquín hi,
a) it's end() because the lookup condition was not met? - I check for this before trying to erase.
b) it's invalid because you already deleted the element it was pointing to? - No, I look up a record, and the first time I try to erase it, it happens.
c) it belongs to a different container? Have you swapped containers, maybe? - No way.
I've checked it 1000 times, and this is the same code that is used in my regular run, where I have never experienced such problems (in this phase of the flow, this piece of code). Only when testing after a crash does the assertion pop up, in 100% of the cases.
Eli.
Hi Eli,
First of all, please don't top-post; it's against this list's posting rules, see http://www.boost.org/community/policy.html#quoting .
I'm afraid I can't be of more help with the information you've given. From your description of the problem, the container can't possibly be corrupted, because you know that crashes never happen in the middle of a container op. Also, iterators are not persisted and they seemingly work OK except when recovering after a crash. I don't see where the problem might be, though a problem certainly is there. Unless you can provide more info and/or some sample code, I don't see what else to try from my position. From yours, you might want to debug the program harder and/or set Boost.MultiIndex invariant-checking mode (warning: extremely time-consuming) to validate your hypothesis that crashes do not corrupt the container.
Best regards,
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
joaquin@tid.es wrote on Monday, October 19, 2009 10:40 AM:
I'm afraid I can't be of more help with the information you've given. From your description of the problem, the container can't possibly be corrupted, because you know that crashes never happen in the middle of a container op. Also, iterators are not persisted and they seemingly work OK except when recovering after a crash. I don't see where the problem might be, though a problem certainly is there.
I haven't been following this thread too closely, so forgive me if this is off target.
Is it possible that, when your process crashes, the operating system doesn't bother to properly flush the mapped file to disk? I have read that some versions of UNIX will produce undefined results if you don't sync a mapped file before closing it, even in an orderly shutdown. I think Windows is better about that, but I don't know what happens in a crash.
What operating system are you using? (An earlier reference to "task manager" implies Windows.) Does anybody know if boost::interprocess needs to do anything special to flush mapped files on that platform? If so, the operating system won't permit it to do so in a crash. Even if interprocess doesn't need to do anything, it's still possible that the operating system might decide to discard anything that wasn't already flushed for some reason.
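As a concrete picture of what "syncing a mapped file" means on the UNIX side mentioned above, here is a minimal POSIX sketch. Note that it does not use Boost and is not the Windows setup discussed in this thread; the function names and the file path used in any test are hypothetical. On Windows the analogous Win32 call is FlushViewOfFile, and Boost.Interprocess exposes flush() members on mapped_region and managed_mapped_file in the versions I am aware of, so check the documentation for the release you build against.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Write 'text' into a file through a shared mapping and sync it before unmapping.
// Returns true only if every step, including msync, succeeded.
inline bool write_through_mapping(const char* path, const std::string& text) {
    if (text.empty()) return false;  // mmap cannot map zero bytes
    const int fd = ::open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return false;
    const std::size_t len = text.size();
    if (::ftruncate(fd, static_cast<off_t>(len)) != 0) {
        ::close(fd);
        return false;
    }
    void* addr = ::mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        ::close(fd);
        return false;
    }
    std::memcpy(addr, text.data(), len);
    // MS_SYNC blocks until the dirty pages have been written back to the file.
    const bool synced = (::msync(addr, len, MS_SYNC) == 0);
    ::munmap(addr, len);
    ::close(fd);
    return synced;
}

// Read the whole file back through ordinary file I/O.
inline std::string read_all(const char* path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Syncing after each logical batch of changes narrows the window in which a killed process can leave unflushed pages behind, at the performance cost raised earlier in the thread. It does not make a multi-step container update atomic.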
Hello,
After I asked my question the first time (a few months ago), Andrew and Joaquín both tried to help me, but unfortunately I still haven't found the problem's root cause or a solution. Partly this might be because I did not supply enough information, so I wrote a short, simple, single-threaded console app that might help you reproduce the problem or find my mistake (this is of course not a production app, so there are a lot of hard-coded constants etc.).
The app's code consists of 3 files, "BoostTestingApp.cpp", "DataContainer.h", and "DataContainer.cpp", which can be found below.
But first the instructions:
To create the "damaged" data file use the following input:
2 (insert)
-1 (maximum)
5 (empty) - during this operation kill the process manually from Task Manager (yes, I am using Windows), which simulates the crash.
To reproduce the problem (an assertion window pops up), restart the app in debug mode after creating the damaged file and use the following inputs:
Input #1:
4 (delete all)
Input #2:
3 (delete)
500000 (this particular record)
The program :
BoostTestingApp.cpp
------------------
#include "stdafx.h"
#include "DataContainer.h"
#include <iostream>
#include
Eli, I'm not able to get the program into working condition: I'm compiling against the release branch (changing to trunk didn't help), debug mode, BOOST_ALL_NO_LIB #defined, Multi-threaded Debug DLL (/MDd) runtime library. Building is successful, but the program gets stuck at the initialization phase; please find attached the call stack where a never-ending loop is met. Any idea what's happening?
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
Joaquin hi,
Thanks a lot for your quick response (again!). It is hard to understand what the problem is. I think the difference between your setup and mine might be the fact that I used static linking (with the Boost lib files).
In case you fail to run the program again, try to use a VC project which can be downloaded at https://rcpt.yousendit.com/808014672/0cd55316ab7da75c1c45acf265f9df31 as a baseline (don't forget to change the include folder paths).
Thanks a lot,
Eli Zakashansky
On Saturday, January 23, 2010 at 20:26, Eli Zakashansky wrote:
[...]
In case you fail to run the program again try to use a VC project which can be downloaded at https://rcpt.yousendit.com/808014672/0cd55316ab7da75c1c45acf265f9df31 as baseline (don't forget changing the include folders path).
Your project is for MSVC 9.0; alas, I've got MSVC 8.0 here. I've rebuilt with static linking and the problem persists. Maybe you can take a look yourself: please find attached the MSVC 8.0 solution, which hopefully you can open with MSVC 9.0. Do you see the same there?
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
Joaquin hi,
I've opened your project in VS2005 and added the required lib files (libboost_date_time-vc80-mt-s-1_41.lib and libboost_date_time-vc80-mt-sgd-1_41.lib), which I compiled especially for this test. Besides the additional directories, I changed the RuntimeLibrary property, which I set to "0" [Project properties -> Configuration properties -> Code Generation -> Runtime Library <= Multi-threaded (/MT)].
I was surprised twice:
1. I executed the program with no difficulties (I didn't reproduce what you mentioned about endless loops).
2. I didn't reproduce the initial problem which I complained about (the assertions).
This makes me think that this is a result of the VC9 compilation. I ran those tests with several Boost versions (1.35, 1.39, 1.41) and reproduced the problem in 100% of the cases, but always compiled with the VC9 compiler. Changing the compiler is not an option in my case, so I must find a solution to this problem. I'm glad that I can now be more focused on where the problem is, but I have no clue how to solve it. Any new ideas?
Eli Zakashansky.
On Monday, January 25, 2010 11:29 AM , Eli Zakashansky wrote:
I've opened your project in VS2005, added the required lib files (libboost_date_time-vc80-mt-s-1_41.lib and libboost_date_time-vc80-mt-sgd-1_41.lib) which I compiled especially for this test, and besides the additional directories I've changed the property RuntimeLibrary which I set to be "0" [Project properties -> Configuration properties -> Code Generation -> Runtime Library <= Multi-threaded (/MT)].
I was surprised twice:
1. I've executed the program with no difficulties (didn't reproduce what you've mentioned about endless loops).
2. I didn't reproduce the initial problem which I've complained about (the assertions).
Which makes me think that this is a result of the VC9 compilation. I did those tests with several Boost versions (1.35, 1.39, 1.41) and reproduced the problem in 100% of the cases, but always compiled with the VC9 compiler; changing the compiler is not an option in my case, therefore I must find a solution to this problem. I'm glad that now I might be more focused about where the problem is, but have no clue how to solve it. Any new ideas?
Have you tried opening Joaquin's project in VC9? That may tell you if there is something wrong with VC9, or if there is some obscure difference between your project settings and Joaquin's.
Eli Zakashansky
Joaquin hi,
I've opened your project in VS2005, added the required lib files (libboost_date_time-vc80-mt-s-1_41.lib and libboost_date_time-vc80-mt-sgd-1_41.lib) [...]
These libs are not really needed; you can get rid of them by defining BOOST_ALL_NO_LIB. Does this make any difference with respect to point 1 below?
I was surprised twice:
1. I've executed the program with no difficulties (didn't reproduce what you've mentioned about endless loops).
See above.
2. I didn't reproduce the initial problem which I've complained about (the assertions). Which makes me think that this is a result of the VC9 compilation. [...] Any new ideas?
Not really; your results are not consistent with mine, and seemingly we're compiling the same project. I don't get why your program does not get stuck on startup. You might also want to try Andrew's sensible suggestion. I'm sorry I can't be more helpful; please keep me updated if some idea or new fact arises.
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
Eli Zakashansky wrote:
Joaquin hi,
[...]
I was surprised twice:
1. I've executed the program with no difficulties (didn't reproduce what you've mentioned about endless loops).
I've just built the project I sent you yesterday in a VS2008 environment and, again, the program gets stuck at startup. Whatever differences there are between our testing scenarios, it seems they're not related to the compiler itself; maybe it's the OS or something (mine is Win XP SP3).
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
participants (5)
- Andrew Holden
- Eli Zakashansky
- Joaquin M Lopez Munoz
- JOAQUIN M. LOPEZ MUÑOZ
- joaquin@tid.es