[multi-index] Segmentation fault during erase

(I am using 64-bit boost 1.33.1, gcc version 4.1.2)

I am trying to track down an elusive segmentation fault that occurs after days of running a program. I am wondering if I am using the boost multi-index library incorrectly. I have a coredump that gives the following backtrace (full backtrace at the end of this email).

at /usr/include/boost/multi_index/detail/ord_index_node.hpp:274
274 while(x!=root&&(x==0 || x->color()==black)){

Having looked at the code in that function, I can only assume it is called as a result of the following two lines (attempting to erase the first member of the index):

CallQueue::nth_index<CALL_NEXT_CALL_TIME>::type & timeIndex = this->m_calls.get<CALL_NEXT_CALL_TIME>();
timeIndex.erase( timeIndex.begin() );

I am certain that the queue is not empty, as the following methods guard against an empty queue (shown in an excerpt of my relevant code at the bottom of this email).

Finally, my definition for the m_calls structure and the #defines for the indexes follow at the end of this email. My program is very clean memory-wise, having run a fairly extensive test suite through valgrind, so the possibility of other memory errors compounding is low.

My main question is whether I am using the multi-index library incorrectly, or whether there is a bug in the multi-index library. Failing that, any insight as to why I would be getting this error is appreciated :).

Thanks,
Josh

%%%START m_calls structure%%%
typedef boost::multi_index::multi_index_container<
    Call * ,
    boost::multi_index::indexed_by<
        boost::multi_index::ordered_unique<
            BOOST_MULTI_INDEX_CONST_MEM_FUN( Call , int , getId )
        > ,
        boost::multi_index::ordered_non_unique<
            BOOST_MULTI_INDEX_CONST_MEM_FUN( Call , const boost::posix_time::ptime & , getNextCallTime )
        >
    >
> CallQueue;

CallQueue m_calls;
%%%END m_calls structure%%%

%%%START #defines for m_calls indexes%%%
#define CALL_ID 0
#define CALL_NEXT_CALL_TIME 1
%%%END #defines for m_calls indexes%%%

%%%START MY CODE%%%
Call * PhoneList::getNextCall()
{
    if ( this->m_calls.empty() )
    {
        throw NoAvailableCallException();
    }

    CallQueue::nth_index<CALL_NEXT_CALL_TIME>::type::iterator pos = this->m_calls.get<CALL_NEXT_CALL_TIME>().begin();
    return *pos;
}

Call * PhoneList::popAvailableCall()
{
    Call * c = this->getNextCall();

    if ( c->getNextCallTime() > LocalTime::getInst()->getCachedTime() )
    {
        throw NoAvailableCallException();
    }

    CallQueue::nth_index<CALL_NEXT_CALL_TIME>::type & timeIndex = this->m_calls.get<CALL_NEXT_CALL_TIME>();
    timeIndex.erase( timeIndex.begin() ); /*************BELIEVED THAT ERROR ORIGINATES HERE****************/
    return c;
}
%%%END MY CODE%%%

%%%START GDB BACKTRACE%%%
#0 0x000000000044a7fd in Database::AQM::QueueEngine::PhoneList::popAvailableCall (this=<value optimized out>) at /usr/include/boost/multi_index/detail/ord_index_node.hpp:274
274 while(x!=root&&(x==0 || x->color()==black)){
(gdb) backtrace
#0 0x000000000044a7fd in Database::AQM::QueueEngine::PhoneList::popAvailableCall (this=<value optimized out>) at /usr/include/boost/multi_index/detail/ord_index_node.hpp:274
#1 0x000000000044664c in Database::AQM::QueueEngine::Controller::getCall (this=0x6d54d0) at Controller.cpp:83
#2 0x0000000000411d55 in Database::AQM::Controller::getCall (this=<value optimized out>) at Controller.cpp:75
#3 0x000000000041169b in Control::AQM::Controller::getCall (this=<value optimized out>) at Controller.cpp:27
#4 0x000000000040a01b in Boundary::AQM::RequestSocket::Controller::getCall (this=<value optimized out>) at Controller.cpp:43
#5 0x000000000045a703 in Boundary::AQM::RequestSocket::GetCallRequest::perform (this=<value optimized out>) at GetCallRequest.cpp:17
#6 0x0000000000410564 in Boundary::AQM::RequestSocket::RequestSocket::OnLine (this=0x6d31f0, l=@0x6d3340) at RequestSocket.cpp:30
#7 0x0000000000483829 in TcpSocket::ReadLine (this=0x6d31f0) at TcpSocket.cpp:810
#8 0x0000000000480a9c in SocketHandler::Select (this=0x42802b90, tsel=0x42802b70) at SocketHandler.cpp:395
#9 0x0000000000482e25 in SocketHandler::Select (this=0x2aaab04c8b98, sec=<value optimized out>, usec=1297040160) at SocketHandler.cpp:227
#10 0x000000000040ee6b in Boundary::AQM::RequestSocket::RequestPoller::run (this=0x2aaab0000e40) at RequestPoller.cpp:84
#11 0x00002aaaaad15c3f in boost::function0<void, std::allocator<boost::function_base> >::operator() () from /usr/lib64/libboost_thread.so.2
#12 0x00002aaaaad156bf in boost::thread_group::join_all () from /usr/lib64/libboost_thread.so.2
#13 0x0000003aa56062f7 in start_thread () from /lib64/libpthread.so.0
#14 0x0000003aa4ace86d in clone () from /lib64/libc.so.6
#15 0x0000000000000000 in ?? ()
%%%END GDB BACKTRACE%%%

Joshua Moore-Oliva wrote:
(I am using 64-bit boost 1.33.1, gcc version 4.1.2)
I am trying to track down an elusive segmentation fault that occurs after days of running a program. I am wondering if I am using the boost multi-index library incorrectly. I have a coredump that gives the following backtrace (full backtrace at the end of this email).
at /usr/include/boost/multi_index/detail/ord_index_node.hpp:274
274 while(x!=root&&(x==0 || x->color()==black)){
Having looked at the code in that function, I can only assume it is called as a result of the following two lines (attempting to erase the first member of the index):
CallQueue::nth_index<CALL_NEXT_CALL_TIME>::type & timeIndex = this->m_calls.get<CALL_NEXT_CALL_TIME>();
timeIndex.erase( timeIndex.begin() );
I am certain that the queue is not empty, as the following methods guard against an empty queue. (Shown in an excerpt of my relevant code at the bottom of this email).
Finally, my definition for the m_calls structure and the #defines for the indexes follow at the end of this email. My program is very clean memory-wise, having run a fairly extensive test suite through valgrind, so the possibility of other memory errors compounding is low. My main question is whether I am using the multi-index library incorrectly, or whether there is a bug in the multi-index library. Failing that, any insight as to why I would be getting this error is appreciated :).
Thanks, Josh
[...]

Hello Josh,

I don't see anything patently suspicious in the code you provide. You might want to check the following issues:

1. m_calls is a multi_index_container of Call *pointers*. Is it possible that a Call is deleted *before* its pointer is erased from m_calls? This could trigger problems like the one you're seeing, in contexts removed from the point where the Call object is deleted (as could be the case here with popAvailableCall).

2. Is m_calls used in a multi-threaded environment? If so, have you checked for synchronization problems?

3. You might try setting the invariant-checking mode and safe mode described at http://tinyurl.com/4o8jex and see whether this sheds some further light. Be aware that these modes have a huge impact on performance and should be used only in debug builds.

Looking forward to your feedback,

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
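
Points 1 and 2 in the reply above can be illustrated with a minimal sketch. It is not the poster's code and does not use Boost.MultiIndex: to stay self-contained it assumes C++11 and substitutes a std::multiset whose comparator reads through the stored pointer, much as the getNextCallTime key extractor does. The names GuardedCallQueue, ByNextCallTime, popAvailable and destroy are hypothetical, and the integer nextCallTime stands in for boost::posix_time::ptime.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>

// Hypothetical stand-in for the poster's Call class.
struct Call {
    int id;
    long nextCallTime;  // simplified; the original uses boost::posix_time::ptime
};

// The ordering reads through the stored pointer, so every pointer held by
// the container must refer to a live Call whenever the tree is traversed.
struct ByNextCallTime {
    bool operator()(const Call* a, const Call* b) const {
        return a->nextCallTime < b->nextCallTime;
    }
};

class GuardedCallQueue {
public:
    void push(Call* c) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_calls.insert(c);
    }

    // Returns the earliest call if it is due, otherwise nullptr.
    // The lock covers the emptiness check, the read and the erase together.
    Call* popAvailable(long now) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_calls.empty()) return nullptr;
        std::multiset<Call*, ByNextCallTime>::iterator first = m_calls.begin();
        Call* c = *first;
        if (c->nextCallTime > now) return nullptr;
        m_calls.erase(first);
        return c;
    }

    // Unlink the pointer from the container first, and only then delete
    // the object, so the container never holds a dangling pointer.
    void destroy(Call* c) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            auto range = m_calls.equal_range(c);  // c is still alive here
            for (auto it = range.first; it != range.second; ++it) {
                if (*it == c) { m_calls.erase(it); break; }
            }
        }
        delete c;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_calls.size();
    }

private:
    mutable std::mutex m_mutex;
    std::multiset<Call*, ByNextCallTime> m_calls;
};
```

The design choice is that every operation touching the tree takes the same mutex, and that deletion goes through a single function which erases before it deletes. Whether either problem is actually present in the original program cannot be determined from the posted excerpt.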
participants (2)
- joaquin@tid.es
- Joshua Moore-Oliva