[ASIO] random crashes

28 Jan 2009

Hello,

I'm using the Asio library (from Boost 1.35) in a network daemon
(running on a Debian Etch system). The code structure is almost the
same as the one described in the HTTP Server example
(http://tenermerx.com/Asio/boost_asio_1_3_1/doc/html/boost_asio/examples.html...).
One io_service is used, async_accept() creates new "sessions", and so on.

The daemon can run flawlessly for weeks (or only a few hours) and then
crash randomly with a segmentation fault. The network load isn't high:
every 5 minutes, a few megabytes of character data are sent to the
daemon. I didn't notice any memory leaks or null pointer accesses.
Across many crashes, I found two kinds.

Since core files are dumped, I tried to debug them, but I don't know how
to interpret the result. The binary wasn't linked against debug
libraries, so I can only get debug data from the binary itself.

Here's the gdb output of the "bt full" command. The daemon has 4
threads; this is the output for the one that I assume caused the crash.

First kind of crash (only the relevant part is pasted; the full output
is huge). It seems to be related to the way I handle the timeout on a
timer.

In my daemon, the io_service thread may call close() on the socket
object while another thread calls cancel() on the same socket object.
Could this lead to a crash? (I can post source code if needed.) What is
the best way to handle a receive timeout *and* closing the socket and
timer from another thread?
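
For what it's worth, the Asio reference lists socket objects as unsafe
for concurrent use when the same object is shared between threads, so a
close() racing with a cancel() looks like a plausible suspect. One
arrangement that might avoid it is to have other threads only queue a
request and let the io_service thread perform the close itself, which is
what io_service::post() is meant for. The sketch below is not Asio code
and not the daemon's code: SerialQueue, Session, requestClose(),
doClose() and runDemo() are hypothetical names, and the queue is a plain
standard-library (C++11) stand-in for the io_service thread, only meant
to show the shape of the idea.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for the single io_service thread:
// queued tasks run one at a time, in FIFO order, on the thread inside run().
class SerialQueue {
public:
    // Safe to call from any thread; the task itself runs later in run().
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        ready_.notify_one();
    }

    // Ask run() to return once the tasks already queued have finished.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopped_ = true;
        }
        ready_.notify_one();
    }

    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopped_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopped and drained
                task = std::move(tasks_.front());
                tasks_.pop_front();
            }
            task();  // run outside the lock so a task may post more work
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::function<void()>> tasks_;
    bool stopped_ = false;
};

// Hypothetical session whose state is only ever touched on the queue thread.
class Session {
public:
    explicit Session(SerialQueue& queue) : queue_(queue) {}

    // Callable from any thread: it only enqueues the request.
    void requestClose() { queue_.post([this] { doClose(); }); }

    // Queue thread only. A second call is a harmless no-op.
    void doClose() {
        if (!open_) return;
        open_ = false;  // the real socket/timer close would go here
        ++closeCount_;
    }

    bool isOpen() const { return open_; }
    int closeCount() const { return closeCount_; }

private:
    SerialQueue& queue_;
    bool open_ = true;
    int closeCount_ = 0;
};

// A "timeout" and a close request from another thread arrive together.
int runDemo() {
    SerialQueue queue;
    Session session(queue);
    std::thread worker([&] { queue.run(); });
    std::thread other([&] { session.requestClose(); });  // foreign thread
    queue.post([&] { session.doClose(); });              // timeout handler
    other.join();
    queue.stop();
    worker.join();  // after this, reading the session state is safe
    assert(!session.isOpen());
    return session.closeCount();
}
```

In the demo both the timeout handler and the other thread ask for a
close, but doClose() only ever runs on the queue thread, so the second
request finds the session already closed and does nothing.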

Thread 1 (process 27120):
Program terminated with signal 11, Segmentation fault.
#0  0xb7c7c024 in pthread_mutex_lock () from 
/lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
#1  0xb7d5f0c6 in pthread_mutex_lock () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#2  0x08060438 in boost::asio::detail::posix_mutex::lock (this=0x16) at 
/usr/include/boost/asio/detail/posix_mutex.hpp:71
	error = 0
#3  0x08060559 in scoped_lock (this=0xb796807c, m=@0x16) at 
/usr/include/boost/asio/detail/scoped_lock.hpp:36
No locals.
#4  0x08060ad0 in 
boost::asio::detail::epoll_reactor<false>::close_descriptor (this=0x2, 
descriptor=-1291829448)
     at /usr/include/boost/asio/detail/epoll_reactor.hpp:297
	lock = {<boost::noncopyable_::noncopyable> = {<No data fields>}, mutex_ 
= @0x16, locked_ = 212}
	ev = {events = 6, data = {ptr = 0x25, fd = 37, u32 = 37, u64 = 8589934629}}
#5  0x0809b189 in 
boost::asio::detail::reactive_socket_service<boost::asio::ip::tcp, 
boost::asio::detail::epoll_reactor<false> >::close (this=0xb3004358, 
impl=@0xb3037f1c, ec=@0xb7968128) at 
/usr/include/boost/asio/detail/reactive_socket_service.hpp:210
No locals.
#6  0x0809b277 in 
boost::asio::stream_socket_service<boost::asio::ip::tcp>::close 
(this=0xb3000098, impl=@0xb3037f1c,
     ec=@0xb7968128) at 
/usr/include/boost/asio/stream_socket_service.hpp:145
No locals.
#7  0x0809b2c5 in boost::asio::basic_socket<boost::asio::ip::tcp, 
boost::asio::stream_socket_service<boost::asio::ip::tcp> >::close 
(this=0xb3037f18) at /usr/include/boost/asio/basic_socket.hpp:253
	ec = {m_val = 0, m_cat = 0xb7c74c10}
#8  0x08091129 in Session::handleTimeout (this=0xb3037f18, 
error=@0xb796821c) at Session.cc:80

Could someone explain this crash?

Second kind of crash (I really have no idea where this crash could come
from; the backtrace isn't very useful):

(gdb) bt full
#0 
boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>::unlock 
(this=0xb5631dac)
     at /usr/include/boost/asio/detail/scoped_lock.hpp:58
No locals.
#1  0x00000000 in ?? ()
No symbol table info available.

Could someone give me some advice on how to handle these crashes?
How could I get more data in the backtrace?

Thanks in advance for your help.
Regards

Axel
