[asio] Crash when posted handlers execute in one thread only?

Hi,

I'm not sure whether there is a race condition in the code below or whether I am not stopping the service in a sanctioned manner, but when run, the code consistently crashes on OS X 10.9 with Boost 1.55 and Clang. The only consistent pattern is that the crash occurs when all the posted completion handlers execute in a single thread. When the completion handlers execute across both threads, there is no crash. This suggests to me that there could be an initialization bug somewhere. I'll try to create a smaller test case.

Thanks for any ideas or suggestions.

The code:

    #include <boost/asio.hpp>
    #include <boost/thread.hpp>
    #include <boost/version.hpp>
    #include <iostream>
    #include <vector>

    // Serialize console output from several threads.
    #define LOG(msg) { \
        boost::mutex::scoped_lock lock(log_mutex); \
        std::cout << "[" << boost::this_thread::get_id() \
                  << "] " << msg << std::endl; \
    }

    void test()
    {
        boost::mutex log_mutex;
        LOG("Boost version: " << BOOST_LIB_VERSION);
        for(int ii = 0; ii < 100; ++ii)
        {
            LOG("Iteration: " << ii);
            boost::asio::io_service service;
            // Keeps run() from returning while the handler queue is empty.
            boost::asio::io_service::work work(service);
            boost::thread_group group;
            auto fun = [&](){
                LOG("Starting service");
                service.run();
                LOG("Completing service");
            };
            group.create_thread(fun);
            group.create_thread(fun);

            int ntasks = 10;
            volatile int val = 2;
            std::vector<volatile int> ints(ntasks,val);
            for(int ii = 0; ii < ntasks; ++ii)
            {
                service.post([&ints,ii,&log_mutex]() {
                    LOG("Task: " << ii);
                    ints[ii] = 1;
                });
            }
            // Post a final handler that stops the service.
            service.post([&service,&log_mutex](){
                LOG("Stopping service");
                service.stop();
            });
            group.join_all();
            // Report any task whose handler did not run.
            for(int ii = 0; ii < ntasks; ++ii)
            {
                if(ints[ii] != 1)
                {
                    LOG(ii << "->" << ints[ii]);
                }
            }
        }
        LOG("Done test");
    }

    int main()
    {
        try {
            test();
        }
        catch(...) {
            std::cout << "Exception in main()" << std::endl;
        }
    }

The output + stack trace:

    ...
    [0x7fff7aeb9310] Iteration: 91
    [0x104681000] Starting service
    [0x104704000] Starting service
    [0x104704000] Task: 0
    [0x104704000] Task: 1
    [0x104704000] Task: 2
    [0x104704000] Task: 3
    [0x104704000] Task: 4
    [0x104704000] Task: 5
    [0x104704000] Task: 6
    [0x104704000] Task: 7
    [0x104704000] Task: 8
    [0x104704000] Task: 9
    [0x104704000] Stopping service
    [0x104704000] Completing service
    [0x104681000] Completing service

    Program received signal EXC_BAD_ACCESS, Could not access memory.
    Reason: KERN_INVALID_ADDRESS at address: 0x0000000104703c57
    0x00007fff8b06cdf3 in _pthread_cond_updateval ()
    (gdb) where
    #0 0x00007fff8b06cdf3 in _pthread_cond_updateval ()
    #1 0x00007fff8b06c91e in _pthread_cond_signal ()
    #2 0x00000001000048d0 in boost::asio::detail::posix_event::signal_and_unlock<boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex> > () at build/boost/include/boost/asio/detail/posix_event.hpp:62
    #3 0x00000001000048d0 in boost::asio::detail::task_io_service::wake_one_idle_thread_and_unlock () at build/boost/include/boost/asio/detail/task_io_service.hpp:484
    #4 0x00000001000048d0 in boost::asio::detail::task_io_service::wake_one_thread_and_unlock (this=<value temporarily unavailable, due to optimizations>, lock=<value temporarily unavailable, due to optimizations>) at posix_event.hpp:493
    #5 0x0000000100004776 in boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>::~scoped_lock () at task_io_service.ipp:278
    #6 boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>::~scoped_lock () at build/boost/include/boost/asio/detail/scoped_lock.hpp:52
    #7 0x0000000100004776 in boost::asio::detail::task_io_service::post_immediate_completion (this=<value temporarily unavailable, due to optimizations>, op=<value temporarily unavailable, due to optimizations>, is_continuation=<value temporarily unavailable, due to optimizations>) at scoped_lock.hpp:279
    #8 0x0000000100001952 in test () at task_io_service.hpp:70
    #9 0x0000000100002ca0 in main () at app/exe/testcpp/main.cpp:64
    (gdb) info threads
      2 0x00007fff985fc662 in kevent64 ()
    * 1 0x00007fff8b06cdf3 in _pthread_cond_updateval ()

On 6/04/2014 02:49, quoth Sohail Somani:
> Not sure if there is a race condition in the code below, or if I am not stopping the service in a sanctioned manner but when run, the code consistently crashes on OSX 10.9 with Boost 1.55 and Clang. The only regular pattern was that the crash occurs when all the posted completion handlers execute in one thread only. When the completion handlers execute across both threads, there is no crash. This seems to me like there could be an initialization bug somewhere. I'll try and create a smaller test case.
Does it still crash if you remove the call to service.stop()? Because in the code below, that should be redundant. Also, starting the threads should occur *after* you post all the tasks. Otherwise there is a chance that none of them will actually execute.

> Does it still crash if you remove the call to service.stop()? Because in the code below, that should be redundant.
> Also, starting the threads should occur *after* you post all the tasks. Otherwise there is a chance that none of them will actually execute.
Note that the OP uses io_service::work, so running an idle io_service is OK.

On 2014-04-06, 8:20 PM, Gavin Lambert wrote:
> Does it still crash if you remove the call to service.stop()? Because in the code below, that should be redundant.
> Also, starting the threads should occur *after* you post all the tasks. Otherwise there is a chance that none of them will actually execute.
Thanks for your comments.

As Igor pointed out in another email, without a call to service.stop() the io_service will keep waiting for more handlers, which means group.join_all() will wait uninterrupted until the Sun finally explodes.

I was also able to reproduce the issue after cutting the test down to one thread, which means my earlier intuition that two threads were required was incorrect. There must be a race condition in the code, but I just don't see it right now.

Sohail

> Thanks for your comments.
> As Igor pointed out in another email, without a call to service.stop(), the io_service will continue to wait for more handlers which means the group.join_all() will continue to wait uninterrupted until the Sun finally explodes.
Just out of curiosity, if you create the io_service::work dynamically and destroy it instead of stopping the io_service, does it crash? Like this:

    auto work = make_shared<io_service::work>(service);
    //....
    // at the point where you want to stop the service:
    work.reset();

On 2014-04-07, 5:12 PM, Igor R wrote:
> Just out of curiosity, if you create io_service::work dynamically and destroy it instead of stopping io_service, does it crash? Like this:
>
>     auto work = make_shared<io_service::work>(service);
>     //....
>     // at the point where you want to stop the service:
>     work.reset();
With this change, it crashes once every two runs of the test app, so it crashes less often, but it still crashes reliably. It seems like a bug to me and not the PEBKAC for which I am infamous. Is anyone familiar enough with the asio library to say whether I should file a bug?

Thanks,
Sohail
participants (3)
- Gavin Lambert
- Igor R
- Sohail Somani