
Hi,
I have built a job queue from a combination of io_service, io_service::work, packaged_task and a thread group. I am randomly seeing either segfaults or strange asserts in pthread or even malloc, which suggests to me that memory is being corrupted somehow. The problem does not show up in debug mode running under valgrind, but if I run valgrind on the non-debug version I get valgrind errors in pthreads called from asio. I tried downgrading my Boost library from 1.54 to 1.52, but the result is still the same.
I am completely stuck on this; does anyone have hints about possible causes?
I cannot show the complete source since it's not possible to isolate, but here are the types I have:
std::vector<boost::shared_ptr<ExceptionTransfer> > io_error_;
std::vector<boost::shared_future<BasisSelectRet> > io_future_;
boost::asio::io_service io_service_threads_;
boost::shared_ptr<boost::asio::io_service::work> io_work_;
boost::thread_group io_threads_;
I am calling it like this:
/* Create new tasks */
basistasks_[workerid] = boost::make_shared<boost::packaged_task<BasisSelectRet> >(boost::bind(&CallBasisSelection,&worker,boost::ref(io_error_[workerid])));
/* Make threads */
io_service_threads_.post(boost::bind(&boost::packaged_task<BasisSelectRet>::operator(),basistasks_[workerid]));
io_future_[workerid] = basistasks_[workerid]->get_future().share();
The called function is defined as follows (BasisSelectRet is just an enum):
/* Helper function */
BasisSelectRet CallBasisSelection(TreeWorker *worker, boost::shared_ptr<ExceptionTransfer> &error)
{
  BasisSelectRet ret = BasisSelectRetOk;
  try { worker->BasisSelection(); }
  catch( ... ) { error->SetException(boost::current_exception()); }
  return ret;
}
I can wait for a job to finish like this:
boost::wait_for_any(io_future_.begin(),io_future_.end());
Here are the valgrind errors:
==3723== by 0xB5D77BF: ??? ==3723== ==3723== Thread 5: ==3723== Invalid read of size 4 ==3723== at 0x6506C01: __pthread_mutex_cond_lock (pthread_mutex_lock.c:50) ==3723== by 0x65010B2: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:203) ==3723== by 0x5F27B73: boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&, boost::asio::detail::task_io_service::thread_info&, boost::system::error_code const&) (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x5EEAA2A: boost::asio::io_service::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x4C160F9: boost::detail::thread_data<boost::_bi::bind_t<unsigned int, boost::_mfi::mf0<unsigned int, boost::asio::io_service>, boost::_bi::list1<boost::_bi::value<boost::asio::io_service*> > > >::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x4E25DAB: thread_proxy (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x64FC96D: start_thread (pthread_create.c:300) ==3723== by 0x65DD98D: clone (clone.S:130) ==3723== 
Address 0x670c334 is not stack'd, malloc'd or (recently) free'd ==3723== ==3723== Invalid read of size 4 ==3723== at 0x6506C36: __pthread_mutex_cond_lock (pthread_mutex_lock.c:61) ==3723== by 0x65010B2: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:203) ==3723== by 0x5F27B73: boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&, boost::asio::detail::task_io_service::thread_info&, boost::system::error_code const&) (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x5EEAA2A: boost::asio::io_service::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x4C160F9: boost::detail::thread_data<boost::_bi::bind_t<unsigned int, boost::_mfi::mf0<unsigned int, boost::asio::io_service>, boost::_bi::list1<boost::_bi::value<boost::asio::io_service*> > > >::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x4E25DAB: thread_proxy (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x64FC96D: start_thread (pthread_create.c:300) ==3723== by 0x65DD98D: clone (clone.S:130) ==3723== Address 0x670c328 is not stack'd, malloc'd or (recently) free'd ==3723== ==3723== Invalid read of size 4 ==3723== at 0x6506C40: __pthread_mutex_cond_lock (pthread_mutex_lock.c:62) ==3723== by 0x65010B2: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:203) ==3723== by 0x5F27B73: boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&, boost::asio::detail::task_io_service::thread_info&, boost::system::error_code const&) (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x5EEAA2A: boost::asio::io_service::run() (in 
/home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x4C160F9: boost::detail::thread_data<boost::_bi::bind_t<unsigned int, boost::_mfi::mf0<unsigned int, boost::asio::io_service>, boost::_bi::list1<boost::_bi::value<boost::asio::io_service*> > > >::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x4E25DAB: thread_proxy (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x64FC96D: start_thread (pthread_create.c:300) ==3723== by 0x65DD98D: clone (clone.S:130) ==3723== Address 0x670c330 is not stack'd, malloc'd or (recently) free'd ==3723== ==3723== Invalid write of size 4 ==3723== at 0x6506C4C: __pthread_mutex_cond_lock (pthread_mutex_lock.c:125) ==3723== by 0x65010B2: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:203) ==3723== by 0x5F27B73: boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&, boost::asio::detail::task_io_service::thread_info&, boost::system::error_code const&) (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x5EEAA2A: boost::asio::io_service::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x4C160F9: boost::detail::thread_data<boost::_bi::bind_t<unsigned int, boost::_mfi::mf0<unsigned int, boost::asio::io_service>, boost::_bi::list1<boost::_bi::value<boost::asio::io_service*> > > >::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x4E25DAB: thread_proxy (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x64FC96D: start_thread (pthread_create.c:300) ==3723== by 0x65DD98D: clone 
(clone.S:130) ==3723== Address 0x670c330 is not stack'd, malloc'd or (recently) free'd ==3723== ==3723== Invalid read of size 1 ==3723== at 0x5F27B2B: boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&, boost::asio::detail::task_io_service::thread_info&, boost::system::error_code const&) (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0xDD41297: ??? ==3723== Address 0x670c360 is not stack'd, malloc'd or (recently) free'd ==3723== ==3723== Invalid read of size 4 ==3723== at 0x6500311: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:37) ==3723== by 0x5EEAA58: boost::asio::io_service::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x4C160F9: boost::detail::thread_data<boost::_bi::bind_t<unsigned int, boost::_mfi::mf0<unsigned int, boost::asio::io_service>, boost::_bi::list1<boost::_bi::value<boost::asio::io_service*> > > >::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x4E25DAB: thread_proxy (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x64FC96D: start_thread (pthread_create.c:300) ==3723== by 0x65DD98D: clone (clone.S:130) ==3723== Address 0x670c334 is not stack'd, malloc'd or (recently) free'd ==3723== ==3723== Invalid write of size 4 ==3723== at 0x650032F: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:46) ==3723== by 0x5EEAA58: boost::asio::io_service::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x4C160F9: boost::detail::thread_data<boost::_bi::bind_t<unsigned int, boost::_mfi::mf0<unsigned int, boost::asio::io_service>, boost::_bi::list1<boost::_bi::value<boost::asio::io_service*> > > >::run() (in 
/home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x4E25DAB: thread_proxy (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x64FC96D: start_thread (pthread_create.c:300) ==3723== by 0x65DD98D: clone (clone.S:130) ==3723== Address 0x670c330 is not stack'd, malloc'd or (recently) free'd ==3723== ==3723== Invalid read of size 4 ==3723== at 0x6500360: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:49) ==3723== by 0x5EEAA58: boost::asio::io_service::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x4C160F9: boost::detail::thread_data<boost::_bi::bind_t<unsigned int, boost::_mfi::mf0<unsigned int, boost::asio::io_service>, boost::_bi::list1<boost::_bi::value<boost::asio::io_service*> > > >::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x4E25DAB: thread_proxy (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x64FC96D: start_thread (pthread_create.c:300) ==3723== by 0x65DD98D: clone (clone.S:130) ==3723== Address 0x670c338 is not stack'd, malloc'd or (recently) free'd ==3723== ==3723== Invalid read of size 4 ==3723== at 0x6500340: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:52) ==3723== by 0x5EEAA58: boost::asio::io_service::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x4C160F9: boost::detail::thread_data<boost::_bi::bind_t<unsigned int, boost::_mfi::mf0<unsigned int, boost::asio::io_service>, boost::_bi::list1<boost::_bi::value<boost::asio::io_service*> > > >::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x4E25DAB: thread_proxy (in 
/home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x64FC96D: start_thread (pthread_create.c:300) ==3723== by 0x65DD98D: clone (clone.S:130) ==3723== Address 0x670c328 is not stack'd, malloc'd or (recently) free'd ==3723== ==3723== Invalid write of size 4 ==3723== at 0x6503C55: __lll_unlock_wake (lowlevellock.S:374) ==3723== by 0x5EEAA58: boost::asio::io_service::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x4C160F9: boost::detail::thread_data<boost::_bi::bind_t<unsigned int, boost::_mfi::mf0<unsigned int, boost::asio::io_service>, boost::_bi::list1<boost::_bi::value<boost::asio::io_service*> > > >::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x4E25DAB: thread_proxy (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x64FC96D: start_thread (pthread_create.c:300) ==3723== by 0x65DD98D: clone (clone.S:130) ==3723== Address 0x670c328 is not stack'd, malloc'd or (recently) free'd ==3723== ==3723== Syscall param futex(futex) points to unaddressable byte(s) ==3723== at 0x6503C7C: __lll_unlock_wake (lowlevellock.S:380) ==3723== by 0x5EEAA58: boost::asio::io_service::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x4C160F9: boost::detail::thread_data<boost::_bi::bind_t<unsigned int, boost::_mfi::mf0<unsigned int, boost::asio::io_service>, boost::_bi::list1<boost::_bi::value<boost::asio::io_service*> > > >::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x4E25DAB: thread_proxy (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x64FC96D: start_thread 
(pthread_create.c:300) ==3723== by 0x65DD98D: clone (clone.S:130) ==3723== Address 0x670c328 is not stack'd, malloc'd or (recently) free'd ==3723== <Log> 20 0 20 -3.05198e+02 - - - 1127 31.18 ==3723== Thread 3: ==3723== Invalid read of size 8 ==3723== at 0x4E0ED3F: __intel_ssse3_memcpy (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x7439DF7: ??? ==3723== Address 0x66f3b58 is 112 bytes inside a block of size 116 alloc'd ==3723== at 0x4024F20: malloc (vg_replace_malloc.c:236) ==3723== by 0x582A7B6: _ZN3slm6Matrix16ResizeVecsInternEiPKib. (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x7439DF7: ??? ==3723== ==3723== Invalid read of size 8 ==3723== at 0x4E0ED6F: __intel_ssse3_memcpy (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x7439DF7: ??? ==3723== Address 0x66f5d48 is 136 bytes inside a block of size 140 alloc'd ==3723== at 0x4024F20: malloc (vg_replace_malloc.c:236) ==3723== by 0x582A7B6: _ZN3slm6Matrix16ResizeVecsInternEiPKib. (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x7439DF7: ??? 
==3723== <Log> 20 0 20 -3.05198e+02 -2.68521e+02* 12.0 1 1139 36.72 ==3723== Thread 4: ==3723== Invalid read of size 4 ==3723== at 0x6506C01: __pthread_mutex_cond_lock (pthread_mutex_lock.c:50) ==3723== by 0x65010B2: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:203) ==3723== by 0x5F27B73: boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&, boost::asio::detail::task_io_service::thread_info&, boost::system::error_code const&) (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x5EEA9C5: boost::asio::io_service::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x4C160F9: boost::detail::thread_data<boost::_bi::bind_t<unsigned int, boost::_mfi::mf0<unsigned int, boost::asio::io_service>, boost::_bi::list1<boost::_bi::value<boost::asio::io_service*> > > >::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x4E25DAB: thread_proxy (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x64FC96D: start_thread (pthread_create.c:300) ==3723== by 0x65DD98D: clone (clone.S:130) ==3723== Address 0xb93ffac is not stack'd, malloc'd or (recently) free'd ==3723== ==3723== Invalid read of size 4 ==3723== at 0x6506C36: __pthread_mutex_cond_lock (pthread_mutex_lock.c:61) ==3723== by 0x65010B2: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:203) ==3723== by 0x5F27B73: boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&, boost::asio::detail::task_io_service::thread_info&, boost::system::error_code const&) (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x5EEA9C5: boost::asio::io_service::run() (in 
/home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x4C160F9: boost::detail::thread_data<boost::_bi::bind_t<unsigned int, boost::_mfi::mf0<unsigned int, boost::asio::io_service>, boost::_bi::list1<boost::_bi::value<boost::asio::io_service*> > > >::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x4E25DAB: thread_proxy (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x64FC96D: start_thread (pthread_create.c:300) ==3723== by 0x65DD98D: clone (clone.S:130) ==3723== Address 0xb93ffa0 is not stack'd, malloc'd or (recently) free'd ==3723== ==3723== Invalid read of size 4 ==3723== at 0x6506C40: __pthread_mutex_cond_lock (pthread_mutex_lock.c:62) ==3723== by 0x65010B2: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:203) ==3723== by 0x5F27B73: boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&, boost::asio::detail::task_io_service::thread_info&, boost::system::error_code const&) (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x5EEA9C5: boost::asio::io_service::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x4C160F9: boost::detail::thread_data<boost::_bi::bind_t<unsigned int, boost::_mfi::mf0<unsigned int, boost::asio::io_service>, boost::_bi::list1<boost::_bi::value<boost::asio::io_service*> > > >::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x4E25DAB: thread_proxy (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x64FC96D: start_thread (pthread_create.c:300) ==3723== by 0x65DD98D: clone (clone.S:130) 
==3723== Address 0xb93ffa8 is not stack'd, malloc'd or (recently) free'd ==3723== ==3723== Invalid write of size 4 ==3723== at 0x6506C4C: __pthread_mutex_cond_lock (pthread_mutex_lock.c:125) ==3723== by 0x65010B2: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.S:203) ==3723== by 0x5F27B73: boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&, boost::asio::detail::task_io_service::thread_info&, boost::system::error_code const&) (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x5EEA9C5: boost::asio::io_service::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0x4C160F9: boost::detail::thread_data<boost::_bi::bind_t<unsigned int, boost::_mfi::mf0<unsigned int, boost::asio::io_service>, boost::_bi::list1<boost::_bi::value<boost::asio::io_service*> > > >::run() (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x4E25DAB: thread_proxy (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulumnet20.so) ==3723== by 0x64FC96D: start_thread (pthread_create.c:300) ==3723== by 0x65DD98D: clone (clone.S:130) ==3723== Address 0xb93ffa8 is not stack'd, malloc'd or (recently) free'd ==3723== ==3723== Thread 5: ==3723== Invalid read of size 1 ==3723== at 0x5F27B2B: boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&, boost::asio::detail::task_io_service::thread_info&, boost::system::error_code const&) (in /home/bj/sulum/root/src/bjam/bin/intel-linux/release/address-model-32/threading-multi/libsulum20.so) ==3723== by 0xCD40297: ??? ==3723== Address 0xb93ffd8 is not stack'd, malloc'd or (recently) free'd ==3723==

On 9/30/2013 5:41 PM, Quoth Bo Jensen:
I have built a job queue with a combination io_service, io_service::work, packaged_task and a thread group. I am seeing randomly either segfaults or strange asserts in pthread or even malloc, which suggests to me memory is being corrupted somehow.
[...]
I can wait for a job to finish like this:
boost::wait_for_any(io_future_.begin(),io_future_.end());
What are you doing once this has returned?
In particular, note that if you want to "give up" on retrieving any further results then you must stop() the io_service and join_all() the thread group (in that order) before you allow the io_service or thread_group to be destroyed. Their respective destructors do not do this for you.
(Also note that this will of course still complete however many tasks have already started to process, unless you have some other means of cancelling a task in progress, such as using interruption points.)
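The ordering described above (stop first, then join, and only then destroy) can be sketched with the C++11 standard library. This is not Boost.Asio: TinyPool, Post, Stop, JoinAll and Demo are hypothetical names standing in for io_service, post(), stop() and thread_group::join_all(), chosen only to illustrate why the workers must be gone before the mutex and condition variable they wait on are destroyed.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for io_service + thread_group (standard library only).
class TinyPool {
public:
    explicit TinyPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            threads_.emplace_back([this] { Run(); });
    }

    // Queue a job; plays the role of io_service::post().
    void Post(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push_back(std::move(job));
        }
        cv_.notify_one();
    }

    // Step 1: tell the workers to stop picking up jobs (the stop() role).
    void Stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopped_ = true;
        }
        cv_.notify_all();
    }

    // Step 2: wait until every worker has left Run() (the join_all() role).
    void JoinAll() {
        for (std::thread &t : threads_)
            if (t.joinable()) t.join();
    }

    // Repeats both steps so that mutex_ and cv_ are never destroyed
    // while a worker is still blocked on them.
    ~TinyPool() { Stop(); JoinAll(); }

private:
    void Run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopped_ || !jobs_.empty(); });
                if (stopped_) return;  // pending jobs are dropped on stop
                job = std::move(jobs_.front());
                jobs_.pop_front();
            }
            job();  // run outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> jobs_;
    bool stopped_ = false;
    std::vector<std::thread> threads_;
};

// Usage: post one job, wait for its result, then shut down in order.
int Demo() {
    TinyPool pool(2);
    std::promise<int> result;
    std::future<int> done = result.get_future();
    pool.Post([&result] { result.set_value(42); });
    int value = done.get();  // blocks until the job has run
    pool.Stop();             // first stop ...
    pool.JoinAll();          // ... then join, before anything is destroyed
    return value;
}
```

In this sketch the destructor performs the shutdown itself; the reply above points out that the Boost classes do not, which is why the calls have to be made explicitly there.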

On Mon, Sep 30, 2013 at 12:03 AM, Gavin Lambert <gavinl@compacsort.com> wrote:
What are you doing once this has returned?
In particular note that if you want to "give up" on retrieving any further results then you must stop() the io_service and join_all() the thread group (in that order) before you allow the io_service or thread_group to be destroyed. Their respective destructors do not do this for you.
(Also note that this will of course still complete however many tasks have already started to process, unless you have some other means of cancelling a task in progress, such as using interruption points.)
Thank you for the reply, much appreciated.
I use interruption points. When I want to stop the workers and end the program I call:
io_work_.reset();
io_threads_.interrupt_all();
JoinAllWorkers();
I see I have no io_service::stop(), but I read in a Stack Exchange reply that it was not needed in this case. Is that true?
Also, I don't call join on the thread group but on each thread individually, which I assume must be OK (done in JoinAllWorkers()).
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

On Mon, Sep 30, 2013 at 12:21 AM, Bo Jensen <jensen.bo@gmail.com> wrote:
Also I don't call join on the thread group, but on each thread individually, which I assume must be OK (done in JoinAllWorker()).
I can also get the same errors with only one thread in the queue plus the main thread. In this case there's no work in the main thread until the worker thread is finishing. It gets even more suspicious since this module should be deterministic (ensured by processing jobs in a certain order), which it obviously is not, i.e. I can rerun it and it works, or it may fail with another error. I never get valgrind errors in my part of the code, though.

On 9/30/2013 6:21 PM, Quoth Bo Jensen:
I see I have no io_service::stop(), but I read in a stack exchange reply that was not needed in this case, is that true ?
Mostly. If you don't stop() the io_service then it will still waste some time entering each posted-but-not-yet-run job and running up until the first interruption point (assuming that the interruption point will still be "live" after the interruption exception has been thrown once, which I'm not entirely sure about -- I haven't really played with interruption points too much myself). If your number of jobs is less than or equal to your number of worker threads then this will be invisible except in a rare race if the job completes very quickly. Even with a larger number of jobs it might be hard to spot if some of their work is asynchronous itself, thereby allowing all of the original jobs to get run early on. Either way though as long as you join all the worker threads before you allow the service to be destroyed then it ought to be safe -- if you hit the above case then it'll just take longer to join than you might be expecting.
Also I don't call join on the thread group, but on each thread individually, which I assume must be OK (done in JoinAllWorker()).
The thread_group::join_all() is just an individual thread join in a loop, so it should be fine as long as you're doing an unconditional join and not a timed_join.
Otherwise I can't think of anything else helpful, sorry! :(
You could try asking on the asio-specific mailing list. They're likely going to ask you to try to make a smaller test-case that you can share in full though.

On 9/30/2013 5:41 PM, Quoth Bo Jensen:
I am calling it ala like this :
/* Create new tasks */
basistasks_[workerid] = boost::make_shared<boost::packaged_task<BasisSelectRet> >(boost::bind(&CallBasisSelection,&worker,boost::ref(io_error_[workerid])));
What is "worker" here? Could it become invalid before the join completes? Also note that when passing it to a different thread it's safer to allow the shared_ptr to be copied rather than being passed by reference, although this probably isn't related to your current problem.
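The difference between copying the shared_ptr into the bound job and binding it by reference can be sketched as follows. The sketch uses std::bind and std::shared_ptr as stand-ins for the Boost equivalents, and ErrorSlot, Record and the three helper functions are hypothetical names, not code from the thread.

```cpp
#include <functional>
#include <memory>

// Hypothetical stand-in for ExceptionTransfer.
struct ErrorSlot { int code = 0; };

// Hypothetical job body that writes through the shared_ptr.
void Record(const std::shared_ptr<ErrorSlot> &slot) { slot->code = 1; }

// Binding by value: the job object stores its own copy of the shared_ptr.
long CountWhenBoundByValue() {
    auto slot = std::make_shared<ErrorSlot>();
    auto job = std::bind(&Record, slot);
    (void)job;
    return slot.use_count();  // owner + copy held by the job
}

// Binding with std::ref: the job only remembers where the owner's pointer
// lives, so it shares no ownership.
long CountWhenBoundByRef() {
    auto slot = std::make_shared<ErrorSlot>();
    auto job = std::bind(&Record, std::ref(slot));
    (void)job;
    return slot.use_count();  // owner only
}

// With a copy, the job stays safe even if the owner lets go first.
bool JobKeepsSlotAlive() {
    auto slot = std::make_shared<ErrorSlot>();
    std::weak_ptr<ErrorSlot> watch = slot;
    std::function<void()> job = std::bind(&Record, slot);
    slot.reset();  // owner releases its reference before the job runs
    job();         // fine: the copy inside the job keeps the object alive
    return !watch.expired();
}
```

Had the last function bound the pointer with std::ref, resetting the owner would have left the job dereferencing an empty shared_ptr, which is the hazard the advice above is about.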
/* Helper function */
BasisSelectRet CallBasisSelection(TreeWorker *worker, boost::shared_ptr<ExceptionTransfer> &error)
{
  BasisSelectRet ret = BasisSelectRetOk;
  try { worker->BasisSelection(); }
  catch( ... ) { error->SetException(boost::current_exception()); }
  return ret;
}
You probably shouldn't be catching exceptions here. packaged_task will do that for you anyway, and as it stands I think there's a slim chance of a broken_promise exception that will bypass this code, so you'd need to catch where you're accessing the futures anyway. (If your number of jobs is larger than your number of threads.)
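Both points can be illustrated with std::packaged_task, used here as an assumed analogue of boost::packaged_task. FailingJob, RunAndReport and AbandonedTaskBreaksPromise are hypothetical names for this sketch: the first helper shows an exception thrown inside the task reappearing at get(), the second shows the broken_promise error that arises when a task is destroyed without ever having run.

```cpp
#include <future>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical job that fails; stands in for worker->BasisSelection().
int FailingJob() { throw std::runtime_error("basis selection failed"); }

// Runs the job on another thread and reports what the future delivers.
std::string RunAndReport() {
    std::packaged_task<int()> task(&FailingJob);
    std::future<int> result = task.get_future();
    std::thread runner(std::move(task));
    runner.join();
    try {
        result.get();  // rethrows the exception the task stored
        return "ok";
    } catch (const std::runtime_error &e) {
        return e.what();
    }
}

// A task that is destroyed without being invoked breaks its promise;
// no try/catch inside the job can intercept that, since the job never ran.
bool AbandonedTaskBreaksPromise() {
    std::future<int> result;
    {
        std::packaged_task<int()> task(&FailingJob);
        result = task.get_future();
    }  // task destroyed here, never run
    try {
        result.get();
    } catch (const std::future_error &e) {
        return e.code() == std::future_errc::broken_promise;
    }
    return false;
}
```

Either way the handling ends up at the place where the future is read, which is why a catch around get() is needed regardless of what the job itself catches.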

On Mon, Sep 30, 2013 at 1:20 AM, Gavin Lambert <gavinl@compacsort.com> wrote:
On 9/30/2013 5:41 PM, Quoth Bo Jensen:
I am calling it ala like this :
/* Create new tasks */
basistasks_[workerid] = boost::make_shared<boost::packaged_task<BasisSelectRet> >(boost::bind(&CallBasisSelection,&worker,boost::ref(io_error_[workerid])));
What is "worker" here? Could it become invalid before the join completes?
Worker is an instance of a class that does some work. It cannot become invalid before then, since its destructor is called long after the thread is destroyed.
Also note that when passing it to a different thread it's safer to allow the shared_ptr to be copied rather than being passed by reference, although this probably isn't related to your current problem.
Thanks, I will change it.
/* Helper function */
BasisSelectRet CallBasisSelection(TreeWorker *worker, boost::shared_ptr<ExceptionTransfer> &error)
{
  BasisSelectRet ret = BasisSelectRetOk;
  try { worker->BasisSelection(); }
  catch( ... ) { error->SetException(boost::current_exception()); }
  return ret;
}
You probably shouldn't be catching exceptions here. packaged_task will do that for you anyway, and as it stands I think there's a slim chance of a broken_promise exception that will bypass this code, so you'd need to catch where you're accessing the futures anyway. (If your number of jobs is larger than your number of threads.)
There are several issues with exceptions; I am following the approach in the last example given in this post:
https://plus.google.com/u/0/102920706569335701415/posts/VuKMpMhKSnm
I have tested whether it works properly, and it does.

This turned out to be an issue on my side. The problem was that I have memory pools which need to run in serial; the data was not protected in all places, causing random memory corruption. I apologize, and thanks for the reply, Gavin.
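A minimal sketch of the kind of protection the resolution implies: a pool shared between worker threads needs every access to its bookkeeping serialized. The poster's pool code was never shown, so LockedPool, its slot-index interface and StressPool are hypothetical, and the sketch uses std::mutex and std::thread rather than the Boost equivalents.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical pool (not the poster's code): hands out slot indices
// from a free list that several threads share.
class LockedPool {
public:
    explicit LockedPool(std::size_t slots) {
        for (std::size_t i = 0; i < slots; ++i) free_.push_back(i);
    }

    // Returns a free slot index, or -1 when the pool is exhausted.
    long Acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (free_.empty()) return -1;
        std::size_t slot = free_.back();
        free_.pop_back();
        return static_cast<long>(slot);
    }

    // Gives a slot back to the pool.
    void Release(long slot) {
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(static_cast<std::size_t>(slot));
    }

    std::size_t FreeCount() {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_.size();
    }

private:
    std::mutex mutex_;               // guards free_
    std::vector<std::size_t> free_;  // indices of unused slots
};

// Hammers one pool from several threads and returns how many slots are
// free afterwards; with the lock in place every slot comes back.
std::size_t StressPool(unsigned threads, unsigned rounds) {
    LockedPool pool(8);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t)
        workers.emplace_back([&pool, rounds] {
            for (unsigned i = 0; i < rounds; ++i) {
                long slot = pool.Acquire();
                if (slot >= 0) pool.Release(slot);
            }
        });
    for (std::thread &w : workers) w.join();
    return pool.FreeCount();
}
```

Without the lock_guard lines, concurrent push_back and pop_back calls on the same vector are a data race, and the resulting heap damage can surface far from the pool itself, for example inside malloc or pthread, much like the symptoms reported at the start of the thread.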