observing process blocked on find_or_construct

Hi,
We are testing an interprocess scenario: one process creates shared memory objects and is then killed, and a second process opens the objects using find_or_construct. The second process blocks while acquiring the lock. The backtrace is shown below:
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007fa6b6733eb6 in _L_lock_941 () from /lib64/libpthread.so.0
#2 0x00007fa6b6733daf in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7fa6b198b070) at ../nptl/pthread_mutex_lock.c:113
#3 0x0000000001bc3822 in lock (this=0x7fa6b198b070) at /x86/include/boost/interprocess/sync/posix/recursive_mutex.hpp:90
#4 lock (this=0x7fa6b198b070) at /x86/include/boost/interprocess/sync/interprocess_recursive_mutex.hpp:163
#5 scoped_lock (m=..., this=<synthetic pointer>) at /x86/include/boost/interprocess/sync/scoped_lock.hpp:81
#6 boost::interprocess::segment_manager<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index>::priv_generic_named_construct<char> (this=0x7fa6b198b010, type=<optimized out>, name=0x7ffde183ca20 "reported_slice_config_list", num=1, try2find=<optimized out>, dothrow=<optimized out>, table=..., index=..., is_intrusive=...) at /x86/include/boost/interprocess/segment_manager.hpp:1076
#7 0x0000000001bee007 in boost::interprocess::segment_manager<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>, boost::interprocess::iset_index>::priv_generic_construct (this=<optimized out>, name=name@entry=0x7ffde183ca20 "reported_slice_config_list", num=num@entry=1, try2find=try2find@entry=true, dothrow=dothrow@entry=true, table=...) at /x86/include/boost/interprocess/segment_manager.hpp:760
#8 0x0000000001bf112d in generic_construct<boost::container::map<unsigned int, gnb::broadcast_plmn_list, std::less<unsigned int>, boost::interprocess::allocator<std::pair<unsigned int const, gnb::broadcast_plmn_list>, boost::interprocess::segment_manager<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family>, boost::interprocess::iset_index> > > > (table=..., dothrow=true, try2find=true, num=1, name=0x7ffde183ca20 "reported_slice_config_list", this=<optimized out>) at /x86/include/boost/interprocess/segment_manager.hpp:704
#9 operator()<std::less<unsigned int>, boost::interprocess::allocator<std::pair<unsigned int const, gnb::broadcast_plmn_list>, boost::interprocess::segment_manager<char, boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, long int, long unsigned int, 0ul>, 0ul>, boost::interprocess::iset_index> >&> (this=<optimized out>) at /x86/include/boost/interprocess/detail/named_proxy.hpp:132
Please let me know in which cases this can occur. We also see the first process hang during termination.
-- Regards, Murali Kishore

Murali Kishore wrote:
Hi ,
We are trying interprocess scenario, one process created shared memory objects and killed and second process open the object using find_or_construct seeing this process blocked at getting lock.
bt as shown below,
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007fa6b6733eb6 in _L_lock_941 () from /lib64/libpthread.so.0
#2 0x00007fa6b6733daf in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7fa6b198b070) at ../nptl/pthread_mutex_lock.c:113
#3 0x0000000001bc3822 in lock (this=0x7fa6b198b070) at /x86/include/boost/interprocess/sync/posix/recursive_mutex.hpp:90
Detecting mutexes held by dead processes requires the so-called POSIX robust mutexes. I can see in the Interprocess source code that those are enabled when the macro BOOST_INTERPROCESS_POSIX_ROBUST_MUTEXES is defined.
This macro is defined automatically
#if (_XOPEN_SOURCE >= 700 || _POSIX_C_SOURCE >= 200809L)
https://github.com/boostorg/interprocess/blob/29cee9c6067f1d20ddb6421af15977...
but I suppose you can also try defining it manually and see if that helps, because _XOPEN_SOURCE and _POSIX_C_SOURCE are also user macros and you probably aren't defining any of them.
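
The robust-mutex behaviour mentioned above can be illustrated at the POSIX level. The following is a minimal sketch, assuming Linux with glibc; it uses the plain pthread API rather than Boost.Interprocess, and the helper name lock_after_owner_death is hypothetical. It is not the code Interprocess uses internally, only the underlying mechanism that BOOST_INTERPROCESS_POSIX_ROBUST_MUTEXES is said to enable.

```cpp
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>

// Hypothetical helper (not part of Boost.Interprocess): a child process
// locks a robust, process-shared mutex and dies without unlocking it.
// Returns what pthread_mutex_lock() reports to the surviving parent,
// or -1 if the setup itself failed.
int lock_after_owner_death()
{
    // Anonymous shared mapping so parent and child use the same mutex object.
    void* mem = mmap(nullptr, sizeof(pthread_mutex_t), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    pthread_mutex_t* m = static_cast<pthread_mutex_t*>(mem);

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);

    pid_t pid = fork();
    if (pid < 0) {
        pthread_mutex_destroy(m);
        munmap(mem, sizeof(pthread_mutex_t));
        return -1;
    }
    if (pid == 0) {
        pthread_mutex_lock(m);
        _exit(0);                     // owner dies while holding the lock
    }
    waitpid(pid, nullptr, 0);

    // A non-robust mutex would block here forever. A robust one reports
    // the dead owner instead.
    int rc = pthread_mutex_lock(m);
    if (rc == EOWNERDEAD) {
        // The lock is held now, but the protected data may be inconsistent.
        // It has to be repaired or discarded before the mutex is marked
        // usable again.
        pthread_mutex_consistent(m);
    }
    if (rc == 0 || rc == EOWNERDEAD) pthread_mutex_unlock(m);

    pthread_mutex_destroy(m);
    munmap(mem, sizeof(pthread_mutex_t));
    return rc;
}
```

On a system with robust mutex support, lock_after_owner_death() should return EOWNERDEAD rather than hanging in __lll_lock_wait as in the backtrace above.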

Thanks, Peter Dimov. I have added logic to unlock the mutex in a signal handler, and I no longer see this issue.
I am now seeing another issue: if I construct an object, do some work, and clear it, in a loop of ~30000 iterations, I see the following error during the construct call.
#0 set_bits (b=0, n=...) at /x86/include/boost/interprocess/offset_ptr.hpp:728
#1 set_color (c=<optimized out>, n=...) at /x86/include/boost/intrusive/detail/rbtree_node.hpp:167
#2 boost::intrusive::rbtree_algorithms<boost::intrusive::rbtree_node_traits<boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, true> >::rebalance_after_insertion (header=..., p=...) at /x86/include/boost/intrusive/rbtree_algorithms.hpp:558
#3 0x0000000001c3ba5d in insert_equal<boost::intrusive::detail::key_nodeptr_comp<std::less<boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family>::block_ctrl>, boost::intrusive::bhtraits<boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family>::block_ctrl, boost::intrusive::rbtree_node_traits<boost::interprocess::offset_ptr<void>, true>, (boost::intrusive::link_mode_type)0, boost::intrusive::dft_tag, 3u>, boost::move_detail::identity<boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family>::block_ctrl> > > (comp=..., new_node=..., hint=..., header=...) at /x86/include/boost/intrusive/rbtree_algorithms.hpp:388
#4 boost::intrusive::bstree_impl<boost::intrusive::bhtraits<boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::block_ctrl, boost::intrusive::rbtree_node_traits<boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, true>, (boost::intrusive::link_mode_type)0, boost::intrusive::dft_tag, 3u>, void, void, unsigned long, true, (boost::intrusive::algo_types)5, void>::insert_equal (this=this@entry=0x7fcdf86f5038, hint=..., value=...) at /x86/include/boost/intrusive/bstree.hpp:1085
#5 0x0000000001c3e9f5 in insert (value=..., hint=..., this=0x7fcdf86f5038) at /x86/include/boost/intrusive/set.hpp:752
#6 boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::priv_check_and_allocate (this=this@entry=0x7fcdf86f5010, nunits=nunits@entry=10, block=0x7fcdf87429c0, received_size=@0x7fcdfb5f5730: 149) at /x86/include/boost/interprocess/mem_algo/rbtree_best_fit.hpp:1282
#7 0x0000000001c4c7ac in priv_allocate (backwards_multiple=1, reuse_ptr=<synthetic pointer>, prefer_in_recvd_out_size=@0x7fcdfb5f5730: 149, limit_size=<optimized out>, command=1, this=0x7fcdf86f5010) at /x86/include/boost/interprocess/mem_algo/rbtree_best_fit.hpp:977
#8 boost::interprocess::rbtree_best_fit<boost::interprocess::mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::allocate (this=this@entry=0x7fcdf86f5010, nbytes=<optimized out>) at /x86/include/boost/interprocess/mem_algo/rbtree_best_fit.hpp:673
#9 0x0000000001c4cc85 in allocate (nbytes=<optimized out>, this=0x7fcdf86f5010) at /x86/include/boost/interprocess/segment_manager.hpp:177
Let me know what can cause this kind of issue.
Regards, Murali
On Thu, Jan 23, 2025 at 11:35 PM Peter Dimov <pdimov@gmail.com> wrote:
Murali Kishore wrote:
Hi ,
We are trying interprocess scenario, one process created shared memory objects and killed and second process open the object using find_or_construct seeing this process blocked at getting lock.
bt as shown below,
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007fa6b6733eb6 in _L_lock_941 () from /lib64/libpthread.so.0
#2 0x00007fa6b6733daf in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7fa6b198b070) at ../nptl/pthread_mutex_lock.c:113
#3 0x0000000001bc3822 in lock (this=0x7fa6b198b070) at /x86/include/boost/interprocess/sync/posix/recursive_mutex.hpp:90
Detecting mutexes held by dead processes requires the so-called POSIX robust mutexes. I can see in the Interprocess source code that those are enabled when the macro BOOST_INTERPROCESS_POSIX_ROBUST_MUTEXES is defined.
This macro is defined automatically
#if (_XOPEN_SOURCE >= 700 || _POSIX_C_SOURCE >= 200809L)
https://github.com/boostorg/interprocess/blob/29cee9c6067f1d20ddb6421af15977...
but I suppose you can also try defining it manually and see if that helps, because _XOPEN_SOURCE and _POSIX_C_SOURCE are also user macros and you probably aren't defining any of them.
-- Regards, Murali Kishore

On 1/28/25 15:16, Murali Kishore via Boost wrote:
Thanks Peter Dimov, i have added logic unlock in signal handler, now i am not seeing this issue.
I am seeing one more issue, if i call construct object and do work and clear in loop of ~30000, i see following error while construct call.
It's not enough to just unlock the mutex (or recover it if it was abandoned). You also need to restore the state it was protecting to a consistent state. This is often unrealistic, since you don't know which part of the state is corrupted or how to repair it. For example, you don't have the means to repair the segment manager if its internal object tree is left corrupted, and you don't know whether any of the objects stored in it are half-constructed or otherwise inconsistent.
Typically, your best course of action when you detect an abandoned mutex is to scrap the data it protects and start from scratch. Also, try hard not to abandon mutexes in the first place, e.g. don't just kill the process on a signal; let it finish its work on the shared memory first.
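
One way to follow the advice about not abandoning mutexes is to have the signal handler only record that a stop was requested, and let the worker leave its loop between critical sections. The sketch below is an illustration under assumptions: the names g_stop_requested, request_stop and run_until_stopped are hypothetical, SIGTERM is an arbitrary choice of signal, and `update` stands in for one complete modification of the shared memory. Note that pthread_mutex_unlock is not on the POSIX list of async-signal-safe functions, so unlocking from inside a handler is not something the standard guarantees to work.

```cpp
#include <csignal>
#include <mutex>

// Set by the handler, polled by the worker. A volatile sig_atomic_t flag
// is the portable way for a handler to communicate with the main flow.
static volatile std::sig_atomic_t g_stop_requested = 0;

extern "C" void request_stop(int /*signo*/)
{
    g_stop_requested = 1;   // no locking, unlocking, or allocation here
}

// Hypothetical worker loop. Mutex can be any type with lock()/unlock().
// Returns the number of updates that ran to completion.
template <class Mutex, class Update>
int run_until_stopped(Mutex& mutex, Update update, int max_updates)
{
    g_stop_requested = 0;
    std::signal(SIGTERM, request_stop);

    int completed = 0;
    while (!g_stop_requested && completed < max_updates) {
        // The stop flag is only examined between critical sections, so a
        // signal arriving here cannot leave the update half-finished or
        // the mutex locked.
        std::lock_guard<Mutex> guard(mutex);
        update(completed);
        ++completed;
    }

    std::signal(SIGTERM, SIG_DFL);   // restore the default disposition
    return completed;
}
```

This only covers signals that can be caught. A process terminated by SIGKILL still leaves the shared state as it was, so the "scrap the data and start from scratch" path remains necessary for that case.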
participants (3)
- Andrey Semashev
- Murali Kishore
- Peter Dimov