BOOST_THREAD_HAS_EINTR_BUG...

Not really sure where to bring this issue up.

I have a singleton globals object with a session manager, share manager, etc. that is destroyed atexit() on final reference release. There are a few mutexes in the implementation. This functions fine on Windows, MacOS, etc. - the globals object is released during atexit() and shutdown proceeds normally.

Under Linux when BOOST_THREAD_HAS_EINTR_BUG is defined we SIGABRT at this:

    ~mutex() { BOOST_VERIFY(!posix::pthread_mutex_destroy(&m)); }

The verify fails and we abort(). When I define BOOST_THREAD_HAS_NO_EINTR_BUG we do not abort - which means that posix::pthread_mutex_destroy() returns 0 when BOOST_THREAD_HAS_NO_EINTR_BUG is defined and returns non-zero when it is not defined.

Guidance? Also point me at a better place for this discussion if there is such a place.

Thanks,
David Bien

On Fri, Apr 26, 2024 at 13:51, David Bien via Boost <boost@lists.boost.org> wrote:
Under Linux when BOOST_THREAD_HAS_EINTR_BUG is defined we SIGABRT at this:
~mutex() { BOOST_VERIFY(!posix::pthread_mutex_destroy(&m)); }
How is the code for BOOST_THREAD_HAS_NO_EINTR_BUG? Are UNIX signals being involved?

--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/

Apparently abort() calls raise(SIGABRT) under Linux.

    /* This prints an "Assertion failed" message and aborts. */
    extern void __assert_fail (const char *__assertion, const char *__file,
                               unsigned int __line, const char *__function)
        __THROW __attribute__ ((__noreturn__));

And I guess that one way to abort is to throw an __attribute__ ((__noreturn__)). Or, perhaps more likely, it is throwing into no catch() and then throwing itself causes an abort() due to no handler being present above atexit().

bien

Vinícius dos Santos Oliveira wrote:
On Fri, Apr 26, 2024 at 13:51, David Bien via Boost <boost@lists.boost.org> wrote:
Under Linux when BOOST_THREAD_HAS_EINTR_BUG is defined we SIGABRT at this:
~mutex() { BOOST_VERIFY(!posix::pthread_mutex_destroy(&m)); }
What is the return value of pthread_mutex_destroy here?
How is the code for BOOST_THREAD_HAS_NO_EINTR_BUG? Are UNIX signals being involved?
It's used here: https://github.com/boostorg/thread/blob/aec18d337f41d8e3081ee65f5cf3b5090179...

Basically, the relevant code is:

    #ifdef BOOST_THREAD_HAS_EINTR_BUG
    BOOST_FORCEINLINE BOOST_THREAD_DISABLE_THREAD_SAFETY_ANALYSIS
    int pthread_mutex_destroy(pthread_mutex_t* m)
    {
        int ret;
        do
        {
            ret = ::pthread_mutex_destroy(m);
        } while (ret == EINTR);
        return ret;
    }
    #else
    BOOST_FORCEINLINE BOOST_THREAD_DISABLE_THREAD_SAFETY_ANALYSIS
    int pthread_mutex_destroy(pthread_mutex_t* m)
    {
        return ::pthread_mutex_destroy(m);
    }
    #endif

The only scenario in which the former can return nonzero when the latter doesn't would be if it returns EINTR, then tries to destroy the mutex again, and then gets EINVAL because the mutex has already been destroyed. But this doesn't make sense, because in this case the other function would have returned EINTR, which is also nonzero, so it should also fail the VERIFY.

So I'm at a loss here. More printf debugging (printing the return values in both cases) is needed.
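
The printf debugging suggested above could look something like the following sketch. It repeats the retry loop from the quoted wrapper but calls ::pthread_mutex_destroy directly and prints what came back. The name destroy_and_report is hypothetical and is not part of Boost.Thread; it is meant as a temporary stand-in while investigating, not as a fix.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <pthread.h>

// Hypothetical debugging helper (not part of Boost.Thread): destroys the mutex
// with the same EINTR retry loop as the quoted wrapper and prints the result.
inline int destroy_and_report(pthread_mutex_t* m)
{
    int ret;
    int attempts = 0;
    do
    {
        ret = ::pthread_mutex_destroy(m);
        ++attempts;
    } while (ret == EINTR);

    // strerror() turns the error number into readable text such as
    // "Device or resource busy" for EBUSY.
    std::fprintf(stderr, "pthread_mutex_destroy: ret=%d (%s) after %d attempt(s)\n",
                 ret, ret == 0 ? "ok" : std::strerror(ret), attempts);
    return ret;
}
```

Running the same program with and without BOOST_THREAD_HAS_EINTR_BUG defined and comparing the printed values would show whether the two builds really see different return codes.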

On Fri, Apr 26, 2024 at 14:25, Peter Dimov <pdimov@gmail.com> wrote:
So I'm at a loss here. More printf debugging (printing the return values in both cases) is needed.
Same here. EINTR should only happen for UNIX signals AFAIK (and SIGABRT only happens much later, after the call to pthread_mutex_destroy has already returned, so it's not important yet). However, glibc isn't even using syscalls for pthread_mutex_destroy from what I'm seeing: https://github.com/bminor/glibc/blob/master/nptl/pthread_mutex_destroy.c

--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/

It seems there is some type of race condition involved. It variously works, fails, and hangs regardless of whether BOOST_THREAD_HAS_EINTR_BUG is defined. Weird cuz I never saw anything of the sort in weeks of testing under Windows, and a couple days of testing under MacOS - both using boost synchronization objects.

When it aborts the return is EBUSY.

I am going to change it to use STL just as a data point.

The condition variable/mutex pattern here:

    ~CShareMgr()
    {
        {
            boost::lock_guard< boost::mutex > lock( m_cvMutex );
            m_fRunning = false;
        }
        m_cv.notify_one();
        if ( m_thread.joinable() ) // wait on cleanup thread here.
        {
            m_thread.join();
        }
    }

    void CShareMgr::_CleanupThread()
    {
        while ( true )
        {
            boost::unique_lock< boost::mutex > lock( m_cvMutex );
            if ( m_cv.wait_for( lock, boost::chrono::minutes( 1 ), [ this ] { return !m_fRunning; } ) )
                return;
            lock.unlock();
            _CleanupThreadVolatile();
        }
    }

    void CShareMgr::_CleanupThreadVolatile() volatile
    {
        std::vector<std::shared_ptr<volatile CShare>> vecShares;
        std::vector<vTyShareKey> vecExpiredKeys;
        // 1) Create a vector of shared ptr of shares.
        std::chrono::minutes nMinutesCleanupShare;
        {
            _TyLockingPtrRead lockRead(*this, m_mapMutex);
            for (auto& pair : lockRead->m_map)
                vecShares.push_back(pair.second);
            nMinutesCleanupShare = lockRead->m_nMinutesCleanupShare;
        }
        // 2) Let the lockRead go out of scope.
        // 3) Peruse the array of shares backward adding keys of expired shares to vecExpiredKeys
        auto now = std::chrono::system_clock::now();
        for (auto it = vecShares.rbegin(); it != vecShares.rend(); ++it)
        {
            if (now - (*it)->GetLastAccessed() > nMinutesCleanupShare)
                vecExpiredKeys.push_back((*it)->GetShareKey());
        }
        // 4) Use the write lock to lock the map and then remove all expired share from the map
        _TyLockingPtrWrite lockWrite(*this, m_mapMutex);
        for (const auto& key : vecExpiredKeys)
            lockWrite->m_map.erase(key);
    }

Perhaps there is something wrong with my usage above?
FYI: I use the approach from "volatile: The Multithreaded Programmer's Best Friend" (http://erdani.org/publications/cuj-02-2001.php.html) cuz I've been bitten before... 😉
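
As a data point for the STL variant mentioned above, here is a minimal sketch of the same stop-flag, condition-variable, and join pattern written against the standard library only. PeriodicWorker and start_and_stop are hypothetical names, and the periodic work is left as a placeholder; this is a reduced illustration under those assumptions, not the original CShareMgr.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the manager class: a worker thread that wakes up
// once a minute and exits promptly when the owning object is destroyed.
class PeriodicWorker
{
public:
    PeriodicWorker() : m_thread([this] { run(); }) {}

    ~PeriodicWorker()
    {
        {
            std::lock_guard<std::mutex> lock(m_cvMutex);
            m_fRunning = false;
        }
        m_cv.notify_one();
        if (m_thread.joinable())
            m_thread.join(); // once joined, no other thread uses m_cvMutex or m_cv
    }

private:
    void run()
    {
        std::unique_lock<std::mutex> lock(m_cvMutex);
        // wait_for with a predicate returns the predicate's value:
        // true means shutdown was requested, false means the minute elapsed.
        while (!m_cv.wait_for(lock, std::chrono::minutes(1), [this] { return !m_fRunning; }))
        {
            lock.unlock();
            // periodic cleanup work would go here (placeholder)
            lock.lock();
        }
    }

    // The synchronization members are declared before m_thread so that they
    // are fully constructed before the thread starts running.
    std::mutex m_cvMutex;
    std::condition_variable m_cv;
    bool m_fRunning = true;
    std::thread m_thread;
};

// Starts a worker and destroys it right away; returns after the join completes.
inline bool start_and_stop()
{
    {
        PeriodicWorker worker;
    }
    return true;
}
```

Because the predicate is checked before blocking, the worker exits even if the destructor sets the flag before the thread has reached wait_for.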

Sure, but then there must be a bug in my code... do you see one? It's pretty simple - and once again has worked for many weeks of testing under other platforms.

On Friday, April 26, 2024 11:15 AM, Peter Dimov <pdimov@gmail.com> wrote:
David Bien wrote:
When it aborts the return is EBUSY.
EBUSY means that you're destroying a locked mutex, so the abort is legitimate.
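
To see the EBUSY case in isolation, a small experiment along these lines may help. It deliberately destroys a mutex that is still locked, which POSIX leaves undefined; the assumption here is a glibc-based Linux system, where the call is expected to report EBUSY, while other C libraries may return something else. The function name destroy_while_locked is hypothetical.

```cpp
#include <cerrno>
#include <pthread.h>

// Returns what pthread_mutex_destroy reports for a mutex that is still locked.
// Assumption: glibc reports EBUSY here; POSIX itself leaves this undefined,
// so this belongs in a throwaway experiment, not in production code.
inline int destroy_while_locked()
{
    pthread_mutex_t m;
    pthread_mutex_init(&m, nullptr);
    pthread_mutex_lock(&m);

    const int ret = pthread_mutex_destroy(&m); // the mutex is still held here

    // If the destroy was refused, release and destroy the mutex properly.
    if (ret != 0)
    {
        pthread_mutex_unlock(&m);
        pthread_mutex_destroy(&m);
    }
    return ret;
}
```

Comparing the returned value against EBUSY on the affected machine shows whether the abort in ~mutex() matches this situation.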

David Bien wrote:
Sure, but then there must be a bug in my code... do you see one? It's pretty simple - and once again has worked for many weeks of testing under other platforms.
No, I don't see a bug in the code you posted. (Assuming that m_thread is joinable and executes _CleanupThread.) Incidentally, underscore followed by a capital letter is reserved to the implementation and you're not supposed to use such identifiers.

"Incidentally, underscore followed by a capital letter is reserved to the implementation and you're not supposed to use such identifiers." Yeah, people have been telling me that for 35 years of writing C++ - the fact is that the compiler mangles every single name - as you well know... I literally have never encountered a single problem due to my coding style. Yeah, the pattern looks correct to me as well - I mean if the m_cv and m_cvMutex are passing then the thread should exit, no? It seems foolproof - the code that is - no reason it shouldn't exit the cleanup thread. I'd love for someone to poke holes in this theory. bien ________________________________ From: Peter Dimov <pdimov@gmail.com> Sent: Friday, April 26, 2024 11:25 AM To: 'David Bien' <davidbien@hotmail.com>; boost@lists.boost.org <boost@lists.boost.org> Subject: RE: [boost] BOOST_THREAD_HAS_EINTR_BUG... David Bien wrote:
Sure, but then there must be a bug in my code... do you see one? It's pretty simple - and once again has worked for many weeks of testing under other platforms.
No, I don't see a bug in the code you posted. (Assuming that m_thread is joinable and executes _CleanupThread.) Incidentally, underscore followed by a capital letter is reserved to the implementation and you're not supposed to use such identifiers.
________________________________
From: Peter Dimov <pdimov@gmail.com> Sent: Friday, April 26, 2024 11:15 AM To: 'David Bien' <davidbien@hotmail.com>; boost@lists.boost.org <boost@lists.boost.org> Subject: RE: [boost] BOOST_THREAD_HAS_EINTR_BUG...
David Bien wrote:
When it aborts the return is EBUSY.
EBUSY means that you're destroying a locked mutex, so the abort is legitimate.

Ok, so I asked copilot - LMAO!!! It suggested, and I didn't think it made sense but I tried it, to change the class member declaration order to destruct the std::thread first.

    std::weak_ptr< CGlobals > m_wpGlobals;
    std::unordered_map< boost::uuids::uuid, CSession, std::hash< boost::uuids::uuid > > m_sessions;
    boost::shared_mutex m_mutex;
    std::chrono::minutes m_nMinutesCleanupSession;
    bool m_fRunning;
    boost::mutex m_cvMutex;
    boost::condition_variable m_cv;
    std::thread m_thread;

I had it somewhere in the middle - but importantly above the m_cv and m_cvMutex. I ran it thirty times or so and didn't get a race or an EBUSY. Granted this is no guarantee - but it always failed before within a few runs.

Oh, and I also changed this - but things still failed after this change.

    ~CShareMgr()
    {
        {
            boost::lock_guard< boost::mutex > lock( m_cvMutex );
            m_fRunning = false;
            m_cv.notify_one(); // moved this up from below the block.
        }
        if ( m_thread.joinable() )
        {
            m_thread.join();
        }
    }

Does the change to destructing the std::thread() first make any sense to anyone at all? Cuz it doesn't make sense to me - the presumption is that the thread wasn't in a joinable state when joinable() was called?

A bit confused,
bien
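
For reference on what declaration order actually controls: non-static data members are constructed in declaration order and destroyed in reverse. The sketch below records that order with hypothetical Tracer objects standing in for the real members. It only illustrates the language rule; the thread does not establish why the reordering changed the observed behavior.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: appends a line to a shared log on construction and destruction.
struct Tracer
{
    std::string name;
    std::vector<std::string>* log;

    Tracer(std::string n, std::vector<std::string>* l) : name(std::move(n)), log(l)
    {
        log->push_back("ctor " + name);
    }
    ~Tracer() { log->push_back("dtor " + name); }
};

// Members mirror the tail of the reordered declaration list above.
struct Manager
{
    Tracer cvMutex;
    Tracer cv;
    Tracer thread;

    explicit Manager(std::vector<std::string>* log)
        : cvMutex("cvMutex", log), cv("cv", log), thread("thread", log)
    {
    }
};

// Builds and destroys one Manager and returns the recorded sequence of events.
inline std::vector<std::string> record_member_order()
{
    std::vector<std::string> log;
    {
        Manager m(&log);
    }
    return log;
}
```

With the thread member declared last, it is constructed after the mutex and condition variable and destroyed before them. Declared earlier, it would be constructed before them, which matters if the thread is started from the constructor's initializer list.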
participants (3)
- David Bien
- Peter Dimov
- Vinícius dos Santos Oliveira