Re: [Boost-users] Compatibility between shared_ptr and Rational RoseRT

Ross Manges wrote:
There is one restriction: all write accesses to a specific shared_ptr need to be exclusive. This is the usual Posix model, also called "as thread safe as an int" and "basic thread safety". It applies to most shared variables in a program. But I agree that it is easy to break the rules. And there is always the possibility that your program is fine, but there might be a thread-related error in shared_ptr (these are very difficult to find). So please let me know if you find out more.
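As a minimal sketch of the rule above: a single shared_ptr instance that several threads may write needs its accesses serialized, while separate shared_ptr copies that merely point at the same object need no lock. The sketch uses std::shared_ptr and std::mutex on the assumption that they follow the same model as boost::shared_ptr; SharedSlot and its members are hypothetical names, not code from this thread.

```cpp
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical holder for ONE shared_ptr instance that several threads touch.
class SharedSlot {
public:
    // Write access to slot_ must be exclusive, so it happens under the mutex.
    void store(std::shared_ptr<int> p) {
        std::lock_guard<std::mutex> lock(mutex_);
        slot_ = std::move(p);
    }

    // Reading slot_ while another thread may write it also needs the mutex.
    // The returned copy is a separate shared_ptr instance owned by the caller,
    // so the caller may use it afterwards without holding the lock.
    std::shared_ptr<int> load() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return slot_;
    }

private:
    mutable std::mutex mutex_;
    std::shared_ptr<int> slot_;
};
```

Each thread would call load() to obtain its own copy and work with that; only the shared instance inside SharedSlot is ever accessed under the lock.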

Hi Peter,

Thanks for the information. Our model is multi-threaded, so we are now taking a closer look at the way in which we are using shared_ptr, and checking if any of them are used across threads.

Regards
Dave

_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users

some info about my system: Red Hat Enterprise Linux 3, running 2.4.21-4.ELsmp (on a dual-processor dell with xeon's, if that matters)

and here's the backtrace:

(gdb) bt
#0 0x0804a703 in atomic_increment (pw=0x21006) at sp_counted_base_gcc_x86.hpp:59
#1 0x0804a655 in boost::detail::sp_counted_base::add_ref_copy (this=0x21002) at sp_counted_base_gcc_x86.hpp:133
#2 0x0804a4d7 in boost::detail::shared_count::operator= (this=0xb191cde8, r=@0xb731e004) at shared_count.hpp:181
#3 0x08049a58 in boost::shared_ptr<Element>::operator= (this=0xb191cde4, r=@0xb731e000) at shared_ptr.hpp:148
#4 0x0804b089 in SimpleQueue::pop (this=0x804f290) at ../SimpleQueue.cpp:22
#5 0x080494ef in CommonThread::operator() (this=0x8050470) at ../CommonThread.cpp:11
#6 0x0804a06d in boost::detail::function::void_function_obj_invoker0<CommonThread, void>::invoke (function_obj_ptr=
    {obj_ptr = 0x8050470, const_obj_ptr = 0x8050470, func_ptr = 0x8050470, data = "p"}) at function_template.hpp:136
#7 0xb75ce283 in boost::thread_group::size () from /home/auser/devel/boost_cvs_install/lib/libboost_thread-gcc-mt-1_32.so.1.32.0
#8 0xb7493e21 in pthread_start_thread () from /lib/i686/libpthread.so.0
#9 0xb743208a in clone () from /lib/i686/libc.so.6

Ross Manges wrote:
You should probably redownload since the CVS contained some slightly broken versions for a short time. :-)
(if I run against version 1_32, the backtrace is in the scope_lock instead of atomic_increment)
.. although your problem probably lies elsewhere. I'm not seeing any violations of the rules in your code; the problem is probably more mundane:

    ElementPtr SimpleQueue::pop()
    {
        ElementPtr retElem;
        if(!Queue.empty())
        {

Since this check is not protected by a mutex lock, it is possible for two or more threads to test for emptiness simultaneously when the queue contains only one element...

            boost::mutex::scoped_lock scoped_lock(mutex);
            retElem = Queue.back();

... and then the second and subsequent threads will access an invalid .back(). This may explain your crash.

            Queue.pop_back();
        }
        else
        {
            cout <<"******************* EMPTY QUEUE" << endl;
        }
        return retElem;
    }
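One way the corrected pop could look is sketched below, with the lock taken before the emptiness check so that the check, back() and pop_back() form one critical section. This is an illustration under assumptions rather than the poster's actual code: it uses std::mutex and std::shared_ptr in place of the Boost types, Element is a placeholder struct, and an empty queue returns a null pointer silently instead of printing a message.

```cpp
#include <memory>
#include <mutex>
#include <vector>

struct Element { int id; };                    // placeholder for the real Element
typedef std::shared_ptr<Element> ElementPtr;

class SimpleQueue {
public:
    void add(ElementPtr elem) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push_back(elem);
    }

    // Returns the last element, or a null ElementPtr if the queue is empty.
    ElementPtr pop() {
        ElementPtr retElem;
        // Lock BEFORE testing empty(): otherwise two threads can both see one
        // remaining element and the second one calls back() on an empty vector.
        std::lock_guard<std::mutex> lock(mutex_);
        if (!queue_.empty()) {
            retElem = queue_.back();
            queue_.pop_back();
        }
        return retElem;
    }

private:
    std::mutex mutex_;
    std::vector<ElementPtr> queue_;
};
```

Callers then test the returned pointer for null rather than calling empty() themselves, which would reintroduce the same check-then-act race one level up.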

Thanks for getting back to me so quickly! My comments are embedded below.
(gdb) bt
#0 0x080e3f1b in ?? ()
#1 0x00000023 in ?? ()
#2 0x0804a97d in boost::detail::shared_count::operator= (this=0xb60dddf8, r=@0xb60dddd8) at shared_count.hpp:182
#3 0x08049a14 in boost::shared_ptr<Element>::operator= (this=0xb60dddf4, r=@0xb60dddd4) at shared_ptr.hpp:148
#4 0x0804948b in CommonThread::run (this=0x80ec258) at ../CommonThread.cpp:14
#5 0xb75ad7be in ost::ThreadImpl::ThreadExecHandler (th=0x80ec258) at thread.cpp:1110
#6 0xb75ac9bf in ccxx_exec_handler (th=0x80ec258) at thread.cpp:1136
#7 0xb754ae21 in pthread_start_thread () from /lib/i686/libpthread.so.0
#8 0xb73f308a in clone () from /lib/i686/libc.so.6

Ross Manges wrote:
Your problems are getting more subtle. The Boost threads version is fine, because pop looks like this in pseudocode:

    lock
    write retElem
    return retElem
    implicit unlock

Since all accesses to retElem are protected, this doesn't violate the rules. But the CommonC++ version is:

    lock
    write retElem
    unlock
    return retElem

and as you can see, another thread may enter the critical region and write retElem in parallel with the return statement, which needs to make a copy of retElem and place it in the return value.
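The two orderings above can be sketched in C++ as follows. The sketch assumes retElem is state shared between threads (shown here as a member, retElem_), which is what the description implies; Holder, replace and replace_manual are hypothetical names, and std::mutex stands in for both the Boost and CommonC++ mutex types. With a scoped guard the return value is copied before the guard's destructor releases the mutex; with manual lock/unlock the copy has to be made explicitly while the lock is still held.

```cpp
#include <memory>
#include <mutex>

struct Element { int id; };                    // placeholder for the real Element
typedef std::shared_ptr<Element> ElementPtr;

class Holder {
public:
    // Scoped-lock ordering: lock, write, return, implicit unlock.
    // The return value is initialized from retElem_ before `lock` is
    // destroyed, so the copy is still inside the critical section.
    ElementPtr replace(ElementPtr next) {
        std::lock_guard<std::mutex> lock(mutex_);
        retElem_ = next;
        return retElem_;
    }

    // Manual ordering made safe: copy into a local while the mutex is held,
    // unlock, then return the local, which no other thread can touch.
    ElementPtr replace_manual(ElementPtr next) {
        mutex_.lock();
        retElem_ = next;
        ElementPtr copy = retElem_;
        mutex_.unlock();
        return copy;
    }

private:
    std::mutex mutex_;
    ElementPtr retElem_;   // assumed to be shared between threads
};
```

The scoped form is usually preferable because it also releases the mutex if the assignment throws.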

Also, I didn't mean to hijack this thread from the RoseRT topic. It just so happens that we're going to be running this code through Purify in the near future, so I'm curious if the original poster of this thread has resolved his FIM errors?

here's my current backtrace:

(gdb) bt
#0 0x0804ae63 in atomic_increment (pw=0x1d) at sp_counted_base_gcc_x86.hpp:59
#1 0x0804ad4f in boost::detail::sp_counted_base::add_ref_copy (this=0x19) at sp_counted_base_gcc_x86.hpp:133
#2 0x0804ae0d in shared_count (this=0xb15ef014, r=@0x80966b4) at shared_count.hpp:170
#3 0x0804b8a0 in shared_ptr (this=0xb15ef010, _ctor_arg=@0x80966b0) at ../Element.cpp:20
#4 0x0804c247 in _Construct<ElementPtr, boost::shared_ptr<Element> > (__p=0xb15ef010, __value=@0x80966b0) at stl_construct.h:78
<...snip...>

Ross Manges wrote:
Your problem is not related to shared_from_this at all:

    void SimpleQueue1::add(ElementPtr elem)
    {
        queue.push_back(elem);
    }

You just forgot to protect queue.push_back by locking the queue mutex.
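A sketch of add with the missing lock in place, exercised by several producer threads at once. As before this is an assumption-laden illustration: std::mutex, std::thread and std::shared_ptr replace the Boost types, Element is a placeholder, and size() and fill_concurrently are hypothetical helpers added only to make the effect observable.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

struct Element { int id; };                    // placeholder for the real Element
typedef std::shared_ptr<Element> ElementPtr;

class SimpleQueue1 {
public:
    void add(ElementPtr elem) {
        std::lock_guard<std::mutex> lock(mutex_);   // the lock that was missing
        queue_.push_back(elem);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<ElementPtr> queue_;
};

// Hypothetical helper: `producers` threads each add `per_thread` elements.
// Returns the queue size after all producers have been joined.
std::size_t fill_concurrently(SimpleQueue1& q, int producers, int per_thread) {
    std::vector<std::thread> threads;
    for (int p = 0; p < producers; ++p) {
        threads.emplace_back([&q, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                q.add(std::make_shared<Element>(Element{i}));
            }
        });
    }
    for (std::size_t t = 0; t < threads.size(); ++t) {
        threads[t].join();
    }
    return q.size();
}
```

Without the lock in add, concurrent push_back calls race on the vector's internal storage, and the resulting corruption can surface later inside shared_ptr copies, much like the backtraces in this thread.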

Just posting a last follow-up. We've finally got our code working well, and as it turns out, none of our shared_ptr problems were actually shared_ptr problems, as you might have guessed. Most of our problems came from vectors or maps that were not properly protected by mutexes (like the example above). One serious bug that had us stumped for a long time involved the rules for the validity of an iterator into an STL vector: we were calling my_vector.erase(it) and then happily using 'it' in a loop, not realizing that 'it' had been invalidated by the erase. Just a simple oversight that cost us a lot of hair-pulling and teeth-grinding. Debugging was even more difficult because all of our backtraces pointed to the shared_ptr scope_lock, which kept diverting us from the real cause. Anyway, thanks for all of your help. The Boost community has been great!

--Ross
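For the iterator problem described above, a minimal sketch of the usual erase-in-a-loop idiom: std::vector::erase invalidates the iterator passed to it and returns an iterator to the element that followed the erased one, so the loop continues from the returned value. The function name remove_even and the use of int elements are purely illustrative.

```cpp
#include <vector>

// Removes every even value from v while iterating over it.
void remove_even(std::vector<int>& v) {
    for (std::vector<int>::iterator it = v.begin(); it != v.end(); ) {
        if (*it % 2 == 0) {
            // erase() invalidates `it`; continue from the iterator it returns.
            it = v.erase(it);
        } else {
            ++it;
        }
    }
}
```

If the vector is shared between threads, the whole loop would also have to run under the mutex that protects that vector.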

On 4/26/05, Ross Manges <ross.manges@ihmail.com> wrote:
In SimpleQueue::pop, you call Queue.empty() without locking the mutex. Move your scoped lock before the if (!Queue.empty ()) check and all is well. I've verified this on a 2-way Xeon box running an almost-identical kernel. -- Caleb Epstein caleb dot epstein at gmail dot com
participants (4)
- Caleb Epstein
- Dave.Ware@seleniacomms.com
- Peter Dimov
- Ross Manges