missing __volatile__ tags in sp_counted_base_gcc_xxx.hpp

Hello, I've been investigating some failures on Mac OS X, ppc concerning shared_ptr. One of the things that has struck me is that in boost/ detail/sp_counted_base_gcc_ppc.hpp, the __asm__ block in atomic_decrement is decorated with __volatile__, but the __asm__ blocks in atomic_increment and atomic_conditional_increment are not. The failure I'm seeing disappears when the __volatile__ attribute is added in the two missing places. Would this be a proper fix to boost/ detail/sp_counted_base_gcc_ppc.hpp? Should boost/detail/ sp_counted_base_gcc_x86.hpp be treated similarly? Thanks, Howard

Howard Hinnant wrote:
Hello,
I've been investigating some failures on Mac OS X, ppc concerning shared_ptr. One of the things that has struck me is that in boost/ detail/sp_counted_base_gcc_ppc.hpp, the __asm__ block in atomic_decrement is decorated with __volatile__, but the __asm__ blocks in atomic_increment and atomic_conditional_increment are not. The failure I'm seeing disappears when the __volatile__ attribute is added in the two missing places.
This is, I believe, intentional; the increments shouldn't be reorder-sensitive, at least in theory. Why/how does it fail?

On Mar 13, 2006, at 3:37 PM, Peter Dimov wrote:
Howard Hinnant wrote:
Hello,
I've been investigating some failures on Mac OS X, ppc concerning shared_ptr. One of the things that has struck me is that in boost/ detail/sp_counted_base_gcc_ppc.hpp, the __asm__ block in atomic_decrement is decorated with __volatile__, but the __asm__ blocks in atomic_increment and atomic_conditional_increment are not. The failure I'm seeing disappears when the __volatile__ attribute is added in the two missing places.
This is, I believe, intentional; the increments shouldn't be reorder-sensitive, at least in theory. Why/how does it fail?
I'm seeing (only with inlining/optimizations on) that the reference count is not properly incremented. This is with a single-threaded test case. I haven't fully understood the code differences yet, but it appears that code is being reordered across these assembly blocks in such a way as to change the logic. I'm not familiar enough with gcc assembly semantics to know if the volatile attribute (on the assembly block itself, not any data) is required to make inline assembly behave itself. Again, this is not a multithread issue that I can see, as the failures I'm seeing are in single threaded code. Thanks for your comments. -Howard

Howard Hinnant wrote:
On Mar 13, 2006, at 3:37 PM, Peter Dimov wrote:
Howard Hinnant wrote:
Hello,
I've been investigating some failures on Mac OS X, ppc concerning shared_ptr. One of the things that has struck me is that in boost/ detail/sp_counted_base_gcc_ppc.hpp, the __asm__ block in atomic_decrement is decorated with __volatile__, but the __asm__ blocks in atomic_increment and atomic_conditional_increment are not. The failure I'm seeing disappears when the __volatile__ attribute is added in the two missing places.
This is, I believe, intentional; the increments shouldn't be reorder-sensitive, at least in theory. Why/how does it fail?
I'm seeing (only with inlining/optimizations on) that the reference count is not properly incremented. This is with a single-threaded test case. I haven't fully understood the code differences yet, but it appears that code is being reordered across these assembly blocks in such a way as to change the logic. I'm not familiar enough with gcc assembly semantics to know if the volatile attribute (on the assembly block itself, not any data) is required to make inline assembly behave itself.
Again, this is not a multithread issue that I can see, as the failures I'm seeing are in single threaded code.
There is a known bug in 4.0: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21528 that can have this effect, but it apparently has been fixed in 4.0.1. There might be others that the boost test doesn't uncover. I think that the compiler is not supposed to reorder across __asm__ blocks in a way that could change the logic and if such reordering happens it should be reported as a bug against g++.

On Mar 13, 2006, at 6:00 PM, Peter Dimov wrote:
I'm seeing (only with inlining/optimizations on) that the reference count is not properly incremented. This is with a single-threaded test case. I haven't fully understood the code differences yet, but it appears that code is being reordered across these assembly blocks in such a way as to change the logic. I'm not familiar enough with gcc assembly semantics to know if the volatile attribute (on the assembly block itself, not any data) is required to make inline assembly behave itself.
Again, this is not a multithread issue that I can see, as the failures I'm seeing are in single threaded code.
There is a known bug in 4.0:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21528
that can have this effect, but it apparently has been fixed in 4.0.1. There might be others that the boost test doesn't uncover.
I think that the compiler is not supposed to reorder across __asm__ blocks in a way that could change the logic and if such reordering happens it should be reported as a bug against g++.
That was very kind of you to look that up for me, thanks. Since my last posting I've also become convinced that this is an optimization bug in gcc. It appears that exactly one call to atomic_increment has been optimized away. I'll investigate further whether 21528 is the cause/fix. Thanks much. -Howard

On Mar 13, 2006, at 6:08 PM, Howard Hinnant wrote:
On Mar 13, 2006, at 6:00 PM, Peter Dimov wrote:
I'm seeing (only with inlining/optimizations on) that the reference count is not properly incremented. This is with a single-threaded test case. I haven't fully understood the code differences yet, but it appears that code is being reordered across these assembly blocks in such a way as to change the logic. I'm not familiar enough with gcc assembly semantics to know if the volatile attribute (on the assembly block itself, not any data) is required to make inline assembly behave itself.
Again, this is not a multithread issue that I can see, as the failures I'm seeing are in single threaded code.
There is a known bug in 4.0:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21528
that can have this effect, but it apparently has been fixed in 4.0.1. There might be others that the boost test doesn't uncover.
I think that the compiler is not supposed to reorder across __asm__ blocks in a way that could change the logic and if such reordering happens it should be reported as a bug against g++.
That was very kind of you to look that up for me, thanks. Since my last posting I've also become convinced that this is an optimization bug in gcc. It appears that exactly one call to atomic_increment has been optimized away. I'll investigate further whether 21528 is the cause/fix. Thanks much.
I'm back.... ;-) My real problem is that I'm a neophyte in the bewildering world of gcc asm syntax. I've been studying: http://gcc.gnu.org/onlinedocs/gcc-4.0.3/gcc/Extended- Asm.html#Extended-Asm on the good advice of some colleagues. The other advice I'm getting is that there is still a bug in: boost/detail/sp_counted_base_gcc_ppc.hpp Looking at just atomic_increment: inline void atomic_increment( int * pw ) { int tmp; __asm__ ( "0:\n\t" "lwarx %1, 0, %2\n\t" "addi %1, %1, 1\n\t" "stwcx. %1, 0, %2\n\t" "bne- 0b": "=m"( *pw ), "=&b"( tmp ): "r"( pw ): "cc" ); } My current understanding is that the "=m" constraint indicates that *pw is write-only, and should be changed to "+m" to indicate read/ write (to memory). Indeed when I make this change, my test case clears right up. It also appears that atomic_decrement and atomic_conditional_increment could use this treatment as well. Looking forward to further discussion on this... -Howard

Howard Hinnant wrote:
My real problem is that I'm a neophyte in the bewildering world of gcc asm syntax.
Aren't we all?
I've been studying:
http://gcc.gnu.org/onlinedocs/gcc-4.0.3/gcc/Extended- Asm.html#Extended-Asm
on the good advice of some colleagues. The other advice I'm getting is that there is still a bug in:
boost/detail/sp_counted_base_gcc_ppc.hpp
Looking at just atomic_increment:
inline void atomic_increment( int * pw ) { int tmp;
__asm__ ( "0:\n\t" "lwarx %1, 0, %2\n\t" "addi %1, %1, 1\n\t" "stwcx. %1, 0, %2\n\t" "bne- 0b":
"=m"( *pw ), "=&b"( tmp ): "r"( pw ): "cc" ); }
My current understanding is that the "=m" constraint indicates that *pw is write-only, and should be changed to "+m" to indicate read/ write (to memory). Indeed when I make this change, my test case clears right up. It also appears that atomic_decrement and atomic_conditional_increment could use this treatment as well.
Looks like a bug. Hmm. I think I remember something about +m causing internal compiler errors and not working in general (for some versions of g++); that's why the x86 version uses "=m"(*pw) as an output and "m"(*pw) as an input. But there are no "m"(*pw) inputs in PPC and IA64! Can you try the +m change on as many g++ versions as possible? I may be misremembering things. Or can you try the potentially safer alternative of adding "m"( *pw ) as an input and see whether this also works? If you contribute your test case, this will be appreciated, too.

Sorry for the delay in the follow up. On Mar 16, 2006, at 7:03 PM, Peter Dimov wrote:
Can you try the +m change on as many g++ versions as possible? I may be misremembering things. Or can you try the potentially safer alternative of adding "m"( *pw ) as an input and see whether this also works?
Yes, that seems to work just fine too.
If you contribute your test case, this will be appreciated, too.
The original report was one of the boost unit tests: ---- file boost_unit_test_gcc4_pb_sample.cpp --- #include <iostream> #include <boost/test/unit_test.hpp> using boost::unit_test_framework::test_suite; using boost::unit_test_framework::test_case; using namespace std; class SampleTest { public: SampleTest (int val) : a (val) { } int a; void sampleTest () { std::cout << "a : " << a << std::endl; BOOST_CHECK_EQUAL (a, 12345); } }; test_suite* init_unit_test_suite( int argc, char* argv[] ) { test_suite* test= BOOST_TEST_SUITE( "Sample test" ); boost::shared_ptr<SampleTest> instance (new SampleTest (12345)); test->add (BOOST_CLASS_TEST_CASE ( & SampleTest::sampleTest, instance)); return test; } ---- end-of-file boost_unit_test_gcc4_pb_sample.cpp --- The symptom showed itself by having the SampleTest destructor (or malloc) scribble on destructed/deleted memory. I later narrowed it down to that included below. Sorry, it is kind of big. I never did reduce it further. #include <boost/shared_ptr.hpp> namespace boost { namespace ut_detail { struct unused {}; template<typename R> struct invoker { template<typename Functor> R invoke( Functor& f ) { return f(); } template<typename Functor, typename T1> R invoke( Functor& f, T1 t1 ) { return f( t1 ); } template<typename Functor, typename T1, typename T2> R invoke( Functor& f, T1 t1, T2 t2 ) { return f( t1, t2 ); } template<typename Functor, typename T1, typename T2, typename T3> R invoke( Functor& f, T1 t1, T2 t2, T3 t3 ) { return f( t1, t2, t3 ); } }; template<> struct invoker<unused> { template<typename Functor> unused invoke( Functor& f ) { f(); return unused(); } template<typename Functor, typename T1> unused invoke( Functor& f, T1 t1 ) { f( t1 ); return unused(); } template<typename Functor, typename T1, typename T2> unused invoke( Functor& f, T1 t1, T2 t2 ) { f( t1, t2 ); return unused(); } template<typename Functor, typename T1, typename T2, typename T3> unused invoke( Functor& f, T1 t1, T2 t2, T3 t3 ) { f( t1, t2, t3 ); return unused(); } }; } namespace ut_detail { template<typename R> struct callback0_impl { virtual ~callback0_impl() {} virtual R invoke() = 0; }; template<typename R, typename Functor> struct callback0_impl_t : callback0_impl<R> { explicit callback0_impl_t( Functor f ) : m_f( f ) {} virtual R invoke() { return invoker<R>().invoke( m_f ); } private: Functor m_f; }; template<typename R = void> class callback0 { public: callback0() {} template<typename Functor> callback0( Functor f ) : m_impl( new ut_detail::callback0_impl_t<R,Functor>( f ) ) {} void operator=( callback0 const& rhs ) { m_impl = rhs.m_impl; } template<typename Functor> void operator=( Functor f ) { m_impl.reset( new ut_detail::callback0_impl_t<R,Functor>( f ) ); } R operator()() const { return m_impl->invoke(); } bool operator!() const { return !m_impl; } private: boost::shared_ptr<ut_detail::callback0_impl<R> > m_impl; }; class test_case { public: enum { type = 1 }; test_case( callback0<> const& test_func ); callback0<> const& test_func() const { return m_test_func; } private: friend class framework_impl; ~test_case() {} callback0<> m_test_func; }; inline test_case::test_case( callback0<> const& test_func ) : m_test_func( test_func ) { } } namespace ut_detail { template<typename UserTestCase> struct user_tc_method_invoker { typedef void (UserTestCase::*test_method )(); user_tc_method_invoker( shared_ptr<UserTestCase> inst, test_method tm ) : m_inst( inst ), m_test_method( tm ) {} void operator()() { ((*m_inst).*m_test_method)(); } shared_ptr<UserTestCase> m_inst; test_method m_test_method; }; } template<typename UserTestCase> ut_detail::test_case* make_test_case( void (UserTestCase::*test_method )(), boost::shared_ptr<UserTestCase> const& user_test_case ) { return new ut_detail::test_case ( ut_detail::user_tc_method_invoker<UserTestCase>( user_test_case, test_method ) ); } } class SampleTest { public: SampleTest (int val) : a (val) { } int a; void sampleTest () { } }; SampleTest* g; int main() { boost::ut_detail::test_case* t; { boost::shared_ptr<SampleTest> instance (g = new SampleTest (12345)); std::cout << "instance.use_count = " << instance.use_count() << " before add: g = " << g << " *g = " << g->a << '\n'; t = boost::make_test_case((& SampleTest::sampleTest), instance ); std::cout << "instance.use_count = " << instance.use_count() << " after add: g = " << g << " *g = " << g->a << '\n'; } std::cout << "after add: g = " << g << " *g = " << g->a << '\n'; return t != 0; } expecting an output similar to: instance.use_count = 1 before add: g = 0x30010 *g = 12345 instance.use_count = 2 after add: g = 0x30010 *g = 12345 after add: g = 0x30010 *g = 12345 The use_count values were the dead giveaway. The second one was coming out 1 or even sometimes 0. -Howard

Howard Hinnant wrote:
Sorry for the delay in the follow up.
On Mar 16, 2006, at 7:03 PM, Peter Dimov wrote:
Can you try the +m change on as many g++ versions as possible? I may be misremembering things. Or can you try the potentially safer alternative of adding "m"( *pw ) as an input and see whether this also works?
Yes, that seems to work just fine too.
Added to CVS, and thanks.
participants (2)
-
Howard Hinnant
-
Peter Dimov