Re: [Boost-testing] Re: [thread] test_read_write_mutex breaks regressiontesting

OK, after seeing two of these tests continue to perform poorly and hang in some cases, I decided it was time to break out the big guns: Purify. It found a number of problems with test_mutex and test_read_write_mutex, which seem to be triggered by Boost.Test making use of a string after it has been freed. Attached is the Purify output from test_mutex and test_read_write_mutex. -- Caleb Epstein caleb dot epstein at gmail dot com

On 4/26/05, Caleb Epstein <caleb.epstein@gmail.com> wrote:
OK, after seeing two of these tests continue to perform poorly and hang in some cases, I decided it was time to break out the big guns: Purify. It found a number of problems with test_mutex and test_read_write_mutex, which seem to be triggered by Boost.Test making use of a string after it has been freed.
Attached is the Purify output from test_mutex and test_read_write_mutex.
Digging a little deeper, I think the root of the problem is that multiple threads are touching the guts of the Boost.Test logging code at the same time, and its static variables are being manipulated without thread protections. Here's where Purify notices the problem: (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) where current thread: t@6 [1] purify_stop_here(0x1, 0x6b8c4, 0x0, 0x0, 0x0, 0x0), at 0x7e944 [2] _x_trap_and_go(0x74, 0x0, 0x42, 0x0, 0x0, 0x0), at 0x7e8dc [3] std::transform<char*,char*,char(*)(char)>(first = 0x1f2e7f "hread/test/util.inl", last = 0x1f2e92 "", result = 0x1f2e7f "hread/test/util.inl", op = 0xb1e98 = &boost::unit_test::set_unix_slash(char)), line 452 in "algorithm.cc" =>[4] boost::unit_test::unit_test_log_t::operator<<(this = 0x1d336c, f = STRUCT), line 277 in "unit_test_log.ipp" [5] boost::test_tools::tt_detail::check_impl(pr = CLASS, check_descr = CLASS, file_name = CLASS, line_num = 34U, tl = PASS, ct = CHECK_EQUAL, num_of_args = 2U, ...), line 144 in "test_tools.ipp" [6] boost::test_tools::tt_detail::check_impl<boost::test_tools::tt_detail::equal_impl_frwd,int,int>(P = STRUCT, check_descr = CLASS, file_name = CLASS, line_num = 34U, tool_level = CHECK, ct = CHECK_EQUAL, arg0 = 1, arg0_descr = 0x191f95 "boost::xtime_get(&xt, boost::TIME_UTC)", arg1 = 1, arg1_descr = 0x191fbc "static_cast<int>(boost::TIME_UTC)"), line 402 in "test_tools.hpp" [7] __unnamed_lpburU7kbCGP9::delay(secs = 0, msecs = 100, nsecs = 0), line 34 in "util.inl" [8] test_lock<boost::mutex>::operator()(this = 0x7e27fbe3), line 44 in "test_mutex.cpp" [9] do_test_mutex(), line 169 in "test_mutex.cpp" [10] boost::detail::function::void_function_invoker0<void(*)(),void>::invoke(function_ptr = UNION), line 103 in "function_template.hpp" [11] boost::function0<void,int>::operator()(this = 0x7e27fd24), line 576 in "function_template.hpp" [12] thread_proxy(param = 0x7e3ffb38), line 113 in "thread.cpp" And here is the method in question: (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) l 271 unit_test_log_t& 272 unit_test_log_t::operator<<( log::file const& f ) 273 { 274 assign_op( s_log_impl().m_entry_data.m_file, f.m_file_name, 0 ); 275 276 // normalize file name 277 std::transform( s_log_impl().m_entry_data.m_file.begin(), s_log_impl().m_entry_data.m_file.end(), 278 s_log_impl().m_entry_data.m_file.begin(), 279 &set_unix_slash ); 280 But s_log_impl().m_entry_data.m_file is not equal to f.m_file_name as it should be: (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) print f.m_file_name f.m_file_name = { m_begin = 0x191f52 "/home/nbde52d/src/boost-regression/boost/libs/thread/test/util.inl" m_end = 0x191f94 "" } (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) print s_log_impl().m_entry_data.m_file boost::unit_test::__unnamed_lpburnXebCmne::s_log_impl().m_entry_data.m_file = { __data_ = { __data_ = 0x1f2f10 "../libs/thread/test/test_mutex.cpp" } __nullref = struct __rwstd::__null_string_ref_rep<char,std::char_traits<char>,std::allocator<char>,__rwstd::__string_ref_rep<std::allocator<char>
/* STATIC CLASS */ }
-- Caleb Epstein caleb dot epstein at gmail dot com

(/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) l 271 unit_test_log_t& 272 unit_test_log_t::operator<<( log::file const& f ) 273 { 274 assign_op( s_log_impl().m_entry_data.m_file, f.m_file_name, 0 ); 275 276 // normalize file name 277 std::transform( s_log_impl().m_entry_data.m_file.begin(), s_log_impl().m_entry_data.m_file.end(), 278 s_log_impl().m_entry_data.m_file.begin(), 279 &set_unix_slash ); 280
But s_log_impl().m_entry_data.m_file is not equal to f.m_file_name as it should be
As you could see f.m_file_name is assigned to s_log_impl().m_entry_data.m_file in line 274. So the only possible explanation is that 2 threads entered this function simultaneously. Gennadiy

On 4/26/05, Gennadiy Rozental <gennadiy.rozental@thomson.com> wrote:
As you could see f.m_file_name is assigned to s_log_impl().m_entry_data.m_file in line 274. So the only possible explanation is that 2 threads entered this function simultaneously.
Here's are patches which fix the memory corruption problems of test_mutex and test_read_write_mutex on SunPRO. Basically they change the code to avoid calling the Boost.Test code except in the case that a test has actually failed. As a result, there will be less progress reported when --log_level=X is specified, though the checks *are* being done. Not sure if these changes are acceptable, but they at least prevent memory corruption and crashes on this and possibly other platforms. -- Caleb Epstein caleb dot epstein at gmail dot com

Caleb Epstein wrote:
On 4/26/05, Gennadiy Rozental <gennadiy.rozental@thomson.com> wrote:
As you could see f.m_file_name is assigned to s_log_impl().m_entry_data.m_file in line 274. So the only possible explanation is that 2 threads entered this function simultaneously.
Here's are patches which fix the memory corruption problems of test_mutex and test_read_write_mutex on SunPRO. Basically they change the code to avoid calling the Boost.Test code except in the case that a test has actually failed. As a result, there will be less progress reported when --log_level=X is specified, though the checks *are* being done.
Not sure if these changes are acceptable, but they at least prevent memory corruption and crashes on this and possibly other platforms.
What's a convenient tool for applying diffs on Windows? Mike

Michael Glassford wrote:
What's a convenient tool for applying diffs on Windows?
Not sure about convenient.. certainly standard: http://gnuwin32.sourceforge.net/packages/patch.htm Patch for Windows -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - Grafik/jabber.org

On 4/26/05, Michael Glassford <glassfordm@hotmail.com> wrote:
What's a convenient tool for applying diffs on Windows?
I use the Cygwin tools to do UNIX-y stuff on Windows. There is a package for "patch". This might work also: http://gnuwin32.sourceforge.net/packages/patch.htm -- Caleb Epstein caleb dot epstein at gmail dot com

On 4/26/05, Caleb Epstein <caleb.epstein@gmail.com> wrote:
Here's are patches which fix the memory corruption problems of test_mutex and test_read_write_mutex on SunPRO.
Any progres on getting these patches applied? My daily test run is still being held up by these two broken tests. They do not exit, at least not in anything resembling a reasonable amount of time (hours). -- Caleb Epstein caleb dot epstein at gmail dot com

Caleb Epstein wrote:
On 4/26/05, Caleb Epstein <caleb.epstein@gmail.com> wrote:
Here's are patches which fix the memory corruption problems of test_mutex and test_read_write_mutex on SunPRO.
Any progres on getting these patches applied? My daily test run is still being held up by these two broken tests. They do not exit, at least not in anything resembling a reasonable amount of time (hours).
I suggest to disable these tests until the issues are sorted out. There's no point in running tests which are known to break the test system. Regards, m

Here's are patches which fix the memory corruption problems of test_mutex and test_read_write_mutex on SunPRO.
Any progres on getting these patches applied? My daily test run is still being held up by these two broken tests. They do not exit, at least not in anything resembling a reasonable amount of time (hours).
And as a point of principle: we should really apply similar patches to all of the thread test programs that use multiple threads, and Boost.Test (presumably all of them?). John.

On 4/29/05, John Maddock <john@johnmaddock.co.uk> wrote:
Here's are patches which fix the memory corruption problems of test_mutex and test_read_write_mutex on SunPRO.
Any progres on getting these patches applied? My daily test run is still being held up by these two broken tests. They do not exit, at least not in anything resembling a reasonable amount of time (hours).
And as a point of principle: we should really apply similar patches to all of the thread test programs that use multiple threads, and Boost.Test (presumably all of them?).
Here's a quick analysis of the Boost.Thread test programs and their usage of Boost.Test: * test_barrier.cpp - OK. Doesn't use BOOST_CHECK until after all threads have been joined * test_condition.cpp - looks like it needs work, though this test never hangs/fails for me so perhaps my analysis is incorrect. * test_mutex.cpp - Patched (via util.inl), but I think I still see more potential problems. * test_once.cpp - OK. See test_barrier.cpp * test_read_write_mutex.cpp - patched to not use BOOST_CHECK. Calls BOOST_ERROR if tests fail. * test_thread.cpp - OK. only one thread should be active at a time * test_tss.cpp - OK. Uses a mutex to protect calls to BOOST_CHECK and there is a comment musing on the lack of thread-safety there. * test_xtime.cpp: OK. Doesn't spawn threads. So basically, I'm slightly concerned about test_mutex.cpp (after my patch) and test_condition.cpp but I think other than that the rest of the test programs are OK. -- Caleb Epstein caleb dot epstein at gmail dot com

Caleb Epstein wrote:
Here's a quick analysis of the Boost.Thread test programs and their usage of Boost.Test:
* test_barrier.cpp - OK. Doesn't use BOOST_CHECK until after all threads have been joined
* test_condition.cpp - looks like it needs work, though this test never hangs/fails for me so perhaps my analysis is incorrect.
* test_mutex.cpp - Patched (via util.inl), but I think I still see more potential problems.
* test_once.cpp - OK. See test_barrier.cpp
* test_read_write_mutex.cpp - patched to not use BOOST_CHECK. Calls BOOST_ERROR if tests fail.
* test_thread.cpp - OK. only one thread should be active at a time
* test_tss.cpp - OK. Uses a mutex to protect calls to BOOST_CHECK and there is a comment musing on the lack of thread-safety there.
* test_xtime.cpp: OK. Doesn't spawn threads.
So basically, I'm slightly concerned about test_mutex.cpp (after my patch) and test_condition.cpp but I think other than that the rest of the test programs are OK.
Bill Kempf was aware of the fact that Boost.Test isn't thread-safe (in fact, we found it out the hard way, too). I think he adjusted all the tests before he disappeared. Regards, m
participants (6)
-
Caleb Epstein
-
Gennadiy Rozental
-
John Maddock
-
Martin Wille
-
Michael Glassford
-
Rene Rivera