Auto unit test suite hangs due to locked mutex
I have searched the net for some indication of what I might be dealing with but have come up empty.

This is on a Fedora Core 2 system running GCC 3.3.3. Boost was installed via RPM:

# rpm -q boost
boost-1.32.0-3.1.2
#

I have a reasonably small auto unit test suite built with Boost, with 48 small discrete tests. It has been running fine previously. No tests are known to invoke mutexes, Boost or otherwise. When we previously built Boost manually, this was not an issue; we're now trying to use the standard RPMs to make distribution/release building simpler.

There is possibly an issue in how we're linking to libraries in our automake setup, since those settings had to be changed to use the RPM-installed libraries rather than our previously home-built ones. However, the make (compile, link, etc.) throws no errors, so we have no indication other than this hang that this is where the problem lies. I had these make files working fine on a similarly installed system that instead ran FC4 with gcc 4.0.1, where Boost was installed as an RPM but at boost-1.32.0-6. When I copied them to the FC2 system and set it up, I now see the issue.

The lone symptom is that the test suite seems to complete but then hangs afterwards, apparently deadlocked on a mutex. When I run the test program with --log_level=all, I get the following output:

------------
$ mytestsuite --log_level=all
Running 48 test cases...
Entering test suite "Auto Unit Test"
Entering test case "CounterConstruction"
[...specific test output...]
Leaving test case "CounterConstruction"
[...more test cases...]
Entering test case "DataNodeEqualityTest"
[...specific test output...]
Leaving test case "DataNodeEqualityTest"
Leaving test suite "Auto Unit Test"

*** No errors detected
--------------

...and then it just hangs without returning to the command line until I ctrl-C.

When I run it under gdb to get a stack trace, I get the following session:

--------------
$ libtool gdb mytestsuite
*** Warning: inferring the mode of operation is deprecated.
*** Future versions of Libtool will require -mode=MODE be specified.
GNU gdb Red Hat Linux (6.0post-0.20040223.19rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

(gdb) run
Starting program: /home/build/<ourpath>/.libs/lt-mytestsuite
Error while mapping shared library sections: : Success.
Error while reading shared library symbols: : No such file or directory.
[Thread debugging using libthread_db enabled]
[New Thread -150709728 (LWP 13105)]
Error while reading shared library symbols: : No such file or directory.
Error while reading shared library symbols: : No such file or directory.
Running 48 test cases...

*** No errors detected

Program received signal SIGINT, Interrupt.
[Switching to Thread -150709728 (LWP 13105)]
0x00272402 in ?? ()
(gdb) backtrace
#0 0x00272402 in ?? ()
#1 0x0064dcbe in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#2 0x0064ac84 in _L_mutex_lock_29 () from /lib/tls/libpthread.so.0
#3 0x00000000 in ?? ()
(gdb) quit
The program is running. Exit anyway? (y or n) y
$
--------------

Finally, a library listing for the test suite executable gives the following:

--------------
$ ldd ./.libs/mytestsuite
linux-gate.so.1 => (0x001fb000)
mylibrary.0 => not found [...related to libtool use?]
libxslt.so.1 => /usr/lib/libxslt.so.1 (0x03ec2000)
libxml2.so.2 => /usr/lib/libxml2.so.2 (0x03a8c000)
libpthread.so.0 => /lib/tls/libpthread.so.0 (0x00644000)
libz.so.1 => /usr/lib/libz.so.1 (0x00557000)
libxmlwrapp.so.5 => /home/build/<somepath>/dep/lib/libxmlwrapp.so.5 (0x00dcc000)
libboost_unit_test_framework.so.1 => /usr/lib/libboost_unit_test_framework.so.1 (0x003bf000)
libstdc++.so.5 => /usr/lib/libstdc++.so.5 (0x0080c000)
libm.so.6 => /lib/tls/libm.so.6 (0x0052c000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00763000)
libc.so.6 => /lib/tls/libc.so.6 (0x0040f000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x003f2000)
$
--------------

Any advice on how to proceed and repair? We'd very much like to be able to use these RPMs rather than reverting to our previous homegrown approach.

Thank you and kind regards,
Richard Newman
Crowley Davis Research, Inc.
Any advice to proceed and repair? We'd like very much to be able to use these RPMs than reverting back to our previous homegrown approach.
I don't think I can be of much help here. A couple of notes, though. All Boost libraries can be built in both single- and multi-threaded mode. Could you choose one to use? Which one are you using now? Does the hang have anything to do with what you are doing? Did you try running a trivial module with a single test case? Why do you need to link with pthreads at all?

Gennadiy
I appreciate your reply.

With it coming from an RPM, I think I can only tell whether it was built in multi-threaded mode by looking at its dependencies (there are no _mt_gcc-type suffixes on the libraries, since they are just copied into /usr/lib by the RPM). The dependencies include libpthread, so I'm going to assume from that that multi-threaded mode was used to build the libraries for the package. When we built Boost directly, we generally used the _mt_gcc version.

Nominally, we don't need pthreads right now in our test case, as none of the tests seem to directly invoke mutexes. However, the library that the test suite is meant to cover does include mutex support. (We recently began to use the Boost test framework to automate unit testing, and we have been adding tests as we touch code during refactors, so our test coverage is not at all complete yet.) In any case, linking with pthread on my FC4/gcc4.0.1/boost1.32.0-6rpm system worked fine. I do think, though, that a library reference might be to blame; I just don't know where to look.

I could certainly do a trivial single test case. However, I had avoided such an approach because, to compare apples to apples, it is best to leave everything in place, comment out all the unit tests, and add back a single one. From there, I could start to tear apart the make files and note when things change. I'm probably left with that as the only real diagnostic now; I was hoping the symptoms were indicative of some missing library that I was unaware of.

Thanks,
Richard
When I reduced the test set to the following test, it still hangs but says that one test passed (as the 48 I used before did).

BOOST_AUTO_UNIT_TEST(TrivialTest)
{
    int x = 0;
    x = 1;
}

When I remove this one test and so have no tests, no hang occurs. It simply says no errors and terminates normally. I guess I can review the macro represented by BOOST_AUTO_UNIT_TEST to see what might be invoked here.

The make session for both cases was the same (except, of course, for the test output):

--------------------
$ make
if /bin/sh ../../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I/home/<somepath>/project/src/mylibrary -I../.. -I/home/<somepath>/project/src -I/home/<somepath>/construct/dep/include -D__CSGA_REVISION__='"2057M"' -pthread -Werror -Wall -g -O2 -MT function.lo -MD -MP -MF ".deps/function.Tpo" -c -o function.lo function.cpp; \
then mv -f ".deps/function.Tpo" ".deps/function.Plo"; else rm -f ".deps/function.Tpo"; exit 1; fi
g++ -DHAVE_CONFIG_H -I. -I/home/<somepath>/project/src/mylibrary -I../.. -I/<somepath>/project/src -I/<somepath>/dep/include -D__CSGA_REVISION__=\"2057M\" -pthread -Werror -Wall -g -O2 -MT function.lo -MD -MP -MF .deps/function.Tpo -c function.cpp -fPIC -DPIC -o .libs/function.o
/bin/sh ../../libtool --tag=CXX --mode=link g++ -pthread -Werror -Wall -g -O2 -o mylibrary.la -rpath /<somepath>/release/product/lib -L/<somepath>/dep/lib -lxslt -lxml2 -lxmlwrapp
rm -fr .libs/mylibrary.0 .libs/mylibrary.0.0.0 .libs/mylibrary.la .libs/mylibrary.lai
g++ -shared -nostdlib /usr/lib/gcc-lib/i386-redhat-linux/3.3.3/../../../crti.o /usr/lib/gcc-lib/i386-redhat-linux/3.3.3/crtbeginS.o -L/usr/lib -L/<somepath>/dep/lib /usr/lib/libxslt.so /usr/lib/libxml2.so -lxmlwrapp -L/usr/lib/gcc-lib/i386-redhat-linux/3.3.3 -L/usr/lib/gcc-lib/i386-redhat-linux/3.3.3/../../.. -lstdc++ -lm -lc -lgcc_s /usr/lib/gcc-lib/i386-redhat-linux/3.3.3/crtendS.o /usr/lib/gcc-lib/i386-redhat-linux/3.3.3/../../../crtn.o -pthread -Werror -Wall -g -O2 -Wl,-soname -Wl,mylibrary.0 -o .libs/mylibrary.0.0.0
(cd .libs && rm -f mylibrary.0 && ln -s mylibrary.0.0.0 mylibrary.0)
(cd .libs && rm -f mylibrary && ln -s mylibrary.0.0.0 mylibrary)
creating mylibrary.la
(cd .libs && rm -f mylibrary.la && ln -s ../mylibrary.la mylibrary.la)
/bin/sh ../../libtool --tag=CXX --mode=link g++ -pthread -Werror -Wall -g -O2 -o mytestsuite -R/<somepath>/dep/lib mytestsuite.o ../../src/mylibrary/mylibrary.la -L/<somepath>/dep/lib -lboost_unit_test_framework
g++ -pthread -Werror -Wall -g -O2 -o .libs/mytestsuite mytestsuite.o ../../src/mylibrary/.libs/mylibrary -L/<somepath>/dep/lib -L/usr/lib /usr/lib/libxslt.so /usr/lib/libxml2.so -lm -lpthread -lz -lxmlwrapp -lboost_unit_test_framework -Wl,--rpath -Wl,/<somepath>/release/product/lib -Wl,--rpath -Wl,/<somepath>/dep/lib
creating mytestsuite
../../src/mylibrary/mytestsuite
*** No errors detected
--------------------

Richard

I took the .cpp file where I wrote the TrivialTest and used gcc (3.3.3) to produce the expanded listing (using -E). I then substituted that expanded version back as the .cpp file and rebuilt via automake (using -g3 -O0) for debugging. Now my backtrace is more informative:

--------------------------
(gdb) cont
Running 1 test case...

*** No errors detected

Program received signal SIGINT, Interrupt.
0x008a4402 in ?? ()
(gdb) backtrace
#0 0x008a4402 in ?? ()
#1 0x0064dcbe in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#2 0x0064ac84 in _L_mutex_lock_29 () from /lib/tls/libpthread.so.0
#3 0x003fd840 in _dl_runtime_resolve () from /lib/ld-linux.so.2
#4 0x0036e8ea in scoped_lock (this=0x8aab67c, m=@0x3fd840) at lwm_pthreads.hpp:72
#5 0x0036e8ea in scoped_lock (this=0xfeedd910, m=@0x8aab67c) at lwm_pthreads.hpp:72
#6 0x0036e7dc in boost::detail::sp_counted_base::release (this=0x8aab670) at shared_count.hpp:140
#7 0x0036e170 in ~shared_count (this=0x8aab650) at shared_count.hpp:378
#8 0x00377392 in ~shared_ptr (this=0x8aab64c) at unit_test_suite.hpp:60
#9 0x003772eb in ~test_case (this=0x8aab630) at unit_test_suite.hpp:60
#10 0x0037719c in ~function_test_case (this=0x8aab630) at call_traits.hpp:103
#11 0x00b45d45 in boost::unit_test::ut_detail::normalize_test_case_name () from /usr/lib/libboost_unit_test_framework.so.1
#12 0x00b46018 in std::for_each
Unfortunately (and IMO it's a critical design flaw) smart_ptr chooses its threading mode at the level of preprocessor defines. Still, you should be able to build the unit test framework in single-threaded mode, in which case shared_ptr wouldn't use any mutexes.

Gennadiy
participants (2)
- Gennadiy Rozental
- Richard Newman