[program_options] seg fault on ppc64 / linux / gcc 3.4.4
I've just moved our project across to Boost 1.33, which seems to be
working fine on our x86 linux gcc 3.4.4 machine. However, I'm getting
seg faults on really basic stuff on the mac G5 we have, running linux
with gcc 3.4.4 (Gentoo, using ppc64 kernel and userspace). This machine
had no problems under Boost 1.32, I should add.
Here is a simple test program (based on one of the early examples):
=========
#include
Hi Liam,
Here is a simple test program (based on one of the early examples):
=========
#include

namespace po = boost::program_options;

int main ( int, char* )
{
    po::options_description generic("Generic options");

    generic.add_options() ("version,v", "print version string");
    generic.add_options() ("help", "produce help message");

    return ( 0 );
}
=========
The seg fault happens on the second add_options() line. If the two lines are combined (as in the original example) it faults somewhere in there, but I think it is the same place.
My gdb session:
=========
#0 0x000000800005d7a8 in ._ZN5boost6detail17sp_counted_impl_pINS_15program_options18option_descriptionEE7disposeEv () from /usr/lib/libboost_program_options.so.1.33.0
(gdb) up
#1 0x0000008000057b90 in ._ZN5boost15program_options29options_description_easy_initclEPKcS3_ () from /usr/lib/libboost_program_options.so.1.33.0
My first guess is a single-threaded/multi-threaded mismatch. Did you try compiling the application with -pthread? Do you know whether the program_options library you use is built with multithreading or not?

HTH,
Volodya
Vladimir Prus wrote:
My first guess is a single-threaded/multi-threaded mismatch. Did you try compiling the application with -pthread? Do you know whether the program_options library you use is built with multithreading or not?
Compiling with -pthread gives the same result. Building and running under 32-bit userspace makes the problem go away.

As to libraries, the Boost install created:

/usr/lib/libboost_program_options-mt.a
/usr/lib/libboost_program_options-mt.so.1.33.0
/usr/lib/libboost_program_options.a
/usr/lib/libboost_program_options.so.1.33.0

as well as *gcc links to those, and standard links:

/usr/lib/libboost_program_options-mt.so
/usr/lib/libboost_program_options.so

which point to the threaded and non-threaded so's, as you would expect.

It looks to me like a 64-bit related problem, but I can't tell whether it is gcc or a Boost issue. I will see whether we can go back a version with gcc, or something. But that will be next week, as the sysadmin is away until then.

Any other suggestions appreciated.

Take care,
Liam
--
Liam Routt Ph: (03) 8344-1315
Research Programmer caligari@cs.mu.oz.au
Computer Science, Melbourne University (or liam@routt.net)
Liam Routt wrote: [snip]
It looks to me like a 64-bit related problem, but I can't tell whether it is gcc or a Boost issue. I will see whether we can go back a version with gcc, or something. But that will be next week, as the sysadmin is away until then.
program_options works (meaning all tests pass) under Tru64, which is a 64-bit platform, so I would be surprised to hear that there are any 64-bit issues in the library itself. IIRC there were some size_type vs. unsigned issues which could cause failures on 64-bit platforms, but I think Vladimir fixed those.

Markus
Liam Routt wrote:
It looks to me like a 64-bit related problem, but I can't tell whether it is gcc or a Boost issue. I will see whether we can go back a version with gcc, or something. But that will be next week, as the sysadmin is away until then.
Any other suggestions appreciated.
How about running the smart_ptr test suite? The shared_count implementation was changed significantly for gcc on PPC (and many other targets), but I don't think either of the PPC systems running automated regression tests is doing so in 64-bit mode.

Ben.
Ben Hutchings wrote:
Liam Routt wrote:
It looks to me like a 64-bit related problem, but I can't tell whether it is gcc or a Boost issue. I will see whether we can go back a version with gcc, or something. But that will be next week, as the sysadmin is away until then.
Any other suggestions appreciated.
How about running the smart_ptr test suite? The shared_count implementation was changed significantly for gcc on PPC (and many other targets), but I don't think either of the PPC systems running automated regression tests are doing so in 64-bit mode.
That seemed like a fair request, so I set about working out how to do that.

I've now run the tests on my x86 system and on the PPC64, with differing results. The x86 one passes all the tests. The PPC64:

=====
...failed updating 5 targets...
...skipped 5 targets...
...updated 12 targets...
=====

The fails seem to be (I'm new at parsing this output):

=====
smart_ptr_test
shared_ptr_basic_test
shared_ptr_test
weak_ptr_test
shared_from_this_test
=====

In the smart_ptr_test and shared_ptr_test output I note a fair number of use_count() related error messages, which might indeed indicate that the shared_count is a problem area.

I have saved the output and can provide it on request, but I didn't think that simply mailing it to the list was necessarily the right step to take. What *is* the right step to take next?

Take care,
Liam
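[Editor's sketch] The use_count() failures reported above concern a handful of simple ownership invariants. The following is a minimal sketch of those invariants, not an excerpt from the Boost smart_ptr test suite. It uses std::shared_ptr and std::weak_ptr as stand-ins for the Boost classes so that it compiles without Boost installed; the function name use_count_sanity_check is made up for illustration.

```cpp
#include <cassert>
#include <memory>

// Hypothetical helper: returns true when the basic single-threaded
// ownership counts behave as expected. std::shared_ptr is used here
// as a stand-in for boost::shared_ptr (an assumption of this sketch).
inline bool use_count_sanity_check()
{
    std::shared_ptr<int> a(new int(42));
    if (a.use_count() != 1) return false;       // sole owner

    {
        std::shared_ptr<int> b(a);               // second owner
        if (a.use_count() != 2 || b.use_count() != 2) return false;

        std::weak_ptr<int> w(a);                 // observers do not own
        if (w.use_count() != 2 || w.expired()) return false;
    }

    if (a.use_count() != 1) return false;       // b released its share

    std::weak_ptr<int> w(a);
    a.reset();                                   // last owner gone
    return w.expired() && w.use_count() == 0;
}
```

If a check like this fails on one platform and passes on another with the same compiler version, that would point at the reference-count implementation rather than at the calling code.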
Liam Routt wrote:
I've now run the tests on my x86 system and on the PPC64, with differing results. The x86 one passes all the tests. The PPC64:
=====
...failed updating 5 targets...
...skipped 5 targets...
...updated 12 targets...
=====
The fails seem to be (I'm new at parsing this output):
=====
smart_ptr_test
shared_ptr_basic_test
shared_ptr_test
weak_ptr_test
shared_from_this_test
=====
In the smart_ptr_test and shared_ptr_test output I note a fair number of use_count() related error messages, which might indeed indicate that the shared_count is a problem area.
I have saved the output and can provide it on request, but I didn't think that simply mailing it to the list was necessarily the right step to take. What *is* the right step to take next?
I believe that the shared count is 64-bit on PPC64 (since it is declared long) but the atomic operations being used on it assume it's 32-bit, since they were written for PPC32. If you send the log to me I might be able to tell whether this is the case.

May I suggest also that you try applying the following (wholly untested) patch to boost/detail/sp_counted_base_gcc_ppc.hpp and re-running the test:

--- sp_counted_base_gcc_ppc.hpp.orig 2005-08-22 14:31:10.315970200 +0100
+++ sp_counted_base_gcc_ppc.hpp 2005-08-22 14:30:47.512073200 +0100
@@ -41,9 +41,9 @@
 __asm__
 (
 "0:\n\t"
- "lwarx %1, 0, %2\n\t"
+ "ldarx %1, 0, %2\n\t"
 "addi %1, %1, 1\n\t"
- "stwcx. %1, 0, %2\n\t"
+ "stdcx. %1, 0, %2\n\t"
 "bne- 0b":
 "=m"( *pw ), "=&b"( tmp ):
@@ -62,9 +62,9 @@
 (
 "sync\n\t"
 "0:\n\t"
- "lwarx %1, 0, %2\n\t"
+ "ldarx %1, 0, %2\n\t"
 "addi %1, %1, -1\n\t"
- "stwcx. %1, 0, %2\n\t"
+ "stdcx. %1, 0, %2\n\t"
 "bne- 0b\n\t"
 "isync":
@@ -86,12 +86,12 @@
 __asm__
 (
 "0:\n\t"
- "lwarx %1, 0, %2\n\t"
- "cmpwi %1, 0\n\t"
+ "ldarx %1, 0, %2\n\t"
+ "cmpdi %1, 0\n\t"
 "beq 1f\n\t"
 "addi %1, %1, 1\n\t"
 "1:\n\t"
- "stwcx. %1, 0, %2\n\t"
+ "stdcx. %1, 0, %2\n\t"
 "bne- 0b":
 "=m"( *pw ), "=&b"( rv ):

Ben.
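[Editor's sketch] To make the suspected 32-bit/64-bit mismatch concrete, here is a small portable model of it. It is not the inline assembly from sp_counted_base_gcc_ppc.hpp. It assumes a big-endian layout (as on PPC64) and assumes that a word-sized read-modify-write applied at the address of a 64-bit counter touches only the four bytes at that address. All names (BigEndianCounter64, store64, load64, word_add_at_base) are invented for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// A 64-bit counter held as 8 bytes, most significant byte first,
// modelling how a 'long' is laid out on a big-endian 64-bit target.
struct BigEndianCounter64
{
    unsigned char bytes[8];
};

// Store a 64-bit value in big-endian byte order.
inline void store64(BigEndianCounter64& c, std::uint64_t v)
{
    for (int i = 7; i >= 0; --i)
    {
        c.bytes[i] = static_cast<unsigned char>(v & 0xFF);
        v >>= 8;
    }
}

// Read the full 64-bit value back.
inline std::uint64_t load64(const BigEndianCounter64& c)
{
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v = (v << 8) | c.bytes[i];
    return v;
}

// Model of a 32-bit load/add/store at the counter's base address.
// Only bytes[0..3] are read and written, which is the HIGH half of
// the big-endian 64-bit value. Returns the new 32-bit word.
inline std::uint32_t word_add_at_base(BigEndianCounter64& c, std::int32_t delta)
{
    std::uint32_t w = 0;
    for (int i = 0; i < 4; ++i)
        w = (w << 8) | c.bytes[i];

    w += static_cast<std::uint32_t>(delta);      // wraps modulo 2^32

    std::uint32_t out = w;
    for (int i = 3; i >= 0; --i)
    {
        c.bytes[i] = static_cast<unsigned char>(out & 0xFF);
        out >>= 8;
    }
    return w;
}
```

In this model, a counter initialised to 1 reads as 2^32 + 1 after one word-sized "increment", and a following word-sized "decrement" returns 0 even though the full 64-bit value is still 1. If the real hardware behaves like the model, that would be consistent with both the odd use_count() values and an object being disposed while an owner remains.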
Ben Hutchings wrote:
Liam Routt wrote:
In the smart_ptr_test and shared_ptr_test output I note a fair number of use_count() related error messages, which might indeed indicate that the shared_count is a problem area.
I have saved the output and can provide it on request, but I didn't think that simply mailing it to the list was necessarily the right step to take. What *is* the right step to take next?
I believe that the shared count is 64-bit on PPC64 (since it is declared long) but the atomic operations being used on it assume it's 32-bit, since they were written for PPC32.
That's my guess as well.
If you send the log to me I might be able to tell whether this is the case. May I suggest also that you try applying the following (wholly untested) patch to boost/detail/sp_counted_base_gcc_ppc.hpp and re-running the test:
--- sp_counted_base_gcc_ppc.hpp.orig 2005-08-22 14:31:10.315970200 +0100
+++ sp_counted_base_gcc_ppc.hpp 2005-08-22 14:30:47.512073200 +0100
@@ -41,9 +41,9 @@
 __asm__
 (
 "0:\n\t"
- "lwarx %1, 0, %2\n\t"
+ "ldarx %1, 0, %2\n\t"
[...]
I think that it would be better to use a 32 bit count instead, as in the
patch below (int is 32-bit under both PPC32 and PPC64, right?)
Either way, please let us know whether one or both of these changes resolve
the issue, so that we can put the fix in 1.33.1, if one is released.
diff -c -r1.3 sp_counted_base_gcc_ppc.hpp
*** sp_counted_base_gcc_ppc.hpp 8 Apr 2005 10:39:28 -0000 1.3
--- sp_counted_base_gcc_ppc.hpp 22 Aug 2005 18:12:00 -0000
***************
*** 32,42 ****
namespace detail
{
! inline void atomic_increment( long * pw )
{
// ++*pw;
! long tmp;
__asm__
(
--- 32,42 ----
namespace detail
{
! inline void atomic_increment( int * pw )
{
// ++*pw;
! int tmp;
__asm__
(
***************
*** 52,62 ****
);
}
! inline long atomic_decrement( long * pw )
{
// return --*pw;
! long rv;
__asm__ __volatile__
(
--- 52,62 ----
);
}
! inline int atomic_decrement( int * pw )
{
// return --*pw;
! int rv;
__asm__ __volatile__
(
***************
*** 76,87 ****
return rv;
}
! inline long atomic_conditional_increment( long * pw )
{
// if( *pw != 0 ) ++*pw;
// return *pw;
! long rv;
__asm__
(
--- 76,87 ----
return rv;
}
! inline int atomic_conditional_increment( int * pw )
{
// if( *pw != 0 ) ++*pw;
// return *pw;
! int rv;
__asm__
(
***************
*** 109,116 ****
sp_counted_base( sp_counted_base const & );
sp_counted_base & operator= ( sp_counted_base const & );
! long use_count_; // #shared
! long weak_count_; // #weak + (#shared != 0)
public:
--- 109,116 ----
sp_counted_base( sp_counted_base const & );
sp_counted_base & operator= ( sp_counted_base const & );
! int use_count_; // #shared
! int weak_count_; // #weak + (#shared != 0)
public:
***************
*** 170,176 ****
long use_count() const // nothrow
{
! return static_cast
Peter Dimov wrote:
Ben Hutchings wrote:
Liam Routt wrote:
In the smart_ptr_test and shared_ptr_test output I note a fair number of use_count() related error messages, which might indeed indicate that the shared_count is a problem area.
I have saved the output and can provide it on request, but I didn't think that simply mailing it to the list was necessarily the right step to take. What *is* the right step to take next?
I believe that the shared count is 64-bit on PPC64 (since it is declared long) but the atomic operations being used on it assume it's 32-bit, since they were written for PPC32.
That's my guess as well.
If you send the log to me I might be able to tell whether this is the case. May I suggest also that you try applying the following (wholly untested) patch to boost/detail/sp_counted_base_gcc_ppc.hpp and re-running the test:
--- sp_counted_base_gcc_ppc.hpp.orig 2005-08-22 14:31:10.315970200 +0100
+++ sp_counted_base_gcc_ppc.hpp 2005-08-22 14:30:47.512073200 +0100
@@ -41,9 +41,9 @@
 __asm__
 (
 "0:\n\t"
- "lwarx %1, 0, %2\n\t"
+ "ldarx %1, 0, %2\n\t"
[...]
I think that it would be better to use a 32 bit count instead, as in the patch below (int is 32-bit under both PPC32 and PPC64, right?)
Surely we have some sort of "bit-sized" basic types? I'm new to Boost, but we've had typedefs in our own project for as long as we've been using C++...
Either way, please let us know whether one or both of these changes resolve the issue, so that we can put the fix in 1.33.1, if one is released.
Both patches allow us to pass the entire suite on the PPC64/Linux/gcc combo.

Any suggestion which I should apply locally to allow us to proceed?

Are there more parts of the test suite that we should run?

Take care,
Liam
Liam Routt wrote:
Peter Dimov wrote:
I think that it would be better to use a 32 bit count instead, as in the patch below (int is 32-bit under both PPC32 and PPC64, right?)
Surely we have some sort of "bit-sized" basic types? I'm new to Boost, but we've had typedefs in our own project for as long as we've been using C++...
We have
Either way, please let us know whether one or both of these changes resolve the issue, so that we can put the fix in 1.33.1, if one is released.
Both patches allow us to pass the entire suite on the PPC64/Linux/gcc combo.
Any suggestion which I should apply locally to allow us to proceed?
... the 'int' patch seems to be the way to go. 'ld*' might not work on PPC32, and 64 bits is overkill for a reference count anyway.
Are there more parts of the test suite that we should run?
You may want to try shared_ptr_mt_test and weak_ptr_mt_test, if you haven't already. These aren't being run by default.
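[Editor's sketch] The three operations touched by the 'int' patch are described in its comments as "++*pw", "return --*pw" and "if( *pw != 0 ) ++*pw; return *pw". The sketch below expresses the same semantics on a counter that is 32 bits wide on both 32-bit and 64-bit targets. It deliberately swaps the PowerPC inline assembly for standard C++11 std::atomic, which is not what Boost 1.33 uses; the namespace sketch, the typedef counter32, and the chosen memory orderings are assumptions made for illustration only.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

namespace sketch
{

// Hypothetical fixed-width counter type: 32 bits regardless of
// whether 'long' is 32 or 64 bits on the target.
typedef std::atomic<std::int32_t> counter32;

// ++*pw;
inline void atomic_increment(counter32& c)
{
    c.fetch_add(1, std::memory_order_relaxed);
}

// return --*pw;
inline std::int32_t atomic_decrement(counter32& c)
{
    // fetch_sub returns the previous value, so subtract 1 again
    // to report the new one.
    return c.fetch_sub(1, std::memory_order_acq_rel) - 1;
}

// if( *pw != 0 ) ++*pw;
// return *pw;
inline std::int32_t atomic_conditional_increment(counter32& c)
{
    std::int32_t old = c.load(std::memory_order_relaxed);
    while (old != 0 &&
           !c.compare_exchange_weak(old, old + 1, std::memory_order_relaxed))
    {
        // A failed exchange refreshes 'old' with the current value; retry.
    }
    return old == 0 ? 0 : old + 1;
}

} // namespace sketch
```

The conditional increment is the operation a weak pointer needs when it tries to obtain ownership: it must never revive a count that has already reached zero, which is why it is written as a compare-and-exchange loop rather than a plain add.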
participants (5)
- Ben Hutchings
- Liam Routt
- Markus Schöpflin
- Peter Dimov
- Vladimir Prus