Peter Dimov wrote:
Paul Davis wrote:
Howdy,
I've come across an odd segfault that originates in some of the boost code. (By "originates" I mean that's where the stack trace points; I'm not sure whether it's me or boost that's wrong.)
Anyway, the weird part is that it's only on my 64-bit machine.
I took a look at where it's segfaulting, in boost::detail::atomic_exchange_and_add(). It's scary inline assembly stuff. Well, mostly I just don't know assembly, so I haven't the slightest idea if it's right or wrong. And obviously it's a platform-specific header and whatnot, so I imagine the problem is limited to this area. I have had other weird segfaults that come and go from the atomic_* set of functions. I can't pin down exactly what's causing them. They mostly seem to come from storing shared_ptrs in STL containers. I've never had any problems with it before, so I'm assuming it's just a relatively untested section of code.
It's being tested quite extensively, but some problems are triggered only in very rare circumstances depending on the optimization level and the specific compiler backend. This will be pretty hard to pin down.
We can start with sanity checking whether int is 32 bits or 64 bits on this platform. You should also try different optimization levels and see whether this makes a difference. It would help a lot if you can trim the failing example to a small snippet; we can examine the generated assembly (g++ -S) then and see the atomic_* portions in context (they are usually marked with #APP in the .s file.)
[snip]
sizeof( int ) == 4 on this platform.
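For reference, the width check suggested above can be sketched as follows. This is only an illustration: the helper names bit_width_of and report_type_widths are made up here and are not part of boost or of the program under discussion.

```cpp
#include <climits>
#include <cstddef>
#include <cstdio>

// Width of a type in bits, derived from sizeof and CHAR_BIT.
// (Hypothetical helper, named here for illustration only.)
template <typename T>
std::size_t bit_width_of() {
    return sizeof(T) * CHAR_BIT;
}

// Print the widths that matter when comparing a 32-bit and a 64-bit build.
void report_type_widths() {
    std::printf("int:   %zu bits\n", bit_width_of<int>());
    std::printf("long:  %zu bits\n", bit_width_of<long>());
    std::printf("void*: %zu bits\n", bit_width_of<void*>());
}
```

Calling report_type_widths() from main() on each machine shows whether int stays at 32 bits while long and pointers grow to 64, which is the usual layout on 64-bit Linux.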
I've done a bit of looking to see if I can't find anywhere that I'm just
wildly screwing up memory access.
I've tried valgrind and efence both.
I can't get efence to give me any information because it keeps detecting
a malloc of size 0 in the postgres library, which the program doesn't
run without.
Valgrind reported some invalid reads emanating from the boost
pointers. I've attached the valgrind output.
Now, I don't have the greatest experience with valgrind, but I'm
reasonably sure that it's not pointing directly at boost. I could've
overrun an array boundary, or somehow managed to write to my own memory
segment and screwed with the internals of the ptr in which case it would
still be my fault.
But I'm at a loss as to what else to try. I've tried recreating a similar
situation in a simple testcase, but I can't manage to get the bug to
come out. I'm leaning toward it being my fault, but I'm not exactly
sure what to try next to figure out where it is.
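A reduced test case of the kind described might look like the sketch below. It makes several assumptions: the Node type and the exercise_shared_ptrs function are hypothetical, and std::shared_ptr is used as a stand-in so the example builds without boost. To exercise the code path in question, swap in boost::shared_ptr and <boost/shared_ptr.hpp>.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical payload; the real program's types are not shown in this thread.
struct Node {
    int value;
    explicit Node(int v) : value(v) {}
};

// Fill a vector with shared pointers, copy the vector, then clear the
// original. Every copy and destruction goes through the reference-count
// increment and decrement path. Returns the sum of use_count() over the
// surviving copy, which should equal n if the counts are intact.
long exercise_shared_ptrs(std::size_t n) {
    std::vector<std::shared_ptr<Node>> original;
    for (std::size_t i = 0; i < n; ++i) {
        original.push_back(std::make_shared<Node>(static_cast<int>(i)));
    }

    std::vector<std::shared_ptr<Node>> copy(original);  // each count is now 2
    original.clear();                                   // each count drops to 1

    long total = 0;
    for (const auto& p : copy) {
        total += p.use_count();
    }
    return total;
}
```

Running this under valgrind at several optimization levels (for example -O0 and -O2) would show whether the container usage alone triggers the invalid reads; if it does not, the corruption more likely comes from elsewhere in the program.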
I'll keep looking.
Paul
==993== Memcheck, a memory error detector.
==993== Copyright (C) 2002-2005, and GNU GPL'd, by Julian Seward et al.
==993== Using LibVEX rev 1471, a library for dynamic binary translation.
==993== Copyright (C) 2004-2005, and GNU GPL'd, by OpenWorks LLP.
==993== Using valgrind-3.1.0-Debian, a dynamic binary instrumentation framework.
==993== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==993== For more details, rerun with: -v
==993==
==993== Invalid read of size 8
==993== at 0x4010664: (within /lib/ld-2.3.6.so)
==993== by 0x40089BC: (within /lib/ld-2.3.6.so)
==993== by 0x4004DF3: (within /lib/ld-2.3.6.so)
==993== by 0x4006612: (within /lib/ld-2.3.6.so)
==993== by 0x576C51B: (within /lib/libc-2.3.6.so)
==993== by 0x400B13F: (within /lib/ld-2.3.6.so)
==993== by 0x576D0C9: _dl_open (in /lib/libc-2.3.6.so)
==993== by 0x576E627: (within /lib/libc-2.3.6.so)
==993== by 0x400B13F: (within /lib/ld-2.3.6.so)
==993== by 0x576E6D2: __libc_dlopen_mode (in /lib/libc-2.3.6.so)
==993== by 0x574A0F1: __nss_lookup_function (in /lib/libc-2.3.6.so)
==993== by 0x574A253: (within /lib/libc-2.3.6.so)
==993== Address 0x69E8528 is 16 bytes inside a block of size 21 alloc'd
==993== at 0x4A19A16: malloc (vg_replace_malloc.c:149)
==993== by 0x4006A00: (within /lib/ld-2.3.6.so)
==993== by 0x576C51B: (within /lib/libc-2.3.6.so)
==993== by 0x400B13F: (within /lib/ld-2.3.6.so)
==993== by 0x576D0C9: _dl_open (in /lib/libc-2.3.6.so)
==993== by 0x576E627: (within /lib/libc-2.3.6.so)
==993== by 0x400B13F: (within /lib/ld-2.3.6.so)
==993== by 0x576E6D2: __libc_dlopen_mode (in /lib/libc-2.3.6.so)
==993== by 0x574A0F1: __nss_lookup_function (in /lib/libc-2.3.6.so)
==993== by 0x574A253: (within /lib/libc-2.3.6.so)
==993== by 0x5702A14: getpwuid_r (in /lib/libc-2.3.6.so)
==993== by 0x4B37E18: pqGetpwuid (in /usr/lib/libpq.so.4.1)
==993==
==993== Invalid read of size 8
==993== at 0x401067E: (within /lib/ld-2.3.6.so)
==993== by 0x40089BC: (within /lib/ld-2.3.6.so)
==993== by 0x4004DF3: (within /lib/ld-2.3.6.so)
==993== by 0x4006612: (within /lib/ld-2.3.6.so)
==993== by 0x4009C2C: (within /lib/ld-2.3.6.so)
==993== by 0x400B13F: (within /lib/ld-2.3.6.so)
==993== by 0x4009F32: (within /lib/ld-2.3.6.so)
==993== by 0x576C57A: (within /lib/libc-2.3.6.so)
==993== by 0x400B13F: (within /lib/ld-2.3.6.so)
==993== by 0x576D0C9: _dl_open (in /lib/libc-2.3.6.so)
==993== by 0x58AD043: (within /lib/libdl-2.3.6.so)
==993== by 0x400B13F: (within /lib/ld-2.3.6.so)
==993== Address 0x69F7A90 is 24 bytes inside a block of size 26 alloc'd
==993== at 0x4A19A16: malloc (vg_replace_malloc.c:149)
==993== by 0x4006A00: (within /lib/ld-2.3.6.so)
==993== by 0x4009C2C: (within /lib/ld-2.3.6.so)
==993== by 0x400B13F: (within /lib/ld-2.3.6.so)
==993== by 0x4009F32: (within /lib/ld-2.3.6.so)
==993== by 0x576C57A: (within /lib/libc-2.3.6.so)
==993== by 0x400B13F: (within /lib/ld-2.3.6.so)
==993== by 0x576D0C9: _dl_open (in /lib/libc-2.3.6.so)
==993== by 0x58AD043: (within /lib/libdl-2.3.6.so)
==993== by 0x400B13F: (within /lib/ld-2.3.6.so)
==993== by 0x58AD541: (within /lib/libdl-2.3.6.so)
==993== by 0x58AD081: dlopen (in /lib/libdl-2.3.6.so)
==993==
==993== Conditional jump or move depends on uninitialised value(s)
==993== at 0x4008F11: (within /lib/ld-2.3.6.so)
==993== by 0x576C666: (within /lib/libc-2.3.6.so)
==993== by 0x400B13F: (within /lib/ld-2.3.6.so)
==993== by 0x576D0C9: _dl_open (in /lib/libc-2.3.6.so)
==993== by 0x58AD043: (within /lib/libdl-2.3.6.so)
==993== by 0x400B13F: (within /lib/ld-2.3.6.so)
==993== by 0x58AD541: (within /lib/libdl-2.3.6.so)
==993== by 0x58AD081: dlopen (in /lib/libdl-2.3.6.so)
==993== by 0x407683: GetModule(std::string) (dds.cc:46)
==993== by 0x4094BB: end_elem(void*, char const*) (dds.cc:196)
==993== by 0x4C4C29A: (within /usr/lib/libexpat.so.1.0.0)
==993== by 0x4C4D064: (within /usr/lib/libexpat.so.1.0.0)
==993==
==993== Conditional jump or move depends on uninitialised value(s)
==993== at 0x4008F51: (within /lib/ld-2.3.6.so)
==993== by 0x576C666: (within /lib/libc-2.3.6.so)
==993== by 0x400B13F: (within /lib/ld-2.3.6.so)
==993== by 0x576D0C9: _dl_open (in /lib/libc-2.3.6.so)
==993== by 0x58AD043: (within /lib/libdl-2.3.6.so)
==993== by 0x400B13F: (within /lib/ld-2.3.6.so)
==993== by 0x58AD541: (within /lib/libdl-2.3.6.so)
==993== by 0x58AD081: dlopen (in /lib/libdl-2.3.6.so)
==993== by 0x407683: GetModule(std::string) (dds.cc:46)
==993== by 0x4094BB: end_elem(void*, char const*) (dds.cc:196)
==993== by 0x4C4C29A: (within /usr/lib/libexpat.so.1.0.0)
==993== by 0x4C4D064: (within /usr/lib/libexpat.so.1.0.0)
========================
| DDS Parameter Report |
========================
Marching
########
Previously Calculated
---------------------
Generation in Progress
----------------------
border_points
==993==
==993== Invalid read of size 1
==993== at 0x4B2CDA7: PQcmdTuples (in /usr/lib/libpq.so.4.1)
==993== by 0x4FD39BA: dds::client::Command(std::string) (client.cc:121)
==993== by 0x4FD39F8: dds::client::Insert(std::string) (client.cc:127)
==993== by 0x408A92: main (dds.cc:489)
==993== Address 0x69FA098 is 32 bytes inside a block of size 160 free'd
==993== at 0x4A1A5B3: free (vg_replace_malloc.c:235)
==993== by 0x4FD39B1: dds::client::Command(std::string) (client.cc:119)
==993== by 0x4FD39F8: dds::client::Insert(std::string) (client.cc:127)
==993== by 0x408A92: main (dds.cc:489)
==993==
==993== Invalid read of size 1
==993== at 0x4B2CDAB: PQcmdTuples (in /usr/lib/libpq.so.4.1)
==993== by 0x4FD39BA: dds::client::Command(std::string) (client.cc:121)
==993== by 0x4FD39F8: dds::client::Insert(std::string) (client.cc:127)
==993== by 0x408A92: main (dds.cc:489)
==993== Address 0x69FA09F is 39 bytes inside a block of size 160 free'd
==993== at 0x4A1A5B3: free (vg_replace_malloc.c:235)
==993== by 0x4FD39B1: dds::client::Command(std::string) (client.cc:119)
==993== by 0x4FD39F8: dds::client::Insert(std::string) (client.cc:127)
==993== by 0x408A92: main (dds.cc:489)
==993==
==993== Invalid read of size 1
==993== at 0x4B2CDD7: PQcmdTuples (in /usr/lib/libpq.so.4.1)
==993== by 0x4FD39BA: dds::client::Command(std::string) (client.cc:121)
==993== by 0x4FD39F8: dds::client::Insert(std::string) (client.cc:127)
==993== by 0x408A92: main (dds.cc:489)
==993== Address 0x69FA0A0 is 40 bytes inside a block of size 160 free'd
==993== at 0x4A1A5B3: free (vg_replace_malloc.c:235)
==993== by 0x4FD39B1: dds::client::Command(std::string) (client.cc:119)
==993== by 0x4FD39F8: dds::client::Insert(std::string) (client.cc:127)
==993== by 0x408A92: main (dds.cc:489)
==993==
==993== Invalid read of size 1
==993== at 0x4B2CDC2: PQcmdTuples (in /usr/lib/libpq.so.4.1)
==993== by 0x4FD39BA: dds::client::Command(std::string) (client.cc:121)
==993== by 0x4FD39F8: dds::client::Insert(std::string) (client.cc:127)
==993== by 0x408A92: main (dds.cc:489)
==993== Address 0x69FA0A1 is 41 bytes inside a block of size 160 free'd
==993== at 0x4A1A5B3: free (vg_replace_malloc.c:235)
==993== by 0x4FD39B1: dds::client::Command(std::string) (client.cc:119)
==993== by 0x4FD39F8: dds::client::Insert(std::string) (client.cc:127)
==993== by 0x408A92: main (dds.cc:489)
==993==
==993== Invalid read of size 1
==993== at 0x56A5CF1: (within /lib/libc-2.3.6.so)
==993== by 0x56A3721: atoi (in /lib/libc-2.3.6.so)
==993== by 0x4FD39C2: dds::client::Command(std::string) (client.cc:121)
==993== by 0x4FD39F8: dds::client::Insert(std::string) (client.cc:127)
==993== by 0x408A92: main (dds.cc:489)
==993== Address 0x69FA0A1 is 41 bytes inside a block of size 160 free'd
==993== at 0x4A1A5B3: free (vg_replace_malloc.c:235)
==993== by 0x4FD39B1: dds::client::Command(std::string) (client.cc:119)
==993== by 0x4FD39F8: dds::client::Insert(std::string) (client.cc:127)
==993== by 0x408A92: main (dds.cc:489)
==993==
==993== Invalid read of size 1
==993== at 0x56A5E77: (within /lib/libc-2.3.6.so)
==993== by 0x56A3721: atoi (in /lib/libc-2.3.6.so)
==993== by 0x4FD39C2: dds::client::Command(std::string) (client.cc:121)
==993== by 0x4FD39F8: dds::client::Insert(std::string) (client.cc:127)
==993== by 0x408A92: main (dds.cc:489)
==993== Address 0x69FA0A1 is 41 bytes inside a block of size 160 free'd
==993== at 0x4A1A5B3: free (vg_replace_malloc.c:235)
==993== by 0x4FD39B1: dds::client::Command(std::string) (client.cc:119)
==993== by 0x4FD39F8: dds::client::Insert(std::string) (client.cc:127)
==993== by 0x408A92: main (dds.cc:489)
==993==
==993== Invalid read of size 1
==993== at 0x56A5E0D: (within /lib/libc-2.3.6.so)
==993== by 0x56A3721: atoi (in /lib/libc-2.3.6.so)
==993== by 0x4FD39C2: dds::client::Command(std::string) (client.cc:121)
==993== by 0x4FD39F8: dds::client::Insert(std::string) (client.cc:127)
==993== by 0x408A92: main (dds.cc:489)
==993== Address 0x69FA0A2 is 42 bytes inside a block of size 160 free'd
==993== at 0x4A1A5B3: free (vg_replace_malloc.c:235)
==993== by 0x4FD39B1: dds::client::Command(std::string) (client.cc:119)
==993== by 0x4FD39F8: dds::client::Insert(std::string) (client.cc:127)
==993== by 0x408A92: main (dds.cc:489)
=============
| Arguments |
=============
experiment_id = 5053
field = 2
sequence = 0
============
| Marching |
============
==993==
==993== Invalid read of size 8
==993== at 0x40A4BC: boost::detail::shared_count::~shared_count() (shared_count.hpp:159)
==993== by 0x6D0B492: boost::shared_ptr