[regex] Static linking woes with gcc 4.0.x on Fedora Core
I've been having great success with the regex library on Win32 and Linux, up until the time I decided to flip over to using all static linking. What used to be working code suddenly goes belly-up during static constructor initialization, but only when I statically link. When I use dynamic-linking, all appears to be well.

I've tried the following permutations, each of which crashes, but each in a slightly different place:

- Stock Fedora Core 4 install, with gcc 4.0.0 and boost 1.32
- Upgrade the above to gcc 4.0.2
- Upgrade boost to boost-1.33.1-6.i386.rpm (+boost-devel)
- Build boost 1.33 from sources using gcc 4.0.2

When I started, it was dying when running static initializers inside the regex lib. I don't have the exact stack trace right now, but it looked like c_regex_traits<char>::init() was calling a null function pointer somewhere (similar to the top frame shown below). But now that I've come to the last step, it's dying during exit(), with the following backtrace---which isn't terribly helpful, I know:

    (gdb) backtrace
    Core was generated by `trquery'.
    Program terminated with signal 11, Segmentation fault.
    #0 0x00000000 in ?? ()
    #1 0x08096f63 in __tcf_0 ()
    #2 0x080fa5ee in exit ()
    #3 0x08048d80 in DoQueryAll (ctl=@0xbfaf2a84, ttyname=0x0) at siquery.cpp:472
    #4 0x08049826 in main (argc=1, argv=0xbfaf2ec4) at siquery.cpp:320

My DoQueryAll() is calling exit() directly, although I doubt that should make a difference. And again, this All Just Works(tm) when I dynamically link.

Dynamic linking is not an option for us---if it came to that, then I'd be seriously looking at refactoring to replace boost.regex with either custom parsing or another regex lib. Besides, bjam *does* build a libboost_regex-XXX.a, presumably it's supposed to work. :-)

Now I know that based on the above backtrace, this could be anything. I'm suspecting boost because of the first symptom where the backtrace is in c_regex_traits<char>::init().

So given the dearth of discussion on this category of problem, I'm guessing that perhaps this is a problem with gcc 4.x? Are there any known issues with boost.regex on gcc 4.0.x? Are they fixed in 4.1? Should I just fall back to Fedora Core 3 to get gcc 3.x?

Thanks much!

David Yon
Tactical Software
David Yon wrote:
So given the dearth of discussion on this category of problem, I'm guessing that perhaps this is a problem with gcc 4.x? Are there any known issues with boost.regex on gcc 4.0.x? Are they fixed in 4.1? Should I just fall back to Fedora Core 3 to get gcc 3.x?
David, I've not seen or heard of anything like that before. The first thing to check is that the regex tests work OK: cd into libs/regex/test and run bjam regex_regress. Assuming gcc-4 is your default compiler, that will build the main regression test program *with static linking*.

If that works OK, then I bet there's some kind of binary incompatibility between the lib variant that you're using and your program: try adding the regex source direct to your program's build process and see if that works, if it does I would suggest building the regex source into a static lib yourself using whatever compiler options you're using.

However... I note that your program is calling exit explicitly, there's just an outside chance that there's some kind of bug either in the regex lib initialisation/cleanup routines or in gcc-4.x. Or... maybe your program is accessing the regex lib during its cleanup routines, and finding that regex has already destroyed its globals? Those are just wildcard suggestions frankly.

HTH, John.
John Maddock wrote:
If that works OK, then I bet there's some kind of binary incompatibility between the lib variant that you're using and your program: try adding the regex source direct to your program's build process and see if that works, if it does I would suggest building the regex source into a static lib yourself using whatever compiler options you're using.
I'll probably try this first. I'm not doing anything special with compiler options---I'm using KDevelop to build command-line tools and a daemon, and letting it choose the options. Before I go reverse-engineer the Jamfile, can you tell me anything specific that Boost.regex wants to see in terms of compiler options or preprocessor definitions?
However... I note that your program is calling exit explicitly, there's just an outside chance that there's some kind of bug either in the regex lib initialisation/cleanup routines or in gcc-4.x. Or... maybe your program is accessing the regex lib during its cleanup routines, and finding that regex has already destroyed its globals? Those are just wildcard suggestions frankly.
I don't define an atexit handler. The only thought I have is that I do have a number of statically-declared regex objects which would be getting cleaned up by the compiler automatically calling destructors. Perhaps the compiler is getting it wrong by cleaning up the regex library before destructing my static regex objects? It's just odd that using a .so works just great but this all unravels with a .a.
David Yon wrote:
I'll probably try this first. I'm not doing anything special with compiler options---I'm using KDevelop to build command-line tools and a daemon, and letting it choose the options. Before I go reverse-engineer the Jamfile, can you tell me anything specific that Boost.regex wants to see in terms of compiler options or preprocessor definitions?
No, it's just a bunch of source.
However... I note that your program is calling exit explicitly, there's just an outside chance that there's some kind of bug either in the regex lib initialisation/cleanup routines or in gcc-4.x. Or... maybe your program is accessing the regex lib during its cleanup routines, and finding that regex has already destroyed its globals? Those are just wildcard suggestions frankly.
I don't define an atexit handler. The only thought I have is that I do have a number of statically-declared regex objects which would be getting cleaned up by the compiler automatically calling destructors. Perhaps the compiler is getting it wrong by cleaning up the regex library before destructing my static regex objects?
It's just odd that using a .so works just great but this all unravels with a .a.
The .so will get unloaded after the executable, so the static regex instances might be the cause of the problem I suppose, but they really shouldn't be. If this does turn out to be the cause and you can put together a test case I'll look into it. John.
John Maddock wrote:
The .so will get unloaded after the executable, so the static regex instances might be the cause of the problem I suppose, but they really shouldn't be. If this does turn out to be the cause and you can put together a test case I'll look into it.
One more thought... where would be good places to put some good old-fashioned printfs in the regex code to detect when static initializer/destructors are being called? Would be interesting to see if I can catch the compiler in the act of destroying my static regexes *after* it has shut down the regex library code. Thanks for the quick attention, BTW.
David Yon wrote:
One more thought... where would be good places to put some good old-fashioned printfs in the regex code to detect when static initializer/destructors are being called? Would be interesting to see if I can catch the compiler in the act of destroying my static regexes *after* it has shut down the regex library code.
Hmm, actually the more I think about it, the fewer places I can think of that actually do anything at shutdown time. The best thing would be to get a decent stack trace and/or a test case.

John.
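For anyone wanting to try the printf approach without touching the regex sources at all: one option is to bracket the static regex objects with small tracer objects in the same source file. The following is only a minimal sketch, not anything from Boost.Regex; LifetimeTrace and the labels are hypothetical names made up for illustration. It relies on the C++ rule that, within a single translation unit, objects with static storage duration are constructed in declaration order and destroyed in the reverse order.

```cpp
#include <cstdio>

// Hypothetical helper (not part of Boost.Regex): announces its own
// construction and destruction on stderr and keeps a count of live tracers.
class LifetimeTrace {
public:
    explicit LifetimeTrace(const char* label) : label_(label) {
        ++live_;
        std::fprintf(stderr, "[ctor] %s\n", label_);
    }
    ~LifetimeTrace() {
        --live_;
        std::fprintf(stderr, "[dtor] %s\n", label_);
    }

    // Number of tracers currently alive.
    static int live_count() { return live_; }

private:
    const char* label_;
    static int live_;
};

// Zero-initialized before any dynamic initialization takes place.
int LifetimeTrace::live_ = 0;

// Constructed first, therefore destroyed last in this translation unit.
static LifetimeTrace trace_before("before static regexes");

// The application's own static regex objects would be declared here, e.g.
//   static boost::regex my_pattern("...");   // placeholder, not real code

// Constructed last, therefore destroyed first in this translation unit.
static LifetimeTrace trace_after("after static regexes");
```

If the "[dtor] after static regexes" line shows up on stderr but the "[dtor] before static regexes" line never does, that would suggest the crash happens while the objects declared between the two tracers are being torn down. The ordering guarantee only holds inside one source file, so each file with static regexes would need its own pair of tracers.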
John Maddock wrote:
David Yon wrote:
One more thought... where would be good places to put some good old-fashioned printfs in the regex code to detect when static initializer/destructors are being called? Would be interesting to see if I can catch the compiler in the act of destroying my static regexes *after* it has shut down the regex library code.
Hmm, actually the more I think about it, the fewer places I can think of that actually do anything at shutdown time. The best thing would be to get a decent stack trace and/or a test case.
Ok, sorry for the fire drill, but I'm now almost certain that Boost is the victim here rather than the culprit. As I said earlier, I'm using KDevelop with the AutoMake back-end for doing the actual build. The AutoMake back-end has a linker check-box that lets you statically link. Well, you do get an executable which doesn't have any shared lib dependencies, but for me, that executable is seriously broken.

I apologize for this being somewhat off-topic, but I'm posting the rest of this note anyway in hopes that some poor schmoe in the future gets a more informative Google hit than I did this time around...

So after going over to a Fedora Core 3 box with gcc 3.4, I was still running into problems. Only I was back to my original symptom of dying in c_regex_traits<char>::init() (which is called during static constructor initialization). It was dying on this line:

    #ifdef BOOST_HAS_THREADS
       re_detail::cs_guard g(*re_detail::p_re_lock);
    #endif

Well actually, it was dying during InitializeCriticalSection(), which simply calls pthread_mutex_init(). Except in my case, the top of the backtrace was showing 0x00000000. Initially I attributed this to a side-effect of some wildly bad stack or something, but when you look at the disassembly, indeed the call to pthread_mutex_init() had been resolved to a null address!! (WTF?) Running "nm" on the executable shows that pthread_mutex_init is a "Weak Symbol", which means that it was legal for it not to resolve and therefore it resolved to zero.

I'm not sure what problem the gcc folks were trying to solve with that somewhat oddball concept, or why the KDevelop "-all-static" checkbox causes this to happen. I do know that for C++, "-all-static" must be doing some serious voodoo, since normally it is not at all straightforward to get gcc to cleanly link C++ statically. For a very special flavor of pain, read the following thread on that topic:

http://groups.google.com/group/gnu.gcc.help/browse_frm/thread/bfd688b5998856...

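The weak-symbol behaviour described above can be reproduced in isolation. This is a minimal sketch, assuming GCC (or a compatible compiler) on an ELF platform such as Linux; hypothetical_optional_hook is a made-up name that is deliberately never defined anywhere. A weak undefined reference does not by itself make the linker pull a member out of a static archive, and if nothing else defines the symbol it quietly resolves to address zero, so the only safe way to use such a function is to test its address first.

```cpp
#include <cstdio>

// Hypothetical function: declared here but intentionally never defined.
// The GCC-specific weak attribute lets the link succeed anyway, with the
// symbol's address resolving to zero.
extern "C" int hypothetical_optional_hook(int) __attribute__((weak));

// Returns true only if the linker actually found a definition.
bool hook_is_linked() {
    return &hypothetical_optional_hook != 0;
}

// Calls the hook when it exists, otherwise returns the fallback.
// Calling the hook unconditionally would jump to address 0, which is the
// kind of crash that shows up as "#0 0x00000000 in ?? ()" in a backtrace.
int call_hook_or_default(int value, int fallback) {
    if (hook_is_linked())
        return hypothetical_optional_hook(value);
    return fallback;
}

// Prints which way the link went; handy when comparing build variants.
void report_hook_status() {
    std::printf("hypothetical_optional_hook is %s\n",
                hook_is_linked() ? "linked" : "unresolved (null address)");
}
```

On a real executable, running nm and looking for symbols flagged "w" gives the list of weak references that might have ended up unresolved in this way.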
At any rate, I've determined that there doesn't seem to be anything special about pthreads, because seemingly-inconsequential changes to the build will make the problem move around. I.e., when the problem is that I get a segv during exit(), obviously pthread_mutex_init() had been properly linked in that time.

So I'm working on getting a static link "the hard way" without relying on the broken "-all-static" option. Hopefully that is the path to enlightenment.

One good thing to come out of all this is that I discovered that including the Boost.regex source in my build is not the black magic I thought it would be. Thanks, John, for that suggestion---it will seriously simplify my build scripting.

David Yon
Tactical Software