
There appear to be a few regressions in the Linux test results:

1) Several libraries are listed as failing with intel-linux-8, including config_test, regex, and python. However, these do pass when tested on my own machine with the latest Intel binaries. Could this be a setup issue? For example, if the compiler has recently been patched, you may need to clear out past builds before the tests will start passing again.

2) The test execution monitor is failing with gcc-2.96+STLPort, which is causing quite a few other failures.

3) The test results look a little messed up somehow: the failures listed in http://boost.sourceforge.net/regression-logs/cs-Linux.html don't match those in http://boost.sourceforge.net/regression-logs/cs-Linux/developer/.

Definitely vacation time now...

John.

John Maddock wrote:
2) The test execution monitor is failing with gcc-2.96+STLPort, which is causing quite a few other failures.
These results look stale. It's complaining about boost::std_min, which I pulled out on June 22. I just made a change yesterday that should hopefully get this compiler/library combination moving again. I'll keep an eye on it.

-- Eric Niebler
Boost Consulting
www.boost-consulting.com

John Maddock wrote:
There appear to be a few regressions in the Linux test results:
1) Several libraries are listed as failing with intel-linux-8, including config_test, regex, and python. However, these do pass when tested on my own machine with the latest Intel binaries. Could this be a setup issue? For example, if the compiler has recently been patched, you may need to clear out past builds before the tests will start passing again.
I haven't installed the latest patches yet; I'll look into this. Another possible reason for the differing results could be a different version or a different installation of gcc being used. At least for Boost.Python, a different installation of gcc has caused problems.
2) The test execution monitor is failing with gcc-2.96+STLPort, which is causing quite a few other failures.
+ the std::min problems.
3) The test results look a little messed up somehow: the failures listed in http://boost.sourceforge.net/regression-logs/cs-Linux.html don't match those in http://boost.sourceforge.net/regression-logs/cs-Linux/developer/.
Right :( I noticed this myself and I'm still trying to find out what's happening. Maybe I failed to react to changes to Boost.Build or the test scripts. I'll need some time to sort everything out. So far, I don't even have a clue why the test type is not being printed for many of the tests.

Running the tests takes ages now on my machine, because a few tests consume enormous amounts of memory. Swapping pushes the run time up to hours(!) for some of the tests. The current killer is algorithm/string/replace on intel-8: compiling that program consumes ~800MB of RAM, three times as much as my box has built in, which results in a compilation time of 3 hours for that single test. Having to run such tests effectively reduces my ability to sort out the problems. (I'm currently considering moving the testing for some of the compilers to a different machine. That will take some time, and I dislike the idea anyway, since it will split the Linux tests across several result sets.) My box is surely underequipped, but 800MB for a single test is way too much regardless.

Recently a problem came up with program_options/cmdline_test_dll. Several times my computer crashed, and I hadn't been able to figure out the reason. Today I was lucky enough to see that test eating up all the memory and CPU, so it looks like it ran into an infinite loop. Several times before, this has been an indicator of something going wrong with the signal handling in Boost.Test, and this time Boost.Test again looks like the culprit; strace shows output similar to the other cases:

--- SIGABRT (Aborted) @ 0 (0) ---
rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
kill(21453, SIGABRT) = 0
--- SIGABRT (Aborted) @ 0 (0) ---
rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
kill(21453, SIGABRT) = 0
--- SIGABRT (Aborted) @ 0 (0) ---
rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
kill(21453, SIGABRT) = 0
(and so on)

This type of failure is a showstopper for testing. I suggest disabling the sigaction-based signal handling in Boost.Test, at least for gcc 2 and for como. Perhaps other compilers are also affected.

Regards,
m

Martin Wille wrote:
Recently a problem came up with program_options/cmdline_test_dll. Several times my computer crashed, and I hadn't been able to figure out the reason. Today I was lucky enough to see that test eating up all the memory and CPU, so it looks like it ran into an infinite loop. Several times before, this has been an indicator of something going wrong with the signal handling in Boost.Test, and this time Boost.Test again looks like the culprit; strace shows output similar to the other cases:
--- SIGABRT (Aborted) @ 0 (0) ---
rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
kill(21453, SIGABRT) = 0
Hmm... the test worked OK for me! I'm really interested in figuring out where the SIGABRT comes from; maybe some assert fires.
This type of failure is a showstopper for testing. I suggest disabling the sigaction-based signal handling in Boost.Test, at least for gcc 2 and for como. Perhaps other compilers are also affected.
You mean the problem is only on those two toolsets? Yes, I think disabling signal handling in Boost.Test to see where the test fails would be very desirable. BTW, you mention como, but I don't see that toolset in the Linux regression results.

- Volodya

Vladimir Prus wrote:
Martin Wille wrote:
Recently a problem came up with program_options/cmdline_test_dll. Several times my computer crashed, and I hadn't been able to figure out the reason. Today I was lucky enough to see that test eating up all the memory and CPU, so it looks like it ran into an infinite loop. Several times before, this has been an indicator of something going wrong with the signal handling in Boost.Test, and this time Boost.Test again looks like the culprit; strace shows output similar to the other cases:
--- SIGABRT (Aborted) @ 0 (0) ---
rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
kill(21453, SIGABRT) = 0
Hmm... the test worked OK for me! I'm really interested in figuring out where the SIGABRT comes from; maybe some assert fires.
The short version: usually *some* signal is raised and caught, and siglongjmp()ing to some other location on the call stack confuses exception handling. An exception is thrown, the (confused) exception handling mechanism thinks it is invalid and calls terminate(), which in turn calls abort(). abort() raises SIGABRT, and the signal handler gets invoked again. Longer versions can be found in the mailing list archives; this problem has been reported a few times already. We're deep into UB land, and the signal handling code is known to fail on como. Apparently it also fails on gcc 2 under certain circumstances. I wouldn't be surprised at all if it also failed on other compilers.
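For illustration, here is a minimal, self-contained sketch of the pattern (not Boost.Test's actual code): a sigaction handler siglongjmp()s back onto the stack, and an exception is thrown afterwards. Whether the unwinder still works at that point is anyone's guess; where it doesn't, the throw ends in terminate()/abort(), and abort()'s SIGABRT re-enters the handler, which produces exactly the loop seen in the strace output above:

    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdexcept>

    static sigjmp_buf jump_target;

    extern "C" void on_signal(int sig)
    {
        // Jump out of the handler, abandoning whatever frames were
        // active when the signal hit; the EH runtime may now be confused.
        siglongjmp(jump_target, sig);
    }

    int main()
    {
        struct sigaction sa;
        sa.sa_handler = &on_signal;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        sigaction(SIGABRT, &sa, 0);

        if (sigsetjmp(jump_target, 1) == 0)
        {
            raise(SIGABRT);   // stand-in for a failing test raising a signal
        }
        else
        {
            try
            {
                // This is where the trouble starts: if the siglongjmp
                // confused the EH mechanism, the throw ends in
                // terminate() -> abort() -> SIGABRT -> handler -> ...
                throw std::runtime_error("signal converted to exception");
            }
            catch (std::runtime_error const&)
            {
                puts("unwound successfully on this compiler");
            }
        }
        return 0;
    }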
This type of failure is a showstopper for testing. I suggest disabling the sigaction-based signal handling in Boost.Test, at least for gcc 2 and for como. Perhaps other compilers are also affected.
You mean the problem is only on those two toolsets? Yes, I think disabling signal handling in Boost.Test to see where the test fails would be very desirable. BTW, you mention como, but I don't see that toolset in the Linux regression results.
The problem is known to exist on como; that's actually the major reason como isn't on the list of regression results. The code exploits UB, and I expect it to cause problems with other toolsets, too, some day. However, other than disabling the sigaction/siglongjmp based signal handling, I have no suggestion for fixing it.

Regards,
m
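Disabling it per toolset could be as simple as a compile-time guard along these lines; the macro name below is made up for illustration and is not Boost.Test's actual configuration:

    // Hypothetical guard: __COMO__ and __GNUC__ are the predefined
    // identification macros of Comeau C++ and gcc, respectively.
    #if defined(__COMO__) || (defined(__GNUC__) && __GNUC__ < 3)
    #  define EXECUTION_MONITOR_DISABLE_SIGNAL_HANDLING
    #endif

    #ifndef EXECUTION_MONITOR_DISABLE_SIGNAL_HANDLING
        // ... install the sigaction handlers and call sigsetjmp() here ...
    #endif

A runtime switch checked at startup would work as well, and would let individual testers opt out without rebuilding.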

Martin Wille wrote:
--- SIGABRT (Aborted) @ 0 (0) ---
rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
kill(21453, SIGABRT) = 0
Hmm... the test worked OK for me! I'm really interested in figuring out where the SIGABRT comes from; maybe some assert fires.
The short version: usually *some* signal is raised and caught, and siglongjmp()ing to some other location on the call stack confuses exception handling. An exception is thrown, the (confused) exception handling mechanism thinks it is invalid and calls terminate(), which in turn calls abort(). abort() raises SIGABRT, and the signal handler gets invoked again.
Ah, so the original signal was not necessarily SIGABRT. Still, it would be interesting to know what it was, and why the test decided to fail in the first place.
The problem is known to exist on como; that's actually the major reason como isn't on the list of regression results. The code exploits UB, and I expect it to cause problems with other toolsets, too, some day. However, other than disabling the sigaction/siglongjmp based signal handling, I have no suggestion for fixing it.
I've only now realized that the code tries to throw from a signal handler... well, that really is UB!

- Volodya

Running the tests takes ages now on my machine, because a few tests consume enormous amounts of memory. Swapping pushes the run time up to hours(!) for some of the tests. The current killer is algorithm/string/replace on intel-8: compiling that program consumes ~800MB of RAM, three times as much as my box has built in, which results in a compilation time of 3 hours for that single test. Having to run such tests effectively reduces my ability to sort out the problems. (I'm currently considering moving the testing for some of the compilers to a different machine. That will take some time, and I dislike the idea anyway, since it will split the Linux tests across several result sets.) My box is surely underequipped, but 800MB for a single test is way too much regardless.
I can reproduce that with my 512MB laptop. Interestingly, other compilers don't seem to have an issue with that test, including Intel 8 on Windows (dual-booted on the same machine). I'll get on to the folks at Intel about it; this looks like something they might want to look into.

John.
participants (4):
- Eric Niebler
- John Maddock
- Martin Wille
- Vladimir Prus