Linux regression test runners - What do devs want?
I've been pondering this recently, but a forced upgrade of my Azure VMs last night is pressing the issue, and I wanted to solicit feedback before I go and set things back up.

I've currently been running three testing machines for Linux, with several testers on each (Windows is completely separate, and I'm happy with how that is running). Here was my past strategy:

teeks99-03 - develop - lots of compiler versions (all I could get to run on Ubuntu 12.04: gcc-4.4-4.8, clang-3.0-3.4)
teeks99-04 - develop - only two compilers, with fast turnaround times (gcc-4.8, clang-3.4)
teeks99-05 - master - lots of compiler versions (all I could get to run on Ubuntu 12.04: gcc-4.4-4.8, clang-3.0-3.4)

I can try to set things up like this again, but there are two reasons I was thinking about changing:

1) It seems that a lot of the gcc/clang versions have the same pass/fail results as other versions of the same compiler, so we're not adding a ton (though we may be catching the crucial points where a compiler change breaks something; is this something that devs have seen in practice?)

2) The new version of my runners (Ubuntu 14.04) doesn't easily support the old gcc versions; it would take a lot of effort to get some of the old ones running. Clang is even worse: it is much more difficult to get even two versions running side by side now.

Instead of running lots of different versions, I was thinking about running various compatibility options of the two main compilers. I've seen some other test runners with things like libc++, c++11, c++14, etc. My real question: would developers be better served by these options than by different versions? Would they prefer both, with longer revisit times between each test type? Other thoughts?

Tom
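One way to check point 1 against real data is to look for tests whose pass/fail outcome differs between versions of the same compiler; if that set is consistently empty, the extra versions are adding little. The sketch below assumes the results have already been collected into a nested dict keyed by toolset name and test name. That layout and the sample data are hypothetical, not the format the regression reports actually use.

```python
from typing import Dict, Set

# Hypothetical layout: results[toolset][test_name] is True for pass, False for fail.
Results = Dict[str, Dict[str, bool]]


def version_sensitive_tests(results: Results) -> Set[str]:
    """Return the tests whose outcome is not identical on every toolset that ran them."""
    all_tests: Set[str] = set()
    for per_test in results.values():
        all_tests.update(per_test)

    sensitive: Set[str] = set()
    for test in all_tests:
        # Collect the distinct outcomes, skipping toolsets that did not run this test.
        outcomes = {per_test[test] for per_test in results.values() if test in per_test}
        if len(outcomes) > 1:
            sensitive.add(test)
    return sensitive


if __name__ == "__main__":
    # Made-up sample: only "b" behaves differently across gcc versions.
    sample = {
        "gcc-4.4": {"a": True, "b": False, "c": True},
        "gcc-4.7": {"a": True, "b": True, "c": True},
        "gcc-4.8": {"a": True, "b": True, "c": True},
    }
    print(sorted(version_sensitive_tests(sample)))  # ['b']
```

Running it separately per compiler family (all gcc toolsets, then all clang toolsets) answers the question as asked; a non-empty result marks exactly the points where a compiler change broke something.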
________________________________________
From: Boost [boost-bounces@lists.boost.org] on behalf of Tom Kent [lists@teeks99.com]
Sent: 03 January 2015 16:36
To: Running Boost regression tests; Boost Developers List
Subject: [boost] Linux regression test runners - What do devs want?
I've been pondering this recently, but a forced upgrade of my Azure VMs last night is pressing the issue, and I wanted to solicit feedback before I go and set things back up.
....
Instead of running lots of different versions, I was thinking about running various compatibility options of the two main compilers. I've seen some other test runners with things like libc++, c++11, c++14, etc. My real question: would developers be better served by these options than by different versions? Would they prefer both, with longer revisit times between each test type? Other thoughts?
Tom
_______________________________________________

Tom,

First of all, thank you to you and everyone else who runs tests. As a maintainer (of Boost Phoenix) I need your tests and I am grateful for them.

Let me answer you by explaining my recent experience. I have been attempting to sort out some rather elusive bugs in the Boost Phoenix code (sorry folks, I have not solved all of them yet). In the course of this, several things have become clear from the testing.

1. One bug showed up in recent compilers (gcc 4.9 and later, and clang 3.5 and later) but not in earlier ones. This turned out to be because the more recent compilers have a different policy on function overload resolution.
2. Some bugs showed up when compiling for C++11, but not for the same compiler with C++03.
3. Some bugs show up for older gcc, e.g. 4.4 and 4.5, but not for later compilers.
4. Some bugs show up as running out of memory on some systems, so I have tried to reduce the header range.
5. Some bugs show up on one tester's version of a compiler and not on someone else's. Those are a real puzzle.

Looking at what you do, you have been covering the range of compilers, while other people are doing the C++11 comparison. It is good if between you a range of these things is covered. It is also good if tests run every few days, so that the effectiveness of a fix can be confirmed. This has been a problem over the New Year, as I think some test stations have simply not been running for a while. It is also good to be able to distinguish between problems which are mine to sort out and those caused by a dependency going wrong, as happened today with Boost Move.

BTW, two of your runs on Ubuntu 14.04.arm seem to be failing on everything with this:

../libs/phoenix/test/include/core/actor.cpp:7:13: fatal error: error writing to /tmp/ccaRf02g.s: No space left on device
int main() {}

I hope this helps in your decision making. Please ask me if you want more information.

Happy New Year
John
I don't know if this will help, but I would like to see the test matrix show a separate column of results for each of C++03, C++11, and C++14. For example, if I look at http://www.boost.org/development/tests/develop/teeks99-04a-Ubuntu12-04-64.ht... I can't figure out which version of C++ is being used. In my case, I strive to keep the serialization library backward compatible, so this is especially important to me.

Robert Ramey
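Robert's request amounts to labelling each column with both the toolset and the language standard it was built with. A minimal sketch of generating such labels, assuming a hypothetical "toolset (standard)" naming scheme and illustrative toolset names; mapping each standard to the right compiler flag is deliberately left out, since older compilers spell those flags differently.

```python
from itertools import product
from typing import List

# Illustrative values only; a real runner would use whatever toolsets it has installed.
TOOLSETS = ["gcc-4.8", "clang-3.4"]
STANDARDS = ["C++03", "C++11", "C++14"]


def matrix_columns(toolsets: List[str], standards: List[str]) -> List[str]:
    """One labelled column per (toolset, language standard) pair."""
    return [f"{toolset} ({std})" for toolset, std in product(toolsets, standards)]


if __name__ == "__main__":
    for column in matrix_columns(TOOLSETS, STANDARDS):
        print(column)
    # Six columns: the matrix grows as toolsets x standards.
```

It also makes the cost visible: every standard added multiplies the number of columns, which is the trade-off being weighed in this thread.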
First, I'd like to thank you for running tests for so many
configurations. It is most helpful.
On Sat, Jan 3, 2015 at 7:36 PM, Tom Kent wrote:
I've been pondering this recently, but a forced upgrade of my Azure VMs last night is pressing the issue, and I wanted to solicit feedback before I go and set things back up.
[snip]
I can try to set things up like this again, but there are two reasons I was thinking about changing: 1) It seems that a lot of the gcc/clang versions have the same pass/fail results as other versions of the same compiler, so we're not adding a ton (though we may be catching the crucial points where a compiler change breaks something; is this something that devs have seen in practice?)
It is true that developer errors tend to affect multiple, if not all, compilers. However, this kind of error is also the easiest to discover and fix; often the developer is able to reproduce such a failure locally.

The other kind of problem is compiler bugs. These are naturally specific to compiler versions, sometimes down to the patch version. The developer often does not have all compiler versions installed, so this kind of problem is hard to discover and fix locally. For this reason, having multiple compiler versions in the matrix is an invaluable help to developers, and I would very much like to have it. From personal experience, I do stumble upon compiler-specific problems from time to time.

Another point is testing compiler-specific code (i.e. code that uses compiler-specific features available since some version). Our current infrastructure isn't really suited to that kind of testing, but it is still possible to see that the code compiles and runs, and in some cases to verify that it doesn't generate warnings.

If you want to reduce the number of configs to test, you could look at which compiler versions are used in different Linux distros (e.g. CentOS, Fedora, OpenSUSE, Ubuntu, Debian) in their latest LTS and normal releases. For instance, I know gcc 4.4 was used in Debian Squeeze, but in Wheezy (the latest Debian release at the moment) it's 4.7, so if no one else uses 4.4 we can probably drop it. Make a list of what you'd like to drop and post it here so that people can see which versions specifically are candidates for removal.
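Andrey's suggestion can be turned into a small script: record the default gcc for each distro release you still care about, and any tested version not on that list becomes a candidate for the "drop" list he asks for. This is a sketch; the table holds only Debian Squeeze (4.4) and Wheezy (4.7) from the message above plus Ubuntu 14.04 (4.8), and which releases count as "supported" is an assumption you would have to decide on.

```python
from typing import Dict, Set

# Default gcc per distro release. Squeeze and Wheezy come from the message above;
# Ubuntu 14.04 ships gcc 4.8. Check any further entries against the distros' package lists.
DISTRO_GCC: Dict[str, str] = {
    "Debian Squeeze": "4.4",
    "Debian Wheezy": "4.7",
    "Ubuntu 14.04": "4.8",
}


def drop_candidates(tested: Set[str], supported: Set[str],
                    distro_gcc: Dict[str, str] = DISTRO_GCC) -> Set[str]:
    """gcc versions in the matrix that no still-supported distro release ships by default."""
    still_used = {ver for distro, ver in distro_gcc.items() if distro in supported}
    return tested - still_used


if __name__ == "__main__":
    tested = {"4.4", "4.5", "4.6", "4.7", "4.8"}
    # Hypothetical choice: Squeeze is no longer considered supported.
    print(sorted(drop_candidates(tested, {"Debian Wheezy", "Ubuntu 14.04"})))
    # ['4.4', '4.5', '4.6']
```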
2) The new version of my runners (Ubuntu 14.04) doesn't easily support the old gcc versions; it would take a lot of effort to get some of the old ones running. Clang is even worse: it is much more difficult to get even two versions running side by side now.
Is it possible to install different versions in a chroot environment?
Instead of running lots of different versions, I was thinking about running various compatibility options of the two main compilers. I've seen some other test runners with things like libc++, c++11, c++14, etc. My real question: would developers be better served by these options than by different versions? Would they prefer both, with longer revisit times between each test type? Other thoughts?
Having testbeds for newer language versions would be good, but if I had to choose between more language versions and more compilers, I would choose the latter. It would be OK for me if there were a limited set of fast-turnaround testbeds (e.g. the latest widespread gcc in C++03 and C++11 modes) and the more exotic rest ran less frequently, as long as "less frequent" is reasonable. I'd say any testbed should cycle at least once a week or so; with longer refresh periods it becomes more difficult to fix bugs before the release.
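The "at least once a week" rule is easy to check with back-of-the-envelope arithmetic: if one machine cycles round-robin through its configurations, each one is revisited only after all of them have run, so the revisit time is the sum of the individual run times. The sketch assumes you know (or can estimate) how long each configuration takes; the numbers in the example are made up.

```python
import math
from typing import Dict

HOURS_PER_WEEK = 7 * 24


def revisit_hours(run_hours: Dict[str, float]) -> float:
    """One runner cycling round-robin revisits a config after all configs have run once."""
    return sum(run_hours.values())


def runners_needed(run_hours: Dict[str, float], target_hours: float = HOURS_PER_WEEK) -> int:
    """Lower bound on runners needed so every config cycles within target_hours.

    Assumes the configs can be split evenly across runners; real run times may not pack that well.
    """
    return math.ceil(revisit_hours(run_hours) / target_hours)


if __name__ == "__main__":
    # Made-up numbers: 20 configurations taking 10 hours each.
    configs = {f"config-{i}": 10.0 for i in range(20)}
    print(revisit_hours(configs))   # 200.0
    print(runners_needed(configs))  # 2
```

The same arithmetic shows why a small fast-turnaround set on its own machine helps: its revisit time no longer depends on how many exotic configurations the other machines carry.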
On 3 Jan 2015 at 10:36, Tom Kent wrote:
I can try to set things up like this again, but there are two reasons I was thinking about changing: 1) It seems that a lot of the gcc/clang versions have the same pass/fail results as other versions of the same compiler, so we're not adding a ton (though we may be catching the crucial points where a compiler change breaks something; is this something that devs have seen in practice?)
For my automated testing of parts of Boost, I test builds with multiple compiler versions, but I only run the unit tests with one clang (with the thread and UB sanitisers) and one gcc (right now usually 4.8). The build testing across compilers is very valuable: you'd be amazed how often some random version breaks, e.g. GCC 4.7 only. I've only ever seen unit tests break on a single compiler once, and that was on GCC 4.8.0. Obviously different code bases may experience different things.
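The split described above, building with many compilers but executing the unit tests with only a couple, could be expressed as a simple job plan. A minimal sketch with a hypothetical `Job` record and illustrative toolset names; it does not model the sanitiser flags used on clang, nor how a real runner would actually launch the builds.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Job:
    toolset: str
    run_tests: bool  # False: compile and link only; True: also execute the unit tests


def plan_jobs(toolsets: List[str], reference: Set[str]) -> List[Job]:
    """Build with every toolset, but execute unit tests only on the reference toolsets."""
    return [Job(toolset, toolset in reference) for toolset in toolsets]


if __name__ == "__main__":
    # Hypothetical toolset list; the reference pair mirrors "one clang and one gcc".
    jobs = plan_jobs(["gcc-4.6", "gcc-4.7", "gcc-4.8", "clang-3.5"], {"gcc-4.8", "clang-3.5"})
    for job in jobs:
        print(job.toolset, "build+test" if job.run_tests else "build only")
```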
2) The new version of my runners (Ubuntu 14.04) doesn't easily support the old gcc versions; it would take a lot of effort to get some of the old ones running. Clang is even worse: it is much more difficult to get even two versions running side by side now.
That's because the LLVM maintainer of the Ubuntu repos only added multi-version support from 3.5 onwards or so due to a feature request from me. To get earlier versions of clang which coexist happily on the same Ubuntu 14.04, look into the hATrayflood repo on launchpad. He's hacked clangs right back to 2.9 through to 3.6, and they all coexist. I also have GCCs back to 4.6 all coexisting, I think that's from the Ubuntu toolchain repo on launchpad. I don't need earlier GCCs, so can't say much about those.
Instead of running lots of different versions, I was thinking about running various compatibility options of the two main compilers. I've seen some other test runners with things like libc++, c++11, c++14, etc. My real question: would developers be better served by these options than by different versions? Would they prefer both, with longer revisit times between each test type? Other thoughts?
For Boost.Thread you can see the build dashboard at https://github.com/boostorg/thread/. As you'll see, we test C++03, C++11 and C++14, as we have been surprised by random breakage.

Instead of testing libc++ on Linux, I'd suggest a FreeBSD 10 runner, which is also surprisingly good at catching problems on OS X. I've found you can simply use the FreeBSD 10 default compiler, though note that the auto-detection logic in bootstrap.sh is broken on FreeBSD and has been for some time (force the toolset to clang and it'll work).

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
I have been running a RHEL6 tester since my team has been unfortunately cursed with it. Interestingly, some tests fail on it even though they pass with the gcc 4.4 on the teeks99 runners. My question about older compilers would be: are they actually being used on Ubuntu, or should we be testing a different distro?

- Thomas
participants (6)
- Andrey Semashev
- Fletcher, John P
- Niall Douglas
- Robert Ramey
- Suckow, Thomas J
- Tom Kent