Re: [boost] [Boost.Test] New testing procedure?

Dave Abrahams wrote:
Clearly, testing the test (meta-testing?) is a special category. It needs to be staged and tested itself before being used to test other stuff. I believe Boost testing is going to be an issue in the near future due to the fact that testing time is getting longer and longer. I believe we will have to move to an "on demand" model for most testing while reserving "total coverage" testing for just prior to release. Although this wouldn't definitively address the issue raised, it would help, as only those libraries currently being tested would suffer due to dependencies on other code. It would also save lots of time and keep test time from becoming an issue. Robert Ramey

"Robert Ramey" <ramey@rrsd.com> writes:
I don't. You can get test results for any library on any compiler that's being tested daily within 24 hours. Some compilers are tested every 12 hours (see meta-comm). I don't see why that should be insufficient.
Although this wouldn't definitively address the issue raised, it would help.
I don't see how. The test library breaks and that breaks all the other libraries. How will it help if tests are run less often?
As only those libraries currently being tested would suffer due to dependencies on other code.
The libraries still suffer; the tests just stop telling us so. IMO sticking our heads in the sand is not a good approach to testing.
It would also save lots of time and keep test time from becoming an issue.
It would also prevent problems from being seen. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

On Fri, 21 May 2004 20:41:59 -0400, David Abrahams wrote
It's kind of spotty outside of the meta-comm guys:

    IBM AIX          11 days
    Mac OS           today
    SGI Irix         2 weeks
    Linux            4 days
    Sun Solaris      6 days
    Win32            4 weeks
    win32_metacomm   today

And that's today. Consider that during the next couple of months 3-4 new libraries are pending to be added. Serialization tests alone dramatically increase the length of time to run the regression if we always run the full test. What will happen in a year when we have, say, 10 new libraries? Robert and I believe something will need to be done. We've tried to start a discussion, but no one responded:

http://lists.boost.org/MailArchives/boost/msg64471.php
http://lists.boost.org/MailArchives/boost/msg64491.php

Jeff

BTW, I might be able to contribute to the Linux testing -- are there instructions on how to set this up somewhere?

Jeff Garland writes:
IMO the only thing it indicates is that these tests are initiated manually.
Consider that during the next couple of months 3-4 new libraries are pending to be added.
Not a problem, in general. Right now a *full rebuild* takes about 8 hours. If we switch to an incremental model, we have plenty of reserve here.
Serialization tests alone dramatically increase the length of time to run the regression if we always run the full test.
Well, the dramatic cases need to be dealt with, and IMO a Jamfile that allows the library author to manage the level of "stressfulness" would be just enough.
What will happen in a year when we have say 10 new libraries?
Well, hopefully we'll also have more computing power. Surely a lot of organizations which use Boost libraries can afford to spare a mid-range machine for automatic testing?
For *nix systems, there is a shell script that is pretty much self-explanatory: http://cvs.sourceforge.net/viewcvs.py/boost/boost/tools/regression/run_tests... If you want something that requires even less maintenance, we can provide you with the Python-based regression system we use here at Meta. -- Aleksey Gurtovoy MetaCommunications Engineering

On Sat, 22 May 2004 15:04:19 -0500, Aleksey Gurtovoy wrote
Really. I find it hard to believe all the *nix guys don't have a cron job setup. But maybe what you are saying is the rest of the system isn't 100%...
I assume that's all the compilers? Anyway, I remember others (Beman) have previously expressed concern about the length of the test cycle.
'Something' will need to be done with or for serialization. The current test is very lengthy to perform. So I suppose Robert can cut the test down, but that means the possibility of missing some portability issue. So I can see why we want the capability to run that full-up torture test -- just not every day.
Perhaps. From my view things seem pretty thin already. There was some discussion during the last release that some testers had removed the python tests because they were taking too long. BTW, just to pile on, wouldn't it be nice if we had testing of the sandbox libraries as well? This would really help those new libraries get ported sooner rather than later...
I agree more distribution of testing would be another way to improve things -- at least for Windows and Linux. But the reason I'm advocating the ability to split into basic versus torture tests and the various dll/static options is that we don't have five contributors to run an SGI test. If the test takes 5 to 6 hours to run for a single compiler, we might lose the one contributor we have. Wouldn't it be better to have a basic test that is faster to run than no tests at all?
Thanks, I'll take a look.
If you want something that requires even less maintenance, we can provide you with the Python-based regression system we use here at Meta.
Well, I'm going to want something almost totally hands-off or it just won't happen. I don't have time to babysit stuff. So I guess I'd like to see both. For a while I'm likely to set up only a single compiler (gcc 3.3.1) on my Mandrake 9 machine. With that approach I should be able to cycle more frequently. Incremental testing is probably a good thing to try out as well. Jeff
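(For what it's worth, a hands-off setup could probably be as simple as a cron entry that fires the regression script every night. A rough sketch -- the paths and the start time below are placeholder assumptions, not something from this thread:

    # hypothetical crontab entry: start the Boost regression run at 02:00 every night,
    # appending all output to a log so failures of the run itself stay visible
    0 2 * * *  /home/jeff/boost/tools/regression/run_tests.sh >> /home/jeff/boost-regression.log 2>&1

Whatever the script needs to know -- checkout location, compiler, where to publish results -- presumably still has to be set inside run_tests.sh itself before handing it to cron.)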

Jeff Garland writes:
If they had set it up, the above list would look different.
But maybe what you are saying is the rest of the system isn't 100%...
I'm saying that the fact that tests on some platforms haven't been run for a while means exactly that -- they haven't been run for a while, no more, no less. There might be a number of reasons why that's the case for each particular platform, but by itself it doesn't indicate that a (supposedly) long run cycle has anything to do with it.
Yep, nine of them.
Anyway, I remember others (Beman) have previously expressed concern about the length of the test cycle.
It is a problem if you are running them on your "primary" machine during the day. I don't think we can do much about it -- just compiling the tests takes about half of the whole cycle's time, and personally I see little value in regression runs that didn't at least compile every test. On the other hand, an incremental cycle, if it involves just a couple of libraries, can be made pretty fast. bjam needs some tweaking, though, to skip the libraries that were marked up as unusable.
Sure, I was just saying that the library author can deal with it on their own -- just make several sections in the bjam file and enable/disable them depending on your current needs.
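For example, a rough sketch of what that could look like from the regression runner's side (the TORTURE_TESTS variable name is made up for illustration; the library's test Jamfile would have to wrap its heavy test rules in a section guarded by that variable for the switch to have any effect):

    # everyday run from the library's test directory: only the basic section is built
    bjam -sTOOLS=gcc

    # weekend/pre-release run: the tester opts into the heavy section explicitly
    bjam -sTOOLS=gcc -sTORTURE_TESTS=1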
If we provide a documented way to setup the whole thing, and post "A Call for Regression Runners", I am sure we'll get some response.
Well, you are right that right now the resources are a little sparse, but IMO that's just because we haven't worked on it.
IMO that's asking too much. Many of them never get submitted.
"Basic" (supposedly what we have now) versus "drastic" (supposedly what's coming with serialization) distinction definitely makes sense. I am not arguing against this one, rather against lowering down our current standards.
If the test takes 5 to 6 hours to run a single compiler we might lose the one contributor we have.
True, if they are forced to run the drastic test, which IMO shouldn't be the case -- it should be entirely up to the regression runner to decide when and if they have the resources to do that.
Wouldn't be better to have a basic test that would be faster to run than no tests at all?
Sure, and it should be up to them to decide that.
OK, we'll make it available.
It produces less reliable results, but the roots of that need to be tracked down and fixed, so yes, it would be good to start looking into it. -- Aleksey Gurtovoy MetaCommunications Engineering

On Sat, 22 May 2004 17:18:04 -0500, Aleksey Gurtovoy wrote
Well, I think there is. Compiling and running the exact same test for date_time in both the dll and static-link versions adds little value, and that is exactly the sort of duplication that could be cut to reduce the compile and run times for 'primary machine' testers. I suppose I could start customizing my Jamfile to only run the multi-threaded dll variant on Windows, but then I'm deciding what the regression testers can afford, and it stops you (Meta-Comm) from running all the variations -- which I still want to see.
Sure, but really I'm proposing we turn that around. If the regression tester has the hardware resources to run a torture test with 3 different linking variations, then they should be able to do that. As soon as Robert enables the 2.5-hour torture test, regression testers might suddenly have an objection to 'author only' control.
You could be right.
Many are extensions to existing libraries under active development -- likely to get moved to final CVS. I think it would be nice for libraries coming up for review to get the benefit of the regression system. Clearly we might need to subset what gets run and clean out the old stuff, but I think this would smooth the integration of new libraries.
I don't want to lower the current standard either. With the Basic option, however, some current libraries might define a smaller test suite, speeding up the core tests. Of course, if it is impossible to subset, then fine, they could stay where they are now. Those regression sites that have the horsepower to run the torture test with all variations can still go for that option. Of course we would prefer that, but some might choose to run the torture test once per week (say, over a weekend) and the regular tests during the week.
Well, as soon as Robert wants to run the torture test, he's going to get it at all sites if he controls it via his Jamfile. So we need some Boost-wide option to define these variations. Hopefully my other email clarifies the idea.
Ok will do... Jeff

Jeff Garland writes:
You have a point here.
I agree it would be nice, but that's basically it. If we happen to have the resources to do that, good; if we don't, well, we don't.
I think I'm becoming comfortable with the idea. It does complicate things, however. For instance, now the library author would have to look at twice as many reports to determine the state of her library (superseding the old "torture" results with the new "basic" ones would make the whole thing useless).
It does, although I don't see how we can manage/afford all the combinations. -- Aleksey Gurtovoy MetaCommunications Engineering

On Sat, 22 May 2004 22:21:29 -0500, Aleksey Gurtovoy wrote
I agree, it's a nice-to-have. I was just trying to make the point that there is more that could be done...
Yeah, I agree that's an issue, although I expect the basic results to normally be a subset of the torture results. So they could be overlaid, but then some of the results would be out of date. Probably better would be to have totally different pages for the basic and torture results, which of course makes it harder to find the results. On the other hand, we might be wringing our hands about problems that aren't important. Maybe Meta-Comm is always going to run the torture test in incremental mode while someone else just runs basic tests normally. Then before a release we could ask the testers that can to notch up to the complete test.
I agree. The combinatorics aren't good, and I even forgot to include the debug/release dimension. But this is where additional testers could help. One group could run with release builds while another runs debug builds. The current state of affairs is that there isn't a standard set of flags to control the debug/release and linking options, so if the library author wants static linking he basically has to define static linking in the Jamfile. What I'm thinking is that control of the linking and debug/release options shouldn't have to be specified this way. The same set of tests should be runnable by the regression tester and hence associated with the column in the results table. So something like:

            VC7.1     VC7.1     VC7.1    etc
            release   release   debug
            dll       static    dll
    test1   Pass      fail      Pass
    test2   Pass      Pass      Pass
    ...

Then the basic/complete option controls the number of tests run.

Jeff

"Jeff Garland" <jeff@crystalclearsoftware.com> writes:
Tests aren't generally designed to be run in release mode. In release mode, assertions are turned off.
There certainly is, for debug/release; you stick it in the BUILD variable.
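For illustration, something like the following on the command line (the toolset name is an arbitrary example, and the quoting of the multi-valued BUILD setting follows the usual Boost.Build v1 convention, as far as I recall):

    # debug only -- effectively what the regression runs do today
    bjam -sTOOLS=vc7.1 -sBUILD=debug

    # both variants in one pass
    bjam -sTOOLS=vc7.1 "-sBUILD=debug release"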
Why do you think that? I think your whole view of the static/dynamic issue is naive. Static and dynamic objects can't always be built the same way, so that may have to be part of the Jamfile. And static/dynamic linking isn't an all-or-nothing proposition. Every library test involves at least two libraries (the one being tested and the runtime), sometimes more. There's no inherent reason some couldn't be statically linked and others dynamically linked. Furthermore, I don't see a big advantage in having a separate command-line option to choose which of those linking modes is used. If the library needs to be tested in several variants, then so be it. If it doesn't, but you'd like to see more variants sometimes, you can put some of the variants into the --torture option.
That's outta hand, IMO. If it's worth having options, let's keep them simple. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

On Sun, 23 May 2004 02:38:43 -0400, David Abrahams wrote
Whatever. Problems that assert finds show up in release builds anyway; they are just harder to track down. And since there are issues that only appear in release, and customers typically get release code, I tend to spend most of my test effort getting burn-time in release mode.
That's true, that one is ok. For some reason it never hit me that all the current regression tests are run in debug mode -- unless I override it in the Jamfile ;-)
Perhaps.
Static and dynamic objects can't always be built the same way, so that may have to be part of the Jamfile.
Yes, those rules are typically provided by the library/build/Jamfile. Why would I need special rules in the test/Jamfile other than to specify my dependency on the dynamic or static library?
That's irrelevant. I only care about the library under test.
Furthermore, I don't see a big advantage in having a separate command-line option to choose which of those linking modes is used.
Ok, we disagree on this.
Ok. By the way, I like your suggestion to call it --complete or perhaps better --exhaustive.
Well, I think it correctly factors the dimensions of compilation options versus tests. But I think your earlier email provides an example of how something similar can be achieved by just using 'if' statements to control the rules for various linkage options, so I'm fine with baby steps. Jeff

"Jeff Garland" <jeff@crystalclearsoftware.com> writes:
I didn't say you would. Is that what you mean when you say "_the_ Jamfile"?
OK, that makes it a little better specified. But why do you think that's a more relevant test than one that varies how the runtime is linked?
I didn't suggest that.
There are many, many more dimensions. You could select different optimization levels, for example. You could test with inlining on/off. You could test with RTTI on/off. Maybe there's an argument for the idea that complete testing is run against each of the library configurations that is installed by the top level build process, and no other ones... -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

On Sun, 23 May 2004 14:51:49 -0400, David Abrahams wrote
Yes, I was talking about the test/Jamfile.
Because I assume that the runtime libraries are already tested and stable and the focus is on the various incarnations of the library under test. However, I do concede your point. To be exhaustive, linking different runtimes is required to test all the interactions. Which, of course, increases the number of options yet again...
Sorry for the incorrect attribution -- too much email.
Now that's outta hand ;-) I agree that there is an almost infinite potential set of options. I believe the set I'm suggesting hits a broad cross-section of needs, but I'd be happy to see others step forward with different test variations if they have a need.
That sounds like a reasonable approach to me. Jeff

"Aleksey Gurtovoy" <agurtovoy@meta-comm.com> writes:
A really slick system would integrate weekly and daily results into one table, but as you said, it gets complicated somewhere. In this case, it's in the results processing. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

For *nix systems, there is a shell script that is pretty much self-explanatory:
http://cvs.sourceforge.net/viewcvs.py/boost/boost/tools/regression/run_tests...
A couple of things:

1) Are all the *nix regression testers checking out anonymously, and is there still a 24/48-hour delay on SourceForge? Or have they modified the script?

2) The script's step 6 finishes with generating the HTML table. There must be a step 7 to upload the results. Is there guidance on this?

Thx, Jeff

Jeff Garland wrote:
Linux 4 days
This is due to me being offline for some time (as announced when a possible release date was being discussed). I'll be able to resume daily testing in July. In the meantime, at least weekly results are available. Regards, m

Robert Ramey writes:
I don't agree. As a developer, I want to see the breakage as early as possible, and a "no continuous testing" model would prevent me from that. The last thing I want is to deal with accumulated failures when I wasn't expecting it. IMO the answer to a long testing cycle is a) incremental cycles, with a full rebuild once a week or something similar (here at Meta, for instance, we are currently doing a full rebuild on every cycle); b) distributed testing of libraries, with subsequent merging of the results into a single report. -- Aleksey Gurtovoy MetaCommunications Engineering
participants (5)
- Aleksey Gurtovoy
- David Abrahams
- Jeff Garland
- Martin Wille
- Robert Ramey