Re: [boost] [Boost.Test] New testing procedure?

Dave Abrahams wrote:
maybe we need to move to a different model wherein the test library's own tests are run on a CVS branch of the code (?) so that Gennadiy can see and deal with his problems before they are merged into the main trunk and break everything else?
Clearly, testing the test library (meta-testing?) is a special category. It needs to be staged and tested itself before being used to test other stuff. I believe Boost testing is going to be an issue in the near future due to the fact that testing time is getting longer and longer and longer. I believe we will have to move to an "on demand" model for most testing while reserving "total coverage" testing for just prior to release. Although this wouldn't definitively address the issue raised, it would help, as only those libraries currently being tested would suffer due to dependencies on other code. It would also save lots of time and keep test time from becoming an issue. Robert Ramey

"Robert Ramey" <ramey@rrsd.com> writes:
Dave Abrahams wrote:
maybe we need to move to a different model wherein the test library's own tests are run on a CVS branch of the code (?) so that Gennadiy can see and deal with his problems before they are merged into the main trunk and break everything else?
Clearly, testing the test library (meta-testing?) is a special category. It needs to be staged and tested itself before being used to test other stuff.
I believe Boost testing is going to be an issue in the near future due to the fact that testing time is getting longer and longer and longer.
I believe we will have to move to an "on demand" model for most testing while reserving "total coverage" testing for just prior to release.
I don't. You can get test results for any library on any compiler that's being tested daily within 24 hours. Some compilers are tested every 12 hours (see meta-comm). I don't see why that should be insufficient.
Although this wouldn't definitively address the issue raised, it would help.
I don't see how. The test library breaks and that breaks all the other libraries. How will it help if tests are run less often?
As only those libraries currently being tested would suffer due to dependencies on other code.
The libraries still suffer; the tests just stop telling us so. IMO sticking our heads in the sand is not a good approach to testing.
It would also save lots of time and keep test time from becoming an issue.
It would also prevent problems from being seen. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

On Fri, 21 May 2004 20:41:59 -0400, David Abrahams wrote
"Robert Ramey" <ramey@rrsd.com> writes:
Dave Abrahams wrote: I believe we will have to move to an "on demand" model for most testing while reserving "total coverage" testing for just prior to release.
I don't. You can get test results for any library on any compiler that's being tested daily within 24 hours. Some compilers are tested every 12 hours (see meta-comm). I don't see why that should be insufficient.
It's kind of spotty outside of the meta-comm guys: IBM AIX 11 days, Mac OS today, SGI Irix 2 weeks, Linux 4 days, Sun Solaris 6 days, Win32 4 weeks, win32_metacomm today. And that's today. Consider that during the next couple of months 3-4 new libraries are due to be added. Serialization tests alone dramatically increase the length of time to run the regression if we always run the full test. What will happen in a year when we have say 10 new libraries? Robert and I believe something will need to be done. We've tried to start a discussion, but no one responded:
http://lists.boost.org/MailArchives/boost/msg64471.php
http://lists.boost.org/MailArchives/boost/msg64491.php
Jeff
BTW I might be able to contribute to the Linux testing -- are there instructions on how to set this up somewhere?

Jeff Garland writes:
On Fri, 21 May 2004 20:41:59 -0400, David Abrahams wrote
"Robert Ramey" <ramey@rrsd.com> writes:
Dave Abrahams wrote: I believe we will have to move to an "on demand" model for most testing while reserving "total coverage" testing for just prior to release.
I don't. You can get test results for any library on any compiler that's being tested daily within 24 hours. Some compilers are tested every 12 hours (see meta-comm). I don't see why that should be insufficient.
It's kind of spotty outside of the meta-comm guys: IBM AIX 11 days, Mac OS today, SGI Irix 2 weeks, Linux 4 days, Sun Solaris 6 days, Win32 4 weeks, win32_metacomm today
And that's today.
IMO the only thing it indicates is that these tests are initiated manually.
Consider that during the next couple of months 3-4 new libraries are due to be added.
Not a problem, in general. Right now a *full rebuild* takes about 8 hours. If we switch to an incremental model, we have plenty of reserve here.
Serialization tests alone dramatically increase the length of time to run the regression if we always run the full test.
Well, the dramatic cases need to be dealt with, and IMO a Jamfile that allows the library author to manage the level of "stressfulness" would be just enough.
What will happen in a year when we have say 10 new libraries?
Well, hopefully we'll also have more computing power. Surely a lot of organizations which use Boost libraries can afford to spare a middle-class machine for automatic testing?
Robert and I believe something will need to be done. We've tried to start a discussion, but no one responded:
http://lists.boost.org/MailArchives/boost/msg64471.php http://lists.boost.org/MailArchives/boost/msg64491.php
Jeff
BTW I might be able to contribute to the Linux testing -- are there instructions on how to set this up somewhere?
For *nix systems, there is a shell script that is pretty much self-explanatory: http://cvs.sourceforge.net/viewcvs.py/boost/boost/tools/regression/run_tests... If you want something that requires even less maintenance, we can provide you with the Python-based regression system we use here at Meta. -- Aleksey Gurtovoy MetaCommunications Engineering

On Sat, 22 May 2004 15:04:19 -0500, Aleksey Gurtovoy wrote
Jeff Garland writes:
It's kind of spotty outside of the meta-comm guys: IBM AIX 11 days, Mac OS today, SGI Irix 2 weeks, Linux 4 days, Sun Solaris 6 days, Win32 4 weeks, win32_metacomm today
And that's today.
IMO the only thing it indicates is that these tests are initiated manually.
Really. I find it hard to believe all the *nix guys don't have a cron job set up. But maybe what you are saying is that the rest of the system isn't 100%...
Consider that during the next couple of months 3-4 new libraries are due to be added.
Not a problem, in general. Right now a *full rebuild* takes about 8 hours. If we switch to an incremental model, we have plenty of reserve here.
I assume that's all the compilers? Anyway, I remember others (Beman) have previously expressed concern about the length of the test cycle.
Serialization tests alone dramatically increase the length of time to run the regression if we always run the full test.
Well, the dramatic cases need to be dealt with, and IMO a Jamfile that allows the library author to manage the level of "stressfulness" would be just enough.
'Something' will need to be done with or for serialization. The current test is very lengthy to perform. So I suppose Robert can cut the test down, but that means the possibility of missing some portability issue. So I can see why we want the capability to run that full-up torture test -- just not every day.
What will happen in a year when we have say 10 new libraries?
Well, hopefully we'll also have more computing power. Surely a lot of organizations which use Boost libraries can afford to spare a middle-class machine for automatic testing?
Perhaps. From my view things seem pretty thin already. There was some discussion during the last release that some testers had removed the python tests because they were taking too long. BTW, just to pile on, wouldn't it be nice if we had testing of the sandbox libraries as well? This would really help those new libraries get ported sooner rather than later...
...from the other mail... Aleksey wrote: b) distributed testing of libraries, with the subsequent merging of results into a single report.
I agree more distribution of testing would be another way to improve things -- at least for Windows and Linux. But the reason I'm advocating the ability to split into basic versus torture tests and the various dll/static options is that we don't have five contributors to run an SGI test. If the test takes 5 to 6 hours to run on a single compiler we might lose the one contributor we have. Wouldn't it be better to have a basic test that would be faster to run than no tests at all?
BTW I might be able to contribute to the Linux testing -- are there instructions on how to set this up somewhere?
For *nix systems, there is a shell script that is pretty much self-explanatory:
http://cvs.sourceforge.net/viewcvs.py/boost/boost/tools/regression/run_tests...
Thanks, I'll take a look.
If you want something that requires even less maintenance, we can provide you with the Python-based regression system we use here at Meta.
Well, I'm going to want something almost totally hands-off or it just won't happen. I don't have time to babysit stuff. So I guess I'd like to see both. For a while I'm likely to set up only a single compiler (gcc 3.3.1) on my Mandrake 9 machine. With that approach I should be able to cycle more frequently. Incremental testing is probably a good thing to try out as well. Jeff

Jeff Garland writes:
On Sat, 22 May 2004 15:04:19 -0500, Aleksey Gurtovoy wrote
Jeff Garland writes:
It's kind of spotty outside of the meta-comm guys: IBM AIX 11 days, Mac OS today, SGI Irix 2 weeks, Linux 4 days, Sun Solaris 6 days, Win32 4 weeks, win32_metacomm today
And that's today.
IMO the only thing it indicates is that these tests are initiated manually.
Really. I find it hard to believe all the *nix guys don't have a cron job set up.
If they had set it up, the above list would look different.
But maybe what you are saying is that the rest of the system isn't 100%...
I'm saying that the fact that tests on some platforms haven't been run for a while means exactly that -- they haven't been run for a while, no more, no less. There might be a number of reasons why that's the case for any particular platform, but by itself it doesn't indicate that a (supposedly) long run cycle has anything to do with it.
Consider that during the next couple of months 3-4 new libraries are due to be added.
Not a problem, in general. Right now a *full rebuild* takes about 8 hours. If we switch to an incremental model, we have plenty of reserve here.
I assume that's all the compilers?
Yep, nine of them.
Anyway, I remember others (Beman) have previously expressed concern about the length of the test cycle.
It is a problem if you are running them on your "primary" machine during the day. I don't think we can do much about it -- just compiling the tests takes about half of the whole cycle's time, and personally I see little value in regression runs that didn't at least compile every test. On the other hand, an incremental cycle, if it involves just a couple of libraries, can be made pretty fast. Bjam needs some tweaking, though, to skip the libraries that were marked up as unusable.
Serialization tests alone dramatically increase the length of time to run the regression if we always run the full test.
Well, the dramatic cases need to be dealt with, and IMO a Jamfile that allows the library author to manage the level of "stressfulness" would be just enough.
'Something' will need to be done with or for serialization. The current test is very lengthy to perform. So I suppose Robert can cut the test down, but that means the possibility of missing some portability issue. So I can see why we want the capability to run that full-up torture test -- just not every day.
Sure, I was just saying that the library author can deal with it on their own -- just make several sections in the bjam file and enable/disable them depending on your current needs.
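For illustration only, here is a rough sketch of what such a sectioned Jamfile could look like under Boost.Build v1. The TORTURE variable, the test names, and the library path are made-up placeholders, not taken from the actual serialization Jamfile, and the exact rule signatures may differ:

    # quick tests -- always part of the regression run
    test-suite "serialization"
        : [ run test_simple_class.cpp <lib>../build/boost_serialization ]
        ;

    # long-running "torture" cases -- compiled and run only when the
    # hypothetical TORTURE switch is set, e.g. with  bjam -sTORTURE=1 test
    if $(TORTURE)
    {
        test-suite "serialization_torture"
            : [ run test_huge_archive.cpp <lib>../build/boost_serialization ]
            ;
    }

Enabling or disabling the expensive section is then a one-variable decision rather than an edit to the test code itself.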
What will happen in a year when we have say 10 new libraries?
Well, hopefully we'll also have more computing power. Surely a lot of organizations which use Boost libraries can afford to spare a middle-class machine for automatic testing?
Perhaps. From my view things seem pretty thin already.
If we provide a documented way to set up the whole thing, and post "A Call for Regression Runners", I am sure we'll get some response.
There was some discussion during the last release that some testers had removed the python tests because they were taking too long.
Well, you are right that right now the resources are a little scarce, but IMO it's just because we haven't worked on it.
BTW, just to pile on, wouldn't it be nice if we had testing of the sandbox libraries as well? This would really help those new libraries get ported sooner rather than later...
IMO that's asking too much. Many of them never get submitted.
...from the other mail... Aleksey wrote: b) distributed testing of libraries, with the subsequent merging of results into a single report.
I agree more distribution of testing would be another way to improve things -- at least for Windows and Linux. But the reason I'm advocating the ability to split into basic versus torture tests and the various dll/static options is that we don't have five contributors to run an SGI test.
"Basic" (supposedly what we have now) versus "drastic" (supposedly what's coming with serialization) distinction definitely makes sense. I am not arguing against this one, rather against lowering our current standards.
If the test takes 5 to 6 hours to run on a single compiler we might lose the one contributor we have.
True, if they are forced to run the drastic test, which IMO shouldn't be the case -- it should be entirely up to the regression runner to decide when and if they have the resources to do that.
Wouldn't it be better to have a basic test that would be faster to run than no tests at all?
Sure, and it should be up to them to decide that.
BTW I might be able to contribute to the Linux testing -- are there instructions on how to set this up somewhere?
For *nix systems, there is a shell script that is pretty much self-explanatory:
http://cvs.sourceforge.net/viewcvs.py/boost/boost/tools/regression/run_tests...
Thanks, I'll take a look.
If you want something that requires even less maintenance, we can provide you with the Python-based regression system we use here at Meta.
Well, I'm going to want something almost totally hands-off or it just won't happen. I don't have time to babysit stuff. So I guess I'd like to see both.
OK, we'll make it available.
For a while I'm likely to set up only a single compiler (gcc 3.3.1) on my Mandrake 9 machine. With that approach I should be able to cycle more frequently. Incremental testing is probably a good thing to try out as well.
It produces less reliable results, but the roots of that need to be tracked down and fixed, so yes, it would be good to start looking into it. -- Aleksey Gurtovoy MetaCommunications Engineering

On Sat, 22 May 2004 17:18:04 -0500, Aleksey Gurtovoy wrote
Anyway, I remember others (Beman) have previously expressed concern about the length of the test cycle.
It is a problem if you are running them on your "primary" machine during the day. I don't think we can do much about it -- just compiling the tests takes about half of the whole cycle's time, and personally I see little value in regression runs that didn't at least compile every test.
Well, I think there is. Compiling and running the exact same test for date_time in both the dll and static link versions is exactly the sort of duplication that could be cut to reduce the compile and run times for 'primary machine' testers. I suppose I could start customizing my Jamfile to only run multi-threaded dll on Windows, but then I'm deciding what the regression testers can afford, and it stops you (Meta-Comm) from running all the variations -- which I still want to see.
Jeff wrote: Serialization tests alone dramatically increase the length of time to run the regression if we always run the full test. ...snip... Sure, I was just saying that the library author can deal with it on their own -- just make several sections in the bjam file and enable/disable them depending on your current needs.
Sure, but really I'm proposing we turn that around. If the regression tester has the hardware resources to run a torture test with 3 different linking variations then they should be able to do that. As soon as Robert enables the 2.5 hour torture test regression testers might suddenly have an objection to 'author only' control.
Perhaps. From my view things seem pretty thin already.
If we provide a documented way to set up the whole thing, and post "A Call for Regression Runners", I am sure we'll get some response.
There was some discussion during the last release that some testers had removed the python tests because they were taking too long.
Well, you are right that right now the resources are a little scarce, but IMO it's just because we haven't worked on it.
You could be right.
BTW, just to pile on, wouldn't it be nice if we had testing of the sandbox libraries as well? This would really help those new libraries get ported sooner rather than later...
IMO that's asking too much. Many of them never get submitted.
Many are extensions to existing libraries under active development -- likely to get moved to final CVS. I think it would be nice for libraries coming up for review to get the benefit of the regression system. Clearly we might need to subset what gets run and clean out the old stuff, but I think this would smooth the integration of new libraries.
"Basic" (supposedly what we have now) versus "drastic" (supposedly what's coming with serialization) distinction definitely makes sense. I am not arguing against this one, rather against lowering down our current standards.
I don't want to lower the current standard either. With the Basic option, however, some current libraries might define a smaller test suite, speeding up the core tests. Of course, if it is impossible to subset, then fine, they could stay where they are now. Those regression sites that have the horsepower to run the torture test with all variations can still go for that option. Of course we would prefer that, but some might choose to run the torture test once per week (say over a weekend) and the regular tests during the week.
If the test takes 5 to 6 hours to run on a single compiler we might lose the one contributor we have.
True, if they are forced to run the drastic test, which IMO shouldn't be the case -- it should be entirely up to the regression runner to decide when and if they have the resources to do that.
Well as soon as Robert wants to run the torture test he's going to get it at all sites if he controls it via his Jamfile. So we need some boost-wide option to define these variations. Hopefully my other email clarifies the idea.
For awhile I'm likely to setup only a single compiler (gcc 3.3.1) on my Mandrake 9 machine. With that approach I should be able to cycle more frequently. Incremental testing is probably a good thing to try out as well.
It produces less reliable results, but the roots of that need to be tracked down and fixed, so yes, it would be good to start looking into it.
Ok will do... Jeff

Jeff Garland writes:
Sure, I was just saying that the library author can deal with it on its own -- just make several sections in the bjam file and enable/disable them depending on your current needs.
Sure, but really I'm proposing we turn that around. If the regression tester has the hardware resources to run a torture test with 3 different linking variations then they should be able to do that. As soon as Robert enables the 2.5 hour torture test regression testers might suddenly have an objection to 'author only' control.
You have a point, here.
BTW, just to pile on, wouldn't it be nice if we had testing of the sandbox libraries as well? This would really help those new libraries get ported sooner rather than later...
IMO that's asking too much. Many of them never get submitted.
Many are extensions to existing libraries under active development -- likely to get moved to final CVS. I think it would be nice for libraries coming up for review to get the benefit of the regression system. Clearly we might need to subset what gets run and clean out the old stuff, but I think this would smooth the integration of new libraries.
I agree it would be nice, but that's basically it. If we happen to have the resources to do that, good; if we don't, well, we don't.
"Basic" (supposedly what we have now) versus "drastic" (supposedly what's coming with serialization) distinction definitely makes sense. I am not arguing against this one, rather against lowering down our current standards.
I don't want to lower the current standard either. With the Basic option, however, some current libraries might define a smaller test suite, speeding up the core tests. Of course, if it is impossible to subset, then fine, they could stay where they are now. Those regression sites that have the horsepower to run the torture test with all variations can still go for that option. Of course we would prefer that, but some might choose to run the torture test once per week (say over a weekend) and the regular tests during the week.
I think I'm becoming comfortable with the idea. It does complicate things, however. For instance, now the library author would have to look into twice as many reports to determine the state of her library (superseding the old "torture" results with the new "basic" ones would make the whole thing useless).
If the test takes 5 to 6 hours to run on a single compiler we might lose the one contributor we have.
True, if they are forced to run the drastic test, which IMO shouldn't be the case -- it should be entirely up to the regression runner to decide when and if they have the resources to do that.
Well as soon as Robert wants to run the torture test he's going to get it at all sites if he controls it via his Jamfile. So we need some boost-wide option to define these variations. Hopefully my other email clarifies the idea.
It does, although I don't see how we can manage/afford all the combinations. -- Aleksey Gurtovoy MetaCommunications Engineering

On Sat, 22 May 2004 22:21:29 -0500, Aleksey Gurtovoy wrote
wouldn't it be nice if we had testing of the sandbox libraries as well?
I agree it would be nice, but that's basically it. If we happen to have the resources to do that, good; if we don't, well, we don't.
I agree, it's a nice-to-have. I was just trying to make the point that there is more that could be done...
I think I'm becoming comfortable with the idea. It does complicate things, however. For instance, now the library author would have to look into twice as many reports to determine the state of her library (superseding the old "torture" results with the new "basic" ones would make the whole thing useless).
Yeah, I agree that's an issue. Although I expect the basic results to normally be a subset of the torture results. So they could be overlaid, but then some of the results are out of date. Probably better would be to have totally different pages for the basic and torture results. Which of course makes it harder to find the results. On the other hand, we might be wringing our hands about problems that aren't important. Maybe Meta-Comm is always going to run the torture test in incremental mode while someone else just runs basic tests normally. Then before a release we could ask the testers that can to step up to the complete test.
Well as soon as Robert wants to run the torture test he's going to get it at all sites if he controls it via his Jamfile. So we need some boost-wide option to define these variations. Hopefully my other email clarifies the idea.
It does, although I don't see how we can manage/afford all the combinations.
I agree. The combinatorics aren't good, and I even forgot to include the debug/release dimension. But this is where additional testers could help. One group could run with release builds while another runs debug builds.

So the current state of affairs is that there isn't a standard set of flags to control the debug/release and linking options. So if the library author wants static linking he basically has to define static linking in the Jamfile. What I'm thinking is that control of the linking and debug/release options shouldn't have to be specified this way. The same set of tests should be runnable by the regression tester and hence associated with the column in the results table. So something like:

            VC7.1     VC7.1     VC7.1    etc.
            release   release   debug
            dll       static    dll
  test1     Pass      Fail      Pass
  test2     Pass      Pass      Pass
  ...

Then a basic/complete option controls the number of tests run.

Jeff

"Jeff Garland" <jeff@crystalclearsoftware.com> writes:
Well as soon as Robert wants to run the torture test he's going to get it at all sites if he controls it via his Jamfile. So we need some boost-wide option to define these variations. Hopefully my other email clarifies the idea.
It does, although I don't see how we can manage/afford all the combinations.
I agree. The combinatorics aren't good, and I even forgot to include the debug/release dimension.
Tests aren't generally designed to be run in release mode. In release mode, assertions are turned off.
But this is where additional testers could help. One group could run with release builds while another runs debug builds.
So the current state of affairs is that there isn't a standard set of flags to control the debug/release and linking options.
There certainly is, for debug/release; you stick it in the BUILD variable.
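(An illustrative invocation only, not a prescription -- the exact quoting depends on your shell and on which toolsets you have configured:

    bjam -sTOOLS=gcc "-sBUILD=debug release" test

would build and run the test targets in both variants without touching any Jamfile.)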
So if the library author wants static linking he basically has to define static linking in the Jamfile. What I'm thinking is that control of the linking and debug/release options shouldn't have to be specified this way.
Why do you think that? I think your whole view of the static/dynamic issue is naive. Static and dynamic objects can't always be built the same way, so that may have to be part of the Jamfile. And static/dynamic linking isn't an all-or-nothing proposition. Every library test involves at least two libraries (the one being tested and the runtime), sometimes more. There's no inherent reason some couldn't be statically linked and others dynamically linked. Furthermore, I don't see a big advantage in having a separate command-line option to choose which of those linking modes is used. If the library needs to be tested in several variants, then so be it. If it doesn't, but you'd like to see more variants sometimes, you can put some of the variants into the --torture option.
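As a hypothetical fragment of how that might look in a library's test Jamfile under Boost.Build v1 -- all names here (the TORTURE switch, boost_mylib, the DYN_LINK define) are placeholders for the example, not real Boost targets, and the rule signatures are approximate:

    # always-run variant
    test-suite "mylib"
        : [ run my_test.cpp <lib>../build/boost_mylib
            : : : <runtime-link>static ]
        ;

    # extra linking variant, registered only for the "torture"/complete run
    if $(TORTURE)
    {
        test-suite "mylib_dll"
            : [ run my_test.cpp <dll>../build/boost_mylib
                : : : <runtime-link>dynamic <define>BOOST_MYLIB_DYN_LINK=1
                : my_test_dll ]
            ;
    }

Whether that extra variant is worth the build time is then the regression runner's call rather than something baked into every run.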
The same set of tests should be runnable by the regression tester and hence associated with the column in the results table. So something like:
            VC7.1     VC7.1     VC7.1    etc.
            release   release   debug
            dll       static    dll
  test1     Pass      Fail      Pass
  test2     Pass      Pass      Pass
  ...
Then basic/complete option controls the number of tests run.
That's outta hand, IMO. If it's worth having options, let's keep them simple. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

On Sun, 23 May 2004 02:38:43 -0400, David Abrahams wrote
"Jeff Garland" <jeff@crystalclearsoftware.com> writes:
I agree. The combinatorics aren't good, and I even forgot to include the debug/release dimension.
Tests aren't generally designed to be run in release mode. In release mode, assertions are turned off.
Whatever. Problems that assert finds show up in release builds anyway, they are just harder to track down. And since there are issues that only appear in release, and customers typically get release code, I tend to spend most of my test effort getting burn-time in release mode.
So the current state of affairs is that there isn't a standard set of flags to control the debug/release and linking options.
There certainly is, for debug/release; you stick it in the BUILD variable.
That's true, that one is OK. For some reason it never hit me that all the current regression tests are run in debug mode -- unless I override it in the Jamfile ;-)
So if the library author wants static linking he basically has to define static linking in the Jamfile. What I'm thinking is that control of the linking and debug/release options shouldn't have to be specified this way.
Why do you think that? I think your whole view of the static/dynamic issue is naive.
Perhaps.
Static and dynamic objects can't always be built the same way, so that may have to be part of the Jamfile.
Yes, those rules are typically provided by the library/build/Jamfile. Why would I need special rules in the test/Jamfile other than to specify my dependency on the dynamic or static library?
And static/dynamic linking isn't an all-or-nothing proposition. Every library test involves at least two libraries (the one being tested and the runtime), sometimes more. There's no inherent reason some couldn't be statically linked and others dynamically linked.
That's irrelevant. I only care about the library under test.
Furthermore, I don't see a big advantage in having a separate command-line option to choose which of those linking modes is used.
Ok, we disagree on this.
If the library needs to be tested in several variants, then so be it. If it doesn't, but you'd like to see more variants sometimes, you can put some of the variants into the --torture option.
Ok. By the way, I like your suggestion to call it --complete or perhaps better --exhaustive.
The same set of tests should be runnable by the regression tester and hence associated with the column in the results table. So something like:
            VC7.1     VC7.1     VC7.1    etc.
            release   release   debug
            dll       static    dll
  test1     Pass      Fail      Pass
  test2     Pass      Pass      Pass
  ...
Then basic/complete option controls the number of tests run.
That's outta hand, IMO. If it's worth having options, let's keep them simple.
Well, I think it correctly factors the dimensions of compilation options versus tests. But I think your earlier email provides an example of how something similar can be achieved by just using 'if' statements to control the rules for various linkage options, so I'm fine with baby steps. Jeff

"Jeff Garland" <jeff@crystalclearsoftware.com> writes:
Static and dynamic objects can't always be built the same way, so that may have to be part of the Jamfile.
Yes, those rules are typically provided by the library/build/Jamfile. Why would I need special rules in the test/Jamfile
I didn't say you would. Is that what you mean when you say "_the_ Jamfile"?
other than to specify my dependency on the dynamic or static library?
And static/dynamic linking isn't an all-or-nothing proposition. Every library test involves at least two libraries (the one being tested and the runtime), sometimes more. There's no inherent reason some couldn't be statically linked and others dynamically linked.
That's irrelevant. I only care about the library under test.
OK, that makes it a little better specified. But why do you think that's a more relevant test than one that varies how the runtime is linked?
Furthermore, I don't see a big advantage in having a separate command-line option to choose which of those linking modes is used.
Ok, we disagree on this.
If the library needs to be tested in several variants, then so be it. If it doesn't, but you'd like to see more variants sometimes, you can put some of the variants into the --torture option.
Ok. By the way, I like your suggestion to call it --complete or perhaps better --exhaustive.
I didn't suggest that.
The same set of tests should be runnable by the regression tester and hence associated with the column in the results table. So something like:
            VC7.1     VC7.1     VC7.1    etc.
            release   release   debug
            dll       static    dll
  test1     Pass      Fail      Pass
  test2     Pass      Pass      Pass
  ...
Then basic/complete option controls the number of tests run.
That's outta hand, IMO. If it's worth having options, let's keep them simple.
Well, I think it correctly factors the dimensions of compilation options versus tests.
There are many, many more dimensions. You could select different optimization levels, for example. You could test with inlining on/off. You could test with RTTI on/off. Maybe there's an argument for the idea that complete testing is run against each of the library configurations that is installed by the top level build process, and no other ones... -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

On Sun, 23 May 2004 14:51:49 -0400, David Abrahams wrote
"Jeff Garland" <jeff@crystalclearsoftware.com> writes:
Yes, those rules are typically provided by the library/build/Jamfile. Why would I need special rules in the test/Jamfile
I didn't say you would. Is that what you mean when you say "_the_ Jamfile"?
Yes, I was talking about the test/Jamfile.
That's irrelevant. I only care about the library under test.
OK, that makes it a little better specified. But why do you think that's a more relevant test than one that varies how the runtime is linked?
Because I assume that the runtime libraries are already tested and stable and the focus is on the various incarnations of the library under test. However, I do concede your point. To be exhaustive, linking different runtimes is required to test all the interactions. Which, of course, increases the number of options yet again...
Ok. By the way, I like your suggestion to call it --complete or perhaps better --exhaustive.
I didn't suggest that.
Sorry for the incorrect attribution -- too much email.
The same set of tests should be runnable by the regression tester and hence associated with the column in the results table. So something like:
            VC7.1     VC7.1     VC7.1    etc.
            release   release   debug
            dll       static    dll
  test1     Pass      Fail      Pass
  test2     Pass      Pass      Pass
  ...
Then basic/complete option controls the number of tests run.
That's outta hand, IMO. If it's worth having options, let's keep them simple.
Well, I think it correctly factors the dimensions of compilation options versus tests.
There are many, many more dimensions. You could select different optimization levels, for example. You could test with inlining on/off. You could test with RTTI on/off.
Now that's outta hand ;-) I agree that there is an almost infinite potential set of options. I believe the set I'm suggesting hits a broad cross-section of needs, but I'd be happy to see others step forward with different test variations if they have a need.
Maybe there's an argument for the idea that complete testing is run against each of the library configurations that is installed by the top level build process, and no other ones...
That sounds like a reasonable approach to me. Jeff

"Aleksey Gurtovoy" <agurtovoy@meta-comm.com> writes:
I think I'm becoming comfortable with the idea. It does complicate things, however. For instance, now the library author would have to look into twice as many reports to determine the state of her library (superseding the old "torture" results with the new "basic" ones would make the whole thing useless).
A really slick system would integrate weekly and daily results into one table, but as you said, it gets complicated somewhere. In this case, it's in the results processing. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com

For *nix systems, there is a shell script that is pretty much self-explanatory:
http://cvs.sourceforge.net/viewcvs.py/boost/boost/tools/regression/run_tests...
A couple of things:
1) Are all the *nix regression testers checking out anonymously, and is there still a 24/48 hour delay on SourceForge? Or have they modified the script?
2) The script has step 6, which finishes with generating the HTML table. There must be a step 7 to upload the results. Is there guidance on this?
Thx, Jeff

On Sat, 22 May 2004 14:14:04 -0700, Jeff Garland wrote
For *nix systems, there is a shell script that is pretty much self-explanatory:
http://cvs.sourceforge.net/viewcvs.py/boost/boost/tools/regression/run_tests...
A couple of things: 1) Are all the *nix regression testers checking out anonymously, and is there still a 24/48 hour delay on SourceForge? Or have they modified the script?
Replying to self... Never mind, I think I see how the checkout is handled. They just check out a tree by hand for the first run...
2) The script has step 6, which finishes with generating the HTML table. There must be a step 7 to upload the results. Is there guidance on this?
Thx,
Jeff

Jeff Garland wrote:
Linux 4 days
This is due to me being offline for some time (as announced when a possible release date was being discussed). I'll be able to resume daily testing in July. In the meantime, at least weekly results are available. Regards, m

Robert Ramey writes:
Dave Abrahams wrote:
maybe we need to move to a different model wherein the test library's own tests are run on a CVS branch of the code (?) so that Gennadiy can see and deal with his problems before they are merged into the main trunk and break everything else?
Clearly, testing the test library (meta-testing?) is a special category. It needs to be staged and tested itself before being used to test other stuff.
I believe Boost testing is going to be an issue in the near future due to the fact that testing time is getting longer and longer and longer.
I believe we will have to move to an "on demand" model for most testing while reserving "total coverage" testing for just prior to release.
I don't agree. As a developer, I want to see the breakage as early as possible, and a "no continuous testing" model would prevent that. The last thing I want is to deal with accumulated failures when I'm not expecting them. IMO the answer to a long testing cycle is a) incremental cycles, with a full rebuild once a week or something similar (here at Meta, for instance, we are currently doing a full rebuild on every cycle); b) distributed testing of libraries, with the subsequent merging of results into a single report. -- Aleksey Gurtovoy MetaCommunications Engineering
participants (5)
-
Aleksey Gurtovoy
-
David Abrahams
-
Jeff Garland
-
Martin Wille
-
Robert Ramey