
Basically you're correct on all of this. Rene Rivera wrote:
After having run two cycles of tests for Boost with the mingw-3_4_2-stlport-5_0 configuration, having it take more than 14 hours on a 2.2GHz + 1GB machine, most of that in the Boost.Serialization library[*], and after reading some of the recent discussion about the desire to expand testing to include cross-version compatibility and cross-compiler compatibility, and hence having the number of tests multiply, possibly exponentially, I am seriously concerned that we are going in the wrong direction when it comes to structuring tests.
This was the basis of my suggestion that we run a complete set only very occasionally.
From looking at the tests for serialization, I think we are over-testing, and we are at the point of exhausting our testing resources. Currently this library takes the approach of carpet-bombing the testing space. The current tests follow this overall structure:
[feature tests] x [archive types] x [char/wchar] x [DLL/not-DLL]
Obviously this will never scale.
Carpet bombing the test space? - I like the imagery. When I started, this was not a problem. I was happy to beat it to death, as I could (and still do) just run the whole suite on my machine overnight whenever I make a change. However, I agree that we're about at the limit without making some changes.
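To make the scaling concrete (the counts here are assumed purely for illustration, not taken from the actual suite): with, say, 50 feature tests, 3 archive types, 2 character widths, and 2 linkage modes, the full cross product is 50 x 3 x 2 x 2 = 600 test executables, and every additional binary axis doubles that again.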
My first observation is that those axes don't look like independent features to me. That is, for example, the char/wchar functionality doesn't depend on the feature being tested, or at least it shouldn't, and I can't imagine the library is structured internally in that way. To me it doesn't make sense to test "array" saving with each of the 3 archive types, since the code for serialization of the "array" is the same in all situations. Hence it would make more sense to me to structure the tests as follows (see the sketch after the list):
[feature tests] x [xml archive type] x [char] x [non-DLL]
[text archive tests] x [char] x [non-DLL]
[binary archive tests] x [non-DLL]
[wchar tests] x [non-DLL]
[DLL tests]
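To illustrate why the archive axis needn't be crossed with every feature: a type's serialize function is written once, templated on the archive type, so the very same code is instantiated whichever archive it is paired with. A minimal sketch, with a hypothetical type that is not from the library's actual test suite:

    #include <boost/archive/text_oarchive.hpp>
    #include <boost/archive/xml_oarchive.hpp>
    #include <boost/serialization/nvp.hpp>
    #include <sstream>

    // hypothetical user type - the serialize template below is the only
    // serialization code, and it is identical for every archive type
    struct point {
        int x, y;
        template<class Archive>
        void serialize(Archive & ar, const unsigned int /*version*/) {
            ar & BOOST_SERIALIZATION_NVP(x);
            ar & BOOST_SERIALIZATION_NVP(y);
        }
    };

    int main() {
        const point p = { 1, 2 };
        std::ostringstream text, xml;
        boost::archive::text_oarchive ta(text);
        ta << BOOST_SERIALIZATION_NVP(p);   // same serialize() instantiated...
        boost::archive::xml_oarchive xa(xml);
        xa << BOOST_SERIALIZATION_NVP(p);   // ...for each archive type
        return 0;
    }

Under the same illustrative counts as above, this structure needs the 50 feature tests plus a handful of targeted tests per remaining axis - on the order of 60-70 executables instead of 600, since the axes now add rather than multiply.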
Basically it's structured to test specific aspects of the library, not to test each aspect against every other aspect. Some benefits as I see them:
This makes a lot of sense - except that in the past some aspects have turned out to be accidentally connected. Also, compiler quirks sometimes show up in only some combinations.
* Reduced number of tests means faster turnaround on testing.
* It's much easier to add tests for other aspects, as one only has to concentrate on a few tests instead of many likely unrelated aspects.
* The tests can be expanded to test the aspects more critically. For example, the DLL tests can be very specific as to what aspect of DLL vs. non-DLL they test.
Note that the DLL version should function identically to the static library version - so this is an exhaustive test of that fact.
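A sketch of the kind of round-trip test this implies (again hypothetical, not the library's actual test code): the same source is built twice, once against the static library and once against the DLL, and any divergence in behavior fails the assertion:

    #include <boost/archive/text_oarchive.hpp>
    #include <boost/archive/text_iarchive.hpp>
    #include <sstream>
    #include <cassert>

    // hypothetical test type
    struct point {
        int x, y;
        template<class Archive>
        void serialize(Archive & ar, const unsigned int /*version*/) {
            ar & x;
            ar & y;
        }
    };

    int main() {
        std::stringstream ss;
        {
            const point p = { 1, 2 };
            boost::archive::text_oarchive oa(ss);
            oa << p;    // save with whichever build variant we linked against
        }
        point q = { 0, 0 };
        {
            boost::archive::text_iarchive ia(ss);
            ia >> q;    // load it back
        }
        assert(q.x == 1 && q.y == 2);   // static and DLL builds must agree
        return 0;
    }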
* It is easier to tell what parts of the library are breaking when the tests are specific.
Hmm - that sort of presumes we know what's going to fail ahead of time.

There is another related issue. It seems that the tests are run every night even though no changes have been made to the serialization library at all. In effect, we're using the serialization library to test other changes in Boost. The argument you make above can just as well be used to argue that serialization is on a different dimension than other libraries, so serialization tests shouldn't be re-run just because some other library changes.

So there are a number of things that might be looked into:

a) Reduce the combinations of the serialization tests.
b) Don't use libraries to test other libraries. That is, don't re-test one library (e.g. serialization) just because some other library that it depends upon (e.g. mpl) has changed.
c) Define two separate test Jamfiles: i) normal mode, ii) carpet-bombing mode.
d) Maybe normal mode can be altered on a frequent basis when I just want to test a new feature, or even just one test.
e) Include, as part of the installation instructions, an exhaustive test mode. That is, a user who downloads and installs the package would have the option of producing the whole test results on his own platform and sending in his results. This would have a couple of advantages: i) it would ensure that all new platforms are tested, and ii) it would ensure that the user has everything installed correctly.

Robert Ramey