
Basically you're correct on all of this. Rene Rivera wrote:
After having run two cycles of tests for Boost with the mingw-3_4_2-stlport-5_0 configuration, having it take more than 14 hours on a 2.2GHz + 1GB machine, most of that in the Boost.Serialization library[*], and after reading some of the recent discussion about the desire to expand testing to include cross-version compatibility and cross-compiler compatibility, and hence having the number of tests multiply, possibly exponentially, I am seriously concerned that we are going in the wrong direction when it comes to structuring tests.
This was the basis of my suggestion that we run a complete set only very occasionally.
From looking at the tests for serialization, I think we are over-testing, and we are at the point of exhausting our testing resources. Currently this library takes the approach of carpet-bombing the testing space. The current tests follow this overall structure:
[feature tests] x [archive types] x [char/wchar] x [DLL/not-DLL]
Obviously this will never scale.
Carpet bombing the test space? - I like the imagery. When I started, this was not a problem. I was happy to beat it to death, as I could (and still do) just run the whole suite on my machine overnight whenever I make a change. However, I agree that we're about at the limit without making some changes.
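To make the scaling concrete (the counts here are assumed purely for illustration, not taken from the actual suite): with, say, 50 feature tests, 3 archive types, 2 character widths, and 2 linkage modes, the full cross product is 50 x 3 x 2 x 2 = 600 test executables, and every additional binary axis doubles that again.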
My first observation is that those axes don't look like independent features to me. That is, for example, the char/wchar functionality doesn't depend on the feature being tested, or at least it shouldn't, and I can't imagine the library is structured internally in that way. To me it doesn't make sense to test "array" saving with each of the 3 archive types, since the code for serialization of the "array" is the same in all situations. Hence it would make more sense to me to structure the tests as follows (see the sketch after the list):
[feature tests] x [xml archive type] x [char] x [non-DLL]
[text archive tests] x [char] x [non-DLL]
[binary archive tests] x [non-DLL]
[wchar tests] x [non-DLL]
[DLL tests]
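To illustrate why the archive axis needn't be crossed with every feature: a type's serialize function is written once, templated on the archive type, so the very same code is instantiated whichever archive it is paired with. A minimal sketch, with a hypothetical type that is not from the library's actual test suite:

    #include <boost/archive/text_oarchive.hpp>
    #include <boost/archive/xml_oarchive.hpp>
    #include <boost/serialization/nvp.hpp>
    #include <sstream>

    // hypothetical user type - the serialize template below is the only
    // serialization code, and it is identical for every archive type
    struct point {
        int x, y;
        template<class Archive>
        void serialize(Archive & ar, const unsigned int /*version*/) {
            ar & BOOST_SERIALIZATION_NVP(x);
            ar & BOOST_SERIALIZATION_NVP(y);
        }
    };

    int main() {
        const point p = { 1, 2 };
        std::ostringstream text, xml;
        boost::archive::text_oarchive ta(text);
        ta << BOOST_SERIALIZATION_NVP(p);   // same serialize() instantiated...
        boost::archive::xml_oarchive xa(xml);
        xa << BOOST_SERIALIZATION_NVP(p);   // ...for each archive type
        return 0;
    }

Under the same illustrative counts as above, this structure needs the 50 feature tests plus a handful of targeted tests per remaining axis - on the order of 60-70 executables instead of 600, since the axes now add rather than multiply.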
Basically it's structured to test specific aspects of the library, not to test each aspect against every other aspect. Some benefits as I see them:
This makes a lot of sense - except that in the past some aspects have turned out to be accidentally connected. Also, compiler quirks sometimes show up in only some combinations.
* Reduced number of tests means faster turnaround on testing.
* It's much easier to add tests for other aspects, as one only has to concentrate on a few tests instead of many likely unrelated aspects.
* The tests can be expanded to test the aspects more critically. For example, the DLL tests can be very specific as to what aspect of DLL vs. non-DLL they test.
Note that the DLL version should function identically to the static library version - so this is an exhaustive test of that fact.
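A sketch of the kind of round-trip test this implies (again hypothetical, not the library's actual test code): the same source is built twice, once against the static library and once against the DLL, and any divergence in behavior fails the assertion:

    #include <boost/archive/text_oarchive.hpp>
    #include <boost/archive/text_iarchive.hpp>
    #include <sstream>
    #include <cassert>

    // hypothetical test type
    struct point {
        int x, y;
        template<class Archive>
        void serialize(Archive & ar, const unsigned int /*version*/) {
            ar & x;
            ar & y;
        }
    };

    int main() {
        std::stringstream ss;
        {
            const point p = { 1, 2 };
            boost::archive::text_oarchive oa(ss);
            oa << p;    // save with whichever build variant we linked against
        }
        point q = { 0, 0 };
        {
            boost::archive::text_iarchive ia(ss);
            ia >> q;    // load it back
        }
        assert(q.x == 1 && q.y == 2);   // static and DLL builds must agree
        return 0;
    }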
* It is easier to tell what parts of the library are breaking when the tests are specific.
Hmm - that sort of presumes we know what's going to fail ahead of time.

There is another related issue. It seems that the tests are run every night even though no changes have been made to the serialization library at all. In effect, we're using the serialization library to test other changes in Boost. The argument you make above can just as well be used to argue that serialization is on a different dimension than other libraries, so serialization tests shouldn't be re-run just because some other library changes.

So there are a number of things that might be looked into:

a) Reduce the combinations of the serialization tests.
b) Don't use libraries to test other libraries. That is, don't re-test one library (e.g. serialization) just because some other library that it depends upon (e.g. mpl) has changed.
c) Define two separate test Jamfiles: i) normal mode, ii) carpet-bombing mode.
d) Maybe normal mode can be altered on a frequent basis when I just want to test a new feature, or even just one test.
e) Include, as part of the installation instructions, an exhaustive test mode. That is, a user who downloads and installs the package would have the option of producing the whole test results on his own platform and sending in his results. This would have a couple of advantages: i) it would ensure that all new platforms are tested, and ii) it would ensure that the user has everything installed correctly.

Robert Ramey