
There are a great many things that could (and should) be discussed with respect to the boost infrastructure, as well as the development process. This is about testing, though, so I'd like to restrict my arguments to that as much as possible.

I hear (and share) various complaints about the existing testing procedure:

* Test runs require a lot of resources.
* Test runs take a lot of time.
* There is no clear (visual) association between test results and code revisions.
* There are (sometimes) multiple test runs for the same platform, so the absolute number of failures has no meaning (worse, two test runs for the same platform may produce differing outcomes, because some environment variables differ and are not accounted for).

Now let me contrast that with a utopian boost testing harness with the following characteristics:

* The boost repository stores the code, as well as a description of the platforms and configurations the code should be tested on.
* The overall space of tests is chunked by some local harness into small-scale test suites, accessible for volunteers to run.
* Contributors subscribe by providing some well-controlled environment in which such test suites can run. The whole thing works somewhat like seti@home (say), i.e. users merely install some 'slave' that then contacts the master to retrieve individual tasks, sending back results as they become ready.
* The master harness then collects the results, generates reports, and otherwise postprocesses the incoming data. For example, individual slaves may be associated with some confidence ('trust') in the validity of their results (after all, there is always that last bit of uncontrolled environment potentially affecting test runs...).

What does it take to get there? I think there are different paths to pursue, more or less independently.

1) The test run procedure should be made more and more autonomous, requiring less hand-holding by the user. The fewer parameters users have to set, the less error-prone (or at least, the less subject to interpretation) the results become. This also implies a much enhanced facility to report platform characteristics from the user's platform as part of the test run results. (In fact, these data should be reported up front, as they determine which part of the mosaic the slave will actually execute.)

2) The smaller tasks, as well as the more convenient handling, should increase parallelism, leading to a shorter turn-around. That, together with better annotation, should allow the report generator to associate test results with code versions more accurately, helping developers understand which changeset a regression relates to.

I think a good tool to use for 1) is buildbot (http://buildbot.net/trac). It allows us to formalize the build process. The only remaining unknown is the environment seen by the buildslaves when they are started. However, a) all environment variables are reported, and b) we can encapsulate the slave startup further to control the environment variables seen by the build process. (Two sketches of what this could look like are appended below.)

As far as the size of tasks (test suites) is concerned, this question is related to the discussion concerning modularity. Individual test runs should at most run a single toolchain on a single library, and may be even smaller (a single build variant, say). Keeping modularity at that level also allows us to parametrize test sub-suites. For example, the boost.python test suite may need to be tested against different python versions, while boost.mpi needs to be tested against different MPI backends / versions, etc.
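
To make that a bit more concrete, here is a minimal sketch of what a buildbot master configuration could look like for one such small task. This is written against the current buildbot API; the worker name and password, the repository URL, and the bootstrap/b2 invocations are illustrative placeholders, not an actual Boost setup.

    # master.cfg -- minimal buildbot master sketch (current buildbot API).
    # Worker name/password, repository URL, and the test commands are
    # illustrative assumptions, not an existing Boost configuration.
    from buildbot.plugins import changes, schedulers, steps, util, worker

    c = BuildmasterConfig = {}

    # One volunteer-provided environment; its properties travel with
    # every build result.
    c['workers'] = [worker.Worker("gcc-linux-x86_64", "secret")]
    c['protocols'] = {'pb': {'port': 9989}}

    # Poll the repository so every result is tied to a specific revision.
    repo = 'https://github.com/boostorg/boost.git'  # assumed repository URL
    c['change_source'] = [changes.GitPoller(repo, branches=['master'])]

    # One small task: a single library with a single toolchain.
    f = util.BuildFactory()
    f.addStep(steps.Git(repourl=repo, mode='incremental', submodules=True))
    f.addStep(steps.ShellCommand(name='bootstrap',
                                 command=['./bootstrap.sh'],
                                 workdir='build'))
    f.addStep(steps.ShellCommand(
        name='test boost.python (gcc)',
        command=['./b2', 'toolset=gcc', 'libs/python/test'],  # illustrative
        workdir='build'))

    c['builders'] = [util.BuilderConfig(name='boost.python-gcc',
                                        workernames=['gcc-linux-x86_64'],
                                        factory=f)]

    c['schedulers'] = [schedulers.SingleBranchScheduler(
        name='on-commit', treeStableTimer=60,
        change_filter=util.ChangeFilter(branch='master'),
        builderNames=['boost.python-gcc'])]

And here is a sketch of the 'report platform characteristics upfront' part of 1): a slave collects a structured description of its environment and sends it to the master before requesting work. The set of properties and the transport are open design questions; this only illustrates the idea.

    # report_env.py -- sketch of the upfront platform/environment report.
    # The chosen properties and the whitelisted environment variables are
    # illustrative; the real list would be part of the harness design.
    import json
    import os
    import platform
    import subprocess

    def compiler_version(cmd='g++'):
        """Ask the (assumed) default compiler for its version string."""
        try:
            out = subprocess.run([cmd, '--version'], capture_output=True,
                                 text=True, check=True)
            return out.stdout.splitlines()[0]
        except (OSError, subprocess.CalledProcessError):
            return None

    def describe_environment():
        """Collect the data the master needs to decide which part of the
        mosaic this slave should execute."""
        return {
            'os': platform.system(),
            'os_release': platform.release(),
            'arch': platform.machine(),
            'python': platform.python_version(),
            'compiler': compiler_version(),
            # Report only a whitelisted subset of the environment, so that
            # results stay interpretable and comparable across slaves.
            'env': {k: os.environ[k]
                    for k in ('PATH', 'CC', 'CXX', 'LD_LIBRARY_PATH')
                    if k in os.environ},
        }

    if __name__ == '__main__':
        print(json.dumps(describe_environment(), indent=2))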
Regards,
Stefan

--
...ich hab' noch einen Koffer in Berlin...