On 08/10/15 19:46, Bjørn Roald wrote:
On 04 Oct 2015, at 14:49, Raffi Enficiaud wrote:
On 04/10/15 13:38, John Maddock wrote:
On 04/10/2015 12:09, Bjorn Reese wrote:
As many others have said, Boost.Test is "special" in that the majority of Boost's tests depend on it. Even breakages in develop are extremely painful in that they effectively halt progress for any Boost library which uses Test for testing.
This sort of problem has been discussed before on this list without any real progress. I think a solution is needed that gives boost tool maintainers (boost.test is also a tool) services similar to those that library maintainers enjoy. A solution may also provide better test services for all boost developers and possibly other projects. An idea for a possible way forward, providing a test_request service at boost.org/test_request, is outlined below.
I think the problems are simple:
- the "develop" branch is currently a soup,
- the regression dashboard should be improved.
I will detail those two points below.
I would like thoughts on how useful or feasible such a service would be. These are some questions I would like to have answered:
- Will library maintainers use a boost.org/test_request service?
- How valuable would it be, compared to merging to develop and waiting for the current test reports?
- How much of a challenge would it be to get test runners (new and old) on board?
As far as I can see, some libraries have testing alternatives. Some are building on Travis. Yesterday, I created a build plan on my local Atlassian Bamboo instance, running the tests of all branches of boost.test against develop, on several platforms. Obviously, "several" platforms/compilers (5) is not on the same scale as the current regression dashboard, but it is a good start. What I need now is a way to publish this information in a public place, because my Bamboo CI is on an internal network.
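For reference, such a build plan boils down to a handful of steps per runner. Below is a minimal sketch of what one job script could look like, assuming the superproject is checked out on develop and only boost.test is switched to the branch under test; the paths, the <branch-under-test> placeholder and the job layout are illustrative, not the exact Bamboo configuration I use:

    # Illustrative job script for one runner -- not the actual Bamboo plan.
    set -e

    # Superproject on develop, with all submodules.
    git clone --branch develop --recursive https://github.com/boostorg/boost.git boost
    cd boost

    # Switch only the library under test to the branch being built.
    # <branch-under-test> is a placeholder injected by the CI plan.
    git -C libs/test fetch origin
    git -C libs/test checkout <branch-under-test>

    # Build b2, generate the forwarding headers, then run the Boost.Test suite.
    ./bootstrap.sh          # bootstrap.bat on the Windows runners
    ./b2 headers
    cd libs/test/test
    ../../../b2 -j4

The Bamboo plan simply runs this kind of script on each runner that matches the job's requirements and collects the results centrally.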
- How feasible is it to set up a service as outlined below based on modification of the current system for regression testing in boost?
I think that reusing or building upon the current system would be hard and limiting.
- What alternatives exist providing the same kind of, or better, value to the community, hopefully with less effort? E.g.: can Jenkins or other such test dashboards / frameworks easily be configured to provide the flexibility and features needed here?
I think that what you propose is already well covered by existing tools in the industry. For instance, having a look at Atlassian Bamboo might be a good start:
- it is **free for open source projects**
- it compiles/tests **one** specific version across many runners, so we have a clear status on one version. The dashboard currently shows many different versions.
- builds can be triggered manually or on events: e.g. a change on core libraries, a change on one specific library, a schedule (nightly)
- it is trivial to set up, and we can have many different targets (continuous, stable, release candidate, etc.). It has an extensive way of expressing a build as small jobs (which can be just scripts).
- it understands git and submodules: one version is checked out on the central server and dispatched to all runners. Runners can fully cache the git repository locally to reduce traffic and update time.
- it provides metrics on the tests/compilations: these would then be used by release managers to make appropriate decisions on what the next stable version to build/test against should be.
- it understands branches, and can automatically fork the build on new branches: it is then easy to test topic branches on several runners.
- it maintains a history of the build/test sessions (configurable), which lets us readily go back in time to check what happened.
- it has a very nice interface
- it can dispatch builds/tests based on requirements on the runners: instead of running on all available runners, you express a build as having requirements such as Windows+VS2008, Clang6+OSX10.9, etc. The load is also balanced across runners.
- it is Java based, so it is available as soon as there is a Java VM for a platform.
- etc. etc.

The only thing I do not think it addresses today is the asynchronism of the current runner setup: currently, the runners may or may not be available and they provide complementary information (some of them run once a month or so), without being strongly synchronized on the version of the superproject. In the Bamboo setup, the version is the same on all runners, so if runners are not available, this blocks the completion of the build. It is easy to address this issue by having lots of runners providing overlapping requirements, though.

The way I see it is:
1-/ some "continuous", frequent compilation and test runs, using a synchronized version on several runners;
2-/ based on the results (e.g. increased stability, bad-commit disaster, unplanned breaking change), a branch on the superproject, e.g. develop-stable, is moved forward to point at a new, tested/confirmed revision from the previous stage (a sketch is given after this list);
3-/ the current runners test against "develop-stable" and provide information on the existing dashboard;
4-/ metrics are deployed on the dashboard to see what is happening with boost during development (number of compilation or test failures, etc.);
5-/ a general policy/convention is used for master and develop: master is a public candidate, stable and tested. develop isolates every module/component and builds against master or develop-stable (or both). For instance, boost.test[develop] builds against master (the last known public version), except for boost.test itself, which is on develop (the next version).
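To make points 2-/ and 5-/ more concrete, here is a rough sketch, under the assumption that develop-stable lives in the superproject repository and only ever moves forward; the revision SHA is a placeholder, and none of this is an existing script or policy, just an illustration:

    # Point 2: once a superproject revision has passed the continuous stage,
    # a release manager (or a bot) advances develop-stable to that revision.
    git fetch origin
    git push origin <tested-revision-sha>:develop-stable   # fast-forward only

    # Point 5: a module's develop branch is tested against the last known
    # public state of everything else, e.g. for boost.test:
    git clone --branch master --recursive https://github.com/boostorg/boost.git boost
    cd boost
    git -C libs/test checkout develop     # only boost.test moves to its develop
    ./bootstrap.sh && ./b2 headers        # regenerate the forwarding headers

Who (or what) gets to run the first two commands is exactly the open question listed among the shortcomings below.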
The advantages would be the following:
- develop-stable moves by increments, in a stable manner, less frequently and more surely than the current develop;
- develop-stable is already tested on several mainstream configurations, so it is already a viable test candidate for the runners. It avoids wasting resources (mostly checkout/compilation/test time, but also human time: interpreting the results, this time with fewer results to parse);
- with develop-stable, we get real increments of functionality: every step of develop-stable is an improvement of the overall boost, according to universally accepted metrics (yet to be defined);
- having this scheme, together with point 5-/ on master/develop/develop-stable, allows testing changes with respect to what was provided to the end user (building against master) and with respect to the future release of boost (building against develop-stable). It also decouples the potentially unstable states of the different components;
- if a candidate on develop-stable or master is missing some important runners, we can coordinate (humanly) with the runner maintainers to make them available for that specific version. Again: less resource waste, better responsiveness.

The shortcomings are:
- having a develop-stable does not prevent the runners from running on different versions;
- someone/something has the power/responsibility of moving develop-stable to a new version;
- it triggers more builds (this has to be tempered though: a build of e.g. boost.test would happen only if boost.test[develop] changes).

What is lacking now:
- a clear, stable development branch at the superproject level. The superproject is an **integration** project of many components, and should be used to test the integration of versions of its components (whether they play well together). As I said, the current develop branch is a soup, where all the coupling we want to avoid is happening;
- a way to get quick feedback on each of the components, against a stable state. Quick also means fewer runners, available 95% of the time;
- a dashboard summarizing the information much better, keeping a history based on versions, and providing good metrics for evaluating the quality of the integration.

As a side note, I created a build plan with Bamboo for boost.test, testing all the branches of boost.test against boost[develop]. This is quite easy to do. An example of a log is here: http://pastebin.com/raw.php?i=4aGPnD1a
Build+test of boost.test took 12 min on a Windows runner, including checkout, b2 construction and b2 headers.

Raffi