On 08/10/15 19:46, Bjørn Roald wrote:
On 04 Oct 2015, at 14:49, Raffi Enficiaud wrote:
On 04/10/15 13:38, John Maddock wrote:
On 04/10/2015 12:09, Bjorn Reese wrote:
As many others have said, Boost.Test is "special" in that the majority of Boost's tests depend on it. Even breakages in develop are extremely painful in that they effectively halt progress for any Boost library which uses Test for testing.
This sort of problem has been discussed before on this list without any real progress. I think a solution is needed that gives boost tool maintainers (boost.test is also a tool) services similar to those that library maintainers enjoy. A solution may also provide better test services for all boost developers and possibly other projects. An idea for a possible way forward, providing a test_request service at boost.org/test_request, is outlined below.
I think the problems are simple:
- the "develop" branch is currently a soup,
- the regression dashboard should be improved.
I will detail those two points below.
I would like thoughts on how useful or feasible such a service would be. These are some questions I would like to have answered:
- Will library maintainers use a boost.org/test_request service?
- How valuable would it be, compared to merging to develop and waiting for the current test reports?
- How much of a challenge would it be to get test runners (new and old) on board?
As far as I can see, some libraries have testing alternatives. Some are building on Travis. Yesterday, I created a build plan on my local Atlassian Bamboo instance, running the tests of all branches of boost.test against develop, on several platforms. Obviously, "several" platforms/compilers (5) is not on the same scale as the current regression dashboard, but it is a good start. What I need now is a way to publish this information in a public place, because my Bamboo CI is on an internal network.
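For reference, such a build plan boils down to a handful of steps per runner. Below is a minimal sketch of what one job script could look like, assuming the superproject is checked out on develop and only boost.test is switched to the branch under test; the paths, the <branch-under-test> placeholder and the job layout are illustrative, not the exact Bamboo configuration I use:

    # Illustrative job script for one runner -- not the actual Bamboo plan.
    set -e

    # Superproject on develop, with all submodules.
    git clone --branch develop --recursive https://github.com/boostorg/boost.git boost
    cd boost

    # Switch only the library under test to the branch being built.
    # <branch-under-test> is a placeholder injected by the CI plan.
    git -C libs/test fetch origin
    git -C libs/test checkout <branch-under-test>

    # Build b2, generate the forwarding headers, then run the Boost.Test suite.
    ./bootstrap.sh          # bootstrap.bat on the Windows runners
    ./b2 headers
    cd libs/test/test
    ../../../b2 -j4

The Bamboo plan simply runs this kind of script on each runner that matches the job's requirements and collects the results centrally.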
- How feasible is it to set up a service as outlined below based on modification of the current system for regression testing in boost?
I think that reusing or building upon the current system would be hard and limiting.
- What alternatives exist providing the same kind of, or better, value to the community, hopefully with less effort? E.g.: can Jenkins or other such test dashboards / frameworks easily be configured to provide the flexibility and features needed here?
I think that what you propose is already well covered by existing tools in the industry. For instance, having a look at Atlassian Bamboo might be a good start:
- it is **free for open source projects**
- it compiles/tests **one** specific version across many runners, so we have a clear status on one version. The dashboard currently shows many different versions.
- builds can be triggered manually or on events: e.g. a change on core libraries, a change on one specific library, a schedule (nightly)
- it is trivial to set up, and we can have many different targets (continuous, stable, release candidate, etc.). It has an extensive way of expressing a build as small jobs (which can be just scripts).
- it understands git and submodules: one version is checked out on the central server and dispatched to all runners. Runners can fully cache the git repository locally to reduce traffic and update time.
- it provides metrics on the tests/compilations: these would then be used by release managers to make appropriate decisions on what the next stable version to build/test against should be.
- it understands branches, and can automatically fork the build on new branches: it is then easy to test topic branches on several runners.
- it maintains a history of the build/test sessions (configurable), which lets us readily go back in time to check what happened.
- it has a very nice interface
- it can dispatch builds/tests based on requirements on the runners: instead of running on all available runners, you express a build as having requirements such as Windows+VS2008, Clang6+OSX10.9, etc. The load is also balanced across runners.
- it is Java based, so it is available as soon as there is a Java VM for a platform.
- etc. etc.

The only thing I do not think it addresses today is the asynchronism of the current runner setup: currently, the runners may or may not be available and they provide complementary information (some of them run once a month or so), without being strongly synchronized on the version of the superproject. In the Bamboo setup, the version is the same on all runners, so if runners are not available, this blocks the completion of the build. It is easy to address this issue by having lots of runners providing overlapping requirements, though.

The way I see it is:
1-/ some "continuous", frequent compilation and test runs, using a synchronized version on several runners;
2-/ based on the results (e.g. increased stability, bad-commit disaster, unplanned breaking change), a branch on the superproject, e.g. develop-stable, is moved forward to point at a new, tested/confirmed revision from the previous stage (a sketch is given after this list);
3-/ the current runners test against "develop-stable" and provide information on the existing dashboard;
4-/ metrics are deployed on the dashboard to see what is happening with boost during development (number of compilation or test failures, etc.);
5-/ a general policy/convention is used for master and develop: master is a public candidate, stable and tested. develop isolates every module/component and builds against master or develop-stable (or both). For instance, boost.test[develop] builds against master (the last known public version), except for boost.test itself, which is on develop (the next version).
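To make points 2-/ and 5-/ more concrete, here is a rough sketch, under the assumption that develop-stable lives in the superproject repository and only ever moves forward; the revision SHA is a placeholder, and none of this is an existing script or policy, just an illustration:

    # Point 2: once a superproject revision has passed the continuous stage,
    # a release manager (or a bot) advances develop-stable to that revision.
    git fetch origin
    git push origin <tested-revision-sha>:develop-stable   # fast-forward only

    # Point 5: a module's develop branch is tested against the last known
    # public state of everything else, e.g. for boost.test:
    git clone --branch master --recursive https://github.com/boostorg/boost.git boost
    cd boost
    git -C libs/test checkout develop     # only boost.test moves to its develop
    ./bootstrap.sh && ./b2 headers        # regenerate the forwarding headers

Who (or what) gets to run the first two commands is exactly the open question listed among the shortcomings below.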
The advantages would be the following:
- develop-stable moves by increments, in a stable manner, less frequently and more surely than the current develop;
- develop-stable is already tested on several mainstream configurations, so it is already a viable test candidate for the runners. It avoids wasting resources (mostly checkout/compilation/test time, but also human time: interpreting the results, this time with fewer results to parse);
- with develop-stable, we get real increments of functionality: every step of develop-stable is an improvement of the overall boost, according to universally accepted metrics (yet to be defined);
- having this scheme, together with point 5-/ on master/develop/develop-stable, allows testing changes with respect to what was provided to the end user (building against master) and with respect to the future release of boost (building against develop-stable). It also decouples the potentially unstable states of the different components;
- if a candidate on develop-stable or master is missing some important runners, we can coordinate (humanly) with the runner maintainers to make them available for that specific version. Again: less resource waste, better responsiveness.

The shortcomings are:
- having a develop-stable does not prevent the runners from running on different versions;
- someone/something has the power/responsibility of moving develop-stable to a new version;
- it triggers more builds (this has to be tempered though: a build of e.g. boost.test would happen only if boost.test[develop] changes).

What is lacking now:
- a clear, stable development branch at the superproject level. The superproject is an **integration** project of many components, and should be used to test the integration of versions of its components (whether they play well together). As I said, the current develop branch is a soup, where all the coupling we want to avoid is happening;
- a way to get quick feedback on each of the components, against a stable state. Quick also means fewer runners, available 95% of the time;
- a dashboard summarizing the information much better, keeping a history based on versions, and providing good metrics for evaluating the quality of the integration.

As a side note, I created a build plan with Bamboo for boost.test, testing all the branches of boost.test against boost[develop]. This is quite easy to do. An example of a log is here: http://pastebin.com/raw.php?i=4aGPnD1a
Build+test of boost.test took 12 min on a Windows runner, including checkout, b2 construction and b2 headers.

Raffi