
Aleksey Gurtovoy wrote:
Martin Wille writes:
The people involved in creating the test procedure have put a great deal of effort into it, and the resulting system does its job nicely when it happens to work correctly. However, apparently, the overall complexity of the testing procedure has grown beyond our management capabilities.
Honestly, I don't see what you conclude that from, much less how it's apparent. Having said that...
- Many reports of "something doesn't work right", often related to post-processing, and less than optimal responses to those. We all understand that you and Misha are under time constraints and therefore aren't able to answer immediately. Having only two people who are able to fix these things is one small part of our problems. The fact that people do not know who is responsible for finding out which part of the testing procedure is going wrong seems to indicate a management problem.
- Bugs suddenly go away, and the people involved in tracking them down do not understand what was causing them. This kind of problem is probably related to the build system. I consider this one fairly dangerous, actually.
- We're not really able to tell when a bug started to get reported.
Maybe, we should take a step back and collect all the issues we have and all knowledge about what is causing these issues.
... this is a good idea. Making the issues visible definitely helps in keeping track of where we are and what still needs to be done, and quite possibly in soliciting resources to resolve them.
I'll make a start, I hope others will contribute to the list. Issues and causes unordered (please, excuse any duplicates):
I'll comment on the ones I have something to say about.
- testing takes a huge amount of resources (HD, CPU, RAM, people operating the test systems, people operating the result rendering systems, people coding the test post processing tools, people finding the bugs in the testing system)
True. It's also a very general observation; I don't see how having it here helps us.
I'm under the impression that some people did not know how many resources testing actually consumes. I've seen reactions of surprise when I mentioned the CPU time, HD space, or RAM consumed by the tests. Pleas for splitting test cases were ignored (e.g. random_test).
- the testing procedure is complex
Internally, yes. The main complexity, and _the_ source of fragility, lies in the "bjam results to XML" stage of processing. I'd say it's one of the top 10 issues; solving it would substantially simplify everybody's life.
I agree. This processing step has to deal with the build system (which is complex in itself) and with different compiler output. Other complexity probably stems from having to collect and display test results that reflect different CVS checkout times.
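To make the fragility concrete, here is a minimal sketch of what such a stage has to do. The log formats, toolset names, and function names below are invented for illustration; the real tools are considerably more involved:

```python
import re
import xml.sax.saxutils as saxutils

# Hypothetical, simplified sketch of a "bjam results to XML" stage:
# each toolset prints pass/fail lines in its own format, so the parser
# needs one pattern per toolset (these patterns are illustrative).
PATTERNS = {
    "gcc": re.compile(r"^\*\*passed\*\* (?P<test>\S+)"),
    "msvc": re.compile(r"^PASS: (?P<test>\S+)"),
}

def results_to_xml(toolset, lines):
    """Convert raw build/test log lines into a small XML fragment."""
    pattern = PATTERNS[toolset]
    out = ["<test-run toolset=%s>" % saxutils.quoteattr(toolset)]
    for line in lines:
        m = pattern.match(line)
        if m:
            out.append('  <test name=%s status="pass"/>'
                       % saxutils.quoteattr(m.group("test")))
    out.append("</test-run>")
    return "\n".join(out)
```

Every new toolset, and every change in a compiler's output format, means touching this parsing layer, which is exactly where the fragility shows up.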
- the code-change to result-rendering process takes too long
Not anymore. In any case, there is nothing in the technology used (XSLT) that would make this an inherent bottleneck. It became one because the original implementation of the reporting tools just wasn't written for the volume of data the tools are asked to handle nowadays.
*This* step might be a lot faster now (congrats, this is a *big* improvement). However, there are still other factors that make the code-change to result-rendering process take too long.
- bugs in the testing procedure take too long to get fixed
I think all I can say on this one is said here -- http://article.gmane.org/gmane.comp.lib.boost.devel/119341.
I'm not trying to imply that you or Misha aren't doing enough. However, the fact that only two people have the knowledge of, and access to, the result collection stage of the testing process is a problem in itself.
- incremental testing doesn't work flawlessly
That's IMO another "top 10" issue that hurts a lot.
- deleting tests requires manual purging of old results in an incremental testing environment.
Just an example of the above, IMO.
Right. However, it's one of the more difficult problems to solve. The build system would have to be expanded to make it delete results for tests which don't exist anymore.
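As a sketch of what such a purge could look like: the one-result-file-per-test layout and the function name below are assumptions for illustration, not the actual build system's layout.

```python
import os

def purge_stale_results(results_dir, current_tests):
    """Remove result files for tests that no longer exist.

    Assumed (hypothetical) layout: one "<testname>.xml" result file per
    test in results_dir; current_tests is the set of test names the
    build system currently knows about.
    """
    removed = []
    for name in os.listdir(results_dir):
        base, ext = os.path.splitext(name)
        if ext == ".xml" and base not in current_tests:
            os.remove(os.path.join(results_dir, name))
            removed.append(base)
    return removed
```

The hard part is not the deletion itself but getting a reliable list of currently existing tests out of the build system, which is what would require expanding it.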
- lousy performance of Sourceforge - resource limitations at Sourceforge (e.g. the number of files there)
This doesn't hurt us anymore, does it?
It hurts every time the result collecting stage doesn't work correctly. We're not able to generate our own XML results and upload them, due to the SF resource limits.
- test results aren't easily reproducible. They depend much on the components on the respective testing systems (e.g. glibc version, system compiler version, python version, kernel version and even on the processor used on Linux)
True. There isn't much we can do about it, though, is there?
You're probably right. However, I wanted to mention this point, because someone might have an idea how to address it. I guess it boils down to needing more testers in order to see more flavours of similar environments.
- becoming a new contributor for testing resources is too difficult.
I don't think that's true anymore. How much simpler can it become -- http://www.meta-comm.com/engineering/regression_setup/instructions.html?
Hmm, recent traffic on the testing reflector seemed to indicate it isn't too simple. This might be caused by problems with the build system.
- we're supporting compilers that compile languages significantly different from C++.
Meaning significantly non-conforming compilers or something else?
Yes, significantly non-conforming compilers.
- post-release displaying of test results apparently takes too much effort. Otherwise, it would have been done.
Huh? They were on the website (and still are) the day the release was announced. See http://www.meta-comm.com/engineering/boost-regression/1_32_0/developer/summa...
Well, I take that back then. However, this URL seems not to be well known. Not a problem then.
- some library maintainers feel the need to run their own tests regularly. Ideally, this shouldn't be necessary.
Agreed ("regularly" is a key word here). IMO the best we can do here is to ask them to list the reasons for doing so.
One reason surely is that the test environments or the test cycles available are somehow unsatisfactory. I would understand either. More testers would help here, too.
- test post processing has to work on output from different compilers. Naturally, that output is formatted differently.
What's the problem here?
Isn't it a problem? Don't we parse the output from the compilers?
- several times the post processing broke due to problems with the XSLT processor.
And twice as often it broke due to somebody's erroneous checkin. The latter is IMO much more important to account for and handle gracefully. Most of the XSLT-related problems of the past were caused by inadequate usage, such as transformation algorithms not prepared for the huge volume of data we are now processing.
Do you expect the recent updates to be able to handle a significantly higher volume? That would be a big improvement. I'm asking because I had the impression some parts of the XSLT processing used O(n^2) algorithms (or worse). My local tests with changing the length of pathnames seemed to indicate that (replacing "/home/boost" with "/boost" resulted in a significant speedup of the XSLT processor).
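For what it's worth, a classic source of accidental O(n^2) behaviour in string-heavy processing is repeated concatenation, which copies the whole accumulated string on every step; that would be consistent with shorter pathnames speeding things up. A Python analogy of the quadratic vs. linear pattern (not the actual report code):

```python
# Illustration only: both functions produce the same string, but the
# first copies the whole accumulated result on every iteration, which
# is O(n^2) in total string length; the single join is O(n).
def concat_quadratic(parts):
    result = ""
    for p in parts:
        result = result + p  # full copy of result each time
    return result

def concat_linear(parts):
    return "".join(parts)
```

If the XSLT templates build report strings the first way, shrinking every pathname shrinks n and the running time drops superlinearly, which matches the observation above.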
- there's no way of testing experimental changes to core libraries without causing reruns of most tests (imagine someone would want to test an experimental version of some part of MPL).
Do you mean running library tests only off the branch?
Yes, and running only a reduced set of tests for that if possible. I think this would help the library maintainers.
- switching between CVS branches during release preparations takes additional resources and requires manual intervention.
What do you think of this one -- http://article.gmane.org/gmane.comp.lib.boost.devel/119337?
I'm with Victor on this point; for the testers (and hopefully there'll be more of them one day) it's significantly easier not to have to change anything during the release preparations. This could be achieved by using the CVS trunk as the release branch until the actual release gets tagged. Development would have to continue on a branch and be merged back into the trunk after the release.

Ideally, the testers would be able to run the tests without having to attend the runs. This is currently not possible. (Just as an example: while I'm writing this I notice that, apparently, I'm unable to upload test results now because of an error caused by one of the Python scripts: "ImportError: No module named ftp")
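Incidentally, Python's standard library names its FTP module "ftplib", not "ftp", so an "import ftp" in the upload script would fail with exactly that error unless some third-party "ftp" module is expected to be installed. A minimal upload sketch using only the stdlib module (host, credentials, and file names are placeholders, not the actual script's values):

```python
import ftplib

def upload_results(host, user, password, local_path, remote_name):
    """Upload a results file via the stdlib ftplib module.

    All arguments are placeholders for whatever the real upload
    script is configured with.
    """
    ftp = ftplib.FTP(host)
    ftp.login(user, password)
    with open(local_path, "rb") as f:
        ftp.storbinary("STOR " + remote_name, f)
    ftp.quit()
```

If the script in question really does "import ftp", either renaming the import or documenting the missing dependency would make unattended runs less likely to die mid-upload.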
Finally, thanks for putting this together!
I hoped other people would contribute to the list; I'm sure there's a lot more to say about testing. E.g. it would be nice to have some sort of history of recent regression results. It would be nice to be able to split the runs vertically (running tests for a smaller set of toolsets) and horizontally (running tests for a smaller set of libraries) easily; I realize, though, that presenting the results would become more difficult.

Regards,
m