
David Abrahams wrote:
> "Victor A. Wagner Jr." <vawjr@rudbek.com> writes:
>
>> At Sunday 2005-03-06 18:52, you wrote:
>>
>>> Let's start revving up to release Boost 1.33.0. Personally, I'd like
>>> to get it out the door by mid-April at the latest, and I'm offering
>>> to manage this release.
>>
>> thank you for your offer, but if you don't get the damned regression
>
> Please keep your language civil.
>
>> testing working FIRST (it's been non-responsive
>
> Can you please be more specific about what has been non-responsive? I
> doubt anyone can fix anything without more information.
Whatever tone might or might not be appropriate ... Several testers have raised issues and pleaded for better communication several (probably many) times. Most of the time, we seem to get ignored, unfortunately. I don't want to accuse anyone of deliberately neglecting our concerns. However, I think we apparently suffer from a "testing is not too well understood" problem at several levels.

The tool chain employed for testing is very complex (due to the diversity of compilers and operating systems involved) and too fragile. Complexity leads to a lack of understanding (among the testers and among the library developers), to false assumptions, and to a lack of communication. It additionally causes long delays between changing code and running the tests, and between running the tests and the results being rendered. This in turn makes isolating bugs in the libraries more difficult. Fragility means the testing procedure breaks often, that breakage can go unnoticed for some time, and that when something does break nobody can immediately tell exactly which part broke.

This is a very unpleasant situation for everyone involved and it causes a significant level of frustration, at least among those who run the tests (seeing one's own test results not rendered for several days, or seeing the test system abused as a change-announcement system, isn't exactly motivating). Please understand that a lot of resources (human and machine) are wasted due to these problems. This waste is most apparent to those who run the tests. However, most of the time, issues raised by the testers seemed to get ignored. Maybe that was just because we didn't yell loudly enough, or because we didn't know whom to address or how to fix the problems.

Personally, I don't have any problem with the words Victor chose. Other people might. If you're one of them, then please understand that we feel something is going very wrong with the testing procedure, and we're afraid that if it goes on this way we'll lose a lot of the quality (and the reputation) Boost has. The people involved in creating the test procedure have put a great deal of effort into it, and the resulting system does its job nicely when it happens to work correctly. However, the overall complexity of the testing procedure has apparently grown beyond our ability to manage it. This is one reason why release preparations take so long.

Maybe we should take a step back and collect all the issues we have and everything we know about what is causing them. I'll make a start; I hope others will contribute to the list.

Issues and causes, unordered (please excuse any duplicates):

- testing takes a huge amount of resources (HD, CPU, RAM, people operating the test systems, people operating the result rendering systems, people coding the test post-processing tools, people finding the bugs in the testing system)
- the testing procedure is complex
- the testing procedure is fragile
- the code-change to result-rendering cycle takes too long
- bugs in the testing procedure take too long to get fixed
- changes to code that will affect the testing procedure aren't communicated well
- incremental testing doesn't work flawlessly
- deleting tests requires manual purging of old results in an incremental testing environment
- the number of target systems for testing is rather low; this results in questionable portability
- lousy performance of Sourceforge
- resource limitations at Sourceforge (e.g. the number of files there)
- between releases the testing system isn't as well maintained as during release preparations
- test results aren't easily reproducible; they depend heavily on the components installed on the respective testing systems (e.g. glibc version, system compiler version, Python version, kernel version, and even the processor used on Linux)
- library maintainers don't have access to the testing systems; this results in longer test-fix cycles
- changes which will cause heavy load at the testing sites never get announced in advance; this is a problem when testing resources have to be shared with the normal workload (as in my case)
- changes that require old test results to be purged usually don't get announced
- becoming a new contributor of testing resources is too difficult
- we're supporting compilers that compile languages significantly different from C++
- there's no common understanding of which compilers to support and which not
- post-release displaying of test results apparently takes too much effort; otherwise, it would have been done
- tests are run for compilers on which they are known to fail: a 100% waste of resources
- known-to-fail tests are rerun although their dependencies didn't change (see the rough sketch in the P.S. below)
- some tests are insanely big
- some library maintainers feel the need to run their own tests regularly; ideally, this shouldn't be necessary
- test post-processing has to work on output from different compilers, and naturally that output is formatted differently
- test post-processing makes use of very recent XSLT features
- several times the post-processing broke due to problems with the XSLT processor
- XSLT processing takes a long time (merging all the components that are input to the result rendering takes ~1 hour just for the tests I run)
- the number of tests is growing
- there's no way of testing experimental changes to core libraries without causing reruns of most tests (imagine someone wanted to test an experimental version of some part of MPL)
- switching between CVS branches during release preparations takes additional resources and requires manual intervention

I'm sure testers and library developers are able to add a lot more to the list.

Regards,
m
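P.S. To make the "known-to-fail tests are rerun although their dependencies didn't change" item a bit more concrete, here is the kind of check I have in mind. This is only a rough, hypothetical sketch (the file names, paths, and helper are made up and are not part of our actual regression tools); it simply compares the timestamps of a test's dependencies against the recorded result before deciding whether rerunning an expected failure is worthwhile.

  # Hypothetical sketch, not part of the real Boost regression tools:
  # rerun a known-to-fail test only if one of its dependencies is newer
  # than the recorded result.
  import os

  def needs_rerun(result_file, dependencies):
      # No recorded result yet: run the test at least once.
      if not os.path.exists(result_file):
          return True
      result_time = os.path.getmtime(result_file)
      # Rerun only if some dependency changed after the result was recorded.
      for dep in dependencies:
          if os.path.exists(dep) and os.path.getmtime(dep) > result_time:
              return True
      return False

  # Example use (made-up paths):
  deps = ["libs/example/test/fail_case.cpp", "boost/example/header.hpp"]
  if needs_rerun("results/fail_case.xml", deps):
      print("dependencies changed -- rerun the test")
  else:
      print("nothing changed -- reuse the recorded failure")

Something along these lines, hooked into the incremental run, would let us reuse a recorded failure instead of recompiling a test we already know will fail.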