
[Please follow up on Boost.Testing list] Martin Wille writes:
Aleksey Gurtovoy wrote:
Martin Wille writes:
The people involved in creating the test procedure have put a great deal of effort into it, and the resulting system does its job nicely when it happens to work correctly. However, apparently, the overall complexity of the testing procedure has grown beyond our management capabilities.

Honestly, I don't see what you conclude that from, much less how it's apparent. Having said that...
- many reports of "something doesn't work right", often related to post-processing.
In almost all cases, "something doesn't work right" ended up being a temporary breakage caused either by newly implemented functionality in the regression tools chain or internal environment changes on our side, or by malfunctioning directly related to incremental runs/jam log parsing. The only thing the former cases indicate is that the tools are being worked on, and only _possibly_ that the people doing the work are taking somewhat more risk of breaking things than, say, during a release. In any case, this by no means indicates loss of control -- quite the opposite. The latter cases, as we all agree, _are_ the tips of seriously hurting issues that need to be resolved ASAP. Yet that's nothing new.
- less than optimal responses to those.
Well, I disagree with this particular angle of looking at the situation. Given the history of the recent issues which _I_ would classify as suboptimally resolved/responded to, to me the above statement is equivalent to saying: "Something didn't work right recently, and it seemed like the problem might well be on the reporting side -- I'd expect the corresponding maintainers to look at it proactively and sort things out." Needless to say, I consider this neither a fair nor a productive way of looking at things.
We all do understand that you and Misha are under time constraints and therefore aren't able to answer immediately. Having only two people who are able to fix these things is one small part of our problems. The fact that people do not know who would be responsible for finding out which part of the testing procedure is going wrong seems to indicate a management problem.
IMO the problem is not that people don't know who is responsible (in fact, assigning a single person to be responsible is going to bring us back to square one), but rather that nobody steps up and says "I'll research this and report back" -- in a timely manner, that is. Is it a management problem? Rather a lack of resources, I think.
- bugs suddenly go away and people involved in tracking them down do not understand what was causing them. This kind of problem is probably related to the build system. I consider this one fairly dangerous, actually.
Same here. Yet again, we've been having these problems from day one. If your point is that it's time to solve them, I agree 100%.
- We're not really able to tell when a bug started to get reported.
I'm not sure I understand this one. Could you please provide an example?
I'll make a start; I hope others will contribute to the list. Issues and causes unordered (please excuse any duplicates):

I'll comment on the ones I have something to say about.
- testing takes a huge amount of resources (HD, CPU, RAM, people operating the test systems, people operating the result rendering systems, people coding the test post-processing tools, people finding the bugs in the testing system)

True. It's also a very general observation. Don't see how having it here helps us.
I'm under the impression that some people did not know how many resources testing actually consumes. I've seen reactions of surprise when I mentioned the CPU time, HD space, or RAM consumed by the tests. Pleas for splitting test cases were ignored (e.g. random_test).
OK.
- the testing procedure is complex

Internally, yes. The main complexity and _the_ source of fragility lies in the "bjam results to XML" stage of processing. I'd say it's one of the top 10 issues; solving it would substantially simplify everybody's life.
I agree. This processing step has to deal with the build system (which is complex in itself) and with different compiler output. Other complexity probably stems from having to collect and to display test results that reflect different cvs checkout times.
Is it really a problem nowadays? I think we have timestamps in every possible place, and they make things pretty obvious.
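To make concrete what the "bjam results to XML" stage boils down to, here is a rough sketch -- not the actual tools' code; the rule names and the "...failed" marker are assumptions -- of turning a bjam log into per-test XML records:

    # Hypothetical sketch of the "bjam results to XML" stage (not the real tools).
    import re
    import sys
    import xml.sax.saxutils as xs

    # Assume bjam announces each action as "<rule-name> <target-path>".
    ACTION_RE = re.compile(r'^(compile|link|run)\S*\s+(\S+)$')

    def parse_log(lines):
        tests = {}          # target path -> {'result': ..., 'output': [...]}
        current = None
        for line in lines:
            m = ACTION_RE.match(line)
            if m:
                current = tests.setdefault(m.group(2),
                                           {'result': 'pass', 'output': []})
                continue
            if current is not None:
                # Tool diagnostics between actions belong to the current target.
                current['output'].append(line)
                if '...failed' in line:   # assumed failure marker
                    current['result'] = 'fail'
        return tests

    def emit_xml(tests, out=sys.stdout):
        out.write('<test-log>\n')
        for target, info in sorted(tests.items()):
            out.write('  <test target=%s result=%s>%s</test>\n' % (
                xs.quoteattr(target),
                xs.quoteattr(info['result']),
                xs.escape(''.join(info['output']))))
        out.write('</test-log>\n')

    if __name__ == '__main__':
        emit_xml(parse_log(open(sys.argv[1])))

Even in this toy form the fragility is visible: an incremental run produces no record at all for up-to-date targets, so a missing entry can't be told apart from a test that was never built, and any change in the log layout silently breaks the parse.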
- the code-change to result-rendering process takes too long

Not anymore. In any case, there is nothing in the technology used (XSLT) that would make this an inherent bottleneck. It became one because the original implementation of the reporting tools just wasn't written for the volume of data the tools are asked to handle nowadays.
*This* step might be a lot faster now (congrats, this is a *big* improvement). However, there are still other factors that make the code-change to result-rendering process take too long.
I think the answer to this is further splitting of the work among distributed machines.
- bugs in the testing procedure take too long to get fixed

I think all I can say on this one is said here -- http://article.gmane.org/gmane.comp.lib.boost.devel/119341.
I'm not trying to imply that Misha or you aren't doing enough. However, the fact that only two people have the knowledge of and access to the result collection stage of the testing process is a problem in itself.
It is. Anybody who feels interested enough to be filled in on this is more than welcome to join. [...]
- lousy performance of Sourceforge
- resource limitations at Sourceforge (e.g. the number of files there)

This doesn't hurt us anymore, does it?
It hurts every time the result collecting stage doesn't work correctly. We're not able to generate our own XML results and upload them due to the SF resource limits.
I'd say we just need a backup results-processing site.
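From the uploader's side, that could be as simple as trying a second host when the first one refuses the results. A hypothetical sketch -- host names are made up, and FTP is only assumed to be the transport:

    # Hypothetical fallback upload; hosts and transport are assumptions.
    import ftplib

    UPLOAD_HOSTS = ['results.primary.example.org', 'results.backup.example.org']

    def upload_results(path):
        for host in UPLOAD_HOSTS:
            try:
                ftp = ftplib.FTP(host)
                ftp.login()                    # anonymous login assumed
                with open(path, 'rb') as f:
                    ftp.storbinary('STOR ' + path, f)
                ftp.quit()
                return host                    # the site that accepted the results
            except ftplib.all_errors:
                continue
        raise RuntimeError('no results-processing site reachable')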
- becoming a new contributor of testing resources is too difficult.

I don't think that's true anymore. How much simpler can it become -- http://www.meta-comm.com/engineering/regression_setup/instructions.html?
Hmm, recent traffic on the testing reflector seemed to indicate it isn't all that simple. This might be caused by problems with the build system.
If you are talking about CodeWarrior on OS X saga, then it is more build system-related than anything else. [...]
- test post-processing has to work on output from different compilers. Naturally, that output is formatted differently.

What's the problem here?
It isn't a problem? We don't parse the output from the compilers?
Oh, I thought you were referring to something else. Yes, as we've agreed before, the need to post-process the output is probably the biggest source of problems.
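To make that concrete: every toolset needs its own pattern for pulling diagnostics out of the output, so the parser grows one rule per compiler family. The patterns below are rough approximations for illustration, not the ones the real tools use:

    # Hypothetical per-compiler error extraction; patterns are approximations.
    import re

    ERROR_PATTERNS = {
        'gcc':     re.compile(r'^(?P<file>[^:]+):(?P<line>\d+): error: (?P<msg>.*)$'),
        'msvc':    re.compile(r'^(?P<file>.+)\((?P<line>\d+)\) : error (?P<msg>C\d+.*)$'),
        'borland': re.compile(r'^Error E\d+ (?P<file>\S+) (?P<line>\d+): (?P<msg>.*)$'),
    }

    def extract_errors(toolset, output):
        """Return (file, line, message) tuples found in one toolset's output."""
        pattern = ERROR_PATTERNS.get(toolset)
        if pattern is None:
            return []   # unknown toolset: pass the raw text through unparsed
        return [(m.group('file'), int(m.group('line')), m.group('msg'))
                for m in map(pattern.match, output.splitlines()) if m]

A new compiler release that changes its message format silently breaks the extraction, which is exactly the kind of breakage discussed above.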
- several times the post-processing broke due to problems with the XSLT processor.

And twice as often it broke due to somebody's erroneous checkin. The latter is IMO much more important to account for and handle gracefully. Most of the XSLT-related problems of the past were caused by inadequate usage, such as transformation algorithms not prepared for the huge volume of data we are now processing.
Do you expect the recent updates to be able to handle a significantly higher volume?
Yes, and we have implemented only the most obvious optimizations. If there is further need to speed up things, we'll speed them up.
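The kind of optimization meant here is, for example, replacing repeated scans over the whole result set with an index built once (xsl:key/key() in XSLT terms). The reports themselves are produced with XSLT, but the idea is language-neutral; a Python rendering with made-up field names:

    # Illustration of the indexing idea only; field names are made up.
    def build_index(test_results):
        """Group results by (library, toolset) in one pass -- the xsl:key analogue."""
        index = {}
        for r in test_results:
            index.setdefault((r['library'], r['toolset']), []).append(r)
        return index

    def render_cell(index, library, toolset):
        # One O(1) lookup per report cell instead of a scan over the whole
        # result set, which turns an O(n^2) report generation into roughly O(n).
        results = index.get((library, toolset), [])
        return 'fail' if any(r['status'] == 'fail' for r in results) else 'pass'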
- testing during release preparations takes additional resources and requires manual intervention.

What do you think of this one -- http://article.gmane.org/gmane.comp.lib.boost.devel/119337?
I'm with Victor on this point; for the testers (and hopefully there'll be more of them one day) it's significantly easier not to have to change anything during the release preparations. This could be achieved by using the CVS trunk as release branch until the actual release gets tagged.
What about the tarballs, though?
I hoped other people would contribute to the list; I'm sure there's a lot more to say about testing. E.g. it would be nice to have some sort of history of recent regression results.
It's on our TODO list -- http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?Boost.Testing.
It would be nice to be able to split the runs vertically (running tests for a smaller set of toolsets)
Isn't this possible now?
and horizontally (running tests for a smaller set of libraries) easily;
Agreed.
I realize, though, that presenting the results would become more difficult.
Nothing we can't figure out.

--
Aleksey Gurtovoy
MetaCommunications Engineering