
[Please follow up on Boost.Testing list] Martin Wille writes:
Aleksey Gurtovoy wrote:
Martin Wille writes:
The people involved in creating the test procedure have put a great deal of effort into it, and the resulting system does its job nicely when it happens to work correctly. However, apparently, the overall complexity of the testing procedure has grown beyond our management capabilities.

Honestly, I don't see what you conclude that from, much less how it's apparent. Having said that...
- many reports of "something doesn't work right", often related to post-processing.
In almost all cases, "something doesn't work right" ended up being a temporary breakage caused either by newly implemented functionality in the regression tools chain or internal environment changes on our side, or by malfunctioning directly related to incremental runs/jam log parsing. The only thing the former cases indicate is that the tools are being worked on, and only _possibly_ that the people doing the work are taking somewhat more risk of breaking things than, say, during a release. In any case, this by no means indicates loss of control -- quite the opposite. The latter cases, as we all agree, _are_ the tips of seriously hurting issues that need to be resolved ASAP. Yet that's nothing new.
- less than optimal responses to those.
Well, I disagree with this particular angle of looking at the situation. Given the history of the recent issues which _I_ would classify as suboptimally resolved/responded to, to me the above statement is equivalent to saying: "Something didn't work right recently, and it seemed like the problem might well be on the reporting side -- I'd expect the corresponding maintainers to look at it proactively and sort things out." Needless to say, I consider this neither a fair nor a productive way of looking at things.
We all do understand that you and Misha are under time constraints and therefore aren't able to answer immediately. Having only two people who are able to fix these things is one small part of our problems. The fact that people do not know who would be responsible for finding out which part of the testing procedure is going wrong seems to indicate a management problem.
IMO the problem is not that people don't know who is responsible (in fact, assigning a single person to be responsible is going to bring us back to square one), but rather that nobody steps up and says "I'll research this and report back" -- in a timely manner, that is. Is it a management problem? Rather a lack of resources, I think.
- bugs suddenly go away and people involved in tracking them down do not understand what was causing them. This kind of problem is probably related to the build system. I consider this one fairly dangerous, actually.
Same here. Yet again, we've been having these problems from day one. If your point is that it's time to solve them, I agree 100%.
- We're not really able to tell when a bug started to get reported.
I'm not sure I understand this one. Could you please provide an example?
I'll make a start; I hope others will contribute to the list. Issues and causes unordered (please excuse any duplicates):

I'll comment on the ones I have something to say about.
- testing takes a huge amount of resources (HD, CPU, RAM, people operating the test systems, people operating the result rendering systems, people coding the test post-processing tools, people finding the bugs in the testing system)

True. It's also a very general observation. Don't see how having it here helps us.
I'm under the impression that some people did not know how many resources testing actually consumes. I've seen reactions of surprise when I mentioned the CPU time, HD space, or RAM consumed by the tests. Pleas for splitting test cases were ignored (e.g. random_test).
OK.
- the testing procedure is complex

Internally, yes. The main complexity and _the_ source of fragility lies in the "bjam results to XML" stage of processing. I'd say it's one of the top 10 issues; solving it would substantially simplify everybody's life.
I agree. This processing step has to deal with the build system (which is complex in itself) and with different compiler output. Other complexity probably stems from having to collect and to display test results that reflect different cvs checkout times.
Is it really a problem nowadays? I think we have timestamps in every possible place, and they make things pretty obvious.
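To make concrete what the "bjam results to XML" stage boils down to, here is a rough sketch -- not the actual tools' code; the rule names and the "...failed" marker are assumptions -- of turning a bjam log into per-test XML records:

    # Hypothetical sketch of the "bjam results to XML" stage (not the real tools).
    import re
    import sys
    import xml.sax.saxutils as xs

    # Assume bjam announces each action as "<rule-name> <target-path>".
    ACTION_RE = re.compile(r'^(compile|link|run)\S*\s+(\S+)$')

    def parse_log(lines):
        tests = {}          # target path -> {'result': ..., 'output': [...]}
        current = None
        for line in lines:
            m = ACTION_RE.match(line)
            if m:
                current = tests.setdefault(m.group(2),
                                           {'result': 'pass', 'output': []})
                continue
            if current is not None:
                # Tool diagnostics between actions belong to the current target.
                current['output'].append(line)
                if '...failed' in line:   # assumed failure marker
                    current['result'] = 'fail'
        return tests

    def emit_xml(tests, out=sys.stdout):
        out.write('<test-log>\n')
        for target, info in sorted(tests.items()):
            out.write('  <test target=%s result=%s>%s</test>\n' % (
                xs.quoteattr(target),
                xs.quoteattr(info['result']),
                xs.escape(''.join(info['output']))))
        out.write('</test-log>\n')

    if __name__ == '__main__':
        emit_xml(parse_log(open(sys.argv[1])))

Even in this toy form the fragility is visible: an incremental run produces no record at all for up-to-date targets, so a missing entry can't be told apart from a test that was never built, and any change in the log layout silently breaks the parse.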
- the code-change to result-rendering process takes too long

Not anymore. In any case, there is nothing in the technology used (XSLT) that would make this an inherent bottleneck. It became one because the original implementation of the reporting tools just wasn't written for the volume of data the tools are asked to handle nowadays.
*This* step might be a lot faster now (congrats, this is a *big* improvement). However, there are still other factors that make the code-change to result-rendering process take too long.
I think the answer to this is further splitting of the work among distributed machines.
- bugs in the testing procedure take too long to get fixed

I think all I can say on this one is said here -- http://article.gmane.org/gmane.comp.lib.boost.devel/119341.
I'm not trying to imply that Misha or you aren't doing enough. However, the fact that only two people have the knowledge of and access to the result collection stage of the testing process is a problem in itself.
It is. Anybody who feels interested enough to be filled in on this is more than welcome to join. [...]
- lousy performance of Sourceforge
- resource limitations at Sourceforge (e.g. the number of files there)

This doesn't hurt us anymore, does it?
It hurts every time the result collecting stage doesn't work correctly. We're not able to generate our own XML results and upload them due to the SF resource limits.
I'd say we just need a backup results-processing site.
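From the uploader's side, that could be as simple as trying a second host when the first one refuses the results. A hypothetical sketch -- host names are made up, and FTP is only assumed to be the transport:

    # Hypothetical fallback upload; hosts and transport are assumptions.
    import ftplib

    UPLOAD_HOSTS = ['results.primary.example.org', 'results.backup.example.org']

    def upload_results(path):
        for host in UPLOAD_HOSTS:
            try:
                ftp = ftplib.FTP(host)
                ftp.login()                    # anonymous login assumed
                with open(path, 'rb') as f:
                    ftp.storbinary('STOR ' + path, f)
                ftp.quit()
                return host                    # the site that accepted the results
            except ftplib.all_errors:
                continue
        raise RuntimeError('no results-processing site reachable')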
- becoming a new contributor of testing resources is too difficult.

I don't think that's true anymore. How much simpler can it become -- http://www.meta-comm.com/engineering/regression_setup/instructions.html?
Hmm, recent traffic on the testing reflector seemed to indicate it isn't all that simple. This might be caused by problems with the build system.
If you are talking about CodeWarrior on OS X saga, then it is more build system-related than anything else. [...]
- test post-processing has to work on output from different compilers. Naturally, that output is formatted differently.

What's the problem here?
It isn't a problem? We don't parse the output from the compilers?
Oh, I thought you were referring to something else. Yes, as we've agreed before, the need to post-process the output is probably the biggest source of problems.
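To make that concrete: every toolset needs its own pattern for pulling diagnostics out of the output, so the parser grows one rule per compiler family. The patterns below are rough approximations for illustration, not the ones the real tools use:

    # Hypothetical per-compiler error extraction; patterns are approximations.
    import re

    ERROR_PATTERNS = {
        'gcc':     re.compile(r'^(?P<file>[^:]+):(?P<line>\d+): error: (?P<msg>.*)$'),
        'msvc':    re.compile(r'^(?P<file>.+)\((?P<line>\d+)\) : error (?P<msg>C\d+.*)$'),
        'borland': re.compile(r'^Error E\d+ (?P<file>\S+) (?P<line>\d+): (?P<msg>.*)$'),
    }

    def extract_errors(toolset, output):
        """Return (file, line, message) tuples found in one toolset's output."""
        pattern = ERROR_PATTERNS.get(toolset)
        if pattern is None:
            return []   # unknown toolset: pass the raw text through unparsed
        return [(m.group('file'), int(m.group('line')), m.group('msg'))
                for m in map(pattern.match, output.splitlines()) if m]

A new compiler release that changes its message format silently breaks the extraction, which is exactly the kind of breakage discussed above.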
- several times the post-processing broke due to problems with the XSLT processor.

And twice as often it broke due to somebody's erroneous checkin. The latter is IMO much more important to account for and handle gracefully. Most of the XSLT-related problems of the past were caused by inadequate usage, such as transformation algorithms not prepared for the huge volume of data we are now processing.
Do you expect the recent updates to be able to handle a significantly higher volume?
Yes, and we have implemented only the most obvious optimizations. If there is further need to speed up things, we'll speed them up.
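The kind of optimization meant here is, for example, replacing repeated scans over the whole result set with an index built once (xsl:key/key() in XSLT terms). The reports themselves are produced with XSLT, but the idea is language-neutral; a Python rendering with made-up field names:

    # Illustration of the indexing idea only; field names are made up.
    def build_index(test_results):
        """Group results by (library, toolset) in one pass -- the xsl:key analogue."""
        index = {}
        for r in test_results:
            index.setdefault((r['library'], r['toolset']), []).append(r)
        return index

    def render_cell(index, library, toolset):
        # One O(1) lookup per report cell instead of a scan over the whole
        # result set, which turns an O(n^2) report generation into roughly O(n).
        results = index.get((library, toolset), [])
        return 'fail' if any(r['status'] == 'fail' for r in results) else 'pass'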
- testing during release preparations takes additional resources and requires manual intervention.

What do you think of this one -- http://article.gmane.org/gmane.comp.lib.boost.devel/119337?
I'm with Victor on this point; for the testers (and hopefully there'll be more of them one day) it's significantly easier not to have to change anything during the release preparations. This could be achieved by using the CVS trunk as release branch until the actual release gets tagged.
What about the tarballs, though?
I hoped other people would contribute to the list; I'm sure there's a lot more to say about testing. E.g. it would be nice to have some sort of history of recent regression results.
It's on our TODO list -- http://www.crystalclearsoftware.com/cgi-bin/boost_wiki/wiki.pl?Boost.Testing.
It would be nice to be able to split the runs vertically (running tests for a smaller set of toolsets)
Isn't this possible now?
and horizontally (running tests for a smaller set of libraries) easily;
Agreed.
I realize, though, that presenting the results would become more difficult.
Nothing we can't figure out.

--
Aleksey Gurtovoy
MetaCommunications Engineering