[Report] 35 regressions on RC_1_34_0 (2007-03-12)

Boost Regression test failures
Report time: 2007-03-12T05:48:13Z

This report lists all regression test failures on release platforms.

Detailed report: http://engineering.meta-comm.com/boost-regression/CVS-RC_1_34_0/developer/is...

35 failures in 7 libraries: graph (4), iostreams (7), numeric/interval (1), optional (6), parameter (1), python (15), test (1)

|graph|
dijkstra_heap_performance: msvc-7.0
graphviz_test: msvc-7.1_stlport4
layout_test: msvc-7.0
relaxed_heap_test: msvc-7.0

|iostreams|
bzip2_test: msvc-7.1 msvc-8.0
file_descriptor_test: gcc-cygwin-3.4.4
gzip_test: msvc-7.1 msvc-8.0
zlib_test: msvc-7.1 msvc-8.0

|numeric/interval|
test_float: msvc-7.1_stlport4

|optional|
optional_test: msvc-6.5 msvc-6.5 msvc-6.5_stlport4 msvc-7.0
optional_test_ref_fail2: msvc-7.1 msvc-8.0

|parameter|
python_test: gcc-cygwin-3.4.4

|python|
import_: cw-9.4 gcc-mingw-3.4.2 gcc-mingw-3.4.5 intel-vc71-win-9.1 msvc-6.5 msvc-6.5 msvc-6.5_stlport4 msvc-7.0 msvc-7.1 msvc-7.1 msvc-7.1 msvc-7.1_stlport4 msvc-8.0 msvc-8.0 msvc-8.0

|test|
prg_exec_fail3: cw-9.4

Douglas Gregor wrote:
Boost Regression test failures Report time: 2007-03-12T05:48:13Z
This report lists all regression test failures on release platforms.
Detailed report: http://engineering.meta-comm.com/boost-regression/CVS-RC_1_34_0/developer/is...
35 failures in 7 libraries graph (4) iostreams (7) numeric/interval (1) optional (6) parameter (1) python (15)
|python| import_: cw-9.4 gcc-mingw-3.4.2 gcc-mingw-3.4.5 intel-vc71-win-9.1 msvc-6.5 msvc-6.5 msvc-6.5_stlport4 msvc-7.0 msvc-7.1 msvc-7.1 msvc-7.1 msvc-7.1_stlport4 msvc-8.0 msvc-8.0 msvc-8.0
I don't understand how the number of import_ failures can go up, not down, given that the issue was fixed a week ago. Is anybody taking these numbers seriously at all? Thanks, Stefan -- ...ich hab' noch einen Koffer in Berlin...

Stefan,
Stefan Seefeld wrote:
Douglas Gregor wrote:
Boost Regression test failures Report time: 2007-03-12T05:48:13Z
|python| import_: cw-9.4 gcc-mingw-3.4.2 gcc-mingw-3.4.5 intel-vc71-win-9.1 msvc-6.5 msvc-6.5 msvc-6.5_stlport4 msvc-7.0 msvc-7.1 msvc-7.1 msvc-7.1 msvc-7.1_stlport4 msvc-8.0 msvc-8.0 msvc-8.0
I don't understand how the number of import_ failures can go up, not down, given that the issue was fixed a week ago. Is anybody taking these numbers seriously at all?
Don't get me wrong, but what makes you sure that it's not the patch that's broken? Thomas -- Thomas Witt witt@acm.org

Thomas Witt wrote:
Stefan,
Stefan Seefeld wrote:
Douglas Gregor wrote:
Boost Regression test failures Report time: 2007-03-12T05:48:13Z
|python| import_: cw-9.4 gcc-mingw-3.4.2 gcc-mingw-3.4.5 intel-vc71-win-9.1 msvc-6.5 msvc-6.5 msvc-6.5_stlport4 msvc-7.0 msvc-7.1 msvc-7.1 msvc-7.1 msvc-7.1_stlport4 msvc-8.0 msvc-8.0 msvc-8.0
I don't understand how the number of import_ failures can go up, not down, given that the issue was fixed a week ago. Is anybody taking these numbers seriously at all?
Don't get me wrong, but what makes you sure that it's not the patch that's broken?
Nothing, and that's exactly my point! A fix went in over a week ago. Instead of having a row of results in the test matrix showing me 24 hours later whether all is well now, results are (for a number of reasons) trickling in one at a time.

Also, looking at the above descriptors, there are multiple entries for a number of toolkits, suggesting the number of failures reported is only barely correlated to the actual failures (and quite likely doesn't correspond to the current state of affairs) either.

Thus my question: does anybody actually care about these numbers?

Regards, Stefan -- ...ich hab' noch einen Koffer in Berlin...
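
The duplicate toolset entries are exactly the kind of thing the harness could collapse mechanically. Below is a minimal Python sketch, assuming the report has already been parsed into (test, toolset, run-time) tuples; the records and timestamps are hypothetical, not taken from the actual report. It keeps only the most recent run per test/toolset pair and compares the raw entry count with the number of distinct failures.

    from datetime import datetime

    # Hypothetical parsed failure records: (test, toolset, time of the run).
    # The repeated msvc-7.1 / msvc-8.0 entries mirror the import_ line above.
    failures = [
        ("import_", "msvc-7.1", datetime(2007, 3, 10, 4, 0)),
        ("import_", "msvc-7.1", datetime(2007, 3, 11, 4, 0)),
        ("import_", "msvc-7.1", datetime(2007, 3, 12, 4, 0)),
        ("import_", "msvc-8.0", datetime(2007, 3, 9, 4, 0)),
        ("import_", "msvc-8.0", datetime(2007, 3, 11, 4, 0)),
        ("import_", "cw-9.4", datetime(2007, 3, 8, 4, 0)),
    ]

    # Keep only the most recent run per (test, toolset) pair.
    latest = {}
    for test, toolset, stamp in failures:
        key = (test, toolset)
        if key not in latest or stamp > latest[key]:
            latest[key] = stamp

    print("raw report entries:", len(failures))                  # 6
    print("distinct failing test/toolset pairs:", len(latest))   # 3

With such a collapse, the headline count would reflect distinct failing test/toolset combinations rather than however many runs happened to report in.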

Stefan,
Stefan Seefeld wrote:
A fix went in over a week ago. Instead of having a row of results in the test matrix showing me 24 hours later whether all is well now, results are (for a number of reasons) trickling in one at a time.
While turnaround times like ours are not desirable, I can't see why they make the whole system unreliable/unusable.
Also, looking at the above descriptors, there are multiple entries for a number of toolkits, suggesting the number of failures reported is only barely correlated to the actual failures (and quite likely doesn't correspond to the current state of affairs) either.
This again is more a problem of what you read into the numbers than of the numbers themselves being incorrect. Interestingly, zero does not have this issue of interpretation.
Thus my question: does anybody actually care about these numbers?
Let me put it this way: I do, because they are the only numbers we have. Don't get me wrong, these are all valid points with respect to usability. They just don't prove that the system is broken. Thomas -- Thomas Witt witt@acm.org

Thomas Witt wrote:
Stefan,
Stefan Seefeld wrote:
A fix went in over a week ago. Instead of having a row of results in the test matrix showing me 24 hours later whether all is well now, results are (for a number of reasons) trickling in one at a time.
While turnaround times like ours are not desirable, I can't see why they make the whole system unreliable/unusable.
It's not the turnaround time per se. It's that in any given report there are test runs that don't reflect the same state of the code, as they are run against (sometimes wildly) different revisions of the code. Of course I can figure out the exact time a fix went in and then mentally mask those runs that were run prior to that, but that's all stuff that the regression harness could do much better and much more reliably.
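
The masking Stefan describes could be automated along these lines. A minimal Python sketch, assuming each test run records the checkout time of the sources it tested and that the commit time of the fix is known; all toolset/time values below are made up for illustration. It separates runs that include the fix from stale runs that should be masked or flagged.

    from datetime import datetime

    # Assumed commit time of the fix; illustrative only.
    FIX_COMMITTED = datetime(2007, 3, 5, 12, 0)

    # Hypothetical run records: (toolset, time the tested sources were checked out).
    runs = [
        ("msvc-7.1",        datetime(2007, 3, 4, 2, 0)),   # checked out before the fix
        ("msvc-8.0",        datetime(2007, 3, 6, 2, 0)),   # includes the fix
        ("gcc-mingw-3.4.5", datetime(2007, 3, 11, 2, 0)),  # includes the fix
    ]

    current = [t for t, checkout in runs if checkout >= FIX_COMMITTED]
    stale   = [t for t, checkout in runs if checkout < FIX_COMMITTED]

    print("runs that reflect the fix:", current)
    print("stale runs to mask or flag:", stale)

A report built this way could grey out or annotate results whose checkout predates the most recent relevant commit, instead of leaving that bookkeeping to the reader.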
Also, looking at the above descriptors, there are multiple entries for a number of toolkits, suggesting the number of failures reported is only barely correlated to the actual failures (and quite likely doesn't correspond to the current state of affairs) either.
This again is more a problem of what you read into the numbers than of the numbers themselves being incorrect. Interestingly, zero does not have this issue of interpretation.
Right, the failure count isn't symmetrically distributed around some mean value; zero is an absorbing boundary condition. :-)
Thus my question: does anybody actually care about these numbers?
Let me put it this way: I do, because they are the only numbers we have.
Don't get me wrong, these are all valid points with respect to usability. They just don't prove that the system is broken.
How much does it take to prove that? Regards, Stefan -- ...ich hab' noch einen Koffer in Berlin...
participants (3)
- Douglas Gregor
- Stefan Seefeld
- Thomas Witt