
On 8 Aug 2007, at 17:01, David Abrahams wrote:
This part of my analysis focuses on the tools available for getting feedback from the system about what's broken. Once again, because there's been substantial effort invested in dart/cmake/ctest and interest expressed by Kitware in supporting our use thereof, I'm including that along with our current mechanisms. Although not strictly a reporting system, I'll also discuss BuildBot a bit because Rene has been doing some research on it and it has some feedback features.
I've struggled to create a coherent organization to this post, but it still rambles a little, for which I apologize in advance.
Feedback Systems
================
Boost's feedback system has evolved some unique and valuable features.
Unique Boost Features
---------------------
* Automatic distinction of regressions from new failures.
* A markup system that allows us to distinguish library bugs from compiler bugs and add useful, detailed descriptions of severity and consequences. This feature will continue to be important at *least* as long as widely-used compilers are substantially nonconforming.
* Automatic detection of tests that had been failing due to toolset limitations but begin passing without a known explanation.
* A summary page that shows only unresolved issues.
* A separate view encoding failure information in a way most appropriate for users rather than library developers.
While I acknowledge that Boost's feedback system has substantial weaknesses, no other feedback system I've seen accommodates most of these features in any way.
I agree. I've had numerous experiences with large projects that have not done it as well as Boost. Personally I find the status information held by meta-comm useful and informative. The opening page isn't very helpful, but digging in always leads to the information that is most useful.
Dart
----
It seems like Dart is a long, long way from being able to handle our display needs -- it is really oriented towards providing binary "is everything OK?" reports about the health of a project. It would actually be really useful for Boost to have such a binary view; it would probably keep us much closer to the "no failures on the trunk (or integration branch, if you prefer)" state that we hope to maintain continuously. However, I'm convinced our finer distinctions remain extremely valuable as well.
Other problems with Dart's dashboards (see http://public.kitware.com/dashboard.php?name=public):
* It is cryptic, rife with unexplained links and icons. Even some of the Kitware guys didn't know what a few of them meant when asked.
* Just like most of Boost's regression pages, it doesn't deal well with large amounts of data. One look at Kitware's main dashboard above will show you a large amount of information, much of which is useless for at-a-glance assessment, and the continuous and experimental build results are all at the bottom of the page.
Dart's major strength is that it maintains a database of past build results, so anyone can review the entire testing history.
BuildBot
--------
Buildbot is not really a feedback system; it's more a centralized system for driving testing. I will deal with that aspect of our system in a separate message.
Buildbot's display result (see http://twistedmatrix.com/buildbot/ for example) is no better suited to Boost's specific needs than Dart's, but it does provide one useful feature not seen in either of the other two systems: one can see, at any moment, what any of the test machines are doing. I know that's something Dart users want, and I certainly want it. In fact, as Rene has pointed out to me privately, the more responsive we can make the system, the more useful it will be to developers. His fantasy, and now mine, is that we can show developers the results of individual tests in real time.
Another great feature BuildBot has is an IRC plugin that insults the developer who breaks the build (http://buildbot.net/repos/release/docs/buildbot.html#IRC-Bot). Apparently the person who fixes the build gets to choose the next insult ;-)
Most importantly, BuildBot has a plugin architecture that would allow us to (easily?) customize feedback actions (http://buildbot.net/repos/release/docs/buildbot.html#Writing-New-Status-Plugins).
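For concreteness, a status plugin is just a Python class hooked into the build master. The sketch below is written from memory against the 0.7-era API in the docs linked above, so the class and method names should be checked there before anyone relies on them; everything Boost-specific is left as a comment.

    # Rough sketch only: names are from memory of the 0.7-era status API
    # and should be verified against the "Writing New Status Plugins" docs.
    from twisted.python import log
    from buildbot.status import base
    from buildbot.status.builder import FAILURE

    class BoostResultsNotifier(base.StatusReceiverMultiService):
        """Hypothetical plugin that reacts to every finished build."""

        def setServiceParent(self, parent):
            base.StatusReceiverMultiService.setServiceParent(self, parent)
            self.status = parent.getStatus()
            self.status.subscribe(self)

        def builderAdded(self, name, builder):
            # Returning self subscribes us to this builder's build events.
            return self

        def buildFinished(self, builderName, build, results):
            # This is where Boost-specific feedback would go: regression
            # vs. new-failure classification, markup lookup, notification.
            if results == FAILURE:
                log.msg("%s: build FAILED" % builderName)
            else:
                log.msg("%s: build ok" % builderName)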
Boost's Systems
---------------
The major problems with our current feedback systems, AFAICT, are fragility and poor user interface.
I probably don't need to make the case about fragility, but in case there are any doubts, visit http://engineering.meta-comm.com/boost-regression/CVS-HEAD/developer/index.build-index.html. For the past several days, it has shown a Python backtrace:
Traceback (most recent call last):
  File "D:\inetpub\wwwroots\engineering.meta-comm.com\boost-regression\handle_http.py", line 324, in ?
  ...
  File "C:\Python24\lib\zipfile.py", line 262, in _RealGetContents
    raise BadZipfile, "Bad magic number for central directory"
BadZipfile: Bad magic number for central directory
This is a typical problem, and the system breaks for one reason or another <subjective>on a seemingly weekly basis</subjective>.
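For illustration only: assuming the results really do arrive as zip archives, as the traceback suggests, the report generator could degrade gracefully on a corrupt upload instead of dying. The helper below is hypothetical; I haven't looked at how handle_http.py is actually structured.

    import sys
    import zipfile

    def read_results_archive(path):
        # Return {member name: contents} for one uploaded archive, or None
        # if the archive is corrupt, so one bad upload only costs a warning
        # instead of a traceback on every page view.
        try:
            archive = zipfile.ZipFile(path)
        except zipfile.BadZipfile:
            sys.stderr.write("warning: skipping corrupt upload %s\n" % path)
            return None
        try:
            return dict((name, archive.read(name))
                        for name in archive.namelist())
        finally:
            archive.close()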
With respect to the UI, although substantial effort has been invested (for which we are all very grateful), managing that amount of information is really hard, and we need to do better. Some of the current problems were described in this thread <http://tinyurl.com/2w7xch> and <http://tinyurl.com/2n4usf>; here are some others:
* The front page is essentially empty, showing little or no useful information <http://engineering.meta-comm.com/boost-regression/boost_1_34_1/developer/index.html>
* Summary tables have a redundant list of libraries at left (it also appears in a frame immediately adjacent)
* Summaries and individual library charts present way too much information to be called "summaries", overwhelming any reasonably-sized browser pane. We usually don't need a square for every test/platform combination
* It's hard to answer simple questions like "what is the status of Boost.Python under gcc-3.4?" or "how well does MPL work on Windows with STLPort?", or what is the list of
* A few links are cryptic (Full view/Release view) and could be better explained.
The email system that notifies developers when their libraries are broken seems to be fairly reliable. Its major weakness is that it reports all failures (even those that aren't regressions) as regressions, but that's a simple wording change. Its second weakness is that it has no way to harass the person who actually made the code-breaking checkin, and harasses the maintainer of every broken library just as aggressively, even if the breakage is due to one of the library's dependencies.
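To make the second point concrete: finding whom to notify first is mostly a matter of asking Subversion who committed between the last clean run and the first broken one. Only the "svn log --xml" command line below is a real interface; the surrounding glue is hypothetical.

    import subprocess
    from xml.dom import minidom

    def committers_between(repo_url, last_good, first_bad):
        # Return the set of svn authors for revisions (last_good, first_bad].
        # These are the people the notifier should nag first, before falling
        # back to the maintainers of the broken libraries.
        cmd = ["svn", "log", "--xml",
               "-r", "%d:%d" % (last_good + 1, first_bad), repo_url]
        xml_text = subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()[0]
        doc = minidom.parseString(xml_text)
        return set(node.firstChild.data
                   for node in doc.getElementsByTagName("author"))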
Recommendations
---------------
Our web-based regression display system needs to be redesigned and rewritten. It evolved from a state where we had far fewer libraries, platforms, and testers, and is burdened with UI ideas that only work in that smaller context. I suggest we start with as minimal a display as we think we can get away with: the front status reporting page should be both useful and easily grasped.
IMO the logical approach is to do this rewrite as a Trac plugin, because of the obvious opportunities to integrate test reports with other Trac functions (e.g. linking error messages to the source browser, changeset views, etc.), because the Trac database can be used to maintain the kind of history of test results that Dart manages, and because Trac contains a nice built-in mechanism for generating/displaying reports of all kinds. In my conversations with the Kitware guys, when we've discussed how Dart could accommodate Boost's needs, I've repeatedly pushed them in the direction of rebuilding Dart as a Trac plugin, but I don't think they "get it" yet.
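For a sense of scale, the skeleton of such a plugin is small. The sketch below assumes the 0.11-style component conventions; the URL, the template name, and everything Boost-specific are placeholders.

    from trac.core import Component, implements
    from trac.web.api import IRequestHandler

    class BoostTestReport(Component):
        """Hypothetical plugin serving a test-results page inside Trac."""
        implements(IRequestHandler)

        # IRequestHandler methods
        def match_request(self, req):
            # Claim a URL of our own under the Trac site.
            return req.path_info == '/testreport'

        def process_request(self, req):
            # A real plugin would pull results out of the Trac database and
            # hand them to a template; both names here are made up.
            data = {'results': []}
            return 'boost_testreport.html', data, None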
I have some experience writing Trac plugins and would be willing to contribute expertise and labor in this area. However, I know that we also need some serious web-UI design, and many other people are much more skilled in that area than I am. I don't want to waste my own time doing badly what others could do well and more quickly, so I'll need help.
Yes, I realize this raises questions about how test results will actually be collected from testers; I'll try to deal with those in a separate posting.
Generally I agree with all the recommendations. However, I am a big fan of incremental delivery and I would advocate that Boost approach this systemically. You don't want to get into the tool business (avoid the anecdotal "why fix things in 5 minutes when I can take a year writing a tool to automate it!" :-{). For what it is worth, my advice would be to do the following:

1. Choose two or three representative tool-chain/platform combinations as Boost 'reference models': (msvc-M.N, Win XP X.Y, ...), (gcc-N.M, Debian, ...), (gcc-N.M, Mac OS X, ...). The choices should be based on what's right 'for the masses' and what the de facto platform for mainstream development is on each. Before anyone screams: I am seriously NOT advocating dropping the builds on the other platforms (read on). Whatever the choices end up being, I believe Boost needs to make a clear policy decision.

2. These 'reference models' become the basis of top-level summary reports against the 'stable' released libraries. That can go on a single page, and it should take a minor amount of time to generate incrementally from the existing system.

3. As for tracking individual test results, I don't personally see what's wrong with putting these under Subversion. Given the likelihood of high commonality between the output text of successive runs, I think it is a much better 'implementation choice' than strictly a database. Certainly XML output from the test framework would aid other post-processing, but that can be a secondary step/enhancement to Boost.Test. Also, there is a strong correlation between the versioning of test results and the changes since the last run that changed the results. Some relatively trivial automation of the source dependency tree changes between successive runs of individual tests could be a significant aid for the authors/maintainers. I'm not an expert on bjam, but I presume that for an individual target it would not be difficult to run a diff between the sources used in successive invocations of each test. (A rough sketch of the idea follows at the end of this message.)

4. Given the reference models above, it would then be sensible to show the status of successive tiers of the Boost project, i.e. stable, development, sandbox, ... Again, an indirection at the top level will make this accessible.

5. Beyond this, I would split the summaries out into platform variants on individual pages: 'boost on windows', 'boost on linux', etc. In this way no information is lost and the community of developers is taken care of.

Hope this helps. As things scale there is a stronger need for 'standardization'; it's unavoidable. Tool-chains are rarely the silver bullet. What Boost has shouldn't be neglected... it is already good for reporting status, and its failings can be worked on incrementally.

Andy
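(A rough sketch for point 3 above, to make the idea concrete: each test's output lands in a Subversion working copy and is committed after every run, so successive runs are stored as cheap deltas and "svn diff" shows exactly what changed since the last run. Paths and helper names are hypothetical; only the svn command lines are real.)

    import os
    import subprocess

    RESULTS_WC = "/var/boost-results"   # a Subversion working copy

    def record_run(test_name, output_text):
        # Write one test's output into the working copy and commit it.
        path = os.path.join(RESULTS_WC, test_name + ".txt")
        is_new = not os.path.exists(path)
        f = open(path, "w")
        f.write(output_text)
        f.close()
        if is_new:
            subprocess.call(["svn", "add", path])
        subprocess.call(["svn", "commit", "-m",
                         "result for %s" % test_name, path])

    def what_changed(test_name):
        # Show how this test's output differs from the previous run.
        path = os.path.join(RESULTS_WC, test_name + ".txt")
        subprocess.call(["svn", "diff", "-r", "PREV:COMMITTED", path])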
--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com