Release Tools Analysis Part I: Build systems

This is part 1 of my analysis of our prospects for improving the Boost release process by improving the tools. I'll post something based on this to the wiki after y'all give me an earful about what I got wrong or overlooked.

Build System
============

I'm only considering BBv2 and cmake here, because we currently use the former, Doug and Troy have been putting lots of effort into the latter, and Kitware has volunteered to support it for us.

Cmake has lots of advantages over BBv2 for developers (e.g. instant build startup time, use of your favorite development environment) and packagers (e.g. one-line installer target declaration), but these factors are more important for day-to-day development than they are for testing. Instant build startup time would improve test turnaround time, of course, but build startup does not dominate our testing time yet. BBv2 also has overall speed/memory-consumption problems. These might be addressable using the Boehm GC, but the last attempt resulted in crashes (more investigation needed).

BBv2 has a few advantages over Cmake for testing:

1. BBv2 can build with multiple toolsets and/or variants in one invocation. That, however, is trivially overcome by using a wrapper script over cmake (a rough sketch appears below).

2. BBv2 serializes the output of simultaneous tests, which makes it possible to run tests in parallel with -jN. This is really important for test turnaround time. Even uniprocessors benefit from a low N (like 2) in my experience, due to disk/cpu tradeoffs. The IBM testers are currently using -j16 to great effect on their big iron.

   However, output serialization is only important because both Boost's process_jam_log and ctest/dart currently use a parse-the-build-log strategy to assemble the test results. This approach is hopelessly fragile in my opinion. Instead, each build step in a test should generate an additional target that contains the fragment of XML pertaining to that test, and the complete result should be an XML file that is the catenation of all the fragments. Rene is working on that feature for BBv2. I don't expect it would take more than a couple of days to add this feature to either build system, so the advantage of serialization may be easily neutralized.

3. Platform/compiler-independent build specification. This one seems pretty important on the face of it. If library authors' tests won't run on platforms to which they don't have direct access, we'll need to find people to port and maintain test specifications for the other platforms. On the other hand, very few tests are doing anything fancy when it comes to build configuration. To understand the real impact of this BBv2 feature, someone would have to objectively analyze the cmake build specifications Doug and Troy have been working on for complexity and portability. If either of those guys can be objective, it would be fastest if they'd do the analysis.

4. Cmake's built-in programming language is even more horrible, IMO, than bjam's. I bet you get used to it, like anything else, but it does present a barrier to entry. On the other hand, it's my impression that most programming jobs can be done more simply and directly under Cmake because it lacks BBv2's multiple-toolset/variant builds, and so has no need for virtual targets.

Neither system can trace #include dependencies through macros (e.g. #include SOME_MACRO(xyz)), which occasionally leads to inaccurate results from incremental testing. This is an important problem that should be fixed.
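Something along these lines is all the wrapper for point 1 needs to be (a rough sketch only; the toolset-to-compiler mapping and directory layout are invented for illustration):

    import os
    import subprocess

    # Hypothetical toolset-to-compiler mapping; a real wrapper would read
    # this from the tester's configuration.
    TOOLSETS = {
        "gcc-4.1": "g++-4.1",
        "gcc-3.4": "g++-3.4",
    }

    def run_all(source_dir):
        for toolset, compiler in TOOLSETS.items():
            build_dir = os.path.join("build", toolset)
            if not os.path.isdir(build_dir):
                os.makedirs(build_dir)
            # One ordinary out-of-source configure/build/test cycle per toolset.
            subprocess.check_call(
                ["cmake", "-DCMAKE_CXX_COMPILER=" + compiler, source_dir],
                cwd=build_dir)
            subprocess.check_call(["make", "-j2"], cwd=build_dir)
            subprocess.check_call(["ctest"], cwd=build_dir)

    if __name__ == "__main__":
        run_all(os.path.abspath("."))

Each toolset gets its own out-of-source build directory, so results stay separated much as they do in BBv2's per-variant build directories.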
Based on the above points, I'd say either build system has the potential to work for Boost's testing needs, and neither one is substantially closer to being ideal for us (for testing). Since our systems use BBv2 now, it makes sense to concentrate effort there. However, if cmake were to suddenly acquire the XML fragment generation, and the analysis of point 3 above showed that portable build specification is easily accomplished in cmake or not very important to our testing needs, I'd probably vote the other way.

-- 
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com

David Abrahams wrote:
4. Cmake's built-in programming language is even more horrible, IMO, than bjam's. I bet you get used to it, like anything else, but it does present a barrier to entry. On the other hand, it's my impression that most programming jobs can be done more simply and directly under Cmake because it lacks BBv2's multiple-toolset/variant builds, and so has no need for virtual targets.
Note that with 1.34.1 out, I plan to revive the experimental BBv2/Python branch that will allow Jamfiles to be written in Python. With a widely known programming language, we can more freely ask users to write procedural code whenever the built-in mechanisms are not sufficient, so that should simplify usage overall.

- Volodya

David Abrahams wrote:
Neither systems can trace #include dependencies through macros (e.g. #include SOME_MACRO(xyz)), which occasionally leads to inaccurate results from incremental testing. This is an important problem that should be fixed.
FWIW, Wave can be used to generate a complete list of dependencies (full names of included files). This requires knowledge of all the predefined system-dependent macros, though.

Regards
Hartmut

On 8/4/07, Hartmut Kaiser <hartmut.kaiser@gmail.com> wrote:
David Abrahams wrote:
Neither systems can trace #include dependencies through macros (e.g. #include SOME_MACRO(xyz)), which occasionally leads to inaccurate results from incremental testing. This is an important problem that should be fixed.
FWIW, Wave can be used to generate a complete list of dependencies (full names of included files). This requires knowledge of all the predefined system-dependent macros, though.
I was working on a tool that would be able to do that with Wave. But from time to time I get very busy and then later I get some spare time, so whether I have time to continue it is kind of random. I was trying to make the tool very flexible, but for what we need now, maybe just keeping it *very* simple is already good enough.
-- Felipe Magno de Almeida

Hartmut Kaiser wrote:
David Abrahams wrote:
Neither systems can trace #include dependencies through macros (e.g. #include SOME_MACRO(xyz)), which occasionally leads to inaccurate results from incremental testing. This is an important problem that should be fixed.
FWIW, Wave can be used to generate a complete list of dependencies (full names of included files). This requires knowledge of all the predefined system-dependent macros, though.
Maybe it would be worthwhile, then, to make Wave the default preprocessor for Boost, i.e. to change the build system to take preprocessed files as the input for generating object files. At least this would prevent errors resulting from the fact that the implicit dependencies are generated by a different tool (wave) than the compiler's default preprocessor (unless you think this is a non-issue).

The disadvantages of this approach would be:

* increased compile times
* much more disk storage needed
* precompiled headers would probably no longer be available

cheers,

aa

-- 
Andreas Ames | Programmer | Comergo GmbH | ames AT avaya DOT com

on Mon Aug 06 2007, "Ames, Andreas (Andreas)" <ames-AT-avaya.com> wrote:
Maybe it would be worth then, to make wave the default preprocessor for boost, i.e. to change the build system to take preprocessed files as the input to generate object files. At least this would prevent errors resulting from the fact that the implicit dependencies are generated from another tool (wave) than the compiler's default preprocessor (unless you think this is a non-issue).
The disadvantages of this approach would be:
* increased compile times
* much more disk storage needed
* precompiled headers would probably no longer be available
This idea is a non-starter in my opinion, for those reasons and others. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com

David Abrahams said: (by the date of Sat, 04 Aug 2007 17:42:59 -0400)
4. Cmake's built-in programming language is even more horrible, IMO,
Hello,

Did you consider scons? It's a build system which uses python as its language, so it's not horrible, and it's pretty flexible at the same time.

http://www.scons.org/

Due to my current lack of time I won't advocate scons here too much. I'll just say what springs to mind now: it works well on all platforms, from MSVC to gcc on IRIX, and Doom 3 development was done with scons. I'll leave the other praises out; please just test it yourself, if you wish.
Neither systems can trace #include dependencies through macros (e.g. #include SOME_MACRO(xyz)), which occasionally leads to inaccurate results from incremental testing. This is an important problem that should be fixed.
For instance, writing that in python should be possible. I'm not good at python, but I can imagine writing this sort of thing.

best regards
-- 
Janek Kozicki

on Sun Aug 05 2007, Janek Kozicki <janek_listy-AT-wp.pl> wrote:
David Abrahams said: (by the date of Sat, 04 Aug 2007 17:42:59 -0400)
Hello,
4. Cmake's built-in programming language is even more horrible, IMO,
Did you consider scons?
Yes, many times. It's not under consideration now mostly because nobody is offering to rewrite all of Boost's build instructions in terms of Scons and support the features we need in Scons, as Kitware has offered to do for Cmake.
It's a build system which uses python as a language. Therefore it's not horrible, and pretty flexible at the same time.
Yes, I'm familiar (though not intimately) with it. Steven Knight and I had some extensive and enjoyable discussions about build system design at PyCon several years ago. I'm not aware of much of Scons' recent evolution and features, though, and of course the user guide says "the SCons documentation isn't always kept up-to-date with the available features. In other words, there's a lot that SCons can do that isn't yet covered in this User's Guide." That makes it hard to evaluate.
Neither systems can trace #include dependencies through macros (e.g. #include SOME_MACRO(xyz)), which occasionally leads to inaccurate results from incremental testing. This is an important problem that should be fixed.
For instance writing that with python should be possible. I'm not good at python, but I can imagine writing this sort of thing.
I can't imagine that a C++ preprocessor written in python could be as efficient as one written in C++, and it's certainly a waste of programmer resources, since we have Wave. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com

David Abrahams wrote:
Neither systems can trace #include dependencies through macros (e.g. #include SOME_MACRO(xyz)), which occasionally leads to inaccurate results from incremental testing. This is an important problem that should be fixed.

For instance writing that with python should be possible. I'm not good at python, but I can imagine writing this sort of thing.
I can't imagine that a C++ preprocessor written in python could be as efficient as one written in C++, and it's certainly a waste of programmer resources, since we have Wave.
FWIW, I submitted a patch to SCons providing a python extension module around ucpp (http://pornin.nerim.net/ucpp/) to provide a 'more accurate' C/C++ dependency scanner, long before wave was around. Steven refused that, believing that it could (and should) all be done in pure python. That was many years ago, and the situation hasn't substantially changed since then. Regards, Stefan -- ...ich hab' noch einen Koffer in Berlin...

David Abrahams said: (by the date of Sun, 05 Aug 2007 09:37:57 -0400)
Did you consider scons?
Yes, many times. It's not under consideration now mostly because nobody is offering to rewrite all of Boost's build instructions in terms of Scons and support the features we need in Scons, as Kitware has offered to do for Cmake.
Maybe it would be an interesting task for GSoC 2008? Maybe the scons team would agree to co-mentor a student who would volunteer to rewrite the Boost build instructions in terms of scons.

-- 
Janek Kozicki

Janek Kozicki wrote:
David Abrahams said: (by the date of Sun, 05 Aug 2007 09:37:57 -0400)
Did you consider scons?
Yes, many times. It's not under consideration now mostly because nobody is offering to rewrite all of Boost's build instructions in terms of Scons and support the features we need in Scons, as Kitware has offered to do for Cmake.
maybe it should be an interesting task for GSoC 2008 ?
Maybe scons team would agree to co-mentor a student that would volunteer to rewrite boost build instructions in terms of scons.
I think a proof of concept for selected parts of Boost would be much more feasible within only three months.

FWIW, I think one of the most obviously lacking parts of scons compared to Boost.Build is the lack of 'features', i.e. tool-independent (command-line) option descriptors. (IMHO, the build-file language, i.e. python vs. 'bjam-script', outweighs this disadvantage. Just consider the ease of adding a new build tool to scons, for example.)

cheers,

aa

-- 
Andreas Ames | Programmer | Comergo GmbH | ames AT avaya DOT com

Ames, Andreas (Andreas) wrote:
FWIW, I think one of the most obviously lacking parts of scons compared to Boost.Build is the lack of 'features', i.e. tool independent (command line) option descriptors. (IMHO, the build file language, i.e. python -><- 'bjam-script', outweighs this disadvantage. Just consider the ease of adding a new build tool to scons, for example.)
I would phrase it differently: SCons lacks a robust structure in its definition of 'tools' (there is no intermediate language to define features in a tool-independent way), making it rather hard to maintain and develop. That this lack of structure makes it seemingly easy to add new tools is a side-effect of that deficiency. But we are getting off-topic. This belongs more and more on the scons developer's list... Regards, Stefan -- ...ich hab' noch einen Koffer in Berlin...

Stefan Seefeld wrote:
SCons lacks a robust structure in its definition of 'tools' (there is no intermediate language to define features in a tool-independent way), making it rather hard to maintain and develop. That this lack of structure makes it seemingly easy to add new tools is a side-effect of that deficiency.
I beg to differ. It just makes simple things easy, and complicated things possible. You often have to integrate (possibly self-written) tools into your build process that are considerably less complicated than C compilers. Just think of the thousands of version generators used out there.

-- 
Andreas Ames | Programmer | Comergo GmbH | ames AT avaya DOT com
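For a sense of scale, registering such a simple generator in an SConstruct takes only a few lines (a sketch only; the builder name, action, and file names are invented):

    # SConstruct sketch: a trivial version-header generator registered as a
    # new builder.
    def generate_version(target, source, env):
        # Write a header recording the first line of the source file.
        version = open(str(source[0])).readline().strip()
        out = open(str(target[0]), "w")
        out.write('#define BUILD_VERSION "%s"\n' % version)
        out.close()
        return 0  # zero means success

    env = Environment()
    env.Append(BUILDERS={
        "VersionHeader": Builder(action=generate_version,
                                 suffix=".h", src_suffix=".txt")})

    # Rebuilds version.h only when version.txt changes.
    env.VersionHeader("version.h", "version.txt")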

David Abrahams wrote:
However, output serialization is only important because both Boost's process_jam_log and ctest/dart currently use a parse-the-build-log strategy to assemble the test results. This approach is hopelessly fragile in my opinion. Instead, each build step in a test should generate an additional target that contains the fragment of XML pertaining to that test, and the complete result should be an XML file that is the catenation of all the fragments. Rene is working on that feature for BBv2. I don't expect it would take more than a couple of days to add this feature to either build system, so the advantage of serialization may be easily neutralized.
There may be some semantics here that I'm missing, but I think ctest is already doing exactly what you describe:

* ctest checks the process return value from each test. A nonzero value equals test failure (unless the test is marked as an expected failure).

* ctest (optionally) parses the test's output (stdout/stderr) using a configurable regular expression. If there is a match, the test fails. The intent is to catch errors reported by some standard logging mechanism, if such a mechanism exists, e.g. messages of the form "CRITICAL ERROR: blah blah blah ...". You can disable this feature if you don't want it.

* ctest produces an XML file that describes the results of each test, including pass/fail, execution time, and test output (stdout/stderr).

* ctest also parses the test output (stdout/stderr) for <NamedMeasurement> tags that are incorporated into the final XML. Tests can use this mechanism to pass internally-generated metrics into the test output in an unambiguous way.

* ctest (optionally) uploads the final concatenated XML to a dart server where it can be displayed using a web browser.

I've attached a very short sample of ctest output XML from a real-world project. It's been trimmed down to a single test case; normally this file contains hundreds of tests.

Regards,
Tim Shead

on Sun Aug 05 2007, "Timothy M. Shead" <tshead-AT-k-3d.com> wrote:
David Abrahams wrote:
However, output serialization is only important because both Boost's process_jam_log and ctest/dart currently use a parse-the-build-log strategy to assemble the test results. This approach is hopelessly fragile in my opinion. Instead, each build step in a test should generate an additional target that contains the fragment of XML pertaining to that test, and the complete result should be an XML file that is the catenation of all the fragments. Rene is working on that feature for BBv2. I don't expect it would take more than a couple of days to add this feature to either build system, so the advantage of serialization may be easily neutralized.
There may be some semantics here that I'm missing, but I think ctest is already doing exactly what you describe:
Bill Hoffman (of Kitware) himself told me that parallel builds don't work with ctest because of output interleaving issues. Was he mistaken?
* ctest checks the process return value from each test. A nonzero value equals test failure (unless the test is marked as an expected failure).
* ctest (optionally) parses the test's output (stdout/stderr) using a configurable regular expression. If there is a match, the test fails. The intent is to catch errors reported by some standard logging mechanism, if such a mechanism exists, e.g. messages of the form "CRITICAL ERROR: blah blah blah ...". You can disable this feature if you don't want it.
All of the above is irrelevant to the question at hand, I think.
* ctest produces an XML file that describes the results of each test, including pass/fail, execution time, and test output (stdout/stderr).
A separate XML file for each test? Can it generate these XML files if the tests are run in parallel? That's the key (relevant) issue. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com

David Abrahams wrote:
on Sun Aug 05 2007, "Timothy M. Shead" <tshead-AT-k-3d.com> wrote:
* ctest produces an XML file that describes the results of each test, including pass/fail, execution time, and test output (stdout/stderr).
A separate XML file for each test? Can it generate these XML files if the tests are run in parallel? That's the key (relevant) issue.
Ah, that's the part that I missed - point well taken. Cheers, Tim

on Sun Aug 05 2007, David Abrahams <dave-AT-boost-consulting.com> wrote:
on Sun Aug 05 2007, "Timothy M. Shead" <tshead-AT-k-3d.com> wrote:
David Abrahams wrote:
However, output serialization is only important because both Boost's process_jam_log and ctest/dart currently use a parse-the-build-log strategy to assemble the test results. This approach is hopelessly fragile in my opinion. Instead, each build step in a test should generate an additional target that contains the fragment of XML pertaining to that test, and the complete result should be an XML file that is the catenation of all the fragments. Rene is working on that feature for BBv2. I don't expect it would take more than a couple of days to add this feature to either build system, so the advantage of serialization may be easily neutralized.
There may be some semantics here that I'm missing, but I think ctest is already doing exactly what you describe:
Bill Hoffman (of Kitware) himself told me that parallel builds don't work with ctest because of output interleaving issues. Was he mistaken?
Bill just sent me this clarification privately: they can handle parallel *builds* (with some caveats I didn't really understand that made it sound somewhat unreliable), just not parallel (runtime) *tests*. Why these two kinds of action should be treated so differently is a bit of a mystery to me.

-- 
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com
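For what it's worth, the XML-fragment approach described earlier treats the two cases uniformly: each build or run step writes its own result file and a final step catenates them, so the finishing order of parallel steps doesn't matter. A rough sketch (the fragment format and file layout are invented for illustration):

    import glob
    import os
    import subprocess
    from multiprocessing import Pool
    from xml.sax.saxutils import escape

    # Hypothetical test commands; in practice the build system supplies these.
    TESTS = {
        "array_test": ["./array_test"],
        "tuple_test": ["./tuple_test"],
    }

    def run_one(item):
        name, command = item
        proc = subprocess.run(command, capture_output=True, text=True)
        # Each step writes its own fragment, so there is no shared log to garble.
        with open(os.path.join("results", name + ".xml"), "w") as out:
            out.write('<test name="%s" result="%s">%s</test>\n'
                      % (name, "pass" if proc.returncode == 0 else "fail",
                         escape(proc.stdout + proc.stderr)))

    def catenate(report="report.xml"):
        with open(report, "w") as out:
            out.write("<tests>\n")
            for fragment in sorted(glob.glob("results/*.xml")):
                out.write(open(fragment).read())
            out.write("</tests>\n")

    if __name__ == "__main__":
        if not os.path.isdir("results"):
            os.makedirs("results")
        Pool(4).map(run_one, TESTS.items())  # run tests in parallel
        catenate()

Whether the steps are compilations or test runs makes no difference to the collection side.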

On Aug 8, 2007, at 10:23 AM, David Abrahams wrote:
Bill just sent me this clarification privately: they can handle parallel *builds* (with some caveats I didn't really understand that made it sound somewhat unreliable), just not parallel (runtime) *tests*. Why these two kinds of action should be treated so differently is a bit of a mystery to me.
It's a different model. The CMake/CTest way of handling tests is to first do a full build of the source tree, including any additional test executables. CMake does the build, and it's exactly the same build that a user would do (although most users would opt not to build extra testing-only executables). That "build" step can be parallelized. CTest then runs the test suite, invoking those test executables and recording the results. This step is still serial.

It's not conclusive proof that parallel builds will always work, but while we were working on the CMake-based build system, I was using "make -j4" for the build step of nightly regression testing on a 2-core machine. It made a big difference in regression-testing time, and we never saw any builds broken or any output mangled.

- Doug

Am Mittwoch, 8. August 2007 16:23:58 schrieb David Abrahams:
Bill just sent me this clarification privately: they can handle parallel *builds* (with some caveats I didn't really understand that made it sound somewhat unreliable)
AFAIK the caveat is that cmake doesn't build anything itself; it just generates a bunch of files for the target build system. Thus building in parallel on unix flavors (including mac and msys on windows) is no problem, but e.g. nmake on windows can't (AFAIK).

Regards, Maik

On Aug 8, 2007, at 12:35 PM, Maik Beckmann wrote:
Am Mittwoch, 8. August 2007 16:23:58 schrieb David Abrahams:
Bill just sent me this clarification privately: they can handle parallel *builds* (with some caveats I didn't really understand that made it sound somewhat unreliable)
AFAIK the caveat is that cmake doesn't build anything itself; it just generates a bunch of files for the target build system. Thus building in parallel on unix flavors (including mac and msys on windows) is no problem, but e.g. nmake on windows can't (AFAIK).
Newer versions of Visual Studio can, however, parallelize the build. It actually gives a decent performance boost in the build, even on just a dual-core machine. - Doug

Doug Gregor wrote:
On Aug 8, 2007, at 12:35 PM, Maik Beckmann wrote:
Am Mittwoch, 8. August 2007 16:23:58 schrieb David Abrahams:
Bill just sent me this clarification privately: they can handle parallel *builds* (with some caveats I didn't really understand that made it sound somewhat unreliable)
AFAIK the caveat is that cmake doesn't build anything itself; it just generates a bunch of files for the target build system. Thus building in parallel on unix flavors (including mac and msys on windows) is no problem, but e.g. nmake on windows can't (AFAIK).
Newer versions of Visual Studio can, however, parallelize the build. It actually gives a decent performance boost in the build, even on just a dual-core machine.
AFAIK, Visual Studio only parallelizes at the project level, so if you have one project with lots of files, that project's build will not be parallelized. On the other hand, there is also the /MP compiler switch, but you need to pass all your files to a single invocation to take advantage of it.

Doug Gregor wrote:
On Aug 8, 2007, at 12:35 PM, Maik Beckmann wrote:
Am Mittwoch, 8. August 2007 16:23:58 schrieb David Abrahams:
Bill just sent me this clarification privately: they can handle parallel *builds* (with some caveats I didn't really understand that made it sound somewhat unreliable)

AFAIK the caveat is that cmake doesn't build anything itself; it just generates a bunch of files for the target build system. Thus building in parallel on unix flavors (including mac and msys on windows) is no problem, but e.g. nmake on windows can't (AFAIK).
Newer versions of Visual Studio can, however, parallelize the build. It actually gives a decent performance boost in the build, even on just a dual-core machine.
Am I the only one who questions the prudence of relying on N+1 make systems, instead of 1? -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

Maik Beckmann wrote:
Am Mittwoch, 8. August 2007 20:21:18 schrieb Rene Rivera:
Am I the only one who questions the prudence of relying on N+1 make systems, instead of 1?
Let others work out and maintain the details, I think.
Yes, that seems to be the ongoing rationale for Cmake. That doesn't remove the need for us to test that Boost builds and works correctly in all those other build systems. And when it doesn't work, users are not likely to bother Kitware; they will complain to us. We will then have to figure out whether it's a problem in our Cmake scripts or in Cmake itself, and then forward the problem on to Kitware.

-- 
-- Grafik - Don't Assume Anything
-- Redshift Software, Inc. - http://redshift-software.com
-- rrivera/acm.org - grafik/redshift-software.com
-- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

Am Mittwoch, 8. August 2007 21:26:28 schrieb Rene Rivera:
That doesn't remove the need for us to test that Boost builds and works correctly in all those other build systems. And when it doesn't work users are not likely going to bother Kitware, they will complain to us.
Well, this is valid for all software Boost uses but doesn't maintain. This is OK if it works well, IMHO. However, it's a good point!

Unfortunately Doug is in charge of Trac and Subversion, so I suggest waiting with further discussion of the pros and cons of CMake until those issues have stabilized and he has time to participate.

Best Regards, Maik

Maik Beckmann wrote:
Am Mittwoch, 8. August 2007 21:26:28 schrieb Rene Rivera:
That doesn't remove the need for us to test that Boost builds and works correctly in all those other build systems. And when it doesn't work users are not likely going to bother Kitware, they will complain to us.
Well, this is valid for all software boost uses and doesn't maintain. This is OK if it work well, IMHO.
Yes; in particular, we know we test considerably fewer platform-plus-build combinations than users actually deal with. And this is a problem I've mentioned before. We should consider not only the immediate features of the make system, but also how it impacts the overall development procedures. And I'm afraid there is a tendency to ignore how we deal with the libraries after we release them. I personally think that *any* meta-build system is detrimental to productivity. But it's not my decision; I just want to make sure others consider the consequences such a switch would have for their daily lives developing and maintaining their code.
However, its a good point!
Unfortunately Doug is in charge with Trac and Subversion, so I suggest to wait with further discussions on pros and cons about CMake until these issues are stabilized and he has time to participate.
Why? It's not a Cmake question; it's a problem with any meta-build system. I certainly hope others can see the general issues when considering a build system that directly runs the tools vs. a build system that generates files for other build systems.

-- 
-- Grafik - Don't Assume Anything
-- Redshift Software, Inc. - http://redshift-software.com
-- rrivera/acm.org - grafik/redshift-software.com
-- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

Maik Beckmann wrote:
Am Mittwoch, 8. August 2007 16:23:58 schrieb David Abrahams:
Bill just sent me this clarification privately: they can handle parallel *builds* (with some caveats I didn't really understand that made it sound somewhat unreliable)
AFAIK the caveat is that cmake doesn't build anything itself; it just generates a bunch of files for the target build system. Thus building in parallel on unix flavors (including mac and msys on windows) is no problem, but e.g. nmake on windows can't (AFAIK).
This is correct, but you can set up an environment that uses the windows port of GNU make with the "cl" command-line compiler from VS. Then parallel builds work correctly.

-Brad

On Aug 4, 2007, at 5:42 PM, David Abrahams wrote:
3. Platform/compiler-independent build specification. This one seems pretty important on the face of it. If library authors' tests won't run on platforms to which they don't have direct access, we'll need to find people to port and maintain test specifications for the other platforms.
On the other hand, very few tests are doing anything fancy when it comes to build configuration. To understand the real impact of this BBv2 feature, someone would have to objectively analyze the cmake build specifications Doug and Troy have been working on for complexity and portability. If either of those guys can be objective, it would be fastest if they'd do the analysis.
It's well short of an analysis, but I can give you my experiences with this. The CMake and BBv2 descriptions for building and testing libraries have nearly a 1-1 correspondence. Troy and I built a thin BBv2-like layer over CMake that gives it the kind of platform/compiler-independent build specifications we're used to in Boost. See, e.g., http://svn.boost.org/trac/boost/wiki/CMakeAddLibrary

BBv2 is far more concise when one needs to do complicated things, e.g., add a specific flag when compiling a certain source file for a shared library on compiler X. The same is possible with CMake, but it requires an "if" statement or two. In Boost's build files, this kind of thing didn't come up more than a handful of times, and it was always the same pattern.
4. Cmake's built-in programming language is even more horrible, IMO, than bjam's.
CMake's language is certainly more ugly.
I bet you get used to it, like anything else, but it does present a barrier to entry. On the other hand, it's my impression that most programming jobs can be done more simply and directly under Cmake because it lacks BBv2's multiple-toolset/variant builds, and so has no need for virtual targets.
I find CMake easier to work with, but the reason you give is not one of my reasons. It actually comes down to two things, for me.

(1) Reference documentation: All of CMake's macros are documented in one place, with a terse-but-sufficient style that works well for me. I find it very, very easy to zero in on the macro I want to get the job done.

(2) Error messages: this is particularly amusing coming from a C++ template meta-programmer, but BBv2's error messages kill me. Just like with template metaprogramming, BBv2 is bending the Jam language to something it wasn't meant to do, and when something goes wrong, you get an eyeful of backtraces that I find hard to decipher. CMake gives much more concise, more direct error messages, which has made it far easier for me to work with (even though I've spent less time, overall, with CMake than with BBv2).

I was hoping not to have the big CMake discussion now, because, frankly, all of my Boost time (and more) is going into the Subversion repository and Trac. I just can't dedicate enough time to this discussion to represent CMake well.

- Doug

David Abrahams wrote:
2. BBv2 serializes the output of simultaneous tests, which makes it possible to run tests in parallel with -jN. This is really important for test turnaround time. Even uniprocessors benefit from a low N (like 2) IME, due to disk/cpu tradeoffs. The IBM testers are currently using -j16 to great effect on their big iron.
However, output serialization is only important because both Boost's process_jam_log and ctest/dart currently use a parse-the-build-log strategy to assemble the test results. This approach is hopelessly fragile in my opinion. Instead, each build step in a test should generate an additional target that contains the fragment of XML pertaining to that test, and the complete result should be an XML file that is the catenation of all the fragments. Rene is working on that feature for BBv2. I don't expect it would take more than a couple of days to add this feature to either build system, so the advantage of serialization may be easily neutralized.
Agreed. Parse-the-build-log is a horrid stop-gap. "Hopelessly fragile" doesn't even begin to describe how easily it breaks. It was supposed to hold the line for a few months until BBv2 was ready. That was in October 2002.

The other point of parse-the-build-log was to refine what was needed in the XML files. That part has been pretty successful; we now know what the XML needs to convey. One element that is still missing from the XML is timings for each of the steps, particularly the compile and run steps. I remain convinced that both wall-clock and CPU-usage timings are an important element of high-quality testing and test management.
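Those timings slot naturally into the per-step XML fragments discussed earlier; each compile or run action could be wrapped roughly like this (a sketch only; the element names and command-line convention are invented):

    import os
    import subprocess
    import sys
    import time

    def timed_step(name, command, fragment_path):
        # Record wall-clock and CPU time (self + children) for one step.
        wall_start = time.time()
        cpu_start = sum(os.times()[:4])
        returncode = subprocess.call(command)
        wall = time.time() - wall_start
        cpu = sum(os.times()[:4]) - cpu_start
        out = open(fragment_path, "w")
        out.write('<step name="%s" result="%s" wall="%.2f" cpu="%.2f"/>\n'
                  % (name, "pass" if returncode == 0 else "fail", wall, cpu))
        out.close()
        return returncode

    if __name__ == "__main__":
        # e.g.: timed_step.py compile foo.compile.xml g++ -c foo.cpp
        sys.exit(timed_step(sys.argv[1], sys.argv[3:], sys.argv[2]))

Both numbers then travel in the same fragment that already carries the pass/fail result.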
3. Platform/compiler-independent build specification. This one seems pretty important on the face of it. If library authors' tests won't run on platforms to which they don't have direct access, we'll need to find people to port and maintain test specifications for the other platforms.
It isn't just that tests have to run on platforms the authors don't have direct access to, but also that the authors have no knowledge or desire to learn about build specifications on those platforms.
...
Based on the above points, I'd say either build system has the potential to work for Boost's testing needs and neither one is substantially closer to being ideal for us (for testing). Since our systems use BBv2 now, it makes sense to concentrate effort there. However, if cmake were to suddenly acquire the XML fragment generation and the analysis of point 3 above showed that portable build specification is easily accomplished in cmake or not very important to our testing needs, I'd probably vote the other way.
I'd say it isn't worth moving unless the new system has *very* significant advantages over what we have now. Robustness is really important to me, and it is going to be hard to convince me that any system that uses parse-the-build-log is robust. I see "portable build specification" and everything else that is part of hiding the details of the underlying platform-specific tools as being of critical importance.

Thanks for tackling this analysis!

--Beman

This part of my analysis focuses on the tools available for getting feedback from the system about what's broken. Once again, because there's been substantial effort invested in dart/cmake/ctest and interest expressed by Kitware in supporting our use thereof, I'm including that along with our current mechanisms. Although not strictly a reporting system, I'll also discuss BuildBot a bit because Rene has been doing some research on it and it has some feedback features. I've struggled to create a coherent organization for this post, but it still rambles a little, for which I apologize in advance.

Feedback Systems
================

Boost's feedback system has evolved some unique and valuable features.

Unique Boost Features
---------------------

* Automatic distinction of regressions from new failures.

* A markup system that allows us to distinguish library bugs from compiler bugs and add useful, detailed descriptions of severity and consequences. This feature will continue to be important at *least* as long as widely-used compilers are substantially nonconforming.

* Automatic distinction of tests that had been failing due to toolset limitations and begin passing without a known explanation.

* A summary page that shows only unresolved issues.

* A separate view encoding failure information in a way most appropriate for users rather than library developers.

While I acknowledge that Boost's feedback system has substantial weaknesses, no other feedback system I've seen accommodates most of these features in any way.

Dart
----

It seems like Dart is a long, long way from being able to handle our display needs -- it is really oriented towards providing binary "is everything OK?" reports about the health of a project. It would actually be really useful for Boost to have such a binary view; it would probably keep us much closer to the "no failures on the trunk (or integration branch, if you prefer)" state that we hope to maintain continuously. However, I'm convinced our finer distinctions remain extremely valuable as well.

Other problems with Dart's dashboards (see http://public.kitware.com/dashboard.php?name=public):

* It is cryptic, rife with unexplained links and icons. Even some of the Kitware guys didn't know what a few of them meant when asked.

* Just like most of Boost's regression pages, it doesn't deal well with large amounts of data. One look at Kitware's main dashboard above will show you a large amount of information, much of which is useless for at-a-glance assessment, and the continuous and experimental build results are all at the bottom of the page.

Dart's major strength is that it maintains a database of past build results, so anyone can review the entire testing history.

BuildBot
--------

BuildBot is not really a feedback system; it's more a centralized system for driving testing. I will deal with that aspect of our system in a separate message. BuildBot's display (see http://twistedmatrix.com/buildbot/ for example) is no better suited to Boost's specific needs than Dart's, but it does provide one useful feature not seen in either of the other two systems: one can see, at any moment, what any of the test machines are doing. I know that's something Dart users want, and I certainly want it. In fact, as Rene has pointed out to me privately, the more responsive we can make the system, the more useful it will be to developers. His fantasy, and now mine, is that we can show developers the results of individual tests in real time.
Another great feature BuildBot has is an IRC plugin that insults the developer who breaks the build (http://buildbot.net/repos/release/docs/buildbot.html#IRC-Bot). Apparently the person who fixes the build gets to choose the next insult ;-) Most importantly, BuildBot has a plugin architecture that would allow us to (easily?) customize feedback actions (http://buildbot.net/repos/release/docs/buildbot.html#Writing-New-Status-Plug...).

Boost's Systems
---------------

The major problems with our current feedback systems, AFAICT, are fragility and poor user interface. I probably don't need to make the case about fragility, but in case there are any doubts, visit

  http://engineering.meta-comm.com/boost-regression/CVS-HEAD/developer/index.b...

For the past several days, it has shown a Python backtrace:

  Traceback (most recent call last):
    File "D:\inetpub\wwwroots\engineering.meta-comm.com\boost-regression\handle_http.py", line 324, in ?
    ...
    File "C:\Python24\lib\zipfile.py", line 262, in _RealGetContents
      raise BadZipfile, "Bad magic number for central directory"
  BadZipfile: Bad magic number for central directory

This is a typical problem, and the system breaks for one reason or another <subjective>on a seemingly weekly basis</subjective>.

With respect to the UI, although substantial effort has been invested (for which we are all very grateful), managing that amount of information is really hard, and we need to do better. Some of the current problems were described in this thread <http://tinyurl.com/2w7xch> and <http://tinyurl.com/2n4usf>; here are some others:

* The front page is essentially empty, showing little or no useful information <http://engineering.meta-comm.com/boost-regression/boost_1_34_1/developer/index.html>

* Summary tables have a redundant list of libraries at left (it also appears in a frame immediately adjacent)

* Summaries and individual library charts present way too much information to be called "summaries", overwhelming any reasonably-sized browser pane. We usually don't need a square for every test/platform combination

* It's hard to answer simple questions, like "what is the status of Boost.Python under gcc-3.4?" or "how well does MPL work on windows with STLPort?", or what is the list of

* A few links are cryptic (Full view/Release view) and could be better explained.

The email system that notifies developers when their libraries are broken seems to be fairly reliable. Its major weakness is that it reports all failures (even those that aren't regressions) as regressions, but that's a simple wording change. Its second weakness is that it has no way to harass the person who actually made the code-breaking checkin, and harasses the maintainer of every broken library just as aggressively, even if the breakage is due to one of the library's dependencies.

Recommendations
---------------

Our web-based regression display system needs to be redesigned and rewritten. It evolved from a state where we had far fewer libraries, platforms, and testers, and is burdened with UI ideas that only work in that smaller context. I suggest we start with as minimal a display as we think we can get away with: the front status reporting page should be both useful and easily grasped.

IMO the logical approach is to do this rewrite as a Trac plugin, because of the obvious opportunities to integrate test reports with other Trac functions (e.g. linking error messages to the source browser, changeset views, etc.), because the Trac database can be used to maintain the kind of history of test results that Dart manages, and because Trac contains a nice built-in mechanism for generating/displaying reports of all kinds. In my conversations with the Kitware guys, when we've discussed how Dart could accommodate Boost's needs, I've repeatedly pushed them in the direction of rebuilding Dart as a Trac plugin, but I don't think they "get it" yet.

I have some experience writing Trac plugins and would be willing to contribute expertise and labor in this area. However, I know that we also need some serious web-UI design, and many other people are much more skilled in that area than I am. I don't want to waste my own time doing badly what others could do well and more quickly, so I'll need help.

Yes, I realize this raises questions about how test results will actually be collected from testers; I'll try to deal with those in a separate posting.

-- 
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com
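For concreteness, the skeleton of such a Trac plugin is small (a sketch only; the component, URL, and table names are invented, and the interfaces vary somewhat between Trac versions):

    from trac.core import Component, implements
    from trac.web.api import IRequestHandler

    class RegressionSummary(Component):
        # Hypothetical plugin: serve a summary of test results kept in an
        # (invented) test_results table in the Trac database.
        implements(IRequestHandler)

        # IRequestHandler methods
        def match_request(self, req):
            return req.path_info == "/regressions"

        def process_request(self, req):
            cursor = self.env.get_db_cnx().cursor()
            cursor.execute(
                "SELECT library, toolset, status FROM test_results")
            data = {"results": cursor.fetchall()}
            # 'regressions.html' would be a template shipped with the plugin.
            return "regressions.html", data, None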

David Abrahams wrote:
Boost's Systems ---------------
The major problems with our current feedback systems, AFAICT, are fragility and poor user interface.
100% agreement; it's no fault of metacomm: the current structure just can't cope with the volume of data generated these days.
Recommendations ---------------
Our web-based regression display system needs to be redesigned and rewritten. It was evolved from a state where we had far fewer libraries, platforms, and testers, and is burdened with UI ideas that only work in that smaller context. I suggest we start with as minimal a display as we think we can get away with: the front status reporting page should be both useful and easily-grasped.
IMO the logical approach is to do this rewrite as a Trac plugin, because of the obvious opportunities to integrate test reports with other Trac functions (e.g. linking error messages to the source browser, changeset views, etc.), because the Trac database can be used to maintain the kind of history of test results that Dart manages, and because Trac contains a nice built-in mechanism for generating/displaying reports of all kinds. In my conversations with the Kitware guys, when we've discussed how Dart could accommodate Boost's needs, I've repeatedly pushed them in the direction of rebuilding Dart as a Trac plugin, but I don't think they "get it" yet.
I have some experience writing Trac plugins and would be willing to contribute expertise and labor in this area. However, I know that we also need some serious web-UI design, and many other people are much more skilled in that area than I am. I don't want to waste my own time doing badly what others could do well and more quickly, so I'll need help.
Just thinking out loud here, but I've always thought that our test results should be collected in a database: something like each test getting an XML result file describing the test result, which then gets logged in the database. The display application would query the database, maybe in real time for specific queries, and present the results etc.

Thinking somewhat outside the box here... but could SVN be conscripted for this purpose? Yes, OK, I know it's an abuse of SVN, but basically our needs are quite simple:

* Log the output from each build step and store it somewhere. With incremental builds much of this information would rarely change; in fact even if the test is rebuilt/rerun, the chances are that the logged data won't actually change.

* Log the status of each test: pass or fail.

* Log the date and time of the last test.

So what would happen if the build logs for each test were stored in an SVN tree set aside for the purpose, with pass/fail and date/time status stored as SVN properties? Could this be automated, from within bjam or CMake or whatever?

Of course we're in serious danger of getting into the tool-writing business again here....

John.
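Automating that from a test driver would amount to little more than shelling out to the svn client, roughly like this (a sketch; the property names and layout are invented, and the log file is assumed to already be under version control):

    import subprocess
    import time

    def record_result(log_path, passed):
        # The log file is assumed to already be added to the results
        # working copy; the test:* property names are made up.
        status = "pass" if passed else "fail"
        stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
        subprocess.check_call(["svn", "propset", "test:status", status, log_path])
        subprocess.check_call(["svn", "propset", "test:time", stamp, log_path])
        subprocess.check_call(["svn", "commit", "-m",
                               "test result for " + log_path, log_path])

In practice one commit per test run, rather than per test, would keep the server load reasonable.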

John Maddock wrote:
Of course we're in serious danger of getting into the tool writing business again here ....
I have been suggesting that we look into QMTest (http://www.codesourcery.com/qmtest, which I happen to maintain) to drive the regression testing. It allows test results to be stored in databases, and it keeps track of how test results evolve over time.

In fact, one of the more important features (which, unfortunately, makes it hard to hook it up with boost.build) is that it allows introspection of a 'test database': you can look at the test database (test suites, test sub-suites, as well as individual tests, test results, etc.) without actually running anything. This is good for robustness, and makes it easy to generate reports involving different dimensions (across test suites, across platforms, across time, etc.).

Oh, and QMTest makes it easy to run tests in parallel, dispatching to a compile farm, etc.

I sincerely believe that, whether or not QMTest is seriously considered in this context, the concepts it builds on definitely should be.

Thanks, Stefan

-- 
...ich hab' noch einen Koffer in Berlin...

on Wed Aug 08 2007, Stefan Seefeld <seefeld-AT-sympatico.ca> wrote:
Of course we're in serious danger of getting into the tool writing business again here ....
I have been suggesting to look into QMTest (http://www.codesourcery.com/qmtest, which I happen to maintain), to drive the regression testing. It does allow the storage of test results in databases, and it keeps track of how tests results evolve over time.
In fact, one of the more important features (which, unfortunately, make it hard to hook it up with boost.build), is that it allows the introspection of a 'test database': You can look at the test database (test suites, test sub-suites, as well as individual tests, test results, etc.) without actually running them. This is good for robustness, and allows to easily generate reports, involving different dimensions (across test suites, across platforms, across time, etc.).
Oh, and QMTest makes it easy to run tests in parallel, dispatching to a compile farm, etc.
Can you give a brief summary of what QMTest actually does and how Boost might use it? -- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com

David Abrahams wrote:
on Wed Aug 08 2007, Stefan Seefeld <seefeld-AT-sympatico.ca> wrote:
Of course we're in serious danger of getting into the tool writing business again here ....

I have been suggesting to look into QMTest (http://www.codesourcery.com/qmtest, which I happen to maintain), to drive the regression testing. It does allow the storage of test results in databases, and it keeps track of how tests results evolve over time.
In fact, one of the more important features (which, unfortunately, make it hard to hook it up with boost.build), is that it allows the introspection of a 'test database': You can look at the test database (test suites, test sub-suites, as well as individual tests, test results, etc.) without actually running them. This is good for robustness, and allows to easily generate reports, involving different dimensions (across test suites, across platforms, across time, etc.).
Oh, and QMTest makes it easy to run tests in parallel, dispatching to a compile farm, etc.
Can you give a brief summary of what QMTest actually does and how Boost might use it?
QMTest is a testing harness. Its concepts are captured in python base classes ('Test', 'Suite', 'Resource', 'Target', etc.) which then are implemented to capture domain-specific details. (It is straight forward to customize QMTest by adding new test classes, for example.)

QMTest's central concept is that of a 'test database'. A test database organizes tests. It lets users introspect tests (test types, test arguments, prerequisite resources, previous test results, expectations, etc.), as well as run them (everything or only specific sub-suites, by means of different 'target' implementations either in serial, or parallel using multi-threading, multiple processes, or even multiple hosts).

Another important point is scalability: While some test suites are simple and small, we also deal with test suites that hold many thousands of tests (QMTest is used for some of the GCC test suites, for example). A test can mean to run a single (local) executable, or require a compilation, an upload of the resulting executable to a target board with subsequent remote execution, or other even more fancy things.

Test results are written to 'result streams' (which can be customized as most of QMTest). There is a 'report' command that merges the results from multiple test runs into a single test report (XML), which can then be translated to whatever output medium is desired.

How could this be useful for boost ? I found that boost's testing harness lacks robustness. There is no way to ask seemingly simple questions such as "what tests constitute this test suite ?" or "what revision / date / runtime environment etc. does this result correspond to ?", making it hard to assess the overall performance / quality of the software.

I believe the hardest part is the connection between QMTest and boost.build. Since boost.build doesn't provide the level of introspection QMTest promises, a custom 'boost.build test database' implementation needs some special hooks from the build system. I discussed that quite a bit with Vladimir.

Regards,
Stefan

-- 
...ich hab' noch einen Koffer in Berlin...
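To give a feel for the extension model, a custom test class is a small Python class along these lines (a rough sketch from memory; the exact module paths, field arguments, and Result API should be checked against the QMTest documentation):

    # Sketch of a hypothetical QMTest extension class; module paths and
    # field arguments are from memory and may differ between versions.
    import subprocess

    from qm.fields import TextField
    from qm.test.test import Test

    class CompileAndRunTest(Test):
        """Hypothetical test class: compile one source file and run it."""

        arguments = [
            TextField(name="source", title="Source file"),
            TextField(name="compiler", title="Compiler command",
                      default_value="g++"),
        ]

        def Run(self, context, result):
            exe = self.source + ".exe"
            if subprocess.call([self.compiler, self.source, "-o", exe]) != 0:
                result.Fail("compilation failed")
            elif subprocess.call(["./" + exe]) != 0:
                result.Fail("test executable returned a nonzero exit code")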

on Tue Aug 14 2007, Stefan Seefeld <seefeld-AT-sympatico.ca> wrote:
David Abrahams wrote:
Can you give a brief summary of what QMTest actually does and how Boost might use it?
QMTest is a testing harness.
Meaning, a system for running tests and collecting their results?
Its concepts are captured in python base classes ('Test', 'Suite', 'Resource', 'Target', etc.) which then are implemented to capture domain-specific details. (It is straight forward to customize QMTest by adding new test classes, for example).
What are Resource and Target?
QMTest's central concept is that of a 'test database'. A test database organizes tests. It lets users introspect tests (test types, test arguments, prerequisite resources, previous test results, expectations, etc.), as well as run them (everything or only specific sub-suites, by means of different 'target' implementations
I don't understand what you mean by "run them *by means of* 'target' implementations."
either in serial, or parallel using multi-threading, multiple processes, or even multiple hosts).
Would QMTest be used to drive multi-host testing across the internet (i.e. at different testers' sites), or more likely just within local networks? If the former, how do its facilities for that compare with BuildBot?
Another important point is scalability: While some test suites are simple and small, we also deal with test suites that hold many thousands of tests (QMTest is used for some of the GCC test suites, for example). A test can mean to run a single (local) executable, or require a compilation, an upload of the resulting executable to a target board
Target board?
with subsequent remote execution, or other even more fancy things.
Test results are written to 'result streams' (which can be customized as most of QMTest). There is a 'report' command that merges the results from multiple test runs into a single test report (XML), which can then be translated to whatever output medium is desired.
How could this be useful for boost ?
A good question, but I'm more interested in "how Boost might use it." That is, something like, "We'd set up a server with a test database. QMTest would run on the server and drive testing on each testers' machines, ..." etc.
I found that boost's testing harness lacks robustness.
Our testing system itself seems to be pretty reliable. I think it's the reporting system that lacks robustness.
There is no way to ask seemingly simple questions such as "what tests constitute this test suite ?" or "what revision / date / runtime environment etc. does this result correspond to ?", making it hard to assess the overall performance / quality of the software.
I believe the hardest part is the connection between QMTest and boost.build. Since boost.build doesn't provide the level of introspection QMTest promises, a custom 'boost.build test database' implementation needs some special hooks from the build system. I discussed that quite a bit with Vladimir.
And what came of it? -- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com

David Abrahams wrote:
on Tue Aug 14 2007, Stefan Seefeld <seefeld-AT-sympatico.ca> wrote:
David Abrahams wrote:
Can you give a brief summary of what QMTest actually does and how Boost might use it?

QMTest is a testing harness.
Meaning, a system for running tests and collecting their results?
Yes.
Its concepts are captured in python base classes ('Test', 'Suite', 'Resource', 'Target', etc.) which then are implemented to capture domain-specific details. (It is straight forward to customize QMTest by adding new test classes, for example).
What are Resource and Target?
A resource is a prerequisite for a test: anything that has to be done in preparation, but that may also be shared by multiple tests (so you don't have to run the same setup procedure for each test).

A target is an execution context for a test. Besides the default serial target there are various target classes for parallel execution: multi-process, multi-thread, rsh/ssh-based, etc. Parallel execution aside, targets can also be used to handle multi-platform testing, i.e. where different target instances represent different platforms on which the tests are to be performed.

QMTest guarantees that all resources bound to a test are set up prior to the test's execution, in the execution context that test is going to be run in. In the case of parallel targets the resource may thus be set up multiple times, as needed.
QMTest's central concept is that of a 'test database'. A test database organizes tests. It lets users introspect tests (test types, test arguments, prerequisite resources, previous test results, expectations, etc.), as well as run them (everything or only specific sub-suites, by means of different 'target' implementations
I don't understand what you mean by "run them *by means of* 'target' implementations."
Sorry for expressing myself poorly. (And in fact I'm not sure why I mentioned targets at all in that phrase.) As target classes provide the execution context, they are the ones that iterate over the queues of tests assigned to them. But that's getting into implementation detail quite a bit...
either in serial, or parallel using multi-threading, multiple processes, or even multiple hosts).
Would QMTest be used to drive multi-host testing across the internet (i.e. at different testers' sites), or more likely just within local networks? If the former, how do its facilities for that compare with BuildBot?
QMTest would typically be used to drive individual 'test runs', presumably only over local networks, and can then be used during the aggregation of the results of such test runs into test reports. As such, it is complementary to the facilities offered by buildbot.
Another important point is scalability: while some test suites are simple and small, we also deal with test suites that hold many thousands of tests (QMTest is used for some of the GCC test suites, for example). Running a test can mean executing a single (local) executable, or it can require a compilation, an upload of the resulting executable to a target board
Target board?
Yes (please note that 'target' here is not the same term used above). In the context here it refers to cross-compilation and cross-testing.
with subsequent remote execution, or other even more fancy things.
Test results are written to 'result streams' (which, like most of QMTest, can be customized). There is a 'report' command that merges the results from multiple test runs into a single test report (XML), which can then be translated to whatever output medium is desired.
How could this be useful for boost ?
A good question, but I'm more interested in "how Boost might use it." That is, something like, "We'd set up a server with a test database. QMTest would run on the server and drive testing on each tester's machine, ..." etc.
I found that boost's testing harness lacks robustness.
Our testing system itself seems to be pretty reliable. I think it's the reporting system that lacks robustness.
I agree.
There is no way to ask seemingly simple questions such as "what tests constitute this test suite ?" or "what revision / date / runtime environment etc. does this result correspond to ?", making it hard to assess the overall performance / quality of the software.
I believe the hardest part is the connection between QMTest and boost.build. Since boost.build doesn't provide the level of introspection QMTest promises, a custom 'boost.build test database' implementation needs some special hooks from the build system. I discussed that quite a bit with Vladimir.
And what came of it?
I'm not sure. boost.build would need to be extended to allow QMTest to gain access to the database structure (the database already exists, conceptually, in terms of the directory layout...). Volodya ? Regards, Stefan -- ...ich hab' noch einen Koffer in Berlin...
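To make the Test / Resource / Target vocabulary above a little more concrete, here is a deliberately simplified Python sketch. It does not use QMTest's real base classes or method names (those may differ from what is shown), and the compiler invocation is a placeholder; it only illustrates how the concepts fit together for a boost-style compile-and-run test:

    import os
    import subprocess

    class Resource(object):
        """A prerequisite shared by several tests, set up once per
        execution context (e.g. a Boost library built ahead of time)."""
        def set_up(self, context):
            pass

    class BuiltLibrary(Resource):
        def __init__(self, library):
            self.library = library
        def set_up(self, context):
            # Placeholder: build the library once so dependent tests can link.
            subprocess.call(['bjam', '--with-' + self.library])

    class CompileAndRunTest(object):
        """One test: compile a translation unit, run it, report the outcome."""
        def __init__(self, test_id, source, resources=()):
            self.test_id = test_id
            self.source = source
            self.resources = resources
        def run(self, context):
            exe = os.path.splitext(os.path.basename(self.source))[0]
            if subprocess.call(['g++', self.source, '-o', exe]) != 0:
                return (self.test_id, 'compile-fail')
            if subprocess.call([os.path.join('.', exe)]) != 0:
                return (self.test_id, 'run-fail')
            return (self.test_id, 'pass')

    class SerialTarget(object):
        """An execution context: sets up each test's resources (once) and
        then runs the tests assigned to it.  A parallel target would do the
        same per worker process, thread, or remote host."""
        def run_all(self, tests, context=None):
            prepared, results = set(), []
            for test in tests:
                for resource in test.resources:
                    if resource not in prepared:
                        resource.set_up(context)
                        prepared.add(resource)
                results.append(test.run(context))
            return results

A 'test database' in this picture is whatever maps test ids to CompileAndRunTest instances; that is the piece that would have to understand boost's directory layout or its Jamfiles.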

on Fri Aug 17 2007, Stefan Seefeld <seefeld-AT-sympatico.ca> wrote:
Would QMTest be used to drive multi-host testing across the internet (i.e. at different testers' sites), or more likely just within local networks? If the former, how do its facilities for that compare with BuildBot?
QMTest would typically be used to drive individual 'test runs', presumably only over local networks,
Why presumably? Is there a limitation that prevents it from going out to the web?
and can then be used during the aggregation of the results of such test runs into test reports.
As such, it is complementary to the facilities offered by buildbot.
Can you explain why it makes sense to use two systems?
Another important point is scalability: while some test suites are simple and small, we also deal with test suites that hold many thousands of tests (QMTest is used for some of the GCC test suites, for example). Running a test can mean executing a single (local) executable, or it can require a compilation, an upload of the resulting executable to a target board
Target board?
Yes (please note that 'target' here is not the same term used above). In the context here it refers to cross-compilation and cross-testing.
But what is it?
How could this be useful for boost ?
A good question, but I'm more interested in "how Boost might use it." That is, something like, "We'd set up a server with a test database. QMTest would run on the server and drive testing on each tester's machine, ..." etc.
Still looking for that.
I believe the hardest part is the connection between QMTest and boost.build. Since boost.build doesn't provide the level of introspection QMTest promises, a custom 'boost.build test database' implementation needs some special hooks from the build system. I discussed that quite a bit with Vladimir.
And what came of it?
I'm not sure. boost.build would need to be extended to allow QMTest to gain access to the database structure (the database already exists, conceptually, in terms of the directory layout...). Volodya ?
There's no a priori reason that Boost.Build needs to maintain the test database, is there? -- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com

David Abrahams wrote:
on Fri Aug 17 2007, Stefan Seefeld <seefeld-AT-sympatico.ca> wrote:
Would QMTest be used to drive multi-host testing across the internet (i.e. at different testers' sites), or more likely just within local networks? If the former, how do its facilities for that compare with BuildBot?
QMTest would typically be used to drive individual 'test runs', presumably only over local networks,
Why presumably? Is there a limitation that prevents it from going out to the web?
No. I'm just speculating what users might do with it.
and can then be used during the aggregation of the results of such test runs into test reports.
As such, it is complementary to the facilities offered by buildbot.
Can you explain why it makes sense to use two systems?
I'm not quite sure I understand the question. Automating builds (scheduling build processes triggered by some events) is quite different from managing test databases.
Another important point is scalability: while some test suites are simple and small, we also deal with test suites that hold many thousands of tests (QMTest is used for some of the GCC test suites, for example). Running a test can mean executing a single (local) executable, or it can require a compilation, an upload of the resulting executable to a target board
Target board?
Yes (please note that 'target' here is not the same term used above). In the context here it refers to cross-compilation and cross-testing.
But what is it?
It is some piece of hardware that the test code may need to be uploaded to in order to be run. QMTest contains logic to do that, if requested (for example when testing that cross-compiled code runs correctly on a host platform that is different from the build platform, such as an embedded chip).
How could this be useful for boost ?
A good question, but I'm more interested in "how Boost might use it." That is, something like, "We'd set up a server with a test database. QMTest would run on the server and drive testing on each tester's machine, ..." etc.
Still looking for that.
Yes, I realize that. But as I indicated earlier, I'm not convinced QMTest is a good tool to schedule / drive that. I'd use a buildbot setup for that. (Of course you may argue that it is hard to convince potential testers to install yet another piece of software, but that's a different argument, I think.)
I believe the hardest part is the connection between QMTest and boost.build. Since boost.build doesn't provide the level of introspection QMTest promises, a custom 'boost.build test database' implementation needs some special hooks from the build system. I discussed that quite a bit with Vladimir.
And what came of it?
I'm not sure. boost.build would need to be extended to allow QMTest to gain access to the database structure (the database already exists, conceptually, in terms of the directory layout...). Volodya ?
There's no a priori reason that Boost.Build needs to maintain the test database, is there?
No. Regards, Stefan -- ...ich hab' noch einen Koffer in Berlin...

on Fri Aug 17 2007, Stefan Seefeld <seefeld-AT-sympatico.ca> wrote:
David Abrahams wrote:
on Fri Aug 17 2007, Stefan Seefeld <seefeld-AT-sympatico.ca> wrote:
Would QMTest be used to drive multi-host testing across the internet (i.e. at different testers' sites), or more likely just within local networks? If the former, how do its facilities for that compare with BuildBot?
QMTest would typically be used to drive individual 'test runs', presumably only over local networks,
Why presumably? Is there a limitation that prevents it from going out to the web?
No. I'm just speculating what users might do with it.
and can then be used during the aggregation of the results of such test runs into test reports.
As such, it is complementary to the facilities offered by buildbot.
Can you explain why it makes sense to use two systems?
I'm not quite sure I understand the question. Automating builds (scheduling build processes triggered by some events) is quite different from managing test databases.
So QMTest doesn't schedule build/test processes? It's just a database manager?
How could this be useful for boost ?
A good question, but I'm more interested in "how Boost might use it." That is, something like, "We'd set up a server with a test database. QMTest would run on the server and drive testing on each tester's machine, ..." etc.
Still looking for that.
Yes, I realize that. But as I indicated earlier, I'm not convinced QMTest is a good tool to schedule / drive that.
I meant I want some kind of analogous statement about a way we could use it that you *are* convinced of.
I'd use a buildbot setup for that. (Of course you may argue that it is hard to convince potential testers to install yet another piece of software, but that's a different argument, I think.)
I'm not worried about that at this point.
I'm not sure. boost.build would need to be extended to allow QMTest to gain access to the database structure (the database already exists, conceptually, in terms of the directory layout...). Volodya ?
There's no a priori reason that Boost.Build needs to maintain the test database, is there?
No.
So what are the alternatives to that arrangement? -- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com

David Abrahams wrote:
So QMTest doesn't schedule build/test processes? It's just a database manager?
Yes it does 'schedule' them, but according to rather specific requirements. You wouldn't tell QMTest to run a test suite nightly, for example, or hook up to some other external triggers (checkins, say). Rather, you specify what (sub-) testsuite to run, and QMTest will work out the order etc.
How could this be useful for boost ?
A good question, but I'm more interested in "how Boost might use it." That is, something like, "We'd set up a server with a test database. QMTest would run on the server and drive testing on each tester's machine, ..." etc.
Still looking for that.
Yes, I realize that. But as I indicated earlier, I'm not convinced QMTest is a good tool to schedule / drive that.
I meant I want some kind of analogous statement about a way we could use it that you *are* convinced of.
* handle all test suite runs through QMTest
* aggregate test results in a central place using QMTest, and manage interpretation (including expectations) to generate test reports.
There's no a priori reason that Boost.Build needs to maintain the test database, is there?
No.
So what are the alternatives to that arrangement?
A boost-specific test database implementation could work like this:
* by default, map each source file under libs/*/test/ to a test id, and provide a default test type which 1) compiles, 2) runs the result, 3) checks exit status and output. Default compile and link options could be generated per component (boost library).
* For cases where the above doesn't work, some special test implementation can be used (that incorporates the special rules now part of the various Jamfiles). Regards, Stefan -- ...ich hab' noch einen Koffer in Berlin...
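The default mapping described above is simple enough to sketch. This is plain illustrative Python, not a real QMTest database class, and the naming scheme (<library>.<source-stem>) is just an assumption:

    import glob
    import os

    def enumerate_tests(boost_root):
        """Map each source file under libs/*/test/ to a test id."""
        tests = {}
        pattern = os.path.join(boost_root, 'libs', '*', 'test', '*.cpp')
        for source in glob.glob(pattern):
            parts = source.split(os.sep)
            library = parts[-3]        # libs/<library>/test/<file>.cpp
            stem = os.path.splitext(parts[-1])[0]
            tests['%s.%s' % (library, stem)] = source
        return tests

    if __name__ == '__main__':
        # e.g. run from the top of a boost checkout
        for test_id in sorted(enumerate_tests(os.curdir)):
            print(test_id)

Everything that doesn't fit that pattern (multi-file tests, compile-fail tests, special requirements) would then fall to the hand-written test classes mentioned above.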

on Fri Aug 17 2007, Stefan Seefeld <seefeld-AT-sympatico.ca> wrote:
David Abrahams wrote:
So QMTest doesn't schedule build/test processes? It's just a database manager?
Yes it does 'schedule' them, but according to rather specific requirements. You wouldn't tell QMTest to run a test suite nightly, for example, or hook up to some other external triggers (checkins, say). Rather, you specify what (sub-) testsuite to run, and QMTest will work out the order etc.
The order in which to run tests? I don't believe we have any dependency relationships (at least, not encoded ones) that can help QMTest.
How could this be useful for boost ?
A good question, but I'm more interested in "how Boost might use it." That is, something like, "We'd set up a server with a test database. QMTest would run on the server and drive testing on each tester's machine, ..." etc.
Still looking for that.
Yes, I realize that. But as I indicated earlier, I'm not convinced QMTest is a good tool to schedule / drive that.
I meant I want some kind of analogous statement about a way we could use it that you *are* convinced of.
* handle all test suite runs through QMTest
too vague.
* aggregate test results in a central place using QMTest,
So QMTest stores results, OK.
and manage interpretation (including expectations)
How would one do that?
to generate test reports.
Does QMTest generate reports?
There's no a priori reason that Boost.Build needs to maintain the test database, is there?
No.
So what are the alternatives to that arrangement?
A boost-specific test database implementation could work like this:
* by default, map each source file under libs/*/test/ to a test id, and provide a default test type which 1) compiles, 2) runs the result, 3) checks exit status and output. Default compile and link options could be generated per component (boost library).
* For cases where the above doesn't work, some special test implementation can be used (that incorporates the special rules now part of the various Jamfiles).
I think I understand. Essentially, one would need to implement Python classes whose instances represent each test and know how to do the testing. One could generate Jamfiles for the difficult cases. But how would we represent the tests? Python code? An actual database? As I see it right now, the most significant benefit available from QMTest is in the fact that it robustly controls the running of each test, capturing its results, and comparing those results with what's expected. Is that right? -- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com

David Abrahams wrote:
I meant I want some kind of analogous statement about a way we could use it that you *are* convinced of.
* handle all test suite runs through QMTest
too vague.
That's because QMTest is flexible, i.e. what it does exactly depends on the test database.
* aggregate test results in a central place using QMTest,
So QMTest stores results, OK.
Right.
and manage interpretation (including expectations)
How would one do that?
Similarly to what is done now, you would set up an 'expectation database' that manages expected outcomes for all tests / platforms, so QMTest can tell for each result whether it is expected or not. (In the simplest case you could just take an existing set of results from a previous test run and use that as the expectation.)
to generate test reports.
Does QMTest generate reports?
Yes, where a 'report' is an XML file, which presumably would be processed by an XSLT stylesheet to generate an HTML report. (There is an XSLT stylesheet provided with QMTest, but I'd expect some custom layer to be added to fit the generated HTML into the boost website style...) As an alternative, QMTest can also be used as a server process from which dynamic HTML can be obtained (QMTest has an HTTP/HTML-based GUI, built using Zope).
There's no a priori reason that Boost.Build needs to maintain the test database, is there?
No.
So what are the alternatives to that arrangement?
A boost-specific test database implementation could work like this:
* by default, map each source file under libs/*/test/ to a test id, and provide a default test type which 1) compiles, 2) runs the result, 3) checks exit status and output. Default compile and link options could be generated per component (boost library).
* For cases where the above doesn't work, some special test implementation can be used (that incorporates the special rules now part of the various Jamfiles).
I think I understand. Essentially, one would need to implement Python classes whose instances represent each test and know how to do the testing. One could generate Jamfiles for the difficult cases. But how would we represent the tests? Python code? An actual database?
QMTest ships with a set of builtin test classes for the most frequent cases: execution of programs with checks for exit codes / output, compilation of source code, interpretation of python code, etc., so there is a fair chance that only a little code needs to be added for customization purposes. QMTest also has some builtin test databases, such as one that scans a directory for files with a given extension (.cpp, say) and interprets those as tests. Again, I expect relatively little to be needed to customize those for boost. (For the avoidance of doubt: I offer to do the customization to adapt QMTest to boost's needs, should you decide to give it a try.)
As I see it right now, the most significant benefit available from QMTest is in the fact that it robustly controls the running of each test, capturing its results, and comparing those results with what's expected. Is that right?
Yes, the execution of the tests is certainly the most important thing, but introspection of the test database (or expectations, results, etc.) without running any tests is also part of it (something that is not possible right now, IIUC). Finally, as I noted above, QMTest has a GUI (Zope-based) that can be used as an alternative to the command-line interface.
Regards, Stefan -- ...ich hab' noch einen Koffer in Berlin...
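The expectation-database idea maps directly onto the distinctions our reports already try to make. Below is a minimal sketch, assuming actual outcomes and expected outcomes are both plain dictionaries keyed by (test id, platform); this is not QMTest code, just the comparison it would be doing for us:

    def classify(outcomes, expectations):
        """Group actual outcomes by how they compare to expected ones.
        Both arguments map (test_id, platform) -> 'pass' or 'fail'."""
        report = {'regression': [], 'new-failure': [], 'expected-failure': [],
                  'unexpected-pass': [], 'pass': []}
        for key, outcome in outcomes.items():
            expected = expectations.get(key)   # None for a brand-new test
            if outcome == 'fail':
                if expected == 'pass':
                    report['regression'].append(key)
                elif expected == 'fail':
                    report['expected-failure'].append(key)
                else:
                    report['new-failure'].append(key)
            elif expected == 'fail':
                report['unexpected-pass'].append(key)
            else:
                report['pass'].append(key)
        return report

The 'unexpected-pass' bucket is the case where a test that was expected to fail suddenly begins passing.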

on Wed Aug 08 2007, "John Maddock" <john-AT-johnmaddock.co.uk> wrote:
Just thinking out loud here, but I've always thought that our test results should be collected in a database: something like each test gets an XML result file describing the test result, which then gets logged in the database. The display application would query the database, maybe in real-time for specific queries and present the results etc.
Sure, that's what the Trac plugin would do.
Thinking somewhat outside the box here ... but could SVN be conscripted for this purpose,
The biggest downside of that idea is that it would be very expensive in SVN resources to ever give real-time feedback from testing, because you'd need a separate checkin for each build step.
yes, OK I know it's an abuse of SVN, but basically our needs are quite simple:
* Log the output from each build step and store it somewhere; with incremental builds much of this information would rarely change. In fact, even if the test is rebuilt/rerun, the chances are that the logged data won't actually change.
* Log the status of each test: pass or fail.
* Log the date and time of the last test.
So what would happen if build logs for each test were stored in an SVN tree set aside for the purpose, with pass/fail and date/time status stored as SVN properties? Could this be automated, from within bjam or CMake or whatever?
Sure it could. So you *are* actually advocating a separate file in SVN for each test? I guess I also worry about the performance cost of doing a checkin for each test.
Of course we're in serious danger of getting into the tool writing business again here ....
Unless we entirely drop our display distinctions and markup, or we stick with the same fragile/unreliable display tools we have, *someone* has to write new display tools. There just aren't any existing tools out there that do what we want. Unless Kitware is prepared to accommodate our display requirements, I don't know where those tools are going to come from other than from within Boost. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com
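Just to make the cost concrete: a wrapper around the svn command line along the lines John suggests might look like the sketch below (the layout and property names are hypothetical; only svn add/propset/commit are assumed). The one-commit-per-result shape is exactly where the performance worry above comes from:

    import os
    import subprocess
    import time

    def record_result(results_wc, test_id, log_text, passed):
        """Store one test's build log in a results-only SVN working copy,
        keeping pass/fail and timestamp as SVN properties on the file."""
        path = os.path.join(results_wc, test_id + '.log')
        is_new = not os.path.exists(path)
        log = open(path, 'w')
        log.write(log_text)
        log.close()
        if is_new:
            subprocess.call(['svn', 'add', path])
        subprocess.call(['svn', 'propset', 'boost:status',
                         passed and 'pass' or 'fail', path])
        subprocess.call(['svn', 'propset', 'boost:tested-at',
                         time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime()),
                         path])
        subprocess.call(['svn', 'commit', '-m',
                         'test result: ' + test_id, path])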

David Abrahams wrote:
So what would happen if build logs for each test were stored in an SVN tree set aside for the purpose, with pass/fail and date/time status stored as SVN properties? Could this be automated, from within bjam or CMake or whatever?
Sure it could.
So you *are* actually advocating a separate file in SVN for each test? I guess I also worry about the performance cost of doing a checkin for each test.
Nod: however we look at this, there's a lot of data flying around, any system is going to struggle at some point :-(
Of course we're in serious danger of getting into the tool writing business again here ....
Unless we entirely drop our display distinctions and markup, or we stick with the same fragile/unreliable display tools we have, *someone* has to write new display tools. There just aren't any existing tools out there that do what we want.
Sigh, yes. Of course that's the same reason we started Boost.Build, quickbook etc... John.

David Abrahams wrote:
on Wed Aug 08 2007, "John Maddock" <john-AT-johnmaddock.co.uk> wrote:
Just thinking out loud here, but I've always thought that our test results should be collected in a database: something like each test gets an XML result file describing the test result, which then gets logged in the database. The display application would query the database, maybe in real-time for specific queries and present the results etc.
Sure, that's what the Trac plugin would do.
Thinking somewhat outside the box here ... but could SVN be conscripted for this purpose,
The biggest downside of that idea is that it would be very expensive in SVN resources to ever give real-time feedback from testing, because you'd need a separate checkin for each build step.
.. And it would clutter our revision history with test log checkins. And we'd lose all capability of making complex queries on the data. What are the upsides to using svn for this, when there are proper databases available? -- Daniel Wallin Boost Consulting www.boost-consulting.com

on Thu Aug 09 2007, Daniel Wallin <daniel-AT-boost-consulting.com> wrote:
David Abrahams wrote:
on Wed Aug 08 2007, "John Maddock" <john-AT-johnmaddock.co.uk> wrote:
Just thinking out loud here, but I've always thought that our test results should be collected in a database: something like each test gets an XML result file describing the test result, which then gets logged in the database. The display application would query the database, maybe in real-time for specific queries and present the results etc.
Sure, that's what the Trac plugin would do.
Thinking somewhat outside the box here ... but could SVN be conscripted for this purpose,
The biggest downside of that idea is that it would be very expensive in SVN resources to ever give real-time feedback from testing, because you'd need a separate checkin for each build step.
.. And it would clutter our revision history with test log checkins.
Yes, that exactly puts a finger on something I was worried about but couldn't articulate. On the other hand, that's something you deal with any time you have lots of semi-unrelated activity in source control. For example, the sandbox and the boost release tree are all together (not to mention all the separate libraries).
And we'd lose all capability of making complex queries on the data.
What did you have in mind?
What are the upsides to using svn for this, when there are proper databases available?
Good question. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com

David Abrahams wrote:
on Thu Aug 09 2007, Daniel Wallin <daniel-AT-boost-consulting.com> wrote:
David Abrahams wrote:
on Wed Aug 08 2007, "John Maddock" <john-AT-johnmaddock.co.uk> wrote:
Just thinking out loud here, but I've always thought that our test results should be collected in a database: something like each test gets an XML result file describing the test result, which then gets logged in the database. The display application would query the database, maybe in real-time for specific queries and present the results etc.
Sure, that's what the Trac plugin would do.
Thinking somewhat outside the box here ... but could SVN be conscripted for this purpose,
The biggest downside of that idea is that it would be very expensive in SVN resources to ever give real-time feedback from testing, because you'd need a separate checkin for each build step.
.. And it would clutter our revision history with test log checkins.
Yes, that exactly puts a finger on something I was worried about but couldn't articulate. On the other hand, that's something you deal with any time you have lots of semi-unrelated activity in source control. For example, the sandbox and the boost release tree are all together (not to mention all the separate libraries).
And we'd lose all capability of making complex queries on the data.
What did you have in mind?
Anything that isn't "what was the state at this time?". I don't know what it might be, but my point was that when someone comes up with something we might not be able to do it, because we would have picked a system that isn't as flexible and fast as a relational database when it comes to querying data. -- Daniel Wallin Boost Consulting www.boost-consulting.com

Daniel Wallin wrote:
David Abrahams wrote:
on Wed Aug 08 2007, "John Maddock" <john-AT-johnmaddock.co.uk> wrote:
Just thinking out loud here, but I've always thought that our test results should be collected in a database: something like each test gets an XML result file describing the test result, which then gets logged in the database. The display application would query the database, maybe in real-time for specific queries and present the results etc.
Since I'm the one who mentioned this to Dave, and it's something I never got around to mentioning at BoostCon, here's the summary I had in mind:
** As each Boost.Build action completes, Boost.Build submits the results directly to a database. There is no intermediate XML format to deal with, no formatting, etc.
** Other tools would take the data from the DB and do further processing.
The important fact to consider here is that there are far more tools, utilities, libraries, and personnel experience available for manipulating databases. We might not need to use our own tools for this. It's likely we can find database reporting tools that are close to what we want. Hence the goal is to move the data to a medium we know well and can support better. We also happen to get the benefit of immediate result feedback. Essentially this is the reporting-side equivalent to BuildBot. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo
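As a rough idea of what "submit directly to a database" amounts to on the receiving end, here is a sketch using Python's sqlite3 module purely as a stand-in; a real deployment would presumably be a server-side database plus some small upload mechanism, neither of which is specified here, and the schema is invented for illustration:

    import sqlite3

    SCHEMA = """
    CREATE TABLE IF NOT EXISTS test_result (
        run_id    TEXT,   -- tester name + timestamp, say
        toolset   TEXT,
        library   TEXT,
        test_name TEXT,
        action    TEXT,   -- compile / link / run
        status    TEXT,   -- pass / fail
        output    TEXT    -- captured tool output
    )
    """

    def record(db_path, row):
        """Called once per completed build action; no intermediate XML."""
        con = sqlite3.connect(db_path)
        con.execute(SCHEMA)
        con.execute('INSERT INTO test_result VALUES (?, ?, ?, ?, ?, ?, ?)',
                    row)
        con.commit()
        con.close()

    # Reporting then becomes ordinary SQL, e.g. current failures per library:
    #   SELECT library, test_name, toolset FROM test_result
    #   WHERE status = 'fail' ORDER BY library, test_name;

(The point above is that Boost.Build itself would write these rows as actions complete, rather than anything parsing a log afterwards.)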

Daniel Wallin wrote:
Thinking somewhat outside the box here ... but could SVN be conscripted for this purpose,
The biggest downside of that idea is that it would be very expensive in SVN resources to ever give real-time feedback from testing, because you'd need a separate checkin for each build step.
Maybe, you'd probably want a separate repository just for the test logs.
.. And it would clutter our revision history with test log checkins. And we'd lose all capability of making complex queries on the data.
Are you sure?
What are the upsides to using svn for this, when there are proper databases available?
Probably not many: we're using SVN already, folks know what they're doing with it, and we get a revision history of changes to program output: could be useful if you're trying to gradually suppress/fix warnings and you want a diff of recent changes to see if there's any progress. *But*: this was just an idea thrown in: I'm perfectly happy with Rene's suggestion of sending individual build steps to a database, and then using std database/web tools to present the results. And frankly I'd say that whoever is prepared to invest the time in this gets to choose ! :-) John.

David Abrahams wrote:
This part of my analysis focuses on the tools available for getting feedback from the system about what's broken. Once again, because there's been substantial effort invested in dart/cmake/ctest and interest expressed by Kitware in supporting our use thereof, I'm including that along with our current mechanisms. Although not strictly a reporting system, I'll also discuss BuildBot a bit because Rene has been doing some research on it and it has some feedback features.
I think it is important to consider the different tools for the purpose they were designed for. None of them will do the whole job, but with a good combination of them I believe very useful and robust things can be built. Notably, I do believe that buildbot is an invaluable tool to drive the build and test automation. It provides a good framework to formalize the process in, and it scales very well. While it has some 'GUI' to visualize the state of the various builders, I wouldn't think of using it to display test reports. Generating test reports shouldn't be that hard, once you have all the essential information that should figure in it. (That sounds banal, but until recently it wasn't even possible: there still is no way to figure out what revision (or source tree timestamp) a given test run corresponds to !) Once all that information is available in a machine-parsable form, writing that last bit of code to generate a useful report (html or other) should be straightforward. Regards, Stefan -- ...ich hab' noch einen Koffer in Berlin...

on Wed Aug 08 2007, Stefan Seefeld <seefeld-AT-sympatico.ca> wrote:
David Abrahams wrote:
This part of my analysis focuses on the tools available for getting feedback from the system about what's broken. Once again, because there's been substantial effort invested in dart/cmake/ctest and interest expressed by Kitware in supporting our use thereof, I'm including that along with our current mechanisms. Although not strictly a reporting system, I'll also discuss BuildBot a bit because Rene has been doing some research on it and it has some feedback features.
I think it is important to consider the different tools for the purpose they were designed for. None of them will do the whole job, but with a good combination of them I believe very useful and robust things can be built.
Notably, I do believe that buildbot is an invaluable tool to drive the build and test automation. It provides a good framework to formalize the process in, and it scales very well.
As I mentioned, I plan to deal with that issue in a separate thread of this analysis. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com

On 8 Aug 2007, at 17:01, David Abrahams wrote:
This part of my analysis focuses on the tools available for getting feedback from the system about what's broken. Once again, because there's been substantial effort invested in dart/cmake/ctest and interest expressed by Kitware in supporting our use thereof, I'm including that along with our current mechanisms. Although not strictly a reporting system, I'll also discuss BuildBot a bit because Rene has been doing some research on it and it has some feedback features.
I've struggled to create a coherent organization to this post, but it still rambles a little, for which I apologize in advance.
Feedback Systems ================
Boost's feedback system has evolved some unique and valuable features
Unique Boost Features ---------------------
* Automatic distinction of regressions from new failures.
* A markup system that allows us to distinguish library bugs from compiler bugs and add useful, detailed descriptions of severity and consequences. This feature will continue to be important at *least* as long as widely-used compilers are substantially nonconforming.
* Automatic detection of tests that had been failing due to toolset limitations but begin passing without a known explanation.
* A summary page that shows only unresolved issues.
* A separate view encoding failure information in a way most appropriate for users rather than library developers.
While I acknowledge that Boost's feedback system has substantial weaknesses, no other feedback system I've seen accommodates most of these features in any way.
I agree. I've had numerous experiences with large projects that
have not done it as well as boost. Personally I find the status information held by meta-comm to be useful and informative. The opening page isn't very useful but digging in always leads to the information that is most useful.
Dart ----
It seems like Dart is a long, long way from being able to handle our display needs -- it is really oriented towards providing binary "is everything OK?" reports about the health of a project. It would actually be really useful for Boost to have such a binary view; it would probably keep us much closer to the "no failures on the trunk (or integration branch, if you prefer)" state that we hope to maintain continuously. However, I'm convinced our finer distinctions remain extremely valuable as well.
Other problems with Dart's dashboards (see http://public.kitware.com/dashboard.php?name=public):
* It is cryptic, rife with unexplained links and icons. Even some of the Kitware guys didn't know what a few of them meant when asked.
* Just like most of Boost's regression pages, it doesn't deal well with large amounts of data. One look at kitware's main dashboard above will show you a large amount of information, much of which is useless for at-a-glance assessment, and the continuous and experimental build results are all at the bottom of the page.
Dart's major strength is that it maintains a database of past build results, so anyone can review the entire testing history.
BuildBot --------
Buildbot is not really a feedback system; it's more a centralized system for driving testing. I will deal with that aspect of our system in a separate message.
Buildbot's display result (see http://twistedmatrix.com/buildbot/ for example) is no better suited to Boost's specific needs than Dart's, but it does provide one useful feature not seen in either of the other two systems: one can see, at any moment, what any of the test machines are doing. I know that's something Dart users want, and I certainly want it. In fact, as Rene has pointed out to me privately, the more responsive we can make the system, the more useful it will be to developers. His fantasy, and now mine, is that we can show developers the results of individual tests in real time.
Another great feature BuildBot has is an IRC plugin that insults the developer who breaks the build (http://buildbot.net/repos/release/docs/buildbot.html#IRC-Bot). Apparently the person who fixes the build gets to choose the next insult ;-)
Most importantly, BuildBot has a plugin architecture that would allow us to (easily?) customize feedback actions (http://buildbot.net/repos/release/docs/buildbot.html#Writing-New-Status-Plugins).
Boost's Systems ---------------
The major problems with our current feedback systems, AFAICT, are fragility and poor user interface.
I probably don't need to make the case about fragility, but in case there are any doubts, visit http://engineering.meta-comm.com/boost-regression/CVS-HEAD/developer/index.build-index.html. For the past several days, it has shown a Python backtrace:
Traceback (most recent call last):
  File "D:\inetpub\wwwroots\engineering.meta-comm.com\boost-regression\handle_http.py", line 324, in ?
  ...
  File "C:\Python24\lib\zipfile.py", line 262, in _RealGetContents
    raise BadZipfile, "Bad magic number for central directory"
BadZipfile: Bad magic number for central directory
This is a typical problem, and the system breaks for one reason or another <subjective>on a seemingly weekly basis</subjective>.
With respect to the UI, although substantial effort has been invested (for which we are all very grateful), managing that amount of information is really hard, and we need to do better. Some of the current problems were described in this thread <http://tinyurl.com/2w7xch> and <http://tinyurl.com/2n4usf>; here are some others:
* The front page is essentially empty, showing little or no useful information <http://engineering.meta-comm.com/boost-regression/boost_1_34_1/developer/index.html>
* Summary tables have a redundant list of libraries at left (it also appears in a frame immediately adjacent)
* Summaries and individual library charts present way too much information to be called "summaries", overwhelming any reasonably-sized browser pane. We usually don't need a square for every test/platform combination
* It's hard to answer simple questions, like, "what is the status of Boost.Python under gcc-3.4?" or "how well does MPL work on windows with STLPort?", or what is the list of
* A few links are cryptic (Full view/Release view) and could be better explained.
The email system that notifies developers when their libraries are broken seems to be fairly reliable. Its major weakness is that it reports all failures (even those that aren't regressions) as regressions, but that's a simple wording change. Its second weakness is that it has no way to harass the person who actually made the code-breaking checkin, and harasses the maintainer of every broken library just as aggressively, even if the breakage is due to one of the library's dependencies.
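Finding the person who actually made the code-breaking checkin is mostly an "svn log since the last good run" query over the library and its dependencies. Here is a rough sketch; the dependency list itself would have to come from somewhere else (which is the hard part), and the XML scraping below is deliberately crude:

    import subprocess

    def suspects(repo_url, paths, last_good_rev):
        """Return authors of commits under the given paths since the last
        known-good revision -- candidate culprits only, nothing more."""
        authors = set()
        for path in paths:
            proc = subprocess.Popen(
                ['svn', 'log', '--xml',
                 '-r', '%d:HEAD' % (last_good_rev + 1),
                 repo_url + '/' + path],
                stdout=subprocess.PIPE, universal_newlines=True)
            for line in proc.communicate()[0].splitlines():
                line = line.strip()
                if line.startswith('<author>') and line.endswith('</author>'):
                    authors.add(line[len('<author>'):-len('</author>')])
        return authors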
Recommendations ---------------
Our web-based regression display system needs to be redesigned and rewritten. It evolved from a state where we had far fewer libraries, platforms, and testers, and is burdened with UI ideas that only work in that smaller context. I suggest we start with as minimal a display as we think we can get away with: the front status reporting page should be both useful and easily grasped.
IMO the logical approach is to do this rewrite as a Trac plugin, because of the obvious opportunities to integrate test reports with other Trac functions (e.g. linking error messages to the source browser, changeset views, etc.), because the Trac database can be used to maintain the kind of history of test results that Dart manages, and because Trac contains a nice builtin mechanism for generating/displaying reports of all kinds. In my conversations with the Kitware guys, when we've discussed how Dart could accommodate Boost's needs, I've repeatedly pushed them in the direction of rebuilding Dart as a Trac plugin, but I don't think they "get it" yet.
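For a sense of scale, the skeleton of such a plugin is small. The component machinery below is from memory of the Trac 0.10/0.11 plugin API and should be double-checked; the results table and template name are hypothetical, carried over from the database idea discussed elsewhere in this thread:

    from trac.core import Component, implements
    from trac.web.api import IRequestHandler

    class BoostTestReportModule(Component):
        """Serve a /test-results page out of a (hypothetical) results table
        living in Trac's own database."""
        implements(IRequestHandler)

        # IRequestHandler methods
        def match_request(self, req):
            return req.path_info == '/test-results'

        def process_request(self, req):
            db = self.env.get_db_cnx()
            cursor = db.cursor()
            cursor.execute("SELECT library, test_name, toolset, status "
                           "FROM boost_test_result WHERE status = 'fail' "
                           "ORDER BY library, test_name")
            data = {'failures': cursor.fetchall()}
            # 0.11-style: render a Genshi template shipped with the plugin.
            return 'boost_test_results.html', data, None

Error messages linking to the source browser, changeset views, and the kind of history Dart keeps would all hang off the same database.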
I have some experience writing Trac plugins and would be willing to contribute expertise and labor in this area. However, I know that we also need some serious web-UI design, and many other people are much more skilled in that area than I am. I don't want to waste my own time doing badly what others could do well and more quickly, so I'll need help.
Yes, I realize this raises questions about how test results will actually be collected from testers; I'll try to deal with those in a separate posting.
Generally I agree with all the recommendations. However I am a big fan of incremental delivery and I would advocate boost approach this systemically. You don't want to get into the tool business. (.. avoid the anecdotal 'why fix things in 5 minutes when I can take a year writing a tool to automate it! :-{) For what it is worth my advice would be to do the following;
1. Choose 2/3 representative tool-chains/platforms as boost 'reference models' (msvc-M.N, win XP X.Y...) (gcc-N.M, debian...) (gcc-N.M, MacOSX,...) - the choices are based on what's right 'for the masses' and what is the de facto platform for mainstream development on those platforms (before anyone screams I am seriously NOT advocating dropping the builds on the other platforms - read on) - whatever the choices end up being I believe 'boost' needs to make a clear policy decision.
2. These 'reference models' are the basis of summary reports at the top level against the 'stable' released libraries. That can go on a page and it should take a minor amount of time to generate incrementally from the existing system.
3. As for tracking individual test results I don't personally see what's wrong with putting these under subversion. Given the likelihood of high commonality between the output text of successive runs I think it is a much better 'implementation choice' than strictly a database. Certainly XML output from the test framework would aid other post-processing - but that can be a secondary step/enhancement to Boost.Test? Also there is a strong correlation between the versioning of test results and the changes since the last run that changed the results. Some relatively trivial automation of the source dependency tree changes between successive runs of individual tests could be a significant aid for the authors/maintainers. I'm not an expert on bjam but I presume for an individual target it would not be difficult to run a diff between the sources in successive invocations of each test.
4. Given the reference models above it would then be sensible to show the status of successive tiers of the boost project, i.e. stable, development, sandbox, ... Again an indirection at the top level will make this accessible.
5. Beyond this I would split out the summaries into platform variants on individual pages: 'boost on windows', 'boost on linux' etc. In this way no information is lost and the community of developers is taken care of.
Hope this helps. As things scale there is a stronger need for 'standardization'; it's unavoidable. Tool-chains are rarely the silver bullet. What boost has shouldn't be neglected ... it is already good for reporting status and its failings can be worked on incrementally. Andy
-- Dave Abrahams Boost Consulting http://www.boost-consulting.com
The Astoria Seminar ==> http://www.astoriaseminar.com

on Wed Aug 08 2007, Andy Stevenson <andystevenson-AT-mac.com> wrote:
While I acknowledge that Boost's feedback system has substantial weaknesses, no other feedback system I've seen accommodates most of these features in any way.
I agree. I've had numerous experiences with large projects that have not done it as well as boost.
I wasn't trying to make a value judgement, FWIW.
Personally I find the status information held by meta-comm to be useful and informative. The opening page isn't very useful but digging in always leads to the information that is most useful.
Yes, you can get there. It should be easier.
Generally I agree with all the recommendations. However I am a big fan of incremental delivery and I would advocate boost approach this systemically.
Systematically?
You don't want to get into the tool business. (.. avoid the anecdotal 'why fix things in 5 minutes when I can take a year writing a tool to automate it! :-{)
I'm afraid it's inevitable in this case, unless we can get someone else to do it for us.
For what it is worth my advice would be to do the following;
1. Choose 2/3 representative tool-chains/platforms as boost 'reference models' (msvc-M.N, win XP X.Y...) (gcc-N.M, debian...) (gcc-N.M, MacOSX,...) - the choices are based on what's right 'for the masses' and what is the de facto platform for mainstream development on those platforms (before anyone screams I am seriously NOT advocating dropping the builds on the other platforms - read on) - whatever the choices end up being I believe 'boost' needs to make a clear policy decision.
2. These 'reference models' are the basis of summary reports at the top level against the 'stable' released libraries. That can go on a page and it should take a minor amount of time to generate incrementally from the existing system.
That won't fix the system's reliability.
3. As for tracking individual test results I don't personally see what's wrong with putting these under subversion.
See several responses on this list to John Maddock, who made that suggestion too.
Given the likelihood of high commonality between the output text of successive runs I think it is a much better 'implementation choice' than strictly a database.
You can store diffs in a database too. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com

On 10 Aug 2007, at 03:44, David Abrahams wrote:
on Wed Aug 08 2007, Andy Stevenson <andystevenson-AT-mac.com> wrote:
While I acknowledge that Boost's feedback system has substantial weaknesses, no other feedback system I've seen accommodates most of these features in any way.
I agree. I've had numerous experiences with large projects that have not done it as well as boost.
I wasn't trying to make a value judgement, FWIW.
Personally I find the status information held by meta-comm to be useful and informative. The opening page isn't very useful but digging in always leads to the information that is most useful.
Yes, you can get there. It should be easier.
Agreed... hence the suggestion to cut down the area covered by the multiple layers of OS/compiler variants....?
Generally I agree with all the recommendations. However I am a big fan of incremental delivery and I would advocate boost approach this systemically.
Systematically?
That too :-)!
You don't want to get into the tool business. (.. avoid the anecdotal 'why fix things in 5 minutes when I can take a year writing a tool to automate it! :-{)
I'm afraid it's inevitable in this case, unless we can get someone else to do it for us.
You're in a much better place to judge this than me. However I would seriously recommend moving the existing framework forward to gauge real requirements.
For what it is worth my advice would be to do the following;
1. Choose 2/3 representative tool-chains/platforms as boost 'reference models' (msvc-M.N, win XP X.Y...) (gcc-N.M, debian...) (gcc-N.M, MacOSX,...) - the choices are based on what's right 'for the masses' and what is the de facto platform for mainstream development on those platforms (before anyone screams I am seriously NOT advocating dropping the builds on the other platforms - read on) - whatever the choices end up being I believe 'boost' needs to make a clear policy decision.
2. These 'reference models' are the basis of summary reports at the top level against the 'stable' released libraries. That can go on a page and it should take a minor amount of time to generate incrementally from the existing system.
That won't fix the system's reliability.
Sure, I didn't intend it to address the reliability issues. My point was that, from a reporting, support, release, and development perspective, boost may make more rapid progress if it cut down the 'platform' area it tries to support as the vanguard of development. In an area where skilled resources are scarce or unpredictable, couldn't that be a pragmatic way to address some of the key issues around developer feedback?
3. As for tracking individual test results I don't personally see what's wrong with putting these under subversion.
See several responses on this list to John Maddock, who made that suggestion too.
Yep. Read what John wrote and agree to most of it.
Given the likelihood of high commonality between the output text of successive runs I think it is a much better 'implementation choice' than strictly a database.
You can store diffs in a database too.
Sure, I understand that... but given that svn is already core to the boost toolchain, is there a need to introduce databases into the toolchain as well? Andy
-- Dave Abrahams Boost Consulting http://www.boost-consulting.com
The Astoria Seminar ==> http://www.astoriaseminar.com

on Fri Aug 10 2007, Andy Stevenson <andystevenson-AT-mac.com> wrote:
Given the likelihood of high commonality between the output text of successive runs I think it is a much better 'implementation choice' than strictly a database.
You can store diffs in a database too.
Sure, understand that... i think as svn is core to the boost toolchain is there a need to introduce databases into the toolchain?
Nothing to introduce. We already have a Trac server, which has a database backend. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com

David Abrahams <dave <at> boost-consulting.com> writes:
This part of my analysis focuses on the tools available for getting feedback from the system about what's broken. Once again, because there's been substantial effort invested in dart/cmake/ctest and interest expressed by Kitware in supporting our use thereof, I'm including that along with our current mechanisms. Although not strictly a reporting system, I'll also discuss BuildBot a bit because Rene has been doing some research on it and it has some feedback features.
<details snipped>
I would like to add another possibility into the mix, if I may. I am a founder and developer at Zutubi (http://zutubi.com/). We make an automated build (or continuous integration) server named Pulse. An ex-colleague of mine drew my attention to this discussion (I have been out of C++ for a couple of years now so did not pick it up myself). I thought that maybe there would be an opportunity for collaboration here if the boost community is interested.
First off, we offer Pulse licenses for free to any open source project (there are no strings attached). In this more specific case, being a previous boost user I had always thought that the project would be a great test for Pulse. One of the main features of Pulse is to enable building and testing across multiple environments (operating systems, build tools, runtime versions etc), which is obviously a key requirement for boost. I also believe that Pulse offers a very usable web interface in contrast to a lot of the open source alternatives.
In addition to a free license and the current features of Pulse (which I will not bore you with here), we also realise we would need to add further features and help with integration. For example, your requirements for supporting different types of test failures are only partially covered in the current Pulse release (we indicate that a test has been failing since some earlier build, but do not have a notion of "known" failures). Also, support for boost build and test tools will need to be added, but this is quite simple (and is already on our roadmap for existing customers).
In terms of how we benefit from the arrangement, I will be upfront. Most importantly, as I mentioned, I believe that this will be a great test for Pulse and will lead us to feature ideas that solve real problems. Second, as a previous boost user I am personally happy to give some time back. Finally, we may get some wider exposure for Pulse in the boost community. Let me stress, however, that this is not required in any way -- it is entirely up to the boost team to decide what level of exposure would be appropriate. If it were a real concern we could even rebrand the UI.
Thanks for your consideration, and I look forward to your feedback. Cheers, Jason

Jason Sankey wrote:
I would like to add another possibility into the mix, if I may. I am a founder and developer at Zutubi (http://zutubi.com/).
Interesting. A few questions, after having a brief glance at your docs...
Would you say your product aims to cover the combined area that Buildbot, build/test collection (CTest, Boost regression.py, Dartboard, and partly Buildbot), and reporting (Dartboard, and Boost XSL reports) encompass?
I see the server is a Java application. Do the agents also require Java?
Do you support client-to-server connection model for agents, instead of the server-to-client one described in the docs?
Is there a description of the SQL DB schema available?
Is non-SCM build/testing supported? In particular, obtaining source snapshots in the form of archives from a web server?
--Does the subversion change monitoring occur as a watchdog or as an svn post-commit script? Or are both supported?-- Scratch that, found the answer :-)
Do you have support for independent builders & testers to post results outside of the master-agent framework?
Can a single executed recipe command sprout multiple result data points without output post-processing?
Does the output post-processing happen on the client or server?
Are dynamic file artifacts supported? And by dynamic I mean a variable set of generated files.
I'm a bit confused about your arrangement of projects, builds, and agents...
* Is it possible to control which agents get particular builds?
* Is it possible to have sub-projects?
* Can an agent perform builds for multiple projects?
* Can one or more builds on one or more agents depend on the success of one or more builds on one or more agents?
Is it possible to have dynamic builds? By this I mean builds for which the commands, i.e. the recipe(s), change before or during the build itself.
Is it possible to add additional notification methods, for example IRC or SMS? And how hard/easy would it be?
-- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

Rene Rivera wrote:
Jason Sankey wrote:
I would like to add another possibility into the mix, if I may. I am a founder and developer at Zutubi (http://zutubi.com/).
Interesting.
PS. Pagination on your issue tracker doesn't work without cookies :-( -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - grafikrobot/yahoo

Rene Rivera wrote:
Jason Sankey wrote:
I would like to add another possibility into the mix, if I may. I am a founder and developer at Zutubi (http://zutubi.com/).
Interesting. A few questions, after having a brief glance at your docs...
Would you say your product aims to cover the combined area that Buildbot, build/test collection (CTest, Boost regression.py, Dartboard, and partly Buildbot), and reporting (Dartboard, and Boost XSL reports) encompass?
The closest match in that list is Buildbot, although you are right that our features and reporting cover more ground including overlapping with the other tools you mention. In particular the unit test results are pulled directly into the Pulse UI as they are such an important part of the build/test cycle.
I see the server is a Java application. Do the agents also require Java?
Yes, the master and agents both require a JVM. The agents are lightweight and throwaway; the master maintains the persistent data on disk and in an embedded or (preferably) external database.
Do you support client-to-server connection model for agents, instead of the server-to-client one described in the docs?
There is only one connection model, which is bi-directional and over HTTP. The master pings agents to detect when they are available, and agents forward build events back to the master.
Is there a description of the SQL DB schema available?
There is not, although we are open to it. However, for most external integration we prefer usage of the XML-RPC remote API as it is usually simpler and more likely to remain compatible across versions.
Is non-SCM build/testing supported? In particular, obtaining source snapshots in the form of archives from a web server?
Not in the current release, but it should be possible in the coming release (aiming for beta next month) as we are making SCMs pluggable.
--Does the subversion change monitoring occur as a watchdog or as an svn post-commit script? Or are both supported?-- Scratch that, found the answer :-)
For the benefit of the list both are supported. The default is polling as it works "out of the box" but triggering builds via the remote API is possible from a post-commit hook.
Do you have support for independent builders & testers to post results outside of the master-agent framework?
Not at present.
Can a single executed recipe command sprout multiple result data points without output post-processing?
I am not sure what sort of output data points you are referring to here. Generally speaking, a single command just succeeds or fails. Post-processing is used to extract further useful information such as errors, warnings and test results.
Does the output post-processing happen on the client or server?
Post-processing happens during the build, on the agent that runs it. It is also possible to execute arbitrary post-build commands on the master, but these cannot contribute errors or test results to the build itself.
Are dynamic file artifacts supported? And by dynamic I mean a variable set of generated files.
You can capture files using wildcards quite flexibly.
I'm a bit confused about your arrangement of projects, builds, and agents...
* Is it possible to control which agents get particular builds?
Yes. You can assign build stages to specific agents or to any "capable" agent, where capability is determined by the resources available on the agent. Resources are configured via the web UI per agent, and can be used to represent many things like available build tools, runtimes, the host operating system or logical agent groups.
* Is it possible to have sub-projects?
There is no current concept of sub-projects. We do have project groups (for grouping in the UI).
* Can an agent perform builds for multiple projects?
Yes, but presently agents will only run one build at a time.
* Can one or more builds on one or more agents depend on the success of one or more builds on one or more agents?
It is possible to trigger the build of one project when the build of another project completes, optionally only if the original build succeeds. This cannot be parameterised by agent at the moment.
Is it possible to have dynamic builds? By this I mean builds for which the commands, i.e. the recipe(s), change before or during the build itself.
You cannot modify the Pulse recipes during the build. However, we prefer Pulse recipes to be as simple as possible, with as much of the build logic as possible left to an underlying build tool (like make or bjam). Basically, these tools are better at that part of the process and are not tied to Pulse; Pulse just aims to interoperate with them.
Is it possible to add additional notification methods, for example IRC or SMS? And how hard/easy would it be?
Custom notifications are currently possible using post-build actions: if you can do it from the command line, then you can trigger it quite easily with a post-build action. We are also looking at making notifications pluggable (currently we provide email, Jabber and Windows system tray out of the box).

Hope that helps clarify some points,
Jason

on Sat Aug 18 2007, Jason Sankey <jason-AT-zutubi.com> wrote:
David Abrahams <dave <at> boost-consulting.com> writes:
This part of my analysis focuses on the tools available for getting feedback from the system about what's broken. Once again, because there's been substantial effort invested in dart/cmake/ctest and interest expressed by Kitware in supporting our use thereof, I'm including that along with our current mechanisms. Although not strictly a reporting system, I'll also discuss BuildBot a bit because Rene has been doing some research on it and it has some feedback features.
<details snipped>
I would like to add another possibility into the mix, if I may. I am a founder and developer at Zutubi (http://zutubi.com/)...
Jason,

As far as I'm concerned Boost would be more than happy to use your tools if it will save us labor. With no other immediate prospects for improving things, we're going to try to launch a project to re-do the reporting system as a Trac plugin, with something like an XML-RPC frontend that will accept results from the testing servers. If you are offering to set up a Zutubi system that tests Boost, and it can do the things we need in terms of reporting (like handle test markup), we could delay that project and give you a chance to get something going. Do you think that would save us labor in the long run?

-- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com
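The proposed XML-RPC frontend is not specified beyond this paragraph; purely as a sketch of the idea, a minimal results-accepting endpoint might look like the following. The submit_results method name and the payload fields are invented here for illustration and are not an existing Boost or Trac interface.

# Sketch of the proposed XML-RPC results frontend.  The method name and
# payload fields are invented for illustration; the real interface would
# be defined by the Trac plugin project.
from SimpleXMLRPCServer import SimpleXMLRPCServer  # Python 2-era module

results_store = []  # stand-in for whatever the Trac plugin would persist

def submit_results(runner, platform, toolset, test_results):
    """Accept one test run: test_results is a list of dicts such as
    {'library': 'config', 'test': 'abi_test', 'status': 'pass'}."""
    results_store.append({
        'runner': runner,
        'platform': platform,
        'toolset': toolset,
        'results': test_results,
    })
    return True  # XML-RPC methods must return something

if __name__ == "__main__":
    server = SimpleXMLRPCServer(("0.0.0.0", 8000))
    server.register_function(submit_results)
    server.serve_forever()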

Hi David,

David Abrahams wrote:
on Sat Aug 18 2007, Jason Sankey <jason-AT-zutubi.com> wrote:
David Abrahams <dave <at> boost-consulting.com> writes:
This part of my analysis focuses on the tools available for getting feedback from the system about what's broken. Once again, because there's been substantial effort invested in dart/cmake/ctest and interest expressed by Kitware in supporting our use thereof, I'm including that along with our current mechanisms. Although not strictly a reporting system, I'll also discuss BuildBot a bit because Rene has been doing some research on it and it has some feedback features. <details snipped>
I would like to add another possibility into the mix, if I may. I am a founder and developer at Zutubi (http://zutubi.com/)...
Jason,
As far as I'm concerned Boost would be more than happy to use your tools if it will save us labor. With no other immediate prospects for improving things, we're going to try to launch a project to re-do the reporting system as a Trac plugin, with something like an XML-RPC frontend that will accept results from the testing servers. If you are offering to set up a Zutubi system that tests Boost, and it can do the things we need in terms of reporting (like handle test markup), we could delay that project and give you a chance to get something going. Do you think that would save us labor in the long run?
Certainly, this is the idea in general. Software development teams are usually capable of creating their own automated build system, but it costs a lot of labour that is better spent on other things. The only real advantage to rolling your own is the complete customisability. I am confident that a lot of what you need will come out of the box with Pulse. Also, since part of the idea from our end is to push the boundaries of Pulse, we will be available to add features that are required. I can't promise immediate addition of all requested features (it is clearly not practical) but if there are showstoppers we will address them.

Further, we continue to open up areas of extensibility in Pulse (like the remote API, and plugins in the coming version), so if members of the boost community want to bend it one way or the other it should be possible. In our opinion a key requirement of a build server is flexibility to integrate with all sorts of existing practices and tools.

The next step from our end would be to build in some support for bjam and Boost.Test (I can have it done by next week) and then set up an initial server to show you what Pulse is about. This is the lowest risk way for boost in that you don't need to invest much effort to take a look. The main question is provision of hardware: do you have hardware available that you would like to host this demo on? If not, I am sure I can sort something out. I guess we can take these details off the list and post back once something is running to get some feedback.

Cheers,
Jason

on Wed Sep 05 2007, Jason Sankey <jason-AT-zutubi.com> wrote:
The next step from our end would be to build in some support for bjam and Boost.Test (I can have it done by next week) and then set up an initial server to show you what Pulse is about. This is the lowest risk way for boost in that you don't need to invest much effort to take a look. The main question is provision of hardware: do you have hardware available that you would like to host this demo on?
I have hardware that I'm *willing* to host it on, but it would be simpler at least in the early stages (while you're the only person who needs write access to the demo) if you hosted it.
If not, I am sure I can sort something out. I guess we can take these details off the list and post back once something is running to get some feedback.
That would be fine. Thanks for your generous offer. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com

David Abrahams wrote:
on Wed Sep 05 2007, Jason Sankey <jason-AT-zutubi.com> wrote:
The next step from our end would be to build in some support for bjam and Boost.Test (I can have it done by next week) and then set up an initial server to show you what Pulse is about. This is the lowest risk way for boost in that you don't need to invest much effort to take a look. The main question is provision of hardware: do you have hardware available that you would like to host this demo on?
I have hardware that I'm *willing* to host it on, but it would be simpler at least in the early stages (while you're the only person who needs write access to the demo) if you hosted it.
Yep, this makes sense. I will sort out a server to host the demo, and if things go well we can migrate as necessary.
If not, I am sure I can sort something out. I guess we can take these details off the list and post back once something is running to get some feedback.
That would be fine. Thanks for your generous offer.
No problem, I will come back to the list when I have something running. Cheers, Jason

on Wed Sep 05 2007, Jason Sankey <jason-AT-zutubi.com> wrote:
Certainly, this is the idea in general. Software development teams are usually capable of creating their own automated build system, but it costs a lot of labour that is better spent on other things. The only real advantage to rolling your own is the complete customisability. I am confident that a lot of what you need will come out of the box with Pulse. Also, since part of the idea from our end is to push the boundaries of Pulse, we will be available to add features that are required.
I can't promise immediate addition of all requested features (it is clearly not practical) but if there are showstoppers we will address them.
The main things that absolutely need to be there are:

1. support for the explicit/expected failure markup that is currently stored at http://boost.org/status/explicit-failures-markup.xml and integrated with our results display as described in my posting at the beginning of this part of the thread.

2. support for determining which tests have regressed since the previous release.

-- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com

David Abrahams wrote:
on Wed Sep 05 2007, Jason Sankey <jason-AT-zutubi.com> wrote:
Certainly, this is the idea in general. Software development teams are usually capable of creating their own automated build system, but it costs a lot of labour that is better spent on other things. The only real advantage to rolling your own is the complete customisability. I am confident that a lot of what you need will come out of the box with Pulse. Also, since part of the idea from our end is to push the boundaries of Pulse, we will be available to add features that are required.
I can't promise immediate addition of all requested features (it is clearly not practical) but if there are showstoppers we will address them.
The main things that absolutely need to be there are:
1. support for the explicit/expected failure markup that is currently stored at http://boost.org/status/explicit-failures-markup.xml and integrated with our results display as described in my posting at the beginning of this part of the thread.
OK, as indicated this would need to be added. I will look into this after I have an initial server running.
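For reference, consuming that markup amounts to mapping (library, test, toolset) triples to an "expected failure" flag. A rough sketch follows; the element and attribute names (library, mark-expected-failures, test, toolset) are assumptions about the markup's structure and should be checked against the actual file before relying on this.

# Rough sketch of reading explicit-failures-markup.xml and answering
# "is this failure expected?".  The element and attribute names below are
# assumptions about the markup's structure; verify against the real file.
import xml.etree.ElementTree as ET

def load_expected_failures(path):
    expected = set()  # (library, test, toolset) triples
    root = ET.parse(path).getroot()
    for lib in root.findall("library"):
        lib_name = lib.get("name")
        for mark in lib.findall("mark-expected-failures"):
            tests = [t.get("name") for t in mark.findall("test")]
            toolsets = [t.get("name") for t in mark.findall("toolset")]
            for test in tests:
                for toolset in toolsets:
                    expected.add((lib_name, test, toolset))
    return expected

def is_expected_failure(expected, library, test, toolset):
    return (library, test, toolset) in expected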
2. support for determining which tests have regressed since the previous release.
This we already have: tests that have been failing for multiple builds are displayed differently with a link to the build they have been failing since. We can tweak the reporting of these failures if necessary, but at least all the info is already known to Pulse.

Cheers,
Jason
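Pulse tracks this internally; for readers following along, the comparison Boost needs against a release baseline boils down to something like the following sketch (not Pulse's implementation).

# Sketch of regression detection against a release baseline.  Results are
# dicts keyed by (library, test, toolset) with 'pass'/'fail' values; a
# regression is a test that passed at the release but fails now.
def regressions(release_results, current_results):
    return sorted(
        key for key, status in current_results.items()
        if status == 'fail' and release_results.get(key) == 'pass'
    )

# Example:
#   release = {('config', 'abi_test', 'gcc-4.1'): 'pass'}
#   current = {('config', 'abi_test', 'gcc-4.1'): 'fail'}
#   regressions(release, current) -> [('config', 'abi_test', 'gcc-4.1')]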

on Thu Sep 06 2007, Jason Sankey <jason-AT-zutubi.com> wrote:
David Abrahams wrote:
on Wed Sep 05 2007, Jason Sankey <jason-AT-zutubi.com> wrote:
Certainly, this is the idea in general. Software development teams are usually capable of creating their own automated build system, but it costs a lot of labour that is better spent on other things. The only real advantage to rolling your own is the complete customisability. I am confident that a lot of what you need will come out of the box with Pulse. Also, since part of the idea from our end is to push the boundaries of Pulse, we will be available to add features that are required.
I can't promise immediate addition of all requested features (it is clearly not practical) but if there are showstoppers we will address them.
The main things that absolutely need to be there are:
1. support for the explicit/expected failure markup that is currently stored at http://boost.org/status/explicit-failures-markup.xml and integrated with our results display as described in my posting at the beginning of this part of the thread.
OK, as indicated this would need to be added. I will look into this after I have an initial server running.
Great, looking forward to this.

For what it's worth, making results reporting reliable has become a critical problem for us, so the faster you can get something working, the better. And, if there's going to be a problem, please let us know ASAP so we can start investigating other solutions right away.

Thanks again,

-- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com

David Abrahams wrote:
on Thu Sep 06 2007, Jason Sankey <jason-AT-zutubi.com> wrote:
David Abrahams wrote:
on Wed Sep 05 2007, Jason Sankey <jason-AT-zutubi.com> wrote:
Certainly, this is the idea in general. Software development teams are usually capable of creating their own automated build system, but it costs a lot of labour that is better spent on other things. The only real advantage to rolling your own is the complete customisability. I am confident that a lot of what you need will come out of the box with Pulse. Also, since part of the idea from our end is to push the boundaries of Pulse, we will be available to add features that are required.
I can't promise immediate addition of all requested features (it is clearly not practical) but if there are showstoppers we will address them.

The main things that absolutely need to be there are:

1. support for the explicit/expected failure markup that is currently stored at http://boost.org/status/explicit-failures-markup.xml and integrated with our results display as described in my posting at the beginning of this part of the thread.

OK, as indicated this would need to be added. I will look into this after I have an initial server running.
Great, looking forward to this.
For what it's worth, making results reporting reliable has become a critical problem for us, so the faster you can get something working, the better. And, if there's going to be a problem, please let us know ASAP so we can start investigating other solutions right away.
OK. I am underway, having added boost jam support and looking at the best way to gather and integrate the test results. I was hoping that there would be a way to run a full build with one or more XML test reports generated (since I know that Boost.Test supports XML reporting). Looking more closely I see that the current regression testing process uses the normal test report format, which I can also integrate if generating XML proves difficult. I'm still examining the current build process to try and understand the best way to do it, but should be up and running soon.

As an aside, the output from boost jam is somewhat hostile to post-processing. One feature of Pulse is the ability to pull interesting information like warnings and errors out of your build log so we can summarise and highlight them in the UI. With tools like make and GCC this is fairly easy as they have a predictable and uniform error message output. Playing with boost jam I notice that the error messages are quite diverse and hard to predict. Although I added post-processing rules for the errors I found, it might be worth looking into making the output more machine-friendly - not just for my sake, but for any tools that might want to process the output.

Cheers,
Jason
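To make the post-processing point concrete, the kind of log scraping being described looks roughly like the sketch below. The regular expressions are illustrative only; the diversity of real bjam and compiler messages is exactly the machine-friendliness problem being raised.

# Illustrative regex-based scraping of a bjam build log for errors and
# warnings.  The patterns are examples only; real bjam and compiler
# messages vary widely, which is the problem described above.
import re

ERROR_PATTERNS = [
    re.compile(r": error[: ]"),       # gcc/msvc style "file:line: error: ..."
    re.compile(r"^\.\.\.failed "),    # bjam "...failed <action> <target>..." lines
]
WARNING_PATTERN = re.compile(r"[Ww]arning[: ]")

def summarize(log_path):
    errors, warnings = [], []
    for line in open(log_path):
        if any(p.search(line) for p in ERROR_PATTERNS):
            errors.append(line.rstrip())
        elif WARNING_PATTERN.search(line):
            warnings.append(line.rstrip())
    return errors, warnings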

on Wed Sep 12 2007, Jason Sankey <jason-AT-zutubi.com> wrote:
OK. I am underway, having added boost jam support and looking at the best way to gather and integrate the test results.
Fantastic!
I was hoping that there would be a way to run a full build with one or more XML test reports generated (since I know that Boost.Test supports XML reporting).
We can generate XML test reports, but it's not done by Boost.Test; it's done by process_jam_log.py
Looking more closely I see that the current regression testing process uses the normal test report format,
What "normal test report format" are you referring to?
which I can also integrate if generating XML proves difficult. I'm still examining the current build process to try and understand the best way to do it, but should be up and running soon.
As an aside, the output from boost jam is somewhat hostile to post-processing. One feature of Pulse is the ability to pull interesting information like warnings and errors out of your build log so we can summarise and highlight them in the UI. With tools like make and GCC this is fairly easy as they have a predictable and uniform error message output. Playing with boost jam I notice that the error messages are quite diverse and hard to predict. Although I added post-processing rules for the errors I found, it might be worth looking into making the output more machine-friendly - not just for my sake, but for any tools that might want to process the output.
Rene, IIUC you already have the results capture facility implemented that would allow this? -- Dave Abrahams Boost Consulting http://www.boost-consulting.com The Astoria Seminar ==> http://www.astoriaseminar.com

David Abrahams wrote:
on Wed Sep 12 2007, Jason Sankey <jason-AT-zutubi.com> wrote: <snip>
I was hoping that there would be a way to run a full build with one or more XML test reports generated (since I know that Boost.Test supports XML reporting).
We can generate XML test reports, but it's not done by Boost.Test; it's done by process_jam_log.py
With some more digging and trial and error I went down this path. Instead of adding support for Boost.Test XML reports I have added support for the test_log.xml files generated by process_jam_log. I was expecting Boost.Test to be used, as I had not considered the nature of some Boost tests: e.g. those that test "compilability". This is a rather unique testing setup in that regard, but at the end of the day, once I was able to generate XML reports it was easy to integrate.

What I have running at present (in development - I have a server I am readying to transfer this to) is a "developer-centric" view like you mentioned earlier in the thread. That is, the result of the build is binary and useful to Boost developers for picking up regressions fast. There are also developer-centric views, reports, notifications etc. All this is less useful for reporting the status of each platform from the Boost *user's* perspective. There are several possible ways to approach this, probably best left until after you get an initial look at what the heck Pulse does so far.

I do have a couple of issues though:

- Some of the test_log.xml files cannot be parsed by the library I am using due to certain characters contained within. I have not looked into whether the problem is the logs or the library I am using.

- I have 49 failing cases (out of 2232 that can be parsed atm). I guess some of these failures may be a certain class of "expected" failure. I am yet to fully understand all the classes of expected failure, in particular which classes are reported as a success in the test_log.xml vs. as a failure.

- I get a couple of warnings about capabilities that are not being built due to external dependencies (GraphML and MPI support). These may be easy to add; I just need to read a bit more.
Looking more closely I see that the current regression testing process uses the normal test report format,
What "normal test report format" are you referring to?
Sorry, poor choice of words. I was just referring to what comes out of the bjam build and is processed by process_jam_log (which I also did not really understand at that point).

<snip>

Cheers,
Jason
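On the unparseable test_log.xml files mentioned above: one plausible culprit is raw control characters from tool output embedded in the logs, which XML 1.0 forbids. Whether that is the actual cause here is an assumption; a sketch of scrubbing such bytes before parsing:

# Sketch of scrubbing characters that XML 1.0 forbids (raw control bytes
# other than tab, newline and carriage return) from a test_log.xml file
# before parsing it.  That these bytes are the cause of the parse failures
# described above is an assumption, not something confirmed in the thread.
import re
import xml.etree.ElementTree as ET

ILLEGAL_XML_BYTES = re.compile(b"[\x00-\x08\x0b\x0c\x0e-\x1f]")

def parse_test_log(path):
    data = open(path, "rb").read()
    cleaned = ILLEGAL_XML_BYTES.sub(b"?", data)  # replace offenders visibly
    return ET.fromstring(cleaned)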
participants (18)
- Ames, Andreas (Andreas)
- Andy Stevenson
- Beman Dawes
- Brad King
- Daniel Wallin
- David Abrahams
- Doug Gregor
- Felipe Magno de Almeida
- Hartmut Kaiser
- Janek Kozicki
- Jason Sankey
- John Maddock
- Maik Beckmann
- Rene Rivera
- Sohail Somani
- Stefan Seefeld
- Timothy M. Shead
- Vladimir Prus