[testing] Refactoring serialization tests, and possibly others.

After having run two cycles of tests for Boost with the mingw-3_4_2-stlport-5_0 configuration, and having it take more than 14 hours on a 2.2GHz+1GB machine, most of that in the Boost.Serialization library[*], and after reading some of the recent discussion about the desire to expand testing to include cross-version and cross-compiler compatibility, and hence having the number of tests multiply possibly exponentially, I am seriously concerned that we are going in the wrong direction when it comes to structuring tests.

From looking at the tests for serialization I think we are over-testing, and we are past the point of exhausting testing resources. Currently this library takes the approach of carpet bombing the testing space. The current tests follow this overall structure:

[feature tests] x [archive types] x [char/wchar] x [DLL/not-DLL]

Obviously this will never scale.

My first observation is that those axes don't look like independent features to me. That is, for example, the char/wchar functionality doesn't depend on the feature being tested, or at least it shouldn't, and I can't imagine the library is structured internally that way. To me it doesn't make sense to test "array" saving with each of the 3 archive types, since the code for serialization of the "array" is the same in all situations. Hence it would make more sense to me to structure the tests as:

[feature tests] x [xml archive type] x [char] x [non-DLL]
[text archive tests] x [char] x [non-DLL]
[binary archive tests] x [non-DLL]
[wchar tests] x [non-DLL]
[DLL tests]

Basically it's structured to test specific aspects of the library, not to test each aspect against each other aspect. Some benefits as I see them:

* Reduced number of tests means faster turnaround on testing.
* It's much easier to add tests for other aspects, as one only has to concentrate on a few tests instead of many likely unrelated aspects.
* The tests can be expanded to test the aspects more critically. For example, the DLL tests can be very specific as to what aspect of DLL vs non-DLL they test.
* It is easier to tell what parts of the library are breaking when the tests are specific.

[*] That's just a CPU time observation, not to mention that it takes 2.15GB of disk space out of a total of 5.09GB. And it's only one compiler variation.

-- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - Grafik/jabber.org
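To make the orthogonality point concrete: a serializable type defines a single serialize() function, and that same function is driven by every archive type. A minimal sketch (the point type, members and file names are invented for illustration, not taken from the library's test suite):

    #include <fstream>
    #include <boost/archive/text_oarchive.hpp>
    #include <boost/archive/xml_oarchive.hpp>
    #include <boost/serialization/nvp.hpp>

    // One serialize() member is all the library sees, regardless of
    // which archive class is driving it.
    struct point
    {
        int x, y;

        template <class Archive>
        void serialize(Archive & ar, const unsigned int /*version*/)
        {
            ar & BOOST_SERIALIZATION_NVP(x);
            ar & BOOST_SERIALIZATION_NVP(y);
        }
    };

    int main()
    {
        const point p = { 1, 2 };

        std::ofstream txt("point.txt");
        boost::archive::text_oarchive ta(txt);
        ta << BOOST_SERIALIZATION_NVP(p);   // text archive ignores the name

        std::ofstream xml("point.xml");
        boost::archive::xml_oarchive xa(xml);
        xa << BOOST_SERIALIZATION_NVP(p);   // xml archive uses it as the element name

        return 0;
    }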

Basically you're correct on all of this.

Rene Rivera wrote:
After having run two cycles of tests for Boost with the mingw-3_4_2-stlport-5_0 configuration, and having it take more than 14 hours on a 2.2GHz+1GB machine, most of that in the Boost.Serialization library[*], and after reading some of the recent discussion about the desire to expand testing to include cross-version and cross-compiler compatibility, and hence having the number of tests multiply possibly exponentially, I am seriously concerned that we are going in the wrong direction when it comes to structuring tests.
This was the basis of my suggestion that we run a complete set only very occasionally.
From looking at the tests for serialization I think we are over-testing, and we are past the point of exhausting testing resources. Currently this library takes the approach of carpet bombing the testing space. The current tests follow this overall structure:
[feature tests] x [archive types] x [char/wchar] x [DLL/not-DLL]
Obviously this will never scale.
carpet bombing the test space? - I like the imagery. When I started this was not a problem. I was happy to beat it to death as I could ( and still do ) just run the whole suite on my machine overnight when ever I make a change. However, I agree that we're about at the limit without making some changes.
My first observation is that those axes don't look like independent features to me. That is, for example, the char/wchar functionality doesn't depend on the feature being tested, or at least it shouldn't, and I can't imagine the library is structured internally that way. To me it doesn't make sense to test "array" saving with each of the 3 archive types, since the code for serialization of the "array" is the same in all situations. Hence it would make more sense to me to structure the tests as:
[feature tests] x [xml archive type] x [char] x [non-DLL]
[text archive tests] x [char] x [non-DLL]
[binary archive tests] x [non-DLL]
[wchar tests] x [non-DLL]
[DLL tests]
Basically it's structured to test specific aspects of the library not to test each aspect against each other aspect. Some benefits as I see them:
This makes a lot of sense - except that in the past some aspects have turned out to be accidentally connected. Also, sometimes compiler quirks show up in just some combinations.
* Reduced number of tests means faster turnaround on testing.
* It's much easier to add tests for other aspects, as one only has to concentrate on a few tests instead of many likely unrelated aspects.
* The tests can be expanded to test the aspects more critically. For example, the DLL tests can be very specific as to what aspect of DLL vs non-DLL they test.
Note that the DLL version should function identically to the static library version - so this is an exhaustive test of that fact.
* It is easier to tell what parts of the library are breaking when the tests are specific.
Hmm - that sort of presumes we know what's going to fail ahead of time.

There is another related issue. It seems that the tests are run every night - even though no changes have been made at all to the serialization library. In effect, we're using the serialization library to test other changes in Boost. The argument you make above can just as well be used to argue that serialization is on a different dimension than other libraries, so serialization tests shouldn't be re-run just because some other library changes.

So there are a number of things that might be looked into:

a) Reduce the combinations of the serialization tests.
b) Don't use libraries to test other libraries. That is, don't re-test one library (e.g. serialization) just because some other library that it depends upon (e.g. mpl) has changed.
c) Define two separate test Jamfiles: i) normal test ii) carpet bombing mode
e) Maybe normal mode can be altered on a frequent basis when I just want to test a new feature, or just one test.
f) Include as part of the installation instructions for users an exhaustive test mode. That is, a user who downloads and installs the package would have the option of producing the whole test results on his own platform and sending in his results. This would have a couple of advantages: i) It would ensure that all new platforms are tested ii) It would ensure that the user has everything installed correctly.

Robert Ramey

Robert Ramey wrote:
Basically you're correct on all of this.
Rene Rivera wrote:
After having run two cycles of tests for Boost with the mingw-3_4_2-stlport-5_0 configuration, and having it take more than 14 hours on a 2.2GHz+1GB machine, most of that in the Boost.Serialization library[*], and after reading some of the recent discussion about the desire to expand testing to include cross-version and cross-compiler compatibility, and hence having the number of tests multiply possibly exponentially, I am seriously concerned that we are going in the wrong direction when it comes to structuring tests.
This was the basis of my suggestion that we run a complete set only very occasionally.
I thought that was a given. Sorry if I didn't make that clear when suggesting widening the current strategy. Clearly this isn't for everybody. [snip]
So there are a number of things that might be looked into
a) Reduce the combinations of the serialization tests.
b) Don't use libraries to test other libraries. That is, don't re-test one library (e.g. serialization) just because some other library that it depends upon (e.g. mpl) has changed.
c) Define two separate test Jamfiles: i) normal test ii) carpet bombing mode
e) Maybe normal mode can be altered on a frequent basis when I just want to test a new feature, or just one test.
f) Include as part of the installation instructions for users an exhaustive test mode. That is, a user who downloads and installs the package would have the option of producing the whole test results on his own platform and sending in his results. This would have a couple of advantages: i) It would ensure that all new platforms are tested ii) It would ensure that the user has everything installed correctly.
a) I haven't thought about yet. I agree that the scheme could be better thought out, but it looks tricky to do it right, and my big issue right now, for a client, is to be able to test platform-independent archives (from different architectures) against each other in carpet-bomb mode. Since it turns out to be trivial to also accommodate testing different boost versions and compilers, I'm offering it up.

b) is out of my hands.

c) i) I have refactored the "normal tests" in places to support the existence of a carpet bomb mode, without lengthening the current "normal mode" tests. I've also been switching tests over to the autoregistering unit test framework as I go.

c) ii) I have been working hard on. Carpet bombing is controlled by an environment variable (which I am considering changing to BOOST_SERIALIZATION_CARPET_BOMB). The tarball I posted recently is now very out-of-date; I have a lot of changes and have noticed a few additional things. For instance, one needs a second class A that can be serialized by platform-independent archives, and the build environment will have to do this switching. Some primitives in the existing class A need to be removed in the platform-independent case, and the use of std::rand() has to go, as you won't get the same stream of numbers on all platforms; instead you use boost::random RNGs, and you need to be sure that you (re)seed the RNGs at appropriate times (a rough sketch of this appears below).

d) If there's a carpet bombing mode, what do you call normal mode? Covering Fire?

e) Sounds convenient.

f) Whatever I come up with for c) ii) should be pretty easy to package like that, if you guys actually want to use it.

-t
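A minimal sketch of the std::rand() replacement described under c) ii) above - the generator choice, seed value and function names are illustrative assumptions, not the actual test code:

    #include <boost/random/mersenne_twister.hpp>
    #include <boost/random/uniform_int.hpp>
    #include <boost/random/variate_generator.hpp>

    // A fixed seed yields the same sequence on every platform, unlike
    // std::rand(), whose output differs between C library implementations.
    boost::mt19937 rng(42u);
    boost::uniform_int<> byte_values(0, 255);
    boost::variate_generator<boost::mt19937&, boost::uniform_int<> >
        next_value(rng, byte_values);

    // Reseed before generating the data for each archive so that the
    // save side and the load side of a test see identical values.
    void reset_test_data()
    {
        rng.seed(42u);
    }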

troy d. straszheim wrote:

Strange, I hadn't realized you were actively working on changing the serialization tests, from reading the previous threads :-\
Robert Ramey wrote:
So there are a number of things that might be looked into
a) Reduce the combinations of the serialization tests.
b) Don't use libraries to test other libraries. That is, don't re-test one library (e.g. serialization) just because some other library that it depends upon (e.g. mpl) has changed.
c) Define two separate test Jamfiles: i) normal test ii) carpet bombing mode
e) Maybe normal mode can be altered on a frequent basis when I just want to test a new feature, or just one test.
f) Include as part of the installation instructions for users an exhaustive test mode. That is, a user who downloads and installs the package would have the option of producing the whole test results on his own platform and sending in his results. This would have a couple of advantages: i) It would ensure that all new platforms are tested ii) It would ensure that the user has everything installed correctly.
a) I haven't thought about yet. I agree that the scheme could be better thought out, but it looks tricky to do it right, and my big issue right now, for a client, is to be able to test platform-independent archives (from different architectures) against each other in carpet-bomb mode. Since it turns out to be trivial to also accommodate testing different boost versions and compilers, I'm offering it up.
One thing I should make clear about my suggestion to reduce the combinations... is that it doesn't mean reducing the number of types that get serialized, but rather increasing those and having more complex and comprehensive tests.
b) is out of my hands.
It's likely out of everyone's hands. This is a build system issue with regard to preventing the normal header dependency scanning. I don't think it's a road we want to follow, as it overall destabilizes the correctness of test results. (c) smart_ptr already does this by running an expanded set of tests if you: "cd libs/smart_ptr/test && bjam test".
c) i) I have refactored the "normal tests" in places to support the existence of a carpet bomb mode, without lengthening the current "normal mode" tests. I've also been switching tests over to the autoregistering unit test framework as I go.
c) ii) I have been working hard on. Carpet bombing is controlled by an environment variable (which I am considering changing to BOOST_SERIALIZATION_CARPET_BOMB).
We really don't want to use environment variables any more. Try making it based on a command line switch. For example we might want to standardize on some set of testing level options, and incorporate them into the build system. Perhaps:

    --test-level=basic
    --test-level=regression
    --test-level=complete

You can easily test for the options with:

    if --test-level=basic in $(ARGV)
d) If there's a carpet bombing mode, what do you call normal mode? Covering Fire?
:-) -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - Grafik/jabber.org

Rene Rivera wrote:
One thing I should make clear about my suggestion to reduce the combinations... is that it doesn't mean reducing the number of types that get serialized, but rather increasing those and having more complex and comprehensive tests.
I need to think about it more... for now, I carpet bomb. But it clearly would get out of control as types, archives, platforms and versions proliferate. No question.

It would be interesting to see exactly where all the time is going. I have the feeling it is mostly in the build, and that if one simply clumped many of these tests together, you could eliminate a lot of duplicated effort (template instantiations and linking). This is another motivation for switching to autoregistering tests: an all-in-one-binary (fast but fragile) or one-binary-per-test (slow but durable) scenario could be fairly easily switched by bjam, since these autoregistering tests look like

    BOOST_AUTO_UNIT_TEST(whatever) { ... }

and you can combine as many of these into one binary as you like, which isn't the case with the current

    int test_main(int, char**) { ... }

where you'd get link errors.

So conceivably you'd have the all-in-one version for --test-level=basic, where you are just looking for a sanity check on your platform and aren't actually doing development on serialization, and the one-binary-per-unit-test version for the --test-level=complete scenario, where you're doing development on serialization and want to zoom in on test failures as closely as possible. Or maybe I'm out of my mind. Hmm. This is looking like a lot of work.
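To make the contrast concrete, a minimal sketch - the header is the autoregistering one shipped with Boost.Test, but the test name and check are invented, not taken from the serialization suite:

    // Each BOOST_AUTO_UNIT_TEST case registers itself with the framework,
    // so any number of these, spread across .cpp files, can be linked
    // into a single test binary.
    #include <boost/test/auto_unit_test.hpp>
    #include <boost/test/test_tools.hpp>

    BOOST_AUTO_UNIT_TEST(roundtrip_smoke_test)
    {
        BOOST_CHECK(2 + 2 == 4);   // real archive round-trip checks go here
    }

    // The older style's single entry point,
    //
    //     int test_main(int, char**) { ... }
    //
    // may appear only once per executable, so such modules cannot be
    // combined without multiple-definition link errors.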
... don't want to use environment variables any more. Try making it based on a command line switch. For example we might want to standardize on some set of testing level options, and incorporate them into the build system. Perhaps:
--test-level=basic --test-level=regression --test-level=complete
You can easily test for the options with:
if --test-level=basic in $(ARGV)
Thanks for the heads-up. I didn't catch the discussions about getting rid of environment variables. --cmdline-arguments=better. -t

"troy d. straszheim" <troy@resophonic.com> writes:
Rene Rivera wrote:
One thing I should make clear about my suggestion to reduce the combinations... is that it doesn't mean reducing the number of types that get serialized, but rather increasing those and having more complex and comprehensive tests.
I need to think about it more... for now, I carpet bomb. But it clearly would get out of control as types, archives, platforms and versions proliferate. No question.
Hi Troy,

It sounds like, from what Rene has said, it already is out of control. Anything that makes it very difficult and/or expensive for a tester to complete a testing run needs to be fixed rather urgently. I hope you realize that, and I think you probably do, but from your posting it wasn't clear.

-- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams wrote:
"troy d. straszheim" <troy@resophonic.com> writes:
Rene Rivera wrote:
One thing I should make clear about my suggestion to reduce the combinations... is that it doesn't mean reducing the number of types that get serialized, but rather increasing those and having more complex and comprehensive tests.
I need to think about it more... for now, I carpet bomb. But it clearly would get out of control as types, archives, platforms and versions proliferate. No question.
It sounds like, from what Rene has said, it already is out of control.
I did another incremental test run today which spent most of the time retesting serialization, probably some minor change. But it ran from 10:30am to 8:00pm, 11.5 hours. So yes, I consider that out of control.
Anything that makes it very difficult and/or expensive for a tester to complete a testing run needs to be fixed rather urgently. I hope you realize that, and I think you probably do, but from your posting it wasn't clear.
I hope everyone realizes this. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - Grafik/jabber.org

Rene Rivera wrote:
10:30am to 8:00pm, 11.5 hours. So yes, I consider that out of control.
Oops, 9.5 hours :-) I'm horrible at math. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - Grafik/jabber.org

I should note that the serialization library changes only very infrequently - maybe once every few weeks. I would question re-running the serialization library tests just because something that the serialization library depends upon changes. Certainly this doesn't do me any good. If someone breaks/changes an API that the serialization library depends upon, does that person even find out about it? Much less track it down? Not that this is a big deal, as it almost never happens. The last time it happened that I remember was when all the serialization tests for Comeau started to fail when something in Boost.Test changed. That was 6 months ago and nothing has changed.

A main cause of this problem is that bjam dependency analysis re-runs all tests on library X even if library X hasn't changed. But Boost has other scalability problems. The testing/release regimen permits libraries to be more tightly coupled than they probably should be. Ideally I would like to see something like:

a) Each maintainer has his own branch.
b) He makes changes on his own machine.
c) He tests his changes on his own machine.
d) He makes changes on his branch.
e) Whenever he makes a change, tests are run on that branch for that library.
f) When he is satisfied with e) above, he merges changes from his branch into the main trunk.
g) Whenever f) occurs, tests are run on the main trunk for any libraries newly merged in.
h) The health of the main trunk "should" be very good.

Periodically, and much more frequently than now (maybe every 60 days or so), exhaustive tests are run on the trunk - just as they are now. Any problems should be small. These are fixed and a new release is made - zip and tarball. This would:

a) diminish requirements for testing immensely.
b) keep testing resources from being wasted on repeating the same tasks.
c) ensure library authors could not rely on other libraries to test their code. I know no one does this intentionally, but we're all human, and we're under pressure, and writing tests IS a pain, etc.
d) require tests, by necessity, to be more complete and focused. Presumably these would build incrementally.
e) result in much more frequent new releases. If a bug is found, it would be practical to just wait for the next release to fix it rather than trying to "stop the presses".
f) make errors easier to find. Since we're all testing our changes against the trunk rather than against everyone else's changes, any error during development has to be the library writer's. An error that shows up when changes are merged to the trunk has to be due to an API change in another library.
g) take the pressure off authors to get feature Y crammed in to make the next release date. Each would work at his own pace, knowing that shortly after it's ready, it will appear in the official release.
h) mean authors wouldn't have to hold back changes that they think are OK because they just might throw a monkey wrench into the whole works.

It seems to me that this should be the main benefit of the source control system, and that we are not receiving that benefit.

Robert Ramey

Rene Rivera wrote:
David Abrahams wrote:
"troy d. straszheim" <troy@resophonic.com> writes:
Rene Rivera wrote:
One thing I should make clear about my suggestion to reduce the combinations... is that it doesn't mean reducing the number of types that get serialized, but rather increasing those and having more complex and comprehensive tests.
I need to think about it more... for now, I carpet bomb. But it clearly would get out of control as types, archives, platforms and versions proliferate. No question.
It sounds like, from what Rene has said, it already is out of control.
I did another incremental test run today which spent most of the time retesting serialization, probably some minor change. But it ran from 10:30am to 8:00pm, 11.5 hours. So yes, I consider that out of control.
Anything that makes it very difficult and/or expensive for a tester to complete a testing run needs to be fixed rather urgently. I hope you realize that, and I think you probably do, but from your posting it wasn't clear.
I hope everyone realizes this.

"Robert Ramey" <ramey@rrsd.com> writes:
I should note that the serialization library changes only very infrequently - maybe once every few weeks.
I would question re-running the serialization library tests just because something that the serialization library depends upon changes.
We can talk about global policy changes after you fix the emergency. It's a lot harder to change the way we do testing and the rest of our infrastructure than it is for you to simply reduce the number of tests being run. Please do what's required to bring the overall Boost testing time back down to something reasonable.
A main cause of this problem is that bjam dependency analysis re-runs all tests on library X even if library X hasn't changed.
No it doesn't. To reiterate: Anything that makes it very difficult and/or expensive for a tester to complete a testing run needs to be fixed rather urgently. -- Dave Abrahams Boost Consulting www.boost-consulting.com

I have infrastructure built to do portability testing. It is a lot of changes, flexible, and turned on by command line switches. I know this doesn't help the current emergency. I'm buried in chasing down bugs in the portable archive that I've found now that I can give it the full battery of serialization tests, and since I need to deliver this archive type ASAP, I can't switch to trying to do something intelligent about overall testing time within serialization. I am collecting ideas, and trying to refactor things as I go to prepare for this "something intelligent".

But for now, maybe switch from carpet bombing to running only the polymorphic xml archive (static) and the non-archive-specific tests? It's a quick hack to the Jamfile, would cut the run time down by 1/6 or so, and still get pretty good coverage. Testers could also get immediate relief by running with -sBOOST_ARCHIVE_LIST=xml_archive.hpp to get the same effect. (Apologies if that's obvious.)

-t

David Abrahams wrote:
"Robert Ramey" <ramey@rrsd.com> writes:
I should note that the serialization library changes only very infrequently - maybe once every few weeks.
I would question re-running the serialization library tests just because something that the serialization library depends upon changes.
We can talk about global policy changes after you fix the emergency. It's a lot harder to change the way we do testing and the rest of our infrastructure than it is for you to simply reduce the number of tests being run.
Please do what's required to bring the overall Boost testing time back down to something reasonable.
A main cause of this problem is that bjam dependency analysis re-runs all tests on library X even if library X hasn't changed.
No it doesn't.
To reiterate:
Anything that makes it very difficult and/or expensive for a tester to complete a testing run needs to be fixed rather urgently.

David Abrahams wrote:
"Robert Ramey" <ramey@rrsd.com> writes:
I should note that the serialization library changes only very infrequently - maybe once every few weeks.
I would question re-running the serialization library tests just because something that the serialization library depends upon changes.
We can talk about global policy changes after you fix the emergency. It's a lot harder to change the way we do testing and the rest of our infrastructure than it is for you to simply reduce the number of tests being run.
Please do what's required to bring the overall Boost testing time back down to something reasonable.
I can do this. Should this be RC_1_33_0 or head?
A main cause of this problem is that bjam dependency analysis re-runs all tests on library X even if library X hasn't changed.
No it doesn't.
Yes it does. Here is the scenario: Library X uses something from library Y. Library Y is changed. This triggers a rebuild of Library X, which in turn triggers a re-test of Library X. At least that's the way it looks like it works to me. If I'm wrong about this, then why is so much time being consumed on re-testing when none of the source code in Library X (serialization) has changed?
To reiterate:
Anything that makes it very difficult and/or expensive for a tester to complete a testing run needs to be fixed rather urgently.
Well, it's been this way for almost a year. And it has been inconvenient. But it's hard to justify characterizing it as an urgent emergency. Robert Ramey

"Robert Ramey" <ramey@rrsd.com> writes:
David Abrahams wrote:
"Robert Ramey" <ramey@rrsd.com> writes:
Please do what's required to bring the overall Boost testing time back down to something reasonable.
I can do this. Should this be RC_1_33_0 or head?
Both. We're testing them both pending the release of 1.33.1
A main cause of this problem is that bjam dependency analysis re-runs all tests on library X even if library X hasn't changed.
No it doesn't.
Yes it does.
Here is the scenario: Library X uses something from library Y. Library Y is changed. This triggers a rebuild of Library X, which in turn triggers a re-test of Library X. At least that's the way it looks like it works to me.
That is correct, but what you said made it sound like X would be retested unconditionally. The idea that we should not be re-testing libraries when their dependencies change is debatable, but that's a different discussion.
To reiterate:
Anything that makes it very difficult and/or expensive for a tester to complete a testing run needs to be fixed rather urgently.
Well, it's been this way for almost a year.
Approximately 11-hour testing cycles for all testers of Boost on one compiler due to the serialization library has been a fact of life for a year? That's news to me.
And it has been inconvenient. But it's hard to justify characterizing it as an urgent emergency.
I would have been much more insistent long ago had I known. -- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams wrote:
"Robert Ramey" <ramey@rrsd.com> writes:
David Abrahams wrote:
Well, it's been this way for almost a year.
Approximately 11-hour testing cycles for all testers of Boost on one compiler due to the serialization library has been a fact of life for a year? That's news to me.
And it has been inconvenient. But it's hard to justify characterizing it as an urgent emergency.
I would have been much more insistent long ago had I known.
I guess "urgent" might be a debatable, but more importantly we are at the brink of multiday turn around times for testing which impacts development of all libraries. Perhaps we should invest in adding those resource statistics to the regression results? Putting next to each test how long they took. And putting a total run test time for libraries would help us see this issue earlier. We currently have a lack of feedback as to how the testing system is behaving. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - Grafik/jabber.org

Rene Rivera <grafik.list@redshift-software.com> writes:
Perhaps we should invest in adding those resource statistics to the regression results? Putting next to each test how long it took, and putting a total test run time for each library, would help us see this issue earlier. We currently have a lack of feedback as to how the testing system is behaving.
I did add features to bjam that could compute the time it took to build any target, so that might be a good idea. -- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams wrote:
"Robert Ramey" <ramey@rrsd.com> writes:
David Abrahams wrote:
"Robert Ramey" <ramey@rrsd.com> writes:
Please do what's required to bring the overall Boost testing time back down to something reasonable.
I can do this. Should this be RC_1_33_0 or head?
Both. We're testing them both pending the release of 1.33.1
If I recall correctly, the idea was that there would be a release of 1.33.1 on 30 September 2005. I will be leaving town this Sunday, 2 October. Do we really want to mess with something that's been this way for a year and will only go on for a couple more days at this time? That doesn't seem wise to me.
A main cause of this problem is that bjam dependency analysis re-runs all tests on library X even if library X hasn't changed.
No it doesn't.
Yes it does.
Here is the scenario: Library X uses something from library Y. Library Y is changed. This triggers a rebuild of Library X, which in turn triggers a re-test of Library X. At least that's the way it looks like it works to me.
That is correct, but what you said made it sound like X would be retested unconditionally.
The idea that we should not be re-testing libraries when their dependencies change is debatable, but that's a different discussion.
Well, if we weren't doing that we wouldn't have a problem. So it's not the same discussion, but it is related. Of course we can test less. But the root of the problem is that probably only a small percentage of the effort invested in testing is actually testing anything. I know I've brought this up before but made no headway, so I won't harp on it anymore. Robert Ramey

"Robert Ramey" <ramey@rrsd.com> writes:
David Abrahams wrote:
"Robert Ramey" <ramey@rrsd.com> writes:
David Abrahams wrote:
"Robert Ramey" <ramey@rrsd.com> writes:
Please do what's required to bring the overall Boost testing time back down to something reasonable.
I can do this. Should this be RC_1_33_0 or head?
Both. We're testing them both pending the release of 1.33.1
If I recall correctly, the idea was that there would be a release of 1.33.1 on 30 September 2005. I will be leaving town this Sunday, 2 October. Do we really want to mess with something that's been this way for a year and will only go on for a couple more days at this time? That doesn't seem wise to me.
Maybe not, if 1.33.1 is actually on target for release 30 Sept. I don't know if that's the case.
Here is the scenario: Library X uses something from library Y. Library Y is changed. This triggers a rebuild of Library X, which in turn triggers a re-test of Library X. At least that's the way it looks like it works to me.
That is correct, but what you said made it sound like X would be retested unconditionally.
The idea that we should not be re-testing libraries when their dependencies change is debatable, but that's a different discussion.
Well, if we weren't doing that we wouldn't have a problem.
Wrong. Many testers are doing "clean run" testing that forces everything to be retested unconditionally, specifically to avoid the sorts of inaccuracies that we'd have if we ignore dependencies.
So it's not the same discussion, but it is related. Of course we can test less. But the root of the problem is that probably only a small percentage of the effort invested in testing is actually testing anything. I know I've brought this up before but made no headway, so I won't harp on it anymore.
It's hard to understand your objection to retesting library A in combination with a changed library B on which it depends, since that is essentially what you're doing with the various parts of the serialization with your N x M x K testing. -- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams wrote:
"Robert Ramey" <ramey@rrsd.com> writes:
So it's not the same discussion, but it is related. Of course we can test less. But the root of the problem is that probably only a small percentage of the effort invested in testing is actually testing anything. I know I've brought this up before but made no headway, so I won't harp on it anymore.
It's hard to understand your objection to retesting library A in combination with a changed library B on which it depends, since that is essentially what you're doing with the various parts of the serialization with your N x M x K testing.
OK, I could also say it's hard to understand why you don't object to re-testing A in combination with B, but do object to doing the same thing with the N x M x K serialization library. But there are some differences.

The serialization library is still one library, with one person taking responsibility for reviewing the results and reconciling any issues. Any failure in any combination will be addressed.

The serialization library changes only once every few weeks, whereas something in the whole of Boost changes at least once a day. So the impact on testing time is much greater from retesting every A x B combination.

Libraries A and B expect to change their public API only very occasionally. If the API is being tested independently, then re-testing by other libraries should be redundant. This doesn't apply to the internal behavior of any particular library.

Regarding the serialization library in particular, I have strived to make the different aspects independent, interacting with each other only through the narrowest of APIs. This is the basic motivation for splitting the view of things into archive and serialization. Similarly, I strived to make the DLL versions identical in usage to the static library versions, and to have all the archives implement the same API. The reason that we have the M x N x K x L (don't forget debug and release!) is that I was successful in doing this. Nevertheless, it has been a struggle, and now the test results show pretty good orthogonality, except for compiler quirks which show that some combinations trip compiler ICEs. I don't know that I could have arrived at this point without testing all the combinations from time to time. Libraries A and B don't have this situation.

Robert Ramey

"Robert Ramey" <ramey@rrsd.com> writes:
David Abrahams wrote:
"Robert Ramey" <ramey@rrsd.com> writes:
So it's not the same discussion, but it is related. Of course we can test less. But the root of the problem is that probably only a small percentage of the effort invested in testing is actually testing anything. I know I've brought this up before but made no headway, so I won't harp on it anymore.
It's hard to understand your objection to retesting library A in combination with a changed library B on which it depends, since that is essentially what you're doing with the various parts of the serialization with your N x M x K testing.
OK, I could also say it's hard to understand why you don't object to re-testing A in combination with B, but do object to doing the same thing with the N x M x K serialization library.
The former is hard to change, and it's not clear that the change is desirable.
But there are some differences.
The serialization library is still one library, with one person taking responsibility for reviewing the results and reconciling any issues. Any failure in any combination will be addressed.
The serialization library changes only once every few weeks, whereas something in the whole of Boost changes at least once a day. So the impact on testing time is much greater from retesting every A x B combination.
Not for those who only do clean test runs.
Libraries A and B expect to change their public API only very occasionally. If the API is being tested independently, then re-testing by other libraries should be redundant. This doesn't apply to the internal behavior of any particular library.

Regarding the serialization library in particular, I have strived to make the different aspects independent, interacting with each other only through the narrowest of APIs. This is the basic motivation for splitting the view of things into archive and serialization. Similarly, I strived to make the DLL versions identical in usage to the static library versions, and to have all the archives implement the same API. The reason that we have the M x N x K x L (don't forget debug and release!) is that I was successful in doing this. Nevertheless, it has been a struggle, and now the test results show pretty good orthogonality, except for compiler quirks which show that some combinations trip compiler ICEs. I don't know that I could have arrived at this point without testing all the combinations from time to time. Libraries A and B don't have this situation.
It's possible to set things up so you run all the combinations but the regular Boost test suite runs fewer. -- Dave Abrahams Boost Consulting www.boost-consulting.com

On Tue, Sep 20, 2005 at 05:35:57PM +0200, troy d. straszheim wrote:
It would be interesting to see exactly where all the time is going. I have the feeling it is mostly in the build, and that if one simply clumped many of these tests together, you could eliminate a lot of duplicated effort (template instantiations and linking). This is another motivation for switching to autoregistering tests:
I tried a simple hack and got a 4x increase in speed. Having converted all the serialization tests to autoregistering unit tests, I took 9 typical test modules and put them together simply by creating a file test_many.cpp that #includes

    test_array.cpp test_binary.cpp test_contained_class.cpp
    test_deque.cpp test_map.cpp test_derived.cpp
    test_exported.cpp test_derived_class.cpp test_list.cpp

and compiling that, in the thinking that most of the time spent is taken up by compiling, linking, and calculating dependencies of the same code.

The "before" picture:

    326.17s user 153.82s system 92% cpu 8:40.03 total

the first ~30 seconds of which is bjam calculating the dependencies for each test module. The "after" test_many.cpp hack starts compilation after ~4 seconds of bjam checking dependencies, with a total time of

    99.52s user 41.00s system 93% cpu 2:30.51 total

This is with the portability testing stuff that started this thread in there, but not turned on. Memory usage on the test_many all-in-one-hack module isn't noticeably higher, as the test modules typically differ by a couple hundred lines; the time/memory is all in the header files. There is still a compile-link-execute cycle for each archive type plus its DLL version, so it would seem that there's still lots of duplicated work. There are plenty more gains to be had.

I need a few tips so I can tinker further, if you think that's a good idea. I think in terms of "make", which I believe is a problem. Lemme know if we should move this to the boostjam list or if I'm just too far afield.

1) The test_many.cpp above is a hack. It would seem cleaner to have the test modules listed in the Jamfile and have bjam invoke the compiler as

    cc test1.cpp test2.cpp ... testN.cpp

Is there a clean way to do this?

2) If I can get 1) above, this one is moot. I want to factor out some convenience functions into a file test_tools.cpp and have this compiled only once, and have test_tools.o added to each test executable's link. I've tried adding test_tools.cpp after $(sources) in rule run-template in lib/serialization/test/Jamfile like this:

    rule run-template ( test-name : sources * : requirements * )
    {
        return [
            run <lib>../../test/build/boost_unit_test_framework
                <lib>../../filesystem/build/boost_filesystem
                $(sources)
                test_tools.cpp
            : # command
            : #
            : # requirements
                std::locale-support
                toolset::require-boost-spirit-support
                <borland*><*><cxxflags>"-w-8080 -w-8071 -w-8057 -w-8062 -w-8008 -w-0018 -w-8066"
                $(requirements)
            : # test name
                $(test-name)
            : # default-build
                debug
        ] ;
    }

but this of course just recompiles test_tools.cpp for each module. Dunno.

3) There are "demo" tests in the Jamfile, which are pulled in and compiled as tests by #defining main to test_main and #including the .cpp from the examples directory. This means these demos need

    <lib>../../test/build/boost_prog_exec_monitor

not boost_unit_test_framework. I'm stumped on this one. How to split up the tests into two groups, each of which takes different libs? Should one toss these from the standard testing run completely and just put a separate Jamfile over in examples/? You could get the same code into both the tests and the examples by taking the contents of main() out into some other #included file, but then the examples would get obscured. Dunno.

Thanks, -t
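The test_many.cpp described above is nothing more exotic than a translation unit of #includes - roughly the following, assuming the listed test .cpp files sit alongside it:

    // test_many.cpp - compile several autoregistering test modules as one
    // translation unit, so the heavy common headers are parsed, instantiated
    // and linked only once.
    #include "test_array.cpp"
    #include "test_binary.cpp"
    #include "test_contained_class.cpp"
    #include "test_deque.cpp"
    #include "test_map.cpp"
    #include "test_derived.cpp"
    #include "test_exported.cpp"
    #include "test_derived_class.cpp"
    #include "test_list.cpp"

This only works because the modules are autoregistering; with one test_main() per file they could not be combined into a single translation unit.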

"troy d. straszheim" <troy@resophonic.com> writes:
326.17s user 153.82s system 92% cpu 8:40.03 total
the first ~30 seconds of which is bjam calculating the dependencies for each test module.
Unless you have used bjam's profiling feature to break the execution time down carefully, you shouldn't draw any conclusions about how much time it spends calculating dependencies -- they're usually wrong. -- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams wrote:
"troy d. straszheim" <troy@resophonic.com> writes:
326.17s user 153.82s system 92% cpu 8:40.03 total
the first ~30 seconds of which is bjam calculating the dependencies for each test module.
Unless you have used bjam's profiling feature to break the execution time down carefully, you shouldn't draw any conclusions about how much time it spends calculating dependencies -- they're usually wrong.
As an example, in a few cases recently it has been filesystem operations, e.g. pwd, that have caused Boost.Build to sit there spinning its thumbs; the CVS version has some big improvements over the bjam/Boost.Build in 1.33. However, reducing the dependencies/number of compile/link operations will no doubt also help; the proportions may now be different.

Kevin

-- | Kevin Wheatley, Cinesite (Europe) Ltd | Nobody thinks this | | Senior Technology | My employer for certain | | And Network Systems Architect | Not even myself |

troy d. straszheim wrote:
same code. The "before" picture:
326.17s user 153.82s system 92% cpu 8:40.03 total
the first ~30 seconds of which is bjam calculating the dependencies for each test module.
To add to what Dave said... We've been personally bitten by assuming that bjam is slow in certain places. Only to humiliate ourselves after looking at the profile data :-)
The "after" test_many.cpp hack starts compilation after ~4 seconds of bjam checking dependencies, with a total time of
99.52s user 41.00s system 93% cpu 2:30.51 total
I'm assuming you deleted the results from the previous run, right?
1) The test_many.cpp above is a hack. It would seem cleaner to have the test modules listed in the Jamfile and have bjam invoke the compiler as
cc test1.cpp test2.cpp ... testN.cpp
is there a clean way to do this?
Yea... You just put all the sources into the bjam target. For example:

    run test1.cpp test2.cpp test3.cpp : : : std::locale-support : my_test : debug ;
3) There are "demo" tests in the Jamfile, which are pulled in and compiled as tests by #defining main to test_main and #including the .cpp from the examples directory. This means these demos need
<lib>../../test/build/boost_prog_exec_monitor
not boost_unit_test_framework. I'm stumped on this one. How to split up the tests into two groups, each of which takes different libs? Should one toss these from the standard testing run completely and just put a separate Jamfile over in examples/? You could get the same code into both the tests and the examples by taking the contents of main() out into some other #included file, but then the examples would get obscured. Dunno.
I don't understand your confusion. You just do exactly what you said you want to do and refer to <lib>../../test/build/boost_prog_exec_monitor in the sources for those example/tests. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - Grafik/jabber.org

Rene Rivera <grafik.list@redshift-software.com> writes:
cc test1.cpp test2.cpp ... testN.cpp
is there a clean way to do this?
Yea... You just put all the sources into the bjam target. For example:
run test1.cpp test2.cpp test3.cpp : : : std::locale-support : my_test : debug ;
That still results in 3 separate compilation commands, right? Makes no real difference with G++, but with MSVC that's a lot slower than just one IIUC. -- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams wrote:
Rene Rivera <grafik.list@redshift-software.com> writes:
cc test1.cpp test2.cpp ... testN.cpp
is there a clean way to do this?
Yea... You just put all the sources into the bjam target. For example:
run test1.cpp test2.cpp test3.cpp : : : std::locale-support : my_test : debug ;
That still results in 3 separate compilation commands, right? Makes no real difference with G++, but with MSVC that's a lot slower than just one IIUC.
Yes, you are right. I misunderstood the question :-\ So I guess there's no current way of doing the single compile invocation for multiple sources. I guess I should respond to item (2) now :-) -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - Grafik/jabber.org

On Wed, Oct 12, 2005 at 10:14:34AM -0500, Rene Rivera wrote:
troy d. straszheim wrote:
same code. The "before" picture:
326.17s user 153.82s system 92% cpu 8:40.03 total
the first ~30 seconds of which is bjam calculating the dependencies for each test module.
To add to what Dave said... We've been personally bitten by assuming that bjam is slow in certain places. Only to humiliate ourselves after looking at the profile data :-)
Sorry, I had no idea the phrase "bjam calculating dependencies" was a touchy subject. Please replace with "build system doing whatever it does before the first compilation starts, as timed by looking at my watch". I only understand the before/after picture to mean that bjam, the compiler and linker are all being told to repeat work unnecessarily; I see low-hanging fruit, and I want to pick and eat it. :) I'm *not* taking (and won't take) veiled cheap shots. I've pasted the full profiling info at the bottom of this mail. Be interested to hear what y'all make of it.
I'm assuming you deleted the results from the previous run, right?
Heh, yeah. :) I watched the compilations go by, and I double checked that ccache was turned off.
Yea... You just put all the sources into the bjam target. For example:
run test1.cpp test2.cpp test3.cpp : : : std::locale-support : my_test : debug ;
I just tried this, and test1, test2, test3 each get compiled separately. Here's the thing: the serialization unit tests each precompile to roughly 50k lines, of which at most 200 actually differ from test to test. When you then consider how much duplicated work there is for the compiler and linker, after all the templates in those 49,800 lines are instantiated... there's your speedup. But one can just use the test_many.cpp which #includes all the other .cpps. Whatever you guys are comfortable with.
3) There are "demo" tests in the Jamfile, which are pulled in and compiled as tests by #defining main to test_main and #including the .cpp from the examples directory. This means these demos need
<lib>../../test/build/boost_prog_exec_monitor
not boost_unit_test_framework. I'm stumped on this one. How to split up the tests into two groups, each of which takes different libs? Should one toss these from the standard testing run completely and just put a separate Jamfile over in examples/ ? You could get the same code into both the tests and the examples by taking the contents of main() out into some other #inlcuded file, but then the examples would get obscured. Dunno.
I don't understand your confusion. You just do exactly what you said you want to do and refer to <lib>../../test/build/boost_prog_exec_monitor in the sources for those example/tests.
<lib>../../test/build/boost_prog_exec_monitor is fed directly to rule "run" inside rule run-template, which is invoked from run-(w)invoke, which is called sometimes from test-bsl-run_files, sometimes from test-bsl-run. But after this change, some files currently fed to test-bsl-run (namely those containing the string "demo") will need the exec_monitor, and some (all the others) will need the unit_test_framework. I could copy/past the whole group of rules and make some of them run-template-exec-monitor and some run-template-unit-test-framework, but that seemed ugly. Hopefully I've been clear, I have a lot of balls in the air. Thanks for your help, -t Here's that profiling info: ****** before ****** gross net # entries name 41 41 27247 MATCH 453 453 1329 PWD 0 0 1 find-to-root 6 6 96 GLOB 3454 2 1 boost-build 0 0 1 _poke 0 0 98 set-as-singleton 2 2 3113 DEPENDS 0 0 292 NOTFILE 0 0 91 ALWAYS 3 3 429 feature 3 0 312 free-feature 0 0 4 path-feature 0 0 1 dependency-feature 1 0 5 variant 980 398 5277 select-properties 30 29 7913 relevant-features 72 19 97 include-tools 47 27 5141 flags 104 92 36819 split-path 12 10 908 split 452 186 131436 get-values 382 382 152093 get-properties 8 1 1940 is-subset 93 93 39106 intersection 102 102 52329 unique 45 45 5277 normalize-properties 151 39 6055 sort 112 112 30312 bubble 112 112 28217 difference 0 0 1 declare-target-type 3451 6 615 load-jamfiles 671 3 1233 subproject 665 29 1234 SubDir 10 10 2463 FSubDir 262 44 5855 root-paths 283 20 9615 tokens-to-simple-path 309 169 15183 simplify-path-tokens 51 51 15183 reverse 89 89 15183 strip-initial 67 20 12173 FDirName 71 71 17277 join-path 0 0 1 project-root 3 3 1858 FGrist 16 6 1234 adjust-path-globals 0 0 1 project 0 0 1 path-global 2 0 5 import 0 0 3 declare-build-fail-test 0 0 3 declare-build-succeed-test 3448 0 18 test-bsl-run_files 3448 0 90 test-bsl-run_archive 1851 1 54 run-invoke 0 0 276 in-invocation-subdir 1 1 360 ECHO 3446 0 180 run-template 3446 2 180 run 3444 5 180 boost-test 1597 0 36 run-winvoke 0 0 2 test-suite 1 1 652 type-DEPENDS 0 0 90 get-library-name 3403 4 104 declare-local-target 0 0 104 expand-target-names 1 1 922 FGristFiles 13 3 194 declare-basic-target 6 3 284 expand-source-names 10 9 2784 ungrist 25 25 5990 select-gristed 1 1 198 declare-fake-targets 3387 1 90 main-target 2078 43 1318 expand-target-subvariants 6 6 1318 get-BUILD 18 18 7204 select-ungristed 1667 78 2636 expand-build-request 229 94 7640 segregate-free-properties 20 14 2636 report-free-property-conflicts 35 10 2636 remove-incompatible-builds 48 21 2636 segregate-overrides 29 29 5272 feature-default 6 6 5272 replace-properties 84 20 3864 multiply-property-sets 33 16 8416 distribute-feature 168 44 2646 fixup-path-properties 39 22 2636 make-path-property-sets 11 6 2636 remove-default-properties 2 2 2636 ungrist-properties 42 29 2636 split-path-at-grist 276 13 2186 toolset::requirements 263 21 2186 impose-requirements 1 1 598 std::locale-support 0 0 598 force-NT-static-link 0 0 598 toolset::require-boost-spirit-support 480 4 182 dependent-include 63 11 1228 target-path-of 4 4 2476 directory-of 24 3 1238 top-relative-tokens 4 4 1516 join 1 0 1228 protect-subproject 1 1 1228 protect-subdir 751 7 1228 enter-subproject 62 7 2546 relative-path 0 0 2 template 1 0 6 lib 0 0 10 template-modifier 26 12 624 target-id-of 0 0 6 dll 0 0 2 unless 0 0 3 install 0 0 3 stage 30 7 1318 split-target-subvariant 3025 29 704 subvariant-target 2 2 704 get-tag-features 52 26 1408 rename-target 2 2 704 FAppendSuffix 20 19 463 set-target-variables 
2664 3 182 generate-dependencies 2638 8 364 link-libraries 2084 32 614 find-compatible-subvariant 3 2 738 is-link-compatible 6 0 4 library-file 22 1 96 Objects 0 0 187 object-name 21 2 187 Object 7 0 553 MakeLocate 6 6 553 MkDir 0 0 304 NOUPDATE 2 0 187 C++ 2 2 187 Cc-platform-specifics 0 0 187 C++-action 0 0 4 LibraryFromObjects 0 0 4 Archive 0 0 4 Archive-action 0 0 4 Ranlib 0 0 4 Ranlib-action 37 14 868 common-variant-tag 12 0 181 depend-on-static 13 10 364 depend-on-libs 3 0 183 depend-on-shared 1361 0 90 build-test 0 0 180 RMOLD 1354 5 90 run-test 20 1 90 executable-file 6 1 92 main-from-objects 3 2 90 Link-EXE 1 0 92 .do-link 1 0 92 Link-action 0 0 92 Chmod 0 0 90 test-executable(EXE) 1 1 90 capture-run-output 4 4 1822 INCLUDES 0 0 90 succeeded-test-file 36 4 90 dump-test 1 1 270 get-var-value 0 0 299 toolset::require-shared-libraries-support 4 0 2 dll-files 0 0 2 Link-DLL 0 0 180 toolset::require-wide-char-io-support 51 40 1732 HdrRule 3 3 1732 NOCARE 4 4 2029 remember-binding ****** after ****** gross net # entries name 15 15 7289 MATCH 54 54 161 PWD 0 0 1 find-to-root 0 0 16 GLOB 369 2 1 boost-build 0 0 1 _poke 0 0 18 set-as-singleton 0 0 793 DEPENDS 0 0 52 NOTFILE 0 0 11 ALWAYS 1 1 109 feature 1 0 72 free-feature 0 0 4 path-feature 0 0 1 dependency-feature 1 0 5 variant 95 36 605 select-properties 8 7 905 relevant-features 11 3 17 include-tools 7 3 901 flags 20 14 4451 split-path 5 2 196 split 42 20 15140 get-values 30 30 17589 get-properties 2 1 340 is-subset 10 10 4586 intersection 14 14 6081 unique 3 3 605 normalize-properties 18 4 695 sort 14 14 3496 bubble 10 10 3241 difference 0 0 1 declare-target-type 366 0 71 load-jamfiles 71 1 145 subproject 70 1 146 SubDir 1 1 287 FSubDir 25 1 703 root-paths 34 1 1119 tokens-to-simple-path 37 25 1791 simplify-path-tokens 3 3 1791 reverse 9 9 1791 strip-initial 6 1 1421 FDirName 7 7 2013 join-path 1 0 1 project-root 0 0 226 FGrist 1 0 146 adjust-path-globals 0 0 1 project 0 0 1 path-global 2 0 5 import 0 0 3 declare-build-fail-test 0 0 3 declare-build-succeed-test 363 0 2 test-bsl-run_files 363 0 10 test-bsl-run_archive 190 0 6 run-invoke 0 0 36 in-invocation-subdir 0 0 40 ECHO 363 0 20 run-template 363 0 20 run 363 0 20 boost-test 173 0 4 run-winvoke 0 0 2 test-suite 0 0 252 type-DEPENDS 0 0 10 get-library-name 360 1 24 declare-local-target 0 0 24 expand-target-names 0 0 138 FGristFiles 4 1 34 declare-basic-target 2 1 44 expand-source-names 2 2 480 ungrist 1 1 694 select-gristed 0 0 38 declare-fake-targets 356 0 10 main-target 208 2 150 expand-target-subvariants 0 0 150 get-BUILD 0 0 820 select-ungristed 167 10 300 expand-build-request 25 10 872 segregate-free-properties 3 0 300 report-free-property-conflicts 4 1 300 remove-incompatible-builds 2 1 300 segregate-overrides 4 4 600 feature-default 1 1 600 replace-properties 9 2 440 multiply-property-sets 4 0 960 distribute-feature 17 6 310 fixup-path-properties 6 2 300 make-path-property-sets 3 2 300 remove-default-properties 0 0 300 ungrist-properties 9 6 300 split-path-at-grist 27 0 250 toolset::requirements 27 3 250 impose-requirements 0 0 70 std::locale-support 0 0 70 force-NT-static-link 0 0 70 toolset::require-boost-spirit-support 52 0 22 dependent-include 3 0 140 target-path-of 0 0 300 directory-of 1 0 150 top-relative-tokens 2 2 188 join 0 0 140 protect-subproject 0 0 140 protect-subdir 83 2 140 enter-subproject 11 0 290 relative-path 0 0 2 template 0 0 6 lib 0 0 10 template-modifier 2 0 80 target-id-of 1 0 6 dll 0 0 2 unless 0 0 3 install 0 0 3 stage 4 0 150 
split-target-subvariant 315 1 80 subvariant-target 1 0 80 get-tag-features 6 2 160 rename-target 0 0 80 FAppendSuffix 7 7 143 set-target-variables 280 0 22 generate-dependencies 277 1 44 link-libraries 206 1 70 find-compatible-subvariant 1 0 82 is-link-compatible 4 0 4 library-file 8 1 16 Objects 0 0 107 object-name 7 2 107 Object 1 0 153 MakeLocate 1 1 153 MkDir 0 0 64 NOUPDATE 0 0 107 C++ 0 0 107 Cc-platform-specifics 0 0 107 C++-action 0 0 4 LibraryFromObjects 0 0 4 Archive 0 0 4 Archive-action 0 0 4 Ranlib 0 0 4 Ranlib-action 6 2 100 common-variant-tag 1 0 21 depend-on-static 2 1 44 depend-on-libs 2 0 23 depend-on-shared 133 0 10 build-test 0 0 20 RMOLD 132 0 10 run-test 2 0 10 executable-file 2 0 12 main-from-objects 0 0 10 Link-EXE 1 1 12 .do-link 0 0 12 Link-action 0 0 12 Chmod 0 0 10 test-executable(EXE) 0 0 10 capture-run-output 1 1 1743 INCLUDES 0 0 10 succeeded-test-file 3 1 10 dump-test 0 0 30 get-var-value 0 0 35 toolset::require-shared-libraries-support 4 0 2 dll-files 1 0 2 Link-DLL 1 1 20 toolset::require-wide-char-io-support 47 40 1733 HdrRule 0 0 1733 NOCARE 1 1 2038 remember-binding

troy d. straszheim wrote:
On Wed, Oct 12, 2005 at 10:14:34AM -0500, Rene Rivera wrote:
To add to what Dave said... We've been personally bitten by assuming that bjam is slow in certain places, only to humiliate ourselves after looking at the profile data :-)
Sorry, I had no idea the phrase "bjam calculating dependencies" was a touchy subject. Please replace with "build system doing whatever it does before the first compilation starts, as timed by looking at my watch".
In that case... Yea it's slow ;-) We know. It's slightly better now. It will continue to get better.
When you then consider how much duplicated work there is for the compiler and linker after all the templates in those 49,800 lines are instantiated... there's your speedup.
Yes, I can imagine. I've also seen the opposite happen: including more code makes things dramatically slower, because the compiler ends up eating much more memory and starts thrashing.
But one can just use the test_many.cpp which #includes all the other .cpps. Whatever you guys are comfortable with.
I think that's the only option you have now.
<lib>../../test/build/boost_prog_exec_monitor is fed directly to rule "run" inside rule run-template, which is invoked from run-(w)invoke, which is called sometimes from test-bsl-run_files and sometimes from test-bsl-run. But after this change, some files currently fed to test-bsl-run (namely those containing the string "demo") will need the exec_monitor, and some (all the others) will need the unit_test_framework. I could copy/paste the whole group of rules and make some of them run-template-exec-monitor and some run-template-unit-test-framework, but that seemed ugly. Hopefully I've been clear; I have a lot of balls in the air.
Yes, I get it now. Perhaps the easiest way is not to set all that down in the run-template rule, but instead to pass it in from above. To do that you can employ "template" targets, which reduce the typing. To see how "template" targets help in regression tests, take a look at libs/spirit/test/Jamfile, which uses a variety of templates to pass such configuration down to its own custom run rule, in the same vein as serialization does.
Here's that profiling info:
Basic comment... Yeah, since you reduced the total number of build targets by some factor, the build startup will be reduced by roughly that same factor. In the CVS version of bjam there are some basic improvements for operations that deal with the file system, for example (columns are gross time, net time, number of entries, rule name):

0 0 16 GLOB
6 6 96 GLOB
54 54 161 PWD
453 453 1329 PWD

In particular PWD now takes on average "0" time. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - Grafik/jabber.org

troy d. straszheim wrote:
2) If I can get 1), above, this one is moot. I want to factor out some convenience functions into a file test_tools.cpp and have this compiled only once, and have test_tools.o added to each test executable's link. I've tried adding test_tools.cpp after $(sources) in rule run-template in lib/serialization/test/Jamfile like this:
rule run-template ( test-name : sources * : requirements * )
{
    return [ run
        <lib>../../test/build/boost_unit_test_framework
        <lib>../../filesystem/build/boost_filesystem
        $(sources)
        test_tools.cpp
        : # command
        : #
        : # requirements
        std::locale-support
        toolset::require-boost-spirit-support
        <borland*><*><cxxflags>"-w-8080 -w-8071 -w-8057 -w-8062 -w-8008 -w-0018 -w-8066"
        $(requirements)
        : # test name
        $(test-name)
        : # default-build
        debug
    ] ;
}
but this of course just recompiles test_tools.cpp for each module. Dunno.
You'd need to shove the file into a library and add the <lib>yyy to the sources, so test_tools.cpp is compiled only once. Ex.:

lib test_tools
    : test_tools.cpp
    : (needed requirements)
    : <suppress>true
    ;

rule run-template ( test-name : sources * : requirements * )
{
    return [ run
        <lib>../../test/build/boost_unit_test_framework
        <lib>../../filesystem/build/boost_filesystem
        <lib>test_tools
        $(sources)
        : # command
        : #
        : # requirements
        std::locale-support
        toolset::require-boost-spirit-support
        <borland*><*><cxxflags>"-w-8080 -w-8071 -w-8057 -w-8062 -w-8008 -w-0018 -w-8066"
        $(requirements)
        : # test name
        $(test-name)
        : # default-build
        debug
    ] ;
}

-- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - Grafik/jabber.org
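[Editorial sketch: for context, here is a minimal sketch of what the shared test_tools interface might look like, using the TESTFILE/finish/reseed helper names that come up later in this thread; the exact signatures and the std::string return type are assumptions, not the actual code under discussion.]

    // test_tools.hpp -- sketch only; names follow the helpers described
    // later in the thread, signatures are assumed.
    #ifndef TEST_TOOLS_HPP
    #define TEST_TOOLS_HPP

    #include <string>

    // Returns a file name for the named test archive.  With portability
    // testing enabled it would point into a per-platform/version/compiler
    // directory; otherwise it behaves like tmpnam().
    std::string TESTFILE(const char * name);

    // Removes the file in a normal run; a no-op when portability testing
    // is enabled so the archives survive for cross-platform comparison.
    void finish(const char * filename);

    // Reseeds the boost::random generators so that every platform
    // produces identical test data.
    void reseed();

    #endif // TEST_TOOLS_HPP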

troy d. straszheim wrote:
On Tue, Sep 20, 2005 at 05:35:57PM +0200, troy d. straszheim wrote: ... I have one concern about this.
Originally, I had larger test modules. Breaking them down into smaller tests worked out better for me, but then I wasn't using the unit test framework. Since then, I have come to see that the unit test framework is a better choice for this. Each test program attempts to focus on one feature of the library, but within that test, various ways of using the feature are exercised. When a bug is discovered, a small section is added to the test so that things keep moving forward. This turns out to be a good fit with the unit test framework, and I'm very pleased to see things moving in this direction. My original tests had one test per feature and internally tested each archive type. This resulted in very long compile times and choked more than one compiler. Moving to smaller tests fixed that. So I would like to see more or less the same setup, but with finer-grained control over what gets tested. I run the current carpet-bomb testing on my local machine - a 2.4GHz Windows XP machine - on the following compilers: Borland 5.51, Borland 5.64, Comeau 4.3, gcc 3.3-cygwin, MSVC 6.5, VC 7.1 and VC 8.0. This takes about 4 hours for the serialization library alone, in debug mode. I also run release mode from time to time and it takes about the same amount of time. Running the serialization tests for one compiler typically takes about an hour. So it is a lot of time - not totally out of control, but it is heading there. There are a number of ideas on how to deal with this which I won't rehash here, except for one. I would like to see the "Getting Started" or installation documentation include a "configuration/system validation" phase whereby running the tests is a normal part of the boost installation procedure. This would result in the following:
a) All tests would be run on all platforms actually being used.
b) The amount of testing would be increased manyfold, as it would be done by everyone who installed the package.
c) The variety of testing would be larger.
d) Installation problems would be trapped at installation time rather than when users are having problems with their own code.
e) The testing load would typically be 1-2 hrs per installation - not an unreasonable burden when it may save days of frustration tracking down what turns out to be an installation/configuration problem.
Robert Ramey
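[Editorial sketch: for illustration only, here is a minimal example of the kind of focused, per-feature test described above, using Boost.Test's auto unit test facility of that era (newer releases spell these BOOST_TEST_MODULE and BOOST_AUTO_TEST_CASE). The point class and the test name are hypothetical, not taken from the serialization test suite.]

    // Sketch of a single-feature serialization test; the translation unit
    // defines BOOST_AUTO_TEST_MAIN so the framework supplies main().
    #define BOOST_AUTO_TEST_MAIN
    #include <boost/test/auto_unit_test.hpp>
    #include <boost/archive/text_oarchive.hpp>
    #include <boost/archive/text_iarchive.hpp>
    #include <sstream>

    // Hypothetical serializable type standing in for the feature under test.
    struct point
    {
        int x, y;
        point(int x_ = 0, int y_ = 0) : x(x_), y(y_) {}
        template<class Archive>
        void serialize(Archive & ar, const unsigned int /*version*/)
        {
            ar & x;
            ar & y;
        }
    };

    // One auto-registered case per usage of the feature; when a bug turns
    // up, a new case is appended here as a regression check.
    BOOST_AUTO_UNIT_TEST(point_round_trip)
    {
        const point a(1, 2);
        std::stringstream ss;
        {
            boost::archive::text_oarchive oa(ss);
            oa << a;
        }
        point b;
        {
            boost::archive::text_iarchive ia(ss);
            ia >> b;
        }
        BOOST_CHECK(a.x == b.x && a.y == b.y);
    }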

"Robert Ramey" <ramey@rrsd.com> writes:
troy d. straszheim wrote:
On Tue, Sep 20, 2005 at 05:35:57PM +0200, troy d. straszheim wrote: ... I have one concern about this.
Originally, I had larger test modules. Breaking them down into smaller tests worked out better for me, but then I wasn't using the unit test framework. Since then, I have come to see that the unit test framework is a better choice for this. Each test program attempts to focus on one feature of the library, but within that test, various ways of using the feature are exercised. When a bug is discovered, a small section is added to the test so that things keep moving forward. This turns out to be a good fit with the unit test framework, and I'm very pleased to see things moving in this direction.
Be aware that if you consolidate the tests into a single executable you will only see a single square in the regression test tables.
Running the serialization tests for one compiler typically takes about an hour. So it is a lot of time - but its not totally out of control - but it is heading there.
There are a number of ideas on how to deal with this which I won't rehash here except for one.
I would like to see the "Getting Started" or installation or whatever it is included a "configuration/system validation phase" whereby running the tests is a normal part of the boost installation procedure.
Yikes! Installing already takes way too long, in my experience. Why?
This would result in the following:
a) all tests would be run on all platforms actually being used. b) Amount of testing would actually be increased by many fold as it would be done by all who installed the package c) Variety of testing would be larger d) Installation problems would be trapped on installation rather than when users are having problems with their code.
How many such problems have we seen? Is there any evidence that this would do something other than make installation longer for users? -- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams wrote:
"Robert Ramey" <ramey@rrsd.com> writes:
troy d. straszheim wrote:
On Tue, Sep 20, 2005 at 05:35:57PM +0200, troy d. straszheim wrote: ... I have one concern about this.
Originally, I had larger test modules. Breaking them down into smaller tests worked out better for me, but then I wasn't using the unit test framework. Since then, I have come to see that the unit test framework is a better choice for this. Each test program attempts to focus on one feature of the library, but within that test, various ways of using the feature are exercised. When a bug is discovered, a small section is added to the test so that things keep moving forward. This turns out to be a good fit with the unit test framework, and I'm very pleased to see things moving in this direction.
Be aware that if you consolidate the tests into a single executable you will only see a single square in the regression test tables.
I'm aware of that. I expect all the tests to pass. When one fails, the output from Boost.Test shows me which aspect or usage of the feature fails, so I'm fine with this. As long as I have one feature per test (or try to), the system works great for me. I will note one thing. For my tests here, I use Beman's compiler_status.cpp program, which I "upgraded" (and loaded into the vault) to show all the results for different build types (release/debug/static library/...). This prepares a gigantic table which is very useful for seeing whether some failures are related to a particular build.
I would like to see the "Getting Started" or installation or whatever it is included a "configuration/system validation phase" whereby running the tests is a normal part of the boost installation procedure.
Yikes! Installing already takes way too long, in my experience. Why?
How many such problems have we seen? Is there any evidence that this would do something other than make installation longer for users?
a) OK - make it optional. Well, it's already optional, so it would just be a question of enhancing the documentation to make it easier for new users to "validate" the installation.
b) On a regular basis we have new users who have problems. This may be due to the fact that they are using a new library (e.g. stlport 4.?) or compiler variation (gcc releases more frequently than we do). So when they ask a question about the serialization library, I really need to know whether it's a boost or system configuration issue or an issue with the library itself.
c) It would spread the testing effort.
d) We would be informed of new anomalies right away.
Robert Ramey

"Robert Ramey" <ramey@rrsd.com> writes:
I will note one thing. For my tests here, I use Beman's compiler_status.cpp program, which I "upgraded" (and loaded into the vault) to show all the results for different build types (release/debug/static library/...). This prepares a gigantic table which is very useful for seeing whether some failures are related to a particular build.
Noted. Is that just an FYI, or was there some particular response you were looking for?
I would like to see the "Getting Started" or installation or whatever it is included a "configuration/system validation phase" whereby running the tests is a normal part of the boost installation procedure.
Yikes! Installing already takes way too long, in my experience. Why?
How many such problems have we seen? Is there any evidence that this would do something other than make installation longer for users?
a) OK - make it optional. Well, it's already optional, so it would just be a question of enhancing the documentation to make it easier for new users to "validate" the installation.
Patches welcomed :)
b) On a regular basis we have new users who have problems.
"We" meaning Boost? I'm just asking because in my libraries at least most new user problems don't seem to be related to platform and build configuration; they are usually related to usage.
This may be due to the fact that they are using a new library (e.g. stlport 4.?) or compiler variation (gcc releases more frequently than we do). So when they ask a question about the serialization library, I really need to know whether it's a boost or system configuration issue or an issue with the library itself.
So isn't that when you should ask them to run the tests?
c) It would spread the testing effort.
Seems to me it would be happening a little too late to be considered part of the "testing effort." Testing, as we do it, is for catching these problems *before* a release.
d) We would be informed of new anomalies right away.
Why would a user who wouldn't normally take the trouble to tell us about new anomalies be likely to do so if they were first instructed to run all the tests? -- Dave Abrahams Boost Consulting www.boost-consulting.com

Rene Rivera wrote:
After having run two cycles of tests for Boost with the mingw-3_4_2-stlport-5_0 configuration and having it take more than 14 hours on a 2.2Ghz+1GB machine, most of that in the Boost.Serialization library[*].
BTW - on my machine here I test borland 5.51, borland 5.64, msvc 6.0, msvc 6.0-stlport 4.53, VC 7.1, gcc 3.3 with cygwin, and comeau. The whole set of tests takes about 5 hours on my 2.2 GHz machine for all seven compilers. That's just the serialization library - I don't test other libraries. So if you're experiencing 11 hours of CPU time for one compiler, you're definitely doing something differently than I am. Note that in a couple of cases I have found tests that fail in a particularly inconvenient way on some platforms: usually they either loop forever or allocate so much memory that they thrash forever. Since you're running 11 hours of CPU time for just one compiler, it would be helpful to determine the cause. It's possible that a failure in just one test is responsible. On win32 platforms there is no cut-off for tests which exceed some predetermined resource levels. Obviously that would be helpful, but for now we just have to look at them. Robert Ramey

Robert Ramey wrote:
Rene Rivera wrote:
After having run two cycles of tests for Boost with the mingw-3_4_2-stlport-5_0 configuration and having it take more than 14 hours on a 2.2Ghz+1GB machine, most of that in the Boost.Serialization library[*].
BTW - on my machine here I test borland 5.51, borland 5.64, msvc 6.0, msvc 6.0-stlport 4.53, VC 7.1, gcc 3.3 with cygwin, and comeau. The whole set of tests takes about 5 hours on my 2.2 GHz machine for all seven compilers. That's just the serialization library - I don't test other libraries. So if you're experiencing 11 hours of CPU time for one compiler, you're definitely doing something differently than I am.
One likely immediate difference is that I run test with optimizations enabled.
Note that in a couple of cases I have found tests that fail in a particularly inconvenient way on some platforms: usually they either loop forever or allocate so much memory that they thrash forever. Since you're running 11 hours of CPU time for just one compiler, it would be helpful to determine the cause. It's possible that a failure in just one test is responsible.
I'll try to see where the slowdowns occur. But just as an FYI, the long running times are not limited to the serialization library. There are some tests out there that take a long time just to compile.
On win32 platforms there is no test cut-off for tests which exceed some predetermined resource levels. Obviously that would be helpful, but for now we just have to look at them.
Sure, and it's on our TODO list for testing. But if the large resources needed for one particular test are indicative of normal use of the library, that reflects badly on the library. And users might be frustrated enough not to use the library if their compile times go from minutes to hours, or their memory use from megs to gigs of RAM. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - Grafik/jabber.org

Rene Rivera wrote:
One likely immediate difference is that I run test with optimizations enabled.
That would be significant. I have run the test suite in the past in release mode and had lots of problems I couldn't address. In particular, borland 5.51 loops (almost) forever in lots of the tests due to some issue in how borland 5.51 compiles Boost.Test. I did mention this, but I believe it was dismissed because borland 5.51 wasn't a "release" compiler. I think I gave up running release-mode tests at that point because I would have needed a different or more elaborate Jamfile setup and no one seemed to care. FWIW, no users have complained either, so I don't know what to make of it. Perhaps other compilers have quirks that only show up in release mode (very likely). It seems to me that this isn't so much an issue of testing taking too long but rather of tests failing in release mode in an ungraceful manner. I'll look into this, at least as far as the serialization library is concerned. Robert Ramey

(brought over from fast array serialization thread) Robert Ramey wrote:
We will get to that. I'm interested in incorporating your improved testing. But I do have one concern. I test with windows platforms including borland and msvc. These can be quite different from testing with just gcc and can suck up a lot of time. It may not be a big issue here, but it means you'll have to be aware not to do anything too tricky.
Sure. I had this in mind. The changes involve only reducing duplicated work. There aren't any tricks there that are platform specific. Probably things will need some tweaking on platforms I haven't tested the stuff on, and mileage may vary. BTW, the best I can get out of it overall is a factor of two speedup (I got a factor of ~4, but that gain is available in only about half the tests. So you net about two.) Of course this kind of reorganizing doesn't address the "real" underlying MxNxK problem. I think going after that requires a better understanding of the problem than I have at the moment. For instance, with a one-line change to the Jamfile you could cut testing time in half by running dll tests only, if you could establish that any given dll test succeeds if and only if the corresponding static test succeeds, which I only guess is the case. Anyhow such tweaking is very easy to do.
Since you're interested in this, I would suggest making a few new directories in your personal boost/libs/serialization tree. I see each of these directories having its own Jamfile, so we could invoke runtest from any of the test suites just by going to the desired directory.
a) old_test - change the current test directory to this.
b) test - the current tests with your changes to use the unit_test library. You might send me the source to one of your changed tests so I can see whether I want to comment on it before too much effort is invested.
c) test_compatibility - include your backward-compatibility tests.
My hope was to avoid fragmenting the testing like this and to make the testing "modes" switchable from the command line. a) - c) can be accomplished pretty easily in one directory with one Jamfile. One of the more important goals, it seems to me, is to leverage the for-all-archives tests (test_array.cpp, test_set.cpp, test_variant.cpp, etc.) for portability verification - "portability" both as in cross-platform portability for portable archives and as in backwards compatibility.
d) test_performance - I want to include a few tests to measure times for things like serializing different primitives, opening/closing archives, etc. This would be similar to the current setup, so I could generate a table showing which combinations of features and archives are bottlenecks. The hope is that this would help detect really dumb oversights like recreating an xml character translation table for each xml character serialized!
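[Editorial sketch: as an illustration of the kind of micro-benchmark such a test_performance directory might contain, here is a rough sketch using boost::timer; the archive type, iteration counts, and the choice of what to time are arbitrary placeholders, not part of the actual plan.]

    #include <boost/timer.hpp>
    #include <boost/archive/text_oarchive.hpp>
    #include <sstream>
    #include <iostream>

    int main()
    {
        const int n = 100000;   // arbitrary iteration count

        // Time serializing a primitive repeatedly into one archive.
        std::stringstream ss;
        boost::archive::text_oarchive oa(ss);
        boost::timer t;
        for(int i = 0; i < n; ++i)
        {
            const int v = i;
            oa << v;
        }
        std::cout << "serialize " << n << " ints: " << t.elapsed() << "s\n";

        // Time archive open/close overhead separately.
        boost::timer t2;
        for(int i = 0; i < 1000; ++i)
        {
            std::stringstream s;
            boost::archive::text_oarchive o(s);
        }
        std::cout << "1000 archive open/close cycles: " << t2.elapsed() << "s\n";
        return 0;
    }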
I'd also like to see stress testing. As I mentioned in some previous thread, we're going to be running terabytes of data through this stuff, and I'm not going to sleep well until we've done it several times successfully. This one does sound to me like a job for a separate testing directory.

Anyhow, those changes. They're not polished up, but this will give an idea of how things work. Download http://www.resophonic.com/test.tar.gz and untar it in libs/serialization (delete test/ first). First, an explanation of the changes w.r.t. unit tests and how they make speedups possible, followed by an explanation of the changes for portability testing.

-- Look at test_simple_class.cpp. test_main() has been converted to BOOST_AUTO_UNIT_TEST(unique_identifier), and a couple of #includes have been changed. There is a corresponding change of lib in the Jamfile. That's it. If you look at test_map.cpp, you'll see that many of these unit tests can go in the same translation unit.

-- Look at test_for_all_archives.cpp. This is where the testing speedup is. test_for_all_archives.cpp gets built once per archive type. This technique can bite you, of course, if your compiler requires too much memory and goes to swap. My testing shows the compiler topping out at about 460M for this test, which I would think is still smaller than some other parts of boost. At any rate the file could easily be broken in two. One consequence of #including everything together was a lot of name collisions in different test_*.cpp files, each of which I chased down and resolved by changing names. This could probably have been fixed more elegantly in some cases with namespaces. See the classes unregistered_polymorphic_base, null_ptr_polymorphic_base, SplitA, SplitB, TestSharedPtrA, etc.

-- Look at the Jamfile, at test-suite "serialization". There you see test_for_all_archives.cpp and a test_for_one_archive.cpp. I have not checked to see how nicely the testing framework displays failures inside individual unit tests; I've assumed the granularity is good. If it isn't, the test_for_all_archives.cpp business can just be tossed out and the unit tests compiled/linked/run one at a time, as in the current system. Notice also the use of rule templates to provide the demo tests with the exec monitor lib, and the unit tests with the unit test framework lib.

Now the changes relating to portability testing:

-- Look at test_simple_class.cpp. A reseed() has been added at the top of the test, tmpnam(NULL) has been changed to TESTFILE("unique_identifier"), and remove(const char*) has been changed to finish(const char *).

-- Now look at the top of the Jamfile. The switch --portability turns on the #define BOOST_SERIALIZATION_TEST_PORTABILITY, which affects the behavior of TESTFILE() and finish(). This (almost) gets you the ability to test portability in various ways. (There are a few more changes required; I'll get to them.)

-- Looking at test_tools.cpp: if BOOST_SERIALIZATION_TEST_PORTABILITY is *on*, finish() is a no-op and TESTFILE("something") returns a path get_tmpdir()/P/archive-type, where P is a path that identifies the compiler, platform, and boost version. TESTFILE("nvp1"), for example, could return /tmp/Mac_OS/103300/gcc_version_something/portable_binary_archive.nvp1. If --portability is not specified, TESTFILE() works like tmpnam(NULL) and finish(filename) calls std::remove(filename), which is the "old" functionality.
In this way, if each of your testing runs points to the same $TMP, each platform/version/compiler's serialized testing data will be "overlaid" in a directory structure in such a way that you can easily walk the $TMP hierarchy comparing checksums of files with the same name.

-- Look at A.hpp. There are now two A's, one portable and one nonportable, and in other places I've made similar changes to other classes. The portable version contains only portable types and uses boost random number generators (maybe we want to nix the nonportable one completely and put serialization of nonportable types into their own test somewhere). std::rand() will of course generate different numbers on one architecture than on others, and we need all platforms to generate archives containing A's with exactly the same numbers. (I cannot begin to explain what a thrill it was, as my testing strategy appeared to be on the rocks, to discover that the problem was already solved right there in boost::random.) The reseed() that appears at the top of test_simple_class.cpp reseeds the boost random rngs.

So those changes get you switchable portability testing. You just need a utility that walks the hierarchy at $TMP and compares files. I've been using a perl script; you could pretty easily code one up with boost::crc and boost::filesystem (see the sketch below). There's also a filesystem-walking routine hanging around in test_tools.cpp.

Some minor stuff that I stumbled across and had to resolve in the process, and the open issues that come to mind:

-- For platform portability testing, one also has to be careful about containers on some platforms making more temporary copies of A than on others. You have to create as many A's as you're going to insert into your container and then insert them one at a time. You can't just call e.g. mymap.insert(A()); multiple times, as you don't know how many times A::A() will get called inside that call to insert(). Otherwise you get serialized maps, for instance, where only the first-inserted entries match. That took a while to track down, but they're all fixed.

-- The Jamfile is revamped per Rene's suggestions using rule templates. I'm sure there are a couple of toolset requirements that I've managed to drop, but this should just be putting them back in some places. Overall I think it's more flexible/maintainable, but of course it isn't finished.

-- test_class_info_save and test_class_info_load always write their data to one of these platform/version/compiler directories suitable for portability testing. This needs a little housekeeping, or maybe the whole $TMP/platform/version/compiler scheme is OK for general use; your call.

-- These changes are all against boost release 1.33.0. Dunno if things are broken w.r.t. the trunk.

-- The test_tools.hpp and test_tools.cpp stuff is messy at the moment. This should probably just be broken out into a separate lib.

-- I'm not 100% clear on my use of rule templates in the Jamfile. Somebody might want to take a look at this. Specifically, it isn't clear to me between which of the three colons <define>WHATEVER should go, and where toolset::required-something-or-other should go. I can verify that things work OK for gcc, but I don't have a windows here to test with.

-- The DEPENDS $(saving-tests) : $(loading-tests) business is still there. I don't recall if this was deprecated or not.

Well, let me know what you think. -t
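[Editorial sketch: a rough sketch of the $TMP-walking checksum utility suggested above, using boost::crc and boost::filesystem. It is written against the current Boost.Filesystem interface rather than the 1.33-era one, assumes the directory layout described in this message, and stands in for the perl script actually used.]

    #include <boost/filesystem/operations.hpp>
    #include <boost/filesystem/path.hpp>
    #include <boost/crc.hpp>
    #include <fstream>
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    namespace fs = boost::filesystem;

    // CRC-32 of a whole file.
    unsigned int checksum(const fs::path & p)
    {
        std::ifstream in(p.string().c_str(), std::ios::binary);
        boost::crc_32_type crc;
        char buf[4096];
        while(in.read(buf, sizeof(buf)) || in.gcount())
            crc.process_bytes(buf, in.gcount());
        return crc.checksum();
    }

    // Walk the overlaid platform/version/compiler directories under root
    // and group checksums by archive file name (the leaf), so mismatches
    // between platforms stand out.
    int main(int argc, char * argv[])
    {
        fs::path root(argc > 1 ? argv[1] : "/tmp");  // assumed $TMP layout
        std::map<std::string, std::vector<unsigned int> > sums;

        for(fs::recursive_directory_iterator it(root), end; it != end; ++it)
            if(!fs::is_directory(it->path()))
                sums[it->path().filename().string()].push_back(checksum(it->path()));

        for(std::map<std::string, std::vector<unsigned int> >::iterator
                i = sums.begin(); i != sums.end(); ++i)
        {
            bool same = true;
            for(std::size_t j = 1; j < i->second.size(); ++j)
                if(i->second[j] != i->second[0]) same = false;
            std::cout << (same ? "ok      " : "MISMATCH") << "  " << i->first << "\n";
        }
        return 0;
    }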

troy d. straszheim wrote:
Notice also the use of rule templates to provide the demo tests with the exec monitor lib, and the unit tests with the unit test framework lib.
-- Jamfile is revamped per Rene's suggestions using rule templates. I'm sure there are a couple of toolset requirements that I've managed to drop, but this should just be putting them back in some places. Overall I think it's more flexible/maintainable, but of course it isn't finished.
Nicely done :-)
-- I'm not 100% clear on my use of rule templates in the Jamfile. Somebody might want to take a look at this. Specifically, it isn't clear to me between which of the three colons <define>WHATEVER should go, and where toolset::required-something-or-other should go. I can verify that things work OK for gcc, but I don't have a windows here to test with.
Both of those go in the "requirements" section of the target definition, which has this form:

[target-type]
    target
    : sources *
    : requirements *
    : default-build *
    ;

Think of "toolset::whatever" as dynamic requirements. They get decided by running the named rule when the real targets, specific to the type of build, are getting created. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - Grafik/jabber.org

Rene Rivera wrote:
[target-type]
    target
    : sources *
    : requirements *
    : default-build *
    ;
Oops, forgot to mention: for testing targets (run, compile, etc.) there is no target name; those start with the sources as the first argument. -- -- Grafik - Don't Assume Anything -- Redshift Software, Inc. - http://redshift-software.com -- rrivera/acm.org - grafik/redshift-software.com -- 102708583/icq - grafikrobot/aim - Grafik/jabber.org
participants (5): David Abrahams, Kevin Wheatley, Rene Rivera, Robert Ramey, troy d. straszheim