[testing] Need optimization help!!!
If someone is interested in an optimization challenge, here's your opportunity. Currently the program(s) that generate the test results take a long time to run. In the live setup, a 4-core, 16 GB RAM VM, it takes about 40 minutes to generate the reports for the master and develop branches. On my 4-core, 16 GB RAM real-hardware machine with an SSD it also takes about 40 minutes. If some brave soul could look into why it's so terribly slow, the Boost community would be very grateful, as it would make it possible to move the processing to a CI service and improve the reliability and frequency of the results. Unfortunately the only prize I can promise for such a task is to buy you a meal and/or some drinks if I see you :-) If you are interested, all that's needed to run the results is:

a) A good chunk of disk space: at least 27 GiB.
b) A bash shell, Python, git, and C++ tools (but you probably already have all that).
c) An empty directory to run the reports in.
d) Download this script < https://github.com/boostorg/regression/blob/develop/reports/src/build_result...
e) Comment out the upload commands on lines 220-223.
f) Run the script, and wait.
And then the fun part of finding out what programs get run and what is slow :-)

FYI, the source for the processing is downloaded by the script, but you can also find it here <https://github.com/boostorg/regression/tree/develop/reports>.

--
-- Rene Rivera
-- Grafik - Don't Assume Anything
-- Robot Dreams - http://robot-dreams.net
-- rrivera/acm.org (msn) - grafikrobot/aim,yahoo,skype,efnet,gmail
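[A possible starting point for that "finding out what is slow" step: wrap each stage of the run with a simple timer so the expensive one stands out. This is only a minimal sketch; the stage commands below are placeholders, not the commands the actual script runs.]

```python
# Minimal sketch: time each stage of the report run to see where the minutes go.
# The commands are placeholders; substitute the real steps the script performs
# (downloading incoming zips, merging, report generation, ...).
import subprocess
import time

stages = [
    ("download incoming zips", ["echo", "placeholder download step"]),
    ("generate reports", ["echo", "placeholder boost_report step"]),
]

for name, cmd in stages:
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    print(f"{name}: {time.monotonic() - start:.1f}s")
```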
AMDG

On 01/31/2017 08:22 PM, Rene Rivera wrote:
If someone is interested in an optimization challenge, here's your opportunity. Currently the program(s) that generate the test results take a long time to run. In the live setup, a 4-core, 16 GB RAM VM, it takes about 40 minutes to generate the reports for the master and develop branches. On my 4-core, 16 GB RAM real-hardware machine with an SSD it also takes about 40 minutes.

If some brave soul could look into why it's so terribly slow, the Boost community would be very grateful, as it would make it possible to move the processing to a CI service and improve the reliability and frequency of the results.
The report is completely generated from scratch on every run, even though a significant fraction of it doesn't change. I suspect that the best way to optimize it would be to make the report generation more incremental. (The old XSL scripts did in fact work this way. When I wrote boost_report, I assumed that it didn't matter, since it took less than 2 minutes at the time.)

In Christ,
Steven Watanabe
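[To illustrate what "more incremental" could mean in practice, here is a minimal sketch that skips reprocessing a runner's upload when its contents haven't changed since the previous run. The file layout, cache file, and process_runner() hook are hypothetical; this is not the actual boost_report logic.]

```python
# Minimal sketch of incremental processing: only reprocess a runner's zip when
# its checksum differs from the one recorded on the previous run.
# File names and process_runner() are hypothetical placeholders.
import hashlib
import json
from pathlib import Path

CACHE = Path("checksums.json")

def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def process_runner(zip_path: Path) -> None:
    print(f"regenerating pages for {zip_path.name}")  # placeholder for the real work

def incremental_run(incoming: Path) -> None:
    seen = json.loads(CACHE.read_text()) if CACHE.exists() else {}
    for zip_path in sorted(incoming.glob("*.zip")):
        digest = sha256(zip_path)
        if seen.get(zip_path.name) == digest:
            continue  # unchanged since the last run, skip it
        process_runner(zip_path)
        seen[zip_path.name] = digest
    CACHE.write_text(json.dumps(seen, indent=2))

# Example (hypothetical path):
# incremental_run(Path("incoming/develop/processed"))
```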
On Tue, Jan 31, 2017 at 9:59 PM, Steven Watanabe wrote:
AMDG
On 01/31/2017 08:22 PM, Rene Rivera wrote:

If someone is interested in an optimization challenge, here's your opportunity. Currently the program(s) that generate the test results take a long time to run. In the live setup, a 4-core, 16 GB RAM VM, it takes about 40 minutes to generate the reports for the master and develop branches. On my 4-core, 16 GB RAM real-hardware machine with an SSD it also takes about 40 minutes.

If some brave soul could look into why it's so terribly slow, the Boost community would be very grateful, as it would make it possible to move the processing to a CI service and improve the reliability and frequency of the results.
The report is completely generated from scratch on every run, even though a significant fraction of it doesn't change. I suspect that the best way to optimize it would be to make the report generation more incremental. (The old XSL scripts did in fact work this way. When I wrote boost_report, I assumed that it didn't matter, since it took less than 2 minutes at the time.)
Ah, good to know. I wasn't aware of that change. Perhaps the growth in libraries and testers has skyrocketed the build times, or maybe it's something else; I don't know. It's a good place for someone to start, though: making it incremental again.

--
-- Rene Rivera
-- Grafik - Don't Assume Anything
-- Robot Dreams - http://robot-dreams.net
-- rrivera/acm.org (msn) - grafikrobot/aim,yahoo,skype,efnet,gmail
On 1/31/17 7:22 PM, Rene Rivera wrote:
If someone is interested in an optimization challenge, here's your opportunity. Currently the program(s) that generate the test results take a long time to run. In the live setup, a 4-core, 16 GB RAM VM, it takes about 40 minutes to generate the reports for the master and develop branches. On my 4-core, 16 GB RAM real-hardware machine with an SSD it also takes about 40 minutes.
Assuming that the 40 minutes includes building all libraries and tests and running the tests, that doesn't seem all that long as far as I'm concerned. Here at home I use b2 and library_status to incrementally build and run tests for one library (usually, but not always, the serialization library, naturally). It takes a couple of minutes for a complete build on my more or less modern desktop, a Mac mini with a 256 GB SSD and 4 cores. I don't think that's a major problem. When I'm working on fixing problems, most of the build/test cycles are incremental, so in practice it's not a problem at all. FWIW I use an Xcode project generated from CMake for debugging and b2 to check with gcc compilers before I check in. Both seem to take a similar amount of time.

Robert Ramey
On Wed, Feb 1, 2017 at 11:14 AM, Robert Ramey wrote:
On 1/31/17 7:22 PM, Rene Rivera wrote:
If someone is interested in an optimization challenge, here's your opportunity. Currently the program(s) that generate the test results take a long time to run. In the live setup, a 4-core, 16 GB RAM VM, it takes about 40 minutes to generate the reports for the master and develop branches. On my 4-core, 16 GB RAM real-hardware machine with an SSD it also takes about 40 minutes.
Assuming that the 40 min includes building all libraries and tests and running the tests, that doesn't seem to be all that long as far as I'm concerned.
The 40 minutes includes *only* generating the reports that we see on the web site. By this point all the tests have run. The key point, though, is that this is generation for *all* the tests for *all* libraries times all the testers on both master and develop branches. So we are talking about a large amount of data to process and generate.

--
-- Rene Rivera
-- Grafik - Don't Assume Anything
-- Robot Dreams - http://robot-dreams.net
-- rrivera/acm.org (msn) - grafikrobot/aim,yahoo,skype,efnet,gmail
On 2/1/17 9:23 AM, Rene Rivera wrote:
On Wed, Feb 1, 2017 at 11:14 AM, Robert Ramey wrote:
The 40 minutes includes *only* generating the reports that we see on the web site. By this point all the tests have run. The key point, though, is that this is generation for *all* the tests for *all* libraries times all the testers on both master and develop branches. So we are talking about a large amount of data to process and generate.
OK - that's helpful. I use library_status to generate reports for all the combinations I test - release, debug, static, shared - for a couple of compilers. The report generation phase is not nothing, but it's insignificant next to building and running the tests themselves. So it's never been a problem. Of course I have no idea whether it's similar or different in speed to the official Boost one. My motivation for creating library_status from Beman's original program was to be able to compare results across compilers and across configurations, which is very helpful to me in finding the cause of problems in my library. So it clearly identifies build features such as release vs debug, static vs shared, compiler vs compiler, etc. And it doesn't truncate the error messages, which has caused me no end of frustration when trying to use the Boost report.

Robert Ramey
Rene Rivera wrote:
And then the fun part of finding out what programs get run and what is slow :-)
From the look of it, most of the time is spent in "Generating links files".
The architecture looks a bit odd. If I understand it correctly, the test runners generate a big .xml file, it's zipped and uploaded, then the report script downloads all the zips and generates the whole report.

It would be more scalable for the test runners to do most of the work. For instance, what immediately comes to mind is that they could generate the so-called links files directly, instead of combining everything into one .xml which is then decomposed back into individual pages.

Longer term we could think about splitting the report into individual pages per test runner; the current structure with a column per test runner was indeed more convenient when the table fitted on screen, but now it doesn't, and a layout with a row per runner may be more useful:

Sandia-darwin-c++11   2017-02-01 20:31:00   102 passed, 14 failed

For this to be useful, however, the expected-failure markup needs to be decentralized per library, so that the maintainer can achieve a zero-failures state as a baseline and then only needs to look at the red rows following a change.
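[To make the "runners generate the links files directly" idea concrete, here is a rough sketch of runner-side splitting. It assumes a hypothetical layout with one <test-log> element per test; the real runner XML format and the real links-file format may differ.]

```python
# Rough sketch: write one small file per test result on the runner side,
# instead of shipping one huge XML for the report machinery to decompose.
# The <test-log> tag and its attributes are assumptions, not the real schema.
import xml.etree.ElementTree as ET
from pathlib import Path

def split_results(big_xml: Path, out_dir: Path) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    for _, elem in ET.iterparse(big_xml, events=("end",)):
        if elem.tag != "test-log":
            continue
        name = f'{elem.get("library", "unknown")}-{elem.get("test-name", "unknown")}.xml'
        name = name.replace("/", "_")  # keep library paths out of file names
        (out_dir / name).write_bytes(ET.tostring(elem))
        elem.clear()  # keep memory bounded while streaming the big file
```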
On 2/1/17 10:35 AM, Peter Dimov wrote:
Rene Rivera wrote:
And then the fun part of finding out what programs get run and what is slow :-)
From the look of it, most of the time is spent in "Generating links files".
The architecture looks a bit odd. If I understand it correctly, the test runners generate a big .xml file, it's zipped, uploaded, the report script downloads all zips and then generates the whole report.
It would be more scalable for the test runners to do most of the work. For instance, what immediately comes to mind is that they could generate the so-called links files directly instead of combining everything into one .xml which is then decomposed back to individual pages.
Hmmm - making just one page of errors for each library and tester, rather than a page for each test, would help me out. This would eliminate the truncation of the error messages and diminish the number of potential linked pages (in the serialization library) by a factor of about 1000. That might help performance.
Longer term we could ...
But once we start doing that, we'll end up re-architecting the whole thing! Which is what I would very much like to see. But I wouldn't want to impose such a task on anyone.

Robert Ramey
Robert Ramey wrote:
Hmmm - making just one page of errors for each library and tester, rather than a page for each test, would help me out. This would eliminate the truncation of the error messages and diminish the number of potential linked pages (in the serialization library) by a factor of about 1000. That might help performance.
Easiest would be to just zip the output of "b2 libs/pumpkin/test" and give a download link in the report. There wouldn't be much decrease in utility.
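[A sketch of that approach: capture the b2 console output for one library's tests and zip it so the report could just link to the archive, with error messages untruncated. The library name and paths are placeholders, and the real report would need to decide where the archive gets uploaded.]

```python
# Minimal sketch: run a library's tests with b2, capture the full console
# output (no truncated error messages), and zip it for a download link.
import subprocess
import zipfile
from pathlib import Path

def zip_test_output(library: str, boost_root: Path, archive: Path) -> None:
    # Capture everything b2 prints while building/running the library's tests.
    result = subprocess.run(["b2", f"libs/{library}/test"],
                            cwd=boost_root, capture_output=True, text=True)
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"{library.replace('/', '_')}-b2.log",
                    result.stdout + result.stderr)

# Example (hypothetical library and paths):
# zip_test_output("pumpkin", Path("/path/to/boost_root"), Path("pumpkin-test-output.zip"))
```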
Rene Rivera wrote:
Currently the program(s) that generate the test results take a long time to run. In the live setup, a 4-core, 16 GB RAM VM, it takes about 40 minutes to generate the reports for the master and develop branches. On my 4-core, 16 GB RAM real-hardware machine with an SSD it also takes about 40 minutes.
Took about an hour each for me:

Wed Feb 1 19:45:38 GTBST 2017 :: Start of testing. [build_all]
Wed Feb 1 19:45:38 GTBST 2017 :: Get tools. [build_setup]
Wed Feb 1 19:45:38 GTBST 2017 :: Git; boost_root [build_setup]
Wed Feb 1 19:47:09 GTBST 2017 :: Git; boost_regression [build_setup]
Wed Feb 1 19:47:10 GTBST 2017 :: Git; boost_bb [build_setup]
Wed Feb 1 19:47:12 GTBST 2017 :: Build tools. [update_tools]
Wed Feb 1 19:50:50 GTBST 2017 :: Build results for branch develop. [build_results]
Wed Feb 1 20:54:41 GTBST 2017 :: Build results for branch master. [build_results]
Wed Feb 1 21:49:00 GTBST 2017 :: Upload results for branch develop. [upload_results]
Wed Feb 1 21:49:00 GTBST 2017 :: Upload results for branch master. [upload_results]
Wed Feb 1 21:49:00 GTBST 2017 :: End of testing. [build_all]

but this is not a full run, because the programs started failing with "basic_ios::clear" once develop.zip and master.zip reached 2 GB in size. This occurs at

Reading /cygdrive/d/tmp2/boost-reports/develop/incoming/develop/processed/teeks99-02-dg4.5-Docker-64on64.xml
Merging expected results
Generating links pages
basic_ios::clear

on develop, and at

Reading /cygdrive/d/tmp2/boost-reports/master/incoming/master/processed/teeks99-02-mg4.5-Docker-64on64.xml
Merging expected results
Generating links pages
basic_ios::clear

on master. FWIW. :-)
If some brave soul could look into why it's so terribly slow, the Boost community would be very grateful, as it would make it possible to move the processing to a CI service and improve the reliability and frequency of the results.
As you may know, we are running our own Boost-on-Android test results site, https://boost.crystax.net/, which displays content generated by the Boost regression scripts (i.e. the same as the official Boost testing report pages), but filtered by Android OS. This is done for our convenience, to make it possible to re-generate reports manually if needed, and to be able to set our own schedule for report generation without bothering the Boost team with that.

We use the Boost regression test set mainly for testing the CrystaX NDK itself rather than Boost, but in practice this leads to the same effect as for the Boost team - we use the same scripts for report generation and we get similar results on our web site. So we've already noticed that the report generation scripts work slowly. Of course not as slowly as Rene mentioned (because we filter runners by name), but still far from ideal. I was planning to look at that and try to figure out what's wrong there, since just reporting the issue without suggesting any solution seemed useless to me. However, in the last months I've had no free time, so it's still just a plan and no real work has been done in this direction yet.

Also, for those reading this, I'd like to point out that it's not only a matter of optimization. I clearly see a floating bug in the report generation. Sometimes (I can't find any regularity yet) the generated reports miss a big amount of data from the test results. You can see that on http://www.boost.org/development/tests/master/developer/summary.html, looking at the CrystaX-apilevel-21-armeabi-v7a-llvm-libc++ runner: a) only gcc-4.9 is listed, even though the test results contain data for gcc-5, gcc-6, clang-3.6, clang-3.7 and clang-3.8, and b) even for gcc-4.9 it shows results only for a couple of libraries, completely missing data for the others. Such wrong report generation happens from time to time for different combinations of API level/ABI/C++ standard library, without any visible regularity.

So, taking the above into account, I'd be happy to say that I'll take this task and fix it, but I really can't say anything regarding timing - it may be sooner than I suppose, but most likely it will be later, a couple of months from now as I assess it. If someone takes it and fixes it before I can, that would be just great (in that case, please take into account the floating bug described above). However, if no one is able to take this task and complete it by the time I have time, I'll definitely try to fix it, simply because it's something needed for our project too, not only for Boost.

--
Dmitry Moskalchuk
participants (5)

- Dmitry Moskalchuk
- Peter Dimov
- Rene Rivera
- Robert Ramey
- Steven Watanabe