[Test] can tests be run in parallel?
Hi all, I was trying to make better use of my cores for running tests. I tried to find something about running tests in parallel in the online documentation, but without any luck. Does anyone know if this is possible? Thanks Thorsten
Thorsten Ottosen
Hi all,
I was trying to make better use of my cores for running tests. I tried to find something about running tests in parallel in the online documentation, but without any luck.
Does anyone know if this is possible?
Not at the moment. I had an idea for distributing tests across a grid/cloud, but never got around to implementing it. If there is substantial interest I can look into it again. That said, nothing prevents you from implementing a different test runner (different from console_test_runner) with whatever kind of parallelism you'd like. Gennadiy
On 10-06-2011 00:07, Gennadiy Rozental wrote:
Thorsten Ottosen
writes:
Does anyone know if this is possible?
Not at the moment. I had an idea how to implement distribution on a grid/cloud, but never got to implementing it. If there is substantial interest I can look into it again.
Thanks for the clarification. I really think this could be very useful. I also think a very simple solution would suffice: simply run each test suite in parallel, and put a lock around the output such that the entire output of a test suite (or test case) is printed at the end of the test suite (or test case). regards Thorsten
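A minimal sketch of the "buffer per suite, lock around the final print" idea, assuming a suite can be modeled as a function that writes its log to a stream it is handed. `Suite` and `run_suites_in_parallel` are hypothetical names for illustration only and are not part of Boost.Test:

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a test suite: a name plus a body that writes
// its log to the stream it is given and returns true on success.
struct Suite {
    std::string name;
    std::function<bool(std::ostream&)> body;
};

// Runs every suite on its own thread. Each suite logs into a private buffer;
// the whole buffer is written to `out` under a mutex once the suite has
// finished, so the output of two suites never interleaves.
// Returns the number of failed suites.
int run_suites_in_parallel(const std::vector<Suite>& suites, std::ostream& out) {
    std::mutex out_mutex;
    int failures = 0;  // guarded by out_mutex
    std::vector<std::thread> workers;
    workers.reserve(suites.size());

    for (const Suite& suite : suites) {
        assert(suite.body && "every suite needs a body");
        workers.emplace_back([&suite, &out, &out_mutex, &failures] {
            std::ostringstream log;  // private to this thread
            log << "== " << suite.name << " ==\n";
            const bool ok = suite.body(log);
            log << (ok ? "passed" : "FAILED") << "\n";

            // Only the final flush is serialized, not the test itself.
            std::lock_guard<std::mutex> lock(out_mutex);
            out << log.str();
            if (!ok) ++failures;
        });
    }
    for (std::thread& worker : workers) worker.join();
    return failures;
}
```

Calling `run_suites_in_parallel(suites, std::cout)` prints each suite's block in completion order. One thread per suite is the simplest choice here; a real runner would probably cap the number of threads at the number of cores.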
On Fri, Jun 10, 2011 at 11:13 AM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
On 10-06-2011 00:07, Gennadiy Rozental wrote: [...] Thanks for the clarification. I really think this could be very useful. I also think a very simple solution would suffice: simply run each test suite in parallel, and put a lock around the output such that the entire output of a test suite (or test case) is printed at the end of the test suite (or test case).
regards
Thorsten
I think there is much more to do, since test suites might depend on other test cases or test suites. There must be some logic to identify independent execution units and execute them, because even if a suite cannot be run in parallel with others since it depends on other suites or test cases, it is still possible that all test cases (TCs) within that suite can be run in parallel. Maybe some sort of work-stealing algorithm?
On 10-06-2011 12:30, Ovanes Markarian wrote:
On Fri, Jun 10, 2011 at 11:13 AM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
I think there is much more to do, since test suites might depend on other test cases or test suites. There must be some logic to identify independent execution units and execute them, because even if a suite cannot be run in parallel with others since it depends on other suites or test cases, it is still possible that all test cases (TCs) within that suite can be run in parallel. Maybe some sort of work-stealing algorithm?
Currently, that is not one of my needs. Anyway, some of the logic must already be there, otherwise how can it run dependent tests today? -Thorsten
On Fri, Jun 10, 2011 at 3:53 PM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
On 10-06-2011 12:30, Ovanes Markarian wrote:
On Fri, Jun 10, 2011 at 11:13 AM, Thorsten Ottosen <thorsten.ottosen@dezide.com> wrote:
I think there is much more to do, since test suites might depend on other test cases or test suites. There must be some logic to identify independent execution units and execute them, because even if a suite cannot be run in parallel with others since it depends on other suites or test cases, it is still possible that all test cases (TCs) within that suite can be run in parallel. Maybe some sort of work-stealing algorithm?
Currently, that is not one of my needs. Anyway, some of the logic must already be there, otherwise how can it run dependent tests today?
-Thorsten
I remember that in the plain API there is a function add_dependency or something like that. The problem is that if you implement parallel test execution, you can't just ignore dependencies on the grounds that they are not used in most cases. Otherwise it could lead to deadlocks in the cases where they are used. With Kind Regards, Ovanes
Re: "it is still possible that all tcs in this suite can be run in
parallel. May be some sort of work stealing algorithm", that reminds me of
something already built by Intel in Intel's Threading Building Blocks
library. There is an open source version of it, and it scales well with
increasing numbers of cores; it is designed and implemented in such a
way that the programmer does not need to worry about the tedious details of
creating threads. I have examined it only for number crunching, but I don't
see a reason it couldn't be used in designing and implementing test suites.
It does, though, need a slight shift in mindset relative to what you'd
normally do in multithreaded programs or conventional numeric algorithms
(something you can see only by actually playing with it to do trivially
simple things fast, like matrix multiplication). For example, instead of
putting a lock around output, you'd design the program to use a class that
collects the results of the tests, and then outputs them in a sensible order
to some stream (standard out or a file stream). It might be worth a look (by
programmers smarter than me), to see if it can help in the context of this
discussion, and to what extent.
Cheers
Ted
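A rough sketch of the "collect results, then print in a sensible order" design, written with plain `std::thread` rather than TBB to keep it self-contained. All names here (`TestResult`, `ResultCollector`, `run_all`) are hypothetical and only illustrate the idea; none of them come from TBB or Boost.Test:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical result record for one test.
struct TestResult {
    std::string name;
    bool passed = false;
    std::string message;
};

// Collects results by index. Each worker writes only to its own slot, so no
// lock is needed while the tests run; the reporting functions are meant to
// be called after all workers have been joined.
class ResultCollector {
public:
    explicit ResultCollector(std::size_t count) : results_(count) {}

    void store(std::size_t index, TestResult result) {
        assert(index < results_.size());
        results_[index] = std::move(result);
    }

    std::size_t failures() const {
        std::size_t count = 0;
        for (const TestResult& r : results_) {
            if (!r.passed) ++count;
        }
        return count;
    }

    // Formats results in registration order, regardless of completion order.
    std::string report() const {
        std::ostringstream out;
        for (const TestResult& r : results_) {
            out << r.name << ": " << (r.passed ? "ok" : "FAILED");
            if (!r.message.empty()) out << " (" << r.message << ")";
            out << "\n";
        }
        return out.str();
    }

private:
    std::vector<TestResult> results_;
};

using TestFn = std::function<TestResult()>;

// Runs every test on its own thread and returns the filled collector.
ResultCollector run_all(const std::vector<TestFn>& tests) {
    ResultCollector collector(tests.size());
    std::vector<std::thread> workers;
    workers.reserve(tests.size());
    for (std::size_t i = 0; i < tests.size(); ++i) {
        workers.emplace_back([&collector, &tests, i] { collector.store(i, tests[i]()); });
    }
    for (std::thread& worker : workers) worker.join();
    return collector;
}
```

Because the report is produced only after every worker has been joined, the output order is deterministic from run to run, which a lock around the output stream alone would not give you.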
On Fri, Jun 10, 2011 at 4:18 PM, Ted Byers wrote:
Re: "it is still possible that all tcs in this suite can be run in parallel. May be some sort of work stealing algorithm", that reminds me of something already built by Intel in Intel's Threading Building Blocks library. There is an open source version of it, and it scales well with increasing numbers of cores; it is designed and implemented in such a way that the programmer does not need to worry about the tedious details of creating threads. I have examined it only for number crunching, but I don't see a reason it couldn't be used in designing and implementing test suites. It does, though, need a slight shift in mindset relative to what you'd normally do in multithreaded programs or conventional numeric algorithms (something you can see only by actually playing with it to do trivially simple things fast, like matrix multiplication). For example, instead of putting a lock around output, you'd design the program to use a class that collects the results of the tests, and then outputs them in a sensible order to some stream (standard out or a file stream). It might be worth a look (by programmers smarter than me), to see if it can help in the context of this discussion, and to what extent.
Yes, sure. I know TBB as well. They took the concept of work-stealing from Cilk (and also acquired Cilk and made it one of the products available with Intel Parallel Studio). Anyway, it does not matter whether Intel has it or not, because Boost can't depend on Intel's lib. First of all, the licenses might be incompatible. Secondly, the lib is available on x86-compatible architectures, and there is an unofficial community port for ARM, but it might be hard to port elsewhere, since it really uses low-level platform facilities to avoid things such as false sharing. TBB integrates work-stealing very efficiently, combining BFS and DFS graph traversals to optimize stack growth etc. It is not trivial to implement such a thing in Boost.Test. Besides that, Boost.Test would need its own thread abstraction layer, because Boost.Thread uses Boost.Test for its tests ;) There was also a discussion on the list about whether Boost.Test might depend on the Boost threading lib. All in all it sounds like a huge amount of work ;) Best Regards, Ovanes
Ovanes Markarian
All in all it sounds like a huge amount of work ;)
1. Frankly, I was thinking more in terms of running test cases in multiple processes, either on a local host or in a cloud. Running in multiple threads has some small advantages (like no need to serialize extra test case data, and some existing infrastructure), but IMO it breaks the most important restriction of unit testing: all test units need to be run independently (unless your classes are specially prepared for running in an MT environment). Running in a multi-process scenario is much closer to running test cases serially. In addition, we do not need to worry about synchronizing access to Boost.Test internals.
2. Managing dependencies should not be a problem IMO. Boost.Test has a rather simple model, and running test units in parallel should not complicate it much.
3. I do not see why we need to parallelize at the test suite level. IMO it should be at the test case level.
4. I do not see a problem with a dependency on either Boost.Thread or Intel TBB (or anything else for that matter). In my design the distribution logic resides in a standalone runner application, and we can have a number of different runners with different dependencies. Only some generic "distribution support" logic will reside in the main library.
And it is indeed a non-trivial piece of work. So if there are any volunteers... I'll be happy to help with the general design and some core infrastructure changes.
Gennadiy
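To make the multi-process option in point 1 concrete, here is a small sketch that runs each test case in its own child process on a local host. It assumes a POSIX system (`fork`/`waitpid`) and models a test case as a plain function; `TestCase` and `run_in_processes` are hypothetical names, and this is not how Boost.Test itself works:

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical test case: returns true on success.
using TestCase = std::function<bool()>;

// Runs each test case in its own child process (POSIX only) and returns one
// pass/fail flag per test, in input order. A crash, an escaped exception, or
// a non-zero exit status counts as a failure, and none of them can disturb
// the other test cases because nothing is shared between the children.
std::vector<bool> run_in_processes(const std::vector<TestCase>& tests) {
    std::vector<pid_t> children(tests.size(), -1);

    for (std::size_t i = 0; i < tests.size(); ++i) {
        assert(tests[i] && "every test case needs a body");
        const pid_t pid = fork();
        if (pid == 0) {
            // Child: run exactly one test and report through the exit status.
            bool ok = false;
            try {
                ok = tests[i]();
            } catch (...) {
                ok = false;
            }
            // _exit avoids running atexit handlers or flushing stdio buffers
            // that were inherited from the parent.
            _exit(ok ? 0 : 1);
        }
        children[i] = pid;  // stays -1 if fork failed
    }

    std::vector<bool> passed(tests.size(), false);
    for (std::size_t i = 0; i < tests.size(); ++i) {
        if (children[i] < 0) continue;  // fork failed: report as failed
        int status = 0;
        if (waitpid(children[i], &status, 0) == children[i]) {
            passed[i] = WIFEXITED(status) && WEXITSTATUS(status) == 0;
        }
    }
    return passed;
}
```

The exit status is the only thing passed back here, which sidesteps the serialization issue mentioned above; a runner that also wants logs or timing data from the children would need pipes or files for that.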
Hi!
On Fri, Jun 10, 2011 at 5:13 PM, Gennadiy Rozental wrote:
Ovanes Markarian writes: All in all it sounds like a huge amount of work ;)
1. Frankly, I was thinking more in terms of running test cases in multiple processes, either on a local host or in a cloud. Running in multiple threads has some small advantages (like no need to serialize extra test case data, and some existing infrastructure), but IMO it breaks the most important restriction of unit testing: all test units need to be run independently (unless your classes are specially prepared for running in an MT environment). Running in a multi-process scenario is much closer to running test cases serially. In addition, we do not need to worry about synchronizing access to Boost.Test internals.
I might agree, but it heavily depends on the target platform of the test framework. It really is cleaner to have a process-based context for TC execution, but on many embedded platforms you simply do not have processes, just a single image (which includes your app, the operating system, etc.). Usually such an image is started by the bootloader and executes multiple tasks. Tasks can be seen in this context as thread equivalents (each task owns a separate stack, but they all run within the same address space). Therefore, it is still beneficial to have a test app that can use the multi-core ability of the embedded target to execute much faster, because in some complex systems (e.g. telecoms) there might be thousands of TCs. Running them in parallel on a multi-core processor might greatly reduce CI times. On the other hand, it might be simply impossible or very difficult to produce and run per-core images (instead of per-processor ones). I see these major embedded systems as a C++ niche. Therefore there is a need for some easily portable task layer for parallel TC execution from within one process (or, let's say, execution image).
2. Managing dependencies should not be a problem IMO. Boost.Test has a rather simple model, and running test units in parallel should not complicate it much.
But how do you see the dependencies between the TCs? Does it make sense to run a dependent TC if the main TC has already failed? There must be some partitioning step that splits the TCs and considers their dependency graphs. What happens now if e.g. there is a cyclic dependency?
3. I do not see why we need to parallelize at the test suite level. IMO it should be at the test case level.
Isn't it so that TC and TestSuite have a common base class, which is additionally responsible for dependency management? If so, this class should be used as the parallelization unit ;)
4. I do not see a problem with a dependency on either Boost.Thread or Intel TBB (or anything else for that matter). In my design the distribution logic resides in a standalone runner application, and we can have a number of different runners with different dependencies. Only some generic "distribution support" logic will reside in the main library.
Actually, when I wrote my previous post I was thinking about that, but then it is some kind of spin-off, which in the case of a dependency on TBB will require an additional lib to be installed and will not build with the whole of Boost?
And it is indeed a non-trivial piece of work. So if there are any volunteers... I'll be happy to help with the general design and some core infrastructure changes.
We could try that ;) Sounds really interesting to me.
Gennadiy
Regards, Ovanes
On Jun 10, 2011, at 1:19 PM, Ovanes Markarian wrote:
On Fri, Jun 10, 2011 at 5:13 PM, Gennadiy Rozental wrote:
2. Managing dependencies should not be a problem IMO. Boost.Test has a rather simple model, and running test units in parallel should not complicate it much.
But how do you see the dependencies between the TCs? Does it make sense to run a dependent TC if the main TC has already failed? There must be some partitioning step that splits the TCs and considers their dependency graphs. What happens now if e.g. there is a cyclic dependency?
To me this resembles a data-flow problem. You can parallelize wherever it fans out, join() when a given test depends on multiple antecedents (if you can express such a thing in Boost.Test). The data item being passed is the bool expressing whether all previous tests along that path have succeeded; with multiple antecedents, you && the path results. A test is run only if all its inputs are true. Failure of a test passes false to its successors. There's even a (Vault?) data-flow library... ;-)
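The data-flow rule described here can be sketched in a few lines. The version below is sequential and uses hypothetical types (`TestNode`, `Outcome`, `run_with_dependencies`) rather than anything from Boost.Test; it walks the tests in dependency order (Kahn's algorithm), ANDs the results of the antecedents, and leaves tests on a dependency cycle unrun instead of deadlocking:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

enum class Outcome { NotRun, Passed, Failed, Skipped };

// Hypothetical test node: a body plus the indices of the tests it depends on.
struct TestNode {
    std::function<bool()> body;
    std::vector<std::size_t> depends_on;
};

// A test runs only if all of its antecedents passed (the && of its inputs);
// otherwise it is skipped, and the skip is passed on to its successors as a
// false input. Tests that sit on a dependency cycle never become ready and
// keep the outcome NotRun.
std::vector<Outcome> run_with_dependencies(const std::vector<TestNode>& tests) {
    const std::size_t n = tests.size();
    std::vector<std::vector<std::size_t>> successors(n);
    std::vector<std::size_t> pending(n, 0);  // antecedents not yet resolved
    std::vector<bool> inputs_ok(n, true);    // && of antecedent results so far

    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t dep : tests[i].depends_on) {
            assert(dep < n);
            successors[dep].push_back(i);
            ++pending[i];
        }
    }

    // Everything in this queue at the same time is mutually independent,
    // so this is the point where a parallel runner could fan out.
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < n; ++i) {
        if (pending[i] == 0) ready.push(i);
    }

    std::vector<Outcome> outcome(n, Outcome::NotRun);
    while (!ready.empty()) {
        const std::size_t i = ready.front();
        ready.pop();

        bool ok = false;
        if (inputs_ok[i]) {
            ok = tests[i].body();
            outcome[i] = ok ? Outcome::Passed : Outcome::Failed;
        } else {
            outcome[i] = Outcome::Skipped;
        }

        for (std::size_t s : successors[i]) {
            inputs_ok[s] = inputs_ok[s] && ok;
            if (--pending[s] == 0) ready.push(s);
        }
    }
    return outcome;
}
```

Any test still marked `NotRun` after the call is part of, or downstream of, a cyclic dependency, which gives a cheap way to report such cycles to the user.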
participants (5)
- Gennadiy Rozental
- Nat Goodspeed
- Ovanes Markarian
- Ted Byers
- Thorsten Ottosen