It would be interesting to see this number when giving 1..N cores to the scheduler.
Things like contention caused by work stealing, or NUMA effects such as when you start stealing across NUMA domains, usually overshadow the memory allocation costs. Additionally, the quality of the scheduler implementation has a large effect on the results.
You might want to compare the performance of your library with other existing solutions (for instance TBB, qthreads, openmp, HPX). The link I provided above will give you a set of trivial tests for those. Moreover, we'd be happy to add an equivalent test for your library to our repository.
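To make the 1..N measurement concrete, here is a minimal sketch of such a sweep. It uses plain std::thread workers as a stand-in for a real scheduler, so it measures thread start-up plus contention on one shared counter, not Boost.Fiber or any of the libraries named above. The names ns_per_task and sweep are hypothetical and exist only for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: execute `tasks` empty tasks on `workers` OS threads and
// return the average wall-clock cost per task in nanoseconds.
double ns_per_task(unsigned workers, std::size_t tasks) {
    assert(workers > 0 && tasks > 0);
    std::atomic<std::size_t> next{0};  // shared work counter, a deliberate contention point
    std::atomic<std::size_t> done{0};  // number of tasks actually executed

    auto worker = [&] {
        std::size_t local = 0;
        // Claim task indices until the counter runs past `tasks`.
        while (next.fetch_add(1, std::memory_order_relaxed) < tasks)
            ++local;                   // the task body is empty on purpose
        done.fetch_add(local, std::memory_order_relaxed);
    };

    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < workers; ++i) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
    auto stop = std::chrono::steady_clock::now();

    assert(done.load() == tasks);      // every task was claimed exactly once
    return std::chrono::duration<double, std::nano>(stop - start).count() / tasks;
}

// Hypothetical helper: repeat the measurement for 1..max_workers workers.
std::vector<double> sweep(unsigned max_workers, std::size_t tasks) {
    std::vector<double> result;
    for (unsigned w = 1; w <= max_workers; ++w)
        result.push_back(ns_per_task(w, tasks));
    return result;
}
```

Calling sweep(std::thread::hardware_concurrency(), 1000000) and plotting the result against the worker count gives a per-task cost curve. The absolute numbers only describe the machine they were taken on; the shape of the curve is the part worth comparing across libraries.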
After re-reading, I have the impression that there is a misunderstanding.
I hope not.
Boost.Fiber is a thin wrapper over coroutines (each fiber contains one coroutine); the library schedules and synchronizes fibers (as requested on the developer list in 2013) in one thread. The fibers in this lib are agnostic of threads; I've only added some support so that the classes (mutex, condition_variable) can be used in a multi-threaded context. Combining fibers with threads should be done in another, more sophisticated library (at a higher level).
I believe you can't and shouldn't compare fibers with qthreads, TBB or openmp. I'll write a test measuring the overhead of a fiber running in one thread (as already described above) first.
I beg to disagree. Surely, you run fibers on top of OS-threads (in your case using the coroutines mechanism). However, every fiber is semantically indistinguishable from a std::thread (if implemented properly). It has a dedicated function to execute, it represents a context of execution, you can synchronize it with other fibers, etc. In fact, nothing in the C++ Standard implies that a std::thread has to be implemented using OS (kernel) threads, which is why we decided to name our lightweight tasks 'hpx::thread'; it exposes 100% of the interface mandated for std::thread.
If you run on several cores (OS-threads), you start executing your fibers concurrently. AFAIU, your library is clearly designed for this, otherwise you wouldn't have implemented special, fiber-oriented synchronization primitives or work stealing capabilities.
To clarify, I'm not talking about measuring the performance of (kernel) threads. Rather, I would like you to give us performance data for Boost.Fiber so we can understand what overheads are imposed by using fibers in the first place. Quantitative numbers alone do not mean anything beyond a single machine, so I was suggesting running equivalent performance benchmarks using other, similar libraries, such as TBB, openmp, HPX, etc., as this would give a qualitative picture regardless of the machine the tests are run on. And the libraries I listed clearly implement a semantically equivalent idiom: lightweight parallelism (be it a task in TBB, a fiber in Boost.Fiber, a hpx::thread, or a qthread, etc.).
Hope this clarifies what I had in mind.
Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu