Boost.Fiber mini-review September 4-13
Hi all,

The mini-review of Boost.Fiber by Oliver Kowalke begins today, Friday September 4th, and closes Sunday September 13th.

It was reviewed in January 2014; the verdict at that time was "not in its present form." Since then Oliver has substantially improved documentation, performance, library customization and the underlying implementation, and is bringing the library back for mini-review. The substance of the library API remains the same, which is why a mini-review is appropriate.

The Fiber library now requires a C++14-conforming compiler.

I will monitor reviews and discussion on both the boost-users@ list and the boost@ developers' list. Please include at least "fiber" and "review" in your mail subject, e.g. by replying to this message. (Please reply to only ONE list, however.)

Thank you for your interest and your feedback!

-----------------------------------------------------

About the library:

Boost.Fiber provides a framework for micro-/userland-threads (fibers) scheduled cooperatively. The API contains classes and functions to manage and synchronize fibers similar to Boost.Thread.

Each fiber has its own stack. A fiber can save the current execution state, including all registers and CPU flags, the instruction pointer, and the stack pointer and later restore this state. The idea is to have multiple execution paths running on a single thread using a sort of cooperative scheduling (versus threads, which are preemptively scheduled). The running fiber decides explicitly when it should yield to allow another fiber to run (context switching).

Boost.Fiber internally uses execution_context from Boost.Context; the classes in this library manage, schedule and, when needed, synchronize those contexts. A context switch between threads usually costs thousands of CPU cycles on x86, compared to a fiber switch with a few hundred cycles. A fiber can only run on a single thread at any point in time.

docs: http://olk.github.io/libs/fiber/doc/html/index.html
git: https://github.com/olk/boost-fiber

---------------------------------------------------

Please always state in your review whether you think the library should be accepted as a Boost library!

Additionally please consider giving feedback on the following general topics:

- What is your evaluation of the design?
- What is your evaluation of the implementation?
- What is your evaluation of the documentation?
- What is your evaluation of the potential usefulness of the library?
- Did you try to use the library? With what compiler? Did you have any problems?
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
- Are you knowledgeable about the problem domain?

Nat Goodspeed
Boost.Fiber Review Manager
________________________________
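The cooperative scheduling described in the announcement can be illustrated without any fiber library. The following is a minimal sketch only, not Boost.Fiber code: `Task`, `run_round_robin` and `make_logger` are hypothetical names, and a "yield" is modelled by a callable returning to the loop rather than by a real stack switch. It shows the one property that matters here: a task is never preempted and gives up control only when it chooses to.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a fiber: each call runs one step and
// returns true if the task wants to be resumed later (it "yields"),
// or false once it has finished.
using Task = std::function<bool()>;

// Run all tasks round-robin on the calling thread until every task
// has finished. A task is never interrupted; it hands control back
// only by returning from its step.
void run_round_robin(std::deque<Task> ready) {
    while (!ready.empty()) {
        Task task = std::move(ready.front());
        ready.pop_front();
        if (task()) {
            ready.push_back(std::move(task));  // yielded: go to the back
        }
    }
}

// Build a task that appends its name to `log` once per step and
// finishes after `steps` steps.
Task make_logger(std::string name, int steps, std::vector<std::string>& log) {
    return [name = std::move(name), steps, &log]() mutable {
        log.push_back(name);
        return --steps > 0;
    };
}
```

Queuing a two-step task "a" and a one-step task "b" and calling `run_round_robin` interleaves them as a, b, a. A real fiber additionally keeps its own stack, so it can yield from any call depth; this sketch deliberately leaves that out.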
On Fri, Sep 4, 2015 at 11:14 AM, Nat Goodspeed
I will monitor reviews and discussion on both the boost-users@ list and the boost@ developers' list.
I will also monitor reviews posted to the Boost Library Incubator: http://rrsd.com/blincubator.com/bi_library/fiber/?gform_post_id=859
On 04/09/15 17:14, Nat Goodspeed wrote:
Hi all,
The mini-review of Boost.Fiber by Oliver Kowalke begins today, Friday September 4th, and closes Sunday September 13th. It was reviewed in January 2014; the verdict at that time was "not in its present form." Since then Oliver has substantially improved documentation, performance, library customization and the underlying implementation, and is bringing the library back for mini-review.
Hi Nat, Oliver. Could you please remind us what "not in its present form" meant as a result of the review, and what has been done to overcome these issues? Best, Vicente
On Fri, Sep 4, 2015 at 2:07 PM, Vicente J. Botet Escriba
Please could you recall us what "not in the present form" meant as a result of the review and what has been done to overcome these issues?
http://lists.boost.org/boost-announce/2014/01/0393.php I have not yet tried to address those point by point.
On 04/09/15 20:37, Nat Goodspeed wrote:
On Fri, Sep 4, 2015 at 2:07 PM, Vicente J. Botet Escriba
wrote: Please could you recall us what "not in the present form" meant as a result of the review and what has been done to overcome these issues? http://lists.boost.org/boost-announce/2014/01/0393.php
I have not yet tried to address those point by point.
I don't understand, then, why we are doing the mini-review now, before you have checked that an attempt has been made to address each point.
Oliver, is there something in the documentation that refers to each one of the points that must be covered? Best, Vicente
2015-09-04 23:08 GMT+02:00 Vicente J. Botet Escriba < vicente.botet@wanadoo.fr>: Oliver, is there something in the documentation that refers to each one of
the points that must be covered?
Do you mean the issues from the last review? I did not add that list to the boost.fiber documentation.
On 05/09/15 04:26, Oliver Kowalke wrote:
2015-09-04 23:08 GMT+02:00 Vicente J. Botet Escriba < vicente.botet@wanadoo.fr>:
Oliver, is there something in the documentation that refers to each one of
the points that must be covered?
you refer to the issues of the last review? I did not add the list to the boost.fiber documentation
It would be good if you could take the points in the review summary and reply in this thread to each one of them. Otherwise each of us needs to do this check ourselves and ask you whenever the answer is not evident. Best, Vicente
On Fri, Sep 4, 2015 at 5:08 PM, Vicente J. Botet Escriba
On 04/09/15 20:37, Nat Goodspeed wrote:
On Fri, Sep 4, 2015 at 2:07 PM, Vicente J. Botet Escriba
wrote:
Please could you recall us what "not in the present form" meant as a result of the review and what has been done to overcome these issues?
http://lists.boost.org/boost-announce/2014/01/0393.php
I have not yet tried to address those point by point.
I don't understand then why are we doing the mini review now, before you check that any point has at least tried to be addressed.
Sorry. How about these points:

Performance: Oliver has not only worked to improve performance, he has included and documented performance tests you can run on your own hardware.

Documentation: The documentation now contains several new sections explaining how to use the library for interesting/common use cases. New examples are presented and documented.

API: The API has been aligned more closely with std::thread. C++14 is not only supported but required. Move-only callables are supported. Variadic parameters are supported. std::chrono is more generically supported. Channels now support value_pop(). fiber_group has been dropped. Migrating fibers between threads has been dropped.

That said, of course, it is up to each reviewer to state for him- or herself whether s/he believes that the Fiber library should become part of Boost. In particular, regardless of what Oliver or I might synopsize, it is up to each previous reviewer to decide whether his January 2014 objections have been addressed.
On 05.09.2015 at 4:27 PM, "Nat Goodspeed" wrote:
API: The API has been aligned more closely with std::thread. C++14 is not only supported but required. Move-only callables are supported. Variadic parameters are supported. std::chrono is more generically supported. Channels now support value_pop(). fiber_group has been dropped. Migrating fibers between threads has been dropped.
Oliver, do you mind providing the rationale for why fiber migration has been dropped or point me to the section in the docs discussing this?
2015-09-05 16:36 GMT+02:00 Thomas Heller
Oliver, do you mind providing the rationale for why fiber migration has been dropped or point me to the section in the docs discussing this?
Of course. The documentation has a short description in the rationale section (http://olk.github.io/libs/fiber/doc/html/fiber/rationale.html). I refer to Giovanni's more detailed explanation ( http://www.crystalclearsoftware.com/soc/coroutine/coroutine/coroutine_thread... ).

Because of compiler optimizations, the address of a TLS variable might be pre-calculated, the access to TLS might be moved out of a loop, etc. Even if boost.fiber itself did not use TLS (keyword thread_local), migrating fibers would mean that the use of TLS in user code (executed by a fiber) could not be permitted. This is an unacceptable restriction for users.

I suggest a pattern like the one boost.job proposes: a worker thread is pinned to each logical processor and runs a pool of fibers (boost.job uses some aspects of NUMA architectures). The jobs can be synchronized by classes from boost.fiber.
On 09/05/2015 04:50 PM, Oliver Kowalke wrote:
2015-09-05 16:36 GMT+02:00 Thomas Heller
Oliver, do you mind providing the rationale for why fiber migration has been dropped or point me to the section in the docs discussing this?
of course, the documentation has a short description in the rational section (http://olk.github.io/libs/fiber/doc/html/fiber/rationale.html). I refer to Giovanni's more detailed explanation (http://www.crystalclearsoftware.com/soc/coroutine/coroutine/coroutine_thread...).
because of compiler optimization the address of TLS might be pre-calculated, the access to TLS might be moved out of a loop, etc. even if boost.fiber would not use TLS (keyword thread_local), the usage of TLS in the user-code (executed by a fiber) is not permitted - this is an unacceptable restriction for users.
I think this is not a good argument against fiber migration. While this whole TLS issue is certainly something that can lead to very subtle and hard-to-find bugs, it's something that user-level threads currently just can't deal with properly. As much as they look like std::thread, this is one of their limitations, and usage of TLS should be highly discouraged. Consider code like this:

    #include <iostream>
    #include <boost/fiber/all.hpp>

    void f() {
        static thread_local int i = 0;  // one instance per OS thread, not per fiber
        std::cout << i++ << '\n';
    }

    int main() {
        boost::fibers::fiber f1(f);
        boost::fibers::fiber f2(f);
        f1.join();
        f2.join();
    }

This will print:
0
1
instead of what might be expected:
0
0
I suggest a pattern like boost.job proposes - on a logical processors a worker-thread is pinned, running a pool of fibers (boost. job uses some aspects of NUMA architectures). The jobs can be synchronized by classes from boost.fiber.
I think NUMA awareness is something that is out of scope of Boost.Fiber and should therefore not be used as an argument against fiber migration, especially since not all algorithms are memory bound. For compute-bound problems, fiber migration across NUMA domain boundaries is not a big deal. For memory-bound problems, stealing inside of a NUMA domain can be beneficial. All in all, it is not something the fiber library should decide, but rather its user, through the use of appropriate schedulers/executors.
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
2015-09-05 20:22 GMT+02:00 Thomas Heller
I think this is not a good argument against thread migration. While this whole TLS issue is certainly something that can lead to very subtile and hard find bugs, it's something that user level threads currently just can't deal with properly. As much as they look like std::thread, this is one of their limitations and usage of TLS should be highly discouraged.
Consider code like this: void f() { static thread_local int i = 0; std::cout << i++ << '\n'; }
boost::fiber f1(f); boost::fiber f2(f);
This will print: 0 1
instead of what might be expected: 0 0
I would not expect that, and I don't see how your test code proves that fibers (which do not migrate) should not use TLS. In your example you could also remove thread_local and you would still get the first result.
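Both observations in this exchange follow from thread_local meaning "one instance per OS thread". A minimal sketch using only std::thread (no fiber library involved; `next_value` and `next_value_on_new_thread` are hypothetical names introduced for illustration):

```cpp
#include <thread>

// Returns the current value of a per-thread counter, then increments it.
int next_value() {
    static thread_local int i = 0;  // one instance per OS thread
    return i++;
}

// Calls next_value() on a freshly started OS thread and returns the result.
int next_value_on_new_thread() {
    int result = -1;
    std::thread worker([&result] { result = next_value(); });
    worker.join();
    return result;
}
```

Two consecutive calls to `next_value()` on the same thread see consecutive values, which is the situation of two fibers sharing one thread. Each call to `next_value_on_new_thread()` starts from a fresh instance and returns 0, which is what a reader who thinks of fibers as threads might have expected.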
I suggest a pattern like boost.job proposes - on a logical processors a worker-thread is pinned, running a pool of fibers (boost. job uses some aspects of NUMA architectures). The jobs can be synchronized by classes from boost.fiber.
I think NUMA awareness is something that is out of scope of Boost.Fiber and should therefor not be used as an argument against thread migration.
You misunderstood my previous post. boost.fiber is completely unaware of NUMA; as I wrote, NUMA utilization is part of boost.job (which uses boost.fiber).
Especially since not all algorithms are memory bound. For compute bound problems, fiber migration across NUMA domain boundaries is not a big deal.
yes maybe
For memory bound problems, stealing inside of a NUMA domain can be benefitial.
agreed
All in all, it is not something the fiber library should decide but rather its user through the use of appropriate schedulers/executors.
as I already mentioned before
On 09/05/2015 08:22 PM, Thomas Heller wrote:
I almost forgot to mention: even without migratable fibers, users of Boost.Fiber cannot be guaranteed that all fibers are executed within the same NUMA domain, as the OS is free to migrate its processes/threads however it wants. This can only be countered with third-party tools like numactl, libnuma, hwloc or likwid-pin.
2015-09-05 20:25 GMT+02:00 Thomas Heller
I almost forgot to mention: Even without migratable fibers, users of Boost.Fiber can not be guarantueed that all fibers are executed within the same NUMA domain as the OS is free to migrate its processes/threads however it wants.
1. boost.fiber does not know anything about NUMA; this is done in boost.job (I mentioned this lib as an example of how boost.fiber could be used to execute jobs and distribute those jobs on processors).
2. You could pin an OS thread to a processor.
This can only be encountered with third party tools like numactl, libnuma, hwloc or likwid-pin.
Or you use the OS-specific syscalls, as boost.job does, so no third-party tools are required.
I almost forgot to mention: Even without migratable fibers, users of Boost.Fiber can not be guarantueed that all fibers are executed within the same NUMA domain as the OS is free to migrate its processes/threads however it wants.
1. boost.fiber does not know anything about NUMA - this is done in boost.job (I mentioned this lib as an example how boost.fiber could be used to execute jobs and distribute those jobs on processors)
You use the NUMA awareness argument as a rationale for not supporting moving Fiber, however.
2. you could pin an os thread to a processor
Sure. But that's outside of the scope of Boost.Fiber as well. Thus inhibiting moving Fibers by the library is restricting things for libraries/code built on top of it. Regards Hartmut --------------- http://boost-spirit.com http://stellar.cct.lsu.edu
2015-09-05 20:36 GMT+02:00 Hartmut Kaiser
You use the NUMA awareness argument as a rationale for not supporting moving Fiber, however.
no - I mentioned that the TLS problem prevents migrating fibers.
2. you could pin an os thread to a processor
Sure. But that's outside of the scope of Boost.Fiber as well.
as I already explained
Thus inhibiting moving Fibers by the library is restricting things for libraries/code built on top of it.
yes
You use the NUMA awareness argument as a rationale for not supporting moving Fiber, however.
no - I mentioned that the TLS problem prevents migrating fibers.
I was referring to this: http://olk.github.io/libs/fiber/doc/html/fiber/rationale.html#fiber.rational... Regards Hartmut --------------- http://boost-spirit.com http://stellar.cct.lsu.edu
On Sat, Sep 5, 2015 at 10:36 AM, Thomas Heller
Oliver, do you mind providing the rationale for why fiber migration has been dropped or point me to the section in the docs discussing this?
http://olk.github.io/libs/fiber/doc/html/fiber/rationale.html#fiber.rational...
Oliver, do you mind providing the rationale for why fiber migration has been dropped or point me to the section in the docs discussing this?
http://olk.github.io/libs/fiber/doc/html/fiber/rationale.html#fiber.rationale.migrating_fibers_between_threads
I'm quoting from the link you gave:

<quote>
Support for migrating fibers between threads was removed. Especially in the case of NUMA-architectures, it is not always advisable to migrate data between threads. Suppose fiber f is running on logical CPU cpu0 which belongs to NUMA node node0. The data of f are allocated on the physical memory located at node0. Migrating the fiber from cpu0 to another logical CPU cpuX which is part of a different NUMA node nodeX will reduce the performance of the application because of increased latency of memory access.

A more important aspect is the problem with thread-local-storage (TLS). Instead of recomputing the address of a TLS variable, a compiler might, as an optimization, cache its previously-computed address in various function stack frames.[2] If a fiber was running on thread t0 and then migrated to thread t1, the cached TLS variable address(es) would continue pointing to the TLS for thread t0. Bad things would ensue.
</quote>

Hmmm, this is not what I would have expected, frankly. One of the main advantages of Fibers is their low creation/context switching/termination overhead. I would have expected Fibers to be freely movable between kernel-threads (work-stealing!), even more so as you expose an interface conforming to std::thread... If Fibers look like threads, they should behave accordingly.

I think that it is over-constraining to inhibit moving Fibers between kernel-threads just to maintain NUMA awareness (not all architectures have NUMA domains in the first place). The decision whether to move a Fiber or not should be entirely left to a scheduler/executor (which could confine a Fiber to a NUMA domain, for instance, if needed).

Also, it should be made very clear that the use of Fibers has certain implications on using TLS (in short: don't use TLS, but FLS instead). Thus the question of whether compilers cache the TLS address or not is mostly irrelevant.

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu
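The TLS hazard in the quoted rationale can be made concrete by caching the address by hand, which is an assumption standing in for what an optimizer may do implicitly. This is a minimal sketch with std::thread only; `tls_slot`, `tls_address` and `cached_address_valid_on_other_thread` are hypothetical names, and the second thread plays the role of "the thread a fiber was migrated to".

```cpp
#include <thread>

thread_local int tls_slot = 0;  // one instance per OS thread

// Address of the calling thread's own instance of tls_slot.
int* tls_address() { return &tls_slot; }

// Caches the TLS address on the calling thread, then checks from a second
// thread whether that cached address is the second thread's own instance.
bool cached_address_valid_on_other_thread() {
    int* cached = tls_address();  // what an optimizer might keep around
    bool same = true;
    std::thread other([&] { same = (cached == tls_address()); });
    other.join();
    return same;
}
```

The function returns false: while both threads are alive, each has a distinct instance, so a pointer computed before a migration would still refer to the original thread's storage afterwards.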
2015-09-05 17:31 GMT+02:00 Hartmut Kaiser
Hmmm, this is not what I would have expected, frankly. One of the main advantages of Fibers are their low creation/context switching/termination overhead. I would have expected for Fibers to be freely movable between kernel-threads (work-stealing!), even more so as you expose an interface conforming std::thread... If Fibers look like threads, they should behave accordingly.
But this is what C++ (incl. C++14) gives us: compilers are free to optimize as they see fit. I hope you agree that this is not a restriction coming from boost.fiber.
I think that it is too over-constraining to inhibit moving Fibers between kernel-threads just to maintain NUMA awareness (not all architectures have those in the first place). The decision whether to move a Fiber or not should be entirely left to a scheduler/executor (which could confine a Fiber to a NUMA domain, for instance, if needed).
I agree but the compiler optimizations will break the application! Even if it looks like correct code!
Also, it should be made very clear that the use of Fibers has certain implications on using TLS (in short - don't use TLS, but FLS instead)
User code executed by fibers might contain TLS. Especially if third-party libraries are used, you might not be aware whether they use TLS or not. A company's legacy code is not likely to be changed.
Thus the fact whether compilers cache the TLS or not is mostly irrelevant.
I strongly disagree!
Hmmm, this is not what I would have expected, frankly. One of the main advantages of Fibers are their low creation/context switching/termination overhead. I would have expected for Fibers to be freely movable between kernel-threads (work-stealing!), even more so as you expose an interface conforming std::thread... If Fibers look like threads, they should behave accordingly.
but this is what C++ (incl. C++14) gives to us- compilers are free to optimize as they deserve. I hope you agree that is not a restriction coming from boost.fiber
The restriction that fibers can't move is imposed by the library. The question is whether the user code relies on TLS or not.
I think that it is too over-constraining to inhibit moving Fibers between kernel-threads just to maintain NUMA awareness (not all architectures have those in the first place). The decision whether to move a Fiber or not should be entirely left to a scheduler/executor (which could confine a Fiber to a NUMA domain, for instance, if needed).
I agree but the compiler optimizations will break the application! Even if it looks like correct code!
Only if the user does not know what he/she is doing. If all my Fiber is executing is a simple self-contained operation which does not rely on any contextual data (which in my experience is a large part of code which was written with parallelism/concurrency in mind), I usually don't care which core is used to run this code on (sans NUMA placement issues, which is a completely orthogonal can of worms).
Also, it should be made very clear that the use of Fibers has certain implications on using TLS (in short - don't use TLS, but FLS instead)
user cod, executed by fibers, might contain TLS. especially if thrid-party libraries is used you might not aware if it uses TLS or not. companies legacy code is not likely changed
Thus the fact whether compilers cache the TLS or not is mostly irrelevant.
I strongly disagree!
All I can say is that allowing to move Fibers between kernel-threads works very well for us (HPX) and is one of the enabling functionalities for very high resource utilization in highly parallel applications. As I said, work-stealing (i.e. moving Fibers to different cores - and as in HPX every core has its own dedicated thread - this means moving them to different threads) ensures that cores which run out of work can be kept busy. In certain contexts this is even beneficial if you steal work across NUMA domain boundaries. All of this certainly assumes that the user does not rely on TLS - and I agree this puts some restrictions on what can be done. As a Boost.Fibers library however, the decision whether to allow moving Fibers or not should not only be left to a separate scheduler/executor (as mentioned before), but in the first place this decision should be left to the user of the library. If you know it's safe to move Fibers as no TLS is involved, why not allow doing so? Regards Hartmut --------------- http://boost-spirit.com http://stellar.cct.lsu.edu
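The work-stealing idea mentioned here can be sketched with plain callables. This is not HPX or Boost.Fiber code: `WorkQueue` and `next_task` are hypothetical names, and the stolen items are tasks that have not started yet. Stealing a fiber that is suspended in the middle of its execution is the case where the TLS concerns raised in this thread apply.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

using Task = std::function<void()>;

// Hypothetical per-worker ready queue, protected by a mutex.
class WorkQueue {
public:
    void push(Task t) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push_back(std::move(t));
    }
    // The owning worker takes work from the front.
    bool pop(Task& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.empty()) return false;
        out = std::move(tasks_.front());
        tasks_.pop_front();
        return true;
    }
    // An idle worker steals from the back.
    bool steal(Task& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.empty()) return false;
        out = std::move(tasks_.back());
        tasks_.pop_back();
        return true;
    }
private:
    std::mutex mutex_;
    std::deque<Task> tasks_;
};

// Next task for worker `self`: local queue first, then the other queues.
bool next_task(std::vector<WorkQueue>& queues, std::size_t self, Task& out) {
    if (queues[self].pop(out)) return true;
    for (std::size_t i = 0; i < queues.size(); ++i) {
        if (i != self && queues[i].steal(out)) return true;
    }
    return false;
}
```

Each worker thread would call `next_task` with its own index in a loop. Confining stealing to queues in the same NUMA domain, as suggested above, would only change which indices the loop in `next_task` visits.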
2015-09-05 18:18 GMT+02:00 Hartmut Kaiser
Only if the user does not know what he/she is doing. If all my Fiber is executing is a simple self-contained operation which does not rely on any contextual data (which in my experience is a large part of code which was written with parallelism/concurrency in mind),
Hmm, my experience is that sometimes you are forced to use code/libraries from third parties that you cannot control/inspect (closed source), legacy code, or a large code base.
All I can say is that allowing to move Fibers between kernel-threads works very well for us (HPX) and is one of the enabling functionalities for very high resource utilization in highly parallel applications.
so HPX does not permit the usage of TLS in user-code (which might be migrated to other threads)?
All of this certainly assumes that the user does not rely on TLS - and I agree this puts some restrictions on what can be done.
OK, so the question is which strategy boost.fiber should support - restricting the use of TLS or not.
As a Boost.Fibers library however, the decision whether to allow moving Fibers or not should not only be left to a separate scheduler/executor (as mentioned before),
wouldn't that be an alternative and reasonable strategy?
but in the first place this decision should be left to the user of the library. If you know it's safe to move Fibers as no TLS is involved, why not allow doing so?
boost.fiber needs to access the fiber_manager (for instance to suspend a fiber waiting on a mutex, joining another fiber ...), which implies storing the fiber_manager in TLS. Otherwise boost.fiber could not support something like boost::this_fiber::yield(). OK, an alternative would be a global container (not in TLS) mapping each thread to its fiber_manager, but such a lookup table would decrease performance. How does HPX solve this issue? Does HPX provide this_fiber::yield() etc.?

Apart from the TLS problem related to fiber_manager, boost.fiber might support migrating fibers. You can customize the scheduling algorithm, so you could write your own scheduler which has access to the instances running in the other threads. If a custom scheduler encounters an empty local fiber queue, it might iterate over the other scheduler instances and steal fibers.
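The two lookup strategies weighed in this message can be put side by side. This is a minimal sketch under stated assumptions, not the library's implementation: `Manager` is a hypothetical stand-in for fiber_manager, and `manager_via_tls` and `manager_via_table` are illustrative names.

```cpp
#include <map>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the per-thread scheduler state.
struct Manager {
    int ready_count = 0;
};

// Option 1: one instance per thread, reached without any locking.
Manager& manager_via_tls() {
    static thread_local Manager instance;
    return instance;
}

// Option 2: a global table keyed by thread id. Every lookup takes a lock
// and searches the map, which is the overhead mentioned above.
Manager& manager_via_table() {
    static std::mutex mutex;
    static std::map<std::thread::id, Manager> table;
    std::lock_guard<std::mutex> lock(mutex);
    return table[std::this_thread::get_id()];  // std::map references stay valid
}
```

Both functions return the same object on repeated calls from one thread and a different object on another thread. The table variant always re-evaluates the thread id at the call site, so it has no address that could be cached across a migration, at the price of a lock and a map search on each access.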
On 09/05/2015 06:44 PM, Oliver Kowalke wrote:
2015-09-05 18:18 GMT+02:00 Hartmut Kaiser
Only if the user does not know what he/she is doing. If all my Fiber is executing is a simple self-contained operation which does not rely on any contextual data (which in my experience is a large part of code which was written with parallelism/concurrency in mind),
hmm, my experience is that some times you are forced to use code/libraries from third parties you can not control/inspect (closed source), legacy code or large code base
All I can say is that allowing to move Fibers between kernel-threads works very well for us (HPX) and is one of the enabling functionalities for very high resource utilization in highly parallel applications.
so HPX does not permit the usage of TLS in user-code (which might be migrated to other threads)?
Yes
All of this certainly assumes that the user does not rely on TLS - and I agree this puts some restrictions on what can be done.
OK, so the question is which strategy boost.fiber should support - restricting the use of TLS or not.
I think Boost.Fiber should strongly discourage its users from using TLS; more on that in my other answer.
As a Boost.Fibers library however, the decision whether to allow moving Fibers or not should not only be left to a separate scheduler/executor (as mentioned before),
wouldn't that be an alternative and reasonable strategy?
Absolutely
but in the first place this decision should be left to the user of the library. If you know it's safe to move Fibers as no TLS is involved, why not allow doing so?
boost.fiber needs to access the fiber_manager (for instance to suspend a fiber waiting on a mutex, joining another fiber ...) - this implies to store the fiber_manager in TLS. otherwise boost.fiber could not support something like boost::this_fiber::yield(). OK, an alternative would be a global container (no in TLS), mapping thread and its fiber_manager - but such a lookup table would degrease performance. How does HPX solve this issue? Does HPX provide this_fiber::yield() etc?
Yes, we have a slightly different approach. All HPX Fibers are called from within a scheduling loop. A yield is therefore just a context switch back to the scheduling loop, no lookup required.
2015-09-05 20:03 GMT+02:00 Thomas Heller
Absolutely
It seems to me that it makes no sense for boost.fiber to try to re-implement HPX's strategy. If a user requires migration of fibers, he can use HPX; if the user has to use TLS in his code (fiber-fn) and wants a std::thread-like API (== this_fiber::yield(), mutex, condition_variable ...), he can choose boost.fiber. At the moment I don't see a possibility to eliminate the TLS requirement for fiber_manager.
Yes, we have a slightly different approach. All HPX Fibers are called from within a scheduling loop. A yield is therefore just a context switch back to the scheduling loop, no lookup required.
So HPX's scheduling loop runs in the main context (e.g. main() or thread-fn). In contrast, boost.fiber handles the main fiber (main() or thread-fn) as a fiber too; e.g. the main fiber can be suspended, stored and resumed in the same way as ordinary fibers (ordinary fibers == need to allocate a fiber stack, created via the fiber ctor). Thus the difference between the main fiber and ordinary fibers (created via the fiber ctor) is that the main fiber already has a stack, provided and assigned by the OS.
it seams to me that it makes not sense that boost.fiber tries to re- implement HPX's strategy. if a user requires migration of fibers he can use HPX, if the user has to use TLS in its code (fiber-fn) and wants an std::thread-like API (== this_fiber::yield(), mutex, condition_variable ...) he can choose boost.fiber.
From this perspective there wouldn't be a need for Boost.Fiber in the first place as all of this (yield, mutex, condvar, etc.) is implemented in HPX already.
at the moment I don't see a possibility to eliminate the TLS-requirement for fiber_manager
Too bad, I was hoping we could reuse it in HPX :/
Yes, we have a slightly different approach. All HPX Fibers are called from within a scheduling loop. A yield is therefor just a context switch back to the scheduling loop, no lookup required.
so HPX's scheudling loop runs in the main-context (e.g. main() or thread- fn) - in contrast boost.fiber handles the main-fiber (main() or thread-fn) as a fiber too. e.g. the main-fiber can be suspended, stored, resumed int the same way as ordinary fibers (ordinary fibers == nedd to allocate fiber-stack, created via fiber ctor). thus the difference between main-fiber and ordinary fibers (created via fiber ctor) is that the main-fiber has already a stack (provided and assigned by the os) assigned
HPX has no 'main' context. It requires the user to supply a special hpx_main() function which is used as the application's main() and which is already a fiber.

    #include <hpx/hpx_init.hpp>

    // This is executed as the first fiber
    int hpx_main(int argc, char* argv[])
    {
        // ...your app goes here...

        // signal end of execution to the runtime
        return hpx::finalize();
    }

    int main(int argc, char* argv[])
    {
        // start runtime and wait for it to finish
        return hpx::init(argc, argv);
    }

HTH
Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu
2015-09-05 20:31 GMT+02:00 Hartmut Kaiser
at the moment I don't see a possibility to eliminate the TLS-requirement for fiber_manager
Too bad, I was hoping we could reuse it in HPX :/
since you mentioned that you have already solved this problem, could you give me a hint or a little help?
at the moment I don't see a possibility to eliminate the TLS-requirement for fiber_manager
Too bad, I was hoping we could reuse it in HPX :/
since you mentioned that you have already solved this problem, could you give me a hint or a little help?
We don't have a fiber_manager so we don't need to solve this problem. Regards Hartmut --------------- http://boost-spirit.com http://stellar.cct.lsu.edu
2015-09-05 20:45 GMT+02:00 Hartmut Kaiser
since you mentioned that you have already solved this problem, could you give me a hint or a little help?
We don't have a fiber_manager so we don't need to solve this problem.
but you told us that you use TLS in your fibers and it is completely safe to migrate the fibers to other threads?! how do you achieve that a migrated fiber accesses the correct object, stored in TLS, of the current thread and not the object of the thread the fiber was migrated from?
since you mentioned that you have already solved this problem, could you give me a hint or a little help?
We don't have a fiber_manager so we don't need to solve this problem.
but you told us that you use TLS in your fibers and it is completely safe to migrate the fibers to other threads?! how do you achieve that a migrated fiber accesses the correct object, stored in TLS, of the current thread and not the object of the thread the fiber was migrated from?
I said that in certain contexts it is possible/useful to use TLS even from inside user code running on a fiber. Granted, those use cases are not very common. HPX itself uses TLS to store a) a reference to the current scheduler responsible for the particular kernel-thread, and b) a reference to the currently active (running) fiber object. Both uses are safe as they are maintained by the runtime. Regards Hartmut --------------- http://boost-spirit.com http://stellar.cct.lsu.edu
2015-09-05 21:06 GMT+02:00 Hartmut Kaiser
HPX itself uses TLS to store
a) a reference to the current scheduler responsible for the particular kernel-thread, and b) a reference to the currently active (running) fiber object.
Both uses are safe as they are maintained by the runtime.
what does that mean 'maintained by the runtime'?

    void bar( ptr_scheduler_thread_local_var) {
        printf("%p\n", ptr_scheduler_thread_local_var);
        suspend_and_migrate_fiber();
    }

    void foo() {
        while ( true) {
            bar( return_ptr_of_scheduler_thread_local_var() );
            ...
        }
    }

in this example b) accesses a) (the TLS of the scheduler), prints it to stdout, then suspends the active fiber and migrates it to another thread. after resumption the address of the TLS-scheduler is printed out again ...
what does your code look like after compiler optimization?!
HPX itself uses TLS to store
a) a reference to the current scheduler responsible for the particular kernel-thread, and b) a reference to the currently active (running) fiber object.
Both uses are safe as they are maintained by the runtime.
what does that mean 'maintained by the runtime'?

    void bar( ptr_scheduler_thread_local_var) {
        printf("%p\n", ptr_scheduler_thread_local_var);
        suspend_and_migrate_fiber();
    }

    void foo() {
        while ( true) {
            bar( return_ptr_of_scheduler_thread_local_var() );
            ...
        }
    }

in this example b) accesses a) (the TLS of the scheduler), prints it to stdout, then suspends the active fiber and migrates it to another thread. after resumption the address of the TLS-scheduler is printed out again ...
what does your code look like after compiler optimization?!
a) is safe as all threads sharing the same scheduler store the same reference (fibers can be moved only inside that scheduler) b) is safe as the runtime sets the reference to the fiber it is about to run in TLS and resets it right after the fiber returned. Regards Hartmut --------------- http://boost-spirit.com http://stellar.cct.lsu.edu
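Point b) can be illustrated with a small sketch. This is not HPX's code: `fiber`, `tls_active_fiber`, `active_fiber()` and `active_fiber_guard` are hypothetical names chosen for the illustration, and the "fibers" here are plain callables run to completion rather than real contexts. The sketch only shows the set-before-run, restore-after-return discipline described above.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a runtime's fiber object.
struct fiber {
    int id;
    std::function<void()> body;
};

// One slot per kernel thread; only the runtime writes to it.
thread_local fiber* tls_active_fiber = nullptr;

// Accessor that reads the current *value* of the slot on every call.
fiber* active_fiber() { return tls_active_fiber; }

// Publishes a fiber in the slot and restores the previous value on exit.
class active_fiber_guard {
public:
    explicit active_fiber_guard(fiber* f) : previous_(tls_active_fiber) {
        tls_active_fiber = f;
    }
    ~active_fiber_guard() { tls_active_fiber = previous_; }
    active_fiber_guard(const active_fiber_guard&) = delete;
    active_fiber_guard& operator=(const active_fiber_guard&) = delete;

private:
    fiber* previous_;
};

// Runs one fiber body on the calling thread with the slot set.
void run(fiber& f) {
    active_fiber_guard guard(&f);
    f.body();
}

// Runs three fibers and records which id each body saw as "active".
std::vector<int> observed_ids() {
    std::vector<int> seen;
    auto record = [&seen] { seen.push_back(active_fiber()->id); };
    std::vector<fiber> fibers{{1, record}, {2, record}, {3, record}};
    for (fiber& f : fibers) run(f);
    assert(active_fiber() == nullptr);  // slot restored after the last fiber
    return seen;
}
```

Each body sees its own fiber through `active_fiber()`, and the slot is empty again once the last body has returned. Whether this discipline is sufficient once fibers can be suspended mid-body and resumed on another thread is exactly the question debated in this thread; the sketch does not settle it.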
2015-09-05 21:31 GMT+02:00 Hartmut Kaiser
what does that mean 'maintained by the runtime'?

    void bar( ptr_scheduler_thread_local_var) {
        printf("%p\n", ptr_scheduler_thread_local_var);
        suspend_and_migrate_fiber();
    }

    void foo() {
        while ( true) {
            bar( return_ptr_of_scheduler_thread_local_var() );
            ...
        }
    }

in this example b) accesses a) (the TLS of the scheduler), prints it to stdout, then suspends the active fiber and migrates it to another thread. after resumption the address of the TLS-scheduler is printed out again ...
what does your code look like after compiler optimization?!
a) is safe as all threads sharing the same scheduler store the same reference (fibers can be moved only inside that scheduler) b) is safe as the runtime sets the reference to the fiber it is about to run in TLS and resets it right after the fiber returned.
but I was asking for the code snippet which accesses the TLS variable - if all threads share the same scheduler, why does it have to be stored in TLS? a slight modification of my example above: replace return_ptr_of_scheduler_thread_local_var() by return_ptr_to_active_fiber(). I assume, as you mentioned in b), that the active fiber is stored in TLS - the example code then prints out the address of the currently active fiber. what does the code look like after compiler optimization?
a) is safe as all threads sharing the same scheduler store the same reference (fibers can be moved only inside that scheduler) b) is safe as the runtime sets the reference to the fiber it is about to run in TLS and resets it right after the fiber returned.
but I was asking for the code snippet which accesses the TLS variable - if all threads share the same scheduler, why does it have to be stored in TLS? a slight modification of my example above: replace return_ptr_of_scheduler_thread_local_var() by return_ptr_to_active_fiber(). I assume, as you mentioned in b), that the active fiber is stored in TLS - the example code then prints out the address of the currently active fiber. what does the code look like after compiler optimization?
I don't know. We have never run into any problems related to this. Regards Hartmut --------------- http://boost-spirit.com http://stellar.cct.lsu.edu
2015-09-05 22:04 GMT+02:00 Hartmut Kaiser
a) is safe as all threads sharing the same scheduler store the same reference (fibers can be moved only inside that scheduler) b) is safe as the runtime sets the reference to the fiber it is about to run in TLS and resets it right after the fiber returned.
but I was asking for the code snippet which accesses the TLS variable - if all threads share the same scheduler, why does it have to be stored in TLS? a slight modification of my example above: replace return_ptr_of_scheduler_thread_local_var() by return_ptr_to_active_fiber(). I assume, as you mentioned in b), that the active fiber is stored in TLS - the example code then prints out the address of the currently active fiber. what does the code look like after compiler optimization?
I don't know. We have never run into any problems related to this.
hmm, that's strange because I got those problems when I implemented support for migrating fibers. after optimization the code from the example looks like this:

    void bar( Fiber * f) {
        printf("%p\n", f);
        suspend_and_migrate();
    }

    void foo() {
        Fiber * f = return_ptr_to_active_fiber();
        while( true) {
            bar( f);
        }
    }

the code is translated back from assembler (g++ -S -std=c++11 -O1 test_tls.cpp)
On 09/05/2015 08:03 PM, Oliver Kowalke wrote:
2015-09-05 20:03 GMT+02:00 Thomas Heller
wouldn't that be an alternative and reasonable strategy?
Absolutely
it seems to me that it makes no sense for boost.fiber to try to re-implement HPX's strategy.
I don't think that is what Hartmut or I wanted to suggest. What should be possible though is to implement something like HPX on top of Boost.Fiber.
if a user requires migration of fibers he can use HPX, if the user has to use TLS in its code (fiber-fn) and wants an std::thread-like API (== this_fiber::yield(), mutex, condition_variable ...) he can choose boost.fiber.
Oh, HPX has a fully conformant implementation of something std::thread-like. We also support non stealing schedulers, but that's not important. What I think is important though is that Boost.Fiber shouldn't unnecessarily restrict its usefulness due to some overly restrictive usage requirements.
at the moment I don't see a possibility to eliminate the TLS-requirement for fiber_manager
I currently don't see why this design wouldn't allow to have threads migrated, even if the TLS value is cached, it should still be valid, right? In that case you would have something like a benign "race".
Yes, we have a slightly different approach. All HPX Fibers are called from within a scheduling loop. A yield is therefore just a context switch back to the scheduling loop, no lookup required.
so HPX's scheduling loop runs in the main-context (e.g. main() or thread-fn) - in contrast boost.fiber handles the main-fiber (main() or thread-fn) as a fiber too, i.e. the main-fiber can be suspended, stored and resumed in the same way as ordinary fibers (ordinary fibers == need to allocate a fiber-stack, created via fiber ctor). thus the difference between the main-fiber and ordinary fibers (created via fiber ctor) is that the main-fiber already has a stack (provided and assigned by the OS).
I understand and I certainly see the advantages of that design.
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
2015-09-05 20:33 GMT+02:00 Thomas Heller
I don't think that is what Hartmut or I wanted to suggest. What should be possible though is to implement something like HPX on top of Boost.Fiber.
OK, then it was a misunderstanding
I currently don't see why this design wouldn't allow to have threads migrated
I'm confused - I assume you mean '... have fibers migrated...'?!
, even if the TLS value is cached, it should still be valid, right? In that case you would have something like a benign "race".
- fiber F1 runs in thread T1
- T1 runs fiber_manager FM1
- F1 gets migrated to another thread T2 which runs fiber_manager FM2
- as I figured out, F1, running in T2, still accesses FM1, but it should be FM2

so I'm wondering how HPX gets the TLS issue right
Only if the user does not know what he/she is doing. If all my Fiber is executing is a simple self-contained operation which does not rely on any contextual data (which in my experience is a large part of code which was written with parallelism/concurrency in mind),
hmm, my experience is that sometimes you are forced to use code/libraries from third parties you cannot control/inspect (closed source), legacy code or a large code base
Sure. You can't and shouldn't use fibers in this context anyways.
All I can say is that allowing to move Fibers between kernel-threads works very well for us (HPX) and is one of the enabling functionalities for very high resource utilization in highly parallel applications.
so HPX does not permit the usage of TLS in user-code (which might be migrated to other threads)?
It does not inhibit the use of TLS explicitly, as even in the context of Fibers the use of TLS in user code could be useful (for instance for implementing something like Cilk hyperobjects).
All of this certainly assumes that the user does not rely on TLS - and I agree this puts some restrictions on what can be done.
OK, so the question is which strategy boost.fiber should support - restricting the use of TLS or not.
Leave it to the user of your library to decide whether a particular Fiber should be movable or not.
As a Boost.Fibers library however, the decision whether to allow moving Fibers or not should not only be left to a separate scheduler/executor (as mentioned before),
wouldn't that be an alternative and reasonable strategy?
Sure, that's what I'm saying. Leave it to the user and do not inhibit moving Fibers.
but in the first place this decision should be left to the user of the library. If you know it's safe to move Fibers as no TLS is involved, why not allow doing so?
boost.fiber needs to access the fiber_manager (for instance to suspend a fiber waiting on a mutex, joining another fiber ...) - this implies to store the fiber_manager in TLS. otherwise boost.fiber could not support something like boost::this_fiber::yield().
It's your decision to keep the fiber manager in TLS or not.
OK, an alternative would be a global container (not in TLS), mapping each thread to its fiber_manager - but such a lookup table would decrease performance.
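The global lookup table mentioned above could look roughly like the following sketch. It is an illustration only, not Boost.Fiber code: `fiber_manager` is reduced to a placeholder struct and `manager_registry` is a hypothetical name. The mutex and hash lookup on every `current()` call are the cost the message refers to.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <unordered_map>

// Placeholder for the real per-thread manager (hypothetical).
struct fiber_manager {
    int tag;
};

// Process-wide table mapping each kernel thread to its manager.
class manager_registry {
public:
    // Registers the manager that belongs to the calling thread.
    void attach(fiber_manager* fm) {
        std::lock_guard<std::mutex> lock(mutex_);
        managers_[std::this_thread::get_id()] = fm;
    }

    // Removes the calling thread's entry, if any.
    void detach() {
        std::lock_guard<std::mutex> lock(mutex_);
        managers_.erase(std::this_thread::get_id());
    }

    // Looks up the manager of the thread that is executing *now*.
    // Every call takes the lock and performs a hash lookup.
    fiber_manager* current() {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = managers_.find(std::this_thread::get_id());
        return it == managers_.end() ? nullptr : it->second;
    }

private:
    std::mutex mutex_;
    std::unordered_map<std::thread::id, fiber_manager*> managers_;
};
```

Because the key is obtained through `std::this_thread::get_id()` at the time of the call, the result depends on the thread that is currently executing rather than on an address computed earlier. A production version would probably want a reader/writer lock or a lock-free table to limit the overhead.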
How does HPX solve this issue? Does HPX provide this_fiber::yield() etc?
Yes, we do support this operation. Otherwise there wouldn't be a way to support futures, condition variables, mutexes, etc. In HPX however a yield does not directly switch to a different fiber but gives control back to the scheduler which selects the next fiber to execute.
beside the TLS problem related to fiber_manager, boost.fiber might support migrating fibers.
Good!
you can customize the scheduling algorithm - so you could write your own scheduler which has access to the instances running in the other threads. if a custom scheduler encounters an empty local fiber-queue it might iterate over the other scheduler instances and steal fibers.
That's exactly what HPX does. A scheduler/executor object is responsible for a number of kernel-threads (you can have several of those schedulers/executors). Regards Hartmut --------------- http://boost-spirit.com http://stellar.cct.lsu.edu
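The stealing strategy described in this exchange can be sketched as below. The sketch does not use the Boost.Fiber or HPX scheduler interfaces; `fiber_handle`, `stealing_scheduler`, `set_peers()` and `pick_next()` are names assumed for the illustration, and a "fiber" is just a handle that is moved between queues.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical fiber handle; a real scheduler would hold a context pointer.
struct fiber_handle {
    int id;
};

// One instance per kernel thread, each with its own ready queue.
class stealing_scheduler {
public:
    // Makes a fiber ready on this scheduler.
    void push(fiber_handle* f) {
        std::lock_guard<std::mutex> lock(mutex_);
        ready_.push_back(f);
    }

    // Peers are the scheduler instances running on the other threads.
    void set_peers(std::vector<stealing_scheduler*> peers) {
        peers_ = std::move(peers);
    }

    // Returns the next fiber to run, or nullptr if nothing is ready anywhere.
    fiber_handle* pick_next() {
        if (fiber_handle* f = take(true)) return f;   // local queue first
        for (stealing_scheduler* peer : peers_) {
            assert(peer != this);
            if (fiber_handle* f = peer->take(false)) return f;  // steal
        }
        return nullptr;
    }

private:
    // The owner takes from the front, a thief takes from the back.
    fiber_handle* take(bool from_front) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (ready_.empty()) return nullptr;
        fiber_handle* f = from_front ? ready_.front() : ready_.back();
        if (from_front) ready_.pop_front(); else ready_.pop_back();
        return f;
    }

    std::mutex mutex_;
    std::deque<fiber_handle*> ready_;
    std::vector<stealing_scheduler*> peers_;
};
```

Only one queue lock is held at a time, so two schedulers stealing from each other cannot deadlock. The sketch deliberately ignores the TLS question raised elsewhere in this thread: it shows how a fiber could change queues, not whether the fiber's code is safe to resume on the new thread.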
On 05/09/15 17:31, Hartmut Kaiser wrote:
Oliver, do you mind providing the rationale for why fiber migration has been dropped or point me to the section in the docs discussing this? http://olk.github.io/libs/fiber/doc/html/fiber/rationale.html#fiber.rationale.migrating_fibers_between_threads I'm quoting from the link you gave:
<quote> Support for migrating fibers between threads was removed. Especially in the case of NUMA-architectures, it is not always advisable to migrate data between threads. Suppose fiber f is running on logical CPU cpu0 which belongs to NUMA node node0. The data of f are allocated on the physical memory located at node0. Migrating the fiber from cpu0 to another logical CPU cpuX which is part of a different NUMA node nodeX will reduce the performance of the application because of increased latency of memory access.
A more important aspect is the problem with thread-local-storage (TLS). Instead of recomputing the address of a TLS variable, a compiler might, as an optimization, cache its previously-computed address in various function stack frames.[2] If a fiber was running on thread t0 and then migrated to thread t1, the cached TLS variable address(es) would continue pointing to the TLS for thread t0. Bad things would ensue. </quote>
Hmmm, this is not what I would have expected, frankly. One of the main advantages of Fibers is their low creation/context switching/termination overhead. I would have expected Fibers to be freely movable between kernel-threads (work-stealing!), even more so as you expose an interface conforming to std::thread... If Fibers look like threads, they should behave accordingly.
I think that it is over-constraining to inhibit moving Fibers between kernel-threads just to maintain NUMA awareness (not all architectures have NUMA domains in the first place). The decision whether to move a Fiber or not should be entirely left to a scheduler/executor (which could confine a Fiber to a NUMA domain, for instance, if needed).
Also, it should be made very clear that the use of Fibers has certain implications on using TLS (in short - don't use TLS, but FLS instead). Thus the fact whether compilers cache the TLS or not is mostly irrelevant.
Completely agreed. We need fibers migration when we associate a fiber to a job/work and we have work stealing. Vicente
2015-09-06 2:27 GMT+02:00 Vicente J. Botet Escriba:
Completely agreed. We need fibers migration when we associate a fiber to a job/work and we have work stealing.
I don't know why you ignore the problem of TLS and compiler optimization in the context of work-stealing - this is an issue.
2015-09-05 22:04 GMT+02:00 Hartmut Kaiser
but I was asking for the code snippet which accesses the TLS variable - if all threads share the same scheduler, why does it have to be stored in TLS? a slight modification of my example above: replace return_ptr_of_scheduler_thread_local_var() by return_ptr_to_active_fiber(). I assume, as you mentioned in b), that the active fiber is stored in TLS - the example code then prints out the address of the currently active fiber. what does the code look like after compiler optimization?
I don't know. We have never run into any problems related to this.
I took a short look at the sources of HPX - please correct me if my assumptions are false:
- HPX does not support C++11, so thread_local is not available - instead you implemented thread_specific_ptr
- thread_specific_ptr has a private member - static __thread T * t - so thread_specific_ptr is the TLS container
- coroutine_impl has a private member - static thread_specific_ptr< self_type > self_ which is used to access the 'active fiber/coroutine' via static member function - static self_type * get_self()
- coroutine_impl::operator() executes the function/code passed to coroutine_impl
- the implementation of coroutine::operator() looks like this (in short):

    void operator()() {
        do {
            self_type * old_self = coroutine_impl::get_self();
            self_type self( this, old_self);
            reset_self_on_exit( & self, old_self);
            fun(); // execute function
        } while ( state == running);
    }

If the compiler optimizes the code - self_type * old_self = coroutine_impl::get_self(); - will likely be moved out of the do-loop.

    void operator()() {
        self_type * old_self = coroutine_impl::get_self();
        do {
            self_type self( this, old_self);
            reset_self_on_exit( & self, old_self);
            fun(); // execute function
        } while ( state == running);
    }

This happens probably in other places of HPX too - the question is why you never ran into any problems.
2015-09-05 4:26 GMT+02:00 Oliver Kowalke
2015-09-04 23:08 GMT+02:00 Vicente J. Botet Escriba < vicente.botet@wanadoo.fr>:
Oliver, is there something in the documentation that refers to each one of the points that must be covered?
you refer to the issues of the last review? I did not add the list to the boost.fiber documentation
requested was

provide performance tests:
- create, join, detach, yield fiber
- wait on future
- create on join several fibers
- cost of thread safety (atomics)

improve documentation
- rationale page explaining what's there, what's not and why
- explain distinction between Coroutine library and Fiber library
- a section on how to install and run the tests and examples. The need to embed in a Boost tree is implied but not stated. Mention the need to build the library and link with it.
- explain synchronization between fibers on different threads. Must the code take more care with this than with synchronizing fibers on the same thread?
- clarify that an exception raised by a fiber function calls std::terminate(), as with std::thread, rather than being consumed
- clarify thread-local effect of set_scheduling_algorithm(). There was a request to put this function in a this_thread nested namespace to further clarify
- Move algorithm class documentation to "Extension" or "Customization" section. Clarify that it's not part of the baseline library functionality, but a customization point.
- Document fiber::id.
- Better document promise/future for void and R& (per C++ standard).
- Document thread safety of each support class (or method, if it varies by method).
- Document complexity guarantees per API.
- Document exception safety per API.
- Document supported architectures (perhaps link to Coroutine library's list); state minimum compiler versions.
- Document the library's ASIO support. Link to Coroutine's ASIO yield functionality; ensure that ASIO yield is adequately explained. In particular, distinguish it from this_fiber::yield.
- Better explain (and/or comment) publish-subscribe example, also other existing examples.

In addition to the documentation requests above, there were requests for additional examples:
- Simple example of ASIO callback implementation vs. the same logic using Fiber's ASIO support, a la [4].
- Example of a fiber pool.
- Example of an arbitrary thread B filling a future on which a fiber in thread A is waiting.
- Example of an arbitrary thread B posting to an asio::io_service running fibers in thread A.
- Either defend fibers::condition_variable from spurious wakeups in existing examples, OR document the stronger condition_variable guarantee.
- Example of M:N threading with ASIO. That might involve either one io_service per CPU, with fiber migration; or a single io_service with run() calls from each CPU, grouping fibers for each CPU into strands.
- Example of one thread with many fibers making service requests on a pool of worker threads performing blocking calls.
- Example of using thread_specific_ptr to manage lifespan of user-specified scheduler.
- Example of the owner of a fiber changing the fiber's thread affinity vs. the fiber itself. When would you use each tactic?
- Load example programs into an Examples appendix so that Google searches can turn up library documentation.

API
- Allocating a default scheduler object, rather than specifying a default template param, was praised.
- Three people called out the set difference between Boost.Thread features and Fiber (e.g. future::get_exception_ptr()). One wants these implemented immediately; another says they can be added later; the third simply requests that they be documented, with rationale.
- Two people frowned on introducing operator bool methods not found in std::thread or Boost.Thread.
- C++11 support was mentioned, notably Boost macros such as BOOST_RV_REF and BOOST_EXPLICIT_OPERATOR_BOOL. Also mentioned were: C++11 idioms; C++11 std::thread patterns; move construction; initializer lists; rvalue this overloads; deleting operators.
- The fiber constructor and async() should accept a move-only callable.
- At least for a C++11 compiler, fiber constructor and async() should accept variadic parameters. These should support move-only types, like Boost.Thread.
  C++03 support for variadic parameters would be nice, but is less important.
- Every API involving time point or duration should accept arbitrary clock types, immediately converting to a canonical duration type for internal use.
- Queues should support value_pop() returning item by value. This supports an item type without a default constructor.
- Nested scoped_lock typedef has been deprecated in thread library. Remove in Fiber library.
- Align the return type of shared_future::get() with the standard. In general, ensure that parameter types and return types are aligned with the standard.
- A couple of people were bothered by the use of types in the detail namespace as parameters or return values in the algorithm API. (I note, however, that extending e.g. Boost.Range can involve touching its detail namespace. A customization point for a library may be a bit of a gray area.)
- There was a suggestion to rename algorithm to scheduler. In that case, presumably set_scheduling_algorithm() could be renamed set_scheduler().
- There was a request to rename round_robin_ws to round_robin_work_stealing.
- A couple of people consider the algorithm API too monolithic, pointing to redundancies in the round_robin, round_robin_ws and asio round_robin implementations. They suggested teasing out distinct classes, so that (for instance) a user-coded scheduler might be able to override a single method to respect fiber priority. In fact Eugene Yakubovich offered to experiment with refactoring the algorithm class this way.
- There was a request for set_scheduling_algorithm() to return the previous pointer. (It might be useful for the requester to explain the anticipated use case. An earlier iteration of set_scheduling_algorithm() did return the previous pointer; Oliver intentionally changed that behavior.)
- fiber_group got one thumbs-up and two thumbs-down. Options: retain; improve to use move support rather than fiber*; discard.
  There is an opportunity to improve on thread_group; naturally there is risk in diverging from thread_group.
- Request deferred futures for lazy evaluation.
- There was a suggestion to introduce a global object to coordinate thread-specific fiber schedulers, in the hope that the global object could perform all relevant locking and the thread-specific fiber schedulers could themselves be thread-unsafe.
- There was a request to unify steal_from() and migrate_to() into a single method. I infer that this is predicated on the previous suggestion.
- Request future::then() et al, per [5]. (Someone please clarify the present status of N3784?)
- Request enriched barrier support per [6] and [7]. (Someone please clarify the present status of N3817?)
- There are two fiber properties specific to particular schedulers: thread_affinity (used only by round_robin_ws) and priority (as yet unused by any scheduler). What if a user-coded scheduler requires a fiber property that does not yet exist? Is there a general approach that could subsume the present support for thread_affinity and priority, in fiber and this_fiber? Could the initial values for such properties be passed as part of the fiber constructor's attributes parameter?
- One use case was surfaced that may engage the previous bullet: the desire to associate a given fiber with any of a group of threads, such as the set of threads local to a NUMA domain or physical CPU.

IMPLEMENTATION
- Replace std::auto_ptr with boost::scoped_ptr. The former produces deprecation warnings on GCC.
- Reduce redundancy between try_lock() and lock().
- boost::fibers::asio::detail::yield_handler::operator()() calls algorithm::spawn() before algorithm::run(). Does this allow the scheduler to choose the next fiber to run, e.g. a user-coded scheduler that respects fiber priority?
- Add memory transaction support to spinlock a la [8].
- Intel TSX lock avoidance would be nice.
On 09/06/2015 06:30 AM, Oliver Kowalke wrote:
2015-09-05 22:04 GMT+02:00 Hartmut Kaiser
but I was asking for the code snippet which accesses the TLS variable - if all threads share the same scheduler, why does it have to be stored in TLS? a slight modification of my example above: replace return_ptr_of_scheduler_thread_local_var() by return_ptr_to_active_fiber(). I assume, as you mentioned in b), that the active fiber is stored in TLS - the example code then prints out the address of the currently active fiber. what does the code look like after compiler optimization?
I don't know. We have never run into any problems related to this.
I took a short look at the sources of HPX - please correct me if my assumptions are false:
- HPX does not support C++11, so thread_local is not available - instead you implemented thread_specific_ptr
It does support C++11 and C++14. thread_specific_ptr is an artifact from the past but still perfectly conformant in C++11/14.
- thread_specific_ptr has a private member - static __thread T * t - so thread_specific_ptr is the TLS container
correct.
- coroutine_impl has a private member - static thread_specific_ptr< self_type > self_ which is used to access the 'active fiber/coroutine' via static member function - static self_type * get_self()
correct.
- coroutine_impl::operator() executes the function/code passed to coroutine_impl
- the implementation of coroutine::operator() looks like this (in short):
    void operator()() {
        do {
            self_type * old_self = coroutine_impl::get_self();
            self_type self( this, old_self);
            reset_self_on_exit( & self, old_self);
            fun(); // execute function
        } while ( state == running);
    }

If the compiler optimizes the code - self_type * old_self = coroutine_impl::get_self(); - will likely be moved out of the do-loop.
nope it won't. see my other post for an explanation of this.
    void operator()() {
        self_type * old_self = coroutine_impl::get_self();
        do {
            self_type self( this, old_self);
            reset_self_on_exit( & self, old_self);
            fun(); // execute function
        } while ( state == running);
    }
This happens probably in other places of HPX too - the question is why you never ran into any problems.
because we use the value, not the address of the TLS variable, which is of course constant.
On 09/06/2015 06:12 AM, Oliver Kowalke wrote:
2015-09-06 2:27 GMT+02:00 Vicente J. Botet Escriba
Completely agreed. We need fibers migration when we associate a fiber to a job/work and we have work stealing.
I don't know why you ignore the problem of TLS and compiler optimization in the context of work-stealing - this is an issue.
It's certainly an issue in user code which relies on the address of a thread local variable to stay the same. The Fiber library itself, however, should not be affected by this. That is, it can safely store a reference to the currently running fiber or thread manager in a TLS variable. The trick is, of course, to update the variable on each context switch.

Let's analyze the situation a little further. Here is a link to the slightly changed code from http://www.crystalclearsoftware.com/soc/coroutine/coroutine/coroutine_thread...: http://goo.gl/A9J7ae

The compiler has, of course, every right to assume the address of test is constant in any case; as such, it has every right to "cache" it somewhere. However, as soon as you try to get the *value* of that thing, no such optimizations are possible anymore because bar() is aliasing it (in fact, since it is a global variable, it is aliased by anything and it can't be "cached"). As such, storing the result of "this_fiber::get_id" into a TLS variable is safe even in the presence of migrating fibers to other threads. You can even further relax the situation if you use something like the baz function, where the compiler has no notion of where the pointer is coming from.

This of course completely ignores how user code might use thread local variables, which I still believe should be highly discouraged when using anything fiber-like. But it is still not in the scope of Boost.Fiber to make that decision; it should be possible to migrate fibers to other threads.
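The distinction between the address and the value of a thread-local variable can be made concrete with a minimal sketch. It does not reproduce the linked code; `tls_slot`, `slot_address()`, `observation` and `compare_threads()` are hypothetical names, and "migration" is only simulated by running the second half of the work on another kernel thread.

```cpp
#include <cassert>
#include <thread>

// One instance of this variable exists per kernel thread.
thread_local int tls_slot = 0;

// Returns the address of the *calling* thread's instance.
int* slot_address() { return &tls_slot; }

struct observation {
    bool same_address;    // did the worker see the caller's instance?
    int value_in_worker;  // value the worker read from its own instance
};

// Captures an address on the calling thread, then compares it with what
// a second thread obtains for the same variable name.
observation compare_threads() {
    tls_slot = 1;                        // caller's instance
    int* cached = slot_address();        // address taken "before migration"
    observation result{};
    std::thread worker([&] {
        tls_slot = 2;                    // worker's own instance
        result.same_address = (slot_address() == cached);
        result.value_in_worker = tls_slot;
    });
    worker.join();
    assert(*cached == 1);                // the caller's instance is untouched
    return result;
}
```

A pointer computed on the first thread keeps referring to that thread's instance, which is the hazard for code that caches such an address across a migration. Reading the variable by name on the second thread yields that thread's own instance and value. The sketch says nothing about which of the two a given compiler will emit for a particular piece of fiber code.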
On Sun, Sep 6, 2015 at 8:11 AM, Thomas Heller
On 09/06/2015 06:12 AM, Oliver Kowalke wrote:
I don't know why you ignore the problem of TLS and compiler optimization in the context of work-stealing - this is an issue.
The Fiber library itself, however, should not be affected by this. That is, it can safely store a reference to the currently running fiber or thread manager in a TLS variable. The trick is, of course, to update the variable on each context switch.
... it is still not in the scope of Boost.Fiber to make that decision, it should be possible to migrate fibers to other threads.
Let me put a stake in the ground: I assert that the present design of the Fiber library permits cross-thread fiber migration. The primary reason for the documented prohibition is the TLS optimization/bug under discussion. If that can be robustly overcome, the Fiber library can claim support for fiber migration. Consider this program: https://github.com/nat-goodspeed/boost-fiber/blob/shared_ready_queue/example... The library design permits a custom sched_algorithm::pick_next() implementation to return a fiber_context* from any thread. The shared_ready_queue scheduler illustrates this point. For simplicity, it is not a "work-stealing" scheduler, but rather a "work-sharing" scheduler. Obviously more sophisticated logic could be built around that notion. However -- at least with Boost 1.59.0 on Ubuntu 14.04 with gcc 4.9.2 -- the cited program pretty reliably crashes with SIGSEGV. Oliver has requested help overcoming the TLS optimization/bug. It would be wonderful if one of you requesting support for fiber migration would be willing to suggest a way for a given fiber to robustly locate the fiber_manager, and so forth.
This is probably the most naive reply ever, so I'll keep it brief. I could imagine that on creation, a UUID could be created using boost::uuid. Using boost::interprocess, the fiber_manager could be put into a named shared memory, the name of which is the UUID. The UUID is then passed to any fiber upon creation so that all it needs to do is open the named interprocess pool to access the fiber_manager. A big redesign? Maybe, I haven't looked at the library. But maybe there is something of value in that concept. I have had to do something similar and more complex myself recently to update legacy 16-bit code that migrated things between /processes/ that it shouldn't have. On 9/6/2015 4:19 PM, Nat Goodspeed wrote:
Oliver has requested help overcoming the TLS optimization/bug. It would be wonderful if one of you requesting support for fiber migration would be willing to suggest a way for a given fiber to robustly locate the fiber_manager, and so forth.
On 05/09/15 16:27, Nat Goodspeed wrote:
On Fri, Sep 4, 2015 at 5:08 PM, Vicente J. Botet Escriba
wrote: On 04/09/15 20:37, Nat Goodspeed wrote:
On Fri, Sep 4, 2015 at 2:07 PM, Vicente J. Botet Escriba
wrote: Could you please remind us what "not in the present form" meant as a result of the review, and what has been done to overcome these issues? http://lists.boost.org/boost-announce/2014/01/0393.php
I have not yet tried to address those point by point.
I don't understand, then, why we are doing the mini-review now, before you have checked that an attempt has been made to address each point.
Sorry. How about these points:
Performance: Oliver has not only worked to improve performance, he has included and documented performance tests you can run on your own hardware.
Great, I will check. See below.
Documentation: The documentation now contains several new sections explaining how to use the library for interesting/common use cases. New examples are presented and documented.
See below.
API: The API has been aligned more closely with std::thread. C++14 is not only supported but required. Move-only callables are supported. Variadic parameters are supported. std::chrono is more generically supported. Channels now support value_pop(). fiber_group has been dropped. Migrating fibers between threads has been dropped.
See below.
That said, of course, it is up to each reviewer to state for him- or herself whether s/he believes that the Fiber library should become part of Boost. In particular, regardless of what Oliver or I might synopsize, it is up to each previous reviewer to decide whether his January 2014 objections have been addressed.
To be clear, I believe that, out of respect for the reviewers, you should take the review summary you wrote and add a comment for each point.
Best, Vicente
Hi again, In addition to my last request and in order to be able to write a review, I would like to know what exactly has been changed since the first review, what was expected after the review summary and what was removed, changed or added and what was the rationale. Best, Vicente
2015-09-04 23:08 GMT+02:00 Vicente J. Botet Escriba <vicente.botet@wanadoo.fr>: I don't understand, then, why we are doing the mini-review now, before you have checked that an attempt has been made to address each point.
I've taken care that almost all requests from the former review have been addressed. But I've not added the list to the documentation.
On 04.09.2015 at 5:15 PM, "Nat Goodspeed" wrote:
Hi all,
The mini-review of Boost.Fiber by Oliver Kowalke begins today, Friday September 4th, and closes Sunday September 13th. It was reviewed in January 2014; the verdict at that time was "not in its present form." Since then Oliver has substantially improved documentation, performance, library customization and the underlying implementation, and is bringing the library back for mini-review.
In the performance section, were all tests executed on a single thread? Regarding the different stack allocators, do they have any noticeable impact on performance?
2015-09-05 9:58 GMT+02:00 Thomas Heller: In the performance section, were all tests executed on a single thread?
yes
Regarding the different stack allocators, do they have any noticeable impact on performance?
After modifying the tests (excluding fiber creation), the following numbers were measured (Intel Core2 Q6700):
create fiber: 205 ns
join fiber: 950 ns
detach fiber: 21 ns
yield fiber: 189 ns
Changes committed to the develop branch; numbers are available at http://olk.github.io/libs/fiber/doc/html/fiber/performance.html
On Fri, Sep 4, 2015 at 11:14 AM, Nat Goodspeed
The mini-review of Boost.Fiber by Oliver Kowalke begins today, Friday September 4th, and closes Sunday September 13th.
I'm pleased to see the level of interest in this library. Many people have contributed to the discussions so far. However, as of this moment we have no definite reviews in hand. I invite those of you who have an opinion to state explicitly whether you believe the candidate Fiber library should, or should not, be included in Boost. If, regardless of your yes/no vote, you also have ideas about how the library could/should be improved, please state them as explicitly as possible to give the library author the best chance to act. If you have already elaborated a particular suggestion in previous mail, please at least summarize and say so. Please especially note any change that you consider a requirement for library adoption. Finally -- please distinguish between "perfect" and "good enough." ;-) Nat Goodspeed Boost.Fiber Review Manager ________________________________
participants (6)
- David Schneider
- Hartmut Kaiser
- Nat Goodspeed
- Oliver Kowalke
- Thomas Heller
- Vicente J. Botet Escriba