SUMMARY The review of the proposed Boost.Fiber library ended on January 15, 2014. The verdict is: not in its present form. The lively discussions during the course of the review indicate considerable interest in this library, however, and every submitted review can be read as: perhaps, if certain changes are made. This produced a long list of suggestions, which constitutes the bulk of this report. On behalf of the Boost community, I would like to commend Oliver Kowalke for the work he has put into the Fiber library so far. I encourage him to continue to evolve this library and to bring it back for a mini-review. Rather than considering the subject library as a whole, a Boost mini-review zooms in on particular issues. That seems an appropriate tool to use for further evaluation of the Fiber library -- always subject, of course, to the approval of the Review Wizards. I received seven formal votes, abbreviated here: Niall Douglas: YES, IF Eugene Yakubovich: YES, AND Antony Polukhin: YES, IF Vicente J. Botet Escriba: NO, UNTIL Agustín K-ballo Bergé: NO, IN CURRENT STATE Hartmut Kaiser: NO, IN CURRENT FORM Thomas Heller: NO, IN CURRENT FORM I thank each of these reviewers, and indeed everyone who investigated the Fiber library and participated in the discussions. If you feel that I have misrepresented your position, or have omitted or garbled an important point, please respond: the archived mail thread should accurately reflect the will of the community, even if this message does not.
From the long list of suggestions, a few key themes emerged.
PERFORMANCE Many respondents requested performance tests (specifics below). There were a number of suggestions about possible performance pitfalls in the current implementation, such as use of STL containers (with consequent heap allocations) and locks for thread safety. Several people suggested implementing performance tests *before* starting any such optimizations, which seems like sensible advice. Niall Douglas pointed out that picking some fixed number of fibers is less interesting than showing how resource consumption rises with the number of fibers: "I think it isn't unreasonable for a library to enter Boost if it has good performance *scaling* to load (e.g. O(log N)), even if performance in the absolute or comparative-to-near-alternatives sense is not great. "Absolute performance can always be incrementally improved later, whereas poor performance scaling to load usually means the design is wrong and you're going to need a whole new library with new API. "This is why I really wanted to see performance scaling graphs. If they show O(N log N) or worse, then the design is deeply flawed and the library must not enter Boost. Until we have such a graph, we can't know as there is no substitute for empirical testing." Given the truth of that last observation, I have recast certain suggested optimizations as requests for measured performance cost. Performance requests include: - empty function (create, schedule, execute, delete one fiber) - same with one yield - overhead of using a future object - tests as described in [1]; such tests should allow comparing with TBB, qthreads, openmp, HPX - construct and join a single fiber vs. construct and join a single thread (empty function) - construct and join several fibers vs. construct and join several threads (empty function) - construct and detach a single fiber vs. construct and detach a single thread (empty function) - cost of STL containers (std::vector, std::map, std::deque) - cost of fiber interruption support (same as next bullet?) - cost of thread safety: intra-thread fibers vs. inter-thread fibers - cost of round_robin_ws vs. round_robin scheduler - cost of building on Coroutine vs. building on Context Once Fiber performance tests are available, I trust the community will assist Oliver in running them on systems otherwise unavailable to him. There was some disagreement on whether it is essential for the Fiber library to attain a certain level of performance before it should be accepted. Respondents fall into two broad camps. Some see fibers as lightweight threads (without the overhead of kernel context switching). They assert that since the API is that of std::thread, the only reason to accept a separate library would be astounding performance. They note that on some hardware it is already plausible to run hundreds of thousands of concurrent std::threads. Others see fibers as a tool for organizing code based around asynchronous operations (rather than chains of callbacks). They assert that for such purposes, eliminating the kernel from context switching is sufficient performance guarantee. I myself maintain that cooperative context switching provides important functionality you cannot reasonably get from std::thread. The Fiber library attempts to address both use cases. (It was suggested that if its primary target were code organization, perhaps it should be renamed something other than "Fiber.") In any case, the community clearly wants to see Fiber performance data. Some requested CPU cycles to eliminate clock speed differences, also memory consumption. ARMv7 data was requested for "extra bonus points." DOCUMENTATION Many respondents requested additional documentation, or clarifications. Requests include: - Rationale page explaining what's there, what's not and why. Explain distinction between Coroutine library and Fiber library. If certain Fiber functionality is intended to support yet another library (rather than being complete in itself), call out what would need to be added. - A section on how to install and run the tests and examples. The need to embed in a Boost tree is implied but not stated. Mention the need to build the library and link with it. - Explain synchronization between fibers on different threads. Must the code take more care with this than with synchronizing fibers on the same thread? - Clarify that an exception raised by a fiber function calls std::terminate(), as with std::thread, rather than being consumed. - More clearly explain migrating a fiber from one scheduler to another. - Document async() in a way compatible with the C++ standard. - Clarify thread-local effect of set_scheduling_algorithm(). There was a request to put this function in a this_thread nested namespace to further clarify. - Move algorithm class documentation to "Extension" or "Customization" section. Clarify that it's not part of the baseline library functionality, but a customization point. - Document fiber::id. - Better document promise/future for void and R& (per C++ standard). - Document thread safety of each support class (or method, if it varies by method). - Document complexity guarantees per API. - Document exception safety per API. - Document supported architectures (perhaps link to Coroutine library's list); state minimum compiler versions. - Document the get/set overloads of thread_affinity() and priority() separately. Perhaps rename setters to set_thread_affinity() and set_priority(). - Explain how portable is fiber priority, if it's specific to a scheduler. What does priority mean when you migrate a fiber from one thread (one scheduler instance) to another? - Document the library's ASIO support. Link to Coroutine's ASIO yield functionality; ensure that ASIO yield is adequately explained. In particular, distinguish it from this_fiber::yield. - Better explain (and/or comment) publish-subscribe example, also other existing examples. In addition to the documentation requests above, there were requests for additional examples: - Simple example of ASIO callback implementation vs. the same logic using Fiber's ASIO support, a la [4]. - Example of a fiber pool. - Example of an arbitrary thread B filling a future on which a fiber in thread A is waiting. - Example of an arbitrary thread B posting to an asio::io_service running fibers in thread A. - Either defend fibers::condition_variable from spurious wakeups in existing examples, OR document the stronger condition_variable guarantee. - Example of M:N threading with ASIO. That might involve either one io_service per CPU, with fiber migration; or a single io_service with run() calls from each CPU, grouping fibers for each CPU into strands. - Example of one thread with many fibers making service requests on a pool of worker threads performing blocking calls. - Example of using thread_specific_ptr to manage lifespan of user-specified scheduler. - Example of the owner of a fiber changing the fiber's thread affinity vs. the fiber itself. When would you use each tactic? - Load example programs into an Examples appendix so that Google searches can turn up library documentation. LIBRARY API - Four people overtly approved the close parallel with std::thread and its support classes. - Allocating a default scheduler object, rather than specifying a default template param, was praised. - Three people called out the set difference between Boost.Thread features and Fiber (e.g. future::get_exception_ptr()). One wants these implemented immediately; another says they can be added later; the third simply requests that they be documented, with rationale. - Two people frowned on introducing operator bool methods not found in std::thread or Boost.Thread. - C++11 support was mentioned, notably Boost macros such as BOOST_RV_REF and BOOST_EXPLICIT_OPERATOR_BOOL. Also mentioned were: C++11 idioms; C++11 std::thread patterns; move construction; initializer lists; rvalue this overloads; deleting operators. - The fiber constructor and async() should accept a move-only callable. - At least for a C++11 compiler, fiber constructor and async() should accept variadic parameters. These should support move-only types, like Boost.Thread. C++03 support for variadic parameters would be nice, but is less important. - Every API involving time point or duration should accept arbitrary clock types, immediately converting to a canonical duration type for internal use. - Queues should support value_pop() returning item by value. This supports an item type without a default constructor. - Nested scoped_lock typedef has been deprecated in thread library. Remove in Fiber library. - Align the return type of shared_future::get() with the standard. In general, ensure that parameter types and return types are aligned with the standard. - A couple of people were bothered by the use of types in the detail namespace as parameters or return values in the algorithm API. (I note, however, that extending e.g. Boost.Range can involve touching its detail namespace. A customization point for a library may be a bit of a gray area.) - There was a suggestion to rename algorithm to scheduler. In that case, presumably set_scheduling_algorithm() could be renamed set_scheduler(). - There was a request to rename round_robin_ws to round_robin_work_stealing. - A couple of people consider the algorithm API too monolithic, pointing to redundancies in the round_robin, round_robin_ws and asio round_robin implementations. They suggested teasing out distinct classes, so that (for instance) a user-coded scheduler might be able to override a single method to respect fiber priority. In fact Eugene Yakubovich offered to experiment with refactoring the algorithm class this way. - There was a request for set_scheduling_algorithm() to return the previous pointer. (It might be useful for the requester to explain the anticipated use case. An earlier iteration of set_scheduling_algorithm() did return the previous pointer; Oliver intentionally changed that behavior.) - fiber_group got one thumbs-up and two thumbs-down. Options: retain; improve to use move support rather than fiber*; discard. There is an opportunity to improve on thread_group; naturally there is risk in diverging from thread_group. - Request deferred futures for lazy evaluation. - There was a suggestion to introduce a global object to coordinate thread-specific fiber schedulers, in the hope that the global object could perform all relevant locking and the thread-specific fiber schedulers could themselves be thread-unsafe. - There was a request to unify steal_from() and migrate_to() into a single method. I infer that this is predicated on the previous suggestion. - Request future::then() et al, per [5]. (Someone please clarify the present status of N3784?) - Request enriched barrier support per [6] and [7]. (Someone please clarify the present status of N3817?) - There are two fiber properties specific to particular schedulers: thread_affinity (used only by round_robin_ws) and priority (as yet unused by any scheduler). What if a user-coded scheduler requires a fiber property that does not yet exist? Is there a general approach that could subsume the present support for thread_affinity and priority, in fiber and this_fiber? Could the initial values for such properties be passed as part of the fiber constructor's attributes parameter? - One use case was surfaced that may engage the previous bullet: the desire to associate a given fiber with any of a group of threads, such as the set of threads local to a NUMA domain or physical CPU. IMPLEMENTATION - Replace std::auto_ptr with boost::scoped_ptr. The former produces deprecation warnings on GCC. - Reduce redundancy between try_lock() and lock(). - boost::fibers::asio::detail::yield_handler::operator()() calls algorithm::spawn() before algorithm::run(). Does this allow the scheduler to choose the next fiber to run, e.g. a user-coded scheduler that respects fiber priority? - Add memory transaction support to spinlock a la [8]. - Intel TSX lock avoidance would be nice. LINKS [1] https://github.com/STEllAR-GROUP/hpx/tree/master/tests/performance [2] http://stellar.cct.lsu.edu/pubs/isc2012.pdf [3] http://stellar.cct.lsu.edu/pubs/scala13.pdf [4] https://ci.nedprod.com/job/Boost.AFIO%20Build%20Documentation/Boost.AFIO_Doc... [5] http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3784.pdf [6] http://www.boost.org/doc/libs/1_55_0/doc/html/thread/synchronization.html#th... [7] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3817.html#barrier_o... [8] https://github.com/BoostGSoC/boost.afio/blob/master/boost/afio/detail/ Nat Goodspeed Boost.Fiber Review Manager ________________________________