Problems with yield_k workaround. Still too slow.
Hello Boost team,

As exposed by Gav Wood in http://boost.2283326.n4.nabble.com/Boost-Interprocess-Chronic-performance-on... Windows doesn't handle context switching between threads well. Basically, if a thread asks to sleep for 1 ms, the OS puts that thread to sleep for its entire timeslice (~20 ms), no matter how little time the concurrent threads need to complete their job. The effect of this choice is that if you need two threads talking to each other through an emulated synchronization object, they can easily take 2 seconds to do just 100 round trips.

At least it seems to me that this problem arises only when *emulating* a synchronization object, which I understand is necessary, for example, in the Interprocess library. It's not an easy problem to solve. There is an ongoing discussion in the Interprocess library on how to fix this (see for example http://boost.2283326.n4.nabble.com/Boost-Interprocess-conditions-variables-g... ), but clearly the problem is not limited to it.

The workaround in http://www.boost.org/doc/libs/1_54_0/boost/smart_ptr/detail/yield_k.hpp improves the situation, but only a little. In order to isolate the problem I've run some tests using the Interprocess library, with the example comp_doc_anonymous_conditionA.cpp (and B). This test is slowed down by this issue by a factor ranging from 2 to 10, to 100, or even to 1000 or more, depending on how soon that infamous Sleep(1) is triggered.

*The bottom line is that in yield_k.hpp we need to change the constant in the code* from 32 to *1000* at least. I call this constant kMax.

Here are some test results, running XP SP3 on a dual-socket Xeon E5450 (quad-core, 32-bit, 2 GHz core speed). For the test I changed comp_doc_anonymous_conditionA.cpp to run 100000 iterations, while in comp_doc_anonymous_conditionB.cpp I added a loop of jMax iterations to emulate a job done by the thread while owning the mutex (a sketch of the kind of change follows below).

The results are (these numbers are representative of many tests I've run):

    kMax = 32,    jMax = 0   : time = 00:00:01.125000
    kMax = 50,    jMax = 0   : time = 00:00:00.403750
    kMax = 100,   jMax = 0   : time = 00:00:00.280375
    kMax = 65535, jMax = 0   : time = 00:00:00.203125
    kMax ~= 64 is good

    kMax = 32,    jMax = 10  : time = 00:00:32.843750
    kMax = 100,   jMax = 10  : time = 00:00:02.750000
    kMax = 1000,  jMax = 10  : time = 00:00:00.859375
    kMax = 65535, jMax = 10  : time = 00:00:00.328125
    kMax ~= 1000 is good

    kMax = 32,    jMax = 100 : time = 00:04:01.953125
    kMax = 50,    jMax = 100 : time = 00:00:04.296875
    kMax = 1000,  jMax = 100 : time = 00:00:02.062500
    kMax = 65535, jMax = 100 : time = 00:00:01.593750
    kMax ~= 1000 is good

If the job takes longer we need to increase kMax even more, but it seems that a value of 65535 cuts the problem and makes sure that a locked thread doesn't take most of the CPU time for more than one timeslice.
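For concreteness, here is a minimal sketch of the kind of modification described above, assuming jMax is simply the bound of a busy loop executed while the mutex is held. The original snippets were not preserved; the function name do_job_while_locked and the sink variable are illustrative, not the exact code used in the tests:

    // Illustrative only: emulate a job of jMax iterations while owning the
    // mutex, in the style of comp_doc_anonymous_conditionB.cpp. 'jMax' is
    // assumed to be a simple busy-loop bound (0, 10, 100 in the tests above).
    #include <boost/interprocess/sync/interprocess_mutex.hpp>
    #include <boost/interprocess/sync/scoped_lock.hpp>

    volatile int sink = 0; // keeps the compiler from optimizing the loop away

    void do_job_while_locked(boost::interprocess::interprocess_mutex &mtx, int jMax)
    {
        boost::interprocess::scoped_lock<boost::interprocess::interprocess_mutex> lock(mtx);
        for(int j = 0; j < jMax; ++j) // emulated work done while holding the mutex
            ++sink;
    }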
Marcello Pietrobon wrote:
*The bottom line is that in yield_k.hpp we need to change the constant in the code*
from 32 to *1000* at least.
I do not object to such a change (32 -> 1024 and perhaps 16 -> 64 or higher for the 'pause' case as well); however, it would be prudent to first test it against other usage scenarios. I can provide two test cases that would likely be affected, sp_atomic_mt_test.cpp and sp_atomic_mt2_test.cpp, and I'm sure that there are other Interprocess usage examples that can be checked. http://www.boost.org/doc/libs/1_54_0/libs/smart_ptr/test/sp_atomic_mt_test.c... http://www.boost.org/doc/libs/1_54_0/libs/smart_ptr/test/sp_atomic_mt2_test....
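For context, the Windows branch of boost/smart_ptr/detail/yield_k.hpp is roughly shaped as below. This is a hedged sketch of what the suggested 16 -> 64 and 32 -> 1024 change would look like, simplified from memory of the 1.54 code, not a committed patch:

    // Sketch of the proposed thresholds for the Windows branch of yield_k.hpp.
    // The 64 and 1024 values are the ones discussed above and would still need
    // benchmarking against other usage scenarios.
    #include <windows.h>

    inline void yield( unsigned k )
    {
        if( k < 4 )
        {
            // spin freely for the first few iterations
        }
    #if defined( BOOST_SMT_PAUSE )
        else if( k < 64 )    // was 16: issue PAUSE, hyperthreading-friendly
        {
            BOOST_SMT_PAUSE
        }
    #endif
        else if( k < 1024 )  // was 32: give up the timeslice without sleeping
        {
            Sleep( 0 );
        }
        else                 // last resort: sleep, which on Windows costs a
        {                    // full timeslice regardless of the argument
            Sleep( 1 );
        }
    }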
On 21/08/2013 10:41, Peter Dimov wrote:
I do not object to such a change (32 -> 1024 and perhaps 16 -> 64 or higher for the 'pause' case as well); however, it would be prudent to first test it against other usage scenarios. I can provide two test cases that would likely be affected, sp_atomic_mt_test.cpp and sp_atomic_mt2_test.cpp, and I'm sure that there are other Interprocess usage examples that can be checked.
http://www.boost.org/doc/libs/1_54_0/libs/smart_ptr/test/sp_atomic_mt_test.c...
http://www.boost.org/doc/libs/1_54_0/libs/smart_ptr/test/sp_atomic_mt2_test....
As I wrote in the "[Boost.Interprocess] conditions variables get 10 times faster when opening a multiprocess browser" thread: "It's definitely hard to tell if 1000 will be OK for everyone, as it might depend on the CPU speed or waiter count". It might also depend on the CPU count (on a uniprocessor we should always yield).

Spinlock implementations found on the net are quite varied:

-> some only yield once and sleep after that, because they say it's better in high-contention cases
-> some just spin (optimized for very short waits)
-> some measure wait times to know how much to spin

http://www.1024cores.net/home/lock-free-algorithms/tricks/spinning
http://src.chromium.org/svn/trunk/src/third_party/tcmalloc/chromium/src/base...

I'm afraid there is no single solution. In Marcello's example (condition variable) I think he needs to yield the current timeslice to the sibling thread, as it needs a fast round trip between the waiter and the notifier. A sketch of one possible combined strategy follows below.

Best,

Ion
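As an illustration of how the strategies listed above can be combined, here is a sketch of a Windows-only backoff in that spirit. This is not the actual Interprocess spin_wait class, and the thresholds are arbitrary placeholders:

    // Illustrative backoff combining the three strategies: spin with PAUSE
    // for very short waits, yield the timeslice for medium waits, and only
    // then fall back to sleeping. Not the actual Interprocess spin_wait code.
    #include <windows.h>
    #include <emmintrin.h>   // _mm_pause

    class spin_backoff
    {
        unsigned k_;
    public:
        spin_backoff() : k_(0) {}

        void wait()
        {
            if( k_ < 64 )        _mm_pause(); // optimized for very short waits
            else if( k_ < 1024 ) Sleep( 0 );  // yield to another ready thread
            else                 Sleep( 1 );  // high contention: really sleep
            ++k_;
        }

        void reset() { k_ = 0; }              // call after acquiring the lock
    };

A lock loop would then call wait() on each failed acquisition attempt and reset() once the lock is obtained.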
The latest version by Ion, using his spin_wait class, completely solves the problem, at least in the cases I could test.

An interesting note: Microsoft has introduced a function that completely solves the Sleep(1) problem: http://msdn.microsoft.com/en-us/library/system.threading.thread.spinwait.aspx?cs-save-lang=1&cs-lang=cpp#code-snippet-1

No comment on their decision to provide it only for their .NET library and, as it seems, not for C++.
On 7 Sep 2013 at 10:59, Marcello Pietrobon wrote:
The latest version by Ion, using his spin_wait class, completely solves the problem, at least in the cases I could test.
An interesting note: Microsoft has introduced a function that completely solves the Sleep(1) problem: http://msdn.microsoft.com/en-us/library/system.threading.thread.spinwait.aspx?cs-save-lang=1&cs-lang=cpp#code-snippet-1
No comment on their decision to provide it only for their .NET library and, as it seems, not for C++.
It's very, very easy to roll your own in C++. Simply loop the x86 PAUSE instruction (PAUSE = REP NOP, i.e. a spin-loop no-op hint). Hyperthreaded CPUs understand this to mean that they should execute the other hyperthread instead.

Niall
--
Currently unemployed and looking for work.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/
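A minimal sketch along those lines, assuming Windows, where the YieldProcessor() macro from the Win32 headers expands to PAUSE on x86/x64:

    // Minimal C++ counterpart of .NET's Thread.SpinWait(iterations):
    // just loop the x86 PAUSE instruction (REP NOP), as suggested above.
    #include <windows.h>

    inline void spin_wait(int iterations)
    {
        for( int i = 0; i < iterations; ++i )
            YieldProcessor(); // hints a hyperthreaded core to favor the sibling thread
    }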
participants (4)
- Ion Gaztañaga
- Marcello Pietrobon
- Niall Douglas
- Peter Dimov