Re: [boost] [Boost.Interprocess] conditions variables get 10 times faster when opening a multiprocess browser
I've noticed the same 10 times acceleration even while I make a Skype call... How can this have anything to do with interprocess programming?
I've applied the workaround suggested by Gav Wood and it seems to completely fix the problem. Still, I don't get why we should see such a huge speed-up when other applications that connect to the internet are open. Besides, despite the fact that the cause of this seems entirely due to the inner workings of the Windows OS (sigh), I'm still left wondering why it is not possible to *optionally* implement Gav's workaround in the library when BOOST_INTERPROCESS_WINDOWS is defined, considering the increase in performance can be up to almost 1000 times.
On 18/08/2013 1:11, Marcello Pietrobon wrote:
I've applied the workaround suggested by Gav Wood and it seems to completely fix the problem.
Still, I don't get why we should see such a huge speed-up when other applications that connect to the internet are open.
Besides, despite the fact that the cause of this seems entirely due to the inner workings of the Windows OS (sigh), I'm still left wondering why it is not possible to *optionally* implement Gav's workaround in the library when BOOST_INTERPROCESS_WINDOWS is defined, considering the increase in performance can be up to almost 1000 times.
It's a very strange issue, but we definitely need to fix it by applying a patch similar to Gav's, maybe using Peter Dimov's "yield_k" function (http://www.boost.org/doc/libs/1_54_0/boost/smart_ptr/detail/yield_k.hpp). I've been too busy lately to work on Boost, and the little time I had has been spent on Intrusive and Container. I'll try to fix the issue in the following weeks. Thanks for the ticket and for testing Gav's patch. Best, Ion
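For readers who don't want to open that header, the yield_k strategy works roughly like this. The code below is a simplified sketch of the idea rather than a verbatim copy of the Boost header; the thresholds (including the value 32 discussed later in this thread) are illustrative:

#include <windows.h>

// Simplified sketch of the yield_k idea: the caller passes the number of
// times it has already retried, and the function escalates from pure
// spinning to Sleep(0) and finally Sleep(1).
inline void yield(unsigned k)
{
    if (k < 4)        { /* spin: retry immediately */ }
    else if (k < 16)  { YieldProcessor(); }   // pause/nop hint on x86
    else if (k < 32)  { Sleep(0); }           // yield to ready threads of >= priority
    else              { Sleep(1); }           // give up a full system tick
}

// Typical use in a spin-wait loop:
//   for (unsigned k = 0; !try_lock(); ++k) yield(k);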
Great. I will test your fix right away if possible. Attached is my code, just to give an idea, not that I'm suggesting it for Boost. Regards, Marcello interprocess.zip http://boost.2283326.n4.nabble.com/file/n4650817/interprocess.zip
Thank you for the last fix, Ion. I've run some tests on it and it has improved the performance, but not completely. Clearly this problem is not limited to your Interprocess library, so I thought I'd open a separate discussion thread for it: http://boost.2283326.n4.nabble.com/Problems-with-yield-k-workaround-Still-to... I've done some profiling and testing, and it's clear to me that the test program is still slowed down around the ::Sleep(1) instruction. I am personally content with replacing the value 32 with a value above 1000, so the resolution of this is not urgent for me (just to take some pressure off you ;)). Best regards, Marcello
On 21/08/2013 7:39, Marcello Pietrobon wrote:
Thank you for the last fix, Ion.
I've run some tests on it and it has improved the performance, but not completely.
Clearly this problem is not limited to your Interprocess library, so I thought I'd open a separate discussion thread for it: http://boost.2283326.n4.nabble.com/Problems-with-yield-k-workaround-Still-to...
I've done some profiling and testing, and it's clear to me that the test program is still slowed down around the ::Sleep(1) instruction.
I am personally content with replacing the value 32 with a value above 1000, so the resolution of this is not urgent for me (just to take some pressure off you ;)).
Thanks for the test. It's definitely hard to tell if 1000 will be OK for everyone, as it might depend on the CPU speed or waiter count (in your example there is a lot of waiting between the same two processes, which is not the same use case as hundreds of threads waiting for a single resource). There is *very experimental* support for native synchronization primitives on Windows if you comment out the line: #define BOOST_INTERPROCESS_FORCE_GENERIC_EMULATION in boost/interprocess/detail/workaround.hpp. It tries to create Windows native named semaphores on the fly with a unique name and uses Alexander Terekhov's 8a algorithm to implement a condition variable. I don't know if it could be faster in your application, but it should use less CPU as it does not use busy waiting. Best, Ion
The default 'blind' prior is a simple exponential expectation whereby we assume, for any given duration of waiting 't', that the expected completion time is 2t; i.e. we expect to wait as long as we have already been waiting. As such, the optimum time to start with the 'Sleep(1)' strategy (which from my tests sleeps for a full 20ms timeslice) is after 20ms, only after which point does the prior lead us to assert that the completion will probably take at least another 20ms.
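Spelled out, the arithmetic behind that prior (my own paraphrase, not text from the patch) is:

$E[\,T_{\mathrm{total}} \mid \text{waited } t\,] = 2t
\;\Longrightarrow\;
E[\,T_{\mathrm{remaining}} \mid \text{waited } t\,] = 2t - t = t$

Switching to Sleep(1), which costs roughly one timeslice $s \approx 20\,\mathrm{ms}$ here, is therefore only justified once $t \ge s$, i.e. after about 20 ms of cheaper spinning and yielding.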
In my patch, I found the iteration count corresponding to that 20ms 'optimum' to be considerably higher than 1000, and that was on hardware circa 2010.
Gav.
Gav Wood wrote:
The default 'blind' prior is a simple exponential expectation whereby we assume, for any given duration of waiting 't', that the expected completion time is 2t; i.e. we expect to wait as long as we have already been waiting. As such, the optimum time to start with the 'Sleep(1)' strategy (which from my tests sleeps for a full 20ms timeslice) is after 20ms, only after which point does the prior lead us to assert that the completion will probably take at least another 20ms.
For a 20ms timeslice, this means that one should Sleep(1) after 10ms of Sleep(0). But the timeslice is not necessarily 20ms. A call to timeBeginPeriod(1) may shorten it.
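For reference, this is how an application changes the system timer resolution. A minimal illustration (timeBeginPeriod/timeEndPeriod come from the winmm multimedia timer API):

#include <windows.h>
#include <mmsystem.h>                  // timeBeginPeriod / timeEndPeriod
#pragma comment(lib, "winmm.lib")

int main()
{
    timeBeginPeriod(1);                // request a 1 ms scheduler tick
    Sleep(1);                          // now blocks ~1 ms instead of a full default tick
    timeEndPeriod(1);                  // always pair with a matching timeEndPeriod
    return 0;
}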
On 21/08/2013 17:43, Peter Dimov wrote:
But the timeslice is not necessarily 20ms. A call to timeBeginPeriod(1) may shorten it.
timeBeginPeriod seems a bit scary as it affects the Windows scheduler globally. One option could be to call GetTickCount until it changes its value twice (as the resolution of this counter is the resolution of the system timer). Once it has changed (after 20-40 ms of looping; on average that would mean 30 ms), Sleep(1) is called. And for uniprocessor systems we should avoid doing any loop and call only Sleep(0). But that would be a really hard-to-implement spinlock ;-) Another option is to obtain a high-resolution timestamp on each loop iteration and just loop, say, for 100 ms before going to Sleep(1). Best, Ion
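A rough sketch of that GetTickCount probe (my own illustration of the idea, not library code):

#include <windows.h>

// Estimate the current system timer resolution: the first observed change
// of GetTickCount() may be a partial tick, so wait for a second change and
// measure the distance between the two edges.
DWORD measure_tick_ms()
{
    DWORD t0 = GetTickCount();
    DWORD t1;
    while ((t1 = GetTickCount()) == t0) { /* spin */ }
    DWORD t2;
    while ((t2 = GetTickCount()) == t1) { /* spin */ }
    return t2 - t1;   // one full tick period, typically ~15-16 ms or 1 ms
}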
I haven't looked at the code, but I see mention of Sleep(0) here. Does
everyone realize the special behavior of Sleep(0)?
It only gives up a timeslice to threads of >= priority, starving lower-priority
threads (if you were to spin with only that).
Not sure it is a problem in this case, but wanted to mention it, as it is
often overlooked.
Tony
Sent from my portable Analytical Engine
On 08/09/2013 7:39, Gottlob Frege wrote:
I haven't looked at the code, but I see mention of Sleep(0) here. Does everyone realize the special behavior of Sleep(0)?
It only gives up a timeslice to threads of >= priority, starving lower-priority threads (if you were to spin with only that).
Yes, we take this Sleep(0) behaviour into account. In fact, Sleep(0) was changed starting with Windows Server 2003 and it can now relinquish the remainder of the time slice to any other thread that is ready to run. The code uses a combination of SwitchToThread + Sleep(0) plus Sleep(1) to avoid starvation. Best, Ion
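As an illustration of that mitigation (a minimal sketch of my own, not the actual Interprocess code), alternating SwitchToThread with Sleep(0) lets lower-priority ready threads run too:

#include <windows.h>

// Minimal sketch: SwitchToThread() may schedule any ready thread regardless
// of priority, while Sleep(0) only yields to threads of >= priority, so
// alternating them avoids starving lower-priority threads; after many
// retries, fall back to Sleep(1) and give up a whole tick.
inline void yield_no_starvation(unsigned k)
{
    if (k >= 64)      Sleep(1);
    else if (k & 1)   SwitchToThread();
    else              Sleep(0);
}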
On 17/08/2013 8:24, Marcello Pietrobon wrote:
I've noticed the same 10 times acceleration even while I make a Skype call...
How can this have anything to do with interprocess programming?
After yield_k didn't offer good enough results, I decided to wrap the wait logic in a class (called spin_wait) instead of a function. This class would contain the "k_" integer of yield_k and it would lazily obtain the value of the system tick, spinning and yielding until that period has elapsed (using a high-resolution counter or similar). I'm still finishing this class for Windows and then I need to write it for POSIX systems (and since MacOS does not support nanosleep, I may need to do something special for this platform).
However, in my first tests, I found that several applications change the default Windows tick period from 15.6 ms to 1 ms (like just after launching Google Chrome). That's the reason why the current Interprocess spinlocks run better when you start those applications: Sleep(1) was really sleeping for 1 ms instead of 15 ms (these values might change between different computers, I guess).
In my first tests on my system (2.8 GHz Core i7), when the system tick is 1 ms, an interprocess mutex needs 2700 iterations (32 nops/pauses + Sleep(0)) to wait for a tick. When the system tick is 15.6 ms, it needs 41860 iterations (32 nops/pauses + Sleep(0)). This means that no fixed value should be used to mark the yield/sleep limit, as it highly depends on the processor core and the system tick (which can be changed at any moment). I think a limit of N x (system tick time) could be a good guess. I don't know which N value is optimal to minimize both CPU usage and context switch overhead; we'd need to do some tests for that.
In any case, I think this new approach will greatly improve the current horrible Interprocess latencies. I'll ping the list when I commit a portable spin-wait logic in a few days. Best, Ion
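To make the description concrete, here is a hypothetical sketch of such a spin_wait class. The names and details are mine, not the actual Interprocess code; the tick period could come, for instance, from the GetTickCount probe sketched earlier in the thread:

#include <windows.h>

// Hypothetical sketch of the spin_wait idea: spin/yield until roughly one
// system tick has elapsed since the wait started (measured with the
// high-resolution performance counter), then fall back to Sleep(1).
class spin_wait
{
    unsigned  k_     = 0;      // retry counter, as in yield_k
    LONGLONG  start_ = 0;      // QPC value at the first yield
    LONGLONG  limit_ = 0;      // QPC ticks equivalent to one system tick
    DWORD     tick_ms_;        // system tick period in ms, measured externally

public:
    explicit spin_wait(DWORD tick_ms) : tick_ms_(tick_ms) {}

    void yield()
    {
        if (k_ == 0) {         // lazily capture the start time and the limit
            LARGE_INTEGER f, c;
            QueryPerformanceFrequency(&f);
            QueryPerformanceCounter(&c);
            start_ = c.QuadPart;
            limit_ = f.QuadPart * tick_ms_ / 1000;
        }
        ++k_;
        LARGE_INTEGER now;
        QueryPerformanceCounter(&now);
        if (now.QuadPart - start_ < limit_)
            SwitchToThread();  // still within one tick: cheap yield only
        else
            Sleep(1);          // a full tick has passed: really sleep
    }
};

// Typical use:
//   spin_wait sw(tick_ms);
//   while (!try_lock()) sw.yield();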
Great job again, Ion. I've checked out your last revision, curious to try it in the little time I have available. Unfortunately, it links only if a single object file includes wait.hpp, as in the test examples, not if many do. So I decided to roll back the changes on my hard drive and yield() and wait() for a fix :) Here is the error message: The problem is obviously at line 71 of wait.hpp:
Very good. I've done some testing using the latest version (in the repository), with the same testing code as reported in http://boost.2283326.n4.nabble.com/Problems-with-yield-k-workaround-Still-to... and the performance seems excellent. Again, these times are representative of a few trials.
jMax = 0   : time = 00:00:00.281250
jMax = 10  : time = 00:00:00.426250
jMax = 100 : time = 00:00:01.631875
Thanks!!
On 07/09/2013 19:52, Marcello Pietrobon wrote:
Very good.
I've done some testing using the latest version (in the repository), with the same testing code as reported in http://boost.2283326.n4.nabble.com/Problems-with-yield-k-workaround-Still-to... and the performance seems excellent. Again, these times are representative of a few trials.
jMax = 0   : time = 00:00:00.281250
jMax = 10  : time = 00:00:00.426250
jMax = 100 : time = 00:00:01.631875
Nice to hear it. This will also help Mac OS users, as that platform lacks process-shared mutexes/conditions and spinlocks are used instead. Thanks for the report and testing. Thanks also to Gav Wood for the original report and tests. Best, Ion
participants (5)
- Gav Wood
- Gottlob Frege
- Ion Gaztañaga
- Marcello Pietrobon
- Peter Dimov