Re: [Boost-users] Poor/erratic boost::interprocess named_semaphore performance

Ok, so I converted my example to native Windows semaphores, and it consistently completes in less than 250ms (up to 160x faster). Something is definitely going on.
-----Original Message-----
From: Davidson, Josh
Sent: Sunday, February 12, 2012 9:01 PM
To: boost-users@lists.boost.org
Subject: Poor/erratic boost::interprocess named_semaphore performance
I'm experiencing performance issues using named semaphores on Windows 7 x64. Currently, I'm on 1.49 beta 1, but the behavior was similar on 1.48 and 1.47. Below, I'm copying two sample programs. They simply synchronize with one another using a pair of semaphores. To run them, start the first program and then the second. They will synchronize with each other 100k times and then the second program will spit out the elapsed time. You can restart the second program to run again without bringing down the first program.
On Windows, the elapsed time the first time around is anywhere from 20 to 40 seconds. If you leave the first program running, the following run takes only about 2-3 seconds. On a third run it is back up to 20-40 seconds, and it keeps bouncing back and forth like that. On Linux, the total elapsed time is consistently less than half a second.
I'm guessing the Windows implementation isn't as efficient, but I wouldn't expect it to be 100 times slower than Linux. The other perplexing thing is how wildly different the results are on Windows from run to run.
=====================Test1.cpp==========================================
#include

From quickly looking at the header files it appears that there is no Windows implementation of the synchronization portion of the library, and instead a generic user-space mechanism is used. All POSIX systems (Linux, Mac) use POSIX mechanisms. It appears relatively simple to port, if you're already familiar with the Windows API code for it. The POSIX implementation of semaphores is done in fewer than 50 lines.

On 2/12/2012 11:59 PM, Davidson, Josh wrote:
=====================Test1.cpp==========================================
#include <boost/date_time/posix_time/posix_time.hpp>
using namespace boost::posix_time;
#include <boost/interprocess/sync/named_semaphore.hpp>
using namespace boost::interprocess;
#include <iostream>
using namespace std;

int main()
{
    // Remove any semaphores left over from a previous run.
    named_semaphore::remove("sem1");
    named_semaphore::remove("sem2");

    // Both semaphores start at zero, so each side blocks until the other posts.
    named_semaphore sem1(create_only_t(), "sem1", 0);
    named_semaphore sem2(create_only_t(), "sem2", 0);

    // Echo loop: wait for a post from Test2, then answer it.
    while(true)
    {
        sem1.wait();
        sem2.post();
    }

    return 0;
}

=====================Test2.cpp==========================================
#include <boost/interprocess/sync/named_semaphore.hpp>
using namespace boost::interprocess;
#include <boost/date_time/posix_time/posix_time.hpp>
using namespace boost::posix_time;
#include <iostream>
using namespace std;

int main()
{
    const size_t iterations = 100000;

    // Open the semaphores created by Test1, which must already be running.
    named_semaphore sem1(open_only_t(), "sem1");
    named_semaphore sem2(open_only_t(), "sem2");

    ptime start = boost::posix_time::microsec_clock::local_time();
    for(size_t i = 0; i < iterations; ++i)
    {
        // One round trip: signal Test1, then wait for its reply.
        sem1.post();
        sem2.wait();
    }
    ptime end = boost::posix_time::microsec_clock::local_time();

    time_duration delta = end - start;
    double seconds = double(delta.total_nanoseconds()) / 1000000000.0;

    cout << "Total elapsed time: " << seconds << endl;
    cout << "Time per iteration: " << (seconds / double(iterations)) << endl;
    return 0;
}

_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users

On 13/02/2012 11:25, Nathaniel J Fries wrote:
From quickly looking at the header files it appears that there is no Windows implementation of the synchronization portion of the library, and instead a generic user-space mechanism is used. All POSIX systems (Linux, Mac) use POSIX mechanisms.
It appears relatively simple to port, if you're already familiar with the Windows API code for it. The POSIX implementation of semaphores is done in fewer than 50 lines.
Not so easy if you want to achieve the same POSIX lifetime guarantees. There is an implementation in my head to achieve better Windows performance using Windows native synchronization primitives, but that will have to wait until I find some time, as I have tons of bugs, ideas and requests for Container, Intrusive and Move. Things could be improved a bit by not calling thread_yield in every loop step of spin_semaphore::wait() (Boost 1.49).

Ion
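
To make the thread_yield remark concrete, here is a minimal sketch of a spinning semaphore whose wait() retries a bounded number of times before yielding, instead of yielding on every failed attempt. It is an illustration under assumptions, not Boost's spin_semaphore: the class name spin_semaphore_sketch, the try_wait() helper and the spins_before_yield parameter are all hypothetical, and whether this would actually help the Windows timings in this thread is untested.

```cpp
#include <atomic>
#include <thread>

// Hypothetical illustration only; this is NOT Boost's spin_semaphore.
class spin_semaphore_sketch {
public:
    explicit spin_semaphore_sketch(unsigned initial) : count_(initial) {}

    // Increment the count, releasing one waiter.
    void post() { count_.fetch_add(1, std::memory_order_release); }

    // Try to decrement the count without blocking; false if it is zero.
    bool try_wait() {
        unsigned c = count_.load(std::memory_order_acquire);
        while (c != 0) {
            // On failure, c is reloaded with the current count and we retry.
            if (count_.compare_exchange_weak(c, c - 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed))
                return true;
        }
        return false;
    }

    // Spin up to spins_before_yield failed attempts, then yield once
    // and start counting again (assumed tuning knob, not a Boost option).
    void wait(unsigned spins_before_yield = 64) {
        unsigned spins = 0;
        while (!try_wait()) {
            if (++spins >= spins_before_yield) {
                std::this_thread::yield();
                spins = 0;
            }
        }
    }

private:
    std::atomic<unsigned> count_;
};
```

Two such objects shared between threads can replace sem1 and sem2 in the ping-pong loop from the test programs. Unlike named_semaphore, this sketch is not visible across processes and has no lifetime beyond the object itself.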

Thanks for the information. I did briefly peek at the code and when I saw WaitForSingleObject, I assumed the handle was an actual win32 semaphore. Obviously there's a trade-off, and I have an agenda, but would it be reasonable to deviate from POSIX lifetime on semaphores just like what was done for shared memory? Of course, this could break existing code, but if I'm not mistaken, a similar change was made in 1.48 for shared memory.
"Things could be improved a bit not calling thread_yield in every loop step in spin_semaphore::wait() (Boost 1.49)."

I am currently using 1.49 Beta 1.
Josh

-----Original Message-----
From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Ion Gaztañaga
Sent: Monday, February 13, 2012 5:55 AM
To: Boost User List
Subject: EXTERNAL: Re: [Boost-users] Poor/erratic boost::interprocess named_semaphore performance

On 13/02/2012 18:08, Davidson, Josh wrote:
Thanks for the information. I did briefly peek at the code and when I saw WaitForSingleObject, I assumed the handle was an actual win32 semaphore. Obviously there's a trade-off, and I have an agenda, but would it be reasonable to deviate from POSIX lifetime on semaphores just like what was done for shared memory? Of course, this could break existing code, but if I'm not mistaken, a similar change was made in 1.48 for shared memory.
Yes, existing code was sadly broken, but lifetime semantics remained under POSIX rules (POSIX allows preserving shared memory between reboots, as shm might be implemented as mapped files). I hope to fix COM issues to get kernel persistence again, but I'll need some help from experienced Windows programmers.

In 1.49 beta there are some native Windows implementations of mutex, condition, etc., but those are disabled (search for BOOST_INTERPROCESS_USE_WINDOWS) until I test them a bit and choose an absolutely-ABI-breaking release. Named semaphores are not yet implemented using winapi calls and should be implemented in that absolutely-ABI-breaking release. The idea would be:

-> Create a temporary file representing the semaphore. That file stores a count when the last process attached to the semaphore is detached.
-> Use file id with a "prefix bips." as global semaphore name in the system: "Global\prefix bips.XXXXXXXXXXXXXXXXXXX"
-> Create on demand (open or create) a Windows named semaphore with all access permissions (permissions are checked on the file). If the semaphore was created, use the file count as the initial value.
-> Write sem status to file at semaphore close. Use native file locking to serialize access to the file.

This strategy is used by cygwin 1.7. It has a weak point if the last process attached to a semaphore dies, as the semaphore count won't be correctly written to the file (and the Windows semaphore will disappear). If another process then opens the semaphore, the semaphore count won't be correct. If anyone discovers a more robust strategy, let me know ;-)

Ion
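
The persistence half of that idea can be sketched in portable C++. This is only a rough single-process illustration under stated assumptions: the names load_saved_count, store_count and persistent_count_sketch are hypothetical, the native Windows named semaphore is replaced by a plain counter, and the file locking mentioned above is left out entirely.

```cpp
#include <fstream>
#include <string>
#include <utility>

// Hypothetical helpers: only the "count lives in a file" part of the idea.

// Read the saved count; return fallback if the file is missing or unreadable.
unsigned load_saved_count(const std::string& path, unsigned fallback) {
    std::ifstream in(path);
    unsigned value = 0;
    if (in >> value) return value;
    return fallback;
}

// Overwrite the file with the given count; false on I/O failure.
bool store_count(const std::string& path, unsigned count) {
    std::ofstream out(path, std::ios::trunc);
    out << count;
    out.flush();
    return static_cast<bool>(out);
}

// Stand-in for "open or create": take the count from the file if present,
// otherwise use the caller's initial value; write it back on close.
class persistent_count_sketch {
public:
    persistent_count_sketch(std::string path, unsigned initial)
        : path_(std::move(path)), count_(load_saved_count(path_, initial)) {}

    // "Write sem status to file at semaphore close."
    ~persistent_count_sketch() { store_count(path_, count_); }

    void post() { ++count_; }

    bool try_wait() {
        if (count_ == 0) return false;
        --count_;
        return true;
    }

    unsigned value() const { return count_; }

private:
    std::string path_;   // declared first so it is initialized before count_
    unsigned count_;
};
```

The weak point described above shows up directly in this sketch: the count is written only in the destructor, so if the process is killed before the destructor runs, the file keeps a stale value and the next opener starts from the wrong count.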
participants (3)
- Davidson, Josh
- Ion Gaztañaga
- Nathaniel J Fries