boost asio synchronous vs asynchronous operations performance

void start()
{
    while (socket_.is_open())
    {
        boost::asio::async_write(socket_, boost::asio::buffer(message_),
            boost::bind(&tcp_connection::handle_write, shared_from_this(),
                boost::asio::placeholders::error,
                boost::asio::placeholders::bytes_transferred));
    }
}

private:
    tcp_connection(boost::asio::io_service& io_service)
        : socket_(io_service),
          message_(25, 'A')
    {
    }

    void handle_write(const boost::system::error_code& error, size_t bytes_transferred)
    {
        if (error)
        {
            if (socket_.is_open())
            {
                std::cout << "Error while sending data asynchronously" << std::endl;
                socket_.close();
            }
        }
    }
I guess in the above handle_write() you intended to call start() again. <...>
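Something like this, perhaps - a sketch of that fix against the class above (do_write is a helper name introduced here, not in the original code). Exactly one async_write is outstanding at a time, and the completion handler chains the next one:

void start()
{
    do_write();
}

void do_write()
{
    // Issue one write; the next one is only started once this one completes.
    boost::asio::async_write(socket_, boost::asio::buffer(message_),
        boost::bind(&tcp_connection::handle_write, shared_from_this(),
            boost::asio::placeholders::error,
            boost::asio::placeholders::bytes_transferred));
}

void handle_write(const boost::system::error_code& error, size_t /*bytes_transferred*/)
{
    if (!error)
    {
        do_write(); // chain the next send
    }
    else if (socket_.is_open())
    {
        std::cout << "Error while sending data asynchronously" << std::endl;
        socket_.close();
    }
}

Note that the original while loop queues async_write operations without bound - each iteration starts a new composed write before the previous one has completed, which is also a correctness problem, since overlapping async_write calls on the same socket may interleave their data.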
When the client runs against the synchronous server it is able to receive around 700K msgs/sec, but when it runs against the asynchronous server the performance drops to around 100K-120K msgs/sec.
Since you use a very small message, the overhead related to the completion handlers may be significant. In general, it's worth using a performance profiler to see what's going on, but in the meantime you could try a trivial "static" handler allocator and see if it helps: http://www.boost.org/doc/libs/1_55_0/doc/html/boost_asio/example/cpp03/alloc... Of course, ensure you compile with optimizations.
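For reference, the core of that linked example, condensed (this mirrors the Boost.Asio cpp03 "allocation" sample: a small buffer is reused for handler allocations, wired in through the asio_handler_allocate/asio_handler_deallocate hooks that Asio finds by argument-dependent lookup):

#include <boost/aligned_storage.hpp>
#include <boost/noncopyable.hpp>
#include <cstddef>

// Reuses one small buffer for handler allocations instead of hitting the
// heap on every async operation.
class handler_allocator : private boost::noncopyable
{
public:
    handler_allocator() : in_use_(false) {}

    void* allocate(std::size_t size)
    {
        if (!in_use_ && size < 1024)
        {
            in_use_ = true;
            return storage_.address();
        }
        return ::operator new(size);
    }

    void deallocate(void* pointer)
    {
        if (pointer == storage_.address())
            in_use_ = false;
        else
            ::operator delete(pointer);
    }

private:
    boost::aligned_storage<1024> storage_;
    bool in_use_;
};

// Wraps a handler so that Asio's allocation hooks route through the
// allocator above.
template <typename Handler>
class custom_alloc_handler
{
public:
    custom_alloc_handler(handler_allocator& a, Handler h)
        : allocator_(a), handler_(h) {}

    template <typename Arg1, typename Arg2>
    void operator()(Arg1 arg1, Arg2 arg2) { handler_(arg1, arg2); }

    friend void* asio_handler_allocate(std::size_t size,
        custom_alloc_handler<Handler>* this_handler)
    {
        return this_handler->allocator_.allocate(size);
    }

    friend void asio_handler_deallocate(void* pointer, std::size_t /*size*/,
        custom_alloc_handler<Handler>* this_handler)
    {
        this_handler->allocator_.deallocate(pointer);
    }

private:
    handler_allocator& allocator_;
    Handler handler_;
};

template <typename Handler>
inline custom_alloc_handler<Handler> make_custom_alloc_handler(
    handler_allocator& a, Handler h)
{
    return custom_alloc_handler<Handler>(a, h);
}

With a handler_allocator member (say, allocator_) added to tcp_connection, the write becomes async_write(socket_, buffer(message_), make_custom_alloc_handler(allocator_, <the bound handler as before>)).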

Hi,
I'm trying to compare the performance of boost::asio asynchronous vs synchronous IO operations for a single client.
Below are sample synchronous and asynchronous server applications, which send a 25-byte message to the client continuously in a loop. On the client side, I check the rate at which it receives messages. The setup is simple. The synchronous server spawns a new thread per client connection, and that thread keeps sending the 25-byte message in a loop. The asynchronous server likewise spawns a new thread per client connection, and that thread keeps sending the 25-byte message in a loop using asynchronous writes (the main thread is the one that calls ioservice.run()). For the performance test I'm using only one client.
*Synchronous server code*
#include <iostream>
#include <boost/bind.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/enable_shared_from_this.hpp>
#include <boost/asio.hpp>
#include <boost/thread.hpp>
using boost::asio::ip::tcp;
class tcp_connection : public boost::enable_shared_from_this<tcp_connection>
{
public:
    typedef boost::shared_ptr<tcp_connection> pointer;

    static pointer create(boost::asio::io_service& io_service)
    {
        return pointer(new tcp_connection(io_service));
    }

    tcp::socket& socket()
    {
        return socket_;
    }

    // Runs in its own thread: blocking writes of the 25-byte message, forever.
    // boost::asio::write() returns size_t and reports errors by throwing,
    // hence the try/catch.
    void start()
    {
        for (;;)
        {
            try
            {
                size_t len = boost::asio::write(socket_, boost::asio::buffer(message_));
                if (len != message_.length())
                {
                    std::cerr << "Unable to write all the bytes" << std::endl;
                    break;
                }
            }
            catch (std::exception& e)
            {
                std::cerr << "Error while sending data" << std::endl;
                break;
            }
        }
    }

private:
    tcp_connection(boost::asio::io_service& io_service)
        : socket_(io_service),
          message_(25, 'A')
    {
    }

    tcp::socket socket_;
    std::string message_;
};
class tcp_server
{
public:
    tcp_server(boost::asio::io_service& io_service)
        : acceptor_(io_service, tcp::endpoint(tcp::v4(), 1234))
    {
        start_accept();
    }

private:
    // Blocking accept loop: one new sender thread per connection.
    // Never returns, so the tcp_server constructor never returns either.
    void start_accept()
    {
        for (;;)
        {
            tcp_connection::pointer new_connection =
                tcp_connection::create(acceptor_.get_io_service());
            acceptor_.accept(new_connection->socket());
            boost::thread(boost::bind(&tcp_connection::start, new_connection));
        }
    }

    tcp::acceptor acceptor_;
};

int main()
{
    try
    {
        boost::asio::io_service io_service;
        tcp_server server(io_service);
    }
    catch (std::exception& e)
    {
        std::cerr << e.what() << std::endl;
    }
    return 0;
}
*Asynchronous server code:*

#include <iostream>
#include <string>
#include <boost/bind.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/enable_shared_from_this.hpp>
#include <boost/asio.hpp>
#include <boost/thread.hpp>
using boost::asio::ip::tcp;
class tcp_connection : public boost::enable_shared_from_this<tcp_connection>
{
public:
    typedef boost::shared_ptr<tcp_connection> pointer;

    static pointer create(boost::asio::io_service& io_service)
    {
        return pointer(new tcp_connection(io_service));
    }

    tcp::socket& socket()
    {
        return socket_;
    }

    // Runs in its own thread: issues async_write calls in a tight loop,
    // without waiting for the previous write to complete.
    void start()
    {
        while (socket_.is_open())
        {
            boost::asio::async_write(socket_, boost::asio::buffer(message_),
                boost::bind(&tcp_connection::handle_write, shared_from_this(),
                    boost::asio::placeholders::error,
                    boost::asio::placeholders::bytes_transferred));
        }
    }

private:
    tcp_connection(boost::asio::io_service& io_service)
        : socket_(io_service),
          message_(25, 'A')
    {
    }

    void handle_write(const boost::system::error_code& error, size_t bytes_transferred)
    {
        if (error)
        {
            if (socket_.is_open())
            {
                std::cout << "Error while sending data asynchronously" << std::endl;
                socket_.close();
            }
        }
    }

    tcp::socket socket_;
    std::string message_;
};
class tcp_server
{
public:
    tcp_server(boost::asio::io_service& io_service)
        : acceptor_(io_service, tcp::endpoint(tcp::v4(), 1234))
    {
        start_accept();
    }

private:
    void start_accept()
    {
        tcp_connection::pointer new_connection =
            tcp_connection::create(acceptor_.get_io_service());
        acceptor_.async_accept(new_connection->socket(),
            boost::bind(&tcp_server::handle_accept, this, new_connection,
                boost::asio::placeholders::error));
    }

    void handle_accept(tcp_connection::pointer new_connection,
                       const boost::system::error_code& error)
    {
        if (!error)
        {
            // Spawn a sender thread for this connection, then accept the next one.
            boost::thread(boost::bind(&tcp_connection::start, new_connection));
        }
        start_accept();
    }

    tcp::acceptor acceptor_;
};

int main()
{
    try
    {
        boost::asio::io_service io_service;
        tcp_server server(io_service);
        io_service.run();
    }
    catch (std::exception& e)
    {
        std::cerr << e.what() << std::endl;
    }
    return 0;
}
*Client code*

#include <iostream>
#include <boost/asio.hpp>
#include <boost/array.hpp>

int main(int argc, char* argv[])
{
    if (argc != 3)
    {
        std::cerr << "Usage: client <server-host> <server-port>" << std::endl;
        return 1;
    }

    boost::asio::io_service io_service;
    boost::asio::ip::tcp::resolver resolver(io_service);
    boost::asio::ip::tcp::resolver::query query(argv[1], argv[2]);
    boost::asio::ip::tcp::resolver::iterator it = resolver.resolve(query);
    boost::asio::ip::tcp::socket socket(io_service);
    boost::asio::connect(socket, it);

    // Statscollector to periodically print received messages stats
    // sample::myboost::StatsCollector stats_collector(5);
    // sample::myboost::StatsCollectorScheduler statsScheduler(stats_collector);
    // statsScheduler.start();

    for (;;)
    {
        boost::array<char, 25> buf;
        boost::system::error_code error;
        size_t len = socket.read_some(boost::asio::buffer(buf), error);
        // size_t len = boost::asio::read(socket, boost::asio::buffer(buf));
        if (len != buf.size())
        {
            std::cerr << "Length is not " << buf.size() << " but " << len << std::endl;
        }
        // stats_collector.incr_msgs_received();
    }
}
*Question:* When the client runs against the synchronous server it is able to receive around 700K msgs/sec, but when it runs against the asynchronous server the performance drops to around 100K-120K msgs/sec. I know that one should use asynchronous IO for scalability with a larger number of clients, and since I'm using only a single client here, the obvious advantage of asynchronous IO is not evident. But the question is: is asynchronous IO expected to affect performance this badly in the single-client case, or am I missing some obvious best practices for asynchronous IO? Is the significant drop in performance because of the thread switch between the ioservice thread (which is the main thread in the above case) and the connection thread?
*Setup:* I'm using Boost 1.47 on a Linux machine.
-- View this message in context: http://boost.2283326.n4.nabble.com/boost-asio-synchronous-vs-asynchronous-op... Sent from the Boost - Users mailing list archive at Nabble.com.

On 3/21/2014 12:54, Donald Alan wrote:

Try using concurrency_hint = number of threads you'll create: http://www.boost.org/doc/libs/1_42_0/doc/html/boost_asio/reference/io_servic... At least with I/O Completion ports (Windows NT), only concurrency_hint threads can perform an asynchronous operation simultaneously. All other threads have to wait. The default constructor probably uses concurrency_hint = #processors, but tbh I'm not sure. Disclaimer: I'm also not sure whether I/O Completion ports actually support more threads than #processors; MSDN doesn't suggest there's any limit. Also, the main point of asynchronous I/O is that you don't need a thread per file/connection to achieve the necessary performance. A typical asynchronous I/O server will have either 1 or #processors dedicated I/O threads and a fixed number of worker threads for non-I/O tasks, not a thread per connection. I would suggest that this renders your benchmark questionable even after you make my suggested change.

On 22/03/2014 15:05, Quoth Nate:
Try using concurrency_hint = number of threads you'll create. At least with I/O Completion ports (Windows NT), only concurrency_hint threads can perform an asynchronous operation simultaneously. All other threads have to wait. The default constructor probably uses concurrency_hint = #processors, but tbh I'm not sure. Disclaimer: I'm also not sure whether I/O Completion ports actually support more threads than #processors; MSDN doesn't suggest there's any limit.
It does. It's fairly common to allocate 1.5x or 2x #processors threads to the pool. What happens then is that Windows will keep up to #processors threads (or whatever other concurrency value you specify) processing from the completion port at all times -- if one of the worker threads goes to sleep on some resource other than the completion port itself (during the course of whatever processing it's doing) then it will allow one of the "extra" threads to be woken if needed.
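For reference, the hint is just the io_service constructor argument. A minimal sketch of the shape being described - a pool of 2x #processors threads against one io_service, with the hint left at #processors (hardware_concurrency() standing in for the processor count):

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/thread.hpp>

int main()
{
    unsigned n = boost::thread::hardware_concurrency();
    if (n == 0) n = 1; // hardware_concurrency() may return 0 if unknown

    boost::asio::io_service io_service(n);          // concurrency_hint = #processors
    boost::asio::io_service::work work(io_service); // keep run() from returning

    boost::thread_group pool;
    for (unsigned i = 0; i < 2 * n; ++i)            // 2x #processors workers
        pool.create_thread(boost::bind(&boost::asio::io_service::run, &io_service));

    pool.join_all(); // blocks for the life of the process
    return 0;
}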

Gavin Lambert wrote
It does. It's fairly common to allocate 1.5x or 2x #processors threads to the pool. What happens then is that Windows will keep up to #processors threads (or whatever other concurrency value you specify) processing from the completion port at all times -- if one of the worker threads goes to sleep on some resource other than the completion port itself (during the course of whatever processing it's doing) then it will allow one of the "extra" threads to be woken if needed.
Given that in my sample use case there are only two threads working on the io_service, I think it doesn't matter what value is set as the concurrency_hint. Let me know if I'm missing something. -- View this message in context: http://boost.2283326.n4.nabble.com/boost-asio-synchronous-vs-asynchronous-op... Sent from the Boost - Users mailing list archive at Nabble.com.

Nathaniel J Fries wrote
Also, the main point of asynchronous I/O is that you don't need a thread per file/connection to achieve the necessary performance. A typical asynchronous I/O server will have either 1 or #processors dedicated I/O threads and a fixed number of worker threads for non-I/O tasks, not a thread per connection. I would suggest that this renders your benchmark questionable even after you make my suggested change.
Yes, I don't intend to create one thread per connection in the asynchronous case; just for this sample use case I created a thread on connection request and used it to generate messages. My main concern is that sending messages asynchronously (from a non-io_service thread) performs significantly worse than sending them synchronously in the connection thread (of course, as more and more clients are added, thread-per-connection in the synchronous case won't scale well). -- View this message in context: http://boost.2283326.n4.nabble.com/boost-asio-synchronous-vs-asynchronous-op... Sent from the Boost - Users mailing list archive at Nabble.com.
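(As an aside, one way to keep the producer thread while avoiding touching the socket from two threads is to marshal each send onto the io_service thread with post(). A sketch against the class above - queue_send is a name introduced here, and do_write is the chained-write helper sketched earlier:)

// Called from the connection/producer thread. The async_write itself then
// runs on whichever thread calls io_service::run(), so the socket is only
// ever accessed from the io_service thread.
void queue_send()
{
    socket_.get_io_service().post(
        boost::bind(&tcp_connection::do_write, shared_from_this()));
}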

On 03/24/2014 03:28 PM, Donald Alan wrote:
and used it to generate messages. My main concern is that sending messages asynchronously (from a non-io_service thread) performs significantly worse than sending them synchronously in the connection thread (of course, as more and more clients are added
I ran your code through a profiler and it shows that the slowdown comes from the boost::bind and boost::shared_ptr objects that are needed to set up the async operations. I also tried to change your async_server so that it does not write all buffers in a loop, but instead writes the next buffer from the handler. This yielded almost the same performance results. I also tried to omit the connection thread, so that all work is done in the io_service thread. Same performance results.

Thanks Bjorn. So this definitely seems to suggest that Boost.Asio's asynchronous operations are much slower than their corresponding synchronous operations. Is there any way I can avoid creating a boost::bind and boost::shared_ptr for each async operation? -- View this message in context: http://boost.2283326.n4.nabble.com/boost-asio-synchronous-vs-asynchronous-op... Sent from the Boost - Users mailing list archive at Nabble.com.
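One partial answer, sketched here rather than taken from the thread: if the connection's lifetime is guaranteed some other way (for example, the server holds the shared_ptr until the connection is closed), binding this instead of shared_from_this() avoids the per-operation shared_ptr copy, and the handler allocator sketched earlier removes the per-operation heap allocation (allocator_ below is that hypothetical handler_allocator member):

// The bind construction itself remains, but it is cheap: no shared_ptr
// refcount traffic and, with the custom allocator, no heap allocation.
boost::asio::async_write(socket_, boost::asio::buffer(message_),
    make_custom_alloc_handler(allocator_,
        boost::bind(&tcp_connection::handle_write, this,
            boost::asio::placeholders::error,
            boost::asio::placeholders::bytes_transferred)));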

On 25 Mar 2014 at 14:06, Bjorn Reese wrote:
and used it to generate messages. My main concern is that sending messages asynchronously (from a non-io_service thread) performs significantly worse than sending them synchronously in the connection thread (of course, as more and more clients are added
I ran your code through a profiler and it shows that the slowdown comes from the boost::bind and boost::shared_ptr objects that are needed to set up the async operations.
If you're running under a debugger, such that Visual Studio switches in its pathologically slow debug memory allocator, I can believe it. Otherwise I struggle to see how these could cause the kind of figures the OP was seeing. If AFIO can push 400k ops/sec per core, and it's doing seven memory allocations and frees per op, which include two std::binds and two std::shared_ptr constructions and deletions, plus a boost::future creation and deletion which is at least another boost::shared_ptr, the maths doesn't add up that the OP is so slow.
I also tried to change your async_server so that it does not write all buffers in a loop, but instead writes the next buffer from the handler. This yielded almost the same performance results.
I also tried to omit the connection thread, so that all work is done in the io_service thread. Same performance results.
Very odd. How much time is spent in the kernel?

Niall
--
Currently unemployed and looking for work in Ireland.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/

On 03/26/2014 01:51 AM, Niall Douglas wrote:
Otherwise I struggle to see how these could cause the kind of figures the OP was seeing. If AFIO can push 400k ops/sec per core, and it's
Your scepticism is warranted. I accidentally misspelled the compiler optimization option, so all my performance measurements were done on debug code. Looking at the new numbers for the optimized build, the main difference between the synchronous and asynchronous case is due to the internals of the asio::io_service queue (primarily locking).
I also tried to change your async_server so that it does not write all buffers in a loop, but instead writes the next buffer from the handler. This yielded almost the same performance results.
I also tried to omit the connection thread, so that all work is done in the io_service thread. Same performance results.
With optimization on, these changes improve performance by approx 20%.
Very odd. How much time is spent in the kernel?
Around 5% in debugging code, and 40% in optimized code.

On 28/03/2014 00:09, quoth Bjorn Reese:
Looking at the new numbers for the optimized build, the main difference between the synchronous and asynchronous case is due to the internals of the asio::io_service queue (primarily locking.)
FWIW, that matches my own testing results. I had a case (with serial ports) where this locking latency was sufficiently high to be bothersome. I wrote an experimental lock-free reactor engine which appears to outperform Asio (at least on Windows) -- but it's also much more limited and doesn't provide some of the same guarantees. For normal (particularly socket) usage it's probably not worth the hassle.

On 28 Mar 2014 at 18:17, Gavin Lambert wrote:
I had a case (with serial ports) where this locking latency was sufficiently high to be bothersome. I wrote an experimental lock-free reactor engine which appears to outperform Asio (at least on Windows) -- but it's also much more limited and doesn't provide some of the same guarantees.
For normal (particularly socket) usage it's probably not worth the hassle.
Windows has quite chunky thread-switch times anyway, so as soon as you need to wait on another thread via the kernel rather than a CAS lock, you can forget about performance. Windows completion ports ought to stay completely in user space when thread A posts work to thread B, but any additional locking, e.g. by ASIO, can create enough stalls to send completion ports to sleep in the kernel. That said, it would be interesting to patch TSX support into ASIO in place of its mutex and see what happens.

Niall
--
Currently unemployed and looking for work in Ireland.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/

On 27 Mar 2014 at 12:09, Bjorn Reese wrote:
Otherwise I struggle to see how these could cause the kind of figures the OP was seeing. If AFIO can push 400k ops/sec per core, and it's
Very odd. How much time is spent in the kernel?
Around 5% in debugging code, and 40% in optimized code.
AFIO spends about 45% of its time in locks as well when fully loaded on non-TSX hardware. I am looking forward to getting my hands on some TSX hardware though, as I believe AFIO ought to become only a little slower than ASIO, i.e. ASIO will be the overwhelming limiting throughput factor. Out of curiosity, how many CPU cycles per op does your ASIO test case need? AFIO seems to need ~9,000 CPU cycles per op processed, half of which is spent spinning on CAS locks - I would assume that ASIO can knock that down by two thirds?

Niall
--
Currently unemployed and looking for work in Ireland.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/

On 03/28/2014 11:21 AM, Niall Douglas wrote:
Out of curiosity, how many CPU cycles per op in your ASIO test case? AFIO seems to need ~9,000 CPU cycles per op processed, half of which is spent spinning on CAS locks - I would assume that ASIO can knock that down by two thirds?
1000 cycles/op -- measured via io_service::do_run_once(). It locks/unlocks four times per operation, which accounts for a total of 20% of the CPU time.
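(For anyone who wants to reproduce a number like this without instrumenting Asio internals, a self-contained sketch - my own construction, not Bjorn's method; the cycles conversion assumes a 3.5GHz clock:)

#include <boost/asio.hpp>
#include <boost/chrono.hpp>
#include <iostream>

void noop() {}

int main()
{
    const long N = 1000000;
    boost::asio::io_service io_service;

    // Queue N no-op handlers, then time how long run() takes to drain them.
    for (long i = 0; i < N; ++i)
        io_service.post(&noop);

    boost::chrono::high_resolution_clock::time_point t0 =
        boost::chrono::high_resolution_clock::now();
    io_service.run(); // returns once all N handlers have executed
    boost::chrono::duration<double> elapsed =
        boost::chrono::high_resolution_clock::now() - t0;

    double ns_per_op = elapsed.count() * 1e9 / N;
    std::cout << ns_per_op << " ns/op, ~"
              << ns_per_op * 3.5 << " cycles/op (assuming 3.5GHz)" << std::endl;
    return 0;
}

Note this measures only the single-threaded queue/dispatch path; it says nothing about lock contention between threads.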

On 30 Mar 2014 at 14:00, Bjorn Reese wrote:
Out of curiosity, how many CPU cycles per op in your ASIO test case? AFIO seems to need ~9,000 CPU cycles per op processed, half of which is spent spinning on CAS locks - I would assume that ASIO can knock that down by two thirds?
1000 cycles/op -- measured via io_service::do_run_once().
Just to clarify, my ~9000 cycles/op is for maximum contention i.e. fully loaded with eight threads all fighting it out. Is your 1000 cycles/op for two threads only? Methinks AFIO could do with some minimum latency benchmarks actually ... might as well, I already have build time benchmarks.
It locks/unlocks four times per operation, which accounts for a total of 20% of the CPU time.
I'm actually surprised it's as much as that. I would have thought twice per operation was the minimum possible, but you have to make some hard design choices to get it that low. AFIO "looks funny" partially because it locks exactly twice per op as of the v1.2 engine, unless you have TSX, in which case it never locks at all except when more memory is needed from the kernel.

Niall
--
Currently unemployed and looking for work in Ireland.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/

On 03/30/2014 04:13 PM, Niall Douglas wrote:
Just to clarify, my ~9000 cycles/op is for maximum contention i.e. fully loaded with eight threads all fighting it out. Is your 1000 cycles/op for two threads only?
Yes.
It locks/unlocks four times per operation, which accounts for a total of 20% of the CPU time.
I'm actually surprised it's as much as that. I would have thought twice per operation is the minimum possible, but you have to make some hard design choices to get it that low. AFIO "looks funny"
It uses one lock to protect its epoll data, and three to protect the io_service members. Don't ask me why it needs three in the latter case. Talking of surprises, I was surprised that system::system_category() accounted for 10% of the CPU time, but it looks like Asio wraps all system calls in system::error_code before using them (e.g. for checking for would_block in non-blocking I/O).

On 30 Mar 2014 at 15:13, Niall Douglas wrote:
1000 cycles/op -- measured via io_service::do_run_once().
Methinks AFIO could do with some minimum latency benchmarks actually ... might as well, I already have build time benchmarks.
I have some results: on a 3.5GHz quad-core CPU with hyperthreading, latencies are as follows:

1-4 concurrency: constant ~9 microseconds between op issue and operation beginning, ~7 microseconds between operation end and the op's future signalling. Total latency for the main thread: ~16 microseconds.

4-8 concurrency: linear rise with concurrency. I assume this is the hyperthreading.

8-32 concurrency: fairly constant ~12 microseconds between op issue and operation beginning, ~9 microseconds between operation end and the op's future signalling. Total latency for the main thread: ~21 microseconds.

The latency curve after 8 concurrency is pretty flat, but that's probably because the tasks are being executed as fast as you can dispatch them, so you don't really see the true scaling under load. For reference, a thread context switch was measured at 0.2 microseconds; obviously there will be quite a few of those during a typical AFIO op dispatch.

Niall
--
Currently unemployed and looking for work in Ireland.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/

On 21 Mar 2014 at 9:54, Donald Alan wrote:
When the client runs against the synchronous server it is able to receive around 700K msgs/sec, but when it runs against the asynchronous server the performance drops to around 100K-120K msgs/sec. I know that one should use asynchronous IO for scalability with a larger number of clients, and since I'm using only a single client here, the obvious advantage of asynchronous IO is not evident. But the question is: is asynchronous IO expected to affect performance this badly in the single-client case, or am I missing some obvious best practices for asynchronous IO? Is the significant drop in performance because of the thread switch between the ioservice thread (which is the main thread in the above case) and the connection thread?
Linux isn't capable of asynchronous i/o [1], so of course directly calling synchronous kernel APIs will be faster than using threads to multiplex kernel APIs. I think most of your disparity though is that you are doing at least two (and probably more) syscalls per message in the async case, as ASIO must do a poll/select per message. I'd be very interested to see your results on an OS which does implement async i/o - Windows is the easiest. Anyway, I really wouldn't worry about ASIO performance. ASIO can exceed 3M threaded dispatches per second on a quad-core Intel. Even AFIO, which extends ASIO and uses lots of "slow" futures, breaks past 1.5M dispatches/sec.

[1]: Linux can do non-blocking socket i/o, but non-blocking is *not* asynchronous i/o. Linux can do a limited amount of async file i/o using a special syscall not used by any of the libc implementations of POSIX routines. FreeBSD can do async i/o, but ASIO isn't wired up for it.

Niall
--
Currently unemployed and looking for work in Ireland.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/
participants (6)

- Bjorn Reese
- Donald Alan
- Gavin Lambert
- Igor R
- Nate
- Niall Douglas