
Rene Rivera wrote:
> Peter Dimov wrote:
>> Rene Rivera wrote:
>>>> Did you try to use the library?
>>> No.
>> It'd probably be helpful if you create a simple throughput test and post the results so that there is a clear target for Asio to match or exceed. This will also validate your objections as they currently seem to be based on intuition. :-)
> Working on it...
OK, I have now tried it. Two problems I ran into: compile errors with CW-8.3, and compile errors if any Windows headers are included before the asio headers.

The test is a single-process, multi-thread, uni-directional UDP timing of 100,000 messages of 1K size on the local loop (127.0.0.1). The small message size and the local loop are there to flood the server with as many messages as possible, as opposed to as much data as possible, since it's the individual message handling that I'm testing. Each test cycle is run 6 times, and the first and last samples are removed, all within a single process run. The test process is run 3 times and the median is what I'm reporting here. In the process there are two basically identical variants which are tested. There is a single sync client, which is reused, that sends the messages to the current server thread. Two types of servers are tested: an async server using the callbacks for handling messages, and a sync server doing the reads directly and dispatching manually. Internally those represent almost identical code paths, except that the async one does the extra allocations for the deferred dispatch. [code and output attached]

I ran the same 100,000*1K*6*2*3 tests with both debug and release compiled code. As can be seen from the attached output, in the best case (release code) there is a 5.6% "overhead" from the async to the sync case. For the debug code the difference is a more dramatic 25.2%.

In my continued reading and use of the code I concluded that the basic design flaw is that the underlying *socket_service classes are not available as a public interface. It's nice that thought and care have gone into designing a handling and dispatch pattern that can serve many needs. But as it's currently structured, and as was pointed out and/or alluded to in the sync/demuxer threads, not having access to use or customize this lower level forces a single use case. Another restriction this causes is that the various OS handling models are not available as individual handling types. Hence it is not possible, for example, to choose at runtime whether one should use epoll, kqueue, or select.

I should mention that just because I voted no doesn't mean I won't be using this library myself. The performance for my current application is sufficient. But I'm hoping that more improvements come along to make those optimum uses possible.

Thanks to all for reading ;-) And happy _insert_your_preferred_holiday_here_.

PS. The test was on my AMD Athlon 3200+ 2.2GHz, 1GB RAM, Win2K SP4 setup.

--
-- Grafik - Don't Assume Anything
-- Redshift Software, Inc. - http://redshift-software.com
-- rrivera/acm.org - grafik/redshift-software.com
-- 102708583/icq - grafikrobot/aim - Grafik/jabber.org
DEBUG
====== BEGIN OUTPUT ======
--- ASYNC...
### TIME: total = 2.97428; iterations = 100000; iteration = 2.97428e-005; iterations/second = 33621.6
### TIME: total = 3.01433; iterations = 100000; iteration = 3.01433e-005; iterations/second = 33174.8
### TIME: total = 2.98429; iterations = 100000; iteration = 2.98429e-005; iterations/second = 33508.8
### TIME: total = 3.00432; iterations = 100000; iteration = 3.00432e-005; iterations/second = 33285.4
### TIME: total = 2.98429; iterations = 100000; iteration = 2.98429e-005; iterations/second = 33508.8
### TIME: total = 3.05439; iterations = 100000; iteration = 3.05439e-005; iterations/second = 32739.7
-- ...ASYNC: average iterations/second = 33369.5
--- SYNC...
### TIME: total = 2.40346; iterations = 100000; iteration = 2.40346e-005; iterations/second = 41606.8
### TIME: total = 2.41347; iterations = 100000; iteration = 2.41347e-005; iterations/second = 41434.1
### TIME: total = 2.41347; iterations = 100000; iteration = 2.41347e-005; iterations/second = 41434.1
### TIME: total = 2.40346; iterations = 100000; iteration = 2.40346e-005; iterations/second = 41606.8
### TIME: total = 2.42348; iterations = 100000; iteration = 2.42348e-005; iterations/second = 41262.9
### TIME: total = 2.31333; iterations = 100000; iteration = 2.31333e-005; iterations/second = 43227.8
### TIME: total = 0; iterations = 100000; iteration = 0; iterations/second = 1.#INF
-- ...SYNC: average iterations/second = 41793.1
EXIT STATUS: 1
====== END OUTPUT ======
DIFF = 25.2%
RELEASE
====== BEGIN OUTPUT ======
--- ASYNC...
### TIME: total = 2.46354; iterations = 100000; iteration = 2.46354e-005; iterations/second = 40592
### TIME: total = 2.42348; iterations = 100000; iteration = 2.42348e-005; iterations/second = 41262.9
### TIME: total = 2.42348; iterations = 100000; iteration = 2.42348e-005; iterations/second = 41262.9
### TIME: total = 2.4335; iterations = 100000; iteration = 2.4335e-005; iterations/second = 41093.1
### TIME: total = 2.4335; iterations = 100000; iteration = 2.4335e-005; iterations/second = 41093.1
### TIME: total = 2.37341; iterations = 100000; iteration = 2.37341e-005; iterations/second = 42133.4
-- ...ASYNC: average iterations/second = 41178
--- SYNC...
### TIME: total = 2.2933; iterations = 100000; iteration = 2.2933e-005; iterations/second = 43605.3
### TIME: total = 2.27327; iterations = 100000; iteration = 2.27327e-005; iterations/second = 43989.5
### TIME: total = 2.30331; iterations = 100000; iteration = 2.30331e-005; iterations/second = 43415.7
### TIME: total = 2.31333; iterations = 100000; iteration = 2.31333e-005; iterations/second = 43227.8
### TIME: total = 2.30331; iterations = 100000; iteration = 2.30331e-005; iterations/second = 43415.7
### TIME: total = 2.30331; iterations = 100000; iteration = 2.30331e-005; iterations/second = 43415.7
### TIME: total = 0; iterations = 100000; iteration = 0; iterations/second = 1.#INF
-- ...SYNC: average iterations/second = 43492.9
EXIT STATUS: 1
====== END OUTPUT ======
DIFF = 5.6%
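
For reference, the DIFF figures above come out to the relative slowdown of the async case against the sync case, i.e. (sync average - async average) / async average. A minimal sketch of that arithmetic, using the averages reported in the output (nothing below is part of the test program itself):

#include <iostream>

// Overhead of the async server relative to the sync server, in percent,
// given the two reported average iterations/second figures.
double overhead_percent(double async_ips, double sync_ips)
{
    return (sync_ips - async_ips) / async_ips * 100.0;
}

int main()
{
    std::cout << overhead_percent(33369.5, 41793.1) << "%\n"; // debug build, ~25.2%
    std::cout << overhead_percent(41178.0, 43492.9) << "%\n"; // release build, ~5.6%
    return 0;
}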
#include <cstdio>
#include <iostream>
#include <memory>
#include <vector>
#include <valarray>
#include <boost/shared_ptr.hpp>
#include <boost/thread.hpp>
#include <boost/bind.hpp>
#include <boost/asio.hpp>

const std::size_t message_size = /**/ 1*1024 /*/ 64*1024 /**/;
const int port = 9999;
const std::size_t message_iterations = /** 100 /*/ 100000 /**/;

namespace asio = ::boost::asio;

namespace detail_test
{
    // Times its own lifetime and records iterations/second into 'results'.
    struct timed_scope
    {
        boost::xtime t0;
        boost::xtime t1;
        std::size_t n;
        std::vector<double> & results;

        inline timed_scope(std::vector<double> & r, std::size_t iterations = 1)
            : results(r), n(iterations)
        {
            boost::xtime_get(&t0,boost::TIME_UTC);
        }

        inline ~timed_scope()
        {
            boost::xtime_get(&t1,boost::TIME_UTC);
            double t = double(t1.sec)+double(t1.nsec)/double(1000000000);
            t -= double(t0.sec)+double(t0.nsec)/double(1000000000);
            std::cerr << "### TIME"
                << ": total = " << t
                << "; iterations = " << n
                << "; iteration = " << t/double(n)
                << "; iterations/second = " << double(n)/t
                << '\n';
            results.push_back(double(n)/t);
        }
    };

    // Averages the recorded samples with the first and last removed.
    template <typename out>
    out & result_summary(out & o, const std::vector<double> & results)
    {
        std::valarray<double> r(&results[1],results.size()-2);
        o << r.sum()/r.size();
        return o;
    }

    void sleep_for_secs(int n)
    {
        boost::xtime t;
        boost::xtime_get(&t,boost::TIME_UTC);
        t.sec += n;
        boost::thread::sleep(t);
    }
}

// UDP server using the async receive callbacks, dispatched by the demuxer.
struct async_server
{
    char io_buffer[message_size];
    boost::asio::demuxer demuxer;
    boost::asio::datagram_socket socket;
    std::auto_ptr<detail_test::timed_scope> timer;
    std::size_t message_count;
    std::size_t message_recount;
    std::auto_ptr<boost::thread> runner;
    std::vector<double> results;

    async_server()
        : socket(this->demuxer,boost::asio::ipv4::udp::endpoint(port))
        , message_count(0)
        , message_recount(0)
    {
        socket.async_receive(
            boost::asio::buffer(io_buffer, message_size), 0,
            boost::bind(&async_server::handle_receive_from, this,
                boost::asio::placeholders::error,
                boost::asio::placeholders::bytes_transferred));
    }

    void handle_receive_from(
        const boost::asio::error& error, size_t /*bytes_transferred*/)
    {
        if (!error || error == boost::asio::error::message_size)
        {
            if (++message_count == this->timer->n)
            {
                this->clear_timer();
                message_count = 0;
                this->reset_timer(message_recount);
            }
            socket.async_receive(
                boost::asio::buffer(io_buffer, message_size), 0,
                boost::bind(&async_server::handle_receive_from, this,
                    boost::asio::placeholders::error,
                    boost::asio::placeholders::bytes_transferred));
        }
    }

    void start()
    {
        this->runner.reset(new boost::thread(
            boost::bind(&boost::asio::demuxer::run,&this->demuxer)));
    }

    void stop()
    {
        this->demuxer.interrupt();
        this->runner->join();
        this->clear_timer();
    }

    void reset_timer(std::size_t i = 1)
    {
        this->message_recount = i;
        this->timer.reset(new detail_test::timed_scope(this->results,i));
    }

    void clear_timer()
    {
        this->timer.reset();
    }
};

// UDP server doing blocking reads in its own thread and dispatching manually.
struct sync_server
{
    char io_buffer[message_size];
    boost::asio::demuxer demuxer;
    boost::asio::datagram_socket socket;
    std::auto_ptr<detail_test::timed_scope> timer;
    std::size_t message_count;
    std::size_t message_recount;
    std::auto_ptr<boost::thread> runner;
    std::vector<double> results;
    volatile bool running;

    sync_server()
        : socket(this->demuxer,boost::asio::ipv4::udp::endpoint(port))
        , message_count(0)
        , message_recount(0)
        , running(false)
    {
    }

    void handle_receive_from(
        const boost::asio::error& error, size_t /*bytes_transferred*/)
    {
        if (!error || error == boost::asio::error::message_size)
        {
            if (++message_count == this->timer->n)
            {
                this->clear_timer();
                message_count = 0;
                this->reset_timer(message_recount);
            }
        }
    }

    void start()
    {
        this->runner.reset(new boost::thread(
            boost::bind(&sync_server::run,this)));
    }

    void run()
    {
        this->running = true;
        while (this->running)
        {
            boost::asio::error error;
            std::size_t bytes_transferred = socket.receive(
                boost::asio::buffer(io_buffer, message_size), 0,
                boost::asio::assign_error(error));
            this->handle_receive_from(error,bytes_transferred);
            if (error && error != boost::asio::error::message_size) break;
        }
        this->running = false;
    }

    void stop()
    {
        this->running = false;
        this->socket.close();
        this->runner->join();
        this->clear_timer();
    }

    void reset_timer(std::size_t i = 1)
    {
        this->message_recount = i;
        this->timer.reset(new detail_test::timed_scope(this->results,i));
    }

    void clear_timer()
    {
        this->timer.reset();
    }
};

// Synchronous UDP client, reused for both server variants.
struct sync_client
{
    char io_buffer[message_size];
    boost::asio::demuxer demuxer;
    boost::asio::ipv4::host_resolver host_resolver;
    boost::asio::ipv4::host host;
    boost::asio::ipv4::udp::endpoint receiver_endpoint;
    boost::asio::datagram_socket socket;

    sync_client()
        : host_resolver(this->demuxer)
        , receiver_endpoint(port)
        , socket(this->demuxer,boost::asio::ipv4::udp::endpoint(0))
    {
        host_resolver.get_host_by_name(host, "127.0.0.1");
        receiver_endpoint.address(host.address(0));
    }

    void send()
    {
        socket.send_to(
            boost::asio::buffer(io_buffer, message_size), 0,
            receiver_endpoint);
    }
};

int main()
{
    sync_client c0;
    {
        async_server s0;
        s0.start();
        detail_test::sleep_for_secs(2);
        std::cerr << "--- ASYNC...\n";
        s0.reset_timer(message_iterations);
        for (std::size_t m = 0; m < message_iterations*(1+4+1); ++m)
        {
            c0.send();
        }
        s0.stop();
        detail_test::result_summary(
            std::cerr << "-- ...ASYNC: average iterations/second = ",
            s0.results) << "\n";
    }
    detail_test::sleep_for_secs(2);
    {
        sync_server s0;
        s0.start();
        detail_test::sleep_for_secs(2);
        std::cerr << "--- SYNC...\n";
        s0.reset_timer(message_iterations);
        for (std::size_t m = 0; m < message_iterations*(1+4+1); ++m)
        {
            c0.send();
        }
        s0.stop();
        detail_test::result_summary(
            std::cerr << "-- ...SYNC: average iterations/second = ",
            s0.results) << "\n";
    }
    return 1;
}
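
As a footnote on the methodology: the per-run summaries above come from result_summary(), which drops the first and last samples of each run before averaging, and the figures reported in this message are the median of the three process runs. A small stand-alone sketch of that reduction, separate from the test program (the helper names are just for illustration):

#include <algorithm>
#include <vector>
#include <valarray>

// Average of one run's samples with the first and last dropped,
// mirroring detail_test::result_summary() above.
double trimmed_average(const std::vector<double> & samples)
{
    std::valarray<double> r(&samples[1], samples.size() - 2);
    return r.sum() / r.size();
}

// Median of the three per-process averages that were compared.
double median_of_three(double a, double b, double c)
{
    std::vector<double> v;
    v.push_back(a);
    v.push_back(b);
    v.push_back(c);
    std::sort(v.begin(), v.end());
    return v[1];
}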