[asio] Randomly having problems reading from the socket

Hi all, I have some problems using the boost.asio.read for a TCP-socket. So far it works fine, but it randomly stops the transfer for at least 5 seconds. To resolve the error I used a simple synchronous server and client from the boost examples and modified it too send a larger amount of data. (307400 Bytes; this size comes from the program I'm trying to get run). On the client side I also changed the reading from the socket to read the complete data with boost.asio.read and transfer_all as argument. Both, server and client, measure the time for sending and reading to the socket and will give an error if it needs more than 1 second. Here is the sourcecode for the server: #include <iostream> #include <string> #include <boost/asio.hpp> #include <boost/date_time/posix_time/posix_time.hpp> using boost::asio::ip::tcp; using namespace boost::posix_time; int main() { try { boost::asio::io_service io_service; tcp::acceptor acceptor(io_service, tcp::endpoint(tcp::v4(), 8002)); tcp::socket socket(io_service); acceptor.accept(socket); unsigned char * array = new unsigned char[307400]; for (;;) { boost::system::error_code ignored_error; ptime startTime = microsec_clock::universal_time(); boost::asio::write(socket, boost::asio::buffer( array, 307400 ), boost::asio::transfer_all(), ignored_error); time_duration neededTime = microsec_clock::universal_time() - startTime; if ( neededTime.total_milliseconds() > 1000 ) { std::cout << "ERROR needed too much time: " << neededTime.total_milliseconds() << "ms" << std::endl; } } } catch (std::exception& e) { std::cerr << e.what() << std::endl; } return 0; } My client sourcecode: #include <iostream> #include <boost/array.hpp> #include <boost/asio.hpp> #include <boost/date_time/posix_time/posix_time.hpp> using boost::asio::ip::tcp; using namespace boost::posix_time; int main(int argc, char* argv[]) { try { if (argc != 2) { std::cerr << "Usage: client <host>" << std::endl; return 1; } boost::asio::io_service io_service; boost::asio::ip::tcp::endpoint endpoint( boost::asio::ip::address::from_string( argv[1] ), 8002 ); tcp::socket socket(io_service); boost::system::error_code error = boost::asio::error::host_not_found; socket.connect( endpoint, error ); if (error) throw boost::system::system_error(error); int counter = 0; unsigned char* array = new unsigned char[307400]; for (;;) { boost::system::error_code error; ptime startTime = microsec_clock::universal_time(); size_t len = boost::asio::read( socket, boost::asio::buffer( array, 307400 ), boost::asio::transfer_all(), error ); time_duration neededTime = microsec_clock::universal_time() - startTime; if ( error ) { printf( "body read error %s\n", error.message().c_str() ); } if ( counter % 5000 == 0 ) { std::cout << "passed: " << counter << "\r"; } ++counter; if ( neededTime.total_milliseconds() > 1000 ) { std::cout << "ERROR needed too much time: " << neededTime.total_milliseconds() << "ms" << std::endl; } } } catch (std::exception& e) { std::cerr << e.what() << std::endl; } return 0; } To reproduce the error just let it run for a while. It can take up to 30 minutes or even more... After a while it will produce the line: "ERROR needed too much time: 5007ms". Maybe with slightly different times, but all around 5 seconds. I tested it on different machines with Windows Vista and Seven. So am I'm doing sth. wrong? Is the reading from the socket correct? I'm using boost version 1.44. Kind regards Torben Wiggerich

Hi Torben, Firstly I claim I always use async way of asio, so I don't get the same problem as yours. But for your code, I might suggest, because you are transferring big chunk of data, if network condition is bad, you might get some error, so maybe you want to see the 'error' while not use 'ignored error' ? cheers, Jinqiang On Mon, May 2, 2011 at 10:28 PM, Torben Wiggerich <torben.wiggerich@nisys.de
wrote:
Hi all,
I have some problems using the boost.asio.read for a TCP-socket. So far it works fine, but it randomly stops the transfer for at least 5 seconds. To resolve the error I used a simple synchronous server and client from the boost examples and modified it too send a larger amount of data. (307400 Bytes; this size comes from the program I'm trying to get run). On the client side I also changed the reading from the socket to read the complete data with boost.asio.read and transfer_all as argument.
Both, server and client, measure the time for sending and reading to the socket and will give an error if it needs more than 1 second.
Here is the sourcecode for the server:
#include <iostream> #include <string> #include <boost/asio.hpp> #include <boost/date_time/posix_time/posix_time.hpp>
using boost::asio::ip::tcp; using namespace boost::posix_time;
int main() { try { boost::asio::io_service io_service;
tcp::acceptor acceptor(io_service, tcp::endpoint(tcp::v4(), 8002));
tcp::socket socket(io_service); acceptor.accept(socket);
unsigned char * array = new unsigned char[307400];
for (;;) { boost::system::error_code ignored_error;
ptime startTime = microsec_clock::universal_time();
boost::asio::write(socket, boost::asio::buffer( array, 307400 ), boost::asio::transfer_all(), ignored_error);
time_duration neededTime = microsec_clock::universal_time() - startTime;
if ( neededTime.total_milliseconds() > 1000 ) { std::cout << "ERROR needed too much time: " << neededTime.total_milliseconds() << "ms" << std::endl; } } } catch (std::exception& e) { std::cerr << e.what() << std::endl; }
return 0; }
My client sourcecode:
#include <iostream> #include <boost/array.hpp> #include <boost/asio.hpp> #include <boost/date_time/posix_time/posix_time.hpp>
using boost::asio::ip::tcp; using namespace boost::posix_time;
int main(int argc, char* argv[]) { try { if (argc != 2) { std::cerr << "Usage: client <host>" << std::endl; return 1; }
boost::asio::io_service io_service; boost::asio::ip::tcp::endpoint endpoint( boost::asio::ip::address::from_string( argv[1] ), 8002 );
tcp::socket socket(io_service); boost::system::error_code error = boost::asio::error::host_not_found; socket.connect( endpoint, error ); if (error) throw boost::system::system_error(error);
int counter = 0; unsigned char* array = new unsigned char[307400]; for (;;) { boost::system::error_code error;
ptime startTime = microsec_clock::universal_time(); size_t len = boost::asio::read( socket, boost::asio::buffer( array, 307400 ), boost::asio::transfer_all(), error ); time_duration neededTime = microsec_clock::universal_time() - startTime;
if ( error ) { printf( "body read error %s\n", error.message().c_str() ); }
if ( counter % 5000 == 0 ) { std::cout << "passed: " << counter << "\r"; } ++counter;
if ( neededTime.total_milliseconds() > 1000 ) { std::cout << "ERROR needed too much time: " << neededTime.total_milliseconds() << "ms" << std::endl; } } } catch (std::exception& e) { std::cerr << e.what() << std::endl; }
return 0; }
To reproduce the error just let it run for a while. It can take up to 30 minutes or even more... After a while it will produce the line: "ERROR needed too much time: 5007ms". Maybe with slightly different times, but all around 5 seconds. I tested it on different machines with Windows Vista and Seven.
So am I'm doing sth. wrong? Is the reading from the socket correct?
I'm using boost version 1.44.
Kind regards Torben Wiggerich
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

On Mon, May 2, 2011 at 8:28 PM, Torben Wiggerich <torben.wiggerich@nisys.de>wrote:
Hi all,
I have some problems using the boost.asio.read for a TCP-socket. So far it works fine, but it randomly stops the transfer for at least 5 seconds. To resolve the error I used a simple synchronous server and client from the boost examples and modified it too send a larger amount of data. (307400 Bytes; this size comes from the program I'm trying to get run). On the client side I also changed the reading from the socket to read the complete data with boost.asio.read and transfer_all as argument.
Both, server and client, measure the time for sending and reading to the socket and will give an error if it needs more than 1 second.
Here is the sourcecode for the server:
#include <iostream> #include <string> #include <boost/asio.hpp> #include <boost/date_time/posix_time/posix_time.hpp>
using boost::asio::ip::tcp; using namespace boost::posix_time;
int main() { try { boost::asio::io_service io_service;
tcp::acceptor acceptor(io_service, tcp::endpoint(tcp::v4(), 8002));
tcp::socket socket(io_service); acceptor.accept(socket);
unsigned char * array = new unsigned char[307400];
for (;;) { boost::system::error_code ignored_error;
ptime startTime = microsec_clock::universal_time();
boost::asio::write(socket, boost::asio::buffer( array, 307400 ), boost::asio::transfer_all(), ignored_error);
time_duration neededTime = microsec_clock::universal_time() - startTime;
if ( neededTime.total_milliseconds() > 1000 ) { std::cout << "ERROR needed too much time: " << neededTime.total_milliseconds() << "ms" << std::endl; } } } catch (std::exception& e) { std::cerr << e.what() << std::endl; }
return 0; }
My client sourcecode:
#include <iostream> #include <boost/array.hpp> #include <boost/asio.hpp> #include <boost/date_time/posix_time/posix_time.hpp>
using boost::asio::ip::tcp; using namespace boost::posix_time;
int main(int argc, char* argv[]) { try { if (argc != 2) { std::cerr << "Usage: client <host>" << std::endl; return 1; }
boost::asio::io_service io_service; boost::asio::ip::tcp::endpoint endpoint( boost::asio::ip::address::from_string( argv[1] ), 8002 );
tcp::socket socket(io_service); boost::system::error_code error = boost::asio::error::host_not_found; socket.connect( endpoint, error ); if (error) throw boost::system::system_error(error);
int counter = 0; unsigned char* array = new unsigned char[307400]; for (;;) { boost::system::error_code error;
ptime startTime = microsec_clock::universal_time(); size_t len = boost::asio::read( socket, boost::asio::buffer( array, 307400 ), boost::asio::transfer_all(), error ); time_duration neededTime = microsec_clock::universal_time() - startTime;
if ( error ) { printf( "body read error %s\n", error.message().c_str() ); }
if ( counter % 5000 == 0 ) { std::cout << "passed: " << counter << "\r"; } ++counter;
if ( neededTime.total_milliseconds() > 1000 ) { std::cout << "ERROR needed too much time: " << neededTime.total_milliseconds() << "ms" << std::endl; } } } catch (std::exception& e) { std::cerr << e.what() << std::endl; }
return 0; }
To reproduce the error just let it run for a while. It can take up to 30 minutes or even more... After a while it will produce the line: "ERROR needed too much time: 5007ms". Maybe with slightly different times, but all around 5 seconds. I tested it on different machines with Windows Vista and Seven.
So am I'm doing sth. wrong? Is the reading from the socket correct?
I'm using boost version 1.44.
Your server program exits immediately on my Win7 machine with an exception thrown from the line "tcp::acceptor acceptor(...)". It gives the same error message as yours after I change the port from 8002 to 8003. And the program works fine after I change the port to 18002. So I guess the problem is the port you are using. Maybe it's reserved for some other usage and you can not use these ports. Regards, Feng
Kind regards Torben Wiggerich
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Am 05.05.2011 08:15, schrieb 肖锋:
On Mon, May 2, 2011 at 8:28 PM, Torben Wiggerich <torben.wiggerich@nisys.de>wrote:
Hi all,
I have some problems using the boost.asio.read for a TCP-socket. So far it works fine, but it randomly stops the transfer for at least 5 seconds. To resolve the error I used a simple synchronous server and client from the boost examples and modified it too send a larger amount of data. (307400 Bytes; this size comes from the program I'm trying to get run). On the client side I also changed the reading from the socket to read the complete data with boost.asio.read and transfer_all as argument.
Both, server and client, measure the time for sending and reading to the socket and will give an error if it needs more than 1 second.
Here is the sourcecode for the server:
#include<iostream> #include<string> #include<boost/asio.hpp> #include<boost/date_time/posix_time/posix_time.hpp>
using boost::asio::ip::tcp; using namespace boost::posix_time;
int main() { try { boost::asio::io_service io_service;
tcp::acceptor acceptor(io_service, tcp::endpoint(tcp::v4(), 8002));
tcp::socket socket(io_service); acceptor.accept(socket);
unsigned char * array = new unsigned char[307400];
for (;;) { boost::system::error_code ignored_error;
ptime startTime = microsec_clock::universal_time();
boost::asio::write(socket, boost::asio::buffer( array, 307400 ), boost::asio::transfer_all(), ignored_error);
time_duration neededTime = microsec_clock::universal_time() - startTime;
if ( neededTime.total_milliseconds()> 1000 ) { std::cout<< "ERROR needed too much time: "<< neededTime.total_milliseconds()<< "ms"<< std::endl; } } } catch (std::exception& e) { std::cerr<< e.what()<< std::endl; }
return 0; }
My client sourcecode:
#include<iostream> #include<boost/array.hpp> #include<boost/asio.hpp> #include<boost/date_time/posix_time/posix_time.hpp>
using boost::asio::ip::tcp; using namespace boost::posix_time;
int main(int argc, char* argv[]) { try { if (argc != 2) { std::cerr<< "Usage: client<host>"<< std::endl; return 1; }
boost::asio::io_service io_service; boost::asio::ip::tcp::endpoint endpoint( boost::asio::ip::address::from_string( argv[1] ), 8002 );
tcp::socket socket(io_service); boost::system::error_code error = boost::asio::error::host_not_found; socket.connect( endpoint, error ); if (error) throw boost::system::system_error(error);
int counter = 0; unsigned char* array = new unsigned char[307400]; for (;;) { boost::system::error_code error;
ptime startTime = microsec_clock::universal_time(); size_t len = boost::asio::read( socket, boost::asio::buffer( array, 307400 ), boost::asio::transfer_all(), error ); time_duration neededTime = microsec_clock::universal_time() - startTime;
if ( error ) { printf( "body read error %s\n", error.message().c_str() ); }
if ( counter % 5000 == 0 ) { std::cout<< "passed: "<< counter<< "\r"; } ++counter;
if ( neededTime.total_milliseconds()> 1000 ) { std::cout<< "ERROR needed too much time: "<< neededTime.total_milliseconds()<< "ms"<< std::endl; } } } catch (std::exception& e) { std::cerr<< e.what()<< std::endl; }
return 0; }
To reproduce the error just let it run for a while. It can take up to 30 minutes or even more... After a while it will produce the line: "ERROR needed too much time: 5007ms". Maybe with slightly different times, but all around 5 seconds. I tested it on different machines with Windows Vista and Seven.
So am I'm doing sth. wrong? Is the reading from the socket correct?
I'm using boost version 1.44.
Your server program exits immediately on my Win7 machine with an exception thrown from the line "tcp::acceptor acceptor(...)". It gives the same error message as yours after I change the port from 8002 to 8003. And the program works fine after I change the port to 18002. So I guess the problem is the port you are using. Maybe it's reserved for some other usage and you can not use these ports.
Regards, Feng
Hi Feng, I tried many different ports already. But I couldn't find a port which works... On your suggested port 18002 it also stops on my machine. It is just a matter of time until it fails. I tested this on two machines. The first stoped in two hours 6 times and the other only once. Kind regards Torben

On Thu, May 5, 2011 at 3:21 PM, Torben Wiggerich <torben.wiggerich@nisys.de> wrote:
I tried many different ports already. But I couldn't find a port which works... On your suggested port 18002 it also stops on my machine. It is just a matter of time until it fails. I tested this on two machines. The first stoped in two hours 6 times and the other only once.
Can't you run the app in the debugger and inspect the point at which the exception is thrown or the error is generated? Olaf

Am 05.05.2011 16:16, schrieb Olaf van der Spek:
I tried many different ports already. But I couldn't find a port which works... On your suggested port 18002 it also stops on my machine. It is just a matter of time until it fails. I tested this on two machines. The first stoped in two hours 6 times and the other only once. Can't you run the app in the debugger and inspect the point at which
On Thu, May 5, 2011 at 3:21 PM, Torben Wiggerich <torben.wiggerich@nisys.de> wrote: the exception is thrown or the error is generated?
There is no exception which is thrown and there is also no error generated. The system just hangs inside the boost::asio::read method. It doesn't generate an error inside this. The package is even correctly transmitted... (with a latency of 5 seconds). So I have no clear entrance point for debugging... Because I dont' know when it occurs. Regards Torben
Olaf _______________________________________________ Unsubscribe& other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
-- Torben Wiggerich NISYS GmbH Technologiezentrum Ruhr (TZR) Universitätsstr. 142 44799 Bochum Germany phone: +49 (0)234 978323-9 email: torben.wiggerich@nisys.de URL: http://www.nisys.de

The symptoms you describe, based on the code you included in your first e-mail, seems to point to a simple and straightforward problem - you're running into TCP flow control blockage. This is a traditional TCP problem and is not directly related to Asio. Anytime you're sending data as fast as possible, and the receiving side is just slightly slower at receiving the data, send buffers in the OS will fill up. Eventually this will result in the send being blocked for a short period of time while the receiving side processes the data. If you're using blocking writes, the send will block for a period of time. If you're using async writes, the write completion handler will take longer before it is called (which is why I greatly prefer async operations all around). That the problem is transitory and takes a while to manifest is another clue - it depends totally on how fast the sending and receiving applications are running, and if there's a short period on the receiving side where the app can't run as fast (e.g. another app takes up some processing time), the blockage will be transitory. Cliff

Am 05.05.2011 16:30, schrieb Cliff Green:
The symptoms you describe, based on the code you included in your first e-mail, seems to point to a simple and straightforward problem - you're running into TCP flow control blockage. This is a traditional TCP problem and is not directly related to Asio.
Anytime you're sending data as fast as possible, and the receiving side is just slightly slower at receiving the data, send buffers in the OS will fill up. Eventually this will result in the send being blocked for a short period of time while the receiving side processes the data. If you're using blocking writes, the send will block for a period of time. If you're using async writes, the write completion handler will take longer before it is called (which is why I greatly prefer async operations all around).
That the problem is transitory and takes a while to manifest is another clue - it depends totally on how fast the sending and receiving applications are running, and if there's a short period on the receiving side where the app can't run as fast (e.g. another app takes up some processing time), the blockage will be transitory.
Cliff
Thanks for your answer and Sorry for my late response... So I will give the asynchronous connection a try ... If I understand you correct, just the sender should be blocked for a time. You say that it is because the buffers are running full. Is there a method to detect if the buffers are full, so that I should stop sending new packets? It would be ok if it stops for a few milliseconds, but not for a such long time like 5 seconds... But I also don't understand how both sides can stop receiving and sending data for 5 seconds at the same time... On the other hand the application where I observed this error didn't produce so frequently network traffic at all. It produces this packages only with 25 Hz... Kind regards Torben

From: "Torben Wiggerich" <torben.wiggerich@nisys.de>
If I understand you correct, just the sender should be blocked for a time. You say that it is because the buffers are running full. Is there a method to detect if the buffers are full, so that I should stop sending new packets?
Not that I know of (someone else may have some suggestions). If you're using blocking writes, you may have blockage.
It would be ok if it stops for a few milliseconds, but not for a such long time like 5 seconds...
Yeah, it seems excessive to me, too. I would guess there's something else at play. Maybe when TCP flow control kicks in, there's some kind of driver reset or timeout that is occurring.
But I also don't understand how both sides can stop receiving and sending data for 5 seconds at the same time...
That also seems odd - I would expect just the receiving side to pause. So, as already mentioned, maybe something else odd is occurring. Have you put an analyzer (e.g. WireShark) on it and watched the TCP packets from right before and when the pause occurs? This might help isolate the problem.
On the other hand the application where I observed this error didn't produce so frequently network traffic at all. It produces this packages only with 25 Hz...
I don't understand what you mean. Hope I've helped, Cliff

Am 09.05.2011 18:05, schrieb Cliff Green:
From: "Torben Wiggerich" <torben.wiggerich@nisys.de>
If I understand you correct, just the sender should be blocked for a time. You say that it is because the buffers are running full. Is there a method to detect if the buffers are full, so that I should stop sending new packets?
Not that I know of (someone else may have some suggestions). If you're using blocking writes, you may have blockage.
It would be ok if it stops for a few milliseconds, but not for a such long time like 5 seconds...
Yeah, it seems excessive to me, too. I would guess there's something else at play. Maybe when TCP flow control kicks in, there's some kind of driver reset or timeout that is occurring.
But I also don't understand how both sides can stop receiving and sending data for 5 seconds at the same time...
That also seems odd - I would expect just the receiving side to pause. So, as already mentioned, maybe something else odd is occurring.
Have you put an analyzer (e.g. WireShark) on it and watched the TCP packets from right before and when the pause occurs? This might help isolate the problem.
Ok I will try this...
On the other hand the application where I observed this error didn't produce so frequently network traffic at all. It produces this packages only with 25 Hz...
I don't understand what you mean.
In the example program I gave was the sender sending continously without any pausing. I just wanted to say that it even occours if the sender is not sending which such a high frequency. What speaks a little bit against the theroy that a buffer is running full. (I didn't try to reproduce this with the example program.) But that is just speculative. Torben

I am also having the problem that in my application, sending/receiving at 20Hz over a localhost/loopback connection, sometimes stalls for exactly 5000ms. This happens mostly on slower (e.g. Athlon X2 4400+) than on faster (e.g. i7-930) machines. Torben Wiggerich wrote
Have you put an analyzer (e.g. WireShark) on it and watched the TCP packets from right before and when the pause occurs? This might help isolate the problem.
Ok I will try this...
Did this help finding the source of the problem? As explained here (http://wiki.wireshark.org/CaptureSetup/Loopback), Wireshark/WinPcap will not be able to capture traffic over the loopback device. As suggested, I used RawCap (http://www.netresec.com/?page=RawCap) instead. I filter the data using "tcp.port == 1234" to not get sidetracked by other traffic. I uploaded a screenshot here: http://i.imgur.com/36U62.jpg I uploaded the whole dumpfile here: http://www8.zippyshare.com/v/50356468/file.html The interesting part is probably this:
85030 16.690429 127.0.0.1 127.0.0.1 TCP 1500 [TCP segment of a reassembled PDU] 85041 21.689453 127.0.0.1 127.0.0.1 TCP 808 [TCP Previous segment lost] [TCP segment of a reassembled PDU]
You can clearly see the ~5000ms pause between those two events. Does anyone have further insight in the reason why it is exactly 5000ms, and how to maybe fix or work around it? -- View this message in context: http://boost.2283326.n4.nabble.com/asio-Randomly-having-problems-reading-fro... Sent from the Boost - Dev mailing list archive at Nabble.com.

Posted a Bug Report here: Ticket #8937 <https://svn.boost.org/trac/boost/ticket/8937> -- View this message in context: http://boost.2283326.n4.nabble.com/asio-Randomly-having-problems-reading-fro... Sent from the Boost - Dev mailing list archive at Nabble.com.
participants (6)
-
Cliff Green
-
FlorianGeorge
-
Jinqiang Zhang
-
Olaf van der Spek
-
Torben Wiggerich
-
肖锋