[Newbie] Problem receiving data with asio::serial_port
Hi all,

I'm writing a Qt application that runs on both Windows and macOS. The application is a CMake project that pulls in the boost-asio package 1.84.0 through vcpkg. It must communicate with a board over a USB cable (a virtual serial port). I'm handling the communication with a small wrapper around the asio::serial_port object, taken from this sample: https://github.com/fedetft/serial-port/tree/master/2_with_timeout

The problem occurs only on macOS; the Windows version works correctly. After some correct data exchanges, the program fails to receive a reply (it gets the TIMEOUT exception) and, from then on, it receives only system_error = 89.

This is a chunk of my log:

...
(I send a command and I receive the reply)

    [ReceiveLog] start
    [TP] start SendCommand 'log export'
    [WriteCommand] Byte[] Write - cmd: 'log export'
    [WaitForResponse] ms 1000
    RX [34] : ' log export -> XModem receive file'
    [SendCommand] response: ' log export -> XModem receive file'
    [ReceiveLog] response1: 'XModem receive file' matching: 'XModem receive file'
    [ReceiveLog] ok -> XMODEM

(now I send a byte to start the data from the board - 3 attempts)

    Write BEGIN DATA RECEIVE ..
    [WaitForXModemResponse] ms 4000
    [WaitForXModemResponse] TIMEOUT TO diff: 4
    Write BEGIN DATA RECEIVE ..
    [WaitForXModemResponse] ms 4000
    [ToSerial][readCompleted] error system:89
    [ToSerial][readCompleted] byteTransf 0
    [WaitForXModemResponse] Boost System_error
    Write BEGIN DATA RECEIVE ..
    [WaitForXModemResponse] ms 4000
    [ToSerial][readCompleted] error system:89
    [ToSerial][readCompleted] byteTransf 0
    [WaitForXModemResponse] Boost System_error
    [TP][ReceiveLog] start XMODEM: 0

(now I sent an ACK, just to try ...)

    [WaitForXModemResponse] ms 1000
    [ToSerial][readCompleted] error system:89
    [ToSerial][readCompleted] byteTransf 0
    [WaitForXModemResponse] Boost System_error
    [dlThread] ReceiveLog ret: 0

The read function simply sets the timeout and reads 133 bytes:

    ...
    _serialPort.setTimeout( boost::posix_time::millisec(timeout));
    char data[XMODEM_PACKET_LENGTH];
    try {
        _serialPort.read( data, XMODEM_PACKET_LENGTH);
        ...
    } catch (const timeout_exception&) {
        ...
    } catch (const boost::system::system_error&) {
        ...
    }

Please, what does error 89 mean (in the macOS context)? Any idea how to get out of this situation?

Thanks!
Regards
Try re-creating the serial port object on timeout. If I'm not mistaken, error code 89 on OSX means ECANCELED.
From reading https://github.com/fedetft/serial-port/blob/master/2_with_timeout/TimeoutSer..., I'd say that the class is not handling cancellations correctly. When a timeout occurs, serial_port::cancel gets called, which in turn makes the operation complete with an ECANCELED error code (which, BTW, has a different number on each platform). This line handles the case for Linux: https://github.com/fedetft/serial-port/blob/291a7997a665a52bf37d15c615679445.... But it fails to do so for OSX.
https://github.com/fedetft/serial-port/blob/master/2_with_timeout/TimeoutSer... does several questionable things:

- The io_service name was deprecated long ago (in favor of io_context).
- Error codes should be checked by name, not by number (which avoids problems like the one you encountered).
- If you're creating more than one serial port in your application, this creates an io_service per object, which incurs _a lot_ of overhead.
- The way it handles cancellation is not ideal: it should wait until both async operations finish before returning or throwing the relevant exception.

You'll probably need to do some cleanup there before proceeding.

If you need more help, it may be faster to ask in the #boost-asio channel in Slack: https://cpplang.slack.com/archives/C06BRML5EFK

Hope it helps.
Ruben.
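The advice to check error codes by name rather than by number can be illustrated with the standard library alone. This is only a minimal sketch, not code from TimeoutSerial: `is_cancelled` and `posix_cancelled` are hypothetical helpers, and `std::error_code` stands in here for `boost::system::error_code`. With Asio, the named constant to compare against would be `boost::asio::error::operation_aborted`.

```cpp
#include <cerrno>
#include <system_error>

// Hypothetical helper: compares by name, so it does not depend on the
// numeric value of ECANCELED, which differs between platforms.
bool is_cancelled(const std::error_code& ec) {
    return ec == std::errc::operation_canceled;
}

// Hypothetical helper: builds a "cancelled" error code from the platform's
// own ECANCELED constant instead of a hard-coded number.
std::error_code posix_cancelled() {
    return std::error_code(ECANCELED, std::generic_category());
}
```

A check written as `ec.value() == <some number>` only matches on the platform where that number happens to be right, which is how a handler can work on one OS and misbehave on another.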
Thanks a lot Ruben. I replaced io_service with io_context, and I need only a single serial port.

When you say:
- The way it handles cancellation is not ideal - it should wait until both async operations finish before returning or throwing the relevant exception.
which operations are you referring to? Read and Write? How can I achieve it?

Thanks!
On 06/06/2024 15:57 CEST, Ruben Perez wrote: Try re-creating the serial port object on timeout.
When you say:
- The way it handles cancellation is not ideal - it should wait until both async operations finish before returning or throwing the relevant exception.
which operations are you referring to? Read and Write?
When you call TimeoutSerial::read, two async operations are fired in parallel:

- A read operation (asio::async_read, in your case): https://github.com/fedetft/serial-port/blob/291a7997a665a52bf37d15c615679445...
- A timer wait (https://github.com/fedetft/serial-port/blob/291a7997a665a52bf37d15c615679445...)

The idea is that you run both in parallel, and when the first one finishes, the second one gets cancelled. However, even if you cancel an Asio async operation, its completion handler still gets called (with a cancelled error code). The problem is that TimeoutSerial::read, upon timeout, cancels the asio::async_read operation but does not wait for the handler to be called: https://github.com/fedetft/serial-port/blob/291a7997a665a52bf37d15c615679445...

These lines work around this, but the workaround does not take OSX into account: https://github.com/fedetft/serial-port/blob/291a7997a665a52bf37d15c615679445...

Writing this cleanly requires a big refactor of the class. For a quick workaround, you can try adding this line just after the ones I linked to:

    if (error == boost::asio::error::operation_aborted) return;

Ruben.
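To show what the quick workaround does to the handler logic, here is a rough sketch using only the standard library. It is not the actual TimeoutSerial code: `ReadResult` and `on_read_completed` are hypothetical names, the state order is only guessed from the numeric values in the log, and `std::errc::operation_canceled` stands in for `boost::asio::error::operation_aborted`.

```cpp
#include <cstddef>
#include <system_error>

// Assumed states; the names and order are a guess based on the logged values.
enum class ReadResult { InProgress, Success, Error, TimeoutExpired };

// Hypothetical stand-in for the read completion handler.
void on_read_completed(ReadResult& result, const std::error_code& ec,
                       std::size_t /*bytes_transferred*/) {
    // A cancelled read is the leftover of an earlier timeout: leave the
    // current state alone instead of reporting an error.
    if (ec == std::errc::operation_canceled) return;
    if (ec) {
        result = ReadResult::Error;
        return;
    }
    result = ReadResult::Success;
}
```

With the early return, a late cancellation notification no longer turns the next read into a spurious error; the caller simply keeps dispatching until the real completion or the timer arrives.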
Thanks a lot Ruben, now the process is clean. I'm filling the TimeoutSerial class with debug output (just to visualise the flow) and I discovered something:

- all the successful read calls go through the readStringUntil() function
- the failing call is the read(char*, int) function
- comparing these functions, I find that in the resultSuccess case of the read() function no data is read (inside readStringUntil() it is!), so the function cannot return any binary data.

But I'm wondering how the Windows version can work?! How could the read() function return any data?

Thanks!
Stefano
Hello, I need some more information about the serial port read + timeout, please. Referring to the TimeoutSerial class: from the read(char*,size_t) implementation I understand that it starts an async read and then waits on the timer. In the for loop, the function waits on io.run_one(); I guess that both the port and the timer are under the control of io (the io_context), through their constructors.

It is not clear to me what happens in some steps. At the beginning I only try to read something (just to empty the queue, if needed) with a 100 ms timeout. Then I send 2 chars (ESC ESC) to check for the presence of the device. My log is:

    [WriteCommand] Write nbytes: 0
    [WaitForResponse] ms 100
    [WaitForResponse] or pattern: ''
    RX ..
    [Tos] async_read_until ..
    [ToSerial][timeoutExpired] error system:0
    [Tos][read] exit from run_one result: 3 = resultTimeoutExpired
    [waitForResponse] catch TIMEOUT dt:0
    Send ESC ESC ..
    [WriteCommand] Write nbytes: 2
    [WaitForResponse] ms 2000
    [WaitForResponse] or pattern: '
    '
    RX ..
    [Tos] async_read_until ..
    [ToSerial][readCompleted] - error: system:995 - bytes: 0
    [Tos][read] exit from run_one result: 0
    [ToSerial][readCompleted] - error: system:0 - bytes: 210
    [readCompleted] success!
    [Tos][read] exit from run_one result: 1
    RX [207] : '[0m[?25l[2J[1;1H******************************** '
    [TP][WaitForResponse] Edn wait
    End ESC ESC !!

My question is: on the first read the timeout handler is correctly called; it sets the resultTimeoutExpired state and exits. run_one() exits with value 3 and the switch selects the case where it cancels the port and throws the timeout_exception. Why is the readCompleted function (aborted, 995) called much later, after the ESC ESC buffer has been sent and another async_read_until has been armed? Anyway, the reply is received ..

Thanks
Hello, I hope I have found a solution. The problem seems to be caused by a "random" call of the readCompleted() callback with error=0 and byteTransferred=0, while Wireshark shows me that the packet was received! At first I simply returned from the function, but then the TimeoutExpired arrived. I fixed the procedure by re-arming the async read when error==0 and bytes==0. Now I'm able to recover the packet and go on! I don't know why the callback function is called with bytes=0, but that's the way it is .. Thanks!
On Thu, 13 Jun 2024 at 12:18, Stefano Mora via Boost
I don't know why the callback function is called with bytes=0, but that's the way ..
In Asio, every time you call an async function, you will eventually get a completion callback. So every time you attempt a read, two callbacks will always get generated, one for the timer and one for the read operation. This is true even if one of the operations gets cancelled: the callback will contain an "asio::error::operation_aborted" error, but it will get called.

In case of timeout, your code only dispatches the event loop (calling run_one) until one of the callbacks gets called, then throws. The other callback still sits there, waiting to be dispatched. On the next run_one, it will get called, even though it belongs to the previous operation. This is the behavior you're seeing.

What your code is trying to do is best modelled in Asio as a parallel group. This is an experimental feature, but it should work fine for you. I've rewritten your TimeoutSerial class using parallel_group. You can find it in this gist: https://gist.github.com/anarthal/4e6ac7fdffc6af4bc14c9d64060d534b

I haven't tried it on a real serial port, but it should do the job.

Regards,
Ruben.
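The stale-callback effect described above can be reproduced with a toy model that uses only the standard library. This is a deliberately simplified sketch, not Asio and not the gist's code: `ToyLoop` and `timed_out_read` are hypothetical names, `ToyLoop::run_one` only loosely imitates `io_context::run_one`, and both handlers are queued up front, whereas in real Asio the cancelled read's handler only becomes ready after `cancel()` is called.

```cpp
#include <deque>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an io_context: a FIFO of ready completion handlers.
struct ToyLoop {
    std::deque<std::function<void()>> ready;

    void post(std::function<void()> handler) {
        ready.push_back(std::move(handler));
    }

    // Runs at most one handler; returns false if nothing was queued.
    bool run_one() {
        if (ready.empty()) return false;
        auto handler = std::move(ready.front());
        ready.pop_front();
        handler();
        return true;
    }
};

// Simulates one read-with-timeout in which the timer wins.
// With drain == false the function returns after the first handler, leaving
// the cancelled read's handler queued for whoever calls run_one() next.
void timed_out_read(ToyLoop& loop, std::vector<std::string>& log, bool drain) {
    auto pending = std::make_shared<int>(2);  // timer handler + read handler
    loop.post([&log, pending] { log.push_back("timer: expired"); --*pending; });
    loop.post([&log, pending] { log.push_back("read: aborted"); --*pending; });

    loop.run_one();  // the timer handler runs; the caller decides to time out
    while (drain && *pending > 0) {
        loop.run_one();  // also wait for the cancelled read to complete
    }
}
```

Calling `timed_out_read(loop, log, false)` and then `loop.run_one()` delivers "read: aborted" during what the caller thinks is the next operation, which matches the late readCompleted notification in the log. With `drain` set to true, both handlers are consumed before the function returns, which is the property a parallel group gives you.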
On 13/06/2024 20:15 CEST, Ruben Perez wrote: In Asio, every time you call an async function, you will eventually get a completion callback. So every time you attempt a read, two callbacks will always get generated, one for the timer, one for the read operation. This is true even if one of the operations gets cancelled - the callback will happen to contain an "asio::error::operation_aborted" error, but will get called.
Thanks a lot, Ruben. I think I have changed the code to wait for and align all the notifications. My last doubt is the 'error=0 and bytes=0' notification: as I said, I handled it by launching a new async_read. Thanks again, Stefano
Participants (2): Ruben Perez, Stefano Mora