Implemented support for recvmmsg / sendmmsg (on supported operating systems: Linux / *BSD / AIX / QNX)
As stated on the asio-users mailing list (
https://sourceforge.net/p/asio/mailman/asio-users/thread/CADfydx%2BDF1kvqchV...
), I wrote a patch that enables sendmmsg / recvmmsg on supported operating
systems (Linux, [Free/Net/Open]BSD, AIX, BlackBerry QNX Neutrino).
The syscalls are implemented in the reactive_socket_service backend (the
io_uring backend on Linux does not support sendmmsg / recvmmsg yet, so they
are not available when using ASIO's newer io_uring support).
config.hpp has macros to forcibly disable those calls and to detect
supported operating systems from compiler macros.
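As an illustration of the kind of detection described above, here is a minimal sketch. The EXAMPLE_* macro names and has_native_mmsg() are hypothetical (the names the patch actually uses in config.hpp are not shown in this thread); only the compiler-predefined platform macros (__linux__, __FreeBSD__, __NetBSD__, __OpenBSD__, _AIX, __QNXNTO__) are real.

```cpp
// Sketch only: EXAMPLE_DISABLE_MMSG and EXAMPLE_HAS_MMSG are made-up names,
// not the macros from the patch.

// A user could define EXAMPLE_DISABLE_MMSG before this point to force the
// portable one-datagram-per-call path even on a supported platform.
#if !defined(EXAMPLE_DISABLE_MMSG)
# if defined(__linux__) || defined(__FreeBSD__) || defined(__NetBSD__) \
  || defined(__OpenBSD__) || defined(_AIX) || defined(__QNXNTO__)
#  define EXAMPLE_HAS_MMSG 1
# endif
#endif

// Default to "not available" when the platform was not recognised or the
// feature was explicitly disabled.
#if !defined(EXAMPLE_HAS_MMSG)
# define EXAMPLE_HAS_MMSG 0
#endif

// Compile-time query that calling code can branch on.
constexpr bool has_native_mmsg() { return EXAMPLE_HAS_MMSG != 0; }
```

Calling code can then select the batched path with `if (has_native_mmsg())` or with an `#if EXAMPLE_HAS_MMSG` block, and fall back to single-datagram calls otherwise.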
New methods were added for send / send_to / receive / receive_from, suffixed
with '_multiple_buffer_sequence', that accept a multiple_buffer_sequence
class (backed by asio::detail::array
On 14/01/2023 18:41, Virgilio Fornazin via Boost wrote:
I finished the first version of the patch. It is able to send and receive 1 million packets/second at the same time on my box (an Intel i9-9900K running Ubuntu 22.04 LTS with the low-latency kernel), using packet sizes of 64 / 1430 bytes, without packet loss (using 64 MB UDP socket buffers).
A good example of why support for one million scatter-gather buffer lists is a useful thing.
Did you consider implementing the same support for Windows? It's more work than it would be on Linux as you'd need to extend the ASIO IOCP reactor with RIO, but you'd get similar performance.
Niall
Hi Niall
ASIO development has started some work on 'registered buffers' using io_uring (on Linux). Windows RIO support requires certain thread semantics, and it must be checked whether they are compatible with the ASIO model. For standard IOCP mode, WSASendMsg is analogous to sendmsg, and it is already implemented in ASIO.
My patch was developed because I need to implement a high-volume UDP service and I reached the limit of sendto/recvfrom calls per CPU core per second on Linux (about 35k calls/datagrams on my box). With sendmmsg / recvmmsg, I easily reach 200k datagrams/second while performing about 6k sendmmsg/recvmmsg calls on the same hardware. Using SO_REUSEPORT with 12 sockets, without sendmmsg / recvmmsg, I reached about 190k datagrams/s using 13 cores. Using 1 receiver and 5 senders, I reached 1 million datagrams/second (with a peak of 12.5 gigabit/second, without any packet loss, using 64 MB send/receive buffers). To reach this kind of performance, very specialized programs are needed: busy-polling the UDP sockets, pinning each socket's process to a CPU core, and pinning the RSS queue to the same CPU core (you can read some material on that on Cloudflare's blog, https://blog.cloudflare.com/how-to-receive-a-million-packets ... it's not a trivial task).
recvmmsg / sendmmsg also support TCP on some OSes, so they raise I/O performance to another level.
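The figures above (about 200k datagrams over about 6k calls, so roughly 33 datagrams per syscall) come from batching. The following is a minimal, Linux-only sketch of that batching using the raw sendmmsg() / recvmmsg() syscalls over loopback. It does not use the patch's ASIO API, which is not shown in this thread, and roundtrip_batch is a hypothetical name chosen for the example.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // sendmmsg/recvmmsg are GNU/Linux extensions
#endif
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/uio.h>
#include <unistd.h>

#include <cassert>
#include <string>
#include <vector>

// Sends every payload with ONE sendmmsg() call over loopback UDP, then drains
// them with recvmmsg(). Returns the number of datagrams received, or -1 on a
// setup or send error.
int roundtrip_batch(const std::vector<std::string>& payloads)
{
  if (payloads.empty()) return 0;
  const unsigned int n = static_cast<unsigned int>(payloads.size());

  int rx = ::socket(AF_INET, SOCK_DGRAM, 0);
  int tx = ::socket(AF_INET, SOCK_DGRAM, 0);
  auto close_both = [&] {
    if (rx >= 0) ::close(rx);
    if (tx >= 0) ::close(tx);
  };
  if (rx < 0 || tx < 0) { close_both(); return -1; }

  // Bind the receiver to 127.0.0.1 on a kernel-chosen port and read it back.
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0;
  socklen_t addr_len = sizeof(addr);
  timeval timeout{1, 0};  // 1 s receive timeout so the sketch never hangs
  if (::bind(rx, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
      ::getsockname(rx, reinterpret_cast<sockaddr*>(&addr), &addr_len) != 0 ||
      ::setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof(timeout)) != 0) {
    close_both();
    return -1;
  }

  // One iovec and one mmsghdr per outgoing datagram.
  std::vector<iovec> out_iov(n);
  std::vector<mmsghdr> out(n);  // value-initialised, i.e. zeroed
  for (unsigned int i = 0; i < n; ++i) {
    out_iov[i].iov_base = const_cast<char*>(payloads[i].data());
    out_iov[i].iov_len = payloads[i].size();
    out[i].msg_hdr.msg_name = &addr;
    out[i].msg_hdr.msg_namelen = sizeof(addr);
    out[i].msg_hdr.msg_iov = &out_iov[i];
    out[i].msg_hdr.msg_iovlen = 1;
  }
  const int sent = ::sendmmsg(tx, out.data(), n, 0);  // may send fewer than n
  if (sent < 0) { close_both(); return -1; }

  // Matching receive slots, each with its own 2 KiB buffer.
  std::vector<std::vector<char>> bufs(n, std::vector<char>(2048));
  std::vector<iovec> in_iov(n);
  std::vector<mmsghdr> in(n);
  for (unsigned int i = 0; i < n; ++i) {
    in_iov[i].iov_base = bufs[i].data();
    in_iov[i].iov_len = bufs[i].size();
    in[i].msg_hdr.msg_iov = &in_iov[i];
    in[i].msg_hdr.msg_iovlen = 1;
  }

  // MSG_WAITFORONE: block for the first datagram, then take whatever else is
  // already queued without blocking again.
  int received = 0;
  while (received < sent) {
    const int r = ::recvmmsg(rx, in.data() + received,
                             static_cast<unsigned int>(sent - received),
                             MSG_WAITFORONE, nullptr);
    if (r <= 0) break;  // timeout or error: report what arrived so far
    received += r;
  }
  assert(received <= sent);
  close_both();
  return received;
}
```

For example, `roundtrip_batch(std::vector<std::string>(32, std::string(64, 'x')))` pushes 32 datagrams of 64 bytes through a single sendmmsg() call. A real high-rate service would keep the sockets open, reuse the mmsghdr arrays across calls, and enlarge the socket buffers as described above.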
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Firstly, you shouldn't top post on boost-dev.
Absolutely does recvmmsg / sendmmsg hugely improve performance if you're working with many tiny messages. But perhaps you missed my point: a patch adding support to ASIO looks a lot more attractive if it isn't Linux exclusive.
For example, FreeBSD has those APIs, but they don't work the same as on Linux. Mac OS has recvmsg_x and sendmsg_x, and those work differently again.
Finally, as mentioned, Windows also has support, but it's different yet again.
I'm not the ASIO maintainer, so it's up to Chris K whether your patch would be accepted or not. But I can tell you a patch with wide platform support and a good test suite would be the ideal, whereas a single-platform patch I, personally speaking, wouldn't find compelling.
Niall
Niall
The patch supports more than Linux. Currently it supports:
- Linux
- Free/Net/Open BSD
- AIX (7.2+)
- QNX Neutrino (7+)
... and it is now implemented and tested on Mac OS X using sendmsg_x / recvmsg_x (thanks for your tip).
(About the top post, I didn't mean to cause any problems here. Sorry for that.)
Note: about Windows Registered I/O support (as you suggested), I could later work on a new patch to support RIO using the new asio registered_buffer support (if no one is already working on it), but it is outside the scope of this implementation, since the semantics are very different.
The _multiple_receive_buffers calls are 'emulated': sending is done with composed (async_)send(_to) calls, and receiving is implemented with (async_)receive(_from), which fills only the first buffer sequence. Since it uses the underlying (async_)receive(_from), there is no performance penalty, but efficiency is not optimal, although it still works.
The _multiple_receive_buffers calls are emulated as described above ONLY when the target does not support native sendmmsg/recvmmsg (or sendmsg_x/recvmsg_x).
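A minimal sketch of what such an emulation could look like on a target without native batching, assuming nothing about the patch's real code. The function names are hypothetical, the per-datagram operations are passed in as callables that stand in for a socket's send_to / receive_from, and stopping at the first failed send is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Emulated "send multiple": one single-datagram send per element.
// send_one is any callable taking a const std::string& and returning true on
// success. Stops at the first failure (an assumption of this sketch) and
// returns how many datagrams were sent, similar to a partial batched send.
template <typename SendOne>
std::size_t emulated_send_multiple(const std::vector<std::string>& datagrams,
                                   SendOne send_one)
{
  std::size_t sent = 0;
  for (const std::string& datagram : datagrams) {
    if (!send_one(datagram)) break;
    ++sent;
  }
  assert(sent <= datagrams.size());
  return sent;
}

// Emulated "receive multiple": a single receive that fills only the first
// slot, however many slots the caller provided. Returns 0 or 1.
template <typename ReceiveOne>
std::size_t emulated_receive_multiple(std::vector<std::string>& slots,
                                      ReceiveOne receive_one)
{
  if (slots.empty()) return 0;
  return receive_one(slots.front()) ? 1 : 0;
}
```

The caller sees the same batch-shaped interface either way. On an emulated target each datagram still costs one underlying call, so nothing is lost compared with calling send_to / receive_from directly, but none of the batching gain is obtained either.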
Cool.
Personally speaking that level of platform support I would find compelling for a merge. You may wish to remix your patch suitable for standalone ASIO and submit it as a PR there (Boost.ASIO is generated by script from standalone ASIO).
Good luck with persuading Chris K to merge it, and thanks for your contribution!
Niall
Hi Niall
I've implemented it in pure (standalone) ASIO, Chris's version. The 'Boostification' process should take care of the rest.
participants (2)
- Niall Douglas
- Virgilio Fornazin