On Tue, Jul 16, 2024 at 1:49 PM Niall Douglas via Boost <boost@lists.boost.org> wrote:
That's exactly what S&R delivers!
WG21 S&R has very severe template bloat. Some people see compile times reminiscent of Boost at its worst in the late 2000s. But non-WG21 S&R can be implemented in a much lighter weight way. I made mine ABI stable, and that forces most of the template bloat to not exist.
Ha ha, this actually makes me even _more_ hesitant to adopt it. For now, I think the simple coroutines-only scheme I have now is sufficient. The code can always theoretically be altered later to support different async schemes.
You're not using the linked op timeout feature of io_uring?
It's a bit expensive TBH. I've 'cheated' and set a timeout directly on the socket itself so it errors out after a while. This is nasty, but fast :)
I use it in places. I use it for controlling connect() timeouts with TCP sockets. I'm not sure there are any other spots. But yes, it is quite expensive. For sends and receives, I instead have a multishot timeout operation that's created when the `tcp::stream` object is constructed. This timer automatically posts a CQE periodically, which I then use to check activity on the TCP stream. So if it's in the middle of an initiated send() operation, I can check its last activity and, if nothing has happened, cancel the operation. In benchmarks, I didn't really notice a difference when I toggled this functionality on or off, so it's relatively lightweight for "realistic" cases.
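To make the periodic check concrete, here is a minimal sketch of the idle-detection logic only, assuming a steady clock and a fixed idle limit. The names `StreamActivity`, `note_progress` and `should_cancel` are hypothetical and are not Fiona's actual types; the io_uring timer and the cancellation request are left out entirely.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical per-stream bookkeeping (illustration only, not Fiona's types).
struct StreamActivity {
    Clock::time_point last_activity{};  // updated whenever the stream makes progress
    bool op_in_flight = false;          // true while a send()/recv() is outstanding
};

// Record progress, e.g. from a send or recv completion.
inline void note_progress(StreamActivity& s, Clock::time_point now) {
    s.last_activity = now;
}

// Meant to be called each time the periodic timer completion fires.
// Returns true when an outstanding operation has been idle for at least
// idle_limit and should therefore be cancelled.
inline bool should_cancel(const StreamActivity& s,
                          Clock::time_point now,
                          Clock::duration idle_limit) {
    return s.op_in_flight && (now - s.last_activity) >= idle_limit;
}
```

The point of this shape is that the hot path only writes a timestamp; all the comparison work happens on the timer tick, which would be consistent with the feature barely registering in benchmarks.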
That plus the DMA registered buffers support. ASIO could support the older form which didn't deliver much speedup, but the new form where io_uring/the NIC allocates the receive buffers for you ... it's Windows RIO levels of fast. I certainly can saturate a 40 Gbps NIC from a single kernel thread without much effort now, and 100 Gbps NIC if you can keep the i/o granularity big enough. That was expensive Mellanox userspace TCP type performance a few years ago.
I'm not sure I know what you're talking about here, being honest. I know io_uring has registered buffers for file I/O, and I know that you can also use a provided buffers API for multishot recv() and multishot read() (i.e. `io_uring_register_buffers()` and `io_uring_setup_buf_ring()`).

This is confusing to me because these two functions don't really allocate. _You_ allocate and then register the buffers with the ring. So I'm curious about this NIC allocating a receive buffer for me here.

Fwiw, Fiona does actually use multishot TCP recv(), so it does use the buf_ring stuff. This has interesting API implications because in the epoll world, users are accustomed to:

    co_await socket.async_recv(my_buffer);

But in Fiona, you instead have:

    auto m_buf_sequence = co_await socket.async_recv();
    return std::move(m_buf_sequence).value();

Ownership of the buffers is inverted here, which actually turns out to be quite the API break.

Once I get the code into better shape, I'd like to start shilling it but who knows if it'll ever catch on.

- Christian
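To illustrate the ownership inversion described above, here is a rough sketch that contrasts the two styles without coroutines or io_uring. Everything in it (`recv_into`, `BufferPool`, `ReceivedBuffer`, `recv_owned`) is a hypothetical stand-in, not Fiona's API; the "wire" data is just a string, and handing a buffer back to the ring is modelled by a destructor.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string_view>
#include <utility>
#include <vector>

// Caller-owned style (what epoll users expect): the caller supplies storage.
inline std::size_t recv_into(char* dst, std::size_t cap, std::string_view wire) {
    std::size_t n = std::min(cap, wire.size());
    std::memcpy(dst, wire.data(), n);
    return n;
}

// Library-owned style: a pool stands in for the registered buffer ring.
class BufferPool {
public:
    BufferPool(std::size_t count, std::size_t size)
        : free_(count, std::vector<char>(size)) {}
    std::size_t available() const { return free_.size(); }
    // Precondition: available() > 0.
    std::vector<char> take() {
        std::vector<char> b = std::move(free_.back());
        free_.pop_back();
        return b;
    }
    void give(std::vector<char> b) { free_.push_back(std::move(b)); }
private:
    std::vector<std::vector<char>> free_;
};

// Move-only handle returned to the user; recycles its buffer on destruction.
class ReceivedBuffer {
public:
    ReceivedBuffer(BufferPool& pool, std::vector<char> storage, std::size_t len)
        : pool_(&pool), storage_(std::move(storage)), len_(len) {}
    ReceivedBuffer(ReceivedBuffer&& o) noexcept
        : pool_(o.pool_), storage_(std::move(o.storage_)), len_(o.len_) {
        o.pool_ = nullptr;  // moved-from handle no longer owns a buffer
    }
    ReceivedBuffer(const ReceivedBuffer&) = delete;
    ReceivedBuffer& operator=(const ReceivedBuffer&) = delete;
    ReceivedBuffer& operator=(ReceivedBuffer&&) = delete;
    ~ReceivedBuffer() {
        if (pool_) pool_->give(std::move(storage_));
    }
    std::string_view view() const { return {storage_.data(), len_}; }
private:
    BufferPool* pool_;
    std::vector<char> storage_;
    std::size_t len_;
};

// The library picks the buffer; the user only learns about it afterwards.
inline ReceivedBuffer recv_owned(BufferPool& pool, std::string_view wire) {
    std::vector<char> storage = pool.take();
    std::size_t n = std::min(storage.size(), wire.size());
    std::memcpy(storage.data(), wire.data(), n);
    return ReceivedBuffer(pool, std::move(storage), n);
}
```

The API break shows up in the lifetimes: with `recv_into` the user decides where the bytes live, while with `recv_owned` the user has to hold on to (or copy out of) a handle whose buffer goes back to the pool as soon as the handle is destroyed.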