Re: [boost] Reimplementing ASIOs

17 Jul 2024

On Tue, Jul 16, 2024 at 1:49 PM Niall Douglas via Boost
<boost@lists.boost.org> wrote:
...
That's exactly what S&R delivers!
WG21 S&R has very severe template bloat. Some people see compile times
reminiscent of Boost at its worst in the late 2000s. But non-WG21 S&R
can be implemented in a much lighter weight way. I made mine ABI stable,
and that forces most of the template bloat to not exist.
Ha ha, this actually makes me even _more_ hesitant to adopt it. For now,
I think the simple coroutines-only scheme I have now is sufficient. The
code can always theoretically be altered later to support different
async schemes.
...
You're not using the linked op timeout feature of io_uring?
It's a bit expensive TBH. I've 'cheated' and set a timeout directly on
the socket itself so it errors out after a while. This is nasty, but
fast :)
I use it in places. I use it for controlling connect() timeouts with TCP
sockets.

I'm not sure there are any other spots. But yes, it is quite expensive.

For sends and receives, I instead have a multishot timeout operation
that's created when the `tcp::stream` class is. This timer automatically
posts a CQE periodically, which I then use to check activity on the TCP
stream. So if it's in the middle of an initiated send() operation, I can
check its last activity and, if nothing has happened, cancel the
operation.
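
To make the bookkeeping concrete, here is a minimal sketch of the
per-stream activity check. It models only the decision logic, not
io_uring itself: `stream_activity`, `touch()`, `done()` and
`should_cancel()` are hypothetical names made up for illustration, not
Fiona's actual code. The timestamps are passed in explicitly so the logic
is easy to test.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical names throughout; this is not Fiona's real implementation.
using clock_type = std::chrono::steady_clock;

struct stream_activity {
  clock_type::time_point last_activity{};
  bool op_in_flight = false;

  // Call when a send()/recv() is initiated or makes progress.
  void touch(clock_type::time_point now) {
    last_activity = now;
    op_in_flight = true;
  }

  // Call when the operation completes or has been cancelled.
  void done() { op_in_flight = false; }

  // Call on each periodic timer completion. Returns true when an
  // operation is in flight and has shown no activity for at least
  // `timeout`, i.e. the caller should cancel it.
  bool should_cancel(clock_type::time_point now,
                     clock_type::duration timeout) const {
    return op_in_flight && (now - last_activity) >= timeout;
  }
};
```

The idea is that one periodic tick drives this check, so no extra timeout
has to be attached to each individual send or receive.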

In benchmarks, I actually didn't really notice a difference when I
toggled this functionality in or out, so it's relatively lightweight for
"realistic" cases.
...
That plus the DMA registered buffers support. ASIO could support the
older form which didn't deliver much speedup, but the new form where
io_uring/the NIC allocates the receive buffers for you ... it's Windows
RIO levels of fast. I certainly can saturate a 40 Gbps NIC from a single
kernel thread without much effort now, and 100 Gbps NIC if you can keep
the i/o granularity big enough. That was expensive Mellanox userspace
TCP type performance a few years ago.
I'm not sure I know what you're talking about here, being honest. I know
io_uring has registered buffers for file I/O and I know that you can also
use a provided buffers API for multishot recv() and multishot read()
(i.e. `io_uring_register_buffers()` and `io_uring_buf_ring_setup()`).

This is confusing to me because these two functions don't really
allocate. _You_ allocate and then register them with the ring. So I'm
curious about this NIC allocating a receive buffer for me here.
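
As a toy model of what I mean by "you allocate, then register": in the
sketch below the caller owns the storage and the ring only ever holds
views into it. This is not liburing code. `toy_buf_ring`, `buf_view` and
`fill_ring()` are hypothetical names for illustration, and `pick()`
merely stands in for the kernel selecting a provided buffer for a
completion.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <utility>
#include <vector>

// A non-owning view of caller-allocated memory (hypothetical type).
struct buf_view {
  unsigned char* data;
  std::size_t len;
};

// Toy stand-in for a provided-buffer ring. It stores (id, view) pairs
// and never allocates or frees the buffer memory itself.
class toy_buf_ring {
  std::deque<std::pair<unsigned, buf_view>> avail_;

public:
  // The caller hands over a view of memory it allocated, tagged with an id.
  void provide(unsigned bid, buf_view v) { avail_.emplace_back(bid, v); }

  // Models a completion: the next provided buffer is picked and its id
  // is reported back. Returns nullopt when no buffers are available.
  std::optional<std::pair<unsigned, buf_view>> pick() {
    if (avail_.empty()) return std::nullopt;
    auto front = avail_.front();
    avail_.pop_front();
    return front;
  }

  std::size_t available() const { return avail_.size(); }
};

// Carves caller-owned `storage` into `buf_len`-sized buffers and
// provides each one to the ring.
inline void fill_ring(toy_buf_ring& ring, std::vector<unsigned char>& storage,
                      std::size_t buf_len) {
  assert(buf_len > 0);
  unsigned bid = 0;
  for (std::size_t off = 0; off + buf_len <= storage.size(); off += buf_len) {
    ring.provide(bid++, buf_view{storage.data() + off, buf_len});
  }
}
```

Once a picked buffer has been consumed, the caller has to `provide()` it
again, and `storage` must outlive the ring.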

Fwiw, Fiona does actually use multishot TCP recv(), so it does use the
buf_ring stuff. This has interesting API implications because in the
epoll world, users are accustomed to:

    co_await socket.async_recv(my_buffer);

But in Fiona, you instead have:

    auto m_buf_sequence = co_await socket.async_recv();
    return std::move(m_buf_sequence).value();

Ownership of the buffers is inverted here, which actually turns out to be
quite the API break.
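
To spell out the two ownership models side by side, here is a small
synchronous sketch. Both functions are hypothetical stand-ins (neither is
Asio's nor Fiona's real API), and the "wire" is just a string so the
example stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Caller-owned style: the caller supplies the storage up front and the
// call fills it, returning the number of bytes written.
inline std::size_t recv_into(const std::string& wire, char* buf,
                             std::size_t cap) {
  std::size_t n = wire.size() < cap ? wire.size() : cap;
  std::memcpy(buf, wire.data(), n);
  return n;
}

// Library-owned style: the call returns a move-only buffer that the
// library chose, and the caller takes ownership of the result.
class owned_buffer {
  std::vector<char> bytes_;

public:
  explicit owned_buffer(std::vector<char> b) : bytes_(std::move(b)) {}
  owned_buffer(owned_buffer&&) = default;
  owned_buffer& operator=(owned_buffer&&) = default;
  owned_buffer(const owned_buffer&) = delete;
  owned_buffer& operator=(const owned_buffer&) = delete;

  const char* data() const { return bytes_.data(); }
  std::size_t size() const { return bytes_.size(); }
};

inline owned_buffer recv_owned(const std::string& wire) {
  return owned_buffer(std::vector<char>(wire.begin(), wire.end()));
}
```

With `recv_into()` the caller decides the capacity and the data is
truncated to fit. With `recv_owned()` the size is whatever arrived, and
existing code that passes its own buffer in has to be restructured around
the returned object.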

Once I get the code into better shape, I'd like to start shilling it but
who knows if it'll ever catch on.

- Christian

Christian Mazakas