On Thu, Jul 18, 2024 at 2:47 PM Niall Douglas via Boost <boost@lists.boost.org> wrote:
> Instead of over-allocating and wasting a page, I would put the link pointers at the end and slightly reduce the maximum size of the i/o buffer. This kinda is annoying to look at because the max buffer fill is no longer a power of two, but in terms of efficiency it's the right call.
Hey, this is actually a good idea. I had similar thoughts when I was designing it. I can give benchmarking it a shot and see what the results are. What kind of benchmark do you think would be the best test here? I suppose one thing I should try is a multishot recv benchmark with many small buffers and a large amount of traffic to send. Probably just max out the size of a buf_ring, which is only like 32k buffers anyway. Ooh, we can even try page-aligning the buffers too.
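Roughly, the layout I'd benchmark is something like this (just a sketch; the names and the 4 KiB / 64-bit assumptions are illustrative):

    // sketch: keep each i/o buffer plus its intrusive link in one page,
    // so nothing is over-allocated and no page is wasted on the links.
    // assumes 64-bit pointers and 4 KiB pages.
    #include <cstddef>

    inline constexpr std::size_t page_size = 4096;

    struct buf_hook
    {
        buf_hook* next = nullptr;
        buf_hook* prev = nullptr;
    };

    struct io_buf
    {
        // max fill is no longer a power of two, as noted above
        static constexpr std::size_t capacity = page_size - sizeof(buf_hook);

        unsigned char data[capacity];
        buf_hook      hook; // link pointers live at the tail of the page
    };

    static_assert(sizeof(io_buf) == page_size);

The benchmark would then just compare this against the current over-allocating layout.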
> Surely for reading you want io_uring to tell you the buffers, and when you're done, you immediately push them back to io_uring? So no need to keep buffer lists except for the write buffers?
You'd think so, but there's no such thing as a free lunch.

When it comes to borrowing the buffers, to do any meaningful work you'd have to either allocate and memcpy the incoming buffers so you can immediately release them back to the ring, or you risk buffer starvation. This is because not all protocol libraries copy their input; some require the caller to provide stable storage. Beast is like this and I think zlib is too. There's simply no guarantee across protocol libraries that they'll reliably copy your input for you.

The scheme I chose is one where users own the returned buffer sequence, and this enables nice things like in-place TLS decryption, which I use via Botan. This reminds me: I use Botan in order to provide a generally much stronger TLS interface than Asio's.

I've experimented with routines that recycle the owned buffers but honestly, it's faster to just re-allocate the holes in the buf_ring in `recv_awaitable::await_resume()`. Benchmarks show a small hit to perf, but I think it's an acceptable trade-off here as I now have properly working TLS/TCP streams, which is kind of all that matters.
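The hole re-allocation amounts to roughly the following (a sketch against liburing's provided-buffer API; the function and parameter names are illustrative, not the library's actual internals):

    // sketch: after handing N completed buffers to the caller, plug the N
    // holes in the buf_ring with freshly allocated buffers so the kernel
    // never starves for provided buffers
    #include <liburing.h>
    #include <vector>

    void refill_holes(io_uring_buf_ring* br,
                      std::vector<unsigned short> const& completed_bids,
                      unsigned num_bufs, unsigned buf_size)
    {
        int const mask = io_uring_buf_ring_mask(num_bufs);
        int offset = 0;
        for (auto bid : completed_bids) {
            auto* p = new unsigned char[buf_size]; // re-allocate the hole
            io_uring_buf_ring_add(br, p, buf_size, bid, mask, offset++);
        }
        // publish the new buffers to the kernel in one shot
        io_uring_buf_ring_advance(br, offset);
    }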
On Thu, Jul 18, 2024 at 4:28 PM Virgilio Fornazin via Boost <boost@lists.boost.org> wrote:

> The linux kernel code for sendmmsg/recvmmsg is just a for loop, and avoiding the cost of a syscall traversing from ring 3 to ring 0 (ring 1 when virtualized) is something that really pays off in high-performance UDP networking. If you consider something like this, it would be a big win for high packet I/O use in UDP.
As Niall previously noted, you don't need recvmmsg() with io_uring. The point of recvmmsg() was to avoid syscall overhead, which io_uring already solves via bulk submission and bulk reaping of completions, and then via multishot recvmsg(). Multishot recvmsg() will definitely be fast enough, I confidently say while measuring nothing.

I was torn after completing an MVP of TLS/TCP: do I add UDP or file I/O? Unfortunately, I chose file I/O, because what's the point of an io_uring runtime if it doesn't even offer async file I/O? This conversation makes me realize that I should've just chosen UDP lol.
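For reference, arming a single multishot recv against a provided-buffer group looks roughly like this (a sketch of plain liburing usage, not my library's API; error handling elided):

    // sketch: one SQE, many CQEs -- the kernel keeps posting completions
    // until the request is cancelled or the buffer group runs dry
    #include <liburing.h>

    void arm_multishot_recv(io_uring* ring, int sockfd, unsigned short bgid)
    {
        io_uring_sqe* sqe = io_uring_get_sqe(ring);

        io_uring_prep_recv_multishot(sqe, sockfd, nullptr, 0, 0);

        // pull each completion's buffer from the registered buf_ring group
        io_uring_sqe_set_flags(sqe, IOSQE_BUFFER_SELECT);
        sqe->buf_group = bgid;

        io_uring_submit(ring);
    }

- Christian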