Boost Foundation, Boost, and the Beman Project
Hi all,

I recently brought up (by e-mail) the subject of the relationship between the Boost Foundation and the Beman Project before the BF board. I expressed the opinion that, since the Boost Foundation's primary purpose is to support Boost, the Beman Project should eventually have its own backing entity (foundation/nonprofit), instead of the existing Boost Foundation supporting both.

David Sankel responded, quoting the Boost Foundation's mission statement:

"The Boost Foundation’s broad C++ mission is: (a) development of high quality, expert reviewed, legally unencumbered, open-source libraries, (b) inspiring standard enhancements, and (c) advancing and disseminating software development best practices. It does this by fostering community engagement, nurturing leaders, providing necessary financial/legal support, and making directional decisions in the event of Boost community deadlock."

and helpfully pointed out that it very much doesn't say that supporting Boost is the primary purpose of the Foundation. And indeed, if one pays attention to the above, one would notice that Boost only appears once, at the end, somewhat incidentally, and as an afterthought.

Unsurprisingly, I disagree. I think that the primary purpose of an entity named "the Boost Foundation" should be to support Boost, and if it currently isn't, something is not quite right.

At the moment I'm not proposing anything yet; this is purely informative. But no matter how I look at it, I see a pretty fundamental difference of opinion, which we'll need to deal with at some point.
On Thu, Jul 11, 2024 at 5:29 PM Peter Dimov via Boost
Hi all,
[...]
Unsurprisingly, I disagree. I think that the primary purpose of an entity named "the Boost Foundation" should be to support Boost, and if it currently isn't, something is not quite right.
I think I am not quite understanding the history between the Boost Foundation and Boost. Wasn't there a Boost Steering Committee? How does that connect to the founding of Boost? Who incorporated what, when?
On 11/07/2024 10:28, Peter Dimov via Boost wrote:
I recently brought up (by e-mail) the subject of the relationship between the Boost Foundation and the Beman Project before the BF board.
I expressed the opinion that, since the Boost Foundation's primary purpose is to support Boost, the Beman Project should eventually have its own backing entity (foundation/nonprofit), instead of the existing Boost Foundation supporting both.
David Sankel responded, quoting the Boost Foundation's mission statement:
"The Boost Foundation’s broad C++ mission is: (a) development of high quality, expert reviewed, legally unencumbered, open-source libraries, (b) inspiring standard enhancements, and (c) advancing and disseminating software development best practices. It does this by fostering community engagement, nurturing leaders, providing necessary financial/legal support, and making directional decisions in the event of Boost community deadlock."
and helpfully pointed out that it very much doesn't say that supporting Boost is the primary purpose of the Foundation.
And indeed, if one pays attention to the above, one would notice that Boost only appears once, at the end, somewhat incidentally, and as an afterthought.
Unsurprisingly, I disagree. I think that the primary purpose of an entity named "the Boost Foundation" should be to support Boost, and if it currently isn't, something is not quite right.
Historically, the entity which preceded the Boost Foundation got most of its money from C++ Now, which was formerly BoostCon. Back when it was BoostCon, it could be strongly argued that the monies involved were generated by Boost. Since the name change, that has been harder to argue, and since covid emptied coffers everywhere, any monies generated since are even less Boost-involved.

We have always struggled with spending that money. The purse strings have been open for spending on infrastructure (e.g. Boost test regression servers) and students (e.g. Louis Dionne), but not historically on people. It therefore mounted and mounted, unspent. I mainly agreed with not spending on devs, but I did not agree with refusing to spend on maintenance and infrastructure, and probably on docs. We certainly could have done with paid workers on those, and it wasn't like we couldn't afford it.

This left open a resourcing gap which the C++ Alliance has filled, as has been very evident from all the posts answering "A question for folks here". I think myself, Robert Ramey and John Maddock are just about the only library devs left still active who **haven't** had income from the C++ Alliance in the past. Though perhaps it is just that I can't think of others at the moment.

In any case, the C++ Alliance has spent its way into being a viable contender for taking over Boost, and I think the former steering group could have handled its rise better, especially around optics and communication, neither of which were ever its strong points. It certainly could have looked harder at its past inertia, and chosen to take more risks. All this didn't need to come out as it has, but there's no point looking backwards. We should look forwards.
At the moment I'm not proposing anything yet; this is purely informative. But no matter how I look at it, I see a pretty fundamental difference of opinion, which we'll need to deal with at some point.
I think you're seeing the Beman Project as a Boost 2.0, or Boost replacement. If one saw it instead as preparing the ground for reforming the shit show which is WG21 library standardisation, then it would be complementary.

Boost's very own founders had first-hand experience of the shit show which is WG21 library standardisation. It could be argued that Dave left C++ over it, and Beman holds the record for the longest and hardest library standardisation process ever at WG21. I think Boost has - for extremely good reasons given the evidence - stopped trying at WG21. Therefore a new org focused on library standardisation at WG21 seems to me appropriate at this time.

If one chooses to see things as I do, then I see no issue with the Boost Foundation continuing its role for Boost and the Beman Project. Separate things. I also see no issue with leaving everything absolutely as it currently is going forward, either.

Niall
On Thu, Jul 11, 2024 at 10:13 AM Niall Douglas via Boost
On 11/07/2024 10:28, Peter Dimov via Boost wrote:
At the moment I'm not proposing anything yet; this is purely informative. But no matter how I look at it, I see a pretty fundamental difference of opinion, which we'll need to deal with at some point.
I think you're seeing the Beman Project as a Boost 2.0, or Boost replacement.
If one saw it instead as preparing the ground for reforming the shit show which is WG21 library standardisation, then it would be complementary.
Boost's very own founders had first-hand experience of the shit show which is WG21 library standardisation. It could be argued that Dave left C++ over it, and Beman holds the record for the longest and hardest library standardisation process ever at WG21. I think Boost has - for extremely good reasons given the evidence - stopped trying at WG21.
I had the opportunity to chat with one of the contributors to the first release of Boost libraries at the last WG21 meeting. And my impression of the rationale for the founding of Boost now differs from what appears to be the popular understanding. Boost originally didn't specifically aim to be an avenue for libraries to be adopted into the C++ Standard. The aim was to collect and distribute quality libraries aimed at general C++ developers on a web site. Everything else was just happenstance.

--
-- René Ferdinand Rivera Morell
-- Don't Assume Anything
-- No Supone Nada
-- Robot Dreams - http://robot-dreams.net
On 11/07/2024 16:29, René Ferdinand Rivera Morell wrote:
On Thu, Jul 11, 2024 at 10:13 AM Niall Douglas via Boost
wrote:
On 11/07/2024 10:28, Peter Dimov via Boost wrote:
At the moment I'm not proposing anything yet; this is purely informative. But no matter how I look at it, I see a pretty fundamental difference of opinion, which we'll need to deal with at some point.
I think you're seeing the Beman Project as a Boost 2.0, or Boost replacement.
If one saw it instead as preparing the ground for reforming the shit show which is WG21 library standardisation, then it would be complementary.
Boost's very own founders had first-hand experience of the shit show which is WG21 library standardisation. It could be argued that Dave left C++ over it, and Beman holds the record for the longest and hardest library standardisation process ever at WG21. I think Boost has - for extremely good reasons given the evidence - stopped trying at WG21.
I had the opportunity to chat with one of the contributors to the first release of Boost libraries at the last WG21 meeting. And my impression of the rationale for the founding of Boost now differs from what appears to be the popular understanding. Boost originally didn't specifically aim to be an avenue for libraries to be adopted into the C++ Standard. The aim was to collect and distribute quality libraries aimed at general C++ developers on a web site. Everything else was just happenstance.
It could indeed be very fairly argued that WG21 did, for a period, prefer to standardise from Boost. And then it stopped doing that.

My main recollection of early Boost was that it was principally a collection of evil hacks and workarounds for C++ compilers being terrible. A "C++ compiler portability layer", as it were. After that came collections of useful algorithms etc., but TBH most shops had their own local algorithm libraries, so that part was less useful. Then came the rise of "killer apps" for Boost, of which smart pointers, networking, parsing and Python integration were definite drivers, as it was easier to use Boost's stuff than to locally reinvent.

That gap has reopened since. My last two jobs have seen me reimplementing ASIO several times over now, as that's what the customer wants.

Niall
On 11/07/2024 at 17:41, Niall Douglas via Boost wrote:
On 11/07/2024 16:29, René Ferdinand Rivera Morell wrote:
I had the opportunity to chat with one of the contributors to the first release of Boost libraries at the last WG21 meeting. And my impression of the rationale for the founding of Boost now differs from what appears to be the popular understanding. Boost originally didn't specifically aim to be an avenue for libraries to be adopted into the C++ Standard. The aim was to collect and distribute quality libraries aimed at general C++ developers on a web site. Everything else was just happenstance.
It could indeed be very fairly argued that WG21 did, for a period, prefer to standardise from Boost. And then it stopped doing that.
Beman's original rationale is available for everyone: https://www.boost.org/users/proposal.pdf

------
"A world-wide web site containing a repository of free C++ class libraries would be of great benefit to the C++ community. Although other sites supply specific libraries or provide links to libraries, there is currently no well-known web site that acts as a general repository for C++ libraries. The vision is of a site where programmers can find libraries they need, post libraries they would like to share, and act as a focal point to encourage innovative C++ library development. An online peer review process is envisioned to ensure library quality with a minimum of bureaucracy. Secondary goals include encouraging effective programming techniques and providing a focal point for C++ programmers to participate in a wider community. Additionally, such a site might foster C++ standards activity by helping to establish existing practice."
------

So definitely Boost's original goal was not to create libraries for the standard, but to be a repository of top-quality C++ libraries. That's why I explicitly posted in this ML that trying to describe the Beman Project as "what Boost was aimed for" is not correct. Each project has a different goal.

Best,
Ion
On Thu, Jul 11, 2024 at 3:17 PM Ion Gaztañaga via Boost
So definitely Boost's original goal was not to create libraries for the standard, but to be a repository of top-quality C++ libraries.
Indeed. But for me it was good to hear direct confirmation from someone who was there at the start.

--
-- René Ferdinand Rivera Morell
-- Don't Assume Anything
-- No Supone Nada
-- Robot Dreams - http://robot-dreams.net
On Thu, Jul 11, 2024 at 4:30 PM René Ferdinand Rivera Morell via Boost <boost@lists.boost.org> wrote:
On Thu, Jul 11, 2024 at 3:17 PM Ion Gaztañaga via Boost
wrote:
So definitely Boost's original goal was not to create libraries for the standard, but to be a repository of top-quality C++ libraries.
Indeed. But for me it was good to hear direct confirmation from someone who was there at the start.
I reached out to Dave Abrahams and this was his response:

The “additional goal” certainly loomed large in my mind. The way I understood it, the whole discussion started because Beman was asking where we would get libraries with existing practice for the next standard. So, two possibilities:

1. My impression of the original intent is incorrect
2. It was decided that the best way to fulfill that goal was to incubate a general library collection

It's possible it was some combination. Feel free to share this. Obviously, I wasn't there for the original discussion; you'd have to check with Robert Klarer for a first-hand account.
On Thu, Jul 11, 2024 at 1:17 PM Ion Gaztañaga via Boost <boost@lists.boost.org> wrote:
...Boost's original goal was...
The history lesson is nice, but may be academic. Currently, Boost is a collection of useful C++ libraries which need maintenance, and we do not have control over which new libraries are proposed. The collection comes with implied support over time. With Boost's large number of mostly-silent users, the only relevant "mission" is to maximize the value delivered to those users. How we accomplish that is worth its own discussion.

Thanks
I think myself, Robert Ramey and John Maddock are just about the only library devs left still active who **haven't** had income from the C++ Alliance in the past.
There are at least a few more independents, but it's a short list.
On Thursday, July 11, 2024 at 05:23:33 PM GMT+2, Niall Douglas via Boost wrote:
[...]
On Thu, Jul 11, 2024 at 11:45 AM Christopher Kormanyos via Boost
I think myself, Robert Ramey and John Maddock are just about the only library devs left still active who **haven't** had income from the C++ Alliance in the past.
There is/are at least a few more independents,but a short list.
I also "haven't had income from the C++ Alliance". You can get a rough idea of who, at least minimally, maintains their libraries from the responses I got to the modular PR merge process inquiry: https://github.com/users/grafikrobot/projects/1/views/6

--
-- René Ferdinand Rivera Morell
-- Don't Assume Anything
-- No Supone Nada
-- Robot Dreams - http://robot-dreams.net
On 7/11/24 18:13, Niall Douglas via Boost wrote:
I think myself, Robert Ramey and John Maddock are just about the only library devs left still active who **haven't** had income from the C++ Alliance in the past.
I haven't. And I think there are quite a few developers who are not affiliated with The C++ Alliance.
Andrey Semashev wrote:
On 7/11/24 18:13, Niall Douglas via Boost wrote:
I think myself, Robert Ramey and John Maddock are just about the only library devs left still active who **haven't** had income from the C++ Alliance in the past.
I haven't. And I think there are quite a few developers who are not affiliated with The C++ Alliance.
I wonder what I need to do for Niall to consider me a library developer. Write ten more libraries? Take over the maintenance of twenty additional ones?
On Thursday, July 11, 2024, Peter Dimov wrote:
Andrey Semashev wrote:
On 7/11/24 18:13, Niall Douglas via Boost wrote:
I think myself, Robert Ramey and John Maddock are just about the only library devs left still active who **haven't** had income from the C++ Alliance in the past.
I haven't. And I think there are quite a few developers who are not affiliated with The C++ Alliance.
I wonder what I need to do for Niall to consider me a library developer.
Write ten more libraries? Take over the maintenance of twenty additional ones?
As much as I criticize Twitter, it does have those "Community Notes" or "Fact Checks" that are commendable. The mailing list equivalent is someone manually replying to Niall's posts to point out that they are almost always entirely false. (Also someone who has never received a single cent from the C++ Alliance but has helped it on a volunteer basis, and who is a library author and contributor to many more libraries than the person inventing fiction about Boost yet again.)

Glen
Niall Douglas via Boost wrote:
That gap has reopened since. My last two jobs have seen me reimplementing ASIO several times over now, as that's what the customer wants.
I'm guessing these were custom event loops around io_uring? If so, we should talk shop someday, Niall. It's kind of interesting, once you get the hang of writing an event loop, this isn't such a daunting task if you're able to bring over and port a good portion of tests. Plus, this kind of makes sense for most shops. You use C++ 'cause you care about low-level stuff and you'd wanna own this portion of your tech stack. - Christian
On 12/07/2024 16:18, Christian Mazakas via Boost wrote:
Niall Douglas via Boost wrote:
That gap has reopened since. My last two jobs have seen me reimplementing ASIO several times over now, as that's what the customer wants.
I'm guessing these were custom event loops around io_uring? If so, we should talk shop someday, Niall.
My current employer is Linux only, so I've been able to very tightly wind this thing around io_uring. My current latest iteration is completely wait-free, malloc-free, lock-free and 100% deterministic around io_uring. I also implemented priorities so some work gets prioritised over other work, and that is separate from i/o priority for io_uring (which has no effect anyway for most default kernel configs).

A nice thing is that the benchmarking tool written on top of my stuff shows 8-10% better results than the fio tool, written by Axboe himself. It is that much more efficient.
It's kind of interesting, once you get the hang of writing an event loop, this isn't such a daunting task if you're able to bring over and port a good portion of tests.
Depends on the idiom. My current latest iteration fuses a Boost.Fiber-type stackful coroutine in with the i/o execution. So when you're writing your code, your current execution context will suspend and resume as i/o initiates and completes. Basically it's a kernel thread scheduler, and indeed the entire thing is written in C, as that is the employer's wish. Thankfully C23 is the least painful C to write in yet. I principally miss lambdas, but otherwise it's not too bad.

Outcome is shortly going to receive much improved C support. Outcome has shipped with C support since the beginning, but it will shortly be first-tier instead of second-tier support.

In any case, the stackful coroutine type approach is _very_ different to the Senders-Receivers type approach which my preceding ASIO reimplementation used. Tests need basically rewriting. I have, of course, written a portability layer so the new implementation can quack like the old one, and that should enable migration of the work codebase.
Plus, this kind of makes sense for most shops. You use C++ 'cause you care about low-level stuff and you'd wanna own this portion of your tech stack.
You're definitely right that enough performance is left on the table with ASIO that it's worth shops investing in their own custom implementation if they care enough about the last few percent. Our codebase is currently Boost.Fiber based, and getting off that is worth the investment. As anybody who has used it for real world applications will have experienced, it has 'quirks'.

Niall
Sounds sick. You should paste a link if you can.

Check this out: https://github.com/cmazakas/fiona

It's a take two on Asio that aims to fix all of the things I found annoying about it. Namely, my implementation here actually supports foreign awaitables because I just copied Rust's Waker approach.

- Christian
Christian Mazakas wrote:
Sounds sick. You should paste a link if you can.
Check this out: https://github.com/cmazakas/fiona
It's a take two on Asio that aims to fix all of the things I found annoying about it. Namely, my implementation here actually supports foreign awaitables because I just copied Rust's Waker approach.
Christian, you should quote whatever and whoever you're replying to, because as is, that's not clear at all.
On 16/07/2024 20:12, Christian Mazakas via Boost wrote:
Sounds sick. You should paste a link if you can.
Employer's code, not mine. They'll be open sourcing everything next year, and unlike most promises of that kind in this situation they really will have to.
Check this out: https://github.com/cmazakas/fiona
It's a take two on Asio that aims to fix all of the things I found annoying about it. Namely, my implementation here actually supports foreign awaitables because I just copied Rust's Waker approach.
I'm all for foreign awaitables support. Everything I do in C++ coroutines allows arbitrary awaitable types. A library has no business imposing what users can use there, in my opinion, and Klemens of course took that boat right out to sea with Boost.Cobalt.

But ... I don't agree with hard-coding in C++ coroutines personally. I think Sender-Receiver (before WG21 corrupted it) is a better design choice here, especially as within a C++ coroutine you can co_await and it'll "just work" without any extra effort. Here's what I did:

- Free function `co_initialize()` connects a Sender to an appropriate C++ coroutine resuming Receiver and launches it.
- Free function `initialize()` is a customisation point for how to launch connected S&R states.
- Free function `connect()` is a customisation point for how to generate connected S&R states.

So, as an example, you might `connect()` a Sender to a `boost::fiber::future` and that would connect to a Receiver which sets the future's promise on completion. As you can see, this is very flexible, and you can completely avoid all malloc-free cycles, which C++ coroutines make hard to do.

(My new 100% C executor has runtime-switchable context switchers, so you can configure a C++ coroutine context switcher if you want. It being C, it just isn't compile-time enforced nor checkable, that's all.)

I see in your github repo you are benching against ASIO. What kinds of results did you get?

Niall
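For reference, a minimal sketch of the `connect()` customisation point shape being described (the names here are hypothetical, not Niall's actual code): a sender is joined to a receiver, yielding an operation state which is then launched.

#include <utility>

// A trivial sender producing an int when started.
struct just_sender {
    int value;
};

// connect() is the customisation point: it joins a sender to any
// receiver and returns a connected operation state.
template <class Receiver>
auto connect(just_sender s, Receiver r) {
    struct op_state {
        int value;
        Receiver receiver;
        void start() { receiver.set_value(value); }  // complete the op
    };
    return op_state{s.value, std::move(r)};
}

// A receiver that stores the result; a coroutine-resuming receiver
// (what a co_initialize() would build) has the same shape.
struct store_receiver {
    int* out;
    void set_value(int v) { *out = v; }
};

int main() {
    int result = 0;
    auto op = connect(just_sender{42}, store_receiver{&result});
    op.start();  // an initialize() customisation point would do this
    return result == 42 ? 0 : 1;
}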
On Tue, Jul 16, 2024 at 12:35 PM Niall Douglas via Boost <boost@lists.boost.org> wrote:
Employer's code, not mine. They'll be open sourcing everything next year, and unlike most promises of that kind in this situation they really will have to.
Hey, that sounds good to me! I'd say that I have a somewhat naive use of io_uring. I just use liburing and whatever that API happens to expose. It does happen to expose quite a bit, however. I feel like I'm always learning new little tricks and flags. `IORING_SETUP_DEFER_TASKRUN` was a neat one to experiment with because it actually had tangible effects on my scheduling code and showed how fragile some of my code was to concurrency.
But ... I don't agree with hard coding in C++ coroutines personally. I think Sender-Receiver (before WG21 corrupted it) is a better design choice here especially as if within a C++ coroutine you can co_await and it'll "just work" without any extra effort.
This is interesting. Asio was developed when there was no standardized concurrency primitive in C++. We now have one: C++20 coroutines. To me, the universal completion token stuff was a lot of try-hard and template bloat for a feature that wasn't worth its weight. But at the time, we didn't know better because no one was doing this kind of stuff. I think in hindsight, the universal completion token was a mistake. Maybe Sender-Receiver abuses all that ADL to avoid introducing templates here, but I'm hesitant to un-hardcode myself from coroutines because, being realistic, I imagine most C++ users really just wanna `co_await some_socket_recv();`.
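For context, the coroutine-first shape meant here is a minimal awaitable like the following toy (a sketch with hypothetical names; a real runtime would suspend rather than complete immediately):

#include <coroutine>

// A toy awaitable standing in for `co_await some_socket_recv()`.
struct recv_awaitable {
    // A real i/o runtime would return false here, park the handle in
    // await_suspend (e.g. in the SQE user_data), and resume it when
    // the CQE arrives. This toy completes immediately instead.
    bool await_ready() const noexcept { return true; }
    void await_suspend(std::coroutine_handle<>) const noexcept {}
    int await_resume() const noexcept { return 42; }  // fake byte count
};

Inside any coroutine, `int n = co_await recv_awaitable{};` then reads exactly like the one-liner above.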
I see in your github repo you are benching against ASIO. What kinds of results did you get?
Pretty alright. I have benchmarks that attempt to measure both latency and throughput and in general, I'm like 1.75x faster than Asio, almost 2x. This includes builtin timeouts so I use Beast's tcp_stream for this purpose. I guess this affects the latency-based benchmark more but for the throughput one, io_uring's batched I/O and handling of it really starts to shine. Anything where you can use multishot recv() effectively means you're going to shred Asio or other readiness-based models. - Christian
On 16/07/2024 21:29, Christian Mazakas via Boost wrote:
On Tue, Jul 16, 2024 at 12:35 PM Niall Douglas via Boost
But ... I don't agree with hard coding in C++ coroutines personally. I think Sender-Receiver (before WG21 corrupted it) is a better design choice here especially as if within a C++ coroutine you can co_await and it'll "just work" without any extra effort.
This is interesting. Asio was developed when there was no standardized concurrency primitive in C++. We now have one: C++20 coroutines. To me, the universal completion token stuff was a lot of try-hard and template bloat for a feature that wasn't worth its weight. But at the time, we didn't know better because no one was doing this kind of stuff.
I think in hindsight, the universal completion token was a mistake. Maybe Sender-Receiver abuses all that ADL to avoid introducing templates here, but I'm hesitant to un-hardcode myself from coroutines because, being realistic, I imagine most C++ users really just wanna `co_await some_socket_recv();`.
That's exactly what S&R delivers! WG21 S&R has very severe template bloat. Some people see compile times reminiscent of Boost at its worst in the late 2000s. But non-WG21 S&R can be implemented in a much lighter weight way. I made mine ABI stable, and that forces most of the template bloat to not exist.
I see in your github repo you are benching against ASIO. What kinds of results did you get?
Pretty alright.
I have benchmarks that attempt to measure both latency and throughput and in general, I'm like 1.75x faster than Asio, almost 2x. This includes builtin timeouts so I use Beast's tcp_stream for this purpose. I guess this affects the latency-based benchmark more but for the throughput one, io_uring's batched I/O and handling of it really starts to shine.
You're not using the linked op timeout feature of io_uring? It's a bit expensive TBH. I've 'cheated' and set a timeout directly on the socket itself so it errors out after a while. This is nasty, but fast :)
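For reference, the linked-op timeout being discussed looks roughly like this with liburing (a sketch; error handling omitted):

#include <liburing.h>
#include <cstddef>

// A recv linked to a 5-second timeout: if the recv has not completed
// when the timeout fires, the kernel cancels it and the recv CQE
// reports -ECANCELED.
void submit_recv_with_timeout(io_uring* ring, int fd, void* buf, size_t len) {
    io_uring_sqe* sqe = io_uring_get_sqe(ring);
    io_uring_prep_recv(sqe, fd, buf, len, 0);
    sqe->flags |= IOSQE_IO_LINK;  // link the next SQE to this one

    __kernel_timespec ts{};
    ts.tv_sec = 5;
    io_uring_sqe* tsqe = io_uring_get_sqe(ring);
    io_uring_prep_link_timeout(tsqe, &ts, 0);

    io_uring_submit(ring);  // the timespec is read during submission
}

The socket-level 'cheat' mentioned is presumably a plain setsockopt() with SO_RCVTIMEO before issuing receives.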
Anything where you can use multishot recv() effectively means you're going to shred Asio or other readiness-based models.
That plus the DMA registered buffers support. ASIO could support the older form, which didn't deliver much speedup, but the new form where io_uring/the NIC allocates the receive buffers for you ... it's Windows RIO levels of fast. I certainly can saturate a 40 Gbps NIC from a single kernel thread without much effort now, and a 100 Gbps NIC if you can keep the i/o granularity big enough. That was expensive Mellanox userspace TCP type performance a few years ago.

Niall
On Tue, Jul 16, 2024 at 1:49 PM Niall Douglas via Boost <boost@lists.boost.org> wrote:
That's exactly what S&R delivers!
WG21 S&R has very severe template bloat. Some people see compile times reminiscent of Boost at its worst in the late 2000s. But non-WG21 S&R can be implemented in a much lighter weight way. I made mine ABI stable, and that forces most of the template bloat to not exist.
Ha ha, this actually makes me even _more_ hesitant to adopt it. For now, I think the simple coroutines-only scheme I have now is sufficient. The code can always theoretically be altered later to support different async schemes.
You're not using the linked op timeout feature of io_uring?
It's a bit expensive TBH. I've 'cheated' and set a timeout directly on the socket itself so it errors out after a while. This is nasty, but fast :)
I use it in places. I use it for controlling connect() timeouts with TCP sockets. I'm not sure there's any other spots. But yes, it is quite expensive. For sends and receives, I instead have a multishot timeout operation that's created when the `tcp::stream` class is. This timer automatically posts a CQE periodically which I then use to check activity on the TCP stream. So if it's in the middle of an initiated send() operation, I can check its last activity and if nothing has happened, I can cancel the operation. In benchmarks, I actually didn't really notice a difference when I toggled this functionality in or out so it's relatively lightweight for "realistic" cases.
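A rough sketch of what such a periodic tick might look like with liburing (an assumed shape, not Fiona's actual code; IORING_TIMEOUT_MULTISHOT needs Linux 6.4+ and liburing 2.4+):

#include <liburing.h>

// One multishot timeout posts a CQE every 500 ms. The event loop
// treats CQEs carrying the TICK tag as a cue to sweep streams whose
// last recorded activity is too old and cancel their in-flight ops.
constexpr unsigned long long TICK = 1;  // hypothetical user_data tag

void arm_activity_timer(io_uring* ring) {
    static __kernel_timespec every = {0, 500000000};  // 500 ms
    io_uring_sqe* sqe = io_uring_get_sqe(ring);
    io_uring_prep_timeout(sqe, &every, 0, IORING_TIMEOUT_MULTISHOT);
    io_uring_sqe_set_data64(sqe, TICK);
    io_uring_submit(ring);
}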
That plus the DMA registered buffers support. ASIO could support the older form which didn't deliver much speedup, but the new form where io_uring/the NIC allocates the receive buffers for you ... it's Windows RIO levels of fast. I certainly can saturate a 40 Gbps NIC from a single kernel thread without much effort now, and 100 Gbps NIC if you can keep the i/o granularity big enough. That was expensive Mellanox userspace TCP type performance a few years ago.
I'm not sure I know what you're talking about here, being honest. I know io_uring has registered buffers for file I/O and I know that you can also use a provided buffers API for multishot recv() and multishot read() (i.e. `io_uring_register_buffers()` and `io_uring_buf_ring_setup()`).

This is confusing to me because these two functions don't really allocate. _You_ allocate and then register them with the ring. So I'm curious about this NIC allocating a receive buffer for me here.

Fwiw, Fiona does actually use multishot TCP recv(), so it does use the buf_ring stuff. This has interesting API implications because in the epoll world, users are accustomed to:

co_await socket.async_recv(my_buffer);

But in Fiona, you instead have:

auto m_buf_sequence = co_await socket.async_recv();
return std::move(m_buf_sequence).value();

Ownership of the buffers is inverted here, which actually turns out to be quite the API break.

Once I get the code into better shape, I'd like to start shilling it but who knows if it'll ever catch on.

- Christian
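For readers following along, the provided-buffers machinery under discussion is set up roughly like this with liburing (a sketch under assumed sizes; needs liburing 2.4+, error handling omitted):

#include <liburing.h>

// Register a group of 8 provided buffers (buffer group id 0). For a
// multishot recv the kernel picks a free buffer per completion and
// reports its id in cqe->flags >> IORING_CQE_BUFFER_SHIFT; userspace
// re-adds the buffer to the ring once done with the bytes.
constexpr unsigned NBUFS = 8;
constexpr unsigned BUF_SIZE = 4096;

io_uring_buf_ring* setup_buffers(io_uring* ring, unsigned char* pool) {
    int err = 0;
    io_uring_buf_ring* br =
        io_uring_setup_buf_ring(ring, NBUFS, /*bgid=*/0, /*flags=*/0, &err);
    if (br == nullptr) return nullptr;

    for (unsigned i = 0; i < NBUFS; ++i)
        io_uring_buf_ring_add(br, pool + i * BUF_SIZE, BUF_SIZE, /*bid=*/i,
                              io_uring_buf_ring_mask(NBUFS), /*offset=*/i);
    io_uring_buf_ring_advance(br, NBUFS);  // publish all buffers at once
    return br;
}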
On 17/07/2024 18:17, Christian Mazakas via Boost wrote:
That plus the DMA registered buffers support. ASIO could support the older form which didn't deliver much speedup, but the new form where io_uring/the NIC allocates the receive buffers for you ... it's Windows RIO levels of fast. I certainly can saturate a 40 Gbps NIC from a single kernel thread without much effort now, and 100 Gbps NIC if you can keep the i/o granularity big enough. That was expensive Mellanox userspace TCP type performance a few years ago.
I'm not sure I know what you're talking about here, being honest. I know io_uring has registered buffers for file I/O and I know that you can also use a provided buffers API for multishot recv() and multishot read() (i.e. `io_uring_register_buffers()` and `io_uring_buf_ring_setup()`).
This is confusing to me because these two functions don't really allocate. _You_ allocate and then register them with the ring. So I'm curious about this NIC allocating a receive buffer for me here.
Fwiw, Fiona does actually use multishot TCP recv(), so it does use the buf_ring stuff. This has interesting API implications because in the epoll world, users are accustomed to:
co_await socket.async_recv(my_buffer);
But in Fiona, you instead have:
auto m_buf_sequence = co_await socket.async_recv();
return std::move(m_buf_sequence).value();
Ownership of the buffers is inverted here, which actually turns out to be quite the API break.
Once I get the code into better shape, I'd like to start shilling it but who knows if it'll ever catch on.
Yes, you're already using the thing I was referring to, which is the "ring provided buffers" feature via the API io_uring_register_buf_ring. You're right that its docs present the feature as userspace allocating pages from the kernel, then giving ownership of those pages to io_uring, which then fills them with received data as it chooses and hands ownership back to userspace.

That's how it appears from userspace, anyway. If I were the kernel, I'd free the backing store for the pages handed to me, and repoint the virtual memory address at pages coming off the NIC's DMA. Depends on the NIC: some can address all of memory, some a subset, some barely at all. High end NICs would be very efficient, occasional memory copying might be needed for prosumer NICs, and for cheap and nasty NICs incapable of more than a 64Kb window ... well, you kinda have to copy memory there.

Anyway, the point is that having the kernel tell you the buffers it filled, instead of you telling it what buffers to fill, is the right design. This is why LLFIO's read op allowed reads to fill in buffers read completely differently to buffers supplied, incidentally.

Niall
On Wed, Jul 17, 2024 at 1:31 PM Niall Douglas via Boost <boost@lists.boost.org> wrote:
Anyway, the point is that having the kernel tell you the buffers it filled, instead of you telling it what buffers to fill, is the right design. This is why LLFIO's read op allowed reads to fill in buffers read completely differently to buffers supplied, incidentally.
Ha ha, indeed.

Actually, it might be nice to compare implementation notes. For Fiona, I create a buffer sequence for the user by over-allocating for each buffer and making an intrusive doubly-linked list. This means that the user gets a mutable buffer sequence without the need for any intermediate allocations. It also enables nice constant time push/pop and slicing. I was heavily inspired by Rust and Asio when coming up with the buffer abstraction: https://github.com/cmazakas/fiona/blob/main/test/buffer_test.cpp

If you'd like, I'd really appreciate any implementation feedback on what's going on here. Not many seem to have the expertise in io_uring and this kind of stuff so it's rare when I get to really talk shop.

Maybe I need to actually sit down and write some docs because I realize there's a lot to the codebase here and it's hard to convey everything just using email.

- Christian
A good point is to support sendmmsg / recvmmsg on Linux/BSD (macOS has similar syscalls, SYS_sendmsg_x / SYS_recvmsg_x) to get good UDP throughput. I didn't find io_uring support for sendmmsg/recvmmsg at the time; maybe it's time you could get coding / pushing this too. I keep the repo alive at https://github.com/virgiliofornazin/asio/tree/feature/multiple_datagram_buff... It's working rock solid on a VPN platform capable of about 3 Gbps UDP without packet loss on a 10 GbE 9k MTU network.

On Thu, Jul 18, 2024 at 5:59 PM Christian Mazakas via Boost <boost@lists.boost.org> wrote:
[...]
On 18/07/2024 22:34, Virgilio Fornazin via Boost wrote:
A good point is to support sendmmsg / recvmmsg on Linux/BSD (macOS has similar syscalls, SYS_sendmsg_x / SYS_recvmsg_x) to get good UDP throughput.
I didn't find io_uring support for sendmmsg/recvmmsg at the time; maybe it's time you could get coding / pushing this too.
I believe the mm-variants of send and receive take a very different code path to what io_uring uses, so it will never support those. It's a bit like the other zero copy i/o path in Linux: it uses a weird code path to work, and io_uring won't support it, ever. So they're forging ahead with their own zero copy approach, which I believe requires NIC driver support, otherwise you get silent memory copying.

I haven't personally tested it, but the multishot receive feature of io_uring should get quite close to recvmmsg performance.

In any case, Linux is steadily closing the gap on the once class-leading UDP performance of Windows RIO. I believe FreeBSD still holds the record for the most packets switched per core per second, and that probably matters more for real world performance.

Niall
The Linux kernel code for sendmmsg/recvmmsg is just a for loop, but the cost of a syscall traversing ring 3 to ring 0 (ring 1 on virtualized) is something that really pays off to avoid in high performance UDP networking. If you consider something like this, it would be a big win for high packet I/O use in UDP.

On Thu, Jul 18, 2024 at 6:53 PM Niall Douglas via Boost <boost@lists.boost.org> wrote:
[...]
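For the record, the batching being discussed looks like this (a sketch; BATCH and MTU are illustrative values):

#include <sys/socket.h>
#include <cstring>

constexpr int BATCH = 64;
constexpr size_t MTU = 9000;  // jumbo frames, as in Virgilio's setup

// One recvmmsg() call replaces up to BATCH recvmsg() syscalls: the
// kernel loops internally, so the ring 3 -> ring 0 transition cost is
// paid once. Returns the datagram count; msgs[i].msg_len holds each
// datagram's size.
int recv_batch(int fd, char bufs[BATCH][MTU], mmsghdr msgs[BATCH],
               iovec iovs[BATCH]) {
    for (int i = 0; i < BATCH; ++i) {
        std::memset(&msgs[i], 0, sizeof(msgs[i]));
        iovs[i] = {bufs[i], MTU};
        msgs[i].msg_hdr.msg_iov = &iovs[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
    return recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, nullptr);
}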
On Thu, Jul 18, 2024 at 2:47 PM Niall Douglas via Boost <boost@lists.boost.org> wrote:
Instead of over-allocating and wasting a page, I would put the link pointers at the end and slightly reduce the maximum size of the i/o buffer. This is kinda annoying to look at because the max buffer fill is no longer a power of two, but in terms of efficiency it's the right call.
Hey, this is actually a good idea. I had similar thoughts when I was designing it. I can give benchmarking it a shot and see what the results are. What kind of benchmark do you think would be the best test here? I suppose one thing I should try is a multishot recv benchmark with many small buffers and a large amount of traffic to send. Probably just max out the size of a buf_ring, which is only like 32k buffers anyway. Ooh, we can even try page-aligning the buffers too.

Surely for reading you want io_uring to tell you the buffers, and when you're done, you immediately push them back to io_uring? So no need to keep buffer lists except for the write buffers?
You'd think so, but there's no such thing as a free lunch.

When it comes to borrowing the buffers, to do any meaningful work you'll have to either allocate and memcpy the incoming buffers so you can then immediately release them back to the ring, or you risk buffer starvation. This is because not all protocol libraries are designed to copy their input from you; some require the caller to use stable storage. Beast is like this and I think zlib is too. There's no guarantee across protocol libraries that they'll reliably copy your input for you.

The scheme I chose is one where users own the returned buffer sequence, and this enables nice things like in-place TLS decryption, which I use via Botan. This reminds me: I use Botan in order to provide a generally much stronger TLS interface than Asio's.

I've experimented with routines that recycle the owned buffers but honestly, it's faster to just re-allocate holes in the buf_ring in `recv_awaitable::await_resume()`. Benchmarks show a small hit to perf but I think it's an acceptable trade-off here, as I now have properly working TLS/TCP streams, which is kind of all that matters.

On Thu, Jul 18, 2024 at 4:28 PM Virgilio Fornazin via Boost <boost@lists.boost.org> wrote:
The Linux kernel code for sendmmsg/recvmmsg is just a for loop, but the cost of a syscall traversing ring 3 to ring 0 (ring 1 on virtualized) is something that really pays off to avoid in high performance UDP networking. If you consider something like this, it would be a big win for high packet I/O use in UDP.
As Niall previously noted, you don't need recvmmsg() with io_uring. The point of recvmmsg() was to avoid syscall overhead, which io_uring already solves via bulk submission and bulk reaping of completions, and then via multishot recvmsg(). Multishot recvmsg() will definitely be fast enough, I confidently say while measuring nothing.

I was torn after completing an MVP of TLS/TCP: do I add UDP or file I/O? Unfortunately, I chose file I/O because what's the point of an io_uring runtime if it doesn't even offer async file I/O? This conversation makes me realize that I should've just chosen UDP lol.

- Christian
On 19/07/2024 17:12, Christian Mazakas via Boost wrote:
On Thu, Jul 18, 2024 at 2:47 PM Niall Douglas via Boost <boost@lists.boost.org> wrote:
Instead of over-allocating and wasting a page, I would put the link pointers at the end and slightly reduce the maximum size of the i/o buffer. This is kinda annoying to look at because the max buffer fill is no longer a power of two, but in terms of efficiency it's the right call.
Hey, this is actually a good idea. I had similar thoughts when I was designing it.
I can give benchmarking it a shot and see what the results are.
It'll be a bit faster due to reduced TLB pressure :)
What kind of benchmark do you think would be the best test here? I suppose one thing I should try is a multishot recv benchmark with many small buffers and a large amount of traffic to send. Probably just max out the size of a buf_ring, which is only like 32k buffers anyway.
Ooh, we can even try page-aligning the buffers too.
The first one I always start with is "how much bandwidth can I transfer using a single kernel thread?" The second is how small a write quantum I can use and still max out bandwidth from a single kernel thread.

It's not dissimilar to tuning for file i/o: there is a bandwidth-latency tradeoff, and latency is proportional to i/o quantum. If you can get the i/o quantum down without overly affecting bandwidth, that has huge beneficial effects on i/o latency, particularly in terms of a nice flat-ish latency distribution.
Surely for reading you want io_uring to tell you the buffers, and when you're done, you immediately push them back to io_uring? So no need to keep buffer lists except for the write buffers?
You'd think so, but there's no such thing as a free lunch.
When it comes to borrowing the buffers, to do any meaningful work you'll have to either allocate and memcpy the incoming buffers so you can then immediately release them back to the ring or you risk buffer starvation.
This is because not all protocol libraries are designed to copy their input from you; some require the caller to use stable storage. Beast is like this and I think zlib is too. There's no guarantee across protocol libraries that they'll reliably copy your input for you.
The scheme I chose is one where users own the returned buffer sequence and this enables nice things like an in-place TLS decryption, which I use via Botan. This reminds me, I use Botan in order to provide a generally much stronger TLS interface than Asio's.
Oh okay. io_uring permits 4096 locked i/o buffers per ring. I put together a bit of C++ metaprogramming which encourages users to release i/o buffers as soon as possible, but if they really want to hang onto a buffer, they can. If we run out of buffers, I stall new i/o until new buffers appear. I then have per-op TSC counts, so if we spend too much time stalling new i/o, the culprits holding onto buffers for too long can be easily identified.

I reckon this is the least worst of the approaches before us - well behaved code gets maximum performance, less well behaved code gets less performance. But everything is reliable.

If you think this model through, the most efficient implementation requirement is that all work must always be suspend-resumable, because any work can be suspended at any time due to temporary lack of resources. In other words, completion callbacks won't cut it here.

Niall
On 18/07/2024 21:58, Christian Mazakas via Boost wrote:
Actually, it might be nice to compare implementation notes. For Fiona, I create a buffer sequence for the user by over-allocating for each buffer and making an intrusive doubly-linked list. This means that the user gets a mutable buffer sequence without the need for any intermediate allocations.
Instead of over-allocating and wasting a page, I would put the link pointers at the end and slightly reduce the maximum size of the i/o buffer. This is kinda annoying to look at because the max buffer fill is no longer a power of two, but in terms of efficiency it's the right call.
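Concretely, the layout being suggested might look like this (a sketch, assuming a 4 KiB page):

#include <cstddef>

// One i/o buffer occupies exactly one page: payload first, intrusive
// list links at the end, so no over-allocation and no second page.
struct alignas(4096) io_buf {
    static constexpr std::size_t PAGE = 4096;
    struct links { io_buf* prev; io_buf* next; };
    // Max fill is slightly below a power of two, as noted above.
    static constexpr std::size_t CAPACITY = PAGE - sizeof(links);

    unsigned char data[CAPACITY];
    links link;
};
static_assert(sizeof(io_buf) == io_buf::PAGE, "one page per buffer");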
It also enables nice constant time push/pop and slicing. I was heavily inspired by Rust and Asio when coming up with the buffer abstraction: https://github.com/cmazakas/fiona/blob/main/test/buffer_test.cpp
Surely for reading you want io_uring to tell you the buffers, and when you're done, you immediately push them back to io_uring? So no need to keep buffer lists except for the write buffers?

There is a further optimisation trick for writes: if you allocate the write buffers out of a memory mapped anonymous inode, you can use sendfile offload. This isn't actually always a win, surprisingly enough: if your write buffers are big then it's a big win, but if they're small, it usually isn't. I try to aim for write buffers of 8 MB and I try to fill them to max before sending them.

Re: scatter-gather buffers, generally if somebody is doing more than one buffer at a time, they're working with wider data structures in the program that usually don't need to be in registered i/o buffers, i.e. locked memory. I've found dropping support for registered i/o buffers if there is more than one struct iovec is generally pain free.
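The anonymous-inode trick reads roughly like this (a sketch of the idea, not Niall's code; memfd_create needs glibc 2.27+, error paths abbreviated):

#include <sys/mman.h>
#include <sys/sendfile.h>
#include <unistd.h>
#include <cstring>

// Build a write buffer inside a memory-mapped anonymous inode, then
// hand the pages to the kernel with sendfile() instead of copying
// them through send(). Pays off for big buffers (e.g. the 8 MB fills
// mentioned above), less so for small ones.
bool send_via_memfd(int sock, const void* data, size_t len) {
    int fd = memfd_create("writebuf", 0);  // anonymous inode
    if (fd < 0) return false;
    if (ftruncate(fd, (off_t)len) < 0) { close(fd); return false; }

    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return false; }
    std::memcpy(p, data, len);  // in real use, build the data in place
    munmap(p, len);

    off_t off = 0;
    bool ok = sendfile(sock, fd, &off, len) == (ssize_t)len;
    close(fd);
    return ok;
}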
If you'd like, I'd really appreciate any implementation feedback on what's going on here. Not many seem to have the expertise in io_uring and this kind of stuff so it's rare when I get to really talk shop.
I can definitely agree on that!
Maybe I need to actually sit down and write some docs because I realize there's a lot to the codebase here and it's hard to convey everything just using email.
Generally if you've made these things before, the concepts and patterns are all familiar and it's easy to understand each other. If you've never done it before, yeah, it can be impenetrable.

Niall
On 11/07/2024 at 19:57, Andrey Semashev via Boost wrote:
On 7/11/24 18:13, Niall Douglas via Boost wrote:
I think myself, Robert Ramey and John Maddock are just about the only library devs left still active who **haven't** had income from the C++ Alliance in the past.
I haven't. And I think there are quite a few developers who are not affiliated with The C++ Alliance.
Sure. I'm not affiliated with The C++ Alliance and I've been participating in Boost since 2004. That being said, I think many developers would be happy if The C++ Alliance or any other institution were interested in sponsoring their Boost work.

Best,
Ion
participants (12)

- Andrey Semashev
- Christian Mazakas
- Christopher Kormanyos
- David Sankel
- Glen Fernandes
- Ion Gaztañaga
- Klemens Morgenstern
- Niall Douglas
- Peter Dimov
- René Ferdinand Rivera Morell
- Vinnie Falco
- Virgilio Fornazin