
Hi Ruben, thank you very much for your thorough review and valuable suggestions! And of course, for recommending our library to Boost!
On 21.10.2024., at 10:43, Ruben Perez <rubenperez038@gmail.com> wrote:
Hi all,
This is my review of async_mqtt5.
First of all, thanks Ivica and the Mireo team for submitting this library for Boost review, and Klemens for managing the review.
- Does this library bring real benefit to C++ developers for real world use-case?
Yes. MQTT clients are widely used in the IoT world. Boost.Asio is one of the strongest networking libraries in C++. Bringing these together in a high-quality library is valuable. The library already has already more than 140 stars on GitHub, which is a good sign.
- Do you have an application for this library?
Not right now. But I used to work in the IoT world, and I would have used it if I would have had it available back then.
- Does the API match with current best practices?
It does almost everywhere. Analysis follows.
1. It's only a high-level API, akin to Boost.Redis. This may not cover some very specialized use cases, but makes it much easier handling common tasks robustly. I don't think this is concerning, since the API is flexible enough. Moreover, the Mireo team has more field experience regarding the protocol than I do. A lower level client can be added in the future without breaking compatibility, if requested by users.
2. It correctly follows Boost.Asio universal async model (except for a gotcha with executors, see my next point), accounting for allocator customization, custom completion tokens, and per-operation cancellation. Looking at the code, it's clear that the developers bear a deep understanding of Asio, which is very valuable.
3. At the moment, the client requires the following: "The same executor must execute mqtt_client::async_run and all the subsequent async_xxx operations." (quoting https://spacetime.mireo.com/async-mqtt5/async_mqtt5/asio_compliance/executor...). This is a deviation from the standard Asio model, not present in any of the libraries currently in Boost. The former statement makes the following malformed, without any diagnostic being emitted:
cli.async_run(asio::bind_executor(ex1, ...)); // 1 cli.async_publish(asio::bind_executor(ex2, ...)); // 2 cli.async_publish(asio::bind_executor(ex3, ...)); // 3
Where operations 2 and 3 may complete on any of ex1, ex2 or ex3.
I think this limitation should be lifted, and that the above code should be legal. I'd suggest making stream writes child agents of async_run, instead of async_publish and similar functions (see https://lists.boost.org/Archives/boost/2024/10/258132.php for the discussion regarding this issue).
4. The client is templated on the stream type. This is a well-tested design pattern, but can lead to higher compile times. Writing dynamic clients (where the transport type or TLS usage is determined by a runtime variable) is also more involved. I think the current architecture is acceptable, but I'd consider adding a type-erased client in the future. While this shouldn't be too much effort for well-versed Asio developers, regular users may find it unsurmountable. Alternatively, I'd also be good with completely replacing the templated client with a type-erased one (up to the authors).
5. The library internally buffers outgoing and ingoing messages automatically, which is great because it's easier to maintain resilience. However, the only limit on the publishing side are the limited 2^16 packet IDs (restricted by the protocol), with no limit on the reading side. I'm slightly concerned about applications buffering too much on either side. I don't know if this is a real concern or just produced by my lack of MQTT field experience. Would it make sense to somehow introduce a per-topic limit on the number of unacknowledged messages on both the subscribing side (possibly specified when the client subscribes) and on the publishing side? If that's not possible, I'd like to see an example on how to implement this flow control limit on the publishing side, at least.
6. Some APIs use owning types (std::vector and std::string) where non-owning types (boost::core::span and std::string_view) could suffice. I'd suggest reviewing these.
7. TLS streams require a number of template specializations to set up. At the moment, this must be done by the user even for the usual asio::ssl::stream. This risks ODR violations if two libraries end up defining such specializations independently. I'd advise placing the required boilerplate code, together with helper typedefs, in an optional, separate header.
8. The API is really small, and I think that's great.
- Is the documentation helpful and clear?
It is in general well written and clear. It is however geared towards a user that knows Asio and MQTT pretty well. I'd suggest a couple of changes:
1. There are a lot of pages justifying design decisions made by the library. This has been very useful to me, but is geared towards advanced users. I'd attempt to gather all pages explaining how to use the library in the first section, and move pages with design rationales to a second section. 2. There is no page in the discussion addressing how to use async_publish and async_receive (only an example). This new page should likely go into the first section. 3. There is no page in the discussion addressing TLS. 4. I miss an example about custom authentication. 5. I don't think the multi-threaded example captures the problems that a multi-threaded program integrating the library may face. Running everything under a single strand negates the benefit of multi-threading. I'd target the publisher example, instead. It's important that you ensure that reading the sensor happens outside the strand.
I have more minor comments, included at the bottom of this email.
- Did you try to use it? What problems or surprises did you encounter?
I built a sample publisher and subscriber, based on the examples. They're available here: https://github.com/anarthal/async-mqtt5-test
The publisher reads two sensors at different frequencies, which is something that scales poorly when using threads. The async architecture makes this task trivial, which is great. Note that the publisher doesn't co_await async_publish, because that'd prevent further sensor reads, reducing the benefits of resilience. The publisher example in the docs may consider mentioning this.
Debugging config issues is extremely difficult (see https://spacetime.mireo.com/async-mqtt5/async_mqtt5/auto_reconnect.html for a rationale). I initially tried to use a public test broker that rejected my connection because of high utilization. It took me a decent amount of time to figure it out - I tried debugging, enabling Asio handler tracking, until I finally launched Wireshark and figured it out. This is a problem during development but also in production, as logging when things go wrong is a usual practice.
I think this should be somehow addressed, at the very least by suggesting to use Wireshark/tshark (or other tools that may be practical) in the docs. When using TLS, using SSL_CTX_set_keylog_callback (https://docs.openssl.org/1.1.1/man3/SSL_CTX_set_keylog_callback/#synopsis) to create a keylog file for Wireshark may also be useful, and mentioning how to do it may be useful.
I'd at least consider the possibility of including some sort of logging or diagnostics generation, either by using a custom macro or abusing Asio handler tracking.
I found that the implementation does not correctly shutdown TLS and websockets (no calls to asio::ssl::stream::async_shutdown and beast::websocket::stream::async_close). These should be added (https://github.com/mireo/async-mqtt5/issues/18).
- What is your evaluation of the implementation?
The implementation is well written and very easy to follow, which is great.
Build times are pretty high, which is not great. The sample publisher takes 22s to build under clang-18, Linux, Debug, C++23 on my machine (30s in Release mode). Building with -ftime-trace, two points show up in the flame graph:
1. The client header includes boost/beast/websocket/stream.hpp, even when not using websockets. The header is very heavyweight and slows down compilation. I think the class can be forward declared if the definition of rebind_executor for websocket streams is changed to the following:
template <typename Stream, bool deflate_supported, typename Executor> struct rebind_executor<boost::beast::websocket::stream<asio::ssl::stream<Stream>, deflate_supported>, Executor> { using other = typename boost::beast::websocket::stream< asio::ssl::stream<typename rebind_executor<Stream, Executor>::other>, deflate_supported
; };
Some other Beast headers are included in detail/connect_op.hpp. Connection attempts to distinguish whether the stream is a websocket using metaprogramming, and then runs Beast-specific code when a websocket is detected. I think this creates too much coupling between the libraries. My suggestion is to re-architect this a bit. I'd rewrite websocket detection as a separate trait, and then provide a definition for it in a separate header, thus making Beast a peer dependency.
Beast is also use for get_lowest_layer. I don't know if this is strictly required, since asio::ip::tcp::socket does implement the layered protocol (it defines a lowest_layer_type and a get_lowest_layer function, see asio::basic_socket).
2. asio::append and asio::prepend are heavily used, which are heavy to instantiate. I'd suggest experimenting with replacing these by a custom handler class that specializes asio::associator to propagate associated characteristics, as suggested in this ML post: https://lists.boost.org/Archives/boost/2024/10/258132.php. This may or may not reduce compile times, but it's worth a try.
The library has many Boost dependencies. While this might not itself impact compile times, it affects package managers. Using Boostdep, the dependency with most transitive dependencies (by far) is Spirit, pulling in variant, thread, preprocessor, phoenix, and a number of other libs.
The obvious solution would be encouraging the author to remove the dependency. But I think we have something off in how we manage dependencies in Boost, since the parts of Spirit they're pulling don't require most of these libraries.
Regarding testing, I like seeing both unit and integration tests. Code coverage is great. I haven't looked into them in great detail, but from some snippets I saw and Ivica's comments, they seem correctly architectured. I miss sanitizers, fuzz testing and some compilers. See my comments below.
- Do you have experience with asio?
Yes. I developed and maintain Boost.MySQL.
- Did you work with MQTT in the past?
I know the basic concepts and have read part of the spec for this review. I haven't used MQTT in production. I have worked in embedded systems before, though.
- How much time did you spend on this review?
I spent around 25h reading the documentation, reference, researching MQTT, building code snippets and discussing design decisions on the mailing list.
- Final decision and comments
My recommendation is to CONDITIONALLY ACCEPT async_mqtt5 into Boost. It's a great library and I'd love to see it in Boost.
Here are a lot of feedback points, in decreasing order of importance.
1. Acceptance conditions: I think these should be addressed before Boost integration.
- Executor correctness. The restriction of forbidding different executors for different operations should be lifted. - Making debugging configuration easier. This can have the form of a documentation page explaining the available tools and how to use them, or by enhancing the current API with logging facilities. - TLS/websocket shutdown code should be added, as per https://github.com/mireo/async-mqtt5/issues/18.
2. Recommendations. These are not acceptance criteria, but I strongly recommend considering them, at least:
- Remove the hard dependency on Boost.Beast websocket in detail/rebind_executor.hpp by forward-declaring websocket::stream. - Change the websocket connect handling from if constexpr to traits, to remove the Beast hard dependency. - Consider creating an optional header containing the required boilerplate to use TLS-based connections without the need of template specializations. Helper typedefs could be beneficial. - Experiment with build times and -ftime-trace by reducing the usage of asio::prepend and Boost.Spirit in the implementation. If this happens to improve build times, consider applying the relevant changes. - Introduce fuzz testing for the encoders and decoders (I can aid with this if required). - Introduce address and undefined sanitizer builds (I can also aid with this). - Consider supporting total cancellation in async_publish and other functions. Probably requires fixing the executor issue. - Consider introducing a type-erased client that does not depend on the stream type. I'd consider acceptable entirely removing the templated one in favor of the type-erased one, if the authors feel comfortable with that. - Consider changing the mqtt_client::brokers signature to take a span<host_and_port>, where host_and_port contains a string and a port number. I think this better represents what the function does. - Consider changing usages of std::vector and std::string in public APIs to boost::core::span and std::string_view where it makes sense. - Consider introducing APIs to limit the message production/consumption rates per topic. If not feasible, introduce an example on how to rate-limit a publisher user-side. - Consider trimming some Boost dependencies to make the library lighter for package managers.
3. Minor comments. These are not acceptance criteria, either.
- Consider adding CIs for clang 16 to 19. - Apply concept checking to completion tokens, using the BOOST_ASIO_COMPLETION_TOKEN_FOR macro for C++17 compatibility. - Consider using an inline variable to implement get_error_code_category, instead of a static one. - Place non-documented, non-public functions in the detail namespace. Candidate functions: - client_error_to_string, in error.hpp - client_ec_category, in error.hpp - Auxiliary functions and types implementing properties, in property_types.hpp - to_reason_code, in reason_codes.hpp - Consider making the library compatible with the BOOST_ASIO_USE_TS_EXECUTOR_AS_DEFAULT macro. Although legacy, other libraries (namely a postgresql one) require it. - Consider making the will class a plain struct, instead of a class. This would make it compatible with C++20 designated initializers. - Consider renaming the library and namespace to mqtt5 or mqtt, following what we have for MySQL and Redis. Note that typing boost::async_mqtt5 may be long. - Consider making constexpr global variables (like the ones in reason_codes.hpp) inline constexpr. - Consider enhancing support for Asio handler tracking (as per https://live.boost.org/doc/libs/1_86_0/doc/html/boost_asio/overview/core/han...) by using the BOOST_ASIO_HANDLER_LOCATION macro. This may mitigate the frustration of users experiencing connectivity problems. - Consider running the thread-safety example under the thread sanitizer. I've found a lot of surprises (and learnt a lot) by doing this. - Consider renaming client_service to something that doesn't include the term "service". When working with Asio, a service usually refers to something inheriting from asio::execution_context::service, allocated once per execution context. This is not the case for client_service, which initially created confusing expectations to me. - Consider replacing the SFINAE clause in authenticator() by a static_assert. Calling authenticator() with something not fulfilling the concept is always an error - the clearer the diagnostic, the better. - Consider whether asio::cancel_after/asio::cancel_at can be used to simplify some of the internal waiting logic. - Consider moving the code snippets shown in the discussion to a C++ file and use Quickbook import statements to include them in the docs. This reduces the possibility of having out-of-sync snippets. - Consider specializing asio::associated_executor, asio::associated_immediate_executor and asio::associated_allocator to implement property propagation in detail::cancellable_handler.
4. Comments on docs. These are not acceptance criteria.
- The reference lists is_authenticator (https://spacetime.mireo.com/async-mqtt5/async_mqtt5/ref/is_authenticator.htm...), but it looks like it's in the detail namespace. Either promote it to the public namespace, or hide it from the docs (you can still create a page for the concept even if you don't expose the actual type). - The reference lists the phrase "On immediate completion, invocation of the handler will be performed in a manner equivalent to using boost::asio::post.". This is not true, since you support immediate executors. - Some property types are listed as scalars, when they are really vectors of scalars. This might be confusing. For instance, https://spacetime.mireo.com/async-mqtt5/async_mqtt5/ref/publish_props.html lists subscription_identifier as std::string, when it's actually std::vector<std::string>. Same applies for user_property. - As of Boost 1.86, timeouts should be implemented using asio::cancel_after, rather than by awaitable operators or parallel groups. I suggest replacing the examples on timeouts by a single one using asio::cancel_after. - Type requirements for ExecutionContext (https://spacetime.mireo.com/async-mqtt5/async_mqtt5/ref/ExecutionContext.htm...) are not correct. You can probably just link to Asio's description: https://www.boost.org/doc/libs/1_86_0/doc/html/boost_asio/reference/Executio.... - The multithreaded example recommends the following code to run callbacks though a strand:
boost::asio::post( strand, [&client, &strand] { // Considering that the default associated executor of all completion handlers is the strand, // it is not necessary to explicitly bind it to async_run or other async_xxx's handlers. client.async_run(boost::asio::detached); } );
This form of post is not equivalent to asio::post(asio::bind_executor(ex, ...)), and can yield surprising behavior if the token has a bound executor. Furthermore, post might be less efficient than dispatch, and should only be used if a reschedule is required (which is not the case). I'd advise to change the above by:
boost::asio::dispatch(boost::asio::bind_executor( strand, [&client, &strand] { // Considering that the default associated executor of all completion handlers is the strand, // it is not necessary to explicitly bind it to async_run or other async_xxx's handlers. client.async_run(boost::asio::detached); } ));
- The multi-threaded example uses asio::io_context and a vector of threads. This is error-prone. It can be simplified by using asio::thread_pool, asio::thread_pool::attach and asio::thread_pool::join. - The coroutine examples can be simplified with the latest Asio additions. At the moment, asio::deferred is recommended over asio::use_awaitable for efficiency reasons. With partial tokens, async_xxx(asio::as_tuple(asio::deferred)) can be replaced by async_xxx(asio::as_tuple). - The coroutines examples shouldn't use asio::detached with asio::co_spawn, as this swallows any thrown exceptions. It's better to use [](std::exception_ptr e) { if(e) std::rethrow_exception(e); }, as this propagates exceptions to main. - It's probably better to avoid the unified header in the examples. We should encourage including individual headers, instead, as this is better for compile times. - In https://spacetime.mireo.com/async-mqtt5/async_mqtt5/connection_maintenance.h...: should be "the MQTT <protocol>" (missing word). - https://spacetime.mireo.com/async-mqtt5/async_mqtt5/optimising_communication...: typo: "simultanious". - https://spacetime.mireo.com/async-mqtt5/async_mqtt5/ref/mqtt_client/operator...: moved-from state probably supports being assigned to, too. - https://spacetime.mireo.com/async-mqtt5/async_mqtt5/ref/mqtt_client/tls_cont...: render error in the 1st template parameter. - https://spacetime.mireo.com/async-mqtt5/async_mqtt5/ref/mqtt_client/authenti...: render error in the 1st template parameter. - https://spacetime.mireo.com/async-mqtt5/async_mqtt5/ref/authority_path.html: member descriptions are empty.
Thanks Ivica for proposing the library. I really hope to see it in Boost in the near future.
Regards, Ruben.
dr. Ivica Siladic Chief Technology Officer Member of the Board Mireo d.d. T: +385 (1) 6636966 E: ivica.siladic@mireo.com www.mireo.com