On 18.10.2024., at 17:56, Ruben Perez
Hi Ivica,
First of all, thanks for submitting your library to Boost.
Thanks a lot Ruben for taking a look into the library and especially for your comments bellow!
I've been reading the discussion and reference and have some questions for you:
1. The limitation listed on https://spacetime.mireo.com/async-mqtt5/async_mqtt5/auto_reconnect.html about concealing configuration issues looks concerning, as it can be very frustrating for the user. As far as I know, Boost.Redis solved it by including logging as part of its API, and Boost.MySQL did it by providing an extra diagnostics output argument, populated on operation cancellation, that informs the user of potential causes to the problem. Neither of us are very happy with our solution, to be frank. But I think it's a point that might be worth solving before going into Boost. As an option, I have been looking into abusing Asio handler tracking (https://live.boost.org/doc/libs/1_86_0/doc/html/boost_asio/overview/core/han...) to log custom events. BOOST_ASIO_HANDLER_OPERATION might be a possibility. What are your thoughts regarding this?
We’ve had extensive internal discussions about logging. We’re fully aware that the lack of logging can be a major issue for users. For example, if you don’t properly set up CA storage for an SSL connection, the TLS handshake will continually fail, and the client will keep trying to reconnect. As a user, you’d have no indication of what went wrong. Throughout the library, we’ve been careful to ensure that "you don’t get for what you did’t pay," especially when it comes to performance. In terms of logging, this means we wanted to avoid any runtime overhead associated with logging if you choose not to use it, even simple `if` checks. For this reason, we didn’t want to use something like `bool logging` to enable or disable logging. Another option was to use a (defaulted) template parameter that would generate logging code if enabled. However, this cluttered the code significantly, as nearly every class would need an additional template parameter. Worse yet, in most cases where logging is needed — such as during reconnections — it was unclear what meaningful, helpful messages to log. The Stream template parameter’s interface can report a variety of errors, and without knowing the exact nature of the Stream, it’s not always possible to interpret them correctly. Using a macro like MQTT5_ENABLE_LOGGING might be the best option, but for now, we wanted to avoid reducing code readability by adding logging lines throughout the code. In the absence of a better solution, we recommend users add their own logging code directly into the async_mqtt5 source during development. It’s very far from ideal, but we couldn’t come up with a better approach.
2. Quoting the allocators section (https://spacetime.mireo.com/async-mqtt5/async_mqtt5/asio_compliance/allocato...): "[...] associates the allocator of the first packet in the queue to the low-level Boost.Asio function async_write on the transport layer". Does this mean that the queue might use the allocator bound to the first async_publish operation that is issued? If this is the case, wouldn't it be violating the requirement that all memory allocated by an associated allocator is deallocated before the async operation completes?
It's a bit more complicated. The internal queue is a std::vector with an explicit asio::recycling_allocator as the custom allocator. The queue contains references (not values) to the actual MQTT data, along with type-erased completion handlers. The actual MQTT data (packet) is stored in memory that was allocated using the allocator associated with the corresponding handler. When we call async_write on the underlying stream, we supply the entire write queue at once as a ConstBuffers parameter, effectively performing a scatter/gather write. The allocator from the first handler in the queue is associated with the completion handler of async_write. Once async_write completes, we clear the queue and complete each (outer) handler associated with the packets in the queue. The memory allocated for the MQTT packets is also released before the handler is invoked. As a result, all memory allocated by an associated allocator is deallocated before the asynchronous operation completes.
3. Quoting the executors section (https://spacetime.mireo.com/async-mqtt5/async_mqtt5/asio_compliance/executor...): "The same executor must execute mqtt_client::async_run and all the subsequent async_xxx operations". This sounds like an unusual restriction. What happens if it is violated? Wouldn't it make more sense to use the executor associated with async_run for intermediate handlers, and then the async_xxx associated executor for the final one?
The `async_run` function starts the internal stream read loop. The executor used for the read operation is either the one associated with the `async_run` completion handler or the default executor provided to the `mqtt_client` constructor. Asynchronous operations like `async_publish` are composed operations that involve both writing to stream and reading messages. These operations must be synchronized with the read loop. If we were to use a different executor for `async_publish` than the one used for the internal read loop, we would need to synchronize access to shared data (like read buffers) and, more importantly, to the underlying stream’s `async_read` and `async_write` operations. This is why the same executor must be used for both `async_run` and other asynchronous operations.
4. I noticed that async_mqtt5 does not implement persistent and clean sessions. I'm no MQTT expert, so I can't evaluate whether these features are relevant or not - I just know that they exist. Is there a reason why you chose to not implement them? If so, I'd suggest adding these reasons somewhere in the docs.
We always initiate the connection with Clean Start set to 0. This is due to the automatic reconnect functionality, which cannot be overridden. From the user's perspective, the mqtt_client interface behaves as though the underlying TCP connection is always "on." If the connection drops, the client will do everything it can to restore the session, which is the essence of Clean Start being set to false. Allowing Clean Start to be true would mean all pending async_publish operations would need to be canceled, leaving it up to the user to decide what to do with those packets. In most cases, the user would likely re-send them, especially when using QoS > 0, which is exactly what the client automatically handles when Clean Start is set to false. In short, we think that automatic reconnect and Clean Start = true are not logically compatible.
5. According to what I read in the docs, message payloads might contain UTF-8 encoded text or just data blobs, depending on the payload_format_indicator property passed to publish. However, std::string is always used for payloads. As I understand it, if I want to send a binary blob, I need to copy my blob to a string (probably with a reinterpret_cast in the way), and then setting payload_format_indicator. Would it make sense to expose a simpler API taking a blob (e.g. a std::vector<unsigned char>) to make this use case safer?
We use std::string instead of std::vector<unsigned char> for MQTT message payloads because, in practice, people often use strings (like JSON) as MQTT messages. This is also hinted at in the MQTT specification, which includes the payload_format_indicator with only two possible values: 0 and 1. A value of 0 means the format is unspecified, while 1 indicates a UTF-8 string. Since they gave special significance to UTF-8 strings, it made sense for us to use std::string as the more common format. However, you don’t need to explicitly set payload_format_indicator to 1 or 0, regardless of what you’re sending. The payload_format_indicator is an optional property that the broker may use to validate the payload. If you omit this property, the broker will treat the payload as an unspecified binary array and won’t perform any validation checks. Therefore, there's no need to explicitly set the payload_format_indicator to 0 in such cases. To address your related question—why we decided to use an owning type for the message payload—it’s due to the symmetry between async_publish and async_receive. In async_receive, the payload is obtained through the completion handler's arguments. When messages are received, they are stored (moved) into the Asio channel, and the underlying buffer used for stream reading is reused. Since messages might remain in the channel for some time (until the user reads them), their payloads are moved (or copied) away from the operating read buffer. Yes, async_publish involves an extra payload copy. We were aware that the typical pattern is to keep the buffer alive until the async operation completes. While this extra copy can be avoided at the cost of slightly more complex code, in our performance tests, we found no significant impact from the additional copy. This is likely because copying results in the entire packet occupying a contiguous block of memory, rather than being split into two parts, which would otherwise require scatter/gather writes. We believe this is related to L1 cache efficiency.
6. I can't find any docs (other than the examples) on how to use TLS. From the simplest example (https://spacetime.mireo.com/async-mqtt5/async_mqtt5/hello_world_over_tls.htm...) it looks like setting up TLS requires defining some non-trivial template specializations. I'd like these to be documented. On the other hand, the specializations seem to be using only Boost.Asio types, rather than user-defined types. What would happen if two libraries separately define these hooks because they need MQTT over TLS? Wouldn't it make more sense to provide an optional header providing these functions?
First of all, you're correct—it seems we didn’t properly document the TLS customization points. The example you mentioned (hello_world_over_tls.cpp) includes code that shows what needs to be defined to use a boost::asio::ssl stream. The first mandatory customization point is a specialization of the tls_handshake_type template, which must contain two constexpr values: client and server. The second customization point is the assign_tls_sni function, which will be called when SNI (Server Name Indication) needs to be set. The TLS stream used by async_mqtt5 isn’t limited to boost::asio::ssl. In fact, we’ve designed it to work with both boost::asio::ssl and Botan Asio streams, to ensure we cover at least two different cases. In this setup, the TlsContext is fully opaque to mqtt_client, meaning that mqtt_client doesn’t make any assumptions about the TlsContext type other than it being move-constructible. The stream itself uses this context, and it must be compatible with the stream. We simply guarantee that the TlsStream instance will remain alive as long as the client is alive.
7. What's the use of the TlsContext type in mqtt_client? It looks very coupled to asio::ssl::stream.
As mentioned above, for boost::asio::ssl you would use boost::asio::ssl::context. For Botan TLS stream, there's equivalent context type.
8. I've noticed that async_receive returns with an error code whenever a reconnection occurs, which forces the user to re-subscribe to topics. This contrasts with how async_publish works, where reconnection is transparent to the user. Is there a reason for this difference? Would it make sense to implement automatic resubscription on reconnection?
Unlike the reconnect logic, we wanted to expose potential async_subscribe MQTT errors directly to the user. For example, an attempt to subscribe to a topic might result in an MQTT error like "Quota exceeded." There are nine different subscription error codes, and users may need to make different decisions based on the specific error. That’s why users must explicitly re-subscribe if they choose to do so after receiving an error code from the async_receive function.
Regards, Ruben.
With regards, Ivica Siladic