On Fri, 1 Apr 2022 at 12:32, Marcelo Zimbres Silva
Hi Ruben,
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__connection/async_qu...
The handler signature for this operation is void(boost::mysql::error_code, boost::mysql::resultset<Stream>)
Hier the completion handler is returning an io object i.e. boost::mysql::resultset, which is also unusual as far as I can see. Does that play nice with executors? I would prefer having the completion handler return config parameters with which I can instantiate a boost::mysql::resultset myself in my desired executor.
The three I/O objects this library provides (connection, resultset and prepared_statement) are just proxies for the underlying Stream I/O object in terms of the executor (i.e. they just return the underlying Stream::get_executor() value).
My expectation is that the communication with the mysql server occurs only through the connection class. It feels awkward to me that I need these proxy objects when there is a connection object around, the lifetime of these proxies are bound to it for example.
In addition to a pointer to the connection object, resultsets also hold several pieces of metadata about rows. Because of how the MySQL protocol works, these pieces of metadata are mandatory to parse rows. The MySQL server sends these as several packets following connection::query or prepared_statement::execute, and the library stores these in the resultset object. In my opinion, you need an object representing this information about how to interpret a resultset. Something similar happens with prepared_statement. Something we could do is remove the pointer to the connection from the resultset and prepared_statement objects, making these just plain data objects. You would have the following function signatures in connection (I'll just list the sync-with-exception ones for brevity): void connection::query(string_view query, resultset& output); void connection::prepare_statement(string_view statement, prepared_statement& output); void connection::execute_statement(const prepared_statement& stmt, resultset& output); void connection::read_row(resultset& resultset, row& output); The drawback of this approach is that you should always call execute_statement and read_row on the connection object that created the statement or resultset object, and with this approach there is nothing enforcing this. I picked the other approach because I feel it is more semantic. Would these signatures make more sense for you? It is true that the current approach ties the lifetime of prepared_statement and resultset to connection's lifetime. It is indicated in the docs here https://anarthal.github.io/mysql/mysql/resultsets.html#mysql.resultsets.comp..., but maybe we could make this point clearer in the reference. I would also like to listen to what other members in the community think about this.
Some further points:
- A row in your library is basically std::vector<value>, which means std::vector<row> is pessimized storage for rows with same length. Aren't rows always equal in length for a table?
The std::vector<row> overloads are not optimized for performance, that is true. If you're striving for better performance, I would go for not having a vector at all, and reading one row at a time with resultset::read_one, which allows you to re-use the same row over and over, avoiding allocations almost at all. What would be your suggestion for the vector<row> approach? Having an ad-hoc, matrix-like container for values?
- Your value class is a variant behind the scenes i.e.
boost::variant2::variant
; I would like to understand what is the lifetime of the string_view. Is it pointing to some kind of internal buffer? I expect all data to be owned by the client class after the completion of an async operation. Having pointers pointing to the connection internal state feels bad.
The string_view's point to memory owned by rows, and never to the connection internal buffers. As long as you keep your row object alive, its values will be valid. This is also the reason why row objects are movable but not copyable. Currently, row packets are read into the row buffers directly and parsed in-place, to avoid copying. This is stated in the row object https://anarthal.github.io/mysql/mysql/ref/boost__mysql__row.html docs, but it may be worth noting it in the value object docs, too. Having all the variant alternatives as cheap-to-copy objects has the advantage of being able to implement functions with conversions, like value::get or value::get_optional, without worrying about expensive copies.
- There one further point that doesn't seem to play well with this variant design. Some people may store some kind of serialized data in the database e.g. json strings. It would be nice if it were possible to read those values avoiding temporaries, that is important to reduce latency and memory usage on large payloads. It would look something like
struct mydata1, mydata2; using myrow = std::tuple
myrow row; result.read_row(row);
In this case, the parser would have to call user code every time it reads user data in the socket buffer, instead of copying to additional buffers ( in order to hand it to user later when parsing is done).
I have in mind implementing this in future versions. My idea is creating
a concept similar to Beast's http::basic_parser
https://www.boost.org/doc/libs/master/libs/beast/doc/html/beast/ref/boost__b....
That row_parser object
should be a type with member functions like:
row_parser::on_value(std::uint8_t)
row_parser::on_value(std::uint16_t)
// And so on, for all the possible types MySQL allows
// Strings could have special handling, implementing incremental parsing
for them
That would be called like:
struct mydata1, mydata2;
using myrow = std::tuple
Marcelo
Ruben