Re: [boost] Review of Boost.MySql

11 May 2022

      On Tue, 10 May 2022 at 22:33, Marcelo Zimbres Silva via Boost
<boost@lists.boost.org> wrote:
...
Hi, here is my review.
Thanks for taking your time to write your review.
...
1) ================================
...
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__resultset/async_rea...
The handler signature for this operation is
void(boost::mysql::error_code, std::vector<boost::mysql::row>).
etc.
Having a std::vector in the signature of the completion handler is
unusual in ASIO.  In general, large objects should be passed as
parameters to the initiating function.
Additionally, this operation does not support custom allocators. That
wouldn't be a big problem to me if I could reuse the same memory for
each request but the interface above prevents me from doing so.
It is also a pessimized storage as each row will allocate its own
chunk of memory. A single block of memory would be better.
Those functions don't strive for efficiency, that is true. I think they can be
useful for people using callbacks, for example. I see two options here,
and I'm happy with both of them:

a) Change the function signature to
async_read_all(vector<row, Allocator>&, CompletionToken&&), with the completion
handler signature being void(error_code)
b) Remove the functionality altogether.

I'm happy to make a) or b) acceptance criteria for inclusion. I've raised
https://github.com/anarthal/mysql/issues/58 to track this.
...
2) ================================
...
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__connection/async_qu...
The handler signature for this operation is
void(boost::mysql::error_code, boost::mysql::resultset<Stream>)
Here the completion handler is returning an io object i.e.
boost::mysql::resultset, which is also unusual as far as I can see.
Wouldn't it make sense for users to specify a different executor for
the resultset? This interface is preventing that.
Please note that `connection`, `resultset` and `prepared_statement`
always use the same executor as the underlying `Stream` object, as
returned by `connection::next_layer`. `resultset` and `prepared_statement`
are proxy I/O objects to make the operations more semantic, but end up
in calls to the underlying `Stream` read and write functions. I don't think
it is possible at all to have separate executors for different objects
in this context.
...
I prefer having the completion handler return config parameters with
which I can instantiate a boost::mysql::resultset myself in my desired
executor.
As pointed above, I don't think this makes sense, unless I'm missing
something. However, returning resultsets by lvalue reference instead of
as completion handler arguments may be positive for efficiency, as it
would allow more memory reuse. Resultsets hold table metadata,
which is required to parse rows and lives in dynamic storage.
There would be no performance gain for prepared_statatetment's,
as these perform no allocations.

I'm happy for this to become an acceptance condition, too,
but please note that it won't allow different executors in any case.
I've raised https://github.com/anarthal/mysql/issues/59
to track the issue.
...
3) ================================
...
https://github.com/anarthal/mysql/blob/13d3615464f41222052d9ad8fa0c86bcdddee...
I am not convinced the design of the value class is a good idea.  The
reason is that users know at compile time what types they are
expecting from a given query. That means it would be possible for them
to specify that before doing the query. So instead of doing this
https://github.com/anarthal/mysql/blob/13d3615464f41222052d9ad8fa0c86bcdddee...
boost::mysql::row row;
   while (result.async_read_one(row, additional_info, yield[ec]))
users could do
boost::mysql::row<T1, T2, T3, T3> row;
   while (result.async_read_one(row, additional_info, yield[ec]))
avoiding unnecessary further copies. In principle it would be even
possible to parse directly in a custom data structure, for example,
when storing json string in the database.
I see different use cases here:

a) Parsing a row into a data structure known at compile-time. Rather than
a row<T1, T2, T3, T4> I would go for a std::tuple<T1, T2, T3, T4> and
a describe-able struct (annotated by BOOST_DESCRIBE_STRUCT).
b) Parsing into custom data structures with arbitrary user code. This
would be the case when parsing a JSON directly into a user-provided
data structure.
c) Parsing a row where the schema is not known at compile-time. This
would be the case when implementing a MySQL command-line interface,
for example.

The current interface exposes c) natively, and would require a) and b)
to be implemented on top of it by the user. Support for a) can be added by
adding overloads to resultset::read_one. Supporting b) natively would require
something like Beast's HTTP parser.

I would say the main gain here is on user code simplicity rather than
efficiency. There may be some efficiency gains when working with long
strings, but I don't think they are relevant when compared to I/O operation
overhead.

Implementing a) natively is a decent amount of work but can be done.
I would be reluctant to expose an interface for b) from the beginning as
it increases the API surface of the library. In any case, I wouldn't get
rid of the variant interface, as I think it can be useful for some use cases.

I've raised https://github.com/anarthal/mysql/issues/60 to address this.
...
4) ================================
...
https://anarthal.github.io/mysql/mysql/examples.html
IMO, the examples are too limited and show variation of the same
functionality. For example
- Text query, async with callbacks
      - Text query, async with futures
      - Text query, async with Boost.Coroutine coroutines
      - Text query, async with C++20 coroutines
      - Default completion tokens
Here you are showing nice Asio features. I am not meaning you should
remove them but if I am not mistaken there is no example that shows
- How to keep long lasting connections open (specially for ssl).
Unless I'm missing something obvious, I don't think there is anything to
do on the user side to keep connections open. Once you connect the
underlying `Stream` and call `connection::handshake` (or the composed
function `connection::connect`), the connection will remain open until:

a) It is closed by the user (e.g. by calling `connection::close`).
b) The connection remains idle by a certain amount of time, as defined
by the MySQL variable
https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_....
By default, this timeout is 8h. It can be changed at the database level and on a
per-connection basis, by issuing a `SET SESSION` command
(e.g. `conn.query("SET SESSION wait_timeout = 3600");` would set the
connection timeout to 1h). Note that this is set *after* the connection is
established.

Unless I'm missing something, there is nothing special in handling
SSL long-lasting connections - keep alives happen at the TCP level.
There is no keep-alive mechanism at the MySQL protocol level.

I've raised https://github.com/anarthal/mysql/issues/61
to add an example demonstrating b).
...
- Timers for resolve, connect, read and write.
There is an example on how to use Asio cancellation
mechanism together with C++20 coroutines and Asio timers to implement
timeouts for name resolution, connection establishment and queries here:
https://anarthal.github.io/mysql/mysql/examples/timeouts.html
Is this what you mean?
...
- Synchronized writes.
Not following you here, could you please elaborate on this particular
use case?
...
- Healthy checks, failover etc.
At the protocol level, there is a COM_PING command
(https://dev.mysql.com/doc/dev/mysql-server/latest/page_protocol_com_ping.htm...)
that allows the user to check whether the server is still alive or
not. Issuing that
requires additional library support. The interface for this would be
something like
`void connection::ping(error_code&)`. It would return an empty error
code on success
and an appropriate error code on failure. I guess this can be
currently be work-arounded by issuing a dummy SQL query to the server as
a health-check.

I've raised https://github.com/anarthal/mysql/issues/62
to track the PING addition.

I need to further investigate failover in replica set scenarios.
I hope to do it today evening. I've raised
https://github.com/anarthal/mysql/issues/63
to track this.
...
I estimate that to address these points users will be required to
write further 500 - 1000 LOC on top of your library. Otherwise they
will be stuck on underperforming short lived connections.
It would be helpful for users to have a starting point to begin with.
There some minor things I have also noticed:
5) ================================
...
https://github.com/anarthal/mysql/blob/13d3615464f41222052d9ad8fa0c86bcdddee...
I expect this function to be O(1).
Raised https://github.com/anarthal/mysql/issues/64
to track it.
...
6) ================================
...
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__value/get_std_optio...
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__value/get_optional....
Sounds also unusual to have two member functions for the different
versions of optional. I suggest using the boost version until we
transition to C++17.
Why do you think it can be a problem? I would say having both just targets
a broader audience: those using `boost::optional` and those using
`std::optional`,
without compromising either.
...
7) ================================
...
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__errc.html
errc should be conditions and not error AFAICS.
Not following you here - errc here is just acting the same as
`boost::beast::http::error` enum.
...
============= Review End ================
I vote for conditional acceptance. Points 1, 2, 3 and 4 should be
addressed before acceptance. My acceptance criteria sums up to
1) Remove functionality that provides low performance. With
coroutines it will be easy for users to loop on async_query to create
a std::vector of rows.
As I mentioned above, either removing or switching to return-by-lvalue
seems OK as an acceptance condition for me.
...
2) No IO object should be returned as a parameter of the completion handler.
I can accept this as a performance improvement, but please read the note
on executors above.
...
3) Review the value type. Can it be improved?
Does this mean that you consider parsing into compile-time
user-defined data structures (e.g. `tuple`) an acceptance condition?
...
4) Improve examples.
Will inform as soon as I have an answer on failover, happy to provide
the other ones.
...
Regards,
Marcelo
Regards,
Ruben.
...
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] Review of Boost.MySql

Ruben Perez