Re: [boost] Proposing a MySQL client library for inclusion in Boost

1 Apr 2022

      On Fri, 1 Apr 2022 at 12:32, Marcelo Zimbres Silva <mzimbres@gmail.com>
wrote:
...
Hi Ruben,
...
...
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__connection/async_qu...
...
The handler signature for this operation is
void(boost::mysql::error_code, boost::mysql::resultset<Stream>)
Hier the completion handler is returning an io object i.e.
boost::mysql::resultset, which is also unusual as far as I
can see.  Does that play nice with executors? I would
prefer having the completion handler return config
parameters with which I can instantiate a
boost::mysql::resultset myself in my desired executor.
The three I/O objects this library provides (connection,
resultset and prepared_statement) are just proxies for the
underlying Stream I/O object in terms of the executor (i.e.
they just return the underlying Stream::get_executor()
value).
My expectation is that the communication with the mysql server occurs
only through the connection class. It feels awkward to me that I need
these proxy objects when there is a connection object around, the
lifetime of these proxies are bound to it for example.
In addition to a pointer to the connection object, resultsets also hold
several pieces of metadata about rows. Because of how the MySQL
protocol works, these pieces of metadata are mandatory to parse rows.
The MySQL server sends these as several packets following
connection::query or prepared_statement::execute, and the library
stores these in the resultset object. In my opinion, you need an object
representing this information about how to interpret a resultset.
Something similar happens with prepared_statement.

Something we could do is remove the pointer to the connection
from the resultset and prepared_statement objects, making these
just plain data objects. You would have the following function signatures
in connection (I'll just list the sync-with-exception ones for brevity):

void connection::query(string_view query, resultset& output);
void connection::prepare_statement(string_view statement,
prepared_statement& output);
void connection::execute_statement(const prepared_statement& stmt,
resultset& output);
void connection::read_row(resultset& resultset, row& output);

The drawback of this approach is that you should always
call execute_statement and read_row on the connection object
that created the statement or resultset object, and with this approach
there is nothing enforcing this. I picked the other approach because
I feel it is more semantic. Would these signatures make more sense for you?

It is true that the current approach ties the lifetime of
prepared_statement and resultset to connection's lifetime. It is indicated
in the docs here
<https://anarthal.github.io/mysql/mysql/resultsets.html#mysql.resultsets.complete>,
but maybe we could make this point clearer in the reference.

I would also like to listen to what other members in the community think
about this.
...
Some further points:
- A row in your library is basically std::vector<value>, which means
std::vector<row> is pessimized storage for rows with same length.
Aren't rows always equal in length for a table?
The std::vector<row> overloads are not optimized for performance, that is
true. If you're striving for better performance, I would go for not having
a vector at all, and reading one row at a time with resultset::read_one,
which
allows you to re-use the same row over and over, avoiding allocations
almost at all.

What would be your suggestion for the vector<row> approach? Having an
ad-hoc, matrix-like container for values?
...
- Your value class is a variant behind the scenes i.e.
boost::variant2::variant<null_t, std::int64_t, std::uint64_t,
boost::string_view, float, double, date, datetime, time>;
I would like to understand what is the lifetime of the string_view.
Is it pointing to some kind of internal buffer? I expect all data to
be owned by the client class after the completion of an async
operation. Having pointers pointing to the connection internal state
feels bad.
The string_view's point to memory owned by rows, and never to the
connection internal buffers. As long as you keep your row object
alive, its values will be valid. This is also the reason why row objects
are movable but not copyable. Currently, row packets are read into
the row buffers directly and parsed in-place, to avoid copying.
This is stated in the row object
<https://anarthal.github.io/mysql/mysql/ref/boost__mysql__row.html> docs,
but it may be worth noting it
in the value object docs, too.

Having all the variant alternatives as cheap-to-copy objects
has the advantage of being able to implement functions with
conversions, like value::get or value::get_optional, without
worrying about expensive copies.
...
- There one further point that doesn't seem to play well with this
variant design. Some people may store some kind of serialized data in
the database e.g. json strings. It would be nice if it were possible
to read those values avoiding temporaries, that is important to reduce
latency and memory usage on large payloads. It would look something
like
struct mydata1, mydata2;
  using myrow = std::tuple<std::string, ..., mydata2, ..., mydata, ...>
myrow row;
  result.read_row(row);
In this case, the parser would have to call user code every time it
reads user data in the socket buffer, instead of copying to additional
buffers ( in order to hand it to user later when parsing is done).
I have in mind implementing this in future versions. My idea is creating
a concept similar to Beast's http::basic_parser
<https://www.boost.org/doc/libs/master/libs/beast/doc/html/beast/ref/boost__beast__http__basic_parser.html>.
That row_parser object
should be a type with member functions like:

row_parser::on_value(std::uint8_t)
row_parser::on_value(std::uint16_t)
// And so on, for all the possible types MySQL allows
// Strings could have special handling, implementing incremental parsing
for them

That would be called like:

struct mydata1, mydata2;
using myrow = std::tuple<std::string, ..., mydata2, ..., mydata, ...>;
struct my_parser {
    myrow r;
    error_code on_value(std::uint8_t) { /* place the value in your row
object }
};

my_parser parser;
resultset.read_row(parser);

I'm not keen on implementing a higher-level system that allows the user
to describe rows and to pass them to read_row, as you described,
because I think this is typical ORM functionality, and thus should be
implemented by a higher-level component. ORMs usually need to attach
more information to row objects than the one we would need to parse the rows
(e.g. information to issue CREATE TABLE statements).
...
Marcelo
Ruben

Re: [boost] Proposing a MySQL client library for inclusion in Boost

Ruben Perez