Here is my review of Ruben Perez's proposed MySql library.

Background
----------

I have previously implemented C++ wrappers for PostgreSQL and SQLite, so I have some experience of what an SQL API can look like. I know little about ASIO. I have also recently used the AWS SDKs for C++ and Javascript to talk to DynamoDB; this has async functionality, which is interesting to compare.

I confess some minor disappointment that MySql, rather than PostgreSQL or SQLite, is the subject of this first Boost database library review, since those others have liberal licences that are closer to Boost's own licence than MySql's (and MariaDB's). But I don't think that should be a factor in the review.

Trying the library
------------------

I have tried using the library with

- g++ 10.2.1, Arm64, Debian Linux
- ASIO from Boost 1.74 (Debian packages)
- Amazon Aurora MySql-compatible edition

I've written a handful of simple test programs. Everything works as expected. Compilation times are a bit slow but not terrible.

The remainder of this review approximately follows the structure of the library documentation.

Introduction
------------

I note that "Ease of use" is claimed as the first design goal, which is good. I feel that some mention should be made of the existing C / C++ APIs and their deficiencies. You should also indicate whether or not the network protocol you are using to communicate with the server is a "public" interface with some sort of stability guarantee. (I guess maybe it is, if it is common to MySql and MariaDB.)

Tutorial
--------

The code fragments should start with the necessary #includes, OR you should prominently link to the complete tutorial source code at the start.

You say that "this tutorial assumes you have a basic familiarity with Boost.Asio". I think that's unfortunate. It should be possible for someone to use much of the library's functionality knowing almost nothing about ASIO. Remember your design goal of ease-of-use.
In fact, it IS possible to follow the tutorial with almost no knowledge of ASIO, because I have just done so. You have this boilerplate at the start of the tutorial:

    boost::asio::io_context ctx;
    boost::asio::ssl::context ssl_ctx(boost::asio::ssl::context::tls_client);
    boost::mysql::tcp_ssl_connection conn(ctx.get_executor(), ssl_ctx);
    boost::asio::ip::tcp::resolver resolver(ctx.get_executor());
    auto endpoints = resolver.resolve(argv[3], boost::mysql::default_port_string);
    boost::mysql::connection_params params(
        argv[1],  // username
        argv[2]   // password
    );
    conn.connect(*endpoints.begin(), params);
    // I guess that should really be doing something more
    // intelligent than just trying the first endpoint, right?

I would like to see a convenience function that hides all of that:

    auto conn = boost::mysql::make_connection( ...params... );

I guess this will need to manage a global, private, ctx object or something. There are various possibilities for the connection parameters to pass to that, e.g.

    connection_params params = {
        .hostname = ".....",
        .port = 3306,   // why is that a string in yours?
        .user = "admin",
        .password = "12345",
        .dbname = "foo"
    };
    make_connection(params);

Or

    make_connection("mysql://admin:12345@hostname:3306/dbname");

Now... why the heck does your connection_params struct use string_views? That ought to be a Regular Type, with Value Semantics, using std::strings. Is this the cult of not using strings because "avoid copying above all else"?

Another point about the connection parameters: you should provide a way to supply credentials without embedding them in the source code. You should aim to make the secure option the default and the simplest to use. I suggest that you support the ~/.my.cnf and /etc/my.cnf files and read passwords etc. from there, by default. You might also support getting credentials from environment variables or by parsing the command line. You could even have it prompt for a password.
Does MySQL support authentication using SSL client certs? I try to use this for PostgreSQL when I can. If it does, you should try to support that too.

About two thirds of the way through the tutorial, it goes from "Hello World" to retrieving "employees". Please finish the hello world example with code that gets the "Hello World" string from the results and prints it.

Queries
-------

I encourage you to present prepared queries first in the documentation and to use them almost exclusively in the tutorial and examples.

You say that "client side query composition is not available". What do you mean by "query composition"? I think you mean concatenating strings together'); drop table users; -- to form queries, right? Is that standard MySql terminology? I suggest that you replace the term with something like "dangerous string concatenation".

In any case, that functionality *is* available, isn't it! It's trivial to concatenate strings and pass them to your text query functions. You're not doing anything to block that. So what you're really saying is that you have not provided any features to help users do this *safely*. I think that's a serious omission. It would not be difficult for you to provide an escape_for_mysql_quoted_string() function, rather than having every user roll their own slightly broken version.

IIRC, in PostgreSQL you can only use prepared statements for SELECT, UPDATE, INSERT and DELETE statements; if you want to do something like

    ALTER TABLE a ALTER COLUMN c SET DEFAULT = ?

or

    CREATE VIEW v AS SELECT * FROM t WHERE c = ?

then you are forced to do string concatenation with escaping of parameters. But according to https://dev.mysql.com/doc/refman/8.0/en/sql-prepared-statements.html, MySQL allows more statements to be prepared. In what cases can prepared statements not be used? Perhaps non-prepared statements really can be hidden as mysql::unsafe::statement?

Regarding your prepared statements: tcp_ssl_prepared_statement is verbose.
Why does the prepared statement type depend on the underlying connection type? I have to change it if I change the connection type?! If that's unavoidable, I suggest putting a type alias in the connection type:

    connection_t conn = ....;
    connection_t::prepared_statement stmt(.....);

Does MySql allow numbered or named parameters? SQLite supports ?1 and :name; I think PostgreSQL uses $n. Queries with lots of parameters are error-prone if you just have ?. If MySql does support this, it would be good to see it used in some of the examples.

Invoking the prepared statement seems unnecessarily verbose. Why can't I just write

    auto result = my_query("hello", "world", 42);

? That's a variadic operator(). In my PostgreSQL library, I chose to specify the query parameter types as template parameters to the query (prepared statement) type:

    Query<std::string, std::string, int> my_query(".....");

Query's operator() takes exactly those types (maybe with some TMP to use std::string_view where std::string is specified). With hindsight, a default where all parameters are strings would have been useful in practice.

I also added Query variants where the result is expected to be

- A single value, e.g. a SELECT COUNT(*) statement.
- Empty, e.g. INSERT or DELETE.
- A single row.
- Zero or one rows.
- A single column.

This is just syntactic sugar but it makes life easier. For example:

    SingletonQuery<int, string> sessions_for_user(
        "SELECT COUNT(*) FROM sessions WHERE username = ?");
    // int is the return type, string is the one parameter type.
    auto n_sessions = sessions_for_user("phil");

I don't see anything about the result of an INSERT or UPDATE. PostgreSQL tells me the number of rows affected, which I have found useful to return to the application.

resultset, row and value
------------------------

I'm not enthusiastic about the behaviour or the names of these types:

- resultset is not a set of results. It's more like a sequence of rows.
  But more importantly, it's lazy; it's something like an input stream, or an input range. So why not actually make it an input range, i.e. make it "a model of the input_range concept"? Then we could write things like:

      auto result = ...execute query...
      for (auto&& row: result) { ... }

- row: not bad, it does actually represent a row; it's a shame it's not a regular type though.

- value: it's not a value! It doesn't have value semantics! I guess it is time to address the string_view issue.....

I just modified one of my example programs to copy from the string_view in the variant into a std::string, and it slowed the program down by... no, it actually went fractionally faster. The use of string_views in the value variant seems to me to be a premature optimisation.

There are significant disadvantages to using string_view here. Your default API should have value semantics. If you want, provide another "low copying" API for users who want it, with names that make it clear what they are doing. The problem is that the lifetime of the string_view is tied to the row in a non-obvious way.

I'm also uncertain that a variant for the individual values is the right solution here. All the values in a column should have the same type, right? (Though some can be null.) So I would make row a tuple. Rather than querying individual values for their type, have users query the column.

MySQL to C++ mapping reference
------------------------------

It seems odd that MySQL small integers all map to C++ 64-bit types.

I use NUMERIC quite a lot in PostgreSQL; I don't know if the MySql type is similar. I would find it inconvenient that it is treated as a string. Can it not be converted to a numeric type, if that is what the user wants?

Need to consume the entire resultset
------------------------------------

I seem to get an assertion if I fail to read the resultset (resultset.hpp:70). Could this throw instead? Or, should the library read and discard the unread results in this case?
Going Async
-----------

I have tried this with both callbacks and C++20 coroutines and it seems to work as expected. I'm actually delighted to see the support for coroutines here.

But the lack of protocol support for multiple in-flight queries immediately becomes apparent. It almost makes me question the value of the library - what's the point of the async support, if we then have to serialise the queries?

Should the library provide this serialisation for us? I.e. if I async_execute a query while another is in progress, the library could wait for the first to complete before starting the second. Or, should the library provide a connection pool? (Does some other part of ASIO provide connection pool functionality that can be used here?)

Transactions
------------

I have found it useful to have a Transaction class:

    {
        Transaction t(conn);   // Issues "BEGIN"
        .... run queries ....
        t.commit();            // Issues "COMMIT"
    }   // t's dtor issues "ROLLBACK" if we have not committed.

Nested transactions are an interesting problem.

Trademark
---------

Klemens Morgenstern makes the point that MySql is a trademark of Oracle. Calling this "Boost.MySql" doesn't look great to me. How can you write "The Boost MySql-compatible Database Library" more concisely?

Conclusion
==========

Good points:

- It actually works!
- Great to see coroutines!

Bad points:

- The resultset / row / value type system is a poor design.
- A lot of boilerplate could be hidden from many users.
- Connection pool may be required for useful performance.
- Several features should be re-worked to be secure by default.
- Use of MySQL trademark needs to be considered.

Overall, I think this proposal needs a fair amount of API re-design and additional features to be accepted, and should be rejected at this time. It does seem to be a good start though! Thanks to Ruben for the submission.

(I have spent one working day on this review. I have not looked at any of the code.)

Regards, Phil.