Review of Boost.MySQL


On Tue, 10 May 2022 at 22:33, Marcelo Zimbres Silva via Boost < boost@lists.boost.org> wrote:
Hi, here is my review.
1) ================================
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__resultset/async_rea...
The handler signature for this operation is void(boost::mysql::error_code, std::vector<boost::mysql::row>).
Having a std::vector in the signature of the completion handler is unusual in ASIO. In general, large objects should be passed as parameters to the initiating function.
Additionally, this operation does not support custom allocators. That wouldn't be a big problem for me if I could reuse the same memory for each request, but the interface above prevents me from doing so.
It is also pessimized storage, as each row will allocate its own chunk of memory. A single block of memory would be better.
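For illustration, the kind of interface being asked for might look like this (hypothetical sketch: an async_read_all taking caller-owned storage is not part of the reviewed API):

// Hypothetical shape of the request above - async_read_all taking
// caller-owned storage is NOT part of the reviewed library:
std::vector<boost::mysql::row> rows; // owned by the caller, reused across requests

rows.clear(); // keeps capacity, so per-request allocations amortize away
resultset.async_read_all(rows, [](boost::mysql::error_code ec) {
    // rows has been filled in place; the handler signature is just
    // void(error_code) - no large object passed by value
});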
2) ================================
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__connection/async_qu...
The handler signature for this operation is void(boost::mysql::error_code, boost::mysql::resultset<Stream>)
Here the completion handler is returning an io object, i.e. boost::mysql::resultset, which is also unusual as far as I can see. Wouldn't it make sense for users to specify a different executor for the resultset? This interface is preventing that.
I prefer having the completion handler return config parameters with which I can instantiate a boost::mysql::resultset myself in my desired executor.
I seem to remember this being commented on during the development of the library. It is worth remembering that because this is MySQL, nothing can be done with the connection object until the resultset has been fully read. In effect, the resultset can be thought of as a modal extension to the connection. Causing it to run on a different executor would mean that calls to the resultset would need to be marshalled to the connection's executor in any case, and vice versa. This may be of questionable value given the limitation of the underlying protocol.

This arguably strengthens the case for the use of a connection pool, since without one, as you say, frequent queries will have setup overhead.

The point about preferring not to return IO objects by value is a separate stylistic issue, on which I offer no opinion. However, it is worth noting that Asio itself does have a form of the ip::tcp::acceptor::async_accept initiating function whose completion handler is passed a socket: https://www.boost.org/doc/libs/1_79_0/doc/html/boost_asio/reference/basic_so...
3) ================================
https://github.com/anarthal/mysql/blob/13d3615464f41222052d9ad8fa0c86bcdddee...
I am not convinced the design of the value class is a good idea. The reason is that users know at compile time what types they are expecting from a given query. That means it would be possible for them to specify that before doing the query. So instead of doing this
https://github.com/anarthal/mysql/blob/13d3615464f41222052d9ad8fa0c86bcdddee...
boost::mysql::row row; while (result.async_read_one(row, additional_info, yield[ec]))
users could do
boost::mysql::row<T1, T2, T3, T3> row; while (result.async_read_one(row, additional_info, yield[ec]))
avoiding further unnecessary copies. In principle it would even be possible to parse directly into a custom data structure, for example when storing JSON strings in the database.
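A sketch of what that could look like (hypothetical - row<Ts...> and my_json_type do not exist in the reviewed library):

// Hypothetical typed-row sketch; nothing here is reviewed API. Columns
// are parsed directly into the named C++ types, with no intermediate
// boost::mysql::value:
boost::mysql::row<std::int64_t, std::string, my_json_type> row;
while (result.async_read_one(row, additional_info, yield[ec]))
{
    auto id = std::get<0>(row);   // already an integer, no runtime type switch
    auto& doc = std::get<2>(row); // e.g. parsed straight from a JSON TEXT column
}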
4) ================================
IMO, the examples are too limited and show variations of the same functionality. For example:
- Text query, async with callbacks
- Text query, async with futures
- Text query, async with Boost.Coroutine coroutines
- Text query, async with C++20 coroutines
- Default completion tokens
Here you are showing nice Asio features. I am not saying you should remove them, but if I am not mistaken there is no example that shows:
- How to keep long-lasting connections open (especially for SSL).
- Timers for resolve, connect, read and write (see the timeout sketch below).
- Synchronized writes.
- Health checks, failover etc.
I estimate that to address these points users will be required to write a further 500-1000 LOC on top of your library. Otherwise they will be stuck with underperforming short-lived connections.
It would be helpful for users to have a starting point.
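On the timers point, the deadline part is expressible today with plain Asio, independently of this library. A sketch with C++20 coroutines and Asio's experimental awaitable operators (the library's async operations could be wrapped the same way; the 5-second deadline is arbitrary):

#include <boost/asio.hpp>
#include <boost/asio/experimental/awaitable_operators.hpp>
#include <chrono>

namespace asio = boost::asio;
using namespace asio::experimental::awaitable_operators;

// Race a TCP connect against a deadline; whichever completes first
// cancels the other. The same pattern works for resolve/read/write.
asio::awaitable<void> timed_connect(asio::ip::tcp::socket& sock,
                                    asio::ip::tcp::endpoint ep)
{
    asio::steady_timer timer(co_await asio::this_coro::executor);
    timer.expires_after(std::chrono::seconds(5));
    auto which = co_await (sock.async_connect(ep, asio::use_awaitable)
                           || timer.async_wait(asio::use_awaitable));
    if (which.index() == 1) // the timer won: the connect was cancelled
        throw boost::system::system_error(asio::error::timed_out);
}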
There are some minor things I have also noticed:
5) ================================
https://github.com/anarthal/mysql/blob/13d3615464f41222052d9ad8fa0c86bcdddee...
I expect this function to be O(1).
6) ================================
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__value/get_std_optio...
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__value/get_optional....
It also sounds unusual to have two member functions for the different versions of optional. I suggest using the Boost version until we transition to C++17.
7) ================================
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__errc.html
errc should be error conditions and not error codes, AFAICS.
============= Review End ================
I vote for conditional acceptance. Points 1, 2, 3 and 4 should be addressed before acceptance. My acceptance criteria sum up to:
1) Remove functionality that provides low performance. With coroutines it will be easy for users to loop on async_query to create a std::vector of rows (see the sketch after this list).
2) No IO object should be returned as a parameter of the completion handler.
3) Review the value type. Can it be improved?
4) Improve examples.
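For criterion 1, the loop in question is only a few lines when written against the reviewed async_read_one interface (a sketch, stackful-coroutine style as in the snippets above):

// Sketch against the reviewed async_read_one interface: collect all
// rows into a caller-owned vector.
std::vector<boost::mysql::row> rows;
boost::mysql::row row;
while (result.async_read_one(row, additional_info, yield[ec]) && !ec)
    rows.push_back(row);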
Regards, Marcelo

On Tue, May 10, 2022 at 10:54 PM Richard Hodges via Boost <boost@lists.boost.org> wrote:
It is worth remembering that because this is MySQL, nothing can be done with the connection object until the resultset has been fully read.
Interesting, didn't know that (not being familiar with MySQL).

Note that PostgreSQL's official C client API now offers a *pipeline* mode [1], which was part of the protocol for a while, but not exposed client-side until v14.

The protocol also supports two *encoding* modes, text or binary. Is MySQL's protocol text only? We tested both, and binary is faster (surprise, right), which matters to me/us.

There's also a special COPY mode, for higher-performance (and incremental IO) bulk row access. Any equivalent MySQL side? Again, COPY IN and OUT is faster than regular DML in PostgreSQL.

A little off-topic for this review, but OTOH answers to the above would help understand the kind of async possible with MySQL in general, and thus the proposed Boost.MySQL, at the protocol level.

More related to this review, I don't see any performance chapter in the doc's TOC comparing with the official MySQL (C I guess) API. Doing away with the official client code is a risk, and using ASIO brings in significant complexity (from my POV), so how is the proposed Boost.MySQL better than the official C client, or a good C++ wrapper on top of it?

Or is the target market of this library only existing ASIO users who want to perform MySQL queries/statements?

What's the target use-case? Small (in resultset size) and frequent (thus low duration / overhead) queries? Any wins on large bulk data loads for ETL tools, especially compared to (simpler?) code using the official client?

Efficiency is listed as a goal, but there's no absolute metric, nor comparison to the obvious alternative that's the official client API.

Again, I'm not a MySQL user at this point. But I'm a heavy user of PostgreSQL and SQLite. And also of Oracle OCI in the past. So I have interest and perspective on the domain.

I'm also aware of the Boost.PostgreSQL demo/proto Ruben made in the past, in the same style as Boost.MySQL. Thus IF I was a user of MySQL, the questions above would be first and foremost in my mind, about Boost.MySQL. And OTOH, if Boost.MySQL really has benefits, then maybe I can translate those benefits to that Boost.PostgreSQL proto?

Beyond the design and implementation of such an ASIO-based *from-scratch* C++ client, I feel like there's too much missing for me to evaluate the "What's-In-It-For-Me" and tradeoffs associated with the proposed Boost.MySQL.

Hopefully that makes sense :). Thanks, --DD

[1]: https://www.postgresql.org/docs/current/libpq-pipeline-mode.html

On Wed, 11 May 2022 at 10:19, Dominique Devienne via Boost <boost@lists.boost.org> wrote:
On Tue, May 10, 2022 at 10:54 PM Richard Hodges via Boost <boost@lists.boost.org> wrote:
It is worth remembering that because this is MySQL, nothing can be done with the connection object until the resultset has been fully read.
To be clear here, this is what the protocol for a query (or a prepared statement execution) looks like. Every "message" in the below diagram is a 4-byte header plus a payload.

client                    server
----- query request --->           // written by connection::query
<------- query OK ----             // read by connection::query
<------- metadata ----             // read by connection::query
<------- rows --------             // read by resultset::read_one
<------- EOF ---------             // read by resultset::read_one

When sending two queries, you can follow a strictly sequential model. Let's call this model a). It is the one implemented by this library:

client                    server
----- query request --->           // written by connection::query
<------- query OK ----             // read by connection::query
<------- metadata ----             // read by connection::query
<------- rows --------             // read by resultset::read_one
<------- EOF ---------             // read by resultset::read_one
----- query request --->           // written by connection::query
<------- query OK ----             // read by connection::query
<------- metadata ----             // read by connection::query
<------- rows --------             // read by resultset::read_one
<------- EOF ---------             // read by resultset::read_one

You could also initiate the next query without completely reading all the packets sent by the server. This would look like this:

client                    server
--- query request 1 --->
--- query request 2 --->
<----- query 1 OK ----
<----- metadata 1 ----
<----- rows 1 --------
<----- EOF 1 ---------
<----- query 2 OK ----
<----- metadata 2 ----
<----- rows 2 --------
<----- EOF 2 ---------

This is possible at the protocol level. Note that the server won't do any special handling here: it will process the two queries sequentially. It will read the first one, process it, then send all the response packets, then repeat for the second one.

This second model b) is *currently not possible* with the current interface. It would require a batch interface like:

serializer sr;
sr.add_query("SELECT * FROM table1");
sr.add_query("SELECT * FROM table2");
connection.write(sr);
resultset r1 = connection.read_query_result();
// Now you *MUST* read all the rows in this resultset
// before moving on to the next one
resultset r2 = connection.read_query_result();
// Same for r2

Note that this is different from a proper pipeline mode, as described by Postgres. I see two major differences:

1) In a real pipeline mode, the server processes the queries in batch. Here, the server still processes the queries sequentially.
2) Pipeline modes usually specify an option on what to do when a query in the pipeline fails. Here, you don't have that - subsequent queries will be executed regardless of the result of previous ones.

If you think this interface should be provided, please let me know, and I will raise the relevant issues.

More recent versions of MySQL (v8.x) include a plugin with a new version of the protocol, called the X protocol. I'm not an expert on it, AFAIK it was created with the idea of using MySQL as a document store, but can also be used for regular SQL ops. The documentation is here: https://dev.mysql.com/doc/dev/mysql-server/8.0.26/page_mysqlx_protocol.html

This protocol does have a pipeline mode. Please note that this library does *NOT* implement this protocol. There are actually two problems with it:

1) It employs Google's protobufs as message format, which creates licensing conflicts with anything submitted to Boost.
2) It only targets MySQL 8+, not MariaDB or MySQL v5.x.
In the classic protocol (the one implemented by this library), there are a couple of extra concepts that haven't been implemented yet:

1) Multi-statement. This is a primitive form of pipelining, where you specify several queries to connection::query() as a single string, separated by semicolons. The server sends several resultsets after that. I haven't focused a lot on this because it sounded risky (in terms of security) to me.
2) Multi-resultset. This is used with stored procedures, when a procedure returns more than one resultset.

This issue tracks both: https://github.com/anarthal/mysql/issues/8
MySQL docs on this: https://dev.mysql.com/doc/dev/mysql-server/8.0.26/page_protocol_command_phas...
Interesting, didn't know that (not being familiar with MySQL).
Note that PostgreSQL's official C client API now offers a *pipeline* mode [1], which was part of the protocol for a while, but not exposed client-side until v14.
The protocol also supports two *encoding* modes, text or binary. Is MySQL's protocol text only? We tested both, and binary is faster (surprise, right), which matters to me/us.
MySQL does define a text and a binary encoding. It will use the text encoding when using text queries (i.e. connection::query) and the binary encoding when using prepared statements (i.e. prepared_statement::execute). Resultset objects remember where they come from and will use the relevant encoding.
There's also a special COPY mode, for higher-performance (and incremental IO) bulk row access. Any equivalent MySQL side? Again, COPY IN and OUT is faster than regular DML in PostgreSQL.
There are two operations on the MySQL side:

1) The LOAD DATA statement (https://dev.mysql.com/doc/refman/8.0/en/load-data.html#load-data-local). You can use it like "LOAD DATA LOCAL INFILE 'test.csv' INTO TABLE test;" to do bulk loads from the client. This needs library support and is currently *NOT* implemented (I actually wasn't aware of the LOCAL option, thanks for bringing this up). I've raised https://github.com/anarthal/mysql/issues/67 to track this issue.
2) The SELECT INTO OUTFILE statement (https://dev.mysql.com/doc/refman/8.0/en/select-into.html). Unfortunately, this only writes files in the server host and not to the client. This means there is no special support required in the client. I'd say the usefulness of this statement is more limited than 1)
A little off-topic for this review, but OTOH answers to the above would help understand the kind of async possible with MySQL in general, and thus the proposed Boost.MySQL, at the protocol level.
More related to this review, I don't see any performance chapter in the doc's TOC comparing with the official MySQL (C I guess) API. Doing away with the official client code is a risk, and using ASIO brings in significant complexity (from my POV), so how is the proposed Boost.MySQL better than the official C client, or a good C++ wrapper on top of it?
I've updated https://github.com/anarthal/mysql/issues/50 to include this performance page. Traditionally, the C interface had only synchronous functions, so you would have to spawn a separate thread for each connection you had. With an async API, you can likely have much higher throughput. I've noticed that the official client has very recently added non-blocking functions. It's a curious interface, as it seems you have to repeatedly call the same function with the same parameters until the operation completes. https://dev.mysql.com/doc/c-api/8.0/en/c-api-asynchronous-interface-usage.ht... if you're curious.
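For the curious, the usage pattern those docs describe is a polling loop, roughly as below (paraphrased; verify the exact names against the MySQL documentation before relying on them):

/* Roughly the polling pattern described in the MySQL 8.0 docs
   (paraphrased from memory - check the names there): keep calling the
   same function until it stops reporting NET_ASYNC_NOT_READY. */
net_async_status status;
do {
    status = mysql_real_query_nonblocking(mysql, stmt_str, strlen(stmt_str));
    /* the application is free to do other work between calls */
} while (status == NET_ASYNC_NOT_READY);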
Or is the target market of this library only existing ASIO users who want to perform MySQL queries/statements?
This was definitely my initial target audience. But I think we can reach more than that.
What's the target use-case? Small (in resultset size) and frequent (thus low duration / overhead) queries? Any wins on large bulk data loads for ETL tools, especially compared to (simpler?) code using the official client?
I've been more focused on that first case, typical in web apps. I'd say ETL loads can be a target in the future, but we may have some missing features (like the LOAD DATA I mentioned above).
Efficiency is listed as a goal, but there's no absolute metric, nor comparison to the obvious alternative that's the official client API.
This is true. I'd say being able to use an async API can already grant the user some performance benefits over a sync-only API, but I agree that we should have benchmarks. Which benchmarks would you, as a user, find useful? I'm now thinking of single-query execution time as a measurement of latency, and bulk query execution time as a measurement of throughput. I'm open to suggestions.
Again, I'm not a MySQL user at this point. But I'm a heavy user of PostgreSQL and SQLite. And also of Oracle OCI in the past. So I have interest and perspective on the domain.
I'm also aware of the Boost.PostgreSQL demo/proto Ruben made in the past, in the same style as Boost.MySQL. Thus IF I was a user of MySQL, the questions above would be first and foremost in my mind, about Boost.MySQL. And OTOH, if Boost.MySQL really has benefits, then maybe I can translate those benefits to that Boost.PostgreSQL proto?
Beyond the design and implementation of such an ASIO-based *from-scratch* C++ client, I feel like there's too much missing for me to evaluate the "What's-In-It-For-Me" and tradeoffs associated with the proposed Boost.MySQL.
Hopefully that makes sense :). Thanks, --DD
[1]: https://www.postgresql.org/docs/current/libpq-pipeline-mode.html

On Thu, May 12, 2022 at 3:12 AM Ruben Perez via Boost <boost@lists.boost.org> wrote:
...Google's protobufs...creates licensing conflicts with anything submitted to Boost.
If you are only reading protobufs and using the most common subset of its features, it should be fairly easy to do a clean-room implementation of just the part of the protobuf format that you need to make your library work. It wouldn't be that much code either. It would not be necessary (or desirable) to require linking with the stock protobuf library. Thanks
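For a sense of scale: the protobuf wire format is just a sequence of (key, payload) records, where the key is a varint equal to (field_number << 3) | wire_type. A read-only subset starts with little more than a varint decoder (sketch):

#include <cstddef>
#include <cstdint>

// Decodes a base-128 varint; returns bytes consumed (0 on malformed input).
std::size_t read_varint(const std::uint8_t* p, std::size_t n, std::uint64_t& out)
{
    out = 0;
    for (std::size_t i = 0; i < n && i < 10; ++i)
    {
        out |= std::uint64_t(p[i] & 0x7f) << (7 * i);
        if ((p[i] & 0x80) == 0)
            return i + 1;
    }
    return 0;
}
// wire types: 0 = varint, 1 = 64-bit, 2 = length-delimited, 5 = 32-bit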

On Fri, 13 May 2022 at 19:29, Vinnie Falco <vinnie.falco@gmail.com> wrote:
On Thu, May 12, 2022 at 3:12 AM Ruben Perez via Boost <boost@lists.boost.org> wrote:
...Google's protobufs...creates licensing conflicts with anything submitted to Boost.
If you are only reading protobufs and using the most common subset of its features, it should be fairly easy to do a clean-room implementation of just the part of the protobuf format that you need to make your library work. It wouldn't be that much code either. It would not be necessary (or desirable) to require linking with the stock protobuf library.
It's good to know that this is not impossible, as I could see this library implementing the X protocol at some point. I don't have the time right now to do this, but it's good to know it can be done. Regards, Ruben.

On 10.05.2022 at 22:33, Marcelo Zimbres Silva via Boost wrote:
users could do
boost::mysql::row<T1, T2, T3, T3> row; while (result.async_read_one(row, additional_info, yield[ec]))
avoiding unnecessary further copies. In principle it would be even possible to parse directly in a custom data structure, for example, when storing json string in the database.
I've developed a small C++17 example of how to work with Input/Output transformers to write the statements more compactly and type-safely.

The base idea is that the MySQL <-> C++ mapping sometimes needs some sort of conversion, and combining most of the info at one point makes it easier, or at least possible, to optimize better - the current interface looks very runtime-allocation heavy :)

The Input and Output Transformers help to work with basic types and also with SQL/MySQL special types like null strings etc. - it's not always possible to map MySQL types 1:1 to C++ and back; sometimes you want the transformation to behave differently.

The Transformers aren't visible when used with basic types.

It's just an example to promote the idea: https://pastebin.com/raw/vepbTAKL

The best combination would be some sort of fluent SQL interface like: https://github.com/rbock/sqlpp11

But I still think a plain string+binding interface like Boost.MySql currently has is also needed.

On Wed, 11 May 2022 at 06:16, Dennis Luehring via Boost <boost@lists.boost.org> wrote:
On 10.05.2022 at 22:33, Marcelo Zimbres Silva via Boost wrote:
users could do
boost::mysql::row<T1, T2, T3, T3> row; while (result.async_read_one(row, additional_info, yield[ec]))
avoiding unnecessary further copies. In principle it would be even possible to parse directly in a custom data structure, for example, when storing json string in the database.
I've developed a small C++17 example of how to work with Input/Output transformers to write the statements more compactly and type-safely.
The base idea is that the MySQL <-> C++ mapping sometimes needs some sort of conversion, and combining most of the info at one point makes it easier, or at least possible, to optimize better - the current interface looks very runtime-allocation heavy :)
Thanks for sharing. I think I get the idea of "allowing the user to define custom MySQL <=> C++ mappings". I'm not really getting the interface that your input/output transformers would expose, though. Let's take this one, for example:

template <>
struct OutputTransformer<NullAsEmptyString> {
    using value_type = std::string;

    static std::string get_value( int /*index_*/ ) {
        static std::string x = "hello";
        return x;
    };
    static void get_value( int index_, std::string& value_ ) {
        value_ = get_value( index_ );
    };
};

I assume this is user-provided code. Where does the user get the string value from? What does index_ mean in this context?

Let's also take a look at actually using the statements:

My_select my_select( db_connection, "select a, b, c from test where d == ?1" );
{
    // fetch into ref tuple
    int some_int{};
    float some_float{};
    std::string some_string;
    my_select( { some_int, some_float, some_string }, { 123 } );
}

How is that different from the following snippet?

resultset r = conn.query("select a, b, c from test where d == ?1" );
tuple<int, float, string> row;
r.read_one(row);
The Input and Output Transformers help to work with basic types and also with SQL/MySQL special types like null strings etc. - it's not always possible to map MySQL types 1:1 to C++ and back; sometimes you want the transformation to behave differently.
The Transformers aren't visible when used with basic types.
It's just an example to promote the idea: https://pastebin.com/raw/vepbTAKL
The best combination would be some sort of fluent SQL interface like: https://github.com/rbock/sqlpp11
This library is supposed to be a protocol driver library, so it provides primitives close to the MySQL protocol. sqlpp11 is great, but it's a higher-level library. I don't think it makes sense to try to incorporate these kinds of features here. It would make more sense for a higher-level library like sqlpp11 to build on top of Boost.MySQL, instead.
But I still think a plain string+binding interface like Boost.MySql currently has is also needed.

On Thu, 12 May 2022 at 09:46, Ruben Perez via Boost <boost@lists.boost.org> wrote:
On Wed, 11 May 2022 at 06:16, Dennis Luehring via Boost <boost@lists.boost.org> wrote:
the best combination would be some sort of fluent SQL interface like: https://github.com/rbock/sqlpp11
This library is supposed to be a protocol driver library, so it provides primitives close to the MySQL protocol. sqlpp11 is great, but it's a higher level library.
Ruben, this is a very important feature that distinguishes your library from high-level 'common denominator' database libraries like sqlpp11, SOCI and others. It may be worth making this difference prominent somewhere in the docs. Best regards, -- Mateusz Loskot, http://mateusz.loskot.net

On Thu, 12 May 2022 at 10:07, Mateusz Loskot via Boost <boost@lists.boost.org> wrote:
This library is supposed to be a protocol driver library, so it provides primitives close to the MySQL protocol. sqlpp11 is great, but it's a higher level library.
Ruben, this is a very important feature that distinguishes your library from high-level 'common denominator' database libraries like sqlpp11, SOCI and others.
It may be worth making this difference prominent somewhere in the docs.
I honestly thought it was on the main page but it is not. Raised https://github.com/anarthal/mysql/issues/70.
Best regards, -- Mateusz Loskot, http://mateusz.loskot.net

On 12.05.2022 at 09:46, Ruben Perez wrote:
On Wed, 11 May 2022 at 06:16, Dennis Luehring via Boost <boost@lists.boost.org> wrote:
On 10.05.2022 at 22:33, Marcelo Zimbres Silva via Boost wrote:
users could do
boost::mysql::row<T1, T2, T3, T3> row; while (result.async_read_one(row, additional_info, yield[ec]))
avoiding unnecessary further copies. In principle it would be even possible to parse directly in a custom data structure, for example, when storing json string in the database.
I've developed a small C++17 example of how to work with Input/Output transformers to write the statements more compactly and type-safely.
The base idea is that the MySQL <-> C++ mapping sometimes needs some sort of conversion, and combining most of the info at one point makes it easier, or at least possible, to optimize better - the current interface looks very runtime-allocation heavy :)
Thanks for sharing. I think I get the idea of "allowing the user to define custom MySQL <=> C++ mappings".
I'm not really getting the interface that your input/output transformers would expose, though. Let's take this one, for example:
template <>
struct OutputTransformer<NullAsEmptyString> {
    using value_type = std::string;

    static std::string get_value( int /*index_*/ ) {
        static std::string x = "hello";
        return x;
    };
    static void get_value( int index_, std::string& value_ ) {
        value_ = get_value( index_ );
    };
};
I assume this is user-provided code. Where does the user get the string value from? What does index_ mean in this context?
Sorry for being unclear (and for sending a not directly fitting example). That code should be library code - more or less a collection of base MySQL concepts that can be used - this sample transformer lets you treat empty strings as NULL in MySQL - the implementation is a dummy, only there to give a feeling for how the data flows.

My adaption is used with SQLite, and the index is the parameter index that would then map to the SQLite bind functions, or, as in this case, check if the value is NULL and return "".

Plus several other "typical" helpers for adaption problems. The transformers also get used for all fetch routines.
Let's also take a look at actually using the statements:
My_select my_select( db_connection, "select a, b, c from test where d == ?1" );
{
    // fetch into ref tuple
    int some_int{};
    float some_float{};
    std::string some_string;
    my_select( { some_int, some_float, some_string }, { 123 } );
}
How is that different from the following snippet?
resultset r = conn.query("select a, b, c from test where d == ?1" );
tuple<int, float, string> row;
r.read_one(row);
My goal was to keep the SQL string combined with the Prepared_fetch_1 instantiation, but string use in templates is a little bit limited. I also map input types for inserts or where clauses - that's also possible with split tuples for the input/output data, but then it's even more separated from the statement (which is tied to the input/output types). To know as much as possible beforehand allows maybe deeper optimization etc. - for example, the my_select instance can use prepared statements per default (and this is connection oriented with sqlite).

The "readers" are just variants (that also allow being const - see the const auto tuple):

// fetch into ref tuple
my_select( { ein_int, ein_float, ein_string }, { 123 } );

// return value tuple
const auto [ein_int2, ein_float2, ein_string2] = my_select( { 123 } );

// fetch into class/struct...
Result3 result;
my_select( result, { 123 } );

The real optimization party starts with multi-row fetches:

// multi row fetch
using My_select = Prepared_fetch<std::tuple<int>, std::tuple<int, float, NullAsEmptyString>>;
My_select my_select( db_connection, "select a, b from test where c == ?1" );

std::vector<Result2> result;
my_select.fetch_copy( std::back_inserter( result ), 34 );
my_select.fetch_copy( result, 34 );

auto fetch_func = []( const int& /*ein_int_*/, const float& /*ein_float_*/, const std::string& /*ein_string_*/ ) {};
my_select.fetch_func( fetch_func, 34 );

auto fetch_func_cancel = []( const int& /*ein_int_*/, const float& /*ein_float_*/, const std::string& /*ein_string_*/ ) { return false; };
my_select.fetch_func_with_cancel( fetch_func_cancel, 34 );

Because I know at instantiation time what parts are fixed, variant sizes etc., I can further reduce the memory overhead etc. - you could directly combine the protocol result parsing with the result-set content etc.

It's not implemented in my SQLite wrapper so far, but the interface allows such optimization (if the backend is deep enough - like yours). That means fetch_copy can be prepared at compile time for exactly that data etc., which would allow zero- or less-copy concepts.
The Input and Output Transformers help to work with basic types and also with SQL/MySQL special types like null strings etc. - it's not always possible to map MySQL types 1:1 to C++ and back; sometimes you want the transformation to behave differently.
The Transformers aren't visible when used with basic types.
It's just an example to promote the idea: https://pastebin.com/raw/vepbTAKL
The best combination would be some sort of fluent SQL interface like: https://github.com/rbock/sqlpp11
This library is supposed to be a protocol driver library, so it provides primitives close to the MySQL protocol. sqlpp11 is great, but it's a higher-level library. I don't think it makes sense to try to incorporate these kinds of features here. It would make more sense for a higher-level library like sqlpp11 to build on top of Boost.MySQL, instead.
Boost only provides low-level stuff for real low-level concepts (smart pointers, maps etc.) but most other libraries are always introducing very high-level concepts.
But I still think a plain string+binding interface like Boost.MySql currently has is also needed.

On Thu, 12 May 2022 at 10:21, Dennis Luehring via Boost <boost@lists.boost.org> wrote:
Boost only provides low-level stuff for real low-level concepts
This is a misconception. There is nothing that should prevent authors from submitting high-level libraries or libraries specific to protocols or domains. See also https://lists.boost.org/Archives/boost/2022/04/252823.php Best regards, -- Mateusz Loskot, http://mateusz.loskot.net

I'm not really getting the interface that your input/output transformers would expose, though. Let's take this one, for example:
template <>
struct OutputTransformer<NullAsEmptyString> {
    using value_type = std::string;

    static std::string get_value( int /*index_*/ ) {
        static std::string x = "hello";
        return x;
    };
    static void get_value( int index_, std::string& value_ ) {
        value_ = get_value( index_ );
    };
};
I assume this is user-provided code. Where does the user get the string value from? What does index_ mean in this context?
Sorry for being unclear (and for sending a not directly fitting example).
That code should be library code - more or less a collection of base MySQL concepts that can be used - this sample transformer lets you treat empty strings as NULL in MySQL - the implementation is a dummy, only there to give a feeling for how the data flows.
My adaption is used with SQLite, and the index is the parameter index that would then map to the SQLite bind functions, or, as in this case, check if the value is NULL and return "".
Plus several other "typical" helpers for adaption problems.
What kind of transformers do you propose? For instance, something to make a NULL string be treated as empty is a pattern I don't especially like. I guess string-to-json parsing is the obvious one. That would end up having the form of custom types that can be specified in the tuple/struct representing a row, so something like:

tuple<int, std::string, null_as_empty_string> row;
resultset.read_one(row);
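One possible shape for such a custom mapped type - a sketch only, since neither null_as_empty_string nor the field_parser hook below exists in the library:

// Sketch: a tag type plus a user-specializable parsing hook. All names
// here are illustrative; nothing is library API except get_optional.
struct null_as_empty_string
{
    std::string value;
};

template <class T>
struct field_parser; // specialized by users for each custom mapped type

template <>
struct field_parser<null_as_empty_string>
{
    static null_as_empty_string parse(const boost::mysql::value& v)
    {
        // a NULL field becomes "", anything else is read as a string
        auto opt = v.get_optional<std::string>();
        return { opt ? *opt : std::string() };
    }
};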
To know as much as possible beforehand allows maybe deeper optimization etc. - for example, the my_select instance can use prepared statements per default (and this is connection oriented with sqlite).
I'd say you know as much at compile time in both cases.
// return value tuple
const auto [ein_int2, ein_float2, ein_string2] = my_select( { 123 } );
I'd suggest something like:

struct my_row {
    int ein_int2;
    float ein_float2;
    string ein_string2;
};
BOOST_DESCRIBE_STRUCT(my_row, (), (ein_int2, ein_float2, ein_string2))

resultset.read_one(row);

Returning it by value works great for sync ops but can have bad implications in async ops.
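Filled in a little, the mapping machinery behind that suggestion could look like the following sketch. It assumes the row exposes its fields (values() here) and that value has a throwing get<T>() alongside the documented get_optional - both assumptions to verify against the reference:

#include <boost/describe.hpp>
#include <boost/mp11.hpp>
#include <type_traits>

// Sketch: copy the i-th field of a row into the i-th described member.
// values() and get<T>() are assumptions, not confirmed API.
template <class T>
void to_struct(const boost::mysql::row& r, T& out)
{
    using members = boost::describe::describe_members<T, boost::describe::mod_public>;
    std::size_t i = 0;
    boost::mp11::mp_for_each<members>([&](auto D) {
        using member_t = std::remove_reference_t<decltype(out.*D.pointer)>;
        out.*D.pointer = r.values()[i++].get<member_t>();
    });
}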
// multi row fetch
using My_select = Prepared_fetch<std::tuple<int>, std::tuple<int, float, NullAsEmptyString>>;
My_select my_select( db_connection, "select a, b from test where c == ?1" );

std::vector<Result2> result;
my_select.fetch_copy( std::back_inserter( result ), 34 );
my_select.fetch_copy( result, 34 );

auto fetch_func = []( const int& /*ein_int_*/, const float& /*ein_float_*/, const std::string& /*ein_string_*/ ) {};
my_select.fetch_func( fetch_func, 34 );

auto fetch_func_cancel = []( const int& /*ein_int_*/, const float& /*ein_float_*/, const std::string& /*ein_string_*/ ) { return false; };
my_select.fetch_func_with_cancel( fetch_func_cancel, 34 );
I can see something like fetch_copy, as a generalization of resultset::read_many. Other users have also brought it up. https://github.com/anarthal/mysql/issues/58 tracks it. It can be extended to work with tuples or structs.
Boost only provides low-level stuff for real low-level concepts (smart pointers, maps etc.) but most other libraries are always introducing very high-level concepts.
As Mateusz pointed out, I think there is room for everything - Beast has very low level stuff, too. Thanks, Ruben.

On 15.05.2022 at 18:25, Ruben Perez wrote:
I'm not really getting the interface that your input/output transformers would expose, though. Let's take this one, for example:
template <>
struct OutputTransformer<NullAsEmptyString> {
    using value_type = std::string;

    static std::string get_value( int /*index_*/ ) {
        static std::string x = "hello";
        return x;
    };
    static void get_value( int index_, std::string& value_ ) {
        value_ = get_value( index_ );
    };
};
I assume this is user-provided code. Where does the user get the string value from? What does index_ mean in this context?
Sorry for being unclear (and for sending a not directly fitting example).
That code should be library code - more or less a collection of base MySQL concepts that can be used - this sample transformer lets you treat empty strings as NULL in MySQL - the implementation is a dummy, only there to give a feeling for how the data flows.
My adaption is used with SQLite, and the index is the parameter index that would then map to the SQLite bind functions, or, as in this case, check if the value is NULL and return "".
Plus several other "typical" helpers for adaption problems.
What kind of transformers do you propose? For instance, something to make a NULL string be treated as empty is a pattern I don't especially like. I guess string-to-json parsing is the obvious one. That would end up having the form of custom types that can be specified in the tuple/struct representing a row, so something like:
tuple<int, std::string, null_as_empty_string> row;
resultset.read_one(row);
std::optional would be more correct. Clearly, for all basic types:

TEXT NULL <-> std::optional<std::string>
BasicType NULL <-> std::optional<BasicType>
To know as much as possible beforehand allows maybe deeper optimization etc. - for example, the my_select instance can use prepared statements per default (and this is connection oriented with sqlite).
I'd say you know as much at compile time in both cases.
// return value tuple
const auto [ein_int2, ein_float2, ein_string2] = my_select( { 123 } );
I'd suggest something like
struct my_row {
    int ein_int2;
    float ein_float2;
    string ein_string2;
};
BOOST_DESCRIBE_STRUCT(my_row, (), (ein_int2, ein_float2, ein_string2))

resultset.read_one(row);
I like it flexible, to adapt to my structures, tuples etc. - but yours seems flexible enough so far. I'm a big nearly/zero-copy fan.
Returning it by value works great for sync ops but can have bad implications in async ops.
That's true - but that applies to everything returned by value when using async.
// multi row fetch
using My_select = Prepared_fetch<std::tuple<int>, std::tuple<int, float, NullAsEmptyString>>;
My_select my_select( db_connection, "select a, b from test where c == ?1" );

std::vector<Result2> result;
my_select.fetch_copy( std::back_inserter( result ), 34 );
my_select.fetch_copy( result, 34 );

auto fetch_func = []( const int& /*ein_int_*/, const float& /*ein_float_*/, const std::string& /*ein_string_*/ ) {};
my_select.fetch_func( fetch_func, 34 );

auto fetch_func_cancel = []( const int& /*ein_int_*/, const float& /*ein_float_*/, const std::string& /*ein_string_*/ ) { return false; };
my_select.fetch_func_with_cancel( fetch_func_cancel, 34 );
I can see something like fetch_copy, as a generalization of resultset::read_many. Other users have also brought it up. https://github.com/anarthal/mysql/issues/58 tracks it. It can be extended to work with tuples or structs.
Don't forget the ability to read cursor/stream-based, and to allow breaking off the read early. That's why I have this callable lambda that allows ending the read based on a runtime decision.
Boost only provides low-level stuff for real low-level concepts (smart pointers, maps etc.) but most other libraries are always introducing very high-level concepts.
As Mateusz pointed out, I think there is room for everything - Beast has very low level stuff, too.
You're right about that.

Clearly, for all basic types:
TEXT NULL <-> std::optional<std::string>
BasicType NULL <-> std::optional<BasicType>
When we do https://github.com/anarthal/mysql/issues/60 we will have those mappings. Nullable types correspond to optionals, surely. We will allow boost::optional, too.
I can see something like fetch_copy, as a generalization of resultset::read_many. Other users have also brought it up. https://github.com/anarthal/mysql/issues/58 tracks it. It can be extended to work with tuples or structs.
Don't forget the ability to read cursor/stream-based, and to allow breaking off the read early.
That's why I have this callable lambda that allows ending the read based on a runtime decision.
The trouble here is that when calling connection::query, MySQL sends all the results beforehand. So you will have to read those packets even if you want to discard them; otherwise, you won't be able to use the connection further.

Prepared statements allow fetching a limited number of rows, using cursors. This is not implemented yet, and is tracked by https://github.com/anarthal/mysql/issues/20. Even in this case, you have to manually ask the server "I want 5 rows", and you will have to read those 5 rows before going further.

Regards, Ruben.
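In interface terms, that kind of cursor fetch would presumably end up looking something like this (purely hypothetical - every name below is invented for illustration; see issue #20 above):

// Hypothetical cursor-based fetching; no name here is actual API:
auto stmt = conn.prepare_statement("SELECT * FROM big_table");
auto rs = stmt.execute_with_cursor(5); // ask the server for 5 rows per batch
std::vector<boost::mysql::row> batch;
while (rs.fetch_batch(batch)) // one round-trip, at most 5 rows
{
    process(batch);
    if (done_early())
        break; // stop at a batch boundary without draining the whole resultset
}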

On Wed, May 18, 2022 at 12:16 PM Ruben Perez via Boost <boost@lists.boost.org> wrote:
The trouble here is that when calling connection::query, MySQL sends all the results beforehand. So you will have to read those packets even if you want to discard them; otherwise, you won't be able to use the connection further.
Prepared statements allow fetching a limited number of rows, using cursors. This is not implemented yet, and is tracked by https://github.com/anarthal/mysql/issues/20. Even in this case, you have to manually ask the server "I want 5 rows", and you will have to read those 5 rows before going further.
LibPQ (and the protocol below it I assume) are similar. You must wait and "consume" the whole resultset of a prepared statement, while with a cursor, you fetch "batches" of your chosen size(s).

BUT at least with LibPQ, turns out the Cursor approach is slower overall, in my testing.

The Cursor approach gives you faster time-to-first-row (in sync mode at least), and you don't need to have libpq accumulate a large resultset in its own memory, since you can process and discard the smaller batches, but you pay for that with being up to 2X slower overall, probably from the increased round-trips I guess (the slowdown probably depends on the batch sizes too).

In your case, you are async, so unlike libpq (sync mode), you have good time-to-first-row in both cases, and still allow processing the rows as they arrive. But I'd double check the overhead of Statement vs Cursor, with different batch sizes, on a larger resultset, in your performance testing. But that's straying a bit outside the scope of your library maybe. Although you have high-performance as clearly in-scope, so tradeoffs like these, which granted are outside your control, are still important to bench for, and mention in the doc IMHO. My $0.02.

On Wed, 18 May 2022 at 14:34, Dominique Devienne <ddevienne@gmail.com> wrote:
On Wed, May 18, 2022 at 12:16 PM Ruben Perez via Boost <boost@lists.boost.org> wrote:
The trouble here is that when calling connection::query, MySQL sends all the results beforehand. So you will have to read those packets even if you want to discard them; otherwise, you won't be able to use the connection further.
Prepared statements allow fetching a limited number of rows, using cursors. This is not implemented yet, and is tracked by https://github.com/anarthal/mysql/issues/20. Even in this case, you have to manually ask the server "I want 5 rows", and you will have to read those 5 rows before going further.
LibPQ (and the protocol below it I assume) are similar. You must wait and "consume" the whole resultset of a prepared statement, while with a cursor, you fetch "batches" of your chosen size(s).
BUT at least with LibPQ, turns out the Cursor approach is slower overall, in my testing.
The Cursor approach gives you faster time-to-first-row (in sync mode at least), and you don't need to have libpq accumulate a large resultset in its own memory, since you can process and discard the smaller batches, but you pay for that with being up to 2X slower overall, probably from the increased round-trips I guess (the slowdown probably depends on the batch sizes too).
In your case, you are async, so unlike libpq (sync mode), you have good time-to-first-row in both cases, and still allow processing the rows as they arrive. But I'd double check the overhead of Statement vs Cursor, with different batch sizes, on a larger resultset, in your performance testing. But that's straying a bit outside the scope of your library maybe. Although you have high-performance as clearly in-scope, so tradeoffs like these, which granted are outside your control, are still important to bench for, and mention in the doc IMHO. My $0.02.
It will likely have similar effects to what you describe. While I don't think this should be in scope for this first library version, it definitely is something users may want at some point, and it requires library support, so it should be taken into account. https://github.com/anarthal/mysql/issues/20 tracks it. I've noted your comments on performance benchmarks so they don't get lost. Regards, Ruben.

Hi Ruben,

Ruben Perez wrote:
This library is supposed to be a protocol driver library, so it provides primitives close to the MySQL protocol.
It's frustrating to see this now, after writing my review. I thought I was reviewing a library that, quoting your docs, made "ease of use" its first design goal and aimed to "hide the MySQL protocol complexities". Make up your mind, are you trying to hide the protocol or provide something close to it?

And in a later message, Ruben Perez wrote:
When sending two queries [...] You could also initiate the next query without completely reading all the packets sent by the server. [...] This is possible at the protocol level. [...] This [...] is *currently not possible* with the current [library] interface.
Contrast that with what your docs say:

  Warning: Because of how the MySQL protocol works, once you obtain a resultset, you must read it entirely (i.e. until it's complete) before engaging in any subsequent operation that implies communication with the server (e.g. issuing another query, preparing a statement, closing a statement...). Failing to do so results in undefined behavior.

You are explicitly blaming the protocol for this restriction in that quote. It seems that that is not true, and that it is a limitation of the library.

If I were writing my review again on the basis of this being a low-level protocol library - which I'm not going to do, I don't have time - I would probably say that support for this sort of pipelining is essential. I would perhaps hope to see "low-level" functions that encode and decode the protocol packets, which the user could even use with their own networking layer, if they wanted to.
Pipeline modes usually specify an option on what to do when a query in the pipeline fails. Here, you don't have that - subsequent queries will be executed regardless of the result of previous ones.
Transactions help here though. For example:

Transaction t(conn);
for (auto&& a: lots_of_things) {
    execute_insert_stmt(a);
}
t.commit();

I would expect pipelining to significantly improve the performance of that, especially if there were significant network latency.

Regards, Phil.
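A minimal RAII sketch of the Transaction type used above (illustrative only; it assumes a throwing query() and that these row-less statements need no further resultset reading):

// Sketch only: commits explicitly, rolls back on scope exit otherwise.
template <class Connection>
class Transaction
{
    Connection& conn_;
    bool committed_ = false;
public:
    explicit Transaction(Connection& c) : conn_(c) { conn_.query("START TRANSACTION"); }
    void commit() { conn_.query("COMMIT"); committed_ = true; }
    ~Transaction()
    {
        if (!committed_)
            try { conn_.query("ROLLBACK"); } catch (...) {} // best-effort rollback
    }
};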

On Thu, May 12, 2022 at 4:25 PM Phil Endecott via Boost <boost@lists.boost.org> wrote:
Ruben Perez wrote (in the doc):
Warning: Because of how the MySQL protocol works, once you obtain a resultset, you must read it entirely (i.e. until it's complete) before engaging in any subsequent operation that implies communication with the server (e.g. issuing another query, preparing a statement, closing a statement...). Failing to do so results in undefined behavior.
You are explicitly blaming the protocol for this restriction in that quote. It seems that that is not true, and that it is a limitation of the library.
That's not quite how I interpret it. The (basic) MySQL protocol does indeed require the client to fully consume a resultset, and not interleave any other operations until that's done, from what Ruben described.

The alternate sequence diagram Ruben presents shows writing the next query *before* starting to read the first one. It's once you start reading that you can't do anything else. And that mode is essentially like having the two queries in the same packet, semicolon-separated, something also not supported at this point, but which probably should be.
Pipeline modes usually specify an option on what to do when a query in the pipeline fails. Here, you don't have that - subsequent queries will be executed regardless of the result of previous ones.
Transactions help here though. For example:
Transaction t(conn); for (auto&& a: lots_of_things) { execute_insert_stmt(a); } t.commit();
That's the pattern I also use, because my APIs throw, yielding the implicit rollback. Although I'm not sure it will translate well to async code.
I would expect pipelining to significantly improve the performance of that, especially if there were significant network latency.
It depends, Phil. It really helps on slow, high-latency networks. In a demo example, manually adding 50ms latency, Laurenz Albe got a 4x speedup [1]. But if you run on the LAN with sub-1ms latencies, and/or fast multi-Gb/sec (Cloud) networks, it's less likely to matter that much, I suspect. [1]: https://www.cybertec-postgresql.com/en/pipeline-mode-better-performance-on-s...

On Thu, 12 May 2022 at 17:37, Dominique Devienne via Boost <boost@lists.boost.org> wrote:
On Thu, May 12, 2022 at 4:25 PM Phil Endecott via Boost <boost@lists.boost.org> wrote:
Ruben Perez wrote (in the doc):
Warning Because of how the MySQL protocol works, once you obtain a resultset, you must read it entirely (i.e. until it's complete) before engaging in any subsequent operation that implies communication with the server (e.g. issuing another query, preparing a statement, closing a statement...). Failing to do so results in undefined behavior.
You are explicitly blaming the protocol for this restriction in that quote. It seems that that is not true, and that it is a limitation of the library.
That's not quite how I interpret it. The (basic) MySQL protocol does indeed require the client to fully consume a resultset, and not interleave any other operations until that's done, from what Ruben described.
The alternate sequence diagram Ruben presents shows writing the next query *before* starting to read the first one. It's once you start reading that you can't do anything else. And that mode is essentially like having the two queries in the same packet, semicolon-separated, something also not supported at this point, but which probably should be.
What I meant by that is that you will have to consume the entire resultset before starting to consume the next resultset (or the response to any subsequent request). You can still write stuff to the server, but the replies will be queued until you read that resultset. I do think the docs are misleading, and I've raised https://github.com/anarthal/mysql/issues/76 to fix it.
It depends Phil. It really helps on slow, high-latency networks. In a demo example, manually adding 50ms latency, Larenz Albe got a 4x speedup [1]. But if you run on the LAN with sub-1ms latencies, and/or fast Multi-Gb/sec (Cloud) networks, it's less likely to matter that much I suspect.
I have raised https://github.com/anarthal/mysql/issues/76 to address pipeline mode. Regards, Ruben.

It's frustrating to see this now, after writing my review.
Hi Phil, please accept my apologies for the misunderstanding. I still think many of your comments were useful, and many will end up as improvements in the library.
I thought I was reviewing a library that, quoting your docs, made "ease of use" its first design goal and aimed to "hide the MySQL protocol complexities". Make up your mind, are you trying to hide the protocol or provide something close to it?
Ease of use is written there because it is really important to me. I've tried to hide as many complexities of the protocol as I've been able to, without compromising performance (maybe too many for a protocol library, as we'll discuss below). I've abstracted the handshake algorithm and the different encodings, and tried to expose an interface as simple as possible. But the library is not an ORM or a SQL framework, and the documentation may not be clear enough about its scope. https://github.com/anarthal/mysql/issues/70 addresses this rewording.
You are explicitly blaming the protocol for this restriction in that quote. It seems that that is not true, and that it is a limitation of the library.
If I were writing my review again on the basis of this being a low-level protocol library - which I'm not going to do, I don't have time - I would probably say that support for this sort of pipelining is essential. I would perhaps hope to see "low-level" functions that encode and decode the protocol packets, which the user could even use with their own networking layer, if they wanted to.
I've raised https://github.com/anarthal/mysql/issues/76 to correct that wording, as it is definitely not true. This section should explain the whole story, with both the library limitation and the protocol capabilities. I've raised https://github.com/anarthal/mysql/issues/75 for pipeline mode. Regards, Ruben.

Ruben Perez wrote:
But the library is not an ORM or a SQL framework, and the documentation may not be clear enough in scope.
I have little use for ORM or SQL frameworks. I do, however, have use for two things:

1. A way to say

conn.query( "SELECT * FROM tb_something WHERE id = ?", id );

There is a school of thought that holds that I shouldn't be doing that and should be using prepared statements instead, but managing the lifetime of prepared statements is inconvenient enough to steer people, including myself, towards sprintf-ing their way to victory, which of course invites security issues. The above interface is easier than sprintf, which will steer people away from sprintf, which will be a big win in both usability and safety.

2. A way to obtain a described struct from a result row of the above query. Pretty self-explanatory (*). This is almost always what one wants, if the query returns more than one column, and is what everyone ends up reinventing and coding by hand.

Both of these features contain a significant amount of protocol-independent code, which makes them out of scope under a strict reading, but they are worth having IMO.

(*) Except for the question of whether out-of-order columns should be automatically put into proper order based on the member names.

I have little use for ORM or SQL frameworks. I do, however, have use for two things:
1. A way to say
conn.query( "SELECT * FROM tb_something WHERE id = ?", id );
It has become obvious to me during this review that this functionality should be high priority, one way or another. https://github.com/anarthal/mysql/issues/69 tracks it.
2. A way to obtain a described struct from a result row of the above query.
This one has been mentioned in almost all reviews, too. https://github.com/anarthal/mysql/issues/60 tracks it.
Both of these features contain a significant amount of protocol-independent code, which makes them out of scope under a strict reading, but they are worth having IMO.
It may be the case for 1., although escaping is MySQL-specific, so it's not insane for the library to provide it. First-class support for 2. has advantages, too. Regards, Ruben.

On 2022-05-10 22:33, Marcelo Zimbres Silva via Boost wrote:
6) ================================
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__value/get_std_optio... https://anarthal.github.io/mysql/mysql/ref/boost__mysql__value/get_optional....
Sounds also unusual to have two member functions for the different versions of optional. I suggest using the boost version until we transition to C++17.
I probably won't have the time to do a proper review, but I had a quick look at the documentation and this interface surprised me a bit. I was expecting optional (whether std or boost) to be used to map nullable values, but it seems that's not the case. It seems this interface will return an empty optional if using a wrong type (like trying to get an int from a string column), but the doc is silent on what it will do when trying to get an int from a nullable int which is actually NULL (my understanding is that it would return an empty optional).

It bothers me a bit that these two use cases are handled the same way, because one of them is a valid use, and the other is a mismatch between the program and the database schema that I would like to diagnose. I understand this issue is non-trivial, since when the value is retrieved from the row object, the information about the type of the data in the DB (and thus its nullability) is lost. However, it seems odd. It may be of interest to store whether the field is null in an independent way from its type, instead of relying on a single null type. Or maybe I just missed something obvious.

Regards, Julien

On Wed, 11 May 2022 at 09:54, Julien Blanc via Boost <boost@lists.boost.org> wrote:
On 2022-05-10 22:33, Marcelo Zimbres Silva via Boost wrote:
6) ================================
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__value/get_std_optio... https://anarthal.github.io/mysql/mysql/ref/boost__mysql__value/get_optional....
Sounds also unusual to have two member functions for the different versions of optional. I suggest using the boost version until we transition to C++17.
I probably won't have the time to do a proper review, but I had a quick look at the documentation and this interface surprised me a bit. I was expecting optional (whether std or boost) to be used to map nullable values, but it seems that's not the case. It seems this interface will return an empty optional if using a wrong type (like trying to get an int from a string column), but the doc is silent on what it will do when trying to get an int from a nullable int which is actually NULL (my understanding is that it would return an empty optional). It bothers me a bit that these two use cases are handled the same way, because one of them is a valid use, and the other is a mismatch between the program and the database schema that I would like to diagnose.
That's right, value::get_optional<uint64_t>() will return an empty optional either if your value is not an int (e.g. a string) or if it is an actual NULL value. You can distinguish the two using value::is_null(). For the use case you are proposing, I would suggest this kind of code:

value v = /* get your value */;
if (v.is_null())
{
    // handle the NULL case
}
else
{
    uint64_t my_int = v.get<uint64_t>(); // This will throw on type mismatch
}

Of course, if we end up implementing reading rows into compile-time known data structures, we can do a better job here.
I understand this issue is non-trivial, since when the value is retrieved from the row object, the information about the type of the data in the DB (and thus its nullability) is lost. However, it seems odd. It may be of interest to store whether the field is null in a independent way from its type, instead of relying on a single null type.
In addition to value::is_null, you can also access field metadata using resultset.fields()[index]. This returns a field_metadata object (https://anarthal.github.io/mysql/mysql/ref/boost__mysql__field_metadata.html) which contains an is_not_null() function that will return true if you created your field with a NOT NULL clause.
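Putting value::is_null() and the field metadata together, a minimal sketch of diagnosing your mismatch case (assuming a row and resultset obtained as in the docs):

// Sketch: distinguish a legitimate NULL from a program/schema mismatch.
boost::mysql::value v = row.values()[0];
const boost::mysql::field_metadata& meta = result.fields()[0];

if (v.is_null())
{
    if (meta.is_not_null())
    {
        // NULL in a NOT NULL column: schema/program mismatch, diagnose it
    }
    else
    {
        // legitimate NULL in a nullable column
    }
}
else
{
    std::uint64_t my_int = v.get<std::uint64_t>(); // throws on type mismatch
}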
Or maybe I just missed something obvious.
Regards,
Julien

On Tue, 10 May 2022 at 22:33, Marcelo Zimbres Silva via Boost <boost@lists.boost.org> wrote:
Hi, here is my review.
Thanks for taking the time to write your review.
1) ================================
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__resultset/async_rea... The handler signature for this operation is void(boost::mysql::error_code, std::vector<boost::mysql::row>). etc.
Having a std::vector in the signature of the completion handler is unusual in ASIO. In general, large objects should be passed as parameters to the initiating function.
Additionally, this operation does not support custom allocators. That wouldn't be a big problem to me if I could reuse the same memory for each request but the interface above prevents me from doing so.
It is also a pessimized storage as each row will allocate its own chunk of memory. A single block of memory would be better.
Those functions don't strive for efficiency, that is true. I think they can be useful for people using callbacks, for example. I see two options here, and I'm happy with both of them:

a) Change the function signature to async_read_all(vector<row, Allocator>&, CompletionToken&&), with the completion handler signature being void(error_code).
b) Remove the functionality altogether.

I'm happy to make a) or b) acceptance criteria for inclusion. I've raised https://github.com/anarthal/mysql/issues/58 to track this.
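For reference, option a) from the caller's side would look roughly like this - a sketch of the proposed signature, not the current one. The caller owns the vector and can reuse its memory across requests:

// Sketch of proposed option a): caller-owned storage, reused across requests.
std::vector<boost::mysql::row> rows; // long-lived, reused

void read_everything(boost::mysql::tcp_resultset& result)
{
    rows.clear(); // keeps capacity, so allocations amortize away
    result.async_read_all(rows, [](boost::mysql::error_code ec) {
        // on success, rows is populated; its memory is reused next time
    });
}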
2) ================================
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__connection/async_qu... The handler signature for this operation is void(boost::mysql::error_code, boost::mysql::resultset<Stream>)
Here the completion handler is returning an io object i.e. boost::mysql::resultset, which is also unusual as far as I can see. Wouldn't it make sense for users to specify a different executor for the resultset? This interface is preventing that.
Please note that `connection`, `resultset` and `prepared_statement` always use the same executor as the underlying `Stream` object, as returned by `connection::next_layer`. `resultset` and `prepared_statement` are proxy I/O objects that make the operations more semantic, but they end up calling the underlying `Stream`'s read and write functions. I don't think it is possible at all to have separate executors for different objects in this context.
I prefer having the completion handler return config parameters with which I can instantiate a boost::mysql::resultset myself in my desired executor.
As pointed out above, I don't think this makes sense, unless I'm missing something. However, returning resultsets by lvalue reference instead of as completion handler arguments may be positive for efficiency, as it would allow more memory reuse. Resultsets hold table metadata, which is required to parse rows and lives in dynamic storage. There would be no performance gain for prepared_statements, as these perform no allocations. I'm happy for this to become an acceptance condition, too, but please note that it won't allow different executors in any case. I've raised https://github.com/anarthal/mysql/issues/59 to track the issue.
3) ================================
https://github.com/anarthal/mysql/blob/13d3615464f41222052d9ad8fa0c86bcdddee...
I am not convinced the design of the value class is a good idea. The reason is that users know at compile time what types they are expecting from a given query. That means it would be possible for them to specify that before doing the query. So instead of doing this
https://github.com/anarthal/mysql/blob/13d3615464f41222052d9ad8fa0c86bcdddee...
boost::mysql::row row;
while (result.async_read_one(row, additional_info, yield[ec]))
users could do
boost::mysql::row<T1, T2, T3, T4> row;
while (result.async_read_one(row, additional_info, yield[ec]))
avoiding unnecessary further copies. In principle it would be even possible to parse directly in a custom data structure, for example, when storing json string in the database.
I see different use cases here:

a) Parsing a row into a data structure known at compile time. Rather than a row<T1, T2, T3, T4>, I would go for a std::tuple<T1, T2, T3, T4> and a describe-able struct (annotated with BOOST_DESCRIBE_STRUCT).
b) Parsing into custom data structures with arbitrary user code. This would be the case when parsing a JSON directly into a user-provided data structure.
c) Parsing a row where the schema is not known at compile time. This would be the case when implementing a MySQL command-line interface, for example.

The current interface exposes c) natively, and would require a) and b) to be implemented on top of it by the user. Support for a) can be added by adding overloads to resultset::read_one. Supporting b) natively would require something like Beast's HTTP parser. I would say the main gain here is in user code simplicity rather than efficiency. There may be some efficiency gains when working with long strings, but I don't think they are relevant when compared to I/O operation overhead. Implementing a) natively is a decent amount of work, but it can be done. I would be reluctant to expose an interface for b) from the beginning, as it increases the API surface of the library. In any case, I wouldn't get rid of the variant interface, as I think it can be useful for some use cases. I've raised https://github.com/anarthal/mysql/issues/60 to address this.
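To illustrate the difference: the first loop below is use case c) with the current variant-based interface; the tuple overload in the second is the a) proposal tracked in issue 60, not existing API.

// Use case c), current interface: schema known only at runtime.
boost::mysql::row r;
while (result.async_read_one(r, additional_info, yield[ec]))
{
    std::uint64_t id = r.values()[0].get<std::uint64_t>(); // runtime-checked
}

// Use case a), proposed: schema known at compile time (hypothetical overload).
std::tuple<std::uint64_t, std::string> typed_row;
while (result.async_read_one(typed_row, additional_info, yield[ec]))
{
    // std::get<0>(typed_row), std::get<1>(typed_row)
}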
4) ================================
IMO, the examples are too limited and show variation of the same functionality. For example
- Text query, async with callbacks - Text query, async with futures - Text query, async with Boost.Coroutine coroutines - Text query, async with C++20 coroutines - Default completion tokens
Here you are showing nice Asio features. I am not meaning you should remove them but if I am not mistaken there is no example that shows
- How to keep long lasting connections open (specially for ssl).
Unless I'm missing something obvious, I don't think there is anything to do on the user side to keep connections open. Once you connect the underlying `Stream` and call `connection::handshake` (or the composed function `connection::connect`), the connection will remain open until:

a) It is closed by the user (e.g. by calling `connection::close`).
b) The connection remains idle for a certain amount of time, as defined by the MySQL variable https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_.... By default, this timeout is 8h. It can be changed at the database level and on a per-connection basis, by issuing a `SET SESSION` command (e.g. `conn.query("SET SESSION wait_timeout = 3600");` would set the connection timeout to 1h). Note that this is set *after* the connection is established.

Unless I'm missing something, there is nothing special in handling long-lasting SSL connections - keep-alives happen at the TCP level. There is no keep-alive mechanism at the MySQL protocol level. I've raised https://github.com/anarthal/mysql/issues/61 to add an example demonstrating b).
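In code, the per-connection variant of b) is a single statement right after connecting; a sketch using the throwing overloads:

// Sketch: keep a long-lasting connection usable by extending the idle timeout.
conn.connect(endpoint, params);                // physical connect + MySQL handshake
conn.query("SET SESSION wait_timeout = 3600"); // 1h idle timeout, this session only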
- Timers for resolve, connect, read and write.
There is an example on how to use Asio cancellation mechanism together with C++20 coroutines and Asio timers to implement timeouts for name resolution, connection establishment and queries here: https://anarthal.github.io/mysql/mysql/examples/timeouts.html Is this what you mean?
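In essence, that example races the operation against a timer. A condensed sketch, assuming Asio's experimental awaitable operators and an async_query overload taking only the query text and a completion token:

// Condensed sketch of the linked timeouts example: race query vs. timer.
#include <boost/asio/awaitable.hpp>
#include <boost/asio/experimental/awaitable_operators.hpp>
#include <boost/asio/steady_timer.hpp>
#include <boost/asio/this_coro.hpp>
#include <boost/asio/use_awaitable.hpp>
#include <chrono>

using namespace boost::asio::experimental::awaitable_operators;

boost::asio::awaitable<void> query_with_timeout(boost::mysql::tcp_connection& conn)
{
    boost::asio::steady_timer timer(co_await boost::asio::this_coro::executor);
    timer.expires_after(std::chrono::seconds(5));

    // Whichever completes first cancels the other.
    auto which = co_await (
        conn.async_query("SELECT 1", boost::asio::use_awaitable)
        || timer.async_wait(boost::asio::use_awaitable)
    );
    if (which.index() == 1)
    {
        // the timer fired first: the query timed out
    }
}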
- Synchronized writes.
Not following you here, could you please elaborate on this particular use case?
- Healthy checks, failover etc.
At the protocol level, there is a COM_PING command (https://dev.mysql.com/doc/dev/mysql-server/latest/page_protocol_com_ping.htm...) that allows the user to check whether the server is still alive or not. Issuing that requires additional library support. The interface for this would be something like `void connection::ping(error_code&)`. It would return an empty error code on success and an appropriate error code on failure. I guess this can currently be worked around by issuing a dummy SQL query to the server as a health check. I've raised https://github.com/anarthal/mysql/issues/62 to track the PING addition. I need to further investigate failover in replica set scenarios; I hope to do it this evening. I've raised https://github.com/anarthal/mysql/issues/63 to track this.
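Until a ping API exists, the workaround might look like this sketch (consuming the resultset fully, per the pipelining discussion earlier, and assuming the non-throwing sync overloads):

// Sketch: dummy-query health check until connection::ping exists (issue 62).
bool is_alive(boost::mysql::tcp_connection& conn)
{
    boost::mysql::error_code ec;
    boost::mysql::error_info info;
    auto result = conn.query("SELECT 1", ec, info);
    if (ec)
        return false;
    result.read_all(ec, info); // consume the resultset before the next request
    return !ec;
}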
I estimate that to address these points users will be required to write further 500 - 1000 LOC on top of your library. Otherwise they will be stuck on underperforming short lived connections.
It would be helpful for users to have a starting point to begin with.
There are some minor things I have also noticed:
5) ================================
https://github.com/anarthal/mysql/blob/13d3615464f41222052d9ad8fa0c86bcdddee...
I expect this function to be O(1).
Raised https://github.com/anarthal/mysql/issues/64 to track it.
6) ================================
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__value/get_std_optio... https://anarthal.github.io/mysql/mysql/ref/boost__mysql__value/get_optional....
Sounds also unusual to have two member functions for the different versions of optional. I suggest using the boost version until we transition to C++17.
Why do you think it can be a problem? I would say having both just targets a broader audience: those using `boost::optional` and those using `std::optional`, without compromising either.
7) ================================
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__errc.html
errc should be conditions and not error AFAICS.
Not following you here - errc here is just acting the same as `boost::beast::http::error` enum.
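For context on the disagreement: in Boost.System terms, error *codes* are the concrete values an operation produces, while error *conditions* are portable categories you compare against. A generic illustration (plain Boost.System, not Boost.MySql API):

// Generic Boost.System illustration of codes vs. conditions.
#include <boost/system/error_code.hpp>

void handle(const boost::system::error_code& ec)
{
    // Comparing against a *condition*: matches any code, from any category,
    // that its category maps to this portable condition.
    if (ec == boost::system::errc::connection_reset)
    {
        // retry / reconnect
    }
}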
============= Review End ================
I vote for conditional acceptance. Points 1, 2, 3 and 4 should be addressed before acceptance. My acceptance criteria sum up to:
1) Remove functionality that provides low performance. With coroutines it will be easy for users to loop on async_query to create a std::vector of rows.
As I mentioned above, either removing or switching to return-by-lvalue seems OK as an acceptance condition for me.
2) No IO object should be returned as a parameter of the completion handler.
I can accept this as a performance improvement, but please read the note on executors above.
3) Review the value type. Can it be improved?
Does this mean that you consider parsing into compile-time user-defined data structures (e.g. `tuple`) an acceptance condition?
4) Improve examples.
Will inform as soon as I have an answer on failover, happy to provide the other ones.
Regards, Marcelo
Regards, Ruben.

On Tue, May 10, 2022 at 1:33 PM Marcelo Zimbres Silva via Boost <boost@lists.boost.org> wrote:
Having a std::vector in the signature of the completion handler is unusual in ASIO. In general, large objects should be passed as parameters to the initiating function.
I agree that this is weird. Instead of a vector, the library could introduce a new concept, "RowAccumulator", which is a lightweight, movable value that is invoked to append elements to a notional container. This solves all the problems with allocators and memory reuse.

I suggest that if the library is accepted, it should be conditional upon removing the vector and replacing it with a concept. And the library should provide - either in the examples, or as a public API - a wrapper that implements accumulation to a vector.

Thanks
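A sketch of what such a RowAccumulator might look like, together with the vector wrapper mentioned above. Names are illustrative only; nothing here is proposed API text:

// Illustrative sketch: a "RowAccumulator" is any lightweight, movable
// callable the library invokes per row, instead of filling a vector itself.
#include <utility>
#include <vector>

struct vector_accumulator
{
    std::vector<boost::mysql::row>* target; // caller-owned storage

    void operator()(boost::mysql::row&& r)
    {
        target->push_back(std::move(r)); // allocator/reuse policy stays with the caller
    }
};

// Hypothetical call site:
//   std::vector<boost::mysql::row> rows;
//   result.async_read_all(vector_accumulator{&rows}, token);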

On Fri, 13 May 2022 at 19:20, Vinnie Falco via Boost <boost@lists.boost.org> wrote:
On Tue, May 10, 2022 at 1:33 PM Marcelo Zimbres Silva via Boost <boost@lists.boost.org> wrote:
Having a std::vector in the signature of the completion handler is unusual in ASIO. In general, large objects should be passed as parameters to the initiating function.
I agree that this is weird. Instead of a vector, the library could introduce a new concept, "RowAccumulator", which is a lightweight, movable value that is invoked to append elements to a notional container. This solves all the problems with allocators and memory reuse.
How would that RowAccumulator concept be different from a regular output iterator with boost::mysql::row value type?
I suggest that if the library is accepted, it should be conditional upon removing the vector and replacing it with a concept. And the library should provide - either in the examples, or as a public API - a wrapper that implements accumulation to a vector.
This makes sense. I've updated https://github.com/anarthal/mysql/issues/58 to include your comments. Regards, Ruben.
participants (10)
- Dennis Luehring
- Dominique Devienne
- Julien Blanc
- Marcelo Zimbres Silva
- Mateusz Loskot
- Peter Dimov
- Phil Endecott
- Richard Hodges
- Ruben Perez
- Vinnie Falco