Re: [boost] Proposing a MySQL client library for inclusion in Boost
Hi Ruben, I have had a look into your library and some questions arise
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__resultset/async_rea... The handler signature for this operation is void(boost::mysql::error_code, std::vector<boost::mysql::row>). etc.
Having a std::vector in the signature of the completion handler is unusual in ASIO. In general, large objects should be passed as parameters in the async operation, e.g. socket::async_read gets a dynamic_buffer, which is a lightweight view to storage. Additionally, there is no support for custom allocators. That wouldn't be a big problem to me if I could reuse the same memory for each request, but the interface above prevents me from doing so.
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__connection/async_qu... The handler signature for this operation is void(boost::mysql::error_code, boost::mysql::resultset<Stream>)
Here the completion handler is returning an I/O object, i.e. boost::mysql::resultset, which is also unusual as far as I can see. Does that play nice with executors? I would prefer having the completion handler return config parameters with which I can instantiate a boost::mysql::resultset myself in my desired executor.
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__connection.html
I don't see a reason for this class, or at least I fail to see what state the MySQL protocol imposes on you that would justify it. I would much prefer free functions here, for example in the same way as beast::http::async_read.
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__value/get_std_optio... https://anarthal.github.io/mysql/mysql/ref/boost__mysql__value/get_optional....
Sounds also unusual to have two member functions for the different versions of optional. Regards, Marcelo
Hi Marcelo, Thank you for taking your time to look into the library. Let me explain inline my design decisions and what can be changed or not. On Thu, 31 Mar 2022 at 18:14, Marcelo Zimbres Silva via Boost < boost@lists.boost.org> wrote:
Hi Ruben,
I have had a look into your library and some questions arise
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__resultset/async_rea...
The handler signature for this operation is void(boost::mysql::error_code, std::vector<boost::mysql::row>). etc.
Having a std::vector in the signature of the completion handler is unusual in ASIO. In general, large objects should be passed as parameters in the async operation, e.g. socket::async_read gets a dynamic_buffer, which is a lightweight view to storage.
Additionally, there is no support for custom allocators. That wouldn't be a big problem to me if I could reuse the same memory for each request, but the interface above prevents me from doing so.
The rationale behind that function was to provide an easy-to-use interface. You can use resultset::async_read_one <https://anarthal.github.io/mysql/mysql/ref/boost__mysql__resultset/async_read_one.html> to read rows one-by-one, reusing memory storage and thus maximizing efficiency. That being said, I think your point is valid. If the community agrees with you on this, I can replace the current resultset::async_read_many <https://anarthal.github.io/mysql/mysql/ref/boost__mysql__resultset/async_read_many.html> and resultset::async_read_all <https://anarthal.github.io/mysql/mysql/ref/boost__mysql__resultset/async_read_all.html> with functions taking an rvalue reference to a std::vector<row>. It would be trivial then to support vectors with custom allocators.
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__connection/async_qu...
The handler signature for this operation is void(boost::mysql::error_code, boost::mysql::resultset<Stream>)
Here the completion handler is returning an I/O object, i.e. boost::mysql::resultset, which is also unusual as far as I can see. Does that play nice with executors? I would prefer having the completion handler return config parameters with which I can instantiate a boost::mysql::resultset myself in my desired executor.
The three I/O objects this library provides (connection, resultset and prepared_statement) are just proxies for the underlying Stream I/O object in terms of the executor (i.e. they just return the underlying Stream::get_executor() value).
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__connection.html
I don't see a reason for this class, or at least I fail to see what state the MySQL protocol imposes on you that would justify it. I would much prefer free functions here, for example in the same way as beast::http::async_read.
The connection object keeps the following protocol state:
- The sequence number. This is an int that gets incremented for each frame received and sent by the protocol. Some operations reset this sequence number (e.g. query), but others don't (e.g. resultset::read_one).
- The capabilities. This is a bitmask negotiated between client and server during handshake, applicable to the lifespan of the connection, and used to serialize and deserialize some packets.
- An SSL state flag. This is applicable for SSL-enabled <https://anarthal.github.io/mysql/mysql/ssl.html#mysql.ssl.streams> Stream types only, and is used to implement SSL negotiation <https://anarthal.github.io/mysql/mysql/ssl.html#mysql.ssl.negotiation>. If your Stream supports SSL, then you have the option to always use it, use it only if the server supports it, or never use it (as per the ssl_mode <https://anarthal.github.io/mysql/mysql/ref/boost__mysql__ssl_mode.html> option). This is inspired by the official MySQL client --ssl-mode <https://dev.mysql.com/doc/refman/8.0/en/connection-options.html#option_general_ssl-mode> flag. If the negotiation determines that no SSL is to be used, this fact is kept in the connection object and the next layer to the SSL stream object is used for I/O.
Additionally, the connection object also stores a dynamic buffer (based on a vector<uint8_t> using the default allocator) to store frames. This buffer is used to serialize all outgoing messages, and to store all incoming messages except for rows and metadata objects. The connection also instantiates an object of the passed Stream type. If you are curious about the code, please have a look at the channel <https://github.com/anarthal/mysql/blob/master/include/boost/mysql/detail/protocol/channel.hpp> I/O object (which is not part of the interface, but an internal object that actually holds the state I mentioned).
A single channel object is created per connection, and connections, resultsets and prepared_statements just hold pointers to it. I know not being able to configure the send/receive buffer is also unusual, especially when compared with Beast. I'm not keen on accepting an arbitrary DynamicBuffer in this context, as I think optimizing that buffer requires knowledge about the MySQL protocol, which is better handled by the library (I'm currently writing a more optimized version of the buffering strategy that shouldn't change the library's interface). I'm open to suggestions though.
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__value/get_std_optio...
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__value/get_optional....
Sounds also unusual to have two member functions for the different versions of optional.
Could you please suggest an alternative?
Regards, Marcelo
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Regards, Ruben.
Hi Ruben,
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__connection/async_qu... The handler signature for this operation is void(boost::mysql::error_code, boost::mysql::resultset<Stream>)
Here the completion handler is returning an I/O object, i.e. boost::mysql::resultset, which is also unusual as far as I can see. Does that play nice with executors? I would prefer having the completion handler return config parameters with which I can instantiate a boost::mysql::resultset myself in my desired executor.
The three I/O objects this library provides (connection, resultset and prepared_statement) are just proxies for the underlying Stream I/O object in terms of the executor (i.e. they just return the underlying Stream::get_executor() value).
My expectation is that the communication with the mysql server occurs only through the connection class. It feels awkward to me that I need these proxy objects when there is a connection object around; the lifetime of these proxies is bound to it, for example.
Some further points:
- A row in your library is basically std::vector<value>, which means std::vector<row> is pessimized storage for rows with same length. Aren't rows always equal in length for a table?
- Your value class is a variant behind the scenes, i.e. boost::variant2::variant<null_t, std::int64_t, std::uint64_t, boost::string_view, float, double, date, datetime, time>; I would like to understand what is the lifetime of the string_view. Is it pointing to some kind of internal buffer? I expect all data to be owned by the client class after the completion of an async operation. Having pointers pointing to the connection internal state feels bad.
- There is one further point that doesn't seem to play well with this variant design. Some people may store some kind of serialized data in the database, e.g. json strings. It would be nice if it were possible to read those values avoiding temporaries; that is important to reduce latency and memory usage on large payloads. It would look something like
struct mydata1; struct mydata2;
using myrow = std::tuple<std::string, ..., mydata2, ..., mydata1, ...>;
myrow row;
result.read_row(row);
In this case, the parser would have to call user code every time it reads user data in the socket buffer, instead of copying to additional buffers (in order to hand it to the user later when parsing is done).
Marcelo
On Fri, 1 Apr 2022 at 12:32, Marcelo Zimbres Silva <mzimbres@gmail.com> wrote:
Hi Ruben,
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__connection/async_qu...
The handler signature for this operation is void(boost::mysql::error_code, boost::mysql::resultset<Stream>)
Here the completion handler is returning an I/O object, i.e. boost::mysql::resultset, which is also unusual as far as I can see. Does that play nice with executors? I would prefer having the completion handler return config parameters with which I can instantiate a boost::mysql::resultset myself in my desired executor.
The three I/O objects this library provides (connection, resultset and prepared_statement) are just proxies for the underlying Stream I/O object in terms of the executor (i.e. they just return the underlying Stream::get_executor() value).
My expectation is that the communication with the mysql server occurs only through the connection class. It feels awkward to me that I need these proxy objects when there is a connection object around; the lifetime of these proxies is bound to it, for example.
In addition to a pointer to the connection object, resultsets also hold several pieces of metadata about rows. Because of how the MySQL protocol works, these pieces of metadata are mandatory to parse rows. The MySQL server sends these as several packets following connection::query or prepared_statement::execute, and the library stores them in the resultset object. In my opinion, you need an object representing this information about how to interpret a resultset. Something similar happens with prepared_statement. Something we could do is remove the pointer to the connection from the resultset and prepared_statement objects, making these just plain data objects. You would have the following function signatures in connection (I'll just list the sync-with-exception ones for brevity):
void connection::query(string_view query, resultset& output);
void connection::prepare_statement(string_view statement, prepared_statement& output);
void connection::execute_statement(const prepared_statement& stmt, resultset& output);
void connection::read_row(resultset& resultset, row& output);
The drawback of this approach is that you should always call execute_statement and read_row on the connection object that created the statement or resultset object, and with this approach there is nothing enforcing this. I picked the other approach because I feel it is more semantic. Would these signatures make more sense for you? It is true that the current approach ties the lifetime of prepared_statement and resultset to connection's lifetime. It is indicated in the docs here <https://anarthal.github.io/mysql/mysql/resultsets.html#mysql.resultsets.complete>, but maybe we could make this point clearer in the reference. I would also like to hear what other members of the community think about this.
Some further points:
- A row in your library is basically std::vector<value>, which means std::vector<row> is pessimized storage for rows with same length. Aren't rows always equal in length for a table?
The std::vector<row> overloads are not optimized for performance, that is true. If you're striving for better performance, I would go for not having a vector at all, and reading one row at a time with resultset::read_one, which allows you to re-use the same row over and over, avoiding almost all allocations. What would be your suggestion for the vector<row> approach? Having an ad-hoc, matrix-like container for values?
- Your value class is a variant behind the scenes i.e.
boost::variant2::variant<null_t, std::int64_t, std::uint64_t, boost::string_view, float, double, date, datetime, time>;
I would like to understand what is the lifetime of the string_view. Is it pointing to some kind of internal buffer? I expect all data to be owned by the client class after the completion of an async operation. Having pointers pointing to the connection internal state feels bad.
The string_views point to memory owned by rows, and never to the connection's internal buffers. As long as you keep your row object alive, its values will be valid. This is also the reason why row objects are movable but not copyable. Currently, row packets are read into the row buffers directly and parsed in-place, to avoid copying. This is stated in the row object <https://anarthal.github.io/mysql/mysql/ref/boost__mysql__row.html> docs, but it may be worth noting it in the value object docs, too. Having all the variant alternatives as cheap-to-copy objects has the advantage of being able to implement functions with conversions, like value::get or value::get_optional, without worrying about expensive copies.
- There is one further point that doesn't seem to play well with this variant design. Some people may store some kind of serialized data in the database, e.g. json strings. It would be nice if it were possible to read those values avoiding temporaries; that is important to reduce latency and memory usage on large payloads. It would look something like
struct mydata1; struct mydata2;
using myrow = std::tuple<std::string, ..., mydata2, ..., mydata1, ...>;
myrow row;
result.read_row(row);
In this case, the parser would have to call user code every time it reads user data in the socket buffer, instead of copying to additional buffers (in order to hand it to the user later when parsing is done).
I have in mind implementing this in future versions. My idea is creating a concept similar to Beast's http::basic_parser <https://www.boost.org/doc/libs/master/libs/beast/doc/html/beast/ref/boost__beast__http__basic_parser.html>. That row_parser object should be a type with member functions like:
row_parser::on_value(std::uint8_t)
row_parser::on_value(std::uint16_t)
// And so on, for all the possible types MySQL allows
// Strings could have special handling, implementing incremental parsing for them
That would be called like:
struct mydata1; struct mydata2;
using myrow = std::tuple<std::string, ..., mydata2, ..., mydata1, ...>;
struct my_parser {
    myrow r;
    error_code on_value(std::uint8_t) { /* place the value in your row object */ }
};
my_parser parser;
resultset.read_row(parser);
I'm not keen on implementing a higher-level system that allows the user to describe rows and pass them to read_row, as you described, because I think this is typical ORM functionality, and thus should be implemented by a higher-level component. ORMs usually need to attach more information to row objects than what we would need to parse the rows (e.g. information to issue CREATE TABLE statements).
Marcelo
Ruben
Hi, On Fri, 1 Apr 2022 at 16:24, Ruben Perez <rubenperez038@gmail.com> wrote:
On Fri, 1 Apr 2022 at 12:32, Marcelo Zimbres Silva <mzimbres@gmail.com> wrote:
Hi Ruben,
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__connection/async_qu... The handler signature for this operation is void(boost::mysql::error_code, boost::mysql::resultset<Stream>)
Here the completion handler is returning an I/O object, i.e. boost::mysql::resultset, which is also unusual as far as I can see. Does that play nice with executors? I would prefer having the completion handler return config parameters with which I can instantiate a boost::mysql::resultset myself in my desired executor.
The three I/O objects this library provides (connection, resultset and prepared_statement) are just proxies for the underlying Stream I/O object in terms of the executor (i.e. they just return the underlying Stream::get_executor() value).
My expectation is that the communication with the mysql server occurs only through the connection class. It feels awkward to me that I need these proxy objects when there is a connection object around; the lifetime of these proxies is bound to it, for example.
In addition to a pointer to the connection object, resultsets also hold several pieces of metadata about rows. Because of how the MySQL protocol works, these pieces of metadata are mandatory to parse rows. The MySQL server sends these as several packets following connection::query or prepared_statement::execute, and the library stores these in the resultset object. In my opinion, you need an object representing this information about how to interpret a resultset. Something similar happens with prepared_statement.
Something we could do is remove the pointer to the connection from the resultset and prepared_statement objects, making these just plain data objects. You would have the following function signatures in connection (I'll just list the sync-with-exception ones for brevity):
void connection::query(string_view query, resultset& output);
void connection::prepare_statement(string_view statement, prepared_statement& output);
void connection::execute_statement(const prepared_statement& stmt, resultset& output);
void connection::read_row(resultset& resultset, row& output);
This looks better to me. If the metadata is simple and small enough you can return it on the completion handler as you are doing now. That would be ok to me as it is not an IO object anymore.
The drawback of this approach is that you should always call execute_statement and read_row on the connection object that created the statement or resultset object, and with this approach there is nothing enforcing this. I picked the other approach because I feel it is more semantic. Would these signatures make more sense for you?
You can still provide the call to execute_statement and read_row as a composed operation so that users have to call only one function.
Some further points:
- A row in your library is basically std::vector<value>, which means std::vector<row> is pessimized storage for rows with same length. Aren't rows always equal in length for a table?
The std::vector<row> overloads are not optimized for performance, that is true. If you're striving for better performance, I would go for not having a vector at all, and reading one row at a time with resultset::read_one, which allows you to re-use the same row over and over, avoiding almost all allocations.
That is what I would do.
What would be your suggestion for the vector<row> approach? Having an ad-hoc, matrix-like container for values?
I wouldn't offer this function at all. Let users decide how they want to read the rows. With coroutines this becomes trivial.
- Your value class is a variant behind the scenes i.e.
boost::variant2::variant<null_t, std::int64_t, std::uint64_t, boost::string_view, float, double, date, datetime, time>;
I would like to understand what is the lifetime of the string_view. Is it pointing to some kind of internal buffer? I expect all data to be owned by the client class after the completion of an async operation. Having pointers pointing to the connection internal state feels bad.
The string_views point to memory owned by rows, and never to the connection's internal buffers. As long as you keep your row object alive, its values will be valid. This is also the reason why row objects are movable but not copyable. Currently, row packets are read into the row buffers directly and parsed in-place, to avoid copying. This is stated in the row object docs, but it may be worth noting it in the value object docs, too.
Having all the variant alternatives as cheap-to-copy objects has the advantage of being able to implement functions with conversions, like value::get or value::get_optional, without worrying about expensive copies.
- There is one further point that doesn't seem to play well with this variant design. Some people may store some kind of serialized data in the database, e.g. json strings. It would be nice if it were possible to read those values avoiding temporaries; that is important to reduce latency and memory usage on large payloads. It would look something like
struct mydata1; struct mydata2;
using myrow = std::tuple<std::string, ..., mydata2, ..., mydata1, ...>;
myrow row;
result.read_row(row);
In this case, the parser would have to call user code every time it reads user data in the socket buffer, instead of copying to additional buffers (in order to hand it to the user later when parsing is done).
I have in mind implementing this in future versions. My idea is creating a concept similar to Beast's http::basic_parser. That row_parser object should be a type with member functions like:
row_parser::on_value(std::uint8_t)
row_parser::on_value(std::uint16_t)
// And so on, for all the possible types MySQL allows
// Strings could have special handling, implementing incremental parsing for them
That would be called like:
struct mydata1; struct mydata2;
using myrow = std::tuple<std::string, ..., mydata2, ..., mydata1, ...>;
struct my_parser {
    myrow r;
    error_code on_value(std::uint8_t) { /* place the value in your row object */ }
};
my_parser parser; resultset.read_row(parser);
I think it gets even simpler: once you allow users to pass their own types as rows, all they have to provide is a function to deserialize them from the read buffer directly into the object, e.g.
void from_string(T1& obj, char const* data, std::size_t size, boost::system::error_code& ec)
void from_string(T2& obj, char const* data, std::size_t size, boost::system::error_code& ec)
...
std::tuple<T1, T2, ...> row;
connection.read_row(row);
When your parser hits user data, it calls from_string with the correct type, which it now knows, since you pass it as an argument.
Marcelo
On Fri, 1 Apr 2022 at 23:35, Marcelo Zimbres Silva <mzimbres@gmail.com> wrote:
Hi,
On Fri, 1 Apr 2022 at 12:32, Marcelo Zimbres Silva <mzimbres@gmail.com> wrote:
Hi Ruben,
https://anarthal.github.io/mysql/mysql/ref/boost__mysql__connection/async_qu...
The handler signature for this operation is void(boost::mysql::error_code, boost::mysql::resultset<Stream>)
Here the completion handler is returning an I/O object, i.e. boost::mysql::resultset, which is also unusual as far as I can see. Does that play nice with executors? I would prefer having the completion handler return config parameters with which I can instantiate a boost::mysql::resultset myself in my desired executor.
The three I/O objects this library provides (connection, resultset and prepared_statement) are just proxies for the underlying Stream I/O object in terms of the executor (i.e. they just return the underlying Stream::get_executor() value).
My expectation is that the communication with the mysql server occurs only through the connection class. It feels awkward to me that I need these proxy objects when there is a connection object around; the lifetime of these proxies is bound to it, for example.
In addition to a pointer to the connection object, resultsets also hold several pieces of metadata about rows. Because of how the MySQL protocol works, these pieces of metadata are mandatory to parse rows. The MySQL server sends these as several packets following connection::query or prepared_statement::execute, and the library stores these in the resultset object. In my opinion, you need an object representing this information about how to interpret a resultset. Something similar happens with prepared_statement.
Something we could do is remove the pointer to the connection from the resultset and prepared_statement objects, making these just plain data objects. You would have the following function signatures in connection (I'll just list the sync-with-exception ones for brevity):
void connection::query(string_view query, resultset& output);
void connection::prepare_statement(string_view statement, prepared_statement& output);
void connection::execute_statement(const prepared_statement& stmt, resultset& output);
void connection::read_row(resultset& resultset, row& output);
This looks better to me. If the metadata is simple and small enough you can return it on the completion handler as you are doing now. That would be ok to me as it is not an IO object anymore.
It is a pretty complex structure, but it mostly lives on the heap and is move-only. I can consider returning this as an rvalue reference, though.
The drawback of this approach is that you should always call execute_statement and read_row on the connection object that created the statement or resultset object, and with this approach there is nothing enforcing this. I picked the other approach because I feel it is more semantic. Would these signatures make more sense for you?
You can still provide the call to execute_statement and read_row as a composed operation so that users have to call only one function.
Some further points:
- A row in your library is basically std::vector<value>, which means std::vector<row> is pessimized storage for rows with same length. Aren't rows always equal in length for a table?
The std::vector<row> overloads are not optimized for performance, that is true. If you're striving for better performance, I would go for not having a vector at all, and reading one row at a time with resultset::read_one, which allows you to re-use the same row over and over, avoiding almost all allocations.
That is what I would do.
What would be your suggestion for the vector<row> approach? Having an ad-hoc, matrix-like container for values?
I wouldn't offer this function at all. Let users decide how they want to read the rows. With coroutines this becomes trivial.
- Your value class is a variant behind the scenes i.e.
boost::variant2::variant<null_t, std::int64_t, std::uint64_t, boost::string_view, float, double, date, datetime, time>;
I would like to understand what is the lifetime of the string_view. Is it pointing to some kind of internal buffer? I expect all data to be owned by the client class after the completion of an async operation. Having pointers pointing to the connection internal state feels bad.
The string_views point to memory owned by rows, and never to the connection's internal buffers. As long as you keep your row object alive, its values will be valid. This is also the reason why row objects are movable but not copyable. Currently, row packets are read into the row buffers directly and parsed in-place, to avoid copying. This is stated in the row object docs, but it may be worth noting it in the value object docs, too.
Having all the variant alternatives as cheap-to-copy objects has the advantage of being able to implement functions with conversions, like value::get or value::get_optional, without worrying about expensive copies.
- There is one further point that doesn't seem to play well with this variant design. Some people may store some kind of serialized data in the database, e.g. json strings. It would be nice if it were possible to read those values avoiding temporaries; that is important to reduce latency and memory usage on large payloads. It would look something like
struct mydata1; struct mydata2;
using myrow = std::tuple<std::string, ..., mydata2, ..., mydata1, ...>;
myrow row;
result.read_row(row);
In this case, the parser would have to call user code every time it reads user data in the socket buffer, instead of copying to additional buffers (in order to hand it to the user later when parsing is done).
I have in mind implementing this in future versions. My idea is creating a concept similar to Beast's http::basic_parser. That row_parser object should be a type with member functions like:
row_parser::on_value(std::uint8_t)
row_parser::on_value(std::uint16_t)
// And so on, for all the possible types MySQL allows
// Strings could have special handling, implementing incremental parsing for them
That would be called like:
struct mydata1; struct mydata2;
using myrow = std::tuple<std::string, ..., mydata2, ..., mydata1, ...>;
struct my_parser {
    myrow r;
    error_code on_value(std::uint8_t) { /* place the value in your row object */ }
};
my_parser parser; resultset.read_row(parser);
I think it gets even simpler: once you allow users to pass their own types as rows, all they have to provide is a function to deserialize them from the read buffer directly into the object, e.g.
void from_string(T1& obj, char const* data, std::size_t size, boost::system::error_code& ec) void from_string(T2& obj, char const* data, std::size_t size, boost::system::error_code& ec)
You're assuming MySQL sends everything as strings that can be trivially parsed, but that's not the case. There are two different encodings, depending on whether a row came from connection::query or prepared_statement::execute (text queries employ a plain-text format, while prepared statements use binary encoding). In particular, the types DATE, DATETIME, TIME and BIT employ a nasty struct-like format in the binary protocol. BIT employs a binary-inside-text representation in the text protocol. DATE, TIME and DATETIME have the concept of "zero dates", which are invalid values MySQL allows but that don't represent any actual value. NULL values are represented as a bitmap in the binary protocol, and as special values in the text protocol. I would expect a MySQL library to handle all these details (I would say this is one of the main benefits this library grants the user). As a user, I wouldn't want to know all this stuff.
...
std::tuple<T1, T2, ...> row; connection.read_row(row);
When your parser hits user data, it calls from_string with the correct type, which it now knows, since you pass it as an argument.
Marcelo
Ruben Perez wrote: ...
- Your value class is a variant behind the scenes i.e.
boost::variant2::variant<null_t, std::int64_t, std::uint64_t, boost::string_view, float, double, date, datetime, time>; ... You're assuming MySQL sends everything as strings that can be trivially parsed, but that's not the case. There are two different encodings, depending on whether a row came from connection::query or prepared_statement::execute (text queries employ a plain-text format, while prepared statements use binary encoding). In particular, the types DATE, DATETIME, TIME and BIT employ a nasty struct-like format in the binary protocol. BIT employs a binary-inside-text representation in the text protocol. DATE, TIME and DATETIME have the concept of "zero dates", which are invalid values MySQL allows but that don't represent any actual value. NULL values are represented as a bitmap in the binary protocol, and as special values in the text protocol.
I would expect a MySQL library to handle all these details (I would say this is one of the main benefits this library grants the user). As a user, I wouldn't want to know all this stuff.
Right. But, if I have, say
struct X {
    int64_t x;
    string y;
};
BOOST_DESCRIBE_STRUCT(X, (), (x, y))
and a matching result schema, isn't it reasonable to make `read_row` work for X? (Values that can be null would use boost::optional or std::optional as members.) I suppose the same applies to std::tuple<int64_t, string>.
On Sat, 2 Apr 2022, 00:23 Peter Dimov, <pdimov@gmail.com> wrote:
- Your value class is a variant behind the scenes i.e.
boost::variant2::variant<null_t, std::int64_t, std::uint64_t, boost::string_view, float, double, date, datetime, time>; ...
Ruben Perez wrote:
You're assuming MySQL sends everything as strings that can be trivially parsed, but that's not the case. There are two different encodings, depending on whether a row came from connection::query or prepared_statement::execute (text queries employ a plain-text format, while prepared statements use binary encoding). In particular, the types DATE, DATETIME, TIME and BIT employ a nasty struct-like format in the binary protocol. BIT employs a binary-inside-text representation in the text protocol. DATE, TIME and DATETIME have the concept of "zero dates", which are invalid values MySQL allows but that don't represent any actual value. NULL values are represented as a bitmap in the binary protocol, and as special values in the text protocol.
I would expect a MySQL library to handle all these details (I would say this is one of the main benefits this library grants the user). As a user, I wouldn't want to know all this stuff.
Right.
But, if I have, say
struct X { int64_t x; string y; };
BOOST_DESCRIBE_STRUCT(X, (), (x, y))
and a matching result schema, isn't it reasonable to make `read_row` work for X?
It is. Does Describe provide any concept check (like is_describable_class_v<T>)? I would still implement these on top of the row parser I described in my previous email. I think these are all very useful features, and the library's direction goes towards there, but I feel they may be nice-to-haves at this particular point in time. I would say the current variant interface is not perfect but covers a decent amount of use cases. This new custom row parsing mechanism can be introduced without breaking the current interface.
(Values that can be null would use boost::optional or std::optional as members.)
I suppose the same applies to std::tuple<int64_t, string>.
Granted, same point here.
participants (3): Marcelo Zimbres Silva, Peter Dimov, Ruben Perez