Re: [boost] [review] [sqlite]

19 Nov 2024

      Hi all,

This is my review of the proposed Boost.Sqlite. First of all, thanks
Klemens for submitting the library, and Richard for managing the
review.
...
- Will the library bring additional out-of-the-box utility to Boost?
I think it will. SQLite is very popular, and many people will likely
benefit from the library.
...
- Did you try to use the library? With which compiler(s)? Did you
    have any problems?
I had two use cases in mind: regular SQL use, and virtual tables as a
way to expose in-memory data structures. I've implemented both. My
experiences with them follow.

To try regular SQL use, I adapted an existing example I used to have
in Boost.MySQL. It simulates an order management system for an online
store, with a command-line interface. It's intentionally simplistic. I
know that SQLite is more targeted towards resource-constrained
systems, rather than web applications, but it's been enough to have a
taste of how using the API looks like. For reference, the resulting
code is here: https://gist.github.com/anarthal/40c018cc4d133d0c9a082814d99d2a7a

I've used connection::prepare, connection::query, connection::execute,
statement, static_resultset and transaction. I've found the API mostly
satisfactory. Some gotchas:

1. I've found myself running into lifetime issues with code that
looked well-formed. At one point, I had the following:

sqlite::static_resultset<order_with_items>
get_order_with_items(std::int64_t order_id)
{
    sqlite::statement stmt = conn.prepare("SELECT ...");
    return conn.execute<order_with_items>({order_id});
}

This is undefined behavior (as stated in the docs), and needs to be
rewritten as:

sqlite::static_resultset<order_with_items>
get_order_with_items(std::int64_t order_id)
{
    return conn.prepare("SELECT ...")
        .execute<order_with_items>({order_id});
}

The problem being that both sqlite::static_resultset and
sqlite::statement are implemented in terms of a sqlite3_statement*.
The second snippet makes the static_resultset own the statement
handle, while the first version makes it a view. This problem will
probably appear much more frequently if you're using error codes
instead of exceptions.

IMO the problem arises due to a mismatch between the SQLite and the
Boost.Sqlite object model regarding statements and resultsets.
Resultsets don't exist in the C API, and thus may become problematic
in the C++ API. Making a type sometimes owning and sometimes
non-owning can be misleading.

As a quick idea (I haven't implemented it), you may have a reader
class, with a similar interface to the current resultset, that allows
iterating over the rows, but never takes ownership of the statement.
For example:

auto stmt = conn.prepare("SELECT ...");
stmt.execute({order_id});
for (const order& ord: stmt.reader<order>())
{
    //
}

You can probably make it a one-liner if you make execute return a
reference to the statement.

The main point would be making statement always owning, and
reader/resultset always a view.

2. I've found it impossible to use the static interface with fields
that may be NULL. I've tried the following:
  a. Using std::optional<T> and boost::optional<T> - these are not
implemented yet.
  b. Using sqlite::value is documented to work, but doesn't. The
author has stated that this is a bug.
  c. Using sqlite::field doesn't work either.

I've hot-patched the library to include the relevant tag_invoke
overload to make sqlite::field work. The tag_invoke mechanism looks
like a private interface, so as a regular user I would have not been
able to make use of it. Nullable fields are very common, and should
likely be supported by the static interface.

3. I tried using std::int64_t for my 64-bit integer fields and it
didn't work. You need to use sqlite3_int64. In my machine,
sqlite3_int64 is defined as a long long, while std::int64_t is defined
to be a long. I found this annoying, since it makes database
implementation types leak into business logic types.

The compiler errors when using the static interface are not as
informative as I'd have liked. When using a struct with an unsupported
type, it'd be very helpful if the offending type name appeared first
in the compiler message - diagnosing the sqlite3_int64 problem would
have been much easier. For example, these are the first lines that get
emitted when trying to use a struct with a wchar_t field (unsupported)
in the Boost.MySQL static interface:

/opt/boost-1.87.0-b1-rc2/include/boost/mysql/detail/typing/row_traits.hpp:172:13:
error: static assertion failed due to requirement
'is_readable_field<wchar_t>::value': You're trying to use an
unsupported field type in a row type. Review your row type
definitions.
   172 |             is_readable_field<T>::value,
       |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~

4. In the static interface, type mismatches are silent. If one of your
field types doesn't match what your query returns, the library doesn't
emit any error and silently sets the field to its default value. While
this matches what the sqlite3_column_xxxx functions do, I think it'd
be more helpful to emit an error. Such type mismatches are likely due
to programming errors, and erroring helps to detect them soon. For
instance, Boost.MySQL does this by checking metadata (you can probably
use sqlite3_column_type to perform the check).

My second use case involves exposing in-memory data structures to
foreign processes. In one of my previous jobs, we had a high-severity
bug due to one of such tables getting stuck with incorrect data.
Diagnosing the problem was quite involved. A tool that made it easy to
inspect the data in such tables could have been a reasonable measure
after the bugfix.

I have a similar data structure in my servertech chat project
(https://github.com/anarthal/servertech-chat/blob/master/server/src/services/...),
so I thought on adapting the multi_index.cpp example to suit my needs.

I have no prior experience with SQLite virtual tables, but I've been
able to learn about them using SQLite's official documentation.
Adapting multi_index.cpp has been challenging, though, because:
a. Virtual tables are inherently complex.
b. There is no discussion page on virtual tables in the proposed library.
c. The provided examples are inherently complex and don't have many comments.

As a result, I've found myself reverse-engineering the library and
examples, which is not very rewarding.

I was under the wrong impression that loading the module in one
process would make it immediately available to other processes loading
the database file onto which it was loaded. This is my fault, as there
is no way for SQLite to do this. This implies that, once I have my
module, I need to write a protocol to accept and execute SQLite
queries (e.g. by using a UNIX socket). Unfortunately, the overall
architecture is too complex and has security implications that need to
be addressed, and thus would have never gone into production in the
company I used to work for.

This left me questioning whether using virtual tables to expose
in-memory data structures to SQLite is really strong. If I'm not
mistaken, the library has 3 examples on this, so I'm likely missing
something. I think a good motivation for these three examples would be
very valuable - that is, some comments stating what real world
situations would require me to write code like the one in the
examples.

I'm actually not convinced of the value added by the library in terms
of virtual tables. My impression is that providing tested, robust
virtual table implementations for concrete use cases would add more
value than what the current infrastructure offers. In other words, is
writing virtual table implementations really something that happens in
day to day development? I feel that the answer to this question is no,
so I find it surprising that all examples except one are about virtual
tables. Take this with perspective, though, as I'm no expert in this
topic.
...
- Will the choice of API abstraction model ease the development of software that must talk to a SQLITE database?
The API simplifies regular SQL usage, as explained above. A couple of comments:

1. The virtual table infrastructure uses a macro
(BOOST_SQLITE_NO_VIRTUAL) to conditionally make functions virtual. I
think this is a potential problem in the long-run. If virtualization
is not required, appropriate compile-time checks should be performed,
instead.
2. sqlite::transaction features a throwing destructor, which I find surprising.
...
- What is your evaluation of the implementation?
I've only peaked at it in particular sites, and have the following comments:

1. transaction::commit throwing and non-throwing overloads seem to
have different behavior on error. One will call rollback() and the
other one won't. This case doesn't look to be covered by unit tests.
2. The virtual table infrastructure may be invoking undefined behavior
by violating strict aliasing, casting sqlite3_value* into
sqlite::value via a reinterpret_cast. Some conversations have happened
in the Slack workspace about it, with conflicting opinions whether
this is actual undefined behavior or not.
3. Compiling with -Wall generates a lot of warnings, most of which are
located in headers. I'd advise building with -Wall and -Werror in the
CI to clean up these.
4. Passing incorrect parameter types to .execute() and .query()
doesn't yield clean error messages. It would be beneficial
implementing concepts (or a static_assert) for better messages. As I
mentioned earlier, this also applies for the static interface.
5. Compiler coverage in CI looks improvable. The asan job seems
commented out, and I can't see any build targeting C++20 or 23.
6. If BOOST_SQLITE_NO_VIRTUAL is to stay, it needs to be documented and tested.
...
- What is your evaluation of the documentation?
In my opinion, the documentation is insufficient for the level
required by a Boost library. I've found the virtual table
functionality specially involved because of this. Some of my major
comments:

1. All supported, public functionality should be documented, both in
the reference pages and in the discussion. This is not the case for
most of the library: virtual tables, hooks, json, metadata, mutexes,
transactions, blobs and memory management (sqlite::memory_tag) are not
explained in the discussion.
2. Excepting virtual tables, the rest of the functionality isn't
explained using examples, either.
3. Some Doxygen comments seem outdated, referencing functions that
have been removed or renamed. See the bottom of this email for a list
of what I found.
4. Many Doxygen comments don't provide the information I'd expect to
find, like exception safety, lifetime rules, or the underlying
sqlite3_xxx functions they call.
...
- How much effort did you put into your evaluation? A glance? A quick
    reading? In-depth study?
I've spent around 12h doing inspecting the library, building the
examples, writing the online store simulator, adapting the
multi_index.cpp example and writing this review.
...
- Are you knowledgeable about the problem domain?
I've had mild SQLite experience in the past, mainly in prototypes. I
didn't have prior experience with virtual tables. I have experience
with SQL in general, and am the author of Boost.MySQL.
...
Final decision
Unfortunately, my final recommendation is to REJECT Boost.Sqlite in
its current form. I think the concept is interesting and could benefit
many people, but the library still requires some work before it gets
into Boost. The documentation needs a lot of work. Getting some field
experience would also be beneficial.

Thanks again Klemens and Richard for your effort. I'd love to see the
library being proposed again in the future.

Typos and documentation comments:

* The documentation should state which C++ standards and compilers are
tested and supported.
* The examples on virtual tables require step-by-step comments, since
they implement non-trivial logic.
* The examples on virtual tables need a rationale. That is, as a user,
when would I find myself writing code like this?
* The discussion seems to jump to advanced details too quickly. There
are very few docs on everyday tasks.
* The following types don't have any documentation at all but appear
in the public namespace. They should be either documented or made
private: allocator, unique_ptr, msize, make_unique,
set_variant_result, like, glob, icmp,
* Functionality that allocates using sqlite3-specific memory
allocations should likely state that fact it in the reference.
* Intro: "This library provides a simple C++ sqlite library." Do you
mean "simple C++ SQLite wrapper library"?
* Discussion: "Prepared statements can also be used multiple time and
used with named parameters instead of indexed." should be "multiple
times" and "named parameters instead of indexed ones."
* Discussion: "The result of a query is a field type,": this is
technically not true - you get a resultset, which yields rows that
have fields.
* Discussion: "Fields & values can have subtypes, while parameter to
prepared statements do not have thos associated.": should be
"parameters" and "do not have associated subtypes".
* In general: avoid & as a way to state "and"
* vtable.hpp: "@tparam T The implementation type of the module. It
must inherit": I don't know what this phrase means. The type
requirements for T likely need a page on their own.
* vtable.hpp: "It's lifetime is managed by the database.": should be "Its"
* vtable.hpp: "@param The requirements for `module`.": this creates a
bogus parameter entry, looks like a leftover
* field.hpp and value.hpp: "Returns the value as text, i.e. a
string_view. Note that this value may be invalidated`.": this is not
enough information. It should list when it gets invalidated.
* statement.hpp: "The handle is shared between the statement &
resultset. The statemens need to be kept alive.": should be
"statement" and "needs".
* statement.hpp: "This can be a map, a vector or a stuple": should be "tuple"
* vtable.hpp (create function): "The instance_type gets used &
managed" references an instance_type that does not exist (presumably
you meant table_type).
* vtable.hpp: "Destroy the storage = this function needs to be present
for non eponymous tables": syntax error
* vtable.hpp: "index info used by the find_index function": no
find_index function exists, you presumably mean best_index
* cstring_ref.hpp: doesn't have any function documented
* connection.hpp: "Perform a query without parametert, It execute a
multiple statement.": should be "parameter"
* collation.hpp: "a case insensitive string omparison, e.g. from
boost.urls": should be "comparison"
* README.md: "auto r = q.current();''" there's an extra '' there. I'd
advise to run documentation snippets to prevent such errors
* multi_index.cpp: the create table statement uses the "url" name,
which doesn't harm but is misleading
* multi_index.cpp: I'd advise to mark overriding methods with the
"override" keyword
* multi_index.cpp: "static_assert(sizeof(const_iterator) <=
sizeof(sqlite3_int64), "");": why is the static_assert needed? Please
add a comment documenting it
* multi_index.cpp and ordered_map.cpp: declares an "enum indices" that
presumably was thought to make bit fiddling more clear, but don't seem
to be used, increasing user confusion.

On Wed, 13 Nov 2024 at 13:31, Richard Hodges via Boost
<boost@lists.boost.org> wrote:
...
Dear All,
Surprise!
The Boost formal review of the Boost SQLITE library starts *TODAY*, taking
place
from November 13th, 2024 to November 22nd, 2024 (inclusive).
I apologise profusely for springing this on you without prior warning. The
error is entirely mine. I am extending the period by one day to compensate.
The library is authored by Klemens Morgenstern (@klemens-morgenstern in the
CppLang
slack).
Documentation: https://klemens.dev/sqlite/
<https://anarthal.github.io/mysql/index.html>
Source: https://github.com/klemens-morgenstern/sqlite
<https://github.com/anarthal/mysql/>
From the documentation:
boost.sqlite is a simple to use C++ sqlite library.
It provides a modern interface using facilities like error_code,
views (e.g. for blobs) and the ability to use boost.describe or boost.pfr
for parameterised queries.
Supported features include:
- typed queries
   - prepared statements
   - json support
   - custom functions (scalar, aggregate, windows)
   - event hooks
   - virtual tables
SQLite provides an excellent C-API, so this library does not attempt to
hide, but to augment it.
Please provide in your review information you think is valuable to explain
your choice to ACCEPT or REJECT including SQLITE as a Boost library. Please
be explicit about your decision (ACCEPT or REJECT).
Some other questions you might want to consider answering:
- Will the library bring additional out-of-the-box utility to Boost?
  - What is your evaluation of the implementation?
  - What is your evaluation of the documentation?
  - Will the choice of API abstraction model ease the development of
software that must talk to a SQLITE database?
  - Are there any immediate improvements that could be made after
acceptance, if acceptance should happen?
  - Did you try to use the library? With which compiler(s)? Did you
    have any problems?
  - How much effort did you put into your evaluation? A glance? A quick
    reading? In-depth study?
  - Are you knowledgeable about the problem domain?
More information about the Boost Formal Review Process can be found
at: http://www.boost.org/community/reviews.html
The review is open to anyone who is prepared to put in the work of
evaluating and reviewing the library. Prior experience in contributing to
Boost reviews is not a requirement.
Thank you for your efforts in the Boost community. They are very much
appreciated.
Richard Hodges
- review manager of the proposed Boost.SQLITE library
Klemens is often available on CppLang Slack and of course by email should
you
require any clarification not covered by the documentation, as am I.
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: [boost] [review] [sqlite]

Ruben Perez