
Hi all, This is my review of the proposed Boost.Sqlite. First of all, thanks Klemens for submitting the library, and Richard for managing the review.
- Will the library bring additional out-of-the-box utility to Boost?
I think it will. SQLite is very popular, and many people will likely benefit from the library.
- Did you try to use the library? With which compiler(s)? Did you have any problems?
I had two use cases in mind: regular SQL use, and virtual tables as a way to expose in-memory data structures. I've implemented both. My experiences with them follow. To try regular SQL use, I adapted an existing example I used to have in Boost.MySQL. It simulates an order management system for an online store, with a command-line interface. It's intentionally simplistic. I know that SQLite is more targeted towards resource-constrained systems, rather than web applications, but it's been enough to have a taste of how using the API looks like. For reference, the resulting code is here: https://gist.github.com/anarthal/40c018cc4d133d0c9a082814d99d2a7a I've used connection::prepare, connection::query, connection::execute, statement, static_resultset and transaction. I've found the API mostly satisfactory. Some gotchas: 1. I've found myself running into lifetime issues with code that looked well-formed. At one point, I had the following: sqlite::static_resultset<order_with_items> get_order_with_items(std::int64_t order_id) { sqlite::statement stmt = conn.prepare("SELECT ..."); return conn.execute<order_with_items>({order_id}); } This is undefined behavior (as stated in the docs), and needs to be rewritten as: sqlite::static_resultset<order_with_items> get_order_with_items(std::int64_t order_id) { return conn.prepare("SELECT ...") .execute<order_with_items>({order_id}); } The problem being that both sqlite::static_resultset and sqlite::statement are implemented in terms of a sqlite3_statement*. The second snippet makes the static_resultset own the statement handle, while the first version makes it a view. This problem will probably appear much more frequently if you're using error codes instead of exceptions. IMO the problem arises due to a mismatch between the SQLite and the Boost.Sqlite object model regarding statements and resultsets. Resultsets don't exist in the C API, and thus may become problematic in the C++ API. Making a type sometimes owning and sometimes non-owning can be misleading. As a quick idea (I haven't implemented it), you may have a reader class, with a similar interface to the current resultset, that allows iterating over the rows, but never takes ownership of the statement. For example: auto stmt = conn.prepare("SELECT ..."); stmt.execute({order_id}); for (const order& ord: stmt.reader<order>()) { // } You can probably make it a one-liner if you make execute return a reference to the statement. The main point would be making statement always owning, and reader/resultset always a view. 2. I've found it impossible to use the static interface with fields that may be NULL. I've tried the following: a. Using std::optional<T> and boost::optional<T> - these are not implemented yet. b. Using sqlite::value is documented to work, but doesn't. The author has stated that this is a bug. c. Using sqlite::field doesn't work either. I've hot-patched the library to include the relevant tag_invoke overload to make sqlite::field work. The tag_invoke mechanism looks like a private interface, so as a regular user I would have not been able to make use of it. Nullable fields are very common, and should likely be supported by the static interface. 3. I tried using std::int64_t for my 64-bit integer fields and it didn't work. You need to use sqlite3_int64. In my machine, sqlite3_int64 is defined as a long long, while std::int64_t is defined to be a long. I found this annoying, since it makes database implementation types leak into business logic types. The compiler errors when using the static interface are not as informative as I'd have liked. When using a struct with an unsupported type, it'd be very helpful if the offending type name appeared first in the compiler message - diagnosing the sqlite3_int64 problem would have been much easier. For example, these are the first lines that get emitted when trying to use a struct with a wchar_t field (unsupported) in the Boost.MySQL static interface: /opt/boost-1.87.0-b1-rc2/include/boost/mysql/detail/typing/row_traits.hpp:172:13: error: static assertion failed due to requirement 'is_readable_field<wchar_t>::value': You're trying to use an unsupported field type in a row type. Review your row type definitions. 172 | is_readable_field<T>::value, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~ 4. In the static interface, type mismatches are silent. If one of your field types doesn't match what your query returns, the library doesn't emit any error and silently sets the field to its default value. While this matches what the sqlite3_column_xxxx functions do, I think it'd be more helpful to emit an error. Such type mismatches are likely due to programming errors, and erroring helps to detect them soon. For instance, Boost.MySQL does this by checking metadata (you can probably use sqlite3_column_type to perform the check). My second use case involves exposing in-memory data structures to foreign processes. In one of my previous jobs, we had a high-severity bug due to one of such tables getting stuck with incorrect data. Diagnosing the problem was quite involved. A tool that made it easy to inspect the data in such tables could have been a reasonable measure after the bugfix. I have a similar data structure in my servertech chat project (https://github.com/anarthal/servertech-chat/blob/master/server/src/services/...), so I thought on adapting the multi_index.cpp example to suit my needs. I have no prior experience with SQLite virtual tables, but I've been able to learn about them using SQLite's official documentation. Adapting multi_index.cpp has been challenging, though, because: a. Virtual tables are inherently complex. b. There is no discussion page on virtual tables in the proposed library. c. The provided examples are inherently complex and don't have many comments. As a result, I've found myself reverse-engineering the library and examples, which is not very rewarding. I was under the wrong impression that loading the module in one process would make it immediately available to other processes loading the database file onto which it was loaded. This is my fault, as there is no way for SQLite to do this. This implies that, once I have my module, I need to write a protocol to accept and execute SQLite queries (e.g. by using a UNIX socket). Unfortunately, the overall architecture is too complex and has security implications that need to be addressed, and thus would have never gone into production in the company I used to work for. This left me questioning whether using virtual tables to expose in-memory data structures to SQLite is really strong. If I'm not mistaken, the library has 3 examples on this, so I'm likely missing something. I think a good motivation for these three examples would be very valuable - that is, some comments stating what real world situations would require me to write code like the one in the examples. I'm actually not convinced of the value added by the library in terms of virtual tables. My impression is that providing tested, robust virtual table implementations for concrete use cases would add more value than what the current infrastructure offers. In other words, is writing virtual table implementations really something that happens in day to day development? I feel that the answer to this question is no, so I find it surprising that all examples except one are about virtual tables. Take this with perspective, though, as I'm no expert in this topic.
- Will the choice of API abstraction model ease the development of software that must talk to a SQLITE database?
The API simplifies regular SQL usage, as explained above. A couple of comments: 1. The virtual table infrastructure uses a macro (BOOST_SQLITE_NO_VIRTUAL) to conditionally make functions virtual. I think this is a potential problem in the long-run. If virtualization is not required, appropriate compile-time checks should be performed, instead. 2. sqlite::transaction features a throwing destructor, which I find surprising.
- What is your evaluation of the implementation?
I've only peaked at it in particular sites, and have the following comments: 1. transaction::commit throwing and non-throwing overloads seem to have different behavior on error. One will call rollback() and the other one won't. This case doesn't look to be covered by unit tests. 2. The virtual table infrastructure may be invoking undefined behavior by violating strict aliasing, casting sqlite3_value* into sqlite::value via a reinterpret_cast. Some conversations have happened in the Slack workspace about it, with conflicting opinions whether this is actual undefined behavior or not. 3. Compiling with -Wall generates a lot of warnings, most of which are located in headers. I'd advise building with -Wall and -Werror in the CI to clean up these. 4. Passing incorrect parameter types to .execute() and .query() doesn't yield clean error messages. It would be beneficial implementing concepts (or a static_assert) for better messages. As I mentioned earlier, this also applies for the static interface. 5. Compiler coverage in CI looks improvable. The asan job seems commented out, and I can't see any build targeting C++20 or 23. 6. If BOOST_SQLITE_NO_VIRTUAL is to stay, it needs to be documented and tested.
- What is your evaluation of the documentation?
In my opinion, the documentation is insufficient for the level required by a Boost library. I've found the virtual table functionality specially involved because of this. Some of my major comments: 1. All supported, public functionality should be documented, both in the reference pages and in the discussion. This is not the case for most of the library: virtual tables, hooks, json, metadata, mutexes, transactions, blobs and memory management (sqlite::memory_tag) are not explained in the discussion. 2. Excepting virtual tables, the rest of the functionality isn't explained using examples, either. 3. Some Doxygen comments seem outdated, referencing functions that have been removed or renamed. See the bottom of this email for a list of what I found. 4. Many Doxygen comments don't provide the information I'd expect to find, like exception safety, lifetime rules, or the underlying sqlite3_xxx functions they call.
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
I've spent around 12h doing inspecting the library, building the examples, writing the online store simulator, adapting the multi_index.cpp example and writing this review.
- Are you knowledgeable about the problem domain?
I've had mild SQLite experience in the past, mainly in prototypes. I didn't have prior experience with virtual tables. I have experience with SQL in general, and am the author of Boost.MySQL.
Final decision
Unfortunately, my final recommendation is to REJECT Boost.Sqlite in its current form. I think the concept is interesting and could benefit many people, but the library still requires some work before it gets into Boost. The documentation needs a lot of work. Getting some field experience would also be beneficial. Thanks again Klemens and Richard for your effort. I'd love to see the library being proposed again in the future. Typos and documentation comments: * The documentation should state which C++ standards and compilers are tested and supported. * The examples on virtual tables require step-by-step comments, since they implement non-trivial logic. * The examples on virtual tables need a rationale. That is, as a user, when would I find myself writing code like this? * The discussion seems to jump to advanced details too quickly. There are very few docs on everyday tasks. * The following types don't have any documentation at all but appear in the public namespace. They should be either documented or made private: allocator, unique_ptr, msize, make_unique, set_variant_result, like, glob, icmp, * Functionality that allocates using sqlite3-specific memory allocations should likely state that fact it in the reference. * Intro: "This library provides a simple C++ sqlite library." Do you mean "simple C++ SQLite wrapper library"? * Discussion: "Prepared statements can also be used multiple time and used with named parameters instead of indexed." should be "multiple times" and "named parameters instead of indexed ones." * Discussion: "The result of a query is a field type,": this is technically not true - you get a resultset, which yields rows that have fields. * Discussion: "Fields & values can have subtypes, while parameter to prepared statements do not have thos associated.": should be "parameters" and "do not have associated subtypes". * In general: avoid & as a way to state "and" * vtable.hpp: "@tparam T The implementation type of the module. It must inherit": I don't know what this phrase means. The type requirements for T likely need a page on their own. * vtable.hpp: "It's lifetime is managed by the database.": should be "Its" * vtable.hpp: "@param The requirements for `module`.": this creates a bogus parameter entry, looks like a leftover * field.hpp and value.hpp: "Returns the value as text, i.e. a string_view. Note that this value may be invalidated`.": this is not enough information. It should list when it gets invalidated. * statement.hpp: "The handle is shared between the statement & resultset. The statemens need to be kept alive.": should be "statement" and "needs". * statement.hpp: "This can be a map, a vector or a stuple": should be "tuple" * vtable.hpp (create function): "The instance_type gets used & managed" references an instance_type that does not exist (presumably you meant table_type). * vtable.hpp: "Destroy the storage = this function needs to be present for non eponymous tables": syntax error * vtable.hpp: "index info used by the find_index function": no find_index function exists, you presumably mean best_index * cstring_ref.hpp: doesn't have any function documented * connection.hpp: "Perform a query without parametert, It execute a multiple statement.": should be "parameter" * collation.hpp: "a case insensitive string omparison, e.g. from boost.urls": should be "comparison" * README.md: "auto r = q.current();''" there's an extra '' there. I'd advise to run documentation snippets to prevent such errors * multi_index.cpp: the create table statement uses the "url" name, which doesn't harm but is misleading * multi_index.cpp: I'd advise to mark overriding methods with the "override" keyword * multi_index.cpp: "static_assert(sizeof(const_iterator) <= sizeof(sqlite3_int64), "");": why is the static_assert needed? Please add a comment documenting it * multi_index.cpp and ordered_map.cpp: declares an "enum indices" that presumably was thought to make bit fiddling more clear, but don't seem to be used, increasing user confusion. On Wed, 13 Nov 2024 at 13:31, Richard Hodges via Boost <boost@lists.boost.org> wrote:
Dear All,
Surprise!
The Boost formal review of the Boost SQLITE library starts *TODAY*, taking place
from November 13th, 2024 to November 22nd, 2024 (inclusive).
I apologise profusely for springing this on you without prior warning. The error is entirely mine. I am extending the period by one day to compensate.
The library is authored by Klemens Morgenstern (@klemens-morgenstern in the CppLang slack).
Documentation: https://klemens.dev/sqlite/ <https://anarthal.github.io/mysql/index.html> Source: https://github.com/klemens-morgenstern/sqlite <https://github.com/anarthal/mysql/>
From the documentation:
boost.sqlite is a simple to use C++ sqlite library. It provides a modern interface using facilities like error_code, views (e.g. for blobs) and the ability to use boost.describe or boost.pfr for parameterised queries.
Supported features include:
- typed queries - prepared statements - json support - custom functions (scalar, aggregate, windows) - event hooks - virtual tables
SQLite provides an excellent C-API, so this library does not attempt to hide, but to augment it.
Please provide in your review information you think is valuable to explain your choice to ACCEPT or REJECT including SQLITE as a Boost library. Please be explicit about your decision (ACCEPT or REJECT).
Some other questions you might want to consider answering:
- Will the library bring additional out-of-the-box utility to Boost? - What is your evaluation of the implementation? - What is your evaluation of the documentation? - Will the choice of API abstraction model ease the development of software that must talk to a SQLITE database? - Are there any immediate improvements that could be made after acceptance, if acceptance should happen? - Did you try to use the library? With which compiler(s)? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? - Are you knowledgeable about the problem domain?
More information about the Boost Formal Review Process can be found at: http://www.boost.org/community/reviews.html
The review is open to anyone who is prepared to put in the work of evaluating and reviewing the library. Prior experience in contributing to Boost reviews is not a requirement.
Thank you for your efforts in the Boost community. They are very much appreciated.
Richard Hodges - review manager of the proposed Boost.SQLITE library
Klemens is often available on CppLang Slack and of course by email should you require any clarification not covered by the documentation, as am I.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost