Hi all, This is my review of the proposed Boost.Parser. As in other occasions, I've taken a practical approach and used the library to (re)implement an existing, real feature in Boost.MySQL: client-side SQL formatting. This requires parsing a format string with replacement fields (similar to std::format strings). - What is your evaluation of the potential usefulness of the library? I think it's useful. Parsing is a common task, and this library simplifies it. I tend to prefer hand-crafted parsers in general, but libraries like the proposed tend to save time (and bugs). Also, many other users have manifested interest in it. I think the library is too heavyweight regarding compile times to be used in header-only libraries like Boost.MySQL, but it should be fine in compiled libraries and applications. - What is your evaluation of the design? I think it's pretty good in general. It allows you to express complex parsers using little code, which is good. In my opinion, it may be too generic (allowing any forward character range as input, for instance), which tends to increase complexity. I tend to prefer simpler approaches like the one Boost.URL uses - but I understand not everyone thinks like me, and there's a use case for each approach. As we discussed in previous posts, if I was writing this, I would have made some decisions differently (see https://lists.boost.org/Archives/boost/2024/02/256010.php), but I don't think there is any problem with the current design. The author has justified his decisions, and everything makes sense. - What is your evaluation of the implementation? I can tell it's really complex. I've attempted to debug a problem I had for which parser::trace::on wasn't enough (see below), and navigating the code was extremely difficult (reminds me of debugging Asio). This is probably inherent to the level of complexity of the library. This is not a criticism, but a thought I wanted to share. - Did you try to use the library? With what compiler? Did you have any problems? Yes, I did. As mentioned above, I re-implemented Boost.MySQL SQL formatting (see https://www.boost.org/doc/libs/master/libs/mysql/doc/html/mysql/sql_formatti...) using the proposed library. I've uploaded the code to a branch - you can check it here: https://github.com/boostorg/mysql/blob/66c99e31c9e548b6cf5b98e3d08255f223907... For reference, I've also pasted the grammar and the code at the end of this email. I've found the following problems: * Rule parsers using sequence + optional parsers yield unexpected results when the optional rule doesn't match. Making it a regular parser, rather than a rule parser, fixes the problem. The impact is that automatic replacement fields (i.e. `{}`) were being parsed as `{0}`. https://github.com/tzlaine/parser/issues/125 tracks this. * My clang-11 build fails. Looks like a concept is being used without proper protection, and the compiler doesn't support it. https://github.com/tzlaine/parser/issues/129 * My clang-16 build (and my local build attempts with clang 17, 18 and 19) fail because of a non-matching template redeclaration in parser/detail/text/transcode_view.hpp. I don't know if it's clang being picky or a clang bug. https://github.com/tzlaine/parser/issues/130 * My clang-14 build fails. This looks like it's hitting a clang + libstdc++ bug. See https://github.com/tzlaine/parser/issues/131. * My Apple clang 14 build fails. It also looks like an issue in clang or libc++. See https://github.com/tzlaine/parser/issues/133. * My MSVC 14.2 and 14.3 builds under C++17 fail (C++20 works fine). I've completely failed to parse what's the problem here. Description here: https://github.com/tzlaine/parser/issues/132 Some other minor problems: * As reported by other reviewers, enabling warnings=all or warnings=extra issues a lot of warnings. * [[no_unique_address]] (C++20) is used unprotected, which causes warnings under clang-7 (see https://github.com/tzlaine/parser/issues/128). * The CMakeLists.txt is not Boost-compliant (yet). * There is a #include <boost/hana.hpp>, which is not very friendly for compile-times - better replace it by the individual header that is needed. I've submitted the build logs together with all details I've been able to gather. I haven't been able to provide a minimal reproducible example for some of these (they were a lot and I had limited time). Note that all the builds referenced in the issues (except the OSX one) are run in Docker containers. If you have access to Docker, reproducing these environments should be trivial. Since I've done this in the past, should the author need any help with this, please let me know, I'm happy to help. Reading the CI files for the proposed library, I think compiler coverage is insufficient - for instance, there are no Linux clang builds, and the only Apple clang build uses clang 13 and C++17. - What is your evaluation of the documentation? The presentation is really good. I struggled to understand the concepts at first because I had no previous experience with parsers. I ended up reading the Bison concepts section (https://www.gnu.org/software/bison/manual/bison.html#Concepts), after which everything made perfect sense. I'd advise to expand the introduction a bit, or just point to other material for people like me. I fail to understand what RESOLVE means in https://tzlaine.github.io/parser/doc/html/boost_parser__proposed_/tutorial/t.... Docs state "RESOLVE() is a notional macro that expands to the resolution of parse argument or evaluation of a parse predicate", but I'd say this resolution hasn't been explained yet - a link here would help. I've found the reference to be more confusing that helpful - it looks like a straight header dump. This is made worse by the concepts not being linked. Although it has many terrible things, the quickbook toolchain is (partly) capable of linking concepts - have a look at https://www.boost.org/doc/libs/master/libs/mysql/doc/html/mysql/ref/boost__m.... This still has limitations, but I'd say it's better than what's currently there. If this is something you'd be interested in, please let me know and I will help you with it. - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? I've spent around 2 full days of work reading about the problem domain, studying the docs, writing my use case, diagnosing the problems I reported and writing this review. - Are you knowledgeable about the problem domain? I've never used a parsing library like this before. I've read about the problem in the past, but wasn't very knowledgeable about it before the review. I've written small parsers like the one for format strings in the past, though. - My conclusion My recommendation is to REJECT the library in its current form. This is not a hard rejection - I completely think this library can be useful to people. But I found too many problems while using it - too many to get it into Boost in its current form. I'd be happy to see this library being proposed again in a couple of months, once all the issues found during the review get fixed. - Affiliation disclosure Please note that I'm affiliated with the C++ Alliance. I'd like to thank Zach for proposing the library, and Marshall for managing the review. Regards, Ruben. - Reference: parsing code in Boost.MySQL /** * format-str = *( text | literal-opening-brace | literal-closing-brace | replacement-field ) * text = *( any characters except {} ) * replacement-field = "{" [ replacement-id ] "}" * replacement-id = ushort | field-name * field-name = (character a-z, A-Z, _) *( any characters 0-9, a-z, A-Z, _ ) */ // Note: globals is a format_state structure, which has member functions // to append raw text and replacement fields to the output string // Actions constexpr auto action_text = [](auto& ctx) { _globals(ctx).append_text(_attr(ctx)); }; constexpr auto action_replacement_field = [](auto& ctx) { const std::optional<std::variant<unsigned short, std::string>>& val = _attr(ctx); format_state& st = _globals(ctx); if (val) { if (const auto* idx = std::get_if<unsigned short>(&*val)) { st.append_indexed_field(*idx); } else { st.append_named_field(std::get<std::string>(*val)); } } else { st.append_auto_field(); } }; constexpr auto action_opening_brace = [](auto& ctx) { _globals(ctx).append_text("{"); }; constexpr auto action_closing_brace = [](auto& ctx) { _globals(ctx).append_text("}"); }; // Character ranges constexpr std::array<const char, 63> name_chars{ 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '_', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' }; constexpr boost::span<const char, 53> name_start_chars{name_chars.data(), 53}; // Parsers and rules constexpr auto field_name = parser::char_(name_start_chars) >> *(parser::char_(name_chars)); constexpr auto replacement_id = parser::ushort_ | field_name; constexpr auto replacement_field = ('{' >> -replacement_id) > '}'; constexpr auto literal_open_brace = parser::lit("{{"); constexpr auto literal_close_brace = parser::lit("}}"); constexpr auto text = +(parser::char_ - '{' - '}'); constexpr auto format_str_parser = *( text[action_text] | literal_open_brace[action_opening_brace] | literal_close_brace[action_closing_brace] | replacement_field[action_replacement_field] ); // Top-level parsing function inline void parse_format_str(string_view format_str, format_context_base& ctx, span<const format_arg> args) { // By default, parser prints to cerr, which we don't want parser::callback_error_handler handler([](const std::string&) {}); // Create a parsing state format_state st(ctx, args); // Invoke the actual parsing bool result = parser::parse( format_str, parser::with_globals(parser::with_error_handler(format_str_parser, handler), st) ); // If there was an error, report it if (!result) { ctx.add_error(client_errc::format_string_invalid_syntax); } }
On Feb 28, 2024, at 10:55 AM, Ruben Perez via Boost <boost@lists.boost.org> wrote:
Hi all,
This is my review of the proposed Boost.Parser.
As in other occasions, I've taken a practical approach and used the library to (re)implement an existing, real feature in Boost.MySQL: client-side SQL formatting. This requires parsing a format string with replacement fields (similar to std::format strings).
Thanks for the review,. Ruben! — Marshall
participants (2)
-
Marshall Clow
-
Ruben Perez