[boost] Boost.Parser review

28 Feb 2024

      Hi all,

This is my review of the proposed Boost.Parser.

As in other occasions, I've taken a practical approach and used the library
to (re)implement an existing, real feature in Boost.MySQL: client-side SQL
formatting. This requires parsing a format string with replacement fields
(similar to std::format strings).

- What is your evaluation of the potential usefulness of the library?

I think it's useful. Parsing is a common task, and this library simplifies it.
I tend to prefer hand-crafted parsers in general, but libraries like
the proposed
tend to save time (and bugs). Also, many other users have manifested
interest in it.

I think the library is too heavyweight regarding compile times to be used in
header-only libraries like Boost.MySQL, but it should be fine in
compiled libraries
and applications.

- What is your evaluation of the design?

I think it's pretty good in general. It allows you to express complex
parsers using
little code, which is good. In my opinion, it may be too generic (allowing
any forward character range as input, for instance), which tends to
increase complexity.
I tend to prefer simpler approaches like the one Boost.URL uses - but
I understand
not everyone thinks like me, and there's a use case for each approach.

As we discussed in previous posts, if I was writing this, I would have
made some decisions differently (see
https://lists.boost.org/Archives/boost/2024/02/256010.php),
but I don't think there is any problem with the current design.
The author has justified his decisions, and everything makes sense.

- What is your evaluation of the implementation?

I can tell it's really complex. I've attempted to debug a problem I had for
which parser::trace::on wasn't enough (see below), and navigating the
code was extremely
difficult (reminds me of debugging Asio). This is probably inherent to the
level of complexity of the library. This is not a criticism, but a thought
I wanted to share.

- Did you try to use the library? With what compiler? Did you have any problems?

Yes, I did. As mentioned above, I re-implemented Boost.MySQL SQL formatting
(see https://www.boost.org/doc/libs/master/libs/mysql/doc/html/mysql/sql_formatti...)
using the proposed library. I've uploaded the code to a branch - you
can check it here:
https://github.com/boostorg/mysql/blob/66c99e31c9e548b6cf5b98e3d08255f223907...
For reference, I've also pasted the grammar and the code at the end of
this email.

I've found the following problems:

* Rule parsers using sequence + optional parsers yield unexpected
results when the optional
  rule doesn't match. Making it a regular parser, rather than a rule
parser, fixes the problem.
  The impact is that automatic replacement fields (i.e. `{}`) were
being parsed as `{0}`.
  https://github.com/tzlaine/parser/issues/125 tracks this.
* My clang-11 build fails. Looks like a concept is being used without
proper protection,
  and the compiler doesn't support it.
https://github.com/tzlaine/parser/issues/129
* My clang-16 build (and my local build attempts with clang 17, 18 and 19) fail
  because of a non-matching template redeclaration in
parser/detail/text/transcode_view.hpp.
  I don't know if it's clang being picky or a clang bug.
https://github.com/tzlaine/parser/issues/130
* My clang-14 build fails. This looks like it's hitting a clang + libstdc++ bug.
  See https://github.com/tzlaine/parser/issues/131.
* My Apple clang 14 build fails. It also looks like an issue in clang or libc++.
  See https://github.com/tzlaine/parser/issues/133.
* My MSVC 14.2 and 14.3 builds under C++17 fail (C++20 works fine).
I've completely
  failed to parse what's the problem here. Description here:
https://github.com/tzlaine/parser/issues/132

Some other minor problems:

* As reported by other reviewers, enabling warnings=all or
warnings=extra issues a lot of warnings.
* [[no_unique_address]] (C++20) is used unprotected, which causes
warnings under clang-7
  (see https://github.com/tzlaine/parser/issues/128).
* The CMakeLists.txt is not Boost-compliant (yet).
* There is a #include <boost/hana.hpp>, which is not very friendly for
compile-times -
  better replace it by the individual header that is needed.

I've submitted the build logs together with all details I've been able
to gather.
I haven't been able to provide a minimal reproducible example for some of these
(they were a lot and I had limited time). Note that all the builds
referenced in the
issues (except the OSX one) are run in Docker containers. If you have
access to Docker,
reproducing these environments should be trivial. Since I've done this
in the past,
should the author need any help with this, please let me know, I'm
happy to help.

Reading the CI files for the proposed library, I think compiler
coverage is insufficient -
for instance, there are no Linux clang builds, and the only Apple
clang build uses clang 13 and C++17.

- What is your evaluation of the documentation?

The presentation is really good. I struggled to understand the
concepts at first because I had
no previous experience with parsers. I ended up reading the Bison
concepts section
(https://www.gnu.org/software/bison/manual/bison.html#Concepts), after
which everything made
perfect sense. I'd advise to expand the introduction a bit, or just
point to other material
for people like me.

I fail to understand what RESOLVE means in
https://tzlaine.github.io/parser/doc/html/boost_parser__proposed_/tutorial/t....
Docs state "RESOLVE() is a notional macro that expands to the
resolution of parse argument or evaluation of a parse predicate",
but I'd say this resolution hasn't been explained yet - a link here would help.

I've found the reference to be more confusing that helpful - it looks
like a straight header dump.
This is made worse by the concepts not being linked.

Although it has many terrible things, the quickbook toolchain is
(partly) capable of
linking concepts - have a look at
https://www.boost.org/doc/libs/master/libs/mysql/doc/html/mysql/ref/boost__m....
This still has limitations, but I'd say it's better than what's currently there.
If this is something you'd be interested in, please let me know and I
will help you with it.

- How much effort did you put into your evaluation? A glance? A quick
reading? In-depth study?

I've spent around 2 full days of work reading about the problem
domain, studying the docs,
writing my use case, diagnosing the problems I reported and writing this review.

- Are you knowledgeable about the problem domain?

I've never used a parsing library like this before. I've read about
the problem in the past,
but wasn't very knowledgeable about it before the review. I've written
small parsers like
the one for format strings in the past, though.

- My conclusion

My recommendation is to REJECT the library in its current form. This
is not a hard
rejection - I completely think this library can be useful to people.
But I found too many
problems while using it - too many to get it into Boost in its current form.
I'd be happy to see this library being proposed again in a couple of months,
once all the issues found during the review get fixed.

- Affiliation disclosure

Please note that I'm affiliated with the C++ Alliance.

I'd like to thank Zach for proposing the library, and Marshall for
managing the review.

Regards,
Ruben.

- Reference: parsing code in Boost.MySQL

/**
 * format-str  = *( text | literal-opening-brace |
literal-closing-brace | replacement-field )
 * text        = *( any characters except {} )
 * replacement-field = "{" [ replacement-id ] "}"
 * replacement-id = ushort | field-name
 * field-name = (character a-z, A-Z, _) *( any characters 0-9, a-z, A-Z, _ )
 */

// Note: globals is a format_state structure, which has member functions
// to append raw text and replacement fields to the output string

// Actions
constexpr auto action_text = [](auto& ctx) {
_globals(ctx).append_text(_attr(ctx)); };
constexpr auto action_replacement_field = [](auto& ctx) {
    const std::optional<std::variant<unsigned short, std::string>>&
val = _attr(ctx);
    format_state& st = _globals(ctx);
    if (val)
    {
        if (const auto* idx = std::get_if<unsigned short>(&*val))
        {
            st.append_indexed_field(*idx);
        }
        else
        {
            st.append_named_field(std::get<std::string>(*val));
        }
    }
    else
    {
        st.append_auto_field();
    }
};
constexpr auto action_opening_brace = [](auto& ctx) {
_globals(ctx).append_text("{"); };
constexpr auto action_closing_brace = [](auto& ctx) {
_globals(ctx).append_text("}"); };

// Character ranges
constexpr std::array<const char, 63> name_chars{
    'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
'n', 'o', 'p', 'q', 'r', 's', 't', 'u',
    'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P',
    'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '_', '0', '1',
'2', '3', '4', '5', '6', '7', '8', '9'
};
constexpr boost::span<const char, 53> name_start_chars{name_chars.data(), 53};

// Parsers and rules
constexpr auto field_name = parser::char_(name_start_chars) >>
*(parser::char_(name_chars));
constexpr auto replacement_id = parser::ushort_ | field_name;
constexpr auto replacement_field = ('{' >> -replacement_id) > '}';

constexpr auto literal_open_brace = parser::lit("{{");
constexpr auto literal_close_brace = parser::lit("}}");
constexpr auto text = +(parser::char_ - '{' - '}');
constexpr auto format_str_parser = *(
    text[action_text] | literal_open_brace[action_opening_brace] |
literal_close_brace[action_closing_brace] |
    replacement_field[action_replacement_field]
);

// Top-level parsing function
inline void parse_format_str(string_view format_str,
format_context_base& ctx, span<const format_arg> args)
{
    // By default, parser prints to cerr, which we don't want
    parser::callback_error_handler handler([](const std::string&) {});

    // Create a parsing state
    format_state st(ctx, args);

    // Invoke the actual parsing
    bool result = parser::parse(
        format_str,
        parser::with_globals(parser::with_error_handler(format_str_parser,
handler), st)
    );

    // If there was an error, report it
    if (!result)
    {
        ctx.add_error(client_errc::format_string_invalid_syntax);
    }
}

[boost] Boost.Parser review

Ruben Perez