Re: [boost] Boost.Parser questions

27 Feb 2024


On Tue, Feb 27, 2024 at 5:44 PM Zach Laine <whatwasthataddress@gmail.com> wrote:
...
On Tue, Feb 27, 2024 at 4:24 PM Christian Mazakas via Boost
<boost@lists.boost.org> wrote:
...
* My use case involves parsing identifiers that can only contain ASCII
lowercase, uppercase, digits and the underscore.
Spirit used to have helpers like this but Parser doesn't seem to have them.
I noticed this too, but it's actually pretty easy to fill this in yourself.
Here's a working example: https://godbolt.org/z/6P6dTbGYY
    auto const digit = p::char_('0', '9');
    auto const lower = p::char_('a', 'z');
    auto const upper = p::char_('A', 'Z');
    auto const ident = digit | lower | upper | '_';
Parser does have these (digit, lower, upper), but those match more
than what is desired here.  What is desired here is alnum |
char_('_'), I think.  That is, only the ASCII a-z, A-Z, 0-9, and _.
You can spell that out yourself as above, as you've done.  You could
also just use digit | lower | upper | char_('_').  It will be about
as fast, I expect (but certainly measure if it's a perf-critical
situation).
I should have mentioned -- I recently removed the ascii::* parsers,
which used the is*() functions (isalnum() and so on) from the C
standard library.  Those parsers included ascii::alnum.  I removed
them because the is*() functions are considered just plain wrong by
me and lots of other people from SG-16 (the committee's Unicode study
group).  They are also technically
dangerous, though most standard libraries I know of patch around the
potential UB because it is so easy to fall afoul of.  I don't know if
you're using one of the big three std libs though, so it seems sketchy
to use those, just for safety reasons.  They also have wrong semantics
in a Unicode context.
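
For illustration only (this is not code from the thread or from Boost.Parser): a minimal standard-C++ sketch of the ASCII-only character set discussed above, written with plain range comparisons instead of <cctype>.  The potential UB mentioned above comes from passing isalnum() and friends a value that is neither EOF nor representable as unsigned char, which is easy to do with a negative char.  The helper names ascii_digit, ascii_lower, ascii_upper, ident_char and is_identifier are hypothetical, and the sketch assumes an ASCII-compatible execution character set and C++17.

```cpp
#include <string_view>

// ASCII-only classification by explicit range comparison.
// Unlike std::isalnum(), these take a plain char, ignore the current
// locale, and are well defined for every char value, including
// negative ones on platforms where char is signed.
// Assumption: ASCII-compatible execution character set.
constexpr bool ascii_digit(char c) { return c >= '0' && c <= '9'; }
constexpr bool ascii_lower(char c) { return c >= 'a' && c <= 'z'; }
constexpr bool ascii_upper(char c) { return c >= 'A' && c <= 'Z'; }

// Hypothetical helper: one character of the identifier set from the
// thread (a-z, A-Z, 0-9 and underscore).
constexpr bool ident_char(char c) {
    return ascii_digit(c) || ascii_lower(c) || ascii_upper(c) || c == '_';
}

// Hypothetical helper: true for a non-empty string made up only of
// identifier characters.
constexpr bool is_identifier(std::string_view s) {
    if (s.empty()) return false;
    for (char c : s) {
        if (!ident_char(c)) return false;
    }
    return true;
}

// Compile-time checks of the behavior described above.
static_assert(is_identifier("snake_case_42"));
static_assert(!is_identifier("caf\xC3\xA9"));  // UTF-8 bytes of e-acute are rejected
```

The same ranges are what p::char_('0', '9'), p::char_('a', 'z') and p::char_('A', 'Z') spell out in the parser version earlier in the thread.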

Zach