On Tue, Feb 27, 2024 at 5:44 PM Zach Laine wrote:
On Tue, Feb 27, 2024 at 4:24 PM Christian Mazakas via Boost wrote:

* My use case involves parsing identifiers that can only contain ASCII lowercase, uppercase, digits and the underscore. Spirit used to have helpers like this but Parser doesn't seem to have them.

I noticed this too, but it's actually pretty easy to fill this in yourself.
Here's a working example: https://godbolt.org/z/6P6dTbGYY
auto const digit = p::char_('0', '9');
auto const lower = p::char_('a', 'z');
auto const upper = p::char_('A', 'Z');
auto const ident = digit | lower | upper | '_';
Parser does have these (digit, lower, upper), but those match more than what is desired here. What is desired here is alnum | char_('_'), I think. That is, only the ASCII a-z, A-Z, 0-9, and _. You can spell that out yourself, as you've done above. You could also just use digit | lower | upper | char_('_'). I expect it will be about as fast (but certainly measure if it's a perf-critical situation).
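To make the intended character set concrete, here is a minimal sketch in plain standard C++ rather than Parser. The names is_ident_char and is_ident are made up for illustration and are not part of Boost.Parser; the range comparisons assume an ASCII-compatible execution character set.

```cpp
#include <string_view>

// Hypothetical helper (not a Boost.Parser API): true only for the ASCII
// characters a-z, A-Z, 0-9 and '_'. Assumes an ASCII-compatible execution
// character set, where letters and digits are contiguous.
constexpr bool is_ident_char(char c) noexcept
{
    return (c >= '0' && c <= '9') ||
           (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') ||
           c == '_';
}

// Hypothetical helper: true if s is non-empty and consists solely of
// identifier characters as defined above.
constexpr bool is_ident(std::string_view s) noexcept
{
    if (s.empty())
        return false;
    for (char c : s) {
        if (!is_ident_char(c))
            return false;
    }
    return true;
}
```

Because the predicate only compares against literal ranges, it does not depend on the locale and never classifies a non-ASCII byte as part of an identifier.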
I should have mentioned: I recently removed the ascii::* parsers, which used the is*() functions from the C standard library (isalnum() and friends). They included ascii::alnum. I removed them because those functions are considered just plain wrong by me and lots of other people from SG-16 (the committee's Unicode study group). They are also technically dangerous, though most standard libraries I know of patch around the potential UB because it is so easy to fall afoul of. I don't know if you're using one of the big three std libs though, so it seems sketchy to use those, just for safety reasons. They also have wrong semantics in a Unicode context.

Zach
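For readers unfamiliar with the UB mentioned above: the <cctype> functions take an int whose value must be representable as unsigned char or equal EOF. Where plain char is signed, passing a non-ASCII byte directly yields a negative value, which is undefined behavior. A minimal sketch of the usual workaround follows; safe_isalnum is a hypothetical name, not something from Parser.

```cpp
#include <cctype>

// Hypothetical wrapper: std::isalnum(c) with a negative plain char is
// undefined behavior, so convert to unsigned char before the call.
inline bool safe_isalnum(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}
```

The cast only removes the UB. The result still depends on the current C locale, and the function still looks at a single byte rather than a code point, so it does not address the Unicode semantics problem.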