Re: [boost] Determining interest: C++11 parser generator library

15 Nov 2011


      "Joel de Guzman" <joel@boost-consulting.com> wrote in message 
news:j9sb3i$c7g$1@dough.gmane.org...
...
On 11/15/2011 1:08 AM, Gene Bushuyev wrote:
...
"John Bytheway" <jbytheway+boost@gmail.com> wrote in message
news:j9qp21$f09$1@dough.gmane.org...
...
On 13/11/11 23:39, Gene Bushuyev wrote:
...
Sorry if this turns out to be a duplicate; it looks like my original post was
lost in cyberspace, so I'm re-posting this request.
I'm trying to determine if there is sufficient interest in including the
AXE C++11 recursive descent parser generator library in Boost. The
zipped sources and documentation are here:
http://www.gbresearch.com/axe/axe.zip
People are more likely to investigate if you can provide a link to the
documentation online somewhere, so they don't have to download and
extract a zip file.
It would also be useful to explain briefly how it compares with
Boost.Spirit.
John Bytheway
It's true there is a significant overlap with Spirit. It's also true there is
more than one way to do the parsing, so some people will be more comfortable
with Spirit, and I have reasons to believe some people will be more comfortable
with AXE. There are differences, the importance of which depends on personal
perspective and needs. I tried to summarize below what I would consider
advantages of AXE:
* it's a much smaller header-only library: 15 files, 126 KB total
* it has no dependencies on other libraries apart from the Standard library
* it uses only standard facilities, so theoretically it should work with any
C++11 compiler without any modifications
* compilation times are much shorter than with Spirit
* the syntax is less cryptic than Spirit's, so it's easier to remember, write,
debug, and read parsers written in AXE (this is, of course, subjective)
* in my limited comparison, parsers written in AXE take fewer lines of code to
write, and development times are shorter
Disadvantages:
* AXE requires a C++11 compiler; the current status of compiler support is
unknown
* It was released recently, so there is limited experience working with it
When Spirit debuted, it was 7 header files. If your library gets more mature,
the added complexity will be necessary. Your main advantage is simplicity.
I can't argue with that. However, it is also your big disadvantage. Here are
some more important points you missed in your Disadvantages section:
* It does not have Unicode support.
It does have wide character support, or better to say, you can instantiate
rules on any character type. You can mix narrow and wide characters, and you
can also mix binary and text parsing. Many parsers would work with Unicode
files without any modification.
...
* It does not have attributes and AST support. It is a purely
 transduction parser like Spirit 1.0. So in every step,
 you have to convert an iterator range to an attribute manually.
I was thinking about adding an AST, but so far there hasn't been any need that
would justify the additional complexity. My previous experience with various
parsers creating ASTs and then traversing them was rather negative, both in
terms of performance and complexity. Maybe that will change in the future.
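
To illustrate what "convert an iterator range to an attribute manually" means in a transduction-style parser, here is a minimal sketch in plain C++11. It uses neither AXE nor Spirit; `match_digits` and `parse_int` are hypothetical names made up for this example.

```cpp
#include <cctype>
#include <string>
#include <utility>

// Hypothetical helper, not part of AXE or Spirit: matches a run of decimal
// digits and returns the iterator one past the last matched character.
template <class It>
It match_digits(It first, It last) {
    while (first != last && std::isdigit(static_cast<unsigned char>(*first)))
        ++first;
    return first;
}

// The "semantic action" step: the parser only reports the matched range,
// so the caller turns that range into an int by hand.
// Returns {matched, value}; matched is false when no digits were consumed.
inline std::pair<bool, int> parse_int(const std::string& text) {
    auto end = match_digits(text.begin(), text.end());
    if (end == text.begin())
        return {false, 0};
    return {true, std::stoi(std::string(text.begin(), end))};
}
```

An attribute-based design would hand the converted value to the caller directly; here the conversion is an explicit extra step at every rule that produces a value.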
...
* It does not have support for polymorphic semantic actions
 (you know that C++ lambdas are monomorphic, right?).
There is a polymorphic class r_rule, which uses std::function. It's primarily
used for expressing recursion. It can be used on its own, of course, but
unlike auto rules it would introduce an often unnecessary performance hit. But
if one wants to return a parser from a function or keep a rule as a class
member, then a polymorphic rule will do just fine.
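
As a rough illustration of why a std::function-based rule helps with recursion, here is a minimal sketch in standard C++11. It is not AXE's r_rule and makes no claim about its interface; the `rule` alias and the `balanced` function are hypothetical names for this example. The point is only that a type-erased rule has a fixed, nameable type, so it can be declared first and then refer to itself.

```cpp
#include <functional>
#include <string>

// Type-erased rule: takes [first, last) and returns the end of the match,
// or nullptr on failure. Hypothetical type for illustration only.
using rule = std::function<const char*(const char*, const char*)>;

// Checks that the whole string is one group, where
//   group := '(' group* ')'
inline bool balanced(const std::string& s) {
    rule group;  // declared before assignment so the lambda can refer to it
    group = [&group](const char* first, const char* last) -> const char* {
        if (first == last || *first != '(') return nullptr;
        ++first;
        // Zero or more nested groups; each call goes through std::function.
        while (const char* next = group(first, last)) first = next;
        if (first == last || *first != ')') return nullptr;
        return first + 1;
    };
    const char* begin = s.data();
    const char* end = begin + s.size();
    return group(begin, end) == end;
}
```

Each recursive call pays for an indirect call through std::function, which is the kind of cost the reply above refers to when comparing polymorphic rules with auto rules.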
...
* It does not have reusable grammars
There aren't reusable grammars, but nothing prevents you from creating
reusable parsers. I have a few.
...
* No symbol tables
* No character sets
So far there hasn't been a need for that.
...
* No separation of grammar construction and parsing. Your examples have
 a big overhead: they build the parser every time you parse.
I don't think there is a big overhead, not in real-world applications. This
design, in its pre-C++11 incarnation, has been used for the last 7 years in
several binary and text parsers. Based on that, I know its raw performance was
not a factor. In the existing parsers I've seen, disk access and filling the
data structures were the major factors. But, of course, as I mentioned, the
experience is limited.
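
The two patterns under discussion, building the parser on every parse versus building it once and reusing it, can be sketched in plain C++11 as follows. This is not AXE or Spirit code; `keyword_parser`, `parse_rebuilding` and `parse_reusing` are hypothetical names. How much the difference matters depends on how expensive construction is relative to the parse itself, which is exactly the point in dispute above.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parser object whose construction does some work up front
// (sorting its keyword table). Not taken from AXE or Spirit.
class keyword_parser {
public:
    explicit keyword_parser(std::vector<std::string> keywords)
        : keywords_(std::move(keywords)) {
        std::sort(keywords_.begin(), keywords_.end());
    }
    // Parsing only reads the table, so one instance can be reused.
    bool matches(const std::string& token) const {
        return std::binary_search(keywords_.begin(), keywords_.end(), token);
    }
private:
    std::vector<std::string> keywords_;
};

// Pattern A: the parser is rebuilt on every call.
inline bool parse_rebuilding(const std::string& token) {
    keyword_parser p(std::vector<std::string>{"template", "class", "auto"});
    return p.matches(token);
}

// Pattern B: the parser is built once and reused by every later call.
inline bool parse_reusing(const std::string& token) {
    static const keyword_parser p(
        std::vector<std::string>{"template", "class", "auto"});
    return p.matches(token);
}
```

Both functions give the same answers; they differ only in when the construction cost is paid.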
...
* The syntax is *more* cryptic than Spirit (this is, of course, subjective :-)
Just to name a few.
Regards,
-- 
Joel de Guzman
http://www.boostpro.com
http://boost-spirit.com
Thanks for taking a bite.

Gene Bushuyev