Determining interest: C++11 parser generator library

Gene Bushuyev

13 Nov 2011

11:39 p.m.

Sorry if this turns out to be a duplicate; it looks like my original post was lost in cyberspace, so I'm re-posting this request. I'm trying to determine whether there is sufficient interest in including AXE, a C++11 recursive descent parser generator library, in Boost. The zipped sources and documentation are here: http://www.gbresearch.com/axe/axe.zip

- Gene Bushuyev

John Bytheway

14 Nov

10:03 a.m.

On 13/11/11 23:39, Gene Bushuyev wrote:

...

Sorry if this turns out to be a duplicate; it looks like my original post was lost in cyberspace, so I'm re-posting this request.

I'm trying to determine whether there is sufficient interest in including AXE, a C++11 recursive descent parser generator library, in Boost. The zipped sources and documentation are here: http://www.gbresearch.com/axe/axe.zip

People are more likely to investigate if you can provide a link to the documentation online somewhere, so they don't have to download and extract a zip file. It would also be useful to explain briefly how it compares with Boost.Spirit.

John Bytheway

Gene Bushuyev

5:08 p.m.

"John Bytheway" <jbytheway+boost@gmail.com> wrote in message news:j9qp21$f09$1@dough.gmane.org...

People are more likely to investigate if you can provide a link to the documentation online somewhere, so they don't have to download and extract a zip file.

It would also be useful to explain briefly how it compares with Boost.Spirit.

John Bytheway

It's true there is a significant overlap with Spirit. It's also true there is more than one way to do the parsing, so some people will be more comfortable with Spirit, and I have reasons to believe some people will be more comfortable with AXE. There are differences, the importance of which depends on personal perspective and needs. I tried to summarize below what I would consider advantages of AXE:

* it's a much smaller header-only library: 15 files, 126 KB total
* it has no dependencies on other libraries apart from the Standard library
* it uses only standard facilities, so theoretically it should work with any C++11 compiler without any modifications
* compilation times are much shorter than Spirit
* the syntax is less cryptic than Spirit, so it's easier to remember, write, debug, and read parsers written in AXE (this is, of course, subjective)
* in my limited comparison, parsers written in AXE take fewer lines of code to write, and development times are shorter

Disadvantages:

* AXE requires a C++11 compiler; the current status of compiler support is unknown
* It's been released recently, thus there is limited experience working with it

The link to the reference is here: www.gbresearch.com/axe/reference.pdf

-- Gene Bushuyev

Joel de Guzman

15 Nov

12:17 a.m.

On 11/15/2011 1:08 AM, Gene Bushuyev wrote:

It's true there is a significant overlap with Spirit. It's also true there is more than one way to do the parsing, so some people will be more comfortable with Spirit, and I have reasons to believe some people will be more comfortable with AXE. There are differences, the importance of which depends on personal perspective and needs. I tried to summarize below what I would consider advantages of AXE:

* it's a much smaller header-only library: 15 files, 126 KB total
* it has no dependencies on other libraries apart from the Standard library
* it uses only standard facilities, so theoretically it should work with any C++11 compiler without any modifications
* compilation times are much shorter than Spirit
* the syntax is less cryptic than Spirit, so it's easier to remember, write, debug, and read parsers written in AXE (this is, of course, subjective)
* in my limited comparison, parsers written in AXE take fewer lines of code to write, and development times are shorter

Disadvantages:

* AXE requires a C++11 compiler; the current status of compiler support is unknown
* It's been released recently, thus there is limited experience working with it

When Spirit debuted, it consisted of 7 header files. If your library gets more mature, the added complexity will be necessary. Your main advantage is simplicity. I can't argue with that. However, it is also your big disadvantage. Here are some more important points you missed in your Disadvantages section:

* It does not have Unicode support.
* It does not have attributes and AST support. It is a purely transduction parser like Spirit 1.0. So in every step, you have to convert an iterator range to an attribute manually.
* It does not have support for polymorphic semantic actions (you know that C++ lambda is monomorphic, right?).
* It does not have reusable grammars
* No symbol tables
* No character sets
* No separation of grammar construction and parsing. Your examples have a big overhead: they build the parser every time you parse.
* The syntax is *more* cryptic than Spirit (this is, of course, subjective :-)

Just to name a few.

Regards,
-- Joel de Guzman
http://www.boostpro.com
http://boost-spirit.com
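The transduction point can be illustrated without either library. The following is a minimal sketch in plain standard C++, not AXE or Spirit code; `match_digits` and `parse_int` are hypothetical names chosen for illustration. The matcher only reports where the match ends, so the caller has to turn the iterator range into a value by hand:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical matcher: consumes a run of decimal digits and returns the
// iterator just past the match. It yields no value, only a range.
template <typename It>
It match_digits(It first, It last) {
    while (first != last && std::isdigit(static_cast<unsigned char>(*first)))
        ++first;
    return first;
}

// The caller converts the matched range [first, end) to an int manually.
// std::stoi throws std::out_of_range if the digits do not fit in an int.
bool parse_int(const std::string& input, int& out) {
    const auto first = input.begin();
    const auto end = match_digits(first, input.end());
    if (end == first) return false;            // no digits matched
    out = std::stoi(std::string(first, end));  // manual range-to-attribute step
    return true;
}

// Small usage check.
void demo_parse_int() {
    int value = 0;
    const bool ok = parse_int("123abc", value);
    assert(ok && value == 123);
    assert(!parse_int("abc", value));
}
```

An attribute-based design would instead have the rule itself hand back the `int`, which is the convenience being described as missing here.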

Gene Bushuyev

3:23 a.m.

"Joel de Guzman" <joel@boost-consulting.com> wrote in message news:j9sb3i$c7g$1@dough.gmane.org...

When Spirit debuted, it consisted of 7 header files. If your library gets more mature, the added complexity will be necessary. Your main advantage is simplicity. I can't argue with that. However, it is also your big disadvantage. Here are some more important points you missed in your Disadvantages section:

* It does not have Unicode support.

It does have wide character support, or better to say, you can instantiate rules on any character type. You can mix narrow and wide characters, and you can also mix binary and text parsing. Many parsers would work with Unicode files without any modification.

...

* It does not have attributes and AST support. It is a purely transduction parser like Spirit 1.0. So in every step, you have to convert an iterator range to an attribute manually.

I was thinking about adding AST, but so far there wasn't any need that would justify the additional complexity. My previous experience with various parsers creating ASTs and then traversing them was rather negative both in terms of performance and complexity. Maybe it will change in the future.

...

* It does not have support for polymorphic semantic actions (you know that C++ lambda is monomorphic, right?).

There is a polymorphic class r_rule, which uses std::function. It's primarily used for expressing recursion. It can be used on its own, of course, but unlike auto rules it would introduce an often unnecessary performance hit. But if one wants to return a parser from a function or keep a rule as a class member, then the polymorphic rule will do just fine.
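To make the recursion argument concrete, here is a minimal sketch of a rule that refers to itself through `std::function`. It is an illustration in plain standard C++ and does not reproduce AXE's actual `r_rule` interface; `Rule`, `ParenGrammar`, `group`, and `matches` are hypothetical names. The grammar is `group := '(' group* ')'`:

```cpp
#include <cassert>
#include <functional>
#include <string>

using Iter = std::string::const_iterator;

// Type-erased rule: on success advances pos past the match and returns true;
// on failure leaves pos untouched and returns false.
using Rule = std::function<bool(Iter&, Iter)>;

struct ParenGrammar {
    Rule group;  // needs a nameable type so the rule can call itself

    ParenGrammar() {
        group = [this](Iter& pos, Iter last) -> bool {
            Iter p = pos;
            if (p == last || *p != '(') return false;
            ++p;
            while (group(p, last)) {}  // zero or more nested groups
            if (p == last || *p != ')') return false;
            pos = ++p;
            return true;
        };
    }

    // The lambda captures this, so copying would leave a dangling pointer.
    ParenGrammar(const ParenGrammar&) = delete;
    ParenGrammar& operator=(const ParenGrammar&) = delete;

    bool matches(const std::string& s) const {
        Iter pos = s.begin();
        return group(pos, s.end()) && pos == s.end();
    }
};

// Small usage check.
void demo_recursion() {
    ParenGrammar g;
    assert(g.matches("(()())"));
    assert(!g.matches("(()"));
}
```

Every call to `group` goes through the `std::function` indirection, which is the kind of performance cost mentioned above for rules that are not declared with `auto`.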

...

* It does not have reusable grammars

There aren't reusable grammars, but nothing prevents one from creating reusable parsers. I have a few.

...

* No symbol tables
* No character sets

So far there wasn't a need for that.

...

* No separation of grammar construction and parsing. Your examples have a big overhead: they build the parser every time you parse.

I don't think there is a big overhead. Not in real world applications. This design in its pre-C++11 incarnation was used for the last 7 years in several binary and text parsers. Based on that I know its raw performance was not a factor. In the existing parsers I've seen, disk access and filling the data structures were the major factors. But, of course, as I mentioned, the experience is limited.

...

* The syntax is *more* cryptic than Spirit (this is, of course, subjective :-)

Just to name a few.

Regards,
-- Joel de Guzman
http://www.boostpro.com
http://boost-spirit.com

Thanks for taking a bite. Gene Bushuyev

Joel de Guzman

5:52 a.m.

On 11/15/2011 11:23 AM, Gene Bushuyev wrote:

...

...
When Spirit debuted, it consisted of 7 header files. If your library gets more mature, the added complexity will be necessary. Your main advantage is simplicity. I can't argue with that. However, it is also your big disadvantage. Here are some more important points you missed in your Disadvantages section:

* It does not have Unicode support.

It does have wide character support, or better to say, you can instantiate rules on any character type. You can mix narrow and wide characters, and you can also mix binary and text parsing. Many parsers would work with Unicode files without any modification.

Unicode support is a lot more than that. See http://unicode.org/reports/tr18/tr18-5.1.html. You do not even have level-1 support.
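One concrete way to see the gap between "works on any character type" and Unicode support is to look at what a generic single-character rule consumes. The sketch below is plain standard C++, not AXE or Spirit code; `match_any` and `left_after_any` are hypothetical names. The rule consumes one element of the sequence, which for UTF-8 input is one byte rather than one code point:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical "any single character" rule: consumes one element of the
// sequence, whatever the element type is.
template <typename It>
It match_any(It first, It last) {
    return first == last ? first : std::next(first);
}

// Number of elements left unconsumed after matching one "character".
template <typename String>
std::size_t left_after_any(const String& s) {
    return static_cast<std::size_t>(
        std::distance(match_any(s.begin(), s.end()), s.end()));
}

// Small usage check.
void demo_code_units() {
    const std::string utf8 = "\xC3\xA9";     // U+00E9 encoded as two UTF-8 bytes
    const std::u32string utf32 = U"\u00E9";  // U+00E9 as one UTF-32 code unit
    assert(left_after_any(utf8) == 1);   // the rule stopped in mid code point
    assert(left_after_any(utf32) == 0);  // the whole code point was consumed
}
```

Instantiating on a wider character type fixes this particular case, but the requirements in the report linked above go well beyond code-unit width.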

...

...
* It does not have attributes and AST support. It is a purely transduction parser like Spirit 1.0. So in every step, you have to convert an iterator range to an attribute manually.

I was thinking about adding AST, but so far there wasn't any need that would justify the additional complexity. My previous experience with various parsers creating ASTs and then traversing them was rather negative both in terms of performance and complexity. Maybe it will change in the future.

Seems you haven't done much parsing ;-) When you get into *real* attribute grammars, then your simplicity will no longer be an advantage (http://www.haskell.org/haskellwiki/Attribute_grammar).

...

...
* It does not have support for polymorphic semantic actions (you know that C++ lambda is monomorphic, right?).

There is a polymorphic class r_rule, which uses std::function. It's primarily used for expressing recursion. It can be used on its own, of course, but unlike auto rules it would introduce an often unnecessary performance hit. But if one wants to return a parser from a function or keep a rule as a class member, then the polymorphic rule will do just fine.

Nope. That's not what I meant. Anyway, rules cannot ever be polymorphic because of type erasure, regardless of whether it's C++11.

...

...
* It does not have reusable grammars

There aren't reusable grammars, but nothing prevents one from creating reusable parsers. I have a few.

...
* No symbol tables
* No character sets

So far there wasn't a need for that.

Which makes it very limited in my view.

...

...
* No separation of grammar construction and parsing. Your examples have a big overhead: they build the parser every time you parse.

I don't think there is a big overhead. Not in real world applications. This design in its pre-C++11 incarnation was used for the last 7 years in several binary and text parsers. Based on that I know its raw performance was not a factor. In the existing parsers I've seen, disk access and filling the data structures were the major factors. But, of course, as I mentioned, the experience is limited.

It's OK for small micro parsers. Wait till you go beyond "small". Even your JSON parser will never be optimal because of this. I'm not sure what you mean by real world applications.

Regards,
-- Joel de Guzman
http://www.boostpro.com
http://boost-spirit.com
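The construction-versus-parsing overhead under discussion can be sketched with a stand-in grammar. The code below is plain standard C++ and assumes nothing about AXE or Spirit; `KeywordGrammar`, `count_rebuilding`, and `count_reusing` are hypothetical names, and the `build_count` parameter exists only to make the number of constructions visible:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical grammar whose construction does real work (filling a set).
// build_count records how many grammar objects were constructed.
struct KeywordGrammar {
    std::set<std::string> keywords;
    explicit KeywordGrammar(int& build_count)
        : keywords{"if", "else", "while", "return"} { ++build_count; }
    bool parse(const std::string& token) const {
        return keywords.count(token) != 0;
    }
};

// Builds the grammar inside the loop: one construction per input.
std::size_t count_rebuilding(const std::vector<std::string>& inputs,
                             int& build_count) {
    std::size_t matched = 0;
    for (const auto& token : inputs) {
        const KeywordGrammar grammar(build_count);
        if (grammar.parse(token)) ++matched;
    }
    return matched;
}

// Builds the grammar once and reuses it for every input.
std::size_t count_reusing(const std::vector<std::string>& inputs,
                          int& build_count) {
    const KeywordGrammar grammar(build_count);
    std::size_t matched = 0;
    for (const auto& token : inputs)
        if (grammar.parse(token)) ++matched;
    return matched;
}

// Small usage check.
void demo_construction() {
    const std::vector<std::string> inputs{"if", "x", "return"};
    int rebuilt = 0, reused = 0;
    const std::size_t a = count_rebuilding(inputs, rebuilt);
    const std::size_t b = count_reusing(inputs, reused);
    assert(a == 2 && rebuilt == 3);
    assert(b == 2 && reused == 1);
}
```

Both functions return the same result; whether the extra constructions matter in practice depends on how costly the grammar is to build relative to the parsing work, which is exactly what the two posters disagree about.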

Mathias Gaunard

1:02 p.m.

On 15/11/2011 04:23, Gene Bushuyev wrote:

...

There is a polymorphic class r_rule, which uses std::function.

std::function is monomorphic. That's not the kind of polymorphism Joel was talking about; he meant parametric polymorphism, not type erasure.
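The distinction can be shown in a few lines of standard C++. This is a minimal sketch with hypothetical names (`SizeOfArg`, `IntAction`), not code from either library. A function object with a templated `operator()` adapts to whatever argument type it receives, while a `std::function` fixes the argument type in its signature:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Parametrically polymorphic action: operator() is a template, so the same
// object can be called with whatever attribute type a parser produces.
struct SizeOfArg {
    template <typename T>
    std::size_t operator()(const T&) const { return sizeof(T); }
};

// Type-erased wrapper: the argument type is fixed to int by the signature.
using IntAction = std::function<std::size_t(int)>;

// Small usage check.
void demo_polymorphism() {
    const SizeOfArg poly;
    const IntAction erased = poly;
    assert(poly('a') == sizeof(char));    // T deduced as char
    assert(poly(1.5) == sizeof(double));  // T deduced as double
    assert(erased('a') == sizeof(int));   // 'a' is converted to int first
}
```

Once the object is wrapped in `IntAction`, the original argument type is lost before the call reaches the template, which is why type erasure cannot stand in for parametric polymorphism.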

Mathias Gaunard

14 Nov

1:15 p.m.

On 14/11/2011 00:39, Gene Bushuyev wrote:

...

Sorry if this turns out to be a duplicate; it looks like my original post was lost in cyberspace, so I'm re-posting this request.

I'm trying to determine whether there is sufficient interest in including AXE, a C++11 recursive descent parser generator library, in Boost. The zipped sources and documentation are here: http://www.gbresearch.com/axe/axe.zip

From a quick glance, it doesn't seem significantly different from Spirit, except it uses slightly different syntax. I don't think it would be a good idea to have two libraries that do exactly the same thing, unless one is clearly recommended as a replacement for the other.
