compile time parser generator

Hello,
with c++11's constexpr it got possible to parse strings in c++ at compile time. I've written a parser generator to create an AST for a given EBNF syntax at compile time, which can be traversed at both run time and compile time (actually, parsing can take place at run time, too, but it's probably rather slow). At the moment, the only compiler I know of that implements enough of c++11 features is gcc (version >= 4.6). Is refining my implementation worth the effort, has such a library a chance to make it into boost? Is maybe somebody already working on this?
Sincerely,
Martin Bidlingmaier

Hi Martin,
with c++11's constexpr it got possible to parse strings in c++ at compile time. I've written a parser generator to create an AST for a given EBNF syntax at compile time, which can be traversed at both run time and compile time (actually, parsing can take place at run time, too, but it's probably rather slow). At the moment, the only compiler I know of that implements enough of c++11 features is gcc (version>= 4.6). Is refining my implementation worth the effort, has such a library a chance to make it into boost? Is maybe somebody already working on this?
We've been working on an open source library for building parsers that parse at compile time. You can find its documentation here: http://abel.web.elte.hu/mpllibs/metaparse/index.html, the source code here: https://github.com/sabel83/mpllibs, and example programs here: https://github.com/sabel83/mpllibs/tree/master/libs/metaparse/example. There is a type-safe printf based on it (it parses the printf format string at compile time), which you can also take a look at as an example: http://abel.web.elte.hu/mpllibs/safe_printf/index.html.
The library uses parser combinators. It doesn't build an AST unless you, as the user of the library, do it yourself, because building one would make compilation much slower. Since no AST is built, turning an EBNF into a parser is a manual task, but it is not complicated and the resulting code reflects the EBNF. You can find an example of that in the following paper:
Zoltán Porkoláb, Ábel Sinkovics: Domain-specific Language Integration with Compile-time Parser Generator Library. In Eelco Visser, Jaakko Järvi, editors, Proceedings of the ninth international conference on Generative programming and component engineering (GPCE 2010). ACM, October 2010, pp. 137-146.
The library provides a number of combinators already implemented and tested, see http://abel.web.elte.hu/mpllibs/metaparse/reference.html.
The library is mostly based on the old standard, with the exception of defining the text to parse, which needs to be a compile-time character sequence. As described in the paper mentioned above, it should be doable using user-defined literals (and a few other C++11 features); however, we haven't been able to try it out on any compiler so far. How does your solution do it?
Regards,
Ábel

Martin Bidlingmaier wrote:
Hello,
with c++11's constexpr it got possible to parse strings in c++ at compile time. I've written a parser generator to create an AST for a given EBNF syntax at compile time, which can be traversed at both run time and compile time (actually, parsing can take place at run time, too, but it's probably rather slow). At the moment, the only compiler I know of that implements enough of c++11 features is gcc (version >= 4.6). Is refining my implementation worth the effort, has such a library a chance to make it into boost? Is maybe somebody already working on this?
Sincerely,
Martin Bidlingmaier
No realistic discussion of such a proposal can be undertaken without reference to, or comparison with, boost.spirit, which has been in use for 10 years and has been continually updated, enhanced and maintained over that period. You should start out by taking a careful look at spirit and contrast your proposal with this mature library used for the same purpose.
Robert Ramey

No realistic discussion of such a proposal can be undertaken without reference to, or comparison with, boost.spirit, which has been in use for 10 years and has been continually updated, enhanced and maintained over that period. You should start out by taking a careful look at spirit and contrast your proposal with this mature library used for the same purpose.
Robert Ramey
I know boost.spirit, but it doesn't solve the same problem: boost.spirit creates a parser for parsing at run time, whereas my 'library' creates a parser that parses string literals at compile time (it could parse a string at run time as well, but it's not meant for that). For example, this is an excerpt from main.cpp:
    constexpr const_string< char > cstr = "2 + 54 * 2 + 83";
    constexpr addition a( cstr ); // cstr is processed at compile time
    static_assert( a.eval() == 193, "" );
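To make the idea concrete for readers without access to Martin's main.cpp, here is a minimal sketch of evaluating the same expression during compilation. It is not his library: `Cursor`, `parse_sum`, `parse_product`, `parse_number` and `eval` are hypothetical names, and the sketch assumes C++14's relaxed constexpr rules (loops and local mutation), whereas the code discussed in this thread targets C++11 with gcc 4.6. It also does no error handling.

```cpp
#include <cstddef>

// Hypothetical helper type; not taken from the library discussed above.
struct Cursor {
    const char* s;   // text being parsed
    std::size_t i;   // current position in the text
};

constexpr void skip_spaces(Cursor& c) {
    while (c.s[c.i] == ' ') ++c.i;
}

// number := digit+
constexpr int parse_number(Cursor& c) {
    skip_spaces(c);
    int value = 0;
    while (c.s[c.i] >= '0' && c.s[c.i] <= '9') {
        value = value * 10 + (c.s[c.i] - '0');
        ++c.i;
    }
    return value;
}

// product := number ('*' number)*
constexpr int parse_product(Cursor& c) {
    int value = parse_number(c);
    skip_spaces(c);
    while (c.s[c.i] == '*') {
        ++c.i;
        value *= parse_number(c);
        skip_spaces(c);
    }
    return value;
}

// sum := product ('+' product)*
constexpr int parse_sum(Cursor& c) {
    int value = parse_product(c);
    skip_spaces(c);
    while (c.s[c.i] == '+') {
        ++c.i;
        value += parse_product(c);
        skip_spaces(c);
    }
    return value;
}

constexpr int eval(const char* text) {
    Cursor c{text, 0};
    return parse_sum(c);
}

// The whole parse happens inside the compiler.
static_assert(eval("2 + 54 * 2 + 83") == 193, "evaluated at compile time");
```

The same functions can also be called with a run-time string, which matches the remark that parsing can take place at run time as well.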

on Fri Jan 06 2012, "Martin Bidlingmaier" <Martin.Bidlingmaier-AT-gmx.de> wrote:
Hello,
with c++11's constexpr it got possible to parse strings in c++ at compile time. I've written a parser generator to create an AST for a given EBNF syntax at compile time, which can be traversed at both run time and compile time (actually, parsing can take place at run time, too, but it's probably rather slow). At the moment, the only compiler I know of that implements enough of c++11 features is gcc (version >= 4.6). Is refining my implementation worth the effort, has such a library a chance to make it into boost?
It was only a matter of time before someone proposed this. Of course it has a chance to make it into boost. I guess the first question will be, "can you describe some plausible use-cases?"
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

From: Dave Abrahams
on Fri Jan 06 2012, "Martin Bidlingmaier" <Martin.Bidlingmaier-AT-gmx.de> wrote:
with c++11's constexpr it got possible to parse strings in c++ at compile time. I've written a parser generator to create an AST for a given EBNF syntax at compile time, which can be traversed at both run time and compile time (actually, parsing can take place at run time, too, but it's probably rather slow). At the moment, the only compiler I know of that implements enough of c++11 features is gcc (version >= 4.6). Is refining my implementation worth the effort, has such a library a chance to make it into boost?
It was only a matter of time before someone proposed this. Of course it has a chance to make it into boost. I guess the first question will be, "can you describe some plausible use-cases?"
Regular expressions.
If there were an interpreted language (perhaps domain specific) with a simple grammar then we could inline string constants with code into C++ code that gets parsed at compile time to generate executable code. So simple things like sed and awk scripts could also be inlined into C++ as string literals and translated to executable code at compile time instead of runtime if you combine the interpreter library with the compile time parsing library. The benefit over run time interpretation of the string is debatable. It is only a performance optimization. I would say regular expression is the thing that is crying out for it. The real question is what kind of syntax errors will it generate when parsing fails? It would be nice to catch badly formed regular expressions at compile time, for example, though not so nice if the errors are not so nice.
What this allows us to do is extend the language in the form of libraries (in a hackish sort of way) which is the desire of modern language theorists from what I've been told by my language theorist friends. I'd say it's almost as exciting as overloading the comma operator.
Regards,
Luke
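As a rough illustration of catching a badly formed pattern at compile time, the sketch below checks only that parentheses balance and that the pattern does not end in a lone backslash. `well_formed` is a hypothetical function, not part of any regex library, and it assumes C++14 constexpr. It ignores character classes such as `[(]`, so it is far from a full regex grammar.

```cpp
#include <cstddef>

// Hypothetical checker: balanced '(' / ')' and no dangling escape.
constexpr bool well_formed(const char* p) {
    int depth = 0;
    for (std::size_t i = 0; p[i] != '\0'; ++i) {
        if (p[i] == '\\') {
            if (p[i + 1] == '\0') return false;  // pattern ends in a lone backslash
            ++i;                                 // skip the escaped character
        } else if (p[i] == '(') {
            ++depth;
        } else if (p[i] == ')') {
            if (depth == 0) return false;        // ')' without a matching '('
            --depth;
        }
    }
    return depth == 0;                           // every '(' must be closed
}

static_assert(well_formed("(a|b)*c"), "pattern is accepted");
static_assert(!well_formed("(a|b"), "unbalanced pattern is rejected");
```

With this approach the quality of the diagnostic is limited to whatever message is attached to the failing static_assert, which is the concern raised about error messages.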

on Sat Jan 07 2012, "Simonson, Lucanus J" <lucanus.j.simonson-AT-intel.com> wrote:
From: Dave Abrahams
on Fri Jan 06 2012, "Martin Bidlingmaier" <Martin.Bidlingmaier-AT-gmx.de> wrote:
with c++11's constexpr it got possible to parse strings in c++ at compile time. I've written a parser generator to create an AST for a given EBNF syntax at compile time, which can be traversed at both run time and compile time (actually, parsing can take place at run time, too, but it's probably rather slow). At the moment, the only compiler I know of that implements enough of c++11 features is gcc (version >= 4.6). Is refining my implementation worth the effort, has such a library a chance to make it into boost?
It was only a matter of time before someone proposed this. Of course it has a chance to make it into boost. I guess the first question will be, "can you describe some plausible use-cases?"
Regular expressions.
Yeah, sort of like xpressive.
If there were an interpreted language (perhaps domain specific) with a simple grammar then we could inline string constants with code into C++ code that gets parsed at compile time to generate executable code. So simple things like sed and awk scripts could also be inlined into C++ as string literals and translated to executable code at compile time instead of runtime if you combine the interpreter library with the compile time parsing library. The benefit over run time interpretation of the string is debatable. It is only a performance optimization. I would say regular expression is the thing that is crying out for it. The real question is what kind of syntax errors will it generate when parsing fails? It would be nice to catch badly formed regular expressions at compile time, for example, though not so nice if the errors are not so nice. What this allows us to do is extend the language in the form of libraries (in a hackish sort of way) which is the desire of modern language theorists from what I've been told by my language theorist friends. I'd say it's almost as exciting as overloading the comma operator.
Haha. Actually, you gain expressivity but lose a lot of power when you do metaprogramming this way, as compared with TMP. With TMP you are working with first-class expressions so you can insert references to variables and other C++ entities right into your metaprogram. Also, different DSLs can be combined (e.g. as in Spirit and Phoenix) to do a more complicated job. That's really not so easy when you are working with strings.
The desire of "modern language theorists" is to enable language extension without losing interoperability with the rest of the language, and I think what you can do with constexpr, while very cool, still falls far short of that.
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

On 1/7/2012 7:06 PM, Dave Abrahams wrote:
The desire of "modern language theorists" is to enable language extension without losing interoperability with the rest of the language, and I think what you can do with constexpr, while very cool, still falls far short of that.
While I agree with all of this, I still think there is a place for a constexpr string-based regex library. One of the shortcomings of xpressive is the need to learn a new syntax to use it.
I anticipate that compile time will be a big problem with string-based constexpr metaprogramming. For a string of length N, you'll need to instantiate /at least/ O(N) templates just to get the parse tree. And that's only the first step. With expression templates, you get the parse tree for free, and even so compile times have been a bottleneck.
--
Eric Niebler
BoostPro Computing
http://www.boostpro.com
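The instantiation cost mentioned here can be seen in a small sketch. Assume the string has already been turned into a pack of characters (`char_seq` is a hypothetical holder type, and `count_digits` is a stand-in for a real parsing step). Walking the pack recursively creates one class template specialization per character, plus one for the empty tail.

```cpp
#include <cstddef>

// Hypothetical compile-time string: one template argument per character.
template <char... Cs>
struct char_seq {};

// Stand-in for a parsing step: counts the decimal digits in the sequence.
template <class Seq>
struct count_digits;

// Base case: the empty sequence.
template <>
struct count_digits<char_seq<>> {
    static constexpr std::size_t value = 0;
};

// Recursive case: peel off one character, then instantiate the
// specialization for the remaining characters.
template <char C, char... Rest>
struct count_digits<char_seq<C, Rest...>> {
    static constexpr std::size_t value =
        (C >= '0' && C <= '9' ? 1 : 0) + count_digits<char_seq<Rest...>>::value;
};

// Four characters: five specializations of count_digits are instantiated.
static_assert(count_digits<char_seq<'1', '2', '+', '3'>>::value == 3, "");
```

A real grammar would instantiate more than one template per character, which is why N is described above as a lower bound.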

The Dave wrote:
Actually, you gain expressivity but lose a lot of power when you do metaprogramming this way, as compared with TMP. With TMP you are working with first-class expressions so you can insert references to variables and other C++ entities right into your metaprogram. Also, different DSLs can be combined (e.g. as in Spirit and Phoenix) to do a more complicated job. That's really not so easy when you are working with strings.
Yes, I had already considered this. You would need to register C++ variables with the interpreter module to make them available to the interpreted code. It would be a pain in the neck, but could be encapsulated in a wrapper function. If the interpreter returns a function that is passable it becomes just as composable as any other function. That said, I don't think it's even remotely practical. The only reason to do it is because it is cool.
The desire of "modern language theorists" is to enable language extension without losing interoperability with the rest of the language, and I think what you can do with constexpr, while very cool, still falls far short of that.
Yes, well, I guess I got a little carried away. This constexpr-based idea for a limited and painfully difficult to achieve mixed language application (which we have plenty of practical examples of) is indeed a far cry from implementing compiler optimizations as a library.
Regards,
Luke

Hi,
Regular expressions.
If there were an interpreted language (perhaps domain specific) with a simple grammar then we could inline string constants with code into C++ code that gets parsed at compile time to generate executable code. So simple things like sed and awk scripts could also be inlined into C++ as string literals and translated to executable code at compile time instead of runtime if you combine the interpreter library with the compile time parsing library. The benefit over run time interpretation of the string is debatable. It is only a performance optimization. I would say regular expression is the thing that is crying out for it. The real question is what kind of syntax errors will it generate when parsing fails? It would be nice to catch badly formed regular expressions at compile time, for example, though not so nice if the errors are not so nice. What this allows us to do is extend the language in the form of libraries (in a hackish sort of way) which is the desire of modern language theorists from what I've been told by my language theorist friends. I'd say it's almost as exciting as overloading the comma operator.
If you check the examples for our compile-time parsing library, there is a (still incomplete) wrapper around Xpressive: you give it a compile-time string, it parses and checks it, and builds an sregex object for you. Link to the example: https://github.com/sabel83/mpllibs/tree/master/libs/metaparse/example/regexp
Our library deals with human-readable error messages as well, but the error message is returned as a template metaprogramming structure which can be pretty-printed. So the approach for getting a human-readable error message is the following:
- You make a mistake in your DSL
- You get an error from the C++ compiler
- You instantiate a special template, debug_parsing_error, in a separate program
- You compile and run it
- The error message is printed to stdout
You can find an example for that (not for regular expressions but for another parser) here: https://github.com/sabel83/mpllibs/tree/master/libs/metaparse/example/parsin...
The output of the example is the following:
    Compile-time parsing results
    ----------------------------
    Input text: aaac
    Parsing failed: Error at source_position<int_<1>, int_<4>, char_<'a'>>: 'b' literal expected.
Regards,
Ábel
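The kind of information carried by such an error (a position plus the literal that was expected) can be sketched with plain constexpr code. This is not metaparse's API and does not use debug_parsing_error: `ParseResult` and `parse_as_then_b` are hypothetical names, the grammar (any number of 'a' followed by 'b') is assumed only so that the input "aaac" fails at column 4, and C++14 constexpr is assumed.

```cpp
// Hypothetical result type: success flag plus failure position.
struct ParseResult {
    bool ok;
    int line;       // 1-based line of the failure (0 when ok)
    int col;        // 1-based column of the failure (0 when ok)
    char expected;  // literal the parser wanted at that position
};

// Grammar assumed for illustration: 'a'* 'b'
constexpr ParseResult parse_as_then_b(const char* s) {
    int col = 1;
    while (*s == 'a') {   // consume the leading run of 'a'
        ++s;
        ++col;
    }
    if (*s == 'b') return ParseResult{true, 0, 0, '\0'};
    return ParseResult{false, 1, col, 'b'};   // report where 'b' was expected
}

constexpr ParseResult r = parse_as_then_b("aaac");
static_assert(!r.ok && r.line == 1 && r.col == 4 && r.expected == 'b',
              "'b' literal expected at line 1, column 4");
```

Because the result is an ordinary value, a separate program could print it at run time, which is similar in spirit to the compile-and-run step in the list above.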

Hi Martin,
2012/1/7 Martin Bidlingmaier <Martin.Bidlingmaier@gmx.de>:
with c++11's constexpr it got possible to parse strings in c++ at compile time. I've written a parser generator to create an AST for a given EBNF syntax at compile time, which can be traversed at both run time and compile time (actually, parsing can take place at run time, too, but it's probably rather slow). At the moment, the only compiler I know of that implements enough of c++11 features is gcc (version >= 4.6). Is refining my implementation worth the effort, has such a library a chance to make it into boost? Is maybe somebody already working on this?
I am also working on a library for constexpr-based parsing (parser combinators). It's undocumented, but you can browse the source here: https://github.com/bolero-MURAKAMI/Sprout/tree/master/sprout/weed
Example (UUID string parsing): https://gist.github.com/1578326
It currently does not build an AST; it is designed to retrieve data from a string. The interface resembles Spirit.Qi and can be described as pseudo-EBNF.
If constexpr-based parser combinator libraries (in the style of Spirit.Qi) are to be added to Boost, I think more discussion is needed about constexpr-based design (the design of the string class, expression templates, etc.).
I'm not a native speaker, so I apologize if this is hard to read.
Regards,
Genya
participants (7)
- Dave Abrahams
- Eric Niebler
- Martin Bidlingmaier
- MURAKAMI Genya
- Robert Ramey
- Simonson, Lucanus J
- Ábel Sinkovics