What about a Spirit-powered C++ syntax analysis library in Boost?

Hi Boosters,
I've written a C++ syntax analysis library using Boost.Spirit. (This 'library' is actually a subset of the Scalpel library. I talked about it on the Boost mailing list here: http://article.gmane.org/gmane.comp.lib.boost.devel/208217 ) For the sake of brevity, let's call it Salsa (for Stand-ALone Syntax Analysis).
While most C++ compilers need semantic information to perform the syntax analysis, Salsa is a standalone syntax analyzer. Its Spirit grammar doesn't run any semantic actions. Consequently, you can use it to parse some C++ code without having to analyze a whole translation unit (i.e. without processing #include directives).
At this point, you may wonder how syntax ambiguities are managed. In most cases, one interpretation is more obvious than the other(s). In all cases, you may reasonably ask the programmer to disambiguate their code. Whatever the case, Salsa (predictably) chooses one of the interpretations. Here are some examples:
The following statement:
    a * b;
may be either a multiplication or a pointer declaration. The default interpretation is the pointer declaration. You can reasonably ask the programmer to disambiguate the code by adding parentheses if they want the syntax analyzer to interpret it as the former:
    (a * b);
Trickier. In the following declaration:
    bool bool_ = a< b || c> (d&& e);
the right-hand side expression may be either a boolean expression (where 'a', 'b', 'c', 'd' and 'e' are variables of type bool) or a function template call (whose name is 'a', which takes one bool template parameter, and where 'b', 'c', 'd' and 'e' are all variables of type const bool). The default interpretation is the boolean expression. Once again, you can reasonably ask the programmer to disambiguate the code by adding parentheses if they want Salsa to interpret it as the latter:
    bool bool_ = a< (b || c)> (d&& e);
(Actually, I wonder why the standard allows such ambiguities.)
Note: Salsa isn't finalized yet, but it successfully parses Apache's implementation of the C++ standard library. I'd like to know: is there a reasonable chance that such a library will be accepted into Boost?
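To make the first ambiguity concrete, here is a minimal sketch (not taken from Salsa or Scalpel) showing that the tokens "a * b;" are valid C++ under both readings, and that only the surrounding declarations tell them apart. The namespace and function names (expression_context, declaration_context, run, check_both_readings) are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <type_traits>

// Context 1: 'a' and 'b' name variables, so "a * b;" is an expression statement.
namespace expression_context {
    inline int run() {
        int a = 6, b = 7;
        a * b;              // multiplication; the product is discarded
        return a * b;       // same expression, this time its value is used
    }
}

// Context 2: 'a' names a type, so the very same tokens form a declaration.
namespace declaration_context {
    struct a {};
    inline bool run() {
        a * b;              // declares 'b' as an (uninitialized) pointer to 'a'
        // decltype does not read 'b', it only inspects its declared type.
        return std::is_same<decltype(b), a*>::value;
    }
}

// Hypothetical helper: checks that both readings behave as described.
inline void check_both_readings() {
    assert(expression_context::run() == 42);
    assert(declaration_context::run());
}
```

A compiler picks the reading by looking up what 'a' names; a standalone syntax analyzer that does no name lookup has to pick a default, which is the choice described above.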

On Wed, Sep 8, 2010 at 5:07 PM, Florian Goujeon <florian.goujeon@42ndart.org> wrote:
Hi,
> I've written a C++ syntax analysis library using Boost.Spirit. (This 'library' is actually a subset of the Scalpel library. I talked about it in the Boost mailing list here: http://article.gmane.org/gmane.comp.lib.boost.devel/208217 ) For the sake of brevity, let's call it Salsa (for Stand-ALone Syntax Analysis).
[snip]
I've been looking for a lightweight and portable solution for automatically registering things with a reflection library I'm working on (http://bit.ly/bn7iYM). I've already looked at tools like gcc, clang, doxygen + xslt, etc., but most of them have too many dependencies or are too heavyweight and clumsy for this task. I've only had time to briefly 'scroll through' the docs for Scalpel, but it seems like the thing I've been looking for, and I'm planning to explore the possibility of using it. But since the registering mostly concerns things like class declarations, which *usually* don't need preprocessing, the Salsa library might be even better.
> I'd like to know: is there a reasonable chance that such a library will be accepted into Boost?
In my opinion it would be a great addition to Boost and useful in many other situations besides the one I mentioned above.
BR
Matus

On Wed, Sep 8, 2010 at 8:07 AM, Florian Goujeon <florian.goujeon@42ndart.org> wrote:
> I've written a C++ syntax analysis library using Boost.Spirit. (This 'library' is actually a subset of the Scalpel library. I talked about it in the Boost mailing list here: http://article.gmane.org/gmane.comp.lib.boost.devel/208217 ) For the sake of brevity, let's call it Salsa (for Stand-ALone Syntax Analysis).
> While most C++ compilers need semantic information to perform the syntax analysis, Salsa is a standalone syntax analyzer. Its Spirit grammar doesn't run any semantic action. Consequently, you can use it to parse some C++ code without having to analyze a whole translation unit (i.e. without processing #include directives).
> At this point, you may wonder how syntax ambiguities are managed. In most cases, there's always an interpretation which is more obvious than the other one(s). In all cases, you may reasonably ask the programmer to disambiguate its code. Whatever the case, Salsa (predictably) chooses one of the interpretations. Here are some examples:
> The following statement:
>     a * b;
> may be either a multiplication or a pointer declaration. The default interpretation is the pointer declaration. You can reasonably ask the programmer to disambiguate the code by putting parenthesis if he wants the syntax analyzer to interpret it as the former:
>     (a * b);
I really didn't want to get into this, but you asked me to weigh in, so... You cannot reasonably ask the programmer to disambiguate the code for you, especially when existing tools handle the code just fine. "Change your code, then you can try out my tool" is the fastest way to kill off any chance of large-scale adoption.
> Trickier. In the following declaration:
>     bool bool_ = a< b || c> (d&& e);
> the right-hand side expression may be either a boolean expression (where 'a', 'b', 'c', 'd' and 'e' are variables of type bool) or a function template call (whose name is 'a', which takes one bool template parameter and where 'b', 'c', 'd' and 'e' are all variables of type const bool). The default interpretation is the boolean expression. Once again, you can reasonably ask the programmer to disambiguate the code by putting parenthesis if he wants Salsa to interpret it as the latter:
>     bool bool_ = a< (b || c)> (d&& e);
> (Actually, I wonder why the standard allows such ambiguities.)
That's not enough, actually: "a" may still be a class template or a function template. How will you handle that ambiguity? More importantly, do you believe that you can handle *every* ambiguity in the C++ language in this way, by asking the user to insert parentheses that no other tool requires?
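The point about 'a' can be illustrated with a small sketch. It is only an illustration under assumed declarations, not something from the thread or from Salsa: the namespace names and run() helpers are hypothetical. The parenthesized declaration compiles unchanged whether 'a' is a function template or a class template, and the unparenthesized form is still a plain boolean expression when 'a' is a variable.

```cpp
#include <cassert>

// Reading 1: all five names are bool variables.
namespace as_boolean_expression {
    inline bool run() {
        bool a = false, b = true, c = false, d = true, e = true;
        bool bool_ = a< b || c> (d&& e);     // parsed as (a < b) || (c > (d && e))
        return bool_;
    }
}

// Reading 2: 'a' is a function template taking one bool template parameter.
namespace as_function_template {
    template <bool B> bool a(bool x) { return B && x; }
    inline bool run() {
        const bool b = false, c = true, d = true, e = true;
        bool bool_ = a< (b || c)> (d&& e);   // calls a<true>(true)
        return bool_;
    }
}

// Reading 3: 'a' is a class template; the same tokens now build a temporary.
namespace as_class_template {
    template <bool B> struct a {
        bool value;
        explicit a(bool x) : value(B && x) {}
        operator bool() const { return value; }
    };
    inline bool run() {
        const bool b = false, c = true, d = true, e = true;
        bool bool_ = a< (b || c)> (d&& e);   // constructs a<true>(true), then converts to bool
        return bool_;
    }
}
```

In readings 2 and 3 the statement text is identical, so parentheses alone cannot tell a syntax-only tool whether it is looking at a function call or an object construction.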
> I'd like to know: is there a reasonable chance that such a library will be accepted into Boost?
That's decided by the Boost community, but if I were to review a library that professes to parse C++ while actually parsing an arbitrarily-disambiguated subset of the C++ language, or that cannot parse Boost itself, I would vote against acceptance.
- Doug
Participants (3): Doug Gregor, Florian Goujeon, Matus Chochlik