Scalpel: a Spirit&Wave-powered C++ source code analysis library

Dear Boosters,

I'd like to introduce to you the Scalpel library, a project on which I've worked during the last two years.

Scalpel is a C++ library. Its name stands for source code analysis, libre and portable library. It is still under development, but is a fairly advanced work in progress.

The purpose of this library is to produce a data structure which corresponds to the meaning (or semantics) of a given C++ source code. It reveals notions such as namespaces, classes, functions, variables, types, etc.

Some source code analyzers, like those used by syntax coloring and autocomplete modules which need to be fast, perform a superficial analysis. Unlike them, Scalpel aims to accomplish a strict and exhaustive analysis so that it could even be used as a compiler front-end. Actually, Scalpel is a compiler front-end, since it goes through the phases of preprocessing, syntax analysis and semantic analysis, just like every C++ compiler does. Maybe one day there will be a Scalpel-powered C++ compiler!

Besides, Scalpel's analysis depth will be adjustable in order to fit the needs of most programs. For example, it could be possible to disable function body analysis for those who need to retrieve namespace and class members only.

The labor of C++ source code analysis is extremely complex. This is why having a library wholly devoted to it is a good thing. Many programs could take advantage of such a library. Among them we can find modules for code editors, reverse-engineering tools, code audit software and many other CASE (Computer-Aided Software Engineering) tools that remain to be invented…

For further information, visit the Scalpel project's website: http://42ndart.org/scalpel/

The reason why I post this message to the Boost mailing list is that Scalpel exclusively uses Boost libraries, notably Boost.Wave (for the preprocessing part) and Boost.Spirit (for the syntax analysis part). I believe Scalpel could be a good candidate for inclusion in Boost in the future.

But in the meantime, my project needs contributors: I simply cannot do all this work by myself. If I've caught your interest, please visit the Scalpel project's website. Feedback is welcome!

On Thu, Sep 2, 2010 at 6:55 PM, Florian Goujeon <florian.goujeon@42ndart.org> wrote:
Dear Boosters,
I'd like to introduce to you the Scalpel library, a project on which I've worked during the last two years.
Scalpel is a C++ library. Its name stands for source code analysis, libre and portable library. It is still under development, but is a fairly advanced work in progress.
The purpose of this library is to produce a data structure which corresponds to the meaning (or semantics) of a given C++ source code. It reveals notions such as namespaces, classes, functions, variables, types, etc..
Some source code analyzers, like those used by syntax coloring and autocomplete modules which need to be fast, perform a superficial analysis. Unlike them, Scalpel aims to accomplish a strict and exhaustive analysis so that it could even be used as a compiler front-end. Actually, Scalpel is a compiler front-end, since it goes through the phases of preprocessing, syntax analysis and semantic analysis, just like every C++ compiler does. Maybe one day there will be a Scalpel-powered C++ compiler!
Besides, Scalpel's analysis depth will be adjustable in order to fit the needs of most programs. For example, it could be possible to disable the function body analysis for those who need to retrieve namespace and class members only.
The labor of C++ source code analysis is extremely complex. This is why having a library wholly devoted to it is a good thing.
Many programs could take advantage of such a library. Among them we can find modules for code editors, reverse-engineering tools, code audit software and many other CASE (Computer-Aided Software Engineering) tools that remain to be invented…
Having a good, open-source C++ parser library that could support such tools would be wonderful. However, I am going to be a stick-in-the-mud and propose that we already have such a library. Clang: http://clang.llvm.org/

Clang is an open-source C++ library developed under a Boost-compatible BSD-like license [1]. It's written completely in C++, and performs preprocessing, parsing, semantic analysis, and code generation for C/C++/Objective-C/Objective-C++ (+ OpenCL, if you have a supporting environment) on a variety of targets. It's designed as a set of reusable libraries, so that it can form the basis of tools, and includes support libraries for program indexing, source-to-source transformation, code completion, and static analysis (among others!).

Clang already implements support for the entire C++98/03 language (except exported templates), and does so well enough that it can handle all of Boost [2]. For reference, check out the dgregor2/clang-darwin-2.8 column on today's Boost regression-test results: http://www.boost.org/development/tests/release/developer/summary.html where you'll see that Clang is passing nearly every Boost regression test on the release branch.

Clang is supported both by industry [3] and by an awesome open-source community, which (as with Boost) makes for a great symbiotic relationship: industry provides the stability and focus needed to turn Clang into a production-quality C++ compiler, while the open-source community provides a wealth of ideas and vision that pushes Clang into new areas. For example, a group within the Clang community has taken it upon themselves to start implementing Microsoft-specific extensions to make Clang far easier to use on Windows.

Writing a C++ parser/compiler requires years of full-time technical effort, and I strongly encourage you not to begin yet another open-source C++ parser. There are already two good open-source C++ compilers, GCC and Clang, and I'd strongly recommend working toward making one of those two projects better. As an added bonus, both compilers are at a stage where you can come into the community and work on the fun stuff (C++0x features, tools for C++ programmers, optimizations, etc.) rather than slog through the dull parts of C++ (initialization, name lookup, access control).

- Doug, Clang C++ technical lead

[1] Scalpel appears to be under an LGPL license, which is not Boost-compatible.
[2] Clang hit this milestone back in May: http://blog.llvm.org/2010/05/clang-builds-boost.html
[3] Apple has already shipped Clang as a C/Objective-C compiler (about a year ago). Apparently, several OpenCL implementations are also based on Clang (see http://en.wikipedia.org/wiki/OpenCL), and there has been significant interest from industry.

Hi Doug,

On 09/03/2010 05:04 PM, Doug Gregor wrote:
Having a good, open-source C++ parser library that could support such tools would be wonderful. However, I am going to be a stick-in-the-mud and propose that we already have such a library. Clang: I strongly encourage you not to begin yet another open-source C++ parser.

Before I started this project two years ago, I obviously checked whether there was any similar C++ source code analysis library project. However, I didn't find anything. In the meantime, I did discover the existence of Clang, but I had already spent a lot of time working on Scalpel. I must confess it was pretty bad news for me, but I've decided to carry on in spite of it. After all, compared with the G++ front-end, Clang is yet another open-source C++ parser as well. Similarly, LLVM is yet another open-source compiler compared with GCC, ArchLinux is yet another GNU/Linux distro compared with Debian, and so on.

All competition is stimulating. It's beneficial for everyone. All competitors are different from each other and aim to bring a surplus value. As I said, Scalpel brings high homogeneity with Boost. It has its own unique design and I also plan to endow it with round-trip engineering capabilities. I've been working on Scalpel for two years and I strongly intend to complete it. Even more so, I encourage developers to contribute to the open-source software's diversity!

[1] Scalpel appears to be under an LGPL license, which is not Boost-compatible.

In the beginning, Scalpel was under GPL. Hartmut Kaiser, Joel de Guzman and some fellows of mine convinced me to switch to a more liberal software license. Then I switched to LGPL. If one day Scalpel is accepted into Boost, I'll release it under the BSL without any hesitation.

On Fri, Sep 3, 2010 at 6:22 PM, Florian Goujeon <florian.goujeon@42ndart.org> wrote:
All competition is stimulating. It's beneficial for everyone. All competitors are different from each other and aim to bring a surplus value. As I said, Scalpel brings high homogeneity with Boost. It has its own unique design and I also plan to endow it with round-trip engineering capabilities.
One area that scalpel could conceivably find a niche, depending on how you do it, would be in analyzing source code without seeing the full translation unit (as you might for syntax-coloring purposes). Since CLANG is really built to be a compiler, I don't think it can do that. Of course I realize you can't always get a correct analysis if you don't see the whole TU, but especially if you're willing to do nondeterministic parsing/backtracking, you could very easily do a really good job. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Fri, Sep 3, 2010 at 4:22 PM, Florian Goujeon <florian.goujeon@42ndart.org> wrote:
[1] Scalpel appears to be under an LGPL license, which is not Boost-compatible.
In the beginning, Scalpel was under GPL. Hartmut Kaiser, Joel de Guzman and some fellows of mine convinced me to switch to a more liberal software license. Then I switched to LGPL. If one day Scalpel is accepted into Boost, I'll release it under the BSL without any hesitation.
LGPL is still too restrictive for single binary distributions, Clang is under an MIT style license.

On Fri, Sep 3, 2010 at 4:57 PM, Dave Abrahams <dave@boostpro.com> wrote:
On Fri, Sep 3, 2010 at 6:22 PM, Florian Goujeon <florian.goujeon@42ndart.org> wrote:
All competition is stimulating. It's beneficial for everyone. All competitors are different from each other and aim to bring a surplus value. As I said, Scalpel brings high homogeneity with Boost. It has its own unique design and I also plan to endow it with round-trip engineering capabilities.
One area that scalpel could conceivably find a niche, depending on how you do it, would be in analyzing source code without seeing the full translation unit (as you might for syntax-coloring purposes). Since CLANG is really built to be a compiler, I don't think it can do that.
Of course I realize you can't always get a correct analysis if you don't see the whole TU, but especially if you're willing to do nondeterministic parsing/backtracking, you could very easily do a really good job.
I do not think it can do that either: without seeing the whole translation unit, you are going to see a *LOT* of undefined symbols, no clue if they are a type, function, etc... etc.... It is impossible to have any kind of decent syntax-coloring without that (see the difference between the fully parsed and complete Visual Assist VS add-in compared to emacs/vi/VS/etc...), and refactoring becomes all but impossible.

On 09/04/2010 03:50 AM, OvermindDL1 wrote:
LGPL is still too restrictive for single binary distributions, Clang is under an MIT style license.

I know that and I agree with you. I think I'll change Scalpel's license to a BSD-like license sooner or later. I temporarily keep it under LGPL in order to prevent a hypothetical proprietary fork. I guess you may find it overcautious, not to say pointless, but I spent a lot of time working on this project and it's not easy to make such a decision.

On Saturday 04 September 2010 00:39:05 Florian Goujeon wrote:
On 09/04/2010 03:50 AM, OvermindDL1 wrote:
LGPL is still too restrictive for single binary distributions, Clang is under an MIT style license.
I know that and I agree with you. I think I'll change Scalpel's license to a BSD-like license sooner or later. I temporarily keep it under LGPL in order to prevent a hypothetical proprietary fork. I guess you may find it overcautious, not to say pointless, but I spent a lot of time working on this project and it's not easy to make such a decision.
You could always release under LGPL with a static link exception. I work for a commercial software company, and we use LGPL all of the time. Dynamic linking is not an unreasonable restriction for most users. The static link exception is nice for those that want to release a single binary though. Bottom line is that you've clearly shown that you're amenable to changing the license to be more permissive in the future, and it's really counterproductive to nitpick about your choice of license terms. It is, after all, your blood and sweat that brought this project to fruition.

On Fri, Sep 3, 2010 at 9:50 PM, OvermindDL1 <overminddl1@gmail.com> wrote:
One area that scalpel could conceivably find a niche, depending on how you do it, would be in analyzing source code without seeing the full translation unit (as you might for syntax-coloring purposes). Since CLANG is really built to be a compiler, I don't think it can do that.
Of course I realize you can't always get a correct analysis if you don't see the whole TU, but especially if you're willing to do nondeterministic parsing/backtracking, you could very easily do a really good job.
I do not think it can do that either: without seeing the whole translation unit, you are going to see a *LOT* of undefined symbols, no clue if they are a type, function, etc... etc....
By exploring all possible valid parses you can usually deduce the role of a symbol from all the contexts in which it is used. Humans do it all the time.
It is impossible to have any kind of decent syntax-coloring without that (see the difference between the fully parsed and complete Visual Assist VS add-in compared to emacs/vi/VS/etc...), and refactoring becomes all but impossible.
emacs, vi, and VS only make local decisions about each symbol. You ought to be able to do much better by taking in an entire file and throwing out the possibilities that result in invalid parses. I know this approach works for natural language, which is full of the same kinds of ambiguities (homonyms/homophones). Actually the NLP case is much harder because a word doesn't have to be used consistently across sentences. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

Hi, On Sun, Sep 5, 2010 at 2:45 AM, Dave Abrahams <dave@boostpro.com> wrote:
emacs, vi, and VS only make local decisions about each symbol. You ought to be able to do much better by taking in an entire file and throwing out the possibilities that result in invalid parses. I know this approach works for natural language, which is full of the same kinds of ambiguities (homonyms/homophones). Actually the NLP case is much harder because a word doesn't have to be used consistently across sentences.
FYI, there is a GCCSense project (http://cx4a.org/software/gccsense/) to use with emacs/vim for translation-unit-aware code completion. Best regards, -- Ryo IGARASHI, Ph.D. rigarash@gmail.com

Or red underline, like a spell checker. And blue underline like MS Word for "grammar" (syntax) errors. You get the idea. Might as well do real spell checking (against dictionary and code identifiers) in the comments as well...
AFAIK MS Visual Studio 2010 does that for C++ now. Not perfect yet but helpful.

On Wed, Sep 8, 2010 at 09:38, Ryo IGARASHI <rigarash@gmail.com> wrote:
Hi,
On Sun, Sep 5, 2010 at 2:45 AM, Dave Abrahams <dave@boostpro.com> wrote:
emacs, vi, and VS only make local decisions about each symbol. You ought to be able to do much better by taking in an entire file and throwing out the possibilities that result in invalid parses. I know this approach works for natural language, which is full of the same kinds of ambiguities (homonyms/homophones). Actually the NLP case is much harder because a word doesn't have to be used consistently across sentences.
FYI, there is a GCCSense project (http://cx4a.org/software/gccsense/) to use with emacs/vim for translation-unit-aware code completion.
Best regards, -- Ryo IGARASHI, Ph.D. rigarash@gmail.com

On Wed, Sep 8, 2010 at 1:54 AM, Klaim <mjklaim@gmail.com> wrote:
Or red underline, like a spell checker. And blue underline like MS Word for "grammar" (syntax) errors. You get the idea. Might as well do real spell checking (against dictionary and code identifiers) in the comments as well...
AFAIK MS Visual Studio 2010 does that for C++ now. Not perfect yet but helpful.
As does Xcode 4 (developer preview). - Doug

At Wed, 8 Sep 2010 16:38:36 +0900, Ryo IGARASHI wrote:
FYI, there is a GCCSense project (http://cx4a.org/software/gccsense/) to use with emacs/vim for translation-unit-aware code completion.
Hi Ryo, Wow, that is *cool*. Thank you very much for posting the link. I'll definitely be trying it! -- Dave Abrahams BoostPro Computing http://www.boostpro.com

David Abrahams <dave@boostpro.com> writes:
At Wed, 8 Sep 2010 16:38:36 +0900, Ryo IGARASHI wrote:
FYI, there is a GCCSense project (http://cx4a.org/software/gccsense/) to use with emacs/vim for translation-unit-aware code completion.
Wow, that is *cool*. Thank you very much for posting the link. I'll definitely be trying it!
It *is* cool. I just saw this, and installed it. It works very well, and is pretty quick too. Anthony -- Author of C++ Concurrency in Action http://www.stdthread.co.uk/book/ just::thread C++0x thread library http://www.stdthread.co.uk Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk 15 Carrallack Mews, St Just, Cornwall, TR19 7UL, UK. Company No. 5478976

At Wed, 08 Sep 2010 19:18:57 +0200, joel falcou wrote:
On 08/09/10 17:36, Anthony Williams wrote:
It *is* cool. I just saw this, and installed it. It works very well, and is pretty quick too.
Seconded, I was astonished by the lightweightishness of the whole thing :o
Now, can I plug clang into it in lieu of GCC? -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Fri, Sep 3, 2010 at 3:57 PM, Dave Abrahams <dave@boostpro.com> wrote:
On Fri, Sep 3, 2010 at 6:22 PM, Florian Goujeon <florian.goujeon@42ndart.org> wrote:
All competition is stimulating. It's beneficial for everyone. All competitors are different from each other and aim to bring a surplus value. As I said, Scalpel brings high homogeneity with Boost. It has its own unique design and I also plan to endow it with round-trip engineering capabilities.
One area that scalpel could conceivably find a niche, depending on how you do it, would be in analyzing source code without seeing the full translation unit (as you might for syntax-coloring purposes). Since CLANG is really built to be a compiler, I don't think it can do that.
Clang does syntax coloring [*], although it does so with knowledge of the full translation unit.
Of course I realize you can't always get a correct analysis if you don't see the whole TU, but especially if you're willing to do nondeterministic parsing/backtracking, you could very easily do a really good job.
Perhaps, although I completely disagree with the "very easily" bit. C++ is a ridiculously ambiguous language. Note that a compiler could implement these same techniques along its recovery path to both improve diagnostics and improve support for syntax coloring. - Doug [*] The C API is here: http://clang.llvm.org/doxygen/group__CINDEX__LEX.html

On Sun, Sep 5, 2010 at 12:35 PM, Doug Gregor <doug.gregor@gmail.com> wrote:
On Fri, Sep 3, 2010 at 3:57 PM, Dave Abrahams <dave@boostpro.com> wrote:
On Fri, Sep 3, 2010 at 6:22 PM, Florian Goujeon <florian.goujeon@42ndart.org> wrote:
All competition is stimulating. It's beneficial for everyone. All competitors are different from each other and aim to bring a surplus value. As I said, Scalpel brings high homogeneity with Boost. It has its own unique design and I also plan to endow it with round-trip engineering capabilities.
One area that scalpel could conceivably find a niche, depending on how you do it, would be in analyzing source code without seeing the full translation unit (as you might for syntax-coloring purposes). Since CLANG is really built to be a compiler, I don't think it can do that.
Clang does syntax coloring [*], although it does so with knowledge of the full translation unit.
And you know I know that ;-)
Of course I realize you can't always get a correct analysis if you don't see the whole TU, but especially if you're willing to do nondeterministic parsing/backtracking, you could very easily do a really good job.
Perhaps, although I completely disagree with the "very easily" bit. C++ is a ridiculously ambiguous language.
Perhaps I overstated the case a *wee little* bit ;-)
Note that a compiler could implement these same techniques along its recovery path to both improve diagnostics and improve support for syntax coloring.
Now it's my turn to be a little skeptical. I can't imagine a compiler would ever try to do error recovery by throwing out all the information from #include files you had already processed. -- Dave Abrahams BoostPro Computing http://www.boostpro.com

On Tue, Sep 7, 2010 at 4:04 AM, Dave Abrahams <dave@boostpro.com> wrote:
On Sun, Sep 5, 2010 at 12:35 PM, Doug Gregor <doug.gregor@gmail.com> wrote:
On Fri, Sep 3, 2010 at 3:57 PM, Dave Abrahams <dave@boostpro.com> wrote:
On Fri, Sep 3, 2010 at 6:22 PM, Florian Goujeon <florian.goujeon@42ndart.org> wrote:
All competition is stimulating. It's beneficial for everyone. All competitors are different from each other and aim to bring a surplus value. As I said, Scalpel brings high homogeneity with Boost. It has its own unique design and I also plan to endow it with round-trip engineering capabilities.
One area that scalpel could conceivably find a niche, depending on how you do it, would be in analyzing source code without seeing the full translation unit (as you might for syntax-coloring purposes). Since CLANG is really built to be a compiler, I don't think it can do that.
Clang does syntax coloring [*], although it does so with knowledge of the full translation unit.
And you know I know that ;-)
:)
Of course I realize you can't always get a correct analysis if you don't see the whole TU, but especially if you're willing to do nondeterministic parsing/backtracking, you could very easily do a really good job.
Perhaps, although I completely disagree with the "very easily" bit. C++ is a ridiculously ambiguous language.
Perhaps I overstated the case a *wee little* bit ;-)
Note that a compiler could implement these same techniques along its recovery path to both improve diagnostics and improve support for syntax coloring.
Now it's my turn to be a little skeptical. I can't imagine a compiler would ever try to do error recovery by throwing out all the information from #include files you had already processed.
Heck no! But in the absence of information (e.g., an #include couldn't be found, or an identifier is horribly mis-typed), such approaches could drastically improve recovery. - Doug

Doug Gregor wrote:
Dave Abrahams wrote:
Now it's my turn to be a little skeptical. I can't imagine a compiler would ever try to do error recovery by throwing out all the information from #include files you had already processed.
Heck no! But in the absence of information (e.g., an #include couldn't be found, or an identifier is horribly mis-typed), such approaches could drastically improve recovery.
From my naive user point of view, there is a "preprocessing" step, a "compilation" step and a "linking" step. As a compiler user, I would prefer that the compiler doesn't start the "compilation" step in case the "preprocessing" step failed (e.g., an #include couldn't be found). Similarly, I would prefer that the compiler doesn't start the "link" step in case the "compilation" step failed.
I know that GCC has a different opinion about this, but I prefer the behavior of MSVC in this case. Regards, Thomas

On Tue, Sep 7, 2010 at 7:16 AM, Thomas Klimpel <Thomas.Klimpel@synopsys.com> wrote:
Doug Gregor wrote:
Dave Abrahams wrote:
Now it's my turn to be a little skeptical. I can't imagine a compiler would ever try to do error recovery by throwing out all the information from #include files you had already processed.
Heck no! But in the absence of information (e.g., an #include couldn't be found, or an identifier is horribly mis-typed), such approaches could drastically improve recovery.
From my naive user point of view, there is a "preprocessing" step, a "compilation" step and a "linking" step. As a compiler user, I would prefer that the compiler doesn't start the "compilation" step in case the "preprocessing" step failed (e.g., an #include couldn't be found). Similarly, I would prefer that the compiler doesn't start the "link" step in case the "compilation" step failed.
The right answer often depends on how you're using the parser. As a compiler, Clang stops parsing after a missing #include, because there's rarely any point in continuing the parse. When performing syntax highlighting or code completion, you want results even though the source is never actually going to compile. - Doug

The right answer often depends on how you're using the parser. As a compiler, Clang stops parsing after a missing #include, because there's rarely any point in continuing the parse. When performing syntax highlighting or code completion, you want results even though the source is never actually going to compile.
- Doug
I want my "compiler" "compiling" in the background of my IDE at all times, as I type. I want undefined identifiers (or things that appear to be identifiers) to be colored red (or whatever) until I fix them up. etc. As helpful as possible without being annoying (i.e. no dialogs pop up or anything like that). Once my code will actually pass a compile, I want it to already have. :-) Tony

On Tue, Sep 7, 2010 at 11:07 PM, Gottlob Frege <gottlobfrege@gmail.com> wrote:
The right answer often depends on how you're using the parser. As a compiler, Clang stops parsing after a missing #include, because there's rarely any point in continuing the parse. When performing syntax highlighting or code completion, you want results even though the source is never actually going to compile.
- Doug
I want my "compiler" "compiling" in the background of my IDE at all times, as I type. I want undefined identifiers (or things that appear to be identifiers) to be colored red (or whatever) until I fix them up. etc. As helpful as possible without being annoying (i.e. no dialogs pop up or anything like that).
Once my code will actually pass a compile, I want it to already have. :-)
Tony
Or red underline, like a spell checker. And blue underline like MS Word for "grammar" (syntax) errors. You get the idea. Might as well do real spell checking (against dictionary and code identifiers) in the comments as well... Tony

On 08/09/2010 04:07, Gottlob Frege wrote:
The right answer often depends on how you're using the parser. As a compiler, Clang stops parsing after a missing #include, because there's rarely any point in continuing the parse. When performing syntax highlighting or code completion, you want results even though the source is never actually going to compile.
- Doug
I want my "compiler" "compiling" in the background of my IDE at all times, as I type. I want undefined identifiers (or things that appear to be identifiers) to be colored red (or whatever) until I fix them up. etc. As helpful as possible without being annoying (i.e. no dialogs pop up or anything like that).
Once my code will actually pass a compile, I want it to already have. :-)
That approach doesn't work when you're cross-compiling, and using headers only available on the target platform. Yet you still want syntax coloring.

On Wed, Sep 8, 2010 at 4:19 AM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 08/09/2010 04:07, Gottlob Frege wrote:
The right answer often depends on how you're using the parser. As a compiler, Clang stops parsing after a missing #include, because there's rarely any point in continuing the parse. When performing syntax highlighting or code completion, you want results even though the source is never actually going to compile.
- Doug
I want my "compiler" "compiling" in the background of my IDE at all times, as I type. I want undefined identifiers (or things that appear to be identifiers) to be colored red (or whatever) until I fix them up. etc. As helpful as possible without being annoying (i.e. no dialogs pop up or anything like that).
Once my code will actually pass a compile, I want it to already have. :-)
That approach doesn't work when you're cross-compiling, and using headers only available on the target platform.
It works perfectly fine. Your compiler/IDE just needs to know the target (including where those headers reside), but it has to know that anyway to produce code. There would be a slight issue with GCC, because a given GCC executable only targets a single architecture, and multiple instances of GCC can't coexist in an executable. I don't know how common that limitation is, but Clang (for example) allows dynamic selection of the target architecture and allows multiple instances. - Doug

On 08/09/2010 17:07, Doug Gregor wrote:
On Wed, Sep 8, 2010 at 4:19 AM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 08/09/2010 04:07, Gottlob Frege wrote: That approach doesn't work when you're cross-compiling, and using headers only available on the target platform.
It works perfectly fine. Your compiler/IDE just needs to know the target (including where those headers reside), but it has to know that anyway to produce code.
There would be a slight issue with GCC, because a given GCC executable only targets a single architecture, and multiple instances of GCC can't coexist in an executable. I don't know how common that limitation is, but Clang (for example) allows dynamic selection of the target architecture and allows multiple instances.
Alright, then replace "cross-compiling" by "developing on a platform that cannot compile the target". From my experience, it is quite typical to develop code on a platform and compile it on another.

On Wed, Sep 8, 2010 at 11:10 AM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 08/09/2010 17:07, Doug Gregor wrote:
On Wed, Sep 8, 2010 at 4:19 AM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
On 08/09/2010 04:07, Gottlob Frege wrote: That approach doesn't work when you're cross-compiling, and using headers only available on the target platform.
It works perfectly fine. Your compiler/IDE just needs to know the target (including where those headers reside), but it has to know that anyway to produce code.
There would be a slight issue with GCC, because a given GCC executable only targets a single architecture, and multiple instances of GCC can't coexist in an executable. I don't know how common that limitation is, but Clang (for example) allows dynamic selection of the target architecture and allows multiple instances.
Alright, then replace "cross-compiling" by "developing on a platform that cannot compile the target".
Ah, that's a different issue. Any system for syntax highlighting, code completion, indexing, etc., will be slightly hobbled in this environment, since you don't have the declarations you need to actually provide good suggestions. Still, so long as the parser's recovery is okay, this will work. - Doug

Hi Florian, On Fri, Sep 3, 2010 at 3:22 PM, Florian Goujeon <florian.goujeon@42ndart.org> wrote:
On 09/03/2010 05:04 PM, Doug Gregor wrote:
Having a good, open-source C++ parser library that could support such tools would be wonderful. However, I am going to be a stick-in-the-mud and propose that we already have such a library. Clang: I strongly encourage you not to begin yet another open-source C++ parser.
Before I started this project two years ago, I obviously checked whether there was any similar C++ source code analysis library project. However, I didn't find anything.
In the meantime, I did discover the existence of Clang, but I had already spent a lot of time working on Scalpel. I must confess it was pretty bad news for me, but I've decided to carry on in spite of it. After all, compared with the G++ front-end, Clang is yet another open-source C++ parser too. Similarly, LLVM is yet another open-source compiler compared with GCC, ArchLinux is yet another GNU/Linux distro compared with Debian, and so on.
All competition is stimulating. It's beneficial for everyone. All competitors are different from each other and aim to bring added value. As I said, Scalpel brings high homogeneity with Boost. It has its own unique design and I also plan to endow it with round-trip engineering capabilities.
I agree, to a point. New projects need critical mass to effectively compete with established projects, and without effective competition we don't see the benefits of diversity; we just see redundancy. At some point, a community coalesces around a few projects that compete on the large scale, while the majority of the competition/diversity moves to subprojects within those large projects. See GNOME vs. Qt, or WebKit vs. Gecko, where there is unlikely to be a third large-scale competitor in the open-source world, but there is a ton of innovation within both projects. Only if those large-scale projects stop innovating, or calcify around inflexible architectures, will there be an opening for a third large-scale competitor. I want to see great, new ideas in C++ parsing and development tools, but I strongly feel that those ideas could be far better disseminated through extending/adapting/changing the existing large-scale, industry-backed projects (GCC or Clang/LLVM) than by bringing up a third large-scale competitor.
I've been working on Scalpel for two years and I strongly intend to complete it. What's more, I encourage developers to contribute to the diversity of open-source software!
I won't try to dissuade you further, because I've been in precisely the same position as you are now. Best of luck to you!
[1] Scalpel appears to be under an LGPL license, which is not Boost-compatible.
In the beginning, Scalpel was under the GPL. Hartmut Kaiser, Joel de Guzman and some fellows of mine convinced me to switch to a more liberal software license, so I moved to the LGPL. If one day Scalpel is accepted into Boost, I'll release it under the BSL without any hesitation.
One note of caution: if you start getting contributions from others, you'll have to ask permission of each and every one of them when you want to switch licenses. Boost went through this when we switched over to the Boost Software License, and it's a real pain in the butt. Better to switch to the license you want now, or (barring that) get copyright assignment along with each contribution (as is done by the FSF) to ensure that you can easily switch later. - Doug

Hi Doug, I've been thinking about it for the past five days, and I must face it: I'm the only developer of Scalpel, while Clang is much more advanced and maintained by a whole community; I don't stand a chance. Besides, when you say:
New projects need critical mass to effectively compete with established projects, and without effective competition we don't see the benefits of diversity; we just see redundancy.
I finally have to admit you're totally right. I thought the round-trip engineering/refactoring feature I planned to develop would have made Scalpel unique, but it seems Clang's librewrite already does it. It's terribly hard to accept, but c'est la vie. However, it would be a pity to throw all my work away. I could try something. It's related to Dave's suggestion:
One area that scalpel could conceivably find a niche, depending on how you do it, would be in analyzing source code without seeing the full translation unit (as you might for syntax-coloring purposes). Since CLANG is really built to be a compiler, I don't think it can do that.
Scalpel could actually do it. While it's true that C++ is a "ridiculously ambiguous language", it turns out that Scalpel's design has something special: I wanted the syntax analyzer to be very loosely coupled with the semantic analyzer. Consequently, the syntax analyzer is standalone; the Spirit grammar doesn't run any semantic action.

*****

At this point, you may wonder how I planned to manage syntax ambiguities. There are two types of ambiguity: 1) cases where one interpretation is more obvious than the other(s); 2) cases where you may reasonably ask the programmer to disambiguate his code. In either case, the syntax analyzer (predictably) chooses one of the interpretations. Here are some examples.

The following line of code…:

a * b;

…may be either a multiplication or a pointer declaration. The default interpretation is the pointer declaration. You can reasonably ask the programmer to add parentheses if he wants the syntax analyzer to interpret it as a multiplication:

(a * b);

Trickier. The following line of code…:

a < b || c > d;

…may be either a boolean expression (a, b, c and d are variables of type bool) or a variable declaration (whose name is 'd' and whose type is a<b || c>, where 'a' is a class template taking one bool template parameter and where 'b' and 'c' are both variables of type const bool). The default interpretation is the boolean expression. You can reasonably ask the programmer to add parentheses if he wants the syntax analyzer to interpret it as the declaration:

a < (b || c) > d;

Actually, I even wonder why the standard allows such ambiguities.

Note: Scalpel successfully parses Apache's implementation of the C++ standard library.

*****

I could extract Scalpel's syntax analyzer to create such a library. This would save one year of work out of two and make the syntax analysis even more generic than it would have been had it stayed encapsulated in Scalpel.
HOWEVER. I started the Scalpel project for two reasons: 1) I would have liked to develop/use a kind of UML tool with round-trip engineering capabilities (i.e. able to generate a class diagram from the source code and to synchronize the source code after a modification of that diagram), which would have used Scalpel; 2) I'm a 24-year-old software engineer and completing such a complex project could have been good for my starting career.

Developing a syntax analysis library is far less impressive than developing a full front-end. So, point 2 is out. So is point 1, for obvious reasons. It seems I don't have a significant interest in starting this project. UNLESS… Of course, I like to code and I would be glad not to throw my whole two-year work away. Besides, just like Doug said, "I want to see great, new ideas in C++ parsing and development tools", just for the sake of C++. But this would be even better if my career (and, secondarily, my personal satisfaction) could still take advantage of it. This is why I need to know: is there a reasonable chance that such a library would be accepted into Boost? This would be a significant motive for me.
I won't try to dissuade you further, because I've been in precisely the same position as you are now. Best of luck to you!
Thank you anyway ;).
If one day Scalpel is accepted into Boost, I'll release it under the BSL without any hesitation.
One note of caution: if you start getting contributions from others, you'll have to ask permission of each and every one of them when you want to switch licenses. Boost went through this when we switched over to the Boost Software License, and it's a real pain in the butt. Better to switch to the license you want now, or (barring that) get copyright assignment along with each contribution (as is done by the FSF) to ensure that you can easily switch later.
I had planned to apply the latter to Scalpel. For the hypothetical new library, I'll switch to the BSL right from the beginning.

On 09/07/2010 06:30 AM, Florian Goujeon wrote: So, any idea?
You could use your library to build a replacement for ctags (ctags.sf.net). ctags supports over 40 languages and uses regexes to do so. ctags hardly copes with overloading. The user is supposed to choose among overloads. A compiler based tool should be able to do a lot better. There is a need for such an open source tool in the unix world. Regards, Dmitry

Hi Dmitry, On 09/08/2010 12:20 PM, Dmitry Goncharov wrote:
You could use your library to build a replacement for ctags (ctags.sf.net).
ctags supports over 40 languages and uses regexes to do so. ctags hardly copes with overloading. The user is supposed to choose among overloads.
A compiler based tool should be able to do a lot better. There is a need for such an open source tool in the unix world. Scalpel has been designed to be used by such tools. But I guess Clang can do the job.

On Wed, Sep 8, 2010 at 7:27 AM, Florian Goujeon <florian.goujeon@42ndart.org> wrote:
Hi Dmitry,
[snip]
Scalpel has been designed to be used by such tools. But I guess Clang can do the job.
A spirit&wave implementation of the C++ grammar is useful in its own right. With or without clang. Regards, -- Felipe Magno de Almeida

On Wed, Sep 8, 2010 at 8:23 AM, Felipe Magno de Almeida <felipe.m.almeida@gmail.com> wrote:
On Wed, Sep 8, 2010 at 7:27 AM, Florian Goujeon <florian.goujeon@42ndart.org> wrote:
Hi Dmitry,
[snip]
Scalpel has been designed to be used by such tools. But I guess Clang can do the job.
A spirit&wave implementation of the C++ grammar is useful in its own right.
I agree in the abstract, but how good must such a parser be for it to be useful? Must it handle a simple "Hello, world!"? <iostream> on common platforms? But the real question is... Would Boost accept a C++ parser library that cannot parse all of Boost? That's a very high bar for library acceptance. However, if it can't parse Boost, we can't use it to build cool new libraries and tools that deal with C++ code, because we could not use those tools ourselves. And that's the whole point of this exercise: we want a C++ parser library so we can make cool new tools for ourselves and other C++ programmers. My conclusion, then, is "no": Boost would not accept a C++ parser library that cannot parse Boost itself. That's the acceptance criteria, far more than any other technical concern. - Doug

On 09/08/2010 12:27 PM, Doug Gregor wrote:
On Wed, Sep 8, 2010 at 8:23 AM, Felipe Magno de Almeida <felipe.m.almeida@gmail.com> wrote:
A spirit&wave implementation of the C++ grammar is useful in its own right.
I agree in the abstract, but how good must such a parser be for it to be useful?
My conclusion, then, is "no": Boost would not accept a C++ parser library that cannot parse Boost itself. That's the acceptance criteria, far more than any other technical concern.
On a meta-level: I see the same questions popping up again and again on this list. It apparently is considered way cool if the entire world could be recreated using boost components. While certainly fun and useful for educational purposes, I'm not convinced of the practicality of such a task. Not to speak of all the concerns that only become relevant after the library has been written and accepted, such as keeping (evil tongues would say making) boost maintainable, with such an ever growing set of components. Stefan -- ...ich hab' noch einen Koffer in Berlin...

On 09/08/2010 06:27 PM, Doug Gregor wrote:
My conclusion, then, is "no": Boost would not accept a C++ parser library that cannot parse Boost itself. That's the acceptance criteria, far more than any other technical concern.
I think most Boosters didn't see my suggestion of a C++ source code parsing library, so I've opened a new thread concerning it. Here it is: http://article.gmane.org/gmane.comp.lib.boost.devel/208409 Could you please explain in that thread why such a library would not be able to parse Boost?

I'm sounding too much like an advertisement, so I'll try to batch those replies a bit more... On Wed, Sep 8, 2010 at 3:20 AM, Dmitry Goncharov <dgoncharov@unison.com> wrote:
On 09/07/2010 06:30 AM, Florian Goujeon wrote: So, any idea?
You could use your library to build a replacement for ctags (ctags.sf.net).
ctags supports over 40 languages and uses regexes to do so. ctags hardly copes with overloading. The user is supposed to choose among overloads.
A compiler based tool should be able to do a lot better. There is a need for such an open source tool in the unix world.
Yes, absolutely! To this end, Clang provides a simplified C API with detailed cross-referencing information of the form needed for this task, mapping between the source code (file/line/column) and the associated AST (expressions, statements, declarations, types). We can map the "f" in "f(x)" back to the function selected by overload resolution, or map the "+" in "x + y" to the overloaded operator it uses. Building something ctags-like from that API should be easy. The Clang C API is documented here: http://clang.llvm.org/doxygen/group__CINDEX.html On Wed, Sep 8, 2010 at 12:38 AM, Ryo IGARASHI <rigarash@gmail.com> wrote:
FYI, there is a GCCSense project (http://cx4a.org/software/gccsense/) to use with emacs/vim for translation-unit-aware code completion.
Clang provides code completion through the C API: http://clang.llvm.org/doxygen/group__CINDEX__CODE__COMPLET.html and via the command line. Integration with vim/Emacs should be simple; I hacked up the first Emacs mode for it in a few hours. On Tue, Sep 7, 2010 at 8:09 PM, Gottlob Frege <gottlobfrege@gmail.com> wrote:
Or red underline, like a spell checker. And blue underline like MS Word for "grammar" (syntax) errors. You get the idea. Might as well do real spell checking (against dictionary and code identifiers) in the comments as well...
It turns out that spell-checking is also great for error recovery: http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html#spel... - Doug

On Sunday 05 September 2010 11:31:55 Doug Gregor wrote:
I want to see great, new ideas in C++ parsing and development tools, but I strongly feel that those ideas could be far better disseminated through extending/adapting/changing the existing large-scale, industry-backed projects (GCC or Clang/LLVM) than by bringing up a third large-scale competitor.
I generally agree with this, which is why the whole libc++ project is highly unfortunate. -Dave

On Thu, Sep 9, 2010 at 10:31 PM, David Greene <greened@obbligato.org> wrote:
On Sunday 05 September 2010 11:31:55 Doug Gregor wrote:
I want to see great, new ideas in C++ parsing and development tools, but I strongly feel that those ideas could be far better disseminated through extending/adapting/changing the existing large-scale, industry-backed projects (GCC or Clang/LLVM) than by bringing up a third large-scale competitor.
I generally agree with this, which is why the whole libc++ project is highly unfortunate.
How so? It has a better license, is designed more with modern architectures in mind, and is made by someone at Apple, although it's not done yet.

On 09/03/2010 05:04 PM, Doug Gregor wrote:
Having a good, open-source C++ parser library that could support such tools would be wonderful. However, I am going to be a stick-in-the-mud
and propose that we already have such a library: Clang. I strongly encourage you not to begin yet another open-source C++ parser.
Last time I looked at Clang, it wasn't compilable with MSVC- is my memory accurate? Erik

On Mon, Sep 6, 2010 at 5:32 PM, Nelson, Erik - 2 <erik.l.nelson@bankofamerica.com> wrote:
On 09/03/2010 05:04 PM, Doug Gregor wrote:
Having a good, open-source C++ parser library that could support such tools would be wonderful. However, I am going to be a stick-in-the-mud
and propose that we already have such a library: Clang. I strongly encourage you not to begin yet another open-source C++ parser.
Last time I looked at Clang, it wasn't compilable with MSVC- is my memory accurate?
Erik
Yes, but there have recently been a handful of patches and a lot of discussion to make it compatible. -- Cory Nelson http://int64.org

On Sep 6, 2010, at 5:32 PM, Nelson, Erik - 2 wrote:
On 09/03/2010 05:04 PM, Doug Gregor wrote:
Having a good, open-source C++ parser library that could support such tools would be wonderful. However, I am going to be a stick-in-the-mud
and propose that we already have such a library: Clang. I strongly encourage you not to begin yet another open-source C++ parser.
Last time I looked at Clang, it wasn't compilable with MSVC- is my memory accurate?
It's probably accurate, but also outdated. Clang can be compiled by Visual Studio (many thanks to Steven Watanabe, by the way). What Clang can't do yet is parse VS's own headers, at least not enough of them for reasonable programs. So you have to use MinGW's headers. Sebastian

AMDG Sebastian Redl wrote:
On Sep 6, 2010, at 5:32 PM, Nelson, Erik - 2 wrote:
Last time I looked at Clang, it wasn't compilable with MSVC- is my memory accurate?
It's probably accurate, but also outdated. Clang can be compiled by Visual Studio (many thanks to Steven Watanabe, by the way). What Clang can't do yet is parse VS's own headers,
It's getting close. I'm down to 13 errors for #include <iostream> with my working copy of clang. Most of the remaining errors have to do with rvalue-references.
at least not enough of them for reasonable programs. So you have to use MinGW's headers.
In Christ, Steven Watanabe

Sebastian Redl wrote on Monday, September 06, 2010 8:51 PM
On Sep 6, 2010, at 5:32 PM, Nelson, Erik - 2 wrote: Last time I looked at Clang, it wasn't compilable with MSVC- is my memory accurate?
It's probably accurate, but also outdated. Clang can be compiled by Visual Studio (many thanks to Steven Watanabe, by the way). What Clang can't do yet is parse VS's own headers, at least not enough of them for reasonable programs. So you have to use MinGW's headers.
It's great that folks are working hard on it, but it's simply a nonstarter (for me, and I expect many others) to have to install yet another environment (MinGW) in order to use it. One of the greatest things about Boost is that it was designed from the ground up to be portable, and it "just works" (most of the time). Erik

On Sep 6, 2010, at 7:01 PM, Nelson, Erik - 2 wrote:
Sebastian Redl wrote on Monday, September 06, 2010 8:51 PM
On Sep 6, 2010, at 5:32 PM, Nelson, Erik - 2 wrote: Last time I looked at Clang, it wasn't compilable with MSVC- is my memory accurate?
It's probably accurate, but also outdated. Clang can be compiled by Visual Studio (many thanks to Steven Watanabe, by the way). What Clang can't do yet is parse VS's own headers, at least not enough of them for reasonable programs. So you have to use MinGW's headers.
It's great that folks are working hard on it, but it's simply a nonstarter (for me, and I expect many others) to have to install yet another environment (MinGW) in order to use it.
You misunderstand. Clang works perfectly fine when compiled with Visual Studio. As far as Clang's own code is concerned, it has its own portability layer, and it doesn't need any POSIX portability layer. (MinGW isn't one anyway.) But it can't parse the header files supplied with Visual Studio, because they're not C++, they're VS's dialect of C++. That's a problem that affects *every* C++ parser. You have to implement Microsoft's extensions before you can use their headers. This is why, if you want to actually parse C++ code in Windows, you need to avoid VS headers, and MinGW happens to offer replacements. Sebastian

On Tue, Sep 7, 2010 at 1:07 AM, Sebastian Redl <sebastian.redl@getdesigned.at> wrote:
On Sep 6, 2010, at 7:01 PM, Nelson, Erik - 2 wrote:
Sebastian Redl wrote on Monday, September 06, 2010 8:51 PM
On Sep 6, 2010, at 5:32 PM, Nelson, Erik - 2 wrote: Last time I looked at Clang, it wasn't compilable with MSVC- is my memory accurate?
It's probably accurate, but also outdated. Clang can be compiled by Visual Studio (many thanks to Steven Watanabe, by the way). What Clang can't do yet is parse VS's own headers, at least not enough of them for reasonable programs. So you have to use MinGW's headers.
It's great that folks are working hard on it, but it's simply a nonstarter (for me, and I expect many others) to have to install yet another environment (MinGW) in order to use it.
You misunderstand. Clang works perfectly fine when compiled with Visual Studio. As far as Clang's own code is concerned, it has its own portability layer, and it doesn't need any POSIX portability layer. (MinGW isn't one anyway.) But it can't parse the header files supplied with Visual Studio, because they're not C++, they're VS's dialect of C++. That's a problem that affects *every* C++ parser. You have to implement Microsoft's extensions before you can use their headers. This is why, if you want to actually parse C++ code in Windows, you need to avoid VS headers, and MinGW happens to offer replacements.
Some people (including me) are working on making sure Clang can handle VS headers. Clang already supports the -fms-extensions option; it's just that not all extensions are currently implemented. My personal goal is to have Clang parse VS header files before the end of this year.
participants (22)
- Anthony Williams
- Cory Nelson
- Dave Abrahams
- David Abrahams
- David Greene
- Dmitry Goncharov
- Doug Gregor
- Felipe Magno de Almeida
- Florian Goujeon
- Francois Pichet
- Gottlob Frege
- joel falcou
- Klaim
- KSpam
- Mathias Gaunard
- Nelson, Erik - 2
- OvermindDL1
- Ryo IGARASHI
- Sebastian Redl
- Stefan Seefeld
- Steven Watanabe
- Thomas Klimpel