Do you notice this overhead in C++03 or C++0x compilers, in both modes?
Can't say, really. The move from STL containers to Boost.Container was a while ago (approaching 2 years, maybe), so I don't really remember, but I would assume it is just as much of a problem in either case. In my experience, the pre-processor tricks that were common in C++03, and were replaced by the enhancements of C++11, are not really much worse (or better) than their C++11 equivalents. That's because what consumes most of the compilation time is the instantiation of templates (not the amount of code or the number of overloads); whether the multiple templates are generated with the pre-processor, with variadic templates, or with some combination of specializations and SFINAE-style tricks, the real problem is the combinatorial explosion of template instantiations. Some (possibly obvious) tips:
- Limit the depth of recursive TMP meta-evaluations.
- Never use more than the bare minimum of template arguments in meta-functions or other helper templates.
- SFINAE can be a significant drag on compilation times, because it requires context-aware, complex template instantiations to evaluate each overload; prefer tag-dispatching or regular overloads.
- Variadic templates are also really dangerous for compilation times. It is very easy to create deep recursions and O(N^2) template instantiations in cases where the C++03 style of template would have N templates of O(N) instantiation complexity, which is much better.
However, I haven't diagnosed where the problem lies, and I don't have much time, or the knowledge, to investigate it.
Yeah. I hear you. This is a really hard problem to tackle, and it's never very obvious what causes the most issues.
In any case, without measurements we don't know where the compilation time is being wasted. I don't know if it's even possible to measure this overhead reliably.
It's sort of possible. Some time ago I did some crude tests like that on a project, and it was very tedious. I basically enabled compilation profiling on GCC and created small cpp files that contained explicit template instantiations of selected pieces of the library's code. It's crude and tedious: you start with the high-level (end-user) templates to see which are worst, work your way through the template instantiations that they trigger until you find the templates that seem to be the worst offenders, and then try to understand why (e.g., work out the complexity, O(N) or worse, of the number of template instantiations). It's really tedious and time-consuming, but it can pay off sometimes.

A similar trick is to comment out certain features (e.g., certain functions or parts of the container) and see how the compilation times change. The code only needs to compile, it does not need to be able to run, so you can comment out a lot of stuff without having to replace it with dummy code or stubs.

More recently, there has been some work on a template instantiation profiler (time and memory) and a template instantiation debugger, called Templight (an instrumentation of Clang). The problem is that it is still in a beta stage. I'm actually the one who picked up the code from the original authors (who seem to have abandoned it) and enhanced it, as well as creating an interactive (GDB-style) debugger for stepping through template instantiations the same way you can step through code execution in a regular run-time debugger. However, this project has been on my back-burner for a while, because every time I hack away at the Clang code-base I get very frustrated and discouraged; it is such a huge mess and a horrible code-base to work with. Anyway, in theory, you can use Templight to profile the time and memory the compiler spends on each template instantiation, just like a run-time profiler, but for template instantiations instead of function calls.
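To illustrate the explicit-instantiation trick, a probe file could look like this (the file name and the chosen templates are illustrative; the discussion is about Boost.Container, but `std::vector` keeps the sketch self-contained). Compiling it alone with GCC's `-ftime-report` gives a rough per-template cost you can compare across probe files:

```cpp
// probe_vector.cpp -- compile in isolation, e.g.:
//   g++ -c -ftime-report probe_vector.cpp
// and compare the time report against other probe files.
#include <vector>
#include <string>

// An explicit instantiation forces the compiler to generate every
// member of the class template, so the cost of this translation unit
// approximates the template's full instantiation cost.
template class std::vector<int>;
template class std::vector<std::string>;
```

Repeating this for each suspect template (and each element type you actually use) narrows down where the time goes.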
I've been trying to bump up the priority of that item ("finish templight coding") on my to-do list, and hope to get to it some time in the next few weeks. Part of being able to get to that to-do item is wrapping up my work on those major additions to the BGL (which is another side-project that has been lingering for far too long).
Which containers do you use in BGL?
I use most of them. The main chunk of the code is a complete re-implementation of the adjacency_list class with Boost.Container on the back-end. As you probably know, the adjacency_list class provides options for choosing the type of containers used under the hood, and so I implemented it for every sensible combination of vector, list, set, multiset, unordered_set, unordered_multiset, and "pool" (which is a vector container, but one that leaves "holes" in it upon removal of elements, to avoid invalidating indices / iterators to other elements).

My implementation of adjacency_list is much leaner than the existing one, but it seems to take quite a bit more time to compile (I haven't measured it), and so I now suspect the overhead is coming from the Boost containers. But I cannot really test against the STL containers any more, because I rely on incomplete-type guarantees and N3644 iterators ;) neither of which is implemented in standard STL releases. But I agree that the vector container is probably not the problem, because I use it elsewhere and it compiles fast (this is just my subjective impression, of course; it's hard to measure).

Cheers,
Mikael.
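P.S. For concreteness, here is a minimal sketch of what such a "pool" container could look like (illustrative only, std-only, not the actual implementation): erased slots become holes chained into a free-list, so indices to other elements stay valid, and later insertions reuse the holes.

```cpp
#include <vector>
#include <cstddef>

// Hypothetical sketch of a "pool": a vector that leaves holes on erase
// and reuses them on insert, keeping indices to other elements stable.
template <typename T>
class pool_vector {
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);
    struct slot {
        T value;
        bool occupied;
        std::size_t next_free;   // next hole in the free-list, or npos
    };
    std::vector<slot> slots_;
    std::size_t free_head_ = npos;
public:
    std::size_t insert(const T& v) {
        if (free_head_ != npos) {              // reuse a hole
            std::size_t i = free_head_;
            free_head_ = slots_[i].next_free;
            slots_[i] = slot{v, true, npos};
            return i;
        }
        slots_.push_back(slot{v, true, npos}); // grow at the end
        return slots_.size() - 1;
    }
    void erase(std::size_t i) {                // leaves a hole; other
        slots_[i].occupied = false;            // indices are unaffected
        slots_[i].next_free = free_head_;
        free_head_ = i;
    }
    T& operator[](std::size_t i) { return slots_[i].value; }
    bool occupied(std::size_t i) const { return slots_[i].occupied; }
};
```

The trade-off is wasted space for the holes in exchange for O(1) erase and stable indices, which is exactly what an adjacency-list graph wants when vertices are removed.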