Re: [boost] Profiling template instantiations

Steven Watanabe wrote:
Ok. It would be great if compilers supported this directly.
Just fyi: the cxx compiler has verbose template instantiation mode. On Tru64, for example:

x.cpp
-----
#include <map>
int main() {
  map<int, int> m;
  return m.size();
}

cosf.zko.hp.com> cxx -V
Compaq C++ V7.1-006 for Compaq Tru64 UNIX V5.1B (Rev. 2650)
Compiler Driver V7.1-006 (cxx) cxx Driver
cosf.zko.hp.com> cxx -ptv -c x.cpp
cxx: Info: /usr/lib/cmplrs/cxx/V7.1-006/include/cxx/tree.cc, line 84: automatically instantiating
  void _RWrwstd::_RWrb_tree<int, std::pair<const int, int> , _RWrwstd::_RWselect1st<std::pair<const int, int> , int> , std::less<int> , std::allocator<std::pair<const int, int> > > ::_RWdeallocate_buffers()
void _RWrb_tree<_Key, _Val, _KeyOf, _Comp, _Alloc>::_RWdeallocate_buffers ()
------------------------------------------------------^
cxx: Info: /usr/lib/cmplrs/cxx/V7.1-006/include/cxx/tree.cc, line 549: automatically instantiating
  void _RWrwstd::_RWrb_tree<int, std::pair<const int, int> , _RWrwstd::_RWselect1st<std::pair<const int, int> , int> , std::less<int> , std::allocator<std::pair<const int, int> > > ::_RWerase(_RWrwstd::_RWrb_tree<int, std::pair<const int, int> , _RWrwstd::_RWselect1st<std::pair<const int, int> , int> , std::less<int> , std::allocator<std::pair<const int, int> > > ::_RWrb_tree_node *)
void _RWrb_tree<_Key, _Val, _KeyOf, _Comp, _Alloc>::_RWerase (_RWlink_type x)
------------------------------------------------------^
cxx: Info: /usr/lib/cmplrs/cxx/V7.1-006/include/cxx/tree.cc, line 301: automatically instantiating
  _RWrwstd::_RWrb_tree<int, std::pair<const int, int> , _RWrwstd::_RWselect1st<std::pair<const int, int> , int> , std::less<int> , std::allocator<std::pair<const int, int> > > ::iterator _RWrwstd::_RWrb_tree<int, std::pair<const int, int> , _RWrwstd::_RWselect1st<std::pair<const int, int> , int> , std::less<int> , std::allocator<std::pair<const int, int> > > ::erase(_RWrwstd::_RWrb_tree<int, std::pair<const int, int> , _RWrwstd::_RWselect1st<std::pair<const int, int> , int> , std::less<int> , std::allocator<std::pair<const int, int> > > ::iterator)
_RWrb_tree<_Key, _Val, _KeyOf, _Comp, _Alloc>::erase (iterator position)
-------------------------------------------------^
cxx: Info: /usr/lib/cmplrs/cxx/V7.1-006/include/cxx/tree.cc, line 565: automatically instantiating
  _RWrwstd::_RWrb_tree<int, std::pair<const int, int> , _RWrwstd::_RWselect1st<std::pair<const int, int> , int> , std::less<int> , std::allocator<std::pair<const int, int> > > ::iterator _RWrwstd::_RWrb_tree<int, std::pair<const int, int> , _RWrwstd::_RWselect1st<std::pair<const int, int> , int> , std::less<int> , std::allocator<std::pair<const int, int> > > ::erase(_RWrwstd::_RWrb_tree<int, std::pair<const int, int> , _RWrwstd::_RWselect1st<std::pair<const int, int> , int> , std::less<int> , std::allocator<std::pair<const int, int> > > ::iterator, _RWrwstd::_RWrb_tree<int, std::pair<const int, int> , _RWrwstd::_RWselect1st<std::pair<const int, int> , int> , std::less<int> , std::allocator<std::pair<const int, int> > > ::iterator)
_RWrb_tree<_Key, _Val, _KeyOf, _Comp, _Alloc>::erase (iterator first,
-------------------------------------------------^
cosf.zko.hp.com>

----- Original Message -----
From: <boost@lists.boost.org>
To: <boost@lists.boost.org>
Sent: Thursday, May 08, 2008 8:29 PM
Subject: Re: [boost] Profiling template instantiations
AMDG
Simonson, Lucanus J wrote:
Ingenious, you are automatically inserting a compile time error into every basic block of the template code to be profiled, compiling the translation unit and counting how many errors are generated for each template to get a count of how many times the compiler tries (and fails) to instantiate the template.
Actually it only generates warnings. Otherwise, the compiler is liable to stop before it finishes compiling everything.
There may be a more direct way to extract this information with VTune related profiling features provided in icc. I'll follow up and let you know.
Ok. It would be great if compilers supported this directly.
In Christ, Steven Watanabe
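
For illustration, the warning-based approach described above could look roughly like the sketch below. It is not the actual profiling script: the names probe, PROFILE_PROBE, PROFILE_TEMPLATES and twice_size are made up for this example, and it assumes a C++14 compiler that warns about uses of [[deprecated]] entities. Because the probe is only reached through a dependent type, the diagnostic can only be issued once a concrete specialization is instantiated; whether a given compiler repeats it for every specialization is compiler-specific.

```cpp
#include <cassert>
#include <cstddef>

#ifdef PROFILE_TEMPLATES
// Hypothetical probe: using probe<Type>::hit is a use of a deprecated member,
// and since Type is dependent the compiler cannot diagnose it before the
// enclosing template is instantiated.
template <typename Type>
struct probe {
    [[deprecated("template instantiation probe")]] static const int hit = 0;
};
// The line a preprocessing script would paste into each template body.
#define PROFILE_PROBE(Type) enum { profile_probe_hit = probe<Type>::hit };
#else
// Without PROFILE_TEMPLATES the probe expands to nothing.
#define PROFILE_PROBE(Type)
#endif

// An example template to be profiled (assumed, not from the thread).
template <typename T>
struct twice_size {
    PROFILE_PROBE(T)  // inserted by the hypothetical script
    static const std::size_t value = 2 * sizeof(T);
};

// Sanity check: the probe must not change what the template computes.
inline void check_twice_size() {
    assert(twice_size<char>::value == 2u);
}
```

Building with PROFILE_TEMPLATES defined and counting the warnings that mention twice_size would then approximate its instantiation count. Without the macro the probe disappears and the code builds as usual.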

AMDG

Boris Gubenko wrote:
Steven Watanabe wrote:
Ok. It would be great if compilers supported this directly.
Just fyi: the cxx compiler has verbose template instantiation mode. On Tru64, for example:
<snip>
I only see function template instantiations, I'm more interested in class template instantiations because that's where the metaprogramming is done. Am I missing something?

I'd also like to have all template instantiations, not just those that are triggered from inside other template instantiations. (Although this doesn't make a huge difference)

In Christ,
Steven Watanabe
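
To make the distinction concrete, here is a minimal sketch of a metafunction whose work consists entirely of class template instantiations. The factorial example and the helper factorial_of_five are assumptions chosen for illustration, not code from this thread. No function template is instantiated anywhere, so a report that lists only function template instantiations would have nothing to say about it.

```cpp
#include <cassert>

// Hypothetical metafunction: each step of the recursion is a separate
// class template specialization.
template <unsigned N>
struct factorial {
    static const unsigned long value = N * factorial<N - 1>::value;
};

// Explicit specialization that ends the recursion.
template <>
struct factorial<0> {
    static const unsigned long value = 1;
};

// Reading factorial<5>::value makes the compiler instantiate the class
// templates factorial<5> down to factorial<1>; factorial<0> is the
// explicit specialization above.
inline unsigned long factorial_of_five() {
    assert(factorial<0>::value == 1ul);
    return factorial<5>::value;
}
```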

Boris Gubenko wrote:
Steven Watanabe wrote:
Ok. It would be great if compilers supported this directly.
Just fyi: the cxx compiler has verbose template instantiation mode. On Tru64, for example:
Steven wrote: I only see function template instantiations, I'm more interested in class template instantiations because that's where the metaprogramming is done. Am I missing something?
I'd also like to have all template instantiations, not just those that are triggered from inside other template instantiations. (Although this doesn't make a huge difference)
This is the icc documentation for the -prof-gen flag. Apparently it instruments the code for every basic block to enable profile guided optimization later on. This should include template instantiations and basic blocks from inlined functions.

-----------------------------------------
prof-gen, Qprof-gen

Instruments a program for profiling.

IDE Equivalent
  Windows: General > PGO Phase

Architectures
  IA-32 architecture, Intel(r) 64 architecture, IA-64 architecture

Syntax
  Linux and Mac OS X: -prof-gen
                      -prof-genx
  Windows:            /Qprof-gen
                      /Qprof-genx

Arguments
  None

Default
  OFF    Programs are not instrumented for profiling.

Description
  This option instruments a program for profiling to get the execution count of each basic block. It also creates a new static profile information file (.spi).

  If -prof-genx or /Qprof-genx is specified, extra information (source position) is gathered for code-coverage tools. If you do not use a code-coverage tool, this option may slow parallel compile times. If you are doing a parallel make, this option will not affect it.

  These options are used in phase 1 of the Profile Guided Optimizer (PGO) to instruct the compiler to produce instrumented code in your object files in preparation for instrumented execution.
------------------------------------------

Later on you would use -prof-gen-sampling to, among other things, create a map from object code to line number in the source code. This map should be a superset of data you are looking for, which is instantiation count for templates. For templates that don't end up with any object code (meta-functions) I think you would find no instantiations in the map, whereas you might still find that the compiler evaluated the meta-function many times with your warning based approach. I guess it depends on what you are looking for.
------------------------------------------------------
prof-gen-sampling, Qprof-gen-sampling

Prepares application executables for hardware profiling (sampling) and causes the compiler to generate source code mapping information.

IDE Equivalent
  None

Architectures
  IA-32 architecture

Syntax
  Linux and Mac OS X: -prof-gen-sampling
  Windows:            /Qprof-gen-sampling

Arguments
  None

Default
  OFF    Application executables are not prepared for hardware profiling and the compiler does not generate source code mapping information.

Description
  This option prepares application executables for hardware profiling (sampling) and causes the compiler to generate source code mapping information.

  The application executables are prepared for hardware profiling by using the profrun utility followed by a recompilation with option -prof-use (Linux and Mac OS X) or /Qprof-use (Windows). This causes the compiler to look for and use the hardware profiling information written by profrun (by default, into a file called pgopti.hpi).

  This option also causes the compiler to generate the information necessary to map hardware profile sample data to specific source code lines, so it can be used for optimization in a later compilation. The compiler generates both a line number and a column number table in the debug symbol table.

  This process can be used, for example, to collect cache miss information for use by option ssp on a later compilation.

Alternate Options
  None

See Also
  prof-use, Qprof-use compiler options
  ssp, Qssp compiler options
----------------------------------------------

My own interest is that I would like to do performance profiling and tuning of template instantiated code with VTune. It looks like these compiler options in icc are more well suited to that than the static information you are looking for.

Is there a way to write a meta-function that implements a counter?
The idea is that each time a template is instantiated by the compiler a meta-function counter (inserted by a script similar to your warning) would be evaluated. Then you could collect the counts computed at compile time and print them to file at runtime along with the associated type name information from the counter's template parameter.

  template <typename T>
  struct meta_counter {...};

  template < something>
  struct my_template {
    //inserted by script
    meta_counter<typeof_this_template>::increment_somehow;
    ...
  };

I'm not sure how retrieving the count would work. Perhaps you can create a global const set to the value and just use nm to retrieve the value. I have no idea how to fully implement this right now, but perhaps Steven can run with the idea.

Thanks,
Luke
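
One legal approximation of this idea, sketched below under several assumptions, records each specialization at program start-up instead of counting at compile time. All names other than meta_counter and my_template are hypothetical. The sketch relies on naming a static data member as a template argument to force its instantiation, a trick that some compilers may not honor. It records each distinct specialization once, so it does not measure how often the compiler evaluated a metafunction, and the strings returned by typeid(...).name() are implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <typeinfo>

// Hypothetical registry of specializations seen by the program. A
// function-local static avoids initialization-order problems.
inline std::set<std::string>& instantiation_registry() {
    static std::set<std::string> registry;
    return registry;
}

// Always-complete wrapper, so typeid never sees an incomplete class type.
template <typename T> struct type_tag {};

// Helper whose only purpose is to accept a reference as a template argument.
template <typename T, T> struct force_instantiation {};

template <typename Tag>
struct meta_counter {
    struct registrar {
        registrar() {
            instantiation_registry().insert(
                std::string(typeid(type_tag<Tag>).name()));
        }
    };
    static registrar instance;
    // Naming `instance` here is intended to make the compiler instantiate
    // its definition, so its constructor runs during static initialization.
    typedef force_instantiation<registrar&, instance> forced;
};

template <typename Tag>
typename meta_counter<Tag>::registrar meta_counter<Tag>::instance;

template <typename T>
struct my_template {
    // inserted by the (hypothetical) script
    typedef typename meta_counter<my_template<T> >::forced profile_hook;
    typedef T* type;
};

// Using two specializations registers two entries, even though
// my_template itself produces no object code.
inline std::size_t count_registered() {
    my_template<int>::type a = 0;
    my_template<double>::type b = 0;
    (void)a;
    (void)b;
    const std::size_t n = instantiation_registry().size();
    assert(n == 2);  // one entry per distinct specialization used above
    return n;
}
```

Printing the contents of instantiation_registry() at exit would give the list of specializations along with their (mangled) type names.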

AMDG

Simonson, Lucanus J wrote:
Later on you would use -prof-gen-sampling to, among other things, create a map from object code to line number in the source code. This map should be a superset of data you are looking for, which is instantiation count for templates. For templates that don't end up with any object code (meta-functions) I think you would find no instantiations in the map, whereas you might still find that the compiler evaluated the meta-function many times with your warning based approach. I guess it depends on what you are looking for.
Metafunction instantiations are my primary concern.
My own interest is that I would like to do performance profiling and tuning of template instantiated code with VTune. It looks like these compiler options in icc are more well suited to that than the static information you are looking for.
Yep. Oh well.
Is there a way to write a meta-function that implements a counter?
There is no legal way to implement such a counter, AFAIK. There are illegal ways, though.

In Christ,
Steven Watanabe
participants (3)
- Boris Gubenko
- Simonson, Lucanus J
- Steven Watanabe