Zeljko Vrba wrote:
On Fri, Jun 29, 2007 at 02:05:59AM -0500, Michael Marcin wrote:
There is a senior engineer that I work with who believes templates are slow and prefers to write C or assembly routines.
What does he base his belief on? And did he provide *any* proof for his reasoning? (Well, if he's in a higher position than you, he might not be required to do so. People listen to him because he's in a higher position, not because he has good arguments. Been there, experienced that.)
Apparently from looking at assembly generated from template code in the past, with old compilers and probably bad programmers.
Write some interesting code and generate the assembly for it. Analyze this assembly manually and save it off in source control. When the test suite is run, compile that code down to assembly again and have the test suite do a simple byte comparison of the two files.
I don't understand this part. What do you want to compare? Macro vs. template version? This will certainly *not* yield identical object files (because they contain symbol names, etc., along with the generated code).
Yes, this is a little confusing. Essentially the idea was to write snippets both in C and with templates and manually compare the generated assembly by looking at it. Then, after I'm satisfied with the results, the regenerate-and-compare tests would hopefully only fail if a meaningful change was made to the library code, at which point I would have to re-examine the files by hand again. A lot of work, especially when multiple configurations come into play.
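A minimal sketch of the byte-comparison step, assuming the hand-verified listing and the freshly regenerated listing are plain assembly files on disk; the file paths are placeholders for whatever the test suite actually produces.

#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

// Returns true if the two files are byte-for-byte identical.
bool files_identical( const std::string& path_a, const std::string& path_b )
{
    std::ifstream a( path_a.c_str(), std::ios::binary );
    std::ifstream b( path_b.c_str(), std::ios::binary );
    if ( !a || !b )
        return false;

    std::istreambuf_iterator<char> it_a( a ), it_b( b ), end;
    while ( it_a != end && it_b != end )
    {
        if ( *it_a != *it_b )
            return false;
        ++it_a;
        ++it_b;
    }
    return it_a == end && it_b == end;     // a length mismatch also fails
}

int main()
{
    // Placeholder paths: the listing checked into source control and the
    // listing regenerated by the current build.
    if ( !files_identical( "blessed/test_1.s", "generated/test_1.s" ) )
    {
        std::cerr << "generated assembly differs from the blessed copy\n";
        return 1;
    }
    return 0;
}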
Write the templated and C versions of the algorithms. Run the test suite to generate the assembly for each version. Write a parser and a heuristic to analyze the generated code of each. Grade and compare the results.
Unfortunately, it's almost meaningless to analyze the run-time performance of a program (beyond algorithmic complexity) without the actual input. "Register usage" is a vague term, and the number of function calls does not necessarily play a role (infrequent code paths, large functions, cache effects, etc.).
Whether it matters or not is another question, but you can look at the generated code and determine if the compiler is doing a good job. For instance, say I have:

class my_type
{
public:
    int value() const { return m_value; }
private:
    int m_value;
};

bool operator==( const my_type& lhs, const my_type& rhs )
{
    return lhs.value() == rhs.value();
}

bool test_1( my_type a, my_type b )
{
    return a == b;
}

bool test_2( int a, int b )
{
    return a == b;
}

Now if test_1 ends up calling a function for operator== or does any pushes onto the stack, it's not optimal and my_type and/or its operator== need to be fiddled with. It's this level of straightforward code I'm concerned with at the moment.
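A minimal sketch of the parser-and-heuristic idea, applied at exactly this level: count call and push mnemonics in the generated listing for each version and compare the totals. The GCC-style assembly syntax, the mnemonic matching, and the command-line handling below are assumptions for illustration, not part of the plan above.

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

struct asm_stats
{
    unsigned calls;
    unsigned pushes;
    asm_stats() : calls( 0 ), pushes( 0 ) {}
};

// Crude heuristic: count call and push mnemonics in an assembly listing.
asm_stats analyze( std::istream& in )
{
    asm_stats stats;
    std::string line;
    while ( std::getline( in, line ) )
    {
        std::istringstream tokens( line );
        std::string mnemonic;
        tokens >> mnemonic;                        // first token on the line
        if ( mnemonic.compare( 0, 4, "call" ) == 0 )
            ++stats.calls;                         // call, callq, ...
        else if ( mnemonic.compare( 0, 4, "push" ) == 0 )
            ++stats.pushes;                        // push, pushl, pushq, ...
    }
    return stats;
}

int main( int argc, char* argv[] )
{
    if ( argc != 3 )
        return 2;                                  // usage: analyze c.s templates.s
    std::ifstream c_listing( argv[1] );
    std::ifstream template_listing( argv[2] );
    asm_stats c_stats = analyze( c_listing );
    asm_stats t_stats = analyze( template_listing );
    std::cout << "C:        " << c_stats.calls << " calls, " << c_stats.pushes << " pushes\n"
              << "template: " << t_stats.calls << " calls, " << t_stats.pushes << " pushes\n";
    // "Grade": the templated version should do no worse than the C version.
    return ( t_stats.calls <= c_stats.calls && t_stats.pushes <= c_stats.pushes ) ? 0 : 1;
}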
Does anyone have any input/ideas/suggestions?
How about traditional profiling? Write a test suite that feeds the same input to the C and C++ versions and compares their run times. Compile once with optimizations, another time with profiling, and compare the run times and the hot spots shown by the profiler.
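A minimal sketch of that kind of harness, assuming std::clock offers usable resolution on the platform; c_version() and template_version() are trivial placeholders standing in for the real implementations under test.

#include <ctime>
#include <iostream>

// Placeholder implementations of the two versions under test.
int c_version( int x )        { return x * 2; }
int template_version( int x ) { return x * 2; }

// Time 'iterations' calls of f on the same input and return elapsed CPU seconds.
template <typename Func>
double seconds_for( Func f, int input, long iterations )
{
    volatile int sink = 0;                 // keeps the calls from being optimized away entirely
    std::clock_t start = std::clock();
    for ( long i = 0; i < iterations; ++i )
        sink = f( input );
    std::clock_t stop = std::clock();
    (void)sink;
    return double( stop - start ) / CLOCKS_PER_SEC;
}

int main()
{
    const long iterations = 100000000L;    // large enough to swamp clock granularity
    std::cout << "C version:        " << seconds_for( c_version, 21, iterations ) << "s\n";
    std::cout << "template version: " << seconds_for( template_version, 21, iterations ) << "s\n";
    return 0;
}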
As I said before, there is no reliable timing mechanism available, and the process of compiling, installing, and running programs on this target cannot be automated, AFAIK.

Thanks,
Michael Marcin