[Test] Testing generated code.
I have written type-safe wrappers around some math operations that I need to use everywhere, and I expect and rely on them being equivalent to the hand-coded or macro-based approach for performance reasons. I tend to check the resulting assembly by hand every so often, and it surprises me from time to time to see that inefficiencies have crept in.

If I write a routine in C and a routine using the type-safe wrappers that I expect to have equivalent generated assembly, is there a way to test that this is indeed the case? Can anyone else think of a better way to make sure that register usage is optimal, proper inlining is occurring, etc.?

Thanks,
Michael Marcin
"Michael Marcin"
I have written type safe wrappers around some math operations that I need to use everywhere and I expect and rely on them being equivalent to the handcoded or macro based approach for performance reasons.
I tend to check the resulting assembly by hand every so often and it surprises me from time to time to see that inefficiencies have crept in.
If I write a routine in C and a routine using the type safe wrappers that I expect to have equivalent generated assembly is there a way to test that this is indeed the case?
Can anyone else think of a better way to make sure that register usage is optimal, proper inlining is occurring, etc?
You can make this routine a template with a parameter policy type modeling either wrapped or non-wrapped operations. Then you can instantiate this function with the wrapped and unwrapped parameter policies. Measure and compare the performance of these calls over N invocations. HTH, Gennadiy
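P.S. Something along these lines, as a minimal sketch (the names raw_ops, wrapped_ops, fixed, and scale are placeholders, not from your library):

    // Minimal sketch of the policy idea: one routine, two operation policies.
    #include <iostream>

    struct raw_ops                      // hand-coded / macro-style path
    {
        typedef int value_type;
        static int mul( int a, int b )
        { return (int)( ( (long long)a * b ) >> 16 ); }
    };

    struct fixed                        // toy stand-in for the type-safe wrapper
    {
        explicit fixed( int v ) : raw( v ) {}
        int raw;
    };

    struct wrapped_ops                  // type-safe wrapper path
    {
        typedef fixed value_type;
        static fixed mul( fixed a, fixed b )
        { return fixed( (int)( ( (long long)a.raw * b.raw ) >> 16 ) ); }
    };

    // Written once, parameterized on the operations policy.
    template <class Ops>
    typename Ops::value_type scale( typename Ops::value_type a,
                                    typename Ops::value_type b )
    {
        return Ops::mul( Ops::mul( a, b ), b );
    }

    int main()
    {
        // Instantiate the same routine with each policy.  On a host you can
        // time N invocations of each, or diff the generated assembly.
        int   r1 = scale<raw_ops>( 3 << 16, 2 << 16 );
        fixed r2 = scale<wrapped_ops>( fixed( 3 << 16 ), fixed( 2 << 16 ) );
        std::cout << r1 << ' ' << r2.raw << '\n';   // should print the same value twice
    }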
Gennadiy Rozental wrote:
"Michael Marcin"
wrote in message news:f5pev8$dfe$1@sea.gmane.org... I have written type safe wrappers around some math operations that I need to use everywhere and I expect and rely on them being equivalent to the handcoded or macro based approach for performance reasons.
I tend to check the resulting assembly by hand every so often and it surprises me from time to time to see that inefficiencies have crept in.
If I write a routine in C and a routine using the type safe wrappers that I expect to have equivalent generated assembly is there a way to test that this is indeed the case?
Can anyone else think of a better way to make sure that register usage is optimal, proper inlining is occurring, etc?
You can make this routine a template with a parameter policy type modeling either wrapped or non-wrapped operations.
Then you can instantiate this function with the wrapped and unwrapped parameter policies. Measure and compare the performance of these calls over N invocations.
Unfortunately this is primarily for a mobile device that is (relatively) easy to compile for but annoyingly tedious to execute programs on. Additionally, there is no reliable timing API available.

The problem is a little more involved than I initially let on. There is a senior engineer that I work with who believes templates are slow and prefers to write C or assembly routines. The "templates are slow" belief has spread to some other members of the company. In fact the next major project is being written almost entirely in C (luckily I'm not part of that endeavor). Unfortunately my words and literature references aren't doing much to dispel this belief. I've had a few isolated successes by showing identical or better assembly output to some engineers in off-the-cuff examples.

What I think I need is a suite of tests so that I can quickly and definitively back up any claim I make as to the efficiency of my code versus a hand-coded or macro-based implementation. And as a side effect I'd get to ensure I'm generating good code even in the face of changes and refactorings.

I can only think of two options at the moment.

A. Write some interesting code and generate the assembly for it. Analyze this assembly manually and save it off in source control. When the test suite is run, compile that code down to assembly again and have the test suite do a simple byte comparison of the two files.

B. Write the templated and C versions of the algorithms. Run the test suite to generate the assembly for each version. Write a parser and a heuristic to analyze the generated code of each. Grade and compare each:

    assembly_analyzer t( "template_impl.asm" );
    assembly_analyzer c( "c_impl.asm" );
    BOOST_CHECK_EQUAL( t.register_usage() <= c.register_usage(), true );
    BOOST_CHECK_EQUAL( t.function_calls() <= c.function_calls(), true );
    etc.

Unless there already exists a tool to do 90% of B it is way too complex to tackle. A seems like it should be possible, if a lot of work. Then again, maybe I should just leave everything alone and spend my time on other ventures.

Does anyone have any input/ideas/suggestions?

Thanks,
Michael Marcin
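P.S. To make option A concrete, the check itself would be little more than a byte comparison once the build regenerates the assembly. A rough sketch (the file names and the idea of a "blessed", hand-reviewed copy kept in source control are just placeholders):

    // Rough sketch of option A: byte-compare freshly generated assembly
    // against a manually reviewed copy kept in source control.  Assumes a
    // build step has already produced generated/template_impl.asm.
    #define BOOST_TEST_MODULE generated_code
    #include <boost/test/included/unit_test.hpp>
    #include <fstream>
    #include <iterator>
    #include <vector>

    namespace {
        std::vector<char> read_file( const char* path )
        {
            std::ifstream in( path, std::ios::binary );
            BOOST_REQUIRE_MESSAGE( in, "could not open " << path );
            return std::vector<char>( std::istreambuf_iterator<char>( in ),
                                      std::istreambuf_iterator<char>() );
        }
    }

    BOOST_AUTO_TEST_CASE( template_asm_matches_blessed_copy )
    {
        std::vector<char> blessed   = read_file( "blessed/template_impl.asm" );
        std::vector<char> generated = read_file( "generated/template_impl.asm" );

        BOOST_CHECK_MESSAGE( generated == blessed,
            "template_impl.asm differs from the blessed copy - re-review by hand" );
    }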
On Fri, Jun 29, 2007 at 02:05:59AM -0500, Michael Marcin wrote:
There is a senior engineer that I work with who believes templates are slow and prefers to write C or assembly routines. The templates are slow
What does he base his belief on? And did he provide *any* proof for his reasoning? (Well, if he's in a higher position than you, he might not be required to do so. People listen to him because he's in a higher position, not because he has good arguments. Been there, experienced that.)

Templates are a *purely* compile-time mechanism and thus amenable to optimization at compile time. You might try explaining that templates are just an "advanced preprocessor".

[Or he might just be "rationalizing" his unwillingness to learn something new. If this is the case, then any argument with him is probably lost a priori. Your best bet would be to show the people willing to listen that he's wrong. But even *if* you show that templates are no less efficient than C + macros, you will have to show what *advantage* they have over C. So you need to make your argument based on two things: 1) no efficiency loss, 2) advantages over C. And you need to show this argument to people who are 1) willing to listen, and 2) able to override the senior programmer. Those people will probably be interested in the time that others on the team will need to learn templates, etc.]
Write some interesting code and generate the assembly for it. Analyze this assembly manually and save it off in source control. When the test suite is run compile that code down to assembly again and have the test suite do a simple byte comparison of the two files.
I don't understand this part. What do you want to compare? Macro vs. template version? This will certainly *not* yield an identical object file (because it contains symbol names, etc., along with the generated code).
Write the templated and C versions of the algorithms. Run the test suite to generate the assembly for each version. Write a parser and a heuristic to analyze the generated code of each. Grade and compare each
Unfortunately, it's almost meaningless to analyze the run-time performance of a program (beyond algorithmic complexity) without the actual input. "Register usage" is a vague term, and the number of function calls does not have to play a role (infrequent code paths, large functions, cache effects, etc).
Does anyone have any input/ideas/suggestions?
How about traditional profiling? Write a test suite that feeds the same input to the C and C++ versions and compares their run-time. Compile once with optimizations, another time with profiling, and compare the run-times and hot spots shown by the profiler.
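Something like this, as a very rough host-side sketch (transform_c and transform_tmpl are placeholders for your real C and template implementations):

    // Rough harness: run the C version and the template version on the same
    // input and compare CPU time.  The two transform functions are dummies
    // standing in for the real implementations.
    #include <cstddef>
    #include <cstdlib>
    #include <ctime>
    #include <iostream>
    #include <vector>

    int transform_c( int x )    { return ( x * 3 ) >> 1; }   // stand-in: C version
    int transform_tmpl( int x ) { return ( x * 3 ) >> 1; }   // stand-in: template version

    template <class F>
    double run( F f, const std::vector<int>& input, int& checksum )
    {
        std::clock_t start = std::clock();
        int sum = 0;
        for ( std::size_t i = 0; i != input.size(); ++i )
            sum += f( input[i] );
        checksum = sum;                          // keep the work observable
        return double( std::clock() - start ) / CLOCKS_PER_SEC;
    }

    int main()
    {
        std::vector<int> input( 10 * 1000 * 1000 );
        for ( std::size_t i = 0; i != input.size(); ++i )
            input[i] = std::rand();              // identical input for both versions

        int c1 = 0, c2 = 0;
        double t_c    = run( transform_c,    input, c1 );
        double t_tmpl = run( transform_tmpl, input, c2 );

        std::cout << "C:        " << t_c    << " s\n"
                  << "template: " << t_tmpl << " s\n"
                  << ( c1 == c2 ? "checksums match\n" : "checksums DIFFER\n" );
    }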
-----Original Message-----
From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Zeljko Vrba
Sent: 29 June 2007 08:40
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] [Test] Testing generated code.
On Fri, Jun 29, 2007 at 02:05:59AM -0500, Michael Marcin wrote:
There is a senior engineer that I work with who believes templates are slow and prefers to write C or assembly routines. The templates are slow
There is some interesting evidence (I don't have the web address) that shows that templates are actually considerably faster than some of the standard C library functions (e.g. qsort vs. std::sort). This is because templates are optimised for the actual type, whereas in many cases the standard C functions (although written in assembler) have to carry extra code (and are therefore slower) to handle more generic types. Obviously, the exact differences will depend on the type of code being used.

Try google searches on this - I was surprised at the difference - perhaps your senior engineer will be also!!

James
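P.S. A quick way to see this for yourself is to sort the same data both ways; a rough sketch (nothing clever, timings via std::clock on a desktop machine):

    // Rough sketch: sort identical data with C qsort (comparison through a
    // function pointer) and with std::sort (comparator inlined for the type).
    #include <algorithm>
    #include <cstdlib>
    #include <ctime>
    #include <iostream>
    #include <vector>

    extern "C" int compare_ints( const void* a, const void* b )
    {
        int lhs = *static_cast<const int*>( a );
        int rhs = *static_cast<const int*>( b );
        return lhs < rhs ? -1 : ( lhs > rhs ? 1 : 0 );
    }

    int main()
    {
        std::vector<int> original( 5 * 1000 * 1000 );
        for ( std::size_t i = 0; i != original.size(); ++i )
            original[i] = std::rand();

        std::vector<int> a = original;
        std::clock_t t0 = std::clock();
        std::qsort( &a[0], a.size(), sizeof( int ), compare_ints );
        double qsort_time = double( std::clock() - t0 ) / CLOCKS_PER_SEC;

        std::vector<int> b = original;
        std::clock_t t1 = std::clock();
        std::sort( b.begin(), b.end() );         // operator< is inlined
        double sort_time = double( std::clock() - t1 ) / CLOCKS_PER_SEC;

        std::cout << "qsort:     " << qsort_time << " s\n"
                  << "std::sort: " << sort_time  << " s\n"
                  << ( a == b ? "results match\n" : "results DIFFER\n" );
    }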
Zeljko Vrba wrote:
On Fri, Jun 29, 2007 at 02:05:59AM -0500, Michael Marcin wrote:
There is a senior engineer that I work with who believes templates are slow and prefers to write C or assembly routines. The templates are slow
What does he base his belief on? And did he provide *any* proof for his reasoning? (Well, if he's in a higher position than you, he might not be required to do so. People listen to him because he's in higher position, not because he has good arguments. Been there, experienced that.)
Apparently from looking at generated assembly from template code in the past with old compilers and probably bad programmers.
Write some interesting code and generate the assembly for it. Analyze this assembly manually and save it off in source control. When the test suite is run compile that code down to assembly again and have the test suite do a simple byte comparison of the two files.
I don't understand this part. What do you want to compare? Macro vs. template version? This will certainly *not* yield identical object file (because it contains symbol names, etc. along with generated code).
Yes, this is a little confusing. Essentially the idea was to write snippets both in C and with templates and manually compare the generated assembly by looking at it. Then, after I'm satisfied with the results, the regenerate-and-compare tests would hopefully fail only if a meaningful change was made to the library code, at which point I would have to re-examine the files by hand again. A lot of work, especially when multiple configurations come into play.
Write the templated and C versions of the algorithms. Run the test suite to generate the assembly for each version. Write a parser to and a heuristic to analyze the generated code of each. Grade and compare each
Unfortunately, it's almost meaningless to analyze the run-time performance of a program (beyond algorithmic complexity) without the actual input. "Register usage" is a vague term, and the number of function calls does not have to play a role (infrequent code paths, large functions, cache effects, etc).
Whether it matters or not is another question, but you can look at generated code and determine if the compiler is doing a good job. For instance, say I have:

    class my_type
    {
    public:
        int value() const { return m_value; }
    private:
        int m_value;
    };

    bool operator==( const my_type& lhs, const my_type& rhs )
    {
        return lhs.value() == rhs.value();
    }

    bool test_1( my_type a, my_type b ) { return a == b; }
    bool test_2( int a, int b )         { return a == b; }

Now if test_1 ends up calling a function for operator== or does any pushes onto the stack, it's not optimal and my_type and/or its operator== need to be fiddled with. It's this level of straightforward code I'm concerned with at the moment.
Does anyone have any input/ideas/suggestions?
How about traditional profiling? Write a test suite that feeds the same input to the C and C++ versions and compares their run-time. Compile once with optimizations, another time with profiling, and compare the run-times and hot spots shown by the profiler.
As I said before, there is no reliable timing mechanism available, and the process of compiling, installing, and running programs on this target cannot be automated, AFAIK.

Thanks,
Michael Marcin
On Fri, Jun 29, 2007 at 03:24:47PM -0500, Michael Marcin wrote:
Whether it matters or not is another question, but you can look at generated code and determine if the compiler is doing a good job.
For instance say I have:
Yes, you can determine that. But, IMHO, not by a fixed metric whose computation can be automated by static analysis.
Now if test_1 ends up calling a function for operator== or does any pushes onto the stack, it's not optimal and my_type and/or its operator== need to be fiddled with.
*OR* you need to fiddle with compiler options because an inlining limit has been reached.
As I said before there is no reliable timing mechanism available and the process of compiling, installing, and running programs on this target cannot be automated AFAIK.
If the target (CPU+OS) uses a common CPU, you can acquire a machine with the same CPU and an OS that allows you to do proper profiling. If the problem is with the CPU itself... then I'm out of ideas. I would personally go down the route of figuring out how to do empirical measurements rather than static analysis.

As for static analysis - I'd begin with a list of "blacklisted" functions, i.e. those that MUST be inlined in good code, and grep the generated ASM for calls to these functions. Simple (once you manually prepare the list) and easily automated (fgrep). With modern CPUs and without input, anything else is not a reliable indication of run-time performance (again, IMHO).

Oh, and read your compiler's docs :) Some compilers can generate optimizer reports. E.g. Intel's compiler has options to report the optimizer's actions during compilation, and Sun's compiler has a separate tool to analyze the final executable (er_src) and report on inlining, loop transforms, etc.

Best regards,
Zeljko.
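P.S. In the spirit of the fgrep idea, a toy sketch of how such a check could sit inside a Boost.Test suite (the file name, the blacklist entries, and the crude substring matching are all made up for illustration):

    // Toy sketch of the "blacklist" idea: scan generated assembly for any
    // mention of functions that must have been inlined.  Deliberately crude -
    // a real version would restrict itself to call/branch instructions and
    // to the toolchain's actual symbol mangling.
    #define BOOST_TEST_MODULE asm_blacklist
    #include <boost/test/included/unit_test.hpp>
    #include <cstddef>
    #include <fstream>
    #include <string>

    BOOST_AUTO_TEST_CASE( no_references_to_blacklisted_functions )
    {
        const char* blacklist[] = {
            "operator==",         // must be inlined
            "my_type::value",     // must be inlined
        };
        const std::size_t n = sizeof( blacklist ) / sizeof( blacklist[0] );

        std::ifstream in( "generated/template_impl.asm" );
        BOOST_REQUIRE_MESSAGE( in, "could not open generated assembly" );

        std::string line;
        int lineno = 0;
        while ( std::getline( in, line ) )
        {
            ++lineno;
            for ( std::size_t i = 0; i != n; ++i )
                BOOST_CHECK_MESSAGE(
                    line.find( blacklist[i] ) == std::string::npos,
                    "line " << lineno << " references " << blacklist[i] );
        }
    }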
participants (4)
- Gennadiy Rozental
- Hughes, James
- Michael Marcin
- Zeljko Vrba