
Martin Slater wrote:
I believe the Intel machine has hardware instructions which implement strcmp and that compilers support them. So even if strcmp is the bottleneck, I wouldn't expect it to show up on the profiler unless some sort of inlining were turned off. Or maybe the VTune profiler has special provision for these cases somewhere.
VTune doesn't have any special provision for this; if a function is inlined, its cost will just show up in the function it was inlined into. Under VC, strcmp is by default just a regular function call, but you can enable it as a compiler intrinsic (#pragma intrinsic(strcmp)) and the compiler may well then generate much better code (I know enabling memcpy this way can reduce memcpy(&a, &b, sizeof(int)) to a simple register mov in places).
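To make the intrinsic point concrete, here is a minimal sketch. The helper names copy_int and same_key are hypothetical, made up for illustration. The pragma is guarded so that other compilers ignore it, and whether a given compiler actually inlines either call depends on its version and optimisation settings, so treat this as an illustration rather than a guarantee.

```cpp
#include <cstring>

// Visual C++ only: ask the compiler to treat these as intrinsics instead of
// ordinary library calls. Other compilers skip this block entirely.
#if defined(_MSC_VER)
#pragma intrinsic(strcmp, memcpy)
#endif

// Hypothetical helper: a fixed-size memcpy of one int. With the intrinsic
// form enabled, an optimising compiler may reduce this to a register move.
int copy_int(const int& src) {
    int dst;
    std::memcpy(&dst, &src, sizeof(int));
    return dst;
}

// Hypothetical helper: if strcmp is expanded inline here, a profiler will
// attribute its cost to same_key rather than to a separate strcmp entry.
bool same_key(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}
```

If strcmp still appears as its own entry in the profile after a change like this, that suggests the call was not expanded inline in that build.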
Hmmm - then the fact it showed up on the profiler suggests that the program wasn't compiled with full optimisation? You might want to expand upon this.
If you're interested in VTune, they do an evaluation version at https://registrationcenter.intel.com/EvalCenter/EvalForm.aspx?ProductID=319. It is simply the best profiling tool I have ever used.
At one time I had the Intel eval compiler installed and it was very good. My license expired and I just didn't have the incentive to actually pay for it. Too bad, I would have liked to have it in my test suite.
I'd be more than happy to help out with any profiling and optimisation I can.
Well, you'll get your chance pretty soon. Soon I'll be checking my test_overhead program into the development tree. I think I can pass the compiler switches to get it to generate a profile, at least for gcc, but I'm struggling to figure out how to get bjam to invoke gprof to display the profile in the output and to make sure I can see it in the test matrix. So you may get your chance to make this work for VTune. I'm surprised that profiling / benchmarking isn't commonly part of the test suites of Boost libraries.

Robert Ramey