
Martin Slater wrote:
I believe the Intel machine has hardware instructions which implement strcmp and that compilers support them. So even if strcmp is the bottleneck, I wouldn't expect it to show up on the profiler unless some sort of inlining were turned off. Or maybe the VTune profiler has special provision for these cases somewhere.
VTune doesn't have any special provision for this; if a function is inlined, its cost just shows up in the function it was inlined into. Under VC, strcmp is by default just a regular function call, but you can enable it as a compiler intrinsic (#pragma intrinsic(strcmp)) and the compiler may well then generate much better code (I know enabling memcpy this way can reduce memcpy(&a, &b, sizeof(int)) to a simple register mov in places).
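A minimal sketch of what I mean by enabling the intrinsic form. The helper names copy_int and same_name are made up for illustration, and the pragma is guarded so the file still builds on compilers other than VC. Whether the calls actually get expanded inline depends on the compiler version and optimisation settings, so check the generated assembly rather than taking it on trust.

```cpp
#include <string.h>

#ifdef _MSC_VER
// VC only: request the intrinsic (inline) forms of these functions.
// The compiler is still free to emit a normal call if it prefers.
#pragma intrinsic(strcmp, memcpy)
#endif

// Hypothetical helper: copies one int through memcpy. With the intrinsic
// enabled and optimisation on, this can collapse to a plain register move.
inline int copy_int(int src) {
    int dst = 0;
    memcpy(&dst, &src, sizeof(int));
    return dst;
}

// Hypothetical helper: string equality via strcmp, a candidate for
// inline expansion instead of a library call.
inline bool same_name(const char* a, const char* b) {
    return strcmp(a, b) == 0;
}
```

If the call is expanded inline, its time will be attributed to the caller in the profile, as described above.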
I did check to verify that the strcmp in the type-id lookup has been removed. Instead we just make sure there is only one instance of a particular extended_type_info record so that we can just compare the addresses. There are still some optimizations
This is very good; I was looking at how to do this myself, so I am very happy I now don't have to ;)
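For anyone following along, here is roughly how I picture the single-instance idea. This is only a sketch under my own assumptions: type_record, type_name and record_for are hypothetical names, not the library's actual extended_type_info code. It also assumes everything lives in one binary; across DLL boundaries a function-local static in a template may end up duplicated, which would break the address comparison.

```cpp
#include <cstring>

// Hypothetical stand-in for a per-type record (not the library's class).
struct type_record {
    const char* key;  // unique string identifying the type
};

// Hypothetical key source, specialised for each registered type.
template <class T> struct type_name;
template <> struct type_name<int>    { static const char* value() { return "int"; } };
template <> struct type_name<double> { static const char* value() { return "double"; } };

// Returns the one and only record for T. The function-local static
// guarantees a single instance per type within one binary.
template <class T>
const type_record& record_for() {
    static const type_record instance{type_name<T>::value()};
    return instance;
}

// Old approach: compare the key strings character by character.
inline bool same_type_slow(const type_record& a, const type_record& b) {
    return std::strcmp(a.key, b.key) == 0;
}

// New approach: with exactly one record per type, equal addresses
// mean equal types, so a pointer comparison is enough.
inline bool same_type_fast(const type_record& a, const type_record& b) {
    return &a == &b;
}
```

Both comparisons give the same answer as long as the one-instance guarantee holds; the fast one just never touches the string.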
to be implemented - but I can't predict how much they will speed up anything.
Prediction in optimisation I have found to be nigh on impossible; without a profiler, or at the very least extremely heavy instrumentation within your code, you will always get a shock as to where the time is spent. VC6 was a nightmare in this regard: for example, it would not inline some trivial functions unless they were given __forceinline, causing some potentially extremely fast code to run pathetically slowly.
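The usual workaround for the inlining problem is a small macro along these lines. FORCE_INLINE, Vec2 and dot are names I have made up for the example; __forceinline is the VC keyword and always_inline is the GCC attribute. Even with these, the compiler can still refuse to inline in some cases, so the profiler remains the final judge.

```cpp
// Hypothetical portability macro: strongest available inline request.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline  // fall back to an ordinary inline hint
#endif

// A trivial type of the kind that suffers badly when accessors and
// small helpers are left as real function calls.
struct Vec2 {
    float x;
    float y;
};

// Trivial function we want expanded at every call site.
FORCE_INLINE float dot(const Vec2& a, const Vec2& b) {
    return a.x * b.x + a.y * b.y;
}
```

I would only apply this to functions the profiler has actually flagged, since forcing inlining everywhere can bloat the code and make things slower.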
If you're interested in VTune, they offer an evaluation version at https://registrationcenter.intel.com/EvalCenter/EvalForm.aspx?ProductID=319. It is simply the best profiling tool I have ever used.
I'd be more than happy to help out with any profiling and optimisation I can.
Martin.