And now I'd like to optimize call_f's implementation by avoiding the creation of the T object when it is not necessary (as in bar, whose memfun f is static).
IMHO this is unnecessary if you are only interested in avoiding the instantiation of a small object. I expect the compiler to optimize it away. I once played with template code that had nesting levels of 50 and more, with functions calling functions calling ... and so on, down to C library code. Link-time optimization eliminated all of those layers: the first step in the debugger went straight through to the C library call. The compiler sorted it out for me, and none of my own code's symbols showed up in a symbol table dump.

OTOH this remains an interesting question: it might be useful if you want to make sure you accept only those classes that have the static version, for reasons other than runtime performance.

Markus
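
P.S. For the "only accept classes with the static version" case, here is a minimal sketch of how I would probably approach it, assuming C++17. The names foo, has_static_f and call_static_f are made up for illustration, and I am guessing that f takes no arguments and returns int; only call_f, bar and f come from the question. The trait relies on the expression T::f() being ill-formed when f is a non-static member, so the partial specialization silently drops out for such classes.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the classes in the question.
// constexpr is used only so that the results can be checked at compile time.
struct foo { constexpr int f() const { return 1; } };   // non-static f
struct bar { static constexpr int f() { return 2; } };  // static f

// Primary template: assume T has no f that is callable without an object.
template <class T, class = void>
struct has_static_f : std::false_type {};

// Chosen only when the expression T::f() is well-formed,
// i.e. when f can be called without an object.
template <class T>
struct has_static_f<T, std::void_t<decltype(T::f())>> : std::true_type {};

// Creates a T only when f actually needs an object.
template <class T>
constexpr int call_f() {
    if constexpr (has_static_f<T>::value) {
        return T::f();
    } else {
        return T().f();
    }
}

// Strict variant: refuses to compile for classes whose f is not static.
template <class T>
constexpr int call_static_f() {
    static_assert(has_static_f<T>::value, "T::f must be static");
    return T::f();
}
```

With this, call_f<bar>() never constructs a bar, call_f<foo>() still goes through a temporary foo, and call_static_f<foo>() is rejected at compile time with the message above.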