
Do you think the overhead of calling through boost::bind could be comparable to the length of time it takes to run the function?
I don't know -- I haven't looked -- where or why boost::bind was added, but any unnecessary overhead should be eliminated unless it is proven to be insignificant.
It seems that it's not a problem. [....]
I suggest something that simply iterates over the test data but does not check the correctness of the parsing. Although it won't make a fat lot of difference in this case, at least it's then consistent: you're timing the parsers, not the tests for equality. The correctness test could then be done later, once the timings are complete.
I should have done that from the beginning. It would be best, I think, to run through the data once for each test to verify the result; then the timing runs can be done without any checks. Both should run from main() each time, so that any optimizations are validated for correctness before the performance is considered.
Sure.
Does the size of the test data set matter? In other words do you notice similar speedups if the test data will all fit in cache?
Wouldn't that give less representative performance results?
I guess it depends what you're trying to represent when measuring the performance. If you're trying to represent the [totally contrived] case of parsing short segments of text that all fit in cache, surely it's better? Or is your question more about the distribution of the actual values under test, rather than the length?

My suggestion is merely curiosity about whether there is a cache-dependent effect or not, rather than a 'representative' measurement. I don't for a minute think it's a good idea to permanently modify someone's hard-thought-through and exhaustive testing.

As a tongue-in-cheek aside - there's potentially some merit in having a parser that's particularly fast for numbers starting with "123". http://en.wikipedia.org/wiki/Benford%27s_law

Regards,

-ed