
On Tue, Aug 23, 2011 at 11:12 PM, Gottlob Frege <gottlobfrege@gmail.com> wrote:
> raw_move assumes that later the algorithm will raw_move back into the temporarily invalid source object. So sooner or later we write to that memory. Depending on caching scenarios, it might actually be faster to write to that memory *sooner*.
We can write sooner, but that doesn't change anything about our need to write later, does it? So either we write soon with zeros and later with the new value, or we write only later with the new value. Or are you saying that the compiler might be able to predict what value we're going to move later, and move that value sooner? Or was your point that merely reading the source object of a raw move might not be enough to promote it in the cache, so that writing it with zeros instead would speed up retrieval once the object becomes the target of a new raw move?
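For concreteness, here is a minimal sketch of what I understand raw_move to mean. The name, the interface, and the Buf type are assumptions for illustration only; the point is that, unlike a regular move (which would also write to the source, e.g. nulling a pointer member), raw_move performs no write to the source at all:

```cpp
#include <cstddef>
#include <cstring>

// Illustrative owning type: in a real class, data would be an owning
// pointer. Plain members suffice to show the byte-level behaviour.
struct Buf {
    char*       data;
    std::size_t size;
};

// Hypothetical raw_move: copy the object representation of src into dst
// and leave src's bytes untouched. src is then "raw" (invalid) -- it
// aliases dst's resource -- until something is raw_move'd back into it.
void raw_move(Buf& dst, Buf& src) {
    std::memcpy(&dst, &src, sizeof(Buf));
    // Deliberately no zeroing write to src: skipping that write is the
    // entire claimed saving over a regular move.
}
```

After raw_move(a, b), only a may be used; the algorithm (a rotation, say) is expected to raw_move some other element into b before the operation completes.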
> [...] But I wouldn't be that surprised if raw_move offered no speed up in most cases.
To be honest, in the meantime I wouldn't be surprised by that either. Christopher Jefferson has already argued quite convincingly that there may be very little to gain. I'm still going to write a benchmark, but I'm prepared for the possibility that it turns out to be no more than a benchmarking exercise.
> Particularly if you are still calling raw_move one container element at a time. If you really want to speed things up, you need to memcpy a whole block of objects at once, i.e., an array/vector of PODs. If 100 elements means 100 memcpy calls, I'm not sure you will get much benefit; if we can get 100 elements to compile into one memcpy call, there is a chance of a speed-up. A good memcpy is hand-optimized for the given architecture to prime the cache as it moves along, etc., but that only works if it is one big call.
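To illustrate the contrast (element type and counts are illustrative, and restricted to trivially copyable elements where memcpy is valid):

```cpp
#include <cstddef>
#include <cstring>

// Per-element transfer: at best, this is what an element-at-a-time
// raw_move compiles to -- n small calls, each too short for a
// hand-tuned, cache-priming memcpy to pay off.
void move_per_element(int* dst, const int* src, std::size_t n) {
    for (std::size_t i = 0; i != n; ++i)
        std::memcpy(&dst[i], &src[i], sizeof(int));
}

// Bulk transfer: one call over the whole contiguous block, which is
// where an architecture-specific memcpy can actually stride ahead.
void move_bulk(int* dst, const int* src, std::size_t n) {
    std::memcpy(dst, src, n * sizeof(int));
}
```

In practice a compiler may fuse the per-element loop into a single memcpy for trivially copyable types, but a generic raw_move call per element gives it no guarantee that it may do so.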
Yes, I agree that this is a much more powerful way to speed up a program.

-Julian