
On Friday 05 August 2011 20:54:50 Grund, Holger wrote:
Hi Helge,
Keep an "even" and an "odd" copy of your data structure, plus an atomically readable "generation counter". On access, read the generation counter, read the data (which copy you read depends on the parity of the generation counter), then read the generation counter again.
If it changed, start over; if it didn't change, you have your data. On modification, update the generation counter as appropriate. (If you are paranoid about counter overflows, you can repeat a similar trick with the counter itself.)
There is no need for anything larger than word-sized atomics here; the size of the shared read-only data structure does not matter.
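A minimal sketch of the even/odd scheme described above, in C++11 atomics. The names (EvenOddCell, Snapshot, kWords) are hypothetical and only for illustration. It assumes a single writer (or external locking between writers), a payload made of word-sized atomics read with relaxed ordering so that overlapping accesses are not a data race, and it does not handle counter wrap-around.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical payload size: kWords 32-bit words per copy.
constexpr std::size_t kWords = 4;
using Snapshot = std::array<std::uint32_t, kWords>;

class EvenOddCell {
public:
    // Reader: any number of threads may call this concurrently.
    Snapshot read() const {
        for (;;) {
            // Read the generation counter; its parity selects the copy.
            const unsigned g1 = gen_.load(std::memory_order_acquire);
            const auto& src = copies_[g1 & 1u];
            Snapshot out;
            for (std::size_t i = 0; i < kWords; ++i)
                out[i] = src[i].load(std::memory_order_relaxed);
            // Keep the data loads ordered before the second counter read.
            std::atomic_thread_fence(std::memory_order_acquire);
            // Unchanged counter: the copy was not overwritten while reading.
            if (gen_.load(std::memory_order_relaxed) == g1)
                return out;
            // Counter changed: start over.
        }
    }

    // Writer: assumed to be a single thread (or externally serialized).
    void write(const Snapshot& value) {
        const unsigned g = gen_.load(std::memory_order_relaxed);
        // Order the previous counter update before the data stores below,
        // so a reader that sees new data also sees a changed counter.
        std::atomic_thread_fence(std::memory_order_release);
        // Fill the copy that readers of generation g are not using.
        auto& dst = copies_[(g + 1u) & 1u];
        for (std::size_t i = 0; i < kWords; ++i)
            dst[i].store(value[i], std::memory_order_relaxed);
        // Publish: flipping the parity makes the new copy the current one.
        gen_.store(g + 1u, std::memory_order_release);
    }

private:
    std::atomic<unsigned> gen_{0};
    std::atomic<std::uint32_t> copies_[2][kWords]{};
};
```

Readers never write to shared memory in this sketch, and only the counter needs to be a single atomic word; the payload size is set by kWords alone.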
Agreed, this is not impossible, but I still tend to think we should strive for a more efficient implementation if at all possible.
Where do you see room for improvement? It is a fallacy to assume that "most efficient implementation" always means "there is a machine instruction providing a 1:1 translation of my high-level construct".

Look at this from the POV of cache synchronisation cost (which is the real cost, not the number of instructions), and you will realize that there is not much you can do (assuming you can squeeze the data copies as well as the sequence counter into the same cacheline). This approach BTW is already way faster than e.g. using a 64-bit mmx register and paying the cost of mmx->gpr transfers on x86.

Regards
Helge