
it is also performance related, though ... on x86_64 (nehalem) my fifo stress test runs about 25% faster with pointer/tag compression than with cmpxchg16b ... that said, the lock-free property is for me more important than the throughput, since i am using it for soft real-time systems ...
Yes, this has also been my fear: that cmpxchg outperforms cmpxchg8b, which outperforms cmpxchg16b.
Tim, have you read the replies to my post on c.p.t regarding ABA bits? Even on this thread someone (I think Helge) argued that even 32 bits may not be enough. Now I'm thinking that the "generation counter" approach may not be workable as a general solution.
i just went through the replies ... (maybe i should upcase some parts in the documentation, saying that the implementation focuses on WORST CASE, not AVERAGE CASE performance ... people keep complaining that the stack/fifo may be outperformed by blocking algorithms, which is both true and irrelevant for me, as these implementations are soft real-time safe (and could be made hard real-time safe)).

as for the aba tag ... increasing the tag is not necessary in the enqueue operation (chris thomasson made a valid point here), but then 16 bits would give 2**16 different tags. of course a tag overflow is possible, but not very likely ...

by definition ll/sc is immune to aba problems, but when cas is implemented via ll/sc, one loses this feature ... personally i would prefer to have language support for ll/sc transactions instead of aba-prone cas ... most cas-architectures provide dcas, while most ll/sc architectures shouldn't use cas emulation, but ll/sc-style transactions directly, as these are by definition aba immune ... too bad, c++0x doesn't provide language support for ll/sc, but only for cas :/

tim

-- 
tim@klingt.org
http://tim.klingt.org

Life is really simple, but we insist on making it complicated.
  Confucius
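
For readers following the pointer/tag compression and 16-bit tag discussion, here is a minimal sketch of the idea, not the actual implementation discussed in this thread. It assumes x86_64 user-space pointers that fit in the lower 48 bits, so the upper 16 bits of a 64-bit word are free for the tag. The names `pack`, `get_ptr`, `get_tag` and `replace_ptr` are hypothetical and come from no library.

```cpp
#include <atomic>
#include <cstdint>

// ASSUMPTION: pointers use only the lower 48 bits (typical x86_64 user space).
constexpr int tag_shift = 48;
constexpr std::uint64_t ptr_mask = (std::uint64_t(1) << tag_shift) - 1;

// Pack a pointer and a 16-bit ABA tag into one 64-bit word.
inline std::uint64_t pack(void* p, std::uint16_t tag) {
    return (static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(p)) & ptr_mask)
         | (static_cast<std::uint64_t>(tag) << tag_shift);
}

inline void* get_ptr(std::uint64_t v) {
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(v & ptr_mask));
}

inline std::uint16_t get_tag(std::uint64_t v) {
    return static_cast<std::uint16_t>(v >> tag_shift);
}

// Swap in a new pointer and bump the tag with a single 64-bit CAS.
// The tag wraps to 0 after 2**16 successful updates, which is the
// overflow case mentioned above.
inline bool replace_ptr(std::atomic<std::uint64_t>& slot,
                        std::uint64_t expected, void* desired) {
    std::uint64_t next =
        pack(desired, static_cast<std::uint16_t>(get_tag(expected) + 1));
    return slot.compare_exchange_strong(expected, next);
}
```

A thread that read the word before two intervening updates will fail its CAS even if the pointer value is the same again, because the tag differs. That protection only breaks down if exactly a multiple of 2**16 updates happen in between.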