
Well, IMO, it should perform better because the producer and consumer are not thrashing each other wrt the head and tail indexes.
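For reference, here is a minimal sketch of the kind of layout being discussed: a single-producer/single-consumer ring buffer whose two indexes sit on separate cache lines. It is not the Boost.Lockfree code. The class name padded_ringbuffer, the members write_ and read_, and the 64-byte line size are all assumptions made for illustration, and it assumes a C++11 compiler.

```cpp
#include <atomic>
#include <cstddef>

// Assumed cache line size; 64 bytes is common on x86 but not universal.
constexpr std::size_t kCacheLine = 64;

// Hypothetical SPSC ring buffer (illustration only, not Boost's implementation).
// One slot is kept free to distinguish "full" from "empty", so it holds N - 1 items.
template <typename T, std::size_t N>
class padded_ringbuffer {
public:
    // Call from the producer thread only.
    bool enqueue(const T& v) {
        const std::size_t w = write_.load(std::memory_order_relaxed);
        const std::size_t next = (w + 1) % N;
        if (next == read_.load(std::memory_order_acquire))
            return false;  // buffer is full
        data_[w] = v;
        write_.store(next, std::memory_order_release);  // publish the element
        return true;
    }

    // Call from the consumer thread only.
    bool dequeue(T& out) {
        const std::size_t r = read_.load(std::memory_order_relaxed);
        if (r == write_.load(std::memory_order_acquire))
            return false;  // buffer is empty
        out = data_[r];
        read_.store((r + 1) % N, std::memory_order_release);  // free the slot
        return true;
    }

private:
    // Each index is aligned to its own cache line, so a store to write_ by the
    // producer does not invalidate the line holding read_, and vice versa.
    alignas(kCacheLine) std::atomic<std::size_t> write_{0};
    alignas(kCacheLine) std::atomic<std::size_t> read_{0};
    alignas(kCacheLine) T data_[N];
};
```

Whether the padding pays off depends on how often each side actually reads the other side's index: every enqueue still loads read_ and every dequeue still loads write_, so the lines are shared either way.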
the performance is almost the same (54.64 s vs. 54.53 s elapsed).

current implementation:

 Performance counter stats for 'workspace/boost_lockfree/bin.v2/libs/lockfree/test/ringbuffer_test.test/gcc-4.4.1/release/threading-multi/ringbuffer_test':

  104843.459702  task-clock-msecs   #      1.919 CPUs
          86507  context-switches   #      0.001 M/sec
            100  CPU-migrations     #      0.000 M/sec
           2123  page-faults        #      0.000 M/sec
   292926371337  cycles             #   2793.940 M/sec
   139203486138  instructions       #      0.475 IPC
     3829638652  cache-references   #     36.527 M/sec
       19789409  cache-misses       #      0.189 M/sec
    31762728722  branches           #    302.954 M/sec
     1669597777  branch-misses      #     15.925 M/sec

   54.643822949  seconds time elapsed

your proposed implementation (with cache line padding):

 Performance counter stats for 'workspace/boost_lockfree/bin.v2/libs/lockfree/test/ringbuffer_test2.test/gcc-4.4.1/release/threading-multi/ringbuffer_test2':

  104827.826694  task-clock-msecs   #      1.922 CPUs
          84196  context-switches   #      0.001 M/sec
            196  CPU-migrations     #      0.000 M/sec
           2135  page-faults        #      0.000 M/sec
   292892624370  cycles             #   2794.035 M/sec
   149885017189  instructions       #      0.512 IPC
     3301119985  cache-references   #     31.491 M/sec
       20729054  cache-misses       #      0.198 M/sec
    33378354235  branches           #    318.411 M/sec
     1685405757  branch-misses      #     16.078 M/sec

   54.527086724  seconds time elapsed

cheers, tim

--
tim@klingt.org
http://tim.klingt.org

Desperation is the raw material of drastic change. Only those who can
leave behind everything they have ever believed in can hope to escape.
  William S. Burroughs