
Anthony Williams <anthony_w.geo <at> yahoo.com> writes:
Kowalke Oliver (QD IT PA AS <Oliver.Kowalke <at> qimonda.com> writes:
Andrey Tcherepanov <moyt63c02 <at> sneakemail.com> writes:
I sent you results to anthony_w.geo<at>yahoo.com, please let me know if you received them or not - these corp. filters play strange games sometimes...
Received, thank you.
Would you share the data?
My spreadsheet with the results can be downloaded from http://www.justsoftwaresolutions.co.uk/files/dining_results.ods

Not all of Andrey's runs are in there, but enough to get a reasonable mean and standard deviation for most cases. The results show some interesting things.

Firstly, the padding (8k between entries) is hugely beneficial for the small (8-byte) mutexes (boost 1.35, and my BTS-based variant), but the benefit is smaller for the CRITICAL_SECTION-based mutex, which is 24 bytes. If you consider that the data is only 4 bytes, and each cache line is 64 bytes, this is unsurprising: with a 24-byte mutex and 4-byte data each entry is 28 bytes, so only a little over 2 entries fit in a cache line and it's only adjacent entries that clash, whereas with an 8-byte mutex and 4-byte data each entry is 12 bytes and you get 5 entries per cache line, so there is much more conflict. This makes me wonder if it's worth padding the mutex to 64 bytes, so it occupies an entire cache line, or maybe adding a 64-byte alignment requirement.

Secondly, the yield is quite beneficial in some cases (e.g. 58% improvement), but often detrimental (up to 33% degradation). Overall, I think it is not worth adding.

The next thing I noticed was quite surprising. On the machines with 16 hardware threads, the 32-thread versions often ran faster than the 16-thread versions. I can only presume that this is because the increased thread count meant less contention, and thus fewer blocking waits.

Finally, though the BTS-based mutex is faster than the boost 1.35 one in most cases, the CRITICAL_SECTION-based version is actually very competitive, and fastest in many cases. I found this surprising, because it didn't match my prior experience, but it might be a consequence of the (generally undesirable) properties of CRITICAL_SECTIONs: some threads end up getting all the locks and running to completion very quickly, thus reducing the running thread count for the remainder of the test.
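To make the padding/alignment idea from the first observation concrete, here is a minimal sketch of what a 64-byte alignment requirement could look like. It is not the code used in the tests: it uses std::mutex as a stand-in for the mutexes being compared, and the names cache_line_size, entry, padded_entry, entries_per_line and same_cache_line are made up for illustration. The 64-byte line size is the figure assumed above, not something queried from the hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Assumption: 64-byte cache lines, as in the discussion above.
constexpr std::size_t cache_line_size = 64;

// Unpadded entry: several of these can end up in one cache line,
// so locking one entry can disturb its neighbours.
struct entry {
    std::mutex m;   // stand-in for the mutex under test
    int data;
};

// Aligned entry: alignas makes every object start on a cache-line
// boundary, and sizeof is rounded up to a multiple of the alignment,
// so consecutive array elements never share a line.
struct alignas(cache_line_size) padded_entry {
    std::mutex m;
    int data;
};

static_assert(alignof(padded_entry) == cache_line_size,
              "padded_entry must be cache-line aligned");
static_assert(sizeof(padded_entry) % cache_line_size == 0,
              "padded_entry must fill whole cache lines");

// How many whole entries of the given size fit in one cache line.
constexpr std::size_t entries_per_line(std::size_t entry_size) {
    return cache_line_size / entry_size;
}

// True if the two addresses fall within the same cache line.
inline bool same_cache_line(const void* a, const void* b) {
    return reinterpret_cast<std::uintptr_t>(a) / cache_line_size ==
           reinterpret_cast<std::uintptr_t>(b) / cache_line_size;
}
```

With the sizes quoted above, entries_per_line(12) gives 5 for the 8-byte mutexes and entries_per_line(28) gives 2 for the CRITICAL_SECTION-based one. The cost of the aligned version is memory: each entry takes at least a full cache line regardless of how small the mutex is.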
Anthony
-- 
Anthony Williams
Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk
Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL