[Looking for Feedback] - Fairness lib
Dear Boost Community,

I am writing to seek feedback on a new C++ library that I have been developing (https://github.com/Sernior/fairness <https://github.com/Sernior/fairness>). This library, named "*Fairness*," is still a work in progress. Nonetheless, I believe this is an opportune time to begin gathering feedback.

*Fairness* is designed to provide fair synchronization primitives. While the standard C++ library offers various tools for managing concurrency, it does not provide mechanisms for handling fairness policies.

To introduce the most fundamental component of my library, the "priority_mutex," I would like to share a simple example (it uses the BS thread pool, https://github.com/bshoshany/thread-pool):

#include <thread>
#include <chrono>
#include <array>
#include <BS_thread_pool.hpp>
#include <boost/fairness.hpp>

static boost::fairness::priority_mutex<4> pm;

static void busy_wait_nano(int64_t nanoseconds){
    auto begin = std::chrono::steady_clock::now();
    for (; std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::steady_clock::now() - begin).count() < nanoseconds;)
        continue;
}

static void thread_function_nano(int p, int preCriticalTime, int postCriticalTime, int criticalTime){
    busy_wait_nano(preCriticalTime);
    pm.lock(p);
    busy_wait_nano(criticalTime);
    pm.unlock();
    busy_wait_nano(postCriticalTime);
}

int main()
{
    std::array<int, 8> prios  {0, 2, 2, 1, 1, 3, 3, 0};
    std::array<int, 8> preCT  {2000, 1500, 2000, 3000, 1000, 500, 500, 2000};
    std::array<int, 8> postCT {5000, 3000, 2000, 2500, 1000, 1500, 1500, 4500};
    int criticalTime = 1000;

    BS::thread_pool pool(8);

    for (int i = 0; i < 8; ++i){
        pool.push_task(thread_function_nano, prios[i], preCT[i], postCT[i], criticalTime);
    }
    pool.wait_for_tasks();

    return 0;
}

In this example the priorities are not assigned randomly: I gave priority 0 (the highest) to thread 0, which totals 2000 + 1000 + 5000 ns, and to thread 7, which totals 2000 + 1000 + 4500 ns; these are the two threads with the most total work. Priority mutexes are 0-indexed, much like a std::array: a priority_mutex<4> manages priorities ranging from 0 to 3, with 0 being the highest priority.

Currently, the library comprises priority mutexes, in both recursive and shared versions, along with RAII wrappers such as unique_lock and shared_lock, all of which support priorities. Some elements are still missing, including priority condition variables and timed versions of the mutexes.

The primary motivation behind this library is to enhance throughput in multi-threaded pipelines. To illustrate the concept simply, imagine a pipeline with N tasks, some of which access a protected resource. Under contention, there exists an optimal order of resource accesses that minimizes overall execution time; the images at https://sernior.github.io/fairness/ <https://sernior.github.io/fairness/> show two different orderings of the critical sections, one of which is much better than the other. This library aims to empower developers who have profiled their software to manipulate this order with minimal performance overhead and as simply as possible.

In terms of implementation, much of the development has revolved around benchmarking and testing. During this process, I have experimented with over 10 different versions of the priority mutex.
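As a usage sketch of those RAII wrappers: the snippet below assumes only that the wrapper takes the desired priority as an extra constructor argument and forwards it to lock(p), mirroring std::unique_lock otherwise; the exact signature is an assumption for illustration, not a documented guarantee.

#include <boost/fairness.hpp>

static boost::fairness::priority_mutex<4> mtx;

void do_work(int p){
    // Assumed interface: the priority is passed at construction and
    // forwarded to mtx.lock(p).
    boost::fairness::unique_lock<boost::fairness::priority_mutex<4>> lk(mtx, p);
    // ... critical section ...
}   // the mutex is released on scope exit, as with std::unique_lock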
To return to the implementation: ultimately, I achieved the best results by using a priority spinlock, which is also available to library users, to provide internal atomicity to the rest of the mutexes.

An essential consideration in the implementation is ensuring that the internal priority spinlock is 128 bytes away from the rest of the memory used by the mutex, especially the memory on which the atomic::wait is performed. This separation helps avoid false sharing (a small sketch of the idea is in the P.S. below). The choice of 128 bytes aligns with processor architectures like AMD Zen 4 (the one I am testing on), as indicated in its optimization manual. I can attest that without that 128-byte separation the PM_LockUnlock benchmark was 10x worse and PM_pipeline_benchmark_fast was worse than the std counterpart (see below for the benchmarks).

Another aspect to note in the implementation is the use of 4 bytes for booleans where only a single byte is needed. This decision is related to the way atomic::wait is implemented: most standard library implementations of atomic::wait check whether the type's size is supported by the current platform's wait syscall. For instance, Windows can wait on 1 byte, while Linux kernel based systems can only wait on 4 bytes (the futex syscall operates on 32-bit values).

Below are some benchmark results, measured on 6.4.0-kali3-amd64 with an AMD Ryzen 5 7600X 6-Core Processor.

The LockUnlock benchmarks were the most difficult to optimize. Those are 8 threads simply doing:

m.lock();
m.unlock();

The pipeline benchmarks (of different lengths) are actual pipelines where each thread spends a different amount of time before/during/after the critical section, and I try to set the priorities accordingly so as to finish as soon as possible.

You will also see a mutex called slim_priority_mutex: a priority_mutex implementation that supports up to 7 priorities (15 if Boost.Atomic is found and your hardware supports DWCAS). These are lightweight versions that occupy only 128 bits for up to 7 priorities and, past that, up to 15 priorities, 256 bits.

I am also testing a custom wait implementation (https://github.com/Sernior/fairness/blob/main/include/boost/fairness/detail/... <https://github.com/Sernior/fairness/blob/main/include/boost/fairness/detail/wait_ops.hpp>), which can be tested by defining BOOST_FAIRNESS_USE_EXPERIMENTAL_WAIT_NOTIFY. At some point I wish to make this wait the default the library uses instead of the std one, since users of my library could configure it through my config macros (the number of fast spins, relaxed spins, and yield spins before the wait syscall).

As I said at the beginning, beyond presenting my library, this email has the purpose of obtaining feedback. I would like to see if there is anyone (other than me) interested in what I am doing and, if so, to keep in mind that I am still far from finished; I would love suggestions.

Thank you all for your attention.

Best regards,

Federico Abrignani.
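P.S. To illustrate the false-sharing point above, here is a minimal layout sketch. It is hypothetical (not the library's actual types); it only shows how alignas can keep the spinlock word 128 bytes away from the word that threads perform atomic::wait on:

#include <atomic>
#include <cstdint>

// Hypothetical layout sketch: the spinlock word and the 4-byte word used
// for atomic::wait/notify sit on separate 128-byte boundaries, so threads
// spinning on the lock do not ping-pong the cache line that waiters sleep on.
struct padded_state_sketch {
    alignas(128) std::atomic<std::uint32_t> internal_lock{0}; // spinlock word
    alignas(128) std::atomic<std::uint32_t> wait_word{0};     // waited-on word
};

static_assert(sizeof(padded_state_sketch) == 256,
              "each member occupies its own 128-byte block");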
To add the benchmark results, since the image did not show in my mail:

---------------------------------------------------------------------------------------------------------------
Benchmark                                                                            Time          CPU  Iterations
---------------------------------------------------------------------------------------------------------------
priority_mutex_benchmark::PM_LockUnlock/threads:8                                 1.64 ns      13.0 ns    54189280
standard_mutex_benchmark::STD_LockUnlock/threads:8                               0.712 ns      5.54 ns   125137360
slim_priority_mutex_benchmark::SLM_PM_LockUnlock/threads:8                        3.27 ns      26.0 ns    27112392
spinlock_priority_mutex_benchmark::SPNLC_PM_LockUnlock/threads:8                 0.825 ns      6.49 ns   111065720
slim_spinlock_priority_mutex_benchmark::SPNLC_SLM_PM_LockUnlock/threads:8         3.41 ns      27.2 ns    25413960
recursive_priority_mutex_benchmark::R_PM_LockUnlock/threads:8                     1.65 ns      13.1 ns    53737320
standard_recursive_mutex_benchmark::R_STD_LockUnlock/threads:8                   0.960 ns      7.67 ns   105165456
shared_priority_mutex_benchmark::PM_S_LockUnlock/threads:8                        1.62 ns      12.9 ns    54708152
standard_shared_mutex_benchmark::STD_S_LockUnlock/threads:8                      0.965 ns      7.68 ns   106124968
shared_priority_mutex_benchmark::PM_S_SLockSUnlock/threads:8                      1.58 ns      12.6 ns    54070432
standard_shared_mutex_benchmark::STD_S_SLockSUnlock/threads:8                    0.721 ns      5.76 ns   117148936
____________________________________________________________________________________________
priority_mutex_benchmark::PM_pipeline_benchmark_long/threads:8                 9541052 ns  52513517 ns          16
standard_mutex_benchmark::STD_pipeline_benchmark_long/threads:8                9774822 ns  51878916 ns          16
slim_priority_mutex_benchmark::SLM_PM_pipeline_benchmark_long/threads:8        9380027 ns  55017352 ns          16
spinlock_priority_mutex_benchmark::SPNLC_PM_pipeline_benchmark_long/threads:8  9531843 ns  75511719 ns           8
slim_spinlock_priority_mutex_benchmark::SPNLC_SLM_PM_pipeline_benchmark_long/threads:8  9532650 ns  75912929 ns  8
recursive_priority_mutex_benchmark::R_PM_pipeline_benchmark_long/threads:8     9539370 ns  52512176 ns          16
standard_recursive_mutex_benchmark::R_STD_pipeline_benchmark_long/threads:8    9775127 ns  51879528 ns          16
shared_priority_mutex_benchmark::PM_S_pipeline_benchmark_long/threads:8        9531850 ns  76251304 ns          16
standard_shared_mutex_benchmark::STD_S_pipeline_benchmark_long/threads:8       9775146 ns  51880461 ns          16
shared_priority_mutex_benchmark::PM_S_Spipeline_benchmark_long/threads:8       6484466 ns  51870805 ns          16
standard_shared_mutex_benchmark::STD_S_Spipeline_benchmark_long/threads:8      6484449 ns  51869927 ns          16
____________________________________________________________________________________________
priority_mutex_benchmark::PM_pipeline_benchmark_gaming/threads:8                949731 ns   5213265 ns         136
standard_mutex_benchmark::STD_pipeline_benchmark_gaming/threads:8              1001777 ns   5193649 ns         136
slim_priority_mutex_benchmark::SLM_PM_pipeline_benchmark_gaming/threads:8       966555 ns   5557157 ns         128
spinlock_priority_mutex_benchmark::SPNLC_PM_pipeline_benchmark_gaming/threads:8  952582 ns  7604306 ns          96
slim_spinlock_priority_mutex_benchmark::SPNLC_SLM_PM_pipeline_benchmark_gaming/threads:8  921612 ns  7355304 ns 88
recursive_priority_mutex_benchmark::R_PM_pipeline_benchmark_gaming/threads:8    969296 ns   5291337 ns         136
standard_recursive_mutex_benchmark::R_STD_pipeline_benchmark_gaming/threads:8  1001943 ns   5193937 ns         136
shared_priority_mutex_benchmark::PM_S_pipeline_benchmark_gaming/threads:8       921843 ns   7348795 ns          88
standard_shared_mutex_benchmark::STD_S_pipeline_benchmark_gaming/threads:8      999545 ns   5191391 ns         136
shared_priority_mutex_benchmark::PM_S_Spipeline_benchmark_gaming/threads:8      648466 ns   5187308 ns         136
standard_shared_mutex_benchmark::STD_S_Spipeline_benchmark_gaming/threads:8     648462 ns   5187299 ns         136
____________________________________________________________________________________________
priority_mutex_benchmark::PM_pipeline_benchmark_audio/threads:8                  96730 ns    525237 ns        1336
standard_mutex_benchmark::STD_pipeline_benchmark_audio/threads:8                102634 ns    521351 ns        1344
slim_priority_mutex_benchmark::SLM_PM_pipeline_benchmark_audio/threads:8         97671 ns    566508 ns        1264
spinlock_priority_mutex_benchmark::SPNLC_PM_pipeline_benchmark_audio/threads:8   91991 ns    734638 ns         936
slim_spinlock_priority_mutex_benchmark::SPNLC_SLM_PM_pipeline_benchmark_audio/threads:8  97838 ns  780163 ns   928
recursive_priority_mutex_benchmark::R_PM_pipeline_benchmark_audio/threads:8      96704 ns    525095 ns        1336
standard_recursive_mutex_benchmark::R_STD_pipeline_benchmark_audio/threads:8    102726 ns    521373 ns        1344
shared_priority_mutex_benchmark::PM_S_pipeline_benchmark_audio/threads:8         92130 ns    734847 ns         904
standard_shared_mutex_benchmark::STD_S_pipeline_benchmark_audio/threads:8       102640 ns    521309 ns        1344
shared_priority_mutex_benchmark::PM_S_Spipeline_benchmark_audio/threads:8        64865 ns    518881 ns        1352
standard_shared_mutex_benchmark::STD_S_Spipeline_benchmark_audio/threads:8       64864 ns    518872 ns        1352
____________________________________________________________________________________________
priority_mutex_benchmark::PM_pipeline_benchmark_fast/threads:8                    1181 ns      7046 ns       95936
standard_mutex_benchmark::STD_pipeline_benchmark_fast/threads:8                   1537 ns      7263 ns       95040
slim_priority_mutex_benchmark::SLM_PM_pipeline_benchmark_fast/threads:8           1218 ns      9528 ns       78904
spinlock_priority_mutex_benchmark::SPNLC_PM_pipeline_benchmark_fast/threads:8     1004 ns      8017 ns       86744
slim_spinlock_priority_mutex_benchmark::SPNLC_SLM_pipeline_benchmark_fast/threads:8  1028 ns   8192 ns       85040
recursive_priority_mutex_benchmark::R_PM_pipeline_benchmark_fast/threads:8        1177 ns      7084 ns       94584
standard_recursive_mutex_benchmark::R_STD_pipeline_benchmark_fast/threads:8       1573 ns      7318 ns       96456
shared_priority_mutex_benchmark::PM_S_pipeline_benchmark_fast/threads:8           1380 ns     10979 ns       59928
standard_shared_mutex_benchmark::STD_S_pipeline_benchmark_fast/threads:8          1534 ns      7301 ns       97200
shared_priority_mutex_benchmark::PM_S_Spipeline_benchmark_fast/threads:8           665 ns      5318 ns      131400
standard_shared_mutex_benchmark::STD_S_Spipeline_benchmark_fast/threads:8          663 ns      5303 ns      131816
On 10/21/23 17:11, Federico Abrignani via Boost wrote:
> Dear Boost Community,
>
> I am writing to seek feedback on a new C++ library that I have been developing (https://github.com/Sernior/fairness <https://github.com/Sernior/fairness>). This library, named "*Fairness*," is still a work in progress. Nonetheless, I believe this is an opportune time to begin gathering feedback.
>
> *Fairness* is designed to provide fair synchronization primitives. While the standard C++ library offers various tools for managing concurrency, it does not provide mechanisms for handling fairness policies.
>
> [snip]
>
> The primary motivation behind this library is to enhance throughput in multi-threaded pipelines. To illustrate the concept simply, imagine a pipeline with N tasks, some of which access a protected resource. Under contention, there exists an optimal order of resource accesses that minimizes overall execution time; the images at https://sernior.github.io/fairness/ <https://sernior.github.io/fairness/> show two different orderings of the critical sections, one of which is much better than the other.
>
> This library aims to empower developers who have profiled their software to manipulate this order with minimal performance overhead and as simply as possible.
While I do think priority-based synchronization primitives would be a very useful addition to the Boost libraries, I find this motivation puzzling. Priorities are introduced to avoid starvation of specific (high-priority) threads at the expense of the less important threads, and likely of overall performance, because fairness is not free. The chart you show on the front page is nice, but it looks to me like a hand-tailored case that is not necessarily representative of reality.

Specifically, there is no guarantee that the processing times of different threads align as depicted in your chart. For example, one thread could have a very long processing time compared to the others, which would negate any effect on the total processing time from reordering threads using priorities. What priorities actually provide is to minimize the processing time of the high-priority thread, pretty much regardless of the low-priority threads, barring data starvation, of course. So I think you should reconsider the motivation, or at least present some evidence that the library helps achieve the stated goals in practice (and how it does that).
> Priorities are introduced to avoid starvation of specific (high-priority) threads at the expense of the less important threads, and likely of overall performance, because fairness is not free. The chart you show on the front page is nice, but it looks to me like a hand-tailored case that is not necessarily representative of reality.
Yes, the charts I provide are a simple, hand-tailored case that I am using to visually explain a concept that can happen in the real world.
> Specifically, there is no guarantee that the processing times of different threads align as depicted in your chart. For example, one thread could have a very long processing time compared to the others, which would negate any effect on the total processing time from reordering threads using priorities. What priorities actually provide is to minimize the processing time of the high-priority thread, pretty much regardless of the low-priority threads, barring data starvation, of course. So I think you should reconsider the motivation, or at least present some evidence that the library helps achieve the stated goals in practice (and how it does that).
Right, but that is also why I said "after profiling your software". There is no guarantee until a developer sees it happening. Allow me to quote my own documentation: "These tools, if misused, have the potential to cause harm rather than benefit. Careful implementation and understanding are crucial to harness their benefits effectively." What I meant by this sentence is exactly what you are saying: these tools should be used by people who have first profiled their software and found themselves in the situation where they thought, "If only I had a way to tell my threads that a specific task should be performed first."

But I share your concern. I certainly need to perform more studies to find out exactly where the boundary lies between when using priority mutexes is better than not using them. If you are interested, you can have a look at my pipeline benchmarks and tinker with them: https://github.com/Sernior/fairness/blob/main/benchmarks/pm/priority_mutex_b...

std::array<int, 8> prios  {0, 2, 2, 1, 1, 3, 3, 0};
std::array<int, 8> preCT  {20, 15, 20, 30, 10, 5, 5, 20};
std::array<int, 8> postCT {50, 30, 20, 25, 10, 15, 15, 45};
int CT = 10;

These three arrays hold the priorities and the times before and after the critical section for the 8 threads; CT is the duration of the critical section itself.

But returning to your example: if one of the threads, let's call it *L*, has a very *L*ong processing time compared to the other threads, wouldn't you agree that it would be better if *L* had priority over the other threads, if you wanted to finish the tasks of all threads as fast as possible?

Feel free to contact me by email directly if you want some explanations on how to use my benchmarks!

Thanks for the feedback.

Federico Abrignani.
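P.S. To make that *L* intuition concrete with some invented numbers: suppose threads A and B both reach a critical section of length 10 at time 0, and afterwards A still has 50 units of work while B has 5. If A enters first, A finishes at 10 + 50 = 60 and B at 20 + 5 = 25, so everything is done at time 60. If B enters first, B finishes at 15, but A finishes at 20 + 50 = 70. Giving priority to the thread with the longer remaining work cuts the overall completion time from 70 to 60.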
On 10/21/23 21:03, Federico Abrignani via Boost wrote:
>> Specifically, there is no guarantee that the processing times of different threads align as depicted in your chart. For example, one thread could have a very long processing time compared to the others, which would negate any effect on the total processing time from reordering threads using priorities. What priorities actually provide is to minimize the processing time of the high-priority thread, pretty much regardless of the low-priority threads, barring data starvation, of course. So I think you should reconsider the motivation, or at least present some evidence that the library helps achieve the stated goals in practice (and how it does that).
>
> But returning to your example: if one of the threads, let's call it *L*, has a very *L*ong processing time compared to the other threads, wouldn't you agree that it would be better if *L* had priority over the other threads, if you wanted to finish the tasks of all threads as fast as possible?
No, not necessarily. It is usually not a matter of long or short run time, but rather of what the bottleneck is, or what is required to finish fast. To give an example, consider two threads. One produces data and puts it in a queue; the other dequeues the data and, say, writes it to a file or sends it to the network. Producing data is expensive, so the first thread may take a lot of processing to produce a piece of data. Also, the queue has limited capacity and may overflow, in which case the writer will overwrite previously written data. To avoid that, you would typically prefer the second thread to have the higher priority, so that the enqueued data is dispatched ASAP and the queue stays mostly empty. Doing so may increase the runtime of the first thread, as it may block on the queue for longer periods of time, especially if there are multiple readers.

Surely, there is a wide variety of use cases, some of which are more along the lines of what you're saying. My point, though, is that priorities are not about minimizing overall run time, as your documentation suggests. They are primarily about minimizing starvation, or minimizing the latency of *specific* threads or tasks, at the expense of other threads and general performance. Whether that effect is used to achieve correctness, reduce latency, or improve throughput is application-specific.
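To sketch that scenario in code, using the lock(p) interface from your first example (the queue and the loop bodies here are invented for illustration; a real implementation would also need a way to block when the queue is empty or full):

#include <queue>
#include <boost/fairness.hpp>

// Two priority levels: 0 (highest) for the consumer draining the queue,
// 1 for the producer, so enqueued data is dispatched ASAP.
static boost::fairness::priority_mutex<2> queue_mutex;
static std::queue<int> data_queue;

void producer(){
    for (;;){
        int item = 42;           // stand-in for expensive data production
        queue_mutex.lock(1);     // low priority: yields to the consumer
        data_queue.push(item);
        queue_mutex.unlock();
    }
}

void consumer(){
    for (;;){
        queue_mutex.lock(0);     // high priority: keep the queue mostly empty
        if (!data_queue.empty()){
            int item = data_queue.front();
            data_queue.pop();
            queue_mutex.unlock();
            (void)item;          // stand-in for writing to a file or the network
        } else {
            queue_mutex.unlock();
        }
    }
}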
> Surely, there is a wide variety of use cases, some of which are more along the lines of what you're saying. My point, though, is that priorities are not about minimizing overall run time, as your documentation suggests. They are primarily about minimizing starvation, or minimizing the latency of *specific* threads or tasks, at the expense of other threads and general performance. Whether that effect is used to achieve correctness, reduce latency, or improve throughput is application-specific.
I see your point. Yes, the reason I started this lib was loop-like pipelines, and maybe I got too "settled" on that specific use case while writing that piece of documentation. The producer-consumer problem is certainly not the use case I started this library for, and who knows how many more use cases are out there.

But my point is: imagine you have a loop-like pipeline:

while (true){
    update();
    render();
}

This could be a video game or any other graphical application. update() is performed by X threads, each one having a specific task. In order to gain maximum FPS, you want those threads to be able to finish as fast as possible. This is the use case I had in mind when I wrote that.

But I see your point now. I will remove that statement and instead provide examples of different use cases.

Thanks for the feedback.

Federico Abrignani.
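P.S. A minimal sketch of that frame-update scenario (the task split and the priority values are invented; in a real application they would come from profiling, as discussed above):

#include <thread>
#include <array>
#include <boost/fairness.hpp>

// Four update workers share one lock over the scene state. The priorities
// order the critical sections so the frame can finish as early as possible.
static boost::fairness::priority_mutex<4> scene_mutex;

static void update_task(int priority){ // 0 = highest
    // ... per-thread work before touching the scene ...
    scene_mutex.lock(priority);
    // ... mutate the shared scene state ...
    scene_mutex.unlock();
    // ... per-thread work after the critical section ...
}

static void update(){
    std::array<int, 4> prios{0, 2, 1, 3}; // invented, profile-driven ordering
    std::array<std::thread, 4> workers;
    for (int i = 0; i < 4; ++i)
        workers[i] = std::thread(update_task, prios[i]);
    for (auto& w : workers)
        w.join();
}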
participants (3)
- Andrey Semashev
- Federico Abrignani
- Federico Abrignani