
I've uploaded a new version of boost.fiber - main features are:
- multiple schedulers can be used in one thread
- users can provide their own scheduler implementation as a template argument to class scheduler
- classes disable_interruption and restore_interruption moved into namespace boost::this_fiber
regards, Oliver

Oliver Kowalke wrote:
I've uploaded a new version of boost.fiber - main features are:
- multiple schedulers can be used in one thread
- users can provide their own scheduler implementation as a template argument to class scheduler
- classes disable_interruption and restore_interruption moved into namespace boost::this_fiber
regards, Oliver
Hi, please, could you add the html documentation or give a pointer to it?
Thanks, Vicente

Vicente Botet Escriba wrote:
Oliver Kowalke wrote:
I've uploaded a new version of boost.fiber - main features are:
- multiple schedulers can be used in one thread
- users can provide their own scheduler implementation as a template argument to class scheduler
- classes disable_interruption and restore_interruption moved into namespace boost::this_fiber
regards, Oliver
Hi,
please, could you add the html documentation or give a pointer to it?
Thanks, Vicente
# ls boost.fiber-0.2.0/doc/html/
boostbook.css  docutils.css  fiber/  images/  index.html  reference.css  standalone_HTML.manifest

Oliver Kowalke wrote:
Vicente Botet Escriba wrote:
Oliver Kowalke wrote:
I've uploaded a new version of boost.fiber - main features are:
- multiple schedulers can be used in one thread
- users can provide their own scheduler implementation as a template argument to class scheduler
- classes disable_interruption and restore_interruption moved into namespace boost::this_fiber
regards, Oliver
Hi,
please, could you add the html documentation or give a pointer to it?
Thanks, Vicente
# ls boost.fiber-0.2.0/doc/html/
boostbook.css  docutils.css  fiber/  images/  index.html  reference.css  standalone_HTML.manifest
Oliver, IMO you have not included the html files in the compressed file. Please could you check?
Best, Vicente

Vicente Botet Escriba wrote:
Oliver Kowalke wrote:
Vicente Botet Escriba wrote:
Oliver Kowalke wrote:
I've uploaded a new version of boost.fiber - main features are:
- multiple schedulers can be used in one thread
- users can provide their own scheduler implementation as a template argument to class scheduler
- classes disable_interruption and restore_interruption moved into namespace boost::this_fiber
regards, Oliver
Hi,
please, could you add the html documentation or give a pointer to it?
Thanks, Vicente
# ls boost.fiber-0.2.0/doc/html/
boostbook.css  docutils.css  fiber/  images/  index.html  reference.css  standalone_HTML.manifest
Oliver, IMO you have not included the html files in the compressed file. Please could you check?
Best, Vicente
Sorry for the noise, I was looking in boost.fiber-0.2.0/libs/fiber/doc/html, as was the case for version 0.1.1, and where the docs usually are for the Boost libraries.
Thanks, Vicente

Oliver Kowalke wrote:
I've uploaded a new version of boost.fiber - main features are:
I have a question. I understand we need fibers::mutex and fibers::condition_variable, but could you explain why we need separate fibers::lock_guard and fibers::unique_lock template classes? Why are the ones from Boost.Thread not usable in the fiber context, given that, in the end, the Mutex parameter can be any model of Lockable?
Best, Vicente

On Sunday 29 November 2009 22:25:21, Vicente Botet Escriba wrote:
Oliver Kowalke wrote:
I've uploaded a new version of boost.fiber - main features are:
I haven't found it in the vault, is it still there? is the version in the sandbox the most current?
I have a question. I understand we need fibers::mutex and fibers::condition_variable, but could you explain why we need separate fibers::lock_guard and fibers::unique_lock template classes? Why are the ones from Boost.Thread not usable in the fiber context, given that, in the end, the Mutex parameter can be any model of Lockable?
the mutexes are also almost independent of fibers, aren't they? they are CAS-based mutexes that could be used with threads as well, except for the call to this_fiber::yield(). CAS-based mutexes are long overdue for Boost.Thread. (I've even used random mutex sharing just to avoid the 24 bytes of overhead a pthreads mutex is worth, for something that's basically a CAS-protected bool.) so a mutex template with a parameter on what to do to waste some time (this_fiber::yield, this_thread::yield, spinning...) could clean this up.
Boost.Interprocess also has its share of mutexes AND its own scoped_lock/shared_lock/upgrade_lock. it doesn't make much sense to have separate synchronization code for Processes, Threads and Fibers. (with some exceptions, like named mutexes in Boost.Interprocess)
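A minimal sketch of such a policy-parameterized mutex, assuming C++11 <atomic> and <thread> (the names are illustrative, not boost.fiber's):

#include <atomic>
#include <thread>

// mutex parameterized on what to do while waiting (illustrative sketch)
template<class Suspend>
class basic_cas_mutex {
    std::atomic<unsigned> state_;            // 0 = unlocked, 1 = locked
public:
    basic_cas_mutex() : state_(0) {}

    void lock() {
        unsigned expected = 0;
        while (!state_.compare_exchange_strong(expected, 1,
                   std::memory_order_acquire)) {
            Suspend()();                     // the policy: yield, spin, ...
            expected = 0;                    // the CAS rewrote it on failure
        }
    }
    void unlock() { state_.store(0, std::memory_order_release); }
};

struct yield_thread { void operator()() const { std::this_thread::yield(); } };
struct busy_wait    { void operator()() const {} };

typedef basic_cas_mutex<yield_thread> thread_mutex;  // yields the OS thread
typedef basic_cas_mutex<busy_wait>    spin_mutex;    // pure spinning

The same template would take a fiber-yielding policy to produce the fiber mutex.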

Stefan Strasser wrote:
On Sunday 29 November 2009 22:25:21, Vicente Botet Escriba wrote:
Oliver Kowalke wrote:
I've uploaded a new version of boost.fiber - main features are:
I haven't found it in the vault, is it still there? is the version in the sandbox the most current?
I have a question. I understand we need fibers::mutex and fibers::condition_variable, but could you explain why we need separate fibers::lock_guard and fibers::unique_lock template classes? Why are the ones from Boost.Thread not usable in the fiber context, given that, in the end, the Mutex parameter can be any model of Lockable?
the mutexes are also almost independent of fibers, aren't they? they are CAS-based mutexes that could be used with threads as well, except for the call to this_fiber::yield(). CAS-based mutexes are long overdue for Boost.Thread. (I've even used random mutex sharing just to avoid the 24 bytes of overhead a pthreads mutex is worth, for something that's basically a CAS-protected bool.) so a mutex template with a parameter on what to do to waste some time (this_fiber::yield, this_thread::yield, spinning...) could clean this up. Boost.Interprocess also has its share of mutexes AND its own scoped_lock/shared_lock/upgrade_lock.
it doesn't make much sense to have separate synchronization code for Processes, Threads and Fibers. (with some exceptions, like named mutexes in Boost.Interprocess)
Hi, I suppose the version in the vault is the stable one, and the one in the sandbox the ongoing version.
I know Interprocess has its share of mutexes AND its own scoped_lock/shared_lock/upgrade_lock. I was warning Oliver exactly for this reason, to avoid yet another family of locks.
The mutexes are another story. Have you tried to make a mutex class that works for processes, threads and fibers? If you did, I'm interested in seeing how you achieved that.
Best, Vicente

On Monday 30 November 2009 01:21:42, Vicente Botet Escriba wrote:
The mutexes are another story. Have you tried to make a mutex class that works for processes, threads and fibers? If you did, I'm interested in seeing how you achieved that.
no I have not, but if you look at... https://svn.boost.org/svn/boost/sandbox/fiber/libs/fiber/src/mutex.cpp ...Oliver almost has. slightly changed to:

namespace sync{
    template<class Suspend>
    struct basic_cas_mutex{
        void lock(){
            while(true){
                uint32_t expected = 0;
                if ( detail::atomic_compare_exchange_strong( & state_, & expected, 1) )
                    break;
                else
                    Suspend()();
            }
        }
        ...
    };
}

namespace fiber{
    struct suspend{
        void operator()() const{ this_fiber::yield(); }
    };
    typedef sync::basic_cas_mutex<suspend> mutex;
}

namespace thread{
    struct suspend{
        void operator()() const{ this_thread::yield(); }
    };
#if CAS supported on platform
    typedef sync::basic_cas_mutex<suspend> mutex;
#else
    typedef sync::native_mutex mutex;
#endif
}

...and if you have multiple CPUs and short-lived locks only, you might choose to not yield at all, to avoid a system call:

struct null_suspend{
    void operator()() const{}
};
typedef sync::basic_cas_mutex<null_suspend> spin_mutex;

I'll rename the current mutex impl of boost.fiber to spin_mutex soon, and I'll introduce a mutex with true suspending of the fiber too.

Stefan Strasser wrote:
On Monday 30 November 2009 01:21:42, Vicente Botet Escriba wrote:
The mutexes are another story. Have you tried to make a mutex class that works for processes, threads and fibers? If you did, I'm interested in seeing how you achieved that.
no I have not, but if you look at...
https://svn.boost.org/svn/boost/sandbox/fiber/libs/fiber/src/mutex.cpp
...Oliver almost has.
slightly changed to:
namespace sync{
    template<class Suspend>
    struct basic_cas_mutex{
        void lock(){
            while(true){
                uint32_t expected = 0;
                if ( detail::atomic_compare_exchange_strong( & state_, & expected, 1) )
                    break;
                else
                    Suspend()();
            }
        }
        ...
    };
<snip>
Hi, I like your sync::basic_cas_mutex class template, and how it can be used in a thread or fiber context or even to build a spin lock. Anyway, you will have two classes, fiber::mutex and thread::mutex, with a common implementation. Can sync::basic_cas_mutex be used to protect inter-process concurrent access?
My concern was that we can and should use the same boost::lock_guard and boost::unique_lock even when we have several implementations of Lockables. In addition, I don't think we can say sync::basic_cas_mutex<thread::suspend> is equivalent to boost::mutex, as the following suggests:
#if CAS supported on platform
typedef sync::basic_cas_mutex<suspend> mutex;
#else
typedef sync::native_mutex mutex;
#endif
Both are useful in different contexts and should be provided separately. It is up to the user to choose the one adapted to their context.
Best, Vicente

On Mon, 30 Nov 2009, Vicente Botet Escriba wrote:
Stefan Strasser wrote:
namespace sync{
    template<class Suspend>
    struct basic_cas_mutex{
        void lock(){
            while(true){
                uint32_t expected = 0;
                if ( detail::atomic_compare_exchange_strong( & state_, & expected, 1) )
unless you use correct memory ordering constraints, your mutex will be twice as expensive as a platform-native mutex on any non-x86 [snip]
Hi,
I like your sync::basic_cas_mutex class template, and how it can be used in a thread or fiber context or even to build a spin lock. Anyway, you will have two classes, fiber::mutex and thread::mutex, with a common implementation. Can sync::basic_cas_mutex be used to protect inter-process concurrent access?
why would you ever want to use a mutex that does not properly inform the system scheduler of how long the calling thread should be suspended (in case of contention) for "true" inter-thread (or even inter-process) coordination? Regards, Helge

Helge Bahmann wrote:
why would you ever want to use a mutex that does not properly inform the system scheduler of how long the calling thread should be suspended (in case of contention) for "true" inter-thread (or even inter-process) coordination?
When you believe that the probability of contention is very small, and you care only about average and not worst-case performance, and you wish to avoid the overhead (e.g. time or code or data size) of "properly informing" the system scheduler. Earlier in this thread, Stefan Strasser wrote:
a mutex template with a parameter on what to do to waste some time (this_fiber::yield, this_thread::yield, spinning...) could clean this up
Yes. I implemented something like this that could either spin, or call sched_yield, or (on Linux) call futex_wait. It's basically a Futex-based mutex that implements the other behaviours by replacing the futex with something trivial. See http://svn.chezphil.org/libpbe/trunk/include/Mutex.hh and Futex.hh, Yield.hh, Spin.hh in the same directory. I've discussed the relative performance of these things on this list in the past. Regards, Phil.
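A minimal sketch of such a futex-based mutex, assuming Linux and C++11 atomics (illustrative names, not the code in Mutex.hh):

#include <atomic>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// 0 = unlocked, 1 = locked (no waiters), 2 = locked with possible waiters
class futex_mutex {
    std::atomic<int> state_;

    void wait(int val) {                      // sleep while state_ == val
        syscall(SYS_futex, &state_, FUTEX_WAIT, val, nullptr, nullptr, 0);
    }
    void wake_one() {
        syscall(SYS_futex, &state_, FUTEX_WAKE, 1, nullptr, nullptr, 0);
    }
public:
    futex_mutex() : state_(0) {}

    void lock() {
        int expected = 0;
        // fast path: uncontended CAS 0 -> 1, no system call at all
        if (state_.compare_exchange_strong(expected, 1,
                std::memory_order_acquire))
            return;
        // slow path: mark the lock contended, then sleep until it is free
        while (state_.exchange(2, std::memory_order_acquire) != 0)
            wait(2);
    }
    void unlock() {
        // fast path: if nobody signalled contention, no system call either
        if (state_.exchange(0, std::memory_order_release) == 2)
            wake_one();                       // a waiter may be sleeping
    }
};

Replacing the two futex calls with a yield or with nothing gives the yielding and spinning variants described above.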

On Monday 30 November 2009 18:48:21, Phil Endecott wrote:
Helge Bahmann wrote:
why would you ever want to use a mutex that does not properly inform the system scheduler of how long the calling thread should be suspended (in case of contention) for "true" inter-thread (or even inter-process) coordination?
When you believe that the probability of contention is very small, and you care only about average and not worst-case performance, and you wish to avoid the overhead (e.g. time or code or data size) of "properly informing" the system scheduler.
in the non-contention case, a properly implemented platform mutex will (unsurprisingly):
1. compare_exchange with acquire semantics
2. detect contention with a single compare of the value obtained in step 1
< your protected code here >
3. compare_exchange with release semantics
4. detect contention with a single compare of the value obtained in step 3, and wake suspended threads
A clever implementation of course handles contention out-of-line. if you don't believe me, just disassemble pthread_mutex_lock/unlock on any linux system.
FWIW, I just exercised a CAS-based mutex as you proposed (using the __sync_val_compare_and_swap intrinsic) in a tight lock/unlock cycle on a Linux/PPC32 system and... it is 25% *slower* than the glibc pthread_mutex_lock/unlock based one! This is the "no contention case" you aim to optimize for... (a "proper" CAS-based mutex using inline assembly and with weaker memory barriers is 10% faster, mainly because it eliminates the function call overhead). BTW we are talking about gains of ~5-10 clock cycles per operation here...
As for the space overhead of a pthread_mutex_t... if you cannot pay that, just use hashed locks.
Last note: calling "sched_yield" on contention is about the *worst* thing you can do -- linux/glibc will call futex(..., FUTEX_WAIT, ...) instead on contention, which can properly suspend the thread exactly until the lock is released *and* is about 1/2 - 2/3 the cost of sched_yield in the case the lock was released before the thread could be put to sleep. Whatever gains you think you may achieve, you won't.
There is justification for rolling your own locking scheme for user-space scheduling (as the fiber library does), but otherwise just don't do it.
Helge

Helge Bahmann wrote:
Am Monday 30 November 2009 18:48:21 schrieb Phil Endecott:
Helge Bahmann wrote:
why would you ever want to use a mutex that does not properly inform the system scheduler of how long the calling thread should be suspended (in case of contention) for "true" inter-thread (or even inter-process) coordination?
When you believe that the probability of contention is very small, and you care only about average and not worst-case performance, and you wish to avoid the overhead (e.g. time or code or data size) of "properly informing" the system scheduler.
in the non-contention case, a properly implemented platform mutex will (unsurprisingly):
1. compare_exchange with acquire semantics
2. detect contention with a single compare of the value obtained in step 1
< your protected code here >
3. compare_exchange with release semantics
4. detect contention with a single compare of the value obtained in step 3, and wake suspended threads
A clever implementation of course handles contention out-of-line.
Correct.
if you don't believe me, just disassemble pthread_mutex_lock/unlock on any linux system.
When I last looked at the glibc source, I believe I found quite a lot of code that needed to test whether the mutex was e.g. recursive or not, on top of that core functionality. On NPTL it was especially slow, though perhaps we can now consider that historic. I was able to get a very worthwhile performance improvement by implementing this stuff myself using atomic ops and futex syscalls, rather than calling glibc's pthread_* functions.
FWIW, I just exercised a CAS-based mutex as you proposed (using the __sync_val_compare_and_swap intrinsic) in a tight lock/unlock cycle on a Linux/PPC32 system and... it is 25% *slower* than the glibc pthread_mutex_lock/unlock based one! This is the "no contention case" you aim to optimize for... (a "proper" CAS-based mutex using inline assembly and with weaker memory barriers is 10% faster, mainly because it eliminates the function call overhead). BTW we are talking about gains of ~5-10 clock cycles per operation here...
When I benchmarked this stuff a couple of years ago, I believe that I found that gcc would often get its static branch prediction wrong for the CAS loop. Using either branch hints in the source or feedback-driven optimisation I could get the "expected" performance.
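Something along these lines, assuming GCC's __builtin_expect (a sketch, not Phil's actual code):

#include <atomic>

// hint that the uncontended acquisition is the likely outcome, so the
// fall-through path carries no taken branch
inline void hinted_lock(std::atomic<int>& state) {
    int expected = 0;
    if (__builtin_expect(state.compare_exchange_strong(expected, 1,
            std::memory_order_acquire), 1))
        return;                               // likely: locked on first try
    do {                                      // unlikely: contended retry loop
        expected = 0;
    } while (!state.compare_exchange_weak(expected, 1,
                 std::memory_order_acquire));
}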
As for the space overhead of a pthread_mutex_t... if you cannot pay that, just use hashed locks.
A hashed lock library would be welcome here, I'm sure.
Last note: calling "sched_yield" on contention is about the *worst* thing you can do -- linux/glibc will call futex(..., FUTEX_WAIT, ...) instead on contention, which can properly suspend the thread exactly until the lock is released *and* is about 1/2 - 2/3 the cost of sched_yield in the case the lock was released before the thread could be put to sleep.
Whatever gains you think you may achieve, you won't.
My work on this was backed up with extensive benchmarking, disassembly of the generated code, and other evaluation. You can find some of the results in the list archive from about two years ago. There are many different types of system with different characteristics (uniprocessor vs multiprocessor, two threads vs 10000 threads, etc etc). Two particular cases that I'll mention are:
- An inline spin lock is the only thing that doesn't involve a function call, so leaf functions remain leaf functions and are themselves more likely to be inlined or otherwise optimised. On systems with small caches or small flash chips where code size is important, this is a significant benefit.
- sched_yield is a great deal easier to port between operating systems than anything else, other than a spin lock.
Regards, Phil.
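To make the first point concrete, here is the whole of such an inline spin lock, assuming C++11 std::atomic_flag (a sketch; the Spin.hh mentioned above predates C++11):

#include <atomic>

// small enough to inline entirely: no function call, one word of state
class spin_lock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock()     { while (flag_.test_and_set(std::memory_order_acquire)) {} }
    bool try_lock() { return !flag_.test_and_set(std::memory_order_acquire); }
    void unlock()   { flag_.clear(std::memory_order_release); }
};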

Hello Phil, On Mon, 30 Nov 2009, Phil Endecott wrote:
Helge Bahmann wrote:
if you don't believe me, just disassemble pthread_mutex_lock/unlock on any linux system.
When I last looked at the glibc source, I believe I found quite a lot of code that needed to test whether the mutex was e.g. recursive or not, on top of that core functionality.
Sure, there is a lot of code, but the fast-path check for a normal mutex (properly marked with __builtin_expect) is just at the top of the function, and it drops straight into CAS, so with the compiler doing its job, it *should* not matter. (However, the fetch of the thread id from TLS should be moved out of the fast path as well -- the compiler is probably not clever enough to move it away).
As for the space overhead of a pthread_mutex_t... if you cannot pay that, just use hashed locks.
A hashed lock library would be welcome here, I'm sure.
Yes, this would be a really helpful addition to Boost.Thread -- implementing a fallback for atomic operations is just not feasible without it.
Last note: calling "sched_yield" on contention is about the *worst* thing you can do -- linux/glibc will call futex(..., FUTEX_WAIT, ...) instead on contention, which can properly suspend the thread exactly until the lock is released *and* is about 1/2 - 2/3 the cost of sched_yield in the case the lock was released before the thread could be put to sleep.
Whatever gains you think you may achieve, you won't.
My work on this was backed up with extensive benchmarking, disassembly of the generated code, and other evaluation. You can find some of the results in the list archive from about two years ago. There are many different types of system with different characteristics (uniprocessor vs multiprocessor, two threads vs 10000 threads, etc etc). Two particular cases that I'll mention are:
I guess this is the code you used for testing? https://svn.chezphil.org/mutex_perf/trunk
I would say that your conclusions are valid for ARM only (I don't know the architecture or libc peculiarities); for x86 there are some subtleties which IMHO invalidate the comparison.
Your spinlock implementation defers to __sync_lock_test_and_set, which in turn generates an "xchgl" instruction, and NOT a "lock xchgl" instruction (yes, these gcc primitives are tricky, which is why I avoid them). Section 8.2.2 of Intel's architecture manual (Volume 3B) states that only the "lock" variants of the atomic instructions serialize reads wrt writes, therefore using a non-locked variant may allow reads to be scheduled out of the critical section and see invalid data. Additionally, __sync_synchronize just does nothing on x86, so this does not help either.
Your VIA C3 will be classified as "i586" (not i686, although it is lacking only a few instructions), and will therefore not use NPTL/futex at all, but old LinuxThreads. (Yes, this border case is badly supported by glibc.) I guess if you force linking to the "wrong" i686 pthread library, pthread_mutex_* numbers will improve drastically.
Your futex implementation is however correct, and your P4 numbers for example match my numbers quite well -- and pthread_mutex_* is pretty close.
- An inline spin lock is the only thing that doesn't involve a function call, so leaf functions remain leaf functions and are themselves more likely to be inlined or otherwise optimised. On systems with small caches or small flash chips where code size is important, this is a significant benefit.
I'm not sure I'm following here -- for small cache sizes, inlining is *not* preferable, right?
- sched_yield is a great deal easier to port between operating systems than anything else, other than a spin lock.
sched_yield has one very valid use case: SCHED_RR tasks of the same priority yielding to each other. Using it on lock contention is just calling into the scheduler saying "I'm waiting for something, but I won't tell you what -- please have a guess at what is best to do next". And when you have already gone to all the lengths of detecting contention, this is a relatively poor thing to do. Sometimes you may do this as an ugly hack, but I'm not sure this should be promoted as "good practice" by codifying it in a library.
A home-grown futex-based implementation is of course valid and useful, but on most architectures it will not be faster, and when it is not, I fail to see why it would not be preferable to fix the problems at the libc level instead.
Regards, Helge

Helge Bahmann <hcb@chaoticmind.net> writes:
On Mon, 30 Nov 2009, Phil Endecott wrote:
My work on this was backed up with extensive benchmarking, disassembly of the generated code, and other evaluation. You can find some of the results in the list archive from about two years ago. There are many different types of system with different characteristics (uniprocessor vs multiprocessor, two threads vs 10000 threads, etc etc). Two particular cases that I'll mention are:
I guess this is the code you used for testing?
https://svn.chezphil.org/mutex_perf/trunk
I would say that your conclusions are valid for ARM only (I don't know the architecture or libc peculiarities), for x86 there are some subtleties which IMHO invalidate the comparison.
Your spinlock implementation defers to __sync_lock_test_and_set, which in turn generates an "xchgl" instruction, and NOT a "lock xchgl" instruction (yes, these gcc primitives are tricky, which is why I avoid them).
On x86 these are equivalent --- the LOCK prefix is automatically asserted for XCHG. See the XCHG instruction docs in the Intel manual volume 2B.
Anthony

On Tue, 1 Dec 2009, Anthony Williams wrote:
Helge Bahmann <hcb@chaoticmind.net> writes:
On Mon, 30 Nov 2009, Phil Endecott wrote:
My work on this was backed up with extensive benchmarking, disassembly of the generated code, and other evaluation. You can find some of the results in the list archive from about two years ago. There are many different types of system with different characteristics (uniprocessor vs multiprocessor, two threads vs 10000 threads, etc etc). Two particular cases that I'll mention are:
I guess this is the code you used for testing?
https://svn.chezphil.org/mutex_perf/trunk
I would say that your conclusions are valid for ARM only (I don't know the architecture or libc peculiarities), for x86 there are some subtleties which IMHO invalidate the comparison.
Your spinlock implementation defers to __sync_lock_test_and_set, which in turn generates an "xchgl" instruction, and NOT a "lock xchgl" instruction (yes, these gcc primitives are tricky, which is why I avoid them).
On x86 these are equivalent --- the LOCK prefix is automatically asserted for XCHG. See the XCHG instruction docs in the Intel manual volume 2B.
Yes you're right, forgot this odd one :/ Which still makes me wonder what is going on -- it's the first time I see "lock xchgl" being noticeably faster than "lock cmpxchgl".

On Tuesday 01 December 2009 11:05:26, Helge Bahmann wrote:
A hashed lock library would be welcome here, I'm sure.
Yes, this would be a really helpful addition to Boost.Thread -- implementing a fallback for atomic operations is just not feasible without it.
could you explain this please? I use something like that myself, as a workaround, but I don't see how that is a desired solution. why would you hash to access something that should be one word in size? it makes sense if you try to avoid the pthread mutex memory overhead, but if you put effort into it, wouldn't it make more sense to replicate exactly what pthreads does inside boost and avoid the overhead and the hashing?

On Tue, 1 Dec 2009, Stefan Strasser wrote:
On Tuesday 01 December 2009 11:05:26, Helge Bahmann wrote:
A hashed lock library would be welcome here, I'm sure.
Yes, this would be a really helpful addition to Boost.Thread -- implementing a fallback for atomic operations is just not feasible without it.
could you explain this please? I use something like that myself, as a workaround, but I don't see how that is a desired solution. why would you hash to access something that should be one word in size?
There must be a fallback implementation if the processor cannot perform an operation atomically -- and the template argument to atomic<> may for example be a double-word which not every processor can access atomically.
it makes sense if you try to avoid the pthread mutex memory overhead, but if you put effort into it, wouldn't it make more sense to replicate exactly what pthreads does inside boost and avoid the overhead and the hashing?
I don't quite understand this comment -- what do you mean by "what pthreads does inside boost" ?

On Tuesday 01 December 2009 13:18:19, Helge Bahmann wrote:
On Tue, 1 Dec 2009, Stefan Strasser wrote:
On Tuesday 01 December 2009 11:05:26, Helge Bahmann wrote:
A hashed lock library would be welcome here, I'm sure.
Yes, this would be a really helpful addition to Boost.Thread -- implementing a fallback for atomic operations is just not feasible without it.
could you explain this please? I use something like that myself, as a workaround, but I don't see how that is a desired solution. why would you hash to access something that should be one word in size?
There must be a fallback implementation if the processor cannot perform an operation atomically -- and the template argument to atomic<> may for example be a double-word which not every processor can access atomically.
that's undisputed. my question was referring to a hashed lock library being a good addition to boost. why would you want to use hashed mutexes when you can implement a mutex the size of a reference into a mutex table? that is, if you don't use pthreads mutexes, but a mutex that doesn't waste memory.

On Tue, 1 Dec 2009, Stefan Strasser wrote:
On Tuesday 01 December 2009 13:18:19, Helge Bahmann wrote:
On Tue, 1 Dec 2009, Stefan Strasser wrote:
On Tuesday 01 December 2009 11:05:26, Helge Bahmann wrote:
A hashed lock library would be welcome here, I'm sure.
Yes, this would be a really helpful addition to Boost.Thread -- implementing a fallback for atomic operations is just not feasible without it.
could you explain this please? I use something like that myself, as a workaround, but I don't see how that is a desired solution. why would you hash to access something that should be one word in size?
There must be a fallback implementation if the processor cannot perform an operation atomically -- and the template argument to atomic<> may for example be a double-word which not every processor can access atomically.
that's undisputed. my question was referring to a hashed lock library being a good addition to boost. why would you want to use hashed mutexes when you can implement a mutex the size of a reference into a mutex table?
Implementing a mutex requires atomic operations, so this just recurses the problem in case no atomic ops are available :) I don't want to store references into a mutex table anywhere -- the goal is to make atomic objects the exact size of their non-atomic counterpart (possibly padded to word size), and hash an object's address on access to map it to an entry of a fixed preallocated mutex table.
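A minimal sketch of that scheme, with illustrative names and an arbitrary table size:

#include <cstdint>
#include <mutex>

// fixed, preallocated table: the guarded objects carry no lock storage
static const std::size_t table_size = 64;     // a power of two, chosen arbitrarily
static std::mutex lock_table[table_size];

inline std::mutex& lock_for(const void* obj) {
    std::uintptr_t a = reinterpret_cast<std::uintptr_t>(obj);
    // shift away alignment bits, then fold the address into the table
    return lock_table[(a >> 4) & (table_size - 1)];
}

// usage: guard e.g. a double-word the processor cannot access atomically
//   std::lock_guard<std::mutex> guard(lock_for(&obj));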

Stefan Strasser wrote:
why would you want to use hashed mutexes when you can implement a mutex in the size of a reference into a mutex table?
Say I have:

    uint16_t counters[1000000];

If I want to use those counters from two threads I could pair each one with a mutex. With some thought I might be able to do that with just one bit per counter, but that's still 125 kbytes. Or, I could have 13 mutexes and hash from counter i to mutex i%13; even with a 24-byte pthread_mutex that's only 312 bytes.
Phil.
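In code, that arithmetic might look like the following sketch (illustrative names):

#include <cstdint>
#include <mutex>

std::uint16_t counters[1000000];
std::mutex counter_locks[13];                 // 13 x 24 bytes = 312 bytes

void increment(std::size_t i) {
    // counter i shares a mutex with every counter hashing to i % 13
    std::lock_guard<std::mutex> guard(counter_locks[i % 13]);
    ++counters[i];
}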

Helge Bahmann wrote: (I'm not going to reply to most of this because I've now forgotten most of what I learnt when I looked into it...)
- An inline spin lock is the only thing that doesn't involve a function call, so leaf functions remain leaf functions and are themselves more likely to be inlined or otherwise optimised. On systems with small caches or small flash chips where code size is important, this is a significant benefit.
I'm not sure I'm following here -- for small cache sizes, inlining is *not* preferable, right?
Many C++ leaf functions are so trivial that they are smaller when inlined than when out-of-line, when you allow for the register-shuffling needed to get the arguments in the right places for the function call.
A home-grown futex-based implementation is of course valid and useful, but on most architectures it will not be faster, and when it is not, I fail to see why it would not be preferable to fix the problems at the libc level instead.
I think that the main problem with pthread_mutex is that it has several features that unavoidably require extra space in the struct. This can't be fixed in libc without making it no longer a pthread_mutex. Phil.

I suppose the version on the vault is the stable one, and the one on the sandbox the ongoing version.
correct
I know Interprocess has its share of mutexes AND its own scoped_lock/shared_lock/upgrade_lock. I was warning Oliver exactly for this reason, to avoid yet another family of locks.
see prev. post

I have a question. I understand we need fibers::mutex and fibers::condition_variable, but could you explain why we need separate fibers::lock_guard and fibers::unique_lock template classes? Why are the ones from Boost.Thread not usable in the fiber context, given that, in the end, the Mutex parameter can be any model of Lockable?
It's a question of taste - I was not sure if I should intermix the thread namespace with the fiber namespace. If the locks are generic enough we could share them between thread/fiber/interprocess.

Oliver Kowalke wrote:
I have a question. I understand we need fibers::mutex and fibers::condition_variable, but could you explain why we need separate fibers::lock_guard and fibers::unique_lock template classes? Why are the ones from Boost.Thread not usable in the fiber context, given that, in the end, the Mutex parameter can be any model of Lockable?
It's a question of taste - I was not sure if I should intermix the thread namespace with the fiber namespace. If the locks are generic enough we could share them between thread/fiber/interprocess.
Oliver, lock_guard and unique_lock are in the boost namespace, so there is no intermixing of namespaces. Is there any difference between your boost::fibers::lock_guard and boost::fibers::unique_lock and boost::lock_guard and boost::unique_lock? If not, IMO you should remove them and ask the user to use the ones provided by Boost.Thread. If yes, then either your mutex implementation does not model the Lockable concept, or boost::lock_guard and boost::unique_lock are not generic enough, or some feature is missing. In any case we need a clear answer.
Best, Vicente

Oliver, lock_guard and unique_lock are in the boost namespace, so there is no intermixing of namespaces. Is there any difference between your boost::fibers::lock_guard and boost::fibers::unique_lock and boost::lock_guard and boost::unique_lock? If not, IMO you should remove them and ask the user to use the ones provided by Boost.Thread. If yes, then either your mutex implementation does not model the Lockable concept, or boost::lock_guard and boost::unique_lock are not generic enough, or some feature is missing. In any case we need a clear answer.
I've removed the stuff from boost.fiber - thx
Oliver

On Saturday 28 November 2009 23:29:00, Oliver Kowalke wrote:
I've uploaded a new version of boost.fiber - main features are:
while looking at mutex.cpp, I think there is a bug in mutex::timed_lock(): according to http://www.boost.org/doc/libs/1_41_0/doc/html/thread/synchronization.html#th...
"Attempt to obtain ownership for the current thread. Blocks until ownership can be obtained, or the specified time is reached. If the specified time has already passed, behaves as try_lock()."
if the specified time has already passed, fiber::timed_lock does not behave as try_lock(), but returns false immediately.
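For reference, a sketch of the documented contract, using a stand-in mutex and this_thread::yield rather than boost.fiber's internals:

#include <boost/thread/thread.hpp>
#include <boost/thread/thread_time.hpp>

// stand-in lockable, only to make the sketch self-contained
struct demo_mutex {
    boost::mutex m;

    bool try_lock() { return m.try_lock(); }
    void unlock()   { m.unlock(); }

    bool timed_lock(boost::system_time const& abs_time) {
        // documented behaviour: a deadline already in the past must
        // behave as try_lock(), not fail unconditionally
        if (abs_time <= boost::get_system_time())
            return try_lock();
        while (!try_lock()) {
            if (boost::get_system_time() >= abs_time)
                return false;                 // deadline reached while waiting
            boost::this_thread::yield();      // a fiber would yield to its scheduler
        }
        return true;
    }
};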

if the specified time has already passed, fiber::timed_lock does not behave as try_lock(), but returns false immediately.
I'll remove all timed ops from boost.fiber (they are not documented - some code still remains but will be removed soon).
participants (6)
- Anthony Williams
- Helge Bahmann
- Oliver Kowalke
- Phil Endecott
- Stefan Strasser
- Vicente Botet Escriba