[threads] Permission given to change to Boost.License

I've just had an email exchange with Bill Kempf, and he has given permission to change the Boost.Threads licensing to the Boost license. He said that the reasons he has not been able to continue work on Boost.Threads are entirely personal, and that he does not expect to be able to help for the foreseeable future. He wishes Boost well. Will someone volunteer to change the files involved? Thanks, --Beman

From: "Beman Dawes" <bdawes@acm.org>
I've just had an email exchange with Bill Kempf, and he has given permission to change the Boost.Threads licensing to the Boost license.
He said that the reasons he has not been able to continue work on Boost.Threads are entirely personal, and that he does not expect to be able to help for the foreseeable future. He wishes Boost well.
Can you pass on the warm wishes of all those like me who are grateful for his large contribution to the thread library? Paul Baxter

"Paul Baxter" <pauljbaxter@hotmail.com> writes:
From: "Beman Dawes" <bdawes@acm.org>
I've just had an email exchange with Bill Kempf, and he has given permission to change the Boost.Threads licensing to the Boost license.
He said that the reasons he has not been able to continue work on Boost.Threads are entirely personal, and that he does not expect to be able to help for the foreseeable future. He wishes Boost well.
Can you pass on the warm wishes of all those like me who are grateful for his large contribution to the thread library?
Please!!! and for consenting to change the license! -- Dave Abrahams Boost Consulting www.boost-consulting.com

Beman Dawes wrote:
I've just had an email exchange with Bill Kempf, and he has given permission to change the Boost.Threads licensing to the Boost license.
Thank goodness...it's hard to fathom that it took this long, but I'm glad you finally got thru. This has been a major thorn in our side...and I know many of us have tried and failed...
He said that the reasons he has not been able to continue work on Boost.Threads are entirely personal, and that he does not expect to be able to help for the foreseeable future. He wishes Boost well.
Will someone volunteer to change the files involved?
Ducks ;-) Jeff

Beman Dawes wrote:
I've just had an email exchange with Bill Kempf, and he has given permission to change the Boost.Threads licensing to the Boost license.
He said that the reasons he has not been able to continue work on Boost.Threads are entirely personal, and that he does not expect to be able to help for the foreseeable future. He wishes Boost well.
Will someone volunteer to change the files involved?
I'll do that. Regards Hartmut
Thanks,
--Beman

On Sep 14, 2006, at 5:20 PM, Hartmut Kaiser wrote:
Beman Dawes wrote:
I've just had an email exchange with Bill Kempf, and he has given permission to change the Boost.Threads licensing to the Boost license.
He said that the reasons he has not been able to continue work on Boost.Threads are entirely personal, and that he does not expect to be able to help for the foreseeable future. He wishes Boost well.
Will someone volunteer to change the files involved?
I'll do that.
Thanks! If you didn't know already, bcp has some code to help automatically migrate to the Boost Software License. Doug

Doug Gregor wrote:
Will someone volunteer to change the files involved?
I'll do that.
Thanks! If you didn't know already, bcp has some code to help automatically migrate to the Boost Software License.
Done (in 1.34 branch only, the head branch will be merged from here later on anyway, no?). I added a license statement to the documentation files as well (these were missing it completely) and tried to gather the corresponding attribution information from CVS. If I forgot to mention somebody, please add yourself! Regards Hartmut

"Hartmut Kaiser" <hartmut.kaiser@gmail.com> writes:
Doug Gregor wrote:
Will someone volunteer to change the files involved?
I'll do that.
Thanks! If you didn't know already, bcp has some code to help automatically migrate to the Boost Software License.
Done (in 1.34 branch only, the head branch will be merged from here later on anyway, no?).
Not by default; the official procedure is to commit to HEAD first and merge from there to the release branch. See the 5th bullet at http://boost-consulting.com/boost/more/release_procedures.htm#Developers [boost.org is down or responding too slowly again; we really need to get off the sf servers]
I added a license statement to the documentation files as well (these were missing it completely) and tried to gather the corresponding attribution information from CVS. If I forgot to mention somebody, please add yourself!
Thanks for your work on this, Hartmut! -- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams wrote:
Done (in 1.34 branch only, the head branch will be merged from here later on anyway, no?).
Not by default; the official procedure is to commit to HEAD first and merge from there to the release branch. See the 5th bullet at http://boost-consulting.com/boost/more/release_procedures.htm#Developers
[boost.org is down or responding too slowly again; we really need to get off the sf servers]
Done in CVS::HEAD now as well. Regards Hartmut

"Hartmut Kaiser" <hartmut.kaiser@gmail.com> writes:
David Abrahams wrote:
Done (in 1.34 branch only, the head branch will be merged from here later on anyway, no?).
Not by default; the official procedure is to commit to HEAD first and merge from there to the release branch. See the 5th bullet at http://boost-consulting.com/boost/more/release_procedures.htm#Developers
[boost.org is down or responding too slowly again; we really need to get off the sf servers]
Done in CVS::HEAD now as well.
Thanks!! -- Dave Abrahams Boost Consulting www.boost-consulting.com

Beman Dawes wrote:
I've just had an email exchange with Bill Kempf, and he has given permission to change the Boost.Threads licensing to the Boost license.
This is really good news!
He said that the reasons he has not been able to continue work on Boost.Threads are entirely personal, and that he does not expect to be able to help for the foreseeable future. He wishes Boost well.
My best wishes for Bill too. I hope the personal reasons are not caused by anything health-related.
Will someone volunteer to change the files involved?
As I am currently the one trying to guide the rewrite project, I hereby restate my continuing willingness to maintain the thread lib. I am just back from holidays and am trying to catch up. I have to admit that the progress of the rewrite has been slower than I intended; part of my problems came from the steep learning curve of Boost.Build. (I am willing to discuss this if there is interest.)

While I in no way distrust this news, I would feel a little more comfortable if it were possible to get some kind of approval from Bill himself. It might very well be the case that Bill Kempf thinks someone else should be entitled to maintain his work; e.g. Anthony Williams comes to mind, as he currently seems to be somewhat more active than me.

Also, I would like to get into direct contact with Bill if possible. I want to make sure that the direction of the started rewrite does not miss his intended goals entirely. I would be very glad if you could establish this contact, of course also off-list if Bill prefers. Thank you, Roland

Oh, it seems I answered a little too quickly. I had not received the entire thread while composing my mail. Now that the license issues are resolved, I should rather have asked whether there is still interest in, or need for, the rewrite at all. The license question was only one of the proposed changes (but the most important one, of course). I'll try to sum up the other ones I had in mind:

1) Split the code base into platform-specific parts to reduce the clutter caused by "ifdefs". (Mostly done.)
2) Provide side-by-side pthreads and native implementations the user could choose from, thereby providing a clean pthreads-only version as a side effect, which could be helpful during approval to show the migration path and compatibility with pthreads. (Partly done.)
3) Provide the Boost.Build additions for the upcoming v2 version. (Currently in work.)

I would be glad to hear whether the boosters think I should continue on these issues. If the consensus is that it is not worth the effort, I will abandon my work on this and put more energy into my other projects. Roland

Roland Schwarz <roland.schwarz@chello.at> writes:
Oh, it seems I answered a little too quickly. I had not received the entire thread while composing my mail.
Now that the license issues are resolved, I should rather have asked whether there is still interest in, or need for, the rewrite at all.
The license question was only one of the proposed changes (but the most important one, of course).
I'll try to sum up the other ones I had in mind:
1) Split the code base into platform-specific parts to reduce the clutter caused by "ifdefs". (Mostly done.)
2) Provide side-by-side pthreads and native implementations the user could choose from, thereby providing a clean pthreads-only version as a side effect, which could be helpful during approval to show the migration path and compatibility with pthreads. (Partly done.)
3) Provide the Boost.Build additions for the upcoming v2 version. (Currently in work.)
All of these ideas have merit; now you can feel free to prioritize them appropriately with other tasks without having to be concerned with doing a ground-up rewrite. -- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams wrote:
All of these ideas have merit; now you can feel free to prioritize them appropriately with other tasks without having to be concerned with doing a ground-up rewrite.
I.e. I may simply cut and paste Bill's code into the platform-specific sections where appropriate. Ok, so I'll continue... Roland

Roland Schwarz wrote:
David Abrahams wrote:
All of these ideas have merit; now you can feel free to prioritize them appropriately with other tasks without having to be concerned with doing a ground-up rewrite.
I.e. I may simply cut and paste Bill's code into the platform-specific sections where appropriate.
Ok, so I'll continue...
Maybe it's time to switch to HEAD? What does the Boost community think of
http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2006/n2090.html
http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2006/n2094.html
http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2006/n2096.html
?

Peter Dimov wrote:
Maybe it's time to switch to HEAD?
I am not there yet. But if you think it is reasonable to postpone the Boost.Build v2 code, I can concentrate on getting the code to the state where it is possible to switch from thread_rewrite to HEAD. It should be much easier now that I know I can take Bill's original code. However, I have some concern about the Windows version, which is being done by Anthony Williams: his new code departs from Bill's original in larger areas. Roland

Roland Schwarz wrote:
Peter Dimov wrote:
Maybe it's time to switch to HEAD?
I am not there yet. But if you think it is reasonable to postpone the Boost.Build v2 code, I can concentrate on getting the code to the state where it is possible to switch from thread_rewrite to HEAD. It should be much easier now that I know I can take Bill's original code.
However, I have some concern about the Windows version, which is being done by Anthony Williams: his new code departs from Bill's original in larger areas.
One of the reasons for switching to HEAD is that anybody who is concerned about the fate of the library can see what is being done. I realize that one could track a branch if one really wanted to, but this is just not what happens in practice. If the windows version conforms to the specification and passes its tests, there's nothing to be concerned about; if there are positive interface changes (to pick some arbitrary examples, a better call_once, a working and efficient read/write mutex, or a condition that conforms to N2094 if we decide to pursue this) for which the windows version is a proof of concept implementation, this is a good thing too. But it needs to be on HEAD and it needs to be regression tested. We already "lost" much of Bill's work because it was done on a branch. Branches are evil. Incremental refactorings that cause no unintended regressions are good. :-)

Peter Dimov wrote:
One of the reasons for switching to HEAD is that anybody who is concerned about the fate of the library can see what is being done. I realize that one could track a branch if one really wanted to, but this is just not what happens in practice.
A second that.
If the windows version conforms to the specification and passes its tests, there's nothing to be concerned about;
Of course. What I wanted to point out is that I am not sure if William also "is there", ready for a switch to HEAD. E.g. I have been told by him that he has implemented a different mutex algorithm than Terekhov's. I am not sure if this algorithm is already at the same level of reliability.
... But it needs to be on HEAD and it needs to be regression tested. We already "lost" much of Bill's work because it was done on a branch. Branches are evil. Incremental refactorings that cause no unintended regressions are good. :-)
I also think that branches are evil, but this one was a very special case. Originally we thought that we would need to rewrite the code, which I think is very different from an incremental change. (If it had been incremental, we never could have hoped to escape from Bill's license.) Fortunately this is no longer the case, and we can go back to incremental changes. For my part I will adjust my priorities so as to arrive at HEAD as soon as possible, while staying as close as possible to Bill's original code, to avoid unintended regressions. I do not know if William will revert his changes to Bill's code base again, since he has already changed a lot more. I also do not know if he has already tried to run the regressions on his changes. We definitely will need to wait for William's statement. Roland

Roland Schwarz <roland.schwarz@chello.at> writes:
Peter Dimov wrote:
If the windows version conforms to the specification and passes its tests, there's nothing to be concerned about;
Of course. What I wanted to point out is that I am not sure if William also "is there", ready for a switch to HEAD. E.g. I have been told by him that he has implemented a different mutex algorithm than Terekhov's. I am not sure if this algorithm is already at the same level of reliability.
By "William", do you really mean me, Anthony Williams? If so, then most of the code is ready for a switch to HEAD, I believe. The mutex algorithm is fairly simple, but I guess it's the condition variables that you are thinking of. I believe it to be reliable, but that can only be borne out with experience. Putting it on HEAD would allow more people to try it out, and more people would actually look at the code. If it's broken, I'd like to know. The read-write mutexes aren't ready, yet, but the old ones are broken anyway. I have mostly added additional tests, rather than removing old ones. These should be moved to HEAD too.
I do not know if William will revert his changes to Bill's code base again, since he has already changed a lot more. I also do not know if he has already tried to run the regressions on his changes.
I've run the tests on my changes many times, and they all pass, for the bits I've done. For the areas I haven't touched yet (TSS, thread launching, etc.), I'm happy to revert to Bill's code. For the areas I have reimplemented, I would prefer to go with my new code. In some cases, I believe it's an active improvement (e.g. call_once), and in others I think it will make it easier for me to extend, and alter the interface in line with the current standardization proposals. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
The mutex algorithm is fairly simple, but I guess it's the condition variables that you are thinking of. I believe it to be reliable, but that can only be borne out with experience. Putting it on HEAD would allow more people to try it out, and more people would actually look at the code. If it's broken, I'd like to know.
Inventing mutex and condition variable algorithms is a slight cause of concern, I have to admit. The easiest way to get a free review of the algorithms is to post the code on comp.programming.threads and hope that someone of the experts there has enough time to look at them. CVs in particular have proven to be quite hard to implement correctly, and Alexander Terekhov's "8a" algorithm has emerged as a de-facto standard.

Peter Dimov wrote:
Anthony Williams wrote:
The mutex algorithm is fairly simple, but I guess it's the condition variables that you are thinking of. I believe it to be reliable, but that can only be borne out with experience. Putting it on HEAD would allow more people to try it out, and more people would actually look at the code. If it's broken, I'd like to know.
Inventing mutex and condition variable algorithms is a slight cause of concern, I have to admit. The easiest way to get a free review of the algorithms is to post the code on comp.programming.threads and hope that someone of the experts there has enough time to look at them. CVs in particular have proven to be quite hard to implement correctly, and Alexander Terekhov's "8a" algorithm has emerged as a de-facto standard.
I think I'd agree with that: the issue is that no amount of experimental testing will ever be enough. Only a really thorough theoretical analysis is up to the job, and that's hard :-( Using a known good algorithm is definitely the way to go here IMO. Just another 2c worth... John.

"Peter Dimov" <pdimov@mmltd.net> writes:
Inventing mutex and condition variable algorithms is a slight cause of concern, I have to admit. The easiest way to get a free review of the algorithms is to post the code on comp.programming.threads and hope that someone of the experts there has enough time to look at them. CVs in particular have proven to be quite hard to implement correctly, and Alexander Terekhov's "8a" algorithm has emerged as a de-facto standard.
The mutex is simple, and you can see the code in boost/thread/win32/basic_mutex.hpp on the thread_rewrite branch. The algorithm is:

init():
    active_count=0; no semaphore

lock():
    atomic increment active_count
    if new active_count == 1, that's us, so we've got the lock
    else get semaphore, and wait
    now we've got the lock

unlock():
    atomic decrement active_count
    if new active_count > 0, then other threads are waiting, so release semaphore

locked():
    return active_count > 0

get_semaphore():
    if there's already a semaphore associated with this mutex, return that
    else create a new semaphore
        use atomic compare-and-swap to make this the associated semaphore if there is none
        if another thread beat us to it and already set a semaphore, destroy the new one and return the already-set one
        else return the new semaphore

The condition variable algorithm is actually a variant of Alexander Terekhov's algorithm, and uses the same "gate" idea. However, my implementation uses QueueUserAPC to wake up the waiting threads, rather than have them wait on an event. This gets round what I think is a deficiency in Alexander's algorithm, which is that once a notify operation is underway, no new threads can wait on the condition until all notified threads have actually woken up, and the last one has "opened the gate". By using QueueUserAPC, we can notify the threads to wake up, and open the gate, without having to wait for all the notified threads to resume. The flipside is that the first thread to wait is the first to be notified. Whether this is a good or bad thing is open to debate. The code is in boost/thread/win32/condition.hpp on the thread_rewrite branch.
The algorithm is as follows:

waiting_list_node:
    waiting_list_node* next, prev
    HANDLE thread_handle
    bool notified

waiting_list: doubly-linked list of waiting_list_node
gate: mutex

init():
    waiting_list.next=waiting_list.prev=&waiting_list
    init mutex

wait(external_mutex, timeout):
    create a new waiting_list_node
    new_node.thread_handle=thread handle for this thread
    new_node.prev=&waiting_list
    lock(gate)
    new_node.next=waiting_list.next
    waiting_list.next=&new_node
    new_node.next->prev=&new_node
    unlock(external_mutex)
    unlock(gate)
    // Any APC will break the sleep, so keep sleeping until we've been
    // notified, or we've timed out
    while(!atomic_read(new_node.notified) &&
          SleepEx(milliseconds_until(timeout), true)==WAIT_IO_COMPLETION);
    lock(gate)
    unlink(new_node)
    unlock(gate)
    lock(external_mutex)
    return new_node.notified // did we timeout, or were we notified?

unlink(node):
    // unlink the node from the list
    node.next->prev=node.prev
    node.prev->next=node.next
    node.next=node.prev=&node

notify_and_unlink_entry(node):
    atomic_set(node->notified,true)
    unlink(node)
    // wake the node's thread by queueing an APC
    // the APC func doesn't have to do anything to wake the thread
    QueueUserAPC(NOP(),node->thread_handle)

notify_one():
    lock(gate)
    if(waiting_list.prev==&waiting_list)
        do nothing
    else
        notify_and_unlink_entry(waiting_list.prev)
    unlock(gate)

notify_all():
    create a waiting_list_node for new_list
    lock(gate)
    // transfer the existing list over to new_list
    new_list.prev=waiting_list.prev
    new_list.next=waiting_list.next
    new_list.next->prev=&new_list
    new_list.prev->next=&new_list
    // the waiting_list is now empty
    waiting_list.next=waiting_list.prev=&waiting_list
    unlock(gate) // no one has to wait for us any more
    while(new_list.prev!=&new_list) // loop until the captured list is empty
        notify_and_unlink_entry(new_list.prev)

I'll post the algorithms over at comp.programming.threads for comment.
Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
The mutex is simple, and you can see the code in boost/thread/win32/basic_mutex.hpp on the thread_rewrite branch.
The algorithm is:
init():
    active_count=0; no semaphore
lock():
    atomic increment active_count
    if new active_count == 1, that's us, so we've got the lock
    else get semaphore, and wait
    now we've got the lock
unlock():
    atomic decrement active_count
    if new active_count > 0, then other threads are waiting, so release semaphore
This is what CRITICAL_SECTIONs do, and I recall being told by Alexander Terekhov that it is suboptimal (threads take the slow path when they could take the fast path). My tests with a contended queue seemed to prove him right. Unfortunately, I can no longer find his post where he explained that with Google.

Peter Dimov wrote:
This is what CRITICAL_SECTIONs do, and I recall being told by Alexander Terekhov that it is suboptimal (threads take the slow path when they could take the fast path). My tests with a contended queue seemed to prove him right.
Unfortunately, I can no longer find his post where he explained that with Google.
Here it is: http://aspn.activestate.com/ASPN/Mail/Message/boost/2122746

"Peter Dimov" <pdimov@mmltd.net> writes:
Peter Dimov wrote:
This is what CRITICAL_SECTIONs do, and I recall being told by Alexander Terekhov that it is suboptimal (threads take the slow path when they could take the fast path). My tests with a contended queue seemed to prove him right.
Unfortunately, I can no longer find his post where he explained that with Google.
Here it is:
Thanks. I'll have to look at Alexander's proposed swap-based implementation. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
"Peter Dimov" <pdimov@mmltd.net> writes:
Peter Dimov wrote:
This is what CRITICAL_SECTIONs do, and I recall being told by Alexander Terekhov that it is suboptimal (threads take the slow path when they could take the fast path). My tests with a contended queue seemed to prove him right.
Unfortunately, I can no longer find his post where he explained that with Google.
Here it is:
Thanks. I'll have to look at Alexander's proposed swap-based implementation.
I've played with it a bit: http://www.pdimov.com/cpp/mutex.cpp but I suspect that there are a few errors in this old code... some of the reads are not atomic but need to be. Maybe we can give it a review and bring it up to Boost standards.

"Peter Dimov" <pdimov@mmltd.net> writes:
Anthony Williams wrote:
"Peter Dimov" <pdimov@mmltd.net> writes:
Peter Dimov wrote:
This is what CRITICAL_SECTIONs do, and I recall being told by Alexander Terekhov that it is suboptimal (threads take the slow path when they could take the fast path). My tests with a contended queue seemed to prove him right.
Unfortunately, I can no longer find his post where he explained that with Google.
Here it is:
Thanks. I'll have to look at Alexander's proposed swap-based implementation.
I've played with it a bit:
http://www.pdimov.com/cpp/mutex.cpp
but I suspect that there are a few errors in this old code... some of the reads are not atomic but need to be. Maybe we can give it a review and bring it up to Boost standards.
The case given in Alexander's email above is:

A: lock [lockcount == 0]
B: lock [lockcount == 1, slow path]
A: unlock [lockcount == 0, slow path/post sema]
A: lock [lockcount == 1, slow path]

He claims that the second A: lock takes the slow path, when it could have taken the fast path because the lock is free. I disagree. The lock is NOT free, since B is waiting on it. If A can take the fast path here, that's opening the door for B to be starved. By A taking the slow path, both A and B will wait on the semaphore, and the OS can pick which wakes up.

In your code, we have:

A: lock [lock_==0, fast path, set lock_ to 1]
B: lock [lock_==1, slow path]
B: slow wait [lock_!=0, set lock_ to -1, wait on event]
A: unlock [lock_<0, set lock_ to 0, post event]
A: lock [lock_==0, fast path, set lock_ to 1]
B: event posted [lock_!=0, set lock_ to -1, wait on event]
A: unlock [lock_<0, set lock_ to 0, post event]

If A does unlock/lock without any intervening code, those last three lines can repeat forever:

A: lock [lock_==0, fast path, set lock_ to 1]
B: event posted [lock_!=0, set lock_ to -1, wait on event]
A: unlock [lock_<0, set lock_ to 0, post event]
A: lock [lock_==0, fast path, set lock_ to 1]
B: event posted [lock_!=0, set lock_ to -1, wait on event]
A: unlock [lock_<0, set lock_ to 0, post event]
... and so on.

From the OS point of view, this is a producer-consumer scheme, with A being the producer and B the consumer. B is regularly woken, and does processing until its next wait. Unfortunately, B never makes any progress. If A always takes the slow path when we have another waiter, then we give the OS more scope for ensuring that both A and B get a fair stab.
Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote: [...]
He claims that the second A: lock takes the slow path, when it could have taken the fast path because the lock is free. I disagree. The lock is NOT free, since B is waiting on it. If A can take the fast path here, that's opening the door for B to be starved.
That's not a problem as long as what we need is an efficient low-level exclusive lock. If starvation is a problem, one can solve it with a custom scheduling monitor on top of a "grab if you can" low-level exclusive lock without handoff. Low level, all pigs are equal. ;-) regards, alexander.

Alexander Terekhov <terekhov@web.de> writes:
Anthony Williams wrote: [...]
He claims that the second A: lock takes the slow path, when it could have taken the fast path because the lock is free. I disagree. The lock is NOT free, since B is waiting on it. If A can take the fast path here, that's opening the door for B to be starved.
That's not a problem as long as what we need is an efficient low-level exclusive lock. If starvation is a problem, one can solve it with a custom scheduling monitor on top of a "grab if you can" low-level exclusive lock without handoff. Low level, all pigs are equal. ;-)
So let them compete equally. If A doesn't wait on the semaphore, it has an unfair advantage. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
Alexander Terekhov <terekhov@web.de> writes:
Anthony Williams wrote: [...]
He claims that the second A: lock takes the slow path, when it could have taken the fast path because the lock is free. I disagree. The lock is NOT free, since B is waiting on it. If A can take the fast path here, that's opening the door for B to be starved.
That's not a problem as long as what we need is an efficient low-level exclusive lock. If starvation is a problem, one can solve it with a custom scheduling monitor on top of a "grab if you can" low-level exclusive lock without handoff. Low level, all pigs are equal. ;-)
So let them compete equally. If A doesn't wait on the semaphore, it has an unfair advantage.
If you can make them both "wait" on something but without the really bad consequences of handoff, I'd have no problems... except that it would simply add totally pointless overhead. regards, alexander.

Alexander Terekhov <terekhov@web.de> writes:
Anthony Williams wrote:
Alexander Terekhov <terekhov@web.de> writes:
Anthony Williams wrote: [...]
He claims that the second A: lock takes the slow path, when it could have taken the fast path because the lock is free. I disagree. The lock is NOT free, since B is waiting on it. If A can take the fast path here, that's opening the door for B to be starved.
That's not a problem as long as what we need is an efficient low-level exclusive lock. If starvation is a problem, one can solve it with a custom scheduling monitor on top of a "grab if you can" low-level exclusive lock without handoff. Low level, all pigs are equal. ;-)
So let them compete equally. If A doesn't wait on the semaphore, it has an unfair advantage.
If you can make them both "wait" on something but without the really bad consequences of handoff, I'd have no problems... except that it would simply add totally pointless overhead.
Yes, there's overhead in a semaphore wait, but it's not pointless --- it allows the OS to schedule the threads appropriately. If you have contention, both threads have to do the same thing, otherwise you've added an implicit bias. We could just do a spin-wait if we want really-fast locks and unlocks --- except that for long-lasting locks, a waiting thread would consume CPU time spinning, and hence hinder the progress of other threads. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote: [...]
bias. We could just do a spin-wait if we want really-fast locks and unlocks --- except that for long-lasting locks, a waiting thread would consume CPU time spinning, and hence hinder the progress of other threads.
Right. So we need to block/unblock. But there's no need for ownership handoff. So let the unblocked thread compete just like in the case of a spinlock. Or rather, think of the block/unblock logic in an efficient lock as just a spinlock with "usleep(magic());" resulting in the exact right time to sleep until the lock is released by the owner. Okay? ;-) regards, alexander.

Anthony Williams wrote:
Yes, there's overhead in a semaphore wait, but it's not pointless --- it allows the OS to schedule the threads appropriately. If you have contention, both threads have to do the same thing, otherwise you've added an implicit bias.
Bias in favor of the currently running thread is good as it improves performance.
We could just do a spin-wait if we want really-fast locks and unlocks ---
Spin-wait can also be slower; it just burns CPU instead of giving it to a thread that could have done something with it. The performance of the individual thread may look like an improvement, but the total time for the task can still increase.
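The spin-wait being contrasted here can be sketched in a few lines. This is a minimal illustration using standard C++ atomics (my own code and names, not from any of the libraries under discussion):

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Minimal spin-wait lock: lock() and unlock() are only a few
// instructions, but a thread that loses the race burns CPU in the
// while-loop instead of yielding it to a thread that could use it.
class spinlock
{
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock()
    {
        while(flag.test_and_set(std::memory_order_acquire))
            ;   // spin: this is the CPU time being "burned"
    }
    void unlock()
    {
        flag.clear(std::memory_order_release);
    }
};

// Small contended demo: several threads bump a shared counter.
int spin_demo(int threads, int iters)
{
    spinlock m;
    int counter = 0;
    std::vector<std::thread> ts;
    for(int i = 0; i < threads; ++i)
        ts.emplace_back([&]{
            for(int j = 0; j < iters; ++j)
            {
                m.lock();
                ++counter;
                m.unlock();
            }
        });
    for(auto& t : ts)
        t.join();
    return counter;
}
```

Whether the burned CPU is worth it depends on hold times: for locks held for a handful of instructions on a multiprocessor, spinning can beat blocking; for long-lasting locks it hurts overall throughput, which is exactly the objection raised above.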

Anthony Williams wrote: [...]
If A always takes the slow path when we have another waiter, then we give the OS more scope for ensuring that both A and B get a fair stab.
That depends on the definition of "fair". Suppose you're running on a uniprocessor and A is a higher-priority thread than B... why should it yield to B? Under POSIX rules (which define priority scheduling for scheduling allocation domains of size 1, aka "uniprocessor" domains), it shall not. With your handoff logic, A will yield to B. regards, alexander.

Alexander Terekhov <terekhov@web.de> writes:
Anthony Williams wrote: [...]
If A always takes the slow path when we have another waiter, then we give the OS more scope for ensuring that both A and B get a fair stab.
That depends on the definition of "fair". Suppose you're running on a uniprocessor and A is a higher-priority thread than B... why should it yield to B? Under POSIX rules (which define priority scheduling for scheduling allocation domains of size 1, aka "uniprocessor" domains), it shall not. With your handoff logic, A will yield to B.
Not necessarily. Both A and B will wait for the semaphore. If A is higher priority, the OS scheduler can ensure that A gets it. By waiting on a semaphore, we leave this decision to the OS, which can then decide on the appropriate definition of "fair". If A doesn't wait on the semaphore, A gets the lock again just because it already had it at the start of its timeslice. What if B is higher priority than A? Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
Alexander Terekhov <terekhov@web.de> writes:
Anthony Williams wrote: [...]
If A always takes the slow path when we have another waiter, then we give the OS more scope for ensuring that both A and B get a fair stab.
That depends on the definition of "fair". Suppose you're running on a uniprocessor and A is a higher-priority thread than B... why should it yield to B? Under POSIX rules (which define priority scheduling for scheduling allocation domains of size 1, aka "uniprocessor" domains), it shall not. With your handoff logic, A will yield to B.
Not necessarily. Both A and B will wait for the semaphore. If A is higher
Suppose that B is already waiting on the semaphore. If that is not the case, you'll get the same "starvation" scenario as with the more efficient lock, but with the added overhead of a self-consumed semaphore signal. [...]
What if B is higher priority than A?
Then A will yield to B (which is waiting for the semaphore) on the unlock slow path (efficient lock), and B will grab the lock (freed by A prior to signaling). regards, alexander.

Alexander Terekhov <terekhov@web.de> writes:
Anthony Williams wrote:
Alexander Terekhov <terekhov@web.de> writes:
Anthony Williams wrote: [...]
If A always takes the slow path when we have another waiter, then we give the OS more scope for ensuring that both A and B get a fair stab.
That depends on the definition of "fair". Suppose you're running on a uniprocessor and A is a higher-priority thread than B... why should it yield to B? Under POSIX rules (which define priority scheduling for scheduling allocation domains of size 1, aka "uniprocessor" domains), it shall not. With your handoff logic, A will yield to B.
Not necessarily. Both A and B will wait for the semaphore. If A is higher
Suppose that B is already waiting on the semaphore. If that is not the case, you'll get the same "starvation" scenario as with the more efficient lock, but with the added overhead of a self-consumed semaphore signal.
[...]
What if B is higher priority than A?
Then A will yield to B (which is waiting for the semaphore) on the unlock slow path (efficient lock), and B will grab the lock (freed by A prior to signaling).
You are assuming that B will wake immediately upon A's release of the semaphore, and that it can grab it before A does. In your "fast lock" algorithm, if B doesn't wake immediately (or even if it does, but on another CPU/core), and A continues to run, A may still grab the lock back again before B can get it, even though B was already waiting, and B is higher priority. B can wake up and run some user code, and still not actually get out of the lock construct. If A always waits on the semaphore when there is another thread already waiting, then the OS will get to choose one (and only one) thread to wake, and that thread will get the lock. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote: [...]
Alexander Terekhov <terekhov@web.de> writes:
Anthony Williams wrote:
Alexander Terekhov <terekhov@web.de> writes:
Anthony Williams wrote: [...]
If A always takes the slow path when we have another waiter, then we give the OS more scope for ensuring that both A and B get a fair stab.
That depends on the definition of "fair". Suppose you're running on a uniprocessor and A is a higher-priority thread than B... why should it yield to B? Under POSIX rules (which define priority scheduling for scheduling allocation domains of size 1, aka "uniprocessor" domains), it shall not. With your handoff logic, A will yield to B.
Not necessarily. Both A and B will wait for the semaphore. If A is higher
Suppose that B is already waiting on the semaphore. If that is not the case, you'll get the same "starvation" scenario as with the more efficient lock, but with the added overhead of a self-consumed semaphore signal.
[...]
What if B is higher priority than A?
Then A will yield to B (which is waiting for the semaphore) on the unlock slow path (efficient lock), and B will grab the lock (freed by A prior to signaling).
You are assuming that B will wake immediately upon A's release of the
I'm assuming that A and B are in the same scheduling allocation domain of size 1 (i.e. they compete for processor cycles and can't make progress in parallel). POSIX defines neither a cross-domain scheduling policy nor a scheduling policy for domains of size > 1. Note that with a scheduling allocation domain larger than 1, you also wouldn't want to hand ownership to the highest-priority waiter, because there might be any number of threads of higher priority than that waiter, any of which might want the lock "right now" (before the unblocked thread is scheduled to run). So you just need to unblock, and let the unblocked thread compete for lock ownership without handoff. It is the lack of competition for lock ownership after waiting that is the problem. regards, alexander.

Anthony Williams wrote:
The case given in Alexander's email above is:
A: lock    [lockcount == 0]
B: lock    [lockcount == 1, slow path]
A: unlock  [lockcount == 0, slow path/post sema]
A: lock    [lockcount == 1, slow path]
He claims that the second A: lock takes the slow path, when it could have taken the fast path because the lock is free. I disagree. The lock is NOT free, since B is waiting on it. If A can take the fast path here, that's opening the door for B to be starved.
This is correct. The mutex does not guarantee fairness (for the above definition of fairness) at the expense of performance. It is indeed true that in heavily contended scenarios thread A can finish well ahead of thread B; however, the total time taken by threads A+B can be (and is) significantly less than with the "fairer" CS design. As far as I know, most contemporary mutexes do the same: they give priority to thread A not only because it can take the fast path, but also because (a) its cache is hot and (b) it's better to minimize context switches.

"Peter Dimov" <pdimov@mmltd.net> writes:
Anthony Williams wrote:
The case given in Alexander's email above is:
A: lock    [lockcount == 0]
B: lock    [lockcount == 1, slow path]
A: unlock  [lockcount == 0, slow path/post sema]
A: lock    [lockcount == 1, slow path]
He claims that the second A: lock takes the slow path, when it could have taken the fast path because the lock is free. I disagree. The lock is NOT free, since B is waiting on it. If A can take the fast path here, that's opening the door for B to be starved.
This is correct. The mutex does not guarantee fairness (for the above definition of fairness) at the expense of performance. It is indeed true that in heavily contended scenarios thread A can finish well ahead of thread B; however, the total time taken by threads A+B can be (and is) significantly less than that with the "fairer" CS design.
Probably dumb question: does this reasoning apply when A and B need to run "continuously" (e.g. in a server)? -- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams wrote:
"Peter Dimov" <pdimov@mmltd.net> writes:
Anthony Williams wrote:
The case given in Alexander's email above is:
A: lock    [lockcount == 0]
B: lock    [lockcount == 1, slow path]
A: unlock  [lockcount == 0, slow path/post sema]
A: lock    [lockcount == 1, slow path]
He claims that the second A: lock takes the slow path, when it could have taken the fast path because the lock is free. I disagree. The lock is NOT free, since B is waiting on it. If A can take the fast path here, that's opening the door for B to be starved.
This is correct. The mutex does not guarantee fairness (for the above definition of fairness) at the expense of performance. It is indeed true that in heavily contended scenarios thread A can finish well ahead of thread B; however, the total time taken by threads A+B can be (and is) significantly less than that with the "fairer" CS design.
Probably dumb question: does this reasoning apply when A and B need to run "continuously" (e.g. in a server)?
I think that it does, because A and B are roles, not permanent labels. At the last lock above, A is the currently running thread; B is a thread that is currently blocked on the mutex. Since the scheduler distributes the "currently running" role among the threads based on its notion of fairness, a synchronization primitive that favors the currently running thread is as fair as the scheduler in the long run. I don't claim to be an expert on this stuff, though. :-)

"Peter Dimov" <pdimov@mmltd.net> writes:
David Abrahams wrote:
"Peter Dimov" <pdimov@mmltd.net> writes:
Anthony Williams wrote:
The case given in Alexander's email above is:
A: lock    [lockcount == 0]
B: lock    [lockcount == 1, slow path]
A: unlock  [lockcount == 0, slow path/post sema]
A: lock    [lockcount == 1, slow path]
He claims that the second A: lock takes the slow path, when it could have taken the fast path because the lock is free. I disagree. The lock is NOT free, since B is waiting on it. If A can take the fast path here, that's opening the door for B to be starved.
This is correct. The mutex does not guarantee fairness (for the above definition of fairness) at the expense of performance. It is indeed true that in heavily contended scenarios thread A can finish well ahead of thread B; however, the total time taken by threads A+B can be (and is) significantly less than that with the "fairer" CS design.
Probably dumb question: does this reasoning apply when A and B need to run "continuously" (e.g. in a server)?
I think that it does, because A and B are roles, not permanent labels. At the last lock above, A is the currently running thread; B is a thread that is currently blocked on the mutex. Since the scheduler distributes the "currently running" role among the threads based on its notion of fairness, a synchronization primitive that favors the currently running thread is as fair as the scheduler in the long run.
I get it: A fair scheduler will force context switches at arbitrary points (e.g. on some timer interval) anyway, so there's no point in _encouraging_ an expensive context switch where access to resources needs to be arbitrated. Now that I said it so simply, I buy 100% into the arguments you and Alexander have been putting forth... so I must be missing something :). Nothing in threading is supposed to be simple. OK, so here's what I think I missed: B is blocking on a resource. The scheduler only distributes the running role among the ready threads. If B never becomes ready, it starves. The only way B can become ready is if A releases its lock to the system, so I guess the question is, does A's unlock actually move B into the ready state, or at least cause the scheduler to try to ready B in its next timeslice? -- Dave Abrahams Boost Consulting www.boost-consulting.com

David Abrahams wrote:
OK, so here's what I think I missed: B is blocking on a resource. The scheduler only distributes the running role among the ready threads. If B never becomes ready, it starves. The only way B can become ready is if A releases its lock to the system, so I guess the question is, does A's unlock actually move B into the ready state, or at least cause the scheduler to try to ready B in its next timeslice?
A's unlock has to wake up B (otherwise it will never run, even though the mutex is free), so yes. The only flaw in the ointment ( :-) ) is if the scheduler decides to preempt A while it's inside the critical section. But this is a general problem with any mutex scheme; if threads are frequently preempted while holding a lock, then that lock's granularity is wrong and the application needs to be redesigned. The same situation can occur with three threads, by the way:
A: lock    [lockcount == 0]
B: lock    [lockcount == 1, slow path]
A: unlock  [lockcount == 0, slow path/post sema]
C: lock    [lockcount == 1, slow path]

"Peter Dimov" <pdimov@mmltd.net> writes:
Anthony Williams wrote:
The mutex is simple, and you can see the code in boost/thread/win32/basic_mutex.hpp on the thread_rewrite branch.
The algorithm is:
init(): active_count=0; no semaphore
lock():
    atomic increment active_count
    if new active_count == 1, that's us, so we've got the lock
    else get semaphore, and wait
    now we've got the lock

unlock():
    atomic decrement active_count
    if new active_count > 0, then other threads are waiting, so release semaphore.
This is what CRITICAL_SECTIONs do, and I recall being told by Alexander Terekhov that it is suboptimal (threads take the slow path when they could take the fast path). My tests with a contended queue seemed to prove him right.
I'm not sure how it could be faster. If a thread takes the slow path that's because another thread already has the lock. It's possible that that other thread is just about to release it, but we can't know that.
Unfortunately, I can no longer find his post where he explained that with Google.
If there's a faster algorithm, I'd certainly like to see it, so it would be good if you could find the link. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote: [...]
// Any APC will break the sleep, so keep sleeping until we've been
// notified, or we've timed out
while(!atomic_read(new_node.notified) &&
      SleepEx(milliseconds_until(timeout), true) == WAIT_IO_COMPLETION);
lock(gate)
(generally, accessing *this after waiting without any sync with the dtor) The gate is just a mutex (in 8a, the gate is a semaphore unlocked/posted by the last exiting waiter in the case of no-NOP signaling). You've got a problem here regarding POSIX safety with respect to CV destruction. POSIX says that it "shall be safe to destroy an initialized condition variable upon which no threads are currently blocked", and even gives an example illustrating it using broadcast() followed by destroy(). 8a handles it by synchronizing with exiting waiters on a gate semaphore in the CV dtor. regards, alexander.

Alexander Terekhov <terekhov@web.de> writes:
Anthony Williams wrote: [...]
// Any APC will break the sleep, so keep sleeping until we've been
// notified, or we've timed out
while(!atomic_read(new_node.notified) &&
      SleepEx(milliseconds_until(timeout), true) == WAIT_IO_COMPLETION);
lock(gate)
(generally, access *this after waiting without any sync with dtor)
The gate is just a mutex (in 8a, the gate is a semaphore unlocked/posted by the last exiting waiter in the case of no-NOP signaling). You've got a problem here regarding POSIX safety with respect to CV destruction. POSIX says that it "shall be safe to destroy an initialized condition variable upon which no threads are currently blocked", and even gives an example illustrating it using broadcast() followed by destroy(). 8a handles it by synchronizing with exiting waiters on a gate semaphore in the CV dtor.
I currently haven't thought about destruction much. This is an important issue, so thanks for raising it. I'll post afresh when I've got a solution. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
By "William", do you really mean me, Anthony Williams? Yes, sorry for having misspelled your name.
If so, then most of the code is ready for a switch to HEAD, I believe. Fine!
.... Putting it on HEAD would allow more people to try it out, and more people would actually look at the code. If it's broken, I'd like to know. Reasonable. And we can always step back, if really necessary.
The read-write mutexes aren't ready yet, but the old ones are broken anyway. I didn't expect this to be ready for this first step anyway.
I've run the tests on my changes many times, and they all pass, for the bits I've done. For the areas I haven't touched yet (TSS, thread launching, etc.), I'm happy to revert to Bill's code. OK. So I will take what is currently in the win32 thread_rewrite branch as a basis.
Although I will still need some more time for Boost.Build v2. I am currently having some issues with the bbv2 Windows build. And I am afraid we will need bbv2 now because the regression tests are already using it. If you (Anthony) think you can spare some time to help me find what is going wrong, I would appreciate it. But I propose to continue the discussion on the thread list until we have successfully moved to HEAD. Roland

Roland Schwarz <roland.schwarz@chello.at> writes:
Although I will still need some more time for Boost.Build v2. I am currently having some issues with the bbv2 Windows build. And I am afraid we will need bbv2 now because the regression tests are already using it.
If you (Anthony) think you can spare some time to help me find what is going wrong, I would appreciate it. But I propose to continue the discussion on the thread list until we have successfully moved to HEAD.
I'll gladly help. See you on the thread list. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
The read-write mutexes aren't ready, yet, but the old ones are broken anyway.
Feel free to use http://www.pdimov.com/cpp/rw_mutex.cpp until a better implementation turns up. :-)

On 9/19/06, Peter Dimov <pdimov@mmltd.net> wrote:
Anthony Williams wrote:
The read-write mutexes aren't ready, yet, but the old ones are broken anyway.
Feel free to use
http://www.pdimov.com/cpp/rw_mutex.cpp
until a better implementation turns up. :-)
Here's a lightweight, starvation-free (ticket based, no preference) reader-writer lock. It only works with vc++ right now; you'll have to change an _InterlockedCompareExchange() and Sleep() for other platforms, but that should be easy. Otherwise it already follows the boost model, but uses scoped_read_lock and scoped_write_lock instead of just scoped_lock. http://dev.int64.org/snips/rwlock.hpp
-- Cory Nelson http://www.int64.org
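The "ticket based, no preference" idea deserves a quick illustration. Here is a sketch of a ticket lock for the simpler exclusive-only case (my own code and names, not Cory's rwlock.hpp, which by his description extends the same idea to readers and writers):

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Ticket lock: each thread takes a monotonically increasing ticket and
// waits until "now serving" reaches it. Arrival order is service order,
// which is what makes the lock starvation-free.
class ticket_lock
{
    std::atomic<unsigned> next_ticket{0};
    std::atomic<unsigned> now_serving{0};
public:
    void lock()
    {
        unsigned const my = next_ticket.fetch_add(1, std::memory_order_relaxed);
        while(now_serving.load(std::memory_order_acquire) != my)
            std::this_thread::yield();  // stand-in for the Sleep() Cory mentions
    }
    void unlock()
    {
        // only the owner writes now_serving, admitting the next ticket holder
        now_serving.fetch_add(1, std::memory_order_release);
    }
};

// Small contended demo: several threads bump a shared counter.
int demo(int threads, int iters)
{
    ticket_lock m;
    int counter = 0;
    std::vector<std::thread> ts;
    for(int i = 0; i < threads; ++i)
        ts.emplace_back([&]{
            for(int j = 0; j < iters; ++j)
            {
                m.lock();
                ++counter;
                m.unlock();
            }
        });
    for(auto& t : ts)
        t.join();
    return counter;
}
```

The tradeoff is the mirror image of the handoff discussion earlier in the thread: strict FIFO ordering rules out starvation, but it also rules out a fast relock by the running thread, so a ticket lock typically pays more context switches under contention.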

"Cory Nelson" <phrosty@gmail.com> writes:
On 9/19/06, Peter Dimov <pdimov@mmltd.net> wrote:
Anthony Williams wrote:
The read-write mutexes aren't ready, yet, but the old ones are broken anyway.
Feel free to use
http://www.pdimov.com/cpp/rw_mutex.cpp
until a better implementation turns up. :-)
Here's a lightweight starvation free (ticket based, no preference) reader-writer lock.
Thanks, guys. I'll have a look later. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

"Peter Dimov" <pdimov@mmltd.net> writes:
What does the Boost community think of
http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2006/n2090.html http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2006/n2094.html http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2006/n2096.html
I prefer the thread launching API in N2090 to that in N2094, even though N2094 is closer to the existing boost::thread interface. The "futures" part of N2094 seems very akin to N2096, and reasonably sensible. I like the idea of condition variables being parameterised on the lock type. call_once is notably absent from all papers. You've already seen my comments on "convertible shared ownership" (from N2094) over on the committee reflector, but for everyone else: I think "convertible shared ownership" is a bad idea, as it raises the potential for deadlocks, and given exclusive ownership, shared ownership, and upgradable ownership, "convertible shared ownership" is not required. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

On Sep 19, 2006, at 11:03 AM, Anthony Williams wrote:
You've already seen my comments on "convertible shared ownership" (from N2094) over on the committee reflector, but for everyone else: I think "convertible shared ownership" is a bad idea, as it raises the potential for deadlocks, and given exclusive ownership, shared ownership, and upgradable ownership, "convertible shared ownership" is not required.
One of the points I keep forgetting to raise on this subject: convertible_shared_mutex and upgradable_mutex represent two engineering tradeoffs. One cannot have a "perfect upgradable mutex" because there is inherently no perfect design. It is a compromise. upgradable_mutex represents the case where one *must* be able to convert from shared locking to exclusive locking. But it gives up concurrency to do so: though upgradable ownership can share with sharable ownership, it can't share with upgradable ownership. convertible_shared_mutex represents the opposite compromise: you lose nothing in concurrency, but you do lose the certainty that you can atomically upgrade to an exclusive lock. You might be able to, or you might have to settle for a non-atomic upgrade to exclusive. I believe that applications exist which favor each of these tradeoffs. Thus neither is a complete substitute for the other. -Howard
participants (13)

- Alexander Terekhov
- Anthony Williams
- Beman Dawes
- Cory Nelson
- David Abrahams
- Doug Gregor
- Hartmut Kaiser
- Howard Hinnant
- Jeff Garland
- John Maddock
- Paul Baxter
- Peter Dimov
- Roland Schwarz