[thread] Roadmap for RW locks ?

Having tried to catch up with the last RW-lock discussion I am left wondering what is the conclusion of the discussion. In thread_rewrite there is an implementation from Anthony. This however currently is only for windows so far, correct?

As I do understand the problem (please correct me if I am wrong) there is not a single reader-writer problem, but three:
1) reader-priority rw-access
2) writer-priority rw-access
3) starvation free rw-access
which of these three cases applies is in the application domain and not in the domain of the library implementation. Consequently a generally usable library would need to provide mutices for all three cases. I can see this as a policy that is fed to the ctor of the rwmutex in the original draft of Bill Kempf: writer_priority, reader_priority, alternating_many_reads, alternating_single_read. I cannot see however a similar policy parameter in thread_rewrite version. How is this getting controlled in this case?

Glancing over Anthony's code I can see that he is making use of some atomic primitives and semaphore calls. So I think this should be fairly easy to port. However, since rw-mutices can be made of a mutex and some condition variables, shouldn't we follow the old principle of getting it correct first and optimize later? I'd very much like to see this happen. Not only is the generic version valuable as a fallback, but also as a reference to measure performance and correctness against.

I propose the following roadmap:
0) Decide if we want the current (broken) version in 1.34. Please note that this is URGENT. (A previous poll about this interestingly did not get answered by anyone.)
1) Decide if the current interface proposed by Bill Kempf still is appropriate.
2) Correct the bug that is present in Bill's current (generic) implementation.
3) Optimize for each platform.

Please give me your feedback about these issues.

Thank you
Roland

Roland Schwarz wrote:
Having tried to catch up with the last RW-lock discussion I am left wondering what is the conclusion of the discussion.
In thread_rewrite there is an implementation from Anthony. This however currently is only for windows so far, correct?
POSIX already has a rwlock; it's a mandatory part of the threading support (despite having its own feature test macro). A reimplementation could be necessary if the Boost version supports upgradable locks, though.
As I do understand the problem (please correct me if I am wrong) there is not a single reader-writer problem, but three:
1) reader-priority rw-access 2) writer-priority rw-access 3) starvation free rw-access
which of these three cases applies is in the application domain and not in the domain of the library implementation.
I can't imagine a use case for a reader-priority rwlock; it seems to me that an rwlock is simply not the correct synchronization primitive in such a case. Linux NPTL seems to supply one, though. So I may be wrong. I think that we now have a fairly good idea of how a rwlock is supposed to operate, and that our understanding coincides with the POSIX specification, under the assumption that the threads have equal priorities. In a nutshell, the rule is: 'rdlock' succeeds in obtaining a read lock if and only if no thread has obtained a write lock and there is no thread blocked in 'lock'. In my opinion, we only need to do one version, and do it right.
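To illustrate the rule (this is only a sketch with a plain mutex and condition variable, not a proposed implementation; all names here are made up):

#include <boost/thread/mutex.hpp>
#include <boost/thread/condition.hpp>

// Sketch only: a reader may proceed iff no thread holds the write lock
// and no thread is blocked waiting for it.
class rw_sketch
{
    boost::mutex mtx;
    boost::condition cond;
    unsigned readers;          // threads currently holding a read lock
    bool writer;               // true while a thread holds the write lock
    unsigned writers_waiting;  // threads blocked in lock()
public:
    rw_sketch() : readers(0), writer(false), writers_waiting(0) {}

    void rdlock()
    {
        boost::mutex::scoped_lock lk(mtx);
        while(writer || writers_waiting)   // the rule quoted above
            cond.wait(lk);
        ++readers;
    }
    void rdunlock()
    {
        boost::mutex::scoped_lock lk(mtx);
        if(--readers == 0)
            cond.notify_all();
    }
    void lock()
    {
        boost::mutex::scoped_lock lk(mtx);
        ++writers_waiting;
        while(writer || readers)
            cond.wait(lk);
        --writers_waiting;
        writer = true;
    }
    void unlock()
    {
        boost::mutex::scoped_lock lk(mtx);
        writer = false;
        cond.notify_all();
    }
};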
However, since rw-mutices can be made of a mutex and some condition variables, shouldn't we follow the old principle of getting it correct first and optimize later?
The problem in this case is that a rwlock is almost purely a performance optimization. It is generally not used for its semantics (although I can imagine a few use cases for them); a mutex is replaced with a rwlock in an attempt to increase reader performance, preferably without slowing down writers to a standstill. Viewed in this light, a read/write mutex that is slower than an ordinary mutex for the typical use cases is - IMO - not really worth having in the library.
0) Decide if we want the current (broken) version in 1.34. Please note that this is URGENT. (A previous poll about this interestingly did not get answered by anyone.)
We've never wanted a broken version of any component in any release. :-)

Peter Dimov wrote:
In my opinion, we only need to do one version, and do it right.
I am not yet convinced.
Although Wikipedia is not a citable resource: http://en.wikipedia.org/wiki/Readers-writers_problem
Butenhof's example in "Programming with Posix Threads".
Bill Kempf's implementation.
Several sites I looked up on the net.
I really would like to ask: how is the problem taught in schools? E.g. in an operating systems course?
The problem in this case is that a rwlock is almost purely a performance optimization. It is generally not used for its semantics (although I can imagine a few use cases for them); a mutex is replaced with a rwlock in an attempt to increase reader performance, preferably without slowing down writers to a standstill.
??? Sorry I absolutely cannot understand the previous. A pure performance optimization? I always thought it was mainly used because of its semantics: You are letting multiple readers access at the same time: This is the main source of performance gain. Of course, if you are measuring threads that do no operations while holding the lock, you will measure the price of the lock implementation. However if you are doing substantial calculation while holding the lock I am almost sure that this will be a win, even with the "emulated" rw-lock.
Viewed in this light, a read/write mutex that is slower than an ordinary mutex for the typical use cases is - IMO - not really worth having in the library.
I am still not convinced. What are the typical cases? Could you possibly come up with examples?
We've never wanted a broken version of any component in any release. :-)
Ok second vote to remove it. (Mine first. ;-) Roland

Roland Schwarz wrote:
Peter Dimov wrote:
In my opinion, we only need to do one version, and do it right.
I am not yet convinced.
Although Wikipedia is not a citable resource: http://en.wikipedia.org/wiki/Readers-writers_problem
Butenhof's example in "Programming with Posix Threads".
Bill Kempf's implementation.
Several sites I looked up on the net.
Have you ever encountered a situation in which you needed an implementation that could starve readers or writers? (Note that it is possible for a no-starvation implementation to favor readers or writers, but this is different.)
The problem in this case is that a rwlock is almost purely a performance optimization. It is generally not used for its semantics (although I can imagine a few use cases for them); a mutex is replaced with a rwlock in an attempt to increase reader performance, preferably without slowing down writers to a standstill.
???
Sorry I absolutely cannot understand the previous. A pure performance optimization? I always thought it was mainly used because of its semantics: You are letting multiple readers access at the same time: This is the main source of performance gain.
There are very few cases in which an rwlock is used for its semantics; by this I mean that the rwlock cannot be replaced by a mutex without changing the correctness of the code. In the majority of cases, a resource is protected by a rwlock not because the program logic demands reader parallelism, but because there is potentially a performance advantage in allowing reader parallelism. Therefore, if by replacing a rwlock with a mutex you are realizing a performance gain, this rwlock implementation has a negative contribution to the value of the library, because it lures users into what they believe to be an optimization, but is actually a pessimization. False advertising and all that.

Peter Dimov wrote:
Have you ever encountered a situation in which you needed an implementation that could starve readers or writers?
Although recognizing that this question is likely meant as a rhetorical one, I do not think that it matters whether I encountered such a situation. But I can easily envision an example: garbage collection: one class of threads that use a container in a way that strongly interferes with the garbage collector. The first class are the "readers" and the garbage collector is the writer. The "readers" should be able to hold off the garbage collector with priority. I have to admit that this example is somewhat uncorrelated to a rw-lock since the "readers" likely are not only performing ro access. But perhaps I have a different conception about rw locks. To me they are a special case of the more general group locks. E.g.: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dndllpro/ht...
There are very few cases in which an rwlock is used for its semantics; by this I mean that the rwlock cannot be replaced by a mutex without changing the correctness of the code.
Ok, understood, but there are cases.
In the majority of cases, a resource is protected by a rwlock not because the program logic demands reader parallelism, but because there is potentially a performance advantage in allowing reader parallelism. Therefore, if by replacing a rwlock with a mutex you are realizing a performance gain, this rwlock implementation has a negative contribution to the value of the library, because it lures users into what they believe to be an optimization, but is actually a pessimization. False advertising and all that.
Agreed. But what is wrong with providing the other scheduling strategies as well? Do you think that the current interface prohibits optimization? And what is wrong with also having a generic variant for platforms where the optimization has not yet been carried out? If you think the interface is lacking, could you please tell in which respect? Roland

Roland Schwarz wrote:
Having tried to catch up with the last RW-lock discussion I am left wondering what is the conclusion of the discussion.
In thread_rewrite there is an implementation from Anthony. This however currently is only for windows so far, correct?
As I do understand the problem (please correct me if I am wrong) there is not a single reader-writer problem, but three:
1) reader-priority rw-access 2) writer-priority rw-access 3) starvation free rw-access
which of these three cases applies is in the application domain and not in the domain of the library implementation.
Consequently a generally usable library would need to provide mutices for all three cases. I can see this as a policy that is fed to the ctor of the rwmutex in the original draft of Bill Kempf:
writer_priority, reader_priority, alternating_many_reads, alternating_single_read
I cannot see however a similar policy parameter in thread_rewrite version. How is this getting controlled in this case?
Glancing over Anthony's code I can see that he is making use of some atomic primitives and semaphore calls. So I think this should be fairly easy to port.
However, since rw-mutices can be made of a mutex and some condition variables, shouldn't we follow the old principle of getting it correct first and optimize later? I'd very much like to see this happen. Not only is the generic version valuable as a fallback, but also as a reference to measure performance and correctness against.
I propose the following roadmap:
0) Decide if we want the current (broken) version in 1.34. Please note that this is URGENT. (A previous poll about this interestingly did not get answered by anyone.)
Apologies, but I think you know my vote: if it was too broken to go in 1.33 then we shouldn't have it in 1.34. When it resurfaces there will be an expectation that it has actually been fixed.
1) Decide if the current interface proposed by Bill Kempf still is appropriate.
2) Correct the bug that is present in Bill's current (generic) implementation.
Is this the version that is in mainline cvs or something else? I ask because the version in current cvs was (all IMO) far too clever for its own good: it hasn't been fixed because, frankly, the design was such that it was hard, verging on impossible, to fix. I'd much rather see a simpler interface with a lean and mean implementation that a mere human stands a chance of verifying as correct. Remember we can't rely on testing to verify correctness, we need to be able to analyse the code. BTW: I'm not criticising you guys over this: I actually think you've done a really impressive job rewriting and analysing the threading code! Regards, John.
3) Optimize for each platform.
Please give me your feedback about these issues.
Thank you
Roland

Here is a new experimental eventcount based rw-mutex algorithm that I created: http://groups.google.com/group/comp.programming.threads/browse_thread/thread... (my rw-mutex algorithm...) http://groups.google.com/group/comp.programming.threads/browse_thread/thread... (my eventcount algorithm...) So, what do you all think of these techniques? :^)

Roland Schwarz <roland.schwarz@chello.at> writes:
In thread_rewrite there is an implementation from Anthony. This however currently is only for windows so far, correct?
Yes. We need a POSIX version. For simple rw-locks, it should be a trivial wrapper around the POSIX-supplied rw-locks. For upgradeable stuff, we will need to wrap it.
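For the simple case, such a trivial wrapper could look roughly like this (an illustrative sketch only, error handling omitted; the member names merely mirror the ones used later in this thread):

#include <pthread.h>

// Sketch: a thin wrapper over the POSIX rwlock; a real version would
// add error checking and whatever C++ interface we agree on.
class posix_rw_mutex
{
    pthread_rwlock_t rw;
public:
    posix_rw_mutex()        { pthread_rwlock_init(&rw, 0); }
    ~posix_rw_mutex()       { pthread_rwlock_destroy(&rw); }
    void lock_shareable()   { pthread_rwlock_rdlock(&rw); }
    void unlock_shareable() { pthread_rwlock_unlock(&rw); }
    void lock()             { pthread_rwlock_wrlock(&rw); }
    void unlock()           { pthread_rwlock_unlock(&rw); }
private:
    posix_rw_mutex(const posix_rw_mutex&);            // non-copyable
    posix_rw_mutex& operator=(const posix_rw_mutex&);
};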
As I do understand the problem (please correct me if I am wrong) there is not a single reader-writer problem, but three:
1) reader-priority rw-access 2) writer-priority rw-access 3) starvation free rw-access
which of these three cases applies is in the application domain and not in the domain of the library implementation.
IMO, 3 is the only way to go.
Consequently a generally usable library would need to provide mutices for all three cases. I can see this as a policy that is fed to the ctor of the rwmutex in the original draft of Bill Kempf:
writer_priority, reader_priority, alternating_many_reads, alternating_single_read
I cannot see however a similar policy parameter in thread_rewrite version. How is this getting controlled in this case?
It is not. After much discussion and various implementations from myself, Peter, and Chris, as well as examining prior implementations from others, the final algorithm is "fair". We all seemed to be in agreement that "fair" was the way to go. Reader-priority can potentially starve writers. Writer-priority can potentially starve readers. The "fair" solution in the thread-rewrite implementation is that a writer blocked in lock will stop additional readers, but when the last reader (or the only writer) unlocks then one writer and all the readers are unblocked. These then compete on equal terms. If a reader gets in first, but the writer blocks before the other readers have run, then they will block again.
Glancing over Anthony's code I can see that he is making use of some atomic primitives and semaphore calls. So I think this should be fairly easy to port.
Yes, it should. The hardest part will be the WaitForMultipleObjects used in lock() to ensure that a writer only unblocks when it can get the shared semaphore AND the exclusive semaphore. In the generic version this isn't too much of a problem, actually: if we use cond vars, then the mutex is held when they are signalled, so both the writer and the readers are all competing for the mutex in order to return from the wait.
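The Windows-side idea is roughly of this shape (sketch only; handle creation and the surrounding bookkeeping are omitted):

#include <windows.h>

// Sketch: a writer blocks until it owns BOTH semaphores. With
// bWaitAll == TRUE the call returns only when every handle in the
// array is signalled, so the writer is never left holding just one.
bool writer_wait(HANDLE shared_semaphore, HANDLE exclusive_semaphore)
{
    HANDLE handles[2] = { shared_semaphore, exclusive_semaphore };
    return WaitForMultipleObjects(2, handles, TRUE, INFINITE) == WAIT_OBJECT_0;
}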
However, since rw-mutices can be made of a mutex and some condition variables, shouldn't we follow the old principle of getting it correct first and optimize later? I'd very much like to see this happen. Not only is the generic version valuable as a fallback, but also as a reference to measure performance and correctness against.
OK, I'll provide a generic implementation of my algorithm.... Attached.
I propose the following roadmap:
0) Decide if we want the current (broken) version in 1.34. Please note that this is URGENT. (A previous poll about this interestingly did not get answered by anyone.)
I didn't reply because I had already expressed my opinion that it should be removed.
1) Decide if the current interface proposed by Bill Kempf still is appropriate.
No, it's too complicated. I think we should follow the interface from Howard's proposal to the C++ Standards committee (N2094).
2) Correct the bug that is present in Bill's current (generic) implementation.
No. This is not reasonably possible.
3) Optimize for each platform.
Optimization is important. As Peter says, the whole point of a rw-lock is that it is an optimization, so if it's slow, then it loses much of the benefit. I have now extended my implementation to include upgradeable stuff. I was planning on working on try and timed variants next. I have attached a generic implementation of my basic rw algorithm using a mutex and two cond vars, without the upgradeable stuff. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

#ifndef BOOST_THREAD_WIN32_READ_WRITE_MUTEX_HPP
#define BOOST_THREAD_WIN32_READ_WRITE_MUTEX_HPP

// (C) Copyright 2006 Anthony Williams
//
// Distributed under the Boost Software License, Version 1.0. (See
// accompanying file LICENSE_1_0.txt or copy at
// http://www.boost.org/LICENSE_1_0.txt)

#include <boost/assert.hpp>
#include <boost/static_assert.hpp>
#include <boost/thread/mutex.hpp>
#include <boost/thread/condition.hpp>

namespace boost
{
    class read_write_mutex
    {
    private:
        struct state_data
        {
            unsigned shared_count:10;
            unsigned shared_waiting:10;
            unsigned exclusive:1;
            unsigned exclusive_waiting:10;
            unsigned exclusive_waiting_blocked:1;

            friend bool operator==(state_data const& lhs, state_data const& rhs)
            {
                return *reinterpret_cast<unsigned const*>(&lhs) ==
                       *reinterpret_cast<unsigned const*>(&rhs);
            }
        };

        state_data state;
        boost::mutex state_change;
        boost::condition shared_cond;
        boost::condition exclusive_cond;

        void release_waiters(state_data old_state)
        {
            if(old_state.exclusive_waiting)
                exclusive_cond.notify_one();
            if(old_state.shared_waiting || old_state.exclusive_waiting)
                shared_cond.notify_all();
        }

    public:
        read_write_mutex()
        {
            state_data state_={0};
            state=state_;
        }

        ~read_write_mutex()
        {}

        void lock_shareable()
        {
            boost::mutex::scoped_lock lock(state_change);
            while(true)
            {
                state_data old_state=state;
                state_data new_state=old_state;
                if(new_state.exclusive || new_state.exclusive_waiting_blocked)
                    ++new_state.shared_waiting;
                else
                    ++new_state.shared_count;
                state=new_state;
                if(!(old_state.exclusive | old_state.exclusive_waiting_blocked))
                    return;
                shared_cond.wait(lock);
            }
        }

        void unlock_shareable()
        {
            boost::mutex::scoped_lock lock(state_change);
            state_data old_state=state;
            state_data new_state=old_state;
            bool const last_reader=!--new_state.shared_count;
            if(last_reader)
            {
                if(new_state.exclusive_waiting)
                {
                    --new_state.exclusive_waiting;
                    new_state.exclusive_waiting_blocked=false;
                }
                new_state.shared_waiting=0;
            }
            state=new_state;
            if(last_reader)
                release_waiters(old_state);
        }

        void lock()
        {
            boost::mutex::scoped_lock lock(state_change);
            while(true)
            {
                state_data old_state=state;
                state_data new_state=old_state;
                if(new_state.shared_count || new_state.exclusive)
                {
                    ++new_state.exclusive_waiting;
                    new_state.exclusive_waiting_blocked=true;
                }
                else
                    new_state.exclusive=true;
                state=new_state;
                if(!old_state.shared_count && !old_state.exclusive)
                    break;
                exclusive_cond.wait(lock);
            }
        }

        void unlock()
        {
            boost::mutex::scoped_lock lock(state_change);
            state_data old_state=state;
            state_data new_state=old_state;
            new_state.exclusive=false;
            if(new_state.exclusive_waiting)
            {
                --new_state.exclusive_waiting;
                new_state.exclusive_waiting_blocked=false;
            }
            new_state.shared_waiting=0;
            state=new_state;
            release_waiters(old_state);
        }

        class scoped_read_lock
        {
            read_write_mutex& m;
        public:
            scoped_read_lock(read_write_mutex& m_):
                m(m_)
            {
                m.lock_shareable();
            }
            ~scoped_read_lock()
            {
                m.unlock_shareable();
            }
        };

        class scoped_write_lock
        {
            read_write_mutex& m;
            bool locked;
        public:
            scoped_write_lock(read_write_mutex& m_):
                m(m_),locked(false)
            {
                lock();
            }
            void lock()
            {
                m.lock();
                locked=true;
            }
            void unlock()
            {
                m.unlock();
                locked=false;
            }
            ~scoped_write_lock()
            {
                if(locked)
                    unlock();
            }
        };
    };
}

#endif
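For reference, typical use of the attached class with its scoped lock types would look something like this (my own illustrative example, assuming the attached header is included; not part of the attachment):

#include <map>
#include <string>

boost::read_write_mutex table_mutex;        // the class defined above
std::map<std::string, int> table;

int lookup(std::string const& key)
{
    // many threads may hold a read lock concurrently
    boost::read_write_mutex::scoped_read_lock lk(table_mutex);
    std::map<std::string, int>::const_iterator i = table.find(key);
    return i != table.end() ? i->second : -1;
}

void update(std::string const& key, int value)
{
    // a writer excludes readers and other writers
    boost::read_write_mutex::scoped_write_lock lk(table_mutex);
    table[key] = value;
}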

Anthony Williams wrote:
OK, I'll provide a generic implementation of my algorithm.... Attached.
Since you are obviously deviating from the current interface (although I thought we agreed that we leave everything as is for the moment), may I kindly ask you to put together a short description or prototype of the planned new interface? The current implementation is partly so unreadable because Bill tried hard to express common behavior (try and timed) by means of templates instead of inheritance. Are you planning to mimic this? Also you are providing scoped_read_lock, scoped_write_lock but no scoped_read_write_lock. Is my assumption correct that you renamed this to scoped_upgradeable_lock? Thank you, Roland

Roland Schwarz <roland.schwarz@chello.at> writes:
Anthony Williams wrote:
OK, I'll provide a generic implementation of my algorithm.... Attached.
Since you are obviously deviating from the current interface (although I thought we agreed that we leave everything as is for the moment),
That was when the goal was a simple reimplementation under the BSL. Now Bill's code is under the BSL, the goalposts have moved to focusing on a potential interface for the new standard. Are you in agreement that this is a worthwhile goal?
may I kindly ask you to put together a short description or prototype of the planned new interface?
Howard does quite a good job in N2094: http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2006/n2094.html I like the idea of maintaining the mutex_type::scoped_lock members, but they could just be typedefs to lock<mutex_type>, as they can just delegate to the appropriate public member functions. If you would like, I'll be happy to oblige.
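Something along these lines, for instance (only a sketch of the idea, not worked-out code; the member functions are declarations only):

namespace sketch
{
    // One generic lock template that delegates to the mutex's
    // public member functions.
    template<class Mutex>
    class lock
    {
        Mutex& m;
    public:
        explicit lock(Mutex& m_) : m(m_) { m.lock(); }
        ~lock() { m.unlock(); }
    private:
        lock(const lock&);
        lock& operator=(const lock&);
    };

    class mutex
    {
    public:
        void lock();    // declarations only, for illustration
        void unlock();

        // the familiar nested name kept as a typedef
        typedef sketch::lock<mutex> scoped_lock;
    };
}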
The current implementation is partly so unreadable because Bill tried hard to express common behavior (try and timed) by means of templates instead of inheritance. Are you planning to mimic this?
I intend to try and keep the implementation as clear and simple as possible, which includes eliminating duplication. Sometimes this means templates, sometimes inheritance. I am not sure we need separate types for try_ and timed_ variants of a given mutex, but if we need them for one platform (because a mutex without the options can be simplified), it might be worth providing typedefs or delegating classes for other platforms.
Also you are providing scoped_read_lock, scoped_write_lock but no scoped_read_write_lock. Is my assumption correct that you renamed this to scoped_upgradeable_lock?
Yes. This is one area where I'm not sure I agree with Howard. Howard seems to have gone "all moveable", and his proposal allows for upgrading and downgrading locks by moving them around --- you upgrade an upgradeable to exclusive by move-constructing an exclusive lock from your upgradeable one. Though I think this is a useful facility, it is quite a drastic change from the status quo, and at the moment I favour simple "upgrade" and "downgrade" member functions. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
That was when the goal was a simple reimplementation under the BSL. Now Bill's code is under the BSL, the goalposts have moved to focusing on a potential interface for the new standard. Are you in agreement that this is a worthwhile goal?
Well, yes in principle, but what about the ongoing platform split? It is not finished yet. And then we should concentrate on getting 1.34 ready, shouldn't we? I have to say that it is a rather thankless task to do the work for the split, just to possibly see that everything is superfluous, because we are doing a new interface. (I am also just doing this in my spare time, and I also rather like thinking about new stuff.) I am really inclined to abandon work on thread_rewrite if we instead opt to concentrate on a new interface.
Howard does quite a good job in N2094:
http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2006/n2094.html
To be honest I haven't yet studied this in every detail.
I like the idea of maintaining the mutex_type::scoped_lock members, but they could just be typedefs to lock<mutex_type>, as they can just delegate to the appropriate public member functions.
I prefer the lock being templated on the mutex, because I think the lock class is the more important object, owing to its enforcement of the scoped usage pattern.
If you would like, I'll be happy to oblige.
Possibly yes :-)
I am not sure we need separate types for try_ and timed_ variants of a given mutex, but if we need them for one platform (because a mutex without the options can be simplified), it might be worth providing typedefs or delegating classes for other platforms.
Why not simply

class scoped_lock
class scoped_try_lock : public scoped_lock
class scoped_timed_lock : public scoped_try_lock

Or are you afraid of the virtual functions this implies? As far as I understand, Howard's draft puts this into a single class that is capable of all variants, which can also be expressed by a timeout parameter: 0 : try, number : timed, infinite : basic
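Roughly what I have in mind by "expressed by a timeout parameter" (just a sketch; timed_lock is a hypothetical member here, not an existing Boost function):

// One lock class; the variant is selected by the timeout value:
//   Duration(0)  -> pure try-lock
//   "infinite"   -> plain blocking lock
//   in between   -> timed lock
template<class Mutex, class Duration>
class try_timed_lock
{
    Mutex& m;
    bool owns;
public:
    try_timed_lock(Mutex& m_, Duration const& timeout)
        : m(m_), owns(m_.timed_lock(timeout))   // hypothetical member
    {}
    ~try_timed_lock() { if(owns) m.unlock(); }
    bool locked() const { return owns; }
};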
Howard seems to have gone "all moveable", and his proposal allows for upgrading and downgrading locks by moving them around --- you upgrade an upgradeable to exclusive by move-constructing an exclusive lock from your upgradeable one.
Ok, have to study this in more detail. Roland

Roland Schwarz <roland.schwarz@chello.at> writes:
Anthony Williams wrote:
That was when the goal was a simple reimplementation under the BSL. Now Bill's code is under the BSL, the goalposts have moved to focusing on a potential interface for the new standard. Are you in agreement that this is a worthwhile goal?
Well, yes in principle, but what about the ongoing platform split? It is not finished yet.
We likely need a platform split whatever interface we go with, since pthreads is quite different at the API level to Windows, even if the same facilities are on offer. What's left to do? How can I help?
And then we should concentrate on getting 1.34 ready shouldn't we?
If there's something that needs doing for 1.34, it should take priority. I really was under the impression that there was nothing to do here, since the existing code is now under the BSL, and the tests all pass. Are there outstanding bugs that need fixing? Am I missing something? I had missed the fact that the existing rwmutex was still there, even though it was removed from 1.33, so it's quite possible that I'm missing something else too.
I have to say that it is a rather thankless task to do the work for the split, just to possibly see that everything is superfluous, because we are doing a new interface. (I am also just doing this in my spare time, and I also rather like thinking about new stuff.)
Sorry you're finding it thankless. How can I help?
I am really inclined to abandon work on thread_rewrite if we instead opt to concentrate on a new interface.
It would be a shame if we lost you. Wouldn't you be interested in developing the new interface?
If you would like, I'll be happy to oblige.
Possibly yes :-)
OK, I'll try and draft something. No doubt it will make things clearer to me, too.
I am not sure we need separate types for try_ and timed_ variants of a given mutex, but if we need them for one platform (because a mutex without the options can be simplified), it might be worth providing typedefs or delegating classes for other platforms.
Why not simply

class scoped_lock
class scoped_try_lock : public scoped_lock
class scoped_timed_lock : public scoped_try_lock
Or are you afraid of the virtual functions this implies?
The scoped_xxx_lock classes all have slightly different interfaces at the moment, which makes me a bit uncomfortable. For example, a scoped_try_lock constructed with a mutex may or may not lock, whereas a scoped_lock constructed with a mutex always locks, and a scoped_timed_lock cannot be constructed with just a mutex. OTOH, all three scoped_xxx_lock classes have a constructor that takes a bool as a second parameter and will always lock if that parameter is true. However, that wasn't what I was referring to. I was thinking about mutex, try_mutex and timed_mutex, which I think can really be one and the same class. Empty inheritance works fine if you want distinct types, provided you don't need them to be POD. recursive_mutex requires a recursion count, so can justify being a separate class. For read/write mutexes, a simple read/write implementation is simpler than an upgradeable one (and in my implementation requires one less semaphore) so it might make sense to have two varieties. Howard calls them shar(e)able_mutex and upgrad(e)able_mutex in N2094. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk
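By "empty inheritance" I mean something like this (sketch only; the names and the timed_lock signature are placeholders, not a proposal):

// One full implementation, distinct types via empty derived classes.
class basic_mutex
{
public:
    void lock();                        // declarations only, for illustration
    bool try_lock();
    bool timed_lock(int milliseconds);  // hypothetical signature
    void unlock();
};

class try_mutex : public basic_mutex {};   // same behaviour, distinct type
class timed_mutex : public basic_mutex {}; // (no longer POD, as noted above)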

Anthony Williams wrote:
Roland Schwarz <roland.schwarz@chello.at> writes:
Anthony Williams wrote:
That was when the goal was a simple reimplementation under the BSL. Now Bill's code is under the BSL, the goalposts have moved to focusing on a potential interface for the new standard. Are you in agreement that this is a worthwhile goal?
Well, yes in principle, but what about the ongoing platform split? It is not finished yet.
We likely need a platform split whatever interface we go with, since pthreads is quite different at the API level to Windows, even if the same facilities are on offer.
There is also the option of implementing pthread_rwlock_* on Windows, keeping the higher layer platform-independent. This obviously requires a higher layer that agrees with the pthread_rwlock_* interface.

"Peter Dimov" <pdimov@mmltd.net> writes:
Anthony Williams wrote:
Roland Schwarz <roland.schwarz@chello.at> writes:
Anthony Williams wrote:
That was when the goal was a simple reimplementation under the BSL. Now Bill's code is under the BSL, the goalposts have moved to focusing on a potential interface for the new standard. Are you in agreement that this is a worthwhile goal?
Well, yes in principle, but what about the ongoing platform split? It is not finished yet.
We likely need a platform split whatever interface we go with, since pthreads is quite different at the API level to Windows, even if the same facilities are on offer.
There is also the option of implementing pthread_rwlock_* on Windows, keeping the higher layer platform-independent.
Which still requires a windows-specific layer, and thus a platform split.
This obviously requires a higher layer that agrees with the pthread_rwlock_* interface.
Which means no upgradeable locks, if nothing else. The platform split issue is wider than just rwlocks, though. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Peter Dimov wrote:
There is also the option of implementing pthread_rwlock_* on Windows, keeping the higher layer platform-independent. This obviously requires a higher layer that agrees with the pthread_rwlock_* interface.
Sorry, but I am not sure if I understand what you mean. Do you mean the windows version should be implemented on top of pthread?

In the platform split I took the following route:
1) Native platform API: this means win32 API on windows, pthread where pthread is considered native, possibly native linux, i.e. semaphores, fork, NPTL, ...
2) pthread API: for every platform where a pthread lib is available, provide a pthread variant, i.e. pthreads-win32, pthread on linux, pthread on OSX, ...

In the end the user may choose from two library variants on platforms that have both APIs available. If I understand your sentence in this respect: yes, of course, we should and will do this on windows. Roland

Roland Schwarz wrote:
Peter Dimov wrote:
There is also the option of implementing pthread_rwlock_* on Windows, keeping the higher layer platform-independent. This obviously requires a higher layer that agrees with the pthread_rwlock_* interface.
Sorry, but I am not sure if I understand what you mean. Do you mean the windows version should be implemented on top of pthread?
I meant that one way to handle the differences between Windows and everything else is to implement the C++ layer on top of pthread_* calls, then implement the necessary pthread_* calls on Windows. Something like Boost.Pthread-Win32, but possibly less complete.

"Peter Dimov" <pdimov@mmltd.net> writes:
Roland Schwarz wrote:
Peter Dimov wrote:
There is also the option of implementing pthread_rwlock_* on Windows, keeping the higher layer platform-independent. This obviously requires a higher layer that agrees with the pthread_rwlock_* interface.
Sorry, but I am not sure if I understand what you mean. Do you mean the windows version should be implemented on top of pthread?
I meant that one way to handle the differences between Windows and everything else is to implement the C++ layer on top of pthread_* calls, then implement the necessary pthread_* calls on Windows. Something like Boost.Pthread-Win32, but possibly less complete.
IMO, that is one abstraction layer too many, and gives us less flexibility in how to implement things on Windows. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
We likely need a platform split whatever interface we go with, since pthreads is quite different at the API level to Windows, even if the same facilities are on offer. What's left to do? How can I help?
I am still working to get bbv2 straight, there are still some issues. But since I vaguely remember that you already said that writing jam does not belong to your favored tasks, I'm afraid I'll have to continue this. Then rw of course. But luckily enough you are already on this. Then testing and checking if everything works. I started with the examples. Btw.: What do you think about the examples? Shouldn't we also review them? Put better comments inside? Are they really instructive for a user?
If there's something that needs doing for 1.34, it should take priority. I really was under the impression that there was nothing to do here, since the existing code is now under the BSL, and the tests all pass. Are there outstanding bugs that need fixing? Am I missing something?
Wow! Today we are completely green! Only a shade of gray on an old compiler. Yesterday I remember having seen a red light for intel-vc71-win-9.1. Hopefully this is not a spurious error that hides some problems. Do you know how we can turn off the "unexpectedly passing" deep green signs? E.g. I have made the absence of tss cleanup for native threads a warning, since I believe this is only an optimization: everything will run without it. In the worst case the application is left with a bounded memory-leak. And when using a Boost-created thread even this leak will not be present. Could you please help with this?
I had missed the fact that the existing rwmutex was still there, even though it was removed from 1.33, so it's quite possible that I'm missing something else too.
True, it was pruned from 1.33, but not from head (fortunately). Can you help me get it out of the documentation? I am not sure if it is sufficient to pull out a single file, because it might be referenced from several places. I suggest doing this again first only for the RC_1_34_0 branch. Please put me a short note in case you are starting with this, so we do not end up editing the same files concurrently.
It would be a shame if we lost you. Wouldn't you be interested in developing the new interface?
Thank you. :-) This indeed keeps me motivated! Yes, of course I would like to join the crowd.
OK, I'll try and draft something. No doubt it will make things clearer to me, too.
Very fine! I am eagerly looking forward to this.
The scoped_xxx_lock classes all have slightly different interfaces at the moment, which makes me a bit uncomfortable. For example, a scoped_try_lock constructed with a mutex may or may not lock, whereas a scoped_lock constructed with a mutex always locks, and a scoped_timed_lock cannot be constructed with just a mutex. OTOH, all three scoped_xxx_lock classes have a constructor that takes a bool as a second parameter and will always lock if that parameter is true.
It is not the differences between the interfaces that puzzle me, but the fact that

recursive_timed_mutex::scoped_lock lk(m);

is the same as:

mutex::scoped_lock lk(m);

This has the potential to unsettle new users of the lib. (I also needed some time to understand this concept, and did so only after having read the code.) In this respect I like Howard's proposal, simply templating the lock on the mutex type. It suffices if it is impossible to instantiate e.g.:

mutex m;
scoped_timed_lock lk(m);

What is the benefit of having the locks inside the mutex declaration?
However, that wasn't what I was referring to. I was thinking about mutex, try_mutex and timed_mutex, which I think can really be one and the same class. Empty inheritance works fine if you want distinct types, provided you don't need them to be POD.
This might be true for the mutex, but not the locks, yes? But beware, wouldn't we give away the possibility to e.g. optimize a plain mutex more than a full blown timed_mutex? On the other hand try, timed, and plain are just variations of timeout: 0 sec, x sec, infinite. Do you have any findings about loss of optimizability? At least I think any op-sys waitable object has the timeout feature anyways.
recursive_mutex requires a recursion count, so can justify being a separate class.
Agreed. So we would end up with mutex and recursive_mutex, yes? Roland

On Nov 1, 2006, at 4:06 PM, Roland Schwarz wrote:
So we would end up with mutex and recursive_mutex, yes?
Instead of a recursive_mutex, how would you feel about a recursive_mutex *adaptor* (like stack is to containers)? Here's a prototype. Warning, it is not mature:

template <class Mutex>
class recursive_mutex
{
private:
    Mutex& mut_;
    unsigned count_;
    thread_util::id id_;
public:
    explicit recursive_mutex(Mutex& mut)
        : mut_(mut), count_(0), id_(thread_self::not_any_thread()) {}

    void lock()
    {
        thread_self::id self = thread_self::current();
        if (thread_self::equal(id_, self))
            ++count_;
        else
        {
            mut_.lock();
            id_ = self;
            store_release(&count_, 1);
        }
    }

    void unlock()
    {
        if (--count_ == 0)
        {
            id_ = thread_self::not_any_thread();
            mut_.unlock();
        }
    }
};

The basic idea is a mutex *adaptor*. Everything else I have wrong above is just details. ;-) -Howard

Howard Hinnant wrote:
Instead of a recursive_mutex, how would you feel about a recursive_mutex *adaptor* (like stack is to containers). Here's a prototype. Warning, it is not mature:
< prototype snipped > I might miss the point. But the problem is not only how to emulate a recursive mutex, in case the operating system does not allow recursive locking, but how to expose these two kinds if they exist on the opsys API level. If they exist I'd rather use them, instead of letting the library emulate. No? Roland

On Nov 1, 2006, at 4:49 PM, Roland Schwarz wrote:
I might miss the point. But the problem is not only how to emulate a recursive mutex, in case the operating system does not allow recursive locking, but how to expose these two kinds if they exist on the opsys API level. If they exist I'd rather use them, instead of letting the library emulate. No?
Yes!

template <>
class recursive_mutex<native-type>
{
    native_type mut_;
public:
    void lock() {mut_.lock();}  // or whatever straight drop through to the OS
    ...
};

And yet you can still say recursive_mutex<upgradable_mutex> and get the (more expensive) emulation. This is just exactly how and why I want to template condition as well:

template <class Lock> class condition { ... };

template <>
class condition<native-type>
{
    ... straight drop through to native-condition of native-mutex
};

-Howard

Roland Schwarz <roland.schwarz@chello.at> writes:
Anthony Williams wrote:
We likely need a platform split whatever interface we go with, since pthreads is quite different at the API level to Windows, even if the same facilities are on offer. What's left to do? How can I help?
I am still working to get bbv2 straight, there are still some issues. But since I vaguely remember that you already said that writing jam does not belong to your favored tasks, I'm afraid I'll have to continue this.
I've just never spent the time to really understand how boost.build works. If there are issues, I'll have a go at finding them, if you like.
Then rw of course. But luckily enough you are already on this. Then testing and checking if everything works. I started with the examples.
Btw.: What do you think about the examples? Shouldn't we also review them? Put better comments inside? Are they really instructive for a user?
I guess we should review them, especially if we change the interface.
If there's something that needs doing for 1.34, it should take priority. I really was under the impression that there was nothing to do here, since the existing code is now under the BSL, and the tests all pass. Are there outstanding bugs that need fixing? Am I missing something?
Wow! Today we are completely green! Only a shade of gray on an old compiler. Yesterday I remember having seen a red light for intel-vc71-win-9.1. Hopefully this is not a spurious error that hides some problems. Do you know how we can turn off the "unexpectedly passing" deep green signs? E.g. I have made the absence of tss cleanup for native threads a warning, since I believe this is only an optimization: everything will run without it. In the worst case the application is left with a bounded memory-leak. And when using a Boost-created thread even this leak will not be present. Could you please help with this?
I thought the deep green signs were where the tests were marked as "expected failure", yet passed anyway. I'll adjust status/expected_failures_markup.xml so they're expected to pass.
I had missed the fact that the existing rwmutex was still there, even though it was removed from 1.33, so it's quite possible that I'm missing something else too.
True, it was pruned from 1.33, but not from head (fortunately). Can you help me get it out of the documentation? I am not sure if it is sufficient to pull out a single file, because it might be referenced from several places. I suggest doing this again first only for the RC_1_34_0 branch. Please put me a short note in case you are starting with this, so we do not end up editing the same files concurrently.
Sure, I'll help. I'll drop you a note when I get on it.
What is the benefit of having the locks inside the mutex declaration?
Consistency with previous releases.
However, that wasn't what I was referring to. I was thinking about mutex, try_mutex and timed_mutex, which I think can really be one and the same class. Empty inheritance works fine if you want distinct types, provided you don't need them to be POD.
This might be true for the mutex, but not the locks, yes?
Correct.
But beware, wouldn't we give away the possibility to e.g. optimize a plain mutex more than a full blown timed_mutex? On the other hand try, timed, and plain are just variations of timeout: 0 sec, x sec, infinite. Do you have any findings about loss of optimizability? At least I think any op-sys waitable object has the timeout feature anyways.
try, timed, and plain lock are different functions, so can use different methods for locking the mutex. In my implementation on Windows, there is no penalty for adding timed locks to the mix, and try locks are really just a single call to InterlockedCompareExchange. With a different implementation, this might be different. On pthreads, we can just call the POSIX functions.
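I.e. the uncontended try-lock fast path is of this shape (a sketch, not the actual thread_rewrite code):

#include <windows.h>

// Sketch: try-lock as a single compare-and-swap on a lock word
// (0 == free, 1 == held). Returns true if we took the lock.
bool try_lock_sketch(LONG volatile* lock_word)
{
    return InterlockedCompareExchange(lock_word, 1, 0) == 0;
}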
recursive_mutex requires a recursion count, so can justify being a separate class.
Agreed. So we would end up with mutex and recursive_mutex, yes?
Yes, though I like Howard's idea of recursive_mutex being an adaptor, with specializations for optimization. Plus, I think we should have a shareable_mutex, which only supports simple read locks and write locks and an upgradeable_mutex which also supports upgradeable locks. Howard has both of these in his proposal. I really don't like the convertible_shareable concept. A shareable_mutex would be implementable using the POSIX rwlock stuff, whereas an upgradeable_mutex requires a custom implementation. An upgradeable version also requires more OS sync primitives on Windows, at least with my implementation. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

On Nov 2, 2006, at 3:23 AM, Anthony Williams wrote:
Agreed. So we would end up with mutex and recursive_mutex, yes?
Yes, though I like Howard's idea of recursive_mutex being an adaptor, with specializations for optimization.
Btw, recursive_mutex adaptor, if it is a good idea in the first place, is a good example of an application needing to lock a mutex in a non-scoped manner.

Ion pointed out to me that recursive_mutex shouldn't hold a mutex&, but rather a mutex. It can then be constructed in place with perfectly forwarding variadic templates:

template <class Mutex>
class recursive_mutex
{
private:
    Mutex mut_;
    unsigned count_;
    thread_util::id id_;
public:
    template <class ...T>
    explicit recursive_mutex(T&& t...)
        : mut_(std::forward<T>(t)...), count_(0), id_(thread_self::not_any_thread()) {}

I probably messed up the variadic template syntax. But this way you don't have to have an external mutex sitting out somewhere, but if you want that you can use recursive_mutex<mutex&> (which could also be specialized as a straight drop through if need be). This is of course somewhat emulate-able in C++03 (like bind does).

I should also stress: recursive_mutex is an experiment, not a proposed solution. I'm just thinking out loud, and would like more help in the thinking department. :-)
I was thinking about mutex, try_mutex and timed_mutex, which I think can really be one and the same class.
I've also been experimenting with a timed_mutex adaptor:

<experimental>

struct event
{
    mutex mut_;
    condition<mutex> cond_;
};

template <class Mutex>
class timed_mutex
{
private:
    Mutex mut_;
    event sig_;
    timespec abs_time_;

    timed_mutex(const timed_mutex&);
    timed_mutex& operator=(const timed_mutex&);
public:
    template <class ...T>
    timed_mutex(T&& t..., const timespec& abs_time)
        : mut_(std::forward<T>(t)...), abs_time_(abs_time) {}

    void lock() { mut_.lock(); }

    bool try_lock()
    {
        exclusive_lock<mutex> lk(sig_.mut_);
        bool timed_out;
        bool owns;
        do
        {
            owns = mut_.try_lock();
            if (owns)
                break;
            timed_out = !sig_.cond_.timed_wait(sig_.mut_, abs_time_);
        } while (!timed_out);
        return owns;
    }

    void unlock()
    {
        mut_.unlock();
        sig_.cond_.notify_all();
    }
};

</experimental>

This is even less mature than the recursive_mutex. But hopefully people can see where I'm trying to go: Instead of having recursive and timed flavors of every kind of mutex (exclusive, sharable, upgradable), one might instead have adaptors that one could apply to these, and perhaps to user-defined mutexes as well (more motivation for mutex concepts: here's more generic mutex code people could plug into). -Howard

Howard Hinnant <hinnant@twcny.rr.com> writes:
On Nov 2, 2006, at 3:23 AM, Anthony Williams wrote:
Agreed. So we would end up with mutex and recursive_mutex, yes?
Yes, though I like Howard's idea of recursive_mutex being an adaptor, with specializations for optimization.
Btw, recursive_mutex adaptor, if it is a good idea in the first place, is a good example of an application needing to lock a mutex in a non-scoped manner.
Only if you allow non-scoped locking of the recursive mutex adaptor. If you only allowed scoped locks, then the first scoped lock of the recursive mutex can check to see if this thread owns the mutex. If so, then the whole scoped_lock is a no-op. If not, then we need to actually lock the underlying mutex.

template<class Mutex>
class recursive_mutex
{
private:
    Mutex m;
    thread_util::id id;
public:
    recursive_mutex():
        id(thread_util::not_any_thread())
    {}

    class scoped_lock
    {
        recursive_mutex& m;
        std::auto_ptr<typename Mutex::scoped_lock> lock;
    public:
        scoped_lock(recursive_mutex& m_):
            m(m_)
        {
            if(m.id!=thread_util::self()) // assume atomic read
            {
                lock.reset(new typename Mutex::scoped_lock(m.m));
                m.id=thread_util::self();
            }
        }
        ~scoped_lock()
        {
            if(lock.get())
            {
                m.id=thread_util::not_any_thread();
            }
        }
    };
};
Ion pointed out to me that recursive_mutex shouldn't hold a mutex&, but rather a mutex.
Yes. I always assumed it would be a type adaptor, not an instance adaptor.
I should also stress: recursive_mutex is an experiment, not a proposed solution. I'm just thinking out loud, and would like more help in the thinking department. :-)
Well, my Windows implementation of a recursive mutex is essentially just an adaptor on boost::mutex, so your thinking coincides with mine.
I was thinking about mutex, try_mutex and timed_mutex, which I think can really be one and the same class.
I've also been experimenting with a timed_mutex adaptor:
[snipped example impl] Intriguing thought. It's certainly doable, and can also be specialized to allow for optimization where the adapted mutex is already timed-capable.
This is even less mature than the recursive_mutex. But hopefully people can see where I'm trying to go: Instead of having recursive and timed flavors of every kind of mutex (exclusive, sharable, upgradable), one might instead have adaptors that one could apply to these, and perhaps to user-defined mutexes as well (more motivation for mutex concepts: here's more generic mutex code people could plug into).
I like the idea of providing adaptors so that behaviour can be added to basic mutexes. Maybe some traits or Concepts could be used to mean that people didn't have to explicitly specialize timed_mutex_adaptor<custom_mutex> if custom_mutex already supported timed locks. Your sample adaptor has given me the idea of not having an explicit timed_lock function, but rather overloads of try_lock:

bool try_lock(); // just try once
bool try_lock(unsigned spin_count); // spin this many times
bool try_lock(target_time_type target_time); // wait until the specified time
bool try_lock(time_period_type wait_time); // wait for the specified period

Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

On Nov 2, 2006, at 10:09 AM, Anthony Williams wrote:
Your sample adaptor has given me the idea of not having an explicit timed_lock function, but rather overloads of try_lock:
bool try_lock(); // just try once
bool try_lock(unsigned spin_count); // spin this many times
bool try_lock(target_time_type target_time); // wait until the specified time
bool try_lock(time_period_type wait_time); // wait for the specified period
I like it. -Howard

Howard Hinnant wrote:
On Nov 2, 2006, at 10:09 AM, Anthony Williams wrote:
Your sample adaptor has given me the idea of not having an explicit timed_lock function, but rather overloads of try_lock:
bool try_lock(); // just try once
bool try_lock(unsigned spin_count); // spin this many times
bool try_lock(target_time_type target_time); // wait until the specified time
bool try_lock(time_period_type wait_time); // wait for the specified period
I like it.
Can I respectfully suggest time_duration_type instead of time_period_type? That would bring the terminology in line with N1900/N2058. I'm assuming what you actually want to do is have user code that looks like this:

if (try_lock(seconds(3))) {...

if (try_lock(milliseconds(100))) { ...

In boost date_time and N1900/N2058 time_period is an interval type with a start time and an end time. Oh, and if N1900 isn't persuasive enough, 'duration' happens to be the term ISO 8601 uses to define a length of time. Jeff

On Nov 2, 2006, at 11:05 AM, Jeff Garland wrote:
Howard Hinnant wrote:
On Nov 2, 2006, at 10:09 AM, Anthony Williams wrote:
Your sample adaptor has given me the idea of not having an explicit timed_lock function, but rather overloads of try_lock:
bool try_lock(); // just try once
bool try_lock(unsigned spin_count); // spin this many times
bool try_lock(target_time_type target_time); // wait until the specified time
bool try_lock(time_period_type wait_time); // wait for the specified period
I like it.
Can I respectfully suggest time_duration_type instead of time_period_type? That would bring the terminology in line with N1900/N2058. I'm assuming what you actually want to do is have user code that looks like this:
if (try_lock(seconds(3))) {...
if (try_lock(milliseconds(100))) { ...
In boost date_time and N1900/N2058 time_period is an interval type with a start time and an end time. Oh, and if N1900 isn't persuasive enough, 'duration' happens to be the term ISO 8601 uses to define a length of time.
<nod> As far as I'm concerned this is implied. On Nov 2, 2006, at 10:58 AM, Doug Gregor wrote:
Are there conversions from integral types to target_time_type and/or time_period_type? If so, I would be quite surprised if try_lock(1000) spun 1,000 times before failing, rather than 1,000 milliseconds (for instance).
This works perfectly on a machine where 1 spin takes 1 millisecond! :-) I think we're into brainstorming here and not specifying precise interfaces. You guys are an iteration or two ahead of us. But thanks for leading the way! :-) -Howard

Jeff Garland <jeff@crystalclearsoftware.com> writes:
Howard Hinnant wrote:
On Nov 2, 2006, at 10:09 AM, Anthony Williams wrote:
Your sample adaptor has given me the idea of not having an explicit timed_lock function, but rather overloads of try_lock:
bool try_lock(); // just try once
bool try_lock(unsigned spin_count); // spin this many times
bool try_lock(target_time_type target_time); // wait until the specified time
bool try_lock(time_period_type wait_time); // wait for the specified period
I like it.
Can I respectfully suggest time_duration_type instead of time_period_type?
Of course; the names were just placeholders.
That would bring the terminology in line with N1900/N2058. I'm assuming what you actually want to do is have user code that looks like this:
if (try_lock(seconds(3))) {...
if (try_lock(milliseconds(100))) { ...
Yes.
In boost date_time and N1900/N2058 time_period is an interval type with a start time and an end time. Oh, and if N1900 isn't persuasive enough, 'duration' happens to be the term ISO 8601 uses to define a length of time.
Thanks for the references. Duration is what I meant. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

On Nov 2, 2006, at 10:09 AM, Anthony Williams wrote:
Your sample adaptor has given me the idea of not having an explicit timed_lock function, but rather overloads of try_lock:
bool try_lock(); // just try once
bool try_lock(unsigned spin_count); // spin this many times
bool try_lock(target_time_type target_time); // wait until the specified time
bool try_lock(time_period_type wait_time); // wait for the specified period
Sorry, naive user just woke up and saw this fly by... Are there conversions from integral types to target_time_type and/or time_period_type? If so, I would be quite surprised if try_lock(1000) spun 1,000 times before failing, rather than 1,000 milliseconds (for instance). Cheers, Doug

Doug Gregor <dgregor@cs.indiana.edu> writes:
On Nov 2, 2006, at 10:09 AM, Anthony Williams wrote:
Your sample adaptor has given me the idea of not having an explicit timed_lock function, but rather overloads of try_lock:
bool try_lock(); // just try once
bool try_lock(unsigned spin_count); // spin this many times
bool try_lock(target_time_type target_time); // wait until the specified time
bool try_lock(time_period_type wait_time); // wait for the specified period
Sorry, naive user just woke up and saw this fly by...
Are there conversions from integral types to target_time_type and/or time_period_type? If so, I would be quite surprised if try_lock(1000) spun 1,000 times before failing, rather than 1,000 milliseconds (for instance).
I would expect time_period_type and target_time_type to have explicit constructors. The usage I imagine is something like:

mutex m;
m.try_lock(milliseconds(1000)); // wait 1s
m.try_lock(seconds(1000)); // wait 1000s
m.try_lock(date_time(2006,11,2,16,9,27)); // wait until 16:09:27 on 2nd November 2006
m.try_lock(1000); // spin 1000 times

I'm not averse to

m.try_lock(spin(1000)); // spin 1000 times

but I'm not sure it's necessary. Just specifying a plain number begs the question "1000 what?", and the docs will make it clear that it's a spin count. Consider:

sleep(5);
Sleep(5);

One waits 1000 times as long as the other. It's not obvious to the casual observer what the time units are; you have to know your API. Anthony -- Anthony Williams Software Developer Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk

Anthony Williams wrote:
Roland Schwarz <roland.schwarz@chello.at> writes:
... Please put me a short note in case you are starting with this, so we do not end up editing the same files concurrently.
Sure, I'll help. I'll drop you a note when I get on it.
Not necessary anymore. I have finished it. Roland
participants (8)
- Anthony Williams
- Chris Thomasson
- Doug Gregor
- Howard Hinnant
- Jeff Garland
- John Maddock
- Peter Dimov
- Roland Schwarz