
On Friday 23 January 2015 12:08:38 Niall Douglas wrote:
Dear all,
CC: Stephan @ Microsoft - Stephan I'd love to know what the MSVC STL does below so we have the option of matching your behaviour.
While investigating this bug report for Boost.Thread (https://svn.boost.org/trac/boost/ticket/9856) I have discovered a very worrying situation: it would appear that potentially all timed waits in Boost.Thread, and potentially in other parts of Boost, are broken on Windows Vista and later and have been for some years.
The problem lies in the correct handling of timeouts. If one does this:
    mutex mtx;
    condition_variable cond;
    unique_lock<mutex> lk(mtx);
    assert(cv_status::timeout == cond.wait_for(lk, chrono::seconds(1)));
... one would reasonably expect occasional failures on POSIX due to spurious wakeups. It turns out that this also spuriously fails on Windows, which will probably surprise many, as Windows hides signal handling (actually APCs) inside its Win32 APIs and automatically restarts the operation after interruption. There is, therefore, the potential that quite a lot of code written to use Boost.Thread on Windows makes the hard assumption that the assert above will never fail.
That assumption is false. Any code that assumes cv.wait() does not wake up spuriously is buggy, on Windows or not. std::condition_variable is specified the same way.
This raises the question about what to do with Boost.Thread. We have the following options:
Option 1: Timed waits are allowed to spuriously fail by the standard, so we mark this as wontfix and move on. Anyone using the predicate timed waits has never seen a problem here anyway.
That's right. There's nothing to fix: spurious wakeups are expected and should be accounted for in the user's code regardless of the underlying operating system.
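To make "accounted for in the user's code" concrete, here is a minimal sketch using the predicate overload of wait_for. It is written against std::condition_variable rather than Boost.Thread, and the Mailbox struct and the two helper functions are hypothetical names invented for this illustration only.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical shared state for illustration; not part of Boost.Thread.
struct Mailbox {
    std::mutex mtx;
    std::condition_variable cond;
    bool ready = false;  // the condition the waiter actually cares about
};

// Returns true if `ready` became true before the timeout, false otherwise.
// The predicate overload re-checks `ready` after every wakeup, so a
// spurious wakeup just sends the thread back to waiting.
inline bool wait_until_ready(Mailbox& box, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lk(box.mtx);
    return box.cond.wait_for(lk, timeout, [&box] { return box.ready; });
}

// Sets the flag under the lock, then notifies one waiter.
inline void mark_ready(Mailbox& box) {
    {
        std::lock_guard<std::mutex> guard(box.mtx);
        box.ready = true;
    }
    box.cond.notify_one();
}
```

With this shape the caller never inspects cv_status at all: the result says whether the condition holds, which is what the calling code needs to know, and it does not matter how many times the thread woke up along the way.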
Option 2: We loop waiting until steady_clock (really QueryPerformanceCounter under Boost) shows the requested timeout has passed. Problem: This wastes battery power and generates needless wakeups. A more intelligent implementation would ask Windows for the thread quanta and transform timeouts to match the Vista kernel scheduler in combination with always using deadline scheduling, but this would slow down the timed waits implementation.
That is a missed notification waiting to happen.
Option 3: We adjust Boost.Thread to return timeouts when Windows returns a timed out status code, even if the actual time waited is considerably lower than the time requested. Problem: some code written for POSIX, where a requested timeout is always honoured in full, may misbehave in this situation.
That is simply incorrect. Why would you indicate a timeout when none occurred? This will surely break some timed code. The standard description is pretty clear: return cv_status::timeout only when the timeout has expired, otherwise return cv_status::no_timeout. Boost.Thread should follow this.
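One way to follow that rule, sketched here under my own assumptions rather than taken from the Boost.Thread sources, is to derive the status from the clock instead of from whatever the OS wait call reported. The helper name status_for_deadline is hypothetical.

```cpp
#include <chrono>
#include <condition_variable>

// Hypothetical helper, not part of Boost.Thread or the standard library.
// Reports a timeout only if the deadline has really passed according to
// the steady clock, whatever status the underlying OS wait returned.
inline std::cv_status status_for_deadline(
    std::chrono::steady_clock::time_point deadline,
    std::chrono::steady_clock::time_point now = std::chrono::steady_clock::now()) {
    return now >= deadline ? std::cv_status::timeout
                           : std::cv_status::no_timeout;
}
```

An early wakeup then surfaces as no_timeout, which the caller already has to treat as a possible spurious wakeup, so no code that was correct before is broken by it.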