New subject: [thread] Timed waits in Boost.Thread potentiallyfundamentally broken on Windows (possibly rest of Boost too)

23 Jan 2015

      Dear all,

CC: Stephan @ Microsoft - Stephan I'd love to know what the MSVC STL 
does below so we have the option of matching your behaviour.

During investigating this bug report for Boost.Thread 
(https://svn.boost.org/trac/boost/ticket/9856) I have discovered a 
very worrying situation: it would appear that potentially all timed 
waits in Boost.Thread, and potentially in other parts of Boost, are 
broken on Windows Vista and later and have been for some years.

The problem is in correct handling of timeouts. If one does this:

mutex mtx;
condition_variable cond;  
unique_lock<mutex> lk(mtx);
assert(cv_status::timeout == cond.wait_for(lk, chrono::seconds(1)));

... one would reasonably expect occasional failures on POSIX due to 
spurious wakeups. It turns out that this also spuriously fails on 
Windows, which is a surprise probably to many as Windows hides signal 
handling (actually APCs) inside its Win32 APIs and automatically 
restarts the operation after interruption. There is, therefore, the 
potential that quite a lot of code written to use Boost.Thread on 
Windows makes the hard assumption that the assert above will never 
fail.

The reason why Windows spuriously fails above isn't due to spurious 
wakeups, it is in fact due to changes in the Vista kernel scheduler 
as documented at 
https://technet.microsoft.com/en-us/magazine/2007.02.vistakernel.aspx.
 In essence, if you now ask Windows to go sleep for X milliseconds, 
Windows Vista onwards will in fact sleep for anywhere between zero 
and X+N milliseconds where N is some arbitrarily long value. In other 
words, timeouts in Windows are purely advisory, and are freely 
ignored by the Windows kernel from Vista onwards. You can test this 
for yourself using this little program which reduces the #9856 bug 
report to its Win32 API essentials:

#include <windows.h>
#include <stdio.h>
#include <chrono>

int main(void)
{
  ULONG ulDelay_ms = 20;
  HANDLE hSemaphoreDelay = CreateSemaphore(NULL, 0, 1, NULL);

  for (size_t n = 0; n < 50; n++)
  {
    while (1) {
      auto begin = std::chrono::high_resolution_clock::now();
      ULONG hr = WaitForSingleObject(hSemaphoreDelay, ulDelay_ms);
      auto end = std::chrono::high_resolution_clock::now();
      auto diff = end - begin;

      if (hr == WAIT_ABANDONED)
        printf("Wait Abandoned ");
      else if (hr == WAIT_TIMEOUT)
        printf("Timed out ");
      else
        printf("Signaled ");
      DWORD lTDelta = 
std::chrono::duration_cast<std::chrono::milliseconds>(diff).count();
      printf("Target Wait Interval: %u Real Wait Interval: %u 
(%u)\n", ulDelay_ms, lTDelta, diff.count());
      if (lTDelta >= ulDelay_ms) break;
    }
    printf("\n");
  }

  CloseHandle(hSemaphoreDelay);
  return 0;
}

As you'll see, actual time waited is anywhere between zero and 20 + N 
milliseconds where actual time waited is sometimes a whole integer 
multiple of the Windows kernel granularity (15 ms) or some fraction 
of that granularity. Most of the time he tries to hit what you asked 
for, but you get no guarantees.

More detail about how and why Windows Vista onwards does this can be 
found at 
http://forum.sysinternals.com/bug-in-waitable-timers_topic16229.html.

This raises the question about what to do with Boost.Thread. We have 
the following options:

Option 1: Timed waits are allowed to spuriously fail by the standard, 
so we mark this as wontfix and move on. Anyone using the predicate 
timed waits has never seen a problem here anyway.

Option 2: We loop waiting until steady_clock (really 
QueryPerformanceCounter under Boost) shows the requested timeout has 
passed. Problem: This wastes battery power and generates needless 
wakeups. A more intelligent implementation would ask Windows for the 
thread quanta and transform timeouts to match the Vista kernel 
scheduler in combination with always using deadline scheduling, but 
this would slow down the timed waits implementation.

Option 3: We adjust Boost.Thread to return timeouts when Windows 
returns a timed out status code, even if the actual time waited is 
considerably lower than the time requested. Problem: some code 
written for POSIX where when you ask for a timeout you always get it 
may misbehave in this situation.

Boost commmunity, I turn it over to you for advice!

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ 
http://ie.linkedin.com/in/nialldouglas/

[thread] Timed waits in Boost.Thread potentially fundamentally broken on Windows (possibly rest of Boost too)

Niall Douglas

Andrey Semashev

Niall Douglas

Andrey Semashev

Niall Douglas

Andrey Semashev

Peter Dimov

Niall Douglas

Peter Dimov

Niall Douglas

Peter Dimov

Andrey Semashev

Niall Douglas

Vicente J. Botet Escriba

Vicente J. Botet Escriba

Peter Dimov

Peter Dimov

tags

participants (4)