Re: [boost] Boost Threads Condition may block forever on clock change

26 Nov 2007

      Thorsten Froehlich <froetho <at> iit.edu> writes:
...
Anthony Williams wrote:
...
...
The implementation uses an absolute time internally, since absolute times are
composable --- though the win32 API calls are made with a timeout in
milliseconds, it is the number of milliseconds until the supplied absolute
time.  This allows for multiple win32 API calls without having to work out
how much of the timeout has elapsed --- this is done implicitly by the
calculation of the number of milliseconds remaining.
...
...
Ideally we would have a monotonically increasing timer which is independent
of the system clock. Unfortunately, we don't have such a timer --- we have to
rely on the system's idea of UTC time. If the clock is set back after the
timeout for the wait has been chosen, but before the number of milliseconds
to wait in a win32 API call has been calculated, the number of milliseconds
will be rather large.
...
With all due respect, it should be clear that such a dependency on a
user-controlled setting that can indefinably block a program is simply not an
acceptable design choice for professional programmers. No matter what the
reason, infinite blocking behavior is a bug in the boost implementation and
must be fixed in boost, not elsewhere.

Harsh words.

This is not infinite blocking, just blocking with a long timeout (e.g. an hour
if you set the clock back an hour).

This has been a property of boost threads since it was first committed, over 6
years ago. The problem is that the POSIX and win32 APIs have different policies
with respect to timeouts. POSIX takes absolute times, whereas win32 takes
relative times.
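
To make the difference concrete: pthread_cond_timedwait takes an absolute
timespec, while a win32 call such as WaitForSingleObject takes a relative
count of milliseconds. A minimal sketch of the conversion from "now plus a
relative timeout" to an absolute timespec follows. The helper name add_ms is
made up for illustration and is not part of Boost or either API.

```cpp
#include <cstdint>
#include <time.h>

// Hypothetical helper: turn "now + ms" into an absolute timespec of the kind
// POSIX timed waits expect. A win32 wait would take `ms` directly instead.
timespec add_ms(timespec now, std::uint32_t ms)
{
    timespec abs = now;
    abs.tv_sec  += ms / 1000;
    abs.tv_nsec += static_cast<long>(ms % 1000) * 1000000L;
    // Normalise so that tv_nsec stays below one second.
    if (abs.tv_nsec >= 1000000000L) {
        abs.tv_sec  += 1;
        abs.tv_nsec -= 1000000000L;
    }
    return abs;
}
```

Going the other way, from an absolute time to a relative count, needs a
reading of the current clock at the moment of the call, which is where the
race described next comes from.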

As the POSIX spec points out, there is a race condition implementing an absolute
timeout on top of a relative-timeout-based API. If we want to support absolute
timeouts (and we do), then their use on Windows will always be subject to this
race condition. If the user changes the clock, this just exacerbates the
problem. However, the timeout is an absolute time: if you just set the clock
back an hour, then it's an hour longer before the specified absolute time is
reached. The problem is when the clock is advanced forward past the absolute
timeout: on Windows, the timeout is expressed in milliseconds, and this is
independent of the clock time, so once we're waiting, we're waiting.

In addition, the win32 API does not support condition variables pre-Vista, so we
need to implement them using the available win32 primitives. This requires
waiting multiple times on different synchronization primitives, and looping. The
only reliable way to get the timeout right is to calculate the absolute timeout
at the start, and use that as the basis for all the relative timeouts on the
individual calls.
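
As a rough sketch of that approach (not the actual Boost code), the loop below
fixes the absolute deadline once and recomputes the relative timeout before
every individual wait. The clock and the wait call are passed in as
hypothetical callbacks so the logic stands alone. In a real implementation
they would be the system's UTC clock and a win32 wait function.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>

// Hypothetical stand-ins for the system clock (UTC, in milliseconds) and for
// a win32-style wait that takes a relative timeout in milliseconds.
using clock_fn = std::function<std::int64_t()>;
using wait_fn  = std::function<bool(std::uint32_t)>;

// Milliseconds left until the absolute deadline, clamped to [0, max_wait].
// The default cap stays below 0xFFFFFFFF, which win32 treats as INFINITE.
std::uint32_t remaining_ms(std::int64_t deadline_ms, std::int64_t now_ms,
                           std::uint32_t max_wait = 0xFFFFFFFEu)
{
    if (now_ms >= deadline_ms) return 0;
    std::int64_t diff = deadline_ms - now_ms;
    return static_cast<std::uint32_t>(std::min<std::int64_t>(diff, max_wait));
}

// Wait until `predicate` holds or the absolute deadline passes. Each pass
// derives its relative timeout from the same deadline, so there is no need
// to track how much of the timeout earlier waits have used up.
bool timed_wait_until(std::int64_t deadline_ms, const clock_fn& now,
                      const wait_fn& wait_once,
                      const std::function<bool()>& predicate)
{
    while (!predicate()) {
        std::uint32_t ms = remaining_ms(deadline_ms, now());
        if (ms == 0) return predicate();   // deadline reached: final check
        wait_once(ms);                     // may return early or spuriously
    }
    return true;
}
```

The sketch also shows the clock dependency: if now() jumps back an hour
between choosing the deadline and computing the remainder, remaining_ms
returns a value an hour larger than intended.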

Having said all that, there may be things that can be done. Windows apps
*should* send WM_TIMECHANGE when they update the clock, so if we're in a
message-handling thread, and we receive that message, we can potentially handle
that by interrupting the wait and resuming based on the new clock time. That's
quite a few "if"s, though.

The GetTickCount API is good for about 49.7 days before its 32-bit millisecond
counter wraps, so we could use that as the basis for the timeout once the wait
routine was actually called, but it still won't handle clock changes: once the
timeout is calculated it will remain fixed.
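
For illustration only, here is how the remaining time could be derived from two
readings of a 32-bit millisecond tick counter such as the one GetTickCount
returns. The function names are made up, and the tick values are passed in as
plain integers so the sketch does not depend on the win32 headers.

```cpp
#include <cstdint>

// Elapsed milliseconds between two readings of a 32-bit tick counter.
// Unsigned subtraction is modulo 2^32, so a single wrap of the counter
// between `start` and `now` still gives the right answer.
std::uint32_t elapsed_ticks(std::uint32_t start, std::uint32_t now)
{
    return static_cast<std::uint32_t>(now - start);
}

// Milliseconds of a relative timeout still left, or 0 once it has expired.
// `timeout_ms` is fixed when the wait begins, so later changes to the system
// clock do not alter it in either direction.
std::uint32_t remaining_from_ticks(std::uint32_t start, std::uint32_t now,
                                   std::uint32_t timeout_ms)
{
    std::uint32_t elapsed = elapsed_ticks(start, now);
    return elapsed >= timeout_ms ? 0 : timeout_ms - elapsed;
}
```

This keeps the loop from waiting an extra hour if the clock is set back after
the wait starts, at the cost of no longer tracking the requested absolute time.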

This is a hard problem, and not one to be dismissed with "simply not an
acceptable design choice for professional programmers".

Anthony