
Thorsten Froehlich <froetho <at> iit.edu> writes:
> Anthony Williams wrote:
>> The implementation uses an absolute time internally, since absolute times are composable --- though the win32 API calls are made with a timeout in milliseconds, it is the number of milliseconds until the supplied absolute time. This allows for multiple win32 API calls without having to work out how much of the timeout has elapsed --- this is done implicitly by the calculation of the number of milliseconds remaining.
>>
>> Ideally we would have a monotonically increasing timer which is independent of the system clock. Unfortunately, we don't have such a timer --- we have to rely on the system's idea of UTC time. If the clock is set back after the timeout for the wait has been chosen, but before the number of milliseconds to wait in a win32 API call has been calculated, the number of milliseconds will be rather large.
>
> With all due respect, it should be clear that such a dependency on a user-controlled setting that can indefinitely block a program is simply not an acceptable design choice for professional programmers. No matter what the reason, infinite blocking behavior is a bug in the boost implementation and must be fixed in boost, not elsewhere.
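
To make the quoted calculation concrete, here is a rough sketch of turning an absolute deadline into the relative millisecond count a win32 call needs. The helper name `remaining_ms` and the plain integer timestamps are assumptions made for illustration; this is not the Boost.Thread source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: milliseconds left until an absolute deadline.
// Both arguments are wall-clock timestamps in milliseconds since some epoch.
std::int64_t remaining_ms(std::int64_t deadline_ms, std::int64_t now_ms) {
    assert(deadline_ms >= 0 && now_ms >= 0);
    // A deadline that has already passed means "do not wait at all".
    return deadline_ms > now_ms ? deadline_ms - now_ms : 0;
}
```

With a deadline 500 ms away, a wall clock that is set back one hour before this runs makes the result 3,600,500 ms instead of 500: a long wait, but a finite one.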
Harsh words. This is not infinite blocking, just blocking with a long timeout (e.g. an hour if you set the clock back an hour). This has been a property of boost threads since it was first committed, over 6 years ago.

The problem is that the POSIX and win32 APIs have different policies with respect to timeouts. POSIX takes absolute times, whereas win32 takes relative times. As the POSIX spec points out, there is a race condition implementing an absolute timeout on top of a relative-timeout-based API. If we want to support absolute timeouts (and we do), then their use on Windows will always be subject to this race condition.

If the user changes the clock, this just exacerbates the problem. However, the timeout is an absolute time: if you just set the clock back an hour, then it's an hour longer before the specified absolute time is reached. The problem is when the clock is advanced forward past the absolute timeout --- on Windows, the timeout is expressed in milliseconds, and this is independent of the clock time, so once we're waiting, we're waiting.

In addition, the win32 API does not support condition variables pre-Vista, so we need to implement them using the available win32 primitives. This requires waiting multiple times on different synchronization primitives, and looping. The only reliable way to get the timeout right is to calculate the absolute timeout at the start, and use that as the basis for all the relative timeouts on the individual calls.

Having said all that, there may be things that can be done. Windows apps *should* send WM_TIMECHANGE when they update the clock, so if we're in a message-handling thread, and we receive that message, we can potentially handle that by interrupting the wait and resuming based on the new clock time. That's quite a few "if"s, though.

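
The "absolute deadline first, relative timeouts derived from it" loop can be sketched roughly as follows. The callables `now` and `wait_once` are hypothetical stand-ins for the clock and for a relative-timeout win32 call such as WaitForSingleObject, so this shows the idea only and is not the actual Boost.Thread code.

```cpp
#include <cassert>
#include <chrono>

using ms = std::chrono::milliseconds;

// Wait until pred() holds or the absolute deadline is reached.
// now()            -> current time as ms (hypothetical clock source)
// wait_once(rel)   -> blocks for at most `rel`; may return early
template <typename NowFn, typename WaitFn, typename Pred>
bool wait_until(ms deadline, NowFn now, WaitFn wait_once, Pred pred) {
    while (!pred()) {
        const ms current = now();
        if (current >= deadline) {
            return pred();  // timed out: report the predicate's final state
        }
        // Each relative timeout is recomputed from the one absolute deadline,
        // so time already spent in earlier waits is accounted for implicitly.
        const ms remaining = deadline - current;
        assert(remaining > ms::zero());
        wait_once(remaining);
    }
    return true;
}
```

Because `remaining` is recomputed on every pass, early wake-ups cost nothing extra. But once `wait_once` has been entered with a given value, a later clock change cannot shorten that particular wait.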
The GetTickCount API is good for 49 days, so we could use that as the basis for the timeout once the wait routine was actually called, but it still won't handle clock changes --- once the timeout is calculated it will remain fixed.

This is a hard problem, and not one to be dismissed with "simply not an acceptable design choice for professional programmers".

Anthony
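
For reference, the 49-day figure comes from GetTickCount returning a 32-bit millisecond counter, which wraps after 2^32 ms (about 49.7 days). Below is a small sketch of tick arithmetic that survives a single wrap. The function names are hypothetical, and the tick values are passed in as plain integers so the example runs anywhere.

```cpp
#include <cassert>
#include <cstdint>

// Elapsed milliseconds between two 32-bit tick readings.
// Unsigned arithmetic is modulo 2^32, so one wrap of the counter is harmless.
std::uint32_t elapsed_ticks(std::uint32_t start, std::uint32_t now) {
    return static_cast<std::uint32_t>(now - start);
}

// Milliseconds left of a relative timeout that began at tick `start`.
std::uint32_t remaining_ticks(std::uint32_t start, std::uint32_t now,
                              std::uint32_t timeout_ms) {
    // 0xFFFFFFFF is INFINITE in the win32 API, not a finite timeout.
    assert(timeout_ms != 0xFFFFFFFFu);
    const std::uint32_t elapsed = elapsed_ticks(start, now);
    return elapsed >= timeout_ms ? 0u : timeout_ms - elapsed;
}
```

This only measures elapsed time from the moment the wait starts. As noted above, it does nothing to re-evaluate an absolute deadline if the wall clock changes afterwards.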