Spurious wakeup

From Wikipedia, the free encyclopedia

In the POSIX thread API, the function pthread_cond_wait is used to wait on a condition variable. A naive programmer might expect that when a thread returns from this function, the condition associated with the condition variable will be true. However, it is recommended that every thread re-check the condition after returning from pthread_cond_wait, because there are several reasons the condition might not be true. One of these reasons is a spurious wakeup: a thread might be woken up even though no thread signalled the condition variable.
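
The recommended check is usually written as a loop around the wait. The following is a minimal sketch, not taken from any particular implementation; the names lock, ready_cond, ready, wait_until_ready and set_ready are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state: a flag guarded by a mutex and a condition variable. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cond = PTHREAD_COND_INITIALIZER;
static bool ready = false;

/* Waiter: re-check the predicate after every return from pthread_cond_wait. */
static void wait_until_ready(void)
{
    pthread_mutex_lock(&lock);
    while (!ready) {
        /* Atomically releases the mutex while blocked and re-acquires it
         * before returning. The return may be spurious, so loop. */
        pthread_cond_wait(&ready_cond, &lock);
    }
    /* The predicate holds here, and the mutex is still held. */
    assert(ready);
    pthread_mutex_unlock(&lock);
}

/* Signaller: change the predicate under the mutex, then signal. */
static void *set_ready(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    ready = true;
    pthread_cond_signal(&ready_cond);
    pthread_mutex_unlock(&lock);
    return NULL;
}
```

Replacing the while with an if would let the waiter continue after a spurious wakeup with ready still false. The loop costs one extra comparison per wakeup.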

According to Butenhof's Programming with POSIX Threads (ISBN 0-201-63392-2):

"This means that when you wait on a condition variable, the wait may (occasionally) return when no thread specifically broadcast or signalled that condition variable. Spurious wakeups may sound strange, but on some multiprocessor systems, making condition wakeup completely predictable might substantially slow all condition variable operations. The race conditions that cause spurious wakeups should be considered rare."

However, in later personal correspondence, Butenhof admitted:

"Though there were indeed some members of the working group who argued that it was theoretically possible to imagine that there might be such an implementation, that wasn't really the reason. (And they were never able to prove it.) POSIX threads were the result of a lot of tension between pragmatic hard realtime programmers and largely academic researchers. Spurious wakeups are the mechanism of an academic computer scientist clique to make sure that everyone had to write clean code that checked and verified predicates!
"But the (perhaps) largely spurious (or at least arcanely philosophical) 'efficiency' argument went over better with the realtime people, and the real reason was usually relegated to second place in the rationale.
"I've thought many times about how you might construct a correct and practical implementation that would really have spurious wakeups. I've never managed to construct an example. Doesn't mean there isn't one, though, and it makes a good story."

Spurious wakeup in Linux

The pthread_cond_wait() function in Linux is implemented using the futex system call. A blocking system call on Linux can return abruptly with EINTR when the process receives a signal. A POSIX signal can therefore cause a spurious wakeup. This behavior is not trivial to fix, for two reasons:

  • If signal delivery did not interrupt system calls, the kernel stack of the interrupted call would stay in use while the signal handler runs. If another system call is invoked during a userspace signal handling routine, and that system call is interrupted too, and so on, the kernel stack could run out quickly. Returning with EINTR keeps stack usage under control. glibc checks (or is supposed to check) for EINTR after every blocking system call. The futex data structure contains enough information to restart these calls.
  • pthread_cond_wait() cannot simply restart the wait, because it may have missed a real wakeup in the short time it spent outside the futex system call. This race condition can only be avoided by the caller checking an invariant.
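
For ordinary blocking calls, the EINTR check mentioned in the first point is commonly written as a retry loop. The sketch below shows that general userspace idiom using read(); it is not glibc's internal code, and the helper name read_retry is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: call read() again whenever a signal interrupted it. */
static ssize_t read_retry(int fd, void *buf, size_t count)
{
    ssize_t n;

    assert(buf != NULL);
    do {
        n = read(fd, buf, count);
    } while (n == -1 && errno == EINTR);  /* interrupted by a signal: retry */

    return n;  /* bytes read, 0 at end of file, or -1 on a real error */
}
```

Retrying is safe here because data that arrives during the interruption stays in the file descriptor's buffer. As the second point explains, a condition variable wait has no such guarantee: a wakeup sent while the thread was outside the system call may be missed, so the caller has to re-check its own predicate instead.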

Other reasons for verifying the invariant

Apart from spurious wakeups, there are practical reasons for checking the invariant after a return from a wait. For example, a woken thread may not be scheduled immediately after the wakeup, but is at the mercy of the system scheduler. A scheduler may preempt a process abruptly or schedule other threads first. In the meantime, another thread or an external entity (another process, hardware) may have invalidated the invariant. Wrapping the wait in a loop handles such cases.
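
One way this can happen without any spurious wakeup is a broadcast that wakes more waiters than there is work. The following is a sketch under assumed names (q_lock, q_nonempty, q_items, q_taken, produce and take_one are all hypothetical): a counter stands in for a work queue, and each consumer re-checks it after waking because another consumer may already have taken the item.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state: a counter standing in for a work queue. */
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_nonempty = PTHREAD_COND_INITIALIZER;
static int q_items = 0;  /* items currently available */
static int q_taken = 0;  /* items consumed so far */

/* Producer: add n items and wake every waiting consumer. */
static void produce(int n)
{
    pthread_mutex_lock(&q_lock);
    q_items += n;
    pthread_cond_broadcast(&q_nonempty);
    pthread_mutex_unlock(&q_lock);
}

/* Consumer: take exactly one item, waiting until one is available. */
static void *take_one(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&q_lock);
    while (q_items == 0) {
        /* Another consumer may have emptied the queue between the
         * broadcast and this thread being scheduled, so wait again. */
        pthread_cond_wait(&q_nonempty, &q_lock);
    }
    assert(q_items > 0);
    q_items--;
    q_taken++;
    pthread_mutex_unlock(&q_lock);
    return NULL;
}
```

If two consumers are waiting and produce(1) is called, both may be woken, but only one finds q_items greater than zero. With an if in place of the while, the second consumer would decrement the counter below zero.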

External links