Seqlock
From Wikipedia, the free encyclopedia
A seqlock (short for "sequential lock") is a special locking mechanism used in Linux for supporting fast writes of shared variables between two parallel operating system routines. The semantics stabilized as of version 2.5.59, and they are present in the 2.6.x stable kernel series. The seqlocks were developed by Stephen Hemminger and originally called frlocks.
It is a reader-writer consistent mechanism which avoids the problem of writer starvation. A seqlock consists of storage for saving a sequence number in addition to a lock. The lock is to support synchronization between two writers and the counter is for indicating consistency in readers. In addition to updating the shared data, the writer increments the sequence number, both after acquiring the lock and before releasing the lock. Readers read the sequence number before and after reading the shared data. If the sequence number is odd on either occasion, a writer had taken the lock while the data was being read and it may have changed. If the sequence numbers are different, a writer has changed the data while it was being read. In either case readers simply retry (using a loop) until they read the same even sequence number before and after.
It should be noted that the reader never blocks, but it may have to retry if a write is in progress; this speeds up the readers in the case where the data was not modified, since they do not have to acquire the lock as they would with a traditional read-write lock. Also, writers do not wait for readers, whereas with traditional read-write locks they do, leading to potential resource starvation in a situation where there are a number of readers (because the writer must wait for there to be no readers). Because of these two factors, seqlocks are more efficient than traditional read-write locks for the situation where there are many readers and few writers.
It should also be noted that the technique will not work for data that contains pointers, because any writer could invalidate a pointer that a reader has already followed.
This was first applied to system time counter updating. Each time interrupt updates the time of the day; there may be many readers of the time for operating system internal use and applications, but writes are relatively infrequent and only occur one at a time. The BSD timecounter code for instance appears to use a similar technique.