Talk:Spinlock
There are many errors on this page:
- The example code:
- the global variable needs a "volatile"
- there is no lock! This code demonstrates polling, not a spinlock.
- The conclusions drawn are wrong.
- The code does not work (or only works by pure chance). You cannot do spinlocks in plain C (or any other higher-level language): the update must be atomic, which requires special assembler instructions; otherwise you get a race condition. (See the sketch after this list.)
- If you are using pthreads anyway, use pthread spinlocks (pthread_spin_lock), since they have the same effect but are implemented in a useful way (see below; no, they are not really spinlocks).
- Spinlocks are not useful in user space.
- In kernels, they do have an application on SMP systems (on single-CPU systems they are never useful). However, the art of kernel design is to avoid spinlocks as far as possible (by using per-CPU data, for example).
- the alternatives given for spinlocks are... strange. Yielding (instead of busy waiting), per-CPU/per-thread data, and more complex synchronization mechanisms are better examples.
- the problem of atomically updating the lock is not touched on at all.
- even though many synchronization libraries offer something called a spin lock, it is implemented in a different way. One example is the pthread library.
- spinlocks are only used if updating the data is a bit more complex. A simple increment, decrement, set, etc. of a single word is done with atomic updates (if available on the particular architecture).
- the reason to wait for the release of a spinlock is not (or, to be more realistic, should never be) to avoid rescheduling, since that would take a long time. Rather, only use a spinlock if you are sure that other holders will hold it for a (short) maximum time (measured in nanoseconds rather than in numbers of instructions) AND you cannot reschedule (in a "normal" way). You had also better make sure that the holder does not get preempted, etc.
- The dangers of spinlocks are not stated (nor how to avoid them):
- memory bus usage
- cache thrashing
- pipeline bubbles
(plus some more...) Don't have time right now, will correct it soon, however... --Uvatter 14:02, 17 Dec 2004 (UTC)
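For concreteness, here is a minimal sketch of the kind of atomic test-and-set spinlock described above, written with C11 <stdatomic.h> and pthreads. The names spin_lock, spin_unlock and worker are just illustrative, not taken from the article; the point is that the acquire is a single atomic read-modify-write rather than a plain C read followed by a write.

    #include <stdatomic.h>
    #include <pthread.h>
    #include <stdio.h>

    /* Illustrative spinlock: the acquire is one atomic test-and-set,
       so there is no window between testing and setting the flag. */
    static atomic_flag lock = ATOMIC_FLAG_INIT;
    static long counter = 0;

    static void spin_lock(void)
    {
        /* Spin until the flag was previously clear, i.e. we acquired it. */
        while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
            ;  /* busy-wait */
    }

    static void spin_unlock(void)
    {
        atomic_flag_clear_explicit(&lock, memory_order_release);
    }

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 1000000; i++) {
            spin_lock();
            counter++;            /* critical section: kept very short */
            spin_unlock();
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("counter = %ld\n", counter);   /* expect 2000000 */
        return 0;
    }

Compiled with something like cc -std=c11 -pthread, this should print 2000000 every time; replacing the atomic_flag with a plain int checked and set in two separate statements reintroduces exactly the race condition described above.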
[edit] Busy waiting page redirects here.
A lot of people, textbooks, and several of the papers cited at the bottom of this article use spinlocks and busy waiting interchangeably. Somebody else set up a redirect for the Busy Waiting page to route here, so that might be where the confusion comes from.
However, it sounds like you're not down with that, and I hate arguing about semantics. I'll undo the redirect for the busy waiting page, and copy my code there. I suggest you use this page to talk about the particular OS kernel spinlocks you're interested in. Does that work for you?
--Waxmop 22:29, 29 Dec 2004 (UTC)
Spinlocks are an application of busy waiting, but a very specific one. I've written up new details in line with some of the suggestions above, although I haven't gone into quite that much depth.
Can somebody check my example code? Not quite sure I've got that right. JulesH 23:44, 9 July 2005 (UTC)
--
Spinlocks apply to OPERATING SYSTEMS-- not software engineering!
[edit] Merge with busy waiting
I say no. While spin locks are a form of busy waiting, there are also lots of other ways you can busy wait and spin locks are an important enough concept on their own to merit their own page. --NJM 03:42, 8 November 2005 (UTC)
I agree (unsigned comment by 72.136.48.238, 17 December 2005 -- attribution corrected JulesH 17:31, 20 December 2005 (UTC))
Absolutely should not be merged. I came to look for an article on spinlocks, not busy waiting. The concepts are related, but they merit two separate articles.
This has been up here for months now, and nobody has come up with a reason to perform the merge. I suggest removing the template suggesting it. JulesH 14:42, 18 January 2006 (UTC)
I've gone ahead and removed it. JulesH 13:25, 12 February 2006 (UTC)
[edit] Should the "significant optimizations" section be here?
The example is here just to give an idea of what a spinlock does and happens to be in x86 asm because that's relatively common. However, the optimizations section goes into great detail with x86 minutiae that isn't really relevant to the article and probably just confuses the issue. I think it should be removed and replaced with a note that the example was written to be easy to understand and isn't the most efficient solution. --NJM 06:44, 28 May 2006 (UTC)
- I'm not sure. While some of the details are probably irrelevant, some of them are useful to know about. It may not be an ideal situation, but most of the world's general-purpose computers are based on the x86 architecture, so optimising things for it is important. And the information about cache protocols preventing bus traffic is unexpected; I think most people would expect a spinlock to generate a lot of bus traffic and therefore be inefficient because of that, so this is quite important and not x86-specific. All the information presented may be useful to spinlock implementors, so it should perhaps stay. JulesH 07:57, 26 June 2006 (UTC)
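As a concrete illustration of the bus-traffic point, this is roughly what the "test-and-test-and-set" variant looks like as a C11 sketch (not the article's x86 example; the type and function names are made up): the inner loop is a plain load, which is served from the local cache under a MESI-style protocol and generates no coherency traffic while the lock stays held, and the atomic exchange is only retried once the lock has been observed free.

    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct { atomic_bool locked; } ttas_lock;

    static void ttas_acquire(ttas_lock *l)
    {
        for (;;) {
            /* Try the expensive atomic exchange only when the lock looks free. */
            if (!atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
                return;
            /* Otherwise spin on an ordinary read, which stays in the local cache. */
            while (atomic_load_explicit(&l->locked, memory_order_relaxed))
                ;
        }
    }

    static void ttas_release(ttas_lock *l)
    {
        atomic_store_explicit(&l->locked, false, memory_order_release);
    }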
[edit] "Memory ordering" vs "Memory operation ordering"
I've reverted a change from "memory ordering" to "memory operation ordering".
My reason for doing this is that while "memory operation ordering" is a clearer phrase that is likely to be more meaningful to someone not familiar with it, "memory ordering" is the phrase used by most people when discussing it (see, e.g., Intel's description of memory ordering on the Itanium [1]). JulesH 13:52, 17 July 2006 (UTC)
[edit] Dijkstra Spinlocks
Currently the article says
- Generally this is only possible with special assembly language instructions, such as atomic test-and-set operations, and cannot be implemented from high level languages like C.
I'm not sure what the author means by "generally", since it's definitely possible using the spinlock mechanisms described by Dijkstra in the 1960s and later by Peterson. The main reason their mechanism is not used is that it's O(n) in time and space in the number of threads. I've used Dijkstra spinlocks in real commercial projects (written in C) where the hardware didn't have "compare and exchange" or "test and set" instructions. In fact, see Peterson's algorithm. RPTB1 13:08, 4 October 2006 (UTC)
- I'll admit, I've never heard this class of algorithm called by the name "spinlock", so never considered it necessary to mention them on this page, which is about an entirely different approach to mutual exclusion. Also, it isn't *generally* possible to use such an algorithm from C, as is noted in the article you linked to:
- Implementation of Peterson's and related algorithms on an out-of-order processor generally require use of [memory barrier] operations to work correctly to keep sequential operations from happening in an incorrect order.
- C does not provide any means of guaranteeing a memory barrier (although I'll grant that there are a variety of add-on libraries for it in most platforms that do, these are typically implemented in assembly language).
- A few other high level languages do include explicit memory barrier support (Java springs to mind; there is a memory barrier on entry and exit from a synchronized() block, IIRC) but these tend to be combined with higher level synchronization measures that make this kind of approach pointless, except perhaps for educational purposes.
- It is worth noting that this class of algorithm can be used on architectures that lack atomic operations, though. JulesH 13:28, 5 October 2006 (UTC)
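To make the memory-barrier point concrete, here is a sketch of Peterson's two-thread lock using C11 sequentially consistent atomics, which supply the ordering guarantees that plain C variables do not; the names peterson_lock and peterson_unlock are illustrative only.

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Peterson's algorithm for two threads (ids 0 and 1).  The default
       (sequentially consistent) atomics stand in for the memory barriers
       required on out-of-order processors; with plain variables the
       stores to flag and turn could be reordered and the lock would fail. */
    static atomic_bool flag[2];
    static atomic_int  turn;

    static void peterson_lock(int self)      /* self is 0 or 1 */
    {
        int other = 1 - self;
        atomic_store(&flag[self], true);     /* I want to enter */
        atomic_store(&turn, other);          /* but you may go first */
        /* Wait while the other thread wants in and it is its turn. */
        while (atomic_load(&flag[other]) && atomic_load(&turn) == other)
            ;
    }

    static void peterson_unlock(int self)
    {
        atomic_store(&flag[self], false);
    }

Going beyond two threads needs a flag per thread and a more elaborate turn scheme (the filter or bakery algorithms, for instance), which is the O(n) time and space cost RPTB1 mentions.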
[edit] Source required
I've added this sentence:
- But note that such an implementation may require more memory than a spinlock, be slower to allow progress after unlocking, and may not be implementable in a high-level language if out-of-order execution is in use.
While I'm confident that it's true, I could do with a source that confirms it. My primary reference for such things describes the algorithm, but fails to mention its lack of efficiency or any possible out-of-order problems. The closest it comes is describing the use of an atomic operation as "simpler". Any suggestions? JulesH 13:45, 5 October 2006 (UTC)
[edit] New section needed on "the dangers of spinlocks" as Uvatter mentioned above
- Deadlock/Livelock. In a real-time OS that obeys thread priorities, spinlocks can lead to deadlocks if used by threads with different priorities. The workaround is to never access spinlocks from threads of different priorities. Example (on a single CPU): a low-priority thread locks the spinlock; a high-priority thread then tries to lock it. Deadlock, because the high-priority thread spins on the lock forever and never allows the low-priority thread to run and unlock it.
-
- I guess this is part of the motivation for so-called "hybrid" spinlocks as used by Solaris: if you don't spin unless you know the process that owns the lock is currently running on another processor, this situation can never arise. It's an interesting point, and I can certainly see it would be useful to discuss it. Do you know of any books and/or papers that discuss the issue, that we can use as a reference? JulesH 21:25, 12 June 2007 (UTC)
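For illustration only, here is a much-simplified spin-then-sleep lock in C11. This is not how Solaris's adaptive locks actually work (they check whether the owner is running on another CPU), and the spin budget and sleep interval below are arbitrary; the point is just that bounding the spin and then blocking gives a lower-priority holder a chance to run, which avoids the deadlock described above.

    #define _POSIX_C_SOURCE 199309L
    #include <stdatomic.h>
    #include <time.h>          /* nanosleep */

    static atomic_flag hlock = ATOMIC_FLAG_INIT;

    static void hybrid_lock(void)
    {
        const struct timespec pause = { 0, 1000000 };   /* 1 ms, arbitrary */
        for (;;) {
            for (int i = 0; i < 1000; i++)              /* arbitrary spin budget */
                if (!atomic_flag_test_and_set_explicit(&hlock,
                                                       memory_order_acquire))
                    return;
            /* Stop spinning and sleep, so that a lower-priority thread
               holding the lock gets CPU time to release it. */
            nanosleep(&pause, NULL);
        }
    }

    static void hybrid_unlock(void)
    {
        atomic_flag_clear_explicit(&hlock, memory_order_release);
    }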
[edit] Getting rid of the atomic operation in unlock is not necessarily a "significant optimization"
The article says
- On later implementations of the x86 architecture, spin_unlock can safely use an unlocked MOV instead of the locked XCHG, which is much faster.
While it's much faster for the CPU that does the unlock, it can actually be slower overall, because the write to the spinlock to unlock it will tend to stay in the CPU's store buffer instead of going directly to memory/cache, meaning that other CPUs will see the update later and will spin for a longer period of time.
For this reason, I don't think this qualifies as a "significant optimization". —The preceding unsigned comment was added by 67.188.127.3 (talk) 09:01, 22 December 2006 (UTC).
- I believe MESI or other cache coherency systems would take care of this issue in most architectures. Unless somebody knows otherwise? JulesH 21:29, 12 June 2007 (UTC)
- In fact, reading the description of MESI linked above, it definitely prevents the issue. When a CPU modifies the item, it is stored into that CPU's cache and other processors' copies are invalidated. When they next read the item, they read it directly out of the modifying processor's cache, not out of main memory. JulesH 21:33, 12 June 2007 (UTC)
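For reference, the two unlock variants under discussion look roughly like this as a C11 sketch (with made-up names): mainstream compilers typically turn the release store into a plain MOV on x86 and the atomic exchange into a locked XCHG, which is exactly the trade-off being debated.

    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_bool locked;

    /* Unlock with a plain release store: on x86 this usually compiles to an
       ordinary MOV.  The value may sit briefly in the store buffer, but the
       coherency protocol still forwards it to waiters without a round trip
       to main memory. */
    static void unlock_with_store(void)
    {
        atomic_store_explicit(&locked, false, memory_order_release);
    }

    /* Unlock with an atomic exchange: on x86 this usually compiles to a
       locked XCHG, which drains the store buffer but is more expensive for
       the unlocking CPU. */
    static void unlock_with_xchg(void)
    {
        atomic_exchange_explicit(&locked, false, memory_order_acq_rel);
    }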