Lock (computer science)
In computer science, a lock is a synchronization mechanism for enforcing limits on access to a resource in an environment where there are many threads of execution. Locks are one way of enforcing concurrency control policies.
Types
Generally, locks are advisory locks, where each thread cooperates by acquiring the lock before accessing the corresponding data. Some systems also implement mandatory locks, where an attempt at unauthorized access to a locked resource forces an exception in the entity attempting the access.
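On POSIX systems, for example, the flock() call provides advisory locks on files: the lock excludes only processes that also call flock(), while an uncooperative process can still access the file. A minimal sketch (error handling mostly omitted; append_record is a hypothetical helper):

    #include <sys/file.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Advisory locking: only excludes other processes that also take
       the flock() lock; a process that skips the call is not stopped. */
    void append_record(const char *path, const char *rec, size_t len)
    {
        int fd = open(path, O_WRONLY | O_APPEND);
        if (fd < 0)
            return;
        flock(fd, LOCK_EX);       /* acquire exclusive advisory lock */
        write(fd, rec, len);
        flock(fd, LOCK_UN);       /* release */
        close(fd);
    }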
A (binary) semaphore is the simplest type of lock; it makes no distinction between shared (read-only) and exclusive (read and write) modes. Other schemes provide a shared mode, in which several threads can acquire a shared lock for read-only access to the data. Richer sets of modes, such as shared, exclusive, intend-to-exclude and intend-to-upgrade, are also widely implemented.
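POSIX threads, for instance, provide a reader-writer lock with exactly this shared/exclusive split; the sketch below (assuming a POSIX system; shared_table and its size are hypothetical) lets many readers proceed in parallel while a writer gets exclusive access:

    #include <pthread.h>

    static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;
    static int shared_table[100];

    int read_entry(int i)                    /* shared (read-only) mode */
    {
        pthread_rwlock_rdlock(&table_lock);  /* many readers may hold this */
        int value = shared_table[i];
        pthread_rwlock_unlock(&table_lock);
        return value;
    }

    void write_entry(int i, int value)       /* exclusive (read-write) mode */
    {
        pthread_rwlock_wrlock(&table_lock);  /* one writer, no readers */
        shared_table[i] = value;
        pthread_rwlock_unlock(&table_lock);
    }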
Independent of the type of lock chosen above, locks can be classified by what happens when the lock strategy prevents progress of a thread. Most locking designs block the execution of the thread requesting the lock until it is allowed to access the locked resource. A spinlock is a lock where the thread simply waits ("spins") until the lock becomes available. Spinning is very efficient if threads are likely to be blocked only for a short period, since it avoids the overhead of operating-system re-scheduling, but it is wasteful if the lock is held for a long period.
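A minimal spinlock can be sketched with C11's atomic_flag, whose test-and-set operation is atomic; this is an illustration rather than a production lock (it never yields the CPU or backs off):

    #include <stdatomic.h>

    static atomic_flag lk = ATOMIC_FLAG_INIT;

    void spin_lock(void)
    {
        /* atomically set the flag; keep spinning while it was already set */
        while (atomic_flag_test_and_set(&lk))
            ;                      /* busy-wait until the holder clears it */
    }

    void spin_unlock(void)
    {
        atomic_flag_clear(&lk);    /* release: lets one spinning thread in */
    }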
Implementation
Locks typically require hardware support for efficient implementation. This usually takes the form of one or more atomic instructions such as "test-and-set", "fetch-and-add" or "compare-and-swap". These instructions allow a single process to test if the lock is free, and if free, acquire the lock in a single atomic operation.
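For example, with C11's compare-and-swap the test ("is the lock word still 0?") and the set ("make it 1") happen as one indivisible step; a sketch, where the 0/1 encoding of the lock word is just a convention chosen here:

    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_int lock_word;    /* 0 = free, 1 = held */

    bool try_acquire(void)
    {
        int expected = 0;
        /* succeeds, storing 1, only if lock_word still equals 0 */
        return atomic_compare_exchange_strong(&lock_word, &expected, 1);
    }

    void release(void)
    {
        atomic_store(&lock_word, 0);
    }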
Uniprocessor architectures have the option of using uninterruptible sequences of instructions, using special instructions or instruction prefixes to disable interrupts temporarily, but this technique does not work for multiprocessor shared-memory machines. Proper support for locks in a multiprocessor environment can require quite complex hardware and/or software support, with substantial synchronization issues.
An atomic operation is required because of concurrency: more than one task may execute the same logic at the same time. For example, consider the following C code:
if (lock == 0) lock = myPID; /* lock free - set it */
The above example does not guarantee that the task has the lock, because more than one task can test the lock at the same time: both tasks may see the lock as free, and both may then attempt to set it, each unaware that the other task is doing the same. Dekker's algorithm or Peterson's algorithm is a possible substitute if atomic locking operations are not available.
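Peterson's algorithm, for two threads, needs only ordinary loads and stores rather than an atomic read-modify-write instruction. Because modern compilers and processors reorder plain memory accesses, the sketch below uses C11 atomics purely to enforce the sequential ordering the algorithm assumes:

    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_bool wants[2];    /* wants[i]: thread i wants to enter */
    static atomic_int  turn;        /* which thread must wait on a tie */

    void peterson_lock(int i)       /* i is 0 or 1 */
    {
        int other = 1 - i;
        atomic_store(&wants[i], true);
        atomic_store(&turn, other);            /* defer to the other thread */
        /* wait while the other thread wants in and it is its turn */
        while (atomic_load(&wants[other]) && atomic_load(&turn) == other)
            ;
    }

    void peterson_unlock(int i)
    {
        atomic_store(&wants[i], false);
    }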
Careless use of locks can result in deadlock. This occurs when a process holds a lock and then attempts to acquire a second lock. If the second lock is already held by another process, the first process will be blocked. If the second process then attempts to acquire the lock held by the first process, the system has "deadlocked": no progress will ever be made. A number of strategies can be used to avoid or recover from deadlocks, both at design-time and at run-time.
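One common design-time strategy is to impose a single global order in which locks may be acquired, so that no cycle of waiting processes can form. A sketch with POSIX mutexes, using the locks' addresses as one arbitrary but consistent order:

    #include <pthread.h>
    #include <stdint.h>

    /* Acquire two mutexes in a fixed global order (here, by address):
       no two threads can ever hold them in opposite orders, so the
       circular wait needed for deadlock cannot arise. */
    void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
    {
        if ((uintptr_t)a < (uintptr_t)b) {
            pthread_mutex_lock(a);
            pthread_mutex_lock(b);
        } else {
            pthread_mutex_lock(b);
            pthread_mutex_lock(a);
        }
    }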
Granularity
Before introducing lock granularity, one needs to understand three concepts about locks:
- lock overhead: the extra resources consumed in order to use locks, such as the memory space allocated for locks, the CPU time to initialize and destroy locks, and the time to acquire or release locks. The more locks a program uses, the more overhead is associated with their use.
- lock contention: This occurs whenever one process or thread attempts to acquire a lock held by another process or thread. The more granular the available locks, the less likely one process/thread will request a lock held by the other. (For example, locking a row rather than the entire table, or locking a cell rather than the entire row.)
- deadlock: the situation when two tasks each hold a lock that the other is waiting for. Unless something is done, the two tasks will wait forever.
So there is a tradeoff between decreasing lock overhead and decreasing lock contention when choosing the number of locks in synchronization.
An important property of a lock is its granularity: a measure of the amount of data the lock is protecting. In general, choosing a coarse granularity (a small number of locks, each protecting a large segment of data) results in less lock overhead when a single process is accessing the protected data, but worse performance when multiple processes are running concurrently. This is because of increased lock contention: the coarser the lock, the higher the likelihood that it will stop an unrelated process from proceeding. Conversely, using a fine granularity (a larger number of locks, each protecting a fairly small amount of data) increases the overhead of the locks themselves but reduces lock contention. More locks also increase the risk of deadlock.
In a database management system, for example, a lock could protect, in order of increasing granularity, a record, a data page, or an entire table. Coarse granularity, such as using table locks, tends to give the best performance for a single user, whereas fine granularity, such as record locks, tends to give the best performance for multiple users.
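The same tradeoff appears outside databases. In the sketch below (a hypothetical hash table), one coarse table-wide mutex is replaced by a fine-grained mutex per bucket: threads contend only when they touch the same bucket, at the cost of initializing and storing many more locks:

    #include <pthread.h>

    #define NBUCKETS 64

    struct entry  { struct entry *next; };
    struct bucket { pthread_mutex_t lock; struct entry *head; };

    static struct bucket table[NBUCKETS];
    /* the coarse alternative: one mutex guarding the whole table */

    void table_init(void)
    {
        for (int i = 0; i < NBUCKETS; i++)
            pthread_mutex_init(&table[i].lock, NULL);
    }

    void bucket_insert(unsigned hash, struct entry *e)
    {
        struct bucket *b = &table[hash % NBUCKETS];
        pthread_mutex_lock(&b->lock);   /* contends only within this bucket */
        e->next = b->head;              /* push onto the bucket's list */
        b->head = e;
        pthread_mutex_unlock(&b->lock);
    }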
Database locks
In databases, locks can be used as a means of ensuring transaction synchronicity, i.e. when transaction processing is made concurrent (transactions are interleaved), using two-phase locking ensures that the concurrent execution of the transactions is equivalent to some serial ordering of the transactions. However, deadlocks are an unfortunate side-effect of locking in databases. Deadlocks are either prevented, by pre-determining the locking order between transactions, or detected using waits-for graphs. An alternative to locking for database synchronicity that avoids deadlocks is the use of totally ordered global timestamps.
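Schematically, a two-phase transaction first only acquires locks (the growing phase) and then only releases them (the shrinking phase). A sketch with POSIX mutexes standing in for row locks on hypothetical account records; as noted above, a real system would also fix the locking order or detect deadlocks:

    #include <pthread.h>

    struct account { pthread_mutex_t lock; long balance; };

    /* Two-phase locking: every acquire precedes every release, which is
       what makes the interleaved execution equivalent to a serial one. */
    void transfer(struct account *from, struct account *to, long amount)
    {
        /* growing phase: take every lock the transaction needs */
        pthread_mutex_lock(&from->lock);
        pthread_mutex_lock(&to->lock);

        from->balance -= amount;
        to->balance   += amount;

        /* shrinking phase: release; no further locks may be taken */
        pthread_mutex_unlock(&to->lock);
        pthread_mutex_unlock(&from->lock);
    }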
The problems with locks
Lock-based resource protection and thread/process synchronization have many disadvantages:
- It is a blocking method, which means the thread/process has to wait until a lock held by others is released.
- Using locks is a conservative approach: each thread must acquire the lock whenever there is a possibility of an access conflict, even though conflicts are actually rare in real execution, so the approach usually induces unnecessary overhead.
- Locks are vulnerable to failures and faults. If one thread holding a lock dies, other threads waiting for the lock may wait forever.
- Programming with locks is error-prone; deadlock is the most notorious resulting bug.
- Using locks does not scale with problem size and complexity.
- The granularity of locked data has to be balanced against the cost of fine-grained locks.
- Locks are not composable. For example, deleting item X from table A and inserting X into table B cannot be combined into a single atomic operation using locks.
- Priority inversion: high-priority threads/processes cannot proceed if a low-priority thread/process is holding the common lock.
- Convoying: all other threads have to wait if a thread holding a lock is descheduled due to a time-slice interrupt or a page fault (see lock convoy).
- Hard to debug: bugs associated with locks are time-dependent and extremely hard to reproduce.
One strategy is to avoid locks entirely by using non-blocking synchronization methods, like lock-free programming techniques and transactional memory.
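For instance, a shared counter needs no lock at all if it is updated with an atomic fetch-and-add; a minimal lock-free sketch in C11:

    #include <stdatomic.h>

    static atomic_long counter;

    void increment(void)
    {
        /* no thread ever blocks: the read, add, and write happen
           as one atomic hardware operation */
        atomic_fetch_add(&counter, 1);
    }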
The lock keyword
In the C# programming language, the lock keyword can be used to ensure that at most one thread at a time executes a block of code, similar to the synchronized keyword in Java; it is implemented with monitors. For more information, see Monitor (synchronization).