MESI protocol

MESI state diagram. PrRd = Processor Read (read request from the processor); PrWr = Processor Write (write request from the processor); BusRd = Bus Read (read request from the bus without intent to modify); BusRdX = Bus Read Exclusive (read request from the bus with intent to modify). An 'S' denotes that the shared signal was asserted by another cache. The transitions are labeled "action observed/action performed".
Activity diagram of MESI protocol. "Main mem." could be replaced by "L2 cache" depending on the processor.

The MESI protocol (also known as the Illinois protocol due to its development at the University of Illinois at Urbana-Champaign[1]) is a widely used cache coherence and memory coherence protocol. It is the most common protocol that supports write-back caches.

States

Every cache line is marked with one of the four following states (coded in two additional bits):

Modified
The cache line is present only in the current cache, and is dirty; it has been modified from the value in main memory. The cache is required to write the data back to main memory at some time in the future, before permitting any other read of the (no longer valid) main memory state. The write-back changes the line to the Shared state.
Exclusive
The cache line is present only in the current cache, but is clean; it matches main memory. It may be changed to the Shared state at any time, in response to a read request. Alternatively, it may be changed to the Modified state when writing to it.
Shared
Indicates that this cache line may be stored in other caches of the machine and is clean; it matches the main memory. The line may be discarded (changed to the Invalid state) at any time.
Invalid
Indicates that this cache line is invalid (unused).

For any given pair of caches, the permitted states of a given cache line are as follows:

      M     E     S     I
 M    no    no    no    yes
 E    no    no    no    yes
 S    no    no    yes   yes
 I    yes   yes   yes   yes
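
The table reduces to a simple rule: a pair of states is permitted if at least one of them is Invalid, or if both are Shared. A minimal Python sketch of that rule follows; the names State and compatible are chosen here for illustration and do not come from any particular implementation.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def compatible(a: State, b: State) -> bool:
    """Return True if two different caches may hold the same line in states a and b."""
    if a is State.I or b is State.I:
        return True  # an Invalid copy never conflicts with anything
    return a is State.S and b is State.S  # otherwise only Shared/Shared is allowed


# Print the predicate in the same row/column layout as the table.
print("  " + "   ".join(s.name for s in State))
for a in State:
    print(a.name, " ".join("yes" if compatible(a, b) else "no " for b in State))
```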

Operation

In a typical system, several caches share a common bus to main memory. Each also has an attached CPU which issues read and write requests. The caches' collective goal is to minimize the use of the shared main memory.

A cache may satisfy a read from any state except Invalid. An Invalid line must be fetched (to the Shared or Exclusive states) to satisfy a read.

A write may only be performed if the cache line is in the Modified or Exclusive state. If it is in the Shared state, all other cached copies must be invalidated first. This is typically done by a broadcast operation known as Read For Ownership (RFO).

A cache may discard a non-Modified line (i.e. Shared or Exclusive) at any time, changing to the Invalid state. A Modified line must be written back first.
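
The processor-side rules above (read, write, discard) can be sketched as a transition function. This is an illustrative model only: the function name, the string encoding of states, and the "WriteBack" label are assumptions made here. It also assumes that a write to a Shared line is signalled with BusRdX, whereas some implementations use a separate data-less upgrade transaction.

```python
def on_processor_event(state, event, others_have_copy=False):
    """Return (new_state, bus_action) for an event issued by the local CPU.

    state is one of "M", "E", "S", "I"; event is "read", "write" or "evict".
    others_have_copy models the shared signal seen on a read miss.
    bus_action is None when no bus transaction is needed.
    """
    if event == "read":
        if state == "I":
            # Read miss: fetch the line, Shared if another cache holds it.
            return ("S" if others_have_copy else "E", "BusRd")
        return (state, None)  # M, E and S all satisfy reads locally
    if event == "write":
        if state in ("M", "E"):
            return ("M", None)  # already the sole owner: silent upgrade
        # S or I: other copies must be invalidated first (assumed BusRdX).
        return ("M", "BusRdX")
    if event == "evict":
        if state == "M":
            return ("I", "WriteBack")  # dirty data must reach memory first
        return ("I", None)  # clean lines can simply be dropped
    raise ValueError(f"unknown event: {event}")
```

For example, on_processor_event("E", "write") needs no bus action, while on_processor_event("S", "write") does.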

A cache that holds a line in the Modified state must snoop (intercept) all attempted reads (from all of the other caches in the system) of the corresponding main memory location and insert the data that it holds. This is typically done by forcing the read to back off (i.e. retry later), then writing the data to main memory and changing the cache line to the Shared state.

A cache that holds a line in the Shared state must listen for invalidate or read-for-ownership broadcasts from other caches, and discard the line (by moving it into Invalid state) on a match.

A cache that holds a line in the Exclusive state must also snoop all read transactions from all other caches, and move the line to Shared state on a match.
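
The snooping rules of the last three paragraphs can be sketched in the same style. The function name and the "Flush" label are assumptions made for illustration. The handling of a BusRdX seen by a Modified or Exclusive line (supply or drop the data, then invalidate) is not spelled out in the text above and follows the usual textbook description of MESI.

```python
def on_snoop(state, bus_event):
    """Return (new_state, action) when another cache's bus transaction matches a line.

    state is one of "M", "E", "S", "I"; bus_event is "BusRd" or "BusRdX".
    action is "Flush" when this cache must supply or write back its dirty data.
    """
    if bus_event not in ("BusRd", "BusRdX"):
        raise ValueError(f"unknown bus event: {bus_event}")
    if state == "I":
        return ("I", None)  # nothing cached, nothing to do
    if bus_event == "BusRd":
        # Another cache wants to read: a dirty line is written back,
        # and every remaining copy ends up Shared.
        return ("S", "Flush" if state == "M" else None)
    # BusRdX: another cache wants to write, so this copy must go.
    return ("I", "Flush" if state == "M" else None)
```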

The Modified and Exclusive states are always precise: i.e. they match the true cache line ownership situation in the system. The Shared state may be imprecise: if another cache discards a Shared line, this cache may become the sole owner of that cache line, but it will not be promoted to Exclusive state. Other caches do not broadcast notices when they discard cache lines, and this cache could not use such notifications without maintaining a count of the number of shared copies.

In that sense the Exclusive state is an opportunistic optimization: If the CPU wants to modify a cache line that is in state S, a bus transaction is necessary to invalidate all other cached copies. State E enables modifying a cache line with no bus transaction.
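
The effect of the Exclusive state, and of the imprecise Shared state, can be made concrete by counting bus transactions for a single line in a toy model. The class below is a hypothetical sketch written for this purpose: it tracks only the state of one line per cache and ignores data, timing and arbitration.

```python
class Bus:
    """Toy model: the MESI state of ONE cache line in each of n caches."""

    def __init__(self, n_caches):
        self.states = ["I"] * n_caches
        self.transactions = 0  # number of bus transactions issued so far

    def read(self, cpu):
        if self.states[cpu] != "I":
            return  # read hit, no bus traffic
        self.transactions += 1  # BusRd
        shared = False
        for other in range(len(self.states)):
            if other != cpu and self.states[other] != "I":
                self.states[other] = "S"  # a Modified copy would be written back here
                shared = True
        self.states[cpu] = "S" if shared else "E"

    def write(self, cpu):
        if self.states[cpu] in ("S", "I"):
            self.transactions += 1  # read for ownership / invalidate
            for other in range(len(self.states)):
                if other != cpu:
                    self.states[other] = "I"
        self.states[cpu] = "M"  # from E or M this is silent

    def evict(self, cpu):
        if self.states[cpu] == "M":
            self.transactions += 1  # write-back of dirty data
        self.states[cpu] = "I"  # clean lines are dropped without any notice


def write_cost(setup):
    """Run setup(bus), then count the bus transactions a write by CPU 0 needs."""
    bus = Bus(2)
    setup(bus)
    before = bus.transactions
    bus.write(0)
    return bus.transactions - before


# CPU 0 read the line alone (Exclusive): the write is free.
print(write_cost(lambda b: b.read(0)))
# CPU 1 also holds the line (Shared): the write needs a bus transaction.
print(write_cost(lambda b: (b.read(1), b.read(0))))
# CPU 1 silently dropped its copy; CPU 0 is still Shared and still pays.
print(write_cost(lambda b: (b.read(1), b.read(0), b.evict(1))))
```

In this model the three scenarios cost 0, 1 and 1 bus transactions respectively; the third shows why a Shared line is not promoted to Exclusive when the other copy is discarded.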

Read For Ownership

A Read For Ownership (RFO) is an operation in cache coherency protocols that combines a read and an invalidate broadcast. It is issued by a processor trying to write into a cache line that is in the Shared (S) or Invalid (I) state of the MESI protocol, and it causes all other processors to set the state of that line to I. Because it is a read with intent to write to that memory address, the operation is exclusive: it brings the data into the requesting cache and invalidates the line in every other processor cache that holds it.
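
A minimal sketch of an RFO that also moves data is given below. It assumes a simplified model in which each cache is a Python dict mapping an address to a (state, value) pair and main memory is a dict of values; the function names are hypothetical. Leaving the requester in the Exclusive state after the RFO, before the write marks the line Modified, is one possible modelling choice.

```python
def read_for_ownership(caches, memory, requester, addr):
    """Fetch the current value of addr for `requester` and invalidate all other copies."""
    value = memory.get(addr, 0)
    for cpu, cache in enumerate(caches):
        if cpu == requester:
            continue
        state, cached = cache.get(addr, ("I", None))
        if state == "I":
            continue
        if state == "M":
            value = cached        # the dirty copy holds the newest data
            memory[addr] = value  # write it back before giving it up
        cache[addr] = ("I", None)
    caches[requester][addr] = ("E", value)  # requester is now the sole holder
    return value


def write(caches, memory, cpu, addr, value):
    """Perform a CPU write, issuing an RFO first if the line is Shared or Invalid."""
    state, _ = caches[cpu].get(addr, ("I", None))
    if state in ("S", "I"):
        read_for_ownership(caches, memory, cpu, addr)
    caches[cpu][addr] = ("M", value)


# Two caches share address 0x40; CPU 0 writes, then CPU 1 writes.
memory = {0x40: 7}
caches = [{0x40: ("S", 7)}, {0x40: ("S", 7)}]
write(caches, memory, 0, 0x40, 9)
write(caches, memory, 1, 0x40, 11)
print(caches, memory)
```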

Memory Barriers

MESI in its naive, straightforward implementation exhibits two particular low-performance behaviours. First, when writing to an invalid cache line, there is a long delay while the line is fetched from another CPU. Second, moving cache lines to the invalid state is time-consuming.

To mitigate these delays, CPUs implement store buffers and invalidate queues.

A store buffer is used when writing to an invalid cache line. Since the write will proceed anyway, the CPU issues a read-invalidate message (so that the cache line in question is fetched and every other CPU's copy of that memory address is invalidated) and then pushes the write into the store buffer, to be executed when the cache line finally arrives in the cache.

A direct consequence of the store buffer's existence is that when a CPU commits a write, that write is not immediately written to the cache. Therefore, whenever a CPU needs to read a cache line, it first has to scan its own store buffer for the same line, as the line may have been written by the same CPU earlier but not yet written to the cache (the preceding write is still waiting in the store buffer). Note that while a CPU can read its own previous writes in its store buffer, other CPUs cannot see those writes before they are flushed from the store buffer to the cache: a CPU cannot scan the store buffer of other CPUs.
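
The behaviour described above can be sketched with a toy model. The class and method names below are hypothetical, and the coherent caches of all CPUs are collapsed into a single shared dict, which is a deliberate simplification: it stands for whatever the coherence protocol makes visible to every CPU.

```python
from collections import deque


class Core:
    """A CPU with a private store buffer in front of a (shared, coherent) cache view."""

    def __init__(self, cache):
        self.cache = cache            # addr -> value, visible to all cores
        self.store_buffer = deque()   # pending (addr, value) writes, oldest first

    def store(self, addr, value):
        # The write is committed by the CPU but only queued, not yet in the cache.
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Scan the core's own store buffer first; the newest matching entry wins.
        for pending_addr, pending_value in reversed(self.store_buffer):
            if pending_addr == addr:
                return pending_value
        return self.cache.get(addr, 0)

    def drain(self):
        # Apply the queued writes to the cache in program order.
        while self.store_buffer:
            addr, value = self.store_buffer.popleft()
            self.cache[addr] = value


shared_cache = {}
cpu0, cpu1 = Core(shared_cache), Core(shared_cache)
cpu0.store("x", 1)
print(cpu0.load("x"), cpu1.load("x"))  # cpu0 sees its own write, cpu1 does not yet
cpu0.drain()
print(cpu1.load("x"))                  # now visible to cpu1
```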

With regard to invalidation messages, CPUs implement invalidate queues, whereby incoming invalidate requests are instantly acknowledged but not in fact acted upon. Instead, invalidation messages simply enter an invalidation queue and their processing occurs as soon as possible (but not necessarily instantly). Consequently, a CPU can be oblivious to the fact that a cache line in its cache is actually invalid, as the invalidation queue contains invalidations which have been received but haven't yet been applied. Note that, unlike the store buffer, the CPU can't scan the invalidation queue, as that CPU and the invalidation queue are physically located on opposite sides of the cache.
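
A correspondingly small sketch of an invalidate queue is shown below. The class and method names are illustrative assumptions; the point is only that the acknowledgement is sent before the line is actually dropped, so a read in between can return stale data.

```python
from collections import deque


class CacheWithInvalidateQueue:
    """Toy cache that acknowledges invalidations immediately and applies them later."""

    def __init__(self):
        self.lines = {}                  # addr -> value, valid lines only
        self.invalidate_queue = deque()  # invalidations received but not yet applied

    def receive_invalidate(self, addr):
        self.invalidate_queue.append(addr)
        return "ack"  # acknowledged at once, although the line is still present

    def read(self, addr):
        # The CPU does not consult the invalidate queue, so this may be stale.
        return self.lines.get(addr)  # None models a miss

    def process_queue(self):
        # Apply every pending invalidation.
        while self.invalidate_queue:
            self.lines.pop(self.invalidate_queue.popleft(), None)


cache = CacheWithInvalidateQueue()
cache.lines[0x10] = 5
cache.receive_invalidate(0x10)
print(cache.read(0x10))  # still returns the old value
cache.process_queue()
print(cache.read(0x10))  # the line is gone, the next read would miss
```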

As a result, memory barriers are required. A store barrier will flush the store buffer, ensuring all writes have been applied to that CPU's cache. A read barrier will flush the invalidation queue, thus ensuring that all writes by other CPUs become visible to the flushing CPU.
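
The need for a store barrier can be illustrated with a toy model of the classic "write data, then set a flag" pattern. Everything below is an assumption made for illustration: the Writer class, the rule that a store to a line the CPU already owns goes straight to the cache while a store to any other line waits in the store buffer, and the use of one dict to stand for the state visible to other CPUs. The read-barrier side (flushing the invalidate queue on the reader) is analogous and is not modelled here.

```python
class Writer:
    """Toy CPU: stores to owned lines are visible at once, others wait in a store buffer."""

    def __init__(self, visible, owned):
        self.visible = visible     # what other CPUs can observe
        self.owned = set(owned)    # lines held in Modified or Exclusive state
        self.store_buffer = []     # pending (addr, value) writes

    def store(self, addr, value):
        if addr in self.owned:
            self.visible[addr] = value            # no bus traffic needed
        else:
            self.store_buffer.append((addr, value))  # wait for the line to arrive

    def store_barrier(self):
        # Flush the store buffer: all earlier writes become visible.
        for addr, value in self.store_buffer:
            self.visible[addr] = value
            self.owned.add(addr)
        self.store_buffer.clear()


def publish(use_barrier):
    """Write data then flag; return the (flag, data) pair another CPU could observe."""
    visible = {"data": 0, "flag": 0}
    writer = Writer(visible, owned={"flag"})
    writer.store("data", 42)
    if use_barrier:
        writer.store_barrier()
    writer.store("flag", 1)
    observed = (visible["flag"], visible["data"])  # the reader looks at this moment
    writer.store_barrier()  # the buffer eventually drains either way
    return observed


print(publish(use_barrier=False))  # flag set, but data still stale
print(publish(use_barrier=True))   # flag set and data visible
```

Without the barrier the observer in this model sees (1, 0), that is, the flag without the data; with the barrier it sees (1, 42).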

Furthermore, memory management units do not scan the store buffer, causing similar problems. This effect is already visible in single-threaded processors.[2]

References

  1. Papamarcos, M. S.; Patel, J. H. (1984). "A low-overhead coherence solution for multiprocessors with private cache memories". Proceedings of the 11th annual international symposium on Computer architecture - ISCA '84 (PDF). p. 348. doi:10.1145/800015.808204. ISBN 0818605383. Retrieved March 19, 2013.
  2. Chen, G.; Cohen, E.; Kovalev, M. (2014). "Store Buffer Reduction with MMUs". Verified Software: Theories, Tools and Experiments. Lecture Notes in Computer Science 8471. p. 117. doi:10.1007/978-3-319-12154-3_8. ISBN 978-3-319-12153-6.

This article is issued from Wikipedia - version of the Saturday, December 05, 2015. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.