Translation lookaside buffer

From Wikipedia, the free encyclopedia

A Translation lookaside buffer (TLB) is a CPU cache that is used by memory management hardware to improve the speed of virtual address translation. All current desktop and server processors (such as x86) use a TLB. A TLB has a fixed number of slots containing page table entries, which map virtual addresses onto physical addresses. It is typically a content-addressable memory (CAM), in which the search key is the virtual address and the search result is a physical address. If the requested address is present in the TLB, the CAM search yields a match very quickly, after which the physical address can be used to access memory. If the requested address is not in the TLB, the translation proceeds using the page table, which is slower to access. Furthermore, the translation takes significantly longer if the translation tables are swapped out into secondary storage, which a few systems allow.

Contents

[edit] Overview

The TLB references physical memory addresses in its table. It may reside between the CPU and the CPU cache or between the CPU cache and primary storage memory. This depends on whether the cache uses physical or virtual addressing. If the cache is virtually addressed, requests are sent directly from the CPU to the cache, which then accesses the TLB as necessary. If the cache is physically addressed, the CPU does a TLB lookup on every memory operation, and the resulting physical address is sent to the cache. There are pros and cons to both implementations.

A common optimization for physically addressed caches is to perform the TLB lookup in parallel with the cache access. The low-order bits of any virtual address (e.g., in a virtual memory system having 4KB pages, the lower 12 bits of the virtual address) represent the offset of the desired address within the page, and thus they do not change in the virtual-to-physical translation. During a cache access, two steps are performed: an index is used to find an entry in the cache's data store, and then the tags for the cache line found are compared. If the cache is structured in such a way that it can be indexed using only the bits that do not change in translation, the cache can perform its "index" operation while the TLB translates the upper bits of the address. Then, the translated address from the TLB is passed to the cache. The cache performs a tag comparison to determine if this access was a hit or miss. It is possible to perform the TLB lookup in parallel with the cache access even if the cache must be indexed using some bits that may change upon address translation; see the address translation section in the cache article for more details about virtual addressing as it pertains to caches and TLBs.

[edit] Miss

Two schemes for handling TLB misses are commonly found in modern architectures:

  • With hardware TLB management, the CPU itself walks the page tables to see if there is a valid page table entry for the specified virtual address. If an entry exists, it is brought into the TLB and the TLB access is retried: this time the access will hit, and the program can proceed normally. If the CPU finds no valid entry for the virtual address in the page tables, it raises a page fault exception, which the operating system must handle. Handling page faults usually involves bringing the requested data into physical memory, setting up a page table entry to map the faulting virtual address to the correct physical address, and restarting the program (see Page fault for more details.)
  • With software-managed TLBs, a TLB miss generates a "TLB miss" exception, and the operating system must walk the page tables and perform the translation in software. The operating system then loads the translation into the TLB and restarts the program from the instruction that caused the TLB miss. As with hardware TLB management, if the OS finds no valid translation in the page tables, a page fault has occurred, and the OS must handle it accordingly.

[edit] Typical TLB

  • Size: 8 - 4,096 entries
  • Hit time: 0.5 - 1 clock cycle
  • Miss penalty: 10 - 30 clock cycles
  • Miss rate: 0.01% - 1%

If a TLB hit takes 1 clock cycle, a miss takes 30 clock cycles, and the miss rate is 1%, the effective memory cycle rate is an average of

1 \times 0.99 + (1 + 30) \times 0.01 = 1.30

(1.30 clock cycles per memory access).

In a Harvard architecture or hybrid thereof, a separate virtual address space may exist for instruction and data caching. This can lead to distinct TLB buffers for each of the caches (instructions, data, or unified TLB).

[edit] Task switch

On a task switch, some TLB entries can become invalid, since for example the previously running process had access to a page, but the process to run has not. The simplest strategy to deal with this is to completely flush the TLB. Newer CPUs have more efficient strategies; for example in the Alpha EV6, each TLB entry is tagged with an "address space number" (ASN), and only TLB entries with an ASN matching the current task are considered valid.

[edit] References

[edit] See also