Translation Lookaside Buffer

From Wikipedia, the free encyclopedia

A translation lookaside buffer (TLB) is a CPU cache that speeds up virtual address translation. A TLB has a fixed number of entries, each holding part of the page table that maps a virtual address to a physical address. It is typically implemented as a content-addressable memory (CAM), in which the search key is the virtual address and the search result is a physical address. If the CAM search yields a match, the translation is known very quickly, and the physical address is used to access memory. If the virtual address is not in the TLB, the translation proceeds via the page table, which takes longer to complete. It takes significantly longer still if the translation tables themselves have been swapped out to secondary storage, which a few systems allow.
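The lookup described above can be sketched as a small software model. This is only an illustration, not how any particular CPU implements it: the class name `TLB`, the 4 KB page size, the dictionary standing in for the page table, and the least-recently-used replacement policy are all assumptions made for the example.

```python
from collections import OrderedDict

OFFSET_BITS = 12                 # assumed 4 KB pages
PAGE_SIZE = 1 << OFFSET_BITS


class TLB:
    """Toy fully associative TLB with LRU replacement (an assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # virtual page number -> physical frame number
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr, page_table):
        vpn = vaddr >> OFFSET_BITS          # search key: virtual page number
        offset = vaddr & (PAGE_SIZE - 1)    # low bits pass through unchanged
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)   # mark as most recently used
        else:
            self.misses += 1
            frame = page_table[vpn]         # slow path: consult the page table
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)   # evict least recently used
            self.entries[vpn] = frame
        return (self.entries[vpn] << OFFSET_BITS) | offset


page_table = {0: 5, 1: 9, 2: 7}      # hypothetical mappings
tlb = TLB(capacity=2)
print(hex(tlb.translate(0x0010, page_table)))   # miss, then filled
print(hex(tlb.translate(0x0FFF, page_table)))   # same page: hit
print(tlb.hits, tlb.misses)
```

A real TLB compares all entries in parallel in hardware; the dictionary lookup here only mimics the result of that search, not its timing.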

The TLB references physical memory addresses in its table. It may reside between the CPU and the CPU cache or between the CPU cache and primary storage memory, depending on whether the cache uses physical or virtual addressing. If the cache is virtually addressed, requests are sent directly from the CPU to the cache, which then accesses the TLB as necessary. If the cache is physically addressed, the CPU does a TLB lookup on every memory operation, and the resulting physical address is sent to the cache. Each arrangement has trade-offs: a virtually addressed cache keeps the TLB off the path of a cache hit but must cope with the same virtual address referring to different data in different address spaces, while a physically addressed cache avoids that problem at the cost of a translation on every access.

A common optimization for physically addressed caches is to perform the TLB lookup in parallel with the cache access. The low-order bits of any virtual address (e.g., the lower 12 bits in a virtual memory system with 4 KB pages) do not change in the virtual-to-physical translation. A cache access consists of two steps: an index is used to find an entry in the cache's data store, and then the tags for the cache line found are compared. If the cache is structured so that it can be indexed using only the bits that do not change in translation, it can perform its "index" operation while the TLB translates the upper bits of the address. The translated address from the TLB is then passed to the cache, which performs the tag comparison to determine whether the access was a hit or a miss. See the address translation section in the cache article for more details about virtual addressing as it pertains to caches and TLBs.
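The condition that the cache index must come only from untranslated bits can be checked with a little bit arithmetic. The following is a minimal sketch; the function name `split_for_parallel_lookup` and the default geometry (64-byte lines, 64 sets, 4 KB pages, all powers of two) are assumptions chosen for illustration.

```python
def split_for_parallel_lookup(vaddr, line_size=64, num_sets=64, page_size=4096):
    """Split a virtual address into (set index, line offset, virtual page number).

    Raises ValueError if the set index would need bits above the page offset,
    in which case the index could not be computed before translation finishes.
    All sizes are assumed to be powers of two.
    """
    offset_bits = page_size.bit_length() - 1   # 12 for 4 KB pages
    line_bits = line_size.bit_length() - 1     # 6 for 64-byte lines
    index_bits = num_sets.bit_length() - 1     # 6 for 64 sets
    if line_bits + index_bits > offset_bits:
        raise ValueError("cache index uses translated address bits")
    line_offset = vaddr & (line_size - 1)
    set_index = (vaddr >> line_bits) & (num_sets - 1)
    vpn = vaddr >> offset_bits                 # only this part goes to the TLB
    return set_index, line_offset, vpn


# Two addresses with the same page offset but different pages
# select the same cache set, so indexing need not wait for the TLB.
print(split_for_parallel_lookup(0x12ABC))
print(split_for_parallel_lookup(0x77ABC))
```

With these assumed parameters, each way of the cache covers exactly one page (64 sets of 64 bytes), which is why larger caches built this way tend to grow by adding associativity rather than sets.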

Miss

Two schemes for handling a TLB miss are commonly found in modern architectures.

With hardware TLB management, the CPU itself walks the page tables to see if there is a valid page table entry for the specified virtual address. If an entry exists, it is brought into the TLB and the TLB access is retried; this time the access will hit, and the program can proceed normally. If the CPU finds no valid entry for the virtual address in the page tables, it raises a page fault exception, which the operating system must handle. Handling page faults usually involves bringing the requested data into physical memory, setting up a page table entry to map the faulting virtual address to the correct physical address, and restarting the program; see the page fault article for more details.

With software-managed TLBs, a TLB miss generates a "TLB miss" exception, and the operating system must walk the page tables and perform the translation in software. The operating system then loads the translation into the TLB and restarts the program from the instruction that caused the TLB miss. As with hardware TLB management, if the OS finds no valid translation in the page tables, a page fault has occurred, and the OS must handle it accordingly.
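The software-managed flow can be illustrated with a toy model. Everything here is hypothetical: `PageFault`, `handle_tlb_miss`, `access`, the dictionary-based TLB and page table, and the `load_page` callback that stands in for the operating system bringing a page into memory. It is a sketch of the control flow under those assumptions, not of any real kernel's handler.

```python
OFFSET_BITS = 12   # assumed 4 KB pages


class PageFault(Exception):
    """Raised when the page tables hold no valid translation."""


def handle_tlb_miss(vpn, tlb, page_table):
    # "TLB miss" exception handler: walk the page table in software.
    entry = page_table.get(vpn)
    if entry is None or not entry["valid"]:
        raise PageFault(vpn)
    tlb[vpn] = entry["frame"]      # load the translation into the TLB


def access(vaddr, tlb, page_table, load_page):
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    while True:                    # each pass models restarting the instruction
        if vpn in tlb:
            return (tlb[vpn] << OFFSET_BITS) | offset
        try:
            handle_tlb_miss(vpn, tlb, page_table)
        except PageFault:
            # Page fault handling: bring the page in and set up its entry,
            # after which the retried access misses in the TLB once more.
            page_table[vpn] = {"frame": load_page(vpn), "valid": True}


tlb = {}
page_table = {1: {"frame": 3, "valid": True}}
print(hex(access(0x1004, tlb, page_table, load_page=lambda vpn: 8)))  # TLB miss only
print(hex(access(0x2008, tlb, page_table, load_page=lambda vpn: 8)))  # page fault first
```

In the second access the loop runs three times: a TLB miss that turns into a page fault, a TLB miss that now finds a valid entry, and finally a hit.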

Typical statistics

Size: 8 - 4,096 entries
Hit time: 0.5 - 1 clock cycle
Miss penalty: 10 - 30 clock cycles
Miss rate: 0.01% - 1%

If a TLB hit takes 1 clock cycle, a miss takes 30 clock cycles, and the miss rate is 1%, the effective translation time averages 1 * 0.99 + 30 * 0.01 = 1.29 clock cycles per memory access.
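The same weighted average can be computed for other points in the ranges listed above. The helper name `average_translation_cycles` is an assumption for this sketch, and it follows the calculation above in treating the miss figure as the total cost of a missing access rather than a penalty added to the hit time.

```python
def average_translation_cycles(hit_cycles, miss_cycles, miss_rate):
    """Weighted average of hit and miss costs; miss_rate is a fraction in [0, 1]."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_cycles * (1.0 - miss_rate) + miss_cycles * miss_rate


# The worked example: 1-cycle hit, 30-cycle miss, 1% miss rate.
print(round(average_translation_cycles(1, 30, 0.01), 4))
# The low end of the listed miss-rate range (0.01%).
print(round(average_translation_cycles(1, 30, 0.0001), 4))
```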