Runahead
Runahead is a technique that allows a microprocessor to pre-process instructions during cache miss cycles instead of stalling. The pre-processed instructions generate instruction and data stream prefetches by detecting cache misses before they would otherwise occur. To do so, the processor uses otherwise idle execution resources to calculate instruction and data fetch addresses from whatever information is independent of the outstanding cache miss.
The principal hardware cost is a means of checkpointing the register file state and preventing pre-processed stores from modifying memory. This checkpointing can be accomplished using very little hardware since all results computed during runahead are discarded after the cache miss has been serviced, at which time normal execution resumes using the checkpointed register file state.
Branch outcomes computed during runahead mode can be saved into a shift register, which can be used as a highly accurate branch predictor when normal operation resumes.
Runahead was initially investigated in the context of an in-order microprocessor; however, the technique has since been extended for use with out-of-order microprocessors.
Entering runahead
When a runahead processor detects a level one instruction or data cache miss, it records the instruction address of the faulting access and enters runahead mode. A demand fetch for the missing instruction or data cache line is generated if necessary. The processor checkpoints the register file by one of several mechanisms discussed later. The state of the memory hierarchy is checkpointed by disabling stores: store instructions are allowed to compute addresses and check for a hit, but they are not allowed to write to memory.
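The store handling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Cache` class, the `execute_store` function, the 64-byte line size, and the write-allocate behavior in normal mode are hypothetical and do not come from any particular design.

```python
LINE_SIZE = 64  # assumed cache line size in bytes (hypothetical)


class Cache:
    """Toy cache that only tracks which line numbers are present."""

    def __init__(self):
        self.lines = set()
        self.prefetches = []  # line numbers requested during runahead

    def hit(self, addr):
        return (addr // LINE_SIZE) in self.lines

    def prefetch(self, addr):
        line = addr // LINE_SIZE
        if line not in self.prefetches:
            self.prefetches.append(line)


def execute_store(cache, memory, addr, value, runahead):
    """Perform a store; in runahead mode, probe the cache but never write."""
    if runahead:
        if not cache.hit(addr):
            cache.prefetch(addr)  # the miss is turned into a prefetch
        return  # memory state is left untouched
    memory[addr] = value
    cache.lines.add(addr // LINE_SIZE)  # simplification: write-allocate


memory = {}
cache = Cache()
execute_store(cache, memory, 0x1000, 7, runahead=False)  # normal store
execute_store(cache, memory, 0x2000, 9, runahead=True)   # runahead store that misses
```

After these two calls, `memory` holds only the normal-mode store, while the runahead store has left a single entry in `cache.prefetches`.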
Because the value returned from a cache miss cannot be known ahead of time, pre-processed instructions may depend on invalid data. Such dependences are tracked by adding an "invalid" (INV) bit to every register in the register file. If runahead was initiated by a load instruction, the load's destination register is marked INV.
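As a rough sketch of the entry sequence, the following Python model records the address of the missing access, checkpoints the register values, and marks the load's destination register INV. The names `Register`, `Core`, and `enter_runahead` are hypothetical, and the checkpoint is modeled as a simple copy of the register values.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Register:
    value: int = 0
    inv: bool = False  # INV bit: value depends on data that has not arrived


@dataclass
class Core:
    regs: List[Register] = field(
        default_factory=lambda: [Register() for _ in range(8)]
    )
    runahead: bool = False
    resume_pc: Optional[int] = None
    checkpoint: Optional[List[int]] = None

    def enter_runahead(self, miss_pc: int, load_dest: Optional[int] = None) -> None:
        """Record the missing access, checkpoint registers, mark the load target INV."""
        self.resume_pc = miss_pc
        self.checkpoint = [r.value for r in self.regs]  # copy of architectural values
        self.runahead = True
        if load_dest is not None:  # None models an instruction cache miss
            self.regs[load_dest].inv = True


core = Core()
core.regs[3].value = 42
core.enter_runahead(miss_pc=0x400, load_dest=3)
```

Passing `load_dest=None` models runahead triggered by an instruction cache miss, in which case no register is marked INV on entry.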
Pre-processing instructions
The processor then continues to execute instructions after the miss; however, all results are strictly temporary and are used only to generate additional load, store, and instruction cache misses, which are turned into prefetches. The designer can opt to allow runahead to skip over instructions that are not present in the instruction cache, with the understanding that the quality of any prefetches generated will be reduced since the effect of the missing instructions is unknown.
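One way to picture how a runahead load turns a miss into a prefetch is the sketch below. It rests on assumptions that the text does not spell out: a load whose address is INV is not issued at all, a load that misses returns an INV result, and the names `runahead_load` and `LINE_SIZE` are hypothetical.

```python
LINE_SIZE = 64  # assumed cache line size in bytes (hypothetical)


def runahead_load(cached_lines, memory, addr, addr_inv, prefetches):
    """Return (value, inv) for a load executed in runahead mode."""
    if addr_inv:
        return None, True  # assumption: an untrusted address is not accessed
    line = addr // LINE_SIZE
    if line in cached_lines:
        return memory.get(addr, 0), False  # hit: the value is usable
    prefetches.append(line)  # miss: turn it into a prefetch
    return None, True  # data has not arrived, so the result is INV


cached = {0x1000 // LINE_SIZE}
mem = {0x1000: 5}
pending = []
hit_result = runahead_load(cached, mem, 0x1000, False, pending)
miss_result = runahead_load(cached, mem, 0x8000, False, pending)
inv_result = runahead_load(cached, mem, 0x9000, True, pending)
```

Only the second load adds an entry to `pending`; the third is dropped because its address cannot be trusted.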
The destination register of any instruction that has one or more source registers marked INV is itself marked INV. This allows the processor to know which register values can be (reasonably) trusted during runahead mode. Branch instructions that cannot be resolved due to INV sources are simply assumed to have had their direction predicted correctly. Branch outcomes are saved in a shift register for later use as highly accurate predictions during normal operation.
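The INV propagation rule and the branch outcome shift register can be illustrated with a small Python model. The class `RunaheadState`, its method names, the branch-if-nonzero semantics, and the 16-bit history length are hypothetical. The sketch also assumes that the direction followed for an unresolvable branch is recorded alongside resolved outcomes, which the text does not specify.

```python
class RunaheadState:
    def __init__(self, num_regs=8, history_bits=16):
        self.value = [0] * num_regs
        self.inv = [False] * num_regs
        self.history = 0  # shift register of branch outcomes
        self.history_bits = history_bits

    def alu(self, dest, srcs, op):
        """Execute dest = op(*srcs); dest becomes INV if any source is INV."""
        if any(self.inv[s] for s in srcs):
            self.inv[dest] = True
            return
        self.value[dest] = op(*(self.value[s] for s in srcs))
        self.inv[dest] = False

    def branch(self, src, predicted_taken):
        """Resolve a branch-if-nonzero; trust the prediction if src is INV."""
        taken = predicted_taken if self.inv[src] else self.value[src] != 0
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask
        return taken


s = RunaheadState()
s.value[1] = 3
s.inv[2] = True  # r2 is the target of the load that missed
s.alu(3, [1, 1], lambda a, b: a + b)  # valid sources: r3 = 6
s.alu(4, [1, 2], lambda a, b: a + b)  # r2 is INV, so r4 becomes INV
first = s.branch(3, predicted_taken=False)   # resolved from r3
second = s.branch(4, predicted_taken=False)  # r4 is INV: prediction is used
```

The newest outcome sits in the least significant bit of `history`, so the two branches above leave the pattern taken, then not taken.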
INV register values cannot always be tracked perfectly during runahead mode, nor do they need to be: runahead is only used to optimize performance, and all results computed during runahead mode are discarded. Perfect tracking is impossible if runahead was initiated by an instruction cache miss, if an instruction cache miss occurred during runahead, if a load depends on a store with an INV address (assuming hardware is present to allow store-to-load forwarding during runahead), or if a branch outcome during runahead depends on an INV register.
Leaving runahead
The register file state is restored from the checkpoint and the processor is redirected to the original faulting fetch address when the fetch that initiated runahead mode has been serviced.
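The exit sequence can be sketched as follows, using a flash-copy style checkpoint for simplicity. The `Core` class and its fields are hypothetical, and clearing every INV bit on exit is an assumption of this sketch.

```python
class Core:
    """Toy core that checkpoints by copying the register file."""

    def __init__(self, num_regs=4):
        self.rf = [0] * num_regs
        self.inv = [False] * num_regs
        self.brf = []  # backup copy taken on entry
        self.pc = 0
        self.resume_pc = 0
        self.runahead = False

    def enter_runahead(self, miss_pc):
        self.brf = list(self.rf)
        self.resume_pc = miss_pc
        self.runahead = True

    def leave_runahead(self):
        """Discard runahead results and restart at the access that missed."""
        self.rf = list(self.brf)
        self.inv = [False] * len(self.rf)  # assumption: INV bits are cleared
        self.pc = self.resume_pc
        self.runahead = False


core = Core()
core.rf[1] = 10
core.pc = 0x100
core.enter_runahead(miss_pc=0x100)
# Speculative work during runahead modifies registers and the PC.
core.rf[1] = 999
core.inv[2] = True
core.pc = 0x180
core.leave_runahead()  # the miss has been serviced
```

After `leave_runahead()`, the register values and program counter match their state at the moment of the miss, so the instruction that missed is executed again, this time with its data available.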
Register file checkpoint options
The most obvious method of checkpointing the register file (RF) is to perform a flash copy to a shadow register file, or backup register file (BRF), when the processor enters runahead mode, then perform a flash copy from the BRF to the RF when normal operation resumes. There are simpler options available.
One way to eliminate the flash copy operations is to write to both the BRF and RF during normal operation, read from only the RF during normal operation, and read/write only the BRF during runahead mode.
An even more aggressive approach is to eliminate the BRF entirely. Checkpointing is accomplished by disabling register file writes, so values modified during runahead mode can only be provided by the forwarding paths.
See also
- Rock processor
- Hardware scout