Multithreading (computer hardware)

From Wikipedia, the free encyclopedia

This article may require cleanup to meet Wikipedia's quality standards.
Please improve this article if you can. (January 2008)

This article describes hardware support for multithreading. For threads in software, see Thread (computer science).

Multithreading computers have hardware support to efficiently execute multiple threads.

1 Overview
2 Block multi-threading
3 Interleaved multi-threading
4 Simultaneous multi-threading
5 Implementation specifics
6 See also

Overview

The multithreading paradigm has become more popular since the late 1990s, as efforts to further exploit instruction-level parallelism have stalled. This allowed the concept of throughput computing to re-emerge to prominence from the more specialized field of transaction processing:

Even though it is very difficult to further speed up a single thread or single program, most computer systems are actually multitasking among multiple threads or programs.
Techniques that speed up the overall throughput of all these tasks would therefore be a meaningful performance gain.

The two major techniques for throughput computing are multiprocessing and multithreading.

Criticisms of multithreading include:

Multiple threads can interfere with each other when sharing hardware resources such as caches or translation lookaside buffers (TLBs).
The execution time of a single thread is not improved and can even be degraded.
Hardware support for multithreading is more visible to software, and therefore requires more changes to both application programs and operating systems than multiprocessing does.

Hardware techniques used to support multithreading often parallel the software techniques used for multitasking of computer programs.

Block multi-threading

Concept

The simplest type of multithreading is one in which a thread runs until it is blocked by an event that would normally create a long-latency stall. Such a stall might be a cache miss that has to access off-chip memory, which might take hundreds of CPU cycles for the data to return. Instead of waiting for the stall to resolve, a threaded processor switches execution to another thread that is ready to run. Only when the data for the previous thread has arrived is the previous thread placed back on the list of ready-to-run threads.

For example:

Cycle i : instruction j from thread A is issued
Cycle i+1: instruction j+1 from thread A is issued
Cycle i+2: instruction j+2 from thread A is issued, load instruction which misses in all caches
Cycle i+3: thread scheduler invoked, switches to thread B
Cycle i+4: instruction k from thread B is issued
Cycle i+5: instruction k+1 from thread B is issued

Conceptually, it is similar to the cooperative multitasking used in real-time operating systems, in which tasks voluntarily give up execution time when they need to wait for some type of event.
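
The switch-on-miss behavior traced in the example above can be sketched as a small cycle-level simulation in Python. It is a hypothetical model rather than any real processor's design: the function name simulate_block_mt, the round-robin choice of the next thread, and the fixed MISS_LATENCY of 100 cycles are all assumptions made for illustration, and the scheduler is assumed to consume exactly one cycle per switch, as in cycle i+3 of the example.

```python
MISS_LATENCY = 100  # assumed off-chip memory latency in cycles (hypothetical)


def simulate_block_mt(threads, cycles):
    """Cycle-level sketch of block (coarse-grained) multithreading.

    `threads` maps a thread name to a list of (instruction, misses_all_caches)
    pairs. Returns one trace line per simulated cycle.
    """
    names = list(threads)
    pc = {t: 0 for t in names}        # index of each thread's next instruction
    ready_at = {t: 0 for t in names}  # first cycle a stalled thread may run again
    current = names[0]
    trace = []

    def runnable(t, cycle):
        return ready_at[t] <= cycle and pc[t] < len(threads[t])

    for cycle in range(cycles):
        if runnable(current, cycle):
            instr, misses = threads[current][pc[current]]
            pc[current] += 1
            trace.append(f"instruction {instr} from thread {current} is issued")
            if misses:
                # Long-latency stall: the thread is blocked until its data returns.
                ready_at[current] = cycle + MISS_LATENCY
            continue
        # Current thread is blocked or finished: the thread scheduler spends
        # this cycle switching to the next ready thread in round-robin order.
        start = names.index(current)
        order = names[start + 1:] + names[:start + 1]
        nxt = next((t for t in order if runnable(t, cycle + 1)), None)
        if nxt is None:
            trace.append("no thread is ready; the pipeline idles")
        else:
            current = nxt
            trace.append(f"thread scheduler invoked, switches to thread {nxt}")
    return trace


trace = simulate_block_mt(
    {
        "A": [("j", False), ("j+1", False), ("j+2", True)],  # j+2 misses in all caches
        "B": [("k", False), ("k+1", False)],
    },
    cycles=6,
)
for i, line in enumerate(trace):
    print(f"Cycle i+{i}: {line}")
```

Run as shown, it prints the six cycles of the example trace. Thread A only becomes runnable again MISS_LATENCY cycles after its miss, which models placing it back on the ready-to-run list once its data has arrived.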

Terminology

This type of multithreading is known as block, cooperative, or coarse-grained multithreading.

Hardware cost

The goal of multithreading hardware support is to allow quick switching between a blocked thread and another thread that is ready to run. To achieve this, the hardware replicates the program-visible registers as well as some processor control registers (such as the program counter). Switching from one thread to another then means the hardware switches from using one register set to another.

Such additional hardware has these benefits:

The thread switch can be done in one CPU cycle.
It appears to each thread that it is executing alone and not sharing any hardware resources with other threads. This minimizes the software changes needed within the application as well as the operating system to support multithreading.

In order to switch efficiently between active threads, each active thread needs to have its own register set. For example, to quickly switch between two threads, the register hardware needs to be instantiated twice.
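
Register replication can be modeled with one register bank per hardware thread and an index selecting the active bank. The class and method names below, and the default of 32 registers, are hypothetical; the sketch only illustrates that a thread switch changes which register set is in use instead of saving and restoring registers through memory, and that each thread sees its own private copy of every register.

```python
class MultiContextRegisterFile:
    """Hypothetical model of per-thread register replication."""

    def __init__(self, num_threads, num_regs=32):
        # One copy of the program-visible registers per hardware thread.
        self.banks = [[0] * num_regs for _ in range(num_threads)]
        # Control registers such as the program counter are replicated too.
        self.pcs = [0] * num_threads
        self.active = 0  # index of the register set currently in use

    def switch_to(self, thread_id):
        # The whole thread switch: select another register set.
        # Nothing is copied to or from memory.
        self.active = thread_id

    def read(self, reg):
        return self.banks[self.active][reg]

    def write(self, reg, value):
        self.banks[self.active][reg] = value


rf = MultiContextRegisterFile(num_threads=2)
rf.write(1, 42)    # thread 0 writes r1
rf.switch_to(1)
rf.write(1, 7)     # thread 1 writes its own, separate r1
rf.switch_to(0)
print(rf.read(1))  # thread 0 still sees 42
```

Because every thread has its own bank, doubling the number of threads doubles the register storage, which is the hardware cost described above.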

Examples

Many families of microcontrollers and embedded processors have multiple register banks to allow quick context switching for interrupts. Such schemes can be considered a type of block multithreading among the user program thread and the interrupt threads.
Intel Super-threading
Intel Itanium 2

Interleaved multi-threading

See article: barrel processor

Concept

A higher-performance type of multithreading is one in which the processor switches threads every CPU cycle. For example:

Cycle i : an instruction from thread A is issued
Cycle i+1: an instruction from thread B is issued
Cycle i+2: an instruction from thread C is issued

The purpose of this type of multithreading is to remove all data dependency stalls from the execution pipeline. Because threads are relatively independent of one another, there is less chance that an instruction in one pipeline stage needs an output from an older instruction still in the pipeline.

Conceptually, it is similar to the pre-emptive multitasking used in operating systems. One can make the analogy that the time slice given to each active thread is one CPU cycle.
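
The claim that interleaving removes data dependency stalls can be checked with a simple model of a barrel processor that issues from threads in strict round-robin order. The helper functions are hypothetical, and PIPELINE_DEPTH encodes an assumed simplification: an instruction's result can be used by any instruction issued PIPELINE_DEPTH or more cycles later, ignoring forwarding and variable-latency operations. With at least that many active threads, two consecutive instructions from the same thread are never close enough to stall on each other.

```python
PIPELINE_DEPTH = 5  # assumed: results usable PIPELINE_DEPTH cycles after issue


def interleaved_schedule(thread_names, cycles):
    """Fine-grained (barrel) issue: a different thread every cycle, round-robin."""
    return [thread_names[c % len(thread_names)] for c in range(cycles)]


def min_same_thread_gap(schedule):
    """Smallest number of cycles between two instructions of the same thread."""
    last_seen = {}
    gap = None
    for cycle, thread in enumerate(schedule):
        if thread in last_seen:
            d = cycle - last_seen[thread]
            gap = d if gap is None else min(gap, d)
        last_seen[thread] = cycle
    return gap


schedule = interleaved_schedule(["A", "B", "C", "D", "E"], cycles=20)
print(schedule[:3])  # ['A', 'B', 'C'], one thread per cycle
gap = min_same_thread_gap(schedule)
# With as many threads as pipeline stages, an instruction's predecessor from
# the same thread has already produced its result when the instruction issues.
print(gap >= PIPELINE_DEPTH)  # True
```

With fewer threads than PIPELINE_DEPTH (for example two), the gap shrinks to the number of threads and dependent instructions could again stall, which is why barrel processors are typically paired with enough thread contexts to cover the pipeline.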

Terminology

This type of multithreading was first called barrel processing, in which the staves of a barrel represent the pipeline stages and their executing threads. Interleaved, pre-emptive, fine-grained, and time-sliced multithreading are more modern terms.

Hardware costs

In addition to the hardware costs discussed in the Block type of multithreading, interleaved multithreading has an additional cost of each pipeline stage tracking the thread ID of the instruction it is processing. Also, since there are more threads being executed concurrently in the pipeline, shared resources such as caches and TLBs need to be larger to avoid thrashing between the different threads.
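
The need for per-stage thread ID tags shows up at writeback: by the time an instruction reaches the last stage, the processor has already issued instructions from other threads, so the tag is what tells the hardware whose register set to update. The sketch below is a hypothetical model with an assumed four-stage pipeline; each instruction carries a precomputed result, so only the routing of results to the correct thread's registers is modeled.

```python
from collections import deque

NUM_STAGES = 4  # assumed pipeline depth, e.g. fetch, decode, execute, writeback


def run_tagged_pipeline(issue_stream, num_threads):
    """Each pipeline slot holds (thread_id, dest_reg, value).

    issue_stream lists instructions in issue order; returns one register
    set (a dict) per thread after the pipeline drains.
    """
    regs = [dict() for _ in range(num_threads)]  # one register set per thread
    pipeline = deque([None] * NUM_STAGES, maxlen=NUM_STAGES)
    # Extra None bubbles at the end let the last instructions drain out.
    for entry in issue_stream + [None] * NUM_STAGES:
        leaving = pipeline[-1]  # instruction in the final (writeback) stage
        if leaving is not None:
            tid, reg, value = leaving
            regs[tid][reg] = value  # the thread ID tag selects the register set
        pipeline.appendleft(entry)  # every instruction advances one stage
    return regs


# Interleaved issue: threads 0, 1, 2 take turns, all writing a register named r1.
regs = run_tagged_pipeline(
    [(0, "r1", 10), (1, "r1", 20), (2, "r1", 30), (0, "r2", 11)],
    num_threads=3,
)
print(regs)  # [{'r1': 10, 'r2': 11}, {'r1': 20}, {'r1': 30}]
```

Without the tag, the three writes to r1 would be indistinguishable at writeback and would overwrite one another.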

Examples

Denelcor Heterogeneous Element Processor
Sun Microsystems UltraSPARC T1
Lexra NetVortex
MIPS 34K core which implements the Multi-Threaded ASE
Raza Microelectronics Inc XLR

Simultaneous multi-threading

See main article Simultaneous multithreading

Concept

The most advanced type of multithreading applies to superscalar processors. A normal superscalar processor issues multiple instructions from a single thread every CPU cycle. In simultaneous multithreading (SMT), the superscalar processor can issue instructions from multiple threads every CPU cycle. Recognizing that any single thread has a limited amount of instruction-level parallelism, this type of multithreading tries to exploit the parallelism available across multiple threads to reduce the waste associated with unused issue slots.

For example:

Cycle i : instructions j and j+1 from thread A; instruction k from thread B all simultaneously issued
Cycle i+1: instruction j+2 from thread A; instruction k+1 from thread B; instruction m from thread C all simultaneously issued
Cycle i+2: instruction j+3 from thread A; instructions m+1 and m+2 from thread C all simultaneously issued
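
The issue-slot argument can be made concrete by replaying the three cycles of the example above on an assumed 3-wide superscalar core, once issuing only from thread A (a conventional superscalar) and once filling slots from any thread (SMT). The per-cycle lists of ready instructions are taken from the example and are assumed to be free of dependencies within each cycle, and thread A's ready instructions are assumed to be the same in both runs. The function issue_cycle and its first-thread-first slot-filling policy are hypothetical simplifications, not how any particular processor arbitrates between threads.

```python
ISSUE_WIDTH = 3  # assumed number of issue slots per cycle


def issue_cycle(ready, width, smt=True):
    """Fill up to `width` issue slots for one cycle.

    `ready` maps thread name -> instructions ready to issue this cycle.
    With smt=False only the first (running) thread may issue.
    """
    slots = []
    for thread, instrs in ready.items():
        for instr in instrs:
            if len(slots) < width:
                slots.append((thread, instr))
        if not smt:
            break  # conventional superscalar: a single thread per cycle
    return slots


# Ready instructions per cycle, taken from the example above.
ready_lists = [
    {"A": ["j", "j+1"], "B": ["k"]},
    {"A": ["j+2"], "B": ["k+1"], "C": ["m"]},
    {"A": ["j+3"], "C": ["m+1", "m+2"]},
]
total = ISSUE_WIDTH * len(ready_lists)
smt_used = sum(len(issue_cycle(r, ISSUE_WIDTH)) for r in ready_lists)
single_used = sum(len(issue_cycle(r, ISSUE_WIDTH, smt=False)) for r in ready_lists)
print(f"SMT fills {smt_used}/{total} slots; single thread fills {single_used}/{total}")
```

For these inputs the SMT core fills all nine issue slots while the single-thread core fills only four; the five empty slots are the waste that SMT tries to recover.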

Terminology

To distinguish the other flavors of multithreading from SMT, the term temporal multithreading is used to denote those in which instructions from only one thread can be issued at a time.

Hardware costs

In addition to the hardware costs discussed for interleaved multithreading, SMT has the additional cost of each pipeline stage tracking the thread ID of every instruction it holds, since a single stage may be processing instructions from several threads in the same cycle. Again, shared resources such as caches and TLBs have to be sized for the large number of active threads.

Examples

Alpha AXP EV8 (uncompleted)
Intel Hyperthreading
IBM POWER5
Power Processing Element within the Cell microprocessor
Sun Microsystems UltraSPARC T2

Implementation specifics

A major area of research is the thread scheduler, which must quickly choose, from the list of ready-to-run threads, the one to execute next, as well as maintain the ready-to-run and stalled thread lists. An important sub-topic is the choice of thread priority scheme used by the scheduler. The thread scheduler might be implemented entirely in software, entirely in hardware, or as a hardware/software combination.
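
A software model of the scheduler bookkeeping described above might keep a priority-ordered ready-to-run list and a set of stalled threads. The class below is a hypothetical sketch using one assumed policy, fixed priorities (lower number means higher priority) with first-come-first-served tie-breaking; real schedulers may instead use round-robin, dynamic priorities, or hardware arbitration.

```python
import heapq


class ThreadScheduler:
    """Hypothetical scheduler keeping ready-to-run and stalled thread lists."""

    def __init__(self):
        self._ready = []    # heap of (priority, arrival_seq, thread)
        self._stalled = {}  # thread -> priority, while waiting on an event
        self._seq = 0       # arrival counter used to break priority ties

    def make_ready(self, thread, priority):
        heapq.heappush(self._ready, (priority, self._seq, thread))
        self._seq += 1

    def stall(self, thread, priority):
        # e.g. on a cache miss: park the thread until its event completes
        self._stalled[thread] = priority

    def wake(self, thread):
        # the event completed: move the thread back to the ready-to-run list
        self.make_ready(thread, self._stalled.pop(thread))

    def pick_next(self):
        # highest-priority ready thread, or None if every thread is stalled
        return heapq.heappop(self._ready)[2] if self._ready else None


s = ThreadScheduler()
s.make_ready("A", priority=1)
s.make_ready("B", priority=0)
first = s.pick_next()   # "B" has the higher priority
s.stall(first, priority=0)
second = s.pick_next()  # "A" runs while "B" is stalled
s.wake("B")
third = s.pick_next()   # "B" is ready again and is chosen
print(first, second, third)
```

A hardware scheduler would implement the same lists as small tables or bit vectors so that the selection fits in a single cycle; the heap here is only a convenient software stand-in.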

Another area of research is what types of events should cause a thread switch: cache misses, inter-thread communication, DMA completion, etc.

If the multithreading scheme replicates all software-visible state, including privileged control registers, TLBs, etc., then it enables a virtual machine to be created for each thread. This allows each thread to run its own operating system on the same processor. On the other hand, if only user-mode state is saved, less hardware is required, which allows more threads to be active at one time for the same die area or cost.

See also

Thread (computer science)
Simultaneous multithreading, SMT
Temporal multithreading, which covers both block and interleaved multi-threading
