Floating-point unit

A floating-point unit (FPU, colloquially a math coprocessor) is a part of a computer system specially designed to carry out operations on floating-point numbers. Typical operations are addition, subtraction, multiplication, division, and square root. Some systems (particularly older, microcode-based architectures) can also perform various transcendental functions such as exponential or trigonometric calculations, though in most modern processors these are done with software library routines.

In general-purpose computer architectures, one or more FPUs may be integrated as execution units within the central processing unit; however, many embedded processors do not have hardware support for floating-point operations.

When a CPU is executing a program that calls for a floating-point operation, there are three ways to carry it out:

A floating-point unit emulator (a floating-point library)
An add-on FPU
An integrated FPU

Some systems implemented floating point via a coprocessor rather than as an integrated unit. This could be a single integrated circuit, an entire circuit board, or a cabinet. Where floating-point hardware has not been provided, floating-point calculations are done in software, which takes more processor time but avoids the cost of the extra hardware. For a particular computer architecture, the floating-point unit instructions may be emulated by a library of software functions; this may permit the same object code to run on systems with or without floating-point hardware. Emulation can be implemented on any of several levels: in the CPU as microcode (not a common practice), as an operating system function, or in user-space code. When only integer functionality is available, CORDIC methods are commonly used to evaluate transcendental functions.
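As an illustration of the integer-only approach, the following is a minimal sketch of CORDIC in rotation mode, computing sine and cosine with only additions, subtractions, shifts, and a small lookup table. The Q16 fixed-point format, the iteration count, and the names (cordic_sin_cos, ATAN_TABLE, INV_GAIN) are assumptions chosen for this example, not taken from any particular library. The math module is used only to build the tables, which on a real integer-only target would be precomputed constants.

```python
import math

FRAC_BITS = 16            # assumed Q16 format: real value = integer / 2**16
ONE = 1 << FRAC_BITS
ITERATIONS = 16

# Table of atan(2**-i) in Q16. On real hardware these would be stored
# constants; floating point is used here only to generate them.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# Each micro-rotation stretches the vector; start from 1/gain to compensate.
_gain = 1.0
for _i in range(ITERATIONS):
    _gain *= math.sqrt(1.0 + 2.0 ** (-2 * _i))
INV_GAIN = round(ONE / _gain)


def cordic_sin_cos(angle_fixed):
    """Return (sin, cos) in Q16 for a Q16 angle in radians, |angle| <= pi/2."""
    x, y, z = INV_GAIN, 0, angle_fixed
    for i in range(ITERATIONS):
        if z >= 0:
            # Rotate counter-clockwise by atan(2**-i) using shifts only.
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            # Rotate clockwise by the same amount.
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return y, x
```

For example, cordic_sin_cos(round(0.5 * ONE)) returns two Q16 integers that, divided by ONE, approximate sin(0.5) and cos(0.5) to roughly three or four decimal places with these settings.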

In most modern computer architectures, there is some division of floating-point operations from integer operations. This division varies significantly by architecture; some, like the Intel x86, have dedicated floating-point registers, while others go as far as using independent clocking schemes.

Floating-point operations are often pipelined. In earlier superscalar architectures without general out-of-order execution, floating-point operations were sometimes pipelined separately from integer operations. Since the early 1990s, many microprocessors for desktops and servers have more than one FPU.

The modular Bulldozer microarchitecture uses a special FPU named FlexFPU, which uses simultaneous multithreading. Each physical integer core, two per module, is single-threaded, in contrast with Intel's Hyper-Threading, where two virtual simultaneous threads share the resources of a single physical core.^[1]^[2]

Floating-point library

Some floating-point hardware supports only the simplest operations: addition, subtraction, and multiplication. But even the most complex floating-point hardware has a finite number of operations it can support; for example, none of them directly support arbitrary-precision arithmetic.

When a CPU is executing a program that calls for a floating-point operation that is not directly supported by the hardware, the CPU uses a series of simpler floating-point operations. In systems without any floating-point hardware, the CPU emulates it using a series of simpler fixed-point arithmetic operations that run on the integer arithmetic logic unit.

The software that lists the necessary series of operations to emulate floating-point operations is often packaged in a floating-point library.
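To make the idea concrete, here is a minimal sketch of how such a library routine might multiply two IEEE 754 single-precision values using only integer operations. The function name soft_mul and the helper names are hypothetical. The sketch assumes normal, finite, nonzero inputs and an in-range result; a real library would also handle zeros, subnormals, infinities, NaNs, overflow, and underflow. The struct module is used only to convert between Python floats and their 32-bit patterns.

```python
import struct


def float_to_bits(f):
    """Return the binary32 bit pattern of f as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", f))[0]


def bits_to_float(b):
    """Interpret an unsigned 32-bit integer as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def soft_mul(a_bits, b_bits):
    """Multiply two binary32 bit patterns with integer operations only.

    Assumption: both inputs are normal finite numbers and the result
    neither overflows nor underflows.
    """
    sign = (a_bits ^ b_bits) & 0x80000000
    exp_a = (a_bits >> 23) & 0xFF
    exp_b = (b_bits >> 23) & 0xFF
    man_a = (a_bits & 0x7FFFFF) | 0x800000   # restore the implicit leading 1
    man_b = (b_bits & 0x7FFFFF) | 0x800000

    exp = exp_a + exp_b - 127                # remove the doubled bias
    product = man_a * man_b                  # 48-bit product, in [2**46, 2**48)

    # Normalize so that the leading 1 sits at bit 47.
    if product & (1 << 47):
        exp += 1
    else:
        product <<= 1

    # Keep the top 24 bits and round to nearest, ties to even.
    mantissa = product >> 24
    remainder = product & 0xFFFFFF
    half = 0x800000
    if remainder > half or (remainder == half and (mantissa & 1)):
        mantissa += 1
        if mantissa == 1 << 24:              # rounding carried out of 24 bits
            mantissa >>= 1
            exp += 1

    return sign | (exp << 23) | (mantissa & 0x7FFFFF)
```

For example, bits_to_float(soft_mul(float_to_bits(1.5), float_to_bits(2.5))) evaluates to 3.75.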

Integrated FPUs

In some cases, FPUs may be specialized, and divided between simpler floating-point operations (mainly addition and multiplication) and more complicated operations, like division. In some cases, only the simple operations may be implemented in hardware or microcode, while the more complex operations are implemented as software.
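One common way to build division from the simpler operations is Newton-Raphson iteration on the reciprocal, which needs only multiplication, subtraction, and exponent scaling. The sketch below is illustrative rather than a description of any specific FPU or library; the name soft_divide, the iteration count, and the use of math.frexp and math.ldexp to stand in for exponent manipulation are assumptions made for this example.

```python
import math

# Precomputed constants 48/17 and 32/17 for the usual linear initial guess
# of 1/m on the interval 0.5 <= m < 1.
GUESS_A = 2.823529411764706
GUESS_B = 1.8823529411764706


def soft_divide(n, d, iterations=5):
    """Approximate n / d without using a division operation.

    Assumption: d is finite and nonzero.
    """
    # Split |d| into mantissa and exponent: |d| = m * 2**e with 0.5 <= m < 1.
    m, e = math.frexp(abs(d))

    # Initial estimate of 1/m, accurate to within 1/17.
    x = GUESS_A - GUESS_B * m

    # Each Newton-Raphson step x = x * (2 - m * x) roughly doubles the
    # number of correct bits.
    for _ in range(iterations):
        x = x * (2.0 - m * x)

    # Undo the scaling: 1/|d| = (1/m) * 2**-e, then restore the sign.
    recip = math.ldexp(x, -e)
    if d < 0:
        recip = -recip
    return n * recip
```

Because the result is formed as n times an approximate reciprocal, it may differ from a correctly rounded quotient in the last bit; hardware and library implementations typically add a final correction step.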

In some current architectures, the FPU functionality is combined with units that perform SIMD computation; an example of this is the augmentation of the x87 instruction set with the SSE instruction set in the x86-64 architecture used in newer Intel and AMD processors.

Add-on FPUs

In the 1980s, it was common in IBM PC/compatible microcomputers for the FPU to be entirely separate from the CPU, and typically sold as an optional add-on. It would only be purchased if needed to speed up or enable math-intensive programs.

The IBM PC, XT, and most compatibles based on the 8088 or 8086 had a socket for the optional 8087 coprocessor. The AT and 80286-based systems were generally socketed for the 80287, and 80386/80386SX-based machines for the 80387 and 80387SX respectively, although early ones were socketed for the 80287, since the 80387 did not exist yet. Other companies, including Cyrix and Weitek, manufactured coprocessors for the Intel x86 series.

The 68881 and 68882 coprocessors were available for the Motorola 68000 family. These were common in Motorola 68020/68030-based workstations like the Sun 3 series. They were also commonly added to higher-end models of the Apple Macintosh and Commodore Amiga series, but unlike in IBM PC-compatible systems, sockets for adding the coprocessor were not as common in lower-end systems.

There are also add-on FPU coprocessor units for microcontroller units (MCUs/µCs) and single-board computers (SBCs), which serve to provide floating-point arithmetic capability. These add-on FPUs are host-processor-independent, possess their own programming requirements (operations, instruction sets, etc.), and are often provided with their own integrated development environments (IDEs).

References

Raymond Filiatreault (2003). "SIMPLY FPU".

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
