Reconfigurable computing
From Wikipedia, the free encyclopedia
Reconfigurable computing is computer processing with highly flexible computing fabrics. The principal difference from ordinary microprocessors is the ability to make substantial changes to the datapath itself, in addition to the control flow.
History and characteristics
The concept of reconfigurable computing has existed since the 1960s, when Gerald Estrin's landmark paper proposed a computer consisting of a standard processor and an array of “reconfigurable” hardware. The main processor would control the behavior of the reconfigurable hardware, which would be tailored to perform a specific task, such as image processing or pattern matching, as quickly as a dedicated piece of hardware. Once the task was done, the hardware could be adjusted to perform another task. The result was a hybrid computer structure combining the flexibility of software with the speed of hardware; unfortunately, the idea was far ahead of the electronic technology available at the time.
In the last decade there has been a renaissance in this area of research, with many reconfigurable architectures proposed in both industry and academia, such as Matrix, Garp, Elixent, XPP, Silicon Hive, Montium, Pleiades, Morphosys and PiCoGA. Such designs became feasible thanks to the relentless progress of silicon technology, which allowed complex designs to be implemented on a single chip. The world's first commercial reconfigurable computer, the Algotronix CHS2X4, was completed in 1991. It was not a commercial success, but it was promising enough that Xilinx Inc., the inventor of the field-programmable gate array (FPGA), purchased the technology and hired the Algotronix staff [1].
Currently a number of vendors offer commercially available reconfigurable computers aimed at the high-performance computing market, including Cray, SGI and SRC Computers, Inc. Cray (not affiliated with SRC Computers) acquired OctigaBay and its reconfigurable computing platform, which Cray marketed as the XD1 until recently. SGI sells the RASC platform with its Altix series of supercomputers [2]. SRC Computers, Inc. has developed a family of reconfigurable computers based on its IMPLICIT+EXPLICIT architecture and MAP processor.
All of these offerings are hybrid "Estrin" computers, with traditional microprocessors coupled to user-programmable FPGAs. The systems can be used as traditional cluster computers without using the FPGAs (in fact, the FPGAs are optional on the XD1 and the SGI RASC). FPGA designs for the XD1 and SGI systems are written either in traditional hardware description languages (HDLs) or with high-level tools, such as the graphical tool Starbridge Viva or C-like languages such as Handel-C from Celoxica, Impulse-C from Impulse Accelerated Technologies or Mitrion-C from Mitrionics. According to the XD1 programming guide, "Development of the raw FPGA logic file is a complex process that requires specialized knowledge and tools."
SRC has developed a "Carte" compiler that takes programs written in existing high-level languages such as C or Fortran and, with a few modifications, compiles them for execution on both the FPGA and the microprocessor. According to SRC literature, "...application algorithms are written in a high-level language such as C or Fortran. Carte extracts the maximum parallelism from the code and generates pipelined hardware logic that is instantiated in the MAP. It also generates all the required interface code to manage the movement of data to and from the MAP and to coordinate the microprocessor with the logic running in the MAP." (SRC also allows a traditional HDL flow to be used.) The XD1 communicates between microprocessor and FPGA over its RapidArray interconnection network. The SRC systems communicate via the SNAP memory interface and/or the (optional) Hi-Bar switch.

Classifications of reconfigurable architectures are still being developed and refined as new architectures appear; no unifying taxonomy has been suggested to date. However, several recurring parameters can be used to classify these systems.
Granularity
The granularity of reconfigurable logic is defined as the size of the smallest functional unit (for example, a configurable logic block, CLB) that is addressed by the mapping tools. Fine-grained (low-granularity) logic often offers greater flexibility when implementing algorithms in hardware. However, this flexibility carries a penalty in increased power, area and delay, because more routing is required per computation. Fine-grained architectures operate at the level of bit manipulation, whilst coarse-grained processing elements (reconfigurable datapath units, rDPUs) are better optimised for standard datapath applications. One drawback of coarse-grained architectures is that they lose some utilisation and performance when they must perform computations narrower than their granularity; for example, a one-bit addition on a four-bit-wide functional unit wastes three bits. This problem can be addressed by placing a coarse-grained array (rDPA) and an FPGA on the same chip.
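The utilisation penalty described above can be made concrete with a small calculation. The Python sketch below assumes, for simplicity, that an operand must occupy whole functional units of a fixed width, and it ignores routing and control overheads; the function name `unit_utilisation` is hypothetical.

```python
import math


def unit_utilisation(operand_bits: int, unit_bits: int) -> tuple[int, int, float]:
    """Map an operand onto fixed-width functional units.

    Returns (units used, wasted bits, fraction of provided bits actually used).
    Assumes an operand always occupies whole units (a simplification)."""
    if operand_bits <= 0 or unit_bits <= 0:
        raise ValueError("bit widths must be positive")
    units = math.ceil(operand_bits / unit_bits)
    provided = units * unit_bits
    return units, provided - operand_bits, operand_bits / provided


print(unit_utilisation(1, 4))    # (1, 3, 0.25): the one-bit add from the text
print(unit_utilisation(16, 4))   # (4, 0, 1.0): operand matches the granularity
print(unit_utilisation(16, 1))   # (16, 0, 1.0): fine-grained, but 16 units to connect
```

A one-bit fabric (the fine-grained extreme) never wastes bits, but it needs one unit per bit, and every one of those units has to be wired together, which is the routing cost mentioned above.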
Coarse-grained architectures (rDPAs) are intended for implementing algorithms that need word-width datapaths (rDPUs). Because their functional blocks are optimised for large computations, they perform these operations faster and more power-efficiently than a collection of smaller functional units joined by interconnect: the connecting wires are shorter, which means less wire capacitance and hence faster, lower-power designs. A potential drawback of larger computational blocks is that resources are used inefficiently when operand sizes do not match the block width. Often the type of applications to be run is known in advance, allowing the logic, memory and routing resources to be tailored (for instance, see KressArray Xplorer) to enhance the performance of the device whilst still providing some flexibility for future adaptation. Examples are domain-specific arrays, which give up some flexibility to achieve better power, area and throughput than their more generic, finer-grained FPGA cousins.
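The link between shorter wires and lower power can be seen from the standard first-order model of dynamic switching power in CMOS circuits. This model is general and not specific to any reconfigurable architecture; here alpha is the switching activity, C the switched capacitance (gate plus wire), V_DD the supply voltage and f the clock frequency:

```latex
P_{\mathrm{dyn}} = \alpha \, C \, V_{DD}^{2} \, f,
\qquad C = C_{\mathrm{gate}} + C_{\mathrm{wire}}
```

Under this model, at a fixed voltage and frequency, reducing the wire capacitance reduces dynamic power in proportion, which is the effect the paragraph above attributes to the shorter connections inside coarse-grained blocks.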
Rate of reconfiguration
Configuration of these reconfigurable systems can happen at deployment time, between execution phases or during execution. In a typical reconfigurable system, a bit stream is used to program the device at deployment time. Fine-grained systems by their nature require more configuration time than coarse-grained architectures, because more elements must be addressed and programmed. Coarse-grained architectures therefore potentially need less energy for configuration, as less information is transferred and used. Intuitively, the slower the rate of reconfiguration, the smaller the energy consumption, since the energy cost of each reconfiguration is amortised over a longer period of time. Partial reconfiguration aims to allow part of the device to be reprogrammed while another part is still performing active computation. It also permits smaller bit streams, so no energy is wasted transmitting redundant information. The bit stream can also be compressed, but careful analysis is needed to ensure that the energy saved by transferring a smaller bit stream is not outweighed by the computation needed to decompress it.
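The three energy arguments in this paragraph (amortisation, partial bit streams and compression) can be expressed as a first-order model. The Python sketch below is illustrative only: it assumes each reconfiguration costs a fixed amount of energy, that transfer energy grows linearly with bit stream size, and that the device is configured in fixed-size "frames"; all function names, units and numbers are hypothetical.

```python
def energy_per_run(e_compute: float, e_reconfig: float, runs_per_config: int) -> float:
    """Average energy per run when one reconfiguration is shared by
    `runs_per_config` executions (the reconfiguration cost is amortised)."""
    if runs_per_config < 1:
        raise ValueError("runs_per_config must be at least 1")
    return e_compute + e_reconfig / runs_per_config


def partial_bitstream_bits(frames_changed: int, bits_per_frame: int) -> int:
    """Size of a partial bit stream that rewrites only the changed frames
    (assumes the device is configured in fixed-size frames)."""
    return frames_changed * bits_per_frame


def compression_pays_off(bits_raw: int, bits_compressed: int,
                         e_per_bit: float, e_decompress: float) -> bool:
    """True if the transfer energy saved by a compressed bit stream exceeds
    the energy spent decompressing it."""
    saved = (bits_raw - bits_compressed) * e_per_bit
    return saved > e_decompress


# Slower reconfiguration (more runs per configuration) lowers the average.
print(energy_per_run(1.0, 64.0, 16))   # 5.0
print(energy_per_run(1.0, 64.0, 64))   # 2.0
# Rewriting only 100 frames instead of the whole device.
print(partial_bitstream_bits(100, 3_000))   # 300000
# Saving 600,000 bits of transfer outweighs this (hypothetical) decompression cost.
print(compression_pays_off(1_000_000, 400_000, 1e-9, 1e-4))  # True
```

Real devices add fixed per-reconfiguration overheads and device-specific frame layouts, so the model only captures the direction of each trade-off, not absolute figures.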
Host coupling
Often the reconfigurable array is used as a processing accelerator attached to a host processor. The level of coupling determines the type of data transfers, latency, power, throughput and overheads involved in using the reconfigurable logic. Some of the most intuitive designs use a peripheral bus to provide a coprocessor-like arrangement for the reconfigurable array. However, there have also been implementations in which the reconfigurable fabric is much closer to the processor; some are even integrated into the processor's datapath and use the processor registers. The host processor performs the control functions, configures the logic, schedules data and provides external interfacing.
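How the level of coupling affects whether offloading is worthwhile can be sketched with a simple timing model. The Python below assumes a task is worth moving to the array only when transfer latency, transfer time and array compute time together beat the host's own compute time; the `Coupling` class, its parameters and the example figures are all hypothetical and are not measurements of any real system.

```python
from dataclasses import dataclass


@dataclass
class Coupling:
    """Hypothetical parameters describing how the array is attached to the host."""
    name: str
    latency_s: float            # fixed overhead per transfer
    bandwidth_bytes_s: float    # sustained host <-> array bandwidth


def offload_time(c: Coupling, bytes_moved: int, accel_time_s: float) -> float:
    """Total time to run a task on the array: one transfer in and one out
    (bytes_moved counts both directions), plus the computation itself."""
    return 2 * c.latency_s + bytes_moved / c.bandwidth_bytes_s + accel_time_s


def should_offload(c: Coupling, bytes_moved: int,
                   accel_time_s: float, cpu_time_s: float) -> bool:
    """Offload only if the array, including transfer overheads, beats the host."""
    return offload_time(c, bytes_moved, accel_time_s) < cpu_time_s


peripheral_bus = Coupling("peripheral bus", latency_s=10e-6, bandwidth_bytes_s=1e9)
tightly_coupled = Coupling("in-datapath", latency_s=10e-9, bandwidth_bytes_s=10e9)

# Same task: 1 MB of data, 0.5 ms on the array versus 1 ms on the host.
for c in (peripheral_bus, tightly_coupled):
    print(c.name, should_offload(c, 1_000_000, 0.5e-3, 1e-3))
# peripheral bus False
# in-datapath True
```

With these assumed figures the same computation loses on the loosely coupled bus (about 1.52 ms in total) but wins when the fabric sits close to the processor (about 0.6 ms), which is why the coupling level shapes which workloads benefit.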
Routing/interconnects
The flexibility of reconfigurable devices comes mainly from their routing interconnect. One interconnect style, popularised by the FPGA vendors Xilinx and Altera, is the island-style layout, in which logic blocks are arranged in an array with vertical and horizontal routing channels between them. A layout with inadequate routing may suffer from poor flexibility and resource utilisation, and therefore offer limited performance. Conversely, providing too much interconnect requires more transistors than necessary, and thus more silicon area, longer wires and higher power consumption.
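The cost side of this trade-off can be illustrated by counting routing resources in an island-style array. The Python sketch below uses a deliberately simplified model (an assumption, not any vendor's actual architecture): an n x n grid of logic blocks surrounded by n + 1 horizontal and n + 1 vertical channels, single-length wire segments, a configurable number of tracks per channel, and one switch box at every channel crossing. The function name is hypothetical.

```python
def island_routing_resources(n: int, tracks_per_channel: int) -> dict[str, int]:
    """Count resources in an n x n island-style array under a simplified model:
    n + 1 horizontal and n + 1 vertical channels surround the logic blocks,
    every channel is cut into n single-length segments (one per block it passes),
    and a switch box sits at every channel crossing."""
    if n < 1 or tracks_per_channel < 1:
        raise ValueError("n and tracks_per_channel must be positive")
    channels = 2 * (n + 1)
    wire_segments = channels * n * tracks_per_channel
    return {
        "logic_blocks": n * n,
        "switch_boxes": (n + 1) ** 2,
        "wire_segments": wire_segments,
    }


# Doubling the channel width doubles the wiring for the same amount of logic.
print(island_routing_resources(10, 8))
# {'logic_blocks': 100, 'switch_boxes': 121, 'wire_segments': 1760}
print(island_routing_resources(10, 16)["wire_segments"])  # 3520
```

Each extra track adds wire segments and the programmable switches that connect them, all without adding logic, which is the silicon and power cost of over-provisioning described above.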
Tool flow
Generally, tools for configurable computing systems can be split into two parts: CAD tools for the reconfigurable array and compilation tools for the CPU. The front-end compiler is an integrated tool that generates a structural hardware representation, which is the input to the hardware design flow. Hardware design flows for reconfigurable architectures can be classified by the approach they adopt in the three main stages of the design process: technology mapping, placement and routing. Software frameworks differ in the abstraction level of the programming languages they accept.
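To show what the placement stage optimises, the Python sketch below scores a placement by half-perimeter wirelength (HPWL), a common estimate of routing cost, and improves it with a single pass of greedy pairwise swaps. The greedy pass is a toy stand-in chosen for brevity; production placers typically use more sophisticated methods such as simulated annealing. The block names, grid and nets are hypothetical.

```python
from itertools import combinations

Placement = dict[str, tuple[int, int]]   # block name -> (x, y) grid location
Net = list[str]                          # blocks connected by one signal


def hpwl(placement: Placement, nets: list[Net]) -> int:
    """Half-perimeter wirelength: for each net, half the perimeter of the
    bounding box around its blocks, summed over all nets."""
    total = 0
    for net in nets:
        xs = [placement[b][0] for b in net]
        ys = [placement[b][1] for b in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def greedy_swap(placement: Placement, nets: list[Net]) -> tuple[Placement, int]:
    """One pass of pairwise swaps, keeping each swap that lowers HPWL."""
    p = dict(placement)
    best = hpwl(p, nets)
    for a, b in combinations(list(p), 2):
        p[a], p[b] = p[b], p[a]
        cost = hpwl(p, nets)
        if cost < best:
            best = cost
        else:
            p[a], p[b] = p[b], p[a]  # undo a swap that did not help
    return p, best


placement = {"a": (0, 0), "b": (3, 3), "c": (1, 0), "d": (2, 3)}
nets = [["a", "b"], ["c", "d"]]
print(hpwl(placement, nets))            # 10
print(greedy_swap(placement, nets)[1])  # 2
```

Swapping `a` and `d` puts each connected pair next to each other, cutting the estimated wirelength from 10 to 2; routing would then have to realise those connections on the actual interconnect.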
Some types of reconfigurable computers are microcoded processors whose microcode is stored in RAM or EEPROM and can be changed on reboot or on the fly. This was possible with the AMD Am2900 series bit-slice processors (on reboot) and later with FPGAs (on the fly).
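The idea of a writable control store can be sketched in Python: each machine-level opcode expands into a sequence of micro-operations looked up in a table, so rewriting the table (the RAM or EEPROM in a real machine) changes what the same program does, or how it does it, without altering the datapath. The registers, opcode and micro-operations below are hypothetical and do not model the Am2900 or any real microcode format.

```python
from typing import Callable

Registers = dict[str, int]
MicroOp = Callable[[Registers], None]


def execute(program: list[str], control_store: dict[str, list[MicroOp]],
            regs: Registers) -> Registers:
    """Run each opcode by stepping through its micro-operation sequence."""
    for opcode in program:
        for micro_op in control_store[opcode]:
            micro_op(regs)
    return regs


# Micro-operations on a hypothetical 8-bit datapath with registers A and B.
def copy_a_to_b(r: Registers) -> None:
    r["B"] = r["A"]


def add_b_to_a(r: Registers) -> None:
    r["A"] = (r["A"] + r["B"]) & 0xFF


def shift_a_left(r: Registers) -> None:
    r["A"] = (r["A"] << 1) & 0xFF


control_store = {"DOUBLE": [copy_a_to_b, add_b_to_a]}
print(execute(["DOUBLE", "DOUBLE"], control_store, {"A": 3, "B": 0}))
# {'A': 12, 'B': 6}

# "Rewriting the RAM": the same program now runs on a different micro-sequence.
control_store["DOUBLE"] = [shift_a_left]
print(execute(["DOUBLE", "DOUBLE"], control_store, {"A": 3, "B": 0}))
# {'A': 12, 'B': 0}
```

The program is unchanged between the two runs; only the control store differs, which is the sense in which such a processor is reconfigurable.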
Some dataflow processors are implemented using reconfigurable computing.
A Paradigm Shift
The fundamental model of the reconfigurable computing machine paradigm, the data-stream-based anti machine, is best illustrated by its differences from the machine paradigms introduced earlier, as shown by Nick Tredennick's classification scheme of computing paradigms:
Computing paradigm | Resources | Programming source (resources) | Algorithms | Programming source (algorithms) |
---|---|---|---|---|
Early historic computers | fixed | none | fixed | none |
von Neumann computer | fixed | none | variable | Software (instruction streams) |
Reconfigurable computing systems | variable | Configware (configuration) | variable | Flowware (data streams) |
The fundamental model of a reconfigurable computing machine, the data-stream-based anti machine (also called Xputer), is the counterpart of the instruction-stream-based von Neumann machine paradigm. This is illustrated by a simple reconfigurable system (not dynamically reconfigurable), which has no instruction fetch at run time. The reconfiguration (before run time) can be considered a kind of super instruction fetch. An anti machine does not have a program counter; since it is data-stream-driven, it has data counters instead. Here the term data streams is adopted from the systolic array community, where a data stream specifies at which time which data item has to enter or leave which port, here of the reconfigurable system, which may be fine-grained (e.g. using FPGAs), coarse-grained, or a mixture of both.
The systolic array community, originally (in the early 1980s) mainly mathematicians, defined only one half of the anti machine: the datapath, i.e. the systolic array (see also Super Systolic Array). They neither defined nor modeled the data sequencer methodology, considering it not their job to take care of where the data streams come from or end up. The data sequencing part of the anti machine is modeled as distributed memory, preferably on chip, consisting of auto-sequencing memory blocks (ASM blocks). Each ASM block has a sequencer including a data counter. An example is the Generic Address Generator (GAG), which is a generalization of DMA.
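The idea of a data counter replacing the program counter can be sketched in Python: an address generator driven only by loop parameters (a base address, loop counts and strides) produces the sequence in which data items leave a memory block, with no instructions fetched at run time. This is a hypothetical software model under those assumptions, not the actual GAG design; the names `address_stream` and `AutoSequencingMemory` are illustrative.

```python
from typing import Iterator, Sequence


def address_stream(base: int, counts: Sequence[int],
                   strides: Sequence[int]) -> Iterator[int]:
    """Generate addresses for a nested-loop scan (outermost loop first).
    This plays the role of the data counter: no instructions are fetched."""
    if len(counts) != len(strides):
        raise ValueError("counts and strides must have the same length")

    def walk(level: int, addr: int) -> Iterator[int]:
        if level == len(counts):
            yield addr
            return
        for i in range(counts[level]):
            yield from walk(level + 1, addr + i * strides[level])

    yield from walk(0, base)


class AutoSequencingMemory:
    """Hypothetical ASM block: a local memory plus its own address sequencer."""

    def __init__(self, data: list[int]) -> None:
        self.data = data

    def stream(self, base: int, counts: Sequence[int],
               strides: Sequence[int]) -> Iterator[int]:
        for addr in address_stream(base, counts, strides):
            yield self.data[addr]


# A 3 x 4 matrix stored row-major at addresses 0..11, read column by column.
print(list(address_stream(0, counts=(4, 3), strides=(1, 4))))
# [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]

asm = AutoSequencingMemory(list(range(100, 112)))
print(list(asm.stream(1, counts=(3,), strides=(4,))))  # [101, 105, 109]
```

A plain DMA transfer corresponds to the single-loop case with a stride of 1; allowing several nested loops with arbitrary strides is what makes such a generator more general.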
Terminology
Reconfigurable Device | FPGAs, rDPAs and any other device whose functionality can be changed during execution. A reconfigurable device may have a fine-grained architecture, like FPGAs, or a coarse-grained architecture, like rDPAs. A hardware architecture is a reconfigurable device or architecture if both the functionality of its processing elements and the interconnections between them can be modified after fabrication. |
Bitstream | The file that configures the FPGA (e.g. a file with a .bit extension). The bitstream is loaded into an FPGA when it is ready for execution. It is the final result of the place-and-route phase. |
Common Memory | A.k.a. Shared Memory. Should refer to memory on a multi-FPGA board that is external to the FPGAs and with which all the FPGAs can exchange data DIRECTLY. |
Compile/Compilation | Translation of code segments that are meant to run on the microprocessor. This can include simulation/emulation runs executed on the processor. Alternatively, the term can encompass the processes of synthesis and place and route for reconfigurable devices. |
Cocompilation | Compilation that generates both software code and configware code, including automatic software/configware partitioning. |
Configware | Source programs for configuration. Being structural in nature, configware is the counterpart of software (which is procedural in nature). |
Configuration | Should refer to the bitstream currently loaded on an FPGA. Used loosely, it can also refer to the components/chipset making up a board or reconfigurable machine, but this usage should be avoided. |
Cycle-accurate simulation | Simulation that exactly mimics the clock on the FPGA, recording changes in data on the rising/falling edges of the clock. |
Emulation/Simulation | A.k.a. Modeling. The process of mimicking the behavior of the FPGA hardware on a processor-based system. |
Flowware | The second programming source besides configware, needed for data scheduling. |
High Performance Computing (HPC) | A.k.a. High Performance Embedded Computing, Parallel Computing. Parallel computing based on an array of microprocessors or, in reconfigurable HPC, on FPGAs or rDPAs, characterized by long run times, large computing resources and parallel implementations of algorithms. |
Hybrid | In this context the term "hybrid" stands for a symbiosis of procedural (instruction-stream-based) computing and reconfigurable computing (no instruction fetch at run time). |
On-chip memory | A.k.a. Block RAM, Cache. This term should refer to memory that is available within a single chip (whether Block RAM slices or SRAM slices). The term cache should be reserved purely for memory directly attached to processors on the system/host side. |
Aggregate On-chip memory | Refers to the total on-chip memory available in a multi-FPGA system. |
Local Memory | A.k.a. DRAM, SRAM, QDR, DDR SRAMs, ZBT RAM. This term should be used only to describe memory that is external to an FPGA or rDPA, is attached directly to that FPGA, and is not attached to any other FPGA or device on or off the board. Memory located on the same chip as the FPGA or rDPA should be called "on-chip memory". |
Reconfigurable Computing | A computing paradigm employing reconfigurable devices such as FPGAs or rDPAs to process data. A different bitstream can be loaded during the execution of a program, or to run a different program on the fly. Estrin-architecture reconfigurable computers include conventional von Neumann processors as main or control processors and typically use one or more reconfigurable devices as coprocessors. Newer FPGA-based architectures eliminate the need for a host processor by providing mechanisms to configure the device on boot from flash, and by directly supporting essential interfaces to memory and network resources via a bus configured in the device fabric. Providing a stable and stateful computational platform within a reconfigurable device, however, requires partial reconfigurability: the ability to reconfigure only the portion of the device that implements an application, while leaving unchanged the portion that implements the platform (the memory and network interfaces, the device drivers and so forth). Current FPGA devices allow partial reconfiguration, but implementing designs that use this feature effectively is still a difficult exercise in system-on-chip design. |
Reconfiguration | Configuration, programming, re-programming (also see Configware) |
System Memory/Host Memory | Should refer to memory on the microprocessor motherboard, NOT cache memory. |
Reconfigurable Computer | An Estrin-architecture reconfigurable computer typically pairs a conventional microprocessor host computer with a reconfigurable coprocessor, such as an FPGA or rDPA board. The coprocessor can be reconfigured to perform different computations during execution of a host computer program by loading appropriate bitstreams. Newer FPGA-based architectures eliminate the need for a host processor by providing mechanisms to configure the device on boot from flash, and by directly supporting essential interfaces to memory and network resources via a bus configured in the device fabric.
A fairly recent market has developed for low-power reconfigurable system-on-chip (SoC) devices that manufacturers can customize for their product applications, which are typically portable consumer media electronics. These devices typically incorporate one or more von Neumann processors and provide mechanisms to extend the processors' instruction sets and/or to interface the device to other subsystems in the product. Although these devices are technically "reconfigurable processors", they are designed to be configured once during production, or reconfigured as part of a field upgrade, but not reconfigured on the fly. |
Synthesis | The process of creating a netlist from a circuit description written in an HDL (hardware description language), an HLL (high-level language) or with a GUI (graphical user interface). |
Place and Route | Process of converting a netlist into physically mapped and placed components on the FPGA or rDPA, ending in the creation of a bitstream. |
References
- G. Estrin, "Organization of Computer Systems: The Fixed Plus Variable Structure Computer," Proc. Western Joint Computer Conf., New York, 1960, pp. 33-40.
- N. Tredennick, "The Case for Reconfigurable Computing," Microprocessor Report, Vol. 10, No. 10, 5 Aug. 1996, pp. 25-27.
External links
- IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM)
- International Conference on Field-Programmable Logic and Applications (FPL)
- BYU Configurable Computing Laboratory's FPGA CAD tool set
- The Morphware Page
- The OpenFPGA effort
- RC Education Workshop
- Reconfigurable Architectures Workshop
- Reconfigurable Computing: Coming of Age
- The Virginia Tech Configurable Computing Laboratory
- The University of Florida High-Performance Computing & Simulation Research Laboratory
- The University of Kansas Hybridthreads Project - OS for Hybrid CPU/FPGA chips
- Why we need Reconfigurable Computing Education