Computer architecture

Pipelined implementation of MIPS architecture. Pipelining is a key concept in computer architecture.

In computer engineering,^[1] computer architecture is a set of disciplines that describes the functionality, the organization and the implementation of computer systems; that is, it defines the capabilities of a computer and its programming model in an abstract way, and how the internal organization of the system is designed and implemented to meet the specified capabilities.^[2]^[3] Computer architecture involves many aspects, including instruction set architecture design, microarchitecture design, logic design, and implementation.^[4] As of 2011, widely discussed computer architectures included cluster computing and non-uniform memory access.

Computer architects use computers to design new computers. Emulation software can run programs written in a proposed instruction set. While the design is still easy to change at this stage, compiler designers often collaborate with the architects, suggesting improvements in the instruction set. Modern emulators may measure time in clock cycles, estimate energy consumption in joules, and give realistic estimates of code size in bytes. These affect the convenience of the user, the power consumption, and the size and expense of the computer's largest physical part: its memory. That is, they help to estimate the value of a computer design.

History

The first documented computer architecture was in the correspondence between Charles Babbage and Ada Lovelace, describing the analytical engine. Two other early and important examples were:

John von Neumann's 1945 paper, First Draft of a Report on the EDVAC, which described an organization of logical elements; and
Alan Turing's more detailed Proposed Electronic Calculator for the Automatic Computing Engine, also from 1945, which cited von Neumann's paper.^[5]

The term “architecture” in computer literature can be traced to the work of Lyle R. Johnson, Mohammad Usman Khan and Frederick P. Brooks, Jr., members in 1959 of the Machine Organization department in IBM’s main research center. Johnson had the opportunity to write a proprietary research communication about the Stretch, an IBM-developed supercomputer for Los Alamos Scientific Laboratory. To describe the level of detail for discussing the luxuriously embellished computer, he noted that his description of formats, instruction types, hardware parameters, and speed enhancements was at the level of “system architecture”, a term that seemed more useful than “machine organization.”

Subsequently, Brooks, a Stretch designer, started Chapter 2 of a book (Planning a Computer System: Project Stretch, ed. W. Buchholz, 1962) by writing,

Computer architecture, like other architecture, is the art of determining the needs of the user of a structure and then designing to meet those needs as effectively as possible within economic and technological constraints.

Brooks went on to help develop the IBM System/360 (now called the IBM zSeries) line of computers, in which “architecture” became a noun defining “what the user needs to know”. Later, computer users came to use the term in many less-explicit ways.

The earliest computer architectures were designed on paper and then directly built into the final hardware form.^[6] Later, computer architecture prototypes were physically built in the form of a transistor–transistor logic (TTL) computer, such as the prototypes of the 6800 and the PA-RISC, and were tested and tweaked before committing to the final hardware form. Since the 1990s, new computer architectures have typically been "built", tested, and tweaked inside some other computer architecture in a computer architecture simulator, inside an FPGA as a soft microprocessor, or both, before committing to the final hardware form.

Subcategories

The discipline of computer architecture has three main subcategories:^[7]

Instruction Set Architecture, or ISA. The ISA defines the codes that a central processor reads and acts upon. It is the machine language (or assembly language), including the instruction set, word size, memory address modes, processor registers, and address and data formats.
Microarchitecture, also known as computer organization, describes the data paths, data processing elements and data storage elements, and how they should implement the ISA.^[8] The size of a computer's CPU cache, for instance, is an organizational issue that generally has nothing to do with the ISA.
System Design includes all of the other hardware components within a computing system. These include:

Data paths, such as computer buses and switches
Memory controllers and hierarchies
Data processing other than the CPU, such as direct memory access (DMA)
Miscellaneous issues such as virtualization, multiprocessing and software features.

Some architects at companies such as Intel and AMD use finer distinctions:

Macroarchitecture: architectural layers more abstract than microarchitecture, e.g. ISA
Instruction Set Architecture (ISA): as above but without:
- Assembly ISA: a smart assembler may convert an abstract assembly language common to a group of machines into slightly different machine language for different implementations
Programmer Visible Macroarchitecture: higher level language tools such as compilers may define a consistent interface or contract to programmers using them, abstracting differences between underlying ISA, UISA, and microarchitectures. E.g. the C, C++, or Java standards define different Programmer Visible Macroarchitecture.
UISA (Microcode Instruction Set Architecture): a family of machines with different hardware level microarchitectures may share a common microcode architecture, and hence a UISA.
Pin Architecture: The hardware functions that a microprocessor should provide to a hardware platform, e.g., the x86 pins A20M, FERR/IGNNE or FLUSH. Also, messages that the processor should emit so that external caches can be invalidated (emptied). Pin architecture functions are more flexible than ISA functions because external hardware can adapt to new encodings, or change from a pin to a message. The term "architecture" fits, because the functions must be provided for compatible systems, even if the detailed method changes.

The Roles

Definition

The purpose is to design a computer that maximizes performance while keeping power consumption in check, keeping cost low relative to the expected performance, and remaining very reliable. Many aspects must be considered, including instruction set design, functional organization, logic design, and implementation. The implementation involves integrated circuit design, packaging, power, and cooling. Optimizing the design requires familiarity with topics ranging from compilers and operating systems to logic design and packaging.

Instruction set architecture

An instruction set architecture (ISA) is the interface between the computer's software and hardware, and can also be viewed as the programmer's view of the machine. Computers do not understand high-level languages, which have few, if any, language elements that translate directly into a machine's native opcodes. A processor only understands instructions encoded in some numerical fashion, usually as binary numbers. Software tools, such as compilers, translate high-level languages, such as C, into instructions.

Besides instructions, the ISA defines items in the computer that are available to a program, e.g. data types, registers, addressing modes, and memory. Instructions locate operands with register indexes (or names) and memory addressing modes.

The ISA of a computer is usually described in a small book or pamphlet, which describes how the instructions are encoded. It may also define short, vaguely mnemonic names for the instructions. The names can be recognized by a software development tool called an assembler. An assembler is a computer program that translates a human-readable form of the ISA into a computer-readable form. Disassemblers are also widely available, usually in debuggers, which are software programs used to isolate and correct malfunctions in binary computer programs.
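
As a rough illustration of what an assembler and a disassembler do, the Python sketch below translates mnemonics for a made-up 8-bit ISA into numeric machine words and back. The opcode table, the 4-bit opcode plus 4-bit operand encoding, and the names OPCODES, assemble and disassemble are all assumptions chosen for the example; they do not describe any real instruction set.

```python
# Hypothetical 8-bit ISA: high 4 bits hold the opcode, low 4 bits the operand.
OPCODES = {"LOAD": 0x1, "ADD": 0x2, "STORE": 0x3, "JMP": 0x4, "HALT": 0xF}
MNEMONICS = {code: name for name, code in OPCODES.items()}


def assemble(source):
    """Translate lines such as 'ADD 3' into a list of 8-bit machine words."""
    words = []
    for line in source.splitlines():
        line = line.split(";")[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split()
        mnemonic = parts[0].upper()
        operand = int(parts[1]) if len(parts) > 1 else 0
        if mnemonic not in OPCODES:
            raise ValueError(f"unknown mnemonic: {mnemonic}")
        if not 0 <= operand <= 0xF:
            raise ValueError(f"operand out of range: {operand}")
        words.append((OPCODES[mnemonic] << 4) | operand)
    return words


def disassemble(words):
    """Inverse of assemble: turn machine words back into readable text."""
    return [f"{MNEMONICS[w >> 4]} {w & 0xF}" for w in words]


program = """
LOAD 2    ; read memory cell 2
ADD 3     ; add memory cell 3
STORE 4   ; write the sum to cell 4
HALT
"""
machine_code = assemble(program)
listing = disassemble(machine_code)
```

Here machine_code holds the numeric form a processor would fetch, while listing is the human-readable form a debugger might show.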

ISAs vary in quality and completeness. A good ISA compromises between programmer convenience (more operations can be better), cost of the computer to interpret the instructions (cheaper is better), speed of the computer (faster is better), and size of the code (smaller is better). For example, a single-instruction ISA is possible, inexpensive, and fast (e.g., subtract and jump if zero, which was actually used in the SSEM), but it is not convenient, nor does it help make programs small. Memory organization defines how instructions interact with the memory, and also how different parts of memory interact with each other.
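
The trade-off of a single-instruction ISA can be sketched with a small interpreter. The Python below uses the "subtract and branch if less than or equal to zero" variant (often called subleq) as a stand-in for the "subtract and jump if zero" instruction mentioned above. The memory layout, the halting convention (a negative jump target) and the name run_subleq are assumptions for illustration.

```python
def run_subleq(mem, pc=0, max_steps=10_000):
    """Run a one-instruction machine on a list of integers.

    Each instruction occupies three cells (a, b, c):
        mem[b] -= mem[a]; jump to c if the result is <= 0, else fall through.
    A negative jump target halts the machine (a convention assumed here).
    """
    steps = 0
    while pc >= 0 and steps < max_steps:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
        steps += 1
    return mem, steps


# Add X (cell 9) into Y (cell 10) using a scratch cell Z (cell 11).
program = [
    9, 11, 3,     # Z = Z - X   (Z becomes -X)
    11, 10, 6,    # Y = Y - Z   (Y becomes Y + X)
    11, 11, -1,   # Z = Z - Z   (clear Z, then halt)
    7, 5, 0,      # data: X = 7, Y = 5, Z = 0
]
memory, steps = run_subleq(list(program))
```

Even a plain addition takes three instructions and a scratch cell in this sketch, which illustrates why such an ISA is cheap to build but neither convenient nor compact.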

Computer organization

Computer organization helps optimize performance-based products. For example, software engineers need to know the processing ability of processors. They may need to optimize software in order to gain the most performance at the least expense. This can require quite detailed analysis of the computer organization. For example, in a multimedia decoder, the designers might need to arrange for most data to be processed in the fastest data path. In such an analysis, the various components are assumed to be in place, and the task is to investigate the organizational structure to verify that the computer's parts operate as intended.

Computer organization also helps plan the selection of a processor for a particular project. Multimedia projects may need very rapid data access, while supervisory software may need fast interrupts. Sometimes certain tasks need additional components as well. For example, a computer capable of virtualization needs virtual memory hardware so that the memory of different simulated computers can be kept separated. Computer organization and features also affect power consumption and processor cost.

Implementation

Once an instruction set and micro-architecture are described, a practical machine must be designed. This design process is called the implementation. Implementation is usually not considered architectural definition, but rather hardware design engineering. Implementation can be further broken down into several (not fully distinct) steps:

Logic Implementation designs the blocks defined in the micro-architecture at (primarily) the register-transfer level and logic gate level.
Circuit Implementation does transistor-level designs of basic elements (gates, multiplexers, latches etc.) as well as of some larger blocks (ALUs, caches etc.) that may be implemented at this level, or even (partly) at the physical level, for performance reasons.
Physical Implementation draws physical circuits. The different circuit components are placed in a chip floorplan or on a board and the wires connecting them are routed.
Design Validation tests the computer as a whole to see if it works in all situations and all timings. Once implementation starts, the first design validations are simulations using logic emulators. However, these are usually too slow to run realistic programs. So, after making corrections, prototypes are constructed using field-programmable gate arrays (FPGAs). Many hobby projects stop at this stage. The final step is to test prototype integrated circuits. Integrated circuits may require several redesigns to fix problems.

For CPUs, the entire implementation process is often called CPU design.

Design goals

The exact form of a computer system depends on the constraints and goals. Computer architectures usually trade off standards, power versus performance, cost, memory capacity, latency (the time it takes for information to travel from one node to another) and throughput. Sometimes other considerations, such as features, size, weight, reliability, and expandability, are also factors.

The most common scheme does an in-depth power analysis and figures out how to keep power consumption low while maintaining adequate performance.

Performance

Modern computer performance is often described in IPC (instructions per cycle). This measures the efficiency of the architecture at any clock speed. Since a faster clock can make a faster computer, this is a useful, widely applicable measurement. Historic computers had IPC counts as low as 0.1 (see instructions per second). Simple modern processors easily reach near 1. Superscalar processors may reach three to five by executing several instructions per clock cycle. Vector processors can multiply this further by acting on many data items per instruction, and multicore processors by having several CPUs execute in parallel.

Counting machine language instructions would be misleading because they can do varying amounts of work in different ISAs. The "instruction" in the standard measurements is not a count of the ISA's actual machine language instructions, but a historical unit of measurement, usually based on the speed of the VAX computer architecture.

Historically, many people measured a computer's speed by the clock rate (usually in MHz or GHz). This refers to the cycles per second of the main clock of the CPU. However, this metric is somewhat misleading, as a machine with a higher clock rate may not necessarily have higher performance. As a result manufacturers have moved away from clock speed as a measure of performance.
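
The relationship between clock rate, IPC and run time can be made concrete with a short calculation. The sketch below assumes the usual simplification that execution time equals the instruction count divided by the product of IPC and clock rate; the two machines and their figures are hypothetical.

```python
def execution_time(instruction_count, ipc, clock_hz):
    """Seconds to run a program.

    time = instructions / (instructions per cycle * cycles per second)
    """
    return instruction_count / (ipc * clock_hz)


# Two hypothetical machines running the same 1-billion-instruction program.
fast_clock = execution_time(1_000_000_000, ipc=0.8, clock_hz=3.0e9)
wide_core = execution_time(1_000_000_000, ipc=2.0, clock_hz=2.0e9)

# The 2 GHz machine finishes first because it completes more work per cycle.
speedup = fast_clock / wide_core
```

Under these assumed figures, the machine with the lower clock rate is the faster one, which is why clock rate alone is a misleading measure.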

Other factors influence speed, such as the mix of functional units, bus speeds, available memory, and the type and order of instructions in the programs being run.

In a typical home computer, the simplest, most reliable way to improve performance is usually to add random access memory (RAM). More RAM increases the likelihood that needed data or a program is in RAM, so the system is less likely to need to move data from the disk. The disk is often ten thousand times slower than RAM because it has mechanical parts that must move to access its data.
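
The effect of adding RAM can be estimated with a weighted average of the two access times. The sketch below is a simplified model: the 100 ns RAM access time and the hit rates are assumed figures, and only the ten-thousand-fold ratio comes from the text above.

```python
def average_access_time(hit_rate, ram_seconds, disk_seconds):
    """Expected time per access when a fraction hit_rate is served from RAM."""
    return hit_rate * ram_seconds + (1.0 - hit_rate) * disk_seconds


RAM_SECONDS = 100e-9                  # assumed RAM access time
DISK_SECONDS = RAM_SECONDS * 10_000   # disk ten thousand times slower

# Hypothetical hit rates before and after adding RAM.
less_ram = average_access_time(0.99, RAM_SECONDS, DISK_SECONDS)
more_ram = average_access_time(0.999, RAM_SECONDS, DISK_SECONDS)
improvement = less_ram / more_ram
```

Because the disk term dominates, raising the assumed hit rate from 99% to 99.9% cuts the average access time by roughly a factor of nine in this model.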

There are two main types of speed: latency and throughput. Latency is the time between the start of a process and its completion. Throughput is the amount of work done per unit time. Interrupt latency is the guaranteed maximum response time of the system to an electronic event (e.g. when the disk drive finishes moving some data).

Performance is affected by a very wide range of design choices. For example, pipelining a processor usually makes latency worse (slower) but makes throughput better. Computers that control machinery usually need low interrupt latencies. These computers operate in a real-time environment and fail if an operation is not completed in a specified amount of time. For example, computer-controlled anti-lock brakes must begin braking within a predictable, short time after the brake pedal is sensed.
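
The latency and throughput effect of pipelining can be sketched numerically. The model below assumes an ideal pipeline with no stalls, a clock period set by the slowest stage plus a fixed pipeline-register overhead, and hypothetical stage delays in nanoseconds.

```python
def unpipelined(stage_delays):
    """Return (latency, throughput) when one instruction runs at a time."""
    latency = sum(stage_delays)
    return latency, 1.0 / latency


def pipelined(stage_delays, register_overhead):
    """Return (latency, throughput) for an ideal pipeline without stalls.

    The clock period is set by the slowest stage plus the register overhead;
    every instruction passes through all stages at that period.
    """
    cycle = max(stage_delays) + register_overhead
    latency = cycle * len(stage_delays)
    return latency, 1.0 / cycle


# Hypothetical five-stage design, delays in nanoseconds.
stages = [2.0, 1.5, 2.5, 2.0, 1.0]
plain_latency, plain_throughput = unpipelined(stages)
pipe_latency, pipe_throughput = pipelined(stages, register_overhead=0.2)
```

With these assumed delays, each instruction takes longer from start to finish in the pipelined design, yet the pipeline completes instructions more than three times as often.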

The performance of a computer can be measured using other metrics, depending upon its application domain. A system may be CPU bound (as in numerical calculation), I/O bound (as in a webserving application) or memory bound (as in video editing). Power consumption has become important in servers and portable devices like laptops.

Benchmarking tries to take all these factors into account by measuring the time a computer takes to run through a series of test programs. Although benchmarking shows strengths, it may not help one to choose a computer. Often the measured machines rank differently on different measures. For example, one system might handle scientific applications quickly, while another might play popular video games more smoothly. Furthermore, designers may add special features to their products, in hardware or software, that permit a specific benchmark to execute quickly but don't offer similar advantages to general tasks.

Power consumption

Power consumption is another measurement that is important in modern computers. Power efficiency can often be traded for speed or lower cost. The typical measurement in this case is MIPS/W (millions of instructions per second per watt).
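
As a small worked example of the MIPS/W figure, the sketch below computes it from an instruction count, a run time and an average power draw. Both machines and all of their numbers are hypothetical.

```python
def mips_per_watt(instructions, seconds, watts):
    """Millions of instructions per second, divided by average power in watts."""
    mips = instructions / seconds / 1e6
    return mips / watts


# Hypothetical desktop part: 50 billion instructions in 10 s at 65 W.
desktop = mips_per_watt(50e9, seconds=10.0, watts=65.0)

# Hypothetical embedded part: 2 billion instructions in 10 s at 0.5 W.
embedded = mips_per_watt(2e9, seconds=10.0, watts=0.5)
```

In this example the embedded part is far slower in absolute terms but does several times more work per watt, which is the trade-off the metric is meant to expose.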

Modern circuits have less power per transistor as the number of transistors per chip grows. Therefore, power efficiency has increased in importance. Recent processor designs, such as Intel's Haswell microarchitecture, put more emphasis on increasing power efficiency. Also, in the world of embedded computing, power efficiency has long been and remains an important goal next to throughput and latency.

Notes

John L. Hennessy and David Patterson (2006). Computer Architecture: A Quantitative Approach (4th ed.). Morgan Kaufmann. ISBN 978-0-12-370490-0.
Barton, Robert S., "Functional Design of Computers", Communications of the ACM 4(9): 405 (1961).
Barton, Robert S., "A New Approach to the Functional Design of a Digital Computer", Proceedings of the Western Joint Computer Conference, May 1961, pp. 393–396. About the design of the Burroughs B5000 computer.
Bell, C. Gordon; and Newell, Allen (1971). "Computer Structures: Readings and Examples", McGraw-Hill.
Blaauw, G.A., and Brooks, F.P., Jr., "The Structure of System/360, Part I-Outline of the Logical Structure", IBM Systems Journal, vol. 3, no. 2, pp. 119–135, 1964.
Tanenbaum, Andrew S. (1979). Structured Computer Organization. Englewood Cliffs, New Jersey: Prentice-Hall. ISBN 0-13-148521-0.

References

1. Curriculum Guidelines for Undergraduate Degree Programs in Computer Engineering (PDF). Association for Computing Machinery. 2004. p. 60. "Computer architecture is a key component of computer engineering and the practicing computer engineer should have a practical understanding of this topic..."
2. Hennessy, John; Patterson, David. Computer Architecture: A Quantitative Approach (5th ed.). p. 11.
3. Clements, Alan. Principles of Computer Hardware (4th ed.). p. 1. "Architecture describes the internal organization of a computer in an abstract way; that is, it defines the capabilities of the computer and its programming model. You can have two computers that have been constructed in different ways with different technologies but with the same architecture."
4. Hennessy, John; Patterson, David. Computer Architecture: A Quantitative Approach (5th ed.). p. 11. "This task has many aspects, including instruction set design, functional organization, logic design, and implementation."
5. Reproduced in B. J. Copeland (Ed.), "Alan Turing's Automatic Computing Engine", OUP, 2005, pp. 369-454.
6. ACE underwent seven paper designs in one year, before a prototype was initiated in 1948. [B. J. Copeland (Ed.), "Alan Turing's Automatic Computing Engine", OUP, 2005, p. 57]
7. Hennessy, John L.; Patterson, David A. Computer Architecture: A Quantitative Approach (3rd ed.). Morgan Kaufmann Publishers.
8. Laplante, Phillip A. (2001). Dictionary of Computer Science, Engineering, and Technology. CRC Press. pp. 94–95. ISBN 0-8493-2691-5.
