Cray-2
From Wikipedia, the free encyclopedia
The Cray-2 was a vector supercomputer made by Cray Research starting in 1985. It was the fastest machine in the world when it was released, replacing Cray's own X-MP in that spot. The Cray-2 was displaced from the top spot by the ETA-10G in 1990.
Initial design
With the successful launch of his famed Cray-1, Seymour Cray immediately turned to the design of its successor. By 1979 he had become fed up with constant management interruptions in what was now a large company and, as he had done in the past, decided to resign his management post and move to form a new lab. As with his original move to Chippewa Falls, Wisconsin from Control Data headquarters in Minneapolis, Minnesota, Cray management understood his needs and supported his move to a new lab in Boulder, Colorado. Working as an independent consultant at these new Cray Labs, he put together a team and started on a completely new design. This lab would later close, and a decade later a new facility in Colorado Springs would open.
Cray had previously attacked the problem of increased speed with three simultaneous advances: more functional units to give the system higher parallelism, tighter packaging to decrease signal delays, and faster components to allow for a higher clock speed. The classic example of this design is the CDC 8600, which packed four CDC 7600-like machines based on ECL logic into a 1 x 1 meter cylinder and ran them at an 8 ns cycle speed (125 MHz). Unfortunately the incredible density needed to achieve this cycle time led to the machine's downfall. The circuit boards inside were densely packed, and since even a single malfunctioning transistor would cause an entire module to fail, by packing more of them onto the cards the odds of failure greatly increased.
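The reliability argument above can be made concrete with a small worked example. This is a minimal sketch, assuming component failures are independent and that a single failure disables the whole module; the per-component failure probability and the component counts are hypothetical values chosen only for illustration, not figures from the 8600.

```python
def module_failure_probability(p_component: float, n_components: int) -> float:
    """Probability that at least one of n independent components fails.

    Assumes independent failures and that one bad component
    is enough to make the whole module unusable.
    """
    return 1.0 - (1.0 - p_component) ** n_components


# Hypothetical per-component failure probability, for illustration only.
P_COMPONENT = 0.001

# Packing more components onto a module raises the odds that it fails.
for n in (100, 500, 1000):
    print(n, round(module_failure_probability(P_COMPONENT, n), 3))
```

Under these assumptions the module failure probability climbs steadily with the component count, which is the effect the paragraph describes.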
One solution to this problem, one that most computer vendors had already moved to, was to use integrated circuits (ICs) instead of individual components. Each IC included a selection of components from a module, pre-wired into a circuit by the automated construction process. If an IC didn't work, it was simply thrown away and another was tried. At the time the 8600 was being designed, the simple MOSFET-based technology didn't offer the speed Cray needed. Relentless improvements changed things by the mid-1970s, however, and the Cray-1 had been able to use newer ICs and still run at a respectable 12.5 ns (80 MHz). In fact, the Cray-1 was actually somewhat faster than the 8600 because it packed considerably more logic into the system due to the ICs' small size.
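The cycle times quoted for the 8600 and the Cray-1 convert to clock frequencies by taking the reciprocal. A minimal sketch of that arithmetic follows; the function name `clock_mhz` is made up for this example.

```python
def clock_mhz(cycle_ns: float) -> float:
    """Clock frequency in MHz for a cycle time given in nanoseconds.

    f = 1 / T, and 1 / (1 ns) = 1000 MHz.
    """
    return 1000.0 / cycle_ns


print(clock_mhz(8.0))   # CDC 8600 design target, 8 ns cycle
print(clock_mhz(12.5))  # Cray-1, 12.5 ns cycle
```

The two calls reproduce the 125 MHz and 80 MHz figures given in the text.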
Although IC design continued to improve, the physical size of the ICs was constrained largely by mechanical limits; the resulting component had to be large enough to solder into a system. Dramatic improvements in density were possible, as the rapid improvement in microprocessor design was showing, but for the sorts of ICs used by Cray, ones representing a very small part of a complete circuit, things had pretty much plateaued. In order to gain another 10-fold increase in performance over the Cray-1, the goal Cray always aimed for, the machine would have to grow more complex. So once again he turned to an 8600-like solution, doubling the clock speed through increased density, adding more of these smaller processors into the basic system, and then attempting to deal with the problem of getting heat out of the machine.
Cray also felt that silicon technology had almost run its course; improvements on the Cray-1's 12.5 ns cycle time were possible, but much more than doubling didn't seem easy. There was, however, the possibility of using gallium arsenide-based (GaAs) circuits instead, which offered at least 10 times the switching speed and used less power to do it, thereby generating less heat as well. For some time in the late 1970s and early '80s it seemed a wholesale switch to GaAs by the entire computer industry was just around the corner, and a team from Cray worked with Rockwell International's semiconductor division to try to beat everyone to the punch. However, the chips simply weren't ready for production, and the Cray-2 had to press ahead with existing silicon-based designs.
Another design problem was the increasing performance gap between the processor and main memory. In the era of the CDC 6600 memory ran at the same speed as the processor, and the main problem was feeding data into it. Cray solved this by adding ten smaller computers to the system, allowing them to deal with the slower external storage (disks and tapes) and "squirt" data into memory when the main processor was busy. This solution no longer offered any advantages; memory was large enough that entire data sets could be read into it, but the processors ran so much faster than memory that they would often spend long times waiting for data to arrive. Adding four processors simply made this problem worse.
To avoid this problem the new design included a 128 kB block of the very fastest memory possible, attaching the four background processors to it with separate high-speed pipes. This cache was fed data by a dedicated foreground processor, which was in turn attached to the main memory through a number of Gbit/s channels. It was the foreground processor's task to "run" the computer, handling storage and making efficient use of the multiple channels into main memory. It drove the background processors by passing in the instructions they should run via eight 16 word (256 byte) buffers, instead of tying up the existing cache pipes to the background processors. Modern CPUs use a variation of this design as well, although the foreground processor is now referred to as the load/store unit and is not a complete machine in its own right.
Main memory was arranged so that different areas could be accessed at the same time, allowing programmers to scatter their data across memory to gain higher parallelism. The downside to this approach is that the cost of setting up the scatter/gather unit in the foreground processor was fairly high. For small datasets, or data that didn't lend itself to being spread out evenly, the system would often be slower than a simpler architecture due to high latencies.
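Why evenly spread data matters can be illustrated with a toy model of interleaved memory banks. This is a sketch under stated assumptions, not a model of the actual Cray-2 memory system: the bank count, the bank busy time, and the modulo address mapping are all hypothetical, and the setup cost of the scatter/gather unit is not modeled.

```python
NUM_BANKS = 8          # hypothetical bank count, not the Cray-2's real figure
BANK_BUSY_CYCLES = 4   # hypothetical time a bank stays busy after an access


def bank_of(address: int) -> int:
    """Map a word address to a bank using simple modulo interleaving."""
    return address % NUM_BANKS


def cycles_for_stride(stride: int, n_accesses: int) -> int:
    """Cycles needed to issue n accesses at a fixed stride.

    At most one access is issued per cycle, and an access stalls
    until its target bank has finished its previous access.
    """
    free_at = [0] * NUM_BANKS   # cycle at which each bank becomes free
    cycle = 0
    for i in range(n_accesses):
        bank = bank_of(i * stride)
        cycle = max(cycle, free_at[bank])       # stall if the bank is busy
        free_at[bank] = cycle + BANK_BUSY_CYCLES
        cycle += 1                              # next issue slot
    return cycle


# Stride 1 spreads accesses over all banks; stride 8 hits one bank repeatedly.
print(cycles_for_stride(1, 64))
print(cycles_for_stride(8, 64))
```

In this model, accesses that rotate through all the banks never stall, while accesses that land on the same bank every time are serialized behind that bank's busy time.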
Packed circuit boards and new design ideas
Cray-2 models soon settled on a design using large circuit boards absolutely packed with ICs. So packed, in fact, that they were almost impossible to solder together, and yet the density was still not enough to reach their performance goals. Teams worked on the design for about two years before even Cray himself "gave up" and decided it would be best if they simply cancelled the project and fired everyone working on it. Les Davis, Cray's former design collaborator who had remained at Cray headquarters, decided it should be continued at low priority. After some minor personnel movements the team continued on much as before.
Six months later Cray had his "eureka" moment. He called the main engineers together for a meeting and presented a new solution to the problem. Instead of making one larger circuit board, each "card" would instead consist of a 3-D stack of eight boards, connected together in the middle of the boards using pins sticking up from the surface (known as "pogos" or "z-pins"). The cards were packed right on top of each other, so the resulting stack was only about 3 inches high. With this sort of density there was no way any conventional air-cooled system would work; there was too little room for air to flow between the ICs. Instead the system would be immersed in a tank of a new inert fluid from 3M, Fluorinert. The cooling fluid was forced sideways through the modules under pressure. The heated fluid was cooled using chilled-water heat exchangers and returned to the main tank. Work on the new design started in earnest in 1982, several years after the original start date.
While this was going on, the Cray X-MP was being developed under the direction of Steve Chen at Cray headquarters, and looked like it would give the Cray-2 a serious run for its money. In order to address this internal threat, as well as a series of newer Japanese Cray-1-like machines, the Cray-2 memory system was dramatically improved, both in size and in the number of "pipes" into the processors. When the machine was eventually delivered in 1985, the delays had been so long that much of its performance benefit was due to the fast memory, and the machine only really made sense to purchase for uses with huge data sets to process.
This large memory should not be discounted lightly. When delivered, the first real Cray-2 possessed more physical memory (256 MWord) than all previously delivered Cray machines in the world combined (the Cray-1s, the Cray X-MPs, and the two field-delivered Cray-2 prototypes). Simulation moved from 2-D or coarse 3-D models to finer 3-D models because computation did not have to rely on slow virtual memory. Such problems cannot trade space (memory) for time (speed), and that constraint is what characterizes supercomputing (extreme, high-end computing).
Uses and successors
The Cray-2 was predominantly developed for the American Departments of Defense and Energy. Uses tended to be for nuclear weapons research or oceanographic (sonar) development. However, the Cray-2 also found its way into civil agencies (such as NASA Ames Research Center), universities, and corporations worldwide.
The Cray-2 would have been superseded by the Cray-3, but due to development problems only one system was built, and it was never paid for. The spiritual descendant of the Cray-2 is the Cray X1, offered by Cray Inc.