Evans & Sutherland ES-1
The ES-1 was Evans & Sutherland's abortive attempt to enter the supercomputer market. It was nearing release just as the market was drying up in the post-Cold War military wind-down; only a handful were built and only two sold.
Background
Jean-Yves Leclerc was a computer designer who had been unable to find funding in Europe for a high-performance server design. In 1985 he visited Dave Evans, his former PhD adviser, looking for advice. After some discussion, Leclerc convinced Evans that, since most E&S customers were running E&S graphics hardware on Cray Research machines and other supercomputers, it would make sense for E&S to offer its own low-cost platform instead. A new Evans & Sutherland Computer Division, or ESCD, was set up in 1986 to work on the design. Unlike the rest of E&S's operations, which are headquartered in Salt Lake City, Utah, the new division was set up in Mountain View, California, because it was felt that the computer design team needed to be in the "heart of things" in Silicon Valley.
Basic design
8 × 8 crossbar
The basic idea of Leclerc's system was to use an 8×8 crossbar switch to connect eight custom CMOS CPUs together at high speed. An extra channel on the crossbar allowed it to be connected to another crossbar, forming a single 16-processor unit. The units were built from 16 processors rather than 8 in order to fully utilize a 16-bank high-speed memory that had been designed along with the rest of the system. Since memory was logically organized on the "far side" of the crossbars, the memory controller handled many of the tasks that would normally be left to the processors, including interrupt handling and virtual memory translation, avoiding a trip through the crossbar for these housekeeping tasks.
The resulting 16-unit processor/memory blocks could then be connected using another 8×8 crossbar, creating a 128-processor machine. Although the delays between the 16-unit blocks would be high, if the task could be cleanly separated into units the delay would not have a large effect on performance. When data did have to be shared across the banks, the system balanced the requests by alternating between the two ends of the request queue: first the "leftmost" processor in the queue would get access, then the "rightmost". Processors added their requests onto the proper end of the queue based on their physical location in the machine. It was felt that the simplicity and speed of this algorithm would outweigh the potential gains of a more complex load-balancing system.
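The queueing scheme described above can be sketched as a double-ended queue that is served from alternating ends. This is only an illustrative model, not the ES-1's actual hardware logic: the class name, the method names, and the rule that the lower half of the processor indices counts as the "left" side are all assumptions made for the example.

```python
from collections import deque


class EndAlternatingArbiter:
    """Hypothetical model of a leftmost/rightmost request queue."""

    def __init__(self, num_processors):
        self.num_processors = num_processors
        self.queue = deque()
        self.serve_left_next = True  # the first grant goes to the left end

    def request(self, cpu):
        # Assumption: CPUs in the lower half of the index range sit on the
        # "left" of the machine and join the left end of the queue.
        if cpu < self.num_processors // 2:
            self.queue.appendleft(cpu)
        else:
            self.queue.append(cpu)

    def grant(self):
        """Serve one request, alternating between the two ends."""
        if not self.queue:
            return None
        cpu = self.queue.popleft() if self.serve_left_next else self.queue.pop()
        self.serve_left_next = not self.serve_left_next
        return cpu


arbiter = EndAlternatingArbiter(num_processors=16)
for cpu in (3, 12, 5, 9):
    arbiter.request(cpu)

# Requests are served from the outside in, not in arrival order.
grants = [arbiter.grant() for _ in range(4)]
print(grants)
```

In this model the most recent request on each side is always served first, so a request only reaches the head of its side once every later request on that side has been handled.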
Instruction pipeline
In order to allow the system to work even with the high inter-unit latencies, each processor used an 8-deep instruction pipeline. Branches used a variable delay slot, the end of which was signaled by a bit in the next instruction. The bit indicated that the results of the branch had to be re-merged at this point, stalling the processor until this took place. Each processor also included a floating point unit from Weitek. For marketing purposes, each processor was called a "computational unit", and a card-cage populated with 16 was referred to as a "processor". This allowed favorable per-processor performance comparisons with other supercomputers of the era.
The processors ran at 20 MHz in the integer units and 40 MHz for the FPUs, with the intention of raising this to 50 MHz by the time the machine shipped. At about 12 Mflops peak per CU, a full 128-CU machine would deliver up to 1.5 Gflops, although due to the memory latencies this was typically closer to 250 Mflops. While this was fast for a CMOS machine of the time, it was hardly competitive for a supercomputer. Nevertheless, the machine was air cooled, and would have been the fastest such machine on the market.
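The headline figures follow directly from the topology and the per-CU peak quoted above. The short calculation below only restates that arithmetic; the constant names are illustrative, and the "sustained fraction" is simply the ratio of the two figures given in the text, not a measured benchmark.

```python
# Figures taken from the article text.
CPUS_PER_CROSSBAR = 8      # one 8x8 crossbar connects eight CPUs
CROSSBARS_PER_BLOCK = 2    # two crossbars form a 16-processor unit
BLOCKS_PER_MACHINE = 8     # a further 8x8 crossbar joins eight units
PEAK_MFLOPS_PER_CU = 12    # approximate peak per computational unit
TYPICAL_MFLOPS = 250       # typical whole-machine figure

cus_per_block = CPUS_PER_CROSSBAR * CROSSBARS_PER_BLOCK
total_cus = cus_per_block * BLOCKS_PER_MACHINE
peak_mflops = total_cus * PEAK_MFLOPS_PER_CU
sustained_fraction = TYPICAL_MFLOPS / peak_mflops

print(f"{cus_per_block} CUs per block, {total_cus} CUs in total")
print(f"peak: {peak_mflops} Mflops (about {peak_mflops / 1000:.1f} Gflops)")
print(f"typical/peak: {sustained_fraction:.0%}")
```

On these numbers the typical rate is roughly one sixth of the theoretical peak.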
The machine ran an early version of the Mach kernel for multi-processor support. The compilers were designed to keep the processors as full as possible by reducing the number of branch delay slots, and did a particularly good job of it.
Fatal flaw
Unfortunately, the new leftmost-rightmost algorithm had a fatal flaw. In high-contention cases new requests kept arriving at both ends of the queue, so the "middle" units would never be serviced and could stall for thousands of cycles. By 1989 it was clear that this would require a redesign, but by this point other machines with similar price/performance ratios were coming on the market and the pressure was on to ship immediately. The first two machines were shipped to Caltech and the University of Colorado at Boulder in November 1989, but there were no other immediate sales. One sample ES-1 is in storage at the Computer History Museum.
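The starvation effect can be reproduced with a small simulation. The model below is an assumption-laden sketch rather than a description of the real hardware: it grants one request per cycle from alternating ends of a queue, and it represents "high contention" by having every granted processor immediately re-issue a request on its own side. The function name and parameters are hypothetical.

```python
from collections import deque


def cycles_until_served(victim, left_cpus, right_cpus, max_cycles, rerequest=True):
    """Return the cycle on which `victim` is granted access, or None if it
    is still waiting after `max_cycles` cycles."""
    queue = deque(list(left_cpus) + [victim] + list(right_cpus))
    left_side = set(left_cpus)
    serve_left = True
    for cycle in range(max_cycles):
        # One grant per cycle, alternating between the two ends.
        cpu = queue.popleft() if serve_left else queue.pop()
        serve_left = not serve_left
        if cpu == victim:
            return cycle
        if rerequest:
            # Assumed contention model: the granted CPU asks again at once,
            # rejoining the queue at the end matching its physical side.
            if cpu in left_side:
                queue.appendleft(cpu)
            else:
                queue.append(cpu)
    return None


# Light load: the outer requests drain and the middle unit is served.
light = cycles_until_served(7, [0], [15], max_cycles=5000, rerequest=False)

# Heavy load: the outer units keep re-requesting and the middle unit starves.
heavy = cycles_until_served(7, [0], [15], max_cycles=5000, rerequest=True)

print(light, heavy)
```

Under this model just two persistently busy processors, one on each side, are enough to keep a middle request waiting indefinitely, whereas a plain first-in, first-out queue would serve it after the requests already ahead of it.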
Evans resigned from the E&S board in 1989, and the votes suddenly turned against continuing the project. E&S looked for a buyer interested in continuing the effort but, finding none, closed the division in January 1990.
References
- Robert Schreiber and Horst D. Simon, "Towards the Teraflops Capability for CFD," in Parallel CFD—Implementations and Results Using Parallel Computers, edited by Horst D. Simon, Scientific and Engineering Computation Series, MIT Press, Cambridge, Mass., 1992, pp. 313–341. Cites 6 made.