Intel Details 80-Core Teraflops Research Chip

Intel’s 80-Core Chip Takes Shape, But Will Never Turn “Real”

by Anton Shilov
02/12/2007 | 10:49 PM

Intel Corp. has unveiled several new details about its so-called Teraflops research chip, which demonstrates Intel’s ability to pack 80 processing cores into a single piece of silicon and makes it possible to analyze the behaviour of certain new technologies required to build such processors. But while the chip’s floating-point performance is more than four times that of IBM’s Cell, it will never be a commercial product.

“Our researchers have achieved a wonderful and key milestone in terms of being able to drive multi-core and parallel computing performance forward. It points the way to the near future when Teraflop-capable designs will be commonplace and will reshape what we can all expect from our computers and the Internet at home and in the office,” said Justin R. Rattner, Intel’s chief technology officer.

The Teraflops research chip, built using 65nm process technology, contains 80 cores organized into a 10x8 2D mesh network and operating at a 4GHz clock speed. Intel calls the cores tiles because they are very simple and hardly resemble the cores of modern central processing units. Each tile consists of a processing engine (PE) connected to a 5-port router with mesochronous interfaces, which forwards packets between tiles.
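The layout described above can be pictured with a small Python sketch. It is only an illustration: the coordinate scheme and function names are hypothetical, and it assumes that four of each router's five ports face neighbouring tiles while the fifth serves the local PE, which the article does not spell out.

```python
# Illustrative model of a 10x8 2D mesh of tiles.
# Coordinates and function names are assumptions, not Intel's design data.
COLS, ROWS = 10, 8  # mesh dimensions quoted in the article

def neighbours(x, y):
    """Return the tiles directly linked to tile (x, y) in a mesh with no wraparound."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(cx, cy) for cx, cy in candidates if 0 <= cx < COLS and 0 <= cy < ROWS]

def hop_count(src, dst):
    """Minimum number of router-to-router hops between two tiles (Manhattan distance)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

tiles = [(x, y) for y in range(ROWS) for x in range(COLS)]

print(len(tiles))                    # number of tiles in the mesh
print(neighbours(0, 0))              # a corner tile uses only two mesh-facing ports
print(hop_count((0, 0), (9, 7)))     # corner-to-corner distance in hops
```

Under these assumptions an interior tile has four neighbours, which together with the local PE would account for all five router ports, while edge and corner tiles leave some ports unused.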

The 80-tile on-chip network enables a bisection bandwidth of 256GB/s. The PE contains two independent fully-pipelined single-precision floating-point multiply-accumulator (FPMAC) units, 3KB of single-cycle instruction memory (IMEM), and 2KB of data memory (DMEM). A 96-bit VLIW instruction encodes up to eight operations per cycle. With a 10-port (6-read, 4-write) register file, the architecture allows scheduling to both FPMACs, simultaneous DMEM loads and stores, packet send/receive from the mesh network, program control, and dynamic sleep instructions. A router interface block (RIB) handles packet encapsulation between the PE and the router. The fully symmetric architecture allows any tile to send instruction and data packets to, and receive them from, any other tile, an arrangement that resembles AMD’s multi-processor architecture.
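As a rough sanity check on the 256GB/s figure, the sketch below works backwards to the per-link bandwidth it would imply. Two things here are assumptions rather than Intel data: that the bisection cut halves the 10-tile dimension and so severs eight links (one per row), and that each link is counted in both directions.

```python
# Back-of-envelope check of the bisection bandwidth figure.
# LINKS_CUT and DIRECTIONS are assumptions, not figures published in the article.
BISECTION_GB_S = 256   # from the article
CLOCK_GHZ = 4          # from the article
LINKS_CUT = 8          # assumed: cutting the 10-tile dimension in half severs one link per row
DIRECTIONS = 2         # assumed: traffic in both directions is counted

# Bandwidth each link would need to carry in one direction.
per_link_gb_s = BISECTION_GB_S / (LINKS_CUT * DIRECTIONS)

# Bytes moved per clock cycle on one link in one direction.
bytes_per_cycle = per_link_gb_s / CLOCK_GHZ

print(per_link_gb_s, bytes_per_cycle)
```

If those assumptions hold, each link would move 16GB/s in each direction, or four bytes per clock at 4GHz.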

Intel said that at 3.16GHz the Teraflops research chip can deliver 1.01 Teraflops of performance with a thermal design power (TDP) of 62W at 0.95V. At 5.1GHz and 5.7GHz, performance would rise to 1.63 Teraflops and 1.81 Teraflops, with TDPs of 175W and 265W, respectively. Reaching 1.01 Teraflops is a respectable achievement, as IBM claims 205 Gigaflops per Cell processor (currently the Gigaflops champ) in its QS20 blade servers.
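The quoted performance figures can be roughly reproduced from the tile count and clock speed. The Python sketch below assumes that each FPMAC completes one multiply and one add per cycle, i.e. two floating-point operations, which the article does not state explicitly; the function names are hypothetical.

```python
# Rough peak-throughput arithmetic for the figures quoted in the article.
TILES = 80
FPMACS_PER_TILE = 2
FLOPS_PER_FPMAC_PER_CYCLE = 2  # assumed: one multiply plus one add per cycle

def peak_teraflops(clock_ghz):
    """Theoretical peak in Teraflops at the given clock speed."""
    return TILES * FPMACS_PER_TILE * FLOPS_PER_FPMAC_PER_CYCLE * clock_ghz / 1000.0

def gflops_per_watt(teraflops, watts):
    """Energy efficiency implied by a quoted performance and TDP."""
    return teraflops * 1000.0 / watts

# (clock in GHz, quoted Teraflops, quoted TDP in watts), all from the article
quoted = [(3.16, 1.01, 62), (5.1, 1.63, 175), (5.7, 1.81, 265)]

for ghz, tflops, watts in quoted:
    print(f"{ghz}GHz: computed {peak_teraflops(ghz):.3f} TFLOPS, "
          f"quoted {tflops} TFLOPS, {gflops_per_watt(tflops, watts):.1f} GFLOPS/W")

# Ratio of the 1.01 Teraflops figure to the 205 Gigaflops IBM quotes for Cell
print(f"{1.01 / 0.205:.2f}x Cell")
```

Under that assumption the 3.16GHz and 5.1GHz figures match the computed peak. At 5.7GHz the computed peak comes out slightly above the quoted 1.81 Teraflops, which would be consistent with the clock speed having been rounded. The efficiency column also shows how quickly performance per watt falls as the clock rises.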

Further Tera-scale research will focus on the addition of 3D stacked memory to the chip as well as developing more sophisticated research prototypes with many general-purpose Intel Architecture-based cores. Today, the Intel Tera-scale Computing Research Program has over 100 projects underway that explore other architectural, software and system design challenges.