by Anton Shilov
12/28/2010 | 12:06 PM
Modern graphics processors feature over a thousand processing elements, or cores, and can use them all. Modern commercial microprocessors feature up to twelve cores, while Intel Corp.’s experimental single-chip cloud computer (SCC) features 48 cores. According to an Intel engineer, it is theoretically possible to create a CPU with a thousand cores; the question is how to utilize them.
“I came up with that 1000 number by playing a Moore's Law doubling game. If the integration capacity doubles with each generation and a generation is nominally two years, then in four or five doublings from today's 48 cores, we are at 1000. So this is really a question of how long do we think our fabs can keep up with Moore's Law. If I've learned anything in my 17+ years at Intel, it's never bet against our fabs,” said Timothy Mattson, a principal engineer at Intel's Microprocessor Technology Laboratory, in an interview with ZDNet UK.
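The doubling game Mr. Mattson describes is easy to reproduce. The short Python sketch below is only an illustration: the function names are made up for this example, and it assumes exactly one doubling per two-year generation, as stated in the quote.

```python
def doublings_to_reach(start_cores, target_cores):
    """Count how many doublings are needed before start_cores reaches target_cores."""
    doublings = 0
    cores = start_cores
    while cores < target_cores:
        cores *= 2
        doublings += 1
    return doublings


def project_cores(start_cores, years_per_generation, generations):
    """Return (years from today, core count) for each generation, starting at generation 0."""
    return [
        (generation * years_per_generation, start_cores * 2 ** generation)
        for generation in range(generations + 1)
    ]


if __name__ == "__main__":
    # Assumption taken from the quote: 48 cores today, one doubling every two years.
    for years, cores in project_cores(48, 2, 5):
        print(f"+{years} years: {cores} cores")
    print("Doublings needed to reach 1000 cores:", doublings_to_reach(48, 1000))
```

Four doublings give 768 cores after eight years and five give 1536 cores after ten, which is why the 1000-core mark falls between the fourth and fifth doubling.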
Theoretically, Intel would be able to create a 1000-core, SCC-like homogeneous multi-core central processing unit (CPU) in eight to ten years. The question is whether this is actually necessary: graphics processing units with over one and a half thousand processing elements are already available, but they cannot run operating systems or efficiently solve truly complex problems, such as those computed on modern servers. As a result, both AMD and Intel are working on multi-core heterogeneous microprocessors.
“Speaking from a technical perspective, I can easily see us using 1000 cores. The issue, however, is really one of product strategy and market demands. As I said earlier, in the research world where I work, my job is to stay ahead of the curve so our product groups can take the best products to the market, optimised for usage models demanded by consumers,” added Mr. Mattson.
The prototype chip contains 24 tiles with two x86 cores each, which results in 48 cores, the largest number of such cores ever placed on a single piece of silicon. Each core can run a separate OS and software stack and act like an individual compute node that communicates with other compute nodes over a packet-based network. Every core has its own non-coherent L2 cache, and each tile has router logic that allows tiles to communicate with each other over a 24-router mesh network with 256GB/s of bisection bandwidth. There is no hardware cache coherence support among cores; this simplifies the design, reduces power consumption and encourages the exploration of datacenter-style distributed memory software models on-chip. Each tile (2 cores) can have its own frequency, and groupings of four tiles (8 cores) can each run at their own voltage. The processor has four integrated DDR3 memory controllers, or one controller per twelve cores.
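The figures in the paragraph above follow from a few simple ratios. The Python sketch below merely restates that arithmetic: the constant and function names are illustrative, and the domain counts are derived only from the numbers quoted in this article, not from Intel documentation.

```python
# Figures quoted in the article.
TILES = 24
CORES_PER_TILE = 2
TILES_PER_VOLTAGE_GROUP = 4
MEMORY_CONTROLLERS = 4


def scc_layout():
    """Derive per-chip totals from the tile-level figures quoted in the article."""
    cores = TILES * CORES_PER_TILE
    return {
        "cores": cores,
        # One router per tile forms the mesh network.
        "routers": TILES,
        # Each tile can run at its own frequency.
        "tile_frequency_domains": TILES,
        # Groups of four tiles share a voltage.
        "tile_voltage_groups": TILES // TILES_PER_VOLTAGE_GROUP,
        "cores_per_voltage_group": TILES_PER_VOLTAGE_GROUP * CORES_PER_TILE,
        "cores_per_memory_controller": cores // MEMORY_CONTROLLERS,
    }


if __name__ == "__main__":
    for name, value in scc_layout().items():
        print(f"{name}: {value}")
```

Under these assumptions, the 24 tiles fall into six four-tile voltage groups of eight cores each, and every DDR3 controller serves twelve cores.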
Intel has previously introduced a research processor with 80 cores, but that chip, unlike the SCC, did not even reach researchers outside the company. That first research processor of Intel's Tera-Scale research project delivered 1.6TFLOPS of single-precision performance at a 5GHz clock speed.
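As a rough sanity check, the quoted peak figure can be turned into an implied per-core rate. The Python sketch below uses only the three numbers given above (1.6TFLOPS, 80 cores, 5GHz); the function name is illustrative, and the result is an inference from those numbers rather than a published specification.

```python
def implied_flops_per_core_per_cycle(total_flops, cores, clock_hz):
    """Divide peak throughput by (cores * clock) to get operations per core per cycle."""
    return total_flops / (cores * clock_hz)


if __name__ == "__main__":
    # Figures quoted in the article: 1.6 TFLOPS, 80 cores, 5 GHz.
    rate = implied_flops_per_core_per_cycle(1.6e12, 80, 5e9)
    print(f"Implied single-precision operations per core per cycle: {rate:.1f}")
```

The quoted numbers imply about four single-precision floating-point operations per core in each clock cycle.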