

The next big thing for supercomputers is projected to be exascale machines. The leading chip designers are working on technologies that will enable the next leap in high-performance computing (HPC). According to an HPC expert from the University of Tennessee, exascale systems will require new central processors, graphics processors, or hybrids that combine both on the same piece of silicon. Cell chips, which are heterogeneous multi-core processors, are a dead end, however.

Building an exaFLOPS machine is a huge challenge. Even Advanced Micro Devices and Intel Corp., whose x86 central processing units (CPUs) power the vast majority of supercomputers, admit that building a machine capable of performing a quintillion floating point operations per second (10^18 FLOPS, or one exaFLOPS) or more using x86 chips alone is hardly feasible. As a result, AMD is trying to incorporate its FireStream compute accelerators, which are based on massively parallel graphics processing units (GPUs), into high-performance computing systems, whereas Intel is developing accelerators of its own, the first of which, code-named Knights Corner, is projected to be released sometime in 2012 or 2013.

GPUs Are Not Ideal for Supercomputers

However, accelerators designed for the PCI Express bus are far from ideal, since communication between the CPU and the GPU is a weak spot of such systems. Moreover, GPUs are not easy to program. One way to overcome these issues is to integrate CPUs and GPUs on the same chip.

“The obvious upside of GPUs is that they provide compelling performance for modest prices. The downside is that they are more difficult to program, since at the very least you will need to write one program for the CPUs and another program for the GPUs. Another problem that GPUs present pertains to the movement of data. Any machine that requires a lot of data movement will never come close to achieving its peak performance. The CPU-GPU link is a thin pipe, and that becomes the strangle-point for the effective use of GPUs. In the future this problem will be addressed by having the CPU and GPU integrated in a single socket,” said Jack Dongarra, director of the Innovative Computing Laboratory and of the Center for Information Technology Research at the University of Tennessee, in an interview with the Next Big Future website.
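The "thin pipe" argument can be made concrete with a simple roofline-style bound, a standard back-of-the-envelope model used here as an illustration rather than Dongarra's own analysis. The sketch assumes every byte the GPU processes must first cross the CPU-GPU link; the function name and the example figures (a hypothetical 500 GFLOPS accelerator behind an 8 GB/s link, roughly one direction of a PCI Express 2.0 x16 slot) are assumptions chosen for illustration.

```python
def attainable_fraction(flops_per_byte: float, peak_gflops: float,
                        link_gb_per_s: float) -> float:
    """Fraction of an accelerator's peak speed that is reachable when all of
    its input must travel over the CPU-GPU link (simple roofline bound).

    flops_per_byte: arithmetic intensity, i.e. floating point operations
                    performed per byte moved across the link.
    """
    # The link can feed at most flops_per_byte * link_gb_per_s GFLOPS of work.
    link_limited_gflops = flops_per_byte * link_gb_per_s
    attainable_gflops = min(peak_gflops, link_limited_gflops)
    return attainable_gflops / peak_gflops


# Hypothetical accelerator: 500 GFLOPS peak behind an 8 GB/s link.
for intensity in (2, 10, 100):
    share = attainable_fraction(intensity, peak_gflops=500, link_gb_per_s=8)
    print(f"{intensity:>3} flops/byte -> {share:.1%} of peak")
```

With these assumed numbers, a kernel doing 2 flops per transferred byte reaches only 3.2% of peak, while one doing 100 flops per byte becomes compute-bound. Integrating the CPU and GPU in one socket effectively raises `link_gb_per_s`, which is why it eases the bottleneck.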

Cell is Dead for HPC

Chips that contain both x86 general-purpose cores and graphics processing cores, which AMD calls Fusion, are essentially heterogeneous multi-core processors. The vast majority of multi-core chips today are homogeneous, containing a number of similar processing engines. Processors with different types of cores do exist: the Cell chips jointly developed by IBM, Sony Corp. and Toshiba Corp. originally promised to redefine the market for multimedia chips as well as for HPC CPUs. However, since all three companies have stopped developing Cell, it has no future.

“The Cell architecture is no longer being developed, so it is effectively dead. No new supercomputers will use Cell,” claimed Mr. Dongarra.

New Paradigms

But even when central processors and highly parallel accelerators share the same piece of silicon, exascale systems will still need optimizations at the platform and software levels to be efficient.

“The current memory paradigm is hierarchical, based on registers, L1 and L2 caches, local memory, shared memory, and distributed memory among nodes. That is a potential model for exaFLOPS systems. However, we want exaFLOPS systems to be designed to be relatively easy to program. We therefore want a globally shared address space, and explicit methods to pass data between the processors in order to orchestrate the unfolding computation. That paradigm may be necessary for a machine that has a billion threads,” explained the HPC specialist.
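One common way to realize a globally shared address space with explicit data movement is a partitioned global address space (PGAS), in which every element has a single global index but physically lives on one node. The sketch below, with hypothetical names and a simple block distribution, shows how a global index maps to an owning node and local offset, and how `put` and `get` become explicit accesses to that node's memory. It illustrates the general idea only and is not the programming model Dongarra proposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalAddress:
    node: int    # node that owns the element
    offset: int  # position within that node's local memory


def locate(global_index: int, total: int, num_nodes: int) -> GlobalAddress:
    """Map a global index to (owning node, local offset) under a block distribution."""
    if not 0 <= global_index < total:
        raise IndexError(global_index)
    block = -(-total // num_nodes)  # ceiling division: elements per node
    return GlobalAddress(global_index // block, global_index % block)


def make_memories(total: int, num_nodes: int) -> list[list[float]]:
    """Simulate each node's local memory as a separate list (hypothetical helper)."""
    block = -(-total // num_nodes)
    return [[0.0] * block for _ in range(num_nodes)]


def put(memories: list[list[float]], total: int, global_index: int, value: float) -> None:
    """Explicitly write a value into whichever node owns global_index."""
    addr = locate(global_index, total, len(memories))
    memories[addr.node][addr.offset] = value


def get(memories: list[list[float]], total: int, global_index: int) -> float:
    """Explicitly read a value from whichever node owns global_index."""
    addr = locate(global_index, total, len(memories))
    return memories[addr.node][addr.offset]


memories = make_memories(total=10, num_nodes=4)
put(memories, total=10, global_index=5, value=3.5)
print(locate(5, 10, 4), get(memories, 10, 5))
```

The programmer sees one index space, while the mapping in `locate` keeps track of where data actually lives, so every remote access is an explicit, visible operation.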

$200 Million, 20 Megawatts

Asked how much an exaFLOPS-capable machine will cost and what its specifications are likely to be, the professor pointed out that the cost could be as high as $200 million and that power consumption could be gargantuan.

“The maximum price will be no more than $200 million, and the maximum power budget will be 20MW. It will contain about 64PB of RAM, so that alone will probably cost $100 million. Given that our Jaguar now consumes 7MW, keeping within a 20MW budget will be a major challenge,” said Mr. Dongarra.
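Dongarra's figures imply a few derived numbers worth spelling out. The sketch below assumes 1 exaFLOPS = 10^18 floating point operations per second, decimal petabytes (10^15 bytes), and that the entire 20MW budget is counted against peak performance; the variable names are illustrative.

```python
EXAFLOPS = 1e18          # floating point operations per second
POWER_BUDGET_W = 20e6    # 20MW, the quoted maximum
JAGUAR_POWER_W = 7e6     # 7MW, Jaguar's quoted consumption
RAM_BYTES = 64e15        # 64PB, assuming decimal petabytes
RAM_COST_USD = 100e6     # quoted estimate for the memory alone
MAX_PRICE_USD = 200e6    # quoted maximum price

# Energy efficiency needed to reach 1 exaFLOPS within the power budget.
gflops_per_watt = EXAFLOPS / POWER_BUDGET_W / 1e9
# Implied memory price per (decimal) gigabyte.
usd_per_gb = RAM_COST_USD / (RAM_BYTES / 1e9)
# Share of the maximum price consumed by memory.
memory_share = RAM_COST_USD / MAX_PRICE_USD
# How much more power the exascale budget allows compared with Jaguar today.
power_ratio = POWER_BUDGET_W / JAGUAR_POWER_W

print(f"{gflops_per_watt:.0f} GFLOPS per watt required")
print(f"${usd_per_gb:.2f} per GB of RAM")
print(f"{memory_share:.0%} of the maximum price goes to memory")
print(f"{power_ratio:.2f}x Jaguar's power draw")
```

Under these assumptions, the machine must deliver about 50 GFLOPS per watt, memory alone takes half of the maximum price (roughly $1.56 per GB), and the whole system may draw less than three times Jaguar's power.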

New Chips Needed

To stay within a 20MW power budget, exascale supercomputers will need either a huge number of very simple, low-power chips or highly integrated chips that feature special-purpose co-processors, built-in graphics cores, or both.

“There are two models that we can use to get to an exaflop while staying within a 20MW budget. The first model employs huge numbers of lightweight processors, such as IBM Blue Gene Processor running at 1.0GHz. If we use 1 million chips, and each chip has 1000 cores, then we can get to a potential billion threads of execution. The other approach is a hybrid that makes extensive use of coprocessors or GPUs. It would use a 1.0GHz processor and 10,000 floating point units per socket, and 100,000 sockets per system,” explained the HPC expert.
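The arithmetic behind both models can be checked directly. The sketch assumes each core or floating point unit completes one floating point operation per cycle (a fused multiply-add would count as two) and, purely for illustration, divides the whole 20MW budget evenly across chips or sockets, ignoring memory, interconnect and cooling; the function and variable names are hypothetical.

```python
CLOCK_HZ = 1_000_000_000      # 1.0GHz in both models
POWER_BUDGET_W = 20_000_000   # 20MW


def peak_flops(units_per_chip: int, chips: int, clock_hz: int = CLOCK_HZ,
               flops_per_unit_per_cycle: int = 1) -> int:
    """Peak floating point rate; by default each unit issues one op per cycle (assumption)."""
    return units_per_chip * chips * clock_hz * flops_per_unit_per_cycle


# Model 1: many lightweight processors (1 million chips x 1,000 cores).
lightweight_threads = 1_000 * 1_000_000
lightweight_peak = peak_flops(units_per_chip=1_000, chips=1_000_000)
watts_per_chip = POWER_BUDGET_W / 1_000_000

# Model 2: hybrid sockets (100,000 sockets x 10,000 floating point units).
hybrid_peak = peak_flops(units_per_chip=10_000, chips=100_000)
watts_per_socket = POWER_BUDGET_W / 100_000

print(f"Lightweight: {lightweight_threads:,} threads, {lightweight_peak:.0e} FLOPS, "
      f"{watts_per_chip:.0f} W per chip")
print(f"Hybrid: {hybrid_peak:.0e} FLOPS, {watts_per_socket:.0f} W per socket")
```

Both configurations reach 10^18 FLOPS only under the one-operation-per-cycle assumption. The even split of the budget (about 20 W per lightweight chip versus 200 W per hybrid socket) suggests why the first model relies on very simple cores and the second on dense, power-efficient floating point units.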

Tags: HPC, GPGPU, AMD, Fusion, ATI, Knights Corner, x86, Intel, Nvidia, FireStream, Tesla


Comments currently: 2
Discussion started: 07/07/10 07:26:43 PM
Latest comment: 07/09/10 04:16:10 AM


Very nice article!
One thing: 1 [mW] = 0.001 [W] and 1 [MW] = 1,000,000 [W]
That means mega is M, not m,
and 20mW is 20 milliwatts (= 0.02 [W])

[Posted by: cogee | Date: 07/07/10 07:26:43 PM]

GPUs have a 256-bit architecture, hundreds of millions more transistors, different designs and pipelines, and caches specifically designed for their purpose, while a CPU might be similar but is different, in its own league and its own world, and still a very important part. Let's hope one day it all unifies into just one chip with 128 cores and 100 billion transistors running at 10GHz with 1TB L1, L2 and L3 caches LOL just kidding
[Posted by: mike1101 | Date: 07/09/10 04:16:10 AM]

