by Anton Shilov
07/06/2010 | 11:15 AM
The next big thing in supercomputers is projected to be exascale machines, and the leading chip designers are working on technologies that will enable the next leap in high-performance computing. According to an HPC expert from the University of Tennessee, exascale systems will require new central processors, graphics processors, or hybrids that combine both on the same piece of silicon. Cell chips, which are heterogeneous multi-core processors, are a dead end.
Building an exaFLOPS machine is a huge challenge. Even Advanced Micro Devices and Intel Corp. – whose x86 central processing units (CPUs) power the absolute majority of supercomputers – admit that building a machine capable of performing a quintillion floating point operations per second (10¹⁸ FLOPS, or one exaFLOPS) or more using x86 chips alone is hardly feasible. As a result, AMD is trying to incorporate special FireStream compute accelerators (based on its massively parallel graphics processing units [GPUs]) into high-performance computing systems, whereas Intel is working on similar accelerators, the first of which is code-named Knights Corner and is projected to be released sometime in 2012 or 2013.
But accelerators designed for the PCI Express bus are hardly ideal, since communication between CPUs and GPUs is not a strong point of such systems. Moreover, GPUs are not easy to program for. One way to overcome these issues is to integrate CPUs and GPUs onto a single chip.
“The obvious upside of GPUs is that they provide compelling performance for modest prices. The downside is that they are more difficult to program, since at the very least you will need to write one program for the CPUs and another program for the GPUs. Another problem that GPUs present pertains to the movement of data. Any machine that requires a lot of data movement will never come close to achieving its peak performance. The CPU-GPU link is a thin pipe, and that becomes the strangle-point for the effective use of GPUs. In the future this problem will be addressed by having the CPU and GPU integrated in a single socket,” said Jack Dongarra, director of the Innovative Computing Laboratory and director of the Center for Information Technology Research at the University of Tennessee, in an interview with the Next Big Future web-site.
Chips that contain both x86 general-purpose cores and graphics processing cores are essentially heterogeneous multi-core processors, which AMD calls Fusion. The vast majority of multi-core chips today are homogeneous, containing a number of similar processing engines. There have been processors with different types of cores – the Cell chips jointly developed by IBM, Sony Corp. and Toshiba Corp. – which originally promised to redefine the market for multimedia chips as well as CPUs for HPC. However, since all three companies have ceased to develop Cell, it has no future.
“The Cell architecture is no longer being developed, so it is effectively dead. No new supercomputers will use Cell,” claimed Mr. Dongarra.
But even when central processors and highly parallel accelerators share the same piece of silicon, exascale systems will still need certain optimizations at the platform and software levels to be efficient.
“The current memory paradigm is hierarchical, based on registers, L1 and L2 caches, local memory, shared memory, and distributed memory among nodes. That is a potential model for exaFLOPS systems. However, we want exaFLOPS systems to be designed to be relatively easy to program. We therefore want a globally shared address space, and explicit methods to pass data between the processors in order to orchestrate the unfolding computation. That paradigm may be necessary for a machine that has a billion threads,” explained the HPC specialist.
Asked how much an exaFLOPS-capable machine will cost and what its specifications are likely to be, the professor pointed out that the cost could be as high as $200 million and the power consumption gargantuan.
“The maximum price will be no more than $200 million, and the maximum power budget will be 20MW. It will contain about 64PB of RAM, so that alone will probably cost $100 million. Given that our Jaguar now consumes 7MW, keeping within a 20MW budget will be a major challenge,” said Mr. Dongarra.
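Those figures imply a steep jump in energy efficiency. A back-of-the-envelope check using only the numbers quoted above (the efficiency and memory-price targets are derived here, not quoted):

```python
# Back-of-the-envelope targets for an exascale machine,
# using only the figures quoted above.

EXAFLOPS = 1e18          # target: 10^18 floating point ops per second
POWER_BUDGET_W = 20e6    # 20MW maximum power budget
RAM_BYTES = 64e15        # 64PB of RAM (decimal petabytes assumed)
RAM_COST_USD = 100e6     # ~$100 million estimated for memory alone

# Required compute efficiency: 50 GFLOPS per watt
flops_per_watt = EXAFLOPS / POWER_BUDGET_W
print(f"Required efficiency: {flops_per_watt / 1e9:.0f} GFLOPS/W")

# Implied memory price: about $1.56 per GB
usd_per_gb = RAM_COST_USD / (RAM_BYTES / 1e9)
print(f"Implied RAM price: ${usd_per_gb:.2f}/GB")
```

For scale, a 2010-era system drawing 7MW would need to deliver its performance at roughly 50 GFLOPS/W to hit the same target – orders of magnitude beyond what CPUs of the day achieved.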
In order to actually stay within the 20MW power budget, exascale supercomputers will need either to utilize a huge number of very low-power, simplistic chips or to incorporate highly integrated chips that feature special-purpose co-processors, built-in graphics cores, or both.
“There are two models that we can use to get to an exaflop while staying within a 20MW budget. The first model employs huge numbers of lightweight processors, such as an IBM Blue Gene processor running at 1.0GHz. If we use 1 million chips, and each chip has 1000 cores, then we can get to a potential billion threads of execution. The other approach is a hybrid that makes extensive use of coprocessors or GPUs. It would use a 1.0GHz processor and 10,000 floating point units per socket, and 100,000 sockets per system,” explained the HPC expert.
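The arithmetic behind both models checks out; a quick sketch (the one-FLOP-per-core-per-cycle assumption for the lightweight model is ours, not stated in the quote):

```python
# Sanity-check the two roads to an exaflop described above.
CLOCK_HZ = 1.0e9  # 1.0GHz in both models

# Model 1: huge numbers of lightweight (Blue Gene-style) processors.
chips = 1_000_000
cores_per_chip = 1_000
threads = chips * cores_per_chip        # a billion threads of execution
# Assuming one floating point op per core per cycle:
lightweight_flops = threads * CLOCK_HZ  # 10^18 FLOPS

# Model 2: hybrid with extensive use of coprocessors or GPUs.
sockets = 100_000
fpus_per_socket = 10_000
hybrid_flops = sockets * fpus_per_socket * CLOCK_HZ  # 10^18 FLOPS

print(threads)            # 1000000000
print(lightweight_flops)  # 1e+18
print(hybrid_flops)       # 1e+18
```

Either way, the machine ends up with on the order of a billion concurrent execution streams, which is exactly why Dongarra stresses the programming-model question above.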