Nvidia Fermi Powers World's Highest-Performing Supercomputer
General-purpose computing on graphics processing units (GPGPU) is an idea that is almost ten years old. Thanks to their massively parallel architecture, GPUs are well suited to parallel computing. This year Nvidia supplied the compute cards that power the world's most powerful supercomputer, the Tianhe-1A.
The road to Fermi in its final incarnation was a bumpy one for Nvidia. The company first demonstrated the chip, code-named GF100, in September 2009 and said that it would have 512 compute elements with a kind of multi-threading technology and 768KB of L2 cache, and that the graphics processor would deliver massive performance in double-precision floating point operations. From October 2009 to April 2010, the company delayed the chip a number of times, blamed poor yields at TSMC for the delays, and made rather bold promises.
In the end, the flagship Fermi-class GeForce graphics processor for games was cut down to 480 compute elements, whereas the most powerful Fermi-class Tesla chip for computations featured only 448 stream processors in order to improve yields. Moreover, the GeForce GTX 480 was slower than the ATI Radeon HD 5970 and offered no substantial advantage over the ATI Radeon HD 5870, which had been released over half a year earlier. The company's position in the graphics market was tough. But then all the effort that Nvidia had put into GPGPU suddenly started to pay off.
Starting around the middle of the decade, Nvidia spent a lot of resources on creating its CUDA platform for GPU computing and eventually also integrated a good deal of compute-specific logic into its Fermi-series graphics processors. For example, the Fermi architecture is tailored to deliver maximum double-precision floating point compute performance; Fermi's SIMD processors can both read from and write to the unified L2 cache, something that is needed for compute, is less needed for graphics, and that AMD decided to skip.
In November 2010 Nvidia also became the first to reap the fruits of its efforts. Three of the top five supercomputers in the world, including the #1 Tianhe-1A with 2.566 petaFLOPS of performance, were powered by Nvidia Tesla 2000-series compute cards. Moreover, the same cards were used in many other HPC systems.
According to the updated Top 500 list of supercomputers, the most powerful system today is the Tianhe-1A, located at the National Supercomputing Center (NSC) in Tianjin, China. The system scores 2.566 petaFLOPS (PFLOPS) in the LINPACK benchmark and can theoretically perform 4.7 quadrillion floating point operations per second (FLOPS). It is powered by 14,336 six-core Intel Xeon X5670 (2.93GHz) central processing units (CPUs) as well as 7,168 Nvidia Tesla 2050 compute boards.
Other supercomputers powered by Nvidia Tesla in the top 5 are Nebulae (1.271PFLOPS, 4,640 Tesla compute boards, 2.55MW), which belongs to the NSC in Shenzhen, China, and Tsubame 2.0 (1.192PFLOPS, 4,200 Tesla compute boards, 1.340MW), located at the GSIC Center of the Tokyo Institute of Technology in Japan.
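As a rough cross-check of the 4.7 PFLOPS theoretical figure, the peak can be estimated from the component counts quoted above. This is only a sketch: the per-device peak values are assumptions that do not come from the article, namely 4 double-precision operations per core per clock for the Xeon X5670 and about 515 GFLOPS of double-precision peak per Tesla board.

```python
# Component counts and clock speed as quoted in the article.
CPU_COUNT = 14_336
CORES_PER_CPU = 6
CPU_CLOCK_GHZ = 2.93
GPU_COUNT = 7_168

# ASSUMPTIONS (not from the article): per-device double-precision peaks.
CPU_FLOPS_PER_CORE_PER_CYCLE = 4   # assumed for the Xeon X5670
GPU_PEAK_GFLOPS = 515.0            # assumed per Tesla compute board


def cpu_peak_gflops(cpus, cores, clock_ghz, flops_per_cycle):
    """Aggregate theoretical CPU peak in GFLOPS."""
    return cpus * cores * clock_ghz * flops_per_cycle


def gpu_peak_gflops(boards, gflops_per_board):
    """Aggregate theoretical GPU peak in GFLOPS."""
    return boards * gflops_per_board


def estimated_rpeak_pflops():
    """Estimated system peak in PFLOPS (1 PFLOPS = 1e6 GFLOPS)."""
    cpu = cpu_peak_gflops(CPU_COUNT, CORES_PER_CPU, CPU_CLOCK_GHZ,
                          CPU_FLOPS_PER_CORE_PER_CYCLE)
    gpu = gpu_peak_gflops(GPU_COUNT, GPU_PEAK_GFLOPS)
    return (cpu + gpu) / 1e6


if __name__ == "__main__":
    print(f"Estimated Rpeak: {estimated_rpeak_pflops():.2f} PFLOPS")
```

Under these assumptions the GPUs account for the bulk of the theoretical peak, and the total lands close to the 4.7 PFLOPS quoted for the system.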
It is interesting to note that Tianhe-1A's theoretical Rpeak rating is about 83% higher than its Rmax performance in the LINPACK benchmark; put differently, the measured result is roughly 45% below the theoretical peak. For CPU-based clusters the difference is usually around 30%, but the GPU-based supercomputers listed in the Top 500 tend to show a much larger gap between actual and theoretical performance.
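The relationship between Rmax and Rpeak can be checked with the two Tianhe-1A figures quoted earlier (2.566 PFLOPS measured, 4.7 PFLOPS theoretical). The small sketch below uses illustrative function names and shows that the gap can be expressed in two different ways, depending on which number is taken as the base.

```python
# Tianhe-1A figures as quoted in the article, in PFLOPS.
RMAX_PFLOPS = 2.566   # measured LINPACK result
RPEAK_PFLOPS = 4.7    # theoretical peak


def linpack_efficiency(rmax, rpeak):
    """Fraction of the theoretical peak actually achieved in LINPACK."""
    return rmax / rpeak


def shortfall_vs_peak(rmax, rpeak):
    """How far Rmax falls below Rpeak, relative to Rpeak."""
    return (rpeak - rmax) / rpeak


def peak_excess_over_rmax(rmax, rpeak):
    """How far Rpeak exceeds Rmax, relative to Rmax."""
    return (rpeak - rmax) / rmax


if __name__ == "__main__":
    eff = linpack_efficiency(RMAX_PFLOPS, RPEAK_PFLOPS)
    low = shortfall_vs_peak(RMAX_PFLOPS, RPEAK_PFLOPS)
    high = peak_excess_over_rmax(RMAX_PFLOPS, RPEAK_PFLOPS)
    print(f"LINPACK efficiency:     {eff:.1%}")
    print(f"Rmax below Rpeak by:    {low:.1%}")
    print(f"Rpeak above Rmax by:    {high:.1%}")
```

With these inputs the system reaches a little over half of its theoretical peak: Rpeak is about 83% above Rmax, which is the same as saying Rmax is about 45% below Rpeak.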
Although supercomputers powered by compute accelerators like Nvidia Tesla are only beginning to take off, their performance-per-watt efficiency is pretty spectacular. For example, Tianhe-1A consumes 4.04MW of power, whereas a CPU-based cluster built on today's microprocessors with 2.566PFLOPS of performance would have used 50 thousand CPUs and consumed 12.7MW of power, according to Nvidia. One of the most notable new entries in the Top 500 is Tsubame 2.0, the new supercomputer from the Tokyo Institute of Technology. The system delivers petaflop-class performance while remaining extremely efficient, consuming just 1.340MW, dramatically less power than any other system in the top five.
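The efficiency comparison can be made concrete by dividing each system's LINPACK result by its power draw. The sketch below uses only the performance and power figures quoted in this article; the CPU-only cluster is Nvidia's hypothetical estimate, not a real system, and the dictionary and function names are illustrative.

```python
# (LINPACK PFLOPS, power in MW) as quoted in the article.
SYSTEMS = {
    "Tianhe-1A": (2.566, 4.04),
    "Nebulae": (1.271, 2.55),
    "Tsubame 2.0": (1.192, 1.340),
    # Hypothetical CPU-only cluster, per Nvidia's estimate.
    "CPU-only cluster (estimate)": (2.566, 12.7),
}


def mflops_per_watt(pflops, megawatts):
    """Convert PFLOPS and MW into MFLOPS per watt.

    1 PFLOPS = 1e9 MFLOPS and 1 MW = 1e6 W.
    """
    return (pflops * 1e9) / (megawatts * 1e6)


if __name__ == "__main__":
    for name, (pflops, mw) in SYSTEMS.items():
        print(f"{name:30s} {mflops_per_watt(pflops, mw):7.1f} MFLOPS/W")
```

On these numbers Tianhe-1A delivers roughly three times the performance per watt of the estimated CPU-only cluster, and Tsubame 2.0 comes out ahead of both Tianhe-1A and Nebulae.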
"Tsubame 2.0 is an impressive achievement, balancing performance and power to deliver the most energy efficient petaflop-class supercomputer ever built. The path to exascale computing will be forged by groundbreaking systems like Tsubame 2.0," said Bill Dally, chief scientist at Nvidia.