AMD and Penguin Build World's First HPC Cluster Based on Fusion APUs

Penguin Computing Installs First HPC Cluster Running Hybrid CPU-GPU Chips

by Anton Shilov
11/02/2011 | 09:41 PM

Penguin Computing, experts in high performance computing (HPC) solutions, on Wednesday announced that the company has successfully installed the world’s first HPC cluster powered by AMD accelerated processing units (APUs) at Sandia National Labs in Albuquerque, New Mexico.


The Altus 2A00 system comprises 104 servers, each powered by a single A-series Fusion Llano APU with four x86 cores and 320 or 400 stream processors, interconnected through a QDR InfiniBand fabric. It delivers a theoretical peak performance of 59.6 TFLOPS. The Altus 2A00 was designed by Penguin Computing, in partnership with AMD, specifically to support the AMD Fusion APU architecture. It is the world's first Fusion APU system in a rack-mountable 2U chassis.
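The quoted peak figure can be sanity-checked with simple arithmetic. The sketch below is only an illustration: the node count, core count, stream processor count, and the 59.6 TFLOPS total come from the article, while the clock speeds and per-cycle throughput are assumptions chosen for a top-end Llano part and are not stated by Penguin or AMD here.

```python
# Back-of-the-envelope check of the quoted cluster peak (single precision).
NODES = 104                  # from the article
QUOTED_PEAK_TFLOPS = 59.6    # from the article
STREAM_PROCESSORS = 400      # from the article (larger of the two configurations)
CPU_CORES = 4                # from the article

# Assumptions, NOT stated in the article:
GPU_CLOCK_GHZ = 0.6                 # assumed GPU clock
CPU_CLOCK_GHZ = 2.9                 # assumed CPU clock
GPU_FLOPS_PER_SP_PER_CYCLE = 2      # assumed: one multiply-add per cycle
CPU_FLOPS_PER_CORE_PER_CYCLE = 8    # assumed: 4-wide add plus 4-wide multiply


def node_peak_gflops():
    """Theoretical peak of one server (one APU) in GFLOPS."""
    gpu = STREAM_PROCESSORS * GPU_FLOPS_PER_SP_PER_CYCLE * GPU_CLOCK_GHZ
    cpu = CPU_CORES * CPU_FLOPS_PER_CORE_PER_CYCLE * CPU_CLOCK_GHZ
    return gpu + cpu


def cluster_peak_tflops(nodes=NODES):
    """Theoretical peak of the whole cluster in TFLOPS."""
    return nodes * node_peak_gflops() / 1000.0


if __name__ == "__main__":
    print(f"per node: {node_peak_gflops():.1f} GFLOPS")
    print(f"cluster:  {cluster_peak_tflops():.2f} TFLOPS "
          f"(article quotes {QUOTED_PEAK_TFLOPS})")
```

Under these assumptions the GPU portion accounts for most of each node's peak, which is consistent with the article's emphasis on the stream processors.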

"With the Altus 2A00, Penguin is the first to bring AMD’s unique APU capabilities to the HPC community. We are extremely proud of our successful deployment of this platform on such a large scale. We believe that the high level of integration and the resulting benefits for HPC users will further accelerate the adoption of the GPU processing model in HPC. The APU architecture has the potential to become a key component of future exascale systems," said Phil Pokorny, chief technology officer of Penguin Computing.

Numerous high-performance computing (HPC) customers have deployed systems with compute accelerator cards, such as Nvidia Tesla or AMD FireStream, that are based on graphics processing units (GPUs). Those systems are powered by both x86 microprocessors and compute cards, and are used to achieve record HPC performance in highly parallelized tasks. GPU-accelerated clusters are very complex and consume a lot of power; using chips that combine x86 cores with a multitude of stream processors greatly reduces both. The purpose of this particular machine is not to set performance records, but to allow AMD and Penguin to understand the challenges hybrid processors face in HPC in order to create efficient hardware and software in the future.

The APU includes up to 400 parallel processing cores that can be leveraged for HPC applications through the OpenCL programming framework. Unlike conventional GPU server architectures, APU parallel multiprocessors share the same physical memory space with CPU cores. As a result, the programming model for APUs is simpler, bottlenecks for data movement between GPU and main memory are avoided, and data duplication is eliminated. These capabilities offer particularly compelling benefits when deployed in conjunction with low-latency RDMA interconnects such as InfiniBand, as they allow for building efficient distributed GPU applications.
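The difference between staging data in a separate device buffer and working on shared memory can be illustrated with a host-side analogy. The sketch below is not OpenCL and does not touch a GPU; it uses Python's standard `array` and `memoryview` types purely to contrast "copy in, compute, copy out" with "compute in place on the same memory". The function names are hypothetical.

```python
import array


def run_with_copy(host_data):
    """Discrete-GPU style: duplicate the data, compute, copy the result back."""
    device_buffer = array.array("f", host_data)  # explicit copy "to the device"
    for i in range(len(device_buffer)):
        device_buffer[i] *= 2.0                  # the "kernel"
    host_data[:] = device_buffer                 # explicit copy "back to the host"
    return 2 * len(host_data)                    # number of elements copied


def run_shared(host_data):
    """APU style: the "kernel" works on the same memory through a view."""
    with memoryview(host_data) as view:          # no duplicate buffer is created
        for i in range(len(view)):
            view[i] *= 2.0
    return 0                                     # number of elements copied


if __name__ == "__main__":
    a = array.array("f", [1.0, 2.0, 3.0])
    b = array.array("f", [1.0, 2.0, 3.0])
    print(run_with_copy(a), list(a))
    print(run_shared(b), list(b))
```

Both functions produce the same result; only the second avoids the duplicate buffer and the two transfers, which is the property the shared memory space is meant to provide.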

"We are interested in research on next generation computer architectures and look forward to collaborating with Penguin and AMD to advance power-efficient computing strategies. This first-of-a-kind cluster of Altus 2A00 servers will support our exploration of advanced programming models like OpenCL, which seamlessly map MPI applications to the CPU and GPU cores, and research into system software support for advanced data movement capabilities," said James Ang, manager of the scalable computer architectures department at Sandia National Labs.