Cray, Intel and Lawrence Livermore Laboratory Co-Develop Next-Gen HPC System

Catalyst Supercomputer Uses 800GB of Non-Volatile RAM per Node

by Anton Shilov
11/04/2013 | 11:50 PM

Lawrence Livermore National Laboratory (LLNL), in partnership with Intel Corp. and Cray, has announced Catalyst, a unique high performance computing (HPC) cluster that will serve research scientists at all three institutions and provide a proving ground for new HPC and Big Data technologies and architectures.

"As the name implies, Catalyst aims to accelerate HPC simulation and big data innovation, as well as collaborations between the three institutions. The partnership between Intel, Cray and LLNL allows us to explore different approaches for utilizing large amounts of high performance non-volatile memory in HPC simulation and Big Data analytics," said Matt Leininger, deputy of advanced technology projects for LLNL.

The Catalyst cluster consists of two scalable units (SUs) and represents an upgrade from the Appro clusters acquired under the tri-lab Linux capacity cluster (TLCC-2) procurement a few years ago (Appro has since been acquired by Cray). TLCC aggregates the HPC capacity computing needs of the three weapons laboratories that serve the National Nuclear Security Administration's (NNSA) Advanced Simulation and Computing (ASC) program (Lawrence Livermore, Los Alamos and Sandia national laboratories) so that they can procure commodity cluster systems more cost-effectively.

The 150TFLOPS (trillion floating-point operations per second) Catalyst cluster has 324 nodes and 7776 cores, and employs the latest-generation 12-core Intel Xeon E5-2695 v2 processors. Catalyst runs the NNSA-funded tri-lab open source software (TOSS), which provides a common user environment across NNSA tri-lab clusters. Catalyst features include 128GB of dynamic random access memory (DRAM) per node, 800GB of non-volatile memory (NVRAM) per compute node, 3.2TB of NVRAM per Lustre router node, and improved cluster networking with dual-rail quad data rate (QDR-80) Intel TrueScale fabrics. The addition of an expanded node-local NVRAM storage tier based on high-bandwidth PCIe Intel solid state drives (SSDs) allows for the exploration of new approaches to application checkpointing, in-situ visualization, out-of-core algorithms and big data analytics.
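The headline figures above can be cross-checked with a little arithmetic. The sketch below is a back-of-the-envelope estimate, not an official LLNL or Intel calculation. Two socket per node is inferred from 7776 / (324 × 12). The 2.4GHz base clock of the Xeon E5-2695 v2 and the 8 double-precision FLOPs per core per cycle (one 4-wide AVX add plus one 4-wide AVX multiply on Ivy Bridge) are assumptions not stated in the article. The NVRAM in the Lustre router nodes is left out because the article does not give their count.

```python
# Back-of-the-envelope check of the Catalyst figures (a sketch, not official numbers).
NODES = 324
CORES_PER_SOCKET = 12        # Xeon E5-2695 v2, from the article
SOCKETS_PER_NODE = 2         # assumption: inferred from 7776 / (324 * 12)
CLOCK_HZ = 2.4e9             # assumption: E5-2695 v2 base clock
DP_FLOPS_PER_CYCLE = 8       # assumption: AVX 4-wide add + 4-wide mul per cycle
DRAM_GB_PER_NODE = 128       # from the article
NVRAM_GB_PER_NODE = 800      # per compute node, from the article


def catalyst_totals():
    """Return (cores, peak TFLOPS, total DRAM in TB, total compute-node NVRAM in TB)."""
    cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
    # Theoretical peak = cores * clock * FLOPs per cycle, expressed in TFLOPS.
    peak_tflops = cores * CLOCK_HZ * DP_FLOPS_PER_CYCLE / 1e12
    # Decimal units (1TB = 1000GB), matching how vendors quote capacity.
    dram_tb = NODES * DRAM_GB_PER_NODE / 1000
    nvram_tb = NODES * NVRAM_GB_PER_NODE / 1000
    return cores, peak_tflops, dram_tb, nvram_tb


cores, peak, dram, nvram = catalyst_totals()
print(f"cores: {cores}")
print(f"theoretical peak: {peak:.1f} TFLOPS")
print(f"DRAM: {dram:.1f} TB, compute-node NVRAM: {nvram:.1f} TB")
```

Under these assumptions the script reports 7776 cores and a theoretical peak of about 149.3 TFLOPS, which matches the quoted 150TFLOPS. It also shows that the compute nodes together hold roughly six times as much NVRAM (about 259TB) as DRAM (about 41.5TB).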

Delivered to LLNL in late October, Catalyst is expected to be available for limited use this month and for general use by December. The Catalyst resource, a Cray CS300 cluster supercomputer, will be shared among the three partners, with access rights based on each partner's level of investment. System access will be managed through LLNL's High Performance Computing Innovation Center (HPCIC), whose mission is to work with industrial partners to develop computing solutions that help America compete effectively in the 21st-century global economy.

"Big Data unlocks an entirely new method of discovery by deriving the solution to a problem from the massive sets of data itself. To research new ways of translating Big Data into knowledge, we had to design a one-of-a-kind system. Equipped with the most powerful Intel processors, fabrics and SSDs, the Catalyst cluster will become a critical tool, providing insights into the technologies required to fuel innovation for the next decade," said Raj Hazra, vice president and general manager of the technical computing group at Intel.

The Catalyst architecture is expected to provide insights into the kinds of technologies the ASC program will require over the next 5-10 years to meet its high performance simulation and Big Data computing mission needs. The system's increased storage capacity (in both volatile and non-volatile memory) represents a major departure from the classic simulation-based computing architectures common at Department of Energy (DOE) laboratories, and it opens new opportunities for exploring the potential of combining floating-point-focused capability with data analysis in one environment. Consequently, the insights provided by Catalyst could become a basis for future commodity technology procurements.