

Nvidia Corp. shed some light on its exascale development project, known as Echelon, at the SC10 supercomputing trade show last week. The company's researchers are convinced that machines capable of performing at least a quintillion double-precision floating-point operations per second (10^18 FLOPS) should be heterogeneous, i.e. employ both highly parallel processors and high-performance serial processors.

Even though today's graphics processing units (GPUs) are more efficient than central processing units (CPUs) in terms of raw performance per watt, even modern GPUs cannot power supercomputers that are 1000 times faster than today's while consuming a reasonable amount of energy. As a result, graphics processors must evolve radically and rapidly in order to enable exascale systems in the 2018-2020 timeframe. Moreover, new programming models and paradigms are required to program heterogeneous systems efficiently.

William Dally, the chief scientist of Nvidia, who also heads the development of the Echelon extreme-scale computing project partly funded by DARPA under the Ubiquitous High Performance Computing (UHPC) program, shared his thoughts about future chips capable of powering an exaFLOPS-class supercomputer.

Sketch of Nvidia Echelon research system

According to Steve Keckler, the director of architecture research at Nvidia, the Echelon design incorporates a large number (~1024) of stream cores and a smaller number (~8) of latency-optimized CPU-like cores on a single chip, sharing a common memory system. Just like in current architectures, eight stream cores will form a streaming multiprocessor (SM), and 128 SMs will form the large pool of throughput-optimized processing elements. Such a chip could deliver 20 teraFLOPS of double-precision performance, and a number of them will form a 2.6 petaFLOPS rack. At present, Nvidia's Fermi (GF110) chip with 512 stream processors operating at 1544MHz can deliver 0.79 teraFLOPS of double-precision compute performance. Considering the roughly 25-fold difference in performance, it is highly likely that the Echelon will employ a post-Maxwell (~2013-2014) Nvidia GPU design.
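The figures quoted above can be cross-checked with simple arithmetic. The Python sketch below is illustrative only: the variable names are made up for this example, the inputs are the numbers cited in the article, and the chips-per-rack and racks-per-exaFLOPS values are derived here, not stated by Nvidia.

```python
# Back-of-the-envelope check of the Echelon throughput figures quoted above.
# Inputs are taken from the article; derived values are rough estimates only.

CORES_PER_SM = 8            # stream cores per streaming multiprocessor
SMS_PER_CHIP = 128          # streaming multiprocessors per chip
ECHELON_CHIP_TFLOPS = 20.0  # projected double-precision throughput per chip
RACK_PFLOPS = 2.6           # projected throughput per rack
FERMI_DP_TFLOPS = 0.79      # double-precision throughput quoted for Fermi (GF110)

# 8 cores x 128 SMs gives the ~1024 stream cores mentioned in the text.
stream_cores = CORES_PER_SM * SMS_PER_CHIP

# Ratio between the projected chip and the Fermi figure (the "25 times" claim).
speedup_vs_fermi = ECHELON_CHIP_TFLOPS / FERMI_DP_TFLOPS

# Derived (assumption): how many 20 TFLOPS chips add up to a 2.6 PFLOPS rack.
chips_per_rack = RACK_PFLOPS * 1000.0 / ECHELON_CHIP_TFLOPS

# Derived (assumption): racks needed for 1 exaFLOPS (1000 PFLOPS) of peak throughput.
racks_per_exaflops = 1000.0 / RACK_PFLOPS

print(f"stream cores per chip: {stream_cores}")
print(f"speedup over Fermi:    {speedup_vs_fermi:.1f}x")
print(f"chips per rack:        {chips_per_rack:.0f}")
print(f"racks per exaFLOPS:    {racks_per_exaflops:.0f}")
```

These are peak numbers; they ignore interconnect, memory and utilization effects, so they only show that the quoted figures are mutually consistent.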

In order to keep the power consumption of such a chip relatively low, stream processors have to process a double-precision floating-point operation using just 10 picojoules of energy, down from 200 picojoules on Nvidia's current Fermi chips, the EETimes website quoted Mr. Dally as saying. To facilitate that drop in energy consumption, each of the 1024 stream processors per chip has to perform four floating-point operations per cycle.
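To see why the energy per operation matters so much, the per-operation figures above can be converted into sustained power. The sketch below is a rough illustration under stated assumptions: it counts only the energy of the floating-point operations themselves (no memory, interconnect or cooling), and the helper name `arithmetic_power_watts` is hypothetical.

```python
# Rough power estimate from energy per floating-point operation.
# Assumption: only arithmetic energy is counted; real systems spend much more
# on data movement, memory and cooling.

PICOJOULE = 1e-12            # joules

CHIP_FLOPS = 20e12           # 20 teraFLOPS chip, as quoted in the article
EXA_FLOPS = 1e18             # one exaFLOPS
FERMI_PJ_PER_FLOP = 200      # energy per DP operation on current Fermi chips
TARGET_PJ_PER_FLOP = 10      # energy per DP operation targeted for Echelon


def arithmetic_power_watts(flops_per_second, pj_per_flop):
    """Power in watts = operations per second x joules per operation."""
    return flops_per_second * pj_per_flop * PICOJOULE


chip_power_target = arithmetic_power_watts(CHIP_FLOPS, TARGET_PJ_PER_FLOP)
chip_power_fermi = arithmetic_power_watts(CHIP_FLOPS, FERMI_PJ_PER_FLOP)
exa_power_target = arithmetic_power_watts(EXA_FLOPS, TARGET_PJ_PER_FLOP)
exa_power_fermi = arithmetic_power_watts(EXA_FLOPS, FERMI_PJ_PER_FLOP)

print(f"20 TFLOPS chip at 10 pJ/op:  {chip_power_target:.0f} W")
print(f"20 TFLOPS chip at 200 pJ/op: {chip_power_fermi:.0f} W")
print(f"1 exaFLOPS at 10 pJ/op:      {exa_power_target / 1e6:.0f} MW")
print(f"1 exaFLOPS at 200 pJ/op:     {exa_power_fermi / 1e6:.0f} MW")
```

Under these assumptions, a 20 teraFLOPS chip would spend about 200 W on arithmetic at 10 picojoules per operation, versus about 4000 W at Fermi's 200 picojoules.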

To further trim power usage, Nvidia intends to integrate a large number (~1024) of configurable 256KB SRAM banks into the chip. The huge amount of on-chip memory should make it possible to keep as much data on the chip, and as close to the processing elements, as possible, avoiding power-costly fetch operations wherever feasible. The SRAM banks should be configurable so that they can act as a unified memory pool, as dedicated caches for processing elements, as shared memory under explicit management, and so on.
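For a sense of scale, the bank count and bank size above imply the total on-chip SRAM capacity computed below. The sketch is illustrative only: `split_banks` and the example split are hypothetical, since the article does not describe how Nvidia would actually partition the banks.

```python
# Total on-chip SRAM implied by ~1024 banks of 256KB each, plus a purely
# hypothetical example of dividing the banks between two roles.

SRAM_BANKS = 1024
BANK_SIZE_KB = 256

total_sram_mb = SRAM_BANKS * BANK_SIZE_KB / 1024  # KB -> MB


def split_banks(total_banks, cache_banks):
    """Hypothetical helper: assign some banks to caches, the rest to
    explicitly managed shared memory. Not an Nvidia interface."""
    if not 0 <= cache_banks <= total_banks:
        raise ValueError("cache_banks must be between 0 and total_banks")
    return {
        "cache_banks": cache_banks,
        "shared_memory_banks": total_banks - cache_banks,
    }


example_split = split_banks(SRAM_BANKS, 256)

print(f"total on-chip SRAM: {total_sram_mb:.0f} MB")
print(f"example split:      {example_split}")
```

With roughly as many banks as stream cores, one bank per core is a natural reading, although the article does not say so explicitly.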

At present, the Echelon is only a research project and not a chip on Nvidia's roadmap. In some respects, the Echelon is much like Intel's single-chip cloud computer (SCC), which is part of Intel's Tera-Scale research project.

Tags: Nvidia, Tesla, GPGPU, Exascale, Maxwell, Kepler




i hadn't heard of the "dragonfly" interconnect before. there seem to be a number of papers & presentations on it. the title paper is "Technology-Driven, Highly-Scalable Dragonfly Topology", I believe, and its lead authors are John Kim of NW U, William Dally of Stanford, Steve Scott of Cray, and Dennis Abts of Google.

google webview:
0 0 [Posted by: rektide  | Date: 11/27/10 11:58:52 AM]

