Nvidia Discusses Future Graphics Chips for Extreme Graphics and Exascale Computing
[11/24/2010 05:51 PM]
Nvidia Corp. shed some light on its exascale development project, known as Echelon, at the SC10 supercomputing trade show last week. The company's researchers are convinced that machines capable of performing at least a quintillion double-precision floating-point operations per second (10^18 FLOPS) should be heterogeneous, i.e. employ both highly parallel and high-performance serial processors.
Even though today's graphics processing units (GPUs) are more efficient than central processing units (CPUs) in terms of raw performance per watt, they cannot power supercomputers that are 1000 times faster than current ones while consuming a reasonable amount of energy. As a result, graphics processors have to evolve radically and rapidly in order to enable exascale systems in the 2018-2020 timeframe. Moreover, new programming models and paradigms are required to program heterogeneous systems efficiently.
William Dally, the chief scientist of Nvidia, who also heads the development of the Echelon extreme-scale computing project partly funded by DARPA under the Ubiquitous High Performance Computing (UHPC) program, shared his thoughts about future chips capable of powering an exaFLOPS-class supercomputer.
Sketch of Nvidia Echelon research system
According to Steve Keckler, the director of architecture research at Nvidia, the Echelon design incorporates a large number (~1024) of stream cores and a smaller number (~8) of latency-optimized CPU-like cores on a single chip, sharing a common memory system. Just like in current architectures, eight stream cores will form a streaming multiprocessor (SM), and 128 SMs will form the large pool of throughput-optimized processing elements. Such a chip could deliver 20 teraFLOPS of double-precision performance, and a number of them will form a 2.6 petaFLOPS rack. At present, Nvidia's Fermi (GF110) chip with 512 stream processors operating at 1544MHz can deliver 0.79 teraFLOPS of double-precision compute performance. Considering the 25-fold difference in performance, it is highly likely that the Echelon will employ a post-Maxwell (~2013-2014) Nvidia GPU design.
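The throughput figures above can be cross-checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check, not anything published by Nvidia: it assumes that the Fermi part sustains one double-precision FLOP per stream processor per cycle on average (an assumption chosen because it reproduces the quoted 0.79 teraFLOPS), and all variable names are illustrative.

```python
# Back-of-the-envelope check of the throughput figures quoted in the article.
TERA = 1e12
PETA = 1e15
EXA = 1e18

# Fermi (GF110) figures from the article.
fermi_cores = 512
fermi_clock_hz = 1544e6
# Assumption (not stated in the article): one double-precision FLOP
# per stream processor per cycle on average.
fermi_dp_flops_per_core_cycle = 1

fermi_dp = fermi_cores * fermi_clock_hz * fermi_dp_flops_per_core_cycle

# Echelon targets from the article.
echelon_cores = 8 * 128          # 8 stream cores per SM, 128 SMs per chip
echelon_dp = 20 * TERA           # double-precision FLOPS per chip
rack_dp = 2.6 * PETA             # double-precision FLOPS per rack

speedup = echelon_dp / fermi_dp          # Echelon chip vs. Fermi chip
chips_per_rack = rack_dp / echelon_dp    # chips needed for one rack
racks_per_exaflops = EXA / rack_dp       # racks needed for 10^18 FLOPS

print(f"Echelon stream cores per chip: {echelon_cores}")
print(f"Fermi DP throughput: {fermi_dp / TERA:.2f} TFLOPS")
print(f"Echelon / Fermi: {speedup:.1f}x")
print(f"Chips per 2.6 PFLOPS rack: {chips_per_rack:.0f}")
print(f"Racks per exaFLOPS: {racks_per_exaflops:.0f}")
```

Under these assumptions the numbers are self-consistent: 8 x 128 gives the quoted ~1024 stream cores, the Fermi estimate matches 0.79 teraFLOPS, the per-chip gap is roughly 25 times, a rack works out to about 130 chips, and an exaFLOPS machine to roughly 385 racks.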
In order to keep the power consumption of such a chip relatively low, stream processors have to perform a double-precision floating-point operation using just 10 picojoules of energy, down from 200 picojoules on Nvidia's current Fermi chips, the EETimes web-site quoted Mr. Dally as saying. To facilitate that drop in energy consumption, each of the 1024 stream processors per chip has to perform four floating-point operations per cycle.
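To see why the energy per operation matters so much, multiply it by the operation rate: power equals FLOPS times joules per FLOP. The sketch below applies that to the figures in the article. It counts only the arithmetic itself and ignores memory, interconnect and cooling, which is a simplifying assumption; the function name is illustrative.

```python
PICOJOULE = 1e-12


def arithmetic_power_watts(flops_per_second, joules_per_flop):
    """Power drawn by the floating-point operations alone (no memory, no I/O)."""
    return flops_per_second * joules_per_flop


chip_flops = 20e12   # 20 teraFLOPS Echelon chip, from the article
exa_flops = 1e18     # one exaFLOPS system

# Energy per double-precision operation, from the article.
target_energy = 10 * PICOJOULE    # Echelon goal
fermi_energy = 200 * PICOJOULE    # current Fermi chips

chip_w_target = arithmetic_power_watts(chip_flops, target_energy)
chip_w_fermi = arithmetic_power_watts(chip_flops, fermi_energy)
exa_w_target = arithmetic_power_watts(exa_flops, target_energy)
exa_w_fermi = arithmetic_power_watts(exa_flops, fermi_energy)

print(f"20 TFLOPS chip at 10 pJ/FLOP:  {chip_w_target:.0f} W")
print(f"20 TFLOPS chip at 200 pJ/FLOP: {chip_w_fermi:.0f} W")
print(f"Exascale at 10 pJ/FLOP:  {exa_w_target / 1e6:.0f} MW")
print(f"Exascale at 200 pJ/FLOP: {exa_w_fermi / 1e6:.0f} MW")
```

At 10 picojoules per operation a 20 teraFLOPS chip spends about 200 W on arithmetic, while at Fermi's 200 picojoules the same chip would need about 4 kW; at exascale the difference is roughly 10 MW against 200 MW.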
To further trim power usage, Nvidia intends to integrate a large number (~1024) of configurable 256KB SRAM banks into the chip. The huge amount of on-chip memory should allow as much data as possible to be kept on chip and as close to the processing elements as possible, avoiding power-costly fetch operations wherever feasible. The SRAM banks should be configurable so that they can act as a unified memory pool, as dedicated caches for processing elements, as shared memory for explicit management, and so on.
At present the Echelon is only a research project and not a chip from Nvidia's roadmap. In some respects, the Echelon is much like Intel's single-chip cloud computer (SCC), which belongs to the Tera-Scale research project.
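The total on-chip capacity follows directly from the bank count and bank size. The sketch below computes it and shows how a configuration might divide the banks among the roles mentioned above. The particular split is purely hypothetical (the article gives no configuration), and 1 KB is assumed to mean 1024 bytes.

```python
KIB = 1024                 # assumption: 1 KB = 1024 bytes
MIB = 1024 * KIB

BANKS = 1024               # approximate bank count from the article
BANK_BYTES = 256 * KIB     # 256KB per bank, from the article

total_bytes = BANKS * BANK_BYTES


def capacity_mib(bank_split):
    """Return capacity in MiB per role for a {role: bank_count} mapping."""
    if sum(bank_split.values()) != BANKS:
        raise ValueError("bank counts must add up to the total number of banks")
    return {role: count * BANK_BYTES // MIB for role, count in bank_split.items()}


# Hypothetical configuration, for illustration only.
example_split = {
    "dedicated_cache": 512,
    "shared_memory": 256,
    "unified_pool": 256,
}

print(f"Total on-chip SRAM: {total_bytes // MIB} MiB")
for role, mib in capacity_mib(example_split).items():
    print(f"{role}: {mib} MiB")
```

With roughly 1024 banks of 256KB each, the chip would carry about 256 MiB of SRAM in total, and reassigning banks between roles changes only the split, not the total.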