http://www.lanl.gov/orgs/...t-salishan-2009-final.pdf
google webview:
http://webcache.googleuse...f+dragonfly+interconnectL

Nvidia Discusses Future Graphics Chips for Extreme Graphics and Exascale Computing
[11/24/2010 05:51 PM] Nvidia Corp. shed some light on its exascale development project, known as Echelon, at the SC10 supercomputing trade show last week. The company's researchers are convinced that machines capable of performing at least a quintillion double-precision floating-point operations per second (10^18 FLOPS) should be heterogeneous, i.e. employ both highly parallel and high-performance serial processors.
Even though today's graphics processing units (GPUs) are more efficient than central processing units (CPUs) in terms of raw performance per watt, even modern GPUs cannot power supercomputers that are 1000 times faster than today's while consuming a reasonable amount of energy. As a result, graphics processors must evolve radically and rapidly to enable exascale systems in the 2018-2020 timeframe. Moreover, programming heterogeneous systems efficiently will require new programming models and paradigms.
William Dally, Nvidia's chief scientist, who also heads development of the Echelon extreme-scale computing project partly funded by DARPA under the Ubiquitous High Performance Computing (UHPC) program, shared his thoughts about future chips capable of powering an exaFLOPS-class supercomputer.

Sketch of Nvidia Echelon research system
According to Steve Keckler, the director of architecture research at Nvidia, the Echelon design incorporates a large number (~1024) of stream cores and a smaller number (~8) of latency-optimized CPU-like cores on a single chip, sharing a common memory system. Just like in current architectures, eight stream cores will form a streaming multiprocessor (SM), and 128 SMs will form the large pool of throughput-optimized processing elements. Such a chip could deliver 20 teraFLOPS in double precision, and a number of them will form a 2.6 petaFLOPS rack. At present, the Nvidia Fermi (GF110) chip with 512 stream processors operating at 1544MHz can deliver 0.79 TFLOPS of DP compute performance. Considering the 25-fold difference in performance, it is highly likely that Echelon will employ a post-Maxwell (~2013-2014) Nvidia GPU design.
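
The throughput figures quoted above can be cross-checked with a little arithmetic. The sketch below is only an illustration: the core counts, clock, and TFLOPS values come from the article, while the factors of 2 FLOPs per cycle (one fused multiply-add) and half-rate double precision are assumptions chosen because they reproduce the quoted 0.79 TFLOPS. The helper name `peak_dp_tflops` is made up for this example.

```python
def peak_dp_tflops(cores, clock_mhz, flops_per_cycle=2.0, dp_rate=0.5):
    """Peak double-precision TFLOPS for a chip.

    flops_per_cycle=2.0 (one FMA per cycle) and dp_rate=0.5 (DP at half the
    single-precision rate) are assumptions, not figures from the article.
    """
    return cores * clock_mhz * 1e6 * flops_per_cycle * dp_rate / 1e12


# Fermi (GF110) figures quoted in the article: 512 stream processors at 1544 MHz.
fermi_dp = peak_dp_tflops(cores=512, clock_mhz=1544)

# Echelon figures quoted in the article.
echelon_dp = 20.0            # TFLOPS per chip
stream_cores = 8 * 128       # eight stream cores per SM, 128 SMs

speedup = echelon_dp / fermi_dp
chips_per_rack = 2.6e3 / echelon_dp   # 2.6 PFLOPS expressed in TFLOPS

print(f"Fermi DP peak:        {fermi_dp:.2f} TFLOPS")
print(f"Echelon stream cores: {stream_cores}")
print(f"Echelon vs. Fermi:    {speedup:.1f}x")
print(f"Chips for 2.6 PFLOPS: {chips_per_rack:.0f}")
```

Under these assumptions the numbers line up with the article: roughly 0.79 TFLOPS for Fermi, a gap of about 25 times, and on the order of 130 chips to reach 2.6 petaFLOPS.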

In order to keep the power consumption of such a chip relatively low, stream processors have to perform a double-precision floating-point operation using just 10 picojoules of energy, down from 200 picojoules on Nvidia's current Fermi chips, the EE Times website quoted Mr. Dally as saying. To reach the target performance, each of the 1024 stream processors per chip has to perform four FLOPs per cycle.
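
To see why the energy per operation matters so much, multiply it by the operation rate: power equals energy per FLOP times FLOPs per second. The sketch below applies that to the 20 teraFLOPS chip described above. It covers arithmetic energy only (memory, interconnect, and leakage are ignored), and the implied clock at the end is merely what the quoted numbers work out to, since the article itself states no clock speed. The names `chip_power_watts` and `PJ` are invented for this example.

```python
PJ = 1e-12  # one picojoule, in joules


def chip_power_watts(flops_per_second, joules_per_flop):
    """Power drawn by the arithmetic alone: energy per op times ops per second."""
    return flops_per_second * joules_per_flop


echelon_flops = 20e12  # 20 TFLOPS in double precision, as quoted in the article

power_at_target = chip_power_watts(echelon_flops, 10 * PJ)    # Echelon goal
power_at_fermi = chip_power_watts(echelon_flops, 200 * PJ)    # Fermi-level energy

# Clock implied by 1024 stream processors at four FLOPs per cycle each.
# This is an inference from the quoted figures, not a number from the article.
implied_clock_ghz = echelon_flops / (1024 * 4) / 1e9

print(f"At 10 pJ/FLOP:  {power_at_target:.0f} W")
print(f"At 200 pJ/FLOP: {power_at_fermi:.0f} W")
print(f"Implied clock:  {implied_clock_ghz:.2f} GHz")
```

At Fermi-level energy per operation the arithmetic alone would draw about 4 kW per chip, against about 200 W at the 10 picojoule target, which is the 20-fold reduction Mr. Dally describes.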

To further trim power usage, Nvidia intends to integrate a large number (~1024) of configurable 256KB SRAM banks into the chip. The huge amount of on-chip memory should make it possible to keep as much data on the chip, and as close to the processing elements, as possible, avoiding power-costly fetch operations wherever feasible. The SRAM banks should be configurable to act as a unified memory pool, as dedicated caches for processing elements, as shared memory for explicit management, and so on.
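
For a sense of scale, the bank figures above can be totaled and split by role. The sketch below is purely illustrative: the bank count and size come from the article, but the `capacity_by_role` helper and the particular 512/256/256 split are hypothetical and do not describe how Echelon actually configures its memory.

```python
BANKS = 1024      # approximate bank count quoted in the article
BANK_KIB = 256    # size of each SRAM bank in KiB, as quoted

total_mib = BANKS * BANK_KIB / 1024  # total on-chip SRAM in MiB


def capacity_by_role(allocation, bank_kib=BANK_KIB, total_banks=BANKS):
    """Return MiB per role for a mapping of role name to number of banks.

    The roles and counts are hypothetical; only the bank size and the total
    bank count come from the article.
    """
    if sum(allocation.values()) != total_banks:
        raise ValueError("allocation must use every bank exactly once")
    return {role: n * bank_kib / 1024 for role, n in allocation.items()}


# One hypothetical configuration: half the banks as caches, the rest split
# between explicitly managed shared memory and a unified pool.
example = capacity_by_role({"cache": 512, "shared": 256, "unified": 256})

print(f"Total on-chip SRAM: {total_mib:.0f} MiB")
for role, mib in example.items():
    print(f"  {role:8s} {mib:.0f} MiB")
```

With roughly as many banks as stream cores, the quoted figures work out to about 256 MiB of on-chip SRAM, or around 256 KiB per stream core.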

At present, Echelon is only a research project and not a chip on Nvidia's roadmap. In some respects, Echelon is much like Intel's Single-chip Cloud Computer (SCC), which belongs to the Tera-Scale research project.
