UPDATE: Correcting the number of stream processors in the GK110 chip.
Nvidia Corp. this week formally announced two new Tesla compute accelerators based on the code-named Kepler architecture. The Tesla K10, based on the well-known GK104 chip, will become available shortly and is aimed at those who need maximum raw single-precision compute performance here and now. The Tesla K20, which will show up late this year, promises to be a performance monster thanks to the GK110 chip with a whopping 7.1 billion transistors.
The Nvidia Tesla "K10" compute card is based on two GK104 GPUs that deliver an aggregate peak performance of 4.58TFLOPS single precision, 0.190TFLOPS double precision and 320GB/s of memory bandwidth. The Tesla "K10" is optimized for customers in oil and gas exploration and the defense industry. Since the K10 compute board is powered by the GK104 chip, which is generally meant for graphics processing, it does not deliver strong double-precision performance and will not be a good solution for many fields of high-performance computing.
The flagship compute solution based on the Kepler architecture will be the Tesla K20, which will be based on the GK110 graphics processing unit. The latter will be a monster chip containing a whopping 7.1 billion transistors and 15 streaming multiprocessors with a total of 2880 stream processors, "delivering three times more double precision [performance] compared to Fermi architecture-based Tesla products", according to Nvidia. Given that the highest-end Fermi-based product, the Tesla M2090, provides 0.665TFLOPS of DP performance, the Tesla "K20" has the potential to deliver up to an enormous 2TFLOPS DP, although Nvidia itself claims only "over 1TFLOPS" (which is not that impressive, considering that the AMD Radeon HD 7970 hits 0.947TFLOPS DP at 925MHz).
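The figures in the paragraph above can be sanity-checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope estimate, not an Nvidia or AMD specification: the function names are made up for illustration, and the Radeon HD 7970 inputs (2048 stream processors, two floating-point operations per clock per processor, double precision at one quarter of the single-precision rate) are assumptions that do not come from this article.

```python
# Back-of-the-envelope check of the performance figures quoted above.
# Inputs are taken from the article unless marked as an assumption.

M2090_DP_TFLOPS = 0.665        # Fermi-based Tesla, double precision
KEPLER_DP_MULTIPLIER = 3       # Nvidia's "three times more double precision"

GK110_SMX_COUNT = 15           # streaming multiprocessors in GK110
GK110_STREAM_PROCESSORS = 2880

# Assumptions (not stated in the article) about the Radeon HD 7970:
HD7970_STREAM_PROCESSORS = 2048
HD7970_FLOPS_PER_CLOCK = 2     # one fused multiply-add counted as 2 FLOPs
HD7970_DP_RATIO = 1 / 4        # DP rate relative to SP rate
HD7970_CLOCK_GHZ = 0.925       # 925MHz, as quoted in the article


def projected_k20_dp_tflops() -> float:
    """Upper-bound DP estimate if the 3x claim applied to peak throughput."""
    return M2090_DP_TFLOPS * KEPLER_DP_MULTIPLIER


def stream_processors_per_smx() -> int:
    """Stream processors in each GK110 streaming multiprocessor."""
    return GK110_STREAM_PROCESSORS // GK110_SMX_COUNT


def hd7970_dp_tflops() -> float:
    """Peak DP throughput of the HD 7970 under the assumptions above."""
    sp_gflops = (HD7970_STREAM_PROCESSORS
                 * HD7970_FLOPS_PER_CLOCK
                 * HD7970_CLOCK_GHZ)
    return sp_gflops * HD7970_DP_RATIO / 1000


if __name__ == "__main__":
    print(f"Projected K20 DP peak: {projected_k20_dp_tflops():.3f} TFLOPS")
    print(f"Stream processors per SMX: {stream_processors_per_smx()}")
    print(f"Radeon HD 7970 DP peak: {hd7970_dp_tflops():.3f} TFLOPS")
```

Under these assumptions the 3x claim works out to just under 2TFLOPS, each multiprocessor holds 192 stream processors, and the Radeon figure lands at roughly 0.947TFLOPS, which is in line with the numbers quoted above.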
In addition to an enormous transistor count and high performance, the GK110 will offer new capabilities that are not available on other chips:
- Dynamic Parallelism enables GPU threads to dynamically spawn new threads, allowing the GPU to adapt dynamically to the data. This simplifies parallel programming, enabling GPU acceleration of a broader set of popular algorithms, such as adaptive mesh refinement, fast multipole methods and multigrid methods.
- Hyper-Q enables multiple CPU cores to simultaneously use the CUDA architecture cores on a single Kepler GPU, which increases GPU utilization, slashing CPU idle times and advancing programmability. Hyper-Q is ideal for cluster applications that use MPI, according to Nvidia.
The GK110 GPU is expected to be incorporated into the new Titan supercomputer at the Oak Ridge National Laboratory in Tennessee and the Blue Waters system at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign.
The Nvidia Tesla K20 is planned to be available in the fourth quarter of 2012.