Nvidia Gives a Glimpse of Next-Generation “Fermi” Graphics Processors

Nvidia G300/NV60: 512 Stream Processors, 768KB L2 Cache, 3 Billion Transistors

by Anton Shilov
09/30/2009 | 05:53 PM

Nvidia Corp. on Wednesday officially disclosed the first details about its next-generation graphics processor, previously known under the code-names G300, GT300 and NV60. The new family of chips is called Fermi and is architected to provide massive computing power in general-purpose applications. Unfortunately, the company did not say when its next-generation graphics chips will hit the market.


“It is completely clear that GPUs are now general purpose parallel computing processors with amazing graphics, and not just graphics chips anymore. The Fermi architecture, the integrated tools, libraries and engines are the direct results of the insights we have gained from working with thousands of CUDA developers around the world. We will look back in the coming years and see that Fermi started the new GPU industry,” said Jen-Hsun Huang, the chief executive officer of Nvidia Corp.

The flagship Fermi graphics processor will feature 512 stream processing engines (organized as 16 streaming multiprocessors with 32 cores each) that support a type of multi-threading technology to maximize utilization of the cores. Each stream processor has a fully pipelined integer arithmetic logic unit (ALU) and floating point unit (FPU). The top-of-the-range chip contains 3 billion transistors and features a 384-bit GDDR5 memory controller with ECC, an unprecedented 768KB unified level-two cache and a rather complex cache hierarchy in general. Naturally, the Fermi family is compatible with the DirectX 11, OpenGL 3.x and OpenCL 1.x application programming interfaces (APIs). The new chips will be made using 40nm process technology at TSMC.

Prior Nvidia GPUs used IEEE 754-1985 floating point arithmetic. The Fermi architecture implements the new IEEE 754-2008 floating-point standard, providing the fused multiply-add (FMA) instruction for both single and double precision arithmetic, which improves upon multiply-add (MAD) by retaining full precision in the intermediate stage. According to Nvidia, the top-of-the-line Fermi chip can process up to 512 FMA operations per clock with single precision floating point or 256 FMA operations per clock with double precision floating point. As a result, the new chip can be up to 8.5 times faster than its predecessor, the GeForce GTX 200-series, in double precision processing at the same clock speed. Peak single precision floating point throughput of the Fermi-GT300 could be higher than 2.13TFLOPS (provided that it can work at the same clock speed as the GeForce GTX 285, roughly 650MHz).
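To illustrate what "retaining full precision in the intermediate stage" means, here is a minimal sketch in Python. It is not Nvidia's implementation: it runs on the CPU in double precision, emulates the fused operation with exact rational arithmetic, and assumes round-to-nearest for the unfused product. The function names `mad` and `fma_emulated` are hypothetical and chosen only for this example.

```python
from fractions import Fraction


def mad(a: float, b: float, c: float) -> float:
    """Unfused multiply-add: the product is rounded first, then the sum is rounded."""
    return a * b + c


def fma_emulated(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add: a*b + c is computed exactly, then rounded once.

    Fraction holds each double exactly, so the only rounding happens in float().
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the exact product, 1 - 2**-60, does not fit in a double
# and rounds to 1.0 when the multiplication is performed on its own.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

mad_result = mad(a, b, c)            # 0.0: the small term is lost in the product rounding
fma_result = fma_emulated(a, b, c)   # -2**-60: the small term survives the single rounding

print("MAD:", mad_result)
print("FMA:", fma_result)
```

With these inputs the unfused version returns exactly zero, while the fused version keeps the small residual, which is the kind of difference that matters in numerically sensitive general-purpose code.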

For Nvidia, the Fermi family of graphics processors is much more than a new lineup of chips. The firm claims that its new GPUs are tailored for general-purpose computing, which is why they feature a massive amount of cache, a very large number of concurrently executed threads, support for the IEEE 754-2008 floating-point standard and so on.

“Nvidia and the Fermi team have taken a giant step towards making GPUs attractive for a broader class of programs. I believe history will record Fermi as a significant milestone,” said Dave Patterson, director of the Parallel Computing Research Laboratory at U.C. Berkeley and co-author of Computer Architecture: A Quantitative Approach.

Nvidia did not unveil any actual configurations of graphics processors based on the Fermi architecture, nor did the company reveal when such graphics cards are set to hit the market.