GPU Architecture and Positioning
Faster and more efficient: that is how Nvidia promotes its Kepler architecture. But how did the company intend to increase the efficiency and performance of its new GPUs? With DirectX 11 still the newest API for 3D applications, both GPU developers have focused on polishing their existing DirectX 11 implementations. As you already know, AMD replaced its VLIW architecture with GCN, which is more GPGPU-oriented. Nvidia's previous architecture, codenamed Fermi, was designed for DirectX 11 and GPGPU from the start and had no glaring weak spots. How could it be optimized further? Towards higher performance and energy efficiency, of course. So, here is the Kepler architecture:
Kepler is similar to Fermi in its general topology and structure, so we will cover only the changes here.
Like the previous generation of Nvidia's GPUs, the GK104 contains a GigaThread Engine, memory controllers, an L2 cache for data and instructions, raster back-ends, and Graphics Processing Clusters (GPCs) responsible for computing and texture-mapping operations. The GeForce GTX 680 has four GPCs, each comprising a Raster Engine and several streaming multiprocessors, which are now called SMXs rather than SMs as in the Tesla and Fermi architectures. In the GK104, each GPC contains two SMXs, as opposed to the four SMs per GPC in the Fermi.
Being the basic building block of the Kepler architecture, an SMX contains:
- One geometry processing unit called PolyMorph Engine 2.0 which is supposed to be twice as fast as its Fermi counterpart;
- 192 CUDA cores, 32 special function units and 32 load & store units working at the base GPU frequency (rather than at a double frequency as in the Fermi);
- 16 texture-mapping units (twice as many as in the Fermi);
- 64KB L1 cache;
- A register file for 65536 x 32-bit entries, which is twice as large as in the Fermi;
- 8 dispatch units and 4 warp schedulers (twice as many as in the Fermi).
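The per-SMX figures above scale to whole-chip totals by simple multiplication. The short Python sketch below does that arithmetic for the GeForce GTX 680, using the four GPCs and two SMXs per GPC quoted earlier. The dictionary keys and the function name are purely illustrative and are not Nvidia terminology.

```python
# Per-SMX resources of the GK104, as listed above.
SMX = {
    "polymorph_engines": 1,
    "cuda_cores": 192,
    "special_function_units": 32,
    "load_store_units": 32,
    "texture_units": 16,
    "l1_cache_kb": 64,
    "registers_32bit": 65536,
    "dispatch_units": 8,
    "warp_schedulers": 4,
}

GPCS = 4          # Graphics Processing Clusters in the GeForce GTX 680
SMX_PER_GPC = 2   # SMXs in each GPC


def chip_totals(per_smx, gpcs, smx_per_gpc):
    """Multiply every per-SMX resource by the number of SMXs on the chip."""
    smx_count = gpcs * smx_per_gpc
    totals = {name: amount * smx_count for name, amount in per_smx.items()}
    totals["smx_count"] = smx_count
    return totals


totals = chip_totals(SMX, GPCS, SMX_PER_GPC)
# Register file size in KB: each entry is 32 bits wide, i.e. 4 bytes.
totals["register_file_kb"] = totals["registers_32bit"] * 4 // 1024

for name, amount in totals.items():
    print(f"{name}: {amount}")
```

Run as is, this gives 8 SMXs with 1536 CUDA cores, 128 texture-mapping units and 8 PolyMorph Engines for the whole chip.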
For easier reading, the next table summarizes the architectural differences between the GK104, GF114 and GF110:
Note that the GK104 is structurally closer to the GF114 than to the GF110. This seems to be another indication that the GK104 is only a temporary flagship and will eventually be replaced by a GK110 featuring some GPGPU optimizations.
Thanks to the new SMX design, the GK104 has a lot of execution units, surpassing the GeForce GTX 580 in nearly every parameter. It has half as many geometry-processing units as the latter, but each of those units delivers double the performance. And all this comes with a transistor count that has increased by only 18%: from 3 billion in the GeForce GTX 580 to 3.54 billion in the GeForce GTX 680.
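The two comparisons in this paragraph are easy to check. The sketch below recomputes the transistor growth from the quoted counts and compares geometry throughput in relative terms. It assumes one PolyMorph Engine per SMX (8 in total for the GK104), takes the GeForce GTX 580 count as twice that, and treats Nvidia's "twice as fast" figure at face value. The function and variable names are illustrative.

```python
# Transistor counts quoted in the text.
GTX580_TRANSISTORS = 3.00e9
GTX680_TRANSISTORS = 3.54e9

growth = (GTX680_TRANSISTORS - GTX580_TRANSISTORS) / GTX580_TRANSISTORS
print(f"Transistor count growth: {growth:.0%}")


def relative_geometry_throughput(units, per_unit_speed):
    """Throughput in multiples of one Fermi-class geometry unit."""
    return units * per_unit_speed


# GTX 680: 8 PolyMorph Engines 2.0, each assumed twice as fast as a Fermi unit.
gtx680_geometry = relative_geometry_throughput(units=8, per_unit_speed=2)
# GTX 580: twice as many units, each at the reference speed of 1.
gtx580_geometry = relative_geometry_throughput(units=16, per_unit_speed=1)

print(f"GTX 680 vs GTX 580 geometry throughput: {gtx680_geometry / gtx580_geometry:.2f}x")
```

Under these assumptions the two cards come out even in theoretical geometry throughput, before any clock rate difference is taken into account.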
Another goal Nvidia pursued with Kepler was to make the new architecture more energy efficient, and most of the architectural changes were implemented with this purpose in mind. The power consumption of the stream processors and control logic has been reduced. Running the stream processors at double the clock rate, as previous GPUs did, required various buffers and queues. Now that the whole GPU is clocked at the same frequency, these auxiliary components are no longer needed and their transistors can be used to implement more execution units. Because the clock generators and stream processors require less power at the lower frequency, the total power consumption is reduced even though the peak theoretical performance remains the same. Nvidia claims that the new design makes the stream processors twice as energy efficient as in the Fermi.
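The point about unchanged peak performance follows from simple arithmetic: peak throughput is the number of execution units multiplied by their clock rate, so twice as many units at half the clock deliver the same theoretical rate. The sketch below illustrates this with hypothetical numbers (32 units and a 1 GHz base clock are chosen only for convenience and do not describe any specific GPU). It says nothing about power, which depends on voltage and circuit design.

```python
def peak_ops_per_second(units, clock_hz, ops_per_unit_per_clock=1):
    """Theoretical peak rate: every unit issues on every clock cycle."""
    return units * clock_hz * ops_per_unit_per_clock


BASE_CLOCK_HZ = 1_000_000_000  # hypothetical base GPU clock (1 GHz)
UNITS = 32                     # hypothetical number of double-clocked units

# Fermi-style: fewer stream processors running at twice the base clock.
double_clocked = peak_ops_per_second(UNITS, 2 * BASE_CLOCK_HZ)
# Kepler-style: twice as many stream processors running at the base clock.
single_clocked = peak_ops_per_second(2 * UNITS, BASE_CLOCK_HZ)

print(double_clocked == single_clocked)
```

The trade is therefore more die area spent on execution units in exchange for a lower clock, which is where the power saving described above comes from.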
The control logic has been modified as well. The Fermi's hardware instruction decoding and reordering mechanisms have been replaced with software ones which, although they require almost twice as many transistors to implement, consume less power. This has helped Nvidia boost performance and reduce power consumption even further. The amount of control logic per stream processor has decreased, which also contributes to the power saving. Considering that execution units and control logic account for most of a GPU's transistors, the described changes make the Kepler architecture far more energy efficient than its predecessor.