Some time ago Nvidia unveiled its new GM204 GPU. Featuring the Maxwell 2.0 architecture, the chip is installed on the GeForce GTX 980 and GTX 970 graphics cards. We’ve recently tested MSI’s original GeForce GTX 970 Gaming product and today we’re going to take a look at a reference GeForce GTX 980 and check out the new GPU’s architecture.
We’ll also benchmark the new card against its market rivals in gaming tests and in certain compute tasks.
Maxwell 2.0: On the Road to Perfection
Half a year ago we examined Nvidia’s first Maxwell-based products: the GeForce GTX 750 Ti and GeForce GTX 750. Although they used a junior model of the new GPU series, that chip delivered excellent energy efficiency. Manufactured on the same 28nm process as the earlier Kepler, it beat much more complex chips in performance while staying within a TDP of 75 watts. And Kepler itself had been considered the most energy-efficient architecture available! Nvidia hinted at the time that this was only the first version of the Maxwell architecture, with a second version due to appear soon in high-performance GPUs. And here they are: the GeForce GTX 980 and GeForce GTX 970 graphics cards, which use the GM204 GPU with the Maxwell 2.0 architecture. We don’t expect another breakthrough in energy efficiency, though. The GPU is still manufactured on the 28nm process and looks like an update to Maxwell 1.0. That said, Maxwell 2.0 is still full of surprises.
SMM, PolyMorph Engine and Memory Subsystem
We had a lengthy discussion of the Maxwell 1.0 architecture in our GeForce GTX 750 Ti review, so today we focus on what differentiates Maxwell 2.0 from the first version of that architecture rather than from the Kepler design. Here’s a diagram of the GM204 processor.
Like its predecessor, the GK104, the GM204 incorporates four Graphics Processing Clusters (GPCs) and four 64-bit memory channels. Each memory channel is served by 16 raster operators, twice as many as in the Kepler or Maxwell 1.0 design. Each GPC, in turn, contains four updated streaming multiprocessors (SMMs). Compared with the SMMs in Maxwell 1.0, they have more shared memory (96 instead of 64 kilobytes) and a new geometry engine called PolyMorph Engine 3.0. The latter offers higher performance as well as new functionality, which we’ll discuss later on. The number of double-precision (FP64) computing units has been reduced to 4 per SMM (as opposed to 6 per SMX in the GK104). Thus, the GPU’s FP64 performance is only 1/32 of its single-precision speed. The GPU and the graphics card are positioned as gaming products, so this is hardly a problem. Moreover, the fastest GK110-based cards for floating-point computations will remain available, probably until the release of the top-end GM200 GPU targeted at professional applications. The changes we’ve just mentioned are summed up in the following table:
As for the memory subsystem, the GM204 has 2 megabytes of shared L2 cache. This is four times as much as in the GK104 and the same amount as in the GM107. We think all Maxwell-based GPUs are going to have the same amount of cache, perhaps with the exception of the top-end GM200. In our Maxwell 1.0 review we wrote that the large cache would help use graphics memory more efficiently. Maxwell 2.0 goes further by adding new lossless data compression algorithms. On average they improve effective memory bandwidth by about 25%, depending on the type of data, and so raise memory subsystem performance without a higher clock rate or a wider memory bus. The GM204 has a 256-bit bus, yet with these algorithms it performs like a traditional 320-bit bus.
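The figures quoted in this section are easy to sanity-check with a little arithmetic. The Python sketch below tallies the GM204’s units from the per-cluster and per-channel counts given above, and treats the 25% compression figure as a 25% gain in effective bandwidth. That reading is our interpretation, chosen because it reproduces the 320-bit equivalence. The figure of 128 single-precision cores per SMM is an assumption that is not stated in this section; it is the value consistent with 4 FP64 units per SMM and the 1/32 ratio. The function names are illustrative only.

```python
# Back-of-the-envelope check of the GM204 figures quoted in the text.
GPCS = 4                  # Graphics Processing Clusters (from the text)
SMMS_PER_GPC = 4          # streaming multiprocessors per GPC (from the text)
MEMORY_CHANNELS = 4       # memory channels (from the text)
CHANNEL_WIDTH_BITS = 64   # width of each channel (from the text)
ROPS_PER_CHANNEL = 16     # raster operators per channel (from the text)
FP64_UNITS_PER_SMM = 4    # double-precision units per SMM (from the text)
FP32_CORES_PER_SMM = 128  # ASSUMPTION: not stated in the text, implied by the 1/32 ratio


def gm204_totals():
    """Derive chip-wide totals from the per-cluster and per-channel counts."""
    return {
        "smms": GPCS * SMMS_PER_GPC,
        "rops": MEMORY_CHANNELS * ROPS_PER_CHANNEL,
        "bus_width_bits": MEMORY_CHANNELS * CHANNEL_WIDTH_BITS,
        # How many FP32 cores there are for every FP64 unit.
        "fp32_per_fp64": FP32_CORES_PER_SMM // FP64_UNITS_PER_SMM,
    }


def equivalent_bus_width(bus_width_bits, bandwidth_gain):
    """Bus width an uncompressed design would need for the same throughput.

    ASSUMPTION: compression is modeled as a flat gain in effective bandwidth.
    """
    return bus_width_bits * (1 + bandwidth_gain)


if __name__ == "__main__":
    totals = gm204_totals()
    print(totals)
    print(equivalent_bus_width(totals["bus_width_bits"], 0.25), "bits")
```

Under these assumptions the totals come to 16 SMMs, 64 raster operators and a 256-bit bus, and the compressed bus is equivalent to a 320-bit one. Real gains depend on the data being compressed, so the flat 25% is only an average.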
Nvidia keeps on improving 3D rendering quality, trying to find new ways to achieve photorealism. That’s why the Maxwell 2.0 GPUs provide hardware acceleration for the new global illumination algorithm VXGI (Voxel Global Illumination) and support Microsoft’s upcoming DirectX 12 API. We’ll dwell on both items in more detail now.
Accurate lighting is key to making a 3D scene look photorealistic. Traditional methods have come a long way since the T&L subunit was introduced in the GeForce 256, yet they have not reached perfection and still require some post-processing. The alternative approach to scene illumination, ray tracing, has been around for a long time, too. With ray tracing, the paths of a great many rays are calculated as they travel from a light source, accounting for direct and indirect lighting as well as numerous reflections. This method ensures high quality and is used for special effects in movies and animated films, but it is too resource-consuming to be practical in games and other real-time applications. There is no consumer-class hardware capable of calculating ray-traced illumination in real time at tens of frames per second. To address this problem, Nvidia proposed the Voxel Global Illumination algorithm in 2011. The algorithm uses a simplified version of ray tracing. First, all objects in a 3D scene are divided into rather large "bricks" called voxels, and global illumination is calculated for those bricks. Instead of millions of pixels covering the objects in the scene, the illumination has to be calculated for just a few thousand voxels. Second, cones are used instead of rays, which saves resources at a negligible cost in quality. The VXGI tech demo showed the Apollo landing on the Moon. You can see in the screenshots how the scene was divided into voxels and how lighting was calculated for the voxel scene.
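To give a feel for why working with voxels is so much cheaper, here is a minimal Python sketch of the first step only: grouping many surface samples into coarse cubic voxels and keeping one averaged brightness value per voxel. It is not Nvidia’s VXGI implementation. The `voxelize` function, the sample layout and the voxel size are all hypothetical, and cone tracing is not covered.

```python
from collections import defaultdict


def voxelize(samples, voxel_size):
    """Group (x, y, z, brightness) samples into cubic voxels.

    Returns a dict mapping each voxel's integer grid coordinates to the
    average brightness of the samples that fall inside it.
    Illustrative helper only, not part of any real VXGI API.
    """
    sums = defaultdict(lambda: [0.0, 0])  # voxel -> [brightness total, sample count]
    for x, y, z, brightness in samples:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        sums[key][0] += brightness
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


if __name__ == "__main__":
    # A flat 100 x 100 patch of surface samples, all equally bright.
    samples = [(x, y, 0, 1.0) for x in range(100) for y in range(100)]
    voxels = voxelize(samples, voxel_size=10)
    print(len(samples), "samples ->", len(voxels), "voxels")  # 10000 samples -> 100 voxels
```

In this toy case the lighting pass would touch a hundred times fewer elements, which is the same kind of reduction the algorithm relies on, at the cost of coarser detail.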
VXGI was used in the Unreal Engine 4 tech demo. Unfortunately, for all Nvidia’s efforts, the resulting algorithm was still too resource-consuming to be executed on a single GPU. But Nvidia found a way to introduce hardware acceleration for it, which is why the PolyMorph Engine in the SMMs had to be upgraded to version 3.0. VXGI is a software algorithm that can run on any modern GPU, but thanks to the hardware acceleration it runs three times as fast on Maxwell 2.0 GPUs.
Microsoft’s new APIs, DirectX 11.3 and DirectX 12, are supported in the Maxwell 2.0 design. DirectX 12 is an upcoming API for Microsoft’s future operating systems (presumably, Windows 9). It is going to dramatically reduce latencies when processing GPU commands and minimize CPU load. DirectX 12 won’t have new hardware requirements and will work on DirectX 11 graphics cards. As for DirectX 11, its latest version (DirectX 11.3) does bring some minor enhancements that require hardware support. So there are no breakthroughs in this area, yet Maxwell 2.0 seems to have good potential, which will be fully revealed in due time.