Two years ago, in March 2012, Nvidia unveiled its Kepler architecture together with the first graphics card based on it, the GeForce GTX 680. At that time Nvidia was trying to catch up with AMD, whose flagship was the Radeon HD 7970. The GeForce GTX 680 was indeed a success and restored the balance in the top-end price category. Later on, Nvidia rolled out less advanced and more affordable Kepler-based products as well.
What is peculiar about the recent announcement of Nvidia's Maxwell architecture is that it debuts not in top-end but in low-end solutions: the GeForce GTX 750 Ti and GTX 750. There are some valid reasons for that.
We’ll explain those reasons in today’s review. We will also take a look at the GeForce GTX 750 Ti and compare its performance with that of its competitors.
Maxwell 1.0 Architecture: More Energy Efficiency!
Nvidia has officially announced its first GPU with the Maxwell architecture, emphasizing that it is just the first generation of Maxwell-based GPUs. Manufactured on the mature 28nm tech process, this generation will consist of two chips: GM107 and GM108. Codenames ending in 7 and 8 are traditionally reserved by Nvidia for junior GPUs which are used in affordable graphics cards. The GeForce GTX 750 and the GeForce GTX 750 Ti are based on the GM107 chip, which succeeds the older GK107.
The second generation of Maxwell-based GPUs will be rolled out in the second half of this year and will supposedly consist of three chips codenamed GM200, GM204 and GM206, all manufactured on the new 20nm technology. There will be some additional modifications in the second-generation Maxwell but, as in the first generation, the focus will be on energy efficiency.
Some people argue that performance matters more than energy efficiency, but it turns out that energy efficiency is actually the key to high performance. The saved watts can be spent on higher clock rates or a larger number of computing subunits. For example, suppose we have two chips with the same performance, but one consumes 300 watts and the other 150 watts. The latter chip can be given more computing subunits or higher clock rates and beat the first chip in performance, whereas the 300W chip has no reserves for similar modifications because of its already excessive power draw. Moreover, other factors being the same, lower power consumption makes for a more attractive graphics card. Simpler power and cooling systems can be used, which translates into better reliability and a lower noise level.
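The two-chip example can be put into numbers. The sketch below is only an illustration: it assumes that performance scales linearly with the power budget, which real chips do not do exactly, and the function name and the performance score of 100 are hypothetical.

```python
def perf_at_budget(perf: float, watts: float, budget_watts: float) -> float:
    """Projected performance if a chip is scaled to a given power budget.

    Assumption (a simplification): performance grows linearly with power.
    """
    return perf * budget_watts / watts


# Two chips with the same performance (100 is an arbitrary score).
hungry_chip = {"perf": 100.0, "watts": 300.0}
frugal_chip = {"perf": 100.0, "watts": 150.0}

# Give both chips the same 300 W ceiling.
POWER_CEILING = 300.0
hungry_scaled = perf_at_budget(hungry_chip["perf"], hungry_chip["watts"], POWER_CEILING)
frugal_scaled = perf_at_budget(frugal_chip["perf"], frugal_chip["watts"], POWER_CEILING)

print("300 W chip at the ceiling:", hungry_scaled)
print("150 W chip at the ceiling:", frugal_scaled)
```

Under that linear assumption the 150W chip has room to double its performance within the same 300W ceiling, while the 300W chip has no headroom left.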
Energy efficiency has been improving for years and has become even more important now that transitions to new manufacturing technologies have slowed down. When it is not possible to switch quickly to a finer tech process, you have to raise clock rates in order to make your top-end solutions faster, and that is not possible without optimizing your product designs for better energy efficiency. Even a transition to a new tech process is not a cure-all anymore, since transistor density grows faster than the power consumption of each transistor falls. So if the GPU architecture is not optimized, heat dissipation per square millimeter of the semiconductor chip will keep growing until it becomes an obstacle to increasing clock rates and, consequently, performance.
Nvidia claims that the Maxwell was designed for a record-breaking performance-per-watt ratio. That is no wonder considering that it was developed with the future 20nm tech process in mind. As a reminder, two years ago Intel released its Ivy Bridge CPUs, manufactured on a 22nm tech process with tri-gate technology, in which the transistor became three-dimensional. TSMC plans to adopt 3D (FinFET) transistors as well, although in its 16nm process rather than at 20nm, where transistors remain planar. Either way, a finer process increases transistor density and, with it, heat dissipation per unit of area. So, if Maxwell-based solutions prove to be economical and cool on the current 28nm tech process, the upcoming 20nm Maxwell-based chips (with many more transistors) will be just ordinary in these respects. Now that it's clear why Nvidia tried to make its Maxwell 1.0 architecture as energy-efficient as possible, let’s see how it did so. Here is an overview of the GM107 chip:
At first glance there are no fundamental differences from the previous Kepler architecture. We can see a GigaThread Engine, an L2 cache, raster operators and a graphics processing cluster (GPC) with five Maxwell streaming multiprocessors (Nvidia calls them “SMMs”). But every difference we can spot is meant to ensure higher energy efficiency. The L2 cache is 2 MB, which is 8 times the size of the L2 cache in the previous GK107 chip. The larger cache holds more data for reading and writing, which raises the cache hit rate and reduces the load on the memory controller. Besides improving energy efficiency, the larger cache also boosts overall performance.
The key difference between the Maxwell and the Kepler is in the new streaming multiprocessors, which are referred to as SMMs rather than SMXs. As is typical of Nvidia’s junior solutions, the GM107 has one GPC, but this GPC contains five SMMs whereas in the Kepler architecture there were up to three SMXs per GPC. It means that there is less GPC-level control logic per SM. The SMM design is different, too.
Here are the key differences from the Kepler architecture:
- The system of caches has been revised. In the Kepler, there was a 64KB L1 cache/shared memory while the texture cache was separate. In the Maxwell, there is a separate 64KB block of shared memory while the texture and L1 caches are combined into one.
- As opposed to the Kepler, where all SMX resources were shared, an SMM is divided into several groups of subunits with dedicated control logic. This simplifies the internal connections and control logic, lowering the resulting power consumption.
- As in the Kepler, one SMM contains four warp schedulers, but now they control not a single array of 192 SPs but four separate 32-SP arrays (one per warp scheduler). Thus, one SMM contains 128 SPs. Thanks to control logic and SP design optimizations, the efficiency of each SP has grown by about 35%, so one SMM is just a little slower than a single SMX while consuming only half the power and incorporating fewer transistors. The register file is the same total size as in the Kepler: 65,536 entries, 32 bits each. It is also split up into four parts, 16,384 entries each.
- Each warp scheduler now has an instruction buffer that sits between the instruction cache and the warp scheduler itself. This helps increase performance while lowering the total power consumption.
- There are eight texture subunits in an SMM as opposed to 16 in an SMX. These eight are split up into two quads, each quad having a dedicated texture/L1 cache (the two caches are now combined). Each quad of texture subunits is shared by two processing subunits of 32 SPs each.
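The figures in the list above can be cross-checked with a few lines of arithmetic. This is a rough sketch rather than Nvidia's own model: the constant names are made up for illustration, and it assumes that the quoted 35% per-SP gain applies uniformly to every SP.

```python
# Figures quoted in the list above.
SMX_SPS = 192              # stream processors in one Kepler SMX
SMM_PARTITIONS = 4         # one partition per warp scheduler
SPS_PER_PARTITION = 32
PER_SP_GAIN = 1.35         # "about 35%" higher efficiency per SP
REGISTER_ENTRIES = 65_536  # 32-bit register file entries per SM
BYTES_PER_ENTRY = 4        # 32 bits
TEXTURE_UNITS_SMM = 8
TEXTURE_QUADS = 2

# SPs per SMM: four 32-SP arrays.
smm_sps = SMM_PARTITIONS * SPS_PER_PARTITION

# Estimated SMM throughput relative to an SMX
# (assumption: the per-SP gain applies uniformly to all SPs).
smm_vs_smx = smm_sps * PER_SP_GAIN / SMX_SPS

# Register file: size per partition and total size in KiB.
regs_per_partition = REGISTER_ENTRIES // SMM_PARTITIONS
register_file_kib = REGISTER_ENTRIES * BYTES_PER_ENTRY // 1024

# Texture hardware: units per quad and SPs served by each quad.
texture_units_per_quad = TEXTURE_UNITS_SMM // TEXTURE_QUADS
sps_per_quad = (SMM_PARTITIONS // TEXTURE_QUADS) * SPS_PER_PARTITION

print(f"SPs per SMM: {smm_sps}")
print(f"SMM vs SMX throughput: {smm_vs_smx:.2f}")
print(f"Registers per partition: {regs_per_partition}")
print(f"Register file size: {register_file_kib} KiB")
print(f"Texture units per quad: {texture_units_per_quad}, SPs per quad: {sps_per_quad}")
```

With these inputs one SMM comes out at roughly 90% of an SMX, which agrees with the "just a little slower" wording.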
So despite the outward similarities, the Maxwell differs from the Kepler architecture in a number of respects. Many subunits have been redesigned. There are now fewer internal connections within the chip. Large subunits have been split up into smaller ones. There is a neat hierarchy of subunits, with different types of execution devices linked to dedicated resources. According to Nvidia, the execution devices have become more efficient and the energy efficiency of the new architecture is double that of the Kepler. In fact, the GM107, as implemented in the GeForce GTX 750 Ti card, is claimed to be faster in practice than the GeForce GTX 650 Ti while having fewer execution devices and lower peak theoretical performance. And the new card needs only 60 watts while its predecessor needs up to 110! That’s quite an achievement, really. The Kepler used to be regarded as energy-efficient, too, so improving this parameter within the same tech process must have been a daunting task.
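The doubled-efficiency claim can be related to the two power figures quoted above. The sketch below is a back-of-the-envelope check only: it treats 60 and 110 watts as actual power draw, which is an assumption (they are quoted maximums), and the function names and speedup values are hypothetical inputs rather than measured results.

```python
GTX_750_TI_WATTS = 60.0   # power figure quoted for the new card
GTX_650_TI_WATTS = 110.0  # "up to" figure quoted for its predecessor


def efficiency_gain(speedup: float) -> float:
    """Performance-per-watt ratio of the new card to the old one.

    speedup is new performance divided by old performance (hypothetical input).
    """
    return speedup * GTX_650_TI_WATTS / GTX_750_TI_WATTS


def speedup_needed(target_gain: float) -> float:
    """Speedup the new card needs to reach a target performance-per-watt ratio."""
    return target_gain * GTX_750_TI_WATTS / GTX_650_TI_WATTS


gain_at_parity = efficiency_gain(1.0)
speedup_for_double = speedup_needed(2.0)

print(f"Efficiency gain at equal performance: {gain_at_parity:.2f}x")
print(f"Speedup needed for a 2x gain: {speedup_for_double:.2f}x")
```

On these figures, equal performance alone would already mean about 1.8 times the performance per watt, and a lead of roughly 9% over the GeForce GTX 650 Ti would be enough to reach the claimed factor of two.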
Now let’s move on from theory to practice and put Nvidia’s claims to the test.