Estimated Efficiency of New Bulldozer Microarchitecture
Before getting to the actual benchmarks, we decided to estimate what the new Bulldozer microarchitecture is capable of in general. To do this, we compared the new processor against CPUs based on the K10 and Sandy Bridge microarchitectures in artificially equalized conditions: at the same clock frequency and with the same number of active cores.
To be more exact, we compared the AMD FX-8150, Phenom II X6 1100T and Core i7-2600 at 3.6 GHz with only two active computational cores. To keep the experiment clean, we disabled all power-saving and auto-overclocking technologies. We used a set of simple synthetic benchmarks from the SiSoft Sandra 2011 suite, in which we manually disabled all instruction sets beyond SSE3, because the K10 microarchitecture doesn’t support them.
The numbers in this table speak louder than words. The per-core performance of the Bulldozer microarchitecture is considerably lower than that of the previous-generation processors. Simplifying the microarchitecture by combining a pair of cores into a single module with shared resources led to a significant (25-40%) drop in specific performance, that is, performance per core at the same clock frequency, compared with the previous-generation AMD microarchitecture. As a result, Bulldozer cores not only work at about half the speed of Sandy Bridge cores: a Bulldozer module with two cores is even slower than a single Sandy Bridge core with Hyper-Threading enabled. Should we expect any performance records from a CPU with such a microarchitecture? This is more of a rhetorical question…
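For readers who want to reproduce this kind of comparison, here is a minimal sketch of how a drop in specific performance can be computed. Since all CPUs run at the same clock frequency and with the same number of active cores, the ratio of two benchmark scores is also the ratio of their per-core, per-clock throughput. The function name and the scores below are hypothetical placeholders and not results from our tests.

```python
def relative_drop(new_score: float, old_score: float) -> float:
    """Return the fractional drop of new_score relative to old_score.

    Both scores are assumed to come from the same benchmark, run at the
    same clock frequency and with the same number of active cores.
    """
    if old_score <= 0:
        raise ValueError("old_score must be positive")
    return 1.0 - new_score / old_score


# Hypothetical placeholder scores, NOT measurements from this review.
k10_score = 100.0
bulldozer_score = 70.0

drop = relative_drop(bulldozer_score, k10_score)
print(f"Specific performance drop: {drop:.0%}")
```

A positive result means the newer microarchitecture is slower in that test; a negative one would mean it is faster.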
Now let’s take a look at the practical characteristics of the caches and the memory sub-system. To estimate the performance of these functional units, we used the Cachemem utility from the Aida64 suite. We used DDR3-1600 SDRAM with 9-9-9-27-1T timings. Just as in the previous case, all processors worked at 3.6 GHz.
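To put the measured numbers into perspective, it helps to know the theoretical limits of this memory configuration. The sketch below assumes a dual-channel setup with standard 64-bit (8-byte) DDR3 channels; the function names are ours and purely illustrative. It derives the peak bandwidth of DDR3-1600, the duration of the CL9 CAS latency, and how many 3.6 GHz core cycles fit into that time.

```python
def ddr3_peak_bandwidth_gbs(transfer_rate_mts: float,
                            channels: int = 2,
                            bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    Assumes 64-bit (8-byte) channels; dual-channel by default.
    """
    return transfer_rate_mts * 1e6 * bus_bytes * channels / 1e9


def cas_latency_ns(cl_cycles: int, transfer_rate_mts: float) -> float:
    """CAS latency in nanoseconds.

    DDR memory transfers data twice per clock, so the memory clock
    in MHz is half of the transfer rate in MT/s.
    """
    clock_mhz = transfer_rate_mts / 2
    return cl_cycles / clock_mhz * 1000


def ns_to_core_cycles(latency_ns: float, core_ghz: float = 3.6) -> float:
    """Convert a latency in nanoseconds into CPU core cycles."""
    return latency_ns * core_ghz


peak = ddr3_peak_bandwidth_gbs(1600)   # DDR3-1600, dual channel
cas_ns = cas_latency_ns(9, 1600)       # CL9
print(f"Peak bandwidth: {peak:.1f} GB/s")
print(f"CAS latency: {cas_ns:.2f} ns = {ns_to_core_cycles(cas_ns):.1f} core cycles")
```

The CAS latency is only one component of the total memory access time, so the latency that a utility reports in practice will always be higher than this figure.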
As we can see, the practical latencies of all caches and of the memory sub-system have increased in Zambezi processors. We have already discussed this in the chapter devoted to the Bulldozer microarchitecture. However, the memory bandwidth has increased in almost all cases thanks to modifications in the internal cache-memory organization.
At the same time, the fastest dual-channel memory controller and the fastest cache-memory sub-system are still the ones in Sandy Bridge, although in terms of cache size the new Bulldozer is superior.