Secrets of AMD’s Multi-Core Design
It is only logical that adding processor cores inevitably increases the size of the processor die, and with it the complexity and cost of manufacturing. That is why processors with the largest core counts are currently used only in the server segment, where corporate customers are far more willing to pay a premium than regular users are. For AMD, increasing the number of cores while keeping prices acceptable implies simplifying each core. On the other hand, simpler cores can have unwanted consequences, such as lower performance in applications that do not parallelize well, and such applications are still quite numerous.
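Amdahl’s law, which the article does not mention but which is a standard way to reason about this trade-off, makes the risk concrete. The sketch below compares four full-speed cores with eight cores that are each 30% slower. The 30% figure and the parallel fractions are hypothetical, chosen only for illustration, and do not describe any real AMD design.

```python
def amdahl_speedup(parallel_fraction: float, cores: int, core_speed: float = 1.0) -> float:
    """Speedup relative to a single baseline core (speed 1.0) under Amdahl's law.

    parallel_fraction: share of the work that can run on all cores at once.
    cores:             number of identical cores.
    core_speed:        per-core speed relative to the baseline core (hypothetical).
    """
    serial_time = (1.0 - parallel_fraction) / core_speed        # runs on one core only
    parallel_time = parallel_fraction / (core_speed * cores)    # split across all cores
    return 1.0 / (serial_time + parallel_time)


# Hypothetical comparison: 4 full-speed cores vs. 8 cores that are each 30% slower.
for p in (0.5, 0.9, 0.99):
    four_big = amdahl_speedup(p, cores=4)
    eight_small = amdahl_speedup(p, cores=8, core_speed=0.7)
    print(f"parallel fraction {p:.2f}: 4 big cores {four_big:.2f}x, "
          f"8 smaller cores {eight_small:.2f}x")
```

With these made-up numbers, the eight slower cores fall behind when only half of the work runs in parallel (about 1.24x versus 1.6x) and pull ahead only when the parallel share is high (90% and above in this sample), which is exactly the concern raised above.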
AMD engineers therefore took a different, unconventional route. The microarchitecture of each individual core became more complex, so that it can execute more instructions per clock cycle where possible.
At the same time, some resources that normally exist in every core, but have more capacity than a single core needs, are now shared between pairs of cores.
The resulting dual-core units became the primary building blocks of Bulldozer processors; AMD calls them modules. Each module has two complete sets of integer execution units, while the floating-point unit, the instruction fetch and decode units, and the L2 cache are shared between the two cores. According to the developers, these components have enough capacity to serve both cores, because when each core has its own copy, that copy is often idle. Moreover, the occasional delays caused by sharing them have no serious effect on overall performance.
According to AMD, a module built this way delivers about 80% of the performance of a full dual-core processor while saving about 44% of the transistor budget (and, consequently, of the die area).
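The two ratios above can be combined into a single efficiency figure. The sketch below uses only AMD’s quoted 80% and 44%; the second function additionally assumes, as a simplification not stated by AMD, that a conventional dual-core costs exactly two single-core transistor budgets.

```python
# Ratios quoted by AMD in the text; everything else is derived from them.
MODULE_RELATIVE_PERFORMANCE = 0.80  # vs. a conventional dual-core (= 1.0)
TRANSISTOR_SAVINGS = 0.44           # fraction of the dual-core transistor budget saved


def performance_per_transistor(relative_perf: float, savings: float) -> float:
    """Performance per transistor relative to a conventional dual-core (= 1.0)."""
    relative_transistors = 1.0 - savings
    return relative_perf / relative_transistors


def module_size_in_cores(savings: float) -> float:
    """Module transistor count in units of one full core.

    Assumption (not from AMD): a conventional dual-core costs exactly
    two single-core transistor budgets.
    """
    return 2.0 * (1.0 - savings)


ratio = performance_per_transistor(MODULE_RELATIVE_PERFORMANCE, TRANSISTOR_SAVINGS)
size = module_size_in_cores(TRANSISTOR_SAVINGS)
print(f"performance per transistor: {ratio:.2f}x a conventional dual-core")
print(f"module size: {size:.2f}x a single full core")
```

Under that assumption, a module costs only about 12% more transistors than one full core, yet delivers roughly 1.43 times the performance per transistor of a conventional dual-core design.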
This approach to increasing core density allowed AMD to design an eight-core (four-module) Bulldozer die.
A considerable part of the die is also devoted to cache memory. Each module’s L2 cache, shared by its two cores, is 2 MB, and the L3 cache shared by the entire processor is 8 MB. Given AMD’s traditional exclusive cache design, the total cache capacity of the eight-core CPU therefore reaches 16 MB (4 × 2 MB of L2 plus 8 MB of L3). Even so, the Bulldozer die remains reasonably sized, which suggests that AMD’s engineers achieved their goal.
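Why the exclusive design matters for the 16 MB total can be shown with a simplified model. The sketch below uses the cache sizes from the text and ignores the L1 caches, as the text does; the "inclusive" branch is a hypothetical comparison with the same cache sizes, not a description of any real chip.

```python
# Cache sizes from the text, in MB (L1 caches are left out, as in the text).
MODULES = 4
L2_PER_MODULE_MB = 2
L3_SHARED_MB = 8


def distinct_capacity_mb(l2_total_mb: float, l3_mb: float, exclusive: bool) -> float:
    """Idealized amount of distinct data the L2 + L3 levels can hold.

    exclusive=True:  a cache line lives in an L2 *or* in the L3, never both,
                     so the capacities simply add up.
    exclusive=False: a strictly inclusive L3 keeps a copy of every L2 line,
                     so (assuming the L3 is at least as large as all L2s
                     combined) the L3 size is the limit.
    """
    if exclusive:
        return l2_total_mb + l3_mb
    return l3_mb


l2_total = MODULES * L2_PER_MODULE_MB
print("exclusive:", distinct_capacity_mb(l2_total, L3_SHARED_MB, exclusive=True), "MB")
print("inclusive:", distinct_capacity_mb(l2_total, L3_SHARED_MB, exclusive=False), "MB")
```

With the figures from the text, the exclusive case yields 16 MB of distinct cached data, while a strictly inclusive hierarchy of the same sizes would hold only 8 MB.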
In absolute terms, the eight-core Bulldozer die will be smaller than that of the six-core Thuban (Phenom II X6) CPU with the K10 microarchitecture. Keep in mind, however, that Bulldozer processors will be manufactured on a more advanced 32 nm process, whereas Thuban uses 45 nm. Compared with contemporary quad-core Intel Sandy Bridge processors, the new eight-core AMD CPUs will have a die only about 45% larger.
Thanks to Hyper-Threading, however, the operating system also sees quad-core Intel Sandy Bridge processors as eight-core ones, just like Bulldozer. This raises the question of whether Bulldozer can fairly be called a fully fledged eight-core processor. It is important to understand, though, that AMD and Intel took different approaches to executing eight threads simultaneously. Intel gave its microarchitecture the ability to run two threads on a single core using only one set of execution units. AMD, by contrast, stripped the “unnecessary” duplicated resources from two full cores but kept two sets of integer execution units in each module.
As a result, Intel’s Hyper-Threading increases multi-threaded performance by only about 15-20%, while AMD’s approach delivers an 80% gain when going from four to eight threads.
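The difference in scaling is easier to see in absolute units. The sketch below uses only the percentages quoted above and normalizes each processor’s own four-thread result to 4.0 "core-equivalents". Because each gain is measured against a different chip’s baseline, this compares how well each design scales, not which processor is faster overall.

```python
# Multi-threaded scaling figures quoted in the text. Each gain is measured
# against the same processor running 4 threads (normalized to 4.0 here).
BASELINE_THROUGHPUT = 4.0


def throughput_8_threads(gain: float, baseline: float = BASELINE_THROUGHPUT) -> float:
    """Throughput with 8 threads, given the fractional gain over the 4-thread baseline."""
    return baseline * (1.0 + gain)


scenarios = {
    "Hyper-Threading, low estimate (+15%)": 0.15,
    "Hyper-Threading, high estimate (+20%)": 0.20,
    "Bulldozer modules (+80%)": 0.80,
}

for label, gain in scenarios.items():
    print(f"{label}: {throughput_8_threads(gain):.1f} core-equivalents")
```

In these units, Hyper-Threading turns four cores into roughly 4.6 to 4.8 core-equivalents, while Bulldozer’s modules reach about 7.2.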
I have to admit, though, that because of its modular internal structure, the die of the eight-core Bulldozer processor looks more like that of a quad-core one.