According to the original concept, any APU consists of three major components, and Trinity doesn't change this: the new generation of hybrid processors consists of processor cores, an integrated graphics accelerator and a small but very important component, the unified North Bridge. It is this component that links a collection of diverse cores into a balanced system and, together with the DDR3 SDRAM controller, ensures that the computing and graphics cores communicate seamlessly with each other and with system memory and can work jointly on the same data.
Overall, the Trinity structure remained the same as in Llano, but every individual component has been modified. Moreover, all these changes were made without dramatically increasing the size of the semiconductor die. AMD didn't change the production process and continued using Globalfoundries' 32 nm SOI technology, so a much larger die would have meant higher costs, and raising the price of APUs positioned as affordable products didn't seem like a good idea. As a result, the Trinity die grew by only about 8% and is now 246 mm² in size. The transistor count also increased only slightly and now reaches 1.303 billion, up from 1.178 billion in Llano. Moreover, even the distribution of the transistor budget between the computing and graphics components hasn't changed much: they occupy roughly the same share of the die in both chips.
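To put these numbers in perspective, the short sketch below works out the relative growth. It uses only the figures quoted above; the Llano die area is not stated in the text and is simply backed out from the quoted 8% growth, so treat it as an approximation.

```python
# Figures quoted in the text.
TRINITY_DIE_MM2 = 246.0
DIE_GROWTH = 0.08              # "about 8% larger" than Llano
LLANO_TRANSISTORS = 1.178e9
TRINITY_TRANSISTORS = 1.303e9


def pct_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


# Approximation: infer the Llano die area from the quoted 8% growth.
implied_llano_die_mm2 = TRINITY_DIE_MM2 / (1.0 + DIE_GROWTH)
transistor_growth = pct_increase(LLANO_TRANSISTORS, TRINITY_TRANSISTORS)

print(f"Implied Llano die area: {implied_llano_die_mm2:.0f} mm2")
print(f"Transistor count growth: {transistor_growth:.1f}%")
```

The transistor count thus grew by roughly 10.6%, slightly more than the die area, which suggests a marginally denser layout on the same 32 nm process.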
Nevertheless, this is where the similarities between Llano and Trinity end. For example, the computing cores in the new APU generation have changed considerably. From now on, hybrid processors will use the Bulldozer microarchitecture, or, to be more exact, its second generation called Piledriver. Dual- and quad-core Trinity processors contain one or two quasi dual-core modules, which, as you may remember, contain two sets of integer execution units and can process two threads simultaneously, but share the L2 cache, instruction fetch unit, instruction decoder and floating point unit. Compared with the Bulldozer-based FX processors, which have no integrated graphics, Trinity not only has fewer cores but also lacks an L3 cache.
On the other hand, the second-generation Bulldozer microarchitecture, used so far only in the new APUs, boasts a number of minor improvements that boost performance, reduce leakage currents and ensure stability at high clock speeds. The front-end now features a more accurate branch predictor and a larger instruction window. The execution units received an improved scheduler and can now execute certain instructions faster, particularly integer and floating-point division. AMD also mentions a larger L1 TLB and improved arbitration and data prefetch algorithms in the L2 cache. According to the manufacturer, all this gives Trinity about 25% higher computing performance than Llano.
The unified North Bridge has also undergone significant modifications. First of all, the engineers revised the access priorities for the shared memory, giving the top spot to the computing cores, which in practice generate a relatively small share of requests. AMD also added support for faster memory: DDR3-1866 at nominal settings and up to DDR3-2400 when overclocked. The internal data buses were widened: the graphics core now communicates with the memory controller over the 256-bit Radeon Memory Bus, while all off-chip communication uses the PCI Express protocol instead of HyperTransport.
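To see what the faster memory support means in numbers, here is a minimal sketch of theoretical peak memory bandwidth. It assumes a dual-channel DDR3 controller with a 64-bit (8-byte) bus per channel, which is not stated in the text above. These are theoretical peaks; real throughput is lower, and the integrated GPU shares this bandwidth with the CPU cores.

```python
# Assumption: dual-channel DDR3, 64-bit (8-byte) bus per channel.
CHANNELS = 2
BYTES_PER_TRANSFER = 8


def peak_bandwidth_gbs(mt_per_s: int, channels: int = CHANNELS) -> float:
    """Theoretical peak bandwidth in GB/s (10**9 bytes/s) for a DDR3 data rate in MT/s."""
    return mt_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


for name, rate in (("DDR3-1866", 1866), ("DDR3-2400", 2400)):
    print(f"{name}: {peak_bandwidth_gbs(rate):.1f} GB/s")
```

Under these assumptions, DDR3-1866 yields about 29.9 GB/s and overclocked DDR3-2400 about 38.4 GB/s. That extra headroom matters mostly for the bandwidth-hungry graphics core.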
The most interesting changes, however, were made to the graphics core. AMD managed to boost its performance quite substantially without significantly increasing the transistor budget or dramatically modifying the architecture. In other words, they increased the density of the GPU's effective execution units by eliminating some redundant resources. In my opinion, this approach deserves special attention, especially since Trinity's integrated graphics core is our main focus today anyway.