by Anton Shilov
08/29/2012 | 06:03 PM
At the Hot Chips conference, Advanced Micro Devices revealed the first details about its next-generation x86 core code-named Steamroller, which is presumably due in 2013. Steamroller contains a number of improvements made specifically to increase performance in traditional x86 apps, but in general it is a heavily improved Bulldozer, not something brand new.
Just as with the Bulldozer architecture, Steamroller x86 cores, which will power AMD's future high-performance Opteron and FX chips, will be located inside dual-core modules. Processors based on them should therefore be similar in design to Orochi and Viperfish, with some minor exceptions (new memory controller, different internal buses, additional tweaks, etc.) that will not be truly important for x86 performance. The main improvements will be independent instruction decoders for each core within a module, better schedulers, larger and smarter caches, more register resources and some other enhancements.
One of the reasons why dual-core Bulldozer modules [the same may be said about Piledriver] are not completely efficient is that they have only one instruction decoder for two integer cores and one FPU. With Steamroller, AMD has not only incorporated two decoders per module, but also increased the instruction cache size (to lower i-cache misses by 30%), enhanced instruction pre-fetch (the number of mis-predicted branches is down by 20% compared to Bulldozer) and improved max-width dispatches per thread by 25%. AMD believes that Steamroller will provide a 30% improvement in ops per cycle.
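To make the relative figures above concrete, here is a minimal Python sketch that applies the quoted reductions to baseline event counts. The baseline numbers are purely hypothetical placeholders chosen for easy arithmetic, not AMD measurements, and the names `BASELINE`, `REDUCTION` and `apply_reductions` are invented for this illustration.

```python
# Hypothetical Bulldozer baseline, in events per 1,000 instructions.
# These values are illustrative assumptions, NOT published AMD data.
BASELINE = {"icache_misses": 20.0, "mispredicted_branches": 10.0}

# Relative reductions AMD quoted for Steamroller versus Bulldozer.
REDUCTION = {"icache_misses": 0.30, "mispredicted_branches": 0.20}


def apply_reductions(baseline, reduction):
    """Scale each baseline count down by its quoted relative reduction."""
    return {
        name: round(value * (1.0 - reduction[name]), 6)
        for name, value in baseline.items()
    }


steamroller = apply_reductions(BASELINE, REDUCTION)

for name, value in steamroller.items():
    print(f"{name}: {BASELINE[name]} -> {value} per 1,000 instructions")
```

The percentages are relative to whatever Bulldozer's real rates are on a given workload, so the absolute savings depend entirely on the baseline, which AMD has not quantified here.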
AMD also advanced single-core execution by implementing 5%-10% more efficient scheduling, incorporating higher-capacity register files and performing some other tweaks. It should be noted that while the integer pipes of Steamroller will not be too different from the existing ones, the floating point pipe will be slightly redesigned. In general, AMD promises that both integer and floating point per-core performance of Steamroller will be higher than it is today with the Bulldozer micro-architecture.
One of the interesting features of AMD Steamroller will be its ability to disable unused parts of the L2 cache. Since not all apps are cache-bound, this may result in decreased power consumption and/or give AMD the ability to boost clock-speeds of its microprocessors dynamically.
It is noteworthy that AMD decided to talk about its Steamroller micro-architecture, which will be utilized inside microprocessors made using 28nm process technology, approximately a year or more ahead of their roll-out. Potentially, it means that the company pins a lot of hopes on the new micro-architecture, even more than it pins on the nearly available Piledriver.