AMD Reveals Details About Bulldozer Microprocessors

AMD Unveils Bulldozer Micro-Architecture Peculiarities

by Anton Shilov
08/24/2010 | 03:48 PM

Advanced Micro Devices on Tuesday revealed numerous details about the Bulldozer micro-architecture, which will power the company's next-generation central processing units (CPUs) for desktops, servers and workstations. Apparently, the main goal of AMD's designers with Bulldozer was to maximize the sharing of resources within multi-core microprocessors in order to deliver high performance at moderate power consumption and die size.

 

The traditional approach to building a multi-core microprocessor is very straightforward: each core acts independently and shares only the most obvious resources with the others: L3 cache, memory controller, processor bus, etc. In Bulldozer designs, cores will be able to dynamically share fetch and decode blocks, caches and other units. At least in initial designs, multi-core chips will consist of several major blocks, each of which will have two independent integer cores with dedicated schedulers (the two cores will share fetch, decode and L2 functionality) as well as two 128-bit FMAC pipes with one FP scheduler. This means that each major block is, according to AMD, essentially a tightly-linked dual-core microprocessor with shared fetch, decode and floating point units.

Such a "dual-core" major block will not perform as well as two traditional cores, but it will consume less power and use less die space, which in effect means that more major building blocks can be installed without increasing thermal design power or die size to unacceptable levels. Moreover, AMD reasonably claims that the approach is more efficient than simultaneous multi-threading or chip-level multi-threading. In fact, according to AMD, each major block can provide 80% of the performance of a dual-core microprocessor.
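To put the 80% figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that a conventional dual-core delivers exactly twice the throughput of a single core. The relative area values are invented placeholders, since AMD has not disclosed die-size figures.

```python
def module_throughput(fraction_of_dual_core: float = 0.80,
                      dual_core_throughput: float = 2.0) -> float:
    """Throughput of one Bulldozer major block in single-core equivalents.

    fraction_of_dual_core is AMD's claimed 80%; dual_core_throughput = 2.0
    is our assumption of perfect scaling for a conventional dual-core.
    """
    return fraction_of_dual_core * dual_core_throughput


def throughput_per_area(throughput: float, relative_area: float) -> float:
    """Throughput delivered per unit of die area (arbitrary units)."""
    return throughput / relative_area


if __name__ == "__main__":
    block = module_throughput()
    print(f"one major block ~ {block:.2f} single-core equivalents")

    # Hypothetical areas, NOT AMD figures: a conventional dual-core is
    # taken as 2.0 units and a major block as 1.5 units.
    conventional = throughput_per_area(2.0, 2.0)
    shared = throughput_per_area(block, 1.5)
    print(f"per-area throughput: conventional {conventional:.2f}, "
          f"major block {shared:.2f}")
```

Under these assumptions the shared design comes out ahead per unit of area only if a major block is smaller than 1.6 single-core areas; whether it is remains to be seen.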

AMD also indicated that Bulldozer will feature a new prediction-directed instruction prefetch mechanism that overlaps instruction miss requests to cache or memory and thus improves efficiency of execution. In particular, this will help AMD to maximize utilization of the "dual-core" major blocks of Bulldozer microprocessors.

The Bulldozer instruction set architecture supports SSE 4.1; SSE 4.2; AVX with AMD's 4-operand FMAC subset, 256-bit YMM registers and AES; XSAVE state space management; and XOP instructions. Bulldozer will also support lightweight profiling (LWP) technology. As indicated earlier, there is no word on 3DNow! extensions or the SSE5 instruction set.
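For readers who want to check a given machine against this list, the following Python sketch compares a CPU flags string with the extensions named above. The flag spellings follow the names Linux reports in /proc/cpuinfo, which is our assumption and not something AMD specified. The helper name `missing_features` is hypothetical.

```python
from typing import List

# Extension named in the article -> flag name as reported by Linux
# in /proc/cpuinfo (assumed mapping, not taken from AMD's materials).
BULLDOZER_FLAGS = {
    "SSE 4.1": "sse4_1",
    "SSE 4.2": "sse4_2",
    "AVX": "avx",
    "FMA4 (4-operand FMAC)": "fma4",
    "AES": "aes",
    "XSAVE": "xsave",
    "XOP": "xop",
    "LWP": "lwp",
}


def missing_features(flags_line: str) -> List[str]:
    """Return the listed extensions that are absent from a flags string."""
    present = set(flags_line.split())
    return [name for name, flag in BULLDOZER_FLAGS.items()
            if flag not in present]


if __name__ == "__main__":
    # Made-up flags string for illustration only.
    sample = "fpu sse4_1 sse4_2 avx aes xsave"
    print(missing_features(sample))
```

On a Linux system, the real flags string can be taken from the "flags" line of /proc/cpuinfo and passed to the function in place of the sample.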

Quite naturally, Bulldozer, just like the code-named Llano accelerated processing unit (APU), supports advanced power management featuring chip power gating and digital temperature measurement. Obviously, the chip will be able to dynamically boost its clock-speed when the thermal design power allows and not all cores are required.

Given that AMD's Bulldozer architecture seems to be very modular, we can expect AMD to tailor designs in accordance with performance and/or power requirements. Perhaps special-purpose accelerators will be installed instead of certain major blocks, or even instead of certain cores, to boost performance in specific applications.

In the case of an eight-core chip, there will be four major blocks sharing the L3 cache, memory controller and north bridge units.
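The modular layout described in this article can be sketched as a simple data model. This is only an illustration of the block composition as AMD has described it so far; the class and function names (`Module`, `Chip`, `build_chip`) are our own and do not correspond to any AMD terminology or tool.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Module:
    """One Bulldozer major block, per the description in the article."""
    integer_cores: int = 2          # each with a dedicated scheduler
    fmac_pipes: int = 2             # shared floating point pipes
    fmac_width_bits: int = 128
    fp_schedulers: int = 1          # one FP scheduler for both cores
    shared_units: Tuple[str, ...] = ("fetch", "decode", "L2")


@dataclass
class Chip:
    """A multi-core chip built from major blocks."""
    modules: List[Module] = field(default_factory=list)
    shared_units: Tuple[str, ...] = ("L3 cache", "memory controller",
                                     "north bridge")

    @property
    def core_count(self) -> int:
        return sum(m.integer_cores for m in self.modules)


def build_chip(core_count: int) -> Chip:
    """Build a chip with the requested number of integer cores."""
    per_module = Module().integer_cores
    if core_count <= 0 or core_count % per_module != 0:
        raise ValueError("core count must be a positive multiple of "
                         f"{per_module}")
    return Chip(modules=[Module() for _ in range(core_count // per_module)])


if __name__ == "__main__":
    chip = build_chip(8)
    print(len(chip.modules), "major blocks,", chip.core_count, "cores")
```

The model deliberately leaves out cache sizes and clock-speeds, which AMD has not disclosed.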

“In my opinion, Bulldozer and Bobcat are not only two of the greatest technical achievements in AMD’s rich history, but two of the most important for the industry as well. With CPUs and APUs built from these core implementations, we expect our customers to deliver a new wave of innovative PC form factors and high-performance computing experiences,” said Chekib Akrout, senior vice president and general manager of AMD technology development.

Unfortunately, AMD decided not to share any information about clock-speeds or L2/L3 cache sizes. What we do know is that the first Bulldozer processors will be made using a 32nm SOI fabrication process in 2011 and that, at least based on AMD's internal simulations, a 33% increase in the number of cores may yield up to 50% additional performance in server applications.
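The two percentages imply a per-core improvement, which the short Python sketch below works out. It simply divides the claimed performance ratio by the core-count ratio; it treats AMD's "up to 50%" as if it were achieved in full, so the result is an upper bound and not a measured figure.

```python
def per_core_gain(core_increase: float = 0.33,
                  performance_increase: float = 0.50) -> float:
    """Implied per-core throughput gain from AMD's simulation figures.

    Both arguments are fractional increases (0.33 means 33% more cores).
    """
    return (1.0 + performance_increase) / (1.0 + core_increase) - 1.0


if __name__ == "__main__":
    gain = per_core_gain()
    print(f"implied per-core gain: up to {gain:.1%}")
```

In other words, if the simulations hold, each core would deliver up to roughly 13% more server throughput than its predecessor, with the rest of the gain coming from the larger core count.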