Advanced Micro Devices on Tuesday revealed numerous details about the Bulldozer micro-architecture, which will power the company's next-generation central processing units (CPUs) for desktops, servers and workstations. The main goal of AMD's designers appears to have been maximum sharing of resources within multi-core microprocessors, so as to deliver high performance while keeping power consumption and die size moderate.

The traditional approach to building a multi-core microprocessor is very straightforward: each core acts independently and shares only the most obvious resources with the others, such as the L3 cache, memory controller and processor bus. In Bulldozer designs, cores will be able to dynamically share fetch and decode blocks, caches and other units. At least in the initial designs, multi-core chips will consist of several major blocks (modules). Each block will contain two independent integer cores with dedicated schedulers, which share fetch, decode and L2 functionality, plus a floating-point unit with two 128-bit FMAC pipes and one FP scheduler. According to AMD, each major block is therefore essentially a tightly linked dual-core microprocessor with shared fetch, decode and floating-point units.
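To make the sharing concrete, the sketch below models a module and a chip as plain Python data classes. It is bookkeeping based only on the unit counts given in this article, not an AMD specification; the class and field names are hypothetical. The four-module, eight-core configuration is the one AMD described.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Module:
    """One Bulldozer 'major block' (module), per the article's description.

    Dedicated per integer core: its integer scheduler and execution units.
    Shared by both cores: fetch, decode, the L2 cache and the FP unit
    (two 128-bit FMAC pipes behind a single FP scheduler).
    """
    integer_cores: int = 2
    fmac_pipes: int = 2
    fmac_width_bits: int = 128
    fp_schedulers: int = 1


@dataclass(frozen=True)
class Chip:
    """Modules additionally share the L3 cache, memory controller and north bridge."""
    modules: int
    module: Module = field(default_factory=Module)

    @property
    def cores(self) -> int:
        return self.modules * self.module.integer_cores

    @property
    def integer_schedulers(self) -> int:
        # One dedicated scheduler per integer core.
        return self.cores

    @property
    def fp_schedulers(self) -> int:
        return self.modules * self.module.fp_schedulers

    @property
    def fmac_pipes(self) -> int:
        return self.modules * self.module.fmac_pipes


eight_core = Chip(modules=4)
print(eight_core.cores, eight_core.integer_schedulers,
      eight_core.fp_schedulers, eight_core.fmac_pipes)  # 8 8 4 8
```

The point the model makes is that integer resources scale with cores, while FP schedulers scale only with modules.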

Such a "dual-core" major block will not be as efficient as two traditional cores, but it will consume less power and use less die space. In effect, more of these building blocks can be placed on a chip without pushing thermal design power or die size to unacceptable levels. Moreover, AMD claims, reasonably, that the approach is more efficient than simultaneous multi-threading or chip-level multi-threading. According to AMD, each major block can deliver 80% of the performance of a dual-core microprocessor.
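AMD's 80% figure reduces to simple arithmetic: a module delivers 2 × 0.8 = 1.6 times the throughput of one conventional core, so it comes out ahead on throughput per unit of die area whenever it occupies less than 80% of the area of two full cores. The sketch below encodes that reasoning. The area fraction is a hypothetical input, since the article gives no die-area figures, and the function names are made up for illustration.

```python
def module_throughput(efficiency: float = 0.8) -> float:
    """Throughput of one two-core module, in units of one conventional core.

    `efficiency` is the fraction of a true dual-core's performance the module
    delivers; 0.8 is AMD's quoted figure.
    """
    return 2 * efficiency


def throughput_per_area(efficiency: float, area_fraction: float) -> float:
    """Throughput per unit area relative to two conventional cores.

    `area_fraction` (module area / area of two full cores) is a HYPOTHETICAL
    input; values above 1.0 in the result mean the module wins per mm^2.
    """
    if not 0 < area_fraction <= 1:
        raise ValueError("area_fraction must be in (0, 1]")
    return efficiency / area_fraction


print(module_throughput())            # 1.6 (cores' worth of throughput)
print(throughput_per_area(0.8, 0.8))  # 1.0 (break-even point)
```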

AMD also said that Bulldozer will feature a new prediction-directed instruction prefetch mechanism, which overlaps instruction-miss requests to the cache or memory and thus improves execution efficiency. In particular, this should help AMD maximize the utilization of the "dual-core" major blocks of Bulldozer microprocessors.
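To see why letting branch prediction run ahead of fetch helps, here is a toy cycle model (not AMD's design). A predictor that may run up to `depth` fetch blocks ahead of the fetch unit issues a prefetch for every block it predicts, so the latencies of several instruction-cache misses overlap instead of adding up. The model assumes perfect prediction, one block predicted and one fetched per cycle, a fixed miss latency and unlimited outstanding misses; all names and parameters are hypothetical.

```python
from typing import Sequence


def serial_fetch_cycles(misses: Sequence[bool], latency: int) -> int:
    """Fetch stalls for the full latency on every miss; nothing overlaps."""
    return sum(1 + (latency if m else 0) for m in misses)


def decoupled_fetch_cycles(misses: Sequence[bool], latency: int, depth: int) -> int:
    """Predictor runs up to `depth` blocks ahead and prefetches each predicted block."""
    if depth < 1:
        raise ValueError("depth must be >= 1")
    pred: list[int] = []  # cycle at which each block is predicted (and prefetched)
    done: list[int] = []  # cycle at which fetch finishes each block
    for i, miss in enumerate(misses):
        # One prediction per cycle, but only once the queue has a free slot,
        # i.e. once block i - depth has been consumed by fetch.
        p = pred[i - 1] + 1 if i > 0 else 0
        if i >= depth:
            p = max(p, done[i - depth])
        pred.append(p)
        # The prefetch starts as soon as the block is predicted.
        ready = p + (latency if miss else 0)
        start = max(done[i - 1] if i > 0 else 0, ready)
        done.append(start + 1)
    return done[-1] if done else 0


misses = [True, False, True, True, False, True]
print(serial_fetch_cycles(misses, 20))        # 86
print(decoupled_fetch_cycles(misses, 20, 1))  # 86
print(decoupled_fetch_cycles(misses, 20, 4))  # 43
```

With a queue depth of 1 the model degenerates to stalling on every miss; with a deeper queue, misses that are predicted close together are serviced in parallel.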

The Bulldozer instruction set architecture supports SSE4.1, SSE4.2, AVX (including 256-bit YMM registers) with AMD's four-operand FMAC subset, AES, XSAVE state management and XOP instructions. Bulldozer will also support Lightweight Profiling (LWP) technology. As indicated earlier, there is no word on 3DNow! extensions or the SSE5 instruction set.
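On Linux, whether a given processor exposes these extensions can be checked from the `flags` line of `/proc/cpuinfo`, where the kernel reports them as `sse4_1`, `sse4_2`, `avx`, `fma4`, `aes`, `xsave`, `xop` and `lwp`. The sketch below is a minimal Linux-only helper that assumes that file format; the function names are hypothetical, and on other systems the CPUID instruction would have to be queried directly.

```python
from pathlib import Path

# Linux /proc/cpuinfo flag names for the extensions listed in the article.
BULLDOZER_FLAGS = {
    "SSE4.1": "sse4_1",
    "SSE4.2": "sse4_2",
    "AVX": "avx",
    "FMA4": "fma4",
    "AES": "aes",
    "XSAVE": "xsave",
    "XOP": "xop",
    "LWP": "lwp",
}


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def bulldozer_extensions(cpuinfo_text: str) -> dict[str, bool]:
    """Map each extension name to whether its flag is present."""
    flags = cpu_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in BULLDOZER_FLAGS.items()}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        for name, present in bulldozer_extensions(path.read_text()).items():
            print(f"{name:7} {'yes' if present else 'no'}")
```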

Quite naturally, Bulldozer, just like the code-named Llano accelerated processing unit (APU), supports advanced power management featuring chip power gating and digital temperature measurement. The chip will also be able to dynamically boost its clock speed when the thermal design power allows it and not all cores are needed.

Given that AMD's Bulldozer architecture appears to be very modular, we can expect AMD to tailor designs to specific performance and/or power requirements. Perhaps special-purpose accelerators will even replace certain major blocks, or certain cores, to boost performance in specific applications. In the case of an eight-core chip, there will be four major blocks sharing the L3 cache, the memory controller and the north bridge units.

“In my opinion, Bulldozer and Bobcat are not only two of the greatest technical achievements in AMD’s rich history, but two of the most important for the industry as well. With CPUs and APUs built from these core implementations, we expect our customers to deliver a new wave of innovative PC form factors and high-performance computing experiences,” said Chekib Akrout, senior vice president and general manager of AMD technology development.

Unfortunately, AMD decided not to share any information about clock-speeds or L2/L3 cache sizes. What we do know is that the first Bulldozer processors will be made using a 32nm SOI fabrication process in 2011, and that a 33% increase in the number of cores may yield up to 50% additional performance in server applications, at least according to AMD's internal simulations.
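The two percentages also imply a per-core figure. Since total throughput equals cores times per-core throughput, the per-core ratio is (1 + 0.5) / (1 + 1/3). The quick check below uses exact fractions and assumes that "33%" stands for one third, for example 12 cores growing to 16; those core counts are illustrative and do not come from the article.

```python
from fractions import Fraction


def implied_per_core_gain(core_increase: Fraction, perf_increase: Fraction) -> Fraction:
    """Per-core throughput ratio implied by total-core and total-performance growth.

    total_perf = cores * per_core_perf, so
    per_core_new / per_core_old = (1 + perf_increase) / (1 + core_increase).
    """
    return (1 + perf_increase) / (1 + core_increase)


# Illustrative: 12 -> 16 cores is a one-third (~33%) increase.
core_increase = Fraction(16, 12) - 1
gain = implied_per_core_gain(core_increase, Fraction(1, 2))
print(gain, float(gain))  # 9/8 1.125
```

In other words, if AMD's simulated figures hold, per-core throughput on those server workloads would be about 12.5% higher.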

Tags: AMD, Bulldozer, Zambezi, Orochi, Interlagos, Valencia, Phenom, Athlon

Discussion

Comments currently: 8
Discussion started: 08/24/10 07:20:09 PM
Latest comment: 08/27/10 12:12:00 PM

1. 
AMD is currently doing the talk, now, can they do the walk? They can't risk another "Barcelona" and, from my perspective, they need another "Hammer" win. AMD lost in the server and desktop markets and they are playing catchup. It is also discouraging that there are not any more specific release dates.
[Posted by: RtFusion | Date: 08/24/10 07:20:10 PM]

 
Don't you worry about DAMit's server execution; they're doing well even with the delayed G34/C32 ramp-up.

What's frustrating is that AMD deliberately treats its desktop users as third-rate customers with its socket loan-shark business. It's IDIOTIC to announce a new AM3 chipset line just a few months (a year, given the delayed Bulldozer) before a new micro-architecture comes on the scene, and now to play games with anybody who upgraded to a "new, fully Bulldozer-compatible chipset" but can't run it on a "new-yet-old mobo".
I don't wanna fill anybody's pocket and yet get idiot treatment from some AMD marketing department with IQ=2; try to rub them together and see if they'll catch fire.
[Posted by: OmegaHuman | Date: 08/24/10 11:51:40 PM]
 
AM2 and AM3 CPUs will work in the AM3+ socket. You forgot to say a word about Intel users, who always pay more money to upgrade, IQ=1.5. Intel has 4 or 5 sockets, what are u talking about man? buhahaha.
[Posted by: Blackcode | Date: 08/25/10 06:28:26 AM]

2. 
"Unfortunately, AMD decided not to share any information about clock-speeds or L2/L3 cache sizes."

If that's true, you forgot to mention the L1 data and L1 instruction cache sizes ;)
[Posted by: OmegaHuman | Date: 08/24/10 11:45:49 PM]

3. 
The XOP instruction set, announced by AMD on May 1, 2009, is an extension to the 128-bit SSE core instructions. XOP is a revision of the SSE5 instruction set proposal announced on August 30, 2007.
In fact XOP is SSE5.

http://en.wikipedia.org/wiki/XOP_instruction_set
[Posted by: Blackcode | Date: 08/25/10 06:23:11 AM]

 
XOP is not SSE5; XOP is an additional extension set that is now included in the SSE5 specifications, just like CVT.
[Posted by: OmegaHuman | Date: 08/25/10 02:15:39 PM]

4. 
Why do you claim that the module design "will not be as efficient as two traditional cores"? You seem to be claiming that most code is bottlenecked in the (shared) fetch/decode stages, but is that actually the case?
[Posted by: markhahn | Date: 08/26/10 03:13:08 PM]

5. 
I wonder if anyone is going to attempt to use a level 0 scratchpad memory.
[Posted by: nforce4max | Date: 08/27/10 12:12:00 PM]
