

Advanced Micro Devices continues to share secrets about its forthcoming products based on the design code-named Bulldozer. Recently the company explained in detail its new floating point unit (FPU), called "Flex FP", which promises to deliver high performance while being very efficient in terms of die size and power consumption.

As is known, Bulldozer processors consist of several so-called modules. Each module has two integer engines as well as one "Flex FP" FPU consisting of two 128-bit FMAC units that share their own dedicated scheduler. The approach differs from a hypothetical 256-bit FPU with correspondingly wide data paths, which would often be underutilized. Moreover, a unified scheduler for both FP and integer execution units would also be less efficient, according to AMD.

"Each Flex FP has its own scheduler; it does not rely on the integer scheduler to schedule FP commands, nor does it take integer resources to schedule 256-bit executions. This helps to ensure that the FP unit stays full as floating point commands occur. Our competitors’ architectures have had single scheduler for both integer and floating point, which means that both integer and floating point commands are issued by a single shared scheduler vs. having dedicated schedulers for both integer and floating point executions," said John Fruehe, the director of product marketing for server/workstation products at AMD.

Modern 128-bit FPUs can execute four single-precision operations or two double-precision operations in parallel per cycle. The yet-to-come AVX technology allows eight 32-bit operations or four 64-bit operations to be executed per cycle. However, if a program does not support AVX, then "that flashy new 256-bit FPU only executes in 128-bit mode". This is naturally a blow to the 256-bit FPU of Intel's Sandy Bridge processor.
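The per-cycle counts quoted above follow from dividing the vector width by the element width. A minimal Python sketch of that arithmetic is given here; the function name `simd_lanes` is purely illustrative and is not part of any AMD or Intel tooling.

```python
def simd_lanes(register_bits: int, element_bits: int) -> int:
    """Return how many elements of `element_bits` fit in a `register_bits`-wide vector."""
    if element_bits <= 0 or register_bits % element_bits != 0:
        raise ValueError("element width must be positive and divide the register width")
    return register_bits // element_bits


# 128-bit SSE-style vectors versus 256-bit AVX-style vectors
for width in (128, 256):
    single = simd_lanes(width, 32)  # single precision, 32-bit elements
    double = simd_lanes(width, 64)  # double precision, 64-bit elements
    print(f"{width}-bit: {single} single-precision, {double} double-precision")
```

Code that has not been recompiled for AVX keeps issuing 128-bit operations, so it sees the 128-bit row of this table regardless of how wide the hardware is.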

The beauty of the Flex FP is that it is a single 256-bit FPU that is shared by two integer cores. With each cycle, either core can operate on 256 bits of parallel data via two 128-bit instructions or one 256-bit instruction, or each of the integer cores can execute 128-bit commands simultaneously.
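The sharing modes described above can be illustrated with a toy capacity model. This is only a sketch under simplifying assumptions: it counts FMAC units per cycle and ignores the scheduler, latencies and everything else about the real pipeline. The names `FpOp`, `fmacs_needed` and `fits_in_one_cycle` are hypothetical and chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

FMAC_BITS = 128       # width of one FMAC unit, as stated in the article
FMACS_PER_MODULE = 2  # one Flex FP holds two FMAC units


@dataclass(frozen=True)
class FpOp:
    core: int  # 0 or 1: which integer core of the module issued the operation
    bits: int  # 128 or 256: width of the floating point operation


def fmacs_needed(op: FpOp) -> int:
    """A 128-bit op occupies one FMAC unit; a 256-bit op occupies both."""
    if op.bits not in (128, 256):
        raise ValueError("only 128-bit and 256-bit operations are modelled")
    return op.bits // FMAC_BITS


def fits_in_one_cycle(ops: list[FpOp]) -> bool:
    """True if the given ops together need no more than the module's two FMAC units."""
    return sum(fmacs_needed(op) for op in ops) <= FMACS_PER_MODULE


# The three cases named in the article, plus one that exceeds the shared unit.
print(fits_in_one_cycle([FpOp(0, 256)]))                # one core, one 256-bit op
print(fits_in_one_cycle([FpOp(0, 128), FpOp(0, 128)]))  # one core, two 128-bit ops
print(fits_in_one_cycle([FpOp(0, 128), FpOp(1, 128)]))  # each core, one 128-bit op
print(fits_in_one_cycle([FpOp(0, 256), FpOp(1, 128)]))  # more than 256 bits in total
```

In this model the first three combinations fit in a single cycle and the last does not, which matches the article's point that the two cores share 256 bits of floating point width between them.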

"In today’s typical data center workloads, the bulk of the processing is integer and a smaller portion is floating point. So, in most cases you don’t want one massive 256-bit floating point unit per core consuming all of that die space and all of that power just to sit around watching the integer cores do all of the heavy lifting. By sharing one 256-bit floating point unit per every 2 cores, we can keep die size and power consumption down, helping hold down both the acquisition cost and long-term management costs," explained Mr. Fruehe.

Sharing the Flex FP holds down the processor's power budget, which allows AMD to fit more integer cores into the same power envelope. In fact, AMD claims that the Flex FP is designed to reduce its active idle power consumption to a mere 2% of its peak power consumption.

"Obviously, there are benefits of recompiled code that will support the new AVX instructions. But, if you think that you will have some older 128-bit FP code hanging around (and let’s face it, you will), then don’t you think having a flexible floating point solution is a more flexible choice for your applications? For applications to support the new 256-bit AVX capabilities they will need to be recompiled; this takes time and testing, so I wouldn’t expect to see rapid movement to AVX until well after platforms are available on the streets," concluded Mr. Fruehe.

Tags: AMD, Bulldozer, AVX, Flex FP, 32nm


Comments currently: 4
Discussion started: 10/28/10 01:55:00 AM
Latest comment: 10/29/10 05:46:36 PM


Interesting take on things... Sure, there is more integer work to be done, but I hope that, when it comes to full FP work, AMD's unit will at least be on par with whatever INTEL would bring. Also, the die size savings and power savings are good decisions, but let's hope that they'll be enough to counteract the cost savings INTEL's new 450mm wafer will bring.

Also, being able to manifest better FPU performance WITHOUT the need to recompile software is a good thing as, as we all know, programmers are too dumb and superficial to correctly identify a processor and use its abilities, and especially dumb when it comes to optimizations. Take the fact that many, many games identify the Athlon Barton CPUs as having no SSE capabilities and are also compiled so that they wouldn't work on AMD CPUs, as they don't search for SSE but for the manufacturer string. If it's not INTEL, it won't run. Like I've said, horrendously stupid programmers. One well known case is Borderlands. And just look at what happened with 3DNow! and later versions of it. They were never used properly although they were earlier to the market and in some ways better than INTEL's implementation, and, don't forget, as all in the know agreed, INTEL practically copied much from AMD when introducing SSE and later versions.
[Posted by: East17  | Date: 10/28/10 01:55:00 AM]

What the hell are you talking about East17??? Neither Intel nor anyone else will be using 450mm wafers for at least another 5 years...

Did you read this article or just look at the headline???

As for SSE usage on AMD CPUs, or lack thereof, that can be blamed on Intel & developers, since a lot of development houses use Intel compilers (which used to intentionally not recognize features in AMD CPUs). However, knowing how Intel works, they probably were giving software development companies the software free so long as they purchased Intel hardware. See how that works???
[Posted by: freebird26  | Date: 10/28/10 05:06:31 AM]

I know it's far away in the future BUT... if it's far away for INTEL, it may be further away for AMD/Globalfoundries.
[Posted by: East17  | Date: 10/29/10 05:46:36 PM]

Don't worry. AMD doesn't disclose planned BD clock frequency for a reason. Everything discussed about BD vs. SB seems to compare them at the same clock frequencies. But remember that a higher clock frequency also means higher throughput. The way Bulldozer's FPU is constructed, it will also work well for legacy code. If you like, read my latest blog entries for more background info on pipelining and latencies.

450mm wafers are a future story. And cost competitiveness in the future will very likely be one goal of GlobalFoundries' strategy.
[Posted by: Dresdenboy  | Date: 10/28/10 05:35:26 AM]

