http://forums.amd.com/dev...d=208&threadid=112934
Furthermore, you could probably combine those 2x128-bit FP units to calculate one 256-bit instruction.
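
A rough sketch of that idea, purely as an illustration: it assumes each 128-bit unit handles four 32-bit lanes, and that a 256-bit operation is simply split into a low and a high half. The names `fp_unit_add` and `add_256` are made up for this example and say nothing about how the real hardware schedules the work.

```python
LANES_PER_UNIT = 4  # assumption: four 32-bit floats fit in one 128-bit unit


def fp_unit_add(a, b):
    """Model one 128-bit FP unit: lane-wise add over exactly four lanes."""
    if len(a) != LANES_PER_UNIT or len(b) != LANES_PER_UNIT:
        raise ValueError("a 128-bit unit takes exactly four lanes")
    return [x + y for x, y in zip(a, b)]


def add_256(a, b):
    """Model a 256-bit (eight-lane) add as two independent 128-bit halves."""
    if len(a) != 2 * LANES_PER_UNIT or len(b) != 2 * LANES_PER_UNIT:
        raise ValueError("a 256-bit operation takes exactly eight lanes")
    low = fp_unit_add(a[:LANES_PER_UNIT], b[:LANES_PER_UNIT])    # first unit
    high = fp_unit_add(a[LANES_PER_UNIT:], b[LANES_PER_UNIT:])   # second unit
    return low + high  # concatenate the halves into one eight-lane result


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [10.0] * 8
wide_sum = add_256(a, b)
print(wide_sum)
```

The two halves never depend on each other, which is why pairing the units looks plausible on paper.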

AMD Discloses First Details About Next-Generation Bulldozer Processor
[11/12/2009 03:44 PM] Advanced Micro Devices this week disclosed the first details about its next-generation Bulldozer processor, which is due in 2011. Although the chip's specifications look rather promising at this point, in about a year and a half the central processing unit (CPU) may face too strong a rival and repeat the history of its predecessors.
Based on the information provided by AMD during its annual Analyst Day in November, the first Bulldozer chip, code-named Zambezi (which belongs to the Orochi family, according to the firm), will feature eight x86 processing engines with multithreading technology, two 128-bit FMAC floating point units, shared L2 cache, shared L3 cache as well as an integrated memory controller. AMD also states that the new CPU will feature “extensive new power management innovations”.

The implementation of 128-bit FMAC is quite logical: AMD’s SSE5 set of extensions does feature 128-bit multimedia instructions as well as 128-bit three-operand instructions. In fact, there has been a trend toward increasing precision of floating point instructions over the last decade.
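
To make the FMAC idea concrete, here is a minimal sketch of a lane-wise multiply-accumulate, a[i] * b[i] + c[i], over four lanes standing in for one 128-bit register. It assumes the "fused" form rounds only once, after the add, and models that with exact fractions. Python floats are 64-bit, so this illustrates the rounding behaviour rather than the 32-bit lane width, and the function names are invented for this example.

```python
from fractions import Fraction


def fmac_lanes(a, b, c):
    """Lane-wise fused multiply-accumulate: a[i] * b[i] + c[i], rounded once.

    Fractions keep the intermediate product exact, so the only rounding
    happens in the final conversion back to float.
    """
    return [float(Fraction(x) * Fraction(y) + Fraction(z))
            for x, y, z in zip(a, b, c)]


def mul_then_add_lanes(a, b, c):
    """Separate multiply and add: the product is rounded before the add."""
    return [x * y + z for x, y, z in zip(a, b, c)]


# Four lanes; lane 0 is chosen so the exact product is just below 1.0.
a = [1.0 + 2.0**-30, 2.0, 3.0, 4.0]
b = [1.0 - 2.0**-30, 0.5, 2.0, 0.25]
c = [-1.0, 1.0, 1.0, 1.0]

fused = fmac_lanes(a, b, c)
unfused = mul_then_add_lanes(a, b, c)
print(fused)
print(unfused)
```

In lane 0 the separate multiply rounds the product up to 1.0 and the small remainder is lost, while the fused form keeps it. The other lanes give identical results either way.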
What is important to note is that Intel Corp.’s forthcoming Sandy Bridge processor features Advanced Vector Extensions (AVX), which support 256-bit FP operations, something very progressive. Both AMD and Intel have already released documentation regarding AVX and SSE5 for developers, but Intel managed to release a new compiler supporting AVX in June ’09, whereas AMD has not managed to roll out an SSE5-supporting tool. As a result, the vast majority of developers are already capable of creating AVX-capable software; however, almost no designers can make SSE5-capable programs at the moment.
Nevertheless, based on the diagram that AMD demonstrated, the company intends to dramatically improve multithreading performance of its CPUs: two INT schedulers, an FP scheduler and separate data caches for each of the four cores should do the job very well.
Unfortunately, AMD has not released any performance data for the Bulldozer chip, but since the chip designer positions the unit as a solution for desktops and servers in 2011, it does expect this 32nm SOI chip with high-k metal gate to be a high performer.
Tags: AMD, Bulldozer, Zambezi, Orochi, SSE5, AVX, 32nm
